Add dynamic artifacts naming, documentation and tests (#3201)
* Add dynamic artifacts naming, documentation and tests

* add `ArtifactConfig` support

* reshape docs

* silent ruff by intent

* rework name evaluation logic and extend testing

* highlight the impact of cache on naming

* reenable macos integration testing

* Revert "reenable macos integration testing"

This reverts commit 791fc13.

* rework following suggestions

* rework following suggestions

* functional dynamic naming using placeholders

* resolve branching

* extra_name_placeholders -> name_subs

* fix doc string

* move `original_name` to `ArtifactVersion`

* update the calls of adjusted models

* sunset `original_name`

* `name_subs`>`substitutions`

* consistent substitutions across the board

* resolve test names conflict

* resolve test names conflict

* resolve test names conflict

* extend pipeline run response with full substitutions

* review suggestions

* push None checks deeper

* simplify `format_name_template`

* non-optional substitutions

* fix status quo

* move `subs` compute to schema to model

* sunset `_define_output_names`

* lint

* refactor `_get_full_substitutions`

* remove unused property

* remove redundant field in response

* Revert "remove redundant field in response"

This reverts commit a469594.

* rename

* fix bug with unannotated outputs evaluation

* add default value

* update docs
avishniakov authored Nov 28, 2024
1 parent eb12a64 commit 1b16f16
Showing 37 changed files with 904 additions and 218 deletions.
3 changes: 2 additions & 1 deletion .gitbook.yaml
@@ -51,7 +51,8 @@ redirects:
how-to/build-pipelines/schedule-a-pipeline: how-to/pipeline-development/build-pipelines/schedule-a-pipeline.md
how-to/build-pipelines/delete-a-pipeline: how-to/pipeline-development/build-pipelines/delete-a-pipeline.md
how-to/build-pipelines/compose-pipelines: how-to/pipeline-development/build-pipelines/compose-pipelines.md
-how-to/build-pipelines/dynamically-assign-artifact-names: how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md
+how-to/build-pipelines/dynamically-assign-artifact-names: how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
+how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names: how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
how-to/build-pipelines/retry-steps: how-to/pipeline-development/build-pipelines/retry-steps.md
how-to/build-pipelines/run-pipelines-asynchronously: how-to/pipeline-development/build-pipelines/run-pipelines-asynchronously.md
how-to/build-pipelines/control-execution-order-of-steps: how-to/pipeline-development/build-pipelines/control-execution-order-of-steps.md
@@ -0,0 +1,152 @@
---
description: Understand how you can name your ZenML artifacts.
---

# How Artifact Naming works in ZenML

In ZenML pipelines, you often need to reuse the same step multiple times with different inputs, resulting in multiple artifacts. However, the default naming convention for artifacts can make it challenging to track and differentiate between these outputs, especially when they need to be used in subsequent pipelines. Below you can find a detailed exploration of how you might name your output artifacts dynamically or statically, depending on your needs.

ZenML uses type annotations in function definitions to determine artifact names. Output artifacts with the same name are saved with incremented version numbers.
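
The versioning rule can be pictured with a toy in-memory model. This is only a sketch of the behaviour described above: `ToyArtifactStore` and its `save` method are hypothetical names invented for illustration and are not part of ZenML.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class ToyArtifactStore:
    """Hypothetical stand-in that only tracks a version counter per name."""

    def __init__(self) -> None:
        self._latest: DefaultDict[str, int] = defaultdict(int)

    def save(self, name: str) -> Tuple[str, int]:
        # Saving under an existing name bumps that name's version number.
        self._latest[name] += 1
        return name, self._latest[name]


store = ToyArtifactStore()
print(store.save("static_output_name"))  # ('static_output_name', 1)
print(store.save("static_output_name"))  # ('static_output_name', 2)
print(store.save("name_2024_11_18"))     # ('name_2024_11_18', 1)
```

A static name therefore accumulates versions across runs, while a name that changes with every run starts a fresh version sequence each time.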

ZenML provides flexible options for naming output artifacts, supporting both static and dynamic naming strategies:
- Names can be generated dynamically at runtime
- Support for string templates (standard and custom placeholders supported)
- Compatible with single and multiple output scenarios
- Annotations help define naming strategy without modifying core logic

## Naming Strategies

### Static Naming
Static names are defined directly as string literals.

```python
@step
def static_single() -> Annotated[str, "static_output_name"]:
    return "null"
```

### Dynamic Naming
Dynamic names can be generated using:

#### String Templates Using Standard Placeholders
Use the following placeholders that ZenML will replace automatically:

* `{date}` will resolve to the current date, e.g. `2024_11_18`
* `{time}` will resolve to the current time, e.g. `11_07_09_326492`

```python
@step
def dynamic_single_string() -> Annotated[str, "name_{date}_{time}"]:
    return "null"
```
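
To see what such a template expands to, the following standalone sketch reproduces the formats shown in the examples above using only the standard library. The helper `resolve_standard_placeholders` and the `strftime` patterns are assumptions inferred from the sample values `2024_11_18` and `11_07_09_326492`; they are not taken from ZenML's code.

```python
from datetime import datetime, timezone


def resolve_standard_placeholders(template: str, now: datetime) -> str:
    """Hypothetical helper: fill {date} and {time} in the documented format."""
    return template.format(
        date=now.strftime("%Y_%m_%d"),     # e.g. 2024_11_18
        time=now.strftime("%H_%M_%S_%f"),  # e.g. 11_07_09_326492
    )


# A fixed timestamp keeps the example reproducible.
now = datetime(2024, 11, 18, 11, 7, 9, 326492, tzinfo=timezone.utc)
print(resolve_standard_placeholders("name_{date}_{time}", now))
# name_2024_11_18_11_07_09_326492
```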

#### String Templates Using Custom Placeholders
You can also use placeholders of your own. ZenML replaces them when their values are passed to the step via the `substitutions` parameter:

```python
@step(substitutions={"custom_placeholder": "some_substitute"})
def dynamic_single_string() -> Annotated[str, "name_{custom_placeholder}_{time}"]:
    return "null"
```

Another option is to use `with_options` to dynamically redefine the placeholder, like this:

```python
@step
def extract_data(source: str) -> Annotated[str, "{stage}_dataset"]:
    ...
    return "my data"

@pipeline
def extraction_pipeline():
    extract_data.with_options(substitutions={"stage": "train"})(source="s3://train")
    extract_data.with_options(substitutions={"stage": "test"})(source="s3://test")
```

{% hint style="info" %}
The substitutions for the custom placeholders like `stage` can be set in:
- `@pipeline` decorator, so they are effective for all steps in this pipeline
- `pipeline.with_options` function, so they are effective for all steps in this pipeline run
- `@step` decorator, so they are effective for this step (this overrides the pipeline settings)
- `step.with_options` function, so they are effective for this step run (this overrides the pipeline settings)

Standard substitutions always available and consistent in all steps of the pipeline are:
- `{date}`: current date, e.g. `2024_11_27`
- `{time}`: current time in UTC format, e.g. `11_07_09_326492`
{% endhint %}
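
The override rule from the hint (step-level values win over pipeline-level ones) can be sketched as a plain dictionary merge. `merge_substitutions` is a hypothetical helper written for this illustration, and the sample values are made up; ZenML's internal resolution may differ in detail.

```python
from typing import Dict


def merge_substitutions(
    pipeline_level: Dict[str, str], step_level: Dict[str, str]
) -> Dict[str, str]:
    """Hypothetical helper: step-level substitutions override pipeline-level ones."""
    merged = dict(pipeline_level)
    merged.update(step_level)
    return merged


subs = merge_substitutions(
    pipeline_level={"stage": "train", "team": "Team_A"},
    step_level={"stage": "test"},
)
print("{team}_{stage}_dataset".format(**subs))
# Team_A_test_dataset
```

Placeholders defined only at the pipeline level (here `team`) stay available to every step, while a step can still redefine the ones it cares about.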

### Multiple Output Handling

If you plan to return multiple artifacts from your ZenML step, you can flexibly combine all of the naming options outlined above, like this:

```python
@step
def mixed_tuple() -> Tuple[
    Annotated[str, "static_output_name"],
    Annotated[str, "name_{date}_{time}"],
]:
    return "static_namer", "str_namer"
```
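
How one name per output can be read from such a signature is sketched below with the standard `typing` module (Python 3.9+). `output_name_templates` is a hypothetical helper for illustration only; it is not how ZenML is implemented, and the function is left undecorated so that the snippet runs without ZenML installed.

```python
from typing import Annotated, List, Tuple, get_args, get_origin, get_type_hints


def output_name_templates(func) -> List[str]:
    """Hypothetical helper: collect string metadata from Annotated return types."""
    hint = get_type_hints(func, include_extras=True)["return"]
    # A Tuple[...] return means several outputs; otherwise treat it as one.
    outputs = get_args(hint) if get_origin(hint) is tuple else (hint,)
    names: List[str] = []
    for out in outputs:
        if get_origin(out) is Annotated:
            # get_args(Annotated[T, x, y]) returns (T, x, y).
            names.extend(m for m in get_args(out)[1:] if isinstance(m, str))
    return names


def mixed_tuple() -> Tuple[
    Annotated[str, "static_output_name"],
    Annotated[str, "name_{date}_{time}"],
]:
    return "static_namer", "str_namer"


print(output_name_templates(mixed_tuple))
# ['static_output_name', 'name_{date}_{time}']
```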

## Naming in cached runs

If a ZenML step runs with caching enabled and the cached result is used, the names of its output artifacts (both static and dynamic) remain the same as in the original run.

```python
from typing_extensions import Annotated
from typing import Tuple

from zenml import step, pipeline
from zenml.models import PipelineRunResponse


@step(substitutions={"custom_placeholder": "resolution"})
def demo() -> Tuple[
    Annotated[int, "name_{date}_{time}"],
    Annotated[int, "name_{custom_placeholder}"],
]:
    return 42, 43


@pipeline
def my_pipeline():
    demo()


if __name__ == "__main__":
    run_without_cache: PipelineRunResponse = my_pipeline.with_options(
        enable_cache=False
    )()
    run_with_cache: PipelineRunResponse = my_pipeline.with_options(enable_cache=True)()

    assert set(run_without_cache.steps["demo"].outputs.keys()) == set(
        run_with_cache.steps["demo"].outputs.keys()
    )
    print(list(run_without_cache.steps["demo"].outputs.keys()))
```

These 2 runs will produce output like the one below:
```
Initiating a new run for the pipeline: my_pipeline.
Caching is disabled by default for my_pipeline.
Using user: default
Using stack: default
orchestrator: default
artifact_store: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
Step demo has started.
Step demo has finished in 0.038s.
Pipeline run has finished in 0.064s.
Initiating a new run for the pipeline: my_pipeline.
Using user: default
Using stack: default
orchestrator: default
artifact_store: default
You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
Using cached version of step demo.
All steps of the pipeline run were cached.
['name_2024_11_21_14_27_33_750134', 'name_resolution']
```

<!-- For scarf -->
<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
@@ -37,13 +37,13 @@ Please note in the above example if the model version exists, it is automaticall

## Use name templates for your model versions

-If you want to continuously run the same project, but keep track of your model versions using semantical naming, you can rely on templated naming in the `version` argument to the `Model` object. Instead of static model version name from the previous section, templated names will be unique with every new run, but also will be semantically searchable and readable by your team.
+If you want to continuously run the same project, but keep track of your model versions using semantical naming, you can rely on templated naming in the `version` and/or `name` argument to the `Model` object. Instead of static model version name from the previous section, templated names will be unique with every new run, but also will be semantically searchable and readable by your team.

```python
from zenml import Model, step, pipeline

model= Model(
-    name="my_model",
+    name="{team}_my_model",
    version="experiment_with_phi_3_{date}_{time}"
)

@@ -53,16 +53,24 @@ def llm_trainer(...) -> ...:
...

# This configures it for all steps within the pipeline
-@pipeline(model=model)
+@pipeline(model=model, substitutions={"team": "Team_A"})
def training_pipeline( ... ):
    # training happens here
```

-Here we are specifically setting the model configuration for a particular step or for the pipeline as a whole. Once you run this pipeline it will produce a model version with a name evaluated at a runtime, like `experiment_with_phi_3_2024_08_30_12_42_53`. Subsequent runs will also have unique but readable names.
+Here we are specifically setting the model configuration for a particular step or for the pipeline as a whole. Once you run this pipeline it will produce a model version with a name evaluated at a runtime, like `experiment_with_phi_3_2024_08_30_12_42_53`. Subsequent runs will have the same name of the model and model version, since the substitutions like `time` and `date` are evaluated for the whole pipeline run. We also used a custom substitution via `{team}` placeholder and set it to `Team_A` in the `pipeline` decorator.

-We currently support following placeholders to be used in model version name templates:
-- `{date}`: current date
-- `{time}`: current time in UTC format
+{% hint style="info" %}
+The substitutions for the custom placeholders like `team` can be set in:
+- `@pipeline` decorator, so they are effective for all steps in this pipeline
+- `pipeline.with_options` function, so they are effective for all steps in this pipeline run
+- `@step` decorator, so they are effective for this step (this overrides the pipeline settings)
+- `step.with_options` function, so they are effective for this step run (this overrides the pipeline settings)
+
+Standard substitutions always available and consistent in all steps of the pipeline are:
+- `{date}`: current date, e.g. `2024_11_27`
+- `{time}`: current time in UTC format, e.g. `11_07_09_326492`
+{% endhint %}

## Fetching model versions by stage


This file was deleted.

@@ -15,14 +15,21 @@ training_pipeline = training_pipeline.with_options(
training_pipeline()
```

-Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the following placeholders that ZenML will replace:
+Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the placeholders that ZenML will replace.

-* `{date}` will resolve to the current date, e.g. `2023_02_19`
-* `{time}` will resolve to the current time, e.g. `11_07_09_326492`
+{% hint style="info" %}
+The substitutions for the custom placeholders like `experiment_name` can be set in:
+- `@pipeline` decorator, so they are effective for all steps in this pipeline
+- `pipeline.with_options` function, so they are effective for all steps in this pipeline run
+
+Standard substitutions always available and consistent in all steps of the pipeline are:
+- `{date}`: current date, e.g. `2024_11_27`
+- `{time}`: current time in UTC format, e.g. `11_07_09_326492`
+{% endhint %}

```python
training_pipeline = training_pipeline.with_options(
-    run_name="custom_pipeline_run_name_{date}_{time}"
+    run_name="custom_pipeline_run_name_{experiment_name}_{date}_{time}"
)
training_pipeline()
```
2 changes: 1 addition & 1 deletion docs/book/toc.md
@@ -94,7 +94,6 @@
* [Schedule a pipeline](how-to/pipeline-development/build-pipelines/schedule-a-pipeline.md)
* [Deleting a pipeline](how-to/pipeline-development/build-pipelines/delete-a-pipeline.md)
* [Compose pipelines](how-to/pipeline-development/build-pipelines/compose-pipelines.md)
-* [Dynamically assign artifact names](how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md)
* [Automatically retry steps](how-to/pipeline-development/build-pipelines/retry-steps.md)
* [Run pipelines asynchronously](how-to/pipeline-development/build-pipelines/run-pipelines-asynchronously.md)
* [Control execution order of steps](how-to/pipeline-development/build-pipelines/control-execution-order-of-steps.md)
@@ -123,6 +122,7 @@
* [How ZenML stores data](how-to/data-artifact-management/handle-data-artifacts/artifact-versioning.md)
* [Return multiple outputs from a step](how-to/data-artifact-management/handle-data-artifacts/return-multiple-outputs-from-a-step.md)
* [Delete an artifact](how-to/data-artifact-management/handle-data-artifacts/delete-an-artifact.md)
+* [Artifacts naming](how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md)
* [Organize data with tags](how-to/data-artifact-management/handle-data-artifacts/tagging.md)
* [Get arbitrary artifacts in a step](how-to/data-artifact-management/handle-data-artifacts/get-arbitrary-artifacts-in-a-step.md)
* [Handle custom data types](how-to/data-artifact-management/handle-data-artifacts/handle-custom-data-types.md)