Add dynamic artifacts naming, documentation and tests (#3201)

* Add dynamic artifacts naming, documentation and tests
* add `ArtifactConfig` support
* reshape docs
* silent ruff by intent
* rework name evaluation logic and extend testing
* highlight the impact of cache on naming
* reenable macos integration testing
* Revert "reenable macos integration testing". This reverts commit 791fc13.
* rework following suggestions
* rework following suggestions
* functional dynamic naming using placeholders
* resolve branching
* extra_name_placeholders -> name_subs
* fix doc string
* move `original_name` to `ArtifactVersion`
* update the calls of adjusted models
* sunset `original_name`
* `name_subs`>`substitutions`
* consistent substitutions across the board
* resolve test names conflict
* resolve test names conflict
* resolve test names conflict
* extend pipeline run response with full substitutions
* review suggestions
* push None checks deeper
* simplify `format_name_template`
* non-optional substitutions
* fix status quo
* move `subs` compute to schema to model
* sunset `_define_output_names`
* lint
* refactor `_get_full_substitutions`
* remove unused property
* remove redundant field in response
* Revert "remove redundant field in response". This reverts commit a469594.
* rename
* fix bug with unannotated outputs evaluation
* add default value
* update docs
zenml-io · Nov 28, 2024 · 1b16f16
1 parent eb12a64
commit 1b16f16

Showing 37 changed files with 904 additions and 218 deletions.
diff --git a/.gitbook.yaml b/.gitbook.yaml
@@ -51,7 +51,8 @@ redirects:
   how-to/build-pipelines/schedule-a-pipeline: how-to/pipeline-development/build-pipelines/schedule-a-pipeline.md
   how-to/build-pipelines/delete-a-pipeline: how-to/pipeline-development/build-pipelines/delete-a-pipeline.md
   how-to/build-pipelines/compose-pipelines: how-to/pipeline-development/build-pipelines/compose-pipelines.md
-  how-to/build-pipelines/dynamically-assign-artifact-names: how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md
+  how-to/build-pipelines/dynamically-assign-artifact-names: how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
+  how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names: how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
   how-to/build-pipelines/retry-steps: how-to/pipeline-development/build-pipelines/retry-steps.md
   how-to/build-pipelines/run-pipelines-asynchronously: how-to/pipeline-development/build-pipelines/run-pipelines-asynchronously.md
   how-to/build-pipelines/control-execution-order-of-steps: how-to/pipeline-development/build-pipelines/control-execution-order-of-steps.md

diff --git a/.../book/how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md b/.../book/how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md
@@ -0,0 +1,152 @@
+---
+description: Understand how you can name your ZenML artifacts.
+---
+
+# How Artifact Naming works in ZenML 
+
+In ZenML pipelines, you often need to reuse the same step multiple times with different inputs, resulting in multiple artifacts. However, the default naming convention for artifacts can make it challenging to track and differentiate between these outputs, especially when they need to be used in subsequent pipelines. Below you can find a detailed exploration of how you might name your output artifacts dynamically or statically, depending on your needs.
+
+ZenML uses type annotations in function definitions to determine artifact names. Output artifacts with the same name are saved with incremented version numbers.
+
+ZenML provides flexible options for naming output artifacts, supporting both static and dynamic naming strategies:
+- Names can be generated dynamically at runtime
+- String templates are supported, with both standard and custom placeholders
+- Both single-output and multiple-output steps are supported
+- Annotations define the naming strategy without modifying core logic
+
+## Naming Strategies
+
+### Static Naming
+Static names are defined directly as string literals.
+
+```python
+@step
+def static_single() -> Annotated[str, "static_output_name"]:
+    return "null"
+```
+
+### Dynamic Naming
+Dynamic names can be generated using:
+
+#### String Templates Using Standard Placeholders
+Use the following placeholders that ZenML will replace automatically:
+
+* `{date}` will resolve to the current date, e.g. `2024_11_18`
+* `{time}` will resolve to the current time, e.g. `11_07_09_326492`
+
+```python
+@step
+def dynamic_single_string() -> Annotated[str, "name_{date}_{time}"]:
+    return "null"
+```
+
+#### String Templates Using Custom Placeholders
+You can also use custom placeholders. ZenML replaces them if their values are passed to the step via the `substitutions` parameter:
+
+```python
+@step(substitutions={"custom_placeholder": "some_substitute"})
+def dynamic_single_string() -> Annotated[str, "name_{custom_placeholder}_{time}"]:
+    return "null"
+```
+
+Another option is to use `with_options` to dynamically redefine the placeholder, like this:
+
+```python
+@step
+def extract_data(source: str) -> Annotated[str, "{stage}_dataset"]:
+    ...
+    return "my data"
+
+@pipeline
+def extraction_pipeline():
+    extract_data.with_options(substitutions={"stage": "train"})(source="s3://train")
+    extract_data.with_options(substitutions={"stage": "test"})(source="s3://test")
+```
+
+{% hint style="info" %}
+The substitutions for the custom placeholders like `stage` can be set in:
+- `@pipeline` decorator, so they are effective for all steps in this pipeline
+- `pipeline.with_options` function, so they are effective for all steps in this pipeline run
+- `@step` decorator, so they are effective for this step (this overrides the pipeline settings)
+- `step.with_options` function, so they are effective for this step run (this overrides the pipeline settings)
+
+Standard substitutions always available and consistent in all steps of the pipeline are:
+- `{date}`: current date, e.g. `2024_11_27`
+- `{time}`: current time in UTC format, e.g. `11_07_09_326492`
+{% endhint %}
+
+### Multiple Output Handling
+
+If you plan to return multiple artifacts from your ZenML step, you can flexibly combine all naming options outlined above, like this:
+
+```python
+@step
+def mixed_tuple() -> Tuple[
+    Annotated[str, "static_output_name"],
+    Annotated[str, "name_{date}_{time}"],
+]:
+    return "static_namer", "str_namer"
+```
+
+## Naming in cached runs
+
+If your ZenML step runs with caching enabled and the cache is used, the names of the output artifacts (both static and dynamic) remain the same as in the original run.
+
+```python
+from typing_extensions import Annotated
+from typing import Tuple
+
+from zenml import step, pipeline
+from zenml.models import PipelineRunResponse
+
+
+@step(substitutions={"custom_placeholder": "resolution"})
+def demo() -> Tuple[
+    Annotated[int, "name_{date}_{time}"],
+    Annotated[int, "name_{custom_placeholder}"],
+]:
+    return 42, 43
+
+
+@pipeline
+def my_pipeline():
+    demo()
+
+
+if __name__ == "__main__":
+    run_without_cache: PipelineRunResponse = my_pipeline.with_options(
+        enable_cache=False
+    )()
+    run_with_cache: PipelineRunResponse = my_pipeline.with_options(enable_cache=True)()
+
+    assert set(run_without_cache.steps["demo"].outputs.keys()) == set(
+        run_with_cache.steps["demo"].outputs.keys()
+    )
+    print(list(run_without_cache.steps["demo"].outputs.keys()))
+```
+
+These 2 runs will produce output like the one below:
+```
+Initiating a new run for the pipeline: my_pipeline.
+Caching is disabled by default for my_pipeline.
+Using user: default
+Using stack: default
+  orchestrator: default
+  artifact_store: default
+You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
+Step demo has started.
+Step demo has finished in 0.038s.
+Pipeline run has finished in 0.064s.
+Initiating a new run for the pipeline: my_pipeline.
+Using user: default
+Using stack: default
+  orchestrator: default
+  artifact_store: default
+You can visualize your pipeline runs in the ZenML Dashboard. In order to try it locally, please run zenml login --local.
+Using cached version of step demo.
+All steps of the pipeline run were cached.
+['name_2024_11_21_14_27_33_750134', 'name_resolution']
+```
+
+<!-- For scarf -->
+<figure><img alt="ZenML Scarf" referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=f0b4f458-0a54-4fcd-aa95-d5ee424815bc" /></figure>
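The placeholder behaviour documented in `artifacts-naming.md` can be illustrated with a minimal, ZenML-free sketch. The helper name `resolve_name_template` and its `now` parameter are hypothetical and are not ZenML's API. The `%Y_%m_%d` and `%H_%M_%S_%f` formats are inferred from the documented examples (`2024_11_18`, `11_07_09_326492`), and letting a custom value win on a key clash is an assumption of this sketch.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def resolve_name_template(
    template: str,
    substitutions: Optional[Dict[str, str]] = None,
    now: Optional[datetime] = None,
) -> str:
    """Hypothetical helper: fill `{date}`, `{time}` and custom placeholders."""
    now = now or datetime.now(timezone.utc)
    # Standard placeholders; formats inferred from the documented examples.
    values = {
        "date": now.strftime("%Y_%m_%d"),
        "time": now.strftime("%H_%M_%S_%f"),
    }
    # Assumption of this sketch: custom values win if a key clashes.
    values.update(substitutions or {})
    return template.format(**values)


# A fixed timestamp keeps the example deterministic.
fixed = datetime(2024, 11, 18, 11, 7, 9, 326492, tzinfo=timezone.utc)
print(resolve_name_template("name_{date}_{time}", now=fixed))
print(resolve_name_template("{stage}_dataset", {"stage": "train"}, now=fixed))
```

With the fixed timestamp, the first call yields a name in the same shape as the `name_2024_11_21_14_27_33_750134` key shown in the cached-run output above.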
diff --git a/docs/book/how-to/model-management-metrics/model-control-plane/model-versions.md b/docs/book/how-to/model-management-metrics/model-control-plane/model-versions.md
@@ -37,13 +37,13 @@ Please note in the above example if the model version exists, it is automaticall
 
 ## Use name templates for your model versions
 
-If you want to continuously run the same project, but keep track of your model versions using semantical naming, you can rely on templated naming in the `version` argument to the `Model` object. Instead of static model version name from the previous section, templated names will be unique with every new run, but also will be semantically searchable and readable by your team.
+If you want to continuously run the same project, but keep track of your model versions using semantic naming, you can rely on templated naming in the `version` and/or `name` arguments of the `Model` object. Instead of the static model version name from the previous section, templated names will be unique with every new run, but also will be semantically searchable and readable by your team.
 
 ```python
 from zenml import Model, step, pipeline
 
 model= Model(
-    name="my_model",
+    name="{team}_my_model",
     version="experiment_with_phi_3_{date}_{time}"
 )
 
@@ -53,16 +53,24 @@ def llm_trainer(...) -> ...:
     ...
 
 # This configures it for all steps within the pipeline
-@pipeline(model=model)
+@pipeline(model=model, substitutions={"team": "Team_A"})
 def training_pipeline( ... ):
     # training happens here
 ```
 
-Here we are specifically setting the model configuration for a particular step or for the pipeline as a whole. Once you run this pipeline it will produce a model version with a name evaluated at a runtime, like `experiment_with_phi_3_2024_08_30_12_42_53`. Subsequent runs will also have unique but readable names.
+Here we are specifically setting the model configuration for a particular step or for the pipeline as a whole. Once you run this pipeline, it will produce a model version with a name evaluated at runtime, like `experiment_with_phi_3_2024_08_30_12_42_53`. Subsequent runs will have the same name of the model and model version, since the substitutions like `time` and `date` are evaluated for the whole pipeline run. We also used a custom substitution via the `{team}` placeholder and set it to `Team_A` in the `pipeline` decorator.
 
-We currently support following placeholders to be used in model version name templates:
-- `{date}`: current date
-- `{time}`: current time in UTC format
+{% hint style="info" %}
+The substitutions for the custom placeholders like `team` can be set in:
+- `@pipeline` decorator, so they are effective for all steps in this pipeline
+- `pipeline.with_options` function, so they are effective for all steps in this pipeline run
+- `@step` decorator, so they are effective for this step (this overrides the pipeline settings)
+- `step.with_options` function, so they are effective for this step run (this overrides the pipeline settings)
+
+Standard substitutions always available and consistent in all steps of the pipeline are:
+- `{date}`: current date, e.g. `2024_11_27`
+- `{time}`: current time in UTC format, e.g. `11_07_09_326492`
+{% endhint %}
 
 ## Fetching model versions by stage
 

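The hint added in this hunk says that step-level substitutions override pipeline-level ones. A minimal sketch of that precedence follows, using plain dictionaries instead of ZenML objects. The helper `merge_substitutions` is hypothetical, and the sketch does not model how the decorator and `with_options` variants rank against each other, since the diff does not state that.

```python
from typing import Dict


def merge_substitutions(
    pipeline_level: Dict[str, str],
    step_level: Dict[str, str],
) -> Dict[str, str]:
    """Hypothetical helper: step-level values override pipeline-level ones."""
    merged = dict(pipeline_level)  # copy so the caller's dict is untouched
    merged.update(step_level)      # step settings take precedence
    return merged


pipeline_subs = {"team": "Team_A", "stage": "train"}
step_subs = {"stage": "test"}

merged = merge_substitutions(pipeline_subs, step_subs)
name = "{team}_{stage}_dataset".format(**merged)
print(name)
```

Here `team` comes from the pipeline level while `stage` is replaced by the step-level value, so the resolved name combines both sources.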
diff --git a/...ow-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md b/...ow-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md
diff --git a/docs/book/how-to/pipeline-development/build-pipelines/name-your-pipeline-runs.md b/docs/book/how-to/pipeline-development/build-pipelines/name-your-pipeline-runs.md
@@ -15,14 +15,21 @@ training_pipeline = training_pipeline.with_options(
 training_pipeline()
 ```
 
-Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the following placeholders that ZenML will replace:
+Pipeline run names must be unique, so if you plan to run your pipelines multiple times or run them on a schedule, make sure to either compute the run name dynamically or include one of the placeholders that ZenML will replace.
 
-* `{date}` will resolve to the current date, e.g. `2023_02_19`
-* `{time}` will resolve to the current time, e.g. `11_07_09_326492`
+{% hint style="info" %}
+The substitutions for the custom placeholders like `experiment_name` can be set in:
+- `@pipeline` decorator, so they are effective for all steps in this pipeline
+- `pipeline.with_options` function, so they are effective for all steps in this pipeline run
+
+Standard substitutions always available and consistent in all steps of the pipeline are:
+- `{date}`: current date, e.g. `2024_11_27`
+- `{time}`: current time in UTC format, e.g. `11_07_09_326492`
+{% endhint %}
 
 ```python
 training_pipeline = training_pipeline.with_options(
-    run_name="custom_pipeline_run_name_{date}_{time}"
+    run_name="custom_pipeline_run_name_{experiment_name}_{date}_{time}"
 )
 training_pipeline()
 ```
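Because a run name such as `custom_pipeline_run_name_{experiment_name}_{date}_{time}` mixes standard and custom placeholders, it can be useful to check which placeholders still need a value before launching a run. The sketch below uses only the standard library. `missing_placeholders` is a hypothetical helper, not part of ZenML, and how ZenML itself reacts to an unresolved placeholder is not stated in this diff.

```python
from string import Formatter
from typing import Iterable, Set


def missing_placeholders(template: str, available: Iterable[str]) -> Set[str]:
    """Hypothetical helper: placeholders in `template` with no known value."""
    # Formatter().parse yields (literal, field_name, format_spec, conversion);
    # field_name is None for trailing literal text, so it is filtered out.
    used = {field for _, field, _, _ in Formatter().parse(template) if field}
    return used - set(available)


template = "custom_pipeline_run_name_{experiment_name}_{date}_{time}"

# Only the standard placeholders are known: `experiment_name` is missing.
missing = missing_placeholders(template, {"date", "time"})
print(missing)

# Once the custom substitution is supplied, nothing is missing.
complete = missing_placeholders(template, {"date", "time", "experiment_name"})
print(complete)
```

A non-empty result signals that a custom substitution still has to be supplied in one of the places listed in the hint above.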

diff --git a/docs/book/toc.md b/docs/book/toc.md
@@ -94,7 +94,6 @@
     * [Schedule a pipeline](how-to/pipeline-development/build-pipelines/schedule-a-pipeline.md)
     * [Deleting a pipeline](how-to/pipeline-development/build-pipelines/delete-a-pipeline.md)
     * [Compose pipelines](how-to/pipeline-development/build-pipelines/compose-pipelines.md)
-    * [Dynamically assign artifact names](how-to/pipeline-development/build-pipelines/dynamically-assign-artifact-names.md)
     * [Automatically retry steps](how-to/pipeline-development/build-pipelines/retry-steps.md)
     * [Run pipelines asynchronously](how-to/pipeline-development/build-pipelines/run-pipelines-asynchronously.md)
     * [Control execution order of steps](how-to/pipeline-development/build-pipelines/control-execution-order-of-steps.md)
@@ -123,6 +122,7 @@
     * [How ZenML stores data](how-to/data-artifact-management/handle-data-artifacts/artifact-versioning.md)
     * [Return multiple outputs from a step](how-to/data-artifact-management/handle-data-artifacts/return-multiple-outputs-from-a-step.md)
     * [Delete an artifact](how-to/data-artifact-management/handle-data-artifacts/delete-an-artifact.md)
+    * [Artifacts naming](how-to/data-artifact-management/handle-data-artifacts/artifacts-naming.md)
     * [Organize data with tags](how-to/data-artifact-management/handle-data-artifacts/tagging.md)
     * [Get arbitrary artifacts in a step](how-to/data-artifact-management/handle-data-artifacts/get-arbitrary-artifacts-in-a-step.md)
     * [Handle custom data types](how-to/data-artifact-management/handle-data-artifacts/handle-custom-data-types.md)