Make dicts/lists visualizable and add JSON as viz type (#2882)

* add visualization to serialize dict
* add json as visualization type
* add JSONString class
* add json viz support in notebook
* JSON expects a jsonable dict not str
* also add list/tuples to viz
* add JSON support in docs
* Auto-update of Starter template
* fix link

---------

Co-authored-by: GitHub Actions <[email protected]>
zenml-io · Nov 28, 2024 · a31dabb
1 parent 34f3fe9
commit a31dabb
Showing 7 changed files with 41 additions and 8 deletions.
diff --git a/docs/book/how-to/advanced-topics/control-logging/disable-colorful-logging.md b/docs/book/how-to/advanced-topics/control-logging/disable-colorful-logging.md
@@ -10,7 +10,7 @@ By default, ZenML uses colorful logging to make it easier to read logs. However,
 ZENML_LOGGING_COLORS_DISABLED=true
 ```
 
-Note that setting this on the [client environment](../configure-python-environments/README.md#client-environment-or-the-runner-environment) (e.g. your local machine which runs the pipeline) will automatically disable colorful logging on remote pipeline runs. If you wish to only disable it locally, but turn on for remote pipeline runs, you can set the `ZENML_LOGGING_COLORS_DISABLED` environment variable in your pipeline runs environment as follows:
+Note that setting this on the [client environment](../../infrastructure-deployment/configure-python-environments/README.md#client-environment-or-the-runner-environment) (e.g. your local machine which runs the pipeline) will automatically disable colorful logging on remote pipeline runs. If you wish to only disable it locally, but turn on for remote pipeline runs, you can set the `ZENML_LOGGING_COLORS_DISABLED` environment variable in your pipeline runs environment as follows:
 
 ```python
 docker_settings = DockerSettings(environment={"ZENML_LOGGING_COLORS_DISABLED": "false"})

diff --git a/.../data-artifact-management/visualize-artifacts/creating-custom-visualizations.md b/.../data-artifact-management/visualize-artifacts/creating-custom-visualizations.md
@@ -12,20 +12,22 @@ Currently, the following visualization types are supported:
 * **Image:** Visualizations of image data such as Pillow images (e.g. `PIL.Image`) or certain numeric numpy arrays,
 * **CSV:** Tables, such as the pandas DataFrame `.describe()` output,
 * **Markdown:** Markdown strings or pages.
+* **JSON:** JSON strings or objects.
 
 There are three ways how you can add custom visualizations to the dashboard:
 
-* If you are already handling HTML, Markdown, or CSV data in one of your steps, you can have them visualized in just a few lines of code by casting them to a [special class](#visualization-via-special-return-types) inside your step.
+* If you are already handling HTML, Markdown, CSV or JSON data in one of your steps, you can have them visualized in just a few lines of code by casting them to a [special class](#visualization-via-special-return-types) inside your step.
 * If you want to automatically extract visualizations for all artifacts of a certain data type, you can define type-specific visualization logic by [building a custom materializer](#visualization-via-materializers).
 * If you want to create any other custom visualizations, you can [create a custom return type class with corresponding materializer](#how-to-think-about-creating-a-custom-visualization) and build and return this custom return type from one of your steps.
 
 ## Visualization via Special Return Types
 
-If you already have HTML, Markdown, or CSV data available as a string inside your step, you can simply cast them to one of the following types and return them from your step:
+If you already have HTML, Markdown, CSV or JSON data available as a string inside your step, you can simply cast them to one of the following types and return them from your step:
 
 * `zenml.types.HTMLString` for strings in HTML format, e.g., `"<h1>Header</h1>Some text"`,
 * `zenml.types.MarkdownString` for strings in Markdown format, e.g., `"# Header\nSome text"`,
 * `zenml.types.CSVString` for strings in CSV format, e.g., `"a,b,c\n1,2,3"`.
+* `zenml.types.JSONString` for strings in JSON format, e.g., `{"key": "value"}`.
 
 ### Example:
 

diff --git a/src/zenml/enums.py b/src/zenml/enums.py
@@ -60,6 +60,7 @@ class VisualizationType(StrEnum):
     HTML = "html"
     IMAGE = "image"
     MARKDOWN = "markdown"
+    JSON = "json"
 
 
 class ZenMLServiceType(StrEnum):

diff --git a/src/zenml/materializers/built_in_materializer.py b/src/zenml/materializers/built_in_materializer.py
@@ -28,7 +28,7 @@
 )
 
 from zenml.artifact_stores.base_artifact_store import BaseArtifactStore
-from zenml.enums import ArtifactType
+from zenml.enums import ArtifactType, VisualizationType
 from zenml.logger import get_logger
 from zenml.materializers.base_materializer import BaseMaterializer
 from zenml.materializers.materializer_registry import materializer_registry
@@ -414,6 +414,25 @@ def save(self, data: Any) -> None:
             for entry in metadata:
                 self.artifact_store.rmtree(entry["path"])
             raise e
+
+    # save dict type objects to JSON file with JSON visualization type
+    def save_visualizations(
+        self, data: Any
+    ) -> Dict[str, "VisualizationType"]:
+        """Save visualizations for the given data.
+
+        Args:
+            data: The data to save visualizations for.
+
+        Returns:
+            A dictionary of visualization URIs and their types.
+        """
+        # dict/list type objects are always saved as JSON files
+        # doesn't work for non-serializable types as they 
+        # are saved as list of lists in different files
+        if _is_serializable(data):
+            return {self.data_path: VisualizationType.JSON}
+        return {}
 
     def extract_metadata(self, data: Any) -> Dict[str, "MetadataType"]:
         """Extract metadata from the given built-in container object.

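Review note on the hunk above: the body of `_is_serializable` is not part of this diff, so the following is only a minimal sketch of the gating idea, assuming a check based on `json.dumps`. The names `is_json_serializable` and `visualization_for` are hypothetical stand-ins and are not ZenML APIs; the real helper may apply different rules.

```python
import json
from typing import Any, Dict


def is_json_serializable(data: Any) -> bool:
    """Hypothetical stand-in for the `_is_serializable` check used in the diff."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        # TypeError: unsupported object type; ValueError: e.g. circular reference.
        return False
    return True


def visualization_for(data: Any, data_path: str) -> Dict[str, str]:
    """Mirror the shape of the new method: one JSON visualization, or none."""
    if is_json_serializable(data):
        # The saved JSON file doubles as the visualization source.
        return {data_path: "json"}
    return {}


print(visualization_for({"a": [1, 2]}, "data.json"))
print(visualization_for([object()], "data.json"))
```

The point of the guard is that only containers written as a single JSON file can reuse that file for the dashboard; anything else returns an empty mapping and gets no visualization.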
diff --git a/src/zenml/materializers/structured_string_materializer.py b/src/zenml/materializers/structured_string_materializer.py
@@ -19,22 +19,23 @@
 from zenml.enums import ArtifactType, VisualizationType
 from zenml.logger import get_logger
 from zenml.materializers.base_materializer import BaseMaterializer
-from zenml.types import CSVString, HTMLString, MarkdownString
+from zenml.types import CSVString, HTMLString, JSONString, MarkdownString
 
 logger = get_logger(__name__)
 
 
-STRUCTURED_STRINGS = Union[CSVString, HTMLString, MarkdownString]
+STRUCTURED_STRINGS = Union[CSVString, HTMLString, MarkdownString, JSONString]
 
 HTML_FILENAME = "output.html"
 MARKDOWN_FILENAME = "output.md"
 CSV_FILENAME = "output.csv"
+JSON_FILENAME = "output.json"
 
 
 class StructuredStringMaterializer(BaseMaterializer):
     """Materializer for HTML or Markdown strings."""
 
-    ASSOCIATED_TYPES = (CSVString, HTMLString, MarkdownString)
+    ASSOCIATED_TYPES = (CSVString, HTMLString, MarkdownString, JSONString)
     ASSOCIATED_ARTIFACT_TYPE = ArtifactType.DATA_ANALYSIS
 
     def load(self, data_type: Type[STRUCTURED_STRINGS]) -> STRUCTURED_STRINGS:
@@ -94,6 +95,8 @@ def _get_filepath(self, data_type: Type[STRUCTURED_STRINGS]) -> str:
             filename = HTML_FILENAME
         elif issubclass(data_type, MarkdownString):
             filename = MARKDOWN_FILENAME
+        elif issubclass(data_type, JSONString):
+            filename = JSON_FILENAME
         else:
             raise ValueError(
                 f"Data type {data_type} is not supported by this materializer."
@@ -120,6 +123,8 @@ def _get_visualization_type(
             return VisualizationType.HTML
         elif issubclass(data_type, MarkdownString):
             return VisualizationType.MARKDOWN
+        elif issubclass(data_type, JSONString):
+            return VisualizationType.JSON
         else:
             raise ValueError(
                 f"Data type {data_type} is not supported by this materializer."

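Review note on the hunk above: both `_get_filepath` and `_get_visualization_type` grow by one `elif` branch per string type. As an illustration only, the same dispatch can be written as a lookup table. This is a different technique from the `if`/`elif` chain in the commit, and the classes below are local stand-ins rather than imports from `zenml.types`; `get_filename` is a hypothetical name.

```python
class CSVString(str):
    """Local stand-in for zenml.types.CSVString."""


class HTMLString(str):
    """Local stand-in for zenml.types.HTMLString."""


class MarkdownString(str):
    """Local stand-in for zenml.types.MarkdownString."""


class JSONString(str):
    """Local stand-in for zenml.types.JSONString."""


# Filenames copied from the constants shown in the diff.
FILENAMES = {
    CSVString: "output.csv",
    HTMLString: "output.html",
    MarkdownString: "output.md",
    JSONString: "output.json",
}


def get_filename(data_type: type) -> str:
    """Return the output filename for a structured string type."""
    for cls, filename in FILENAMES.items():
        if issubclass(data_type, cls):
            return filename
    raise ValueError(
        f"Data type {data_type} is not supported by this materializer."
    )


print(get_filename(JSONString))
```

With a table like this, supporting a further string type would touch one mapping entry instead of two separate branches, at the cost of diverging from the structure the file already uses.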
diff --git a/src/zenml/types.py b/src/zenml/types.py
@@ -33,3 +33,6 @@ class MarkdownString(str):
 
 class CSVString(str):
     """Special string class to indicate a CSV string."""
+
+class JSONString(str):
+    """Special string class to indicate a JSON string."""
diff --git a/src/zenml/utils/visualization_utils.py b/src/zenml/utils/visualization_utils.py
@@ -13,9 +13,10 @@
 #  permissions and limitations under the License.
 """Utility functions for dashboard visualizations."""
 
+import json
 from typing import TYPE_CHECKING, Optional
 
-from IPython.core.display import HTML, Image, Markdown, display
+from IPython.core.display import HTML, Image, JSON, Markdown, display
 
 from zenml.artifacts.utils import load_artifact_visualization
 from zenml.enums import VisualizationType
@@ -63,6 +64,8 @@ def visualize_artifact(
             assert isinstance(visualization.value, str)
             table = format_csv_visualization_as_html(visualization.value)
             display(HTML(table))
+        elif visualization.type == VisualizationType.JSON:
+            display(JSON(json.loads(visualization.value)))
         else:
             display(visualization.value)
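
Review note on the notebook hunk above: the commit message says "JSON expects a jsonable dict not str", which is why the stored value goes through `json.loads` before being handed to the display class. The sketch below shows only that parsing step, using a local stand-in for `JSONString` so that it runs without ZenML or IPython installed.

```python
import json


class JSONString(str):
    """Local stand-in for the JSONString class added in src/zenml/types.py."""


# The visualization value is stored as text, exactly like any other str.
raw = JSONString('{"key": "value", "items": [1, 2, 3]}')

# Parsing turns the text into plain Python containers (dict, list, ...),
# which is the form a JSON viewer can expand and collapse.
parsed = json.loads(raw)

print(type(raw).__name__, isinstance(raw, str))
print(type(parsed).__name__, parsed["items"])
```

Because `json.loads` raises `json.JSONDecodeError` on malformed input, a `JSONString` holding invalid JSON would fail at display time rather than when the step returns it.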