Releases: AllenCellModeling/datastep

v0.1.9

17 Nov 02:20

Drops the Python 3.6 build in favor of simpler dependencies, particularly to avoid the urllib / botocore conflicts that were arising with quilt.

v0.1.3

05 Feb 06:04

This release updates the Step class __init__ process and adds methods for switching the filepaths in the step manifest between relative and absolute paths.

  • More keyword arguments were added, to set configurations like step.filepath_columns and step.metadata_columns.
  • Options to set the project name and the step name to be something other than what's inferred by the directory tree were added for future use but are currently nonfunctional.
  • The new init defaults are:
def __init__(
    self,
    clean_before_run=True,
    filepath_columns=["filepath"],
    metadata_columns=[],
    step_name=None,
    package_name=None,
    direct_upstream_tasks: List["Step"] = [],
    config: Optional[Union[str, Path, Dict[str, str]]] = None,
):
  • There are now a lot of keyword args, but you only have to set them in your step classes if you want them to be different from the default values.
  • For example, in the Raw step below,
class Raw(Step):
    def __init__(self, filepath_columns=["col1", "col2"]):
        super().__init__(filepath_columns=filepath_columns)

I need two filepath columns, so I set that in the init, but don't set anything else, since I'm fine with the defaults. I also pass filepath_columns to super().__init__, so that my Raw class gets to use the initialization process that's already defined in Step.
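The override pattern above can be tried without installing datastep. The sketch below is a minimal illustration only: `StepSketch` is a hypothetical stand-in that stores just three of the keyword arguments listed above, and it uses `None` sentinels instead of list defaults so that instances never share a list. It is not the library's Step class.

```python
from typing import List, Optional


class StepSketch:
    """Hypothetical stand-in for datastep's Step; it only stores keyword arguments."""

    def __init__(
        self,
        clean_before_run: bool = True,
        filepath_columns: Optional[List[str]] = None,
        metadata_columns: Optional[List[str]] = None,
    ):
        self.clean_before_run = clean_before_run
        # Fall back to the defaults named in the release notes when nothing is passed.
        self.filepath_columns = (
            list(filepath_columns) if filepath_columns is not None else ["filepath"]
        )
        self.metadata_columns = (
            list(metadata_columns) if metadata_columns is not None else []
        )


class Raw(StepSketch):
    def __init__(self, filepath_columns=("col1", "col2")):
        # Only the argument that differs from the default is forwarded.
        super().__init__(filepath_columns=filepath_columns)


raw = Raw()
print(raw.filepath_columns)   # the overridden value
print(raw.metadata_columns)   # untouched default
print(raw.clean_before_run)   # untouched default
```

Only `filepath_columns` differs from the parent's defaults here; everything else is inherited unchanged, which is the point of forwarding a single keyword to `super().__init__`.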

  • The methods for switching between relative and absolute paths are step.manifest_filepaths_rel2abs and step.manifest_filepaths_abs2rel. For example, if I had a QC step after Raw that needed data from Raw, I could use
class QC(Step):
    def __init__(self):
        super().__init__()

    def run(self):
        raw = Raw()
        # convert the manifest's relative filepaths to absolute ones
        raw.manifest_filepaths_rel2abs()
        # iterate through raw.manifest and do some qc

to move from the relative paths in raw, which are needed for uploading to quilt, to absolute paths, which are easier for local file access.
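To make the relative/absolute round trip concrete, here is a rough sketch of the idea, not datastep's implementation. It assumes a manifest represented as a plain list of dictionaries and a known base directory; the function names `rel2abs` and `abs2rel`, the `base_dir` parameter, and the sample paths are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List

Manifest = List[Dict[str, str]]


def rel2abs(manifest: Manifest, filepath_columns: List[str], base_dir: str) -> Manifest:
    """Return a copy of the manifest with filepath columns joined onto base_dir."""
    base = PurePosixPath(base_dir)
    return [
        {k: str(base / v) if k in filepath_columns else v for k, v in row.items()}
        for row in manifest
    ]


def abs2rel(manifest: Manifest, filepath_columns: List[str], base_dir: str) -> Manifest:
    """Return a copy of the manifest with filepath columns made relative to base_dir."""
    base = PurePosixPath(base_dir)
    return [
        {
            k: str(PurePosixPath(v).relative_to(base)) if k in filepath_columns else v
            for k, v in row.items()
        }
        for row in manifest
    ]


# Hypothetical manifest with two filepath columns and one metadata column.
manifest = [{"col1": "raw/a.tiff", "col2": "raw/a.csv", "label": "a"}]
absolute = rel2abs(manifest, ["col1", "col2"], "/data/project")
roundtrip = abs2rel(absolute, ["col1", "col2"], "/data/project")
print(absolute[0]["col1"])
print(roundtrip == manifest)
```

Only the columns named as filepath columns are rewritten; metadata columns pass through untouched, so converting to absolute paths and back returns the original manifest.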