Update worker images to optimize IO performance using local data #675

Open
wants to merge 21 commits into base: master
Commits on Oct 30, 2024

  1. 997ca3b
  2. Setting needs_data_local when fulfilled_by is set.

    Updating usages of DataRequirement so that whenever the fulfilled_by
    attribute of an instance is set (at creation time or otherwise), the new
    needs_data_local attribute is also set.
    robertbartel committed Oct 30, 2024
    4e3e03f
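
One way to keep the two attributes in step is to set them through a single method. The sketch below is only an illustration: `RequirementSketch` and its `fulfill` method are hypothetical stand-ins, not DMOD's actual `DataRequirement` class, and the types chosen for both attributes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequirementSketch:
    """Hypothetical stand-in for a data requirement; not DMOD's actual class."""

    category: str
    fulfilled_by: Optional[str] = None
    needs_data_local: Optional[bool] = None

    def fulfill(self, dataset_name: str, needs_data_local: bool) -> None:
        # Set both attributes in one step so neither is left stale.
        self.fulfilled_by = dataset_name
        self.needs_data_local = needs_data_local
```

Routing every assignment through one method means a caller cannot set `fulfilled_by` and forget `needs_data_local`.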
  3. Update, document ngen image custom dir structure.

    Add two new directories, /dmod/local_volumes and /dmod/cluster_volumes,
    to the ngen image directory structure, meant as mount points for the
    different types of volumes containing data needed by the job. Also add a
    README with some initial documentation on this directory structure.
    robertbartel committed Oct 30, 2024
    3625385
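
As a rough illustration of the layout described in this commit, the following sketch creates the two mount-point parent directories under a given root. The directory names come from the commit message; the helper `ensure_volume_dirs` is hypothetical, and in the actual image these directories are presumably created at build time rather than from Python.

```python
from pathlib import Path
from typing import List

# Directory names taken from the commit message.
VOLUME_DIRS = ("local_volumes", "cluster_volumes")


def ensure_volume_dirs(dmod_root: Path) -> List[Path]:
    """Create the volume mount-point parents under ``dmod_root`` if missing."""
    created = []
    for name in VOLUME_DIRS:
        path = dmod_root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_volume_dirs(Path("/dmod"))` inside a container would yield `/dmod/local_volumes` and `/dmod/cluster_volumes`.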
  4. Update Launcher for using local data volumes.

    Updating Launcher to prepare services with local volume mounts when some
    data requirements must be fulfilled by local data on the physical node.
    Also updating the other relevant args for starting worker services so
    that one worker on each node makes sure data gets prepared in local
    volumes as needed as part of job startup.
    robertbartel committed Oct 30, 2024
    152e8d3
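
The "one worker on each node" idea can be sketched as a small selection function. This is not the Launcher's code; `pick_data_preparers` and its input mapping are hypothetical, and the rule for choosing which worker prepares the data (first by sorted name) is an assumption made for determinism.

```python
from typing import Dict


def pick_data_preparers(worker_nodes: Dict[str, str]) -> Dict[str, bool]:
    """Map each worker name to True if it should prepare local data on its node.

    ``worker_nodes`` maps a worker (service) name to the node hosting it.
    The first worker per node, in sorted name order, is selected.
    """
    seen_nodes = set()
    preparers = {}
    for worker in sorted(worker_nodes):
        node = worker_nodes[worker]
        # Only the first worker encountered on a node does the preparation.
        preparers[worker] = node not in seen_nodes
        seen_nodes.add(node)
    return preparers
```

For example, with workers `w0` and `w1` on `nodeA` and `w2` on `nodeB`, only `w0` and `w2` would be flagged to prepare data.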
  5. Update ngen-related images to have mc client.

    Making MinIO CLI client available within ngen worker image and
    derivatives (e.g., calibration worker), though without a pre-configured
    alias for connecting to the object store service.
    robertbartel committed Oct 30, 2024
    1aabb62
  6. Update worker Python functions to make data local.

    Adding functionality to py_funcs.py to support making DMOD dataset data
    local (not just locally accessible from remote storage).
    robertbartel committed Oct 30, 2024
    8aff1c2
  7. Update worker entrypoints for local data.

    Updating main entrypoint scripts for ngen and calibration worker images
    for local data handling.
    robertbartel committed Oct 30, 2024
    13b61a5
  8. Fix fast dev update script GUI handling.

    Fixing script so that GUI services do not get stopped and updated unless
    that is actually asked for with the available CLI option.
    robertbartel committed Oct 30, 2024
    e163b61
  9. Move call to make_data_local in entrypoints.

    Moving call to this Python function so that it happens before sanity
    checks (at the entrypoint level) ensuring dataset directories exist, as
    they won't exist until the data is made local.
    robertbartel committed Oct 30, 2024
    1f4fa7b
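
The ordering constraint in this commit can be shown with a small sketch: run the data-localizing step first, then check that the dataset directories exist. The real entrypoints are scripts that call `make_data_local`; the function `prepare_then_check` below and its callable parameter are hypothetical, since the actual signature of `make_data_local` is not shown here.

```python
from pathlib import Path
from typing import Callable, Iterable


def prepare_then_check(make_local: Callable[[], None],
                       dataset_dirs: Iterable[Path]) -> None:
    """Run the data-localizing step, then verify the dataset directories."""
    # Must run before the checks: this step is what creates the directories.
    make_local()
    missing = [str(d) for d in dataset_dirs if not d.is_dir()]
    if missing:
        raise FileNotFoundError(
            f"Dataset directories missing after localization: {missing}"
        )
```

Running the checks first would fail on a fresh node, because nothing has populated the local volume yet.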
  10. Fix issues w/ use of separate cluster/local data.

    - Order minio client args properly (config dir must come first)
    - Clean up output handling during minio client subprocess
    - Correct a few logical mistakes with how conditionals should behave
    - Fix issue with path object creation when copying from cluster volume
    - Add some helpful logging messages
    - Make sure we actually create symlinks
    robertbartel committed Oct 30, 2024
    72067da
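
The first fix above concerns argument order for the MinIO client. A minimal sketch of building such a command follows, assuming the `mc` global option `--config-dir` and the `cp --recursive` subcommand. The helper `build_mc_copy_args` and the example alias and paths are hypothetical, and the command is only constructed here, not executed.

```python
from typing import List


def build_mc_copy_args(config_dir: str, source: str, dest: str) -> List[str]:
    """Build an ``mc cp`` argument list with the config-dir option first."""
    return [
        "mc",
        "--config-dir", config_dir,  # global option, placed before the subcommand
        "cp",
        "--recursive",
        source,
        dest,
    ]
```

A caller could pass the resulting list to `subprocess.run(args, capture_output=True, text=True)` so that the client's output is collected and logged rather than interleaved with the worker's own output.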
  11. Fix more issues with py_funcs functions.

    - Fixing handling of symlink for output dataset so it points to cluster
      volume as needed (i.e., so output can actually make it out of the
      worker)
    - Fixing some issues with keyword args coming in from CLI that certain
      functions weren't set up to disregard properly
    - Adding a bit more helpful logging in places
    robertbartel committed Oct 30, 2024
    2919533
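
The output symlink handling in the first item can be sketched as follows, assuming a POSIX filesystem. The helper `link_output_dir` and the paths used with it are hypothetical and are not taken from py_funcs.py.

```python
from pathlib import Path


def link_output_dir(link_path: Path, cluster_dir: Path) -> Path:
    """Make ``link_path`` a symlink to ``cluster_dir``, replacing a stale link."""
    cluster_dir.mkdir(parents=True, exist_ok=True)
    if link_path.is_symlink():
        # Remove an old link so it can be pointed at the current target.
        link_path.unlink()
    link_path.parent.mkdir(parents=True, exist_ok=True)
    # Raises FileExistsError if link_path is a real file or directory.
    link_path.symlink_to(cluster_dir, target_is_directory=True)
    return link_path
```

Anything the job writes through the link then lands in the cluster volume directory, which is how output can leave the worker.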
  12. Update worker entrypoints for permissions issues.

    Adding logic and reordering certain steps so that, given that job
    outputs are initially written locally, the process that then moves the
    results to backing dataset storage works properly and does not run into
    permissions issues.
    robertbartel committed Oct 30, 2024
    088b35c
  13. d71abfc
  14. 15486d8
  15. Update dataservice internal deps to latest.

    Update dependencies on core and scheduler to 0.21.0 and 0.14.0
    respectively.
    robertbartel committed Oct 30, 2024
    42a09fa
  16. 2164ca0
  17. Update schedulerservice internal deps to latest.

    Update dependencies on core and scheduler to 0.21.0 and 0.14.0
    respectively.
    robertbartel committed Oct 30, 2024
    0c4ef3c
  18. Update requestsservice to latest core dep.

    Updating dependency on core to 0.21.0.
    robertbartel committed Oct 30, 2024
    b59cb7e
  19. Update partitionerservice internal deps to latest.

    Updating dependencies on core and scheduler to 0.21.0 and 0.14.0
    respectively.
    robertbartel committed Oct 30, 2024
    84710c4
  20. 46e442e
  21. Account for platform in image mc client download.

    Account for building in environments other than Linux x86_64 when
    downloading the MinIO client for the ngen worker images.
    robertbartel committed Oct 30, 2024
    25dbf8e
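
To illustrate the platform handling in the last commit, the sketch below picks a MinIO client download URL from the machine architecture. The helper `mc_download_url` is hypothetical, the URL layout is assumed from MinIO's published download locations, and the actual image build most likely does this in the Dockerfile rather than in Python.

```python
import platform

# Base URL of MinIO's published client builds (assumed layout).
MC_BASE_URL = "https://dl.min.io/client/mc/release"

# Map common platform.machine() values to the names used in the URL.
_ARCH_NAMES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def mc_download_url(machine: str = "", system: str = "linux") -> str:
    """Return the mc download URL for ``machine`` (default: this host)."""
    name = (machine or platform.machine()).lower()
    arch = _ARCH_NAMES.get(name)
    if arch is None:
        raise ValueError(f"Unsupported architecture: {name!r}")
    return f"{MC_BASE_URL}/{system}-{arch}/mc"
```

The `system` default stays `linux` because the client runs inside a Linux container even when the image is built on another host operating system.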