Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write multi-level datasets #102

Merged
merged 19 commits into from
Oct 4, 2024
Merged

Write multi-level datasets #102

merged 19 commits into from
Oct 4, 2024

Conversation

forman
Copy link
Member

@forman forman commented Sep 16, 2024

Added experimental function zappend.levels.write_levels() that generates datasets using the multi-level dataset format as specified by xcube.

It resembles the store.write_data(cube, "<name>.levels", ...) method provided by the xcube filesystem data stores ("file", "s3", "memory", etc.). The zappend version may be used for potentially very large datasets in terms of dimension sizes or for datasets with very large number of chunks. It is considerably slower than the xcube version (which basically uses xarray.to_zarr() for each resolution level), but should run robustly with stable memory consumption. The function requires xcube package to be installed.

Addresses #19.

Checklist (strike out non-applicable):

  • Changes documented in CHANGES.md
  • Related issue exists and is referred to in the PR description and CHANGES.md
  • Added docstrings and API docs for any new/modified user-facing classes and functions
  • Changes/features documented in docs/*
  • Unit-tests adapted/added for changes/features
  • Test coverage remains or increases (target 100%)

Copy link

codecov bot commented Sep 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.93%. Comparing base (c5270c5) to head (b6af3ec).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #102   +/-   ##
=======================================
  Coverage   99.93%   99.93%           
=======================================
  Files          30       30           
  Lines        1458     1458           
=======================================
  Hits         1457     1457           
  Misses          1        1           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AliceBalfanz
Copy link

I am missing the possibility to :

  • define that I want to add new leveled time stamps from a certain point of a source cube to an existing multi level dataset. This is relevant for NRT services, where the base cube is "growing" and the levels need to be updated accordingly
  • to write a multi level dataset with zappend from a xarray dataset, which does not exist on disk yet. I.e. I create a new dataset in memory and want to write it directly with zappend to a mulitlevel dataset. This is already possible for non-leveld data e.g. zappend(DatasetSlices(source_ds), force_new=False, config=zappend_config)

another case, but currently not urgent:

  • concatinate two existing multi level datasets along e.g. the time dimension

Copy link

@AliceBalfanz AliceBalfanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works nicely! Thank you for the implementation!
I am missing two things in the documentation:

  1. To inform the user, that write levels is not as robust as zappend itself. There will be a corrupt levels if an interruption happens during the process
  2. Make the user aware of the need to remove bnds when leveling.

Do you think it is worth adding the above information? Not sure where the right place would be, maybe the changelog?

@forman
Copy link
Member Author

forman commented Oct 4, 2024

Do you think it is worth adding the above information?

Yes!

@forman
Copy link
Member Author

forman commented Oct 4, 2024

Not sure where the right place would be, maybe the changelog?

The function's docstring.

@forman forman marked this pull request as ready for review October 4, 2024 10:04
@forman forman self-assigned this Oct 4, 2024
@forman forman merged commit a4e3746 into main Oct 4, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants