Draft: Add mode to read consolidated ZARR datasets #2992

mannreis · 2024-08-27T07:52:15Z

This change adds a mode option, mode=consolidated (perhaps it would be best to make it the default when reading and fall back if it fails), that fetches a possibly existing .zmetadata file from the root of the dataset. That file could serve as a unified representation to be used whenever group or variable metadata is needed further down the code path.
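To make the intended read path concrete, here is a minimal sketch in Python (not the netcdf-c implementation). It assumes the Zarr v2 consolidated layout, where .zmetadata is a JSON document with a "zarr_consolidated_format" field and a "metadata" object keyed by the usual metadata paths. The `store` mapping and the `load_metadata` name are hypothetical, for illustration only.

```python
import json

def load_metadata(store):
    """Return (metadata, consolidated_flag) for a Zarr v2 dataset.

    `store` is a hypothetical mapping from object key to raw bytes;
    a real backend would issue file reads or HTTP GETs instead.
    """
    raw = store.get(".zmetadata")
    if raw is not None:
        doc = json.loads(raw)
        # Consolidated metadata nests every per-object document under "metadata".
        if doc.get("zarr_consolidated_format") == 1:
            return dict(doc["metadata"]), True
    # Fallback: read each metadata object individually. Enumerating the keys
    # here stands in for the listing request that consolidation avoids.
    suffixes = (".zgroup", ".zarray", ".zattrs")
    found = {k: json.loads(v) for k, v in store.items() if k.endswith(suffixes)}
    return found, False
```

The returned flag lets the caller know whether the fallback was taken, which is where the robustness item in the list below would come in.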

This is a WIP motivated by #2987 and lacks (at least):

Unit testing
Functional testing
Support Zarr V3
Robustness when consolidated metadata is not available

DennisHeimbigner · 2024-08-27T20:03:46Z

The way I planned to do the consolidated metadata (aside: would like a shorter term than "consolidated")
for netcdf is to create another dispatch layer for accessing various metadata pieces.
So for v2, this would wrap read/write of .zgroup, .zarray, and .zattrs.
For v3, this would wrap access to zarr.json.
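As a rough illustration of what such a layer has to hide, the sketch below maps a group or array path to the metadata object keys for each format version. It is written in Python for brevity and is not taken from netcdf-c; the function name `metadata_keys` and its parameters are hypothetical.

```python
def metadata_keys(path, kind, zarr_format):
    """Return the object keys holding metadata for the node at `path`.

    kind is "group" or "array"; zarr_format is 2 or 3.
    """
    if kind not in ("group", "array"):
        raise ValueError(f"unknown node kind: {kind!r}")
    prefix = path.strip("/")
    prefix = prefix + "/" if prefix else ""
    if zarr_format == 2:
        # v2 splits node metadata and attributes into separate objects.
        main = ".zgroup" if kind == "group" else ".zarray"
        return [prefix + main, prefix + ".zattrs"]
    if zarr_format == 3:
        # v3 keeps node metadata and attributes together in zarr.json.
        return [prefix + "zarr.json"]
    raise ValueError(f"unsupported zarr_format: {zarr_format!r}")
```

For example, `metadata_keys("a/b", "array", 2)` gives the two v2 keys under `a/b/`, while the same call with format 3 gives a single `a/b/zarr.json`.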

florianziemen · 2024-10-22T17:25:23Z

How about using csd as a shorthand for consolidated (maybe even make both variants legal options)?

Personally, I would prefer to make consolidated the default, and fall back to unconsolidated, if no .zmetadata file is available (or the user explicitly asks for unconsolidated), but I would also understand if you prefer not to change existing behavior of libnetcdf...

DennisHeimbigner · 2024-10-22T17:30:13Z

It occurs to me to ask: why is the consolidated metadata in a separate .zmetadata rather than in the root group's zarr.json?

florianziemen · 2024-10-22T17:34:49Z

No idea why, but it is handled that way in zarr python for zarr2 ...

See your question here: zarr-developers/zarr-python#720

mannreis · 2024-10-29T14:47:03Z

> The way I planned to do the consolidated metadata (aside: would like a shorter term than "consolidated") for netcdf is to create another dispatch layer for accessing various metadata pieces. So for v2, this would wrap read/write of .zgroup, .zarray, and .zattrs. For v3, this would wrap access to zarr.json.

You mean adding a block of function pointers to [NC_Dispatch](https://github.com/Unidata/netcdf-c/blob/main/include/netcdf_dispatch.h.in#L34) that would handle the metadata(-file) operations for zarr? I was picturing something internal to the NCZ_* layer, but I don't have a really good overview of the code design.

DennisHeimbigner · 2024-10-29T15:34:37Z

> You mean adding a block of function pointers to NC_Dispatch

No, I was thinking of an internal dispatch table. When I added support
for Zarr version 3, I created a dispatch table discriminated on the version.
I then constructed some code to look at the URL and the Zarr dataset
to infer which version to use. I would do the same for the metadata
dispatcher but discriminating on consolidated or not.
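A minimal sketch of that idea, in Python rather than C and not taken from any netcdf-c branch: a small table of functions per metadata mode, plus an inference step that picks the table by inspecting the dataset. All names here (`MetadataDispatch`, `infer_dispatch`, the dict-like `store`) are hypothetical.

```python
import json
from typing import Callable, NamedTuple, Optional

class MetadataDispatch(NamedTuple):
    """One row of the internal table: how to open and read metadata."""
    name: str
    open_state: Callable[[dict], object]
    read: Callable[[object, str], Optional[dict]]

def _direct_open(store):
    return store  # no shared state; every read hits the store

def _direct_read(state, key):
    raw = state.get(key)
    return None if raw is None else json.loads(raw)

def _consolidated_open(store):
    # One fetch up front; later reads are served from this parsed document.
    return json.loads(store[".zmetadata"])["metadata"]

def _consolidated_read(state, key):
    return state.get(key)

DIRECT = MetadataDispatch("direct", _direct_open, _direct_read)
CONSOLIDATED = MetadataDispatch("consolidated", _consolidated_open, _consolidated_read)

def infer_dispatch(store):
    """Pick the table by checking whether .zmetadata is present."""
    return CONSOLIDATED if ".zmetadata" in store else DIRECT
```

Callers would hold on to the selected table and its state, so code further down never needs to know which mode is active.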

mannreis · 2024-11-11T12:35:37Z

I finally understood that you're referring to the implementation in the branches of your fork! I'm taking a look at zarrv3b.tmp. Is this the branch you plan to merge?
Just to clarify, what we'd like is that, when opening a consolidated dataset (without authentication), one could point to a "vanilla HTTP" server. This means that HTTP-S3-specific requests like method=list-bucket-v2 would be avoided (when the data is consolidated) or delayed (when it is not). Is this a sensible requirement?
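To illustrate why no listing should be needed in the consolidated case: for Zarr v2, the chunk object keys of an array can be computed from its .zarray entry alone. The sketch below is in Python, assumes an array with at least one dimension, and uses a hypothetical `chunk_keys` helper; it is not code from this PR.

```python
import itertools
import math

def chunk_keys(array_path, zarray):
    """Compute candidate chunk keys from a parsed Zarr v2 .zarray document.

    Assumes at least one dimension. Some keys may not exist on the server
    (chunks that were never written), so a GET may still return 404.
    """
    shape = zarray["shape"]
    chunks = zarray["chunks"]
    # v2 defaults to "." between chunk indices unless the array says otherwise.
    sep = zarray.get("dimension_separator", ".")
    counts = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    prefix = array_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    grid = itertools.product(*(range(n) for n in counts))
    return [prefix + sep.join(str(i) for i in idx) for idx in grid]
```

With the array names taken from the consolidated metadata and the chunk keys derived this way, every request is a plain GET on a known path.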

mannreis requested review from WardF and DennisHeimbigner as code owners August 27, 2024 07:52

mannreis mentioned this pull request Aug 27, 2024

Consolidated Zarr support could improve S3 data loading #2987

Open

Add mode to read consolidated ZARR datasets

1d79947

mannreis force-pushed the consolidated-zarr branch from 964be46 to 1d79947 Compare October 7, 2024 15:01

mannreis mentioned this pull request Nov 18, 2024

Zarrv3b csl md DennisHeimbigner/netcdf-c#1

Open
