S3 byte address for chunks; was API access to NetCDF/HDF chunk index #2754
-
HDF5 no doubt provides this, but I don't know how. If we could identify how, we could consider adding it to the netCDF API. An alternative might be to add a netCDF API call that returns the hid_t of the open file (i.e. the HDF5 file ID). Then the HDF5 API could be used directly, without trouble...
-
Thanks Ed. Sounds like we need to explore the HDF options for doing this, assuming we have the
-
There is an HDF5 API for accessing chunks directly called H5Dread_chunk.
-
Ah, that's helpful, because we can go chasing its use in the HDF code. The issue for us is not so much reading the chunk, which we can do; it is finding the chunk address from the chunk index. If you're using Zarr, you use their multi-indexer (which would likely exploit a kerchunk index for non-Zarr-native data). @valeriupredoi can you please chase this up?
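For the chunk address question, here is a minimal sketch, assuming h5py built against HDF5 1.10.5 or later. In that setup the low-level dataset ID exposes get_chunk_info_by_coord, which as far as I know wraps the HDF5 call H5Dget_chunk_info_by_coord. The function name chunk_byte_range and its arguments are made up for illustration; netCDF-4 files are HDF5 files, so h5py can open them directly. Treat this as one possible route, not the way netCDF does it internally.

```python
def chunk_byte_range(path, variable, chunk_offset):
    """Return (byte_offset, size) of the stored chunk that starts at chunk_offset.

    chunk_offset is given in element coordinates, e.g. (2, 2) for the chunk
    whose first element is at row 2, column 2.
    """
    # Imported here so this module still loads where h5py is not installed.
    import h5py

    with h5py.File(path, "r") as f:
        dset = f[variable]
        # Low-level call; needs HDF5 >= 1.10.5 underneath h5py.
        info = dset.id.get_chunk_info_by_coord(tuple(chunk_offset))
        # byte_offset is the address of the chunk in the file,
        # size is its stored (possibly compressed) length in bytes.
        return info.byte_offset, info.size
```

Nothing is read into memory here apart from the HDF5 metadata, so the returned byte range could in principle be handed to the storage layer as is.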
-
After reading a bit of the netCDF C code (*) I realise I asked the wrong question! It is probably worth putting it differently. If we assume we know how to get to your
(*) for "reading C code" imagine me in some foreign country reading a menu, ordering something, and hoping I get something vaguely edible and vaguely related to what I thought I was ordering.
-
Hi Folks
We are in the process of building computational storage support for NetCDF data (repo). To do this, we need access to the NetCDF chunk index [1]. Currently we do this via kerchunk and the Zarr indexer, but without using Zarr itself: we do not want either Zarr or NetCDF to load the chunk into memory, as that would defeat the point of computational storage. The JSON format of the kerchunk index is a huge overhead, and the netCDF machinery must effectively be doing the same lookup internally.
Is there some underlying library interface we could get to that does this, or can someone point us to the relevant code? (I wouldn't know where to start in the C library ... sadly).
Cheers
Bryan
[1] We are not doing classic mode, so I believe this is a B-tree index which must exist somewhere. Once we have found it, we want to be able to go from slice notation to the elements we need, and then use the index to tell the storage where the relevant chunks lie.
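To make the two steps in [1] concrete, here is a rough sketch in plain Python, under stated assumptions: slices have unit stride, and the index is a kerchunk-style mapping from keys like "var/i.j" to [url, offset, length]. The function names, the variable name "tas" and the byte values in the usage lines are hypothetical, chosen only for illustration.

```python
from itertools import product


def chunks_for_selection(shape, chunks, selection):
    """Return the chunk grid indices touched by a tuple of unit-stride slices."""
    per_dim = []
    for size, csize, sl in zip(shape, chunks, selection):
        start, stop, step = sl.indices(size)  # clamp the slice to the axis
        if step != 1:
            raise ValueError("this sketch only handles unit-stride slices")
        if stop <= start:
            return []  # empty selection touches no chunks
        # First and last chunk along this axis that overlap [start, stop)
        per_dim.append(range(start // csize, (stop - 1) // csize + 1))
    return list(product(*per_dim))


def byte_ranges(index, varname, chunk_ids):
    """Look up (offset, length) for each chunk in a kerchunk-style mapping."""
    out = []
    for cid in chunk_ids:
        key = varname + "/" + ".".join(str(i) for i in cid)
        _url, offset, length = index[key]
        out.append((offset, length))
    return out


# Usage with made-up numbers: a 10x10 variable stored in 4x4 chunks.
ids = chunks_for_selection((10, 10), (4, 4), (slice(3, 9), slice(0, 4)))
index = {  # hypothetical index entries
    "tas/0.0": ["file.nc", 1000, 64],
    "tas/1.0": ["file.nc", 1064, 64],
    "tas/2.0": ["file.nc", 1128, 32],
}
ranges = byte_ranges(index, "tas", ids)
```

Only the first step is independent of the file format; the second depends on whatever ends up providing the index, whether that is kerchunk or the HDF5 library itself.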