I created an example repo to explore how VirtualiZarr and xarray-beam can be used together to create an ARCO (analysis-ready, cloud-optimized) dataset from a collection of NetCDF files.
In this example, I used a pre-generated VirtualiZarr reference as the input dataset. This was then fed into an Apache Beam pipeline that uses xarray-beam PTransforms to open the virtual dataset, rechunk it, and materialize it as a Zarr store.
Rough, optimized timings on Dataflow:
- 1.3 TB dataset
- 13 variables (merge op)
- 60 years of daily data
- 33 minutes on Dataflow using ARM instances
ToDo:
cc @jbusecke @SammyAgrawal