`open_projectfile_data`: rechunk to have chunks only in time, or rechunk in conversion #1144

Huite · 2024-08-07T10:48:50Z

Reconsidering #845 had me thinking: we're creating a very large number of tasks if the chunks are sized 1 along both time and layer.

This is a direct consequence of the IDF reading, which creates one task per file. However, the conversion to MF6 will generally operate on all layers at once. We can greatly reduce the number of tasks and the size of the task graph by making sure chunks are merged along the layer dimension.
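To put rough numbers on this, here is a minimal sketch with plain `dask.array`. It assumes a hypothetical stack of 4 stress periods by 3 layers, with one chunk per (time, layer) pair, i.e. one chunk per IDF file. The sizes are made up for illustration.

```python
import dask.array

# Hypothetical sizes: 4 stress periods x 3 layers of 100 x 100 cells, with one
# chunk per (time, layer) pair, i.e. one chunk per IDF file.
ntime, nlayer, ny, nx = 4, 3, 100, 100
per_file = dask.array.zeros((ntime, nlayer, ny, nx), chunks=(1, 1, ny, nx))

# Merge all layers into one chunk; time keeps its size-1 chunks.
per_time = per_file.rechunk((1, nlayer, ny, nx))


def tasks_added(arr):
    """Number of tasks one elementwise operation adds on top of arr's graph."""
    result = arr * 2.0
    return len(dict(result.__dask_graph__())) - len(dict(arr.__dask_graph__()))


print(per_file.npartitions)  # 12
print(per_time.npartitions)  # 4
print(tasks_added(per_file), tasks_added(per_time))
```

Every elementwise operation adds roughly one task per block, so each step of the conversion benefits from the smaller block count. The rechunk itself adds some tasks, so the gain comes from the operations that follow it.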

I'm only doubtful whether we should do this directly in `open_projectfile_data`, so that every user would benefit from the different default. There is a downside; say you do something like this:

```python
import imod

prj_data = imod.prj.open_projectfile_data(stuff)

khv = prj_data["khv"]
for layer in khv["layer"]:
    khv.sel(layer=layer).plot()
```

This will now needlessly load all IDFs for every single-layer plot. Of course, a simple `.compute()` addresses it:

```python
khv = prj_data["khv"].compute()
```

But I do not expect most users to come up with this themselves.
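The following sketch illustrates the plotting downside. It uses a hypothetical `read_idf` stand-in that only records how often it is called, so no real IDF files are read. `.values` stands in for `.plot()`, since both force the selected data to be computed.

```python
import dask
import dask.array
import numpy as np
import xarray as xr

reads = []  # records every call to the hypothetical reader


def read_idf(layer):
    """Hypothetical stand-in for reading one IDF file; records the call."""
    reads.append(layer)
    return np.full((10, 10), float(layer))


layers = [1, 2, 3]
blocks = [
    dask.array.from_delayed(dask.delayed(read_idf)(layer), shape=(10, 10), dtype=float)
    for layer in layers
]
# One chunk per layer, like the per-file IDF reading.
khv = xr.DataArray(dask.array.stack(blocks), coords={"layer": layers}, dims=("layer", "y", "x"))
# All layers merged into a single chunk.
khv_merged = khv.chunk({"layer": -1})


def count_reads(action):
    """Run action() and return how many file reads it triggered."""
    reads.clear()
    action()
    return len(reads)


def plot_each_layer(arr):
    for layer in arr["layer"].values:
        arr.sel(layer=layer).values  # stands in for .plot()


print(count_reads(lambda: plot_each_layer(khv)))  # 3
print(count_reads(lambda: plot_each_layer(khv_merged)))  # 9
print(count_reads(lambda: plot_each_layer(khv_merged.compute())))  # 3
```

With one chunk per layer, each selection reads only its own file. With merged layer chunks, every selection reads all three. An up-front `.compute()` brings it back to one read per file.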

Many other operations would probably work better without per-layer chunking, though. Of course, the same trade-off applies to the per-layer chunking in `imod.idf.open`, and I haven't seen any complaints about that.

So what I'd suggest now is to remove the layer chunks in the `from_imod5_data` method (i.e. set the layer chunk size equal to the dimension size):

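```python
# Merge all layers into a single chunk; chunks along the other dimensions
# (e.g. time) are kept as they are. Skipped if da has no dask "layer" chunks.
chunksizes = dict(da.chunksizes)
if "layer" in chunksizes:
    chunksizes["layer"] = (da.sizes["layer"],)
    da = da.chunk(chunksizes)
```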
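Here is the suggestion above, wrapped in a small helper so it can be checked in isolation. `merge_layer_chunks` is a hypothetical name, not an existing iMOD Python function. The array is a dummy stand-in for a DataArray returned by the IDF reader.

```python
import dask.array
import xarray as xr


def merge_layer_chunks(da: xr.DataArray) -> xr.DataArray:
    """Hypothetical helper: a single chunk along "layer", other dims untouched."""
    chunksizes = dict(da.chunksizes)  # empty if da is not dask-backed
    if "layer" in chunksizes:
        chunksizes["layer"] = (da.sizes["layer"],)
        da = da.chunk(chunksizes)
    return da


# Dummy data chunked like the IDF reader output: one chunk per time and layer.
khv = xr.DataArray(
    dask.array.zeros((4, 3, 10, 10), chunks=(1, 1, 10, 10)),
    dims=("time", "layer", "y", "x"),
)
merged = merge_layer_chunks(khv)
print(merged.chunksizes["layer"])  # (3,)
print(merged.chunksizes["time"])  # (1, 1, 1, 1)
```

xarray also accepts `da.chunk({"layer": -1})` as shorthand for "one chunk spanning the whole dimension", which should give the same result here.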


github-project-automation bot added this to iMOD Suite Aug 7, 2024

github-project-automation bot moved this to 📯 New in iMOD Suite Aug 7, 2024

JoerivanEngelen moved this from 📯 New to 🤝 Accepted in iMOD Suite Aug 7, 2024

JoerivanEngelen added this to the v1.0: Backwards Compatibility iMOD5 milestone Aug 14, 2024

JoerivanEngelen added the performance (Execution speed) label Aug 14, 2024
