open_projectfile_data: rechunk to have chunks only in time, or rechunk in conversion #1144

Open
Huite opened this issue Aug 7, 2024 · 0 comments
Labels
performance Execution speed

Huite commented Aug 7, 2024

Reconsidering #845 had me thinking: we're creating a very large number of tasks if the chunks are sized 1 along both time and layer.

This is a direct consequence of the IDF reading, which creates one task for each file. However, the conversion to MF6 will generally operate on all the layers at once. We can greatly reduce the number of tasks and the size of the task graph by making sure chunks are merged in the layer dimension.
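
As a rough illustration of the scale involved, the sketch below counts chunks for a made-up grid. The sizes (120 time steps, 15 layers, 500 x 400 cells) and the helper names `count_chunks` and `merge_layer_chunks` are hypothetical. The mapping only mimics the shape of xarray's `DataArray.chunksizes` (dimension name to tuple of chunk lengths), and it counts chunks, not the exact number of dask tasks.

```python
from math import prod


def count_chunks(chunksizes):
    """Number of chunks: the product of the chunk counts per dimension."""
    return prod(len(sizes) for sizes in chunksizes.values())


def merge_layer_chunks(chunksizes, n_layer):
    """Return a copy with a single chunk spanning the whole layer dimension."""
    merged = dict(chunksizes)
    if "layer" in merged:
        merged["layer"] = (n_layer,)
    return merged


# Hypothetical sizes, chosen only for illustration.
n_time, n_layer = 120, 15

# One chunk per file: chunk size 1 along both time and layer.
per_file = {
    "time": (1,) * n_time,
    "layer": (1,) * n_layer,
    "y": (500,),
    "x": (400,),
}
merged = merge_layer_chunks(per_file, n_layer)

print(count_chunks(per_file))  # 1800
print(count_chunks(merged))    # 120
```

Under these assumptions, anything operating on all layers at once would work with 120 chunks instead of 1800. The individual files still have to be read either way.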

I'm only doubtful whether we should do this directly in open_projectfile_data, so that every user benefits from the different default. There is a downside. Say you do something like this:

prj_data = imod.prj.open_projectfile_data(stuff)

khv = prj_data["khv"]
for layer in khv["layer"]:
    khv.sel(layer=layer).plot()

With the layer chunks merged, every iteration now loads the IDFs of all layers just to plot a single one. Of course, a simple .compute() up front addresses it:

khv = prj_data["khv"].compute()

But I do not expect most users to come up with this themselves.

Many other operations would probably work better without layer chunking, though. Of course, the same is true for layer chunking in imod.idf.open, and I haven't seen any complaints about that.

So what I'd suggest now is to remove the layer chunks in the from_imod5_data method (i.e. set layer chunk size equal to dimension size):

chunksizes = dict(da.chunksizes)
if "layer" in chunksizes:
    # A single chunk spanning the full layer dimension.
    chunksizes["layer"] = (da.sizes["layer"],)
    da = da.chunk(chunksizes)
@github-project-automation github-project-automation bot moved this to 📯 New in iMOD Suite Aug 7, 2024
@JoerivanEngelen JoerivanEngelen moved this from 📯 New to 🤝 Accepted in iMOD Suite Aug 7, 2024
@JoerivanEngelen JoerivanEngelen added the performance Execution speed label Aug 14, 2024