Reduce memory consumption of very high-resolution merges #408

nvictus · 2024-03-20T23:24:15Z

This PR fixes excessive memory consumption when merging a large number of very high-resolution coolers. It should improve the performance not only of cooler merge but also of cooler cload pairs and cooler load, which run the same merge step.

When pre-calculating the offsets used for the merge execution plan, we were loading (and concatenating) all bin1_offset indexes into memory. This isn't an issue for typical coolers, but the combined indexes can become prohibitively large when there are many inputs at high resolution: a single index vector can be ~2GB at 10bp resolution for the human genome.
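As a rough back-of-the-envelope check of that figure, the sketch below estimates the size of one bin1_offset vector and of the full set held in memory at once. The genome length (about 3.1 Gbp), the 8-byte offset width, and the input count of 200 are assumptions chosen for illustration, and the estimate ignores per-chromosome rounding of bin counts.

```python
# Illustrative estimate only; the constants below are assumptions, not values from the PR.
GENOME_BP = 3_100_000_000   # assumed approximate human genome length in bp
BIN_SIZE = 10               # 10bp resolution, as in the description
BYTES_PER_OFFSET = 8        # assumed 64-bit integer offsets


def index_bytes(genome_bp, bin_size, itemsize=BYTES_PER_OFFSET):
    """Size in bytes of one offset vector with n_bins + 1 entries."""
    n_bins = -(-genome_bp // bin_size)  # ceiling division
    return (n_bins + 1) * itemsize


def eager_bytes(n_inputs, genome_bp, bin_size):
    """Size in bytes if every input's index is held in memory at once."""
    return n_inputs * index_bytes(genome_bp, bin_size)


one_index = index_bytes(GENOME_BP, BIN_SIZE)
all_indexes = eager_bytes(200, GENOME_BP, BIN_SIZE)  # 200 inputs is a hypothetical count

print(f"one index:   {one_index / 1e9:.2f} GB")
print(f"200 indexes: {all_indexes / 1e9:.2f} GB")
```

Under these assumptions a single index is a few gigabytes, so holding hundreds of them simultaneously is out of reach for most machines, while holding one at a time is not.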

Now we use lazy HDF5 datasets and load each bin1_offset index incrementally during merge execution planning, so only one index needs to be in memory at a time. This drastically reduces memory use for merges involving, e.g., hundreds of datasets.
We also expose the merge buffer argument (mergebuf) to cooler cload pairs and cooler load, and the max-merge option to cooler load, to give the user more control over peak memory consumption during the actual merge epochs.
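The idea can be sketched as follows. This is a minimal illustration, not the code in this PR: it uses plain binary files and the standard-library `array` module as a stand-in for lazy HDF5 datasets, and the helper names (`write_index`, `read_index`, `combined_offsets`, `plan_epochs`) and the greedy splitting rule are hypothetical. Each per-input offset vector is read from disk one at a time and summed into a single cumulative vector, which is then split into row ranges ("epochs") that hold at most `bufsize` records across all inputs.

```python
import os
import tempfile
from array import array


def write_index(path, offsets):
    """Store one per-input offset vector on disk (stand-in for an HDF5 dataset)."""
    with open(path, "wb") as f:
        array("q", offsets).tofile(f)


def read_index(path, n):
    """Read n 64-bit offsets back from disk."""
    a = array("q")
    with open(path, "rb") as f:
        a.fromfile(f, n)
    return a


def combined_offsets(paths, n_bins):
    """Sum the offset vectors one input at a time.

    Peak memory is two vectors (the running total and the current input),
    regardless of how many inputs there are.
    """
    total = array("q", [0]) * (n_bins + 1)
    for path in paths:
        idx = read_index(path, n_bins + 1)
        for i, v in enumerate(idx):
            total[i] += v
    return total


def plan_epochs(total, bufsize):
    """Greedily split bins into [lo, hi) ranges with at most bufsize records.

    A single bin holding more than bufsize records still gets its own epoch.
    """
    epochs = []
    n_bins = len(total) - 1
    lo = 0
    while lo < n_bins:
        hi = lo + 1
        while hi < n_bins and total[hi + 1] - total[lo] <= bufsize:
            hi += 1
        epochs.append((lo, hi))
        lo = hi
    return epochs


# Toy data: two inputs over 4 bins; offsets[i] is the record count before bin i.
with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, "a.idx"), os.path.join(tmp, "b.idx")]
    write_index(paths[0], [0, 2, 2, 5, 6])
    write_index(paths[1], [0, 1, 4, 4, 7])
    total = combined_offsets(paths, n_bins=4)

epochs = plan_epochs(total, bufsize=6)
print(list(total), epochs)
```

In this toy setup a smaller `bufsize` produces more, smaller epochs, which is the trade-off the merge buffer setting gives the user: lower peak memory at the cost of more passes over the inputs.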

nvictus added 4 commits March 19, 2024 22:03

Expose mergebuf parameter to load and cload

c25403c

Build cumulative index incrementally from ondisk arrays

86e4ca9

Restore mergebuf to default to chunksize

f42fe38

Rename maxbuf to bufsize

68a90b0

nvictus requested review from Phlya and thomas-reimonn March 20, 2024 23:24

nvictus added 3 commits March 20, 2024 19:27

Move logging statement

9c9f864

Add max-merge option to cooler load

ded129e

Format string

4a64e33

nvictus merged commit a1b6cb0 into open2c:master Mar 23, 2024
9 checks passed

nvictus mentioned this pull request May 21, 2024

cooler cload pairs memory usage error #412

Closed
