Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add saving of ragged lazy vectors #193

Merged
merged 6 commits into from
Dec 12, 2023

Conversation

CSSFrancis
Copy link
Member

Description of the change

Currently Ragged Lazy arrays aren't saved properly because the underlying dtype isn't properly accessed. Hdf5 and zarr only support 1 layer of object arrays so you can't have an object array of object arrays as serializing that would be quite difficult.

In addition there are some small points here that have performance implications. For example we need the shape size at each navigation position in order to unwrap things. We want to make sure that this runs in the same task graph so that they are merged/ simplified so that things don't run twice when saving :).

Thus passing a tuple to da.store.

Progress of the PR

  • Fix Storing hspy ragged lazy
  • Fix Storing zspy ragged lazy
  • Add tests (passing?)
  • Make sure that loading doesn't load everything into memory
  • update docstring (if appropriate),
  • update user guide (if appropriate),
  • add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • Check formatting of the changelog entry (and eventual user guide changes) in the docs/readthedocs.org:rosettasciio build of this PR (link in github checks)
  • add tests,
  • ready for review.

Minimal example of the bug fix or the new feature

import numpy as np
import hyperspy.api as hs
nav_shape = np.arange(10, 10 * (nav_dim + 1), step=10)
data = np.empty(nav_shape, dtype=object)
for ind in np.ndindex(data.shape):
   num = rng.integers(3, 10)
   data[ind] = rng.random((num, 2)) * 100
data = da.from_array(data, chunks=2)

s = hs.signals.BaseSignal(data, ragged=True)
s.as_lazy()
s.save("test.zspy")

The 1D hspy case is still failing for some reason but I wanted to just create this PR to see if people had suggestions?

Copy link

codecov bot commented Dec 5, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (13507d4) 86.02% compared to head (1f7a11e) 86.25%.
Report is 60 commits behind head on main.

Files Patch % Lines
rsciio/hspy/_api.py 89.47% 0 Missing and 2 partials ⚠️
rsciio/zspy/_api.py 87.50% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #193      +/-   ##
==========================================
+ Coverage   86.02%   86.25%   +0.23%     
==========================================
  Files          81       82       +1     
  Lines       10387    10541     +154     
  Branches     2253     2291      +38     
==========================================
+ Hits         8935     9092     +157     
  Misses        931      931              
+ Partials      521      518       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Fix lazy loading of ragged arrays
rsciio/_hierarchical.py Fixed Show fixed Hide fixed
Copy link
Member

@ericpre ericpre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks good, just need to get the hspy writer to work!

rsciio/_hierarchical.py Outdated Show resolved Hide resolved
rsciio/_hierarchical.py Outdated Show resolved Hide resolved
rsciio/_hierarchical.py Outdated Show resolved Hide resolved
rsciio/hspy/_api.py Outdated Show resolved Hide resolved
rsciio/zspy/_api.py Outdated Show resolved Hide resolved
rsciio/hspy/_api.py Show resolved Hide resolved
rsciio/tests/test_hspy.py Outdated Show resolved Hide resolved
rsciio/tests/test_hspy.py Outdated Show resolved Hide resolved
rsciio/hspy/_api.py Outdated Show resolved Hide resolved
@CSSFrancis
Copy link
Member Author

CSSFrancis commented Dec 12, 2023

@ericpre I went through and added in your suggestions. I also added an error for trying to save 1-d lazy ragged array using the .hspy file writer.

I think there is something weird going on with h5py casting the array to a higher dimension and then trying to save it. I can try to track down the bug, but it's pretty low on my to-do list. If someone needs the feature or you really think it's worth tracking down I can put the effort in otherwise I might come back to this in a couple of months after I graduate.

@CSSFrancis
Copy link
Member Author

@ericpre what are your thoughts on including this in the 0.3.0 release? It doesn't have the 1-D hyperspy lazy saving figured out but I feel like that shouldn't stop it from being included.

@ericpre
Copy link
Member

ericpre commented Dec 12, 2023

I think there is something weird going on with h5py casting the array to a higher dimension and then trying to save it. I can try to track down the bug, but it's pretty low on my to-do list. If someone needs the feature or you really think it's worth tracking down I can put the effort in.

As this is the hspy reader, I think that it would be important to get it to work, as we would expect that this format supports all feature of hyperspy. That's being said, this is not a regression, so it is fair to leave it for latter! For now, just opening an issue summarising your current understanding of the situation would do! :)

rsciio/hspy/_api.py Outdated Show resolved Hide resolved
@ericpre
Copy link
Member

ericpre commented Dec 12, 2023

@ericpre what are your thoughts on including this in the 0.3.0 release?

Yes, sure, this PR is pretty much finish.

Co-authored-by: Eric Prestat <[email protected]>
@ericpre ericpre merged commit 9c3b2fa into hyperspy:main Dec 12, 2023
31 checks passed
@ericpre ericpre added this to the v0.3 milestone May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants