[AL-2368] Optimization for saving shallow views #2511
Conversation
@@ -3273,7 +3273,13 @@ def _write_vds(
 create_shape_tensor=False,
 create_id_tensor=False,
 create_sample_info_tensor=False,
-).extend(list(self.index.values[0].indices(self.num_samples)))
+).extend(
+np.array(
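The change swaps a list of Python ints for a NumPy integer array when the view's row indices are written. The sketch below is a hypothetical illustration of that before/after shape for a slice-based view. The view_indices helper is an assumed stand-in for self.index.values[0].indices(self.num_samples), and the "after" line assumes the truncated diff wraps the generator in tuple(...) as the review thread describes.

```python
import numpy as np


def view_indices(index: slice, num_samples: int):
    """Hypothetical stand-in for self.index.values[0].indices(self.num_samples).

    Lazily yields the dataset row numbers selected by a slice-based view.
    """
    # slice.indices() clamps start/stop/step to the dataset length.
    yield from range(*index.indices(num_samples))


num_samples = 100
view = slice(10, 20, 2)  # e.g. a view like ds[10:20:2]

# Before: extend() receives a list of Python int objects.
before = list(view_indices(view, num_samples))

# After (assumed form of the truncated line): materialize the generator,
# then pass extend() a single integer ndarray.
after = np.array(tuple(view_indices(view, num_samples)))

print(before)            # [10, 12, 14, 16, 18]
print(after.dtype.kind)  # i
```

Both forms hold the same row numbers; only the container changes, which is what lets extend take its array fast path.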
Why not handle this inside the extend implementation? Also, is the tuple conversion necessary here?
extend can handle lists, but it has optimizations for when the input is a NumPy array.
The tuple conversion is necessary because .indices() is a generator, and np.array() does not consume generators.
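A small demonstration of why the tuple step matters: np.array() treats a generator as a single opaque object rather than iterating it. The gen function is a hypothetical stand-in for .indices(). np.fromiter is shown only as an alternative that consumes a generator directly; it is not what the PR uses.

```python
import numpy as np


def gen():
    # Hypothetical stand-in for a generator such as .indices(); yields 0, 1, 2.
    yield from range(3)


# np.array() does not iterate a generator: it wraps the generator object
# itself in a 0-d object array.
wrapped = np.array(gen())
print(wrapped.shape, wrapped.dtype)  # () object

# Materializing first gives the intended 1-D integer array.
materialized = np.array(tuple(gen()))
print(materialized.shape)  # (3,)

# Alternative (not used in the PR): np.fromiter consumes the generator
# directly but requires an explicit dtype.
direct = np.fromiter(gen(), dtype=np.int64)
print(direct.tolist())  # [0, 1, 2]
```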
I meant moving the optimization into the extend method, so that if other places call this method, the optimization is applied automatically.
To achieve that, we would have to loop over all the input samples and check their types, and it is better to avoid such loops. There may also be issues deciding which dtype to cast the samples to.
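The dtype concern can be seen directly: NumPy infers the array dtype from the contents, so the same integer can silently become a float or a string depending on what it is mixed with. The sample lists below are hypothetical inputs a generic extend might receive.

```python
import numpy as np

# Hypothetical sample sequences a generic extend() might receive.
ints = [1, 2, 3]
mixed_numbers = [1, 2.5]
mixed_types = [1, "a"]

print(np.asarray(ints).dtype.kind)         # i
print(np.asarray(mixed_numbers).dtype)     # float64
print(np.asarray(mixed_types).dtype.kind)  # U
```

Because the result depends on every element, a blanket conversion inside extend could not know whether the cast is safe without first inspecting each sample.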
I assume the extend method accepts a sequence-like object as its first argument. Wouldn't simply wrapping the first argument with np.asarray() inside the implementation do the same as calling np.array() here? Of course, you would still need code to make sure the input is a sequence-like object, like the tuple here or the list in the original code.
The first argument of extend can be a sequence of many kinds of samples, not just integers. That is why we can't simply wrap it in np.asarray and leverage the optimizations: we would need to loop over every sample, check that it is an integer, and then cast the whole sequence to a NumPy array of the proper dtype, which is not good design.
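One way to read the design choice: extend can cheaply check the type of the container (one check), but not the type of every sample (one check per element). The sketch below is a hypothetical extend_sketch function, not the library's API. It takes a fast path only when the caller already passes an integer ndarray and otherwise falls back to per-sample handling. The list buffer is an assumed stand-in for tensor storage.

```python
import numpy as np


def extend_sketch(buffer: list, samples) -> str:
    """Hypothetical stand-in for a tensor's extend(); NOT the library's API."""
    # Fast path: a single O(1) check on the container and its dtype,
    # then the whole array is stored as one block.
    if isinstance(samples, np.ndarray) and np.issubdtype(samples.dtype, np.integer):
        buffer.append(samples.astype(np.int64, copy=False))
        return "fast"
    # Generic path: samples may be arrays, strings, images, ... so each one
    # is converted and stored individually.
    for sample in samples:
        buffer.append(np.asarray(sample))
    return "generic"


storage = []
print(extend_sketch(storage, np.arange(5)))  # fast
print(extend_sketch(storage, [1, "a"]))      # generic
```

Under this reading, converting to an ndarray at the call site, where the indices are known to be integers, keeps extend free of per-sample type checks.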
🚀 🚀 Pull Request
Description
Extending with a NumPy array of indices is much faster than extending with a Python list of ints, which speeds up saving shallow views.
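A rough illustration of where the speedup plausibly comes from, assuming the storage layer ultimately writes integer buffers (an assumption; the source does not describe the internals). Serializing a list requires per-element Python work, while an ndarray's buffer can be copied in one call. The names pack_list and pack_array are hypothetical. No timing numbers are claimed; they vary by machine.

```python
import struct
import timeit

import numpy as np

indices = list(range(10_000))
arr = np.asarray(indices, dtype="<i8")  # little-endian int64


def pack_list() -> bytes:
    # Per-element path: one Python-level struct.pack call per index.
    return b"".join(struct.pack("<q", i) for i in indices)


def pack_array() -> bytes:
    # Bulk path: one copy of the array's contiguous buffer.
    return arr.tobytes()


# Both paths produce identical bytes; only the cost differs.
assert pack_list() == pack_array()

print("list :", timeit.timeit(pack_list, number=100))
print("array:", timeit.timeit(pack_array, number=100))
```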