[AL-2368] Optimization for saving shallow views #2511

Merged
merged 1 commit into from
Aug 1, 2023

Conversation

FayazRahman
Contributor

🚀 🚀 Pull Request

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Extending with a NumPy array of indices is much faster than extending with a Python list.

Things to be aware of

Things to worry about

Additional Context

@@ -3273,7 +3273,13 @@ def _write_vds(
 create_shape_tensor=False,
 create_id_tensor=False,
 create_sample_info_tensor=False,
-).extend(list(self.index.values[0].indices(self.num_samples)))
+).extend(
+    np.array(

Why not handle this inside the extend implementation? Also, is the tuple conversion necessary here?

Contributor Author

The extend method can handle lists, but it has optimizations for inputs that are NumPy arrays.

The tuple conversion is necessary because .indices() is a generator.
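
A minimal sketch of why the conversion matters. It uses a stand-in generator rather than the project's actual `.indices()` method, so `fake_indices` is a hypothetical name. NumPy does not iterate a generator passed to `np.array`, so the generator has to be materialized first:

```python
import numpy as np

def fake_indices(num_samples):
    # Hypothetical stand-in for a generator-style .indices() method.
    for i in range(num_samples):
        yield i

# Passing the generator directly wraps the generator object itself
# in a 0-dimensional object array instead of iterating it.
wrapped = np.array(fake_indices(5))

# Materializing to a tuple first gives a proper 1-D integer array.
materialized = np.array(tuple(fake_indices(5)))

# np.fromiter consumes an iterator directly, given an explicit dtype.
streamed = np.fromiter(fake_indices(5), dtype=np.int64)
```

`np.fromiter` is shown only as an alternative that skips the intermediate tuple; the PR itself uses the tuple conversion.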

I meant moving the optimization into the extend method, so that if other places call this method, the optimization is applied automatically.

Contributor Author

@FayazRahman FayazRahman Aug 1, 2023

We would have to loop over all the input samples and check their types to achieve that, and it would be better not to have such loops. There might also be issues deciding which dtype to cast the samples to.

I assume the extend method accepts a sequence-like object as its first argument. Wouldn't wrapping the first argument with np.asarray() inside the implementation do the same as calling np.array() here? Of course, you would still need code to make sure the input is a sequence-like object, such as the tuple here or the list in the original code.
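
For reference, a small sketch of how the two NumPy calls compare on a plain tuple of indices, which is the case being discussed. The values are illustrative only:

```python
import numpy as np

indices = (0, 1, 2, 3)

# For a tuple or list, both calls build a new integer array.
from_array = np.array(indices)
from_asarray = np.asarray(indices)

# The difference shows up when the input is already an ndarray:
# np.asarray returns it as-is, while np.array copies by default.
existing = np.arange(4)
same_object = np.asarray(existing) is existing
copied = np.array(existing) is not existing
```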

Contributor Author

The first argument of extend can be a sequence of a great many things, not just integers. This is why we can't simply wrap it in np.asarray and leverage the optimizations. We would need to loop over every sample, make sure it is an integer, and then cast the whole sequence to a NumPy array of the proper dtype. But that is not good design.
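
A rough sketch of the per-sample check described above, to show what moving the optimization into extend would involve. `try_cast_to_index_array` is a hypothetical helper, not part of the library, and the choice of `int64` is an assumption:

```python
import numpy as np

def try_cast_to_index_array(samples):
    # Hypothetical helper: returns an int64 array only if every sample
    # is a plain integer, otherwise None.
    for sample in samples:  # the per-sample loop the author wants to avoid
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(sample, bool) or not isinstance(sample, (int, np.integer)):
            return None
    return np.array(samples, dtype=np.int64)

ints = try_cast_to_index_array([3, 1, 4])
mixed = try_cast_to_index_array([3, "a.png", 4.5])

# Without the check, np.asarray picks a dtype on its own:
# an int mixed with a float silently becomes float64.
blind = np.asarray([3, 4.5])
```

The loop costs one type check per sample on every call, including calls whose inputs are not indices at all, which is the overhead the author argues against.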

@FayazRahman FayazRahman merged commit 95eec4e into main Aug 1, 2023
@FayazRahman FayazRahman deleted the fy_opt_view branch August 1, 2023 16:26
3 participants