-
Notifications
You must be signed in to change notification settings - Fork 803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up pad_nulls
for FixedLenByteArrayBuffer
#6297
Conversation
Benchmark results for these changes (pad loop is as in master branch):
|
I must have hallucinated last night...turns out Linux with newer xeon
Hmm, this is pretty system specific. Old mac laptop
|
Sorry I didn't follow all the nance of the results here. It seems like you have concluded that which approach is faster is a result of what the target architecture is? If this is the case, I think my preference would be go with the simplest code (e.g. |
Sorry, I tend to ramble...I concluded that
agreed! |
I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more |
I'm starting to review this one. Just give me a little bit to dig into the decoding. |
) where | ||
F: FnMut(&mut Vec<u8>, usize, usize, usize), | ||
{ | ||
for (value_pos, level_pos) in values_range.rev().zip(iter_set_bits_rev(valid_mask)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learning question. Confirming that the iter_set_bits_rev
(reversing) is due to little endian?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC iter_set_bits_rev
is returning an iterator of valid slot indexes for the array, but in reverse order, to match the values_range iterator which is also reversed (which I believe is just a monotonically decreasing counter from num_non_null-1
to 0). We're turning a dense array of non-null values into a sparse one where null slots are left undefined, but doing it in place, which is why it's done from the tail.
// byte_stream_split encoded, no NULLs | ||
let data = build_encoded_primitive_page_iterator::<T>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice additional coverage. 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to a change you made:
I noticed that under one of the bench_primitive test scenarios, I think this should be "binary packed skip" and use the bench_array_reader_skip
. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll put up the suggested changes in a draft PR, since it's unrelated, and then ping you. Thanks for considering!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decimal_benches, | ||
byte_stream_split_benches, | ||
); | ||
criterion_group!(benches, add_benches, decimal_benches, float16_benches,); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice refactoring of the tests. TY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @etseidl -- sorry this took so long to review |
Which issue does this PR close?
Closes #6296.
Rationale for this change
See issue.
What changes are included in this PR?
Changes
pad_nulls
forFixedLenByteArrayBuffer
as described in the issue. Benchmarking seems to indicate that using the new approach forbyte_length > 4
is optimal. The loop is better forbyte_length <= 4
because the compiler will eliminate the loop via unrolling.This PR does include extensive changes to the
arrow_reader
bench to allow benchmarking these changes.Are there any user-facing changes?
No