-
Notifications
You must be signed in to change notification settings - Fork 806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement UnionArray logical_nulls #6303
Conversation
Looks good from a brief check, what's the result of running the benchmarks? |
|
Edit: Now it runs faster when there are up to ~10 fields with nulls, with timings slowly increasing for every field with nulls, and from there timings stabilize and depends only on the length of the union New results
|
arrow-array/src/array/union_array.rs
Outdated
None | ||
} | ||
} | ||
|
||
/// Union types always return non null as there is no validity buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated docs below, but the others arrays which also should and correctly implements logical_nulls
doesn't include those default methods. Should we delete them too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove these specific implementations, the behavior will not change but the documented reason (because UnionArray has no validity buffer) will be lost.
If we remove these -- can we update the documentation for the default implementations to mention the UnionArray special case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment points to old code, my bad.
I think that the most recent changes already update the documentation, so I removed the methods
Let me know if you agree
I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more |
I'm starting to review this one. 👀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for the code documentation! This was a fun PR to review.
Most of the suggestions are intended to make it easier for the next reader. I'm not sure about our code coverage policy, and I also think we may need to implement UnionArray::is_nullable (maybe in a followup PR?).
arrow-array/src/array/mod.rs
Outdated
/// Similary, a [`UnionArray`] with any nullable child will always return true, even if all | ||
/// selected values are valid, and therefore would not appear in [`Array::logical_nulls`]. | ||
fn is_nullable(&self) -> bool { | ||
self.null_count() != 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the added code comment match the implementation?
UnionArrays always returns a zero null_count. So this conditional self.null_count() != 0
always evaluates to false, which means is never nullable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposed/added doc comments is what should happen -- but it's not happening now.
Union types have no null validity bitmap (per spec). I believe this is why the null_count is always zero and the UnionArray::nulls is None.
In contrast, I believe the is_nullable is based on the child types (as the docs added here suggest). But what's missing is to define UnionArray::is_nullable
on it's implementation, and in a way that examines the UnionArray's children. (Dictionary arrays already does so.) Do you agree?
If you agree, then maybe add this doc comment & the code change in a follow up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolute right, thank you!
I added an implementation always returning true
just to not be buggy, and will improve in a follow up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
arrow-array/src/array/union_array.rs
Outdated
None | ||
} | ||
} | ||
|
||
/// Union types always return non null as there is no validity buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove these specific implementations, the behavior will not change but the documented reason (because UnionArray has no validity buffer) will be lost.
If we remove these -- can we update the documentation for the default implementations to mention the UnionArray special case?
arrow-array/src/array/union_array.rs
Outdated
enum SparseStrategy { | ||
Gather, | ||
AllNullsSkipOne, | ||
MixedSkipWithoutNulls, | ||
MixedSkipFullyNull, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this enum elsewhere and doc comment what the options mean? Since the names are not self explanatory.
For example, the AllNullsSkipOne
is actually "all fields have some (not all) nulls" and the SkipOne
part refers to how the child arrays are iterated. Having that enum explained up front helps speedup the comprehension. 🙏🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I added comments, but please let me know if you think it can be further improved
arrow-array/src/array/union_array.rs
Outdated
// This is the threshold where masking becomes slower than gather | ||
// TODO: test on avx512f(feature is still unstable) | ||
let gather_relative_cost = if cfg!(target_feature = "avx2") { | ||
10 | ||
} else if cfg!(target_feature = "sse4.1") { | ||
3 | ||
} else { | ||
2 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any sources (or data) on specifically why these numbers?
There is a clear preference to using the gather approach when we have many fields (partial null) to iterate thru, but I didn't follow how these exact numbers were chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, those numbers looks magic. I choose them based on benchmarks, numbers below.
I added comments to be more clear, and modified the else branch, that was choose based on x86 baseline/SSE2 benchmarks, but was also being selected for non x86 archs. I restricted it to x86, and set the gather cost to 0 (always used) on non x86, because I only have x86 to bench.
In resume, for a union with len of 4096:
gather = 6.4 µs (regardless of target feature)
AVX2:
1 selection mask per chunk = 0.8 µs
10 masks = 6.3 µs
11 maks = ~7 µs (slower than gather)
SSE4.1:
1 mask = 2.5µs
2 masks = 5 µs
3 masks = 7.5 µs (slower than gather)
x86 baseline/SSE2:
1 mask = 4 µs
2 masks = 8 µs (slower than gather)
The data for AVX2 is at #6303 (comment)
Data for SSE4.1 and SSE2 is from memory, I will bench it again tomorrow
// Unsafe code below depend on it: | ||
// To remove one branch from the loop, if the a type_id is not utilized, or it's logical_nulls is None/all set, | ||
// we use a null buffer of len 1 and a index_mask of 0, or the true null buffer and usize::MAX otherwise. | ||
// We then unconditionally access the null buffer with index & index_mask, | ||
// which always return 0 for the 1-len buffer, or the true index unchanged otherwise | ||
// We also use a 256 array, so llvm knows that `type_id as u8 as usize` is always in bounds | ||
let mut logical_nulls_array = [(&one_valid, Mask::Zero); 256]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
arrow-array/src/array/union_array.rs
Outdated
let chunks = chunks_exact.map(|chunk| { | ||
let chunk_array = <&[i8; 64]>::try_from(chunk).unwrap(); | ||
|
||
mask_chunk(chunk_array, &mut nulls_masks_iter) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: would you mind renaming the variable chunk_array
(and where the mask_chunk closures are defined) as type_ids_chunk_array
or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks better, I also renamed chunk
to type_ids_chunk
and remainder
to type_ids_remainder
Feel free to ping any other place I may have missed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
arrow-array/src/array/union_array.rs
Outdated
}, | ||
); | ||
|
||
union_nulls | without_nulls_selected(chunk, &without_nulls_ids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment on this line?
I believe the without_nulls_selected returns the bitmask for the without nulls (the is_d | is_e
), which provides the proper true NullBuffer slot (which translates to not-the-f-array-null, a.k.a. not one of the fully nulls).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, please let me know if it can be further clarified
arrow-array/src/array/union_array.rs
Outdated
}) | ||
.fold((0, 0), fold); | ||
|
||
!any_nullable_selected | union_nulls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment on this line?
It's basically the inverse of line 480 (see requested comment there). It's clear once reading the NullBuffer docs, but it's an extra hop for anyone newer to the codebase. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified it to be the same as line 480, and renamed nullable
to with_nulls
because it doesn't applies to nullable fields that happens to have 0 nulls
… into union_logical_nulls
Co-authored-by: wiedld <[email protected]>
@wiedld -- please ping me when you think this PS is ready |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gstvg for the clarifications & documentation. ❤️
I can approve after @alamb let's me know about the CI-testing targets, and what is the standard we use. 🙏🏼
Alternatively, there is another suggested approach to make sure we are testing the different sparse strategies.
// Choose the fastest way to compute the logical nulls | ||
// Gather computes one null per iteration, while the others work on 64 nulls chunks, | ||
// but must also compute selection masks, which is expensive, | ||
// so it's cost is the number of selection masks computed per chunk | ||
// Since computing the selection mask gets auto-vectorized, it's performance depends on which simd feature is enabled | ||
// For gather, the cost is the threshold where masking becomes slower than gather, which is determined with benchmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
// Always use gather on non benchmarked archs because even though it may slower on some cases, | ||
// it's performance depends only on the union length, without being affected by the number of fields | ||
0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. These comments are great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb -- I have a question regarding how CI will be testing these branches.
The default gather_relative_cost=0 means that the default scenario is to always use the SparseStrategy::Gather
except when other architectures AND features are enabled during testing. When I look at our CI (which I believe is here) we won't be testing all of these scenarios.
Not trying to hold up this PR. Just trying to figure out if this means we leave the responsibility on the user to build & run tests on their own platform -- and report to us if the tests errors. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could have these branches (choosing which SparseStrategy
enum) not be hardcoded to a feature & architecure (in that function itself) -- and instead use an approach where we can set the variable separately before running each test. That way all branches can be run through the correctness tests.
Is this a reasonable ask? (I'm new to this codebase & as a code reviewer here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, the arrow tests are controlled by this file: https://github.com/apache/arrow-rs/blob/master/.github/workflows/arrow.yml
Which runs with various combination of feature flags
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok for now. What I would recommend is making a follow on PR that pulls the choice of size into a function with some better documentation and possibly actually doing the benchmarking described in the comments
I am hoping to get #6340 into this release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a reasonable read of this PR, and while I didn't fully understand all the nuances, I think it looks well tested.
It would be nice if the various complexity could be justified with a benchmark (i.e. if we could make a benchmark that would be great and it would ensure we don't regress this performance in the future).
In any event, 🚀 lets get this in and we can iterate to improve it if necessary
Thank you @gstvg for the code, @samuelcolvin for the original in DataFusion and @wiedld for the review
// Always use gather on non benchmarked archs because even though it may slower on some cases, | ||
// it's performance depends only on the union length, without being affected by the number of fields | ||
0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok for now. What I would recommend is making a follow on PR that pulls the choice of size into a function with some better documentation and possibly actually doing the benchmarking described in the comments
Which issue does this PR close?
Closes #6017
Rationale for this change
N/A
What changes are included in this PR?
UnionArray::logical_nulls
implementation, tests and benchesis_null
andis_not_null
tests on unionsCheck if sparse union child arrays length match the length of the parent union
Are there any user-facing changes?
UnionArray::logical_nulls
return correct resultsAdditional info
This is a port of apache/datafusion#11321