Binary / row helpers #6096

Merged
merged 8 commits into from
Oct 8, 2024

Conversation

bkirwi
Contributor

@bkirwi bkirwi commented Jul 19, 2024

Which issue does this PR close?

Closes #6063.

(Potentially - still under discussion at the linked issue!)

Rationale for this change

I've also added the optional from_binary method discussed in the associated issue.

What changes are included in this PR?

data, into_binary and from_binary functions, and an extension to the fuzz test that checks the data survives the roundtrip.

Are there any user-facing changes?

Yes, though I suspect the rustdoc covers them enough?

Currently these are accessible via `AsRef`, but that trait only gives
you the bytes with the lifetime of the `Row` struct and not the lifetime
of the backing data.
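
To illustrate the lifetime distinction described above, here is a minimal sketch using a hypothetical `RowView` type (a stand-in, not the actual `Row` struct from arrow-row). It assumes the accessor returns a slice tied to the backing buffer, while `AsRef` can only return a slice tied to the borrow of the view itself.

```rust
/// Hypothetical stand-in for a row view that borrows bytes from a backing buffer.
pub struct RowView<'a> {
    data: &'a [u8],
}

impl<'a> RowView<'a> {
    /// Returns the bytes with the lifetime of the backing buffer (`'a`),
    /// not the lifetime of the borrow of `self`.
    pub fn data(&self) -> &'a [u8] {
        self.data
    }
}

impl<'a> AsRef<[u8]> for RowView<'a> {
    /// The trait signature ties the returned slice to the borrow of `self`.
    fn as_ref(&self) -> &[u8] {
        self.data
    }
}

/// Compiles because `data()` hands back `&'a [u8]`, so the temporary
/// `view` can be dropped while the returned slice stays valid.
pub fn first_row_bytes<'a>(buffer: &'a [u8], len: usize) -> &'a [u8] {
    let view = RowView { data: &buffer[..len] };
    // Returning `view.as_ref()` here would not compile: that slice
    // borrows from the local `view`, which is dropped at the end of the function.
    view.data()
}
```

Calling `first_row_bytes(&buf, 2)` on a four-byte buffer yields the first two bytes, and the result can outlive the short-lived view that produced it.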
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 19, 2024
Contributor

@alamb alamb left a comment

Thanks @bkirwi and @XiangpengHao

I think this PR needs some additional negative tests and error testing, but otherwise it is looking good to me.

cc @tustvold in case you have time to comment on the safety of the design

@alamb alamb marked this pull request as draft July 25, 2024 18:55
@alamb
Contributor

alamb commented Jul 25, 2024

Marking as draft so it is clear this PR isn't waiting on feedback anymore (at least I don't think it is). Please mark it as ready for review when it is ready for another look.

Contributor Author

@bkirwi bkirwi left a comment

Thanks for the review! I think I've addressed all comments, though there were a couple things I wasn't certain of - addressed inline.

@bkirwi bkirwi marked this pull request as ready for review August 9, 2024 16:29
@bkirwi
Contributor Author

bkirwi commented Aug 9, 2024

(Looks like there was some merge skew in the tests; I've merged the main branch in here which ought to fix it.)

@alamb
Contributor

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

Member

@westonpace westonpace left a comment

A few minor comments but these seem like straightforward changes to me.

    buffer: array.values().to_vec(),
    offsets: array.offsets().iter().map(|&i| i.as_usize()).collect(),
    config: RowConfig {
        fields: Arc::clone(&self.fields),
Member

More for my curiosity than anything but why Arc::clone(&self.fields) instead of self.fields.clone()?

Contributor Author

Some people prefer this form because it makes it more explicit that we're just incrementing an arc and not cloning the underlying data. See the clippy lint docs for more: https://rust-lang.github.io/rust-clippy/master/index.html#/clone_on_ref_ptr

I've gotten used to this style, though I do not personally care deeply about it! This codebase seems to use a mix of both.
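
For readers unfamiliar with the two spellings, a small sketch using only the standard library shows that they behave identically: both bump the reference count and return a handle to the same allocation. The `clone_styles` function and the `Vec<String>` payload are illustrative assumptions, not code from this PR.

```rust
use std::sync::Arc;

/// Clones the handle in both styles and reports the resulting strong
/// count plus whether the two clones point at the same allocation.
pub fn clone_styles(fields: &Arc<Vec<String>>) -> (usize, bool) {
    // Explicit form: obviously a cheap reference-count increment.
    let explicit = Arc::clone(fields);
    // Method form: does exactly the same thing, but reads like a deep clone.
    let method = fields.clone();
    // One original handle plus two clones are alive at this point.
    let count = Arc::strong_count(fields);
    (count, Arc::ptr_eq(&explicit, &method))
}
```

Neither form copies the underlying `Vec`. The difference is purely about how clearly the call site communicates that.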

///
/// // We can convert rows into binary format and back in batch.
/// let values: Vec<OwnedRow> = rows.iter().map(|r| r.owned()).collect();
/// let binary = rows.try_into_binary().expect("small");
Member

I got a little confused by .expect("small"). What does "small" mean in this context? Why not just .unwrap()?

Contributor Author

try_into_binary fails when the data is too large to be indexed with a 32-bit integer, so this was meant to suggest that it was fine to unwrap here because the data is known to be small. I'll expand the message a bit!
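
As a rough illustration of that failure mode, the sketch below narrows `usize` offsets to `i32` and reports an error when one does not fit. `narrow_offsets` is a hypothetical helper written for this example. It is not the implementation of `try_into_binary`, and the error type is an assumption.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: narrow `usize` offsets to `i32`,
/// failing if any offset cannot be represented in 32 bits.
pub fn narrow_offsets(offsets: &[usize]) -> Result<Vec<i32>, String> {
    offsets
        .iter()
        .map(|&o| {
            // `i32::try_from` returns an error instead of silently truncating.
            i32::try_from(o).map_err(|_| format!("offset {} does not fit in an i32", o))
        })
        // Collecting into a `Result` stops at the first failing offset.
        .collect()
}
```

At a call site where the data is known to be small, a descriptive message such as `.expect("offsets fit in an i32 because the input is small")` documents why the unwrap is safe, which a bare `.unwrap()` or a terse `.expect("small")` does not.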

@westonpace
Member

(ah, I see rustfmt is failing, probably need CI passing before merge)

@alamb
Contributor

alamb commented Sep 24, 2024

@bkirwi can you please fix the CI tests so we can merge this PR?

Thank you @westonpace for the review

@bkirwi
Contributor Author

bkirwi commented Sep 24, 2024 via email

@bkirwi
Contributor Author

bkirwi commented Oct 4, 2024

Alright, I've fixed the lint and applied a few more suggestions. Thanks all for the review!

@tustvold tustvold merged commit accf625 into apache:master Oct 8, 2024
13 checks passed
Labels
arrow Changes to the arrow crate
Development

Successfully merging this pull request may close these issues.

Make it easier to treat Rows as bytes
5 participants