chore: upgrade to datafusion 43 #2886
We need DataFusion to upgrade as well before we can do this. I have some patches that make the latest DataFusion work, but we're blocked on them adopting the newest arrow and kernel.
9d686df to d4bfacd
Hi, datafusion has been released. Let's rock!
Thank you @ion-elgreco -- let me know if there is anything I can do to help with this PR (e.g. I could work on updating the code in the core to use non-deprecated APIs for parquet statistics...)
8db59c2 to b325e27
@alamb there seems to be a regression in DF, see https://github.com/delta-io/delta-rs/actions/runs/10959345721/job/30431504526?pr=2886#step:7:948. A list literal now produces a list dtype whose inner field is named "item". We, however, read our data internally according to the Parquet spec, so that the Parquet files we write follow the spec's field naming. I had a quick glance at the changelog but can't pinpoint where this change occurred. Do you have any ideas?
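To make the mismatch concrete, here is a minimal sketch in plain Python. It is a toy model, not the arrow or DataFusion API: `Field`, `ListType`, `strictly_equal` and `same_element_type` are hypothetical names invented for illustration. It only shows why two list types with the same element type can still fail a strict schema comparison when the inner field name differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a named, typed schema field."""
    name: str
    dtype: object
    nullable: bool = True


@dataclass(frozen=True)
class ListType:
    """Hypothetical stand-in for a list type that carries its inner field."""
    inner: Field


# Type reported for a list literal: inner field named "item".
literal_type = ListType(Field("item", "int64"))
# Type used by the table schema, following Parquet naming: "element".
table_type = ListType(Field("element", "int64"))


def strictly_equal(a: ListType, b: ListType) -> bool:
    # Strict comparison includes the inner field name.
    return a == b


def same_element_type(a: ListType, b: ListType) -> bool:
    # Relaxed comparison looks only at the element data type.
    return a.inner.dtype == b.inner.dtype


print(strictly_equal(literal_type, table_type))
print(same_element_type(literal_type, table_type))
```

Under this model the strict check rejects the pair even though the element types agree, which matches the shape of the failure described above.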
It sounds somewhat similar to a discussion happening here about nullability (I think field names are treated similarly): apache/datafusion#11989 (comment)
Update: filed apache/datafusion#12560 to track the regression.
f824d7a to a1a33ce
@alamb thanks for following up! I feel a little sheepish 🐑 in that my internal CI showed these errors weeks ago in my integration testing with datafusion. I've subscribed to the issue you created and can help with testing if necessary! Depending on my availability this weekend I can try fixing it in datafusion too 🤔
Hi there - I believe a fix was implemented in datafusion and this PR is not blocked any longer.
It's not yet released though
I can confirm that the latest datafusion main passes tests in CI 🥳
4f30e16 to bac3306
I filed apache/datafusion#12813 to consider releasing a patch version of DataFusion
@timsaucer FWIW, I believe the error is introduced further up the optimization rules stack.
Going into optimization
After invoking simplify_expressions
SimplifyExpressions appears to erase the defined field names, which only causes problems later in OptimizeProjections. I might not be qualified to fix a bug this sticky 😆
e363ee6 to 6721672
FYI this does correct the error in the unit test
Signed-off-by: R. Tyler Croy <[email protected]>
The release of pyo3 0.22.3 compels this, since we cannot otherwise compile. The choice is between pinning 0.22.2 and upgrading our ABI, and I think it's better to upgrade the ABI.
Signed-off-by: R. Tyler Croy <[email protected]>
see delta-io/delta-kernel-rs#301
Signed-off-by: R. Tyler Croy <[email protected]>
Signed-off-by: Stephen Carman <[email protected]>
We plan to make another DataFusion patch release to unblock this upgrade.
6721672 to 08a96ed
Today the make_array function from DataFusion uses "item" as the list element's field name. With recent changes in delta-kernel-rs we have switched to calling it "element", which is more conventional given how Apache Parquet handles things.
This change introduces a test which helps isolate the behavior seen in Python tests within the core crate for easier regression testing.
Signed-off-by: R. Tyler Croy <[email protected]>
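As an illustration of the naming difference described in this commit message, the sketch below shows one hypothetical way to reconcile the two conventions: recursively renaming every list's inner field to "element". This is not what the commit does (it adds a test), and `Field`, `ListType`, `StructType` and `normalize_list_names` are invented names in a stdlib-only toy model, not arrow or DataFusion types.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    """Hypothetical named field; dtype is a primitive name or a nested type."""
    name: str
    dtype: "DataType"


@dataclass(frozen=True)
class ListType:
    inner: Field


@dataclass(frozen=True)
class StructType:
    fields: Tuple[Field, ...]


DataType = Union[str, ListType, StructType]


def normalize_list_names(dtype: DataType, element_name: str = "element") -> DataType:
    """Return a copy of dtype with every list's inner field renamed."""
    if isinstance(dtype, ListType):
        inner = normalize_list_names(dtype.inner.dtype, element_name)
        return ListType(Field(element_name, inner))
    if isinstance(dtype, StructType):
        # Struct field names are kept; only their types are normalized.
        return StructType(tuple(
            Field(f.name, normalize_list_names(f.dtype, element_name))
            for f in dtype.fields
        ))
    return dtype  # primitive type names pass through unchanged


# A nested list using the "item" convention.
nested_item = ListType(Field("item", ListType(Field("item", "int64"))))
# The same shape using the "element" convention.
nested_element = ListType(Field("element", ListType(Field("element", "int64"))))

print(normalize_list_names(nested_item) == nested_element)
```

A regression test in this style would compare the normalized type against the expected "element" form, so a future change in the default inner name shows up as a failing equality check rather than as a downstream schema error.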
08a96ed to b144cf1
Description
Bump kernel