UnionBuilder produces incorrect Union DataType #1637
Comments
I think automatically determining the nullability of the field based on whether that child contains nulls makes a lot of sense. However, it could cause schema volatility, since the schema would depend on whether the written data happens to contain nulls, which seems sub-optimal. I think the safest thing is probably to do what
Edit: FWIW @jhorstmann reported a similar issue recently with StructArrays - #1611
Further #1649
I also ran into this on #6303. Currently, the returned union only contains fields appended at least once. So every
Do you think we should follow
If the fields are somehow specified, should the returned array contain every child specified, or only those appended at least once?
I thought about another option, and would like to know what you think. First, we need to add an id argument to the current
And then add
And finally, document that nullable fields should only use
Describe the bug
The Union DataType produced by UnionBuilder has non-nullable child Fields even after nulls have been appended in the builder.
To Reproduce
Steps to reproduce the behavior: Try the following code
This code panics:
InvalidArgumentError("column types must match schema types, expected
Union([
Field { name: "a", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None },
Field { name: "b", data_type: Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }
], Dense
) but found Union([
Field { name: "a", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None },
Field { name: "b", data_type: Float64, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: None }
], Dense)
at column index 0")
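The two types in the message above differ only in the nullable flag of the child fields. The following is a minimal sketch of why a strict equality check rejects the column. It uses a hypothetical SketchField struct and a hypothetical types_match helper, not the real arrow Field or the actual schema validation code, and assumes the comparison is plain structural equality.

```rust
/// Hypothetical stand-in for an Arrow `Field`; only the parts visible in the
/// error message that matter here are modelled.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    name: &'static str,
    data_type: &'static str,
    nullable: bool,
}

/// Builds the two union children from the error message ("a": Int32,
/// "b": Float64) with the given nullability.
fn union_children(nullable: bool) -> Vec<SketchField> {
    vec![
        SketchField { name: "a", data_type: "Int32", nullable },
        SketchField { name: "b", data_type: "Float64", nullable },
    ]
}

/// Assumed check: the column type matches the schema type only if every
/// child field is equal, including its `nullable` flag.
fn types_match(expected: &[SketchField], found: &[SketchField]) -> bool {
    expected == found
}
```

Under this assumption, a schema declared with nullable children never matches a union whose children were hardcoded as non-nullable, even though names and data types agree.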
Expected behavior
Depending on the interpretation of the specification, one of 2 things should happen:
1. A Union's child Fields should inherit its nullability (i.e. always be false): Then I think this should error when executing Field::new() with a bad DataType.
2. A child should be nullable if it is capable of returning None to the parent when unionArray.value(index) is called: This code should run just fine then.
Additional context
I ran into this when working on #1594. I think it's a simple fix: track the nullability of the UnionBuilder fields rather than always hardcoding the child Fields' nullability to false. That being said, I'm not sure if that's the correct understanding of the specification.
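A rough sketch of the "track the nullability" idea, using only the standard library. ToyField and ToyUnionBuilder are hypothetical names made up for illustration; this is not the arrow-rs UnionBuilder API, and it ignores values, type ids, and offsets entirely.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an Arrow `Field`: only name and nullability.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    nullable: bool,
}

/// Hypothetical toy builder that remembers, per child, whether a null was
/// ever appended to it.
#[derive(Default)]
struct ToyUnionBuilder {
    // child name -> "has this child seen a null?"
    saw_null: BTreeMap<String, bool>,
}

impl ToyUnionBuilder {
    /// Record a non-null value appended to child `name`.
    /// An earlier `true` flag is kept; a new child starts as `false`.
    fn append(&mut self, name: &str) {
        self.saw_null.entry(name.to_string()).or_insert(false);
    }

    /// Record a null appended to child `name`.
    fn append_null(&mut self, name: &str) {
        self.saw_null.insert(name.to_string(), true);
    }

    /// Derive the child fields from the tracked flags instead of a
    /// hardcoded `nullable: false`. Children come out sorted by name.
    fn fields(&self) -> Vec<ToyField> {
        self.saw_null
            .iter()
            .map(|(name, &nullable)| ToyField {
                name: name.clone(),
                nullable,
            })
            .collect()
    }
}
```

Note that in this sketch the resulting fields depend on whether nulls happened to be appended, which is the schema-volatility concern raised in the comments; declaring the fields up front would avoid that at the cost of a different API.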