Coerce Array inner types #13452
Changes from 7 commits
Original file line number | Diff line number | Diff line change | ||
---|---|---|---|---|
|
@@ -1138,27 +1138,44 @@ fn numeric_string_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<D | |||
} | ||||
} | ||||
|
||||
fn coerce_list_children(lhs_field: &FieldRef, rhs_field: &FieldRef) -> Option<FieldRef> { | ||||
Some(Arc::new( | ||||
Arc::unwrap_or_clone(Arc::clone(lhs_field)).with_data_type(comparison_coercion( | ||||
should we use […]

So both will work, but IMO the current version is a bit better, as it keeps the code aligned with the dictionary behaviour ( […]

I think we should use […]

I may have misunderstood your point, but I am pretty sure […]
For example, this query casts correctly:
select arrow_typeof(x) from (select make_array(arrow_cast('a', 'Dictionary(Int8, Utf8)')) x UNION ALL SELECT make_array(arrow_cast('b', 'Dictionary(Int8, LargeUtf8)'))) x;
Also, I think […]
-- type_union_resolution can't cast nulls
select make_array(arrow_cast('a', 'Utf8')) x UNION ALL SELECT make_array(NULL) x;
-- type_union_resolution can't handle large lists (or fixed lists)
select make_array(make_array(1)) x UNION ALL SELECT make_array(arrow_cast(make_array(-1), 'LargeList(Int8)')) x;

Shouldn't the result be like […]

It indicates we need to handle more coercion for […]

All these should converge x, y, z to the common supertype, and I believe type_union_resolution is supposed to do just that.

Okay, that's fair! I've switched to […]

#13468 seems related to this.

Given that this PR doesn't change the naming behaviour, let's go as is and then fix it separately in the PR you highlighted?

I am fine assuming the answer to this question is "nope": can there be a problem if we coerce two lists and they have different field names? I hope it is "nope, no problem".

I don't think there is any problem in theory. I am a little fuzzy about what the semantic meaning of the Field's name in a […] However, I remember there have been issues before on this point.

I think #13481 is a complaint that the list item name isn't preserved in some cases. I don't know when it would matter, though. From the SQL perspective, it shouldn't.
||||
lhs_field.data_type(), | ||||
rhs_field.data_type(), | ||||
)?), | ||||
)) | ||||
} | ||||
|
||||
/// Coercion rules for list types. | ||||
fn list_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { | ||||
use arrow::datatypes::DataType::*; | ||||
match (lhs_type, rhs_type) { | ||||
(List(_), List(_)) => Some(lhs_type.clone()), | ||||
(LargeList(_), List(_)) => Some(lhs_type.clone()), | ||||
|
||||
(List(_), LargeList(_)) => Some(rhs_type.clone()), | ||||
(LargeList(_), LargeList(_)) => Some(lhs_type.clone()), | ||||
(List(_), FixedSizeList(_, _)) => Some(lhs_type.clone()), | ||||
(FixedSizeList(_, _), List(_)) => Some(rhs_type.clone()), | ||||
( | ||||
LargeList(lhs_field), | ||||
List(rhs_field) | LargeList(rhs_field) | FixedSizeList(rhs_field, _), | ||||
match (lhs_type, rhs_type) {
// Coerce to the left side FixedSizeList type if the list lengths are the same,
// otherwise coerce to list with the left type for dynamic length
(FixedSizeList(lhs_field, ls), FixedSizeList(rhs_field, rs)) if ls == rs => Some(
FixedSizeList(coerce_list_children(lhs_field, rhs_field)?, *rs),
),
// Left is a LargeList[View] or right is a LargeList[View]
(
LargeList(lhs_field) | LargeListView(lhs_field),
List(rhs_field)
| ListView(rhs_field)
| LargeList(rhs_field)
| LargeListView(rhs_field)
| FixedSizeList(rhs_field, _),
)
| (
List(lhs_field)
| ListView(lhs_field)
| FixedSizeList(lhs_field, _)
| LargeList(lhs_field)
| LargeListView(lhs_field),
LargeList(rhs_field) | LargeListView(rhs_field),
) => Some(LargeList(coerce_list_children(lhs_field, rhs_field)?)),
// Left and right are lists
(
List(lhs_field) | ListView(lhs_field) | FixedSizeList(lhs_field, _),
List(rhs_field) | ListView(rhs_field) | FixedSizeList(rhs_field, _),
) => Some(List(coerce_list_children(lhs_field, rhs_field)?)),
_ => None,
}

Is ListView already supported in arrow? I would prefer to handle list views only if there is a corresponding test as well, to ensure test coverage.

TBH, I would also handle this separately, with a proper test. I'll file a ticket.

But I will reorder the arms.
||||
) | ||||
| (FixedSizeList(lhs_field, _) | List(lhs_field), LargeList(rhs_field)) => { | ||||
|
||||
Some(LargeList(coerce_list_children(lhs_field, rhs_field)?)) | ||||
} | ||||
|
||||
(List(lhs_field), List(rhs_field) | FixedSizeList(rhs_field, _)) | ||||
| (FixedSizeList(lhs_field, _), List(rhs_field)) => { | ||||
Some(List(coerce_list_children(lhs_field, rhs_field)?)) | ||||
} | ||||
|
||||
// Coerce to the left side FixedSizeList type if the list lengths are the same, | ||||
// otherwise coerce to list with the left type for dynamic length | ||||
(FixedSizeList(lf, ls), FixedSizeList(_, rs)) => { | ||||
(FixedSizeList(lhs_field, ls), FixedSizeList(rhs_field, rs)) => { | ||||
if ls == rs { | ||||
Some(lhs_type.clone()) | ||||
Some(FixedSizeList( | ||||
coerce_list_children(lhs_field, rhs_field)?, | ||||
*rs, | ||||
)) | ||||
} else { | ||||
Some(List(Arc::clone(lf))) | ||||
Some(List(coerce_list_children(lhs_field, rhs_field)?)) | ||||
} | ||||
} | ||||
(LargeList(_), FixedSizeList(_, _)) => Some(lhs_type.clone()), | ||||
(FixedSizeList(_, _), LargeList(_)) => Some(rhs_type.clone()), | ||||
_ => None, | ||||
} | ||||
} | ||||
|
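To make the arm ordering discussed in the review easier to follow, here is a minimal, self-contained sketch of the same idea. It is not the DataFusion code: `Ty` is a hypothetical stand-in for `arrow::datatypes::DataType`, it stores bare element types instead of `FieldRef`s, and `coerce_element` is an assumed toy rule (Int32 widens to Int64) in place of the real element coercion. ListView variants are left out, as agreed in the thread.

```rust
/// Hypothetical, simplified stand-in for `arrow::datatypes::DataType`.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Int64,
    Utf8,
    List(Box<Ty>),
    LargeList(Box<Ty>),
    FixedSizeList(Box<Ty>, i32),
}

/// Assumed toy element rule: equal types unify, Int32 widens to Int64,
/// nested lists recurse. The PR delegates this step to DataFusion's
/// own coercion functions instead.
fn coerce_element(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    use Ty::*;
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Int32, Int64) | (Int64, Int32) => Some(Int64),
        (
            List(_) | LargeList(_) | FixedSizeList(..),
            List(_) | LargeList(_) | FixedSizeList(..),
        ) => list_coercion(lhs, rhs),
        _ => None,
    }
}

/// List coercion with the arms ordered from most to least specific.
fn list_coercion(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    use Ty::*;
    match (lhs, rhs) {
        // Same fixed length on both sides: keep a FixedSizeList.
        (FixedSizeList(l, ls), FixedSizeList(r, rs)) if ls == rs => {
            Some(FixedSizeList(Box::new(coerce_element(l, r)?), *ls))
        }
        // A LargeList on either side widens the result to LargeList.
        (LargeList(l), List(r) | LargeList(r) | FixedSizeList(r, _))
        | (List(l) | FixedSizeList(l, _), LargeList(r)) => {
            Some(LargeList(Box::new(coerce_element(l, r)?)))
        }
        // Everything else list-like falls back to a variable-length List.
        (List(l) | FixedSizeList(l, _), List(r) | FixedSizeList(r, _)) => {
            Some(List(Box::new(coerce_element(l, r)?)))
        }
        _ => None,
    }
}
```

In this sketch, the guarded `FixedSizeList` arm has to come first: without it, two fixed-size lists of equal length would fall through to the generic arm and lose their length. Incompatible element types make `coerce_element` return `None`, which the `?` operator propagates, so the whole coercion fails instead of silently keeping the left-hand type.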
@@ -2105,6 +2122,19 @@ mod tests { | |||
DataType::List(Arc::clone(&inner_field)) | ||||
); | ||||
|
||||
// Negative test: inner_timestamp_field and inner_field are not compatible because their inner types are not compatible | ||||
let inner_timestamp_field = Arc::new(Field::new( | ||||
"item", | ||||
DataType::Timestamp(TimeUnit::Microsecond, None), | ||||
true, | ||||
)); | ||||
let result_type = get_input_types( | ||||
&DataType::List(Arc::clone(&inner_field)), | ||||
&Operator::Eq, | ||||
&DataType::List(Arc::clone(&inner_timestamp_field)), | ||||
); | ||||
assert!(result_type.is_err()); | ||||
|
||||
// TODO add other data type | ||||
Ok(()) | ||||
} | ||||
|
Could you add a comment here explaining why this function is needed -- specifically that it is setting the DataType / field name correctly?
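As an illustration of what such a comment might capture, the sketch below mirrors the shape of `coerce_list_children` from the diff using simplified, hypothetical types. `Field`, `Ty`, and `coerce_element` are assumptions made for this example and are not the arrow or DataFusion definitions; only the behaviour (keep the left field, swap in the coerced data type) is taken from the diff.

```rust
use std::sync::Arc;

/// Hypothetical, simplified element types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Int64,
    Utf8,
}

/// Hypothetical, simplified stand-in for an arrow `Field`.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: Ty,
    nullable: bool,
}

impl Field {
    /// Returns the same field with only the data type replaced.
    fn with_data_type(mut self, data_type: Ty) -> Self {
        self.data_type = data_type;
        self
    }
}

type FieldRef = Arc<Field>;

/// Assumed toy element rule: equal types unify, Int32 widens to Int64.
fn coerce_element(lhs: &Ty, rhs: &Ty) -> Option<Ty> {
    match (lhs, rhs) {
        (a, b) if a == b => Some(a.clone()),
        (Ty::Int32, Ty::Int64) | (Ty::Int64, Ty::Int32) => Some(Ty::Int64),
        _ => None,
    }
}

/// Builds the child field of a coerced list type.
///
/// The name and nullability are taken from the left-hand field, and the
/// data type is replaced with the coerced element type. Returns `None`
/// when the two element types cannot be coerced to a common type.
fn coerce_list_children(lhs_field: &FieldRef, rhs_field: &FieldRef) -> Option<FieldRef> {
    let data_type = coerce_element(&lhs_field.data_type, &rhs_field.data_type)?;
    Some(Arc::new(Field::clone(lhs_field).with_data_type(data_type)))
}
```

With a left field named "item" of type Int32 and a right field named "element" of type Int64, this sketch yields a field named "item" of type Int64, which is the naming behaviour the thread agreed to leave unchanged in this PR.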