Update INITCAP scalar function to support Utf8View #11888

Merged: 4 commits, Aug 12, 2024

Conversation

Contributor

xinlifoobar commented Aug 8, 2024

Which issue does this PR close?

Closes #11853 and part of #11790

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) label on Aug 8, 2024
        .collect::<GenericStringArray<T>>();

    Ok(Arc::new(result) as ArrayRef)
}

fn initcap_utf8view<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
Contributor Author

I think turning this into a macro would be a bit heavy, since we won't have many initcap_* functions like this.

    let result = string_view_array
        .iter()
        .map(initcap_string)
        .collect::<GenericStringArray<T>>();
Contributor Author

The return type is a StringArray instead of a StringViewArray. Should I change this behavior? Previously, the return type was defined here:

https://github.com/apache/datafusion/blob/2521043ddcb3895a2010b8e328f3fa10f77fc094/datafusion/functions/src/utils.rs#L45C1-L46C1

Contributor

The current utf8_to_str_type only returns Utf8 or LargeUtf8. Ideally we should support returning Utf8View, but since we are recreating the strings anyway, I'm not sure StringView will help here.
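
To make the two options in this thread concrete, here is a minimal sketch of the return-type choice. It does not use arrow's `DataType` or DataFusion's `utf8_to_str_type`: the `StrType` enum, the `output_type` function, and the `preserve_view` flag are all hypothetical names introduced only for illustration.

```rust
/// Hypothetical stand-in for the three string types discussed in the thread.
/// This is NOT arrow's `DataType`; it only models the choice being debated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Hypothetical helper: picks the output type for a string function.
/// With `preserve_view == false`, a Utf8View input yields Utf8 (a StringArray),
/// which is the behavior questioned above. With `preserve_view == true`,
/// the view type is kept, which is the "ideal" option mentioned in the reply.
fn output_type(input: StrType, preserve_view: bool) -> StrType {
    match input {
        StrType::Utf8 => StrType::Utf8,
        StrType::LargeUtf8 => StrType::LargeUtf8,
        StrType::Utf8View if preserve_view => StrType::Utf8View,
        StrType::Utf8View => StrType::Utf8,
    }
}
```

Because INITCAP builds a new string for every row, either choice allocates fresh string data; the sketch only shows which type label the output would carry.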

Contributor

alamb left a comment

Thank you @xinlifoobar and @XiangpengHao


fn initcap_string(string: Option<&str>) -> Option<String> {
    string.map(|string: &str| {
        let mut char_vector = Vec::<char>::new();
Contributor

I suspect you could make this faster by creating the vector once and then clearing it on each iteration, like:

    let mut char_vector = Vec::<char>::new();
    string.map(|string: &str| {
        char_vector.clear();
        ...
    })
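
The buffer-reuse idea can be sketched in standalone Rust as follows. This is a minimal sketch, not the code merged in this PR: `initcap_into` and `initcap_all` are hypothetical names, a plain slice stands in for the Arrow array, and the casing rule (ASCII-only, with a new word starting after any non-alphanumeric character) is an assumption.

```rust
/// Hypothetical helper: capitalizes the first character of each word in
/// `input`, using `buf` as scratch space that the caller keeps across calls.
/// Assumption: ASCII-only casing; a word starts after any character that is
/// not an ASCII letter or digit.
fn initcap_into(input: &str, buf: &mut Vec<char>) -> String {
    // `clear` drops the old contents but keeps the allocated capacity,
    // so later rows usually avoid a fresh allocation for the scratch buffer.
    buf.clear();
    let mut prev_alnum = false;
    for c in input.chars() {
        if prev_alnum {
            buf.push(c.to_ascii_lowercase());
        } else {
            buf.push(c.to_ascii_uppercase());
        }
        prev_alnum = c.is_ascii_alphanumeric();
    }
    buf.iter().collect()
}

/// Hypothetical driver: applies `initcap_into` to every value, passing nulls
/// (`None`) through, with one scratch buffer shared by all rows.
fn initcap_all(values: &[Option<&str>]) -> Vec<Option<String>> {
    let mut buf: Vec<char> = Vec::new(); // created once, outside the loop
    values
        .iter()
        .map(|&v| v.map(|s| initcap_into(s, &mut buf)))
        .collect()
}
```

For example, `initcap_all(&[Some("hello world"), None])` yields `Some("Hello World")` followed by `None`. Each row still allocates its output `String`; only the intermediate `Vec<char>` is reused.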

datafusion/sqllogictest/test_files/string_view.slt (outdated, resolved)
alamb merged commit f2685d3 into apache:main on Aug 12, 2024
24 checks passed
Contributor

alamb commented Aug 12, 2024

Thanks again @xinlifoobar and @XiangpengHao
