quoting_style: Add support for non-UTF-8 bytes #6882

Open
wants to merge 4 commits into
base: main

@jtracey (Contributor) commented Nov 23, 2024

This adds support for non-UTF-8 bytes in the quoting_style library on Unix platforms. This is necessary for proper support of non-Unicode inputs in a few utilities, including wc, ls, and printf. As of this PR, wc should be good; ls is in a much better state but will need some work to close the final gaps; and printf needs @andrewliebenow's #6812, which might conflict with this, but if so, it should be a quick fix.

The first commit bumps the MSRV, because we need access to Utf8Chunks, since we need to operate on strings and non-unicode bytes in the same OsString (namely, we need to be able to tell if something is invalid unicode, or valid unicode but a control character, and apply the appropriate escaping). Avoiding that would require implementing or using another UTF-8 parser.
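To illustrate why `Utf8Chunks` is useful here, the following is a minimal sketch, not the code from this PR. It assumes a hypothetical helper `escape_bytes` and an octal escape format chosen only for illustration. It walks a byte slice with `<[u8]>::utf8_chunks()` (stable since Rust 1.79), escaping control characters found in the valid portions and escaping invalid bytes separately.

```rust
use std::fmt::Write;

/// Hypothetical helper (an assumption for illustration, not the PR's API):
/// pass printable text through, and octal-escape both control characters
/// and bytes that are not valid UTF-8.
fn escape_bytes(input: &[u8]) -> String {
    let mut out = String::new();
    // utf8_chunks() yields runs of valid UTF-8, each followed by the
    // invalid bytes (possibly none) that terminated the run.
    for chunk in input.utf8_chunks() {
        for c in chunk.valid().chars() {
            if c.is_control() {
                // Valid Unicode, but a control character: escape every
                // byte of its UTF-8 encoding.
                let mut buf = [0u8; 4];
                for b in c.encode_utf8(&mut buf).bytes() {
                    write!(out, "\\{:03o}", b).unwrap();
                }
            } else {
                out.push(c);
            }
        }
        // Not valid Unicode at all: escape the raw bytes.
        for b in chunk.invalid() {
            write!(out, "\\{:03o}", b).unwrap();
        }
    }
    out
}
```

With this sketch, `escape_bytes(b"a\xffb")` escapes only the invalid byte, while `escape_bytes(b"a\tb")` escapes the tab as a control character. Text such as `"é"` passes through unchanged.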

The third commit fixes a preexisting bug that was in some sense independent of this patch set (multi-byte control characters weren't being handled properly), but it touches the same code so I'm including it.
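As a hedged illustration of how multi-byte control characters can slip through (this shows one plausible failure mode, not necessarily the exact bug fixed here): U+0085 (NEL) is a control character whose UTF-8 encoding is two bytes, so a check that only looks for ASCII control bytes misses it. The function names below are hypothetical.

```rust
/// Byte-wise check: only sees ASCII control bytes (0x00..=0x1F and 0x7F).
fn has_control_bytewise(s: &str) -> bool {
    s.bytes().any(|b| b.is_ascii_control())
}

/// Char-wise check: sees every character in Unicode category Cc,
/// including the C1 controls U+0080..=U+009F.
fn has_control_charwise(s: &str) -> bool {
    s.chars().any(char::is_control)
}

/// Octal-escape every byte of a character's UTF-8 encoding
/// (the escape format is an assumption chosen for illustration).
fn octal_escape(c: char) -> String {
    let mut buf = [0u8; 4];
    let escaped: String = c
        .encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("\\{:03o}", b))
        .collect();
    escaped
}
```

For the input `"a\u{85}b"`, the byte-wise check reports no control characters while the char-wise check finds one, and `octal_escape('\u{85}')` covers both bytes of the encoding (0xC2 and 0x85).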

This new functionality is implemented, but not yet exposed here.

This exposes the non-UTF-8 functionality to callers. Support in `argument`,
`spec`, and `wc` is implemented, as their usage is simple. A wrapper only
returning valid Unicode is used in `ls`, since proper handling of OsStrings
there is more involved (outputs that escape non-Unicode bytes work now, though).

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
