LFN support #12

Open
5 tasks
thejpster opened this issue Nov 8, 2019 · 3 comments

Comments

@thejpster
Member

thejpster commented Nov 8, 2019

Was:

The LFN code assumes that every 16-bit code unit is a complete, valid code point and that no surrogate pairs exist. This may not be a valid assumption. We should do a proper UTF-16 to UTF-8 conversion.

Now:

Note that opening an existing file is easier, as we can match a &str against the LFN entries on disk. Creating a new file is more difficult because we need to:

a) work out how many 16-bit code units are required to store the filename
b) work out how many directory entries are required to store the long filename chunks and the real directory entry (13 code units per chunk)
c) find a gap in the directory with that many consecutive deleted directory entries, or at the end of the directory (using any deleted directory entries which are located at the end)
d) write out the LFN chunks as well as the actual directory entry
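
Steps (a) to (c) could be sketched roughly as follows. This is only an illustration, not the crate's code: `entries_needed`, `Slot` and `find_gap` are hypothetical names, the directory is modelled as a plain slice of slot states rather than on-disk entries, and it assumes there is room to write at the end-of-directory marker (a real implementation would also have to check or extend the cluster chain).

```rust
/// Each LFN directory entry stores 13 UTF-16 code units.
const UNITS_PER_LFN_ENTRY: usize = 13;

/// Steps (a) and (b): total directory entries needed for `name`,
/// i.e. the LFN chunks plus the one real directory entry.
fn entries_needed(name: &str) -> usize {
    // (a) count 16-bit code units, not bytes or chars
    let units = name.encode_utf16().count();
    // (b) round up to whole 13-unit chunks, then add the real entry
    let chunks = (units + UNITS_PER_LFN_ENTRY - 1) / UNITS_PER_LFN_ENTRY;
    chunks + 1
}

/// Hypothetical, simplified view of one directory slot.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Slot {
    Used,
    Deleted,
    EndOfDir,
}

/// Step (c): index of the first slot of a run that can hold `needed`
/// entries. A run is either `needed` consecutive deleted slots, or the
/// end of the directory together with any deleted slots directly before it.
fn find_gap(slots: &[Slot], needed: usize) -> Option<usize> {
    let mut run_start = 0;
    let mut run_len = 0;
    for (i, slot) in slots.iter().enumerate() {
        match slot {
            Slot::Used => run_len = 0,
            Slot::Deleted => {
                if run_len == 0 {
                    run_start = i;
                }
                run_len += 1;
                if run_len == needed {
                    return Some(run_start);
                }
            }
            // Assumption: everything from the end marker onwards is free.
            Slot::EndOfDir => return Some(if run_len == 0 { i } else { run_start }),
        }
    }
    None
}
```

For a name of 13 code units or fewer, `entries_needed` gives 2: one LFN chunk plus the real entry. The maximum of 255 code units gives 20 chunks plus the real entry.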

@jonathanpallant
Collaborator

See #157

@jonathanpallant changed the title from "LFN support doesn't handle emoji" to "LFN support" on Oct 25, 2024
@thejpster
Member Author

I wanted to see how characters outside the BMP are encoded on Windows 11 on a FAT16 filesystem.

The file was called Smiley 😀.txt.

[2024-10-27T11:47:34Z DEBUG embedded_sdmmc::fat::volume] LFN Contents true 1 90 [0053, 006d, 0069, 006c, 0065, 0079, 0020, d83d, de00, 002e, 0074, 0078, 0074]
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 't'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'x'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 't'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push '.'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push '?'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push '?'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push ' '
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'y'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'e'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'l'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'i'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'm'
[2024-10-27T11:47:34Z TRACE embedded_sdmmc::filesystem::filename] LFN push 'S'

We can see that 😀 (U+1F600) was stored as the two UTF-16 code units 0xD83D and 0xDE00, i.e. as a surrogate pair from the D800-DFFF range, and that the current code turns each half into a '?'. Therefore, to turn an LFN back into a String, we need to handle surrogate pairs.
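
As a rough sketch of what handling surrogate pairs could look like, the standard library's `char::decode_utf16` already combines pairs and reports unpaired surrogates as errors. The function name `lfn_units_to_string` is made up for this example, it uses `String` for brevity (the crate itself is `no_std`), and mapping bad units to U+FFFD is just one possible policy.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Convert the UTF-16 code units of a long filename into a `String`.
/// Surrogate pairs are combined into one `char`; an unpaired surrogate
/// becomes U+FFFD (an assumption, other policies are possible).
fn lfn_units_to_string(units: &[u16]) -> String {
    decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}
```

Feeding it the 13 code units from the log above should give back `Smiley 😀.txt`.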

@thejpster
Member Author

thejpster commented Oct 27, 2024

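Note that, in the worst case, a filename could consist of 255 UTF-16 code units, each a code point between U+0800 and U+D7FF (or between U+E000 and U+FFFF). Each of these is encoded as 3 bytes in UTF-8, so the worst-case size for an LfnBuffer is 255 × 3 = 765 bytes. Surrogate pairs do not make this worse: a pair uses two code units but only needs 4 bytes of UTF-8, i.e. 2 bytes per code unit.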
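
The arithmetic can be checked with a small sketch. The constant and function names here are invented for illustration and are not taken from the crate; unpaired surrogates are counted as U+FFFD, which is also 3 bytes in UTF-8, so they stay within the same bound.

```rust
/// Maximum number of UTF-16 code units in a long filename.
const MAX_LFN_UNITS: usize = 255;
/// A single code unit (one BMP code point) needs at most 3 bytes of UTF-8.
const MAX_UTF8_BYTES_PER_UNIT: usize = 3;
/// Worst-case UTF-8 size of a long filename: 765 bytes.
const MAX_LFN_UTF8_BYTES: usize = MAX_LFN_UNITS * MAX_UTF8_BYTES_PER_UNIT;

/// UTF-8 length, in bytes, of a sequence of UTF-16 code units.
/// Unpaired surrogates are counted as U+FFFD (an assumption).
fn utf8_len_of_units(units: &[u16]) -> usize {
    std::char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(std::char::REPLACEMENT_CHARACTER).len_utf8())
        .sum()
}
```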
