Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modified module ziparchives to alternatively open zip files as byte strings. #90

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

nervecenter
Copy link

The module ziparchives features a modified ZipArchiveReader object type. It now contains a string-based alternative to a memfile, and a new ZipArchiveReaderMode to determine which field to read from. There are two new utility procs: getDataPtr() and getDataLen(), which depending on the mode of the input reader, get the casted data pointer or the length.

Most of openZipArchive() was moved into a new openZipArchiveInternal(), which features all of the internal zip archive reading logic. The proc openZipArchive*() is now a wrapper for initializing a reader in MemfileMode. The proc openZipArchiveBytes*() is a wrapper for opening a zip archive in StringMode, and takes a string of bytes; the returned reader can perform operations on those bytes as a .zip file, performing all operations in-memory.

Correspondingly, extractAll*() had an alternative spun out, extractAllBytes*(), which extracts a byte-string archive to the chosen directory. The common internal logic was given its own proc extractAllInternal(), and the directory check was given its own proc checkExtractDestination().

There are also two extra files present at a higher level. The inner_test.zip archive has three internal .zip files, each containing three internal text files, where the filename is a number and the contents are that number's corresponding whole English word. This is a test artifact for test_ziparchives_inmem.nim, which can be run from its own directory with nim r test_ziparchives_inmem.nim. This test extracts all the text files flatly to the working directory. There is an alternative that extracts each inner archive to its own directory using extractAllBytes*(). It should be noted that running this may conflict with any nimble-installed versions of zippy, so it should be isolated.

Chris Collazo added 4 commits September 20, 2024 14:09
…een read in as byte strings. This allows extracting from recursive archives in-memory.
…s to obtain reader pointer and length. Added wrapper procs to open an archive as either a file or a byte string. Removed module ziparchives_inmem.
@quantimnot
Copy link

Why not openArray[byte] and seq[byte]? More semantic, and can be converted into string, if needed for some reason.

@nervecenter
Copy link
Author

@quantimnot Primarily just because readFile() returns a string, so it's easy to just pass a file in. And materially it doesn't really change the content of my commit. Is there a way to convert string to either of those without provoking a copy?

@guzba
Copy link
Owner

guzba commented Nov 19, 2024

Hey sorry for not getting back to this.

Regarding the question from @quantimnot, ZipArchiveReader must have the actual backing data around for its lifetime in order to pull files out of it later, so an openarray parameter will just require me to do a copyMem. Not really serving any purpose in this case though it is fine as an option.

An important requirement of ZipArchiveReader is that it does not decompress all of the files into memory up front. There's good reasons for not doing that.

Supporting a string is fine but it will need to get stored. The parameter could be a sink parameter to potentially avoid a copy if it is not already.

I don't mind the idea of a ptr + len version of this, where it is the lib user's job to keep the data somewhere so the pointer stays valid, but that's another thing.

@guzba
Copy link
Owner

guzba commented Nov 19, 2024

Also I do agree with @nervecenter that string is simply better than seq[byte] or whatever. Every API in Nim that actually deals with bytes takes string so I gave up fighting this years go. Just embrace string and never ever ever use seq[byte], its just a trap (there is no actually good easy conversion, casting is not actually safe so its a copyMem to convert, yay).

@nervecenter
Copy link
Author

nervecenter commented Nov 19, 2024

@guzba I'm fairly certain I'm only reading the archive into memory as-is, and the procs I added decompress individual contents of the archive one at a time on demand in memory. If there's something extra going on behind the scenes, I apologize if that's causing issues. It's worked quite well for me in my in-production project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants