Modified module `ziparchives` to alternatively open zip files as byte strings. #90

nervecenter · 2024-09-30T20:24:39Z

The ziparchives module now has a modified ZipArchiveReader object type. It contains a string-based alternative to the memfile, plus a new ZipArchiveReaderMode that determines which field to read from. There are two new utility procs, getDataPtr() and getDataLen(), which return the cast data pointer or the data length, depending on the mode of the input reader.
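To illustrate the mode switch, here is a minimal sketch of the idea. The type name `Reader`, the enum name `ReaderMode`, and the field names are hypothetical stand-ins rather than the definitions in this PR; only the proc names getDataPtr()/getDataLen() and the two mode names are taken from the description.

```nim
import std/memfiles

type
  ReaderMode = enum        # hypothetical stand-in for ZipArchiveReaderMode
    MemfileMode, StringMode

  Reader = ref object      # hypothetical stand-in for ZipArchiveReader
    mode: ReaderMode
    memFile: MemFile       # used in MemfileMode
    data: string           # used in StringMode

proc getDataPtr(reader: Reader): ptr UncheckedArray[uint8] =
  ## Returns a pointer to the first byte of the backing data (nil if empty).
  case reader.mode
  of MemfileMode:
    result = cast[ptr UncheckedArray[uint8]](reader.memFile.mem)
  of StringMode:
    if reader.data.len > 0:
      result = cast[ptr UncheckedArray[uint8]](reader.data[0].addr)

proc getDataLen(reader: Reader): int =
  ## Returns the number of bytes available behind getDataPtr().
  case reader.mode
  of MemfileMode: reader.memFile.size
  of StringMode: reader.data.len
```

With both accessors in place, the shared parsing logic only needs a pointer and a length and does not have to care where the bytes came from.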

Most of openZipArchive() was moved into a new openZipArchiveInternal(), which features all of the internal zip archive reading logic. The proc openZipArchive*() is now a wrapper for initializing a reader in MemfileMode. The proc openZipArchiveBytes*() is a wrapper for opening a zip archive in StringMode, and takes a string of bytes; the returned reader can perform operations on those bytes as a .zip file, performing all operations in-memory.

Correspondingly, extractAll*() had an alternative spun out, extractAllBytes*(), which extracts a byte-string archive to the chosen directory. The common internal logic was given its own proc extractAllInternal(), and the directory check was given its own proc checkExtractDestination().

There are also two extra files present at a higher level. The inner_test.zip archive has three internal .zip files, each containing three internal text files, where the filename is a number and the contents are that number's corresponding whole English word. This is a test artifact for test_ziparchives_inmem.nim, which can be run from its own directory with nim r test_ziparchives_inmem.nim. This test extracts all the text files flatly to the working directory. There is an alternative that extracts each inner archive to its own directory using extractAllBytes*(). It should be noted that running this may conflict with any nimble-installed versions of zippy, so it should be isolated.


quantimnot · 2024-11-18T13:17:17Z

Why not openArray[byte] and seq[byte]? More semantic, and can be converted into string, if needed for some reason.

nervecenter · 2024-11-18T13:51:11Z

@quantimnot Primarily just because readFile() returns a string, so it's easy to just pass a file in. And materially it doesn't really change the content of my commit. Is there a way to convert string to either of those without provoking a copy?

guzba · 2024-11-19T06:22:22Z

Hey sorry for not getting back to this.

Regarding the question from @quantimnot, ZipArchiveReader must have the actual backing data around for its lifetime in order to pull files out of it later, so an openarray parameter will just require me to do a copyMem. Not really serving any purpose in this case though it is fine as an option.

An important requirement of ZipArchiveReader is that it does not decompress all of the files into memory up front. There are good reasons for not doing that.

Supporting a string is fine but it will need to get stored. The parameter could be a sink parameter to potentially avoid a copy if it is not already.
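A minimal sketch of what a sink parameter could look like here, assuming a hypothetical `BytesReader` type and `openBytes` proc (neither is part of zippy or of this PR). Whether the copy is actually avoided depends on the memory management mode and on the call site; a move is only possible when the argument is a temporary or is not used again afterwards.

```nim
type
  BytesReader = ref object   # hypothetical reader that owns its bytes
    data: string             # backing data, kept for the reader's lifetime

proc openBytes(data: sink string): BytesReader =
  # This is the last use of the sink parameter, so the compiler is
  # allowed to move the string into the field instead of copying it.
  BytesReader(data: data)

# Placeholder bytes standing in for the contents of a .zip file.
var payload = "PK\x05\x06" & newString(18)

# An explicit move hands ownership to the reader; payload must not be
# relied on afterwards.
let reader = openBytes(move(payload))
```

Passing a temporary such as `openBytes(readFile(path))` should qualify for the same treatment without an explicit `move`.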

I don't mind the idea of a ptr + len version of this, where it is the lib user's job to keep the data somewhere so the pointer stays valid, but that's another thing.

guzba · 2024-11-19T06:23:42Z

Also I do agree with @nervecenter that string is simply better than seq[byte] or whatever. Every API in Nim that actually deals with bytes takes string, so I gave up fighting this years ago. Just embrace string and never ever ever use seq[byte], it's just a trap (there is no actually good easy conversion, casting is not actually safe so it's a copyMem to convert, yay).
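For reference, the copyMem conversion mentioned above can be sketched roughly as follows. The helper names `toBytes` and `toString` are made up for illustration and are not part of zippy; each call allocates a new buffer and copies every byte.

```nim
proc toBytes(s: string): seq[byte] =
  ## Copies the bytes of a string into a newly allocated seq[byte].
  result = newSeq[byte](s.len)
  if s.len > 0:
    # unsafeAddr keeps this compiling on older Nim versions; newer
    # versions also accept addr here.
    copyMem(result[0].addr, s[0].unsafeAddr, s.len)

proc toString(bytes: seq[byte]): string =
  ## Copies a seq[byte] into a newly allocated string.
  result = newString(bytes.len)
  if bytes.len > 0:
    copyMem(result[0].addr, bytes[0].unsafeAddr, bytes.len)
```

The length checks matter because indexing element 0 of an empty string or seq is an error.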

nervecenter · 2024-11-19T14:11:36Z

@guzba I'm fairly certain I'm only reading the archive into memory as-is, and the procs I added decompress individual contents of the archive one at a time on demand in memory. If there's something extra going on behind the scenes, I apologize if that's causing issues. It's worked quite well for me in my in-production project.

Chris Collazo added 4 commits September 20, 2024 14:09

Added ziparchives_inmem, which can extract from archives which have been read in as byte strings. This allows extracting from recursive archives in-memory.

2031205

Folded in-memory archiver opener as a reader mode. Added utility procs to obtain reader pointer and length. Added wrapper procs to open an archive as either a file or a byte string. Removed module ziparchives_inmem.

e1f8dc6

Removed export of openZipArchiveInternal().

a081b01

Added extractAllBytes*(), separated extracting by file and by byte string.

a51102d
