This is a fork of the bundled Chrome OS ZIP Unpacker extension. It enables support for a wide variety of archive and compression formats. It also supports files that have only been compressed (e.g. foo.gz). All of this is thanks to the great libarchive project.
You can install it via the CWS: https://chrome.google.com/webstore/detail/mljpablpddhocfbnokacjggdbmafjnon
Note that we support archives (compressed or uncompressed), and we support single compressed files (that have no archiving, e.g. foo.gz).
Here's the list of supported archive formats:
- 7z: The 7-Zip format.
- ar: Simple UNIX archive, usually for developers.
- cab: Microsoft's cabinet archive format.
- cpio: Classic UNIX archive that still shows up, but has been largely replaced by tarballs.
- crx: Google Chrome extensions.
- deb: Debian package archive format used by Linux distros based on Debian (like Ubuntu).
- iso: ISO 9660 CD disk images. Note: UDF DVD disk images are not supported.
- jar: Java ARchives used by programmers.
- lha/lzh: LHA and LZH are common formats in Japan, and used by many old school video games.
- mtree: The BSD mtree format for mapping a directory tree.
- pax: Portable Archive Exchange format is a UNIX format meant to replace cpio and tar.
- rar: RAR archives produced by WinRAR. This is mostly for testing, so the native CrOS support should be used by default instead.
- rpm: RPM Package Manager archive used by Linux distros like Red Hat, Fedora, CentOS, SUSE, and more.
- tar: UNIX tarballs that are common in the Linux computing world.
- warc: The Web ARChive format for archiving web sites.
- zip: The venerable ZIP. This is mostly for testing, so the native CrOS support should be used by default instead.
Here's the list of supported compression/encoding formats:
- bz2/bzip/bzip2: The bzip2 compression format common in the UNIX world.
- gz/gzip: The gzip compression format based on zlib. Common in the UNIX world, although zlib is also used in many places.
- lzma: The lzma compression format that has been largely replaced by xz. The compression algorithm is used by other formats, but the standalone format is not.
- lz4: The LZ4 compression algorithm.
- lzip: The lzma compression algorithm in the lzip format.
- lzop: The LZO compression algorithm in the lzop format.
- uu: The Unix-to-Unix (uuencode) text encoding format.
- xz: The xz compression format that is common in the UNIX world.
- Z: The compress legacy format that still shows up at random.
- zstd: The Zstandard algorithm developed by Facebook.
Most archive formats don't include an index. This means we need to decompress the entire file just to get a directory listing. The formats allow any ordering by design. For example, it could be ./bar.txt, ./foo/blah.txt, ./asdf.txt. Or it could be ./asdf.txt, ./foo/blah.txt, and ./bar.txt. The only way we can produce a complete directory listing is by looking through the entire file. This slows things down overall (like in tarballs) and there isn't much that can be done about it.
However, there are some file formats that do include indexes, and we don't (yet) support using them. 7-Zip is the most notable one here.
A similar issue comes up with single compressed files. Many formats do not record the uncompressed file size, so the only way to calculate it is by decompressing the entire file. If we were to report a fake file size (like zero bytes, or a really large file size) to the Files app, it wouldn't be able to copy the result out: it would try to read the number of bytes that it was told were available. For the few formats that do record the uncompressed size (like gzip, which stores it in its trailer), we can skip the decompression overhead.
Some formats can be encrypted with passwords, but we don't prompt the user, so the files aren't decrypted. Oops.
Some formats can span multiple files, but we don't yet support those.
"It's complicated."
The WGU extension doesn't support the RAR format today. Chrome OS supports it natively via cros-disks -> AVFS -> the official unrar program. We can't replace that stack until we have comparable coverage.
The RAR format has gone through a number of major revisions (at least 5 so far). A smart Russian came up with it long ago and continues to develop it as a company (RARLAB). It's a proprietary format and, while some code has been released by them, they are hostile to reverse engineering. As such, libarchive only supports the v1, v2, and v3 formats. Unfortunately, the v4 and v5 formats are common, and users tend to use those more.
There is an open source unrar library released by RARLAB, but the API is not documented, and its runtime model does not mesh well with libarchive's runtime model. It's possible, but it's not trivial.
Sometimes people ask: since WGU is based on the official Chrome OS ZIP unpacker that is bundled with Chrome OS today, why don't we just merge the two so that Chrome OS supports everything WGU does out of the box?
"It's complicated."
From the product team's perspective, they don't want to support an extensive set of formats if there is not high user demand for them. If users run into problems (and they inevitably will), the engineering costs aren't justified.
Similarly, they don't want to say "ZIP is officially supported, but all other formats are 'best effort'". Most users don't care about those trade-offs -- they just want their system to work. All they see is that they tried to open a 7z file and it didn't work even though opening a different 7z file worked. Trying to explain these nuances doesn't really scale.
Thus the status quo is to not support the formats at all. Users can try and locate alternatives (like WGU), and in the process of doing so, understand that the resulting software might be buggy. And those bugs are not the fault of the Chrome OS product (although some will still complain that Chrome OS should have included support out of the box).
Everyone's position is reasonable taken in isolation. But the end result is that everyone loses. Offering best-effort support makes users unhappy, but offering nothing also makes them unhappy. At least this way, the blowback on the Chrome OS product is lower.
Please use the issues link here to report any issues you might run into.
This is the ZIP Unpacker extension used in Chrome OS to support reading and unpacking of zip archives.
Since the code is built with NaCl, you'll need its toolchain.
$ cd third-party
$ make nacl_sdk
We'll use libraries from webports.
$ cd third-party
$ make depot_tools
$ make webports
First install npm using your normal packaging system. On Debian, you'll want something like:
$ sudo apt-get install npm
Your distro might ship an old version of npm, in which case you may have to install a newer one yourself.
Then install the npm modules that we require. Do this in the root of the unpacker repo.
$ npm install bower vulcanize crisper
Once done, build the libarchive-fork/ in third-party/ of the unpacker project. Note that you cannot use the libarchive or libarchive-dev packages from webports at the moment, as not all of the fork's patches have been upstreamed.
$ cd third-party
$ make libarchive-fork
Polymer is used for the UI. To fetch it, run the following in the same directory:
$ make polymer
Build the PNaCl module.
$ cd unpacker
$ make [debug]
The package can be found in the release or debug directory. You can run it directly from there using Chrome's "Load unpacked extension" feature, or you can zip it up for posting to the Chrome Web Store.
$ zip -r release.zip release/
Once it's loaded, you should be able to open ZIP archives in the Files app.
Paths that aren't linked below are dynamically created at build time.
- node_modules/: All the locally installed npm modules used for building.
- third-party/: The source for third-party NaCl & Polymer code.
- libarchive-fork/: The libarchive NaCl module (w/custom patches).
- polymer/: Polymer code for UI objects.
- unpacker/: The extension CSS/HTML/JS/NaCl source code.
- cpp/: The NaCl module source.
- css/: Any CSS needed for styling UI.
- debug/: A debug build of the Chrome extension.
- html/: Any HTML needed for UI.
- icons/: Various extension images.
- js/: The JavaScript code.
- _locales/: Translations of strings shown to the user.
- pnacl/: Compiled NaCl objects & module (debug & release).
- release/: A release build of the Chrome extension.
- unpacker-test/: Code for running NaCl & JavaScript unittests.
Some high level points to remember: the JS side reacts to user events and is the only part that has access to actual data on disk. It uses the NaCl module to do all the data parsing (e.g. gzip & tar), but it has to both send a request to the module ("parse this archive"), and respond to requests from the module when the module needs to read actual bytes on disk.
When the extension loads, background.js registers everything and goes idle.
When the Files app wants to mount an archive, callbacks in app.js (unpacker.app) are called to initialize the NaCl runtime. An unpacker.Volume object is created for each mounted archive.
Requests on the archive (directory listings, metadata lookups, reading files) are routed through app.js (unpacker.app) to volume.js (unpacker.Volume). Then they are sent to the low level decompressor.js (unpacker.Decompressor), which talks to the NaCl module using the request.js (unpacker.request) protocol. Responses are passed back up.
When the NaCl module is loaded, NaclArchiveModule (module.cc) is instantiated. That instantiates NaclArchiveInstance for the initial JS message entry points, which in turn instantiates JavaScriptMessageSender for sending requests back to JS.
When JS requests come in, NaclArchiveInstance (module.cc) will create Volume (volume.h) objects on the fly, and pass requests down to them (using the request::* protocol defined in request.h).
Volume objects in turn use the VolumeArchive (volume_archive.h) abstract interface to handle requests from the JS side (again using the request::* protocol defined in request.h). This way the lower levels don't have to deal with JS directly.
VolumeArchiveLibarchive (volume_archive_libarchive.cc) implements the VolumeArchive interface and uses libarchive as its backend to do all the decompression & archive format processing.
But NaCl code doesn't have access to any files or data itself. So the VolumeReader (volume_reader.h) abstract interface is passed to it to provide the low level data read functions. VolumeReaderJavaScriptStream (volume_reader_javascript_stream.cc) implements that by passing requests back up to the JS side via the JavaScriptRequestorInterface (javascript_requestor_interface.h) interface (which was passed down to it).
So requests (mount an archive, read a file, etc...) generally follow the path:
- JavaScript (sends a request::* message)
- module.cc: NaclArchiveModule / NaclArchiveInstance
- volume.cc: Volume
- volume_archive.h: VolumeArchive
- volume_archive_libarchive.cc: VolumeArchiveLibarchive
- volume_reader.h: VolumeReader
- volume_reader_javascript_stream.cc: VolumeReaderJavaScriptStream
- javascript_requestor_interface.h: JavaScriptRequestorInterface
- volume.cc: JavaScriptRequestor
- javascript_message_sender_interface.h: JavaScriptMessageSenderInterface
- module.cc: JavaScriptMessageSender (sends a request::* message)
- JavaScript (to read actual bytes of data)
Responses then travel back through the same layers:
- javascript_message_sender_interface.h
- volume.cc
- javascript_requestor_interface.h
- volume_reader_javascript_stream.cc
- volume_reader.h
- volume_archive_libarchive.cc
- volume_archive.h
- volume.cc
- module.cc
Then once VolumeArchive has processed the raw data stream, it can return results to the Volume object, which takes care of posting JS status messages back to the Chrome side.
Here's the JavaScript code that matters. A few files have very specific purposes and can be ignored at a high level, so they're in a separate section.
- background.js
- Main entry point.
- Initializes the module/runtime.
- Registers the extension with Chrome filesystem/runtime.
- app.js (unpacker.app)
  - Main runtime for the extension.
  - Loads/unloads NaCl modules on demand (to save runtime memory).
  - Loads/unloads volumes as Chrome has requested.
  - Responds to Chrome filesystem callbacks.
  - Passes data back to Chrome from unpacker.Volume objects.
- volume.js (unpacker.Volume)
  - Every mounted archive has an unpacker.Volume instance.
  - Provides high level interface to requests (like reading files & metadata).
- decompressor.js (unpacker.Decompressor)
  - Provides low level interface for unpacker.Volume requests.
  - Talks to the NaCl module using the unpacker.request protocol.
- request.js (unpacker.request)
  - Handles the JS<->NaCl protocol communication.
- passphrase-manager.js (unpacker.PassphraseManager)
  - Interface for plumbing password requests between the UI, JS, & NaCl.
These are the boilerplate/simple JavaScript files you can generally ignore.
- build-config.js
  - Unused?
- passphrase-dialog.js
  - Polymer code for managing the password dialog.
- types.js (unpacker.types)
  - Basic file for setting up custom types/constants.
- unpacker.js (unpacker)
  - Basic file for setting up the unpacker.* namespace.
Here's the NaCl layout.
- module.cc
  - Main entry point to the module.
  - Implements the ppapi interface.
  - Routes requests between the JS & NaCl layers.
  - Implements the JavaScriptMessageSenderInterface interface so the rest of the NaCl code can easily send messages back up.
- javascript_message_sender_interface.h
  - API for the NaCl code to easily send messages back up.
  - Implemented in module.cc.
- request.cc request.h
  - Defines the protocol used to communicate between JS & NaCl.
- javascript_requestor_interface.h
  - Abstract JavaScriptRequestorInterface interface for talking to the JS side.
  - Implemented in volume.cc.
- volume.cc volume.h
  - Defines the Volume class that encompasses a high level volume.
  - Every mount request has a Volume.
  - Takes care of plumbing JS requests to the lower VolumeArchive.
  - Implements the JavaScriptRequestorInterface interface.
- volume_archive.h
  - Abstract VolumeArchive interface for handling specific archive formats.
- volume_archive_libarchive.cc volume_archive_libarchive.h
  - Implements VolumeArchive using the libarchive project.
- volume_reader.h
  - Abstract VolumeReader interface for low level reading of data.
  - Used to read raw streams of data from somewhere.
- volume_reader_javascript_stream.cc volume_reader_javascript_stream.h
  - Implements VolumeReader.
  - Uses JavaScriptRequestorInterface to get data from the JS side.
To see debug messages, open Chrome from a terminal and check the output. For output redirection, see https://developer.chrome.com/native-client/devguide/devcycle/debugging.
Install Karma as the test runner, Mocha for asynchronous testing, Chai for assertions, and Sinon for spies and stubs.
$ npm install --save-dev \
karma karma-chrome-launcher karma-cli \
mocha karma-mocha karma-chai chai karma-sinon sinon
# Run tests:
$ cd unpacker-test
$ ./run_js_tests.sh # JavaScript tests.
$ ./run_cpp_tests.sh # C++ tests.
# Check JavaScript code using the Closure JS Compiler.
# See https://www.npmjs.com/package/closurecompiler
$ cd unpacker
$ npm install google-closure-compiler
$ bash check_js_for_errors.sh