Comment by magicalhippo
5 months ago
On a semi-related note, there was recently a discussion[1] on the F3 file format, which also allows for format-aware compression by embedding the decompressor code as WASM. Though the main motivation for F3 was future compatibility, it does allow for bespoke compression algorithms.
This takes a very different approach, and wouldn't require a full WASM runtime. It does need the SDDL compiler and runtime, though I assume that's a lighter dependency.
[1]: https://news.ycombinator.com/item?id=45437759 F3: Open-source data file format for the future [pdf] (125 comments)
As someone seriously trying to develop a compressed archive format with WebAssembly, sandboxing is actually easy and that's indeed why WebAssembly was chosen. The real problem is determinism, which WebAssembly does technically support but actual implementations may vary significantly. And even when WebAssembly can be made fully deterministic, function calls made to those WebAssembly modules may still be nondeterministic! I tried very hard to avoid such pitfalls in my design, and it is entirely reasonable to avoid WebAssembly due to these issues.
I'm confused why determinism is a problem here? You write an algorithm that should produce the same output for a given input. How does WASM make that not deterministic?
Assume that I have 120 MB of data to process. Since this is quite large, implementations may want to process it in chunks (say, 50 MB). Now those implementations would call the WebAssembly module multiple times with different arguments, and input sizes would depend on the chunk size. Even though each call is deterministic, if you vary arguments non-deterministically then you lose any benefit of determinism: any bug in the WebAssembly module will corrupt data.
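To make that concrete, here is a rough Python sketch of the same situation scaled down to 12 bytes with 5-byte chunks (so the calls see 5, 5 and 2 bytes, like 50/50/20 MB). Everything in it is hypothetical: `buggy_delta_decode` stands in for an embedded decompressor with a made-up bug (it forgets its running sum between calls), and `decode_in_chunks` stands in for a host implementation that picks its own chunk size. It is not taken from any real format.

```python
from itertools import accumulate


def buggy_delta_decode(chunk: bytes) -> bytes:
    """Hypothetical embedded decoder: deterministic for a given input,
    but it restarts the running sum at 0 on every call (the bug)."""
    return bytes(v & 0xFF for v in accumulate(chunk))


def decode_in_chunks(data: bytes, chunk_size: int) -> bytes:
    """Hypothetical host: calls the decoder once per chunk.
    The chunk size is the host's choice, not part of the archive."""
    return b"".join(
        buggy_delta_decode(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    )


data = bytes([1] * 12)               # twelve deltas of +1
whole = decode_in_chunks(data, 12)   # host A: one call with all the data
chunked = decode_in_chunks(data, 5)  # host B: calls with 5, 5 and 2 bytes

print(list(whole))
print(list(chunked))
print(whole == chunked)
```

Each individual call is perfectly reproducible, yet the two hosts disagree on the output, because the sequence of arguments was never pinned down.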
And no mention of zpaq, which has had an embeddable decompressor feature for 15 years.
You did not read the white paper, then.
Isn't that a huge vector for viruses if executable code is included in the compressed archive?
Wasm can be sandboxed. It's as safe as visiting a website with JavaScript.
Can't the decompressor still produce a malicious uncompressed file?
So, not very safe.
I think this is the first time a genuine technical question of mine, rather than a social view, has been downvoted here. That's sad.