Comment by theamk
5 months ago
this is messed up
> The decoding performance slowdown of Wasm is minimal (10–30%) compared to a native implementation.
so... you take a 10–30% performance hit _right away_, and you perpetually give up any opportunity to improve the decoder in the future. And you also give up any advanced decoding functions other than "decode the whole block and store it into memory".
I have no idea why anyone would do this. If you care about speed, then WASM is not going to cut it. If you don't care about speed, you don't need super-fancy encoding algorithms; just use any of the well-known ones.
> so... you take a 10–30% performance hit _right away_, and you perpetually give up any opportunity to improve the decoder in the future.
The WASM decoder is meant as a backup. If you have a native decoder installed (e.g., as a crate), then the system will prefer to use that. Otherwise, it falls back to WASM. A 10–30% performance hit is worth it compared to not being able to read the file at all.
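The fallback policy described above can be sketched in a few lines. This is a hypothetical illustration, not code from the paper: `NATIVE_DECODERS`, `decode`, and the `rle` encoding are all made-up names, and the embedded WASM decoder is stood in for by a plain callable.

```python
# Hypothetical sketch of the fallback policy: prefer a native decoder when
# one is registered for the encoding, otherwise fall back to the (slower)
# decoder embedded in the file itself. All names here are illustrative.

# Registry of natively implemented decoders, keyed by encoding name.
# "rle" decodes a list of (byte, run_length) pairs into raw bytes.
NATIVE_DECODERS = {
    "rle": lambda block: bytes(b for b, n in block for _ in range(n)),
}

def decode(encoding, block, embedded_wasm_decode):
    """Decode a block, preferring a native decoder over the embedded one."""
    native = NATIVE_DECODERS.get(encoding)
    if native is not None:
        return native(block)            # fast path: native implementation
    return embedded_wasm_decode(block)  # slow path: decoder shipped in the file
```

For a known encoding the native fast path runs; for a novel encoding the file remains readable through whatever decoder it carries, at the cost of the performance hit discussed above.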
It even says so right in the abstract:
"Embedding the decoders in each file requires minimal storage (kilobytes) and ensures compatibility on any platform in case native decoders are unavailable."
The idea that software I write today can decode a data file written ten years from now using new encodings is quite appealing. Likewise, the idea that new software written to make use of the new encodings doesn't have to carry the burden of implementing the whole history of encoders for backwards compatibility.
Now you have code stored in your database, and you don't know what it will do when you execute it.
Sounds very much like the security pain from macros in Excel and Microsoft Word that could do anything.
This is why most PDF readers will ignore any javascript embedded inside PDF files.
I kind of agree with you, but there's more to the picture.
The situation you describe already exists with various approaches to compression. For example, perhaps we decide to bitpack instead of using the generic compressor, or change compressors entirely.
This sort of thing exists without WASM, and it means you have to "transcode", i.e., rewrite the file after updating your software with the new techniques.
With WASM, it's the same. You just rewrite the file.
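The transcoding step mentioned above amounts to decoding with the old scheme and re-encoding with the new one. Here is a minimal, purely illustrative sketch of the bitpacking example from the earlier comment: the old layout stores one 4-bit value per byte, the new layout packs two per byte. The function names and layouts are assumptions for illustration only.

```python
# Illustrative "transcode" step: decode with the old scheme, re-encode with
# the new one. Layouts are hypothetical examples of a bitpacking change.

def decode_old(data):
    """Old layout: one 4-bit value stored per byte."""
    return list(data)

def encode_new(values):
    """New layout: two 4-bit values packed into each byte."""
    if len(values) % 2:
        values = values + [0]  # pad to an even count of values
    return bytes((a << 4) | b for a, b in zip(values[::2], values[1::2]))

def transcode(data):
    """Rewrite a payload from the old encoding to the new one."""
    return encode_new(decode_old(data))
```

Whether the decoder ships natively or embedded as WASM, this rewrite pass is the same amount of I/O, which is the cost the next comment is concerned about.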
I do agree that this pushes the costs of iteration up the stack in a vastly less efficient way. Overall this seems far more expensive, and it's very unclear that future-proofing is worth it. I've worked with exabyte-scale systems, and regularly re-encoding swaths of data would not be good.