The hardest part of a rewrite like this is usually maintaining bug-for-bug compatibility with the legacy parser rather than the actual Rust implementation. Most real-world media files are malformed in some way that the C++ code implicitly handled, so if you write a strict parser you end up breaking valid user data. Differential fuzzing seems like the only practical way to map that behavior without manually reviewing millions of edge cases.
It sounds like it's a design goal of this "wamedia" to _not_ maintain bug compatibility with media players.
I suspect it is actually about maintaining permissiveness for malformed inputs rather than keeping security bugs. I ran into this building ingestion for a print-on-demand service where users upload technically broken PDFs that legacy viewers handle fine. If the new parser is stricter than the old one you end up rejecting files that used to work, which is a non-starter for the product.
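To make the differential-testing idea above concrete, here is a minimal sketch in Rust. Everything in it is hypothetical: the toy `[len][payload]` format, `legacy_parse` (a stand-in for a permissive legacy parser that truncates overlong lengths), `strict_parse` (a stand-in for a stricter rewrite), and the hand-rolled `XorShift` generator. A real setup would more likely drive both implementations from a coverage-guided fuzzer; this only shows the compare-and-record loop.

```rust
/// Result of parsing a toy record laid out as [len: u8][payload bytes...].
/// The format and both parsers are illustrative stand-ins, not WhatsApp code.
#[derive(Debug, PartialEq, Eq)]
pub enum Parsed {
    Ok(Vec<u8>),
    Rejected,
}

/// Stand-in for a permissive legacy parser: if the length byte claims more
/// data than is present, it silently truncates instead of failing.
pub fn legacy_parse(input: &[u8]) -> Parsed {
    match input.split_first() {
        None => Parsed::Rejected,
        Some((&len, rest)) => {
            let take = usize::from(len).min(rest.len());
            Parsed::Ok(rest[..take].to_vec())
        }
    }
}

/// Stand-in for a strict rewrite: a length that overruns the buffer is an error.
pub fn strict_parse(input: &[u8]) -> Parsed {
    match input.split_first() {
        Some((&len, rest)) if usize::from(len) <= rest.len() => {
            Parsed::Ok(rest[..usize::from(len)].to_vec())
        }
        _ => Parsed::Rejected,
    }
}

/// Tiny deterministic xorshift generator so the sketch needs no external crates.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so clamp the seed to at least 1.
        XorShift(seed.max(1))
    }

    pub fn next_u8(&mut self) -> u8 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 32) as u8
    }
}

/// Feed the same random inputs to both parsers and collect every input
/// on which their results differ.
pub fn find_divergences(seed: u64, iterations: usize) -> Vec<Vec<u8>> {
    let mut rng = XorShift::new(seed);
    let mut divergent = Vec::new();
    for _ in 0..iterations {
        let len = usize::from(rng.next_u8() % 8);
        let input: Vec<u8> = (0..len).map(|_| rng.next_u8()).collect();
        if legacy_parse(&input) != strict_parse(&input) {
            divergent.push(input);
        }
    }
    divergent
}
```

Each input returned by `find_divergences` is a file the old parser accepted and the new one rejects, which is exactly the list you would need to triage into "keep tolerating" versus "reject on purpose".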
The focus on media parsing is smart - it's one of the most attack-prone surfaces in any messaging app. Media files are essentially untrusted input from the network that need complex processing.
What's interesting is the broader trend: Signal's libsignal is Rust, Matrix's vodozemac (Olm/Megolm implementation) is Rust, and now WhatsApp is moving this direction. The industry seems to be converging on Rust for the security-critical paths while keeping the UI layer in whatever makes sense for the platform.
The differential fuzzing approach they mention is key - you can't just rewrite and hope for the best. Real-world media is full of edge cases and malformed files that users expect to "just work." Having both implementations running in parallel during the transition gives you a safety net.
That's right, Signal (https://kerkour.com/signal-app-rust), Proton (https://kerkour.com/proton-apps-rust), Matrix, Wire and many more are using a shared, cross-platform Rust core and a platform-dependent UI layer.
And it's not only the security-critical paths, but also most of the business logic (see the 2 posts above).
I agree with everything you say. But wow, does that comment sound like AI. Probably Grok?
Not saying you are AI, you might just be a heavy user who picked up the same patterns.
If it were an old account I might have given them the benefit of the doubt, but they literally just joined to make this comment. There are so many green accounts popping up now that reek of AI. I've seen some where all of their comments are almost exactly the same length.
I like your AI slop detector, is it part of your consciousness?
The "is key - " is a key giveaway.
> Two major hurdles were the initial binary size increase due to bringing in the Rust standard library [...].
They don't say what they did about it, do they? Did they just accept it?
I suspect they just use no_std whenever it's applicable.
https://github.com/facebook/buck2/commit/4a1ccdd36e0de0b69ee...
https://github.com/facebook/buck2/commit/bee72b29bc9b67b59ba...
Turns out that if you have strong control over the compiler and linker instrumentation, there are a lot of ways to optimize binary size.
Probably yes. It's ~300KB per binary, and it's a one-time cost.
It can be avoided entirely by disabling the standard library, but that's inconvenient, and usually done only when writing for embedded devices.
Usually the problem isn't the size directly, but duplication of Rust dependencies in mixed C++/Rust codebases.
If you end up with a sandwich of build systems (when you have library dependencies like C++ => Rust => C++ => Rust), each Rust/Cargo build bundles its copy of libstd and crates. Then you need to either ensure that the linker can clean that up, or use something like Bazel instead of Cargo to make it see both Rust and C++ deps as part of a single dependency tree.
Who knows what they did, but there are things which can be done: https://github.com/johnthagen/min-sized-rust
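For reference, the size-focused release profile that the linked min-sized-rust guide describes looks roughly like the fragment below. These are standard Cargo profile keys; whether WhatsApp used any of them is not stated in the article, so treat this as a sketch of what is possible rather than what they did.

```toml
# Cargo.toml: release profile tuned for binary size rather than speed
[profile.release]
opt-level = "z"     # optimize for size
lto = true          # link-time optimization across crates
codegen-units = 1   # fewer codegen units, more room for optimization
panic = "abort"     # drop the unwinding machinery
strip = true        # strip symbols from the final binary
```

Note that `panic = "abort"` changes behavior (no unwinding, so panics cannot be caught), which is a trade-off and not a free size win.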
The whole article is a bit watery, which is why I read it as PR rather than a technical presentation.
> We believe that this is the largest rollout globally of any library written in Rust.
I suppose this is true because there are more phones running WhatsApp than there are, say, Windows 11 PCs.
Given that WhatsApp uses libsignal, is it safe to assume that they haven't been using the Rust library directly?
WhatsApp doesn't use libsignal, and Android is already pretty Rusty and deployed more widely than WhatsApp around the world (not just on smartphones; tons of "embedded" use cases also run on custom Android).
Like our gym devices that have a full tablet to run a basic application to control weights. Talk about wasting money.
If you watch "Microsoft is Getting Rusty: A Review of Successes and Challenges" it appears the whole effort is more on the Azure side, and besides some timid adoption like GDI regions, there is a lukewarm adoption of Rust on Windows side, still pretty much a C and C++ feud.
https://www.youtube.com/watch?v=1VgptLwP588
Just like Google’s Rust-in-Android blogs, this reads like a PR piece (and in the case of Facebook also a recruitment piece) with some technical words sprinkled in for effect. The overall communication quality is that of a random startup’s “look what we did” posts.
The interesting aspects, such as how they protect against supply-chain attacks from the dependency-happy Rust toolchain or how they integrated the C++ code with the Rust code on so many platforms - a top challenge, as they said - remain a mystery.
Would also be interesting to hear how much AI-driven development they used for this project. My hope is that AI gets really good at Rust so one doesn’t have to directly interact with the unergonomic syntax.
Very cool! I'm wondering if Signal is doing something similar? libsignal is implemented in Rust, but I don't know about the other parts.
Quite impressive, I did not know so many bugs were due to memory access errors.
To be fair the increased reliability of Rust code over C++ isn't just because of memory errors (out-of-bounds accesses, use-after-free, type confusion, etc). You also get:
* No undefined behaviour (outside `unsafe`, which is quite easy to avoid). In C++ there are many, many sources of UB that aren't really memory errors directly, e.g. signed integer overflow or forgetting to `return` from a function.
* A much stronger type system.
Those two things have a really significant impact on reliability.
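A small sketch of what those two points look like in safe Rust. The function and type names (`total_samples`, `wrapping_total`, `Codec`, `codec_name`) are made up for illustration. Plain `a + b` on `i32` panics on overflow in debug builds and wraps in release builds by default; neither is undefined behaviour, and the explicit methods below let you pick the behaviour you want.

```rust
/// Adds two sample counts, returning `None` on overflow instead of
/// invoking undefined behaviour as signed overflow would in C++.
pub fn total_samples(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Same addition with explicitly requested two's-complement wrapping.
pub fn wrapping_total(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// A closed set of codecs; the compiler knows every possible variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Opus,
    Aac,
}

/// The `match` must cover every variant and every arm must produce a
/// `&'static str`, so "forgot to return a value" fails to compile.
pub fn codec_name(codec: Codec) -> &'static str {
    match codec {
        Codec::Opus => "opus",
        Codec::Aac => "aac",
    }
}
```

Adding a third variant to `Codec` without updating `codec_name` is a compile error, which is the kind of check the "stronger type system" point refers to.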
Cool - now we only need to get selling-you-out-for-profit-Zuckerberg out of WhatsApp to make it really trustworthy.