Comment by alphazard
12 hours ago
This was a fascinating read, thanks for posting.
I'm not totally convinced that the threat model is realistic. The watermarker has to embed the watermark, the only place to do that is in the least significant bits of whatever the message is. If it's an audio file then the least significant bits of each sample would work. If it's a video file then the LSBs in a DCT bin may also be unnoticeable. It can really only go in certain places, without it affecting the content in a meaningful way. If it's in a header, or separate known location, then the pirate can just delete those bits.
The threat model presented says the pirates have to go with one of the copies, or only correct errors that are different between 2 copies. That's the part that I don't think is realistic. If the pirates knew that the file was marked, and the scheme used to mark it, but didn't know the key (a standard threat model for things like encryption), then they could inject their own noise into wherever the watermark could be hiding, and now the problem is the watermarker trying to send a message on a noisy channel, where the pirates have a jammer. I don't even think you have to sacrifice quality, since the copy you have already has noise, and you just need to inject the same amount (or more).
It's more sophisticated than that. A single movie can be fragmented into 1000s of fragments, each fragment carries 1 bit. It's called A/B forensic watermarking. So you need to insert a 1-bit watermark into a video segment that is a few megabytes, there is no feasible way to defeat this as a pirate unless the watermarker is incompetent. Averaging will not work.
See AWS offering:
<https://docs.aws.amazon.com/wellarchitected/latest/streaming...>
This will be more challenging for text. Not as difficult for images.
> the only place to do that is in the least significant bits
This is also false, it's the most naive way to watermark content. They do it in the mid range frequencies these days. And then make the watermarks robust to resizing, re-encoding, cropping and even rotation in some cases. They survive when someone holds a camera to record a screen.