Comment by anyfoo

4 years ago

I only looked at the example video, but is the concept just "big enough pixels"?

Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].

Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.

[1] If that's what video encoding still does, I really have no idea, but you get the point.

Agree it would be cool to be "untouched" by the compression algorithm, but that's nearly impossible with YouTube. YouTube encodes down to several different versions of a video and on top of that, several different codecs to support different devices with different built-in video hardware decoders.

For example, when I upload a 4K vid and then watch the 4K stream on my Mac vs my PC, I get different video files based solely on what the browser reveals about the OS I'm running.

Handling this compression protection for so many different codecs is likely not feasible.

  • Yes, but nothing is saying this has to work for every codec. Since you want to retrieve the files using a special client, you could pick the codec you like.

    But (almost) nothing prevents YouTube from not serving that particular codec anymore. This still pretty much falls under the "re-encoding" case I mentioned which would make the whole thing brittle anyway.

    But it's indeed cool to think about. 8)

  • How about a Fourier transform (or cosine, whichever works best), keeping the data as frequency-component coefficients? That’s the rough idea behind digital watermarking, and it survives image transforms quite well.
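    A minimal sketch of that idea, assuming a JPEG-like 8x8 DCT pipeline. The coefficient position, embedding strength, and quantization step below are made-up illustration values, not from any real watermarking scheme: embed one bit in the sign of a mid-frequency DCT coefficient and check that it survives coefficient quantization.

```python
# Toy DCT watermark: one bit hidden in the sign of a mid-frequency
# coefficient of an 8x8 block. All parameters are illustrative only.
import numpy as np

N = 8

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (the transform JPEG uses)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    d = np.cos(np.pi * (2 * m + 1) * k / (2 * n)) * np.sqrt(2 / n)
    d[0, :] = np.sqrt(1 / n)
    return d

D = dct_matrix(N)

def embed_bit(block, bit, strength=40.0):
    """Force the sign of coefficient (2, 3) to carry one bit."""
    c = D @ block @ D.T                 # forward 2-D DCT
    c[2, 3] = strength if bit else -strength
    return D.T @ c @ D                  # inverse 2-D DCT

def extract_bit(block):
    c = D @ block @ D.T
    return int(c[2, 3] > 0)

def quantize(block, step=16):
    """Crude stand-in for lossy compression: quantize all coefficients."""
    c = D @ block @ D.T
    return D.T @ (np.round(c / step) * step) @ D

rng = np.random.default_rng(0)
original = rng.uniform(0, 255, (N, N))
marked = embed_bit(original, 1)
survived = quantize(marked)            # bit still readable after "compression"
```

The point is that quantization perturbs the coefficient's magnitude but, with enough embedding strength, not its sign, which is why this class of watermark tends to survive mild recompression.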

  • Just as an aside, it's absolutely astounding how much hardware Google must throw at YouTube to achieve this for any video anybody in the world wants to upload. The processing power to re-encode into so many versions, then to store all of those versions, and then to make all of them accessible anywhere in the world at a moment's notice. Really is such an incredible waste for most YouTube content.

    • Keep in mind that they can predict quite well which videos will get any meaningful number of views, and just encode the rest, the vast majority, on demand.


  • What if you have an ML model that produces a vector from a given image? You have a set of vectors that correspond to bytes; for a simple example, you have 256 "anchor vectors", one for each possible byte.

    To encode an arbitrary sequence of bytes, for each byte you produce an image that your ML model would convert to the corresponding anchor vector for that byte, and add the image as a frame in a video. Once all the bytes have been converted to frames, you upload the video to YouTube.

    To decode the video, you simply go frame by frame and send each frame to your model. The model produces a vector, and you find which of your anchor vectors is the nearest match. Even though YouTube will have compressed the video in who knows what way, and even if YouTube's compression changes, the resulting images should look similar; if your anchors are well chosen and your model works well, you should be able to tell which anchor a given image was intended to correspond to.
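    A toy sketch of that scheme, with the ML model swapped out for something trivially simple: the "model" here is just per-quadrant mean brightness, and four gray levels per quadrant stand in for learned anchors. Every name, size, and noise level below is invented for illustration.

```python
# Anchor-vector encode/decode toy: each byte becomes a frame whose four
# quadrants carry 2 bits each; nearest-anchor lookup decodes it back.
import numpy as np

LEVELS = np.array([32.0, 96.0, 160.0, 224.0])   # gray levels, 2 bits each
QUADS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def byte_to_frame(b, size=64):
    """Encode one byte as a frame; each quadrant holds 2 of its bits."""
    h = size // 2
    frame = np.empty((size, size))
    for i, (r, c) in enumerate(QUADS):
        bits = (b >> (2 * i)) & 0b11
        frame[r*h:(r+1)*h, c*h:(c+1)*h] = LEVELS[bits]
    return frame

def model(frame):
    """Stand-in for the ML model: map a frame to a 4-D feature vector."""
    h = frame.shape[0] // 2
    return np.array([frame[r*h:(r+1)*h, c*h:(c+1)*h].mean()
                     for r, c in QUADS])

# 256 anchor vectors: the model's output on each byte's clean frame.
ANCHORS = np.stack([model(byte_to_frame(b)) for b in range(256)])

def decode_frame(frame):
    """Nearest-anchor lookup recovers the byte despite lossy damage."""
    v = model(frame)
    return int(np.argmin(np.linalg.norm(ANCHORS - v, axis=1)))

rng = np.random.default_rng(1)
payload = b"hi"
frames = [byte_to_frame(b) for b in payload]
# Simulate compression damage with additive noise on every pixel.
noisy = [f + rng.normal(0, 12, f.shape) for f in frames]
decoded = bytes(decode_frame(f) for f in noisy)
```

Averaging over a whole quadrant washes out per-pixel damage, which is the same robustness argument the comment makes for a learned model's features.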

    • Why go that way? I’m no digital signal processing expert, but images (and series thereof, i.e. videos) are 2D signals. What we see is the spatial domain, and analyzing pixel by pixel is naive and won’t get you very far.

      What you need is to go to the frequency domain. From my own experiments back in university, the most significant image information lies in the lowest frequencies. Cutting off everything above the lowest ~10% of frequencies leaves a very comprehensible image with only wavy artifacts around objects. So you have plenty of bandwidth to use even if you want to embed info in existing media.

      And here you have the full bandwidth to use. Start in the frequency domain, decide on the lowest bandwidth you’ll allow, and set the coefficients of the harmonic components. Convert to the spatial domain, upscale, and you’ve got your video to upload. This should leave you with data encoded in a way that survives compression and resizing; you just need to allow some room for it.

      You could slap error correction codes on top.

      If you think about it, you should consider video as, say, copper wire or radio. We’ve come quite far transmitting over those media without ML.
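      A rough numpy sketch of that pipeline in 1D, under made-up assumptions: an arbitrary signal length and amplitude, a stand-in "compressor" that simply keeps only the lowest 10% of frequencies (matching the ~10% figure mentioned above), and a 3x repetition code as the error-correction layer.

```python
# Frequency-domain channel coding toy: payload bits live in the signs of
# the lowest-frequency bins, so a low-pass "codec" leaves them untouched.
import numpy as np

N = 1024                        # samples per encoded "scanline"
KEEP = N // 10                  # the compressor keeps frequencies < KEEP

def encode(bits, amp=50.0):
    """One bit per low-frequency bin, each repeated 3x for redundancy."""
    rep = [b for b in bits for _ in range(3)]
    assert len(rep) < KEEP      # stay inside the band that survives
    spectrum = np.zeros(N // 2 + 1, dtype=complex)
    for k, b in enumerate(rep, start=1):
        spectrum[k] = amp if b else -amp
    return np.fft.irfft(spectrum, N)     # back to the spatial domain

def lossy_compress(signal):
    """Stand-in for the video codec: discard the top 90% of frequencies."""
    spectrum = np.fft.rfft(signal)
    spectrum[KEEP:] = 0
    return np.fft.irfft(spectrum, N)

def decode(signal, nbits):
    spectrum = np.fft.rfft(signal)
    rep = [int(spectrum[k].real > 0) for k in range(1, 3 * nbits + 1)]
    # Majority vote over each group of 3 repeated bits.
    return [int(sum(rep[3*i:3*i+3]) >= 2) for i in range(nbits)]

bits = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = decode(lossy_compress(encode(bits)), len(bits))
```

Because the payload never touches the frequencies the low-pass cut removes, it decodes exactly; a real codec would also perturb the surviving bins, which is what the amplitude margin and repetition code are there to absorb.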


    • That's essentially a variant of "bigger pixels". Just like them, your algorithm cannot guarantee that an unknown codec will still make the whole thing perform adequately.

      Even if you train your model to work best for all existing codecs (I assume that's the "ML" part of the ML model), the no free lunch theorem pretty much tells us that it can't always perform well for codecs it does not know about.

      (And so does entropy. Reducing to absurd levels, if your codec results in only one pixel and the only color that pixel can have is blue, then you'll only be able to encode any information in the length of the video itself.)


Back in the day, VCRs were commonly used as tape backup devices for data.

Now studios are using motion-picture film to store data, since it's known to be stable for a century or more.

YouTube lets you download your uploaded videos. I’ve never tested it, but supposedly it’s the exact same file you uploaded.[a] It probably wouldn’t work with this “tool”, as it uses the video ID (so I assume it’s downloading what clients see, not the source), but it’s an idea for some other variation on this concept.

[a] That way, in the future, if there are any improvements to the transcode process that make smaller files (a different codec or whatever), they still have the HQ source

  • They may retain the original files, but they don't give that back to you in the download screen. I just tested it by going to the Studio screen to download a video I uploaded as a ~50GB ProRes MOV file and getting back an ~84MB H264 MP4.

YT might still recompress your video, possibly using proprietary algorithms that are not necessarily DCT-based.

  • As said, it falls apart with re-encoding. But it is a bit more interesting than what are more or less QR codes.

    • I find it a bit more interesting to have something that actually works on youtube, even if only as a proof of concept.