Comment by ArrayBoundCheck

4 years ago

Do you have any idea how many more bits you'd be able to use if you applied any of the encoding transformations?

I'd estimate that there's an easy order-of-magnitude improvement (~10x) just from implementing a simple error-correction mechanism -- a Reed-Solomon code ought to be good enough that we can take the squares down to 10x10, maybe even 8x8 or 5x5. Then, if we really work at it, we might be able to find another order-of-magnitude win (~100x) by packing more bits into a frequency-domain encoding scheme. This would likely require us to do some statistical analysis on the types of compression artifacts that YouTube introduces, in order to find a particularly robust set of basis images.
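For a rough sense of scale: shrinking the squares increases raw capacity quadratically, and even after spending a chunk of it on Reed-Solomon parity there's plenty left. A quick sketch (the 25% parity overhead is an assumption for illustration, not a tuned code rate):

```python
# Back-of-envelope capacity for one 1080p frame at various square sizes,
# assuming an error-correction code spends ~25% of raw bits on parity
# (that overhead figure is an assumption, not a measurement).
WIDTH, HEIGHT = 1920, 1080
BITS_PER_SQUARE = 3        # one bit per R/G/B channel, as in the original scheme
PARITY_OVERHEAD = 0.25     # fraction of raw capacity spent on parity

def payload_bits(square_size):
    """Usable bits per frame for a given square size, after parity."""
    squares = (WIDTH // square_size) * (HEIGHT // square_size)
    raw = squares * BITS_PER_SQUARE
    return int(raw * (1 - PARITY_OVERHEAD))

for size in (20, 10, 8, 5):
    print(size, payload_bits(size))
```

Going from 20px to 5px squares is a 16x raw-capacity gain, so even after parity you keep roughly an order of magnitude.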

  • > that we can take the squares down to 10x10, maybe even 8x8 or 5x5

    16x16, 8x8, or 4x4 would be the way to go. You'd want each RGB block to map to a single H.264 macroblock.

    Using a block size that isn't a power of two means individual data blocks don't line up with macroblocks. Having a single macroblock carry exactly 1, 4, or 16 data blocks would be ideal.

    In fact, I bet modifying the original code to use a scaling factor of 16 instead of 20 would produce some significant improvements.
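A quick sketch of why 16 beats 20 (assuming 16x16 macroblocks, as in H.264): with a 20px grid, most data-block edges fall in the middle of a macroblock, so one macroblock's quantization decisions smear across several data blocks.

```python
# Count how many data-block edges fall inside a macroblock rather than
# on a macroblock boundary, across a 1920-pixel-wide row (illustrative).
MACROBLOCK = 16
WIDTH = 1920

def misaligned_edges(block_size):
    """Data-block boundaries that do NOT coincide with a macroblock boundary."""
    edges = range(block_size, WIDTH, block_size)
    return sum(1 for e in edges if e % MACROBLOCK != 0)

print(misaligned_edges(20))  # most 20px edges land mid-macroblock
print(misaligned_edges(16))  # every 16px edge lines up exactly
```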

    • There's also the chroma subsampling issue. With the standard 4:2:0 ratio, the two chroma channels get half the horizontal and half the vertical resolution (a quarter of the samples), and if I'm not mistaken, they are also more aggressively quantized.

      It would be better to use YUV/YCbCr directly instead of RGB.
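The sample-budget difference is easy to quantify: 4:2:0 keeps luma at full resolution but carries Cb and Cr at quarter resolution, so a 2x2 block of pixels holds 4 + 1 + 1 = 6 samples instead of 12 for full RGB.

```python
# Sample budget per pixel under 4:2:0 versus full-resolution RGB (rough sketch).
def samples_per_pixel_420():
    # Each 2x2 pixel block: 4 Y samples + 1 Cb + 1 Cr.
    return (4 + 1 + 1) / 4   # = 1.5 samples per pixel

def samples_per_pixel_rgb():
    return 3.0               # R, G, B all at full resolution

print(samples_per_pixel_420(), samples_per_pixel_rgb())
```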

  • I'm not sure if your examples stick to 0 or 255 per RGB channel. If so, you might get a win by using HSL to pick your colors. If you change the lightness dramatically every frame, maybe colors won't bleed between frames. Then perhaps you can encode 2+ bits in hue and another 2+ in saturation, plus a minor win from 1+ bit of lightness (i.e. the first frame can be 0 or 25%, the next frame 75% or 100%). I'm not too familiar with video encodings, though, or with how much this would interfere with the other transforms.
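That idea can be sketched with the standard library's colorsys module; the specific hue values and per-frame lightness bands below are assumptions for illustration, not levels tested against YouTube's encoder.

```python
# Sketch: pack 2 bits into hue and 1 bit into lightness, alternating the
# lightness band each frame so consecutive frames differ sharply.
# The chosen levels are assumptions, not tuned values.
import colorsys

HUES = [0.0, 0.25, 0.5, 0.75]             # 2 bits of hue
LIGHT = [(0.125, 0.375), (0.625, 0.875)]  # low band on even frames, high on odd

def encode(bits3, frame_index):
    """Map 3 bits (hue high, hue low, lightness bit) to an RGB triple in [0, 1]."""
    hue = HUES[(bits3 >> 1) & 0b11]
    lightness = LIGHT[frame_index % 2][bits3 & 1]
    return colorsys.hls_to_rgb(hue, lightness, 1.0)  # full saturation

print(encode(0b101, frame_index=0))
```

Saturation is left at 1.0 here; spending bits on it as suggested would mean picking distinguishable saturation levels too.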

  • YouTube's 1080p60 is already at a decimation ratio of about 200:1, and then you have to consider how P and B frames exploit motion and inter-frame differences. If your data looks like noise, you're going to be completely screwed, since the P and B frames will absolutely destroy the quality.

    There are a bunch of other things too, like YUV420p and the TV colour range of 16-235, which leaves you only about 7.8 bits per sample.

    If anything, you would want to encode your data in some way that abuses the P and B frames and the 16x16 macroblock size.

    Coding theory for the data output on your end is only one side of the coin; the VP9 codec's stupidly good compression is a completely different game to wrangle.

    And I kinda doubt you'll get much better than your estimate of 1% from the original scheme.

    https://www.youtube.com/watch?v=r6Rp-uo6HmI
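The bits-per-sample and ~200:1 figures above can be sanity-checked with quick arithmetic; the ~8 Mbps YouTube 1080p60 delivery bitrate used below is an assumption and varies by video.

```python
import math

# Legal TV-range code values per 8-bit sample: 16..235 inclusive.
levels = 235 - 16 + 1
bits_per_sample = math.log2(levels)      # ~7.78 bits, not a full 8

# Raw 1080p60 in 8-bit YUV420 (1.5 samples per pixel) versus an assumed
# ~8 Mbps delivered bitrate (assumption; actual bitrates vary).
raw_bps = 1920 * 1080 * 1.5 * 8 * 60     # ~1.49 Gbps uncompressed
delivered_bps = 8_000_000
decimation = raw_bps / delivered_bps     # roughly 190:1, close to 200:1

print(round(bits_per_sample, 2), round(decimation))
```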