
Comment by amiga386

7 months ago

Yes, and it's the format changing that allows encoders to be more clever, or to apply more processing power.

For example, changes from one frame to the next are encoded in rectangular areas called "superblocks" (similar to a https://en.wikipedia.org/wiki/Macroblock). You can "move" the blocks (warp them), define their change in terms of other parts of the same frame (intra-frame prediction) or by referencing previous frames (inter-frame prediction), and so on... but you have to do it within a block, as that's the basic element of the encoding.
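
To make that concrete, here's a minimal numpy sketch of inter-frame prediction for a single block. Everything in it is hypothetical (the function name, the frame arrays, the already-found motion vector); a real encoder would search for the vector and transform-code the residual:

    import numpy as np

    def inter_predict_block(prev_frame, cur_frame, y, x, size, mv):
        # Hypothetical illustration: predict one block of the current frame
        # by copying a motion-shifted block from the previous frame.
        dy, dx = mv  # motion vector, found by the encoder's search
        prediction = prev_frame[y + dy:y + dy + size, x + dx:x + dx + size]
        residual = cur_frame[y:y + size, x:x + size].astype(np.int16) - prediction
        # The bitstream carries mv plus the (transformed, quantised) residual;
        # the decoder repeats the copy and adds the residual back.
        return prediction, residual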

The more tightly you can fit blocks around the areas that are actually changing from frame to frame, the better. However, it takes data to describe where these blocks are, so there are strict limits on how blocks can be defined, to minimise how many bits are needed to describe them.

AV2 now lets you define blocks differently, which makes it easier to fit them around the areas of the frame that are changing. It has also doubled the size of the largest block, so if you have some really big movement on screen, it takes fewer blocks to encode that.
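
As a toy illustration of why partition flexibility matters, here's a hypothetical quadtree-style splitter (real AV1/AV2 partitioning also allows non-square splits, and the threshold here is invented). It keeps quiet regions as one big block and only spends split decisions -- i.e. bits -- where the frame difference is busy:

    import numpy as np

    def partition(diff, y, x, size, min_size=8, thresh=1000.0):
        # diff = |current frame - previous frame|. Split a block recursively
        # while it contains a lot of change; every split costs bits to
        # signal, so large unchanging areas stay as one cheap block.
        block = diff[y:y + size, x:x + size]
        if size <= min_size or block.sum() < thresh:
            return [(y, x, size)]
        half = size // 2
        leaves = []
        for oy in (0, half):
            for ox in (0, half):
                leaves += partition(diff, y + oy, x + ox, half, min_size, thresh)
        return leaves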

That's just one change; the headline improvement comes from the combination of all the different changes, but this is an important one.

There is new cleverness in the encoders, but they need to be given the tools to express that cleverness -- new agreement about what types of transforms, predictions, etc. are allowed and can be encoded in the bitstream.
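
As a rough sketch of what that agreement amounts to -- with an entirely made-up mode list, and raw bits standing in for real entropy coding -- both sides must share the same enumeration before any of the encoder's cleverness can be expressed at all:

    from enum import Enum

    class PredMode(Enum):
        # Hypothetical subset of agreed prediction modes; a decoder can only
        # apply tools that the format's shared list actually names.
        INTRA = 0
        INTER_TRANSLATE = 1
        INTER_WARP = 2
        GLOBAL_MOTION = 3

    def write_mode(bits, mode):
        # A real codec entropy-codes this; two raw bits for illustration.
        bits += [(mode.value >> 1) & 1, mode.value & 1]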

https://youtu.be/Se8E_SUlU3w?t=242

In general, with movement through scenes, rectangular update windows seem like a poor match.

Is there a reason codecs don't use the previous frame(s) as stored textures, and remap them on the screen? I can move a camera through a room and a lot of the texture is just reprojectively transformed.

  • > I can move a camera through a room

    That's what AV1 calls global motion and warped motion. Motion deltas (translation/rotation/scaling) can be applied to the whole frame, and blocks can be sheared vertically/horizontally as well as moved.
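
    As a sketch of the idea (hypothetical function, nearest-neighbour sampling; real warped motion uses subpixel filters and its own parameterisation), a block can be sampled from the reference frame through an affine model A, t:

        import numpy as np

        def warp_block(ref, y, x, size, A, t):
            # Sample a size-by-size block whose source positions come from an
            # affine model: rotation/scale/shear in 2x2 A, translation in t.
            out = np.zeros((size, size), dtype=ref.dtype)
            for i in range(size):
                for j in range(size):
                    sy, sx = np.round(A @ np.array([y + i, x + j]) + t).astype(int)
                    sy = min(max(sy, 0), ref.shape[0] - 1)
                    sx = min(max(sx, 0), ref.shape[1] - 1)
                    out[i, j] = ref[sy, sx]
            return out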

    • Rereading that, I wasn't very clear. What I meant was remapping textured triangles (or remapping Bezier surfaces).

      Consider a scene with a couple of cars moving across a background: one can imagine a number of vertices around the contour of each car, and when reusing the previous frame's image of a car, it makes no sense to force block shapes onto it. The smaller the seams between shapes (reusing previous frames as textures), the fewer pixels need to be reconstituted de novo. The more accurate the remapping (x_prev, y_prev) -> (x, y), the lower the error signal that needs to be reconstructed.

      Also, the majority of new contour vertex locations can be reused as the old contour locations when decoding the next frame. Then only changes in contour vertices over time need to be encoded, such as when a new shape enters the scene, or a previously static object starts moving. So there is a lot of room for compression.
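
      For what it's worth, here's roughly what one triangle of that would look like (all names hypothetical, nearest-neighbour sampling, no codec exposes anything like this): solve the affine map from the three vertex pairs, then sample the previous frame as a texture:

          import numpy as np

          def warp_triangle(prev_frame, out, src_tri, dst_tri):
              # src_tri: triangle vertices (x, y) in the previous frame;
              # dst_tri: where those same vertices land in the new frame.
              dst = np.asarray(dst_tri, float)
              src = np.asarray(src_tri, float)
              # Solve [x y 1] @ P = (x_prev, y_prev) from the 3 vertex pairs.
              P, *_ = np.linalg.lstsq(np.hstack([dst, np.ones((3, 1))]),
                                      src, rcond=None)

              def cross(u, v):
                  return u[0] * v[1] - u[1] * v[0]

              def inside(p):
                  a, b, c = dst
                  s = [cross(b - a, p - a), cross(c - b, p - b),
                       cross(a - c, p - c)]
                  return all(v >= 0 for v in s) or all(v <= 0 for v in s)

              x0, y0 = np.floor(dst.min(axis=0)).astype(int)
              x1, y1 = np.ceil(dst.max(axis=0)).astype(int)
              for y in range(y0, y1 + 1):
                  for x in range(x0, x1 + 1):
                      if inside(np.array([x, y], float)):
                          sx, sy = np.round(np.array([x, y, 1.0]) @ P).astype(int)
                          sy = min(max(sy, 0), prev_frame.shape[0] - 1)
                          sx = min(max(sx, 0), prev_frame.shape[1] - 1)
                          out[y, x] = prev_frame[sy, sx]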

  • > Is there a reason codecs don't use the previous frame(s) as stored textures, and remap them on the screen? I can move a camera through a room and a lot of the texture is just reprojectively transformed.

    I mean, that's more or less how it works already. But you still need a unit of granularity for the remapping, so the frame will store, e.g., this block moves by this shift, that block by that shift, and so on.

    • > But you still need a unit of granularity for the remapping, so the frame will store, e.g., this block moves by this shift, that block by that shift, and so on.

      This is exactly what I question. Why should the units of granularity be block-shaped? Defining a UV-textured 3D mesh that moves and carries previously decoded pixel values should produce far fewer seams. With a textured mesh instead of blocks, the only de novo pixel values would be at the seams between reusable parts of the mesh, for example when an object rotates and reveals a newly visible part of its surface.
