Comment by DoctorOetker
7 months ago
In general, with movement through scenes, rectangular update windows seem like a poor match.
Is there a reason codecs don't use the previous frame(s) as stored textures and remap them onto the screen? I can move a camera through a room and a lot of the texture is just reprojectively transformed.
> I can move a camera through a room
That's what AV1 calls global motion and warped motion. Motion deltas (translation/rotation/scaling) can be applied to the whole frame, and blocks can be sheared vertically/horizontally as well as moved.
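For concreteness, here's a minimal numpy sketch of what a single whole-frame affine model buys you. The function name and the inverse-mapping convention are mine, not AV1's, and it uses crude nearest-neighbour sampling:

```python
import numpy as np

def global_motion_predict(prev, affine):
    """Predict the current frame by warping the previous one with a
    single affine model (toy sketch of global-motion-style prediction).
    `affine` is a 2x3 matrix mapping current-frame coords back to
    previous-frame coords (inverse mapping)."""
    h, w = prev.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # 3 x (h*w)
    src = affine @ coords                         # previous-frame positions
    sx = np.clip(np.round(src[0]).astype(int), 0, w - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, h - 1)
    return prev[sy, sx].reshape(prev.shape)
```

Warped motion then relaxes this from one model per frame to per-block shear/warp parameters on top of the translation.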
Rereading my comment, I see I wasn't very clear. What I meant was remapping textured triangles (or remapping Bézier surfaces).
Consider a scene with a couple of cars moving across a background. One can imagine a number of vertices around the contour of each car; when reusing the previous frame's car texture, it makes no sense to force the shape of blocks. The smaller the seams between shapes (reusing previous frames as textures), the fewer pixels need to be reconstituted de novo. The more accurate the remapping (x_prev, y_prev) -> (x, y), the lower the error signal that needs to be reconstructed.
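A rough numpy sketch of the per-triangle remapping I have in mind (all names hypothetical; nearest-neighbour sampling, and it assumes the destination triangle lies inside the frame):

```python
import numpy as np

def warp_triangle(prev, src_tri, dst_tri, out):
    """Copy one textured triangle from the previous frame into `out`.
    `src_tri` / `dst_tri` are 3x2 arrays of (x, y) vertices in the
    previous / current frame. Toy sketch, not a real codec tool."""
    A = np.vstack([dst_tri.T, np.ones(3)])   # 3x3: dst vertices, homogeneous
    M = src_tri.T @ np.linalg.inv(A)         # 2x3 affine dst -> src map
    xmin, ymin = dst_tri.min(axis=0).astype(int)
    xmax, ymax = dst_tri.max(axis=0).astype(int) + 1
    ys, xs = np.mgrid[ymin:ymax, xmin:xmax]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    # Barycentric test: keep only pixels inside the destination triangle.
    bary = np.linalg.inv(A) @ pts
    inside = (bary >= -1e-9).all(axis=0)
    src = M @ pts[:, inside]                 # matching previous-frame positions
    sx = np.clip(np.round(src[0]).astype(int), 0, prev.shape[1] - 1)
    sy = np.clip(np.round(src[1]).astype(int), 0, prev.shape[0] - 1)
    out[pts[1, inside].astype(int), pts[0, inside].astype(int)] = prev[sy, sx]
```

The residual the encoder would still have to send is then concentrated at the seams and disocclusions, not spread over fixed block boundaries.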
Also, the majority of the new contour vertex locations can be reused as the old contour locations when decoding the next frame. Then only changes in contour vertices over time need to be encoded, like when a new shape enters the scene or a previously static object starts moving. So there is a lot of room for compression.
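Something like this, as a toy sketch (hypothetical helper names, whole-pixel coordinates):

```python
def encode_contour_deltas(prev_vertices, curr_vertices):
    """Instead of re-sending every contour vertex, send only the
    per-vertex motion since the last frame. Static objects then cost
    almost nothing; new shapes would be signalled as full vertex lists."""
    return [(x - px, y - py)
            for (x, y), (px, py) in zip(curr_vertices, prev_vertices)]

def decode_contour(prev_vertices, deltas):
    # The decoder reuses last frame's vertices and applies the deltas.
    return [(px + dx, py + dy)
            for (px, py), (dx, dy) in zip(prev_vertices, deltas)]
```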
> Is there a reason codecs don't use the previous frame(s) as stored textures and remap them onto the screen? I can move a camera through a room and a lot of the texture is just reprojectively transformed.
I mean, that's more or less how it works already. But you still need a unit of granularity for the remapping. So the frame will store, e.g., this block moves by this shift, that block by that shift, etc.
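In toy-sketch form (not any real codec's code; assumes whole-pixel motion vectors):

```python
import numpy as np

def block_motion_compensate(prev, motion_vectors, block=16):
    """Minimal sketch of block-based motion compensation: each block of
    the current frame is copied from a shifted location in `prev`.
    `motion_vectors[by, bx]` holds the (dy, dx) shift for that block."""
    h, w = prev.shape[:2]
    out = np.empty_like(prev)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block, bx // block]
            bh, bw = min(block, h - by), min(block, w - bx)
            sy = int(np.clip(by + dy, 0, h - bh))
            sx = int(np.clip(bx + dx, 0, w - bw))
            out[by:by + bh, bx:bx + bw] = prev[sy:sy + bh, sx:sx + bw]
    return out
```

Real codecs refine this with variable block sizes, sub-pixel interpolation, and a coded residual on top, but the grid of blocks is the unit of granularity in question.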
> But you still need a unit of granularity for the remapping. So the frame will store, e.g., this block moves by this shift, that block by that shift, etc.
This is exactly what I question. Why should there be block-shaped units of granularity? Defining a UV-textured 3D mesh that moves and carries previously decoded pixel values should have far fewer seams. With a textured mesh instead of blocks, the only de novo pixel values would be the seams between reusable parts of the mesh, for example when an object rotates and reveals a newly visible part of its surface.
And how do you plan to extract that mesh and texture from an arbitrary input video?
Having worked in the field of photogrammetry, I can tell you that it is a really complex task.