AV2 video codec delivers 30% lower bitrate than AV1, final spec due in late 2025

7 months ago (videocardz.com)

Will streaming services ever stop over-compressing their content?

I have a top-of-the-line 4K TV and gigabit internet, yet the compression artifacts make everything look like putty.

Honestly, the best picture quality I’ve ever seen was over 20 years ago using simple digital rabbit ears.

You especially notice the compression on gradients and in dark movie scenes.

And yes — my TV is fully calibrated, and I’m paying for the highest-bandwidth streaming tier.

Not my tv, but a visual example: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....

  • Content delivery costs a lot for streaming services. After content is produced, this is basically the only remaining cost. It’s not surprising that they would go to extreme measures in reducing bitrate.

    That’s why, presumably, Netflix came up with the algorithm for removing camera grain and adding synthetically generated noise on the client[0], and why YouTube shorts were recently in the news for using extreme denoising[1]. Noise is random and therefore difficult to compress while preserving its pleasing appearance, so they really like the idea of serving everything denoised as much as possible. (The catch, of course, is that removing noise from live camera footage generally implies compromising the very fine details captured by the camera as a side effect.)

    [0] https://news.ycombinator.com/item?id=45022184
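
    As a rough sketch of that idea (a toy parametric grain model, purely illustrative; the real AV1 film grain synthesis uses an autoregressive model and a per-intensity scaling function): the encoder ships a denoised frame plus a couple of grain parameters, and the decoder regenerates statistically similar noise instead of spending bits on the original grain.

      import numpy as np

      def add_synthetic_grain(denoised, seed, strength=4.0):
          """Regenerate grain on the decoder side from a few parameters.

          denoised: HxW uint8 luma plane that was denoised before encoding.
          seed, strength: the only grain "data" that has to be transmitted.
          """
          rng = np.random.default_rng(seed)
          grain = rng.normal(0.0, strength, size=denoised.shape)
          # Scale the grain with local brightness so shadows aren't swamped
          # (a crude stand-in for AV1's piecewise-linear scaling function).
          scale = 0.5 + denoised.astype(np.float32) / 255.0
          noisy = denoised + grain * scale
          return np.clip(noisy, 0, 255).astype(np.uint8)

      # The "bitstream" carries only the clean frame plus (seed, strength),
      # instead of incompressible per-pixel noise.
      frame = np.full((8, 8), 120, dtype=np.uint8)   # toy denoised frame
      reconstructed = add_synthetic_grain(frame, seed=42, strength=4.0)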

    • So:

      1. camera manufacturers and film crews both do their best to produce a noise-free image
      2. in post-production, they add fake noise to the image so it looks more "cinematic"
      3. to compress better, streaming services try to remove the noise
      4. to hide the insane compression and make it look even slightly natural, the decoder/player adds the noise back

      Anyone else finding this a bit...insane?

      17 replies →

    • It feels to me like there are two different things going on:

      1. Video codecs like the denoise-compress-synthesize-grain approach because their purpose is to get the perceptually closest video to the original with a given number of bits. I think we should be happy to spend the bits on more perceptually useful information. Certainly I am happy with this.

      2. Streaming services want to send as few bytes as they can get away with. So improvements like #1 tend to be spent on decreasing bytes while holding perceived quality constant rather than increasing perceived quality while holding bitrate constant.

      I think one should focus on #2 and not be distracted by #1 which I think is largely orthogonal.

      2 replies →

    • >Content delivery costs a lot for streaming services.

      The hard disk space to store an episode of a show is $0.01. With peering agreements, the bandwidth of sending the show to a user is free.

      2 replies →

    • There might also be copyright-owner requirements, e.g. a contract that limits the quality of the material.

  • > You especially notice the compression on gradients and in dark movie scenes.

    That's not a correctly calibrated TV. The contrast is tuned WAY up. People do that to see what's going on in the dark, but you aren't meant to really be able to see those colors. That's why it's a big dark blob. It's supposed to be barely visible on a well calibrated display.

    A lot of video codecs will erase details in dark scenes because those details aren't supposed to be visible. Now, I will say that streaming services are tuning that too aggressively. But I'll also say that a lot of people have miscalibrated displays. People simply like to be able to make out every detail in the dark. Those two things come in conflict with one another causing the effect you see above.

    • Video codecs aren't tuned for any particular TV calibration. They probably should be, because it is easier to spot single-bit differences in dark scenes, where the relative error is so high.

      The issue is just that we don't code video with nearly enough bits. It's effectively less than 8 bits, since limited-range video only uses code values 16-235.
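
      Back-of-the-envelope illustration of why the darks are so fragile (numbers are only indicative and ignore the transfer function):

        # Limited-range 8-bit video only uses luma code values 16..235
        # (220 values instead of 256), with 16 being reference black.

        def relative_step(code):
            """Relative size of a one-code-value error at a given luma code."""
            signal = code - 16               # distance above reference black
            return 1.0 / signal if signal > 0 else float("inf")

        print(relative_step(20))    # near black: one step is 25% of the signal
        print(relative_step(200))   # bright area: one step is ~0.5% of the signal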

    • If an eye is able to distinguish all 256 shades on a correctly calibrated display, then the content should be preserved.

  • >Will streaming services ever stop over-compressing their content?

    Before COVID, Netflix was using at least 8 Mbps for 1080p content. With x264 / Beamr that is pretty good, and even better with HEVC. Then COVID hit, and every streaming service, not just Netflix, had an excuse to lower quality due to increased demand on limited bandwidth. Everything went downhill since then. Customers got used to lower quality, and I don't believe they will ever bring it back up. Now it is only something like 3-5 Mbps, according to a previous test posted on HN.

    And while it is easy for HEVC / AV1 / AV2 to show 50%+ real-world bitrate savings compared to H.264 in the 0.5-4 Mbps range, once you go past that the savings begin to shrink rapidly, to the point where the good old x264 encoder may perform better at much higher bitrates.

  • Not all video streaming services choose to use the same extremely low average video bit rate used by Netflix on some of their 4k shows.

    Kate - Netflix - 11.15 Mbps

    Andor - Disney - 15.03 Mbps

    Jack Ryan - Amazon - 15.02 Mbps

    The Last of Us - Max - 19.96 Mbps

    For All Mankind - Apple - 25.12 Mbps

    https://hd-report.com/streaming-bitrates-of-popular-movies-s...

    • Netflix has shown they're the mattress-company equivalent of streaming services.

      You will be made to feel the springs on the cheapest plan/mattress, and it's on purpose so you'll pay them more for something that costs them almost nothing.

  • Are you sure about the black-areas-blocking? I remember a long time ago, when I was younger and had time for this kind of tomfoolery, I noticed this exact issue in my Blu-ray backups. I figured I needed to up the bitrate, so I started testing, upping the bitrate over and over. Finally, I played the Blu-ray and it was still there. This was an old-school, dual-layer, 100GB disc of one of the Harry Potter movies. Still saw the blocking in very dark gradients.

  • I’m still so surprised Disney+ degrades their content/streaming service so much. Of all the main services I’ve tried (Netflix, Prime, Hulu, HBO) Disney+ has some of the worst over-compression, lip-sync, and remembering-which-episode-is-next issues for me. Takes away from the “magic”.

    • Check your settings. I experienced the same until I changed an Apple TV setting that fixed Disney+. If I recall correctly, the setting was Match Content or Match Dynamic Range (not near the TV right now to confirm the exact name).

    • Netflix now does this on their lowest paid tier as well. I had to upgrade to the 4K tier just to get somewhat-OK 1080p playback...

    • This is interesting because Disney+, when they started out, was using a much higher bitrate, second only to Apple TV+.

  • Economically speaking, it doesn't make any sense for them to spend more on bandwidth and storage if they can get away with not spending more.

  • I don't quite follow why compression would cause this. It feels more like a side effect of the adaptive HTTPS streaming protocol, which automatically adjusts based on your connection speed and so tracks any jitter on the wire. It could also be an issue with the software implementation, because it needs to constantly switch between streams based on bandwidth.

    • > side effect of adaptive HTTPS streaming

      Adaptive streaming isn't really adaptive anymore. If you have any kind of modern broadband, the most adaptive it will be is starting off in one of the lower bitrates for the first 6 seconds before jumping to the top, where it will stay for the duration of the stream. A lot of clients don't even bother with that anymore; they look at the manifest, find the highest stream, and just start there.
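
      A minimal sketch of what that degenerate "adaptation" amounts to (hypothetical rendition ladder, not any real player's logic):

        # Hypothetical rendition ladder, as it might appear in a DASH/HLS manifest.
        renditions = [
            {"height": 360,  "bandwidth": 1_000_000},
            {"height": 1080, "bandwidth": 5_000_000},
            {"height": 2160, "bandwidth": 15_000_000},
        ]

        def pick_rendition(renditions, measured_bps=None):
            # Many modern clients skip measurement and just start at the top.
            if measured_bps is None:
                return max(renditions, key=lambda r: r["bandwidth"])
            usable = [r for r in renditions if r["bandwidth"] <= measured_bps]
            return max(usable, key=lambda r: r["bandwidth"]) if usable else renditions[0]

        print(pick_rendition(renditions))   # -> the 2160p / 15 Mbps entry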

  • As a little experiment, I'd like you to set up your own little streaming service on a server and see how much bandwidth it uses, even for just a few users. It adds up extremely quickly, and the actual usage is quite surprising.

    At the higher prices, I'd have to agree with you. If you pay for the best you should get the best.

  • I pirate blu-ray rips. Pirates are very fastidious about maintaining visual quality in their encodings. I often see them arguing over artifacts that I absolutely cannot see with my eyes.

  • >the best picture quality I’ve ever seen was over 20 years ago using simple digital rabbit ears.

    The biggest jump in quality was when everything was still analog over the air, but getting ready for the digital transition.

    Then digital over the air bumped it up a notch.

    You could really see this happen on a big CRT monitor with the "All-in-Wonder" television receiver PCI graphics adapter card.

    You plugged in your outdoor antenna or indoor rabbit ears to the back of the PC, then tuned in the channels using software.

    These were made by ATI before it was acquired by AMD; the TV tuner was in a Faraday cage right on the same PCB as the early GPU.

    The raw analog signal was upscaled to your adapter's resolution setting before going to the CRT so you had pseudo better resolution than a good TV like a Trinitron. You really could see more details and the CRT was smooth as butter.

    As the TV broadcasters' entire equipment chain was replaced (camera lenses, digital sensors, signal processing), they eventually had everything in place and working. You could notice these incremental upgrades until a complete digital chain was established as designed. It was really jaw-dropping. This was well in advance of the deadline for digital deployment, so the over-the-air signal was still coming in analog the same old way.

    Eventually the broadcast signal switched to digital and the analog lights went out, and the All-in-Wonder did not do well with the kind of cheap converter box that analog TVs could get by with.

    But it was still better than most digital TVs for a few years, then it took years more before you could see the ball in live sports as well as on a CRT anyway.

    Now that's about all you've got for full digital resolution, live broadcasts from your local stations, especially live sports from a strong interference-free station over an antenna. You can switch between the antenna and cable and tell the difference when they're both not overly compressed.

    The only thing was, digital engineers "forgot" that TV was based on radio (who knew?). So for the vast majority of "listeners" in fringe reception areas, who could get clear audio but usually not a clear picture if any: too bad for you. You're going to need a bigger antenna, good enough to have gotten you a clear picture back in the analog days. Otherwise your "clean" digital audio may silently appear on the screen as video, "hidden" within the sparse blocks of scattered random digital noise, when anything appears at all.

  • for the super affluent, https://www.kaleidescape.com/compare/

    • Funny that they're marketing the supposed advantages of higher bitrates using pictures with altered contrast and saturation lol. I would expect the target audience to be somewhat fluent in the actual benefits? Then again, I wouldn't expect somebody like Scorsese to be a video compression nerd.

      Also the whole "you can hear more with lossless audio" is just straight up a lie.

    • Fascinating.

      Pricing, if I am reading the site correctly: $7k-ish for a server (+$ for local disks, one assumes), $2-5k per client. So you download the movie locally to your server and play it on clients scattered throughout your mansion/property.

      Not out of the world for people who drop 10s of thousands on home theater.

      I wonder if that's what the Elysium types use in their NZ bunkers.

      No true self-respecting, self-described techie (Scotsman) would use it instead of building their own of course.

    • For the less affluent, you can set up a Jellyfin media server and rip your own Blu-rays with MakeMKV.

  • It's a little surprising to me that there generally aren't more subscription tiers where you can pay more for higher quality. Seems like free money, from people like you (maybe) and me.

    • You can already pay for 4K or "enhanced bitrate" but it's still relatively low bitrate and what's worse, this service quality is not guaranteed. I've had Apple TV+ downgrade to 1080p and lower on a wired gigabit connection so many times.

      1 reply →

    • I'm not surprised they don't offer an even higher tier. When you're pricing things, you often need to use proxies - like 1080p and 4K. It'd be hard to offer 3 pricing tiers: 1080p, 4K, 4K but actually good 4K that we don't compress to hell. That third tier makes it seem like you're being a bit fraudulent with the second tier. You're essentially admitting that you've created a fake-4K tier to take people's money without delivering them the product they think they're buying. At some point, a class-action lawsuit would use that as a sort of admission that you knew you weren't giving customers what they were paying for and that it was being done intentionally, both of which matter a lot.

      Right now, Netflix can say stuff like "we think the 4K video we're serving is just as good." If they offer a real-4K tier, it's hard to make that argument.

      1 reply →

  • Well, you'll be happy to learn that AV2 delivers 30% better quality for the same bitrate!

It’s pretty amazing people are still finding ways to make video smaller.

Is this just people being clever or is it also more processing power being thrown at the problem when decoding / encoding?

  • Yes, both, and it's also the format changing to allow more cleverness or more processing power to be applied.

    For example, changes from one frame to the next are encoded in rectangular areas called "superblocks" (similar to a https://en.wikipedia.org/wiki/Macroblock). You can "move" the blocks (warp them), define their change in terms of other parts of the same frame (intra-frame prediction) or by referencing previous frames (inter-frame prediction), and so on... but you have to do it within a block, as that's the basic element of the encoding.

    The more tightly you can define blocks around the areas that are actually changing from frame to frame, the better. Also, it takes data to describe where these blocks are, so there are special limitations on how blocks are defined, to minimise how many bits are needed to describe them.

    AV2 now lets you define blocks differently, which makes it easier to fit them around the areas of the frame that are changing. It has also doubled the size of the largest block, so if you have some really big movement on screen, it takes fewer blocks to encode that.

    That's just one change; the headline improvement comes from all the different changes combined, but this is an important one. (A toy sketch of the partitioning idea follows below.)

    There is new cleverness in the encoders, but they need to be given the tools to express that cleverness -- new agreement about what types of transforms, predictions, etc. are allowed and can be encoded in the bitstream.

    https://youtu.be/Se8E_SUlU3w?t=242
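
    To make the "fit blocks around what changed" idea concrete, here is a toy quadtree-style partitioner (purely illustrative; real AV1/AV2 partitioning offers many more split shapes and uses a full rate-distortion search rather than a fixed threshold):

      import numpy as np

      def partition(prev, cur, x, y, size, min_size=8, threshold=500.0):
          """Recursively split a block while the frame difference inside it is large.

          Returns (x, y, size) leaf blocks that would each get their own
          prediction/transform: big still areas stay as one big block,
          busy areas get split down toward min_size.
          """
          a = prev[y:y + size, x:x + size].astype(np.float32)
          b = cur[y:y + size, x:x + size].astype(np.float32)
          sse = float(((b - a) ** 2).sum())
          if size <= min_size or sse < threshold:
              return [(x, y, size)]
          half = size // 2
          blocks = []
          for dy in (0, half):
              for dx in (0, half):
                  blocks += partition(prev, cur, x + dx, y + dy, half, min_size, threshold)
          return blocks

      # Toy 64x64 "superblock" where only a 16x16 corner changed between frames.
      prev = np.zeros((64, 64), dtype=np.uint8)
      cur = prev.copy()
      cur[0:16, 0:16] = 200
      print(partition(prev, cur, 0, 0, 64))   # splits only around the changed corner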

    • In general, with movement through scenes, rectangular update windows seem like a poor match.

      Is there a reason codecs don't use the previous frame(s) as stored textures and remap them onto the screen? I can move a camera through a room and a lot of the texture is just reprojected (a projective transform).

      5 replies →

  • I believe patents play a big role here as well. Anything new must be careful not to (accidentally) violate any active patent, so there might be some tricks that can't currently be used for AV1/AV2.

    • I think patents are quickly becoming less of a problem. A lot of the foundational encoding techniques have exited patent protection. H.264 and everything before it is patent free now.

      It's true you could still accidentally violate a patent but that minefield is clearing out as those patents simply have to become more esoteric in nature.

      2 replies →

    • There are numerous patent trolls in this space with active litigation against many of the participants in the consortium that brought us AV1. The EU was also threatening to investigate (likely to protect the royalty revenues of European companies).

    • It has always seemed very weird to me that compression algorithms were patentable.

      1) it harms interoperability

      2) I thought math wasn’t patentable?

  • A bit of both. Also, modern codecs have slightly different tradeoffs (image quality (PSNR, SSIM), computational complexity (CPU vs DSP vs memory), storage requirements, bit rate), and therefore there isn't one that is best for every use case.
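
    For reference, PSNR, one of the quality metrics mentioned above, is just a log-scaled mean squared error between the original and decoded frames; a quick sketch:

      import numpy as np

      def psnr(original, decoded, max_value=255.0):
          """Peak signal-to-noise ratio in dB between two frames."""
          err = original.astype(np.float64) - decoded.astype(np.float64)
          mse = np.mean(err ** 2)
          if mse == 0:
              return float("inf")           # identical frames
          return 10.0 * np.log10(max_value ** 2 / mse)

      a = np.random.default_rng(0).integers(0, 256, (16, 16), dtype=np.uint8)
      b = np.clip(a.astype(np.int16) + 3, 0, 255).astype(np.uint8)  # small uniform error
      print(round(psnr(a, b), 1))           # roughly 38-39 dB for an error of ~3 codes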

  • I wonder when we will see generative AI codecs in production. The concept seems simple enough: the encoder knows the exact model the decoder will use to generate the final image starting from a handful of pixels, and optimizes towards the lowest bitrate and minimum subjective quality loss, for example by letting the decoder generate a random human face in the crowd, or giving it more data in that area to steer it towards the face of the team mascot, as the case may be.

    At the absolute compression limit, it's no longer video, but a machine description of the scene conceptually equivalent to a textual script.

    • There was Nvidia's video upsampling, or whatever it is called. It was putting age spots on every face when the source was blurry, and it used too many resources, as far as I can remember.

    • And then that script gets processed on hundreds of GPUs in the cloud and the video gets streamed to the client. Wait.

  • New video codecs typically offer more options for how to represent the current frame in terms of other frames. That typically means more processing for the encoder, because it can check all the similarities to see what works best; there's also harder math for arithmetic coding of the picture data. It will be more work for the decoder if it needs to keep more reference images, and especially if it needs to do harder transformations or if arithmetic decoding gets harder.

    Clever matters a lot more for encoding. If you can determine good ways to figure out the motion information without trying them all, that gets you faster encoding speed. Decoding doesn't tend to have as much room for cleverness; the stream says to calculate the output from specific data, so you need to do that.
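
    As an illustration of where the encoder-side cost (and the room for cleverness) comes from, here is a brute-force motion search for a single block; a real encoder uses smarter search patterns precisely to avoid checking every candidate like this:

      import numpy as np

      def full_search(ref, block, bx, by, radius=8):
          """Exhaustively find the motion vector minimizing SAD for one block."""
          h, w = block.shape
          best_mv, best_sad = (0, 0), float("inf")
          for dy in range(-radius, radius + 1):
              for dx in range(-radius, radius + 1):
                  y, x = by + dy, bx + dx
                  if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                      continue
                  cand = ref[y:y + h, x:x + w].astype(np.int32)
                  sad = int(np.abs(cand - block.astype(np.int32)).sum())
                  if sad < best_sad:
                      best_sad, best_mv = sad, (dx, dy)
          return best_mv, best_sad

      ref = np.zeros((64, 64), dtype=np.uint8)
      ref[20:36, 20:36] = 180                       # object in the reference frame
      cur = np.zeros((64, 64), dtype=np.uint8)
      cur[24:40, 24:40] = 180                       # same object, moved by (+4, +4)
      print(full_search(ref, cur[24:40, 24:40], bx=24, by=24))   # -> ((-4, -4), 0)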

  • I don't know the details of AV2, but going from H.265 to H.266, the number of angles for angular prediction doubled, they added a tool to predict chroma from luma, added the ability to do pixel block copies, and a bunch of other techniques… And that's just for intra prediction. They also added tons of new inter prediction techniques.

    All of this requires a significant amount of extra logic gates/silicon area for hardware decoders, but the bit rate reduction is worth it.

    For CPU decoders, the additional computational load is not so bad.

    The real additional cost is for encoding because there’s more prediction tools to choose from for optimal compression. That’s why Google only does AV1 encoding for videos that are very popular: it doesn’t make sense to do it on videos that are seen by few.
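
    A toy illustration of why more prediction tools mean more encoder work (these are not H.266's actual modes; real codecs have dozens of angular modes plus chroma-from-luma, and weigh bit cost as well as distortion):

      import numpy as np

      def predict(mode, left, top, size=4):
          """Build one intra prediction for a size x size block from its neighbours."""
          if mode == "dc":                  # flat fill with the mean neighbour value
              return np.full((size, size), (left.mean() + top.mean()) / 2)
          if mode == "horizontal":          # copy the left column across
              return np.tile(left.reshape(-1, 1), (1, size))
          if mode == "vertical":            # copy the top row down
              return np.tile(top.reshape(1, -1), (size, 1))
          raise ValueError(mode)

      def best_mode(block, left, top):
          """The encoder has to evaluate every mode; more modes = more work per block."""
          costs = {m: float(np.abs(block - predict(m, left, top)).sum())
                   for m in ("dc", "horizontal", "vertical")}
          return min(costs, key=costs.get)

      # A block with strong vertical structure: the vertical predictor should win.
      top = np.array([10.0, 80.0, 150.0, 220.0])
      left = np.full(4, 120.0)
      block = np.tile(top.reshape(1, -1), (4, 1))
      print(best_mode(block, left, top))    # -> vertical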

    • IIRC Facebook did selective encoding too, and it would predict which videos would be popular, so even the first streams would get the AV1 version.

  • It’s more money and more user’s compute being thrown at the problem to get the streaming service’s CDN bill down.

    • While funny, that's not really what I would call accurate. Users get reduced data consumption, potentially higher quality selection if the bandwidth now allows for a higher resolution to be streamed, and possibly lower disk usage should they decide to offline the videos.

      Better codecs are an overall win for everyone involved.

      9 replies →

    • Modern video codecs are what broke the telco monopoly on content and gave us streaming services in the first place. If the cdn bill is make or break, the service isn’t going to last.

      And there's no transfer of effort to the user. Compute complexity of video codecs is asymmetric. The decode is several orders of magnitude cheaper to compute than the encode. And in every case, the principal barrier to codec adoption has been hardware acceleration. Pretty much every device on earth has a hardware-accelerated H.264 decoder.

    • For those of us who back up media, this can be very appealing as well. I don’t disagree that what you said is a major driving force, but better formats have benefited me and my storage requirements multiple times in the past.

  • Soon we will have local AI processors which will just make stuff up between scenes but adhere to a “close enough” guideline where all narratively critical elements are maintained but other things (e.g. landscapes or trees) are generated locally. Movies will practically be long cutscenes with photorealistic graphics.

    • I'm sure models which replace characters in realtime will also become popular. I would imagine some company thinking it would be cool if the main character looked slightly more like whatever main audience it's being shown to and it's done on their playback devices (so, of course, it can be customized or turned off).

      I find the idea fun, kinda like using snapchat filters on characters, but in practice I'm sure it'll be used to cut corners and prevent the actual creative vision from being shown which saddens me.

    • At that point we aren’t even all watching the same movies. Which could be interesting. But very different—I mean, even stuff like talking with your friends about a movie you saw will change drastically. Maybe a service could be centered around sharing your movie prompts so you have a shared movie experience to talk to your friends about.

      3 replies →

Let's hope they get more things right the 2nd time around. AOM will do a live session on the 20th of October: The Future of Innovation is Open [1].

Maybe more data and numbers: encoding complexity increase, decoding complexity, hardware decoder roadmap, compliance and test kits, future profiles, involvement in and improvements to both the AVIF format and an AV2 image codec. Better than JPEG XL? Is the ~30% BD-rate figure measured against the current best AV1 encoder or against AV1 1.0 as the anchor point? Live encoding improvements?

[1] https://aomedia.org/events/live-session-the-future-of-innova...

30% over AV1 is crazy; it doesn't feel that long since AV1 was released, but that was back in 2018.

  • Yet my first devices with hardware support for it only arrived last year. The downside of "rapid" iteration on video codecs is that content always needs to be stored in multiple formats (or, alternatively, battery life on the client suffers from software playback, which is the route e.g. YouTube seems to prefer).

    • Hopefully that improves. The guy giving the presentation on AV2 made clear there was "rigorous scrutiny for hardware decoding complexity", and they were advised by Realtek and AMD on this.

      So it seems like they checked that all their ideas could be implemented efficiently in hardware as they went along, with advice from real hardware producers.

      Hopefully AV2-capable hardware will appear much quicker than AV1-capable hardware did.

      15 replies →

    • It'd be really cool if we had 'upgradable codec FPGAs' in our machines that you could just flash to the newest codec... but that'd probably be noticeably more expensive, and also not really in the interest of the manufacturers, who want to have reasons to sell new chips.

      5 replies →

    • The main delay last time was corporations being dicks about IP, but the two main culprits have gotten on board this time.

Codec implementation and optimization was probably my favorite type of work. It would be fun to dive deep into AV2 in those areas but no time!

All this high speed fiber for nothing...

Who does this benefit? Sounds like this stuff mainly benefits streaming providers and not users. We get to go through the whole rigamarole again where hardware is made obsolete because it doesn't support acceleration.

  • How does it not help users to lower your mobile bandwidth use? This is especially useful in the era of TikTok, Snapchat, and YouTube.

I always thought the name AV1 was partly a play on/homage to AVI (Audio Video Interleave), but AV2 breaks that. Even if it’s meant to be embedded into other container formats such as MP4, there are files with the .av1 extension and there is a video/AV1 MIME type (and possibly a UTI?). Does this mean we now need to duplicate all that to .av2 and video/AV2? What about the AVIF file format?

  • Files with the .av1 extension are for raw AV1 data. For AV2 this should become .av2, yes. That's by design, as they're two different incompatible formats. Typically you use a container like Matroska (.mkv, video/x-matroska), WebM or MP4 which contains your video stream with a type code specifying the codec (av01, av02).

    AVIF is also a container format, and I believe it should be adaptable to AV2, even if the name stands for "AV1 Image File Format". It could simply be renamed to AOMedia Video Image Format for correctness.

  • Do you mean the file extension should only reflect the file format and not the codecs it has inside?

    Maybe that’s what we did in the past and it was a bad idea. It’d be useful to know whether you can read the file by looking only at its extension.

    • File extension shouldn't matter at all, because data should have associated metadata (e.g. HTTP content-type, CSS image-set, HTML <video><source type=""/></video>)

      > It’d be useful to know if you can read the file by looking only at its extension

      That would be madness, and there's already a workaround - the filename itself.

      For most people, all that matters is an MKV file is a video file, and your configured player for this format is VLC. Only in a small number of cases does it matter about an "inner" format, or choice of parameter - e.g. for videos, what video codec or audio codec is in use, what the bitrate is, what the frame dimensions are.

      For where it _matters_, people write "inner" file formats in the filename, e.g. "Gone With The Wind (1939) 1080p BluRay x265 HEVC FLAC GOONiES.mkv", to let prospective downloaders choose what to download from many competing encodings of exactly the same media, on websites where a filename is the _only_ place to write that metadata (if it were a website not standardised around making files available and searching only by filenames, it could just write it in the link description and filename wouldn't matter at all)

      Most people don't care, for example, that their Word document is A4 landscape, so much that they need to know _in the filename_.

    • > Do you mean the file extension should only reflect the file format and not the codecs it has inside ?

      That's pretty much always been the case. File extensions are just not expressive enough to capture all the nuances of audio and video codecs. MIME types are a bit better.

      Audio is a bit of an exception with the popularity of MP3 (which is both a codec and a relatively minimal container format for it).

      3 replies →

We must be reaching the limit at which video codecs can only achieve better quality by synthesizing details. That's already pretty prevalent in still images - phone cameras do it, and there are lots of AI resizing algorithms that do it.

It doesn't look like AV2 does any of that yet though fortunately (except film grain synthesis but I think that's fine).

  • Arguably that's already happening with film grain — you have to extrapolate _what the original probably was_, encode it because it's smaller, then add the noise back to be more faithful to the original despite your image being better.

    I imagine e.g. a picture of an 8x8 circle actually takes more bits to encode than a mathematical description of the same circle
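
    Rough numbers for that intuition (illustrative only):

      # Raw 8x8 greyscale patch: 64 pixels x 8 bits each.
      raw_bits = 8 * 8 * 8                    # 512 bits

      # Parametric description: centre x, centre y, radius (3 bits each is
      # plenty inside an 8x8 grid) plus two 8-bit colours.
      parametric_bits = 3 + 3 + 3 + 8 + 8     # 25 bits

      print(raw_bits, parametric_bits)        # 512 vs 25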

    • >I imagine e.g. a picture of an 8x8 circle actually takes more bits to encode than a mathematical description of the same circle

      I wonder if there are codecs with provisions for storing common shapes. Text comes to mind - I imagine having a bank of the 10 most popular fonts and encoding just the difference between the source and text + distortion could save quite a lot of data on text-heavy material. Add circles, lines, basic face shapes.

  • Outside of AV1/2 (and linear media in general) that's already well and truly developed tech. Nvidia DLSS, AMD FSR and Intel XeSS all provide spatial/temporal super sampling to process lower fidelity base renders [0].

    There also seems to be a fair bit of attention on that problem space from the real-time comms vendors, with Cisco [1], Microsoft [2] and Google [3] already leaning on model-based audio codecs. With the advantages that provides, both around packet-loss mitigation and around shifting costs to end-user (aka free) compute and away from central infra, I can't see that not extending to the video channel too.

    [0]: https://mtisoftware.com/understanding-ai-upscaling-how-dlss-...

    [1]: https://www.webex.com/gp/webex-ai-codec.html

    [2]: https://techcommunity.microsoft.com/blog/microsoftteamsblog/...

    [3]: https://research.google/blog/lyra-a-new-very-low-bitrate-cod...

  • >We must be reaching the limit at which video codecs can only achieve better quality by synthesizing details.

    Not quite yet, as shown by the H.267 work. But at some point the computational requirements vs. bandwidth-saving benefits will no longer make sense.

Oh, is more HEVC stuff finally going off patent? They’re the leaders.

  • It would be great. As to being the leaders, even without the patented stuff, AV1 is still more efficient.

I'm waiting on a new codec invented by #AI

  • You'll be waiting a long time then, probably. Making codecs is actually a hard problem, the type of thing AI completely falls over on when tasked with it.

    • Compression is actually a very good use case for neural networks (i.e. don't have an LLM develop a codec, but rather train a neural network to do the compression itself).

      It works amazingly well with text compression, for example: https://bellard.org/nncp/
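
      The core idea in one toy snippet (not NNCP itself; any predictive model slots in the same way): the better the model predicts the next symbol, the fewer bits an arithmetic coder needs, roughly -log2(p) per symbol.

        import math

        def ideal_code_length(symbols, predict_prob):
            """Bits an ideal arithmetic coder would spend, given a predictive model.

            predict_prob(history, symbol) -> model probability of that symbol.
            A neural net slots in as predict_prob; better predictions = fewer bits.
            """
            bits = 0.0
            for i, s in enumerate(symbols):
                bits += -math.log2(predict_prob(symbols[:i], s))
            return bits

        text = "abababababab"
        # A dumb uniform model vs. one that "knows" the text alternates.
        uniform = lambda hist, s: 1 / 26
        repeater = lambda hist, s: (0.9 if s != hist[-1] else 0.1 / 25) if hist else 1 / 26

        print(round(ideal_code_length(text, uniform), 1))    # ~56.4 bits
        print(round(ideal_code_length(text, repeater), 1))   # ~6.4 bits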

    • Considering AI is good at predicting things and that’s largely what compression does, I could see machine learning techniques being useful as a part of a codec though (which is a completely different thing from asking ChatGPT to write you a codec)

      4 replies →