Man what a showcase for Opus this is.
Don't get me wrong, this sort of thing is a valuable exercise and we are better off with better encoders for these older codecs. But look at the numbers for Opus on this benchmark. It simply blows all the AAC encoders out of the water even at 64 kbps.
The biggest advantage for having a good AAC encoder isn't efficiency, it's that for nearly the past 2 decades the de facto standard for live streamed video has been RTMP with H.264 video and AAC audio. There is basically no support for any other codecs. If you want to send a video stream to Youtube or Twitch, you will be sending H.264 and AAC. If you want an idea of how ubiquitous this is, I just checked in OBS and it will not even let you select different video and audio codecs in streaming mode, it just (correctly) assumes that anybody who's streaming will be streaming H.264 and AAC.
YouTube actually supports H.265 and VP9 ingest, depending on the streaming protocol. I can actually stream 4K@60 H.265 from my Mac Studio with < 5% CPU usage due to the hardware encoder support in OBS.
https://developers.google.com/youtube/v3/live/guides/ingesti...
Also the fact that hardware-accelerated AAC and even full AAC offload is ubiquitous in modern-ish hardware. I think my rice cooker can play AAC audio
Plus, at 96+ kbps (assuming an Apple-quality AAC-LC encoder) Opus loses its quality advantage. So at higher bitrates, the benefit of choosing Opus is that encoders/decoders are royalty-free.
Sample-accurate editing with AAC is a pain though. Especially if you also have video, because frame rates are usually incompatible.
If you want flexibility without fully transcoding both audio and video, Opus is your friend
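One way to read the frame-rate point above is to count how often audio frame boundaries land exactly on video frame boundaries. This is a minimal sketch, assuming 48 kHz audio, 1024-sample AAC-LC frames, and 20 ms (960-sample) Opus frames; the `alignment` helper is made up for illustration and is not taken from any tool mentioned in the thread.

```python
from math import gcd

SAMPLE_RATE = 48_000  # Hz, assumed for both codecs

AAC_FRAME = 1024   # samples per AAC-LC frame
OPUS_FRAME = 960   # samples in a 20 ms Opus frame at 48 kHz


def alignment(audio_frame_samples: int, video_fps: int) -> tuple[int, int]:
    """Return (audio_frames, video_frames) between two coinciding frame boundaries."""
    if SAMPLE_RATE % video_fps:
        raise ValueError("video frame is not a whole number of samples")
    video_frame_samples = SAMPLE_RATE // video_fps
    # Least common multiple of the two frame lengths, in samples.
    period = (audio_frame_samples * video_frame_samples
              // gcd(audio_frame_samples, video_frame_samples))
    return period // audio_frame_samples, period // video_frame_samples


for fps in (25, 30):
    print(f"{fps} fps  AAC: {alignment(AAC_FRAME, fps)}  Opus: {alignment(OPUS_FRAME, fps)}")
```

Under these assumptions an AAC frame boundary only coincides with a 25 fps video frame boundary every 8 video frames, while two 20 ms Opus frames fit each video frame exactly. Integer frame rates only; 29.97 fps and friends would need rational arithmetic.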
I think often of how all it would have taken was a bomb for the 10 or so people that years ago at some browser vendor consortium out of pure self centeredness went „nah lets fragment“. We could have saved many many collective years, electricity and eyeballs simply watching the most basic content.
Most of my collection is Opus 256K, the only downside is support. A lot of tools like Bliss/Roon don't support it :(
I think the biggest issue with Opus is that its specification is lacking, see:
https://nothings.org/stb/stb_opus.html
This essentially causes opus to never be used in games or in things in stores that may have issues with specific licenses.
That's a lot of words just to say "The specification points to the code as the primary specification; basing an implementation on their code (a derivative work) is subject to their license; I don't want to create a derivative work, I want it to be my own work, so I won't implement Opus."
I'm amazed by Opus saving ~half the disk space (or double the quality, as you will) and recommend it to everyone, so I'm interested in any critique to be aware of and thus kept reading, looking for what all the issues are that one might have with Opus. Wasted my time. It's a single issue and it was being hinted at since near the start. If I need a lecture on US copyright, I'll go to Wikipedia. But then I suppose that's under a different license and just won't do
That’s going a bit far. I’m in the games industry and have used opus regularly, it’s a great codec for games, often the hardware decoding is so restricted that we’re using software regardless so we might as well use something like opus.
The licensing restriction is unfortunate, but only restrictive for those with very specific goals, under normal conditions BSD is a wonderful license for game devs since you’re free to use the code and only have to add an acknowledgement somewhere.
I suppose a public domain game might hit the same limitation, though as a non-lawyer I would guess the chance of anyone with standing trying to sue anyone implementing from this spec is realistically zero (though I don’t fault stb for being unwilling to roll those dice!)
This essay says it's not possible to make a public-domain implementation of Opus. But it could be released under BSD (as libopus is), which is fine for games, as evidenced by the Licenses section of the credits in many games.
Most games use the sound support that comes with their game engine or choice of sound system, so I don't think the lack of an STB version is an issue. Performance is more of a problem. Audiokinetic, the makers of the popular Wwise audio system, estimate that Opus takes ~3-5x the CPU of Vorbis:
https://www.audiokinetic.com/en/community/blog/a-guide-for-c...
Choosing a lossy audio codec has become such a no brainer. Either use opus and be done with it, or if for some reason opus cannot be used, use aac for compatibility, at an insanely high bitrate so the quality is good without having to research which encoder and mode to pick.
Still having a good quality and default aac encoder is great. Though I don't get why it is mainly CBR.
> Choosing a lossy audio codec has become such a no brainer.
Falser words hath never been spoken.
Eh, I prefer Vorbis mostly because it's still competitive at transparent bitrates (esp. with Aotuv patches) and benefits from a much saner volume normalization spec (simply transfer RG 2.0 tags from the FLAC source): Xiph decided to exclude peak information from Opus' spec while adding that weird thing where album gain is stored in the format header and additional track gain in the metadata.
It also uses less battery on my Rockbox'd Clip+.
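For anyone moving ReplayGain tags from a FLAC source to Opus, the gain part can be converted, though the peak value has nowhere to go, as noted above. A rough sketch, assuming the Ogg Opus `R128_TRACK_GAIN` / `R128_ALBUM_GAIN` tags are Q7.8 fixed-point integers referenced to -23 LUFS and that the ReplayGain 2.0 values are referenced to -18 LUFS; the function name is made up, and any gain already written to the Opus header would have to be accounted for separately.

```python
R128_REFERENCE_LUFS = -23.0         # assumed reference for Opus R128_* tags
REPLAYGAIN2_REFERENCE_LUFS = -18.0  # assumed reference for ReplayGain 2.0


def replaygain_to_r128(rg_gain_db: float) -> int:
    """Convert a ReplayGain 2.0 gain (dB) to a Q7.8 integer for an R128_*_GAIN tag."""
    # Shift the gain by the difference between the two reference levels (-5 dB).
    r128_db = rg_gain_db + (R128_REFERENCE_LUFS - REPLAYGAIN2_REFERENCE_LUFS)
    # Q7.8 means the value is stored as dB * 256 in a signed 16-bit integer.
    q78 = round(r128_db * 256)
    if not -32768 <= q78 <= 32767:
        raise ValueError("gain does not fit in a signed 16-bit Q7.8 value")
    return q78


print(replaygain_to_r128(-2.5))  # value to write as the tag's decimal string
```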
I would like Opus, but I’m using a subsonic client on iOS and my choice has been Flac (Alac?), MP3, or AAC. Opus wouldn’t play (There are some that supported it, but I didn’t like their UX).
You might like Poppy (in beta), which supports all media servers (including OpenSubsonic/Navidrome) and Opus as a first-class music format. https://www.reddit.com/r/PoppyApp/comments/1tiyki0/about_pop...
> Man what a showcase for Opus this is.
I take it you mean this Opus (https://en.wikipedia.org/wiki/Opus_(audio_format)) not that Opus (https://en.wikipedia.org/wiki/Claude_(AI)).
I read almost all the way through your comment thinking there was a decent probability you were saying this new AAC encoder was written with Claude Opus.
I've never been AI guy, and have more fascination with audio. I've long stopped being excited when I read "Opus" on HN. It's refreshing when it turns out to be the audio codec.
Nice, I'm looking forward to seeing how this performs in practice. FFmpeg's previous AAC encoder produced poor quality output and often had irritating chirping artifacts, so I've always had to install Apple's Core Audio encoder on any computer I do video recording on to get decent sound. I've done A/B/X comparisons and found that a 320kbps MP3 sounds better than a 320kbps AAC encoded by FFmpeg, but about the same as a 256kbps AAC encoded by Core Audio. If installing Core Audio is no longer necessary, that'll be a huge improvement and people who use something like OBS to do screen recordings or streaming will get a massive sound quality boost the next time they update.
A useful project related to Apple's Core Audio is qaac - it wraps iTunes Windows DLL's in a standalone encoding tool with a CLI interface. I believe it even works under Wine on Linux: https://web.archive.org/web/20250814194428/https://www.andre... So you don't need a Mac or even a full iTunes installation to get high quality AAC encoding.
In the Hydrogenaudio discussion thread's metrics table, the new encoder scores better than Core Audio. But this is at constant bitrate (CBR) [edit: maybe not? see lesscraft's reply below]. Core Audio also has variable bitrate modes (TVBR) which the new encoder lacks.
So maybe Core Audio will continue to be the best when TVBR is available, but I'm hopeful the new FFmpeg encoder will be "good enough", especially if more folks find and contribute problem samples to help tune it.
The benchmarks were made using afconvert on OSX with the default VBR settings.
Why not use a lossless codec if you care about quality? Or use Opus, decent for speech and works pretty much anywhere these days.
> Why not use a lossless codec if you care about quality?
(1) Lossy codecs are transparent at half the file size (or less) of FLAC/ALAC.
(2) AAC (strictly, AAC-LC) is universal, where FLAC and Opus are not yet there.
You can care about quality to the extent that a lossy codec allows. Lossless is not always necessary or wanted. This is like saying “why care about transcoding quality when you can keep the video as is?”. There’s a myriad of use cases and preferences at play here.
There are a ton of older, but still perfectly usable devices that support AAC well but not Opus.
I was using the FDK AAC encoder; I didn’t know Apple’s encoder was available on non-Apple systems. I once compared FDK AAC to Apple AAC at 192kbps and couldn’t tell the difference, while the old FFmpeg AAC encoder fell apart at this bitrate.
i will never understand apples cuckoldry for proprietary codecs, if it wasn't for their adoption of h265 we would live in the av1 utopia
>FFmpeg's AAC DEcoder is busted with regards to stereo PNS, and the bug may be in other AAC decoders too, so we work around it in the encoder. Since no other encoder used PNS, the bug was not found until now.
I don't know what PNS is, but I bet this has been bothering someone's niche use-case for 20 years
The issue was twofold, on one hand, using TNS on top of PNS meant the noise that got inserted was shaped by TNS, which is nonsense since the decoder generated the noise, not the encoder. This made PNS explode. The second, biggest issue was that using PNS in combination with any stereo tools resulted in noise leaking in both channels equally, ruining stereo imaging. So the best and only thing to do was to enable PNS only if the band in both channels is noise (or is sufficiently non-tonal and masked).
https://www.audiolabs-erlangen.de/content/resources/aesCodin...
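The stereo rule described above (only use PNS when the band is noise in both channels) boils down to a per-band AND. This is a toy sketch of that decision, not the encoder's actual code: the `BandAnalysis` fields and both thresholds are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BandAnalysis:
    """Hypothetical per-band psychoacoustic result for one channel."""
    tonality: float  # 0.0 = pure noise, 1.0 = pure tone
    masked: bool     # True if the band sits below the masking threshold


# Hypothetical thresholds, chosen only for illustration.
NOISE_TONALITY_MAX = 0.1
NON_TONAL_MAX = 0.3


def looks_like_noise(band: BandAnalysis) -> bool:
    """Band is noise, or is sufficiently non-tonal and masked."""
    if band.tonality <= NOISE_TONALITY_MAX:
        return True
    return band.tonality <= NON_TONAL_MAX and band.masked


def use_pns(left: BandAnalysis, right: BandAnalysis) -> bool:
    """Enable PNS for a band only if it qualifies in both channels."""
    return looks_like_noise(left) and looks_like_noise(right)


print(use_pns(BandAnalysis(0.05, False), BandAnalysis(0.2, True)))
print(use_pns(BandAnalysis(0.05, False), BandAnalysis(0.8, False)))
```

If either channel carries tonal content in the band, the band is coded normally, so noise substitution cannot smear across the stereo image.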
Hah, this sounds like the audio equivalent of Netflix’s grain reconstruction.
It’s fascinating so much of this comes down to the developer’s own ears - disturbing and quite cool at the same time how subjective this is
> The encoder was mainly optimized for 48Khz audio. Get over it. It's 2026, resampling is free, 48Khz is the standard. 44.1Khz will work, and so will 96Khz but use 48Khz if you want the best quality.
Is 48kHz really the standard nowadays?
I think the closest thing to an actual "standard" is AES5-2018, "Recommended practice for professional digital audio".
Abstract:
> A sampling frequency of 48 kHz is recommended for the origination, processing, and interchange of audio programs employing pulse-code modulation. Recognition is also given to the use of a 44.1-kHz sampling frequency related to certain consumer digital applications, the use of a 32-kHz sampling frequency for transmission-related applications, and the use of a 96-kHz sampling frequency for applications requiring a higher bandwidth or more relaxed anti-alias filtering. This revision further quantifies the preferred choices for higher sampling frequencies.
Edit: From my personal perspective, 44.1kHz is a legacy minor annoyance
48kHz makes alignment between video and audio so much easier. (I.e.: Lip synchronization after edits)
AAC has a strange quirk that the window size is dependent on the sampling rate, thus requiring a complete psychoacoustics reoptimization of all encoder parameters for each sampling rate, since a 20msec window sounds very different than a 60msec window, to human ears.
This was of course fixed in Opus.
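To put numbers on the window-size point: an AAC-LC frame is a fixed number of samples, so its duration scales with the sampling rate. A quick sketch, assuming the common 1024-sample frame length (the sample rates listed are just examples):

```python
AAC_LC_FRAME_SAMPLES = 1024  # assumed frame length in samples


def frame_duration_ms(sample_rate_hz: int,
                      frame_samples: int = AAC_LC_FRAME_SAMPLES) -> float:
    """Duration of one frame in milliseconds at the given sampling rate."""
    return 1000.0 * frame_samples / sample_rate_hz


for rate in (16_000, 32_000, 44_100, 48_000, 96_000):
    print(f"{rate:>6} Hz -> {frame_duration_ms(rate):6.2f} ms per frame")
```

That gives roughly 21 ms at 48 kHz and 64 ms at 16 kHz, which is about the 20 ms versus 60 ms spread mentioned above.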
I know the opus codec assumes everything is 48kHz and will resample inputs to that.
Pretty much all DACs run at 48Khz by default due to operating systems picking it as a sane default.
Pipewire will quite happily pipe through audio without resampling if it is the only source on a system. You can see this by running pw-top and using speaker-test with various sample rates.
More or less. Streaming is often done at 48 kHz, and video content has been 48 kHz for a while now, so unless you still produce content for CDs it is the standard.
44100 Hz had reasons that no longer really apply (storing audio as 3 samples per line on VHS: 490 lines × 3 samples × 30 fps = 44100 samples/s).
Quality-wise both are more than enough, and 99.99% of people would not be able to tell them apart in a blind test. Sample rates higher than 48 kHz are only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals, for example).
Aside from this, sample rates higher than 48 kHz may only have downsides, like increased size and potential distortion in the ultrasonic frequency range that has sidebands in the audible range. Yet there is a persistent but unscientific "more-is-better" crowd in the HiFi sector.
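The 44100 figure from the video-tape arithmetic above checks out. A trivial sketch using only the numbers given in the comment (490 lines, 3 samples per line, 30 frames per second):

```python
LINES_PER_FRAME = 490     # usable video lines per frame, per the comment
SAMPLES_PER_LINE = 3      # audio samples stored on each line
FRAMES_PER_SECOND = 30    # video frame rate

sample_rate = LINES_PER_FRAME * SAMPLES_PER_LINE * FRAMES_PER_SECOND
print(sample_rate)  # 44100
```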
> Higher sample rates than 48kHz only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals for example).
There are numerous use cases for higher sample rates that go beyond this but it's hard to talk about it without starting flame wars filled with junk science.
For one, audio transcription services that use Whisper will sample the input down to 16Khz mono first.
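If you want to do that downsampling yourself before handing a file to a transcriber, ffmpeg's `-ar` (sample rate) and `-ac` (channel count) options cover it. A small sketch that only builds the command line; the file names are placeholders, and this is not claimed to be what any particular service runs internally.

```python
import shlex


def whisper_prep_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg invocation that converts src to 16 kHz mono audio."""
    return ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000", dst]


# Print the command instead of running it, so this works without ffmpeg installed.
print(shlex.join(whisper_prep_command("talk.m4a", "talk_16k.wav")))
```

Pass the list to `subprocess.run(..., check=True)` to actually run the conversion.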
Yes, pretty much all new hardware uses it as default output setting as well (by that I mean laptops, phones, smart speakers, etc.)
Yes and no. It is the standard for audio in film, which explains the author's focus. But is the audio CD bigger and more "standarder" than DVD and Blu-Ray? I think they're equals, and I personally think this encoder only makes sense for video content. Given all the caveats the author mentions (in particular about the sample rate) I would steer clear from using it when ripping CDs.
48kHz has been the recommended setting with Premiere Pro as long as I can remember.
44.1kHz, isn't that what lameMP3 uses as default?
It's what CDs use, so it would make sense for mp3 encoders to follow suit.
Nice, I can’t wait to see how this turns out in practice.
Last time I used ffmpeg to encode songs for my iPod nano they were broken; playback was interrupted by pops and clicks every few seconds. I wonder if this is fixed now?
A very welcomed addition, hopefully I can replace fdk-aac
This is truly representative of the old internet: somebody codes up the best AAC encoder ever, and the first response comes from some admin, and it's some bickering about 48Khz vs 44Khz.
The older I get, the more it seems possible to ping-pong between rewrites for good reasons (e.g. here the metrics are maxed out, but I find it hard to believe VBR and non-48 kHz support are silly things not worth investing in).
I always encode my AAC with VBR. Why wouldn't you, right? I guess I'll stick to apple or fdkaac for now.
HA, a blast from the past, when audio encoders were making strides and collecting mp3s was a thing. Same for video encoders.
It was kind of fun being able to easily distinguish 128kbps MP3 from the source audio. (Some early encoders were really bad.)
Flagged for the wrong link.
Hopefully they see this - there's still time to edit the submission link.
It doesn't let me edit the link, but I'm confused by what even happened here... I posted this from my phone and that wrong link doesn't show up in my clipboard history.
Link should be: https://hydrogenaudio.org/index.php/topic,129691.0.html