
Comment by sdenton4

1 day ago

When I was working in audio compression, evaluation was very painful because we had no programmatic way to measure how good some reconstructed audio sounds to a human. Any metric you could come up with was gameable, and direct optimization would lead to artifacts.

As a result, we always had a two-step evaluation process. We would use a suite of metrics to guide development progress (validation), but the final evaluation reported in a paper always involved subjective human listening experiments. This was expensive, but it was the only way to show that the codecs were actually improving.

Similarly, here it seems fine to use LLMs to judge your work in progress, but we should be requiring human evaluation for 'final' results.

Wouldn't that process keep you from finding a codec that sounds subjectively better but doesn't improve typical metrics (PSNR, etc.)? Another approach would be to first build a software metric that tries to match the subjective experience of humans, then use it to create audio codecs that optimize that metric.
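For reference, here is a minimal sketch of PSNR, the metric named above, to show why it can miss subjective differences. It depends only on mean squared error, so two very different error patterns can score the same. The function name, the `peak=1.0` convention, and the toy signals are illustrative assumptions, not anything from the thread.

```python
import math

def psnr(reference, degraded, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(degraded) or not reference:
        raise ValueError("signals must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy signals (hypothetical): silence, plus two error patterns with equal MSE.
reference = [0.0] * 8
spread_error = [0.1] * 8                          # small error on every sample
burst_error = [0.0] * 7 + [0.1 * math.sqrt(8)]    # one large click

print(psnr(reference, spread_error))
print(psnr(reference, burst_error))
```

Both calls give essentially the same score, even though a constant offset and a single click would not be expected to sound alike. The metric cannot tell them apart because it only sees the total squared error.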

  • There are two answers to that.

    The first is, how do you know the subjective optimization you're making is actually any good? You're just moving the problem back one layer of abstraction.

    The second is, we did that, eventually, by training models (ViSQoL) to predict subjective listening scores from the giant pile of subjective test data we had collected over the years. It's great, but we still don't trust it for end-of-the-day, cross-codec comparison, because we don't want to reward overfitting to the trained model.

    https://arxiv.org/abs/2004.09584

  • You are describing psychoacoustic models, which work to a reasonable extent for lossy compression of audio (MP3 and successors are based on them), but I can see how it would be much more difficult/less helpful for reconstructing audio.

You gotta snag yourself one of those awesome KEMAR dummy head and torso simulators, preferably the fully accessorized luxury edition that comes with the heavy duty portable travel case with lots of room for extra ears and microphones and wigs, which is so much fun to take through airport security.

They were great for taking to Grateful Dead concerts to record the music directly in front of the Wall of Sound, and to measure the response so you can play back all your Dead tapes with that same front row psychoacoustic perspective. ;)

https://www.grasacoustics.com/industries/kemar/applications-...

https://www.grasacoustics.com/products/accessories/product/4...