Comment by petercooper
7 years ago
Here we go! This is the first minute or so of Penny Lane by The Beatles converted down to a 10KB .bin and then back to a .wav: http://no.gd/pennylane.wav .. unsurprisingly the vocals remain recognizable, but the music barely at all.
As imagined by Marilyn Manson...
Pretty much! It shows off how the codec works to a great extent though as it seems to be misinterpreting parts of the music to be the pitch of the speech, so Paul's voice sounds weird at the start of most lines but okay throughout the lines.
I've also run a BBC news report through the program with better results although it demonstrates that any background noise at all can throw things off significantly: https://twitter.com/peterc/status/1111736029558517760 .. so at this low bitrate, it really is only good for plain speech without any other noise.
Well, in the case of music, what happens is that due to the low bit-rate there are many different signals that can produce the same features. The LPCNet model is trained to reproduce whatever is most likely to be a single person speaking. The more advanced the model, the more speech-like the music is likely to turn out.
When it comes to noisy speech, it should be possible to improve things by actually training on noisy speech (the current model is trained only on clean speech). Stay tuned :-)
Can you try it with Tom's Diner by Suzanne Vega? It's sung without any instruments, and an early version of MP3 reportedly was a disaster on that song.
Here you go: http://no.gd/vega2.wav
It holds up ridiculously well considering the entire song compresses down to 25392 bytes.
The lyrics of the song are 1200 characters long, so this version of the song only takes up twenty times more space than the written lyrics.
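For anyone who wants to check that ratio, here is a quick back-of-the-envelope sketch. It assumes one byte per lyric character (plain ASCII text), which is an assumption and not something stated above; the variable names are just for illustration.

```python
# Sizes quoted in the comments above.
compressed_bytes = 25392   # size of the encoded song
lyric_chars = 1200         # approximate length of the written lyrics

# Assumption: one byte per character (plain ASCII lyrics).
lyric_bytes = lyric_chars * 1

ratio = compressed_bytes / lyric_bytes
print(f"Encoded audio is about {ratio:.1f}x the size of the lyrics")
```

So "twenty times" is a fair rounding of roughly 21x under that assumption.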
Compare it with this now: https://youtu.be/lHjn8ffnEKU :-)
Could you also try "I Feel Love" by Donna Summer?
I am curious how it sounds when there is a really active bassline and lead synth.
I'm getting a 404 on this
I suspect the reason that excerpt sounds so bad is that the music has several instruments playing at once. One doesn't generally design a vocoder to deal with more than one voice. As that excerpt plays, you can hear that the most prominent instruments (e.g. the bass at several moments) sound pleasing, albeit speech-like.
It would probably be different from the original music, but pleasant, if one processed each track separately.
Right. This form of compression assumes a primary single pitch, plus variations from that tone. You can hear it locking into different components of the song and losing almost everything else.
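To illustrate the "single pitch" assumption, here is a toy sketch of a brute-force autocorrelation pitch estimator. It is not the codec's or LPCNet's actual pitch tracker; the function names, the 80-400 Hz search range, and the test tones are all made up for the example. The point is only that an estimator of this shape returns exactly one pitch per frame, so a mixture of two tones still gets reduced to a single value.

```python
import math

def estimate_pitch(samples, sample_rate, fmin=80.0, fmax=400.0):
    """Return one pitch estimate in Hz using brute-force autocorrelation."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def tone(freq, sample_rate=8000, count=400):
    """Generate `count` samples of a pure sine wave at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(count)]

# A single "voice" at 200 Hz: the estimator recovers it.
single_pitch = estimate_pitch(tone(200.0), 8000)
print(single_pitch)

# Two simultaneous tones: the estimator still reports only one number.
mix = [a + b for a, b in zip(tone(200.0), tone(310.0))]
mixed_pitch = estimate_pitch(mix, 8000)
print(mixed_pitch)
```

Whatever single value comes out for the mixture, the second tone has no pitch slot of its own, which matches the "locking into different components" effect described above.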
Heavy compression of voice is vulnerable to background noise.
I miss the classic telco 8K samples per second, 8 bits. We used to think that was crappy audio.
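For scale, a rough comparison of that telco format with the roughly 10KB-per-minute figure from the top of the thread. This is a sketch under two assumptions that are mine, not the original poster's: that "10KB" means 10 × 1024 bytes and that the clip is exactly 60 seconds long.

```python
# Classic telco audio: 8000 samples per second, 8 bits per sample.
telco_bps = 8000 * 8

# Assumptions: "10KB" = 10 * 1024 bytes, clip length = exactly 60 seconds.
clip_bytes = 10 * 1024
clip_seconds = 60
codec_bps = clip_bytes * 8 / clip_seconds

print(f"Telco: {telco_bps} bit/s")
print(f"10KB per minute: about {codec_bps:.0f} bit/s")
print(f"Telco uses about {telco_bps / codec_bps:.0f}x more bits")
```

Under those assumptions the telco stream is 64 kbit/s, while the 10KB-per-minute file works out to somewhere around 1.4 kbit/s.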
Hilariously nightmarish. I'm going to use this for my alarm clock...
Sounds like a typical LPC encoder at a low bitrate, like maybe 5 kbps.