Comment by petercooper

7 years ago

Here we go! This is the first minute or so of Penny Lane by The Beatles converted down to a 10KB .bin and then back to a .wav: http://no.gd/pennylane.wav .. unsurprisingly the vocals remain recognizable, but the music barely at all.


bch 7 years ago

As imagined by Marilyn Manson...

petercooper 7 years ago
Pretty much! It shows off a lot about how the codec works, though: it seems to misinterpret parts of the music as the pitch of the speech, so Paul's voice sounds weird at the start of most lines but okay through the rest of each line.
I've also run a BBC news report through the program with better results, although it demonstrates that any background noise at all can throw things off significantly: https://twitter.com/peterc/status/1111736029558517760 .. so at this low bitrate, it really is only good for plain speech without any other noise.
- jmvalin 7 years ago
  
  Well, in the case of music, what happens is that due to the low bit-rate there are many different signals that can produce the same features. The LPCNet model is trained to reproduce whatever is most likely to be a single person speaking. The more advanced the model, the more speech-like the music is likely to turn out.
  When it comes to noisy speech, it should be possible to improve things by actually training on noisy speech (the current model is trained only on clean speech). Stay tuned :-)

bonzini 7 years ago

Can you try it with Tom's Diner by Suzanne Vega? It's sung without any instruments, and an early version of MP3 reportedly was a disaster on that song.

petercooper 7 years ago
Here you go: http://no.gd/vega2.wav
It holds up ridiculously well considering the entire song compresses down to 25392 bytes.
- StavrosK 7 years ago
  
  The lyrics of the song are 1200 characters long, so this version of the song only takes up twenty times more space than the written lyrics.
  
- bonzini 7 years ago
  
  Compare it with this now: https://youtu.be/lHjn8ffnEKU :-)
- bravura 7 years ago
  
  Could you also try "I Feel Love" by Donna Summer?
  I am curious how it sounds when there is a really active bassline and lead synth.
  
- godelski 7 years ago
  
  I'm getting a 404 on this
  
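The size comparison in the replies above is easy to check. Below is a quick sketch in Python using the two figures quoted in the thread (25392 bytes and roughly 1200 characters). The constant `ASSUMED_BITRATE_BPS` is an assumption that is not stated in these comments: it supposes the codec runs at a constant 1600 bit/s, which is only used here to see what song length that file size would imply.

```python
# Figures quoted in the thread.
COMPRESSED_BYTES = 25392   # size of the compressed Tom's Diner file
LYRICS_CHARS = 1200        # approximate lyric length; one byte per character assumed

# Assumption (not stated in these comments): constant codec bitrate of 1600 bit/s.
ASSUMED_BITRATE_BPS = 1600


def size_ratio(compressed_bytes, text_bytes):
    """How many times larger the compressed audio is than the plain text."""
    return compressed_bytes / text_bytes


def implied_duration_s(size_bytes, bitrate_bps):
    """Duration in seconds that a file of this size holds at a constant bitrate."""
    return size_bytes * 8 / bitrate_bps


ratio = size_ratio(COMPRESSED_BYTES, LYRICS_CHARS)
seconds = implied_duration_s(COMPRESSED_BYTES, ASSUMED_BITRATE_BPS)

print(f"audio is about {ratio:.1f}x the size of the lyrics")
print(f"implied duration at the assumed bitrate: {seconds:.0f} s")
```

So "twenty times" is a slight round-down; the ratio from the quoted numbers is a little over 21.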

charlesism 7 years ago

    the music barely at all.

I suspect the reason that excerpt sounds so bad is that the music has several instruments playing at once. One doesn't generally design a vocoder to deal with more than one voice. As that excerpt plays, you can hear that the most prominent instruments (e.g. the bass at several moments) sound pleasing, albeit speech-like.

The result would probably be different from the original music, but pleasant, if one processed each track separately.

Animats 7 years ago

Right. This form of compression assumes a primary single pitch, plus variations from that tone. You can hear it locking into different components of the song and losing almost everything else.
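To illustrate the single-pitch assumption, here is a minimal sketch in Python of a textbook autocorrelation pitch estimator. This is not the codec's actual pitch tracker; the function names, the 60-400 Hz search range, and the test tones are all made up for illustration. The point is only that such an estimator returns one pitch per frame, so a second simultaneous tone has nowhere to go.

```python
import math


def tone(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Generate a sine tone as a list of floats."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Return the one pitch (Hz) whose lag maximizes the raw autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


SAMPLE_RATE = 8000
N = 800  # 100 ms of audio

# A lone "voice" at 200 Hz is recovered cleanly.
voice = tone(200.0, SAMPLE_RATE, N)
solo_estimate = estimate_pitch_hz(voice, SAMPLE_RATE)

# Add a quieter "instrument" at 310 Hz: the estimator still reports one number.
instrument = tone(310.0, SAMPLE_RATE, N, amplitude=0.5)
mixed = [v + i for v, i in zip(voice, instrument)]
mixed_estimate = estimate_pitch_hz(mixed, SAMPLE_RATE)

print(f"solo:  {solo_estimate:.1f} Hz")
print(f"mixed: {mixed_estimate:.1f} Hz")
```

Whatever the mixed estimate turns out to be, it is a single value standing in for two sources, which is the "locking on and losing the rest" effect in miniature.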

Heavy compression of voice is vulnerable to background noise.

I miss the classic telco 8K samples per second, 8 bits. We used to think that was crappy audio.

qwerty456127 7 years ago

Hilariously nightmarish. I'm going to use this for my alarm clock...

sehugg 7 years ago

Sounds like a typical LPC encoder at a low bitrate, like maybe 5 kbps.
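For scale, the bitrates mentioned in this thread can be put side by side. A rough sketch in Python follows; it assumes that "10KB" means 10,000 bytes and that "the first minute or so" is exactly 60 seconds, neither of which is stated precisely in the comments.

```python
def bitrate_bps(size_bytes, duration_s):
    """Average bitrate in bit/s for a file of the given size and duration."""
    return size_bytes * 8 / duration_s


# Classic telephone PCM as described in the thread: 8000 samples/s, 8 bits each.
telco_bps = 8000 * 8

# Penny Lane excerpt. Assumptions: 10 KB = 10,000 bytes, duration = 60 s.
penny_lane_bps = bitrate_bps(10_000, 60)

# The 5 kbps guess, expressed as the bytes needed for the same assumed 60 s.
guess_bps = 5000
guess_bytes_per_minute = guess_bps * 60 / 8

print(f"telco PCM:       {telco_bps} bit/s")
print(f"Penny Lane file: {penny_lane_bps:.0f} bit/s")
print(f"telco / file:    {telco_bps / penny_lane_bps:.0f}x")
print(f"5 kbps for 60 s: {guess_bytes_per_minute:.0f} bytes")
```

On those assumptions the excerpt works out to roughly 1.3 kbit/s, well under the 5 kbps guess and about 48 times lower than telephone PCM.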