Comment by rasz

7 years ago

3 GFLOPS; we are well past the point of diminishing returns here. Opus seems good enough.

Keep in mind that the very first CELP speech codec (in 1984) took 90 seconds to encode just 1 second of speech... on a Cray supercomputer. Ten years later, people had that running in their cell phones. It's not just that hardware keeps getting faster; algorithms are also getting more efficient. LPCNet is already 1/100 the complexity of the original WaveNet (which is just 2 years old), and I'm pretty sure it's still far from optimal.

  • This is roughly >100x the computation for a 2x improvement, which might sound great, except we are already talking single-digit kbit/s here; hence diminishing returns.

Opus is awesome and covers a previously unmatched spectrum of use cases... but that isn't everything.

Opus isn't good enough to replace AMBE for use over radio. Opus doesn't make very high-quality speech synthesis any easier, etc.

Opus's loss robustness could be much better using tools from this toolbox, and we're a long way from not wanting better performance in the face of packet loss.

  • Opus is still improving: v1.1 to v1.2 and then on to v1.3 (current in FFmpeg) saw huge reductions in encoding compute, and the minimum bitrate for stereo wideband has fallen year after year.

    The limiting factor for Opus's penetration has been compute: FEC is still rarely supported on VoIP deskphones for this reason, and likewise for handling multiple Opus calls at once.

3 GFLOP/sec sounds like a lot, but it's considerably less math than the radio DSPs inside any modern phone's baseband are doing during a phone call.

  • I don't know much about phone tech: are the basebands really doing math, or just instrumenting? My assumption would be that there is just some sensor writing to a buffer at a high frequency, but that whatever processes that buffer operates at a lower frequency.

    • Your question is hard to parse. What do you mean by "instrumenting"? If it helps, though: the term "baseband" itself refers to the low-frequency representation containing just the bandwidth of the signal. I.e., that is the lower frequency...


Not really, no. Especially not if this is implemented in a specialized accelerator, where a GFLOP is not that much. Also, like most other neural-network algorithms, this could be done in fixed point, further reducing the computational cost.
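The fixed-point idea can be illustrated with a minimal NumPy sketch (my own toy example, not LPCNet's actual implementation): quantize weights and activations to int8, do the matrix-vector product in integer arithmetic, and rescale at the end.

```python
import numpy as np

def quantize(w, bits=8):
    """Symmetric linear quantization to signed integers."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32)).astype(np.float32)  # layer weights
x = rng.standard_normal(32).astype(np.float32)        # input activations

Wq, w_scale = quantize(W)
xq, x_scale = quantize(x)

# Integer matrix-vector product, accumulated in int32 to avoid overflow
acc = Wq.astype(np.int32) @ xq.astype(np.int32)

# Rescale the integer result back to the floating-point domain
y_approx = acc * (w_scale * x_scale)
y_exact = W @ x

rel_err = np.linalg.norm(y_approx - y_exact) / np.linalg.norm(y_exact)
```

On hardware with fast 8-bit multiply-accumulate instructions, the integer product is much cheaper than the float one, and the relative error for a layer this size stays in the low single-digit percent range.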

I wonder if a particular network can be implemented more economically if we only run it in "transformation" mode, without the need to train it.

There are techniques to compress deep networks by pruning weak connections. I don't believe the author is using this, so the computational cost could likely be reduced by a factor of 10. Simple tweaks to the NN architecture might also help (was the author aiming for as small a network as possible to begin with?).

  • Actually, what's in the demo already includes pruning (through sparse matrices), and indeed it keeps just 1/10 of the weights as non-zero. In practice it's not quite a 10x speedup because the network has to be a bit bigger to get the same performance, but it's still a pretty significant improvement. Of course, the weights are pruned in 16x1 blocks to avoid hurting vectorization (see the first LPCNet paper and the WaveRNN paper for details).
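The block-pruning scheme described above can be sketched in a few lines of NumPy (a simplified illustration of magnitude-based block pruning; the function name `block_prune` and the exact selection rule are my own, not taken from the LPCNet code):

```python
import numpy as np

def block_prune(W, block=16, density=0.1):
    """Keep only the strongest `density` fraction of 16x1 weight blocks.

    Zeroing whole contiguous blocks (rather than individual weights)
    preserves SIMD-friendly memory access, as in LPCNet/WaveRNN.
    """
    rows, cols = W.shape
    assert rows % block == 0
    # View the matrix as (rows/block) x cols blocks of shape block x 1
    B = W.reshape(rows // block, block, cols)
    norms = np.sum(B ** 2, axis=1)          # squared L2 norm per block
    k = int(norms.size * density)           # number of blocks to keep
    thresh = np.sort(norms.ravel())[::-1][k - 1]
    mask = (norms >= thresh)[:, None, :]    # broadcast over the block dim
    return (B * mask).reshape(rows, cols)

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 64))
Wp = block_prune(W)
kept = float(np.mean(Wp != 0))  # ≈ 0.1 of the weights remain non-zero
```

A real implementation prunes gradually during training so the surviving weights can adapt, and stores the result in a sparse format so the zeroed blocks cost nothing at inference time.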