Comment by janalsncm

9 months ago

> I wonder if there's a way to automatically detect how "fast" a person talks in an audio file

Transcribe it locally using whisper and output tokens/sec?
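A minimal sketch of that idea, assuming you already have some local transcriber wrapped as a function that returns a list of tokens. The names `tokens_per_second`, `fake_transcribe`, and `talk.wav` are hypothetical, and the stub stands in for a real Whisper call so the snippet runs on its own. The duration has to be in seconds, not in samples.

```python
from typing import Callable, Sequence


def tokens_per_second(
    transcribe: Callable[[str], Sequence[str]],
    audio_path: str,
    duration_s: float,
) -> float:
    """Return transcribed tokens per second of audio.

    `transcribe` is a hypothetical callable wrapping whatever local
    speech-to-text model you use; it must return a sequence of tokens.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    tokens = transcribe(audio_path)
    return len(tokens) / duration_s


def fake_transcribe(audio_path: str) -> list[str]:
    # Stand-in for a real model: pretends every file contains this sentence.
    return "the quick brown fox jumps over the lazy dog".split()


# Nine tokens over a three-second clip.
rate = tokens_per_second(fake_transcribe, "talk.wav", 3.0)
```

Whether you count model tokens or whitespace-separated words changes the absolute number, so compare rates only when they come from the same tokenization.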

3 comments

maxall4 9 months ago

Just count syllables per second by doing an FFT plus some basic analysis.
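A rough sketch of the "basic analysis" part, with one substitution stated up front: instead of an FFT it uses short-time energy in the time domain and counts energy bursts as a crude proxy for syllables. The function name `syllable_rate`, the 20 ms frame size, and the 0.3 threshold ratio are all assumptions for illustration, and the input is a synthetic gated tone rather than real speech.

```python
import math


def syllable_rate(samples, sample_rate, frame_ms=20.0, threshold_ratio=0.3):
    """Estimate energy bursts per second as a crude syllable-rate proxy."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))

    # Mean energy of each non-overlapping frame.
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    if not energies:
        return 0.0

    # Count each rise above the threshold once (one burst per rise).
    threshold = threshold_ratio * max(energies)
    bursts = 0
    above = False
    for energy in energies:
        if energy > threshold and not above:
            bursts += 1
        above = energy > threshold

    return bursts / (len(samples) / sample_rate)


# Synthetic input: a 200 Hz tone switched on for 125 ms and off for 125 ms,
# i.e. four bursts per second, for two seconds at 8 kHz.
SAMPLE_RATE = 8000
signal = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) if n % 2000 < 1000 else 0.0
    for n in range(2 * SAMPLE_RATE)
]
estimated_rate = syllable_rate(signal, SAMPLE_RATE)
```

Real speech is much messier than a gated tone: background noise, consonant clusters, and pauses all shift the burst count, so the threshold and frame size would need tuning against labeled audio.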

tucnak 9 months ago

> FFT plus some basic analysis

Yeah, totally easier than `len(transcribe(a))/len(a)`
- janalsncm 9 months ago
  
  Maybe not as quick to code up, but way faster to compute.
  The tokens/second from the transcript can be used as ground-truth labels for an FFT -> small neural net model.
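A toy sketch of that labeling setup, under heavy assumptions: a single linear unit trained by gradient descent stands in for the "small neural net", one scalar audio feature per clip stands in for FFT features, and both the features and the tokens/second labels below are made-up synthetic numbers, not measurements. `fit_linear` is a hypothetical helper.

```python
def fit_linear(features, labels, lr=0.05, epochs=2000):
    """Fit label ~ w * feature + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(features, labels)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, features)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic training set: one cheap audio feature per clip (for example a
# burst rate), paired with a tokens/second label that would come from the
# slower transcription pass. The labels follow 1.5 * feature + 0.5 exactly.
clip_features = [1.0, 2.0, 3.0, 4.0, 5.0]
clip_labels = [1.5 * x + 0.5 for x in clip_features]

w, b = fit_linear(clip_features, clip_labels)

# Predict tokens/second for a new clip from the cheap feature alone.
predicted = w * 3.5 + b
```

The point of the design is that transcription runs once, offline, to produce labels; at inference time only the cheap feature extraction and the small model run.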