Comment by janalsncm
9 months ago
> I wonder if there's a way to automatically detect how "fast" a person talks in an audio file
Transcribe it locally using whisper and output tokens/sec?
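A rough sketch of what that could look like. The `transcribe` argument is a hypothetical stand-in for whatever speech-to-text call you use (a local Whisper wrapper, for instance), and whitespace-separated words stand in for tokens here; Whisper's own tokens are subword units, so the absolute numbers would differ.

```python
from typing import Callable, Sequence


def tokens_per_second(
    audio: Sequence[float],
    sample_rate: int,
    transcribe: Callable[[Sequence[float]], str],
) -> float:
    """Words in the transcript divided by the clip length in seconds.

    `transcribe` is a placeholder for any speech-to-text function that
    takes raw samples and returns text; it is not a real library call.
    """
    duration_s = len(audio) / sample_rate  # samples -> seconds
    if duration_s == 0:
        raise ValueError("audio is empty")
    tokens = transcribe(audio).split()  # whitespace words as a token proxy
    return len(tokens) / duration_s
```

Dividing by `len(audio)` alone would give tokens per sample, so the sample rate has to come in somewhere.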
Just count syllables per second by doing an FFT plus some basic analysis.
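One way this might be done, as a sketch rather than a tested method: take the loudness envelope of the signal, run a Fourier transform over the envelope, and pick the strongest modulation frequency in a plausible syllable band. The 2-8 Hz band, the frame rate, and the function names are all assumptions, and a plain DFT is used so the example needs only the standard library (a real FFT would be faster).

```python
import cmath
import math


def energy_envelope(samples, sample_rate, frame_rate=50):
    """Mean absolute amplitude per frame; returns (envelope, actual frame rate)."""
    hop = sample_rate // frame_rate
    env = [
        sum(abs(x) for x in samples[i:i + hop]) / hop
        for i in range(0, len(samples) - hop + 1, hop)
    ]
    return env, sample_rate / hop


def syllable_rate_hz(samples, sample_rate, frame_rate=50, band=(2.0, 8.0)):
    """Strongest envelope modulation frequency inside `band`, in Hz.

    The band limits are an assumed range for syllable rates, not a
    measured one. Returns 0.0 if no frequency bin falls inside the band.
    """
    env, env_rate = energy_envelope(samples, sample_rate, frame_rate)
    if not env:
        raise ValueError("audio shorter than one frame")
    mean = sum(env) / len(env)
    env = [e - mean for e in env]  # remove the DC component
    n = len(env)
    best_freq, best_power = 0.0, 0.0
    for k in range(1, n // 2 + 1):
        freq = k * env_rate / n
        if not band[0] <= freq <= band[1]:
            continue
        # Naive DFT coefficient for bin k of the envelope.
        coeff = sum(
            e * cmath.exp(-2j * math.pi * k * t / n) for t, e in enumerate(env)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq
```

This gives one dominant rate per clip, so pauses and uneven pacing would blur the peak; counting envelope peaks over time is another option.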
> FFT plus some basic analysis
Yeah, totally easier than `len(transcribe(a))/len(a)`
Maybe not as quick to code up but way faster to calculate.
The tokens/second figures can then be used as ground-truth labels for an FFT -> small neural net model.
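A minimal sketch of that training setup, under heavy assumptions: each clip is reduced to one hypothetical scalar FFT feature (say, its dominant envelope modulation frequency), the label is the tokens/second from transcription, and a single linear neuron trained by gradient descent stands in for the small neural net. The function name and hyperparameters are made up for illustration.

```python
def fit_rate_model(features, labels, lr=0.02, epochs=5000):
    """Fit label ~ w * feature + b by gradient descent on mean squared error.

    features: one assumed FFT-derived number per clip.
    labels:   tokens/second for the same clips, from a transcriber.
    A single linear unit is used here in place of a real network.
    """
    if len(features) != len(labels) or not features:
        raise ValueError("need equally sized, non-empty features and labels")
    w, b = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(features, labels)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, features)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_rate(model, feature):
    """Predicted tokens/second for one clip's feature value."""
    w, b = model
    return w * feature + b
```

Once fitted, `predict_rate` needs only the cheap feature, so the transcriber is run just once to label the training clips.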