Comment by briansm
10 days ago
I believe YouTube still uses 40 mel-scale bands as feature data, whereas Whisper uses 80. The extra bands give finer spectral detail but are naturally more computationally intensive to process, though modern hardware allows for that.
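For anyone curious what 40 vs 80 means in practice, here is a rough sketch of how the band count changes spectral resolution. It is only an illustration, not either system's actual front end: the HTK-style mel formula, the 0 to 8000 Hz range, and the helper names (`hz_to_mel`, `mel_band_centers`, `mel_spacing`) are all my own assumptions.

```python
import math

def hz_to_mel(hz):
    # HTK-style mel formula (an assumption; other variants such as Slaney's exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spacing(n_mels, f_min=0.0, f_max=8000.0):
    # n_mels triangular bands need n_mels + 2 edge points,
    # so the mel range is split into n_mels + 1 equal steps.
    return (hz_to_mel(f_max) - hz_to_mel(f_min)) / (n_mels + 1)

def mel_band_centers(n_mels, f_min=0.0, f_max=8000.0):
    # Center frequencies in Hz of bands evenly spaced on the mel scale.
    lo = hz_to_mel(f_min)
    step = mel_spacing(n_mels, f_min, f_max)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]

centers_40 = mel_band_centers(40)
centers_80 = mel_band_centers(80)

# Each audio frame becomes one feature vector with one value per band,
# so 80 bands means twice as many values per frame as 40.
print("values per frame:", len(centers_40), "vs", len(centers_80))
print("mel step per band:", mel_spacing(40), "vs", mel_spacing(80))
```

The step between bands roughly halves when going from 40 to 80, which is the finer spectral detail, and every downstream layer has twice as many input values per frame to process.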