Comment by walthamstow
1 day ago
Seems quite heavy for an STT model. Parakeet and Whisper are much smaller and perform great for quick dictation and for transcribing longer files. I guess the extra size is due to additional accuracy and speaker diarisation?
The TTS example clip in the repo of 'spontaneous singing' is creepy as fuck