Comment by ben_w
12 hours ago
According to the Common Voice 15 graph in OpenAI's GitHub repository, Albanian has the single worst performance of any language it could have been: https://github.com/openai/whisper
But for what it's worth, I tried putting the YouTube video of Tom Scott presenting at the Royal Institution into the model, and even then the results were only "OK" rather than "good". When the transcript still has errors even with a professional presenter and a professional recording made in a quiet environment, the model is not really good enough to bother with.