Comment by pjc50
13 hours ago
I wanted this to work with Whisper, but the language I tried it with was Albanian and the results were absolutely terrible - not even readable English. I'm sure it would be better with Spanish or Japanese.
According to the Common Voice 15 graph in OpenAI's GitHub repository, Albanian is the single worst-performing language you could have picked: https://github.com/openai/whisper
But for what it's worth, I tried putting the YouTube video of Tom Scott presenting at the Royal Institution into the model, and even then the results were only "OK" rather than "good". When even a professional presenter, professionally recorded in a quiet environment, comes out with errors, the model is not really good enough to bother with.