Comment by pjc50

10 months ago

I wanted this to work with Whisper, but the language I tried it with was Albanian and the results were absolutely terrible - not even readable English. I'm sure it would be better with Spanish or Japanese.

ben_w 10 months ago

According to the Common Voice 15 graph in OpenAI's GitHub repository, Albanian is the single worst-performing language you could have picked: https://github.com/openai/whisper

But for what it's worth, I tried feeding the YouTube video of Tom Scott presenting at the Royal Institution into the model, and even then the results were only "OK" rather than "good". When even a professional presenter with professional sound recording in a quiet environment produces errors, the model is not really good enough to bother with.