Comment by pants2

1 year ago

That leaderboard omits the current SOTA which is GPT-4o-transcribe (an LLM)

2 comments

pants2

Do you have any comparisons in terms of WER? I doubt that GPT-4o-transcribe is better than the best models from that leaderboard (https://huggingface.co/spaces/hf-audio/open_asr_leaderboard). A quick search on this got me here: https://www.reddit.com/r/OpenAI/comments/1jvdqty/gpt4otransc... https://scribewave.com/blog/openai-launches-gpt-4o-transcrib...

It is stated that GPT-4o-transcribe is better than Whisper-large. That might be true, but what version of Whisper-large actually exactly? Looking at the leaderboard, there are a lot of Whisper variants. But anyway, the best Whisper variant, CrisperWhisper, is currently only at rank 5. (I assume GPT-4o-transcribe was not compared to that but to some other Whisper model.)

It is stated that Scribe v1 from elevenlabs is better than GPT-4o-transcribe. In the leaderboard, Scribe v1 is also only at rank 6.

pzo 1 year ago

you have image with WER on openai blog post here: https://openai.com/index/introducing-our-next-generation-aud...
On their chart they compare also with: gemini 2.0 flash, whisper large v2, whisper large v3, scribe v1, nova 1, nova 2. If you need only english transcription then pretty much all models will be good these days but big difference is depending on input language.