Comment by sgt
11 hours ago
While on this subject, what's the go-to speech-to-text transcription model (open source or proprietary, doesn't matter) if you have to support a lot of languages really well?
If proprietary/SaaS fits your use case, I can recommend Speechmatics. It has a wider range of languages than a lot of the competition: https://speechmatics.com
(Full disclosure I'm an engineer there)
Will it work with, say, someone speaking English with some Hindi mixed in? I'm not from there, so I'm not sure how prevalent that is, but I've been told it's quite common to "mix it up" in India, and I probably need to cater for that use case.
PS if you can share your email I'll pop you an email about Speechmatics. I tried the English version and it's impressive.
This is definitely the sort of use case we aim to support! I would need to check about Hindi specifically, but we have several bilingual models already with more to come:
https://docs.speechmatics.com/speech-to-text/languages#trans...
Drop me an email at mattn@speechmatics.com and we can chat about further details :)
I spent a few days on a similar scenario without much success (one person speaks, then their speech is translated, and I want just the original or both).
An API call to GPT-4o works quite well (it basically handles both transcription and diarization), but I wanted a local model.
Whisper is really good for one person speaking. With more people you get repetitions. Qwen and other open multimodal models give subpar results.
I tried a multipass approach, with the first pass identifying the language and chunking the audio and the second doing the actual transcription, but this tended to miss a lot of content.
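For anyone who wants to try the same idea, here is a rough sketch of the two-pass structure, not my actual code. It assumes pass 1 has already produced one language label per fixed-length window; the names `Chunk`, `merge_language_runs`, `transcribe_chunks`, and the `transcribe` callback are all made up for illustration, and you would plug in whatever local model you use for the real transcription.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Chunk:
    start: float   # chunk start, in seconds
    end: float     # chunk end, in seconds
    language: str  # language label assigned in pass 1


def merge_language_runs(window_langs: Sequence[str], window_sec: float) -> List[Chunk]:
    """Turn per-window language labels into chunks.

    Consecutive windows with the same label are merged into one chunk,
    so each chunk can be sent to the transcriber with a single language.
    """
    chunks: List[Chunk] = []
    for i, lang in enumerate(window_langs):
        start = i * window_sec
        end = start + window_sec
        if chunks and chunks[-1].language == lang:
            chunks[-1].end = end  # extend the current run
        else:
            chunks.append(Chunk(start, end, lang))
    return chunks


def transcribe_chunks(
    chunks: Sequence[Chunk],
    transcribe: Callable[[float, float, str], str],
) -> List[Tuple[str, str]]:
    """Pass 2: call the (hypothetical) transcriber once per chunk."""
    return [(c.language, transcribe(c.start, c.end, c.language)) for c in chunks]


if __name__ == "__main__":
    # Stub transcriber, standing in for a real model call.
    def fake_transcribe(start: float, end: float, lang: str) -> str:
        return f"<{lang} speech {start:.0f}-{end:.0f}s>"

    runs = merge_language_runs(["en", "en", "hi", "en"], window_sec=5.0)
    for lang, text in transcribe_chunks(runs, fake_transcribe):
        print(lang, text)
```

The `transcribe` callback takes start, end, and language so the transcriber can cut the audio itself; that is a design choice of this sketch, nothing more.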
I'm going to give canary-1b-v2 a try next weekend. But it looks like, in spite of enormous development in other areas, speech recognition has stalled since Whisper's release (more than 3 years already?).