← Back to context

Comment by bondarchuk

10 days ago

Can whisper do multilingual yet? Last time I tried it on some mixed dutch/english text it would spit out english translations for some of the dutch text. Strange bug/feature since from all appearances it had understood the dutch text perfectly fine.

I think the Dutch/English is probably the worst combination for this. Languages are rather close.

  • I don't understand how this would happen, though. It's not like it will mishear a dutch sentence as if it's english; it will correctly pick up the dutch sentence, but (since the language is auto-detected as english at the start of the segment), seemingly auto-translate that (correct and correctly heard) dutch text to english. All we need is a way to get the dutch text that's surely somewhere in there, before the translation happens.

    Unless it was trained end-to-end on dutch-subtitled english text?? Which might make the translation a somewhat inextricable part of the model..? Does anyone know?

    • Maybe try the turbo model which is transcription only. The other models were trained on x to en translations and they seem to emphasise the output language over the task token. You can get them to translate to any language even though it was never trained for that, comparatively nl-en translation is in the dataset so I'm not surprised it's doing that.

      1 reply →

Isn't that a bit much for ASR models? Humans can't handle simultaneous multilingual dictation task either, I have to stop and reinitialize ears before switching languages between English and my primary one.

  • In South Asia, it's quite common for people to speak a combination of their local language and English. Not just alternating sentences between the two languages, but in fact, constructing sentences using compound phrases from the two languages.

    "Madam, please believe me, maine homework kiya ha" [I did my homework].

    • This is common in the southwestern part of the US too. My partner and her friends she grew up with will have conversations that fluidly pick phrases and vocab from either Spanish or English depending on what words happen to be the easiest to pull from their brain. It's wild to listen to.

      1 reply →

  • Seems like it already has the capability somewhere in the model though - see my reply to clarionbell.

  • Isn't that exactly what intepreters do?

    • If they're like what I am, they seem to just coordinate constant staggered resets for sub-systems of language processing pipeline while keeping internal representations of inputs in half-text state so that input come back out through the pipeline in the other configurations.

      That's how I anecdotally feel and interpret how my own brain appear to work, so it could be different from how interpreters work or how actual human brains work, but as far as I see it, professional simultaneous interpreters don't seem to be agnostic for relevant pairs of languages at all.

I found that it works quite well for Dutch+English as long as you use one of the larger models. But that may just be luck, I imagine mixing Italian and Swedish will have very different results.

Whisper has been multilingual for 5 years at least.

  • I know it is ostensibly multilingual, it's less than a year since I tried, but it does this thing where it then translates everything (or only some things) into a single language regardless with no way to turn it off.

    • Sorry, I've been using it for French audio files since 5 years and never had this issues.