Comment by numpad0

10 days ago

Isn't that a bit much for ASR models? Humans can't handle a simultaneous multilingual dictation task either; I have to stop and reinitialize my ears before switching between English and my primary language.

In South Asia, it's quite common for people to speak a combination of their local language and English: not just alternating sentences between the two languages, but constructing individual sentences out of phrases from both.

"Madam, please believe me, maine homework kiya ha" [I did my homework].

  • This is common in the southwestern part of the US too. My partner and the friends she grew up with will have conversations that fluidly pick phrases and vocab from either Spanish or English, depending on which words happen to be the easiest to pull from their brains. It's wild to listen to.

    • Aren't those limited to specific words or phrases in specific forms? I doubt it works for arbitrary half-sentences.

Seems like the capability is already in the model somewhere, though - see my reply to clarionbell.

Isn't that exactly what interpreters do?

  • If they're anything like me, they seem to coordinate constant staggered resets of sub-systems in the language-processing pipeline while keeping internal representations of the input in a half-text state, so that the input comes back out through the pipeline in the other language's configuration.

    That's anecdotally how my own brain appears to work, so it could be different from how interpreters or actual human brains work, but as far as I can tell, professional simultaneous interpreters don't seem to be agnostic about the pair of languages involved at all.