Comment by coliveira
5 days ago
The issue is that French, Italian, African, and Japanese people shouldn't have the inconvenience of instructing the LLM tool just to get the basic facts about their own culture. They should be able to use an LLM that has already been trained like that by default. Nobody has an obligation to use a tool that thinks it is talking to an American. If I go to Google, for example, I want to get facts about my own country in my own language.
Wouldn't those people be asking the questions in their own language in the first place? The model will reply in the language you use. This thread is about people asking for information about a language that is not the one they are messaging the LLM in.
Even if the model replies in my language, I often notice it searching in English, or thinking in English. There's always something lost in translation. Sometimes it's just minor nuances. Other times it mixes up the legal facts with those of other countries.
This sounds like the problem of people calling "911" because they see it as the emergency number in so much US-American media, even though it is not the emergency number in their own country.
They always sound like an obnoxious American tourist talking through a translator. The chatbot training dataset is the same, and foundation models are always built with >50% American English data for some reason.
> Nobody has an obligation to use a tool that thinks it is talking to an American.
Very, very emphatic agreement from my end, thanks.
> Nobody has an obligation to use a tool that thinks it is talking to an American.
Then add top-level instructions saying what country you're from, what country you live in now, and which language you speak. This isn't that hard.
None of that even addresses the problem described, because none of the languages you mentioned would be French in the described example.