Comment by jug

5 days ago

Anthropic's research did find that Claude seemed to have an inner, language-agnostic "language" though. And that the larger an LLM got, the better it could recognize the shared meaning of words across language barriers and expand that internal, language-independent representation.

So, part of the improved performance as models grow in parameter count is probably due not only to the larger volume of raw training material, but to a greater ability to ultimately "realize" and connect the underlying meanings of words, so that a German speaker might benefit more and more from training material in Korean.

> These results show that features at the beginning and end of models are highly language-specific (consistent with the {de, re}-tokenization hypothesis [31]), while features in the middle are more language-agnostic. Moreover, we observe that compared to the smaller model, Claude 3.5 Haiku exhibits a higher degree of generalization, and displays an especially notable generalization improvement for language pairs that do not share an alphabet (English-Chinese, French-Chinese).

Source: https://transformer-circuits.pub/2025/attribution-graphs/bio...

However, they do see that Claude 3.5 Haiku seems to have an English "default" with more direct connections. It's possible that an LLM has to take a more roundabout route through these generalizations to communicate in other languages, and that this causes a performance drop-off that grows as the model gets smaller?
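A rough way to poke at the quoted layer-wise finding on an open model (Claude's weights aren't public) is to compare hidden states for parallel translations across layers: if the picture holds, cross-lingual similarity should peak in the middle layers and drop near the ends, and an English-Chinese pair (no shared alphabet) should trail an English-French pair on a small model. A minimal sketch, assuming the HuggingFace transformers library; the model choice (xlm-roberta-base) and the sentences are just illustrative stand-ins:

```python
# Probe the "language-specific at the edges, language-agnostic in the middle"
# idea on an open multilingual encoder (a stand-in for Claude, whose weights
# aren't available). Model name and sentences are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "xlm-roberta-base"  # any multilingual model exposing hidden states works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_means(text):
    """Mean-pooled hidden state per layer for one sentence."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states  # embedding layer + one tensor per block
    return [h.mean(dim=1).squeeze(0) for h in hidden]

# Parallel translations of the same sentence.
en = layer_means("The weather is very nice today.")
fr = layer_means("Il fait très beau aujourd'hui.")
zh = layer_means("今天天气很好。")

print("layer  en-fr  en-zh")
for i, (a, b, c) in enumerate(zip(en, fr, zh)):
    sim_fr = F.cosine_similarity(a, b, dim=0).item()
    sim_zh = F.cosine_similarity(a, c, dim=0).item()
    print(f"{i:5d}  {sim_fr:.3f}  {sim_zh:.3f}")
```

This is only a crude proxy (mean-pooled hidden states, not the per-feature attribution graphs Anthropic uses), but it makes the layer-wise claim concrete enough to check on any open model.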

Sounds like it is capable of thinking in abstract concepts rather than in related/connected words? So that training material in different languages would all add to its knowledge of the same concepts?

It is like the difference between a student who is brilliant at learning by heart and can repeat the words they studied without understanding the concept, and a student who actually understands the topic and can reason about the concepts.

Modern Standard Chinese is, for whatever reason, remarkably close to English in basic syntax (both are largely analytic SVO languages). And Norman French heavily shaped Middle English, which developed into modern English.

My point is, those language pairs aren't random examples. Chinese isn't something completely foreign and new when it comes to its differences from English.