Comment by pxc

8 hours ago

If this is true, shouldn't LLMs perform way worse when working in Chinese than in English? Seems like an easy thing to study since there are so many Chinese LLMs that can work in both Chinese and English.

Do LLMs generally perform better in verbose languages than they do in concise ones?

Are you saying Chinese is more concise than English? Chinese poetry is concise, but that can be true in any language. For LLMs, it depends on the tokenizer. Chinese models are of course more Chinese-friendly and so would encode the same sentence with fewer tokens than Western models.
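To make the tokenizer point concrete, here is a minimal sketch that compares character counts with UTF-8 byte counts for one sentence pair. The sentence pair and the `length_stats` helper are illustrative assumptions, not taken from this thread, and raw UTF-8 bytes are only a rough stand-in for what a byte-level tokenizer sees before any merges. Real token counts depend on each tokenizer's vocabulary.

```python
def length_stats(text: str) -> dict:
    """Return the length of `text` in Unicode characters and in UTF-8 bytes."""
    return {"chars": len(text), "utf8_bytes": len(text.encode("utf-8"))}


# Hypothetical parallel sentences chosen for illustration only.
english = "I am a student."
chinese = "我是学生。"

for label, sentence in (("English", english), ("Chinese", chinese)):
    stats = length_stats(sentence)
    print(f"{label}: {stats['chars']} chars, {stats['utf8_bytes']} UTF-8 bytes")
```

For this pair the Chinese sentence is a third as long in characters, but both come out to the same number of UTF-8 bytes, since each of these Chinese characters takes three bytes. A tokenizer with few Chinese entries in its vocabulary could therefore lose much of the apparent conciseness, while one trained on a lot of Chinese text would likely keep more of it.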

  • > Are you saying Chinese is more concise than English?

    Yeah, definitely. It lacks grammatical case and verb conjugations, plus whole classes of filler words, and words themselves are on average substantially shorter. If you listen to or read a hyper-literal, word-for-word translation of Chinese speech into English (you can find fun videos of this on Chinese social media), it even resembles "caveman speech" for those reasons.

    If you look at translated texts and compare the English versions to the Chinese ones, the Chinese versions are substantially shorter. Same if you compare localization strings in your favorite open-source project.

    It's also part of why Chinese apps are so information-dense, and why localizing to other languages often requires reorganizing the layout itself: languages like English just aren't as information-dense, pixel for pixel.

    The difference is especially profound for vernacular Chinese, which is why Chinese people often note that text which "has a machine translation flavor" is over-specified and gratuitously prolix.

    Maybe some of this washes out in LLMs due to tokenization differences. But Chinese texts are typically shorter than their English equivalents, and that holds for prose as well as poetry.

    But yeah this is standard stuff: Chinese is more concise and more contextual/ambiguous. More semantic work is allocated in interpretation than with English, less is allocated in the writing/speaking.

    Do you speak Chinese and experience the differences between Chinese and English differently? I'm a native English speaker and only a beginner in Chinese but I've formed these views in discussion with Chinese people who know some English as well.

    • Chinese omits articles, verbs aren't conjugated, and individual characters carry more meaning than English letters, but other than those differences I don't have the impression that Chinese communication is inherently more concise. Some forms of official speech are wordy. Writing is denser, but the amount of information conveyed through speech is about the same. There are jokes about ambiguous words or phrases in both Chinese and English. So I was surprised at your take, but no objection to your points above. Ancient Chinese, on the other hand, is extremely concise, but so are other ancient languages like Hebrew, although in a different way. So it seems that ancient languages are compressed but challenging and modern languages have unpacked the compression for ease of understanding.
