Very good article, like it.
Chinese characters are a type of pictograph with some characteristics of QR codes. In fact, there is a character retrieval method called the four-corner method, which quickly maps a character's shape to four digits through some simple rules, and is especially suitable for one-way encoding and retrieval. For example, the four-corner code of "龍" is 0121, and the code of "兲" is 1080 (see https://github.com/chai2010/im4corner).
In addition, Chinese characters are arguably more important as pictographic shapes. For example, we have "凹语言" (Wa-lang, https://github.com/wa-lang/wa/), designed for WebAssembly (WASM for short; WebAssembly => WASM => Wa). The Chinese character "凹" closely resembles the WASM logo, and it even had the pronunciation "wa" in the past.
After the popularization of computers, input methods have improved greatly, but there is still a lot of input friction. For example, in programming, frequently switching between Chinese identifier names and English keywords costs input efficiency. As a programmer, I hope Chinese users will continue to pay attention to these problems and improve them.
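To make the "one-way encoding and retrieval" point concrete, here is a minimal sketch of a four-corner lookup. The table holds only the two codes quoted above; the names `FOUR_CORNER` and `lookup_by_code` are made up for illustration and are not taken from the im4corner project. A real table would map each code to many characters, which is why the encoding only goes one way.

```python
from collections import defaultdict

# Hypothetical, tiny table: only the two codes quoted in the comment above.
FOUR_CORNER = {
    "龍": "0121",
    "兲": "1080",
}

# Reverse index: one four-digit code can correspond to several characters.
CODE_INDEX = defaultdict(list)
for char, code in FOUR_CORNER.items():
    CODE_INDEX[code].append(char)


def lookup_by_code(code: str) -> list[str]:
    """Return every known character whose four-corner code matches."""
    return list(CODE_INDEX.get(code, []))


print(lookup_by_code("0121"))
print(lookup_by_code("1080"))
```

Going from a character to its code is a plain dictionary lookup; going from a code back to a character returns a candidate list that the user still has to pick from.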
No mention of the competing MingKwai typewriter of Lin Yutang, the famous popularizer of Chinese culture to the west. Apparently his prototype suffered an embarrassing failure at an investor meeting and couldn’t get off the ground. But the idea was good. Article here: https://thereader.mitpress.mit.edu/the-uncanny-keyboard/
Lin Yutang had a few patents related to this typewriter; I believe this is the main one:
https://patents.google.com/patent/US2613795A/
The idea of searching for characters via parts is similar to how the Cangjie input method selects characters from radicals. I read somewhere that Cangjie was indeed inspired by the MingKwai typewriter, but I can't find the citation for it.
Radiolab also did an excellent podcast on the topic
https://radiolab.org/podcast/wubi-effect
That one was so good. I was completely ignorant of the topic before that episode aired.
> like Chairman Mao Zedong, who seemed to equate Chinese modernization with the Romanization of Chinese script
One of Mao's better ideas
Romanization of Chinese writing was already proposed during the New Culture movement in the 1910s-20s. China's most famous modern writer supported it.
However, the Chinese language has evolved alongside the characters for about 3000 years, and it's very difficult to just separate the two. A huge amount of culture is bound up with the characters. Not only that, but the Romanized writing system is viewed as something that only little children use (as an aid to learn the characters). Once you've put in the effort to learn the characters (as about a billion people have), it's very difficult to accept their replacement by what is viewed as a script for children.
The nice thing about Chinese is the information density of its writing. There's something nice about seeing how much information can be squeezed into a small space. It feels like you front-load more on the learning side, but get rewarded when reading and scanning texts. Not sure how much scientific evidence is behind that, just an anecdotal observation. Relatively few Chinese speakers want to give up characters.
> However, the Chinese language has evolved alongside the characters for about 3000 years, and it's very difficult to just separate the two. A huge amount of culture is bound up with the characters.
How did that work out for Korea when they switched to Hangul?
Here is a wonderful article by John DeFrancis on the topic:
The Prospects for Chinese Writing Reform (2006)
https://sino-platonic.org/complete/spp171_chinese_writing_re...
It is cited frequently.
Almost all digital communication is written using pinyin, which today is almost all written communication
Thank God it didn’t happen.
Much of the simplification adopted shorthand already in common use, which is why Japanese shinjitai simplification independently arrived at many similar characters and patterns. The second simplification round was an abysmal newspeak-esque failure, and thank goodness _that_ wasn't adopted either.
pinyin is the best thing that happened to the language after simplification.
Not only did it propel literacy rates to basically 100%, but it added a phonetic component to the language
Vietnamese is relatively OK.
The Vietnamese romanized their writing, and they seem to be doing fine.
I find it humorous that 鷹 was described as a difficult character in the article. It's like 3 radicals and the character for bird.
Perhaps it's difficult to render in a tiny Latin-alphabet font, but if you have any Japanese or Chinese study under your belt, you could read and reproduce it nearly instantly on sight.
I think he meant difficult in the sense that it consists of many strokes, not in the sense of how difficult it is to remember. However, one could argue that there are many other kanji that are more complex to write than 鷹.
It is interesting to consider that both Japan and China might have been prevented from ever being first with general-purpose computers. ASCII and other encoding schemes only needed to make provisions for fewer than 200 characters, making them possible to implement with the limited storage and memory available to early computers. The sheer number of characters in some languages, like Chinese, may have served as a distraction or roadblock for early computers in those countries.
Makes you wonder what limitations of our own language and culture are preventing us from inventing certain things?
In an alternative history timeline it might be true.
In our timeline I highly doubt whether it was the main reason why general purpose computers didn't happen first in China or Japan.
See also "How the quest to type Chinese on a QWERTY keyboard created autocomplete": https://news.ycombinator.com/item?id=40548356 (but no comments there)
“Safari can’t open the page because the address is invalid”
How strange.
More on topic: Considering how inefficient Chinese characters are in general (but especially evident in computing) as one of the few languages where characters have no direct relation to phonetics, I wonder why there hasn’t been an effort to modernize it similar to Hiragana in Japan. Well, considering how Chinese is basically Kanji, why not just adopt Japanese?
There were various attempts to develop an organic phonetic writing system for Chinese, like hiragana for Japanese, for example Bopomofo (still used in Taiwan) and General Chinese (https://en.wikipedia.org/wiki/General_Chinese). The Simplified characters that you see on the mainland today were originally part of a multi-phase scheme to eventually replace characters altogether, but the second phase (https://en.wikipedia.org/wiki/Second_round_of_simplified_Chi...) was bungled so badly that it didn't continue. In practice Pinyin is the standard phonetic writing now and is used when people can't remember a character.
> how inefficient Chinese characters are in general (but especially evident in computing)
We are not in the 90s anymore. UTF-8 has been around for 32 years now. If you’re working for a system that has no UTF-8 support, you have a much bigger problem to worry about.
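For anyone curious what this costs in bytes, here is a quick sketch (the helper name `utf8_len` is just for illustration). The characters mentioned in this thread all sit in the Basic Multilingual Plane, so each one takes 3 bytes in UTF-8, versus 1 byte for an ASCII letter:

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


# One ASCII letter plus three characters mentioned in this thread.
samples = ["A", "龍", "鷹", "凹"]
sizes = {ch: utf8_len(ch) for ch in samples}

for ch, n in sizes.items():
    print(f"{ch!r}: {n} byte(s)")
```

Whether 3 bytes per character counts as "inefficient" depends on how much meaning each character carries, which is the point several comments here are arguing about.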
> characters have no direct relation to phonetics
Most characters are phono-semantic where one part of the character is a phonetic hint and the other is a semantic hint.
> modernize it similar to Hiragana
Hiragana isn’t and wasn’t intended to replace kanji (unless you are from the fringe Kanamozikai). It serves a different grammatical purpose and is complementary to the other two. Kana is useful for an agglutinating language like Japanese, but not Chinese languages.
I think one statement about the CJK languages that has to be made more often is that each of them has numerous dialects of its own, with dubious mutual intelligibility, e.g. the Tsugaru and Kagoshima dialects versus standard Japanese.
The phrase "a language is a dialect with an army" often comes up when discussing Asian languages, and it causes friction between non-speakers of CJK languages, who wonder about compatibility between the three, and speakers, who respond to those questions with near-vile dissent. While I understand both sides of these sentiments, the situation is not ideal for either side.
IMO, it might be weird to refer to these languages as the "Beijing, Tokyo, Seoul" languages, but doing so occasionally (just occasionally) could give a more tangible feel for why these three seem to exist side by side while being so utterly disconnected from each other.
> Kana is useful for an agglutinating language like Japanese, but not Chinese languages.
FWIW, the Japanese did develop a kana-based system for Taiwanese during the occupation, but it was an abomination.[1]
[1]: https://en.wikipedia.org/wiki/Taiwanese_kana
There are a lot of underlying assumptions here:
1. That Chinese writing is inherently inefficient. It's actually very efficient...to read. And nothing beats the efficiency of having a script that maps perfectly to the language. Also as sibling comment notes, UTF-8 is a thing.
2. That there is no relation between written characters and phonetics. Incorrect, as several sibling comments point out.
3. That Japanese kana represents a successful "modernization" of kanji that Chinese should emulate.
4. That Chinese is "basically kanji" - assuming the Chinese and Japanese languages are essentially interchangeable. They...are not. I can't even begin to emphasize how much they are not. Chinese is subject-verb-object while Japanese is subject-object-verb, for instance. Chinese also has many phonemes that are incompatible with Japanese, which would not be covered by hiragana. Finally, kanji came from Chinese and has subtle differences: while it is mostly a subset of Chinese hanzi, it has its own slightly different character set.
GP is making an understandable mistake, due to how the three Far East countries are presented to the world at large: as three countries in Asia that practically touch each other, just as Germany does with Belgium and the Netherlands in Europe.
Tokyo is about as far from Beijing (2000km/1200mi) as Paris is from Kyiv. The Far East countries are also separated by seas, like Mediterranean countries across the sea. I doubt a lot of Parisians have any meaningful idea of Ukrainian as "basically Latin" in any way or form, or Italians of Tunisian, but the above-mentioned presentation creates a false instinct that those Asian countries are next-door neighbors.
That, and mistaking the personal difficulties and inefficiencies of understanding a language as a non-native for inferiority of the foreign language.
It’s really bizarre to see someone claim kana has anything to do with “modernization”. The Japanese modernization and industrialization period is famously associated with translating Western concepts and terminologies into Sinitic words that later spread to China, Korea and Vietnam.
> why not just adopt Japanese
Because Japanese characters have no direct relation to Chinese phonetics. The two belong to different dialect continuums, and their phonetics aren't compatible.
And I suspect the same might explain the lack of a native Chinese phonetic script; `Chinese` isn't a single spoken language, and what is called by that name is the Beijing-area version of one of the Chinese (or Sinitic) languages. The written language was universally understood in China due to bureaucratic needs, but AIUI it's not the same as the spoken language, and it's not necessarily used everywhere. Maybe they just had little use for a standardized phonetic script?
1: https://en.wikipedia.org/wiki/List_of_varieties_of_Chinese
It is still very useful to standardize pronunciation, since people speaking different dialects had to meet, especially government officials. There was "yayan" for this purpose.
https://en.wikimedia.org/wiki/Yayan
I'm guessing you are not familiar with how Chinese characters work nor how Japanese Hiragana or Kanji work.
This is not a helpful comment.
Well obviously not. Posting a dumb question tends to return some very helpful responses
Non-native speakers who suggest that countries arbitrarily modernize or change their language remind me of non-musicians who come around with a new replacement for traditional sheet music. Even if it was a good idea, which in this case it's patently not, it's just not gonna happen.
It's a failure to recognize that languages (of which I would count music as one kind) evolve organically, and outside of some edge cases, like Esperanto, they're not artificially created in a vacuum.
See https://sino-platonic.org/complete/spp171_chinese_writing_re...
> one of the few languages where characters have no direct relation to phonetics
nit: It's not accurate to say that the characters have no direct relation to phonetics. Thousands of them are semanto-phonetic compounds, meaning they combine a character relating to the word's (or syllable's) meaning with a character relating to pronunciation. Sinitic languages tend to have a lot of homophones or near-homophones, so this approach works reasonably well as a memory aid once you've memorized a bunch of the basic characters.
One problem is that many of the pronunciations have drifted from the Middle Chinese pronunciation of the words. Also, some of them have been simplified in Simplified Chinese which makes the components a bit harder to discern.
I've been learning some Cantonese recently and this is very apparent with certain common Cantonese words. For example, the first-person pronoun in Cantonese is pronounced ngo, with a low-rising tone, and written like this:
我 https://www.cantonese.sheik.co.uk/dictionary/characters/1/
The word for goose in Cantonese is also "ngo", but with a different tone. Here's the character for that:
鵝 https://www.cantonese.sheik.co.uk/dictionary/characters/1200...
If you enlarge it, you'll see that the left side is the same 我 from before. The right side is 鳥, which means "bird" (https://www.cantonese.sheik.co.uk/dictionary/characters/161/). So if you saw this character and knew the basic characters for the pronouns and the word "bird", and you spoke Cantonese, you'd be able to easily understand what it meant.
Here's another one. The word "ngo" with still a different tone means "hungry". How do we write it?
餓: https://www.cantonese.sheik.co.uk/dictionary/characters/740/
In this one the phonetic component is on the right instead, which is a bit inconsistent. The left side is this:
食: https://www.cantonese.sheik.co.uk/dictionary/characters/116/
What does 食 mean? It's the verb "to eat". So if you saw this 餓 character and knew a couple of other basic characters, you could figure out that it's the word "ngo6" meaning "hungry". Many of the characters still work like this although the sound shift I mentioned above means that some work in some Chinese languages and not others.
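The decomposition described above can be written down as a small table. This is only a sketch built from the two examples in this comment; the `COMPOUNDS` structure and `group_by_phonetic` are hypothetical names, and tones are left out on purpose since the point is the shared syllable "ngo":

```python
from collections import defaultdict

# Each compound pairs a semantic component (meaning hint) with a
# phonetic component (sound hint). Data is taken from the examples above.
COMPOUNDS = {
    "鵝": {"semantic": "鳥", "phonetic": "我", "gloss": "goose"},
    "餓": {"semantic": "食", "phonetic": "我", "gloss": "hungry"},
}


def group_by_phonetic(compounds: dict) -> dict:
    """Group compound characters by the phonetic component they share."""
    groups = defaultdict(list)
    for char, parts in compounds.items():
        groups[parts["phonetic"]].append(char)
    return dict(groups)


groups = group_by_phonetic(COMPOUNDS)
for phonetic, chars in groups.items():
    print(f"{phonetic} (sound hint) appears in: {' '.join(chars)}")
```

Both characters end up under 我, which is the "sounds like ngo" hint, while the semantic side (鳥 "bird", 食 "to eat") narrows down the meaning.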
Native Cantonese speaker here, glad that you are interested in learning Cantonese.
I am working with other volunteers to improve Cantonese teaching, and wonder what difficulties you have encountered when learning Cantonese, and what materials or communities would be helpful for Cantonese learners.
Asianometry has a good video on this if my memory serves me right.