← Back to context

Comment by linguae

16 hours ago

I remember learning Japanese in the early 2000s and the fun of dealing with multiple encodings for the same language: JIS, Shift-JIS, and EUC. As late as 2011 I had to deal with processing a dataset encoded under EUC in Python 2 for a graduate-level machine learning course where I worked on a project for segmenting Japanese sentences (typically there are no spaces in Japanese sentences).

UTF-8 made processing Japanese text much easier! No more needing to manually change encoding options in my browser! No more mojibake!

I live in Japan and I still receive the random email or work document encoded in Shit-JIS. Mojibake is not as common as it once was, but still a problem.

  • I'm assuming you misspelled Shift-JIS on purpose because you're sick and tired of dealing with it. If that was an accidental misspelling, it was inspired. :-)