Comment by layer8

1 year ago

This stems from the earlier Turkish 8-bit character sets like IBM code page 857, which Unicode was designed to be roundtrip-compatible with.

Aside from that, it’s unlikely that authors writing both Turkish and non-Turkish words would properly switch their input method or language setting between the two, so they would get mixed up in practice anyway.

There is no escape from knowing (or best-guessing) which language you are performing transformations on. If you can't do that, just leave the text as-is.
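To make that concrete, here is a rough Python sketch of case conversion that only applies the Turkish dotted/dotless i rules when the caller says the text is Turkish (or Azerbaijani). The function names and the `lang` parameter are made up for illustration; Python's built-in `str.upper()` and `str.lower()` are not locale-aware, so the special cases are handled by hand before calling them.

```python
from typing import Optional

# Languages assumed here to use the dotted/dotless i distinction.
DOTTED_I_LANGS = ("tr", "az")


def to_upper(text: str, lang: Optional[str] = None) -> str:
    """Uppercase text, using Turkish i-rules only if lang asks for them."""
    if lang in DOTTED_I_LANGS:
        # i -> İ (U+0130), ı (U+0131) -> I
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


def to_lower(text: str, lang: Optional[str] = None) -> str:
    """Lowercase text, using Turkish i-rules only if lang asks for them."""
    if lang in DOTTED_I_LANGS:
        # İ (U+0130) -> i, I -> ı (U+0131)
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


print(to_upper("istanbul", "tr"))  # Turkish rules
print(to_upper("istanbul"))        # default rules
```

The same input gives different results depending on `lang`, which is the whole point: the code points alone don't tell you which mapping is right.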
