Comment by Rendello
6 months ago
True! This list could more accurately be described as "Unicode codepoints that expand or contract when case is changed in UTF-8", which is exactly what I was testing in my program. I had built a parser that was relying on some assumptions that I felt was not correct, so I built some tests with this data.
For those interested, this was the generation script. I'm sure there was a way to do it better or simpler, and I wish I could just say this was a quick-and-dirty script, but in fact I spent quite a few hours on it (this is the fourth rewrite):
https://gist.github.com/rendello/b06ca3d976d26fa011897bd1603...
No comments yet
Contribute on Hacker News ↗