Comment by Rendello

6 months ago

True! This list could more accurately be described as "Unicode codepoints that expand or contract when case is changed in UTF-8", which is exactly what I was testing in my program. I had built a parser that relied on some assumptions that I felt were not correct, so I built some tests with this data.

For those interested, this was the generation script. I'm sure there was a way to do it better or simpler, and I wish I could just say this was a quick-and-dirty script, but in fact I spent quite a few hours on it (this is the fourth rewrite):

https://gist.github.com/rendello/b06ca3d976d26fa011897bd1603...
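
For anyone who doesn't want to click through, here is a minimal sketch of the idea. To be clear, this is not the script from the gist, just my rough illustration of "codepoints whose UTF-8 length changes under case mapping". It assumes Python's built-in `str.lower()` / `str.upper()` as the case mapping, and the names `utf8_len` and `case_length_changes` are made up for this example. The exact results depend on the Unicode version your Python build ships with.

```python
import sys


def utf8_len(s: str) -> int:
    """Number of bytes `s` occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def case_length_changes():
    """Collect (codepoint, mapping, bytes_before, bytes_after) tuples for
    every codepoint whose UTF-8 length differs after lower() or upper().

    Assumption: Python's str.lower()/str.upper() stand in for whatever
    case mapping the parser under test actually uses.
    """
    results = []
    for cp in range(sys.maxunicode + 1):
        # Surrogates are not valid scalar values and cannot be encoded.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        before = utf8_len(ch)
        for name, mapped in (("lower", ch.lower()), ("upper", ch.upper())):
            after = utf8_len(mapped)
            if after != before:
                results.append((cp, name, before, after))
    return results


if __name__ == "__main__":
    for cp, name, before, after in case_length_changes():
        print(f"U+{cp:04X} {name}: {before} -> {after} bytes")
```

A parser that assumes byte offsets stay valid after case folding would trip on any entry in this list, for example U+212A KELVIN SIGN (3 bytes), which lowercases to a plain ASCII `k` (1 byte).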