Comment by dpc_01234
6 hours ago
UTF-8 is undeniably a good answer, but to a relatively simple bit-twiddling / variable-length integer encoding problem in a somewhat specific context.
I realize that hindsight is 20/20, and times were different, but let's face it: "how to use an unused top bit to best encode larger numbers representing Unicode" is not that much of a challenge, and the space of practical solutions isn't even all that large.
Except that there were many different solutions before UTF-8, all of which sucked really badly.
UTF-8 is the best kind of brilliant. After you've seen it, you (and I) think of it as obvious, and clearly the solution any reasonable engineer would come up with. Except that it took a long time for it to be created.
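The "obvious in hindsight" scheme the comments are circling around can be sketched in a few lines: ASCII bytes keep their top bit clear, and larger code points spill into continuation bytes tagged `10xxxxxx`. A minimal hand-rolled encoder (illustrative only; in practice you'd just call `str.encode("utf-8")`):

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point to UTF-8 by hand (illustration)."""
    if cp < 0x80:                       # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([cp])
    elif cp < 0x800:                    # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    elif cp < 0x10000:                  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    else:                               # 4 bytes: 11110xxx + 3 continuations
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])

# Matches Python's built-in encoder across the byte-length boundaries:
for ch in ("A", "é", "€", "😀"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

The non-obvious part is less the bit layout than the properties that fall out of it: ASCII passes through unchanged, no byte sequence is a substring of another character's encoding, and you can resynchronize from the middle of a stream because lead bytes and continuation bytes are distinguishable.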
I just realised that all Latin text is wasting 12.5% of storage/memory/bandwidth with the MSB always zero. At least it compresses well. Is there any technology that utilizes the 8th bit for something useful, e.g. error checking?
See mort96's comments about 7-bit ASCII and parity bits (https://news.ycombinator.com/item?id=45225911). Kind of archaic now, though - 8-bit bytes with the error checking living elsewhere in the stack seem to be preferred.
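The classic trick mentioned there - stuffing an even-parity bit into the otherwise-unused MSB of a 7-bit ASCII byte - is easy to sketch (hypothetical helpers for illustration; real serial hardware did this in the UART, not in software):

```python
def with_even_parity(ch: int) -> int:
    """Pack a 7-bit ASCII value plus an even-parity bit into the MSB."""
    assert 0 <= ch < 128, "must be 7-bit ASCII"
    parity = bin(ch).count("1") & 1    # 1 if the 7 data bits have odd popcount
    return ch | (parity << 7)          # set MSB so the full byte's popcount is even

def parity_ok(byte: int) -> bool:
    """A received byte is valid if its total popcount is even."""
    return bin(byte).count("1") % 2 == 0

sent = with_even_parity(ord("C"))      # 0b01000011 has 3 set bits -> MSB set
assert parity_ok(sent)
assert not parity_ok(sent ^ 0x04)      # any single flipped bit is detected
```

Even parity catches any single-bit error but not two flipped bits, which is one reason the checking moved to stronger codes (CRCs, ECC) at other layers of the stack.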