← Back to context

Comment by jhasse

13 hours ago

That's where the standard should come in and say something like "starting with C++26 char is always 1 byte and signed. std::string is always UTF-8" Done, fixed unicode in C++.

But instead we get this mess. I guess it's because there's too much Microsoft in the standard and they are the only ones not having UTF-8 everywhere in Windows yet.

std::string is not UTF-8 and can't be made UTF-8. It's encoding agnostic, its API is in terms of bytes not codepoints.

  • Of course it can be made UTF-8. Just add a codepoints_size() method and other helpers.

    But it isn't really needed anyway: I'm using it for UTF-8 (with helper functions for the 1% cases where I need codepoints) and it works fine. But starting with C++20 it's starting to get annoying because I have to reinterpret_cast to the useless u8 versions.