← Back to context

Comment by flohofwoe

10 hours ago

> When you strlen() a UTF8 string, you don't get the length of the string, but instead the size in bytes.

Exactly, and that's what you want/need anyway most of the time (most importantly when allocating space for the string or checking if it fits into a buffer).

If you want the number of "characters" (which can have two meanings: either a single UNICODE code point, or a grapheme cluster (e.g. a "visible character" that's composed from multiple UNICODE code points). For this stuff you need a proper UNICODE/grapheme-aware string processing library. But this is needed only rarely in most application types which just pass strings around or occasionally need to split/parse/tokenize by 7-bit ASCII delimiters.