Comment by raincole

1 month ago

> I think, when you are building a system, restricting all (human language) input to be UTF-8 is a fair and reasonable design decision, and then you can use strlen to your heart's content.

It makes no sense. If you only need the byte count, then you can use strlen no matter what the encoding is. If you need any other kind of count, then you don't use strlen no matter what the encoding is (except in an ASCII-only environment).

"Whether I should use strlen or not" is a completely independent question from "whether my input is all UTF-8."
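To make the byte-count versus other-count distinction concrete, here is a rough sketch in C. `utf8_codepoints` and `naive_utf8` are hypothetical names made up for illustration, and the helper assumes well-formed, NUL-terminated UTF-8. It counts code points, which is still not the same thing as user-perceived characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts Unicode code points in a NUL-terminated,
   well-formed UTF-8 string by skipping continuation bytes (10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++; /* lead byte or ASCII byte: starts a new code point */
    }
    return n;
}

/* The word "naive" with a diaeresis on the i, encoded as UTF-8:
   that one letter takes the two bytes 0xC3 0xAF. */
static const char naive_utf8[] = "na" "\xC3\xAF" "ve";
```

With this input, `strlen(naive_utf8)` reports 6 (bytes) while `utf8_codepoints(naive_utf8)` reports 5, so which function is right depends on what is being counted, not on whether the input is UTF-8.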

lelanthran 1 month ago

> If you only need the byte count then you can use strlen no matter what the encoding is.

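No, strlen won't give you the byte count for UTF-16 text: every ASCII-range character is encoded with a zero byte, and strlen stops at the first zero byte it sees.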
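A small sketch of the UTF-16 case in C, for anyone who wants to see it. `hi_utf16le` and `utf16_byte_length` are hypothetical names for illustration, and the helper assumes the buffer ends with a two-byte zero terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "hi" encoded as UTF-16LE, followed by a two-byte zero terminator. */
static const unsigned char hi_utf16le[] = { 0x68, 0x00, 0x69, 0x00, 0x00, 0x00 };

/* Hypothetical helper: number of bytes before the first all-zero
   16-bit code unit. Assumes the buffer is properly terminated. */
size_t utf16_byte_length(const unsigned char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2; /* advance one 16-bit code unit at a time */
    return n;
}
```

Here `strlen((const char *)hi_utf16le)` returns 1, because the high byte of `h` is zero, while `utf16_byte_length(hi_utf16le)` returns 4, the actual payload size.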

> If you need character count then you don't use strlen no matter what the encoding is (except in an ASCII-only environment).

What use case requires the character count without also requiring a Unicode glyph library?

raincole 1 month ago

> strlen won't give you the byte count on UTF16 encodings.

You're right. I stand corrected.