Comment by capitainenemo

4 hours ago

With regards to what rust team is admitting or not... https://wtf-8.codeberg.page/#the-wtf-8-encoding "It is identical to generalized UTF-8, with the additional well-formedness constraint that a surrogate pair byte sequence is ill-formed. It is a strict subset of generalized UTF-8 and a strict superset of UTF-8."
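To make the quoted definition concrete, here's a minimal Python (3.9+) sketch of encoding potentially ill-formed UTF-16 as WTF-8. The function names are hypothetical and this is only an illustration of the spec wording, not the reference implementation. Valid surrogate pairs are combined into one supplementary code point, exactly as UTF-8 would do. Lone surrogates are kept as their 3-byte generalized UTF-8 form. A separately encoded pair is the one thing WTF-8 rejects.

```python
def encode_generalized_utf8(cp: int) -> bytes:
    """Encode one code point with UTF-8's bit layout, without rejecting
    surrogates (U+D800..U+DFFF), i.e. "generalized UTF-8"."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units_to_wtf8(units: list[int]) -> bytes:
    """Convert possibly ill-formed UTF-16 code units to WTF-8 (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # A real surrogate pair: combine it into one supplementary code
            # point. Emitting the two halves separately would be ill-formed WTF-8.
            cp = 0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)
            out += encode_generalized_utf8(cp)
            i += 2
        else:
            # Ordinary BMP code unit, or a lone surrogate preserved as-is.
            out += encode_generalized_utf8(u)
            i += 1
    return bytes(out)


def is_wtf8(data: bytes) -> bool:
    """Generalized UTF-8 that contains no encoded surrogate pair (hypothetical helper)."""
    try:
        # "surrogatepass" accepts 3-byte encoded surrogates but still rejects
        # overlong forms, truncated sequences, and other invalid bytes.
        text = data.decode("utf-8", errors="surrogatepass")
    except UnicodeDecodeError:
        return False
    # Any high surrogate directly followed by a low surrogate is forbidden.
    return not any(
        0xD800 <= ord(a) <= 0xDBFF and 0xDC00 <= ord(b) <= 0xDFFF
        for a, b in zip(text, text[1:])
    )
```

For input that is already valid UTF-16, e.g. `[0xD83D, 0xDE00]`, the output is byte-for-byte the UTF-8 encoding of U+1F600, which is the "strict superset of UTF-8" part. A lone `0xD800` comes out as `b"\xed\xa0\x80"`, which is legal generalized UTF-8 but not legal UTF-8.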

https://wtf-8.codeberg.page/#intended-audience "WTF-8 is a hack intended to be used internally in self-contained systems with components that need to support potentially ill-formed UTF-16 for legacy reasons.

Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet."

They seem very transparent, and certainly are not proposing it as a general type.

> With regards to what rust team is admitting or not...

That wasn't an accusation. They admit things just fine. It was a hypothetical about using it as the main string type.

> and certainly are not proposing it as a general type.

1. Python's hack isn't used in file formats or transmissions either, as far as I know. It's also internal-only.

2. What they propose it for has zero relevance to my argument. It's merely proof that a hack like this can be layered onto an ordinary Unicode representation. Python's goofy string representation is not what enables its surrogate hack.
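A small sketch of that point, assuming the Python hack in question is PEP 383's `surrogateescape` error handler. Undecodable bytes become lone surrogates at the code-point level. The resulting `str` can then be packed into plain generalized-UTF-8 bytes with the `surrogatepass` handler, i.e. a WTF-8-style buffer, and still round-trip. This only illustrates the argument. It doesn't describe how CPython actually stores strings internally.

```python
raw = b"caf\xe9"  # Latin-1 "café", which is not valid UTF-8

# surrogateescape: each undecodable byte 0xNN becomes the lone surrogate
# U+DCNN, so decoding never fails and the original bytes can be recovered.
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"
assert text.encode("utf-8", errors="surrogateescape") == raw

# The same str, lone surrogate included, also fits in an ordinary byte
# buffer as generalized UTF-8 ("surrogatepass"), and decodes back unchanged.
packed = text.encode("utf-8", errors="surrogatepass")
assert packed == b"caf\xed\xb3\xa9"
assert packed.decode("utf-8", errors="surrogatepass") == text
```

Nothing here depends on Python's internal string layout. The escape lives entirely in which code points the string contains.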