Comment by masklinn

5 months ago

> PyCompactUnicodeObject was introduced with Python 3.3, and uses UTF-8 internally.

UTF-8 is used for C-level interactions. If that were the only representation, there would be no need to know the highest code point.

For Python semantics it uses one of ASCII, ISO-8859-1 (Latin-1), UCS-2, or UCS-4, chosen by the highest code point in the string.
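A quick way to see this for yourself, as a rough sketch that assumes CPython 3.3+ (other implementations may store strings differently). The helper `bytes_per_code_point` is a made-up name for illustration. It measures how much `sys.getsizeof` grows when a string gets longer, so the fixed object header cancels out:

```python
import sys


def bytes_per_code_point(ch: str, n: int = 1000) -> int:
    """Estimate storage per code point for a string made only of `ch`.

    Hypothetical helper: compares two strings of different lengths so the
    constant object header drops out of the difference.
    """
    short = ch * n
    long = ch * (2 * n)
    return (sys.getsizeof(long) - sys.getsizeof(short)) // n


# One sample character per internal representation (CPython-specific).
samples = [
    ("ASCII", "a"),         # U+0061
    ("Latin-1", "\u00e9"),  # U+00E9, fits in one byte
    ("UCS-2", "\u20ac"),    # U+20AC, needs two bytes
    ("UCS-4", "\U0001F600"),  # U+1F600, outside the BMP, needs four bytes
]

widths = {name: bytes_per_code_point(ch) for name, ch in samples}
print(widths)
```

On CPython I would expect 1, 1, 2 and 4 bytes respectively. ASCII and Latin-1 have the same width per character and differ only in the object header.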


masklinn

Interesting. You're right. Code pointer:

This also implies that Animats is correct: including a single emoji in an otherwise ASCII Python string can bloat its memory consumption by a factor of 4, since every code point is then stored in 4 bytes instead of 1.
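Here is a minimal check of that factor, again assuming CPython 3.3+ and using `sys.getsizeof` as an approximation of the string's footprint. The variable names are just for illustration:

```python
import sys

# A long pure-ASCII string: stored at 1 byte per code point.
text = "a" * 10_000

# Appending one code point above U+FFFF forces the whole string to 4 bytes each.
with_emoji = text + "\U0001F600"

ratio = sys.getsizeof(with_emoji) / sys.getsizeof(text)
print(f"{sys.getsizeof(text)} -> {sys.getsizeof(with_emoji)} bytes (x{ratio:.2f})")
```

The ratio should come out just under 4 for a string this long. If the original text already contained non-Latin-1 BMP characters (stored as UCS-2), the blowup from one emoji would be closer to 2x.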