Comment by masklinn
1 day ago
> PyCompactUnicodeObject was introduced with Python 3.3, and uses UTF-8 internally.
UTF-8 is used for C-level interactions; if that were the only representation, there would be no need to know the highest code point.
For Python semantics, it uses one of ASCII, ISO-8859-1 (Latin-1), UCS-2, or UCS-4.
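To make the "highest code point" point concrete, here is a minimal sketch of the selection rule (the PEP 393 scheme, as I understand it). `storage_kind` is a hypothetical helper written for illustration. It mirrors the rule, not CPython's actual C code: the width used for every character depends only on the largest code point in the string.

```python
def storage_kind(s: str) -> tuple[str, int]:
    """Return (representation, bytes per character) for s.

    Hypothetical helper: it reproduces the selection rule only.
    The whole string is stored at the width its highest code point needs.
    """
    highest = max(map(ord, s), default=0)  # empty string counts as ASCII
    if highest < 0x80:
        return ("ascii", 1)
    if highest < 0x100:
        return ("latin-1", 1)
    if highest < 0x10000:
        return ("ucs2", 2)
    return ("ucs4", 4)

for text in ["hello", "café", "5 €", "hi 😀"]:
    print(repr(text), storage_kind(text))
```

One character outside a range is enough to move the entire string to the wider kind, which is why a UTF-8 cache alone would not need this information.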
Interesting. You're right. Code pointer:
https://github.com/python/cpython/blob/main/Objects/unicodeo...
It also implies that Animats is correct that a single emoji in an otherwise ASCII Python string can bloat its memory consumption by up to a factor of 4, since every character is then stored as UCS-4.
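A quick way to check this yourself, assuming CPython (other implementations such as PyPy store strings differently, and `sys.getsizeof` figures are CPython-specific). `payload_bytes_per_char` is a hypothetical helper: it compares two string lengths so the fixed object header, whose size varies by representation and CPython version, cancels out.

```python
import sys

def payload_bytes_per_char(ch: str, n: int = 1000) -> float:
    """Estimate per-character storage for strings built by repeating ch.

    Subtracting the sizes of two lengths cancels the fixed header,
    leaving only the per-character payload.
    """
    small, large = ch * n, ch * (2 * n)
    return (sys.getsizeof(large) - sys.getsizeof(small)) / n

print(payload_bytes_per_char("a"))   # → 1.0 on CPython (ASCII)
print(payload_bytes_per_char("😀"))  # → 4.0 on CPython (outside the BMP, UCS-4)

ascii_text = "a" * 1000
one_emoji = "a" * 999 + "😀"  # same length, one emoji at the end
ratio = sys.getsizeof(one_emoji) / sys.getsizeof(ascii_text)
print(f"{ratio:.2f}x")
```

The final ratio comes out just under 4 rather than exactly 4, because the header is a fixed cost that does not scale with length.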