Comment by ares623

3 months ago

Wouldn’t the training or whatever make that unicode sequence effectively a smiley face?

Yes, but the same face can be represented by many distinct strings, which may or may not be tokenized into a single clean “smiley face” token.
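To make that concrete, here's a rough Python sketch. The three variants are my own picks for illustration, not anything from a particular model's training data. It only shows that strings which usually render as more or less the same smiley are different code point sequences and different UTF-8 bytes. It doesn't run a real tokenizer, so how any given vocabulary splits them is left open.

```python
# Illustrative variants (chosen for this example): each usually renders
# as a smiling face, yet each is a different string.
variants = {
    "text presentation": "\u263A",         # WHITE SMILING FACE
    "emoji presentation": "\u263A\uFE0F",  # same, plus VARIATION SELECTOR-16
    "emoji block": "\U0001F642",           # SLIGHTLY SMILING FACE
}

for label, s in variants.items():
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    utf8_bytes = s.encode("utf-8")
    # A tokenizer that works on UTF-8 bytes sees these byte sequences,
    # not the rendered glyph.
    print(f"{label:20} {code_points:16} {utf8_bytes.hex(' ')} ({len(utf8_bytes)} bytes)")
```

If the tokenizer works on bytes, each of these starts out as a different input, so whether any of them ends up as one token depends on what merges the vocabulary happened to learn.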