Comment by doph

1 day ago

All tokens are symbols. All of the frontier models speak Mandarin.

This is why misspellings and homophones are tells of human righting. LLMs strongly prefer word-level tokens, and their word substitutions follow semantic similarity, not the auditory similarity behind human slips.