Comment by boothby
21 hours ago
This is why misspellings and homophones are tells of human righting. LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.
Funny, I’ve been cracking[0] at this exact problem with a purpose-built model[1]:
0: https://huggingface.co/posts/omarkamali/593639295164067
1: https://omneitylabs.com/models/sawtone
Claude the other day wrote code where one of the bytes in the array was 0xO5.
That's zero ex oh (the letter) five
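That is a real bug, not just a cosmetic slip: the letter O is not a hex digit, so the literal is rejected outright. Here is a quick Python check. The comment doesn't say what language Claude was writing, and the helper names `parse_hex_byte` and `is_valid_literal` are hypothetical, made up for this illustration.

```python
def parse_hex_byte(text: str) -> int:
    """Parse a hex string such as '0x05'; int() accepts the 0x prefix when base=16."""
    return int(text, 16)


def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<snippet>", "eval")
        return True
    except SyntaxError:
        return False


print(parse_hex_byte("0x05"))  # 5 (written with the digit zero)
print(is_valid_literal("0x05"), is_valid_literal("0xO5"))  # True False

try:
    parse_hex_byte("0xO5")  # written with the letter O: not a hex digit
except ValueError as err:
    print("rejected:", err)
```

The substitution is visual rather than auditory or semantic, which is a third kind of slip alongside the two the thread is contrasting.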
> righting.
> LLMs strongly prefer word-level tokens, and word substitutions follow semantic similarity and not the more human auditory similarity.
Is this an elaborate joke? Your full-word misspelling of "writing" both agrees with your statement (it's a word substitution) and contradicts it (the similarity is in pronunciation, not meaning).
I don't see the contradiction, unless you believe that the grandparent comment was written by an LLM.