Comment by f33d5173
3 months ago
Vision is how humans read text, so text must have built-in adaptations that protect it against visual noise. For example, two words that look similar must never appear in similar contexts, or else they would be conflated. Hence we can safely reduce such words to the same token. Or something like that.
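A toy sketch of what I mean, purely illustrative (the confusion rules and vocabulary below are made up, not from any real tokenizer):

    def visual_key(word):
        """Reduce a word to a form that visually similar words share."""
        w = word.lower().replace("rn", "m")            # rn/m lookalike, e.g. "modern" vs "modem"
        return w.translate(str.maketrans("01", "ol"))  # digit/letter lookalikes 0/o and 1/l

    def build_token_ids(vocab):
        """Assign one token id per visual-key group instead of per distinct word."""
        key_to_id, ids = {}, {}
        for word in vocab:
            k = visual_key(word)
            key_to_id.setdefault(k, len(key_to_id))
            ids[word] = key_to_id[k]
        return ids

    # "modern"/"modem" and "hello"/"he110" each collapse to one token id;
    # the bet is that surrounding context keeps them apart anyway.
    print(build_token_ids(["modem", "modern", "hello", "he110"]))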
That also works purely on text; it's the trick I used in my German speech recognition engine (https://arxiv.org/abs/2206.12693).
"I'm studying at Oxford Univ" has basically no loss in meaning even though "University" was truncated to less than half its characters.
This is like how many CLIs accept the shortest unique prefix of a command.
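To make the CLI analogy concrete, a minimal sketch of shortest-unique-prefix matching (the command list is just an invented example):

    def expand(prefix, commands):
        """Return the full command if `prefix` unambiguously matches one of `commands`."""
        hits = [c for c in commands if c.startswith(prefix)]
        if len(hits) != 1:
            raise ValueError(f"{prefix!r} is ambiguous or unknown: {hits}")
        return hits[0]

    commands = ["checkout", "cherry-pick", "commit", "clone"]
    print(expand("com", commands))   # -> "commit": "com" is already unique
    # expand("ch", commands) would raise: it matches both "checkout" and "cherry-pick"

Same idea as reading "Univ" as "University": the prefix carries the meaning as long as nothing else in the vocabulary collides with it.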
Is that really factual/true?
Lots of words have multiple meanings and can be read differently even in the same sentence/context, depending on the reader's interpretation.
Heck, I'd argue that most (not all) day-job conflicts come down to exactly such differences in interpretation/miscommunication.