Comment by brookst

6 months ago

It still uses text tokenization, so it never sees the letters of the word, only token IDs. Not sure what tokenizer GLM uses, but OpenAI’s tokenizer encodes “blueberry” as a single token (116500 or thereabouts IIRC).
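To make that concrete, here is a rough sketch with a toy greedy tokenizer. Everything in it is an assumption for illustration: `TOY_VOCAB` and `encode` are made up, the single-letter IDs are arbitrary, and 116500 is just the from-memory figure above, not a verified value. Real tokenizers use BPE merges rather than this longest-match loop, but the effect on what the model sees is similar.

```python
# Hypothetical vocabulary, for illustration only.
# 116500 is the from-memory ID mentioned above, not a verified value.
TOY_VOCAB = {
    "blueberry": 116500,
    "b": 1, "l": 2, "u": 3, "e": 4, "r": 5, "y": 6, " ": 7,
}


def encode(text, vocab):
    """Greedy longest-match tokenizer: a toy stand-in for a real BPE tokenizer."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at i that is in the vocabulary.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids


# The whole word collapses to one opaque ID; no letters are visible.
whole = encode("blueberry", TOY_VOCAB)

# Spelled out with spaces, each letter gets its own ID and can be counted.
spelled = encode("b l u e b e r r y", TOY_VOCAB)
b_count = spelled.count(TOY_VOCAB["b"])
```

With the whole word, the input is a single integer, so any letter count has to come from memorized knowledge about that token. With the spelled-out form, the count can be read directly off the ID sequence.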

It’s like asking us for the average wavelength of the light we see when looking at a blueberry: the information is there somewhere in our processing stack, but inaccessible to reasoning. It can be worked out logically from general knowledge, but probably inaccurately, and the gotcha of “you’re looking right at it and the photons are hitting your eyes” is not much of a gotcha once you understand how vision works.