Comment by vjerancrnjak

3 months ago

It must be the tokenizer. Figuring out words from an image is harder (edges, shapes, letters, words, ...), yet the internal representations end up more efficient.

I've always found it strange that tokens can't just be individual symbols; instead there's an "alphabet" of ~500k tokens, which strips the low-level information (rhythm, syllables, etc.) out of language. The side effects are simple edge cases like the famous "two r's in strawberry" failure, or having no way to generate predefined rhyming patterns (without constrained sampling). There's an understandable reason for these big token dictionaries, but it feels like a hack.
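
To make the point concrete, here's a minimal sketch, assuming OpenAI's tiktoken library (pip install tiktoken) and its cl100k_base encoding (~100k tokens; production vocabularies range up to a few hundred thousand). It shows that "strawberry" reaches the model as a few multi-character chunks rather than letters, so the character-level information needed to count r's is never directly visible:

```python
import tiktoken

# A large BPE vocabulary maps text to multi-character chunks.
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("strawberry")
print(token_ids)  # a handful of integer token ids, not ten letters

# Inspect the raw byte chunk behind each token id.
for tid in token_ids:
    print(tid, enc.decode_single_token_bytes(tid))
# Typically a few chunks (something like b'str', b'aw', b'berry',
# depending on the vocabulary), with the r's buried inside them.
```

The same opacity applies to syllables and rhyme: the model only ever sees chunk ids, so any character- or sound-level structure has to be inferred indirectly rather than read off the input.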