Comment by Filligree

1 day ago

So the question is, why do we tokenise it in a way that makes everything harder?


Filligree

There is no encoding that makes everything easier. You trade off maths ability for general intelligence. And we are now at a point where the LLM can just choose to call a normal calculator anyway!

sureglymop 9 hours ago

Possibly unrelated, but something I never fully understood: while we can't create a perfect parser for natural language, why don't we parse it optimistically to extract semantics and feed that into LLMs as well?

akoboldfrying 1 day ago

The tokenisation needs to be general -- it needs to be able to encode any possible input. It should also be at least moderately efficient across the distribution of inputs that it will tend to see. Existing tokenisation schemes explicitly target this.
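
To make the "general, but moderately efficient" point concrete, here is a minimal sketch of a byte-level tokeniser. It is a toy, not how any production tokeniser is built: the vocabulary entries in MERGED are made up for illustration, and it uses greedy longest match rather than learned BPE merges. The only idea it is meant to show is that keeping all 256 single bytes in the vocabulary guarantees every input can be encoded, while multi-byte entries shorten common sequences.

```python
# Toy byte-level tokeniser (illustrative only; vocabulary is hypothetical).
# Ids 0..255 are the raw bytes, so any UTF-8 input is always encodable.
MERGED = [b"the", b" the", b"ing", b"token", b"12", b"123"]  # made-up entries
VOCAB = {bytes([i]): i for i in range(256)}
VOCAB.update({tok: 256 + i for i, tok in enumerate(MERGED)})
ID_TO_BYTES = {i: tok for tok, i in VOCAB.items()}
MAX_LEN = max(len(tok) for tok in VOCAB)


def encode(text: str) -> list[int]:
    """Greedy longest-match encoding over the UTF-8 bytes of `text`."""
    data = text.encode("utf-8")
    ids = []
    pos = 0
    while pos < len(data):
        # Try the longest candidate first; a single byte always matches,
        # so this inner loop is guaranteed to make progress.
        for length in range(min(MAX_LEN, len(data) - pos), 0, -1):
            piece = data[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
    return ids


def decode(ids: list[int]) -> str:
    """Concatenate the byte strings behind each id and decode as UTF-8."""
    return b"".join(ID_TO_BYTES[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    for sample in ["the token", "1234", "naïve 🙂"]:
        ids = encode(sample)
        print(repr(sample), "->", ids, "round-trips:", decode(ids) == sample)
```

With this made-up vocabulary, "1234" comes out as the chunk "123" followed by the single byte "4", which is the kind of uneven digit grouping that makes arithmetic awkward, while text outside the vocabulary (accents, emoji) still round-trips through plain bytes.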