
Comment by antonvs

1 year ago

That's misleading.

When you read and comprehend text, you don't read it letter by letter, unless you have a severe reading disability. Your ability to comprehend text works more like an LLM.

Essentially, you can compare the human brain to a multi-model or modular system. There are layers or modules involved in most complex tasks. When reading, you recognize multiple letters at a time [1], and those letters are essentially assembled into tokens that a different part of your brain can deal with.

Breaking words down into letters is essentially a separate "algorithm". Just as your brain doesn't read at the letter level, it's likely never going to make sense for a text comprehension and generation model to operate at the level of letters: it's inefficient.

A multi-modal model with a dedicated model for handling individual letters could easily convert tokens into letters and operate on them when needed. It's just not a high priority for most use cases currently.

[1] https://www.researchgate.net/publication/47621684_Letters_in...
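
For illustration, here's a rough sketch of the token/letter distinction using OpenAI's tiktoken library (the word and the encoding name are arbitrary examples, and the exact token split will vary by tokenizer):

    import tiktoken

    # Load a BPE encoding (cl100k_base is used by several OpenAI chat models).
    enc = tiktoken.get_encoding("cl100k_base")

    word = "strawberry"
    token_ids = enc.encode(word)

    # The model operates on a few token IDs, not on ten individual letters.
    print(token_ids)
    print([enc.decode([t]) for t in token_ids])  # the chunks the model actually "sees"

    # Converting tokens back into letters is a separate, trivial step,
    # analogous to the dedicated letter-handling module described above.
    print(list(enc.decode(token_ids)))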

I agree completely; that wasn't the point, though. The point was that my 6-year-old knows how to spell a word when asked, and the blob of quantized floats doesn't, or at least not reliably.

So the blob wasn't trained to do that (yeah, low utility, I get that), but it also doesn't know that it doesn't know, which is another, much bigger and still unsolved problem.

  • I would argue that most SOTA models do know that they don't know this, as evidenced by the fact that when you give them a code interpreter as a tool, they choose to use it to write a script that counts the letters rather than trying to come up with an answer on their own.

    (There's a quick demo of this in the LangChain docs, using claude-3-haiku: https://python.langchain.com/v0.2/docs/integrations/tools/ri... — a minimal sketch of the same pattern follows below.)
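
    As a rough sketch of that tool-use pattern, assuming LangChain's standard tool-calling interface (the model name and the tool itself are just illustrative, not the exact setup from the linked demo):

        from langchain_core.tools import tool
        from langchain_anthropic import ChatAnthropic

        @tool
        def count_letters(word: str, letter: str) -> int:
            """Count how many times `letter` occurs in `word`."""
            return word.lower().count(letter.lower())

        # Model name is an example; any tool-calling chat model works the same way.
        llm = ChatAnthropic(model="claude-3-haiku-20240307").bind_tools([count_letters])

        # Instead of guessing from tokens, the model can emit a tool call,
        # which we then execute exactly.
        response = llm.invoke("How many r's are in 'strawberry'?")
        for call in response.tool_calls:
            if call["name"] == "count_letters":
                print(count_letters.invoke(call["args"]))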