
Comment by baq

1 year ago

Call me when models understand when to convert tokens into actual letters and count them. Can’t claim they’re more than word calculators before that.

That's misleading.

When you read and comprehend text, you don't read it letter by letter, unless you have a severe reading disability. Your ability to comprehend text works more like an LLM.

Essentially, you can compare the human brain to a multi-model or modular system. There are layers or modules involved in most complex tasks. When reading, you recognize multiple letters at a time [1], and those letters are essentially assembled into tokens that a different part of your brain can deal with.

Breaking words down into letters is essentially a separate "algorithm". Just as in your brain, it will likely never make sense for a text comprehension and generation model to operate at the level of letters - it's inefficient.

A multi-modal model with a dedicated model for handling individual letters could easily convert tokens into letters and operate on them when needed. It's just not a high priority for most use cases currently.

[1] https://www.researchgate.net/publication/47621684_Letters_in...
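
To make that concrete, here is a minimal sketch in Python. It uses OpenAI's tiktoken tokenizer purely as a convenient example (the comment above doesn't name any specific tokenizer); any subword vocabulary illustrates the same point: the model's view of a word is a few opaque chunks, and spelling it out is a separate character-level step.

    # A word as a model sees it: a few subword tokens, not letters.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    word = "strawberry"

    token_ids = enc.encode(word)
    pieces = [enc.decode_single_token_bytes(t) for t in token_ids]
    print(token_ids)        # a handful of integers, not eleven letters
    print(pieces)           # the byte chunks those integers stand for

    # The separate "algorithm": drop back to characters and count.
    print(word.count("r"))  # 3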

  • I agree completely, but that wasn’t the point: the point was that my 6 yo knows when to spell a word out (when asked) and the blob of quantized floats doesn’t, or at least not reliably.

    So the blob wasn’t trained to do that (yeah, low utility, I get that), but it also doesn’t know that it doesn’t know, which is another, much bigger and still unsolved problem.

    • I would argue that most SOTA models do know that they don't know this, as evidenced by the fact that when you give them a code interpreter as a tool, they choose to use it to write a script that counts the letters rather than trying to come up with an answer on their own.

      (A quick demo of this is in the langchain docs, using claude-3-haiku: https://python.langchain.com/v0.2/docs/integrations/tools/ri...; a rough sketch of the loop is shown below.)
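
      Roughly, that interaction has the shape sketched below. This is a generic, hypothetical illustration, not the actual langchain/Riza API; call_model is a stand-in for whatever tool-calling chat client you use, and the hard-coded reply just shows the kind of tool call a model typically emits when it decides to delegate the counting.

        # Hypothetical sketch of a code-interpreter tool loop (not a real API).
        def call_model(messages):
            # Stand-in for a chat model with tool calling. Given a code
            # interpreter, SOTA models typically answer letter-counting
            # questions with a tool call like this instead of guessing.
            return {"tool": "python", "code": "print('strawberry'.count('r'))"}

        def ask(question):
            reply = call_model([{"role": "user", "content": question}])
            if reply.get("tool") == "python":
                exec(reply["code"], {})  # the harness runs the script; prints 3
            return reply

        ask("How many r's are in 'strawberry'?")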

The model communicates in a language, but our letters are not necessary for that and are in fact not part of the English language itself. You could write English using per-word pictographs and it would still be the same English and carry the same information/message. It's like asking you whether there is a '5' in 256 when you read binary.
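
To spell that analogy out (Python here purely as illustration): the number is the same either way; only one of its written representations contains the digit.

    n = 256
    print('5' in str(n))  # True  -- the decimal rendering contains a '5'
    print('5' in bin(n))  # False -- the binary rendering, 0b100000000, does not
    # Same number, different representations. Likewise, letters belong to one
    # written representation of English, not to the message or to token IDs.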

Is anyone in the know, aside from mainstream media (god forgive me for using this term unironically) and civilians on social media, claiming LLMs are anything but word calculators?

I think that's a perfect description by the way, I'm going to steal it.

  • I think it's a very poor intuition pump. These 'word calculators' have lots of capabilities not suggested by that term, such as a theory of mind and an understanding of social norms. If they are "merely" a "word calculator", then a "word calculator" is a very odd and counterintuitively powerful algorithm that captures big chunks of genuine cognition.

    • They’re trained on the available corpus of human knowledge and writing. I would think the word calculators had failed if they were unable to predict the next word or sentiment, given the trillions of pieces of data they’ve been fed. Their training environment is literally people talking to each other and observing social norms. That doesn’t make them anything more than p-zombies, though.

      As an aside, I wish we would call all of this stuff pseudo-intelligence rather than artificial intelligence.
