
Comment by Mindless2112

2 months ago

It seems like this could be easily solved in models that support tool calling by providing them with a tool that takes a token and returns the individual graphemes.

It doesn't seem valuable for the model to memorize the graphemes in each of its tokens.
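
A minimal sketch of what such a tool could look like, assuming an OpenAI-style function-calling setup (the name spell_out and the schema below are illustrative, not taken from any particular product):

    def spell_out(text: str) -> list[str]:
        # Splits into Unicode code points, which is enough for questions like
        # "how many r's in strawberry"; true grapheme clusters (emoji with
        # modifiers, etc.) would need something like the third-party `regex`
        # module's \X pattern.
        return list(text)

    # Tool description handed to the model, so it can call spell_out instead
    # of guessing letters from its token-level view of the word.
    SPELL_OUT_TOOL = {
        "type": "function",
        "function": {
            "name": "spell_out",
            "description": "Split a word into its individual characters.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }

    print(spell_out("strawberry"))
    # ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']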

Yes, but are you going to special-case all of these pain points? The whole point of these LLMs is that they learn from training data, not from people coding logic directly. If you do this, people will come up with a dozen new ways in which the models fail; they are really not hard to find. Basically, asking them to do anything novel risks complete failure. The interesting bit is that LLMs tend to work best at "medium difficulty" problems: homework questions, implementing documented APIs, and things like that. Asking them to do anything completely novel tends to fail, as does asking them to do something so trivial that normal humans won't even bother writing it down.

  • It makes sense when users ask for information that isn't available in the tokenized values, though. In the abstract, a tool that re-tokenizes certain context contents when a prompt references those contents is probably necessary to solve this issue (if you consider it worth solving).

    • It's a fool's errand. The kinds of problems you end up coding for are the ones that are blatantly obvious and ultimately useless except as a gotcha for the AI engines. All you're doing is papering over the deficiency of the model without actually solving a problem.


  • Tokenization is an inherent weakness of current LLM design, so it makes sense to compensate for it. Hopefully some day tokenization will no longer be necessary.

That takes away from the notion that LLMs have emergent intelligent abilities. Right now it doesn't seem valuable for a model to count letters, even though it is a very basic measure of understanding. Will this continue in other domains? Will we be doing tool-calling for every task that's not just summarizing text?

  • > Will we be doing tool-calling for every task that's not just summarizing text?

    spoiler: Yes. This has already become standard for production use cases where the LLM is an external-facing interface: you use an LLM to translate the user's human-language request into a machine-ready, well-defined schema (e.g. a protobuf RPC), do the bulk of the work with ordinary, deterministic code, then (optionally) use an LLM to generate a text result to display to the user. The LLM only acts as a user interface layer.
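
    A minimal sketch of that pattern, with hypothetical helpers (llm_extract, llm_render, and the RefundRequest schema are made up for illustration) standing in for whatever model SDK you use; only the middle step is real logic here:

        from dataclasses import dataclass

        @dataclass
        class RefundRequest:
            # The "machine-ready, well-defined schema" the LLM must fill in.
            order_id: str
            amount_cents: int

        def llm_extract(user_text: str) -> RefundRequest:
            # In practice: one LLM call constrained (via tool calling or JSON
            # mode) to emit this schema. Hard-coded here so the sketch runs.
            return RefundRequest(order_id="A-1001", amount_cents=1299)

        def process_refund(req: RefundRequest) -> dict:
            # The bulk of the work: plain, deterministic, testable code.
            return {"order_id": req.order_id, "refunded_cents": req.amount_cents, "status": "ok"}

        def llm_render(result: dict) -> str:
            # Optional second LLM call for friendly prose; a template works too.
            return f"Refunded {result['refunded_cents'] / 100:.2f} on order {result['order_id']}."

        def handle(user_text: str) -> str:
            return llm_render(process_refund(llm_extract(user_text)))

        print(handle("I want my money back for order A-1001"))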

  • How is counting letters a measure of understanding, rather than a rote process?

    The reason LLMs struggle with this is because they literally aren't thinking in English. Their input is tokenized before it comes to them. It's like asking a Chinese speaker "How many Rs are there in the word 草莓?"

    • It shows understanding that words are made up of letters and that letters can be counted.

      Since tokens are atomic (which I didn't realize earlier), maybe it's still intelligent if it realizes it can extract the answer by writing len([b for b in word if b == my_letter]) and decides on its own to return that value.
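
      A runnable version of that idea (the setup is hypothetical: assume the model has a code-execution tool and chooses to use it):

          def count_letter(word: str, letter: str) -> int:
              # Operates on the decoded string, so token boundaries never matter.
              return len([ch for ch in word if ch == letter])

          print(count_letter("strawberry", "r"))  # 3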

    • But why doesn’t the LLM reply “I can’t solve this task because I see text as tokens”, rather than give a wrong answer?

We're up to a gazillion parameters already; maybe the next step is to just ditch the tokenization step and let the LLMs encode the tokenization process internally?
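
One concrete reading of "ditch the tokenization step" is byte-level input, which keeps every character boundary visible to the model at the cost of much longer sequences; a toy illustration:

    word = "strawberry"
    byte_ids = list(word.encode("utf-8"))  # the integers a byte-level model would consume
    print(byte_ids)       # [115, 116, 114, 97, 119, 98, 101, 114, 114, 121]
    print(len(byte_ids))  # 10 input positions, versus a handful of subword tokens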