That isn't a flaw though. Counting is orthogonal to the functioning of LLMs, which are merely completing patterns based on their training data and available context. If you want an LLM to count reliably, give it a tool.
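For instance, a deterministic counting tool could look something like the sketch below (a hypothetical count_letter helper in Python, not any particular vendor's tool-calling API); the model delegates the count rather than guessing:

    # Hypothetical tool an LLM would call instead of counting "in its head".
    def count_letter(word: str, letter: str) -> int:
        """Return how many times `letter` occurs in `word` (case-insensitive)."""
        return word.lower().count(letter.lower())

    # The classic example: "how many r's are in strawberry?"
    print(count_letter("strawberry", "r"))  # prints 3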
Anything articulate (hence possibly convincing) which could be «merely [guessing]» should either be locked out of consequential questions, or fixed.
We're still at "that's just how it works." The LLM isn't aware of any consequences, etc. All it does is complete patterns as trained, and the training data contains many instances of articulate question answering.
It is up to those using the LLM to be aware of its capabilities, or else not be allowed to use it. Like a child unaware that running their finger along a sharp knife blade will lead to a bad cut: you don't dull the blade to keep the child safe, you keep the child away from the knife until they can understand and respect what it can do.
If an LLM can get IMO Gold but can't count, that's an issue.
I think the issue is that it was advertised as having PhD-level intelligence, while in fact it can't count the letters in a word.
This particular LLM did not get an IMO Gold.