Comment by rainsford
2 months ago
Sure, but I think the point is: why do LLMs have a blind spot for a task that a basic Python script could get right 100% of the time using a tiny fraction of the computing power? I think this is more than just a gotcha. LLMs can produce undeniably impressive results, but the fact that they still struggle with weirdly basic things certainly seems to indicate something isn't quite right under the hood.
I have no idea if such an episode of Star Trek: The Next Generation exists, but I could easily see one where getting basic letter counting wrong was used as an early-episode indication that Data was going insane or his brain was deteriorating or something. Like he'd get complex astrophysical questions right but then miscount the 'b's in blueberry or whatever, and the audience would instantly understand what that meant. Maybe our intuition is wrong here, but maybe not.
Basic Python script? This is a grep command, one line of C, or like three assembly instructions.
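For comparison, the Python version is about as short. A minimal sketch, where `count_letter` is a made-up helper name and case-insensitive matching is my own assumption about what the question wants:

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a single letter in a word, ignoring case."""
    # str.count works directly on characters, so nothing about the spelling is hidden.
    return word.lower().count(letter.lower())


blueberry_bs = count_letter("blueberry", "b")
```

The point is only that the program sees the word as a sequence of characters, so the count is deterministic.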
If you think this is more than just a gotcha, that’s because you don’t understand how LLMs are structured. The model doesn’t operate on words; it operates on tokens. So the letter-level structure of the word that the question relies on has been destroyed by the tokenizer before the model gets a chance to operate on it.
It’s as simple as that: this is a task that exploits the design of LLMs, because they rely on tokenizing words, and when LLMs “perform well” on this task it is because the task is part of their training set. It doesn’t make them smarter if they succeed or less smart if they fail.
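To make the tokenizer point concrete, here is a toy sketch. The vocabulary, the token IDs, and the greedy longest-match splitting are all invented for illustration; real tokenizers (BPE and friends) learn their vocabularies from data and may split these words differently.

```python
# Hypothetical toy vocabulary: the pieces and IDs are made up for illustration.
TOY_VOCAB = {"blue": 101, "berry": 102, "straw": 103}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into token IDs using greedy longest-match against `vocab`."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


# What the model receives is a list of integers, not a string of letters.
token_ids = toy_tokenize("blueberry")
```

In this toy setup "blueberry" becomes two integers, and nothing in those integers says how many b's were in the original spelling. Any letter-level knowledge has to be learned indirectly, from training text that talks about spelling.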
Hence a positronic neural network outperforms the machine learning that is used today. /headduck