This is a very tiring criticism. Yes, this is true. But it's an implementation detail (tokenization) that has very little bearing on the practical utility of these tools. How often are you relying on LLMs to count letters in words?
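For context, a minimal sketch of what the model actually "sees" (assuming the tiktoken library and its cl100k_base encoding; the exact split shown in the comment is illustrative):

    import tiktoken

    # The model operates on token IDs, not individual characters.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("strawberry")
    print(tokens)                              # a short list of integer IDs
    print([enc.decode([t]) for t in tokens])   # multi-character pieces, e.g. ['str', 'aw', 'berry']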
The implementation detail is that we keep finding them! After this, it couldn't locate a seahorse emoji without freaking out. At some point we need to have a test: there are two drinks before you. One is water, the other is whatever the LLM thought you might like to drink after it completed refactoring the codebase. Choose wisely.
It's an example that shows that if these models aren't trained on a specific problem, they may have a hard time solving it for you.
An analogy is asking someone who is colorblind how many colors are on a sheet of paper. What you are probing isn't reasoning, it's perception. If you can't see the input, you can't reason about the input.
No, it’s an example that shows that LLMs still use a tokenizer, which is not an impediment for almost any task (even many where you would expect it to be, like searching a codebase for variants of a variable name in different cases).
No, it is the issue with the tokenizer.
The criticism would stop if the implementation issue were fixed.
It's an example of a simple task. How often are you relying on LLMs to complete simple tasks?
At this point, if I were OpenAI, I wouldn’t bother fixing this, just to give pedants something to get excited about.
Unless they fixed this in the last 25 minutes (possible?), it correctly counts 1 `r`.
https://chatgpt.com/share/6941df90-789c-8005-8783-6e1c76cdfc...
This is like complaining that your screwdriver is bad at measuring weight.
If you really need an answer and you really need the LLM to give it to you, then ask it to write a (Python?) script to do the calculation you need, execute it, and give you the answer.
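For example, a minimal sketch of the kind of script you might have it write (plain Python; the word and letter are just placeholders):

    # Count a letter character by character, which is trivial in code
    # even though the tokenized model struggles with it.
    word = "strawberry"
    letter = "r"
    print(f"'{letter}' appears {word.count(letter)} time(s) in '{word}'")  # 3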
That's a problem that is at least possible for the LLM to perceive and learn through training, while counting letters is much more like asking a colour blind person to count flowers by colour.