
Comment by marcellus23

5 months ago

I think it's hard to take any LLM criticism seriously if they don't even specify which model they used. Saying "an LLM model" is totally useless for deriving any kind of conclusion.

When talking about the long-term capabilities of a class of tools, it makes sense to be general. I think deriving conclusions at all is pretty difficult given how fast everything is moving, but there are some realities we do actually know about how LLMs work, and we can talk about those.

Knowing that ChatGPT output good tokens last Tuesday but Sonnet didn't does not help us know much about the future of the tools in general.

  • > Knowing that ChatGPT output good tokens last Tuesday but Sonnet didn't does not help us know much about the future of the tools in general.

    Isn't that exactly what is going to help us understand the value these tools bring to end users, and how to optimize them for better future use? None of these models are copy+pastes of each other; they tend to do things slightly differently under the hood. How those differences affect results seems like exactly the data we would want here.

    • I guess I disagree that the main concern is the differences between individual models rather than LLM technology in general. Given how fast it's all changing, I would personally rather focus on the broader conversation. I don't really care whether GPT-5 is better at benchmarks; I care whether LLMs are actually capable of the kind of reasoning and productive output the world currently thinks they are.


Yes, I’d be curious about his experience with the GPT-5 Thinking model. So far I haven’t seen any blunders from it.

  • I've seen plenty of blunders, but in general it's better than their previous models.

    Well, it depends a bit on what you mean by blunders. But, e.g., I've seen it confidently assert mathematically wrong statements with nonsense proofs instead of admitting that it doesn't know.