Comment by AstroBen

9 months ago

This seems fairly obvious at this point. If they were actually reasoning at all, they'd be capable of playing complex games like chess (even if not well)

Instead they're barely able to eek out wins against a bot that plays completely random moves: https://maxim-saplin.github.io/llm_chess/

Just in case it wasn't a typo, and you happen not to know ... that word is probably "eke" - meaning gaining (increasing, enlarging from wiktionary) - rather than "eek" which is what mice do :)

Every day I am more convinced that LLM hype is the equivalent of someone seeing a stage magician levitate a table across the stage and assuming this means hovercars must only be a few years away.

  • I believe there's a widespread confusion between a fictional character that is described as an AI assistant, versus the actual algorithm building the play-story which humans imagine the character from. An illusion actively promoted by companies seeking investment and hype.

    AcmeAssistant is "helpful" and "clever" in the same way that Vampire Count Dracula is "brooding" and "immortal".

LLMs are capable of playing chess, and 3.5 turbo instruct does so quite well (for a human) at 1800 Elo. Does this mean they can truly reason now?

https://github.com/adamkarvonen/chess_gpt_eval
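For context on what an 1800 rating means against a much weaker opponent, here is a minimal sketch of the standard Elo expected-score formula; the ~800 rating assigned to a random mover below is purely an illustrative assumption, not a measured number:

```python
# Standard Elo expected-score formula: the fraction of points a player
# with rating r_a is expected to take from a player with rating r_b.
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Equal ratings give an even expected score.
print(round(expected_score(1800, 1800), 3))  # 0.5

# Against a hypothetical ~800-rated random mover, an 1800 player
# is expected to score nearly every point.
print(round(expected_score(1800, 800), 5))  # 0.99685
```

Under this formula, failing to crush a random mover over many games implies an enormous rating gap in the wrong direction, which is what makes the benchmark results above so striking.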

  • My point wasn't chess specific or that they couldn't have specific training for it. It was a more general "here is something that LLMs clearly aren't being trained for currently, but would also be solvable through reasoning skills".

    Much in the same way, a human who has only just learnt the rules but zero strategy would very, very rarely lose here.

    These companies are shouting that their products are passing incredibly hard exams, solving PhD-level questions, and are about to displace humans, and yet they still fail to crush a random-only strategy chess bot? How does this make any sense?

    We're on the verge of AGI but there's not even the tiniest spark of general reasoning ability in something they haven't been trained for

    "Reasoning" or "Thinking" are marketing terms and nothing more. If an LLM is trained for chess then its performance would just come from memorization, not any kind of "reasoning"

    • >If an LLM is trained for chess then its performance would just come from memorization, not any kind of "reasoning".

      If you think you can play chess at that level over that many games and moves with memorization, then I don't know what to tell you except that you're wrong. It's not possible, so let's just get that out of the way.

      >These companies are shouting that their products are passing incredibly hard exams, solving PhD-level questions, and are about to displace humans, and yet they still fail to crush a random-only strategy chess bot? How does this make any sense?

      Why doesn't it? Have you actually looked at any of these games? Those LLMs aren't playing like poor reasoners. They're playing like machines that have no clue what the rules of the game are. LLMs learn by predicting and failing and getting a little better at it, repeat ad nauseam. You want them to learn the rules of a complex game? That's how you do it: by training them to predict it. Training on chess books just makes them learn how to converse about chess.

      Humans have weird failure modes that are at odds with their 'intelligence'. We just choose to call them funny names and laugh about it sometimes. These machines have theirs. That's all there is to it. On the benchmark from the top comment we are both replying to, gemini-2.5-pro, released less than 5 days later, hit 25%. Now that was particularly funny.
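The "learn by predicting" loop described above can be illustrated with a deliberately tiny sketch: a bigram model over move tokens counted from a few toy transcripts. The transcripts and token format here are made up for illustration; real setups such as chess_gpt_eval train transformers on huge numbers of actual games:

```python
from collections import Counter, defaultdict

# Toy "training corpus": space-separated move tokens from made-up
# opening lines (illustrative only, not real engine games).
games = [
    "e4 e5 Nf3 Nc6 Bb5",
    "e4 e5 Nf3 Nc6 Bc4",
    "e4 c5 Nf3 d6 d4",
    "d4 d5 c4 e6 Nc3",
]

# "Training" = counting which token follows which: the crudest
# possible form of next-token prediction.
counts = defaultdict(Counter)
for game in games:
    tokens = game.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most frequently observed continuation of `token`."""
    return counts[token].most_common(1)[0][0]

# The model has never seen a rule of chess, only move sequences,
# yet it reproduces the regularities of its training data.
print(predict_next("e4"))  # "e5" (seen twice, vs "c5" once)
```

The point of the sketch is the shape of the training signal, not its strength: scale the same predict-the-next-token objective up from bigram counts to a large transformer over millions of games, and legal (then strong) move selection is what falls out.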

      4 replies →

  • 3.5 turbo instruct is a huge outlier.

    https://news.ycombinator.com/item?id=42138289

    • That might be overstating it, at least if you mean it to be some unreplicable feat. Small models have been trained on the EleutherAI Discord that play at around 1200 to 1300. And there's this grandmaster-level transformer - https://arxiv.org/html/2402.04494v1

      OpenAI, Anthropic and the like simply don't care much about their LLMs playing chess. That, or post-training is messing things up.

      10 replies →