Comment by NitpickLawyer

1 day ago

I have nothing against researching this, I think it's important. My main issue is with articles choosing to grab a "conclusion" and imply it extrapolates to larger models, without any support for that. They are going for the catchy title first, fine-print be damned.

I was just at the KDD conference and the general consensus there was in line with this paper. There was only one keynoter who simply assumed that LLMs do reasoning, which was jarring as the previous keynoter had just explained at length why we need a neuro-symbolic approach instead.

The thing is, I think the current companies making LLMs are _not_ trying to be correct or right. They are just trying to hide the errors better. In the business future for AI, the coding stuff that we focus on here on HN - how AI can help/impact us - is just a sideline.

The huge-money business future of LLMs is end consumers, not creators: it is product and opinion placement, and their path to that is friendship. They want their assistant to be your friend, then your best friend, then your only friend, then your lover. If the last 15 years of social media have been about discord and polarisation to get engagement, the next 15 will be about friendship and love, even though that leads to isolation.

None of this needs the model to grow strong reasoning skills. That's not where the real money is. And CoT - whilst super great - is just as effective when it's merely hiding that it's giving you the wrong answer (by being more internally consistent) as when it's actually giving you a better answer.

  • "as the previous keynoter had just explained at length why we need a neuro-symbolic approach instead"

    Do you have a link to the video for that talk?

    • I don't think they were recorded. In fact, I don't think any of KDD gets recorded.

      I think it was Dan Roth who talked about the challenges of reasoning from just adding more layers and it was Chris Manning who just quickly mentioned at the beginning of his talk that LLMs were well known for reasoning.

      https://kdd2025.kdd.org/keynote-speakers/

  • > None of this needs the model to grow strong reasoning skills. That's not where the real money is.

    I never thought about it like that, but it sounds plausible.

    However, I feel like getting to that stage is even harder to get right than reasoning?

    Aside from the <0.1% of severely mentally unwell people who already imagine themselves to be in relationships with AIs, I don't think a lot of normal people will form lingering attachments to them without solving the issue of permanence and memory.

    They're currently essentially stateless; while that's surely enough for short-term attachment, I'm not seeing this becoming a bigger issue because of that glaring shortfall.

    It'd be like being in a relationship with a person with dementia; that's not a happy state of being.

    Honestly, I think this trend is severely overstated until LLMs can sufficiently emulate memories and shared experiences. And that's still fundamentally impossible, just like "real" reasoning with understanding.

    So I disagree after thinking about it more - emulated reasoning will likely have a bigger revenue stream via B2E applications compared to emotional attachment in B2C...

    • (the top post on HN right now is announcing Claude lets you buy a 1M token context. Extrapolate a few years.

      Generally, there is a push towards 'context engineering', and there is a lot of bleeding-edge research into snapshotting large contexts so that the next back-and-forth turn in the conversation is fast, etc. So optimisations are already being made.)
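
      A minimal, hypothetical sketch of the cache-reuse idea those optimisations build on, written against the Hugging Face transformers API with gpt2 as a stand-in model (the model choice and prompts are placeholders, not anything a lab has published): run the expensive prefill over the shared context once, keep the KV cache, and let the next turn pay only for its own tokens.

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

        # A long shared prefix standing in for a huge conversation context.
        long_context = "lots of shared conversation history " * 100
        prefix_ids = tok(long_context, return_tensors="pt").input_ids

        with torch.no_grad():
            # Expensive prefill over the big context, done once.
            prefill = model(prefix_ids, use_cache=True)
            kv_snapshot = prefill.past_key_values  # the reusable "snapshot"

            # The next conversational turn only pays for its own few tokens.
            turn_ids = tok(" User: and then what?", return_tensors="pt").input_ids
            out = model(turn_ids, past_key_values=kv_snapshot, use_cache=True)
            next_token_id = out.logits[:, -1].argmax(dim=-1)

      Real context-snapshotting systems typically layer serialisation, paging and cross-request sharing on top of this, but the basic trade is the same: spend compute once on the prefix, amortise it across turns.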

  • > None of this needs the model to grow strong reasoning skills. That's not where the real money is

    "And the world is more and more complex, and the administrations are less and less prepared"

    (~~ Henry Kissinger)

  • As to general consensus, Hinton gave a recent talk, and he seemed adamant that neural networks (which LLMs are) really are doing reasoning. He gives his reasons for it. Is Hinton considered an outlier or?

    • A) Hinton is quite vocal about desiring to be an outsider/outlier as he says it is what lets him innovate.

      B) He is also famous for his Doomerism, which often depends on machines doing "reasoning".

      So...it's complicated, and we all suffer from confirmation bias.

    • I think Hinton uses terms like reasoning and creativity and consciousness in ways that are different from my own embeddings.

      I recently had fun asking Gemini to compare how Wittgenstein and Chomsky would view calling a large transformer that was trained entirely on a synthetic 'language' (in my case symbols that encode user behaviour in an app) a 'language' or not. And then, for the killer blow, whether an LLM that is trained on Perl is a language model.

      My point being that whilst Hinton is great and all, I don't think I can quite pin down his definitions of precise words like reasoning etc. It's possible for people to have opposite meanings for the same words (Wittgenstein famously had two contradictory approaches in his lifetime). In the case of Hinton, I can't quite pin down how loosely or precisely he is using the terms.

      A forward-only transformer like GPT can only do symbolic arithmetic to the depth of its layers, for example (see the toy sketch at the end of this comment). And I don't think the solution is to add more layers.

      Of course humans are entirely neuro and we somehow manage to 'reason'. So YMMV.
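
      To make the depth point concrete with a toy (an analogy only, not a claim about actual transformer internals; the function and numbers below are made up): treat each round of carry propagation as one "layer". With a fixed number of rounds, a carry chain longer than the depth never resolves.

        def add_with_depth(a_digits, b_digits, depth):
            """Add two little-endian digit lists, allowing only `depth`
            rounds of carry propagation (one position per round)."""
            n = max(len(a_digits), len(b_digits)) + 1
            a = a_digits + [0] * (n - len(a_digits))
            b = b_digits + [0] * (n - len(b_digits))
            digits = [x + y for x, y in zip(a, b)]  # digit-wise sums, carries not applied yet
            for _ in range(depth):                  # one "layer" = carries move one place
                carries = [d // 10 for d in digits]
                digits = [d % 10 + (carries[i - 1] if i > 0 else 0)
                          for i, d in enumerate(digits)]
            return digits

        # 9999 + 1 needs a carry chain of length 4:
        print(add_with_depth([9, 9, 9, 9], [1], depth=2))  # [0, 0, 10, 9, 0] -- carries left unresolved
        print(add_with_depth([9, 9, 9, 9], [1], depth=4))  # [0, 0, 0, 0, 1]  -- 10000, correct

      Emitting intermediate tokens (chain-of-thought style) effectively buys extra sequential steps, which is roughly why CoT helps on this kind of problem.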

  • Not sure what all this is about; I somewhat regret taking a break from coding with LLMs to have it explained to me that it's all a mirage and a secret and sloppy plan for getting me an automagic egirl or something. ;)

    • The point being made doesn’t impact people who can find utility from LLM output.

      It’s only when you need to apply it to domains outside of code, or a domain where it needs to actually reason, that it becomes an issue.

    • Right? Oh, this fairly novel solution to the problem I was having that works and is well tested. Oh, throw it away.. sorry, the model can't think of stuff..

      Back to square one!!

Because model size is a trivial parameter, and not a new paradigm.

What you're saying is like saying you can't extrapolate that long division works on 100-digit numbers because you only worked through it using 7-digit numbers and a few small polynomials.

  • Scale changes the performance of LLMs.

    Sometimes, we go so far as to say there is "emergence" of qualitative differences. But really, this is not necessary (and not proven to actually occur).

    What is true is that the performance of LLMs at out-of-distribution (OOD) tasks changes with scale.

    So no, it's not the same as solving a math problem.

    • If you scale the LLM, you have to scale the tasks.

      Of course performance improves on the same tasks.

      The researchers behind the submitted work chose a certain model size and problems of a certain size, controlling everything. There is no reason to believe that their results won't generalize to larger or smaller models.

      Of course, not with the input problems held constant! That is a strawman.

    • > What is true is that the performance of LLMs at OOD tasks changes with scale.

      If scaling alone guaranteed strong OOD generalization, we’d expect the largest models to consistently top OOD benchmarks, but this isn’t the case. In practice, scaling primarily increases a model’s capacity to represent and exploit statistical relationships present in the training distribution. This reliably boosts in-distribution performance but yields limited gains on tasks that are distributionally distant from the training data, especially if the underlying dataset is unchanged. That’s why trillion-parameter models trained on the same corpus may excel at tasks similar to those seen in training, but won’t necessarily show proportional improvements on genuinely novel OOD tasks.