Comment by petermcneeley
15 days ago
I recently saw a new LLM that was fooled by "20 pounds of bricks vs 20 feathers". These are not reasoning machines.
15 days ago
> I recently saw a new LLM that was fooled by "20 pounds of bricks vs 20 feathers". These are not reasoning machines.
I recently had a computer tell me that 0.1 + 0.2 != 0.3. It must not be a math capable machine.
Perhaps it is more important to know the limitations of tools rather than dismiss their utility entirely due to the existence of limitations.
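The floating-point analogy is easy to verify; a minimal Python sketch (IEEE 754 doubles, as CPython uses):

```python
import math

# 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly 0.3.
total = 0.1 + 0.2
print(total)              # 0.30000000000000004
print(total == 0.3)       # False

# Knowing the limitation, the tool is still perfectly usable:
print(math.isclose(total, 0.3))  # True
```

The point stands either way: the "wrong" answer is a known, documented property of the tool, not proof the tool is useless.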
A computer isn't a math capable machine.
> Perhaps it is more important to know the limitations of tools rather than dismiss their utility entirely due to the existence of limitations.
Well, yes. And "reasoning" is only something LLMs do coincidental to their function as sequence continuation engines. Like performing exact math on rational numbers, it can happen if you put in a lot of work and accept a LOT of expensive computation. Even then, there exist computations that are simply not feasible.
Reminding folks to dismiss the massive propaganda engine pushing this bubble isn't "dismissing their utility entirely".
These are not reasoning machines. Treating them like they are will get you hurt eventually.
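The aside about exact rational math being possible at a cost can be illustrated with Python's `fractions` module, which trades speed and memory for exactness:

```python
from fractions import Fraction

# Exact rational arithmetic: no rounding error at all.
total = Fraction(1, 10) + Fraction(2, 10)
print(total)                      # 3/10
print(total == Fraction(3, 10))   # True

# The cost becomes visible when you ask what the float 0.1
# actually is, as an exact rational:
print(Fraction(0.1))  # 3602879701896397/36028797018963968
```

Exactness is achievable, but every operation drags full numerators and denominators along, which is precisely the "expensive computation" trade-off being described.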
My point is that computers, when used properly, can absolutely do math. And LLMs, when used properly, can absolutely explain the reasoning behind why a pound of bricks and a pound of feathers weigh the same.
Can they reason? Maybe, depending on your definition of reasoning.
An example: which weighs more, a pound of bricks or 453.59 grams of feathers? Explain your reasoning.
LLM: The pound of bricks weighs slightly more.
*Reasoning:*

* *1 pound* is officially defined as *0.45359237 kilograms*, which is *453.59237 grams*.
* You have *453.59 grams* of feathers.

So, the pound of bricks (453.59237 grams) weighs a tiny fraction more than the 453.59 grams of feathers. For most practical purposes, they'd be considered the same, but technically, the bricks are heavier by 0.00237 grams. /llm
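The arithmetic in the quoted answer checks out; a quick sketch (the international avoirdupois pound is defined as exactly 0.45359237 kg):

```python
# One pound in grams, per the official definition.
GRAMS_PER_POUND = 453.59237

feathers_g = 453.59          # the riddle's rounded figure
bricks_g = GRAMS_PER_POUND   # one pound of bricks

diff = bricks_g - feathers_g
print(f"{diff:.5f} g")       # 0.00237 g, as the model claimed
```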
The answer is correct and the reasoning is sound. Do I understand that the machine is a pattern-following machine? Yes! Is there an argument to be made that humans are also that? Probably. Chomsky himself argued in favor of a universal grammar, after all.
I’m steelmanning this a bit, but the point is that LLMs are capable of doing some things which are indistinguishable from human reasoning in terms of results. Does the process matter in all cases?
1 reply →
Surely it just reasoned that you made a typo and "autocorrected" your riddle. Isn't this what a human would do? Though to be fair, a human would ask you again to make sure they heard you correctly. But it would be kind of annoying if you had to verify every typo when using an LLM.
Tons of people fall for this too. Are they not reasoning? LLMs can also be bad reasoning machines.
I don't have much use for a bad reasoning machine.
I could retort with another gotcha, but perhaps we can do better than that?
An attempt: they are bad reasoning machines that are already useful in a few domains, and they're improving faster than evolutionary speeds. So even if they're not useful today in a domain relevant to you, there's a significant possibility they might be in a few months. AlphaEvolve would have been sci-fi a decade ago.
"It's like if a squirrel started playing chess and instead of "holy shit this squirrel can play chess!" most people responded with "But his elo rating sucks""
I can think of tons of uses for a bad reasoning machine as long as it’s cheap enough.
4 replies →
But are you aware of the weight comparison of a gallon of water vs a gallon of butane ?
No, I'm not. Isn't a gallon a measure of volume? It's a US unit.
https://www.reddit.com/r/dadjokes/comments/flr7tc/which_weig...
20 feathers?
Yes, Claude 4 Sonnet just said they both weigh 20 pounds. Update: and so did Gemini 2.5 Flash. And MS Copilot in "Think deeper" mode.