Comment by snowwrestler

6 days ago

When kids learn multiplication, they learn it on paper, not just in their heads. LLMs don’t have access to paper.

“Do long arithmetic entirely in your mind” is not a test most humans can pass. Maybe a few savants. This makes me suspect it is not a reliable test of reasoning.

Humans also get a training run every night. As we sleep, our brains integrate the day’s experiences into our existing minds, so we can learn from one day to the next. Kids definitely do not learn long multiplication in just one day. LLMs don’t work like this; they get one training run, and that is when they have to learn everything all at once.

LLMs for sure cannot learn and reason the same way humans do. Does that mean they cannot reason at all? Harder question IMO. You’re right that Python did the math, but the LLM wrote the Python. Maybe that is like their version of “doing it on paper.”
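To make that concrete, here is a hypothetical sketch of the kind of “paper” a tool-using LLM might write for itself (the operands are made up for illustration): instead of doing the long multiplication in its head, it emits a few lines of Python and reads the exact result back.

    # Hypothetical illustration; the numbers are invented, not from any real prompt.
    # Rather than multiplying "in its head", the model writes this and runs it:
    a = 847_293_651_002
    b = 564_118_377_905
    print(a * b)  # Python carries out the long multiplication exactly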

They do have access to paper: the model output, which is what reasoning models use to keep track of their chain of thought. When I asked Copilot what kinds of external resources it can use, it also claimed to have some scratchpad memory, which may or may not be true; I did not try to verify that.

Also, I am not asking it to learn this in one day; you can dump everything a child would hear and read during primary school into the context. You could even do it interactively, in case the model has questions.