Comment by bcoates
2 days ago
The Towers of Hanoi one is kind of weird: the prompt asks for a complete move-by-move solution, and the 15- or 20-disk version (where reasoning models fail) means the result is unreasonably long and very repetitive. Likely as not it's just running into some training or sampler quirk discouraging the model from dumping huge amounts of low-entropy text.
I don't have a Claude in front of me -- if you just give it the algorithm to produce the answer and ask it to give you the huge output for n=20, will it even do that?
If I have to give it the algorithm as well as the problem, we’re no longer even pretending to be in the AGI world. If it falls down interpreting an algorithm, it is worse than even a Python interpreter.
Towers of Hanoi is a well-known toy problem. The algorithm is definitely in any LLM’s training data. So it doesn’t even need to come up with a new algorithm.
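For reference, the textbook recursive solution is only a few lines; a minimal sketch in Python (peg names and the generator interface are my own choices, not from any particular prompt):

```python
def hanoi(n, src="A", aux="B", dst="C"):
    """Yield the 2**n - 1 moves that solve an n-disk Towers of Hanoi."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, dst, aux)  # park the top n-1 disks on the spare peg
    yield (src, dst)                        # move the largest disk to the target peg
    yield from hanoi(n - 1, aux, src, dst)  # restack the n-1 disks on top of it

# e.g. sum(1 for _ in hanoi(20)) == 1_048_575 moves
```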
There may be some technical reason it’s failing but the more fundamental reason is that an autoregressive statistical token generator isn’t suited to solving problems with symbolic solutions.
I'm just saying that ~10MB of short, repetitive text lines might be out of scope as a response the LLM driver is willing to give at all, regardless of how it's derived.
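Rough back-of-the-envelope for that figure (the ~10 bytes per move line is my own assumption about the output format):

```python
moves = 2**20 - 1           # 1,048,575 moves for the 20-disk case
bytes_per_line = 10         # e.g. "A -> C\n" plus a short move index; rough guess
print(moves * bytes_per_line / 1e6)  # ≈ 10.5 MB of near-identical lines
```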
In the example someone else gave, o3 broke down after 95 lines of text. That’s far short of 10 MB.