
Comment by Kim_Bruning

7 days ago

"what is 16929481231+22312333222?" is an easy way to test this claim. Pick large enough numbers and there's no way every sum of that size could fit into the training data (you don't need to stick to +, either; it's just the simplest operation that works).
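For reference, the ground truth for that probe question is a one-liner (a trivial sketch; any calculator works, since Python ints are arbitrary-precision):

```python
# Exact big-integer arithmetic, natively supported by Python.
a = 16_929_481_231
b = 22_312_333_222
print(a + b)  # 39241814453
```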

But ask that same question of a human with no specific math training and the odds they get the right answer are exceedingly low.

We spend hours and hours of reinforced practice, over years, to produce humans who can do it.

  • Indeed! So, where are you going with this?

    For my contribution to the conversation: earlier/cheaper models can't do it either; they make mistakes and need a calculator, a Jupyter kernel, or what have you. 'Medium' models will put the numbers underneath each other and do it 'properly' in a table, checking themselves afterwards. Claude Opus 4.6 (the current Rolls-Royce) just says the answer in one go sometimes (it's a monster). But all of them end up spending many seconds and thousands of tokens on a task that takes a calculator or an ALU a fraction of a second.
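To put a rough number on the hardware side of that comparison, here's a sketch that times native integer addition per operation (interpreter overhead dominates here; a raw ALU add is well under a nanosecond, and the LLM-side cost is whatever your provider reports):

```python
import timeit

# Time many additions of the two probe numbers and report the
# per-operation cost. The setup keeps the operands in variables
# so the compiler can't constant-fold the expression away.
per_add = timeit.timeit(
    "a + b",
    setup="a = 16929481231; b = 22312333222",
    number=1_000_000,
) / 1_000_000
print(f"~{per_add * 1e9:.0f} ns per addition")
```

Even with CPython's overhead this lands in the tens of nanoseconds, versus seconds and thousands of tokens for the model.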