Comment by rdtsc

1 year ago

> I can trick people who have never heard of the trick with the 7 wives and 7 bags and so on. That doesn't mean they didn't understand

They could fail because they didn't understand the language, didn't have a good enough memory to hold all the steps, or couldn't reason through it. We could pose more questions to probe which of those reasons is more plausible.

The trick with the 7 wives and 7 bags and so on is that no long reasoning is required. You just have to notice the one part of the question that invalidates the rest, and avoid shortcutting into arithmetic just because it looks like an arithmetic problem. There are dozens of trick questions like this, and they don't test understanding; they exploit your tendency to predict intent.

But sure, we could ask more questions, and that's what we should do. If we do that with LLMs, we quickly see that once we leave the basin of the memorized answer by rephrasing the problem, the model solves it. We would also see that we can ask the model billions of questions, and it understands us just fine.