Comment by jonplackett

5 months ago

The fact that all the models I’ve tried except the thinking ones get it wrong suggests not.

They get caught up in the idea that adding milk first cools it fastest and can’t escape from that

First page of Google search results from 7 years ago: https://www.quora.com/You-have-2-cups-of-coffee-50-degrees-w...

People making up their own benchmarks for these things has confirmed one thing for me: The bias that people think they mostly have original thoughts is extremely strong. I find if I have a “good” idea someone has probably already thought of it as well and maybe even written about it. About 0.01% of the time do I have an idea that one may consider novel and even that’s probably my own bias and overstated. This example just confirms that these models don’t really seem to reason and have a really hard time doing the basic generalization they can with fewer examples.