Comment by kristianp
12 hours ago
Does that mean they've managed to post-train the thinking steps required to get these types of questions correct?
IMO, it’s just a small-scale example of “training to the test”: “count the ‘r’s in strawberry” became such a popular test that it made the news whenever a powerful model, advertised as the smartest ever, couldn’t answer such a simple question correctly.
Treating this as an indicator of improved intelligence seems like a mistake (or wishful thinking).
That's my best guess, yeah.