Comment by kristianp

12 hours ago

Does that mean they've managed to post-train the thinking steps required to get these types of questions right?

IMO, it’s just a small-scale example of “training to the test”. “Count the ‘r’s in strawberry” became such a popular test that it made the news whenever a powerful model, advertised as the smartest model ever, couldn’t answer such a simple question correctly.

Treating this as an indicator of improved intelligence seems like a mistake (or wishful thinking).