Comment by therealpygon

7 hours ago

IMO, it’s just a small-scale example of “training to the test.” “Count the ‘r’s in strawberry” became such a popular test that it made the news whenever a powerful model, advertised as the smartest ever, couldn’t answer such a simple question correctly.

Treating this as an indicator of improved intelligence seems like a mistake (or wishful thinking).