Comment by kristianp

4 months ago

Does that mean they've managed to post-train the thinking steps required to get these types of questions right?

3 comments

kristianp

IMO, it’s just a small-scale example of “training to the test”: “count the ‘r’s in strawberry” became such a popular test that it would make the news whenever a powerful model, advertised as the smartest model ever, couldn’t answer such a simple question correctly.

Treating this as an indicator of improved intelligence seems like a mistake (or wishful thinking).

jononor 4 months ago

If done at scale, they are kinda crowdsourcing the test set from the entire internet, plus the personal and business world. It will get harder and harder to pinpoint weaknesses, at least for the general public. It probably has little to do with intelligence (at least fluid intelligence as defined by Chollet et al.), but I guess it is a sound tactic if the strategy is "fake it till you make it". And we might be surprised at how far that can go...

simonw 4 months ago

That's my best guess, yeah.