Comment by acegod

7 days ago

Right, that's exactly what I'm saying.

>The point is to measure fluid intelligence in a way which supports comparisons between models and between models and humans. It's not the obligation of the test to be tailored to the form of model that's most popular now.

The problem is that the test may not give an accurate comparison, because it is problematic when used to assess LLMs, which are the kind of model people are most interested in assessing for general capabilities.