Comment by abeppu
7 days ago
I think you're basically saying that ARC-AGI doesn't achieve a goal that _it didn't set_. The point of ARC-AGI is not to benchmark LLMs specifically. The point is to measure fluid intelligence in a way which supports comparisons between models and between models and humans. It's not the obligation of the test to be tailored to the form of model that's most popular now.
Right, that's exactly what I'm saying.
>The point is to measure fluid intelligence in a way which supports comparisons between models and between models and humans. It's not the obligation of the test to be tailored to the form of model that's most popular now.
The problem is that the test may not give an accurate comparison, because it is problematic when used to assess LLMs, and LLMs are the kind of model people are most interested in assessing for general capabilities.