Comment by gpt5
7 hours ago
ARC-AGI isn't perfect, but it helps demonstrate the gap. I'm sure all companies optimize their models for this benchmark, given its dominance.
What about other benchmarks? Benchmarks whose contents are freely available have become useless for evaluating models.