Comment by pornel

6 days ago

There's a difficult balance between letting the model simply memorize inputs and forcing it to figure out generalisations.

When a model is "lossy" and can't reproduce the data by copying, it's forced to come up with rules to synthesise the answers instead, and this is usually the "intelligent" behavior we want. It should be forced to learn how multiplication works instead of storing every combination of numbers as a fact.
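The multiplication point can be made concrete with a toy sketch (my own illustration, not anything from a specific model): a memorized lookup table grows with the input range and fails outside it, while the "rule" is a constant-size program that generalises.

```python
# Memorization: store every product of two numbers below 100 as a fact.
table = {(a, b): a * b for a in range(100) for b in range(100)}

# Generalisation: a constant-size rule that works for any inputs.
def rule(a, b):
    return a * b

print(len(table))           # 10000 stored facts, and growing with the range
print(rule(123, 456))       # 56088 -- works far outside the memorized range
print((123, 456) in table)  # False -- the table simply has no answer here
```

A lossy model can't afford the 10,000-entry table, so it's pushed toward the short program instead.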

Compression is related to intelligence: https://en.wikipedia.org/wiki/Kolmogorov_complexity
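Kolmogorov complexity itself isn't computable, but an off-the-shelf compressor gives a rough proxy for the idea: data with an underlying rule compresses to something close to the size of that rule, while structureless data doesn't compress at all.

```python
import os
import zlib

patterned = b"ab" * 500        # 1000 bytes with a trivial generating rule
random_ish = os.urandom(1000)  # 1000 bytes with (almost surely) no structure

# The regular string shrinks to roughly the size of its "program";
# the random bytes stay near their original length.
print(len(zlib.compress(patterned)))   # tiny
print(len(zlib.compress(random_ish)))  # ~1000
```

In that sense a model that generalises is one that has found the short description.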

You're not answering the question. Grok 4 also performs better on the semi-private evaluation sets for ARC-AGI-1 and ARC-AGI-2. It's across-the-board better.

  • If these things are truly exhibiting general reasoning, why do the same models do significantly worse on ARC-AGI-2, which is practically identical to ARC-AGI-1?

    • It's not identical. ARC-AGI-2 is more difficult - both for AI and for humans. In ARC-AGI-1 you kept track of one (or maybe two) kinds of transformations or patterns. In ARC-AGI-2 you are dealing with at least three, and the transformations interact with one another in more complex ways.

      Reasoning isn't an on-off switch. It's a hill that needs climbing. The models are getting better at complex and novel tasks.

      2 replies →