Comment by kikoreis

20 days ago

> Without any context provided, the state-of-the-art model, GPT-5.1 (High), is only able to solve less than 1% of tasks. This starkly demonstrates that the data is contamination-free, as the model is almost entirely incapable of solving the tasks without learning from the context.

[...]

> [With context provided,] on average, models solve only 17.2% of tasks. Even the best-performing model, GPT-5.1 (High), achieves just 23.7%.
