Comment by kikoreis
20 days ago
> Without any context provided, the state-of-the-art model, GPT-5.1 (High), is only able to solve less than 1% of tasks. This starkly demonstrates that the data is contamination-free, as the model is almost entirely incapable of solving the tasks without learning from the context.
> [...]
> [With context provided,] on average, models solve only 17.2% of tasks. Even the best-performing model, GPT-5.1 (High), achieves just 23.7%.