Comment by Tenobrus
10 hours ago
o1 has a METR time horizon of around 40 minutes, opus 4.7 has an implied horizon of 18 hours based on its ECI score. this study is on a model that's several generations behind wrt the kind of tasks it can complete. it would be shocking if this number were anywhere near as low with GPT 5.5, to the point it seems nearly totally irrelevant to talk about these results
No comments yet
Contribute on Hacker News ↗