Comment by conradkay
19 hours ago
Was 4.7 distilled from Mythos (which got 77.8%)? Interesting that Mythos got 82% on Terminal-Bench 2.0 compared to 82.7% for GPT-5.5.
Also notice that they state, only for SWE-Bench Pro: "*Anthropic reported signs of memorization on a subset of problems"