Comment by Topfi

6 hours ago

The official release by Anthropic is very light on concrete information [0], only contains a select and very brief number of examples and lacks history, context, etc. making it very hard to gleam any reliably information from this. I hope they'll release a proper report on this experiment, as it stands it is impossible to say how much of this are actual, tangible flaws versus the unfortunately ever growing misguided bug reports and pull requests many larger FOSS projects are suffering from at an alarming rate.

Personally, while I get that 500 sounds more impressive to investors and the market, I'd be far more impressed in a detailed, reviewed paper that showcases five to ten concrete examples, detailed with the full process and response by the team that is behind the potentially affected code.

It is far to early for me to make any definitive statement, but the most early testing does not indicate any major jump between Opus 4.5 and Opus 4.6 that would warrant such an improvement, but I'd love nothing more than to be proven wrong on this front and will of course continue testing.

[0] https://red.anthropic.com/2026/zero-days/