Comment by 827a
2 hours ago
GPT-5.5-Cyber has already at least matched, if not surpassed, Mythos on cyber tasks. The only reason they're holding back is that once it's out, everyone would realize its capabilities were a step change in March but no longer are, and yet it costs significantly more and is much slower.
How did you go about assessing this?
So you believe one marketing department more than the other?
The Brits have a step-based benchmark that they use for this: https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
They seem pretty close, in both average and "best run" scores. And, in a highly verifiable domain, "best run" or pass@n is what you're looking for.
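For anyone unfamiliar with the terms, here is a rough sketch of how "average", "best run", and pass@n differ. The run scores and success counts below are made-up illustrative numbers, not figures from the AISI posts, and the function names are my own. The pass@k formula is the standard unbiased estimator used in code-generation evals, which may or may not be what AISI computes.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k sampled runs succeeds), given c successes in n runs.

    Standard unbiased estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k runs must include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical steps completed per run on a multi-step task (illustrative only).
hypothetical_steps = [12, 18, 9, 22, 15]

average_score = mean(hypothetical_steps)  # typical single-attempt performance
best_run_score = max(hypothetical_steps)  # what you get if you can verify and keep the best

# Hypothetical: 2 full solves out of 10 attempts.
single_try = pass_at_k(n=10, c=2, k=1)
five_tries = pass_at_k(n=10, c=2, k=5)

print(average_score, best_run_score, single_try, round(five_tries, 3))
```

The point being that when success is cheap to check, an attacker can just rerun until something works, so the gap between the single-try number and the multi-try number matters more than the average.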
Worth looking at the follow-up post, which evaluates the current version of Mythos. It solves one of the main tasks that GPT-5.5-Cyber does not. https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber...