Comment by 827a

2 hours ago

GPT-5.5-Cyber has already at least matched, if not surpassed, Mythos's capability in cyber tasks. The only reason they're holding back is that once it's out, everyone would realize that its capabilities were a step change in March but are not anymore, yet it costs significantly more and is much slower.

4 comments

john_strinlai 2 hours ago

how did you go about assessing this?

jansan 2 hours ago

So you believe one marketing department more than the other?

NitpickLawyer 2 hours ago
The Brits have a step-based benchmark that they use for this: https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...
They seem pretty close, in both average and "best run" scores. And, in a highly verifiable domain, "best run" or pass@n is what you're looking for.
- aesthesia 1 hour ago
  
  Worth looking at the follow-up post that evaluates the current version of Mythos, which solves one of the main tasks that GPT-5.5-Cyber does not: https://www.aisi.gov.uk/blog/how-fast-is-autonomous-ai-cyber...