Comment by azakai
5 hours ago
Plenty of reasons to be skeptical, but we have also known since at least 2024 that LLMs can find security vulnerabilities:
https://projectzero.google/2024/10/from-naptime-to-big-sleep...
Some follow-up findings from 2025 are reported in point 1 here:
https://blog.google/innovation-and-ai/technology/safety-secu...
So what Anthropic are reporting here is not unprecedented. The main thing they are claiming is an improvement in the number of findings. I don't see a reason to be overly skeptical.
I'm not sure the volume here is particularly different from past examples. I think the main difference is that there was no custom harness, tooling, or fine-tuning. It's just the out-of-the-box capabilities of a generally available model and a generic agent.