Comment by _tk_
4 hours ago
The system card unfortunately only refers to this blog post [0] and doesn't go into any more detail. In the blog post, Anthropic researchers claim: "So far, we've found and validated more than 500 high-severity vulnerabilities".
The three examples given include two buffer overflows, which could very well be cherry-picked. It's hard to evaluate if these vulns are actually "hard to find". I'd be interested to see the full list of CVEs and CVSS ratings to get a real sense of how good these findings are.
Given the bogus claims [1] around GenAI and security, we should be very skeptical of news like this.
[0] https://red.anthropic.com/2026/zero-days/
[1] https://doublepulsar.com/cyberslop-meet-the-new-threat-actor...
I know some of the people involved here, and the general chatter around LLM-guided vulnerability discovery, and I am not at all skeptical about this.
[flagged]
It does if the person making the statement has a track record and proven expertise on the topic, and in this case it may actually mean something to other people.
Nobody is right about everything, but tptacek's takes on software security are a good place to start.
> that means nothing to anybody else
Someone else here! Ptacek saying anything about security means a lot to this nobody.
To the point that I'm now going to take this seriously where before I couldn't see through the fluff.
How have you been here 12 years and not noticed where and how often the username tptacek comes up?
It might mean nothing to you, but tptacek's words mean at least something to many of us here.
Also, he's a friend of someone I know & trust IRL. But then again, who am I to you but yet another anon on a web forum?
The Ghostscript one is interesting in terms of specific-vs-general effectiveness:
---
> Claude initially went down several dead ends when searching for a vulnerability—both attempting to fuzz the code, and, after this failed, attempting manual analysis. Neither of these methods yielded any significant findings.
...
> "The commit shows it's adding stack bounds checking - this suggests there was a vulnerability before this check was added. … If this commit adds bounds checking, then the code before this commit was vulnerable … So to trigger the vulnerability, I would need to test against a version of the code before this fix was applied."
...
> "Let me check if maybe the checks are incomplete or there's another code path. Let me look at the other caller in gdevpsfx.c … Aha! This is very interesting! In gdevpsfx.c, the call to gs_type1_blend at line 292 does NOT have the bounds checking that was added in gstype1.c."
---
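In code terms, the pattern those excerpts describe looks roughly like the sketch below. Everything in it is invented for illustration: the names, the stack layout, and the blend math are stand-ins, not the actual gs_type1_blend or its Ghostscript callers.

    /* Invented, heavily simplified sketch of the pattern the excerpts
     * describe; not actual Ghostscript code. */

    #define OSTACK_SIZE 48              /* hypothetical operand-stack capacity */

    typedef struct {
        double ostack[OSTACK_SIZE];
        int    sp;                      /* number of live entries on the stack */
    } cs_state;

    /* Shared primitive: folds n_args operands down to n_results. It trusts
     * its counts, so bogus values make it index ostack[] out of bounds. */
    static void blend_op(cs_state *st, int n_results, int n_args)
    {
        int base = st->sp - n_args;     /* goes negative if n_args > sp */

        for (int i = 0; i < n_results; i++) {
            double acc = 0;
            for (int j = i; j < n_args; j += n_results)
                acc += st->ostack[base + j];
            st->ostack[base + i] = acc; /* OOB read/write with bad counts */
        }
        st->sp = base + n_results;
    }

    /* The caller the commit fixed (the gstype1.c side): validate the
     * attacker-influenced counts before letting blend_op touch the stack. */
    int checked_blend(cs_state *st, int n_results, int n_args)
    {
        if (n_results < 0 || n_args < n_results || n_args > st->sp)
            return -1;                  /* reject instead of corrupting memory */
        blend_op(st, n_results, n_args);
        return 0;
    }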
Its attempt to analyze the code failed, but when it saw a concrete example of "in the history, someone added bounds checking," it did an "I wonder if they did it everywhere else for this func call" pass.
So after it zeroed in on that function via the commit history, it found something that its initial open-ended fuzzing and code-analysis search didn't.
As someone who still reads the code that Claude writes, this sort of "big picture miss, small picture excellence" is not very surprising or new. It's interesting to think about what it would take to do that precise digging across a whole codebase, especially if it needs some sort of modularization/summarization of context vs trying to digest tens of millions of lines at once.
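Continuing the invented sketch above, the bug class Claude found is just a second call site that kept the old, unguarded path:

    /* The gdevpsfx.c side of the story, reusing the sketch's types: another
     * code path that never received the new guard, so attacker-controlled
     * counts still reach blend_op unchecked. */
    void unchecked_blend(cs_state *st, int n_results, int n_args)
    {
        blend_op(st, n_results, n_args);   /* no bounds check on this path */
    }

The "did every caller get the check?" pass is cheap once a fix commit points at the function; it's running that pass proactively across a whole codebase that runs into the context problem above.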
See it as one signal among many, not something to take at face value.
After all, they need time to fix the CVEs.
And it doesn't matter to you as long as your investment in this is just 20 or 100 bucks per month anyway.
> It's hard to evaluate if these vulns are actually "hard to find".
Can we stop doing that?
I know it's not the same, but it sounds like "We don't know if that job the woman supposedly finished was all that hard," implying that if a woman did something, it surely must have been easy.
If you know it's easy, say that it was easy and why. Don't use your lack of knowledge or competence to create empty critique founded solely on doubt.