Comment by mtlynch
8 days ago
> What is not mentioned is that Claude Code also found one thousand false positive bugs, which developers spent three months to rule out.
Source? I haven't seen this anywhere.
In my experience, the false positive rate on vulnerabilities with Claude Opus 4.6 is well below 20%.
On the issue of AI-submitted patches being more of a burden than a boon: many projects have decided to stop accepting AI-generated contributions:
https://blog.devgenius.io/open-source-projects-are-now-banni...
These are just a few examples; a Google search will turn up more.
According to Willy Tarreau[0] and Greg Kroah-Hartman[1], this trend has recently reversed significantly, at least judging from the reports they've been seeing on the Linux kernel. Before that broader shift, the creator of curl, Daniel Stenberg, also found the reports generated by more sophisticated LLM-powered vuln research tools useful[2], and the person who actually ran those tools found "They have low false positive rates."[3]
Additionally, in the talk by the person who found the vuln discussed in TFA, there was no mention of what the false positive rate was, no claim that he had to sift through mostly-slop reports, and no indication he was doing so only out of courtesy. He also said he found several hundred, iirc, not "thousands." All he said was:
"I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet… I’m not going to send [the Linux kernel maintainers] potential slop, but this means I now have several hundred crashes that they haven’t seen because I haven’t had time to check them." (TFA)
He quite evidently didn't have to sift through thousands, or spend months, to find this one, either.
[0]: https://lwn.net/Articles/1065620/
[1]: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...
[2]: https://simonwillison.net/2025/Oct/2/curl/p
[3]: https://joshua.hu/llm-engineer-review-sast-security-ai-tools...
No, they haven't. Read the AI slop you posted carefully.
It's a policy update that lets maintainers ignore low-effort "contributions" from untrusted people, in order to reduce review workload.
An Eternal September problem, kind of.
Didn't you just restate what the parent claimed?
Same. Codex and Claude Code on the latest models are really good at finding bugs, and really good at fixing them in my experience. Much better than 50% in the latter case and much faster than I am.
Source: """AI is bad"""
In my experience, the issue has been likelihood of exploitation or issue severity. Claude gets it wrong almost all the time.
A threat model matters, and some risks are accepted. Good luck convincing an LLM of that fact.
In TFA, those aren't false positives; they're results he hasn't yet inspected.
I wrote a longer reply here: https://news.ycombinator.com/item?id=47638062
> Those aren't false positives; they're results he hasn't yet inspected.
It's not an XOR.
Some of them certainly are…
The comment said "Claude Code also found one thousand false positive bugs, which developers spent three months to rule out.".
Please explain how a bug can be both unvalidated and also have undergone a three-month process to rule it out as a false positive.