Comment by mtlynch
8 days ago
> What is not mentioned is that Claude Code also found one thousand false positive bugs, which developers spent three months to rule out.
Source? I haven't seen this anywhere.
In my experience, the false positive rate on vulnerabilities with Claude Opus 4.6 is well below 20%.
On the issue of AI-submitted patches being more of a burden than a boon: many projects have decided to stop accepting AI-generated contributions:
https://blog.devgenius.io/open-source-projects-are-now-banni...
These are just a few examples; a Google search will turn up more.
According to Willy Tarreau[0] and Greg Kroah-Hartman[1], this trend has recently reversed significantly, at least judging from the reports they've been seeing on the Linux kernel. Before that broader shift, the creator of curl, Daniel Stenberg, also found the reports generated by more sophisticated LLM-powered vuln research tools useful[2], and the person who actually ran those tools found "They have low false positive rates."[3]
Additionally, in the talk by the person who found the vuln discussed in TFA, there was no mention of what the false positive rate was, no claim that he had to sift through mostly-slop reports, and no indication he was doing so only out of courtesy. He also said he found several hundred, iirc, not "thousands." All he said was:
"I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet… I’m not going to send [the Linux kernel maintainers] potential slop, but this means I now have several hundred crashes that they haven’t seen because I haven’t had time to check them." (TFA)
He quite evidently didn't have to sift through thousands, or spend months, to find this one, either.
[0]: https://lwn.net/Articles/1065620/
[1]: https://www.theregister.com/2026/03/26/greg_kroahhartman_ai_...
[2]: https://simonwillison.net/2025/Oct/2/curl/p
[3]: https://joshua.hu/llm-engineer-review-sast-security-ai-tools...
No, they haven't. Read the AI slop you posted carefully.
It's a policy update that lets maintainers ignore low-effort "contributions" from untrusted people, in order to reduce review workload.
An Eternal September problem, kind of.
Didn't you just restate what the parent claimed?
Same. Codex and Claude Code on the latest models are really good at finding bugs, and really good at fixing them in my experience. Much better than 50% in the latter case and much faster than I am.
Source: """AI is bad"""
In my experience, the issue has been likelihood of exploitation or issue severity. Claude gets it wrong almost all the time.
A threat model matters, and some risks are accepted. Good luck convincing an LLM of that fact.
In TFA, those aren't false positives; they're results he hasn't yet inspected.
I wrote a longer reply here: https://news.ycombinator.com/item?id=47638062
> Those aren't false positives; they're results he hasn't yet inspected.
It's not an XOR.
Some of them certainly are…
The comment said "Claude Code also found one thousand false positive bugs, which developers spent three months to rule out.".
Please explain how a bug can be both unvalidated and also have undergone a three-month process to rule it out as a false positive.