Comment by rwmj

13 hours ago

With the exploits, you can try them and they either work or they don't. An attacker is not especially interested in analysing why the successful ones work.

With the CVE reports, some poor maintainer has to go through and triage them, which is far more work, and very asymmetrical: the reporters can generate their spam reports in volume, while each one requires detailed analysis.

There have been several notable posts where maintainers found there was no bug at all, or the example code did not even call code from their project and had merely demonstrated that running a Python script can do things on your computer. Entirely AI-generated issue reports and examples, wasting maintainer time.

  • That's because the user of the tool didn't go to the trouble of setting up the env properly (as the author of the blog did). So what they got was a "story about a bug", but without verification.

    The proper way to use these tools (as with other verifiable tasks such as math or coding) is to give them a feedback loop and an easily verifiable success criterion. In security exploitation you either capture the flag or you don't, and that's very easy (and cheap) to verify. So you can leave these things to bang their tokens against a wall and only look at their output once they capture the flag, or once they write something to a verifiable location (e.g. echo "pwned" > /root/.flag). A minimal sketch of such a verification loop follows this thread.

  • I've had multiple reports with elaborate proofs of concept that boil down to things like calling dlopen() on a path to a malicious library and saying dlopen has a security vulnerability.

  • My hunch is that the dumbasses submitting those reports weren't actually using coding agent harnesses at all - they were pasting blocks of code into ChatGPT or other non-agent-harness tools, asking for vulnerabilities, and reporting whatever came back.

    An "agent harness" here is software that directly writes and executes code to test that it works. A vulnerability reported by such an agent harness with included proof-of-concept code that has been demonstrated to work is a different thing from an "exploit" that was reported by having a long context model spit out a bunch of random ideas based purely on reading the code.

    I'm confident you can still find dumbasses who manage to mess up using coding agent harnesses and create invalid, time-wasting bug reports. Dumbasses are gonna dumbass.
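
To make the feedback-loop point above concrete, here is a minimal sketch (not from the thread itself) of the kind of verification harness being described: run each candidate exploit against a disposable copy of the target and only surface the ones that actually plant the agreed flag. The `candidates/` directory, the `target-image` Docker image, and the flag contents are all hypothetical placeholders.

```python
#!/usr/bin/env python3
"""CTF-style verifier sketch: only surface exploit candidates that provably work.

Hypothetical setup: each candidate is a shell script in ./candidates/, run inside a
throwaway Docker container built from "target-image" (a placeholder for the target
under test). Success means the candidate managed to write the flag to /root/.flag.
"""
import subprocess
from pathlib import Path

CANDIDATE_DIR = Path("candidates")    # hypothetical: one *.sh exploit attempt per file
TARGET_IMAGE = "target-image:latest"  # hypothetical: image containing the target
FLAG_PATH = "/root/.flag"             # the verifiable success criterion
TIMEOUT = 120                         # seconds per attempt


def attempt_succeeds(candidate: Path) -> bool:
    """Run one candidate in a fresh container and check whether the flag was planted."""
    # Run the exploit, then print the flag file if (and only if) it now exists.
    shell_cmd = f"sh /work/{candidate.name}; cat {FLAG_PATH} 2>/dev/null"
    try:
        proc = subprocess.run(
            ["docker", "run", "--rm", "--network", "none",
             "-v", f"{CANDIDATE_DIR.resolve()}:/work:ro",
             TARGET_IMAGE, "sh", "-c", shell_cmd],
            capture_output=True, text=True, timeout=TIMEOUT,
        )
    except subprocess.TimeoutExpired:
        return False
    return "pwned" in proc.stdout  # the flag contents are the only signal we trust


def main() -> None:
    for candidate in sorted(CANDIDATE_DIR.glob("*.sh")):
        if attempt_succeeds(candidate):
            print(f"VERIFIED: {candidate.name} captured the flag - worth a human look")
        else:
            print(f"ignored:  {candidate.name} (no flag, no report)")


if __name__ == "__main__":
    main()
```

The only point of the sketch is that "did it capture the flag" is machine-checkable, so the expensive human attention happens after verification, not before.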

It can't be too long before Claude Code is capable of replication + triage + suggested fixes...

  • BTW regarding "suggested fixes", an interesting attack would be to report a bug along with a prompt injection that causes Claude to suggest inserting a vulnerability into the codebase in question. So it's important to review bug-report-originated Claude suggestions extra carefully. (And watch for prompt injection attacks generally.)

    Another thought is that reproducible builds become more valuable than ever, because it actually becomes feasible for lots and lots of devs to scan the entire codebase for vulns using an LLM and then verify reproducibility (see the sketch at the end of this thread).

  • Would you ever blindly trust it?

    • No. I would probably do something like: Have Claude Code replicate + triage everything. If a report gets triaged as "won't fix", send an email to the reporter explaining what Claude found and why it was marked as "won't fix". Tell the reporter they still have a chance at the bounty if they think Claude made a mistake, but they have to pay a $10 review fee to have a human take a look. (Or a $1 LLM token fee for Claude to take another look, in case of simple confabulation.)

      Note I haven't actually tried Claude Code (not coding due to chronic illness), so I'm mostly extrapolating based on HN discussion etc.
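
On the reproducible-builds point above, here is a minimal sketch of the simplest local check: build the same source tree twice in clean temp directories and compare artifact digests. Real verification compares against the published binaries, but the hash-comparison mechanism is the same. The `my-project` checkout, `make all` build command, and `dist/` output directory are hypothetical; adjust them for a real project.

```python
#!/usr/bin/env python3
"""Reproducible-build spot check sketch: build twice, then diff artifact digests.

Hypothetical assumptions: the project builds with `make all` and drops artifacts in
./dist/. If two independent builds produce identical hashes, an audit of the source
(LLM-assisted or not) also tells you something about the binaries actually shipped.
"""
import hashlib
import shutil
import subprocess
import tempfile
from pathlib import Path

SOURCE_DIR = Path("my-project")  # hypothetical source checkout
BUILD_CMD = ["make", "all"]      # hypothetical build command
ARTIFACT_DIR = "dist"            # hypothetical output directory inside the tree


def build_and_hash() -> dict[str, str]:
    """Copy the source to a fresh temp dir, build it, and hash every artifact."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp) / "src"
        shutil.copytree(SOURCE_DIR, workdir)
        # Pin the environment for determinism (expand as the real build requires).
        subprocess.run(BUILD_CMD, cwd=workdir, check=True,
                       env={"SOURCE_DATE_EPOCH": "0", "PATH": "/usr/bin:/bin"})
        hashes = {}
        for artifact in sorted((workdir / ARTIFACT_DIR).rglob("*")):
            if artifact.is_file():
                rel = str(artifact.relative_to(workdir))
                hashes[rel] = hashlib.sha256(artifact.read_bytes()).hexdigest()
        return hashes


def main() -> None:
    first, second = build_and_hash(), build_and_hash()
    if first == second:
        print(f"reproducible: {len(first)} artifacts matched")
    else:
        mismatched = {k for k in first.keys() | second.keys()
                      if first.get(k) != second.get(k)}
        print("NOT reproducible, differing artifacts:")
        for name in sorted(mismatched):
            print(f"  {name}")


if __name__ == "__main__":
    main()
```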