Comment by SchemaLoad

20 days ago

There's been several notable posts where maintainers found there was no bug at all, or the example code did not even call code from their project and had just found running a python script can do things on your computer. Entirely AI generated Issue reports and examples wasting maintainer time.

5 comments

SchemaLoad

wat10000 20 days ago

I've had multiple reports with elaborate proofs of concept that boil down to things like calling dlopen() on a path to a malicious library and saying dlopen has a security vulnerability.

NitpickLawyer 20 days ago

That's because the user of the tool didn't go through the troubles to setup the env properly (as the author of the blog did). So what they got was a "story about a bug", but without verification.

The proper way to use these tools (like in other verifiable tasks such as math or coding) is to give them a feedback loop and an easily verifiable success criteria. In security exploitation you either capture the flag or not. It's very easy (and cheap) to verify. So you can leave these things to bang their tokens against a wall, and only look at their output once they capture the flag. Or they output something somewhere verifiable (e.g. echo "pwned" > /root/.flag)

GrinningFool 20 days ago

Now all that's left is to get every person who uses them to generate bug reports to just follow these practices.

simonw 20 days ago

My hunch is that the dumbasses submitting those reports were't actually using coding agent harnesses at all - they were pasting blocks of code into ChatGPT or other non-agent-harness tools and asking for vulnerabilities and reporting what came back.

An "agent harness" here is software that directly writes and executes code to test that it works. A vulnerability reported by such an agent harness with included proof-of-concept code that has been demonstrated to work is a different thing from an "exploit" that was reported by having a long context model spit out a bunch of random ideas based purely on reading the code.

I'm confident you can still find dumbasses who can mess up at using coding agent harnesses and create invalid, time wasting bug reports. Dumbasses are gonna dumbass.

staticassertion 19 days ago

I strongly suspect the same thing - that they weren't using agents at all in the reports we've seen, let alone agents with instructions on how to verify a viable attack, a threat model, etc.