Comment by protocolture

20 days ago

I genuinely dont know who to believe. The people who claim LLMs are writing excellent exploits. Or the people who claim that LLMs are sending useless bug reports. I dont feel like both can really be true.

40 comments

protocolture

simonw 20 days ago

Why can't they both be true?

The quality of output you see from any LLM system is filtered through the human who acts on those results.

A dumbass pasting LLM generated "reports" into an issue system doesn't disprove the efforts of a subject-matter expert who knows how to get good results from LLMs and has the necessary taste to only share the credible issues it helps them find.

protocolture 20 days ago
Theres no filtering mentioned in the OP article. It claims GPT only created working useful exploits. If it can do that, it could also submit those exploits as perfectly as bug reports?
- moyix 20 days ago
  
  There is filtering mentioned, it's just not done by a human:
  > I have written up the verification process I used for the experiments here, but the summary is: an exploit tends to involve building a capability to allow you to do something you shouldn’t be able to do. If, after running the exploit, you can do that thing, then you’ve won. For example, some of the experiments involved writing an exploit to spawn a shell from the Javascript process. To verify this the verification harness starts a listener on a particular local port, runs the Javascript interpreter and then pipes a command into it to run a command line utility that connects to that local port. As the Javascript interpreter has no ability to do any sort of network connections, or spawning of another process in normal execution, you know that if you receive the connect back then the exploit works as the shell that it started has run the command line utility you sent to it.
  It is more work to build such "perfect" verifiers, and they don't apply to every vulnerability type (how do you write a Python script to detect a logic bug in an arbitrary application?), but for bugs like these where the exploit goal is very clear (exec code or write arbitrary content to a file) they work extremely well.
- simonw 20 days ago
  
  The OP is the filtering expert.
anonymous908213 20 days ago
They can't both be true if we're talking about the premise of the article, which is the subject of the headline and expounded upon prominently in the body:
The Industrialisation of Intrusion By ‘industrialisation’ I mean that the ability of an organisation to complete a task will be limited by the number of tokens they can throw at that task. In order for a task to be ‘industrialised’ in this way it needs two things: An LLM-based agent must be able to search the solution space. It must have an environment in which to operate, appropriate tools, and not require human assistance. The ability to do true ‘search’, and cover more of the solution space as more tokens are spent also requires some baseline capability from the model to process information, react to it, and make sensible decisions that move the search forward. It looks like Opus 4.5 and GPT-5.2 possess this in my experiments. It will be interesting to see how they do against a much larger space, like v8 or Firefox. The agent must have some way to verify its solution. The verifier needs to be accurate, fast and again not involve a human.
"The results are contigent upon the human" and "this does the thing without a human involved" are incompatible. Given what we've seen from incompetent humans using the tools to spam bug bounty programs with absolute garbage, it seems the premise of the article is clearly factually incorrect. They cite their own experiment as evidence for not needing human expertise, but it is likely that their expertise was in fact involved in designing the experiment[1]. They also cite OpenAI's own claims as their other piece of evidence for this theory, which is worth about as much as a scrap of toilet paper given the extremely strong economic incentives OpenAI has to exaggerate the capabilities of their software.
[1] If their experiment even demonstrates what it purports to demonstrate. For anyone to give this article any credence, the exploit really needs to be independently verified that it is what they say it is and that it was achieved the way they say it was achieved.
- adw 20 days ago
  
  What this is saying is "you need an objective criterion you can use as a success metric" (aka a verifiable reward in RL terms). "Design of verifiers" is a specific form of domain expertise.
  This applies to exploits, but it applies _extremely_ generally.
  The increased interest in TLA+, Lean, etc comes from the same place; these are languages which are well suited to expressing deterministic success criteria, and it appears that (for a very wide range of problems across the whole of software) given a clear enough, verifiable enough objective, you can point the money cannon at it until the problem is solved.
  The economic consequences of that are going to be very interesting indeed.
- IanCal 20 days ago
  
  A few points:
  1. I think you have mixed up assistance and expertise. They talk about not needing a human in the loop for verification and to continue search but not about initial starts. Those are quite different. One well specified task can be attempted many times, and the skill sets are overlapping but not identical.
  2. The article is about where they may get to rather than just what they are capable of now.
  3. There’s no conflict between the idea that 10 parallel agents of the top models can mostly have one that successfully exploits a vulnerability - gated on an actual test that the exploit works - with feedback and iteration BUT random models pointed at arbitrary code without a good spec and without the ability to run code, and just run once, will generate lower quality results.
- GaggiX 20 days ago
  
  After setting the environment and the verifier you can spawn as many agents as you want until the conditions are met, this is only possible because they run without human assistance, that's the "industrialisation".
- simonw 20 days ago
  
  My expectation is that any organization that attempts this will need subject matter experts to both setup and run the swarm of exploit finding agents for them.

rwmj 20 days ago

With the exploits, you can try them and they either work or they don't. An attacker is not especially interested in analysing why the successful ones work.

With the CVE reports some poor maintainer has to go through and triage them, which is far more work, and very asymmetrical because the reporters can generate their spam reports in volume while each one requires detailed analysis.

SchemaLoad 20 days ago
There's been several notable posts where maintainers found there was no bug at all, or the example code did not even call code from their project and had just found running a python script can do things on your computer. Entirely AI generated Issue reports and examples wasting maintainer time.
- wat10000 19 days ago
  
  I've had multiple reports with elaborate proofs of concept that boil down to things like calling dlopen() on a path to a malicious library and saying dlopen has a security vulnerability.
- NitpickLawyer 19 days ago
  
  That's because the user of the tool didn't go through the troubles to setup the env properly (as the author of the blog did). So what they got was a "story about a bug", but without verification.
  The proper way to use these tools (like in other verifiable tasks such as math or coding) is to give them a feedback loop and an easily verifiable success criteria. In security exploitation you either capture the flag or not. It's very easy (and cheap) to verify. So you can leave these things to bang their tokens against a wall, and only look at their output once they capture the flag. Or they output something somewhere verifiable (e.g. echo "pwned" > /root/.flag)
  
  1 reply →
- simonw 20 days ago
  
  My hunch is that the dumbasses submitting those reports were't actually using coding agent harnesses at all - they were pasting blocks of code into ChatGPT or other non-agent-harness tools and asking for vulnerabilities and reporting what came back.
  An "agent harness" here is software that directly writes and executes code to test that it works. A vulnerability reported by such an agent harness with included proof-of-concept code that has been demonstrated to work is a different thing from an "exploit" that was reported by having a long context model spit out a bunch of random ideas based purely on reading the code.
  I'm confident you can still find dumbasses who can mess up at using coding agent harnesses and create invalid, time wasting bug reports. Dumbasses are gonna dumbass.
  
  1 reply →
airza 20 days ago
All the attackers I’ve known are extremely, pathologically interested in understanding why their exploits work.
- pixl97 20 days ago
  
  Very often they need to understand it well to chain exploits
- rwmj 19 days ago
  
  I mean someone attacking systems at scale for profit.
0xDEAFBEAD 19 days ago
It can't be too long before Claude Code is capable of replication + triage + suggested fixes...
- 0xDEAFBEAD 19 days ago
  
  BTW regarding "suggested fixes", an interesting attack would be to report a bug along with a prompt injection which will cause Claude to suggest inserting a vulnerability in the codebase in question. So, it's important to review bug-report-originated Claude suggestions extra carefully. (And watch for prompt injection attacks.)
  Another thought is the reproducible builds become more valuable than ever, because it actually becomes feasible for lots and lots of devs to scan the entire codebase for vulns using an LLM and then verify reproducibility.
- ares623 19 days ago
  
  Would you ever blindly trust it?
  
  1 reply →

wat10000 19 days ago

LLMs produce good output and bad output. The trick is figuring out which is which. They excel at tasks where good output is easily distinguished. For example, I've had a lot of success with making small reproducers for bugs. I see weird behavior A coming from giant pile of code B, figure out how to trigger A in a small example. It can often do so, and when it gets it wrong it's easy to detect because its example doesn't actually do A. The people sending useless bug reports aren't checking for good output.

raesene9 19 days ago

Yeah they definitely can be true (IME), as there's a massive difference depending on how LLMs are used to the quality of the output.

For example if you just ask an LLM in a browser with no tool use to "find a vulnerability in this program", it'll likely give you something but it is very likely to be hallucinated or irrelevant.

However if you use the same LLM model via an agent, and provide it with concrete guidance on how to test its success, and the environment needed to prove that success, you are much more likely to get a good result.

It's like with Claude code, if you don't provide a test environment it will often make mistakes in the coding and tell you all is well, but if you provide a testing loop it'll iterate till it actually works.

GoatInGrey 19 days ago

Both are true. Exploits are a very narrow problem with unambiguous success metrics. While also naturally complementing the ingrained persistence of LLMs. Bug reports are much more fuzzy by comparison with open-ended goals that lead to the LLMs metaphorically cheating on their homework to satisfy the prompter who doesn't know any better.

QuadmasterXLII 20 days ago

These exploits were costing $50 of API credit each. If you receive 5001 issues from $100 in API spend on bug hunting and one of the issues cost $50 and the other 5000 cost one cent each, and they’re all visually indistinguishable using perfect grammar and familiar cyber security lingo; hard to find the dianond.

tptacek 20 days ago
The point of the post is that the harness generates a POC. It either works or it doesn't.
- QuadmasterXLII 19 days ago
  
  https://hackerone.com/reports/3100073 includes a POC too- still slop
pjc50 19 days ago

Once your exploit machine is good enough, you can start using stolen credentials to mine more exploits. This is going to be the new version of malware installing bitcoin miners.

AdieuToLogic 20 days ago

Both can be true if each group selectively provides LLM output supporting their position. Essentially, this situation can be thought of as a form of the Infinite Monkey Theorem[0] where the result space is drastically reduced from "purely random" to "likely to be statistically relevant."

For an interesting overview of the above theorem, see here[1].

0 - https://en.wikipedia.org/wiki/Infinite_monkey_theorem

1 - https://www.yalescientific.org/2025/04/sorry-shakespeare-why...

doomerhunter 20 days ago

Both are true, the difference is the skill level of the people who use / create programs to coordinate LLMs to generate those reports.

The AI slop you see on curl's bug bounty program[1] (mostly) comes from people who are not hackers in the first place.

In the contrary persons like the author are obviously skilled in security research and will definitely send valid bugs.

Same can be said for people in my space who do build LLM-driven exploit development. In the US Xbow hired quite some skilled researchers [2] had some promising development for instance.

[1] https://hackerone.com/curl/hacktivity [2] https://xbow.com/about

tptacek 20 days ago

If it helps, I read this (before it landed here) because Halvar Flake told everyone on Twitter to read it.

simonw 20 days ago
I hadn't heard of Halvar Flake but evidently he's a well respected figure in security - https://ringzer0.training/advisory-board-thomas-dullien-halv... mentions "After working at Google Project Zero, he cofounded startup optimyze, which was acquired by Elastic Security in 2021"
His co-founder on optimyze was Sean Heelan, the author of the OP.
- tptacek 20 days ago
  
  Yes, Halvar Flake is pretty well respected in exploit dev circles.
  
  1 reply →

_factor 19 days ago

Depends near entirely on the model being used. A bug report by Opus and a bug report from Gemma3 are not of the same caliber.

octoberfranklin 19 days ago

Finished exploits (for immediate deployment) don't have to be maintainable, and they only need to work once.

ronsor 20 days ago

LLMs are both extremely useful to competent developers and extremely harmful to those who aren't.

rvz 20 days ago

Accurate.