← Back to context

Comment by pixl97

8 hours ago

>, the number of false positives,

Really this is why the LLM needs to be able to write exploits for issues it finds. Of course that leads down a rabbit hole of other issues. But if an exploit works, then that's pretty conclusive evidence.

For a subset of bugs, yes. For some others, not really: I've seen LLMs make bogus assumptions about the threat model (in which case, the exploit works but doesn't demonstrate anything useful) or "cheat" by modifying the code to demonstrate a hallucinated issue.

Frontier models, including Mythos, can greatly streamline bug hunting and exploit developments in the hands of a competent security engineer. In the hands of a person with no security experience, they will still mostly waste your time and money.

  • Seconding this.

    I've seen it make the codebase vulnerable by changing the source, then claiming it found a vuln, or finding a well-defended and secure exec function, write a unit test that shows what exec does (which is running commands), then claiming a critical finding.