
Comment by lefty2

16 days ago

I wonder how many false positives there were. Typically these kinds of static analysis tools come up with a ton of potential bugs, but only a few of them are actual bugs.

The basic technique (as publicly described by Anthropic) is that you ask one agent to come up with a test case that triggers, say, an ASan use-after-free. Then you have a second agent validate the test case. This eliminates a lot of false positives. It gets a little tricky when you allow the first agent to modify the code, which is necessary for things like sandbox escapes where you want to demonstrate that sending bad IPC causes problems.
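A minimal sketch of what that second, validating agent's check might boil down to, assuming a C target that reads the proof-of-concept input on stdin and clang/ASan on PATH (file and binary names here are illustrative, not anything Anthropic has published):

```python
import subprocess

def validate_uaf_poc(source_file: str, poc_input: bytes) -> bool:
    """Second-agent-style check: rebuild the target with ASan and confirm the
    proposed test case really triggers a use-after-free, not just any failure."""
    # Build an ASan-instrumented binary (names are illustrative).
    subprocess.run(
        ["clang", "-g", "-fsanitize=address", source_file, "-o", "target_asan"],
        check=True,
    )
    try:
        proc = subprocess.run(
            ["./target_asan"],
            input=poc_input,
            capture_output=True,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return False  # a hang is not the claimed bug
    report = proc.stderr.decode(errors="replace")
    # ASan names the bug class in its report ("heap-use-after-free"); a clean
    # exit or an unrelated crash counts as a false positive for this claim.
    return "AddressSanitizer" in report and "use-after-free" in report
```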

  • And the part that follows is even more important: the agent that attempts the fix in code, the agent that tests the fix and reports on performance and functional impacts, and the agent that triggers the build and release to production (a rough sketch of that chain follows below this list).

    Everything up to finding and validating the bug is a huge win for vuln/exploit development; everything after validating the bug is a huge win for defensive security, and a massive gap until the tools are generally available :S
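Purely for illustration, that downstream chain might be orchestrated as something like the following; fix_agent, test_agent, and release_agent are hypothetical placeholders for wherever real agent calls would go, not any existing tool:

```python
def fix_agent(bug_report: str) -> str:
    # Placeholder: an agent would propose a patch for the validated bug here.
    return "--- a/foo.c\n+++ b/foo.c\n..."

def test_agent(patch: str) -> tuple[bool, str]:
    # Placeholder: an agent would apply the patch, rerun tests and benchmarks,
    # and summarize the performance/functional impact here.
    return True, "tests pass; perf impact within noise"

def release_agent(patch: str) -> bool:
    # Placeholder: an agent would trigger the build and production release here.
    return True

def remediation_pipeline(bug_report: str) -> bool:
    """Chain the downstream agents: fix -> test/impact report -> build & release."""
    patch = fix_agent(bug_report)
    ok, impact_report = test_agent(patch)
    if not ok:
        return False  # reject the fix; surface impact_report to a human
    return release_agent(patch)
```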