Comment by cycomanic

5 hours ago

So you can't imagine anything between brute-force scanning the whole codebase and cutting everything up into small chunks and scanning only those?

You don't think that security companies (and likely these guys as well) develop systems for doing this stuff?

I'm not a security researcher, and yet I can imagine a harness that first scans the codebase and describes its API. Then another agent decides, based on that description, which functions deserve a closer look, before handing those functions to another small LLM along with the appropriate context. You could even use yet another agent to evaluate the results and weed out false positives.
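To make that concrete, here's a rough Python sketch of such a staged harness. Everything in it is hypothetical: `run_harness`, the four stage callables and the keyword-matching toy stubs are made up for illustration. In a real system each stage would be an LLM call, and nothing here reflects how any particular company actually does it.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; in a real harness each would wrap an LLM call.
Describer = Callable[[str, str], str]            # (name, source) -> summary
Triage = Callable[[str, str], bool]              # (name, summary) -> worth a closer look?
Analyzer = Callable[[str, str, str], list[str]]  # (name, source, context) -> candidate issues
Verifier = Callable[[str, str, str], bool]       # (name, source, issue) -> confirmed?


@dataclass
class Finding:
    function: str
    issue: str


def run_harness(codebase: dict[str, str], describe: Describer, triage: Triage,
                analyze: Analyzer, verify: Verifier) -> list[Finding]:
    # Stage 1: one cheap pass over the whole codebase to describe its API.
    summaries = {name: describe(name, src) for name, src in codebase.items()}
    api_context = "\n".join(f"{n}: {s}" for n, s in summaries.items())

    # Stage 2: pick only the functions that look worth a closer look.
    selected = [n for n, s in summaries.items() if triage(n, s)]

    findings = []
    for name in selected:
        # Stage 3: a small model sees one function plus the API summary.
        for issue in analyze(name, codebase[name], api_context):
            # Stage 4: a separate pass filters out likely false positives.
            if verify(name, codebase[name], issue):
                findings.append(Finding(name, issue))
    return findings


# Toy keyword-based stubs standing in for the LLM agents (illustration only).
def toy_describe(name, src):
    return "handles external input" if "recv" in src else "internal helper"

def toy_triage(name, summary):
    return "external input" in summary

def toy_analyze(name, src, context):
    return ["unbounded copy via strcpy"] if "strcpy" in src else []

def toy_verify(name, src, issue):
    # Treat an explicit length check as evidence that the copy is safe.
    return "strlen" not in src


codebase = {
    "parse_packet": "recv(sock, buf, n, 0); strcpy(dst, buf);",
    "parse_header": "recv(sock, buf, n, 0); if (strlen(buf) < 64) strcpy(dst, buf);",
    "log_message": "strcpy(line, msg);",
}

findings = run_harness(codebase, toy_describe, toy_triage, toy_analyze, toy_verify)
print(findings)
```

With these stubs only `parse_packet` gets reported: `parse_header` is dropped by the verifier, and `log_message` never reaches the analyzer because triage skipped it. That last case is the tradeoff of this design: the triage step is what keeps the cost down, but it can also miss things.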

I would wager that such a system would yield better results for a much lower price.

Instead, we're talking about this marketing exercise: "oohh, our model is so dangerous it can't be released, and btw the results can't be independently verified either."