Comment by cycomanic

2 months ago

So you can't imagine anything between brute-force scanning the whole codebase and cutting everything up into small chunks and scanning only those?

You don't think that security companies (and likely these guys as well) develop systems for doing this stuff?

I'm not a security researcher, and I can imagine a harness that first scans the codebase and describes the API. Another agent then uses that description to decide which functions deserve a closer look and hands those functions, with the appropriate context, to a small LLM. You can even use yet another agent to evaluate the results and weed out false positives.
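To make that concrete, here is a rough sketch of the four stages in Python. Everything in it is hypothetical: the names `describe_api`, `run_pipeline`, and `Finding` are made up for illustration, the "agents" are plain callables you would swap for model calls, and the stand-ins at the bottom are keyword checks rather than LLMs. It only shows the shape of the harness, not a working scanner.

```python
import ast
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Finding:
    function: str  # name of the function the finding refers to
    note: str      # free-text description of the suspected issue


def describe_api(source: str) -> Dict[str, str]:
    """Stage 1: map each top-level function name to its source text."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def run_pipeline(
    source: str,
    triage: Callable[[Dict[str, str]], List[str]],
    analyze: Callable[[str, str], Optional[Finding]],
    verify: Callable[[Finding], bool],
) -> List[Finding]:
    api = describe_api(source)               # Stage 1: describe the API
    candidates = triage(api)                 # Stage 2: pick functions to inspect
    findings = []
    for name in candidates:
        finding = analyze(name, api[name])   # Stage 3: focused analysis
        if finding is not None and verify(finding):  # Stage 4: drop false positives
            findings.append(finding)
    return findings


# Stand-ins for the agents (assumptions, not real models): simple keyword checks.
def keyword_triage(api: Dict[str, str]) -> List[str]:
    return [name for name, src in api.items() if "eval(" in src]


def keyword_analyze(name: str, src: str) -> Optional[Finding]:
    if "eval(" in src:
        return Finding(name, "calls eval on its argument")
    return None


def accept_all(finding: Finding) -> bool:
    return True


SAMPLE = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "def run(expr):\n"
    "    return eval(expr)\n"
)

if __name__ == "__main__":
    for f in run_pipeline(SAMPLE, keyword_triage, keyword_analyze, accept_all):
        print(f.function, "-", f.note)
```

Each stage is passed in as a callable, so you could put a cheap model behind `triage`, a small one behind `analyze`, and a separate one behind `verify` without touching the control flow.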

I would wager that such a system would yield better results for a much lower price.

Instead we are talking about this marketing exercise: "oohh, our model is so dangerous it can't be released, and btw the results can't be independently verified either."

2 comments

johnfn 2 months ago

I explained why this won't work elsewhere in the thread[1].

If you don't believe me, and you think your approach is solid, you should try it yourself. It's only a couple of dollars, and it would be extremely popular -- just look at how popular this article, using improper methodology, was! Hey, maybe you're right, and you can prove us all wrong. But I'd bet you at great odds that you're not.

[1]: https://news.ycombinator.com/item?id=47734710

user34283 2 months ago

[dead]