Comment by krschacht

2 months ago

At the end of this article it states, "Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior"). A real autonomous discovery pipeline starts from a full codebase with no hints." I'm not a cybersecurity expert, but isn't 80% of the challenge finding where the exploit lives in the code!?

That really undermines the author's claims. This article feels dishonest in its claim that "small, cheap, open-weights models ... recovered much of the same analysis."