Comment by DeathArrow

5 months ago

>Recently I ran an experiment where I built agents on top of Opus 4.5 and GPT-5.2 and then challenged them to write exploits for a zeroday vulnerability in the QuickJS Javascript interpreter.

I think the main challenge for hackers is finding 0day vulnerabilities, not writing the actual exploit code.

2 comments

jdefr89 5 months ago

As someone who does this for a living, the challenge can be in both. However, this article asks its agents to do CTF-like challenges, which I am sure the respective LLMs have seen millions of, so they can essentially regurgitate a large part of the exploit code. This is especially true for the OOB/RW primitive API.

GaggiX 5 months ago

The vulnerability was found by Claude:

>This is true by definition as the QuickJS vulnerability was previously unknown until I found it (or, more correctly: my Opus 4.5 vulnerability discovery agent found it).