Comment by apgwoz

2 months ago

The benefit here is reducing the time it takes to find vulnerabilities, i.e. being faster than humans, right? So if you can rig a harness for each function in the system, by first finding where it’s used, what input it expects, etc., and do that for all functions, does it discover vulnerabilities faster than humans?

Doesn’t matter that they isolated one thing. It matters that the context they provided was discoverable by the model.

12 comments

woeirua 2 months ago

There is absolutely zero reason to believe you could use this same approach to find and exploit vulns without Mythos finding them first. We already know that older LLMs can’t do what Mythos has done. Anthropic and others have been trying for years.

nozzlegear 2 months ago
> There is absolutely zero reason to believe you could use this same approach to find and exploit vulns without Mythos finding them first.
There's one huge reason to believe it: we can actually use small models, but we can't use Anthropic's special marketing model that's too dangerous for mere mortals.
- Filligree 2 months ago
  
  If all you have is a spade, that is _not_ evidence that spades are good for excavating an entire hill.
  
cycomanic 2 months ago

From the article:
> At AISLE, we've been running a discovery and remediation system against live targets since mid-2025: 15 CVEs in OpenSSL (including 12 out of 12 in a single security release, with bugs dating back 25+ years and a CVSS 9.8 Critical), 5 CVEs in curl, over 180 externally validated CVEs across 30+ projects spanning deep infrastructure, cryptography, middleware, and the application layer.
So there is pretty good evidence that yes, you can use this approach. In fact, I would wager that a more systematic approach will yield better results than just brute-forcing by running the biggest model across everything. It will definitely be cheaper.
apgwoz 2 months ago
Why? They claim this small model found a bug given some context. I assume the context wasn’t “hey! There’s a very specific type of bug sitting in this function when certain conditions are met.”
We keep assuming that the models need to get bigger and better, when the reality is we’ve not exhausted the ways to use the smaller models. It’s like the PlayStation 2 games that came out 10 years later: by then all the tricks had been found, and everything improved.
- usef- 2 months ago
  
  If this were true, we're essentially saying that no one tried to scan for vulnerabilities using existing models, despite vulnerability research being extremely lucrative and a large professional industry. It has also been one of the most talked-about risks of powerful AI, so it wasn't exactly a novel concept either.
  If it is true that existing models can do this, it would imply that LLMs are being undermarketed, not overmarketed, since industry didn't think this was worth trying previously(?). Which I suspect is not the opinion of HN upvoters here.
  