Three deterministic Linux LPEs in a week, an LPE in BSD in execve (of all things...), nginx vulnerabilities, one or two new gnarly supply chain attacks. Linus noting that the linux-security mailing list is getting flooded with duplicated, AI-driven reports of varying quality. There are pretty crazy Keycloak vulnerabilities getting discovered.
We're most likely entering a year or two of rapid vulnerability discovery and patching, as well as reducing and minimizing system footprints just to survive the onslaught of strange vulnerabilities from e.g. ancient and widely unused kernel modules.
The Claude Mythos / Project Glasswing thing is real: https://www.anthropic.com/glasswing
I met a few people at PyCon this week who have been part of Glasswing (they're just starting to be allowed to talk about it) and it really does drive down the cost of finding vulnerabilities.
I've been collecting notes on that here: https://simonwillison.net/tags/ai-security-research/
People in my company sounded underwhelmed by it. It was usually finding issues only because it didn't understand the deployment (or wasn't being fed that info).
You might point them at the Cloudflare blog about deploying Mythos - I found it interesting. Upshot: as your folks discovered, deployment, harness, and utilization method matter for Mythos and are a bit different from how you’d deploy a coding agent for writing code, but if you get that right, you get something capable of end-to-end chaining and reasoning about a much broader class of vulnerabilities.
No personal experience with it. But the security team writeups I’ve read are significantly more positive about it than you describe, so it might be worth a second look.
A friend of mine had hands-on experience. It’s not the intelligence of it, it’s the speed.
You used to have a couple of days to close a breach, now it's 2 hours.
Wouldn't it drive up the cost of finding vulnerabilities once all the low-hanging fruit has already been scanned and patched? Like, the new baseline for finding a vulnerability will be something an LLM couldn't find.
Broadly, I'm talking about the shift from building elaborate vulnerability research harnesses towards using the frontier models and their RL-optimized harnesses to build simpler vulnerability discovery pipelines. And then: the ensuing carnage.
Not OP, but just look at HN posts in the last couple of weeks: supply chain worms, zero-day LPEs for all OSes seemingly every other day, researchers on X and here openly saying they’ve got more valid findings than they know what to do with.