Comment by tptacek

6 days ago

If you're a vulnerability researcher or a security person generally, there's a big inflection point from Spring of this year.

Whether it turns out to be a good change remains to be seen.

The half-full view is that the models are so good at finding vulns that if you plug them into your build pipeline, the number of new vulns introduced will drop towards zero.

The half-empty view is that we're now producing more junior-level code with less review, so everything will have more vulns. It's also cheaper and easier to find them, so prepare for chaos.

Short term there is sure to be chaos either way as the models are clearly good enough to find all the old bugs, and not everyone has the resources or will to try to stay ahead of the curve like Mozilla is trying to do with their Mythos access https://blog.mozilla.org/en/firefox/ai-security-zero-day-vul...

  • There's a major caveat to the half-full view: You'll only stop adding new vulns that your model can find.

    A threat actor with access to a better model or more money to burn on tokens may yet find more. Some of them have deep pockets, and nowhere near every project will get the Glasswing treatment of free Mythos tokens.

    • There's an interesting economic contest here as well: is it more sustainable for a malware group to spend $500 in tokens looking for an issue in my app, or for me to spend $500 scanning for issues on every deployment?

      Systemically this usually favours the offence, as they could scan my app once every 6 months whereas I'd need to do it on weekly releases.
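
      To put rough numbers on that asymmetry, here is a back-of-the-envelope sketch. The $500 per scan, the six-month attacker cadence, and the weekly release cadence come from the comment above; treating every scan as costing the same flat amount is an assumption, and the function name is made up for illustration.

```python
# Back-of-the-envelope sketch; the flat per-scan cost is an assumption.
COST_PER_SCAN_USD = 500        # the $500 figure quoted above
ATTACKER_SCANS_PER_YEAR = 2    # "once every 6 months"
DEFENDER_SCANS_PER_YEAR = 52   # one scan per weekly release


def annual_scan_cost(scans_per_year: int, cost_per_scan: float = COST_PER_SCAN_USD) -> float:
    """Yearly token spend for a given scan cadence (hypothetical helper)."""
    return scans_per_year * cost_per_scan


attacker = annual_scan_cost(ATTACKER_SCANS_PER_YEAR)
defender = annual_scan_cost(DEFENDER_SCANS_PER_YEAR)

print(f"attacker: ${attacker:,.0f}/year")   # attacker: $1,000/year
print(f"defender: ${defender:,.0f}/year")   # defender: $26,000/year
print(f"defender pays {defender / attacker:.0f}x more for this one app")  # 26x
```

      This only compares spend on a single app. It leaves out that an attacker may be scanning many targets, and that a defender might scan only the diff of each release rather than the whole codebase, either of which would shift the ratio.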

I'm a security person and would love to hear other people's input here, as I don't have that much experience with this.

Can you be more specific?

  • Three deterministic Linux LPEs in a week, an LPE in BSD in execve (of all things...), nginx vulnerabilities, one or two new gnarly supply chain attacks. Linus noting that the linux-security mailing list is getting flooded with duplicated, AI-driven reports of varying quality. There are pretty crazy Keycloak vulnerabilities getting discovered.

    We're most likely entering a year or two of rapid vulnerability discovery and patching, as well as reducing and minimizing system footprints just to survive the onslaught of strange vulnerabilities from e.g. ancient and largely unused kernel modules.

  • The Claude Mythos / Project Glasswing thing is real: https://www.anthropic.com/glasswing

    I met a few people at PyCon this week who have been part of Glasswing (they're just starting to be allowed to talk about it) and it really does drive down the cost of finding vulnerabilities.

    I've been collecting notes on that here: https://simonwillison.net/tags/ai-security-research/

    • People in my company sounded underwhelmed by it. It was usually finding issues because it didn't understand the deployment (or wasn't fed that info).

    • Wouldn't it drive up the cost of finding vulnerabilities once all the low-hanging fruit has already been scanned and patched? The new baseline for finding a vulnerability would be something an LLM couldn't find.

  • Broadly, I'm talking about the shift from building elaborate vulnerability research harnesses towards using the frontier models and their RL-optimized harnesses to build simpler vulnerability discovery pipelines. And then: the ensuing carnage.

  • Not OP, but just look at HN posts in the last couple of weeks: supply chain worms, zero-day LPEs for all OSes seemingly every other day, researchers on X and here openly saying they’ve got more valid findings than they know what to do with.