← Back to context

Comment by IainIreland

3 hours ago

I never got direct access to Mythos, so all I know is what I've seen from the quality of the bugs being produced. I also haven't been involved at the prompting end.

So the best answer I can give is: I dunno, maybe it's possible to find bugs like this using Opus, but if so, where are they? Did nobody think to try "please find the bug in this code" pre-Mythos? I've done enough auditing with Opus to be convinced that it can be a good assistant to somebody who already knows what they're doing, but in practice the big wave of AI-discovered bugs started with Mythos.

I'm sure lots of people have assumed they could send a publicly available model bug hunting and find things. I have not noticed a huge amount of success. We've had some very nice correctness bugs reported, but skimming through the list of security bugs I've fixed recently, the AI-related ones all seem to be Mythos.

My best guess is that Mythos is just enough better along just enough axes that its hit rate on finding potential bugs and filtering out the real ones from the hallucinations is good enough to matter. Like, there's no obvious qualitative difference between 3.6kg of uranium-232 and 3.8 kg of uranium-232, just a small quantitative increase. But if you form both of them into spheres, only one of them has reached critical mass. Can you do something clever to reach critical mass with 3.6kg of uranium? Maybe! But needing to do something clever is a non-trivial barrier in itself.

I did some experiments and Opus seemed pretty able to wire up a harness to find bugs and write PoC + patch for each. It's still a lot of work to get fixes upstreamed from outside so I think even if outsiders have better tools (Mythos etc) it won't change the report rate much, people may find more bugs but they won't report them. I suspect that's part of the calculation of the phased rollout for Mythos, finding bugs is already not the bottleneck.

  • > It's still a lot of work to get fixes upstreamed from outside

    I'm going to disagree in the specific case of Firefox. First, although it has diverged a long way from its roots, Mozilla still has the community project ideal in its DNA. Enough, at least, that I stumbled while reading the clause "from outside" -- if you're finding and reporting actual relevant security bugs, you're already on the inside. SpiderMonkey in particular still has a good amount of code being written and even maintained by non-employees. (Examples: Temporal and LoongArch64 JIT support).

    Second, the bug bounty program still exists[0] and is being used. If someone were sitting on a pile of AI-discovered exploits, then it has monetary value which is rapidly draining away the longer they aren't reported.[1] That's incentive to put in the work to report them properly.

    Third, I agree that finding bugs is likely not the bottleneck. Validating them is. With previous models, the false positive rate was too high so they required too much work to whittle down to the valid ones. A PoC is a very strong signal that a bug is valid, and that's where I just don't believe you: without a really good harness, I don't think Opus was good enough to find very many bugs with PoCs. It could find some, just not very many.[2]

    [0] For now. It remains to be seen how it will adapt to the AI age. For the moment, it hasn't been severely nerfed like Google's.

    [1] One could make the argument that people who are inexpert enough to only be able to poke an AI to find bugs are also the people more likely to sell them on the black market rather than disclosing them. It seems plausible. Still, some people would still be disclosing, and not many were filing quality bugs pre-Mythos. Some were, but it was a trickle compared to post-Mythos.

    [2] Also note that I personally, as a SpiderMonkey developer, don't find a huge amount of value in the AI-generated patches that accompany these bug reports. Sometimes they're useful to better illustrate the problem, especially since the AI's problem analysis is usually subtly wrong in important ways. They can be a decent starting point for a real patch. But I'll still need to go through my own process of figuring out what the right fix is, even in the handful of cases where I end up with the same thing the AI did.

  • > I suspect that's part of the calculation of the phased rollout for Mythos, finding bugs is already not the bottleneck.

    I was wondering this too. By working directly with tech companies and (one assumes) subsidizing tokens, they're empowering the people on the inside who absolutely want to have the bugs fixed.

    Who outside of Mozilla is going to pay and spend the effort to find Firefox bugs? Sure some hobbyists and contributors might, but they don't have the institutional knowledge of the codebase which can help guide an agent prompts, nor do they have strong incentives to try and report them, nor do they necessarily have the time to craft good bug reports that stand out from the slop reports.

    My assumption would be that most people working to discover bugs this way in Firefox are interested in using them rather than getting them fixed, so maintainers wouldn't necessarily even know the degree to which it was already happening.