
Comment by JeremyNT

3 hours ago

You may not be able to comment, but do you feel like Mythos is accomplishing anything that couldn't have already been done with Opus and the right prompting?

I've assumed I could send an agent using a publicly available model bug hunting in a codebase like this and get tons of results, assuming I wanted to burn the tokens, so it's really unclear to me whether the Mythos hype is justified or if it's just an easy button (and subsidized tokens?) to do what is already possible.

I never got direct access to Mythos, so all I know is what I've seen from the quality of the bugs being produced. I also haven't been involved at the prompting end.

So the best answer I can give is: I dunno, maybe it's possible to find bugs like this using Opus, but if so, where are they? Did nobody think to try "please find the bug in this code" pre-Mythos? I've done enough auditing with Opus to be convinced that it can be a good assistant to somebody who already knows what they're doing, but in practice the big wave of AI-discovered bugs started with Mythos.

I'm sure lots of people have assumed they could send a publicly available model bug hunting and find things. I have not noticed a huge amount of success. We've had some very nice correctness bugs reported, but skimming through the list of security bugs I've fixed recently, the AI-related ones all seem to be Mythos.

My best guess is that Mythos is just enough better along just enough axes that its hit rate on finding potential bugs and filtering out the real ones from the hallucinations is good enough to matter. Like, there's no obvious qualitative difference between 3.6 kg of uranium-232 and 3.8 kg of uranium-232, just a small quantitative increase. But if you form both of them into spheres, only one of them has reached critical mass. Can you do something clever to reach critical mass with 3.6 kg of uranium? Maybe! But needing to do something clever is a non-trivial barrier in itself.