Comment by IainIreland
2 hours ago
Yeah, the test with the patch also adds comments. The human reviewer had extra context available.
I did get Opus to do an audit for similar problems elsewhere, to supplement the investigations that we were already doing by hand. It initially thought it found something, but when asked to produce a testcase, it thought for 20 minutes and admitted defeat. I suspect that the difference between Opus and Mythos is in small edges like this: if Mythos is smart enough to spot why Opus's discovery didn't work a little bit faster, and it can waste less time chasing down red herrings, then it's more likely to find a real bug within the limits of a context window. It's not that Opus completely lacks some capability, it's that it has trouble chaining all the pieces together consistently.
No comments yet
Contribute on Hacker News ↗