Partly true. I think the consensus was it wasn't comparable because Mythos swept the entire codebase and found the vulnerabilities, whereas the open models were told where to look for said vulnerabilities.
https://news.ycombinator.com/item?id=47732337
Not really. The models were pointed specifically at the location of the vulnerability and given some extra guidance. That's an easier problem than simply being pointed at the entire code base.
Surely the Anthropic model also only looked at one chunk of code at a time; the entire code base cannot fit into context. So supplying identical chunk sizes (per file, per function, whatever) and seeing whether the open-source model can find anything seems fair. Deliberately prompting with the problem is not.
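The fair-comparison protocol described above can be sketched roughly like this: sweep every file in the repo in fixed-size chunks and give every model the same neutral prompt, with no hint of where a vulnerability might be. This is a minimal illustration, not anyone's actual harness; `review_chunk`, the chunk size, and the prompt wording are all assumptions.

```python
# Hypothetical sketch of a location-blind sweep: every model under test
# sees the same fixed-size chunks with the same neutral prompt, so no
# model gets extra guidance about where the vulnerability lives.
from pathlib import Path

CHUNK_LINES = 200  # identical chunk size for every model being compared

def iter_chunks(repo_root: str, suffixes=(".c", ".py")):
    """Yield (path, start_line, chunk_text) for every matching file."""
    for path in sorted(Path(repo_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(errors="replace").splitlines()
        for start in range(0, len(lines), CHUNK_LINES):
            chunk = "\n".join(lines[start:start + CHUNK_LINES])
            yield str(path), start + 1, chunk

def sweep(repo_root: str, review_chunk):
    """Run the same prompt over every chunk; `review_chunk` is a stand-in
    for whichever model call is being evaluated (an assumption here)."""
    findings = []
    for path, start, chunk in iter_chunks(repo_root):
        prompt = f"Review this code for security vulnerabilities:\n{chunk}"
        result = review_chunk(prompt)
        if result:
            findings.append((path, start, result))
    return findings
```

The point of the sketch is only that the unit of work (chunk size and prompt) is held constant across models, which is the condition the comment argues is missing when one model is pointed directly at the vulnerable code.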