Comment by danieltanfh95

2 months ago

People really lack imagination. The point here is that a dedicated attacker with a good harness and really cheap models can run the attack regardless. It's like portscan/url search attacks. They could run all of these against all codebases and clients. However, on the flip side, this also means we could run cheap models against every PR made, and do a thorough red-team security review.

None of this requires Mythos. If anything, we just need Opus 4.5+ that is not lobotomised.

That is a point. It might even be true. But showing a small model an example of vulnerable code and asking it to confirm that the code is vulnerable isn't evidence for that point!

  • No, it is evidence for that point. You could just rattle off every possible vulnerability and have the cheap model scan for it in the harness through a loop.

    Note that I say cheap, not small, because small models may lack the reasoning needed, but some models are cheap enough yet still retain enough reasoning (à la Sonnet 3.7+).

    • They could write a post demonstrating that you can do that and surface the same bugs in the same codebases.

      It would be way more informative than this one, which didn't do that.
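
For reference, the loop described above ("rattle off every possible vulnerability and have the cheap model scan for it") might look roughly like the sketch below. Everything in it is an assumption made for illustration: `VULN_CLASSES`, `scan`, and `ask_model` are hypothetical names, and `fake_model` is a keyword-matching stand-in for a real call to a cheap model. It only shows the shape of the harness; it is not evidence that such a loop surfaces real bugs, which is exactly what the reply above asks someone to demonstrate.

```python
from typing import Callable, Iterable

# Hypothetical checklist; a real harness would carry a much longer one.
VULN_CLASSES = ["SQL injection", "path traversal", "command injection"]


def scan(
    files: dict[str, str],
    ask_model: Callable[[str], str],
    vuln_classes: Iterable[str] = VULN_CLASSES,
) -> list[tuple[str, str]]:
    """Ask the model about every (file, vulnerability class) pair."""
    classes = list(vuln_classes)
    findings = []
    for path, source in files.items():
        for vuln in classes:
            prompt = (
                f"Does the following code contain a {vuln} vulnerability? "
                "Answer YES or NO first.\n\n"
                f"File: {path}\n\n{source}"
            )
            answer = ask_model(prompt)
            # Keep only the pairs the model flags; a real harness would
            # also store the explanation and re-verify each finding.
            if answer.strip().upper().startswith("YES"):
                findings.append((path, vuln))
    return findings


def fake_model(prompt: str) -> str:
    """Stand-in for a cheap model call (assumption, not a real API)."""
    question, _, code = prompt.partition("\n\n")
    if "SQL injection" in question and "execute(" in code and " + " in code:
        return "YES: query built by string concatenation"
    return "NO"


files = {
    "db.py": "cur.execute('SELECT * FROM users WHERE id = ' + user_id)",
    "util.py": "def add(a, b):\n    return a + b\n",
}
findings = scan(files, fake_model)
```

The cost grows as files times vulnerability classes, which is why the argument depends on the model being cheap per call rather than small.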