Comment by K0balt

4 hours ago

In my experience sonnet<opus by a long shot for code review. Sonnet often flags things as errors that are not, because it fails to grasp the big picture… and also fails to grasp structural issues that are perfectly coded and only show up as problems at the meta scale.

I have no reason to believe that the next generation won’t offer similar gains in verification, and there is some evidence to support that the cybersecurity implications are the result of exactly this expansion of ability.

1 comment

K0balt

thepasch 4 hours ago

It depends on how you review. In an orchestrated per-task review workflow with clearly defined acceptance criteria and implementation requirements, using anything other than Sonnet (handed those criteria and requirements) hasn’t really led to much improvement, but it drives up usage and takes longer. I even tried Haiku, but, yeah, Haiku is just not viable for review, even tightly scoped, lol.

Siccing Sonnet on a codebase or PR without guidance does indeed lead to worse results than using Opus, though.