Comment by ashleyn

2 months ago

Having to prime it with more context and more guardrails seems to imply they're getting worse. That's fewer context and guardrails it can infer/intuit.

2 comments

ashleyn

theptip 2 months ago

No, they are not getting worse. Again, look at METR task times.

The peak capability is very obviously, and objectively, increasing.

The scaffolding you need to elicit top performance changes each generation. I feel it’s less scaffolding now to get good results. (Lots of the “scaffolding” these days is less “contrived AI prompt engineering” and more “well understood software engineering best practices”.)

falloutx 2 months ago

Why the downvotes, this comment makes sense. If you need to write more guardrails that does increase the work and at some point amount of guardrails needed to make these things work in every case would be just impractical. I personally dont want my codebase to be filled baby sitting instructions for code agents.