← Back to context

Comment by minimaxir

3 days ago

> They are not "getting worse" they "have been bad".

The agents available in January 2025 were much much worse than the agents available in November 2025.

Yes, and for some cases no.

The models are gotten very good, but I rather have an obviously broken pile of crap that I can spot immediately, than something that is deep fried with RL to always succeed, but has subtle problems that someone will lgtm :( I guess its not much different with human written code, but the models seem to have weirdly inhuman failures - like, you would just skim some code, cause you just cant believe that anyone can do it wrong, and it turns out to be.