Comment by Miraste

3 months ago

Are you using it for agentic tasks of any length? 3.5 and 4.5 are about the same for single-file or single-snippet tasks, but my observation has been that 4.5 can handle longer, more complex tasks that were a waste of time even to try with 3.5, because it would always fail.

Yes, this is important. GPT-5 and o3 were roughly equivalent for a one-shot, one-file task. But 5 and codex-5 can just work for an hour in a way no model could before (the newer Claudes can too).

  • I use the newer Claudes, and letting them work for an hour produces horrible, non-working code more than 50% of the time. Maybe I'm not the target user for agentic tasks. All I use agents for is product searches on the internet when I have specific constraints and don't want to waste an hour looking for something.