Comment by jhonof
3 months ago
Claude 3.5 came out in June of last year, and it is imo only marginally worse than the AI models currently available for coding. I do not think models are 10x better than they were a year ago; that seems extremely hyperbolic, or you are working in a super niche area where it is true.
Are you using it for agentic tasks of any length? 3.5 and 4.5 are about the same for single-file/single-snippet tasks, but my observation has been that 4.5 can handle longer, more complex tasks that were a waste of time to even try with 3.5 because it would always fail.
Yes, this is important. GPT-5 and o3 were roughly equivalent for a one-shot, one-file task. But GPT-5 and codex-5 can just work for an hour in a way no model could before (the newer Claudes can too).
I use the newer Claudes, and letting them work for an hour produces horrible, non-working code over 50% of the time. Maybe I am not the target user for agentic tasks; all I use agents for is doing product searches on the internet when I have specific constraints and don't want to waste an hour looking for something.
Your knowledge of the topic is at least six months out of date; April 2025 was a huge leap forward in usability, and the releases of the last 30 days are at least what I would call a full generation newer than June 2024. Summer 2025 was arguably the dawn of true AI-assisted coding. Heck, reasoning models were still bleeding edge in late December 2024. They might not be 10x better, but their ability to competently use (and build their own) tools makes them almost incomparable to last year's technology.
Maybe I am just using them wrong, but I don't see how my knowledge can be out of date considering I use the tools every day and pay for Claude and Gemini. For reference, I genuinely think GPT-5 was worse than previous models. They are marginally better, for sure, but I don't think they're even 2x better, let alone 10x.