Comment by cwsx
2 months ago
I've been using `claude-4-sonnet` for the last few hours - haven't been able to test `opus` yet as it's still overloaded - but I have noticed a massive improvement so far.
I spent most of yesterday working on a tricky refactor (in a large codebase), rotating through `3.7/3.5/gemini/deepseek`, and barely making progress. I want to say I was running into context issues (even with very targeted prompts) but 3.7 loves a good rabbit-hole, so maybe it was that.
I also added a new "ticketing" system (via rules) to help its task-specific memory, which I didn't really get to test with 3.7 (before 4.0 came out), so I'm unsure how much of an impact this has. A rough sketch of the setup is below.
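For anyone curious, the "ticketing" setup is just a rules file that tells the model to track its work in a scratch markdown file. This is a minimal sketch - the path and wording are illustrative, not any official Cursor convention:

```
# .cursor/rules/ticketing.mdc (illustrative path - use whatever your setup expects)

Before starting a task:
- Open (or create) docs/tickets.md and add a ticket: ID, goal, acceptance criteria.
- Break the task into a checklist under the ticket.

While working:
- Tick off checklist items as you complete them; log blockers under the ticket.

Before declaring the task done:
- Re-read the ticket; confirm tests pass, errors are fixed, and docs are updated.
- Mark the ticket DONE with a one-line summary.
```

The idea is to give the model a persistent, task-scoped memory it re-reads each turn, rather than relying on chat context alone.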
The rest of this refactor (est. ~4 hrs w/ 3.7) took `sonnet-4.0` about 45 minutes, including updating all of the documentation and tests (which with 3.7 normally required multiple additional prompts, despite being outlined in my rules files).
The biggest differences I've noticed:
- much more accurate/consistent; it actually finishes tasks rather than telling me it's done when nothing works
- less likely to get stuck in a rabbit hole
- stopped getting stuck when unable to fix something (trying the same 3 solutions over and over)
- runs for MUCH longer without my intervention
- when using 3.7:
  - had to prompt it once every few minutes, 5-10 mins MAX if the task was straightforward enough
  - had to cancel the output on ~1 in 4 prompts as it'd get stuck in the same thought-loops
  - needed to restore from a previous checkpoint every few chats/conversations
- with 4.0:
  - I've had 4 hours of basically one-shotting everything
  - prompts run for 10 mins MIN, and the output actually works
  - it remembers to run tests, fix errors, update docs, etc.
Obviously this is purely anecdotal - and, considering the temperament of LLMs, maybe I've just been lucky and will be back to cursing at it tomorrow - but imo this is the best-feeling model since 3.5 was released.