Comment by epolanski

12 hours ago

While what you say is generally true, every model Anthropic has released since Opus 4.6 has been increasingly worse at what the previous user points out: they are extremely smart and can convince the user of major falsehoods.

They are way too heavily trained/reinforced to solve problems rather than to assist you, which is something they have become extremely bad at.

It's hard to explain, because I too have had many moments where "Fable5 / Opus4.8 xhigh could solve bugs/stuff that previous models couldn't." I know that to be true, and they are very useful for that.

But 90% of my tasks are quite mundane, and for those I need thorough investigation and a proper assistant, not a smart bullshitter fixated on solving the issue itself. On that front, Opus 4.6 was the last good model.

Everything after it is completely skewed toward passing benchmarks and E2E tasks, but definitely not toward assisting.

Fable in particular was a disaster in that respect: it was relentlessly thorough about whatever fix it had fixated on, writing n thousand experiments in /tmp, etc. Great model, not gonna lie, but only if your focus is vibe coding and you accept that you're nothing but an assistant, and accept its shortcomings.


epolanski

iamanllm 10 hours ago

yeah, the "proactivity" and sophisticated bullshitting of recent Anthropic models are bad, although in my experience, even on simple tasks, I've never used an OSS model that has been consistently better in terms of result quality.