Ask HN: Does anybody still FEEL improvements between latest LLMs for coding?
2 days ago
Title basically, for me it feels like latest generations of LLMs are quite equal in usefulness for coding, does anybody have anecdotes of the opposite case?
In the last 12 months?
Antigravity started the workflow where you give a list of things to do and it goes off and does those things without supervision, including drafting and testing edge cases. It can even spin up images. Fable is the latest form of this workflow.
Gemini 3 Pro is actually at a mid-level designer's level. None of the others are even at a junior human designer's level.
Sonnet 4.5 was, for a brief moment, creative brilliance. But we're talking coding, not writing, right? They ditched it all for 4.6.
Opus 4.6 and 4.8 are extremely good for coding. I use them to reliably go through logs that are around 15k lines long. They can read my code, work out which log lines should appear, check the logs for what actually happens, and from there form hypotheses and add the logging needed to validate them.
Codex/ChatGPT is probably second best in all of the above.
Ngl, after Opus 4.5 I haven't noticed many improvements.
Yes. Fable is insanely capable. Probably comparable to the jump from ChatGPT 3 to ChatGPT 4 for me, maybe double?
People who are advanced in the field and know what they're doing can tell the differences, but for those of us who aren't that good at programming, it's basically the same ngl.
Improvements are happening in other areas, like lower token usage, speed, cost, etc.
I still oscillate between "I'm totally cooked, I have no role here, the AI does everything" and "WTF, why is this LLM so stupid today? WTF is it doing? This is garbage!"
A lot of that is because in the former case (AI does everything) I wasn't paying enough attention.
Not intelligence improvement, but the improvements to speed are tangible. In my case I've gotten so used to the speed of Composer 2.5 that using the latest Anthropic models frustrates me. They are so slow, and not really worth the wait times since Composer gets me what I need precisely, much faster. I think you'll see labs care a lot more about latency and tokens per second moving forward.