Comment by poolnoodle
12 hours ago
These paid offerings geared toward software development must be a hell of a lot "smarter" than the regular chatbots. The amount of nonsense and bad or outright wrong code Gemini and ChatGPT throw at me lately is off the charts. I feel like they are getting dumber.
Yes, they are. The fact that the agents have full access to your local project files makes a gigantic difference.
They do *very* well at things like: "Explain what this class does" or "Find the biggest pain points of the project architecture".
There's no comparison with regular ChatGPT when it comes to software development. I suggest trying it out, not by saying "implement game" but by giving it clearly scoped tasks where the AI doesn't have to think or abstract/generalize. In other words, use it as a kind of code monkey.
I don’t understand why we are getting these software products that want vendor lock-in when the underlying system isn’t being improved. I prefer Claude Code right now because it’s a better product. Gemini just has a weird context window that poisons the rest of the generated code (when online). Between ChatGPT Codex and Claude, I feel that Claude is the better product, but I don’t use enough tokens to justify Claude's $100 plan, so I just have a regular ChatGPT subscription for productivity tasks.
> I don’t understand why we are getting these software products that want to have vendor lock in when the underlying system isn’t being improved.
I think it's clear now that the pace of model improvements is asymptotic (or at least it's reached a local maximum) and the model itself provides no moat. (Every few weeks last year, the perception of "the best model" changed, based on basically nothing other than random vibes and hearsay.)
As a result, the labs are starting to focus on vertical integration (that is, building up the product stack) to deepen their moat.
> I think it's clear now that the pace of model improvements is asymptotic
As much as I wish it were, I don't think this is clear at all. It's only been a couple of months since Opus 4.5, after all, which many developers say was a major change compared to previous models.
It’s the inconsistency that gets me. Very similar tasks, similar complexity, same code base, same prompting:
Session A knocks it out of the park. Chef’s kiss.
Session B just does some random vandalism.