Comment by tannedNerd
3 days ago
The problem with this is that none of it is production quality. You haven't done edge-case testing for user mistakes, a security audit, or even basic maintainability work.
Yes, Opus 4.5 seems great, but most of the time it vastly overcomplicates the solution. Its answer will be 10x harder to maintain and debug than the simpler solution a human would have created by thinking about the constraints of keeping code working.
Yes, but my junior coworkers don't reliably do edge-case testing for user errors either, unless specifically tasked to do so, likely with a checklist of the specific kinds of user errors they need to check for.
And it turns out the quality of output you get from both the humans and the models is highly correlated with the quality of the specification you write before you start coding.
Letting a model run amok within the constraints of your spec is actually great for specification development! You get instant feedback of what you wrongly specified or underspecified. On top of this, you learn how to write specifications where critical information that needs to be used together isn't spread across thousands of pages - thinking about context windows when writing documentation is useful for both human and AI consumers.
The best specification is code. English is a very poor approximation.
I can’t get past the fact that by the time I write up an adequate spec and review the agent’s code, I probably could have done it myself by hand. It’s not like typing was ever remotely close to the slow part.
AI, agents, etc are insanely useful for enhancing my knowledge and getting me there faster.
How will those juniors ever grow up to be seniors now?
My theory is that this (juniors unable to get in) is generally how industries/jobs die and phase out in a healthy manner that causes the least pain to their workers. I've seen this happen to a number of other industries with people I know, and when a field phases out this way it's generally less disruptive to people.
The seniors, who have less leeway to change course (it's harder as you get older in general, large sunk costs, etc.), maintain their positions and the disruption occurs at the usual "retirement rate," meaning the industry shrinks a bit each year. They don't get much in the way of pay rises, but normally they have some buffer from earlier times, so they are willing to wear being in a dying field. Staff aren't replaced, but on the whole they still have marginal long-term value (e.g. domain knowledge that keeps them somewhat respected, or a "that guy was around when they had to do that; show respect" kind of thing).
The juniors move to other industries where the price signal shows value and strong demand remains (e.g. locally for me that's trades but YMMV). They don't have the sunk cost and have time on their side to pivot.
If done right, the disruption to people's lives can be small and most of the gains of the tech can still come out. My fear is the AI wave will happen fast but only in certain domains (the worst case for SWEs), meaning the adjustment will be hard hitting without appropriate support mechanisms (i.e. most of society doesn't feel it, so they don't care). On average, individual people aren't that adaptable, but over generations society is.
Even better. Job security for current seniors.
Isn't it though? I've worked with plenty of devs who shipped much lower quality code into production than I see Claude 4.5 or GPT 5.2 write. I find that SOTA models are more likely to: write tests, leave helpful comments, name variables in meaningful ways, check if the build succeeds, etc.
Stuff that seems basic, but that I haven't always been able to count on in my teams' "production" code.
I can generally get maintainable results simply by telling Claude "Please keep the code as simple as possible. I plan on extending this later so readability is critical."
Yeah, some of it is probably related to me primarily using it for SwiftUI, which doesn’t have years of material to scrape. But even with those docs, and even after telling it that iOS 26 exists, it will still claim at least once a session that it doesn’t, so it’s not 100%.
That may be true now, but think about how far we've come in a year alone! This is really impressive, and even if the models don't improve, someone will build skills to attack these specific scenarios.
Over time, I imagine even cloud providers, app stores etc can start doing automated security scanning for these types of failure modes, or give a more restricted version of the experience to ensure safety too.
There's a fallacy in here that is often repeated. We've made it from 0 to 5, so we'll be at 10 any day now! But in reality there are any number of roadblocks that might mean progress halts at 7 for years, if not forever.
Even if progress halts here at 5, I think the programming profession is forever changed. That’s not hyperbole. Claude Code, even if it doesn’t improve at all, has changed how I approach my job. I don’t know that I like this new world, but I don’t think there’s any going back.
This comment addresses none of the concerns raised. It writes off entire fields of research (accessibility, UX, application security) as "just train the models more, bro. Accelerate."
Both accessibility and application security are easier to build rules and improved models for, because they have pretty solid constraints and outcomes. UX, on the other hand, is definitely more challenging, given how much of it isn't codified into simple rules.
I didn't write off an entire field of research, but rather want to highlight that these aren't intractable problems for AI research, and that we can actually start codifying many of these things today using the skills framework to close up edges in the model training. It may not be 100% but it's not 0%.
It's not from a few prompts, you're right. But if you layer on some follow-up prompts to add proper test suites, run some QA, etc., then the quality gets better.
I predict in 2026 we're going to see agents get better at running their own QA, and also get better at not just disabling failing tests. We'll continue to see advancements that will improve quality.
I think someone around here said: LLMs are good at increasing entropy; experienced developers become good at reducing it. Those follow-up prompts sound additive, which is exactly where the problem lies. Yes, you might have tests, but no, that doesn't mean your code base is approachable.
You should try it with BEAM languages and the 'let it crash' style of programming. With pattern matching and a process isolated per request, you basically only need to code the happy path, and if garbage comes in you just let the process crash. Combined with the TDD plugin (a bit of a hidden gem), you can absolutely write production-level services this way.
Crashing is the good case. What people worry about is tacit data corruption, or other silently incorrect logic, in cases you didn’t explicitly test for.
You don't need BEAM languages. I'm using Java and I always write my code in "let it crash" style, to spend time on happy paths and avoid spending time on error handling. I think that's the only sane way to write code and it hurts me to see all the useless error handling code people write.
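To make the style above concrete, here is a minimal sketch of "happy path only" code in Java. The `Order` record and `parseOrder` method are hypothetical examples, not from the thread; the point is that there is no validation and no try/catch, so garbage input throws and the failure surfaces immediately rather than being handled inline:

```java
// LetItCrash.java — a sketch of happy-path-only parsing.
// Malformed input is deliberately not handled here; it throws
// (ArrayIndexOutOfBoundsException or NumberFormatException) and
// whatever supervises this worker/request deals with the failure.
public class LetItCrash {
    record Order(String id, int quantity) {}

    static Order parseOrder(String line) {
        // Happy path only: split and parse, no defensive checks.
        String[] parts = line.split(",");
        return new Order(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    public static void main(String[] args) {
        System.out.println(parseOrder("A42, 3")); // happy path
        // parseOrder("garbage") would throw; we let it crash
        // instead of returning a default that silently corrupts data.
    }
}
```

The trade-off the sibling comment raises still applies: a crash is loud, but logic that is wrong without throwing (e.g. swallowing the exception and returning a zero-quantity order) fails silently, which no amount of let-it-crash style catches.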
Depends on the audience
Agree... but that is exactly what MVPs are. Humans have been shipping MVPs while calling them production-ready for decades.
> Its answer will be 10x harder to maintain and debug
Maintain and debug by who? It's just going to be Opus 4.5 (and 4.6...and 5...etc.) that are maintaining and debugging it. And I don't think it minds, and I also think it will be quite good at it.
there are skills / subagents for that
something like code-simplifier is surprisingly useful (as is /review)
https://x.com/bcherny/status/2007179850139000872
Depends on the application. In many cases it's good enough.
It's so much easier to create production-quality software