Comment by nickandbro
14 hours ago
I feel like we are just inching closer and closer to a world where rapid iteration of software will be the default. For example: a trusted user gives feedback -> the feedback gets curated into a ticket by an AI agent, then turned into a PR by an agent, then reviewed by an agent, before being deployed by an agent. We are maybe one or two steps from the flywheel being complete. Or maybe we are already there.
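The loop being described could be sketched roughly like this. This is a toy sketch with every stage stubbed out as a pure function; in the scenario above each stage would be an LLM agent call, and all names here are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    summary: str

@dataclass
class PR:
    ticket: Ticket
    diff: str

def curate_ticket(feedback: str) -> Ticket:
    # Agent 1: turn raw user feedback into a ticket.
    return Ticket(summary=feedback.strip().capitalize())

def implement(ticket: Ticket) -> PR:
    # Agent 2: turn the ticket into a pull request.
    return PR(ticket=ticket, diff=f"fix: {ticket.summary}")

def review(pr: PR) -> bool:
    # Agent 3: review gate before deployment.
    return bool(pr.diff)

def flywheel(feedback: str) -> str:
    # The full feedback -> ticket -> PR -> review -> deploy loop.
    pr = implement(curate_ticket(feedback))
    return "deployed" if review(pr) else "rejected"
```

The catch, of course, is that in the real version each stage has a nonzero error rate, and those errors compound through the loop.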
I just don’t see it coming. I was fully in that camp 3 months ago, but I’ve since realized that every step introduces more mistakes. It leads to a deadlock once no human has the mental model anymore.
Don’t you have hard business problems that AI just can’t solve, or solves only very slowly, presenting you 17 ideas until it finds the right one? I’m using the most expensive models.
I think the nature of AI might block that progress, and I think some companies have woken up to this and others will wake up later.
The mistake rate is just too high. And every system you implement to reduce that rate has a mistake rate of its own, while adding complexity and exploration time.
I think the big bulk of people is now where the early adopters were in December: AI can implement working functionality in a well-maintained codebase.
But it can’t write maintainable code itself. It actually makes you slower compared to writing the code assisted, because assisted you are much more in the loop: you can stop a lot of small issues right away, and you iterate on everything quickly.
I hadn’t opened my IDE for a month, and at some point it became hell. I’ve now deleted 30k lines, and the number of issues I’m seeing has been an eye-opening experience.
Unscalable performance issues, verbosity, straight-up bugs, escape hatches around my verification layers, quadrupled types.
Now I could monitor the AI's output more closely, but then again I’m faster writing it myself, because that's a single task. AI-assisted typing isn’t slower than my brain is.
Also, thinking about it more: FAANG pays $300 per line in production, so what are we really trying to achieve here? Speed was never the issue. A great coder writes 10 production lines per day.
Accuracy, architecture, etc. are the issue. You get those by building solid fundamental blocks that make feature additions easier over time, not slower.
I know it’s not your main point, but I’m curious where $300/line comes from. I don’t think I’ve ever seen a dollar amount attached to a line of production code before.
I think this sounds like a true yet short-sighted take. Keep in mind these features are immature, but they exist to obtain a flywheel and corner the market. I don’t know why, but people seem to consistently miss two points and their implications:
- performance is continuing to increase incredibly quickly, even if you rightfully don’t trust any particular evaluation; scaling laws like Chinchilla and the RL scaling laws (both train-time and test-time) predict this
- coding is a verifiable domain
The second one is the most important. Agent quality is NOT limited by the human code in the training set; that code is simply used for efficiency: it gets you to a good starting point for RL.
Claiming that things will not reach superhuman performance, INCLUDING on all end-to-end tasks (understanding a vague, poorly articulated business objective, architecting a system, building it out, testing it, maintaining it, fixing bugs, adding features, refactoring, etc.), is what carries the burden of proof, because we can literally predict performance (albeit with a complicated relationship between benchmarks and real-world performance).
Yes, definitely: error rates are still too high for this to be totally trusted end to end, but they are improving consistently, and this is what explains the METR time-horizon results.
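For reference, the Chinchilla result mentioned above fits pretraining loss as a power law in parameter count $N$ and training tokens $D$; Hoffmann et al. reported fitted exponents of roughly $\alpha \approx 0.34$ and $\beta \approx 0.28$:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Here $E$ is the irreducible loss, and the two power-law terms shrink predictably as model and data scale, which is the sense in which performance is "predictable" even before a model is trained.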
Scaling laws vs. combinatorial explosion: who wins? In my personal experience, Claude does exceedingly well on mundane code (do a migration, add a field, wire up this UI) and quite poorly on code that has likely never been written before (even if it is logically simple for a human). The question is whether this is a quantitative or a qualitative barrier.
Of course it's still valuable. A real app has plenty of mundane code despite our field's best efforts.
8 replies →
> - coding is a verifiable domain
You're missing the point though. "1 + 1" vs "one.add(1)" might both be "passable" and correct, but that's missing the forest for the trees: how do you know which one is the right choice long-term, given what we know? That's the engineering part of building software, as opposed to the "coding", which tends to be the easy part.
How do you evaluate, score, and/or benchmark something like that? Currently I don't think we have any methodology for this, probably because it's pretty subjective in the end. That's where the "creative" part of software engineering becomes more important, and it's also much harder to verify.
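To make that concrete, here is a toy illustration (hypothetical code, not from the thread): two implementations that pass exactly the same tests, so a verification-based reward scores them identically, yet they differ in long-term maintenance cost.

```python
from collections import namedtuple

Item = namedtuple("Item", "price")

def total_v1(items):
    # Direct, idiomatic implementation.
    return sum(item.price for item in items)

class PriceAccumulator:
    """Equally correct, but more surface area to maintain."""
    def __init__(self):
        self._total = 0

    def add(self, price):
        self._total += price
        return self

    def value(self):
        return self._total

def total_v2(items):
    # Same observable behavior as total_v1, routed through extra machinery.
    acc = PriceAccumulator()
    for item in items:
        acc.add(item.price)
    return acc.value()
```

Any test suite that checks outputs will accept both, which is exactly why "coding is verifiable" doesn't by itself settle the engineering question.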
14 replies →
But the issue isn't coding, it's doing the right thing. I don't see anywhere in your plan a way of staying aligned with core business strategy, forethought, etc.
The number of devs will shrink, but there will still be large bodies of work that can't be farmed out without an overall strategy.
5 replies →
[dead]
I love everything about this direction except for the insane inference costs. I don’t mind the training costs, since models are commoditized as soon as they’re released. Although I do worry that if inference costs drop, the companies training the models will have no incentive to publish their weights because inference revenue is where they recuperate the training cost.
Either way… we badly need more innovation in inference price per performance, on both the software and hardware side. It would be great if software innovation unlocked inference on commodity hardware. That’s unlikely to happen, but today’s bleeding edge hardware is tomorrow’s commodity hardware so maybe it will happen in some sense.
If Taalas can pull off burning models into hardware with a two month lead time, that will be huge progress, but still wasteful because then we’ve just shifted the problem to a hardware bottleneck. I expect we’ll see something akin to gameboy cartridges that are cheap to produce and can plug into base models to augment specialization.
But I also wonder if anyone is pursuing some more insanely radical ideas, like reverting back to analog computing and leveraging voltage differentials in clever ways. It’s too big brain for me, but intuitively it feels like wasting entropy to reduce a voltage spike to 0 or 1.
Inference costs at least seem like the thing that is easiest to bring down, and there's plenty of demand to drive innovation. There's a lot less uncertainty here than with architectural/capability scaling. To your point, tomorrow's commodity hardware will solve this for the demands of today at some point in the future (though we'll probably have even more inference demand then).
> I love everything about this direction except for the insane inference costs.
If this direction holds, the ROI is better.
Instead of employing 4 people (customer support, PM, eng, marketing), you will have 3-5 agents, and the whole ticket flow might cost you ~$20.
But I hope we won't go that far, because when things fail, every customer will be impacted, and there will be no one left who understands the system well enough to fix it.
I worry about the costs from an energy and environmental impact perspective. I love that AI tools make me more productive, but I don't like the side effects.
The environmental impact of AI is greatly overstated. The average person will make a bigger positive impact on the environment by reducing their meat intake by 25% than by giving up flying and AI use combined.
1 reply →
[dead]
This is the wrong way to see it. If a technology gets cheaper, people will use more and more and more of it. If inference costs drop, you can throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.
> throw way more reasoning tokens and a combination of many many agents to increase accuracy or creativity and such.
But this just isn't true; otherwise companies that can already afford such high prices would have outpaced their competitors by now.
1 reply →
I mean, theoretically, if there are many competitors, the cost of the product should generally drop because of competition.
Sadly, I have not seen this happen in a long time.
I think that as a user I'm so far removed from the actual (human) creation of software that, if I think about it, I don't really care either way. Take for example this article on Hacker News: I am reading it in a custom app someone programmed, which pulls articles hosted on Hacker News, which themselves sit on some server somewhere, and everything gets transported across wires according to a specification. For me, this isn't some impressionist painting or heartbreaking poem; the entity that created those things is so far removed from me that it might as well be artificial already. And that's coming from a kid of the 90s with some knowledge of cyber security, so I could potentially look up the documentation and maybe even the source code for the things I mentioned, if I were interested.
Art is and has always been about the creator.
I don't want software that is built to be art. I want software that is built to provide facilities.
1 reply →
Take a walk in any museum, I'm pretty sure you'll react to some of the art displayed there and find it cool before you read the name of the artist.
3 replies →
We haven’t been inching closer to users writing a half-decent ticket in decades though.
Solutions like https://bugherd.com/ might make the issue context capture part more accurate.
Maybe the agent can ask the user clarifying questions. Even better if it could do it at the point of submission.
Feedback loops like that would be an exercise in raising garbage-in->garbage-out to exponential terms.
It's the "robots will just build/repair themselves" trope, but the robots are agents.
Yes. Next they'll want nanobots that build/repair themselves.
Oh wait. That's already here and is working fine.
Trusted user, like Jia Tan.
I think Anthropic will launch backend hosting off the back of their Bun acquisition very soon. It makes sense to basically run your entire business out of Claude, and share bespoke apps built by Claude Code for whatever your software needs are.
100% it's going to happen. OpenAI will do the same; there were already rumors about them building an internal "GitHub", which is a stepping stone toward that. It's also a requirement for completing the lock-in, the dream for these companies.
Ha, I just spec'd out a version of this. I have a simple static website that I want a few people to be able to update.
So, we will give these 3 or 4 trusted users access to an on-site chat interface to request updates.
Next, a dev environment is spun up, the agent makes the changes, creates a PR, and sends a branch preview link back to the user.
Sort of an agent driven CMS for non-technical stakeholders.
Let’s see if it works.
I think some types of tickets can be done like this, but your "trusted user" assumption does a lot of work here. I don't see this getting better than that with the current architecture of LLMs: you can add all sorts of feedback mechanisms, which help, but since LLMs are not conscious, drift is unavoidable unless there is a human in the loop who understands and steers what's going on.
But I do think that even now, certain types of CRUD apps can be largely automated. And that's a fairly large part of our profession.
Users are often incorrect about what the software should actually be doing and don’t see the bigger picture.
In the past three weeks, a couple of projects I follow have implemented AI tools with their own GitHub accounts, which have been doing exactly this. And they appear to be doing good work! Dozens of open issues iterated on, tested, and closed. At one point I had almost 50 notifications for one project's backlog being eradicated in 24 hours. The maintainer reviewed all of it, and some were not merged.
I don't know if this is the future, but if it is, why bother building one version of the software for everyone? We can have agents build the website for each user exactly the way they want. That would be the most exciting possibility to come out of AI-generated software.
"why bother building one version of the software for everyone?"
So that one user's experience is relevant to another's, and they can learn from one another?
What kind of software are people building where AI can just one shot tickets? Opus 4.6 and GPT 5.4 regularly fail when dealing with complicated issues for me.
GPT 5.4 sometimes straight up dies with broken API responses, let alone when it struggles with an even moderately complex task.
I still can't get a good mental model for when these things will work well and when they won't. Really does feel like gambling...
Not just complicated ones, but even simple ones, if the current software uses a pattern "new" enough that they've never seen it or trained on it.
I don't know if Rust async or native platform APIs that have existed for years count as new patterns, but if you throw even a small wrench in the works, they really struggle. That's to be expected, really, when you look at what the technology is: it's kind of insane we've even gotten to this point with what amounts to fancy autocomplete.
Of course, not all tickets are complex. Last week I had a ticket to display the update date on a blog post next to the publish date. A perfect use case for AI to one-shot.
I don't see anyone sane trusting AI to this degree any time soon, outside of web dev. The chances of this strategy failing are still well above acceptable margins for most software, and in safety-critical instances it will be decades before standards allow such adoption. Anyway, we are paying pennies on the dollar for compute at the moment: as soon as the gravy train stops rolling, all this intelligence will be out of access for most humans, unless some more efficient generalizable architecture is identified.
> as soon as the gravy train stops rolling, all this intelligence will be out of access for most humans. unless some more efficient generalizable architecture is identified.
All Chinese labs have to do to tank the US economy is to release open-weight models that can run on relatively cheap hardware before AI companies see returns.
Maybe that's why AI companies are looking to IPO so soon, gotta cash out and leave retail investors and retirement funds holding the bag.
4 replies →
Several fintechs like Block and Stripe are boasting thousands of AI-generated PRs with little to no human reviews.
Of course it's in the areas where it doesn't matter as much, like experiments, internal tooling, etc, but the CTOs will get greedy.
3 replies →
Even in webdev it rots your codebase unchecked. Although it's incredibly useful for generating UI components, which makes me a very happy webslopper indeed.
1 reply →
[dead]
I know a company already operating like this in the fintech space. I foresee a front page headline about their demise in their future.
The missing piece for me is post-hoc review.
A PR tells me what changed, but not how an AI coding session got there: which prompts changed direction, which files churned repeatedly, where context started bloating, what tools were used, and where the human intervened.
I ended up building a local replay/inspection tool for Claude Code / Cursor sessions mostly because I wanted something more reviewable than screenshots or raw logs.
I don't mean this as shade, but people who are not coders now seem to think "coding is solved" and are pushing absurd ideas like shipping software via Slack messages. These people are often high up the chain and have never done serious coding.
Stripe is apparently pushing a gazillion PRs from Slack now, but their feature velocity has not changed. So what gives?
How is it that the number of PRs is now the primary metric of productivity, and no one cares about what is being shipped or whether we are shipping product faster? It's total madness right now. Everyone has lost their collective minds.
I ask myself the same question.
I'm not seeing the apps, SaaS, and other tools I use getting better, with either more features or fewer bugs.
Whatever is being shipped, as an end user, I'm just not seeing it.
CTOs and CEOs are now feeling insane pressure to show how they are using AI, but it's not evident in the output. So now they've resorted to blathering publicly about PRs, lines of code, etc. to save face. And of course the people giving them a voice and a platform have their own agendas that prevent them from asking, "So what exactly has Stripe shipped from a million PRs a day?"
It's baffling to see these comments on Hacker News, though. I guess you have to prove you're not a Luddite by making "AI-forward" predictions and showing that you "get it".
I think a lot of SWE roles are really bullshit jobs (1), and those have been particularly susceptible to getting sniped by AI tools.
(1) https://en.wikipedia.org/wiki/Bullshit_Jobs
Or perhaps we end up in a world where all software is self-evolving via agents… adjusting dynamically to meet the user's needs.
The "user" being the one that's in charge of the AI, not the person on the receiving end.
Instead of having a trusted user, you can also do statistics on many users.
(That's basically what A/B testing is about.)
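The statistical gate in an A/B test is simple enough to sketch; here is a standard two-proportion z-test with pooled variance, using hypothetical conversion numbers:

```python
import math

def two_proportion_z(conversions_a, n_a, conversions_b, n_b):
    """Z-statistic for the difference between two conversion rates,
    using a pooled variance estimate."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant B converting 120/1000 vs. A's 100/1000 gives z ≈ 1.43,
# short of the conventional 1.96 threshold for 95% confidence.
z = two_proportion_z(100, 1000, 120, 1000)
```

Whether a change ships could then hinge on crossing a threshold like this rather than on any single trusted user's judgment.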
The "trusted user" can also be an agent.
What you're describing is absolutely where we're headed.
But the entire SWE apparatus can be handled too: automated A/B testing of the feature, progressive-exposure deployment of changes, you name it.
Haha sure, let's just let every user add their feedback to the software.
I think the AI agent will directly make a PR; tickets are for humans with limited mental capacity.
At least in my company we are close to that flywheel.
Tickets need to exist purely from a governance perspective.
Tickets may well not look like they do now, but some semblance of them will exist. I'm sure someone is building that right now.
No. It's not Jira.
Yes, and my point is that PRs act as that governance layer: with preview environments, you can see the complexity and risk of the change, etc.
The agents have even more limited capacity
At the moment, maybe. But it's growing.
2 replies →
I am already there with a project/startup with a friend. He writes up an issue in GitHub and there is a job that automatically triggers Claude to take a crack at it and throw up a PR. He can see the change in an ephemeral environment. He hasn't merged one yet, but it will get there one day for smaller items.
I am already at the point where because it is just the two of us, the limiting factor is his own needs, not my ability to ship features.
Must be nice working on simple stuff.
Why doesn’t he merge them?
He's not technical but a product guy, so he still wants me to check it over.
We do feedback to ticket automatically
We dont have product managers or technical ticket writers of any sort
But we devs are still choosing how to tackle each ticket. We don't strictly have to, since I'm solving the tickets with AI anyway; I could automate my job away if I wanted. But I wouldn't trust the result: I give a degree of input and steering, and there are bigger-picture considerations it's not good at juggling, for now.
Then it sets up telemetry and experiments with the change. Then, if the data looks good, an agent ramps it up to more users or removes it.
> I feel like we are just inching closer and closer to a world where rapid iteration of software will be by default.
There's a lot of experimentation right now, but one thing that's guaranteed is that the data gatekeepers will slam the door shut[1], or install a toll booth once there's less money sloshing about and the winners and losers are clear. At some point in the future, Atlassian and GitHub may not grant Anthropic access to your tickets unless you're on the relevant tier with the appropriate "NIH AI" surcharge.
1. AI does not suspend or supplant good old capitalism and the cult of profit maximization.
[dead]
Um, we are already there...