
Comment by maccard

19 hours ago

So which apps are seeing 10x the bug fixes and improvements in stability and quality? From my side, I see one-shot CRUD apps, and platforms like AWS and Windows actively deteriorating, to the point of causing massive outages and needing to have development processes changed [0]. Who is actually shipping 10x more stuff, or fixing 10x more bugs?

[0] https://arstechnica.com/ai/2026/03/after-outages-amazon-to-m...

I "pair" with claude-code and still write 30% by hand, with additional review with gpt-5.4, but I definitely write fewer bugs than before. I'd estimate my speedup to be 2x.

The automation bias issue is something that has been raised by many people like myself but mostly ignored. The better models get, the worse that problem will get, but IMHO the implications of the claims are not on the code generation side.

The sandwich story in the model card is the bigger issue.

LLMs have always been good at finding a needle in a haystack, if not a specific needle. It sounds like they are claiming a dramatic increase in that ability.

This will dramatically change how we write and deliver software, which has traditionally been based on the idea of well-behaved, non-malfeasant software with a fix-as-you-go security model.

While I personally find value in the tools as tools, they specifically find a needle and fundamentally cannot find all of the needles that are relevant.

We will either have to move to some form of zero trust model or dramatically reduce connectivity and move to much stronger forms of isolation.

As someone who was trying to document and share a way of improving container isolation that was compatible with current practices, I think I need to readdress that.

VMs are probably a minimum requirement for my use case now, and, if verified, this new model will dramatically impact developer productivity due to increased constraints.

Due to competing use cases and design choice constraints, none of the namespace-based solutions will be safe if even trusted partners start to use this model.

How this lands in the long run is unclear. Perhaps we only allow smaller models, with less impact on velocity and less essential complexity, etc.

But the ITS model of sockets, etc. will probably be dead for production instances.

I hope this is marketing or aspirational, to be honest. It isn’t AGI, but it will still be disruptive if it is even close to reality.