Comment by ekidd
9 hours ago
No amount of testing will save a large program with a dogshit architecture. Roughly, this is because coverage grows linearly with the number of tests, but weird interactions grow exponentially with code size.
This might be fine if you're building a tiny app, or if you're building a medium-sized app that follows a strict existing architecture (like a web app consisting mostly of forms). In which case, have fun.
But if you're building something slightly novel and interesting, then Claude is surprisingly bad at architecture and taste, and it tends to "fix" problems by spewing more slop. What you need instead is actual insight that leads to simplifying principles. Those principles, in turn, let you break up the exponential complexity into disciplined patterns, so that code complexity scales far more slowly and an essentially linear number of tests can provide coverage.
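A toy model of that scaling argument, not a measurement of any real codebase: assume a program has n modules. If any group of modules may interact, the number of possible interacting groups is every subset of size two or more. If the architecture only lets each module talk to the next one in a fixed pipeline, the number of interfaces to test is n - 1. The function names here are made up for illustration.

```python
def unrestricted_interactions(n: int) -> int:
    """Count subsets of two or more modules, assuming any group may interact.

    All subsets (2**n) minus the empty set (1) and the n single-module subsets.
    """
    return 2**n - n - 1


def pipeline_interactions(n: int) -> int:
    """Count interfaces when each module only talks to the next one in a pipeline."""
    return max(n - 1, 0)


if __name__ == "__main__":
    # Compare how the two counts grow as the number of modules increases.
    for n in (5, 10, 20, 40):
        print(n, unrestricted_interactions(n), pipeline_interactions(n))
```

Under these assumptions the first count doubles with every added module while the second grows by one, which is the gap a linear number of tests cannot close without some constraining structure.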
I actually download and try people's vibe-coded developer tools. And frankly, those tools are some of the worst software I've used in my life, worse than even Unix-vendor Motif implementations from the early 90s.
Like, I'm super happy that people can vibe-code themselves simple, one-off personal tools. That's incredibly empowering. But that doesn't mean you can build big, novel stuff the same way without a competent human actively in the loop.
> those tools are some of the worst software I've used in my life
Is the code bad, or do they not do what they claim to do? Those are very different issues.
They do what they claim to do maybe 20% of the time. The other 80% of the time is spent trying to figure out why they aren't working, why they corrupted their data, why they crash every 10 minutes, etc.
And I want to be clear that this isn't some non-technical novice vibe coding this garbage. This is often extremely talented developers with decades of experience who have apparently decided that they don't need to look at their code anymore.
You can get very good results out of AI agents. But mostly the people who get good results are the ones who still read the LLM output in detail, and who introduce the structure the LLMs are missing. But like I said, this distinction mostly becomes apparent past a certain size and novelty level.
Where do you find these apps that fail to work 80% of the time?
I must be an anomaly, because none of the vibe-coded apps I'm running 24/7 keep crashing or suddenly stop working.
Would Antirez with LLMs make the same mistakes a novice would make? You are comparing your strongest contender with my weakest contender.