It’s truly strange that people keep citing the quality of Claude Code’s leaked source as if it’s proof vibe coding doesn’t work.
If anything, it’s the exact opposite. It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked. I've worked at BigCos and startups, and a lot of the terrible code that makes it to production was shocking when I first started.
This isn't a dig at anyone, I've certainly shipped my share of bad code as well. Deadlines, despite my wishes sometimes, continue to exist. Sometimes you have to ship a hack to make a customer or manager happy, and then replacing those hacks with better code just never happens.
For that matter, the first draft of nearly anything I write is usually not great. I might just be stupid, but I doubt I'm unique; when I've written nice, beautiful, optimized code, it's usually a second or third draft, because ultimately I don't think I fully understand the problem and the assumptions I am allowed to make until I've finished the first draft. Usually for my personal projects, my first dozen or so commits will be pretty messy, and then I'll have cleanup branches that I merge to make the code less terrible.
This isn't inherently bad, but a lot of the time I am simply not given time to do a second or third draft of the code, because, again, deadlines, so my initial "just get it working" draft is what ships into production. I don't love it, and I kind of dread that some of the code with my name attached to it at BigCo might someday get leaked, but that's just how it is in the corporate world sometimes.
This is the product that's claiming "coding is a solved problem" though.
I get a junior developer or a team of developers with varying levels of experience and a lot of pressure to deliver producing crummy code, but not the very tool that's supposed to be the state-of-the-art coder.
Like, come on. Software has been shit for decades. AI hasn't observably reduced the quality of software I use every day in a way that is meaningfully separable from normal incidents in the past.
> I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked.
Absolutely. The difference is that the amount of bad code that could be generated had an upper limit on it — how fast a human can type it out. With LLMs bad code can be shat out at warp speed.
Bad code works fine until it doesn't. In my experience, with humans, doing the right thing is worth it over doing the bad thing if your time horizon is a few months. Once you're in years, absolutely do the right thing, you're actually throwing time away if you don't. And I don't mean "big refactor", I mean at-change-time, when you think "this change feels like an icky hack."
For LLMs, I don't really know. I only have a couple of years' experience with that.
And it’s perfectly okay to fix and improve the code later.
Many super talented developers I know will say “Make it work, then make it good”. I think it’s okay to do this on a bigger scale than just the commit cycle.
Yes, and to add, in case it's not obvious: in my experience the maintenance and mental (and emotional, call me sensitive) costs of bad code compound exponentially the more hacks you throw at it.
It’s also possible to sell chairs that are uncomfortable and food that tastes terrible. Yet somehow we still have carpenters and chefs; Herman Miller and The French Laundry.
Some business models will require “good” code, and some won’t. That’s how it is right now as well. But pretending that all business models will no longer require “good” code is like pretending that Michelin should’ve retired its list after the microwave was invented.
Still, talk about "good" code exists for a reason. When the code is really bad, you end up paying the price by having to spend more and more time to develop new features, with greater risk of introducing bugs. I've seen that in companies in the past, where bad code meant less stability and more time to ship features that we needed to retain customers or get new ones.
Now whether this is still true with AI, or if vibe coding means bad code no longer has this long-term stability and velocity cost because AI is better than humans at working with this bad code... We don't know yet.
Not only true but I would guess it's the normal case. Most software is a huge pile of tech debt held together by zip-ties. Even greenfield projects quickly trend this way, as "just make it work" pressure overrides any posturing about a clean codebase.
It depends on the urgency. Not every product is urgent. CC arguably was very urgent; even a day of delay meant the competitors could come out with something slightly more appealing.
1. "Vibe coding" is a spectrum of how much human supervision (and/or scaffolding in the form of human-written tests and/or specs) is involved.
2. The problem with "bad code" has nothing to do with the short-term success of the product but with the ability to evolve it successfully over time. In other words, it's about long-term success, not short-term success.
3. Perhaps most importantly, Claude Code is a fairly simple product at its core, and almost all its value comes from the model, not from its own code (and the same is true on the cost side). Claude Code is relatively a low stakes product. This means that the problems caused by bad code matter less in this instance, and they're managed further by Claude Code not being at the extreme "vibey" end of the spectrum.
So AI aside, Claude Code is proof that if you pour years and many billions into a product, it can be a success even if the code in the narrow and small UI layer isn't great.
1 is definitely false right now. I gave an LLM specs, tests, full datasets, and reference code to translate, and it still produced garbage code/fell flat on its face. I just spent one week translating a codebase from Go to C++ and I had to throw the whole thing out because it put in some horrible bugs that it could not fix, even after burning $500 worth of tokens with me babysitting it. As I said, it had everything at its disposal: tests, reference impl, lots of data to work with. I finally got my lazy ass to implement it and lo and behold I did it in 2 days with no bugs (that I know of) and the code quality is miles better than that undigested vomit. The codebase was a protocol library for decoding network traffic that used a lot of bit twiddling, flow control, Huffman table compression, mildly complicated stuff. So no - if you want working non-trivial code that you can rely on, then definitely don't use an LLM to do it. Use it for autocomplete and small bits of code, but never let the damn thing do the thinking for you.
Still, it's probably true that Claude Code (etc) will be more successful working on clean, well-structured code, just like human coders are. So short-term, maybe not such a big deal, but long-term I think it's still an unresolved issue.
I imagine it is way more affordable in terms of tokens to implement a feature in a well organized code base, rather than a hacky mess of a codebase that is the result of 30 band-aid fixes stacked on top of each other.
The traditional rules of good code are heuristics that are practical for human developers. A different set of heuristics will emerge for agentic development.
- Good code is what enables you to be able to build very complex software without an unreasonable number of bugs.
- Good code is what enables you to be responsive to changing customer needs and times. Whether you view that as valuable is another matter though. I guess it is a business decision. There have been plenty of businesses that have gone bust by neglecting that, though.
Good code is for your own sanity, the machine does not care.
This product rides a hype wave. This is why it is crazy popular and successful.
The situation there is akin to Viaweb - Viaweb also rode a hype wave and its code situation was awful as well (see PG's stories about fixing bugs during the customer's issue reproduction theater).
What did Viaweb's buyer do? They rewrote the thing in C++.
If history rhymes, then a buyer of Anthropic would do something close to "rewrite it in C++" to the current Claude Code implementation.
This is also why they had to release it quickly. They got the first mover advantage but if they delayed to make its code better, a competitor could have taken the wave instead of them.
I don't disagree with your general premise that eventually it'll just be rewritten, but I have to push back on the idea that Anthropic will be acquired. Their most recent valuation was $380B, and even if they wanted to be acquired (which I doubt) essentially no company has the necessary capital.
One truism about coding agents is that they struggle to work with bad code. Code quality matters as much as always, the experts say, and AI agents (left unfettered) produce bad code at an unprecedented rate. That's why good practices matter so much! If you use specs and test it like so and blah blah blah, that makes it all sustainable. And if anyone knows how to do it right, presumably it's Anthropic.
This codebase has existed for maybe 18 months, written by THE experts on agentic coding. If it is already unintelligible, that bodes poorly for how much it is possible to "accelerate" coding without taking on substantial technical debt.
i think you are conflating anthropic (the startup) with claude code (the leaked source of one of said startup's products)
i.e., the claude code codebase doesn't need to be good right now [^1] — so i don't think the assumption that this is an exemplary product / artifact of expert agentic coding actually holds up here specifically
[^1]: the startup graveyard is full of dead startups with good code
My understanding of OP was not a claim that "vibe coding doesn't work", but that the way Anthropic does it doesn't work. He seems to be specifically criticizing the "hands off the actual code, human" approach and advocating for keeping the human in the loop.
I do M&As at my company - as a CTO. I have seen lots of successful companies' codebases, and literally none of them were elegant. Including very profitable companies with good, loved products.
The only good code I know is in the open source domain and in the demoscene. The commercial code is mostly crap - and still makes money.
What I'm missing so far is how they produced such awful code with the same product I'm using, which definitely would have called out some of those issues.
Perhaps the problem is getting multiple vibe-coders synced up when working on a large repo.
It kind of reminds me of grammar police type personalities. They are so hung up on the fact it reads “ugly” they can’t see the message; this code powers a rapidly growing $400B company. The critics admit refactoring is easy, but fail to realize Anthropic probably knows that too and it’s just not a priority yet.
Right, and often the tested depth isn't maximum. So you slowly acclimate to worse and worse code practices if the effort needed to undo it is the same as doing it.
It works, it is popular, sure. Claude's code may be barely old enough to have suffered through its true long-term maintainability problems. They probably also haven't had a lot of rotation/attrition in their staff.
Not AI but perfect example is Cloudflare. They have implemented public suffix list (to check if a domain is valid) 10 different times in 10 different ways. In one place, they have even embedded the list in frontend (pages custom domain). You report issues, they fix that one service, their own stuff isn't even aware that it exists in other places.
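To make the duplication point concrete, here is a minimal sketch of what a single shared suffix helper could look like, so every service calls the same function instead of carrying its own copy. Everything in it is an assumption for illustration: `SAMPLE_SUFFIXES` is a tiny made-up subset rather than the real public suffix list, `registrable_domain` is a hypothetical name, and the wildcard and exception rules of the real list are not handled.

```python
# Hypothetical shared helper: one place that knows the suffix rules.
# SAMPLE_SUFFIXES is a tiny illustrative subset, not the real public suffix list,
# and the real list's wildcard/exception rules are deliberately not handled.
SAMPLE_SUFFIXES = frozenset({"com", "org", "uk", "co.uk", "github.io"})


def registrable_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the longest matching suffix plus one label, or None if no match."""
    labels = hostname.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so "co.uk" wins over "uk".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            return ".".join(labels[i - 1:])
    return None
```

With one helper like this, a bug report against suffix handling gets fixed once rather than in one of ten places.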
Meta has four different implementations of the same page to create a “page” for your business… which is required to be able to advertise on any of their services.
Each one is broken, doesn’t have working error handling, and prevents you from giving them money. They all exist to insert the same record somewhere. Lost revenue, and they seem to have no idea.
Amazon's flagship iOS app has had at least three highly visible bugs, for years. They’re like thorns in my eye sockets, every time I use it. They don’t care.
These companies are working with BILLIONS of dollars in engineering resources, unlimited AI resources, and with massive revenue effects for small changes.
It helps if the product is so revolutionary that people are willing to overlook bugs. Could you imagine a more mundane product with a TUI that flickered all the time where this wouldn't be a showstopper? I believe the bug is fixed now, but it seems crazy that it persisted so long given how obvious the cause was (clear before update). How many more bugs are in CC? As of a few weeks ago there were 5000 or so open issues against it on github.
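For anyone unfamiliar with why "clear before update" flickers, here is a rough sketch contrasting the two approaches using standard ANSI escape sequences. This is not Claude Code's renderer; `full_redraw` and `diff_redraw` are hypothetical names, and frames are assumed to be plain lists of line strings.

```python
CLEAR_SCREEN = "\x1b[2J\x1b[H"  # erase the whole screen, move cursor home


def full_redraw(frame):
    """Naive approach: wipe everything, then repaint every line."""
    return CLEAR_SCREEN + "\n".join(frame)


def diff_redraw(previous, frame):
    """Repaint only rows whose text changed (ANSI rows are 1-indexed)."""
    out = []
    for row in range(max(len(previous), len(frame))):
        old = previous[row] if row < len(previous) else None
        new = frame[row] if row < len(frame) else ""
        if old != new:
            # Move to column 1 of the row, erase that line, write the new text.
            out.append(f"\x1b[{row + 1};1H\x1b[2K{new}")
    return "".join(out)
```

The first version blanks the terminal on every update, so the user briefly sees an empty screen; the second touches only the rows that differ and emits nothing at all when the frame is unchanged.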
The success is undeniable, but whether this vibe-coded level of quality is acceptable for more general use cases isn't something you can infer from that.
Yes, that is how Facebook, Yahoo and many other companies started out. But they rewrote their code when it became too big to be maintainable. The problem with shoddy code is not necessarily that it doesn't work but that it becomes impossible to change.
It's basically shifting work to future people. This mess will stop working and will introduce unsolvable obscure bugs one day, and someone will actually have to look at it.
It already cost many developers months and hundreds of dollars' worth of tokens because of a bug. There will be more.
99.999999% of products can't get away with what Anthropic is able to - this is a one in a billion disruptive product with minimal competition, and its success so far is mostly due to Claude the model, not the agent harness
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
We already knew that. This is a matter of people who didn't know that or didn't want to acknowledge that thinking they now have proof that it doesn't matter for creating a crazy popular & successful product, as if it's a gotcha on those who advocate for good practices. When your goal is to create something successful that you can cash out, good practices and quality are/were never a concern. This is the basis for YAGNI, move-fast-and-break-things, and worse-is-better. We've known this since at least Betamax-vs-VHS (although maybe the WiB VHS cultural knowledge is forgotten these days).
TBH Claude Code is surprisingly shit to use given the technical resources and the amount of money behind it. Looking past the bugs and missing features, it's so obvious it's not built by people who care about the product from a developer/craftsman perspective. It's missing all the signs of polish/care, it feels like someone shipped an internal PoC to prod and kept hacking on it. And now they are just tacking on features to sell more buzzwords and internal prototypes. Classic user facing/commercial software story.
But we (the dev community) are kind of spoiled, because we have a lot of great developer tools that come from people passionate about their work, skilled at what they do and take pride in what they put out. I don't count myself among one of those people but I have benefited from their work throughout my career and have gotten used to it in my tooling.
All that being said Opus is hands down the best coding model for me (and I'm actively trying all of them) and I'll tolerate it as long as I can get it to do what I need, even with the warts and annoyances.
> TBH Claude Code is surprisingly shit to use given the technical resources and the amount of money behind it. Looking past the bugs and missing features, it's so obvious it's not built by people who care about the product from a developer/craftsman perspective. It's missing all the signs of polish/care, it feels like someone shipped an internal PoC to prod and kept hacking on it.
I don't wholly disagree, but personally it's still the tool I use and it's sort of fine. Perhaps not entirely for the money that's behind it, as you said, but it could be worse.
The CLI experience is pretty okay, although the auth is kinda weird (e.g. when trying to connect to AWS Bedrock). There's a permission system and sandboxing, plan mode and TODOs, decent sub-agent support, instruction files and custom skills, tool calls and LSP support and all the other stuff you'd expect. At least no weird bugs like I had with OpenCode, where trying to paste multi-line content inside of a Windows Terminal session led to the tool closing and every next line getting pasted in and executed in the terminal one by one. That was weird, though I will admit that using Windows feels messed up quite often nowadays even without stuff like that.
The desktop app gives you chat and cowork and code, although it almost feels like Cowork is really close to what Code does (and for some reason Cowork didn't seem to support non-OS drives?). Either way, the desktop app helps me not juggle terminal sessions and keeps a nice history in the sidebar, has a pretty plan display, easy ways of choosing permissions and worktrees, although I will admit that it can be sluggish and for some actions there just aren't progress indicators which feels oddly broken.
I wonder what they spend most of their time working on and why the basics aren't better, though to Anthropic's credit about a month ago the desktop Code section was borderline unusable on Windows when switching between two long conversations, which now seems to take a few seconds (which is still a few seconds too long, but at least usable).
The most obvious sign to me from the start that somebody wasn't really paying attention to how the Claude app(s) work is that on iOS, you have to leave the app active the entire time a response is streaming or it will error out.
Yes, that plus having tens of billions of gulf money certainly helps you subsidize your moronic failures with money that isn't yours while you continue to try, and fail, to achieve profitability in any time horizon within a single lifespan.
There have certainly been periods of irrational exuberance in the tech industry, but there are also many companies that were criticized for being unprofitable which are now, as far as I can tell, quite profitable. Amazon, Uber, I'm sure many more. I'm curious what the basis is to say that Anthropic could never achieve profitability? Are the numbers that bad?
It's also crazy more expensive to run than we thought. That doesn't bode well when their loss-leader period is over and they need to start making money.
"Wildly successful but unpolished product first-to-market with a new technology gets dethroned by a competitor with superior execution" is a story as old as tech.
Also, many of the complaints seem more like giddy joy than anything.
The negative emotion regex, for example, is only used for a log/telemetry metric. Sampling "wtf?" alone would probably be enough. Why would you use an agent for that?
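As a rough illustration of how cheap that kind of metric is, here is a minimal sketch of a regex-based frustration counter. The pattern and the names `FRUSTRATION` and `record_sentiment` are hypothetical; the terms are made up for the example and are not the ones in the leaked source.

```python
import re
from collections import Counter

# Hypothetical pattern: these terms are illustrative, not the leaked list.
FRUSTRATION = re.compile(r"\b(wtf|ugh|this is broken|useless)\b", re.IGNORECASE)


def record_sentiment(message, metrics):
    """Bump counters for a user message and return whether it looked frustrated."""
    hit = FRUSTRATION.search(message) is not None
    metrics["messages_total"] += 1
    if hit:
        metrics["messages_frustrated"] += 1
    return hit
```

A check like this runs locally in microseconds and only needs to be roughly right in aggregate, which is presumably why a model call would be overkill for it.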
I don't see how a vibe-coded app is freed from the same trade-offs that apply to a fast-moving human-coded one.
Especially since a human is still driving it, thus they will take the same shortcuts they did before: instead of a formal planning phase, they'll just yolo it with the agent. Instead of cleaning up technical debt, they want to fix specific issues that are easy to review, not touch 10 files to do a refactor that's hard to review. The highest priority issues are bugs and new integrations, not tech debt, just like it always was.
This is really just a reminder of how little upside there is to coding in the open.
I think the thing is that people expect one of the largest companies in the world to have well written code.
Claude’s source code is fine for a 1-3 person team. It’s atrocious for a flagship product from a company valued over $380 BILLION.
Like if that’s the best ai coding can do given infinite money? Yeah, the emperor has no clothes. If it’s not the best that can be done, then what kinda clowns are running the show over there?
I read these posts and I wonder how many people are this delusional or dishonest. I have been a programmer for 40 years, and in most companies 90% of coders are so-called Stack Overflow coders or Google coders. Every coder who is honest will admit it, and AI is already better than those 90%. FAR better.
At least most influencer coders are starting to admit the fact that the code is actually awesome, if you know what you are doing.
I am more of a code reviewer and I plan the implementation, which is far more exciting than writing the code itself. I have the feeling most feel the way I do, but there are still those Stack Overflow coders who are afraid to lose their jobs. And they will.
Because or in spite of? Claude code works because of Claude being good and network effects. Agentic coding tools are maybe the dumbest code ever for the level of popularity they have.
While being in the center of a hype vortex which basically suspends market physics. But all that bad code eats server farms that are going to cost double when the bubble starts deflating.
I think this is a pretty interesting comment because it gets to the heart of differing views on what quality means.
For you, non-buggy software is important. You could also reasonably take a more business centered approach, where having some number of paying customers is an indicator of quality (you've built something people are willing to pay for!) Personally I lean towards the second camp, the bugs are annoying but there is a good sprinkling of magic in the product which overall makes it something I really enjoy using.
All that is to say, I don't think there is a straightforward definition of quality that everyone is going to agree on.
Honestly for such a powerful tool, it’s pretty damn janky. Permissions don’t always work, hitting escape doesn’t always register correctly, the formatting breaks on its own to name a few of the issues i’ve had. It’s popular and successful but it’s got lots of thorns
I can literally see my team's codebase becoming an unmaintainable nightmare in front of my eyes each day.
I use copilot and Claude code and I frequently have to throw away their massively verbose and ridiculously complex code and engage my withering brain to come up with the correct solution that is 80% less code.
I probably get to the solution in the same time when all is said and done.
Honestly what is going on. What are we doing here?
Hardly. Claude Code is basically just a wrapper around an LLM with a CLI.
Obviously it does some fairly smart stuff under the hood, but it's not exactly comparable to a large software project.
But to your point, that doesn't mean you can't vibe code some poorly built product and sell it. But people have always been able to sell poorly built software projects. They can just do it a bit quicker now.
>Hardly. Claude Code is basically just a wrapper around an LLM with a CLI.
I don't know why people keep acting like harnesses are all the same but we know they aren't because people have swapped them out with the same models and receive vastly different results in code quality and token use.
This is a really wrong perspective on software. Short term monkey style coding does not produce products. You might get money but that is not what it is about.
This is similar to reckless builders in Turkey saying “wow, I can make the same building, sell for the same price, but spend way less” and then millions of people becoming victims when there is an earthquake.
This is not how responsible people should think about things in society
> This is a really wrong perspective on software. Short term monkey style coding does not produce products. You might get money but that is not what it is about.
Getting money is 100% what it is about and Claude Code is a great product.
> This is a really wrong perspective on software. Short term monkey style coding does not produce products. You might get money but that is not what it is about
You're not alone in thinking that, but unfortunately I think it's a minority opinion. The only thing most people and most businesses care about is money. And frankly not even longterm, sustainable money. Most companies seem happy to extract short term profits, pay out the executives with big bonuses, then rot until they collapse
There is already lots of popular software that violates any concept of good software. Facebook Messenger, Instagram, Twitter, Minecraft, balenaEtcher, the original Ethereum wallet, almost anything that uses Electron...
Except for the part where it's constantly having quality and reliability issues, even independent of the server-side infrastructure (OOMs on long running tasks, etc).
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
That was always the case. Landlords still want rent, the IRS still has figurative guns. Shipping shit code to please these folks and keep the company alive will always win over code quality, unless the system can be edited to financially incentivize code quality. The current loss function on society is literally "ship shit now and pay your taxes and rent".
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
The product is also a bit wonky and doesn't always provide the benefits it's hyped for. It often doesn't even produce any result for me, just keeps me waiting and waiting... and nothing happens, which is what I expect from a vibe coded app.
Yes, just get hundreds of billions of dollars in investments to build a leading product, and then use your massive legal team to force the usage of your highly subsidised and marketed subscription plan through your vibe coded software. This is excellent evidence that code doesn't matter.
> Yes, just get hundreds of billions of dollars in investments to build a leading product, and then use your massive legal team to force the usage of your highly subsidised and marketed subscription plan through your vibe coded software.
What? Your comment makes absolutely zero sense. Legal team forces people to use Claude Code?
I don't think anyone who used Claude code on the terminal had anything good to say about it. It was people using it through vs code that had a good time.
I have used Claude Code in the terminal to the tune of ~20m tokens in the last month and I have very little to complain about. There are definitely quirks that are annoying (as all software has, including vs code or jetbrains IDEs) but broadly speaking it does what it says on the tin ime
I prefer using it via the terminal. Might be anchoring bias, but I have had issues with slash commands not registering and hooks not working in the plugin.
> That wouldn’t even be a big violation of the vibe coding concept. You’re reading the innards a little but you’re only giving high-level, conceptual, abstract ideas about how problems should be solved. The machine is doing the vast majority, if not literally all, of the actual writing.
Claude Code is being produced at AI Level 7 (Human specced, bots coded), whereas the author is arguing that AI Level 6 (Bots coded, human understands somewhat) yields substantially better results. I happen to agree, but I'd like to call out that people have wildly different opinions on this; some people say that the max AI Level should be 5 (Bots coded, human understands completely), and of course some people think that you lose touch with the ground if you go above AI Level 2 (Human coded with minor assists).
It's also a context-specific scale. I work in computer vision. Building the surrounding app, UI, checkout flow, etcetera is easily Level 6/7(sorry...) on this scale.
Building the rendering pipeline, algorithms, maths, I've turned off even level 2. It is just more of a distraction than it's worth for that deep state of focus.
So I imagine at least some of the disconnect comes from the area people work in and its novelty or complexity.
This is exactly true in my experience! The usefulness of AI varies wildly depending on the complexity, correctness-requirements, & especially novelty of the domain.
This attribute plus a bit of human tribalism, social echo-chambering, & some motivated reasoning by people with a horse in the race, easily explains the discord I see in rhetoric around AI.
I like this framing, but it does seem to imply that a whole dev shop, or a whole product, can or should be built at the same level.
The fact is, I think the art of building well with AI (and I'm not saying it's easy) is to have a heterogeneously vibe-coded app.
For example, in the app I'm working on now, certain algorithmically novel parts are level 0 (I started at level 1, but this was a tremendously difficult problem and the AI actually introduced more confusion than it provided ideas.)
And other parts of the app (mostly the UI in this case) are level 7. And most of the middleware (state management, data model) is somewhere in between.
Identifying the appropriate level for a given part of the codebase is IMO the whole game.
100% agree. Velocity at level 8 or even 7 is a whole order of magnitude faster than even level 5. Like you said, identifying the core and letting everything else move fast is most of the game. The other part is finding ways to up the level at which you’re building the core, which is a harder problem.
I'm at a 5, and only because I've implemented a lot of guardrails, am using a typed functional language with no nulls, TDD red/green, and a good amount of time spent spec'ing. No way I'd be comfortable enough this high with a dynamic language.
I could probably get to a 7 with some additional tooling and a second max 20 account, but I care too much about the product I'm building right now. Maybe for something I cared less about.
IMO if you're going 7+, you might as well just pick a statically typed and very safe (small surface area) language anyways, since you won't be coding yourself.
You aren't leveling up here... these levels are simple measures of how you use the tools to do something. You can regularly do things from any level or multiple levels at the same time.
That's an interesting list. I think that the humans that will make the most progress in the next few years are the ones that push themselves up to the highest level of that list. Right now is a period of intense disruption and there are many coders that don't like the idea that their way of life is dead. There are still blacksmiths around today but for the most part it's made by factories and cheap 3rd world labor. I think the same is currently happening with coding, except it will allow single builders and designers to do the same thing as an entire team 5 years ago.
> I think the same is currently happening with coding, except it will allow single builders and designers to do the same thing as an entire team 5 years ago.
This part of your post I think signals that you are either very new or haven't been paying attention; single developers were outperforming entire teams on the regular long before LLMs were a thing in software development, and they still are. This isn't because they're geniuses, but rather because you don't get any meaningful speedup out of adding team members.
I've always personally thought there is a sweet spot at about 3 programmers where you still might see development velocity increase, but that's probably wrong and I just prefer it to not feel too lonely.
In any case teams are not there to speed anything up, and anyone who thinks they are is a moron. Many, many people in management are morons.
At work I am at level 4, but my side projects have embarrassingly crept into Level 6. It is very tempting to accept the features as is, without taking the time to understand how they work.
> some people say that the max AI Level should be 5
> of course some people think that you lose touch with the ground if you go above AI Level 2
I really think that this framing sometimes causes a loss of granularity. As with most things in life, there is nuance in these approaches.
I find that nowadays for my main project, where I am really leaning into the 'autonomous engineering' concept, AI Level 7 is perfect - as long as it is qualified through rigorous QA processes on the output (i.e. it is not important what the code does if the output looks correct). But even in this project, where I am really leaning into the AI 'hands-off' methodology, there are a few areas that dip into Level 5 or 4 depending on how well AI does them (Frontend Design especially) or on the criticality of the feature (in my case E2EE).
The most important thing is recognizing when you need to move 'up' or 'down' the scale and having an understanding of the system you are building.
Thanks for that list of levels, it's helpful to understand how these things are playing out and where I'm at in relation to other engineers utilizing LLM agents.
I can say that I feel comfortable at approximately AI level 5, with occasional forays to AI level 6 when I completely understand the interface and can test it but don't fully understand the implementation. It's not really that different from working on a team, with the agent as a team member.
To clarify, does this mean that Anthropic employees don't understand Claude Code's code since it's level 7? I've got to believe they have staff capable of understanding the output and they would spend at least some time reviewing code for a product like this?
I’m not sure I believe that Level 7 exists for most projects. It is utterly *impossible* for most non-trivial programs to have a spec that doesn’t have deep, carnal knowledge of the implementation. It can not be done.
For most interesting problems the spec HAS to include implementation details and architecture and critical data structures. At some point you’re still writing code, but in a different language, and it might actually have been better to just write the damn struct declarations by hand and then let AI run with it.
I agree, I'm venturing into Level 6 myself and it often feels like being one step too high on a ladder. Level 7 feels like just standing on the very top of the ladder, which is terrifying (to me anyway as an experienced software engineer).
Given his background, you'd think he'd know that he should provide some evidence for his position (instead of making this completely unsupported rant).
My favorite use of Claude Code is to do code quality improvements that would be seen as a total waste of time if I was doing them by hand, but are perfectly fine when they are done mostly for free. Looking for repetitive patterns in unit tests/functional tests. Making sure that all json serialization is done in similar patterns unless there's a particularly good reason. Looking for functions that are way too complicated, or large chunks of duplication.
The PRs that it comes up with are rarely even remotely controversial, shrink the codebase, and are likely saving tokens in the end when working on a real feature, because there's less to read, and it's more boring. Some patterns are so common you can just write them down, and throw them at different repos/sections of a monorepo. It's the equivalent of linting, but at a larger scale. Make the language hesitant enough, and it won't just be a steamroller either, and will mostly fix egregious things.
But again, this is the opposite of the "vibe coding" idea, where a feature appears from thin air. Vibe Linting, I guess.
Absolutely. I've got a nice multi-paragraph prompt on hunting for subtle bugs, user expectation breaks, crufty/repeated code, useless tests (six tests that actually should be one logical flow; assertions that a ternary is still, indeed, a ternary; etc.), documentation gaps, and a few other bits and bobs.
I sic Opus, GPT5.4, and Gemini on it, have them write their own hitlists, and then have a warden Opus instance go and try to counterprove the findings, and compose a final hitlist for me, then a fresh context instance to go fix the hitlist.
They always find some little niggling thing, or inconsistency, or code organization improvement. They absolutely introduce more churn than is necessary into the codebase, but the things they catch are still a net positive, and I validate each item on the final hitlist (often editing things out if they're being overeager or have found a one in a million bug that's just not worth the fix (lately, one agent keeps getting hung up on "what if the device returns invalid serial output" in which case "yeah, we crash" is a perfectly fine response)).
It’s so strange. I think there’s a few different groups:
- Shills or people with a financial incentive
- Software devs that either never really liked the craft to begin with or who have become jaded over time and are kind of sick of it.
- New people that are actually experiencing real, maybe over-excitement about being able to build stuff for the first time.
Forgetting the first group as that one is obvious.
I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason. Software work has become a grind for them and vibe coding is actually a relief.
Group 3 I think are mostly the non-coders who are genuinely feeling that rush of being able to will their ideas into existence on a computer. I think AI-assisted coding could actually be a great on-ramp here and we should be careful not to shit on them for it.
You’re missing the group of high performers who love coding, who just want to bring more stuff in the world than their limited human brains have the energy or time to build.
I love coding. I taught myself from a book (no internet yet) when I was 10, and haven’t stopped for 30 years. Turned down becoming a manager several times. I loved it so much that I went through an existential crisis in February as I had to let go of that part of my identity. I seriously thought about quitting.
But for years, it has been so frustrating that the time it took me to imagine roughly how to build something (10-30 minutes depending on complexity) was always dwarfed by the amount of time it took to grind it out (days or sometimes weeks). That’s no longer true, and that’s incredibly freeing.
So the game now is to learn to use this stuff in a way that I enjoy, while going faster and maintaining quality where it matters. There are some gray beards out there who I trust who say it’s possible, so I’m gonna try.
Good point and I’m exactly at the same point as you with this. Working on letting go of the idea (and to be honest just the habit) that it’s somehow ‘cheating’ at the moment.
Yes I'm exactly like you as well. I've been coding for 30+ years, I still love coding and system building etc, but sometimes the level of frustration to find the information and then get something working is simply too high.
Over a weekend, I used ChatGPT to set up Prometheus and Grafana and added node exporters to everything I could think of. I even told ChatGPT to create NOC-style dashboards for me, given the metrics I gave it. This is something that would have taken several painstaking weeks, if not more, to figure out, and it's something I've been wanting to do but the cognitive load and anticipatory frustration were too high for me to start. I love how it enables me to just do things.
My next step is to integrate some programs that I wrote that I still use every day to collect data and then show it on the dashboards as well.
On a side note, I don't know why Grafana hasn't more deeply integrated with AI. Having to sift through all the ridiculous metrics that different node exporters advertise with no hint of naming convention makes using Grafana so much harder. I cut and pasted all the metrics and dumped them into ChatGPT and told it to make the panels I wanted (e.g. "Give me a dashboard that shows the status of all my servers" and it's able to pick and choose the correct metrics across my Windows server, Macbooks and studio, my Linux machines, etc), but Grafana should have this integrated directly into the product.
I don’t think that is true. I know several very high-performing engineers (some who could have retired a long time ago and are just in it for the love of the game) who use AI prolifically, without lowering any bars, and just deliver a lot more work.
> I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason.
I think it's easy to dismiss that group, but the truth is there was a lot of flux in our industry in the last decade before AI, and I would say almost none of it was beneficial in any way whatsoever.
If I had more time I could write an essay arguing that the 2010s in software development was the rise of the complexity for complexity's sake that didn't make solving real world problems any easier and often massively increased the cost of software development, and worse the drudgery, with little actually achieved.
The thought leaders were big companies who faced problems almost no-one else did, but everyone copied them.
Which led to an unpleasant coding environment where you felt like a hamster spinning in a wheel, constantly having to learn the new hotness or you were a dinosaur just to do what you could already do.
Right now I can throw a wireframe at an AI and poof it's done, react, angular, or whatever who-gives-a-flying-sock about the next stupid javascript framework it's there. Have you switched from webpack to vite to bun? Poof, AI couldn't care less, I can use whatever stupid acronym command line tool you've decided is flavour of the month. Need to write some Lovecraftian-inspired yaml document for whatever dumbass deploy hotness is trending this week? AI has done it and I didn't have to spend 3 months trying to debug whatever stupid format some tit at netflix or amazon or google or meta came up with because they literally had nothing better to do with their life and bang my head against the wall when it falls over every 3 weeks but management are insisting that k8s is the only way to deploy things.
That in itself feels like second-system syndrome but instead of playing out over a single software project it’s the large-scale version playing out over the entire industry.
> I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason.
I say this kindly, but are you sure that _you_ aren't the one in group 2, and _they_ aren't the ones learning new things?
A lot of the discourse around ai coding reminds me of when I went to work for a 90s tech company around 2010 and all the linux guys _absolutely refused_ to learn devops or cloud stuff. It sucks when a lifetime of learned skills becomes devalued overnight.
That’s pretty fair, I’m currently in the “trying to get over the feeling that it’s cheating” phase and also just haven’t formed the habit yet of reaching for AI as a tool in my toolbox; particularly in things like pre-review AI-assisted code review, which I’ve found really useful but sometimes don’t think of doing when I could.
In my opinion there are two main groups on the spectrum of "vibe coding". The non technical users that love it but don't understand software engineering enough to know what it takes to make a production grade product. The opposite are the AI haters that used chatgpt 3.5 and decided LLM code is garbage.
Both of these camps are the loudest voices on the internet, but there is a quiet but extremely productive camp somewhere in the middle that has enough optimism, open mindedness along with years of experience as an engineer to push Claude Code to its limit.
I read somewhere that the difference between vibe coding and "agentic engineering" is if you are able to know what the code does. Developing a complex website with claude code is not very different than managing a team of off shore developers in terms of risks.
Unless you are writing software for medical devices, banking software, fighter jets, etc... you are doing a disservice to your career by actively avoiding using LLMs as a tool in developing software.
I have used around $2500 in claude code credits (measured with `bunx ccusage`) in the last 6 months, and 95% of what was written is never going to run on someone else's computer, yet I have been able to get ridiculous value out of it.
These kinds of comments are so spectacularly useless. It was almost impossible to measure productivity gains from _computers_ for nearly two decades after they started being deployed to offices in the 1980s.
There were articles as late as the late 1990s that suggested that investing in IT was a waste of money and had not improved productivity.
You will not see obvious productivity gains until the current generation of senior engineers retires and you have a generation of developers who have only ever coded with AI, since they were in school.
This is nearly as dumb as the post that "Claude code is useless because your home built "Slack App" won't be globally distributed, with multi-primary databases and redis cache layer... and won't scale beyond 50k users".
As if 97% of web apps aren't just basic CRUD with some integration to another system if you are lucky.
Distributing an app to 100 users inside an enterprise is already a hellish nightmare and I'm pretty convinced that citizen developers will never be a thing - we'll sooner reach the singularity.
I think that citizen developers will be a thing--but not in the way you might be thinking.
More people will be enabled (and empowered) to "build" quick-and-dirty solutions to personal problems by just talking to their phone: "I need a way to track my food by telling you what I ate and then you telling me how much I have left for today. And suggest what my next meal should be."
In the current paradigm--which is rapidly disappearing--that requires a UI app that makes you type things in, select from a list, open the app to see what your totals are, etc. And it's a paid subscription. In 6 months, that type of app can be ancient history. No more subscription.
So it's not about "writing apps for SaaS subscribers." It's about not needing to subscribe to apps at all. That's the disruption that's taking place.
Crappy code, maintenance, support, etc.--no longer even a factor. If the user doesn't like performance, they just say "fix ___" and it's fixed.
What subscription apps can't be replaced in this disruption? Tell me what you think.
When you move to the enterprise layer, suddenly you get the opposite problem, you have a low amount of "users" but you often need a load of CPU intensive or DB intensive processing to happen quickly.
One company I worked for had their system built by, ummmm, not the greatest engineers and were literally running out of time in the day to run their program.
Every client was scheduled over 24 hours, and they'd got to running the program for 22 hours per day and were desperately trying to fix it before they ran out of "time". They couldn't run it in parallel because part of the selling point of the program was that it amalgamated data from all the clients.
Without seeing more this seems like it could be solved by not recomputing the entire history to add on data. Depends what kind of math you are doing however.
Some sort of check point system could likely save significant IO.
What am I missing that requires you to recompute all data every day?
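A minimal sketch of the checkpoint idea, assuming the math is a decomposable aggregate (a per-client sum here) and that a day's data is final once it has been checkpointed. `Checkpoint`, `update`, and the `(day, client, amount)` record shape are made up for illustration; nothing here reflects how the actual system worked, and it would not apply if the computation genuinely needs the full history each run.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Running totals plus the last day already folded into them."""
    last_day: int = -1
    totals: dict[str, float] = field(default_factory=dict)


def update(cp: Checkpoint, records: list[tuple[int, str, float]]) -> Checkpoint:
    """Fold only records newer than the checkpoint into the running totals.

    Assumption: records are (day, client, amount) and days up to
    cp.last_day are closed, so they can be skipped safely.
    """
    totals = dict(cp.totals)  # copy so the old checkpoint stays usable
    last_day = cp.last_day
    for day, client, amount in records:
        if day <= cp.last_day:
            continue  # already accounted for in the checkpoint
        totals[client] = totals.get(client, 0.0) + amount
        last_day = max(last_day, day)
    return Checkpoint(last_day, totals)


# Day 0 is processed once; on day 1 only the new record is touched.
history = [(0, "a", 1.0), (0, "b", 2.0), (1, "a", 3.0)]
cp = update(Checkpoint(), history[:2])
cp = update(cp, history)
```

The amalgamated view across clients is still a single set of totals; the saving is that each daily run reads only what arrived since the last checkpoint.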
This reminds me of Clayton Christensen's theory of disruption.
Disruption happens when firms are disincentivized to switch to the new thing or address the new customer because the current state of it is bad, the margins are low. Intel missed out on mobile because their existing business was so excellent and making phone chips seemed beneath them.
The funny thing is that these firms are being completely rational. Why leave behind high margins and your excellent full-featured product for this half-working new paradigm?
But then eventually, the new thing becomes good enough and overtakes the old one. Going back to the Intel example, they felt this acutely when Apple switched their desktops to ARM.
For now, Claude Code works. It's already good enough. But unless we've plateaued on AI progress, it'll surpass hand crafted equivalents on most metrics.
This isn’t the narrative, at least in any circle I speak to. The narrative is currently that everyone needs to strive to be using hundreds of dollars of tokens a day or you aren’t being effective enough. Executives are mulling getting rid of code review and tests. I’ve never seen such blind optimism and so little appreciation for how things can go wrong.
Vibe coders' argument* is that quality of code does not matter because LLMs can iterate much much faster than humans do.
Consider this overly simplified process of writing logic to satisfy a requirement:
1. Write code
2. Verify
3. Fix
We, humans, know the cost of each step is high, so we come up with various ways to improve code quality and reduce cognitive burden. We make it easier to understand when we have to revisit.
On the other hand, LLMs can understand** a large piece of code quickly***, and in addition, compile and run with agentic tools like Claude Code at the cost of tokens****. Quality does not matter to vibe coders if LLMs can fill the function logic that satisfies the requirement by iterating the aforementioned steps quickly.
I don't agree with this approach and have seen too many things broken from vibe code, but perhaps they are right as LLMs get better.
* Anecdotal
** I see an LLM as just a probabilistic function so it doesn't "reason" like humans do. It's capable of highly advanced problem solving yet it also fails at primitive tasks.
*** Relative to human
**** The cost of tokens, I believe, is cheap relative to a full-time engineer and it'll get cheaper over time.
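The write/verify/fix steps above can be sketched as a plain loop. This is only an illustration of the argument, not how any particular agent is implemented: `iterate_until_passing`, `write`, and `verify` are hypothetical names, with `write` standing in for whatever produces code (an LLM call, say) and `verify` for whatever checks it (a test run).

```python
from typing import Callable, Optional


def iterate_until_passing(
    write: Callable[[Optional[str], Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> tuple[Optional[str], int]:
    """Repeat write -> verify -> fix until verify reports no error.

    write(previous_code, error) returns new code; verify(code) returns
    None on success or an error message. Returns (code, rounds_used),
    or (None, max_rounds) if nothing passed within the budget.
    """
    code: Optional[str] = None
    error: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        code = write(code, error)  # step 1 on the first round, step 3 after
        error = verify(code)       # step 2
        if error is None:
            return code, round_no
    return None, max_rounds


# Stand-ins: the first attempt fails verification, the second passes.
def fake_write(prev: Optional[str], err: Optional[str]) -> str:
    return "v1" if prev is None else "v2"


def fake_verify(code: str) -> Optional[str]:
    return None if code == "v2" else "test failed"


result = iterate_until_passing(fake_write, fake_verify)
```

The loop says nothing about the quality of what comes out; it only stops when `verify` stops complaining, which is the crux of the disagreement.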
I think it's becoming clear we're not anywhere near AGI, we figured out how to vectorize our knowledge bases and replay it back. We have a vectorized knowledge base, not an AI.
Great way of putting it. That’s clearly what it is and it’s very good at that job. But it’s insane to pretend like it can be used with minimal supervision in all or even most applications.
From a tech discourse perspective, things have never been less productive than they are right now. I feel like we’re witnessing the implosion of an industry in real time. Thanks in no small part to venture capital and its henchmen.
Everyone seems to be drinking the proverbial kool-aid, and everyone else who is looking at the situation skeptically is labeled a luddite. I expect we’ll get some clarity over the next few years on who is right. But I don’t know. It feels like the breakdown of shared epistemology. The kind of shared epistemology on which civilization was built.
> Then I explain what I think should be done and we’ll keep discussing it until I stop having more thoughts to give and the machine stops saying stupid things which need correcting.
Users like the author must be the most valuable Claude asset, because AI itself isn't a product — people's feedback that shapes output is.
They think their dog food tastes great now, not because they improved it any, but because they've forgotten the taste of human food. Karmically hilarious.
"Laughing" at how bad the code in Claude Code is really seems to be missing the forest for the trees. Anthropic didn't set out to build a bunch of clean code when writing Claude Code. They set out to make a bunch of money, and given CC makes in the low billions of ARR, is growing rapidly, and is the clear market leader, it seems they succeeded. Given this, you would think you'd want to approach the strategy that Anthropic used with curiosity. How can we learn from what they did?
There's nothing wrong with saying that Claude Code is written shoddily. It definitely is. But I think it should come with the recognition that Anthropic achieved all of its goals despite this. That's pretty interesting, right? I'd love to be talking about that instead.
people that 'violate the rules of good code' when vibe-coding are largely people that don't know the rules of good code to begin with.
want code that isn't shit? embrace a coding paradigm and stick to it without flip-flopping and sticking your toe into every pond, use a good vcs, and embrace modularity and decomposability.
the same rules when 'writing real code'.
9/10 times when I see an out-of-control vibe coded project it sorta-kinda started as OOP before sorta-kinda trying to be functional and so on. You can literally see the trends change mid-code. That would produce shit regardless of what mechanism used such methods, human/llm/alien/otherwise.
> In this particular case, a human could have told the machine: “There’s a lot of things that are both agents and tools. Let’s go through and make a list of all of them, look at some examples, and I’ll tell you which should be agents and which should be tools. We’ll have a discussion and figure out the general guidelines. Then we’ll audit the entire set, figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
But that isn't the hard part. The hard part is that some people are using the tool versions and some are using the agent versions, so consolidating them one way or another will break someone's workflow, and that incurs a real actual time cost, which means this is now a ticket that needs to be prioritized and scheduled instead of being done for free.
This definitely reminds me of a lot of Nassim Taleb's work, which is to say: Anthropic may not be behaving intelligently, but they are at least behaving somewhat honorably. If you're going to put out a dangerous product, a moral minimum is to use it heavily yourself so as to be exposed to the risk it creates.
Vibe coding is like building castles in a sandbox, it is fun but nobody would live in them.
Once you have learned enough from playing with sand castles, you can start over to build real castles with real bricks (and steel if you want to build a skyscraper). Then it is your responsibility to make sure that they would not collapse when people move in.
It looks like vibe coding, or AI coding in general, has been challenging a few empirical laws:
- Brooks' No Silver Bullet: no single technology or management technique will yield a 10-fold productivity improvement in software development within a decade. If we write a spec that details everything we want, we would write something as specific as code. Currently people seem to believe that a lot of the fundamentals are well covered by existing code, so a vague line like "build me XXX with YYY" can lead to amazing results because AI successfully transfers the world-class expertise of some engineers to generate code for such a prompt, so most of the complexity turns out to be accidental, and we need far fewer engineers to handle the essential complexity.
- Kernighan's Law, which says debugging is twice as hard as writing the code in the first place. Now people are increasingly believing that AI can debug way faster than human (most likely because other smart people have done similar debugging already). And in the worst case, just ask AI to rewrite the code.
- Dijkstra on the foolishness of programming in natural language. Something along the lines that a system described in natural language becomes exponentially harder to manage as its size increases, whereas a system described in formal symbols grows linearly in complexity relative to its rules. Similar to above, people believe that the messiness of natural language is not a problem as long as we give detailed enough instructions to AI, while letting AI fill in the gaps with statistical "common sense", or expertise thereof.
- Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to above, people start to believe otherwise.
- And remotely Coase's Law, which argues that firms exist because the transaction costs of using the open market are often higher than the costs of directing that same work internally through a hierarchy. People start to believe that the cost of managing and aligning agents is so low that one-person companies that handle a large number of transactions will appear.
Also, ultimately Jevons Paradox, as people worry that the advances in AI will strip out so much demand that the market will slash more jobs than it will generate. I think this is the ultimate worry of many software engineers. Luddites were ridiculed, but they were really skilled craftsmen who spent years mastering the art of using those giant 18-pound shears. They were the staff engineers of the 19th-century textile world. Mastering those 18-pound shears wasn't just a job but an identity, a social status, and a decade-long investment in specialized skills. Yeah, Jevons Paradox may bring new jobs eventually, but it may not reduce the blood and tears of the ordinary people.
> Kernighan's Law, which says debugging is twice as hard as writing the code in the first place. Now people are increasingly believing that AI can debug way faster than human (most likely because other smart people have done similar debugging already). And in the worst case, just ask AI to rewrite the code.
I thought you were gonna go the opposite direction with this. Debugging is now 100x as hard as writing the code in the first place.
> Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to above, people start to believe otherwise.
Gotta disagree with this too. I find a lot of work has to be done to be able to continue vibing, because complexity increases beyond LLM capabilities rapidly otherwise.
> I thought you were gonna go the opposite direction with this. Debugging is now 100x as hard as writing the code in the first place.
100x harder if a human were to debug AI-generated code. I was merely citing other people's beliefs: AI can largely, if not completely, take care of debugging. And "better", rewrite the code altogether. I don't see how that could be a better approach, but that might just be me.
Assuming that AI challenges all of that is, in my view, a bit simplistic.
> Brooks' No Silver Bullet
Just because a person can create code or "results" much faster now, it doesn't say anything about productivity. Don't mistake dev productivity for economic productivity.
> Kernighan's Law, which says debugging is twice as hard as writing the code
Debugging is such a vague term in these matters. An AI may be decent at figuring out an error it introduced into its own code after it runs its own tests. But a production bug, i.e. one reported by a user, can be very hard for AIs due to their utter lack of context.
> Dijkstra on the foolishness of programming in natural language.
> ...
> Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it.
No clue what the argument is here; "people believe otherwise" isn't one.
> Also, ultimately Jevons Paradox
Actually relevant tech people confirm the paradox in the long run. Companies slash jobs now because they tend to consolidate in chaotic times.
Interesting, though I disagree on basically all points...
> No Silver Bullet
As an industry, we do not know how to measure productivity. AI coding also does not increase reliability with how things are going. Same with simplicity, it's the opposite; we're adding obscene complexity, in the name of shipping features (the latter of which is not productivity).
In some areas I can see how AI doubles "productivity" (whatever that means!), but I do not see a 10x on the horizon.
> Kernighan's Law
Still holds! AI is amazing at debugging, but the vast majority of existing code is still human-written; so it'll have an easy time doing so, as indeed AI can be "twice as smart" as those human authors (in reality it's more like "twice as persistent/patient/knowledgeable/good at tool use/...").
Debugging fully AI-generated code with the same AI will fall into the same trap, subject to this law.
(As an aside, I do wonder how things will go once we're out of "use AI to understand human-generated content", to "use AI to understand AI-generated content"; it will probably work worse)
> just ask AI to rewrite the code
This is a terrible idea, unless perhaps there is an existing, exhaustive test harness. I'm sure people will go for this option, but I am convinced it will usually be the wrong approach (as it is today).
> Dijkstra on the foolishness of programming in natural language
So why are we not seeing repos of just natural language? Just raw prompt Markdown files? To generate computer code on-the-fly, perhaps even in any programming language we desire? And for the sake of it, assume LLMs could regenerate everything instantly at will.
For two reasons. First, the prompts would need to rise to a level of precision that makes them indistinguishable from a formal specification. Second, complexity does indeed become "exponentially harder"; inaccuracies inherent to human languages would compound. We still need to persist results in formal languages. Code remains the ultimate arbiter. We're now just (much) better at generating large amounts of it.
> Lehman’s Law
This reminds me of a recent article [0]. Let AI run loose without genuine effort to curtail complexity and (with current tools and models) the project will need to be thrown out before long. It is a self-defeating strategy.
I think of this as the Peter principle applied to AI: it will happily keep generating more and more output, until it's "promoted" past its competence. At which point an LLM + tooling can no longer make sense of its own prior outputs. Advancements such as longer context windows just inflate the numbers (more understanding, but also more generating, ...).
The question is, will the market care? If software today goes wrong in 3% of cases, and with wide-spread AI use it'll be, say, 7%, will people care? Or will we just keep chugging along, happy with all the new, more featureful, but more faulty software? After all, we know about the Peter principle, but it's unavoidable and we're just happy to keep on.
> Jevons Paradox
My understanding is the exact opposite. We might well see a further proliferation of information technologies, into remaining sectors which have not yet been (economically) accessible.
> The question is, will the market care? If software today goes wrong in 3% of cases, and with wide-spread AI use it'll be, say, 7%, will people care? Or will we just keep chugging along, happy with all the new, more featureful, but more faulty software?
This is THE question. I honestly think the majority will gladly take an imperfect app over waiting for a perfect app or perhaps having no app at all. Some devs might be able to stand out with a polished app taking the traditional approach but it takes a lot longer to achieve that and by that point the market may be different, which is a risk.
The ship has sailed. Vibe coding works. It will only work better in the future.
I have been programming for decades now, I have managed teams of developers. Vibe coding is great, especially in the hands of experts that know what they are doing.
Deal with it because it is not going to stop. In the near future it will be local and 100x faster.
How credible are the claims that the Claude Code source code is bad?
AI naysayers are heavily incentivized to find fault with it, but in my experience it's pretty rare to see a codebase of that size where it's not easy to pick out "bad code" examples.
Are there any relatively neutral parties who've evaluated the code and found it to be obviously junk?
Do you not think that ~400k lines of code for something as trivial as Claude Code is a great indication that there is an immense amount of bloat and stacking of overwrought, poor "choices" by LLMs in there? Do you not encounter this when using LLMs for programming yourself?
I routinely write my own solutions in parallel to LLM-implemented features from varying degrees of thorough specs and the bloat has never been less than 2x my solution, and I have yet to find any bloat in there that would cover more ground in terms of reliability, robustness, and so on. The biggest bloat factor I've found so far was 6x of my implementation.
I don't know, it's hard to read your post and not feel like you're being a bit obtuse. You've been doing this enough to understand just how bad code gets when you vibecode, or even how much nonsense tends to get tacked onto a PR if someone generates from spec. Surely you can do better than an LLM when you write code yourself? If you can, I'm not sure why your question even needs to be asked.
> Do you not think that ~400k lines of code for something as trivial as Claude Code is a great indication that there is an immense amount of bloat and stacking of overwrought, poor "choices" by LLMs in there?
I certainly wouldn't call Claude Code "trivial" - it's by far the most sophisticated TUI app I've ever interacted with. I can drag images onto it, it runs multiple sub-agents all updating their status rows at the same time, and even before the source code leaked I knew there was a ton of sophistication in terms of prompting under the hood because I'd intercepted the network traffic to see what it was doing.
If it was a million+ lines of code I'd be a little suspicious, but a few hundred thousand lines feels credible to me.
> Surely you can do better than an LLM when you write code yourself?
It takes me a solid day to write 100 lines of well designed, well tested code - and I'm pretty fast. Working with an LLM (and telling it what I want it to do) I can get that exact same level of quality in more like 30 minutes.
And because it's so much faster, the code I produce is better - because if I spot a small but tedious improvement I apply that improvement. Normally I would weigh that up against my other priorities and often choose not to do it.
So no, I can't do better than an LLM when I'm writing code by hand.
That said: I expect there are all sorts of crufty corners of Claude Code given the rate at which they've been shipping features and the intense competition in their space. I expect they've optimized for speed-of-shipping over quality-of-code, especially given their confidence that they can pay down technical debt fast in the future.
The fact that it works so well (I get occasional glitches but mostly I use it non-stop every day and it all works fine) tells me that the product is good quality, whether or not the lines of code underneath it are pristine.
How credible are the claims code en masse is good? Because I despise nearly every line of unreasonably verbose Java, that is so much waste of time and effort, but still deployed everywhere.
Every so often, some Windows source gets leaked, and people have a lot of fun laughing at how bad it is. If the source of, say, PeopleSoft were leaked, people would have a lot of fun laughing at how bad it is. If the source of Hogan Deposits were leaked, it would kill anyone who saw it.
I vibe code.
but I also remember the days I had ZERO KNOWLEDGE of what needs to be done, and I would hammer the keyboard with garbage code from stack overflow and half baked documentations plus some native guessing of human nature.
the end result was me understanding what the hell was going on.
those days are over.
"figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
No, I completely disagree with this entire article.
Bad code or good code is no longer relevant anymore. What matters is whether or not AI fulfills the contract as to how the application is supposed to work. If the code sucks, you just rerun the prompt again and the next iteration will be better. But better doesn't matter because humans aren't reading the code anymore. I haven't written a line of code since January and I've made very large scale improvements to the products I work on. I've even stopped looking at the code at all except a cursory look out of curiosity.
Worrying about how the sausage is made is a waste of time because that's how far AI has changed the game. Code doesn't matter anymore. Whether or not code is spaghetti is irrelevant. Cutting and pasting the same code over and over again is irrelevant. If it fulfills the contract, that's all that matters. If there's a bug, you update the contract and rerun it.
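For what "fulfills the contract" could mean in practice, here is a minimal sketch, assuming the contract is written as behavioral checks. Everything in it is hypothetical (the `slugify` task, the cases, both implementations); the point is only that such a check cannot tell tidy code from spaghetti.

```python
import re
from typing import Callable


def satisfies_contract(slugify: Callable[[str], str]) -> bool:
    """Hypothetical contract: checks behavior only, never how the code is written."""
    cases = {"Hello World": "hello-world", "  Trim me ": "trim-me", "a--b": "a-b"}
    return all(slugify(raw) == expected for raw, expected in cases.items())


def tidy(text: str) -> str:
    # Collapse every run of non-alphanumerics into one dash, then trim the ends.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def spaghetti(text: str) -> str:
    # Same behavior on the contract cases, written the long way round.
    out = ""
    for ch in text.lower():
        if ch.isalnum():
            out += ch
        elif not out.endswith("-"):
            out += "-"
    while out.startswith("-"):
        out = out[1:]
    while out.endswith("-"):
        out = out[:-1]
    return out


tidy_passes = satisfies_contract(tidy)
spaghetti_passes = satisfies_contract(spaghetti)
```

Both implementations pass, so the contract alone says nothing about how maintainable the code is or how it reads as context for the next prompt.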
> Bad code or good code is no longer relevant anymore.
It's extremely relevant inasmuch as garbage code pollutes the AI's context and misleads it into writing more crap. "How the sausage is made" still matters.
This is the crux of the whole conversation. What percentage of software is "critical"? My guess is 50%. And AI will soon be able to play in that space as well. So in the future, maybe 25% of "critical" software will require real humans in the loop?
This entirely depends on the product. If it’s your own personal blog, then for sure no need to read the code, but a change in a banking architecture would be irresponsible to not have an understanding of the actual code change.
Yes, vibe coding is perfectly acceptable if it is coupled with financial and penal liability of the authors of the program for any damages caused by that program, so if they choose to use it they must be willing to bet on its suitability.
In case of damages, vibe coding should be an aggravating circumstance, i.e. gross negligence.
When the use of a program cannot have any nefarious consequences, obviously vibe coding is fine. However, I do not use many such applications.
I've been a skeptic about LLMs in general since I first heard of them. And I'm a sysadmin type, more comfortable with python scripts than writing "real" software. No formal education in coding at all other than taking Harvard's free online python course a few years ago.
So I set out to build an app with CC just to see what it's like. I currently use Copilot (copilot.money) to track my expenditures, but I've become enamored with sankey diagrams. Copilot doesn't have this charting feature, so I've been manually exporting all my transactions and massaging them in the sankey format. It's a pain in the butt, error prone, and my python skills are just not good enough to create a conversion script. So I had CC do it. After a few minutes of back and forth, it was working fine. I didn't care about spaghetti code at all.
So next I thought, how about having it generate the sankey diagrams (instead of me using sankeymatic's website). 30 minutes later, it had a local website running that was doing what I had been manually doing for months.
Now I was hooked. I started asking it to build a native GUI version (for macOS) and it dutifully cranked out a version using pyobjC etc. After ironing out a few bugs it was usable in less than 30 min. Feature adds consumed all my tokens for the day and the next day I was brimming with changes. Burned through that days tokens as well and after 3 days (I'm on the el cheapo plan), I have an app that basically does what I want in a reasonable attractive, and accurate manner.
I have no desire to look at the code. The size is relatively small, and resource usage is small as well. But it solved this one niche problem that I never had the time or skill to solve.
Is this a good thing? Will I be downvoted to oblivion? I don't know. I'm very very concerned about the long term impact of LLMs on society, technology and science. But it's very interesting to see the other side of what people are claiming.
I really identify with this. As an engineer, I really do enjoy building things. However, a lot of times, what I want is a thing that is built. A lot of time, that means I build it, which sometimes I enjoy and sometimes I don't; so many of my half finished projects are things that I still think would be awesome to have but didn't care to invest the time in building.
LLM-driven development lets me have the thing built without needing to build the thing, and at the same time I get to exercise some ways-to-build I don't use as often (management, spec writing, spec editing, proactive unblocking, etc.). I have no doubt my work with LLMs has strengthened mental muscles that are also helpful in technical management contexts/senior+principal-level technical work.
Honestly, I think it's great that you could get the thing you wanted done.
Consider this, though: Your anecdote has nothing to do with software engineering (or an engineering mindset). No measurements were done, no technical aspects were taken into consideration (you readily admit that you lack the knowledge to do that), you're not expecting to maintain it or seemingly to further develop it much.
The above situation has never actually been hard; the thing you made is trivial to someone who knows the basics of a small set of things. LLMs (not Claude Code) have made this doable for someone who knows none of the things and that's very cool.
But all of this really doesn't mean anything for solutions to more complex problems where more knowledge is required, or solutions that don't even really exist yet, or something that people pay for, or things that are expected to be worked on continuously over time, perhaps by multiple people.
When people decry vibecoding as being moronic, the subtext really is (or should be) that they're not really talking to you; they're talking to people who are delivering things that people are expected to pay for, or rely on as part of their workflow, and people who otherwise act like their output/product is good when it's clearly a mess in terms of UX.
I get what you're saying, but imagine a CTO/CIO who's never been very technical. The world is full of them. They vibe up an app, and think it's easy. They don't have the developer experience to know the things they're missing.
While I downplayed my job experience, I'm very in touch with developers and their workflows; the challenges they face. And I'm scared because they won't be making these decisions about LLM usage; their bosses, the guy who vibe coded a dumb app over the weekend will.
Where is the evidence that people are obsessed with one-shotting and not doing the iterative back-and-forth, prompt-and-correct system he describes here? It feels like he is attacking a strawman.
> So pure vibe coding is a myth. But they’re still trying to do it, and this leads to some very ridiculous outcomes
Creating a product in a span of mere months that millions of developers use every day is the opposite of ridiculous. We wouldn't even have known about the supposed ridiculousness of the code if it hadn't leaked.
> I’ll start a conversation by saying “Let’s audit this codebase for unreachable code,” or “This function makes my eyes bleed,” and we’ll have a conversation about it until something actionable comes up. Then I explain what I think should be done and we’ll keep discussing it until I stop having more thoughts
This is painful to read. It feels like a rant from a person who does not use version control, testing and CI.
It is cruel to force a machine into a guessing game with a toddler whose spec is "I do not like it". If you have coding standards and preferences, they should already be distilled and explained somewhere, and applied automatically (like an auto linter in the not-so-old days). A good start is to find OS projects you like, let Claude review them, and generate code rules. Then run it on your codebase overnight, until it passes the tests and the automated code review for the new coding standards.
The "vibe coding" is that you run several agents in parallel, sometimes multiple agents on the same problem with different approaches, and just do code reviews. It is a mistake to have a synchronous conversation with a machine!
This type of work needs severe automation and parallelisation.
wow - I thought it was called 'ideation' or 'brainstorming'. he didn't give it a 'spec', he started a conversation with it to see if 'something actionable comes up' - which you actually quoted, but didn't appear to read ?
I think it is a 'cult' but also at the same time the inevitable future of engineering. The cult part are a subset of people who are not thinking about LLM code generation critically and blindly follow whatever trend is popular at this exact second.
The worst thing is that everyone but them knows how easy it is to take advantage of their blind hate. News companies, podcasts, and bloggers (such as this one) know they can just twist the thumbscrew and say "AI bad!" then rake in thousands of views/subs without even having to give a substantial argument.
And they're as deterministic as the underlying thing they're abstracting... which is kinda what makes an abstraction an abstraction.
I get that people love saying LLMs are just compilers from human language to $OUTPUT_FORMAT but... they simply are not except in a stretchy metaphorical sense.
That's only true if you reduce the definition of "compiler" to a narrow `f = In -> Out`. But that is _not_ a compiler. We have a word for that: function. And in LLM's case an impure one.
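To make the pure/impure distinction concrete, here is a toy sketch. Both names are hypothetical, and `LlmLike` only imitates sampling with a hidden cycling state; it says nothing about how a real model works.

```python
import itertools


def compile_like(source: str) -> str:
    """Pure: the output depends only on the input."""
    return source.strip().upper()


class LlmLike:
    """Impure stand-in: hidden state (imitating sampling) also shapes the output."""

    def __init__(self) -> None:
        self._extras = itertools.cycle(["", "  # plus a helper you didn't ask for"])

    def __call__(self, prompt: str) -> str:
        return prompt.strip().upper() + next(self._extras)


llm_like = LlmLike()
same_from_compiler = compile_like("add x y") == compile_like("add x y")
same_from_llm = llm_like("add x y") == llm_like("add x y")
```

Calling `compile_like` twice with the same input gives the same result, while two identical calls to `llm_like` differ, which is the sense in which the `In -> Out` picture leaves something out.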
I totally see what you're saying, but to me this feels different. Compilation is a fairly mechanical and well understood process. The large language models aren't just compiling English to assembler via your chosen language, they try and guess what you want, they add extra bits you didn't ask for, they're doing some of your solution thinking for you. That feels like more than just abstraction to me.
A fundamentally unreliable one: even an AI system that is entirely correctly implemented as far as any human can see can yield wrong answers and nobody can tell why.
That’s not entirely the fault of the technology, as natural language just doesn’t make for reliable specs, especially in inexperienced hands, so in a sense we finally got the natural-language that some among our ancestors dreamed of and it turned out to be as unreliable as some others of our ancestors said all along.
It partly is the fault of the technology, however, because while you can level all the same complaints against a human programmer, a (motivated) human will generally be much better at learning from their mistakes than the current generation of LLM-based systems.
(This even if we ignore other issues, such as the fact that it leaves everybody entirely reliant on the continued support and willingness to transact of a handful of vendors in a market with a very high barrier to entry.)
Does it have to be? The etymology of the word „abstraction“ is „to draw away“. I think it‘s relevant to consider just how far away you want to go.
If I‘m purely focused on the general outcome as written in a requirement or specification document, I‘d consider everything below that as „abstracted away“.
For example, this weekend I built my own MCP server for some services I‘m hosting on my personal server (*arr, Jellyfin, …) to be integrated with claude.ai. I‘ve written down all the things I want it to do, the environment it has to work in and let Claude go.
Not once have I looked at the code. And quite frankly, I don‘t care. As long as it fulfills my general requirements, it can write Python one time and TypeScript the other time should I choose to regenerate from that document. It might behave slightly differently but that is ok to a degree.
From my perspective, that is an abstraction. Deterministic? No, but it also doesn‘t have to be.
The argument against this is that human coders are also non-deterministic, so does it really matter if it's a human or an AI agent producing the code – assuming the AI agent is capable of producing human-quality code or better?
I agree it's not a layer of abstraction in the traditional sense though. AI isn't an abstraction of existing code, it's a new way to produce code. It's an "abstraction layer" in the same way an IDE is an abstraction layer.
> The AI is very bad at spontaneously noticing, “I’ve got a lot of spaghetti code here, I should clean it up.” But if you tell it this has spaghetti code and give it some guidance (or sometimes even without guidance) it can do a good job of cleaning up the mess.
Set up an AI bot to analyze the code for spaghetti code parts and clean up these parts to turn it into a marvel. :-)
It’s truly strange that people keep citing the quality of Claude code’s leaked source as if it’s proof vibe coding doesn’t work.
If anything, it’s the exact opposite. It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked. I've worked at BigCos and startups, and a lot of the terrible code that makes it to production was shocking when I first started.
This isn't a dig at anyone, I've certainly shipped my share of bad code as well. Deadlines, despite my wishes sometimes, continue to exist. Sometimes you have to ship a hack to make a customer or manager happy, and then replacing those hacks with better code just never happens.
For that matter, the first draft of nearly anything I write is usually not great. I might just be stupid, but I doubt I'm unique; when I've written nice, beautiful, optimized code, it's usually a second or third draft, because ultimately I don't think I fully understand the problem and the assumptions I am allowed to make until I've finished the first draft. Usually for my personal projects, my first dozen or so commits will be pretty messy, and then I'll have cleanup branches that I merge to make the code less terrible.
This isn't inherently bad, but a lot of the time I am simply not given time to do a second or third draft of the code, because, again, deadlines, so my initial "just get it working" draft is what ships into production. I don't love it, and I kind of dread the day some of the code with my name attached to it at BigCo gets leaked, but that's just how it is in the corporate world sometimes.
This is the product that's claiming "coding is a solved problem" though.
I get a junior developer or a team of developers with varying levels of experience and a lot of pressure to deliver producing crummy code, but not the very tool that's supposed to be the state-of-the-art coder.
> I suspect if people saw the handwritten code
Somehow, everyone has forgotten the terrible code quality that existed prior to 2020.
https://www.youtube.com/watch?v=UjZQGRATlwA
Like, come on. Software has been shit for decades. AI hasn't observably reduced the quality of software I use everyday in a way that is meaningfully separable from normal incidents in the past.
> and then replacing those hacks with better code just never happens
Yeah, we even have an idiom for this - "Temporary is always permanent"
> I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked.
Absolutely. The difference is that the amount of bad code that could be generated had an upper limit on it — how fast a human can type it out. With LLMs bad code can be shat out at warp speed.
Bad code works fine until it doesn't. In my experience, with humans, doing the right thing is worth it over doing the bad thing if your time horizon is a few months. Once you're in years, absolutely do the right thing, you're actually throwing time away if you don't. And I don't mean "big refactor", I mean at-change-time, when you think "this change feels like an icky hack."
For LLMs, I don't really know. I only have a couple years experience at that.
If you make a working and functional bad code, and put it on maintenance mode, it can keep churning for decades with no major issues.
Everything depends on context. Most code written by humans is indeed, garbage.
And it’s perfectly okay to fix and improve the code later.
Many super talented developers I know will say “Make it work, then make it good”. I think it’s okay to do this on a bigger scale than just the commit cycle.
If you are a company founder, what scenario would you rather find yourself in?
a) a pristine, good codebase that follows the best coding practices, but it is built on top of bad specs, wrong data/domain model
b) a bad codebase but it correctly models and nails the domain model for your business case
Real life example, a fintech with:
a) a great codebase but stuck with a single-entry ledger
b) a bad codebase that perfectly implements a double-entry ledger
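For anyone who hasn't hit this one: a rough sketch of what the double-entry model buys you, assuming a toy sign convention (debits positive, credits negative) and made-up account names. It is not a real accounting system, just the invariant.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DoubleEntryLedger:
    """Toy double-entry ledger: every transfer debits one account and credits another."""

    balances: dict = field(default_factory=lambda: defaultdict(int))
    journal: list = field(default_factory=list)

    def transfer(self, debit: str, credit: str, cents: int) -> None:
        if cents <= 0:
            raise ValueError("amount must be positive")
        # Both legs are recorded together, so the books cannot drift apart.
        self.journal.append((debit, credit, cents))
        self.balances[debit] += cents
        self.balances[credit] -= cents

    def is_balanced(self) -> bool:
        # Invariant: debits and credits net to zero across all accounts.
        return sum(self.balances.values()) == 0


ledger = DoubleEntryLedger()
ledger.transfer(debit="cash", credit="customer_deposits", cents=10_000)
ledger.transfer(debit="fees_receivable", credit="revenue", cents=250)
balanced = ledger.is_balanced()
```

A single-entry ledger would only keep one running number per account, so there is no counterpart entry to check against when something goes missing. That is a data-model problem that cleaner code on top cannot fix.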
> Bad code works fine until it doesn't.
Who is to judge the "good" or "bad" anyway?
But tech debt with vibe coding is fixed by just throwing more magic at it. The cost of tech debt has never been lower.
The fix time horizon changes too, don't discard that.
> you can build a crazy popular & successful product while violating all the traditional rules about “good” code
which has always been true
Yes, and to add, in case it's not obvious: in my experience the maintenance and mental cost of bad code (and the emotional cost, call me sensitive) compounds exponentially the more hacks you throw at it
It’s also possible to sell chairs that are uncomfortable and food that tastes terrible. Yet somehow we still have carpenters and chefs; Herman Miller and The French Laundry.
Some business models will require “good” code, and some won’t. That’s how it is right now as well. But pretending that all business models will no longer require “good” code is like pretending that Michelin should’ve retired its list after the microwave was invented.
Still, talk about "good" code exists for a reason. When the code is really bad, you end up paying the price by having to spend more and more time to develop new features, with greater risk of introducing bugs. I've seen that in companies in the past, where bad code meant less stability and more time to ship features that we needed to retain customers or get new ones.
Now whether this is still true with AI, or if vibe coding means bad code no longer has this long-term stability and velocity cost because AIs are better than humans at working with this bad code... We don't know yet.
Not only true but I would guess it's the normal case. Most software is a huge pile of tech debt held together by zip-ties. Even greenfield projects quickly trend this way, as "just make it work" pressure overrides any posturing about a clean codebase.
Not according to some on HN. They consider it impossible to create a successful business with imperfect code. Lol
It depends on the urgency. Not every product is urgent. CC arguably was very urgent; even a day of delay meant the competitors could come out with something slightly more appealing.
See also Salesforce, Oracle, SAP
Wordpress hides behind a cabinet
1. "Vibe coding" is a spectrum of how much human supervision (and/or scaffolding in the form of human-written tests and/or specs) is involved.
2. The problem with "bad code" has nothing to do with the short-term success of the product but with the ability to evolve it successfully over time. In other words, it's about long-term success, not short-term success.
3. Perhaps most importantly, Claude Code is a fairly simple product at its core, and almost all its value comes from the model, not from its own code (and the same is true on the cost side). Claude Code is relatively a low stakes product. This means that the problems caused by bad code matter less in this instance, and they're managed further by Claude Code not being at the extreme "vibey" end of the spectrum.
So AI aside, Claude Code is proof that if you pour years and many billions into a product, it can be a success even if the code in the narrow and small UI layer isn't great.
The very definition of "vibe coding" is using AI to write software and not even look at the code it produces.
1 is definitely false right now. I gave an LLM specs, tests, full datasets, and reference code to translate, and it still produced garbage code and fell flat on its face. I just spent one week translating a codebase from Go to C++ and I had to throw the whole thing out because it put in some horrible bugs that it could not fix, even after burning $500 worth of tokens with me babysitting it. As I said, it had everything at its disposal: tests, a reference implementation, lots of data to work with. I finally got my lazy ass to implement it myself and, lo and behold, I did it in 2 days with no bugs (that I know of), and the code quality is miles better than that undigested vomit. The codebase was a protocol library for decoding network traffic that used a lot of bit twiddling, flow control, Huffman table compression: mildly complicated stuff. So no: if you want working non-trivial code that you can rely on, then definitely don't use an LLM to do it. Use it for autocomplete and small bits of code, but never let the damn thing do the thinking for you.
Still, it's probably true that Claude Code (etc) will be more successful working on clean, well-structured code, just like human coders are. So short-term, maybe not such a big deal, but long-term I think it's still an unresolved issue.
I imagine it is way more affordable in terms of tokens to implement a feature in a well organized code base, rather than a hacky mess of a codebase that is the result of 30 band-aid fixes stacked on top of each other.
The traditional rules of good code are heuristics that are practical for human developers. A different set of heuristics will emerge for agentic development.
You can, but:
- Good code is what enables you to be able to build very complex software without an unreasonable number of bugs.
- Good code is what enables you to be responsive to changing customer needs and times. Whether you view that as valuable is another matter though. I guess it is a business decision. Plenty of businesses have gone bust by neglecting that, though.
Good code is for your own sanity, the machine does not care.
This product rides a hype wave. This is why it is crazy popular and successful.
The situation there is akin to Viaweb: Viaweb also rode a hype wave, and its code situation was awful as well (see PG's stories about fixing bugs during customer's issue reproduction theater).
What did Viaweb's buyer do? They rewrote the thing in C++.
If history rhymes, then the buyer of Anthropic would do something close to "rewrite it in C++" to the current Claude Code implementation.
This is also why they had to release it quickly. They got the first mover advantage but if they delayed to make its code better, a competitor could have taken the wave instead of them.
I don't disagree with your general premise that eventually it'll just be rewritten, but I have to push back on the idea that Anthropic will be acquired. Their most recent valuation was $380B, and even if they wanted to be acquired (which I doubt) essentially no company has the necessary capital.
One truism about coding agents is that they struggle to work with bad code. Code quality matters as much as always, the experts say, and AI agents (left unfettered) produce bad code at an unprecedented rate. That's why good practices matter so much! If you use specs and test it like so and blah blah blah, that makes it all sustainable. And if anyone knows how to do it right, presumably it's Anthropic.
This codebase has existed for maybe 18 months, written by THE experts on agentic coding. If it is already unintelligible, that bodes poorly for how much it is possible to "accelerate" coding without taking on substantial technical debt.
i think you are conflating anthropic (the startup) with claude code (the leaked source of one of said startup's products)
i.e., the claude code codebase doesn't need to be good right now [^1] — so i don't think the assumption that this is an exemplary product / artifact of expert agentic coding actually holds up here specifically
[^1]: the startup graveyard is full of dead startups with good code
My understanding of OP was not a claim that "vibe coding doesn't work", but that the way Anthropic does it doesn't work. He seems to be specifically criticizing the "hands off the actual code, human" approach and advocating for keeping the human in the loop.
This, 100x.
I do M&As at my company, as a CTO. I have seen lots of successful companies' codebases, and literally none of them were elegant. That includes very profitable companies with good, loved products.
The only good code I know is in the open source domain and in the demoscene. The commercial code is mostly crap - and still makes money.
This kinda puts it into words: most of us naturally expected 2025- LLMs to be able to generate OSS / demo / high-craft code, not messy commercial code.
What I'm missing so far is how they produced such awful code with the same product I'm using, which definitely would have called out some of those issues.
Perhaps the problem is getting multiple vibe-coders synced up when working on a large repo.
I suspect a lot of it is just older, before Opus 4.5+ got good at calling out issues.
Code quality is a tactical concern and products live or die on strategy.
I wouldn't recommend neglecting tactics if your strategy doesn't put you on the good side of a generational bubble though.
It kind of reminds me of grammar police type personalities. They are so hung up on the fact it reads “ugly” they can’t see the message; this code powers a rapidly growing $400B company. They admit refactoring is easy, but fail to realize they probably know that too and it’s just not a priority yet.
They won’t stay 400B for long, and Claude Code will have no effect on that.
> And as for the critics, tell me I don't get it
> Everybody can tell you how to do it, they never did it
> —jay-z
The underlying model powers the valuation.
Not the front end
You can send a submarine down to crushing depths while violating all the traditional rules about "good" engineering, too.
Right, and often the tested depth isn't the maximum. So you slowly acclimate to worse and worse code practices if the effort needed to undo it is the same as doing.
It works, it is popular, sure. Claude's code may be barely old enough to have suffered through its true long-term maintainability problems. They probably also haven't had a lot of rotation/attrition in their staff.
Not AI but perfect example is Cloudflare. They have implemented public suffix list (to check if a domain is valid) 10 different times in 10 different ways. In one place, they have even embedded the list in frontend (pages custom domain). You report issues, they fix that one service, their own stuff isn't even aware that it exists in other places.
Meta has four different implementations of the same page to create a “page” for your business… which is required to be able to advertise on any of their services.
Each one is broken, doesn’t have working error handling, and prevents you from giving them money. They all exist to insert the same record somewhere. Lost revenue, and they seem to have no idea.
Amazon's flagship iOS app has had at least three highly visible bugs, for years. They’re like thorns in my eye sockets, every time I use it. They don’t care.
These companies are working with BILLIONS of dollars in engineering resources, unlimited AI resources, and with massive revenue effects for small changes.
Sometimes the world just doesn’t make sense.
It helps if the product is so revolutionary that people are willing to overlook bugs. Could you imagine a more mundane product with a TUI that flickered all the time where this wouldn't be a showstopper? I believe the bug is fixed now, but it seems crazy that it persisted so long given how obvious the cause was (clear before update). How many more bugs are in CC? As of a few weeks ago there were 5000 or so open issues against it on github.
The success is undeniable, but whether this vibe-coded level of quality is acceptable for more general use cases isn't something you can infer from that.
I'd imagine the AI engineers on million dollar TC are not vibe coding the models though, which is the actual sauce.
Yes, that is how Facebook, Yahoo and many other companies started out. But they rewrote their code when it became too big to be maintainable. The problem with shoddy code is not necessarily that it doesn't work but that it becomes impossible to change.
Makes me stare at mid nineties After Effects’ core rendering engine
ehh, as long as the overall starting architecture is decent, it's not hard to do tiny refactors across components
claude code, the app, is also not some radically complex concept (even if the codebase today is complicated)
but hey, that's why people do version breaking rewrites
It's basically shifting work to future people. This mess will stop working and will introduce unsolvable obscure bugs one day, and someone actually will have to look at it.
It has already cost many developers months and hundreds of dollars' worth of tokens because of a bug. There will be more.
99.999999% of products can't get away with what Anthropic is able to - this is a one in a billion disruptive product with minimal competition, and its success so far is mostly due to Claude the model, not the agent harness
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
We already knew that. This is a matter of people who didn't know that or didn't want to acknowledge that thinking they now have proof that it doesn't matter for creating a crazy popular & successful product, as if it's a gotcha on those who advocate for good practices. When your goal is to create something successful that you can cash out, good practices and quality are/were never a concern. This is the basis for YAGNI, move-fast-and-break-things, and worse-is-better. We've known this since at least Betamax-vs-VHS (although maybe the WiB VHS cultural knowledge is forgotten these days).
WiB is different from Move Fast and Break Things and again different from YAGNI though.
WiB doesn't mean the thing is worse, it means it does less. Claude Code interestingly does WAY more than something like Pi which is genuinely WiB.
Move Fast and Break Things comes from the assumption that if you capture a market quick enough you will then have time to fix things.
YAGNI is simply a reminder that not preparing for contingencies can result in a simpler code base since you're unlikely to use the contingencies.
The spaghetti that people are making fun of in Claude Code is none of these things except maybe Move Fast and Break Things.
VHS was not worse is better. It’s better is better.
TBH Claude Code is surprisingly shit to use given the technical resources and the amount of money behind it. Looking past the bugs and missing features, it's so obvious it's not built by people who care about the product from a developer/craftsman perspective. It's missing all the signs of polish/care, it feels like someone shipped an internal PoC to prod and kept hacking on it. And now they are just tacking on features to sell more buzzwords and internal prototypes. Classic user facing/commercial software story.
But we (the dev community) are kind of spoiled, because we have a lot of great developer tools that come from people passionate about their work, skilled at what they do and take pride in what they put out. I don't count myself among one of those people but I have benefited from their work throughout my career and have gotten used to it in my tooling.
All that being said Opus is hands down the best coding model for me (and I'm actively trying all of them) and I'll tolerate it as long as I can get it to do what I need, even with the warts and annoyances.
> TBH Claude Code is surprisingly shit to use given the technical resources and the amount of money behind it.
What harness would you recommend instead?
> TBH Claude Code is surprisingly shit to use given the technical resources and the amount of money behind it. Looking past the bugs and missing features, it's so obvious it's not built by people who care about the product from a developer/craftsman perspective. It's missing all the signs of polish/care, it feels like someone shipped an internal PoC to prod and kept hacking on it.
I don't wholly disagree, but personally it's still the tool I use and it's sort of fine. Perhaps not entirely for the money that's behind it, as you said, but it could be worse.
The CLI experience is pretty okay, although the auth is kinda weird (e.g. when trying to connect to AWS Bedrock). There's a permission system and sandboxing, plan mode and TODOs, decent sub-agent support, instruction files and custom skills, tool calls and LSP support and all the other stuff you'd expect. At least no weird bugs like I had with OpenCode, where trying to paste multi-line content inside of a Windows Terminal session led to the tool closing and every next line getting pasted in and executed in the terminal one by one. That was weird, though I will admit that using Windows feels messed up quite often nowadays even without stuff like that.
The desktop app gives you chat and cowork and code, although it almost feels like Cowork is really close to what Code does (and for some reason Cowork didn't seem to support non-OS drives?). Either way, the desktop app helps me not juggle terminal sessions and keeps a nice history in the sidebar, has a pretty plan display, easy ways of choosing permissions and worktrees, although I will admit that it can be sluggish and for some actions there just aren't progress indicators which feels oddly broken.
I wonder what they spend most of their time working on and why the basics aren't better, though to Anthropic's credit about a month ago the desktop Code section was borderline unusable on Windows when switching between two long conversations, which now seems to take a few seconds (which is still a few seconds too long, but at least usable).
The most obvious sign to me from the start that somebody wasn't really paying attention to how the Claude app(s) work is that on iOS, you have to leave the app active the entire time a response is streaming or it will error out.
Isn't Anthropic basically a bunch of AI PhDs writing code? I'd imagine they had to be dragged kicking and screaming into actual software engineering.
Yes, that plus having tens of billions of gulf money certainly helps you subsidize your moronic failures with money that isn't yours while you continue to fail to achieve profitability in any time horizon within a single lifespan.
Also Claude owes its popularity mostly to the excellent model running behind the scenes.
The tooling can be hacky and of questionable quality yet, with such a model, things can still work out pretty well.
The moat is their training and fine-tuning for common programming languages.
Huh what moronic failure did Anthropic do? Every Claude Code user I know loves it.
There have certainly been periods of irrational exuberance in the tech industry, but there are also many companies that were criticized for being unprofitable which are now, as far as I can tell, quite profitable. Amazon, Uber, I'm sure many more. I'm curious what the basis is to say that Anthropic could never achieve profitability? Are the numbers that bad?
your prediction is going to be wrong, even with all those caveats
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
A lot of dollars fix a lot of mistakes.
devaluing craftsmanship is fundamentally insulting.
It's also crazy more expensive to run than we thought. That doesn't bode well when their loss-leader period is over and they need to start making money.
"Wildly successful but unpolished product first-to-market with a new technology gets dethroned by a competitor with superior execution" is a story as old as tech.
It is amazing how much malice the author put into proving that the world is stupid and that only a few still stand guard over reason.
There's also a business incentive for code produced by LLM companies to be hard to maintain. So you keep needing them in the future.
Also, many of the complaints seem more like giddy joy than anything.
The negative emotion regex, for example, is only used for a log/telemetry metric. Sampling "wtf?" alone would probably be enough. Why would you use an agent for that?
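A minimal sketch of why a regex is a reasonable fit for this kind of metric: it is cheap, deterministic, and runs locally with no model call. The pattern and the name `count_negative_signals` below are hypothetical and are not taken from the leaked source.

```python
import re
from collections import Counter

# Hypothetical pattern for illustration; not the regex from the leaked source.
NEGATIVE_PATTERN = re.compile(
    r"\b(wtf|ugh|this is (?:broken|wrong))\b",
    re.IGNORECASE,
)


def count_negative_signals(messages):
    """Count how often each negative phrase appears across user messages."""
    counts = Counter()
    for message in messages:
        for match in NEGATIVE_PATTERN.finditer(message):
            # Normalize case so "WTF" and "wtf" land in the same bucket.
            counts[match.group(1).lower()] += 1
    return counts
```

A telemetry pipeline would only need the totals from something like this, which is the kind of job where spending tokens on an agent buys very little.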
I don't see how a vibe-coded app is freed from the same trade-offs that apply to a fast-moving human-coded one.
Especially since a human is still driving it, thus they will take the same shortcuts they did before: instead of a formal planning phase, they'll just yolo it with the agent. Instead of cleaning up technical debt, they want to fix specific issues that are easy to review, not touch 10 files to do a refactor that's hard to review. The highest priority issues are bugs and new integrations, not tech debt, just like it always was.
This is really just a reminder of how little upside there is to coding in the open.
I think the thing is that people expect one of the largest companies in the world to have well written code.
Claude’s source code is fine for a 1-3 person team. It’s atrocious for a flagship product from a company valued over $380 BILLION.
Like if that’s the best ai coding can do given infinite money? Yeah, the emperor has no clothes. If it’s not the best that can be done, then what kinda clowns are running the show over there?
I read these posts and I wonder how many people are this delusional or dishonest. I have been a programmer for 40 years, and in most companies 90% of coders are so-called Stack Overflow coders or Google coders. Every coder who is honest will admit it, and AI is already better than those 90%. FAR better. At least most influencer coders are starting to admit that the code is actually awesome, if you know what you are doing. I am more of a code reviewer and I plan the implementation, which is far more exciting than writing the code itself. I have the feeling most feel the way I do, but there are still those Stack Overflow coders who are afraid to lose their jobs. And they will.
Because or in spite of? Claude code works because of Claude being good and network effects. Agentic coding tools are maybe the dumbest code ever for the level of popularity they have.
While being in the center of a hype vortex which basically suspends market physics. But all that bad code eats server farms that are going to cost double when the bubble starts deflating.
The model is the product.
It shows that you can have a garbage front end if people perceive value in your back end.
It also means that any competitor that improves on this part of the experience is going to eat your lunch.
Do we know if the original code was vibe coded? It's like a chicken-and-egg dilemma.
It's not a chicken and egg dilemma, the model can be used independently of Claude to write code, the heavy lifting is still done on their servers.
It's a buggy pos though, "popular and successful" have never been indicators of quality in any sense.
I think this is a pretty interesting comment because it gets to the heart of differing views on what quality means.
For you, non-buggy software is important. You could also reasonably take a more business centered approach, where having some number of paying customers is an indicator of quality (you've built something people are willing to pay for!) Personally I lean towards the second camp, the bugs are annoying but there is a good sprinkling of magic in the product which overall makes it something I really enjoy using.
All that is to say, I don't think there is a straightforward definition of quality that everyone is going to agree on.
ok, well if you'd like to trade in 14 billion dollars of revenue for better quality, feel free.
Honestly for such a powerful tool, it’s pretty damn janky. Permissions don’t always work, hitting escape doesn’t always register correctly, the formatting breaks on its own, to name a few of the issues I’ve had. It’s popular and successful but it’s got lots of thorns.
I think you're all fucking crazy. Or maybe I am?
I can literally see my team's codebase becoming an unmaintainable nightmare in front of my eyes each day.
I use copilot and Claude code and I frequently have to throw away their massively verbose and ridiculously complex code and engage my withering brain to come up with the correct solution that is 80% less code.
I probably get to the solution in the same time when all is said and done.
Honestly what is going on. What are we doing here?
I think it is crazy popular for the model and not the crappy vibe code.
Value to customer. Literally the only thing that matters.
Value isn't a one-shot, though. Value sustained over time is what matters.
Well, if unmaintainable code gets in the way of the "sustained over time" part, then that is still a real problem.
Hardly. Claude Code is basically just a wrapper around an LLM with a CLI.
Obviously it does some fairly smart stuff under the hood, but it's not exactly comparable to a large software project.
But to your point, that doesn't mean you can't vibe code some poorly built product and sell it. But people have always been able to sell poorly built software projects. They can just do it a bit quicker now.
>Hardly. Claude Code is basically just a wrapper around an LLM with a CLI.
I don't know why people keep acting like harnesses are all the same but we know they aren't because people have swapped them out with the same models and receive vastly different results in code quality and token use.
This is a really wrong perspective on software. Short term monkey style coding does not produce products. You might get money but that is not what it is about.
This is similar to retarded builders in Turkey saying “wow, I can make the same building, sell for the same price, but spend way less” and then millions of people becoming victim when there is an earthquake.
This is not how responsible people should think about things in society
> This is a really wrong perspective on software. Short term monkey style coding does not produce products. You might get money but that is not what it is about.
Getting money is 100% what it is about and Claude Code is great product.
Nobody rewards responsibility though. It's all about making number go up.
> This is a really wrong perspective on software. Short term monkey style coding does not produce products. You might get money but that is not what it is about
You're not alone in thinking that, but unfortunately I think it's a minority opinion. The only thing most people and most businesses care about is money. And frankly not even longterm, sustainable money. Most companies seem happy to extract short term profits, pay out the executives with big bonuses, then rot until they collapse
There is already lots of popular software that violates any concept of good software. Facebook Messenger, Instagram, Twitter, Minecraft, balena etcher, the original Ethereum wallet, almost anything that uses Electron...
We are all using Claude despite Claude Code. The day ChudCorp releases a superior model Chud, we are all gone in an instant. There is no moat.
Except for the part where it's constantly having quality and reliability issues, even independent of the server-side infrastructure (OOMs on long running tasks, etc).
I found that to be true years ago when I spooled the source of the Twitch leaks.
To me it said, clearly: nobody cares about your code quality other than your ability to ship interesting features.
It was incredibly eye-opening to me, I went in expecting different lessons honestly.
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
That was always the case. Landlords still want rent, the IRS still has figurative guns. Shipping shit code to please these folks and keep the company alive will always win over code quality, unless the system can be edited to financially incentivize code quality. The current loss function on society is literally "ship shit now and pay your taxes and rent".
> It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
The product is also a bit wonky and doesn't always provide the benefits it's hyped for. It often doesn't even produce any result for me, just keeps me waiting and waiting... and nothing happens, which is what I expect from a vibe coded app.
Yes, just get hundreds of billions of dollars in investments to build a leading product, and then use your massive legal team to force the usage of your highly subsidised and marketed subscription plan through your vibe coded software. This is excellent evidence that code doesn't matter.
> Yes, just get hundreds of billions of dollars in investments to build a leading product, and then use your massive legal team to force the usage of your highly subsidised and marketed subscription plan through your vibe coded software.
What? Your comment makes absolutely zero sense. Legal team forces people to use Claude Code?
I know this isn't your point, but Anthropic has raised about $70 billion, not "hundreds of billions".
And they don't need a massive legal team to declare that you can't use their software subscription with other people's software.
I don't think anyone who used Claude code on the terminal had anything good to say about it. It was people using it through vs code that had a good time.
I have used Claude Code in the terminal to the tune of ~20m tokens in the last month and I have very little to complain about. There are definitely quirks that are annoying (as all software has, including vs code or jetbrains IDEs) but broadly speaking it does what it says on the tin ime
I prefer using it via the terminal. Might be anchoring bias, but I have had issues with slash commands not registering and hooks not working in the plugin.
> That wouldn’t even be a big violation of the vibe coding concept. You’re reading the innards a little but you’re only giving high-level, conceptual, abstract ideas about how problems should be solved. The machine is doing the vast majority, if not literally all, of the actual writing.
Claude Code is being produced at AI Level 7 (Human specced, bots coded), whereas the author is arguing that AI Level 6 (Bots coded, human understands somewhat) yields substantially better results. I happen to agree, but I'd like to call out that people have wildly different opinions on this; some people say that the max AI Level should be 5 (Bots coded, human understands completely), and of course some people think that you lose touch with the ground if you go above AI Level 2 (Human coded with minor assists).
[0] https://visidata.org/ai
It's also a context-specific scale. I work in computer vision. Building the surrounding app, UI, checkout flow, etcetera is easily Level 6/7(sorry...) on this scale.
Building the rendering pipeline, algorithms, maths, I've turned off even level 2. It is just more of a distraction than it's worth for that deep state of focus.
So I imagine at least some of the disconnect comes from the area people work in and its novelty or complexity.
This is exactly true in my experience! The usefulness of AI varies wildly depending on the complexity, correctness-requirements, & especially novelty of the domain.
This attribute plus a bit of human tribalism, social echo-chambering, & some motivated reasoning by people with a horse in the race, easily explains the discord I see in rhetoric around AI.
am layman. is CV "solved" at this point, or is there more work to be done?
I like this framing, but it does seem to imply that a whole dev shop, or a whole product, can or should be built at the same level.
The fact is, I think the art of building well with AI (and I'm not saying it's easy) is to have a heterogeneously vibe-coded app.
For example, in the app I'm working on now, certain algorithmically novel parts are level 0 (I started at level 1, but this was a tremendously difficult problem and the AI actually introduced more confusion than it provided ideas.)
And other parts of the app (mostly the UI in this case) are level 7. And most of the middleware (state management, data model) is somewhere in between.
Identifying the appropriate level for a given part of the codebase is IMO the whole game.
100% agree. Velocity at level 8 or even 7 is a whole order of magnitude faster than even level 5. Like you said, identifying the core and letting everything else move fast is most of the game. The other part is finding ways to up the level at which you’re building the core, which is a harder problem.
I'm at a 5, and only because I've implemented a lot of guardrails, am using a typed functional language with no nulls, TDD red/green, and a good amount of time spent spec'ing. No way I'd be comfortable enough this high with a dynamic language.
I could probably get to a 7 with some additional tooling and a second max 20 account, but I care too much about the product I'm building right now. Maybe for something I cared less about.
IMO if you're going 7+, you might as well just pick a statically typed and very safe (small surface area) language anyways, since you won't be coding yourself.
You aren't leveling up here... these levels are simple measures of how you use the tools to do something. You can regularly do things from any level or multiple levels at the same time.
That's an interesting list. I think that the humans that will make the most progress in the next few years are the ones that push themselves up to the highest level of that list. Right now is a period of intense disruption and there are many coders that don't like the idea that their way of life is dead. There are still blacksmiths around today but for the most part it's made by factories and cheap 3rd world labor. I think the same is currently happening with coding, except it will allow single builders and designers to do the same thing as an entire team 5 years ago.
For certain kinds of software (financial systems, safety-critical systems) it may be very unwise to go beyond level 5.
There may be certain fields where you can't even get to 5.
> I think the same is currently happening with coding, except it will allow single builders and designers to do the same thing as an entire team 5 years ago.
This part of your post I think signals that you are either very new or haven't been paying attention; single developers were outperforming entire teams on the regular long before LLMs were a thing in software development, and they still are. This isn't because they're geniuses, but rather because you don't get any meaningful speedup out of adding team members.
I've always personally thought there is a sweet spot at about 3 programmers where you still might see development velocity increase, but that's probably wrong and I just prefer it to not feel too lonely.
In any case teams are not there to speed anything up, and anyone who thinks they are is a moron. Many, many people in management are morons.
At work I am at level 4, but my side projects have embarrassingly crept into Level 6. It is very tempting to accept the features as is, without taking the time to understand how they work.
> some people say that the max AI Level should be 5
> of course some people think that you lose touch with the ground if you go above AI Level 2
I really think that this framing sometimes causes a loss of granularity. As with most things in life, there is nuance in these approaches.
I find that nowadays for my main project, where I am really leaning into the 'autonomous engineering' concept, AI Level 7 is perfect, as long as it is qualified through rigorous QA processes on the output (i.e. it is not important what the code does if the output looks correct). But even in this project, where I am really leaning into the AI 'hands-off' methodology, there are a few areas that dip into Level 5 or 4 depending on how well AI does them (Frontend Design especially) or on the criticality of the feature (in my case E2EE).
The most important thing is recognizing when you need to move 'up' or 'down' the scale and having an understanding of the system you are building
> https://visidata.org/ai
Thanks for that list of levels, it's helpful to understand how these things are playing out and where I'm at in relation to other engineers utilizing LLM agents.
I can say that I feel comfortable at approximately AI level 5, with occasional forays to AI level 6 when I completely understand the interface and can test it but don't fully understand the implementation. It's not really that different from working on a team, with the agent as a team member.
To clarify, does this mean that Anthropic employees don't understand Claude Code's code since it's level 7? I've got to believe they have staff capable of understanding the output and they would spend at least some time reviewing code for a product like this?
Yes, I believe the creator has outright stated that they just YOLO vibe and don't even look at the code.
Interesting breakdown of levels. I like it.
I’m not sure I believe that Level 7 exists for most projects. It is utterly *impossible* for most non-trivial programs to have a spec that doesn’t have deep, carnal knowledge of the implementation. It can not be done.
For most interesting problems the spec HAS to include implementation details and architecture and critical data structures. At some point you’re still writing code, but in a different language, and it might have actually been better to just write the damn struct declarations by hand and then let AI run with it.
I agree, I'm venturing into Level 6 myself and it often feels like being one step too high on a ladder. Level 7 feels like just standing on the very top of the ladder, which is terrifying (to me anyway as an experienced software engineer).
This is the guy that created BitTorrent, btw. I know that was a long time ago, but he's not just some random blogger.
Glad to see Bram getting into things lately. Second appearance on HN
99% of people here don't know what BitTorrent is, but they can vibe it :)
Given his background, you'd think he'd know that he should provide some evidence for his position (instead of making this completely unsupported rant).
I think you're interpreting the structure and goal of "Bram's Thoughts" wrong. It's a guy's blog, not a thesis defense.
It's a blog post, not an academic paper. Do you cite every source when you're conversing with colleagues?
My favorite uses of Claude code is to do code quality improvements that would be seen as a total waste of time if I was doing them by hand, but are perfectly fine when they are done mostly for free. Looking for repetitive patterns in unit tests/functional tests. Making sure that all json serialization is done in similar patterns unless there's a particularly good reason. Looking for functions that are way too complicated, or large chunks of duplication.
The PRs that it comes up with are rarely even remotely controversial, shrink the codebase, and are likely saving tokens in the end when working on a real feature, because there's less to read, and it's more boring. Some patterns are so common you can just write them down, and throw them at different repos/sections of a monorepo. It's the equivalent of linting, but at a larger scale. Make the language hesitant enough, and it won't just be a steamroller either, and will mostly fix egregious things.
But again, this is the opposite of the "vibe coding" idea, where a feature appears from thin air. Vibe Linting, I guess.
Absolutely. I've got a nice multi-paragraph prompt on hunting for subtle bugs, user expectation breaks, crufty/repeated code, useless tests (six tests that actually should be one logical flow; assertions that a ternary is still, indeed, a ternary; etc.), documentation gaps, and a few other bits and bobs.
I sic Opus, GPT5.4, and Gemini on it, have them write their own hitlists, and then have a warden Opus instance go and try to counterprove the findings, and compose a final hitlist for me, then a fresh context instance to go fix the hitlist.
They always find some little niggling thing, or inconsistency, or code organization improvement. They absolutely introduce more churn than is necessary into the codebase, but the things they catch are still a net positive, and I validate each item on the final hitlist (often editing things out if they're being overeager or have found a one in a million bug that's just not worth the fix (lately, one agent keeps getting hung up on "what if the device returns invalid serial output" in which case "yeah, we crash" is a perfectly fine response)).
Mind sharing that prompt? This is one of my favorite uses for AI too, but I’m just using it to fix the stuff that’s already top of mind for me.
It’s so strange. I think there’s a few different groups:
- Shills or people with a financial incentive
- Software devs that either never really liked the craft to begin with or who have become jaded over time and are kind of sick of it.
- New people that are actually experiencing real, maybe over-excitement about being able to build stuff for the first time.
Forgetting the first group as that one is obvious.
I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason. Software work has become a grind for them and vibe coding is actually a relief.
Group 3 I think are mostly the non-coders who are genuinely feeling that rush of being able to will their ideas into existence on a computer. I think AI-assisted coding could actually be a great on-ramp here and we should be careful not to shit on them for it.
You’re missing the group of high performers who love coding, who just want to bring more stuff in the world than their limited human brains have the energy or time to build.
I love coding. I taught myself from a book (no internet yet) when I was 10, and haven’t stopped for 30 years. Turned down becoming a manager several times. I loved it so much that I went through an existential crisis in February as I had to let go of that part of my identity. I seriously thought about quitting.
But for years, it has been so frustrating that the time it took me to imagine roughly how to build something (10-30 minutes depending on complexity) was always dwarfed by the amount of time it took to grind it out (days or sometimes weeks). That’s no longer true, and that’s incredibly freeing.
So the game now is to learn to use this stuff in a way that I enjoy, while going faster and maintaining quality where it matters. There are some gray beards out there who I trust who say it’s possible, so I’m gonna try.
Good point and I’m exactly at the same point as you with this. Working on letting go of the idea (and to be honest just the habit) that it’s somehow ‘cheating’ at the moment.
But what is the point of building things if you can’t feel good about them?
Yes I'm exactly like you as well. I've been coding for 30+ years, I still love coding and system building etc, but sometimes the level of frustration to find the information and then get something working is simply too high.
Over a weekend, I used ChatGPT to set up Prometheus and Grafana and added node exporters to everything I could think of. I even told ChatGPT to create NOC-style dashboards for me, given the metrics I gave it. This is something that would have painstakingly taken several weeks if not more to figure out, and it's something I've been wanting to do but the cognitive load and anticipatory frustration was too high for me to start. I love how it enables me to just do things.
My next step is to integrate some programs that I wrote that I still use every day to collect data and then show it on the dashboards as well.
On a side note, I don't know why Grafana hasn't more deeply integrated with AI. Having to sift through all the ridiculous metrics that different node exporters advertise with no hint of naming convention makes using Grafana so much harder. I cut and pasted all the metrics and dumped it into ChatGPT and told it to make the panels I wanted (ex. "Give me a dashboard that shows the status of all my servers" and it's able to pick and choose the correct metrics across my Windows server, Macbooks and studio, my Linux machines, etc), but Grafana should have this integrated themselves directly into themselves.
I don’t think that is true. I know several very high-performing engineers (some who could have retired a long time ago and are just in it for the love of the game) who use AI prolifically, without lowering any bars, and just deliver a lot more work.
I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason.
I think it's easy to dismiss that group, but the truth is there was a lot of flux in our industry in the last decade before AI, and I would say almost none of it was beneficial in any way whatsoever.
If I had more time I could write an essay arguing that the 2010s in software development was the rise of the complexity for complexity's sake that didn't make solving real world problems any easier and often massively increased the cost of software development, and worse the drudgery, with little actually achieved.
The thought leaders were big companies who faced problems almost no-one else did, but everyone copied them.
Which led to an unpleasant coding environment where you felt like a hamster spinning in a wheel, constantly having to learn the new hotness or you were a dinosaur just to do what you could already do.
Right now I can throw a wireframe at an AI and poof it's done, react, angular, or whatever who-gives-a-flying-sock about the next stupid javascript framework it's there. Have you switched from webpack to vite to bun? Poof, AI couldn't care less, I can use whatever stupid acronym command line tool you've decided is flavour of the month. Need to write some Lovecraftian-inspired yaml document for whatever dumbass deploy hotness is trending this week? AI has done it and I didn't have to spend 3 months trying to debug whatever stupid format some tit at netflix or amazon or google or meta came up with because they literally had nothing better to do with their life and bang my head against the wall when it falls over every 3 weeks but management are insisting the k8s is the only way to deploy things.
That in itself feels like second-system syndrome but instead of playing out over a single software project it’s the large-scale version playing out over the entire industry.
> I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason.
I say this kindly, but are you sure that _you_ aren't the one in group 2, and _they_ aren't the ones learning new things?
A lot of the discourse around ai coding reminds me of when I went to work for a 90s tech company around 2010 and all the linux guys _absolutely refused_ to learn devops or cloud stuff. It sucks when a lifetime of learned skills becomes devalued overnight.
That’s pretty fair, I’m currently in the “trying to get over the feeling that it’s cheating” phase and also just haven’t formed the habit yet of reaching for AI as a tool in my toolbox; particularly in things like pre-review AI-assisted code review, which I’ve found really useful but sometimes don’t think of doing when I could.
Personally I'm getting sick of the slop. If anything that is what's making me care less.
In my opinion there are two main groups on the spectrum of "vibe coding". The non technical users that love it but don't understand software engineering enough to know what it takes to make a production grade product. The opposite are the AI haters that used chatgpt 3.5 and decided LLM code is garbage.
Both of these camps are the loudest voices on the internet, but there is a quiet but extremely productive camp somewhere in the middle that has enough optimism, open mindedness along with years of experience as an engineer to push Claude Code to its limit.
I read somewhere that the difference between vibe coding and "agentic engineering" is if you are able to know what the code does. Developing a complex website with claude code is not very different than managing a team of off shore developers in terms of risks.
Unless you are writing software for medical devices, banking software, fighter jets, etc... you are doing a disservice to your career by actively avoiding using LLMs as a tool in developing software.
I have used around $2500 in claude code credits (measured with `bunx ccusage`) in the last 6 months, and 95% of what was written is never going to run on someone else's computer, yet I have been able to get ridiculous value out of it.
> extremely productive camp somewhere in the middle
How do you quantify and measure this productivity gain?
These kinds of comments are so spectacularly useless. It was almost impossible to measure productivity gains from _computers_ for nearly two decades after they started being deployed to offices in the 1980s.
There were articles as late as the late 1990s that suggested that investing in IT was a waste of money and had not improved productivity.
You will not see obvious productivity gains until the current generation of senior engineers retires and you have a generation of developers who have only ever coded with AI, since they were in school.
This is nearly as dumb as the post that "Claude code is useless because your home built "Slack App" won't be globally distributed, with multi-primary databases and redis cache layer... and won't scale beyond 50k users".
As if 97% of web apps aren't just basic CRUD with some integration to another system if you are lucky.
99% of companies won't even have 50k users.
Distributing an app to 100 users inside an enterprise is already a hellish nightmare and I'm pretty convinced that citizen developers will never be a thing - we'll sooner reach the singularity.
Here's my take:
I think that citizen developers will be a thing--but not in the way you might be thinking.
More people will be enabled (and empowered) to "build" quick-and-dirty solutions to personal problems by just talking to their phone: "I need a way to track my food by telling you what I ate and then you telling me how much I have left for today. And suggest what my next meal should be."
In the current paradigm--which is rapidly disappearing--that requires a UI app that makes you type things in, select from a list, open the app to see what your totals are, etc. And it's a paid subscription. In 6 months, that type of app can be ancient history. No more subscription.
So it's not about "writing apps for SaaS subscribers." It's about not needing to subscribe to apps at all. That's the disruption that's taking place.
Crappy code, maintenance, support, etc.--no longer even a factor. If the user doesn't like performance, they just say "fix ___" and it's fixed.
What subscription apps can't be replaced in this disruption? Tell me what you think.
That's not actually true.
When you move to the enterprise layer, suddenly you get the opposite problem, you have a low amount of "users" but you often need a load of CPU intensive or DB intensive processing to happen quickly.
One company I worked for had their system built by, ummmm, not the greatest engineers and were literally running out of time in the day to run their program.
Every client was scheduled over 24 hours, and they'd got to running the program for 22 hours per day and were desperately trying to fix it before they ran out of "time". They couldn't run it in parallel because part of the selling point of the program was that it amalgamated data from all the clients.
Without seeing more this seems like it could be solved by not recomputing the entire history to add on data. Depends what kind of math you are doing however.
Some sort of check point system could likely save significant IO.
What am I missing that requires you to recompute all data every day?
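A minimal sketch of the checkpoint idea, assuming the daily math is something that can be updated incrementally (sums, counts, and similar). The `Checkpoint` class, the `update` function, and the `(client, amount)` record shape are all hypothetical, since nothing is known about the real system:

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Running totals plus the number of records already folded in."""
    processed: int = 0
    totals: dict[str, float] = field(default_factory=dict)


def update(checkpoint: Checkpoint, records: list[tuple[str, float]]) -> Checkpoint:
    """Fold in only the records appended since the last checkpoint."""
    for client, amount in records[checkpoint.processed:]:
        checkpoint.totals[client] = checkpoint.totals.get(client, 0.0) + amount
    checkpoint.processed = len(records)
    return checkpoint


# Day 1: two records are processed and the checkpoint is saved.
records = [("client_a", 1.0), ("client_b", 2.0)]
cp = update(Checkpoint(), records)

# Day 2: one new record arrives; only that record is read.
records.append(("client_a", 3.0))
cp = update(cp, records)
print(cp.totals)
```

If the amalgamation across clients is not decomposable this way (a median over all history, for example), a checkpoint alone would not be enough, which is the "depends what kind of math" caveat.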
Well, users or _paying_ users?
It's an important distinction
Probably either. And excluding non-paying users only further narrows the applicability.
This reminds me of Clayton Christensen's theory of disruption.
Disruption happens when firms are disincentivized to switch to the new thing or address the new customer because the current state of it is bad, the margins are low. Intel missed out on mobile because their existing business was so excellent and making phone chips seemed beneath them.
The funny thing is that these firms are being completely rational. Why leave behind high margins and your excellent full-featured product for this half-working new paradigm?
But then eventually, the new thing becomes good enough and overtakes the old one. Going back to the Intel example, they felt this acutely when Apple switched their desktops to ARM.
For now, Claude Code works. It's already good enough. But unless we've plateaued on AI progress, it'll surpass hand crafted equivalents on most metrics.
Even if AI progress plateaus, I'm confident we would build tooling and patterns around the current models that would surpass hand crafted equivalents.
This isn’t the narrative, at least in any circle I speak to. The narrative is currently that everyone needs to strive to be using hundreds of dollars of tokens a day or you aren’t being effective enough. Executives are mulling getting rid of code review and tests. I’ve never seen such blind optimism and so little appreciation for how things can go wrong.
Vibe coders' argument* is that quality of code does not matter because LLMs can iterate much, much faster than humans do.
Consider this overly simplified process of writing a logic to satisfy a requirement:
1. Write code
2. Verify
3. Fix
We, humans, know the cost of each step is high, so we come up with various ways to improve code quality and reduce cognitive burden. We make it easier to understand when we have to revisit.
On the other hand, LLMs can understand** a large piece of code quickly***, and in addition, compile and run with agentic tools like Claude Code at the cost of tokens****. Quality does not matter to vibe coders if LLMs can fill in the function logic that satisfies the requirement by iterating the aforementioned steps quickly.
I don't agree with this approach and have seen too many things broken from vibe code, but perhaps they are right as LLMs get better.
* Anecdotal
** I see an LLM as just a probabilistic function, so it doesn't "reason" like humans do. It's capable of highly advanced problem solving, yet it also fails at primitive tasks.
*** Relative to human
**** I believe the cost of tokens is cheap relative to a full-time engineer, and it'll get cheaper over time.
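The write/verify/fix loop above can be sketched roughly like this. It is only an illustration of the iteration argument: `write` and `verify` are hypothetical stand-ins for "ask the model for code" and "run the checks", not any real tool's API.

```python
from typing import Callable, Optional


def write_verify_fix(
    write: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Repeat write -> verify -> fix until verify reports no failure.

    `write` receives the previous failure message (None on the first round)
    and returns a new candidate. `verify` returns None on success, or a
    failure message that is fed back into the next round.
    """
    failure: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        candidate = write(failure)
        failure = verify(candidate)
        if failure is None:
            return candidate, round_number
    raise RuntimeError(f"no passing candidate after {max_rounds} rounds: {failure}")


# Toy stand-ins: the "model" only gets it right after seeing a failure.
def toy_write(failure: Optional[str]) -> str:
    if failure:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_verify(source: str) -> Optional[str]:
    namespace: dict = {}
    exec(source, namespace)  # acceptable here only because the input is a toy string
    return None if namespace["add"](2, 3) == 5 else "add(2, 3) != 5"


candidate, rounds = write_verify_fix(toy_write, toy_verify)
print(rounds, candidate)
```

Note that the loop is only as good as `verify`: whatever the checks do not cover is exactly where quality stops being enforced.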
I think it's becoming clear we're not anywhere near AGI, we figured out how to vectorize our knowledge bases and replay it back. We have a vectorized knowledge base, not an AI.
I like Tesler’s Theorem, which I recently heard about:
> AI is whatever hasn’t been done yet
Great way of putting it. That’s clearly what it is and it’s very good at that job. But it’s insane to pretend like it can be used with minimal supervision in all or even most applications.
From a tech discourse perspective, things have never been less productive than they are right now. I feel like we’re witnessing the implosion of an industry in real time. Thanks in no small part to venture capital and its henchmen.
Everyone seems to be drinking the proverbial kool-aid, and everyone else who is looking at the situation skeptically is labeled a luddite. I expect we’ll get some clarity over the next few years on who is right. But I don’t know. It feels like the breakdown of shared epistemology. The kind of shared epistemology on which civilization was built.
> Then I explain what I think should be done and we’ll keep discussing it until I stop having more thoughts to give and the machine stops saying stupid things which need correcting.
Users like the author must be the most valuable Claude asset, because AI itself isn't a product — people's feedback that shapes output is.
> Users like the author
He’s a pretty interesting fella, I imagine his work influenced a lot of people over the years
https://en.wikipedia.org/wiki/Bram_Cohen
And this is actually sad. I wish programmers like Bram Cohen spent their time teaching other people instead of a chat model.
They think their dog food tastes great now, not because they improved it any, but because they've forgotten the taste of human food. Karmically hilarious.
"Laughing" at how bad the code in Claude Code is really seems to be missing the forest for the trees. Anthropic didn't set out to build a bunch of clean code when writing Claude Code. They set out to make a bunch of money, and given CC makes in the low billions of ARR, is growing rapidly, and is the clear market leader, it seems they succeeded. Given this, you would think you'd want to approach the strategy that Anthropic used with curiosity. How can we learn from what they did?
There's nothing wrong with saying that Claude Code is written shoddily. It definitely is. But I think it should come with the recognition that Anthropic achieved all of its goals despite this. That's pretty interesting, right? I'd love to be talking about that instead.
If Claude Code truly was worth something they'd sell it instead of forcing its use with a subscription.
Incredible take. They are selling it.
Isn’t this the point? People use CC for the model, not the harness. So the harness can be slop and it doesn’t matter.
> That's pretty interesting, right? I'd love to be talking about that instead
So would I and a couple of others, but HNers don't want to have those kinds of conversations anymore.
people that 'violate the rules of good code' when vibe-coding are largely people that don't know the rules of good code to begin with.
want code that isn't shit? embrace a coding paradigm and stick to it without flip-flopping and sticking your toe into every pond, use a good vcs, and embrace modularity and decomposability.
the same rules when 'writing real code'.
9/10 times when I see an out-of-control vibe coded project it sorta-kinda started as OOP before sorta-kinda trying to be functional and so on. You can literally see the trends change mid-code. That would produce shit regardless of what mechanism used such methods, human/llm/alien/otherwise.
> In this particular case, a human could have told the machine: “There’s a lot of things that are both agents and tools. Let’s go through and make a list of all of them, look at some examples, and I’ll tell you which should be agents and which should be tools. We’ll have a discussion and figure out the general guidelines. Then we’ll audit the entire set, figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
But that isn't the hard part. The hard part is that some people are using the tool versions and some are using the agent versions, so consolidating them one way or another will break someone's workflow, and that incurs a real actual time cost, which means this is now a ticket that needs to be prioritized and scheduled instead of being done for free.
This definitely reminds me of a lot of Nassim Taleb's work, which to say -- Anthropic may not be behaving intelligently but they are at least somewhat behaving honorably, -- if you're going to put out a dangerous product, a moral minimum is to use it heavily yourself so as to be exposed to the risk it creates.
Vibe coding is like building castles in a sandbox, it is fun but nobody would live in them.
Once you have learned enough from playing with sand castles, you can start over to build real castles with real bricks (and steel if you want to build skyscraper). Then it is your responsibility to make sure that they would not collapse when people move it.
It looks like vibe coding, or AI coding in general, has been challenging a few empirical laws:
- Brooks' No Silver Bullet: no single technology or management technique will yield a 10-fold productivity improvement in software development within a decade. If we wrote a spec that detailed everything we want, we would be writing something as specific as code. Currently people seem to believe that a lot of the fundamentals are well covered by existing code, so a few vague lines of "build me XXX with YYY" can lead to amazing results because AI successfully transfers the world-class expertise of some engineers to generate code for such a prompt, so most of the complexity turns out to be accidental, and we need far fewer engineers to handle the essential complexity.
- Kernighan's Law, which says debugging is twice as hard as writing the code in the first place. Now people are increasingly believing that AI can debug way faster than human (most likely because other smart people have done similar debugging already). And in the worst case, just ask AI to rewrite the code.
- Dijkstra on the foolishness of programming in natural language. Something along the lines of: a system described in natural language becomes exponentially harder to manage as its size increases, whereas a system described in formal symbols grows linearly in complexity relative to its rules. Similar to the above, people believe that the messiness of natural language is not a problem as long as we give detailed enough instructions to AI, while letting AI fill in the gaps with statistical "common sense", or expertise thereof.
- Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to above, people start to believe otherwise.
- And remotely Coase's Law, which argues that firms exist because the transaction costs of using the open market are often higher than the costs of directing that same work internally through a hierarchy. People start to believe that the cost of managing and aligning agents is so low that one-person companies that handle a large number of transactions will appear.
Also, ultimately Jevons Paradox, as people worry that the advances in AI will strip out so much demand that the market will slash more jobs than it will generate. I think this is the ultimate worry of many software engineers. Luddites were ridiculed, but they were really skilled craftsmen who spent years mastering the art of using those giant 18-pound shears. They were the staff engineers of the 19th-century textile world. Mastering those 18-pound shears wasn't just a job but an identity, a social status, and a decade-long investment in specialized skills. Yeah, Jevons Paradox may bring new jobs eventually, but it may not reduce the blood and tears of the ordinary people.
Interesting times.
> Kernighan's Law, which says debugging is twice as hard as writing the code in the first place. Now people are increasingly believing that AI can debug way faster than human (most likely because other smart people have done similar debugging already). And in the worst case, just ask AI to rewrite the code.
I thought you were gonna go the opposite direction with this. Debugging is now 100x as hard as writing the code in the first place.
> Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to above, people start to believe otherwise.
Gotta disagree with this too. I find a lot of work has to be done to be able to continue vibing, because complexity increases beyond LLM capabilities rapidly otherwise.
> I thought you were gonna go the opposite direction with this. Debugging is now 100x as hard as writing the code in the first place.
100x harder if a human were to debug AI-generated code. I was merely citing other people's beliefs: AI can largely, if not completely, take care of debugging. And "better", rewrite the code altogether. I don't see how that could be a better approach, but that might just be me.
Assuming that AI challenges all of that is, in my view, a bit simplistic.
> Brooks' No Silver Bullet
Just because a person can create code or "results" much faster now, it doesn't say anything about productivity. Don't mistake dev productivity for economic productivity.
> Kernighan's Law, which says debugging is twice as hard as writing the code
Debugging is such a vague term in these matters. An AI may be decent at figuring out an error it introduced into its own code after it runs its own tests. But a production bug, i.e. one reported by a user, can be very hard for AIs due to their utter lack of context.
> Dijkstra on the foolishness of programming in natural language.
> ...
> Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it.
No clue what the argument is here, "people believe otherwise" isn't.
> Also, ultimately Jevons Paradox
Actually relevant tech people confirm the paradox in the long run. Companies slash jobs now because they tend to consolidate in chaotic times.
Interesting, though I disagree on basically all points...
> No Silver Bullet
As an industry, we do not know how to measure productivity. AI coding also does not increase reliability with how things are going. Same with simplicity, it's the opposite; we're adding obscene complexity, in the name of shipping features (the latter of which is not productivity).
In some areas I can see how AI doubles "productivity" (whatever that means!), but I do not see a 10x on the horizon.
> Kernighan's Law
Still holds! AI is amazing at debugging, but the vast majority of existing code is still human-written; so it'll have an easy time doing so, as indeed AI can be "twice as smart" as those human authors (in reality it's more like "twice as persistent/patient/knowledgeable/good at tool use/...").
Debugging fully AI-generated code with the same AI will fall into the same trap, subject to this law.
(As an aside, I do wonder how things will go once we're out of "use AI to understand human-generated content", to "use AI to understand AI-generated content"; it will probably work worse)
> just ask AI to rewrite the code
This is a terrible idea, unless perhaps there is an existing, exhaustive test harness. I'm sure people will go for this option, but I am convinced it will usually be the wrong approach (as it is today).
> Dijkstra on the foolishness of programming in natural language
So why are we not seeing repos of just natural language? Just raw prompt Markdown files? To generate computer code on-the-fly, perhaps even in any programming language we desire? And for the sake of it, assume LLMs could regenerate everything instantly at will.
For two reasons. First, the prompts would need to rise to a level of precision that makes them indistinguishable from a formal specification. Second, complexity does indeed become "exponentially harder"; inaccuracies inherent to human languages would compound. We still need to persist results in formal languages. They remain the ultimate arbiter. We're now just (much) better at generating large amounts of them.
> Lehman’s Law
This reminds me of a recent article [0]. Let AI run loose without genuine effort to curtail complexity and (with current tools and models) the project will need to be thrown out before long. It is a self-defeating strategy.
I think of this as the Peter principle applied to AI: it will happily keep generating more and more output, until it's "promoted" past its competence. At which point an LLM + tooling can no longer make sense of its own prior outputs. Advancements such as longer context windows just inflate the numbers (more understanding, but also more generating, ...).
The question is, will the market care? If software today goes wrong in 3% of cases, and with wide-spread AI use it'll be, say, 7%, will people care? Or will we just keep chugging along, happy with all the new, more featureful, but more faulty software? After all, we know about the Peter principle, but it's unavoidable and we're just happy to keep on.
> Jevons Paradox
My understanding is the exact opposite. We might well see a further proliferation of information technologies, into remaining sectors which have not yet been (economically) accessible.
0: https://lalitm.com/post/building-syntaqlite-ai/
> The question is, will the market care? If software today goes wrong in 3% of cases, and with wide-spread AI use it'll be, say, 7%, will people care? Or will we just keep chugging along, happy with all the new, more featureful, but more faulty software?
This is THE question. I honestly think the majority will gladly take an imperfect app over waiting for a perfect app or perhaps having no app at all. Some devs might be able to stand out with a polished app taking the traditional approach but it takes a lot longer to achieve that and by that point the market may be different, which is a risk
> And in the worst case, just ask AI to rewrite the code.
"And in the worst case just pay for it twice."
That leads to a dead end.
"Ladran, Sancho, señal que cabalgamos"
The ship has sailed. Vibe coding works. It will only work better in the future.
I have been programming for decades now, I have managed teams of developers. Vibe coding is great, especially in the hands of experts that know what they are doing.
Deal with it because it is not going to stop. In the near future it will be local and 100x faster.
"Aunque la mona se vista de seda mona se queda".
A pig with lipstick is still a pig.
Or, aptly, as you quoted "Don Quixote":
'Con la iglesia hemos topado'.
(indeed Sancho), we just met the Church...
An expert doesn’t deploy code without reviewing it, at that point they’re vibe engineering not vibe coding.
How credible are the claims that the Claude Code source code is bad?
AI naysayers are heavily incentivized to find fault with it, but in my experience it's pretty rare to see a codebase of that size where it's not easy to pick out "bad code" examples.
Are there any relatively neutral parties who've evaluated the code and found it to be obviously junk?
Do you not think that ~400k lines of code for something as trivial as Claude Code is a great indication that there is an immense amount of bloat and stacking of overwrought, poor "choices" by LLMs in there? Do you not encounter this when using LLMs for programming yourself?
I routinely write my own solutions in parallel to LLM-implemented features from varying degrees of thorough specs and the bloat has never been less than 2x my solution, and I have yet to find any bloat in there that would cover more ground in terms of reliability, robustness, and so on. The biggest bloat factor I've found so far was 6x of my implementation.
I don't know, it's hard to read your post and not feel like you're being a bit obtuse. You've been doing this enough to understand just how bad code gets when you vibecode, or even how much nonsense tends to get tacked onto a PR if someone generates from spec. Surely you can do better than an LLM when you write code yourself? If you can, I'm not sure why your question even needs to be asked.
> Do you not think that ~400k lines of code for something as trivial as Claude Code is a great indication that there is an immense amount of bloat and stacking of overwrought, poor "choices" by LLMs in there?
I certainly wouldn't call Claude Code "trivial" - it's by far the most sophisticated TUI app I've ever interacted with. I can drag images onto it, it runs multiple sub-agents all updating their status rows at the same time, and even before the source code leaked I knew there was a ton of sophistication in terms of prompting under the hood because I'd intercepted the network traffic to see what it was doing.
If it was a million+ lines of code I'd be a little suspicious, but a few hundred thousand lines feels credible to me.
> Surely you can do better than an LLM when you write code yourself?
It takes me a solid day to write 100 lines of well designed, well tested code - and I'm pretty fast. Working with an LLM (and telling it what I want it to do) I can get that exact same level of quality in more like 30 minutes.
And because it's so much faster, the code I produce is better - because if I spot a small but tedious improvement I apply that improvement. Normally I would weigh that up against my other priorities and often choose not to do it.
So no, I can't do better than an LLM when I'm writing code by hand.
That said: I expect there are all sorts of crufty corners of Claude Code given the rate at which they've been shipping features and the intense competition in their space. I expect they've optimized for speed-of-shipping over quality-of-code, especially given their confidence that they can pay down technical debt fast in the future.
The fact that it works so well (I get occasional glitches but mostly I use it non-stop every day and it all works fine) tells me that the product is good quality, whether or not the lines of code underneath it are pristine.
How credible are the claims code en masse is good? Because I despise nearly every line of unreasonably verbose Java, that is so much waste of time and effort, but still deployed everywhere.
Every so often, some Windows source gets leaked, and people have a lot of fun laughing at how bad it is. If the source of, say, PeopleSoft were leaked, people would have a lot of fun laughing at how bad it is. If the source of Hogan Deposits were leaked, it would kill anyone who saw it.
I feel like vibe coding a product is functionally the same as prototyping.
In the past, which is a different country, we would throw away the prototypes.
Nowadays vibe coding just keeps adding to them.
I vibe code. but I also remember the days I had ZERO KNOWLEDGE of what needs to be done, and I would hammer the keyboard with garbage code from stack overflow and half baked documentations plus some native guessing of human nature. the end result was me understanding what the hell was going on. those days are over.
Looks like he just promotes AI by attacking vibe coding and inserting his own personal religion. AI is good, you just need to use it right.
"I have been screaming at my computer this past week dealing with a library that was written by overpaid meatbags with no AI help."
And here we go: The famous "humans do it, too" argument. With the gratuitous "meatbag" propaganda.
Look Bram, if you work on bitcoin bullshit startups, perhaps AI is good enough for you. No one will care.
"figure out which category each one belongs in, port the ones that are in the wrong type, and for the ones that are both, read through both versions and consolidate them into one document with the best of both.”
memory created!
We don't know how Claude Code looked before it was vibe coded. It might have always sucked.
No, I completely disagree with this entire article.
Bad code or good code is no longer relevant anymore. What matters is whether or not AI fulfills the contract as to how the application is supposed to work. If the code sucks, you just rerun the prompt again and the next iteration will be better. But better doesn't matter because humans aren't reading the code anymore. I haven't written a line of code since January and I've made very large scale improvements to the products I work on. I've even stopped looking at the code at all except a cursory look out of curiosity.
Worrying about how the sausage is made is a waste of time because that's how far AI has changed the game. Code doesn't matter anymore. Whether or not code is spaghetti is irrelevant. Cutting and pasting the same code over and over again is irrelevant. If it fulfills the contract, that's all that matters. If there's a bug, you update the contract and rerun it.
> Bad code or good code is no longer relevant anymore.
It's extremely relevant inasmuch as garbage code pollutes the AI's context and misleads it into writing more crap. "How the sausage is made" still matters.
Someone vibe-coded the brake control system in your car. It passes the tests. Is it good enough for you?
This is the crux of the whole conversation. What percentage of software is "critical"? My guess is 50%. And AI will soon be able to play in that space as well. So in the future, maybe 25% of "critical" software will require real humans in the loop?
This entirely depends on the product. If it’s your own personal blog, then for sure no need to read the code, but a change in a banking architecture would be irresponsible to not have an understanding of the actual code change.
Yes, vibe coding is perfectly acceptable if it is coupled with financial and penal liability of the authors of the program for any damages caused by that program, so if they choose to use it they must be willing to bet on its suitability.
In case of damages, vibe coding should be an aggravating circumstance, i.e. gross negligence.
When the use of a program cannot have any nefarious consequences, obviously vibe coding is fine. However, I do not use many such applications.
Your code is that contract (unless your tests cover every possible input, which is not practical in most cases).
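To make that concrete, here is a minimal sketch in Python. The two functions and the test list are made up purely for illustration: both implementations satisfy the same small test suite, yet they disagree on an input the tests never exercise, so the tests alone do not pin down the behavior.

```python
# Two hypothetical implementations that both satisfy the same test "contract".

def is_leap_a(year: int) -> bool:
    """Full Gregorian rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_b(year: int) -> bool:
    """Shortcut that only checks divisibility by 4."""
    return year % 4 == 0


# The (illustrative) test suite: passes for both implementations.
TESTS = [(2024, True), (2023, False), (2000, True)]


def passes(fn) -> bool:
    return all(fn(year) == expected for year, expected in TESTS)


both_pass = passes(is_leap_a) and passes(is_leap_b)

# An input the tests do not cover, where the two versions diverge.
disagree_on_1900 = is_leap_a(1900) != is_leap_b(1900)
```

If the regenerated code flips between the two versions from run to run, the tests stay green while the behavior for 1900 changes.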
I had to stop reading halfway through this article, my straw allergy had me sneezing uncontrollably at all the strawmen in there!
OT: I really enjoy Bram's takes, he's brilliant and prickly in the best ways.
Can they ask Claude to clean up the duplication etc. in its English code?
Bizarre to make claims about how the Claude Code devs work based solely on the leaked source. They have talked plenty about how they work.
I've been a skeptic about LLMs in general since I first heard of them. And I'm a sysadmin type, more comfortable with python scripts than writing "real" software. No formal education in coding at all other than taking Harvard's free online python course a few years ago.
So I set out to build an app with CC just to see what it's like. I currently use Copilot (copilot.money) to track my expenditures, but I've become enamored with sankey diagrams. Copilot doesn't have this charting feature, so I've been manually exporting all my transactions and massaging them in the sankey format. It's a pain in the butt, error prone, and my python skills are just not good enough to create a conversion script. So I had CC do it. After a few minutes of back and forth, it was working fine. I didn't care about spaghetti code at all.
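For anyone curious what such a conversion script might look like, here is a rough sketch, not the script CC actually produced. It assumes a CSV export with `category` and `amount` columns where every row is an expense (the real export columns may well differ), and it emits flow lines in SankeyMATIC's `Source [Amount] Target` text format. The `hub` node name is an arbitrary choice.

```python
import csv
import io
from collections import defaultdict


def to_sankey_lines(csv_text: str, hub: str = "Budget") -> list[str]:
    """Sum expenses per category and emit one 'Source [Amount] Target' line each.

    Assumes (hypothetically) that the export has 'category' and 'amount'
    columns and that every row is an expense.
    """
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["category"]] += float(row["amount"])
    # Sort by category so the output is stable between runs.
    return [f"{hub} [{totals[cat]:.2f}] {cat}" for cat in sorted(totals)]


sample = "category,amount\nGroceries,50.25\nRent,1000\nGroceries,10\n"
lines = to_sankey_lines(sample)
```

The resulting lines can be pasted into a Sankey tool that accepts this text format.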
So next I thought, how about having it generate the sankey diagrams (instead of me using sankeymatic's website). 30 minutes later, it had a local website running that was doing what I had been manually doing for months.
Now I was hooked. I started asking it to build a native GUI version (for macOS) and it dutifully cranked out a version using pyobjC etc. After ironing out a few bugs it was usable in less than 30 min. Feature adds consumed all my tokens for the day and the next day I was brimming with changes. Burned through that day's tokens as well and after 3 days (I'm on the el cheapo plan), I have an app that basically does what I want in a reasonably attractive and accurate manner.
I have no desire to look at the code. The size is relatively small, and resource usage is small as well. But it solved this one niche problem that I never had the time or skill to solve.
Is this a good thing? Will I be downvoted to oblivion? I don't know. I'm very very concerned about the long term impact of LLMs on society, technology and science. But it's very interesting to see the other side of what people are claiming.
I really identify with this. As an engineer, I really do enjoy building things. However, a lot of times, what I want is a thing that is built. A lot of the time, that means I build it, which sometimes I enjoy and sometimes I don't; so many of my half finished projects are things that I still think would be awesome to have but didn't care to invest the time in building.
LLM-driven development lets me have the thing built without needing to build the thing, and at the same time I get to exercise some ways-to-build I don't use as often (management, spec writing, spec editing, proactive unblocking, etc.). I have no doubt my work with LLMs has strengthened mental muscles that are also helpful in technical management contexts/senior+principal-level technical work.
Honestly, I think it's great that you could get the thing you wanted done.
Consider this, though: Your anecdote has nothing to do with software engineering (or an engineering mindset). No measurements were done, no technical aspects were taken into consideration (you readily admit that you lack the knowledge to do that), you're not expecting to maintain it or seemingly to further develop it much.
The above situation has never actually been hard; the thing you made is trivial to someone who knows the basics of a small set of things. LLMs (not Claude Code) have made this doable for someone who knows none of the things and that's very cool.
But all of this really doesn't mean anything for solutions to more complex problems where more knowledge is required, or solutions that don't even really exist yet, or something that people pay for, or things that are expected to be worked on continuously over time, perhaps by multiple people.
When people decry vibecoding as being moronic, the subtext really is (or should be) that they're not really talking to you; they're talking to people who are delivering things that people are expected to pay for, or rely on as part of their workflow, and people who otherwise act like their output/product is good when it's clearly a mess in terms of UX.
I get what you're saying, but imagine a CTO/CIO who's never been very technical. The world is full of them. They vibe up an app, and think it's easy. They don't have the developer experience to know the things they're missing.
While I downplayed my job experience, I'm very in touch with developers and their workflows; the challenges they face. And I'm scared because they won't be making these decisions about LLM usage; their bosses, the guy who vibe coded a dumb app over the weekend will.
almost as insane as your dumb crypto project
Where is the evidence that people are obsessed with one-shotting and not doing the iterative back-and-forth, prompt-and-correct system he describes here? It feels like he is attacking a strawman.
the simple truth is all code is garbage
> So pure vibe coding is a myth. But they’re still trying to do it, and this leads to some very ridiculous outcomes
creating a product in a span of mere months that millions of developers use every day is the opposite of ridiculous. we wouldn't even have known about the supposed ridiculousness of the code if it hadn't leaked.
I suppose months is better than weeks. The world's still not recovered from the last such tech used by millions of developers.
> I’ll start a conversation by saying “Let’s audit this codebase for unreachable code,” or “This function makes my eyes bleed,” and we’ll have a conversation about it until something actionable comes up. Then I explain what I think should be done and we’ll keep discussing it until I stop having more thoughts
This is painful to read. It feels like a rant from a person who does not use version control, testing, and CI.
It is cruel to force a machine into a guessing game with a toddler whose spec is "I do not like it". If you have coding standards and preferences, they should already be distilled and explained somewhere, and applied automatically (like an auto linter in the not-so-old days). A good start is to find OS projects you like, let Claude review them, and generate code rules. Then run it on your code base overnight, until it passes the tests and the automated code review against the new coding standards.
"Vibe coding" is when you run several agents in parallel, sometimes multiple agents on the same problem with different approaches, and just do code reviews. It is a mistake to have a synchronous conversation with a machine!
This type of work needs severe automation and parallelisation.
Bram Cohen doesn't just use version control, he invented one.
https://bramcohen.livejournal.com/17319.html
wow - I thought it was called 'ideation' or 'brainstorming'. he didn't give it a 'spec', he started a conversation with it to see if 'something actionable comes up' - which you actually quoted, but didn't appear to read ?
No, I read it. The machine needs handholding because it makes spaghetti code.
This can be easily automated away!
> You don’t have to have poor quality software just because you’re using AI for coding.
People were given faster typers with incredible search capabilities and decided quality doesn’t matter anymore.
I don’t even mean the code. The product quality is noticeably sub par with so many vibe-coded projects.
what about the cult vibe projecting every business logic agentic?
I think it is a 'cult' but also at the same time the inevitable future of engineering. The cult part are a subset of people who are not thinking about LLM code generation critically and blindly follow whatever trend is popular at this exact second.
The worst thing is that everyone but them knows how easy it is to take advantage of their blind hate. News companies, podcasts, and bloggers (such as this one) know they can just twist the thumbscrew and say "AI bad!" then rake in thousands of views/subs without even having to give a substantial argument.
Do you guys remember the cult of git or the containerization cult? Damn, I hate the advancement ;D
I also remember the NFT cult.
AI is just another layer of abstraction. I'm sure the assembly language folks were grumbling about functions as being too abstracted at one point
High level languages that replaced assembly are not black boxes.
And they're as deterministic as the underlying thing they're abstracting... which is kinda what makes an abstraction an abstraction.
I get that people love saying LLMs are just compilers from human language to $OUTPUT_FORMAT but... they simply are not except in a stretchy metaphorical sense.
That's only true if you reduce the definition of "compiler" to a narrow `f = In -> Out`. But that is _not_ a compiler. We have a word for that: function. And in LLM's case an impure one.
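A toy sketch of that distinction in Python, with both functions invented purely for illustration: `compile_expr` is pure (same input, same output, every time), while `sample_completion` is a stand-in for sampling from a model and depends on hidden random state. Neither is meant to resemble a real compiler or a real LLM API.

```python
import random


def compile_expr(source: str) -> str:
    """Toy 'compiler': a pure function from source text to stack-machine ops."""
    return " ".join(
        f"PUSH {tok}" if tok.isdigit() else f"OP {tok}" for tok in source.split()
    )


def sample_completion(prompt: str, rng: random.Random) -> str:
    """Toy stand-in for model sampling: the result depends on RNG state."""
    variants = ["loop", "recursion", "lookup table"]
    return f"{prompt}: implemented with {rng.choice(variants)}"


# Pure: repeated calls with the same input always agree.
first = compile_expr("1 2 +")
second = compile_expr("1 2 +")

# Impure: repeated calls with the same prompt can disagree.
rng = random.Random(0)
outputs = {sample_completion("sum a list", rng) for _ in range(50)}
```

Both have the shape `In -> Out`, but only the first one lets you reason about the output from the input alone.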
I totally see what you're saying, but to me this feels different. Compilation is a fairly mechanical and well understood process. The large language models aren't just compiling English to assembler via your chosen language, they try and guess what you want, they add extra bits you didn't ask for, they're doing some of your solution thinking for you. That feels like more than just abstraction to me.
I think it's still abstraction by definition, but you're right in that it's a much larger single leap than in the past.
> AI is just another layer of abstraction.
A fundamentally unreliable one: even an AI system that is entirely correctly implemented as far as any human can see can yield wrong answers and nobody can tell why.
That’s not entirely the fault of the technology, as natural language just doesn’t make for reliable specs, especially in inexperienced hands, so in a sense we finally got the natural-language programming that some among our ancestors dreamed of, and it turned out to be as unreliable as some others of our ancestors said all along.
It partly is the fault of the technology, however, because while you can level all the same complaints against a human programmer, a (motivated) human will generally be much better at learning from their mistakes than the current generation of LLM-based systems.
(This even if we ignore other issues, such as the fact that it leaves everybody entirely reliant on the continued support and willingness to transact of a handful of vendors in a market with a very high barrier to entry.)
AI is non-deterministic. Can it still be considered an abstraction over a deterministic layer?
Does it have to be? The etymology of the word “abstraction” is “to draw away”. I think it’s relevant to consider just how far away you want to go.
If I’m purely focused on the general outcome as written in a requirement or specification document, I’d consider everything below that as “abstracted away”.
For example, this weekend I built my own MCP server for some services I’m hosting on my personal server (*arr, Jellyfin, …) to be integrated with claude.ai. I’ve written down all the things I want it to do, the environment it has to work in and let Claude go.
Not once have I looked at the code. And quite frankly, I don’t care. As long as it fulfills my general requirements, it can write Python one time and TypeScript the other time should I choose to regenerate from that document. It might behave slightly differently but that is ok to a degree.
From my perspective, that is an abstraction. Deterministic? No, but it also doesn’t have to be.
The argument against this is that human coders are also non-deterministic, so does it really matter if it's a human or an AI agent producing the code – assuming the AI agent is capable of producing human-quality code or better?
I agree it's not a layer of abstraction in the traditional sense though. AI isn't an abstraction of existing code, it's a new way to produce code. It's an "abstraction layer" in the same way an IDE is an abstraction layer.
It can loop and probabilistically converge to a set of standards verified against a standard set of eval inputs
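Roughly what that loop could look like, as a sketch under heavy assumptions: `generate_candidate` is a hypothetical stand-in for a model call (here it just picks randomly from a fixed pool), and `EVAL_CASES` is a made-up eval set. The point is only the shape: generate, verify against fixed inputs, retry, and give up after a budget, since convergence is likely but not guaranteed.

```python
import random
from typing import Callable, Optional

BinaryOp = Callable[[int, int], int]

# Made-up eval set: (inputs, expected output) for an "add two numbers" task.
EVAL_CASES = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

# Pool of candidate implementations; only the last one is correct.
CANDIDATES: list[BinaryOp] = [
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a + b,
]


def generate_candidate(rng: random.Random) -> BinaryOp:
    """Hypothetical stand-in for asking a model for a new attempt."""
    return rng.choice(CANDIDATES)


def passes_evals(fn: BinaryOp) -> bool:
    return all(fn(*args) == expected for args, expected in EVAL_CASES)


def converge(
    rng: random.Random, max_attempts: int = 50
) -> tuple[Optional[BinaryOp], int]:
    """Retry until a candidate passes every eval case or the budget runs out."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(rng)
        if passes_evals(candidate):
            return candidate, attempt
    return None, max_attempts


winner, attempts_used = converge(random.Random(0))
```

Note that the loop is only as good as the eval set: anything the eval inputs do not cover is still unverified.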
Higher level languages that abstract assembly code are deterministic. AI, on the other hand, is not.
That is abstraction of the implementation of the tool, not the output.
Producing outputs you don’t understand is novel
You could say that about atomic bombs, too.
> The AI is very bad at spontaneously noticing, “I’ve got a lot of spaghetti code here, I should clean it up.” But if you tell it this has spaghetti code and give it some guidance (or sometimes even without guidance) it can do a good job of cleaning up the mess.
Set up an AI bot to analyze the code for spaghetti code parts and clean up these parts to turn it into a marvel. :-)