Comment by alexpotato
8 hours ago
I work as a DevOps/SRE and have been doing it in FinTech (banks, hedge funds, startups) and Crypto (L1 chain) for almost 20 years.
My thoughts on vibe coding vs production code:
- vibe coding can 100% get you to a PoC/MVP, probably 10x faster than pre-LLM
- This is partly b/c it is good at things I'm not good at (e.g. front end design)
- But then I need to go in and double check performance, correctness, information flow, security etc
- The LLM makes this easier but the improvement drops to about 2-3x b/c there is a lot of back and forth, plus me reading the code to confirm, etc. (yes, another LLM could do some of this, but then that needs to get set up correctly too)
- The back and forth part can be faster if e.g. you have scripts/programs that deterministically check outputs
- Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
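As a rough illustration of that kind of deterministic check (the record fields and values here are hypothetical), a script can diff a run's output against a golden file and hand the agent a crisp pass/fail signal:

```python
# Hypothetical golden-file check: compare a run's summary output against
# known-good expected values and report exact mismatches, so an agent (or
# a human) gets a deterministic pass/fail instead of eyeballing results.
def check_output(golden: dict, actual: dict) -> list[str]:
    """Return a list of mismatches between expected and actual values."""
    errors = []
    for key, expected in golden.items():
        got = actual.get(key)
        if got != expected:
            errors.append(f"{key}: expected {expected!r}, got {got!r}")
    return errors

if __name__ == "__main__":
    golden = {"rows": 1000, "checksum": "a3f1", "status": "ok"}
    actual = {"rows": 1000, "checksum": "b2e0", "status": "ok"}
    for err in check_output(golden, actual):
        print(err)  # checksum: expected 'a3f1', got 'b2e0'
```

Feeding the exact mismatch lines back to the LLM tends to converge faster than prose like "the output looks wrong".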
So overall, this is why I think we're getting wildly different reports on how effective vibe coding is. If you've never built a data pipeline and a LLM can spin one up in a few minutes, you think it's magic. But if you've spent years debugging complicated trading or compliance data pipelines you realize that the LLM is saving you some time but not 10x time.
I'm building a Java HFT engine and the amount of things AI gets wrong is eye opening. If I didn't benchmark everything I'd end up with a much less optimized solution.
Examples: AI really wants to use Project Panama (FFM), and while that can be significantly faster than traditional OO approaches, it is almost never the best. And I'm not talking about using deprecated Unsafe calls; I'm talking about primitive arrays being better for Vector/SIMD operations on large sets of data, or NIO being better than FFM + mmap for file reading.
You can use AI to build something that is sometimes better than what someone without domain specific knowledge would develop but the gap between that and the industry expected solution is much more than 100 hours.
AI is extremely good at the things that it has many examples for. If what you are doing is novel then it is much less of a help, and it is far more likely to start hallucinating because 'I don't know' is not in the vocabulary of any AI.
> because 'I don't know' is not in the vocabulary of any AI.
That is clearly false. I’m only familiar with Opus, but it quite regularly tells me that, and/or decides it needs to do research before answering.
If I instruct it to answer regardless, it generally turns out that it indeed didn’t know.
I think the main issue is treating the LLM as an unrestrained black box; there's a reason nobody outside tech trusts LLMs so blindly.
The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and these evals need to be very clear about what goals you're optimizing for.
See karpathy's work on the autoresearch agent and how it carries out experiments; it might be useful for what you're doing.
> there's a reason nobody outside tech trusts LLMs so blindly.
Man, I wish this was true. I know a bunch of non-tech people who just trust random shit that ChatGPT made up.
I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)
We had politicians share LLM crap, researchers doing papers with hallucinated citations..
It's not just tech people.
In my experience, people outside of tech have nearly limitless faith in AI, to the point that when it clashes with traditional sources of truth, people start to question them rather than the LLM.
I would say that if AI has to make decisions about picking between frameworks or constructs irrelevant to the domain at hand, it feels to me like you are not using the AI correctly.
> AI really wants to use Project Panama
It would help if you briefly specified the AI you are using here. There are wildly different results between using, say, an 8B open-weights LLM and Claude Opus 4.6.
I've been using several. LM Studio and any of the open weight models that can fit my GPU's RAM (24GB) are not great in this area. The Claude models are slightly better but not worth the extra cost most of the time, since I typically have to spend almost the same amount of time reworking and re-prompting, plus it's very easy to exhaust credits/tokens. I mostly bounce back and forth between the Codex and Gemini models right now, and this includes using pro models with high reasoning.
Wouldn't Java always lose in terms of latency against a similarly optimized native code in, let's say, C(++)?
You can achieve optimized C/C++ speeds, you just can't program the same way you always have. Step 1, switch your data layout from Array of Structures to Structure of Arrays. Step 2, after initial startup switch to (near) zero object creation. It's a very different way to program Java.
You have to optimize your memory usage patterns to fit in CPU cache as much as possible, which is something typical Java developers don't consider. I have a background in assembly and C.
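The AoS-to-SoA switch can be sketched like this (shown in Python for brevity; in the Java case the fields would be primitive double[]/long[] arrays instead of a list of order objects, and all names here are illustrative):

```python
from array import array

# Structure of Arrays: each field lives in its own contiguous primitive
# array, so a hot loop streams through cache-friendly memory. The Array
# of Structures alternative (a list of per-order objects) scatters the
# same fields across the heap.
class OrdersSoA:
    def __init__(self) -> None:
        self.prices = array("d")      # contiguous doubles
        self.quantities = array("q")  # contiguous 64-bit ints

    def add(self, price: float, qty: int) -> None:
        self.prices.append(price)
        self.quantities.append(qty)

    def notional(self) -> float:
        # Tight loop over contiguous data; this is the shape a JIT (or
        # the Java Vector API) can turn into SIMD code.
        return sum(p * q for p, q in zip(self.prices, self.quantities))

book = OrdersSoA()
book.add(101.5, 10)
book.add(99.0, 2)
print(book.notional())  # 1213.0
```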
I'd say it's slightly harder since there is a little bit of abstraction, but most of the time the JIT will produce code as good as C compilers. It's also a niche that often considers any application running on a general-purpose CPU to be slow. If you want industry-leading speed you start building custom FPGAs.
Not necessarily. Java can be insanely performant, far more than I ever gave it credit for in the first decade of its existence. There has been a ton of optimization and you can now saturate your links even if you do fairly heavy processing. I'm still not a fan of the language but performance issues seem to be 'mostly solved'.
As long as you tune the JVM right it can be faster. But it's a big "if" with the tuning, and you need to write performant code.
Depends. Many reasons, but one is that Java has a much richer set of 3rd party libraries to do things versus rolling your own. And often (not always) third party libraries that have been extensively optimized, real world proven, etc.
Then things like the jit, by default, doing run time profiling and adaptation.
There are actually cases when Java (the HotSpot JVM) runs faster than the same logic written in C/C++ because the JVM is doing dynamic analysis and selective JIT compilation to machine code.
I personally know of an HFT firm that used Java approximately a decade ago. My guess would be they're still using it today given Java performance has only improved since then.
I am curious about what causes some to choose Java for HFT. From what I remember, the number of virgin sacrifices and dances with wolves one must do to approach native speed in this particular area is just way too much development-time overhead.
Probably the same thing that makes most developers choose a language for a project: it's the language they know best.
It wasn't a matter of choosing Java for HFT, it was a matter of selecting a project that was a good fit for Java and my personal knowledge. I was a Java instructor for Sun for over a decade, I authored a chunk of their Java curriculum. I wrote many of the concurrency questions in the certification exams. It's in my wheelhouse :)
My C and assembly is rusty at this point so I believe I can hit my performance goals with Java sooner than if I developed in more bare metal languages.
"HFT" means different things to different people.
I've worked at places where ~5us was considered the fast path and tails were acceptable.
In my current role it's less than a microsecond packet in, packet out (excluding time to cross the bus to the NIC).
But arguably it's not true HFT today unless you're using FPGA or ASIC somewhere in your stack.
Then you list all of the things you want it not to do and construct a prompt to audit the codebase for the presence of those things. LLMs are much better at reviewing code than writing it so getting what you want requires focusing more on feedback than creation instructions.
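One way to make that concrete: encode the "must not do" list as mechanical checks and run them over the code before (or alongside) asking the model to self-review. The pattern names and regexes below are illustrative, not a real ruleset:

```python
import re

# Hypothetical audit checklist: each entry names a thing the codebase
# must not contain, as a regex that can run in CI or be handed to an
# LLM as an explicit review criterion.
BANNED = {
    "sleep-based synchronization": re.compile(r"Thread\.sleep\("),
    "deprecated Unsafe usage": re.compile(r"\bsun\.misc\.Unsafe\b"),
    "string-concatenated SQL": re.compile(r"execute\(.*\+.*\)"),
}

def audit(source: str) -> list[str]:
    """Return the names of banned patterns found in a source string."""
    return [name for name, pattern in BANNED.items() if pattern.search(source)]

snippet = 'stmt.execute("SELECT * FROM t WHERE id=" + userInput)'
print(audit(snippet))  # ['string-concatenated SQL']
```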
I've seen SQL injection and leaked API tokens to all visitors of a website :)
There’s a big gap between reality and the influencer posts about LLMs. I agree with you that LLMs do provide some significant acceleration, but the influencers have tried to exaggerate this into unbelievable numbers.
Even non-influencers are trying to exaggerate their LLM skills as a way to get hired or raise their status on LinkedIn. I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company). Many of these posts come from people who were all in on crypto companies a few years ago.
The world really is changing but there’s a wave of influencers and trend followers trying to stake out their claims as leaders on this new frontier. They should be ignored if you want any realistic information.
I also think these exaggerated posts are causing a lot of people to miss out on the real progress that is happening. They see these obviously false exaggerations and think the opposite must be true, that LLMs don’t provide any benefit at all. This is creating a counter-wave of LLM deniers who think it’s just a fad that will be going away shortly. They’re diminishing in numbers but every LLM thread on HN attracts a few people who want to believe it’s all just temporary and we’re going back to the old ways in a couple years.
> I rarely read the LinkedIn social feed but when I check mine it’s now filled with claims from people about going from idea to shipped product in N days (with a note at the bottom that they’re looking for a new job or available to consult with your company).
This always seems to be the pattern. "I vibe coded my product and shipped it in 96 hours!" OK, what's the product? Why haven't I heard of it? Why can't it replace the current software I'm using? So, you're looking for work? Why is nobody buying it?
Where is the Quicken replacement that was vibecoded and shipping today? Where are the vibecoded AAA games that are going to kill Fortnite? Where is the vibecoded Photoshop alternative? Heck, where is the vibecoded replacement for exim3 that I can deploy on my self hosted E-mail server? Where are all of the actual shipping vibecoded products that millions of users are using?
I found one example of this going very wrong on reddit the other day -
https://www.reddit.com/r/selfhosted/comments/1rckopd/huntarr...
One redditor security reviews a vibe coded project
I agree with your general point but ... "Where are the vibecoded AAA games". A game dev team is typically less than 15% programmers. Most of the team are artists, followed by game designers. Maybe someday those will be replaced too but at the moment, while you can get some interesting pictures from stable-diffusion techniques it's unlikely to make a cohesive game and even prompting to create all of it would still take many person years.
That said, I have had some good experiences getting a few features from zero to working via LLMs and it's helped me find lots of bugs far easier than my own looking.
I can imagine a vibe coded todo app. I can also kind of imagine a vibe coded GIMP/Photoshop, though it would still take several person-years, prompting through each and every feature.
> Where are all of the actual shipping vibecoded products that millions of users are using?
Claude Code and OpenClaw - they are vibecoded. And I believe more coming.
Yeah, I really wonder if someone would trust a vibe-coded version of TurboTax to do their taxes...
I regret only having one upvote for this.
I note that games are mostly art assets and things like level design, and players are already happy to instantly consign such products to the slop bin.
The whole thing is "market for lemons": app stores filling with dozens of indistinguishable clones of each product category will simply scare users off all of them.
"I come from a state that raises corn and cotton and cockleburs and Democrats, and frothy eloquence neither convinces nor satisfies me. I am from Missouri. You have got to show me."
>Many of these posts come from people who were all in on crypto companies a few years ago.
This matches my observation. There seems to be a certain "type" of people like this. And it's not just people looking for work.
My guess is either they have super low critical thinking, a very cynical view of the world where lies and exaggeration are the only way to make it, or something more pathological (narcissism etc).
The "type" is simply the get-rich-quick schemers.
I have a relative who was late to crypto, late to drop shipping, late to carbon credits, but is now absolutely all-in on AI as his ticket out. It honestly depresses the hell out of me trying to talk to him because everything is about money and getting rich.
People like this don't care about underlying technologies or learning past the most basic surface level of understanding.
Day 7 of using Claude Code here are my takes...
"Day 7" would be amazing; all I see YouTube recommending is "I tried it for 24 hours".
I was listening to an "expert" on a podcast earlier today up until the point where the interviewer asked how long his amazing new vibe-coded tooling has been in production, and the self-proclaimed expert replied "actually we have an all-hands meeting later today so I can brief the team and we will then start using the output..."
The “store on the chain” thing turned out to be a fad in terms of technology, even though it made a lot of money (in the billions and more) to some people via the crypto thing. That was less than 10 years ago, so many of us do remember the similarities of the discourse being made then to what’s happening now.
With all that said, today's LLMs do seem to provide a bit more value compared to the blockchain thing. For example, OCR/.pdf parsing is, I'd say, a solved thing right now thanks to LLMs, which is nice.
> Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
Actually, I had some terrible experiences when asking the agent to do something simple in our codebase (like renaming files and fixing build scripts and dependencies), but it took much longer than a human because it kept running the full CI pipelines to check for problems after every attempted change.
A human would, for example, rely on the linter to detect basic issues, run a partial build on affected targets, etc. to save time. But the agent probably doesn't have a sense of elapsed time.
Can't this be solved with something like "Don't run any CI commands" in the AGENTS.md?
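For example, something along these lines (the wording is illustrative, not a tested prompt):

```markdown
## Build & validation (excerpt)

- Do NOT run the full CI pipeline to validate changes.
- For quick feedback, run the linter and build only the affected targets.
- Full CI runs are triggered by a human after review.
```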
Went through something similar recently with database calls.
Copilot said something about having too many rows returned and had some complex answer on how to reduce row count.
I just added a "LIMIT 100" which was more than adequate.
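A minimal sqlite3 sketch of the same fix (table and column names invented): cap the result set in the query instead of post-filtering a huge fetch in application code.

```python
import sqlite3

# Build a throwaway table with 10,000 rows, then fetch only the first 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"event-{i}",) for i in range(10_000)],
)

# The LIMIT clause bounds the work inside the database, not in Python.
rows = conn.execute(
    "SELECT id, payload FROM events ORDER BY id LIMIT 100"
).fetchall()
print(len(rows))  # 100
```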
This is exactly my experience at Lovable. For some parts of the organization, LLMs are incredibly powerful and a productivity multiplier. For the team I am in, Infra, they are often a distraction and a negative multiplier.
I can't say how many times the LLM-proposed solution to a jittery behavior is adding retries. At this point we have to be even more careful with controlling the implementation of things in the hot path.
I have to say though, giving Amp/Claude Code the Grafana MCP + read-only kubectl has saved me days worth of debugging. So there's definitely trade-offs!
My colleague recently shipped a "bug fix" that addresses a race condition by adding a 200ms delay somewhere, almost completely coded by LLM. LLM even suggests that "if this is not good enough, increase it to 300ms".
That says something about how much some people care about this.
Doubly so, because that's how most people have solved similar problems, which is exactly why the LLM suggests it.
The magic is testing. Having locally available testing, and high-throughput testing with a large number of test cases, now unlocks more speed.
The test cases themselves become the focus; the LLM usually can't get them right.
How does that test suite get built and validated? A comprehensive and high quality test suite is usually much larger than the codebase it tests. For example, the sqlite test suite is 590x [1] the size of the library itself
1. https://sqlite.org/testing.html
sqlite is an extreme outlier not a typical example, with regard to test suite size and coverage.
> The magic is testing.
No, it is not.
There is no amount of testing that can fix a flawed design.
The word "Testing" is a very loaded term. Few people, even professionals, fully understand everything that is meant by it.
Consider the following: Unit, Integration, System, UAT, Smoke, Sanity, Regression, API Testing, Performance, Load, Stress, Soak, Scalability, Reliability, Recovery, Volume Testing, White Box Testing, Mutation Testing, SAST, Code Coverage, Control Flow, Penetration Testing, Vulnerability Scanning, DAST, Compliance (GDPR/HIPAA), Usability, Accessibility (a11y), Localization (L10n), Internationalization (i18n), A/B Testing, Chaos Engineering, Fault Injection, Disaster Recovery, Negative Testing, Fuzzing, Monkey Testing, Ad-hoc, Guerilla Testing, Error Guessing, Snapshot Testing, Pixel-Perfect Testing, Compatibility Testing, Canary Testing, Installation Testing, Alpha/Beta Testing...
...and I'm certain I've missed dozens of other test approaches.
There is no science to testing, no provable best way, despite many people's vehement opinions.
You forgot hope-driven development and release processes and other optimism-based methods ("I'm sure it's fine"), or faith-based approaches to testing (ship and pray, ...). Customer-driven involuntary beta testing also comes to mind, and "let's see what happens" 0-day testing before deployment. We also do user-driven error discovery, frequently.
> - This is partly b/c it is good at things I'm not good at (e.g. front end design)
Everyone thinks LLMs are good at the things they are bad at. In many cases they are still just giving “plausible” code that you don’t have the experience to accurately judge.
I have a lot of frontend app dev experience. Even modern tools (Claude w/Opus 4.6 and a decent Claude.md) will slip in unmaintainable slop in frontend changes. I catch cases multiple times a day in code review.
Not contradicting your broader point. Indeed, I think if you’ve spent years working on any topic, you quickly realize Claude needs human guidance for production quality code in that domain.
Yes, I've seen this at work where people are promoting the usage of LLMs for... stuff other people do.
There’s also a big disconnect in terms of SDLC/workflow in some places. If we take at face value that writing code is now 10x faster, what about the other parts of the SDLC? Is your testing/PR process ready for 10x the velocity or is it going to fall apart?
What % of your SDLC was actually writing code? Maybe time to market is now ~18% faster because coding was previously 20% of the duration.
It's the Gell-Mann amnesia effect, applied to LLMs instead of the media.
>Testing workloads that take hours to run still take hours to run with either a human or LLM testing them out (aka that is still the bottleneck)
Absolutely. Tight feedback loops are essential to coding agents and you can’t run pipelines locally.
This is where I think we need better tooling around tiered validation - there's probably quite a bit you can run locally if we had the right separation; splitting the cheap validation from the expensive has compounding benefits for LLMs.
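A sketch of what that tiering might look like (check names and contents are stand-ins; real tiers would shell out to linters, builds, and test suites): the agent gets the cheap tier's verdict immediately, and the expensive tier only runs once the cheap tier is green.

```python
# Tiered validation: run cheap checks on every agent iteration; only run
# the expensive suite once the cheap tier passes.
CHEAP = [("lint", lambda: True), ("typecheck", lambda: True)]
EXPENSIVE = [("integration-suite", lambda: True)]

def run_tier(checks) -> list[str]:
    """Run each (name, check) pair; return the names that failed."""
    return [name for name, check in checks if not check()]

def validate() -> tuple[str, list[str]]:
    failures = run_tier(CHEAP)
    if failures:
        return ("cheap-tier-failed", failures)  # fast feedback, no CI run
    failures = run_tier(EXPENSIVE)
    if failures:
        return ("expensive-tier-failed", failures)
    return ("ok", [])

print(validate())  # ('ok', [])
```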
I concur on the DevSecOps aspect for a more specific reason: if you're failing a pipeline because ThirdPartyTool69 doesn't like your code style or whatever, you can have the LLM fix it. Or get you to 100% test coverage, etc. Or have it update your Cypress/Jest/SonarQube configs until the pipeline passes without losing brain cells doing it by hand. Or find you a set of dependency versions that passes.
What I do now is I make an MVP with the AI, get it working. And then tear it all down and start over again, but go a little slower. Maybe tear down again and then go even more slowly. Until I get to the point where I'm looking at everything the AI does and every line of code goes through me.
Also, now you're reading someone else's code and not everybody likes that. In fact, most self-proclaimed 10x coders I know hate it.
So instead of the 10x coder doing it, the 1x coder does it, but then that factor of 3x becomes 0.3x.
Absolutely. In my experience there are more “good coders” than people who are good at code review/PR/iterative feedback with another dev.
A lot of people are OCD pedants about stuff that can be solved with a linter (but can’t be bothered to implement one) or just “LGTM” everything. Neither provide value or feedback to help develop other devs.
> A lot of people are OCD pedants about stuff that can be solved with a linter (but can’t be bothered to implement one) or just “LGTM” everything. Neither provide value or feedback to help develop other devs.
This may be one of the best quotes on HN in a while.
More generally: LLM effectiveness is inversely proportional to domain specificity. They are very good at producing the average, but completely stumble at the tails. Highly particular brownfield optimization falls into the tails.
Isn’t that the reason why people advocate for spec-driven development instead of vibe coding?
At this point, every programmer who claims that vibecoding doesn't make you at least 10 times more productive is simply lying or, worse, doesn't know how to vibe code. So, you want to tell me that you don't review the code you write? Or that others don't review it? You bring up ONE example with a bottleneck that has nothing to do with programming. Again, if you claim it doesn't make you 10x more productive, you don't know how to use AI, it is that simple. I spin up 10 agents; while 5 are working on apps, 5 do reviews and testing. I am at the end of that workflow and review the code WHILE the 10 agents keep working.
For me it is far more than 10x, but I go easy on the noobs by saying 10x instead of 20x or more.
Can you link to one launched product with users for us?
Just goes to show that most programmers have no idea what most programmers are mostly programming. Great that it works for you, but don't assume that this applies to everyone else.
I can't tell if this is real or a joke.
What exactly are you producing? LinkedIn posts?