
Comment by jasondigitized

5 hours ago

I feel like I am taking crazy pills. I am getting code that works from Opus 4.5. It seems like people are living in two separate worlds.

Working code doesn’t mean the same thing to everyone. My coworker just started vibe coding. Her code works… on happy paths. It falls apart the moment any kind of error happens, and it’s impossible to refactor in any way. She thinks her code works.

The same coworker was asked to update a service to Spring Boot 4. She made a blog post about it, and she used an LLM for the work. So far every point I’ve read has been a lie, and her workarounds make things like the tests unnecessarily hard to read.

So yeah, “it works”, until it doesn’t, and then it hits you that you end up doing more work in total, because there are more obscure bugs, and fixing them is harder because of the terrible readability.

I can't help but think of my earliest days of coding, 20ish years ago, when I would post my code online looking for help on some small thing and be told that my code was garbage and didn't work at all, even though it actually was working.

There are many ways to skin a cat, and in programming the fact that everything happens in a digital space seemingly removes all boundaries, leading to fractal ways to "skin a cat".

A lot of programmers are hard-headed and "know" the right way to do something. These are the same guys who criticized every other senior dev as a bad/weak coder long before LLMs were around.

The parent's profile shows they are an engineer with experience across multiple areas of software development.

Your own profile says you are a PM whose software skills amount to "Script kiddie at best but love hacking things together."

It seems like the "separate worlds" you are describing are just the impressions of a seasoned engineer versus an amateur reviewing the same code base. It shouldn't be even a little surprising that the result looks much better to you than it does to a more experienced developer.

At least in my experience, learning to quickly read a code base is one of the later skills a software engineer develops. Generally, only very experienced engineers can dive into an open-source code base and answer questions about how the library works and is used; most engineers need documentation to aid them in that process.

I mean, I've dabbled in home plumbing quite a bit, but if AI instructed me to repair my pipes and I thought it "looked great!" but an experienced plumber's response was "ugh, this doesn't look good to me, lots of issues here" I wouldn't argue there are "two separate worlds".

  • > It shouldn't be even a little surprising that the result looks much better to you than it does to a more experienced developer.

    This really is it: AI produces bad-to-mediocre code. To someone who produces terrible code, mediocre is an upgrade; to someone who produces good-to-excellent code, mediocre is a downgrade.

That claim is so vague that there is no contradiction.

Getting code to do exactly what, based on using and prompting Opus in what way?

Of course it works well for some things.

That's a significant rub with LLMs, particularly hosted ones: the variability. Add in quantization, speculative decoding, and runtime adjustment of temperature, nucleus sampling, attention head count, and skipped layers, and you can get wildly different behavior from the same prompt and context sent to the same model endpoint a couple of hours apart.

That's all before you even get to all of the other quirks with LLMs.
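
To make the sampling part of that concrete, here's a toy sketch of temperature plus nucleus (top-p) sampling over a single made-up logits vector. The numbers are invented for illustration and aren't tied to any real model or hosted API:

```python
# Toy demo: how temperature and top-p alone reshape which token gets sampled.
# The logits are invented; a real model emits one such vector per token.
import math
import random

def sample(logits, temperature=1.0, top_p=1.0, seed=None):
    """Temperature + nucleus (top-p) sampling over one logits vector."""
    rng = random.Random(seed)
    # Temperature rescales logits before softmax: <1 sharpens, >1 flattens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus sampling: keep the smallest set of tokens whose mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Draw from the renormalized kept set.
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

logits = [2.0, 1.5, 0.4, 0.1]  # four hypothetical next tokens
for temp, top_p in [(0.2, 1.0), (1.0, 1.0), (1.0, 0.7)]:
    picks = [sample(logits, temp, top_p, seed=s) for s in range(1000)]
    counts = {i: picks.count(i) for i in sorted(set(picks))}
    print(f"temperature={temp}, top_p={top_p}: {counts}")
```

Even in this toy, dropping the temperature to 0.2 makes the top token nearly deterministic, while top_p=0.7 silently cuts the two tail tokens out of the race entirely; a host quietly changing either knob changes what you get back.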

It depends heavily on the scope and type of problem. If you're putting together a standard, isolated TypeScript app from scratch, it can do wonders, but many large systems are spread across multiple services, use abstractions unique to the project, and generally deal with far stricter requirements. I couldn't depend on Claude to do some of the stuff I'd really want, like refactoring the shared code between six massive files without breaking tests. The space where I can have it work productively is still fairly limited.