Comment by whatevertrevor

2 days ago

You're right in a descriptive sense, but I think the parent comment's point is about correctness rather than determinism.

In other engineering fields, correctness guarantees can often be phrased in probabilistic terms, e.g. "this bridge will withstand a 10-year flood event but not a 100-year flood event", but underneath those guarantees are hard deterministic load estimates with appropriate error margins.

And I think that's where the core disagreement between you and the parent comment lies. I think they're trying to say AI-generated code-pushers are often getting fuzzy on speccing out the behavior guarantees of their own software. In some ways the software industry has _always_ been bad at this: despite working with deterministic math, surprise software bugs are plentiful. But vibe coding takes this to another level.

(This is my most charitable reading of what they're saying, but it also happens to be where I stand.)

> "I think they're trying to say AI generated code-pushers are often getting fuzzy on speccing out the behavior guarantees of their own software."

I agree, and I think that's the root of the years-long argument over whether programmers are "real" engineers, where "real engineering" implies a level of rigor about the existence of, and adherence to, specifications.

My take, though, is that this unseriousness has little to do with AI and everything to do with the longstanding culture of software generally. In fact, I'd go as far as to say that pre-LLM ML was better about this than the rest of the industry at large.

I've had the good fortune to have been working in this realm since before LLMs became the buzzword, and most ML teams had well-quantified model behaviors! They knew their precision and recall! You kind of had to, because it was very hard to get models to do what you wanted, and companies in this space generally cared about outcomes.
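
To make "well-quantified" concrete, here's a toy sketch (made-up labels, plain Python) of the kind of precision/recall numbers a team would report against a held-out test set:

```python
# Toy sketch: the precision/recall quantification a pre-LLM ML team
# would report for a binary classifier. Labels here are made up.

def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical held-out ground truth vs. model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p, r = precision_recall(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```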

Then we got LLMs, where you can superficially produce really impressive results with ease, and vibes came to dominate over measured results. I can't stand it either, and I'm mostly just waiting for these things to go bust so we can go back to probabilistic systems where we give a shit about quantification.

  • I agree with that.

    I think part of the issue with the lack of "real" quantification of LLM results is that the output and problem domains are so ill-defined. With standard neural nets (and other kinds of ML), classifiers, regression models, and reinforcement-learning models were all solving very narrow, domain-specific problems. It was a no-brainer to measure directly how your vision classifier performed against a radiologist at determining whether an image corresponds to lung cancer.

    Now we've opened up the output to a wide variety of open-ended domains: natural language, programming languages, images, and videos. Since the output is inherently subjective, it's hard to get a good handle on these models' usefulness, let alone to get people to agree on it. Hence the never-ending discourse around them.