Comment by alabut

18 hours ago

Simon Willison made a similar parallel recently:

https://simonwillison.net/2026/May/6/vibe-coding-and-agentic...

  “The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on.

  If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.

  I’m going to look at their documentation and I’m going to use it to resize some images. And then I’m going to start shipping my own features. And if I start running into problems where the image resizer thing appears to have bugs or the performance isn’t good, that’s when I might dig into their Git repositories and see what’s going on. But for the most part I treat that as a semi-black box that I don’t look at until I need to.”

Suppose the image resize service has some caching, and due to a bug in the caching, under certain circumstances it will respond with an already-cached resized version of a different source image.

Let's say for example it caches on something stupid like the CRC32 of the input image -- good enough that the couple dozen images in your test dataset don't collide, you don't see it in smoke testing your app, but real world data has collisions on a daily basis.
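To make the failure mode concrete, here is a minimal sketch of that kind of cache. Everything in it is hypothetical: `fake_resize`, `ResizeCache`, and the two key functions are stand-ins invented for illustration, not anything from a real service. The collision is found at runtime by a seeded birthday search over random 16-byte inputs, and the SHA-256 key is just one plausible fix.

```python
import hashlib
import random
import zlib


def fake_resize(image: bytes) -> bytes:
    """Stand-in for real resizing: tags the output with the source bytes."""
    return b"thumb-of:" + image


class ResizeCache:
    """Toy cache in front of fake_resize; key_fn decides what counts as 'the same image'."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.store = {}

    def thumbnail(self, image: bytes) -> bytes:
        key = self.key_fn(image)
        if key not in self.store:
            self.store[key] = fake_resize(image)
        return self.store[key]


def crc32_key(image: bytes) -> int:
    # The bug: only 2**32 possible keys, and nothing checks that the
    # cached entry actually came from the same source bytes.
    return zlib.crc32(image)


def sha256_key(image: bytes) -> bytes:
    # One possible fix (an assumption): a collision-resistant digest.
    return hashlib.sha256(image).digest()


def find_crc32_collision(seed: int = 0, size: int = 16):
    """Birthday search: draw random inputs until two distinct ones share a CRC32."""
    rng = random.Random(seed)
    seen = {}
    while True:
        data = rng.getrandbits(size * 8).to_bytes(size, "big")
        crc = zlib.crc32(data)
        other = seen.get(crc)
        if other is not None and other != data:
            return other, data
        seen[crc] = data


doc_b, doc_a = find_crc32_collision()

buggy = ResizeCache(crc32_key)
buggy.thumbnail(doc_b)            # customer B's document is cached first
leaked = buggy.thumbnail(doc_a)   # customer A is served B's thumbnail

safe = ResizeCache(sha256_key)
safe.thumbnail(doc_b)
correct = safe.thumbnail(doc_a)   # customer A gets their own thumbnail
```

As a rough birthday estimate, the expected number of colliding pairs among n distinct inputs is about n(n-1)/2 divided by 2^32. For two dozen test images that is on the order of 6e-8, which is why smoke testing never sees it; for an illustrative 100,000 images it is a little over one.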

This gets into production and customer A sees a resized version of customer B's document for a thumbnail. Now customer A is wondering how many other customers are seeing resized versions of their private documents in thumbnail images. They are very very mad.

If the image resize service was built by "another team" then that other team is responsible for the bug and will take most of the heat for it. If it was built by an "agent swarm" or "gas town" or whatever under my direction then I'm 100% responsible for it and rightly deserve the heat.

That is why I cannot understand any approach that doesn't involve reading the code at all. Testing alone is not sufficient. MTTR is not sufficient because you can't make a customer less mad about a data privacy bug by fixing it.

  • Practically, this is just about confidence values, anticipated blast radius and balancing testing vs review overhead.

This leaves out the part where you ask the original developer: "Why does this thing do that?"

But then, the ownership is clear. And no team would like to be told that their 5th iteration is also broken and can’t be relied on for production usage. That’s the difference with AI code. LLMs are not aligned with your goals. Any trust in them doing the right thing is very misguided.

  • That's why you have them write tons of tests. Way more than you generally would for human written code. And the agent writing/maintaining the tests is not the agent fixing the bugs.

    I've personally had an LLM write an image resizing library for me. It's a fairly basic one, I didn't need anything fancy. I could have used something off the shelf but it was at a time when I was testing what Claude could do. And to be honest, it just worked. One shot, if I recall correctly, or at least, one session with a few tweaks and never touched again. It's been embedded in a larger app for several months and I don't recall hitting a single bug with that, specifically. So I'm not sure your complaints about "the 5th iteration" being broken have much grounds here.

    • > It's a fairly basic one, I didn't need anything fancy.

      > one session with a few tweaks and never touched again

      > and I don't recall hitting a single bug with that, specifically.

      And there you got your answer. If every scenario was as simple as that, we wouldn't really need software development teams. I'm not saying that you can't get good results with an LLM tool, but most software is in constant flux and software engineering is about keeping the cost of making new changes minimal.

      So if you have a dependency, you want to treat it as a black box, because it lowers the cognitive load. But you don't want it to suddenly change its contract, including breaking it in some strange way. And that brings me to...

      > That's why you have them write tons of tests.

      Tests are not an implementation guarantee. They are a canary to warn about some errors. You assume the code is going to be written in good faith, but you place alert points to warn you about possible mistakes. Because you can't really test the full implementation without having a brittle test suite (which you have to maintain).

      And tests rely on a lot of assumptions (mocks, initial cases, fakes,...). Those should be treated with care, because as soon as one is wrong, the test cases it affects are make-believe.

      The only true testing of your software is done in production. Everything else is about avoiding the easy mistakes.
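A small sketch of the "make-believe" case, under stated assumptions: `thumbnail_url`, `RealResizer`, and the `{"url": ...}` / `{"location": ...}` response shapes are all invented for illustration. The mock encodes what the caller believes the contract is, so the check passes even though the (hypothetical) real service has changed its response.

```python
from unittest import mock


def thumbnail_url(resizer, image_id: str, width: int) -> str:
    """Code under test: asks a resizer client for a thumbnail URL."""
    return resizer.resize(image_id, width=width)["url"]


class RealResizer:
    """Hypothetical real service after a silent contract change."""

    def resize(self, image_id: str, width: int) -> dict:
        # The key is now "location", not "url".
        return {"location": f"https://cdn.example/{image_id}?w={width}"}


def mocked_check_passes() -> bool:
    # The mock bakes in our assumption about the response shape.
    fake = mock.Mock()
    fake.resize.return_value = {"url": "https://cdn.example/42?w=64"}
    return thumbnail_url(fake, "42", 64) == "https://cdn.example/42?w=64"


def run_against_real() -> str:
    # Same code path, but against the behavior the mock got wrong.
    try:
        return thumbnail_url(RealResizer(), "42", 64)
    except KeyError as exc:
        return f"KeyError: {exc}"


mock_test_passed = mocked_check_passes()
real_result = run_against_real()
```

The mocked check stays green for as long as nobody revisits the assumption, which is the sense in which the affected test cases stop testing anything.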

Simon Willison’s analogy does not apply unless that other team was immediately fired after they delivered the image resize service, or (more commonly) the work was done by a one-off contractor. The difference is the trust model. We trust that our company has hired a competent team which maintains knowledge of the image resizing service, that they respond to bug reports and feature requests, and that they know how to fix and implement those.

Now I have been on HN long enough to know that we used to despise code written by contractors which we now depend on.

  • Why does the team need to be "fired"?

    The single person who did the service might just quit and go to another job. They might be external consultants that rotate away when the contract ends. It might be a SaaS service where you don't control the code at all - nor the composition of their team.

    We have trusted services, contractors and teams within our companies before. Now suddenly _everyone_ has ALWAYS read and meticulously analyzed every single line of code they have ever imported to a project?

    • Personally, if my company had one person write a utility which my team would depend on, and that person quit soon after delivery, I would be pissed. And I would demand that my team take ownership of the utility, and gain intimate knowledge of the utility, and voice my concerns with management who made the decision to hand out a task like that to a single person. I would then inform that management about the concept of bus factor, and how they just violated best practices. And that next time they decide to hand out a task like that, they should instead just hand it to the team which is gonna rely on that utility.

    • As your parent comment says, it’s about trust. People don’t hire contractors with low reputations. Same with SaaS services. That’s why you see so much stuff about branding and customer testimonials. It can be gamed, but usually works well enough.

      LLMs have no reputation to lose. Their work may or may not be aligned with your goals and they can’t care if they mess up.