Comment by zarzavat

4 hours ago

Perhaps I've missed a few weeks' worth of progress, but I don't think AIs have become more trustworthy; the errors are just more subtle.

If the code doesn't compile, that's easy to spot. If the code compiles but doesn't work, that's still somewhat easy to spot.

If the code compiles and works, but it does the wrong thing in some edge case, or has a security vulnerability, or introduces tech debt or dubious architectural decisions, that's far harder to spot, and it doesn't reduce the review burden whatsoever.

If anything, "truthy" code is more mentally taxing to review than just obviously bad code.

I know there are good uses of LLMs out there. I do. But.

The current fever-pitch mandates from above seem to want it applied liberally, and pushing back against that is so discouraging and often career-limiting as to wear the fabric of one's psyche threadbare. For every obvious problem that gets pointed out, there is a workaround; and these workarounds, as is often revealed shortly thereafter, have their own problems, which beget new solutions, ad infinitum.

At some point it genuinely seems like all this work is for the sake of the machine itself. I suppose that is true: the real goal has become so obscured at many firms today that all that remains is the LLM. Are the people betting the farm, and those helping implement their visions, guaranteed a soft exit to cushion them from the consequences, or is rationality really being discarded altogether?

Sure, sound engineering principles can help work around these problems, but what efficiency is truly gained, in terms of cognitive load, developer time, money, or finite resources? Or were those ever an earnest concern?

  • In my opinion you are just wrong.

    It’s an absolute game changer, and it can now multiply your productivity fivefold on a solo greenfield project.

    Maybe half a year ago it was as you said. You had to wait for the agent to finish, you had to review carefully, and often the result was not that great. You did not save a lot of time.

    Now I can spin up 3+ parallel conversations in Codex, each in a git worktree. My work is mainly QA testing the features, refining the behavior, and sometimes making architectural decisions.

    The results are now undeniable. In the past I could not have developed a product of that scope in my free time.

    That is what is possible today. I suspect many engineers have not yet tried things that became feasible over the last few months: parallel agents, resolving merge conflicts, separating functionality out of a large branch into proper PRs. (A rough sketch of the worktree setup follows below.)
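
    Purely for illustration, in Python rather than shell; the branch names and paths are placeholders, and how each agent conversation gets pointed at its directory depends on the tool:

    ```python
    # Rough sketch: one git worktree per parallel agent session, so each
    # conversation edits its own checkout. Branch names and paths are made up.
    import subprocess

    branches = ["feature-a", "feature-b", "feature-c"]

    for branch in branches:
        path = f"../wt-{branch}"
        # `git worktree add -b <branch> <path>` creates a new branch off HEAD
        # and checks it out in its own directory.
        subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
        # Point one agent conversation at `path`; merge the branches back
        # with normal git once each piece has been reviewed.
    ```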

    • "many engineers have not yet tried things that became feasible over the last months"

      I have heard this statement every single day for two years, and yet we still have no companies compressing 10 years into 1 and thus exploding past all the incumbents who don't "get it".

  • There are two sides to the AI mandates.

    The degenerate side is clueless upper management and fad-driven engineering. We have talked extensively about this.

    There is a more rational side to it that I've seen in my org: some engineers absolutely refuse to use AI, and as a consequence they are now, clearly and objectively, much less productive than other engineers. The thing is, you still need to learn how to use the tool, so a nontrivial percentage of obstinate engineers need to be pushed to use it, the same way some developers had to be pushed past their refusal to use Docker or k8s or whatever.

    • Ah yes, we must force these obstinate engineers onto the right path! Only after getting everyone to see the light will they understand and thank us for boundless productivity!! /s

      Perhaps these “obstinate” engineers have good reasons for their decision. And it should be their decision!

      To be so confident in what is “the right way (TM)” and try to force it onto others is... revealing.

This has generally been the case, though. As mentioned in the post, "You want solutions that are proven to work before you take a risk on them" remains true and will be the place where the edges are found.

  • It's about responsibility.

    If I get pwned because my AI agent wrote code that had a security vulnerability, none of my users are going to accept the excuse that I used AI and it's a brave new world. I will get the blame, not Anthropic or OpenAI or Google but me.

    The same goes if my AI-generated code leads to data loss or downtime, or if it uses too many resources, doesn't scale, or gives out error messages like candy.

    The buck stops with me and therefore I have to read the code, line-by-line, carefully.

    It's not even a formality. I constantly find issues with AI-generated code. These things are lazy and often just stub out code instead of making a sober determination of whether the functionality can be stubbed out or not.

    You could say "just AI harder and get the AI to do the review", and I do this a lot, but reviewing is not a neutral activity. A review itself can be harmful if it flags spurious issues whose fixes create new problems. So I still have to go through the AI-generated review issue by issue and weed out any harmful criticism.

    • On the other hand, I don’t need to carefully review every line of code in my thumbnail generator and its associated UI.

      My nonexistent backend isn’t going to be pwned if there is a bug in the thumbnail generation.

      After QA testing on my device, a quick scroll-through of the code is enough.

      Maybe prompt “are errors during thumbnail generation caught to prevent app crashes?” if we’re feeling extra cautious today (roughly the guard sketched below).

      And just like that it saved a day of work.
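
      For what it’s worth, the guard that prompt asks about has roughly the following shape. A sketch in Python with Pillow, purely for illustration (the real app, the helper name, and the 256px default are assumptions):

      ```python
      # Illustrative only: catch decode/IO failures so one bad image can't
      # crash the app. `make_thumbnail` and the default size are made up.
      from PIL import Image

      def make_thumbnail(src_path: str, dst_path: str, size: int = 256) -> bool:
          """Return False instead of raising when a source image is unusable."""
          try:
              with Image.open(src_path) as img:
                  img.thumbnail((size, size))  # resizes in place, keeps aspect ratio
                  img.save(dst_path)
              return True
          except (OSError, ValueError) as exc:  # corrupt file, unsupported format, ...
              print(f"thumbnail failed for {src_path}: {exc}")
              return False
      ```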