Comment by d4rkp4ttern

2 days ago

Most of the narrative is about how AI is writing all/most code, but I’d wager that the fraction of human reviewed code is approaching zero far faster than anyone is realizing or willing to admit.

Very true. Last year I at least glanced at every line of AI-generated code. Now if some AI makes a 10k-line program for a one-off task, I run the program, glance only over the output, and move on.

  • Especially if you're having an LLM write non-interactive scripts to calculate complex things from large datasets, glancing at the output is not enough to know if the output is remotely accurate (unless the output is so trivial you could literally do it in your head).

    Case in point: I recently asked an LLM to write a pile of code to compile historical baseball stats to test betting success against the results of my hand-written code that evolves genetic algorithms. I marveled for a little while at the unbelievable improvement in EV/ROI that this script showed could have been achieved from certain small tweaks. I only noticed when a totals bet pushed and the push registered in the output as a win, and only because I was carefully staying on top of it. A single stupid >= instead of >, operating recursively, had caused completely nonsensical results that looked plausible.

    Imagine, like, trusting a 10k loc script to give you data for something you were going to build in the physical world, and hoping an LLM hadn't made a mistake like that.

    • Code needs to be tested. I'm glad that the barrier to entry has been lowered, but now we just have a huge number of people who haven't yet learned anything about how to test and verify that the code meets the expected requirements.
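
A minimal Python sketch of the push-versus-win bug described above, with the kind of small check that would have caught it. The function names, the flat-stake ROI model, the decimal-odds convention, and the sample bets are all hypothetical; this is not the commenter's actual script.

```python
from enum import Enum


class Outcome(Enum):
    WIN = "win"
    LOSS = "loss"
    PUSH = "push"


def settle_over(total: float, line: float) -> Outcome:
    """Settle an 'over' totals bet; landing exactly on the line is a push."""
    if total > line:
        return Outcome.WIN
    if total == line:
        return Outcome.PUSH
    return Outcome.LOSS


def settle_over_buggy(total: float, line: float) -> Outcome:
    """Same idea with >= in place of >: every push is silently counted as a win."""
    return Outcome.WIN if total >= line else Outcome.LOSS


def roi(bets, settle, stake=100.0, decimal_odds=2.0) -> float:
    """Net profit divided by total amount staked (assumed flat-stake model)."""
    profit = 0.0
    staked = 0.0
    for total, line in bets:
        outcome = settle(total, line)
        staked += stake
        if outcome is Outcome.WIN:
            profit += stake * (decimal_odds - 1.0)
        elif outcome is Outcome.LOSS:
            profit -= stake
        # A push returns the stake, so profit is unchanged.
    return profit / staked


# Made-up sample: one win, one push, one loss against a line of 9 runs.
SAMPLE_BETS = [(10, 9), (9, 9), (7, 9)]

if __name__ == "__main__":
    print("correct ROI:", roi(SAMPLE_BETS, settle_over))
    print("buggy ROI:  ", roi(SAMPLE_BETS, settle_over_buggy))
```

On this sample the correct settlement breaks even, while the buggy version reports a positive ROI that looks plausible at a glance. A single assertion that `settle_over(9, 9)` is `Outcome.PUSH` is enough to catch the difference.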

  • Which one-off tasks need 10k lines of code?

    • Would depend on what AI and prompt you use ultimately. Ask it to add tests (functional, E2E and unit, maybe invent a new type too), packaging, modular code and/or whatever, and you get to 10K relatively quickly with some of the more verbose LLMs out there.

      Personally it's probably the biggest struggle, trying to rein in the "spray and pray" approach LLMs typically like to take, and reducing the "patch on top of patch" syndrome too.

    • Calculate the engine power of a 2015 VW Polo when travelling 70 mph on a flat road behind a box truck. Draw a chart of drag vs. following distance. How significant is humidity on the result?

    • One off web app for scrubbing through some data, that, once done, will never be run again?

  • This is fine for one-off tools and I do the same. But for building long-lived "professional grade" production software, this fails real quickly.

    My team is using AI for most of the code, but the human review layer is crucial and unavoidable if you're interested in things like reliability, uptime, controlled feature rollouts, the integrity of your users' data, etc.

  • A huge factor I don’t see mentioned often enough is the rapid increase of AI-coding in a language unknown to the dev.

Pretty much. For my home IT projects I have been playing around with various means of implementing agents.

I’ve looked at the outputs here and there - and holy hell would it never pass review if I were trying to make something robust and anti-fragile. But since I can just have AI spit out a fix for the horrific “code” when it breaks in a totally predictable manner it’s just not worth my time to try to actually sit down and get it done right. Or even fight with AI by providing a good specification and design guidelines.

I imagine this is how things are going in the real world, given 30 years of working with various levels of humans. So long as the output is “good enough” it is the extreme minority of folks who care about much else. And that’s for mid-level to senior folks who have the experience to know better. Juniors wouldn’t even be able to pick out most of even the most obvious anti-patterns AI tends to spit out such as putting configuration within code, etc.
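
To illustrate the "configuration within code" anti-pattern mentioned above, here is a minimal Python sketch that reads settings from the environment instead of hard-coding them in the source. The variable names (`APP_DB_URL`, `APP_TIMEOUT_S`, `APP_DEBUG`), the defaults, and the `Settings` fields are hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    db_url: str
    request_timeout_s: float
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables (names are assumptions).

    Passing a plain dict as `env` makes the loader easy to test without
    touching the real process environment.
    """
    env = os.environ if env is None else env
    return Settings(
        # Required: raises KeyError if missing, so a bad deploy fails loudly.
        db_url=env["APP_DB_URL"],
        # Optional, with explicit defaults kept in one place.
        request_timeout_s=float(env.get("APP_TIMEOUT_S", "5")),
        debug=env.get("APP_DEBUG", "false").lower() in {"1", "true", "yes"},
    )


if __name__ == "__main__":
    # Example with an in-memory mapping instead of the real environment.
    print(load_settings({"APP_DB_URL": "sqlite:///example.db", "APP_DEBUG": "1"}))
```

The point of the design is that the same code runs unchanged across environments, and a reviewer can see every tunable value in one function instead of hunting for literals scattered through the program.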

Refactoring is just in a new world too, that us olds probably have a hard time with. It’s no longer examine the code, identify design gaps, find high leverage places to start fixing, etc. It’s now “this is broken, rewrite from scratch” when it eventually turns into too much spaghetti.

In some ways, being entirely focused on the outcomes is freeing. But man, under the hood it is crazy and a whole new world.

I admit, agentic coders do not look at the code except by accident. There's not much point unless you're working on enterprise applications.

People already barely reviewed code, most of it was imported libraries.

  • The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s). But now even that's unreliable because libraries are being slopified at an unreviewable pace too.

    • > The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer

      I don't know many serious software engineers who'd take that approach. The convention was always to actually open up the code, evaluate the quality, see if the authors seem to know what they're doing, then choose the libraries you know work and can be adjusted to fit whatever you want. At least for professional development inside companies, not a single library would be included unless you had at least reviewed that the top-level dependency you pull in actually had code worth pulling in in the first place.

      And this approach works just as well today as it used to. You literally have to spend like 3-5 minutes browsing the code, evaluate the abstractions they've built and then say "Yes, looks good enough to try to use" or "Clearly these people just hacked this together as fast as they could".

    • It's weird that you think humans weren't slopifying code until LLMs came along. At least now they are implementing tests and CI and far more documentation, updating API versions, etc., OOMs above the amount they did before.

      I'd also wager that a far higher percentage of code now gets some review coverage, via prompting AI to do it, than it did before.

      Most PRs pass as long as they A. pass checks, B. don't introduce regressions, C. fix a bug or implement a feature. People talk about this era of humans reviewing code with nostalgia... but that never existed at scale.

    • > The assumption used to be that you respected the library enough and believed it was well reviewed and architected by the maintainer(s).

      Let us be honest, for your average dev, the assumption was that the number of GitHub stars or npm/NuGet downloads was a good proxy for quality.

  • People seem to have rose-tinted glasses about how great and vetted code was before AI coding took off the way it has. It was not great.

    • I’d say the increased scrutiny has merely exposed the difference in care between the different groups in the industry. Seems to explain pretty well why both sides are equally confounded by the other’s expectations.

  • Which people? I’ve never worked at a place where reviews weren’t taken seriously. For small changes a cursory glance, sure, but anything medium-sized meant checkout+local test. If anything we’d spend too much time on code reviews or pair programming?

  • People keep saying this like it’s some meaningful point, but the reality is many people in different projects have a shared need for that code to work correctly, and there is social proof involved in widely used open source libraries. That is why people look at downloads and dependent projects as heuristics of stability and correctness. That is not the case with (and cannot be obtained with) code authored by generative AI.

    • Yes it can. The code will be run and you will have the proof that it ran well. Or it won't run well and you'll redo it. Same as with some imported library.