Comment by sunaurus

4 days ago

I'm constantly thinking about that Microsoft guy who posted something like "we want 1 million LoC per engineer per month", which basically read as satire to most engineers I talked to, except apparently it was not satire at all, and indeed seemed to reflect the position of many CEOs etc when it comes to LLM code generation.

I do think that over the past few months, it feels like the hype around producing unmaintainable amounts of LoC has started dying down. More pragmatic and realistic takes are seemingly shared more openly, and are maybe even getting through to top leadership at some tech companies. Maybe not all is lost yet.

I once worked in a company where there was an 80% code coverage requirement. Some enterprising contractor had a script that generated a single file with its own covering test suite, the size of which could be tuned to achieve 80% over the whole codebase. Most of the real code was untested.
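The arithmetic behind that trick is easy to sketch. This assumes coverage is measured as covered lines divided by total lines, and that the generated filler file is 100% covered by its own tests. The function name and the example numbers are made up for illustration; the contractor's actual script is not shown in the thread.

```python
def padding_lines_needed(total_lines: int, covered_lines: int, target_percent: int = 80) -> int:
    """Fully covered filler lines needed to lift overall line coverage to the target.

    Solves (covered + p) / (total + p) >= target_percent / 100 for the smallest
    integer p, using integer arithmetic to avoid floating-point rounding.
    """
    if not 0 <= target_percent < 100:
        raise ValueError("target_percent must be in [0, 100)")
    if not 0 <= covered_lines <= total_lines:
        raise ValueError("covered_lines must be between 0 and total_lines")
    deficit = target_percent * total_lines - 100 * covered_lines
    if deficit <= 0:
        return 0  # already at or above the target
    # Ceiling division: each filler line closes (100 - target_percent) units of deficit.
    return -(-deficit // (100 - target_percent))


# Hypothetical codebase: 100,000 lines, only 10% covered by real tests.
print(padding_lines_needed(100_000, 10_000))  # 350000
```

Under those assumptions, a codebase that is 10% covered needs filler 3.5 times its own size to report 80%, which is why a tunable generated file does the job.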

  • And thanks to AI, we could generate extremely convincing reams of code whose only purpose is to be fake unit tested. Amazing. I sincerely hope I never need to use this nuclear weapon.

    • Or better yet: effectively fake unit tests. It is almost never the case that tests written by AI detect actual issues. At most they detect that something has changed.

The word “slop” was a good choice to talk about the mass of code generated by AI. I think it resonates with non-tech people and it conveys disgust. It’s clear that we should avoid slop.

“Technical debt” never hooked management in the same way and we have found it hard to convince them that it needs to be addressed. Debt in general is something that can be a problem, but doesn’t need to be avoided or addressed until it is a problem so the can is kicked down the road.

  • Just fix technical debt over time as you work on other things and budget for it as you give estimates.

    This approach has always worked for me. Non technical management will never understand technical debt and really shouldn’t need to.

  • To be fair, they are also different things, though there is certainly overlap...

    To me, tech debt captures the idea that we cut corners now to move faster, with the understanding that it will need to be "re-paid" and cleaned up later; otherwise we take on too much tech debt, and everyone knows too much debt is bad...

    AI slop code means people feed their tasks to a model, trust it to drive the changes, they might do some cosmetic clean ups, then generate a 3 pager PR description they didn't even read themselves, then toss it over to the code reviewer, let that chump figure out what the hell I was doing while I ship 3-4 more PRs...

  • Technical debt is an indefinable quantity, which makes it very prone to being abused to mean "I wish I could rewrite this in [insert some fashionable language, framework or coding style]".

    AI slop is an easier concept to quantify. It's basically the code for which insufficient people in the organisation have a meaningful understanding of how it works or what it does.

    • > It's basically the code for which insufficient people in the organisation have a meaningful understanding of how it works or what it does.

      Its connotation also includes being vastly larger than needed for the purpose it serves, _if_ there is even any purpose.

1000000/25/8/60 = 83+ lines of code per minute.

1000000 LOC per month / 25 days per month / 8 hours per day / 60 minutes per hour

That seems...problematic for anyone doing code reviews.
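For anyone who wants to check the division, here is a small sketch of the same calculation. The 25 working days and 8-hour days are the assumptions from the comment above; the function and constant names are just for illustration.

```python
LOC_PER_MONTH = 1_000_000
WORKDAYS_PER_MONTH = 25
HOURS_PER_DAY = 8
MINUTES_PER_HOUR = 60


def loc_per_minute(loc_per_month: int = LOC_PER_MONTH,
                   workdays: int = WORKDAYS_PER_MONTH,
                   hours_per_day: int = HOURS_PER_DAY) -> float:
    """Lines of code produced per working minute at the given monthly rate."""
    working_minutes = workdays * hours_per_day * MINUTES_PER_HOUR
    return loc_per_month / working_minutes


rate = loc_per_minute()
print(f"{rate:.1f} lines per minute")                   # 83.3 lines per minute
print(f"{60 / rate:.2f} seconds of attention per line")  # 0.72 seconds of attention per line
```

Put the other way around, a reviewer keeping pace with one such engineer would get well under a second per line.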

  • > That seems...problematic for anyone doing code reviews.

    No, it's incentive to let LLMs do the reviews, supporting your tokenmaxxing efforts.

It has been incredibly hilarious to watch the C-suite's sudden realization that tokens COST MONEY and immediately revise their guidelines for how employees should use AI.

Like maybe having every engineer generate 1 million lines of code per month every month…with no thought to how those lines of code would make the company money…or how many tokens would be burned to accomplish this at what cost…wasn’t fully thought through.

> which basically read as satire to most engineers I talked to

Seemingly engineers get this wrong too. I'm reminded of when Cursor bragged about how many lines of code a group of agents could produce, with the underwhelming results of a barely working browser, when the same could be built with much less code.

But they highlighted the amount of code as if they were proud of how much slop their constellation of agents had shit out, and these were supposedly engineers. Really strange to see.

  • “Less is better” is sort of… the position of the engineer who enjoys the craft of programming, right? I don’t think this is universally believed.

    And anyway, I’m pretty sure what people really mean by this “less is better” mantra is: the lowest amount of code that still accomplishes the goal and is still readable is preferred. Linux apparently has 40M lines of code, and I bet most of it is better than mine. Some things just take lots of code.

    Which seems to leave room for these agent salesmen to pitch SLoC as a plus. We just have to believe those lines are all good ones. In that case, it would be impressive. I don’t believe it, but they are probably pitching to people who do.

    • > “Less is better” is sort of… the position of the engineer who enjoys the craft of programming, right? I don’t think this is universally believed.

      I think it is (or should be) a goal- and business-oriented concern as well, not just the concern of an engineer who enjoys their craft.

      More complex systems are worse than simpler systems (that accomplish the same), in cost, maintenance, fragility, ease of understanding, etc. Fewer moving parts usually result in higher reliability, fewer things that can break down or fail to interact properly, etc. That's a business concern too, not just engineering craftsmanship or whatever. Business people should care about this too.

      I don't think this is the same as bikeshedding over irrelevant details, something we software engineers are often prone to. Monstrous complexity does impact the business!

    • > “Less is better” is sort of… the position of the engineer who enjoys the craft of programming, right?

      No, it's the perspective of a programmer who wants the project to not be bogged down too much in technical debt so every change gets slower and slower to implement, as everything gets more intermingled. A clean design helps you move faster for a long time, compared to a design that is fast to implement but makes it hard to move forward properly in the future, without resorting to shortcuts and/or hacks.

      > Some things just take lots of code.

      True. Rich Hickey does a good job differentiating between what's complicated because the domain is complicated, vs. what's complicated because the implementation just ended up that way, even though, with some more thought and design, it could have been made a lot simpler.

    • Less is better is the position of the engineer who has seen some shit and whose career lived to tell about it.

> I do think that over the past few months, it feels like the hype around producing unmaintainable amounts of LoC has started dying down.

I wonder if a small part of this is more and more business and product people actually trying to incorporate AI into their daily workflows. I have seen this in both small companies I work for. People were very excited about getting Claude Cowork a couple of months ago, and while they use it daily, I would say they are rather underwhelmed compared to the magic they were expecting. Complaints include the output being mediocre and verbose, it getting the most basic things wrong, hitting token limits all the time, and people going back to doing things themselves because it is faster.

Sure, there is some degree of holding it wrong in the beginning, but people are realizing that maybe, just maybe, there is still somewhat of a gap between what AI CEOs, LinkedIn grifters, and YouTube AI supplement peddlers claim and reality.

  • I suspect this is it. I'm 40, and the only tech person in my social circle. Many of my friends were all excited about using it for things like basic webdev and home networking. One shotting that type of stuff is very viable even if you don't know anything about the topic. Now that they are trying to use it for something they actually know about, suddenly it's unusable. It's a modification of Gell-Mann Amnesia.

All else being equal, and assuming you are building the right thing, being able to deliver more correct lines of code is a good thing. The question is how to do it reliably, given that a human cannot possibly read all of it. The answer seems to me to involve spot checks with proofs of correctness and statistical quality control, the latter being things that can be automated. One issue I see is that the models are constantly changing and are therefore not well understood statistically.
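One way to make "spot checks plus statistical quality control" concrete is zero-defect acceptance sampling: review a random sample of changes, and if none are defective, bound the true defect rate. The sketch below assumes samples are drawn at random and independently, and that the reviewer catches every defect in what they do read, which are both strong assumptions. The function names are illustrative, not from any particular library.

```python
import math


def defect_rate_upper_bound(samples_reviewed: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the defect rate after a random sample with zero defects.

    If the true defect rate were p, the chance of seeing zero defects in n
    independent samples is (1 - p) ** n. Setting that equal to 1 - confidence
    and solving for p gives the bound.
    """
    if samples_reviewed <= 0:
        raise ValueError("samples_reviewed must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / samples_reviewed)


def samples_needed(max_defect_rate: float, confidence: float = 0.95) -> int:
    """Smallest zero-defect sample size that bounds the defect rate by max_defect_rate."""
    if not 0 < max_defect_rate < 1:
        raise ValueError("max_defect_rate must be in (0, 1)")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    alpha = 1.0 - confidence
    return math.ceil(math.log(alpha) / math.log(1.0 - max_defect_rate))


# To claim "at most 1% of changes are defective" with 95% confidence,
# this many randomly chosen changes must all pass review.
print(samples_needed(0.01))  # 299
```

The bound says nothing once the generator changes, which is the point about models not being statistically stable: each new model version would reset the sample.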

  • If you are generating that many lines of code it’s also almost impossible to tell if you’re building the right thing. You need to deploy each functional change and measure if it’s giving you the expected outcome, before moving onto the next thing.

  • >All else being equal, and assuming you are building the right thing, being able to deliver more correct lines of code is a good thing.

    Why? If you can deliver the same thing in fewer correct lines of code wouldn't that be preferable? At a bare minimum if you're still insisting on using AI to slop out your project, having it do things in fewer lines of code means you can fit more into your LLM's context window.

    • > If you can deliver the same thing in fewer correct lines of code

      It really depends on what you're doing. If your goal is "become interoperable with the N different and incompatible network protocols that people have devised for doing task X", I'd really like to know of a solution in which no part of the code scales with N.

      Example: consider https://bitfocus.io/connections which connects to 700 different things. Right now it's written with Node.JS, with one repo per connection (example: https://github.com/bitfocus/companion-module-meyersound-gala...). Let's say you want to make a similar product but that runs on ESP32 where performance is paramount so you need C++ or Rust. How do you do that without at least as many lines of code as the existing JS implementations for every system supported by Companion?

    • Then you simply produce those fewer lines of code even faster. The question is, how fast are you delivering correct code?

      Moreover, writing too terse code harms readability and maintainability. There is such a thing as irreducible complexity.

I had an MoM at Stripe who pushed back on perf designations based on number of PRs.

I wish I were joking.

(They had never been an engineer.)

  • It's a signal. It's not a strong signal, and you certainly should not base your entire perf on it, but if the number is unusually high or low, it's a signal that could warrant further investigation.

    (I once worked with an engineer that had two PRs, both fairly small bug fixes, in a given calendar year, and when I looked more carefully, they did not have any other obvious output or impact.)

    • Strongly agreed. It is a signal. I did an analysis once at the end of the year. Work group of about 45 engineers. The CM system had a lot of steps, and work could get bounced around, but there was a step where someone "resolved" a software activity. Bug fix or new requirements, it did not matter. This step was when someone actually completed work and put it into the dev stream.

      A quick DB query and the variance was substantial. A couple of people had over a hundred. About 10 had 2. For the year. The ramp up was slow, average was 8 to 10 a year.

      Dig a little deeper. Those at the top were 'group leads': not only did they do IC work, they also got stuck with all the 'paperwork' on the problem work packages. They had 'power', so they could override various things. So, they were doing a lot of work, and taking care of things. Good signal, matches what one would expect.

      Those at the bottom. One of them had effectively been a 'systems engineer'; all of their time was working on requirements with the customer, making powerpoint, etc. Important work, so that signal was inverse of what it originally showed.

      A couple were in the middle that had great reputations for technical expertise. They were spending almost full time in training / mentoring / very hard problems mode. Highly valuable, but not shown by looking at these numbers.

      All the rest? 80% of the work was being done by 20% of the people. We could have dropped about 12 heads and barely noticed.

      The problem is, you could not take action on this measure. It gave you a place to start, but you needed to know more about what was going on day to day.

    • Let's measure "executive performance" by counting how many "answered phone calls" per hour they have. If they don't answer enough calls, that's a signal, that they aren't doing anything useful, and should be depreciated as a result.

  • Trying to parse your sentence, which is ambiguous...

    You're saying that the manager-of-managers would argue that the number of PRs should affect perf ratings? Or the MoM would push back against the line managers who were giving ratings based on # of PRs?

    • They were reviewing perf designations, then pulling up PR count, then arguing against designation based on the number of PRs opened.

I think the reliability struggles of GitHub may have helped with this.

  • I can't help but wonder if the causation is backwards here and the millions of lines of slop had more to do with the GitHub struggles than the reverse.

It's not unmaintainable if you have 1000 agents maintain it.

  • It is unmaintainable even if you spend 100k per month on tokens to have LLMs pretend they are maintaining it, if they slow down and make little ACTUAL progress. Sadly real progress is impossible to measure, if all you have is an overexcited """engineer""", a credit card, and so much cash spent you could hire all the best engineers you know and still have money for a porsche.

    • Well, software presumably has a goal of accomplishing something for some end-user, so the progress should be trivial to measure: are features/changes being completed?

      The marketing ploys of OpenAI/Anthropic where agents build something that nobody uses might be hard to track given that there are zero users. But what about everyone using agents for real software? It's trivial to prove that agents make progress.

> I'm constantly thinking about that Microsoft guy who posted something like "we want 1 million LoC per engineer per month", which basically read as satire to most engineers I talked to

Did those engineers not actually read the complete tweet? Because it wasn't about "engineers should write 1M LOC per month of product code" it was "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work". Which doesn't seem like satire at all..? It just means "develop mostly reliable AI-driven refactoring tools with good guard rails". Which seems quite sensible, actually?

  • > Because it wasn't about "engineers should write 1M LOC per month of product code" it was "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work".

    Making a grand claim of a goal and not really having an explanation of how to achieve it isn't really much better. I could say "we want to scale food production so that one farmer could manage a million acres of corn a month", but that wouldn't really be sensible. A line of code is less work than an acre of corn of course, but I don't think it's at all apparent what the upper bound is for how much code a single engineer can plausibly generate in a month and have any degree of confidence in. Given the absurd levels of hype around AI from non-engineering management in the past couple of years, it's not clear why the benefit of the doubt is earned here when there legitimately are managers and executives claiming pretty much exactly what you're claiming this guy wasn't.

  • I don't care - porting the current architecture - with all the known I wish I had done this differently's - doesn't gain much. See some developers I've worked with who love Rust for "safety", even though they just put everything in unsafe at the first sign of trouble instead of thinking about how this should work safely.

    Porting to a new language is easy, but does nothing useful. What we need is to fix the mistakes of the past so we can get to the future. We need to make acceptable performance.

  • If everything in the initial code is 300% covered with excellently documented tests that should be minimally changed during the transition (if the transition doesn’t reveal any corner cases the tests were missing, maybe the transition is not such a bright move after all), that seems a possible thing to consider.

    Otherwise it really sounds like a recipe for unnecessary huge risk with dubious expected positive outcome.

    Not saying don’t have fun, but on the other hand, maybe not with the core product that is your cash cow?

  • Minor correction: LinkedIn, not twitter. https://www.linkedin.com/posts/galenh_principal-software-eng...

    > Because it wasn't about "engineers should write 1M LOC per month of product code" it was "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work"

    These are one and the same. Whether it's ported code or not doesn't change that. The framing device also doesn't matter, because it's the exact "Oh it's our goal" shtick that executives use in the former's case.

    "It's just a measure" doesn't cut it in a world where every single AI measure immediately gets turned into a target by executives greedy for efficiencies that don't exist.

    EDIT:

    Right, I forgot. This is HN where everyone is a galaxybrain and "Port a million lines of code per month" is a totally reasonable goal for a single individual.

    • I can easily game writing 1M LOC per month by having the LLM write code in more verbose ways, with useless indirections and abstractions thrown in for good measure. I could even ask Claude to write code that does nothing but take up lines.

      In contrast, converting 1M LOC of code per month is a much more solid measure, as long as you measure LOC of the source, not the new code. Sure, in the short term you can pick the easy/verbose things to port, but it's hard to do sustainably. A 5M LOC code base would still be expected to be ported in 5 engineer months.

      Granted, you can still rush the work, not test properly, neglect good planning and engineering. Ported lines of code should not be the only measure (just like with any other measure). But it's a much less problematic measure than coding 1M LOC

  • > "we want to scale automated porting of code to safe languages so that 1 engineer managing 1M LOC of automated conversion can work". Which doesn't seem like satire at all..?

    Because many programmers don't believe that'd work. See the reaction to Bun's port to Rust. (I bet Bun will work and prove those programmers wrong, but that's another story.)