Comment by devin

4 hours ago

> If you can go from producing 200 lines of code a day to 2,000 lines of code a day, what else breaks? The entire software development lifecycle was, it turns out, designed around the idea that it takes a day to produce a few hundred lines of code. And now it doesn’t.

It is so embarrassing that LOC is being used as a metric for engineering output.

LOC is useful here not because it's a metric for output but because it's a metric for _understandability_. Reviewing 200 lines is a very different workload than reviewing 2000.

  • That's assuming the 200 lines are logical and consistent. Many of my most frustrating LLM bugs are caused by things that look right and are even supported by lengthy comments explaining their (incorrect) reasoning.

  • It’s still a bad metric.

    I have worked with code where 1000s of lines are very straightforward and linear.

    I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.

    The skills and effort required to review and understand those situations are quite different.

    One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.

    • The number of bugs tends to scale linearly with the lines of code written, meaning fewer lines of code for the same functionality will have fewer bugs.

      So I’m pretty skeptical that reviewing 2000 lines of code won’t take any more time than reviewing 200 lines of code.

      Furthermore, how do you know the AI-generated lines are the open-highway lines of code and not the mountain-road ones? There might be hallucinations that pattern-match as perfectly reasonable but have a hard-to-spot flaw.

  • It's still possible to run any LLM in a loop and optimize for LoC while preserving the wanted outcome.
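A minimal sketch of that loop, assuming hypothetical `llm_refactor` and `tests_pass` stand-ins for a real model call and a real test runner (neither is a real API):

```python
# Hypothetical sketch: ask the model for a smaller version each round, and
# keep a candidate only if it both shrinks LOC and still passes the tests.
# `llm_refactor` and `tests_pass` are stand-ins, not real library calls.

def loc(source: str) -> int:
    """Count non-blank lines of code."""
    return sum(1 for line in source.splitlines() if line.strip())

def minimize_loc(source, llm_refactor, tests_pass, max_rounds=5):
    """Shrink `source` round by round while preserving the wanted outcome."""
    best = source
    for _ in range(max_rounds):
        candidate = llm_refactor(best)
        # Accept only behavior-preserving reductions.
        if tests_pass(candidate) and loc(candidate) < loc(best):
            best = candidate
        else:
            break  # no further safe reduction found
    return best
```

The key design point is that the test suite, not the model, is the gatekeeper: a candidate that fails tests is discarded no matter how short it is.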

I experimented with vibe coding (not looking at the code myself) and it produced around 10k LOC even after refactors etc.

I rewrote the same program using my own brain, with ChatGPT just as Google and autocomplete (my normal workflow), and produced the same thing in 1500 LOC.

The effort difference was not that significant either, tbh, although my hand-coded approach probably benefited from my having designed the vibe-coded one first, so I had already thought through what I wanted to build.

  • Sounds like a great opportunity to understand your own development process, and codify it in such detail that the agent can replicate how you work and end up with less code that does the same thing.

    My experience was the same as yours when I started using agents for development about a year ago. Every time I noticed it did something less than optimal, or just "not up to my standards", I'd hash out exactly what those things meant to me and add it to my reusable AGENTS.md; the code the agent outputs today is fairly close to what I "naturally" write.

    • or go with this, and use the agent to prototype ideas, and write it yourself once you know what you want

LoC is perfectly fine as a metric for engineering output. It is terrible as a standalone measure of engineering productivity, and the problems occur when one tries to use it as such.

It's still useful, however, because that is the only metric that is instantly intuitively understandable and comparable across a wide variety of contexts, i.e. across companies and teams and languages and applications.

As we know, within the same team working on the same product, a 1000 LoC diff could take less time to review than a one-line bug fix that took days to debug. Hence we really cannot compare PRs or product features or story points across contexts. If the industry could come up with a standard measure of developer productivity, you can bet everyone would use it, but it's basically unfeasible for this very reason.

So, when such comparisons are made (and in this case it was clearly a colloquial usage), it helps to assume the context remains the same. Like, team A working on product P at company C, using tech stack T with specific software quality processes Q, produced N1 lines of code yesterday, but today with AI they're producing N2 lines of code. Over time the delta between N1 and N2 approximates the actual impact.

(As an aside, this is also what most of the rigorous studies in AI-assisted developer productivity have done: measure PRs across the same cohorts over time with and without AI, like an A/B test.)
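The cohort comparison described above is simple arithmetic once the data is collected; a sketch with made-up numbers (all values are purely illustrative, not from any study):

```python
# Illustrative numbers only: same team, same product, same stack, so the
# before/after delta isolates the change in output, as described above.

def daily_mean(loc_per_day):
    """Average LOC per day over an observation window."""
    return sum(loc_per_day) / len(loc_per_day)

before_ai = [180, 220, 200, 190]  # N1: LOC/day for the cohort pre-AI
with_ai   = [520, 610, 480, 550]  # N2: LOC/day for the same cohort with AI

n1 = daily_mean(before_ai)
n2 = daily_mean(with_ai)
delta = n2 - n1  # over time, this delta approximates the actual impact
```

The point is not the arithmetic but the controlled design: because team, product, and process are held fixed, the delta is attributable to the tooling change rather than to context.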

He's not using LOC as a metric, he's making an observation about the impact of a change in the typical volume of LOC.

Is it? The whole point of the article is that the rate of output for writing code has surpassed the rate at which it can be reviewed by humans. LOC as an input for software review makes a lot of sense, since you literally need to read each line.

Agreed. And LOC has historically been one of the things we've collectively fought management over as a way to evaluate a "productive" developer!

  • Why?

    We should have gone the other way: generated a lot of code and demanded pay raises. Look at the LOC I cranked out! The company is now in my debt!

    If they weren't going to care enough as managers to learn, and "line go up" is all that matters to them, then making all the lines go up = winning.

    You all think there's more to this than performative barter for coin to spend on food/shelter.

    • Because not everyone is just out to earn the most money; some people also want to enjoy the workplace where they work. Personally, the quality of the codebase and infrastructure matters a lot for how much I enjoy working in it, and I'd much rather work in a codebase I enjoy and earn half than in one made by cranking out as many LOC as possible and earn double.

      Although this requires you to take pride in your profession and what you do.


Humans are also incredibly varied and different.

Do you reject all stats that treat the number of people involved (e.g. 2 million people protested X) as "embarrassing" ... because they lump incredibly varied people together and pretend they're equal?

I read somewhere that measuring software engineering output by LoC is like measuring aerospace engineering by pounds added to the plane and I thought that was an apt comparison.

Totally. I thought Simon was wiser than this; even he couldn't resist getting swept up by breathless hype. The moment you start typing "LOC as a metric", alarm bells should go off in your head.

  • This was a podcast, not a pre-scripted talk. I suggest listening to the audio version - it makes it clearer that this was thinking out loud, not a careful weighing of every word.

    • I see, fair point. Sorry for taking a dig at you. Please know that I do appreciate a lot of work that you do. I was just worried for a moment when just reading that bit.

  • LOC is very much an effective metric for general productivity for the median feature. You can't code golf most lines of code out of existence.

    We're also assuming LOC vibe coded by competent engineers who should be able to tell when something is overengineered.