
Comment by reenorap

5 hours ago

As someone who has switched to exclusively coding with AI after 30 years of coding by myself, I find it really weird when people take credit for the lines of code and features that AI generates. Flexing that one "coded" tens or hundreds of thousands of lines per day is a bit cringe, seeing as it's really just the prompt that one typed.

It's a spectrum, isn't it? From targeted edits that you approve manually - which I think you can reasonably take credit for - all the way to full-blown vibe-coded apps where you're hardly involved in the design process at all.

And then there's this awkward bit in the middle where you're not necessarily reviewing all the code the AI generates, but you're the one driving the architecture, coming up with feature ideas, pushing for refactors after reading the code, and so on. This is where I'm at currently, and it's tricky: while I'd never say that I "wrote" the code, I feel I can claim credit for the app as a whole because I was so heavily involved in the process. The end result, I feel, is similar to what I would've produced by hand; it just happened a lot faster.

(granted, the end result is only 2000 LoC after a few weeks of working on it on and off)

Meta apparently now has a "leaderboard" for who is using the most AI, i.e. consuming the most tokens. That must make Anthropic happy, since Meta is using Claude and reportedly accounts for some significant percentage (10%? 20%?) of their total volume.

  • Token usage is a different and more sympathetic heuristic than LOC produced.

    The metric by itself tells you nothing about the value those tokens produced, but to some extent it represents the amount of thinking you are able to offload to the computer.

    Wide-breadth problems seem to scale well with token usage, like scanning millions of lines of code for vulnerabilities, as in the recent Claude mythos results.

    • The trouble with rewarding token usage is the same as rewarding LOC written/generated - if that's what you are asking for then that is what you will get. Asking the AI to "scan the entire codebase for vulnerabilities" would certainly be a good way to climb the leaderboard!


Yes!

I don't mind it so much when it's a newbie or non-techie who has never actually written code before, because bless their hearts, they did it! They got some code working!

But if you've been developing for decades, you know that counting lines of code means nothing, less than nothing: you could probably achieve the same result in half the lines if you thought about it a bit longer.

And to claim this as an achievement when it's LLM-generated... that's not a boast. That doesn't mean what you think it means.

But I guess we hit the same old problem that we've always had: how do you measure productivity in software development? If you wanted to boast about how an LLM is making you 100x more productive, what metric could you use? LOC is the most easily measurable, really, really terrible metric that PMs have been using since we started doing this, because everything else is hard.

  • I forget who said it, but I heard the idea floated that if your work can be measured in terms of productivity at all, it can and probably should be done by software. Not sure how that applies here since as you point out, a 10x programmer probably doesn't produce 10x the code.

  • Here's one thing that somewhat worked for my team. When we first started using LLMs, we decided to run the same process as if they did not exist: same sprint planning meetings, same estimation. We did this for 6 months and saw roughly a 55% increase in output compared to pre-LLM usage. There are biases in what we tried to achieve; it is not easy to estimate that something will take XX hours when you know there's some portion (for example, writing documentation or parts of the test coverage) you won't have to write, but we did our best. After we convinced ourselves of the productivity gains, we stopped doing this.

    • wow, great experiment. I'm amazed the whole team went through with duplicating everything for that long. Nice work :)

      I resorted to feels. After decades of programming, I know when I'm being productive, and I can reasonably estimate when a colleague is being productive. I extrapolate that to the LLM, too. Absolutely not an objective measure, but I feel that I can get the LLM to do in a day a task that would take me 2-3 weeks (post-Nov 25 and using parallel agents).
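The experiment described above boils down to holding the estimation process constant and comparing mean delivered output across the two periods. A minimal sketch of that comparison, assuming output is measured in story points per sprint (the function name and all numbers here are hypothetical, not the team's actual data):

```python
def output_increase(pre_llm: list[float], post_llm: list[float]) -> float:
    """Percentage change in mean output per sprint between two periods."""
    pre_mean = sum(pre_llm) / len(pre_llm)
    post_mean = sum(post_llm) / len(post_llm)
    return (post_mean - pre_mean) / pre_mean * 100

# Story points completed per sprint (illustrative data only).
before = [30, 34, 28, 32, 31, 29]  # pre-LLM sprints
after = [48, 50, 45, 47, 49, 46]   # post-LLM sprints, same estimation process

print(f"Output increase: {output_increase(before, after):.0f}%")  # → 55%
```

The key design point of the experiment is that the estimation methodology stayed fixed, so the two samples are comparable; the acknowledged bias is that estimators knew some work would be automated.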

If anything, couldn't huge amounts of code changes or LoC be a sign of a poor outcome?

  • Yes, there's a reason it was abandoned as a KPI almost as soon as it was introduced. Just because AI is writing the code instead now doesn't magically make it a good metric.

Some argue LoC is irrelevant as a quality/complexity metric because (in this new software product development lifecycle) implementation + testing + maintenance is wholly overseen by agents.

It has never before been possible to code and deploy software with nothing but specs. Whatever software Garry is building is something he couldn't have built otherwise. LoC, in that context, serves as a reminder of the agents' capability to power/slog through reqs/specs (quite incredibly so).

Besides, critical human review can always be fed back as instructions to agents.