Comment by jstummbillig
9 hours ago
Well, to be fair, the amount of goalpost shifting that is going on is quite intense. AI not being able to work in a "serious" project, and being limited to "toy projects", has been a long-standing critique.
But also, bigger projects need some amount of LoC written, and it's a bit silly to pretend that this is not the case or that it is a bad thing.
So the answer to the question is roughly: establishing that an agent can work in a large-ish code base is valuable, because 1) agents not being able to do so has been a critique and 2) it's something that is required for a lot of software projects.
Should we not be counting function points rather than LoC?
Lines of Code is a meaningless measure. It should also be easy to count function points using AI.
I'd argue LoC isn't actually a meaningless measure, but people use it the wrong way. The same program with the same features but fewer LoC is more likely to have a proper design and architecture, and is most likely easier to change and maintain in the future. Of course, that only holds if it has fewer LoC because of proper design, not because you've folded everything into one line.
So if anything, we should find a way to aim for as few lines of code as possible. If you have two agents, and one can build exactly the same program as the other but with half the LoC, then most likely the first agent is better at software engineering and particularly software design.
Of course, as the author of an experiment that investigated exactly this, I'm slightly biased. Cursor's browser had millions of lines of code, which sounded weird to me based on the features and functionality it had. Meanwhile, I built the same thing while actually thinking about the design with the agent, and ended up with ~20K lines of code instead.
Sure, but that's not the point being argued here.
(To state it in AI lingo:)
It's not about the best measure for "amount of code".
It's about whether "amount of code" is a good metric to begin with.
I don’t think it’s solvable. And I think Anthropic etc. know it. LLMs can only reconstitute things in their training data, and they are so data-hungry that they can’t do a good job in a long-lived codebase full of complexity and novelty. There’s never going to be enough similar code on the open internet.
> LLMs can only reconstitute things in their training data
Such as a 4D raytracing engine in Metal? Or integrating APIs for features first released months after their knowledge cut-off date?
LLMs have shown an ability to transfer "knowledge" and capabilities across domains, languages, and use-cases outside their training data.
Case in point: GPT-2 "learning" to translate English to French and vice versa despite non-English examples having been deliberately (and almost entirely) removed from the dataset.
Was this in the GPT-2 paper?