Comment by jaredklewis

15 hours ago

> what's the evidence

What’s the evidence for anything software engineers use? Tests, type checkers, syntax highlighting, IDEs, code review, pair programming, and so on.

In my experience, evidence for the efficacy of software engineering practices falls into two categories:

- the intuitions of developers, based on their experiences.

- scientific studies, which are unconvincing. Some are unconvincing because they attempt to measure the productivity of working software engineers, which is difficult; you have to rely on qualitative measures like manager evaluations or quantitative but meaningless measures like LOC or tickets closed. Others are unconvincing because they instead measure the practice against some well-defined task (like a coding puzzle) that is totally unlike actual software engineering.

The evidence for this LLM pattern is the same: some developers have an intuition that it works better.

codemog 15 hours ago

My friend, there’s tons of evidence for all that stuff you talked about in hundreds of papers on arXiv. But you dismiss it entirely in your second bullet point, so I’m not entirely sure what you expect.

jaredklewis 8 hours ago

I’ve read dozens of them and find them unconvincing for the reasons outlined. If you want a more specific critique, link a paper.

I personally like and use tests, formal verification, and so on. But the evidence for these methods is weak.

edit: To be clear, I am not ragging on the researchers. I think it's just kind of an inherently messy field with pretty much endless variables to control for and not a lot of good quantifiable metrics to rely on.

thesz 15 hours ago

You can measure customer-facing defects.

Also, lines of code is not a completely meaningless metric. What one should measure is lines of code that are not verified by the compiler. E.g., in C++ you cannot have unbalanced brackets or use an incorrectly typed value, but you may still have an off-by-one error.

Given all that, you can measure customer-facing defect density and compare different tools, whether they are programming languages, IDEs, or LLM-supported workflows.
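As a rough illustration of that comparison, here is a minimal Python sketch, assuming defect density means customer-facing defects per thousand lines of code (KLOC). The `Release` record and the function names are hypothetical, and the sketch uses raw line counts; it does not implement the refinement of counting only lines the compiler cannot verify.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical sample: one tool's shipped code and its customer-reported defects."""
    tool: str
    lines_of_code: int      # raw count; NOT filtered to compiler-unverifiable lines
    customer_defects: int   # defects reported by customers after release


def defect_density(release: Release) -> float:
    """Customer-facing defects per thousand lines of code (KLOC)."""
    if release.lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return release.customer_defects / (release.lines_of_code / 1000)


def rank_by_defect_density(releases: list[Release]) -> list[tuple[str, float]]:
    """Return (tool, density) pairs, lowest defect density first."""
    return sorted(
        ((r.tool, defect_density(r)) for r in releases),
        key=lambda pair: pair[1],
    )
```

Lower is better. The ranking only means something if the compared codebases are otherwise similar (domain, team, coding style), since raw line counts vary with all of those.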

codeflo 14 hours ago

> Also, lines of code is not completely meaningless metric.

Comparing lines of code can be meaningful, mostly if you can keep a lot of other things constant, like coding style, developer experience, domain, and tech stack. There are many style differences between LLM-generated and human-written code, so I expect 1000 lines of LLM code to do a lot less than 1000 lines of human code, even in the exact same codebase.

jacquesm 14 hours ago

The proper metric is the defect escape rate.
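A minimal Python sketch of that metric, assuming the common definition: defects found after release divided by all defects found (before plus after release). The function name is hypothetical, and how defects get classified as "before" or "after" release is left to whatever tracker a team already uses.

```python
def defect_escape_rate(found_before_release: int, found_after_release: int) -> float:
    """Fraction of all known defects that escaped to production (0.0 to 1.0).

    Assumes escape rate = escaped / (caught before release + escaped).
    """
    if found_before_release < 0 or found_after_release < 0:
        raise ValueError("defect counts cannot be negative")
    total = found_before_release + found_after_release
    if total == 0:
        return 0.0  # no defects recorded at all, so nothing escaped
    return found_after_release / total
```

Unlike defect density, this ratio does not depend on line counts, so it sidesteps the question of how much a line of code is worth.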

exidex 14 hours ago

Now you have to count defects.
- jacquesm 14 hours ago
  
  You have to do that anyway, and in fact you probably were already doing that. If you do not track this then you are leaving a lot on the table.
  

slopinthebag 14 hours ago

Most developer intuitions are wrong.

See: OOP

vbezhenar 13 hours ago

Intuition is subjective. It's hard to convert subjective experience to objective facts.
- tomgp 13 hours ago
  
  That's what science is, though:
  
  * our intuition/hunch/guess is X
  * now let's design an experiment which can falsify X