
Comment by lmf4lol

22 days ago

> give me one seriously peer reviewed study please with proper controls
>
> i wait

Go ahead and move the goalposts now... This took about 2 minutes of research to support the conclusions I know to be true. You can waste as much time as you choose in academia attempting to prove any point, while normal people make real contributions using LLMs.

### An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

We evaluate TESTPILOT using OpenAI’s gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage.

- *Link:* [An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (arXiv)](https://arxiv.org/abs/2302.06527)
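
To make the approach concrete, here is a minimal TypeScript sketch of the general idea: prompt an LLM with an npm API function's signature, ask it to emit a unit test, then run the test and measure coverage. This is not TestPilot's actual implementation; the prompt wording, the hypothetical `generateTest` helper, and the use of the openai Node SDK's chat-completions API are illustrative assumptions.

```typescript
// Minimal sketch of LLM-driven unit test generation (not TestPilot's actual pipeline).
// Assumes the `openai` Node SDK (v4-style chat.completions API) and Mocha-style tests.
import OpenAI from "openai";
import { writeFileSync } from "fs";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical helper: ask the model for a self-contained Mocha test for one API
// function and return whatever code it emits.
async function generateTest(pkg: string, fnName: string, signature: string): Promise<string> {
  const prompt = [
    `Write a Mocha unit test for the function \`${fnName}\` from the npm package \`${pkg}\`.`,
    `Signature: ${signature}`,
    `Return only JavaScript code that loads the package with require('${pkg}').`,
  ].join("\n");

  const resp = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: prompt }],
  });

  const text = resp.choices[0]?.message?.content ?? "";
  // Strip an optional markdown fence around the returned code.
  const fenced = text.match(/```(?:\w+)?\n([\s\S]*?)```/);
  return fenced ? fenced[1] : text;
}

// Example usage: write the generated test to disk, then run it with
// `npx mocha generated.test.js` and measure coverage with a tool such as nyc.
generateTest("lodash", "chunk", "chunk(array: any[], size?: number): any[][]")
  .then((code) => writeFileSync("generated.test.js", code));
```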

---

### Field Experiment – CodeFuse (12-week deployment)

- Productivity (measured by the number of lines of code produced) increased by 55% for the group using the LLM. Approximately one third of this increase was directly attributable to code generated by the LLM.
- *Link:* [CodeFuse: Generative AI for Code Productivity in the Workplace (BIS Working Paper 1208)](https://www.bis.org/publ/work1208.htm)
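
For a rough idea of the metric being debated in the replies below, here is a small TypeScript sketch that counts lines of code produced from git history. It is not the paper's methodology; the hypothetical `linesAdded` helper, the author and date values, and the git-based counting are illustrative assumptions.

```typescript
// Rough sketch of an LOC-based productivity metric: lines of code produced by one
// developer over a period, counted from `git log --numstat` output.
import { execSync } from "child_process";

// Sum the lines added by one author over a date range. Each numstat row is
// "added<TAB>deleted<TAB>path"; binary files report "-" and are skipped.
function linesAdded(author: string, since: string, until: string): number {
  const out = execSync(
    `git log --author="${author}" --since="${since}" --until="${until}" --numstat --pretty=tformat:`,
    { encoding: "utf8" }
  );
  return out
    .split("\n")
    .map((line) => parseInt(line.split("\t")[0], 10))
    .filter((n) => !Number.isNaN(n))
    .reduce((sum, n) => sum + n, 0);
}

// Example: compare the same developer's output before and after an assistant rollout.
const before = linesAdded("dev@example.com", "2024-01-01", "2024-03-31");
const after = linesAdded("dev@example.com", "2024-04-01", "2024-06-30");
console.log(`LOC change: ${(((after - before) / before) * 100).toFixed(1)}%`);
```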

  • > This took about 2 minutes of research to support the conclusions I know to be true

    This is a terrible way to do research!

    • The point is that the information is readily available, and rather than actually adding to the discussion they chose to crow “source?”. It’s very lame.

  • “Productivity (measured by the number of lines of code produced) increased”

    The LLMs better have written more code; they’re text generation machines!

    In what world does this study prove that the LLM actually accomplished anything useful?

    • As expected, the goalposts are being moved.

      LOC does have a correlation with productivity, as much as devs hate to acknowledge it. I don’t care that you can provide counterexamples, or even whether the AI on average takes more LOC to accomplish the same task; it still results in more productivity overall because it arrives at the result faster.
