Comment by tptacek

15 hours ago

Median-quality code is extraordinarily valuable. It is most of the load-bearing code people actually ship. What's almost certainly happening here is that you and I have differing definitions of "median-quality" commercial code.

I'm pretty sure that if we triangle-tested (say) Go code from 'jerf against Gemini 2.5's Go output for the same substantial project (say, 2,000 lines) --- not whatever Gemini's initial spew is, but a final product where Gemini is the author of 80+% of the lines --- you would not be able to pick the human code out from the LLM code.
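
For readers unfamiliar with the term: a triangle test is the standard odd-one-out discrimination protocol. A judge sees three samples, two drawn from one source and one from the other, and tries to identify the odd one; if the judge can't beat the one-in-three chance rate over repeated trials, the sources are indistinguishable. The Go sketch below is purely illustrative of how such a blind trial could be assembled; every name in it is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// triangleTrial assembles one blind trial: two samples from one source,
// one from the other, shuffled so position carries no information.
// (A real test would draw distinct samples and run many trials.)
func triangleTrial(human, llm []string, rng *rand.Rand) (samples [3]string, oddOneOut int) {
	// Randomly decide which source supplies the matching pair.
	pair, single := human, llm
	if rng.Intn(2) == 0 {
		pair, single = llm, human
	}
	samples[0] = pair[rng.Intn(len(pair))]
	samples[1] = pair[rng.Intn(len(pair))]
	samples[2] = single[rng.Intn(len(single))]

	// Shuffle the three samples while tracking where the odd one lands.
	oddOneOut = 2
	rng.Shuffle(3, func(i, j int) {
		samples[i], samples[j] = samples[j], samples[i]
		switch oddOneOut {
		case i:
			oddOneOut = j
		case j:
			oddOneOut = i
		}
	})
	return samples, oddOneOut
}

func main() {
	rng := rand.New(rand.NewSource(1))
	human := []string{"human snippet A", "human snippet B"}
	llm := []string{"llm snippet A", "llm snippet B"}
	s, odd := triangleTrial(human, llm, rng)
	fmt.Printf("judge sees: %v; odd one out is #%d\n", s, odd)
}
```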

> This is probably true. I'm using your "median-quality" label, but that would be a generous description of the code I'm getting from LLMs.
>
> I'm getting median-quality junior code. If you're getting median-quality commercial code, then you are speaking better LLMish than I.

  • A couple of prompt/edit cycles into a Cursor project, Gemini's initial output is better-than-junior code, but still not code I would merge. But you review that code, spot the things you don't like (missed idioms, too much repetition, weird organization), and call them out; Gemini goes and fixes them. The result of that process is code that I would merge (or that would pass a code review). (See the sketch below for what one such review fix looks like.)

    What I feel like I keep seeing is people who see that initial LLM code "proposal", don't accept it (reasonably!), and end the process right there. But that's not how coding with an LLM works.
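
To make that review cycle concrete, here is a hypothetical before-and-after in Go. Neither function comes from a real Gemini session: the first shows the kind of repetition a reviewer would call out, and the second shows the sort of revision that typically comes back after the note.

```go
package main

import (
	"errors"
	"fmt"
)

// User is a stand-in type for this illustration.
type User struct {
	Name, Email, Phone string
}

// validateUserDraft is the kind of first-pass output a model tends to
// produce: correct, but repetitive in a way a reviewer would flag.
func validateUserDraft(u *User) error {
	if u.Name == "" {
		return errors.New("name is empty")
	}
	if u.Email == "" {
		return errors.New("email is empty")
	}
	if u.Phone == "" {
		return errors.New("phone is empty")
	}
	return nil
}

// validateUser is what comes back after review notes like "collapse the
// repetition" and "give the errors some context".
func validateUser(u *User) error {
	fields := []struct{ name, value string }{
		{"name", u.Name},
		{"email", u.Email},
		{"phone", u.Phone},
	}
	for _, f := range fields {
		if f.value == "" {
			return fmt.Errorf("validate user: %s is empty", f.name)
		}
	}
	return nil
}

func main() {
	u := &User{Name: "Ada"}
	fmt.Println(validateUserDraft(u)) // email is empty
	fmt.Println(validateUser(u))      // validate user: email is empty
}
```

The point of the sketch is the process, not the snippet: the mergeable version is a product of the review loop, not of the first prompt.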