Comment by quesera
1 day ago
This is probably true. I'm using your "median-quality" label, but that would be a generous description of the code I'm getting from LLMs.
I'm getting median-quality junior code. If you're getting median-quality commercial code, then you are speaking better LLMish than I.
A couple of prompt/edit "cycles" into a Cursor project, Gemini's first-pass output gives me better-than-junior code, but still not code I would merge. But then you review that code, spot the things you don't like (missed idioms, too much repetition, weird organization), and call them out; Gemini goes and fixes them. The result of that process is code that I would merge (or that would pass a code review).
What I feel like I keep seeing is people who look at that initial LLM code "proposal", don't accept it (reasonably!), and end the process right there. But that's not how coding with an LLM works.
I've gone many cycles deep, some of which have resulted in incremental improvements.
Probably one of my mistakes is testing it with toy challenges, like bad interview questions, instead of workaday stuff that we would normally do in a state of half-sleep.
The latter would require loading the entire project into context, and the value would be low.
My thought with the former was that it should be able to produce working versions of industry-standard algorithms (bubble sort, quicksort, n digits of pi, Luhn, crc32 checksum, timezone and offset math, etc.) without requiring any outside context (proprietary code) -- and, perhaps erroneously, that if it fails to pull off such parlor tricks, and makes such glaring errors in the process, it couldn't add value elsewhere either.
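For reference, the Luhn check is the kind of self-contained "parlor trick" I mean: a few lines, no outside context needed. The sketch below is just an illustration of the task (names and test numbers are mine, not anything an LLM produced in those tests):

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit,
    # subtracting 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# 79927398713 is the classic valid Luhn test number.
assert luhn_valid("79927398713")
assert not luhn_valid("79927398714")
```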