Comment by quesera
1 day ago
This is probably true. I'm using your "median-quality" label, but that would be a generous description of the code I'm getting from LLMs.
I'm getting median-quality junior code. If you're getting median-quality commercial code, then you are speaking better LLMish than I.
A couple of prompt/edit "cycles" into a Cursor project, Gemini's first-pass output gives me better-than-junior code, but still not code I would merge. But then you review that code, spot the things you don't like (missed idioms, too much repetition, weird organization), and call them out; Gemini goes and fixes them. The result of that process is code that I would merge (or that would pass a code review).
What I feel like I keep seeing is people who look at that initial LLM code "proposal", don't accept it (reasonably!), and end the process right there. But that's not how coding with an LLM works.
I've gone many cycles deep, some of which have resulted in incremental improvements.
Probably one of my mistakes is testing it with toy challenges, like bad interview questions, instead of workaday stuff that we would normally do in a state of half-sleep.
The latter would require loading the entire project into context, and the value would be low.
My thought with the former was that it should be able to produce working versions of industry-standard algorithms (bubble sort, quicksort, n digits of pi, Luhn, crc32 checksum, timezone and offset math, etc.) without requiring any outside context (proprietary code) -- and, perhaps erroneously, that if it fails to pull off such parlor tricks, and makes such glaring errors in the process, it couldn't add value elsewhere either.
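For reference, the Luhn check is the kind of self-contained "parlor trick" I mean: a few lines, no outside context needed. The sketch below is just an illustration of the task (names and test numbers are mine, not anything an LLM produced in those tests):

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit,
    # subtracting 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# 79927398713 is the classic valid Luhn test number.
assert luhn_valid("79927398713")
assert not luhn_valid("79927398714")
```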