Comment by xienze

11 hours ago

> I suspect if people saw the handwritten code of many, many, many products that they used every day they would be shocked.

Absolutely. The difference is that the amount of bad code that could be generated used to have an upper limit: how fast a human can type it out. With LLMs, bad code can be shat out at warp speed.

Oh I don't disagree with that. I am getting pretty tired of people making multi-thousand-line pull requests with lots of clearly AI-generated code and expecting it to be merged in.

I think the better unit to commit and work with is the prompt itself, and that the prompt is the thing that should be PR'd at this point, because ultimately the spec is what's important.

  • > I think that the prompt is the thing that should be PR'd at this point, because ultimately the spec is what's important.

    The fundamental problem there is that the code generation step is non-deterministic. You might make a two-sentence change to the prompt to fix a bug, and the regeneration introduces two new ones. Generate again and everything is fine. Way too much uncertainty to have confidence in that approach.

    • If you make the prompts specific enough and provide tests the generated code has to pass before it's accepted, then the result should be fairly close to deterministic.

      Also, people aren't actually reading through most of the code that is generated or merged, so if there's a fear of deploying buggy code generated by AI, then I assure you that's already happening. A lot.
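
      The gate being proposed, regenerate until the output passes a fixed test suite, could be sketched roughly like this (a minimal illustration, not anyone's actual pipeline; `gate`, `BUGGY`, and `CORRECT` are all hypothetical names, and the candidate list stands in for repeated LLM generations):

      ```python
      import subprocess
      import sys
      import tempfile
      import textwrap
      from pathlib import Path

      def gate(candidates, test_source, max_attempts=3):
          """Accept the first candidate implementation that passes the tests.

          `candidates` stands in for successive generations from the same
          prompt; a real pipeline would call the model on each iteration.
          """
          for attempt, code in enumerate(candidates[:max_attempts], start=1):
              with tempfile.TemporaryDirectory() as scratch:
                  Path(scratch, "impl.py").write_text(code)
                  Path(scratch, "test_impl.py").write_text(test_source)
                  # The tests are the acceptance gate: only a passing run
                  # (exit code 0 from unittest) gets merged.
                  result = subprocess.run(
                      [sys.executable, "-m", "unittest", "test_impl"],
                      cwd=scratch, capture_output=True,
                  )
                  if result.returncode == 0:
                      return code, attempt
          return None, max_attempts

      # Hypothetical "generations": the first contains a bug,
      # the second is correct.
      BUGGY = "def add(a, b):\n    return a - b\n"
      CORRECT = "def add(a, b):\n    return a + b\n"

      TESTS = textwrap.dedent("""\
          import unittest
          from impl import add

          class TestAdd(unittest.TestCase):
              def test_add(self):
                  self.assertEqual(add(2, 3), 5)
      """)

      accepted, attempts = gate([BUGGY, CORRECT], TESTS)
      ```

      The tests are what pin down the spec here; without them, two runs of the same prompt can legitimately diverge.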