Comment by hinkley

21 hours ago

How do you make an LLM that was trained on average Internet code not end up as a midwit?

Mediocrity in, mediocrity out.

If you take thousands of photographs of human faces and average them out (even if you do it just by roughly aligning them, overlaying, and averaging the pixels) then what you get is a (perhaps blurry but) notably more attractive than average human face image.
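The averaging effect can be sketched numerically. This is a hypothetical toy model, not real face data: each "face" is a shared structure plus per-face noise, standing in for individual quirks that are uncorrelated across photographs. Averaging washes the quirks out while the shared structure survives.

```python
import numpy as np

# Toy model of the face-averaging idea (hypothetical data, not photos):
# each face = shared structure + idiosyncratic noise.
rng = np.random.default_rng(0)

height, width, n_faces = 64, 64, 1000

shared = rng.uniform(0.0, 1.0, size=(height, width))
faces = shared + rng.normal(0.0, 0.3, size=(n_faces, height, width))

# Per-pixel mean over all faces.
mean_face = faces.mean(axis=0)

# The mean face is far closer to the shared structure than any one face,
# because the uncorrelated noise averages toward zero.
err_single = np.abs(faces[0] - shared).mean()
err_mean = np.abs(mean_face - shared).mean()
print(err_single, err_mean)
```

With 1000 faces the noise in the mean shrinks by roughly a factor of sqrt(1000), which is why the averaged image looks smoother and more regular than any individual one.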

LLM output could be like that. (I am not claiming that it actually is; I haven't looked carefully enough at enough of it to tell.) Humans writing code do lots of bad things, but any specific error is usually made by only a minority of them, so averaging can wash it out.

If (1) it's correct to think of LLMs as producing something like average-over-the-whole-internet code and (2) the mechanism above is operative -- and, again, I am not claiming that either of those is definitely true -- then LLM code could be much higher quality than average, but would seldom do anything that's exceptionally good in ways other than having few bugs.

The reinforcement-learning data is all code created by coding gig workers through Scale AI and similar companies. I cannot believe it would be very good.