Comment by tuhlatte
6 days ago
Now I'm confused -- you're claiming you meant "good enough code" when your previous definition was such that even mathematical proofs could be "terrible"? That doesn't make sense to me. In software engineering, "good enough" has reasonably clear criteria: passes tests, performs adequately, follows conventions, etc. While these are imperfect proxies, they're sufficient for most real-world applications, and crucially -- measurable. And my claim is that they will be more than adequate to get LLMs to produce good code.
And again, diffusion models aren't relevant here. The original comment was about LLMs producing buggy code -- not RL's general limitations in other domains. Diffusion models' tensors aren't written by hand.
You know there are plenty of ways to prove things, right? There isn't just a single proof. Here are a few proofs that pi is irrational[0]. The list is not comprehensive.
Think of it the way you think about code. They all produce the same final result. They're all correct. But is one better than another? Yes, yes it is. But which one is better depends on context.
This is probably a point of contention. Measuring is far more difficult than people think. A lot of work goes into creating a measurement, and only at the end do we get a nice ruler. The problem isn't just that initial complexity; it's that every measure is a proxy. Even your meter stick doesn't measure a meter exactly; it only approximates one. What distinguishes the engineer from the hobbyist is knowing how well a measure aligns with the thing it's supposed to measure.
That's a very hard problem. How often do you ask yourself whether your metric actually tracks what you care about? I'm betting not often enough. Frankly, most things aren't measurable.
[0] https://proofwiki.org/wiki/Pi_is_Irrational#:~:text=Hence%20...