
Comment by viraptor

1 day ago

It's not something that suddenly changed. "I'll generate some code" is as nondeterministic as "I'll look for a library that does it", "I'll assign John to code this feature", or "I'll outsource this code to a consulting company". Even if you write it yourself, you're pretty nondeterministic in your results - you're not going to write exactly the same code to solve a problem, even if you explicitly try.

No?

If I use a library, I know it will do the same thing from the same inputs, every time. If I don't understand something about its behavior, I can look at the documentation. Some libraries are better about this, some are crap. But a good library will continue doing what I want years or decades later.

An LLM can't even settle on what to do from one sentence to the next.
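
To make that distinction concrete, here is a minimal sketch in Python; the `fake_llm` function is a hypothetical stand-in for a sampled generation step, not a real API. A pure library call returns identical output for identical input, while sampling can return something different on every run.

    import json
    import random

    def library_call(data):
        # A pure library function: identical input always yields identical output.
        return json.dumps(data, sort_keys=True)

    def fake_llm(prompt, temperature=0.8):
        # Hypothetical stand-in for sampled generation: the same prompt
        # can yield a different result each time it is called.
        candidates = [f"{prompt} -> impl A", f"{prompt} -> impl B", f"{prompt} -> impl C"]
        return random.choice(candidates) if temperature > 0 else candidates[0]

    payload = {"b": 2, "a": 1}
    assert library_call(payload) == library_call(payload)  # holds every time
    print(fake_llm("sort a list"))  # may differ from run to run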

  • The library is deterministic, but looking for the library isn't. In the same way, generating code is not deterministic, but the generated code normally is.

    • I...guess? But once you know of a good library for problem X, you don't need to look for it anymore. I guess if you have a bunch of developers and 0 control over what they do, and they're free to drag in additional dependencies willy-nilly, then yes, that part isn't deterministic? But that's a much bigger problem than anything library-related...

Unlike code generation, all the other examples share one key advantage: alignment between your objective and their actions. With a good enough incentive, they may as well be deterministic.

When you order home delivery, you don't care about who does it or how. Only the end result matters. And we've made reliability good enough that failures are accidents, not a common occurrence.

Code generation is not reliable enough to earn the same quasi-deterministic label.

It's not the same: LLMs are qualitatively different due to the stochastic, non-reproducible nature of their output. From the LLM's point of view, non-functional or incorrect code is exactly the same as correct code, because it doesn't understand anything it's generating. When a human does it, you can say they did a good or bad job, but there is a thought process and actual "intelligence" and reasoning behind the decisions.

I think this insight was really the thing that made me understand the limitations of LLMs a lot better. Some people say that when it produces things that are incorrect or fabricated it is "hallucinating", but the truth is that everything it produces is a hallucination, and the fact that it's sometimes correct is incidental.

  • I'm not sure who generates random code without a goal or without checking whether it works afterwards. Smells like a straw man. Normally you set the rules, you know how to validate whether the result works, and you may even generate tests that pin that behavior (see the sketch at the end of this thread). If I got completely random results rather than what I expect, I wouldn't be using the system - but it's correct and helpful almost every time. What you describe is just not how people work with LLMs in practice.

  • Correct. The thing has no concept of true or false. 0 or 1.

    Therefore it cannot necessarily distinguish between two statements that are practically identical in the eyes of humans. This doesn't make the technology useless, but it's clearly not some AGI nonsense.
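
As referenced above, here is a minimal sketch of what "tests that pin that behavior" can look like: a characterization test whose expected outputs were reviewed by a human once and then frozen, so a later regeneration that behaves differently fails loudly. The `slugify` function and its cases are hypothetical stand-ins for LLM-generated code.

    import re

    def slugify(text):
        # Pretend this body came from a code-generation step and was reviewed once.
        text = text.strip().lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)
        return text.strip("-")

    def test_slugify_pinned_cases():
        # Characterization tests: outputs reviewed by a human, then frozen here.
        # If the function is regenerated and behaves differently, these fail.
        assert slugify("Hello, World!") == "hello-world"
        assert slugify("  spaces   and\ttabs ") == "spaces-and-tabs"

    test_slugify_pinned_cases()
    print("pinned behavior holds")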