Comment by antonvs
6 hours ago
LLMs are one of the most general abstractions possible.
LLMs are also quite deterministic if you want them to be - generally, their final token selection is deliberately randomized, with the amount of randomness controlled by the sampling "temperature". But the word you're looking for here is probably not actually determinism, it's probably something closer to predictability.
In any case, it’s perfectly possible to ensure that the output of LLMs is fully deterministic, debuggable, understandable, and testable.
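To make the temperature point concrete, here is a minimal sketch of temperature sampling over a hypothetical 3-token vocabulary. The function name `sample_token` and the logit values are illustrative assumptions, not any particular model's API, and the toy ignores practical sources of variation in real inference stacks, such as floating-point differences in batched GPU kernels. At temperature 0 it falls back to greedy (argmax) decoding, so the same logits always produce the same token. Above 0 it draws from a softmax, so repeated calls can differ.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits (hypothetical helper).

    temperature == 0 is treated as greedy decoding: always the argmax.
    temperature > 0 samples from softmax(logits / temperature) using rng.
    """
    if temperature == 0:
        # Greedy: the highest-scoring token; max() returns the first on ties.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # random.Random.choices normalizes the weights itself.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # hypothetical scores for a 3-token vocabulary

# Temperature 0: the choice never varies, whatever the random generator does.
greedy = {sample_token(logits, 0, random.Random()) for _ in range(100)}
print(greedy)  # {0}

# Temperature 1: several different tokens show up over many draws.
rng = random.Random(0)
sampled = {sample_token(logits, 1.0, rng) for _ in range(1000)}
print(sorted(sampled))
```

In this sketch, randomness enters only through `rng`, which is why the non-determinism can be described as a choice: pass temperature 0, or fix the generator's seed, and the output repeats.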
> You cannot be serious.
I don’t think you’re thinking about this clearly.
> LLMs are also quite deterministic if you want them to be
In the shallow sense that any PRNG is deterministic if you fix the seed and control the order in which it gets called.
However, that's not usually the situation or scope people are talking about.
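The "seed plus call order" caveat can be shown with a minimal sketch using Python's standard `random.Random`. The consumers `"a"` and `"b"` are hypothetical stand-ins for, say, two requests sharing one generator. With the same seed and the same call order the results repeat exactly; swap the order and each consumer gets a different number, even though the seed never changed.

```python
import random

def run(seed, order):
    """Let each hypothetical consumer in `order` draw one number from a shared seeded PRNG."""
    rng = random.Random(seed)
    results = {}
    for name in order:
        results[name] = rng.random()
    return results

# Same seed and same call order: the outputs are reproducible.
first = run(seed=1234, order=["a", "b"])
second = run(seed=1234, order=["a", "b"])
print(first == second)  # True

# Same seed, different call order: "a" now receives what "b" got before.
swapped = run(seed=1234, order=["b", "a"])
print(first["a"] == swapped["a"])
```

This is the sense in which determinism depends on controlling the whole execution, not just the seed.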
I was just pointing out, in part, that the non-determinism is a choice, but to make that point properly I probably would have needed to go down a whole rabbit hole about exploration of search spaces, etc.
My broader point is that it's not really the non-determinism that's an issue. What the other commenter seems to be looking for is something along the lines of repeatable correctness, where correctness is generally a requirement that the model doesn't have full access to. The non-determinism is an implementation detail here.
With a sufficiently complex prompt and a sufficiently complex codebase, LLMs consistently fail and make mistakes, "forget" parts of the prompt, etc.
There's no comparison to be made between this and, for example, a compiler. It's an incompetent comparison.
> I don’t think you’re thinking about this clearly.
My literal job is dealing with layers of abstraction. I'm thinking pretty clearly when I tell you that, not only are LLMs a super leaky, terrible abstraction, they are also not comparable to any other layers of abstraction. All other layers of abstraction we use are well understood, predictable (as you put it), and DEBUGGABLE.
When Claude deletes a fix it made two weeks ago while trying to fix some unrelated error, do you never stop and think "this is not quite the same as what GCC does"?
> With a sufficiently complex prompt and a sufficiently complex codebase ...
With a sufficiently complex specification of a failure mode, you can find problems with anything.
Humans, given sufficiently complex requirements and sufficiently complex codebases, also regularly fail. You're tacitly admitting that LLMs are approaching (if not exceeding) human levels of performance now. We somehow get non-deterministic humans to achieve useful work. In fact, staff provide managers with an abstraction over the work they're responsible for - managers don't know every detail of the systems they're responsible for.
There are effective ways to use LLMs. I recommend using those: avoid overly complex prompts, and don't let LLMs make unrestricted changes to large codebases. Just as a compiler processes one translation unit at a time, LLMs work best when you scope their attention. The same goes for humans, in fact.
> There's no comparison to be made between this and, for example, a compiler.
A simple comparison is that both can generate useful code. You need to be more precise about the issues you're trying to identify.
Anyway, the comparison to compilers isn't really the point. It's undeniable that LLMs are an abstraction themselves, and that they can generate new abstractions. Saying that they're "not another abstraction" is just definitionally wrong.
Sure, they're not the same kind of abstraction as a traditional compiler. They require new ways of working, though not entirely new ones, as the manager example I gave suggests.
> When Claude deletes a fix it did two weeks ago, while trying to fix some unrelated error, do you never stop and think "this is not quite the same as what GCC does"?
I never made the mistake of thinking LLMs were the same as GCC in the first place.
And once again, I've seen human developers do exactly what you just described. That's why we review code. All the arguments you're making are essentially also arguments that humans shouldn't be involved in software development either.