Comment by orbital-decay

8 hours ago

This word (determinism) has a magical effect of warping any online posts it touches. Once you hear it you can almost guarantee it's going to be misguided. At least this time it's actual determinism (same input = same output), not arbitrary unrelated things.

Determinism matters for reproducibility, but do you really want these outputs to be reproducible in this particular case? Making LLM outputs deterministic is relatively trivial: you have to use batch-invariant kernels (if you use batching) and either set the temperature to 0 (don't do that, randomized sampling is there for a reason) or fix the seed (better). It's readily available in a few systems. But this won't make the result more useful, it will just obscure the fact that the agent is genuinely not sure about it - look at the range of the scores it gives! It still won't predict anything, but the score will stay the same each time. Do you really want that?
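To make the temperature-vs-seed distinction concrete, here is a minimal toy sketch of sampling from a fixed distribution. It is not how any real inference stack does it: `sample_token`, `generate`, and `toy_logits` are made-up names and values, and it says nothing about batch-invariant kernels, which are a separate concern.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from softmax(logits / temperature) using the given RNG."""
    if temperature == 0:
        # Greedy decoding: always the highest-logit index, no randomness involved.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits, steps, temperature, seed):
    # A private RNG, so the seed alone determines every draw.
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(steps)]

toy_logits = [2.0, 1.5, 0.3]  # made-up values, not from any real model

run_a = generate(toy_logits, steps=20, temperature=1.0, seed=1234)
run_b = generate(toy_logits, steps=20, temperature=1.0, seed=1234)
greedy = generate(toy_logits, steps=20, temperature=0, seed=1234)

print(run_a == run_b)  # same seed, same sequence
print(set(greedy))     # greedy only ever picks the top index
```

With a fixed seed the run is repeatable but still drawn from the full distribution; with temperature 0 the distribution is collapsed to its single most likely option.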

What happens here is they're supplying too little information (just a resume, which is almost at the noise level) and expecting a reply with too broad implications. This is a basic design mistake regardless of whether it uses LLMs. All surveys, tests, laws, and voting systems are extremely sensitive to framing because they work off too little information. But they also don't exist in a vacuum, unlike this thing.

I made a similar comment on a different post. Non-determinism does not necessarily mean it cannot reliably reach the correct output (although sometimes it does mean that). Las Vegas algorithms are non-deterministic and 100% accurate. The tradeoff is that the time it takes to reach the correct answer is highly variable.
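A toy illustration of the Las Vegas idea, as a sketch only: random probing for a matching element. `find_marked` and the `items` list are made up for this example, and it assumes the target occurs at least once (otherwise the loop never ends).

```python
import random

def find_marked(items, target, rng):
    """Las Vegas search: probe random positions until one holds `target`.

    The returned index is always correct; only the number of probes is random.
    Assumes `target` occurs at least once, otherwise this never returns.
    """
    probes = 0
    while True:
        probes += 1
        i = rng.randrange(len(items))
        if items[i] == target:
            return i, probes

items = ["a", "b"] * 50   # half the positions hold "a"
rng = random.Random()     # deliberately unseeded: runs differ from each other

results = [find_marked(items, "a", rng) for _ in range(200)]
probe_counts = [probes for _, probes in results]
print(min(probe_counts), max(probe_counts))  # running time varies between calls
```

Every returned index points at an "a"; what changes from call to call is how many probes it took to get there.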

To contextualize this insight in your post and basically just repeat what you are saying: The mistake is not using a non-deterministic system. The mistake could be, in some sense, using it too little. Re-evaluating the same resume 5 times and seeing a high variance in scores is a more useful signal than evaluating it once.
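As a rough sketch of what "evaluate it 5 times and look at the variance" could look like: `score_with_spread` and `fake_llm_score` are hypothetical names, and the scorer is a random stub standing in for whatever model call the original system makes.

```python
import random
import statistics

def score_with_spread(score_fn, resume_text, runs=5):
    """Score the same resume several times and report the mean and the spread."""
    scores = [score_fn(resume_text) for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores), scores

# Hypothetical stand-in for a non-deterministic LLM scorer (not a real API).
_rng = random.Random()

def fake_llm_score(resume_text):
    return _rng.gauss(60, 15)

mean, spread, scores = score_with_spread(fake_llm_score, "some resume text")
print(f"mean={mean:.1f} stdev={spread:.1f}")

# A scorer that always answers the same has zero spread, which hides
# whether it was ever confident in the first place.
const_mean, const_spread, _ = score_with_spread(lambda text: 70, "some resume text")
```

A large standard deviation relative to the score range is the signal: the single number from one run was not carrying much information.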

Nondeterminism is also a feature, not a bug. If you don't want people to optimize against your filtering process, you have to make it somewhat nondeterministic. For example, better candidates are exponentially more likely to pass the filter, instead of a hard cut-off at the top-100. Then it becomes no longer worthwhile to Goodhart the filtering process, because it barely increases your chances and there are so many more places you can use your time better.
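One possible reading of such a filter, sketched under assumptions: candidates are drawn without replacement with weight proportional to exp(score / temperature). The function name, the `temperature` knob, and the scores below are all invented for illustration; the comment above does not specify a mechanism.

```python
import math
import random

def stochastic_shortlist(scores, k, temperature, rng):
    """Draw k candidates without replacement, weight proportional to exp(score / temperature).

    `scores` maps candidate name -> numeric score (higher is better).
    A lower temperature approaches a hard top-k cut; a higher one is closer to a lottery.
    """
    remaining = dict(scores)
    chosen = []
    for _ in range(min(k, len(remaining))):
        names = list(remaining)
        top = max(remaining.values())  # shift by the max for numerical stability
        weights = [math.exp((remaining[n] - top) / temperature) for n in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen

made_up_scores = {"ana": 9.1, "bo": 8.7, "cy": 6.0, "di": 5.5, "ed": 2.0}
rng = random.Random()
shortlist = stochastic_shortlist(made_up_scores, k=2, temperature=1.0, rng=rng)
print(shortlist)
```

Higher scores still help, so the score remains something you can optimize; what changes is that a small score gain moves a probability rather than flipping a pass/fail outcome.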

  • > If you don't want people to optimize against your filtering process, you have to make it somewhat nondeterministic.

    I'm sorry, I'm not following this at all. When you say "better candidates are exponentially more likely to pass the filter", we're still talking about a metric, yes? A metric that can be optimized? Why would switching from a hard cutoff to some sort of stochastic filter weighted by this metric discourage optimization?

This. Human judges and examiners are famously not deterministic even though we would wish it were so - we've probably all heard the thing of harsher sentences being given in the hour before lunch.

  • > we've probably all heard the thing of harsher sentences being given in the hour before lunch

    That suggests determinism though.

    I mean I agree with you overall. Either human decision-making is a system so complex it appears non-deterministic, or it is deterministic. Practically speaking, we are non-deterministic.

    Let's not conflate non-deterministic with inaccurate though. Non-deterministic systems can be 100% accurate. https://en.wikipedia.org/wiki/Las_Vegas_algorithm

  • > harsher sentences being given in the hour before lunch.

    Implicit bias theory sparked a massive number of studies suggesting that everything influences you, from the color of the room to what the person said to you before entering.

    It’s been really hard to replicate and the conclusions that have been drawn are contradictory.