Comment by steerlabs

6 days ago

OP here. I wrote this because I got tired of agents confidently guessing answers when they should have asked for clarification (e.g. guessing "Springfield, IL" instead of asking "Which state?" when asked "weather in Springfield").

I built an open-source library to enforce these logic/safety rules outside the model loop: https://github.com/imtt-dev/steer

This approach kind of reminds me of taking an open-book test. Performing mandatory verification against a ground truth is like taking the test, then going back to your answers and looking up whether they match.

Unlike a student, the LLM never arrives at a sort of epistemic coherence, where they know what they know, how they know it, and how true it's likely to be. So you have to structure every problem into a format where the response can be evaluated against an external source of truth.

Thanks a lot for this. Also one question in case anyone could shed a bit of light: my understanding is that setting temperature=0, top_p=1 would cause deterministic output (identical output given identical input). For sure it won’t prevent factually wrong replies/hallucination, only maintains generation consistency (eq. classification tasks). Is this universally correct or is it dependent on model used? (or downright wrong understanding of course?)

  • > my understanding is that setting temperature=0, top_p=1 would cause deterministic output (identical output given identical input).

    That's typically correct. Many models are implemented this way deliberately. I believe it's true of most or all of the major models.

    > Is this universally correct or is it dependent on model used?

    There are implementation details that lead to uncontrollable non-determinism if they're not prevented within the model implementation. See e.g. the Pytorch docs for CUDA convolution determinism: https://docs.pytorch.org/docs/stable/notes/randomness.html#c...

    That documents settings like this:

        torch.backends.cudnn.deterministic = True 
    

    Parallelism can be a source of non-determinism if it's not controlled for, either implicitly via e.g. dependencies or explicitly.

You should use structured output rather than checking and rechecking for valid json. It can’t solve all of your problems but it can enforce a schema on the output format.