Comment by pama
2 months ago
LLM inference is built upon a probability distribution over every possible next token, given a stream of input tokens. If you serve the model yourself, you can get the log prob of each next token, so you just add up the per-token log probs to get the log probability of the whole sequence. Many APIs also provide these log probabilities as additional outputs.
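A minimal sketch of that arithmetic, assuming you already have the natural-log probability of each token in the sequence (from your own server or from an API). The function names and the sample numbers are hypothetical. It also shows perplexity, which is the exponential of the negative mean per-token log prob.

```python
import math

def sequence_logprob(token_logprobs):
    """Sum per-token natural-log probabilities to get log P(sequence | context)."""
    # fsum avoids accumulating floating-point rounding error over long sequences
    return math.fsum(token_logprobs)

def perplexity(token_logprobs):
    """Perplexity = exp(-(mean log prob per token))."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sequence_logprob(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log probs for a 4-token sequence
logprobs = [-0.1, -2.3, -0.05, -1.2]
print(sequence_logprob(logprobs))
print(perplexity(logprobs))
```

Working in log space keeps the numbers manageable: the raw product of many small probabilities would underflow to zero quickly.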
From those numbers you also get the perplexity of those tokens in that context. The probability of a given token is a function of both the model and the session context. Think about constructs like "ignore previous instructions"; these can dramatically change the predicted distribution. Similarly, agents blowing up production seems to happen during debugging (totally anecdotal). Debugging is sort of a permission structure for the agent to do unusual things and violate abstraction barriers. Debugging sessions can also lead to really deep contexts, and context rot will make prompts that forbid certain actions less effective.
I was answering the question, raised in this comment, of how to know that probability:
> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.
If you have a specific token sequence from an agent that blew up production during debugging, you can certainly check its probability and compare it to one (of the same length) that does not blow up your environment. If the two differ by a meteoric amount, it could point to errors in your inference pipeline.
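A rough sketch of that comparison, assuming you have per-token natural-log probabilities for both sequences under the same model and context. The helper names and the `threshold_nats` cutoff are hypothetical illustrations, not a calibrated test. The gap is expressed both in nats and in base-10 orders of magnitude so "meteoric" is easier to judge.

```python
import math

def log_prob_gap(logprobs_a, logprobs_b):
    """Total log-probability difference (nats) between two equal-length sequences.

    A gap of g nats means sequence A is exp(g) times as likely as B
    under the same model and context.
    """
    if len(logprobs_a) != len(logprobs_b):
        raise ValueError("compare sequences of the same length")
    return math.fsum(logprobs_a) - math.fsum(logprobs_b)

def orders_of_magnitude(gap_nats):
    """Convert a natural-log gap into powers of ten."""
    return gap_nats / math.log(10)

def looks_suspicious(logprobs_a, logprobs_b, threshold_nats=50.0):
    # threshold_nats is a hypothetical cutoff chosen for illustration only
    return abs(log_prob_gap(logprobs_a, logprobs_b)) > threshold_nats

# Hypothetical per-token log probs for a destructive and a benign command
destructive = [-0.2, -9.0, -12.5, -0.1]
benign = [-0.3, -0.4, -0.2, -0.1]
gap = log_prob_gap(destructive, benign)
print(gap, orders_of_magnitude(gap), looks_suspicious(destructive, benign))
```

If a sequence the model considers astronomically unlikely keeps getting sampled anyway, that is a hint to inspect the sampler, tokenizer, or logits processing rather than the prompt.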