Comment by maxbond

2 months ago

That gives you the perplexity of those tokens in that context. The probability of a given token is a function of both the model and the session context. Think about constructs like "ignore previous instructions"; these can dramatically change the predicted distribution. Similarly, agents blowing up production seems to happen during debugging (totally anecdotal). Debugging acts as a sort of permission structure for the agent to do unusual things and violate abstraction barriers. Debugging sessions can also lead to really deep contexts, and context rot will make any prompting that forbids certain actions less effective.
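To make the context dependence concrete: perplexity is the exponential of the mean negative log-probability per token, so the same tokens can score very differently under two contexts. A minimal sketch, assuming your inference stack can return per-token natural-log probabilities; the function name and all the numbers are made up for illustration:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean log-probability) over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs for the SAME tokens scored under two contexts.
plain_context = [-6.0, -5.5, -7.0]    # unlikely tokens in an ordinary session
after_override = [-1.0, -0.5, -1.5]   # same tokens after "ignore previous instructions"

print(perplexity(plain_context), perplexity(after_override))
```

Lower perplexity means the model finds the tokens less surprising in that context, which is the shift the comment describes.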


pama 2 months ago

I was answering the question of how to know the probability, from this comment:

> The sequence of tokens that would destroy your production environment can be produced by your agent, no matter how much prompting you use.

If you have a specific token sequence from an agent that blows up production during debugging, you can certainly check its probability and compare it to that of a sequence of the same length that does not blow up your environment. If the two differ by a meteoric amount, that could point to errors in your inference pipeline.
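
A rough sketch of that comparison, assuming you have already scored both sequences under the same model and context and have per-token natural-log probabilities for each. The helper names and the numbers are hypothetical, and what counts as a suspicious gap is left to you:

```python
import math

def sequence_logprob(token_logprobs):
    """Log-probability of a whole sequence: the sum of its per-token log-probs."""
    return sum(token_logprobs)

def logprob_gap(logprobs_a, logprobs_b):
    """log P(a) - log P(b) for two sequences of equal length."""
    if len(logprobs_a) != len(logprobs_b):
        raise ValueError("compare sequences of the same length")
    return sequence_logprob(logprobs_a) - sequence_logprob(logprobs_b)

# Hypothetical per-token log-probs for a destructive and a benign sequence.
destructive = [-0.2, -0.1, -0.3, -0.2]
benign = [-0.3, -0.2, -0.2, -0.4]

gap = logprob_gap(destructive, benign)
# exp(gap) is the probability ratio P(destructive) / P(benign).
print(f"log-prob gap: {gap:.3f}, probability ratio: {math.exp(gap):.3f}")
```

Working in log space keeps long sequences from underflowing to zero, and matching the lengths keeps the comparison from being dominated by the number of tokens.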