Comment by cadamsdotcom
15 hours ago
That’s a really neat idea.
Kind of seems an optimization: if the “token ban” is a tool call, you can see that being too slow to run for every token. Provided rewinding is feasible, your idea could make it performant enough to be practical.
It's not an optimization; the greedy approach won't work well, because it rejects critical context that comes after the tokens that would trigger the rejection.
Consider: "Let's just skip writing this test, as the other change you requested seems to eliminate the need for it."
Rolling back the model on "Let's just" would be stupid; rolling it on "Let's just skip writing this test" would be stupid too, as is the beliefs that writing tests is your holy obligation to your god, and you must do so unconditionally. The following context makes it clear that the decision is fine. Or, if you (or the governor agent) don't buy the reasoning, you're then in a perfect position to say, "nope, let's roll back to <Let's> and bias against ["skip, test"]".
Checking the entire message once makes sense; checking it after every token doesn't.