Comment by JoshuaDavid

2 months ago

Along these lines, if the main LLM goes down a bad path, you could _rewind_ the model to before it started going down the bad path -- the watcher LLM doesn't necessarily have to guess that "skip" is a bad token after the words "let's just", it could instead see "let's just skip the test" and go "nope, rolling back to the token "just " and rerolling with logit_bias={"skip":-10,"omit":-10,"hack":-10}".

Of course doing that limits which model providers you can work with (notably, OpenAI has gotten quite hostile to power users doing stuff like that over the past year or so).

2 comments

JoshuaDavid

cadamsdotcom 2 months ago

That’s a really neat idea.

Kind of seems an optimization: if the “token ban” is a tool call, you can see that being too slow to run for every token. Provided rewinding is feasible, your idea could make it performant enough to be practical.

TeMPOraL 2 months ago

It's not an optimization; the greedy approach won't work well, because it rejects critical context that comes after the tokens that would trigger the rejection.
Consider: "Let's just skip writing this test, as the other change you requested seems to eliminate the need for it."
Rolling back the model on "Let's just" would be stupid; rolling it on "Let's just skip writing this test" would be stupid too, as is the beliefs that writing tests is your holy obligation to your god, and you must do so unconditionally. The following context makes it clear that the decision is fine. Or, if you (or the governor agent) don't buy the reasoning, you're then in a perfect position to say, "nope, let's roll back to <Let's> and bias against ["skip, test"]".
Checking the entire message once makes sense; checking it after every token doesn't.