Comment by mangoman
2 months ago
From the S1 paper:
> Second, we develop budget forcing to control test-time compute by forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end.
I'm feeling proud of myself that I had the crux of the same idea almost 6 months ago, before reasoning models came out (and a bit disappointed that I didn't take this idea further!). Basically, during inference you have to choose the next token to sample. Usually people just sample from the distribution using the same sampling rules at each step... but you don't have to! You can selectively put words in the LLM's mouth based on what it said previously or what it wants to say, and decide "nah, say this instead". I wrote a library so that you could sample an LLM using llama.cpp in Swift and write rules to sample tokens and force tokens into the sequence depending on what was sampled. https://github.com/prashanthsadasivan/LlamaKit/blob/main/Tes...
Here, I wrote a test that asks Phi-3 instruct "how are you", and if it tried to say "as an AI I don't have feelings" or "I'm doing ", I forced it to say "I'm doing poorly" and refuse to help, since it was always so dang positive. It sorta worked, though the instruction-tuned models REALLY want to help. But at the time I just didn't have a great use case for it. I had thought about a more conditional extension to llama.cpp's grammar sampling (you could imagine changing the grammar based on previously sampled text), or even just making the model go down certain paths, but I lost steam because I couldn't describe a killer use case for it. A rough sketch of the idea is below.
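The core loop is roughly this (a minimal sketch, not LlamaKit's actual API: `sample_next`, `tokenize`, and `detokenize` are hypothetical stand-ins for whatever hooks your inference backend exposes, and the rule text is just the Phi-3 example above):

```python
from typing import Callable, List, Tuple

def generate_with_rules(
    sample_next: Callable[[List[int]], int],  # hypothetical hook: context token ids -> next token id
    tokenize: Callable[[str], List[int]],     # hypothetical hook: text -> token ids
    detokenize: Callable[[List[int]], str],   # hypothetical hook: token ids -> text
    prompt: str,
    rules: List[Tuple[str, str]],             # (trigger suffix, forced continuation)
    eos_id: int,
    max_tokens: int = 256,
) -> str:
    context = tokenize(prompt)
    generated: List[int] = []
    while len(generated) < max_tokens:
        tok = sample_next(context + generated)  # ordinary sampling step
        generated.append(tok)
        if tok == eos_id:
            break
        text = detokenize(generated)
        for trigger, forced in rules:
            if text.endswith(trigger):
                # Put words in the model's mouth: splice the forced tokens in
                # and keep decoding as if the model had said them itself.
                generated.extend(tokenize(forced))
                break
    return detokenize(generated)

# e.g. the rule from the test described above (illustrative text only):
# rules = [("I'm doing", " poorly, and I'd rather not help right now.")]
```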
This is that killer use case! Forcing it to think more is such a great use case for putting ideas in the LLM's mouth, and I feel like there must be more to this idea to explore.
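Budget forcing, as the quoted paper describes it, is basically a special case of that same loop: watch for the end-of-thinking delimiter and override it while you're under budget, or cut the thinking off once you're over. Again just a sketch, reusing the same hypothetical backend hooks; `end_think_id` is assumed to be whatever single token closes the thinking section in your model's template:

```python
from typing import Callable, List

def think_with_budget(
    sample_next: Callable[[List[int]], int],  # hypothetical hook: context token ids -> next token id
    tokenize: Callable[[str], List[int]],
    detokenize: Callable[[List[int]], str],
    prompt: str,
    end_think_id: int,                        # assumed end-of-thinking delimiter token
    min_think: int,
    max_think: int,
) -> str:
    context = tokenize(prompt)
    thought: List[int] = []
    while len(thought) < max_think:
        tok = sample_next(context + thought)
        if tok == end_think_id:
            if len(thought) >= min_think:
                return detokenize(thought)    # model stopped thinking within budget
            # Tried to stop too early: suppress the delimiter and force "Wait"
            # into the stream so the model keeps reasoning.
            thought.extend(tokenize("Wait"))
            continue
        thought.append(tok)
    # Hit the upper budget: forcefully terminate the thinking phase here
    # (the caller would append the delimiter and move on to the answer).
    return detokenize(thought)
```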
So what you mean is that if the current train of thought is going in a direction we find suboptimal, we could just interrupt it and hint it toward the right direction?
That sounds very useful, albeit a bit different from how current "chat" implementations work, since you would be controlling both sides of the conversation.
> and a bit disappointed that I didn't take this idea further!
Don’t be, that’s pretty common.
https://en.wikipedia.org/wiki/Multiple_discovery