Comment by lqstuart
6 days ago
I don’t really see a need for token IDs alone, but you absolutely need per-token logprob vectors if you’re trying to do constrained decoding
Interesting point. My first reaction was "why do you need logprobs? We use constrained decoding for tool calls and don't need them"...which is actually false! We do need them: we have to throw out the logprobs of tokens that violate the constraints and then pick the token with the highest logprob among the ones that satisfy them.
Haha yeah. I’ve seen you mention the llama cpp wrapper elsewhere, it sounds cool! I’ve worked enough with vLLM and sglang to get angry at xgrammar, which I believe has some common ancestry with the GGML stack (GBNF, if I’m not mistaken, which I may be). The constrained decoding part is as simple as you’d expect: it just applies a bitmask to the logprobs during the "logit processing" step and then continues as normal.
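For anyone following along, here's a minimal sketch of that masking step (the function and variable names are made up for illustration; it assumes a PyTorch logits vector for the next position and a list of allowed token IDs coming from whatever grammar engine you're using):

```python
import torch

def constrained_greedy_step(logits: torch.Tensor, allowed_token_ids: list[int]) -> int:
    """Pick the highest-scoring token among those the grammar allows.

    `logits` is the raw score vector for the next position (shape [vocab_size]);
    `allowed_token_ids` is whatever the grammar engine says is legal in the
    current decoding state.
    """
    # Build a mask that is 0 for allowed tokens and -inf everywhere else,
    # i.e. "throw out" the logprobs of tokens that violate the constraints.
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0

    masked = logits + mask

    # Decoding then continues as normal: greedy argmax is shown here, but you
    # could just as well softmax and sample from the masked distribution.
    return int(torch.argmax(masked).item())
```

This is only a sketch of the idea, not any particular library's implementation; in practice the engine hands you a precomputed bitmask per step rather than a Python list, and the masking runs inside the server's logits-processing hook.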