Comment by refulgentis

8 months ago

Interesting point, my first reaction was "why do you need logprobs? We use constrained decoding for tool calls and don't need them"...which is actually false! We do need them: we throw out the log probs of tokens that violate the constraints, then pick the token with the highest log prob among those that meet them.

lqstuart 8 months ago

Haha yeah. I’ve seen you mention the llama.cpp wrapper elsewhere, it sounds cool! I’ve worked enough with vLLM and sglang to get angry at xgrammar, which I believe has some common ancestry with the GGML stack (GBNF if I’m not mistaken, which I may be). The constrained decoding part is as simple as you’d expect: it just applies a bitmask to the logprobs during the “logit processing” step and continues as normal.
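
For anyone following along, here's a rough sketch of what that masking step looks like. It is not xgrammar's or llama.cpp's actual code: the function names are made up for illustration, the mask is a plain list of booleans rather than a packed bitmask, and in a real engine the grammar matcher would produce the mask at each step. It masks raw logits, renormalizes, and picks greedily.

```python
import math

def apply_token_mask(logits, allowed):
    """Copy of logits with every disallowed token id set to -inf (hypothetical helper)."""
    return [x if ok else -math.inf for x, ok in zip(logits, allowed)]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def constrained_greedy_step(logits, allowed):
    """Mask, renormalize, and return (token_id, logprob) of the best allowed token."""
    if not any(allowed):
        raise ValueError("the constraint allows no token at this step")
    logprobs = log_softmax(apply_token_mask(logits, allowed))
    best = max(range(len(logprobs)), key=logprobs.__getitem__)
    return best, logprobs[best]

# Toy 4-token vocabulary; assume the grammar only permits ids 0 and 3 here.
logits = [2.0, 5.0, 1.0, 3.0]
allowed = [True, False, False, True]
token_id, logprob = constrained_greedy_step(logits, allowed)
```

Unconstrained, the model would pick id 1 here. With the mask it picks id 3, and the logprob it reports is renormalized over the allowed tokens only, so it isn't the same number the raw distribution would give you.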