Comment by zambelli

21 days ago

Interesting, catching the problem upstream, effectively. How did you enforce the grammar?

3 comments

zambelli

https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...

llama.cpp supports grammar-constrained sampling using either GBNF or a JSON schema (I think it just translates the schema to GBNF behind the scenes). So I have my harness generate a tool schema on the fly (based on which tools are possible for the current task) and pass it in at request time.
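
Roughly what that looks like, as a minimal sketch rather than my actual harness. The tool registry, tool names, and the `build_tool_schema` / `build_request` helpers are made up for illustration. The only llama.cpp-specific parts are the `prompt`, `n_predict`, and `json_schema` fields that the server's `/completion` endpoint accepts; which schema keywords (`anyOf`, `const`, etc.) get converted correctly may depend on your build, so check before relying on them.

```python
import json

# Hypothetical tool registry: tool name -> JSON-schema properties for its arguments.
TOOLS = {
    "read_file": {"path": {"type": "string"}},
    "search": {"query": {"type": "string"}, "limit": {"type": "integer"}},
}


def build_tool_schema(allowed):
    """Return a JSON schema that accepts one call to one of the allowed tools."""
    variants = []
    for name in allowed:
        params = TOOLS[name]
        variants.append({
            "type": "object",
            "properties": {
                "tool": {"const": name},
                "arguments": {
                    "type": "object",
                    "properties": params,
                    "required": sorted(params),
                    "additionalProperties": False,
                },
            },
            "required": ["tool", "arguments"],
            "additionalProperties": False,
        })
    return {"anyOf": variants}


def build_request(prompt, allowed, max_tokens=256):
    """Build the JSON body for a llama.cpp server /completion request."""
    return {
        "prompt": prompt,
        "n_predict": max_tokens,
        # The server turns this schema into a grammar and samples under it.
        "json_schema": build_tool_schema(allowed),
    }


if __name__ == "__main__":
    # Only "search" is possible for this task, so only "search" is in the schema.
    body = build_request("Find the config file.", ["search"])
    print(json.dumps(body, indent=2))
```

POST the resulting body to the server's `/completion` endpoint. Because the schema is rebuilt per request, a tool that isn't valid for the current task can't be emitted at all.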

zambelli 21 days ago
Oh, interesting - thanks for the link. I really haven't explored this but it should slot in fairly easily I think? Gotta dig into it more.
- tmzt 18 days ago
  
  It's basically restricting which logits are allowed when sampling from the model, so the output conforms to the JSON (or whatever) shape. It can also cause the model to get "confused", though, and doesn't always result in the output you want.
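
A toy sketch of the masking idea, not llama.cpp's actual sampler. The four-token vocabulary, the logit values, and the `masked_sample` helper are all invented for illustration; a real grammar engine computes the allowed set from the grammar state at every step.

```python
import math
import random


def masked_sample(logits, allowed, rng=random):
    """Sample a token id from `logits`, considering only ids in `allowed`."""
    ids = sorted(i for i in allowed if 0 <= i < len(logits))
    if not ids:
        raise ValueError("grammar allows no token at this position")
    # Softmax over the allowed ids only; every other token has probability zero.
    peak = max(logits[i] for i in ids)
    weights = [math.exp(logits[i] - peak) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    vocab = ["Sure", ",", "{", '"']
    logits = [5.0, 1.0, 0.5, -1.0]  # unconstrained, the model prefers "Sure"
    allowed = {2}                   # but a JSON object has to start with "{"
    print(vocab[masked_sample(logits, allowed)])
```

Here the model's preferred token is masked out and it is forced onto one it rated much lower. That may be one source of the "confusion": every later token is conditioned on a prefix the model wouldn't have picked on its own.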