Comment by peer0

21 days ago

This seems similar to what I've done using llama.cpp's "Grammar constrained generation" for my local agents. But with that, instead of catching and retrying, it is literally impossible for the LLM to generate something that doesn't match a specific schema of tool choices. It is amazing how much better small models can be when you reduce the problem space to only grammatically correct answers.

4 comments

peer0

zambelli 21 days ago

Interesting, catching the problem upstream, effectively. How did you enforce the grammar?

peer0 21 days ago
https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
llama.cpp supports grammar limiting using either GBNF or JSON schema (it just translates it to GBNF behind the scenes, I think). So I have my harness generate a tool schema on the fly (based on what tools are possible for the current task) and pass it in at request time.
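
A minimal sketch of that idea in Python, not the actual harness: the tool names, their parameters, and the helpers `build_tool_schema` and `build_request` are all hypothetical. It assumes the llama.cpp server's `/completion` endpoint, which as far as I know accepts a `json_schema` field, and that the schema-to-GBNF converter handles `oneOf` and `const` (worth checking against your build).

```python
import json

# Hypothetical tool registry: tool name -> JSON-schema properties for its arguments.
TOOLS = {
    "read_file": {"path": {"type": "string"}},
    "search": {"query": {"type": "string"}, "limit": {"type": "integer"}},
    "finish": {"answer": {"type": "string"}},
}


def build_tool_schema(allowed):
    """Build a JSON schema that matches exactly one call to one allowed tool."""
    variants = []
    for name in allowed:
        params = TOOLS[name]
        variants.append({
            "type": "object",
            "properties": {
                # "const" pins the tool name, so only allowed names can be emitted.
                "tool": {"const": name},
                "args": {
                    "type": "object",
                    "properties": params,
                    "required": list(params),
                    "additionalProperties": False,
                },
            },
            "required": ["tool", "args"],
            "additionalProperties": False,
        })
    return {"oneOf": variants}


def build_request(prompt, allowed):
    """Assemble a request body; the field names are assumptions about the server API."""
    return {
        "prompt": prompt,
        "json_schema": build_tool_schema(allowed),
        "n_predict": 256,
    }


# Only the tools that make sense for the current task go into the schema.
payload = build_request("Pick the next tool call.", ["read_file", "finish"])
print(json.dumps(payload, indent=2))
```

You would then POST `payload` to the server. Since the schema is rebuilt per request, a tool that isn't valid for the current step can't be selected at all.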
- zambelli 21 days ago
  
  Oh, interesting - thanks for the link. I really haven't explored this but it should slot in fairly easily I think? Gotta dig into it more.
  