Comment by peer0
21 days ago
This seems similar to what I've done using llama.cpp's "grammar-constrained generation" for my local agents. But instead of catching and retrying, with the grammar in place it is literally impossible for the LLM to generate anything that doesn't match a specific schema of tool choices. It's amazing how much better small models can be when you reduce the problem space to only grammatically valid answers.
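To make the "literally impossible" point concrete, here is a toy sketch of constrained decoding. It is not llama.cpp's implementation: a real GBNF sampler tracks grammar state over the model's actual tokenizer. This version only permits a fixed list of strings (the tool names are hypothetical), and `fake_scores` is a made-up stand-in for model logits. At each step, any token that cannot lead to an allowed string is masked out before the highest-scoring token is picked.

```python
from typing import Callable, Dict, List

EOS = "<eos>"  # end-of-sequence marker, only legal once the output is complete


def constrained_decode(
    score: Callable[[str], Dict[str, float]],
    vocab: List[str],      # non-empty token strings
    allowed: List[str],    # every complete output the "grammar" accepts
    max_steps: int = 32,
) -> str:
    out = ""
    for _ in range(max_steps):
        # Legal tokens keep `out` a prefix of at least one allowed string.
        legal = [t for t in vocab if any(a.startswith(out + t) for a in allowed)]
        if out in allowed:
            legal.append(EOS)
        if not legal:
            raise ValueError(f"no legal continuation after {out!r}")
        scores = score(out)
        # Illegal tokens are never considered, however highly they score.
        best = max(legal, key=lambda t: scores.get(t, float("-inf")))
        if best == EOS:
            return out
        out += best
    raise ValueError("max_steps exceeded")


# Hypothetical "model" that strongly prefers a dangerous, off-schema command.
def fake_scores(prefix: str) -> Dict[str, float]:
    return {"rm": 5.0, " -rf": 4.0, "run": 3.0, "_file": 2.5,
            "_shell": 2.0, "read": 1.0, EOS: 0.0}


VOCAB = ["read", "_file", "run", "_shell", "rm", " -rf"]
ALLOWED = ["read_file", "run_shell"]

result = constrained_decode(fake_scores, VOCAB, ALLOWED)
print(result)  # run_shell
```

Unconstrained greedy decoding with these scores would start with `rm`; with the mask, the best the model can do is the highest-scoring *valid* tool name. Nothing gets caught and retried, because nothing invalid is ever emitted.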
Interesting, catching the problem upstream, effectively. How did you enforce the grammar?
https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
llama.cpp supports constraining output with either a GBNF grammar or a JSON schema (I think it just translates the schema to GBNF behind the scenes). So my harness generates a tool schema on the fly, based on which tools are available for the current task, and passes it in at request time.
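A minimal sketch of that per-request schema idea. It assumes llama.cpp's `llama-server` running on its default port with the `/completion` endpoint, which accepts a `json_schema` field alongside `prompt` and `n_predict` and returns the generated text in `content`. The tool registry, tool names, and helper functions are hypothetical, and it assumes the schema-to-grammar converter in your llama.cpp version handles `anyOf` and `const`.

```python
import json
import urllib.request

# Hypothetical tool registry: tool name -> JSON schema for its arguments.
TOOLS = {
    "read_file": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
        "additionalProperties": False,
    },
    "run_shell": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
        "additionalProperties": False,
    },
    "finish": {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
        "additionalProperties": False,
    },
}


def tool_call_schema(allowed):
    """JSON schema admitting exactly one call to one of the allowed tools."""
    variants = [
        {
            "type": "object",
            "properties": {"tool": {"const": name}, "args": TOOLS[name]},
            "required": ["tool", "args"],
            "additionalProperties": False,
        }
        for name in allowed
    ]
    return {"anyOf": variants}


def build_payload(prompt, allowed):
    """Request body for llama-server's /completion, rebuilt for every task."""
    return {"prompt": prompt, "n_predict": 256, "json_schema": tool_call_schema(allowed)}


def send(payload, url="http://localhost:8080/completion"):
    """POST the payload and return the generated (schema-conforming) text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

For a read-only task, `send(build_payload(prompt, ["read_file", "finish"]))` would make `run_shell` unreachable for that request, so narrowing `allowed` per task is what shrinks the problem space.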
Oh, interesting, thanks for the link. I really haven't explored this, but I think it should slot in fairly easily? Gotta dig into it more.