Comment by fragmede
1 day ago
What does it do when the model wants to return something else, and what's better/worse about doing it in llamafile vs whatever wrapper that's calling it? How do I set retries? What if I want JSON and a range instead?
> What does it do when the model wants to return something else, and what's better/worse about doing it in llamafile vs whatever wrapper that's calling it? How do I set retries? What if I want JSON and a range instead?
You can't do it as part of whatever's calling it, because this changes the sampler. The grammar constrains which tokens the sampler is allowed to consider, passing only tokens that are valid under the grammar.
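For the "JSON and a range" case, this is exactly what the grammar can express. A rough sketch in GBNF (the grammar format llama.cpp and llamafile use); the rule names and the 0-100 range are just illustrative:

```
# Sketch: only accept {"score": N} where N is an integer 0-100
root  ::= "{" ws "\"score\"" ws ":" ws score ws "}"
score ::= "100" | [1-9] [0-9]? | "0"
ws    ::= [ \t\n]*
```

With a grammar like this loaded, the sampler literally cannot emit a token that would take the output outside that shape, so there's nothing to validate after the fact.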
> What does it do when the model wants to return something else,
You can build that into your structure, same as you would for allowing error values to be returned from a system.
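Concretely, "building it into your structure" means giving the grammar an explicit error branch, so the model can signal failure without ever producing free-form text. A hedged GBNF sketch (rule names are made up for illustration):

```
# Sketch: the model must emit either a valid answer or an explicit error object
root   ::= answer | error
answer ::= "{\"value\":" ws num "}"
error  ::= "{\"error\":" ws "\"" [a-zA-Z ]+ "\"" "}"
num    ::= [0-9]+
ws     ::= [ \t]*
```

The model "wanting to return something else" then shows up as it choosing the `error` branch, which your caller can handle like any other structured result.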
There are no retries. The grammar is enforced on every output token inside llama.cpp's sampler, so invalid output is never generated in the first place.
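If you still want retries (e.g. the output is grammatically valid but fails some semantic check the grammar can't express), that logic belongs in the wrapper calling llamafile, not in llamafile itself. A minimal sketch, assuming you supply your own `generate` and `validate` callables; nothing here is part of the llamafile API:

```python
import json

def generate_with_retries(generate, validate, max_retries=3):
    """Wrapper-side retry loop around a hypothetical `generate` callable.

    `generate` returns the model's raw text; `validate` takes the parsed
    JSON and returns True if it passes your semantic checks.
    """
    last = None
    for _ in range(max_retries):
        last = generate()
        try:
            parsed = json.loads(last)
        except json.JSONDecodeError:
            continue  # grammatically broken output: try again
        if validate(parsed):
            return parsed
    raise ValueError(f"no valid output after {max_retries} attempts: {last!r}")
```

With grammar-constrained sampling the JSON-parse failure branch should essentially never fire; the loop is only earning its keep for checks the grammar can't encode.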