Comment by fragmede
24 days ago
What does it do when the model wants to return something else, and what's better/worse about doing it in llamafile vs whatever wrapper that's calling it? How do I set retries? What if I want JSON and a range instead?
24 days ago
You can't do it in whatever wrapper is calling it, because this changes the sampler itself. The grammar constrains which tokens the sampler is allowed to consider, only letting through tokens that keep the output valid under the grammar.
There are no retries, and none are needed: the grammar is enforced on the output tokens inside llama.cpp itself, so an out-of-grammar token is never emitted in the first place.
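For a concrete picture, here's a minimal GBNF sketch (the grammar format llama.cpp and llamafile use); the yes/no rule is just illustrative:

```
# Only token sequences that spell out "yes" or "no" can ever be sampled.
root ::= "yes" | "no"
```

If I remember the flags right, you pass this with --grammar-file (or inline with --grammar), and it's applied per token inside the sampling loop, which is exactly why a wrapper sitting outside the model can only validate and retry after the fact.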
> What does it do when the model wants to return something else,
You can build that into your grammar's structure, just as you would allow error values to be returned from any other system.
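For example, something along these lines (a rough, untested GBNF sketch; the field names are made up) gives you JSON, a bounded 0-100 value, and an explicit escape hatch for when the model wants to return something else:

```
# Either a score object with an integer in 0-100, or an error object.
root   ::= result | refuse
result ::= "{\"score\": " score "}"
score  ::= [0-9] | [1-9] [0-9] | "100"
refuse ::= "{\"error\": \"" [a-zA-Z0-9 .,]* "\"}"
```

The sampler steers the model into one of the two branches, so "something else" comes out as a well-defined error value instead of free-form text.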