Comment by wubrr
17 hours ago
> Does ollama support strict structured output or strict tool calls adhering to a json schema?
As far as I understand, this is generally not possible at the model level. The best you can do is wrap the call in a (non-LLM) JSON-schema validator and emit an error JSON when the LLM output does not match the schema; some APIs do this for you, but it's not very complicated to do yourself (see the sketch below).
Someone correct me if I'm wrong
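A minimal sketch of that wrap-and-validate approach, using the jsonschema package (llm_call is a hypothetical stand-in for whatever client you use):

    import json
    import jsonschema  # pip install jsonschema

    def validated_call(llm_call, schema):
        # llm_call: hypothetical zero-arg callable returning raw model text.
        raw = llm_call()
        try:
            data = json.loads(raw)
            jsonschema.validate(instance=data, schema=schema)
            return data
        except (json.JSONDecodeError, jsonschema.ValidationError) as err:
            # Emit an error JSON when the output does not match the schema.
            return {"error": str(err)}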
The inference engine (llama.cpp) has full control over the possible tokens during inference. It can "force" the LLM to output only valid tokens so that it produces valid JSON, and in fact it leverages that control to constrain outputs to those matching user-specified BNF grammars:
https://github.com/ggml-org/llama.cpp/tree/master/grammars
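For illustration, a minimal sketch of that grammar-constrained sampling via the llama-cpp-python bindings (the model path is a placeholder; the grammar only admits output of the form {"answer": "..."}):

    from llama_cpp import Llama, LlamaGrammar

    # Placeholder path; any local GGUF model works.
    llm = Llama(model_path="model.gguf")

    # A tiny GBNF grammar: the sampler may only emit tokens that keep the
    # output inside this language, so the result is always valid JSON of
    # the form {"answer": "<string>"}.
    grammar = LlamaGrammar.from_string(r'''
    root   ::= "{" ws "\"answer\":" ws string ws "}"
    string ::= "\"" [^"]* "\""
    ws     ::= [ \t\n]*
    ''')

    out = llm(
        "Answer in JSON: what is the capital of France?",
        grammar=grammar,
        max_tokens=64,
    )
    print(out["choices"][0]["text"])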
Ahh, I stand corrected, very cool!
No, that's incorrect: llama.cpp supports providing a context-free grammar during sampling and only samples tokens that conform to the grammar, never tokens that would violate it.
Very interesting, thank you!
This is misinformation. Ollama has supported structured outputs that conform to a given JSON schema for months. Here’s a post about this from last year: https://ollama.com/blog/structured-outputs
This is absolutely possible to do at the model level via logit shaping. llama.cpp’s functionality for this is called GBNF. It’s tightly integrated into the token-sampling infrastructure, and is what Ollama builds upon for its JSON-schema functionality.
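For reference, a sketch of what that looks like through Ollama's API, using the ollama Python package (model name and schema here are placeholders):

    import ollama  # pip install ollama

    # Since the structured-outputs release, `format` accepts a full JSON
    # schema instead of just "json".
    schema = {
        "type": "object",
        "properties": {
            "capital": {"type": "string"},
            "population": {"type": "integer"},
        },
        "required": ["capital", "population"],
    }

    resp = ollama.chat(
        model="llama3.1",  # assumes this model has been pulled locally
        messages=[{"role": "user", "content": "Tell me about France."}],
        format=schema,     # decoding is constrained to match the schema
    )
    print(resp["message"]["content"])  # parses as schema-valid JSON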
> It’s tightly integrated into the token-sampling infrastructure, and is what Ollama builds upon for its JSON-schema functionality.
Do you mean that the functionality of generating an EBNF grammar from a JSON schema and using it for sampling is part of ggml, and all they have to do is use it?
I assumed this was part of llama.cpp, and therefore another feature they would have to re-implement and maintain.
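As far as I can tell, the JSON-schema-to-grammar conversion lives in the llama.cpp tree itself (common/json-schema-to-grammar, plus a bundled Python script), not in ggml. A sketch of driving that script, assuming a local checkout of the llama.cpp repo:

    import json
    import pathlib
    import subprocess

    # Placeholder schema to convert.
    schema = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }
    pathlib.Path("schema.json").write_text(json.dumps(schema))

    # Path assumes a checkout at ./llama.cpp; the script emits a GBNF
    # grammar usable with --grammar-file.
    result = subprocess.run(
        ["python", "llama.cpp/examples/json_schema_to_grammar.py", "schema.json"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)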