Comment by wubrr

2 days ago

> Does ollama support strict structured output or strict tool calls adhering to a json schema?

As far as I understand, this is generally not possible at the model level. The best you can do is wrap the call in a (non-LLM) JSON schema validator and emit an error JSON when the LLM output does not match the schema. Some APIs do this for you, but it's not very complicated to do yourself.

Someone correct me if I'm wrong
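For anyone who wants to see what that wrapper looks like, here is a minimal sketch using only the Python standard library. The names `matches_schema` and `validated_or_error` are made up for illustration, and the checker only understands a tiny subset of JSON Schema (`type`, `properties`, `required`); a real setup would likely use a full validator instead.

```python
import json


def matches_schema(value, schema):
    """Check a parsed JSON value against a tiny subset of JSON Schema.

    Only "type", "properties", and "required" are understood (assumption:
    this is an illustrative subset, not a complete validator).
    """
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            matches_schema(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    if expected == "string":
        return isinstance(value, str)
    if expected == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if expected == "boolean":
        return isinstance(value, bool)
    return True  # unknown or missing "type": accept


def validated_or_error(raw_output, schema):
    """Return the parsed output if it fits the schema, else an error object."""
    try:
        value = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON: {exc.msg}"}
    if not matches_schema(value, schema):
        return {"error": "output does not match schema"}
    return value
```

Note that this only detects a bad output after the fact; it does nothing to stop the model from producing one.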

10 comments


tarruda 2 days ago

The inference engine (llama.cpp) has full control over the possible tokens during inference. It can "force" the LLM to output only valid tokens, so that it produces valid JSON.
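To make the "force" part concrete, here is a toy sketch of masking logits before sampling. It is not llama.cpp code: `constrained_probs` is a hypothetical helper, and the logits and allowed token ids are invented. The idea is just that disallowed tokens get a score of negative infinity, so they end up with zero probability after the softmax.

```python
import math


def constrained_probs(logits, allowed):
    """Softmax over `logits`, restricted to the token ids in `allowed`.

    Disallowed tokens are masked to -inf, so their probability is exactly 0
    and the remaining probabilities are renormalized among allowed tokens.
    """
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)
    if peak == -math.inf:
        raise ValueError("no allowed token to sample from")
    # Subtract the peak for numerical stability; exp(-inf) evaluates to 0.0.
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

Even if the model strongly prefers a token outside the allowed set, it can never be sampled, because its probability is zero by construction.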

kristjansson 2 days ago
And in fact it leverages that control to constrain outputs to those matching user-specified BNF grammars:
https://github.com/ggml-org/llama.cpp/tree/master/grammars
- wubrr 2 days ago
  
  Very cool!
wubrr 2 days ago

Ahh, I stand corrected, very cool!

mangoman 2 days ago

No, that's incorrect. llama.cpp supports providing a context-free grammar at sampling time, and it only samples tokens that conform to the grammar, never ones that would violate it.
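A toy sketch of that idea, under heavy assumptions: the "grammar" below is hand-coded for nested lists of `true`/`false`, the vocabulary is whole symbols rather than real model tokens, and `fake_model` stands in for an LLM. None of this is llama.cpp's actual implementation; it only shows the loop of "ask the grammar what may come next, then pick among those".

```python
def allowed_next(prefix):
    """Tokens the toy grammar permits after a (valid) prefix.

    Grammar: value ::= "true" | "false" | "[" "]" | "[" value ("," value)* "]"
    """
    depth = 0
    expect = "value"
    for tok in prefix:
        if tok == "[":
            depth += 1
            expect = "value_or_close"
        elif tok == ",":
            expect = "value"
        else:  # "true", "false", or "]" all complete a value
            if tok == "]":
                depth -= 1
            expect = "comma_or_close" if depth > 0 else "end"
    return {
        "value": {"true", "false", "["},
        "value_or_close": {"true", "false", "[", "]"},
        "comma_or_close": {",", "]"},
        "end": {"<eos>"},
    }[expect]


def fake_model(prefix):
    """Stand-in for an LLM: its favorite token, "hello", is never valid."""
    return {"hello": 9.0, "[": 3.0 if not prefix else 0.1, "true": 2.0,
            "]": 1.5, ",": 1.0, "false": 0.5, "<eos>": 0.0}


def generate(score_fn, max_tokens=20):
    """Greedy decoding that only ever considers grammar-approved tokens."""
    out = []
    for _ in range(max_tokens):
        scores = score_fn(out)
        candidates = sorted(allowed_next(out))  # sorted for determinism
        tok = max(candidates, key=lambda t: scores.get(t, float("-inf")))
        if tok == "<eos>":
            break
        out.append(tok)
    return out
```

Calling `generate(fake_model)` never emits `hello`, however much the fake model likes it, because that token is never among the candidates.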

wubrr 2 days ago

Very interesting, thank you!

gcr 2 days ago

This is misinformation. Ollama has supported structured outputs that conform to a given JSON schema for months. Here’s a post about this from last year: https://ollama.com/blog/structured-outputs

This is absolutely possible to do at the model level via logit shaping. llama.cpp’s functionality for this is called GBNF. It’s tightly integrated into the token sampling infrastructure, and is what ollama builds upon for their json schema functionality.

tarruda 2 days ago
> It’s tightly integrated into the token sampling infrastructure, and is what ollama builds upon for their json schema functionality.
Do you mean that generating an EBNF grammar from a JSON schema and using it for sampling is part of ggml, and all they have to do is use it?
I assumed this was part of llama.cpp, and so another feature they would have to re-implement and maintain.
- gcr 1 day ago
  
  The whole point of GBNF is to serve as part of the API that lets downstream applications control token sampling in a high-level way without having to drop to raw logit distributions or pull model-specific tricks.
  For example, Ollama has a hardcoded GBNF grammar to force generic JSON output; the code is here: https://github.com/ollama/ollama/blob/da09488fbfc437c55a94bc...
  Ollama can also turn a user-passed JSON schema into a more tightly specified GBNF grammar; the code is here and is a bit harder to understand: https://github.com/ollama/ollama/blob/da09488fbfc437c55a94bc...
  This thread was about doing structured generation in a model-agnostic way without wrapping try/except around json.parse(), and GBNF is _the_ way to do that.
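As a rough illustration of the schema-to-grammar step, here is a deliberately simplified sketch. It is not Ollama's or llama.cpp's converter: `schema_to_gbnf` is a hypothetical function that handles only flat objects with string, integer, or boolean properties, treats every property as required and in a fixed order, assumes property names need no escaping, and uses a string rule with no escape sequences.

```python
def schema_to_gbnf(schema):
    """Emit a GBNF grammar for a flat object schema (illustrative subset).

    Assumptions: every property is required, appears in declaration order,
    and has type string, integer, or boolean.
    """
    primitive = {"string": "string", "integer": "integer", "boolean": "boolean"}
    parts = []
    for name, sub in schema["properties"].items():
        rule = primitive[sub["type"]]
        # GBNF literals are double-quoted, so the JSON key's quotes are escaped.
        parts.append(f'"\\"{name}\\":" ws {rule}')
    body = ' "," ws '.join(parts)
    lines = [
        f'root ::= "{{" ws {body} ws "}}"',
        r'string ::= "\"" [^"\\]* "\""',  # simplified: no escape sequences
        'integer ::= "-"? [0-9]+',        # simplified: allows leading zeros
        'boolean ::= "true" | "false"',
        r'ws ::= [ \t\n]*',
    ]
    return "\n".join(lines)
```

For a schema with a string `name` and an integer `age`, the resulting `root` rule spells out both keys literally, so a sampler constrained by it cannot emit any other key or skip one.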