Comment by wubrr
17 hours ago
> Does ollama support strict structured output or strict tool calls adhering to a json schema?
As far as I understand, this is generally not possible at the model level. The best you can do is wrap the call in a (non-LLM) JSON-schema validator and emit an error JSON when the LLM output does not match the schema; some APIs do this for you, but it's not very complicated to do yourself (see the sketch below).
Someone correct me if I'm wrong
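A minimal sketch of that wrap-and-validate approach, using the jsonschema package (llm_call is a hypothetical stand-in for whatever client you use):

    import json
    import jsonschema  # pip install jsonschema

    def validated_call(llm_call, schema):
        # llm_call: hypothetical zero-arg callable returning raw model text.
        raw = llm_call()
        try:
            data = json.loads(raw)
            jsonschema.validate(instance=data, schema=schema)
            return data
        except (json.JSONDecodeError, jsonschema.ValidationError) as err:
            # Emit an error JSON when the output does not match the schema.
            return {"error": str(err)}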
The inference engine (llama.cpp) has full control over the possible tokens during inference. It can "force" the LLM to output only valid tokens so that it produces valid JSON, and in fact it leverages that control to constrain outputs to those matching user-specified BNF grammars:
https://github.com/ggml-org/llama.cpp/tree/master/grammars
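For illustration, a minimal sketch of that grammar-constrained sampling via the llama-cpp-python bindings (the model path is a placeholder; the grammar only admits output of the form {"answer": "..."}):

    from llama_cpp import Llama, LlamaGrammar

    # Placeholder path; any local GGUF model works.
    llm = Llama(model_path="model.gguf")

    # A tiny GBNF grammar: the sampler may only emit tokens that keep the
    # output inside this language, so the result is always valid JSON of
    # the form {"answer": "<string>"}.
    grammar = LlamaGrammar.from_string(r'''
    root   ::= "{" ws "\"answer\":" ws string ws "}"
    string ::= "\"" [^"]* "\""
    ws     ::= [ \t\n]*
    ''')

    out = llm(
        "Answer in JSON: what is the capital of France?",
        grammar=grammar,
        max_tokens=64,
    )
    print(out["choices"][0]["text"])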
Ahh, I stand corrected, very cool!
No, that's incorrect: llama.cpp supports providing a context-free grammar during sampling and only samples tokens that conform to the grammar, never tokens that would violate it.
Very interesting, thank you!
This is misinformation. Ollama has supported structured outputs that conform to a given JSON schema for months. Here’s a post about this from last year: https://ollama.com/blog/structured-outputs
This is absolutely possible to do at the model level via logit shaping. llama.cpp’s functionality for this is called GBNF. It’s tightly integrated into the token-sampling infrastructure, and is what Ollama builds upon for its JSON-schema functionality.
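For reference, a sketch of what that looks like through Ollama's API, using the ollama Python package (model name and schema here are placeholders):

    import ollama  # pip install ollama

    # Since the structured-outputs release, `format` accepts a full JSON
    # schema instead of just "json".
    schema = {
        "type": "object",
        "properties": {
            "capital": {"type": "string"},
            "population": {"type": "integer"},
        },
        "required": ["capital", "population"],
    }

    resp = ollama.chat(
        model="llama3.1",  # assumes this model has been pulled locally
        messages=[{"role": "user", "content": "Tell me about France."}],
        format=schema,     # decoding is constrained to match the schema
    )
    print(resp["message"]["content"])  # parses as schema-valid JSON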
> It’s tightly integrated into the token-sampling infrastructure, and is what Ollama builds upon for its JSON-schema functionality.
Do you mean that the functionality of generating an EBNF grammar from a JSON schema and using it for sampling is part of ggml, and all they have to do is use it?
I assumed this was part of llama.cpp, and therefore another feature they would have to re-implement and maintain.
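As far as I can tell, the JSON-schema-to-grammar conversion lives in the llama.cpp tree itself (common/json-schema-to-grammar, plus a bundled Python script), not in ggml. A sketch of driving that script, assuming a local checkout of the llama.cpp repo:

    import json
    import pathlib
    import subprocess

    # Placeholder schema to convert.
    schema = {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    }
    pathlib.Path("schema.json").write_text(json.dumps(schema))

    # Path assumes a checkout at ./llama.cpp; the script emits a GBNF
    # grammar usable with --grammar-file.
    result = subprocess.run(
        ["python", "llama.cpp/examples/json_schema_to_grammar.py", "schema.json"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)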