Comment by gcr

1 day ago

This is misinformation. Ollama has supported structured outputs that conform to a given JSON schema for months. Here’s a post about this from last year: https://ollama.com/blog/structured-outputs
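
For example, here's a minimal sketch of a request against that API (the model name and schema fields are placeholders, not taken from the post):

    import json, requests

    # JSON schema the response must conform to (illustrative fields)
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "capital": {"type": "string"},
        },
        "required": ["name", "capital"],
    }

    resp = requests.post("http://localhost:11434/api/chat", json={
        "model": "llama3.2",  # assumes this model is pulled locally
        "messages": [{"role": "user", "content": "Tell me about Canada."}],
        "format": schema,     # the schema Ollama constrains decoding to
        "stream": False,
    })
    # The message content is valid JSON matching the schema by construction.
    print(json.loads(resp.json()["message"]["content"]))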

This is absolutely possible to do at the model level via logit shaping. llama.cpp’s functionality for this is called GBNF. It’s tightly integrated into the token sampling infrastructure, and is what Ollama builds upon for their JSON schema functionality.
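
To make that concrete: a GBNF grammar is a set of production rules, and at each sampling step llama.cpp masks the logits of every token that could not extend a valid derivation. A toy sketch (illustrative only, not the grammar Ollama actually ships) that restricts output to a flat JSON object of string fields:

    # toy grammar: a flat JSON object with string keys and string values
    root   ::= "{" ws pair ("," ws pair)* ws "}"
    pair   ::= ws string ws ":" ws string
    string ::= "\"" [a-zA-Z0-9 _-]* "\""
    ws     ::= [ \t\n]*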

> It’s tightly integrated into the token sampling infrastructure, and is what Ollama builds upon for their JSON schema functionality.

Do you mean that the functionality of generating an EBNF grammar from a JSON schema and using it for sampling is part of ggml, and all they have to do is use it?

I assumed that this was part of llama.cpp, and therefore another feature they have to re-implement and maintain.

  • The whole point of GBNF is to serve as part of the API that lets downstream applications control token sampling in a high-level way without having to drop to raw logit distributions or pull model-specific tricks.

    For example, Ollama has a hardcoded GBNF grammar that forces generic JSON output; the code is here: https://github.com/ollama/ollama/blob/da09488fbfc437c55a94bc...

    Ollama can also turn a user-passed JSON schema into a more tightly specified GBNF grammar; that code is here, and it is a bit harder to follow: https://github.com/ollama/ollama/blob/da09488fbfc437c55a94bc...

    This thread was about doing structured generation in a model-agnostic way without wrapping try/except around json.parse(), and GBNF is _the_ way to do that.
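
    As a sketch of what that looks like from the consuming side, here is grammar-constrained sampling via llama-cpp-python's grammar support (the model path and prompt are placeholders):

        from llama_cpp import Llama, LlamaGrammar

        # Constrain sampling so the model can only emit "yes" or "no".
        grammar = LlamaGrammar.from_string('root ::= "yes" | "no"')

        llm = Llama(model_path="model.gguf")  # placeholder path
        out = llm("Is the sky blue? Answer: ", grammar=grammar, max_tokens=4)
        # No parse-and-retry loop needed: the output conforms by construction.
        print(out["choices"][0]["text"])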