Comment by ironbound
3 days ago
Use one of these structured output libraries:
https://github.com/outlines-dev/outlines
https://github.com/jxnl/instructor
https://github.com/guardrails-ai/guardrails
https://www.askmarvin.ai/docs/text/transformation/
Some of them allow a JSON schema, others a Pydantic model (which you can transform to/from JSON).
To add to that, llama.cpp supports GBNF grammars, which can constrain generation to valid JSON, or even to a programming language:
https://github.com/ggml-org/llama.cpp/blob/master/grammars/R...
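For anyone unfamiliar with the idea, here is a toy sketch of what grammar-constrained decoding means in principle: at each step, only tokens the grammar accepts are eligible, so the output cannot be malformed. This is not llama.cpp's implementation and not GBNF syntax; `toy_grammar`, `toy_scores`, and `constrained_decode` are hypothetical names, and the "grammar" is just a position table that accepts `{"ok":true}` or `{"ok":false}`.

```python
import json
from typing import Callable, Dict, List

# Hypothetical toy "grammar": which tokens are legal at each position.
# A real GBNF grammar is far more expressive; this only illustrates masking.
GRAMMAR_STEPS: List[List[str]] = [["{"], ['"ok"'], [":"], ["true", "false"], ["}"]]


def toy_grammar(prefix: List[str]) -> List[str]:
    """Return the tokens allowed after prefix, or [] when the output is complete."""
    if len(prefix) < len(GRAMMAR_STEPS):
        return GRAMMAR_STEPS[len(prefix)]
    return []


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Stand-in for a model: it would rather emit 'hello' than any JSON token."""
    return {"hello": 5.0, "false": 2.0}


def constrained_decode(
    score: Callable[[List[str]], Dict[str, float]],
    allowed_next: Callable[[List[str]], List[str]],
    max_tokens: int = 16,
) -> List[str]:
    """Greedy decoding restricted to the tokens the grammar allows at each step."""
    out: List[str] = []
    for _ in range(max_tokens):
        allowed = allowed_next(out)
        if not allowed:
            break  # grammar says the output is complete
        scores = score(out)
        # Tokens outside `allowed` are never considered, however high they score.
        out.append(max(allowed, key=lambda t: scores.get(t, float("-inf"))))
    return out


text = "".join(constrained_decode(toy_scores, toy_grammar))
parsed = json.loads(text)  # always parses, because the grammar was enforced
```

Even though the stand-in model prefers "hello", that token is never eligible, so the result always parses. No retries are needed, which is the difference from validating after the fact.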
Yeah, the author seems oblivious to that option. But to be fair, this post is about the more basic step of choosing the schema, which he argues is still a task for a human. Output schemas would be more useful in his next post: https://www.domainlanguage.com/articles/context-mapping-an-a...
By the way, you don't need to use a library to get output schemas: https://platform.openai.com/docs/guides/structured-outputs , https://platform.claude.com/docs/en/build-with-claude/struct...
All those do is retry the generation if it fails and then give up.
Calling them structured output is a lie; it's structured validation.
If the LLM persists in generating bad JSON, those are useless.
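To make the retry-then-give-up pattern described here concrete, a minimal sketch follows. It does not show any real library's internals: `generate` is a hypothetical stand-in for an LLM call, and `REQUIRED_KEYS` is a made-up schema.

```python
import json
from typing import Callable, Optional

# Hypothetical schema: required key -> expected Python type.
REQUIRED_KEYS = {"name": str, "age": int}


def validate(raw: str) -> Optional[dict]:
    """Return the parsed object if raw is JSON matching REQUIRED_KEYS, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data.get(key), expected_type):
            return None
    return data


def generate_validated(
    generate: Callable[[str], str], prompt: str, max_retries: int = 3
) -> dict:
    """Call generate until its output validates; raise after max_retries attempts."""
    for _ in range(max_retries):
        result = validate(generate(prompt))
        if result is not None:
            return result
    raise ValueError(f"no valid output after {max_retries} attempts")


def make_flaky_model() -> Callable[[str], str]:
    """Fake model for illustration: two bad outputs, then a good one."""
    outputs = iter(["not json", '{"name": "Ada"}', '{"name": "Ada", "age": 36}'])
    return lambda prompt: next(outputs)


person = generate_validated(make_flaky_model(), "Describe Ada as JSON")
```

With a model that never produces valid output, `generate_validated` simply raises after the last attempt, which is the failure mode the comment is complaining about.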