Comment by tarruda

2 days ago

The inference engine (llama.cpp) has full control over the set of allowed tokens at each step of inference. It can "force" the LLM to emit only tokens that keep the output grammatically valid, so the result is guaranteed to be valid JSON.
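
For illustration, here's a minimal sketch using the llama-cpp-python bindings, which expose this mechanism through llama.cpp's GBNF grammars. The grammar file path assumes the stock `json.gbnf` that ships in llama.cpp's `grammars/` directory, and the model path is a placeholder:

    from llama_cpp import Llama, LlamaGrammar

    # Load the JSON grammar that ships with llama.cpp
    # (path is an assumption; adjust to your checkout).
    grammar = LlamaGrammar.from_file("grammars/json.gbnf")

    llm = Llama(model_path="model.gguf")  # placeholder model

    # At every decoding step, tokens that would violate the
    # grammar are masked out before sampling, so the output
    # cannot be anything other than well-formed JSON.
    out = llm(
        "Describe this product as JSON:",
        grammar=grammar,
        max_tokens=256,
    )
    print(out["choices"][0]["text"])

The key point is that the constraint is applied at sampling time, not by post-hoc validation: invalid continuations are never candidates in the first place.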