Comment by tarruda
2 days ago
The inference engine (llama.cpp) has full control over the set of candidate tokens at each decoding step. It can "force" the LLM to emit only tokens that keep the output valid JSON,
and in fact it leverages that control to constrain outputs to any user-specified BNF grammar:
https://github.com/ggml-org/llama.cpp/tree/master/grammars
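For anyone curious what these look like: here's a minimal sketch I wrote in llama.cpp's GBNF notation (not one of the shipped grammar files) that constrains output to a flat JSON object with string keys and values:

```
# Hypothetical minimal grammar, for illustration only.
# GBNF rules have the form: name ::= expression
root   ::= "{" ws pair ("," ws pair)* ws "}"
pair   ::= string ws ":" ws string
string ::= "\"" [a-zA-Z0-9 _-]* "\""
ws     ::= [ \t\n]*
```

You'd pass it with something like `llama-cli -m model.gguf --grammar-file object.gbnf -p "..."` (the repo also ships a ready-made `json.gbnf` in that grammars directory). At each step the sampler masks out every token that can't continue a valid parse of the grammar, so the model literally cannot produce malformed output.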
Ahh, I stand corrected, very cool!