Comment by tarruda

2 days ago

The inference engine (llama.cpp) has full control over the set of allowed tokens at each step of inference. It can "force" the LLM to emit only tokens that keep the output grammatically valid, so the result is guaranteed to be valid JSON.
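
For illustration, here's a minimal sketch using the llama-cpp-python bindings, which expose this mechanism through llama.cpp's GBNF grammars. The grammar file path assumes the stock `json.gbnf` that ships in llama.cpp's `grammars/` directory, and the model path is a placeholder:

    from llama_cpp import Llama, LlamaGrammar

    # Load the JSON grammar that ships with llama.cpp
    # (path is an assumption; adjust to your checkout).
    grammar = LlamaGrammar.from_file("grammars/json.gbnf")

    llm = Llama(model_path="model.gguf")  # placeholder model

    # At every decoding step, tokens that would violate the
    # grammar are masked out before sampling, so the output
    # cannot be anything other than well-formed JSON.
    out = llm(
        "Describe this product as JSON:",
        grammar=grammar,
        max_tokens=256,
    )
    print(out["choices"][0]["text"])

The key point is that the constraint is applied at sampling time, not by post-hoc validation: invalid continuations are never candidates in the first place.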