Comment by hansvm

1 day ago

This is good. It covers the two easiest and most widely used methods. It even touches on my main complaint about the one they seem to recommend.

That said:

- Constrained generation yields a different distribution from what a raw LLM would provide, and this can be pathologically bad. My go-to example is LLMs having a preference for including ellipses in long, structured objects. Constrained generation forces closing quotes, or whatever else it takes to recover from that error according to the schema, so you get output that conforms to the schema but still contains truncated, invalid data. Resampling tends to retry until the LLM fully generates the data in question, always yielding a valid result which also adheres to the schema. It can get much worse than that.

- The unconstrained "method" has a few possible implementations. Increasing context length by complaining about schema errors is almost always worse from an end quality perspective than just retrying till the schema passes. Effective context windows are precious, and current models bias heavily toward data that appears earlier in the context. In a low-error regime you might get away with a "try it again" response in a single chat, but in a high-error regime you'll get better results at a lower cost by literally re-sending the same prompt until the model stops producing errors (rough sketch below).
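
For concreteness, a minimal sketch of that plain resampling loop, assuming pydantic v2; `Item`, `call_llm`, and the retry cap are all made-up placeholders:

```python
import json
from typing import Callable

from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    # Stand-in schema; swap in whatever structure you actually need.
    name: str
    quantity: int


def resample_until_valid(
    prompt: str,
    call_llm: Callable[[str], str],  # your unconstrained LLM call, injected
    max_attempts: int = 5,
) -> list[Item]:
    """Re-send the same prompt until the output parses and validates.

    No error feedback is appended to the context; every attempt starts
    fresh, so the effective context window stays small.
    """
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            return [Item.model_validate(obj) for obj in json.loads(raw)]
        except (json.JSONDecodeError, ValidationError, TypeError):
            continue  # throw the bad output away; don't complain to the model
    raise RuntimeError("no schema-valid output within the retry budget")
```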

> Increasing context length by complaining about schema errors is almost always worse from an end quality perspective than just retrying till the schema passes.

Another way to do this is to use a hybrid approach. You perform unconstrained generation first, and then constrained generation on the failures.
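
Something along these lines (same made-up placeholders as the sketch above, plus a hypothetical `call_llm_constrained` that does schema-constrained decoding):

```python
import json
from typing import Callable

from pydantic import BaseModel, ValidationError


class Item(BaseModel):  # same stand-in schema as above
    name: str
    quantity: int


def hybrid_generate(
    prompt: str,
    call_llm: Callable[[str], str],              # unconstrained call
    call_llm_constrained: Callable[[str], str],  # schema-constrained decoding
) -> list[Item]:
    """One plain attempt, then one constrained attempt only if it fails."""
    raw = call_llm(prompt)
    try:
        return [Item.model_validate(obj) for obj in json.loads(raw)]
    except (json.JSONDecodeError, ValidationError, TypeError):
        # Fallback: constrained decoding is guaranteed to match the schema,
        # so this second call is also the last one (at most 2 calls total).
        raw = call_llm_constrained(prompt)
        return [Item.model_validate(obj) for obj in json.loads(raw)]
```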

  • There's no difference in the output distribution between always doing constrained generation and only doing it on the failures though. What's the advantage?

    • There's no advantage w.r.t. output quality, but it can be more economical in some high-error regimes, with fewer LLM calls spent on resampling (at most 2 for most errors).
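
Back-of-the-envelope, with a made-up per-attempt failure rate p and assuming independent attempts (purely illustrative): pure resampling needs about 1/(1-p) calls on average, while the hybrid caps out at 1 + p:

```python
# Expected LLM calls per request under an assumed failure rate p (illustrative only).
for p in (0.1, 0.5, 0.9):
    resampling = 1 / (1 - p)  # geometric: keep re-sending the prompt until it validates
    hybrid = 1 + p            # 1 unconstrained call, plus 1 constrained call on failure
    print(f"p={p}: resampling ~{resampling:.1f} calls, hybrid ~{hybrid:.1f} calls")
```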
