Whose messenger? You didn't point us to anyone's research.
I just don't see how sampling tokens constrained to a grammar can be worse than rejection-sampling whole answers against the same grammar. The latter has to follow the same constraints on its own to avoid getting rejected, and both can iterate in natural language before starting their structured answer.
Under a fair comparison, I'd expect the former to give answers at least as good while being more efficient. Possibly better if the top-k/top-p selection happened after the grammar constraint.
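To make the two procedures concrete, here's a toy sketch. Everything in it is made up for illustration: the "model" is a fixed next-token distribution that ignores context, the grammar is a hypothetical `{a*}` pattern, and the function names are mine. It only shows the mechanics and the cost in model calls, and it says nothing about answer quality.

```python
import random

# Toy stand-in for a language model: a fixed next-token distribution that
# ignores context (an assumption made purely to keep the sketch short).
TOKEN_PROBS = {"{": 0.25, "a": 0.25, "}": 0.25, "<eos>": 0.25}


def allowed_next(prefix):
    """Hypothetical grammar: '{', then any number of 'a', then '}', then <eos>."""
    if not prefix:
        return {"{"}
    if prefix[-1] == "<eos>":
        return set()  # nothing may follow the end marker
    if prefix[-1] == "}":
        return {"<eos>"}
    return {"a", "}"}


def matches_grammar(tokens):
    """Check a finished sequence against the same grammar."""
    prefix = []
    for tok in tokens:
        if tok not in allowed_next(prefix):
            return False
        prefix.append(tok)
    return bool(tokens) and tokens[-1] == "<eos>"


def sample_token(rng, candidates):
    """Sample one token from TOKEN_PROBS restricted to `candidates`."""
    toks = sorted(candidates)
    weights = [TOKEN_PROBS[t] for t in toks]  # choices() renormalizes weights
    return rng.choices(toks, weights=weights, k=1)[0]


def constrained_sample(rng):
    """Mask disallowed tokens at every step. Returns (tokens, model_calls)."""
    tokens = []
    while not tokens or tokens[-1] != "<eos>":
        tokens.append(sample_token(rng, allowed_next(tokens)))
    return tokens, len(tokens)


def rejection_sample(rng):
    """Sample whole answers freely and retry until one fits the grammar."""
    calls = 0
    while True:
        tokens = []
        while not tokens or tokens[-1] != "<eos>":
            tokens.append(sample_token(rng, TOKEN_PROBS))
            calls += 1
        if matches_grammar(tokens):
            return tokens, calls


rng = random.Random(0)
c_tokens, c_calls = constrained_sample(rng)
r_tokens, r_calls = rejection_sample(rng)
print("constrained:", c_tokens, "model calls:", c_calls)
print("rejection:  ", r_tokens, "model calls:", r_calls)
```

Both functions can only return sequences that fit the grammar. The constrained one spends exactly one model call per emitted token, while the rejection one also pays for every discarded attempt.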
Or just run json.dumps on the correct answer in the wrong format.
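A minimal sketch of that idea, assuming "wrong format" means something like a Python-style dict literal instead of JSON. The `raw_answer` string is a hypothetical model output, not from any real run.

```python
import ast
import json

# Hypothetical model output: right content, but a Python-style literal
# (single quotes, True) rather than valid JSON.
raw_answer = "{'name': 'Ada', 'tags': ['x', 'y'], 'active': True}"

# ast.literal_eval only evaluates Python literals, so it won't run arbitrary code.
parsed = ast.literal_eval(raw_answer)

# json.dumps re-serializes the parsed value as valid JSON.
as_json = json.dumps(parsed)
print(as_json)  # {"name": "Ada", "tags": ["x", "y"], "active": true}
```

This only works when the content is already correct and parseable some other way. It doesn't help if the answer itself is malformed.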
Don't shoot the messenger