Comment by kevinlu1248
1 month ago
^ these were pretty much the main reasons.
The other one is that constrained decoding only works for context-free grammars (simpler ones, like JSON schemas), since only those can be compiled into automata that can drive constrained decoding. Programming languages like Python and C++ aren't context-free, so it doesn't work.
Also, constrained decoding generally worsens model quality, since the model ends up generating off-policy. So RL helps push correct syntax back on-policy.
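To make the two points above concrete, here is a rough sketch of automaton-driven constrained decoding. Everything in it is made up for illustration: the toy vocabulary, the hand-written DFA (it accepts bracketed, comma-separated lists of "1" and "2"), and `fake_logits`, which stands in for a real model. It is not how any particular library does it.

```python
import math

# Hypothetical toy DFA: state -> {allowed token: next state}.
# Accepts strings such as "[1]" or "[1,2,1]".
DFA = {
    "start": {"[": "need_item"},
    "need_item": {"1": "after_item", "2": "after_item"},
    "after_item": {",": "need_item", "]": "done"},
    "done": {},
}


def masked_softmax(logits, allowed):
    """Softmax over the allowed tokens only; every other token gets 0."""
    peak = max(logits[t] for t in allowed)
    weights = {
        t: math.exp(logits[t] - peak) if t in allowed else 0.0
        for t in logits
    }
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def fake_logits(prefix):
    """Stand-in for a model: it always prefers the invalid token 'x'."""
    return {"x": 3.0, "[": 1.0, "1": 0.5, "2": 0.4, "]": 0.3, ",": 0.2}


def constrained_greedy_decode(logits_fn, max_steps=16):
    """Greedy decoding where the DFA masks out tokens that break the grammar."""
    state, out = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        probs = masked_softmax(logits_fn(out), set(DFA[state]))
        token = max(probs, key=probs.get)  # best token among the allowed ones
        out.append(token)
        state = DFA[state][token]
    return "".join(out), state == "done"


text, accepted = constrained_greedy_decode(fake_logits)
print(text, accepted)
```

At every step the stand-in model would have picked "x" on its own, so the mask forces it onto a renormalized distribution it would not have followed unprompted. That gap is the off-policy issue: the output is always grammatical, but it is not what the model would have produced by itself.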