Comment by HanClinto
6 months ago
Further than that, it feels like we could use constrained generation of outputs [0] to force the model to do X amount of output inside of a <thinking> BEFORE writing an <answer> tag. It might not always produce good results, but I'm curious what sort of effect it might have to convince models that they really should stop and think first.
[0]: https://github.com/ggerganov/llama.cpp/blob/master/grammars/...
No comments yet
Contribute on Hacker News ↗