Comment by cedws

6 months ago

Q: so apparently letting LLMs “think”, by asking them to walk through the problem and generate preamble tokens before the answer, improves quality. With this kind of speedup, would it be practical/effective to achieve better output by baking a “thinking” step into every prompt? Say, a few thousand tokens before the actual reply.
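
A minimal sketch of what baking that in might look like, assuming a hypothetical `generate()` stand-in for whatever inference backend is in use (the function name, the prompt wording, and the `Answer:` delimiter are all illustrative, not from any particular library):

```python
# Sketch: prepend a fixed "thinking" instruction to every prompt, then
# strip the reasoning preamble so the user only sees the final answer.

def generate(prompt: str, max_tokens: int) -> str:
    # Hypothetical stand-in: plug in your actual inference backend here.
    raise NotImplementedError

THINKING_PREFIX = (
    "Before answering, reason step by step about the question below. "
    "Write out your reasoning, then give the final answer after 'Answer:'.\n\n"
)

def answer_with_thinking(question: str, thinking_budget: int = 2048) -> str:
    # Spend the token budget on the reasoning preamble plus the reply.
    full = generate(THINKING_PREFIX + question, max_tokens=thinking_budget)
    # Discard everything before the delimiter; fall back to the raw
    # output if the model never emitted it.
    _, _, answer = full.partition("Answer:")
    return answer.strip() or full
```

The trade-off is exactly the one the question raises: every request now pays for a few thousand extra preamble tokens, which is only attractive if generation is fast or cheap enough to absorb it.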