Comment by meatmanek

1 month ago

I think you're right about the standard thinking approach (generating a long stream of tokens before drafting the actual answer): after a while, additional thinking just results in loops.

The RSA approach from https://rsa-llm.github.io/, expanded on by https://www.zyphra.com/post/zaya1-8b, looks like a promising way to squeeze a bit more intelligence from a small model. As I understand it, running multiple independent thinking traces in parallel gives you a chance of one of them finding a different local optimum, whereas running a single trace for longer is likely to just circle around one optimum.
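To make the local-optimum intuition concrete, here's a toy sketch. It is not the RSA algorithm and doesn't involve a model at all: it's plain greedy hill climbing on a made-up 1-D landscape, where a "trace" is one climb and the "thinking budget" is a step count. The `LANDSCAPE` values, the start positions, and the function names are all hypothetical, picked only for illustration.

```python
# Toy analogy only: greedy hill climbing on a hand-made landscape.
# LANDSCAPE, the start points, and the budgets are invented for this sketch.
LANDSCAPE = [1, 3, 5, 4, 2, 1, 2, 6, 9, 7, 3, 2, 4, 5, 3, 1]


def hill_climb(start, steps):
    """Move to the better neighbour until stuck or the step budget runs out."""
    pos = start
    for _ in range(steps):
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(LANDSCAPE)]
        best = max(neighbours, key=lambda p: LANDSCAPE[p])
        if LANDSCAPE[best] <= LANDSCAPE[pos]:
            break  # local optimum: extra budget would not change anything
        pos = best
    return pos


def best_of_parallel(starts, steps_each):
    """Run independent climbs from different starts and keep the best result."""
    results = [hill_climb(s, steps_each) for s in starts]
    return max(results, key=lambda p: LANDSCAPE[p])


# One long trace: 100 steps of budget, but it stalls on the first peak.
single = hill_climb(0, 100)

# Four short traces with the same total budget (4 x 25 steps).
parallel = best_of_parallel([0, 6, 11, 15], 25)

print(single, LANDSCAPE[single])      # 2 5
print(parallel, LANDSCAPE[parallel])  # 8 9
```

The single climb spends almost all of its budget doing nothing once it hits the first peak, while one of the short climbs happens to start in the basin of the higher peak. Real reasoning traces are obviously much messier than this, so take it as an illustration of the idea rather than evidence for it.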

That said, at the end of the day, there's only so much information a small model can contain. If a model just doesn't know some key piece of information, no amount of thinking will help it figure out a solution that depends on that information.