
Comment by throw310822

4 days ago

Interesting observation. On the one hand, these more closely resemble the notes an actual participant would write while solving the problem. Fewer words also means less noise and more focus. But specifically for LLMs, which output one token at a time and have a limited context window, I wonder whether restricting itself to semantically meaningful tokens lets the model sustain longer stretches of semantically coherent thought.

The original thread mentions “test-time compute scaling” so they had some architecture generating a lot of candidate ideas to evaluate. Minimizing tokens can be very meaningful from a scalability perspective alone!
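To make that concrete, here's a minimal best-of-N sketch of what "generating a lot of candidate ideas to evaluate" could look like. The `generate` and `score` callables and the best-of-N strategy are my assumptions for illustration, not anything confirmed in the thread:

```python
# Hypothetical sketch: best-of-N test-time compute scaling.
# `generate` and `score` are stand-ins for whatever sampler/verifier
# the actual system uses; nothing here is from the original thread.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    text: str
    tokens: int   # crude proxy: word count stands in for token count
    score: float

def best_of_n(generate: Callable[[str], str],
              score: Callable[[str], float],
              prompt: str,
              n: int = 16) -> Candidate:
    """Sample n candidate solutions, score each, keep the best.

    Total cost scales with sum(c.tokens for c in candidates), so terser
    candidates directly cut the compute bill at fixed n -- or let you
    raise n within the same budget.
    """
    candidates = []
    for _ in range(n):
        text = generate(prompt)  # one sampled reasoning trace
        candidates.append(Candidate(text, len(text.split()), score(text)))
    return max(candidates, key=lambda c: c.score)
```

At a fixed token budget, halving the average candidate length doubles the number of candidates you can evaluate, which is exactly why terse traces help scalability.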

  • This is just speculation, but I wouldn't be surprised if there were some symbolic-AI 'tricks'/tools (and/or modern AI trained to imitate symbolic AI) under the hood.