
Comment by Davidzheng

3 months ago

Reasoning in latent space is probably not needed in the end. Unless constrained by human preference/SFT data, RL spontaneously should create new additions to language to help with new reasoning methods and new concepts invented by the system.

> RL spontaneously should create new additions to language to help with

Yes, but it may take millions of years. One of the main reasons for LLMs' success is their amazing trainability: for every input token the model produces an output with a known target, i.e., a per-token loss, whereas most RL techniques step through one 'state' at a time. For untokenized output we cannot say what it should be, so it can only be trained through the next tokens that follow it. That probably makes it unstable and expensive to train, limiting the length of the 'continuous' part. But it still looks like a good idea to have.
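A minimal sketch of the contrast, using a hypothetical toy model (a GRU standing in for a transformer decoder, made-up vocabulary and shapes): ordinary token prediction gets a dense cross-entropy loss at every position, while a latent 'continuous' segment has no target of its own and only receives gradients through the tokens predicted after it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 100, 32
embed = nn.Embedding(vocab, d)
rnn = nn.GRU(d, d, batch_first=True)      # stand-in for a transformer decoder
head = nn.Linear(d, vocab)

tokens = torch.randint(0, vocab, (1, 8))  # toy input sequence

# (1) Dense supervision: every position has a known next-token target,
#     so the loss is defined token by token.
h, _ = rnn(embed(tokens))
logits = head(h[:, :-1])
dense_loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))

# (2) Latent segment: feed the last hidden state back in as a continuous
#     "thought" for a few steps. These vectors have no target; the only
#     gradient they receive flows back from the loss on the token(s) after them.
state = h[:, -1:, :]
for _ in range(3):                         # 3 untokenized reasoning steps
    state, _ = rnn(state)                  # output is a vector, not a token
logits_after = head(state[:, -1])
target_after = tokens[:, -1]               # toy target for the token after the latent part
latent_loss = F.cross_entropy(logits_after, target_after)

print(dense_loss.item(), latent_loss.item())
```

The point of the sketch: in (1) the training signal is per-token and everywhere; in (2) the latent steps are supervised only indirectly, which is why a long 'continuous' stretch is plausibly harder and more expensive to train.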

It can definitely create new math concepts, for example:

"Let two dhdud and three otincjf be called a Uhehjfj"