Comment by numba888
3 months ago
> RL spontaneously should create new additions to language to help with
Yes, but it may take millions of years. One of the main reasons for LLMs' success is their amazing trainability: for every input token there is a predictable target output, and therefore a loss, whereas most RL techniques step through one 'state' at a time. For untokenized output we cannot predict what it should be, so it can only be trained indirectly through the next tokens it leads to. That probably makes it unstable and expensive to train, limiting the length of the 'continuous' part. But it still looks like a good idea to have.
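A rough way to picture the contrast the comment draws (a hedged sketch in PyTorch; all shapes, names, and the REINFORCE-style surrogate are illustrative assumptions, not from the thread): next-token prediction gives a supervised loss term at every position, while an untokenized 'continuous' segment has no per-step target and only gets gradient from a single downstream scalar.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8

# Next-token prediction: every position has a known target token,
# so every position contributes its own supervised loss term.
logits = torch.randn(seq_len, vocab_size, requires_grad=True)
targets = torch.randint(0, vocab_size, (seq_len,))
dense_loss = F.cross_entropy(logits, targets)  # averaged over all positions

# RL-style feedback on an untokenized rollout: no per-step target exists,
# only one scalar signal for the whole trajectory (REINFORCE-style surrogate).
log_probs = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1))
reward = 1.0                                   # single sparse signal
sparse_loss = -(reward * log_probs.sum())      # one scalar spread over the rollout

print(dense_loss.item(), sparse_loss.item())
```

The point of the contrast is only that the first loss supervises every step directly, while the second has to propagate one scalar back through the entire 'continuous' stretch, which is what the comment suggests makes long untokenized segments unstable and expensive to train.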