Comment by numba888
3 months ago
> RL spontaneously should create new additions to language to help with
Yes, but it may take millions of years. One of the main reasons for LLMs' success is their amazing trainability: for every input token there is a predictable target output, and therefore a loss, whereas most RL techniques step through one 'state' at a time. For untokenized output we cannot predict what it should be, so it can only be trained indirectly through the next tokens it leads to. That probably makes it unstable and expensive to train, limiting the length of the 'continuous' part. But it still looks like a good idea to have.
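A rough way to picture the contrast the comment draws (a hedged sketch in PyTorch; all shapes, names, and the REINFORCE-style surrogate are illustrative assumptions, not from the thread): next-token prediction gives a supervised loss term at every position, while an untokenized 'continuous' segment has no per-step target and only gets gradient from a single downstream scalar.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8

# Next-token prediction: every position has a known target token,
# so every position contributes its own supervised loss term.
logits = torch.randn(seq_len, vocab_size, requires_grad=True)
targets = torch.randint(0, vocab_size, (seq_len,))
dense_loss = F.cross_entropy(logits, targets)  # averaged over all positions

# RL-style feedback on an untokenized rollout: no per-step target exists,
# only one scalar signal for the whole trajectory (REINFORCE-style surrogate).
log_probs = F.log_softmax(logits, dim=-1).gather(1, targets.unsqueeze(1))
reward = 1.0                                   # single sparse signal
sparse_loss = -(reward * log_probs.sum())      # one scalar spread over the rollout

print(dense_loss.item(), sparse_loss.item())
```

The point of the contrast is only that the first loss supervises every step directly, while the second has to propagate one scalar back through the entire 'continuous' stretch, which is what the comment suggests makes long untokenized segments unstable and expensive to train.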