Comment by ants_everywhere
6 months ago
> But ultimately LLMs also in a way are trained for survival, since an LLM that fails the tests might not get used in future iterations. So for LLMs it is also survival that is the primary driver, then there will be the subgoals.
I think this is sometimes semi-explicit too. For example, this 2017 OpenAI paper on Evolutionary Algorithms [0] was pretty influential, and I suspect (although I'm an outsider to this field, so take it with a grain of salt) that some versions of reinforcement learning that scale for aligning LLMs borrow some performance tricks from OpenAI's genetic approach.