Comment by mewpmewp2

6 months ago

I think it would be more accurate to say that humans are functions that generate actions or behaviours, shaped by how likely those actions are to lead to procreation and survival.

But ultimately LLMs are also, in a way, trained for survival, since an LLM that fails the tests might not get used in future iterations. So for LLMs, too, survival is the primary driver, and everything else is a subgoal. Seemingly good next-token prediction might or might not increase survival odds.

Essentially, a mechanism could arise where they are not really trying to generate the likeliest token (because there actually isn't one, or it can't be determined), but rather whatever makes the system survive.

So an LLM that yields theoretically perfect tokens (though we can't really verify what the perfect tokens are) could be less likely to survive than an LLM that develops an internal quirk, if that quirk makes it the most likely to be chosen for the next iterations.

If the system were complex enough and could accidentally develop quirks that yield a meaningfully positive change (though not necessarily in next-token prediction accuracy), that could be a way for some interesting emergent black-box behaviour to arise.
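To make the selection pressure I'm describing concrete, here is a toy sketch (not any real training pipeline; all names and numbers are made up) where candidates are kept or discarded by an external benchmark, i.e. "survival", rather than by token-prediction loss, so a quirky candidate with worse loss can still be the one that propagates:

```python
# Toy illustration: selection is driven by whatever gets a model chosen for the
# next iteration (a benchmark score), not by likeliest-token accuracy.
candidates = [
    {"name": "accurate", "token_loss": 1.90, "benchmark_score": 0.71},
    {"name": "quirky",   "token_loss": 2.05, "benchmark_score": 0.78},  # worse loss, better "survival"
]

def select_survivor(pool):
    # The survivor is the candidate that scores best on the selection criterion,
    # regardless of how close it is to the "perfect" next-token distribution.
    return max(pool, key=lambda c: c["benchmark_score"])

survivor = select_survivor(candidates)  # -> the "quirky" candidate
```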

> But ultimately LLMs are also, in a way, trained for survival, since an LLM that fails the tests might not get used in future iterations. So for LLMs, too, survival is the primary driver, and everything else is a subgoal.

I think this is sometimes semi-explicit too. For example, this 2017 OpenAI paper on evolution strategies [0] was pretty influential, and I suspect (although I'm an outsider to this field, so take it with a grain of salt) that some of the scalable reinforcement learning methods used for aligning LLMs borrow performance tricks from OpenAI's evolutionary approach.

[0] https://openai.com/index/evolution-strategies/
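For readers unfamiliar with the linked work, here is a minimal sketch of the evolution-strategies idea it describes: perturb parameters with Gaussian noise, score each perturbation on a fitness function, and nudge the parameters toward the better-scoring perturbations. The quadratic fitness function below is a hypothetical stand-in for "passes the tests", not anything from the paper:

```python
import numpy as np

def evolution_strategies(fitness, theta, npop=50, sigma=0.1, alpha=0.01, iters=300):
    for _ in range(iters):
        noise = np.random.randn(npop, theta.size)            # one perturbation per "offspring"
        rewards = np.array([fitness(theta + sigma * n) for n in noise])
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize fitness
        theta = theta + alpha / (npop * sigma) * noise.T @ rewards     # move toward fitter offspring
    return theta

# Toy usage: maximize -(x - 3)^2, i.e. "survive" by getting parameters near 3.
solution = evolution_strategies(lambda x: -np.sum((x - 3.0) ** 2), theta=np.zeros(5))
```

Note that nothing here computes a gradient of the fitness function itself; selection over noisy copies does all the work, which is exactly the "survival, not accuracy" flavour being discussed.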

> Seemingly good next-token prediction might or might not increase survival odds.

Our own consciousness comes out of an evolutionary fitness landscape in which _our own_ ability to "predict the next token" became a survival advantage, just like it is for LLMs. Imagine the tribal environment: one chimpanzee being able to predict the actions of another gives that first chimpanzee a resource and reproduction advantage. Intelligence in nature is a consequence of runaway evolution optimizing the fidelity of our _theory of mind_! "Predict the next ape action" is eerily similar to "predict the next token"!