Comment by sandrello
9 hours ago
> These models are alien intelligences that could occupy an unimaginably vast space of possibilities (there are trillions of weights inside them), but which have been RL-ed over and over until they more or less stay within familiar reasonable human lines.
Or, more plausibly: the specific version we're aligning toward is simply the only one that makes any kind of rational sense, among trillions of other meaningless, gibberish-producing ones.
Do not fall for the idea that if we can't comprehend something, it's because our brains are falling short. Most of the time, what we're looking at simply has no use or meaning in this world at all.
> that specific version we're aligning toward is just the only one that makes some kind of rational sense, among a trillion of other meaningless gibberish-producing ones.
Oh, the space of possibilities is unimaginably vaster than that. Trillions of weights, yes, but far more combinations of those weights than there are electrons in the universe (see the rough count after the list below). So I think we could equally well speculate (and speculation is what we're both doing here, of course!) that all of these things are simultaneously true:
1) Most configurations of LLM weights are indeed gibberish-producers (I agree with you here).
2) Nonetheless, there is a vast space of weight combinations that exhibit "intelligent" properties, but in a profoundly alien way: they can still solve Erdős problems, yet they don't see the world as we do at all.
3) RL tends to herd LLM weights toward the less alien zones of that space, but it's an unreliable tool, as we just saw with the goblins.
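For scale, here is a rough back-of-the-envelope count in Python. The trillion-parameter figure is from the comment above; the 16-bit precision per weight (and hence 2^16 distinct values each) is an assumption of mine, just to make the arithmetic concrete:

```python
# Back-of-the-envelope: compare the number of distinct weight
# configurations against the number of electrons in the universe.
# The ~1 trillion parameter count is from the comment above; the
# 16-bit precision per weight is an assumption for illustration.
from math import log10

n_weights = 10**12          # ~1 trillion parameters
values_per_weight = 2**16   # assuming 16-bit storage per weight

# Total configurations = values_per_weight ** n_weights, which is far
# too large to compute directly, so work in orders of magnitude.
log10_configs = n_weights * log10(values_per_weight)

log10_electrons = 80        # ~10**80 electrons in the observable universe

print(f"digits in the configuration count: ~{log10_configs:.2e}")
print(f"digits in the electron count:      ~{log10_electrons + 1}")
```

Even under that crude quantization, the configuration count has roughly 4.8 trillion digits, while the electron count has 81.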
As a thought experiment, imagine that an alien species (real organic aliens, let's say) with a completely different culture and relation to the universe had trained an LLM and sent it to us to load onto our GPUs. That LLM would still be just as "intelligent" as Opus 4.7 or GPT 5.5, able to do things like solve advanced mathematics problems if we phrased them in the aliens' language, but we would hardly understand it.
> Most of the time, it's just that what we're looking at has no use/meaning in this world at all.
Man, LLMs are really just astrology for tech bros. From randomness comes order.