Comment by libraryofbabel

5 hours ago

> that specific version we're aligning toward is just the only one that makes some kind of rational sense, among a trillion of other meaningless gibberish-producing ones.

Oh, the space of possibilities is unimaginably vaster than that. There are trillions of weights, but far more combinations of those weights than there are electrons in the universe. So I think we could equally well speculate (and that's what we're both doing here, of course!) that all these things are simultaneously true:

1) Most configurations of LLM weights are indeed gibberish-producers (I agree with you here)

2) Nonetheless, there is a vast space of combinations of weights that exhibit "intelligent" properties but in a profoundly alien way. They can still solve Erdős problems, but they don't see the world like us at all.

3) RL tends to herd LLM weights towards less alien zones of intelligence, but it's an unreliable tool, as we just saw with the goblins.

As a thought experiment, imagine that an alien species (real organic aliens, let's say) with a completely different culture and relation to the universe had trained an LLM and sent it to us to load onto our GPUs. That LLM would still be just as "intelligent" as Opus 4.7 or GPT 5.5, able to do things like solve advanced mathematics problems if we phrased them in the aliens' language, but we would hardly understand it.