Comment by 4bpp

1 month ago

That's interesting, could you point me to some source on these findings?

It seems to me that the difference between "iterative improvement" as you put it and "close to the identity" as I put it (i.e. the output is close to the input for most of the volume of the input space) is fairly subtle anyway. One experiment I would like to see: what happens to reasoning performance if, rather than duplicating the selected layers, they are deleted/skipped entirely? If the layers improve reasoning by iterative improvement, deleting them should make performance worse; but if they contain a mechanism that degrades reasoning and is not robust against unannealed self-composition, deleting them should improve performance much as duplication does.
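For concreteness, the duplicate-vs-delete ablation could be sketched on a toy residual stack. This is a hypothetical stand-in (pure NumPy, random weights), not the actual experiment or model under discussion; the point is only the mechanics of reordering/repeating/removing blocks:

```python
import numpy as np

# Hypothetical toy stand-in: a stack of residual blocks x -> x + tanh(W_i x),
# each block playing the role of a transformer layer. We compare the baseline
# layer order against duplicating two middle layers vs. skipping them.

rng = np.random.default_rng(0)
DIM, N_LAYERS = 8, 6
weights = [rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(N_LAYERS)]

def run(x, layer_order):
    """Apply the residual blocks in the given order (indices into `weights`)."""
    for i in layer_order:
        x = x + np.tanh(weights[i] @ x)
    return x

x0 = rng.normal(size=DIM)
baseline   = run(x0, [0, 1, 2, 3, 4, 5])
duplicated = run(x0, [0, 1, 2, 2, 3, 3, 4, 5])  # self-compose layers 2 and 3
skipped    = run(x0, [0, 1, 4, 5])              # delete layers 2 and 3 entirely

# If the middle blocks are near-identity refinements, both perturbed stacks
# should land close to the baseline; the interesting signal in the real
# experiment is which intervention moves task performance, and in which direction.
print("dup  drift:", np.linalg.norm(duplicated - baseline))
print("skip drift:", np.linalg.norm(skipped - baseline))
```

In a real transformer this would amount to editing the module list (e.g. repeating or omitting decoder blocks) and re-running the evaluation suite, with no retraining in either condition.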

https://arxiv.org/abs/2505.12540
https://arxiv.org/abs/2405.07987

These papers, among others, plus the lottery ticket phenomenon. What it boils down to is that any neural-network-like system that encodes some common mapping of a phenomenon in the context of the world (not necessarily a world model, but some "real-world thing") will tend to converge to one of a limited number of permutations of some archetypal representation, which will resemble other mappings of the same thing.

The lottery ticket phenomenon is a bit like the birthday paradox: in a large, random initialization of neural network weights, some number of structures will coincide with one or more archetypal mappings of complex objects. Some sub-networks are also useful mappings onto features of one or more complex objects, which makes learning hierarchical, nested networks of feature mappings easier; it's also why interpretability is so damned difficult.
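The train/prune/rewind recipe behind the lottery ticket result can be sketched on a toy problem. This is a hypothetical illustration in pure NumPy (a linear model with a sparse ground truth), not taken from the linked papers:

```python
import numpy as np

# Toy lottery-ticket-style sketch (hypothetical): train a dense linear model,
# keep the largest-magnitude weights, rewind the survivors to their *initial*
# values, and retrain only that sparse subnetwork.

rng = np.random.default_rng(1)
n, d = 200, 50
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:5] = rng.normal(size=5)          # sparse ground truth: a "winning ticket"
y = X @ true_w + 0.01 * rng.normal(size=n)

def train(w, mask, steps=500, lr=0.01):
    """Gradient descent on mean squared error, updating only unmasked weights."""
    for _ in range(steps):
        grad = (2.0 / n) * X.T @ (X @ (w * mask) - y)
        w = w - lr * grad * mask
    return w * mask

loss = lambda w: np.mean((X @ w - y) ** 2)

w_init = rng.normal(scale=0.1, size=d)
dense = train(w_init, np.ones(d))

# Keep the top 20% of weights by magnitude, rewind them to init, retrain sparsely.
keep = np.argsort(-np.abs(dense))[: d // 5]
mask = np.zeros(d)
mask[keep] = 1.0
ticket = train(w_init, mask)

print("dense loss :", loss(dense))
print("ticket loss:", loss(ticket))      # a good ticket should track the dense loss
```

The birthday-paradox flavor is that with a wide enough random initialization, some mask over the initial weights already lines up with a useful sub-mapping; pruning merely identifies it.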