Comment by throw310822

3 hours ago

I keep thinking about the RYS (Repeat Yourself) experiment, which simply loops some of the inner layers of an LLM to get better results, and I wonder if any progress has been made on it.

https://dnhkng.github.io/posts/rys/

It feels like it should be straightforward to integrate a network into the LLM to control the looping, or to just duplicate entire blocks of layers after the initial training.
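To make the "duplicate a block of layers" idea concrete, here is a toy sketch in plain Python. It is not the code from the linked post. The names `looped_layer_order` and `run_layers` are made up for illustration, and the "layers" are simple functions on a number rather than transformer blocks. It only shows the execution order you get when an inner block is repeated with the same weights.

```python
from typing import Callable, List, Sequence

# Stand-in for a transformer layer: any function from state to state.
Layer = Callable[[float], float]


def looped_layer_order(n_layers: int, start: int, end: int, repeats: int) -> List[int]:
    """Layer indices in execution order when block [start, end) runs `repeats` times in total."""
    if not (0 <= start < end <= n_layers):
        raise ValueError("need 0 <= start < end <= n_layers")
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    prefix = list(range(0, start))       # layers before the looped block
    block = list(range(start, end))      # the inner block to repeat
    suffix = list(range(end, n_layers))  # layers after the looped block
    return prefix + block * repeats + suffix


def run_layers(layers: Sequence[Layer], order: Sequence[int], x: float) -> float:
    """Apply the layers in the given order, reusing the same layer object on every repeat."""
    for i in order:
        x = layers[i](x)
    return x


# Hypothetical 4-layer "model" used only to show the mechanics.
toy_layers: List[Layer] = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x + 10,
]

order = looped_layer_order(n_layers=4, start=1, end=3, repeats=2)
print(order)                            # [0, 1, 2, 1, 2, 3]
print(run_layers(toy_layers, order, 1.0))
```

A learned controller, as suggested above, would amount to choosing `repeats` (or `start` and `end`) per input instead of fixing them by hand. How to train such a controller is left open here.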