Comment by SyzygyRhythm

1 month ago

If running twice is good, then is running N times even better? I wonder if you could even loop until some kind of convergence, say hitting a fixed point (input equals output). I wonder if there's even a sort of bifurcation property where it sometimes loops A->A->A, but other times A->B->A, or more, rather like the logistic map fractal.
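The bifurcation idea is easy to see in the logistic map itself: below the first period-doubling the iteration settles to a fixed point (A->A->A...), while past it you get a 2-cycle (A->B->A->B...). A minimal sketch; the two r values are just illustrative picks on either side of the first doubling at r = 3:

```python
def logistic(r, x):
    """One step of the logistic map: x -> r * x * (1 - x)."""
    return r * x * (1 - x)

def iterate(r, x0=0.5, n=1000):
    """Run n steps from x0 and return the final value."""
    x = x0
    for _ in range(n):
        x = logistic(r, x)
    return x

# r = 2.8: the iteration converges to a fixed point (A -> A -> A)
x = iterate(2.8)
assert abs(logistic(2.8, x) - x) < 1e-9

# r = 3.2: it settles into a 2-cycle instead (A -> B -> A -> B)
x = iterate(3.2)
assert abs(logistic(3.2, logistic(3.2, x)) - x) < 1e-9  # period 2
assert abs(logistic(3.2, x) - x) > 0.1                  # not a fixed point
```

Whether a looped model behaves more like the first case or the second would be the interesting experiment.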

I explored that, again with Devstral, but executing the same circuit 4 times led to a lower score on the tests.

I chatted with the model to see if it was still working, and it seemed coherent to me; I didn't notice anything off.

I need to automate testing like that: pick the local maximum, then iterate over it, picking layers to see if it's actually better, and leave the thing running overnight.

  • Can Karpathy's autoresearch be used on this to explore what works and what does not? That is supposed to automate research like this from what I understand.
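The overnight search described a couple of comments up can be sketched as a greedy hill-climb. Here `score` and `candidates` are stand-ins for whatever eval harness and layer-mutation scheme you actually plug in; the toy demo at the bottom is just to show the loop converging:

```python
def greedy_search(score, base_cfg, candidates, rounds=10):
    """Keep the best config seen so far (the local maximum) and
    repeatedly try mutations of it, accepting any improvement.
    `score` and `candidates` are assumed user-supplied callables."""
    best_cfg, best = base_cfg, score(base_cfg)
    for _ in range(rounds):
        for cfg in candidates(best_cfg):
            s = score(cfg)
            if s > best:
                best_cfg, best = cfg, s
    return best_cfg, best

# Toy stand-ins: config is a single int, score peaks at 3,
# candidates are the two neighbors.
def toy_score(cfg):
    return -(cfg - 3) ** 2

def toy_candidates(cfg):
    return [cfg - 1, cfg + 1]

cfg, s = greedy_search(toy_score, 0, toy_candidates)
assert (cfg, s) == (3, 0)
```

For real use, `cfg` would describe which layers get looped or duplicated, and `score` would be the benchmark run; this only finds local maxima, which is exactly the caveat the overnight iteration is meant to probe.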

That's how deep equilibrium models were discovered.

What's more, it was found that a single looped (weight-tied) layer can be equivalent to a multi-layer network.
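A toy illustration of the weight-tied idea: iterating one layer drives it to its own fixed point z* = f(z*, x), which is exactly the quantity a deep equilibrium model solves for directly. Scalar math with made-up weights chosen to keep the map contractive; purely a sketch:

```python
import math

def layer(z, x, w=0.5, u=1.0, b=0.0):
    """One weight-tied layer: z_{k+1} = tanh(w*z_k + u*x + b).
    With |w| < 1 the map is contractive, so iteration converges."""
    return math.tanh(w * z + u * x + b)

def deep_stack(x, depth):
    """Apply the SAME layer `depth` times (weight tying)."""
    z = 0.0
    for _ in range(depth):
        z = layer(z, x)
    return z

# Shallow stacks are still moving...
z1 = deep_stack(0.7, 1)
assert abs(layer(z1, 0.7) - z1) > 1e-3

# ...but stacking the one layer deep enough reaches the fixed
# point z* = layer(z*, x): the "infinite-depth" DEQ output.
z50 = deep_stack(0.7, 50)
assert abs(layer(z50, 0.7) - z50) < 1e-12
```

A DEQ skips the explicit loop and solves for z* with a root-finder, but the looped single layer computes the same limit.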

Phi-4-14b with layers duplicated (phi-4-25b) has increased performance. Phi-4-49b has degraded performance vs the 14b.
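For reference, self-merges like phi-4-25b are typically built by repeating a slice of the layer stack (e.g. with mergekit's passthrough method). A minimal sketch of the duplication itself; the slice indices below are made up for illustration, not the actual phi-4-25b recipe:

```python
def duplicate_layers(layers, start, end):
    """Passthrough-style self-merge sketch: repeat layers[start:end]
    once, keeping the original ordering around it. Real merges
    (e.g. via mergekit) pick the overlap ranges more carefully."""
    return layers[:end] + layers[start:end] + layers[end:]

# 40 layers with a middle slice of 30 repeated -> 70 layers,
# roughly how a 14b stack grows toward a ~25b one.
layers = list(range(40))
grown = duplicate_layers(layers, 5, 35)
assert len(grown) == 70
assert grown[:35] == list(range(35))       # prefix untouched
assert grown[35:65] == list(range(5, 35))  # the repeated slice
```

The 49b degrading while 25b improves suggests there's a sweet spot in how much of the stack you can repeat before the representations drift too far, which fits the earlier observation that 4 loops scored worse than 2.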