Comment by Lerc

1 month ago

That weird part is kind of what I was expecting.

This goes to the thing that I posted on the thread a couple of days ago. https://news.ycombinator.com/item?id=47327132

What you need is a mechanism to pick the right looping pattern; then it really does seem to be Mixture of Experts at a different level.

Break the model into an input path, a thinking phase, and an output path, and make the thinking phase a single looping layer of many experts. Then the router gets to decide 13, 13, 14, 14, 15, 15, 16.

Training the router is left as an exercise for the reader.
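A toy sketch of the idea above, assuming a hypothetical linear router that maps each token's features to an iteration count in a fixed range (all names and the sigmoid-based routing rule are illustrative, not from the thread):

```python
import numpy as np

def think(x, n_iters, layer):
    """Run the single shared 'thinking' layer n_iters times."""
    for _ in range(n_iters):
        x = layer(x)
    return x

def forward(tokens, router_w, layer, min_iters=13, max_iters=16):
    """Hypothetical router: squash a linear score of each token's
    features into [min_iters, max_iters], then loop the shared
    layer that many times for that token."""
    outs, counts = [], []
    for t in tokens:
        score = 1.0 / (1.0 + np.exp(-float(t @ router_w)))  # sigmoid in (0, 1)
        n = min_iters + int(round(score * (max_iters - min_iters)))
        counts.append(n)
        outs.append(think(t, n, layer))
    return np.stack(outs), counts
```

With a toy contraction layer like `lambda x: 0.9 * x`, a strongly negative router score gives the minimum count (13) and a strongly positive one the maximum (16), so different tokens get different effective depths from the same weights.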

If you're adding a model to do the "routing", you're basically adding learned backward connections, and you end up with an RNN.

  • Mixture of Experts models already have routers.

    I'm just suggesting eliminating (or weakening) the distinction between layers and experts: have just the one, then iterate it until its 'good enough' score plus (iteration_count * spontaneity) exceeds some threshold.
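A minimal sketch of that stopping rule, with a made-up layer and scoring function purely for illustration (the real score and spontaneity term would be learned):

```python
def iterate_until_done(x, layer, score_fn, threshold=0.95, spontaneity=0.01, max_iters=100):
    """Iterate the single shared layer until its 'good enough' score
    plus iteration_count * spontaneity crosses the threshold."""
    for i in range(1, max_iters + 1):
        x = layer(x)
        if score_fn(x) + i * spontaneity > threshold:
            return x, i
    return x, max_iters

# toy example: the layer halves the state toward zero,
# and the score rewards being close to zero
x, iters = iterate_until_done(1.0, lambda v: 0.5 * v, lambda v: 1.0 - abs(v))
```

The `spontaneity` term makes the bar easier to clear on each pass, so the loop always terminates: hard inputs get more iterations, but none get unbounded compute.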