
Comment by simgt

1 month ago

If you're adding a model to do the "routing" you're basically putting in learned backward connections, and you end up with an RNN.

Mixture of Experts models already have routing models.

I'm just suggesting we eliminate (or weaken) the distinction between layers and experts and have just the one kind of block, then iterate that one until its 'good enough' score plus (iterationcount*spontaneity) is greater than some threshold.
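
To make the halting rule concrete, here is a rough sketch of what I mean, not a real architecture. Everything in it is made up for illustration: `make_block` is a toy stand-in for the single shared block, `good_enough_score` is a hypothetical score (I haven't said how it would actually be computed or learned), and the `spontaneity` and `threshold` values are arbitrary.

```python
import math
import random


def make_block(dim, seed=0):
    """Toy stand-in for the single shared block: fixed random linear map + tanh."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def block(state):
        return [math.tanh(sum(w * x for w, x in zip(row, state))) for row in weights]

    return block


def good_enough_score(prev_state, state):
    """Hypothetical score in (0, 1]: higher when the state has stopped changing."""
    delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev_state, state)))
    return 1.0 / (1.0 + delta)


def iterate_block(block, state, spontaneity=0.05, threshold=0.9, max_iters=100):
    """Apply the same block repeatedly until
    score + iteration_count * spontaneity > threshold (or max_iters is hit)."""
    iteration_count = 0
    for iteration_count in range(1, max_iters + 1):
        new_state = block(state)
        score = good_enough_score(state, new_state)
        state = new_state
        # The spontaneity term grows with each pass, so the loop is pushed
        # toward stopping even if the score never gets high on its own.
        if score + iteration_count * spontaneity > threshold:
            break
    return state, iteration_count


if __name__ == "__main__":
    block = make_block(dim=8)
    start = [1.0] * 8
    final_state, steps = iterate_block(block, start)
    print("stopped after", steps, "iterations")
```

With a positive spontaneity the number of iterations is bounded by roughly threshold / spontaneity, so the effective depth varies per input but can't run away. With spontaneity at zero it only stops when the score alone clears the threshold.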