Comment by simgt
1 month ago
If you're adding a model to do the "routing", you're basically putting in learned backward connections, and you end up with an RNN.
1 month ago
Mixture of Experts models already have routing models.
I'm just suggesting we eliminate (or weaken) the distinction between layers and experts and have just the one block, then iterate that one block until its 'good enough' score plus (iteration count * spontaneity) is greater than some threshold.
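To make the stopping rule concrete, here's a rough Python sketch. Everything in it is made up for illustration: the names `iterate_block`, `halve` and `closeness`, the toy block and score, and the `max_iters` safety cap are my assumptions, not part of any real architecture.

```python
def iterate_block(state, block, score, spontaneity=0.1, threshold=0.9, max_iters=100):
    """Apply one shared block repeatedly until the halting rule fires.

    Halting rule: score(state) + iterations * spontaneity > threshold.
    max_iters is an assumed safety cap, not part of the original idea.
    """
    iterations = 0
    while iterations < max_iters:
        if score(state) + iterations * spontaneity > threshold:
            break
        state = block(state)
        iterations += 1
    return state, iterations


# Toy stand-ins (hypothetical): the "block" halves the state and the
# 'good enough' score rises as the state approaches zero.
def halve(x):
    return 0.5 * x


def closeness(x):
    return 1.0 - abs(x)


patient = iterate_block(1.0, halve, closeness, spontaneity=0.0)
impatient = iterate_block(1.0, halve, closeness, spontaneity=0.1)
print(patient, impatient)
```

With spontaneity at zero the loop runs until the score alone clears the threshold. With a positive spontaneity, the iteration term keeps growing, so the loop stops earlier and is guaranteed to stop eventually even if the score never improves.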