Comment by liuliu
10 months ago
The router is manually designed (see their cem function). Also, the experts are not separate weights, just different scalings of the same weight's singular values.
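To make the second point concrete, here is a minimal sketch, not their actual code. It assumes a weight factored as W = U diag(sigma) V^T, where each "expert" is only a vector z that rescales sigma while U and V stay shared. The 2x2 rotation matrices, the values in SIGMA, and the names EXPERT_A and EXPERT_B are all made up for illustration.

```python
import math

def rotation(theta):
    """2x2 rotation matrix (orthogonal), a stand-in for U or V."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def compose(u, sigma, v):
    """Rebuild W = U diag(sigma) V^T from its factors."""
    n = len(sigma)
    diag = [[sigma[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return matmul(matmul(u, diag), transpose(v))

def apply_expert(u, sigma, v, z):
    """An 'expert' here is only a vector z that rescales the singular values."""
    scaled = [s * zi for s, zi in zip(sigma, z)]
    return compose(u, scaled, v)

# Hypothetical factors of one shared weight matrix.
U = rotation(0.3)
V = rotation(-1.1)
SIGMA = [2.0, 0.5]

# Two hypothetical experts: same U and V, different scale vectors.
EXPERT_A = [1.5, 1.0]  # amplifies the first singular direction
EXPERT_B = [1.0, 0.0]  # switches the second singular direction off

W_base = compose(U, SIGMA, V)
W_a = apply_expert(U, SIGMA, V, EXPERT_A)
W_b = apply_expert(U, SIGMA, V, EXPERT_B)
```

In this sketch each expert costs one number per singular value rather than a full copy of the matrix, which is the difference from a conventional mixture of experts.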
10 months ago
Thank you, I was missing that second part.