Comment by liuliu
3 days ago
The router is manually designed (see their cem function). Also, the experts are not separate weights, just different scalings of the same weight matrix's singular values.
Thank you, I was missing that second part.
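For anyone else who missed that second part, here is a rough NumPy sketch of what "experts as scalings of singular values" could look like. It is not the project's actual code: the function name `scaled_expert`, the scale vectors, and the fixed mixing weights are all made up for illustration. The only idea taken from the comment above is that one weight matrix is decomposed once and each expert only changes how its singular values are scaled.

```python
import numpy as np

def scaled_expert(U, s, Vt, z):
    """Rebuild a weight matrix after rescaling its singular values by z.

    Hypothetical helper: U, s, Vt come from one shared SVD, and z is the
    only thing that differs between experts.
    """
    return U @ np.diag(s * z) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))                    # one shared weight matrix
U, s, Vt = np.linalg.svd(W, full_matrices=False)   # decomposed once

# Illustrative per-expert scale vectors (assumed values, not from the source).
z_experts = {
    "identity": np.ones(3),
    "expert_a": np.array([1.5, 1.0, 0.5]),
}
experts = {name: scaled_expert(U, s, Vt, z) for name, z in z_experts.items()}

# A hand-picked (not learned) mix of scale vectors, standing in for a
# manually designed router. The 0.7 / 0.3 weights are arbitrary.
z_mix = 0.7 * z_experts["identity"] + 0.3 * z_experts["expert_a"]
W_mix = scaled_expert(U, s, Vt, z_mix)
```

Each expert here costs one vector of length 3 rather than a full 4x3 matrix, which is the sense in which the experts are "not separate weights".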