Comment by jxmorris12
16 hours ago
There’s nothing to read.
Model A: A_1, …, A_n
Model B: B_1, …, B_n
C_i = A_i * p + B_i * (1 - p)
In other words, it’s just a linear combination of the two models’ weights, taken position by position.
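To make that concrete, here is a minimal sketch of the formula in Python. It assumes each model's weights are stored as a dict from a parameter name to a flat list of floats; the `merge_weights` helper and the parameter names are hypothetical, not from any particular library. Because the formula pairs A_i with B_i, it only makes sense when both models have exactly the same parameter names and sizes, so the sketch checks that first.

```python
def merge_weights(model_a, model_b, p):
    """Hypothetical helper: C_i = A_i * p + B_i * (1 - p) for every position i.

    Assumes model_a and model_b map parameter names to flat lists of floats.
    Both models must share the same parameter names and lengths
    (i.e. the same architecture), otherwise there is no A_i to pair with B_i.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models have different parameter names")
    merged = {}
    for name, a in model_a.items():
        b = model_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for {name}: {len(a)} vs {len(b)}")
        # Element-wise interpolation, one position at a time.
        merged[name] = [a_i * p + b_i * (1 - p) for a_i, b_i in zip(a, b)]
    return merged


model_a = {"layer1.weight": [1.0, 2.0], "layer1.bias": [0.0]}
model_b = {"layer1.weight": [3.0, 4.0], "layer1.bias": [1.0]}
print(merge_weights(model_a, model_b, 0.5))
# {'layer1.weight': [2.0, 3.0], 'layer1.bias': [0.5]}
```

With p = 1 you get model A back, with p = 0 you get model B, and values in between blend the two.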
It's been a while since I looked at neural networks in detail. Do all the large models have architectures close enough for this to make sense? Do they have the same number of layers and the same width? I had thought that each model had its own "secret sauce" of normal and special layers (convolution, max-pooling, something-something) stacked together. Genuinely curious.