Comment by jxmorris12

15 hours ago

There’s nothing to read.

Model A: A_1, …, A_n
Model B: B_1, …, B_n

C_i = A_i * p + B_i * (1 - p)

In other words, it’s just a linear combination of the two models’ weights, taken position by position.
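Here is a minimal sketch of that formula in plain Python. It assumes each model is stored as a dict mapping a parameter name to a flat list of floats (the names like "layer1.weight" are hypothetical). Real checkpoints would hold tensors, but the per-position arithmetic is the same. The key and length checks make explicit that the formula only applies when both models have matching parameters.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def merge_weights(a: Weights, b: Weights, p: float) -> Weights:
    """Return C where C_i = A_i * p + B_i * (1 - p) for every position i."""
    # Both models must have exactly the same parameter names.
    if a.keys() != b.keys():
        raise ValueError("models have different parameter names")
    merged: Weights = {}
    for name, a_vals in a.items():
        b_vals = b[name]
        # Position-wise merging needs the same number of values per parameter.
        if len(a_vals) != len(b_vals):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [x * p + y * (1 - p) for x, y in zip(a_vals, b_vals)]
    return merged


model_a = {"layer1.weight": [1.0, 2.0], "layer1.bias": [0.0]}
model_b = {"layer1.weight": [3.0, 4.0], "layer1.bias": [1.0]}
print(merge_weights(model_a, model_b, 0.5))
# {'layer1.weight': [2.0, 3.0], 'layer1.bias': [0.5]}
```

With p = 1 you get Model A back unchanged, and with p = 0 you get Model B.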


jxmorris12

It's been a while since I looked at neural networks in detail. Do all the large models have architectures close enough that this makes sense? Do they have the same number of layers and the same width? I had thought that each model had its own "secret sauce" of normal and special layers (convolution, max-pooling, something-something) stacked together. Genuinely curious.