Comment by amdivia

5 hours ago

I think they'll be much worse on their own.

Predicting "America" in "The United States of ..." is a different task from predicting the whole sentence.

So the small model is laying the blocks, and the bigger model is cementing them in place or kicking them down. The bigger model's course correction is what keeps the smaller model's predictions relatively on track.
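
To make the blocks analogy concrete, here's a rough toy sketch of one way a draft-then-verify loop can work. Everything in it is made up for illustration: the two "models" are just lookup tables over fixed sentences, and the names `make_lookup`, `speculative_step`, and `generate` are hypothetical. Real systems score all drafted tokens in one pass of the big model and usually accept or reject probabilistically; this greedy version only shows the accept-or-replace idea.

```python
from typing import Callable, List

# A "model" here is any function mapping a context to one next token (greedy).
Model = Callable[[List[str]], str]

# Hypothetical toy data: the draft model is right early on, then drifts.
TARGET_TEXT = "the united states of america is a country".split()
DRAFT_TEXT = "the united states of america was a nation".split()


def make_lookup(text: List[str]) -> Model:
    """Toy stand-in for a language model: returns the token at the next position."""
    def model(context: List[str]) -> str:
        i = len(context)
        return text[i] if i < len(text) else "<eos>"
    return model


def speculative_step(draft: Model, target: Model,
                     context: List[str], k: int = 4) -> List[str]:
    """One round: the draft proposes k tokens, the target keeps or replaces them."""
    # The small model lays k blocks, each built on its own earlier guesses.
    proposed: List[str] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # The big model checks the blocks in order.
    accepted: List[str] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok == expected:
            accepted.append(tok)       # cemented in place
        else:
            accepted.append(expected)  # kicked down, replaced by the target's token
            break                      # everything after the mismatch is discarded
    return accepted


def generate(draft: Model, target: Model, prompt: List[str],
             max_len: int, k: int = 4) -> List[str]:
    """Repeat draft-then-verify rounds until max_len tokens exist."""
    out = list(prompt)
    while len(out) < max_len:
        out.extend(speculative_step(draft, target, out, k))
    return out[:max_len]


def draft_alone(draft: Model, prompt: List[str], max_len: int) -> List[str]:
    """What the small model produces with nobody correcting it."""
    out = list(prompt)
    while len(out) < max_len:
        out.append(draft(out))
    return out


draft_model = make_lookup(DRAFT_TEXT)
target_model = make_lookup(TARGET_TEXT)
corrected = generate(draft_model, target_model, ["the"], max_len=8)
uncorrected = draft_alone(draft_model, ["the"], max_len=8)
```

In this toy, `corrected` ends up matching the big model's sentence while `uncorrected` keeps the small model's mistakes, which is the point: the small model only looks good because each round starts from a prefix the big model already approved.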