Comment by Buttons840

14 days ago

If I have 5000 documents about A, and 5000 documents about B, do we know whether it's better to train one large model on all 10,000 documents, or to train 2 different specialist models and then combine them as you describe?

Well, you don't know. But gradient descent, if properly managed, will split them up for you. That said, you might get more mileage out of something like 200 specialist models.

It probably depends on how much A and B overlap. If it's, say, English sci-fi and Chinese poetry, two different models may be better.
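
For what it's worth, one crude way to put a number on "how much A and B overlap" is to compare the vocabularies of the two document sets. The sketch below is only an illustration, not something anyone in this thread proposed: the helper names and the toy corpora are made up, and vocabulary overlap is a rough proxy that says nothing definitive about which training setup would actually win.

```python
import re

def vocabulary(docs):
    """Return the set of lowercase word tokens across all documents."""
    words = set()
    for doc in docs:
        # \w+ matches runs of Unicode word characters, so an unspaced
        # Chinese line comes out as a single token here.
        words.update(re.findall(r"\w+", doc.lower()))
    return words

def jaccard_overlap(docs_a, docs_b):
    """Jaccard similarity of the two vocabularies: |A & B| / |A | B|."""
    vocab_a, vocab_b = vocabulary(docs_a), vocabulary(docs_b)
    union = vocab_a | vocab_b
    if not union:
        return 0.0  # two empty corpora: define overlap as zero
    return len(vocab_a & vocab_b) / len(union)

# Hypothetical toy corpora standing in for the "A" and "B" document sets.
scifi = ["The ship left orbit at dawn.", "The crew repaired the ship."]
space_news = ["The crew left the station at dawn."]
poetry = ["床前明月光", "疑是地上霜"]

print(jaccard_overlap(scifi, space_news))  # shared vocabulary, nonzero score
print(jaccard_overlap(scifi, poetry))      # no shared tokens at all
```

A score near zero (like the sci-fi vs. poetry pair) would be weak evidence for the "two specialists" side; a high score suggests the two sets might share enough structure for one model to benefit from both. Real corpora would need proper tokenization and probably a better similarity measure than raw vocabulary.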