Comment by Buttons840

3 months ago

If I have 5,000 documents about A and 5,000 documents about B, do we know whether it's better to train one large model on all 10,000 documents, or to train two separate specialist models and then combine them as you describe?

vessenes 3 months ago

Well, you don't. But the power of gradient descent, if properly managed, will split them up for you. That said, you might get more mileage out of something like 200 specialist models.

MoonGhost 3 months ago

It probably depends on how much A and B overlap. If it's, say, English sci-fi and Chinese poetry, two different models may be better.