Comment by andai 21 hours ago What's the downside? Don't they stop when they hit diminishing returns?
hgoel 2 hours ago Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?
Ifkaluva 18 hours ago You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.