Comment by Ifkaluva
20 hours ago
Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model.
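For scale, here is a rough sketch of the ratio behind "a lot." It assumes "8B" means 8e9 parameters and "38T" means 38e12 training tokens. That works out to about 4,750 tokens per parameter. The often-cited Chinchilla compute-optimal heuristic is roughly 20 tokens per parameter.

```python
# Rough tokens-per-parameter arithmetic for the figures quoted above.
# Assumption: "8B" = 8e9 parameters, "38T" = 38e12 training tokens.
params = 8e9
tokens = 38e12

tokens_per_param = tokens / params
print(f"{tokens_per_param:,.0f} tokens per parameter")  # 4,750 tokens per parameter
```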
What's the downside? Don't they stop when they hit diminishing returns?
Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?
You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.