
Comment by Ifkaluva

20 hours ago

Liquid does amazing work, but I kinda feel like they are overtraining their models. 38T tokens seems like a lot for an 8B model.
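For a rough sense of scale, here is a back-of-the-envelope sketch of the tokens-per-parameter ratio those two figures imply. The 38T and 8B numbers come from the comment; the reference ratio of about 20 tokens per parameter is an assumption here, taken from the commonly cited compute-optimal rule of thumb, and is not something Liquid states. The function and constant names are made up for illustration.

```python
def tokens_per_parameter(train_tokens: int, params: int) -> float:
    """Return how many training tokens were seen per model parameter."""
    return train_tokens / params


# Figures quoted in the comment above.
TRAIN_TOKENS = 38 * 10**12  # 38T tokens
PARAMS = 8 * 10**9          # 8B parameters

# Assumption: ~20 tokens per parameter, a commonly cited compute-optimal
# rule of thumb. It is a heuristic, not a hard limit on useful training.
REFERENCE_RATIO = 20

ratio = tokens_per_parameter(TRAIN_TOKENS, PARAMS)
multiple = ratio / REFERENCE_RATIO

print(f"tokens per parameter: {ratio:,.0f}")
print(f"multiple of the reference ratio: {multiple:.1f}x")
```

This only quantifies "a lot". It says nothing about whether the extra tokens help or hurt, since training far past the compute-optimal point can be a deliberate trade of training compute for a smaller model at inference time.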

3 comments


andai 20 hours ago

What's the downside? Don't they stop when they hit diminishing returns?

hgoel 1 hour ago

Wouldn't the model start overfitting at some point, trading generalization for accuracy on the training set?

Ifkaluva 17 hours ago

You’d think so, but I haven’t seen it explicitly discussed in their papers, and nobody else that I know of trains on that many tokens.