
Comment by sota_pop

2 days ago

I've always appreciated Jeremy Howard's work, and I had a lot of fun using the Fast.ai framework. My experience matches your description: when training for 2, 3, or more epochs, I felt that overfitting started to emerge. (And I was CERTAINLY not training models anywhere near the size of modern LLMs.) I suppose that by "saturation" you mean training "marginally before exhibiting overfitting" - something akin to "the elbow method" in clustering algorithms? I'll have to chew on your description of overfitting results for a while. It jibes with mine, but in a way that really makes me question my own - thanks for the thought-provoking response!
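To make the analogy concrete, here is a minimal sketch of the "stop just before overfitting" idea the comment gestures at, using made-up per-epoch validation losses (the numbers are purely illustrative, not from any real run): you pick the epoch at the bottom of the validation-loss curve, before it turns back up.

```python
# Hypothetical per-epoch validation losses (illustrative numbers only).
# Losses fall for a few epochs, then rise as overfitting sets in.
val_losses = [0.95, 0.60, 0.48, 0.45, 0.47, 0.53]

# "Elbow"-style heuristic: take the epoch at the minimum of the
# validation-loss curve, i.e. just before it starts climbing again.
best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i])

print(best_epoch)  # → 3
```

This is the same intuition as early stopping: the training loss keeps falling, but the validation curve's turning point marks where further epochs start to hurt generalization.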