Comment by astrange
4 years ago
Even if it were hard to train, you could make your own by fine-tuning a larger model for much less.
Those are called "base models" (or "foundation models" if you're Stanford trying to co-opt the term).
Suppose one has an idea for a different architecture, functional form, etc. Assuming the receiving model is substantially smaller, so that the dominant computational cost is running the SD model, how long would effective knowledge distillation take on, say, a CPU?
That's called teacher-student learning (knowledge distillation). It could still easily take weeks on a single machine, but renting more GPU time or getting free credits from somewhere is perfectly plausible.
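For the curious, here's a minimal sketch of the core of teacher-student distillation: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence loss (the formulation from Hinton et al.). This is illustrative pure-NumPy code, not tied to any SD pipeline; the function names are my own.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T produces softer targets
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 so gradients stay comparable across temperatures
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Toy check: the loss is zero when the student matches the teacher,
# and positive when the distributions differ
t = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(t, t))        # → 0.0
print(distillation_loss(t, t * 0.1))  # positive
```

In practice you'd minimize this loss (often mixed with a hard-label term) with gradient descent over the student's parameters; the teacher only runs forward passes, which is why its size dominates the compute cost.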