Comment by kouteiheika
5 days ago
In general this is of course an active area of research, but yes, you can do something like that, and people have done it successfully[1] by adding extra layers to an existing model and then continuing to train it.
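To make the idea concrete, here is a minimal sketch of the depth up-scaling trick from [1]: duplicate the stack of layers and drop a few from the seam where the two copies meet. The `depth_upscale` helper and the list-of-arrays model representation are hypothetical simplifications, not the paper's actual code.

```python
import numpy as np

def depth_upscale(layers, overlap):
    """Grow a model's depth by concatenating two copies of its layer
    stack, dropping `overlap` layers at the seam (a rough sketch of
    depth up-scaling; real models need continued training afterwards)."""
    top = layers[: len(layers) - overlap]        # original minus its last `overlap` layers
    bottom = [w.copy() for w in layers[overlap:]]  # copy minus its first `overlap` layers
    return top + bottom

# toy example: 8 "layers" (each just a 2x2 weight array), overlap of 2
layers = [np.full((2, 2), i, dtype=float) for i in range(8)]
grown = depth_upscale(layers, overlap=2)
print(len(grown))  # 12 layers: (8 - 2) + (8 - 2)
```

The paper's 32-layer model grown this way with an overlap of 8 gives the 48-layer configuration they describe; the sketch above is the same arithmetic on toy arrays.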
You have to be careful about the "same data" part, though: ideally you want to train only once on unique data[2], since excessive duplication can harm the model's performance[3]. That said, if you have limited data, a couple of training epochs may be safe and can even improve performance[4].
[1] -- https://arxiv.org/abs/2312.15166
[2] -- https://arxiv.org/abs/1906.06669
In addition to increasing the number of layers, you can also grow the weight matrices and initialize by tiling them with the smaller model's weights https://neurips.cc/media/neurips-2023/Slides/83968_5GxuY2z.p...
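A minimal sketch of that tiling initialization, assuming the growth step is just "fill the larger matrix with repeated copies of the smaller one" (the `grow_matrix` helper is hypothetical; the actual method in the slides may rescale the tiles to preserve the smaller model's function):

```python
import numpy as np

def grow_matrix(w, new_shape):
    """Initialize a larger weight matrix by tiling copies of the
    smaller model's matrix, then cropping to the target shape
    (a hypothetical sketch of tiling-based width growth)."""
    reps = (-(-new_shape[0] // w.shape[0]),   # ceil division per axis
            -(-new_shape[1] // w.shape[1]))
    return np.tile(w, reps)[: new_shape[0], : new_shape[1]]

# grow a 2x3 weight matrix into a 4x6 one
w_small = np.arange(6, dtype=float).reshape(2, 3)
w_big = grow_matrix(w_small, (4, 6))
print(w_big.shape)  # (4, 6)
```

Note that naive tiling changes the magnitude of the layer's outputs, so in practice you would likely also scale the tiles (or the following layer) before continuing training.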
Thank you for taking the time to provide me with all this reading.
This might be obvious, but just to state it explicitly for everyone: you can freeze the weights of the existing layers if you want to train only the new layers while leaving the existing ones untouched.
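As a minimal sketch of what freezing means, here is a toy SGD step that simply skips parameters named in a frozen set (in a real framework you would instead mark the existing layers' tensors as non-trainable, e.g. `requires_grad=False` in PyTorch; the parameter names and the `sgd_step` helper below are made up for illustration):

```python
import numpy as np

def sgd_step(params, grads, frozen, lr=0.1):
    """One gradient-descent update that leaves frozen parameters
    untouched (a toy sketch of freezing, not a framework API)."""
    return [p if name in frozen else p - lr * g
            for (name, p), g in zip(params, grads)]

# hypothetical two-parameter model: one old (frozen) layer, one new layer
params = [("old_layer", np.ones(2)), ("new_layer", np.ones(2))]
grads = [np.full(2, 0.5), np.full(2, 0.5)]
updated = sgd_step(params, grads, frozen={"old_layer"})
print(updated[0])  # old_layer unchanged: [1. 1.]
print(updated[1])  # new_layer updated:   [0.95 0.95]
```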