Comment by MrLeap

5 days ago

> that's a skill issue and not a fundamental property

This made me laugh.

You seem like you may know something I've been curious about.

I'm a shader author these days, haven't been a data scientist for a while, so it's going to distort my vocab.

Say you've got a trained neural network living in a 512x512 structured buffer. It's doing great, but you get a new video card with more memory, so you can afford to migrate it to a 1024x1024 one. Is the state-of-the-art approach to retrain with the same data but a bigger initial parameter set, or are there other methods that smear the old weights over a larger space to get a leg up? Does anything like this accelerate training time?

... can you upsample a language model like you can low-res anime profile pictures? I wonder what the made-up words would be like.

In general this is of course an active area of research, but yes, you can do something like that, and people have done it successfully[1] by adding extra layers to an existing model and then continuing to train it.
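
As a rough illustration (not the exact recipe from [1], which works on a pretrained transformer, but in the same depth up-scaling spirit), here's a minimal PyTorch sketch of "smearing" trained weights over a deeper model by reusing existing blocks instead of random-initializing the new ones. The dimensions, layer counts, and which blocks get duplicated are all made up:

```python
import copy
import torch
import torch.nn as nn

# Toy stand-in for a trained model: a stack of transformer blocks.
# (dim / depth / head count are made up for illustration.)
dim, small_depth = 512, 8
small = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
    for _ in range(small_depth)
])
# ... imagine `small` has already been trained at this point ...

# Depth up-scaling: build a deeper stack by copying trained blocks
# (here the middle ones get reused a second time) rather than
# random-initializing the extra layers, then continue training.
layer_plan = list(range(small_depth)) + list(range(2, 6))  # 8 + 4 = 12 blocks
big = nn.ModuleList([copy.deepcopy(small[i]) for i in layer_plan])

x = torch.randn(1, 16, dim)   # (batch, seq, dim) dummy input
for block in big:
    x = block(x)
print(x.shape)                # torch.Size([1, 16, 512])
```

The deeper model starts from weights that already encode something useful, so continued training has a head start compared to training the 12-block stack from scratch.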

You have to be careful about the "same data" part though; ideally you want to train once on unique data[2], as excessive duplication can harm the performance of the model[3]. That said, if you have limited data, a couple of training epochs might be safe and may even improve performance[4].
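
On the "same data" caveat: the crudest version of training on unique data is just dropping exact duplicates before you continue training. Here's a minimal sketch of that idea with a made-up `docs` list; real pipelines do fuzzier near-duplicate detection, but the principle is the same:

```python
import hashlib

def drop_exact_duplicates(docs):
    """Keep only the first occurrence of each document (exact-match dedup)."""
    seen, unique = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

docs = ["the cat sat", "the cat sat", "a different doc"]
print(drop_exact_duplicates(docs))   # ['the cat sat', 'a different doc']
```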

[1] -- https://arxiv.org/abs/2312.15166

[2] -- https://arxiv.org/abs/1906.06669

[3] -- https://arxiv.org/abs/2205.10487

[4] -- https://galactica.org/static/paper.pdf