Comment by Nevermark

8 months ago

> It’s not like these models were released with all of the weights perfectly accounted for and changing them in any way ruins them.

So more imperfect is better?

Of course the model’s parameters, a vector of many billions of elements, leave room for improvement along some path. But what circuitous path would that be, one that training didn’t already find?

By definition, you can’t find it if you don’t include all the original data along with the tuning data. Without it, you have radically changed the optimization surface, with no contribution from the previous data at all.
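A toy sketch of that point, under entirely hypothetical assumptions: a two-weight linear model stands in for the billions of parameters, `task_a` plays the original training data and `task_b` the tuning data, and `train` and `mse` are illustrative helpers doing plain gradient descent on mean squared error, not any real fine-tuning API. Both tasks are consistent with w = (1, 2), but tuning on B alone never sees A’s part of the loss surface.

```python
# Hypothetical toy setup: a 2-parameter linear model y = w1*x1 + w2*x2,
# fit by full-batch gradient descent on mean squared error.

def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]

def mse(w, data):
    # Mean squared error of weights w over a list of ((x1, x2), y) pairs.
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.1, steps=2000):
    # Plain gradient descent; only the given data shapes the loss surface.
    w = list(w)
    n = len(data)
    for _ in range(steps):
        g = [0.0, 0.0]
        for x, y in data:
            err = predict(w, x) - y
            g[0] += 2 * err * x[0] / n
            g[1] += 2 * err * x[1] / n
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w

# "Original" data (task A) and "tuning" data (task B). Both are consistent
# with w = (1, 2), but each alone underdetermines the two weights.
task_a = [((1.0, 1.0), 3.0)]
task_b = [((1.0, 0.0), 1.0)]

pretrained = train([0.0, 0.0], task_a)            # converges to about (1.5, 1.5)
tuned_b_only = train(pretrained, task_b)          # converges to about (1.0, 1.5)
tuned_mixed = train(pretrained, task_a + task_b)  # converges to about (1.0, 2.0)

for name, w in [("pretrained", pretrained), ("B only", tuned_b_only), ("A + B", tuned_mixed)]:
    print(f"{name:10s} w=({w[0]:.3f}, {w[1]:.3f})  "
          f"loss_A={mse(w, task_a):.4f}  loss_B={mse(w, task_b):.4f}")
```

With these settings, tuning on B alone leaves the loss on A near 0.25 (w2 is never updated, because B’s inputs all have x2 = 0), while tuning on A + B reaches w ≈ (1, 2) with both losses near zero. Real networks are nonlinear and far messier, so this only illustrates why the data defining the loss surface matters.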

The one use case that makes sense is sacrificing functionality to get better at a narrow problem.