Comment by solresol
6 days ago
I think of it as trying to encourage the LLM to give answers from a particular part of the phase space. You can do that by fine-tuning it to be more likely to return values from there, or by prompting it into that part of the phase space. Either works, but fiddling around with prompts doesn't require much MLOps tooling or compute power.
That said, fine-tuning small models when you have to power through vast amounts of data and a larger model would be cost-ineffective -- that's completely sensible, and not really mentioned in the article.
> That said, fine-tuning small models
This is mostly referred to as model distillation, but I'll give the author the benefit of the doubt that they didn't mean that.
My understanding of model distillation is quite different: it trains another (typically smaller) model using the error between the new model's output and that of the existing one -- effectively capturing the existing model's embedded knowledge and encoding it (ideally more densely) into the new one.
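For concreteness, here's a minimal sketch of that classical distillation loss, assuming PyTorch (the function and variable names are mine, not from any particular paper's code):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then minimise the
        # KL divergence so the student matches the teacher's full output
        # distribution rather than a single hard label.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # kl_div expects log-probs as input and probs as target; the T^2
        # factor keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(log_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2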
What I was referring to is similar in concept, but I've seen both described in papers as distillation. What I meant was that you take the output of a large model like GPT-4 and use it as training data to fine-tune a smaller model.
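As a rough sketch of that workflow, assuming the OpenAI Python SDK (the model names and prompts here are illustrative, not a recommendation):

    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompts = ["Summarise: ...", "Classify: ..."]  # your task inputs

    # Step 1: have the large "teacher" model answer your task inputs.
    with open("train.jsonl", "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            f.write(json.dumps({"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant",
                 "content": resp.choices[0].message.content},
            ]}) + "\n")

    # Step 2: fine-tune a smaller, cheaper model on those outputs.
    training_file = client.files.create(file=open("train.jsonl", "rb"),
                                        purpose="fine-tune")
    client.fine_tuning.jobs.create(training_file=training_file.id,
                                   model="gpt-4o-mini")

The point is that the large model only runs once, at data-generation time; inference afterwards is all on the cheaper model.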
> That said, fine-tuning small models when you have to power through vast amounts of data and a larger model would be cost-ineffective -- that's completely sensible, and not really mentioned in the article.
...which I thought was arguably the most popular use case for fine-tuning these days.