Comment by solresol
6 days ago
I think of it as trying to encourage the LLM to give answers from a particular part of the phase space. You can do that by fine-tuning it to be more likely to return values from there, or by prompting it into that part of the phase space. Either works, but fiddling around with prompts doesn't require much MLOps tooling or compute power.
That said, fine-tuning small models when you have to power through vast amounts of data and a larger model would be cost-ineffective -- that's completely sensible, and not really mentioned in the article.
> That said, fine-tuning small models
This is mostly referred to as model distillation, but I'll give the author the benefit of the doubt that they didn't mean that.
My understanding of model distillation is quite different: it trains another (typically smaller) model using the error between the new model's output and that of the existing one -- effectively capturing the existing model's embedded knowledge and encoding it (ideally more densely) into the new one.
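For concreteness, here's a minimal sketch of that classical distillation loss, assuming PyTorch (the function and variable names are mine, not from any particular paper's code):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a temperature, then minimise the
        # KL divergence so the student matches the teacher's full output
        # distribution rather than a single hard label.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # kl_div expects log-probs as input and probs as target; the T^2
        # factor keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(log_student, soft_teacher,
                        reduction="batchmean") * temperature ** 2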
What I was referring to is similar in concept, but I've seen both described in papers as distillation. What I meant was that you take the output of a large model like GPT-4 and use it as training data to fine-tune a smaller model.
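As a rough sketch of that workflow, assuming the OpenAI Python SDK (the model names and prompts here are illustrative, not a recommendation):

    import json
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    prompts = ["Summarise: ...", "Classify: ..."]  # your task inputs

    # Step 1: have the large "teacher" model answer your task inputs.
    with open("train.jsonl", "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            f.write(json.dumps({"messages": [
                {"role": "user", "content": prompt},
                {"role": "assistant",
                 "content": resp.choices[0].message.content},
            ]}) + "\n")

    # Step 2: fine-tune a smaller, cheaper model on those outputs.
    training_file = client.files.create(file=open("train.jsonl", "rb"),
                                        purpose="fine-tune")
    client.fine_tuning.jobs.create(training_file=training_file.id,
                                   model="gpt-4o-mini")

The point is that the large model only runs once, at data-generation time; inference afterwards is all on the cheaper model.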
> That said, fine-tuning small models when you have to power through vast amounts of data and a larger model would be cost-ineffective -- that's completely sensible, and not really mentioned in the article.
...which I thought was arguably the most popular use case for fine-tuning these days.