Comment by namuol

2 months ago

> sample solutions from the model with certain temperature and truncation configurations, then fine-tune on those samples with standard supervised fine-tuning

It’s all moonspeak to me. I tried reading other comments that explain this, and they all sounded different or contradictory. I studied ML as a hobby years ago, but that was before the LLM explosion. Guess I need to start over again?
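
One way to unpack the quoted sentence, as a rough sketch rather than the method's actual code: "temperature" rescales the model's scores before sampling (lower means less random), and "truncation" throws away unlikely tokens before drawing one. The quote does not say which truncation is used, so top-k here is an assumption, and the 4-token vocabulary, the fixed `logits`, and both function names are made up for illustration.

```python
import math
import random


def truncated_temperature_probs(logits, temperature=1.0, top_k=None):
    """Turn raw scores into sampling probabilities.

    Hypothetical helper: top-k is only one possible truncation scheme.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: divide scores, so values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    indices = range(len(scaled))
    # Truncation: keep only the top_k highest-scoring tokens (all if top_k is None).
    if top_k is None:
        keep = set(indices)
    else:
        keep = set(sorted(indices, key=lambda i: scaled[i], reverse=True)[:top_k])
    # Softmax over the kept tokens; dropped tokens get probability 0.
    peak = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - peak) if i in keep else 0.0 for i in indices]
    total = sum(weights)
    return [w / total for w in weights]


def sample_token(logits, temperature=1.0, top_k=None, rng=random):
    """Draw one token index from the truncated, temperature-scaled distribution."""
    probs = truncated_temperature_probs(logits, temperature, top_k)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Toy stand-in for a model: fixed scores over a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)

# "Sample solutions from the model": draw tokens with the chosen settings.
samples = [vocab[sample_token(logits, temperature=0.7, top_k=2, rng=rng)]
           for _ in range(5)]
# In the quoted recipe, samples like these would then be used as the
# training data for ordinary supervised fine-tuning of the same model.
```

With `top_k=2`, only "a" and "b" can ever be drawn. A real model produces a fresh set of scores for every token it generates, and the fine-tuning step is not shown here.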