Comment by HarHarVeryFunny

5 days ago

When you say "classification task fine tuning", are you referring to RLHF?

RLHF seems to have been the critical piece that "aligned" the otherwise rather wild output of a purely "causal" (next-token prediction) trained LLM with what a human expects: conversational turn-taking (e.g. Q&A), instruction following, and more general preferences/expectations.
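One possible connection between "classification task fine tuning" and RLHF: the reward model at the heart of RLHF is typically trained like a binary classifier over preference pairs, using a pairwise logistic (Bradley-Terry) loss. A minimal sketch of just that loss (the function name and scalar-reward framing are illustrative, not any particular library's API):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise logistic (Bradley-Terry) loss used to train an RLHF
    reward model: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the model scores the human-preferred
    response higher than the rejected one, and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    return math.log(1.0 + math.exp(-margin))

# Correct ordering (preferred response scored higher) -> low loss
print(round(preference_loss(2.0, 0.0), 4))  # prints 0.1269
# Wrong ordering -> high loss
print(round(preference_loss(0.0, 2.0), 4))  # prints 2.1269
```

The trained reward model is then used to score the LLM's sampled outputs during the RL step (e.g. PPO), which is where the "alignment" of the raw next-token predictor actually happens.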