Comment by HarHarVeryFunny
5 days ago
When you say "classification task fine tuning", are you referring to RLHF?
RLHF seems to have been the critical piece that "aligned" the otherwise rather wild output of a purely "causally" (next-token-prediction) trained LLM with what a human expects in terms of conversational turn-taking (e.g. Q&A) and instruction following, as well as more general preferences and expectations.
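For concreteness: the first stage of RLHF is typically training a reward model on human preference pairs with a Bradley-Terry style loss, which the RL step then optimizes against. A minimal sketch of that pairwise loss (the scores and pairs here are made-up illustration values, not from any real model):

```python
import math

def preference_loss(pairs):
    """Bradley-Terry pairwise loss used to fit an RLHF reward model:
    mean over pairs of -log(sigmoid(r_chosen - r_rejected)).
    Lower loss means the model scores human-preferred responses higher."""
    total = 0.0
    for r_chosen, r_rejected in pairs:
        margin = r_chosen - r_rejected
        # -log(sigmoid(margin)) == log(1 + exp(-margin)), via log1p for stability
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)

# Hypothetical reward-model scores on three (chosen, rejected) response pairs
pairs = [(2.0, 0.5), (1.5, 1.0), (0.8, 1.2)]
print(preference_loss(pairs))
```

The loss only cares about the margin between the two scores, which is why reward models learn relative preferences rather than absolute quality.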