Comment by gunalx
16 hours ago
Probably just means SFT on a base model, vs. behavioural DPO and/or SFT on an instruction-tuned model.