Comment by danielmarkbruce
1 month ago
So much work goes into RLHF, DPO, instruction tuning, and other kinds of post-training because UX is very important. The bar is high, and making a model with a good UX for a given use case is hard.