Comment by nearbuy
19 hours ago
The parent comment probably forgot about RLHF (reinforcement learning from human feedback), where predicting the next token of a reference text is no longer the goal.
But even regular next-token prediction doesn't necessarily preclude the model from also learning to give correct and satisfying answers, if that helps it better predict its training data.