Comment by nearbuy
5 days ago
> The parent comment probably forgot about the RLHF (reinforcement learning from human feedback) stage, where predicting the next token from reference text is no longer the goal.
> But even regular next-token prediction doesn't necessarily preclude a model from also learning to give correct and satisfying answers, if doing so helps it better predict its training data.
> I didn't, hence the "first". It's clear that being good at next-token prediction forces the models to learn a lot, including giving such answers. But it's not their loss function. Presumably, with the right system prompt, they would be just as capable of lying and insulting you. And I doubt RLHF gets rid of this ability.
If you didn't forget about the RLHF, your comment is oddly pedantic, confusing, and misleading. "Correct and satisfying answers" is roughly the loss function for RLHF, assuming the human raters favor satisfying answers, and using "loss function" loosely, as you yourself do, to gesture at what the loss function is meant to optimize rather than to formally describe an actual function. The comment you responded to didn't say this was the only loss function across all stages of training, just that "when your loss function is X", Y happens.
You could have just acknowledged that they are roughly correct about RLHF and then brought up the issues caused by pretraining.
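
To make the contrast concrete, here is a rough sketch of the two objectives in PyTorch. Every function name and tensor shape below is my own illustration, not from any particular codebase:

    import torch
    import torch.nn.functional as F

    # Pretraining objective: cross-entropy on the next token of reference
    # text. Shapes: logits (batch, seq, vocab), tokens (batch, seq).
    def next_token_loss(logits, tokens):
        return F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions
            tokens[:, 1:].reshape(-1),                    # shifted targets
        )

    # RLHF objective (bare REINFORCE sketch): no reference text. A reward
    # model scores whole sampled responses as a stand-in for "correct and
    # satisfying", and training pushes the policy toward higher scores.
    # Real setups (e.g. PPO) also add a KL penalty toward the pretrained
    # policy so the model doesn't drift too far from it.
    def rlhf_loss(response_log_probs, reward):
        # response_log_probs: (batch, seq) per-token log-probs of the
        # sampled response; reward: (batch,) reward-model scores.
        return -(reward * response_log_probs.sum(-1)).mean()

Note that nothing in the second objective references a ground-truth continuation; whatever the reward model favors is what gets reinforced.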
> And I doubt RLHF gets rid of this ability.
The commenter you were replying to is worried the RLHF causes lying.