Comment by charcircuit, 1 day ago

It's trying to maximize a reward function. It's not just predicting the next word.