Comment by cma
9 months ago
They could have RLHF'd or fine-tuned on user thumbs-up responses, which could include users who took the test and then asked it to explain the problems afterward.