Comment by cma
10 months ago
They could have RLHF'd or fine-tuned on user thumbs-up responses, which could include users who took the test and then asked it to explain the problems afterward.