Comment by RandomBK

8 months ago

My 2c is that it is worthwhile to train on AI-generated content that has obtained some level of human approval or interest, as a form of extended RLHF loop.

3 comments


cryptonector 8 months ago

Ok, but how do you denote that approval? What if you partially approve of that content? ("Overall this is correct, but this little nugget is hallucinated.")

bongodongobob 8 months ago

It apparently doesn't matter unless you somehow consider the entire Internet to be correct. They didn't only feed LLMs correct info. It all just got shoveled in and here we are.
- cryptonector 8 months ago
  
  Sure, humans also hallucinate.