Comment by JKCalhoun

3 hours ago

"…they'd need a bunch of human reviewers combing through massive troves of data…"

Yeah, I concede that. It doesn't need to be done overnight, though. With a static repo of data, you can work through it over time (years), removing some data and adding pre-curated data. After enough years you can have a pretty good "reference dataset".

I think some of the thousands of people working on training LLMs have tried some of the low-hanging-fruit ideas we can brainstorm off the top of our heads 5 years later.