Comment by lastdong

10 months ago

In my opinion, we need more models trained on fully traceable and clean data instead of closed models that we later find out were trained on Reddit and Facebook discussion threads.

I want to see something trained _only_ on stuff like encyclopedias, programming books, etc. I'm curious how different it would be from a model trained on a lot of social media.