Comment by ookdatnog
6 days ago
I don't think it's simply a stylistic matter: it seems reasonable to assume that text in books tends to have higher information density and to contain longer and more complicated arguments (compared to text obtained from social media posts, blogs, shorter articles, etc.). If you want models that appear more intelligent, I think you need them to train on this kind of high-quality content.
The fact that these tend to be written in an older writing style is, to me, incidental. You could rewrite all your college textbooks in contemporary social media slang and I would still consider them high-quality texts.