Comment by kjellsbells
2 days ago
> Meta has a massive corpus of posts, comments, interactions, etc to train AI
I question whether the corpus is of particularly high quality, and therefore whether it is valuable source data to train on.
On the one hand: 20+ years of posts. In hundreds of languages (very useful to counteract the extreme English-centricity of most AI today).
On the other hand: 15+ years of those posts are clustered on a tiny number of topics, like politics and selling marketplace items. Not very useful unless you are building RagebaitAI, I suppose. Reddit's data would seem to be far more valuable on that basis.