
Comment by kjellsbells

2 days ago

> Meta has a massive corpus of posts, comments, interactions, etc to train AI

I question whether the corpus is of particularly high quality, and therefore whether it is valuable source data to train on.

On the one hand: 20+ years of posts. In hundreds of languages (very useful to counteract the extreme English-centricity of most AI today).

On the other hand: 15+ years of those posts are clustered around a tiny number of topics, like politics and selling marketplace items. Not very useful unless you are building RagebaitAI, I suppose. Reddit's data would seem far more valuable on that basis.