Comment by darth_aardvark

8 hours ago

> I don't think most companies can resist the allure of more free data, as bitter as it may taste.

Mercor, Surge, Scale, and other data labelling firms have shown that's not true. Paid data for LLM training is in higher demand than ever for this exact reason: Model creators want to improve their models, and free data no longer cuts it.

"Paid data," in the sense of cheap text, is a mature industry, and you can have as much as you want for pennies per word.

I did read something, or listen to a podcast, about the booming business of AI data sets late last year. I'm sure you are right.

That doesn't change my point, though: I still don't think they can resist pulling from the "free" data. Corps are just too greedy and next-quarter focused.