Comment by idiotsecant

9 months ago

The unfortunate side effect is that a megacorp gets to vacuum up the sum of human knowledge for free, boil it down, and sell it back to us for a nice profit.

Ah you mean like Google Search?

  • Google Search brings you traffic and revenue. LLMs do not.

    • See also how people have responded to Google snippets. When Google Search threatens to remove traffic or revenue, people get angry quite quickly.

  • I still own my content. Google links to it and sends me traffic. We both win. That relationship is absent when my content is anonymously fed into a model's training data, a model intended to capture users before they are ever sent to me. And, yes, I am aware Google has pulled some cute shit with this definition, and when they do it then it's also bad.

    • > Google links to it and sends me traffic

      Used to, but more recently it's probably LLM agents using Google, not people. And even if it's not yet, it will be. Last time I searched for something on Google it messed up so badly I quickly returned to GPT-4o+search.

How long before a handful of entities, having already ingested the available content into their proprietary systems, bankroll assaults on Wikipedia and the Internet Archive?

  • Likely never, as those platforms are continuously updated at no cost to the siphons training their LLMs on them.

Really?

a) Meta are (so far) releasing their models for free.

b) There's nothing stopping non-mega-corps from doing the same, especially if this precedent was established. (Training is of course expensive but this is a challenge, not an absolute block.)