Comment by zmmmmm

9 days ago

Yeah, they should really have a think about how their behavior is harming their future prospects here.

Just because you have infinite money to spend on training doesn't mean you should saturate the internet with bots looking for content with no constraints - even if that is a rounding error of your cost.

We just put heavy constraints on our public sites blocking AI access. Not because we mind AI having access - but because we can't accept the abusive way they execute that access.

Something I’ve noticed about technology companies, and it’s bled into just about every facet of the US these days, is that they ask whether an action *can* be taken rather than whether it *should* be.

It’s a very unfortunate and short-sighted way to operate.

The main issue is that a well-behaved AI company won't be singled out for continued access; they will all be hit by public sites blocking AI access. So there is no benefit to them behaving.

  • > So there is no benefit to them behaving.

    That's assuming they're deriving a benefit from misbehaving.

    There is no benefit to immediately re-crawling 404s or following dynamic links into a rabbit hole of machine-generated junk data and empty search results pages in violation of robots.txt. They're wasting the site's bandwidth and their own in order to get trash they don't even want.

    Meanwhile there is an obvious benefit to behaving: You don't, all by yourself, cause public sites to block everyone including you.

    The problem here isn't malice, it's incompetence.

  • Why should a well-behaved AI company be singled out for continued access? If the industry can't regulate itself then none deserve access no matter if they're "well-behaved".

    Receiving a response from someone's webserver is a privilege, not a right.

  • Honestly, have any of these AI companies ever offered compensation for the data they pillage, except in the case of large walled-off information silos like Reddit? This is like asking why the occasional burglars are not singled out for direct access into your house, compared to the strip-mining marauders out there.

    Why do any of them deserve special treatment? Please don't try to normalize this reprehensible behavior. It's greedy, exploitative, and lawless behavior, no matter how much they downplay it or how long they've been doing it.

    • No single piece of content (unless you're a really large website) is worth the paper that such a contract would be written on.

      This is the problem with AI scraping. On one hand, they need a lot of content, on the other, no single piece of content is worth much by itself. If they were to pay every single website author, they'd spend far more on overhead than they would on the actual payments.

      Radio faces a similar problem (it would be impossible to hunt down every artist and negotiate licensing deals for every single song you're trying to play). This is why you have collective rights management organizations, which are even permitted by law to manage your rights without your consent in some countries.
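
For anyone wondering what "behaving" might look like in practice, here is a minimal sketch of the two checks discussed upthread: honoring robots.txt and not immediately re-crawling URLs that returned 404. It uses Python's standard `urllib.robotparser`. The names `ExampleBot` and `CrawlPolicy`, and the one-week retry window, are assumptions made up for illustration, not anything a real crawler is known to use.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # hypothetical crawler name


class CrawlPolicy:
    """Decides whether a URL may be fetched right now (illustrative only)."""

    def __init__(self, robots_lines, retry_404_after=7 * 24 * 3600):
        # robots_lines: the site's robots.txt, already split into lines.
        self.robots = RobotFileParser()
        self.robots.parse(robots_lines)
        # Assumed policy: wait a week before retrying a URL that 404'd.
        self.retry_404_after = retry_404_after
        self.not_found = {}  # url -> timestamp of the last 404

    def record_status(self, url, status, now=None):
        now = time.time() if now is None else now
        if status == 404:
            self.not_found[url] = now
        else:
            self.not_found.pop(url, None)

    def may_fetch(self, url, now=None):
        now = time.time() if now is None else now
        # Respect robots.txt first.
        if not self.robots.can_fetch(USER_AGENT, url):
            return False
        # Then skip URLs that recently returned 404.
        last_404 = self.not_found.get(url)
        if last_404 is not None and now - last_404 < self.retry_404_after:
            return False
        return True


policy = CrawlPolicy(["User-agent: *", "Disallow: /search"])
policy.record_status("https://example.com/gone", 404, now=1000)
print(policy.may_fetch("https://example.com/about", now=1060))
print(policy.may_fetch("https://example.com/search?q=x", now=1060))
print(policy.may_fetch("https://example.com/gone", now=1060))
```

A real crawler would also need per-host rate limiting and some cap on dynamically generated links; this sketch deliberately leaves those out.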