Comment by AdieuToLogic

16 hours ago

The hypocrisy of Anthropic complaining about "illicitly extracting its Claude AI model capabilities" and supporting the White House's accusation of China "stealing U.S. AI labs' intellectual property on an industrial scale" is hilarious.

Anthropic, OpenAI, Google, Microsoft, et al. trained their models by ignoring the rights of copyright holders when harvesting whatever content they could. Now one of them is crying foul over another entity doing exactly what they all did?

Hilarious.

The AI companies seem to take the viewpoint that everything on the internet is free, except their stuff. It's okay to hammer some random website with AI crawlers, ignoring robots.txt, and causing bandwidth costs to skyrocket. But if you cost an AI provider money with your data acquisition practices, well, that's just clearly unacceptable.

  • Anthropic, and Dario especially, seems to have an eternal grudge against China as a concept, which reminds me of Thiel.

    • Coming from him, I am not sure even that is real. It could very easily (and plausibly) be a part of the ongoing hype drama.

      "Our models so precious, US Gov has to revoke access to foreigner." - tuned up version: "Our models so advanced our #1 adversary is desperately stealing it from us."

  • That's one aspect, which is a bit of a gray zone. But Anthropic trained on pirated books. That is explicitly illegal.

    • That ship has sailed. I would wager all the AI labs are ingesting anything human-generated, whether that means Hollywood movies, Taylor Swift’s discography, YouTube videos, or private GitHub source repos.

      The reward for having a competitive edge is exponentially higher than the risk of a lawsuit. Politicians are still old bureaucrats who don’t understand technology.

    • As I understand it, what was "explicitly illegal" was copying the books, in the sense of merely copying them before feeding them to the model, and this is what the Anthropic copyright settlement is about.

      Actually processing them through the model, though, was considered transformative and therefore fair use.

    • They didn't train on the books and that court only found that the pirating was illegal anyway.

    • I'd love to see an open-source project that's basically a torrent client for downloading pirated material, but one that trains an AI model "in the background" using the downloaded content. That way everyone can claim fair use for possessing copyrighted material. I mean, there's precedent, right?

  • >The AI companies seem to take the viewpoint that everything on the internet is free,

    The AI companies? That's been the common ethos of the internet for 40 years.

    I mean, raise your hand if you ad block and have a hard drive of pirated content...

  • > But if you cost an AI provider money with your data acquisition practices, well, that's just clearly unacceptable.

    It's the same question libertarian advocates cannot resolve:

      If one truly believes in personal sovereignty, how are
      shared resources paid for, such as roads, power grids,
      potable water, sewage services, fire departments,
      and police departments?
    

    It is also not a coincidence that leaders at many tech companies have expressed libertarian ideals.

    • What do you mean by "libertarian advocates cannot resolve"? Like, they have no answers at all, or you aren't personally swayed by them? Because they definitely have answers to this question...

    • Libertarians can just flip it around and ask how socialists solve the free-rider problem. Neither system resolves both problems.

      Extremist dogma is not a great way to run a society, but it does good numbers on social media, so here we are.

It's not exactly the same, since any Claude output is public domain under current law. So the Chinese aren't stealing anything here.

Not really even in the same ballpark as what they did. These other labs are using AI-generated content (which has already been ruled uncopyrightable) to train their models. Oh, and they are paying for those tokens. So at absolute worst, they are violating the terms of service. The horror. Meanwhile these frontier AI labs pirated and scraped everything they possibly could, paying not a dime to the copyright owners, nor anything to the websites they DDoSed.

Not really.

Data mining for AI is presumably fair use, whereas when you sign up for a Claude account, you enter into a legally binding contract that says you will not distill a model based on its outputs.