Comment by theropost
1 day ago
I think there is a real issue here, but I do not think it is as simple as calling it theft in the same way as copying books. The bigger problem is incentives. We built a system where writing docs, tutorials, and open technical content paid off indirectly through traffic, subscriptions, or services. LLMs get a lot of value from that work, but they also break the loop that used to send value back to the people and companies who created it.
The Tailwind CSS situation is a good example. They built something genuinely useful, adoption exploded, and in the past that would have meant more traffic, more visibility, and more revenue. Now the usage still explodes, but the traffic disappears because people get answers directly from LLMs. The value is clearly there, but the money never reaches the source. That is less a moral problem and more an economic one.
Ideas like GPL-style licensing point at the right tension, but they are hard to apply after the fact. These models were built during a massive spending phase, financed by huge amounts of capital and debt, and they are not even profitable yet. Figuring out royalties on top of that, while the infrastructure is already in place and rolling out at scale, is extremely hard.
That is why this feels like a much bigger governance problem. We have a system that clearly creates value, but no longer distributes it in a sustainable way. I am not sure our policies or institutions are ready to catch up to that reality yet.
> We have a system that clearly creates value, but no longer distributes it in a sustainable way
The same thing happened (and is still happening) with news media and aggregation/embedding like Google News or Facebook.
I don't know if anyone has found a working solution yet. There have been some laws passed and licensing deals [1]. But they don't really seem to be working out [2].
[1] https://www.cjr.org/the_media_today/canada_australia_platfor...
[2] https://www.abc.net.au/news/2025-04-02/media-bargaining-code...
I'm not sure I'd call [2] not working out, just as I wouldn't call the equivalent pressure from the USA to dismantle Medicare (our public health system) a sign that it isn't working out.
The biggest issue with the scheme is the fact that it was structured to explicitly favour media incumbents, and is therefore politically unpopular.
> I do not think it is as simple as calling it theft in the same way as copying books
Aside from the incentive problem, there is a kind of theft, known as conversion: you were granted a license under some conditions and went beyond them - you kept the car past your rental date, etc. In this case, the documentation is for people to read; an AI using it to answer questions is a kind of conversion (no, not fair use). But these license limits are mostly implicit in the assumption that (only) people are reading, or buried in unenforceable site terms of use. So it's a squishy kind of stealing after breaching a squishy kind of contract - too fuzzy to stop incentivized parties.
Why do you think there was an implicit agreement that documentation was only intended for humans? I've written a lot of documentation, much of it open source, and I'm generally very excited that it has proved additionally useful via LLMs. If you had asked me in 2010 whether that was something I intended in writing docs, I'm pretty sure I would have said something like "that's science fiction, but sure".
You still intended it for humans. Intent is defined by what one is aiming for, and without knowledge of an alternative, that was your intent.
100%, I get that you are OK with it being ingested by non-humans. And I think many might be OK with that.
One thing: I'm not sure how helpful the documentation actually is. I think the training value comes from examples, not docs. This makes me think... we could test this by creating a new pseudo-language and providing no examples, only docs.
If the LLM can then code effectively after reading the docs, we'd have a successful test. Otherwise? It's all parroting.
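The grading side of that experiment could be sketched as follows. This is a hypothetical illustration, not an existing benchmark: "Blip" is an invented toy stack language standing in for the new pseudo-language, and `grade` checks whether a model-written program (obtained however you like, after showing the model only the docs) actually computes the expected result:

```python
def run_blip(program: str) -> int:
    """Interpret a tiny invented stack language: PUSH n, ADD, MUL."""
    stack = []
    for line in program.strip().splitlines():
        parts = line.split()
        if parts[0] == "PUSH":
            stack.append(int(parts[1]))
        elif parts[0] == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif parts[0] == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown op: {parts[0]}")
    return stack[-1]

def grade(model_output: str, expected: int) -> bool:
    """True if the model's program runs and produces the expected value."""
    try:
        return run_blip(model_output) == expected
    except Exception:
        return False

# e.g. grade("PUSH 2\nPUSH 3\nADD", 5) is True for a correct program,
# and False for anything malformed or wrong.
```

If models given only the language docs consistently pass this kind of grader, that's evidence they learned from the docs; if they only pass on languages with abundant public example code, that supports the parroting interpretation.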
There will be no royalties; simply require that all models trained on the public internet also be public.
This won't help Tailwind in this case, but it would change the answer to "Should I publish this thing free online?" from "No, because a few AI companies are going to exclusively benefit from it" to "Yes, I want to contribute to the corpus of human knowledge."
Contributing to human knowledge doesn’t pay the bills though
It can. The problem is the practice of using open source as a marketing funnel.
There are many projects that love to brag about being open source (it's "free"!), only to lock useful features behind a paywall, or do the inevitable license rug pull after other companies start profiting from the freedoms they've provided them. This is the same tactic used by drug dealers to get you hooked on the product.
Instead, the primary incentive to release a project as open source should be the desire to contribute to the corpus of human knowledge. That doesn't mean that you have to abandon any business model around the project, but that shouldn't be your main goal. There are many successful companies built around OSS that balance this correctly.
"AI" tools and services corrupt this intention. They leech off the public good will, and concentrate the data under the control of a single company. This forces well-intentioned actors to abandon open source, since instead of contributing to human knowledge, their work contributes to "AI" companies. I'm frankly not upset when this affects projects who were abusing open source to begin with.
So GP has a point. Forcing "AI" tools, and even more crucially, the data they collect and use, to be free/libre, would restore the incentive for people to want to provide a public good.
The narrative that "AI" will bring world prosperity is a fantasy promoted by the people who will profit the most. The opposite is true: it will concentrate wealth and power in the hands of a few even more than it is today. It will corrupt the last vestiges of digital freedoms we still enjoy today.
I hope we can pass regulation that prevents this from happening, but I'm not holding my breath. These people are already in power, and governments are increasingly in symbiotic relationships with them.
It's not as simple as calling it theft, but it is simply theft, plus the other good points you made.
Copying isn't theft, generating isn't theft, and it's not even taking anything they had. Future revenue can't be stolen.
I think once it becomes infrastructure and widely used knowledge the authors can't claim control anymore. Or shouldn't.
> Future revenue can't be stolen.
This is a big eye-roll but otherwise ya, this is one way to think of it. It's not all about money, though. The people running these companies are just taking, en masse, without credit, and being credited for your work is a basic human desire. Of course there is a discussion of whether or not we should evolve beyond that. It feels incredibly dystopian to me, though.
The problem is there was a social contract. Someone spent their time and money to create a product that they shared for free, provided you visit their site and see their offerings. In this way they could afford to keep making this free product that everyone benefited from.
LLMs broke that social contract. Now that product will likely go away.
People can twist themselves into knots about how LLMs create “value” and that makes all of this ok, but the truth is they stole information to generate a new product that generates revenue for themselves at the cost of other people’s work. This is literally theft. This is what copyright law is meant to protect. If LLM manufacturers are making money off someone’s work, they need to compensate people for that work, same as any client or customer.
LLMs are not doing this for the good of society. They themselves are making money off this. And I’m sure if someone comes along with LLM 2.0 and rips them off, they’re going to be screaming to governments and attorneys for protection.
The ironic part of all of this is that LLMs are literally killing the businesses they need to survive. When people stop visiting (and paying) Tailwind, Wikipedia, news sites, weather, and so on, and only use LLMs, those sites and services will die. Heck, there’s even good reason to think LLMs will kill the Internet at large, at least as an information source. Why in the hell would I publish news or a book or events on the Internet if it’s just going to be stolen and illegally republished through an LLM without compensating me for my work? Once this information goes away or is locked behind nothing but paywalls, I hope everyone is ready for the end of the free ride.
> We have a system that clearly creates value, but no longer distributes it in a sustainable way.
It does not "create value" it harvests value and redirects the proceeds it accrues towards its owners. The business model is a middleman that arbitrages the content by separating it from the delivery.
Software licensing has been broken for two decades. That's why free software isn't financially viable for anybody except a tiny minority. It should be. The entire industry has been operating on charity, and the rich mega-corporations have decided they're no longer going to be charitable.