Comment by strogonoff
22 days ago
Aside from becoming the opposite of the values their name suggests, there are two main mistakes OpenAI made in my view: violating copyright when training, and rushing to release the chatbot. Stealing original work is going to bite them legally (opening them up to all sorts of lawsuits while killing their own ability to sue competitors piggy-backing off their model output, for example), and is a special case of them being generally shortsighted and passing on an opportunity to build a truly Apple- or Amazon-scale business through strategy and longer-term thinking (even if someone else had released an LLM chatbot first, they had the funds and the talent to build something higher level, properly licensed, and much more difficult to commoditise).
If this was the fault of Altman, it is understandable that certain people would want him out.
> violate copyright when training
If we could incrementally update our own brains by swapping cells for chips, what percentage of our brain has to be chips before us learning from a book is a violation of copyright?
When learning to recite a recent children's poem in kindergarten, what level of accuracy can a child attain before their ability to repeat it privately to one other person at a time is a copyright violation?
I don't think the concern is related specifically to training on computer chips with copyrighted content.
If you are going to use human brain cells to memorize protected content and sell it as a product, that's still an issue based on current copyright laws.
> If you are going to use human brain cells to memorize protected content and sell it as a product, that's still an issue based on current copyright laws.
And yet, that's all most billable hours at McKinsey, BCG, and KPMG are for. Those consultants memorized copyrighted material so your executives didn't have to.
It's very difficult to explain how GPT is not consulting.
Once again: LLMs are not massive archives of data.
You would never want to use an LLM to archive your writings or documents. They are incredibly bad at this.
Want to abolish economic copyright altogether? I could get behind that. But making a legal exception because of some imagined future metaphysical property of this particular platform sounds like being fooled.
Why not abolish copyright only when it suits multi-billion corporations and leave it in place for us, ordinary people who end up providing training data so that we can be replaced at our jobs?
This is one issue with Microsoft's Total Recall thing, right? I wonder how they're dealing with that.
Others have replied to this, and I am still not sure what your point is. Are you saying big tech should be able to get away with this because LLMs are just like us humans?
> If we could incrementally update our own brains by swapping cells for chips, what percentage of our brain has to be chips before us learning from a book is a violation of copyright?
The same percentage at which you stop qualifying as human and become an unthinking tool, fully controlled by its operator to do whatever they want, without free will of its own and without any ethical concerns about abuse and slavery, as is the case with all LLMs.
(Of course, it is a moot point, because creating a human-level consciousness with chips is a thought experiment not grounded in reality.)
> When learning to recite a recent children's poem in kindergarten, what level of accuracy can a child attain before their ability to repeat it privately to one other person at a time is a copyright violation?
Any level thanks to the concept called human rights and freedoms, famously not applied to machines and other unthinking tools.
This seems shortsighted. The question of when a "mechanical man" should be given the same rights as a man has been explored for a long time, as an echo of past eras when people had the same debate about women and non-Europeans.
Do the copyright claims have any legs at all? IANAL, but I thought it was pretty settled that statistical compilations of copyrighted works (indexes, concordances, summaries, full-text search databases) are considered "facts" and not copies.
(This would be separate from the contributory infringement claim if the model will output a copyrighted work verbatim)
1. Google was, and in some developed countries still is, under fire merely for summarising search results too extensively, so yes, I think the claims have legs.
> This would be separate from the contributory infringement claim if the model will output a copyrighted work verbatim
2. Commercial for-profit models were shown to do that, and (other legal arguments aside, such as model and/or its output being a derivative work, etc.) in some cases that was precisely the smoking gun for the lawsuit, if I recall correctly.
I have not seen any conclusive outcome; I suppose it will depend on the jurisdiction.
I can guarantee you if one jurisdiction limits AI training via copyright law, another one won’t, and it will have a huge competitive advantage as a result. That competitive advantage alone means you either have to leave the race or change your laws.