Comment by advael
2 months ago
Strictly speaking, I'm never going to think of model distillation as "stealing." It goes against the spirit of scientific research, and besides, every tech company has lost my permission to define what I think of as theft forever.
At most it would be illicit copying.
Though it's poetic justice that OpenAI is complaining about someone else playing fast and loose with copyright rules.
The First Amendment is not just about free speech but also the right to read; the only question is whether AI has that right.
Does my software have the right to read the contents of a DVD and sell my own MP4 of it? No. If a streamer plays a YouTube video on their channel, is the content original? Yes. When GPT-3 was training, people saw it as a positive. When people started asking ChatGPT things instead of searching sites, it became a negative.
If AI was just reading, there would be much less controversy. It would also be pretty useless. The issue is that AI is creating its own derivative content based on the content it ingests.
Kind of. The Constitution as a whole, and the amendments, don't give you the right to do anything. You have the right to do whatever you want whenever you want; the Constitution tells the government what it can and cannot stop you from doing.
I'm not sure the US 'First Amendment' is relevant here? DeepSeek is in China.
It's hardly even illicit: at least in the United States, the output of an AI isn't copyrightable.
Has that been decided in the courts yet?
In any case, copyright ain't the only thing that prevents copying.
Stochastic decompression. Dass-it.
I think it's less about that and more whether or not they used the free or paid API.
I think if OpenAI (or any other company) is paid for its compute time/access as anybody would be, then using content generated by its models is fair game, because it's an active/ongoing cost and not a passive one.
Whereas if someone trained on my dumb Tweets or HN posts, then so be it; it's a passive cost for me. I already paid my time to say the thing for my own benefit (tribal monk-e social interaction), so I have already gotten the value out of it.
Maybe, but something has gotta pay the bills to justify the cutting edge. I guess it's a similar problem to researching medicine.
Well, the artists and writers also want to pay their bills. We threw them under the bus; might as well throw OpenAI under too and get an actual open AI that we can use.
The investment thrown at OpenAI seems deeply inflated for how much meaningful progress they're able to make with it.

I think it's clear that innovative breakthroughs in bleeding-edge research are not just a matter of blindly hurling more money at a company to build unprecedentedly expensive datacenters.

But even if that were a way to do it, I don't think we should be wielding the law to keep privately-held companies at the forefront of research, especially in such a grossly inconsistent manner.