Comment by blibble
2 months ago
> That’s throwing the baby out with the bath water.
it's not
the parasites can't train their shitty "AI" if they don't have anything to train it on
You refusing to write open source will do nothing to slow the development of AI models - there's plenty of other training data in the world.
It will however reduce the positive impact your open source contributions have on the world to 0.
I don't understand the ethical framework for this decision at all.
> You refusing to write open source will do nothing to slow the development of AI models - there's plenty of other training data in the world.
There's also plenty of other open source contributors in the world.
> It will however reduce the positive impact your open source contributions have on the world to 0.
And it will reduce your negative impact through helping to train AI models to 0.
The value of your open source contributions to the ecosystem is roughly proportional to the value they provide to LLM makers as training data. Any argument you could make that one is negligible would also apply to the other, and vice versa.
> You refusing to write open source will do nothing to slow the development of AI models - there's plenty of other training data in the world.
if true, then the parasites can remove ALL code where the license requires attribution
oh, they won't? I wonder why
The ethical framework is simply this one: what is the worth of doing +1 to everyone, if the very thing you wish didn't exist (because you believe it is destroying the world) benefits x10 more from it?
If bringing fire to a species gives them light and warmth, but also gives some of its members the means and incentive to burn everything down for good, you have every ethical freedom to ponder whether you contribute to this fire or not.
I don't think that a 10x estimate is credible. If it was I'd understand the ethical argument being made here, but I'm confident that excluding one person's open source code from training has an infinitesimally small impact on the abilities of the resulting model.
For your fire example, there's a difference between being Prometheus teaching humans to use fire compared to being a random villager who adds a twig to an existing campfire. I'd say the open source contributions example here is more the latter than the former.
> there's plenty of other training data in the world.
Not if most of it is machine generated. The machine would start eating its own shit. The nutrition it gets is from human-generated content.
> I don't understand the ethical framework for this decision at all.
The question is not one of ethics but that of incentives. People producing open source are incentivized in a certain way and it is abhorrent to them when that framework is violated. There needs to be a new license that explicitly forbids use for AI training. That may encourage folks to continue to contribute.
Saying people shouldn't create open source code because AI will learn from it is like saying people shouldn't create art because AI will learn from it.
In both cases I get the frustration - it feels horrible to see something you created be used in a way you think is harmful and wrong! - but the world would be a worse place without art or open source.
Guilt-tripping people into providing more fodder for the machine. That is really something else.
I'm not surprised that you don't understand ethics.
I'm trying to guilt-trip them into using their skills to improve the world through continuing to release open source software.
I couldn't care less if their code was used to train AI - in fact I'd rather it wasn't since they don't want it to be used for that.
Yes, that's the bath water. The baby is all the communal good that has come from FLOSS.
OP is asserting that the danger posed by AI is far bigger than the benefit of FLOSS. So to OP AI is the bath water.
Yes, and they are okay with throwing the baby out with it, which is what the other commenter was pointing out. The idiom implies that throwing babies out with buckets of bathwater is a bad thing.
surely that cat's out of the bag by now, and it's too late to make an active difference by boycotting the production of more public(ly indexed) code?
Kind of, kind of not. Form a guild and distribute via SaaS or some other channel that keeps the knowledge undistributable. Most code out there is terrible, so AI trained on it will lose out.
If we end up with only proprietary software, we are the ones who lose.
GenAI would be decades away (if not more) with only proprietary software, which would never have reached the quality, coordination, and volume that open source enabled in such a relatively short time frame.
open source code is a minuscule fraction of the training data
I'd love to see a citation there. We already know from a few years ago that they were training AI based on projects on GitHub. Meanwhile, I highly doubt software firms were lining up to have their proprietary code bases ingested by AI for training purposes. Even with NDAs, we would have heard something about it.
I should have clarified what I meant. The training data includes, roughly speaking, the entire internet. Open source code is probably a large fraction of the code in the data, but it is a tiny fraction of the total data, which is mostly non-code.
My point was that the hypothetical of "not contributing to any open source code" to the extent that LLMs had no code to train on, would not have made as big of an impact as that person thought, since a very large majority of the internet is text, not code.
Where did most of the code in their training data come from?
Free software has always been about standing on the shoulders of giants.
I see this as doing so at scale and thus giving up on its inherent value is most definitely throwing the baby out with the bathwater.
I'd rather the internet ceased to exist entirely, than contributing in any way to generative "AI"
This is just childish. This is a complex problem that requires nuance and adaptability, just as programming does. Yours is literally the reaction of an angsty 12 year old.
Such a reactionary position is no better than nihilism.
Ridiculous overreaction.
It is. If not you, other people will write their code, maybe of worse quality, and the parasites will train on this. And you cannot forbid other people from writing open source software.
> If not you, other people will write their code, maybe of worse quality, and the parasites will train on this.
this is precisely the idea
add into that the rise of vibe-coding, and that should help accelerate model collapse
everyone that cares about quality of software should immediately stop contributing to open source