Comment by jd172

2 days ago

Personally, I would rather use a 'bad' AI that's trained ethically and runs locally than a good AI trained on stolen data that requires me to surrender my thoughts to the cloud.

Whether or not it's possible to compete, I guess we'll see, but I am hopeful and appreciative that Mozilla is trying. I am getting tired of big tech trying to force everyone to hand over even more unhinged amounts of data than they're already taking from us.

I strongly suspect that it is absolutely impossible to have an even remotely usable/useful "AI" trained on tiny datasets. Instead of training only on ethical data, companies that want to sound ethical will likely add an extra post-training step that makes dirty foundation models behave as if they had only learned from ethical sources. I'd hate for this to become the norm, but I fear this is logically what announcements like this one really mean. The difference in scale is vast: taking whatever you want from the entire internet, versus hand-curated datasets that are explicitly authorised and free to use. It's like trying to make a grain of sand orbit a marble in the playground to mimic the Moon orbiting the Earth; it won't work.