Comment by ImprobableTruth
10 hours ago
I think the fact that all (good) LLM datasets are full of licensed/pirated material means we'll never really see a decent open source model under the strict definition. Open weight + open source code is really the best we're going to get, so I'm fine with it coopting the term open source even if it doesn't fully apply.
> we'll never really see a decent open source model under the strict definition
But there are already a bunch of models like that, where everything (architecture, training data, training scripts, etc.) is open, public and transparent. Since you weren't aware those existed before, but now know they do, are you willing to change your perspective on it?
> so I'm fine with it coopting the term open source even if it doesn't fully apply
It really sucks that the community seems OK with this. I probably wouldn't have been a developer without FOSS, and I don't understand how it can seem OK to rob other people of this opportunity to learn from FOSS projects.
Not all of the community is OK with this; lots of folks are strongly against OSI's bullshit OSAID, for example. Really it should have been more like the Debian Deep Learning Team's Machine Learning Policy, just like last time, when the OSI used the Debian Free Software Guidelines (DFSG) to create the Open Source Definition (OSD).
https://salsa.debian.org/deeplearning-team/ml-policy