Comment by 8note

16 hours ago

I still want those datasets to become public domain. Open weights still isn't good enough.

That's the conundrum, isn't it? Anyone who posts their datasets would be immediately sued, blocked, or boycotted into oblivion over the obvious and blatant data theft, not to mention IP and copyright issues.

  • Nvidia's even being sued for providing scripts that automate downloading said data from non-Nvidia sources. We certainly don't need copyrights that last nearly a century after the author's death (they literally cannot help the author), so here's hoping that some of the disputes over all this money changing hands can rein in some of the existing copyright sprawl. A stronger public domain would provide more useful training data for everyone, including open source models, and make criminals out of fewer AI researchers.