
Comment by kachapopopow

1 day ago

GPL is not only a copyright license; it also covers multiple types of intellectual property rights. This is especially true when you consider GPL-3, which has explicit IP protection, while GPL-2's is implicit. So yeah, you're partially right for GPL-2 and wrong for GPL-3.

It's true that GPLv3 covers patents, but it is still primarily a copyright license.

The tokenizer's tokens aren't patented, for sure. They can't be trademarked (they don't identify a product or service). They aren't a trade secret (the data is public). They aren't copyrighted (not a creative work). And the GPL explicitly preserves fair use rights, so there are no contractual restrictions either.

A tokenizer is effectively a list of the top-n most common byte sequences. There's simply no basis in law for it to be subject to copyright or any other IP law in the average situation.
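To illustrate the "most common byte sequences" point, here is a toy sketch of byte-pair encoding (BPE), one common way tokenizer vocabularies are built. It assumes a BPE-style tokenizer. Real ones add pre-tokenization rules, special tokens and other details, and the function name `train_bpe` is hypothetical. The vocabulary starts as the 256 single bytes. It then grows purely by counting which adjacent pair appears most often in the corpus and merging it, so every entry is a mechanical consequence of frequency statistics, not an authored choice.

```python
from collections import Counter


def train_bpe(corpus: bytes, num_merges: int) -> list[bytes]:
    """Toy byte-level BPE (hypothetical helper): merge the most frequent adjacent pair."""
    # Split the corpus into single-byte tokens; base vocab is all 256 byte values.
    tokens = [bytes([b]) for b in corpus]
    vocab = [bytes([i]) for i in range(256)]
    for _ in range(num_merges):
        # Count every adjacent token pair in the current tokenization.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so no merge is worth adding
        merged = a + b
        vocab.append(merged)
        # Replace non-overlapping occurrences of the pair, scanning left to right.
        out = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return vocab


vocab = train_bpe(b"low lower lowest low low", 3)
print(vocab[256:])  # the learned merges, in the order they were added
```

Run on the same corpus with the same merge count, this always produces the same vocabulary, which is the sense in which the result is "just a list" derived from data.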

  • I mean, okay, sure, there is no legal framework for tokenizers, but what about the rest of the model? I think there is a much stronger argument there. And you could realistically extend the logic: if the model is GPL-2.0 licensed, you have to provide all the tools to replicate it, which would include the tokenizer.