Comment by ChadNauseam

2 months ago

There are other possible approaches to LLM watermarking that would be much more robust and harder to detect. They exploit the fact that LLMs work by producing a probability distribution that assigns a probability to each possible next token; the output is then produced by sampling from that distribution. To add fingerprints when generating, you could do some trickery in how you do that sampling, which would then be detectable by re-running the LLM over the text and checking its outputs. For example, you could alternate between selecting high-probability and low-probability tokens. (A real implementation would obviously be much more sophisticated than that, but hopefully you get the idea.)
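To make that concrete, here is a minimal sketch of one published variant of this idea, the "green list" logit-bias scheme from Kirchenbauer et al. (2023): at each step a pseudorandom subset of the vocabulary, seeded by the previous token, gets a small logit boost, and a detector later counts how often tokens land in their green lists. `VOCAB_SIZE`, `GREEN_FRACTION`, and `BIAS` are arbitrary placeholders here, and the logits would come from a real model in practice.

```python
# Illustrative sketch of green-list watermarking (Kirchenbauer et al., 2023).
# Not a production design: constants are placeholders and logits are assumed
# to come from a real model's forward pass.
import hashlib
import numpy as np

VOCAB_SIZE = 50_000
GREEN_FRACTION = 0.5   # fraction of the vocabulary favored at each step
BIAS = 2.0             # logit boost added to "green" tokens

def green_list(prev_token: int) -> np.ndarray:
    # Seed a PRNG with the previous token so a detector can re-derive the
    # same green list later without needing the model's weights.
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.random(VOCAB_SIZE) < GREEN_FRACTION

def watermarked_sample(logits: np.ndarray, prev_token: int, rng: np.random.Generator) -> int:
    # Nudge the distribution toward green tokens, then sample as usual.
    biased = logits + BIAS * green_list(prev_token)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))

def detect(tokens: list[int]) -> float:
    # Fraction of tokens that fall in the green list derived from their
    # predecessor; unwatermarked text should land near GREEN_FRACTION,
    # watermarked text noticeably above it.
    hits = sum(green_list(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    return float(hits) / max(len(tokens) - 1, 1)
```

Note that in this particular variant the detector only needs the seeding scheme, not the model weights; schemes that instead re-run the model and compare against its distribution also exist.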

This is not a great method in a world with closed models and a highly diverse ecosystem of open models and samplers. It's intellectually appealing, for sure! But it will always be at best a probabilistic method, and even then only if you have the LLM weights at hand.

  • What makes it not a good method? Of course, if a model's weights are publicly available, you can't compel anyone using it to add fingerprinting at the sampler stage or later. But I would be shocked if OpenAI were not doing something like this, since it would be so easy, couldn't hurt them, and could help them if they want to avoid training on outputs they generated themselves. (Although they could also just record hashes of their outputs or something similar, as in the sketch below; I would be surprised if they don't.)
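A rough sketch of that "record hashes of your own outputs" idea, purely for illustration: the class name and normalization are made up, and a real deduplication pipeline would more likely use fuzzy or shingle-based matching rather than exact hashes, since any small edit defeats an exact match.

```python
# Hypothetical output-hash registry: remember digests of generated text so it
# can later be excluded from training data. Illustrative only.
import hashlib

class OutputRegistry:
    def __init__(self) -> None:
        self._seen: set[str] = set()

    @staticmethod
    def _digest(text: str) -> str:
        # Light normalization so trivial whitespace/case edits still match.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record(self, generated_text: str) -> None:
        self._seen.add(self._digest(generated_text))

    def was_generated_here(self, candidate_text: str) -> bool:
        return self._digest(candidate_text) in self._seen
```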