Comment by vessenes

2 months ago

I love the idea of using this for LLM output watermarking. It hits the sweet spot: it would catch 99% of slop generators with no fuss, since they only copy and paste anyway, with almost no impact on other core use cases.

I wonder how much you'd embed with each letter or token that's output: user ID, prompt ref, date, token number?
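
Back-of-envelope, such a stamp would be tiny. A hypothetical layout in Python (the field names and sizes are my own guesses, nothing from the article):

    import struct
    import time

    # Hypothetical stamp: 8-byte user ID, 4-byte prompt ref,
    # 4-byte timestamp, 2-byte token index.
    stamp = struct.pack(">QIIH", 12345, 42, int(time.time()), 7)
    assert len(stamp) == 18  # 18 bytes -> 18 invisible selectors
                             # if each byte maps to one code point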

I also wonder how this is interpreted in a terminal. Really cool!

Why does anybody think AI watermarking will ever work? Of course it will never work; any watermarking can be instantly & easily stripped...

The only real protection against AI is to require all human interaction to be signed by a key verified against real-world identity, and even then that will (a) never happen, and (b) be open to abuse by countries with corrupt governments, and by countries whose corrupt governments are heavily influenced by private industry (like the US).

  • > any watermarking can be instantly & easily stripped...

    I think it took a while before printer watermarking (yellow dots) was discovered. It certainly cannot be stripped. It was possibly developed in the mid-1980s but not known to the public until the mid-2000s.

With the amount of preprocessing that is done before integrating text into a dataset, I'd be surprised if those kinds of shenanigans even worked.

In most Linux terminals, what you paste is just a sequence of bytes that gets passed through unmangled. And since this technique is UTF-8 compliant and doesn't use any extra glyphs, it is invisible to humans in Unicode-compliant terminals. I tried it in a few. It shows up if you echo the sentence to, say, xxd, of course.

(Unlike the PUA suggestion in the currently top-voted comment, which shows up immediately, of course.)
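
For the curious, here's a minimal sketch of this kind of variation-selector smuggling in Python. I'm assuming the byte-to-selector mapping from one public writeup (bytes 0-15 map to U+FE00..U+FE0F, bytes 16-255 to U+E0100..U+E01EF); the article's exact mapping may differ:

    def encode(base: str, payload: bytes) -> str:
        # Append one invisible variation selector per payload byte.
        out = base
        for b in payload:
            out += chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16)
        return out

    def decode(text: str) -> bytes:
        # Recover the payload from any variation selectors present.
        result = bytearray()
        for cp in map(ord, text):
            if 0xFE00 <= cp <= 0xFE0F:
                result.append(cp - 0xFE00)
            elif 0xE0100 <= cp <= 0xE01EF:
                result.append(cp - 0xE0100 + 16)
        return bytes(result)

    # encode("h", b"hello") renders as a bare "h" in most terminals,
    # yet decode() recovers b"hello" from the copied string.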

Additional test corrections: while xxd shows the message passing through completely unmangled when pasted into the terminal, copying back out is lossy. I echoed the sentence, verified it unmangled in xxd, then selected and pasted the result of the echo: it was truncated to a few words using X selection in both MATE Terminal and Konsole. I'm not sure where that truncation happens, whether in the terminal or in X. In xterm, the final "e" was mangled and the selection was truncated even further.

The sentence is written unmangled to files, though, so I think it's copying out of the terminal that drops data. I verified this by echoing the sentence to a test file, opening it in a browser, and copying the text from there.

  • On macOS, kitty shows an empty box, then an "a", for the "h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a" post below. I think this is fair, and even appreciated. Mac Terminal shows "ha". That "h󠅘󠅕󠅜󠅜󠅟󠄐󠅖󠅕󠅜󠅜󠅟󠅧󠄐󠅘󠅑󠅓󠅛󠅕󠅢󠄐󠄪󠄙a" (and this one!) can be copied and pasted into the decoder successfully.

There are other possible approaches to LLM watermarking that would be much more robust and harder to detect. They exploit the fact that LLMs work by producing a probability distribution over every possible next token, which is then sampled randomly to produce the output. To add a fingerprint at generation time, you could do some trickery in how you do that sampling that would then be detectable by re-running the LLM and observing its outputs. For example, you could alternate between selecting high-probability and low-probability tokens. (A real implementation would be much more sophisticated than that, obviously, but hopefully you get the idea.)
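
To make that concrete, here's a toy sketch of one such scheme in Python, in the spirit of published "green list" watermarks. Everything in it (the function names, vocabulary size, and bias constant) is illustrative, not any vendor's actual implementation:

    import hashlib
    import numpy as np

    VOCAB = 50_000  # illustrative vocabulary size

    def green_mask(prev_token: int, frac: float = 0.5) -> np.ndarray:
        # Seed a PRNG from the previous token so a detector can
        # recompute the same vocabulary partition without the logits.
        digest = hashlib.sha256(prev_token.to_bytes(4, "big")).digest()
        rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
        mask = np.zeros(VOCAB, dtype=bool)
        mask[rng.choice(VOCAB, size=int(frac * VOCAB), replace=False)] = True
        return mask

    def watermarked_sample(logits: np.ndarray, prev_token: int,
                           bias: float = 2.0) -> int:
        # Nudge probability mass toward the "green" half before sampling.
        biased = logits + bias * green_mask(prev_token)
        p = np.exp(biased - biased.max())
        p /= p.sum()
        return int(np.random.default_rng().choice(VOCAB, p=p))

    def green_fraction(tokens: list[int]) -> float:
        # Detector: unwatermarked text sits near 0.5; watermarked text
        # drifts measurably higher over enough tokens.
        hits = sum(green_mask(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
        return hits / max(len(tokens) - 1, 1)

Note that in this variant the detector needs only the token IDs, not a rerun of the model; the alternating high/low-probability idea above would instead require access to the logits.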

  • This is not a great method in a world with closed models and highly diverse open models and samplers. It's intellectually appealing, for sure! But it will always be at best a probabilistic method, and that's only if you have the LLM weights at hand.

    • What makes it not a good method? Of course, if a model's weights are publicly available, you can't compel anyone using it to add fingerprinting at the sampler stage or later. But I would be shocked if OpenAI were not doing something like this, since it would be so easy, couldn't hurt them, and could help them if they don't want to train on outputs they generated. (Although they could also record hashes of their outputs or something similar; I would be surprised if they don't.)

Just you wait until AI starts calling human output slop.

  • That's already happening: my kids have had papers unfairly blamed on ChatGPT by automated tools. Protect yourselves, kids: use an editor that can show letter-by-letter history.

    • Two people I worked with had this happen, and one of them is going to war over it, as it was enough to lower the kid's grade for college or something. Crazy times.

    • Do you have any examples of editors that show letter-by-letter history? I have never looked for that as a feature.

      Edit: I've been looking, and Google Docs seems to have version history to the minute.

  • There are of course human writers who are less communicative than AI, called "shit writers", and humans who are less accurate than AI, called "liars".

    The difference is that humans are responsible for what they write, whereas the human who used an AI to generate text is responsible for what the computer wrote.