Comment by jerpint

2 months ago

The ability to add watermarks to text is really interesting. Obviously it could be worked around, but it could be a good way to subtly watermark e.g. LLM outputs.

There are way better ways to watermark LLM output. It's easy to make it undetectable, which this isn't.

  • The issue with the standard watermark techniques is that they require an output of at least a few hundred tokens to reliably imprint the watermark. This technique would apply to much shorter outputs.

  • For example?

    • A crude way: To watermark: first establish a keyed DRBG. For every nth token prediction, read a bit from the DRBG for every possible token to label it red or black. Before selecting the next token, set the logit of every black token to -Inf; this ensures a red token will be selected.

      To detect: establish the same DRBG. Tokenize the text and, for each nth token, determine the red set at that position. If you see only red tokens across lots of positions, you can be confident the content is watermarked with your key.

      This would probably take a bit of fiddling to work well, but would be pretty much undetectable. Conceptually it's forcing the LLM to use a "flagged" synonym at key positions, a more sophisticated version of a shibboleth.

      In practice you might choose instead to watermark all tokens, less heavy-handedly (nudge logits rather than override them), and use highly robust error-correcting codes. A rough sketch of both variants follows below.

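      Just to make the mechanics concrete, here is a minimal sketch of the crude scheme described above, with the "nudge" variant included as an option. It's an assumption-laden illustration, not anyone's actual implementation: it presumes numpy, a shared secret KEY, and a hook that lets you edit the logits right before sampling, and the helper names (red_mask, watermark_logits, detect) are made up.

      ```python
      import hashlib
      import hmac

      import numpy as np

      KEY = b"secret-watermark-key"  # shared between generator and detector (assumed)
      VOCAB_SIZE = 50_000            # model vocabulary size (assumed)
      N = 4                          # watermark every nth token position

      def red_mask(position: int) -> np.ndarray:
          """Keyed DRBG: a pseudorandom red/black label for every token id at this position."""
          seed = hmac.new(KEY, str(position).encode(), hashlib.sha256).digest()
          rng = np.random.default_rng(int.from_bytes(seed[:8], "big"))
          return rng.integers(0, 2, size=VOCAB_SIZE).astype(bool)  # True = red

      def watermark_logits(logits: np.ndarray, position: int,
                           hard: bool = True, delta: float = 2.0) -> np.ndarray:
          """Hard: set black logits to -inf so only a red token can be sampled.
          Soft: merely nudge red logits up by delta (the gentler variant)."""
          if position % N != 0:
              return logits
          out = logits.copy()
          red = red_mask(position)
          if hard:
              out[~red] = -np.inf
          else:
              out[red] += delta
          return out

      def detect(token_ids: list[int]) -> float:
          """Fraction of watermarked positions holding a red token.
          Near 1.0 suggests this key's watermark; ~0.5 is what unmarked text gives."""
          checks = [red_mask(i)[tok] for i, tok in enumerate(token_ids) if i % N == 0]
          return sum(checks) / max(len(checks), 1)
      ```

      Seeding the DRBG by absolute position, as this sketch does for simplicity, is the fragile part: any insertion or deletion desynchronizes the detector, which is one reason robust error-correcting codes (or seeding on recent context) would matter in practice.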

This is too strippable to be a good watermark; it would only catch the ones who are unaware. The leakers, yes; the cybersecurity people, no.

Rather, I see a use in signing things. Newspapers, politicians, etc.: generate a unique key and encode it into your article or whatever. Now it's easy for anyone to check whether a quote attributed to you actually came from you. Sure, it's not secure, but it doesn't need to be, because it's simply a stable identifier. Even paywalled sites could display a snippet around the provided quote without it being problematic.