← Back to context

Comment by coppsilgold

6 hours ago

> The sophisticated version I heard is to make the differences in the white space between individual words/lines/wherever.

That would be a naive way to do it.

Here is an example of a more sophisticated way:

  A canary trap is a (method, way) for (exposing, determining) an information leak by giving (different, differing) versions of a (sensitive, secret) (document, file) to each of (several, two or more) (suspects, persons) and (seeing, observing) which version gets (leaked, exposed).

I can now include 9 bits of a watermark in there. If I expand the lists from two options to four it would be 18 bits. Four to eight would double that again - so diminishing returns after 4. The lists can vary in size too of course.

The sentiment of an entire paragraph can serve as single bit, it would have a chance to be robust to paraphrasing.

In the example above, if two or more leakers get together you might think that they could figure out a way to generate a clean version. But it turns out if there are enough watermark bits in the content and you use Tardos codes (a crafted Arcsine distribution of bits) small coalitions of traitors will betray themselves. Even large coalitions of 100 or more will betray themselves eventually (after 100s of 1000s of watermarked bits, the scaling is a constant + square of the number of traitors). The Google keyword is "traitor tracing scheme".

"What, precisely, does your employee handbook say about sexual harassment?"

"Well you see, your honour, we have 1000 slightly different employee handbooks, but they all say employees may not, must not, should not, can not, are not permitted to, must refrain from, or are forbidden from committing sexual harassment"