Comment by coppsilgold

1 month ago

> The sophisticated version I heard is to make the differences in the white space between individual words/lines/wherever.

That would be a naive way to do it.
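For contrast, here is a minimal Python sketch of the whitespace approach from the quote. It assumes one bit per gap between words (one space = 0, two spaces = 1), and the function names are hypothetical. It also shows why the scheme is fragile: any step that collapses runs of spaces wipes the mark, for example HTML rendering or someone retyping the text.

```python
import re


def embed_whitespace(text: str, bits: str) -> str:
    """Hide one bit per gap between words: one space = 0, two spaces = 1.

    Note: text.split() also treats newlines as gaps, so line breaks are flattened.
    """
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough gaps between words for the watermark")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps past the end of the watermark get the ordinary single space.
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def extract_whitespace(text: str, n_bits: int) -> str:
    """Read the watermark back from the widths of the first n_bits gaps."""
    gaps = re.findall(r" +", text)
    return "".join("1" if len(g) > 1 else "0" for g in gaps[:n_bits])


doc = "A canary trap is a method for exposing an information leak."
marked = embed_whitespace(doc, "1011")
print(extract_whitespace(marked, 4))                    # 1011
# Collapsing whitespace, as an HTML renderer does, erases the mark entirely.
print(extract_whitespace(" ".join(marked.split()), 4))  # 0000
```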

Here is an example of a more sophisticated way:

  A canary trap is a (method, way) for (exposing, determining) an information leak by giving (different, differing) versions of a (sensitive, secret) (document, file) to each of (several, two or more) (suspects, persons) and (seeing, observing) which version gets (leaked, exposed).

I can now include 9 bits of a watermark in there. If I expand the lists from two options to four, it would be 18 bits. Going from four to eight only gets you to 27 bits, because each doubling of a list adds just one bit per slot, so there are diminishing returns after four. The lists can vary in size too, of course.
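A minimal Python sketch of the synonym scheme, assuming the watermark is a plain integer (say, a suspect ID) written one digit per slot in mixed radix, so slots with different numbers of options can be mixed freely. `TEMPLATE` reuses the example sentence; the function names are hypothetical. A slot with k options carries log2(k) bits, so the capacity is the sum of those.

```python
import math
import re

# The canary-trap sentence from the comment; each "(a, b, ...)" group is one slot.
TEMPLATE = ("A canary trap is a (method, way) for (exposing, determining) an "
            "information leak by giving (different, differing) versions of a "
            "(sensitive, secret) (document, file) to each of (several, two or more) "
            "(suspects, persons) and (seeing, observing) which version gets "
            "(leaked, exposed).")

SLOT = re.compile(r"\(([^)]*)\)")


def slots(template: str) -> list[list[str]]:
    """Parse every '(opt1, opt2, ...)' group into its list of options."""
    return [[o.strip() for o in m.group(1).split(",")] for m in SLOT.finditer(template)]


def capacity_bits(template: str) -> float:
    """Watermark capacity: the sum of log2(number of options) over all slots."""
    return sum(math.log2(len(opts)) for opts in slots(template))


def embed(template: str, mark: int) -> str:
    """Write `mark` as a mixed-radix number, one digit (option index) per slot."""
    opts = slots(template)
    total = math.prod(len(o) for o in opts)
    if not 0 <= mark < total:
        raise ValueError(f"mark must be in [0, {total})")
    chosen = []
    for o in opts:
        mark, digit = divmod(mark, len(o))
        chosen.append(o[digit])
    words = iter(chosen)
    # re.sub visits the slots left to right, the same order slots() returned them.
    return SLOT.sub(lambda _m: next(words), template)


def extract(template: str, text: str) -> int:
    """Recover the mark from a leaked copy by seeing which option fills each slot."""
    opts = slots(template)
    literals = SLOT.split(template)[0::2]  # fixed text between the slots
    pattern = ""
    for lit, o in zip(literals, opts):
        # Longest options first so no option is shadowed by a prefix of another.
        alts = "|".join(re.escape(w) for w in sorted(o, key=len, reverse=True))
        pattern += re.escape(lit) + f"({alts})"
    pattern += re.escape(literals[-1])
    m = re.fullmatch(pattern, text)
    if m is None:
        raise ValueError("text does not match the template")
    mark = 0
    for o, word in reversed(list(zip(opts, m.groups()))):
        mark = mark * len(o) + o.index(word)
    return mark


print(capacity_bits(TEMPLATE))  # 9.0
copy_for_suspect_5 = embed(TEMPLATE, 5)
print(copy_for_suspect_5)
print(extract(TEMPLATE, copy_for_suspect_5))  # 5
```

Using mixed radix rather than raw bits means a three-option slot still contributes log2(3) ≈ 1.58 bits instead of being rounded down to one.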

The sentiment of an entire paragraph can serve as a single bit; that would have a chance of being robust to paraphrasing.

In the example above, you might think that two or more leakers could get together and work out a way to generate a clean version. But it turns out that if there are enough watermark bits in the content and you use Tardos codes (each bit is drawn with a bias taken from a carefully crafted arcsine distribution), small coalitions of traitors will betray themselves. Even large coalitions of 100 or more will betray themselves eventually, after hundreds of thousands of watermarked bits: the number of bits needed grows in proportion to the square of the number of traitors (times a factor logarithmic in the acceptable false-accusation rate). The Google keyword is "traitor tracing scheme".
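A minimal Python sketch of a Tardos-style fingerprinting code, to make the traitor-tracing claim concrete. The assumptions are mine, not the comment's. Each user gets an independent bit string; in the canary-trap setting each bit would pick an option in one slot. The coalition uses majority vote, which is only one possible strategy. Accusation uses the symmetric scoring variant usually attributed to Škorić et al. rather than Tardos's original one-sided score. The sizes `n_users`, `coalition` and `m` are arbitrary. Users are simply ranked by score rather than compared against a formal accusation threshold.

```python
import math
import random


def tardos_biases(m: int, c: int, rng: random.Random) -> list[float]:
    """Draw a bias p_i per position with density ~ 1/sqrt(p(1-p)) on [t, 1-t]."""
    t = 1 / (300 * c)  # cutoff used in Tardos's original construction
    r_lo = math.asin(math.sqrt(t))
    # If r is uniform, p = sin(r)^2 follows the (truncated) arcsine distribution.
    return [math.sin(rng.uniform(r_lo, math.pi / 2 - r_lo)) ** 2 for _ in range(m)]


def codewords(n: int, p: list[float], rng: random.Random) -> list[list[int]]:
    """Each user's bit i is independently 1 with probability p[i]."""
    return [[1 if rng.random() < pi else 0 for pi in p] for _ in range(n)]


def majority_attack(words: list[list[int]]) -> list[int]:
    """One possible coalition strategy: output the majority bit at each position.

    Where all colluders hold the same bit, comparing copies reveals nothing,
    so that bit passes through unchanged (the "marking assumption").
    """
    return [1 if 2 * sum(col) > len(col) else 0 for col in zip(*words)]


def scores(words: list[list[int]], y: list[int], p: list[float]) -> list[float]:
    """Symmetric Tardos score: reward agreeing with y, weighted by how rare the bit is."""
    result = []
    for x in words:
        s = 0.0
        for xi, yi, pi in zip(x, y, p):
            w1 = math.sqrt((1 - pi) / pi)  # large when a 1 is rare (small p)
            w0 = math.sqrt(pi / (1 - pi))  # large when a 0 is rare (large p)
            if yi == 1:
                s += w1 if xi == 1 else -w0
            else:
                s += w0 if xi == 0 else -w1
        result.append(s)
    return result


rng = random.Random(1)
n_users, coalition, m = 30, 3, 4000  # illustrative sizes only
p = tardos_biases(m, coalition, rng)
X = codewords(n_users, p, rng)
traitors = sorted(rng.sample(range(n_users), coalition))
leaked = majority_attack([X[j] for j in traitors])
S = scores(X, leaked, p)
ranked = sorted(range(n_users), key=lambda j: S[j], reverse=True)
print("traitors:", traitors, "top scores:", sorted(ranked[:coalition]))
```

An innocent user's score has mean 0 and variance 1 per position, so its spread grows like √m. Each colluder's expected score grows linearly in m, roughly in proportion to m/c. Separating the two therefore needs m to grow roughly like c², which is where the square-of-the-number-of-traitors scaling comes from.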

michaelt 1 month ago

"What, precisely, does your employee handbook say about sexual harassment?"

"Well you see, your honour, we have 1000 slightly different employee handbooks, but they all say employees may not, must not, should not, can not, are not permitted to, must refrain from, or are forbidden from committing sexual harassment"

crazygringo 1 month ago

What makes it more sophisticated to use synonyms instead of whitespace?

It sounds like a lot more work for the same result, and now it's much more obvious to people that they have different versions.

And even in your short example, many of the supposed synonyms change the actual meaning. And one winds up being grammatically incorrect.