Comment by porridgeraisin
17 hours ago
Differentially private means that:
training_algorithm(training data with a row that has "ForHackernews blood test report...") is hard to distinguish from training_algorithm(training data without that row), up to a factor of epsilon. They explain this further in the article itself, with concrete values for epsilon.
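To make that precise, here's the standard (epsilon, delta)-DP definition written out (this is the textbook formulation, not anything specific to the article): for any two training sets D and D' that differ in a single row, and any set S of possible trained models,

```latex
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta
```

where A is the (randomised) training algorithm, epsilon bounds the multiplicative difference, and delta is a small additive slack (zero for pure epsilon-DP).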
I got that from the article, but I'm not getting what it means in practice. What's the use case?
It is very difficult for someone to coax the model into regurgitating a sequence from the training data. So, as you can imagine, the first use case is going to be Google training on your Gmail inbox without me being able to prompt your emails out of it.
User-level DP, on the other hand, which the article alludes to near the end, would mean that it's very difficult to make the model regurgitate a particular user's data.
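The only thing that changes is what counts as "neighbouring" datasets in the definition above (this is my reading of the standard user-level formulation, not something the article spells out):

```latex
% Example-level DP: D and D' differ in one row.
% User-level DP:    D and D' differ in all rows contributed by one user.
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta
```

Same inequality, but the guarantee now hides a whole user's contribution rather than a single record.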
Since this is a theoretical guarantee, you can do whatever prompt engineering you like; it will be really difficult all the same.
How difficult it is depends on a bunch of quantitative factors, mostly the value of epsilon.
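For concreteness, here's a minimal sketch of the kind of training step (DP-SGD style: per-example gradient clipping plus Gaussian noise) that yields such a guarantee. The function name, `max_grad_norm`, and `noise_multiplier` values are illustrative assumptions on my part, not whatever Google actually uses; in practice a privacy accountant converts the noise multiplier, sampling rate, and number of steps into an epsilon.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, max_grad_norm=1.0,
                noise_multiplier=1.1, lr=0.1):
    """One illustrative DP-SGD step: clip each example's gradient,
    sum, add Gaussian noise, then average and update.

    per_example_grads: shape (batch_size, num_params), one gradient per row.
    Larger noise_multiplier -> smaller epsilon -> stronger guarantee
    (the exact mapping comes from a privacy accountant).
    """
    # Clip each example's gradient to L2 norm <= max_grad_norm,
    # so no single row can move the model too much.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clip_factors = np.minimum(1.0, max_grad_norm / (norms + 1e-12))
    clipped = per_example_grads * clip_factors

    # Sum, then add Gaussian noise calibrated to the clipping norm.
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        scale=noise_multiplier * max_grad_norm, size=params.shape
    )

    # Average over the batch and take a gradient step.
    return params - lr * noisy_sum / per_example_grads.shape[0]
```

User-level DP follows the same recipe, except you first sum each user's gradients and clip per user, so the noise masks an entire user's contribution instead of a single example's.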
You might think this would be useful for copyright protection as well, but there is a subtle difference. It's been a while and I'm hazy on the details, so I'll refer you to the Near Access-Freeness paper, which discusses it in detail and proposes another framework for that.