Comment by AnthonyMouse
10 months ago
Nobody has explained how they could use that data without producing a model that would emit private information.
10 months ago
Perhaps de-identification before training could be helpful here.
Microsoft does seem active in this area, e.g. https://microsoft.github.io/presidio/
None of that stuff actually works. You can remove someone's Social Security number from the data, but there is still only one person at the exact intersection of all the attributes that aren't individually considered personally identifying but collectively still are.
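To make the "intersection" point concrete, here is a minimal Python sketch of a simple k-anonymity check. It is an illustration, not something from the thread: the records, the field names (`zip`, `birth_date`, `sex`) and the helper functions are all hypothetical, and this is not how Presidio or any Microsoft pipeline works. It drops the SSN, then counts how many records share each combination of the remaining quasi-identifiers. Any combination that occurs only once still singles out one person.

```python
from collections import Counter

# Hypothetical toy records; all field names and values are invented for illustration.
records = [
    {"ssn": "123-45-6789", "zip": "02138", "birth_date": "1965-07-31", "sex": "F"},
    {"ssn": "987-65-4321", "zip": "02138", "birth_date": "1965-07-31", "sex": "M"},
    {"ssn": "555-12-3456", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"ssn": "444-98-7654", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
]

# Fields that are not identifying on their own but may be in combination (assumed).
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def strip_direct_identifiers(rows, direct=("ssn",)):
    """Drop fields that identify someone by themselves (the 'remove the SSN' step)."""
    return [{k: v for k, v in row.items() if k not in direct} for row in rows]


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each quasi-identifier combination."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys=QUASI_IDENTIFIERS):
    """Return rows whose quasi-identifier combination occurs exactly once (k = 1)."""
    counts = group_sizes(rows, keys)
    return [row for row in rows if counts[tuple(row[k] for k in keys)] == 1]


def k_anonymity(rows, keys=QUASI_IDENTIFIERS):
    """The dataset's k is the size of its smallest quasi-identifier group."""
    return min(group_sizes(rows, keys).values())


deidentified = strip_direct_identifiers(records)
for row in unique_rows(deidentified):
    print("still singles out one person:", row)
print("k =", k_anonymity(deidentified))
```

Even with the SSN gone, two of the four toy records are the only ones with their zip/birth date/sex combination, so the dataset is only 1-anonymous.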
Moreover, that isn't even the problem here. Suppose your company has a trade secret: you know how to manufacture widgets more efficiently than your competitors. If Microsoft produces a model that tells your competitors the secret process it learned from your internal emails, it's completely irrelevant whether they stripped the PII out of your emails first.