Comment by lettergram
5 years ago
It was actually a response to your comment:
> One thing I didn't get is this magical PII thing. How does the author look at a random network packet -- nay, just packet headers -- and assign a PII:true/false label? I think many corporations would sacrifice the right hand of a sysadmin if that was the way to get this tech.
Check out Amazon Macie or Microsoft Presidio, or try actually using the library I linked.
It’s usually used in a constrained way and is in no way perfect, but it helps investigators track suspected cases of data exfiltration. You can pull something that looks suspect (say, a credit card number), compare it against an internal dataset, and see if it’s legit.
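To make that workflow concrete, here is a rough sketch in plain Python. It is not how DataProfiler, Macie, or Presidio work internally; the regex, the function names, and the `known_cards` set are all made up for illustration. The idea is just: flag digit runs that look like card numbers, drop the ones that fail a Luhn checksum, then check what is left against an internal list.

```python
import re

# Hypothetical pattern (an assumption, not taken from any of the tools above):
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list[str]:
    """Pull out digit runs that look like card numbers and pass Luhn."""
    candidates = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            candidates.append(digits)
    return candidates


def confirm_against_internal(candidates: list[str], known_cards: set[str]) -> list[str]:
    """Keep only candidates that appear in the (hypothetical) internal dataset."""
    return [c for c in candidates if c in known_cards]


if __name__ == "__main__":
    payload = "card=4111 1111 1111 1111;other=1234 5678 9012 3456"
    known_cards = {"4111111111111111"}  # stand-in for an internal dataset
    suspects = find_card_candidates(payload)
    print(confirm_against_internal(suspects, known_cards))
```

The first stage is deliberately loose and will produce false positives; the lookup against internal data is what lets an investigator decide whether a hit is a real card.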
In the repo I linked, you can see the citation for an earlier model on synthetic and real-world datasets:
https://github.com/capitalone/DataProfiler#references
https://arxiv.org/pdf/2012.09597.pdf
So I don’t really understand the hostility.