Comment by lettergram

5 years ago

It was actually a response to your comment:

> One thing I didn't get is this magical PII thing. How does the author look at a random network packet -- nay, just packet headers -- and assign a PII:true/false label? I think many corporations would sacrifice the right hand of a sysadmin if that was the way to get this tech.

Check out Amazon Macie or Microsoft Presidio, or try actually using the library I linked.

It’s usually used in a constrained way and is in no way perfect. But it helps investigators track suspected cases of data exfiltration. You can pull something that looks suspect (say, a credit card number), compare it against an internal dataset, and see if it’s legit.
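To make that workflow concrete, here is a rough sketch of the "pull something suspect, then compare" step. It is not how DataProfiler, Macie, or Presidio work internally. It is a plain regex plus a Luhn checksum, and the names `find_card_candidates`, `triage`, and `known_cards` are made up for illustration. The "internal dataset" is assumed to be a simple set of card numbers.

```python
from __future__ import annotations

import re

# Loose pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_candidates(text: str) -> list[str]:
    """Pull out digit runs that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def triage(text: str, known_cards: set[str]) -> dict[str, bool]:
    """Map each candidate to whether it appears in the internal dataset.

    `known_cards` is a hypothetical stand-in for whatever internal
    dataset an investigator would actually check against.
    """
    return {card: card in known_cards for card in find_card_candidates(text)}


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number.
    payload = "user=alice card=4111 1111 1111 1111 order=1234 5678 9012 3456"
    print(triage(payload, known_cards={"4111111111111111"}))
```

A hit that also shows up in the internal dataset is what an investigator would look at first. A hit that doesn't is more likely a false positive or test data. Either way it is a lead to follow up on, not a verdict.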

In the repo I linked, you can see the citation for an earlier model evaluated on synthetic and real-world datasets:

https://github.com/capitalone/DataProfiler#references

https://arxiv.org/pdf/2012.09597.pdf

So I don’t really understand the hostility.
