Comment by novas0x2a
14 years ago
for any purpose, at any time, provided that following collection
of such location and speed information identifiable to your Vehicle
They store the data tied to your identity. A data breach (quite common these days...) would be a Big Deal. GPS tracks of everywhere you've gone in your car, ever? That's worth quite a bit of money in the right hands.
He gave no evidence that they were not anonymizing the data properly,
he just assumed they were not.
ZIP code, birthday, gender: together they identify 87% of Americans[1]. Your (home, work) GPS tuple? Unique[2][3]. His assumption is quite safe; every "anonymized" dataset that's been released to the public (that I know of) has been de-anonymized. Why would this one be special?
[1] http://arstechnica.com/tech-policy/news/2009/09/your-secrets...
[2] http://crypto.stanford.edu/~pgolle/papers/commute.pdf
[3] http://33bits.org/2009/05/13/your-morning-commute-is-unique-...
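To make the quasi-identifier point concrete, here is a minimal sketch of how you would measure it on a dataset with no names in it. The records and the `unique_fraction` helper are made up for illustration; this is not the method from the linked sources, just the simplest counting version of the same idea.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical toy records: no names, only "harmless" attributes.
records = [
    {"zip": "94110", "birthday": "1980-03-14", "gender": "F"},
    {"zip": "94110", "birthday": "1980-03-14", "gender": "M"},
    {"zip": "94110", "birthday": "1975-07-01", "gender": "M"},
    {"zip": "94110", "birthday": "1975-07-01", "gender": "M"},
    {"zip": "10001", "birthday": "1990-12-25", "gender": "F"},
]

print(unique_fraction(records, ["zip"]))
print(unique_fraction(records, ["zip", "birthday", "gender"]))
```

Each attribute alone leaves most people hidden in a crowd; the combination is what singles them out.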
EDIT: In response to parent edit and below comments
I have no proof of these, but here are some factoids I believe to be true (so feel free to base a research paper on them :D)
1) To identify commuters: (Highway-Entrance-Location, Average-Highway-Entrance-Time, Highway-Exit-Location, Average-Highway-Exit-Time) -> some derived values: approximate (home,work), average speed, average driving aggression
2) Really, now that I think about it, any dataset where multiple GPS tracks (for a single person) are tied together is out. If you can get any single Average-Location-at-Specific-Time data point (plus point #3 below), you've reduced the set of candidates to something quite small. Then you just stand on that street corner at that time (or, for the police, use the red light cameras...) and you're done.
3) This is an OnStar dataset we're talking about, so you're looking for GM-manufactured cars, made in the last ~10 years (or whenever OnStar started going into cars). I'm willing to bet that just that data point is enough to turn any other lukewarm/weak de-anonymization into a solid match.
4) Anyone who buys OnStar as an option is quite concerned with their safety at all costs (... my bias, I guess, since I consider it a waste of time), so look for e.g. families with small kids or other dependents.
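A rough sketch of points 1 and 2, assuming the tracks are keyed by a pseudonym rather than a name. Everything here is hypothetical (the `home_work` and `anonymity_sets` helpers, the trip tuple layout, the coordinates, the rounding precision); it only shows why tying multiple trips to one ID is enough to derive a (home, work) pair and then check how many pseudonyms share it.

```python
from collections import Counter

def home_work(trips, precision=2):
    """Guess (home, work) as the most common start cell and end cell.

    trips: list of (start_lat, start_lon, end_lat, end_lon) for one pseudonym.
    precision: decimal places to round to; 2 gives cells on the order of
    a kilometer (an assumption, pick whatever the data supports).
    """
    starts = Counter((round(a, precision), round(b, precision)) for a, b, _, _ in trips)
    ends = Counter((round(c, precision), round(d, precision)) for _, _, c, d in trips)
    return starts.most_common(1)[0][0], ends.most_common(1)[0][0]

def anonymity_sets(tracks):
    """Map each derived (home, work) pair to the pseudonyms that share it."""
    groups = {}
    for pseudonym, trips in tracks.items():
        groups.setdefault(home_work(trips), []).append(pseudonym)
    return groups

# Made-up tracks: "a" and "b" start in the same cell but end in different ones.
tracks = {
    "a": [(37.771, -122.412, 37.422, -122.084), (37.772, -122.413, 37.421, -122.083)],
    "b": [(37.771, -122.411, 37.789, -122.401)],
    "c": [(40.712, -74.006, 40.758, -73.985)],
}

for pair, pseudonyms in anonymity_sets(tracks).items():
    print(pair, pseudonyms)
```

Any pair that maps to a single pseudonym is a unique ID in practice, and extra filters like the car's make (point 3) only shrink the groups further.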
I'm running out of steam for this single comment, but a name is certainly not necessary for a unique ID. Ongoing research is cracking this stuff wide open. When the Netflix dataset came out, who would have thought that movie ratings could uniquely identify a person?