Comment by waerhert
4 days ago
I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks. I was under the assumption services like Wigle only provided the reverse lookup (wifi -> coords). Indeed an attacker could potentially reverse the LSH tags if it hashed the wifi environment within very small geofences. It's bit of a needle & haystack problem but not an impossible one with enough resources. I wouldn't say it's a perfect system and I don't mind it falling apart under scrutiny, I just found it an interesting idea so I really appreciate you thinking along here.
Edit: Maybe some preshared group hash (kinda beats the point), or combining multiple modalities (eg bluetooth, shared interests) or some kind of proof of work token could help mitigate some of these issues. I guess anything to reduce the time to attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment. In essence, the core idea of minhash + lsh works and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easy enough to grab vs something more low level like an APs beacon timing interval jitter or something.
> I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks
I think this is the issue, is these datasets are out there and at least big tech companies have them since they're used to assist with GPS. I was about to post the same thing as above but saw vessenes beat me to it.
Without thinking about it too hard, the two directions I see are either making observations of the environment in real-time that is only relevant at that time (IE sniffing actual wireless frames, even if they're encrypted and making observations on them, however, most devices won't let you go into promiscuous mode and do this) or encrypting the messages in flight so only participants can decrypt them (IE a model like the signal protocol with E2E message encryption).
Anyways, this is a cool approach, but that risk occurred to me as well about the ability to just brute force the entire dataset to decode every location.