Comment by urbandw311er
3 days ago
This would be because of the legal requirement to purge (erase) all the previous scan data once a user opts out. So the only way to re-enable is to scan everything again — unless you have some clever way I’ve not thought of?
Encrypt the data and store the key on the user's device. If the user enables the feature, they transmit their key to you. If they disable the feature, you delete the key on your side.
In theory, you could store a private key on the device and cryptoshred the data on Microsoft’s servers when the setting is disabled (Microsoft deletes their copy of the key). Then, when the feature is re-enabled, upload the private key to Microsoft again.
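A minimal sketch of the cryptoshredding idea, in Python. To stay within the standard library it uses a toy HMAC-SHA256 counter-mode cipher. That cipher is an assumption for illustration only; real code should use a vetted AEAD such as AES-GCM. `ServerStore`, `enable`, and `disable` are hypothetical names. Setting `_key` to `None` stands in for deleting the key from every server-side key store. The ciphertext, including any copies sitting in backups, is left alone and simply becomes unreadable.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

# Toy cipher for illustration only: HMAC-SHA256 in counter mode as a
# keystream, plus an HMAC tag over nonce + ciphertext. Real code should use
# a vetted AEAD such as AES-GCM from a maintained crypto library.

def _subkey(key: bytes, label: bytes) -> bytes:
    return hmac.new(key, label, hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered data")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))

class ServerStore:
    """Hypothetical server side: ciphertext is kept (and may sit in backups),
    but the user's key is held only while the feature is enabled."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self._key: Optional[bytes] = None

    def enable(self, device_key: bytes) -> None:
        self._key = device_key  # the user uploads the key kept on their device

    def disable(self) -> None:
        # Cryptoshred: drop our copy of the key; every stored blob is now unreadable.
        self._key = None

    def store(self, name: str, data: bytes) -> None:
        if self._key is None:
            raise PermissionError("feature disabled")
        self.blobs[name] = encrypt(self._key, data)

    def read(self, name: str) -> bytes:
        if self._key is None:
            raise PermissionError("feature disabled")
        return decrypt(self._key, self.blobs[name])
```

Re-enabling is just `enable(device_key)` again with the same key, so nothing has to be rescanned.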
Does that meet the legal requirement to delete data when requested? I am not sure it does.
As far as I know, most data protection laws accept cryptoshredding as long as the party with a deletion requirement actually destroys the key. For one thing, it’s hard to reconcile deletion requirements with immutable architectures and backups without a mechanism like this.
IANAL, but I think the key remaining in the user’s possession doesn’t matter as far as the company with a deletion requirement is concerned.
You do not have to, and should not, start deleting data immediately. We're not uncivilized here; we can schedule tasks.
If this were happening on device (lol), then you should do both the scanning and deleting operations at times of usually low activity, just like how you schedule updates (though Microsoft seems to have forgotten how to do this). Otherwise, doing the operations at toggle time just slams the user's computer, which is a great way to get them to turn it off! We'd especially want the process to run with high niceness (low scheduling priority) and be able to pause itself so it doesn't hinder the user. Make sure they're connected to power, or at least above some battery threshold if on a laptop.
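A minimal sketch of the "only work when the machine is idle" gate, in Python. `SystemState`, `ok_to_work`, `run_batch`, `lower_priority`, and every threshold (5 minutes idle, 30% CPU, 50% battery) are hypothetical; reading real idle time and battery level needs OS-specific APIs that aren't shown here. Work is done in small batches, and the state is re-checked before each item, so the job pauses itself as soon as the user comes back or the battery drops.

```python
import os
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional

@dataclass
class SystemState:
    """Hypothetical snapshot; real values would come from OS-specific APIs."""
    idle_seconds: float                # time since last user input
    cpu_load: float                    # recent average, 0.0 to 1.0
    on_ac_power: bool
    battery_fraction: Optional[float]  # None on machines without a battery

def ok_to_work(s: SystemState, min_idle_s: float = 300.0,
               max_load: float = 0.3, min_battery: float = 0.5) -> bool:
    if s.idle_seconds < min_idle_s or s.cpu_load > max_load:
        return False  # the user is active or the machine is busy
    if s.on_ac_power:
        return True
    return s.battery_fraction is not None and s.battery_fraction >= min_battery

def lower_priority() -> None:
    # POSIX only: raise niceness so the scheduler favors the user's programs.
    if hasattr(os, "nice"):
        os.nice(10)

def run_batch(pending: Deque[str], work: Callable[[str], None],
              get_state: Callable[[], SystemState], batch_size: int = 10) -> int:
    """Process up to batch_size items, pausing early if conditions change.

    Returns how many items were processed; unprocessed items stay queued.
    """
    done = 0
    while pending and done < batch_size:
        if not ok_to_work(get_state()):
            break  # pause; a timer will call us again later
        work(pending.popleft())
        done += 1
    return done
```

A timer would call `run_batch` every minute or so, with `lower_priority()` called once when the worker starts.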
If you scan on device and upload, again, you should do this at times of low activity. But you also are not going to be deleting data right away, because that data is going to be held across several servers. Purging it from all of them takes time. There's a reason your Google Takeout can take a few hours and why companies like Facebook say your data might still be recoverable for 90 days.
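One way to get "not right away" on the server side is a tombstone plus a scheduled purge, sketched below under stated assumptions. `RecordStore`, `request_deletion`, `purge_due`, and the 30-day grace period are hypothetical (Facebook's 90 days is just one real-world figure). A real system would also have to purge replicas and backups, which this in-memory dict doesn't model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

@dataclass
class Record:
    data: bytes
    delete_after: Optional[datetime] = None  # set when the user opts out

class RecordStore:
    """Hypothetical store: hide records on opt-out, physically purge later."""

    def __init__(self, grace: timedelta = timedelta(days=30)) -> None:
        self.grace = grace
        self.records: Dict[str, Record] = {}

    def put(self, rid: str, data: bytes) -> None:
        self.records[rid] = Record(data)

    def request_deletion(self, rid: str, now: datetime) -> None:
        # Cheap and immediate: tombstone the record and schedule the real purge.
        rec = self.records.get(rid)
        if rec is not None and rec.delete_after is None:
            rec.delete_after = now + self.grace

    def get(self, rid: str) -> Optional[bytes]:
        rec = self.records.get(rid)
        if rec is None or rec.delete_after is not None:
            return None  # tombstoned records are never served
        return rec.data

    def purge_due(self, now: datetime) -> List[str]:
        """Run from a scheduled job: remove records whose deadline has passed."""
        due = [rid for rid, rec in self.records.items()
               if rec.delete_after is not None and rec.delete_after <= now]
        for rid in due:
            del self.records[rid]
        return due
```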
Deleting immediately also creates lots of problems. Let's say you enable it, let it run for a while, then toggle back and forth like a madman. Does your toggling send a halt signal to the scanning operation? What does toggling it back on do? Do you really think this is going to happen smoothly without things stepping on each other? You're setting yourself up for a situation where the program is both scanning and deleting at the same time. If this is implemented no better than most things I've seen from Microsoft, then this will certainly happen and you'll end up in an infinite loop. All because you assumed there is no such thing as an orphaned process, or at least no possibility of one. You have to wonder whether these junior programmers with senior titles just don't know how to do parallelization...
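A minimal sketch of one way to make the toggle safe, assuming a single background worker. The toggle only records intent and bumps a generation counter; the worker re-checks that counter before each item and halts if it changed. `ToggleReconciler` and its methods are hypothetical names. Because only `run_once` ever calls the scan or purge callbacks, and only one worker runs it, scanning and deleting can't overlap, and rapid toggling just collapses to the latest state.

```python
import threading
from typing import Callable, Iterable, Tuple

class ToggleReconciler:
    """Hypothetical sketch: toggles record intent; one worker acts on it."""

    def __init__(self, scan_item: Callable[[str], None],
                 purge_item: Callable[[str], None]) -> None:
        self._lock = threading.Lock()
        self._enabled = False
        self._generation = 0  # bumped on every real state change
        self._scan_item = scan_item
        self._purge_item = purge_item

    def toggle(self, enabled: bool) -> None:
        # Called from the UI: cheap, and never starts or stops work directly.
        with self._lock:
            if enabled != self._enabled:
                self._enabled = enabled
                self._generation += 1

    def _snapshot(self) -> Tuple[bool, int]:
        with self._lock:
            return self._enabled, self._generation

    def run_once(self, items: Iterable[str]) -> int:
        """Scan or purge for the current intent; halt if the user toggles again.

        Returns how many items were processed before finishing or halting.
        """
        enabled, gen = self._snapshot()
        action = self._scan_item if enabled else self._purge_item
        done = 0
        for item in items:
            if self._snapshot()[1] != gen:
                break  # intent changed mid-job: stop; the next run picks up the new state
            action(item)  # called without holding the lock, so toggle() never blocks
            done += 1
        return done
```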
In addition to the delay, you should be marking the images in a database to create a queue. Store the hash of the file as the ID and mark its status appropriately. We are queuing our operations and we want to have fail-safes. You're scanning the entire fucking computer, so you don't want to do things haphazardly! Go ahead, take a "move fast and break things" approach, and watch your customers get a blue screen of death and wake up to borked hard drives.
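A rough sketch of that queue using SQLite from Python's standard library. The table layout, status values, and function names are assumptions for illustration. `requeue_stale` is the fail-safe: on startup it returns anything a crash left `in_progress` back to `pending`. This assumes a single worker process; with several workers you'd need a properly atomic claim step.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple

def file_hash(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents, read in chunks so huge files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def open_queue(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scan_queue (
               file_hash TEXT PRIMARY KEY,
               path      TEXT NOT NULL,
               status    TEXT NOT NULL DEFAULT 'pending', -- pending/in_progress/done/failed
               attempts  INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()
    return conn

def enqueue_hash(conn: sqlite3.Connection, fhash: str, path: str) -> bool:
    """Queue by content hash; returns False if identical content is already queued."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO scan_queue (file_hash, path) VALUES (?, ?)",
            (fhash, path),
        )
    return cur.rowcount == 1

def enqueue(conn: sqlite3.Connection, path: str) -> bool:
    return enqueue_hash(conn, file_hash(path), path)

def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[str, str]]:
    """Mark the oldest pending item in_progress and return (hash, path), or None."""
    row = conn.execute(
        "SELECT file_hash, path FROM scan_queue WHERE status = 'pending' "
        "ORDER BY rowid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    with conn:
        conn.execute(
            "UPDATE scan_queue SET status = 'in_progress', attempts = attempts + 1 "
            "WHERE file_hash = ?",
            (row[0],),
        )
    return row[0], row[1]

def mark(conn: sqlite3.Connection, fhash: str, status: str) -> None:
    with conn:
        conn.execute("UPDATE scan_queue SET status = ? WHERE file_hash = ?", (status, fhash))

def requeue_stale(conn: sqlite3.Connection) -> int:
    """Startup fail-safe: items a crash left in_progress go back to pending."""
    with conn:
        cur = conn.execute(
            "UPDATE scan_queue SET status = 'pending' WHERE status = 'in_progress'"
        )
    return cur.rowcount
```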
Seriously, just sit down and think about the problem before you start programming. The whiteboard or pen and paper are some of your most important weapons as a programmer. Your first solution will be shit and that's okay. Your second and even third solutions might be shit too. But there's a reason you need depth. We haven't even gotten into any real depth here either. Our "solution" here has no depth; it's just surface level, and I'm certain the first go will be shit. But you'll figure more stuff out, find more problems, and fix them. I'm also certain others will present other ideas that can be used too. Yay, collaboration! It's all good unless you just pretend you're done and problems don't exist anymore. (Look ma! All the tests pass! We're bug-free!) For Christ's sake, what are you getting a quarter million+ salary for?