Comment by godelski
3 days ago
You do not have to, and should not, start deleting data immediately. We're not uncivilized here; we can schedule tasks.
If this were happening on device (lol), then you should do both the scanning and deleting operations at times of usually low activity, just like how you schedule updates (though Microsoft seems to have forgotten how to do this). Otherwise, doing the operations at toggle time just slams the user's computer, which is a great way to get them to turn it off! We'd especially want the process to have high niceness and be able to pause itself so it doesn't hinder the user. Make sure they're connected to power, or at least above some battery threshold if they're on a laptop.
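A minimal Python sketch of that gating logic, assuming a hypothetical `SystemState` snapshot (idle time, AC power, battery level) that a real implementation would fill in from OS-specific APIs. The 10-minute idle and 50% battery thresholds are arbitrary placeholders. `run_while_allowed` re-checks the gate before every item, so the job pauses as soon as the user comes back and can resume later from the returned remainder.

```python
import os
from dataclasses import dataclass


@dataclass
class SystemState:
    # Hypothetical snapshot; a real implementation would query the OS for these.
    idle_seconds: float      # time since the last user input
    on_ac_power: bool        # plugged in?
    battery_percent: float   # 0-100; ignored when on AC power


def should_run_background_job(state: SystemState,
                              min_idle_seconds: float = 600,
                              min_battery_percent: float = 50) -> bool:
    """Only allow heavy work (scanning/deleting) when the user is away and power is OK."""
    if state.idle_seconds < min_idle_seconds:
        return False  # user is active: back off
    if state.on_ac_power:
        return True
    return state.battery_percent >= min_battery_percent


def lower_priority() -> None:
    """Raise this process's niceness so the scheduler favors the user's work (POSIX only)."""
    if hasattr(os, "nice"):
        os.nice(19)  # increments niceness by 19; the OS clamps it at its maximum


def run_while_allowed(items: list, process, get_state) -> list:
    """Process items one at a time, pausing as soon as the gate closes.

    Returns the items not yet processed so the job can resume later.
    """
    for i, item in enumerate(items):
        if not should_run_background_job(get_state()):
            return items[i:]
        process(item)
    return []
```

Checking the gate per item rather than once at startup is what lets the job get out of the user's way mid-run instead of finishing a long batch first.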
If you scan on device and upload, again, you should do this at times of low activity. But you're also not going to be deleting data right away, because that data is going to be held across several servers, and that migration takes time. There's a reason your Google Takeout can take a few hours and why companies like Facebook say your data might still be recoverable for 90 days.
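The "don't delete right away" idea can be sketched as a tombstone with a grace period: a delete request only records when it was made, and a later purge job removes items once the window has passed. The 90-day window just mirrors the Facebook example; `request_delete`, `cancel_delete`, and `due_for_purge` are hypothetical names, and a real system would also have to propagate the purge to every replica.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window, borrowed from the 90-day example; not a standard value.
GRACE_PERIOD = timedelta(days=90)


def request_delete(tombstones: dict[str, datetime], item_id: str, now: datetime) -> None:
    """Record the first time deletion was requested; repeated requests don't reset the clock."""
    tombstones.setdefault(item_id, now)


def cancel_delete(tombstones: dict[str, datetime], item_id: str) -> None:
    """User changed their mind within the window: just drop the tombstone."""
    tombstones.pop(item_id, None)


def due_for_purge(tombstones: dict[str, datetime], now: datetime,
                  grace: timedelta = GRACE_PERIOD) -> list[str]:
    """Return (sorted) IDs whose deletion was requested at least `grace` ago."""
    return sorted(k for k, t in tombstones.items() if now - t >= grace)
```

A nice side effect is that toggling the feature back on within the window is just `cancel_delete`, not a race against data that is already gone.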
Doing it immediately also creates lots of problems. Let's say you enable it, let it run for a while, then just toggle back and forth like a madman. Does your toggling send the halt signal to the scanning operation? What does toggling it back on do? Do you really think this is going to happen smoothly without things stepping on each other? You're setting yourself up for a situation where the program is both scanning and deleting at the same time. Unless this is implemented better than most things I've seen from Microsoft, this will certainly happen and you'll end up in an infinite loop. All because you assume there's no such thing as an orphaned process, or that one could never happen. You just pray that these junior programmers with senior titles don't know how to do parallelization...
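One way to keep rapid toggling from spawning overlapping scan and delete jobs, sketched in Python under the assumption that a single worker thread runs one job at a time: the toggle only records intent and bumps a generation counter, and a running job checks that counter between items and halts once it has gone stale. `FeatureController` and `run_job` are hypothetical names, not any real Windows API.

```python
import threading


class FeatureController:
    """Hypothetical controller: toggles only record intent; a single worker acts on it.

    Each real state change bumps a generation counter. A running job captures the
    generation when it starts and checks it between steps; if the user has toggled
    since, the job is stale and stops instead of racing a newer scan or delete job.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._enabled = False
        self._generation = 0

    def toggle(self, enabled: bool) -> None:
        with self._lock:
            if enabled != self._enabled:  # ignore no-op toggles
                self._enabled = enabled
                self._generation += 1

    def snapshot(self) -> tuple[bool, int]:
        with self._lock:
            return self._enabled, self._generation

    def is_current(self, generation: int) -> bool:
        with self._lock:
            return generation == self._generation


def run_job(ctrl: FeatureController, items, scan, delete) -> str:
    """Run the job matching the current intent, halting if the intent changes mid-run."""
    enabled, gen = ctrl.snapshot()
    action = scan if enabled else delete
    for item in items:
        if not ctrl.is_current(gen):
            return "halted"  # a newer toggle superseded this job
        action(item)
    return "scanned" if enabled else "deleted"
```

The worker would then simply call `run_job` again after a halt, picking up whatever the latest intent is, so there is never more than one operation touching the data.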
In addition to the delay, you should be marking the images in a database to create a queue. Store the hash of the file as the ID and mark each entry's status appropriately. We are queuing our operations and we want fail-safes. You're scanning the entire fucking computer, so you don't want to do things haphazardly! Go ahead, take a "move fast and break things" approach, and watch your customers get a blue screen of death and wake up to find their hard drives borked.
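A minimal version of that queue using Python's built-in `sqlite3` and `hashlib`, keyed by the file's SHA-256 so the same image is never queued twice. The table name, the status values, and the helper functions are assumptions for illustration; each write runs inside a transaction, so a crash leaves a row in its last committed state rather than half-updated.

```python
import hashlib
import sqlite3

STATUSES = ("pending", "scanned", "delete_pending", "deleted", "failed")  # assumed states


def file_id(data: bytes) -> str:
    # Content hash as the ID: the same image found twice maps to one row.
    # For large files you'd stream chunks through hashlib instead of reading it all.
    return hashlib.sha256(data).hexdigest()


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS queue (
            id     TEXT PRIMARY KEY,
            status TEXT NOT NULL CHECK (status IN {STATUSES})
        )""")
    return conn


def enqueue(conn: sqlite3.Connection, data: bytes) -> str:
    fid = file_id(data)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO queue (id, status) VALUES (?, 'pending')", (fid,))
    return fid


def mark(conn: sqlite3.Connection, fid: str, status: str) -> None:
    with conn:  # an invalid status violates the CHECK constraint and is rolled back
        conn.execute("UPDATE queue SET status = ? WHERE id = ?", (status, fid))


def next_with_status(conn: sqlite3.Connection, status: str = "pending"):
    row = conn.execute("SELECT id FROM queue WHERE status = ? LIMIT 1", (status,)).fetchone()
    return row[0] if row else None
```

Because the state lives in the database rather than in a running process, a worker that dies halfway through can be restarted and will pick up exactly where the queue says it left off.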
> unless you have some clever way I’ve not thought of?
Seriously, just sit down and think about the problem before you start programming. The whiteboard or pen and paper are some of your most important weapons as a programmer. Your first solution will be shit and that's okay. Your second and even third solution might be shit too. But there's a reason you need depth. We haven't even gotten into any real depth here either. Our "solution" here has no depth, it's just the surface level and I'm certain the first go will be shit. But you'll figure more stuff out, find more problems, and fix them. I'm also certain others will present other ideas that can be used too. Yay, collaboration! It's all good unless you just pretend you're done and problems don't exist anymore. (Look ma! All the tests pass! We're bug-free!) For Christ's sake, what are you getting a quarter million+ salary for?