Comment by Aurornis
3 days ago
They could have avoided the negative press by changing the requirement to be that you can’t re-enable the feature after switching it off 3 times per year.
It’s not hard to guess the problem: Steady state operation will only incur scanning costs for newly uploaded photos, but toggling the feature off and then on would trigger a rescan of every photo in the library. That’s a potentially very expensive operation.
If you’ve ever studied user behavior you’ve discovered situations where users toggle things on and off in attempts to fix some issue. Normally this doesn’t matter much, but when a toggle could potentially cost large amounts of compute you have to be more careful.
For the privacy sensitive user who only wants to opt out this shouldn’t matter. Turn the switch off, leave it off, and it’s not a problem. This is meant to address the users who try to turn it off and then back on every time they think it will fix something. It only takes one bad SEO spam advice article about “How to fix _____ problem with your photos” that suggests toggling the option to fix some problem to trigger a wave of people doing it for no reason.
> Turn the switch off, leave it off, and it’s not a problem.
Assuming that it doesn't mysteriously (due to some error or update, no doubt) move back to the on position by itself.
I cancelled Facebook in part due to a tug-of-war over privacy defaults. They kept getting updated with some corporate pablum about how opting in benefited the user. It was just easier to permanently opt out via account deletion rather than keep toggling the options. I have no doubt Microsoft will do the same. I'm wiping my Windows partition and loading SteamOS or some variant and dual booting into some TBD Linux distro for development.
When I truly need Windows, I have an ARM VM in Parallels. Right now it gets used once a year at tax time.
Is Linux tax software really that bad?
Oh the one you toggle will be off.
But tomorrow they’ll add a new feature, with a different toggle, that does the same thing but will be distinct enough. That toggle will default on, and you’ll find it in a year and a half after it’s been active.
Control over your data is an illusion. The US economy is built upon corporations mining your data. That’s why ML engineers got to buy houses in the 2010s, and it’s why ML/AI engineers get to buy houses in the 2020s.
I agree this is a concern, but it frustrates me that tech companies won't give us reasonable options.
- "Scan photos I upload" yes/no. No batch processing needed, only affects photos from now on.
- "Delete all scans (15,101)" if you are privacy conscious
- "Scan all missing photos (1,226)" can only be done 3x per year
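A minimal sketch of how the three options above could fit together, assuming a toy in-memory library. The names `PhotoLibrary`, `scan_missing`, and `MAX_BACKFILLS_PER_YEAR` are made up for illustration and do not come from any real product. The point is that only the expensive batch backfill is rate limited, while deleting scans stays unlimited.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

MAX_BACKFILLS_PER_YEAR = 3  # the "3x per year" limit from the list above
YEAR = timedelta(days=365)


@dataclass
class PhotoLibrary:
    # photo id -> tags, or None if the photo currently has no scan data
    scans: dict = field(default_factory=dict)
    scan_on_upload: bool = False
    backfill_times: list = field(default_factory=list)

    def upload(self, photo_id):
        # Only the new photo is touched, so there is no batch cost here.
        self.scans[photo_id] = ["tagged"] if self.scan_on_upload else None

    def delete_all_scans(self):
        # Never rate limited: removing scan data is always allowed.
        for photo_id in self.scans:
            self.scans[photo_id] = None

    def missing(self):
        return [p for p, tags in self.scans.items() if tags is None]

    def scan_missing(self, now):
        # The expensive batch operation is the only one that is limited.
        recent = [t for t in self.backfill_times if now - t < YEAR]
        if len(recent) >= MAX_BACKFILLS_PER_YEAR:
            return False
        for photo_id in self.missing():
            self.scans[photo_id] = ["tagged"]
        self.backfill_times = recent + [now]
        return True
```

With this shape, the counts shown in the button labels would simply be `len(lib.missing())` and the number of photos that do have tags.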
"But users are dummies who cannot understand anything!" Not with that attitude they can't.
> - "Scan photos I upload" yes/no. No batch processing needed, only affects photos from now on.
This would create a situation where some of the photos have tags and some don’t. Users would forget why the behavior is different across their library.
Their solution? Google it and start trying random suggestions. Toggle it all on and off. Delete everything and start over with rescanning. This gets back to the exact problem they’re trying to avoid.
> - "Scan all missing photos (1,226)" can only be done 3x per year
There is virtually no real world use case where someone would want to stop scanning new photos but also scan all photos but only when they remember to press this specific button. The number of users who would get confused and find themselves in unexpected states of half-scanned libraries would outweigh the number of intentional uses of this feature by 1000:1 or more.
I spent about 30s on those options, but ok, I'll bite.
> Google it and start trying random suggestions.
If the options were indeed as I suggested, why would the top Google result not say "click the very clearly labelled 'scan missing photos' button"?
Google search results are useless when tech companies don't empower users with clear control over their data. Users are reduced to superstitious peasants not because that's their nature, but because they are not given the capability to act otherwise.
Tell you what, Microsoft: turn it off, leave it off, remove it, fire the developers who made it, forget you ever had the idea. Bet that saved some processing power?
Most of us wouldn't mind if the limitation was that you can't opt IN more than 3 times/year, but of course Microsoft dark patterned it to limit the opt outs.
That would be a wild way to implement this feature.
I mean it's Microsoft so I wouldn't be surprised if it was done in the dumbest way possible but god damn this would be such a dumb way to implement this feature.
This would be because of the legal requirement to purge (erase) all the previous scan data once a user opts out. So the only way to re-enable is to scan everything again — unless you have some clever way I’ve not thought of?
Encrypt the data and store the key on the user's device. If the user enables the feature, they transmit their key to you. If they disable the feature, you delete the key on your side.
In theory, you could store a private key on the device and cryptoshred the data on Microsoft’s servers when the setting is disabled (Microsoft deletes their copy of the key). Then, when the feature is re-enabled, upload the private key to Microsoft again.
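A rough sketch of the key-handling idea in the two comments above, assuming a toy `Server` class and a device-held key. Everything here is hypothetical, and `toy_cipher` is NOT real cryptography: it is a SHA-256 keystream XOR used only so the example runs on the standard library. A real system would use a vetted authenticated cipher.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    # Illustration only, not secure. XOR with a hash-derived keystream;
    # applying it twice with the same key returns the original bytes.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Server:
    def __init__(self):
        self.ciphertexts = {}  # photo id -> encrypted scan result
        self.key = None        # held only while the feature is enabled

    def enable(self, key_from_device: bytes):
        # The device transmits its key when the user opts in.
        self.key = key_from_device

    def disable(self):
        # Crypto-shredding: drop the key, keep the unreadable ciphertext.
        self.key = None

    def store_scan(self, photo_id: str, tags: str):
        if self.key is None:
            raise PermissionError("feature is disabled")
        self.ciphertexts[photo_id] = toy_cipher(self.key, tags.encode())

    def read_scan(self, photo_id: str) -> str:
        if self.key is None:
            raise PermissionError("feature is disabled")
        return toy_cipher(self.key, self.ciphertexts[photo_id]).decode()


# The only long-lived copy of the key stays on the user's device.
device_key = secrets.token_bytes(32)
```

Re-enabling then costs one key upload instead of a full rescan. The scheme still depends on trusting the server to actually discard its copy of the key.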
You do not have to, and should not, start deleting data immediately. We're not uncivilized here, we can schedule tasks.
If this were happening on device (lol) then you should do both the scanning and deleting operations at times of usually low activity. Just like how you schedule updates (though Microsoft seems to have forgotten how to do this). Otherwise, doing the operations at toggle time just slams the user's computer, which is a great way to get them to turn it off! We'd especially want the process to have high niceness and be able to pause itself to not hinder the user. Make sure they're connected to power or at least above some threshold in battery if on a laptop.
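As a sketch of that gating logic, assuming the OS can report idle time and power state: `DeviceState`, `may_run_background_work`, and both thresholds are invented for illustration. Process priority is left out; on Unix it could be lowered with `os.nice`.

```python
from dataclasses import dataclass

MIN_BATTERY_PERCENT = 50  # assumed threshold, pick whatever fits
MIN_IDLE_SECONDS = 600    # assumed cutoff for "the user is away"


@dataclass
class DeviceState:
    idle_seconds: int
    on_ac_power: bool
    battery_percent: int


def may_run_background_work(state: DeviceState) -> bool:
    # Require power (plugged in, or enough battery) and an idle user.
    has_power = state.on_ac_power or state.battery_percent >= MIN_BATTERY_PERCENT
    return has_power and state.idle_seconds >= MIN_IDLE_SECONDS


def run_batch(items, get_state, process):
    # Re-check the device before every item so the job pauses itself as
    # soon as the user comes back. Returns whatever is left to do later.
    remaining = list(items)
    while remaining and may_run_background_work(get_state()):
        process(remaining.pop(0))
    return remaining
```

The same gate works for both scanning and deleting, since `process` is just a callback.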
If you scan on device and upload, again, you should do this at times of low activity. But you also are not going to be deleting data right away because that is going to be held across several servers. That migration takes time. There's a reason your Google Takeout can take a few hours and why companies like Facebook say your data might still be recoverable for 90 days.
Doing so immediately also creates lots of problems. Let's say you enable, let it go for a while, then just toggle back and forth like a madman. Does your toggling send the halt signal to the scanning operation? What does the toggling on option do? Do you really think this is going to happen smoothly without things stepping on each other? You're setting yourself up for a situation where the program is both scanning and deleting at the same time. Unless this is implemented better than most things I've seen from Microsoft, this will certainly happen and you'll be in an infinite loop. All because you assumed there is no such thing as, or no possibility of, an orphaned process. You just pray that these junior programmers with a senior title just don't know how to do parallelization...
In addition to the delay you should be marking the images in a database to create a queue. Store the hash of the file as the ID and mark appropriately. We are queuing our operations and we want to have fail safes. You're scanning the entire fucking computer so you don't want to do things haphazardly! Go ahead, take a "move fast and break things" approach, and watch your customers get a blue screen of death and wake up to having their hard drives borked.
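Something like the following is what I mean by a queue keyed on the file hash. It is a sketch using SQLite from the standard library; the table layout and the `pending` / `done` / `failed` states are my own assumptions, not anything a real product is known to use.

```python
import hashlib
import sqlite3


def file_id(content: bytes) -> str:
    # The content hash is the ID, so the same file is never queued twice.
    return hashlib.sha256(content).hexdigest()


def open_queue(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id TEXT PRIMARY KEY,"
        " state TEXT NOT NULL CHECK (state IN ('pending', 'done', 'failed')))"
    )
    return db


def enqueue(db, content: bytes) -> str:
    fid = file_id(content)
    with db:  # commits on success, rolls back on error
        # OR IGNORE makes re-queuing an already known file a no-op.
        db.execute("INSERT OR IGNORE INTO queue (id, state) VALUES (?, 'pending')", (fid,))
    return fid


def process_next(db, work) -> bool:
    row = db.execute("SELECT id FROM queue WHERE state = 'pending' LIMIT 1").fetchone()
    if row is None:
        return False
    try:
        work(row[0])
        state = "done"
    except Exception:
        state = "failed"  # fail safe: record the failure and keep going
    with db:
        db.execute("UPDATE queue SET state = ? WHERE id = ?", (state, row[0]))
    return True
```

Because the state lives in the database, a crash or reboot mid-scan leaves a queue you can resume instead of a half-finished mystery.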
Seriously, just sit down and think about the problem before you start programming. The whiteboard or pen and paper are some of your most important weapons as a programmer. Your first solution will be shit and that's okay. Your second and even third solution might be shit too. But there's a reason you need depth. We haven't even gotten into any real depth here either. Our "solution" here has no depth, it's just the surface level and I'm certain the first go will be shit. But you'll figure more stuff out and find more problems and fix them. I'm also certain others will present other ideas that can be used too. Yay, collaboration! It's all good unless you just pretend you're done and problems don't exist anymore. (Look ma! All the tests pass! We're bug free!) For christ's sake, what are you getting a quarter million+ salary for?
Disabling the feature would purge the data. That’s the intent.
If disabling the feature kept the data, that would be a real problem.
I don’t know why you think it’s dumb that they purge the data when you turn a feature off. That’s what you want.
I think you should have read the other comments before responding. There are multiple solutions here. And note that my answer is suggesting a delay so we don't hammer the user's computer. Toggling should schedule the event, not initiate it. You've oversimplified the problem, treating it as if operations can be performed instantaneously and that they are all performed locally.
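To make "schedule, don't initiate" concrete, here is a minimal sketch. `ToggleScheduler`, `GRACE_SECONDS`, and the one-day delay are all hypothetical. A toggle only records a pending job, and flipping the switch back before the job runs cancels it, so rapid toggling never starts a scan and a purge at the same time.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 24 * 60 * 60  # assumed delay before any work starts


@dataclass
class Pending:
    action: str    # "scan" or "purge"
    run_at: float  # timestamp after which the job may run


class ToggleScheduler:
    def __init__(self, enabled: bool = False):
        self.applied = enabled  # the state the stored data reflects
        self.pending: Optional[Pending] = None

    def toggle(self, enabled: bool, now: float):
        if enabled == self.applied:
            # Switched back before anything ran: cancel, nothing to undo.
            self.pending = None
        else:
            action = "scan" if enabled else "purge"
            # Replaces any earlier pending job, so at most one exists.
            self.pending = Pending(action, now + GRACE_SECONDS)

    def tick(self, now: float) -> Optional[str]:
        # Called by a background worker; returns the job to run, if due.
        if self.pending and now >= self.pending.run_at:
            action = self.pending.action
            self.applied = action == "scan"
            self.pending = None
            return action
        return None
```

Someone who flips the switch four times in a minute ends up with zero jobs queued, which is the whole point.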