Comment by chmaynard
7 years ago
> Facebook and other services do not actually delete data, they just mark it as "deleted" and purge it only if they need the space.
You may be correct, but that doesn't explain why Facebook decided to include so-called deleted files in a download of user data. Clearly these deleted files are still a part of Facebook user profiles and accessible to company data mining software. Facebook has exposed their own duplicity.
Maybe Facebook's development processes and tracking of tech debt are just shit. First person: "I'll just flag the content and then it won't show on their timeline!" Second person: "I'll just select all the records that belong to this account when packaging a backup. All the deleted content should be gone!"
But I wouldn't discount your hypothesis.
When storage is cheap, it's rational to develop the delete flag first and think about cleanup later, which means never. The download content thing seems like a low priority project and the poor intern who probably did it didn't want to figure out how each store keeps the delete flag. At least it's honest. Would you be surprised if a dd of your SD card showed your deleted photos?
Storage being cheap is irrelevant. When a user requests that their data be deleted, you delete it. Outside of government compliance, there is no reason not to comply with that request.
Maybe they should get someone from the internal department that monitors Facebook employees to come over and show them how to run a tight ship:
https://www.theguardian.com/technology/2018/mar/16/silicon-v...
>Facebook has exposed their own duplicity.
Or possibly they just screwed up. Perhaps the "soft delete" was originally intended to allow "undelete" by the user with delayed purge, and/or single-instance storage with reference counting that they never quite got around to finishing.
> but that doesn't explain why Facebook decided to include so-called deleted files in a download of user data.
This happened because the person tasked with writing the code to build the archive forgot to include the filter for "deleted" records somewhere in the code.
I.e., they forgot the "where is_deleted = false" part below on one or more DB query requests like this:
select * from table where is_deleted = false;
This is the biggest problem with the "soft delete flag in database" method of deletion. Every single query writer, everywhere, forever, must always remember to include the "is_deleted" filter in their queries. And when they don't, what was deleted reappears as if it had never been deleted at all.
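To make that failure mode concrete, here is a minimal sketch using Python's sqlite3 module. The table name `posts`, its columns, and the functions `timeline` and `export_archive` are all made up for illustration; nothing here is claimed to resemble Facebook's actual schema or code.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one live and one soft-deleted post."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " body TEXT NOT NULL,"
        " is_deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO posts (user_id, body) VALUES (?, ?)",
        [(1, "keep me"), (1, "delete me")],
    )
    # The "delete" only flips a flag; the row stays in the table.
    conn.execute("UPDATE posts SET is_deleted = 1 WHERE body = 'delete me'")
    return conn


def timeline(conn, user_id):
    """Query written by someone who remembered the filter."""
    rows = conn.execute(
        "SELECT body FROM posts WHERE user_id = ? AND is_deleted = 0 ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]


def export_archive(conn, user_id):
    """Query written by someone who forgot the filter."""
    rows = conn.execute(
        "SELECT body FROM posts WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(timeline(conn, 1))
    print(export_archive(conn, 1))
```

The two queries differ by a single predicate, and nothing in the schema stops the second one from being written.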
If you have soft-deleted user data, then you have user data, so you had better include it.
That is a good point, but flagging shouldn’t be the end of the line for soft deleted data. There should be a process going back and removing everything that was flagged for deletion, prioritized to guarantee deletion within a set time frame but without impacting performance. Meanwhile, most queries should be done through a view that automatically masks out any flagged data. It’s a basic data integrity feature that shouldn’t be left to their API (which is such a fast moving target that one developer doesn’t know what the other is doing much of the time).