Comment by pwg

7 years ago

> but that doesn't explain why Facebook decided to include so-called deleted files in a download of user data.

This happened because the person tasked with writing the code to build the archive forgot to include the filter for "deleted" records somewhere in the code.

I.e., they forgot the "where is_deleted = false" clause in one or more database queries, which should have looked something like this:

select * from some_table where is_deleted = false;

This is the biggest problem with the "soft delete flag in database" method of deletion. Every single query writer, everywhere, forever, must always remember to include the "is_deleted" filter in their queries. And when they don't, what was deleted reappears as if it had never been deleted at all.
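To make the failure mode concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `posts` table, its columns, and the `visible_posts`/`export_posts` functions are hypothetical. SQLite stores the flag as 0/1 rather than as a true boolean. The only difference between the two queries is the missing filter, and that one omission is enough to bring the "deleted" row back.

```python
import sqlite3

# Hypothetical schema: a "posts" table that uses a soft-delete flag column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT, "
    "is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO posts (user_id, body, is_deleted) VALUES (?, ?, ?)",
    [(1, "hello", 0), (1, "regretted post", 1), (1, "another", 0)],
)

def visible_posts(user_id):
    # Correct: this query remembers the is_deleted filter.
    return conn.execute(
        "SELECT body FROM posts WHERE user_id = ? AND is_deleted = 0 ORDER BY id",
        (user_id,),
    ).fetchall()

def export_posts(user_id):
    # Bug: the filter was forgotten, so soft-deleted rows come back.
    return conn.execute(
        "SELECT body FROM posts WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()

print(visible_posts(1))  # [('hello',), ('another',)]
print(export_posts(1))   # [('hello',), ('regretted post',), ('another',)]
```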

If you have soft-deleted user data, then you have user data, so you had better include it.

  • That is a good point, but flagging shouldn't be the end of the line for soft-deleted data. There should be a background process that goes back and removes everything flagged for deletion, prioritized so that deletion is guaranteed within a set time frame without impacting performance. Meanwhile, most queries should go through a view that automatically masks out any flagged data. It's a basic data integrity feature that shouldn't be left to their API (which is such a fast-moving target that one developer often doesn't know what another is doing).
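The reply's two suggestions, a view that hides flagged rows and a background job that eventually hard-deletes them, can be sketched the same way. This is an illustration on SQLite under stated assumptions, not how Facebook or any particular system does it: the `deleted_at` column, the `live_posts` view, and the `soft_delete`/`purge_batch`/`purge_all` helpers are hypothetical, and the batch size and pause are placeholders that would need tuning.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical schema: is_deleted is the soft-delete flag; deleted_at records
# when the row was flagged so the purge job can work oldest-first.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, "
    "is_deleted INTEGER NOT NULL DEFAULT 0, deleted_at REAL)"
)
# Application code reads this view instead of the base table, so the
# is_deleted filter lives in exactly one place.
conn.execute(
    "CREATE VIEW live_posts AS SELECT id, body FROM posts WHERE is_deleted = 0"
)

def soft_delete(post_id):
    # Flag and timestamp the row; the purge job removes it for real later.
    with conn:
        conn.execute(
            "UPDATE posts SET is_deleted = 1, deleted_at = ? WHERE id = ?",
            (time.time(), post_id),
        )

def purge_batch(batch_size):
    # Hard-delete up to batch_size flagged rows, oldest flag first.
    with conn:
        cur = conn.execute(
            "DELETE FROM posts WHERE id IN ("
            "SELECT id FROM posts WHERE is_deleted = 1 "
            "ORDER BY deleted_at LIMIT ?)",
            (batch_size,),
        )
    return cur.rowcount

def purge_all(batch_size=100, pause_s=0.0):
    # Loop over small batches, pausing between them, until nothing is flagged.
    total = 0
    while (n := purge_batch(batch_size)) > 0:
        total += n
        time.sleep(pause_s)
    return total

with conn:
    conn.executemany("INSERT INTO posts (body) VALUES (?)", [("a",), ("b",), ("c",)])
soft_delete(2)
print(conn.execute("SELECT body FROM live_posts ORDER BY id").fetchall())  # [('a',), ('c',)]
print(purge_all(batch_size=1))  # 1
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```

Deleting in small batches, oldest flag first, keeps each transaction short so foreground queries are not blocked for long. Meeting a deletion deadline then comes down to running `purge_all` (for example from a scheduler) often enough that no flagged row waits past it.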