Comment by dividuum
7 years ago
> I am sure that the deletion of media files in services like Facebook has never meant to be absolute. Many of my colleagues believe the same thing that I believe: Facebook and other services do not actually delete data, they just mark it as "deleted" and purge it only if they need the space.
No need to just believe it. You can read about the storage architecture used to store photos in a post from 2009 here: https://code.facebook.com/posts/685565858139515/needle-in-a-.... Obviously that might have, and probably has, changed since, but at least at some point that was exactly true.
Quote:
"The delete operation is simple – it marks the needle in the haystack store as deleted by setting a “deleted” bit in the flags field of the needle. However, the associated index record is not modified in any way so an application could end up referencing a deleted needle. A read operation for such a needle will see the “deleted” flag and fail the operation with an appropriate error. The space of a deleted needle is not reclaimed in any way. The only way to reclaim space from deleted needles is to compact the haystack (see below)."
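The quoted behaviour can be sketched in a few lines. This is only a toy model under assumptions, not Facebook's code: the names `Needle`, `Haystack`, `FLAG_DELETED` and `stored_bytes` are made up for illustration, and an in-memory list stands in for the on-disk store file.

```python
from dataclasses import dataclass

FLAG_DELETED = 0x1  # hypothetical flag bit, chosen for this sketch


@dataclass
class Needle:
    key: int
    data: bytes
    flags: int = 0


class Haystack:
    """Toy append-only store: needles are appended, never removed in place."""

    def __init__(self):
        self.needles = []  # stands in for the on-disk store file
        self.index = {}    # key -> position in self.needles

    def write(self, key, data):
        self.index[key] = len(self.needles)
        self.needles.append(Needle(key, data))

    def delete(self, key):
        # Only the flag changes; the index entry and the bytes stay put.
        self.needles[self.index[key]].flags |= FLAG_DELETED

    def read(self, key):
        needle = self.needles[self.index[key]]
        if needle.flags & FLAG_DELETED:
            raise KeyError(f"needle {key} is deleted")
        return needle.data

    def compact(self):
        # Copy live needles into a fresh store; deleted bytes are dropped here.
        live = [n for n in self.needles if not n.flags & FLAG_DELETED]
        self.needles = live
        self.index = {n.key: i for i, n in enumerate(live)}

    def stored_bytes(self):
        return sum(len(n.data) for n in self.needles)
```

In this model, `delete()` makes reads fail immediately, but `stored_bytes()` only shrinks after `compact()` runs, which is the gap the thread is arguing about.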
Allow me to ask the obvious question.
Who doesn't do something like this?
Not to absolve Facebook of blame, but who's to say data on almost every other social media service isn't also just flagged for deletion?
We don't soft delete payloads at Raygun (https://raygun.com), for the simple reason that if one of our customers wants to delete something, it's typically because they sent something they don't want a third party to have. We have filters and other PII filtering tools etc., but every now and then something might be sent by mistake.
Having said that, you'd be amazed how often folks ask for things to be undeleted (despite a big warning dialog).
Clearly developers pervasively believe soft deletes are occurring everywhere.
It isn’t that hard to combine soft deletes with delayed hard deletes: generate a new encryption key every day for “data deleted today”, and encrypt deleted data with it. After X days, destroy the decryption key.
If you use asymmetric encryption, you can keep the group of people who can recover “deleted data” small. You could even have an independent party generate your encryption key pair, give you the encryption key, and your customer, on request, the decryption key (I think there is a business model for a non-profit here).
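A minimal sketch of the daily-key idea, under loud assumptions: `DeletedDataVault` and its methods are hypothetical names, the sketch uses one symmetric key per day rather than the asymmetric variant, and `_keystream_xor` is a toy cipher built from SHA-256 purely so the example runs on the standard library. A real system would use a vetted authenticated cipher and proper key storage.

```python
import hashlib
import secrets
from datetime import date, timedelta


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher (SHA-256 in counter mode). Illustration only,
    # NOT suitable for protecting real data.
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class DeletedDataVault:
    def __init__(self, retention_days: int):
        self.retention = timedelta(days=retention_days)
        self.keys = {}   # day -> key for "data deleted that day"
        self.blobs = {}  # item id -> (day, nonce, ciphertext)

    def soft_delete(self, item_id, plaintext: bytes, today: date):
        if today not in self.keys:
            self.keys[today] = secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        ciphertext = _keystream_xor(self.keys[today], nonce, plaintext)
        self.blobs[item_id] = (today, nonce, ciphertext)

    def undelete(self, item_id) -> bytes:
        day, nonce, ciphertext = self.blobs[item_id]
        key = self.keys[day]  # raises KeyError once that day's key is gone
        return _keystream_xor(key, nonce, ciphertext)

    def expire_keys(self, today: date):
        # Destroying the key is the "hard delete": the ciphertext may
        # linger (even in backups) but can no longer be decrypted.
        for day in [d for d in self.keys if today - d >= self.retention]:
            del self.keys[day]
```

The appeal of this design is that only the small key table has to be reliably destroyed; the bulky encrypted blobs can be cleaned up whenever convenient.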
That's great you guys do that. But it can't be proven, so why take your word for it?
Ultimately, it's the trust that is the problem, and that is what needs to be removed, either through new technology or legislation or both.
And why offer the false sense of security?
If they upload a private key and delete it because they "don't want a third party to have" it, do you also guarantee it wasn't seen or cached anywhere else? I don't know the details of that product, but I usually treat anything uploaded even once as compromised from that point on.
How do you handle:
1. Deletions from backups
2. Re-deleting material that had already been deleted before a backup was restored (and that the restore brings back)?
> Not to absolve Facebook of blame, but who's to say data on almost every other social media service isn't also just flagged for deletion?
The word "delete" has a pretty clear definition to most users. Facebook is one of the most used pieces of software in the world. If FB is allowed to lie to its users, it would indeed give a pass to just about every social media service out there.
The reason Facebook is special, and deserves special scrutiny, is because of its power. If FB establishes a bad behavior, it will become the norm.
A more prudent question would be whether these tech companies should be reined in by federal privacy law. Should they be allowed to collect, trade and analyze private data on all of their users? Where do we draw the line in terms of what's acceptable and what's not?
These are incredibly important questions. A related field would be the credit bureaus, such as Equifax: global companies that store Social Security numbers and all sorts of other information. We need a national set of rules for these companies to follow.
Not keeping my hopes up, given our Congress is so dysfunctional these days.
I would think undelete also has a clear definition. Technical implementation is orthogonal to these kinds of definitions.
Does it make it OK for Facebook to do it just because other similar companies do it? I say no, all of them should delete something I say to delete. And "everyone does it" makes it a bigger problem, not a smaller one.
A lot of the big agile companies are using event sourcing. So there isn't even a model to delete. It's all events with the models being created from a snapshot of events. The event stream is usually durable and lives forever.
https://martinfowler.com/eaaDev/EventSourcing.html
So with this type of system nothing is ever "deleted". It's just an event that something is deleted.
This is a common and very scalable system. You don't deal with models, you deal with events (and a model is a snapshot of events).
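To make the event sourcing point concrete, here is a small sketch. The event names (`PhotoUploaded`, `PhotoDeleted`) and the `project` function are assumptions made up for illustration, not taken from any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # "PhotoUploaded" or "PhotoDeleted" (hypothetical names)
    photo_id: int
    payload: str = ""


def project(events):
    """Fold the event stream into the current model (a snapshot)."""
    photos = {}
    for e in events:
        if e.kind == "PhotoUploaded":
            photos[e.photo_id] = e.payload
        elif e.kind == "PhotoDeleted":
            photos.pop(e.photo_id, None)
    return photos


# The log is append-only: "deleting" photo 1 just adds a third event.
log = [
    Event("PhotoUploaded", 1, "cat.jpg"),
    Event("PhotoUploaded", 2, "dog.jpg"),
    Event("PhotoDeleted", 1),
]

current = project(log)      # {2: "dog.jpg"}
earlier = project(log[:2])  # {1: "cat.jpg", 2: "dog.jpg"}
```

The current model no longer shows photo 1, but replaying the log up to any earlier point brings it back, because the upload event and its payload are still in the stream.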
This is an issue even without event sourcing. Companies that use a traditional model architecture instead still have backups. You ask for something to be deleted and they might actually delete it, but what about last week's backup? It's not deleted there.
Why would they? They implement their system the way they want to. Also, this is a completely logical way to deal with deletions. This is what I would do (and what I have done, when I created a simple CMS). I don't want an endless quarrel with a customer who "accidentally" deleted something and wants it back. I just flip the switch and it is back.
I recall the distinction being made very clear on LiveJournal between "deleted" content vs "purged". I would be very surprised if they were not being forthright about this. Of course this was 10 years ago, before the Russian ownership. So I do have reason to believe that not all companies act in deceitful ways when it comes to retention of user data.
Snapchat claims to. And after the trouble they once got into for not deleting media after it was viewed, I believe them.
https://www.snap.com/en-GB/privacy/privacy-policy/
Why would you believe them after they have lied to you once already?
Yeah, I don't see much difference between this and hitting delete on a file in a local file system. The data itself still sits there until the sectors get reclaimed, but there is no longer a file name or directory entry associated with them.
The database we use (Vertica) works this way. Nothing is deleted. Instead it is flagged as deleted. A background task may purge old data (older than x). Historical queries show the database state as it was days or weeks ago. If the background task is broken (bug?) then the data stays on disk indefinitely.
So, like a filesystem more or less
I might just "help" them by uploading more data I guess
File systems eventually overwrite that data, though. FB's system specifically never reclaims it. Why on earth would you ever do that, unless you have absolutely no respect for your users' wishes?
Not standing up for FB's other practices, but from a technical standpoint there are several reasons, none of which are about not having respect.
- disk space is cheap
- deletes are expensive (time) and slow
- deletes are harder to scale
- you can't revert a real delete
- deletes don't fit into an event sourcing architecture
- append-only data is better and more durable
I could go on.
I assume that there might be technical reasons to do it that way.
For example: a soft delete may be just a stronger version of public vs private settings. The whole software infrastructure still assumes a link exists and doesn’t need to cover cases where it really isn’t there. I could see how that makes maintaining indexes etc easier.
Flipping a flag and then filtering out results down the line based on the delete setting is probably much easier than actively removing them from an index.
And if deleting is rare (it probably is), then the performance and resource impact should be minimal.
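A rough sketch of the flag-and-filter approach, using SQLite from the Python standard library. The `posts` table, the `deleted` column and the helper names are all hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " body TEXT,"
    " deleted INTEGER NOT NULL DEFAULT 0)"  # the soft-delete flag
)
conn.executemany(
    "INSERT INTO posts (id, body) VALUES (?, ?)",
    [(1, "hello"), (2, "oops")],
)


def soft_delete(post_id):
    # No row is removed; other tables can keep pointing at this id.
    conn.execute("UPDATE posts SET deleted = 1 WHERE id = ?", (post_id,))


def undelete(post_id):
    conn.execute("UPDATE posts SET deleted = 0 WHERE id = ?", (post_id,))


def visible_posts():
    # Every read path has to remember this filter.
    rows = conn.execute(
        "SELECT id, body FROM posts WHERE deleted = 0 ORDER BY id"
    )
    return rows.fetchall()


def stored_rows():
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


soft_delete(2)
```

After `soft_delete(2)` the post disappears from `visible_posts()` while `stored_rows()` still counts it, and `undelete(2)` brings it straight back. That convenience is the whole point, and also the whole complaint.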
> unless you have absolutely no respect for your users' wishes?
Hehe, you mean, like... Facebook? They respect advertisers with money, not users.
What video would you upload? Just nonsense? Is this to become the modern equivalent of a "black fax"?
https://en.wikipedia.org/wiki/Black_fax
A browser add-on to automate this "helping" might make the world a better place.
Banning accounts that use this add-on (breach of ToS) would be a formality for Facebook, if this were to become an issue in the first place (it's unlikely that a sufficient number of people will bother doing this).