Facebook Secretly Saved Videos Users Deleted

7 years ago (nymag.com)

This reminds me of the long conversations that I used to have with family members and friends several years ago. They continuously asked me to create my own Facebook profile so I could keep in contact with them, follow their activities, and share my whereabouts. I always used the same argument to reject these suggestions: "I don't want Facebook to have too much data about me, more than the data that you already provided".

I got used to the looks of disbelief, thinking that I was some sort of hermit, an antisocial.

I also got tired of answering the frequent "Why don't you have Facebook?" questions.

I remember the last time I had this conversation with someone, last year (2017) around August. I found a new love partner, and after the long intimate talks on the phone, they requested the usual "intimate pictures", not necessarily sexual but certainly sexy. While I have no taboos with regards to my sexuality, having an understanding of how the Internet works, I have always refused to send that type of images/videos/audios, and I always tried to be patient and explain my constant refusals to the other person. Unfortunately, expecting a non-tech-savvy person to understand how data moves around the Internet is most of the time based on hope, and even if they understand, they ultimately don't care because the result doesn't change: you don't get to share something with them, and that affects personal interactions.

I am sure that the deletion of media files in services like Facebook was never meant to be absolute. Many of my colleagues believe the same thing that I believe: Facebook and other services do not actually delete data, they just mark it as "deleted" and purge it only if they need the space. It is the same way a hard drive works: you don't really delete a picture when you hit the "delete" key, or even when you empty the "trash" folder. The data is still there, where it was; it just loses the links to the metadata.

It is sad how this information becomes news only when bad things happen.

  • > I am sure that the deletion of media files in services like Facebook was never meant to be absolute. Many of my colleagues believe the same thing that I believe: Facebook and other services do not actually delete data, they just mark it as "deleted" and purge it only if they need the space.

    No need to just believe it. You can read about the storage architecture used to store photos in this 2009 post: https://code.facebook.com/posts/685565858139515/needle-in-a-.... Obviously that might have changed since, and probably has, but at least at some point that was exactly true.

    Quote:

    "The delete operation is simple – it marks the needle in the haystack store as deleted by setting a “deleted” bit in the flags field of the needle. However, the associated index record is not modified in any way so an application could end up referencing a deleted needle. A read operation for such a needle will see the “deleted” flag and fail the operation with an appropriate error. The space of a deleted needle is not reclaimed in any way. The only way to reclaim space from deleted needles is to compact the haystack (see below)."

    • Allow me to ask the obvious question.

      Who doesn't do something like this?

      Not to absolve Facebook of blame, but who's to say data on almost every other social media service isn't also just flagged for deletion?

    • The database we use (Vertica) works this way. Nothing is deleted; instead it is flagged as deleted. A background task may purge old data (older than x). Historical queries show the database state as it was days or weeks ago. If the background task is broken (by a bug, say), then the data stays on disk indefinitely.

  • Unfortunately there will be no prizes for having been right all along.

    Even now, as facebook is burning, statements of how one has quit or will be quitting facebook get swept into the pile of incendiary indignation, with encouragement from all sides.

    But never having used facebook, even at significant personal effort as you indicate, one is relegated from "elitist" before to "smug" now.

    One day in the future a recruiter will ask why there's nothing about you on the Internet, and you will proudly be able to say: "Because I know the Internet and its dynamics that well" and they will hire you, in awe of your analytical foresight.

    That's the dream anyway, because you're more likely to be reported for being suspicious. After facebook there will be another facebook, and another, and people will flock to them just the same, and you get to experience being an antisocial hermit all over again.

    Now I made myself sad. "Social Media: even more depressing when you're not on them!"

    • > Unfortunately there will be no prizes for having been right all along.

      Except not having your racy pictures in Facebook's media archive.

    • What about the option where people return to messaging applications for private matters and keep a Tweetbookdin for their public persona?

    • Facebook is burning. Lol. They suffer a minor setback and they're "burning". Right now there is not much of an alternative to Facebook; it will be just fine.

  • "I am sure that the deletion of media files in services like Facebook was never meant to be absolute." This is very common, I'm sure. There should be a legal right to request permanent deletion of one's data on sites like Facebook. That said, once something is on the internet, anyone can and will archive it (see https://www.reddit.com/r/DataHoarder/). Closing an account, however, should imply permanent deletion. Companies are instead able to operate in a gray area through terms of service agreements that knowingly play on the ignorance of the end user. This common and widespread behavior is a detriment to the user and (arguably) society at large.

    • Obviously I'm not privy to the details of this particular requirement, but I'm fairly certain that very few, if any, of our videos actually go away when we delete accounts. (Or even when we delete the videos themselves.) I think this because I've seen images from SMS texts, instagrams, snapchats and things of that nature used in court cases. So law enforcement must have access to that stuff somehow? But, again, I'm not privy to the technical or legal mechanisms they use to make that happen. All that said, I have seen images from services like these in court cases. And defendants have CLAIMED that they had deleted them. (For whatever value of "deleted" exists on the given service.)

      So I'm wondering if the services actually have some sort of archiving requirement for law enforcement purposes? Maybe for a certain number of years, they have to save your data or something like that?

      If there's anyone who would be familiar with the legal obligations of these services vis-a-vis data archiving I'd be really interested in hearing more about what we should reasonably expect from these services in terms of deletion etc?

    • GDPR is intended to at least force service providers to give folks the right to be forgotten, which compels providers to delete data. While it only applies in Europe, it's difficult to comply without just making a general decision to honor these requests.

  • > Facebook and other services do not actually delete data, they just mark it as "deleted" and purge it only if they need the space.

    You may be correct, but that doesn't explain why Facebook decided to include so-called deleted files in a download of user data. Clearly these deleted files are still a part of Facebook user profiles and accessible to company data mining software. Facebook has exposed their own duplicity.

    • Maybe Facebook's development processes and tracking of tech debt are just shit. First person: "I'll just flag the content and then it won't show on their timeline!" Second person: "I'll just select all the records that belong to this account when packaging a backup. All the deleted content should be gone!"

      But I wouldn't discount your hypothesis.

    • >Facebook has exposed their own duplicity.

      Or possibly they just screwed up. Perhaps the "soft delete" was originally intended to allow "undelete" by the user with delayed purge, and/or single-instance storage with reference counting that they never quite got around to finishing.

    • > but that doesn't explain why Facebook decided to include so-called deleted files in a download of user data.

      This happened because the person tasked with writing the code to build the archive forgot to include the filter for "deleted" records somewhere in the code.

      I.e., they forgot the "where is_deleted = false" part below on one or more DB query requests like this:

      select * from table where is_deleted = false;

      This is the biggest problem with the "soft delete flag in database" method of deletion. Every single query writer, everywhere, forever, must always remember to include the "is_deleted" filter in their queries. And when they don't, what was deleted reappears as if it had never been deleted at all.

  • Facebook is powerful and insular. Taking it down requires extraordinary organisation. Outrage is helpful in that respect.

    Agreement is better than disagreement. Would I prefer we had agreement earlier? Yes. Is agreement today better than agreement tomorrow? Absolutely.

    Now that we have a constituency, the important thing is to mobilise. The past is in the past. Our job, in the present, is to protect the future.

  • > It is sad how this information becomes news only when bad things happen.

    What bad things? I feel that's the part missing from the argument. People have yet to see or hear what are the negative consequences of all that data being kept or even leaked or re-sold.

    The only one they've started to know about is the potential impact on elections, which is pretty hypothetical and weak to most people I feel. Or maybe identity theft, but that's more related to the Equifax leak.

    I think it's important to reason about what the real consequences of our data no longer being private are. Is it really dangerous? What's the worst that could happen? What are the chances of it happening, etc.

  • > I have always refused to send that type of images/videos/audios

    Isn't it still trivial to self-host stuff?

    Just send a link to the picture (or document, or whatever confidential information you want to share) on a password-protected resource on your own server (or even a laptop or desktop machine, if you have a globally routable IP address there). Facebook's automation is not smart enough to grab the password from the very same conversation, and even if it could, I'm sure they wouldn't do it, knowing you'd catch them in the access logs and press charges for unauthorized access.

    I doubt many would object and insist on sending via a very specific medium (i.e. strictly require pics in a FB Messenger). Some, of course, may find this inconvenient.

    • "trivial" and "your own server" together? :) Maybe for some code monkey, but not for my mom. :(

      I really do wish self-hosting were more trivial, it would be a better world.

    • Where I live Internet providers deliberately make self hosting anything extremely hard.

      Then they often charge 5x or more their normal price to let you host things, but add lots of exceptions. For example, all providers put in the contract that they can immediately cancel your subscription if they detect you hosting anything IRC-related, no matter whether it is an IRC server, an IRC bot, or a server for an open source IRC client...

  • I think a great way to explain privacy limitations to a non-tech-savvy person is to walk them through using GPG.

    Once someone understands public and private keys, and webs of trust, there really isn't much left to learn. For someone who understands keypairs, the limitations of Facebook/Twitter/etc., DRM, etc. are obvious.

    It seems most of us are afraid our non-tech-savvy friends and family won't be able to wrap their heads around security, but not understanding it has gotten us into a pretty bad situation. We should really stress the importance of learning about it.

  • >I got used to the looks of disbelief, thinking that I was some sort of hermit, an antisocial.

    I know that look.

    >I also got tired of answering the frequent "Why don't you have Facebook?" questions.

    I solved it by stating flatly "For the same reasons I don't have Twitter.", somehow marking the final period. People still believe I'm a kind of weirdo, but they don't go on asking ...

  • >Unfortunately, expecting a non-tech-savvy person to understand how data moves around the Internet

    Explain how data can be unreadable while it moves. Teach them to use secure communication options. You don't need to be an electrical engineer to use a TV remote control.

    • But no one has made a TV Remote control version of "encrypted Facebook" or even "encrypted eMail".

      And heck, there are people who can't use tv remote controls.

      The only thing that I'd consider "easy" is encrypted chat (signal). The "issue" there is market fragmentation (arguably a good thing).

  • I always tell people to treat Facebook as if every person you ever meet will be able to see it. It's more or less my public persona. Twitter is more anonymous.

    • > Twitter is more anonymous.

      How did you arrive at that conclusion? I assume Twitter retains everything as well (even "deleted" tweets) and it's all associated with an email address. Or did you mean it in the sense that far fewer people have a Twitter account?

    • How is Twitter more anonymous? In the UK people have been locked up for tweets.

      Twitter probably has less data on you, but I doubt it can't be linked directly to you by a TLA, say.

  • >I remember the last time I had this conversation with someone, last year (2017) around August. I found a new love partner, and after the long intimate talks on the phone, they requested the usual "intimate pictures", not necessarily sexual but certainly sexy.

    Why the fuck are these a thing? Couples don't meet in real life much anymore? And how "usual" are they?

    • Anyone have stats on how widespread this is? My spouse and I avoid being in front of cameras naked even when we're pretty sure the camera isn't enabled. Not that anyone else would really want to see us nude, but why take a chance on accidentally recording material that could be embarrassing?

  • > expecting a non-tech-savvy person to understand how data moves around the Internet

    Then we - the people that do have the necessary technical knowledge - have a duty to teach them what they need to know. This isn't necessarily "how data moves on the internet". Yes, this can be difficult and tedious, but understanding the risk profile for data/networks is increasingly important as networks become involved in everything.

    > they ultimately don't care

    Again, it's our duty to teach them why they need to care. This probably shouldn't involve a lecture on networking or data analysis, but instead tailoring an explanation to their personal situation and knowledge.

    • I don't think it's because they don't understand or because they don't care, it's just overwhelming. Think about it, to have any basic grasp of understanding regarding the security infrastructure of the internet you need to have a basic understanding of network connections, how HTTPS works, how files are stored on your computer, how files are sent across computers, how your average database works etc...

      Think about the last time you've tried tinkering with something you're a noob at. Maybe it's deciding that you would try fixing your car engine yourself even though you never were a mechanic. Maybe you decided to make a complicated cake and halfway through you realize that you overestimated your pastry skills. Try to remember the feeling of helplessness you felt at that moment, the "I have no idea what I'm doing and I wish I never had started that in the first place". In my experience that's how 90% of people feel like when trying to do something technical with a computer.

      A few weeks ago a colleague from HR asked me if I could make a backup of a computer because it contained some critical stuff and she wanted to be able to restore it later if necessary. I said okay, booted up a Debian live USB stick I had lying around and started dd'ing the drive to external storage. When I told her the copy was in progress she told me "but I didn't give you the password?". She was amazed when I told her that I didn't need the Windows session password to access the data on the disk. I swear I'm not making it up when I say that she asked me if I was a "hacker".

      That made me realize that there are probably many people out there who think their files are safe as long as their Windows password isn't compromised even if the disc is not encrypted. After all, they can't access the files, so surely nobody else can? If Facebook says my photo is deleted, then surely it must be? Why wouldn't it be?

      I don't think it's fair to blame these people, we've designed so many strange patterns over the past decades in software that it's difficult to keep track. Maybe having "delete" not actually delete should be considered a dark pattern. Maybe it should even be illegal.

    • And how would we do that? Every time I've tried to explain privacy issues to non-tech individuals, at best they consider me paranoid and at worst a fucking sociopath who doesn't have a FB profile because I can't relate to other people. I can't carry this burden and I doubt many can.

      There have been horror stories over the years about identity theft, even before the emergence of social media. Has this stopped anyone outside our community from posting details about their lives online? I hardly think this whole situation with FB will change anything in the end.

      I don't feel I have any obligation/duty towards anyone. If they want my opinion or ask me about an issue I'll gladly inform them. But I won't start a crusade for a better informed society. The Internet was supposed to do that and we ended up with videos of cats and wannabe celebrities posting semi-nude pics on Instagram. Fuck that shit.

    • > ... instead tailoring an explanation to their personal situation and knowledge.

      I’ve used this with success several times. Though you generally have to know the person well enough to know their “secrets”.

  • > I am sure that the deletion of media files in services like Facebook was never meant to be absolute. Many of my colleagues believe the same thing that I believe: Facebook and other services do not actually delete data, they just mark it as "deleted" and purge it only if they need the space.

    This is a dumb conspiracy theory. Facebook has made plenty of public statements that say otherwise, and there's a whole team that works on the system that ensures every trace is erased from disks, logs, cold storage and backups when deleting content.

    • Looking online briefly for definitions of "delete":

      "remove or obliterate (written or printed matter), especially by drawing a line through it or marking it with a delete sign."

      "synonyms: remove, cut out, take out, edit out, expunge, excise, eradicate, cancel"

      All of these seem clearly "absolute" to me. "Delete" means it's gone.

      I think Facebook has its own special linguistic distortion field. It requires no "dumb conspiracy theory" to realize that Facebook cannot be trusted.

    • I'm not inclined to believe PR statements like these when there's no way to verify them.

      Can you support your assertion? The infrequent cases where someone manages to extract or recover supposedly deleted data cast a lot of doubt on your claims.

      In any case, even if it's not Facebook specifically, it seems overwhelmingly likely that the majority of companies do not actually delete your data.

    • To be fair though, the article that this comment thread is attached to offers some seemingly direct evidence to support one aspect of this 'dumb' 'conspiracy' 'theory'.

    • Did you read the OP? How can you say that this is a dumb conspiracy theory?

    • First lesson in DB class: do NOT delete. Just flag.

      I can give you plenty of statements about how I'm Santa Claus though.
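
For anyone who hasn't run into the "just flag it" pattern discussed in this thread, here is a minimal sketch of how a soft delete behaves and how a single forgotten filter brings "deleted" rows back. It uses Python's built-in sqlite3 module; the `videos` table, the `is_deleted` column, and the `live_videos` view are made-up names for illustration, and none of this is a claim about how Facebook's systems actually work.

```python
import sqlite3

# In-memory database with a hypothetical "videos" table and a soft-delete flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos ("
    "id INTEGER PRIMARY KEY, title TEXT, "
    "is_deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO videos (title) VALUES (?)",
    [("birthday",), ("draft take 1",), ("draft take 2",)],
)


def soft_delete(video_id):
    # "Deleting" only flips a flag; the row itself stays in the table.
    conn.execute("UPDATE videos SET is_deleted = 1 WHERE id = ?", (video_id,))


soft_delete(2)
soft_delete(3)

# A query that remembers the filter only sees live rows.
profile_rows = conn.execute(
    "SELECT title FROM videos WHERE is_deleted = 0 ORDER BY id"
).fetchall()

# A query that forgets the filter (say, an export job) sees everything.
export_rows = conn.execute("SELECT title FROM videos ORDER BY id").fetchall()

# One possible mitigation: a view with the filter built in, so readers
# query the view instead of the base table.
conn.execute(
    "CREATE VIEW live_videos AS SELECT * FROM videos WHERE is_deleted = 0"
)
view_rows = conn.execute("SELECT title FROM live_videos ORDER BY id").fetchall()

print(profile_rows)
print(export_rows)
print(view_rows)
```

The view only helps if everyone actually queries it; anything that reads the base table directly still sees the flagged rows, which is the weakness of the approach.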

You mean they follow what many people consider best practices?

https://softwareengineering.stackexchange.com/questions/1592...

https://stackoverflow.com/questions/820466/never-delete-entr...

https://serverfault.com/questions/31455/should-i-ever-delete...

https://www.infoq.com/news/2009/09/Do-Not-Delete-Data

http://udidahan.com/2009/09/01/dont-delete-just-dont/

https://stackoverflow.com/questions/2549839/are-soft-deletes...

https://azure.microsoft.com/en-us/blog/soft-delete-for-azure...

I find it fascinating how much shock there is that Facebook is doing what nearly everyone else is doing, and what many people here have likely implemented.

  • The accepted answer to the first link you posted explicitly calls out:

    > There is one class of data that you have to delete - and that's personal data that the user doesn't want you to hold any more. There may be local laws (e.g. in the EU) that makes this a mandatory requirement (thanks Gavin)

    This is exactly the type of data we're discussing here. So no, contradicting the user's expectation when handling personal data is not a "best practice".

  • Disclaimer: I deleted my Facebook account a couple years ago and never looked back.

    That said, Facebook is just the one getting collectively stabbed with the pitchfork right now. Engineering best practices are one thing. My right to privacy is another. As an engineer I care about efficiency. As a human I care about privacy. My rights win over any technocratic babble. Sorry if I am being harsh. I am, of course, not surprised. At best, engineers are lazy; at worst, something truly sinister is brewing.

    • I agree that you have the right to privacy, but there's also technical reasons why instant deletion is not always possible. If they can guarantee that the data will be gone after X days, then that's fair to me.

  • "best" practices...

    ...as if I didn't already have enough reasons to hate that cliche, thought-terminating phrase... every situation is unique and figuring out what exactly to do for your particular one is probably the main purpose of being a software engineer.

  • Does this make it right, because others are doing it? I’d say it depends heavily on the type of data and the user expectation.

    The correct thing would be to flag as deleted for a sensible period of time (to be able to undo for the user) and then get rid of it after X days when it clearly isn’t needed anymore.

    • I'm mostly curious how many people are posting angrily about Facebook retaining data while taking a break from implementing a system that retains data.

  • Facebook lost billions in days and is poised to lose more.

    I imagine there are more pressing issues than the difficulty of implementing a schema.

  • "best practices" is for developers.

    For the average Joe and Janet out there, "deleting" something is synonymous with "remove from the internet for eternity"

    • Except when it’s not, and they want back the data they’ve deleted by mistake.

      In those cases it will take a lot of support to explain that what is gone is gone. I think customers don’t have a unified vision of what deleting means, they just want what’s optimal for the situation.

    • > "remove from the internet for eternity"

      lol, the internet never forgets. Everybody seems to have an illusion of control over digital data shared with others or uploaded to the internet.

  • From your first link:

    > There is one class of data that you have to delete - and that's personal data that the user doesn't want you to hold any more. There may be local laws (e.g. in the EU) that makes this a mandatory requirement (thanks Gavin)

  • These best practices are about database records and not about files. I'd be very surprised if Facebook stored files as database blobs. These are generally stored on a separate system, and it's quite reasonable to delete the file while keeping the metadata in the database.

  • Privacy hawks are always looking for a reason to complain about Facebook and scream I told you so.
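
The "flag it for a sensible period, then really get rid of it after X days" idea suggested in this thread could look roughly like the sketch below. Everything here is an assumption for illustration: the `Store` class, the 30-day `RETENTION` value, and the in-memory dict standing in for real storage. A real system would also have to deal with backups, logs, and replicas, which this does not attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

RETENTION = timedelta(days=30)  # hypothetical "X days" undo window


@dataclass
class Item:
    payload: bytes
    deleted_at: Optional[datetime] = None  # None means the item is live


class Store:
    def __init__(self) -> None:
        self.items: Dict[int, Item] = {}

    def put(self, item_id: int, payload: bytes) -> None:
        self.items[item_id] = Item(payload)

    def delete(self, item_id: int, now: datetime) -> None:
        # Soft delete: only record when the user asked for deletion.
        self.items[item_id].deleted_at = now

    def undelete(self, item_id: int) -> None:
        # Works only while the item has not been purged (KeyError otherwise).
        self.items[item_id].deleted_at = None

    def purge(self, now: datetime) -> List[int]:
        # Hard delete: drop everything whose undo window has expired.
        expired = [
            item_id
            for item_id, item in self.items.items()
            if item.deleted_at is not None and now - item.deleted_at >= RETENTION
        ]
        for item_id in expired:
            del self.items[item_id]
        return expired


store = Store()
t0 = datetime(2018, 4, 1)
store.put(1, b"video-1")
store.put(2, b"video-2")
store.delete(1, now=t0)
store.delete(2, now=t0)
store.undelete(2)  # the user changed their mind within the window

purged_early = store.purge(now=t0 + timedelta(days=29))
purged_late = store.purge(now=t0 + timedelta(days=30))
print(purged_early, purged_late)
```

The point of the design is that the undo window is explicit and bounded. If the purge job never runs, this degrades into the "flag forever" behavior people are complaining about.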

The most unsettling part is in Facebook's response: “We’ve heard that when accessing their information from our Download Your Information tool, some people are seeing their old videos that do not appear on their profile or Activity Log. We are investigating.” Who wants to bet against their investigation being “how to keep users from seeing it.” Anyone?

  • I honestly don't understand this cynicism. Facebook does not want your deleted video, and they certainly don't want to keep it given the current media frenzy, with the CEO under fire.

    Every application of any complexity has features which inactivate, but don't delete data. At Facebook scale, deleting data is non-trivial, and it would be impossible to immediately delete something.

    We all have bugs, including extremely critical security bugs, availability-threatening performance bugs, or many other types of bugs. It's strange that we accept those bugs as merely bugs, without assuming a backdoor, or intentional sabotage, but when it comes to personal data, suddenly it's a nefarious plot. It's an odd position to take that Facebook is not only saving these deleted videos intentionally (for what, exactly?) but that they'll now lie to us and pretend to delete them, but only remove it from their Download Information tool.

    Kudos to Facebook for even having such a tool.

    • I agree with you.

      At Facebook-scale the data is massive -- far bigger than anyone here could possibly comprehend and that includes the Facebook and Google-ers lurking around.

      Data has incredible inertia. And when there's a lot of it, in a lot of different places, I can imagine that it becomes very difficult to keep track of.

      I'm glad that Facebook's data export tool included some things that maybe it didn't expect to.

    • After delaying informing users of their data being handed over to third party services and keeping quiet for 3 years, some cynicism is warranted.

    • > Facebook does not want your deleted video

      Oh, most certainly they do want that video. Their business is knowing who we are and what drives us, so they can target those ads better. That's what makes their shareholders money.

      That the people working there are human beings who might consider it immoral to keep deleted material, is what most people rely on when using such services... but being kind is not Facebook's goal.

  • It's upsetting that the bug exists in the first place, but there's nothing unsettling about this response. Have you ever reported a bug to an Internet company before? What do you expect them to say?

  • It's no bet. The investigation results in a ticket for another intern to add "WHERE deleted = 0" to the download tool.

  • EU Data Protection Law requires that users are entitled to view all personal data (yes that includes videos) that FB has on them.

    In May the EU will be able to fine FB up to 4% of global revenue for breaches of this law. Popcorn time!

  • Apparently the walls on the garden weren’t high enough - and the sappers are the real culprits!!

One fascinating outcome of all this fallout is that there's now a readymade excuse to stop using Facebook.

My personal observations are that a good number of people have felt 'fatigued' by Facebook for a very long time, but were also unsure of how best to extricate themselves without incurring a social penalty.

But now there's an impetus that most people can understand. I'm not sure about how many people will move away or how quickly it'll happen, but the network effects Facebook capitalized on can also work in reverse: all it takes is one or two very vocal privacy proponents in a friend circle pushing to get off the platform. One group I'm in recently migrated to Telegram for this very reason.

  • If you truly want privacy and security I would recommend Signal over Telegram -- Telegram has had some controversy with respect to their encryption protocol not being audited, as well as some weird stuff with a very large recent ICO that seems entirely unnecessary except as a money grab and Russian subpoenas for their master private keys.

    • > and Russian subpoenas for their master private keys.

      While I cannot defend (or attack, I'm no cryptographer) their crypto, they seem to have a solution to this:

      They say they don't store keys in the same datacenter or even jurisdiction as the customer data they protect.

      According to them this means getting unencrypted data through a legal process would mean getting a warrant in two or more countries at once.

  • Definitely. I’ve been off Facebook for years but always felt judged for it. In the past few weeks it’s gone from being judged to being applauded for calling it.

Just an anecdote. I had a day set aside for purging my Facebook entries a year or two back. I manually deleted comments and posts.

Of course there were too many to do and it was very boring, so I only spent a couple of hours at it. But that's not what's interesting. What happened was that I got a huge uptick in people commenting on some old post I made, like a profile picture change. I think Facebook saw I was purging my data slowly and reached out to my FB contacts, encouraging them to interact more with me. It was very odd.

  • You made changes to old posts (deleted comments), so Facebook decided that, because there were updates to those old posts, it made sense to treat them as new. So Facebook started to show these posts to your friends in their news feeds.

  • If you're worried about FB analyzing/selling your data then "deleting" does nothing. It effectively just sets a boolean flag on a record in a database which is more like 'hiding'.

    It may not appear anymore in the frontend but you can be pretty sure it's still being used by FB. Now that may change after GDPR but who knows...

  • Is there not a way to delete all posts and comments at once?

    • I don't believe so, at least not in the recent past. I purged my FB of all content about 18 months ago, and had to do it manually. Took several hours spread across a few weeks, whenever I could force myself to spend the time on it. For whatever reason I kept finding posts/comments for a few weeks after that; I'd go back to make sure I got everything, scouring the timelines, and there'd be something I missed somehow, quite bizarre.

    • I've never had a Facebook account, but friends have told me that no, there is not. That would go against their user and content retention models, I'm sure. It makes sense that Facebook would make it as difficult, tedious, and painful as possible to delete content from their platform.

The funniest part of it: All the media hype around the topic is generated... BY DATA collected by NYT/Bloomberg/Techcrunch/you name it. Those articles generate additional views and they just continue to ride this wave. And all those publications share this data with 3rd parties (ad networks, analytics providers, cpa networks)

On top of that, you know what else do they measure? SENTIMENT. So until kicking Facebook generates more revenue - the articles will paint Facebook as a world's main evil. But the day sentiment changes you will see all the articles about Facebook following best practices.

And in the end? Some EU commission will be created and will pass a law that obliges sites to "show a cookie usage disclaimer", because of which 90% of sites will welcome you with an ugly popup and ruin the experience while providing 0% advantage in managing your privacy...

  • > On top of that, you know what else they measure? SENTIMENT. So as long as kicking Facebook generates more revenue, the articles will paint Facebook as the world's main evil. But the day sentiment changes, you will see all the articles about Facebook following best practices.

    So what you're saying is that sites like nymag will only run stories that are profitable?

    • Kind of. When everyone is writing articles AGAINST Facebook, it is very hard to 'sell' an article that SUPPORTS Facebook to the editor (because of the potential PR nightmare if the likes of 4chan start attacking you).

      Article can't go through editor -> article is not published

Secretly? Secretly from whom?

There is nothing I've come across, ever, that has led me to believe that Facebook, Google, Amazon, etc., ever delete anything, ever. Not even to clean up space as some people on this thread are suggesting. Hard drive space is cheap and data is valuable. This isn't a secret; this is a fairly obvious business practice that all the big players, and most competent small players, are engaging in.

  • FWIW, Facebook does say that they will delete all of your data within 90 days of account deletion. I believe that indicates that they've put the engineering effort to do a full audit of data to be deleted, handle missing references across the product, and to fully delete user data from logs and backups.

    From https://www.facebook.com/help/250563911970368 :

    > When you delete your account, people won't be able to see it on Facebook. It may take up to 90 days from the beginning of the deletion process to delete all of the things you've posted, like your photos, status updates or other data stored in backup systems.

    The case from the article is trickier. My impression is the feature was just implemented with an append-only data model, which is often (maybe usually) a good engineering decision. "Secretly" from the article title feels disingenuous because Facebook never said it was deleted. As an engineer, it's frustrating that I might have to write my software to be more fragile to match the implicit expectations of how a non-technical user thinks software should work. But the frustration on the user's end is also plenty understandable here. Hopefully the gap can be closed a little on both sides by a combination of educating users and being more privacy-conscious in engineering and business decisions.
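
    As a rough illustration of the append-only model described above (a hypothetical sketch, not Facebook's actual design; the `Event` and `AppendOnlyLog` names are made up for this example), a "delete" can be recorded as one more event, so the uploaded bytes stay in storage until a separate compaction step removes them:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One immutable entry in the log (hypothetical schema)."""
    video_id: str
    kind: str                      # "upload" or "delete"
    payload: Optional[bytes] = None


class AppendOnlyLog:
    def __init__(self) -> None:
        self._events: List[Event] = []

    def upload(self, video_id: str, payload: bytes) -> None:
        self._events.append(Event(video_id, "upload", payload))

    def delete(self, video_id: str) -> None:
        # A "delete" is just another event (a tombstone); nothing is removed.
        self._events.append(Event(video_id, "delete"))

    def visible(self) -> Dict[str, bytes]:
        """Replay the log to compute what the user currently sees."""
        state: Dict[str, bytes] = {}
        for event in self._events:
            if event.kind == "upload" and event.payload is not None:
                state[event.video_id] = event.payload
            elif event.kind == "delete":
                state.pop(event.video_id, None)
        return state

    def retained_bytes(self) -> int:
        """Bytes still physically stored, including 'deleted' uploads."""
        return sum(len(e.payload) for e in self._events if e.payload)

    def compact(self) -> None:
        """Hard delete: drop every event for videos that are no longer visible."""
        live = self.visible().keys()
        self._events = [e for e in self._events if e.video_id in live]
```

    Until `compact()` runs, `visible()` hides the video while `retained_bytes()` still counts it, which is the gap between what the user sees and what is stored.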

    • Fucking stupid policy - so they have 90 days to off load your shit to Utah/gov-cloud before they “delete” your data?

      Who can possibly believe this BS.

      Imagine you wanted to delete data from your own system, but when you hit rm it takes 90 days to execute. This sentiment PISSES me off.

      When I say “delete me from your service now” I have a reasonable expectation that you will delete it.

      C’mon

  • Clearly there is a big disconnect here. It seems somebody is suggesting there should be a correlation between a user removing content from their account and Facebook destroying some of Facebook's property.

    Anything submitted to Facebook is the property of Facebook. Users have no business telling Facebook to destroy Facebook property.

Huh. So now I'm starting to think: what if I purposely recorded thousands and thousands of meaningless videos? None of my friends would ever see them since I never published them, but Facebook would use up hard drive space storing them.

What if a lot of people did that?

Suddenly Facebook's cost for hanging onto all these videos would become quite high with no value in doing so.

Anyone feel like making a website to help automate that process?

  • The scale you’d need to achieve to have even the most minor effect on Facebook’s vast infrastructure would be enormous.

    I feel the effort you and countless opt-in people would expend could be redirected to much more fruitful efforts. Convincing people to delete their profiles, for example.

  • I generally agree with other people's posts that there are better uses of your time.

    However, if you wanted to do this, why bother recording it? Read a little about the mp4 spec and it would probably be fairly easy to generate mp4s containing random data. You could even go further and generate video that tricks facial/object recognition.

  • You should upload random-pixel, incompressible videos and delete them over and over again. It not only increases storage cost but also makes the profile's SNR very low.

    There are general-strike level attack surfaces on these networks, but people don't really care that much.
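
    A quick way to see why random data is expensive to store (a sketch that uses general-purpose zlib compression as a stand-in; real video codecs are lossy and behave differently, so treat the numbers as illustrative only):

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


size = 1_000_000
random_ratio = compression_ratio(os.urandom(size))   # stand-in for random-pixel frames
uniform_ratio = compression_ratio(bytes(size))       # stand-in for a blank, static frame

print(f"random:  {random_ratio:.3f}")
print(f"uniform: {uniform_ratio:.3f}")
```

    The random buffer stays at roughly its original size, while the all-zero buffer shrinks to a tiny fraction of it.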

    • Does their AUP/TOS allow them to lock you out in that case?

      Lockout would be worse than account deletion. You would have no recourse to fight back on any of their use of your data right?

  • Facebook might actually love this. Because for every 20 useless videos you upload, you may click, comment, or view some facebook message that pops up while you're uploading. It could become a net gain for them.

  • What if a lot of people did that?

    That is exactly what a lot of people do. I don't think Facebook is conspiratorially hoarding funny cat videos. It's just standard practice to flag things as deleted. Everyone does it. I certainly have.

This (and the GDPR, even though my company is in the US) is why I now tell the developers on my team not to collect info unless they have a definite use case for storing it, and to make sure they delete data as soon as it is no longer needed.

It helps that my business doesn't monetize via advertising.

My guess is that what actually happened here is that they had a use case to store it for a few hours or so (in case, say, the user changed their mind about posting it) and no one ever bothered to write a cleanup script because "storage is cheap" and possibly "maybe we might have a use case for it someday".

I can't imagine this being intentional. Even if I try to consider malicious use cases, I can't think of any where it would be beneficial for Facebook to actually store this data, other than being too lazy to clean it up.

Edit: Wow. Who knew applying Hanlon's razor to this would get me downvoted so badly? I'm going to leave this here unedited and eat the downvotes of the people on an anti-Facebook warpath, because I think it is important to state that we, as people who make tech products, often take shortcuts (like avoiding dev work because it is cheaper to just keep data in storage) and we need to stop doing that. There is plenty of stuff Facebook does that is beyond the pale, but this one is more likely attributable to laziness, and if you are going to downvote me without explaining why, you are not contributing to the conversation.

  • > I can't imagine this being intentional. Even if I try to consider malicious use cases, I can't think of any where it would be beneficial for Facebook to actually store this data, other than being too lazy to clean it up.

    Nope. They wrote a routine that makes the video invisible to the actual user but refrained from deleting it right away. That is intentional.

    • > That is intentional.

      It's funny you are so sure of this.

      And you essentially just called me wrong without providing a use case. My statement was I couldn't think of a use case for them to do this intentionally and your statement does nothing to disprove that.

      Hiding a video can be, and probably is, just an "UPDATE videos SET visible=0 WHERE id=123". And it is extremely common to soft delete things for hard deletion later, in case the user made a mistake or law enforcement requests it or for any number of other reasons.

      Especially in large distributed systems where things often need to happen asynchronously.

      Not permanently deleting a soft-deleted file is not necessarily intentional. Anyone who has ever worked on a large software project knows about backlog stories (say, hypothetically, "free up space from soft deleted videos") not being done for years because other priorities keep pre-empting them.

      It's similar to why, in a garbage-collected programming language, memory isn't freed immediately.

      Does that mean it was unintentional? Not necessarily but it certainly is plausible that it was unintentional.
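
      To make the soft-delete-then-purge pattern above concrete, here is a minimal sketch using an in-memory SQLite database. The `videos` table and `visible` column come from the UPDATE quoted above; the `hidden_at` column, the function names, and the grace period are assumptions added for illustration, not anything known about Facebook's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos ("
    " id INTEGER PRIMARY KEY,"
    " visible INTEGER NOT NULL DEFAULT 1,"
    " hidden_at INTEGER)"          # Unix time of the soft delete (hypothetical column)
)
conn.executemany("INSERT INTO videos (id) VALUES (?)", [(123,), (124,)])


def soft_delete(video_id: int, now: int) -> None:
    # Same shape as the UPDATE quoted above, plus a timestamp for the later purge.
    conn.execute(
        "UPDATE videos SET visible = 0, hidden_at = ? WHERE id = ?",
        (now, video_id),
    )


def visible_ids() -> list:
    rows = conn.execute("SELECT id FROM videos WHERE visible = 1 ORDER BY id")
    return [row[0] for row in rows]


def purge(now: int, grace_seconds: int) -> int:
    """Hard-delete rows that were hidden at least grace_seconds ago."""
    cur = conn.execute(
        "DELETE FROM videos WHERE visible = 0 AND hidden_at <= ?",
        (now - grace_seconds,),
    )
    return cur.rowcount
```

      If nothing ever calls `purge()`, the row simply stays in the table, hidden but intact, which is the "backlog story that never gets done" scenario.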

I would think this is common practice.

I know that YouTube, for example, retains videos indefinitely, because I've personally been able to retrieve videos that were deleted as early as 2006.

It was possible for anyone to do this until some time in 2017, when they started requiring signatures for RTSP streams. All that was needed was the video ID (the eleven characters in every YouTube video URL). Didn't matter if they were private (IDs for these could be enumerated if the channel ID was known), "deleted" over ten years ago, or behind a paywall.

From ~2008 until 2015, you could do the same but with higher quality streams through the now-retired Apple TV API.

I'd have been more surprised if they didn't save them. I always assumed that anything remotely hosted that I "delete" is soft-deleted, and that anything I edit is actually just versioned. (I'm not claiming to be especially smart, just cynical.)

This was a bug. There was an old feature that used to allow you to record and post directly from the browser. Those videos were streamed to FB as they were being recorded. If you decided not to post, those draft videos should have been deleted, but they were not. They showed up in download your information (DYI) as expected because that tool is designed to show you the data Facebook has about you. Thanks to New York Magazine for the flag. If you see anything in DYI that doesn't look right, let us know and we'll investigate. This was a bug, and we really do appreciate any help in finding them so we can fix them.

  • I downloaded my information and then deleted my account before realizing that the archive I downloaded did not include any of the photos or posts that I had been tagged in, because I made those posts only visible to me on my timeline.

    • If these are posts by other people that you were tagged in, then those posts should still be up, just without the tags of you, on the original poster's timeline.

Is this really a secret or a surprise? Most SaaS companies of this size don’t really ever delete anything. They set a deletion flag and call it a day.

  • Yeah for most things this is a good way to do it. For user data when they delete the account it would be ideal if they actually removed it though

    • Why would that be the good way to do it? Especially if that deletion action was behind a confirmation, or if the data was never recoverable by the user? At that point, just delete delete it.

Someone should also mention that in the downloadable Facebook archive, in the html/ folder, there is a file called contact_info.htm, and it's a pretty large file. It is apparently every Google contact you ever had, all synced to Facebook, for every device you ever logged into. So, if you ever used a friend's device to check your Facebook account using the Facebook app, then all of that person's contacts are there too, as well as their SMS history metadata and call history metadata.

  • I’ve let quite a few people log into the Facebook app on my phone and now I’m pissed.

    I was always careful with privacy settings on Facebook, but the thought never even crossed my mind of what I’d be “agreeing” to by letting someone briefly use my phone.

    I’m sure someone will come along shortly to tell me I deserve it and should have known better and was asking for it, but whatever.

    How will the GDPR handle instances like this?

I imagine that the data in those downloads is a fraction of all the data Facebook collects. It seems that they disclose only what is required based on local laws, so it's unlikely they will ever disclose derived data unless forced to (for example, the location data they collect and combine to figure out which people were present at the same party etc.)

Btw why do we think that the downloadable archive contains everything they have? Because they said so?

Just add a bool field to the table "canExport".

  • I don’t think we think that. Obviously, in the context of HN crowd, FB has no real credibility / ethical compass.

This is why I've never really deleted my FB account; I've just deactivated it. If I don't believe they would truly delete the data, I might as well have it available if I want it.

There are people on this very site complaining about how "hard" it is to delete data on request to be GDPR compliant. Highly paid developer experts are literally throwing a fit when told that they should be able to delete data when the "Delete" button is clicked.

It's not just Facebook.

Storage costs next to nothing; recovery costs a lot. Why would Facebook, which depends on this data for income, ever delete it? You get two massive benefits from keeping it and setting a "deleted" flag.

Why would anyone expect anything different? How entitled do people feel they are?

  • It's quite simple: if someone says they deleted something, you expect them to delete it. If they don't, they are lying, and you can't really trust them with anything else.

    • Facebook has been embroiled in back to back privacy scandals since it opened and you still trust them? How much simpler does it get, I agree, that's pretty simplistic!

A lot of these revelations are coming from the fact that Facebook allows you to download the information it has stored about you.

There is an exceedingly high chance that the managers are going to notice this. And, rather than doing the right thing by storing less information, they are going to lie about what they have, and put a filter in place regarding what they let you download (a bit) vs. what they actually have (everything).

We need to be nuanced in our approach to the world, but it is becoming increasingly clear that Facebook has created a business model that incentivizes (and maybe depends on) evil behavior.

What I think so many privacy advocates don’t realize while frothing at the mouth is that odds are 99.9999% no one really wants your old photos or data specifically. Surely in large aggregate, but on a macro level you are no more interesting than anyone else. You’re not. You have a delusion of grandeur.

Sure, you might get some targeted ads by data used in aggregate and put you into a group but so what? If I had to see ads, I’d rather them be things I am interested in.

  • Ah the old nothing to fear argument.

    • Oh, there’s plenty to fear, this just isn’t it.

      The proliferation of cameras and cellphone tracking hooked to state owned machine learning predicting your decisions - which is publicly happening in China and almost certainly quietly happening everywhere else? Terrifying.

      Data collection on crap that’s nearly public anyway? Merely a distraction.

It's kind of unsurprising. As soon as you upload some data into another system that you do not fully control, you can't really expect or trust the other party to discard it because you want to.

Unless there's a good and easy way to store that kind of data and share it with end-to-end encryption (where the server NEVER has access to the keys) so that only authorized users can view the plain data, that problem will remain.

In all likelihood, this is simply the tip of the iceberg. I see Twitter's stock is collapsing, due to similar concerns. Nothing is private anymore in this day and age.

And yet - If we're concerned about what Facebook has on us, then just imagine what kind of treasure trove the government sits on.

I’m looking forward to HN letting go of this enthusiastic expectation of FB’s demise and putting forth higher quality articles.

(Note: I feel the same enthusiasm. I just want to read more meaningful, comprehensive articles.)

I considered this a feature. It's interesting to compare true convo vs what's left after people have backtracked and deleted messages. Especially when the convo was daring so to speak. ;)

Well, once you give permission to your data, you no longer control how it's stored and shared (at least physically). To make sure you are not caught off guard, you should always expect the worst.

Could be non-malicious if they want to avoid storing the same video twice, or just laziness from engineers who didn't build the D in their CRUD.
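
One way the deduplication idea could work (a hypothetical sketch; `DedupStore` and its methods are invented for illustration, and nothing here is known about Facebook's storage layer) is to key each upload by a hash of its content and count references, so a blob is only removed once no post uses it:

```python
import hashlib
from typing import Dict


class DedupStore:
    """Content-addressed store: identical uploads share one stored blob."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}     # digest -> file bytes
        self._refs: Dict[str, int] = {}        # digest -> number of posts using it

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        """Drop one reference; the blob is only removed when nobody uses it."""
        self._refs[digest] -= 1
        if self._refs[digest] == 0:
            del self._refs[digest]
            del self._blobs[digest]

    def blob_count(self) -> int:
        return len(self._blobs)
```

Under a scheme like this, one user's delete cannot immediately remove bytes that another post still references, which is one innocent way "deleted" content can linger.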

And this is why I never signed up for social media in the first place. Remember the adage about stuff on the internet never really going away? Yeah, it's true. Another interesting note: some friends hiring in Austin, TX have told me they can't hire in town because almost every kid has a social media account containing drunk pics.

  • Who cares if they have drunk pics online? Is the employer supposed to dictate my morality?

"They're looking into it."

AKA they will continue harvesting the data but be better about not letting you know what is being harvested.

I've downloaded my zip file to try to verify what's going on in the article.

I think I have an idea of what might have happened.

When you add a video to the composer window, one of the requests is https://vupload-edge.facebook.com/ajax/video/upload/requests... (look it up in the network tab of whatever browser dev tool you are using).

With the response as,

for (;;);{"__ar":1,"payload":{"video_id":"11111111111111","start_offset":0,"end_offset":353662,"skip_upload":false},"bootloadable":{},"ixData":{},"gkxData":{},"lid":"1"}

The video 11111111111111 is now in an "unpublished" state. "unpublished" here meaning it's uploaded to Facebook but not linked to a post yet.
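
If you want to pull the video ID out of that response programmatically, a minimal sketch follows. It assumes the `for (;;);` prefix is the only thing preceding the JSON body (the prefix is commonly understood as a guard against the response being loaded as a script); `parse_guarded_json` is a made-up helper name:

```python
import json

PREFIX = "for (;;);"


def parse_guarded_json(body: str) -> dict:
    """Strip the leading 'for (;;);' guard, then parse the rest as JSON."""
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]
    return json.loads(body)


# The response body quoted above, with its placeholder video ID.
response = (
    'for (;;);{"__ar":1,"payload":{"video_id":"11111111111111",'
    '"start_offset":0,"end_offset":353662,"skip_upload":false},'
    '"bootloadable":{},"ixData":{},"gkxData":{},"lid":"1"}'
)

payload = parse_guarded_json(response)["payload"]
print(payload["video_id"])
```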

You can verify this by taking that ID and doing the following

https://www.facebook.com/11111111111111/ -> redirects to https://www.facebook.com/phwd/11111111111111/

"Sorry, this content isn't available right now"

Your options now are to either discard the post or publish with a privacy setting which will make the link above available. (Notice I didn't say discard the video, the video is still in an unpublished state)

Now for the archive.

You can verify by going to view-source:fb.com/me in a browser. Search for the string "access_token"; there will be a long string appended (e.g. access_token:"EAAAAU...).

With that token go to your archive and roll over one of the links in the video section that has an issue and doesn't appear in the activity log.

file:///Users/phwd/Desktop/facebook-phwd-from-zip/videos/11111111111111.mp4

grab the ID 11111111111111 and do the following

https://graph.facebook.com/11111111111111?access_token=THE_T...

That shows an unpublished video for me; it wouldn't show in your activity log (that's the only part of the story I can agree with and confirm with what I have available).

To delete, add method=delete to the request.

https://graph.facebook.com/v2.9/11111111111111?method=delete...

Response should be

{ "success": true }

The next part would be to verify that the video is deleted from the archive. Since Facebook is still giving me the first download zip, I guess I'll have to wait a while (it's 1 am here so I'm heading to bed) until it resets so I can make it build a new archive and confirm the hunch.

This is just my guess, I'm NOT discounting what the Facebook user encountered. I'm just providing a possible background to how it can happen as well as a solution to deleting the "deleted" video. There is also the chance I might be wrong...

References to confirm for yourself: developers.facebook.com/docs/graph-api/reference/video

Disclosure: I don't work for Facebook, however, I do play with their API a bit.

How do I mass-delete all of my Facebook data? Likes, posts, etc. Is there a reputable tool out there that can take care of it for me since Facebook doesn't provide you with one?

I'm generally ok with keeping Facebook just as a contacts list, but I'd rather not have it have anything else.

I can't believe people don't pay attention. Facebook never removed any data generated; they just remove the index from the data. Deleted data can be more valuable than data that is kept there.

Isn't it kinda obvious that FB 'secretly' stores everything that may be beneficial to the company? Who is still so naive as to believe that they care about your privacy?

For maximum honesty, they should just rename all of the Delete buttons and text to 'hide' or better yet, 'hide from friends'.

Short, simple and to the point.

It was already known that Facebook burns all user data onto Blu-ray discs.

How do you erase data that's been permanently burned onto an optical medium? You can't.

This has happened numerous times in the past, especially when they switched to timelines. Everything you thought deleted reappeared.

Basically once in Facebook, always in Facebook.

This is why for those 'special' videos one should always just use a handheld camera and not their cell phone. But the public will never learn.

Lol I'm not impressed by anything at this point. I thought it was clear from the start: they don't care about you, you're the "useds" of Facebook, as Stallman would say. But still, every new piece of evidence that comes up confirming this must make it into a separate headline, and we'll keep on getting plenty of those until... I don't know. Until it either dies or rebrands itself well enough to pretend all of this never happened, I suppose.

Given any data submitted to FB legally becomes their property, they don’t have any obligation to delete it.

I always believed in the right to be forgotten. That's why I am not on Facebook nor Google Plus

  • How does being on HN help? Your comments can't be deleted after a certain amount of time (a few hours).

How about YouTube? Is there any law specifying that websites must delete the data users choose to remove?

What you put on the Internet might stay there forever: an important lesson.

This cannot be a surprise to anyone with just a bit of skepticism.

  • Agreed, but most do not have much skepticism. You and I may have known about Facebook and had accounts (or deleted them) for over a decade, however many current users are relative internet novices.

Here's something that I noticed earlier today that might be of interest:

420 million Facebook profiles uploaded to archive.org

http://www.newshub.co.nz/home/new-zealand/2018/03/the-kiwi-s...

  • Definitely worth noting as another example of how data can be harvested — but also important to point out it was a third-party created archive made from 2007 and 2010. People will still be (justifiably, IMO) unhappy. Just as they were when YourOpenBook made the risk so blatantly obvious.

Meanwhile:

http://fortune.com/2018/03/31/facebook-employees-are-reporte...

"I do also think that, you know, Facebook has a responsibility to its users to protect their data and not just to protect it but make sure that people understand what data they're producing and whether they own it, who has access to it and when.

And Facebook has failed them, you know, across the board.

And the question now is not just what - you know, what can be done to ensure the security of that data. It's, how can we use this moment to ensure that we're having a broader cultural conversation about the data that we're all creating on Facebook, Google, Amazon, through our phones, et cetera and make sure that the companies are held accountable for it?"

Source:

Facebook co-founder, Chris Hughes

https://www.npr.org/2018/03/30/598208043/should-facebook-use...

"The guy who was showing me around pointed out where they were building "apartments for our people to live".

"So they'll work on campus, they'll eat on campus, they'll socialise on campus and now they'll sleep on campus?" I asked. I wondered whether maybe that was kind of unhealthy. Creepy, even.

My guide looked right at me, and for a moment, his megawatt smile faltered. When he first worked there, it reminded him of the Dave Eggers book, The Circle, he said. Then he started talking about the opportunity to connect the world's people, and I stopped listening."

https://en.wikipedia.org/wiki/The_Circle_(Eggers_novel)

https://en.wikipedia.org/wiki/The_Circle_(2017_film)

Source:

https://www.irishtimes.com/life-and-style/people/jennifer-o-...

  • While this may sound creepy for a foreigner, getting a decent apartment in the SF Bay Area is really expensive and the commutes are horrible. So in this context, that could be a good thing.

    • Why does it sound creepy? What percentage of the population, historically, have been "free" in a sense that they don't belong to an organization that exerts some sort of control over their social world? It seems there are many shades and dimensions here...

    • I don’t know. It makes rents higher since companies pay more and it takes units off the market and they can deduct the expense unlike traditional renters.

    • I think the opposite is more likely to be true. This sounds creepy because it is creepy and the only place it seems reasonable is in the increasingly disconnected culture of Silicon Valley.

  • Everything's sound in the company town. Next thing you know it'll be really convenient to introduce a system, similar to colleges where you have FBucks on your company ID. You need never venture outside, in fact you can fill your fbucks straight from your paycheck!

    Eventually, you forget what the conversion rate between fbucks and USD is, but dismiss any weirdness as a convenience fee. You're well paid, right?

    Next thing you know, you can't leave because there's no reasonable housing elsewhere in the city. And so on.

    Edit: Even if it doesn't become even more dystopic for the worker, who now lives every moment inside the company and is alienated from their fellow humans, increasingly losing even the ability to think dissenting thoughts (nowhere is safe, not even home), the company has succeeded in capturing back most of the money it paid to its workforce. It puts the lie to the notion that the presence of profits filters out to the rest of the city, and it increases inequality.

  • I disagree that onsite housing at company HQs is always bad depending on how it's used. Tons of companies are already directly renting housing for interns, newly relocated employees, and then also for shorter terms for people interviewing or on short term business trips.

    I imagine doing so is, in the long term, cheaper than the amount currently spent on hotel services for the above population. But yes, long-term on-campus housing is a bit weird in this day and age (though many rural communities were built by companies needing to house their employees near the factory).

    • I always hear criticism of the Japanese model, where the employer provides assistance and coordination for employees’ personal affairs. I am spending too much time at work and don’t have time to tend to errands, cleaning, life admin, etc. With the demands placed on professionals these days, why shouldn’t the employer bear some responsibility for my personal home life rather than leave us to figure it out like in the West?

I have no such duty. I'll teach my kids when they get a bit older, but trying to teach adult friends is an unrewarding exercise in futility (mansplaining even).

  • >mansplaining even

    I am fascinated to know how you are going to wedge gender politics sideways into this completely gender neutral conversation.

    I agree that talking data storage strategies to people that aren't interested is unrewarding, but please mansplain to me how explaining technical concepts is mansplaining? Ideally, also define mansplaining in the process.

    • I believe the label mansplaining is used by non-males to describe men teaching something in detail (often the complex idea is over the head of the person making the label comment). Some would not like to be educated by males, and men explaining technical details that no one asked for qualifies. Many males have stopped educating for fear of that label, as most thought they were doing a favo(u)r by sharing. Society as a whole is affected.

    • It's simple:

      - most folks in tech are men

      - most men date women

      - in 2018 any attempt a man makes to explain something to a woman is a candidate for being accused of "mansplaining".

      I'll say one other thing:

      I honestly have no clue if mansplaining has a more technical definition but in general I see most people interpreting it as a man explaining anything to a woman.

Ah the 24*7 auto polarising, meme and outrage generating factories built on top of social media attack social media. It's a nice time to be China :) Have fun guys!!

  • This is not a substantive comment, and it has no place here. If you have an actual argument to make, especially where you can point to reliable evidence, please do.

Cue wave of outraged HNers because some people believed Facebook cared.

  • Please don't post unsubstantive comments here.

    • I don't think pointing out how disconnected half of the userbase here is counts as unsubstantive. 99% of people who use Facebook have not the faintest idea of what Facebook is doing, why they're doing it, and why it might be wrong. Still, on each of these threads the top comment invariably starts with "why is anyone surprised that.. etc". I could find you links to prove my point, but I can't believe you're not aware of this. So I strongly disagree that my comment is unsubstantive.

At what point was any of this news? The TOS is clear on this, so if this was going to bother anyone they would have read it beforehand. If I were running a data collection business like Google's for ad analytics, I wouldn't delete anything either; that's your bottom line you're wiping away!

I don't care. I stopped caring about my privacy. Nobody will hurt me by knowing too much about me. Facebook can have all of my life, because nothing is private for me.

It is a great tool for keeping up with friends; it lets you cultivate friendships a lot more easily than anything before. We can have a lot of friends when we don't keep secrets, because by being open about your weaknesses you create a new friend, not an enemy. You create enemies with secrets and lies.