
Comment by throwaway2016a

7 years ago

> That is intentional.

It's funny you are so sure of this.

And you essentially just called me wrong without providing a use case. My statement was that I couldn't think of a use case for them to do this intentionally, and your statement does nothing to disprove that.

Hiding a video can be, and probably is, just an "UPDATE videos SET visible=0 WHERE id=123". And it is extremely common to soft delete things and hard delete them later, in case the user made a mistake, law enforcement requests the data, or for any number of other reasons.

Especially in large distributed systems where things often need to happen asynchronously.
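To make that concrete, here is a minimal sketch of the pattern using Python's built-in sqlite3. The `hidden_at` column, the `purge` job and the grace period are my own assumptions for illustration. I have no idea what Facebook's schema actually looks like.

```python
import sqlite3

# Hypothetical schema: only "videos", "visible" and "id" come from the
# query above; "title" and "hidden_at" are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, "
    "visible INTEGER NOT NULL DEFAULT 1, hidden_at INTEGER)"
)
conn.executemany(
    "INSERT INTO videos (id, title) VALUES (?, ?)",
    [(123, "a"), (124, "b")],
)

def soft_delete(conn, video_id, now):
    # Cheap and synchronous: flip a flag, keep the row (and the file).
    conn.execute(
        "UPDATE videos SET visible = 0, hidden_at = ? WHERE id = ?",
        (now, video_id),
    )

def visible_ids(conn):
    # What the rest of the application is supposed to see.
    rows = conn.execute("SELECT id FROM videos WHERE visible = 1 ORDER BY id")
    return [row[0] for row in rows]

def purge(conn, now, grace_seconds):
    # The "backlog story": a separate job that hard deletes rows hidden
    # for at least grace_seconds. Until it runs, nothing is removed.
    cur = conn.execute(
        "DELETE FROM videos WHERE visible = 0 AND hidden_at <= ?",
        (now - grace_seconds,),
    )
    return cur.rowcount

soft_delete(conn, 123, now=1000)
```

If nobody ever schedules `purge`, the hidden rows sit there indefinitely and nobody had to decide to keep them.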

Not permanently deleting a soft-deleted file is not necessarily intentional. Anyone who has ever worked on a large software project knows about backlog stories (say, hypothetically, "free up space from soft deleted videos") not being done for years because other priorities keep pre-empting them.

It's a similar reason to why memory isn't freed immediately when you write in a garbage collected programming language.

Does that mean it was unintentional? Not necessarily, but it certainly is plausible that it was unintentional.

Okay, so if it wasn't intentional, how else could old videos still be in the system? Video is not a new feature, and the videos date back as far as 2008.

Facebook may be a bad player, but they have tons of talent working for them. Are you seriously suggesting that they stored a decade's worth of videos that never saw a single view/download and nobody there realized it?

  • I'm not on Facebook's side here but if you have to consider deletes in a large storage system then you have to consider fragmentation. I implemented a storage system similar to Facebook's Haystack (which they use for photos and videos) and did exactly the same - mark the replaced or deleted object as such and ignore it until a separate process compacts the stack.

    Compaction means copying huge quantities of data around and re-indexing everything in a particular stack, while at the same time maintaining good read performance. It's an expensive operation and not worth it unless you desperately need to reclaim space.

    The alternative is to overwrite deleted content, but that carries a performance cost because new files may need breaking up to fit into smaller gaps left by deleted files, leading to IO devices spending more time seeking per object. Defragmenting such a scheme is even more expensive than compacting a haystack-style scheme.

    So yes - the system may not actually destroy the bytes on disk by design. However, it should not report those objects as still being available to layers above it since doing so may lead to inconsistency. This leads me to believe that nothing was actually even marked as deleted, it was simply never "published".
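For anyone who hasn't seen this kind of design, a toy sketch of the mark-then-compact scheme described above might look like the following. It is not Haystack's real format or API. The `Stack` class, its in-memory index and the method names are all hypothetical, and a `bytearray` stands in for the file on disk.

```python
class Stack:
    """Toy append-only store: one buffer plus an in-memory index."""

    def __init__(self):
        self.data = bytearray()
        self.index = {}  # key -> (offset, length, deleted)

    def put(self, key, blob):
        # Replacing a key just appends; the old bytes become garbage.
        self.index[key] = (len(self.data), len(blob), False)
        self.data += blob

    def delete(self, key):
        # Only the index entry is flagged; the bytes stay where they are.
        offset, length, _ = self.index[key]
        self.index[key] = (offset, length, True)

    def get(self, key):
        entry = self.index.get(key)
        if entry is None or entry[2]:
            return None  # deleted objects are never reported upward
        offset, length, _ = entry
        return bytes(self.data[offset:offset + length])

    def compact(self):
        # Copy live objects into a fresh buffer and rebuild the index.
        # This is the expensive step that gets deferred.
        new_data, new_index = bytearray(), {}
        for key, (offset, length, deleted) in self.index.items():
            if deleted:
                continue
            new_index[key] = (len(new_data), length, False)
            new_data += self.data[offset:offset + length]
        reclaimed = len(self.data) - len(new_data)
        self.data, self.index = new_data, new_index
        return reclaimed


stack = Stack()
stack.put("a", b"xxxx")
stack.put("b", b"yy")
stack.delete("a")
```

After `delete("a")`, `get("a")` returns nothing even though the four bytes are still in the buffer. They only go away when `compact()` runs, which is the distinction between "not destroyed on disk" and "still available to the layers above".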