Comment by dna_polymerase
7 years ago
Okay, so if it wasn't intentional, how else could old videos still be in the system? Video isn't a new feature, and the videos date back as far as 2008.
Facebook may be a bad player, but they have tons of talent working for them. Are you seriously suggesting that they stored a decade's worth of videos that never saw a single view/download and nobody there realized it?
I'm not on Facebook's side here, but if you have to handle deletes in a large storage system, then you have to consider fragmentation. I implemented a storage system similar to Facebook's Haystack (which they use for photos and videos) and did exactly the same thing: mark the replaced or deleted object as such and ignore it until a separate process compacts the stack.
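A minimal Python sketch of the mark-and-ignore approach. The `Volume` and `Needle` names and the in-memory `bytearray` standing in for a disk file are hypothetical. The sketch is loosely modeled on the append-only idea, not on Haystack's actual on-disk format:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Needle:
    """Hypothetical index entry: where an object's bytes live in the volume."""
    offset: int
    size: int
    deleted: bool = False


@dataclass
class Volume:
    """Hypothetical append-only volume (a bytearray stands in for the file).

    Writes always go to the end; deleting or replacing an object only
    updates the index, so the old bytes stay where they were.
    """
    data: bytearray = field(default_factory=bytearray)
    index: Dict[str, Needle] = field(default_factory=dict)
    dead_bytes: int = 0  # space a later compaction could reclaim

    def put(self, key: str, payload: bytes) -> None:
        old = self.index.get(key)
        if old is not None and not old.deleted:
            self.dead_bytes += old.size  # the previous copy becomes garbage
        self.index[key] = Needle(offset=len(self.data), size=len(payload))
        self.data += payload  # always append, never overwrite

    def delete(self, key: str) -> None:
        needle = self.index.get(key)
        if needle is not None and not needle.deleted:
            needle.deleted = True  # mark only; the bytes are untouched
            self.dead_bytes += needle.size

    def get(self, key: str) -> Optional[bytes]:
        needle = self.index.get(key)
        if needle is None or needle.deleted:
            return None  # deleted objects must not be served to upper layers
        return bytes(self.data[needle.offset:needle.offset + needle.size])
```

After `v.put("a", b"hello")` and `v.delete("a")`, `v.get("a")` returns `None`, but `b"hello"` is still sitting in `v.data` until something compacts the volume.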
Compaction means copying huge quantities of data around and re-indexing everything in a particular stack, while at the same time maintaining good read performance. It's an expensive operation and not worth it unless you desperately need to reclaim space.
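To make the cost concrete, here is a hypothetical `compact` function. It uses the same simplifying assumptions: an in-memory buffer and an index of `key -> (offset, size, deleted)` tuples. It illustrates the idea and is not Haystack's actual compaction code:

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int, bool]]  # key -> (offset, size, deleted)


def compact(data: bytes, index: Index) -> Tuple[bytes, Index]:
    """Copy live objects into a fresh buffer and rebuild the index.

    Every live byte is read and rewritten, and every surviving index
    entry gets a new offset, which is why compaction is expensive.
    """
    new_data = bytearray()
    new_index: Index = {}
    # Walk in offset order so live objects keep their relative layout
    for key, (offset, size, deleted) in sorted(index.items(), key=lambda kv: kv[1][0]):
        if deleted:
            continue  # dropped for good: its bytes are not copied over
        new_index[key] = (len(new_data), size, False)
        new_data += data[offset:offset + size]
    return bytes(new_data), new_index
```

For example, compacting `b"aaaBBBBcc"` with `"b"` marked deleted yields `b"aaacc"` and moves `"c"` from offset 7 to offset 3. A real system would presumably build the new volume in the background while reads keep hitting the old one, then swap the two over.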
The alternative is to overwrite deleted content, but that carries a performance cost: new files may need to be broken up to fit into the smaller gaps left by deleted files, so IO devices spend more time seeking per object. Defragmenting such a scheme is even more expensive than compacting a Haystack-style scheme.
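A toy first-fit allocator shows how reusing gaps fragments new objects. The function name and the `[offset, length]` gap list are assumptions for illustration. Real allocators are considerably more sophisticated:

```python
from typing import List, Tuple


def allocate_first_fit(free: List[List[int]], size: int) -> List[Tuple[int, int]]:
    """Place an object of `size` bytes into gaps left by deleted objects.

    `free` is a list of [offset, length] gaps sorted by offset and is
    updated in place. Returns the (offset, length) extents used; an
    object that doesn't fit in one gap is split across several.
    """
    if sum(length for _, length in free) < size:
        raise ValueError("not enough free space")
    extents = []
    remaining = size
    while remaining > 0:
        offset, length = free[0]
        take = min(length, remaining)
        extents.append((offset, take))
        remaining -= take
        if take == length:
            free.pop(0)  # gap fully consumed
        else:
            free[0] = [offset + take, length - take]  # shrink the gap
    return extents
```

With gaps `[[10, 8], [40, 8], [90, 10]]`, a 20-byte object ends up as three extents, `(10, 8)`, `(40, 8)` and `(90, 4)`. That can mean three seeks to read one object, where an append-only layout would have stored it contiguously.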
So yes, the system may by design not actually destroy the bytes on disk. However, it should not report those objects as still available to the layers above it, since doing so may lead to inconsistency. This leads me to believe that nothing was ever even marked as deleted; it was simply never "published".