
Comment by dumbo-octopus

2 years ago

> delete: remove or obliterate (written or printed matter), especially by drawing a line through it or marking it with a delete sign

Which is, indeed, what every modern database does.

I think you are referring to tombstoning. That's usually a temporary process that may immediately delete the underlying data, keeping a tombstone to ensure the deletion propagates to all storage nodes. A compaction process purges the underlying data (if still present) and the tombstones after a suitable delay. It's a fancy delete that takes some time to process, but the data is eventually gone. You could turn off the compaction, if you wanted.

I believe Kafka makes deletion difficult, since it's an append-only log, but Kafka doesn't work well with laws that require deletion of data, so I don't believe it's a popular choice any longer (i.e., it isn't modern).
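A minimal sketch of the tombstone-and-compaction idea, assuming a toy last-write-wins key-value store. `TombstoneStore`, `grace_seconds`, and the explicit `now` timestamps are hypothetical names for illustration, not any particular database's API. It shows why the tombstone has to outlive the value: a replica that merges after the delete sees the tombstone and drops its stale copy, and only `compact()` makes the key disappear entirely.

```python
import time
from dataclasses import dataclass, field

TOMBSTONE = object()  # sentinel value marking a deleted key


@dataclass
class TombstoneStore:
    grace_seconds: float  # hypothetical delay before a tombstone may be purged
    _data: dict = field(default_factory=dict)  # key -> (value or TOMBSTONE, timestamp)

    def _ts(self, now):
        return time.time() if now is None else now

    def put(self, key, value, now=None):
        self._data[key] = (value, self._ts(now))

    def delete(self, key, now=None):
        # The value is dropped right away; the tombstone remains so the
        # deletion can propagate to replicas still holding an older copy.
        self._data[key] = (TOMBSTONE, self._ts(now))

    def get(self, key):
        value, _ = self._data.get(key, (TOMBSTONE, 0))
        return None if value is TOMBSTONE else value

    def merge_from(self, other):
        # Last-write-wins replication: a newer tombstone overrides an older value.
        for key, (value, ts) in other._data.items():
            if key not in self._data or self._data[key][1] < ts:
                self._data[key] = (value, ts)

    def compact(self, now=None):
        # Purge tombstones older than the grace period; afterwards no trace of the key remains.
        now = self._ts(now)
        expired = [k for k, (v, ts) in self._data.items()
                   if v is TOMBSTONE and now - ts >= self.grace_seconds]
        for k in expired:
            del self._data[k]
        return len(expired)


primary = TombstoneStore(grace_seconds=10)
replica = TombstoneStore(grace_seconds=10)
primary.put("user:1", "alice", now=0)
replica.merge_from(primary)      # replica now holds a copy
primary.delete("user:1", now=5)
replica.merge_from(primary)      # tombstone propagates and wins
print(replica.get("user:1"))     # None
print(replica.compact(now=10))   # 0: still inside the grace period
print(replica.compact(now=15))   # 1: tombstone purged, key fully gone
```

If compaction ran before every replica had merged the tombstone, a stale replica could resurrect the value, which is why real systems wait out a grace period before purging (Cassandra's `gc_grace_seconds` is one example).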

  • If you run a DELETE FROM in any modern SQL engine, which is the absolute best you could expect when asking for a delete in the UI^, the data is nowhere near gone. It’s still in all the backups, all the WALs, all the transactions that started before yours, etc. It’s marked for eventual removal, and that’s it. Just as the definition of delete I provided says.

    ^ (more likely they’ll just update the table to set a deleted flag)

    • > eventual removal

      To me, the idea that the deletion takes time to complete doesn't negate the idea that the data will be gone once the process completes.

      WAL archives and backups are external systems. You could argue that nothing supports deletion because an external backup could exist, but that's not a useful conversation.


    • Imagine the data that was deleted is of the highest level of illegality you can imagine. Under no circumstance can your service be associated with that content.

      - What was your "definition of delete" again?

      - You mentioned some of the convenient technical defaults your frameworks and tools provide out-of-the-box, can you think of ways to improve the situation?

      (You might re-run delete requests after restoring a backup; transactions should resolve in a timely fashion; failed deletes can be communicated to the user quickly, etc.)

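Re-running delete requests after a restore could look roughly like this. It's a sketch assuming Python's built-in `sqlite3`, with in-memory databases standing in for the live DB and its backup; `deletion_log`, `replay_deletions`, and the `users` table are hypothetical, and the key assumption is that the log of delete requests is stored outside the backups so it survives a restore.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database with a users table holding (id, name) rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
    db.commit()
    return db


def user_ids(db):
    return [row[0] for row in db.execute("SELECT id FROM users ORDER BY id")]


def replay_deletions(db, deletion_log):
    # Deleting a row that is already gone is a no-op, so replaying the full log is idempotent.
    with db:
        db.executemany("DELETE FROM users WHERE id = ?", [(user_id,) for user_id in deletion_log])


live = make_db([(1, "alice"), (2, "bob"), (3, "carol")])
backup = sqlite3.connect(":memory:")
live.backup(backup)  # snapshot taken before the delete request arrives

deletion_log = []    # hypothetical: persisted separately from the backups
with live:
    live.execute("DELETE FROM users WHERE id = ?", (2,))
deletion_log.append(2)

# Later the live database is lost and restored from the old snapshot.
restored = sqlite3.connect(":memory:")
backup.backup(restored)
print(user_ids(restored))  # [1, 2, 3]: the deleted user is back
replay_deletions(restored, deletion_log)
print(user_ids(restored))  # [1, 3]
```

Because `DELETE ... WHERE id = ?` does nothing for rows that are already gone, the whole log can simply be replayed after every restore. The `backup` snapshot itself still contains user 2, which is exactly the point made upthread about backups.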

Every modern file system works like this too. Then there’s copy-on-write snapshotting and SSD wear leveling to worry about. Data isn’t actually destroyed until the space is reused to store something else at an indeterminate point in the future.

Or when its encryption key is overwritten.
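Destroying the key instead of the data is usually called crypto-shredding. A toy sketch, assuming each record gets its own key held in a separate key store: to stay within Python's standard library it uses a one-time pad (XOR with a random key as long as the record) rather than the AES-plus-key-management setup a real system would use, and `ShreddableStore` is a hypothetical name.

```python
import secrets


class ShreddableStore:
    """Toy crypto-shredding: every record is encrypted under its own key."""

    def __init__(self):
        self.keys = {}         # record id -> key (assumed: a small, separately managed key store)
        self.ciphertexts = {}  # record id -> ciphertext (copies may linger in snapshots, backups, SSD blocks)

    def put(self, record_id, plaintext: bytes):
        key = bytearray(secrets.token_bytes(len(plaintext)))  # one-time pad key
        self.keys[record_id] = key
        self.ciphertexts[record_id] = bytes(p ^ k for p, k in zip(plaintext, key))

    def get(self, record_id):
        key = self.keys.get(record_id)
        if key is None:
            return None  # without the key the ciphertext is indistinguishable from random bytes
        return bytes(c ^ k for c, k in zip(self.ciphertexts[record_id], key))

    def shred(self, record_id):
        # Overwrite the key in place, then forget it; the ciphertext is left untouched.
        # Best effort only: Python does not guarantee no other copy of the key exists in memory.
        key = self.keys.pop(record_id)
        key[:] = bytes(len(key))


store = ShreddableStore()
store.put("msg-1", b"attack at dawn")
print(store.get("msg-1"))  # b'attack at dawn'
store.shred("msg-1")
print(store.get("msg-1"))  # None, even though the ciphertext is still stored
```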

But it probably is a good idea to stop returning deleted data from web APIs.
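For the soft-delete case (a deleted flag or timestamp, as mentioned upthread), keeping deleted data out of API responses mostly means every read path has to filter it. A minimal sketch using Python's `sqlite3`; the `posts` table, the `deleted_at` column, and the `live_posts` view are hypothetical names, and routing API reads through a view is just one way to keep the filter in a single place.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, deleted_at TEXT)")
# API handlers only ever query this view, so none of them can forget the filter.
db.execute("CREATE VIEW live_posts AS SELECT id, body FROM posts WHERE deleted_at IS NULL")
db.executemany("INSERT INTO posts (id, body) VALUES (?, ?)", [(1, "hello"), (2, "oops")])
db.commit()


def soft_delete(post_id):
    # Marks the row as deleted; the data itself is still in the table.
    with db:
        db.execute("UPDATE posts SET deleted_at = datetime('now') WHERE id = ?", (post_id,))


def list_posts_for_api():
    rows = db.execute("SELECT id, body FROM live_posts ORDER BY id")
    return [{"id": post_id, "body": body} for post_id, body in rows]


soft_delete(2)
print(list_posts_for_api())  # [{'id': 1, 'body': 'hello'}]
```

The soft-deleted row is still physically present in `posts`; a hard `DELETE` or a later purge job is what actually removes it.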

This is why, when I'm building a confirmation UI, I prefer the term "destroy?"* on the confirm action. It makes it much clearer to the user that this is a destructive, irreversible action and that we will be removing this data/state.

*Obviously, this doesn't apply to soft deletes.