Comment by rorylaitila

11 hours ago

Databases store facts. Creating a record = new fact. "Deleting" a record = new fact. But destroying rows from tables = disappeared fact. That is not great for most cases. In rare cases the volume of records may be a technical hurdle, in which case move the facts to another database. The number of times I've wanted to destroy large volumes of facts is approximately zero.

When you start thinking of data as a potentially toxic asset with a maintenance cost to ensure it doesn't leak and cause an environmental disaster, it becomes more likely that you'd want to get rid of large volumes of facts.

Unless your database is immutable, every change to a record causes a "disappeared fact".

There are many legitimate reasons to delete data. The decision to retain data forever should not be taken lightly.

  • Yes. Another way to look at databases is that they store the state at a given time. We can augment tables with valid_from and valid_to columns to retrieve the state at a particular time. In that case there is never a DELETE, only INSERTs and UPDATEs of the valid_to column. Maybe this is what you mean by an immutable database.

    The problems are mostly the same as with soft delete: valid_to is more or less the same as deleted_at, which we probably need anyway to mark a record as deleted rather than simply updated. Furthermore, there are far more records in the db. And what about the primary key? Maybe those extra records go to a history table, keeping the current table slim and with a unique primary key that isn't augmented by some artificial extra key. There are a number of possible designs.
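A minimal sketch of the valid_from/valid_to scheme described above, using SQLite from Python. The table and column names are illustrative assumptions, not from any particular system; an update closes the current row's valid_to and inserts a new row, so nothing is ever destroyed:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE users_history (
        id         INTEGER,            -- business key, deliberately not unique
        name       TEXT,
        valid_from INTEGER NOT NULL,   -- e.g. a unix timestamp
        valid_to   INTEGER             -- NULL marks the current version
    )
""")

def upsert(user_id, name, now):
    # Close the current version (if any), then insert the new fact.
    # There is never a DELETE: all history is preserved.
    db.execute("UPDATE users_history SET valid_to = ? "
               "WHERE id = ? AND valid_to IS NULL", (now, user_id))
    db.execute("INSERT INTO users_history (id, name, valid_from, valid_to) "
               "VALUES (?, ?, ?, NULL)", (user_id, name, now))

def as_of(user_id, t):
    # Retrieve the state as it was at time t.
    row = db.execute(
        "SELECT name FROM users_history WHERE id = ? "
        "AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (user_id, t, t)).fetchone()
    return row[0] if row else None

upsert(1, "alice", now=100)
upsert(1, "alice smith", now=200)
print(as_of(1, 150))   # "alice" -- the state before the rename
print(as_of(1, 250))   # "alice smith"
```

Note the trade-off mentioned above: the primary key problem is dodged here by making id non-unique in the history table, which is exactly why many designs split off a separate history table instead.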

Agreed. In fact I believe there should be two main operations in a data store: retrieve and insert. For this to actually work in practice, you probably need different types of data stores for different phases of data. Unfortunately, few people have a good understanding of the data life cycle.
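A sketch of what an insert/retrieve-only store could look like, under the assumption (mine, not the comment's) that a "delete" is represented as a tombstone fact and current state is derived by reading the log backwards:

```python
class AppendOnlyStore:
    """A store with exactly two operations: insert and retrieve."""

    def __init__(self):
        self._log = []          # insert-only log of (key, value) facts

    def insert(self, key, value):
        self._log.append((key, value))

    def retrieve(self, key):
        # The most recent fact wins; a None value acts as a tombstone.
        for k, v in reversed(self._log):
            if k == key:
                return v
        return None

store = AppendOnlyStore()
store.insert("a", 1)
store.insert("a", 2)        # an update is just a newer fact
store.insert("a", None)     # a "delete" is also just a newer fact
print(store.retrieve("a"))  # None -- yet every prior fact survives in the log
```

Moving old facts to a different store for another phase of the data life cycle would then just mean copying a prefix of `_log` elsewhere; nothing in the hot store ever has to be destroyed in place.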