Comment by KronisLV
18 days ago
> I've seen tables where 50%-70% were soft-deleted, and it did affect the performance noticeably.
I think we largely need support for "soft deletes" to be baked into SQL or its dialects directly and treated as something transparent (selecting soft deleted rows = special case, regular selects skip those rows; support for changing regular DELETE statements into doing soft deletes under the hood).
https://news.ycombinator.com/item?id=41272903
And then make dynamically sharding data by deleted/not deleted really easy to configure.
You soft deleted a few rows? They get moved to another DB instance, an archive/bin of sorts. Normal queries wouldn't even consider it, only when you explicitly try to select soft deleted rows would it be reached out to.
Well, Microsoft SQL Server has built-in Temporal Tables [1], which even take this one step further: they track all data changes, such that you can easily query them as if you were viewing them in the past. You can not only query deleted rows, but also the old versions of rows that have been updated.
(In my opinion, replicating this via a `validity tstzrange` column is also often a sane approach in PostgreSQL, although OP's blog post doesn't mention it.)
[1]: https://learn.microsoft.com/en-us/sql/relational-databases/t...
MariaDB has system-versioned tables, too, albeit a bit worse than MS SQL as you cannot configure how to store the history, so they're basically hidden away in the same table or some partition: https://mariadb.com/docs/server/reference/sql-structure/temp...
This has, at least with current MariaDB versions, the annoying property that you really cannot ever again modify the history without rewriting the whole table, which becomes a major pain in the ass if you ever need schema changes and history items block those.
Maria still has to find some proper balance here between change safety and developer experience.
> I think we largely need support for "soft deletes" to be baked into SQL
I think web and GUI programmers must stop expeting the database to contain the data already selected and formatted for their nice page.
> I think web and GUI programmers must stop expeting the database to contain the data already selected and formatted for their nice page.
So a widespread, common and valid practice shouldn't be made better supported and instead should rely on awkward hacks like "deleted_at" where sooner or later people or ORMs will forget about those semantics and will select the wrong thing? I don't think I agree. I also don't think that it has much to do with how or where you represent the data. Temporal tables already do something similar, just with slightly different semantics.
What way of making it better supported wouldn’t require custom semantics that people would forget and then select the wrong thing.
1 reply →