Comment by jackfranklyn

1 month ago

The query complexity is the bit that catches teams off guard. You tell yourself "just add WHERE deleted_at IS NULL everywhere" but then you're six months in and someone's debugging why a report is showing ghost data because one query in a chain of 12 missed the filter.

Views help, but then you're maintaining parallel access patterns. And the moment you need to actually query deleted records (audit, support tickets, undo) you're back to bypassing your own abstractions.

Event sourcing solves this more cleanly but the operational overhead is real - most teams I've seen try it end up with a hybrid where core entities are event-sourced and everything else is just soft deleted with fingers crossed.

3 comments

jackfranklyn

tucnak 1 month ago

I'm struggling to see your point. CREATE VIEW not only helps, yes, indeed it's oftentimes exactly all you need. If you have multiple access patterns, like having to "actually query deleted records" sometimes, somewhere, at some point, someone would have to maintain invariants on these access patterns. This is not rocket science. The heart of the matter is that SWE's cannot handle schema/basic SQL to save their lives, whilst analysts/BI guys/whomever actually somewhat well-versed in SQL, have very little grasp on the inner working of a database, and carry with themselves idiosyncrasies coming all the way back from the 90's.

The pot is calling the kettle black.

Forget about soft deletes for a hot minute. I can give you another super basic example where in my experience SWE's and BI guys both lose the plot: Type 2 slowly-changing dimensions. This is actually heavily related to soft deletes, and much more common as far as access patterns are concerned. Say, you want to support data updates without losing information unless specified by a retention policy. For argument's sake, let's say you want to keep track of edits in the user profile. How do you do it? If you go read up on Stackoverflow, or whatever, you will come across the idea that did more violence to schemas worldwide than anything else in existence, "audit table." So instead of performing a cheap INSERT on a normalised data structure every time you need to make a change, and perhaps reading up-to-date data from a view, you're now performing costly UPDATE, and additional INSERT anyway. Why? Because apparently DISTINCT ON and composite primary keys are black magic (and anathema to ORM's in general.) If you think on BI side they're doing any better, you think wrong! To them, DISTINCT ON is oftentimes a mystery no less. One moment, blink, there you go, back in the subquery hell they call home.

Databases are beautiful, man.

It's a shame they are not treated with more respect that they deserve.

manoDev 1 month ago

I believe this all stems from primordial SQL focusing on storage efficiency, and now it’s kinda hard to retrofit better data modeling ideas without better affordances.
If I started from scratch, I would get rid of UPDATE and DELETE (these would be only very special cases for data privacy), and instead focus on first class views (either batch copy or streaming) and retention policies.
lazide 1 month ago

It’s just a lot of overhead (in every way) if you’re just trying to store some rows and columns.