Comment by tharkun__

12 hours ago

Agreed. And if deletes are soft, you likely really wanted a complete audit history of all updates too (at least that's how it's been for the cases I've been part of). And then performance would _definitely_ suffer if you don't have a separate audit/archive table for all of those.

I mean, yes, letting tables grow forever doesn't tend to work.

I've seen a number of apps that require audit histories work on a schedule: records are archived at a particular time, and that's when the actual deletes happen and the indexes get fully rebuilt. This is typically scheduled during the least busy time of the year, as it's rather I/O intensive.
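To make that concrete, a rough sketch of such a yearly archival job in Postgres-flavored SQL. All names here are made up (an orders table with a deleted_at soft-delete column and a matching orders_archive table), and the details obviously depend on the engine and the retention rules:

    -- Move soft-deleted rows past the retention cutoff into the archive,
    -- purge them from the hot table, then rebuild the bloated indexes.
    BEGIN;

    INSERT INTO orders_archive
    SELECT *
    FROM orders
    WHERE deleted_at IS NOT NULL
      AND deleted_at < now() - interval '7 years';

    DELETE FROM orders
    WHERE deleted_at IS NOT NULL
      AND deleted_at < now() - interval '7 years';

    COMMIT;

    -- The I/O-heavy part: rebuilding the indexes after the mass delete.
    -- CONCURRENTLY avoids long locks but still hammers the disks,
    -- hence scheduling this for the quietest time of the year.
    REINDEX TABLE CONCURRENTLY orders;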

  • Oldest I've worked with was a project started in ~1991. I don't recall exactly when they started keeping history or for how long (they may have trimmed it after some shorter legal retention period), but I worked on it ~15 years after that. And that's what, 15 to 20 years ago by now, and I doubt they've changed that part of the system. You've all likely bought products that were administered through this system.

    FWIW, no "indexes fully rebuilt" upon "actual deletion" or anything like that. The regular tables were always just "current" tables. History was kept in archive tables that were always up-to-date via triggers. Essentially, current tables never suffered any performance issues and history was available whenever needed. If history access was needed for extensive querying, read replicas were able to provide this without any cost to the main database but if something required "up to the second" consistency, the historic tables were available on the main database of course with good performance (as you can tell from the timelines, this was pre-SSDs, so multi-path I/O over fibre was what they had at the time I worked with it with automatic hot-spare failover between database hosts - no clouds of any kind in sight). Replication was done through replicating the actual SQL queries modifying the data on each replica (multiple read replicas across the world) vs. replicating the data itself. Much speedier, so that the application itself was able to use read replicas around the globe, without requiring multi-master for consistency. Weekends used to "diff" in order to ensure there were no inconsistencies for whatever reason (as applying the modifying SQL queries to each replica does of course have the potential to have the data go out of sync - theoretically).

    Gee, I'm old, lol!
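For anyone who hasn't seen the trigger-kept archive pattern, here's a rough sketch of the idea, with made-up table names and Postgres-flavored syntax purely for illustration (the real system's dialect looked different in detail):

    -- History table mirrors the current table's columns, plus audit metadata.
    CREATE TABLE products_history (
        LIKE products,
        changed_at  timestamptz NOT NULL DEFAULT now(),
        change_type text        NOT NULL
    );

    -- Trigger function: copy every modification into the history table.
    CREATE OR REPLACE FUNCTION products_audit() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'DELETE' THEN
            INSERT INTO products_history SELECT OLD.*, now(), TG_OP;
        ELSE
            INSERT INTO products_history SELECT NEW.*, now(), TG_OP;
        END IF;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER products_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON products
    FOR EACH ROW EXECUTE FUNCTION products_audit();

The point being: the current table stays lean, every change lands in the history table synchronously in the same transaction, and heavy historical querying can be pointed at a read replica.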