Comment by patates

1 month ago

Trigger-based approach is the only one that really works in my experience. Partition the archive table in a way that makes sense for your data and you're good to go.

Some more rules to keep it under control:

Partition table has to be append-only. Duh.

Recovering from a delete needs to be done in the application layer. The archive is meant to be a historical record, not an operational data store. Also by the time you need to recover something, the world may have changed. The application can validate that restoring this data still makes sense.

If you need to handle updates, treat them as soft deletes on the source table. The trigger captures both the old state (before update) and continues normally. Your application can then reconstruct the timeline by ordering archive records by timestamp.

Needless to say, make sure your trigger fires BEFORE the operation, not AFTER. You want to capture the row state before it's gone. And keep the trigger logic dead simple as any complexity there will bite you during high-traffic periods.

For the partition strategy, I've found monthly partitions work well for most use cases. Yearly if your volume is low, daily if you're in write-heavy territory. The key is making sure your common queries (usually "show me history for entity X" or "what changed between dates Y and Z") align with your partition boundaries.

0 comments

patates

No comments yet

Contribute on Hacker News ↗