Comment by evanmoran

3 months ago

Just curious, how do people feel about this general style of soft deletes currently? Do people still use these in production, or do they prefer to delete fully, or alternatively move deleted rows to separate tables / a separate schema?

The complexity still feels awkward enough that it makes me wonder whether deleted_at is worth it. Maybe there are better patterns out there to make this cleaner, like triggers to prevent deletion, or something else?
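One way the "triggers to prevent deletion" idea could look, as a minimal sketch only: SQLite through Python's sqlite3 module, with a hypothetical products table and trigger name (neither comes from the article). The trigger aborts every DELETE, so the only way to retire a row is to set deleted_at.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            deleted_at TEXT  -- NULL means the row is live
        );
        -- Reject hard deletes so callers have to set deleted_at instead.
        CREATE TRIGGER products_no_hard_delete
        BEFORE DELETE ON products
        BEGIN
            SELECT RAISE(ABORT, 'hard delete blocked: set deleted_at instead');
        END;
    """)
    return conn


def soft_delete(conn: sqlite3.Connection, product_id: int) -> None:
    """Mark a live row as deleted; already-deleted rows keep their timestamp."""
    conn.execute(
        "UPDATE products SET deleted_at = datetime('now') "
        "WHERE id = ? AND deleted_at IS NULL",
        (product_id,),
    )


def try_hard_delete(conn: sqlite3.Connection, product_id: int) -> bool:
    """Attempt a real DELETE; returns False when the trigger rejects it."""
    try:
        conn.execute("DELETE FROM products WHERE id = ?", (product_id,))
        return True
    except sqlite3.DatabaseError:
        return False
```

The trade-off is that legitimate purges (retention expiry, erasure requests) then need a deliberate escape hatch, such as dropping the trigger inside a maintenance job.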

As for the article, I couldn't agree more on having timestamps / user ids on all actions. I'd even suggest updated_by to add to the list.

13 comments


j_w 3 months ago

Financial world: records have a "close" or "expire" date, and they are purged some period of time after that date. A deletion doesn't just happen; the record is updated to be "closed" or "expired" and some time after that it's deleted.

Something like a loan could live in a production environment for well over a year after closing, while an internal note may last just a month.
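A rough sketch of that close-then-purge flow, assuming SQLite via Python's sqlite3 module. The records table, the record kinds, and the retention periods are all made up for illustration; real values would come from policy or regulation.

```python
import sqlite3

DAY = 24 * 60 * 60

# Hypothetical retention periods per record kind, in seconds.
RETENTION_SECONDS = {
    "loan": 730 * DAY,
    "note": 30 * DAY,
}


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE records (
            id INTEGER PRIMARY KEY,
            kind TEXT NOT NULL,
            closed_at INTEGER  -- Unix seconds; NULL while the record is open
        )
    """)
    return conn


def close_record(conn: sqlite3.Connection, record_id: int, now: int) -> None:
    """'Deleting' only stamps the close time; the row stays in place."""
    conn.execute(
        "UPDATE records SET closed_at = ? WHERE id = ? AND closed_at IS NULL",
        (now, record_id),
    )


def purge_expired(conn: sqlite3.Connection, now: int) -> int:
    """Hard-delete closed records whose retention period has run out."""
    purged = 0
    for kind, keep_for in RETENTION_SECONDS.items():
        cur = conn.execute(
            "DELETE FROM records "
            "WHERE kind = ? AND closed_at IS NOT NULL AND closed_at <= ?",
            (kind, now - keep_for),
        )
        purged += cur.rowcount
    return purged
```

Here purge_expired would run from a scheduled job; open records are never touched because their closed_at is NULL.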

swagasaurus-rex 3 months ago

I think soft deletes using timestamptz are a good thing.

Deleting rows directly could mean you're breaking references. For example, say you have a product that the seller wants to delete. Well, what happens if customers have purchased that product? You still want it in the database, and you still want to fulfill the orders placed.

Your backend can selectively query for products, filter out deleted_at for any customer facing queries, but show all products when looking at purchase history.

There are times when deleting rows makes sense, but that's usually because you have a write-heavy table that needs clearing. Yes, soft deletes require being careful that WHERE clauses filter out deleted rows, but that's a feature, not a bug.
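To make the product example concrete, here is a small sketch using SQLite via Python's sqlite3 module. The products and orders tables and the function names are assumptions for illustration; the thread mentions timestamptz (Postgres), while this sketch stores the timestamp as text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            deleted_at TEXT  -- NULL means the product is still on sale
        );
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            product_id INTEGER NOT NULL REFERENCES products(id)
        );
    """)
    return conn


def soft_delete_product(conn: sqlite3.Connection, product_id: int) -> None:
    """The seller 'deletes' a product: the row stays, only the flag changes."""
    conn.execute(
        "UPDATE products SET deleted_at = datetime('now') "
        "WHERE id = ? AND deleted_at IS NULL",
        (product_id,),
    )


def storefront(conn: sqlite3.Connection) -> list[str]:
    """Customer-facing listing: soft-deleted products are filtered out."""
    rows = conn.execute(
        "SELECT name FROM products WHERE deleted_at IS NULL ORDER BY name"
    )
    return [name for (name,) in rows]


def purchase_history(conn: sqlite3.Connection, customer: str) -> list[str]:
    """Order history: no deleted_at filter, so old purchases still resolve."""
    rows = conn.execute(
        "SELECT p.name FROM orders o "
        "JOIN products p ON p.id = o.product_id "
        "WHERE o.customer = ? ORDER BY o.id",
        (customer,),
    )
    return [name for (name,) in rows]
```

The orders.product_id reference stays valid because the product row is never removed, which is the point about not breaking references.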

pmontra 3 months ago

> what happens if customers have purchased that product? You still want it in the database, and you still want to fulfill the orders placed.
You might still want to show those customers their purchase history, including what they bought 25 years ago. For example, my ISP no longer offers the 10 Mb/s fiber optic product I bought in 2000, because it was superseded by 100 Mb/s products and then by 1 Gb/s ones. It's also not my ISP anymore, but I use it for the SIM in my phone. That also accumulated a number of product changes over the years.
And think about the inventory of e-shops with a zillion products and the archive of past orders. Maybe they keep the last few years, maybe everything until the db gets too large.

refset 3 months ago

> Maybe there are better patterns out there to make this cleaner

SQL:2011 temporal tables are worth a look.

zie 3 months ago

If you have a good audit log, it really doesn't matter. You can always restore it if need be.

If you have no audit log (or a bad one), like lots of apps, then you have to care a lot.

Personally, I just implement a good audit log and then I just delete with impunity. Worst case scenario, someone (maybe even me) made a mistake and I have to run undo_log_audit() with the id of the audit log entry I want to put back. Nearly zero hassle.

The upside, when something goes wrong, I can tell you who, what and when. I usually have to infer the why, or go ask a human, but it's not usually even difficult to do that.
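For illustration only, here is a guess at the general shape of such a setup, not the actual implementation described above. It assumes SQLite via Python's sqlite3 module, a hypothetical notes table, and application-level logging; the log_audit and undo_log_audit names are borrowed from the thread, but their columns and behavior here are assumptions.

```python
import json
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE log_audit (
            id INTEGER PRIMARY KEY,
            who TEXT NOT NULL,        -- who made the change
            action TEXT NOT NULL,     -- what kind of change it was
            row_json TEXT NOT NULL,   -- the row as it was before the change
            at TEXT NOT NULL DEFAULT (datetime('now'))  -- when it happened
        );
    """)
    return conn


def audited_delete(conn: sqlite3.Connection, who: str, note_id: int):
    """Hard-delete a note, logging it first. Returns the audit id, or None."""
    with conn:  # the log entry and the delete commit (or roll back) together
        row = conn.execute(
            "SELECT id, body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "INSERT INTO log_audit (who, action, row_json) VALUES (?, 'delete', ?)",
            (who, json.dumps(dict(row))),
        )
        conn.execute("DELETE FROM notes WHERE id = ?", (note_id,))
        return cur.lastrowid


def undo_log_audit(conn: sqlite3.Connection, audit_id: int) -> None:
    """Put back the row recorded in one audit entry (deletes only, in this sketch)."""
    with conn:
        entry = conn.execute(
            "SELECT action, row_json FROM log_audit WHERE id = ?", (audit_id,)
        ).fetchone()
        if entry is None or entry["action"] != "delete":
            raise ValueError(f"audit entry {audit_id} is not an undoable delete")
        conn.execute(
            "INSERT INTO notes (id, body) VALUES (:id, :body)",
            json.loads(entry["row_json"]),
        )
```

A fuller version would also log inserts and updates, and would record the undo itself as another audit entry.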

dml2135 3 months ago
Can you share more about what makes a good audit log? My company doesn’t currently have one and I’m a little lost on where to start.
Should this be at the application code level, or the ORM, or the database itself?
- zie 3 months ago
  
  That depends on where the data you need to keep track of is and your architecture. The important thing is, you want your audit log to be able to tell you:
  * Who
  * What
  * When
  * Ideally Why
  For any change in the system. Also when storing the audit log, take into account that you might need to undo things that happened (not just deletes). For instance maybe some process went haywire and inserted 100k records it wasn't supposed to. With a good audit log, you should be able to run something like undo_log_audit(rec1, rec100k) and it will do the right thing. I'm not saying that code needs to exist day 1, but you should take into account the ability to do that when designing it.
  Also you need to take into account your regulatory environment. Sometimes it's very very important that your audit logs are write once, and read only afterwards and are stored off machine, etc. Other times it's just for internal use and you can be a little more lax about data integrity of your audit logs.
  Our app is heavily database centric. We push into the DB the current unix user, the current PID of the process connecting to the DB, etc. (also every user has their own login to the DB so it handles our authentication too). This means our database (Postgres) does all of the audit logging for us. There are plenty of Postgres audit logging extensions. We run 2 of them. One that is trigger based, creating entries in a log_audit table (which the undo_log_audit() code uses along with most reporting use cases), and a second one that writes out to syslog (so we can move logs off machine and keep them read only). We are in a regulated industry that gets audited regularly however. Not everyone needs the same level of audit logging.
  You need to figure out how you can answer the above questions given your architecture. Normally the "Why" question is hard to answer without talking with a human, but unless you have the who, what and when, it's nearly impossible to even get to the Why part of the question.
- mrkeen 3 months ago
  
  I once worked in a small VB6-based team - you can probably guess the attitudes and surrounding tech were just as out of date.
  I tried to push for using svn, rather than just making copies of our source code folders and adding dates to them.
  My manager allowed me to use svn, but to make sure I also did things the proper way by making copies of the source code folders.
  That's the current level of discourse around audit logs. Write down what happened using your data tables ... but write down what really happened in the audit logs.
  At some point you should just lean into putting audit logs first (just like developers reach for git first).
- lexh 3 months ago
  
  It is Postgres specific, but I’ve gotten a lot of mileage out of the advice in this article:
  https://supabase.com/blog/postgres-audit
- jimbokun 3 months ago
  
  Probably application level in most cases as those other levels probably don’t have all the information you want to include.

PeterStuer 3 months ago

There can be legal requirements to retain data for a specified time for law enforcement and audits, while at the same time other legal requirements oblige you to delete data upon customer request.

Doing this with pure 'hard' deletes is not possible, unless you maintain 2 different tables, one of which would still have the soft delete explicit or implicit. You could argue the full db log would contain the data for the former requirement, but while academically correct this does not fly in practice.

imcritic 3 months ago

Always soft-deletion first. Then it gets exported to a separate archive, and only then, after some time, may it be fully deleted from the original database.
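A sketch of that soft-delete, archive, then hard-delete sequence, assuming SQLite via Python's sqlite3 module. The customers and customers_archive tables and the grace period parameter are hypothetical, and the archive lives in the same database here purely to keep the example short.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            id INTEGER PRIMARY KEY,
            email TEXT NOT NULL,
            deleted_at INTEGER  -- Unix seconds; NULL means not deleted
        );
        CREATE TABLE customers_archive (
            id INTEGER PRIMARY KEY,
            email TEXT NOT NULL,
            deleted_at INTEGER NOT NULL
        );
    """)
    return conn


def archive_and_purge(conn: sqlite3.Connection, now: int, grace_seconds: int) -> int:
    """Copy rows soft-deleted before the cutoff to the archive, then hard-delete them.

    Both statements run in one transaction, so a row is never removed
    from customers without first landing in customers_archive.
    """
    cutoff = now - grace_seconds
    with conn:
        conn.execute(
            "INSERT INTO customers_archive (id, email, deleted_at) "
            "SELECT id, email, deleted_at FROM customers "
            "WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
            (cutoff,),
        )
        cur = conn.execute(
            "DELETE FROM customers WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
            (cutoff,),
        )
        return cur.rowcount
```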

metanonsense 3 months ago

In our product, we have different strategies depending on the requirements. Sometimes, we just delete. Sometimes, we do soft delete with timestamps. Sometimes, we have a history table with or without versioned entities. Sometimes, we have versions in the table. Sometimes, we have an audit log. Sometimes, we use event sourcing (although everyone in the team hates it ;-)