Comment by fathomdeez

2 days ago

This kind of issue always comes up when people put business logic inside the database. Databases are for data. The data goes in and the data goes out, but the data does not get to decide what happens next based on itself. That's what application code is for.

The way you model data and store it in your database is fundamentally a part of your business logic. The same data can be modeled in many different ways, with different trade-offs for different use cases. Especially if you have a large amount of data, you can't just work with it as is, you need to know how you will use it and model it in a way that makes the common operations fast enough. As your application evolves, this may change, and even require data migrations.

None of this means you have to or even should use stored procedures, triggers, or listen/notify. I'm just making the point that there is no clean separation between "data" and "business logic".

  • Can't upvote this enough. The point is not that procedures outside of the DB is wrong, nor is it that procedures should always go into the DB. It's that you should look at the context and decide what the best way to solve the problem is.

    • Agreed. I used triggers frequently for things like incrementing/decrementing count fields for dashboards because it's the only way to guarantee those numbers are correct while ensuring something in the application hasn't bypassed a callback or handler to modify the data.

      You only need to cover three scenarios and it's very simple to implement. Recorded added +1, Record removed -1, Record moved +1 & -1.

      If you have counts that are more complicated, it doesn't work but this solution easily beats semi-frequent COUNT queries.

The first thing I did when I saw this article was to check the Postgres docs, because I thought "heh, surely they just didn't read the fine print," but the LISTEN/NOTIFY page has zero mentions of "lock" in the entire content.

  • I think, It's because the locking is part of the transaction commit locking, but yes it should be mentioned.

    But it's oversimplified a case of "high queue load f* up the availability/timings for other DB operations" (and themself).

    And thats a generic problem you have, even if just due to "generic CPU/WAL/disk load" if you put your queue into your DB even iff that specific lock would be somehow solved with some atomic concurrent algorithms or similar (not sure if that even is possible).

    So in general make your storage db, and queue a different service (and you cache too), even if it uses the same kind of storage. (Through technically there are clever in-between solutions which run their own queue service but still use you DB for final storage but have a ton of caching, in memory locking etc. to remove a huge part of the load from the DB. )

  • I really hope somebody reading this article (or HN thread) writes a doc patch to mention that.

    I'm unlikely to get it myself today, and by tomorrow I've probably already forgotten it :-(

    • > and by tomorrow I've probably already forgotten it :-(

      You're self-aware and are writing about it, why not maintain and add it to your todo list if this is a recurring issue?

  • One can replace LISTEN/NOTIFY with logical replication / CDC. And it's funny because somehow, somewhere, PG must be serializing the writing of the WAL to some degree. So it's not clear to me why LISTEN/NOTIFY needs additional serialization. Perhaps PG should turn NOTIFY into INSERTs on a special table that a worker process watches and turns those inserts into notifies (and deletes the inserts).

That may hold to a certain extent for relational databases where your business model doesn't align well with physical model (tables). Although you might wonder why stored procedures and triggers were invented.

In databases where your domain is also your physical data model, coupling business logic to the database can work quite well, if the DBMS supports that.

https://medium.com/@paul_42036/entity-workflows-for-event-dr...

> the data does not get to decide what happens next based on itself.

Then why bother with a relational database? Relations and schemas are business logic, and I'll take all the data integrity I can get.

  • I think an argument can be made that relations, schemas and constraints encode a kind of business logic that is intrinsic to the definition and integrity of the data, while other types of business logic represent processes that may hinge on data but aren’t as tightly coupled to it. Similar to the difference between a primitive type and a function.

    I guess some will argue that their business logic is special and really is so tightly coupled to the data definition that it belongs in the database, and I’m not going to claim those use cases don’t exist, but I’ve seen over-coupling far more often than under-coupling.

    This is why I say: Applications come and go, but data is forever.

  • I've seen both of these philosophies. I liken them to religions, the believers are devout. Code is King vs the DB is King.

    I'm personally Code is King, and I have my reasons (like everyone else)

    • Every company I’ve been at that relied on application code to handle referential integrity had orphaned rows, and incidents related to data errors or the absurd pipelines they had built to recreate what FK constraints and triggers already do.

      RDBMS are extremely well-tested pieces of software that do their job incredibly well. To think that you could do better, or even equally as well, is hubris. If you want to trade those guarantees for “velocity” go right ahead, but you also need to take into account the inevitable incidents and recoveries that will occur.

    • It’s really not about code is better or database it better, it’s mostly about locality: if you want to update thousands of records, you can’t pull those records into a separate process, update them there and then write back. So you put your code next to the data in the database. Stored procedures are just code deployed to a database container…

      4 replies →

    • And both of those philosophies will lead to bad engineering.

      There are things that work better, are safer and simpler to do on the database, and things that work better, are safer and simpler in code. And those things might change depending on context, technology, requirements, size of project, experience of contributors, etc.

      Forcing round pegs into square holes will always lead to brittle code and brittle products, often for more cost (mental and financial!) than actually using each tool correctly.

    • I am mostly on the side of business logic should live in applications and relationships between data types are not business logic so much as just the layout of the data. But I typically access data via an ORM and they typically don’t have support for triggers and stored procedures. If they did, I would certainly use it because projects I work on might have multiple people writing application code but everyone uses a single set of database models. This would mean that critical constraints on the shape of the data could be defined and respected at all times vs some developer on my team forgetting to include some critical check in their data update routine.

      4 replies →

    • Maybe not DB, but getting data from wherever it may be to the registers in the computer is certainly is the King of Kings.

    • I believe that both code and data are kings, under different realms. Code is king of the "what we're doing today" realm. Data is king of the "what's possible tomorrow" realm.

      Both have their place in business.

If you want your database to just store bytes, use a key-value store. But SQL gives you schemas and constraints for a reason; they're guardrails for your business logic. Just don’t ask your tables to run the business for you.

  • If only different ORMs had more support for triggers and stored procedures. Things would be so much easier if I could do things like denormalize certain frequently accessed fields across tables but with proper ability to update them automatically without having to do them in application code.

This is one of those absolute statements that cause the kind of problem stated by grandparent. There are lots of those: "Use Postgres for everything", "No business data on the DB", "No methods bigger than 10 lines", "Abstractions only after 3 usages".

Back to the topic: Lots of potential bugs and data corruption issues are solved by moving part of the business logic to the database. Other people already covered two things: data validation and queue atomicity.

On the other hand, lots of potential issues can also arise by putting other parts of business logic to the database, for example, calling HTTPS endpoints from inside the DB itself is highly problematic.

The reality is that the world is not black and white, and being an engineer is about navigating this grey area.

So what are your thoughts on constraints then? Foreign keys? Should that only be handled by the application, like Rails does (or did, haven't used in a long time).

  • I don't think of those as business logic, per se. They're just validity checks on what the data should look like before it's written to disk - they're not actionable in the way L/N is. That being said, constraints usually end up being duplicated outside the db anyway, but having them where the data rests (so you don't have to assume every client is using the correct constraint code) makes sense.

    • I see. Further I have used triggers to automatically populate log tables or aggregate statistics on write. Why do I need fast statistics? For API limits. Customers have N free operations per months and such, so I have to query that on every operation. Do you consider these things as business logic that don't belong in the database?

  • Rails fully supports constraints and encourages you to use them.

    You can either execute SQL in your migration or use add_check_constraint.

    • Back when I used Rails the sentiment was: You don't need foreign keys, this is all handled by ActiveRecord.

  • What happens to FKs when you've to partition/shard the db? At a certain scale, they actually hinder the inserts.

    • FK Constraints on partitioned tables has been a solved problem for years for Postgres. MySQL still doesn’t support them, unfortunately.

      For sharding, Vitess kind of supports them; Citus fully supports them.

      You’re correct that they do impact performance to an extent, but as a counter argument, if your data is incorrect, it doesn’t matter how quickly you wrote it.

    • FKs are nothing special. It's just more INSERTs/UPDATEs/DELETEs. If you can't have a few more DMLs in your transactions in your sharded DB then you've already got scaling problems.

      Really, FKs are typically implemented internally by RDBMSes as TRIGGERs that do what you expect FKs to do, which means they really are nothing more than syntactic sugar.

You're reaching the wrong conclusion, probably because of confirmation bias. Certainly this LISTEN/NOTIFY problem does not lead to your conclusion, nor does it support it. After all if you were relying on LISTEN/NOTIFY you could instead rely on logical replication decoding / CDC instead. And heck, you could even have a client connected to the same database that uses logical decoding to pick up events worth NOTIFYing about and then does just that, but without burdening any other transactions.

> That's what application code is for.

I've seen people who disagree with that statement and say that having a separate back end component often leads to overfetching and in-database processing is better. I've worked on some systems where the back end is essentially just passing data to and from stored procedures.

It was blazing fast, but working with it absolutely sucked - though for whatever reason the people who believe that seem to hold those views quite strongly.

It really depends, but it's also a factor of time, that is, "back in the day", databases were designed to serve many different clients, nowadays a common practice is to have a 1:1 relationship between a database and a client application.

Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.

  • > Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.

    Not to mention the difficulty in maintaining referential integrity with all of that duplicated data. There are various workarounds, but at that point it feels very much like we’re recreating a shared DB, but shittily, and netting zero benefits.

Disagree; these issues come up when people use more advanced features of DBs without having the requisite DB expertise on staff. I’ll give OP that Postgres’ docs do not mention this gotcha (and props to them for drilling down to source code!), but by and large, these issues are from people operating via tech blogs.

The DB is - or should be - the source of truth for your application. Also, since practically everyone is using cloud RDBMS with (usually) networked storage, the latency is atrocious. Given those, it seems silly to rely on an application to react to and direct changes to related data.

For example, if you want to soft-delete customer data while maintaining the ability to hard-delete, then instead of having an is_deleted and/or deleted_at column, have a duplicate table or tables, and an AFTER DELETE trigger on the originals that move the tuples to the other tables.

Or if you want to have get_or_create without multiple round trips (and you don’t have Postgres’ MERGE … RETURNING), you can easily accomplish this with a stored procedure.

Using database features shouldn’t be seen as verboten or outdated. What should be discouraged is not treating things like stored procedures and triggers as code. They absolutely should be in VCS, should go the same review process as anything else, and should be well-documented.

That's purely because nobody knows how to write SQL let alone stored procedures. If stored procedures had better devex they'd be used for most of your app.

  • Postgres lets you write stored procedures out of the box in pgSQL, C, Tcl, Perl, and Python. There are also 3rd party extensions for most languages you might want, including Rust and JS.

    More broadly, not knowing how to write SQL is a very solvable problem, and frankly anyone accessing an RDBMS as a regular part of their job should know it. Even if you’re always using an ORM, you should understand what it’s doing so you can understand the EXPLAIN output you’ll probably be looking at eventually.

    • >... and frankly anyone accessing an RDBMS as a regular part of their job should know it.

      With entity framework code first, Microsoft made it possible for generations of developers to barely touch a database.

      A lot of Devs have poor database skills nowadays.

      Which suits the cloud sellers who want to push managed platforms

      2 replies →