Comment by osigurdson
2 days ago
I like this article. Lots of comments are stating that they are "using it wrong" and I'm sure they are. However, it does help to contrast the much more common, "use Postgres for everything" type sentiment. It is pretty hard to use Postgres wrong for relational things in the sense that everyone knows about indexes and so on. But using something like L/N comes with a separate learning curve anyway - evidenced in this case by someone having to read comments in the Postgres source code itself. Then if it turns out that it cannot work for your situation it may be very hard to back away from as you may have tightly integrated it with your normal Postgres stuff.
I've landed on Postgres/ClickHouse/NATS since together they handle nearly any conceivable workload managing relational, columnar, messaging/streaming very well. It is also not painful at all to use as it is lightweight and fast/easy to spin up in a simple docker compose. Postgres is of course the core and you don't always need all three but compliment each other very well imo. This has been my "go to" for a while.
"use Postgres for everything" is certainly wrong, eventually. It's still the second-best choice for every new project, and most products will never see the traffic levels that justify using something more specialized. Obviously, recall.ai hit the level of traffic where Postgres was no longer ideal. I bet they don't regret it for the other parts of their product.
What is the first-best choice for a new project? SQLite?
No, generally Postgres, just not for everything. If you understand the tradeoffs SQLite can be fine. Once you have more than one service (even just for HA) SQLite means doing kind of crazy things like using NFS in your infra. If you know you will only have one service and can bind it to an EBS like volume it is totally fine.
They aren't even questioning its use as a database, just as an event bus.
Actually LISTEN/NOTIFY does also not scale the other way. Immich also moved to that pg for everything mentality (trying to remove redis dependencies). The problem: postgres needs a WAL flush for all notifications. I ran immich on my HDD-NAS. The result was constant noise because the pg backed socket.io backend issues constant keep alive messages.
Honestly whatever kind of DB you are speaking about always be wary of "niche/side features" which don't fit it's core design goals, they tend to have unexpected limitations.
listen/notify isn't necessary a replacement for redis or other pub/sub systems, redis pub/sub and similar isn't necessary a replacement for idk. Kafka or similar queue/messaging system
but a lot of companies have (for modern standards) surprisingly small amounts of data, very even a increase by 2,3,4x still isn't that big. In that case listen/notify and similar might just work fine :shrug:
also same is true the other way around, depending on you application you can go redis only, as long as you data volume stays small enough and needs for transactional/sync are reasonable simple enough (with watch+exec, NX,XX options etc. and maybe some redis side lua scripts you can do quite a lot for data synchronization). Issue with that is that stylistically redis data sync/transaction code is often much more similar to writing atomic data-structures then to SQL transactions, and even for SQL transactions there is a trend of devs severely overestimating what they provide, so often you are better of not touching on it when you can avoid it, also BTW. redis has something very similar to sqlite or Notify where "basically" (oversimplified by a lot) there is only one set of writes done at a time ;) (and then afterwards distributed to replicas), just that outside of some micro lua scripts you don't really run much logic outside of some NX, XX checks etc. so it's not blocking much and it's "more or less" all just in memory not touching a WAL (again oversimplified).
>> also same is true the other way around, depending on you application you can go redis only
Really the primary reason not to try stuff like this is (at least for me), feel that I won't paint myself into a corner with Postgres. I can always add a table here or a join there and things will work. If I need columnar, I use ClickHouse and NATS for messaging. I know these well but still gravitate toward Postgres because I feel it can grow in whatever direction is needed. However, it is true, I have thought about trying to just use NATS KV and make all services stateful receiving notifications when things change. It does seem that it could massively simplify some things but expect there could be some sharp edges in the face of unknown requirements. If one could just design for exactly the problem at hand it would be different but it never seems to work out like that.
Largely agree. Functionality wise if you don't have many jobs, using the database as the queue is fine.
However, I've been in several situations where scaling the queue brings down the database, and therefore the app, and am thus of the opinion you probably shouldn't couple these systems too tightly.
There are pros and cons, of course.
Using the database for queues is more than fine, it's often essential to correctness. In many use cases for queues you need to atomically update the database with respect to popping from the queue, and if they're separate systems you end up needing either XA or brittle and unreliable custom idempotency logic. I've seen this go wrong before and it's not nice, the common outcome is business-visible data corruption that can have financial impact.
This seems like another case where Postgres gets free marketing due to companies hitting its technical limits. I get why they choose to make lemonade in these cases with an eng blog post, but this is a way too common pattern on HN. Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business. OpenAI had a similar blog post a couple of months ago where they revealed they were probably spending more than quarter of a million a month on an Azure managed Postgres, and it had stopped scaling so they were having to slowly abandon it, where I made the same comment [1].
Postgres is a great DB for what you pay, but IMHO well capitalized blitzscaling startups shouldn't be using it. If you buy a database - and realistically most Postgres users do anyway as they're paying for a cloud managed db - then you might as well just buy a commercial DB with an integrated queue engine. I have a financial COI because I have a part time job there in the research division (on non-DB stuff), so keep that in mind, but they should just migrate to an Oracle Database. It has a queue engine called TxEQ which is implemented on top of database tables with some C code for efficient blocking polls. It scales horizontally by just adding database nodes whilst retaining ACID transactions, and you can get hosted versions of them in all the major clouds. I'm using it in a project at the moment and it's been working well. In particular the ability to dequeue a message into the same transaction that does other database writes is very useful, as is the exposed lock manager.
Beyond scaling horizontally the nice thing about TxEQ/AQ is that it's a full message queue broker with all the normal features you'd expect. Delayed messages, exception queues, queue browsing, multi-consumer etc. LISTEN/NOTIFY is barely a queue at all, really.
For startups like this, the amount of time, money and morale they are losing with all these constant stories of firefights just doesn't make sense to me. It doesn't have to be Oracle, there are other DBs that can do this too. But "We discovered X about Postgres" is a eng blog cliché by this point. You're paying $$$ to a cloud and GPU vendor anyway, just buy a database and get back to work!
[1] https://news.ycombinator.com/item?id=44074506
Using queues in atomic, transactional way was a core principle for building https://pgflow.dev - having whole workflow state transactionally updated alongside the work on the in db queue really simplifies a lot of things: debugging is easier, audit log is easy, reporting, stats etc are one SQL query away.
4 replies →
>> but they should just migrate to an Oracle Database
No big tech companies or unicorn type startups are using Oracle. Is your claim that they are all wrong?
>> Some startup builds on Postgres then spends half their eng budget at the most critical growth time firefighting around its limits instead of scaling their business
This is why I suggest starting with some kind of normal queue / stream mechanism and columnar DB if needed. It isn't even harder than using one DB, particularly if you are using niche features.
1 reply →
It actually depends on the workload.
Sending webhooks, as an example, often has zero need to go back and update the database, but I've seen that exact example take down several different managed databases ( i.e., not just postgres ).
1 reply →
> "We discovered X about Postgres" is a eng blog cliché by this point.
It really is, and it’s often surprising to me how basic some of the issues are being discovered. Like Figma, when they waited a shocking amount of time add [0] PgBouncer and read replicas. This is such a well-trod path that it’s baffling to me why you wouldn’t add it once it’s clear you have a winning product. At the very least, PgBouncer (or PgCat, or any other connection pooler / proxying service) - it adds negligible cost per month (in comparison to DB read replicas) to run a couple of containers with a load balancer.
Re: Oracle, as much as I despise the company for its litigious practices, I’ll hand it to you that the features your DB has are astonishing. RAC is absolutely incredible (on paper - I’ve never used it).
[0]: https://www.figma.com/blog/how-figma-scaled-to-multiple-data...
if you need transaction across a queue into a normal SQL DB or similar I believe you are doing something very wrong.
Sure you need transaction about processing things in a queue (mark as "taken out", but not yet remove then remove or "place back in (or into a failed messages inbox)" on timeout or similar can be _very_ important for queue systems.
But the moment the "fail save if something dies while processing a message" becomes a directly coupled with DB transactions you have created something very brittle and cumbersome.
To be fair that might still be the best solution for some situations.
But the better solution is to make sure you treat a queue as message passing system and handle messages as messages with the appropriate delivery semantics. And if you can't because idk. idempotency logic is supper unreliable then there indeed is a problem, but its not in the missing cross transactions but how you write that logic (missing ?_tooling_, strict code guidelines people actually comply with, interface regression checks, tests (including prop/fuzz tests, regression tests, integration/e2e tests etc., not just "dump" unit test)).
> just migrate to an Oracle Database.
In my experience while Oracle DB is very powerful but also very cumbersome in a lot of ways and if you need thing only they can provide you most likely already fucked up big time somewhere else in your design/architecture. Sure if you are at that point Oracle can lightly be the cheaper solution. But still preferable you never endup there.
As a side note, there are also a lot of decent plugins which can provide similar capabilities to PG, but they tend to have the issue that they aren't part of managed PG solutions and self managing PG (or most other reasonable powerful DB) can be a huge pain, and then yes Oracle can be a solution.
Still the amount of startups which had a overall good experience are in my experience overall non existing in my experience. (But there are some pretty big companies/projects I know of which have a overall good experience with Oracle.)
> constant stories of firefights
If you mean stories on HN, than that isn't a meaningful metric, you will only hear about the "interesting" stories which mostly are about fire fighting or "using pg for everything is grate" but rarely the majority of in-between stories and boring silent successes. If it's about stories from you carriers and asking dev friends you have what their experience is then it is more meaningful. But in a bubble (like this answer of mine is, without question, in a bubble).
Generally I think people really overestimate how representative HN is, idk. about the US but outside of it _huge_ parts of the IT industry are not represented by HN in any meaningful way. I would say in my country HN is _at most_ representative for 1/4 of the industry, but that 1/4 also contains many of the very very motivated software developers. But also very few of the "that my work but not my calling", "bread and butter" work software devs, which are often 1/3 to over 1/2 of devs in most countries as far as I can tell.
3 replies →
> Postgres/ClickHouse/NATS
Maybe throw in a dedicated key-value store like Redis or Valkey.
Oh and maybe something S3 compatible like MinIO, Garage or SeaweedFS for storing bunches of binary data.
With all of that, honestly it should cover most of the common workloads out there! Of course, depends on how specialized vs generic you like your software to be.
NATS does KV pretty well now (didn't have expiration till earlier this year)
Nats is getting there, but not yet.
Redis is still much more powerful: lists, sorted sets and bazillion of other data structures
1 reply →
I’ve been meaning to check out NATS - I’ve tended to default to Redis for pubsub. What are the main advantages? I use clickhouse and Postgres extensively
I've been disappointed by Nats. Core Nats is good and works well, but if you need stronger delivery guarantees you need to use Jetstream which has a lot of quirks, for instance it does not integrate well with the permission system in Core Nats. Their client SDKs are very buggy and unreliable. I've used the Python, Rust and Go ones, only the Go one worked as expected. I would recommend using rabbitmq, Kafka or redpanda instead of Nats.
I've had the same experience and I fixed part of the problem by writing my own Rust client, Watermelon. It's still missing a lot of features but at least I'm not blocked by weird decisions taken by upstream.
Client SDKs are often a major challenge in systems like these. In my experience, building SDKs on top of asynchronous protocols is particularly tricky. It's generally much easier to make the server-side part reliable. The complexity arises because SDKs must account for a wide range of usage patterns - and you are not controlling the usage.
Asynchronous protocols frequently result in callback-based or generator-style APIs on the client side, which are hard to implement safely and intuitively. For example, consider building a real-time SDK for something like NATS. Once a message arrives, you need to invoke a user-defined callback to handle it. At that point, you're faced with a design decision: either call the callback synchronously (which risks blocking the socket reading loop), or do it asynchronously (which raises issues like backpressure handling).
Also, SDKs are often developed by different people, each with their own design philosophy and coding style, leading to inconsistency and subtle bugs.
So this isn't only about NATS. Just last week, we ran into two critical bugs in two separate Kafka SDKs at work.
Are those recommendations based on using them all in the same context? Curious why you chose Kafka (or Redpanda which is effectively the same) over NATS.
NATS gives you regular pub/sub but also streams as well (similar to Kafka along with strong durability guarantees, etc).
I think PG could relax the ordering thing with NOTIFYs since... it seems a bit silly, but NOTIFYs already are unsafe to use because there is no authorization around channel access, so one might as well use change data capture (logical replication, basically) instead.
This kind of issue always comes up when people put business logic inside the database. Databases are for data. The data goes in and the data goes out, but the data does not get to decide what happens next based on itself. That's what application code is for.
The way you model data and store it in your database is fundamentally a part of your business logic. The same data can be modeled in many different ways, with different trade-offs for different use cases. Especially if you have a large amount of data, you can't just work with it as is, you need to know how you will use it and model it in a way that makes the common operations fast enough. As your application evolves, this may change, and even require data migrations.
None of this means you have to or even should use stored procedures, triggers, or listen/notify. I'm just making the point that there is no clean separation between "data" and "business logic".
Can't upvote this enough. The point is not that procedures outside of the DB is wrong, nor is it that procedures should always go into the DB. It's that you should look at the context and decide what the best way to solve the problem is.
1 reply →
The first thing I did when I saw this article was to check the Postgres docs, because I thought "heh, surely they just didn't read the fine print," but the LISTEN/NOTIFY page has zero mentions of "lock" in the entire content.
I think, It's because the locking is part of the transaction commit locking, but yes it should be mentioned.
But it's oversimplified a case of "high queue load f* up the availability/timings for other DB operations" (and themself).
And thats a generic problem you have, even if just due to "generic CPU/WAL/disk load" if you put your queue into your DB even iff that specific lock would be somehow solved with some atomic concurrent algorithms or similar (not sure if that even is possible).
So in general make your storage db, and queue a different service (and you cache too), even if it uses the same kind of storage. (Through technically there are clever in-between solutions which run their own queue service but still use you DB for final storage but have a ton of caching, in memory locking etc. to remove a huge part of the load from the DB. )
I really hope somebody reading this article (or HN thread) writes a doc patch to mention that.
I'm unlikely to get it myself today, and by tomorrow I've probably already forgotten it :-(
1 reply →
One can replace LISTEN/NOTIFY with logical replication / CDC. And it's funny because somehow, somewhere, PG must be serializing the writing of the WAL to some degree. So it's not clear to me why LISTEN/NOTIFY needs additional serialization. Perhaps PG should turn NOTIFY into INSERTs on a special table that a worker process watches and turns those inserts into notifies (and deletes the inserts).
That may hold to a certain extent for relational databases where your business model doesn't align well with physical model (tables). Although you might wonder why stored procedures and triggers were invented.
In databases where your domain is also your physical data model, coupling business logic to the database can work quite well, if the DBMS supports that.
https://medium.com/@paul_42036/entity-workflows-for-event-dr...
> the data does not get to decide what happens next based on itself.
Then why bother with a relational database? Relations and schemas are business logic, and I'll take all the data integrity I can get.
I think an argument can be made that relations, schemas and constraints encode a kind of business logic that is intrinsic to the definition and integrity of the data, while other types of business logic represent processes that may hinge on data but aren’t as tightly coupled to it. Similar to the difference between a primitive type and a function.
I guess some will argue that their business logic is special and really is so tightly coupled to the data definition that it belongs in the database, and I’m not going to claim those use cases don’t exist, but I’ve seen over-coupling far more often than under-coupling.
This is why I say: Applications come and go, but data is forever.
I've seen both of these philosophies. I liken them to religions, the believers are devout. Code is King vs the DB is King.
I'm personally Code is King, and I have my reasons (like everyone else)
14 replies →
If you want your database to just store bytes, use a key-value store. But SQL gives you schemas and constraints for a reason; they're guardrails for your business logic. Just don’t ask your tables to run the business for you.
If only different ORMs had more support for triggers and stored procedures. Things would be so much easier if I could do things like denormalize certain frequently accessed fields across tables but with proper ability to update them automatically without having to do them in application code.
3 replies →
This is one of those absolute statements that cause the kind of problem stated by grandparent. There are lots of those: "Use Postgres for everything", "No business data on the DB", "No methods bigger than 10 lines", "Abstractions only after 3 usages".
Back to the topic: Lots of potential bugs and data corruption issues are solved by moving part of the business logic to the database. Other people already covered two things: data validation and queue atomicity.
On the other hand, lots of potential issues can also arise by putting other parts of business logic to the database, for example, calling HTTPS endpoints from inside the DB itself is highly problematic.
The reality is that the world is not black and white, and being an engineer is about navigating this grey area.
Thank you for bringing some sanity into this discussion.
So what are your thoughts on constraints then? Foreign keys? Should that only be handled by the application, like Rails does (or did, haven't used in a long time).
I don't think of those as business logic, per se. They're just validity checks on what the data should look like before it's written to disk - they're not actionable in the way L/N is. That being said, constraints usually end up being duplicated outside the db anyway, but having them where the data rests (so you don't have to assume every client is using the correct constraint code) makes sense.
1 reply →
Rails fully supports constraints and encourages you to use them.
You can either execute SQL in your migration or use add_check_constraint.
1 reply →
You still use constraints even if you put all your business logic in stored procedures.
What happens to FKs when you've to partition/shard the db? At a certain scale, they actually hinder the inserts.
2 replies →
You're reaching the wrong conclusion, probably because of confirmation bias. Certainly this LISTEN/NOTIFY problem does not lead to your conclusion, nor does it support it. After all if you were relying on LISTEN/NOTIFY you could instead rely on logical replication decoding / CDC instead. And heck, you could even have a client connected to the same database that uses logical decoding to pick up events worth NOTIFYing about and then does just that, but without burdening any other transactions.
> That's what application code is for.
I've seen people who disagree with that statement and say that having a separate back end component often leads to overfetching and in-database processing is better. I've worked on some systems where the back end is essentially just passing data to and from stored procedures.
It was blazing fast, but working with it absolutely sucked - though for whatever reason the people who believe that seem to hold those views quite strongly.
It really depends, but it's also a factor of time, that is, "back in the day", databases were designed to serve many different clients, nowadays a common practice is to have a 1:1 relationship between a database and a client application.
Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.
> Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.
Not to mention the difficulty in maintaining referential integrity with all of that duplicated data. There are various workarounds, but at that point it feels very much like we’re recreating a shared DB, but shittily, and netting zero benefits.
Disagree; these issues come up when people use more advanced features of DBs without having the requisite DB expertise on staff. I’ll give OP that Postgres’ docs do not mention this gotcha (and props to them for drilling down to source code!), but by and large, these issues are from people operating via tech blogs.
The DB is - or should be - the source of truth for your application. Also, since practically everyone is using cloud RDBMS with (usually) networked storage, the latency is atrocious. Given those, it seems silly to rely on an application to react to and direct changes to related data.
For example, if you want to soft-delete customer data while maintaining the ability to hard-delete, then instead of having an is_deleted and/or deleted_at column, have a duplicate table or tables, and an AFTER DELETE trigger on the originals that move the tuples to the other tables.
Or if you want to have get_or_create without multiple round trips (and you don’t have Postgres’ MERGE … RETURNING), you can easily accomplish this with a stored procedure.
Using database features shouldn’t be seen as verboten or outdated. What should be discouraged is not treating things like stored procedures and triggers as code. They absolutely should be in VCS, should go the same review process as anything else, and should be well-documented.
That's purely because nobody knows how to write SQL let alone stored procedures. If stored procedures had better devex they'd be used for most of your app.
Postgres lets you write stored procedures out of the box in pgSQL, C, Tcl, Perl, and Python. There are also 3rd party extensions for most languages you might want, including Rust and JS.
More broadly, not knowing how to write SQL is a very solvable problem, and frankly anyone accessing an RDBMS as a regular part of their job should know it. Even if you’re always using an ORM, you should understand what it’s doing so you can understand the EXPLAIN output you’ll probably be looking at eventually.
3 replies →
There’s no reason this article and start with Postgres for everything can’t be true.
In the beginning having fewer parts to connect and maintain lets the needs and bottlenecks of the actual application emerge.
If it was listen/notify in such a scenario at some volume where optimizing it isn’t in the cards… so be it. It would be some time down the road before sharding a function into a specific subsystem like what you described.
Appreciate learning about the Postgres/Clickhouse/nats combo. If there might be an article if the three together that you liked would be happy to read and learn.
Isn't Kafka the Postgresql of pub/sub
I.e. use Kafka unless you have a explicit reason not to?
So why Nats?
After working with NATS, I wouldn't want to touch Kafka even with a long stick. Its just too complex and a memory hog for no good reason. It doesn't have all the features that NATS supports as well.
What about the Kafka V2, Pulsar?
Kafka is far from trivial to operate, for one thing, even post zookeeper.
And it's kinda wrong to use as a queue (in most cases), being a log stream you can seek in.
> However, it does help to contrast the much more common, "use Postgres for everything" type sentiment.
I think sentiment is to use "for everything in 99% business cases", which involves few 100GB of data with some thousands QPS, and could be handled by PG very well.