Self-hosting is more a question of responsibility, I'd say. I run a couple of SaaS products and self-host them at much better performance and a fraction of the cost of running them on AWS. It's amazing and it works perfectly fine.
For client projects, however, I always try and sell them on paying the AWS fees, simply because it shifts the responsibility of the hardware being "up" to someone else. It does not inherently solve the downtime problem, but it allows me to say, "we'll have to wait until they've sorted this out, Ikea and Disney are down, too."
Doesn't always work like that and isn't always a tried-and-true excuse, but generally lets me sleep much better at night.
With limited budgets, however, it's hard to accept the cost of RDS (and we're talking at least one staging environment on top of production) when comparing it to a very tight 3-node Galera cluster running on Hetzner for barely a couple of bucks a month.
Or Cloudflare, the titan at the front, being down again today and intermittently over the past two days, after also being down a few weeks ago and earlier this year. I also had SQS queues time out several times this week - they picked up again shortly - but it's not like those things never happen on managed environments. They happen quite a bit.
Over 20 years I've had lots of clients on self-hosted setups, even self-hosting SQL on the same VM as the webserver, the way you used to in the distant past for low-usage web apps.
I have never, ever, ever had a SQL box go down. I've had a web server go down once. I had someone who probably shouldn't have had access to a server accidentally turn one off once.
The only major outage I've had (2-3 hours) was when the box was also self-hosting an email server and a deploy of mine accidentally caused it to flood itself with failed delivery notices.
I may have cried a little in frustration and panic but it got fixed in the end.
I actually find using cloud hosted SQL in some ways harder and more complicated because it's such a confusing mess of cost and what you're actually getting. The only big complication is setting up backups, and that's a one-off task.
Just wait until you end up spending $100,000 on an awful implementation from a partner who pretends to understand your business needs but delivers something that doesn't work.
But perhaps I’m bitter from prior Salesforce experiences.
> Self-hosting is more a question of responsibility, I'd say. I run a couple of SaaS products and self-host them at much better performance and a fraction of the cost of running them on AWS
It is. You need to answer the question: what are the consequences if your service is down for, let's say, 4 hours, or a security patch isn't properly applied, or you haven't followed security best practices? Many people are technically unable, or lack the time or resources, to confidently address that question - hence paying someone else to do it.
Your time is money though. You are saving money but giving up time.
Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
You can pay someone else to manage your hardware stack, there are literal companies that will just keep it running, while you just deploy your apps on that.
> It is. You need to answer the question: what are the consequences if your service is down for, let's say, 4 hours, or a security patch isn't properly applied, or you haven't followed security best practices?
There is one advantage a self-hosted setup has here: if you set up a VPN, only your employees have access, and the server doesn't have to be reachable from the internet at all. So even in the case of a zero-day that WILL make a SaaS company leak your data, you can be safe(r) with a self-hosted solution.
> Your time is money though. You are saving money but giving up time.
The investment compounds. Setting up infra to run a single container for some app takes time, and there is a good chance it won't pay for itself.
But the 2nd service? Cheaper. The 5th? At that point you've probably automated it enough that it's just pointing it at a docker container and tweaking a few settings.
> Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
It's cheaper even if you include your own time. You pay a technical person at your company to do it; a SaaS company pays a technical person too, then pays sales and PR people to sell it, then pays income tax on the revenue, and then it also needs to "pay" its investors.
Yeah, making a service for 4 people in a company can be more work than just paying a SaaS company $10/mo. But 20? 50? 100? It quickly gets to the point where self-hosting (whether actually "self", or on dedicated servers, or in the cloud) actually pays off.
> Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
In a business context the "time is money" thing actually makes sense, because there's a reasonable likelihood that the business can put the time to a more profitable use in some other way. But in a personal context it makes no sense at all. Realistically, the time I spend cooking or cleaning was not going to earn me a dime no matter what else I did, therefore the opportunity cost is zero. And this is true for almost everyone out there.
That argument does not hold when AWS serverless Postgres is available, which costs almost nothing for low traffic and is vastly superior to self-hosting with regard to observability, security, integration, backups, etc.
There is no reason to self-manage PG for a dev environment.
"which cost almost nothing for low traffic" you invented the retort "what about high traffic" within your own message. I don't even necessarily mean user traffic either. But if you constantly have to sync new records over (as could be the case in any kind of timeseries use-case) the internal traffic could rack up costs quickly.
"vastly superior to self hosting regarding observability" I'd suggest looking into the cnpg operator for Postgres on Kubernetes. The builtin metrics and official dashboard is vastly superior to what I get from Cloudwatch for my RDS clusters. And the backup mechanism using Barman for database snapshots and WAL backups is vastly superior to AWS DMS or AWS's disk snapshots which aren't portable to a system outside of AWS if you care about avoiding vendor lock-in.
> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:
> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app
> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.
This is funny. I'd argue the exact opposite. I would self host only:
* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or
* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...
> employing engineers to manage self-hosted databases is more cost effective than outsourcing
Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
PaaS platforms (Heroku, Render, Railway) can legitimately be operated by your average dev and not have to hire a dedicated person; those cost even more though.
Another limitation of both the cloud and PaaS is that they are only responsible for the infrastructure/services you use; they will not touch your application at all. Can your application automatically recover from a slow/intermittent network, a DB failover (that you can't even test because your cloud providers' failover and failure modes are a black box), and so on? Otherwise you're waking up at 3am no matter what.
> Every company out there is using the cloud and yet still employs infrastructure engineers
Every company beyond a particular size surely? For many small and medium sized companies hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.
> Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
This doesn’t make sense as an argument. The reason the cloud is more complex is because that complexity is available. Under a certain size, a large number of cloud products simply can’t be managed in-house (and certainly not altogether).
Also your argument is incorrect in my experience.
At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
At larger scales, what your false dichotomy suggests also doesn’t actually happen. Where I work now, our data stores are all self-managed on top of EC2/Azure, where performance and reliability are critical. But we don’t self-host everything. For example, we use SES to send our emails and we use RDS for our app DB, because their performance profiles and uptime guarantees are more than acceptable for the price we pay. That frees up our platform engineers to spend their energy on keeping our uptime on our critical services.
In-house vs cloud provider is largely a wash in terms of cost. Regardless of the approach, you are going to need people to maintain stuff, and people cost money. Similarly, compute and storage cost money, so what you lose on the swings, you gain on the roundabouts.
In my experience you typically need fewer people with a cloud provider than in-house (or the same number of people can handle more instances) due to increased leverage. Whether you can maximize that leverage depends on how good your team is.
US companies typically like to minimize headcount (either through accounting tricks or outsourcing) so usually using a Cloud Provider wins out for this reason alone. It's not how much money you spend, it's how it looks on the balance sheet ;)
Working in a university lab, self-hosting is the default for almost anything. While I would agree that costs are quite low, I sometimes would be really happy to throw money at problems to make them go away. Without ever having had the chance, and thus being no expert, I really do see the appeal of scaling (up and down) quickly in the cloud. We ran a Postgres database of a few hundred GB with multiple read replicas and we managed somehow, but we really hit the limits of our expertise at some point. Eventually we stopped migrating to newer database schemas because it was such a hassle to keep availability. If I'd had the money as a company, I guess I would have paid for a hosted solution.
I don’t think it’s a lie, it’s just perhaps overstated. The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.
Whether or not you need that equivalence is an orthogonal question.
The fact that as many engineers are on payroll doesn't mean that "cloud" is not an efficiency improvement. When things are easier and cheaper, people don't do less or buy less. They do more and buy more until they fill their capacity. The end result is the same number (or more) of engineers, but they deal with a higher level of abstraction and achieve more with the same headcount.
I can't talk about staff costs, but as someone who's self-hosted Postgres before, using RDS or Supabase saves weeks of time on upgrades, replicas, tuning, and backups (yeah, you still need independent backups, but PITRs make life easier). Databases and file storage are probably the most useful cloud functionality for small teams.
If you have the luxury of spending half a million per year on infrastructure engineers then you can of course do better, but this is by no means universal or cost-effective.
Well sure you still have 2 or 3 infra people but now you don’t need 15. Comparing to modern Hetzner is also not fair to “cloud” in the sense that click-and-get-server didn’t exist until cloud providers popped up. That was initially the whole point. If bare metal behind an API existed in 2009 the whole industry would look very different. Contingencies Rule Everything Around Me.
You are missing that most services don't have high availability needs and don't need to scale.
Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.
I have worked on internal tools where I just added a Postgres DB to the docker setup and that was it. 5 minutes of work and no issues at all. Sure, if you have something customer-facing you need to do a bit more and set up a good backup strategy, but that really isn't magic.
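For context, the five-minute version being described is roughly this one-liner (a sketch; the container name, password, and version are illustrative):

    docker run -d --name app-db \
      -e POSTGRES_PASSWORD=change-me \
      -v pgdata:/var/lib/postgresql/data \
      -p 5432:5432 \
      postgres:17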
> at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
This is the crux of one of the most common fallacies in software engineering decision-making today. I've participated in a bunch of architecture / vendor evaluations that concluded managed services are more cost-effective almost purely because they underestimated (or even discarded entirely) the internal engineering cost of vendor management. Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
But most importantly - even if it's significantly less effort than self-hosting, it's never effectively costed when evaluating trade-offs - that's what leads to this persistent myth about the engineering cost of self-hosting. "Managing" managed services is a non-zero cost.
Add to that the ultimate trade-off of accountability vs availability (internal engineers care less about availability when it's out of their hands - but it's still a loss to your product either way).
> Black-box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
I'm really not sure what you're talking about here. I manage many RDS clusters at work. I think in total, we've spent maybe eight hours over the last three years "tuning" the system. It runs at about 100kqps during peak load. Could it be cheaper or faster? Probably, but it's a small fraction of our total infra spend and it's not keeping me up at night.
Virtually all the effort we've ever put in here has been making the application query the appropriate indexes. But you'd do that no matter how you host your database.
Hell, even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.
It's not. I've been in a few shops that use RDS because they think their time is better spent doing other things.
Except now they are stuck trying to maintain and debug Postgres without the visibility and agency they would have if they hosted it themselves. The situation isn't at all clear-cut.
One thing unaccounted for if you've only ever used cloud-hosted DBs is just how slow they are compared to a modern server with NVME storage.
This leads the developers to do all kinds of workarounds and reach for more cloud services (and then integrating them and - often poorly - ensuring consistency across them) because the cloud hosted DB is not able to handle the load.
On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.
I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.
I also encourage people to just use managed databases. After all, it is easy to replace such people. Heck actually you can fire all of them and replace the demand with genAI nowadays.
Agreed. As someone in a very tiny shop, all us devs want to do as little context switching to ops as possible. Not even half a day a month. Our hosted services are in aggregate still way cheaper than hiring another person. (We do not employ an "infrastructure engineer").
The discussion isn't "what is more effective". The discussion is "who wants to be blamed in case things go south". If you push the decision to move to self-hosted and then one of the engineers fucks up the database, you have a serious problem. If the same engineer fucks up a cloud database, it's easier to save your own ass.
grok/claude/gpt: "Write a concise Bash script for setting up an automated daily PostgreSQL database backup using pg_dump and cron on a Linux server, with error handling via logging and 7-day retention by deleting older backups."
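For reference, the sort of script that prompt produces looks roughly like this (a sketch; paths, database name, and auth are illustrative):

    #!/usr/bin/env bash
    set -euo pipefail

    DB="myapp"                          # hypothetical database name
    BACKUP_DIR="/var/backups/postgres"
    LOG="/var/log/pg_backup.log"
    STAMP="$(date +%Y%m%d_%H%M%S)"

    mkdir -p "$BACKUP_DIR"

    # dump, compress, and log success or failure
    if pg_dump "$DB" | gzip > "$BACKUP_DIR/${DB}_${STAMP}.sql.gz"; then
      echo "$(date -Is) backup OK: ${DB}_${STAMP}.sql.gz" >> "$LOG"
    else
      echo "$(date -Is) backup FAILED for $DB" >> "$LOG"
      exit 1
    fi

    # 7-day retention: delete dumps older than a week
    find "$BACKUP_DIR" -name "${DB}_*.sql.gz" -mtime +7 -delete

    # cron entry (crontab -e):  0 2 * * * /usr/local/bin/pg_backup.sh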
So, yeah, I guess there's much confusion about what a 'managed database' actually is? Because for me, the table stakes are:
- Backups: the provider will push a full generic disaster-recovery backup of my database to an off-provider location at least daily, without the need for a maintenance window
- Optimization: index maintenance and storage optimization are performed automatically and transparently
- Multi-datacenter failover: my database will remain available even if part(s) of my provider are down, with a minimal data loss window (like, 30 seconds, 5 minutes, 15 minutes, depending on SLA and thus plan expenditure)
- Point-in-time backups are performed at an SLA-defined granularity and with a similar retention window, allowing me to access snapshots via a custom DSN, not affecting production access or performance in any way
- Slow-query analysis: notifying me of relevant performance bottlenecks before they bring down production
- Storage analysis: my plan allows for #GB of fast storage, #TB of slow storage; let me know when I'm forecast to run out of either in the next 3 billing cycles or so
Because, well, if anyone provides all of that for a monthly fee, the whole "self-hosting" argument goes out of the window quickly, right? And I say that as someone who absolutely adores self-hosting...
It's even worse when you start finding you're staffing specialized skills. You have the Postgres person, and they're not quite busy enough, but nobody else wants to do what they do. But then you have an issue while they're on vacation, and that's a problem. Now I have a critical service but with a bus factor problem. So now I staff two people who are now not very busy at all. One is a bit ambitious and is tired of being bored. So he's decided we need to implement something new in our Postgres to solve a problem we don't really have. Uh oh, it doesn't work so well, the two spend the next six months trying to work out the kinks with mixed success.
This would be a strange scenario because why would you keep these people employed? If someone doesn't want to do the job required, including servicing Postgres, then they wouldn't be with me any longer, I'll find someone who does.
IMO, the reason to self-host your database is latency.
Yes, I'd say backups and analysis are table stakes for hiring it out, and multi-datacenter failover is a relevant nice-to-have. But the reason to do it yourself is that, on somebody else's computer, it's literally impossible to get anything as good as what you can build.
If you set it up right, you can automate all of this when self-hosting as well. There is really nothing special about automating backups or multi-region failover.
Self-host things the boss won't call at 3 AM about: logs, traces, exceptions, internal apps, analytics. Don't self-host the database or major services.
As someone who has self-hosted MySQL (in complex master/slave setups), then MariaDB, MemSQL, Mongo, and PgSQL - on bare metal, then virtual machines, then containers - for almost 2 decades at this point... you can self-host with very little downtime, and the only real challenges are the upgrade path and getting replication right.
Now with pgbouncer (or whatever other flavor of SQL-aware proxy you fancy) you can greatly reduce the complexity traditionally involved in read/write routing and sharding across various replicas, enabling resilient, scalable, production-grade database setups on your own infra. Throw in the fact that copy-on-write and snapshotting are baked into most storage today, and it becomes - at least compared to 20 years ago - trivial to set up DR as well. Others have mentioned pgBackRest, and that further reinforces the ease with which you can build these traditionally complex setups.
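To illustrate how small the moving part is, a minimal PgBouncer config sketch (illustrative values, not production settings):

    cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
    [databases]
    myapp = host=127.0.0.1 port=5432 dbname=myapp

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = scram-sha-256
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction
    max_client_conn = 1000
    default_pool_size = 20
    EOF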
Beyond those two significant features, there aren't many other reasons you'd need to go with hosted/managed PgSQL. I've yet to find a managed/hosted database solution that doesn't have some level of downtime for applying updates and patches, so even going fully hosted/managed is not a silver bullet. The cost of a managed DB is also several times that of the actual hardware it runs on, so there is a cost factor involved as well.
I guess all this to say it's never been a better time to self-host your database and the learning curve is as shallow as it's ever been. Add to all of this that any garden-variety LLM can hand-hold you through the setup and management, including any issues you might encounter on the way.
Once they convince you that you can’t do it yourself, you end up relying on them, but didn’t develop the skills you would need to migrate to another provider when they start raising prices. And they keep raising prices because by then you have no choice.
There is plenty of provider markup, to be sure. But it is also very much not a given that the hosted version of a database is running software/configs that are equivalent to what you could do yourself. Many hosted databases are extremely different behind the scenes when it comes to durability, monitoring, failover, storage provisioning, compute provisioning, and more. Just because it acts like a connection hanging off a postmaster service running on a server doesn’t mean that’s what your “psql” is connected to on RDS Aurora (or many of the other cloud-Postgres offerings).
I have not tested this in real life yet, but it seems like all the arguments about vendor lock-in can be addressed if you bite the bullet and learn basic Kubernetes administration. Kubernetes is FOSS and there are countless Kubernetes-as-a-service providers.
I know there are other issues with Kubernetes, but at least it's transferable knowledge.
I still don't get how folks can hype Postgres in every second post on HN, yet there is no simple batteries-included way to run an HA Postgres cluster with automatic failover like you can with MongoDB. I'm genuinely curious how people deal with this in production when they're self-hosting.
Beyond the hype, the PostgreSQL community is aware of the lack of "batteries-included" HA. One discussion on the idea of built-in Raft replication mentions MongoDB as a:
>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.
It's largely cultural. In the SQL world, people are used to accepting the absence of real HA (resilience to failure, where transactions continue without interruption) and instead rely on fast DR (stop the service, recover, check for data loss, start the service). In practice, this means that all connections are rolled back, clients must reconnect to a replica known to be in synchronous commit, and everything restarts with a cold cache.
Yet they still call it HA because there's nothing else.
Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.
Everything in SQL/RDBMS was designed for a single-node instance, not accounting for replicas. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are guaranteed only on a single node.
One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.
About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation, and who use logical replication for upgrades. But no solution offers both low operational complexity and high availability comparable to MongoDB.
CloudNativePG is automation around PostgreSQL, not "batteries included", and not the idea of Kubernetes where pods can die or spawn without impacting the availability. Unfortunately, naming it Cloud Native doesn't transform a monolithic database to an elastic cluster
We recently had a disk failure on the primary, and CloudNativePG promoted another instance to primary, but it wasn't zero-downtime. During the transition, several queries failed. So something like PgBouncer together with transactional queries (no prepared statements) is still needed, which has a performance penalty.
I use Patroni for that in a k8s environment (although it works anywhere). I get an off-the-shelf declarative deployment of an HA postgres cluster with automatic failover with a little boiler-plate YAML.
Patroni has been around for a while. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.
It's basically push-button production PG.
There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.
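For a sense of scale, the per-node boilerplate is roughly this (a sketch assuming an etcd DCS; addresses, paths, and credentials are illustrative):

    cat > /etc/patroni.yml <<'EOF'
    scope: pg-cluster
    name: node1
    restapi:
      listen: 0.0.0.0:8008
      connect_address: 10.0.0.1:8008
    etcd3:
      hosts: etcd1:2379
    bootstrap:
      dcs:
        ttl: 30
        loop_wait: 10
    postgresql:
      listen: 0.0.0.0:5432
      connect_address: 10.0.0.1:5432
      data_dir: /var/lib/postgresql/data
      authentication:
        replication:
          username: replicator
          password: change-me
    EOF
    patroni /etc/patroni.yml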
We use Patroni and run it outside of k8s, on-prem - no issues in 6 or 7 years. Just upgraded from PG 12 to 17 with basically no downtime and no issues either.
Yeah, I'm also wondering that. I'm looking at self-hosting PostgreSQL after Cockroach changed their free-tier license, but found the HA part of PostgreSQL really lacking. I tested Patroni, which seems to be a popular choice, but found some pretty critical problems (https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...). I tried to explore some other solutions, but found that the lack of a high-level design makes HA for PostgreSQL really hard, if not impossible. For example, without the necessary information in the WAL, it's hard to enforce the primary node even with an external Raft/Paxos coordinator. I wrote some of this down in a blog post (https://www.binwang.me/2025-08-13-Why-Consensus-Shortcuts-Fa...), especially in the sections "Highly Available PostgreSQL Cluster" and "Quorum".
My theory of why Postgres still gets the hype is that either people don't know about the problem, or it's acceptable on some level. I've worked on a team that maintained an in-house database cluster (though we were using MySQL instead of PostgreSQL) and the HA story was pretty bad. But there were engineers manually recovering lost data and resolving data conflicts, whether as part of incident recovery or from customer tickets. So I guess that's one way of doing business.
This is my gripe with Postgres as well. Every time I see comments extolling the greatness of Postgres, I can't help but think "ah, that's a user, not a system administrator" and I think that's a completely fair judgement. Postgres is pretty great if you don't have to take care of it.
I manage PostgreSQL, and the thing I really love about it is that there's not much to manage. It just works. Even setting up streaming replication is really easy.
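It really is close to one command: on a fresh replica, pg_basebackup clones the primary and writes the replication config for you (hostname, user, and data directory here are illustrative):

    # -R writes primary_conninfo and standby.signal into the data directory,
    # so starting postgres afterwards brings the node up as a streaming replica
    pg_basebackup -h primary.internal -U replicator \
      -D /var/lib/postgresql/data -X stream -R -P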
It's easy to throw names out like this (pgBackRest is also useful...), but getting them set up properly in a production environment is not at all straightforward, which I think is the point.
I've been self-hosting Postgres for production apps for about 6 years now. The "3 AM database emergency" fear is vastly overblown in my experience.
In reality, most database issues are slow queries or connection pool exhaustion - things that happen during business hours when you're actively developing. The actual database process itself just runs. I've had more AWS outages wake me up than Postgres crashes.
The cost savings are real, but the bigger win for me is having complete visibility. When something does go wrong, I can SSH in and see exactly what's happening. With RDS you're often stuck waiting for support while your users are affected.
That said, you do need solid backups and monitoring from day one. pgBackRest and pgBouncer are your friends.
I have run (read: helped with the infrastructure of) a small production service using PSQL for 6 years, with up to hundreds of users per day. PSQL has been the problem exactly once, and it was because we ran out of disk space. Proper monitoring (duh) and a little VACUUM would have solved it.
Later I ran a v2 of that service on k8s. The architecture also changed a lot: many smaller servers sharing the same PSQL server (not really microservice-related; think more "collective of smaller services run by different people"). I have hit some issues from maxing out max_connections, but that's about it.
This is something I do on my free time so SLA isn't an issue, meaning I've had the ability to learn the ropes of running PSQL without many bad consequences. I'm really happy I have had this opportunity.
My conclusion is that running PSQL is totally fine if you just set up proper monitoring. If you are an engineer that works with infrastructure, even just because nobody else can/wants to, hosting PSQL is probably fine for you. Just RTFM.
I generally read the parts I think I need, based on what I read elsewhere like Stackoverflow and blog posts. Usually the real docs are better than some random person's SO comment. I feel that's sufficient?
What irks me about so many comments in this thread is that they often totally ignore questions of scale, the shape of your workloads, staffing concerns, time constraints, stage of your business, whether you require extensions, etc.
There is a whole raft of reasons why you might be a candidate for self-hosting, and a whole raft of reasons why not. This article is deeply reductive, and so are many of the comments.
Bad, short-sighted engineers will do that. An engineer who is not acting solely in the best interests of the wider organisation is a bad one. I would not want to work with a colleague who was so detached from reality that they wouldn't consider all GP's suggested facets. Engineering includes soft/business constraints as well as technical ones.
I find it is the opposite way around. I come up with <simple solution> based on open source tooling and I am forced instead to use <expensive enterprise shite> which is 100% lock in proprietary BS because <large corporate tech company> is partnered and is subsidising development. This has been a near constant throughout my career.
Since this is on the front page (again?) I guess I'll chime in: learn Kubernetes - it's worth it. It took me 3 attempts to finally wrap my head around it, so I really suggest trying out many different things and seeing what works for you.
And I really recommend starting with *default* k3s: do not look at any alternatives for the CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.
Once you have that, you can use great open-source k8s native controllers which take care of vast majority of requirements when it comes to self-hosting and save more time in the long run than it took to set up and learn these things.
Honorable mentions: k9s, Lens (I do not suggest using it long-term, but the UI is really good as a starting point), and the Rancher web UI.
I do not recommend Ceph unless you are okay with not using shared filesystems (they have a bunch of gotchas), or unless you want S3 without having to install a dedicated deployment for it.
As someone who has operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out of box config is sane, it’s easy to override things, and failover/etc has been working flawlessly for years. It’s just the right line between total DIY and the simplicity of having a hosted solution. Postgres is solved, next problem.
And on a similar naming note yet totally unrelated, check out k9s, which is a TUI for Kubernetes cluster admin. All kinds of nifty features built-in, and highly customizable.
No path for busy people, unfortunately. Learn everything from ground up, from containers to Compose to k3s, maybe to kubeadm or hosted. Huge abstraction layers coming from Kubernetes serve their purpose well, but can screw you up when anything goes slightly wrong on the upper layer.
For start, ignore operators, ignore custom CSI/CNI, ignore IAM/RBAC. Once you feel good in the basics, you can expand.
k3sup a cluster, then ask an AI how to serve an nginx static site using traefik on it, and to explain every step and what it does (it should produce: a ConfigMap, a Deployment, a Service, and an Ingress - sketched below).
k3s provides a CSI and a CNI (Container Storage Interface and Container Network Interface) - flannel for networking, plus a local-path provisioner that just maps volumes (PVCs) to disk.
traefik is what routes your traffic from the outside to the inside of your cluster (to an Ingress resource).
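A compressed sketch of the four resources that exercise should yield (names and content are illustrative; k3s's bundled traefik picks up the Ingress automatically):

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: ConfigMap
    metadata: { name: static-site }
    data: { index.html: "<h1>hello</h1>" }
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata: { name: static-site }
    spec:
      replicas: 1
      selector: { matchLabels: { app: static-site } }
      template:
        metadata: { labels: { app: static-site } }
        spec:
          containers:
            - name: nginx
              image: nginx:alpine
              volumeMounts: [{ name: html, mountPath: /usr/share/nginx/html }]
          volumes: [{ name: html, configMap: { name: static-site } }]
    ---
    apiVersion: v1
    kind: Service
    metadata: { name: static-site }
    spec:
      selector: { app: static-site }
      ports: [{ port: 80, targetPort: 80 }]
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata: { name: static-site }
    spec:
      rules:
        - http:
            paths:
              - path: /
                pathType: Prefix
                backend: { service: { name: static-site, port: { number: 80 } } }
    EOF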
I'm probably just an idiot, but I ran unmanaged postgres on Fly.io, which is basically self hosting on a vm, and it wasn't fun.
I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new vm and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clusters is having higher availability, why did repmgr not handle zombie primaries?
Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single node setup as well.
I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.
Beyond the usual points, there are some other important factors that favor self-hosting PG:
1. Access to any extension you want and importantly ability to create your own extensions.
2. Being able to run any version you want, including being able to adopt patches ahead of releases.
3. Ability to tune for maximum performance based on the kind of workload you have. If it's massively parallel you can fill the box with huge amounts of memory and screaming-fast SSDs; if it's very compute-heavy you can spec the box with really tall cores, etc.
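To make point 3 concrete, this is the kind of knob-turning involved - values here are illustrative assumptions for a big dedicated box, not recommendations:

    psql <<'SQL'
    ALTER SYSTEM SET shared_buffers = '64GB';
    ALTER SYSTEM SET effective_cache_size = '192GB';
    ALTER SYSTEM SET max_parallel_workers_per_gather = 8;
    ALTER SYSTEM SET random_page_cost = 1.1;  -- assumes fast NVMe storage
    SQL
    # shared_buffers requires a restart to take effect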
Self hosting is rarely about cost, it's usually about control for me.
Being able to replace complex application logic/types with a nice custom pgrx extension can save massive amounts of time. Similarly, using a custom index access method can unlock a step change in performance, unachievable without some non-PG solution that would compromise on simplicity by forcing a second data store.
And if you want Supabase-like functionality, I'm a huge fan of PostgREST (which is actually how Supabase works/worked under the hood). Make a view for your application and boom, you have a GET-only REST API. Add a plpgsql function, and now you can POST. It uses JWT for auth, but usually I have the application on the same VLAN as the DB so it's not as rife for abuse.
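For anyone who hasn't seen it, a minimal sketch of that pattern (schema, table, and function names are hypothetical; PostgREST exposes views in its configured schema as GET endpoints and functions under /rpc/):

    psql "$DATABASE_URL" <<'SQL'
    -- GET /items : any view in the exposed schema becomes a read endpoint
    CREATE SCHEMA IF NOT EXISTS api;
    CREATE VIEW api.items AS
      SELECT id, name, price FROM public.items;

    -- POST /rpc/add_item : functions become RPC endpoints
    CREATE FUNCTION api.add_item(_name text, _price numeric)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
      INSERT INTO public.items(name, price) VALUES (_name, _price);
    END;
    $$;
    SQL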
I hosted PostgreSQL professionally for over a decade.
Overall, a good experience. Very stable service, and when performance issues did periodically arise, I liked that we had full access to all the details to understand the root cause and tune things.
Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.
I started in this industry before cloud was a thing. I did most of the things RDS does the hard way (except being able to dynamically increase memory on a running instance, that's magic to me). I do not want that responsibility, especially because I know how badly it turns out when it's one of a dozen (or dozens) of responsibilities asked of the team.
I had a single API endpoint performing ~178 Postgres SQL queries.
Setup                Latency/query   Total time
-------------------------------------------------
Same geo area        35ms            6.2s
Same local network   4ms             712ms
Same server          ~0ms            170ms
This is with zero code changes; the savings come purely from network latency (178 queries x 35ms of round-trips is ~6.2s by itself). A lot of devs these days are not even aware of the latency costs coming from their service locations. It's crazy!
I've been self-hosting PostgreSQL for 12+ years at this point - directly on bare metal back then, and now in a container with CapRover.
I have a cron sh script to back up to S3 (it used to be FTP).
It's not "business grade" but it has also actually NEVER failed. Well, once, but I think it was more the container or a swarm thing; I just destroyed and recreated it and it picked up the same volume fine.
The biggest pain point is upgrading, as PostgreSQL can't upgrade its data files without the previous version's binaries installed. It's VERY annoying.
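For what it's worth, once both versions' binaries are installed, the actual upgrade is a single pg_upgrade invocation (a sketch; the Debian-style paths are illustrative):

    pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --link    # hard-link data files instead of copying, much faster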
> automates the deployment and management of highly available PostgreSQL clusters in production environments. This solution is tailored for use on dedicated physical servers, virtual machines, and within both on-premises and cloud-based infrastructures.
I've had my hair on fire because my app code shit the bed. I've never ever (throughout 15 years of using it in everything I do) had to even think about Postgres, and yes, I always set it up self-hosted. The only concern I've had is when I had to do migrations where I had to upgrade PG to fit with upgrades in the ORM database layer. Made for some interesting stepping-stone upgrades once in a while but mostly just careful sysadmining.
I often find it sad how many things that we did, almost without thinking about them, that are considered hard today. Take a stroll through this thread and you will find out that everything from RAID to basic configuration management are ultrahard things that will lead you to having a bus factor of 1.
What do you postgres self hosters use for performance analysis? Both GCP-SQL and RDS have their performance analysis pieces of the hosted DB and it's incredible. Probably my favorite reason for using them.
Standard Postgres compiled with some AWS-specific monitoring hooks
A custom backup system using EBS snapshots
Automated configuration management via Chef/Puppet/Ansible
Load balancers and connection pooling (PgBouncer)
Monitoring integration with CloudWatch
Automated failover scripting
I didn't know RDS had PgBouncer under the hood, is this really accurate?
The problem I find with RDS (and most other managed Postgres) is that they limit your options for how you design your database architecture. For instance, if write consistency is important to you and you want synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only get logical replication, because you don't have access to your WAL archive, which makes moving off RDS much more difficult.
I don't think it does. AWS has this feature under RDS Proxy, but it's an extra service and comes with extra cost (and it's a bit cumbersome to use, in my opinion - it should have been designed as a checkbox, not an entire separate thing to maintain).
Although it technically has a "load balancer", in the form of a DNS entry that resolves to a random reader replica, if I recall correctly.
I've operated both self-hosted and managed database clusters with complex topologies and mission-critical data at well-known tech companies.
Managed database services mostly automate a subset of routine operational work, things like backups, some configuration management, and software upgrades. But they don't remove the need for real database operations.
You still have to validate restores, build and rehearse a disaster recovery plan, design and review schemas, review and optimize queries, tune indexes, and fine-tune configuration, among other essentials.
In one incident, AWS support couldn't determine what was wrong with an RDS cluster and advised us to "try restarting it".
Bottom line: even with managed databases, you still need people on the team who are strong in DBOps. You need standard operating procedures and automation, built by your team. Without that expertise, you're taking on serious risk, including potentially catastrophic failure modes.
I've had an RDS instance run out of disk space and then get stuck in "modifying" for 24 hours (until an AWS operator manually SSH'd in I guess). We had to restore from the latest snapshot and manually rebuild the missing data from logs/other artifacts in the meantime to restore service.
I would've very much preferred being able to SSH in myself and fix it on the spot. Ironically the only reason it ran out of space in the first place is that the AWS markup on that is so huge we were operating with little margin for error; none of that would happen with a bare-metal host where I can rent 1TB of NVME for a mere 20 bucks a month.
As far as I know we never got any kind of compensation for this, so RDS ended up being a net negative for this company, tens of thousands spent over a few years for laptop-grade performance and it couldn't even do its promised job the only time it was needed.
If you do host your database yourself, I would suggest taking the data seriously. A few easy wins: use a multi-zonal disk [1] with scheduled automatic snapshots [2].
Snapshots might break ACID for the last few transactions, but the disk will flush all in-memory writes before the freeze. Considering it's a one-click solution, it's a lot better than losing everything.
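On GCP, for instance, the one-click version is also roughly two commands (a hedged sketch from memory - flag and resource names are assumptions, so verify against the gcloud docs):

    gcloud compute resource-policies create snapshot-schedule daily-db-snaps \
      --region=europe-west1 --daily-schedule --start-time=04:00 \
      --max-retention-days=14
    gcloud compute disks add-resource-policies my-db-disk \
      --resource-policies=daily-db-snaps --zone=europe-west1-b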
Over a decade of cloud provider propaganda achieves that. We appear to have lost the basic skill of operating a *nix machine, so anything even remotely close to that now sounds terrifying.
You mean you need to SSH into the box? Horrifying!
I've also been self-hosting my webapp for 4+ years; never had any trouble with the database.
pg_basebackup and WAL archiving work wonders. And since I always pull the (backup copy of the) database for local development, the backup is constantly verified, too.
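The combo being described is roughly this (a sketch; the archive paths are illustrative, and the archive_command is the classic example from the Postgres docs):

    # enable WAL archiving on the primary (requires a restart)
    psql -c "ALTER SYSTEM SET archive_mode = 'on';"
    psql -c "ALTER SYSTEM SET archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f';"
    # then take a base backup to anchor point-in-time recovery
    pg_basebackup -D /backups/base -X stream -P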
I don't feel like it's easy to self-host postgres.
Here are my gripes:
1. Backups are super important. Losing production data just is not an option. Postgres offers pg_dump, which is not an appropriate tool for this, so you should set up WAL archiving or something like that. This is complicated to do right.
2. Horizontal scalability with read replicas is hard to implement.
3. Tuning various postgres parameters is not a trivial task.
4. Upgrading major version is complicated.
5. You probably need to use something like pgbouncer.
6. Database usually is the most important piece of infrastructure. So it's especially painful when it fails.
I guess it's not that hard when you've done it once and have all the scripts and memories to look back on. But otherwise it's hard. Clicking a few buttons in the hoster's panel is much easier.
WAL archiving is piss easy. You can also just use pg_basebackup; with Postgres 17 it's easier than ever thanks to the incremental backup feature (sketched after this list).
You don't need horizontal scalability when a single server can have 384 real CPU cores, 6TB of RAM, petabytes of PCIe 5.0 SSD, and a 100Gbps NIC.
For tuning Postgres parameters, you can start with pgtune.leopard.in.ua or pgconfig.org.
Upgrading a major version has been piss easy since Postgres 10 or so: just a single command.
You do not need pgbouncer if your database adapter library already provides pool functionality (most of them do).
For me, managed databases need the same amount of effort anyway, due to shitty documentation and garbage user interfaces (AWS, GCP, and Azure are all the same), not to mention they change all the time.
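On the Postgres 17 incremental backup point, the workflow looks roughly like this (a sketch; paths are illustrative, and it assumes summarize_wal is enabled on the server):

    # full backup, which also writes a backup_manifest
    pg_basebackup -D /backups/full -X stream
    # later: an incremental backup relative to that manifest
    pg_basebackup -D /backups/incr1 --incremental=/backups/full/backup_manifest
    # restore: combine the chain into a synthetic full backup
    pg_combinebackup /backups/full /backups/incr1 -o /restore/data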
"all scripts and memory to look back. But otherwise it's hard. Clicking few buttons in hoster panel is much easier."
so we need open source way to do that, coolify/dokploy comes to mind and it exactly do that way
I would say 80% of your points wouldn't bite until a certain scale; most applications grow and eventually outgrow their tech stack, so you'd end up replacing those pieces at some point anyway.
Scaling to a different instance size is also easy on AWS.
That said, a self-hosted DB on a dedicated Hetzner box flies. At that price, it may save you the time of reworking your app to be more cost-efficient on AWS.
The main focus here is a tested solution with automated backup and recovery, leaving out complicated parts like clustering - prioritizing MTTR over MTBF.
The naming of RDS is a little bit presumptuous I know, but it works quite well :-)
I'm not a cloud-hosting fan, but comparing RDS to a single instance DB seems crazy to me. Even for a hobby project, I couldn't accept losing data since the last snapshot. If you are going to self-host PostgreSQL in production, make sure you have at least some knowledge how to setup streaming replication and have monitoring in place making sure the replication works. Ideally, use something like Patroni for automatic failover. I'm saying this a someone running fairly large self-hosted HA PostgreSQL databases in production.
RDS is not, by default, multi-instance, multi-region, or fault-tolerant at all - you choose all of that in your instance config. The number of single-instance, single-region, zero-backup RDS setups I've seen in the wild is honestly concerning. Do devs think an RDS instance on its own, without explicit configuration, is fault-tolerant and backed up? If you have an EC2 instance with EBS and auto-restart, you have almost identical fault tolerance (yes, there are some slight nuances on RDS regarding recovery following a failure).
I just find that assumption a bit dangerous. Setting all of that up on RDS is easy, but it's not on by default.
> If your database goes down at 3 AM, you need to fix it.
Of all the places I've worked that had the attitude "if this goes down at 3AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I've worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.
I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it
From this angle, the ability to fix problems more quickly when self-hosting could be considered an antifeature by the employee getting woken up at 3am.
Really? That might be an anecdote sampled from unusually small businesses, then. Between myself and most peers I’ve ever talked to about availability, I heard an overwhelming majority of folks describe systems that really did need to be up 24/7 with high availability, and thus needed fast 24/7 incident response.
That includes big and small businesses, SaaS and non-SaaS, high scale (5M+rps) to tiny scale (100s-10krps), and all sorts of different markets and user bases. Even at the companies that were not staffed or providing a user service over night, overnight outages were immediately noticed because on average, more than one external integration/backfill/migration job was running at any time. Sure, “overnight on call” at small places like that was more “reports are hardcoded to email Bob if they hit an exception, and integration customers either know Bob’s phone number or how to ask their operations contact to call Bob”, but those are still environments where off-hours uptime and fast resolution of incidents was expected.
Between me, my colleagues, and friends/peers whose stories I know, that’s an N of high dozens to low hundreds.
IME the need for 24x7 in B2B apps is largely driven by global customer scope. If you have customers in North America and Asia, now you need 24x7 (and x365, because of little holiday overlap).
That being said, there are a number of B2B apps/industries where global scope is not a thing. For example, many providers who operate in the $4.9 trillion US healthcare market do not have any international users. Similarly the $1.5 trillion (revenue) US real estate market. There are states where one could operate where healthcare spending is over $100B annually. Banks. Securities markets. Lots of things do not have 24x7 business requirements.
Great read. I moved my video-sharing app from GCP to self-hosted on a beefy home server, plus Cloudflare for object storage and video streaming. I had been using Cloud SQL as my managed DB and am now running Postgres on my own dedicated hardware. I was forced to move away from the cloud primarily because of the high cost of video processing (not because Cloud SQL was bad), but I've discovered that self-hosting the DB isn't as difficult as it's made out to be. And there was a daily charge just for keeping the DB hot, which I don't have now. I'll be moving to a rackmount server at a colo in about a month, so this was great to read and confirms my experience.
I would have liked to read about the "high availability" that's mentioned a couple of times in the article; the WAL Configuration section is not enough, and replication is expensive-ish.
There are a couple of things that are being glossed over:
Hardware failures and automated failovers: that's something AWS and other managed hosting providers handle for you. Hardware will eventually fail, of course; in AWS this is a non-event. It fails over, a replacement spins up, etc. Same with upgrades and other stuff.
Configuration complexity. The author casually outlines a lot of fairly complex design involving all sorts of configuration tweaks, load balancing, etc. That implies skills most teams don't have. I know enough to know that I have quite a bit of reading up to do if I ever were to decide to self host postgresql. Many people would make bad assumptions about things being fine out of the box because they are not experienced postgresql DBAs.
Vacations/holidays/sick days. Databases may go down when it's not convenient to you. To mitigate that, you need to have several colleagues that are equally qualified to fix things when they go down while you are away from keyboard. If you haven't covered that risk, you are taking a bit of risk. In a normal company, at least 3-4 people would be a good minimum. If you are just measuring your own time, you are not being honest or not being as diligent as you should be. Either it's a risk you are covering at a cost or a risk you are ignoring.
With managed hosting, covering all of that is what you pay for. You are right that there are still failure modes beyond that that need covering. But an honest assessment of the time you, and your team, put in for this adds up really quickly.
Whatever the reasons you are self hosting, cost is probably a poor one.
The author's experience is trivial, so it indicates nothing. Anybody can set up a rack of postgresql servers and say it's great in year 2. All the hardware is under warranty and it still works anyway. There haven't been any major releases. The platform software is still "LTS". Nobody has needed to renegotiate the datacenter lease yet. So experience in year 2 tells you nothing.
From my point of view the real challenge comes when you want high availability and need to setup a Postgres cluster.
With MongoDB you simply create a replicaset and you are done.
When planning a Postgres cluster, you need to understand replication options and potentially deal with Patroni. Zalando's Spilo Docker image is not really maintained; the way to go seems to be CloudNativePG, but that requires k8s.
I still don’t understand why there is no easy built-in Postgres cluster solution.
I have been self hosting a product on Postgres that serves GIS applications for 20 years and that has been upgraded through all of the various versions during that time. It has a near perfect uptime record modulo two hardware failures and short maintenance periods for final upgrade cutovers. The application has real traffic - the database is bigger than those at my day job.
Standard Postgres compiled with some AWS-specific monitoring hooks
A custom backup system using EBS snapshots
Automated configuration management via Chef/Puppet/Ansible
Load balancers and connection pooling (PgBouncer)
Monitoring integration with CloudWatch
Automated failover scripting
Every company I've ever onboarded at that hosted their own database had number one, and a lot of TODOs around the rest. It's really hard! Honestly, it could be a full-time job for a team. And that's more expensive than RDS.
Self-hosting Postgres is so incredibly easy. People are under this strange spell that they need an ORM, or they always reach for SQLite, when it's trivially easy to write raw SQL. The syntax was designed so lithium'd-out secretaries could write queries on a punchcard. Postgres has so many nice lil features.
> When self-hosting makes sense: 1. If you're just starting out in software & want to get something working quickly [...]
This is when you use SQLite, not Postgres. Easy enough to turn into Postgres later, nothing to set up. It already works. And backups are literally just "it's a file, incremental backup by your daily backups already covers this".
I was on a severely restricted budget and self-hosted everything for 15+ years, with the heavily used part of the database on a RAM card. The RAM drive was soft-RAIDed to a pair of hard drives on a 3Ware RAID1 controller, just in case, and I also did a daily backup of the database. In all that time I never had any data loss and never had to restore anything from backup. And my options were severely restricted due to a capped income.
The real downside wasn't technical: it was the constant background anxiety you had to learn to live with, since the hosted news sites were hammered by users. The dreaded SMS alerts saying the server was inaccessible (often due to ISP issues), or going abroad and having to persuade one of your mates to keep an eye on things just in case, created a lot of unnecessary stress.
AWS is quite good. It has everything you need and removes most of that operational burden, so the angst is much lower, but the pricing is problematic.
I've been managing a 100+ GB PostgreSQL database for years. Every two years I upgrade the VPS for the size, and the DB and OS versions along with it.
The app is on the same VPS as the DB. A 2-hour window every two years is OK for the use case. No regrets.
I wish this post went into the actual how! He glossed over the details. There is a link to his repo, which is a start I suppose: https://github.com/piercefreeman/autopg
A blog post that went into the details would be awesome. I know Postgres has some docs for this (https://www.postgresql.org/docs/current/backup.html), but it's too theoretical. I want to see a one-stop-shop with everything you'd reasonably need to know to self host: like monitoring uptime, backups, stuff like that.
I'd argue forget about Postgres completely. If you can shell out $90/month, the only database you should use is GCP Spanner (yes, this also means forget about any mega cloud other than GCP unless you're fine paying ingress and egress).
And for small projects, SQLite, rqlite, or etcd.
My logic is either the project is important enough that data durability matters to you and sees enough scale that loss of data durability would be a major pain in the ass to fix, or the project is not very big and you can tolerate some lost committed transactions.
A consensus-replication-less non-embedded database has no place in 2025.
This is assuming you have relational needs. For non-relational just use the native NoSQL in your cloud, e.g. DynamoDB in AWS.
You seem insanely miscalibrated. $90 gets you a dedicated server that covers most projects' needs. Data durability isn't some magic that only cloud providers can give you.
If you can lose committed transactions on a single-node data failure, you don't have durability. Then it comes down to whether you really care about durability.
I think a big piece missing from these conversations is compliance frameworks and customer trust. If you're selling to enterprise customers or governments, they want to go through your stack, networking, security, audit logs, and access controls with a fine-toothed comb.
Everything you do that isn't "normal" is another conversation you need to have with an auditor plus each customer. Those eat up a bunch of time and deals take longer to close.
Right or wrong, these decisions make you less "serious" and therefore less credible in the eyes of many enterprise customers. You can get around that perception, but it takes work. Not hosting on one of the big 3 needs to be decided with that cost in mind.
I think we can get to the point where we have self-hosted agents that can manage db maintenance and recovery. There could be a regular otel -> * -> Grafana -> ~PagerDuty -> you pipeline, plus a TriageBot which would call specialists to gather state and orchestrate a response.
Scripts could kick off health reports and trigger operations. Upgrades and recovery runbooks would be clearly defined and integration tested.
It would empower personal sovereignty.
Someone should make this in the open. Maybe it already exists, there are a lot of interesting agentops projects.
If that worked 60% of the time and I had to figure out the rest, I’d self host that. I’d pay for 80%+.
However, as always, "complexity has to live somewhere". I doubt even Opus 4.5 could handle this. As soon as you get into the database records themselves, context is going to blow up and you're going to have a bad time.
I generally agree with the author; however, there are a handful of relatively prominent, recent examples (eg [1]) that many admins might find scary enough to prefer a hosted solution.
I wish this article had gone more in-depth on how they're setting up backups. The great thing about SQLite is that Litestream makes backup and restore something you don't really have to think about.
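For anyone who hasn't seen it, the basic Litestream flow is roughly two commands (the bucket name and paths are placeholders; in production you'd run the replicator via a config file and a service manager):

litestream replicate /srv/app/app.db s3://my-backup-bucket/app-db
litestream restore -o /srv/app/app.db s3://my-backup-bucket/app-db

The first continuously streams WAL pages to S3; the second rebuilds the database from the replica.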
A lot of this comes down to devs not understanding infrastructure and infrastructure components and the insane interplay and complexity. And they don't care! Apps, apps apps, developers, developers, developers!
On the managerial side, it's often about deflection of responsibility for the Big Boss.
Since it's not part of the app itself, it can be HARD, and if you're not familiar with things, then it's also scary! What if you mess up?
(Most apps don't need the elasticity, or the bells and whistles, but you're paying for them even if you don't use them, indirectly.)
I didn't even know there were companies that would host Postgres for you. I self-host it for my personal projects with 0 users and it works just fine, so I don't know why anyone would do it any differently.
I can't tell if this is satire or not with the first sentence and the "0 users" parts of your comment, but I know several solo devs with millions of users who self host their database and apps as well.
Self-hosting is one of those things that makes sense when you can control all of the variables. For example, can you stop the developers from using obscure features of the db, that suddenly become deprecated, causing you to need to do a manual rolling back while they fix the code? A one-button UI to do that might be very handy. Can you stop your IT department from breaking the VPN, preventing you from logging into the db box at exactly the wrong time? Having it all in a UI that routes around IT's fat fingers might be helpful.
For a fascinating counterpoint (gist: cloud hosted Postgres on RDS aurora is not anything like the system you would host yourself, and other cloud deployments of databases should also not be done like our field is used to doing it when self-hosting) see this other front page article and discussion: https://news.ycombinator.com/item?id=46334990
Aurora is a closed-source fork of PostgreSQL. So it is indeed not possible to self-host it.
However, a self-hosted PostgreSQL on a bare metal server with NVMe SSDs will be much faster than what RDS is capable of. Especially at the same price points.
Yep! I was mostly replying to TFA’s claim that AWS RDS is
> Standard Postgres compiled with some AWS-specific monitoring hooks
… and other operational tools deployed alongside it. That’s not always true: RDS classic may be those things, but RDS Aurora/Serverless is anything but.
As to whether
> self-hosted PostgreSQL on a bare metal server with NVMe SSDs will be much faster than what RDS is capable of
That’s often but not always true. Plenty of workloads will perform better on RDS (read auto scaling is huge in Serverless: you can have new read replica nodes auto-launch in response to e.g. a wave of concurrent, massive reporting queries; many queries can benefit from RDS’s additions to/modifications of the pg buffer cache system that work with the underlying storage)—and that’s even with the VM tax and the networked-storage tax! Of course, it’ll cost more in real money whether or not it performs better, further complicating the cost/benefit analysis here.
Also, pedantically, you can run RDS on bare metal with local NVMEs.
Does anyone offer a managed database service where the database and your application server live on the same box? Until I can get the latency advantages of such a setup, we've found latency just too high to go with a managed solution. We are already spending too much time batching or vectorizing database reads.
I was also recently doing some research into what projects exist that come close to a "managed Postgres on Digital Ocean" experience; sadly there are some building blocks, but nothing that really makes it a complete no-brainer.
Huh. I thought hosting one's own databases was still the norm. Guess I'm just stuck in the past, or don't consume cloud vendor marketing, or something.
Enjoyed the article, and the "less can be more than you think" mindset in general.
To the author - on Android Chrome I seem to inevitably load the page scrolled to the bottom, footnotes area. Scrolling up, back button, click link again has the same results - I start out seeing footnotes. Might be worth a look.
Just don't try to build it from source haha. Compiling Postgres 18 with the PostGIS extension has been such a PITA because the topology component won't configure to not use the system /usr/bin/postgres and has given me a lot of grief. Finally got it fixed I think though.
Sometimes it is nice to simplify the conversation with non-tech management. Oh, you want HA / DR / etc? We click a button and you get it (multi-AZ). Clicking the button doubles your DB costs from x to y. Please choose.
Then you have one less repeating conversation and someone to blame.
Can you get away without exposing it to the internet? Firewall it off altogether, or just open the address of a specific machine that needs access to it?
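In case it's useful, a minimal sketch of that setup with ufw (the addresses are assumptions):

# In postgresql.conf, bind to the private interface, not 0.0.0.0:
#   listen_addresses = '10.0.0.5'
sudo ufw deny 5432/tcp
sudo ufw allow from 10.0.0.10 to any port 5432 proto tcp

pg_hba.conf then gives you a second, database-level gate on top of the firewall.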
Without stating actual numbers if not comfortable, what was the % savings one over the other? Happy with performance? Looking at potential of doing the same move.
Huh? Maybe I missed something, but... why should self-hosting a database server be hard or scary? Sure, you are then responsible for security, backups, etc., but that's not really different in the cloud - if anything, the cloud makes it more complicated.
Well for the clickops folks who've built careers on the idea that 'systems administration is dead'... I imagine having to open a shell and install some stuff or modify a configuration file is quite scary.
> Hiring and replacing engineers who can and want to manage database servers can be hard or scary for employers.
I heard there's this magical thing called "money" that is claimed to help with this problem. You offer even half of the AWS markup to your employees and suddenly they like managing database servers. Magic I tell you!
I'd say a managed DB, at minimum, should be handling upgrades and backups for you. If it doesn't, that's not a managed DB, that's a self-service DB. You're paying a premium to do the work yourself.
Better yet, self host Postgres on your own open source PaaS with Coolify, Dokploy, or Canine, and then you can also self host all your apps on your VPS too. I use Dokploy but I'm looking into Canine, and I know many have used Coolify with great success.
Cooking up the RDS equivalent is a reasonable amount of work and a pretty big amount of knowledge (it's easy to make a failover solution with lower uptime than "just a single VM" if you don't get everything right).
... but you can do a lot with just "a single VM and robust backups". PostgreSQL restore is pretty fast, and if you've automated deployment you can stand a replacement up in minutes. So if your service can survive 30 minutes of downtime once every 3 years while the DB reloads, "downgrading" to "a single cloud VM" or "a single VM on your own hardware" might not be a big deal.
And then there is the urge to Postgres everything.
I was disappointed that Alloy doesn't support TimescaleDB as a metrics endpoint. Considering switching to Telegraf just so I can store the metrics in Postgres.
It's pretty easy these days to spin up a local Postgres container. Might as well use it for prototyping too, and save yourself the hassle of switching later.
Have you given thought to why you prototype with SQLite?
I have switched to using Postgres even for prototyping, once I prepared some shell scripts for the various setup steps. With Hibernate (Java) or Knex (JavaScript/NodeJS), and with unit tests for the code (a test-driven development approach), I feel I have reduced the friction of using Postgres from the beginning.
Ironically you need a bit of both. You need to be expert enough to make it work, but not "too" expert to be stuck in your ways and/or influenced by all the fear-mongering.
An expert will give you thousands of theoretical reasons why self-hosting the DB is a bad idea.
An "expert" will host it, enjoy the cost savings and deal with the once-a-year occurrence of the theoretical risk (if it ever occurs).
Honestly, at this point I'm actually surprised that there aren't specialized Linux distributions for hosting Postgres. There are so many kernel-level and filesystem-level optimizations that significantly impact performance, and the ability to pare down all of the unneeded stuff in most distributions would make for a pretty compact and highly optimized image.
Recommends hosting postgres yourself. Doesn't recommend a distribution stack. If you try this at a startup to save $50 a month, you will never recoup the time you wasted setting it up. We pay dedicated managed services for these things so we can make products on top of them.
"just use postgres from your distro" is *wildly* underselling the amount of work that it takes to go from apt install postgres to having a production ready setup (backups, replica, pooling, etc). Granted, if it's a tiny database just pg-dumping might be enough, but for many that isn't going to be enough.
The one problem with using your distro's Postgres is that your upgrade routine will be dictated by a 3rd party.
And Postgres upgrades are not transparent. So you'll have a 1- or 2-hour task every 6 to 18 months, with only a small amount of control over when it happens. This is OK for a lot of people, and completely unthinkable for some others.
Disks go bad. RAID is nontrivial to set up. Hetzner had a big DC outage that led to data loss.
Off-site backups or replication would help, though failing over is not always trivial.
Me: “Why are we switching from NoNameCMS to Salesforce?”
Savvy Manager: “NoNameCMS often won’t take our support calls, but if Salesforce goes down it’s in the WSJ the next day.”
This ignores the case where BigVendor is down for your account and your account only and support is MIA, which is not that uncommon in my experience.
> but it allows me to say, "we'll have to wait until they've sorted this out, Ikea and Disney are down, too."
From my experience your client’s clients don’t care about this when they’re still otherwise up.
Yes but the fact that it's "not their fault" keeps the person from getting fired.
Don't underestimate the power of CYA
From my experience, this completely disavows you from an otherwise reputation damaging experience.
You can still outsource up to the VM level and handle everything else on your own.
Obviously it depends on the operational overhead of specific technology.
> It is. You need to answer the question: what are the consecuences of your service being down for lets say 4 hours or some security patch isn't properly applied or you have not followed the best practices in terms of security?
There is one advantage a self-hosted setup has here: if you set up a VPN, only your employees have access, and the server need not be accessible from the internet. So even in the case of a zero-day that WILL make a SaaS company leak your data, you can be safe(r) with a self-hosted solution.
> Your time is money though. You are saving money but giving up time.
The investment compounds. Setting up infra to run a single container for some app takes time, and there is a good chance it won't pay for itself.
But the 2nd service? Cheaper. The 5th? At that point you've probably automated it enough that it's just pointing at a Docker container and tweaking a few settings.
> Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
It's cheaper even if you include your own time. You pay a technical person at your company to do it; a SaaS company pays the same kind of person, then pays sales and PR people to sell it, then pays income tax on it, and then it also needs to "pay" its investors.
Yeah, building a service for 4 people in a company can be more work than just paying $10/mo to a SaaS company. But 20? 50? 100? It quickly gets to the point where self-hosting (whether actually "self", or on dedicated servers, or in the cloud) actually pays off.
> Like everything, it is always cheaper to do it (it being cooking at home, cleaning your home, fixing your own car, etc) yourself (if you don't include the cost of your own time doing the service you normally pay someone else for).
In a business context the "time is money" thing actually makes sense, because there's a reasonable likelihood that the business can put the time to a more profitable use in some other way. But in a personal context it makes no sense at all. Realistically, the time I spend cooking or cleaning was not going to earn me a dime no matter what else I did, therefore the opportunity cost is zero. And this is true for almost everyone out there.
Yeah, I agree... by that narrative, better to outsource product development, management, and everything else too.
That argument does not hold when AWS serverless Postgres is available, which costs almost nothing for low traffic and is vastly superior to self hosting regarding observability, security, integration, backup, etc.
There is no reason to self-manage Postgres for a dev environment.
https://aws.amazon.com/rds/aurora/serverless/
"which cost almost nothing for low traffic" you invented the retort "what about high traffic" within your own message. I don't even necessarily mean user traffic either. But if you constantly have to sync new records over (as could be the case in any kind of timeseries use-case) the internal traffic could rack up costs quickly.
"vastly superior to self hosting regarding observability" I'd suggest looking into the cnpg operator for Postgres on Kubernetes. The builtin metrics and official dashboard is vastly superior to what I get from Cloudwatch for my RDS clusters. And the backup mechanism using Barman for database snapshots and WAL backups is vastly superior to AWS DMS or AWS's disk snapshots which aren't portable to a system outside of AWS if you care about avoiding vendor lock-in.
This was true for Aurora Serverless v1, which scaled to 0, but that is no longer offered. v2 requires a minimum 0.5 ACU hourly commitment ($40+/mo).
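For reference, the arithmetic behind that floor (assuming the us-east-1 rate of roughly $0.12 per ACU-hour; prices vary by region): 0.5 ACU × $0.12 × ~730 hours/month ≈ $44/month, before storage and I/O.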
Aurora Serverless requires provisioned compute - it's about $40/mo last time I checked.
Just use a pg container on a vm, cheap as chips and you can do anything to em.
> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:
> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app
> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.
This is funny. I'd argue the exact opposite. I would self host only:
* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or
* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...
> employing engineers to manage self-hosted databases is more cost effective than outsourcing
Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
PaaS platforms (Heroku, Render, Railway) can legitimately be operated by your average dev and not have to hire a dedicated person; those cost even more though.
Another limitation of both the cloud and PaaS is that they are only responsible for the infrastructure/services you use; they will not touch your application at all. Can your application automatically recover from a slow/intermittent network, a DB failover (that you can't even test because your cloud providers' failover and failure modes are a black box), and so on? Otherwise you're waking up at 3am no matter what.
> Every company out there is using the cloud and yet still employs infrastructure engineers
Every company beyond a particular size surely? For many small and medium sized companies hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.
> Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
This doesn’t make sense as an argument. The reason the cloud is more complex is because that complexity is available. Under a certain size, a large number of cloud products simply can’t be managed in-house (and certainly not altogether).
Also your argument is incorrect in my experience.
At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
At larger scales, what your false dichotomy suggests also doesn’t actually happen. Where I work now, our data stores are all self-managed on top of EC2/Azure, where performance and reliability are critical. But we don’t self-host everything. For example, we use SES to send our emails and we use RDS for our app DB, because their performance profiles and uptime guarantees are more than acceptable for the price we pay. That frees up our platform engineers to spend their energy on keeping our uptime on our critical services.
> still employs infrastructure engineers
> The "cloud" reducing staff costs
Both can be true at the same time.
Also:
> Otherwise you're waking up at 3am no matter what.
Do you account for frequency and variety of wakeups here?
In-house vs Cloud Provider is largely a wash in terms of cost. Regardless of the approach, you are going need people to maintain stuff and people cost money. Similarly compute and storage cost money so what you lose on the swings, you gain on the roundabouts.
In my experience you typically need fewer people when using a Cloud Provider than in-house (or the same number of people can handle more instances) due to increased leverage. Whether you can maximize what you get via leverage depends on how good your team is.
US companies typically like to minimize headcount (either through accounting tricks or outsourcing) so usually using a Cloud Provider wins out for this reason alone. It's not how much money you spend, it's how it looks on the balance sheet ;)
Working in a university lab, self-hosting is the default for almost everything. While I would agree that costs are quite low, I sometimes would be really happy to throw money at problems to make them go away. Never having had the chance, and thus being no expert, I really see the appeal of scaling (up and down) quickly in the cloud. We ran a Postgres database of a few hundred GB with multiple read replicas and we managed somehow, but we really hit the limits of our expertise at some point. Eventually we stopped migrating to newer database schemas because keeping availability was just such a hassle. If I had had the money as a company, I guess I would have paid for a hosted solution.
I don’t think it’s a lie, it’s just perhaps overstated. The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.
Whether or not you need that equivalence is an orthogonal question.
The fact that as many engineers are on payroll doesn't mean that "cloud" is not an efficiency improvement. When things are easier and cheaper, people don't do less or buy less. They do more and buy more until they fill their capacity. The end result is the same number (or more) of engineers, but they deal with a higher level of abstraction and achieve more with the same headcount.
I can't talk about staff costs, but as someone who's self-hosted Postgres before, using RDS or Supabase saves weeks of time on upgrades, replicas, tuning, and backups (yeah, you still need independent backups, but PITRs make life easier). Databases and file storage are probably the most useful cloud functionality for small teams.
If you have the luxury of spending half a million per year on infrastructure engineers then you can of course do better, but this is by no means universal or cost-effective.
Well sure you still have 2 or 3 infra people but now you don’t need 15. Comparing to modern Hetzner is also not fair to “cloud” in the sense that click-and-get-server didn’t exist until cloud providers popped up. That was initially the whole point. If bare metal behind an API existed in 2009 the whole industry would look very different. Contingencies Rule Everything Around Me.
You are missing that most services don't have high availability needs and don't need to scale.
Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.
I have worked on internal tools where I just added a Postgres DB to the Docker setup and that was it. 5 minutes of work and no issues at all. Sure, if you have something customer-facing you need to do a bit more and set up a good backup strategy, but that really isn't magic.
> at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
This is the crux of one of the most common fallacies in software engineering decision making today. I've participated in a bunch of architecture / vendor evaluations that concluded managed services are more cost effective almost purely because they underestimated (or even discarded entirely) the internal engineering cost of vendor management. Black box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
But most importantly - even if it's significantly less effort than self-hosting, it's never effectively costed when evaluating trade-offs - that's what leads to this persistent myth about the engineering cost of self-hosting. "Managing" managed services is a non-zero cost.
Add to that the ultimate trade-off of accountability vs availability (internal engineers care less about availability when it's out of their hands - but it's still a loss to your product either way).
> Black box debugging is one of the most time-consuming engineering pursuits, & even when it's something widely documented & well supported like RDS, it's only really tuned for the lowest common denominator - the complexities of tuning someone else's system at scale can really add up to only marginally less effort than self-hosting (if there's any difference at all).
I'm really not sure what you're talking about here. I manage many RDS clusters at work. I think in total, we've spent maybe eight hours over the last three years "tuning" the system. It runs at about 100kqps during peak load. Could it be cheaper or faster? Probably, but it's a small fraction of our total infra spend and it's not keeping me up at night.
Virtually all the effort we've ever put in here has been making the application query the appropriate indexes. But you'd do no matter how you host your database.
Hell, even the metrics that RDS gives you for free make the thing pay for itself, IMO. The thought of setting up grafana to monitor a new database makes me sweat.
It's not. I've been in a few shops that use RDS because they think their time is better spent doing other things.
Except now they are stuck trying to maintain and debug Postgres without the visibility and agency they would have if they hosted it themselves. The situation isn't at all clear.
One thing unaccounted for if you've only ever used cloud-hosted DBs is just how slow they are compared to a modern server with NVME storage.
This leads the developers to do all kinds of workarounds and reach for more cloud services (and then integrating them and - often poorly - ensuring consistency across them) because the cloud hosted DB is not able to handle the load.
On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.
Interesting. Is this an issue with RDS?
I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.
> self hosting costs you between 30 and 120 minutes per month
Can we honestly say that cloud services taking a half hour to two hours a month of someone's time on average is completely unheard of?
I handle our company's RDS instances, and probably spend closer to 2 hours a year than 2 hours a month over the last 8 years.
It's definitely expensive, but it's not time-consuming.
Very much depends on what you're doing in the cloud, how many services you are using, and how frequently those services and your app needs updates.
Self hosting does not cost you that much at all. It's basically zero once you've got backups automated.
I also encourage people to just use managed databases. After all, it is easy to replace such people. Heck actually you can fire all of them and replace the demand with genAI nowadays.
Agreed. As someone in a very tiny shop, all us devs want to do as little context switching to ops as possible. Not even half a day a month. Our hosted services are in aggregate still way cheaper than hiring another person. (We do not employ an "infrastructure engineer").
The discussion isn't "what is more effective". The discussion is "who wants to be blamed in case things go south". If you push the decision to move to self-hosted and then one of the engineers fucks up the database, you have a serious problem. If same engineer fucks up cloud database, it's easier to save your own ass.
> trading an hour or two of my time
pacman -S postgresql
initdb -D /pathto/pgroot/data
grok/claude/gpt: "Write a concise Bash script for setting up an automated daily PostgreSQL database backup using pg_dump and cron on a Linux server, with error handling via logging and 7-day retention by deleting older backups."
ctrl+c / ctrl+v
Yeah that definitely took me an hour or two.
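For the curious, a minimal sketch of what such a generated script tends to look like (paths, database name, and schedule are all assumptions):

#!/usr/bin/env bash
# Daily PostgreSQL dump with logging and 7-day retention.
# Cron entry (assumed): 0 3 * * * /usr/local/bin/pg_backup.sh
set -euo pipefail

BACKUP_DIR=/var/backups/postgres
LOG_FILE=/var/log/pg_backup.log
DB_NAME=mydb

mkdir -p "$BACKUP_DIR"
if pg_dump "$DB_NAME" | gzip > "$BACKUP_DIR/$DB_NAME-$(date +%F).sql.gz"; then
  echo "$(date -Is) backup ok" >> "$LOG_FILE"
else
  echo "$(date -Is) backup FAILED" >> "$LOG_FILE"
  exit 1
fi

# Delete dumps older than 7 days.
find "$BACKUP_DIR" -name "$DB_NAME-*.sql.gz" -mtime +7 -delete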
So your backups are written to the same disk?
> datacenter goes up in flames
> 3-2-1 backups: 3 copies on 2 different types of media with at least 1 copy off-site. No off-site copy.
Whoops!
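To be fair, the fix is one more line at the end of the backup script sketched above, assuming any S3-compatible target (bucket name hypothetical):

aws s3 cp "$BACKUP_DIR/$DB_NAME-$(date +%F).sql.gz" s3://my-offsite-backups/pg/

rclone works just as well if you'd rather avoid AWS tooling.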
So, yeah, I guess there's much confusion about what a 'managed database' actually is? Because for me, the table stakes are:
-Backups: the provider will push a full generic disaster-recovery backup of my database to an off-provider location at least daily, without the need for a maintenance window
-Optimization: index maintenance and storage optimization are performed automatically and transparently
-Multi-datacenter failover: my database will remain available even if part(s) of my provider are down, with a minimal data loss window (like, 30 seconds, 5 minutes, 15 minutes, depending on SLA and thus plan expenditure)
-Point-in-time backups are performed at an SLA-defined granularity and with a similar retention window, allowing me to access snapshots via a custom DSN, not affecting production access or performance in any way
-Slow-query analysis: notifying me of relevant performance bottlenecks before they bring down production
-Storage analysis: my plan allows for #GB of fast storage, #TB of slow storage: let me know when I'm forecast to run out of either in the next 3 billing cycles or so
Because, well, if anyone provides all of that for a monthly fee, the whole "self-hosting" argument goes out of the window quickly, right? And I say that as someone who absolutely adores self-hosting...
It's even worse when you start finding you're staffing specialized skills. You have the Postgres person, and they're not quite busy enough, but nobody else wants to do what they do. But then you have an issue while they're on vacation, and that's a problem. Now I have a critical service but with a bus factor problem. So now I staff two people who are now not very busy at all. One is a bit ambitious and is tired of being bored. So he's decided we need to implement something new in our Postgres to solve a problem we don't really have. Uh oh, it doesn't work so well, the two spend the next six months trying to work out the kinks with mixed success.
Slack is a necessary component in well functioning systems.
This would be a strange scenario because why would you keep these people employed? If someone doesn't want to do the job required, including servicing Postgres, then they wouldn't be with me any longer, I'll find someone who does.
IMO, the reason to self-host your database is latency.
Yes, I'd say backups and analysis are table stakes for hiring it out, and multi-datacenter failover is a relevant nice-to-have. But the reason to do it yourself is that it's literally impossible to get anything as good as what you can build, on somebody else's computer.
Yup, often orders of magnitude better.
If you set it up right, you can automate all of this by self-hosting too. There is really nothing special about automating backups or multi-region failover.
But then you have to regularly and manually verify that these mechanisms work.
Self-host things the boss won't call at 3 AM about: logs, traces, exceptions, internal apps, analytics. Don't self-host the database or major services.
Depending on your industry, logs can be very serious business.
Yugabyte open source covers a lot of this
Which providers do all of that?
I don't know which don't?
The default I've used on Amazon and GCP both do (RDS, Cloud SQL)
GCP Alloy DB
There should be no data loss window with a hosted database
From what I remember, if AWS loses your data they basically give you some credits and that's it.
That requires synchronous replication, which reduces availability and performance.
Why is that?
As someone who self hosted mysql (in complex master/slave setups) then mariadb, memsql, mongo and pgsql on bare metal, virtual machines then containers for almost 2 decades at this point... you can self host with very little downtime and the only real challenge is upgrade path and getting replication right.
Now with pgbouncer (or whatever other flavor of sql-aware proxy you fancy) you can greatly reduce the complexity involved in managing conventionally complex read/write routing and sharding to various replicas to enable resilient, scalable production-grade database setups on your own infra. Throw in the fact that copy-on-write and snapshotting is baked into most storage today and it becomes - at least compared to 20 years ago - trivial to set up DRS as well. Others have mentioned pgBackRest and that further enforces the ease with which you can set up these traditionally-complex setups.
Beyond those two significant features there aren't many other reasons you'd need to go with hosted/managed pgsql. I've yet to find a managed/hosted database solution that doesn't have some level of downtime to apply updates and patches, so even going fully hosted/managed isn't a silver bullet. The cost of a managed DB is also several times that of the actual hardware it's running on, so there is a cost factor involved as well.
I guess all this to say it's never been a better time to self-host your database and the learning curve is as shallow as it's ever been. Add to all of this that any garden-variety LLM can hand-hold you through the setup and management, including any issues you might encounter on the way.
The author brings up the point, but I have always found it surprising how much more expensive managed databases are than a comparable VPS.
I would expect a little bit more as a cost of the convenience, but in my experience it's generally multiple times the expense. It's wild.
This has kept me away from managed databases in all but my largest projects.
Once they convince you that you can’t do it yourself, you end up relying on them, but didn’t develop the skills you would need to migrate to another provider when they start raising prices. And they keep raising prices because by then you have no choice.
There is plenty of provider markup, to be sure. But it is also very much not a given that the hosted version of a database is running software/configs that are equivalent to what you could do yourself. Many hosted databases are extremely different behind the scenes when it comes to durability, monitoring, failover, storage provisioning, compute provisioning, and more. Just because it acts like a connection hanging off a postmaster service running on a server doesn’t mean that’s what your “psql” is connected to on RDS Aurora (or many of the other cloud-Postgres offerings).
I have not tested this in real life yet, but it seems like all the arguments about vendor lock-in can be solved if you bite the bullet and learn basic Kubernetes administration. Kubernetes is FOSS and there are countless Kubernetes-as-a-service providers.
I know there are other issues with Kubernetes but at least its transferable knowledge.
Wait, are you talking about cloud providers or LLMs?
Yes: if the DB is 5x the VM, and the VM is 10x a dedicated server from, say, OVH, then you are paying 50x.
I still don't get how folks can hype Postgres with every second post on HN, yet there is no simple batteries-included way to run a HA Postgres cluster with automatic failover like you can do with MongoDB. I'm genuinely curious how people deal with this in production when they're self-hosting.
Beyond the hype, the PostgreSQL community is aware of the lack of "batteries-included" HA. This discussion on the idea of built-in Raft replication describes MongoDB as:
>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.
https://www.postgresql.org/message-id/0e01fb4d-f8ea-4ca9-8c9...
"I really dream PostgreSQL will be as reliable as MongoDB" ... someone needs to go and read up on Mongo's history!
Sure, the PostgreSQL HA story isn't what we all want it to be, but the reliability is exceptional.
It's largely cultural. In the SQL world, people are used to accepting the absence of real HA (resilience to failure, where transactions continue without interruption) and instead rely on fast DR (stop the service, recover, check for data loss, start the service). In practice, this means that all connections are rolled back, clients must reconnect to a replica known to be in synchronous commit, and everything restarts with a cold cache.
Yet they still call it HA because there's nothing else. Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.
Everything in SQL/RDBMS was designed for a single-node instance, not accounting for replicas. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are guaranteed only on a single node.
One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.
About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation. And use logical replication for upgrades. But no solution offers both low operational complexity and high availability that can be comparable to MongoDB
The most common way to achieve HA is using Patroni. The easiest way to set it up is using Autobase (https://autobase.tech).
CloudNativePG (https://cloudnative-pg.io) is a great option if you’re using Kubernetes.
There’s also pg_auto_failover which is a Postgres extension and a bit less complex than the alternatives, but it has its drawbacks.
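For scale, the "hello world" of the CloudNativePG option mentioned above really is this small (name and size are placeholders; assumes the operator is already installed):

kubectl apply -f - <<'EOF'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster
spec:
  instances: 3
  storage:
    size: 20Gi
EOF

That gets you a three-instance cluster with automated failover; backups are an extra stanza on the same resource.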
Be sure to read the Myths and Truths about Synchronous Replication in PostgreSQL (by the author of Patroni) before considering those solutions as cloud-native high availability: https://www.postgresql.eu/events/pgconfde2025/sessions/sessi...
If you’re running Kubernetes, CloudNativePG seems to be the “batteries included” HA Postgres cluster that’s becoming the standard in this area.
CloudNativePG is automation around PostgreSQL, not "batteries included", and not the Kubernetes ideal where pods can die or spawn without impacting availability. Unfortunately, naming it Cloud Native doesn't transform a monolithic database into an elastic cluster.
We've recently had a disk failure on the primary, and CloudNativePG promoted another instance to primary, but it wasn't zero downtime. During the transition, several queries failed. So something like PgBouncer together with transactional queries (no prepared statements) is still needed, which has a performance penalty.
I use Patroni for that in a k8s environment (although it works anywhere). I get an off-the-shelf declarative deployment of an HA postgres cluster with automatic failover with a little boiler-plate YAML.
Patroni has been around for awhile. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.
It's basically push-button production PG.
There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.
We use Patroni and run it outside of k8s, on prem - no issues in 6 or 7 years. Just upgraded from pg 12 to 17 with basically no downtime and without issue, either.
Yeah, I'm also wondering that. I was looking to self-host PostgreSQL after Cockroach changed their free-tier license, but found the HA part of PostgreSQL really lacking. I tested Patroni, which seems to be a popular choice, but found some pretty critical problems (https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...). I tried to explore some other solutions, but found that the lack of a high-level design makes HA for PostgreSQL really hard, if not impossible. For example, without the necessary information in the WAL, it's hard to enforce a primary node even with an external Raft/Paxos coordinator. I wrote some of this down in this blog post (https://www.binwang.me/2025-08-13-Why-Consensus-Shortcuts-Fa...), especially in the sections "Highly Available PostgreSQL Cluster" and "Quorum".
My theory of why Postgres still gets the hype is that either people don't know about the problem, or it's acceptable on some level. I've worked in a team that maintained an in-house database cluster (though we were using MySQL instead of PostgreSQL) and the HA story was pretty bad. But there were engineers manually recovering lost data and resolving data conflicts, either as part of incident recovery or from customer tickets. So I guess that's one way of doing business.
I love Postgresql simply because it never gives me any trouble. I've been running it for decades without trouble.
OTOH, Oracle takes most of my time with endless issues, bugs, unexpected feature modifications, even on OCI!
This is my gripe with Postgres as well. Every time I see comments extolling the greatness of Postgres, I can't help but think "ah, that's a user, not a system administrator" and I think that's a completely fair judgement. Postgres is pretty great if you don't have to take care of it.
I manage PostgreSQL and the thing I really love about it is that there's not much to manage. It just works. Even setting up streaming replication is really easy.
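For anyone who hasn't tried it, a sketch of how short the streaming-replication setup is on a modern Postgres (addresses, paths, and the role name are assumptions):

# On the primary: a replication role, plus a pg_hba.conf entry for the standby.
psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';"
echo "host replication replicator 10.0.0.11/32 scram-sha-256" >> /etc/postgresql/17/main/pg_hba.conf

# On the standby: clone the primary; -R writes standby.signal and the
# connection settings, so the node comes up as a streaming replica.
pg_basebackup -h 10.0.0.10 -U replicator -D /var/lib/postgresql/17/main -P -R

Reload the primary's config after the pg_hba.conf change and you're done.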
I’ve been tempted by MariaDB for this reason. I’d love to hear from anyone who has run both.
IMO Maria has fallen behind MySQL. I wouldn't choose it for anything my income depends on.
(I do use Maria at home for legacy reasons, and have used MySQL and Pg professionally for years.)
Patroni, or the Zalando operator on k8s.
Because that’s an expensive and complex boondoggle almost no business needs.
RDS provides some HA. HAProxy or PGBouncer can help when self hosting.
It's easy to throw names out like this (pgBackRest is also useful...), but getting them set up properly in a production environment is not at all straightforward, which I think is the point.
Take a look at https://github.com/vitabaks/autobase
In case you want to self host but also have something that takes care of all that extra work for you
Thank you, this looks awesome.
I wonder how well this plays with other self hosted open source PaaS, is it just a Docker container we can run I assume?
Just skimmed the readme. What's the connection pooling situation here? Or is it out of scope?
I've been self-hosting Postgres for production apps for about 6 years now. The "3 AM database emergency" fear is vastly overblown in my experience.
In reality, most database issues are slow queries or connection pool exhaustion - things that happen during business hours when you're actively developing. The actual database process itself just runs. I've had more AWS outages wake me up than Postgres crashes.
The cost savings are real, but the bigger win for me is having complete visibility. When something does go wrong, I can SSH in and see exactly what's happening. With RDS you're often stuck waiting for support while your users are affected.
That said, you do need solid backups and monitoring from day one. pgBackRest and pgBouncer are your friends.
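To give a flavor of the pgBackRest side: once /etc/pgbackrest.conf points at your data directory and a repository, the day-to-day surface is tiny (stanza name assumed):

pgbackrest --stanza=main stanza-create   # one-time repository init
pgbackrest --stanza=main backup          # full/diff/incremental backup
pgbackrest --stanza=main info            # sanity-check what you actually have

The part people skip is restore drills; schedule those too.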
I have ran (read: helped with infrastructure) a small production service using PSQL for 6 years, with up to hundreds of users per day. PSQL has been the problem exactly once, and it was because we ran out of disk space. Proper monitoring (duh) and a little VACUUM would have solved it.
Later I ran a v2 of that service on k8s. The architecture also changed a lot, with many smaller services sharing the same psql server (not really microservice-related; think more "collective of smaller services run by different people"). I have hit some issues around maxing out max connections, but that's about it.
This is something I do in my free time, so SLA isn't an issue, meaning I've had the ability to learn the ropes of running PSQL without many bad consequences. I'm really happy to have had this opportunity.
My conclusion is that running PSQL is totally fine if you just set up proper monitoring. If you are an engineer that works with infrastructure, even just because nobody else can/wants to, hosting PSQL is probably fine for you. Just RTFM.
Psql (lowercase) is the name of the textual sql client for PostgreSQL. For a general abbreviation we rather use "Pg".
Good catch, thx
But it’s 1500 pages long!
Good point. I sure didn't read it myself :D
I generally read the parts I think I need, based on what I read elsewhere like Stackoverflow and blog posts. Usually the real docs are better than some random person's SO comment. I feel that's sufficient?
What irks me about so many comments in this thread is that they often totally ignore questions of scale, the shape of your workloads, staffing concerns, time constraints, stage of your business, whether you require extensions, etc.
There is a whole raft of reasons why you might be a candidate for self-hosting, and a whole raft of reasons why not. This article is deeply reductive, and so are many of the comments.
Engineers almost never consider any of those questions. And instead deploy the maximally expensive solution their boss will say ok to.
Bad, short-sighted engineers will do that. An engineer who is not acting solely in the best interests of the wider organisation is a bad one. I would not want to work with a colleague who was so detached from reality that they wouldn't consider all GP's suggested facets. Engineering includes soft/business constraints as well as technical ones.
I find it is the opposite way around. I come up with <simple solution> based on open source tooling and I am forced instead to use <expensive enterprise shite> which is 100% lock in proprietary BS because <large corporate tech company> is partnered and is subsidising development. This has been a near constant throughout my career.
Since this is on the front page (again?) I guess I'll chime in: learn Kubernetes - it's worth it. It did take me 3 attempts to finally wrap my head around it, so I really suggest trying out many different things and seeing what works for you.
And I really recommend starting with *default* k3s: do not look at any alternatives for the CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.
Once you have that, you can use great open-source k8s-native controllers which take care of the vast majority of self-hosting requirements and save more time in the long run than it took to set up and learn these things.
Honorable mentions: k9s, Lens (I do not suggest using it long-term, but the UI is really good as a starting point), and the Rancher web UI.
PostgreSQL specifically: https://github.com/cloudnative-pg/cloudnative-pg If you really want networked storage: https://github.com/longhorn/longhorn
I do not recommend Ceph unless you are okay with not using shared filesystems (they have a bunch of gotchas), or you want S3 without having to install a dedicated deployment for it.
At $WORK we’ve been using the Zalando Postgres kubernetes operator to great success: https://github.com/zalando/postgres-operator
As someone who has operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out of box config is sane, it’s easy to override things, and failover/etc has been working flawlessly for years. It’s just the right line between total DIY and the simplicity of having a hosted solution. Postgres is solved, next problem.
For something like a database, what is the added advantage to using Kubernetes as opposed to something simple like Docker Compose?
Check out canine.sh, it's to Kubernetes what Coolify or Dokploy is to Docker, if you're familiar with self hosted open source PaaS.
And on a similar naming note yet totally unrelated, check out k9s, which is a TUI for Kubernetes cluster admin. All kinds of nifty features built-in, and highly customizable.
I just push to git where there is a git action to automatically synchronize deployments
Any good recommendations you got for learning kubernetes for busy people?
No path for busy people, unfortunately. Learn everything from ground up, from containers to Compose to k3s, maybe to kubeadm or hosted. Huge abstraction layers coming from Kubernetes serve their purpose well, but can screw you up when anything goes slightly wrong on the upper layer.
For start, ignore operators, ignore custom CSI/CNI, ignore IAM/RBAC. Once you feel good in the basics, you can expand.
k3sup a cluster, then ask an AI how to serve an nginx static site using Traefik on it, and have it explain every step and what it does (it should produce: a ConfigMap, a Deployment, a Service, and an Ingress).
k3s provides the CSI and CNI (Container Storage Interface and Container Network Interface): flannel for networking, and local-path, which just maps volumes (PVCs) to disk.
Traefik is what routes your traffic from the outside to the inside of your cluster (to an Ingress resource).
Are you working on websites with millions of hourly visits?
I'm probably just an idiot, but I ran unmanaged postgres on Fly.io, which is basically self hosting on a vm, and it wasn't fun.
I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new VM, and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clustering is higher availability, why did repmgr not handle zombie primaries?
Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single node setup as well.
I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.
Assuming you are using fly's managed postgres now?
Yep
Beyond the usual points, there are some other important factors to consider when self-hosting PG:
1. Access to any extension you want and importantly ability to create your own extensions.
2. Being able to run any version you want, including being able to adopt patches ahead of releases.
3. Ability to tune for maximum performance based on the kind of workload you have. If it's massively parallel you can fill the box with huge amounts of memory and screaming fast SSDs, if it's very compute heavy you can spec the box with really tall cores etc.
Self-hosting is rarely about cost; it's usually about control for me. Being able to replace complex application logic/types with a nice custom pgrx extension can save massive amounts of time. Similarly, using a custom index access method can unlock a step change in performance unachievable without some non-PG solution that would compromise on simplicity by forcing a second data store.
And if you want Supabase-like functionality, I'm a huge fan of PostgREST (which is actually how Supabase works/worked under the hood). Make a view for your application and boom, you have a GET-only REST API. Add a plpgsql function, and now you can POST. It uses JWT for auth, but I usually have the application on the same VLAN as the DB, so it's not as open to abuse.
You can self host Supabase too.
Last time I checked, it was a pain in the ass to self-host it
I hosted PostgreSQL professionally for over a decade.
Overall, a good experience. Very stable service and when performance issues did periodically arise, I like that we had full access to all details to understand the root cause and tune details.
Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.
I've been self hosting it for 20 years. Best technical decision I ever made. Rock solid
I've been self-hosting it for at least 10 years; it and MySQL, MySQL for longer. No issues self-hosting either. I have backups and I know they work.
What server company are you guys using for high reliability? Looking for a server in US-East right now.
I started in this industry before cloud was a thing. I did most of the things RDS does the hard way (except being able to dynamically increase memory on a running instance, that's magic to me). I do not want that responsibility, especially because I know how badly it turns out when it's one of a dozen (or dozens) of responsibilities asked of the team.
Some fun math for you guys.
I had a single API endpoint performing ~178 PostgreSQL queries.
This is with zero code changes; the time shavings come purely from network latency. (Back-of-the-envelope: at ~0.5 ms of round trip per query, 178 sequential queries spend about 90 ms on the network alone; co-located at ~0.05 ms, that drops to under 10 ms.) A lot of devs lately are not even aware of the latency costs coming from their service locations. It's crazy!
I've been self-hosting PostgreSQL for 12+ years at this point: directly on bare metal back then, and now in a container with CapRover.
I have a cron sh script to back up to S3 (it used to be FTP).
It's not "business grade" but it has also actually NEVER failed. Well once, but I think it was more the container or a swarm thing. I just destroyed and recreated it and it picked up the same volume fine.
The biggest pain point is upgrading, as PostgreSQL can't upgrade the data directory without the previous version's binaries installed. It's VERY annoying.
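For flavor, the kind of cron backup script mentioned above is only a few lines (bucket, DB name, and paths are placeholders; assumes a configured aws CLI):

    #!/bin/sh
    # Nightly logical backup; custom format (-Fc) allows selective pg_restore.
    set -eu
    STAMP=$(date +%Y-%m-%d)
    pg_dump -Fc -U postgres mydb > "/var/backups/mydb-$STAMP.dump"
    aws s3 cp "/var/backups/mydb-$STAMP.dump" "s3://my-backups/pg/mydb-$STAMP.dump"
    # Keep a week locally; let S3 lifecycle rules handle the rest.
    find /var/backups -name 'mydb-*.dump' -mtime +7 -delete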
Just use Autobase for PostgreSQL
https://github.com/vitabaks/autobase
It automates the deployment and management of highly available PostgreSQL clusters in production environments, tailored for dedicated physical servers, virtual machines, and both on-premises and cloud-based infrastructure.
I've had my hair on fire because my app code shit the bed. I've never ever (throughout 15 years of using it in everything I do) had to even think about Postgres, and yes, I always set it up self-hosted. The only concern I've had is when I had to do migrations where I had to upgrade PG to fit with upgrades in the ORM database layer. Made for some interesting stepping-stone upgrades once in a while but mostly just careful sysadmining.
I often find it sad how many things that we did, almost without thinking about them, that are considered hard today. Take a stroll through this thread and you will find out that everything from RAID to basic configuration management are ultrahard things that will lead you to having a bus factor of 1.
What went so wrong during the past 25 years?
What do you Postgres self-hosters use for performance analysis? Both GCP Cloud SQL and RDS have performance-analysis tooling built into the hosted DB, and it's incredible. Probably my favorite reason for using them.
I use pgdash and netdata for monitoring and alerting, and plain psql for analyzing specific queries.
I’ve been very happy with Pganalyze.
> Take AWS RDS. Under the hood, it's:
I didn't know RDS had PgBouncer under the hood, is this really accurate?
The problem I find with RDS (and most other managed Postgres) is that they limit your options for how you design your database architecture. For instance, if write consistency is important and you want synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only have access to logical replication, because you don't have access to your WAL archive, which makes moving off RDS much more difficult.
> I didn't know RDS had PgBouncer under the hood
I don't think it does. AWS has this feature under RDS Proxy, but it's an extra service and comes with extra cost (and a bit cumbersome to use in my opinion, it should have been designed as a checkbox, not an entire separate thing to maintain).
Although it technically has a "load balancer", in the form of a DNS entry that resolves to a random reader replica, if I recall correctly.
I've operated both self-hosted and managed database clusters with complex topologies and mission-critical data at well-known tech companies.
Managed database services mostly automate a subset of routine operational work, things like backups, some configuration management, and software upgrades. But they don't remove the need for real database operations. You still have to validate restores, build and rehearse a disaster recovery plan, design and review schemas, review and optimize queries, tune indexes, and fine-tune configuration, among other essentials.
In one incident, AWS support couldn't determine what was wrong with an RDS cluster and advised us to "try restarting it".
Bottom line: even with managed databases, you still need people on the team who are strong in DBOps. You need standard operating procedures and automation, built by your team. Without that expertise, you're taking on serious risk, including potentially catastrophic failure modes.
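As a concrete example of the kind of SOP/automation I mean, here is a periodic restore drill, sketched with placeholder paths and a hypothetical `users` table; the point is that an unrestorable backup should page someone:

    #!/bin/sh
    # Restore the newest dump into a scratch DB and run a sanity check.
    set -eu
    LATEST=$(ls -t /var/backups/mydb-*.dump | head -n1)
    dropdb --if-exists restore_drill
    createdb restore_drill
    pg_restore -d restore_drill "$LATEST"
    # Fail loudly (non-zero exit) if a known table is empty, so cron/monitoring alerts.
    test "$(psql -tAc 'SELECT count(*) FROM users;' restore_drill)" -gt 0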
I've had an RDS instance run out of disk space and then get stuck in "modifying" for 24 hours (until an AWS operator manually SSH'd in I guess). We had to restore from the latest snapshot and manually rebuild the missing data from logs/other artifacts in the meantime to restore service.
I would've very much preferred being able to SSH in myself and fix it on the spot. Ironically, the only reason it ran out of space in the first place is that the AWS markup on storage is so huge that we were operating with little margin for error; none of that would happen on a bare-metal host where I can rent 1 TB of NVMe for a mere 20 bucks a month.
As far as I know we never got any kind of compensation for this, so RDS ended up being a net negative for this company: tens of thousands spent over a few years for laptop-grade performance, and it couldn't even do its promised job the one time it was needed.
I would suggest that if you do host your database yourself, you take the data seriously. A few easy options are a multi-zonal disk [1] with scheduled automatic snapshots [2].
[1] https://docs.cloud.google.com/compute/docs/disks/hd-types/hy... [2] https://docs.cloud.google.com/compute/docs/disks/create-snap...
Scheduled automatic snapshots are not the kind of consistent snapshots you need for a filesystem based backup.
A snapshot might lose the last few in-flight transactions, but it will flush all in-memory writes before the freeze is taken. Considering it's a one-click solution, isn't that better than losing everything?
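For what it's worth, the textbook hedge when a snapshot is not guaranteed atomic (e.g. it spans volumes) is Postgres's backup mode; a sketch using the PostgreSQL 15+ function names and a hypothetical GCE disk:

    # An atomic snapshot of the whole data directory is crash-consistent on its
    # own: Postgres replays WAL on startup. Otherwise, bracket the snapshot:
    psql -c "SELECT pg_backup_start('disk-snapshot', true);"
    gcloud compute disks snapshot pg-data-disk --zone=us-east1-b
    psql -c "SELECT * FROM pg_backup_stop();"  # store the returned label with the snapshot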
> Self-hosting a database sounds terrifying.
Is this actually the "common" view (in this context)?
I've got decades with databases, so I cannot even begin to fathom where such an attitude would develop. But is it?
Boggling.
Over a decade of cloud provider propaganda achieves that. We appear to have lost the basic skill of operating a *nix machine, so anything even remotely close to that now sounds terrifying.
You mean you need to SSH into the box? Horrifying!
Can't agree more.
> I sleep just fine at night thank you.
I also self-host my webapp, for 4+ years now, and have never had any trouble with databases.
pg_basebackup and WAL archiving work wonders. And since I always pull the database (the backup version) for local development, the backup is constantly verified, too.
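That workflow is compact enough to sketch in full (host, user, and paths are placeholders):

    # On the backup host: compressed physical backup, streaming the WAL it needs
    pg_basebackup -h db.internal -U replicator -D /srv/backups/base -Ft -z -Xs -P

    # For local dev, restoring doubles as backup verification:
    mkdir -p ~/pgdev
    tar -xf /srv/backups/base/base.tar.gz -C ~/pgdev
    tar -xf /srv/backups/base/pg_wal.tar.gz -C ~/pgdev/pg_wal
    pg_ctl -D ~/pgdev -o '-p 5433' start   # side port to avoid clashing with a local PG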
I don't feel like it's easy to self-host postgres.
Here are my gripes:
1. Backups are super important. Losing production data simply is not an option. Postgres offers pg_dump, which is not the appropriate tool for this, so you should set up WAL archiving or something like that. This is complicated to do right (see the sketch after this comment).
2. Horizontal scalability with read replicas is hard to implement.
3. Tuning various postgres parameters is not a trivial task.
4. Upgrading major version is complicated.
5. You probably need to use something like pgbouncer.
6. Database usually is the most important piece of infrastructure. So it's especially painful when it fails.
I guess it's not that hard once you've done it before and have all the scripts and muscle memory to fall back on. But otherwise it's hard, and clicking a few buttons in a hoster's panel is much easier.
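For reference, the configuration core of point 1 is small; the hard part is testing it end to end. A minimal sketch with a placeholder archive mount:

    # postgresql.conf
    wal_level = replica
    archive_mode = on
    # The stock docs example: never overwrite, fail loudly so Postgres retries.
    archive_command = 'test ! -f /mnt/wal-archive/%f && cp %p /mnt/wal-archive/%f'

    # Plus a periodic base backup to restore from, e.g.:
    pg_basebackup -D /mnt/base-backup -Ft -z -Xs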
WAL archiving is piss easy. You can also just use pg_basebackup, and with Postgres 17 it's easier than ever thanks to the incremental backup feature.
You don't need horizontal scalability when a single server can have 384 real CPU cores, 6 TB of RAM, petabytes of PCIe 5 SSD, and a 100 Gbps NIC.
For tuning Postgres parameters, you can start with pgtune.leopard.in.ua or pgconfig.org.
Upgrading a major version is piss easy since Postgres 10 or so; it's just a single command (see the pg_upgrade sketch below).
You do not need pgbouncer if your database adapter library already provides pool functionality (most of them do).
For me, managed databases need the same amount of effort, due to shitty documentation and garbage user interfaces (AWS, GCP, and Azure are all the same), not to mention they change all the time.
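That "single command" is presumably pg_upgrade; a sketch of a typical jump, with paths following Debian's layout as placeholders (Debian/Ubuntu users can also just run pg_upgradecluster):

    pg_upgrade \
      --old-bindir=/usr/lib/postgresql/16/bin \
      --new-bindir=/usr/lib/postgresql/17/bin \
      --old-datadir=/var/lib/postgresql/16/main \
      --new-datadir=/var/lib/postgresql/17/main \
      --link --check
    # --check does a dry run; drop it to actually upgrade. --link hardlinks data
    # files instead of copying, which is near-instant but needs the old binaries
    # around (the gripe upthread).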
"all scripts and memory to look back. But otherwise it's hard. Clicking few buttons in hoster panel is much easier."
So we need an open-source way to do that; Coolify/Dokploy come to mind, and they do exactly that.
I would say 80% of your points won't bite until a certain scale; most applications grow and eventually outgrow their tech stack, so you'd replace those pieces at some point anyway.
Scaling to a different instance size is also easy on AWS.
That said, a self-hosted DB on a dedicated Hetzner box flies. It does things at a price that may save you the time of reworking your app to be more cost-efficient on AWS.
So swings and roundabouts.
I have spent quite some time the past months and years to deploy Postgres databases to non-hyperscaler environments.
A popular choice for smaller workloads has always been the Hetzner cloud which I finally poured into a ready-to-use Terraform module https://pellepelster.github.io/solidblocks/hetzner/rds/index....
The main focus here is a tested solution with automated backup and recovery, leaving out the complicated parts like clustering and prioritizing MTTR over MTBF.
The naming of RDS is a little bit presumptuous I know, but it works quite well :-)
I'm not a cloud-hosting fan, but comparing RDS to a single-instance DB seems crazy to me. Even for a hobby project, I couldn't accept losing data since the last snapshot. If you are going to self-host PostgreSQL in production, make sure you have at least some knowledge of how to set up streaming replication, and have monitoring in place to verify the replication is working. Ideally, use something like Patroni for automatic failover. I'm saying this as someone running fairly large self-hosted HA PostgreSQL databases in production.
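The minimum viable version of that is smaller than people fear; a sketch with placeholder addresses and credentials:

    # On the primary: a replication role, plus a pg_hba.conf entry like
    #   host replication replicator 10.0.0.2/32 scram-sha-256
    psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';"

    # On the standby: clone the primary; -R writes primary_conninfo + standby.signal
    pg_basebackup -h 10.0.0.1 -U replicator -D /var/lib/postgresql/17/main -R -Xs -P

    # Then start the standby and monitor both ends:
    psql -h 10.0.0.1 -c "SELECT client_addr, state FROM pg_stat_replication;"
    psql -h 10.0.0.2 -c "SELECT status FROM pg_stat_wal_receiver;"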
RDS is not, by default, multi-instance, multi-region, or fault tolerant at all: you choose all of that in your instance config. The number of single-instance, single-region, zero-backup RDS setups I've seen in the wild is honestly concerning. Do devs think an RDS instance on its own, without explicit configuration, is fault tolerant and backed up? If you have an EC2 instance with EBS and auto-restart, you have almost identical fault tolerance (yes, there are some slight nuances on RDS regarding recovery after a failure).
I just find that assumption a bit dangerous. Setting this up on RDS is easy, but it's not on by default.
> If your database goes down at 3 AM, you need to fix it.
Of all the places I've worked that had the attitude "If this goes down at 3 AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I've worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.
I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it.
From this perspective, the ability to fix problems more quickly when self-hosting could even be considered an antifeature by the employee getting woken up at 3 AM.
The worst SEV calls are the one where you twiddle your thumbs waiting for a support rep to drop a crumb of information about the provider outage.
You wake up. It's not your fault. You're helpless to solve it.
Not when that provider is AWS and the outage is hitting news websites. You share the link to AWS being down and go back to sleep.
This is also the basis for most SaaS purchases by large corporations. The old "Nobody gets fired for choosing IBM."
Really? That might be an anecdote sampled from unusually small businesses, then. Between myself and most peers I’ve ever talked to about availability, I heard an overwhelming majority of folks describe systems that really did need to be up 24/7 with high availability, and thus needed fast 24/7 incident response.
That includes big and small businesses, SaaS and non-SaaS, high scale (5M+rps) to tiny scale (100s-10krps), and all sorts of different markets and user bases. Even at the companies that were not staffed or providing a user service over night, overnight outages were immediately noticed because on average, more than one external integration/backfill/migration job was running at any time. Sure, “overnight on call” at small places like that was more “reports are hardcoded to email Bob if they hit an exception, and integration customers either know Bob’s phone number or how to ask their operations contact to call Bob”, but those are still environments where off-hours uptime and fast resolution of incidents was expected.
Between me, my colleagues, and friends/peers whose stories I know, that’s an N of high dozens to low hundreds.
What am I missing?
> What am I missing?
IME the need for 24x7 support in B2B apps is largely driven by global customer scope. If you have customers in both North America and Asia, now you need 24x7 (and x365, because of little holiday overlap).
That being said, there are a number of B2B apps/industries where global scope is not a thing. For example, many providers who operate in the $4.9 trillion US healthcare market do not have any international users. Similarly the $1.5 trillion (revenue) US real estate market. There are states where one could operate where healthcare spending is over $100B annually. Banks. Securities markets. Lots of things do not have 24x7 business requirements.
Great read. I moved my video-sharing app from GCP to self-hosted on a beefy home server, plus Cloudflare for object storage and video streaming. I had been using Cloud SQL as my managed DB and am now running Postgres on my own dedicated hardware. I was forced to move away from the cloud primarily because of the high cost of video processing (not because Cloud SQL was bad), but I've since discovered that self-hosting the DB isn't as difficult as it's made out to be. And there was a daily charge just to keep the DB hot, which I don't have now. I'll be moving to a rackmount server at a colo data center in about a month, so this was great to read and confirms my experience.
I would have liked to read about the "high availability" that's mentioned a couple of times in the article; the WAL Configuration section is not enough, and replication is expensive-ish.
There are a couple of things that are being glossed over:
Hardware failures and automated failovers. That's a thing AWS and other managed hosting solutions do for you. Hardware will eventually fail, of course; in AWS this would be a non-event. It fails over, a replacement spins up, and so on. Same with upgrades and other routine work.
Configuration complexity. The author casually outlines a lot of fairly complex design involving all sorts of configuration tweaks, load balancing, etc. That implies skills most teams don't have. I know enough to know that I have quite a bit of reading up to do if I ever were to decide to self host postgresql. Many people would make bad assumptions about things being fine out of the box because they are not experienced postgresql DBAs.
Vacations/holidays/sick days. Databases may go down when it's not convenient for you. To mitigate that, you need several colleagues who are equally qualified to fix things when they break while you are away from the keyboard. If you haven't covered that, you are carrying the risk. In a normal company, at least 3-4 people would be a good minimum. If you are only measuring your own time, you are either not being honest or not being as diligent as you should be. Either it's a risk you are covering at a cost, or a risk you are ignoring.
With managed hosting, covering all of that is what you pay for. You are right that there are still failure modes beyond that that need covering. But an honest assessment of the time you, and your team, put in for this adds up really quickly.
Whatever the reasons you are self hosting, cost is probably a poor one.
The author's experience is trivial, so it indicates nothing. Anybody can set up a rack of postgresql servers and say it's great in year 2. All the hardware is under warranty and it still works anyway. There haven't been any major releases. The platform software is still "LTS". Nobody has needed to renegotiate the datacenter lease yet. So experience in year 2 tells you nothing.
From my point of view the real challenge comes when you want high availability and need to setup a Postgres cluster.
With MongoDB you simply create a replicaset and you are done.
When planning a Postgres cluster, you need to understand the replication options and potentially deal with Patroni. Zalando's Spilo Docker image is not really maintained; the way to go seems to be CloudNativePG, but that requires k8s.
I still don’t understand why there is no easy built-in Postgres cluster solution.
I have been self hosting a product on Postgres that serves GIS applications for 20 years and that has been upgraded through all of the various versions during that time. It has a near perfect uptime record modulo two hardware failures and short maintenance periods for final upgrade cutovers. The application has real traffic - the database is bigger than those at my day job.
Looking at this list:
Every company I've ever onboarded at that hosted their own database had number one and a lot of TODOs around the rest. It's really hard! Honestly, it could be a full-time job for a team, and that's more expensive than RDS.
Self-hosting Postgres is so incredibly easy. People are under this strange spell that they need an ORM, or they always reach for SQLite, when it's trivially easy to write raw SQL. The syntax was designed so lithium'd-out secretaries could write queries on a punch card. Postgres has so many nice lil features.
> When self-hosting makes sense: 1. If you're just starting out in software & want to get something working quickly [...]
This is when you use SQLite, not Postgres. It's easy enough to turn into Postgres later, and there's nothing to set up; it already works. And backups are literally just "it's a file; your existing daily incremental backups already cover it".
I was on a severely restricted budget and self-hosted everything for 15+ years, with the heavily used part of the database on a RAM card. The RAM drive was soft-RAIDed to a 3Ware RAID-1 HDD pair, just in case, and I also did a daily backup of the database. In all that time I never had any data loss and never had to restore anything from backup, despite my options being severely restricted by a capped income.
The real downside wasn't technical: it was the constant background anxiety you had to learn to live with, since the hosted news sites were hammered by users. The dreaded SMS alerts saying the server was inaccessible (often due to ISP issues), or going abroad and having to persuade one of your mates to keep an eye on things just in case, created a lot of unnecessary stress.
AWS is quite good. It has everything you need and removes most of that operational burden, so the angst is much lower, but the pricing is problematic.
I've been managing a 100+ GB PostgreSQL database for years. Every two years I upgrade the VPS for the size, along with the DB and OS versions. The app is on the same VPS as the DB. A two-hour window every two years is OK for the use case. No regrets.
Over time I've realized that the best abstraction for managing a computer is a computer.
I wish this post went into the actual how! He glossed over the details. There is a link to his repo, which is a start I suppose: https://github.com/piercefreeman/autopg
A blog post that went into the details would be awesome. I know Postgres has some docs for this (https://www.postgresql.org/docs/current/backup.html), but it's too theoretical. I want to see a one-stop-shop with everything you'd reasonably need to know to self host: like monitoring uptime, backups, stuff like that.
I'd argue: forget about Postgres completely. If you can shell out $90/month, the only database you should use is GCP Spanner (yes, this also means forgetting about any mega-cloud other than GCP, unless you're fine paying ingress and egress).
And for small projects, SQLite, rqlite, or etcd.
My logic is: either the project is important enough that data durability matters to you and it sees enough scale that a loss of durability would be a major pain to fix, or the project is not very big and you can tolerate some lost committed transactions.
A consensus-replication-less non-embedded database has no place in 2025.
This is assuming you have relational needs. For non-relational just use the native NoSQL in your cloud, e.g. DynamoDB in AWS.
You seem insanely miscalibrated. $90 gets you a dedicated server that covers most projects' needs. Data durability isn't some magic that only cloud providers can give you.
If you can lose committed transactions on a single-node data failure, you don't have durability. Then it comes down to whether you really care about durability.
I think a big piece missing from these conversations is compliance frameworks and customer trust. If you're selling to enterprise customers or governments, they want to go through your stack, networking, security, audit logs, and access controls with a fine-toothed comb.
Everything you do that isn't "normal" is another conversation you need to have with an auditor, plus each customer. Those eat up a bunch of time, and deals take longer to close.
Right or wrong, these decisions make you look less "serious" and therefore less credible in the eyes of many enterprise customers. You can get around that perception, but it takes work. Not hosting on one of the big three needs to be decided with that cost in mind.
I think we can get to the point where we have self-hosted agents that can manage DB maintenance and recovery. There could be a regular otel -> * -> Grafana -> ~PagerDuty -> you pipeline, plus a TriageBot that calls in specialists to gather state and orchestrate a response.
Scripts could kick off health reports and trigger operations. Upgrades and recovery runbooks would be clearly defined and integration tested.
It would empower personal sovereignty.
Someone should make this in the open. Maybe it already exists, there are a lot of interesting agentops projects.
If that worked 60% of the time and I had to figure out the rest, I’d self host that. I’d pay for 80%+.
This is basically Supabase. Their entire stack (and product) can be self-hosted as a series of roughly 10+ Docker containers:
https://supabase.com/docs/guides/self-hosting/docker
However, as always, "complexity has to live somewhere". I doubt even Opus 4.5 could handle this: as soon as you get into the database records themselves, context is going to blow up and you're going to have a bad time.
I generally agree with the author, however, there are a handful of relatively prominent, recent examples (eg [1]) that many admins might find scary enough to prefer a hosted solution.
[1]: https://matrix.org/blog/2025/07/postgres-corruption-postmort...
I wish this article had gone more in-depth on how they're setting up backups. The great thing about SQLite is that Litestream makes backup and restore something you don't really have to think about.
ZFS snapshot, send, receive, clone; spin up another PostgreSQL server on the backup server; take a full backup from that clone once per week.
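Sketched with hypothetical pool/dataset names; since ZFS snapshots are atomic, Postgres treats a restored one like a crash image and replays WAL:

    SNAP=$(date +%Y%m%d)
    zfs snapshot tank/pgdata@"$SNAP"
    # Incremental send relative to the previous snapshot ("prev" is a placeholder)
    zfs send -i tank/pgdata@prev tank/pgdata@"$SNAP" | ssh backup-host zfs receive backup/pgdata

    # On the backup host: clone it, start a throwaway Postgres on the clone,
    # and take the weekly full (logical) backup from there.
    zfs clone backup/pgdata@"$SNAP" backup/pgdata-verify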
For Postgres specifically, pgBackRest works well. I'm using it at home, backing up to R2 and local S3.
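A minimal sketch of that setup against S3-compatible storage (bucket, keys, region, and the PG 16 path are placeholders):

    # /etc/pgbackrest/pgbackrest.conf
    [global]
    repo1-type=s3
    repo1-s3-bucket=my-pg-backups
    repo1-s3-endpoint=s3.us-east-1.amazonaws.com
    repo1-s3-region=us-east-1
    repo1-s3-key=AKIA...
    repo1-s3-key-secret=...
    repo1-path=/pgbackrest
    repo1-retention-full=2

    [main]
    pg1-path=/var/lib/postgresql/16/main

    # postgresql.conf: archive_command = 'pgbackrest --stanza=main archive-push %p'
    pgbackrest --stanza=main stanza-create
    pgbackrest --stanza=main --type=full backup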
(This is very reductionist)
A lot of this comes down to devs not understanding infrastructure, its components, and their insane interplay and complexity. And they don't care! Apps, apps, apps; developers, developers, developers!
On the managerial side, it's often about deflection of responsibility for the Big Boss.
Because it's not part of the app itself it can be HARD, and if you're not familiar with things, it's also scary! What if you mess up?
(Most apps don't need the elasticity, or the bells and whistles, but you're paying for them even if you don't use them, indirectly.)
I didn't even know there were companies that would host Postgres for you. I self-host it for my personal projects with 0 users and it works just fine, so I don't know why anyone would do it any differently.
I can't tell if this is satire or not with the first sentence and the "0 users" parts of your comment, but I know several solo devs with millions of users who self host their database and apps as well.
What hosting providers do they use/recommend?
Self-hosting is one of those things that makes sense when you can control all of the variables. For example, can you stop the developers from using obscure features of the db, that suddenly become deprecated, causing you to need to do a manual rolling back while they fix the code? A one-button UI to do that might be very handy. Can you stop your IT department from breaking the VPN, preventing you from logging into the db box at exactly the wrong time? Having it all in a UI that routes around IT's fat fingers might be helpful.
For a fascinating counterpoint (gist: cloud hosted Postgres on RDS aurora is not anything like the system you would host yourself, and other cloud deployments of databases should also not be done like our field is used to doing it when self-hosting) see this other front page article and discussion: https://news.ycombinator.com/item?id=46334990
Aurora is a closed-source fork of PostgreSQL. So it is indeed not possible to self-host it.
However, a self-hosted PostgreSQL on a bare-metal server with NVMe SSDs will be much faster than what RDS is capable of, especially at the same price point.
Yep! I was mostly replying to TFA’s claim that AWS RDS is
> Standard Postgres compiled with some AWS-specific monitoring hooks
… and other operational tools deployed alongside it. That’s not always true: RDS classic may be those things, but RDS Aurora/Serverless is anything but.
As to whether
> a self-hosted PostgreSQL on a bare-metal server with NVMe SSDs will be much faster than what RDS is capable of
That's often but not always true. Plenty of workloads will perform better on RDS (read auto-scaling is huge in Serverless: you can have new read-replica nodes auto-launch in response to, say, a wave of concurrent, massive reporting queries; and many queries can benefit from RDS's additions to and modifications of the pg buffer cache system that work with the underlying storage), and that's even with the VM tax and the networked-storage tax. Of course, it'll cost more in real money whether or not it performs better, further complicating the cost/benefit analysis here.
Also, pedantically, you can run RDS on bare metal with local NVMEs.
Does anyone offer a managed database service where the database and your application server live on the same box? Until I can get the latency advantages of such a setup, we've found latency just too high to go with a managed solution. We are already spending too much time batching or vectorizing database reads.
I recently was also doing some research into what projects exist that come close to a “managed Postgres on Digital Ocean” experience, sadly there’s some building blocks but nothing that really makes it a complete no-brainer.
https://blog.notmyhostna.me/posts/what-i-wish-existed-for-se...
> These settings tell Postgres that random reads are almost as fast as sequential reads on NVMe drives, which dramatically improves query planning.
Interesting. Whoever wrote
https://news.ycombinator.com/item?id=46334990
didn't seem to be aware of that.
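The settings in question are presumably along these lines; the values below are common NVMe-era suggestions, not necessarily the article's exact numbers:

    # random_page_cost defaults to 4.0, a spinning-disk assumption. On NVMe,
    # bringing it close to seq_page_cost (1.0) changes which plans get chosen.
    psql -c "ALTER SYSTEM SET random_page_cost = 1.1;"
    psql -c "ALTER SYSTEM SET effective_io_concurrency = 200;"
    psql -c "SELECT pg_reload_conf();"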
> Self-hosting a database sounds terrifying.
Is this really the state of our industry? Lol. Bunch of babies scared of the terminal.
Huh. I thought hosting one's own databases was still the norm. Guess I'm just stuck in the past, or don't consume cloud vendor marketing, or something.
Glad my employer is still one of the sane ones.
Enjoyed the article, and the "less can be more than you think" mindset in general.
To the author - on Android Chrome I seem to inevitably load the page scrolled to the bottom, footnotes area. Scrolling up, back button, click link again has the same results - I start out seeing footnotes. Might be worth a look.
Just don't try to build it from source, haha. Compiling Postgres 18 with the PostGIS extension has been such a PITA, because the topology component insists on configuring against the system /usr/bin/postgres and has given me a lot of grief. I think I finally got it fixed, though.
I actually always build PostgreSQL from source, as I want a 32 kB block size as the default. It makes ZFS compression more awesome.
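For reference, the block size is a compile-time switch (32 kB is the maximum; the prefix is a placeholder):

    ./configure --with-blocksize=32 --prefix=/opt/pg
    make -j"$(nproc)" && sudo make install
    # Note: a cluster initdb'd with a non-default block size is not binary
    # compatible with stock builds; logical dump/restore still works fine.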
Without disagreeing:
Sometimes it is nice to simplify the conversation with non-tech management. Oh, you want HA / DR / etc? We click a button and you get it (multi-AZ). Clicking the button doubles your DB costs from x to y. Please choose.
Then you have one less repeating conversation and someone to blame.
I've had to set up Postgres manually (before Docker, to be fair) and it's best described as suffering.
Things will go wrong. And it's all your fault. You can't just blame AWS.
Also, are we changing the definition of self-hosting? Self-hosting on DigitalOcean?!
Disk read write performance is also orders of magnitude better/cheaper/faster.
People really love jumping through hoops to avoid spending five dollars.
One of the things that made me think twice about self-hosting Postgres is securing the OS I host PG on. Any recommendations on where to start with that?
Can you get away without exposing it to the internet? Firewall it off altogether, or just open the address of a specific machine that needs access to it?
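In practice that looks something like this, sketched with placeholder addresses (ufw shown; any host firewall works, and a private network or VLAN is better still):

    # postgresql.conf: listen only where needed, e.g.
    #   listen_addresses = 'localhost,10.0.0.5'
    ufw allow OpenSSH
    ufw default deny incoming
    ufw allow from 10.0.0.10 to any port 5432 proto tcp   # the app server only
    ufw enable
    # And pg_hba.conf: host all appuser 10.0.0.10/32 scram-sha-256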
What's the SOTA for on-prem Postgres in terms of point-in-time recovery? Are there any well-tested tools for it?
I moved from AWS RDS to ScaleWay RDS, had the same effect on cost
Without stating actual numbers if not comfortable, what was the % savings one over the other? Happy with performance? Looking at potential of doing the same move.
Huh? Maybe I missed something, but... why should self-hosting a database server be hard or scary? Sure, you are then responsible for security, backups, etc., but that's not really different in the cloud; if anything, the cloud makes it more complicated.
Well for the clickops folks who've built careers on the idea that 'systems administration is dead'... I imagine having to open a shell and install some stuff or modify a configuration file is quite scary.
Self-hosting a database server is not particularly hard or scary for an engineer.
Hiring and replacing engineers who can and want to manage database servers can be hard or scary for employers.
> Hiring and replacing engineers who can and want to manage database servers can be hard or scary for employers.
I heard there's this magical thing called "money" that is claimed to help with this problem. You offer even half of the AWS markup to your employees and suddenly they like managing database servers. Magic I tell you!
I'd say a managed DB, at minimum, should be handling upgrades and backups for you. If it doesn't, that's not a managed DB, that's a self-service DB. You're paying a premium to do the work yourself.
Pros self-host their DBs.
Better yet, self host Postgres on your own open source PaaS with Coolify, Dokploy, or Canine, and then you can also self host all your apps on your VPS too. I use Dokploy but I'm looking into Canine, and I know many have used Coolify with great success.
Everyone and their mother wants to host Postgres for you!
Cooking up the RDS equivalent is a reasonable amount of work and a pretty big amount of knowledge (it's easy to build a failover solution with lower uptime than "just a single VM" if you don't get everything right).
... but you can do a lot with just a single VM and robust backups. PostgreSQL restore is pretty fast, and if you've automated deployment you can stand a replacement up in minutes. So if your service can survive 30 minutes of downtime once every three years while the DB reloads, "downgrading" to a single cloud VM, or a single VM on your own hardware, might not be a big deal.
Now for the next step... just use SQLite (it's possible it will be enough for your case).
Disclaimer: there's no silver bullet, yadda yadda. But SQLite in WAL mode and backups using Litestream have worked perfectly for me.
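For the record, the whole Litestream side is about two commands (bucket/path are placeholders):

    # Continuously replicate the SQLite file to S3-compatible storage
    litestream replicate /var/lib/app/app.db s3://my-bucket/app.db
    # Disaster recovery: pull the latest replica back down
    litestream restore -o /var/lib/app/app.db s3://my-bucket/app.db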
And then there is the urge to Postgres everything.
I was disappointed Alloy doesn't support TimescaleDB as a metrics endpoint. I'm considering switching to Telegraf just so I can store the metrics in Postgres.
I've always just Postgressed everything. I used MySQL a bit in the PHP3 days, but eventually moved onto Postgres.
SQLite when prototyping, Postgres for production.
If you need to power a lawnmower and all you have is a 500bhp Scania V8, you may as well just do it.
It's pretty easy these days to spin up a local Postgres container. Might as well use it for prototyping too, and save yourself the hassle of switching later.
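For example (password and version tag are arbitrary):

    docker run -d --name pg -e POSTGRES_PASSWORD=postgres -p 5432:5432 postgres:17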
Have you given thought to why you prototype with SQLite?
I've switched to using Postgres even for prototyping, now that I've prepared shell scripts for the various setup tasks. With Hibernate (Java) or Knex (JavaScript/Node.js), plus unit tests (a TDD approach) for the code, I feel I've reduced the friction of using Postgres from the beginning.
I have now switched to pglite for prototyping, because it lets me use all the postgres features.
Does self-hosting on an EC2 instance count?
Another thread where I can't determine whether the "it's easy" suggestions are from people who are clueless or expert.
Ironically you need a bit of both. You need to be expert enough to make it work, but not "too" expert to be stuck in your ways and/or influenced by all the fear-mongering.
An expert will give you thousands of theoretical reasons why self-hosting the DB is a bad idea.
An "expert" will host it, enjoy the cost savings and deal with the once-a-year occurrence of the theoretical risk (if it ever occurs).
Honestly, at this point I'm actually surprised there aren't specialized Linux distributions for hosting Postgres. There are so many kernel-level and filesystem-level optimizations that significantly impact performance, and paring down all the unneeded stuff in most distributions would make for a pretty compact, highly optimized image.
Recommends hosting Postgres yourself; doesn't recommend a distribution stack. If you try this at a startup to save $50 a month, you will never recoup the time you waste setting it up. We pay dedicated managed services for these things so we can build products on top of them.
There's not much to recommend; just use the Postgres from your distribution's LTS repo. I like Debian for its rock solid stability.
"just use postgres from your distro" is *wildly* underselling the amount of work that it takes to go from apt install postgres to having a production ready setup (backups, replica, pooling, etc). Granted, if it's a tiny database just pg-dumping might be enough, but for many that isn't going to be enough.
The one problem with using your distro's Postgres is that your upgrade routine will be dictated by a 3rd party.
And Postgres upgrades are not transparent. So you'll have a one-to-two-hour task every 6 to 18 months, over whose timing you have only a small amount of control. That's OK for a lot of people, and completely unthinkable for some others.
Patroni, Pigsty, Crunchy, CloudNativePG, Zalando, ...
Maybe come back when your database spend is two or three orders of magnitude higher? It gets expensive pretty fast in my experience.