Comment by stolsvik

9 days ago

My question to that is: Why do you? Why on earth do people want to tie themselves so hard to a specific mast?

With AWS I can build the stack so that I don't need to wake up in the early hours on a Saturday to fix it. Someone at Amazon is already sweating and pulling crap off racks or reconfiguring a switch or restoring a database. At most I can go "yea, not our problem, it'll come back up when the intern at Amazon stops messing with the DNS again" on Slack, put the phone down and go back to sleep =)

Running on bare metal VPS is only viable if you're:

a) a startup on a true shoestring budget that can get massive use out of a single mid-tier server-auction box with everything on it (web, backend and database), AND you have someone in-house with the skills to do that.

b) so big you can afford to hire a full rotation of SREs to manage your crap AND your devs / SREs are able to maintain RabbitMQ, Postgres and whatever object storage you're using themselves - and someone Does The Math and calculates it's cheaper and the risks are manageable.

In the middle there's AWS. You can run millions of revenue through AWS with maybe 2-3 people managing the backend infra. 99% of the time when something breaks suddenly, it's up to Amazon to fix it.

  • But more specifically why build a stack tied to a single vendor?

    You talk about competent SREs being hard to find and manage but then you describe needing several AWS backend specialists.

    I think I'd rather have a generalized SRE team with portable infrastructure.

    Maybe that's just me. I watched an org get burned by Google App Engine. I find these proprietary stacks to be a giant trap.

    • It's a nice pipe dream to have a "cloud independent" stack. Yea, you can kinda do it with stuff like OpenTofu abstracting the services, but in practice nobody does that because it's a massive mess of slight differences here and there. And it's a complete impossibility if you go anywhere beyond very basic compute and DBs. Like how do you do cloud independent IAM?

      What you do is you accept the risk and mitigate it. Watch the costs and figure out whether buying stuff like AI capacity (Bedrock, Vertex), queues, databases or block storage as a service is more cost-efficient (including maintenance costs) than self-hosting them.

      I _know_ how to run all that shit locally, but I don't _want_ to.

      Upgrading an Aurora Postgres server is like two clicks on the Web UI, not even that if you set the maintenance window. Adding new servers to the cluster is a single number change in the Terraform file. I can even upscale or downscale the compute behind them depending on what's going on. A big release and we're expecting unusual traffic? Bump them up by changing one string in the .tf file or add more replicas temporarily.
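
      For illustration, a minimal sketch of what that kind of .tf file might look like. It assumes the Terraform AWS provider and an Aurora Postgres cluster already defined elsewhere as `aws_rds_cluster.main`; the variable names, identifiers and instance class are hypothetical, not taken from any real setup.

      ```hcl
      # Hypothetical variables: edit a value, then plan and apply to scale.
      variable "replica_count" {
        type    = number
        default = 2 # the "single number change" that adds or removes instances
      }

      variable "instance_class" {
        type    = string
        default = "db.r6g.large" # the "one string" to bump before a big release
      }

      # One cluster instance per count; all of them share the same compute size.
      resource "aws_rds_cluster_instance" "db" {
        count              = var.replica_count
        identifier         = "app-db-${count.index}"
        cluster_identifier = aws_rds_cluster.main.id # assumed to be defined elsewhere
        engine             = aws_rds_cluster.main.engine
        instance_class     = var.instance_class
      }
      ```

      Raising `replica_count` from 2 to 4 or switching `instance_class` to a larger size would then be a one-line diff, and reverting it after the release scales things back down.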

      With on-prem hardware I'd need to buy and provision the hardware, pick an OS, get it up and running, install the DB, fuck around with the DB configs and whatever networking the provider is using to get it connected with the other servers while still keeping it out of the larger internet. And there will be no downscaling or upscaling because it's actual hardware.

      Also, any half-decent full stack / backend engineer can learn AWS basics in a week or on a two-day course provided by AWS with lunch and snacks included. Messing with actual physical hardware is a completely different skill set that's getting rare and expensive these days.
