Comment by nyc_data_geek

2 years ago

Some orgs are looking at moving back to on-prem because they're figuring this out. For a while it was in vogue to shift from capex to opex, and C-suite people were incentivized to do that via comp structures, hence "digital transformation", i.e., migration to public cloud infrastructure. Now those same orgs are realizing that renting computers actually costs more than owning them, once you're utilizing them to a significant degree.

Just like any other asset.

Funny story time.

I was once part of an acquisition from a much larger corporate entity. The new parent company was in the middle of a huge cloud migration, and as part of our integration into their org, we were required to migrate our services to the cloud.

Our calculations said it would cost 3x as much to run our infra on the cloud.

We pushed back, and were greenlit on creating a hybrid architecture that allowed us to launch machines both on-prem and in the cloud (via a direct link to the cloud datacenter). This gave us the benefit of autoscaling our volatile services, while maintaining our predictable services on the cheap.

After I left, apparently my former team was strong-armed into migrating everything to the cloud.

A few years go by, and guess who reaches out on LinkedIn?

The parent org was curious how we built the hybrid infra, and wanted us to come back to do it again.

I didn't go back.

  • My funny story is built on the idea that AWS is Hotel California for your data.

    A customer had an interest in merging the data from an older account into a new one, just to simplify matters. Enterprise data. Going back years. Not even leaving the region.

    The AWS rep in the meeting kinda pauses, says: "We'll get back to you on the cost to do that."

    The sticker shock was enough that the customer simply inherited the old account, rather than making things tidy.

  • Yes, I do believe autoscaling is actually a good use case for public cloud. If you have bursty load that requires a lot of resources at peak but would sit idle most of the time, it probably doesn't make sense to own what you need for those peaks.

  • There are two possible scenarios here: either they can't find the talent to support what you implemented, or, more likely, your docs suck!

    I've made a career out of inheriting other people's wacky setups and supporting them (as well as fixing them), and almost always it's documentation that has prevented the client from getting anywhere.

    I personally don't care if the docs are crap, because usually the first thing I do is update / actually write the docs to make them usable.

    For a lot of techs, though, crap documentation is a deal breaker.

    Crap docs aren't always the fault of the guys implementing, though; sometimes there are time constraints that prevent proper docs from being written. Quite frequently, though, it's outsourced development agencies that refuse to write them because it's "out of scope" and a "billable extra". Which I think is an egregious stance... docs should be part and parcel of the project. Mandatory.

    • I agree that bad documentation is a serious problem in many cases. So much so that your suggestion to write the documentation after the fact can become quite impossible.

      If there is only one thing that juniors should learn about writing documentation (be it comments or design documents), it is this: document why something is there. If resources are limited, you can safely skip comments that describe how something works, because that information is also available in code.

      (It might help to describe what is available, especially if code is spread out over multiple repositories, libraries, teams, etc.)

      (Also, I suppose the comment I'm responding to could've been slightly more forgiving to GP, but that's another story.)

    • > Quite frequently though its outsourced development agencies that refuse to write it

      It's also completely against their interest to write docs as it makes their replacement easier.

      That's why you need someone competent on the buying side to insist on the docs.

      A lot of companies outsource because they don't have this competency themselves. So it's inevitable that this sort of thing happens and companies get locked in and can't replace their contractors, because they don't have any docs.

    • Unfortunately it's also possible that, e.g., the company switched from SharePoint to Confluence and lost half the knowledge base because it wasn't labeled the way they thought it was. Or that the docs were all purged because they were part of an abandoned project.

    • > the first thing I do is update / actually write the docs to make them usable.

      OK, so the docs are in sync for a single point in time, when you finish. Plus you get to keep the context in your head (bus factor of 1: job security for you, bad for the org).

      How about we just write clean infra configs/code and stick to well-known systems like Docker, Ansible, k8s, etc.?

      Then we can make that infra code available to an on-prem LLM and ask it questions as needed, without it drifting out of sync over time as your docs surely will.

      Wrong documentation is worse than no documentation.

    • Just to be clear, after I (and a few others left), they moved everything entirely to the cloud.

      Even with documentation on the hybrid setup, they'd need to get a new on-prem environment up and running (find a colo, buy machines, set up the network, blah blah).

    • "Crap docs aren't always the fault of the guys implementing though, sometimes there are time constraints that prevent proper docs being written."

      I can always guarantee a stream-of-consciousness OneNote that should have most of the important data, and a few docs about the most important parts. It's up to management whether they want me to spend time turning that OneNote into actual robust documentation that is easily read.

Context: I build internal tools and platforms. Traffic on them varies, but some of them are quite active.

My nasty little secret is that for single-server databases I have zero fear of over-provisioning disk IOPS and running them on SQLite, or making a single RDBMS server in a container. I've never actually run into an issue with this. It surprises me how many internal tools I see that depend on large RDS installations with piddly requirements.
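For a sense of scale, the whole "database layer" of such an internal tool can be this small. A minimal sketch (the file name, table, and pragma choices are my own illustration, not a prescription):

```python
import sqlite3

# One SQLite file as the entire database for a low-traffic internal tool.
# WAL mode lets readers proceed while a writer commits, which is usually
# plenty of concurrency for this class of workload.
conn = sqlite3.connect("internal_tool.db")
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA synchronous=NORMAL")  # small durability trade-off for speed
conn.execute(
    "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"
)
conn.execute("INSERT INTO events (payload) VALUES (?)", ("ping",))
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Backups are then essentially a file copy; recent SQLite versions also offer `VACUUM INTO 'backup.db'` for a consistent snapshot while the database is in use.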

  • The problem with a single instance is that while performance-wise it's best (at least on bare metal), there comes a moment when you simply have too much data and one machine can't handle it. In your scenario it may never come up, but many organizations face this problem sooner or later.

    • I agree; my point is that clusters are overused. Most applications simply don't need them, and it results in a lot of waste. Much of this has to do with engineers being tasked with an assortment of roles these days, so they obviously opt for the solution where the database and upgrades are managed for them. I've just found that managing a single container's upgrades isn't that big of an issue.

  • >making a single RDBMS server in a container

    On what disk is the actual data written? How do you do backups, if you do?

    • In most setups like this, it's going to be spinning rust with mdadm, plus MySQL dumps created via cron and sent to another location.
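      A sketch of that kind of setup (the schedule, paths, and hostnames are invented for illustration):

      ```cron
      # /etc/cron.d/db-backup -- nightly logical backup, shipped off-box (illustrative)
      0 2 * * * root mysqldump --single-transaction --all-databases | gzip > /backups/db-$(date +\%F).sql.gz && rsync -a /backups/ backup-host:/srv/backups/
      ```

      (`--single-transaction` keeps the dump consistent for InnoDB without locking tables, and the `%` has to be escaped because it is special in crontabs.)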

That's made possible by orchestration platforms such as Kubernetes becoming standardized; you can get pretty close to a cloud experience while keeping all your infrastructure on-premise.

  • Yes: virtualization, overprovisioning, and containerization have all played a role in making utilization of owned assets efficient enough that the economics of the cloud are perhaps no longer as attractive as they once were.

Same experience here. As a small organization, the quotes we got from cloud providers have always been prohibitively expensive compared to running things locally, even when we accounted for geographical redundancy, generous labor costs, etc. Plus, we get to keep know-how and avoid lock-in, which are extremely important things in the long term.

Besides, running things locally can be refreshingly simple if you are just starting something and you don't need tons of extra stuff, which becomes accidental complexity between you, the problem, and a solution. This old post described that point quite well by comparing Unix to Taco Bell: https://news.ycombinator.com/item?id=10829512.

I am sure for some use cases cloud services might be worth it, especially if you are a large organization and get huge discounts. But I see lots of business types blindly advocating for the cloud without understanding the costs and technical tradeoffs. Fortunately, the trend seems to be plateauing: I see increasing demand for people with HPC, DB administration, and sysadmin skills.

  • > Plus, we get to keep know-how and avoid lock-in, which are extremely important things in the long term.

    So much this. The "keep know-how" part has been so neglected over the past 10 years. I hope people with these skills start getting paid more as more companies realize the cost difference.

    • When I started working in the 1980s (as a teenager, but getting paid), there was a sort of battle between the (genuinely cool and impressive) closed technology of IBM and the open world of open standards and interop: TCP/IP and Unix, SMTP, PCs, even Novell sort of, etc. There was a species of expert who knew the whole product offering of IBM, all the model numbers and recommended solution packages and so on. And the technology was good; I had an opportunity to program a 3093K(?) VM/CMS monster with APL and REXX and so on. Later on I had a job working with AS/400 and SNADS and token ring and all that, and it was interesting; thing is, they couldn't keep up, and the more open, less greedy hobbyists and experts working on Linux and NFS and DNS etc. completely won the field.

      For decades, open source, open standards, and interoperability dominated, and one could pick the best thing for each part of the technology stack and be pretty sure that the resulting systems would be good. Now, however, the Amazon cloud stacks are like IBM in the 1980s: amazingly high quality, but not open. The cloud architects master the arcane set of product offerings and can design a bespoke AWS "solution" to any problem. But where is the openness? Is this a pendulum that swings back and forth (many IBM folks left IBM in the 1990s and built great open technologies on the internet), or was it a brief dawn of freedom that will be put down by the capital requirements of modern compute and networking stacks?

      My money is on openness continuing to grow and more and more pieces of the stack being completely owned by openness (kernels anyone?) but one doesn't know.

    • Even without owning the infrastructure, running in the cloud without know-how is very dangerous.

      I hear tell of a shop that was running on ephemeral instance based compute fleets (EC2 spot instances, iirc), with all their prod data in-memory. Guess what happened to their data when spot instance availability cratered due to an unusual demand spike? No more data, no more shop.

      Don't even get me started on the number of privacy breaches because people don't know not to put customer information in public cloud storage buckets.

  • I was part of a relatively small org that wanted us to move to cloud dev machines. As soon as they saw the size of our existing development docker images that were 99.9% vendor tools in terms of disk space, they ran the numbers and told us that we were staying on-prem. I'm fairly sure just loading the dev images daily or weekly would be more expensive than just buying a server per employee.

  • Is there a bit of risk involved since the know-how has a will of its own and sometimes gets sick?

    If I had a small business with very clever people I'd be very afraid of what happens if they're not available for a while.

Keep in mind, there is an in-between.

I would have a hard time running servers as cheaply as Hetzner does, for example, including the routing and everything.

  • I do that. In fact I've been doing it for years, because every time I do the math, AWS is unreasonably expensive and my solo-founder SaaS would much rather keep the extra money.

    I think there is an unreasonable fear of "doing the routing and everything". I run vpncloud, my server clusters are managed with Ansible, and they can be set up from either a list of static IPs or from a Terraform-prepared configuration. The same code can be used to set up a cluster on bare-metal Hetzner servers or on cloud VMs from DigitalOcean (for example).
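    To illustrate the static-IP variant (hostnames and addresses here are invented, not from the actual setup), it is little more than an inventory file that the same playbooks target:

    ```yaml
    # inventory/static.yml -- hand-maintained bare-metal inventory (illustrative)
    all:
      children:
        db:
          hosts:
            db1: { ansible_host: 203.0.113.10 }
        web:
          hosts:
            web1: { ansible_host: 203.0.113.11 }
            web2: { ansible_host: 203.0.113.12 }
    ```

    The Terraform-prepared variant would just generate an equivalent file from its outputs, so the playbooks don't care which provider the machines came from.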

    I regularly compare this to AWS costs, and it's not even close. Don't forget that the performance of those bare-metal machines is way higher than that of overbooked VMs.

    • 100% agree. People still think that maintaining infrastructure is very hard and requires a lot of people. What they disregard is that using cloud infrastructure also requires people.

    • I was more talking about the physical backbone connection, which Hetzner does for you.

      We are using Hetzner cloud, but we are also scaling up and down a lot right now.


    • When talking about Hetzner pricing, please don’t change the subject to AWS pricing. The two have nothing in common, and intuition derived from one does not transfer to the other.


It's not an either/or. Many businesses both own and rent things.

If price is the only factor, your business model (or your executives' decision-making) is questionable. If you buy only the cheapest shit and spend your time building your own office chair rather than talking to a customer, you aren't making a premium product, and that means you're not differentiated.

I would imagine that cloud infrastructure allows fast scale-up, unlike self-owned infrastructure.

For example, how long does it take to rent another rack that you didn't plan for?

Not to mention that the cloud management platforms you have to deploy to manage those owned assets aren't free either.

I mean, how come even large consumers of electricity do not buy and own their own infrastructure to generate it?

  • Ordering that number of servers takes about an hour with Hetzner. If you truly want a complete rack of your own, maybe a few days, as they have to do it manually.

    Most companies don't need to scale up full racks in seconds. Heck, even weeks would be OK for most of them to get new hardware delivered. The cloud planted the lie into everyone's head that most companies don't have predictable and stable load.

  • One other appealing alternative for smaller startups is to run Docker on one burstable VM. This is a simple setup that allows you to burst beyond the baseline CPU limits and also to scale up the VM.

    There might be other alternatives than Docker, so if anyone has tips for something simpler or easier to maintain, I'd appreciate a comment.
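    As an illustration of that single-VM pattern (the image names and ports are made up), the whole deployment can be one Compose file:

    ```yaml
    # docker-compose.yml -- app plus database on one burstable VM (illustrative)
    services:
      app:
        image: ghcr.io/example/app:latest
        restart: unless-stopped
        ports:
          - "80:8000"
        depends_on:
          - db
      db:
        image: postgres:16
        restart: unless-stopped
        volumes:
          - pgdata:/var/lib/postgresql/data
    volumes:
      pgdata:
    ```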

  • >I mean, how come even large consumers of electricity do not buy and own their own infrastructure to generate it?

    They sure do? BASF has three power plants in Hamburg, Disney operates Reedy Creek Energy with at least one power plant, and I could list a fair bit more...

    >For example, how long does it take to rent another rack that you didn't plan for?

    I mean, you can also rent hardware a lot cheaper than on AWS. There certainly are providers where you can rent a rack for a month within minutes.

    • Some universities also have their own power plants. It’s also becoming more common to at least supplement power on campus with solar arrays.