Comment by dangoodmanUT

1 day ago

It has been 0 days since GCP has taken down a startup (again).

You see this at least once a year. Never heard of this from AWS or Azure.

In all seriousness, this is why we don't use them. They have the most ergonomic cloud of the big three, then absolutely murder it by having this kind of reputation.

On the other hand, I can’t remember when there was a serious outage on GCP, unlike AWS/Azure, which seem to go down catastrophically a couple of times per year.

  • I've been in AWS for almost twenty years at this point. It's been a long time since I've seen a global outage of the data plane on anything. The control plane, especially the us-east-1 services? Yes - but if you're off of us-east-1, your outages are measured in missile strikes, not botched deployments.

  • Perhaps you don't notice GCP outages because so few companies rely on them?

    • There is a mobile game I know of that had an outage as a result of a GCP service outage. That is the only time I've noticed GCP outages.

      With that said, I would not say few companies rely on GCP. Search for "GCP" in this month's HN hiring thread. There are 23 hits, more than Azure's 21. AWS has 90 hits, which I guess shows its sheer dominance in the startup space. But these figures more or less agree with my intuition of the major clouds being AWS/GCP/Azure.

    • > Perhaps you don't notice GCP outages because so few companies rely on them?

      GCP is the world's third largest cloud provider, and has around half of AWS' market share. Claiming no one uses it reads like Yogi Berra's "no one goes there anymore, it's too crowded".

    • GCP has a lot of customers. But you wouldn't know which companies they are unless you worked there and wanted to leak it, or it came out publicly. E.g., it's been publicly acknowledged that Apple uses GCP for iCloud, https://www.cnbc.com/amp/2018/02/26/apple-confirms-it-uses-g... , and Home Depot is another that's used as a case study, https://cloud.google.com/customers/the-home-depot . But most customers don't want to make a big deal about being on GCP, as it's none of our business who's hosting them.

    • Spotify, eBay, PayPal, Apple, Walmart, and Uber are huge users. Lots of other big-name companies are big users that I don't think are public.

      Then there's Anthropic...huge user.

  • AWS goes down catastrophically but are back up in minutes/hours most of the time (as long as they aren't down because Iran blew up their data center). That's obviously REALLY bad for certain industries, but I suspect for the vast majority of their customers it's not a big deal. We've been able to isolate the damage almost every time just by having AZ failover in place and avoiding us-east-1 where we can.

    • Failover is supposed to protect you every time, unless something really exceptional happens.

      While it's possible to isolate the effects, judging by how many things stop working when there is an AWS failure, a lot of people fail to do that. I think the shift of responsibility to AWS removes the incentive to put effort into resilience against AWS failure.

    • > AWS goes down catastrophically but are back up in minutes/hours most of the time

      The outage in the linked article appears to have been resolved in 4-5 hours.

  • IIRC the Paris datacenter flood took down a whole “region” and some data was permanently unrecoverable.

  • > On the other hand, I can’t remember when there was a serious outage on GCP

    They had a really bad global outage a year ago. At least with AWS, outages are contained to a single region.

  • You can't have 100% uptime. It's unfeasible, especially for a startup. You should be telling your customers that downtime might happen, sometimes for reasons beyond your control, and that if it does then you'll do your best to recover and to compensate them for the inconvenience. You should cultivate a relationship with your early customers that makes them feel bad for you when there's an outage rather than angry about how it impacts them. Maybe even go as far as firing the customers who give you a hard time over it. That way if your cloud provider falls over it's really annoying but not a big deal.

    Your cloud provider blocking your business from running is far worse.

  • None of the AWS “outages” have impacted us. They have either been regional, in which case we stand down the region (we run multiple hot regions), or didn’t involve things we need to maintain operation.

    I can’t imagine AWS ever doing such a cascading delete. I mean, they have made deletion protection a difficult thing to ignore even for individual resources.

  • There was a pretty bad one last summer - their IAM system got a bad update and it broke almost all GCP services for an hour or so, since every authenticated API call reaches out to IAM.

    It had lasting effects for us for a little over 3 hours.

> Never heard of this from AWS or Azure.

AWS does it more efficiently; it takes down many startups at a time when us-east-1 goes down.

  • That’s an entirely different type of problem, and avoidable by just using us-east-2 (I still don’t understand why people default to us-east-1 unless they require some highly specific services).

    • Is it that easily avoidable? A lot of AWS's control plane seems to have dependencies on us-east-1, or at least that's what it's looked like as a non-us-east-1 user during recent outages.

    • Sympathy. Railway is going to have numerous people blaming them for this outage. When us-east-1 fails, it is headline news, so you are not to blame.

  • If my cloud provider brings my startup down, it's my problem. If they bring all the startups down, that's their problem.

  • During the 5 years of my startup, we had only 1 outage due to AWS, because we picked us-west-2 as the primary region. If anyone starting a company picks us-east-1 as the primary region, they should be fired. There's absolutely no reason to be in that region.

    • Why do people want to be in that region? Is it the default or something?

      I know it helps for some workloads to be colocated, but all these places are connected by fiber, and every cloud has a worldwide CDN, it seems.

AWS has throttled our service so badly that we couldn't operate. I was thinking of writing a blog post about how they stalled our growth for a month, but it seems moot.

Hetzner and OVH also do this all the time.

It's AWS and Azure that are the outliers and tend not to care too much what their customers do with their infrastructure. AWS is perfectly fine with allowing me to run copies of 15 year old vulnerable AMIs copied from AMIs they've long since deprecated and removed. Even for removed features like NAT AMIs.

AWS normally contacts you first.

  • Do they?

    The only anecdote I have is that we hired a vendor to do a pentest a few years ago, and they set up some stuff in an AWS account, and that account got totally yeeted out of existence by AWS, if memory serves.

    • You should not be conducting penetration tests against third-party infrastructure providers without permission. They have processes and systems, and usually just want a heads-up on what you plan to test and the duration / timestamps.

      Cuz otherwise you look like a threat actor.

      That’s assuming your vendor was pentesting AWS systems. If you meant you hired a vendor to pentest your own systems on AWS, that’s of course a totally different matter.

    • I’m fairly certain you are supposed to contact any vendor for authorization before attempting to penetrate hosts, not the other way around.

    • If a vendor doesn't know the basics about pentesting open infra and can't be bothered to look up the terms of use, it sounds like they know ssh-it about fsck.