Comment by sss111

2 days ago

I'm actually on a bridge call with Google Cloud, we're a large customer -- I just learned today that their status page is not automated, instead someone actually manually updates it!

That's the case with every status page. These pages are managed by business people, not engineers, because their primary purpose is to show customers that the company is meeting contractually defined SLAs.

  • Surely no SLA will be based on the display of the status page...

    • Maybe or maybe not, but someone with nothing better to do than monitor that page out of boredom might “get on the horn” with lots of people to complain if a green check mark turns to a red X.

    • SLAs aren't automatically based on that page, but seeing a red status makes it too easy for customers to point to it and go "see, you were down, give us a refund".

This is actually the norm for status pages. If you look at the various status page offerings you'll see that they're designed around manual updates.

  • The best way to consistently have good "time to response" metrics is to be the one deciding when an incident "actually" started happening, if at all :)

This feels very much like when Facebook locked themselves out of their datacenters. ;)

* https://www.datacenterdynamics.com/en/news/facebook-blames-m...

The bigger you are, the more you want a human involved in the decision to publicly declare an incident.

Most status pages are manual.

At least some of the information has to be.

The weird part is that it took them almost a full hour to update it.