Comment by yoyohello13

3 days ago

I don't know if any of this is true, but as a user of Azure every day this would explain so much.

The Azure UI feels like a janky mess, barely being held together. The documentation is obviously entirely written by AI and is constantly out of date or wrong. They offer such a huge volume of services it's nearly impossible to figure out what service you actually want/need without consultants, and when you finally get the services up who knows if they actually work as advertised.

I'm honestly shocked anything manages to stay working at all.

I’ve created a bunch of fresh Azure accounts over the past few years and each time I’ve found myself sitting there dumbfounded anew at how garbage the experience is.

There has been weird broken jank at just about every step of the process at one point or another. Like, I’m a serious person trying to set something up for a production workload, and multiple times along the way to just having a working account that I can log into with billing configured, I’ll get baffling error messages like [ServiceKeyDepartureException: Insufficient validation expectancy. Sfhtjitgfxswinbvgtt-33-664322888], and the whole thing will simply not work until several hours later. Who knows why!?

I evaluated some Azure + Copilot Studio functionality for a project recently, which required more engagement with their whole 365 ecosystem than I’d had in a long time and it had many of the same problems but worse. Just unbelievably low quality software for the price and how popular it is. Every step of the way I hit some stupid issue. The people using this stuff are clearly not the people buying it.

  • I've joked that on some services, when you're clicking buttons, you're actually opening tickets that a human needs to action.

    That scenario is an example. You complete an action on a web page and nothing works. You make no further changes and hours later it works perfectly. Your human wasn't fast enough that day.

    • That's the "digital escort" process mentioned in the very long OP. Understandably, the US government got mad when they found out that cheap Chinese tech support staff were being used for direct intervention on "secure" VMs.

    • > I've joked that on some services, when you're clicking buttons, you're actually opening tickets that a human needs to action.

      I just experienced one startup where the buttons just happen to only work during business hours on the US west coast.

    • > when you're clicking buttons, you're actually opening tickets that a human needs to action

      I had one public cloud vendor's sales team literally admit this was the case with their platform. But they were now selling "the new one", which was supposed to be better.

      It was, a lot. But only compared to the old one.

I remember being impressed with the Azure docs... until I spent a week implementing something, only to have it completely fail when deployed to the test environment because GraphAPI did not work as documented. The beautiful docs were a complete lie.

These days I don't even bother looking at the docs when doing stuff with Azure.

  • I can’t count the number of times the docs have been totally wrong.

    • And they were actually like that pre-LLM, in 2019, when I was implementing stuff for a car company on Azure. They spent _hundreds of thousands_ on Cosmos DB, for less performance than a Raspberry Pi running Postgres.

    • Pretty surprised to hear this. I would think (assuming they are LLM-written, as the parent suggests) that MS could throw a large-context "pro" LLM at the code base and get perfect docs, updated every release?

      More accurate than a person, even: I might mistakenly copy/paste or write "Returns 404", but the LLM can probably see that the code actually returns a 401.

      I'm not a stranger to LLMs hallucinating things in responses but I'd always assumed that disappeared when you actually pointed it at the source vs some nebulous collection of "knowledge" in a general LLM.

We migrated some services to AKS because the upper management thought it was a good deal to get so many credits, and now pods are randomly crashing and database nodes have random spikes in disk latency. What ran reliably on GCP became quite unpredictable.

  • Exact same story at my place. Upper management decided it's a good idea to build on Azure because Microsoft promised some benefits. Things that ran reliably on GCP now need active firefighting on Azure.

  • Interesting! We're using AKS with huge success so far, but lately our Pods are unresponsive and we get 503 Gateway Timeouts that we really can't trace down. And don't get me started on Azure Blob Tables...

    • In our case this was only a month ago, and now we're stuck because management thought it was a good idea to sign a hefty spend commitment.

  • GCP is hard to beat on k8s stuff. Performance and stability are crazy good.

    But it's not as famous as AWS, and it costs money. Hence moving away seems like a good idea :)

I’ve worked with their consultants and they were lovely. They hate Azure too.

  • I imagine that no one likes Azure.

    • The only good thing Microsoft Azure ever did for me was provide a very easy way to exploit their free trial program in the early 2010s to crypto mine for free. It couldn’t do much, but it was straight up free real estate for CPU mining. $200 or 2 weeks per credit/debit card.

    • Azure container apps are a great (idea) and work mostly fine as long as you don’t need to touch them. But they’re just like GCR or what fargate should be - container + resources and off you go.

      We ran many internal workloads on ACA, but we had _so many issues_ with everything else around ACA…

The part about prioritizing "aggressive feature velocity" over "core fundamentals" is true.

The push is as insane as the push to AI.

At the same time, fundamental improvements like migrating to .NET Core or reducing logs are actively deprioritised. If it were not for compliance, we would not have any core engineering improvements at all.

Honestly, I was not even aware of the Rust push, probably because no one in my org could do Rust. I am glad we did not move to AKS though.

Oh my goodness, yes. And how often their role assumption does not work!

I need privileges to do thing A, so I assume the role, and even though the role is shown as active, the buttons are still greyed out. Sometimes it works after 10 minutes and 7x F5, most often however I do a complete relogin with MFA in an incognito window. Not distracting at all, and even that does not work sometimes.

  • Using a magic link[0] from Microsoft refreshes the token instantly, but you have to do it in a new tab. It's worked for me anytime permissions don't update after checking out a PIM role.

    0: https://aka.ms/pim/tokenrefresh

    • Thank-you! thank-you, thank-you, thank-you.

      [This is the single most helpful tip/link from HN I have ever found, much appreciated]

I have been a frustrated user as well. Their services seem to be held together by duct tape. For instance, an online endpoint creation failed after 90 minutes with internal-error and no clue what the error is. Support tickets are routed overseas to consultants who don't have a clue, and their job is a daily email keeping the customer warm. All in all, and as OP says, it's amazing that it is still hanging together. Some services work reliably but not all.