Comment by Anon1096

3 days ago

The post is so dramatized, and so clearly written by someone with a grudge, that it really detracts from any point that is trying to be made, if there is any.

From another former Az eng now elsewhere still working on big systems, the post gets way way more boring when you realize that things like "Principal Group Manager" is just an M2 and Principal in general is L6 (maybe even L5) Google equivalent. Similarly Sev2 is hardly notable for anyone actually working on the foundational infra. There are certainly problems in Azure, but it's huge and rough edges are to be expected. It mostly marches on. IMO maturity is realizing this and working within the system to improve it rather than trying to lay out all the dirty laundry to an Internet audience that will undoubtedly lap it up and happily cry Microslop.

Last thing, the final part 6 comes off as really childish, risks to national security and sending letters to the board, really? Azure is still chugging along apparently despite everything being mentioned. People come in all the time crying that everything is broken and needs to be scrapped and rewritten but it's hardly ever true.

>risks to national security and sending letters to the board, really?

Yes, really, and guess what the DoD did on Aug 29, 2025, exactly 234 days after I warned the CEO of potential risks?

https://www.propublica.org/article/microsoft-china-defense-d...

It wasn’t specifically about the escort sessions from any particular country, though, but about the list of underlying reasons why direct node access was necessary.

> People come in all the time crying that everything is broken and needs to be scrapped and rewritten but it's hardly ever true.

Or… you’ve just normalised the deviation.

One of the few reliable barometers of an organisation (or their products) is the wtf/day exclaimed by new hires.

After about three or four weeks everyone adapts, learns what they can and can’t criticise without fallout, and settles into the mud to wallow with everyone else that has become accustomed to the filth.

As an Azure user I can tell you that it’s blindingly obvious even from the outside that the engineering quality is rock bottom. Throwing features over the fence as fast as possible to catch up to AWS was clearly the only priority for over a decade and has resulted in a giant ball of mud that now they can’t change because published APIs and offered products must continue to have support for years. Those rushed decisions have painted Azure into a corner.

You may puff your chest out, and even take legitimate pride in building the second largest public cloud in the world, but please don’t fool yourself that the quality of this edifice is anything other than rickety and falling apart at the seams.

Remind me: can I use IPv6 safely yet? Does it still break Postgres in other networks? Can azcopy actually move files yet, like every other bulk copy tool ever made by man? Can I upgrade a VM in-place to a new SKU without deleting and recreating it to work around your internal Hyper-V cluster API limitations? Premium SSDv2 disks for boot disks… when? Etc…

You may list excuses for these quality gaps, but these kinds of things just weren’t an issue anywhere else I’ve worked as far back as twenty years ago! Heck, I built a natively “all IPv6” VMware ESXi cluster over a decade ago!

  • > One of the few reliable barometers of an organisation (or their products) is the wtf/day exclaimed by new hires.

    Wellllll ... my observations after many cycles of this are:

    - wtfs/day exclaimed by people interacting with *a new codebase* are not indicative of anything. People first encountering the internals of any reasonably interesting system will always be baffled. In this context "wtf" might just mean "learning something new".

    - wtfs/day exclaimed by people learning about your *processes and workflows* are extremely important and should be taken extremely seriously. "wtf, did you know all your junior devs are sharing a single admin API token over email?" for example.

  • > One of the few reliable barometers of an organisation (or their products) is the wtf/day exclaimed by new hires.

    Eh, I don't think this is exactly as reliable as you'd expect.

    My previous job had a fairly straightforward code base but had fairly poor reliability for the few customers we had, and the WTF portions usually weren't the ones that caused downtime.

    On the other hand, I'm currently working on a legacy system with daily WTFs from pretty much everyone, with a greater degree of complexity in a number of places, and yet we get fewer bug reports and at least an order of magnitude if not two more daily users.

    With all of that said... I don't think I've used any of Microsoft's new software in years and thought to myself "this feels like it was well made."

    • The rapid decay of WTF/day over time applies to both new employees and new customers.

      > currently working on a legacy system

      "Legacy" is the magic word here! Those customers are pissed, trust me, but they've long ago given up trying to do anything about it. That's why you don't hear about it. Not because there are no bugs, but because nobody can be bothered to submit bug reports after learning long ago that doing so is futile.

      I once read a paper claiming that of all major software incidents (crash, data loss, outage, etc...) only between one in a thousand and one in ten thousand will be formally reported up to an engineer capable of fixing the issue.

      I refused to believe that metric until I started collecting crash reports (and other stats) automatically on a legacy system and discovered to my horror that it was crashing multiple times per user per day, and required on average a backup restore once a week or so per user due to data corruption! We got about one support call per 4,500 such incidents.

  • I mean, the org had already decreed everything needed to be rewritten in Rust according to the account.

> Last thing, the final part 6 comes off as really childish, risks to national security and sending letters to the board, really?

That struck me too. Maybe I've never worked high enough in an org (I'm unclear how highly ranked the author of the piece is) but I've never been in an org where going over your boss's boss's boss's boss's head and writing a letter to the board was likely to go well.

That said, I could easily believe that both Azure is an absolute mess and that the author of the piece was fired because of how he went about things.

  • [flagged]

    • Lol, no.

      It is true that writing to the board will get you noticed, and that you might not like the consequences. If you value having the job then don’t write to the board. Even if you are right, being noticed like that isn’t going to endear you to your boss.

      But if you care more about doing the right thing then writing to the board is the right thing to do. And after a few years of working at Microsoft you might not value your job very much either and you too might decide to go out in style.

      Go watch the last episode of Chernobyl again.

    • Windows is ~500 times bigger than Azure, give or take, by machine count, and still many times larger by loc, modules, users, whatever else you want to measure. The heavy lifting (VM/containers, I/O, the things that cannot be done just like that) is handled by the Windows folks anyway. The only hard part is the VM placement, everything else is mostly regular software engineering, some of medium-hard complexity but nothing that can excuse the need for constant human intervention.

    • from a philosophy grad. both these responses are logical fallacies.

      1: it's bad, but so is everything else (ad populum, everyone does it so it's ok).

      2: it can only be because the author has a personality disorder or psychotic break (ad hominem)

AWS and Google Cloud are both huge and are significantly better in UX/DX. My only experience with Azure was that it barely worked and provided very little information about why it didn't. I only have negative impressions of Azure, whereas at least with GC and AWS I can say my experiences are mixed.

> From another former Az eng now elsewhere still working on big systems, the post gets way way more boring when you realize that things like "Principal Group Manager" is just an M2 and Principal in general is L6 (maybe even L5) Google equivalent. Similarly Sev2 is hardly notable for anyone actually working on the foundational infra.

Before the days of title inflation across the industry, a Principal at Microsoft was a rare thing. When I was there, the ratio was maybe 1 principal for every 30 developers. Principals were looked up to, had decades of experience, and knew their shit really well. They were the big guns you called in to fix things when the shit really hit the fan, or when no one else could figure out what was going on.

  • One of Microsoft's problems is their pay is significantly lower than FAANG and so you very very rarely see people with expertise in the same verticals jump to Azure. I get that "the deal" at Microsoft is lower pressure for lower pay but it really hinders the talent pipeline. There are some good home grown principals and seniors, but even then I think the people I worked with would have done well to jump around and get a stint at another cloud provider to see what it's like. Many of them started as new grads and their whole career was just at Azure.

    Meanwhile when I was at another company we would get a weekly new hire post with very high pedigree from other FAANGs. And with that we got a lot of industry leading ideas by osmosis that you don't see Azure getting.

    • Yeah the deal has also changed. Right as I was leaving the messaging started changing a lot and there was a clear top down “you all need to work harder”. They hired an ex Amazon guy to run my org which really drove the message home.

      To be fair though I think Microsoft has decided they are fine with rank and file being mediocre. I don’t know how interested they are in competing for top talent except for at the top.

    • > I get that "the deal" at Microsoft is lower pressure for lower pay but it really hinders the talent pipeline.

      The deal used to be a lower cost of living in a major coastal city, an amazing campus (it is seriously lovely), every engineer had their own office, serious job security, and an unbelievable health care plan.

      Seattle exploded in price, they moved to open offices, Microsoft started doing mass layoffs, and they gutted the healthcare plan (by the time I left the main plan on offer was a high deductible with a miserable prescription formulary).

      Hard to attract talent when there is no big differentiator.

      Of course in the 90s the deal was work there 10 years retire a millionaire. Easy to attract talent when that is the offer ...

> risks to national security

Microsoft is the go to solution for every government agency, FEDRAMP / CMMC environments, etc.

> People come in all the time crying that everything is broken and needs to be scrapped and rewritten but it's hardly ever true.

This I'm more sympathetic to. I really don't think his approach of "here's what a rewrite would look like" was ever going to work and it makes me think that there's another side to this story. Thinking that the solution is a full reset is not necessarily wrong but it's a bit of a red flag.

  • At no point while reading did I get the sense that he's suggesting something radical. Where specifically is he proposing a rewrite?

    "The practical strategy I suggested was incremental improvement... This strategy goes a long way toward modernizing a running system with minimal disruption and offers gradual, consistent improvements. It uses small, reliable components that can be easily tested separately and solidified before integration into the main platform at scale." [1]

    [1] https://isolveproblems.substack.com/p/how-microsoft-vaporize...

    • > The current plans are likely to fail — history has proven that hunch correct — so I began creating new ones to rebuild the Azure node stack from first principles.

      > A simple cross-platform component model to create portable modules that could be built for both Windows and Linux, and a new message bus communication system spanning the entire node, where agents could freely communicate across guest, host, and SoC boundaries, were the foundational elements of a new node platform

      Yes, I read that part as well and found it a bit confusing to reconcile with this one.

      The vibe from my quotes is very much "I had a simple from-scratch solution". They mention then slowly adopting it, but it's very hard to really assess this based on just the perspective of the author.

      He also was making suggestions about significantly slowing down development and not pursuing major deals, which I think again is not necessarily wrong but was likely to fall on deaf ears.

  • > Microsoft is the go to solution for every government agency, FEDRAMP / CMMC environments, etc.

    I've been involved with FEDRAMP initiatives in the past. That doesn't mean as much as you'd think. Some really atrocious systems have been FEDRAMP certified. Maybe when you go all the way to FEDRAMP High there could be some better guardrails; I doubt it.

    Microsoft has just been entrenched in the government, that's all. They have the necessary contacts and consultants to make it happen.

    > Thinking that the solution is a full reset is not necessarily wrong but it's a bit of a red flag.

    The author does mention rewriting subsystem by subsystem while keeping the functionality intact, adding a proper messaging layer, until the remaining systems are just a shell of what they once were. That sounds reasonable.

    • Thanks. That was exactly the plan. Full rewrites are extremely risky (see the 2nd System syndrome) as people wrongly assume they will redo everything and also add everything everyone always wanted, and fix all debt, and do it in a fraction of the time, which is delusional and almost always fails. Stepwise modernization is a proven technique.

    • > I've been involved with FEDRAMP initiatives in the past. That doesn't mean as much as you'd think. Some really atrocious systems have been FEDRAMP certified. Maybe when you go all the way to FEDRAMP High there could be some better guardrails; I doubt it.

      I never said otherwise. I said that Microsoft services are the de facto tools for FEDRAMP. I never implied that those environments are some super high standard of safety. But obviously if the tools used for every government environment are fundamentally unsafe, that's a massive national security problem.

      > Microsoft has just been entrenched in the government, that's all.

      Yes, this is what I was saying.

      > The author does mention rewriting subsystem by subsystem while keeping the functionality intact, adding a proper messaging layer, until the remaining systems are just a shell of what they once were. That sounds reasonable.

      It sounds reasonable, it's just hard to say without more insight. We're getting one side of things.

I think he did kind of point at the lack of seniority in the org, so I'm not sure he was trying to exaggerate with the titles.

I'm really struck that they have such Jr people in charge of key systems like that.

  • Juniors love to hack out new things and in the meantime they can take the blame if needed, fair trade, wouldn't you say?

I've worked at both Microsoft and Google in the past 6 years and the notion that msft "Principal" is equivalent to goog L5 is crazy.

  • Meaning Msft Principal is below L5? I got the same feedback from one of my friends who works at Google. She said quality of former MSFT engineers now working at Google was noticeably lower.

    • I mean imputed prestige within the organization. Being an L5 is nothing; it's the promote-or-fire cutoff at Google AFAIK. But being a Principal is slightly more than nothing; it's two levels above the promote-or-fire cutoff.

      I mean, _now_, sure, I'd assume Microsoft Principals should be hired around L4 at Google. But that's just due to a temporary imbalance in the decline of legacy organizations. Give it a few years and it will even back out and msft 64 will be in the middle of L5 range like levels.fyi claims.

    • I mean if you go by pay in the UK a Microsoft Principal is equivalent to an L4 at Google, if levels.fyi is to be believed...

> risks to national security …really?

Really. Apparently the Secretary of War agrees with him.

  • In fairness the SECWAR is hardly a computing expert.

    But in this case the SECWAR has been properly advised. If anything it's astonishing that a program whereby China-based Microsoft engineers tell U.S.-based Microsoft engineers specific commands to type in ever made it off the proposal page inside Microsoft, accelerated time-to-market or not.

    It defeats the entire purpose of many of the NIST security controls that demand things like U.S.-cleared personnel for government networks, and Microsoft knew those were a thing because that was the whole point to the "digital escort" (a U.S. person who was supposed to vet the Chinese engineer's technical work despite apparently being not technical enough to have just done it themselves).

    Some ideas "sell themselves", ideas like these do the opposite.

    • > If anything it's astonishing that a program whereby China-based Microsoft engineers tell U.S.-based Microsoft engineers specific commands to type in ever made it off the proposal page inside Microsoft, accelerated time-to-market or not.

      > It defeats the entire purpose of many of the NIST security controls that demand things like U.S.-cleared personnel for government networks, and Microsoft knew those were a thing because that was the whole point to the "digital escort" (a U.S. person who was supposed to vet the Chinese engineer's technical work despite apparently being not technical enough to have just done it themselves).

      That is beyond bad. Proof of this?

    • Being compliant with the letter of the requirements at 1/3 of the cost is absolutely an idea that sells itself.

    • I'd like to suggest calling him SECDEF, not SECWAR.

      IMHO the country should not capitulate to Trump's power grabs, even if Congress refuses to perform their oversight duties.

  • The United States does not have a Secretary of War, and has not since 1947.

    • Uhm:

      > The United States secretary of defense (SecDef), secondarily titled the secretary of war (SecWar), is the head of the United States Department of Defense (DoD), the executive department of the U.S. Armed Forces, and is a high-ranking member of the cabinet of the United States.

      Wikipedia

  • To be fair, it's not like Hegseth is a super high-signal source. Hegseth says lots of stuff, some of which is even true!

    • This was such a genuinely weird moment for me when reading the article.

      "yadda yadda and then also the secretary of defence agreed it was bad"

      I'm just reading along and going, "yeah that sounds really bad if a secretary level position is being cited... wait a second, isn't that actually the guy who is literally famous for being stupid??"

      I never expected to be living through a real life version of "the emperor's new clothes", like, how is anyone quoting this guy about anything?

The problem is that what he writes is very plausible and explains a lot about why Azure is so unreliable and insecure. The author didn't mention the shameful way Microsoft leaked a Golden SAML key to Chinese hackers. This event absolutely was a threat to national security.

If your reaction is emblematic of the way people reacted to his points internally that does give more credibility to his side of the story IMHO

Yes, it's easy to critique any large system or organisation, but to then go over everyone's head and cry to the CEO and Board is snake-like behaviour, especially when offering yourself as the answer to fix it. OP will be marked as a troublemaker and bad team member.

  • Maybe. That would be a dent in the shiny culture of trust Microsoft is proud to run on, though.

Do you contest the fact that Microsoft royally fumbled OpenAI out of sheer incapability of providing what's supposed to be its core business despite having all deals in its favor? Because that's the most damning validation against Azure in recent times.

The grudge is simple and doesn't detract one thing from a very well articulated blog: you do your job as an engineer of pointing out problems, even proposing solutions, and they fire you for doing exactly the job. It's infuriating enough just from reading it, idk how you can't see any legitimacy in what the guy is complaining about. You have your right of free speech to complain about shitty jobs if you want, there's no honor bound to maintain silence here.

He might sound like he has a grudge but you sound like you’re personally invested. Shill?