Comment by embedding-shape
1 day ago
Hah, love that now they say "Our priorities are clear: availability first, then capacity, then new features" when 6 months ago, it was seemingly exactly the same except Azure supposedly was gonna save them:
> GitHub Will Prioritize Migrating to Azure Over Feature Development - GitHub is working on migrating all of its infrastructure to Azure, even though this means it'll have to delay some feature development.
> In a message to GitHub’s staff, CTO Vladimir Fedorov notes that GitHub is constrained on capacity in its Virginia data center. “It’s existential for us to keep up with the demands of AI and Copilot, which are changing how people use GitHub,” he writes.
https://thenewstack.io/github-will-prioritize-migrating-to-a...
So the currently delayed feature development is now gonna be further delayed, yet almost every week we see new features and changes, just the other day the single issues view was changed, as just one example. And it was "existential" 6 months ago yet they keep stumbling on the exact same issue today?
Even if they're focused exclusively on reliability and uptime, we get the experience that we have today. Kind of incredible how a company with the resources of Microsoft seemingly is unable to stop continuously shooting itself in the foot. It's kind of impressive actually. As icing on the cake, they've decided to buy up all the popular developer services and then migrate them all to the same platform, great idea too.
This seems uncharitable. Priorities aren't exclusive, especially at scale across large engineering orgs like GitHub. It could be that these are the top level priorities, but teams or individuals who aren't able to contribute to these priorities will work on other things like new features.
Agree that priorities aren't exclusive and there may be teams/individuals that aren't able to contribute if they stay in their current teams/roles
Where it becomes questionable though is when enough progress isn't being made on the top priority (reliability). If GitHub is being true to their word, they need to be pulling people off of teams that are working on features to work on reliability so that the top priority gets the resourcing it needs.
Given the pace of improvement, and the cited example of moving to Azure from months ago, it's not super clear they are doing that. Also not clear that they aren't, maybe the move to Azure is just a more than 6mo project no matter how many people are on it.
Sure, but frontend devs fundamentally cannot contribute to the structural reliability issues.
The person who rewrote the issue page view probably doesn't know anything about multi-cloud scaling for millions of users with Azure-crippling throughput. That's an incredibly specialized set of knowledge and experience that is utterly disjunct to frontend work.
But at the same time, given the state that GitHub is in, I personally wouldn't want to allow any devs to push anything to prod that doesn't immediately affect stability. I'd completely freeze frontend work until the infrastructure is more stable. But then again I write C for microcontrollers so what do I know?
Ditto. I agree though, just because the priority is reliability, doesn't mean others can't work on features, especially features that might help with reliability, which I read was the motivation behind the new single-issue view, so that's my bad, might have been a bit much.
I still think the rest of my point stands, especially the last part, which is the move that has the biggest impact on most of us developers.
Why do we need to be charitable to Microsoft?
Did we lose our ability to consider them the evil empire?
There’s a lot of “won’t someone think of the GitHub employees” on here
No, but they are ordered generally, and in this case they are explicitly saying that availability should come first
It's entirely possible the move to Azure has made the availability problems worse. Dedicated hardware is much more predictable than cloud. "Let's not move to Azure and instead buy a few more racks" was likely a decision beyond the pay grade of GitHub's management.
Moving to cloud makes scaling much easier and faster than colo data centers, though it costs more and might not be as reliable.
Maybe, but on the other hand, modern hardware is fantastically powerful so you might not need to scale, and GitHub likely has an even and predictable usage pattern which allows them to plan expansion.
Azure is easily the least reliable and least secure of the 3 hyperscalers, which is crazy because GCP was an also-ran underdog not that long ago.
This entire exercise if anything is a huge indictment of Azure.
But that doesn't matter because the kind of person that buys Azure, just like the kind of person that buys MS Teams, is entirely driven by price and does not care about anything else.
I mean, it's Microsoft and it's Azure. How much can go wrong clicking together a few/hundred non-autoscaling normal VMs?
There is so much workload running on Azure, and I've never heard of VMs going away.
If Microsoft can source hardware for Azure, Microsoft can source hardware for GitHub.
There's a lot that can go wrong with a hypervisor, even including hiding hardware issues from the guest OS.
We don't think about it because we've been quite spoiled with excellent virtual machine platforms (KVM, Xen and even VMware).
Those who have worked a lot with VirtualBox will be aware of this. After you've spent sufficient time with VirtualBox, it can be deeply unnerving that VM technology is the default way to deploy things (VirtualBox is very good for its original purpose, but not for reliability).
The question is: Does Azure use something more like VirtualBox, or more like KVM?
Hyper-V exhibits properties closer to VirtualBox.
I've had Windows Server VMs soft crash and hard crash on Azure. Some soft-lock and a restart via Azure gets them back. Sometimes the only fix has been to power off / deprovision - then power on again (i.e. a restart didn't fix it). It's not common, but I've encountered it multiple times. These are with operating systems that were created in Azure from their images.
> So the currently delayed feature development is now gonna be further delayed, yet almost every week we see new features and changes, just the other day the single issues view was changed, as just one example.
They did that as a panic mode hack to mitigate performance: https://news.ycombinator.com/item?id=47912521
If they had not added or changed any features in GitHub for the past 5 years, nobody would be upset, and yet, they keep changing it. It's a website that doesn't need to be reworked every five minutes. I assume the main development teams maintaining GitHub's codebase are run by managers who cannot justify their jobs unless they deliver new features for the sake of delivering new features, and / or in the hopes of getting new people to join GH, when in reality the more they wind up breaking, the more the opposite becomes true.
They severely nerfed their search. I'm not sure why every other major tech company (Google, with Search and YouTube) keeps breaking search for everything when it was working fine previously.
What's a bigger joke is Microsoft has Azure DevOps which looks like it might be abandoned? But then you also have GitHub... My least favorite thing about both is the ticketing system. I cannot believe that I'd ever utter the phrase "I miss Jira" when every Jira project I've ever been in has been so inconsistently set up, every, single, one.
>What's a bigger joke is Microsoft has Azure DevOps which looks like it might be abandoned?
My favorite was trying to figure out how to publish debug symbols with NuGet packages to Azure DevOps artifact feeds. Horrible documentation and I was never able to get it figured out.
> They severely nerfed their search
This always kills me. It used to work so well, and now it doesn't seem to work at all if not logged in, and not particularly well if you are logged in.
What they nerfed the most is the basic feature of the PR diff view.
Its only job is to display the diff and review comments, yet it readily hides the diff for files that are a little bit longer and hides comments when you have more than a dozen. You need to click to see them. It's impossible to search in the diff without going through it to expand everything.
And a ton of things are regressions compared to working with PRs a few years ago. Including being a lot worse in terms of latency!