Comment by dijit
22 days ago
I've watched this pattern play out in systems administration over two decades. The pitch is always the same: higher abstractions will democratise specialist work. SREs are "fundamentally different" from sysadmins, Kubernetes "abstracts away complexity."
In practice, I see expensive reinvention. Developers debug database corruption after pod restarts without understanding filesystem semantics. They recreate monitoring strategies and networking patterns on top of CNI because they never learned the fundamentals these abstractions are built on. They're not learning faster: they're relearning the same operational lessons at orders of magnitude higher cost, now mediated through layers of YAML.
Each wave of "democratisation" doesn't eliminate specialists. It creates new specialists who must learn both the abstraction and what it's abstracting. We've made expertise more expensive to acquire, not unnecessary.
Excel proves the rule. It's objectively terrible: 30% of genomics papers contain gene name errors from autocorrect, JP Morgan lost $6bn from formula errors, Public Health England lost 16,000 COVID cases hitting row limits. Yet it succeeded at democratisation by accepting catastrophic failures no proper system would tolerate.
The pattern repeats because we want Excel's accessibility with engineering reliability. You can't have both. Either accept disasters for democratisation, or accept that expertise remains required.
Where have you worked? I have seen this mentality among the smartest, most accomplished people I've come across, people who do things like debug kernel issues at Google Cloud. Yes, those people really do need to know the fundamentals.
The 90% of people building whatever junk their company needs do not. I learned this lesson the hard way after working at both large and tiny companies. It's the people who remain in the bubble of places like AWS and GCP, or people doing hardcore research or engineering, who have this mentality. Everyone else eventually learns.
>Excel proves the rule. It's objectively terrible: 30% of genomics papers contain gene name errors from autocorrect, JP Morgan lost $6bn from formula errors, Public Health England lost 16,000 COVID cases hitting row limits. Yet it succeeded at democratisation by accepting catastrophic failures no proper system would tolerate.
Excel is the largest development language in the world. Nothing (not Python, VB, Java, etc.) even comes close. Why? Because it literally glues the world together. Everything from mega-corporations to government agencies to mom-and-pop bed-and-breakfast operations runs on Excel. The least technically competent people can fiddle around with Excel and get real stuff done, and that work ends up forming critical pathways the business relies on.
It's hard to quantify, but I am putting my stake in the ground: Excel + AI will probably help fix many (but not all) of the issues you talk about.
I haven’t worked anywhere special.
The issues I’m talking about are: “we can’t debug kernel issues, so we run 40 pods and tune complicated load-balancer health-check procedures in order for the service to work well”.
There is no understanding that anything is actually wrong; they think it is just the state of the universe, a physical law that prevents whatever issue it is from being resolved. They aren’t even aware that the kernel is the problem; sometimes they’re not even aware that there is a problem at all. They just scale linearly because they think they must.
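To sketch what that workaround tends to look like in practice (every number here, and the /healthz path, is my own hypothetical illustration, not something from the comment):

```yaml
# Hypothetical container-spec fragment: a probe tuned to route around
# intermittent failures (e.g. kernel-level connection drops) rather
# than diagnose them.
readinessProbe:
  httpGet:
    path: /healthz     # illustrative endpoint
    port: 8080
  periodSeconds: 2     # probe very frequently
  timeoutSeconds: 1
  failureThreshold: 1  # pull the pod from rotation at the first blip
  successThreshold: 3  # only re-admit it once it looks stable again
```

Combined with a large replica count, the load balancer steers traffic around the symptom, and nobody ever asks why connections were dropping in the first place.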
Excel is the largest development platform because it's installed on (pretty much) every corporate PC by default, without having to ask Legal, Security, Finance or IT for approval. If we count Google Sheets as "Excel", the people who don't have access to it are a rounding error, if that.
BUT
With the arrival of Agentic AI, I've literally seen complete non-coders (copywriter, marketing artist, and a Designer) whip up tooling for themselves that saves them literal days of work every week.
Things that would've been a Big Project in the company, requiring the aforementioned holy quadruple's approval along with tying up precious dev + project management hours.
In the end they're "just" simple tools, simulating or simplifying different processes, but in exactly the way they need it done. All built from scratch in less time than it would've taken us to hold the requisite meetings for writing the application spec and allocating the resources needed: "We have time for this on our team backlog in about 6 months..."
None of them are perfect code; some are downright horrible if you look under the hood. But on the other hand they run fully locally, don't touch any external APIs, and just work with the data already on their laptops, more efficiently than the commercial tools (or Excel).
Zapier, N8N and the like _kinda_ gave people this power, by combining different APIs into workflows. But I personally haven't seen this kind of results from them.
Doesn't every personal computing device on the planet have a browser and thus Javascript? Aren't there more mobile devices than laptops and desktops? I'm an Excel dev and I'm pretty sure that Javascript is the largest development language in the world.
K8s absolutely reduced labor. I used to have a sysadmin who kept all our AMI images up to date and who maintained a mountain of bespoke bash scripts to handle startup, teardown, and upgrade of our backends.
Enter K8s in 2017 and life became MUCH easier. I literally have clusters that have been running since then, with the underlying nodes patched and replaced automatically by the cloud vendor. Deployments also "JustWork": they're zero-downtime and nearly instant. How many sysadmins are needed (on my side) to achieve all of this? Zero. Maybe you're thinking of more complex stateful cases like running DBs on K8s, but for the typical app-server workload, it's a major win.
Fair point, but I think you’ve actually illustrated my argument perfectly: you didn’t eliminate the need for specialists, you outsourced them to your cloud vendor. Those underlying nodes being “patched and replaced automatically” by AWS/GCP/Azure? That’s their SRE teams doing exactly the work your sysadmin used to do, just at massive scale. The control plane managing your deployments? Cloud vendor specialists built and maintain that.
And I’d wager you’ve still got people on staff doing operational work, they just don’t have “sysadmin” in their title anymore. Someone’s managing your K8s manifests, debugging why pods won’t schedule, fixing networking issues when services can’t communicate, handling secrets management, setting up monitoring and alerting. That work didn’t vanish, it just got rebranded. The “DevOps engineer” or “platform engineer” or “SRE” doing that is performing sysadmin work under a different job title.
Managed K8s can absolutely reduce operational overhead compared to hand-rolling everything. But that’s not democratisation, that’s a combination of outsourcing and rebranding. The expertise is still required, you’ve just shifted who pays for it and what you call the people doing it.
As an M$ hater from a past life, I have to disagree that it's more expensive. You enumerate the instances where it destroyed value, but can you even count the value it has produced over the years by lowering the entry bar? I don't even use Excel, but it has unarguably produced far more value than it has taken away. I tend to believe history speaks for itself; unethical practices alone won't keep an inferior product entrenched against truly superior ones. 50% of the population aren't stupid by definition, they just specialize in different things.
The work not done by specialists wouldn't have been done nicely by a specialist instead; it simply wouldn't get done at all. We just don't have the scale. Of course there's a fine line, and in some cases it produces negative value, but more often than not the choice is some value, discounted by maintenance cost, versus zero.
We’re agreeing. Excel produced massive value because it accepted catastrophic failures. That’s my point.
The problem isn’t Excel. It’s trying to get Excel’s accessibility in infrastructure whilst demanding engineering reliability. You cannot have both. Kubernetes won’t accept Excel-style disasters, so it still needs specialists; now specialists who must learn the abstraction and the fundamentals.
You’re right: work not done by specialists often wouldn’t happen at all. That’s the choice. Accept Excel-esque failures for democratisation, or accept expertise is required.
My point is that currently available tools promise both, deliver neither.
We're mostly agreeing, except I'm optimistic that the current generation of tools is closer to the assembly > C jump than the C > VB one.
There are good signs AI will eliminate whole classes of costly human errors. Whether the new classes of machine-only problems will cost more as models iterate remains to be seen; I think they will cost less. I'm not super optimistic about the socioeconomic future coming from this, but from a pure tech standpoint I'm optimistic about building cost.
Edit: also, to address reliability, I think a lot of things are net positive to this world without five nines, heck, even two nines.
Edit 2: s/building cost/tco
> accept disasters for democratisation
Will insurance policy coverage and premiums change when using non-deterministic software?
Rather: barely any insurance company will be willing to insure this, because of the high unpredictability and the high costs in case of disaster.
> Excel proves the rule.
I think you’re just seeing popularity.
The extreme popularity and scale of these solutions mean more opportunity for problems.
It’s easy to say X is terrible or Y is terrible but the real question is always: compared to what?
If you’re comparing to some hypothetical perfect system that only exists in theory, that’s not useful.
Democratisation doesn't eliminate specialists. It just ensures the specialists arrive later, under more pressure, with more to unwind.
All abstractions are leaky abstractions. E.g. C is a leaky abstraction because what you type isn't actually what gets emitted (try the same code in two different compilers and one might vectorize your loop while the other doesn't).
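To make the compiler example concrete (the function name is my own, and the exact optimization levels at which each compiler vectorizes vary by version, so treat the comment as approximate):

```c
#include <stddef.h>

/* The C standard says nothing about SIMD, yet this reduction is
 * typically auto-vectorized by clang at -O2 and by gcc at -O3
 * (or -O2 since roughly GCC 12), while lower settings emit a plain
 * scalar loop. Same source, different machine code: the abstraction
 * leaks the moment you care about what actually runs. */
int sum_array(const int *a, size_t n) {
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```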
If Kubernetes didn't in any way reduce labor, then the 95% of large corporations that adopted it must all be idiots? I find that kinda hard to believe. It seems more likely that Kubernetes has been adopted alongside increased scale, such that sysadmin jobs have just moved up to new levels of complexity.
It seems like in the early 2000s every tiny company needed a sysadmin, to manage the physical hardware, manage the DB, custom deployment scripts. That particular job is just gone now.
Kubernetes enabled qualities small companies couldn't dream of before.
I can implement zero-downtime upgrades easily with Kubernetes. No more end-of-day upgrades and late-night debug sessions because something went wrong; I can commit at any time of day and be sure the upgrade will work.
My infrastructure is self-healing. No more crashed app server.
Some engineering tasks are standardized and outsourced to a professional hoster by using managed services. I don't need to manage operating-system updates or some component updates (including Kubernetes itself).
My infrastructure can be easily scaled horizontally. Both up and down.
I can commit changes to git to apply them or I can easily revert them. I know the whole history perfectly well.
Before, I would have needed to reinvent half of Kubernetes to enable all of that. I guess big companies did just that; I never had the resources. So my deployments were not good: they didn't scale, they crashed, they required frequent manual intervention, and downtimes were common. Kubernetes and other modern approaches let small companies enjoy things they couldn't do before, at the expense of a slightly higher devops learning curve.
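A minimal sketch of what buys the zero-downtime behaviour described above (the name, image, and numbers are illustrative, not from any real cluster): a Deployment with a rolling-update strategy and a readiness probe, so old pods are only retired once new ones are actually serving.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app                 # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # never drop below desired capacity
      maxSurge: 1           # bring up one extra pod at a time
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:v2   # placeholder image
          readinessProbe:                      # gate traffic on readiness
            httpGet:
              path: /healthz                   # illustrative endpoint
              port: 8080
```

Keeping this manifest in git and applying it with `kubectl apply -f` gives the revertable history described above, and `kubectl rollout undo deployment/app` rolls back a bad upgrade.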
You’re absolutely right that sysadmin jobs moved up to new levels of complexity rather than disappeared. That’s exactly my point.
Kubernetes didn’t democratise operations, it created a new tier of specialists. But what I find interesting is that a lot of that adoption wasn’t driven by necessity. Studies show 60% of hiring managers admit technology trends influence their job postings, whilst 82% of developers believe using trending tech makes them more attractive to employers. This creates a vicious cycle: companies adopt Kubernetes partly because they’re afraid they won’t be able to hire without it, developers learn Kubernetes to stay employable, which reinforces the hiring pressure.
I’ve watched small companies with a few hundred users spin up full K8s clusters when they could run on a handful of VMs. Not because they needed the scale, but because “serious startups use Kubernetes.” Then they spend six months debugging networking instead of shipping features. The abstraction didn’t eliminate expertise, it forced them to learn both Kubernetes and the underlying systems when things inevitably break.
The early 2000s sysadmin managing physical hardware is gone. They’ve been replaced by SREs who need to understand networking, storage, scheduling, plus the Kubernetes control plane, YAML semantics, and operator patterns. We didn’t reduce the expertise required, we added layers on top of it. Which is fine for companies operating at genuine scale, but most of that 95% aren’t Netflix.
All this is driven by numbers. The bigger you are, the more money they give you to burn. No one is really solving problems; it's 99% managing complexity driven by shifting goalposts. No one wants to build to solve a problem; it's a giant financial circle jerk where everybody wants to sell, rinse, and repeat, and the line must go up. No one says stop, because at 400 mph hitting the brakes will get you killed.
People really look through rose-colored glasses when they talk about late 90s, early 2000s or whenever is their "back then" when they talk about everything being simpler.
Everything was for sure simpler, but also the requirements and expectations were much, much lower. Tech and complexity moved forward with goal posts also moving forward.
Just one example on reliability: I remember popular websites with many thousands if not millions of users putting up an "under maintenance" page whenever a major upgrade came through, sometimes closing shop for hours. And if the maintenance went badly, come back tomorrow, because they weren't coming up.
Proper HA, backups, monitoring were luxuries for many, and the kind of self-healing, dynamically autoscaled, "cattle not pet" infrastructure that is now trivialized by Kubernetes were sci-fi for most. Today people consider all of this and a lot more as table stakes.
It's easy to shit on cloud and kubernetes and yearn for the simpler Linux-on-a-box days, yet unless expectations somehow revert back 20-30 years, that isn't coming back.
> Everything was for sure simpler, but also the requirements and expectations were much, much lower.
This. In the early 2000s, almost every day after school (3PM ET) Facebook.com was basically unusable. The request would either hang for minutes before responding at 1/10th of the broadband speed at that time, or it would just timeout. And that was completely normal. Also...
- MySpace literally let you inject HTML, CSS, and (unofficially) JavaScript into your profile's freeform text fields
- Between 8-11 PM ("prime time" TV) you could pretty much expect to get randomly disconnected when using dial up Internet. And then you'd need to repeat the arduous sign in dance, waiting for that signature screech that tells you you're connected.
- Every day after school the Internet was basically unusable from any school computer. I remember just trying to hit Google using a computer in the library turning into a 2-5 minute ordeal.
But also, and perhaps most importantly, let's not forget: MySpace had personality. Was it tacky? Yes. Was it safe? Well, I don't think a modern web browser would even attempt to render it. But you can't replace the anticipation of clicking on someone's profile and not knowing whether you'll be immediately deafened by loud, blaring background music with no visible way to stop it.