Comment by reactordev
17 hours ago
I read this and have the opposite experience. Your monolith will fester as developers step on each other’s toes. You aren’t solving for scalability, you’re solving for sovereignty: giving other teams the ability to develop their own service without needing to conform to your archaic grey beard architecture restrictions and your lack of understanding what a pod is or how to get your logs from your cloud.
No, this whole article reads like someone crying that they no longer have their AS/400. Bye. The reason people use AWS and all those 3rd-party services is so they don’t have to reinvent the wheel, which this author seems hell-bent on doing.
Why are we using TCP when a Unix file is fine… why are we using databases when a directory of files is fine? Why are we scaling when we aren’t Google and my single machine can serve a webpage? Why am I getting paid to be an engineer while eschewing all the things we have advanced over the last two decades?
Yeah, these are not the right questions. The real question should be: “Now that we have scale, what are we gonna do with it?”
> Giving other teams the ability to develop their own service without needing to conform to your archaic grey beard architecture restrictions
IME at many different SaaS companies, the only one that had serious reliability was the one that had “archaic grey beard architecture restrictions.” Devs want to use New Shiny X? Put a formal request before the architectural review committee; they’ll read it, then explain how what the team wants already exists in a different form.
I don’t know why so many developers - notably, not system design experts, and without any background in infrastructure - think that they know better than the gray beards. The gray beards have seen some shit.
> and your lack of understanding what a pod is or how to get your logs from your cloud.
No one said the gray beards don’t know this. At the aforementioned company, we ran hybrid on-prem and AWS, and our product was hybrid K8s and traditional Linux services.
Re: cloud logs, every time I’ve needed logs it has been faster for me to ssh onto the instance (assuming it wasn’t ephemeral) and use ripgrep. If I don’t know where the logs were emitted from, I’ll find that first, then ssh. The only logging-as-a-service product I’ve used that was worth a damn was Sumo Logic, but I have no idea how they are now, as that was years ago.
Splunk was (and is) the gold standard for centralized logging. The main problem with it now is that it's crazy expensive, though the operational burden of running it well is also non-zero and has to be accounted for. But being able to basically grep across all logs on the whole fleet, and then easily visualize those results, made me never want to go back to ssh-ing somewhere and running grep manually. I could write a script to ssh to all the app servers, grab the past 15 minutes of requests, extract their IPs, and plot them on a map to see which countries are hot, but that would be annoying enough that I'd really have to want to do it.
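For concreteness, here is a rough Python sketch of that one-off script. The hostnames, nginx log path, log format, and GeoLite2 database location are all placeholders I made up, and it only counts requests by country rather than plotting a map:

    # Rough sketch: ssh to each app server, pull recent access-log lines,
    # and count requests per country. Hostnames, log path, log format, and
    # the GeoIP database path are assumptions; a real fleet will differ.
    import collections
    import datetime
    import re
    import subprocess

    import geoip2.database  # pip install geoip2; needs a local GeoLite2 .mmdb
    import geoip2.errors

    APP_SERVERS = ["app01.example.com", "app02.example.com", "app03.example.com"]
    LOG_PATH = "/var/log/nginx/access.log"                # assumed location
    GEOIP_DB = "/usr/share/GeoIP/GeoLite2-Country.mmdb"   # assumed location
    WINDOW = datetime.timedelta(minutes=15)

    # Assumed combined log format: "<ip> - <user> [10/Oct/2000:13:55:36 -0700] ..."
    LINE_RE = re.compile(r"^(\S+) \S+ \S+ \[([^\]]+)\]")

    def recent_ips(host: str, cutoff: datetime.datetime) -> list[str]:
        """ssh to one host, tail the access log, keep client IPs newer than cutoff."""
        # Crude: just grab a big chunk of the tail instead of seeking by time.
        out = subprocess.run(
            ["ssh", host, "tail", "-n", "200000", LOG_PATH],
            capture_output=True, text=True, check=True,
        ).stdout
        ips = []
        for line in out.splitlines():
            m = LINE_RE.match(line)
            if not m:
                continue
            ip, stamp = m.groups()
            ts = datetime.datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
            if ts >= cutoff:
                ips.append(ip)
        return ips

    def main() -> None:
        cutoff = datetime.datetime.now(datetime.timezone.utc) - WINDOW
        counts: collections.Counter[str] = collections.Counter()
        with geoip2.database.Reader(GEOIP_DB) as reader:
            for host in APP_SERVERS:
                for ip in recent_ips(host, cutoff):
                    try:
                        counts[reader.country(ip).country.name or "unknown"] += 1
                    except geoip2.errors.AddressNotFoundError:
                        counts["unknown"] += 1
        for country, n in counts.most_common():
            print(f"{n:8d}  {country}")

    if __name__ == "__main__":
        main()

Even this toy version hard-codes the fleet, guesses at the log format, and skips the map entirely, which is exactly why I'd rarely bother writing it.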
Meanwhile if you have Splunk, you specify the logfile name and how to extract the IP, then append “| iplocation clientip | geostats count by Country” to see which countries requests are coming from, for example. Or append “| stats count by http_version”, click pie chart, and get a visualization that breaks down how much traffic is still on HTTP/1.0 and 1.1, who’s on HTTP/2, and who’s moved to HTTP/3 over QUIC.
> step on each other’s toes
Which leads us to a huge problem I’ve seen over the past few decades.
Too many developers for the task at hand. It’s easier for large companies to hire 100 developers at a lower bar, who may or may not be a great fit, than it is to hire 5 experts.
Then you have 100 developers that you need to keep busy, and not all of them can be busy 100% of the time, because most people aren’t good at making their own impactful work. Then, instead of trying to find naturally separate projects for some of them to do, you artificially break up your existing project in a way that 100 developers can work on together (and enforce those boundaries through a network).
This artificial separation fixes some issues (merge conflicts, some deployment issues), but it causes others (everything is a distributed system now, multi-stage and multi-system deployments required for the smallest changes, massive infrastructure, added network latency everywhere).
That’s not to say that no problem is big enough to need a huge number of devs, but the vast majority aren’t.
> they don’t have to reinvent the wheel
Everything is a tradeoff, but we shouldn’t discount the cost of using generic solutions in place of bespoke ones.
Generic solutions are never going to be as good a fit as something designed to do exactly what you need. Sometimes the tradeoff is worth it. Sometimes it isn’t. Like when you need to horizontally scale just to handle the overhead. Or when you have to maintain a fork of a complex system that does way more than you need.
It’s the same problem as hiring 100 generic devs instead of 5 experts. Sometimes worth it. Sometimes not.
There’s another issue here too: if not enough people are reinventing the wheel, we get stuck in local optima.
The worst part is that most people don’t spend enough time thinking about these issues to make informed decisions about the tradeoffs they’re making.