Comment by weitendorf
4 hours ago
Having worked on Cloud Run/Cloud Functions, I think almost every company that isn't itself a cloud provider could be in category 1, with moderately more featureful implementations that actually competed with K8s.
Kubernetes is a huge problem. IMO it's a shitty prototype that industry ran away with (because Google tried to throw a wrench at Docker/AWS when containers and cloud were the hot new things, pretending Kubernetes was basically the same as Borg). Then the community calcified around that prototype state, bought all this SaaS, and structured their production environments around it, and now all these SaaS providers and Platform Engineers/DevOps people who make a living milking money out of Kubernetes users are guarding their gold mines.
Part of the K8s marketing push was rebranding Infrastructure Engineering = building atop Kubernetes (vs operating at the layers at and beneath it), and K8s leaks abstractions/exposes an enormous configuration surface area, so you just get K8s But More Configuration/Leaks. Also, You Need A Platform, so do Platform Engineering too, for your totally unique use case of connecting git to CI to slackbot/email/2FA to your release scripts.
At my new company we're working on fixing this but it'll probably be 1-2 more years until we can open source it (mostly because it's not generalized enough yet and I don't want to make the same mistake as Kubernetes. But we will open source it). The problem is mostly multitenancy, better primitives, modeling the whole user story in the platform itself, and getting rid of false dichotomies/bad abstractions regarding scaling and state (including the entire control plane). Also, more official tooling, and you have to put on a dunce cap if YAML gets within 2 network hops of any zone.
In your example, I think
1. you shouldn't have to think about scaling and provisioning at this level of granularity; it should always be at the multitenant zonal level. This is one of the cardinal sins of Kubernetes that Borg handled much better
2. YAML is indeed garbage, but availability reporting and alerting need better official support; it doesn't make sense for every ecommerce shop and bank to be building this stuff
3. a huge amount of alerts and configs could actually be expressed in business logic if cloud platforms exposed synchronous/real-time billing with the scaling speed of Cloud Run.
If you think about it, so so so many problems devops teams deal with are literally just
1. We need to be able to handle scaling events
2. We need to control costs
3. Sometimes these conflict and we struggle to translate between the two.
4. Nobody lets me set hard billing limits/enforcement at the platform level.
(I implemented enforcement for something close to this for Run/Appengine/Functions, it truly is a very difficult problem, but I do think it's possible. Real time usage->billing->balance debits was one of the first things we implemented on our platform).
5. For some reason scaling and provisioning are different things (partly because the cloud provider is slow, partly because Kubernetes is single-tenant)
6. Our ops team's job is to translate between business logic and resource logic, and half our alerts are basically asking a human to manually make some cost/scaling analysis or tradeoff, because we can't automate that, because the underlying resource model/platform makes it impossible (rough sketch of what I mean below).
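To make that concrete, here's a purely hypothetical sketch (none of these fields or numbers come from a real cloud API; it just illustrates what "alerts expressed as business logic" could look like if real-time usage->billing->balance data and scaling shared one control surface):

```python
# Hypothetical sketch only: no real cloud API exposes this today.
from dataclasses import dataclass


@dataclass
class PlatformSnapshot:
    requests_per_second: float   # current load on the service
    spend_rate_per_hour: float   # real-time billed spend, not day-old exports
    budget_left_today: float     # a hard limit the platform actually enforces


def desired_replicas(s: PlatformSnapshot,
                     rps_per_replica: float = 50.0,
                     max_hourly_spend: float = 40.0) -> int:
    """The cost/scaling tradeoff written as code instead of an alert paging a human."""
    wanted = max(1, round(s.requests_per_second / rps_per_replica))
    if s.budget_left_today <= 0:
        return 1  # hard floor: serve degraded rather than blow the budget
    if s.spend_rate_per_hour > max_hourly_spend:
        # Scale back proportionally: trade latency for spend automatically.
        wanted = max(1, int(wanted * max_hourly_spend / s.spend_rate_per_hour))
    return wanted
```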
You gotta go under the hood to fix this stuff.
Since you are developing in this domain: our challenge with both Lambda and Cloud Run type managed solutions is that they seem incompatible with our service mesh. Cloud Run and Lambdas can only be incorporated into GCP's service mesh if the mesh is managed through GCP as well; anything custom is out of the question. Since we require end-to-end mTLS in our setup, we cannot use Cloud Run.
To me this shows that Cloud Run is more of an end product than a building block, and that hinders adoption: we'd basically need to replicate most of Cloud Run ourselves just to add that tiny bit of also running our sidecar.
How do you see this going in your new solution?
Lots to unpack here.
I will just say, based on recent experience, that the fix is not "Kubernetes bad"; it's that Kubernetes is not a product platform. It's a substrate, and most orgs actually want a platform.
We recently ripped out a barebones Kubernetes product (like Rancher but not Rancher). It was hosting a lot of our software development apps like GitLab, Nexus, Keycloak, etc.
But in order to run those things, you have to build an entire platform and wire it all together. This is on premises running on vxRail.
We ended up discovering that our company had an internal software development platform based on EKS-A; it comes with auto-installers for all the apps and includes ArgoCD to maintain state and orchestrate new deployments.
The previous team did a shitty job DIY-ing the prior platform. So we switched to something more maintainable.
If someone made a product like that then I am sure a lot of people would buy it.
There are plenty of PaaS components that run on k8s if you want to use them. I'm not a fan, because I think giving developers direct access to k8s is the better pattern.
Managed k8s services like EKS have been super reliable the last few years.
YAML is fine; it's just a configuration language.
> you shouldn't have to think about scaling and provisioning at this level of granularity, it should always be at the multitenant zonal level, this is one of the cardinal sins Kubernetes made that Borg handled much better
I'm not sure what you mean here. Managed k8s services, and even k8s clusters you deploy yourself, can autoscale across AZs. This has been a feature for many years now. You just set a topology key on your pod template spec and your pods will spread across the AZs, easy.
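For anyone who hasn't done it, here's a minimal sketch using the official Python kubernetes client (the "web"/nginx names are placeholders; the equivalent YAML is just a topologySpreadConstraints block on the pod template spec):

```python
# Minimal sketch: spread a Deployment's pods across zones by setting
# topology_spread_constraints on the pod template spec, keyed on
# topology.kubernetes.io/zone. "web" and nginx are placeholder names.
from kubernetes import client, config

config.load_kube_config()

labels = {"app": "web"}

spread = client.V1TopologySpreadConstraint(
    max_skew=1,
    topology_key="topology.kubernetes.io/zone",
    when_unsatisfiable="ScheduleAnyway",
    label_selector=client.V1LabelSelector(match_labels=labels),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="web"),
    spec=client.V1DeploymentSpec(
        replicas=6,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(
                topology_spread_constraints=[spread],
                containers=[client.V1Container(name="web", image="nginx")],
            ),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```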
For most tasks you'd want to do to deploy an application, there's an out-of-the-box solution for k8s that already exists. There have been millions of labor-hours poured into k8s as a platform; unless you have some extremely niche use case, you are wasting your time building an alternative.
Every time I’ve pushed for Cloud Run at jobs that were on or leaning towards k8s, I was looked at as a very unserious person. Like you can’t be a “real” engineer if you’re not battling YAML configs and ArgoCD all day (and all night).
It does have real tradeoffs/flaws/limitations, chief among them that Run isn't allowed to "become" Kubernetes; you're expected to "graduate". There's been an immense marketing push for Kubernetes and Platform Engineering and all the associated SaaS sending the same message (also, notice how much less praise you hear about it now that the marketing has died down?).
The incentives are just really messed up all around. Think about all the actual people working in devops who have their careers/jobs tied to Kubernetes, how many developers get drawn in by the allure and marketing because it lets them work on more fun problems than their actual job, all the provisioned instances and vendor software and certs and conferences, and all the money that represents.