Comment by kevmo314

20 days ago

The wildest part is they’ll take those massive machines, shard them into tiny Kubernetes pods, and then engineer something that “scales horizontally” with the number of pods.

Yeah man, you're running on a multitasking OS. Just let the scheduler do the thing.

  • Yeah this. As I’ve explained to people many times, processes are the only virtualisation you need if you aren’t running a fucked up pile of shit.

    The problem we have is fucked up piles of shit, not that we don’t have Kubernetes or containers.

    • Maybe you are right about Kubernetes; I don't have enough experience to have an opinion. I disagree about containers, though, especially the wider Docker toolchain.

      It is not that difficult to understand a Dockerfile and use containers (see the example below). Containers, from a developer POV, solve the problem of reliably reproducing development, test, and production environments and workloads, and of distributing those changes to a wider environment. It is not perfect, it's not 100% foolproof, and it's not without its quirks or learning curve.

      However, there is a reason Docker has become as popular as it is today (not only containers, but also Dockerfiles and Docker Compose): it strikes a good tradeoff between competing concerns, which makes it a highly productive solution.
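
      For illustration, here is a hypothetical minimal multi-stage Dockerfile (image names and paths are made up) of the kind that pins a whole build and runtime environment in a few declarative lines:

        # Build stage: compile in a pinned toolchain image.
        FROM golang:1.22 AS build
        WORKDIR /src
        COPY . .
        RUN CGO_ENABLED=0 go build -o /app ./cmd/server

        # Runtime stage: ship only the static binary.
        FROM gcr.io/distroless/static
        COPY --from=build /app /app
        ENTRYPOINT ["/app"]

      Anyone with the repo gets the same build and runtime environment from "docker build", which is the reproducibility point above.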

    • Hahhah, yuuuup.

      I can maybe make a case for running in containers if you need some specific security properties, but mostly I think the proliferation of 'fucked up piles of shit' is the problem.

    • Containers are just processes plus some namespacing (see the sketch below); nothing really stops you from running very large tasks on Kubernetes nodes. I think the argument for containers and Kubernetes is pretty good owing to their operational advantages (OCI images for distributing software, distributed cron jobs in Kubernetes, observability tools like Falco, and so forth).

      So I totally understand why people preemptively choose Kubernetes before they are scaling to the point where a distributed scheduler is strictly necessary. With Hadoop, on the other hand, you're definitely paying a large upfront cost for scalability you very well might not need.
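
      To make the "processes plus namespacing" point concrete, here's a minimal Go sketch (Linux-only, needs root; a toy, not how Docker or Kubernetes actually launch workloads) that starts a shell in fresh UTS, PID, and mount namespaces:

        package main

        import (
            "os"
            "os/exec"
            "syscall"
        )

        func main() {
            // From the host's point of view this is just another child process.
            // Inside, the shell is PID 1 in a fresh PID namespace, and hostname
            // changes stay confined to its own UTS namespace.
            cmd := exec.Command("sh")
            cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
            cmd.SysProcAttr = &syscall.SysProcAttr{
                Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
            }
            if err := cmd.Run(); err != nil {
                panic(err)
            }
        }

      Everything a container runtime adds on top (cgroups, image layers, networking) is layered over exactly this mechanism.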

    • Disagree.

      Different processes can need different environments.

      I advocate for something lightweight like FreeBSD jails.

    • Yes, Sun already had the marketing message “The network is the computer” back in the 1980s; we were doing microservices with plain OS processes.

  • It's all fun and games until the control plane gets killed by the OOM killer.

    Naturally, that detaches all your containers. And there's no seamless reattach after a control-plane restart.

    • Or your CNI implementation is made of rolled-up turds and you lose a node or two from the cluster control plane every day.

      (Large EKS cluster)

  • Until you need to schedule GPUs or other heterogeneous compute...

    • Are you saying that running your application in a pile of containers somehow helps with that problem? It's the same problem as CPU scheduling; we just don't have good schedulers yet. Lots of people are working on it, though.

This is especially aggravating when the OS inside the container and the language runtimes are much heavier than the process itself.

I've seen arguments for nano services (I wouldn't even call them micros services), that completely ignored that part. Split a small service in n tiny services, such that you have 10(os, runtime, 0.5) rather than 2(os, runtime, x).

  • There is no OS inside the container: containers share the host kernel, and a base image only supplies a userland filesystem. That's a big part of the reason containerization is so popular as a replacement for heavier alternatives like full virtualization. I get that it's a bit confusing with base image names like “ubuntu” and “fedora”, but that doesn't mean that there is a nested copy of Ubuntu/Fedora running for every container.
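
    A quick way to convince yourself, sketched in Go (assumes Linux; a hypothetical toy program): print the kernel release via uname(2) on the host and inside any container image on the same machine, and both report the same kernel, because a container never boots one of its own.

      package main

      import (
          "fmt"
          "syscall"
      )

      func main() {
          // Ask the kernel for its release string. Run inside a container,
          // this still reports the host kernel, whatever the base image says.
          var u syscall.Utsname
          if err := syscall.Uname(&u); err != nil {
              panic(err)
          }
          release := make([]byte, 0, len(u.Release))
          for _, c := range u.Release {
              if c == 0 {
                  break
              }
              release = append(release, byte(c))
          }
          fmt.Println(string(release))
      }

    Running "uname -r" inside and outside the container shows the same thing with less ceremony.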

To be fair, each of those pods can have its own dedicated external storage volume, which may actually help, and it's definitely easier than maintaining 200-odd iSCSI (or whatever) targets yourself.

I mean, a large part of the point is that you can run on separate physical machines, too.