I do think the Talos model has kinda superseded this when it comes to repeatable deployment tbh
What is the Talos model?
I'm trying to understand why people are spinning up so many k8s clusters that they need a tool to do it for them?
I have one. And it's managed. I don't think there's significant cost savings to going unmanaged, but maybe. Even so, why would I need a ton of them?
> And it's managed.
You can’t use managed cloud offerings on-prem, or when your clients have a server room of their own. Same for a homelab.
Also it’s nice not to shift the pets attitude from servers to clusters, and instead treat everything as cattle. Provided you have backups of persistent data, the config versioned in a Git repo, and maybe some Ansible in the mix, being able to recreate an environment after a fuckup is nice and also helps against bit rot.
Disclaimer: I actually prefer Docker Swarm/Compose over K8s due to simplicity (which matches my deployments and scale), but in the cases where I had to use some variant of K8s, going with K3s was pretty okay.
If you peel off all the layers, Docker Swarm and K8s technically end up at the same level of complexity. K8s just has a lot of explicit concepts; as an operator, I would argue you face the same network, storage, and compute complexities either way.
Because they are selling a “pro” version as part of their commercial product SlicerVM. It has more features for operating a k3s cluster.
You're cool if you manage your own K8S cluster.
It's applied big brain memetics. k8s turned pet servers into cattle. People then do the next step and want to treat their clusters as cattle as well. Also it has a bit of the "can it run DOOM" vibe to treat whole k8s clusters like this.
I went RKE2; k3s is nice, but a little too minimal for my tastes. With a few hundred MB of RAM used, I've got an internal container registry, OpenBao for secrets, Caddy for edge TLS, RabbitMQ, and PowerDNS for exposing k8s ingress. Plus all the standard network policies, which, while verbose, get me nearly all the way to traditional firewalls and networking.
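For anyone curious what "verbose but gets you most of the way there" looks like in practice, a minimal default-deny ingress policy is the usual starting point. This is just a sketch; the namespace name is made up:

```yaml
# Deny all ingress to pods in the namespace unless another
# NetworkPolicy explicitly allows it (namespace is a placeholder).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: internal-apps
spec:
  podSelector: {}        # empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
```

From there you layer on explicit allow rules per workload, which is where the verbosity comes from.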
I used this for a bit a few years ago but eventually needed something that was hard or impossible in k3sup and just went to using the k3s tools directly. My deployment script actually got simpler after removing k3sup.
Also, fun fact, k3sup is pronounced "ketchup" according to the README[0]
[0]: https://github.com/alexellis/k3sup/blob/master/README.md
The pronunciation ketchup is somewhat unfortunate as a popular backup operator for k8s, k8up, also claims this.
What's the point? You can bootstrap k3s with "curl -sfL https://get.k3s.io | sh -". If you need to do that over ssh it works just fine. If you're doing it on multiple hosts, you should be using Ansible.
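To illustrate the over-ssh case: the install script's documented `K3S_URL`/`K3S_TOKEN` env vars make joining agents a one-liner per host. A sketch, with a hypothetical helper and placeholder IP/token:

```shell
# Hypothetical helper: build the k3s agent join command for a worker node.
# K3S_URL and K3S_TOKEN are the install script's standard env vars.
build_join_cmd() {
  server="$1"
  token="$2"
  printf 'curl -sfL https://get.k3s.io | K3S_URL=https://%s:6443 K3S_TOKEN=%s sh -' "$server" "$token"
}

# Then for each node (host and credentials are placeholders):
#   ssh ubuntu@node1 "$(build_join_cmd 10.0.0.10 "$TOKEN")"
build_join_cmd 10.0.0.10 mytoken
```

Loop that over an inventory file and you've basically reinvented the relevant Ansible role.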
I can bootstrap an entire RKE2 VM (VM + RKE2 + join cluster) in like 5 mins with Salt (although I have no reason to think you couldn't do it with Ansible).
It's a cool project, but I didn't think the K3s part was the hard part.
You can pretty much install it without ssh in under 60s. The fun starts after it has been installed.
We have been running into a lot of issues in production with k3s. So I embarked on a journey of writing a Kubernetes-compliant and equivalent platform in Rust with the help of Claude [1]. It is a fun little project for now, still figuring out stuff; the idea is to keep it minimal and a single binary with everything embedded, including the CNI, and to support various runtimes like Docker, containerd, etc., but also Wasm, VMs, and the JVM.
[1] https://github.com/debarshibasak/superkube
Very interesting!
Architecturally, where do you run Postgres? I assume it would be external to the cluster? (Doing it internally would create a circular dependency?)
Yes, it is external to the cluster.
If you want to do a quick setup, it creates a SQLite DB for the metadata.
You have to be careful trying to do this kind of thing. The problems you describe having below are problems with peripheral components, not k3s itself. The runtime handles garbage collection and image pinning. Your embedded runtime is using libcontainer, the same thing containerd uses, so the behavior should be identical.

Since you support other runtimes, how they handle image pinning, if they support it at all, will vary. Whether or not you embed the CNI plugins and networking controllers, you're seemingly still using CNI since that's how container runtimes attach containers to a network, so whatever problems you had with CNI before would still happen.

The DR VM not wanting to join sounds like it was probably due to etcd storing node IPs in the cluster member metadata. If you transfer that to a new host and it doesn't have the same IP, you need to first correct that metadata out of band, which no Kubernetes distro I'm aware of handles automatically, but it's a simple etcdctl one-liner. You also need to make sure the client certificate you're using to authenticate with etcd is reissued with the new host IP in its IP SANs, which k3s does do automatically.

If you're not using etcd, well, good in a way because it has a lot of cruft and I'm not a fan, but that will be difficult to support because the entire Kubernetes API and many third-party controllers are all designed around how etcd works. k3s doesn't actually require etcd and can use any SQL-based RDBMS thanks to its kine compatibility shim.
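For anyone who hits that DR scenario: the out-of-band metadata fix is roughly the following. This is a sketch; the member ID and IP are placeholders you'd get from `etcdctl member list` against a healthy endpoint:

```shell
# Sketch: point etcd's member metadata at the restored node's new IP.
# MEMBER_ID and NEW_IP are placeholders, not real values.
MEMBER_ID=8e9e05c52164694d
NEW_IP=10.0.0.42
CMD="etcdctl member update $MEMBER_ID --peer-urls=https://$NEW_IP:2380"
echo "$CMD"   # run this against the surviving cluster before the node rejoins
```

After that, the restored node can rejoin under its new address, assuming its etcd client cert has also been reissued with the new IP in its SANs.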
With all respect, "building it because I want to" and "working toward making (it) production grade" doesn't inspire a ton of confidence. k3s has been part of the CNCF for many years and its developer Darren Shepherd was the founding CTO for both cloud.com and Rancher Labs, which were acquired by Citrix and SUSE. It looks like you're running your own B2B company and hoping to swap out k3s as the underlying engine for multitenancy. That's very risky. Surely Claude can help you understand and use k3s just as readily as help you write a replacement, and I'm sure SUSE sells professional services. I have no clue what they charge but typically you're talking like $300 an hour and you'd probably only need 40 hours.
Do you have a writeup of what problems you ran into?
We do, let me check with my team and post it here.
There were many issues. Top of mind: after a DR drill where a VM was booted, the node did not join the cluster. Apart from that, a bunch of issues due to etcd and Longhorn.
Another major one was that the CNI stopped working for a particular node. Garbage collection for images was another: we labelled the images, and it would still remove them from the node.
A bunch of these kinds of issues, when our requirement is fairly straightforward. Therefore we are working towards a stripped-down version.
There is a lot of operational complexity in general, and most of us can do without it.
The best part of k8s is the network, but most agentic systems presume no network, since it's a security concern. What are the scenarios where you'd want to spin up k3sup?
I use the official `ansible-playbook k3s.orchestration.site -i inventory.yml` and it installs k3s over SSH and adds it to my kubectl context, all in under 60s too.
I have just been doing `ssh ... -- k3s.sh ...`; been meaning to Ansible my homelab.