
Comment by progval

18 hours ago

So this replaces a SUID binary in order to run as UID 0. The website claims it can escape "Kubernetes / container clusters" and "CI runners & build farms", but I don't see anything supporting the claim that it can escape a container (or, specifically, a user namespace).

I ran the exploit in rootless Podman, and predictably it doesn't escape the container.

They also claim their script "roots every Linux distribution shipped since 2017.", but they only tested four, and it doesn't work on Alpine.

>The website claims it can escape "Kubernetes / container clusters" and "CI runners & build farms" but I don't see anything supporting the claim it can escape a container

They state that the write-up is forthcoming. Presumably there are some additional steps or modifications that will be detailed in 'part 2'.

"Next: "From Pod to Host," how Copy Fail escapes every major cloud Kubernetes platform."

It overwrites bytes in memory of any file you can read. It's not hard to imagine how it could escape a lot of things.

> They also claim their script "roots every Linux distribution shipped since 2017.", but only tested four; and it doesn't work on Alpine

They've done themselves no favours at all with their write up.

It does seem legitimate (I was able to use the PoC on a 24.04 instance), and it seems like it should be a big deal, but the actual number of affected distributions seems way lower than their claim of every distribution since 2017.

For example with Ubuntu, if I'm reading it right, there's some impact in 16.04 (EOL), but beyond that, at least per their analysis, only the vendor-specific 6.17 kernels they ship have it (e.g. linux-gcp, linux-oracle-6.7 etc.). That's a relatively new kernel version that they started shipping recently, after it was released upstream last September.

  • I mean, it doesn't work on any SELinux system, but it's still quite severe anyhow

    • Have you got any info about this? 'seinfo -c' shows there is an alg_socket class. I presume the create permission on it is required to be able to create an AF_ALG socket:

          $ sesearch -A -c alg_socket -p create
          allow bluetooth_t bluetooth_t:alg_socket { accept append bind connect create getattr getopt ioctl listen lock read setattr setopt shutdown write };
          allow container_device_plugin_init_t container_device_plugin_init_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_device_plugin_t container_device_plugin_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_device_t container_device_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_engine_t container_engine_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_init_t container_init_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_kvm_t container_kvm_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_logreader_t container_logreader_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_logwriter_t container_logwriter_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_t container_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow container_userns_t container_userns_t:alg_socket { accept append bind connect create getattr getopt ioctl lock map read setattr setopt shutdown write };
          allow openshift_app_t openshift_app_t:alg_socket { append bind connect create getattr getopt ioctl lock read setattr setopt shutdown write };
          allow openshift_t openshift_t:alg_socket { append bind connect create getattr getopt ioctl lock read setattr setopt shutdown write };
          allow spc_t unlabeled_t:alg_socket { append bind connect create getattr getopt ioctl lock read setattr setopt shutdown write };
          allow staff_t staff_t:alg_socket { append bind connect create getopt ioctl lock read setattr setopt shutdown write };
          allow sysadm_t sysadm_t:alg_socket { accept append bind connect create getopt ioctl listen lock read setattr setopt shutdown write };
          allow unconfined_domain_type domain:alg_socket { accept append bind connect create getattr getopt ioctl listen lock map name_bind read recv_msg recvfrom relabelfrom relabelto send_msg sendto setattr setopt shutdown write };
          allow user_t user_t:alg_socket { append bind connect create getopt ioctl lock read setattr setopt shutdown write };
      

      ... that's a lot of domains, including container_t and user_t; and obviously anything unconfined_t can't be expected to be restricted.

      (Maybe you & others are specifically thinking of Android's policy?)
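      To make the "that's a lot of domains" point easier to check on another policy, here is a minimal sketch that pulls the source domains out of sesearch output like the listing above. It assumes every rule is printed on one line in the braced form shown; `domains_allowed` and `RULE_RE` are names made up for this example, and the sample text is an abbreviated copy of two of the rules, not real tool output.

```python
import re

# Matches one-line rules in the braced form shown in the listing, e.g.
#   allow user_t user_t:alg_socket { append bind create ... };
# Assumption: rules with a single unbraced permission are not handled.
RULE_RE = re.compile(
    r"^\s*allow\s+(\S+)\s+(\S+):alg_socket\s+\{([^}]*)\}\s*;"
)


def domains_allowed(sesearch_output: str, permission: str = "create") -> list[str]:
    """Return the source domains whose alg_socket rule grants `permission`."""
    domains = []
    for line in sesearch_output.splitlines():
        match = RULE_RE.match(line)
        if match and permission in match.group(3).split():
            domains.append(match.group(1))
    return domains


# Abbreviated sample (permission sets shortened for readability).
sample = """\
allow container_t container_t:alg_socket { accept append bind connect create getattr };
allow user_t user_t:alg_socket { append bind connect create getopt };
"""

print(domains_allowed(sample))  # ['container_t', 'user_t']
```

      Feeding it the real output of `sesearch -A -c alg_socket -p create` would give the full list for whatever policy is loaded on that machine.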

If you can get to real UID 0 from a rootless container, you can escape it, but you do need to take extra steps. Same with it working on Alpine: the underlying vulnerability probably still exists, but the script might need some adjusting. It's a PoC, not a full exploit for every situation.

  • It's worth pointing out that you cannot, definitionally, get "real UID 0" in a "rootless" container, because then it wouldn't be a rootless container. This is relevant because this exploit doesn't claim to be able to bypass user namespaces, and getting "real UID 0" would require a different exploit.

    • The underlying exploit allows writing arbitrary values to the page cache, independent of any namespacing, so it should be assumed to allow container escapes even if the given PoC code doesn't do that.


Kubernetes 1.33 turns user namespace support on by default, which I imagine is the same underlying mechanism that rootless Podman uses. Setting `hostUsers: false` on a pod is the way to ensure that root in the pod is not root on the host. It's trivial for a real (unmapped) root to escape a Kubernetes pod.
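For anyone wanting to see whether "root" in their container is mapped or unmapped, a rough sketch is to read the three-column `/proc/self/uid_map` (inside ID, outside ID, range length). The function names below are made up for this example, and the two sample maps are typical shapes rather than output captured from a real system. One caveat: the outside IDs are relative to the parent user namespace, so with nested namespaces a result of 0 does not by itself prove host root.

```python
def parse_uid_map(text: str) -> list[tuple[int, int, int]]:
    """Parse uid_map text into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, count = (int(f) for f in fields)
            entries.append((inside, outside, count))
    return entries


def outer_uid(entries, uid=0):
    """Translate a UID inside the namespace to the parent namespace, or None."""
    for inside, outside, count in entries:
        if inside <= uid < inside + count:
            return outside + (uid - inside)
    return None


def root_is_parent_root(uid_map_text: str) -> bool:
    """True when UID 0 inside maps to UID 0 in the parent namespace."""
    return outer_uid(parse_uid_map(uid_map_text), 0) == 0


# Illustrative shapes: a rootless-style map and an initial-namespace map.
rootless_map = "         0       1000          1\n         1     100000      65536\n"
host_map = "         0          0 4294967295\n"

print(root_is_parent_root(rootless_map), root_is_parent_root(host_map))  # False True
```

Inside a container you would pass it the contents of `/proc/self/uid_map` instead of the sample strings.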

Their PoC does as you say, but it is built upon arbitrary modification of the page cache, which could be abused for other things.

Did you try it on systems that don't have the patch already? Seems many distributions already shipped kernels with the patch ~a month ago.

  • Yes. Alpine in rootless Podman doesn't work (after replacing "/usr/bin/su" with "/bin/su" in the .py, running the .py just doesn't do anything) while it does in Debian in rootless Podman on the same host.
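    When comparing distributions like this, it can help to first check what `su` actually is on each one. The sketch below only inspects the file: it resolves the path, follows symlinks, and reports whether the setuid bit is set. Whether any difference it shows explains the Alpine result is an assumption, not something established in this thread; `is_setuid` and `describe_su` are names made up for the example.

```python
import os
import shutil
import stat


def is_setuid(mode: int) -> bool:
    """True when the setuid bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)


def describe_su(name: str = "su"):
    """Locate `name` on PATH and report its resolved path and setuid bit."""
    path = shutil.which(name)
    if path is None:
        return None  # not installed or not on PATH
    real = os.path.realpath(path)  # follow symlinks to the actual file
    return {
        "path": path,
        "realpath": real,
        "setuid": is_setuid(os.stat(real).st_mode),
    }


print(describe_su())
```

    Running it in both the Debian and the Alpine container shows whether the two targets differ in path, symlink target, or setuid bit.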

It also doesn't work on Raspberry Pi, though presumably it could easily be made to; it does replace the su binary, but the replacement is not executable.

  • It's patching the binary in memory, so the binary patch would be architecture dependent. The existing one is only x86_64, but with an updated payload, it would work on arm.

  • This is because the `su` binary is replaced with x86 shellcode; replace it with aarch64 shellcode and it will work just the same.
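    As a quick way to confirm which architecture a given binary (or payload) is built for, here is a small sketch that reads the `e_machine` field from an ELF header. The header layout and the machine constants come from the ELF specification; the function name, the lookup table, and the hand-built sample header are made up for this example.

```python
import struct

# e_machine values from the ELF specification.
EM_NAMES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture name encoded in the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 (after e_ident and e_type).
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


# Hand-built 20-byte header: 64-bit, little-endian, e_type=3, e_machine=183.
fake_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 183)

print(elf_machine(fake_header))  # aarch64
```

    On a real system you would pass it the first 20 bytes of the file in question, for example the `su` binary on the Raspberry Pi.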