
Comment by petters

1 month ago

> Older alternatives like sandbox-2 exist, but they provide isolation near the OS level, not the language level. At that point we might as well use Docker or VMs.

No, no, Docker is not a sandbox for untrusted code.

What if I told you that, back in the day, we were letting thousands of untrusted, unruly, mischievous people execute arbitrary code on the same machine, and somehow, the world didn't end?

We live in a bizarre world where somehow "you need a hypervisor to be secure" and "to install this random piece of software, run curl | sudo bash" can live next to each other and both be treated seriously.

  • 9/10 times I see curl | sudo bash mentioned, it's about it being bad, so I don't think that's a good comparison.

It depends on your threat model, but generally speaking I would not trust default container runtimes as a true sandbox.

The kata-containers [1] runtime takes a container and runs it as a virtual host. It works with Docker, podman, k8s, etc.

It's a way to get the convenience of a container with the benefits of a virtual host.

This is not the be-all and end-all (there are more options), but it is a convenient one that is better than typical containers.

[1] - https://katacontainers.io/
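As a rough sketch of what "works with Docker" can look like in practice: `docker run` accepts a `--runtime` option that selects an alternative OCI runtime. The helper below only builds the command line and does not execute it. The runtime name `kata-runtime` is an assumption; the actual name depends on how the runtime is registered in your Docker daemon configuration, and the helper name itself is hypothetical.

```python
import shlex


def docker_run_with_runtime(image, command, runtime="kata-runtime"):
    """Build (but do not execute) a `docker run` argv that selects an OCI runtime.

    `runtime` must match a runtime name registered with the Docker daemon;
    "kata-runtime" is an assumed name used here for illustration only.
    """
    return ["docker", "run", "--rm", "--runtime", runtime, image, *command]


argv = docker_run_with_runtime("alpine:3", ["uname", "-r"])
print(shlex.join(argv))
```

If the runtime is set up correctly, running the resulting command under a VM-backed runtime would typically report the guest kernel version rather than the host's, which is a quick way to check that the container is not sharing the host kernel.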

I don't think it is generally possible to escape from a Docker container in its default configuration (e.g. `docker run --rm -it alpine:3 sh`) if you have a reasonably up-to-date kernel from your distro. AFAIK a lot of kernel LPEs use features like unprivileged user namespaces and io_uring, which are not available in a container by default, and truly unprivileged kernel LPEs seem to be sufficiently rare.

  • The kernel policy is that any distro that isn't using a rolling release kernel is unpatched and vulnerable, so "reasonably up-to-date" is going to lean heavily on what you consider "reasonable".

    LPEs abound: unprivileged user namespaces were a whole gateway that was closed, io_uring was hot for a while, eBPF is another great target, and I'm sure more will be found every year, as has been the case. Seccomp, unprivileged containers, etc. make a huge difference in stomping out a lot of the attack surface; you can decide how comfortable you are with that, though.

    • >The kernel policy is that any distro that isn't using a rolling release kernel is unpatched and vulnerable, so "reasonably up-to-date" is going to lean heavily on what you consider "reasonable".

      I would expect major distributions to have embargoed CVE access specifically to prevent this issue.


You're right, Docker isn't a sandbox for untrusted code. I mentioned it because I've seen teams default to using it for isolating their agents on larger servers. So I made sure to clarify in the article that it's not secure for that purpose.

  • It depends on the task, and the risk of isolation failure. Docker can be sufficient if inputs are from trusted sources and network egress is reasonably limited.

Show me how you will escape a docker sandbox.

  • This is a well understood and well documented subject. Do your own research.

    Start here to help give you ideas for what to research:

    https://linuxsecurity.com/features/what-is-a-container-escap...

    • This kind of response isn't helpful. He's right to ask about the motivations for the claim that containers in general are "not a sandbox" when the design of containers/namespaces/etc. looks like it should support using these things to make a sandbox. He's right to be confused!

      If you look at the interface contract, both containers and VMs ought to be about equally secure! Nobody is an idiot for reading about the two concepts and arriving at this conclusion.

      What you should have written is something about your belief that the inter-container, intra-kernel attack surface is larger than the intra-hypervisor, inter-kernel attack surface, and so it's less likely that someone will screw up implementing a hypervisor so as to open a security hole. I wouldn't agree with this position, but it would at least be defensible.

      Instead, you pulled out the tired old "educate yourself" trope. You compounded the error with the weaselly "are considered" passive-voice construction that lets you present the superior security of VMs as a law of nature instead of your personal opinion.

      In general, there's a lot of alpha in questioning supposedly established "facts" presented this way.

    • > This is a well understood and well documented subject. Do your own research.

      Anything, including the GNU/Linux kernel, can be broken by such security vulnerabilities.

      This is not a weakness in the design of containers. `npm install`, on the other hand, is broken by design (due to post-install scripts).


    • Escaping a properly set up container is a kernel 0day. Due to how large the kernel attack surface is, such 0days are generally believed to exist. Unless you are a high value target, a container sandbox will likely be sufficient for your needs. If cloud service providers discounted this possibility then a 0day could be burned to attack them at scale.

      Also, you can use the runsc (gVisor) runtime for Docker. If you are careful not to expose vulnerable protocols to the container, escaping it with that runtime becomes much less likely.


      Note this lists 3 vulnerabilities as examples: CVE-2016-5195 (Dirty COW), CVE-2019-5736 (host runc overwrite) and CVE-2022-0185 (filesystem context heap overflow).

      Out of those, only the first one is actually exploitable in common setups.

      CVE-2019-5736 requires either an attacker-controlled image or "docker exec". This is not likely to be the case in the "untrusted python" use case, nor in many Docker setups.

      CVE-2022-0185 is blocked by the seccomp filter in default installs, so as long as you don't give your containers the --privileged flag, you are OK. (And if you do give this flag, the escape is trivial without any vulnerabilities.)
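To make the point about default protections concrete, here is a minimal sketch of a check that flags `docker run` options which switch those protections off. The list of risky options is a small, non-exhaustive selection chosen for illustration, and `risky_flags` is a hypothetical helper, not part of Docker or any library. It scans the whole argument list, so arguments belonging to the container's own command could in principle cause false positives.

```python
# Options that disable or weaken Docker's default isolation (non-exhaustive).
RISKY_EXACT = {"--privileged"}
RISKY_PAIRS = {
    ("--security-opt", "seccomp=unconfined"),
    ("--security-opt", "apparmor=unconfined"),
    ("--cap-add", "SYS_ADMIN"),
    ("--cap-add", "ALL"),
}


def risky_flags(argv):
    """Return the isolation-weakening options found in a `docker run` argv."""
    findings = []
    for i, arg in enumerate(argv):
        if arg in RISKY_EXACT:
            findings.append(arg)
            continue
        # Accept both "--flag value" and "--flag=value" spellings.
        if arg.startswith("--") and "=" in arg:
            flag, value = arg.split("=", 1)
        elif i + 1 < len(argv):
            flag, value = arg, argv[i + 1]
        else:
            continue
        if (flag, value) in RISKY_PAIRS:
            findings.append(f"{flag} {value}")
    return findings


print(risky_flags(["docker", "run", "--rm", "-it", "alpine:3", "sh"]))
print(risky_flags(["docker", "run", "--privileged", "alpine:3"]))
```

The first call uses the default configuration quoted earlier in the thread and reports nothing; the second reports `--privileged`.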

  • Exploit the Linux kernel underneath it (not the only way, just the obvious one). Docker is a security boundary but it is not suitable for "I'm running arbitrary code".

    That is to say, Docker is typically a security win because you get things like seccomp and user/DAC isolation "for free". That's great. That's a win. Typically exploitation requires a way to get execution in the environment plus a privilege escalation. The combination of those two things may be considered sufficient.

    It is not sufficient for "I'm explicitly giving an attacker execution rights in this environment" because you remove the cost of "get execution in the environment" and the full burden is on the kernel, which is not very expensive to exploit.

    • > Exploit the Linux kernel underneath it (not the only way, just the obvious one). Docker is a security boundary but it is not suitable for "I'm running arbitrary code".

      Docker is better for running arbitrary code compared to the direct `npm install <random-package>` that's common these days.

      I moved to a Dockerized sandbox[1], and I feel much safer now against such malicious packages.

        1 - https://github.com/ashishb/amazing-sandbox


Docker provides some host isolation which can be used effectively as a sandbox. It's not designed primarily for security (though it does have some reasonable defaults), but it does give you options to layer on security modules like AppArmor and seccomp very easily.
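A minimal sketch of what layering those options might look like, assuming you launch containers from a script. The helper only assembles the argument list and does not run anything. The function name, the resource limits, and the profile values are illustrative assumptions; the flags themselves (`--cap-drop`, `--security-opt`, `--network`, `--read-only`, `--pids-limit`, `--memory`) are standard `docker run` options.

```python
def hardened_docker_argv(image, command, seccomp_profile=None, apparmor_profile=None):
    """Build (but do not execute) a `docker run` argv with extra restrictions.

    seccomp_profile: path to a custom seccomp JSON profile (hypothetical path).
    apparmor_profile: name of an AppArmor profile already loaded on the host.
    """
    argv = [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block setuid-style escalation
        "--network", "none",                      # no network egress at all
        "--read-only",                            # read-only root filesystem
        "--pids-limit", "128",                    # illustrative limit
        "--memory", "256m",                       # illustrative limit
    ]
    if seccomp_profile:
        argv += ["--security-opt", f"seccomp={seccomp_profile}"]
    if apparmor_profile:
        argv += ["--security-opt", f"apparmor={apparmor_profile}"]
    return argv + [image, *command]


argv = hardened_docker_argv("alpine:3", ["sh"], apparmor_profile="docker-default")
print(" ".join(argv))
```

None of this changes the fact that the container still shares the host kernel; it only narrows what an attacker can reach from inside.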