Sandboxing Untrusted Python

1 month ago (gist.github.com)

> Older alternatives like sandbox-2 exist, but they provide isolation near the OS level, not the language level. At that point we might as well use Docker or VMs.

No, no. Docker is not a sandbox for untrusted code.

  • What if I told you that, back in the day, we were letting thousands of untrusted, unruly, mischievous people execute arbitrary code on the same machine, and somehow, the world didn't end?

    We live in a bizarre world where somehow "you need a hypervisor to be secure" and "to install this random piece of software, run curl | sudo bash" can live next to each other and both be treated seriously.

    • 9/10 times I see curl | sudo bash mentioned, it's about it being bad, so I don't think that's a good comparison.

  • It depends on your threat model, but generally speaking I would not trust default container runtimes for a true sandbox.

    The kata-containers [1] runtime takes a container and runs it inside a lightweight virtual machine. It works with Docker, podman, k8s, etc.

    It's a way to get the convenience of a container with the isolation benefits of a VM.

    It's not the be-all and end-all (there are more options), but it's a convenient one that is better than typical containers.

    [1] - https://katacontainers.io/

  • I don't think it is generally possible to escape from a docker container in the default configuration (e.g. `docker run --rm -it alpine:3 sh`) if you have a reasonably up-to-date kernel from your distro. AFAIK a lot of kernel LPEs use features like unprivileged user namespaces and io_uring, which are not available in containers by default, and truly unprivileged kernel LPEs seem to be sufficiently rare.

    • The kernel policy is that any distro that isn't using a rolling release kernel is unpatched and vulnerable, so "reasonably up-to-date" is going to lean heavily on what you consider "reasonable".

      LPEs abound - unprivileged user ns was a whole gateway that was closed, io_uring was hot for a while, ebpf is another great target, and I'm sure more and more will be found every year, as has been the case. Seccomp and unprivileged containers etc. make a huge difference and stomp out a lot of the attack surface; you can decide how comfortable you are with that, though.

  • You're right, Docker isn't a sandbox for untrusted code. I mentioned it because I've seen teams default to using it for isolating their agents on larger servers. So I made sure to clarify in the article that it's not secure for that purpose.

    • It depends on the task, and the risk of isolation failure. Docker can be sufficient if inputs are from trusted sources and network egress is reasonably limited.

  • Show me how you will escape a docker sandbox.

    • Exploit the Linux kernel underneath it (not the only way, just the obvious one). Docker is a security boundary but it is not suitable for "I'm running arbitrary code".

      That is to say, Docker is typically a security win because you get things like seccomp and user/DAC isolation "for free". That's great. That's a win. Typically exploitation requires a way to get execution in the environment plus a privilege escalation. The combination of those two things may be considered sufficient.

      It is not sufficient for "I'm explicitly giving an attacker execution rights in this environment" because you remove the cost of "get execution in the environment" and the full burden is on the kernel, which is not very expensive to exploit.

  • Docker provides some host isolation which can be used effectively as a sandbox. It's not designed for security (though it does have some reasonable defaults), but it does give you options to layer on security modules like AppArmor and seccomp very easily.

The example is:

    @task(name="analyze_data", compute="MEDIUM", ram="512MB", timeout="30s", max_retries=1)
    def analyze_data(dataset: list) -> dict:
        # Your code runs safely in a Wasm sandbox
        return {"processed": len(dataset), "status": "complete"}

This is fundamentally awkward in a language with as absurdly flexible a type system as Python. What if that list parameter contains objects that implement __getattr__? What if the returned dict is a subclass with an overridden __getattr__?

Even defining semantics seems awkward, especially if one wants those semantics to simultaneously make sense and have any sort of clear security properties.

edit: a quick look at the source suggests that the output is deserialized JSON regardless of what the type signature says. That’s certainly one solution.

  • Yep, exactly.

    We stick to JSON to make sure we pass data, not behavior. It avoids all that complexity.
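
    A minimal sketch of that property, using only the standard library (the class name here is illustrative): a JSON round-trip keeps the data but drops any behavior attached to it.

        import json

        class SneakyDict(dict):
            # Behavior smuggled onto a "data" object.
            def __getattr__(self, name):
                return lambda: "side effect!"

        payload = SneakyDict(processed=3, status="complete")
        wire = json.dumps(payload)        # only keys and values survive serialization
        restored = json.loads(wire)       # a plain dict on the receiving side

        print(type(restored))                 # <class 'dict'>
        print(hasattr(restored, "anything"))  # False: the __getattr__ hook is gone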

The gist dismisses sandbox-2 as “might as well use Docker or VMs” but IMO that misses what makes it interesting. The PyPy sandbox isn’t just isolation, it’s syscall interception with a controller in the loop.

I’ve been building on that foundation: script runs in sandbox, all commands and file writes get captured, human-in-the-loop reviews the diff before anything executes. It’s not adversarial (block/contain) but collaborative (show intent, ask permission).

Different tradeoff than WASM or containers: lighter than VMs, cross-platform, and the user sees exactly what the agent wants to do before approving.
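
To make that concrete, here's a rough sketch of the review loop (hypothetical names, not shannot's actual API): operations captured in the sandbox are replayed to the user and only executed on approval.

    def review_and_apply(captured_ops, apply_op):
        """Replay captured side effects to a human; run only what they approve."""
        for op in captured_ops:
            print(f"Agent wants to: {op['kind']} {op['target']}")
            if input("Allow? [y/N] ").strip().lower() == "y":
                apply_op(op)
            else:
                print("Skipped.")

    # Hypothetical executor: performs the file write the sandbox deferred.
    def write_file(op):
        with open(op["target"], "w") as f:
            f.write(op["payload"])

    review_and_apply(
        [{"kind": "write", "target": "/tmp/plan.txt", "payload": "hello"}],
        apply_op=write_file,
    )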

WIP, currently porting to PyPy 3.8 to unlock macOS arm64 support: https://github.com/corv89/shannot

I have been thinking about this myself, but am still not sure how best to run untrusted Python code. I'm not convinced that the right solution is to run the code as WebAssembly [1].

I have been looking towards some kind of quick-start qemu option as a possibility, but the project will take a while.

[1] https://github.com/mavdol/capsule

  • I see what you mean, but I think there is room for both approaches.

    If we want to isolate untrusted code at a very fine-grained level (like just a specific function), VMs can feel a bit heavy due to the overhead, complexity, etc.

    • What you really want to do is decouple the sandbox specification annotations from the sandbox implementation backend, yes?

  • What's the problem with WASM? It's a mature target, and was created primarily, if not solely, for running untrusted native code.

> The thing is, Python dominates AI/ML, especially the AI agents space. We're moving from deterministic systems to probabilistic ones, where executing untrusted code is becoming common.

This is so true

Neither the article nor the README explains how it works.

How does it work? Which WASM runtime does it use? Does it use a Python interpreter compiled to WASM?

Edit: never mind, I read it wrong.

---

That is not safe at all. You could always hijack builtin functions within untrusted code:

    def untrusted_function():
        # Keep a reference to the real builtin before shadowing it.
        original_map = map

        def noisy_map(func, *iterables):
            # Attacker-controlled wrapper: observes every call to map().
            print(f"--- Log: map() called on {func.__name__} ---")
            return original_map(func, *iterables)

        # Shadow the builtin for the whole module via its globals.
        globals()['map'] = noisy_map

  • Actually, since it runs inside a WASM sandbox, even if the untrusted code overwrites built-ins like map or modifies globals(), it only affects its own isolated memory space. It cannot escape the WASM container or affect the host system.
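
    A minimal sketch of that isolation with the wasmtime Python package, assuming a WASI build of CPython is available as python.wasm (the file path and guest snippet are illustrative; this is not capsule's actual implementation):

        from wasmtime import Engine, Linker, Module, Store, WasiConfig

        engine = Engine()
        linker = Linker(engine)
        linker.define_wasi()                # expose WASI imports to the guest

        module = Module.from_file(engine, "python.wasm")  # assumed WASI CPython build

        store = Store(engine)
        wasi = WasiConfig()
        # The guest clobbers its own globals; the damage stays in guest memory.
        wasi.argv = ["python", "-c", "globals()['map'] = None; print('guest patched')"]
        wasi.inherit_stdout()
        store.set_wasi(wasi)

        instance = linker.instantiate(store, module)
        instance.exports(store)["_start"](store)  # run the guest interpreter

        print(map)  # the host's builtin map is untouched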

  • It blows my mind how people call Perl ugly, yet this monstrosity is OK. Python being 'human' readable has got to be the biggest scam ever perpetrated against language design.

Seems fine to me. I think you're going to take a huge performance hit by putting CPython into wasm. gVisor is mentioned as having a performance penalty, but I'm extremely doubtful that penalty (which is really on I/O, and I expect I/O not to be a huge deal for these workloads) is anywhere near the penalty of wasm.