Comment by jmogly

3 days ago

It is kind of a fundamental risk of IMDS, the guest vms often need some metadata about themselves, the host has it. A hardened, network gapped service running host side is acceptable, possibly the best solution. I think the issue is if your IMDS is fat and vulnerable, which this article kind of alludes to.

There’s also the fact that azure’s implementation doesn’t require auth so it’s very vulnerable to SSRF

You could imagine hosting the metadata service somewhere else. After all there is nothing a node knows about a VM that the fabric doesn’t. And things like certificates comes from somewhere anyway, they are not on the node so that service is just cache.

  • Hosting IMDS on the host side is pretty much the only reasonable way to provide stability guarantees. It should still work even if the network is having issues.

    That being said, IMDS on AWS is a dead simple key-value storage. A competent developer should be able to write it in a memory-safe language in a way that can't be easily exploited.

    • “No, there is another”—Yoda, The Empire Strikes Back :)

      What you describe carries the risk that secrets end up in crash dumps and be exfiltrated.

      Imagine an attacker owns the host to some extent and can do that. The data is then on disk first, then stored somewhere else.

      You probably need per-tenant/per-VM encryption in your cache, since you can never protect against someone with elevated privileges from crashing or dumping your process, memory-safe or not.

      Then someone can try to DoS you, etc.

      Finally it’s not good practice to mix tenant’s secrets in hostile multi-tenancy environments, so you probably need a cache per VM in separate processes…

      IMHO, an alternative is to keep the VM's private data inside the VMs, not on the host.

      Then the real wtf is the unsecured HTTP endpoint, an open invitation for “explorations” of the host (or the SoC when they get there) on Azure.

      eBPF+signing agent helps legitimate requests but does nothing against attacks on the server itself; say, you send broken requests hoping to hit a bug. It does not matter if they are signed or not.

      This is a path to own the host, an unnecessary risk with too many moving parts.

      Many VM escapes abuse a device driver, and I trust the kernel guys who write them a lot more than the people who write hostable web servers running inproc on the host.

      Removing these was a subject of intense discussions (and pushbacks from the owning teams) but without leaking any secret I can tell you that a lot of people didn’t like the idea of a customer-facing web server on the nodes.

      1 reply →

  • Ah yes great point, awesome article by the way —- thought provoking, shocking, really crazy stuff. Hopefully some good comes of it, godspeed.