
Comment by rtkaratekid

2 days ago

Forgive my ignorance, but is XDP faster than DPDK for packet processing? It seems like DPDK has had a lot of work put into hardware optimizations that allow speeds I can't recall XDP being able to reach. I have not looked too deeply into this though, so I'm very open to being wrong!

DPDK is a framework with multiple backends; on the receive side it can even use XDP to intercept packets.

You can't compare the efficiency of the frameworks without talking about the specific setups on the host. The major advantage of XDP is that it is completely baked into the kernel. All you need to do is bring your eBPF program and attach it. DPDK requires a great deal of setup and user space libraries to work.
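
To show how little ceremony that involves, here's a minimal, do-nothing XDP program (the file and interface names are placeholders of mine, not anything from the article): compile it to BPF bytecode with clang, attach it with iproute2, and every packet arriving on that NIC now runs through it.

    /* xdp_pass.c - a minimal XDP program that lets every packet through.
     *
     * Build:  clang -O2 -g -target bpf -c xdp_pass.c -o xdp_pass.o
     * Attach: ip link set dev eth0 xdp obj xdp_pass.o sec xdp   (eth0 is a placeholder)
     * Detach: ip link set dev eth0 xdp off
     */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int xdp_pass(struct xdp_md *ctx)
    {
        return XDP_PASS; /* hand every packet to the normal kernel stack */
    }

    char LICENSE[] SEC("license") = "GPL";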

DPDK will give you the absolute best performance, period. But it will do so with tradeoffs that are far from negligible, especially on mixed-workload machines like a docker host/k8s node/hypervisor.

1. to get the absolute best performance, you're running in poll mode and burning CPU cores just for packet processing

2. the network interface is invisible to the kernel, making non-accelerated traffic on said interface tricky (say, letting the kernel perform ARP resolution for you).

3. your dataplane is now a long-lived process, which means that stopping said process equates to no more network (hello restarts!)

Alleviating most of those takes a lot of effort or introduces tradeoffs that make it less worthwhile:

1. can be mitigated by adaptive polling at the cost of latency.

2. by using either software bifurcation, re-injecting non-accelerated traffic into a tap device, or NICs with hardware bifurcation (e.g. ConnectX), installing the flows in their flow engine. Both are quite time-consuming to get right

3. by manually writing a handoff system between new and old processes, and making sure it never crashes

DPDK also needs its own runtime, with its own libraries. Some stuff will be manual (e.g. giving it routing tables). XDP gives all of those for free:

1. All modern NIC drivers already perform adaptive polling and interrupt moderation, so you're not burning CPU cycles polling the card outside of high-packet-rate scenarios (where you'd be burning CPU on IRQs and context switches anyway).

2. It's just an extra bit of software in the driver's path, and the XDP program decides whether to handle a packet itself or pass it down to the kernel. Pretty useful to keep ARP, ICMP, BGP, etc. working without extra code (there's a short sketch of this after the list).

3. XDP is closer to a lambda than anything else: the code runs once for every single packet, meaning its runtime is extremely short. This also means that the long-running process is your kernel, and that updating the code is an atomic operation done on the fly.

4. A lot of facilities are already provided, and the biggest of them is maps. The kernel handles all the stateful things needed to feed data (routing tables, ARP tables, etc.) to your dataplane code. CPU affinity is also handled by the kernel, in the sense that XDP runs on the CPU responsible for the NIC queue, whose mapping is controlled through standard kernel interfaces unrelated to XDP (meaning: not something you have to think about).
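
To make points 2 and 4 concrete, here's a minimal sketch (the map name and the drop-list semantics are assumptions of mine, not anything from a real deployment): anything the program doesn't want to deal with is passed straight to the kernel, and the only state it consults lives in a BPF map that userspace can update at any time.

    /* Sketch: pass everything that isn't IPv4 to the kernel (so ARP, ICMP,
     * IPv6, ... keep working with zero extra code) and drop IPv4 packets
     * whose destination is listed in a map maintained from userspace. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1024);
        __type(key, __u32);  /* IPv4 destination address */
        __type(value, __u8); /* presence means "drop" */
    } blocked_dsts SEC(".maps"); /* hypothetical map, filled from userspace */

    SEC("xdp")
    int xdp_filter(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;

        /* Not IPv4? Let the kernel handle it (ARP, ICMP, ...). */
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;

        __u32 dst = ip->daddr;
        if (bpf_map_lookup_elem(&blocked_dsts, &dst))
            return XDP_DROP;

        return XDP_PASS;
    }

    char LICENSE[] SEC("license") = "GPL";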

Now, speaking purely of optimizations: yes, DPDK will always be better CPU-wise because you can compile it with -march=native, while eBPF is JIT-ed when available (and not particularly well, having looked at the output). However, from experience, the parts that actually take time are map lookups (looking up the nexthop, looking up the MAC address, etc.), and those are written in C in the kernel, so they're as optimized as the kernel can be. Recompiling the kernel for your CPU could boost performance further, but I've never done it myself.
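
As a rough illustration of that last point (a toy sketch of mine, not the application mentioned below; TTL decrement, checksum updates and IPv6 are omitted): the bpf_fib_lookup() helper hands the nexthop and MAC lookups to the kernel's own FIB and neighbour tables, so the expensive part runs as regular compiled kernel C rather than JIT-ed eBPF.

    /* Toy IPv4 forwarder: the kernel's FIB/neighbour code does the lookups,
     * the eBPF side only rewrites the Ethernet header and redirects. */
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    #ifndef AF_INET
    #define AF_INET 2
    #endif

    SEC("xdp")
    int xdp_router(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end || eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;

        struct bpf_fib_lookup fib = {};
        fib.family   = AF_INET;
        fib.ipv4_src = ip->saddr;
        fib.ipv4_dst = ip->daddr;
        fib.tot_len  = bpf_ntohs(ip->tot_len);
        fib.ifindex  = ctx->ingress_ifindex;

        /* Route + neighbour (ARP) resolution happen inside the kernel. */
        if (bpf_fib_lookup(ctx, &fib, sizeof(fib), 0) != BPF_FIB_LKUP_RET_SUCCESS)
            return XDP_PASS; /* punt anything unusual to the normal stack */

        __builtin_memcpy(eth->h_source, fib.smac, ETH_ALEN);
        __builtin_memcpy(eth->h_dest, fib.dmac, ETH_ALEN);

        return bpf_redirect(fib.ifindex, 0); /* transmit on the chosen NIC */
    }

    char LICENSE[] SEC("license") = "GPL";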

Today, I would say that unless you truly need the absolute best performance, XDP is more than fine. Modern CPUs are so fast that DPDK isn't worth considering for most cases.

- container routing like here? the DPDK runtime is a no-go, and the operational flexibility of XDP is a killer feature.

- network appliances like switches/routers? shell out a few extra bucks and buy a slightly better CPU. if latency is paramount, or you're doing per-packet processing that cannot fit in an eBPF probe, then go the DPDK route.

At a previous job, I rewrote a simple internal DPDK routing application using XDP for fun: only half the performance (in packets per second, not bits per second) on the same hardware, with no optimizations whatsoever, in 100 lines of eBPF. Mind you, I could saturate a 100Gbps link with 100-byte packets rather than 64-byte ones, what a tragedy /s. On more modern hardware (latest EPYC), I trivially reached 200Mpps on an 8-core CPU using XDP.

Long story short, you'll know when you need DPDK.

  • Oh wow, interesting, so the rewrite only went half as fast? I know Cloudflare uses eBPF quite heavily whereas the Great Firewall uses DPDK. I wonder if Cloudflare's motivation is just that it's easier to run on GCP. Any Cloudflare employees here?

    • Pretty much, which was incredible for a half-day rewrite, learning eBPF in Rust included. The effort-to-result ratio is simply incredible. A few cleanups and optimizations later, I was pretty much convinced I would not need to touch DPDK again (so was the company). Following this experiment, I wrote some production-grade eBPF routers at this company that are now in production: much more complex, but still able to reach 200Mpps on a $500 CPU (EPYC 9015).

      As for why Cloudflare uses eBPF where the GFW uses DPDK, I can see a few reasons:

      - DPDK was the only game in town when the GFW started, while eBPF was the hot new thing for Cloudflare's recent endeavors. The GFW did not have any choice.

      - Cloudflare has a performance focus, but still has a bit of "hardware is cheap, engineers are expensive", making eBPF more than fine.

      - The GFW runs on dedicated machines on the traffic path, while I would expect most of Cloudflare's eBPF endeavors to run directly on mixed-workload machines. One of their first blog posts about it (dropping x Mpps) specifically says the reason was to protect an end machine directly on said machine, by preventing bad packets from reaching the kernel stack.

      - Most of the operational advantages I already mentioned. The GFW is fine with "drop traffic if DPDK is down", but Cloudflare is absolutely not, making the operational simplicity a big win.

      I bet Cloudflare does have quite a hefty DPDK application used for the traffic-scrubbing part of their anti-DDoS, but they don't publicize it because it's not as shiny as eBPF.

      There are also other advantages to eBPF that make it better suited to a multi-product company like Cloudflare but that don't weigh as much in a mono-product org like the GFW. Take, for example, the much easier testing, a dev env on any laptop, ... Or that eBPF probes can be written in Rust, giving you the same featureful language in the kernel and in userspace (the classic combo being Go in userspace, C in kernelspace).


  • Thanks for the effort of putting together such a great overview. I've used both frameworks before, but not deeply, so your write-up was a good read.