Comment by tuetuopay

6 hours ago

DPDK will give you the absolute best performance, period. But it will do so with tradeoffs that are far from negligible, especially on mixed-workload machines like a Docker host, k8s node, or hypervisor.

1. To get the absolute best performance, you're running in poll mode, burning CPU cores just for packet processing (see the sketch right after this list).

2. The network interface is invisible to the kernel, which makes non-accelerated traffic on that interface tricky (say, letting the kernel perform ARP resolution for you).

3. Your dataplane is now a long-lived process, which means that stopping that process leaves you with no network at all (hello, restarts!).
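To make point 1 concrete, the heart of a DPDK dataplane is a busy-poll loop along these lines. This is a minimal, illustrative sketch using the public ethdev API (rte_eth_rx_burst/rte_eth_tx_burst); all the EAL, port, queue, and mempool setup DPDK also requires is omitted, and the port/queue numbers are assumed to be already configured:

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Runs forever on a dedicated lcore: 100% CPU even when no traffic arrives. */
static void lcore_main(uint16_t port, uint16_t queue)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the NIC queue; returns immediately whether or not packets arrived. */
        const uint16_t nb_rx = rte_eth_rx_burst(port, queue, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;               /* nothing received, spin again */

        /* ... per-packet lookups and header rewrites would go here ... */

        const uint16_t nb_tx = rte_eth_tx_burst(port, queue, bufs, nb_rx);

        /* Free whatever the NIC did not accept for transmit. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}
```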

Alleviating most of those takes a lot of effort, or tradeoffs that make it less worth it:

1. Can be mitigated by adaptive polling, at the cost of latency.

2. By either software bifurcation (re-injecting non-accelerated traffic into a tap device) or hardware bifurcation on NICs that support it (e.g. ConnectX), installing the accelerated flows in the NIC's flow engine. Both are quite time-consuming to get right.

3. By manually writing a handoff mechanism between the old and new processes, and making sure it never crashes.

DPDK also needs its own runtime, with its own libraries, and some things stay manual (e.g. feeding it routing tables). XDP gives you all of this for free:

1. All modern NIC drivers already perform adaptive polling and interrupt moderation, so you're not burning CPU cycles polling the card outside of high-packet-rate scenarios (where you'd burn CPU on IRQs and context switches anyway).

2. It's just an extra bit of software in the driver's receive path, and the XDP program decides per packet whether to handle it itself or pass it down to the kernel. Pretty useful to keep ARP, ICMP, BGP, etc. working without extra code (see the sketch after this list).

3. XDP is closer to a lambda than anything else: the code runs once for every single packet, so its runtime is extremely short. This also means the long-running process is your kernel, and that updating the code is an atomic operation done on the fly.

4. A lot of facilities are already provided, the biggest of them being maps. The kernel handles all the stateful machinery for feeding data (routing tables, ARP tables, etc.) to your dataplane code. CPU affinity is also handled by the kernel, in the sense that XDP runs on the CPU responsible for the NIC queue, whose mapping is controlled through standard kernel interfaces unrelated to XDP (meaning: not something you have to think about).
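To make points 2 and 4 concrete, here's a rough, untested sketch of an XDP forwarder that leans entirely on the kernel for state: anything that isn't IPv4 (ARP, ICMPv6/ND, ...) is passed up the stack unchanged, and the nexthop plus MAC addresses come from the kernel's own FIB and neighbour tables via the bpf_fib_lookup() helper, so the program carries no routing state of its own. Locally destined or unroutable packets also fall through to the kernel, since the lookup doesn't return success for them. The names and the omission of TTL handling are my own simplifications:

```c
// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#ifndef AF_INET
#define AF_INET 2   /* value from <linux/socket.h>, avoids dragging in libc headers */
#endif

SEC("xdp")
int xdp_router(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    /* Not IPv4? Hand it to the kernel (ARP, ICMPv6/ND, LLDP, ...). */
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_PASS;

    /* Ask the kernel's routing table + neighbour table for the nexthop. */
    struct bpf_fib_lookup fib = {
        .family   = AF_INET,
        .ifindex  = ctx->ingress_ifindex,
        .ipv4_src = iph->saddr,
        .ipv4_dst = iph->daddr,
    };
    if (bpf_fib_lookup(ctx, &fib, sizeof(fib), 0) != BPF_FIB_LKUP_RET_SUCCESS)
        return XDP_PASS;    /* local, unroutable, or neighbour not resolved */

    /* Rewrite the MACs and bounce the frame out of the egress interface.
     * A real router would also check and decrement the TTL. */
    __builtin_memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
    __builtin_memcpy(eth->h_source, fib.smac, ETH_ALEN);
    return bpf_redirect(fib.ifindex, 0);
}

char _license[] SEC("license") = "GPL";
```

Attaching it, and later atomically swapping it for a new version (point 3), is a one-liner with iproute2, something like `ip link set dev eth0 xdp obj xdp_router.o sec xdp`.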

Now, speaking purely of optimizations: yes, DPDK will always be better CPU-wise, because you can compile it with -march=native while eBPF is JIT-ed when available (and pretty poorly, having looked at the output myself). However, from experience, the parts that actually take time are the map lookups (looking up the nexthop, looking up the MAC address, etc.), and those are written in C in the kernel, thus as optimized as the kernel can be. Recompiling the kernel for your CPU can boost performance, but I've never done it myself.
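To illustrate that last point with a sketch of my own (a hypothetical next-hop table, not code from the application above): the lookup is one map declaration plus one helper call, and the longest-prefix-match walk behind bpf_map_lookup_elem() runs the kernel's own LPM trie implementation (kernel/bpf/lpm_trie.c), not JIT-ed eBPF:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical route table keyed by destination prefix. */
struct route_key {
    __u32 prefixlen;    /* LPM trie keys must start with the prefix length */
    __u32 daddr;        /* destination address, network byte order */
};

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(map_flags, BPF_F_NO_PREALLOC);   /* required for LPM tries */
    __uint(max_entries, 65536);
    __type(key, struct route_key);
    __type(value, __u32);                   /* e.g. egress ifindex */
} routes SEC(".maps");

SEC("xdp")
int lpm_demo(struct xdp_md *ctx)
{
    /* A real program would parse daddr out of the packet; placeholder here. */
    struct route_key key = { .prefixlen = 32, .daddr = 0 };

    /* The expensive part: a longest-prefix match executed as native kernel code. */
    __u32 *ifindex = bpf_map_lookup_elem(&routes, &key);

    return ifindex ? bpf_redirect(*ifindex, 0) : XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```

Userspace fills the map through the usual bpf(2) interfaces (libbpf's bpf_map_update_elem(), or `bpftool map update`); the kernel handles locking and lifetime for you.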

Today, I would consider that unless you truly need the absolute best performance, XDP is more than fine. Modern CPUs are so fast that DPDK isn't worth considering for most cases.

- Container routing like here? The DPDK runtime is a no-go, and the operational flexibility of XDP is a killer feature.

- Network appliances like switches/routers? Shell out a few extra bucks and buy a slightly better CPU. If latency is paramount, or you're doing per-packet processing that cannot fit in an eBPF program, then go the DPDK route.

At a previous job, I rewrote for fun a simple internal DPDK routing application using XDP: only half the performance (in packets per second, not bits per second) on the same hardware, with no optimizations whatsoever, in 100 lines of eBPF. Mind you, I could saturate a 100 Gbps link with 100-byte packets instead of 64-byte ones, what a tragedy /s. On more modern hardware (latest EPYC), I trivially reached 200 Mpps on an 8-core CPU using XDP.

Long story short, you'll know when you need DPDK.