Comment by Cyph0n
1 day ago
Well, according to this[1] bench, you can get ~10 Gbps with kernel WG.
I'm interested in this because I'm working on a small hobby project to learn eBPF. The idea is to implement a "Tailscale-lite" that eliminates context switches by keeping WireGuard and the L3/L4 policy handling in kernel space. To me, the bulk of Tailscale's overhead comes from the fact that the dataplane is running between user and kernel space.
That's a large-packet benchmark, not a mixed-packet-size one, and it only just reaches 10 Gbps. If you need consistent 10 Gbps for a business use case, I would not consider that sufficient.
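To put rough numbers on the packet-size point, here is a back-of-the-envelope sketch of how many packets per second a 10 Gbps link carries at different Ethernet frame sizes. The 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap) are standard Ethernet; the frame sizes are just illustrative, and WireGuard's own encapsulation overhead is ignored as a simplifying assumption.

```python
# Rough packet-rate arithmetic for a 10 Gbps Ethernet link.
# Assumption: WireGuard encapsulation overhead is ignored here.

LINE_RATE_BPS = 10_000_000_000  # 10 Gbps
WIRE_OVERHEAD_BYTES = 20        # 7 B preamble + 1 B SFD + 12 B inter-frame gap


def packets_per_second(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Packets per second needed to fill the link with frames of this size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


# Illustrative frame sizes: minimum, mid-sized, and full 1500-byte-MTU frames.
for size in (64, 512, 1518):
    pps = packets_per_second(size)
    print(f"{size:>5} B frames: {pps / 1e6:.2f} Mpps, {1e9 / pps:.0f} ns per packet")
```

At full-size frames the link needs well under a million packets per second, while minimum-size frames push that to roughly 14.88 million, so the per-packet time budget shrinks by more than an order of magnitude. That is why a large-packet result says little about mixed traffic.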
> "To me, the bulk of Tailscale's overhead comes from the fact that the dataplane is running between user and kernel space."
Yes and no; it's more complicated than that. DPDK is the industry-standard library for fast packet processing, and it runs entirely in user space. The Linux kernel netstack is just not very fast.
Sure, but who is going to ship a DPDK application to end users? And how exactly would that work for existing user applications that are not DPDK aware?
I think kernel networking is the only option for Tailscale (or any similar mesh VPN solution). Given this key constraint, the best you can do is move more of the work into kernel space and reduce context switches.