Comment by iscoelho
1 day ago
wireguard-go is indeed very slow. For example, the official WireGuard Mac client uses it, and performance on my M1 Max is CPU-capped at 200 Mbps. The kernel WireGuard implementation available for Linux is certainly faster, but I would not consider it fast.
Tailscale, however, although it derives from the WireGuard libraries and protocol, is really not WireGuard at all, so comparing the two is a bit apples to oranges. That said, it is still entirely userspace, and its performance is less than stellar.
Well, according to this[1] bench, you can get ~10 Gbps with kernel WG.
I'm interested in this because I'm working on a small hobby project to learn eBPF. The idea is to implement a "Tailscale-lite" that eliminates context switches by keeping both WireGuard and the L3/L4 policy handling in kernel space. To me, the bulk of Tailscale's overhead comes from the fact that the dataplane is running between user and kernel space.
[1]: https://github.com/cyyself/wg-bench
That's a large-packet benchmark, not a mixed-packet-size one, and it only just barely hits 10 Gbps. If you need consistent 10 Gbps for a business use case, I would not consider that sufficient.
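To put rough numbers on why packet size matters here, this is a back-of-the-envelope sketch in Python. It is not taken from the linked benchmark: the function names are made up for illustration, and it deliberately ignores Ethernet framing (preamble, inter-frame gap) and WireGuard's own encapsulation overhead, so treat the results as approximate.

```python
# Rough arithmetic only: how many packets per second a given line rate
# implies, and how much time that leaves per packet. Framing and tunnel
# overhead are ignored (an assumption made to keep the sketch simple).

def packets_per_second(bits_per_second: float, packet_bytes: int) -> float:
    """Packets per second needed to fill the link with packets of this size."""
    return bits_per_second / (packet_bytes * 8)


def ns_per_packet(bits_per_second: float, packet_bytes: int) -> float:
    """Time budget per packet, in nanoseconds, at that packet rate."""
    return 1e9 / packets_per_second(bits_per_second, packet_bytes)


if __name__ == "__main__":
    line_rate = 10e9  # 10 Gbps
    for size in (1500, 512, 64):
        pps = packets_per_second(line_rate, size)
        budget = ns_per_packet(line_rate, size)
        print(f"{size:>5} B packets: {pps:>12,.0f} pps, {budget:8.1f} ns per packet")
```

Under these assumptions, 10 Gbps of 1500-byte packets is roughly 0.83 million packets per second, while 10 Gbps of 64-byte packets is about 19.5 million. Since most of the cost of a packet path is per packet rather than per byte, a result measured only with large packets says little about mixed traffic.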
> "To me, the bulk of Tailscale's overhead comes from the fact that the dataplane is running between user and kernel space."
Yes and no; it's more complicated. DPDK is the industry-standard library for fast packet processing, and it runs entirely in user space. The Linux kernel netstack is just not very fast.
Sure, but who is going to ship a DPDK application to end users? And how exactly would that work for existing user applications that are not DPDK aware?
I think kernel networking is the only option for Tailscale (or any similar mesh VPN solution). Given this key constraint, the best you can do is move more work into kernel space and reduce context switches.