Comment by Cyph0n
2 days ago
> Noting Tailscale, since it has been mentioned, has comparatively extremely slow performance.
Isn't this mainly because Tailscale relies on userspace WG (wireguard-go)? I'd imagine the perf ceiling is much higher for kernel WG, which I believe is what Netbird uses.
wireguard-go is indeed very slow. For example, the official WireGuard Mac client uses it, and performance on my M1 Max is CPU capped at 200Mbps. The kernel WireGuard implementation available for Linux is certainly faster, but I would not consider it fast.
Tailscale however, although it derives from WireGuard libraries and the protocol, is really not WireGuard at all- so comparing it is a bit apples to oranges. With that said, it is still entirely userspace and its performance is less than stellar.
Well, according to this[1] bench, you can get ~10 Gbps with kernel WG.
I'm interested in this because I'm working on a small hobby project to learn eBPF. The idea is to implement a "Tailscale-lite" that eliminates context switches by keeping both Wireguard and L3 and L4 policy handling in kernel space. To me, the bulk of Tailscale's overhead comes from the fact that the dataplane is running between user and kernel space.
[1]: https://github.com/cyyself/wg-bench
That's a large packet benchmark, not mixed packet size, and it just barely hits it. If you need consistent 10Gbps for a business use case, I would not consider that sufficient.
> "To me, the bulk of Tailscale's overhead comes from the fact that the dataplane is running between user and kernel space."
Yes and no, it's more complicated. DPDK is the industry standard library for fast packet processing, and it is in entirely user space. The Linux kernel netstack is just not very fast.
1 reply →