Comment by toast0

14 hours ago

I've only done a little prototyping with it, but io_uring addresses the same issue as DPDK in a totally different way. If you want high performance, you want to avoid context switches between userland and the kernel. DPDK brings the NIC buffers into userland and bypasses the kernel; things like sendfile and kTLS let the kernel do most of the work and bypass userland; and io_uring lets you issue the same operations as the syscalls you're making now, but a) in batches and b) continuously, through a submission queue. I think it's easier to reach for io_uring than DPDK, but it might not get you as far: you're still communicating between kernel and userland, just more cheaply than with normal syscalls.
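To make the "kernel does most of the work" option concrete, here is a minimal sketch of sendfile from Python. It uses the standard library's `socket.sendfile()`, which calls `os.sendfile()` where the platform supports it and otherwise falls back to plain `send()`. The helper names `send_file_via_kernel` and `demo`, the local socket pair, and the temporary file are all assumptions made for illustration; a real server would use an accepted TCP connection.

```python
import os
import socket
import tempfile


def send_file_via_kernel(path, sock):
    """Send a whole file over a connected stream socket.

    socket.sendfile() hands the copy to the kernel where os.sendfile() is
    available, so the file data is not read into a Python buffer first.
    Returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    # Hypothetical payload, small enough to fit in the socket buffer so the
    # single-threaded demo cannot block on a full pipe.
    payload = b"static file contents\n" * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name

    sender, receiver = socket.socketpair()
    try:
        sent = send_file_via_kernel(path, sender)
        sender.close()  # signal EOF to the receiving side
        received = bytearray()
        while chunk := receiver.recv(65536):
            received.extend(chunk)
    finally:
        sender.close()
        receiver.close()
        os.unlink(path)
    return sent, bytes(received), payload


if __name__ == "__main__":
    sent, received, payload = demo()
    print(f"sent {sent} bytes, received {len(received)} bytes")
```

The application makes one call per file and never touches the bytes itself. This is the opposite trade-off from DPDK, where the application handles every packet and the kernel handles none.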

> Can you get similar performance with QUIC?

I don't know that I've seen benchmarks, but I'd be surprised if you can get similar performance with QUIC. TCP has decades of optimization that you can lean on; UDP for bulk transfer really doesn't. For a lot of applications, server performance from QUIC vs TCP+TLS isn't a big deal, because you'll spend far more server resources computing what to send than sending it. For static file serving, I'd be surprised if QUIC is actually competitive, but it still might not be a big deal if your server is overpowered and can hit the NIC limits with either.

It is fairly straightforward to implement QUIC transport at ~100 Gb/s per core without encryption, which is comparable to or better than TCP. With encryption, every protocol bottlenecks on the encryption and gets only 40-50 Gb/s per core unless you have dedicated crypto offload hardware.

However, the highest-performance public QUIC implementation benchmarks reach only ~10 Gb/s per core. It is unclear to me whether this is due to slow QUIC implementations or to poor UDP stacks with inadequate buffering and processing.
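For a rough sense of what those per-core figures mean, they can be turned into a per-packet cycle budget. The sketch below assumes 1500-byte packets and a 3 GHz core. Both numbers are my own illustrative assumptions, not from the benchmarks mentioned, and the function name `per_packet_budget` is hypothetical.

```python
def per_packet_budget(gbps, packet_bytes=1500, core_hz=3.0e9):
    """Return (packets per second, CPU cycles available per packet).

    Assumes every packet is packet_bytes long and that one core running at
    core_hz does all the work. Both defaults are illustrative assumptions.
    """
    bits_per_packet = packet_bytes * 8
    packets_per_second = gbps * 1e9 / bits_per_packet
    cycles_per_packet = core_hz / packets_per_second
    return packets_per_second, cycles_per_packet


if __name__ == "__main__":
    for rate in (100, 40, 10):
        pps, cycles = per_packet_budget(rate)
        print(f"{rate:>3} Gb/s: {pps / 1e6:.2f} Mpps, ~{cycles:.0f} cycles per packet")
```

Under these assumptions, 100 Gb/s per core leaves about 360 cycles per packet, 40 Gb/s about 900, and 10 Gb/s about 3600. Larger packets or segmentation offload would loosen the budget proportionally.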