Comment by metadat
17 days ago
The key takeaway is hidden in the middle:
> In extreme cases, on purely CPU bound benchmarks, we’re seeing a jump from < 1Gbit/s to 4 Gbit/s. Looking at CPU flamegraphs, the majority of CPU time is now spent in I/O system calls and cryptography code.
A more than 4x increase in throughput, which should translate to a proportionate reduction in CPU utilization per byte for UDP network activity. That's pretty cool, especially for better power efficiency on portable clients (mobile and notebook).
I found this presentation refreshing. Too often, claims about transitioning to "modern" stacks are treated as inherently good and don't come with the data to back them up.
Any guesses on whether they have other cases where they got more than 4 Gbps but weren't CPU bound, or was this the fastest they got?
_Author here_.
4 Gbit/s is on our rather dated benchmark machines. If you run the command below on a modern laptop, you'll likely reach higher throughput. (Consider disabling PMTUD to use a realistic Internet-like MTU. We do the same on our benchmark machines.)
https://github.com/mozilla/neqo
cargo bench --features bench --bench main -- "Download"
I wonder if we'll ever see hardware-accelerated cross-context message passing for user and system programs.
Shared ring buffers for I/O exist in Linux, but I don't think we'll ever see them extend to DMA for the NIC, given the security rearchitecture that would require. Though if the NIC is smart enough and the rules simple, maybe.
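For reference, a minimal sketch of that shared-ring model with liburing (io_uring), assuming liburing is installed. The submission and completion rings are mapped into both user space and the kernel, but note the packet payload is still copied out of kernel socket buffers, which is the part that would need the security rearchitecture. Illustrative only:

```c
#include <liburing.h>
#include <stdio.h>
#include <sys/socket.h>

int main(void) {
    struct io_uring ring;
    if (io_uring_queue_init(64, &ring, 0) < 0)   /* 64-entry SQ/CQ rings shared with the kernel */
        return 1;

    int fds[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);    /* stand-in for a real network socket */

    char buf[64];
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv(sqe, fds[0], buf, sizeof(buf), 0);  /* queue a recv on the submission ring */

    send(fds[1], "ping", 4, 0);                  /* produce some data to receive */
    io_uring_submit(&ring);                      /* hand queued SQ entries to the kernel */

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);              /* wait on the completion ring */
    printf("recv completed with %d bytes\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}
```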
There are systems that move NIC control to user space entirely. For example, Snabb has an Intel 10G Ethernet controller driver that appears to use a ring buffer in DMA memory.
https://github.com/snabbco/snabb/blob/master/src/apps/intel/...
RDMA offers that. The NIC can directly access user-space buffers. It does require that the buffers are "registered" first, but applications usually aim to do that once up front.
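Roughly what that one-time registration looks like with libibverbs (illustrative sketch; device selection and error handling are trimmed, and the access flags are just examples):

```c
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);          /* protection domain */

    size_t len = 1 << 20;
    void *buf = malloc(len);                        /* ordinary user-space buffer */

    /* One-time registration: pins the pages and gives the NIC the mapping,
     * so later transfers can DMA into/out of buf with no further setup. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_REMOTE_READ);
    printf("registered %zu bytes, lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    /* ...post work requests that reference mr->lkey / mr->rkey... */

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    free(buf);
    return 0;
}
```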
There is AMD's Onload: https://github.com/Xilinx-CNS/onload. It works with Solarflare and Xilinx NICs, but also has generic NIC support via AF_XDP.
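Not Onload's code, but a generic sketch of what AF_XDP socket setup looks like with the xsk helpers (assuming libxdp, or older libbpf with <bpf/xsk.h>; "eth0" and queue 0 are placeholders, and it needs root/CAP_NET_ADMIN). The UMEM is plain user-space memory, and with a supporting driver the NIC DMAs frames straight into it:

```c
#include <stdlib.h>
#include <unistd.h>
#include <xdp/xsk.h>

#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE

int main(void) {
    void *bufs;
    struct xsk_umem *umem;
    struct xsk_ring_prod fq;     /* fill ring: frames lent to the kernel for RX */
    struct xsk_ring_cons cq;     /* completion ring: finished TX frames */

    posix_memalign(&bufs, getpagesize(), (size_t)NUM_FRAMES * FRAME_SIZE);
    if (xsk_umem__create(&umem, bufs, (size_t)NUM_FRAMES * FRAME_SIZE,
                         &fq, &cq, NULL))
        return 1;

    struct xsk_socket *xsk;
    struct xsk_ring_cons rx;     /* descriptors of received frames */
    struct xsk_ring_prod tx;     /* descriptors of frames to transmit */
    if (xsk_socket__create(&xsk, "eth0", 0 /* queue id */, umem,
                           &rx, &tx, NULL))
        return 1;

    /* ...post frames to fq, poll rx, reuse frames without copies... */

    xsk_socket__delete(xsk);
    xsk_umem__delete(umem);
    free(bufs);
    return 0;
}
```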
Sure, but what about some kind of generalized cross-context IPC primitive: a zero-copy messaging mechanism for high-performance multiprocessing microkernels?