That's because it stands to reason that implementing network-capable segmentation and flow control is more costly than just moving data through internal, native structures. And looking up random benchmarks yields anything from equal performance to 10x faster for Unix domain sockets.
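For the curious, a rough sketch of how such a comparison gets measured (buffer size and message count are arbitrary, and this is a sketch rather than a rigorous benchmark); swapping the AF_UNIX socketpair for a connected loopback TCP pair gives the other side of the comparison:

    /* Rough throughput probe: parent writes, child drains, over an AF_UNIX
     * stream socketpair. Replacing the socketpair with a connected loopback
     * TCP pair gives the comparison point. Buffer size and message count are
     * arbitrary; this is a sketch, not a rigorous benchmark. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    #define CHUNK  (64 * 1024)
    #define CHUNKS 100000L            /* ~6.5 GB pushed through the socket */

    int main(void) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) { perror("socketpair"); return 1; }

        if (fork() == 0) {            /* child: read until the writer closes */
            char buf[CHUNK];
            close(sv[0]);
            while (read(sv[1], buf, sizeof buf) > 0)
                ;
            _exit(0);
        }

        close(sv[1]);
        static char buf[CHUNK];
        memset(buf, 'x', sizeof buf);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < CHUNKS; i++)
            if (write(sv[0], buf, sizeof buf) < 0) { perror("write"); break; }
        close(sv[0]);                 /* EOF for the child */
        wait(NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.2f GB/s\n", (double)CHUNKS * CHUNK / secs / 1e9);
        return 0;
    }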
It wouldn't surprise me, though, if inet sockets were more heavily optimized and unix sockets ended up slower anyway, simply because nobody has bothered to make them fast (which is probably why some of your benchmarks show equal performance). Benchmarks are important.
I've spent several years optimizing a specialized IPC mechanism for a work project. I've spent time reviewing the Linux kernel's unix socket source code to understand obscure edge cases. There isn't really much to optimize - it's just copying bytes between buffers. Most of the complexity of the code has to do with permissions and implementing the ability to send file descriptors. All my benchmarks have unambiguously shown unix sockets to be more performant than loopback TCP for my particular use case.
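The fd-passing part is where most of the interesting code lives; a minimal sketch of what it looks like from userspace (error handling trimmed for brevity):

    /* Minimal sketch of the fd-passing mentioned above: send one file
     * descriptor over an AF_UNIX socket as SCM_RIGHTS ancillary data. */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int send_fd(int sock, int fd_to_send) {
        char data = 'F';                         /* must carry at least one byte */
        struct iovec iov = { .iov_base = &data, .iov_len = 1 };

        union {                                  /* correctly aligned cmsg buffer */
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof u.buf,
        };

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_RIGHTS;           /* "this message carries fds" */
        cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }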
I agree, but practically speaking they're used en masse all across the field and people did bother to make them good [enough]. I suspect the benchmarks where they come up equal are cases where things are limited by other factors (e.g. syscall overhead), though I don't want to make unfounded accusations :)
Unix Domain Sockets are the standard mechanism for app->sidecar communication at Google (e.g. talking to the TI Envelope for logging, etc.)
servo's ipc-channel doesn't use Unix domain sockets to move data. It uses them to share a memfd file descriptor, effectively creating a memory buffer shared between the two processes.
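Roughly this pattern - illustrative only, not ipc-channel's actual code:

    /* Create an anonymous memory file with memfd_create(), map it, and hand
     * the descriptor to the peer over the unix socket (e.g. with SCM_RIGHTS,
     * as in the fd-passing sketch above). The peer mmap()s the same fd and
     * both processes now share the buffer; the socket only ever carried the
     * descriptor, not the bulk data. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        const size_t len = 1 << 20;                      /* 1 MiB shared region */

        int fd = memfd_create("ipc-demo", MFD_CLOEXEC);  /* anonymous, fd-backed memory */
        if (fd < 0 || ftruncate(fd, len) < 0) { perror("memfd"); return 1; }

        char *region = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (region == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(region, "written by the sending process");
        /* ...send_fd(sock, fd) to the peer; after it mmap()s fd, reads and
         * writes hit the same physical pages with no further copies... */
        return 0;
    }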
Search around on Google Docs for my 2018 treatise/rant about how the TI Envelope was the least-efficient program anyone had ever deployed at Google.
Ok, now it sounds like you're blaming unix sockets for someone's shitty code...
No idea what "TI Envelope" is, and a Google search doesn't come up with usable results (oh the irony...) - if it's a logging/metric thing, those are hard to get to perform well regardless of socket type. We ended up using batching with mmap'd buffers for crash analysis. (I.e. the mmap part only comes in if the process terminates abnormally, so we can recover batched unwritten bits.)
I'm a xoogler so I don't have access. Do you have a TL;DR that you can share here (for non-Googlers)?
Are there resources suggesting otherwise?
As is so often the case in computing, profiling is a foreign word.
Tell me more, I know nothing about IPC