Comment by nh2

14 hours ago

> I guess the question is whether rsync is using multiple threads or otherwise accessing the filesystem in parallel

No, that is not the question. Even Wikipedia explains that rsync is single-threaded. And even if it were multithreaded "or otherwise" used concurrent file IO:

The question is whether rsync _transmission_ is pipelined or not, meaning: Does it wait for 1 file to be transferred and acknowledged before sending the data of the next?

Somebody has to go check that.

If yes: Then parallel filesystem access won't matter, because a network roundtrip has brutally higher latency than reading data sequentially off an SSD.
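A rough sketch of why that distinction matters. This is a toy model, not how rsync actually works: the function names and every number below (file count, roundtrip time, per-file transfer time) are made up for illustration.

```python
def stop_and_wait_seconds(n_files, rtt_s, per_file_transfer_s):
    """Each file waits for an acknowledgement before the next one is sent."""
    return n_files * (rtt_s + per_file_transfer_s)


def pipelined_seconds(n_files, rtt_s, per_file_transfer_s):
    """Files are sent back to back; the roundtrip is only paid once."""
    return rtt_s + n_files * per_file_transfer_s


# Hypothetical values: 1000 small files, 50 ms roundtrip, 1 ms to send each file.
n_files, rtt_s, per_file_transfer_s = 1000, 0.050, 0.001

print(f"stop-and-wait: {stop_and_wait_seconds(n_files, rtt_s, per_file_transfer_s):.2f} s")
print(f"pipelined:     {pipelined_seconds(n_files, rtt_s, per_file_transfer_s):.2f} s")
```

In the stop-and-wait case the roundtrip term dominates no matter how fast the disk is; in the pipelined case it nearly vanishes and whatever else is slow becomes the bottleneck.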

Note that rsync on many small files is slow even within the same machine (across two physical devices), suggesting that the network roundtrip latency is not the major contributor.

  • The original post only mentions 3564 files and rsync spending 8 minutes on that. This just doesn't check out.
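To put a number on that (plain arithmetic on the two figures from the original post, nothing measured):

```python
files = 3564          # file count mentioned in the original post
elapsed_s = 8 * 60    # 8 minutes, as reported there

per_file_ms = elapsed_s / files * 1000
print(f"{per_file_ms:.1f} ms per file")  # prints "134.7 ms per file"
```

That is roughly 135 ms spent per file on average.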

The filesystem access and general threading is the question because transmission is pipelined and not a thing "somebody has to go check". You just quoted the documentation for it.

The dead time isn't waiting for network trips between files, it's parts of the program that sometimes can't keep up with the network.

  • I quoted the documentation that claims _something_ is pipelined.

    That is extremely vague on what that is and I also didn't check that it's true.

    Both the original claim "the issue is the serialization of operations" and the counter-claim sound like extreme guesswork to me. If you know for certain, please link the relevant code.

    Otherwise somebody needs to go check what it actually does; everything else is just speculating "oh surely it's the files" and then people remember stuff that might just be plain wrong.

    • Speculation isn't the most useful thing, but saying "that is not the question" to valid speculation is even less useful.