Comment by Bender
2 days ago
I doubt this would ever be accepted upstream. That said, if one wants speed, play around with lftp [1]. Its mirror subsystem can replicate much of rsync's functionality against a chroot, sftp-only destination, and it can use multiple TCP/SFTP streams both across a batch upload and per file, meaning one can saturate just about any upstream. I have used this for transferring massive postgres backups. Because I am paranoid about applications that automatically transfer files in multiple parts, I include a checksum file for the source and then verify the destination files.
The only downside I have found with lftp is that, because there is no rsync-style daemon on the destination, directory enumeration can be slow if there are a lot of nested sub-directories. Oh, and the syntax is a little odd, for me anyway. I always have to look at my existing scripts when setting up new automation.
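The checksum step could be sketched roughly like this. It assumes GNU coreutils (sha256sum, sort -z) on both ends; the function names and the SHA256SUMS file name are made up for illustration and are not part of lftp.

```shell
# Hypothetical helpers, assuming GNU sha256sum, find, sort and xargs.

# Write a SHA256SUMS manifest covering every regular file under directory $1.
make_checksums() {
    ( cd "$1" &&
      find . -type f ! -name SHA256SUMS -print0 | sort -z |
      xargs -0 sha256sum > SHA256SUMS )
}

# Re-hash the files under directory $1 and compare against its manifest.
# Returns non-zero if any file is missing or differs.
verify_checksums() {
    ( cd "$1" && sha256sum --quiet -c SHA256SUMS )
}

# Usage: run make_checksums on the source before the transfer, let the
# manifest travel with the files, then run verify_checksums on the destination.
```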
Demo to play with, download only. Try different values. This will be faster on your servers, especially anything within the data-center.
ssh mirror@mirror.newsdump.org # do this once to accept key as ssh-keyscan will choke on my big banner
mkdir -p /dev/shm/test && cd /dev/shm/test
lftp -u mirror, -e "mirror --parallel=4 --use-pget=8 --no-perms --verbose /pub/big_file_test/ /dev/shm/test;bye" sftp://mirror.newsdump.org
For automation, add --loop to repeat the job until nothing has changed.
The normal answer that I have heard to the performance problems in the conversion from scp to sftp is to use rsync.
The design of sftp is such that it cannot exploit "TCP sliding windows" to maximize bandwidth on high-latency connections. Thus, the migration from scp to sftp has involved a performance loss, which is well-known.
https://daniel.haxx.se/blog/2010/12/08/making-sftp-transfers...
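To put rough numbers on the latency limit: with a fixed amount of data in flight, a single stream cannot go faster than bytes in flight divided by round-trip time. A back-of-the-envelope sketch, assuming the OpenSSH sftp client defaults of 64 outstanding requests (-R) of 32768 bytes each (-B) and a made-up 100 ms RTT; the helper name is hypothetical.

```shell
# Rough ceiling on single-stream throughput in MiB/s.
# $1 = outstanding requests, $2 = bytes per request, $3 = RTT in milliseconds
max_throughput_mib() {
    awk -v n="$1" -v b="$2" -v rtt="$3" \
        'BEGIN { printf "%.1f\n", (n * b) / (rtt / 1000) / 1048576 }'
}

max_throughput_mib 64 32768 100   # prints 20.0
```

Under these assumptions a trans-oceanic link tops out around 20 MiB/s per stream no matter how fat the pipe is, which is why raising the request count or buffer size, or running several streams in parallel, helps.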
The rsync question is not a workable answer, as OpenBSD has reimplemented the rsync protocol in a new codebase:
https://www.openrsync.org/
An attempt to combine the BSD-licensed rsync with OpenSSH would likely see it stripped out of GPL-focused implementations, where the original GPL release has long standing.
It would be more straightforward to design a new SFTP implementation that implements sliding windows.
I understand (but have not measured) that forcibly reverting to the original scp protocol will also raise performance in high-latency conditions. This does introduce an attack surface, should not be the default transfer tool, and demands thoughtful care.
https://lwn.net/Articles/835962/
I included LFTP using mirror+sftp in my example as it is the secure way to give less than trusted people access to files and one can work around the lack of sliding windows by spawning as many TCP flows as one wishes with LFTP. I would love to see SFTP evolve to use sliding windows but for now using it in the data-center or over WAN accelerated links is still fast.
Rsync is great when moving files between trusted systems that one has a shell on. The downside is that rsync cannot split files into multiple streams, so there is still a limit based on source and destination buffers and RTT. One also has to give people a shell, or add some clunky wrapper to prevent one, unless using native rsync on port 873, which is not encrypted. Some people break up jobs on the client side and spawn multiple rsync jobs in the background. It appears that openrsync is still very much a work in progress.
SCP is being or has been deprecated, but the binaries still exist for now. People will have to hold onto old binaries and should probably compile them statically, as the linked libraries will likely go away at some point.
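Breaking a job up on the client side might look something like the sketch below: one rsync process per top-level entry of the source directory, a few at a time. It assumes find and xargs with -print0/-0/-P support (GNU or BSD) and a destination that is either remote or an absolute path; parallel_rsync and the RSYNC override are hypothetical names, not rsync features. It only helps when there are many files, since each file still travels over a single stream.

```shell
# Hypothetical helper: run up to $3 rsync jobs at once, one per top-level entry.
# $1 = local source dir, $2 = destination (host:/path or absolute path), $3 = jobs
parallel_rsync() {
    src=$1 dest=$2 jobs=${3:-4}
    ( cd "$src" &&
      find . -mindepth 1 -maxdepth 1 -print0 |
      xargs -0 -P "$jobs" -I{} ${RSYNC:-rsync} -a {} "$dest/" )
}

# Usage (host and paths are placeholders):
#   parallel_rsync /var/backups backup@example.org:/srv/backups 4
```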
The scp program switched to using the sftp server by default in OpenSSH version 9.0, and notably Windows is now running 9.5, so large segments of scp users are now invoking sftp behind the scenes.
If you want to use the historic scp server instead, a command line option is provided to allow this:
"In case of incompatibility, the scp(1) client may be instructed to use the legacy scp/rcp using the -O flag."
https://www.openssh.org/releasenotes.html
The old scp behavior hasn't been removed, but you need to specifically request it. It is not the default.
It would seem to me that an alternate invocation for file transfer could be tested against sftp in high latency situations:
That would be slightly faster than tar, which adds some overhead. Using tar on both sides would allow transfers of special files, soft links, and retain hard links, which neither scp nor sftp will do.
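As a guess at the kind of invocation meant here, not necessarily the exact command: a plain cat piped through ssh for a single file, and tar on both ends for a whole tree. The function names, host, and paths are placeholders, and the remote paths are assumed to contain no single quotes.

```shell
# Hypothetical helpers; neither goes through the sftp subsystem.

# Stream one file through ssh with cat on the far side.
# $1 = local file, $2 = user@host, $3 = remote path
send_file_cat() {
    ssh "$2" "cat > '$3'" < "$1"
}

# Stream a directory tree with tar on both sides. This carries symlinks,
# hard links and special files along with permissions.
# $1 = local dir, $2 = user@host, $3 = existing remote dir
send_tree_tar() {
    tar -C "$1" -cf - . | ssh "$2" "tar -C '$3' -xpf -"
}

# Usage (placeholders):
#   send_file_cat backup.dump user@example.org /srv/in/backup.dump
#   send_tree_tar /var/backups user@example.org /srv/in
```

Timing either of these against sftp on a high-latency link would show whether the request/response overhead is the bottleneck.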
Windows has also recently added a tar command.
Rsync commonly uses SSH as the transport layer so it won't necessarily be any faster than SFTP unless you are using the rsync daemon (usually on port 873). However, the rsync daemon won't provide any encryption and I can't suggest using it unless it's on a private network.
> The rsync question is not a workable answer, as OpenBSD has reimplemented the rsync protocol in a new codebase
I thought openrsync existed solely because of RPKI. Even OpenBSD devs recommend using the real version from ports.
Wow, I hadn't heard of this before. You're saying it can "chunk" large files when operating against a remote sftp-subsystem (OpenSSH)?
I often find myself needing to move a single large file rather than many smaller ones but TCP overhead and latency will always keep speeds down.
Not every OS or SSH daemon supports byte ranges, but most up-to-date Linux systems and OpenSSH absolutely support it. One should not assume this exists on legacy systems and daemons.
Byte ranges are the only way to access files over sftp. Look at the read and write requests in https://datatracker.ietf.org/doc/html/draft-ietf-secsh-filex...
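Since every sftp read carries an offset and a length, a client that wants N parallel streams only has to decide which byte range each stream fetches. A minimal sketch of that arithmetic; split_ranges is a made-up name and this is an illustration, not lftp's actual pget algorithm.

```shell
# Print "offset length" pairs that cover a file of $1 bytes in at most $2 parts.
split_ranges() {
    size=$1 parts=$2
    chunk=$(( (size + parts - 1) / parts ))   # ceiling division
    offset=0
    while [ "$offset" -lt "$size" ]; do
        len=$chunk
        # The last range is trimmed so it stops at end of file.
        [ $(( offset + len )) -gt "$size" ] && len=$(( size - offset ))
        echo "$offset $len"
        offset=$(( offset + len ))
    done
}

split_ranges 1000 3
# prints:
# 0 334
# 334 334
# 668 332
```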
I use lftp a lot because of its better UI compared to sftp. However, for large files, even with scp I can pin GigE with an old Xeon-D system acting as a server.
Yes, for local access this is my experience too. For trans-oceanic file transfers I can really see the limits and parallelization is essential.