How does `scp` differ from `rsync`?
Problem Overview
An article about setting up Ghost blogging says to use scp to copy from my local machine to a remote server:
scp -r ghost-0.3 root@*your-server-ip*:~/
However, Railscast 339: Chef Solo Basics uses scp to copy in the opposite direction (from the remote server to the local machine):
scp -r root@178.xxx.xxx.xxx:/var/chef .
In the same Railscast, when the author wants to copy files to the remote server (same direction as the first example), he uses rsync:
rsync -r . root@178.xxx.xxx.xxx:/var/chef
Why use the rsync command if scp will copy in both directions? How does scp differ from rsync?
Rsync Solutions
Solution 1 - Rsync
The major difference between these tools is how they copy files.
scp basically reads the source file and writes it to the destination. It performs a plain linear copy, locally or over a network.
rsync also copies files locally or over a network. But it employs a special delta transfer algorithm and a few optimizations to make the operation a lot faster. Consider the call:
rsync A host:B
- rsync will check file sizes and modification timestamps of both A and B, and skip any further processing if they match.
- If the destination file B already exists, the delta transfer algorithm will make sure only differences between A and B are sent over the wire.
- rsync will write data to a temporary file T, and then replace the destination file B with T to make the update look "atomic" to processes that might be using B.
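The temporary-file step can be imitated by hand. The following is only a sketch of the general write-then-rename pattern, not rsync's actual implementation; the scratch directory and the file names are made up for illustration.

```shell
# Sketch of the write-to-temp-then-rename pattern (not rsync's own code).
workdir=$(mktemp -d)                 # hypothetical scratch directory
dest="$workdir/B"
printf 'old contents\n' > "$dest"

# Write the new data to a temporary file T in the same directory as B,
# so that the final rename stays on one filesystem.
tmp=$(mktemp "$workdir/.B.XXXXXX")
printf 'new contents\n' > "$tmp"

# The rename swaps T into place in one step: a process opening B sees
# either the old file or the new one, not a half-written mix.
mv -f "$tmp" "$dest"

cat "$dest"
```

Keeping T next to B matters: a move across filesystems falls back to copy-and-delete, which is no longer a single step.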
Another difference between them concerns invocation. rsync has a plethora of command line options, allowing the user to fine tune its behavior. It supports complex filter rules, runs in batch mode, daemon mode, etc. scp has only a few switches.
In summary, use scp for your day-to-day tasks: commands that you type once in a while in your interactive shell. It's simpler to use, and in those cases rsync's optimizations won't help much.
For recurring tasks, like cron jobs, use rsync. As mentioned, on multiple invocations it will take advantage of data already transferred, performing very quickly and saving on resources. It is an excellent tool to keep two directories synchronized over a network.
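To see the effect of repeated invocations without a remote server, here is a small local sketch. It assumes rsync is installed (the block does nothing otherwise), and the scratch directory and file names are hypothetical.

```shell
# Local demo of rsync skipping unchanged files; all paths are scratch dirs.
if command -v rsync >/dev/null 2>&1; then
  demo=$(mktemp -d)
  mkdir -p "$demo/src" "$demo/dst"
  printf 'hello\n' > "$demo/src/a.txt"
  printf 'world\n' > "$demo/src/b.txt"

  # First run: both files are new at the destination, so both are copied.
  # -a preserves modification times, which the later quick check relies on.
  rsync -a --itemize-changes "$demo/src/" "$demo/dst/"

  # Change only one file, then run the same command again.
  printf 'hello again\n' > "$demo/src/a.txt"
  second_run=$(rsync -a --itemize-changes "$demo/src/" "$demo/dst/")

  # The second run reports a.txt; b.txt is skipped because its size and
  # modification time still match.
  printf '%s\n' "$second_run"
fi
```

Replacing "$demo/dst/" with a host:path destination gives the usual remote form; the skipping logic is the same.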
Also, when dealing with large files, use rsync with the -P option. If the transfer is interrupted, you can resume it where it stopped by reissuing the command. See Sid Kshatriya's answer.
Finally, note that the rsync:// protocol is similar to plain HTTP: unencrypted and with no integrity checks. Be sure to always use rsync via SSH (as in the examples from the question above), not via the rsync protocol, unless you really know what you're doing. scp always uses SSH as its underlying transport mechanism, which provides both integrity and confidentiality guarantees, so that is another difference between the two utilities.
Solution 2 - Rsync
rsync can be useful on slow and unreliable connections. If your download aborts in the middle of a large file, rsync will be able to continue from where it left off when invoked again.
Use rsync -vP username@host:/path/to/file .
The -P option preserves partially downloaded files and also shows progress.
As usual, check man rsync.
Solution 3 - Rsync
Differences between scp and rsync across several parameters:
1. Performance over latency
- scp: relatively less optimized and slower.
- rsync: comparatively more optimized and faster.
2. Interruption handling
- scp: the scp command-line tool cannot resume aborted downloads after a lost network connection.
- rsync: if an rsync session is interrupted, you can resume it as many times as you want by typing the same command. With -P, rsync keeps the partial file and restarts the transfer where it left off.
http://ask.xmodulo.com/resume-large-scp-file-transfer-linux.html
3. Command Example
scp
$ scp source_file_path destination_file_path
rsync
$ cd /path/to/directory/of/partially_downloaded_file
$ rsync -P --rsh=ssh [email protected]:bigdata.tgz ./bigdata.tgz
The -P option is the same as --partial --progress, allowing rsync to work with partially downloaded files. The --rsh=ssh option tells rsync to use ssh as a remote shell.
4. Security
scp is more secure. You have to use rsync --rsh=ssh to make it as secure as scp.
See the scp and rsync man pages to learn more.
Solution 4 - Rsync
One major feature of rsync over scp (besides the delta algorithm, and encryption if used with ssh) is that it automatically verifies that each transferred file arrived correctly. scp will not do that, which occasionally might result in corruption when transferring larger files. So in general rsync is a copy with a guarantee.
The CentOS man pages mention this at the end of the --checksum option description:
> Note that rsync always verifies that each transferred file was
> correctly reconstructed on the receiving side by checking a whole-file
> checksum that is generated as the file is transferred, but that
> automatic after-the-transfer verification has nothing to do with this
> option’s before-the-transfer “Does this file need to be updated?”
> check.
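With a tool that does not run this whole-file check for you, a similar comparison can be done by hand after the copy. The sketch below is an illustration only: cp stands in for the network transfer, cksum (a simple CRC) stands in for whatever checksum rsync uses internally, and the temporary files are hypothetical.

```shell
# Manual post-copy check; cp stands in for the network transfer here.
src=$(mktemp)
dst=$(mktemp)
printf 'payload\n' > "$src"
cp "$src" "$dst"

# cksum reads from stdin so its output carries no file name and the two
# results can be compared directly as strings.
src_sum=$(cksum < "$src")
dst_sum=$(cksum < "$dst")

if [ "$src_sum" = "$dst_sum" ]; then
  echo "checksums match"
else
  echo "checksums differ" >&2
fi
```

For a real remote copy, the second checksum would have to be computed on the receiving host, which is exactly the extra step rsync saves you.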
Solution 5 - Rsync
One distinction, to me, is that scp is always encrypted with ssh (secure shell), while rsync isn't necessarily encrypted. More specifically, rsync doesn't perform any encryption by itself; it's still capable of using other mechanisms (ssh for example) to perform encryption.
In addition to security, encryption also has a major impact on your transfer speed, as well as the CPU overhead. (My experience is that rsync can be significantly faster than scp.)
Check out this post for when rsync has encryption on.
Solution 6 - Rsync
scp is best for one file, or, combined with tar and compression, for smaller data sets such as source code trees with small resources (e.g. images, sqlite, etc.).
Yet, when you begin dealing with larger volumes, say:
- media folders (40 GB)
- database backups (28 GB)
- mp3 libraries (100 GB)
At this point it becomes impractical to build a zip/tar.gz file to transfer with scp, due to the physical limits of the hosted server.
As an exercise, you can do some gymnastics like piping tar into ssh and redirecting the results into a remote file (saving the need to build a swap or temporary clone, i.e. a zip or tar.gz).
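A rough sketch of that pipe, under stated assumptions: the runnable part streams between two local scratch directories so it works without a server, and the remote form is shown only as a comment, with user@host and /backup as placeholder names rather than anything from a real setup.

```shell
# Stream a directory through a pipe without creating an intermediate archive.
work=$(mktemp -d)                       # hypothetical scratch directory
mkdir -p "$work/src/media" "$work/dst"
printf 'track\n' > "$work/src/media/song.txt"

# Local version: the first tar writes the archive to stdout (-f -),
# the second tar reads it from stdin and unpacks it under dst.
tar -C "$work/src" -cf - . | tar -C "$work/dst" -xf -

# Remote version (placeholders: user@host must be reachable, /backup must exist):
#   tar -C "$work/src" -czf - . | ssh user@host 'cat > /backup/src.tar.gz'

cat "$work/dst/media/song.txt"
```

Either way, no archive is ever written on the sending side, which is the point of the exercise.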
However, rsync simplifies this process and allows you to transfer data without consuming any additional disk space.
Also, continuous (cron?) updates that send only minimal changes, rather than full cloned copies, speed up large data migrations over time.
tl;dr
scp == small scale (with room to build compressed files on the same drive)
rsync == large scale (with the necessity to back up large data and no room left)
Solution 7 - Rsync
It's better to think in a practical context. In our team, we use rsync -aP to replace a bad Cassandra host in our cluster. We can't do this with scp (slow and no progress preservation).