Automatic Backups on Ubuntu

If you have a second hard drive, it's generally a good idea to make regular backups of your home directory (or whichever directory you do most of your daily work in). This can protect you from both hard drive failures and accidental file deletion, which is surprisingly easy to do with a mistyped “rm” command. I've found that the “rsnapshot” program provides a simple and effective way to create automatic regular backups, but it's a little hard to figure out how to set it up because the documentation is so sparse. This guide will explain how I did it on my Ubuntu 18.04 system with two hard drives.

Configure rsnapshot

First, install rsnapshot if you don't have it already by running sudo apt-get install rsnapshot. To configure it, you'll need to edit the file /etc/rsnapshot.conf as root. This configuration file is fairly long and has several lines you need to change. In general, each line has the format variable <tab> value, and the separators must be actual tab characters: rsnapshot requires tabs between fields, not spaces.

  1. Change the value of snapshot_root to the path to your backup drive (and the folder within it you want to use for backups). For example, my second hard drive is mounted at /media/storage, so I changed the snapshot_root line to
    snapshot_root	/media/storage/rsnapshot/
  2. Uncomment the lines that start with cmd_cp, cmd_rm, and cmd_du. This tells rsnapshot the paths to the Linux programs cp, rm, and du; the default paths in the commented-out lines should be correct on a standard Ubuntu installation.
  3. In the section labeled “Backup Levels / Intervals,” change the names associated with the retain commands from “alpha,” “beta,” and “gamma” to “hourly,” “daily,” and “weekly.” You can also change the numeric values to change the number of backups to retain from each backup level. After making these changes, I have the lines:
    retain	hourly	6
    retain	daily	7
    retain	weekly	4
    This means rsnapshot should keep 6 backups at the “hourly” level, 7 backups at the “daily” level, and 4 backups at the “weekly” level.
  4. Uncomment the line starting with “logfile.” This will create a log of rsnapshot operations in /var/log/rsnapshot.log (you can also change this path if you want), which helps you to verify that rsnapshot is working and debug it if it is not. You probably don't want your backup service to silently fail.
  5. In the section labeled “The include and exclude parameters, if enabled...” add a new line:
    exclude	".cache/"
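    This tells rsnapshot to skip the hidden .cache folder in each home directory, which contains lots of temporary files created by various programs. Because the pattern has no leading slash, rsync excludes any directory named .cache at any depth of the backup.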
  6. Uncomment the line link_dest 0, and change it to link_dest 1. As the comments say, it's a good idea to enable link_dest if “your version of rsync supports it,” and the version of rsync that ships with Ubuntu does support this feature.
  7. In the section labeled “Backup Points / Scripts,” uncomment the line
    backup	/home/	localhost/
    This tells rsnapshot to back up the /home/ directory (which holds each user's home directory on Ubuntu, e.g. /home/edward/) into a localhost/ subfolder of each snapshot it creates under snapshot_root (i.e. on your second hard drive).

After making these changes to the configuration file, rsnapshot will be set up to do local backups, but it won't actually make any backups yet.
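Because rsnapshot insists on tabs between fields, a stray space is an easy mistake to make while editing the file. rsnapshot's own rsnapshot configtest command is the authoritative check; the Python sketch below is only a hypothetical helper (the name check_conf_lines is illustrative, not part of rsnapshot) that flags lines with no tab at all, or with spaces padding a field. It assumes that one or more consecutive tabs separate the fields of a setting.

```python
import re


def check_conf_lines(lines):
    """Return (settings, problems) for rsnapshot.conf-style lines.

    settings: (line_number, fields) for each non-comment line with tabs.
    problems: (line_number, message) for lines that look mis-separated.
    """
    settings = []
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines and comments are ignored.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if "\t" not in line:
            problems.append((number, "no tab separator (spaces used?)"))
            continue
        # Assumption: one or more consecutive tabs separate the fields.
        fields = re.split(r"\t+", line)
        # A field with leading or trailing spaces usually means a typo.
        padded = [field for field in fields if field != field.strip()]
        if padded:
            problems.append((number, f"stray spaces in field(s) {padded!r}"))
        settings.append((number, fields))
    return settings, problems


if __name__ == "__main__":
    sample = [
        "# comment lines are ignored\n",
        "snapshot_root\t/media/storage/rsnapshot/\n",
        "retain\tdaily \t7\n",  # stray space after "daily"
        "link_dest 1\n",         # space instead of a tab
    ]
    _, found = check_conf_lines(sample)
    for number, message in found:
        print(f"line {number}: {message}")
    # prints:
    # line 3: stray spaces in field(s) ['daily ']
    # line 4: no tab separator (spaces used?)
```

To check a real configuration, you could pass it the file's lines, e.g. check_conf_lines(open("/etc/rsnapshot.conf").readlines()), and still run sudo rsnapshot configtest afterwards.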

Schedule Backups

Rsnapshot only creates a new backup when it is run, so to create regular backups, you need to run the rsnapshot program at regular intervals. Linux has two different utilities that can automatically run a program at scheduled times: cron and anacron. Cron is older and has more guides on the Internet, but unfortunately it's not a good tool for scheduling backups, because it skips any task whose scheduled time passes while the computer is powered off. This design works fine for servers that are always (or almost always) on, but not for a personal computer, especially when it comes to backups: if you use cron to schedule your weekly backup and your computer happens to be off at the scheduled time, that week's backup is simply skipped, even after you turn the computer back on. By contrast, anacron understands that computers sometimes shut down: if a task's scheduled time passes while the computer is off, anacron runs the missed task soon after the computer is next turned on. Thus, I prefer to use anacron to schedule rsnapshot backup tasks.

To configure anacron to run rsnapshot at regular intervals, add the following lines to /etc/anacrontab (which you will need to be root to edit):

1	10	daily-snapshot	/usr/bin/rsnapshot daily
7	20	weekly-snapshot	/usr/bin/rsnapshot weekly

This runs the “daily” rsnapshot backup once a day and the “weekly” rsnapshot backup once every 7 days. The second column (10 and 20) is a delay in minutes: anacron waits that long after it starts before running the command, which keeps dozens of processes from starting at once. The third column (daily-snapshot, weekly-snapshot) is a job identifier that anacron uses to keep track of when each job last ran.
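The catch-up behavior that makes anacron suitable for backups can be modeled in a few lines. The Python sketch below is a simplified, hypothetical model, not anacron's source (the names AnacronJob, parse_anacrontab_line and is_due are illustrative): it assumes a job is due once at least period days have passed since its last recorded run, which is roughly what anacron checks using the per-job timestamps it keeps under /var/spool/anacron/.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AnacronJob:
    period_days: int    # column 1: how often the job should run, in days
    delay_minutes: int  # column 2: wait after anacron starts before running
    identifier: str     # column 3: job name, used to track the last run
    command: str        # the rest of the line: the command to run


def parse_anacrontab_line(line):
    # Split into at most 4 fields so the command itself may contain spaces.
    period, delay, identifier, command = line.split(None, 3)
    return AnacronJob(int(period), int(delay), identifier, command.strip())


def is_due(job, last_run, today):
    """Simplified rule (assumption): due if never run, or if at least
    period_days whole days have passed since the last run."""
    if last_run is None:
        return True
    return (today - last_run).days >= job.period_days


if __name__ == "__main__":
    jobs = [
        parse_anacrontab_line("1\t10\tdaily-snapshot\t/usr/bin/rsnapshot daily"),
        parse_anacrontab_line("7\t20\tweekly-snapshot\t/usr/bin/rsnapshot weekly"),
    ]
    # Hypothetical history: the computer was off from June 3 to June 10.
    last_runs = {"daily-snapshot": date(2020, 6, 2),
                 "weekly-snapshot": date(2020, 6, 1)}
    today = date(2020, 6, 10)
    for job in sorted(jobs, key=lambda j: j.delay_minutes):
        due = is_due(job, last_runs.get(job.identifier), today)
        print(f"{job.identifier}: due={due}, starts after {job.delay_minutes} min")
```

In this made-up history both jobs are overdue when the machine boots on June 10, so both run (daily first, then weekly, because of their delays). Cron, by contrast, would simply have skipped every run scheduled during the week the computer was off.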

Unfortunately, anacron cannot run tasks more often than once per day, so to schedule hourly backups with rsnapshot we have no choice but to use cron, despite its drawbacks. To add a cron task that runs the rsnapshot hourly backup, create (or edit, if it already exists) a file named rsnapshot in /etc/cron.d/, and put the following line in it:

0 */4		* * *		root	/usr/bin/rsnapshot hourly

This is a single crontab line. Cron reads every file in /etc/cron.d/ as an additional system crontab in the same format as /etc/crontab, which is why the line names the user (root) before the command; at least on Ubuntu 18.04, adding files to /etc/cron.d/ is preferred to editing /etc/crontab directly. The fields 0 */4 at the start of the line mean “at minute 0 of every fourth hour,” so despite its name the hourly backup actually runs every 4 hours (at 00:00, 04:00, 08:00, and so on). This is sufficient for most purposes and reduces the number of backups, but you can change it to 0 * if you really want a backup every hour.
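To see exactly when a schedule like 0 */4 fires, the following Python sketch expands a cron field into the minutes or hours it matches. It is a hypothetical helper that understands only *, */N and a single number; real cron syntax also allows ranges, lists and names, which this sketch does not handle.

```python
def expand_field(field, low, high):
    """Return the values a simple cron field matches within [low, high].

    Supports only "*", "*/N" (every Nth value starting at low) and a
    single number; ranges, lists and names are not handled.
    """
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    value = int(field)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}-{high}")
    return [value]


def run_times(minute_field, hour_field):
    """List the HH:MM times of day matched by the minute and hour fields."""
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


if __name__ == "__main__":
    print(run_times("0", "*/4"))
    # prints ['00:00', '04:00', '08:00', '12:00', '16:00', '20:00']
    print(len(run_times("0", "*")))  # prints 24: "0 *" runs every hour
```

With retain hourly 6 and one run every four hours, the six hourly snapshots together cover roughly the last day.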

Using Your Backups

Once you have configured rsnapshot and anacron as described above, you should see backups start appearing on your secondary hard drive. If the path you specified is /media/storage/rsnapshot/, then each backup will appear as a subfolder of the rsnapshot folder. Each backup folder is named for its frequency category (hourly, daily, weekly), followed by a dot and a counter that increases with the backup's age within that category. Thus, “daily.0” is the most recent daily backup, while “daily.3” is a daily backup from 4 days ago.
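Because every snapshot mirrors the same directory layout, finding an older copy of a file means checking the same relative path in each snapshot folder. The Python sketch below assumes the settings used in this guide (snapshot_root /media/storage/rsnapshot/, retain levels hourly 6, daily 7 and weekly 4, and /home/ backed up to localhost/), and it assumes a backed-up file lands at <level>.<n>/localhost/home/...; the helper names and the file /home/edward/notes.txt are hypothetical.

```python
from pathlib import Path

# Assumed retain levels from this guide, most frequent first;
# within each level, .0 is the newest snapshot.
RETAIN = {"hourly": 6, "daily": 7, "weekly": 4}


def snapshot_names(retain=RETAIN):
    """Snapshot folder names, roughly newest first: hourly.0 ... weekly.3."""
    return [f"{level}.{i}" for level, count in retain.items() for i in range(count)]


def backup_path(snapshot_root, snapshot_name, original_path):
    """Where a file from /home/ is assumed to appear inside one snapshot.

    Assumes the backup point "backup /home/ localhost/", placing
    /home/edward/notes.txt at <snapshot>/localhost/home/edward/notes.txt.
    """
    relative = Path(original_path).relative_to("/")
    return Path(snapshot_root) / snapshot_name / "localhost" / relative


def find_copies(snapshot_root, original_path, retain=RETAIN):
    """Existing backup copies of original_path, roughly newest first."""
    candidates = (
        backup_path(snapshot_root, name, original_path)
        for name in snapshot_names(retain)
    )
    return [path for path in candidates if path.exists()]


if __name__ == "__main__":
    # Hypothetical file; replace with the path you want to recover.
    for copy in find_copies("/media/storage/rsnapshot", "/home/edward/notes.txt"):
        print(copy)
```

Restoring is then an ordinary copy from one of the printed paths back into your home directory.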

Although each backup folder appears to contain a complete copy of the home directory you're backing up, rsnapshot doesn't actually make new copies of files that did not change between backups. Instead, files that did not change in the latest backup are just hard links to the last known version of the file (in a previous backup folder). This saves disk space without affecting your ability to restore from a backup: any file you copy out of a backup folder becomes a new, complete file at its destination, even if the file you copied was actually a hard link.
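The hard-link behavior can be checked directly. This Python sketch builds two imitation snapshot folders in a temporary directory (the names daily.1 and daily.0 only mimic rsnapshot's layout; rsnapshot itself is not involved), hard-links an unchanged file from one to the other, and then copies it out the way you would when restoring. Both snapshot entries share one inode, while the restored copy is a separate file whose edits leave the backup untouched.

```python
import os
import shutil
import tempfile
from pathlib import Path


def demo():
    """Return (same_inode, link_count, restored_is_new_file, backup_intact)."""
    with tempfile.TemporaryDirectory() as root:
        older = Path(root, "daily.1")
        newer = Path(root, "daily.0")
        older.mkdir()
        newer.mkdir()

        # The file is unchanged between snapshots, so the newer snapshot
        # gets a hard link to the older copy instead of a second copy.
        (older / "notes.txt").write_text("unchanged contents\n")
        os.link(older / "notes.txt", newer / "notes.txt")
        old_stat = (older / "notes.txt").stat()
        new_stat = (newer / "notes.txt").stat()
        same_inode = old_stat.st_ino == new_stat.st_ino  # one copy of the data
        link_count = new_stat.st_nlink                    # two names for it

        # Restoring: copying out of the snapshot creates a separate file.
        restored = Path(root, "restored.txt")
        shutil.copy2(newer / "notes.txt", restored)
        restored_is_new_file = restored.stat().st_ino != new_stat.st_ino

        # Editing the restored copy does not touch the backup.
        restored.write_text("edited\n")
        backup_intact = (newer / "notes.txt").read_text() == "unchanged contents\n"

        return same_inode, link_count, restored_is_new_file, backup_intact


if __name__ == "__main__":
    print(demo())  # prints (True, 2, True, True)
```

On a real backup drive you can see the same thing with ls -li, which shows each file's inode number and link count.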