Updated on 2020-08-16
In another post, I explain how I set up a personal cloud server at home using Nextcloud. This solves the privacy problem, and the amount of data I can store does not depend on how much I pay each month. So far, I am very happy with this solution! Nextcloud is a great piece of software and I really feel like I am the master of my own data.
Now there is still one problem. What if my flat burns down? Or gets robbed? My nice setup is defenceless against this kind of threat: I would simply lose everything. Another threat is data corruption, either by a wrong user action or by malware (ransomware is more and more common nowadays!). RAID can prevent data loss from a faulty disk, but not from the threats cited above.
The obvious solution I see is to make backups somewhere other than my home. Or even in several other places, to maximize safety! And of course, it is better to have the whole thing automated. The following figure illustrates the principle:
Several pre-conditions are required:
Now that the problems of the real world are solved, let’s solve the digital ones! In short, I am using rsync daemon mode over SSH.
rsync is an ideal tool to perform backups of remote hosts. There are several ways to do so: run rsync through a remote shell such as SSH, or run an rsync daemon on the host, listening on its own TCP port.
The solution I am choosing is a combination of the previous ones: using rsync daemon mode over SSH. This mode is explained in the man page of rsync under USING RSYNC-DAEMON FEATURES VIA A REMOTE-SHELL CONNECTION. Here are the steps to set this up.
rsync daemon
The basic use of rsync with a distant host is something like:
rsync host:source_dir dest_dir
Where source_dir is a path on the remote machine. But when rsync runs as a daemon on the distant host, it can manage modules, which reference one or more paths with options, and the command line becomes:
rsync host::module_name dest_dir
Note that there are two colons between the host name and the module name.
The available modules are described in a configuration file, given as a parameter to the rsync command when starting the daemon:
rsync --daemon --config=rsyncd.conf
To learn everything about this configuration file, read man rsyncd.conf.
Here is the configuration file of my daemon to back up files from Nextcloud:
# /home/backup-maker/.rsync/rsyncd.conf
log file = /home/backup-maker/.rsync/rsync.log
# to prevent a compromised backup server from harming the main server, this is read-only
read only = true
exclude = lost+found/
# important to keep users and groups of files untouched
fake super = yes
# important or else it tries to chroot to path and fails because not root
use chroot = no
[nextcloud_data]
comment = Data files of the Nextcloud server, not including the database.
path = /path/to/nextcloud/data
# exclude additional files that are not critical for users,
# to save space and time
exclude = files_trashbin/ uploads/ appdata_* updater-*
[nextcloud_db]
comment = Database dumps of the Nextcloud server.
path = /path/to/db/dumps
As you can see, there are two modules described in this configuration. One is dedicated to the data files of all Nextcloud users. The other one is dedicated to the database dumps; to back up the database properly, it must be done in two steps: (1) dump the database on a regular basis on the server, (2) copy the latest dumps to the backup server. Having two modules also allows backup servers to save these directories into different locations and at different intervals, for instance.
Another important detail is the Linux user used to make these backups. I created a dedicated one named backup-maker on the server. This is a user with no shell, and the SSH connection only allows running rsync (I will get to that point later).
useradd --shell /bin/false backup-maker
The rsync daemon will be run by user backup-maker (because the SSH connection is made as this user). Therefore this user needs read access to the Nextcloud files to be able to copy them. To achieve this, I created a new Linux group named nextcloud. Every Nextcloud data file and database dump belongs to the group nextcloud, with read-only access for the group. In addition, user backup-maker is part of this same group. In this way backup-maker has read-only access to the files. That means that even without the rsyncd.conf option read only, write access is prevented: double security!
# create the group and add the backup user to it
groupadd nextcloud
usermod -a -G nextcloud backup-maker
# give the group recursive read-only access (capital X: execute bit on directories only)
chgrp --recursive nextcloud /path/to/nextcloud/data
chmod --recursive g-w+rX /path/to/nextcloud/data
Each backup server must have access to the main server using SSH. To do so, each backup server must have an SSH key pair, and give the public key to the main server so that it can connect without a password.
First, I need to generate an SSH key pair on the backup server:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa_nextcloud
And copy the content of ~/.ssh/id_rsa_nextcloud.pub for the next step.
Note that I do not use a passphrase for this key (option -N), as I need it to be used automatically by a routine task, without a passphrase prompt. Because of that, the key pair is stored in plain text and can therefore be used directly if stolen. But our safeguard is to restrict the powers of this key to the very minimum: reading my data.
On the main server containing Nextcloud, I add this public key to the list of authorized keys with a forced command. A forced command ensures that a command, and only this one, will be run when a given key is recognized. That guarantees that the host connecting with this identity won’t be able to do anything other than run this command.
Therefore, I force the backup server to run an rsync daemon when connecting, and disable a number of advanced features, such as TCP and X11 forwarding.
# /home/backup-maker/.ssh/authorized_keys
# The following ssh key starts an rsync daemon when connecting, to perform backups
command="rsync --config=/home/backup-maker/.rsync/rsyncd.conf --server --daemon . ",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding ssh-rsa <pub key> <name>
Where of course <pub key> and <name> are to be replaced with the public key of the backup server and an arbitrary name.
One can see that the configuration file written above is given as an argument in the command. Note that the path to this file cannot contain ~, as it is not interpreted. One must use a full path or an environment variable like $HOME.
I must confess that I do not fully understand the use of the --server option; as far as I can tell, it is an internal flag that marks the server side of the connection, which rsync normally passes itself when it spawns the remote side.
To actually perform the backup, the rsync command must be run on the backup server. Two commands: one for each module.
rsync -a -v --delete -e "ssh -i ~/.ssh/id_rsa_nextcloud -l backup-maker" backup-maker@main_server_address::nextcloud_data backup/nextcloud_data
rsync -a -v --delete -e "ssh -i ~/.ssh/id_rsa_nextcloud -l backup-maker" backup-maker@main_server_address::nextcloud_db backup/nextcloud_db
- -a (archive mode) recurses into directories and preserves permissions, timestamps and ownership of the files.
- -v makes the command verbose.
- -e specifies to use rsync over SSH, with a given identity and SSH user name.
- --delete tells rsync to delete files on the destination if they have been deleted on the source.
Only client options can be given, as server options are already enforced by the SSH forced command.
Attention: there are two usernames used here, one for SSH and one for rsync (but they are the same here):
rsync -e "ssh -l ssh_username" rsync_username@host::module dest/
I highlight the fact that the IP addresses of the backup servers can be unknown to the main server, and they do not need to be static or even public. Only the main server has these requirements. Therefore, it is very easy to spread many backup servers anywhere, as long as they have a simple internet connection.
What I want is to run the commands written above periodically, once a week for instance. The use of cron is well-suited in this case.
First, the user must not be listed in /etc/cron.deny and, if /etc/cron.allow exists, must be listed in it.
Then, the crontab must be edited with the command crontab -e. I am creating a job that will run a custom script every Monday at 1 A.M.
00 01 * * mon $HOME/do_cloud_backup.sh >> $HOME/backup_cron.log
And the script do_cloud_backup.sh simply contains the two rsync commands written in the last section. (Note that a per-user crontab edited with crontab -e has no username field; that field only exists in /etc/crontab.)
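A minimal version of this script could look like the following sketch (the host name and destination paths are examples to adapt):
#!/bin/sh
# do_cloud_backup.sh: pull the two rsync modules from the main server
RSYNC_SSH="ssh -i $HOME/.ssh/id_rsa_nextcloud -l backup-maker"
rsync -a -v --delete -e "$RSYNC_SSH" backup-maker@main_server_address::nextcloud_data "$HOME/backup/nextcloud_data"
rsync -a -v --delete -e "$RSYNC_SSH" backup-maker@main_server_address::nextcloud_db "$HOME/backup/nextcloud_db"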
The database dumps must be done periodically too. cron is once again perfect for the job. The exact command depends on your database type and configuration. In my case, I run mysqldump once per week, overwriting the previous dump. Then the backup servers simply synchronize with this dump.
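As an illustration, such a cron job could look like this (the database name and dump path are placeholders, and the credentials are assumed to be read from a ~/.my.cnf file):
# dump the Nextcloud database every Sunday at 11 P.M., overwriting the previous dump
00 23 * * sun mysqldump --single-transaction nextcloud > /path/to/db/dumps/nextcloud.sql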
This is a good place to talk about the threat of ransomware. In case my data becomes encrypted on the main server because of such malware, my backup server would copy these encrypted files and I would lose my data anyway! I am not an expert and I don’t know the best solutions to prevent this, but I have a very simple idea. I could write a dummy text file with a known content in the data directory of the main server. Then, I add a check of this file in the shell script of the cron job, before running the rsync commands. I could for instance create a new rsync module that copies only this dummy file. If this file has the correct content, I can assume that the rest of my data has not been corrupted, and synchronize the other modules. A sketch of this check follows.
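Here is a minimal sketch of this idea, assuming a hypothetical module named canary, added to rsyncd.conf, that exposes only the dummy file dummy.txt:
#!/bin/sh
# fetch the canary file first (module and file names are hypothetical)
RSYNC_SSH="ssh -i $HOME/.ssh/id_rsa_nextcloud -l backup-maker"
rsync -a -e "$RSYNC_SSH" backup-maker@main_server_address::canary "$HOME/backup/canary"
# abort the backup if the known content changed, e.g. after encryption by ransomware
if ! grep -q "this is the expected sentence" "$HOME/backup/canary/dummy.txt"; then
    echo "canary check failed, aborting backup" >&2
    exit 1
fi
# ...then run the two rsync commands of do_cloud_backup.sh as usual...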
If someone can compromise one of my backup servers: they can read my data, but thanks to the read-only module and the forced command they cannot modify or delete anything on the main server. If one does not want his data to be read at all, the solution is to make Nextcloud encrypt all data in the first place, as sketched below.
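If I am not mistaken, Nextcloud ships a server-side encryption app that can be enabled with its occ tool (the web server user www-data is an example, adapt it to your setup):
# run from the Nextcloud installation directory
sudo -u www-data php occ app:enable encryption
sudo -u www-data php occ encryption:enable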
If a backup server breaks, gets stolen or burns, there is no need to worry: my main server is still on track and I did not lose anything. I just need to replace or fix the backup server with a new one.
If I am worried about the SSH key being in the hands of an evil person, I can simply remove this key from the list of authorized keys on the main server, and this key becomes useless.
If the main server itself is lost, that is more annoying, but the backup servers are there for it! I must put the hardware back in place for the main server, then use rsync in the other direction to restore the data. It is a bit more complex for the database, but nothing extraordinary. A fresh installation of Nextcloud can then use all of that as if nothing ever happened.
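The restore direction could look like the following sketch, run from the backup server. Since the backup-maker account is restricted to the read-only daemon, an administrative SSH account on the rebuilt main server is assumed here:
# push the saved files back to the main server (account and paths are examples)
rsync -a -v backup/nextcloud_data/ admin@main_server_address:/path/to/nextcloud/data/
rsync -a -v backup/nextcloud_db/ admin@main_server_address:/path/to/db/dumps/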
I can multiply the number of backup servers simply by adding each new public key to the list of authorized keys on the main server. There can’t be any conflict between them, and each new backup will be an additional safe copy.