Joergs LinuxPages: Backup

This page contains material about backup - from simple hints to complete scripts and links to further references. Please note that a few of these scripts date many years back; they may no longer be actual.

System Backup

Backup to
a(ny) network drive

The script backup2net.sh allows the backup of large amounts of data to another backup location. It is very easy to use - the only requirement is that the "other backup location" is mounted on the system where you want to perform the backup. Thus, you can use any USB disk drive, or a NAS server somewhere on your network!

The script uses rsync according to a description at this blog entry where I added the framework (logging and error checking) of backup2tape.sh. The resulting script has the following features:

can write data to any folder on the system (network or local)
uses rsync for efficient data transport
allows to specify include & exclude patterns
supports snapshots
supports multiple configurations via command line
backup report is mailed to administrator

Download the backup2net.sh script.

Backup to
a tape drive

A while ago the amount of data that should be included in a daily backup outgrew the volume of a CD-RW. At that time I did not have a DVD writer but access to a tape drive, so I wrote the script backup2tape.sh which later became the framework for backup2net.sh (see above). Some features:

writes data to tape drive
performs verify after write
allows to specify include & exclude patterns
supports full & differential backup
supports multiple configurations via command line
preserves access rights (ACL)
backup report is mailed to administrator

The script relies on Jörg Schilling's star. I have been using this script with success for almost a decade to perform regular backups on several Linux machines, either manually or via cron, with both DDS2 and DLT tapes.

As a side note, I have stopped using tape backups and switched to network-based storage. While magnetic tapes are incredibly safe long-term archival media, one inconvenience (for me) lies in their slow operation: it can take several hours to retrieve a file.

Download the backup2tape.sh script.

Backup to remote tape server

One of the wonderful things with Linux is the built-in "allround" networking. This makes it very easy to use almost any device that is attached to a remote computer, such as the display, scanner, or other devices. If you want to use a remote tape drive (say, located on machine "tapehost") to backup your local computer, use something along the following line:

tar cvf - /path/to/file | ssh tapehost 'buffer -s 32k -p 75 -m 10m > /dev/nst0'

Tape access usually requires root rights, so you may need to enable superuser access by adding the option -h to the entry for rshd in /etc/inetd.conf:

shell   stream   tcp   nowait   root    /usr/sbin/tcpd   in.rshd -Lh

You can use a similar procedure to clone an existing system to another one (where the remote system, here called targetPC, should be booted from floppy or CD, so that no access to the harddisk is required for system operation during the cloning process):

(cd / && tar cpf - .) | ssh targetPC '(cd / && tar xpf -)'

... or the other way round, i.e. you fetch the system from targetPC:

 ssh sourcePC '(cd / && tar cpf - .)' | (cd / && tar xpf -)'

Backing up
an entire partition

Computer with a pre-installed MS Windows often come without a recovery medium and you have to create these yourself. Sometimes some additional "one-key recovery" software is available, but I found out (the hard way ;-) that most of these work only if you do not modify the hard disk partition layout.

If you want to preserve the original data, I recommend to make a backup not only of the data, but of the complete partition or even the whole disk. This is easily achieved by booting any USB-based Linux from a USB stick:

dd if=/dev/sda1 bs=256k | gzip -c > /path/to/backup/sda1.gz

... assuming sda1 is your windows partition and /path/to/backup is some free drive (local, USB or remote) where you can write your data. Alternatively, do the same thing over ssh, preferably on a fast network (throughout will usually be limited by the network, not by gzip):

dd if=/dev/sda1 bs=256k | ssh login@remote 'gzip -c > /path/to/backup/sda1.gz'

The parameter bs indicates the block size used by dd. On modern hardware, the old default block size of 512 bytes is slow; values of 64k or 128k or 256k tend to be much more efficient.

The process may take a while and the resulting file may be quite large (more than half of the partition size) even if the actually installed data are only 300 MB - this is due to the archival of the "raw" partiton data. Axel Buergers pointed out that this problem can be largely reduced by defragmenting and then "filling up" the Windows partition with a huge NULL file, something like dd if=/dev/zero of=/mnt/dos/eraseme bs=reallybignumber, which will abort once the partition is full. The file /mnt/dos/eraseme can then be deleted and the "clean" partition archived as above. This "cleaning" process can reduce the size of the final partition image by 50%.

Repeat this for all relevant partitions.

In addition, do not forget to make a copy of the original MBR ...

dd if=/dev/sda of=/path/to/backup/mbr.512-bytes.original bs=512 count=1

.. and print the current partition table to file:

/sbin/fdisk -l /dev/sda > /path/to/backup/partitions.txt

To restore such an archived partition file to the original location:

gunzip -c /path/to/backup/sda1.gz  | dd of=/dev/sda1

Note 1: If you restore a Windows partition in this way but Windows does not start (Problem Event Name: StartupRepairOffline), do not panic - most probably, you have merely forgotten to set the partition to "bootable".

Note 2: If you restore a Windows partition onto a bigger partition, Windows will not be able to use the larger space until you have resized the filesystem (not just the partition). A command like the following should do the job; the --size 99G indicates the maximum size of the partition:

ntfsresize --size 99G /dev/sda1

Backing up
the entire harddisk

The tools above can also be used to back up and restore the entire harddisk of a computer over the network, or to a removable (USB) disk drive.

Backup:

We want to avoid that the harddisk is active during backup, so we boot the computer using some "live" Linux USB stick. Then:

mkdir /media/net 
mount -t nfs 192.168.x.y:/mnt/nasdrive /media/net
dd if=/dev/sda of=/media/net/backup/sys-id.yyyyMMdd.sda.img bs=256k status=progress

Note: In this test run I was using an uncompressed archive file.

Restore:

Restore uses almost the same commands, we merely invert the if= and of= parameters of the dd command:

mkdir /media/net 
mount -t nfs 192.168.x.y:/mnt/nasdrive /media/net
dd if=/media/net/backup/sys-id.yyyyMMdd.sda.img of=/dev/sda status=progress

Using a Gigabit network and a SSD drive on the "writing" end, an entire 240-GB disk was restored in about 1 h.

Copying (cloning)
the entire harddisk

The same approach can be used to copy (clone) the entire harddisk of a computer. As long as the target disk has at least the same size as the source disk, we can simply use dd since this will copy the entire disk, including boot sectors and UEFI information.

Here, it was the Acer ES1-131 laptop (the sticker says "Aspire E 11") whose "conventional" 500-GB HDD was replaced by a much faster 512-GB SSD drive:

We want to avoid that the harddisks are active during this process, so we boot the computer using some "live" Linux USB stick, copy the entire system to RAM and remove the USB stick (the removal was necessary in my specific case since this laptop has only 2 USB ports, and both are used by the adapter used in the next step).
Use an USB-to-SATA adapter to connect the new SSD to the computer (I used a Delock 61883 SATA to USB 3.0 Converter). Do not mount the disks!
fdisk -l will tell you which disk is where - in my case, /dev/sda showed lots of partitions and /dev/sdb was empty, since it was a new harddisk. Thus, the source disk (if=) was /dev/sda and the target disk (of=) was /dev/sdb.
Clone the disk: dd if=/dev/sda of=/dev/sdb bs=100M status=progress. The whole process took about 110 minutes.
Shut down the system.
Swap the two harddisks, then boot the computer ... and smile :-)
With new SSD being slightly bigger, there was additional space available but it was un-used, since the partition table from the "old", smaller disk was copied. I simply booted the computer from the "live" Linux USB system again and used gparted to re-size some of the existing partitions, adding the additional space in the same process.
Done :-)

"Partition Table Entries not in Disk Order"

If you have deleted, moved and resized partitions on your harddisk, you may get the message "Partition Table Entries not in Disk Order". Fixing this is easy in fdisk:

unmount all partitions on that disk (including swap)
fdisk /dev/sdx
in fdisk, use 'x' to enter the "advanced" menu
'f' to fix partitions order
'r' to return to main menu
'w' to write the partition table
quit, then reboot.

Specific Backup and Recovery Tasks

VirtualBox back up

This is yet another variant for copying large amounts of data. To back up my VirtualBox partitions, tar does the job (the `date +"%Y%m%d"` snippet simply converts today's date into ISO8601 notation):

tar cvzf /mnt/nas/backup/vbox/Win10/Win10.`date +"%Y%m%d"`.tgz /mnt/vbox/files/Windows\ 10\ Pro\ 64bit/

Much faster is the compression with --zstd, this is now built into tar:

tar --zstd -cvf /mnt/nas/backup/vbox/Win10/Win10.`date +"%Y%m%d"`.tgz /mnt/vbox/files/Windows\ 10\ Pro\ 64bit/

Here is a quick speed comparison. Both tests were done on the same dataset, on an otherwise almost idle system. The resulting files had nearly the same size: 28.9 GB for the .tar.zst file and 29.4 GB for the standard .tgz file.

$ time tar --zstd -cvf /mnt/nas/backup/vbox/Win10/Win10.`date +"%Y%m%d"`.tar.zst /mnt/vbox/files/Windows\ 10\ Pro\ 64bit/
real    5m56.321s
user    4m30.555s
sys     1m43.399s

$ time tar -cvzf /mnt/nas/backup/vbox/Win10/Win10.`date +"%Y%m%d"`.tgz /mnt/vbox/files/Windows\ 10\ Pro\ 64bit/
real    31m12.425s
user    30m31.711s
sys     1m16.013s

Unpacking (paths are extracted relative to the local directory):

tar --zstd -xvf /mnt/backup_local/drives/vbox/Win10.20240210.tar.zst

Copy all content of a partition

When I do not need to copy the whole partition and I "only" need the content of said partition, rsync does the job:

rsync -auHxvp --exclude=/lost+found/* /path/to/source/* /path/to/dest/

A typical application is copying the contents of several partitions from one disk (here, /dev/sda) to another (here, /dev/sdb). After booting the system in rescue mode, I mount the "old" and "new" drives:

mkdir -p /tmp/{old,new}{root,home}

mount /dev/sda1 /tmp/oldroot
mount /dev/sda2 /tmp/oldhome
mount /dev/sdb1 /tmp/newroot
mount /dev/sdb2 /tmp/newhome

rsync -auHxvp --exclude=/lost+found/* --exclude=/proc/* --exclude=/sys/* /tmp/oldroot/* /tmp/newroot
rsync -auHxvp --exclude=/lost+found/* /tmp/oldhome/* /tmp/newhome

Backup on CDs

Backup on CD-RW, CD-R and DVD-RW

This is discussed in-depth on my CD Writing page.

Tools and Hints

Backup/Synchronise Firefox bookmarks

Firefox is a great web browser, but it has one downside: Mozilla products use random folder names for user profiles. This causes a problem if you want to use the same bookmarks name on multiple computers, e.g. by synchronisation with unison.

The workaround is easy: Simply rename your profile folder in ~/.mozilla/firefox to whatever fits your needs - e.g. default.joe. Then, edit ~/.mozilla/firefox/profiles.ini to reflect this change and you're done.

netcat

If you do need to copy data over the network in a very simple way - without scp or ssh, without ftp, even without telnet ... there is still netcat, the swiss knife of networking (on some systems the binary is called nc). I sometimes use a floppy-based Linux, such as tomsrtbt, to copy data between two computers, e.g. to backup the whole harddisk of a laptop.

An example how to use tar over netcat:

On the "sender" side, launch tar and pipe its output through netcat, specifying the destination IP address, "listen"-mode (-l) and an arbitrary port number, e.g. 5555:

tar cvf - . | netcat -l -p 5555

On the "receiving" computer, just give the sender's IP address (or hostname) and the same port number and pipe the output through tar to unpack:

netcat 192.168.xxx.yyy 5555 | tar xvf -

... and the data will be transferred. The same works with almost any other command, e.g. dd instead of tar.

Speed comparison

This is an informal comparison of several means to backup an entire partition that I made in summer 2022. The source drive /dev/sda2 on computer laptop is an SSD, the target drive /mnt/nfs/ is a NFS drive with RAID1 on conventional (non-SSD) harddisks. For the sake of brevity, I have shortened and annotated the output:

root@laptop:~# mount -t nfs nas:/mnt/nasdrive /mnt/nfs/

# This is standard dd. The resulting image is 212 GB in size.
root@laptop:~# time dd if=/dev/sda2 of=/mnt/nfs/backup/laptop.sda2.img bs=256k status=progress
157286400000 bytes (157 GB, 146 GiB) copied, 2405.71 s, 65.4 MB/s
real	40m5.780s
user	 0m2.799s
sys   	 5m3.263s

# This is dd piped through gzip. The resulting image is 157 GB in size.
root@laptop:/mnt/nfs/backup# time dd if=/dev/sda2 bs=128k status=progress | gzip -c > /mnt/nfs/backup/laptop.sda2.img.gz
157286400000 bytes (157 GB, 146 GiB) copied, 2669.74 s, 58.9 MB/s
real	44m29.915s
user	43m33.571s
sys 	 3m22.518s

# This is dd piped through ssh and gzip on the receiving computer. 
# Painfully slow since throughout is limited by the network, not by gzip:
root@laptop:/mnt/nfs/backup# time dd if=/dev/sda2 bs=128k | ssh joe@nas 'gzip -c > /mnt/nasdrive/backup/laptop.sda2.img.gz'
157286400000 bytes (157 GB, 146 GiB) copied, 5940.23 s, 26.5 MB/s
real	99m0.264s
user	16m40.924s
sys     15m55.026s

# When dd is piped through gzip first and then through ssh, it's much faster:
root@laptop:/mnt/nfs/backup# time dd if=/dev/sda2 bs=128k | gzip | ssh joe@nas 'cat > /mnt/nasdrive/backup/laptop.sda2.img.gz'
157286400000 bytes (157 GB, 146 GiB) copied, 2694.82 s, 58.4 MB/s
real	44m54.825s
user	45m47.374s
sys      4m10.140s

# Now using zstd. The disk image created with zstd is only slightly larger
# than with gzip (38.47 GB vs. 36.50 GB), but the compression is much faster:
root@laptop:~# time zstd -T0 --fast=3 < /dev/sda2 > /mnt/nfs/backup/laptop-sda2.img.zst
real    14m52.495s
user     4m17.569s
sys      2m15.650s

A Quote

A comment to the importance of Backup was also published in the Heise Newsticker in 2002-06:

Einen nachdenklichen Akzent setzte Douglas O'Shaugnessy vom Support-Service der Firma Legato, der mit 18 Spezialisten nach dem Einsturz der Türme des World Trade Centers vor Ort arbeitete. Shaugnessy [...] berichtete von der häufig vergeblichen Suche nach Recovery-Plänen und brauchbaren Inventar-Verzeichnissen. Dies ließ die rein technische Arbeit der Datensicherer zu einer Mischung aus Puzzle und Detektivspiel werden. Mangels aussagekräftiger Kennzeichnung mussten seine Spezialisten 20.000 Bänder auf der Suche nach den neuesten Backups durchforsten und mehr als einmal einen viel früheren Versionsstand der Sicherungen einspielen, damit die Firmen wieder arbeiten konnten.

Shaugnessy präsentierte eine Übersicht, nach der die Firmen den größten Schaden hatten, deren Mitarbeiter im World Trade Center vor allem mit Laptops arbeiteten. Rund 30 Prozent der Firmendaten seien verloren, weil die Mitarbeiter ihre mobilen Rechner nur unregelmäßig im zentralen Firmen-Backup sicherten. Shaugnessys Vortrag endete mit einer dringlich klingenden Bitte: "Dokumentieren Sie Ihren Backup-Prozess. Dokumentieren Sie Ihre Recovery-Maßnahmen. Dokumentieren Sie Ihr Tape-System. Drucken Sie diese Informationen mehrfach aus und bewahren Sie diese an anderen, sicheren Orten."

Jörgs LinuxPages: Backup