(?) Backing up with tar

From Tom Brown

Answered By: Thomas Adam, Chaz Peters, Karl-Heinz Herrmann, Ben Okopnik, Robos

I'm trying to back up my Linux installation with tar, using a second hard drive in my system rather than a tape drive or CD. The trouble is, I have a 2-GB file size limit on the destination (it's FAT32, so I can also use it for Windows backups), so I have to do it in a lot of little chunks (even with compression). Is there another solution to this, either a fancy shell script, an awk script, or some combination of tar options that would produce the multiple destination files I'm looking for? If I keep doing it manually like I am now, I know I'll never maintain an up-to-date system backup like I should. I've found tape options for tar that control multivolume backups and tape length, but nothing for multiple files.

(!) [Chaz] Backups can be a pain, especially ones that require manual operation. I like to automate them as much as possible. The following is a script I made for Kathy's Debian machine. Usually I prefer to back up over a network to another machine, but she has dialup and no other machines on a LAN. I use rsync because it's fast and works well; rsync is a file transfer program capable of efficient remote updates via a fast differencing algorithm. The script is run once a week via cron, and it works very well for hassle-free automated backups as long as you have enough disk space. If you require compression, this is not what you want. I do not recommend using compression for backups; compression reduces the chances that the data will be recoverable.

See attached backup-weekly.sh.txt
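Chaz's script isn't reproduced here, but a minimal sketch of the kind of weekly rsync job he describes might look like this (the paths and directory list are assumptions for illustration, not his actual script):

#!/bin/sh
# Illustrative weekly backup sketch, not the attached backup-weekly.sh.
# Assumed layout: a second disk mounted at /backup with enough free space.
# rsync only copies what changed since the last run, so repeat runs are quick.
rsync -a --delete /home/ /backup/weekly/home/
rsync -a --delete /etc/  /backup/weekly/etc/

Dropped into /etc/cron.weekly/ (or called from a crontab entry), something like this runs unattended, which is the whole point of automating it.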

(?) An example of what I'm doing now:

tar zcvf /windows/s/suse/back_tbrown.tgz /home/tbrown

(?) Oh, I tried the Suse backup/restore function, and could not restore the resulting files. The .tar.gz files within the .tar archives (don't know why they did it that way) seem to be corrupted. So, I figured I'd do it myself.

(!) [Chaz] SuSE? Sorry, the dpkg part of my script won't help there...
Note that the script lacks a secondary archive, which could be disastrous in a few cases. We do have an older backup on CDR, and at some point I would like to transfer it to a laptop or something for more recent off-site copies. She can also selectively transfer files via dialup so that I can back them up.
When I get more disk space, I am going to look into using better archival techniques. I have heard good things about Dirvish, a fast, disk-based, rotating network backup system. A dirvish backup vault is like a time machine for your data. http://www.pegasys.ws/dirvish
(!) [Thomas] What you can do is something like this:
(cd /src/dir && tar cf - . ) | (cd /dest/dir && tar xvfp -)
where /src/dir is the directory you're starting from, and /dest/dir is the final destination that the files (and directories) will end up in.
Since you say that this is going to a FAT32 volume, that will not preserve file permissions. The only way you can achieve that is by making a tar file.

(?) Thanks. That's why I didn't just cp the directories over.

(!) [Thomas] Your other option is to make an archive and burn it to CD. One thing you might want to try, though, if you are going to make a tar archive, is to run it with the "j" flag when you create it. That'll use bzip2 and might save some more space.
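For instance, the command from earlier, switched from gzip to bzip2, might look like this (same paths as before; the .tar.bz2 extension is just a naming convention):

tar cvjf /windows/s/suse/back_tbrown.tar.bz2 /home/tbrown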
(!) [K.-H.] You might have a look at afio instead of tar. It's more robust against data errors in the archives than tar, though from reading the manpage I'm not quite sure whether you can specify archive filenames that are automatically numbered for multivolume operation. If not, you can still automate things with the "promptscript" option: you archive to a specific dummy file, and the script mv's/renames it to something useful (number, date, ...) and continues. Getting rid of the prompts (or answering them automatically) should not be that difficult.
Be careful to read the basics: afio wants a list of files to be archived piped in on STDIN.
This might be a good starting point (no multivolume handling; add that yourself):
find /var -xdev -print | afio -v -o -Z -T 5k -b10k  ARCHIVE.afio
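For completeness, listing or restoring such an archive would use afio's other modes; roughly (going by the manpage, not tested here):

afio -t -v -Z ARCHIVE.afio    # list the contents
afio -i -v -Z ARCHIVE.afio    # extract ("install") into the current directory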
(!) [Ben] Make your giant tarball, then use the 'split' utility to break it up into chunks. When you're ready to use it, just 'cat' all the pieces in order (which is how they'll be named by 'split') into a single file that you can untar. As someone mentioned, 'j' rather than 'z' gives you even better compression on large files.
(!) [Heather] Since j invokes bzip2 compression, yes. I wouldn't use it if anything needs to be unpacked on a non-Linux system, though; other OSes have shabby bzip2 support.
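As a rough sketch of that split-and-cat approach (filenames are just examples; a 1900m chunk size keeps each piece safely under the 2-GB FAT32 limit):

split -b 1900m back_tbrown.tar.bz2 back_tbrown.tar.bz2.
# later, to restore: glue the pieces back together and unpack
cat back_tbrown.tar.bz2.* > back_tbrown.tar.bz2
tar xvjf back_tbrown.tar.bz2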

(?) That would work fine, except that the tarball is too big to be created on the destination file system in the first place. What I'm looking for is some way of creating a lot of smaller tarballs right from the start.

(!) [Ben] What I meant was to create it on the "source" system, not the "target" one, then split and transfer. However, you can do it "in flight", too:
tar cvzf - * | split -b 100k - backup-01-15-04
(!) [Thomas] Since the destination is not a Unix system, the use of the "-p" flag to preserve permissions is a must in this instance.
(!) [Ben] It's not really relevant to the host OS; the permissions that matter are "inside" the tarball. However, you're right anyway - in a backup,
(!) [Thomas] Indeed.
(!) [Ben] permissions should be preserved, and I lost track of that in generating a random example of "split" usage. In fact, for backups, the "tar" string should be:
tar cvzpSf - *
(add sparse file handling, as well.)
(!) [Thomas] LOL, I don't know, Ben.... all that Yoga and the like is going to your head; just make sure you:
tar cvzpSf
your linux knowledge :)
I, for one, would be very interested in that tarball...
(!) [Ben] Sorry, even the pieces would be too large to fit on any possible host system. Although there's a lot of sparse files there, too. :)
(!) [Ben] This will create a load of 100k-sized files called "backup-01-15-04aa", "backup-01-15-04ab", etc. If the destination were a Unix system, I'd suggest piping "tar" into SSH, catching it on the far end and then splitting it - all done in one shot.
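A hedged sketch of that ssh variant (the user, host, and paths are made up; 1900m again just stays under the FAT32 limit):

tar cvzpSf - /home/tbrown | ssh user@backuphost 'split -b 1900m - /backups/tbrown-'

That leaves /backups/tbrown-aa, /backups/tbrown-ab, and so on on the far end; "cat /backups/tbrown-* | tar xvzpf -" reassembles and unpacks them in one step.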
(!) [Robos] I'd rather use netcat instead of ssh - depending on the connection, certainly (I didn't read it all). But ssh adds quite a bit more load to an already busy CPU that is trying to do bzip2 compression on the fly...
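For the curious, a netcat version of the same idea might look something like this (the port and names are arbitrary, traditional "nc -l -p" syntax is assumed, and the stream crosses the wire unencrypted and uncompressed):

On the receiving machine:
nc -l -p 7000 | split -b 1900m - backup-
On the machine being backed up:
tar cvpSf - /home/tbrown | nc backuphost 7000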

(?) I'd love to find out why the SuSE backup tarballs won't untar, since YaST2 appears to do the kind of backup I want. I'm overlooking something there, I just know it, since the feature wouldn't exist in SuSE if it didn't work.

(!) [Ben] Don't know anything about SuSE backup, but the above should do what you want.


Copyright © 2004
Copying license http://www.linuxgazette.net/copying.html
Published in Issue 99 of Linux Gazette, February 2004
HTML script maintained by Heather Stern of Starshine Technical Services, http://www.starshine.org/

