The Hunter-Gatherer™
September 9th, 2005
I read an article on Versión Cero that contains the following paragraph:
The story told by an anthropologist in his book about the Pygmies is a telling case. I remember it vividly. Everyone went hunting, the anthropologist included, and they managed to kill a large wild ox. On the way back to the village with the kill, they kept belittling the hunter and the day’s results:
- “it was an old ox, its meat is surely tough”
- “this kill won’t be enough to feed everyone”
The anthropologist, puzzled by what he was hearing, since the kill could well mean food for a whole week, asked one of them why they were belittling the animal and, more specifically, the hunter whose intervention had made the kill possible. The answer was: “if we praise a man for hunting, his pride will grow, for men are proud by nature, and he will think he is better than the rest, creating conflict and fights among us. Pride leads to envy, envy leads to hatred, and hatred leads to crime. Nobody is better than anybody else; we are all equal. So we must belittle him so that his pride does not grow.”
I don’t really know whether this Pygmy tribe actually said that or not, but that sentence holds great wisdom and an understanding of the true human dimension.
Incremental backups with rsync
September 9th, 2005
I have been thinking for a while about implementing incremental, cyclical backups on my home network. The problem with cyclical backups to tape is that they are slow. The problem with cyclical backups to disk is that they consume a great deal of space. I finally opted for cyclical backups to disk, since my DDS-3 SCSI tape drive is slow and can’t hold the many gigabytes of data I have, even with hardware/software compression.
I want to periodically branch my main backup tree so that I can keep several backups, ordered from the newest (backup.0) to the oldest (backup.n), where “n” could be the number of days or weeks, depending on the frequency of the backups.
The filesystem should look like this:
    |- backup.0
    |- backup.1
    |- backup.2
    .  .  .
    \- backup.n
A simple way to reduce disk space usage is to use a UNIX filesystem feature called hard links. The idea is that if the contents of a file do not change between backups, we can save space by having all the identical copies hard-linked together, so that the data is stored on disk only once.
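As a minimal sketch of what hard-linking means in practice, the following creates a file, hard-links it under a second name and compares inode numbers. The file names are made up for illustration and the scratch directory comes from mktemp.

```shell
#!/bin/sh
# Two hard-linked names point at one inode, so the data is stored once.
workdir=$(mktemp -d)
echo "some contents" > "$workdir/original.txt"
ln "$workdir/original.txt" "$workdir/linked.txt"

# ls -i prints the inode number as the first field.
inode_a=$(ls -i "$workdir/original.txt" | awk '{print $1}')
inode_b=$(ls -i "$workdir/linked.txt" | awk '{print $1}')

echo "original.txt inode: $inode_a"
echo "linked.txt inode:   $inode_b"
# Remove "$workdir" when you are done experimenting.
```

Both names report the same inode number, which is why an unchanged file costs almost nothing extra in each additional backup branch.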
Using rsync and cp we can implement this very easily, thanks to the way rsync works. By default, when the --inplace command-line switch is not used, if rsync detects that a destination file differs from its source file, it does not modify the destination file directly by opening it, writing to it and closing it. Instead, rsync creates a new file, which then takes the place of the old one. This has several advantages:
- Users can keep working with files even while rsync is syncing them underneath. Since rsync always creates a new file instead of modifying the current one, users won’t run into the odd behavior that results from several users or processes updating the same file at once.
- Since rsync creates a new file, when the original destination file is hard-linked across several backup branches, the syncing process won’t indirectly update those backup branches too. Instead, they are kept intact, and a new destination file, mirroring its source file, is created.
We don’t want an update to a file in the backup.0 branch to change every file hard-linked to it, since that would destroy the incremental semantics.
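The difference between the two update strategies can be sketched without rsync at all. The script below only simulates the idea with plain shell redirection (an in-place write) and with a temporary file renamed over the destination (the "create a new file" approach). The file names and the temporary file name are made up for illustration.

```shell
#!/bin/sh
# Compare an in-place update with a replace-by-rename update on a
# file that is hard-linked into an older backup branch.
workdir=$(mktemp -d)
mkdir "$workdir/backup.0" "$workdir/backup.1"

echo "version 1" > "$workdir/backup.0/file.txt"
ln "$workdir/backup.0/file.txt" "$workdir/backup.1/file.txt"

# In-place update: the shared inode is rewritten, so the older
# branch changes too. This is what we want to avoid.
echo "version 2" > "$workdir/backup.0/file.txt"
inplace_old=$(cat "$workdir/backup.1/file.txt")

# Replace-by-rename: write a new file, then rename it over the
# destination. Only the name in backup.0 is repointed.
echo "version 3" > "$workdir/backup.0/.file.txt.tmp"
mv "$workdir/backup.0/.file.txt.tmp" "$workdir/backup.0/file.txt"
rename_new=$(cat "$workdir/backup.0/file.txt")
rename_old=$(cat "$workdir/backup.1/file.txt")

echo "backup.1 after in-place update: $inplace_old"
echo "backup.0 after rename update:   $rename_new"
echo "backup.1 after rename update:   $rename_old"
```

After the in-place write, backup.1 also reads "version 2". After the rename, backup.0 reads "version 3" while backup.1 still holds "version 2", which is the behavior the backup scheme relies on.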
Thus, we can implement a really simple cyclical backup scheme using rsync and hard links.
Things to run on the server
We run this periodically:
    # rm -fr backup.${n}
    # for i in `seq ${n} -1 2`; do mv backup.$((i-1)) backup.${i}; done
    # cp -al backup.0 backup.1

This will rotate all the backups, discarding the oldest one. Then, the cp command will replicate the main branch (backup.0) into backup.1 by using hard links.
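The same rotation can be wrapped in a small function so it is easier to run from cron. This is only a sketch: the function name rotate_backups and its two arguments (the directory holding the branches and the value of n) are my own choices, and it assumes a cp that understands -a and -l, such as GNU cp.

```shell
#!/bin/sh
# rotate_backups DIR N
# Drop backup.N, shift backup.(i-1) to backup.i for i = N..2, then
# hard-link-copy backup.0 into a fresh backup.1.
rotate_backups() {
    dir=$1
    n=$2

    # Refuse values that would delete or skip the main branch.
    if [ "$n" -lt 1 ] || [ ! -d "$dir/backup.0" ]; then
        echo "usage: rotate_backups DIR N (N >= 1, DIR/backup.0 must exist)" >&2
        return 1
    fi

    rm -rf "$dir/backup.$n"

    i=$n
    while [ "$i" -ge 2 ]; do
        prev=$((i - 1))
        # Early runs will not have every branch yet, so skip the gaps.
        if [ -d "$dir/backup.$prev" ]; then
            mv "$dir/backup.$prev" "$dir/backup.$i"
        fi
        i=$prev
    done

    # -a preserves attributes recursively, -l links instead of copying.
    cp -al "$dir/backup.0" "$dir/backup.1"
}
```

For example, `rotate_backups /data 7` would keep backup.0 through backup.7 under /data, where /data is just a placeholder for wherever the branches live.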
NOTE for FreeBSD users: the cp command that comes with the FreeBSD base system supports neither the -a nor the -l command-line switch. -a means -dpR (copy recursively and preserve attributes), while -l means not to copy, but to create hard links instead.
Fortunately, the FreeBSD ports collection includes a port of the GNU coreutils package, which sports the full GNU cp program, supporting the -a and -l switches:
    # cd /usr/ports/sysutils/coreutils
    # make all install
To avoid a name clash between the cp command from the FreeBSD base system and the GNU one, the GNU cp command is installed as gcp. So, in the script listed before, we should replace the invocation of cp with gcp.
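If the same script has to run on both kinds of systems, one option is to pick the command at run time. This is a sketch under the assumption that, when gcp is not installed, the plain cp found in the PATH already supports -a and -l. The variable name CP is my own.

```shell
#!/bin/sh
# Prefer the GNU cp installed as gcp, otherwise fall back to plain cp.
if command -v gcp >/dev/null 2>&1; then
    CP=gcp
else
    CP=cp
fi
echo "using: $CP"
```

The rotation step then becomes `$CP -al backup.0 backup.1`.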
Things to run on the client
To perform the incremental backup against the server, we can run the following command:
    # rsync -a -E Users/ rsync://:/data/backup.0/

It’s very important to keep the timestamps synchronized on both the client and the server, so that rsync can use them to decide which files have changed and which have not. This is done with the -t command-line switch. Note that the -a (archive) command-line switch to rsync is the same as specifying -rlptgoD, and thus we don’t have to specify -t separately.
The -E command-line switch is useful on Mac OS X machines. It allows rsync to sync files that use resource forks on an HFS+ volume by encoding them in the AppleDouble format.
Adding extended attributes support to rsync
September 9th, 2005
The rsync software that comes with Mac OS X 10.4 and newer releases supports extended attributes (HFS+ resource forks). This means it can sync files from a local HFS+ filesystem to a remote volume that does not support HFS+ resource forks by using the AppleDouble encoding.
The AppleDouble encoding splits a native HFS+ file in two parts:
- The data fork, which is the one that holds the real contents of the file, like the contents of a document, the pixels from a bitmap, and so on. It receives the same name as the original HFS+ file.
- The resource fork, which is built from the data held in the resource forks plus the Finder data, such as Spotlight comments. Its filename is formed by prepending a dot and an underscore to the original HFS+ filename.
Thus, an HFS+ file named MyDocument.webloc, when stored in the AppleDouble format, is split into two files: MyDocument.webloc and ._MyDocument.webloc.
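The naming rule is easy to express in a few lines of shell. The helper below is purely illustrative (the name appledouble_name is made up) and only computes the name of the companion file for a given path. It does not read or write any resource fork data.

```shell
#!/bin/sh
# appledouble_name PATH
# Print the path of the AppleDouble companion file: same directory,
# original filename prefixed with "._".
appledouble_name() {
    path=$1
    case $path in
        */*) printf '%s/._%s\n' "${path%/*}" "${path##*/}" ;;
        *)   printf '._%s\n' "$path" ;;
    esac
}

appledouble_name "MyDocument.webloc"
appledouble_name "Users/me/MyDocument.webloc"
```

The first call prints ._MyDocument.webloc and the second prints Users/me/._MyDocument.webloc.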
By default, the Mac OS X rsync implementation does not enable extended attributes support. It must be explicitly enabled by supplying the -E command-line switch to the rsync command. The problem, however, is that few rsync implementations (I don’t know of any besides Apple’s Mac OS X 10.4 one) support either this functionality or the command-line switch that activates it.
The solution turned out to be pretty easy. I downloaded the Darwin rsync source code (rsync-20) from the Darwin Projects Directory and extracted the file patches/EA.diff from it. This patch file contains the extended attributes (HFS+ resource forks) functionality I was looking for. At the time of this writing, the patch is against rsync-2.6.3.
Thus, I only had to grab the sources for rsync-2.6.3, which are also included in the rsync-20.tar.gz file I had downloaded, then extract, patch, configure, build and install them:
    # tar zxvf rsync-20.tar.gz
    # cd rsync-20
    # tar zxvf rsync-2.6.3.tar.gz
    # cd rsync-2.6.3
    # patch < ../patches/EA.diff
    # CFLAGS="-O -pipe" \
      ./configure --prefix=/usr/local \
      --disable-debug --enable-ea-support \
      --with-rsyncd-conf=/usr/local/etc/rsyncd.conf
    # make
    # make install
I ran the previous commands on FreeBSD 7.0-CURRENT, hence the /usr/local prefix. Also, note the --enable-ea-support command-line switch supplied to configure. It is required in order to build in the extended attributes support. Leaving it out will produce a normal rsync without extended attributes support.
    # rsync --help | grep -- -E
     -E, --extended-attributes   copy extended attributes
That’s all, folks.