Warehouse-scale computers

May 25th, 2009

Interesting paper by Urs Hölzle and Luiz André Barroso, from Google, on warehouse-scale computers. It covers building large-scale computing clusters, datacenters, power efficiency, cooling, performance, parallel computing, cost modeling, dealing with failures, etc.

I’ve rescued the following e-mail from Neil Brown about building a new RAID5 array in Linux and why one of the disks is marked as a spare while the array is being constructed:

When creating a new raid5 array, we need to make sure the parity
blocks are all correct (obviously). There are several ways to do
this.

  1. Write zeros to all drives. This would make the array unusable until the clearing is complete, so isn’t a good option.
  2. Read all the data blocks, compute the parity block, and then write out the parity block. This works, but is not optimal. Remembering that the parity block is on a different drive for each ‘stripe’, think about what the read/write heads are doing. The heads on the ‘reading’ drives will be somewhere ahead of the heads on the ‘writing’ drive. Every time we step to a new stripe and change which is the ‘writing’ head, the other reading heads have to wait for the head that has just changed from ‘writing’ to ‘reading’ to catch up (finish writing, then start reading). Waiting slows things down, so this is uniformly sub-optimal.
  3. Read all data blocks and parity blocks, check the parity block to see if it is correct, and only write out a new block if it wasn’t. This works quite well if most of the parity blocks are correct as all heads are reading in parallel and are pretty-much synchronised. This is how the raid5 ‘resync’ process in md works. It happens after an unclean shutdown if the array was active at crash-time. However if most or even many of the parity blocks are wrong, this process will be quite slow as the parity-block drive will have to read-a-bunch, step-back, write-a-bunch. So it isn’t good for initially setting the parity.
  4. Assume that the parity blocks are all correct, but that one drive is missing (i.e. the array is degraded). This is repaired by reconstructing what should have been on the missing drive, onto a spare. This involves reading all the ‘good’ drives in parallel, calculating the missing block (whether data or parity) and writing it to the ‘spare’ drive. The ‘spare’ will be written to a few (10s or 100s of) blocks behind the blocks being read off the ‘good’ drives, but each drive will run completely sequentially and so at top speed.

On a new array where most of the parity blocks are probably bad, ‘4’
is clearly the best option. ‘mdadm’ makes sure this happens by creating a raid5 array not with N good drives, but with N-1 good drives and one spare. Reconstruction then happens and you should see exactly what was reported: reads from all but the last drive, writes to that last drive.
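
The reconstruction in option 4 relies on the fact that, within a stripe, any single block is the XOR of all the others, whether it holds data or parity. Here is a toy sketch of that property in shell. It assumes each "block" is just a small integer, and xor_blocks is a made-up helper. The real md driver works on whole blocks inside the kernel.

```shell
# Toy model: every "block" is a small integer; xor_blocks is a hypothetical helper.
xor_blocks() {
    acc=0
    for block in "$@"; do
        acc=$(( acc ^ block ))
    done
    echo "$acc"
}

# One stripe with three data blocks and its parity block.
d0=23; d1=42; d2=7
parity=$(xor_blocks "$d0" "$d1" "$d2")

# Pretend the drive holding d1 is the missing one: rebuild its block from
# the surviving data blocks plus the parity block.
rebuilt=$(xor_blocks "$d0" "$d2" "$parity")
echo "parity=$parity rebuilt=$rebuilt"   # prints parity=58 rebuilt=42
```

The same call rebuilds the parity block itself if that is the one treated as missing, which is why the spare can be filled sequentially no matter which kind of block lands on it.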

This post roughly describes what I do when I want to non-destructively upgrade my Linux system. By non-destructive I mean a procedure that allows me to upgrade but also to roll back if something goes wrong. As an example, I wanted to upgrade my Ubuntu system from Jaunty to Karmic. Since Karmic is currently at Alpha 1, the chances of the upgrade going bad or eating my data were high. Here is where LVM and LVM snapshots come into play.

Basically, the idea consists of taking a snapshot of the root filesystem using an LVM snapshot, rebooting the system with the filesystem from the LVM snapshot as the root filesystem, performing an in-place upgrade of Jaunty to Karmic and seeing what happens. Since the upgrade takes place on a system running with the root filesystem mounted from the LVM snapshot, the original root volume is kept intact. Hence, if something goes wrong I can always reboot into the original root filesystem and the system will behave as if no modifications had been made during the upgrade.

The LVM snapshot volume should be big enough to store, in the worst case, a completely new Linux installation. The average Ubuntu installation requires less than 8GB of disk, so the LVM snapshot should be about that size plus some slack for the downloaded packages. In my case, since I have enough free disk space, I chose 16GB just to be on the safe side.

Resize the root filesystem and root volume

This step is only required if there is no space in the volume group to accommodate the snapshot volume. In my case, the volume group is full so I need to shrink the root filesystem and the root volume. resize2fs does not currently allow one to shrink a filesystem online, so I booted from the Jaunty LiveCD and entered rescue mode. From there:

# e2fsck -f /dev/root/root
# resize2fs /dev/root/root 80G
# lvresize -L 80G /dev/root/root

Create the LVM snapshot

To create a 16GB LVM snapshot volume of my root volume:

# lvcreate -s -n karmic-root -L 16G /dev/root/root

Prepare the boot environment

Mount the filesystem from the LVM snapshot volume and modify /etc/fstab, replacing the device name of the original root filesystem with the device name where the snapshotted root filesystem lives:

# mount /dev/root/karmic-root /mnt
# vi /mnt/etc/fstab

In my case, the line for the new root filesystem looks like:

/dev/root/karmic-root /  ext4 defaults 0 1
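
If you would rather not edit the file by hand, the same change can be scripted. This is a minimal sketch, assuming the root entry is the only uncommented line whose mount point is /. retarget_root is a hypothetical helper, and the sample fstab contents (including the /boot entry) are made up for the demo.

```shell
# retarget_root FSTAB NEW_DEVICE (hypothetical helper): print FSTAB with the
# device of the "/" entry replaced; comments and other entries pass through.
retarget_root() {
    awk -v dev="$2" '$1 !~ /^#/ && $2 == "/" { $1 = dev } { print }' "$1"
}

# Demo on a scratch copy instead of the real /mnt/etc/fstab.
scratch=$(mktemp)
printf '%s\n' \
    '# <file system> <mount point> <type> <options> <dump> <pass>' \
    '/dev/root/root / ext4 defaults 0 1' \
    '/dev/sda1 /boot ext3 defaults 0 2' > "$scratch"

retarget_root "$scratch" /dev/root/karmic-root
```

The helper only prints the result, so the output can be reviewed before it replaces the real file.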

Reboot into the snapshot

Trigger the grub menu and modify the kernel entry that corresponds to the Ubuntu system. The idea is to use the device name for the LVM snapshot. This is what the new kernel line in the menu looks like:

kernel /vmlinuz-2.6.28-11-generic root=/dev/root/karmic-root ro

Then press b to boot the system. The system should boot normally, but instead of using the original root filesystem it should be using the filesystem from the LVM snapshot:

$ grep /dev/mapper /proc/mounts
/dev/mapper/root-karmic--root / ext4 rw,relatime,errors=remount-ro,barrier=1,data=ordered 0 0
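
Note the doubled hyphen in the /dev/mapper name. Device-mapper joins the volume group and logical volume names with a single hyphen, so any hyphen that is part of either name is escaped by doubling it. Here is a small sketch of that mapping. dm_name is a hypothetical helper, not an LVM tool.

```shell
# dm_name VG LV (hypothetical helper): print the /dev/mapper path for an LV.
dm_name() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

dm_name root karmic-root   # prints /dev/mapper/root-karmic--root
```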

In-place upgrade

I won’t describe how to do an in-place upgrade of Ubuntu. There are many resources out there that describe how to do that. The point here is that the upgrade will modify the snapshot while the original root filesystem is kept intact.

Destroy the snapshot

If something goes wrong with the upgrade, and it usually does when upgrading to an Alpha version, bringing the system back to a usable state is just a matter of rebooting and choosing the right entry in grub. In-place upgrades of Ubuntu will typically add a new kernel to the list of entries in grub but won’t modify the existing ones.

After rebooting into the original system, the snapshot can be removed:

# lvremove /dev/root/karmic-root

If you don’t intend to experiment with further upgrades, you may want to grow the root LVM volume, and then the root filesystem, back to their original sizes.

Recently, the Chromium team has started to provide official builds of Chromium for Mac OS X. It looks to me like these builds are just the output of the continuous build process (also known as the waterfall).

In any case, this is good news and, to me, proof that Chromium for Mac OS X keeps evolving at a fast pace and is making very good progress. As a consequence, a few days ago I switched to Chromium as my main browser (also in Linux) and I must say it feels great so far.

PS: This post was written entirely under Chromium for Mac OS X. No crashes or any strange behavior were experienced.

This post documents how I set up Postfix 2.6 to relay all of its e-mail to GMail.

I used different sources to assemble what is described next. Worth mentioning are Getting Postfix to work on Ubuntu with Gmail, Gmail on Home Linux Box using Postfix and Fetchmail, Postfix Gmail SMTP Relay and finally Postfix TLS Support.

No client-side certificate, please

Some Web sites out there insist on creating client-side certificates for Postfix when relaying mail to GMail. That is incorrect. Client-side certificates are not required when relaying mail to GMail. At the moment, GMail only supports user and password authentication, so supplying client-side certificates during the authentication phase is likely to confuse the GMail SMTP servers and/or create problems.

Postfix main.cf configuration file

The following configuration directives have to be added to a pristine Postfix main.cf configuration file. I added them at the end of the file:

# The e-mail sent will use this hostname as the e-mail origin.
myhostname = my.dynamicdns.domain.name
myorigin = $myhostname

# Relay all e-mail via GMail.
relayhost = [smtp.gmail.com]:587

# SASL authentication
smtp_sasl_auth_enable=yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous

# TLS
# No client-side certificate or key.
smtp_tls_eccert_file =
smtp_tls_eckey_file =
# Opportunistic TLS: http://www.postfix.org/TLS_README.html#client_tls_may
smtp_tls_security_level = may
# Trusted root CAs
smtp_tls_CAfile = /etc/postfix/cacert.pem
smtpd_tls_received_header = yes
tls_random_source = dev:/dev/urandom
smtpd_tls_security_level = may

Store authentication credentials

GMail MSA/SMTP servers require the sending user to authenticate with their standard GMail user name and password. These credentials are stored in the file /etc/postfix/sasl_passwd, which must be properly secured:

gmail-smtp.l.google.com user.name@gmail.com:password
smtp.gmail.com user.name@gmail.com:password

Make sure the file is properly secured so that only the root user can read its contents:

# chown root:root /etc/postfix/sasl_passwd
# chmod 600 /etc/postfix/sasl_passwd
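
If the file is created first and chmod is run afterwards, it briefly exists with default permissions. One way to avoid that window is to create it under a restrictive umask. This is a minimal sketch. write_sasl_passwd is a hypothetical helper, and the demo writes to a scratch directory rather than /etc/postfix.

```shell
# write_sasl_passwd FILE HOST USER PASSWORD (hypothetical helper).
# The subshell keeps the umask change from leaking into the caller.
write_sasl_passwd() {
    (
        umask 077
        printf '%s %s:%s\n' "$2" "$3" "$4" > "$1"
    )
}

# Demo against a scratch directory rather than /etc/postfix.
demo_dir=$(mktemp -d)
write_sasl_passwd "$demo_dir/sasl_passwd" smtp.gmail.com user.name@gmail.com password
ls -l "$demo_dir/sasl_passwd"
```

The umask only applies when the file is newly created. If the file already exists, its mode is left unchanged and the chmod above is still needed.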

Postfix requires the conversion of the plain-text file to a hashed table format. This is achieved by running:

# postmap /etc/postfix/sasl_passwd

This will create a file named /etc/postfix/sasl_passwd.db.

Populate the list of trusted CA certificates

This is required because, by default, Postfix does not trust any CA out there. cacert.pem is just Postfix’s trusted CA root certificate store. Other software components, like Web browsers, use different stores, but this file is essentially equivalent to those.

GMail SSL/TLS certificates are signed by Thawte. Therefore, for Postfix to be able to authenticate the GMail SMTP server, Thawte’s root CA certificates have to be stored somewhere Postfix can find them. Otherwise, when Postfix tries to forward a message to smtp.gmail.com the following errors are logged:

May 10 15:40:07 postfix postfix/smtp[10677]: certificate verification failed
  for smtp.gmail.com[72.14.221.111]:587: untrusted issuer
  /C=ZA
  /ST=Western Cape
  /L=Cape Town
  /O=Thawte Consulting cc
  /OU=Certification Services Division
  /CN=Thawte Premium Server CA
  /emailAddress=premium-server@thawte.com
May 10 15:40:07 postfix postfix/smtp[10677]: warning: SASL authentication failure:
  No worthy mechs found

From an Ubuntu Linux box that had the ssl-cert package installed, I copied the certificates that correspond to Thawte’s CA to the Postfix machine. There, it’s just a matter of concatenating the multiple .pem files into the single file that Postfix will use: /etc/postfix/cacert.pem.

In order to generate cacert.pem from the individual Thawte certificates:

# cat \
  Thawte_Personal_Basic_CA.pem \
  Thawte_Personal_Freemail_CA.pem \
  Thawte_Personal_Premium_CA.pem \
  Thawte_Premium_Server_CA.pem \
  Thawte_Server_CA.pem \
  Thawte_Time_Stamping_CA.pem \
  > /etc/postfix/cacert.pem
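
A quick sanity check after concatenating is to count the certificates that ended up in the bundle. This is a minimal sketch. count_certs is a hypothetical helper, and the demo uses placeholder files in a scratch directory, not real certificates.

```shell
# count_certs FILE (hypothetical helper): number of PEM certificates in FILE.
count_certs() {
    grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Demo with two placeholder "certificates" in a scratch directory.
work=$(mktemp -d)
for name in first second; do
    printf '%s\n' \
        '-----BEGIN CERTIFICATE-----' \
        "placeholder for $name" \
        '-----END CERTIFICATE-----' > "$work/$name.pem"
done

cat "$work/first.pem" "$work/second.pem" > "$work/cacert.pem"
count_certs "$work/cacert.pem"   # prints 2
```

For the real bundle, the count should match the number of .pem files that were concatenated.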

Reload Postfix configuration

For example, by sending the SIGHUP signal to Postfix’s master process:

# pkill -1 master
# tail /var/log/maillog
May 10 15:58:42 postfix postfix/master[6921]: reload
  -- version 2.6-20090125, configuration /etc/postfix

Test

You can test by connecting to port 25 of your Postfix machine or, as in my case, by using the mail command:

# mail user.name@gmail.com
Subject: Hola
Este es un mensaje de prueba.
.

Postfix should log some messages to /var/log/maillog that should be equivalent to the following ones:

May 10 15:58:52 postfix postfix/pickup[32213]: 1234567890: uid=0 from=<root>
May 10 15:58:52 postfix postfix/cleanup[12716]: 1234567890:
  message id=<20090510135852.1234567890@my.dynamicdns.domain.name>
May 10 15:58:52 postfix postfix/qmgr[8604]: 1234567890:
  from=<root@my.dynamicdns.domain.name>, size=323, nrcpt=1 (queue active)
May 10 15:58:54 postfix postfix/smtp[32243]: 1234567890:
  to=<user.name@gmail.com>,
  relay=smtp.gmail.com[72.14.221.111]:587,
  delay=3.4,
  delays=1.1/0.21/0.76/1.3,
  dsn=2.0.0,
  status=sent (250 2.0.0 OK 1241963934 l12sm1383617fgb.4)
May 10 15:58:54 postfix postfix/qmgr[8604]: 1234567890: removed
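
The delays= field in the smtp line breaks the total delay into four parts: time before the queue manager, time in the queue manager, connection setup (DNS, HELO and TLS included) and message transmission. In the log above they add up to roughly the reported delay (1.1 + 0.21 + 0.76 + 1.3 ≈ 3.4). Here is a small sketch that labels the parts. split_delays is a hypothetical helper, and the labels are my own shorthand, not Postfix names.

```shell
# split_delays LINE (hypothetical helper): print the four components of the
# delays= field of a Postfix log line, each with a descriptive label.
split_delays() {
    printf '%s\n' "$1" |
        sed -n 's|.*delays=\([0-9.]*\)/\([0-9.]*\)/\([0-9.]*\)/\([0-9.]*\).*|before_qmgr=\1 in_qmgr=\2 conn_setup=\3 transmission=\4|p'
}

split_delays 'relay=smtp.gmail.com[72.14.221.111]:587, delay=3.4, delays=1.1/0.21/0.76/1.3, dsn=2.0.0'
# prints before_qmgr=1.1 in_qmgr=0.21 conn_setup=0.76 transmission=1.3
```

A large third component usually points at the connection to the relay, while a large first component points at the local machine.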

Today I was faced with the following problem: I was trying to configure one of the Ethernet interfaces of an OpenBSD 4.5 box with both a dynamic address leased via DHCP and a static IP address. Initially, I tried this:

# cat /etc/hostname.vr2
dhcp
inet 1.1.1.11 255.255.255.0 NONE
up
# sh /etc/netstart vr2

The problem with this approach is that dhclient never gets daemonized because netstart upsets it: dhclient notices that something else reconfigured the interface and exits. So I thought about reversing the order of the first two lines:

# cat /etc/hostname.vr2
inet 1.1.1.11 255.255.255.0 NONE
dhcp
up
# sh /etc/netstart vr2

Now dhclient daemonizes but also removes all previously configured IP addresses, so the static address set by the first line is wiped by dhclient. Not very nice.

Turns out the solution lies in /etc/dhclient.conf:

# cat /etc/dhclient.conf
interface "vr2" {
        supersede domain-name "example.com";
        supersede domain-name-servers 1.1.1.1;
}

alias {
        interface "vr2";
        fixed-address 1.1.1.11;
        option subnet-mask 255.255.255.0;
}

The alias stanza allows one to define an additional, aliased IP address for an interface, which keeps the machine reachable on a fixed IP address at all times.

Neat.

Chromium for Mac OS X

May 1st, 2009

Chromium is the open source browser developed by Google. The differences between Chromium and Chrome are minimal. Chrome has custom icons and other paraphernalia that, due to licensing issues, can’t be made available in Chromium. Chrome is also available as a binary for Microsoft Windows operating systems, and can be downloaded from the Google Chrome Web site.
Other than that, Chromium is a fully functional browser product that is currently available only as source code. Chromium is available for Windows, Mac OS X and Linux. The Mac OS X and Linux ports are still under heavy development but are becoming more and more usable over time.

For more than a month I’ve been tracking the development of Chromium for Mac OS X. I’ve been building and testing Chromium for Mac OS X myself [1] and my general impression is that the development pace is pretty fast. For example, yesterday, a mock Preferences dialog box was added. A few days ago, working support for draggable and detachable tabs was also added (previously it was possible to detach a tab from a window but it was not possible to re-attach it to an existing window).

Overall, the Mac OS X port of Chromium is getting more and more usable and stable. I’m now able to use it for most browsing tasks. The look and feel matches Aqua perfectly but also closely resembles its Windows counterpart. While it is true there are a few annoyances, like losing the focus on edit controls when switching tabs, or tabs crashing at times when executing a paste operation, they are getting fixed with each iteration. The browser feels extremely fast when compared to Firefox 3.0 and faster than Firefox 3.1, Safari or Safari 4 beta. Heavy and complex Web pages like Google Reader or Google Mail load almost instantly while still looking correct. Some Web pages get rendered slightly differently from other browsers. As an example, Google Mail shows bigger spacing between lines in the mail thread (main) view and slightly smaller fonts, but these are very subtle differences that do not affect usability or readability.

I must confess I’m pretty impressed with Chromium. When Google announced Chrome with initial availability for the Windows platform only, I was very disappointed. I also thought that it’d take much longer to see a nearly-functional port for Mac OS X or Linux. But I was wrong. It is good to be wrong. Let’s hope the development pace stays at the same level :)

PS: By the way, this post was written entirely from Chromium in Mac OS X. The tab crashed a couple of times but WordPress has a nice auto-save feature that I really appreciate ;)

[1] http://code.google.com/p/chromium/wiki/MacBuildInstructions