Installing VMware ESXi 4 over PXE

September 19th, 2009

Let’s face it: installing VMware ESXi from a CD-ROM or from a USB key is painfully slow. Installing from the network is faster and more flexible. And preparing VMware to be installed from PXE turned out to be very easy.

The ISC DHCP configuration file could look like this:

# cat /etc/dhcpd.conf
default-lease-time 86400;
max-lease-time 604800;
option subnet-mask 255.255.255.0;
option broadcast-address 1.0.0.255;
option domain-name-servers 1.0.0.1;
option domain-name "example.com";

subnet 1.0.0.0 netmask 255.255.255.0 {
        range 1.0.0.100 1.0.0.254;
        option routers 1.0.0.2;
        option ntp-servers 1.0.0.3;
}

host esx {
        hardware ethernet 00:aa:bb:cc:dd:ee;
        fixed-address esx;
        next-server 1.0.0.4;
        filename "pxelinux.0";
}

The important bits are in the host esx section, where PXE boot support is enabled by means of the next-server and filename directives. next-server gives the IP address (or DNS name) of the TFTP server from which the PXE boot loader is downloaded, and filename names the file that holds the boot loader code.

Looking inside the TFTP server, we can see that the tftpboot root directory is very simple: it consists of a standard pxelinux.0 PXE boot loader, a pxelinux.cfg directory where the configuration files are stored and a directory for all VMware-related files. pxelinux.0 is just part of the syslinux project. pxelinux.cfg has to be created by hand. vmware-esxi-4-0-0 contains files copied directly from the VMware ESXi 4 installable ISO image:

# ls -l /tftpboot
total 40
-rw-r--r--  1 root  wheel  14776 Sep 18 03:17 pxelinux.0
drwxr-xr-x  2 root  wheel    512 Sep 18 03:43 pxelinux.cfg
drwxr-xr-x  2 root  wheel    512 Sep 18 03:50 vmware-esxi-4-0-0

For all the naming options available for configuration files stored under pxelinux.cfg, check the manual page for pxelinux or search the Internet. In my case, I just chose 01-${MAC}, where ${MAC} is the MAC address of the Ethernet interface used to PXE-boot the machine where ESXi is to be installed, written in lowercase with dashes instead of colons. In this case, ${MAC} is 00-aa-bb-cc-dd-ee.
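The file name can be derived from the MAC address mechanically. Here is a minimal sketch; pxe_cfg_name is a helper name I made up for illustration, not something shipped with syslinux:

```shell
#!/bin/sh
# pxe_cfg_name is a hypothetical helper, not part of syslinux.
# It prints the pxelinux.cfg file name for a colon-separated MAC address.
pxe_cfg_name() {
    # Lowercase the hex digits, turn ':' into '-', and prefix the result
    # with "01-", the ARP hardware type for Ethernet.
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

pxe_cfg_name 00:AA:BB:CC:DD:EE
```

For the machine in this post, this prints 01-00-aa-bb-cc-dd-ee.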

The contents of the configuration file are in fact a slightly modified copy of the contents of the isolinux.cfg file from the VMware ESXi 4.0 installable ISO image. The only differences are the default and label directives and the adjusted path names for the kernel and modules: all these files live inside their own directory to avoid polluting the tftpboot root.

# cat /tftpboot/pxelinux.cfg/01-00-aa-bb-cc-dd-ee
default esxi
label esxi
kernel vmware-esxi-4-0-0/mboot.c32
append vmware-esxi-4-0-0/vmkboot.gz
   --- vmware-esxi-4-0-0/vmkernel.gz
   --- vmware-esxi-4-0-0/sys.vgz
   --- vmware-esxi-4-0-0/cim.vgz
   --- vmware-esxi-4-0-0/ienviron.tgz
   --- vmware-esxi-4-0-0/image.tgz
   --- vmware-esxi-4-0-0/install.tgz

The files stored inside the vmware-esxi-4-0-0 directory were copied directly from the VMware ESXi 4.0 installable ISO image, as mentioned above:

# ls -l /tftpboot/vmware-esxi-4-0-0
total 694704
-r--r--r--  1 root  wheel   12730046 Sep 18 03:15 cim.vgz
-r--r--r--  1 root  wheel    5818848 Sep 18 03:15 ienviron.tgz
-r--r--r--  1 root  wheel  288629638 Sep 18 03:17 image.tgz
-r--r--r--  1 root  wheel      21456 Sep 18 03:17 install.tgz
-r-xr-xr-x  1 root  wheel      47404 Sep 18 03:44 mboot.c32
-r--r--r--  1 root  wheel   46184258 Sep 18 03:15 sys.vgz
-r--r--r--  1 root  wheel      16805 Sep 18 03:15 vmkboot.gz
-r--r--r--  1 root  wheel    2044368 Sep 18 03:15 vmkernel.gz
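Since a single missing file makes the PXE boot fail, it may be worth checking that every path named in the configuration file exists under the TFTP root. The following is only a sketch: check_pxe_files is a hypothetical helper, and it assumes that all paths in the configuration file have the form directory/file, as they do above.

```shell
#!/bin/sh
# check_pxe_files is a hypothetical helper (not part of syslinux).
# Usage: check_pxe_files TFTP_ROOT CONFIG_FILE
# It extracts every "directory/file" token from the config file and
# reports the ones that do not exist under the TFTP root.
check_pxe_files() {
    root=$1
    cfg=$2
    missing=0
    for f in $(grep -Eo '[A-Za-z0-9._-]+/[A-Za-z0-9._-]+' "$cfg"); do
        if [ ! -f "$root/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 if everything was found, 1 otherwise.
    return "$missing"
}

# Example, using the paths from this post:
# check_pxe_files /tftpboot /tftpboot/pxelinux.cfg/01-00-aa-bb-cc-dd-ee
```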

When it is not possible to install VMware ESXi 4.0 from a CD/DVD drive, and if the machine supports booting from USB, one can easily install from a USB drive. Preparing the USB drive to install ESXi 4.0 from it is very easy:

Create a FAT32 partition on the USB drive:

# install-mbr /dev/sdX
# fdisk /dev/sdX
...
# mkfs.vfat /dev/sdX1

Make sure the FAT32 partition is tagged as bootable/active in the MBR and that preferably it has a valid Win32 FAT32 partition type.

Next, copy the contents of the ESXi 4.0 CD to the FAT32 partition on the USB drive:

# mount -o loop /path/to/VMware-VMvisor-Installer-4.0.0-171294.x86_64.iso /mnt
# mount /dev/sdX1 /media
# cp /mnt/* /media
# mv /media/isolinux.cfg /media/syslinux.cfg
# umount /media
# umount /mnt

The last step consists of installing syslinux into the FAT32 partition:

# syslinux -s /dev/sdX1

Done!

In this second post I want to talk about the interaction problems I experienced with the HP SmartArray P212 controller in this computer. The HP SmartArray P212 controller is certified for VMware ESXi 4.0 and Solaris 10. Initially I thought that using VMware would be useful to me in order to play with Solaris and even Windows 7.

However, I haven’t been able to get VMware ESXi 4.0 to work properly on this controller. If I create 4 logical drives in the HP controller, one for each physical disk, VMware finds the drives and figures out their right sizes. However, if I configure a 3-drive RAID-5 logical volume in the HP controller, yielding a usable 3.0TB volume size, VMware finds and reports a 0.0B-sized volume. I tried different options from the HP SmartArray BIOS, like limiting the maximum bootable partition size, but the end result is always the same: VMware sees a 0.0B logical volume that can be used neither to install VMware nor to store virtual disks.

In the end, I ditched VMware ESXi 4.0 in favor of OpenSolaris, at least on this machine. I could have created 4 logical volumes, but it doesn’t make much sense for VMware itself. It makes perfect sense when running Solaris and using RAIDZ, though.

I haven’t been able to find any explanation for this problem other than VMware not supporting LUNs bigger than 2TB. Is this the case? Do any of you have experience with VMware and LUNs larger than 2TB?
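For what it’s worth, a 2TB ceiling is what you get when sectors are addressed with 32-bit numbers and each sector is 512 bytes. I have not confirmed that this is what happens here; the sketch below only does the arithmetic, and the 1.5TB drive size is my assumption, derived from a 3-drive RAID-5 set yielding 3.0TB.

```shell
#!/bin/sh
# Largest size addressable with 32-bit sector numbers and 512-byte sectors
# (2^32 sectors * 512 bytes).
limit=$(( 4294967296 * 512 ))

# Usable size of a 3-drive RAID-5 set: two drives' worth of data.
# The 1.5 TB (decimal) drive size is an assumption.
volume=$(( 2 * 1500000000000 ))

echo "limit:  $limit bytes"
echo "volume: $volume bytes"

if [ "$volume" -gt "$limit" ]; then
    echo "the volume exceeds the 32-bit sector limit"
fi
```

Under those assumptions the limit is 2199023255552 bytes, well below the 3000000000000 bytes of the RAID-5 volume.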

libvirt and virt-manager are a blessing. They bring powerful, free, open source management to Xen- and KVM-based virtualization environments.

I’ve been using both for quite a while. Also, I’ve always preferred bridged networking for my virtual machines over NAT. While NAT is non-disruptive and allows for isolation, I typically like to easily access services provided by my virtual machines, like SSH or NFSv4. It turns out that setting up bridged networking in libvirt is very easy, as long as a bridge interface is detected by libvirt.

The simplest solution consists of creating a bridge interface that enslaves all the physical network interfaces used to connect to the LAN or the Internet. For example, in Ubuntu, in order to enslave eth0 to a br0 bridge interface, while using DHCP for IPv4 address configuration, /etc/network/interfaces needs to look like this:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet manual

# The bridge
auto br0
iface br0 inet dhcp
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0
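
After restarting networking, a quick way to confirm that br0 really came up as a bridge is to look for its bridge directory in sysfs, which Linux creates for bridge devices. A minimal sketch; is_bridge is a made-up helper name, and its optional second argument (an alternative sysfs root) is only there to make the function easy to test:

```shell
#!/bin/sh
# is_bridge is a hypothetical helper.
# Usage: is_bridge IFNAME [SYSFS_ROOT]
# On Linux, a bridge device exposes /sys/class/net/IFNAME/bridge.
is_bridge() {
    [ -d "${2:-/sys}/class/net/$1/bridge" ]
}

if is_bridge br0; then
    echo "br0 is a bridge"
else
    echo "br0 is missing or is not a bridge"
fi
```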

Next time you create a virtual machine, bridged networking will be available in addition to NAT-based networking. There is one caveat, at least in Ubuntu: libvirt and virt-manager connect by default to qemu:///session instead of qemu:///system. This is neither good nor bad by itself. qemu:///session allows a non-privileged user to create and use virtual machines, and the virtual network interfaces used by those machines are created and destroyed within the context of the user running virt-manager. Lacking root privileges, these virtual machines are limited to QEMU’s user-mode networking. In order to use advanced networking features like bridged networking, make sure you connect to qemu:///system instead. That is typically achieved by running virt-manager as root (which is not necessarily nice). I tried playing with udev, device ownership and permission masks, but it all boils down to the inability of a non-privileged user to use brctl to enslave network interfaces to a bridge.

Free VMware ESXi

July 29th, 2008

It seems that the ousting of Diane Greene is having big consequences. One of them is that VMware ESXi is now a free product. I see this as a direct attack on Microsoft’s attempt to get into the virtualization market. What does this mean? That there is little reason to choose Microsoft’s hypervisor when you can choose a more mature one that is now free.

Of course, hardware requirements for ESXi still make it hard to afford for small companies, but this is very good news in any case.

The hypervisor war is getting very hot!

I have been fighting for quite some time to get Solaris 10 installed and working properly under VMware Fusion 1.1 on an Intel Core 2 Duo Mac. I think this is an interaction problem between VMware Fusion and Solaris 10. Even if you tell VMware Fusion that you want to install Solaris in 32-bit mode, VMware doesn’t disable the 64-bit, long-mode instruction support (available from the host). Thus, Solaris 10 is able to detect that 64-bit, long-mode instructions are available and boots a 64-bit kernel. VMware then complains about the fact that the virtual machine was configured in 32-bit mode but the guest is trying to execute 64-bit instructions.

The work-around is pretty easy: it consists of disabling 64-bit, long-mode instructions in the VMware configuration file for the guest. This is described in greater detail in the article Installing Solaris 10 as a 32-Bit Guest Operating System on a 64-Bit Host Machine, but it consists mainly of editing the virtual machine’s .vmx configuration file and adding the following line:

monitor_control.disable_longmode = 1

Rebooting the installation, or booting an already installed virtual machine system should make Solaris boot into 32-bit mode.
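If you have several guests to fix, the edit can be scripted. This is a minimal sketch: disable_longmode is a hypothetical helper, the .vmx path in the usage comment is a placeholder, and the virtual machine should be powered off while its configuration file is modified.

```shell
#!/bin/sh
# disable_longmode is a hypothetical helper.
# Usage: disable_longmode /path/to/guest.vmx
# It appends the setting only if no such line exists yet, so running it
# twice does not duplicate the entry.
disable_longmode() {
    vmx=$1
    if grep -q '^monitor_control\.disable_longmode' "$vmx"; then
        return 0
    fi
    printf '%s\n' 'monitor_control.disable_longmode = 1' >> "$vmx"
}

# Example (placeholder path):
# disable_longmode "/path/to/Solaris 10.vmx"
```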

QEMU, KQEMU and udev

February 8th, 2007

Now that KQEMU has switched to the GPL v2 license, I’m starting to get interested in it.

One problem with KQEMU is that modprobing the kernel module, kqemu.ko, doesn’t automatically create /dev/kqemu unless the proper udev rules are defined.

A canonical udev rule file to get /dev/kqemu created automatically when kqemu.ko is loaded is:

# cat /etc/udev/rules.d/60-kqemu.rules
KERNEL=="kqemu", NAME="%k", MODE="0666"

Creating this file will tell udev to automatically create the corresponding special device file with permissions 0666 and owner root.root, but this can be easily changed to specify a different user group so that only a limited number of users, members of that group, can access KQEMU.
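As a sketch of that variation, the rule below restricts the device to a single group by adding udev’s GROUP key and tightening the mode. The group name kqemu is my assumption (the group has to exist already), and kqemu_rule is a made-up helper that just prints the rule:

```shell
#!/bin/sh
# kqemu_rule is a hypothetical helper.
# Usage: kqemu_rule GROUP
# It prints a udev rule that gives /dev/kqemu to GROUP with mode 0660,
# so only root and members of GROUP can open the device.
kqemu_rule() {
    printf 'KERNEL=="kqemu", NAME="%%k", GROUP="%s", MODE="0660"\n' "$1"
}

kqemu_rule kqemu

# To install the rule (as root):
# kqemu_rule kqemu > /etc/udev/rules.d/60-kqemu.rules
```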

Xen is one of the coolest pieces of software I have ever used. It allows me to partition my box into manageable pieces, for increased security and increased resource utilization. I have been playing extensively with Xen for more than a year and have also written some posts about it.

NetBSD is a lean, mean, fast, free, open source operating system. It is nicely supported under Xen, has nice features like the PF packet filter and the pkgsrc ports-like collection, and runs on nearly every hardware architecture on earth. Because of this, I decided to run NetBSD 3.1 on Xen. NetBSD can run either as the privileged domain (called dom0) or as an unprivileged guest domain (called domU). Since I was already running Linux under Xen as dom0, I am mostly interested in running NetBSD 3.1 as a domU guest on Xen. dom0 can be either Red Hat Enterprise Linux 5.0 or Fedora Core 6, but feel free to use any other Linux distribution as most of them are Xen-ready.

As far as I know, there are some restrictions between the Xen hypervisor + dom0 kernel and domU kernel:

  • You cannot mix PAE-enabled and non-PAE kernels.

    For example, you cannot run a PAE-enabled dom0 kernel and/or PAE-enabled hypervisor and a non-PAE dom0/domU kernel.

    This is currently a problem since Fedora Core 6 and Red Hat Enterprise Linux 5.0 both ship with a PAE-enabled Xen hypervisor and Xen-enabled kernels, but NetBSD does not currently ship a PAE-compatible, Xen-enabled kernel.

  • You cannot mix 64-bit and 32-bit kernels.

    You cannot run a 64-bit Xen hypervisor and 64-bit dom0 kernel and a 32-bit domU kernel.

Since both Fedora Core 6 and Red Hat Enterprise Linux 5.0 ship by default with a PAE-enabled (36-bit addressable memory space) Xen hypervisor and dom0 Xen-enabled Linux kernel, the first thing that I had to do in order to run NetBSD 3.1 as domU under Xen was to recompile the Linux kernel and the Xen hypervisor with PAE support completely disabled. This is described next.

Build Xen hypervisor and dom0 kernel without PAE

You can skip to the next section if you already have a non-PAE, working Xen installation.

The first thing I had to do was to download the SRPM (source RPM) for the latest Linux kernel, for example kernel-2.6.19-1.2895.fc6.src.rpm, then install it by running:

# rpm -i kernel-2.6.19-1.2895.fc6.src.rpm

In file /usr/src/redhat/SPECS/kernel-2.6.spec replace the following:

%ifarch i686
%define buildpae 1
# we build always xen HV with pae
%define xen_flags verbose=y crash_debug=y pae=y
%endif

with:

%ifarch i686
%define buildpae 0
# we build always xen HV with pae
%define xen_flags verbose=y crash_debug=y
%endif

This will cause the Xen hypervisor to be built without PAE support. Additionally, no PAE-enabled extra kernels will be built. The Xen kernel, however, uses its specific configuration file that has to be changed in order to disable PAE support. To disable PAE support for the Xen kernel, I reconfigured the kernel with no PAE support by running:

# rpmbuild -bp /usr/src/redhat/SPECS/kernel-2.6.spec
# cd /usr/src/redhat/BUILD/kernel-2.6.19/linux-2.6.19.i386
# cp configs/kernel-2.6.19-i686-xen.config .config
# make menuconfig

Make sure PAE is disabled: navigate to Processor type and features, then check that High Memory Support is set to either off or 4GB (but not 64GB).

Next, I copied the updated configuration file back to /usr/src/redhat/SOURCES, where it belongs. Also, we need to insert # i386 at the beginning of the file so that the RPM build process can derive the exact processor architecture from the config file when building the RPMs:

# cat <(echo "# i386") .config > ../../../SOURCES/kernel-2.6.19-i686-xen.config

The processor architecture is supplied to make during the build process in the form of ARCH=i386.

Now, let’s build the RPMs:

# rpmbuild -ba --target i686 ../../../SPECS/kernel-2.6.spec

We need to specify i686 as the target architecture since Fedora and Red Hat don’t use i386 anymore for kernels themselves — i386 is now only used for some common RPMs like kernel-headers.

Once the RPMs have been built, check the files under /usr/src/redhat/RPMS/i686. At least there should be a file called kernel-xen-2.6.19-1.2895.i686.rpm. This RPM contains several files, but the ones that we are interested in are:

  • /boot/config-2.6.19-1.2895xen

    Contains the kernel configuration. Make sure either CONFIG_X86_PAE is set to n or is undefined.

  • /boot/vmlinuz-2.6.19-1.2895xen

    The Linux Xen-enabled kernel.

  • /boot/xen.gz-2.6.19-1.2895

    The Xen hypervisor. In the most recent versions of Fedora Core and Red Hat Enterprise Linux, the Xen hypervisor and the Xen-enabled kernel are packaged in the same RPM. This is the right thing to do since both are tightly coupled.
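
The check on the kernel configuration can be done with a one-line grep. A minimal sketch, where pae_enabled is a hypothetical helper name:

```shell
#!/bin/sh
# pae_enabled is a hypothetical helper.
# Usage: pae_enabled KERNEL_CONFIG_FILE
# Succeeds only if the config file explicitly sets CONFIG_X86_PAE=y.
# A commented-out "# CONFIG_X86_PAE is not set" line does not match.
pae_enabled() {
    grep -q '^CONFIG_X86_PAE=y$' "$1"
}

# Example, once the RPM is installed:
# pae_enabled /boot/config-2.6.19-1.2895xen && echo "PAE is still enabled"
```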

Install the new Xen kernel and hypervisor:

# rpm -ivh --force /usr/src/redhat/RPMS/i686/kernel-xen-2.6.19-1.2895.i686.rpm

Reboot:

# reboot

I assume the system will boot correctly and into the new Xen hypervisor and Xen-enabled Linux kernel. You can check that by running:

# uname -a
Linux xen 2.6.19-1.2895xen #1 SMP Sat Feb 3 16:56:34 CET 2007 i686 i686 i386 GNU/Linux

The next step is installing NetBSD 3.1 as a domU. This is covered next.

Installing NetBSD 3.1

The first step is preparing Xen’s domU configuration file and its corresponding storage backend. Xen can use file-backed storage for a domU or block-backed storage (i.e. a disk partition or logical volume). Typically, block-backed storage is faster than file-backed storage, so I set up a 10GiB logical volume for NetBSD:

# lvcreate -n netbsd xen -L 10G

I also used NetBSD’s Internet-based installation since it’s the easiest way to get a working NetBSD installation and the NetBSD community have built Xen-enabled NetBSD kernels:

  • netbsd-INSTALL_XEN3_DOMU

    A Xen-based, domU kernel used to install NetBSD.

  • netbsd-XEN3_DOMU

    A Xen-based, domU kernel used to run the installed system.

Both files can be downloaded from /pub/NetBSD/NetBSD-3.1/i386/binary/kernel. Download and uncompress both of them:

# wget ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-3.1/i386/binary/kernel/netbsd-*XEN3_DOMU.gz
# zcat netbsd-INSTALL_XEN3_DOMU.gz > /boot/netbsd-INSTALL_XEN3_DOMU
# zcat netbsd-XEN3_DOMU.gz > /boot/netbsd-XEN3_DOMU

If you are running SELinux, you will need to relabel these files properly or xm will be unable to load them into memory:

# chcon system_u:object_r:boot_t /boot/netbsd*

Next, create the Xen configuration file for NetBSD. In my case, it looked like this:

# cat /etc/xen/auto/netbsd
kernel = "/boot/netbsd-INSTALL_XEN3_DOMU"
memory = 256
name = "netbsd"
vif = [ 'mac=00:16:3e:00:00:11, bridge=xenbr0' ]
disk = [ 'phy:/dev/xen/netbsd,hda,w' ]
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'

Now, we will install NetBSD by starting the domain:

# xm create -c /etc/xen/auto/netbsd

This will start the new domain and attach to its console. The Example Installation document from NetBSD and the Xensource NetBSDdomU wiki page can assist you in installing NetBSD.

Once the installer has finished, do not reboot. At the end of the installation process, you’ll be brought back to the main install screen. Select e: Utility menu, then a: Run /bin/sh, then type the following at the shell:

mount /dev/xbd0a /mnt
cp -pR /dev/rxbd* /mnt/dev
cp -pR /dev/xbd* /mnt/dev
halt -p

This will copy the required special device files and shut down the guest. Now, you will have to modify the domain config file in order to use the standard NetBSD domU kernel, /boot/netbsd-XEN3_DOMU. Edit /etc/xen/auto/netbsd and replace:

kernel = "/boot/netbsd-INSTALL_XEN3_DOMU"

with:

kernel = "/boot/netbsd-XEN3_DOMU"
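
The same replacement can be scripted. A minimal sketch, assuming the kernel line looks exactly like the one in my configuration file; use_runtime_kernel is a made-up helper name:

```shell
#!/bin/sh
# use_runtime_kernel is a hypothetical helper.
# Usage: use_runtime_kernel XEN_DOMAIN_CONFIG
# It rewrites the kernel line from the install kernel to the runtime
# kernel, going through a temporary file so that the original file keeps
# its ownership and permissions.
use_runtime_kernel() {
    cfg=$1
    tmp=$(mktemp) || return 1
    sed 's|^kernel = "/boot/netbsd-INSTALL_XEN3_DOMU"|kernel = "/boot/netbsd-XEN3_DOMU"|' \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Example:
# use_runtime_kernel /etc/xen/auto/netbsd
```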

And boot the domain again:

# xm create -c /etc/xen/auto/netbsd

During boot, you will see some errors like:


wsconscfg: /dev/ttyEcfg: Device not configured

This is due to the NetBSD guest only having access to one physical console. To kill those errors, edit /etc/ttys from within the NetBSD guest and turn off all terminals except "console", like:

console "/usr/libexec/getty Pc"         vt100   on  secure
ttyE0   "/usr/libexec/getty Pc"         vt220   off secure
ttyE1   "/usr/libexec/getty Pc"         vt220   off secure
ttyE2   "/usr/libexec/getty Pc"         vt220   off secure
ttyE3   "/usr/libexec/getty Pc"         vt220   off secure
...

Also, comment out all screens in /etc/wscons.conf:

#screen 0       -       vt100
#screen 1       -       vt100
#screen 2       -       vt100
#screen 3       -       vt100
#screen 4       -       -
#screen 4       80x25bf vt100

That’s all. Now we have a fully functional NetBSD 3.1 domU guest running on Xen :-)

References

The information and instructions on this post are based on:

  1. NetBSDdomU — How to install NetBSD as a domU on a Linux host.
  2. Example Installation — NetBSD example installation.

Since I first wrote up my attempt at getting VLAN support working under Xen, I’ve received reports from people stating that it doesn’t work as expected. And they are right.

At the end of the first article, I pointed out I was having problems with UDP traffic. It turned out to be worse than I expected, since it affected DNS name resolution, DHCP and other services running inside a domainU. This is why I rethought the implementation, and I now have it working on a production machine acting, among other things, as a DHCP server and DNS server.

In this second try I decided not to mess around with Xen’s default network configuration, so please undo any changes you made so that you end up with a pristine Xen configuration. In this new scenario all the native traffic (tagged and untagged Ethernet frames) is captured by Xen’s switch, xenbr0, and sent to the right network interface. If the traffic received is an 802.1Q tagged frame, the target receives it tagged and thus has to untag and process it accordingly.

Introduction

So, let’s say we have the following logical network topology and virtual machines:

                   |
                  LAN
                   |
-------------------+----------------------------------
|                  |                                 |
|                peth0 ---- xenbr0                   |
|                              |                     |
|            -----------------------------           |
|            |                           |           |
|          vif0.0                     vif1.0         |
|            |                           |           |
|            |            +--------------+------------
|            |            |              |
|            |            |  ------------+------------
|            |            |  |           |           |
|           eth0          |  |          eth0         |
|            |            |  |           |           |
|     -------+-------     |  |     ------+-------    |
|     |             |     |  |     |            |    |
| eth0.1000      eth0.10  |  | eth0.2000     eth0.10 |
|     |             |     |  |     |            |    |
| VLAN 1000      VLAN 10  |  | VLAN 2000     VLAN 10 |
|     |             |     |  |     |            |    |
|    www           ssh    |  |    ftp          ssh   |
|                         |  |                       |
|        Domain0          |  |        DomainU        |
---------------------------  -------------------------

Xen’s switch configuration can be seen with the following command:

root@xen:~# brctl show
bridge name  bridge id           STP enabled   interfaces
xenbr0       8000.feffffffffff   no            peth0
                                               vif0.0
                                               vif1.0

For each domain (this includes domain0 and any domainU) there is a vif|X|.|Y| interface attached to Xen’s bridge xenbr0, where |X| is the domain ID (0 for domain0 and a monotonically increasing number for every domainU) and |Y| is the index of the corresponding network interface inside the domain, eth|Y|. Thus, if a domainU with ID #3 defines two network interfaces, eth0 and eth1, there will be two corresponding virtual network interfaces in domain0, named vif3.0 and vif3.1.
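The naming rule is simple enough to write down as a few lines of shell. This is only an illustration of the convention; vif_names is a made-up helper and does not query Xen at all:

```shell
#!/bin/sh
# vif_names is a hypothetical helper.
# Usage: vif_names DOMAIN_ID INTERFACE_COUNT
# It prints the names of the virtual interfaces that show up in domain0
# for a domain with the given ID and number of network interfaces.
vif_names() {
    domid=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "vif$domid.$i"
        i=$((i + 1))
    done
}

vif_names 3 2
```

For domain ID 3 with two interfaces, this prints vif3.0 and vif3.1, one per line.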

Instead of trying to export VLAN interfaces to one or more domainUs, we export the whole, native (tagged or not) network interface to the domainU and, inside this domainU, we can configure VLAN subinterfaces if needed.

Sample scenario

Let’s say we want to offer the following services per VLAN:

  • WWW server on VLAN 1000
  • FTP server on VLAN 2000
  • SSH access to administer the WWW server, reachable only through VLAN 10
  • SSH access to administer the FTP server, reachable only through VLAN 10

But we also want to partition the physical machine in two, so domain0 serves WWW traffic while domainU serves FTP traffic:

          WWW        FTP        SSH
domain0   VLAN 1000  -          VLAN 10
domainU   -          VLAN 2000  VLAN 10

Thus, we need the following VLAN subinterfaces:

  • eth0.10 and eth0.1000 on domain0
  • eth0.10 and eth0.2000 on domainU

Configuring VLAN subinterfaces in domainU is straightforward. However, it’s a little bit more difficult for domain0.

Configuring VLAN subinterfaces for domain0

First of all, make sure you are using bridging for your Xen configuration. Make sure the following line is uncommented in /etc/xen/xend-config.sxp:

(network-script network-bridge)

And comment out any other network-script configuration lines, like:

(network-script network-nat)

or

(network-script network-route)

It seems we can’t bring up VLAN subinterfaces before Xen’s network script is fired up since Xen’s network scripts perform some black magic on the network interfaces, mainly renaming eth0 to peth0 and bringing up a dummy interface named eth0. Any subinterface related to the original eth0 seems to stop working after the renaming takes place.

Thus, I coded up an init script used to bring up the VLAN subinterfaces that gets invoked just after Xen’s network script has finished. Note that it’s targeted for RedHat-based distributions:

#!/bin/sh
#
# Init file for Network-VLAN
# STARTS AFTER XEN (which is S50 and K01)
#
# chkconfig: 2345 51 89
# description: VLAN networking

. /etc/init.d/functions

case "$1" in
start)
 echo -n $"Configuring VLAN interfaces:"

 if [ ! -f /var/lock/subsys/network-vlan ]; then
  (
  modprobe 8021q || exit 1
  vconfig add eth0 10 || exit 2
  ifconfig eth0.10 up 10.0.0.1 netmask 255.0.0.0 || exit 3
  vconfig add eth0 1000 || exit 2
  ifconfig eth0.1000 up 11.0.0.1 netmask 255.0.0.0 || exit 3
  ) > /dev/null 2>&1

  RETVAL=$?
  [ "$RETVAL" = 0 ] && ( success ; \
    touch /var/lock/subsys/network-vlan ) || failure
 fi
 echo

 ;;

stop)
 echo -n $"Unconfiguring VLAN interfaces:"

 if [ -f /var/lock/subsys/network-vlan ]; then
  (
  ifconfig eth0.10 down && vconfig rem eth0.10 ;
  ifconfig eth0.1000 down && vconfig rem eth0.1000
  ) > /dev/null 2>&1

  RETVAL=$?
  [ "$RETVAL" = 0 ] && ( rm -f /var/lock/subsys/network-vlan ; \
    success ) || failure
 fi
 echo
esac

Save this script as /etc/init.d/network-vlan, then run:

chmod +x /etc/init.d/network-vlan
chkconfig --add network-vlan

The script runs just after Xen’s init script has renamed the real Ethernet interface and has brought up a dummy interface called eth0. Then, the network-vlan script brings up two VLAN subinterfaces, one for VLAN 10 and another one for VLAN 1000, and then assigns each one its own IP address.

Additionally, these are the contents of /etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet

Note that eth0 in this context refers to the real Ethernet interface, since Xen’s init script has not been run yet. I didn’t configure any IP address for this interface since I only want to process tagged traffic. Beware that on many switches (for example, the Cisco 2960 and 3560), VLAN 1 is the native VLAN by default, and traffic on the native VLAN doesn’t get tagged.

Configuring VLAN subinterfaces for domainU

These are the contents of /etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
TYPE=Ethernet

I didn’t configure any IP address for this interface since I only want to process tagged traffic. Read the note above on untagged frames and native VLANs.

These are the contents of /etc/sysconfig/network-scripts/ifcfg-eth0.10:

DEVICE=eth0.10
BOOTPROTO=static
IPADDR=10.0.0.2
NETMASK=255.0.0.0
ONBOOT=yes
TYPE=Ethernet
VLAN=yes

These are the contents of /etc/sysconfig/network-scripts/ifcfg-eth0.2000:

DEVICE=eth0.2000
BOOTPROTO=static
IPADDR=12.0.0.1
NETMASK=255.0.0.0
ONBOOT=yes
TYPE=Ethernet
VLAN=yes
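
Since the VLAN subinterface files differ only in the device name and the address, they can be generated. A minimal sketch; ifcfg_vlan is a hypothetical helper that prints the file contents to standard output rather than writing them anywhere:

```shell
#!/bin/sh
# ifcfg_vlan is a hypothetical helper.
# Usage: ifcfg_vlan PARENT_DEVICE VLAN_ID IP_ADDRESS NETMASK
# It prints an ifcfg file for a statically addressed VLAN subinterface,
# in the same layout as the files shown in this post.
ifcfg_vlan() {
    cat <<EOF
DEVICE=$1.$2
BOOTPROTO=static
IPADDR=$3
NETMASK=$4
ONBOOT=yes
TYPE=Ethernet
VLAN=yes
EOF
}

ifcfg_vlan eth0 2000 12.0.0.1 255.0.0.0

# To write the file (as root):
# ifcfg_vlan eth0 2000 12.0.0.1 255.0.0.0 \
#     > /etc/sysconfig/network-scripts/ifcfg-eth0.2000
```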

Bonding

For those who want to use bonding, it seems some tweaking of the networking scripts is required. I recommend looking at this post on Bonding not working with network-bridge.

Conclusion

I’m sure there are better ways to configure VLAN subinterfaces in domain0, but I was in a hurry and couldn’t think of a better way to get it done.

If anyone out there has a different way of achieving this, please let me know :-)

Xen networking is powerful enough to allow for extreme customization. Although the default networking configuration is usually more than enough for simple scenarios, it can fall short when trying to support multiple guests standing on different VLANs.

In this short article, I describe the steps needed to configure Xen to attach itself to multiple VLANs using a one-bridge-per-VLAN network interface mapping, then attach each Xen domainU to as many VLANs as needed.

In the sample scenario, we will use a Cisco Catalyst 3560G-24TS switch carrying traffic from five different VLANs:

  • VLAN2 is the administrative VLAN used to administer all the networking gear and boxes.
  • VLAN10 carries Internet traffic coming from the first ISP.
  • VLAN20 carries Internet traffic coming from the second ISP.
  • VLAN100 carries the access network traffic.
  • VLAN200 carries the core network traffic.

The final Xen configuration will provide five bridging network interfaces, one per VLAN. Each Xen domainU can freely attach to any of these bridging network interfaces in order to gain access to the traffic being carried by each VLAN.

Each bridging interface, |brname|, is named after the following convention: xenbr|vlan|, where |vlan| is the VLAN number:

  • xenbr2 is the bridging interface standing on VLAN2.
  • xenbr10 is the bridging interface standing on VLAN10.
  • xenbr20 is the bridging interface standing on VLAN20.
  • xenbr100 is the bridging interface standing on VLAN100.
  • xenbr200 is the bridging interface standing on VLAN200.

Also, Xen creates and manages several virtual network interfaces, named in the form of vif|X|.|Y|, where |X| equals the Xen domain numeric ID and |Y| is a sequential interface index. Thus, starting up a Xen domainU given the following virtual network interface definition:

vif = [ 'mac=00:16:3e:00:00:44, bridge=xenbr10',
        'mac=00:16:3e:00:00:45, bridge=xenbr20' ]

Will cause the Xen domain to get assigned, let’s say, a domain ID of 2, and two virtual network interfaces named vif2.0 — attached to xenbr10 — and vif2.1 — attached to xenbr20.

Setting up the bridging interfaces:

This can be done manually, by invoking brctl addbr |brname| in order to create a new bridging interface.

For example, the following commands will create five bridging interfaces, one for each supported VLAN:

brctl addbr xenbr2
brctl addbr xenbr10
brctl addbr xenbr20
brctl addbr xenbr100
brctl addbr xenbr200

Alternatively, this can be automated at system startup by creating a file named /etc/sysconfig/network-scripts/ifcfg-|brname|, where |brname| is the name assigned to the bridging interface, like /etc/sysconfig/network-scripts/ifcfg-xenbr2 (the configuration file for the bridging interface standing on VLAN2):

DEVICE=xenbr2
BOOTPROTO=static
IPADDR=192.168.0.10
NETMASK=255.255.0.0
ONBOOT=yes
TYPE=Bridge

Setting up the VLAN interfaces and adding them to the existing bridging interfaces:

This can be done manually, by invoking vconfig add |ifname| |vlan| to configure VLAN number |vlan| by using 802.1q tagging on interface |ifname|. This will activate a virtual interface named |ifname|.|vlan|:

  • Any traffic sent to this interface will get tagged for VLAN |vlan|.
  • Any traffic received from interface |ifname| carrying an 802.1q VLAN tag matching |vlan| will be untagged and received by this interface.

vconfig add eth0 2
vconfig add eth0 10
vconfig add eth0 20
vconfig add eth0 100
vconfig add eth0 200

This will add five new VLAN interfaces, one for every supported VLAN.

Once the VLAN interfaces are ready, we add them to their corresponding bridging interfaces by using brctl addif |brname| |ifname|.|vlan|:

brctl addif xenbr2 eth0.2
brctl addif xenbr10 eth0.10
brctl addif xenbr20 eth0.20
brctl addif xenbr100 eth0.100
brctl addif xenbr200 eth0.200
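
The three manual steps (create the bridge, add the VLAN interface, attach it to the bridge) follow the same pattern for every VLAN, so they can be generated in a loop. The sketch below only prints the commands so they can be reviewed first; vlan_bridge_cmds is a made-up helper name:

```shell
#!/bin/sh
# vlan_bridge_cmds is a hypothetical helper.
# Usage: vlan_bridge_cmds IFNAME VLAN...
# For each VLAN it prints the brctl and vconfig commands used in this
# article, following the xenbr|vlan| naming convention. Nothing is
# executed; pipe the output to "sh" as root to apply it.
vlan_bridge_cmds() {
    ifname=$1
    shift
    for vlan in "$@"; do
        echo "brctl addbr xenbr$vlan"
        echo "vconfig add $ifname $vlan"
        echo "brctl addif xenbr$vlan $ifname.$vlan"
    done
}

vlan_bridge_cmds eth0 2 10 20 100 200
```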

Creating a new VLAN interface and adding it to an existing bridging interface can both be configured in a single configuration file named ifcfg-|ifname|.|vlan|, like /etc/sysconfig/network-scripts/ifcfg-eth0.2:

DEVICE=eth0.2
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
VLAN=yes
BRIDGE=xenbr2

Keeping Xen from reconfiguring the network:

Since we have already configured the network manually, we don’t want Xen to mess with the configuration. In order to keep Xen from reconfiguring the network, simply make sure none of the following lines appear uncommented in the file /etc/xen/xend-config.sxp:

(network-script network-bridge)
(network-script network-route)
(network-script network-nat)

Additional notes:

I have been experiencing a very strange behavior on Xen domainU guests while using this network configuration: it seems that UDP traffic gets stuck at the network stack and does not flow through unless I load the ip_conntrack.ko kernel module.

If the ip_conntrack.ko kernel module is not loaded, even with an unconfigured, empty firewall, ICMP and TCP traffic still flows to and from the guest network stack, but UDP traffic, like DNS queries, gets stuck and doesn’t even reach the physical network interface.

This is really strange, isn’t it?
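
If you run into the same symptom, it is easy to check whether the module is loaded by looking at /proc/modules, which lists one loaded module per line with the name first. A minimal sketch; module_loaded is a hypothetical helper, and its optional second argument (an alternative modules file) exists only to make it testable:

```shell
#!/bin/sh
# module_loaded is a hypothetical helper.
# Usage: module_loaded MODULE_NAME [MODULES_FILE]
# Succeeds if a line of the modules file starts with MODULE_NAME
# followed by a space.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded ip_conntrack; then
    echo "ip_conntrack is loaded"
else
    echo "ip_conntrack is not loaded; try: modprobe ip_conntrack"
fi
```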