Friday, February 10, 2012

Linux vSwitches, 802.1Q and link aggregation - putting it all together

In the process of migrating my home virtualization lab from Xen with an OpenSolaris Dom0 to a Debian GNU/Linux Dom0, I've had to figure out how to do all the usual network things in an environment I'm less familiar with.

For a virtualization host, the "usual things" include:
  • An aggregate link for throughput and redundancy (NIC teaming for you server folks)
  • 802.1Q encapsulation to support multiple VLANs on the aggregate link
  • Several virtual switches, or a VLAN-aware virtual switch

In this example, I'm starting with 3 VLANs:
  • VLAN 99 is a dead-end VLAN that lives only inside this virtual server. You'd use a VLAN like this to interconnect two virtual machines (so long as they'll always run on the same server), or to connect virtual machines only to the Dom0 in the case of a routed / NATed setup.
  • VLAN 101 is where I manage the Dom0 system.
  • VLAN 102 is where virtual machines talk to the external network (a non-routed / non-NATed configuration).
Here's the end result:

[Figure: Aggregation, Trunking and Virtual Switch Configuration Example]

VLANs 101 and 102 are carried from the physical switch across a 2x1Gb/s aggregate link. Communication between the Dom0 on VLAN 101 and the DomUs on VLAN 102 must go through a router in the physical network, so that traffic can be filtered, inspected, or what have you.

I didn't strictly need to create the logical interface bond0.99 in my Dom0, because the external network doesn't get to see VLAN 99 and the Dom0 doesn't care to see it either. I created it here (without an IP address) because it made it simple to do things the "Debian Way" with configuration scripts. I drew it with dashed lines because I believe it's optional.

Similarly, I didn't need to create the virtual switch vlan101, but there's no harm in having it there, and I might wind up with a "management" VM (say, a RADIUS server) that belongs on this VLAN.

Here are the contents of the /etc/network/interfaces file that created this setup:

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond-mode 802.3ad
        bond-miimon 50
        bond-xmit_hash_policy layer3+4
        bond-lacp_rate fast
        bond-updelay 500
        bond-downdelay 100

# Vlan 101 is where we'll access this server.  Also, we'll
# create a bridge "vlan101" that can be attached to xen VMs.
auto bond0.101
iface bond0.101 inet manual
auto vlan101
iface vlan101 inet static
        pre-up /sbin/ip link set bond0.101 down
        pre-up /usr/sbin/brctl addbr vlan101
        pre-up /usr/sbin/brctl addif vlan101 bond0.101
        pre-up /sbin/ip link set bond0.101 up
        pre-up /sbin/ip link set vlan101 up
        post-up echo 1 > /proc/sys/net/ipv6/conf/bond0.101/disable_ipv6
        post-up echo 0 > /proc/sys/net/ipv6/conf/vlan101/autoconf
        post-up echo 1 > /proc/sys/net/ipv6/conf/vlan101/autoconf
        post-down /sbin/ip link set vlan101 down
        post-down /usr/sbin/brctl delbr vlan101

# vlan 102 is a bridge-only vlan.  The dom0 doesn't appear on
# vlan 102, but xen VMs can be attached to it. It's attached
# to the real network.
auto bond0.102
iface bond0.102 inet manual
auto vlan102
iface vlan102 inet manual
        pre-up /sbin/ip link set bond0.102 down
        pre-up /usr/sbin/brctl addbr vlan102
        pre-up /usr/sbin/brctl addif vlan102 bond0.102
        pre-up /sbin/ip link set bond0.102 up
        pre-up /sbin/ip link set vlan102 up
        # Keep the Dom0 off IPv6 on this VLAN: disable it on both the
        # tagged subinterface and the bridge (same pattern as vlan99).
        post-up echo 1 > /proc/sys/net/ipv6/conf/bond0.102/disable_ipv6
        post-up echo 1 > /proc/sys/net/ipv6/conf/vlan102/disable_ipv6
        post-down /sbin/ip link set vlan102 down
        post-down /usr/sbin/brctl delbr vlan102

# vlan 99 is a bridge-only vlan.  The dom0 doesn't appear on
# vlan 99, but xen VMs can be attached to it. It goes nowhere.
auto bond0.99
iface bond0.99 inet manual
auto vlan99
iface vlan99 inet manual
        pre-up /sbin/ip link set bond0.99 down
        pre-up /usr/sbin/brctl addbr vlan99
        pre-up /usr/sbin/brctl addif vlan99 bond0.99
        pre-up /sbin/ip link set bond0.99 up
        pre-up /sbin/ip link set vlan99 up
        post-up echo 1 > /proc/sys/net/ipv6/conf/bond0.99/disable_ipv6
        post-up echo 1 > /proc/sys/net/ipv6/conf/vlan99/disable_ipv6
        post-down /sbin/ip link set vlan99 down
        post-down /usr/sbin/brctl delbr vlan99
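
The bridge stanzas above all follow one pattern: a tagged subinterface, a bridge named after the VLAN, and matching pre-up / post-down commands. As a minimal sketch of that pattern, here is a hypothetical helper (vlan_bridge_stanza is an invented name, not part of the original setup) that prints a stanza modeled on the vlan99 block for a given bond and VLAN ID. It only writes text to stdout and never touches an interface.

```shell
#!/bin/sh
# Hypothetical helper: print a bridge-per-VLAN stanza in the style of
# the vlan99 block above. Output is text only; nothing is configured.
vlan_bridge_stanza() {
    bond="$1"              # parent bond, e.g. bond0
    vid="$2"               # VLAN ID, e.g. 99
    sub="${bond}.${vid}"   # tagged subinterface, e.g. bond0.99
    br="vlan${vid}"        # bridge (vSwitch) name, e.g. vlan99
    cat <<EOF
auto ${sub}
iface ${sub} inet manual
auto ${br}
iface ${br} inet manual
        pre-up /sbin/ip link set ${sub} down
        pre-up /usr/sbin/brctl addbr ${br}
        pre-up /usr/sbin/brctl addif ${br} ${sub}
        pre-up /sbin/ip link set ${sub} up
        pre-up /sbin/ip link set ${br} up
        post-up echo 1 > /proc/sys/net/ipv6/conf/${sub}/disable_ipv6
        post-up echo 1 > /proc/sys/net/ipv6/conf/${br}/disable_ipv6
        post-down /sbin/ip link set ${br} down
        post-down /usr/sbin/brctl delbr ${br}
EOF
}
```

Running `vlan_bridge_stanza bond0 99` should reproduce the vlan99 stanza. The output would still need review before being pasted into /etc/network/interfaces, since a VLAN where the Dom0 needs an address (like 101) differs from this template.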

I know, I know... I should be ashamed of myself for turning IPv6 off on my home network! It's off on some interfaces on purpose -- I don't want to expose the Dom0 on VLAN 102, for example, and autoconfiguration would do exactly that if I didn't intervene. The good news is that figuring out which knobs to turn, and in what order (the order of this file is important), was the hard part. Once I have a good handle on exactly which ports and services this Dom0 is running, I'll re-enable v6 on the interfaces where it's appropriate. The network is v6 enabled, but v6 security at home is a constant worry for me. Sure, NAT isn't a security mechanism, but it did allow me to be lazy in some regards.
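
To see which interfaces actually ended up with IPv6 disabled, the same /proc knobs that the interfaces file writes can be read back. The following is a minimal sketch; the function name ipv6_knobs and its optional directory argument are assumptions for illustration (the argument exists only so the function can be pointed at a copy of the tree).

```shell
#!/bin/sh
# Sketch: print disable_ipv6 and autoconf for every interface found
# under the IPv6 conf directory. Read-only; changes nothing.
ipv6_knobs() {
    conf_dir="${1:-/proc/sys/net/ipv6/conf}"
    for d in "$conf_dir"/*/; do
        # Skip anything that does not look like an interface entry.
        [ -r "${d}disable_ipv6" ] || continue
        iface="$(basename "$d")"
        printf '%s disable_ipv6=%s autoconf=%s\n' \
            "$iface" \
            "$(cat "${d}disable_ipv6")" \
            "$(cat "${d}autoconf" 2>/dev/null)"
    done
}
```

Calling `ipv6_knobs` with no argument reads the live values. An interface that should be invisible to IPv6, such as bond0.99, would be expected to show disable_ipv6=1.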

The switch configuration that goes with this setup is pretty straightforward. It's an EtherChannel running dot1q encapsulation and only allowing VLANs 101 and 102:

interface GigabitEthernet0/1
 switchport trunk allowed vlan 101,102
 switchport mode trunk
 switchport nonegotiate
 channel-group 1 mode active
 spanning-tree portfast trunk
interface GigabitEthernet0/2
 switchport trunk allowed vlan 101,102
 switchport mode trunk
 switchport nonegotiate
 channel-group 1 mode active
 spanning-tree portfast trunk
interface Port-channel1
 switchport trunk allowed vlan 101,102
 switchport mode trunk
 switchport nonegotiate
 spanning-tree portfast trunk
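
With both ends configured, the Linux side of the LACP negotiation can be checked through the bonding driver's status file. This is a minimal sketch that assumes the usual layout of /proc/net/bonding/bond0 (unindented "Slave Interface:", "MII Status:" and "Aggregator ID:" lines per slave); the function name bond_summary is made up for this example.

```shell
#!/bin/sh
# Sketch: summarize bonding mode plus per-slave link state and
# aggregator ID from the bonding driver's status file. Read-only.
bond_summary() {
    file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '
        /^Bonding Mode:/    { print "mode: " $2 }
        /^Slave Interface:/ { slave = $2 }
        # Lines before the first slave section describe the bond itself.
        /^MII Status:/      { if (slave != "") print slave " mii=" $2 }
        /^Aggregator ID:/   { if (slave != "") print slave " aggregator=" $2 }
    ' "$file"
}
```

If the channel formed properly, both slaves should report the same aggregator ID. Slaves landing in different aggregators usually points at a mismatch on the switch side, for example a port missing from the channel-group.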

Note that I'm using portfast trunk on the pSwitch. The vSwitches could be running STP, but I've disabled that feature. The VMs here are all mine, and I know that none of them will bridge two interfaces, nor will they originate any BPDUs. For an enterprise or multitenant deployment, I'd probably be inclined to run the pSwitch ports in normal mode and enable STP on the vSwitches to protect the physical network from curious sysadmins. Are you listening, VMware?
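
Whether a given vSwitch is running STP can be read from sysfs, where each Linux bridge exposes a bridge/stp_state file (0 means STP is off). The sketch below assumes that layout; bridge_stp_state and its optional second argument are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: report whether STP is enabled on a Linux bridge by reading
# its sysfs state file. Read-only; use "brctl stp <bridge> on" to
# enable STP on a bridge.
bridge_stp_state() {
    br="$1"
    sys_dir="${2:-/sys/class/net}"
    f="$sys_dir/$br/bridge/stp_state"
    if [ ! -r "$f" ]; then
        echo "$br: not a bridge (or not present)"
        return 1
    fi
    case "$(cat "$f")" in
        0) echo "$br: STP disabled" ;;
        *) echo "$br: STP enabled" ;;
    esac
}
```

For the setup described here, `bridge_stp_state vlan102` would be expected to report STP disabled.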

1 comment:

  1. Hi Chris, I see you're building vSwitches (bridges) manually with pre-up/post-down scripts. What is the benefit of that compared to the built-in support for bridge interface definitions in /etc/network/interfaces?

    For example, the configuration of your non-routed VLAN 102 (let's call it 'srvlan') would look like:

    auto bond0
    iface bond0 inet manual
    # rest of your bond0 setup

    # We will use 'vlanX' names,
    # which are independent of the physical
    # device transporting our tagged frames,
    # making future changes (if any) easier.
    # Note: Normally vlan naming is set with
    # vconfig set_name_type VLAN_PLUS_VID_NO_PAD
    # and Debian scripts take care of it.

    auto vlan102
    iface vlan102 inet manual
    vlan-raw-device bond0

    # We use descriptive names
    # for vSwitch (bridge) devices

    auto srvlan
    iface srvlan inet manual
    pre-up ip link set vlan102 address fe:ff:ff:ff:ff:ff arp off
    bridge-ports vlan102
    # go to forwarding state immediately
    bridge-maxwait 0

    I would say it's also good to turn off ARP on the bridge device itself, and to avoid iptables inspection of bridged packets by setting net.bridge.bridge-nf-call-iptables=0 in /etc/sysctl.conf or by any other means (like running 'echo 0 > /proc/sys/net/bridge/bridge-nf-call-iptables').