Friday, October 7, 2011

VMware vSwitch Nuggets for Network Admins

I've written previously about the promiscuous behavior of pNICs on ESX servers.  It turns out that the behavior of the vNICs and vSwitches is just as interesting.  Here are a few details that might not be obvious to the network-minded.  The last one is the coolest detail, and it's what got me looking into vSwitches today.

No L2 Address Learning - The vSwitch is an L2 switch, but it doesn't learn MAC addresses the way normal L2 switches do.  Because ESX created the vNIC that's plugged (virtually) into each vSwitch port, ESX knows the hardware addresses used by each vNIC, and the vSwitch port it's "plugged" into. So there's no reason to learn MAC addresses.
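
Here's a minimal sketch of that idea in Python. It is a toy model, not VMware's implementation: the class name, the port names, and the choice to send unknown destinations to an uplink are all assumptions made for illustration. The point is only that the table is filled when a vNIC is attached, and the source address of a frame never updates it.

```python
class VSwitch:
    """Toy model: the forwarding table is filled at attach time, never learned."""

    def __init__(self):
        self.mac_to_port = {}     # MAC address -> virtual port ID
        self.uplink = "uplink0"   # hypothetical name for the pNIC-facing port

    def attach_vnic(self, port_id, configured_mac):
        # The hypervisor created this vNIC, so it already knows the MAC.
        self.mac_to_port[configured_mac.lower()] = port_id

    def forward(self, src_mac, dst_mac):
        # src_mac is deliberately ignored: nothing is learned from traffic.
        # Assumption: anything that isn't a local vNIC goes out the uplink.
        return self.mac_to_port.get(dst_mac.lower(), self.uplink)
```

A frame from an address the switch has never seen leaves the table exactly as it was, which is the difference from a learning bridge.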

L2 Address Updates - On many real NICs, the "burned in" MAC address is stored in ROM, but doesn't have to be used by the NIC driver.  It's more of a suggestion.  The driver can load any valid unicast MAC address into the NIC's unicast filtering registers.

Because the vSwitch knows the "burned in" (really it's a value stored in a configuration file) address, the vSwitch can stop the driver from changing its unicast MAC address.  ...Or at least, fail to deliver packets for the updated address.  This behavior is configurable.

L2 Address Spoofing - In addition to refusing to deliver mis-addressed frames to a guest, the vSwitch can refuse admittance to frames generated by the guest unless the frame headers display the expected source MAC address.  Address spoofing like this has numerous savory and unsavory purposes, so the behavior is configurable.
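
The two checks described above (address updates and address spoofing) can be sketched as a pair of per-port tests. This is a simplified model under stated assumptions: the field and function names are hypothetical, and the "expected" source address is taken to be the configured one, which may not match how ESX actually compares addresses.

```python
from dataclasses import dataclass


@dataclass
class PortPolicy:
    allow_mac_changes: bool = False       # deliver to an address the guest set itself?
    allow_forged_transmits: bool = False  # admit frames with an unexpected source MAC?


@dataclass
class VPort:
    configured_mac: str  # value from the VM's configuration file
    effective_mac: str   # what the guest driver actually programmed
    policy: PortPolicy


def accepts_inbound(port, dst_mac):
    """Should a frame addressed to dst_mac be delivered to this guest?"""
    if dst_mac != port.effective_mac:
        return False
    # The guest is listening on this address; deliver only if it is the
    # configured address or the policy tolerates a changed address.
    return port.effective_mac == port.configured_mac or port.policy.allow_mac_changes


def accepts_outbound(port, src_mac):
    """Should a frame the guest sent with src_mac be admitted to the vSwitch?"""
    return src_mac == port.configured_mac or port.policy.allow_forged_transmits
```

With both flags left at False, a guest that reprograms its MAC address stops receiving traffic for the new address, and any frame it sends with a different source address is dropped at the port.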

IGMP Snooping Not Required - IGMP snooping on a physical L2 switch is a traffic suppression mechanism: only deliver IPv4 multicast frames to ports with subscribers attached.  The mechanism relies on L2 switches intercepting (and sometimes dropping) IGMP traffic between routers and hosts.  When a host subscribes to a multicast group, the host driver programs the NIC to pass the appropriate multicast frames.  Because a virtual machine's NIC is really an ESX software component, ESX (and thus the vSwitch) already knows which ports are subscribed to which multicast groups, so the hassle of IGMP snooping isn't necessary.

Automagic Port Monitor Mode - This feature is usually described as a mechanism to "Stop promiscuous mode" or somesuch, but I think that description glosses over how cool this is.  Yes, you can stop a server from sniffing traffic not destined for him, but real L2 switches can do that too.  The neat thing here is that when you want to sniff traffic, you don't have to enable a mirror feature on the switch. vSwitches don't have a port mirroring lever for the same reason they don't need to do IGMP snooping.  If you enable promiscuous mode on a VM's vNIC, ESX knows you've done it, and can convert the virtual port into a mirror port automatically.
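
As a rough sketch of the delivery decision, assuming a single flat broadcast domain (VLANs and port groups are ignored) and a hypothetical dictionary layout for the port table:

```python
def ports_to_deliver(frame_dst, ports):
    """Return the port IDs that should receive a unicast frame.

    ports maps port_id -> {"mac": str, "promiscuous": bool}.
    A promiscuous port gets a copy of everything, like a mirror port.
    """
    return sorted(
        port_id
        for port_id, port in ports.items()
        if port["promiscuous"] or port["mac"] == frame_dst
    )
```

No separate mirror configuration appears anywhere in the model: the promiscuous flag on the port is the whole mechanism.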

pSwitch MAC Move Update - When a virtual machine migrates from one host to another (VMotion), all of the vSwitches know exactly what just happened.  They don't need to be told to update their L2 forwarding tables.  The upstream switches, on the other hand, don't know that a particular MAC address has moved, and they won't know until a frame sourced from that MAC address arrives on an unexpected port.  This isn't much of a problem for chatty servers, but what if it's a mostly quiet system like a syslog server?  The forwarding tables on upstream pSwitches could misdirect traffic for minutes.  VMotion works around this problem by sending a spoofed broadcast frame, apparently from the guest.  The frame floods throughout the broadcast domain, updating L2 forwarding tables on every bridge.  This behavior is nearly identical to how the same problem is handled by Cisco switches configured with the FlexLinks feature.
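
To make the spoofed frame concrete, here is a sketch that builds one. The frame type is an assumption: the announcement is commonly reported to be a RARP frame, so that is what this builds, but any broadcast frame carrying the guest's source MAC would update the upstream tables the same way. The function names are hypothetical.

```python
import struct


def mac_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_move_announcement(guest_mac):
    """Build a broadcast Ethernet frame that appears to come from the guest."""
    src = mac_bytes(guest_mac)
    dst = b"\xff" * 6                 # Ethernet broadcast
    ethertype = 0x8035                # RARP (assumed frame type)
    # RARP header: Ethernet hardware, IPv4 protocol, 6-byte and 4-byte
    # address lengths, opcode 3 (reverse request).
    body = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 3)
    body += src + b"\x00" * 4         # sender hardware / protocol address
    body += src + b"\x00" * 4         # target hardware / protocol address
    frame = dst + src + struct.pack("!H", ethertype) + body
    return frame.ljust(60, b"\x00")   # pad to minimum frame size (FCS excluded)
```

The only fields the upstream bridges care about are the first twelve bytes: a broadcast destination so the frame floods, and the guest's MAC as the source so every bridge relearns it on the new port.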

pSwitch IGMP Snooping Update - If your VMotioned guest is subscribed to an IPv4 multicast stream, the L2 switches are going to have a second forwarding table problem:  They've got an entry mapping the multicast stream to the old ESX server port, but won't know to add the new ESX host's port to the list until an IGMP host report ingresses on the new port.  This is a stickier problem than the previous one, because of the 32:1 overlap of IPv4 multicast groups to group MAC addresses.  ESX knows the MAC address (or worse, the multicast hash bucket) that the client's NIC driver has unfiltered, but it can't know exactly which IPv4 multicast group the client wants to receive.  Without knowing the correct IPv4 multicast address, ESX can't spoof an IGMP host report from the client.  VMware solves this problem by tricking the client into sending his own host reports:  It spoofs an IGMP query from the router, destined to the guest.  On receipt of the spoofed query, the guest then waits for up to the specified query-max-response-time before sending a correctly formatted host report.
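
The 32:1 overlap comes from the standard mapping of IPv4 multicast groups onto Ethernet addresses: only the low 23 bits of the group address are copied into the 01:00:5e prefix, so 5 of the 28 significant group bits are lost. A short sketch (function names are mine) shows why a MAC filter can't be turned back into a group address:

```python
import ipaddress


def multicast_mac(group):
    """Map an IPv4 multicast group to its Ethernet group MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # only the low 23 bits survive
    mac = 0x01005E000000 | low23
    return ":".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))


def groups_sharing_mac(group):
    """List every IPv4 group that maps to the same MAC as the given group."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    return [
        str(ipaddress.IPv4Address(0xE0000000 | (hi << 23) | low23))
        for hi in range(32)               # the 5 bits the mapping discards
    ]
```

For example, 224.1.1.1, 225.1.1.1, and 224.129.1.1 all map to 01:00:5e:01:01:01, and `groups_sharing_mac("224.1.1.1")` returns the full set of 32 candidates that ESX would have to guess among.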

I'm not in a position to collect an ESX-spoofed IGMP query, but I'd really like to see one.  In particular, I'm curious about the IGMP query-max-response-time, and the source IP address used in these spoofed queries.  If you can catch one of these spoofed queries, please share it with me!
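
If you do capture one, the two fields I'm curious about can be pulled out of the raw IPv4 packet with a few lines of Python. This sketch assumes an IGMPv2-style query, where the max response time is a single byte in tenths of a second; I don't know which IGMP version ESX actually sends, and the function name is hypothetical.

```python
import ipaddress
import struct


def parse_igmp_query(ip_packet):
    """Extract the source IP and max response time from a raw IPv4 packet."""
    header_len = (ip_packet[0] & 0x0F) * 4   # IHL; allows for IP options
    if ip_packet[9] != 2:                    # IP protocol 2 is IGMP
        raise ValueError("not an IGMP packet")
    source_ip = str(ipaddress.IPv4Address(ip_packet[12:16]))
    igmp_type, max_resp, _checksum, group = struct.unpack(
        "!BBH4s", ip_packet[header_len:header_len + 8]
    )
    if igmp_type != 0x11:                    # 0x11 is a membership query
        raise ValueError("not a membership query")
    return {
        "source_ip": source_ip,
        "max_response_seconds": max_resp / 10,  # IGMPv2 units: 1/10 second
        "group": str(ipaddress.IPv4Address(group)),
    }
```

Feed it the bytes that follow the Ethernet header of the captured frame. A group of 0.0.0.0 would indicate a general query rather than a group-specific one.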
