Monday, October 18, 2010

VMware runs in promiscuous mode?

I've recently discovered that VMware runs its physical NICs in promiscuous mode.  At least, I think I have made that discovery.

There's a lot of chat out there about VMware and promiscuity, but it's usually devoted to the virtual host side of the vswitch.  On that side of the vswitch, things are usually pretty locked-down:


  • No dynamic learning of MAC addresses (don't need to learn when you know)
  • No forging of MAC addresses allowed (for the same reason)
On a physical switch, we might accomplish the same thing with:
interface type mod/port
  switchport port-security
  switchport port-security mac-address mac-address
  switchport port-security maximum 1
This leads to frustration for people trying to deploy sniffers, intrusion detection, layered virtualization and the like within VMware, and it's not what I'm interested in talking about here.


I'm interested in something much more rudimentary, something that has always been with us but has begun to vanish.


History Lesson 1
On a truly broadcast medium (like an Ethernet hub), all frames are always delivered to all stations.  Passing a frame from the NIC up to the driver requires an interrupt, which is just as disruptive as it sounds.  Fortunately, NICs know their hardware addresses, and will only pass certain frames up the stack:
  • Frames destined for the burned-in hardware address
  • Frames destined for the all-stations hardware address
You're probably aware that it's possible to change the MAC address on a NIC.  It's possible because the burned-in address just lives in a register on the NIC.  The contents of that register can be changed, and in all likelihood were loaded there by the driver at initialization time anyway.  The driver can load a new address into this register.


In fact, most NICs have more than one register for holding unicast addresses which must be passed up the stack, allowing you to load several MAC addresses simultaneously.
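
To make that concrete, here's a minimal sketch in C of the decision such a receive filter makes.  It's purely illustrative: the structure, the names, and the slot count are assumptions for the example, not lifted from any particular chipset.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6
#define UNICAST_SLOTS  4    /* arbitrary; "more than one", varies by chipset */

/* Illustrative model of a NIC's exact-match receive filter. */
struct nic_filter {
    uint8_t unicast[UNICAST_SLOTS][MAC_LEN]; /* driver-loaded addresses */
    bool    valid[UNICAST_SLOTS];            /* which slots are in use  */
};

static const uint8_t BROADCAST[MAC_LEN] = {0xff,0xff,0xff,0xff,0xff,0xff};

/* The driver loads (or overwrites) an address in one of the slots,
 * just as it loads the burned-in address at initialization time.   */
void nic_set_unicast(struct nic_filter *f, int slot, const uint8_t mac[MAC_LEN])
{
    memcpy(f->unicast[slot], mac, MAC_LEN);
    f->valid[slot] = true;
}

/* Decide whether a received frame is worth an interrupt: pass it up
 * only if it matches a loaded unicast address or the broadcast address. */
bool nic_accept(const struct nic_filter *f, const uint8_t dst[MAC_LEN])
{
    if (memcmp(dst, BROADCAST, MAC_LEN) == 0)
        return true;
    for (int i = 0; i < UNICAST_SLOTS; i++)
        if (f->valid[i] && memcmp(dst, f->unicast[i], MAC_LEN) == 0)
            return true;
    return false;   /* everything else is dropped in hardware */
}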


History Lesson 2
Multicast frames have their own set of MAC addresses.  If you switch on a multicast subscriber application, a series of steps happens which culminates in the NIC unfiltering your desired multicast frames and passing them up the stack.  This use case is much more common than loading multiple unicast addresses, and hardware designers saw it coming long before they allowed for multiple reconfigurable unicast addresses.


This mechanism works in much the same way that an EtherChannel balances load among its links:  Deterministic address hashing.  But it uses a lot more buckets, and works something like this:
  1. Driver informs the NIC about the multicast MAC that an upstream process is interested in receiving.
  2. The NIC hashes the address to figure out which bucket is associated with that MAC.
  3. The NIC disables filtering for that bucket.
  4. All frames that hash into the selected bucket (not just the ones we want) get passed up the stack.
  5. Software (the IP stack) filters out packets which made it through the hardware filtering, but which turn out to be unwanted.
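
Here's an equally illustrative C sketch of steps 1 through 4.  The hash function is invented for the example (real NICs typically index the table with selected bits of a CRC of the destination MAC); the point is the many-to-one mapping of group addresses into a bit-per-bucket table.

#include <stdbool.h>
#include <stdint.h>

#define MAC_LEN  6
#define BUCKETS  4096                  /* one bit per bucket */

/* Bit set = filtering disabled for that bucket. */
static uint8_t mcast_table[BUCKETS / 8];

/* Toy hash, for illustration only. */
static unsigned mcast_hash(const uint8_t mac[MAC_LEN])
{
    unsigned h = 0;
    for (int i = 0; i < MAC_LEN; i++)
        h = h * 31 + mac[i];
    return h % BUCKETS;
}

/* Steps 1-3: the driver hands the NIC a group MAC, the NIC hashes it
 * and disables filtering for the bucket it lands in.                 */
void mcast_subscribe(const uint8_t group_mac[MAC_LEN])
{
    unsigned b = mcast_hash(group_mac);
    mcast_table[b / 8] |= 1u << (b % 8);
}

/* Step 4: any multicast frame that hashes into an open bucket gets
 * passed up the stack, wanted or not.  Step 5 (weeding out the
 * unwanted ones) is left to software above the driver.             */
bool mcast_accept(const uint8_t dst[MAC_LEN])
{
    unsigned b = mcast_hash(dst);
    return mcast_table[b / 8] & (1u << (b % 8));
}
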
Modern implementations
Surprisingly, nothing here has changed.  I reviewed data sheets and driver development guides for several NIC chipsets that are currently being shipped by major-label server vendors.  Lots of good "server-class" NICs include 16 registers for unicast addresses and a 4096-bucket (65536:1 overlap) multicast hashing scheme.

And VMware fits in how?
Suppose you're running 20 virtual machines on an ESX server.  Each of those VMs has unique IP and MAC addresses associated with it.  But the physical NIC installed in that server can only do filtering for 16 addresses!

The only thing VMware can do in this case is put the NIC (VMware calls it a pNIC) into promiscuous mode, then do the filtering in software, where hardware limitations (registers chiseled into silicon) aren't a problem.
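
Conceptually, the software filtering amounts to nothing more than comparing each frame the promiscuous pNIC receives against the vswitch's own table of vNIC MAC addresses.  The C sketch below is just my illustration of that idea, not VMware's code; the names and structure are invented.

#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* The vswitch's table of guest (vNIC) MAC addresses.  Unlike the
 * pNIC's handful of hardware registers, this can grow with the
 * guest count; the limit is memory, not silicon.                  */
struct vswitch {
    const uint8_t (*vnic_macs)[MAC_LEN];
    int            count;
};

/* With the pNIC promiscuous, every frame arrives here; software
 * decides which guest (if any) should see it and drops the rest. */
int vswitch_lookup(const struct vswitch *vs, const uint8_t dst[MAC_LEN])
{
    for (int i = 0; i < vs->count; i++)
        if (memcmp(dst, vs->vnic_macs[i], MAC_LEN) == 0)
            return i;        /* deliver to guest i */
    return -1;               /* not ours: drop     */
}

A real implementation would hash rather than scan, and would also handle broadcast and multicast, but the capacity argument is the same: a table in memory has no 16-entry limit.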

It's good news for the VMware servers that they're (probably) not plugged into a hub, because the forwarding table in the physical switch upstream will protect them from traffic that they don't want.

Promiscuity in NICs is widely regarded as suspicious, performance-impacting, and generally a problem, and 101-level classes in most OS and networking disciplines cover the fact that NICs know their own address and filter out all others.  The idea that this notion is going away came as a bit of a surprise to me, and it makes a strong argument for:
  • Stable L2 topologies (STP TCN messages will "unprotect" VMware on the switches)
  • IGMP snooping (without it, switches won't protect VMware from multicast frames)
How close to the limit?
Okay, so 16 addresses per NIC isn't quite so dire.  A big VM server running dozens of guests probably has at least a handful of NICs, so the ratio of guests (plus ESX overhead) to pNICs might not exceed 16 in most cases.

VMware could handle this by using unicast slots one-by-one until they're all full, and only then switching to promiscuous mode.
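
If it worked that way, the decision would boil down to something like the following sketch.  This is only the idea from the paragraph above expressed in C; nothing here comes from VMware documentation.

#define UNICAST_SLOTS 16   /* exact-match registers on the (assumed) pNIC */

enum rx_mode { RX_EXACT_MATCH, RX_PROMISCUOUS };

/* Hypothetical policy: keep exact-match filtering while the guest MACs
 * (plus whatever addresses ESX itself needs) still fit in hardware,
 * and fall back to promiscuous mode only when they no longer do.      */
enum rx_mode choose_rx_mode(int guest_macs, int esx_overhead_macs)
{
    int needed = guest_macs + esx_overhead_macs;
    return (needed <= UNICAST_SLOTS) ? RX_EXACT_MATCH : RX_PROMISCUOUS;
}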

I've only found one document that addresses this question directly.  It says:
Each physical NIC is put in promiscuous mode
  • Need to receive frames destined to all VMs
  • Not a issue at all on a modern Ethernet network
