Fragmentation Needed: September 2015

Tuesday, September 29, 2015

Assigning DMVPN tunnel interface addresses with DHCP

I posted previously about some of the inner workings of DHCP. The three key points from that post are critical building blocks for this discussion:

DHCP requests get modified in flight by the DHCP relay.
DHCP relay determines L2 destination by inspecting contents of relayed packets.
DHCP clients, relays and (sometimes) servers use raw sockets because the end-to-end protocol stack isn't yet available.

The basic steps for converting a DMVPN from a static address assignment scheme to a dynamic one are:

Configure a DHCP server. I'm using an external server¹ in this example so that we can inspect the relayed packets while they're on the wire.
Configure the hub router. There are some non-intuitive details we'll go over.
Configure the spoke router. Ditto on the non-intuitive bits.

My DHCP server is running on an IOS router (because it's convenient - it could be anywhere) and it has the following configuration:

    no ip dhcp conflict logging
    ip dhcp excluded-address 172.16.1.1
    !
    ip dhcp pool DMVPN_POOL
     network 172.16.1.0 255.255.255.0

So, that's pretty straightforward.

The Hub Router has the following relevant configuration:

    1     ip dhcp support tunnel unicast  
    2     interface Tunnel0  
    3      ip dhcp relay information option-insert   
    4      ip address 172.16.1.1 255.255.255.0  
    5      ip helper-address 172.16.2.2  
    6      no ip redirects  
    7      ip mtu 1400  
    8      ip nhrp authentication blah  
    9      ip nhrp network-id 1  
   10      ip tcp adjust-mss 1360  
   11      tunnel source GigabitEthernet0/0  
   12      tunnel mode gre multipoint  
   13      tunnel vrf INTERNET  
   14      tunnel protection ipsec profile DMVPN_IPSEC_PROFILE

Only lines 1, 3 and 5 were added when I converted the environment from static spoke tunnel addresses to dynamic addresses.

Line 1: According to the documentation, the ip dhcp support tunnel unicast directive "Configures a spoke-to-hub tunnel to unicast DHCP replies over the DMVPN network." Okay, so the replies are sent directly to the spoke. If you read my last post, you're probably wondering how this works. I promise, it's interesting.

Line 3: Configuring ip dhcp relay information option-insert causes the relay agent to insert the DHCP relay agent information option (option 82) into the client's DHCP packets. We'll look at the contents in a bit.

Line 5: I specified the address of the DHCP server with ip helper-address 172.16.2.2. Nothing unusual about that.

Okay, with those configuration directives, the hub is going to unicast DHCP messages to the client (good, because DMVPN provides a non-broadcast medium). How will the relay agent on the hub know where to send the server's DHCP replies? Last week's example was an Ethernet medium. The Ethernet MAC address appears in the OFFER, so the relay cracked it open in order to know where the OFFER needed to go. This OFFER will also have a MAC address, but it doesn't help. Rather than an un-solved ARP problem, this relay has an un-solved NHRP problem. It needs to know the NBMA (Internet) address of the spoke in order to deliver the OFFER.

So what did the relay stick in option 82 of the client's DISCOVER message, anyway? This:

    Option: (82) Agent Information Option
      Length: 13
      Option 82 Suboption: (9) Vendor-Specific Information
        Length: 11
        Enterprise: ciscoSystems (9)
          Data Length: 6
        Value: 11046d7786c2

So, we've got option 82 with sub-option 9 containing a Cisco-proprietary 6-byte payload. I don't know what 0x11 and 0x04 represent, but I'm guessing it's "NBMA" and "IPv4"², because the next 4 bytes (0x6d7786c2) spell out the spoke's NBMA address (109.119.134.194).
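As a sanity check on that reading, here is a minimal Python sketch that splits the 6-byte value into its two leading bytes and a 4-byte IPv4 address. The function name is mine, and the meaning of the first two bytes is still a guess, so the code just hands them back untouched.

```python
import ipaddress

def decode_suboption_value(hex_value: str):
    """Split the 6-byte payload into two leading bytes and an IPv4 address."""
    raw = bytes.fromhex(hex_value)
    if len(raw) != 6:
        raise ValueError("expected a 6-byte payload")
    # What these two bytes mean is an assumption; return them as plain integers.
    first, second = raw[0], raw[1]
    # The remaining four bytes are interpreted as a packed IPv4 address.
    nbma = ipaddress.IPv4Address(raw[2:6])
    return first, second, str(nbma)

print(decode_suboption_value("11046d7786c2"))
```

The last element of the tuple is the spoke's NBMA address in dotted-quad form.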

The DMVPN hub / DHCP relay agent jammed the spoke's NBMA address into the DHCP DISCOVER that it relayed on to the DHCP server! The debugs spell it out for us:

    DHCPD: adding relay information option.
    DHCPD: Client's NBMA address 109.119.134.194 added tooption 82

Now look at what the hub/relay did when the DHCP OFFER came back from the server:

    DHCPD: forwarding BOOTREPLY to client 88f0.31f4.8a3a.
    DHCPD: Client's NBMA address is 109.119.134.194
    NHRP: Trying to add temporary cache from external source with (overlay address: 172.16.1.3, NBMA: 109.119.134.194) on interface: (Tunnel0).
    DHCPD: creating NHRP entry (172.16.1.3, 109.119.134.194)
    DHCPD: unicasting BOOTREPLY to client 88f0.31f4.8a3a (172.16.1.3).
    DHCPD: Removing NHRP entry for 172.16.1.3
    NHRP: Trying to delete temporary cache created from external source with nex-hop UNKNOWN on interface: (Tunnel0).

Just like the Ethernet case, the relay read the client's lower layer address info from the OFFER. Unlike the Ethernet case, the DMVPN relay gleaned this critical information from a field which had been inserted by the relay itself.

Somewhat different from the Ethernet case (which, according to debugs, did not populate the relay's ARP table with info from the OFFER), the DMVPN relay manipulated the NHRP table directly. Pretty cool.

A point about dual-hub environments: All of these DHCP gyrations are only possible because the DMVPN hub and spoke have already keyed the IPSec tunnel, and have an active Security Association with one another. In a dual hub environment, only one of the hubs will be able to talk to the client at this stage. Remember that the DHCP server unicasts the OFFER to the relay agent using the agent's client-facing address (172.16.1.1 in this case). If we have a second hub at, say, 172.16.1.2, it's likely that both hubs will be advertising the tunnel prefix (172.16.1.0/24) into the IGP. Routing tables in nearby routers won't draw a distinction between hub 1 (172.16.1.1) and hub 2 (172.16.1.2) when delivering the OFFER (unicast to 172.16.1.1) from the DHCP server, because both hub routers look like equally good ways to reach the entire 24-bit prefix. It is critical that the IGP has 32-bit routes so that traffic for each hub router's tunnel interface gets delivered directly to the correct box.
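To make the tie concrete, here is a small Python sketch of a longest-prefix-match lookup as a nearby router might perform it. The route lists and the hub1/hub2 labels are illustrative stand-ins, not output from a real IGP.

```python
import ipaddress

def best_next_hops(destination, routes):
    """Return every next hop tied for the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(prefix), hop)
               for prefix, hop in routes
               if dest in ipaddress.ip_network(prefix)]
    if not matches:
        return []
    longest = max(net.prefixlen for net, _ in matches)
    return sorted(hop for net, hop in matches if net.prefixlen == longest)

# Both hubs advertise only the tunnel prefix: the lookup cannot tell them apart.
only_24 = [("172.16.1.0/24", "hub1"), ("172.16.1.0/24", "hub2")]
# Each hub also advertises its own tunnel address as a host route.
with_32 = only_24 + [("172.16.1.1/32", "hub1"), ("172.16.1.2/32", "hub2")]

print(best_next_hops("172.16.1.1", only_24))
print(best_next_hops("172.16.1.1", with_32))
```

With only the 24-bit routes, both hubs come back as equally good; with the host routes present, the OFFER for 172.16.1.1 has exactly one place to go.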

The Spoke Router sports the following configuration:

    1     interface Tunnel0  
    2      ip dhcp client broadcast-flag clear  
    3      ip address dhcp  
    4      no ip redirects  
    5      ip mtu 1400  
    6      ip nat enable  
    7      ip nhrp authentication blah  
    8      ip nhrp network-id 1  
    9      ip nhrp nhs dynamic nbma 109.119.134.213 multicast  
   10      ip tcp adjust-mss 1360  
   11      tunnel source FastEthernet4  
   12      tunnel mode gre multipoint  
   13      tunnel vrf INTERNET  
   14      tunnel protection ipsec profile DMVPN_IPSEC_PROFILE  
   15     ip route 172.16.1.0 255.255.255.0 Tunnel0  
   16     ip route 0.0.0.0 0.0.0.0 FastEthernet4 dhcp

Line 2: This option clears the broadcast flag bit in the DISCOVER's bootp header. This doesn't affect delivery of the DISCOVER, because of the multicast keyword on line 9. Clearing the broadcast flag in the DISCOVER also results in no broadcast bit in the OFFER message which needs to be relayed to the client. This is important because the hub doesn't have an outbound multicast capability. The DHCP OFFER will never get delivered over the NBMA transport if the broadcast bit is set.
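For reference, the broadcast flag is the most significant bit of the 16-bit flags field, which sits 10 bytes into the bootp header. The Python sketch below reads and clears that bit on a hypothetical, mostly zeroed header; it only illustrates the bit manipulation and says nothing about how IOS does it internally.

```python
import struct

BROADCAST_FLAG = 0x8000  # most significant bit of the 16-bit bootp flags field
FLAGS_OFFSET = 10        # op, htype, hlen, hops (4) + xid (4) + secs (2)

def broadcast_requested(bootp: bytes) -> bool:
    """True if the client asked for replies to be broadcast."""
    (flags,) = struct.unpack_from("!H", bootp, FLAGS_OFFSET)
    return bool(flags & BROADCAST_FLAG)

def clear_broadcast(bootp: bytes) -> bytes:
    """Return a copy of the header with the broadcast bit cleared."""
    (flags,) = struct.unpack_from("!H", bootp, FLAGS_OFFSET)
    patched = bytearray(bootp)
    struct.pack_into("!H", patched, FLAGS_OFFSET, flags & ~BROADCAST_FLAG & 0xFFFF)
    return bytes(patched)

# Hypothetical 236-byte bootp header: a BOOTREQUEST with the broadcast bit set.
header = bytearray(236)
header[0] = 1
struct.pack_into("!H", header, FLAGS_OFFSET, BROADCAST_FLAG)
discover = bytes(header)

print(broadcast_requested(discover))
print(broadcast_requested(clear_broadcast(discover)))
```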

Line 3: Pretty self-explanatory.

Line 9: I'm partial to this method of configuring the NHS (as opposed to two lines: one specifying the tunnel address of the NHS, and one specifying the mapping of NHS tunnel IP to NBMA IP). At any rate, there's a critical detail: the multicast keyword must be present in either configuration. Without it, the spoke won't have an NHRP mapping with which to send the DISCOVER messages upstream.

Line 15: This is a funny one, and might not be required in all environments.

First, consider line 16. There is a default route in the global table via the DHCP-learned next hop in vrf INTERNET. This is here to facilitate split tunneling: Devices behind this DMVPN spoke can access the Internet via overload NAT on interface Fa4.

Next, remember what the OFFER message looks like: It's an IPv4 unicast destined for an address we don't yet own.

Look what comes out FastEthernet 4 if we don't have the route on line 15 in place:

 poetaster:~ chris$ tcpdump -r /tmp/routed.pcapng   
 reading from file /tmp/routed.pcapng, link-type EN10MB (Ethernet)  
 16:25:09.481759 IP 172.16.1.1.bootps > 172.16.1.3.bootpc: BOOTP/DHCP, Reply, length 300  
 16:25:12.677629 IP 172.16.1.1.bootps > 172.16.1.3.bootpc: BOOTP/DHCP, Reply, length 300  
 poetaster:~ chris$

Those are our DHCP OFFERS, and we're spitting them out toward the Internet!

What's going on here? Well, we've got an IPSec SA up and running with our DMVPN hub. We don't yet have an IP address, but IPSec doesn't care about that. It's not a VPN, it's more of a transport security mechanism, right? An encrypted packet rolls in from the hub, matches the SA, gets decrypted and then... Routed! The DHCP client process never saw it. The OFFER missed the raw socket (or whatever Cisco is doing) mechanism entirely!

Line 15 pushes the wayward OFFER back toward the tunnel interface, where the DHCP client implementation will find it. This route doesn't need to match the tunnel interface exactly, it just needs to be the best match for the address we will be offered. I used a 24-bit route to match the tunnel, but these would have worked too:

172.16.1.3/32 (requires me to know my address in advance)
172.16.0.0/12 (requires that I don't learn a better route to my new IP via some path other than Tu0)
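The "best match" logic is ordinary longest-prefix matching. Here is a rough Python model of the spoke's global table, assuming nothing in it but the default route out FastEthernet4 and one candidate static route toward Tunnel0; the function name and the table layout are mine.

```python
import ipaddress

def egress_interface(destination, routes):
    """Pick the interface of the longest prefix that covers the destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [(ipaddress.ip_network(prefix), ifname)
                  for prefix, ifname in routes
                  if dest in ipaddress.ip_network(prefix)]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0].prefixlen)[1]

default_only = [("0.0.0.0/0", "FastEthernet4")]
offered = "172.16.1.3"

# Without a static route the OFFER follows the default route.
print("no static:", egress_interface(offered, default_only))

# Any of the three candidate statics beats the default route.
for static in ("172.16.1.0/24", "172.16.1.3/32", "172.16.0.0/12"):
    table = default_only + [(static, "Tunnel0")]
    print(static, "->", egress_interface(offered, table))
```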

Maybe there's a way to handle the incoming DHCP traffic with PBR? That would be an improvement because this static route detail is the only place in the configuration which requires the spoke to know anything about the prefix running on the DMVPN transport he'll be using.

1 Cisco's documentation indicates that the DHCP server cannot run on the DMVPN hub. Take that indication with some salt because (a) that is an IOS XE command guide, and IOS XE doesn't have some of the indicated commands, and (b) running the DHCP server locally seems to work just fine.

2 Maybe "IPv4 NBMA" and "length"? Eh. That's why it's called a vendor proprietary option.

Friday, September 25, 2015

Cisco DHCP client bummer

It looks to me like the Cisco IOS DHCP client mis-handles the DNS server option when it's working in a VRF.

I'm working on an IOS 15.4 router with an empty startup-config and only the following configuration applied:

 interface FastEthernet4  
  ip address dhcp  
  no shutdown

debug dhcp detail produces the following when the DHCP lease is claimed:

 Sep 25 19:48:23.316: DHCP: Received a BOOTREP pkt  
 Sep 25 19:48:23.316: DHCP: Scan: Message type: DHCP Offer  
 ...  
 Sep 25 19:48:23.316: DHCP: Scan: DNS Name Server Option: 192.168.100.4

Indeed, we can resolve DNS. We can also see that the DNS server learned from DHCP has been configured (is there a better way to see this?):

 lab-C881#ping google.com  
 Translating "google.com"...domain server (192.168.100.4) [OK]  
 Type escape sequence to abort.  
 Sending 5, 100-byte ICMP Echos to 205.158.11.53, timeout is 2 seconds:  
 !!!!!  
 Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms  
 lab-C881#show hosts summary  
 Default domain is fragmentationneeded.net  
 Name/address lookup uses domain service  
 Name servers are 192.168.100.4  
 Cache entries: 5  
 Cache prune timeout: 50  
 lab-C881#

If I put the interface into a VRF, like this...

 ip vrf INTERNET  
 interface FastEthernet4  
  ip vrf forwarding INTERNET  
  ip address dhcp  
  no shutdown

Debugs look the same, but we can't find google, and we don't seem to have a DNS server configured:

 lab-C881#ping vrf INTERNET google.com    
 % Unrecognized host or address, or protocol not running.  
 lab-C881#show hosts vrf INTERNET summary  
 lab-C881#

The global forwarding table has no interfaces up, but it's trying to use the DNS server which is reachable only within the VRF:

 lab-C881#ping google.com    
 Translating "google.com"...domain server (192.168.100.4)  
 % Unrecognized host or address, or protocol not running.  
 lab-C881#show hosts summary  
 Default domain is fragmentationneeded.net  
 Name/address lookup uses domain service  
 Name servers are 192.168.100.4  
 Cache entries: 1  
 Cache prune timeout: 42

Of course, without any interfaces, attempts to talk to the DNS server from the global table will fail. This is kind of a bummer.

Monday, September 14, 2015

Just some quick points about DHCP

Okay, so everybody knows DHCP pretty well.

I just want to point out a few little details as background for a future post:

DHCP Relays Can Change Things
The first point is about those times when the DHCP client and server aren't on the same segment.

In these cases, a DHCP relay (usually running on a router) scoops up the helpless client's broadcast packets and fires them at the far away DHCP server. The server's replies are sent back to the relay, and the relay transmits them onto the client subnet.

The DHCP relay can change several things when relaying these packets:

It increments the bootp hop counter.
It populates the relay agent field in the bootp header (The DHCP server uses this to identify the subnet where the client is looking for a lease).
It can introduce additional DHCP options to the request.

The last one is particularly interesting. When a DHCP relay adds information to a client message, it can be used by the DHCP server for decision-making or logging purposes. Alternatively, the added information can be used by the DHCP relay itself: Because the relay's addition will be echoed back by the server, the relay can parse information it added to a DISCOVER message when relaying the resulting OFFER message back toward the client.
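Here is a rough Python sketch of those three changes applied to a raw bootp message. The offsets come from the standard bootp header layout; the sample packet and the option 82 contents are made up, and a real relay does more checking than this (hop limits, existing option 82, packet size).

```python
import socket

HOPS_OFFSET = 3     # op, htype and hlen precede the hop counter
GIADDR_OFFSET = 24  # op..flags (12 bytes) + ciaddr, yiaddr, siaddr (12 bytes)
END_OPTION = 255

def relay_request(packet: bytes, relay_ip: str, extra_option: bytes = b"") -> bytes:
    """Return a copy of a client request the way a relay might forward it."""
    out = bytearray(packet)
    out[HOPS_OFFSET] += 1  # 1. increment the hop counter
    if out[GIADDR_OFFSET:GIADDR_OFFSET + 4] == b"\x00\x00\x00\x00":
        # 2. fill in the relay agent address if no earlier relay did
        out[GIADDR_OFFSET:GIADDR_OFFSET + 4] = socket.inet_aton(relay_ip)
    if extra_option:
        # 3. splice the new option in just before the End option
        end = out.rindex(bytes([END_OPTION]))
        out[end:end] = extra_option
    return bytes(out)

# Hypothetical DISCOVER: zeroed header, magic cookie, option 53, End.
header = bytearray(236)
header[0] = 1  # BOOTREQUEST
discover = bytes(header) + bytes.fromhex("63825363") + bytes([53, 1, 1, END_OPTION])

# Made-up option 82 carrying one sub-option (type 1, length 1, value 7).
option_82 = bytes([82, 3, 1, 1, 7])

relayed = relay_request(discover, "192.168.26.254", option_82)
print(relayed[HOPS_OFFSET], socket.inet_ntoa(relayed[GIADDR_OFFSET:GIADDR_OFFSET + 4]))
```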

DHCP servers shortcut ARP
Consider the following DHCP offer sent to a client:

1:  14:11:09.966124 a4:93:4c:46:d3:3f (oui Unknown) > 40:6c:8f:38:26:60 (oui Unknown), ethertype IPv4 (0x0800), length 342: (tos 0x0, ttl 255, id 30511, offset 0, flags [none], proto UDP (17), length 328)  
2:    192.168.26.254.bootps > 192.168.26.225.bootpc: BOOTP/DHCP, Reply, length 300, hops 1, xid 0x91b0d169, Flags [none]  
3:        Your-IP 192.168.26.225  
4:        Gateway-IP 192.168.26.254  
5:        Client-Ethernet-Address 40:6c:8f:38:26:60 (oui Unknown)  
6:        Vendor-rfc1048 Extensions  
7:         Magic Cookie 0x63825363  
8:         DHCP-Message Option 53, length 1: Offer  
9:         Server-ID Option 54, length 4: 192.168.5.13  
10:         Lease-Time Option 51, length 4: 7200  
11:         Subnet-Mask Option 1, length 4: 255.255.255.224  
12:         Default-Gateway Option 3, length 4: 192.168.26.254  
13:         Domain-Name-Server Option 6, length 4: localhost  
14:         Domain-Name Option 15, length 5: "a.net"

Line 2 indicates that it's a unicast IP packet, sent to 192.168.26.225. Line 1 tells us the packet arrived in a unicast Ethernet frame. Nothing too unusual looking about it.

But how did the DHCP relay encapsulate this IP packet into that Ethernet frame? Usually the dMAC seen in the frame header is the result of an ARP query. That can't work in this case: The client won't answer an ARP query for 192.168.26.225 because it doesn't yet own that address.

The encapsulation seen here skipped over ARP. Instead, the relay pulled the dMAC from the DHCP payload (line 5). Nifty.
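A sketch of that lookup in Python, assuming a raw bootp reply as input: the hardware address length is at offset 2, Your-IP at offset 16 and the client hardware address at offset 28. The sample reply is hand-built from the values in the capture above, and the function name is mine.

```python
import socket

def l2_l3_destination(bootp_reply: bytes):
    """Read the client's MAC and offered IP straight from the bootp header."""
    hlen = bootp_reply[2]                          # hardware address length
    yiaddr = socket.inet_ntoa(bootp_reply[16:20])  # Your-IP
    chaddr = bootp_reply[28:28 + hlen]             # Client-Ethernet-Address
    mac = ":".join(f"{byte:02x}" for byte in chaddr)
    return mac, yiaddr

# Hand-built reply header using the addresses from the capture.
reply = bytearray(236)
reply[0] = 2  # BOOTREPLY
reply[1] = 1  # htype: Ethernet
reply[2] = 6  # hlen
reply[16:20] = socket.inet_aton("192.168.26.225")
reply[28:34] = bytes.fromhex("406c8f382660")

print(l2_l3_destination(bytes(reply)))
```

No ARP involved: everything needed to build the frame is already in the payload.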

DHCP Clients use Raw Sockets
Everything we know about learning bridges and interface promiscuity applies here. The IP layer in the client system receives this frame because it was sent to the unicast MAC address and the client's NIC allowed it in.

Line 2 indicates that it's also a unicast IP packet, sent to 192.168.26.225.

But this is the DHCP OFFER. The client doesn't yet own the address to which the offer is sent. In fact, the client doesn't even know that the address is available for its use until it gets to line 3 (a bootp header field inside a UDP datagram inside this IP packet). The client won't have actually leased this address until it requests the address from the server, and the server ACKs the request.

So how can the client receive this IP packet, when it's sent to an address that nobody owns?

The answer is Raw Sockets. Raw sockets give a privileged program the ability to undercut various layers (UDP and IP in this case) of the OS stack, and send/receive messages directly. This is also probably how the DHCP relay encapsulated the offer (skipping ARP) in the first place.

The details of raw socket implementations are specific to the operating system, but usually include the ability to specify an interface and to apply a filter, so that the application doesn't have to process every message on the wire.
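As one illustration, here is a Linux-flavored Python sketch: a packet socket bound to a single interface, plus a filter that picks DHCP replies out of whatever arrives. Opening the socket needs root and only works on Linux. The filtering here happens in user space to keep the example short, whereas a real client would more likely attach a filter in the kernel. The function names and the sample frame are mine.

```python
import socket
import struct

ETH_P_ALL = 0x0003  # Linux constant: hand over frames of every ethertype

def open_raw_socket(ifname: str) -> socket.socket:
    """Linux only, needs root: a packet socket bound to one interface."""
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    sock.bind((ifname, 0))
    return sock

def is_dhcp_reply(frame: bytes) -> bool:
    """True for IPv4/UDP frames sent from port 67 (bootps) to port 68 (bootpc)."""
    if len(frame) < 14 + 20 + 8:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:           # not IPv4
        return False
    ihl = (frame[14] & 0x0F) * 4      # IP header length in bytes
    if frame[14 + 9] != 17:           # not UDP
        return False
    if len(frame) < 14 + ihl + 8:
        return False
    sport, dport = struct.unpack_from("!HH", frame, 14 + ihl)
    return (sport, dport) == (67, 68)

# Hand-built frame: Ethernet header, minimal IPv4 header, UDP header.
ethernet = bytes.fromhex("406c8f382660") + bytes.fromhex("a4934c46d33f") + b"\x08\x00"
ip_header = bytearray(20)
ip_header[0] = 0x45  # IPv4, 20-byte header
ip_header[9] = 17    # protocol: UDP
offer_frame = ethernet + bytes(ip_header) + struct.pack("!HHHH", 67, 68, 8, 0)

print(is_dhcp_reply(offer_frame))
```

Note that the filter never looks at the destination IP address, which is the whole point: the client can accept an OFFER sent to an address it doesn't own yet.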