Thursday, August 31, 2017

Using FQDN for DMVPN hubs

I've done some testing with specifying DMVPN hubs (NHRP servers, really) using their DNS name, rather than IP address.

This matters to me because of some goofy environments where spoke routers can't predict what network they'll be on (possibly something other than internet), and where I can't leverage multiple hubs per tunnel due to a control plane scaling issue.

The DNS-based configuration includes the following:

 interface Tunnel1  
  ip nhrp nhs dynamic nbma dmvpn-pool.fragmentationneeded.net  

There's no longer a requirement for any ip nhrp map or ip nhrp nhs x.x.x.x configuration when using this new capability.
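
For completeness, a minimal spoke tunnel built around that one line might look something like this. The addresses, network-id, source interface and IPSec profile name below are placeholders (not from my lab), and the router obviously needs working name resolution (a reachable ip name-server, with DNS lookup left enabled) so it can chase the FQDN:

 ip name-server 192.0.2.53  
   
 interface Tunnel1  
  ip address 10.99.0.11 255.255.255.0  
  ip nhrp network-id 1  
  ip nhrp nhs dynamic nbma dmvpn-pool.fragmentationneeded.net  
  tunnel source GigabitEthernet0  
  tunnel mode gre multipoint  
  tunnel protection ipsec profile DMVPN_PROF  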

My testing included some tunnels with very short ISAKMP and IPSec re-key intervals. I found that the routers performed the DNS resolution just once. They didn't go back to DNS again for as long as the hub was reachable.

Spoke routers which failed to establish a secure connection for whatever reason would re-resolve the hub address each time the DNS response expired its TTL. But once they succeeded in connecting, I observed no further DNS traffic for as long as the tunnel survived.

The record I published (dmvpn-pool.fragmentationneeded.net above) includes multiple A records. The DNS server randomizes the record order in its responses and spoke routers always connected to the first address on the list.
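
For illustration, the data behind that name is just several A records with a short TTL, something like this in BIND-style zone syntax (the hub addresses here are made up from the documentation range):

 dmvpn-pool.fragmentationneeded.net. 60 IN A 198.51.100.1  
 dmvpn-pool.fragmentationneeded.net. 60 IN A 198.51.100.2  
 dmvpn-pool.fragmentationneeded.net. 60 IN A 198.51.100.3  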

The random-ordered DNS response makes for a kind of nifty load balancing and failover capability:

  1. The spokes will naturally balance across the population of hubs, depending on the whim of the DNS server
  2. I don't strictly need a smart (GSLB style) DNS server to effect failover, because spokes will eventually find their way to a working hub, even with bad records in the list.


With 3 hub routers, the following happens when one fails:

  • At T=0, 67% of the routers remain connected.
  • At T=<keepalive> seconds, 89% of routers are connected (2/3 of the orphans are back online; the others tried the dead hub again).
  • At T=TTLx1, 96% of routers are connected (1/3 of the orphans from the previous interval tried the dead hub a second time).
  • At T=TTLx2, 99% of routers are back online.

Things recover fairly quickly with short TTLs, even without a GSLB, because the spokes keep retrying and only need to find a working record once. This DMVPN tunnel isn't the only path in my environment, so a couple of minutes of outage is acceptable.
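
Put another way: assuming spokes pick uniformly at random among N published records and exactly one hub is dead, the fraction of spokes connected after k re-resolution attempts is roughly 1 - (1/N)^(k+1). With N=3 that works out to the 67% / 89% / 96% / 99% progression above.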


A 60 second TTL will result in ~43K queries/month (one query per minute: 60 × 24 × 30 = 43,200) for each spoke that can't connect (problems with firewall, overload NAT, credentials, etc...), so watch out for that if you're using a service that causes you to pay per query :)

Wednesday, August 30, 2017

Small Site Multihoming with DHCP and Direct Internet Access

Cisco recently (15.6(3)M2) resolved CSCve61996, which makes it possible to fail internet access back and forth between two DHCP-managed interfaces in two different front-door VRFs attached to consumer-grade internet service.

Prior to the IOS fix there was a lot of weirdness with route configuration on DHCP interfaces assigned to VRFs.

I'm using a C891F-K9 for this example. The WAN interfaces are Gi8 and Fa0. They're in F-VRFs named ISP_A and ISP_B respectively:


First, create the F-VRFs and configure the interfaces:

 ip vrf ISP_A  
 ip vrf ISP_B  
   
 interface GigabitEthernet8  
  ip vrf forwarding ISP_A  
  ip dhcp client default-router distance 10  
  ip address dhcp  
 interface FastEthernet0  
  ip vrf forwarding ISP_B  
  ip dhcp client default-router distance 20  
  ip address dhcp  

The distance commands above assign the AD of the DHCP-assigned default route. Without these directives the distance would be 254 in each VRF. They're modified here because we'll be using the distance to select the preferred internet path when both ISPs are available.

Next, let's keep track of whether or not the internet is working via each provider. In this case I'm pinging 8.8.8.8 via both paths, but this health check can be whatever makes sense for your situation. So, a couple of IP SLA monitors and track objects are in order:

 ip sla 1  
  icmp-echo 8.8.8.8  
  vrf ISP_A  
  threshold 500  
  timeout 1000  
  frequency 1  
 ip sla schedule 1 life forever start-time now  
 track 1 ip sla 1  
   
 ip sla 2  
  icmp-echo 8.8.8.8  
  vrf ISP_B  
  threshold 500  
  timeout 1000  
  frequency 1  
 ip sla schedule 2 life forever start-time now  
 track 2 ip sla 2  

Ultimately we'll be withdrawing the default route from each VRF when we determine that the internet has failed. This introduces a problem: with the default route missing, the SLA target will be unreachable. The SLA (and track) will never recover, so the default route will never be restored. So first let's add a static route to our SLA target in each VRF. The default route will get withdrawn, but the host route for the SLA target will persist in each VRF.

 ip route vrf ISP_A 8.8.8.8 255.255.255.255 dhcp 50  
 ip route vrf ISP_B 8.8.8.8 255.255.255.255 dhcp 60  

We used the dhcp keyword as a stand-in for the next-hop IP address. We could have just specified the interface, but specifying a multiaccess interface without a neighbor ID is an ugly practice and assumes that proxy ARP is available from neighboring devices. Not a safe assumption.

Finally, we can set the default route to be withdrawn when the track object goes down:

 interface GigabitEthernet8  
  ip dhcp client route track 1  
   
 interface FastEthernet0  
  ip dhcp client route track 2  

At this point, when everything is healthy, the routing table for ISP_A looks something like this:

 S*  0.0.0.0/0 [10/0] via 192.168.1.126  
    8.0.0.0/32 is subnetted, 1 subnets  
 S    8.8.8.8 [50/0] via 192.168.1.126  
    192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks  
 C    192.168.1.64/26 is directly connected, GigabitEthernet8  
 L    192.168.1.67/32 is directly connected, GigabitEthernet8  

The table for ISP_B looks similar, but with different Administrative Distances. On failure of the SLA/track the default route gets withdrawn but the 8.8.8.8/32 route persists. That looks like this:

    8.0.0.0/32 is subnetted, 1 subnets  
 S    8.8.8.8 [50/0] via 192.168.1.126  
    192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks  
 C    192.168.1.64/26 is directly connected, GigabitEthernet8  
 L    192.168.1.67/32 is directly connected, GigabitEthernet8  

When the ISP is healed, the 8.8.8.8/32 ensures that we'll notice, the SLA will recover, and the default route will be restored.

Okay, now it's time to think about leaking these ISP_A and ISP_B routes into the global routing table (GRT). First, we need an interface in the GRT for use by directly connected clients:

 interface Vlan10  
  ip address 10.10.10.1 255.255.255.0  

And now the leaking configuration:

 ip prefix-list PL_DEFAULT_ONLY permit 0.0.0.0/0  
   
 route-map RM_IMPORT_TO_GRT permit 10  
  match ip address prefix-list PL_DEFAULT_ONLY  
   
 global-address-family ipv4  
  route-replicate from vrf ISP_A unicast static route-map RM_IMPORT_TO_GRT  
  route-replicate from vrf ISP_B unicast static route-map RM_IMPORT_TO_GRT  

The configuration above leaks only the default route from each F-VRF. The GRT will be offered both routes and will make its selection based on the AD we configured earlier (values 10 and 20).

Here's the GRT with everything working:

 S* + 0.0.0.0/0 [10/0] via 192.168.1.126 (ISP_A)  
    10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks  
 C    10.10.10.0/24 is directly connected, Vlan10  
 L    10.10.10.1/32 is directly connected, Vlan10  

When the ISP_A path fails, the GRT fails over to the higher distance route via ISP_B:

 S* + 0.0.0.0/0 [20/0] via 192.168.1.62 (ISP_B)  
    10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks  
 C    10.10.10.0/24 is directly connected, Vlan10  
 L    10.10.10.1/32 is directly connected, Vlan10  

Strictly speaking, it's not necessary to have the SLA monitor, track object and conditional routing in VRF ISP_B. All of those things could be omitted and the GRT would still fail back and forth between the two F-VRFs based only on the tests in ISP_A. But I like the symmetry.

Okay, so now that we've got the GRT's default route flopping back and forth between these two front-door VRFs, we'll need some NAT. First, enable NVI mode on each interface in the transit path:

 interface GigabitEthernet8   
  ip nat enable  
 interface FastEthernet0  
  ip nat enable  
 interface Vlan10  
  ip nat enable  

Next we'll spell out exactly what's going to get NATted. I like to use route-maps rather than ACLs because the templating is easier when we're matching interfaces rather than ip prefixes:

 route-map RM_NAT->ISP_A permit 10  
  match interface GigabitEthernet8  
   
 route-map RM_NAT->ISP_B permit 10  
  match interface FastEthernet0  
   
 ip nat source route-map RM_NAT->ISP_A interface GigabitEthernet8 overload  
 ip nat source route-map RM_NAT->ISP_B interface FastEthernet0 overload  

That's basically it. The last thing that might prove useful is to automate purging of NAT translation tables when switching between providers. TCP flows can't survive the ISP switchover, and clearing the NAT translations for active flows should make them fail faster than they might have otherwise.
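
One way to automate the purge (a sketch I haven't tested here, and the exact clear command you prefer may vary) is an EEM applet hung off the ISP_A track object. Track 1 changes state in both failover directions (down when the GRT swings to ISP_B, up when it swings back), so a single applet covers both:

 event manager applet PURGE_NAT_ON_SWITCHOVER  
  event track 1 state any  
  action 1.0 cli command "enable"  
  action 2.0 cli command "clear ip nat translation *"  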