Monday, November 17, 2014

L2 DCI failure remediation with proxy ARP

I had the pleasure of taking an interstate road trip with Ethan Banks last weekend. Naturally, we talked shop a bit.

Somewhere in Connecticut we were speculating about the possibility of using host routes and proxy ARP to restore connectivity between members of a subnet when an L2 DCI fails.

Would a router produce proxy ARP replies for destinations which are members of a directly connected network?

I labbed it up to find out.
Don't do it!
The routers run different HSRP groups, one per data center.

Hosts are configured to use the local HSRP address as their default gateway. There is no FHRP localization other than the configuration on the hosts. This is not a scheme that facilitates VM mobility between sites.

Each router uses an IP SLA monitor to check the L2 interconnect by pinging the other site's HSRP address. The IP SLA monitor drives a tracking object (track 1), which in turn drives a second tracking object (track 2) with inverted logic: when the routers can't ping each other, track 2 transitions to "up":
 ! IP SLA monitor on R1
 ip sla 1
  icmp-echo 192.168.1.2 source-interface FastEthernet0/1
 ip sla schedule 1 life forever start-time now
 ! track 1 follows IP SLA 1
 track 1 ip sla 1
 ! track 2 follows track 1 (and so IP SLA 1) with inverted up/down state
 track 2 list boolean and
  object 1 not

Finally, each router is configured with static host routes for its local systems. These routes depend on track 2 and are redistributed into the IGP:
 ! Configuration on R1
 ip route 192.168.1.11 255.255.255.255 FastEthernet0/1 track 2
 router ospf 1
  log-adjacency-changes
  redistribute static subnets

When the L2 DCI is healthy, the following is true:
  • IP SLA 1 is 'up'
  • track 1 is 'up'
  • track 2 is 'down'
  • the host routes are withdrawn
In other words, everything works normally. There are no host routes in the IGP, and no proxy ARP nonsense going on.
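
The chain from probe to host route can be sketched in a few lines of Python. This is only a model of the logic described above, not anything the routers run; the function names are made up for illustration, and it ignores timing (probe frequency, tracking delays) entirely.

```python
def track1_state(sla_probe_ok: bool) -> bool:
    """Model of 'track 1 ip sla 1': up while the IP SLA probe succeeds."""
    return sla_probe_ok


def track2_state(track1_up: bool) -> bool:
    """Model of 'track 2 list boolean and' with the single member 'object 1 not'.

    A boolean AND over one negated member is simply the inverse of track 1.
    """
    return all([not track1_up])


def host_route_installed(sla_probe_ok: bool) -> bool:
    """The static host route is tied to track 2, so it exists only while track 2 is up."""
    return track2_state(track1_state(sla_probe_ok))
```

With a healthy DCI (`sla_probe_ok=True`) the host route is absent; once the probe fails, track 2 comes up and the route is installed.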

When the DCI fails, interesting things happen. First, the IP SLA monitor on both routers notices the failure and the tracking objects change state:
 *Nov 17 13:50:23.911: %TRACKING-5-STATE: 1 ip sla 1 state Up->Down  
 *Nov 17 13:50:24.443: %TRACKING-5-STATE: 2 list boolean and Down->Up  

The static route comes to life:
 R1#show ip route static  
    192.168.1.0/24 is variably subnetted, 3 subnets, 2 masks  
 S    192.168.1.11/32 is directly connected, FastEthernet0/1

The other router learns the host route:
 R2#show ip route ospf  
    192.168.1.0/24 is variably subnetted, 3 subnets, 2 masks  
 O E2  192.168.1.11/32 [110/20] via 192.168.0.1, 00:14:02, FastEthernet0/0  

Now what will happen when Host_B tries to ping Host_A?

R2's host route (learned via the IGP) causes him to believe that Host_A is best reached via the IGP rather than the directly connected LAN. Proxy ARP is enabled, so R2 responds to Host_B's ARP query on behalf of Host_A:
 R2#debug ip arp  
 R2#  
 *Nov 17 14:07:07.791: IP ARP: rcvd req src 192.168.1.12 ca04.12d6.001c, dst 192.168.1.11 FastEthernet0/1  
 *Nov 17 14:07:07.791: IP ARP: sent rep src 192.168.1.11 0000.0c07.ac02, dst 192.168.1.12 ca04.12d6.001c FastEthernet0/1  
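
Here is a rough Python model of why the host route changes R2's answer. It assumes a simplified proxy ARP rule: the router replies when its longest-match route for the target points out an interface other than the one the ARP request arrived on. That rule matches what the lab shows, but it is a sketch, not a description of the actual IOS implementation. The table contents and function names are illustrative.

```python
import ipaddress

# Simplified view of R2's routing table: (prefix, egress interface).
R2_HEALTHY = [
    ("192.168.1.0/24", "FastEthernet0/1"),   # directly connected LAN
]
R2_DCI_FAILED = R2_HEALTHY + [
    ("192.168.1.11/32", "FastEthernet0/0"),  # host route learned via OSPF
]


def best_route(table, target):
    """Return the longest-prefix match for target as (network, interface), or None."""
    addr = ipaddress.ip_address(target)
    matches = []
    for prefix, interface in table:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, interface))
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)


def would_proxy_arp(table, target, arrival_interface):
    """Assumed rule: reply only if the best route leaves via a different interface."""
    route = best_route(table, target)
    return route is not None and route[1] != arrival_interface
```

With `R2_HEALTHY`, an ARP request for 192.168.1.11 arriving on FastEthernet0/1 matches only the connected /24 on that same interface, so the model stays quiet. With `R2_DCI_FAILED`, the /32 wins the longest-match lookup and points out FastEthernet0/0, so the model answers.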

The ping succeeds. Host_B's ARP entry for Host_A holds an HSRP MAC address (that of the standby process on R2):
 Host_B#ping 192.168.1.11  
 Type escape sequence to abort.  
 Sending 5, 100-byte ICMP Echos to 192.168.1.11, timeout is 2 seconds:  
 .!!!!  
 Success rate is 80 percent (4/5), round-trip min/avg/max = 44/54/64 ms  
 Host_B#show ip arp 192.168.1.11  
 Protocol Address     Age (min) Hardware Addr  Type  Interface  
 Internet 192.168.1.11      0  0000.0c07.ac02 ARPA  Ethernet1/0  
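
The MAC in that entry can be recognized as an HSRP address by its prefix: HSRP version 1 virtual MACs take the form 0000.0c07.acXX, where XX is the group number in hex. A small Python helper (hypothetical name, HSRPv1 only) makes the decoding explicit:

```python
HSRP_V1_PREFIX = "00000c07ac"  # HSRP version 1 virtual MAC prefix


def hsrp_v1_group(mac: str):
    """Return the HSRPv1 group number encoded in a virtual MAC, or None if it is not one."""
    digits = mac.lower()
    for separator in (".", ":", "-"):
        digits = digits.replace(separator, "")
    if len(digits) != 12 or not digits.startswith(HSRP_V1_PREFIX):
        return None
    return int(digits[-2:], 16)
```

Fed `0000.0c07.ac02` from the output above, it returns group 2; Host_B's own address `ca04.12d6.001c` from the ARP debug returns `None`.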

Disclaimer: Don't do this. ARP timers are way too long. Everything about this is horrible.

This was strictly an experiment to discover whether a more specific route would trigger a router to proxy ARP on behalf of a system which should be directly connected.