Monday, June 20, 2011

To redirect, or not to redirect? That is the question (for an FHRP router)

Ethan's post about a problem caused by ICMP redirects has had me thinking about redirects and redundancy on an access network.  When I first learned about this, I was surprised by just how many moving parts there were, so I thought it warranted some attention.

In the examples that follow, R5 is configured with no ip routing.  R5 is our ignorant host system who will be using the wrong gateway and receiving ICMP redirects.


No FHRP:
Figure 1
For the topology in Figure 1, when R5 pings R10's loopback interface, he receives an ICMP redirect:
R5#debug ip icmp
ICMP packet debugging is on
R5#ping 192.168.51.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.51.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms
R5#
*Mar  1 21:57:00.715: ICMP: redirect rcvd from 192.168.50.1- for 192.168.51.10 use gw 192.168.50.3

R1#debug ip icmp
ICMP packet debugging is on
R1#
*Mar  1 02:38:01.123: ICMP: redirect sent to 192.168.50.5 for dest 192.168.51.10, use gw 192.168.50.3

No real surprises here.  R1 knows that R3 provides the best path to R10's loopback interface, so he sent a redirect.  The redirect:
  • was sourced from R1's physical interface address (it seems obvious, but this is important)
  • tells R5 to use 192.168.50.3 (R3's physical address) when talking to R10
The result on R5 looks like this:
R5#show ip route
Default gateway is 192.168.50.1

Host               Gateway           Last Use    Total Uses  Interface
192.168.51.10      192.168.50.3          0:05             8  Ethernet0/0
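
Why did R1 decide to redirect at all?  Here's a rough Python sketch of the usual test, simplified from RFC 1812 section 5.2.7.2.  The function name and the /24 mask are my own assumptions for illustration (the mask isn't shown in the lab output), and this is not how IOS implements it:

```python
import ipaddress

def should_send_redirect(in_iface, out_iface, src_ip, next_hop, iface_network):
    """Simplified redirect test: the packet leaves through the interface it
    arrived on, and both the sender and the better next hop are on that link."""
    net = ipaddress.ip_network(iface_network)
    same_iface = in_iface == out_iface
    src_on_link = ipaddress.ip_address(src_ip) in net
    next_hop_on_link = ipaddress.ip_address(next_hop) in net
    return same_iface and src_on_link and next_hop_on_link

# R5 (192.168.50.5) hands the packet to R1, whose best next hop is R3.
# Assumption: the access segment is 192.168.50.0/24.
print(should_send_redirect("Et0/0", "Et0/0",
                           "192.168.50.5", "192.168.50.3",
                           "192.168.50.0/24"))
```

If the packet had to leave through a different interface, the same function returns False and R1 would simply forward the traffic.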


Let's switch on VRRP:
Now, let's enable VRRP on R1, and point R5 at R1's VRRP address, rather than the physical interface address.  The resulting topology now looks like:
Figure 2



R5#ping 192.168.51.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.51.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/4 ms
R5#
*Mar  1 22:11:54.707: ICMP: redirect rcvd from 192.168.50.101- for 192.168.51.10 use gw 192.168.50.3

R1#
*Mar  1 02:52:55.099: ICMP: redirect sent to 192.168.50.5 for dest 192.168.51.10, use gw 192.168.50.3

R1 didn't mention it explicitly in the debug output, but he changed the address from which he sourced the redirect.  This time, the redirect came from the VRRP gateway address 192.168.50.101.  It's important that R1 source the redirect from the correct IP, because RFC 1122 section 3.2.2.2 says:
A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
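
To make the two discard conditions concrete, here's a minimal Python sketch of the check a strict host could apply.  The function name and the /24 mask are assumptions of mine, and I've simplified "current first-hop gateway for the specified destination" down to a single gateway address:

```python
import ipaddress

def accept_redirect(redirect_src, new_gateway, current_gateway, arrival_network):
    """Apply the two discard conditions from RFC 1122 section 3.2.2.2.

    Returns True only if the new gateway is on the subnet the redirect
    arrived through AND the redirect came from the gateway currently in use.
    """
    net = ipaddress.ip_network(arrival_network)
    gateway_on_link = ipaddress.ip_address(new_gateway) in net
    from_current_gw = (ipaddress.ip_address(redirect_src)
                       == ipaddress.ip_address(current_gateway))
    return gateway_on_link and from_current_gw

# R5 points at 192.168.50.101. Assumption: the segment is 192.168.50.0/24.
print(accept_redirect("192.168.50.101", "192.168.50.3",
                      "192.168.50.101", "192.168.50.0/24"))
print(accept_redirect("192.168.50.1", "192.168.50.3",
                      "192.168.50.101", "192.168.50.0/24"))
```

The first call models a redirect sourced from the VRRP address, which passes.  The second models the same redirect sourced from R1's physical address, which a strict host would discard.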
R1 is able to determine that R5 knows him by the VRRP address (.101), rather than the physical interface address (.1) because the packet destined for R10 was encapsulated in an Ethernet frame destined for 00:00:5e:00:01:65 (the VRRP MAC address).  If, for some reason, a router is unable to determine the IP by which clients know him, he shouldn't send an ICMP redirect.

RFC 5798 Section 8.1.1 spells this out:
The IPv4 source address of an ICMP redirect should be the address
that the end-host used when making its next-hop routing decision. If
a VRRP router is acting as Master for virtual router(s) containing
addresses it does not own, then it must determine which virtual
router the packet was sent to when selecting the redirect source
address. One method to deduce the virtual router used is to examine
the destination MAC address in the packet that triggered the
redirect.
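
The MAC-based method is easy to sketch in Python.  The IPv4 VRRP virtual MAC is 00-00-5E-00-01-{VRID} (RFC 5798 section 7.3), so group 101 maps to ...:01:65.  The function names and the table layout below are hypothetical, and this only illustrates the idea rather than what IOS actually does:

```python
def vrrp_virtual_mac(vrid):
    """IPv4 VRRP virtual MAC address for a given VRID (1..255)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return "00:00:5e:00:01:%02x" % vrid

def redirect_source(dst_mac, physical_ip, physical_mac, virtual_routers):
    """Choose the redirect source from the triggering frame's destination MAC.

    virtual_routers maps VRID -> virtual IP for groups this router is Master of.
    Returns None when the MAC matches nothing, meaning: send no redirect.
    """
    dst_mac = dst_mac.lower()
    for vrid, virtual_ip in virtual_routers.items():
        if dst_mac == vrrp_virtual_mac(vrid):
            return virtual_ip
    if dst_mac == physical_mac.lower():
        return physical_ip
    return None

# R1 is Master for group 101; 00:50:73:0d:0d:41 is R1's burned-in address.
r1_groups = {101: "192.168.50.101"}
print(vrrp_virtual_mac(101))
print(redirect_source("00:00:5e:00:01:65", "192.168.50.1",
                      "00:50:73:0d:0d:41", r1_groups))
```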
Note that there are 3 usable addresses on R3:
  • The physical interface address 192.168.50.3, which R1 chose to use in the redirect
  • The address for VRRP group 201: 192.168.50.201
  • The address for VRRP group 202: 192.168.50.202 (currently active on R4)


Here's how VRRP blackholes traffic:
VRRP has given R5 some potentially troublesome information.  Consider what happens if R3 fails:
R3#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
R3(config)#int eth 0/0
R3(config-if)#shut

R1 reconverged around this failure:

R1#show ip route 192.168.51.10
Routing entry for 192.168.51.10/32
  Known via "ospf 1", distance 110, metric 792, type intra area
  Last update from 192.168.50.4 on Ethernet0/0, 00:00:13 ago
  Routing Descriptor Blocks:
  * 192.168.50.4, from 192.168.51.10, 00:00:13 ago, via Ethernet0/0
      Route metric is 792, traffic share count is 1

But R5 did not.  He's still trying to use R3 (now offline), as previously directed by R1:
R5#ping 192.168.51.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.51.10, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
R5#show ip route
Default gateway is 192.168.50.101

Host               Gateway           Last Use    Total Uses  Interface
192.168.51.10      192.168.50.3          0:01            33  Ethernet0/0

Thanks, VRRP!

VRRP routers can certainly hear other VRRP routers on the local segment:
R1#debug vrrp errors
VRRP Errors debugging is on
R1#
*Mar  1 03:29:53.915: VRRP: Advertisement from 192.168.50.4 has an incorrect
                    group 202 id for interface Et0/0
*Mar  1 03:29:54.039: VRRP: Advertisement from 192.168.50.3 has an incorrect
                    group 201 id for interface Et0/0

But they don't seem to keep track of groups in which they're not participating:
R1#show vrrp all
Ethernet0/0 - Group 101
  State is Master
  Virtual IP address is 192.168.50.101
  Virtual MAC address is 0000.5e00.0165
  Advertisement interval is 1.000 sec
  Preemption enabled
  Priority is 105
  Master Router is 192.168.50.1 (local), priority is 105
  Master Advertisement interval is 1.000 sec
  Master Down interval is 3.589 sec


So, what about HSRP?
As it turns out, the outcome of the first test (no HSRP on R1, R5 pointed to 192.168.50.1, R3 and R4 running HSRP) is the same:  A redirect sourced from R1's physical interface address steers R5 toward 192.168.50.3 for destination 192.168.51.10.

Routers not running HSRP don't keep track of any HSRP groups (probably on a per-interface basis):
R1#show standby all brief

R1#


Now, let's enable HSRP on R1, and point R5 at the HSRP gateway address:
Figure 3

The first thing to notice is that with HSRP running, R1 (group 101) keeps track of what R3 and R4 are doing on HSRP groups 201 and 202.  These are groups of which R1 is not a member.
R1#show standby all brief
                     P indicates configured to preempt.
                     |
Interface   Grp Prio P State    Active          Standby         Virtual IP
Et0/0       101 105  P Active   local           unknown         192.168.50.101
Et0/0       201 100    Disabled 192.168.50.3    192.168.50.4    192.168.50.201
Et0/0       202 100    Disabled 192.168.50.4    192.168.50.3    192.168.50.202

HSRP redirect behavior:
R5#ping 192.168.51.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.51.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/404/1004 ms
R5#
*Mar  1 23:21:25.859: ICMP: redirect rcvd from 192.168.50.101- for 192.168.51.10 use gw 192.168.50.201

R1#debug ip icmp
ICMP packet debugging is on
R1#
*Mar  1 04:02:26.159: ICMP: HSRP changing redirect sent to 192.168.50.5 for dest 192.168.51.10
*Mar  1 04:02:26.159: ICMP:   gw 192.168.50.3 -> 192.168.50.201, src 192.168.50.101
*Mar  1 04:02:26.159: ICMP: Use HSRP virtual address 192.168.50.101 as ICMP src
*Mar  1 04:02:26.159: ICMP: redirect sent to 192.168.50.5 for dest 192.168.51.10, use gw 192.168.50.201

This time, R1 changed two parameters in the redirect.  First, he sourced the redirect from 192.168.50.101, which is the IP that R5 knows about, as required by RFC 1122.  This is the same as VRRP's behavior.  Next, because R1 knows that R3 is running HSRP, and that R3 is active for group 201, he changed the redirect to refer to R3 by its HSRP virtual address, 192.168.50.201.

This is way cool because now if R3 fails, R5 won't lose connectivity with R10.

Somewhat surprisingly, we've added resiliency to this topology by turning up an HSRP process on R1, even though there's no router backing up R1!
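
Here's a toy Python model of why the gateway address in the redirect matters so much.  The Host class, the reachable helper, and the OWNERS table are all my own invention for illustration.  The only facts taken from the lab are the addresses and which routers can answer for them:

```python
class Host:
    """A host with a default gateway and a per-destination redirect cache."""

    def __init__(self, default_gateway):
        self.default_gateway = default_gateway
        self.redirect_cache = {}

    def learn_redirect(self, dest, gateway):
        self.redirect_cache[dest] = gateway

    def next_hop(self, dest):
        # A cached redirect wins over the default gateway.
        return self.redirect_cache.get(dest, self.default_gateway)

def reachable(address, owners, failed):
    """True if at least one router that can answer for `address` is still up."""
    return any(router not in failed for router in owners.get(address, []))

# Assumption: which routers can answer for each address on the segment.
OWNERS = {
    "192.168.50.3": ["R3"],          # physical address, R3 only
    "192.168.50.201": ["R3", "R4"],  # virtual address, R4 can take over
    "192.168.50.101": ["R1"],
}

vrrp_host = Host("192.168.50.101")
vrrp_host.learn_redirect("192.168.51.10", "192.168.50.3")    # redirect to physical IP

hsrp_host = Host("192.168.50.101")
hsrp_host.learn_redirect("192.168.51.10", "192.168.50.201")  # redirect to virtual IP

failed = {"R3"}
for name, host in (("VRRP case", vrrp_host), ("HSRP case", hsrp_host)):
    hop = host.next_hop("192.168.51.10")
    state = "reachable" if reachable(hop, OWNERS, failed) else "blackholed"
    print(name, hop, state)
```

The host never re-evaluates a cached redirect on its own, so the only thing that saves it is the cached address moving to a surviving router.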

Will an HSRP router redirect to a standby router?
Now let's see what happens if we change the HSRP priority so that R4 is the active router for both of his groups (201 and 202), preempting R3:
Figure 4
R4(config)#interface Ethernet0/0
R4(config-if)#standby 201 priority 110
R4(config-if)#standby 201 preempt
*Mar  1 09:46:48.751: %HSRP-5-STATECHANGE: Ethernet0/0 Grp 201 state Standby -> Active

Now R3 remains the best choice for accessing R10, but it isn't the active HSRP router for any groups.  What will R1 do?
R5#debug ip icmp
ICMP packet debugging is on
R5#ping 192.168.51.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.51.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/5/8 ms
R5#

Huh.  No redirect arrived at R5.
R1#debug ip icmp
ICMP packet debugging is on
R1#
*Mar  1 04:49:33.850: ICMP: redirect not sent to 192.168.50.5 for dest 192.168.51.10
*Mar  1 04:49:33.850: ICMP:  192.168.50.3 does not contain an active HSRP group

Cool!  Rather than redirecting to R3's physical interface (like VRRP does), or to a virtual router address on R4, R1 elects to suppress the redirect and route the traffic itself.
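
The behavior observed in these two HSRP tests can be summarized in a few lines of Python.  This is only my model of what the debug output shows, not Cisco's actual logic, and the function name and table layout are hypothetical:

```python
def hsrp_redirect_gateway(best_next_hop, hsrp_groups):
    """Return the gateway to advertise in a redirect, or None to suppress it.

    hsrp_groups: list of dicts with 'active' (physical IP of the active
    router for the group) and 'virtual_ip' (the group's virtual address).
    """
    for group in hsrp_groups:
        if group["active"] == best_next_hop:
            # The next hop is active for a group: advertise the virtual IP.
            return group["virtual_ip"]
    # The next hop is not active for any known group: send no redirect.
    return None

# R1's view of groups 201 and 202 while R3 is active for group 201.
r3_active = [
    {"active": "192.168.50.3", "virtual_ip": "192.168.50.201"},
    {"active": "192.168.50.4", "virtual_ip": "192.168.50.202"},
]
# R1's view after R4 preempts and becomes active for both groups.
r4_active_for_both = [
    {"active": "192.168.50.4", "virtual_ip": "192.168.50.201"},
    {"active": "192.168.50.4", "virtual_ip": "192.168.50.202"},
]

print(hsrp_redirect_gateway("192.168.50.3", r3_active))
print(hsrp_redirect_gateway("192.168.50.3", r4_active_for_both))
```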


Okay, one last thing to point out:
As I mentioned earlier, routers aren't supposed to send redirects if they're not sure about the IP by which clients know them.  One way to create that uncertainty is with the standby use-bia command, which causes the router to use its burned-in address (BIA) as the MAC address for its HSRP groups, instead of the HSRP virtual MAC.  This creates ambiguity about the client's configuration because both the physical interface IP and the HSRP IP are represented by the same MAC address.
R1(config)#interface Ethernet 0/0
R1(config-if)#standby use-bia


We can see the result in R5's show ip arp output, where both of R1's IP addresses can be seen using the same MAC address.  R1 now has no way of distinguishing whether R5 knows him as 192.168.50.1 or as 192.168.50.101, making it impossible to know which address should be used in an ICMP redirect.
R5#show ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.50.101          3   0050.730d.0d41  ARPA   Ethernet0/0
Internet  192.168.50.1            1   0050.730d.0d41  ARPA   Ethernet0/0
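
The ambiguity is easy to spot mechanically.  Here's a small Python sketch (the function name is my own) that groups ARP entries by MAC address and reports any MAC that answers for more than one IP:

```python
from collections import defaultdict

def ambiguous_macs(arp_entries):
    """Return {mac: [ips]} for every MAC that maps to more than one IP."""
    by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        by_mac[mac].add(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}

# The two entries from R5's ARP table above.
r5_arp = [
    ("192.168.50.101", "0050.730d.0d41"),
    ("192.168.50.1", "0050.730d.0d41"),
]
print(ambiguous_macs(r5_arp))
```

With the HSRP virtual MAC in use, the two IPs would map to different MAC addresses and the result would be empty.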



Nonetheless, R1 generates a redirect, sourced from 192.168.50.1.  That's the wrong address because R5's default route points to 192.168.50.101.  R5 doesn't seem to mind, and installs the redirect in his routing table anyway, ignoring the advice in RFC 1122.

R5#show ip route
Default gateway is 192.168.50.101

Host               Gateway           Last Use    Total Uses  Interface
ICMP redirect cache is empty
R5#ping 192.168.51.10

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.51.10, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms
R5#
*Mar  1 23:39:54.667: ICMP: redirect rcvd from 192.168.50.1- for 192.168.51.10 use gw 192.168.50.201

R1(config-if)#
*Mar  1 04:20:54.947: ICMP: HSRP changing redirect sent to 192.168.50.5 for dest 192.168.51.10
*Mar  1 04:20:54.947: ICMP:   gw 192.168.50.3 -> 192.168.50.201
*Mar  1 04:20:54.947: ICMP: redirect sent to 192.168.50.5 for dest 192.168.51.10, use gw 192.168.50.201

R5#show ip route
Default gateway is 192.168.50.101

Host               Gateway           Last Use    Total Uses  Interface
192.168.51.10      192.168.50.201        0:02             8  Ethernet0/0


R1's decision to send the redirect in spite of the addressing ambiguity contrasts with some Cisco documentation.  That document includes the following debug output from a case where a router can't be sure how he's known by a client:
10:43:08: SB: ICMP redirect not sent to 20.0.0.4 for dest 30.0.0.2
10:43:08: SB: could not uniquely determine IP address for mac 00d0.bbd3.bc22