Saturday, September 18, 2010

OSPF: IOS and NX-OS interoperability

I got stung by a nasty OSPF interoperability problem last night.  IOS devices and NX-OS devices follow different rules when choosing which AS-external (type 5) LSA to use for a route in their routing tables.

Consider the following example.  Four routers, in two areas.  Two routers are ASBRs, one in each area.

Both ASBRs are advertising identical LSAs for an external route to the 10/8 network.  These LSAs are flooded to all four routers in the domain so that each router may make the best decision about how to reach 10/8.

IOS routers follow the rules laid out by RFC1583.  Both LSAs are external type 1, so the metric stamped on the LSA (5 in both cases) is added to the cost to reach the ASBR.  The path with the lowest cost will be installed in the routing table.

Router B has two choices:
  • The path through "A" starts with 5, and includes a cost of 64 to cross a T1 link.  Total cost: 69
  • The path through "D" starts with 5, and includes two Ethernet segments (10 each).  Total cost: 25
So, Router B will send packets for 10/8 to Router C, its next hop toward "D".

Router C has two choices:
  • The path through "A" starts with 5, includes a T1 (64) and an Ethernet (10).  Total cost: 79
  • The path through "D" starts with 5, and includes a single Ethernet segment (10).  Total cost: 15
Router C will send packets for 10/8 to router D.
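
The arithmetic above can be reproduced with a few lines of Python.  This is only a sketch: the link layout (A to B over the T1, B to C and C to D over Ethernet) is inferred from the totals quoted above, and the names LINKS, e1_cost and best_asbr are made up for illustration.

```python
# Link costs inferred from the totals quoted in the post (assumption:
# a simple chain A -T1- B -Eth- C -Eth- D).
LINKS = {("A", "B"): 64, ("B", "C"): 10, ("C", "D"): 10}
EXTERNAL_METRIC = 5  # type 1 metric stamped on both external LSAs


def link_cost(a, b):
    """Return the cost of the link between two adjacent routers."""
    return LINKS[(a, b)] if (a, b) in LINKS else LINKS[(b, a)]


def e1_cost(path):
    """External type 1 cost: LSA metric plus the internal cost to the ASBR."""
    internal = sum(link_cost(a, b) for a, b in zip(path, path[1:]))
    return EXTERNAL_METRIC + internal


def best_asbr(candidates):
    """Pick the ASBR with the lowest total cost (the cost-only behavior)."""
    return min(candidates, key=lambda asbr: e1_cost(candidates[asbr]))


# Candidate paths from each router to each ASBR (hypothetical names).
B_PATHS = {"A": ["B", "A"], "D": ["B", "C", "D"]}
C_PATHS = {"A": ["C", "B", "A"], "D": ["C", "D"]}

if __name__ == "__main__":
    for name, paths in (("B", B_PATHS), ("C", C_PATHS)):
        costs = {asbr: e1_cost(p) for asbr, p in paths.items()}
        print(name, costs, "-> best ASBR:", best_asbr(paths))
```

Both routers land on "D" when cost is the only thing being compared.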

The network is converged, life is good.  Everybody routes 10/8 towards "D".  Except for "A" who will use his non-OSPF route.

But what if Router C is a Nexus 7000?  Nexus 7000s running NX-OS follow(ish) a different set of rules when considering AS-external paths.  They're laid out by RFC2328:

o   Intra-area paths using non-backbone areas are always the most preferred.

For Router C, the path through "A" is an intra-area non-backbone path.  The path through "D" is a backbone path.  So, when following these new rules, "C" will forward 10/8 to "B" (toward "A").  ...and "B" will forward to "C" (toward "D").  An OSPF routing loop.  Whee!

Fortunately, there's a switch to control this behavior.  RFC2178 introduced a switch called 'RFC1583Compatibility'.  Routers with compatibility mode enabled ignore the "always prefer intra-area non-backbone" path business, and just use the classic OSPF cost-based decision making.
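
Here's a rough Python model of how the switch changes the decision.  It is a simplification that only captures the one preference rule quoted above, not the full RFC2328 route calculation, and the names (AsbrPath, choose_path, B_PATHS, C_PATHS) are hypothetical.  The costs and next hops come from the example in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsbrPath:
    asbr: str                      # ASBR that originated the external LSA
    next_hop: str                  # neighbor used to reach that ASBR
    cost: int                      # total type 1 cost from the example
    intra_area_non_backbone: bool  # True if the path to the ASBR is intra-area, non-backbone


def choose_path(paths, rfc1583_compat):
    """Pick a path; with compatibility off, prefer intra-area non-backbone paths."""
    candidates = list(paths)
    if not rfc1583_compat:
        preferred = [p for p in candidates if p.intra_area_non_backbone]
        if preferred:
            candidates = preferred
    # Cost is the only criterion left among the remaining candidates.
    return min(candidates, key=lambda p: p.cost)


# Router B's and Router C's views, as described in the post.
B_PATHS = [AsbrPath("A", "A", 69, True), AsbrPath("D", "C", 25, False)]
C_PATHS = [AsbrPath("A", "B", 79, True), AsbrPath("D", "D", 15, False)]

if __name__ == "__main__":
    b = choose_path(B_PATHS, rfc1583_compat=True)       # IOS default
    c_off = choose_path(C_PATHS, rfc1583_compat=False)  # NX-OS default
    c_on = choose_path(C_PATHS, rfc1583_compat=True)    # after the fix
    print("B forwards to", b.next_hop)
    print("C (compat off) forwards to", c_off.next_hop)
    print("C (compat on) forwards to", c_on.next_hop)
```

With compatibility off on "C" only, "B" points at "C" and "C" points back at "B", which is the loop.  Turn it on and "C" goes back to forwarding toward "D".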

On NX-OS you configure compatibility thusly:
N7K(config)# router ospf <tag>
N7K(config-router)# rfc1583compatibility

UNfortunately, the default setting for this switch is enabled on IOS devices, and disabled on NX-OS.  The RFC is clear that these switches need to match on all routers in a domain, and that the switch should (not SHOULD, not MUST) be enabled by default (section C.1).

Even worse, the compatibility mode is completely undocumented for NX-OS.
(update 11/7/2010:  the compatibility switch is now documented in the October 2010 command reference.  Not that you'd find it there -- the document is essentially a dictionary.  It is still missing from the October 2010 Unicast Routing Configuration Guide)

Gah.  This was really frustrating.  Choosing to defy both the RFC (which calls for compatibility by default), and interoperability with the rest of your product line, without a shred of documentation is, well...  An interesting choice.