Tuesday, September 28, 2010

Layer 2 Traceroute

Cisco switches have a nifty but little-used diagnostic feature:  the 'traceroute mac' command.

It does pretty much what you'd expect:  it traces the L2 path between two endpoints.  Exactly how it accomplishes this feat is much less obvious.  Normal (Layer 3) traceroute sends probes with progressively larger TTL values, and uses the ICMP "time to live exceeded" errors returned by routers along the path to print the path between two nodes.  Bridges neither decrement a TTL nor generate those errors, so these mechanisms don't exist in a bridged environment.  So how does it work?

Consider the following topology:

Running traceroute mac on the rightmost switch produces the following result:
Cat2960#traceroute mac 000b.5f73.0491 000b.5f73.0480 vlan 11
Source 000b.5f73.0491 found on Cat2960
1 Cat2960 ( : Gi0/20 => Po1
2 Cat3550 ( : Po1 => Fa0/20
3 Cat2950 ( : Fa0/1 => Fa0/12
Destination 000b.5f73.0480 found on Cat2950
Layer 2 trace completed

The procedure for doing this work manually is straightforward.  We look for each MAC address in the VLAN 11 forwarding table, then check to see whether the egress port has a CDP neighbor.  If so, log into the next switch (using the management address reported by CDP).  Lather, rinse, repeat.
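
The manual walk can be sketched in a few lines of Python.  This is only a model of the procedure, not something that talks to real switches: the two dictionaries are hand-built stand-ins for 'show mac address-table' and 'show cdp neighbors' output, loosely based on the trace shown above, and the names MAC_TABLE, CDP_NEIGHBOR and walk_to_mac are made up for illustration.

```python
# Hypothetical data standing in for per-switch CLI output (VLAN 11 only).
# MAC_TABLE[switch][mac] -> port on which that MAC was learned
MAC_TABLE = {
    "Cat2960": {"000b.5f73.0480": "Po1"},
    "Cat3550": {"000b.5f73.0480": "Fa0/20"},
    "Cat2950": {"000b.5f73.0480": "Fa0/12"},
}
# CDP_NEIGHBOR[switch][port] -> switch that CDP reports on that port
CDP_NEIGHBOR = {
    "Cat2960": {"Po1": "Cat3550"},
    "Cat3550": {"Po1": "Cat2960", "Fa0/20": "Cat2950"},
    "Cat2950": {"Fa0/1": "Cat3550"},
}


def walk_to_mac(start, mac):
    """Follow one MAC address from switch to switch until the edge port."""
    hops = []
    visited = set()
    switch = start
    while switch is not None:
        if switch in visited:
            raise RuntimeError(f"loop detected at {switch}")
        visited.add(switch)
        port = MAC_TABLE[switch].get(mac)
        if port is None:
            raise LookupError(f"{mac} not in forwarding table on {switch}")
        hops.append((switch, port))
        # No CDP neighbor on the port means the station hangs off this switch.
        switch = CDP_NEIGHBOR[switch].get(port)
    return hops


if __name__ == "__main__":
    for switch, port in walk_to_mac("Cat2960", "000b.5f73.0480"):
        print(switch, port)
```

Starting at the 2960, the walk visits the 3550 and then stops at the 2950, where the MAC sits on a port with no CDP neighbor.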

The manual procedure works on each MAC address independently.  The 'traceroute mac' command, however, requires you to specify both source and destination stations.  It's curious:  L3 traceroute doesn't do that; it assumes you want to trace the path from here to somewhere else.  'traceroute mac', on the other hand, can trace from somewhere else to somewhere else.  Here we run a trace from a switch that isn't in the transit path between the two end stations.
Cat2950#traceroute mac  0800.870e.b6b1 000b.5f73.0491 vlan 11
Source 0800.870e.b6b1 found on Cat2960
1 Cat2960 ( : Gi0/10 => Gi0/20
Destination 000b.5f73.0491 found on Cat2960
Layer 2 trace completed
In that case, we were logged into the 2950, but traced the L2 path between two stations that were each connected directly to the 2960, two L2 hops away.  The operation of this tool is somewhat mysterious.  Here's what I can tell about it so far:
  • Both MAC addresses must appear in the forwarding table on each switch in the path.
  • Each switch uses CDP to figure out what next hop lies beyond its local egress port (and the IP address on which it can be reached).
  • It's an L3 process.  The right switch in the figure above is in a different management subnet than his neighbors.  These switches all forward traffic directly at L2, but they talk amongst themselves via the router-on-a-stick.  This is different than the operation of CDP (a link layer protocol).
  • If more than one CDP neighbor appears on an interface in the path (several switches hanging from a hub), the process blows up because it's impossible to discern the next hop.  You can replicate this easily by changing the hostname of a switch.  The neighbors will see two entries until the old name times out.
  • The switch running the trace communicates directly with each device in the path.  The first example involved:
    • An exchange between the 2960 and the 3550
    • An exchange between the 2960 and the 2950
  • The process requires a service running on UDP 2228 on each switch (see it with 'show ip sockets')
  • If there's no CDP neighbor on a port (like when I switched CDP off on the 2950), then that's the end of the trace.
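
Those observations can be put together into a toy model of the trace.  Everything here is an assumption layered on the behavior described above: the tables are hand-built, the function names are invented, and the two-phase approach (first walk toward the source MAC, then walk toward the destination) is a guess at how the originating switch finds the first hop, not documented behavior.

```python
SRC = "000b.5f73.0491"
DST = "000b.5f73.0480"

# Hypothetical per-switch forwarding tables for VLAN 11.
MAC_TABLE = {
    "Cat2960": {SRC: "Gi0/20", DST: "Po1"},
    "Cat3550": {SRC: "Po1", DST: "Fa0/20"},
    "Cat2950": {SRC: "Fa0/1", DST: "Fa0/12"},
}
# Hypothetical CDP tables; a list per port so ambiguity can be modelled.
CDP = {
    "Cat2960": {"Po1": ["Cat3550"]},
    "Cat3550": {"Po1": ["Cat2960"], "Fa0/20": ["Cat2950"]},
    "Cat2950": {"Fa0/1": ["Cat3550"]},
}


def port_for(switch, mac):
    """Both MACs must be present in every switch's forwarding table."""
    try:
        return MAC_TABLE[switch][mac]
    except KeyError:
        raise LookupError(f"{mac} not in forwarding table on {switch}") from None


def next_hop(switch, port):
    """Return the single CDP neighbor on a port, or None at the edge."""
    neighbors = CDP[switch].get(port, [])
    if len(neighbors) > 1:
        raise RuntimeError(f"{switch} {port}: multiple CDP neighbors")
    return neighbors[0] if neighbors else None


def trace(start, src, dst):
    """Return (source switch, [(switch, ingress, egress), ...])."""
    # Phase 1 (assumed): follow the source MAC to the switch it hangs off.
    switch, visited = start, set()
    while True:
        if switch in visited:
            raise RuntimeError(f"loop detected at {switch}")
        visited.add(switch)
        hop = next_hop(switch, port_for(switch, src))
        if hop is None:
            break
        switch = hop
    source_switch = switch

    # Phase 2: query each switch in turn, moving toward the destination MAC.
    hops, visited = [], set()
    while switch is not None:
        if switch in visited:
            raise RuntimeError(f"loop detected at {switch}")
        visited.add(switch)
        ingress, egress = port_for(switch, src), port_for(switch, dst)
        hops.append((switch, ingress, egress))
        switch = next_hop(switch, egress)  # None ends the trace
    return source_switch, hops


if __name__ == "__main__":
    found_on, hops = trace("Cat2950", SRC, DST)
    print("Source", SRC, "found on", found_on)
    for number, (switch, ingress, egress) in enumerate(hops, start=1):
        print(number, switch, ":", ingress, "=>", egress)
```

Adding a second name to any port's neighbor list reproduces the "blows up" case from the list above, and deleting a port from CDP ends the trace early at that switch.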
The wire format is undocumented as far as I can tell.  The Wireshark wiki page for CDP mentions the protocol, but doesn't have any information on it.  It's not CDP, but it's similar.  Each packet seems to have some fixed fields, and some TLV sets.

Here's the payload breakdown of a query packet:
00-01: 02 01  (unknown)
02-03: Length (same as UDP payload length)
04-05: 05 02 (unknown - everything else looks like a TLV set)
07-12: Source MAC address

It includes the following TLV sets:
01 Destination MAC (always 8 bytes)
03 VLAN ID (always 4 bytes)
0E Originator CDP management IP (always 6 bytes)
10 CDP name info source (I learned about you from - variable length)
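
For anyone who wants to experiment, here is a rough sketch of assembling a query payload from the layout above.  Several things in it are guesses rather than facts: that each TLV is a 1-byte type followed by a 1-byte length that counts the two header bytes (which would explain the 8, 4 and 6 byte sizes), that the 02 at offset 05 is really the type byte of a source-MAC TLV, that multi-byte integers are big-endian, and that the name string is plain ASCII with no terminator.  The helper names are invented.

```python
import ipaddress
import struct


def mac_bytes(mac):
    """Convert Cisco dotted notation (000b.5f73.0491) to 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(".", ""))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac}")
    return raw


def tlv(tlv_type, value):
    """Assumed layout: type (1 byte), total length (1 byte), value."""
    return bytes([tlv_type, len(value) + 2]) + value


def build_query(src_mac, dst_mac, vlan, mgmt_ip, cdp_name):
    """Assemble a query payload following the observed byte layout."""
    body = (
        b"\x05"                                   # offset 04: unknown, as observed
        + tlv(0x02, mac_bytes(src_mac))           # offsets 05-12 (guess: a TLV)
        + tlv(0x01, mac_bytes(dst_mac))           # destination MAC
        + tlv(0x03, struct.pack("!H", vlan))      # VLAN ID
        + tlv(0x0E, ipaddress.IPv4Address(mgmt_ip).packed)  # originator IP
        + tlv(0x10, cdp_name.encode("ascii"))     # CDP name info source
    )
    total = 4 + len(body)                         # same as UDP payload length
    return b"\x02\x01" + struct.pack("!H", total) + body


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address, not taken from the capture.
    packet = build_query("000b.5f73.0491", "000b.5f73.0480", 11,
                         "192.0.2.10", "Cat3550")
    print(packet.hex(" "))
```

Whether a real switch would accept a packet built this way is untested; the point is only to make the byte layout concrete.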

A reply packet looks like this:
00-01: 04 01 :  (unknown)
02-03: Length (same as UDP payload length)
04   : 06 : (unknown)

It includes the following TLV sets:
04 Originator CDP name (variable length)
05 Originator CDP Platform string (variable length)
06 Originator CDP management IP (always 6 bytes)
0F Unknown, 1 byte, seems to be related to "end of path" information
03 Next hop CDP IP (always 6 bytes)
10 Next hop CDP name string (variable length)
07 Egress interface name (variable length)
08 Ingress interface name (variable length)
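
Going the other way, a reply could be picked apart along these lines.  As with everything else about this protocol, the details are assumptions: a 1-byte type and a 1-byte length that includes the two header bytes, a big-endian length field in the header, ASCII strings, and a 0F TLV carrying a single value byte.  The field names in the dictionary are my own labels.

```python
import ipaddress
import struct

# Labels for the TLV types seen in replies (names are informal).
REPLY_TLV_NAMES = {
    0x04: "originator_name",
    0x05: "originator_platform",
    0x06: "originator_ip",
    0x0F: "end_of_path_flag",
    0x03: "next_hop_ip",
    0x10: "next_hop_name",
    0x07: "egress_interface",
    0x08: "ingress_interface",
}
IP_TLVS = {0x06, 0x03}


def parse_reply(payload):
    """Decode a reply payload into a dict of field name -> value."""
    if len(payload) < 5:
        raise ValueError("payload too short")
    (length,) = struct.unpack("!H", payload[2:4])
    if length != len(payload):
        raise ValueError("length field does not match payload size")

    fields = {}
    offset = 5  # skip 04 01, the length, and the unknown byte at offset 04
    while offset < len(payload):
        if offset + 2 > len(payload):
            raise ValueError("truncated TLV header")
        tlv_type, tlv_len = payload[offset], payload[offset + 1]
        if tlv_len < 2 or offset + tlv_len > len(payload):
            raise ValueError(f"bad TLV length at offset {offset}")
        value = payload[offset + 2:offset + tlv_len]
        name = REPLY_TLV_NAMES.get(tlv_type, f"unknown_0x{tlv_type:02x}")
        if tlv_type in IP_TLVS:
            fields[name] = str(ipaddress.IPv4Address(value))
        elif tlv_type == 0x0F:
            fields[name] = value                  # meaning unknown, keep raw
        else:
            fields[name] = value.decode("ascii", errors="replace")
        offset += tlv_len
    return fields
```

Unknown TLV types are kept under an "unknown_0x.." key rather than rejected, since the list above is almost certainly incomplete.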

It's very surprising to me that mapping out an L2 environment can be done using L3 (off subnet) tools with seemingly no security.  I'm not much of a believer in security by obscurity (I generally run CDP on edge ports), but this level of network mapping without even requiring an SNMP read-only string seems like it could be a problem.  The only hint of a complicating factor here is that the name of the target switch is embedded in the request packet.  If that name is checked before a reply is sent, there's some small measure of security.  But all an attacker needs is the name of a single switch, since that first switch will give up the names of all of his neighbors.

It will be difficult to strike the balance between security and usability when writing ACLs for this service, since you need to protect every IP interface on an L3 switch, while still providing service to clients on every IP interface of every L2/L3 switch.


  1. Thanks so much for the info about l2traceroute. I always worry about names of devices - but admit I need my coworkers' ease and confidence in a noc and will apply the lesson I have secured.

  2. Have you had any further luck figuring out how l2 traceroute actually works? So far this is the most detailed account I have found. I'm attempting to write a script to create the same functionality the cisco devices use.

    1. Sorry, no. Everything I know about it is spelled out here. It still seems crazy to me that the switches will give up this information so readily. If you write that script, please follow up, I'd like to try it!