Friday, July 27, 2012

Native VLAN - Some Surprising Results


I did some fiddling around with router-on-a-stick configurations recently and found some native VLAN behavior that took me by surprise.

The topology for these experiments is quite simple: just one router, one switch, and a single 802.1Q link interconnecting them.
Dead Simple Routers-On-A-Stick Configuration


The initial configuration of the switch looks like:

vlan 10,20,30
!
interface FastEthernet0/34
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 10,20,30
 switchport mode trunk
 spanning-tree portfast trunk
!
interface Vlan10
 ip address 192.0.2.1 255.255.255.0
!
interface Vlan20
 ip address 198.51.100.1 255.255.255.0
!
interface Vlan30
 ip address 203.0.113.1 255.255.255.0


And the initial configuration of the router looks like:

interface FastEthernet0/0
 no ip address
 duplex auto
 speed auto
!
interface FastEthernet0/0.10
 encapsulation dot1Q 10
 ip address 192.0.2.2 255.255.255.0
!
interface FastEthernet0/0.20
 encapsulation dot1Q 20
 ip address 198.51.100.2 255.255.255.0
!
interface FastEthernet0/0.30
 encapsulation dot1Q 30
 ip address 203.0.113.2 255.255.255.0



So, nothing too interesting going on here. The devices can ping each other on each of their three IP interfaces.

We can switch VLAN tagging off for one of those VLANs by making it the native VLAN for the link. On the switch we'll do:

S1(config-if)#switchport trunk native vlan 20

And on the router we have two choices:

1) Eliminate the FastEthernet0/0.20 subinterface altogether, and apply the IP to the physical interface:

no interface FastEthernet 0/0.20
interface FastEthernet 0/0
  ip address 198.51.100.2 255.255.255.0

2) Use the native keyword on the subinterface encapsulation configuration:

interface FastEthernet0/0.20
  encapsulation dot1Q 20 native


Both configurations leave us with full connectivity on all three subnets, with the 198.51.100.0/24 network's packets running untagged on the link. So, what's the difference between them?

The Cisco 360 training materials have this to say about the encapsulation of 802.1Q router subinterfaces:
"you can assign any VLAN to the native VLAN on the router and it will make no difference"
But the truth is a little more nuanced than that.

First of all, option 1 makes it impossible to administratively disable just the 198.51.100.2 interface. Because that address now lives on the physical interface, shutting it down knocks all of the subinterfaces offline as well (thanks, mellowd!).

To see the next difference between these router configuration options, let's introduce a misconfiguration by omitting 'switchport trunk native vlan 20' from the switch interface.

If we've configured option 1 on the router, pretty much nothing works. Frames sent by the switch with a VLAN 20 tag are ignored when they arrive at the router, and untagged frames sent by the router are similarly ignored by the switch.

On the other hand, if we've configured option 2 on the router, things kind of start to work with the mismatched trunk configuration. A ping from the switch doesn't succeed, but elicits the following response from 'debug ip packet' on the router:

IP: tableid=0, s=198.51.100.1 (FastEthernet0/0.20), d=198.51.100.2 (FastEthernet0/0.20), routed via RIB
IP: s=198.51.100.1 (FastEthernet0/0.20), d=198.51.100.2 (FastEthernet0/0.20), len 100, rcvd 3
IP: tableid=0, s=198.51.100.2 (local), d=198.51.100.1 (FastEthernet0/0.20), routed via FIB
IP: s=198.51.100.2 (local), d=198.51.100.1 (FastEthernet0/0.20), len 100, sending


The mistakenly tagged packet was accepted by subinterface Fa0/0.20! So, router configuration option #2 does something pretty interesting: It causes the subinterface to send untagged frames, just like the 'native' function suggests, but it allows the subinterface to receive either untagged frames, or frames tagged with the number specified in the subinterface encapsulation directive. Clearly the number we use makes a difference after all, and I found the behavior pretty interesting.
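To make that receive/transmit asymmetry concrete, here's a toy Python model of what I observed. It is only a sketch of the behavior described above, not of how IOS is implemented, and the function names (subif_accepts, subif_egress_tag) are made up for illustration. An untagged frame is represented as None.

```python
from typing import Optional


def subif_accepts(frame_tag: Optional[int], subif_vlan: int, native: bool) -> bool:
    """Would a router subinterface accept this frame? (observed behavior only)

    frame_tag is the 802.1Q VLAN ID carried by the frame, or None if untagged.
    """
    if frame_tag is None:
        # Untagged frames only land on a subinterface marked 'native'.
        return native
    # Tagged frames are accepted when the tag matches the configured number,
    # whether or not the 'native' keyword is present.
    return frame_tag == subif_vlan


def subif_egress_tag(subif_vlan: int, native: bool) -> Optional[int]:
    """Tag applied to frames sent by the subinterface (None means untagged)."""
    return None if native else subif_vlan


# 'encapsulation dot1Q 20 native' takes both flavors of VLAN 20 traffic...
print(subif_accepts(20, 20, native=True))    # tagged with 20
print(subif_accepts(None, 20, native=True))  # untagged
# ...but always transmits untagged.
print(subif_egress_tag(20, native=True))
```

The number in the encapsulation directive only matters on receive, which is exactly why it isn't true that any VLAN number will do.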

Catalyst switches have a similarly interesting feature: vlan dot1q tag native

When this command is issued in global configuration mode, it causes the switch to tag all frames egressing a trunk interface, even those belonging to the native VLAN. The curious thing about this command is that it causes the interface to accept incoming frames into the native VLAN whether they're tagged, or untagged. It's kind of the reverse of the curious router behavior, and leads to an interesting interoperability mode in which:
  • The router generates untagged frames, but will accept either tagged or untagged frames using:
      interface FastEthernet0/0.20
        encapsulation dot1Q 20 native
  • The switch generates tagged frames, but will accept either tagged or untagged frames using:
      vlan dot1q tag native
      int FastEthernet 0/34
        switchport trunk native vlan 20
And the result is that we have full interoperability between a couple of devices that can't agree on whether an interface should be tagged or not, because they both have liberal policies regarding the type of traffic they'll accept.
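The combinations tried in this post can be walked through with a small Python simulation. This is a rough sketch under my own assumptions: the class and function names are hypothetical, the switch model accepts untagged frames into the native VLAN even with 'vlan dot1q tag native' because that's what I saw on my gear, and the switch's native VLAN defaults to 1 when nothing is configured.

```python
from dataclasses import dataclass
from typing import Optional

Tag = Optional[int]  # None represents an untagged frame


@dataclass
class RouterSubif:
    vlan: int
    native: bool = False

    def egress(self) -> Tag:
        # A 'native' subinterface transmits untagged.
        return None if self.native else self.vlan

    def accepts(self, tag: Tag) -> bool:
        if tag is None:
            return self.native
        return tag == self.vlan


@dataclass
class SwitchTrunk:
    native_vlan: int = 1          # default when no native VLAN is configured
    tag_native: bool = False      # models 'vlan dot1q tag native'
    allowed: frozenset = frozenset({10, 20, 30})

    def egress(self, vlan: int) -> Tag:
        if vlan == self.native_vlan and not self.tag_native:
            return None
        return vlan

    def classify(self, tag: Tag) -> Optional[int]:
        # Untagged frames fall into the native VLAN; disallowed VLANs are dropped.
        vlan = self.native_vlan if tag is None else tag
        return vlan if vlan in self.allowed else None


def ping_succeeds(switch: SwitchTrunk, router: RouterSubif, vlan: int) -> bool:
    """Switch SVI pings the router: request must be accepted, reply must return to the same VLAN."""
    if not router.accepts(switch.egress(vlan)):
        return False
    return switch.classify(router.egress()) == vlan


router = RouterSubif(vlan=20, native=True)
print(ping_succeeds(SwitchTrunk(native_vlan=20), router, 20))                   # matched native VLAN
print(ping_succeeds(SwitchTrunk(), router, 20))                                 # native VLAN omitted on switch
print(ping_succeeds(SwitchTrunk(native_vlan=20, tag_native=True), router, 20))  # tag-native interop
```

In this model the middle case fails for the same reason as in the lab: the router happily accepts the tagged request, but its untagged reply doesn't land in VLAN 20 on the switch.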

Here's the result (three pings) according to my sniffer:

02:04:05.284217 00:0b:fd:67:bf:00 > 00:07:50:80:80:81, ethertype 802.1Q (0x8100), length 118: vlan 20, p 0, ethertype IPv4, 198.51.100.1 > 198.51.100.2: ICMP echo request, id 36, seq 0, length 80
02:04:05.287826 00:07:50:80:80:81 > 00:0b:fd:67:bf:00, ethertype IPv4 (0x0800), length 114: 198.51.100.2 > 198.51.100.1: ICMP echo reply, id 36, seq 0, length 80
02:04:05.288391 00:0b:fd:67:bf:00 > 00:07:50:80:80:81, ethertype 802.1Q (0x8100), length 118: vlan 20, p 0, ethertype IPv4, 198.51.100.1 > 198.51.100.2: ICMP echo request, id 36, seq 1, length 80
02:04:05.292322 00:07:50:80:80:81 > 00:0b:fd:67:bf:00, ethertype IPv4 (0x0800), length 114: 198.51.100.2 > 198.51.100.1: ICMP echo reply, id 36, seq 1, length 80
02:04:05.292904 00:0b:fd:67:bf:00 > 00:07:50:80:80:81, ethertype 802.1Q (0x8100), length 118: vlan 20, p 0, ethertype IPv4, 198.51.100.1 > 198.51.100.2: ICMP echo request, id 36, seq 2, length 80
02:04:05.296703 00:07:50:80:80:81 > 00:0b:fd:67:bf:00, ethertype IPv4 (0x0800), length 114: 198.51.100.2 > 198.51.100.1: ICMP echo reply, id 36, seq 2, length 80

The pings succeeded, so we know that we have bidirectional connectivity. Nifty.
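The 4-byte difference between the request (118 bytes) and the reply (114 bytes) in that capture is the 802.1Q tag itself. Here's a minimal Python sketch that builds just the Ethernet headers for the two frame types and checks for the tag. The helper names are mine, and it only covers the header, not the IP payload.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100      # ethertype value announcing an 802.1Q tag
ETHERTYPE_IPV4 = 0x0800


def mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def build_header(dst: str, src: str, ethertype: int,
                 vlan: Optional[int] = None, pcp: int = 0) -> bytes:
    """Ethernet header, with an 802.1Q tag inserted when vlan is given."""
    header = mac(dst) + mac(src)
    if vlan is not None:
        # TCI: 3 bits priority, 1 bit DEI (left at 0), 12 bits VLAN ID
        tci = (pcp << 13) | (vlan & 0x0FFF)
        header += struct.pack("!HH", TPID_8021Q, tci)
    return header + struct.pack("!H", ethertype)


def parse_vlan(header: bytes) -> Optional[int]:
    """Return the VLAN ID if the frame is 802.1Q tagged, else None."""
    (first,) = struct.unpack_from("!H", header, 12)
    if first != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", header, 14)
    return tci & 0x0FFF


# Echo request from the switch (tagged) and echo reply from the router (untagged)
request = build_header("00:07:50:80:80:81", "00:0b:fd:67:bf:00", ETHERTYPE_IPV4, vlan=20)
reply = build_header("00:0b:fd:67:bf:00", "00:07:50:80:80:81", ETHERTYPE_IPV4)

print(len(request) - len(reply))            # size of the tag in bytes
print(parse_vlan(request), parse_vlan(reply))
```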


9 comments:

  1. Hi Chris,

    One more interesting thing to try out - have a look how your router and switch respectively send their L2 control protocol traffic (STP, LLDP, CDP and so on). Most of these protocols must travel untagged, irrespective of interface configuration.

    1. Hey Dmitri,

      That is an interesting topic, and it's one I've explored in the past. CDP always belongs to VLAN 1, and will carry a tag if VLAN 1 is non-native:
      http://www.fragmentationneeded.net/2011/01/revisiting-vlan-1-myth-again.html

      DTP always goes untagged.

      There's a lot of confusion in this area, because proper capturing of tagged traffic is tricky:
      - PCs often screw up the VLAN tag in hardware, before the pcap libraries get ahold of it.
      - SPAN function screws up untagged traffic when mirroring it to an 802.1Q monitor port.

      I've always done this work with a TAP and a Linux system to be sure I know what I'm seeing.

      Thanks for the comment!

  2. This is similar to something I ran across in a security audit - turns out that access ports will still accept tagged frames as long as they're tagged with the VLAN that the access port is a member of. Seems like this same logic has been applied to what you've discovered here.

    Great post!

    1. Huh, cool. I'll have to try that.

      The situation you describe is closely related to the VLAN hopping attack:
      1) get on an access port on a VLAN which is native to some trunk
      2) send a frame tagged for some other VLAN
      3) when the frame is switched out the trunk link, no tag is applied by the switch because you came from the native VLAN
      4) frame is delivered to remote switch with the attacker's tag, and has hopped VLANs.

      Did the situation you found in the audit result in a security "finding" of some sort? Is there a risk here? It seems benign at first glance.

    2. Do you have a view on why the behaviour you're describing would be necessary? From a purist's point of view, an access port must discard any tagged frames sent to it. The only ports that should be able to exchange a mix of tagged and untagged frames should be trunks and tunnel endpoints (e.g., QinQ).

    3. You're referring to the VLAN traversal attack?

      The behavior I described in the comment above certainly isn't necessary, and I understand that it isn't possible on most platforms.

      I think it's more of a historical relic at this point, but it seems like good form to make sure that the VLANs used for access LANs aren't configured to be the native VLANs on trunks.

    4. Understand.

      Agree regarding the native VLANs; was just checking if I wasn't missing some valid use case when the behaviour above was desired.

  3. Interesting post Chris. I know from the dim and distant past that accepting untagged frames as belonging to a native VLAN was useful if you had a hub in between two switches (or switch and router as in your post), which would still allow a host connected to the hub to communicate on a particular VLAN despite the hub not having any idea about them.

    What was more useful than that was to throw the hub in the bin and replace it with a switch. ;-)

    1. Hey Matt,

      Yeah, accepting untagged frames on a tagging interface is useful for all kinds of reasons (nearly every wall jack in an environment with IP telephony does this), but I'm less clear on the usefulness of accepting both tagged and untagged frames into the same VLAN.

      Incidentally, I'm posting this comment through a $5 ethernet switch on my desk that's transparently passing VLAN tags :)
