Monday, January 31, 2011

Revisiting the VLAN 1 Myth - Again!

Over on the ipexpert blog, Marko recently busted the myth that "VLAN 1 is always allowed on switch trunks."

I'd like to present a rebuttal :-)

The origins of the myth seem to me to be threefold:
  • On old platforms, you really couldn't remove VLAN 1 from tagging interfaces
  • Cisco documentation indicates that control traffic always uses VLAN 1
  • There's an unrelated VLAN traversal attack involving VLAN 1, which is just one more reason to avoid using it.

Marko clearly demonstrated that transit traffic on VLAN 1 can be pruned from a trunk, and concluded:
control traffic (VTP, CDP, STP) is sent untagged [...] All untagged traffic on trunks belongs to native VLAN… except for those control frames which don’t.
Control frames don’t really belong to any VLAN per-se – they belong to the switch, regardless of the VLAN they are received on.
I can only mostly agree with Marko here.  Consider the following topology:


What should the CDP frames look like?  Cisco Press (swiped from a comment on Marko's post) says:
It is important to understand that even if the native VLAN of an 802.1Q trunk is not VLAN 1, all of the above protocols (with the exception of DTP as indicated in the previous note) are still sent on VLAN 1, with a tag attached indicating VLAN 1 is not the native VLAN (if the native VLAN is VLAN 1, then messages are sent without a tag).
So, even though VLAN 1 is disallowed from the trunk, frames tagged with VLAN 1 should still appear there.  Wireshark agrees.  Cisco Press is vindicated:


While you can definitely remove VLAN 1 transit traffic from a trunk, control frames really do belong to VLAN 1, and you can't remove these frames from the trunk.  VLAN 1 is magic after all.
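If you'd rather check a capture programmatically than eyeball it in Wireshark, here's a minimal Python sketch of the test.  It assumes you already have the raw Ethernet frame as bytes (without preamble or FCS).  The source MAC and the helper names are made up for illustration; the destination MAC is the Cisco multicast address that CDP and VTP are sent to.

```python
import struct

CDP_DST_MAC = bytes.fromhex("01000ccccccc")  # Cisco multicast used by CDP and VTP
TPID_8021Q = 0x8100                          # EtherType value that marks an 802.1Q tag


def vlan_tag(frame: bytes):
    """Return the 802.1Q VLAN ID carried by a frame, or None if it is untagged."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q or len(frame) < 16:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF  # low 12 bits of the tag control field are the VLAN ID


def is_cisco_control(frame: bytes) -> bool:
    """True if the frame is addressed to the CDP/VTP multicast MAC."""
    return frame[:6] == CDP_DST_MAC


# Hypothetical frames: same addresses, one tagged with VLAN 1, one untagged.
SRC_MAC = bytes.fromhex("001122334455")  # made-up source address
PAYLOAD = bytes(46)                      # placeholder payload
tagged = CDP_DST_MAC + SRC_MAC + struct.pack("!HH", TPID_8021Q, 0x0001) + PAYLOAD
untagged = CDP_DST_MAC + SRC_MAC + struct.pack("!H", 0x0040) + PAYLOAD

for name, frame in (("tagged", tagged), ("untagged", untagged)):
    print(name, is_cisco_control(frame), vlan_tag(frame))
```

On a trunk whose native VLAN isn't 1, the Cisco Press description predicts frames like the first one: addressed to the control multicast and tagged with VLAN 1.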

Myth somewhat un-busted.

Saturday, January 29, 2011

Inconsistent DNS resolution in IOS

In a comment to one of Greg Ferro's posts, Jon Still pointed out that IOS only resolves hostnames used in its configuration file once:  At the instant the configuration is applied.

It turns out that it's not so straightforward.  The behavior depends on the IOS version.

Jon and I stumbled across the same behavior, and for the same reason:  We both tried to use pool.ntp.org in an IOS configuration.  Jon and I both went on to illustrate this behavior to someone else, and this is where our experiences diverged:  Jon got the behavior he was expecting in his later demonstration (on Greg's blog), and I had the IOS contradict me (on a different router).

Jon's experience looks like this:

router(config)#ntp server pool.ntp.org
Translating "pool.ntp.org"...domain server (8.8.8.8) [OK]


router(config)#do sho run | inclu ntp server
ntp server 72.18.205.157
router(config)#

Here we can see that when the 'ntp server' configuration was added to the router, it immediately did a lookup against Google's DNS server at 8.8.8.8, and put the resulting IP address into the running configuration.

This is kind of a bummer, because you can't rely on any particular pool.ntp.org server to be available long term, so you might like the router to have the option to go back to DNS when required.  When exactly it goes back is a whole other story - NTP is complicated.

Here's what happened when I was trying to illustrate the issue:
router(config)#ntp server pool.ntp.org
Translating "pool.ntp.org"...domain server (8.8.8.8)

router(config)#do sho run | inclu ntp server
ntp server pool.ntp.org
router(config)#

Huh!  So, this time, the NTP hostname remained in the configuration.  The difference?  The first router is running 12.4(25d), while the second is running 12.4(15r).

Now, fortunately for the sane operation of NTP, the NTP process caches the IP address of the server with which it's associated, even though the running configuration stores only the hostname.



But things get even weirder when we decide later to remove the configuration:
router(config)#no ntp server pool.ntp.org
Translating "pool.ntp.org"...domain server (8.8.8.8) [OK]

%NTP: unrecognized peer
router(config)#do sho run | inclu ntp server
ntp server pool.ntp.org
router(config)#

Huh.  Why can't I remove the configuration?

router(config)#do ping pool.ntp.org

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 169.229.70.64, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
router(config)#do sho ntp ass

  address         ref clock       st   when   poll reach  delay  offset   disp
*~216.93.242.12   137.146.28.85    2     29    128   177 18.484  -5.244 65.191
 * sys.peer, # selected, + candidate, - outlyer, x falseticker, ~ configured
router(config)#


Apparently the 'no ntp server pool.ntp.org' command initiated a new lookup, and the IP address associated with that name record has changed.  The record currently resolves to 169.229.70.64, but the NTP process is associated with a previously resolved IP of 216.93.242.12.

router(config)#no ntp server 216.93.242.12
router(config)#do sho run | inclu ntp server
router(config)#


Removing the NTP server by IP address removed both the association from the NTP process and the configuration line used to add the association (by name), even though the two are no longer linked anywhere except in the memory of the IOS NTP process.
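To make the sequence easier to follow, here's a toy Python model of the behavior observed above.  It is only a sketch of what the transcripts suggest, not a description of how IOS is actually implemented: the ToyRouter class, the keeps_hostname switch and the fake resolver are all invented for illustration.  The two addresses are the ones that pool.ntp.org resolved to in the session above.

```python
def make_resolver(answers):
    """Return a fake resolver that hands out `answers` in order for any hostname."""
    remaining = list(answers)

    def resolve(target):
        if target.replace(".", "").isdigit():  # already a dotted-quad address
            return target
        return remaining.pop(0)

    return resolve


class ToyRouter:
    """Toy model: config lines on one side, NTP associations keyed by IP on the other."""

    def __init__(self, resolver, keeps_hostname):
        self.resolver = resolver
        self.keeps_hostname = keeps_hostname  # True models the second router above
        self.config = []                      # what 'show run' would display
        self.associations = {}                # resolved IP -> config line that created it

    def ntp_server(self, name):
        ip = self.resolver(name)              # lookup happens when the line is entered
        line = "ntp server " + (name if self.keeps_hostname else ip)
        self.config.append(line)
        self.associations[ip] = line

    def no_ntp_server(self, target):
        ip = self.resolver(target)            # removal triggers a fresh lookup
        line = self.associations.pop(ip, None)
        if line is None:
            return "%NTP: unrecognized peer"
        self.config.remove(line)
        return ""


router = ToyRouter(make_resolver(["216.93.242.12", "169.229.70.64"]), keeps_hostname=True)
router.ntp_server("pool.ntp.org")
by_name = router.no_ntp_server("pool.ntp.org")    # name now resolves to a different IP
config_after_by_name = list(router.config)
by_ip = router.no_ntp_server("216.93.242.12")     # matches the cached association
print(repr(by_name), config_after_by_name, repr(by_ip), router.config)
```

In this model, removal by name fails because the fresh lookup no longer matches the cached association, while removal by IP finds the association and takes the hostname-based config line with it.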

Friday, January 28, 2011

Router console via bluetooth

UPDATE 2/13/2013: I really like the look of the SEL 2924 bluetooth to serial adapter. It has integrated rechargeable AAA batteries and a micro-USB charging port. Power is the biggest problem with most devices of this type. If you spring for one of these, please post a comment about your experience with it.


My employer recently "upgraded" my laptop to a model that doesn't include a serial port.  I've got lots of USB/serial converters based on the Prolific 2303 chipset, and have used this chip for ages on various platforms, but the Windows 7 drivers for this chip have not worked well for me.

Rather than roll the dice with other chipsets and drivers, I decided to get what I really wanted:  a bluetooth serial port from BlueConsole.

Bluetooth is a huge advantage for me because it allows me to go find a chair while working in sometimes hostile environments (data centers, construction sites, manufacturing plants, mountaintop antenna towers), rather than do the balance-laptop-on-left-hand-while-typing-with-right-index-finger ballet.

Unfortunately BlueConsole is out of business, so I had to find something else.

The UConnect BT232B is the best solution I've found, and I'm happy enough with it to recommend it.  I got mine from US Converters.
It has some shortcomings:
  • It has thumbscrews instead of fixed female threads
  • The DE-9 connector is female, rather than male
  • It requires external power (either mini USB or through a small battery connector)

The thumbscrew problem was the easiest to resolve.  Remove thumbscrews, replace them with 4-40 hardware scavenged from an IT junkpile.  Three screws hold the plastic clamshell together, but two are covered by the sticker.  Sticker carnage:

Having resolved the thumbscrew issue, I installed a 9-pin gender changer and a Cisco console cable:

For power, I use an Energizer EnergiStick 250 charging dongle intended for cell phones.  It's a small 250mAh rechargeable battery with two mini-USB ports:  A male port for powering the adapter, and a female port for recharging.  This 250mAh battery should last almost 3 hours at the BT232B's rated maximum draw (90mA).  I've used it for at least twice that long without recharging, so the actual current consumption must be quite a bit lower.  Maybe I'm not typing fast enough :-)
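The battery arithmetic, as a quick Python sketch.  It assumes the full 250mAh is usable and ignores any conversion losses, so treat the results as rough bounds rather than measurements.

```python
CAPACITY_MAH = 250.0  # EnergiStick rated capacity
MAX_DRAW_MA = 90.0    # BT232B rated maximum draw


def runtime_hours(capacity_mah, draw_ma):
    """Ideal runtime: capacity divided by a constant current draw."""
    return capacity_mah / draw_ma


def average_draw_ma(capacity_mah, hours):
    """Average draw implied by running for `hours` on one full charge."""
    return capacity_mah / hours


rated = runtime_hours(CAPACITY_MAH, MAX_DRAW_MA)
implied = average_draw_ma(CAPACITY_MAH, 2 * rated)

print(f"Runtime at rated maximum draw: {rated:.2f} hours")
print(f"Average draw if it lasts twice that long: {implied:.0f} mA")
```

Lasting at least twice the rated runtime means the average draw is at most half the rated maximum.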

It's like it was made for the BT232B:

Pass-thru charging is supported:  The Energizer battery can take a charge from a USB source while simultaneously powering the bluetooth radio.

The BT232B pairs easily with both Windows 7 and OSX.  Both platforms install it as a locally attached serial port: COMx in Windows and /dev/tty.mumble in OSX.  Its baud rate defaults to something other than 9600 baud.  I configured it once, and have not had to reconfigure it, so either it remembers its settings, or it's taking cues from my terminal emulator software.

The Windows 7 bluetooth drivers are kind of terrible.  If I'm linked up with the device and put my laptop to sleep, I need to delete the port and re-add it in the control panel in order to get it working again.  Under OSX it just works.

The radio performance has really surprised me.  I'd expected it to give me just a little bit more room to roam than the normal 6' Cisco cable, but it's way better than that.  I've used it over 50' in data centers, and 60' (through walls) in my house.  When it gets to the end of its radio range, it tends to get weird and slow, but doesn't drop characters.

Thursday, January 27, 2011

East - West adjacency: Why do I do this?

Every now and then I deploy a topology that includes the following requirements:
  • L3 access layer switches
  • VLANs extended across two switches

The VLAN extension requirement usually spawns from server NIC redundancy (teaming), or from the desire to have pooled systems on the same subnet, but with diverse upstream network hardware.

The deployment winds up looking something like this:


There's a full mesh of OSPF point-to-point links in the topology including an East-West access layer adjacency on VLAN 10, which rides the same L2 aggregate link as the access VLANs.  Yesterday, I said "don't use SVIs for routing adjacency."  I confess:  I've done it myself.  Lots.

But now I'm starting to wonder why I've done it.

If all VLANs are shared between the two access switches, then what benefit do I get from the adjacency on VLAN 10?  I'm starting to think the answer might be "nothing."  Maybe I should eliminate it.

We've already got Equal Cost Multi-Path to everything else in the network, and any traffic between access VLANs will always be locally routed (and then bridged across Po10 when necessary).

Is there a case to be made for keeping this link, or should I kill it off?

Wednesday, January 26, 2011

VTP pruning - not all it's cracked up to be

Stretch posted yesterday about the routing convergence time of SVIs vs. routed interfaces.

It reminded me of a couple of problems I've seen at customer sites:

Sloppy Trunk Configurations
I've run into at least one instance of bad trunk configuration on almost every customer network.  It can happen so easily, and can have such serious implications on routing convergence, that it seems like reason enough to never run point-to-point routed links on an SVI with an access port.  Sure, if you're careful, this can be done correctly.  But will the guy in the next cubicle be so careful?  What about the network admin who replaces you?

In the best case failure scenario, this sort of mistake will lead to convergence times depending on IGP timers, rather than interface state.  In the worst case scenario, you might find a subnet split into separate broadcast domains between two L3 switches, with both of them advertising the subnet into your IGP.  Not good.  Know the whole L2 topology for every VLAN.

VTP Pruning
I found the other problem more interesting, probably because I wasn't completely sure what to expect. It's something I've only seen done once:  I had a customer who deliberately ran every trunk without a 'switchport trunk allowed vlan' statement.  The customer allowed all 4094 VLANs onto every trunk, and relied on VTP pruning to remove the "extra" VLANs from trunks where they weren't needed.

The VTP pruning mostly* worked.  If only one access switch needed vlan 100, 'show int trunk' revealed that each distribution switch was only forwarding vlan 100 on two interfaces:

  • The downlink to the closet switch
  • The crosslink to the other distribution switch

Great, right?  Broadcast frames don't get forwarded to places they're not needed.  That's the whole point of VTP pruning.

VTP pruning does not, however, make any promises about quick convergence times.  So what about VLAN 99, which is a /30 routed link to the core, and consists of a single access port?

I labbed it up to find out.

The SVI is up:
Rack2-3550#sho ip int brief | inclu Vlan99
Vlan99                 192.168.99.1    YES manual up                    up      

There's just a single access port assigned to VLAN 99:
Rack2-3550#sho int status | inclu 99
Fa0/12    routed link to core     connected    99         a-half   a-10 10/100BaseTX

VLAN 99 is not forwarding on any trunk links, even though it's allowed there:
Rack2-3550#sho int trunk  

Port        Mode             Encapsulation  Status        Native vlan
Gi0/1       on               802.1q         trunking      1
Gi0/2       on               802.1q         trunking      1

Port        Vlans allowed on trunk
Gi0/1       10-14,99  <- Allowed on Gi0/1
Gi0/2       11-14,99  <- Allowed on Gi0/2

Port        Vlans allowed and active in management domain
Gi0/1       10-14,99  <- Allowed and active on Gi0/1
Gi0/2       11-14,99  <- Allowed and active on Gi0/2

Port        Vlans in spanning tree forwarding state and not pruned
Gi0/1       10-14   <- Pruned from Gi0/1
Gi0/2       none    <- Pruned from (and blocking on) Gi0/2


VLAN 99 effectively consists of a single access port.  What will happen to the SVI if I shut down that port?
Rack2-3550(config)#int fa 0/12
Rack2-3550(config-if)#shut
Rack2-3550(config-if)#do sho ip int br | inclu Vlan99
Vlan99                 192.168.99.1    YES manual up                    up      

Doh!  The SVI survives even though there are now no access ports and no trunks willing to carry its traffic.  Any routing adjacency established here will survive until the IGP times it out.

This will not be good for your VoIP calls.  Don't do this.
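If you want to hunt for VLANs in this state, here's a rough Python sketch that compares the "allowed and active" section of 'show int trunk' against the "forwarding and not pruned" section.  The section header strings are copied from the 3550 output above and may differ on other platforms or IOS versions; the function names are made up for illustration.

```python
SAMPLE = """
Port        Mode             Encapsulation  Status        Native vlan
Gi0/1       on               802.1q         trunking      1
Gi0/2       on               802.1q         trunking      1

Port        Vlans allowed on trunk
Gi0/1       10-14,99
Gi0/2       11-14,99

Port        Vlans allowed and active in management domain
Gi0/1       10-14,99
Gi0/2       11-14,99

Port        Vlans in spanning tree forwarding state and not pruned
Gi0/1       10-14
Gi0/2       none
"""


def expand(vlan_spec):
    """Expand a spec such as '10-14,99' into a set of ints; 'none' gives an empty set."""
    vlans = set()
    if vlan_spec.strip().lower() == "none":
        return vlans
    for part in vlan_spec.split(","):
        low, _, high = part.partition("-")
        vlans.update(range(int(low), int(high or low) + 1))
    return vlans


def parse_trunk_sections(output):
    """Map each 'Vlans ...' section header to {port: set of VLAN IDs}."""
    sections, current = {}, None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Port":
            header = " ".join(fields[1:])
            # Only the VLAN-list sections are of interest; skip the mode/status table.
            current = sections.setdefault(header, {}) if header.startswith("Vlans") else None
        elif current is not None:
            current[fields[0]] = expand(fields[1])
    return sections


def not_forwarding(output):
    """Per trunk port: VLANs that are allowed and active but not forwarding."""
    sections = parse_trunk_sections(output)
    active = sections["Vlans allowed and active in management domain"]
    forwarding = sections["Vlans in spanning tree forwarding state and not pruned"]
    return {port: active[port] - forwarding.get(port, set()) for port in active}


stuck = not_forwarding(SAMPLE)
on_no_trunk = set.intersection(*stuck.values())
print(stuck)
print(sorted(on_no_trunk))
```

Any VLAN that shows up on no trunk at all and also carries a routed SVI is a candidate for the slow-convergence problem described above.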


*VTP pruning fell apart wherever non-VTP devices were introduced.  Trunks to VMware servers, non-Cisco switches and wireless devices weren't able to report their lack of interest in any VLANs.  The result was that these devices and their access switches wound up receiving all active VLANs.