Tuesday, March 21, 2017

Cisco: Not Serious About Network Programmability

"You can't fool me, there ain't no sanity clause!"
Cisco isn't known for providing easy programmatic access to their device configurations, but has recently made some significant strides in this regard.

The REST API plugin for newer ASA hardware is an example of that. It works fairly well, supports a broad swath of device features, is beautifully documented, and has an awesome interactive test/dev dashboard. The dashboard even has the ability to spit out example code (Java, JavaScript, Python) based on your point/click interaction with it.

It's really slick.

But I Can't Trust It

Here's the problem: It's an un-versioned REST API, and the maintainers don't hesitate to change its behavior between minor releases. Here's what's different between 1.3(2) and 1.3(2)-100:

New Features in ASA REST API 1.3(2)-100

Released: February 16, 2017
As a result of the fix for CSCvb21388, the response type of /api/certificate/details was changed from the CertificateDetails object to a list of CertificateDetails. Scripts utilizing this API will need to be modified accordingly.

So, any code based on earlier documentation is now broken when it calls /api/certificate/details.
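
About the best a consumer can do is code defensively around the change. Here's a minimal sketch, assuming the Python requests library and a hypothetical ASA at asa.example.com with made-up credentials, that tolerates either response shape:

 import requests
 
 # Hypothetical management address and credentials, for illustration only.
 BASE_URL = "https://asa.example.com/api"
 
 def certificate_details(session):
     """Fetch /api/certificate/details, tolerating both the pre-1.3(2)-100
     response (a single CertificateDetails object) and the post-1.3(2)-100
     response (a list of CertificateDetails objects)."""
     resp = session.get(BASE_URL + "/certificate/details", verify=False)
     resp.raise_for_status()
     body = resp.json()
     return [body] if isinstance(body, dict) else body
 
 session = requests.Session()
 session.auth = ("admin", "example-password")   # made-up credentials
 for cert in certificate_details(session):
     print(cert)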

This Shouldn't Happen

Don't take my word for it:

Remember that an API is a published contract between a Server and a Consumer. If you make changes to the Server's API and these changes break backwards compatibility, you will break things for your Consumer and they will resent you for it.

It Gets Worse

Not only does the API fail to provide consistently formatted responses, it doesn't even provide a way to discover its version. Cisco advised me to scrape 'show version' CLI output in order to divine the correct way to parse the API's responses whenever they decide to change things.

The irony of having to abandon the API for screen scraping in order to improve API compatibility is almost too much to bear. Let's assume for the moment that I'm willing to do it. Will the regex that finds the API version today still work on tomorrow's release? Do I even know how to parse the version numbers?

What's the version number of the current release anyway?

  • 1.3(2)-100 (according to the release notes above)
  • 1.3.2.100 (according to show version CLI output)
  • 1.3.2 (according to the 'release:' field on the download page)
This does not look like a road I'm going to enjoy traveling.
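
If I end up traveling it anyway, the shim probably looks something like this: a guess at normalizing all three spellings of the version into something comparable. It assumes you've already isolated the version string from wherever it turns up, and nothing guarantees the next release won't break it:

 import re
 
 def normalize_api_version(text):
     """Reduce '1.3(2)-100', '1.3.2.100', or '1.3.2' to a comparable tuple.
     Purely illustrative; the format could change again without warning."""
     digits = re.findall(r'\d+', text)
     if not digits:
         raise ValueError("no version number found in: %r" % text)
     parts = [int(d) for d in digits[:4]]
     return tuple(parts + [0] * (4 - len(parts)))
 
 print(normalize_api_version("1.3(2)-100"))  # (1, 3, 2, 100)
 print(normalize_api_version("1.3.2.100"))   # (1, 3, 2, 100)
 print(normalize_api_version("1.3.2"))       # (1, 3, 2, 0) -- no build number at all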

Would You Use This API?

When I inquired about version-to-version incompatibilities, Cisco's initial response was:
"This definitely shouldn't be happening."
Followed by:
"We are aware of the limitations resulting for not having versioned ASA REST API releases. And as of now there are no plans for us to fix this."
Further followed by:
"we will update the documentation to reflect the correct behavior, once we post this fix to CCO."
So hey, no problem, right? We might sneak breaking changes into the smallest of maintenance releases, but at least we'll document it! Have fun selling and supporting your application!

Clearly I am one of the angry and resentful customers predicted by the articles quoted above :)

Friday, March 17, 2017

Epoch Rollover: Coming Two Years Early To A Router Near You!

The 2038 Problem

[Image: "Broken Time?" - Roeland van der Hoorn]
Many computer systems and applications keep track of time by counting the seconds from "the epoch", an arbitrary date. The epoch for UNIX-based systems is the stroke of midnight in Greenwich on 1 January 1970.

Lots of application functions and system libraries keep track of the time using a 32-bit signed integer, which has a maximum value of around 2.1 billion. That's good for a bit more than 68 years' worth of seconds.

Things are likely to get weird 2.1 billion seconds after the epoch on January 19th, 2038.

As the binary counter rolls over from 01111111111111111111111111111111 to 10000000000000000000000000000000, the sign bit gets flipped. The counter will have changed from its farthest reach after the epoch to its farthest reach before the epoch. Time will appear to have jumped from early 2038 to late 1901.

Things might even get weird within the next year (January 2018!) as systems begin to encounter freshly minted CA certificates with expirations after the epoch rollover (it's common for CA certificates to last for 20 years). These certificates may appear to have expired in late 1901, over a century prior to their creation.
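
The arithmetic is easy to check. A quick Python sketch (the only assumptions are the 32-bit limits themselves):

 from datetime import datetime, timedelta, timezone
 
 UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
 
 # The last second a signed 32-bit counter can represent...
 print(UNIX_EPOCH + timedelta(seconds=2**31 - 1))  # 2038-01-19 03:14:07+00:00
 # ...and where the counter lands when the sign bit flips.
 print(UNIX_EPOCH + timedelta(seconds=-2**31))     # 1901-12-13 20:45:52+00:00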

NTP's 2036 Problem

NTP has a similar, but not-quite-the-same, epoch problem. It keeps track of seconds in an unsigned 32-bit value, so it can count twice as high as the problematic UNIX counter (yay!), but NTP's epoch is set 70 years earlier: 1 January 1900 (boo!). The result is that NTP's counter will roll over about two years before the UNIX counter.

Practically speaking, NTP is going to be fine, for reasons having to do with it being primarily concerned with small offsets in relative time, and with it only needing to be within 68 years of correct at startup in order to sync up with an authoritative time source.
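
The same check with NTP's earlier epoch and unsigned counter puts the rollover in early 2036:

 from datetime import datetime, timedelta, timezone
 
 NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)
 
 # The last second an unsigned 32-bit counter can represent from 1900;
 # one second later it wraps back to zero (1 January 1900).
 print(NTP_EPOCH + timedelta(seconds=2**32 - 1))   # 2036-02-07 06:28:15+00:00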

So What's Up With This Router?

Here's a weird thing I stumbled across recently. Time calculations with dates in 2036 are going wrong, but they're unrelated to NTP:

 router#show crypto pki certificates test-1 
 CA Certificate  
  Status: Available  
  Certificate Serial Number (hex): 14  
  Certificate Usage: Signature  
  Issuer:   
   cn=test-2  
  Subject:   
   cn=test-2  
  Validity Date:   
   start date: 02:38:26 UTC Mar 17 2017  
   end  date: 00:00:00 UTC Jan 1 1900  
  Associated Trustpoints: test-1   

But this one looks okay:

 router#show crypto pki certificates test-2
 CA Certificate  
  Status: Available  
  Certificate Serial Number (hex): 12  
  Certificate Usage: Signature  
  Issuer:   
   cn=test-1  
  Subject:   
   cn=test-1  
  Validity Date:   
   start date: 02:37:31 UTC Mar 17 2017  
   end  date: 06:28:15 UTC Feb 7 2036  
  Associated Trustpoints: test-2   

The real expiration dates of these certificates are just one second apart:

$ openssl x509 -in test-1.crt -noout -enddate
notAfter=Feb 7 06:28:16 2036 GMT
$ openssl x509 -in test-2.crt -noout -enddate
notAfter=Feb 7 06:28:15 2036 GMT

So... That's unfortunate.
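
The one-second difference stops looking like a coincidence if you count seconds from 1 January 1900 in an unsigned 32-bit value: test-2's expiration is the last second such a counter can represent, while test-1's is one second past it and wraps to zero, i.e. 00:00:00 Jan 1 1900, which is exactly what the router displays. That's just my back-of-the-envelope arithmetic, not a root cause confirmed by Cisco:

 from datetime import datetime, timezone
 
 EPOCH_1900 = datetime(1900, 1, 1, tzinfo=timezone.utc)
 
 for name, end in [("test-1", datetime(2036, 2, 7, 6, 28, 16, tzinfo=timezone.utc)),
                   ("test-2", datetime(2036, 2, 7, 6, 28, 15, tzinfo=timezone.utc))]:
     seconds = int((end - EPOCH_1900).total_seconds())
     fits = "fits in 32 unsigned bits" if seconds < 2**32 else "does not fit"
     print(name, seconds, fits)
 
 # test-1 4294967296 does not fit
 # test-2 4294967295 fits in 32 unsigned bits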

Here's the actual certificate data and import procedure used for this experiment in case you feel inclined to test:

 crypto pki trustpoint test-1  
  enrollment terminal  
 crypto pki authenticate test-1  
 -----BEGIN CERTIFICATE-----  
 MIIBeDCCASKgAwIBAgIBFDANBgkqhkiG9w0BAQUFADARMQ8wDQYDVQQDDAZ0ZXN0  
 LTIwIBcNMTcwMzE3MDIzODI2WhgPMjAzNjAyMDcwNjI4MTZaMBExDzANBgNVBAMM  
 BnRlc3QtMjBcMA0GCSqGSIb3DQEBAQUAA0sAMEgCQQDUjEccGNjjtv8lKNnvGpta  
 Z4x8LB82D2JJwTcvA5blUI2nr4vF41RqG0ifZ+Qtyqo+ntSD2QzDu3LKdSUw46if  
 AgMBAAGjYzBhMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMBAf8EBTADAQH/MB0GA1Ud  
 DgQWBBQ2NpEF0FG/g3ryNgU7Skjbm4IGHTAfBgNVHSMEGDAWgBQ2NpEF0FG/g3ry  
 NgU7Skjbm4IGHTANBgkqhkiG9w0BAQUFAANBAIVyT+iBimH7c/jtBrFGmKq+7YdM  
 eMwf9I/En/TAUqtte7QGLNRyTgBJvGgN/uc0KUjlZ5D6G/kxTwDtzse2Uow=  
 -----END CERTIFICATE-----  
 quit  
   
 crypto pki trustpoint test-2  
  enrollment terminal  
 crypto pki authenticate test-2  
 -----BEGIN CERTIFICATE-----  
 MIIBeDCCASKgAwIBAgIBEjANBgkqhkiG9w0BAQUFADARMQ8wDQYDVQQDDAZ0ZXN0  
 LTEwIBcNMTcwMzE3MDIzNzMxWhgPMjAzNjAyMDcwNjI4MTVaMBExDzANBgNVBAMM  
 BnRlc3QtMTBcMA0GCSqGSIb3DQEBAQUAA0sAMEgCQQDUjEccGNjjtv8lKNnvGpta  
 Z4x8LB82D2JJwTcvA5blUI2nr4vF41RqG0ifZ+Qtyqo+ntSD2QzDu3LKdSUw46if  
 AgMBAAGjYzBhMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMBAf8EBTADAQH/MB0GA1Ud  
 DgQWBBQ2NpEF0FG/g3ryNgU7Skjbm4IGHTAfBgNVHSMEGDAWgBQ2NpEF0FG/g3ry  
 NgU7Skjbm4IGHTANBgkqhkiG9w0BAQUFAANBAIjboo8wtehMpOReLw01tW8MLYzl  
 rtpwYVGoHCVVpXU+s7YQtfR1pt5ZVHZ8OVeP8SoTtoS+5k97aWgBZ+hu8/M=  
 -----END CERTIFICATE-----  
 quit  

Wednesday, February 1, 2017

Docker's namespaces - See them in CentOS

In the Docker Networking Cookbook (I got my copy directly from Packt Publishing), Jon Langemak explains why the iproute2 utilities can't see Docker's network namespaces: Docker creates its namespace objects in /var/run/docker/netns, but iproute2 expects to find them in /var/run/netns.

Creating a symlink from /var/run/docker/netns to /var/run/netns is the obvious solution:

 $ sudo ls -l /var/run/docker/netns  
 total 0  
 -r--r--r--. 1 root root 0 Feb 1 11:16 1-6ledhvw0x2  
 -r--r--r--. 1 root root 0 Feb 1 11:16 ingress_sbox  
 $ sudo ip netns list  
 $ sudo ln -s /var/run/docker/netns /var/run/netns  
 $ sudo ip netns list  
 1-6ledhvw0x2 (id: 0)  
 ingress_sbox (id: 1)  
 $  

But there's a problem. Look where this stuff is mounted:

 $ ls -l /var/run  
 lrwxrwxrwx. 1 root root 6 Jan 26 20:22 /var/run -> ../run  
 $ df -k /run  
 Filesystem   1K-blocks Used Available Use% Mounted on  
 tmpfs      16381984 16692 16365292  1% /run  
 $   

The symlink won't survive a reboot because it lives in a memory-backed filesystem. My first instinct was to have a boot script (say /etc/rc.d/rc.local) create the symlink, but there's a much better way.

Fine, I'm starting to like systemd

Systemd's tmpfiles.d is a really elegant way of handling touch files, symlinks, empty directories, device nodes, pipes and whatnot which live in volatile filesystems. The feature works from these directories:
  • /etc/tmpfiles.d
  • /run/tmpfiles.d
  • /usr/lib/tmpfiles.d
When the directives found in these directories contradict one another, the directory listed earlier in the list above wins. This allows an administrator to override package declarations in /usr/lib/tmpfiles.d by creating an entry in /etc/tmpfiles.d. Conflicts between files are resolved by the order of their appearance in a lexical sort.

So, what goes in these directories? Files named <whatever>.conf. Each line in these files controls creation of a file / folder / symlink / etc... There are switches and options to control ownership, permissions, overwrite condition, contents, and so forth.

Here's the file that causes systemd to create my symlink on every boot:

 $ cat /etc/tmpfiles.d/netns.conf   
 L /run/netns - - - - ./docker/netns  

I'm still not quite ready to forgive systemd for taking away the udev network interface naming persistency stuff and replacing it with something that's useless in virtual machines (this helps). But I'm getting there.

Lately I've been really liking each new facet of systemd as I've discovered it.

Tuesday, January 31, 2017

Anuta Networks NCX: Overcoming Skepticism

Anuta Networks demonstrated their NCX network/service orchestration product at Network Field Day 14.

Disclaimer
Anuta Networks page at TechFieldDay.com with videos of their presentations

Anuta's promise with NCX is to provide a vendor- and platform-agnostic network provisioning tool with a slick user interface and powerful management / provisioning features.

I was skeptical, especially after seeing the impossibly long list of supported platforms.

Impossible!
Network device configurations are complicated! They've got endless features, each of which is tied to the others in unpredictable ways. Sure, seasoned network ops folks have no problem hopping around a text configuration to discover the ways in which ACLs, prefix lists, route maps, class maps, service policies, interfaces, and whatnot relate to one another... But capturing these complicated relationships in a GUI? In a vendor-independent way?

I left the presentation with an entirely different perspective, and a desire to try it out on a network I manage. Seriously, I have a use case for this thing. Here's why I was wrong:

Not a general purpose UI
Okay, so it's a provisioning system, not a general purpose UI. Setup is likely nontrivial because it requires you to consider the types of services and related configurations you deploy in your network, and then express those possibilities in a simple form. Examples of things you might choose to express are:

  • For an MPLS PE device, are we 802.1Q tagging the traffic at the customer handoff? If so, what tag? That's a checkbox and a text field.
  • For a DMVPN router, do we want to allow direct internet access, or backhaul everything to HQ? A checkbox.
  • For various WAN interfaces, choose from a list of provider types in order to get the correct QoS templates applied.

Pre-built Templates
NCX ships with an understanding of how to do lots of things (create a VLAN, configure spanning tree, define a BGP neighbor) on lots of different platforms. Much of the work is already done. We're not reinventing the wheel with Ansible here.

Vendors and Features
The vendor list was huge, but let's just consider Cisco for the moment. What does it mean for Cisco platforms to be "supported" by NCX? Apparently everything in the IWAN deployment guide is supported. That's a pretty complete list of features: routing protocols, QoS, PBR, security, etc... I'm sure they don't have crazy corner-case stuff (using AppleTalk, are you?) but that's okay because...

Extensibility
Need to use a feature that NCX doesn't know about? The toolkit for defining your own features, exposing them in the UI, and getting them expressed as device configuration looked pretty straightforward. It boiled down to expressing a YANG model of the feature in question and then mapping that to device-specific NETCONF/CLI/REST/what-have-you configuration directives.

Final Fragments
  • Offline devices which have missed several cycles of updates do not receive that series of individual updates when they come back online. NCX somehow flattens the queued changes into a single update prior to delivering it. Frankly, this blows my mind. It suggests a surprising level of intimacy with the device configuration. Imagine, for example, if one of the updates included the bgp upgrade-cli directive. NCX would anticipate the result and merge subsequent address-family ipv4 directives? Maybe I misunderstood the answer to my question on this topic :)
  • NCX knows about its management path to the devices in question, and is careful not to lock itself out. I didn't get a lot of clarity about how this works, but there's no question the guys behind it are thoughtful and clever.

Friday, January 6, 2017

ERSPAN on Comware

The Comware documentation doesn't spell it out clearly, but it's possible to get ERSPAN-like functionality by using a GRE tunnel interface as the target for a local port mirror session.

This is very handy for quick analysis of stuff that's not L2 adjacent with an analysis station.

First, create a local mirror session:

 mirroring-group 1 local  

Next configure an unused physical interface for use by tunnel interfaces:

 service-loopback group 1 type tunnel  
 interface <unused-interface>  
  port service-loopback group 1  
  quit  

Now configure a GRE tunnel interface as the destination for the mirror group:

 interface Tunnel0 mode gre  
  source <whatever>  
  destination <machine running wireshark>  
  mirroring-group 1 monitor-port  
  quit  

Finally, configure the source interface(s):

 interface <interesting-source-interface-1>  
  mirroring-group 1 mirroring-port inbound  
 interface <interesting-source-interface-2>
  mirroring-group 1 mirroring-port inbound  

Traffic from the source interfaces arrives at the analyzer with extra Ethernet/IP/GRE headers attached. Inside each GRE payload is the original frame as collected at a mirroring-group source interface. If the original traffic with extra headers attached (14+20+4 == 38 bytes) exceeds MTU, then the switch fragments the frame. Nothing gets lost and Wireshark handles it gracefully.
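
If Wireshark isn't handy, stripping the outer headers programmatically is straightforward too. A minimal Scapy sketch (the capture file name is hypothetical, and IP fragment reassembly is ignored for brevity):

 from scapy.all import rdpcap, Ether, GRE
 
 # Re-parse each GRE payload as the original mirrored Ethernet frame.
 for pkt in rdpcap("mirror.pcap"):   # hypothetical capture file
     if GRE in pkt:
         inner = Ether(bytes(pkt[GRE].payload))
         print(inner.summary())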