Monday, April 9, 2012

Ethernet Fabric - The Bulb Glows Dimly

People talking about "Ethernet Fabrics" are usually describing a scheme in which many switches are interconnected and all links are available for forwarding Ethernet frames.

Rather than allowing STP to block links until it forms a loop-free topology, Fabrics include an L2 multipath scheme which forwards frames along the "best" path between any two endpoints.

Brandon Carroll outlined the basics of an Ethernet fabric here, and his description leaves me with the same question that I've had since I first heard about this technology: What problem can I solve with it?

The lightbulb over my head began to glow during one of Brocade's presentations (pop quiz: what switch is the STP root in figure 1 of the linked document?) at the Gestalt IT Fabric Symposium a couple of weeks ago. In that session, Chip Copper suggested that a traditional data center topology with many blocked links and sub-optimal paths like this one:

Three-tier architecture riddled with downsides

might be rearranged to look like this:
Flat topology. All links are available to forward traffic. It's all fabricy and stuff.

The advantages of the "Fabric" topology are obvious:
  • Better path selection: It's only a single hop between any two Access switches, where the previous design required as many as four hops.
  • Fewer devices: We're down from 11 network devices to 6.
  • Fewer links: We're down from 19 infrastructure links to 15.
  • More bandwidth: Aggregate bandwidth available between access devices is up from 120 Gb/s to 300 Gb/s (assuming 10 Gb/s links).
If I were building a network to support a specialized, self-contained compute cluster, then this sort of design would be an obvious choice.

But that's not what my customers are building. The networks in my customers' data centers need to support modular scaling (full mesh designs like the one I've pictured here don't scale at all, let alone modularly) and they need any-VLAN-anywhere support from the physical network.
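To put rough numbers on the scaling complaint: a full mesh of n switches needs n(n-1)/2 links and burns n-1 fabric ports on every switch. The sketch below checks the 15-link figure for six switches and then tries larger counts. The function names are made up for illustration, and reading the 300 Gb/s figure as "both directions of every 10 Gb/s link" is an assumption on my part.

```python
def full_mesh_links(n):
    """Number of point-to-point links needed to fully mesh n switches."""
    return n * (n - 1) // 2


def full_mesh_ports_per_switch(n):
    """Fabric-facing ports each switch uses in a full mesh of n switches."""
    return n - 1


def aggregate_gbps(links, link_gbps=10, count_both_directions=True):
    """Aggregate bandwidth across all links.

    Assumption: each full-duplex link is counted once per direction.
    """
    directions = 2 if count_both_directions else 1
    return links * link_gbps * directions


if __name__ == "__main__":
    # 6 matches the small fabric above; 20 and 200 are illustrative sizes.
    for n in (6, 20, 200):
        links = full_mesh_links(n)
        print(n, links, full_mesh_ports_per_switch(n), aggregate_gbps(links))
```

Six switches come out to 15 links and, counted per direction, 300 Gb/s. Two hundred switches would need 19,900 links and 199 fabric ports on every switch.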

So how does a fabric help a typical enterprise?
The scale of the 3-tier diagram I presented earlier is way off, and that's why fully meshing the Top of Rack (ToR) devices looks like a viable option. A more realistic topology in a large enterprise data center might have 10-20 pairs of aggregation devices and hundreds of Top of Rack devices living in the server cabinets.

Obviously, we can't fully mesh hundreds of ToR devices, but we can mesh the aggregation layer and eliminate the core! The small compute cluster fabric topology isn't very useful or interesting to me, but eliminating the core from a typical enterprise data center is really nifty. The following picture shows a full mesh of aggregation switches with fabric-enabled access switches connected around the perimeter:
Two-tier fabric design
Advantages of this design:
  • Access switches are never more than 3 hops from each other.
  • Hop count between any two switches can be lowered by running a cable between them.
  • No choke point at the network core.
  • Scaling: The most densely populated switch shown here only uses 13 links. This can grow big.
  • Scaling: Monitoring shows a link running hot? Turn up a parallel link.
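The hop-count claims are easy to sanity check with a breadth-first search over a toy topology. This is a minimal sketch, not a model of the picture above: the switch names, the four-aggregation-switch size, and the single-homed access switches are all assumptions chosen to keep it short. A "hop" here means one link traversed.

```python
from collections import deque
from itertools import combinations


def add_link(adj, a, b):
    """Add a bidirectional link between switches a and b."""
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def build_two_tier(num_agg, access_per_agg):
    """Hypothetical two-tier fabric: fully meshed aggregation switches,
    each with some single-homed access switches hanging off it."""
    adj = {}
    aggs = [f"agg{i}" for i in range(num_agg)]
    for a, b in combinations(aggs, 2):
        add_link(adj, a, b)
    for i, agg in enumerate(aggs):
        for j in range(access_per_agg):
            add_link(adj, agg, f"acc{i}-{j}")
    return adj


def hops(adj, src, dst):
    """Fewest links between src and dst, found by breadth-first search."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # unreachable


def max_access_hops(adj):
    """Worst-case hop count between any two access switches."""
    access = [n for n in adj if n.startswith("acc")]
    return max(hops(adj, a, b) for a, b in combinations(access, 2))


if __name__ == "__main__":
    fabric = build_two_tier(num_agg=4, access_per_agg=2)
    print(max_access_hops(fabric))
    # "Running a cable" between two chatty access switches:
    add_link(fabric, "acc0-0", "acc3-1")
    print(hops(fabric, "acc0-0", "acc3-1"))
```

In this toy fabric the worst case is access, aggregation, aggregation, access: three links. Adding the direct cable drops that one pair to a single hop without touching anything else.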
Why didn't I see this before?
Honestly, I'm not sure why it took so long to pound this fabric use case through my skull. I think there are a number of factors:
  • Marketing materials for fabrics tend to focus on the simple full mesh case, and go out of their way to bash the three-tier design. A two-tier fabric design doesn't sound different enough.
  • Fabric folks also talk a lot about what Josh O'Brien calls "monkeymesh" - the idea that we can build links all willy-nilly and have things work. One vendor reportedly has a commercial with children cabling the network however they see fit, and everything works fine. This is not a useful philosophy. Structure is good!
  • The proposed topology represents a rip-and-replace of the network core. This probably hasn't been done too many times yet :-)

Wednesday, April 4, 2012

Tech Field Day

I'm privileged to have been invited to attend Gestalt IT's Network Field Day 3 event held in and around San Jose last week. These Field Day events are rare opportunities for social-media-enabled IT folks like me (Gestalt IT calls us delegates) to get together with the people behind the amazing products we use in our jobs. Then we chase all of the sales and marketing people out of the room :-)

Full Disclosure
Gestalt IT covered the cost of my event-related travel, hotel room, and meals. The vendors we met (who ultimately are the ones footing the bill) didn't have any say about the list of delegates, and don't know what we're going to say about their products. Generally speaking, they make good products and are hoping that the content of our blogs and tweets will indicate that. There were high and low points, and I'll cover interesting examples of both in future posts. Oh, I also came home with vendor-supplied T-shirts (two), coffee mugs (three) and a handful of USB flash drives.

What is Tech Field Day?
Different people get different things out of TFD events. For me, the highlight of NFD3 was the opportunity to meet an array of interesting people, some of whom are my high tech heroes. Among the list of my co-delegates are people who, without knowing it, have been influencing my career for years (one of them for over a decade). I'd only met three of them before and didn't really know any of them prior to last week. My co-delegates (table swiped from the NFD3 page) for this event were:

Ethan Banks Packet Pushers @ECBanks
Tony Bourke The Data Center Overlords @TBourke
Brandon Carroll Brandon Carroll
Brad Casemore Twilight in the Valley of the Nerds @BradCasemore
Greg Ferro EtherealMind, Packet Pushers
Jeremy L. Gaddis Evil Routers @JLGaddis
Tom Hollingsworth The Networking Nerd @NetworkingNerd
Josh O’Brien StaticNAT @JoshOBrien77
Marko Milivojevic IPExpert
Ivan Pepelnjak @IOSHints
Derick Winkworth Cloud Toad @CloudToad
Mrs. Y. Packet Pushers @MrsYisWhy

I also got to hang out with the Gestalt IT folks who make it all happen: Stephen Foskett and Matt Simmons, a couple of amazing guys that I'm proud to know.

In addition to meeting my esteemed colleagues (giggling like a schoolgirl because I got to ride around in a limo sitting next to Ivan Pepelnjak for a couple of days), I saw some awesome technology and presentations. Stuff I'm excited about and hope to get to use at work some day. ...And that brings me to what the vendor sponsors get out of these events: Nerdy bloggers like me get exposed to their best new offerings and just might write about them or tell their friends.

Does it work? Well, I'm going to be talking about it. Heck, it's almost inevitable: Anyone who has squeezed out more than a few blog posts will tell you that having material for a dozen posts dropped in your lap is awesome. Several sponsors represent repeat business for Gestalt IT, so the exposure they get from Field Day events must make their participation worthwhile.

The sponsors I met, in the order I met them, were:

Chip Copper, Brocade solutioneer (I want this title!), presented at a related Gestalt IT event, the "Fabric Symposium" held the day before NFD3 kicked off. Chip told us all about Brocade's Shortest Path Bridging (SPB) capability, including some nifty special sauce that sets Brocade apart. Brocade management take note: Chip was an awesome presenter. He really knows how to talk to nerds, and he made a compelling case for your products. I've seen Brocade sales presentations before and was underwhelmed. Chip made all the difference. I'll be posting about it soon.

Mav Turner and Joel Dolisy from SolarWinds gave us the rundown on the latest in network management. Most of my work is project-oriented consulting, rather than the long-term care and feeding of networks, so I don't work with this sort of product on a daily basis and I won't have much to say about what's new and exciting. But the room was full of passionate SolarWinds users, so I'm sure the blogosphere will be abuzz about SolarWinds in the coming weeks.

Don Clark and Samrat Ganguly told us about NEC's OpenFlow offering. This was the only OpenFlow-based product we saw, and I'm not sure that I really "get" OpenFlow: Sure, it's cool tech, and it presents a couple of large advantages over the traditional way of doing things (stay tuned for more), but it's just so different. Those that can really take advantage of it probably are already all over OpenFlow. My customer base, on the other hand, is mostly non-tech companies for whom the network is a means to an end. I think it will take quite a while before the perceived risks (different is scary!) will be overcome in that market.

Doug Gourlay and Andy Bechtolsheim (!) talked about Arista products and product philosophy in general, their new FX series switches in particular, and their view on the direction of the industry. I'd long known that Arista made compelling products, but I can't remember the last time (before now) that I was actually excited about a switching platform.

Infineta makes data-center-sized WAN optimizers that work entirely differently from the branch office boxes most of us are accustomed to using. Making the point about just how different they are required Infineta to bring an unprecedented level of nerdy to their TFD presentation. I think there was only a single delegate who managed to internalize most of the math they threw at us. Short version: these are exciting products that do things which will likely never be possible to do with server-based WAN optimizers.

Cisco talked to us about several new offerings in their data center, access layer, network management, security, and virtual switching portfolios. We were interested in every topic and asked lots of questions. Unfortunately, there wasn't enough time, nor were there Cisco folks technical enough, to address these topics at the depth we would have liked. Future TFD sponsors take note: Pick a topic and be ready to dig deep on it. Don't plan to fill the allotted time because the nerds you're inviting will find a way to take the discussion into the weeds. The presentation wasn't bad, it was just too wide and not deep enough. Several topics got cut short. I still came away with material for a few blog posts, so stay tuned.

Most of the delegates weren't familiar with Spirent, so the final presentation of NFD3 was a real eye-opener for them. I've had exposure to Spirent's Test Center device on two different occasions. Both were pre-deployment data center design validation exercises, and in both cases the Spirent test tools exposed a problem so that we could solve it before the system went live. Spirent's test boxes are spectacularly capable, powerful and precise, and support a wide spectrum of test types. The boxes are expensive to purchase, but can be rented along with a Spirent PS guy (Hi Glen!) for the sorts of short duration validation tests that most enterprises might be interested in running. They also showed off their iTest product. Everyone was excited about this, and I think you'll see lots of blogging about this product - especially since they gave us test licenses to play with.

In Summary
Gestalt IT puts together an amazing event. I sure hope I get invited to another one. If you're a potential delegate, you owe it to yourself to throw your hat into the ring. Heck, start a blog and get active on Twitter if you're not blogging and tweeting already. The community is amazing and will find surprising ways to pay back your efforts. I'm sure glad that I've done it.

If you're a vendor marketing person, have a look at the sort of exposure that TFD events get for their sponsors. I have no idea about the value proposition that Gestalt IT offers (my brain doesn't work that way), but having the nerdiest bloggers with the biggest audiences excited about your product has to be good, right? Lots of big names are repeat customers, so I can only assume that it works.