Monday, June 18, 2012

Nexus 7004 - WTF?

Where's The Fabric?

Nexus 7004 photo by Ethan Banks
On display at Cisco Live in San Diego was a new Nexus 7000 chassis, the 4-slot Nexus 7004.

The first thing I noticed about this chassis is that it doesn't take the same power supplies as its 9-, 10-, and 18-slot siblings. According to the guy at the Cisco booth, the 7004 uses 3kW power supplies that will live in the four small slots at the bottom of the chassis.

The next thing I looked for were the fabric modules. Imagine my surprise to learn that the 7004 doesn't have fabric modules!

No fabric modules? This is pretty interesting, but before we get into the specifics, a quick recap of how the larger Nexus 7K chassis work is probably in order...

The Nexus 7000 uses a three-stage fabric, where frames/packets moving from one line card to another pass through three distinct fabric layers:
  1. Fabric on the ingress line card - The fabric here connects the various replication engine or switch-on-chip (SoC) ASICs together. Traffic that ingresses and egresses on the same line card doesn't need to go any further than this fabric layer. Traffic destined for a different card or the supervisor progresses to:
  2. Chassis fabric - There are up to five "fabric modules" installed in the chassis. These modules each provide either 46 Gb/s or 110 Gb/s per slot to the line cards, depending on which line card we're talking about.
  3. Fabric on the egress line card - This is the same story as the ingress card's fabric, but our data is hitting this fabric from the chassis fabric, rather than from a front panel port.
Proxy-routed packets in an F1/M1 chassis take an even more complicated path, because they have the privilege of hitting the chassis fabric twice as they traverse up to three different line cards: five fabric hops in total.
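To make the hop counting concrete, here's a small Python sketch of the fabric stages a frame crosses in the bigger chassis for local, cross-card, and proxy-routed forwarding. The stage names are my own shorthand, not Cisco terminology:

    # Fabric stages traversed in a 7009/7010/7018, by forwarding case.
    # Stage names are informal labels, not official Cisco terms.
    def fabric_path(same_card=False, proxy_routed=False):
        """Return the ordered list of fabric stages a frame crosses."""
        if same_card:
            # Local switching: the frame never leaves the ingress card's fabric
            return ["ingress card fabric"]
        if proxy_routed:
            # F1 ingress, M1 performs the L3 lookup, then on to the egress card:
            # the chassis fabric is crossed twice, five fabric hops in total
            return ["ingress card fabric", "chassis fabric",
                    "proxy (M1) card fabric", "chassis fabric",
                    "egress card fabric"]
        # Normal card-to-card forwarding: the classic three-stage path
        return ["ingress card fabric", "chassis fabric", "egress card fabric"]

    for case in (dict(same_card=True), dict(), dict(proxy_routed=True)):
        path = fabric_path(**case)
        print(len(path), "hop(s):", " -> ".join(path))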

The interconnections between line cards and fabric modules look something like the following:
Nexus 7009/7010/7018 with fabric modules

Note that I'm not completely sure about the speed of the interconnection between supervisor and fabric (black lines), but it doesn't really matter because control plane stuff is pretty low bandwidth and aggressively throttled (CoPP) anyway.

The important takeaways from the drawing above are:

  1. Fabric 2 modules can run at M1/F1 speeds (46 Gb/s per slot) and at F2 speeds (110 Gb/s per slot) simultaneously. They provide the right amount of bandwidth to each slot.
  2. While there are up to five fabric modules per chassis, each line card (at least the M1/F1/F2 - not sure about supervisors) has 10 fabric connections, with 2 connections to each fabric module.
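Those two points together give the per-channel numbers that matter later in this post. A quick back-of-the-envelope calculation (my arithmetic, not figures Cisco publishes in this form):

    # Per-channel and per-slot fabric bandwidth in the 9/10/18 slot chassis,
    # assuming 5 fabric modules and 2 channels per module to each slot.
    FAB_MODULES = 5
    CHANNELS_PER_MODULE = 2                          # per line card slot

    per_slot_per_module = {"M1/F1": 46, "F2": 110}   # Gb/s, as noted above

    for mode, gbps in per_slot_per_module.items():
        per_channel = gbps / CHANNELS_PER_MODULE
        total_slot = gbps * FAB_MODULES
        print(f"{mode}: {per_channel:.0f} Gb/s per channel, "
              f"{total_slot} Gb/s per slot with all {FAB_MODULES} fabric modules")
    # M1/F1: 23 Gb/s per channel, 230 Gb/s per slot
    # F2: 55 Gb/s per channel, 550 Gb/s per slot
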
It's been commonly explained that the 7004 chassis does away with fabric modules by interconnecting its pair of payload slots back-to-back (the other two slots are dedicated to supervisors), eliminating the chassis fabric stage of the three-stage switching fabric.

...Okay... But what about control plane traffic? On the earlier chassis, the control plane traffic takes the following path:
  1. Ingress port
  2. Ingress line card fabric
  3. Chassis fabric
  4. Supervisor
With no chassis fabric, it appears that there's no way for control plane traffic to get from a line card to the supervisor. Well, it turns out that the 7004 doesn't dedicate all of the line card's fabric to a back-to-back connection.

I think that the following diagram explains how it works, but I haven't seen anything official: 8 of the fabric channels on each line card connect to the peer line card, and one channel is dedicated to each supervisor. Something like this:

Nexus 7004 - no fabric modules
Cool, now we have a card <-> supervisor path, but we don't have a full line-rate fabric connection between the two line cards in the 7004: only 8 fabric channels are available to the data plane, because two channels have been dedicated to communication with the supervisors.
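If that 8 + 2 channel split is right, the data plane math for each card family works out as follows. This is a sketch based on the per-channel numbers above; the F1 front panel figure (32 x 10GE) is my addition, and the F2 figure (48 x 10GE) comes from the next paragraph:

    # Back-to-back fabric bandwidth in the 7004, assuming 8 data-plane channels
    # between the two payload cards (the other 2 go to the supervisors).
    DATA_CHANNELS = 8
    per_channel_gbps = {"F1": 23, "F2": 55}
    front_panel_gbps = {"F1": 32 * 10, "F2": 48 * 10}   # 10GE port counts

    for card in ("F1", "F2"):
        fabric = DATA_CHANNELS * per_channel_gbps[card]
        print(f"{card}: {fabric} Gb/s of fabric vs "
              f"{front_panel_gbps[card]} Gb/s of front panel ports")
    # F1: 184 Gb/s of fabric vs 320 Gb/s of front panel ports
    # F2: 440 Gb/s of fabric vs 480 Gb/s of front panel ports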

F2 cards clearly become oversubscribed: they've got 480 Gb/s of front panel bandwidth and adequate forwarding horsepower, but only 440 Gb/s of fabric.

I believe that F1 cards would have the same problem, with only 184 Gb/s of fabric (eight 23 Gb/s fabric channels), but now we're talking about an F1-only chassis with no L3 capability. I'm not sure whether that is even a supported configuration.

M1 cards would not have a problem, because their relatively modest forwarding capability would not be compromised by an 8-channel fabric.

Having said all that, the oversubscription on the F2 module probably doesn't matter: hitting 440 Gb/s on the back-to-back connection would require 44 of the 48 front panel ports on a single module to be forwarding only to the other module. Just a little bit of card-local switching traffic would be enough to ensure that the backplane is not oversubscribed.
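Putting a rough number on "a little bit" (simple arithmetic, still assuming the 8-channel split guessed at above):

    # How much F2 front panel traffic has to stay card-local (or sit idle)
    # before the 440 Gb/s back-to-back connection stops being the bottleneck?
    FRONT_PANEL_GBPS = 48 * 10   # F2: 48 x 10GE ports
    FABRIC_GBPS = 8 * 55         # 8 data-plane channels at 55 Gb/s each

    headroom = FRONT_PANEL_GBPS - FABRIC_GBPS
    print(f"{headroom} Gb/s has to stay card-local or go unused")  # 40 Gb/s
    print(f"that's {headroom // 10} ports' worth out of 48")       # 4 ports
    # In other words, hitting the full 440 Gb/s needs 44 ports forwarding
    # exclusively to the other card, as described above.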

6 comments:

  1. All Nexus 7K line cards, even in the 7010/7018 chassis, use a switched Ethernet out-of-band connection (EOBC) for control plane traffic.
    Each supervisor has the GigE EOBC switch on it.

    So in the 7004 all 5 fabric links can be used for inter-card traffic.

  2. Sorry, just realised that they do have to steal those fabric links to the supervisor so that it can inject packets for outbound transmission, because the 7K applies all its features at ingress.

    You could do an all-F1-based 7004, but then it's just an L2 switch and a 5K would be cheaper.

  3. Yep, the EOBC is there, but control plane traffic isn't out-of-band.

  4. I can't wait to see the use cases.

    Neat 40Gb and 100Gb cards in the chassis.
