Monday, February 13, 2012

10Gb/s Server Access Layer - Use The FEX!

Several people who have read the four-part 10Gb/s pricing series reported that the central thesis wasn't clear enough. So, here it is again:

10Gb/s Servers? Rack your Nexus 5500 with the core switches. Connect servers to Nexus 2232s.

I know of several networks that look something like this:
Top Of Rack Nexus 5500

I think that this might be a better option:
Centralized Nexus 5500

We save lots of money on optics by moving the Nexus 5500 out of the server rack and into the vicinity of the Nexus 7000 core. Then we spend those savings on Nexus 2232s, FETs and TwinAx. These two deployments cost almost exactly the same amount.

The pricing is pretty much a wash, but we end up with the following advantages:
  • The ability to support 10GBASE-T servers - I expect this to be a major gotcha for some shops in the next few months.
  • Inexpensive (this is a relative term) 1Gb/s ports at top of rack for low speed connections
  • Greater flexibility for oversubscription (these servers are unlikely to need line rate connections)
  • Greater flexibility for equipment placement (drop an inexpensive FEX wherever you need it)
  • Look at all those free ports! 5K usage has dropped from 24 ports to 8 ports each! Think of how inexpensive the next batch of 10Gig racks will be if we only have to buy 2232s. And the next. And the next...
It's not immediately apparent, but oversubscription is an advantage of this design. With a top-of-rack 5500, you can't oversubscribe the thing; you must dedicate a 10Gb/s port to every server whether that's sensible or not. With FEXes you get to choose: oversubscribe them, or don't.

The catches with this setup are:
  • The core has to be able to support TwinAx cables: The first generation 32-port line cards must use the long "active" cables and the M108 cards will require OneX converters which list for $200 each. And check your NX-OS version.
  • You need to manage the oversubscription.
Inter-pod (through the core) oversubscription is identical at 2.5:1 in both examples. Intra-pod oversubscription rises from 1:1 to 2.5:1 with the addition of the FEX. Will it matter? Maybe. Do you deploy applications carefully so that server traffic tends to stay in-pod or in-rack, or do your servers get installed without regard to physical location ("any port / any server" mentality), with VMware DRS moving workload around?

We can cut oversubscription in this example down to 1.25:1 for just $4000 in FETs and 16 fiber strands by adding links between the 5500 and the 2232. This is a six-figure deployment, so that should be a drop in the bucket. You wouldn't factor the cost of the 5500 interfaces into this comparison because we're still using fewer of them than in the first example.
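For anybody who wants to check the arithmetic, here is a rough Python sketch of it. The inputs are assumptions read off the drawing and the discussion around it, not a real configuration: 20 dual-homed servers per pod, each offering 5Gb/s to each of two FEXes, and four 10Gb/s uplinks from each 2232 to its 5500, doubled to eight in the second case. All of the names are made up for illustration.

```python
from fractions import Fraction

LINK_GBPS = 10  # every link in the example runs at 10Gb/s


def oversubscription(offered_gbps, uplink_count, link_gbps=LINK_GBPS):
    """Return offered load divided by uplink capacity, as an exact ratio."""
    return Fraction(offered_gbps, uplink_count * link_gbps)


# Assumed pod layout (hypothetical values, see the note above).
SERVERS = 20          # dual-homed servers in the pod
PER_SERVER_GBPS = 5   # load each server offers to each FEX
FEX_COUNT = 2         # one 2232 per 5500
UPLINKS_BEFORE = 4    # FEX-to-5500 links in the drawing
UPLINKS_AFTER = 8     # after adding four more links per FEX

offered = SERVERS * PER_SERVER_GBPS  # load arriving at one FEX

before = oversubscription(offered, UPLINKS_BEFORE)
after = oversubscription(offered, UPLINKS_AFTER)

# Each added link needs two fiber strands and one FET at each end.
added_links = (UPLINKS_AFTER - UPLINKS_BEFORE) * FEX_COUNT
strands = added_links * 2
fets = added_links * 2

print(f"before: {float(before)}:1")
print(f"after: {float(after)}:1")
print(f"added links: {added_links}, strands: {strands}, FETs: {fets}")
```

Under those assumptions the added links come to 16 strands and 16 FETs, which lines up with the figures above and would put the $4000 at roughly $250 per FET. Treat that unit price as implied by this post rather than something checked against a price list.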

I recognize that this topology isn't perfect for everybody, but I believe it's a better option for many networks. It Depends. But it's worth thinking about, because it might cost a lot less and be a lot more flexible in the long run.


  1. Chris,

    Quick Q - what did you mean by the M132 card needing to use the active Twinax cables? Can you not use the shorter, passive cables there?


  2. The datasheet for the M132 only lists the 7 and 10 meter active cables. For giggles, I just tried a 5 meter in my lab N7K:

    ecdc-nsw1# sh int e1/9 transceiver details
    transceiver is present
    type is SFP-H10GB-CU5M
    name is CISCO-TYCO
    nominal bitrate is 10300 MBit/sec
    Link length supported for copper is 5 m

    ecdc-nsw1# sh mod
    Mod  Ports  Module-Type                       Model               Status
    ---  ----   --------------------------------  ------------------  -------
    1    32     10 Gbps Ethernet Module           N7K-M132XP-12       ok

  3. Agree, although I find that most 10Gig uplinks are to other switches and 2232s do not support (in best practice) uplinks to other switches, unless you're hankering for a loop.

    So 2232s are out if you need a mix of switch and server uplinks.

  4. It's link-up without complaining

    Device-ID   Local Intrfce   Hldtme   Capability   Platform     Port ID
    switch      Eth1/9          134      S I s        N5K-C5548P   Eth1/1

  5. @ Jon and Hagen,

    Relevant quote from Lincoln Dale about passive twinax in the original M132 card:
    "they are currently unsupported on N7K for good reasons that are technical. strongly suggest that you don't use it for a production environment"

    ...That was in reply to me saying "hey, they seem to work just fine!"


    1. Indeed, that's what I'd expect. I'll just continue with 10Gbase-SR for production (N5K->N7K).


  6. @Will,

    That's why this post is about 10Gb/s access layer.

    You're right, FEX does not make for a good distribution layer :-)

  7. If the FEX based pod is 2.5:1 oversubscribed, that must mean you can only have 10 servers in the pod?

  8. @Brad

    20 servers with (redundant) 10Gb/s links, with 8x 10 Gb/s uplink from the pod to the core.

    20:8 = 2.5:1

    The specific numbers here aren't something I addressed all that carefully -- I hashed all that out in the earlier series. This time I just drew something that fit nicely in my window :)

    The real point was that by locating the 5K carefully, we can save enough money to buy FEXes. Given that the FEX is then effectively free, laying things out this way can make the physical environment much easier to manage.

    1. Got it. Your diagram shows 4 x 10G uplinks per FEX, so I was compelled to ask :)
      Nice post.

    2. The drawing is correct -- I'm just making the assumption that the server connections are doubled up for redundancy, not because we expect 20Gb/s from each server :-)

      With that in mind, let's assume that the servers really run at 10Gb/s, but balance evenly (a dream, I know) across their two uplinks: 5Gb/s each.

      20 servers sending 5Gb/s to each FEX = 100Gb/s
      4 FEX uplinks at 10Gb/s each = 40Gb/s
      100:40 = 2.5:1

  9. Hi Chris,
    To be clear, we support FEX on 3 line cards in the Nexus 7000: the M132, the M132L and the F2 module. The original M132 requires the active cables as mentioned, while the M132L and F2 can use passive as well as active twinax cables. Note that the F2 offers line-rate, high-density support for FEX compared with the M1 option.

    Ron @ccie5851

    1. Also want to add that the M108 module, even with the One-X converter will not support FEX.

      Ron @ccie5851

    2. Hey Ron,

      I've never connected a FEX straight to the 7K, though it seems a good option, and would save even more money: skip the 5K altogether! Politics have driven the decision to shoehorn 5K in the middle. Access ports on the 7K "DC core" seem scary :-)

      ...If I were considering this deployment, matters of twinax support on the 7K would be moot because:
      - twinax almost certainly wouldn't reach server racks
      - twinax is a PITA. Even if it did reach, I'd be happy to spend the extra few bucks for FET rather than twinax.

      The list price difference between a twinax cable and a pair of FETs is only $90 to $350, and I believe that FET is supported by every device capable of hosting a FEX.

      Other than the slight cost advantage, is twinax superior to FET in any way?

  10. Chris, Hagen - thanks for clarifying - not had to deploy or spec N7ks yet!

  11. It's worth noting that the 10G-FET optics have a distance limit of 100m via MultiMode Fiber. They ARE considerably cheaper than their SR/LR 10Gb counterparts, but distance limitations should be kept in mind when planning data center/IDF deployments using FEXs as your access layer.

    1. Have you deployed FEX in IDFs before? I haven't seen that done before but it certainly piques my interest. As Chris mentions, I'd be interested in hearing about people's experiences with 2K IDF deployments. Cost savings, environment, business type, etc.

  12. @Unknown, FEXen in an IDF? ...interesting...

    There's that FEX with PoE that doesn't look like it will make it to manufacturing (despite having made it into NX-OS).

    One customer has deployed some FEXen around their building because some far-flung servers were overwhelming the legacy access network. Servers were far-flung because they were connected to scientific equipment, so the FEXen wound up in laboratories.

    Curious to hear anecdotes of other Nexus 2K in IDF closets.

    Your point is well taken: know your requirements, available media and optical capabilities.

    I generally steer customers toward installing singlemode fiber for any link that leaves a room, and have never encountered an in-room link in excess of 100m.