Fragmentation Needed: September 2017

Tuesday, September 26, 2017

Pluribus Networks... Wait, where are we again?

I was privileged to visit Pluribus Networks as a delegate at Network Field Day 16 a couple of weeks ago. Somebody else paid for the trip. Details here.

Much has changed at Pluribus. I hardly recognized the place!

I quite like Pluribus (their use of Solaris under their Netvisor switching OS got me right in the feels early on) so I'm happy to report that most of what's new looks like changes for the better.

When we arrived at Pluribus HQ we were greeted by some new faces, a new logo, color scheme... Even new accent lighting in the demo area!

Gone also are the Server Switches with their monstrous control planes (though still listed on the website, they weren't mentioned in the presentation), Solaris, and a partnership with Supermicro.

In their place we found:

The new logo and colors
New faces in management and marketing
Netvisor running on Linux
Whitebox and OCP-friendly switches
A partnership with D-Link
Some Netvisor improvements

Linux

This was probably inevitable, and likely matters little to Netvisor users. When Pluribus was first getting off the ground, I was waiting for an OpenSolaris release that never happened. That Pluribus stuck with Solaris for as long as they did while Oracle was dismantling the Solaris ecosystem is kind of incredible. Netvisor on Linux is fine, I'm sure.

Switch Hardware

One of Pluribus' claims to fame was their "server switches". These were normal switches using merchant switching silicon (from 2 or 3 different vendors, if I recall... I think Netvisor has a hardware abstraction layer which allows them to switch easily between Broadcom/Intel/Mellanox ASICs), but with enormous control planes sporting lots of cores, lots of RAM, lots of storage, dedicated network processors, etc...

The big switches opened the door to some interesting possibilities, but likely made a tough sell to customers that just wanted an IP network fabric. Which is probably most customers.

These days Pluribus is selling vanilla-looking Open Compute-friendly switches with ONIE, and supporting Netvisor on a handful of third-party whitebox platforms.

That D-Link Partnership

Okay, quit laughing. The D-Link switch in question is Trident II based, just like (almost) every other switch in the market. If D-Link helps Pluribus move product, then I'm delighted for all involved. The only thing I don't like about the DXS-5000-54S is that it lacks an RS-232 port. USB console? Ugh. I'll run my Netvisors on something with a proper management interface, thankyouverymuch.

Netvisor

Netvisor still looks pretty great! Some standout features:

Netvisor uses standard protocols to interact with neighboring devices, but you manage a Netvisor fabric as a single device.
It's still got fantastic telemetry and flow analytics capabilities, even without the monstrous control plane. Some slightly outrageous claims were made in this area toward the end of the presentation, but we didn't have time to dig in.
Individual nodes are managed in-band (via the front-panel interfaces, rather than the management LAN port). Incredibly, this capability is not universal in this product space: some platforms rely on the lone management Ethernet interface for fabric control purposes. This fact blows my mind. I'm similarly surprised that whitebox switches don't tend to come with redundant control plane paths. Maybe there's a single "eth0" port baked into the Trident chip for this purpose?
Routing is performed by an anycast gateway. That is, moving packets from one broadcast domain to another does not require them to be hauled to a certain point in the fabric. Any Netvisor switch (the nearest Netvisor switch) will do the job. This is a welcome change.
Members of a Netvisor fabric don't need to be cabled to one another. This opens the door to using Netvisor only at the leaf tier in a leaf/spine fabric... Or only at the spine... Or at both layers as a single large fabric... Or at both layers, but as two fabrics (one for leaf, one for spine)... Or as smaller deployment units in a huge fabric. Lots of possibilities here.

Saturday, September 23, 2017

KEMP Presented Some Interesting Features at NFD16

KEMP Technologies presented at Network Field Day 16, where I was privileged to be a delegate. Who paid for what? Answers here.

Three facets of the KEMP presentation stood out to me:

The KEMP Management UI Can Manage Non-KEMP Devices

KEMP's centralized management UI, the KEMP 360 Controller, can manage/monitor other load balancers (ahem, Application Delivery Controllers) including AWS ELB, HAProxy, NGINX and F5 BIG-IP.

This is pretty clever: If KEMP gets into an enterprise, perhaps because that enterprise is dipping a toe into the cloud at Azure, they may manage to worm their way deeper than would otherwise have been possible. Nice work, KEMPers.

VS Motion Can Streamline Manual Deployment Workflows

KEMP's VS Motion feature allows easy service migrations between KEMP instances by copying service definitions from one box to another. It's probably appropriate when replicating services between production instances and when promoting configurations between dev/test/prod. The mechanism is described in some detail here:

The interface is pretty straightforward. It looks just like the balance transfer UI at my bank: Select the From instance, the To instance, what you want transferred (which virtual service) and then hit the Move button. The interface also sports a Copy button, so in that regard it's better than the UI at the bank. I look forward to the bank allowing me to replicate funds between accounts in the future :)

I think it struck all of the Network Field Day delegates that this feature is primarily useful for manual workflows. An automated workflow wouldn't need an "easy button" like the VS Motion feature. Unfortunately there wasn't enough time to get into KEMP's Automation/API capabilities during the presentation, but Keith Miller was tuned in to the live stream and reported that the API is a pleasure to use:

The API for KEMP is very easy to work with. I’m a Python newb & was able to script something easily to change a setting in all VSs #NFD16
— Keith Miller (@packetologist) September 14, 2017

Update from Keith:

Good read! I'll admit I was caught off guard by the mention of my tweet. The one thing I'll say now that I've learned a little more about APIs is that I wish the output was in a JSON format so you don't have to webscrape the response.
— Keith Miller (@packetologist) November 12, 2017

It's disappointing to read that the API doesn't return structured data.

VS Motion does not, as I understand it, have the ability to copy TLS certificates around right now, but the feature request is in.

That Strange License

Frankly, this topic from the NFD16 presentation doesn't make much sense to me.

When you're buying boxes, or even virtual capabilities that are licensed by a bandwidth cap, you're going to have paid-for-but-wasted capacity during off-peak times. KEMP has introduced a consumption-based model to work around that problem: Pay only for what you use!

It sounds great, especially with the popularity of virtual services. When talking about physical boxes, it makes sense that you'd have to pay for any overcapacity you may have provisioned: There's the box, 95% idle, waiting for that peak traffic day... Full of expensive processors and RAM... Oh, and there's the failover box, at 100% idle... You probably didn't expect to get the hardware for free, right?

The situation feels different when we're talking about virtual appliances: How much would you expect to pay for a virtual standby server? One which, if everything goes according to plan, will never see a request from a live client? You're already paying somebody else (the server vendor or IaaS provider) for the hardware, so paying KEMP based on usage seems ideal.

But they've created an altogether new problem: KEMP's consumption-based license model finds the peak throughput (sampled at 5-minute intervals) of each participating node, then adds those peaks up to calculate the monthly bill.

Let's imagine that your organization has a rock-steady 1Gb/s flow rate through an active/standby pair of KEMP boxes, plus a DR facility somewhere.

Every month you pay for 1Gb/s of usage.

Then one day the active unit fails and load switches to the standby unit. Several hours later, you shift workload to the DR site while performing maintenance to restore the failed hardware in the main site.

Take the peak throughput from each KEMP unit: Active (failed), Standby (now active) and DR have each hit 1Gb/s. That month you'll pay for 3Gb/s, even though the workload never changed. You just moved it around.
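Here's a rough sketch of that arithmetic, as I understand the model from the presentation. The node names, the sample values and both function names are made up for illustration, and the month is squeezed down to six samples; this is not KEMP's actual billing code. The second function (the peak of the combined throughput) is just my own comparison point, not something KEMP offers.

```python
from typing import Dict, List


def sum_of_peaks(samples: Dict[str, List[float]]) -> float:
    """Billable Gb/s under the sum-of-peaks model as described above:
    take each node's highest sample, then add those peaks together."""
    return sum(max(series) for series in samples.values())


def peak_of_sum(samples: Dict[str, List[float]]) -> float:
    """For comparison only (my assumption, not a KEMP option):
    the highest combined throughput seen in any single interval."""
    return max(sum(interval) for interval in zip(*samples.values()))


# Hypothetical month, compressed to six samples per node (Gb/s).
# The 1Gb/s workload moves from active, to standby, to DR.
samples = {
    "active":  [1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "standby": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "dr":      [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
}

print(sum_of_peaks(samples))  # 3.0
print(peak_of_sum(samples))   # 1.0
```

The combined load never exceeds 1Gb/s in any interval, but the per-node peaks still add up to 3Gb/s.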

It seems like anybody with any degree of workload mobility will be overpaying with this model, unless the per-bandwidth price is also quite low.

I'd be much more comfortable paying per byte, per TLS setup or per load-balanced request. The sum-of-peaks model seems too unpredictable to me.