I have been given the distinct privilege of attending
an exclusive partners-only boot camp put on by Cisco for their UCS
system (Cisco’s blade implementation).
Like Scott Lowe, Rich Brambley and Rodney Haywood, I will be blogging my thoughts and technical details (along with
the occasional tweet). I may skip many technical details, either
because one of the above blogs has already covered them or because I
have to leave something for my customers to pay for.
The class started off pretty slow with general housekeeping
and an overview of the pains of current blade infrastructures and a high level
explanation of FCoE. Then the fun began.
The UCS chassis includes eight half-width blade slots, four
power supplies, eight fan modules and two Fabric Extenders (also referred to as
I/O Modules). Two power supplies can power
the entire chassis with the third and fourth providing redundancy. (Similarly, the HP c-Class enclosures can be
fully powered by three power supplies with the fourth, fifth and sixth
providing redundancy.) There is no power
domain concept in the UCS chassis as there is in the IBM and the HP p-Class enclosures.
External to the chassis, but still required, are a pair of
Fabric Interconnects (FI), each of which can support up to 20 or 40 UCS
chassis, depending on the FI model. The
UCS Manager (UCSM) runs in Active/Passive mode on these FIs. The UCSM is built on an XML database that
stores all configuration details and settings for the entire solution,
including the Service Profiles. This
database can be accessed in several ways, including the client Java
GUI, CLI, SNMP, and XML APIs.
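As a rough illustration of the XML API route, here is a minimal Python sketch that builds a login request and pulls the session cookie out of a reply. The `aaaLogin` method name, the `inName`/`inPassword` attributes and the `outCookie` attribute reflect my understanding of the UCSM XML API and should be checked against Cisco's documentation. The credentials and the sample reply are made up, and nothing is actually sent over the network.

```python
import xml.etree.ElementTree as ET


def build_login_request(user: str, password: str) -> str:
    """Build the XML body for a login call (assumed method name: aaaLogin)."""
    elem = ET.Element("aaaLogin", {"inName": user, "inPassword": password})
    return ET.tostring(elem, encoding="unicode")


def extract_cookie(response_xml: str) -> str:
    """Pull the session cookie out of a login reply (assumed attribute: outCookie)."""
    root = ET.fromstring(response_xml)
    cookie = root.get("outCookie")
    if not cookie:
        raise ValueError("login reply carried no session cookie")
    return cookie


# Made-up reply for illustration only; a real reply carries more attributes.
SAMPLE_REPLY = '<aaaLogin response="yes" outCookie="example-cookie-123" />'

if __name__ == "__main__":
    print(build_login_request("admin", "secret"))
    print(extract_cookie(SAMPLE_REPLY))
```

The point is simply that everything the GUI does is, underneath, an XML document going to and from the same database, so the same calls can be scripted.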
The UCSM works through the Chassis Management Channel (CMC)
on the I/O Modules (IOM) within each chassis.
There are two IOMs in each chassis and they run Active/Active for the
data communication and Active/Passive for the CMC. The CMC is really nothing more than a proxy
for talking to the individual chassis, blade and switch components. Also on the IOMs is the Chassis Management
Switch (CMS), which provides communication to the Baseboard Management
Controller (BMC) on the individual blades (think iLO). These are 100Mb connections that flow through
a dedicated port on the chassis.
Currently there is only one blade available, the half-width
B200, which contains two Xeon 5500 (Nehalem) processors, 12 DIMM slots (up to
96GB), two SAS/SATA HDDs and a single mezzanine socket. The DIMMs are limited to Registered 4GB
and 8GB 1333MHz DIMMs (yes, only two types of DIMMs are supported, based on what
we were told). The BIOS currently
limits the bus speed to 1066MHz, no matter how many DIMM slots are
populated. Certain
processors can push the bus speed down to 800MHz.
The next blade to be released will be the full-width B250,
which will contain two Xeon 5500 (Nehalem) processors, 48 DIMM slots (up to
384GB), two SAS/SATA HDDs and two mezzanine sockets.
The extra DIMM slots are made possible by the Cisco-exclusive Catalina
chip. How this works was beyond the
scope of our class, though I am supposed to be getting a whitepaper that describes
the nuts and bolts. Essentially, it’s able to put four times the DIMMs in
each channel (8 DIMMs x 3 Channels x 2 Processors) without affecting bus speed
and while incurring only minimal additional latency (6 ns). The same limits on DIMM types and speeds
apply to this blade as to the B200.
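The slot and capacity numbers for both blades fall out of the same multiplication, which this small Python sketch spells out. The two-DIMMs-per-channel figure for the B200 is my own back-calculation from its 12 slots (12 / 3 channels / 2 processors), not something stated in class, and the function names are just for illustration.

```python
def dimm_slots(dimms_per_channel: int, channels: int = 3, processors: int = 2) -> int:
    """Total DIMM slots = DIMMs per channel x channels per processor x processors."""
    return dimms_per_channel * channels * processors


def max_memory_gb(slots: int, largest_dimm_gb: int = 8) -> int:
    """Maximum memory when every slot holds the largest supported DIMM (8GB)."""
    return slots * largest_dimm_gb


b200_slots = dimm_slots(dimms_per_channel=2)  # back-calculated from 12 slots
b250_slots = dimm_slots(dimms_per_channel=8)  # four times the DIMMs per channel

if __name__ == "__main__":
    print("B200:", b200_slots, "slots,", max_memory_gb(b200_slots), "GB")
    print("B250:", b250_slots, "slots,", max_memory_gb(b250_slots), "GB")
```

This lines up with the quoted maximums: 12 slots and 96GB for the B200, 48 slots and 384GB for the B250.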
All the pieces are well laid out in Rodos’ post here: http://rodos.haywood.org/2009/08/ucs-schematic-sketch.html
One thing I learned today is that FCoE is not simply a Fibre
Channel packet wearing an Ethernet Halloween costume. FCoE requires special handling and flow
control. FC works by not allowing
packets to be dropped, and FCoE must still abide by this rule. The Nexus switches do this by actually
embedding MDS functionality directly into the switch.
Figure 1: FCoE Switching
In Figure 1 I have depicted a server with a CNA
connecting through two Nexus 5000 switches to a native FCoE storage array. The blue lines depict the FCoE traffic, which
by the nature of the NX-OS will be lossless, and the orange lines depict native
FC traffic. Note the use of a MAC
address as the Source and Destination for the FCoE packets and the fact that
they are unwrapped upon arrival at their destination.
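To make the wrap/unwrap idea concrete, here is a deliberately simplified Python sketch. It only shows the Ethernet addressing around an opaque FC payload: the real FCoE header fields (version, start-of-frame and end-of-frame markers, padding) are left out, the MAC addresses are invented, and none of the lossless flow control is modeled. The one hard fact in it is the FCoE EtherType, 0x8906.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a 6-byte MAC address: {mac}")
    return raw


def wrap(fc_frame: bytes, src_mac: str, dst_mac: str) -> bytes:
    """Prepend a 14-byte Ethernet header (dst MAC, src MAC, EtherType)."""
    header = mac_to_bytes(dst_mac) + mac_to_bytes(src_mac)
    header += struct.pack("!H", FCOE_ETHERTYPE)
    return header + fc_frame


def unwrap(eth_frame: bytes) -> bytes:
    """Strip the Ethernet header and hand back the FC payload untouched."""
    (ethertype,) = struct.unpack("!H", eth_frame[12:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    return eth_frame[14:]


if __name__ == "__main__":
    fc_payload = b"example FC frame bytes"  # stand-in for a real FC frame
    on_the_wire = wrap(fc_payload, "0e:fc:00:00:00:01", "0e:fc:00:00:00:02")
    assert unwrap(on_the_wire) == fc_payload
```

The addressing really is the easy part; the special handling is in guaranteeing those frames are never dropped along the way.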
Given the lossless requirement (among others, as our
instructor was quick to point out), FCoE packets cannot be routed
through a Catalyst switch. In other
words FCoE should only flow through FCoE-aware switches (read Nexus) because they
are not normal Ethernet packets.
One point that has been pretty well documented, and that I have
confirmed is still true (for a little while longer at least), is that
FCoE cannot be sent upstream (toward the SAN appliance) from the
Fabric Interconnects (FI). Note that
Figure 1 describes a non-UCS server.
There is currently no way to do native FCoE from a UCS blade all the way
to the storage array. See Scott
Lowe’s description for a good summary.
Essentially, it comes down to this: the FI does not contain the embedded
MDS functionality to forward on the FC packets.
All the FI can do today is strip off the FCoE wrapper and send the
native FC out an NPIV-enabled port. This
will change in the future, but today it is a limitation.
Service Profiles are a big differentiator for the UCS
system. Those familiar with
VirtualConnect may realize that Service Profiles are very similar to
VirtualConnect Server Profiles, but UCS does add some unique capabilities, such
as the ability to define the Firmware and Quality of Service properties. Another unique feature is the ability to wipe
the local drives when applying a Service Profile to a blade.
When it comes to configuring interconnects and adapters, Cisco
seems to have moved most functionality and choice into the adapters instead of the
interconnects. Ultimately, this seems
like a simpler solution, since now you just pick which adapter you want and the
I/O Module (IOM) is the same regardless of your choice. There are (or will soon be) three choices for
mezzanine adapters: Palo (CNA built for virtualization), Menlo (CNA with two
10GbE and two FC) and Oplin (standard dual port 10GbE with support for
OS-implemented FCoE).
Cisco’s upcoming Palo adapter mezzanine card provides
similar functionality to HP’s VirtualConnect Flex-10. I was in awe when I first realized what
Flex-10 could do. Using the Palo adapter
within UCS really blew me away. This is
where Cisco’s networking expertise really shows through. Here’s a quick comparison:
Similarities
- Split a single 10GbE connection into multiple instances that the OS sees as individual devices
- Ability to define bandwidth of each device
Differences
- Splitting of connection occurs completely within the Palo adapter, rather than a combination of the adapter and interconnect
- Palo can create 128 connections, as opposed to Flex-10’s four
- Palo can define many more characteristics for each logical connection
- Palo has built-in hardware failover between the two uplinks, eliminating the need to implement failover within the OS/software layer (mezzanine card is still single point of failure)
- Palo is a CNA, meaning those 128 connections can be any combination of vNICs and vHBAs
- Palo can enable direct 1:1 mapping of VM vNICs to Palo vNICs using VN-Link
- The Palo adapter actually runs a Linux OS and an unmanaged switch in order to manage all this magic
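To put the connection limits from the list above side by side, here is a toy Python model. The class and function names are my own invention, the limits (128 versus four, vHBAs on Palo only) are taken straight from the comparison, and per-device bandwidth and the other characteristics are not modeled at all.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualInterface:
    name: str
    kind: str  # "vnic" (Ethernet) or "vhba" (Fibre Channel)


@dataclass
class AdapterModel:
    """Toy model of how many logical devices an adapter can present."""
    label: str
    max_interfaces: int
    supports_vhba: bool
    interfaces: List[VirtualInterface] = field(default_factory=list)

    def add(self, name: str, kind: str) -> None:
        if kind not in ("vnic", "vhba"):
            raise ValueError(f"unknown interface kind: {kind}")
        if kind == "vhba" and not self.supports_vhba:
            raise ValueError(f"{self.label} cannot present a vHBA")
        if len(self.interfaces) >= self.max_interfaces:
            raise ValueError(f"{self.label} is limited to {self.max_interfaces} interfaces")
        self.interfaces.append(VirtualInterface(name, kind))


def can_host(model: AdapterModel, vnics: int, vhbas: int) -> bool:
    """Return True if the requested mix fits on a fresh copy of the model."""
    trial = AdapterModel(model.label, model.max_interfaces, model.supports_vhba)
    try:
        for i in range(vnics):
            trial.add(f"vnic{i}", "vnic")
        for i in range(vhbas):
            trial.add(f"vhba{i}", "vhba")
    except ValueError:
        return False
    return True


PALO = AdapterModel("Palo", max_interfaces=128, supports_vhba=True)
FLEX10 = AdapterModel("Flex-10", max_interfaces=4, supports_vhba=False)

if __name__ == "__main__":
    print(can_host(PALO, vnics=100, vhbas=28))
    print(can_host(FLEX10, vnics=2, vhbas=1))
```

Any mix of vNICs and vHBAs up to 128 fits on the Palo model, while the Flex-10 model tops out at four Ethernet devices.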
As we dug deeper into the actual data paths when using
Palo, FCoE, 6100 Fabric Interconnects (FI), 2104 Fabric Extenders (FEX) and
Nexus switches (primarily 1000v and 5000), I began to wonder: Did Cisco create
a complicated UCS (w/ FCoE and Palo adapter) to sell more Nexus 1000v? It essentially comes down to this: Ethernet
best practice is to not route a packet back down the same port it came in
on. In the case of an ESX host, this
could be a possible scenario. In order
to avoid this, VN-Link creates virtual Ethernet ports on the FI in order to
treat them as two separate ports, thereby allowing routing between them. At the end of the long, hard to grasp
discussion it was stated that the Nexus 1000v would avoid all of this by simply
routing the traffic within the host and avoiding the FI completely. Good selling point for the Nexus 1000v.
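The "same port" rule is easier to see in a few lines of Python. This is only a sketch of the forwarding decision as I understood it from the discussion: the MAC addresses and port names (`eth1/1`, `veth1`, `veth2`) are hypothetical, and learning, flooding and VLANs are all ignored.

```python
from typing import Dict, Optional


def forward(mac_table: Dict[str, str], in_port: str, dst_mac: str) -> Optional[str]:
    """Return the egress port for a frame, or None if it must not be forwarded."""
    out_port = mac_table.get(dst_mac)
    if out_port is None:
        return None  # unknown destination; flooding is left out of this sketch
    if out_port == in_port:
        return None  # never send a frame back out the port it arrived on
    return out_port


VM_A = "00:50:56:00:00:0a"
VM_B = "00:50:56:00:00:0b"

# Both VMs sit behind the same physical port on the FI.
physical_view = {VM_A: "eth1/1", VM_B: "eth1/1"}

# With a virtual Ethernet port per VM, the same two VMs look like
# they are on separate ports.
virtual_view = {VM_A: "veth1", VM_B: "veth2"}

if __name__ == "__main__":
    print(forward(physical_view, "eth1/1", VM_B))  # frame from VM A is not sent back
    print(forward(virtual_view, "veth1", VM_B))    # frame from VM A goes out veth2
```

With a Nexus 1000v in the host, that VM-to-VM frame would never reach this decision on the FI in the first place.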
Two great pictures we actually used in our class for understanding
how traffic flows out of the UCS using Palo can be found here: http://www.internetworkexpert.org/2009/08/11/cisco-ucs-nexus-1000v-design-palo-virtual-adapter/
and here: http://www.internetworkexpert.org/2009/07/05/cisco-ucs-vmware-vswitch-design-cisco-10ge-virtual-adapter/. Both are by Brad Hedlund, who appears (based
on my limited exposure to the Cisco Data Center world) to be an IT rockstar (perhaps
the Duncan Epping of UCS?).
This leads me to a final general point about the UCS system. Ultimately, there is a lot to love about the
UCS system. It was clearly designed with
Network and Storage I/O in mind (as you would expect from Cisco), and with
little innovation needed on Nehalem systems, this helps Cisco stand apart. They have also made an effort to truly unify
all the management interfaces, though based on the screenshots I’ve seen so far
they’re not as nice as HP’s. At the same
time I worry that the UCS system is simply too complicated to sell to the
general customer. As an HP reseller and implementer,
I find the whole VirtualConnect and Flex-10 conversation can go over many technical
people’s heads. UCS is even harder to
understand (note to self: practice UCS whiteboarding skills thoroughly).
Some additional comparisons to HP blades:
- Cisco’s blade slot architecture seems similar to HP p-Class (4 slots that can be divided w/half height blades) as opposed to c-Class (16 half height slots that can be converted to full height).
- Cisco’s Baseboard Management Controller (BMC) is equivalent to HP’s iLO
Miscellaneous final notes:
- UCS certifications will be available early next year for design and implementation
- Storage redundancy is not handled in the UCS
hardware and should be implemented within the OS/Application layer
- If multiple uplinks are used to connect IOM and
FI, they are completely separate connections and cannot be combined with a port
group
- An IOM can only be connected to a single FI
I guess that’s it for today.
Not enough to digest? Check back
tomorrow and the rest of the week for more.
Don’t worry; I’m pretty sure this will be the longest post since the
rest of the week will involve more labs and less architecture.
Please feel free to
leave comments to ask questions, make corrections or provide additional
information.