On May 6, 2014, IBM announced the second generation of its Storwize V7000 storage system. In this article we give a brief overview of the system and discuss some of its features.
Competition in the storage market forces manufacturers to implement new technologies that meet growing demands for performance, availability and capacity while allowing customers to save money.
As in the previous generation, a V7000 Gen2 system may consist of several controller pairs (Control Enclosures) combined into a cluster with a single management interface.
In fact, each controller pair is an ordinary dual-controller storage system (the controllers are called node canisters). It has its own input/output ports for communicating with hosts and its own disk subsystem. In IBM terminology, a controller pair together with its disk subsystem is called an IO group.
IBM calls a set of 4 controller pairs a clustered system, but in fact it is 4 separate storage systems united by common management. If a controller pair fails, the other pairs cannot take over the failed IO group, i.e. the data managed by that pair becomes unavailable. Therefore, rather than following the marketing description, we will consider the capabilities of this system for a single IO group only.
Key facts for one controller pair (Type-Model: 2076-524):
- Each controller has a modern 8-core 64-bit Intel CPU and a hardware data compression accelerator. Thus, one controller pair has 16 cores and 2 hardware compression accelerators on board.
- Optionally, a card with an additional hardware compression accelerator may be installed in each controller.
- The controller pair has 64GB of cache memory, i.e. 32GB per controller. Optionally, the cache can be increased to 128GB per pair; however, this extension can only be used together with the optional hardware compression accelerator.
- Hosts can be connected for block access through the following interfaces: FC 8Gbps, iSCSI 1Gbps and 10Gbps, FCoE 10Gbps.
- The back-end IO is connected over PCIe 3.0, whose throughput allows a SAS back end running at up to 12Gbps. Thus, the controllers can connect to the disk shelves and drives at up to 12Gbps.
- The maximum number of disk drives per pair of controllers is 504.
- The maximum number of disk drives per 4 pairs of controllers is 1056.
- The control enclosure (the shelf containing the controller pair) occupies 2U in a 19-inch rack.
- Up to 24 disk drives of the 2.5″ size can be installed in the control enclosure (2076-524).
- Up to 24 disk drives of the 2.5″ size can be installed in the 2076-25F disk shelf.
- Up to 12 disk drives of the 3.5″ size can be installed in the 2076-12F disk shelf.
- 2.5″ flash drives of 200, 400 and 800 GB with a 12Gbps SAS interface are supported.
- 2.5″ 15K rpm SAS drives of 300 and 600 GB with a 12Gbps SAS interface are supported.
- 2.5″ 10K rpm SAS drives of 600, 900 and 1200 GB with a 6Gbps SAS interface are supported.
- 2.5″ 7.2K rpm NL-SAS drives of 1 TB with a 6Gbps SAS interface are supported.
- 3.5″ 7.2K rpm NL-SAS drives of 2, 3 and 4 TB with a 6Gbps SAS interface are supported.
- Different drive types of the same form factor can be mixed in one shelf.
- One pair of controllers can support up to 20 disk shelves.
- The controller pairs are compatible with file modules, which make it possible to build a Unified solution with file access (NAS).
- Thin Provisioning technology is included in the base set of licenses.
- Hybrid pools with automatic data migration between storage tiers (Easy Tier) are included in the base set of licenses.
- External block storage virtualization (requires an additional license).
- Real-time Compression (RtC) (requires an additional license and optional additional hardware).
- Local replication (FlashCopy is included in the base set of licenses).
- Synchronous and asynchronous remote replication (Remote Mirror requires an additional license).
- IBM Storwize family software version 7.3 or later is supported.
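The 504-drive limit per controller pair follows directly from the shelf figures in the list above. The short Python sketch below reproduces the arithmetic; the constant names are ours, and it assumes every expansion shelf in a configuration is of the same type.

```python
# Constants taken from the specification list above (names are illustrative).
CONTROL_ENCLOSURE_SLOTS = 24   # 2.5" slots in the 2076-524 control enclosure
MAX_EXPANSION_SHELVES = 20     # per controller pair (IO group)
SLOTS_25F = 24                 # 2.5" slots in a 2076-25F expansion shelf
SLOTS_12F = 12                 # 3.5" slots in a 2076-12F expansion shelf
MAX_DRIVES_PER_CLUSTER = 1056  # stated limit for 4 controller pairs


def max_drives_per_io_group(shelf_slots: int = SLOTS_25F) -> int:
    """Drive count if all expansion shelves have `shelf_slots` slots each."""
    return CONTROL_ENCLOSURE_SLOTS + MAX_EXPANSION_SHELVES * shelf_slots


per_pair_sff = max_drives_per_io_group()           # 24 + 20 * 24
per_pair_lff = max_drives_per_io_group(SLOTS_12F)  # 24 + 20 * 12
naive_cluster = 4 * per_pair_sff                   # if per-pair limits simply added up

print(per_pair_sff, per_pair_lff, naive_cluster, MAX_DRIVES_PER_CLUSTER)
```

With only 2.5″ shelves this gives 504 drives, and 264 with only 3.5″ shelves. Four fully populated IO groups would add up to 2016 drives, so in a four-pair cluster the 1056-drive cluster limit, not the per-pair limit, is the one that applies.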
A few years ago IBM decided to create a universal midrange storage platform based on the SAN Volume Controller (SVC) code. This reduced the investment needed to develop and support these products. As a result, IBM has two solutions (SVC and Storwize) built on the same software, which has been improved for more than 10 years. In addition, the functionality for working with RAID groups and pools was taken from the high-end DS8xxx series.
This approach gave good results in the first generation of the Storwize V7000. The new generation develops it further and adds new capabilities.
Let’s look at what is new in the hardware configuration of the controller pair. Its layout is shown in the figure below.
First of all, you can see that the form factor of the controllers has changed. In the previous generation of V7000, the controllers were located one above the other, and only one additional 10Gbps Ethernet card could be installed in each controller. Current trends have made this approach less effective, and changes were needed to get a more flexible configuration. Many vendors have been using such a layout for a long time.
Note that the midplane board is a single point of failure (SPOF). This is typical for most vendors, although marketing materials claim that the system has no single point of failure. The situation is not critical, because the probability of a midplane failure is small: the board has almost no active components.
In the V7000 Gen2, the developers changed the form factor of the controllers and placed them side by side. This made room for up to 3 PCIe slots for HBA cards in each controller.
The figure shows that the controllers have no built-in FC ports for connecting hosts. This functionality is now provided by HBAs offering FC 8Gbps, FCoE 10Gbps and iSCSI 10Gbps access. This design is good because the controller board will not need major changes in a future generation when moving from 8Gbps to 16Gbps; only the HBA will need to be replaced. The necessary throughput headroom is already built in at the interconnect level, since it is based on PCIe 3.0. Vendors do such things to reduce their costs, which is quite reasonable on their part.
Embedded ports on the controller
- Up to 4 1Gbps Ethernet ports; 1 port is used for service purposes (initial installation, etc.), and the remaining three can be used for block access over 1Gbps iSCSI.
- Up to 2 USB ports (used for configuration purposes).
- Up to 2 SAS connectors for connecting disk shelves. Each connector consists of 4 bidirectional serial lanes, each running at up to 12Gbps.
Connections of disk shelves
One SAS connector supports up to 10 disk shelves. The first SAS connectors of both controllers connect one group of up to 10 disk shelves, and the second SAS connectors connect another 10 shelves. Thus, up to 20 shelves with 2.5″ drives can be connected to a controller pair in total. A typical connection scheme is shown in the figure below.
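The two-chain rule can be expressed as a small assignment function. The Python sketch below is hypothetical: the helper name and the alternating order are our own choices for keeping both cascades balanced, and IBM's cabling guide may prescribe a different order.

```python
from typing import Dict, List

MAX_SHELVES_PER_CHAIN = 10  # one SAS connector supports up to 10 shelves
NUM_CHAINS = 2              # SAS connectors 1 and 2 on each controller


def assign_shelves(num_shelves: int) -> Dict[int, List[int]]:
    """Hypothetical helper: spread expansion shelves over the two SAS chains.

    Shelves alternate between chains so that neither cascade exceeds
    10 shelves; IBM's cabling guide may prescribe a different order.
    """
    limit = MAX_SHELVES_PER_CHAIN * NUM_CHAINS
    if not 0 <= num_shelves <= limit:
        raise ValueError(f"a controller pair supports 0..{limit} expansion shelves")
    chains: Dict[int, List[int]] = {chain: [] for chain in range(1, NUM_CHAINS + 1)}
    for shelf in range(1, num_shelves + 1):
        # Odd-numbered shelves go to chain 1, even-numbered ones to chain 2.
        chains[1 if shelf % 2 else 2].append(shelf)
    return chains


print(assign_shelves(5))  # {1: [1, 3, 5], 2: [2, 4]}
```

Alternating keeps the two cascades within one shelf of each other, so 20 shelves split exactly 10 and 10, and a 21st shelf is rejected.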
PCIE cards for controllers
At the moment, the following cards can be installed:
- A 4-port 8Gbps FC card. It comes with Short Wave 8Gbps SFP transceivers; Long Wave 8Gbps SFP transceivers can also be used.
- A 4-port 10Gbps iSCSI/FCoE card. It comes with SFP+ transceivers.
- An additional compression accelerator card (optional) or a special pass-through card that connects the integrated compression chip (included in the basic package).
The table shows which cards can be installed in each slot:
|Slot||Supported cards|
|1||Compression pass-through or Compression Acceleration card|
|2||None, 4 port 8Gbps FC, 4 port 10Gbps iSCSI/FCoE|
|3||None, 4 port 8Gbps FC, 4 port 10Gbps iSCSI/FCoE|
The current software version 7.3 limits the number of cards: each controller can take two 4-port 8Gbps FC cards, but only one 4-port 10Gbps iSCSI/FCoE card.
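The slot table and the version 7.3 card limits can be combined into one validation rule. The Python sketch below is a hypothetical checker (the function and constant names are ours, not an IBM tool) that validates one controller's slot layout and returns the resulting number of host ports per card type.

```python
from collections import Counter
from typing import Dict, Optional

FC8 = "4 port 8Gbps FC"
ETH10 = "4 port 10Gbps iSCSI/FCoE"
PASS_THROUGH = "Compression pass-through"
ACCELERATOR = "Compression Acceleration card"

# Which card may sit in which slot, per the slot table above.
SLOT_OPTIONS = {
    1: {PASS_THROUGH, ACCELERATOR},
    2: {None, FC8, ETH10},
    3: {None, FC8, ETH10},
}
MAX_CARDS_7_3 = {FC8: 2, ETH10: 1}  # per-controller limits in software 7.3
PORTS_PER_HOST_CARD = 4


def host_ports(slots: Dict[int, Optional[str]]) -> Dict[str, int]:
    """Hypothetical checker: validate one controller's cards, return host ports per type."""
    for slot, allowed in SLOT_OPTIONS.items():
        if slots.get(slot) not in allowed:
            raise ValueError(f"slot {slot}: {slots.get(slot)!r} is not allowed")
    # Count only host interface cards (slots 2 and 3); empty slots are skipped.
    counts = Counter(slots.get(slot) for slot in (2, 3) if slots.get(slot))
    for card, limit in MAX_CARDS_7_3.items():
        if counts[card] > limit:
            raise ValueError(f"7.3 supports at most {limit} x {card} per controller")
    return {card: counts[card] * PORTS_PER_HOST_CARD for card in MAX_CARDS_7_3}


print(host_ports({1: PASS_THROUGH, 2: FC8, 3: FC8}))
```

Two FC cards give 8 FC ports per controller, and a mixed FC plus 10Gbps layout gives 4 ports of each type; a second 10Gbps card, or an empty slot 1, is rejected.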
Among the new features, I want to note that replication over Ethernet is now possible, using the integrated 1Gbps ports and the 10Gbps ports on the cards.
Host connection over FCoE 10Gbps is supported only through network infrastructure (switches); direct attachment to a host is not supported.
It is also recommended to keep FCoE and iSCSI traffic on different ports.
Batteries and power outage
The batteries are located quite compactly in the controllers. Their main task is to let the controllers save the current configuration and the write cache to the internal flash drive if external power fails.
Controller architecture and solution components
A block diagram of the controller is shown in the figure below:
The key element of the new controller is a modern 8-core Intel Xeon E5-2628L v2 CPU based on the Ivy Bridge microarchitecture.
This CPU, like the rest of its series (E5-2618L v2, E5-2628L v2, E5-2648L v2 and E5-2658L v2), is aimed at embedded systems and low power consumption.
One notable feature of this CPU is its integrated IO controller supporting PCIe 3.0: the CPU provides 40 PCIe 3.0 lanes.
A diagram of the CPU:
A diagram of how the lanes are divided among ports:
Let’s look at the controller diagram in more detail and try to find the throughput limits between its components.
The higher IO throughput of the new CPU makes it fairly simple to connect the main components. Let’s start with the host adapters (front end).
As the figure shows, the PCIe slots are x8. It is known that the new controllers use the same FC HBA as the first generation: the 4-port PM8032 Tachyon QE8 HBA designed by PMC-Sierra, Inc., which supports PCIe 2.0.
Note that a PCIe x8 slot working at version 2.0 provides up to 3.2 GBps of throughput in one direction. This is half of what PCIe 3.0 provides in the same configuration (6.4 GBps), i.e. there is headroom for future 16Gbps HBAs that will work with PCIe 3.0.
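To see where the 3.2 and 6.4 GBps figures come from, the Python sketch below computes per-direction PCIe bandwidth from the per-lane transfer rate and line encoding defined by the PCIe 2.0 and 3.0 specifications. It ignores packet (TLP/DLLP) overhead, so it yields an upper bound; the function name is ours.

```python
# Per-lane transfer rate (GT/s) and line-code efficiency for each generation.
PCIE_GENS = {
    2: (5.0, 8 / 10),     # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0: 8 GT/s, 128b/130b encoding
}
ARTICLE_X8_GB_S = {2: 3.2, 3: 6.4}  # per-direction figures used in this article


def pcie_gb_s(gen: int, lanes: int) -> float:
    """Per-direction bandwidth in GB/s after line encoding (no packet overhead)."""
    gt_per_s, efficiency = PCIE_GENS[gen]
    return gt_per_s * efficiency * lanes / 8  # Gbit/s -> GB/s


for gen, quoted in ARTICLE_X8_GB_S.items():
    raw = pcie_gb_s(gen, 8)
    print(f"PCIe {gen}.0 x8: {raw:.2f} GB/s after encoding, "
          f"article uses {quoted} GB/s ({quoted / raw:.0%})")
```

This gives about 4.0 GB/s for PCIe 2.0 x8 and about 7.9 GB/s for PCIe 3.0 x8. The article's figures are roughly 80% of these, presumably to allow for packet and protocol overhead; either way, PCIe 3.0 roughly doubles the per-slot bandwidth.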
We know little about the new 10Gbps iSCSI/FCoE HBA. As the figure shows, it is installed in a PCIe x8 slot.
Let’s consider the new Intel chipset, which supports hardware compression, provides 4 1Gbps Ethernet ports and 2 USB 2.0 ports, and connects the embedded flash drive that holds the controller software.
This is the Intel 89xx chipset series (Coleto Creek), which integrates well with the CPU and provides hardware compression and encryption. IBM says nothing about encryption, but perhaps we will see it in future software releases. Note also the small TPM (Trusted Platform Module) block: a crypto processor is present, but it is not clear when it will be officially enabled.
The main characteristics of the Intel 89xx chipsets are shown in the next figure:
The most interesting parameter for us is the compression speed up to 24Gbps.
The compression card is installed in a PCIe x16 slot. As the table shows, the chip can use different numbers of PCIe lanes (x16, x8, x4) and supports the PCIe 2.0 specification.
Thus, one compression chip is located on the board, and a second chip can be located on a card installed in the PCIe x16 slot. The on-board chip is connected directly to the CPU with 4 PCIe lanes, as you can see in the figure with slots. These lanes are DMI (Direct Media Interface).
The question arises: how are the other PCIe lanes of the on-board chip connected?
It is known that the on-board compression chip works only if the special pass-through card is installed in the PCIe x16 slot. But if two devices share one PCIe x16 slot, each device can use only 8 lanes.
When a second Intel 89xx chip is used, it and the on-board chip are connected to the CPU through a PLX Technology PCIe switch (PEX 8xxx). V7000 Gen1 uses the PEX8648 switch; device IDs for Gen1 are listed in the article http://it-consultant.su/page/2/.
In the picture of the compression card you can see two large chips with heatsinks. The smaller one is the PCIe switch, and the other is the Intel 89xx chip.
It follows that the wiring of the PCIe x16 slot is split: half of the lanes connect the CPU to the PCIe switch, the other half connect the on-board Intel 89xx chip, and the second chip is connected directly to the PCIe switch.
I expect that the PEX8xxx switch used here supports PCIe 3.0 on its link to the CPU. Eight PCIe 3.0 lanes from the CPU and 16 PCIe 2.0 lanes from the two Intel 89xx compression chips (8 each) connect to the switch.
Thus, the maximum throughput of 8 PCIe 3.0 lanes (6.4 GBps) matches the combined throughput of two 8-lane PCIe 2.0 links (3.2 GBps each).
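The balance claim can be checked as an oversubscription ratio: total downstream bandwidth divided by the switch's upstream bandwidth. The Python sketch below uses the article's per-direction figures and the topology inferred above (PCIe 3.0 x8 upstream, two PCIe 2.0 x8 links downstream); that topology is the author's expectation, not published IBM data, and the function name is ours.

```python
from typing import Sequence

ARTICLE_X8_GB_S = {2: 3.2, 3: 6.4}  # per-direction x8 figures used in this article


def oversubscription(upstream_gb_s: float, downstream_gb_s: Sequence[float]) -> float:
    """Total downstream bandwidth divided by the switch's upstream link bandwidth."""
    return sum(downstream_gb_s) / upstream_gb_s


# Upstream: PCIe 3.0 x8 to the CPU. Downstream: two Intel 89xx chips on PCIe 2.0 x8 each.
ratio = oversubscription(ARTICLE_X8_GB_S[3], [ARTICLE_X8_GB_S[2]] * 2)
print(f"compression switch oversubscription: {ratio:.2f}")
```

A ratio of 1.0 means the upstream link can carry both compression chips at full speed. With raw encoded line rates (8.0 GB/s downstream versus about 7.9 GB/s upstream) the ratio is about 1.02, so the conclusion holds either way.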
Other PCIe 3.0 switches (PLX Technology PEX 8xxx) are used for data exchange between the storage controllers. Presumably, such a switch connects to the CPU with 8 PCIe 3.0 lanes, so the interconnect between controllers provides up to 6.4 GBps in one direction.
Probably, PCIe extensions on the PLX Technology switches are used to provide low-latency communication between the controllers’ caches via RDMA (Remote Direct Memory Access).
Let’s consider the last component of the scheme: the connection of disk devices (back end) to the PCIe fabric. If you count the lanes used for the tasks listed above, you find that only 8 of the 40 remain. I assume that these 8 PCIe 3.0 lanes connect the back-end devices, so the bandwidth in one direction is no more than 6.4 GBps.
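The lane count behind this estimate is a simple budget. The Python sketch below restates it; the allocation is the author's inference from the diagrams, not IBM-published data, and, as in the text, the 4 DMI lanes to the on-board Intel 89xx chip are not counted among the 40.

```python
CPU_PCIE3_LANES = 40  # Xeon E5-2628L v2; the 4 DMI lanes to the on-board 89xx are separate
ARTICLE_X8_GEN3_GB_S = 6.4  # per-direction PCIe 3.0 x8 figure used in this article

# Lane consumers as inferred in the text (estimates, not IBM-published data).
lane_use = {
    "host HBA, slot 2": 8,
    "host HBA, slot 3": 8,
    "compression PCIe switch (slot 1)": 8,
    "inter-controller PCIe switch": 8,
}

backend_lanes = CPU_PCIE3_LANES - sum(lane_use.values())
backend_gb_s = backend_lanes / 8 * ARTICLE_X8_GEN3_GB_S  # scale the x8 figure by lane count
print(backend_lanes, backend_gb_s)
```

Four x8 consumers use 32 lanes, leaving 8 lanes, or about 6.4 GBps per direction, for the back end.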
As a result, the components are connected as shown in the following scheme:
The back end is based on an on-board PMC-Sierra SAS controller of the SPC series (PM8xxx), which provides multiple SAS ports operating at up to 12Gbps (V7000 Gen1 controllers use the PM8001, which has 8 ports at up to 6Gbps and is connected to the PCIe 2.0 fabric).
The SAS expanders used in the disk shelves are connected to these ports, and the disk drives are connected to the expanders. Each SAS expander also has an additional port for daisy-chaining the next SAS expander.
The connection scheme looks like this:
Each 12Gbps SAS port of the PMC-Sierra SPC PM8xxx chip consists of 4 lanes (SAS x4) and has a bandwidth of 4.4 GBps.
Note that the drives in the control enclosure are connected through two such ports.
This suggests that the disk connection throughput in the control enclosure is much higher. That is why, for example, it is recommended to place high-performance drives (flash) in the control enclosure first; this keeps the system balanced.
The other 2 ports form two cascades for connecting additional disk shelves.
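The Python sketch below compares the bandwidth available to each drive group, using the article's 4.4 GBps per-port figure alongside a line-rate ceiling for a SAS-3 x4 port (12Gbps per lane with 8b/10b coding, about 4.8 GB/s). The port-to-group mapping follows the description above; names are ours.

```python
ARTICLE_PORT_GB_S = 4.4  # per-port figure quoted in the text


def sas3_port_ceiling_gb_s(lanes: int = 4, lane_gbit: float = 12.0) -> float:
    """Line-rate ceiling of a SAS-3 wide port; SAS-3 uses 8b/10b line coding."""
    return lanes * lane_gbit * (8 / 10) / 8  # Gbit/s -> GB/s


# SAS x4 ports per drive group, as described above.
ports_per_group = {
    "control enclosure": 2,
    "cascade 1 (up to 10 shelves)": 1,
    "cascade 2 (up to 10 shelves)": 1,
}
for group, ports in ports_per_group.items():
    print(f"{group}: {ports * ARTICLE_PORT_GB_S:.1f} GB/s (article), "
          f"{ports * sas3_port_ceiling_gb_s():.1f} GB/s (line-rate ceiling)")
```

The 24 drives of the control enclosure share two ports (8.8 GBps), while each cascade of up to 10 shelves shares a single port (4.4 GBps), which is consistent with the recommendation to place flash in the control enclosure. Note that all four ports together exceed the roughly 6.4 GBps estimated for the back-end PCIe link, so under these assumptions aggregate back-end throughput would be limited by PCIe rather than by SAS.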
Allocation of processor and memory resources when using hardware compression
Real-time Compression (RtC) requires significant resources.
Up to 4 CPU cores on the controller can be used when RtC is active.
If no additional compression card is installed, 12 GB of the controller’s 32 GB of memory is allocated to RtC.
If an additional compression card is installed, each controller gets 32 GB of additional memory that is used only for RtC, plus 6 GB allocated from main memory. Thus, 38 GB of memory per controller is allocated to RtC.
In this case, the controller pair can support up to 200 compressed volumes.
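The two memory layouts can be compared directly. The Python sketch below restates the per-controller figures from the text; the function name is ours, and "left for other tasks" simply means main memory not taken by RtC.

```python
from typing import Tuple

BASE_MEMORY_GB = 32      # main memory per controller
RTC_CARD_MEMORY_GB = 32  # extra memory per controller with the compression card, RtC only


def rtc_memory_gb(with_card: bool) -> Tuple[int, int]:
    """Return (memory allocated to RtC, main memory left for other tasks) per controller."""
    from_main = 6 if with_card else 12  # taken from main memory, per the text
    rtc_total = from_main + (RTC_CARD_MEMORY_GB if with_card else 0)
    return rtc_total, BASE_MEMORY_GB - from_main


print(rtc_memory_gb(False))  # (12, 20)
print(rtc_memory_gb(True))   # (38, 26)
```

Installing the card therefore not only adds dedicated RtC memory but also returns 6 GB of main memory (26 GB instead of 20 GB) to the Linux kernel, cache and other tasks.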
Cache memory architecture
The figures below show the schemes of IO processing in the cache for V7000 Gen1 and Gen2:
V7000 Gen2 introduced the division of cache memory into two areas: Upper Cache and Lower Cache.
Apparently, the structure and algorithms of the cache memory (Cache Partitioning, Destaging, LRU, Write Throttling, Prefetching, Full Stripe Writes) remain the same as in V7000 Gen1, but the FlashCopy, Volume Mirroring and Real-time Compression modules now sit above the Lower Cache, which is responsible for R/W caching, metadata, etc. Most likely, this gives an additional advantage in data processing speed.
The main role of the Upper Cache is to optimize the transfer of a copy of write data to the partner controller.
Some parameters of the cache memory:
- The size of a memory page – 4 KB.
- The granularity of the data in the destage process – 32 KB (8 pages).
- The size of Upper Cache – 256 MB.
- The size of Lower Cache = Cache – Upper Cache.
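The cache parameters above relate through simple arithmetic. The Python sketch below computes the destage unit, the number of 4 KB pages in the Upper Cache, and the Lower Cache size for a cache of a given size; the 20 GB cache in the example is a hypothetical value chosen for illustration, not an IBM figure.

```python
PAGE_KB = 4            # cache memory page
PAGES_PER_DESTAGE = 8  # destage granularity in pages
UPPER_CACHE_MB = 256   # fixed Upper Cache size


def lower_cache_mb(cache_mb: int) -> int:
    """Lower Cache = Cache - Upper Cache (all sizes in MB)."""
    if cache_mb < UPPER_CACHE_MB:
        raise ValueError("cache must be at least as large as the Upper Cache")
    return cache_mb - UPPER_CACHE_MB


destage_kb = PAGE_KB * PAGES_PER_DESTAGE               # 32 KB per destage unit
upper_cache_pages = UPPER_CACHE_MB * 1024 // PAGE_KB   # number of 4 KB pages
print(destage_kb, upper_cache_pages, lower_cache_mb(20 * 1024))  # hypothetical 20 GB cache
```

The Upper Cache holds 65,536 pages, and with the hypothetical 20 GB cache the Lower Cache would be 20,224 MB.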
Allocation of cache memory among tasks on a single controller:
|Cache size, GB||RtC||Linux kernel, GB||RtC, GB||Max Write Cache, GB|
Hybrid pools based on EasyTier v3
The new generation adds a mechanism for automatic load balancing within a pool and the ability to use up to 3 tiers.
Possible tier configurations:
Automatic balancing moves logical blocks both within a tier and between tiers. The block is the pool's extent: all MDisks in the pool are logically divided into extents.
The extent size is specified when the pool is created and can range from 16 MB to 8192 MB. The maximum LUN size in the pool and the maximum capacity of the system depend on the extent size, because the number of extents is limited.
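Because the number of extents is capped, maximum capacity scales linearly with extent size. The Python sketch below shows this relationship. The cap of 2^22 (4,194,304) extents per system is an assumption based on limits IBM documented for SVC/Storwize code of this period; check the configuration limits for your software level before relying on it. MB is treated as MiB.

```python
MAX_EXTENTS = 2 ** 22  # ASSUMED system-wide extent limit; check the limits for your code level
EXTENT_SIZES_MB = [16 * 2 ** i for i in range(10)]  # 16, 32, ..., 8192 MB


def max_capacity_tib(extent_mb: int, max_extents: int = MAX_EXTENTS) -> float:
    """Largest manageable capacity in TiB for a given extent size."""
    return extent_mb * max_extents / 2 ** 20  # MiB -> TiB


for size in EXTENT_SIZES_MB:
    print(f"{size:>5} MB extents -> {max_capacity_tib(size):>8,.0f} TiB")
```

Under this assumption, 16 MB extents cap the system at 64 TiB and 8192 MB extents at 32,768 TiB (32 PiB), so the extent size should be chosen with the planned capacity in mind.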
Easy Tier collects statistics on extent utilization (IOPS, response time, etc.) and creates a plan for moving extents between storage tiers. The plan is rebuilt every 24 hours, after which the extent migration process starts. The maximum migration speed is about 30 MBps.
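The 30 MBps migration rate bounds how much data Easy Tier can relocate per planning cycle. The Python sketch below estimates this, assuming migration runs at the maximum speed for the whole 24-hour interval (an optimistic upper bound); the function names are ours.

```python
MIGRATION_MB_S = 30   # approximate maximum migration speed quoted above
PLAN_INTERVAL_H = 24  # a new migration plan is built every 24 hours


def max_extents_per_cycle(extent_mb: int) -> int:
    """Upper bound on extents that can be moved before the next plan is built."""
    mb_per_cycle = MIGRATION_MB_S * PLAN_INTERVAL_H * 3600
    return mb_per_cycle // extent_mb


def migration_hours(data_gb: float) -> float:
    """Time to migrate `data_gb` GB at the maximum speed (1 GB = 1024 MB)."""
    return data_gb * 1024 / MIGRATION_MB_S / 3600


print(max_extents_per_cycle(256))       # 10125
print(round(migration_hours(1024), 1))  # 1 TB of data
```

At this rate, moving 1 TB takes almost 10 hours, and at most about 2.5 TB (2,592,000 MB) can be relocated within one 24-hour cycle.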
Hardware compatibility with V7000 Gen1
- Disk shelves and hard drives from V7000 Gen1 cannot be used with the controller pair V7000 Gen2.
- The controller pair V7000 Gen2 can be clustered with V7000 Gen1.
- Remote replication works without restrictions with V7000 Gen1 and SVC.
We finish our review with a small comparative table:
|IBM Storwize||V7000 Gen2|
|Max drives per controller pair (IO Group)||504|
|Max memory per controller, GB||64|
|Max memory per controller pair (IO Group), GB||128|
|CPU type||Xeon E5|
|CPU per controller||1|
|CPU core q-ty||8|
|CPU core clock, GHz||1.9|
|Max FC 8Gbps ports per controller (FE)||8|
|Max FC 8Gbps ports per controller pair (IO Group) (FE)||16|
|Max iSCSI 10Gbps ports per controller (FE)||4|
|Max iSCSI 10Gbps ports per controller pair (IO Group) (FE)||8|
|Max Disks per RAID Group||16|
|Max Disks per Pool||128|