Sunday, December 12, 2010

Sub-LUN Tiering Design Discussion

Nigel Poulton's recent blog post on storage architecture considerations when using sub-LUN tiering is very thoughtful, and I appreciate his approach and concern for the subject. Indeed, one of the challenges for me in working with new Compellent customers is helping them understand the different approach to storage design required by automated tiering with sub-LUN granularity.

I wanted to address one point that is particular to Compellent (and maybe some others, I'm not certain), and that is that RAID striping is dynamic, variable and part of the Data Progression process. In Nigel's example, he shows a three-tiered system with various drive types and sizes already pre-carved to suit RAID protection (in the example case, all RAID 6 protected). In the Compellent world, the only decision administrators need to make is an appropriate level of parity protection per tier (typically based on drive size and speed, which all goes back to rebuild times). As a best practice, customers are advised to use dual parity protection (which includes both RAID 6 and RAID 110*) for drives 1TB or larger.
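
As a rough sketch of that rule of thumb in code (the function name and the 1TB cutoff here are just my own illustration of the best practice, not anything from Compellent's tools):

```python
# Illustrative sketch of the per-tier protection rule of thumb above.
# The 1TB cutoff is the stated best practice; the function is hypothetical.

def recommended_protection(drive_size_gb: int) -> str:
    """Pick a parity protection scheme for a tier based on drive size."""
    if drive_size_gb >= 1000:
        # Big drives mean long rebuild windows, so protect with
        # dual parity: RAID 6 or dual-mirrored RAID 10 ("RAID 110").
        return "dual parity (RAID 6 / RAID 110)"
    return "single parity (RAID 5 / RAID 10)"

print(recommended_protection(2000))  # dual parity (RAID 6 / RAID 110)
print(recommended_protection(450))   # single parity (RAID 5 / RAID 10)
```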

That aside, I tend to agree with Nigel on a three-tiered approach when bringing SSD into the picture. However, in configurations with spinning rust only, there's usually no need for both 15K and 10K drives, particularly with the capacities now available in 15K drives and the density of 2.5" 10K drives.
Two rules of thumb can help administrators plan for sub-LUN tiering:

  • Size for performance first and capacity second.
  • Performance should never be addressed with slow, fat disks.

Sizing for performance first allows you to potentially address all of your capacity in a single tier. Using 7200 RPM drives as your primary storage raises issues of performance degradation during rebuilds, lower reliability, and decreased performance under dual parity protection schemes. These are rules of thumb, as I said, so please no comments about exceptions - I know they exist.

Point is, using the rules above you can pretty easily draft a solution design if you understand the performance and capacity requirements.

For example, a solution requiring 4000 IOPS and 8TB of storage could be configured as:

Tier 1 - Qty 24 146GB 15K SAS drives (RAID10/RAID5-5)
Tier 2 - Null
Tier 3 - Qty 12 1TB 7200 SAS drives (RAID110/RAID6-6)

On the other hand, a solution needing only 2500 IOPS and 6TB could be designed with:

Tier 1 - Qty 24 450GB 10K SAS drives (RAID10/RAID5-5)
Tier 2 - Null
Tier 3 - Null

Additional capacity tiering could be added in the future as needed, provided that performance requirements don't grow. These are simplistic examples, and they really only provide starting points for the solution; in practice they'd be tweaked to balance initial cost against anticipated growth in demand.
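
For the back-of-envelope math behind drive counts like those, here's a rough sizing sketch. The per-drive IOPS figures and the 70% usable-capacity factor are my own planning estimates, not vendor specs:

```python
import math

# Rough per-drive IOPS planning numbers (my estimates, not vendor specs).
IOPS_PER_DRIVE = {"15K": 180, "10K": 140, "7.2K": 75}

def drives_for_performance(required_iops: int, drive_type: str) -> int:
    """Rule 1: size for performance first."""
    return math.ceil(required_iops / IOPS_PER_DRIVE[drive_type])

def drives_for_capacity(required_tb: float, drive_tb: float,
                        usable_fraction: float = 0.7) -> int:
    """Rule 2: land bulk capacity on fat disks, allowing for RAID/spare overhead."""
    return math.ceil(required_tb / (drive_tb * usable_fraction))

# First example: 4000 IOPS handled by 15K drives, 8TB landed on 1TB drives.
print(drives_for_performance(4000, "15K"))  # 23 -> round up to 24 for even shelves
print(drives_for_capacity(8, 1.0))          # 12 drives at ~70% usable
```
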
So far those examples don't include SSD, I know. Whether and where SSD fits depends a good bit on the application requirements and behavior, and this is where sub-LUN tiering adds value but system design gets a bit more difficult.

Consider an example where 600 virtual desktops are being deployed using VMware View 4.5 - we have the ability to create data stores for three different components in that environment:

  • Read only boot objects can be isolated and stored on the fastest storage (SSD) and are space efficient since multiple guests can read from the same data set.
  • Writeable machine delta files can be stored on 15K or even 10K drives to avoid problems associated with SSD overwrite performance.
  • User data can be stored on two tiers - high performing disk for active data and capacity drives for inactive data.

So in this case we may deploy a solution similar to this very high-level design (I'm assuming some things here and really won't go into detail about images per replica, IOPS per image, or parent image size):

Tier 1 (Replica boot objects) - Qty 6 200GB SSD SAS
Tier 2 (Delta and user disks) - Qty 48 600GB 10K SAS
Tier 3 - Null (may be added in the future for capacity growth)
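
To make the replica math concrete, here's a hypothetical sketch; every parameter below stands in for the details I said I wouldn't go into (images per replica, IOPS per image, parent image size):

```python
import math

# All values are hypothetical placeholders, not View 4.5 guidance.
DESKTOPS = 600
IMAGES_PER_REPLICA = 64    # assumed replica fan-out
REPLICA_SIZE_GB = 20       # assumed parent image size
IOPS_PER_DESKTOP = 10      # assumed steady-state IOPS per desktop

replicas = math.ceil(DESKTOPS / IMAGES_PER_REPLICA)
print(f"Tier 1 holds ~{replicas * REPLICA_SIZE_GB}GB of read-mostly replicas")
print(f"Tier 2 absorbs ~{DESKTOPS * IOPS_PER_DESKTOP} mostly-write IOPS")
```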

In the end, the administrator really only has to watch two overall metrics in the environment for planning and growth trending: performance and capacity. Either can be added to the solution independently to address that specific need.

Again, this is all slanted toward the Compellent architecture but I do appreciate Nigel bringing this up as storage customers are going to be facing this more often and should start to get a handle on it sooner rather than later.

* My term for Compellent's Dual Mirrored RAID 10 scheme; I'm always trying to get a new phrase to catch on :)