Sunday, December 12, 2010

Sub-LUN Tiering Design Discussion

Nigel Poulton's recent blog posting on storage architecture considerations when using sub-LUN tiering is very thoughtful and I appreciate his approach and concern for the subject. Indeed, one of the challenges for me in working with new Compellent customers is helping them understand the different approach to storage design using automated tiering with sub-LUN granularity.

I wanted to address one point which is particular to Compellent (and maybe some others, I'm not certain), and that is that RAID striping is dynamic, variable and part of the data progression process. In Nigel's example, he shows a three tiered system with various drive types and sizes already pre-carved to suit RAID protection (in the example case all RAID 6 protected). In the Compellent world, the only decision administrators need to make is an appropriate level of parity protection per tier (typically based on drive size and speed, which all goes back to rebuild times). As a best practice, customers are advised to use dual parity protection (which includes both RAID 6 and RAID 110*) for drives 1TB or larger.

That aside, I tend to agree with Nigel on a three tiered approach when bringing SSD into the picture. However, in configurations with spinning rust only, there's usually no need for both 15K and 10K drives, particularly with the capacities now available in 15K and the density of 2.5" 10K drives.
Two rules of thumb can help administrators plan for sub-LUN tiering -

Size for performance first and capacity second
Performance should never be addressed with slow, fat disks

Sizing for performance first allows you to potentially address all of your capacity in a single tier. Using 7200 RPM drives as your primary storage brings up issues of performance degradation during rebuilds, lower reliability and decreased performance with dual parity protection schemes. Rules of thumb, as I stated, so please no comments about exceptions - I know they exist.

Point is, using the rules above you can pretty easily draft a solution design if you understand the performance and capacity requirements.

For example, a solution requiring 4000 IOPS and 8TB of storage could be configured as

Tier 1 - Qty 24 146GB 15K SAS drives (RAID10/RAID5-5)
Tier 2 - Null
Tier 3 - Qty 12 1TB 7200 SAS drives (RAID110/RAID6-6)

On the other hand, a solution needing only 2500 IOPS and 6TB could be designed with:

Tier 1 - Qty 24 450GB 10K SAS drives (RAID10/RAID5-5)
Tier 2 - Null
Tier 3 - Null

Additional capacity tiering could be added in the future as needed, provided that performance requirements don't change (grow). These are simplistic examples and they really only provide starting points for the solution; they will be tweaked to improve initial cost and to accommodate anticipated growth in demand.
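
For the curious, here's a rough back-of-the-napkin sketch of the "performance first, capacity second" arithmetic. The per-drive IOPS figures and RAID write penalty below are my own generic planning assumptions, not Compellent sizing guidance, so don't expect the drive counts to line up exactly with the examples above - use your vendor's sizing tools for real designs.

```python
import math

# Rough random IOPS per spindle -- generic planning assumptions, not vendor specs.
DRIVE_IOPS = {"15K": 180, "10K": 140, "7200": 80}

def drives_needed(workload_iops, capacity_tb, drive_type, drive_tb,
                  read_pct=0.7, write_penalty=2):
    """Spindles required to satisfy both the IOPS and the raw capacity target."""
    # Back-end IOPS: reads pass straight through, writes get multiplied by the RAID penalty.
    backend_iops = workload_iops * (read_pct + (1 - read_pct) * write_penalty)
    for_performance = math.ceil(backend_iops / DRIVE_IOPS[drive_type])
    for_capacity = math.ceil(capacity_tb / drive_tb)
    return max(for_performance, for_capacity)

# 4000 IOPS / 8TB example: the fast tier is sized by IOPS, the capacity tier by terabytes.
print("Tier 1 (15K):", drives_needed(4000, 1, "15K", 0.146), "drives")
print("Tier 3 (7200):", drives_needed(0, 8, "7200", 1.0), "drives")
```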
So far those examples don't include SSD, I know. However, that's going to depend a good bit on the application requirements and behavior, and this is where sub-LUN tiering adds value but system design gets a bit more difficult.

Consider an example where 600 virtual desktops are being deployed using VMware View 4.5 - we have the ability to create data stores for three different components in that environment:

  • Read only boot objects can be isolated and stored on the fastest storage (SSD) and are space efficient since multiple guests can read from the same data set.
  • Writeable machine delta files can be stored on 15K or even 10K drives to avoid problems associated with SSD overwrite performance.
  • User data can be stored on two tiers - high performing disk for active data and capacity drives for inactive pages.
So in this case we may deploy a solution similar to this very high level design (I'm assuming some things here and really won't go into details about images per replica, IOPS per image, parent image size):

Tier 1 (Replica boot objects) Qty 6 200GB SSD SAS
Tier 2 (Delta and user disks) Qty 48 600GB 10K SAS
Tier 3 Null (may be added in future for capacity growth)
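
If you want to see the shape of the napkin math behind a layout like that, here's a purely illustrative sketch. Every per-desktop number in it (IOPS per desktop, replica size, delta and user data growth) is a hypothetical placeholder I made up for the example, not View sizing guidance.

```python
# Illustrative VDI datastore sizing -- all per-desktop figures are hypothetical placeholders.
desktops = 600
iops_per_desktop = 12          # assumed steady-state IOPS per running desktop
replica_gb = 25                # assumed parent/replica image size
delta_gb_per_desktop = 3       # assumed linked-clone delta growth
user_data_gb_per_desktop = 5   # assumed user data footprint

replica_capacity_gb = replica_gb * 2                         # a couple of read-mostly replicas -> SSD tier
delta_user_tb = desktops * (delta_gb_per_desktop + user_data_gb_per_desktop) / 1024
total_iops = desktops * iops_per_desktop

print(f"Replica tier: ~{replica_capacity_gb} GB, read-heavy -> SSD")
print(f"Delta + user tier: ~{delta_user_tb:.1f} TB, ~{total_iops} IOPS -> 10K SAS")
```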

In the end, the administrator really only has to watch two overall metrics in the environment for planning and growth trending: performance and capacity. Either one can be added to the solution to address that specific need as it arises.

Again, this is all slanted toward the Compellent architecture but I do appreciate Nigel bringing this up as storage customers are going to be facing this more often and should start to get a handle on it sooner rather than later.

* My term for Compellent's Dual Mirrored RAID 10 scheme; I'm always trying to get a new phrase to catch on :)

Tuesday, November 23, 2010

The Morning After

Compellent's big news yesterday generated a lot of traffic and I'm just catching up having otherwise been engaged in pre-sales meetings.  Overall, I think the Compellent marketing crew did a great job and is to be commended for delivering the message globally and in a consistent manner.  And the message seems to have been well received. 


Chris Mellor quoted Chris Evans in a great writeup over at The Register.  I have a great deal of respect for both Chrises but wanted to respond specifically to one key point concerning Live Volume (red font coloring is my doing):

Live Volume will place an extra processing burden on the controller. Storage consultant Chris Evans said: "With Live Volume, as a LUN is moved, the new target array has to both serve data and move data to the new location.  This put significant extra load on the new target. I don't know how many arrays can be in Live Volume, but I would imagine the intention from Compellent would be to have a many to many relationship. If that's the case then I can see a lot of that extra [controller] horsepower being taken up moving data between arrays and handling I/O for non-local data."
Keep in mind that Live Volume is an extension of Remote Instant Replay (Compellent's replication suite) with the ability to mount the target while replication is underway.  In other words, no data is being moved that wouldn't normally be moved during a replication job.  The additional functionality serves IO at the target site by having the target replication device become a pass-through to the source.  The cutover of a volume from one system (array) to a target basically involves the same computational workload as activating the DR site under a traditional replication scheme.  I guess maybe Chris (Evans) is referring to the pass-through IO on the target side being an extra burden, but if you consider that the whole point is to transfer workloads then I don't see an undue burden being placed on the target system - it will assume the source role if IO or throughput exceeds the configured threshold anyway.

Like Chris Evans, I can see Live Volume evolving into a many-to-many product eventually, since Remote Instant Replay already supports this type of replication.  In fact, the possibilities are exciting and I'm sure (but not in the loop sad to say) that more enhancements will be coming - personally I'd like to see some level of OS awareness of this movement so that outside events could trigger Live Volume movement.

Sunday, November 14, 2010

The Thick and the Thin of Provisioning


Last week there was an interesting exchange between two storage industry analysts on the topic of thinly provisioned storage.  The discussion revolved around the value of thin provisioning in the most common sense - lowering cost of storage by avoiding the purchase of tomorrow's storage at today's prices.  I'm not going to rehash that discussion other than to say that I agree with the proposition that thin provisioning can lower total cost of ownership.  However, I came to realize that there's really more to be said for thin provisioning.

First, let's agree on something which should really be obvious but I think needs to be stated up front in any discussion on thin provisioning.  

THIN PROVISIONING IS NOT OVER ALLOCATION

If this seems contrary to what you understand thin provisioning to be then blame the marketing hype.  One of the questions I'm most frequently asked is, "What happens when my thinly provisioned volumes fill up?" and my answer is "The same thing that happens when your thickly provisioned volumes fill up!"  In other words, don't use thin provisioning to try and avoid best practices and planning.  Unless you have a very firm grip on your storage growth and have a smooth and responsive procurement process (and planned budget) you're better off not over allocating your storage.  That's my two cents.

Now that I've made at least one industry analyst happy let me explain why I still think thin provisioning is a feature worth having and a necessary part of a storage virtualization solution's feature set (yes, on the array). 

As a storage administrator, having information about actual capacity utilization is pure gold.  It's not good enough to know how much storage you've provisioned - you really need to understand how that storage is being used in order to drive efficiency and control cost (not to mention plan and justify upgrades).  In many shops, storage and the application teams are siloed and obtaining information about storage utilization above the block level provisioning is usually difficult and very likely not accurate.  Consider also that storage consumption can be reported on a lot of different levels and with many different tools.  Collecting and coalescing that information can be time consuming and frustrating.

In a thinly provisioned storage array the administrator can tell in an instant what the utilization rates are and also trend utilization for planning and budgeting purposes.  And, yes, the information can also be used to question new storage requests when existing storage assignments are poorly utilized or over sized.
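
To illustrate why that visibility is so valuable, here's a toy utilization report of the sort any thin-provisioned array lets you pull at a glance. The volume names and numbers are invented, and a real array surfaces this through its own management tools rather than anything resembling this sketch.

```python
# Toy utilization report for thin-provisioned volumes -- names and numbers are invented.
volumes = [
    # (volume name, provisioned GB, actually consumed GB)
    ("sql-prod-data", 500, 180),
    ("file-server-01", 2000, 650),
    ("vmware-ds-03", 1000, 940),
]

total_prov = sum(p for _, p, _ in volumes)
total_used = sum(u for _, _, u in volumes)

for name, prov, used in volumes:
    pct = 100 * used / prov
    flag = "  <-- oversized?" if pct < 40 else ""
    print(f"{name:16} {used:5}/{prov:5} GB  ({pct:4.1f}% used){flag}")

print(f"\nArray-wide: {total_used}/{total_prov} GB "
      f"({100 * total_used / total_prov:.1f}% of provisioned capacity actually consumed)")
```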

Although thin provisioning is provided at various other layers of the stack outside of the array, that doesn't devalue the single pane management benefits associated with array based thin provisioning.  For example, VMware administrators must select among three storage provisioning options when creating a virtual disk (zeroedthick, thin and eagerzeroedthick).  There may be rationale for using thin provisioning tools within the LVM or application, but that should only apply to use cases within that system or solution - in Compellent's case it matters not, because the block level device will be thinly provisioned regardless of any higher layer sparse allocation.  In short, a shared storage model requires thin provisioning at the storage layer to drive efficiency for the entire environment (which, by the way, is justification for storage virtualization at the array in general).

Thin provisioning could be considered a side effect of virtualizing storage and actually assists in delivery of other virtualization features such as snapshots, automated tiering, cloning and replication.  Foremost is the reduction in the amount of work that must be done to manage storage for other features.  With thick volumes the zero space must be manipulated as if it were "load bearing" - and in the case of volumes sized for growth this could be significant on a large scale.  For example, a new 100GB LUN, thickly provisioned, would need to be allocated storage from some tier.  Maybe that's all tier 1 storage, which would eat up expensive capacity.  Maybe it's tier 3 storage, which means performance might suffer while the system figures out that some pages just got very active and need to be promoted to a higher tier.  Even if some rough assumptions were made and the LUN was provisioned out of a 50/50 split of high performance and lower cost storage, there's still going to be some inefficient use of the overall array.

Likewise, a feature which involves making a copy of stored data, such as cloning or replication, would be more costly if the entire thick (and underutilized) volume were being copied.  Many storage virtualization products provide the capability to create a volume as a golden image for booting and then assign thinly provisioned copies of that boot image to new servers, conserving storage.  Without thin provisioning you could still dole out servers from a golden image, of course, but why not deduplicate?  Yes, thin provisioning is a form of deduplication when you think about it.
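
A toy illustration of that last point, with entirely made up numbers: thin clones that reference a shared golden image only consume space for their own changed blocks.

```python
# Toy space accounting for thin clones of a golden image -- numbers invented for illustration.
golden_image_gb = 40
servers = 25
changed_gb_per_server = 3      # assumed unique (written) data per cloned server

thick_clones_gb = servers * golden_image_gb                           # every clone carries a full copy
thin_clones_gb = golden_image_gb + servers * changed_gb_per_server    # shared base + per-server deltas

print(f"Thick copies: {thick_clones_gb} GB")
print(f"Thin clones:  {thin_clones_gb} GB (deduplicated against the shared golden image)")
```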

Far from being a new problem to deal with, thin provisioning is a key feature in any virtualized storage solution and you don't have to over allocate your storage to get value from it.

Saturday, August 7, 2010

One Step Provisioning

When I managed an enterprise storage team, one of the pain points I had to deal with was my organization's desire (for many reasons) for separation of duties. Simply put, my storage team and contractors weren't given access to operating systems to complete storage provisioning tasks - they basically stopped at the HBA. Some of it had to do with audit controls, but there were also lines of responsibility that made server managers reluctant to allow anyone outside their team to have administrative credentials.

As you can imagine this added time and complexity to the relatively simple task of fully provisioning a new volume on a Windows or VMware ESX server.

So, I was pretty delighted to see Compellent introduce VMware integration into Enterprise Manager version 5, which allows a storage administrator to not only carve out storage and map/mask LUNs (or volumes in Compellent's world) to ESX servers, but to then have the related storage tasks in vCenter performed as well. With one activity and one administrator, storage is provisioned, mounted, formatted and available for use.  I can also configure and kick off replication (Remote Instant Replay) and snapshots (Data Instant Replay) if required at provisioning time.

By the way, this same functionality is available for MS Windows servers which are running Server Agent.

Here's a screen shot of the volume creation dialog from within Enterprise Manager you'll see when selecting an ESX server or cluster as the target for a new volume (that's right, I can create a volume and assign it to multiple nodes simultaneously).


When I click OK, Enterprise Manager will coordinate and issue commands to Storage Center for volume creation, mapping and replication, then the appropriate administrative steps will be taken with VMware to rescan for the new storage, create the new data store and present it to the ESX server or cluster.

I can also select a guest and perform soup to nuts RDM provisioning as well (again, including replication).



Super easy.  Here's a shot of the volume creation dialog with a Windows server as the target.  Note that I can assign a drive letter if needed or set up as a mount point.


If you're managing Compellent Storage Center and not using Enterprise Manager for day-to-day administration you may want to reconsider.  With version 5.3 there's really not much you can't do from Enterprise Manager and the additional benefit of one step provisioning may make your life a little simpler.

Friday, July 16, 2010

All The Benefits of Short Stroking Without The Calories

There were some interesting tweets that I mostly stayed away from today regarding the benefits of Compellent's Fast Track performance enhancement feature.  I'll let that particular dog be and await a clarification from the marketing folks at Compellent.  However, I do want to take the opportunity to talk about Fast Track and explain what it is - and what it is not (all in my humble opinion, of course).

Some background on short stroking.  Short stroking is a performance enhancement technique that improves disk IO response time by restricting data placement to the outer tracks of the disk drive platters, which keeps actuator seek distances short (and the outer tracks deliver higher transfer rates to boot).

This technique has been around for a long time and it's pretty easy to do (format and partition only a fraction - say 30% - of the disk capacity).  The trade off, obviously, is that your cost per usable gigabyte goes up, since most of the capacity sits idle.  You could partition the remainder of the disk and use that for data storage as well, but this could put you back into a wildly swinging actuator arm situation, which is precisely what you're trying to get away from.

Enter Fast Track.  As a feature of Compellent's Fluid Data Architecture it provides the benefits of having your read/write IO activity confined to the "sweet spot" of each and every disk in the system while placing less frequently accessed blocks in the disk tracks that would normally not be used in a traditional short stroking setup.  It's like cheating death.  OK, maybe not that good but it's certainly got benefits over plain old short stroking.

If you're familiar with Compellent's Data Progression feature, this is really just an extension of that block management mechanism.  Consider that the most actively accessed blocks are generally the newest.  So, if we assume that a freshly written block is likely to be read again very soon, it's a good bet that placing it in an outer track of a given disk with other active blocks will reduce actuator movement.  Likewise, a block or set of blocks that haven't been out on the dance floor for a few songs probably won't be asked to boogie anytime soon, or at least not frequently - so we can push those to the back row with the other 80% of the population.  It may take a few milliseconds to retrieve those relatively inactive blocks, but an occasional blip isn't likely to translate into application performance problems.  And this block placement is analyzed and optimized during each Data Progression cycle, so unlike short stroking, you're not sticking stale data in the best disk real estate.
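
A crude way to picture the mechanism - my own sketch, not Compellent's actual algorithm - is to rank blocks by how recently they were touched, keep the busiest fraction in the outer (fastest) tracks, and re-evaluate the placement on each cycle.

```python
# Crude illustration of recency-based block placement -- not the actual Fast Track algorithm.
def place_blocks(blocks, fast_zone_fraction=0.2):
    """blocks: list of (block_id, last_access_time). Returns (outer, inner) track placement."""
    ranked = sorted(blocks, key=lambda b: b[1], reverse=True)   # most recently touched first
    cutoff = max(1, int(len(ranked) * fast_zone_fraction))
    outer_tracks = [blk for blk, _ in ranked[:cutoff]]   # hot blocks -> outer ~20% of the platter
    inner_tracks = [blk for blk, _ in ranked[cutoff:]]   # cold blocks -> the remaining capacity
    return outer_tracks, inner_tracks

# Re-run at each progression cycle so placement tracks changing access patterns.
hot, cold = place_blocks([("blk1", 900), ("blk2", 100), ("blk3", 950), ("blk4", 40), ("blk5", 700)])
print("outer:", hot, "inner:", cold)
```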

So, in reality, Fast Track is an optimization feature which provides an overall performance boost without sacrificing storage capacity.  Comparisons to short stroking help explain the benefits of Fast Track but it's really much more than that.  Obviously, short stroking still provides you with the best guaranteed performance since you're removing variables that we have to live with in a contentious shared storage world.  But that's a wholly different issue - I've never advocated shared storage (using any product, mind you) as a way to increase disk performance.  Fast Track delivers a legacy performance tweaking concept into the shared storage economy without increasing administration complexity.

Friday, June 25, 2010

The Games We Play - How Console Games Are Like Integrated Stacks

In response to Chuck Hollis and his views on integrated versus differentiated stack infrastructure, I was most interested in his example case of building your own PC to make the point that integrated stacks will win out.

I’m not going to prognosticate – that’s for the industry giants like Chuck, Chris Mellor and others to debate. In the end, it doesn’t matter to me (and it’s one reason I moved to sales – there’s always something to sell and someone to buy it). But if you run a data center or are responsible for IT in your organization, it should.

Chuck’s example using PC technology is fine, if you don’t consider the application and the desired functionality. In my case, I build my own PCs primarily with gaming in mind – I’ve done this for many years. It’s interesting that even though console (think XBOX 360) gaming has outstripped PC game sales for a good long while now, and despite all the benefits of what you could call an “integrated gaming stack,” we still see PC gaming (or the best-of-breed, differentiated stack) hanging in there. Possibly making a few dying gasps for air even.

The parallels with enterprise concerns are interesting and I think we can draw some conclusions based on what’s going on in gaming currently.

First, let’s examine why the integrated gaming stack has been so popular and all but crushed legacy PC gaming.

  • Lower startup costs
  • Guaranteed compatibility
  • Ease of use / single interface
  • Integrates with other home entertainment technology

You can probably think of others, including the all important “cool factor” of something new and different (I’m picturing Eric Cartman waiting for the Wii to be GA). Don’t discount the cool factor for enterprise decisions – everyone’s always checking the other guy out to see what they’re up to and how they’re doing it.

So, the integrated gaming stack looks like a clear win, right? Maybe. Consider the following:

  • Substandard graphics, storage and processing capability versus PC gaming. I think we can all agree that giving up the pain associated with building your own gaming rig results in you having to accept some lower standards. While you have only yourself to blame for not researching your component technologies beforehand, in a console world you are dealing with the best of the mediocre or what I’ve heard some industry big wigs call “good enough” technology.
  • Technology lock-in, unless you hack your console (but then you forfeit ease of use, right?) In the PC world, I can buy, sell, trade (EULA permitting of course) games with all players. In the console world, I’m stuck if I buy the wrong stack – maybe not a big deal for gaming, but think about the enterprise that runs “XBOX 360” stacks and wants to merge with a company running on “PS3” applications… oh, some lucky “stack integrator of integrated stacks” stands to make some nice coin for the conversion.
  • Console games SHOULD theoretically be less expensive than PC editions because of the platform compatibility (in other words, consoles are all using the same hardware and drivers, while PC owners can choose from virtually limitless combinations of video, processor and input devices). However, console games are priced the same (and a couple of years ago were even about $10 more). Who’s benefiting from compatibility – the stack provider or the customer? (There’s another interesting situation going on with ebooks which should be cheaper to publish but somehow that savings isn’t being transferred to readers or authors).
  • Finally, the hidden costs of console gaming are rarely considered because they show up as a gradual tax rather than an upfront cost. Want to play online with your friends? That’s going to cost you extra. Want downloadable content for value added play? Buy some credits. Don’t forget specialized input devices for Rockband or other interactive games (which are very limited in selection, heavily licensed and typically of low quality).

Welcome to console gaming a la the integrated gaming stack – give us your credit card number, sit back and ignore the sucking sound emanating from your checking account.

So, in the end you haven’t reduced your cost – you just transferred that cost to a (gaming) cloud provider. Depending on how much you game and what features you require, you could actually see increased costs. Hopefully not, but keep your eye on the ball.

I agree with Chuck’s statement that “both perspectives are right” but I don’t see the value going wholesale up the stack as his example indicates. Smart and strategic IT leaders are going to need to make sure that the integrated stack is really, honestly delivering on the value promise.

Wednesday, May 26, 2010

SSD at Home


This past weekend I installed a new Intel X25-M 80GB solid state drive into my home PC (which I use for work and play).
I had no end of fun clearing out my 1TB disk formatted as my C: drive (and “system reserve” partition) by moving my documents and certain key applications to another partition in the system.  After I was done playing the “sliding square number puzzle game” with my data to pare the combo C: and system reserve down to under 80GB with some headroom I used Partition Wizard Home Edition to move the boot and system partitions to my new SSD.
From there it was a matter of changing the boot order in BIOS and then running a Windows 7 repair after a failed boot attempt and I was off and running. 
Drum roll, please!
Does it boot quicker?  Oh yes.  But, considering that I only reboot about once a week (I usually have the system sleep during idle time) it’s not a huge improvement for me.
Was 80GB enough?  Yes but I’m down to about 18GB free now.  My Steam game files all now reside on a spinning disk but honestly I’ve not had disk IO bottlenecks with my games. 
So… why did I spend money and time on this?  Frankly, I wanted to speed up work I was doing in Excel and Perfmon with customer performance data.  It didn’t really help that much, but I’m trying to figure out why that is.   Shame on me, but I assumed that the bottleneck was with disk because I’ve got an AMD Phenom II X4 955 running at 3.2GHz with 8GB RAM and I’ve got a CPU and RAM monitor loaded as widgets and check them often when things are going slowly.
Your mileage may vary but overall I’m not getting what I thought I’d get in return for my SSD investment but it was pretty much an impulse buy and gosh I just wanted to be the first kid on my block with a solid state drive. 
If you want faster boot time or performance improvement for your laptop, I’d check out the new Seagate Momentus XT drives.

Important Notes

Of course, always back up your data before moving anything.

Move documents using the location tab in the properties of the various user folders (e.g. My Documents, My Pictures, etc).

Steam files can be moved to a safe location, and then copied back after reinstalling Steam (you don't need to redownload your games or content).

Make sure you turn off defrag for any logical drives stored on your new SSD.

Make sure you move or RE-move your page files so you aren't thrashing your SSD.  Page files are probably not needed if you have ample RAM anyway.

Intel provides a utility to schedule and run TRIM - make sure you do this to maintain optimal drive write performance.  Once a week is recommended.

Monday, May 17, 2010

FUD Slinging - Why It Is Poison

I made myself a few promises when I jumped the fence from end user to peddler of storage goods. Among those was that if I ever had to compromise my integrity or ethics I'd go find something else to do. This means that I have to believe in what I'm selling and that the product can stand on its own merits. It also means that I am free to be truthful with a prospect and walk away from an opportunity that doesn't make sense.

Happily, during my onboarding and initial training with Compellent these points were firmly established by the management team all the way from the top to my direct leadership. One of the points made, emphatically, was that it's a very bad idea to talk about the competition to your customer.

I heartily agree with this point of view. Based on my experience as a customer sitting through countless sales presentations I can tell you that there are a variety of reasons which make spreading FUD a bad practice and virtually no good ones.

1. When you talk about your competition you're taking time out from selling your solution. Time is golden. Every minute in front of a prospective customer is a chance to listen and learn and help solve their problems. Every minute spent bad mouthing your competition robs you of a chance to sell your value.

2. FUD is typically based on outdated or inaccurate information. I spend a great deal of my free time getting intimately familiar with my product. I do research competitive offerings just so I know how I stack up in a given account. The customer has allowed me in to talk about what I know best - my product.

3. It's annoying. Really. Sometimes customers will ask for competitive info and that's fine. But even then you're probably going to offend someone in the room depending on how you approach those particular questions. I always try to keep it positive when asked about the competition - "Vendor X makes a really great product, it works well and they've sold a lot of them. However, this is how we're different and we believe this is a better fit for you."

4. It's potentially dangerous. First, if I spout off about a "weakness" in the competition's product I've just given the customer a reason to invite them back in to answer to my accusations. Bad for me. Secondly, my blabbering on and on about how bad my competition sucks may leave the customer wondering why I protest too much. Finally, if the FUD you spread turns out to be unfounded, the customer could then be convinced you don't know your ass from a hole in the ground (and rightly so). To be honest, I really do like it when my competition has been in before me and spread FUD, since it lets me spend more time talking about my product and feature set while eroding the customer's confidence in the other guy.

Anyway, that's my take. I'm not going to say I've never spread FUD. It's too tempting and sometimes the stress of the situation leads you to not think rationally and say all sorts of stupid things! But, as a practice in front of customers and in social media I do my level best to keep the conversation above the level of degrading anyone's company or product.

Friday, April 9, 2010

Dear Jon - Use Cases for Block-Level Tiering

Yesterday afternoon I happened to pop into Tweetdeck and saw a tweet from Jon Toigo -

I'm interested in exploring the case for on-array tiering. Makes no sense to me. Sounds like tech for lazy people!...

I engaged Jon in a quick tweet discussion (I'm the "inbound tweet") until we both had to attend to personal matters but I wanted to come back to this statement because he's brought this up before and I find it a little bothersome. Not because Jon's asking a question or challenging the use case - I'm perfectly fine with that.

My rub is that his premise seems to be that block-level tiering is being positioned as a replacement for data management policy and best practices. That's not the story - at least not the Compellent story. For example, Jon's last tweet on the matter was this:

Seems like understanding your data and applying appropriate services to it based on its business context is becoming a must have, not a theoretical nice to have.

If anyone's selling array based block-level tiering as a replacement for data management policy, archiving best practices, private information security and the like, I'm not aware. This is a pure storage optimization play. There's nothing about automated block-level tiering that would prevent the development, implementation or enforcement of a good data management policy.

What makes my ears prick up is when a statement like Jon's attempts to paint automated block-level tiering as an evil when it's nothing of the sort. You want to implement an HSM scheme or data management policy on top of ATS? Go right ahead - you'll still have data that is less actively accessed (or practically inactive for that matter) until the data management police take appropriate action is my guess.

On Jon's blog, he quotes an anonymous former EMC employee:

The real dillusion [sic] in the tiered storage approach is that the data owners – the end users – have no incentive to participate in data classification, so the decisions get left to data administrators or worse (software). Just because data is accessed frequently doesn’t mean the access is high priority.

This really sums up the ATS naysaying. First, it's not a delusion to say that data owners aren't incented to participate in data classification. It's an ironclad fact. If you're lucky enough to have published a data retention policy, the exemptions and exclusions start flying almost before the ink is set on paper. Still, I don't believe that ATS is a solution to that problem, but rather a reaction to it.

Secondly, the whole concept that ATS is somehow trying to equate data access to criticality is, in my opinion, fallacious. At some level, yes, access frequency tells us a lot about the data - chiefly that there's a lot of interest in it. It doesn't tell us that it's necessarily important to the business. It may be - it likely is. It may not be. Conversely, infrequently accessed blocks may contain business critical data. Maybe it doesn't and likely it's less critical (now) because it's not being accessed frequently (now).

So ATS gives you a way to store data cost effectively, without impeding the data steward from taking action to classify the data and handle it appropriately. It's not an enemy of data management - nor an ally for that matter. So why is it drawing so much ire from Jon?

Jon, feel free to continue to champion for better data management practices - I'm with you. But please don't waste energy fighting something that adds value today while we're waiting for that battle to be won.

As for the use cases - ask your friendly neighborhood Storage Center user!

Friday, April 2, 2010

Hotel Compellent


Tommy posted a great analogy piece on his blog which explains storage in terms of hotel ownership and occupancy rates versus room cost.  Not to take anything away from Tommy, because it was a good example for non-technical audiences, but I want to point out that analogies, like statistics in USA Today, can be used to position your point favorably while making a seemingly fair comparison.
Let me illustrate.  Let’s use the storage-as-hotel example but make some modifications.  Hotel X has an outstanding occupancy rate – or shall we say occupancy capacity – but our enterprising new owner quickly finds that he has a problem.  Following the advice of his architects he’s built a high end hotel with luxury rooms outfitted with imported artworks, complimentary services like turn down and pillow chocolates and other fancy features.  Because of this, he finds that his operational expenses (opex) begin to erode the apparent capital expense (capex) efficiencies he thought he’d realized because of the higher capacity.
On top of that, Hotel X has a highly trained and experienced staff waiting to serve the every whim of the guests from bell service, to concierge, to shoe shine, to someone who will hold the door open for you. 
Who wouldn’t love to stay at Hotel X?  I would – sounds like a great place!
But, I can’t afford it.  Nor can many travelers who simply need a place to sleep, shower and maybe make a few phone calls at the end of the business day.  Hotel E might be the perfect place for them.  Clean, comfortable and cheap.  Not too many services available but you can get a free cinnamon bun in the morning and a complimentary cup of coffee.
But consider the proprietors of Hotel C – let’s call them Phil, Larry and John.   They’ve been in the business for many years and they know hotels and more importantly they understand guests.  They know that most guests (say 80%) really just need a place to sleep for a night or two and don’t want to pay a lot for stuff they don’t need.  The other 20% are high end travelers, VIPs or executives who expect the best and demand all sorts of expensive services – and they have the money to pay for it.  So, Phil, Larry and John build a hotel to meet the needs of everyone.
Not only can a guest choose to stay in a room that meets their demands, if those demands should change they can upgrade or downgrade to a more appropriate level of service.
OK, you get the point and I could go on and on with this.  I even thought about talking about Hotel E as an example of vendor lock in (“you can check out any time you like, but you can never leave”).  But the bottom line is this:  Take the time to understand what you’re getting or you’ll go broke staying in places like Hotel X and won’t have money to get back home!

Saturday, March 13, 2010

Risky Business

When I look back over my career in IT, I realize that I’ve been involved in sales long before I decided to join Compellent. Every strategic initiative I pushed for over the years has involved a great deal of salesmanship, evangelization and consensus building. Technologies that are common place today got there, in large part, because someone on the front lines identified a trend, thought about how it could help them drive value for the business and started putting a case together for adopting the new technology in their own data center.

I recall years ago, as an IT administrator for a regional bank, suggesting that we drop our Token-Ring network topology for new branch office rollouts in favor of Ethernet. Seems like a no brainer now but at the time there was a lot of concern about the risk associated with making a directional change like that. The concerns were typical (and justified) of any shop faced with the prospect of doing things in a new way. Will our applications behave differently? Are there hidden costs associated with the change? How will support be impacted? Will this introduce complexity?

No Escape
Change is a difficult and necessary part of IT life, and it carries risk. However, there is no escape from risk because NOT adapting and changing also carries risk. Sometimes changes are forced upon you by other entities (government regulations), circumstances (mergers and acquisitions) or drivers (remember Y2K?) beyond your control.

Managing risks due to change on a tactical level involves policies and procedures to establish controls and checkpoints as well as contingency plans. On a strategic level, I think the best way to reduce risk is through simplicity.

Avoid Complexity
One thing I quickly learned as an IT manager was that complexity is your enemy – always. Complexity, in my opinion, is the offspring of laziness and the cousin of carelessness. The more complex your environment, the more difficult and costly to manage and adapt to changes which means that you have one big ass cloud of risk hanging over your head.

The opposite of complexity in the data center is elegance. A solution that is simple to understand, manage and maintain and is effective at lowering costs and delivering service is an elegant solution. Compellent’s Fluid Data Architecture is one such elegant solution and I know this because every time I do a demo for a prospective customer they light up – they understand how elegant our solution is.

Spare Some Change?
On March 24th from 2-3PM CT you’ll have an opportunity to chat with one of our customers, Ben Higginbotham of WhereToLive.com. I’ll moderate a Twitter chat with Ben on the topic of change and risk in IT. Here are the details. I hope you’ll join and ask questions or share your own success or lessons learned.

Tuesday, March 2, 2010

HISTORY LATHERS, RINSES AND REPEATS

Early in the last decade if you were making a storage purchasing decision you likely would have been frustrated with sales presentations, analyst reviews and industry news about storage virtualization to the point that you’d rather purchase ANY product that didn’t have this mysterious capability. Of course, the uncertainty continues today, although the hype has died down. Unfortunately, another buzz phrase has cropped up to befuddle the marketplace – automated storage tiering (ATS).

I’m watching this unfold as I transition from end-user to peddler of shared data storage and there’s clear indication from several recent blog posts and tweet threads that ATS, as an idea, is getting abused much like storage virtualization has been for years.

ME TOO!

Why is this happening? If you consider that all suppliers sell to their strengths then it’s no wonder this happens. You’ll generally have a leader or two who develop a conceptual feature into a real working product and start to pull in some mindshare (and hopefully for them, market share as well). When a feature or function starts to gain traction, you’ll find that folks on the supply side will generally fall into one of two camps: “We have that too” or “You don’t want that,” and the debate rages from there.

The root of confusion lies within the “Me too!” crowd because, well, they may not actually HAVE it, but they have something close enough that they can fudge a little and get away with it. This feeds the “You don’t need it” side with the fuel of uncertainty, which they’ll try to use to capitalize on customer frustration.

IT’S CACHE! IT’S FLASH! IT’S A DESSERT TOPPING!

A big part of the confusion around ATS concerns the role of SSD as a tier of storage. Since SSD acts like disk but performs like traditional storage cache, it doesn’t fit neatly into either category. For example, many (possibly all, I don’t know for certain) disk array systems will bypass write cache for SSD bound blocks.

Does that now make SSD cache? Well, not according to the SNIA Technology Council’s storage dictionary. Cache is both temporary and performance enhancing. While SSD certainly improves performance, it is arguably not temporary storage.

Bottom line is that SSD can be used as a healthy part of your ATS solution. And it’s easy to see that eventually it will be a big part with traditional enterprise disk being squeezed out by SSD on the high end and big slow SATA/SAS disk on the low end. Who knows, maybe it will all be SSD at some point? Or bubble memory? Or quantum dots?

The point is, don’t let confusion around the future face of tiered storage scare you from adopting ATS today because you can reap real benefits here and now. Just make sure you’re choosing an architecture which will accommodate the changing landscape and you’ll be fine.

Sunday, February 28, 2010

Architecture versus Feature

I'll give NetApp CEO Tom Georgens credit for one thing: he knows how to stir the pot. There’s been lots of good and interesting debate over a feature offered by several storage suppliers since he made that comment.

Notice I used the word "feature" - that's an important distinction. Automated tiered storage (I'll say ATS as a generic term - Compellent calls it Data Progression, others have different marketing terms) is a feature built on top of an underlying storage architecture. Other features include thin provisioning, snapshots, replication, data protection and storage management. It's important to understand the foundation used to offer up features when deciding how functional and useful they really are in YOUR environment.

In many cases storage companies have developed ways to virtualize physical storage in order to treat the available installed disk capacity (or some subset of that capacity) as a single logical pool. Using this abstraction, volumes or LUNs can be created which ignore the limitations imposed by a single physical disk array (such as being able to change RAID protection levels on a LUN). Depending on the way in which storage has been virtualized the ability to deliver more value from the aforementioned features will be impacted.

For example, some virtualization schemes are built on top of existing, traditional array architectures which limit the level of granularity for virtualization. Be aware that any architecture with a long legacy will show signs of these limitations by restricting the utility of the newer features. By necessity, these legacy systems must maintain backwards compatibility with older generations so that services such as replication and storage management aren't "broken" which is a good thing. The trade off is a watered down capability for emerging feature sets. Sometimes this is because of hardware dependencies in earlier generations of products.

In the case of newer generation platforms, such as Compellent Storage Center, there is no such bolting on of new features and software engineers are free to implement these exciting new capabilities in more valuable ways.

For ATS, this means having the ability to create a logical volume which provides storage economy by addressing performance requirements without the cost penalty associated with locking the entire volume into high performance disk. For legacy architectures it means moving the entire volume around to address performance spikes while maintaining a reservation of storage in each tier for these volume moves. You can see there's a huge difference in the implementation between an ATS offering on one architecture versus another.
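
To put rough numbers on that difference, here's a back-of-the-envelope comparison - my own illustration with assumed figures - of how much data actually has to move when only a fraction of a volume gets hot.

```python
# Assumed figures only: data moved by whole-LUN tiering vs sub-LUN (block-level) tiering.
volume_gb = 2000          # a 2TB volume
hot_fraction = 0.10       # suppose only 10% of the blocks are actually active
page_mb = 2               # assumed tiering granularity for the sub-LUN case

whole_lun_moved_gb = volume_gb                   # legacy approach: relocate the entire volume
sub_lun_moved_gb = volume_gb * hot_fraction      # block-level approach: promote only the hot pages

print(f"Whole-LUN move: {whole_lun_moved_gb} GB relocated (plus reserved space in the target tier)")
print(f"Sub-LUN move:   {sub_lun_moved_gb:.0f} GB promoted "
      f"({int(sub_lun_moved_gb * 1024 / page_mb)} pages at {page_mb}MB each)")
```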

To REALLY understand how any of these features can help you, it's important to take the time to learn about the foundational architecture used to build these features. Take a look at Compellent's Fluid Data architecture and you'll see why we are perfectly suited to deliver advanced storage features in an easy to manage package.

Everyone sells to their strengths, which is why Mr. Georgens is trying to convince you that tiering is dying. By understanding what those strengths (and associated weaknesses) are you'll be better able to make the best decision on future storage purchases for your business.

Focus on architecture before you get sold on features.

Friday, February 26, 2010

Kickstart Your Replication

They say a picture is worth a thousand words. Given that this particular storage array had a capacity of about 20 terabytes, this picture is arguably worth two trillion words. I’ll try to use far fewer words than that to explain this photo.

What is this catastrophe, you ask? Clearly not a good day for some storage administrator (and trust me when I tell you I know all too well how he felt at this moment). Let’s rewind and figure out how things got to this point, shall we?

Replicating data to a disaster recovery site is really a great thing – provided you have either oodles of bandwidth or many days (weeks?) to get your data protected, which is what we in the business call TTP (Time to Protect). Until your data is fully replicated or “sync’d up” as we say, you’re really at continued risk. Since not many companies can afford the high speed WAN connectivity to pull this off, traditional storage vendors offer a simple, if not brute force, solution… I call this the CTAM protocol.

That’s right, the Cargo Truck Async Mirror protocol involves standing up the DR array in your production data center and performing an initial sync between the production and DR systems at LAN or FC speeds. In addition to potential problems like the one depicted in the picture, you have the cost of your storage vendors' professional services for an additional install, plus the packing and shipping services (and trust me, you do NOT want to go cheap here), as well as dealing with your risk officer and explaining the detailed shipping process, which includes hourly tracking updates (typically when you’d rather be sleeping).
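
To see why the brute-force approach exists at all, consider the time-to-protect arithmetic. This is a rough sketch with assumed link speeds and a flat 70% effective throughput; your WAN, change rates and replication engine will all shift the real numbers.

```python
# Rough time-to-protect (TTP) estimate for an initial replication sync -- assumed link speeds.
def seed_days(data_tb, link_mbps, efficiency=0.7):
    """Days to push data_tb terabytes over a link_mbps WAN at the given effective efficiency."""
    data_bits = data_tb * 8 * 10**12                     # decimal terabytes to bits
    seconds = data_bits / (link_mbps * 10**6 * efficiency)
    return seconds / 86400

for link in (10, 45, 100, 1000):                         # roughly T3 class through gigabit
    print(f"{link:>5} Mbps: {seed_days(20, link):6.1f} days to seed 20 TB")
```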

Fortunately, Compellent customers have another option in Portable Volume. A couple weeks ago I called one of my customers, Dane Babin, CIO of Business First Bank to see if he was free for lunch and he told me that he’d sent his Portable Volume kit to his DR site earlier that morning with a kickstart replication. He’d have time for lunch and then we could both go back to his office and finish up the process.

When we got back he went straight to his office and I set my gear down and popped over to the break room for a bottle of water. By the time I got back, he was grinning from ear to ear having just finished the process of importing the data and could go into the weekend secure in the knowledge that his company's critical data was protected at a DR site four hours away.

I asked Dane what Portable Volume meant to him as a customer and he said, "It was very easy. I connected it at my production site, made a few clicks in Enterprise Manager and a short time later packed up the drives and sent them off. I had a branch manager connect at the DR site and the Storage Center immediately recognized the drives and started copying data. It was so easy even a branch manager could do it!"

Not that branch managers aren't capable people, but the point is Dane didn't need to tie up one of his engineers to get the job done. Dane can reuse his Portable Volume kit to kick start more replications in the future or to help recover his production site if the need should arise.
For a demonstration of Portable Volume in action go to this link featuring Peter Fitch, IT Infrastructure Manager at Rudolph Technologies. Notice Peter mentions being influential as a customer in the design and implementation of Portable Volume - Compellent listens to customers!

You can also view a demonstration of Portable Volume here.

Monday, February 15, 2010

Hello World

This year is the beginning of a new era for me professionally. I've lived in the corporate IT world for 15 years doing everything from report writing to desktop support to server support and finally carved a niche for myself by jumping on the shared storage bandwagon at the turn of the century (I love saying that).

Career progression is great, but as I found myself moving up the chain, and thus away from the technology front lines, I began to reconsider where I wanted to be over the long haul. Strange as it may sound, I've always been attracted to the sales side of things - I mean, after all we're all salespersons at the end of the day. So, when the opportunity presented itself I said a bittersweet farewell to my cubicle and jumped on the chance to join Compellent as a Storage Architect.

The most interesting thing I've noticed in this side of the business is that the sales teams you meet as a customer are pretty much all on speaking terms outside of your conference room. You wouldn't believe the waves and static a customer makes in the general sales world with a simple comment, question, complaint or praise... I didn't realize as a customer that I had that much influence. I guess I should have, but I never really thought about it.

So, my message to comrades I left behind on the front lines of keeping the business running is twofold:

1. You have tremendous powers of influence on the IT industry
2. Use your powers wisely

Thanks for reading my blog and I hope it adds value - as Storage Mojo says, courteous comments are welcome!