{Expensive | High-End | Modern} SANs Never Fail… Not!


How many times have we heard storage administrators assure us (fueled by the SAN vendor’s claims) that their top-of-the-line SAN arrays simply cannot fail? Unfortunately, reality proves this wrong, and we see it regularly with our customers.

At the time of writing, one of our DBA teams has just completed a failover to the standby database after a database crash caused by a SAN issue. A few hours have passed, and parts of these databases are still not available on the former primary host, but traffic is being handled just fine on the standby. This customer provides SaaS-type services. Imagine what hours of downtime would do to them and their clients.

Unfortunately, people get bitten by this overestimated (god-like, I’d say) SAN reliability. It must, however, be said: SANs do fail!

Do you want such a wake-up call for your executives?

The outage, blamed on an IBM storage array, saw the company’s chief technology officer promise “significant changes to the way we deploy and manage our storage environment”.

Since I mentioned one Australian example, here is one more storage failure scenario, described by our friends at Open Query. There are many cases from literally every industry; some of them are rather complicated, while others are just plain obvious.

Is there a silver bullet? Well, not as a solution, but as a concept, yes: simply admit that SANs do fail. This is what should drive infrastructure design for business continuity. Actually, I should extrapolate it to another design principle, that everything fails, but that’s another story.


About the Author

What does it take to be chief technology officer at a company of technology experts? Experience. Imagination. Passion. Alex Gorbachev has all three. He’s played a key role in taking the company global, having set up Pythian’s Asia Pacific operations. Today, the CTO office is an incubator of new services and technologies – a mini-startup inside Pythian. Most recently, Alex built a Big Data Engineering services team and established a Data Science practice. Highly sought after for his deep expertise and interest in emerging trends, Alex routinely speaks at industry events as a member of the OakTable.

5 Comments

“everything fails”, indeed!
Including standby/DR HA sites…


It’s just more fun when the failure is caused by the HA!


How true…


Oh, yes. They most certainly do fail. Normally when a maintenance engineer is in the vicinity :-(

This IBM Storage Fails Too Often, so Let’s Switch to EMC and Be Done… NOT! | The Pythian Blog
February 26, 2010 9:32 am

[…] couple weeks ago I did a short blog post about SAN storage failures and how people are blinded by all the bells and whistles that are supposed to make storage arrays […]

