When Was Your Last Disaster Recovery Test?

If your answer is anything other than something like “last month and every month before that”, then you are probably in trouble. Learn from Wikipedia’s Data Center Overheating.

It doesn’t mean that they didn’t regularly test their disaster recovery process. Maybe they did but the failover mechanism was broken after the last test.

Regular DR procedure validation is designed to minimize the risk of a broken process going unnoticed. If the failure is detected during a regular switchover, you are far better prepared to handle it (or can simply leave services on the current primary site) than during an emergency failover, when you hit the “Oh shit!” moment under tremendous pressure to get services back.

{Expensive | High-End | Modern} SANs Never Fail… Not!

How many times have we heard storage administrators assure us (fueled by the SAN vendor’s claims) that their top-of-the-line SAN arrays simply cannot fail? Unfortunately, reality proves this wrong, and we see it regularly with our customers.

At the time of writing, one of our DBA teams has just completed a failover to the standby database as a result of a database crash caused by a SAN issue. A few hours have passed, and parts of these databases are still not available on the former primary host, but traffic is being handled just fine on the standby. This customer provides SaaS-type services. Imagine what hours of downtime would do to them and their clients.

Unfortunately, people get bitten by this overestimated (god-like, I’d say) SAN reliability. It must, however, be said: SANs do fail!

Do you want such a wake-up call for your executives?

The outage, blamed on an IBM storage array, saw the company’s chief technology officer promise “significant changes to the way we deploy and manage our storage environment”.

Since I mentioned one Australian example, here is one more storage failure scenario described by our friends at Open Query. There are many cases from literally any industry, and some of them are rather complicated while others are just plain obvious.

Is there a silver bullet? Well, not as a solution, but as a concept, yes: simply admit that SANs do fail. This is what should drive infrastructure design for business continuity. Actually, I should extrapolate it to another design principle: everything fails. But that’s another story.
