SQL Server: High Availabilty & Disaster Recovery à la carte!
Aug 2, 2012 / By Mohammed Mawla
For the business world , a good RDMS is one that can provide high availability (HA) options with minimal downtime and cheapest licenses. Vendors, usually if not always, provide a set of HA features to satisfy different needs. Usually some of these features are reserved for higher-end editions: Enterprise, Datacenter, cluster or whatever named; of course, those editions are more expensive than their “inferior” counterparts.
Most important factors on choosing an edition are:
- License fees:
This is almost the most important factor as high-End editions cost a lot of money; also upgrading them in the future to same edition can cost a lot; for example: Cost of SQL server Enterprise Edition core license is almost four times of Standard Edition cost.
A critical system can benefit from the fine-tuned optimizations, HA and recovery options available only to high-end editions. Example: SQL server ENT edition can support more than two nodes for clustering, support online index operations, fast recovery, online restores, resource governor, partitioning, TDE,….etc
- Vendor requirements
If you are using a 3rd party software, you may be limited to one or two editions. For example: SAP requires SQL Server Enterprise Edition for applications based on the SAP Netweaver Stack because they use ENT edition features like partitioning and data compression.
For those who have been working with SQL server for a while, they know that new High Availability (HA) options were being added every major version, with the exception of SQL server 2008, to address business needs. SQL server 2008 introduced enhancements to the already existing features but no new features.
New features were added trying to address business needs that weren’t addressed in previous editions or tried to address them it in a better way.
Here’s a brief outline of the HA features introduction across the versions; I’m starting with SQL server 2000 albeit I worked with 6.5 & 7.0:
SQL server 2000
- Log shipping
Providing a warm standby copy of a database, something not possible with Clustering or an immense effort with replication depending on your schema. Log shipping allowed creation of read-only copy of a database and also across multiple sites which is beneficial for reporting.
However, LS was far from perfect : No automatic failover, data loss could happen because logs are shipped on a schedule and more space is needed for the log backups (backup compression enhanced that later).
SQL server 2005
- Database mirroring (supported in SQL server 2005 SP1)
Automatic failover possible with Witness, no data loss with Synchronous Mirroring & simpler to configure. However, still have its downs: works only per database, only one secondary instance and no read-only standby (Database snapshot can be used but only supported in ENT edition).
- Peer to Peer transactional replication
ENT edition feature where a set of instances contain identical copies of the data and we want to distribute the load among them. Reads are spread around and the loss of one instance can be substituted by routing traffic to others.
P2P replication is still uneasy to setup, has many restrictions beside it does NOT have conflict detection/resolution. More info can be found here http://technet.microsoft.com/en-us/library/ms151196.aspx
SQL server 2008/R2
No New features but some enhancements like DB mirroring log stream compression and corrupt pages recovery.
SQL server 2012
- Introduced AlwaysON Availability groups which almost combined all of the above features.
AlwaysON is the gem of HA patching a lot of other technologies shortages : Failover a group of databases, multiple secondaries, more flexible Failover Policy, Virtual name for fast application failover and more.
Downside: COST, licensees and hardware.
Some other 3rd party application provide HA features that satisfy other needs; example HP Polyserve (I believe it’s retired now) consolidates multiple SQL Server instances onto fewer servers to maximize data availability through virtualizing the data tier. Some other tools continuously replicate the storage tier across the network to a secondary server in case primary fails.
SQL server provides a set of HA features with each version. Editions themselves can support a different set of features and sometimes one technology may have better capabilities in higher editions (Example: DB mirroring Asynch mode in ENT edition).
I’m listing the features of Enterprise and Standard editions as those are the most widely used but you can use this link to see the features comparison across all editions
|SQl server Version||Edition||Clustering (1)||Replication||Log shipping||DB mirroring||P2P replication||Geo-Clustering (2)||Availability groups|
(1) : Standard edition can allow two-node clustering only
(2) : Stretched VLAN is necessary for Geo-clusters involving SQL server 2008 R2 and below. SQL server 2012 ENT edition supports multi-subnet clusters natively but STD edition still needs Stretching VLAN for a 2-node Geo cluster, more info is here.
What’s really good about these options, and the idea behind this blog, is how you can mix almost ALL of them to achieve your HA/DR goals (pick à la carte) . Since not everyone can afford SQL 2012 ENT edition, Windows Server Enterprise edition and multiple instances in different locations to achieve a decent HA/DR topology, we need to mix the existing features to complement each other and offer a requirement not provided by other feature.
- Clustering does NOT provide a standby or a redundant copy of the databases; a failure in shared storage can be a severe problem. Yes there are hardware redundancies at some SANs but they don’t come for cheap so you need another way to “duplicate” you data.
SQL server 2000 brought log shipping allowing to create a standby database that can be even readable for reporting purposes but it has some disadvantages (see above). If no data loss can be tolerated by LS then database mirroring should be used.
Both of LS or DB mirroring can work smoothly with clustering since the virtual name of the clustered instance will be the end point.
- On the other hand, DB mirroring or Log Shipping protects only against one single database so if the primary server fails, there’s a lot of work needed to bring a second server to take over including (logins , jobs , user objects ….etc) so clustering excels here.
- If you are using database mirroring as a primary HA/DR solution, you may want to have some data shipped for reporting purposes and then you want to combine replication or log shipping. I saw a client deploying a 3rd party application that needs to replicate the WHOLE database to another instance for HA but we wanted to add a 2nd layer of protection for published database by mirroring it.
- Also, since database mirroring have limitations on the number of databases that can be mirrored then you want to use another feature if you have too many databases. Personally, I’ve seen that with some of our clients where they have too many databases and they want to mirror them all causing the mirror instance to stop working after reaching a number of mirrored databases; Microsoft has a KB that outlines hardware considerations for database mirroring.
- Failover clustering can also be combined by Shared storage replication (needs 3rd party support) across multi-site to protect against disaster on one site; this is Geo-Clustering.
- P2P replication nodes themselves can be clustered instances. Also , the instances involved in SQL 2012 Availability groups can be themselves clustered SQL instances. Please do NOT confuse Windows Server Failover Clustering WSFC necessary for Availability groups with SQL Server failover cluster instance (FCI); the first protects a single or multi-database and the second protects whole instance.
Well, SQL server has plenty to offer and you can pick and mix what you want to make it work best for you; what’s lacking on one feature maybe offered in the other. Whether it’s budget, 3rd party tools requirements, upgrade policy, ….etc there should be a fit somehow.
I intend writing few blogs where I’ll list some of the scenarios where we had to implement more than one feature to satisfy data redundancy and the considerations behind implementing them.