
750G Disks Are BAHD for DBs: A Call To Arms

I was reading the morning newspaper with a cup of coffee (well, actually, I was reading slashdot.org) and I stumbled across this story about some new 750G, 7200 RPM disks soon to be released by Seagate. This filled me with a sense of dread about having to, once again, go through the process of convincing purchasing managers at various customer sites that, actually, no, they cannot just buy three of these and RAID-5 them together into a huge storage area for their terabyte database.

But, but, why, you may ask?

Think of a disk array as a warehouse. No, not a data warehouse, an actual brick and mortar warehouse. Imagine it as a big building in which you store physical stuff, like books or paper forms or cases of wine or something. Visualize the warehouse as having several loading docks for delivering new stuff or for loading up containers to ship stuff out. Then, imagine the access road or, for a large warehouse, various access roads leading to the loading docks. Are you with me so far?
Now, let’s map this analogy back to the array:

The square footage of your warehouse is the size of your array in gigabytes.

The loading docks are the separate disks in your array.

The access roads are the controllers in your server servicing these disks.

So now, tell me, what happens when you use very big disks for high-performance applications? You have way, way too many square feet to service with far, far too few loading docks (and usually only one access road!!!).

In the “good old days” when 9G disks were big, we didn’t have this problem. Really, this problem is new since then. Back then, if we wanted 200G of RAID1 storage, we needed about 45 of those disks. Controllers could only handle 7 of them, you see (the 8th device on the bus was the controller itself), and that meant we had proportionally many more access roads, and many more loading docks per square foot of warehouse space, than you typically have today.

Now, some may look it up and say that Ultra-320 SCSI has 32 times the bandwidth capability of those ancient controllers! But note the following: in 1998, Storage Review’s editor’s choice for enterprise hard disk was the Seagate Cheetah 9LP. This drive featured 10000 RPM and 5.4ms seek time and could deliver 10MB/sec to the controller. Now compare that to the specs of these newly announced Barracudas: 7200 RPM, unpublished seek times and 100MB/sec maximum theoretical peak delivery to the controller. For reference, the 7200.7 had 8.5ms seek, the 7200.8 had 8ms seek and the 7200.9 had 11ms seek, so we’re likely to see seeks roughly 1.5x slower than my 9G reference.
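The disk and controller counts in the 200G RAID1 example above can be checked with a few lines of Python. This is only a rough sketch: the function name `raid1_layout` is made up for illustration, the 7-disks-per-controller limit is applied to both eras purely for comparison, and rounding up to whole disks gives 46 rather than "about 45".

```python
import math

def raid1_layout(usable_gb, disk_gb, disks_per_controller):
    """Return (total disks, controllers, usable GB per disk) for a mirrored array."""
    data_disks = math.ceil(usable_gb / disk_gb)   # disks needed to hold the data once
    total_disks = 2 * data_disks                  # RAID1 mirrors every data disk
    controllers = math.ceil(total_disks / disks_per_controller)
    return total_disks, controllers, usable_gb / total_disks

# 200G of RAID1 on 9G disks, 7 disks per controller (figures from the text)
old = raid1_layout(200, 9, 7)
# The same 200G on 750G disks; the 7-per-controller limit is kept only for comparison
new = raid1_layout(200, 750, 7)

for label, (disks, controllers, gb_per_disk) in (("9G disks", old), ("750G disks", new)):
    print(f"{label}: {disks} loading docks, {controllers} access roads, "
          f"{gb_per_disk:.1f} GB of warehouse per dock")
```

In warehouse terms, the old layout has dozens of loading docks and several access roads for the same floor space that the new layout serves with two docks and one road.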

I note:

  • The seek time is actually slower today.
  • The bandwidth is at most 10 times better.
  • Your controller is no more than 32 times faster.
  • The disk, however, is about 83 times bigger!
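The ratios in this list follow directly from the quoted specs, and they can be turned into one concrete consequence: how long it takes to read an entire disk end to end at its peak rate. The sketch below assumes 1 GB = 1000 MB and uses the 8ms seek of the 7200.8 as a stand-in for the unpublished figure; the dictionary names are illustrative only.

```python
# Figures quoted in the article
cheetah = {"capacity_gb": 9, "seek_ms": 5.4, "mb_per_sec": 10}
# seek_ms is an ASSUMPTION (the 7200.8 figure); the new drive's seek time is unpublished
barracuda = {"capacity_gb": 750, "seek_ms": 8.0, "mb_per_sec": 100}

capacity_ratio = barracuda["capacity_gb"] / cheetah["capacity_gb"]
bandwidth_ratio = barracuda["mb_per_sec"] / cheetah["mb_per_sec"]
seek_ratio = barracuda["seek_ms"] / cheetah["seek_ms"]

def full_scan_minutes(drive):
    """Minutes to read the whole drive at its peak rate (assumes 1 GB = 1000 MB)."""
    return drive["capacity_gb"] * 1000 / drive["mb_per_sec"] / 60

print(f"capacity:  {capacity_ratio:.0f}x bigger")
print(f"bandwidth: {bandwidth_ratio:.0f}x faster at best")
print(f"seek:      {seek_ratio:.1f}x slower (assumed seek time)")
print(f"full scan: {full_scan_minutes(cheetah):.0f} min then, "
      f"{full_scan_minutes(barracuda):.0f} min now")
```

Even at the theoretical peak rate, emptying the big disk takes hours where the small one took minutes.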

Now, I admit things got faster as they got bigger, but they did not get faster as fast as they got bigger.
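For a database doing random I/O, the gap is easier to see as I/O operations per second per gigabyte stored. The sketch below uses a common rule of thumb (one random I/O costs an average seek plus half a rotation) and ignores caching, queueing and transfer time, so treat the results as rough estimates. The function name is hypothetical and the 8ms Barracuda seek is assumed, since the real figure is unpublished.

```python
def random_iops(seek_ms, rpm):
    """Rule-of-thumb random IOPS: one I/O costs an average seek plus half a rotation."""
    half_rotation_ms = 60_000 / rpm / 2
    return 1000 / (seek_ms + half_rotation_ms)

# Cheetah 9LP figures from the text
cheetah_iops = random_iops(seek_ms=5.4, rpm=10_000)
# Barracuda RPM from the text; seek time is an ASSUMPTION (7200.8 figure)
barracuda_iops = random_iops(seek_ms=8.0, rpm=7_200)

# "Loading dock throughput per square foot": random IOPS per GB stored
cheetah_density = cheetah_iops / 9
barracuda_density = barracuda_iops / 750

print(f"Cheetah 9LP:    {cheetah_iops:.0f} IOPS, {cheetah_density:.2f} IOPS per GB")
print(f"750G Barracuda: {barracuda_iops:.0f} IOPS, {barracuda_density:.2f} IOPS per GB")
print(f"density gap:    {cheetah_density / barracuda_density:.0f}x")
```

Under these assumptions a full 750G disk offers on the order of a hundred times fewer random I/Os per gigabyte than a full 9G disk did.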

That’s why I’m founding a new club, in the spirit of BAARF, the Battle Against Any Raid Five.

I’m calling it the Battle Against Huge Disks for Databases, or “BAHD for DBs”. You can either join me in my battle against huge disks for databases. Or not. Together, we can relegate these monsters to their intended purpose, whatever that may be. Or not.

To join, simply post a comment to this article with a story of how you have fought the BAHD for DBs. I will keep the following list of charter members up to date:

#. Name: Battle story
1. Paul Vallee: Wrote the original BAHD for DBs call to arms
2. Mark Brinsmead: …short story made long, here’s the problem: disks are now almost 200x bigger than they were back then, but they are nowhere near 200x faster! Not if you use the entire disk, anyway. (Perhaps I’ll elaborate on that another day…)
3. Jonathan Gennick: You mean I should use more than one drive?
4. Doug Burns: Then again, if I were to put 5 of those 750Gb disks in my server at home… Mmmm.
5. Pete Scott: Well, big disks could be useful to hold an online disk backup of a database. But other than that
6. Stephen Booth: In the long run (due to better throughputs &c) using lots of smaller faster disks will be much, much cheaper but it could put up the initial implementation costs by 5%. As the implementation team have no responsibility for the long term running costs but do for the initial costs they go for whatever’s cheapest.
7. Connor McDonald: As long as they’re in a SAN, it will all be okay. Doesn’t matter what crap disk it is, apparently, if it’s in a SAN, disk performance will be magically awesome. All the SAN vendors tell me this all the time.
8. Mogens Norgaard: This is very good. Please make me a member
    PS: The Cash you pay for the Cache will of course remove (alleviate? is that the word?) all problems you might encounter with big disks.
9. Thet Win: Absolutely! Call for arms, it is. Count me in.
10. Marco Gralike: What, there aren’t 200Mb disks any more?
    Where did they go? You traded them in for only a few 750Gb disks? Hey, that’s not an army! No wonder you were written out of the script.
11. Carel-Jan Engel: Well, this should make me the 11th member, the same seq# as I have at the BAARF.
    We need a Small Disk Liberation Army.
12. Andrei Kriushin: The history evolves in spirals indeed. Does anybody remember “magnetic drums”? Seems they are coming back ;-)
13. This space intentionally left blank. ;-)
14. Joel Garry: I remember when carrying around 20M disk platter stacks was a decent workout. Now I lose thumb drives. There’s no substitute for cubic inches, but you need to get the power to the ground. Bandwidth Über alles.
15. Jay Miller: I’ve lost this fight many times in the past and expect to lose it many times in the future.
    Them: But we’re giving you much faster CPUs, that should make up for it!
    Me: We’re not CPU bound, we’re i/o bound!
16. James Morle: I’m holding out for the 1TB drive. It would make the most perfect ironic bedfellow for the 3-disk RAID-5 volume (The Most Ridiculous Configuration In The World). Conversely, if we could get these storage densities into a 73GB 15K drive (thus minimising seeks) that might be a nice drive.
42. Jared Still: It appears that soon we may have multi-petabyte disks to contend with, and be using storage virtualization software to manage many databases in our 2-disk SANs. (RAID 1 you know)
    What could make life easier than that?
    Sign me up please, and make me member #42, as Mogens didn’t ask for it.
