Pythian has openings for MySQL and MS SQL Server DBAs in each of our offices in Ottawa, Canada; Boston, USA; Dubai, UAE; and Hyderabad, India. If you are a MySQL and/or SQL Server DBA and would like to evaluate this opportunity, please send us your résumé with an introductory paragraph to hr@pythian.com.

Good Database Design is Mightier than Hardware

By Shakir March 7th, 2008 at 10:03 am
Posted in Group Blog PostsOracle
Tags:

Have you ever heard the one about throwing hardware at a software problem?

In one of my previous blog posts, I mentioned something along the lines of—well I’ll just cut and paste . . .

In my experience, the solution to most problems (the ones the caller refers to as “it’s running slow”) are not rooted in hardware, because hardware problems generally cause things to not run at all. It always baffles me when developers and architects prescribe hardware upgrades to make things run faster, because about 80% of the performance-related problems and subsequent solutions I’ve dealt with were resolved by tuning the application.

Well yes, I know you can buy new hardware, and it’s easier. But when it comes to hardware, how many of you have a ten-node RAC cluster running Enterprise Edition with 8GB of RAM on each node, running off a massive SAN?

I’ve been on so many systems that have been running for years–poorly–the way they were, and in a week we can take them apart and have them running without a hitch. We’ve even managed to fix problems that turned out to be the business case to go from RAC back down to a single instance. How much did those customers save on licensing costs?

Back to the example at hand. I have this nifty RAC system that supports some very public and very mission-critical apps, and one day (it was Sunday night) it starts choking. We’re getting enqueues. Slowly they start climbing. Ten nodes came to a crashing halt. I have now seen a ten-node RAC cluster come to crashing halt and completely lock up.

Why, you ask?

A simple SQL statement: DELETE FROM a WHERE b=c AND d=e;.

That’s it. It was going off indexes and in itself was rather boring. Yes, indeed. So why was this statement the one waiting on all the enqueues?

This table was at the top of a four-levels-deep parent-child relationship, and all the foreign keys were un-indexed. For every delete, Oracle was forced to do four full tablescans on parent tables to make sure no child records were around to be orphaned. This oft-overlooked foundation of data-modeling was overlooked on this app, causing some serious downtime.

I first learned of this issue (well, not really an issue but a design oversight) when reading something by Steve Adams over at Ixora back in the day. (If anyone has a link to this or any article illustrating this, I’d appreciate it.) Back to basics, folks! We know Oracle is good, but that doesn’t mean we can forget the core of application design.

The point to take away from this is that we had a massive amount of hardware crippled by a design flaw that was easily avoidable. At this point, the work that I and a few of the other professionals at Pythian have done has reduced the overall load on this cluster to justify permanently reducing it. This is on an app that peaked at over 3000 queries per second prior to anything being done on it. It’s still doing the same amount of work, although I think we can count on this and a number of other changes that we’ve made at the design level, to drop it down from ten nodes (which is obviously overkill for most apps) to a more reasonable two.

There is a time and place for hardware upgrades. The problem is that most people find hardware easier and just aren’t aware of how expensive their app logic really is. I find tuning easier and more cost effective.

Adieu.

These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Slashdot
  • Google
  • del.icio.us
  • Facebook
  • bodytext
  • Technorati
  • TwitThis
  • Reddit
  • Spurl
  • De.lirio.us
  • Furl
  • blogmarks
  • Ma.gnolia
  • E-mail this story to a friend!

6 Responses to “Good Database Design is Mightier than Hardware”

  1. Paul Vallee Says:

    Hi Shakir

    The article and query at ixora’s site is here - the year was 2000. We miss you Steve. :-)

    Paul

  2. lotso Says:

    I’m running a 250GB sized PG DB on a celeron 1.7G box and 768MB of ram which never really sees it’s load average hit < 3 and due to the design of it, it’s handling the load pretty well and users are happier with the response compared to the company’s own OLTP styled DB on mssql

    right tools for the job.

  3. Nigel Thomas Says:

    Other references: Tom Kyte and Connor McDonald (on Jonathan Lewis’s site.

    As well as full table scans, there’s a locking - sometimes deadlocking - issue (which full table scans are more likely to reveal, of course). See OraStory for one anecdote. The FK locking behaviour has changed over time (sorry, can’t find a good reference or version number quickly).

  4. Oracle Musings » The Rule of 5 Says:

    […] point you at an excellent blog post by Shakir Sadikali at the Pythian Group which shows off a ten-node RAC cluster brought to its knees […]

  5. Log Buffer #88: a Carnival of the Vanities for DBAs Says:

    […] SQL is 5 LIOs per row per row source.” He cites a post by Pythian’s Shakir Sadikali: Good Database Design is Mightier than Hardware. (Incidentally, Shakir will be handling next week’s […]

  6. Randall Says:

    Big hardware breeds lazy programmers…

Leave a Reply

Filling out the following captcha not only allows us to cut down on automated blogspam but also helps digitize books. Please feel free to send comments on this approach directly to Paul at vallee@pythian.com.

NOTE: After submitting your comment, verify that it is added to the blog. New comments will be marked as "waiting for moderation" (we only moderate for spam). If the level of spam is as low as we hope, we will bypass this step.