How Todd Hoff learned to stop worrying and use lots of disk space to scale

By Paul Vallee May 21st, 2008 at 9:22 am
Posted in MySQLOracle
Tags:

Todd Hoff, who apparently learned a hell of a lot during a short stint at Yahoo followed by some startups has an extremely well-written and edutaining article about how scaling to a million or more users requires jettisoning more or less everything we know and love about relational modeling.

Even though he uses bigtable (Google’s distributed hash storage system) as his example, in reality this approach works well with relational datastores like MySQL and Oracle too, you just have to think about your data differently and use the databases differently. So I’m including this article in the MySQL and Oracle categories because I think it would be of interest.

Here’s a taste of how it reads:

How do you structure your database using a distributed hash table like BigTable? The answer isn’t what you might expect. If you were thinking of translating relational models directly to BigTable then think again. The best way to implement joins with BigTable is: don’t. You–pause for dramatic effect–duplicate data instead of normalize it. *shudder*

Flickr anticipated this design in their architecture when they chose to duplicate comments in both the commentor and the commentee user shards rather than create a separate comment relation. I don’t know how that decision was made, but it must have gone against every fiber in their relational bones…

But Flickr’s reasoning was genius. To scale you need to partition. User data must spread across the shards. So where do comments belong in a scalable architecture?

The answer is, in case you aren’t following yet, you store it everywhere you might need it and worry about keeping your multiple copies in sync later, if at all.

BigTable data ethics are more Mardi Gras than dinner with the in-laws. Data just wants to have fun. BigTable won’t stop you from hurting yourself. And to get the best results you may have to engage in some conventionally risky behaviors. But if those are the glass bead necklaces you have to give for a peak at scalability, why not take a walk on the wild side?

So anyway, this is awesome stuff and thanks Todd. For your reading and learning enjoyment: Todd Hoff’s “How I learned to stop worrying and use lots of disk space to scale”.

These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Google
  • del.icio.us
  • Facebook
  • bodytext
  • Technorati
  • TwitThis
  • Reddit

2 Responses to “How Todd Hoff learned to stop worrying and use lots of disk space to scale”

  1. todd Says:

    Hi Paul,

    Actually I was learning from the people mentioned in post and inflicting it on the blog. I don’t have an easy time with this stuff either. The post was my way of trying to understand how to write my own code and then writing up how it seems to me.

    And I think you are exactly correct about the ideas working on other systems.

  2. Paul Vallee Says:

    Yes, exactly. In fact I was jamming with a giant in the internet-scale world (a very early leader) today about how one could use Oracle’s object interface on a federated architecture to exactly mimic the performance and scaling characteristics of bigtable.

    Cheers

Leave a Reply

Filling out the following captcha not only allows us to cut down on automated blogspam but also helps digitize books. Please feel free to send comments on this approach directly to Paul at vallee@pythian.com.

NOTE: After submitting your comment, verify that it is added to the blog. New comments will be marked as "waiting for moderation" (we only moderate for spam). If the level of spam is as low as we hope, we will bypass this step.