On the Difficulty of Data Migrations (Especially to NoSQL Databases)

Apr 26, 2010 / By Gwen Shapira

I’ve been reading a lot of NoSQL blogs recently, and one thing that bothers me is that many of the leading NoSQL bloggers seem to have had very different experiences in operations than I’ve had.

Here’s an example:
Over at the O’Reilly community blogs, Andy Oram interviewed two MongoDB experts about migrating from a relational database to MongoDB.

Here’s what the experts said:

“1. Get to know MongoDB. Download it, read the tutorials, try some toy projects.
2. Think about how to represent your model in its document store.
3. Migrate the data from the database to MongoDB, probably simply by writing a bunch of SELECT * FROM statements against the database and then loading the data into your MongoDB model using the language of your choice.
4. Rewrite your application code to query MongoDB through statements such as insert() or find().

OK, so which step do you think takes the longest? And the answer is…step 2. Design is critical, and there are trade-offs that provide no simple answers but require a careful understanding of your application. Migrating the data and rewriting the application are straightforward by comparison.”

I’ve never migrated anything to MongoDB, but I was involved in the migration of a large application from SQL Server to Oracle. Both are relational databases, so there was almost no need to rethink the data model. The rewrite and the migration took over two years, with significant bugs discovered and fixed up to the last week. The majority of that time was spent on the migration. None of it was done “simply by writing a bunch of SELECT * FROM statements against the database”.

We did not lack expertise – we had plenty of SQL Server and Oracle developers and DBAs with 10+ years of experience. Note that no one has 10 years of MongoDB experience.

I don’t doubt that modeling is critical and the trade-offs are always difficult, but I’ve yet to see a modeling phase that took longer than the rewrite plus migration of a large application with big data. Note that large applications and big data are the target customers of NoSQL databases, so I’m not inventing irrelevant issues here.

I’ve experienced two major difficulties with migrations:
The first one is that you normally have a large number of users, and you may be reluctant to migrate everyone to a new system at once. No matter how good your load testing skills are, you will still not be 100% certain your new system will have perfect performance under peak load. So you do a phased migration. Start by moving 5% of the users, then another 15%, then another 30%, and then if everything goes well, you may migrate the rest.
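As a rough illustration of what such a phased rollout needs at minimum, here is a sketch of a stable assignment of users to phases. Everything in it is an assumption made for illustration: the function names, the use of a SHA-256 hash of the user id, and the cumulative thresholds (5%, 20%, 50%, 100%) derived from the percentages above. It is not how any particular migration was actually done.

```python
import hashlib

# Cumulative rollout thresholds (hypothetical): 5%, then +15%, then +30%, then the rest.
PHASE_THRESHOLDS = [5, 20, 50, 100]


def bucket_for(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def phase_for(user_id: str) -> int:
    """Return the 1-based phase in which this user is migrated."""
    bucket = bucket_for(user_id)
    for phase, threshold in enumerate(PHASE_THRESHOLDS, start=1):
        if bucket < threshold:
            return phase
    raise AssertionError("unreachable: the last threshold is 100")


def uses_new_system(user_id: str, current_phase: int) -> bool:
    """True if the user has already been moved once `current_phase` is live."""
    return phase_for(user_id) <= current_phase
```

Because the bucket depends only on the user id, a user never flips back and forth between systems as the rollout advances. Note that this only decides who moves when; it does nothing about the data those users share with everyone else.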

Why is this a difficulty? First, the users may share data with users who have not yet migrated. There could be dependencies. You’ll need to figure these out and write temporary code, used only during the migration phase, to handle them. But before that, you need to find a way to migrate specific parts of your data. This requires figuring out how to tear things apart carefully within and across tables, which is a mini modeling project in its own right. This complicates the “bunch of SELECT * FROM statements” quite a bit.
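One way to get a first look at those dependencies is to group users who share data, so that each group can be moved as a unit. The sketch below assumes you can already extract a list of (user, user) pairs that share something; producing that list is the hard, application-specific part. The function name and input shapes are hypothetical.

```python
from collections import defaultdict


def migration_groups(users, shares):
    """Group users so that any two users who share data land in the same group.

    users:  iterable of user ids
    shares: iterable of (user_a, user_b) pairs that share some data
    """
    parent = {u: u for u in users}

    def find(u):
        parent.setdefault(u, u)  # tolerate users seen only in `shares`
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps lookups fast
            u = parent[u]
        return u

    # Union step: merge the groups of every pair that shares data.
    for a, b in shares:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for u in list(parent):
        groups[find(u)].add(u)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


groups = migration_groups(
    ["ann", "bob", "cat", "dan"],
    [("ann", "bob"), ("bob", "cat")],
)
# groups == [["ann", "bob", "cat"], ["dan"]]
```

If most users end up in one giant group, which may well happen in an application built around sharing, grouping alone will not help and you are back to writing the temporary cross-system code described above.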

Oh, and the migration may fail. Spectacularly. At 3am. You now need to migrate all the users back. With the new data they inserted into the new DB. I hope you prepared a script in advance to do that.

And that is just the first difficulty. The second major problem is that you may have large amounts of data arriving at high rates. You could declare 3 days of downtime to move all the data, but I can see some reasons not to do that.

The alternative is to move the data in increments. First select and copy all the data inserted until today at 8am. Once this is done, select and copy all the data inserted between 8am and now. Then all the data between the previous now and the now-now. All in ever-shrinking deltas of data that will eventually converge to a point where you can switch the users over. This requires that all large tables have timestamps, preferably indexed, hopefully partitioned. Even with timestamps it is not a trivial application to write, and it has to take care of dependencies – you can’t migrate comments on a document without migrating the document itself.
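To make the shrinking-delta idea concrete, here is a minimal sketch of a single copy pass. It makes several assumptions that are not in the text above: both sides are SQLite databases (standing in for the real source and target), every table has an `updated_at` column, there are only two tables named `documents` and `comments`, and rows are never deleted. A real migration would need to handle all of the cases this ignores.

```python
import sqlite3

# Parents before children: a comment must not arrive before its document.
# Table names come from this constant, never from user input.
TABLES_IN_DEPENDENCY_ORDER = ["documents", "comments"]


def copy_delta(source, target, table, since, until):
    """Copy rows with since < updated_at <= until; return how many were copied."""
    rows = source.execute(
        f"SELECT * FROM {table} WHERE updated_at > ? AND updated_at <= ?",
        (since, until),
    ).fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        # INSERT OR REPLACE makes a re-run of the same window harmless.
        target.executemany(
            f"INSERT OR REPLACE INTO {table} VALUES ({placeholders})", rows
        )
        target.commit()
    return len(rows)


def migrate_pass(source, target, since, until):
    """Run one delta pass over all tables using the same time window."""
    return sum(
        copy_delta(source, target, table, since, until)
        for table in TABLES_IN_DEPENDENCY_ORDER
    )
```

Each pass would be called with the previous pass's `until` as the new `since`, and the loop stops when a pass copies few enough rows that the final switch-over fits in an acceptable pause. Using one fixed `until` for every table in a pass is a deliberate choice here: it avoids copying a comment whose document falls just outside the window.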

During the incremental migration and the data streaming phase, you have to support two systems with the same single operations group. That same group now has to learn to support a new database and a lot of new code written for it. Not impossible, but far from “straightforward”.

I always thought that the biggest misconception developers have about operations is the “just add a bunch of servers to solve the performance issue” myth. I can add “migration to a new system is straightforward” as another dangerous myth.

I’m not blaming them; they are architects and developers. Solving difficult operational problems is not their job. The “migration is straightforward” attitude is a problem only when you ask your developers to support your operations, something that seems depressingly common when NoSQL databases arrive in operations. The operations team has no NoSQL experience, and management asks the developers to help out until the ops team learns to support the new beast. The problem is that NoSQL developers without operations experience are likely to cause just as much damage as operations people without NoSQL experience.

8 Responses to “On the Difficulty of Data Migrations (Especially to NoSQL Databases)”

  • jametong says:

    Good post. People often rate their own work highly, and rate poorly anything they are not familiar with.

  • Chen Shapira says:

    Exactly! I didn’t think of it that way, but the same way developers say “oh, just throw hardware at it”, I often say “just fix that bug! how difficult can this be?”

    We all suffer from myopia :)

  • Kristina says:

    Keep in mind, it was a 15-minute interview :)

    You make an excellent point, though: migration always sucks. People usually migrate to MongoDB because relational databases aren’t working for them, not “for fun.”

    We were trying (possibly badly) to make the point that the scary parts of trying a new technology (such as actually getting it working and migrating your data/code) are easy with MongoDB.

  • Gwen Shapira says:

    Hi Kristina,

    I actually enjoyed reading the interview. I wouldn’t bother writing about it if I didn’t think it was just a single misleading point mixed in with otherwise smart advice.

    But migrations are never easy (even when you use MongoDB, which has an excellent API and is relatively easy to use), so I wouldn’t set the wrong expectations :)

  • joel garry says:

    I stopped reading at “The main relational features missing from MongoDB are joins, foreign key constraints, and multi-row transactions. ”

    I can say with a fair amount of certainty that would make migrations of any of the systems I’ve worked on for the last 25 years extremely difficult. Except maybe for the MS ACCESS v1 I used to track my wedding invitations years ago. But that was a toy project.

    So the #2 being the hardest is correct – you have to throw out everything useful in your design. Sorry, I just don’t get it. This Mongo stuff seems to be limited to a simple doc store application. I’m starting to think this would be getting design backwards, wagging the app dog with the featureless technology tail.

  • Gwen Shapira says:

    It all depends on the requirements, I would guess. If your application doesn’t require (or can’t use) MongoDB, then it just can’t – not much debate about it.

    The migration comments are for those who want to use MongoDB and believe they can survive without transactions and they never used joins anyway.

    I admit that this makes MongoDB seem like a good replacement of a cache layer, but not of any DB application I’ve ever seen.

    On the bright side – MongoDB has indexes :)

  • […] Thukral suggests a well captured post by Gwen Shapira which conveys the complexity of a database migration even to technical […]

  • […] of lunacy used to be too-common with MongoDB adopters. I wrote two blog posts (No silver bullet and Difficulty of migrations) just ranting about this kind of optimism in MongoDB […]
