Love Your MongoDB
Pythian now officially supports MongoDB in both our On-Demand and Managed Services offerings. We've been dipping our toes into the MongoDB pool for some time, as more and more of our customers adopt MongoDB into their data infrastructure, but now it's official.

Part of the decision to officially support MongoDB was driven by our desire to offer our customers full managed services for their entire data stack, which now includes MongoDB. But I'd like to believe that my own advocacy was part of the decision. I kept advocating because MongoDB is the perfect database to pair with managed services: there is no other database that is so much fun to use as a developer and so challenging to support as an administrator. Even though I'm an experienced database administrator and only a newbie developer, I much prefer programming on MongoDB to managing it. This is not the case with Oracle or MySQL, where tuning and tinkering are actually more fun than writing SQL.

Why is MongoDB such amazing fun for developers?

MongoDB is a JSON document store. You can store any JSON document there through a very easy-to-use API. Tons of RESTful web APIs send back JSON documents, and being able to easily dump them into a database makes MongoDB just perfect. I'm working on a small program that provides some social network analysis from Twitter, and because Twitter throttles some of its APIs rather aggressively, I need to slowly grab the data I need over many hours. Using MongoDB to store all that data is a no-brainer: it's as easy as writing to text files. Of course, unlike text files, MongoDB lets you build indexes and run queries.

This brings me to the most important reason MongoDB is awesome. MongoDB supports the most important paradigm of the post-relational-database era: Grab data first, structure it later.
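To make "grab data first, structure it later" a bit more concrete, here is a minimal Python sketch. It deliberately does not talk to a real MongoDB server: a plain list stands in for a collection, and the hypothetical `find` helper mimics only the simplest kind of MongoDB query filter (exact matches on top-level fields). The sample documents and field names are invented for illustration. With the real `pymongo` driver and a running server, the rough equivalents would be `collection.insert_many(docs)`, `collection.create_index("user")` and `collection.find({"user": "alice"})`.

```python
import json

# Hypothetical JSON documents, as they might arrive from a web API or a queue.
# They share a couple of fields but otherwise have different shapes.
RAW_DOCS = [
    '{"source": "billing", "priority": 2, "invoice_id": 17, "amount": 99.5}',
    '{"source": "auth", "priority": 1, "user": "alice", "failed_attempts": 3}',
    '{"source": "auth", "priority": 3, "user": "bob"}',
]


def store_all(raw_docs):
    """Grab data first: parse and keep every document whole, no schema needed.

    A plain list stands in for a MongoDB collection in this sketch; with
    pymongo this step would be roughly collection.insert_many(...).
    """
    return [json.loads(raw) for raw in raw_docs]


def find(collection, query):
    """Structure it later: return the documents whose fields equal every
    key/value pair in `query`.

    Exact-match only, on top-level fields. doc.get() returns None for a
    missing field, so documents lacking a queried field never match a
    non-None value.
    """
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


if __name__ == "__main__":
    events = store_all(RAW_DOCS)
    # Query on a field that was never declared anywhere up front.
    for doc in find(events, {"user": "alice"}):
        print(doc)
    # Filter on a field that every document happens to share.
    print(len(find(events, {"source": "auth"})))  # → 2
```

The point of the sketch is that `user` and `failed_attempts` were never declared in any schema; the query still works because the full documents were kept. This sketch scans every document on each query; in MongoDB you would build an index on fields you query often.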
As much as DBAs don't want to hear it, the requirement to define a data model and schema before you start collecting data creates serious friction in the early phases of development. It's not a big deal for a multi-month, large-scale project, but it can be a problem for the small hacks I'm typically involved in. In a recent project, we needed to store events from a queue in a database. All events have an originator, a timestamp, and a priority, but they also carry a bunch of "other data". In a pure relational schema, we could only keep the data we knew we needed and knew how to structure, and perhaps store the "other data" as a CLOB and be totally unable to use it later. With MongoDB, I can store everything, since the events themselves are JSON, and I can still query the data later, even though I don't know in advance what the data contains or what I'd like to get from my queries.

MongoDB is not the only "store now, structure later" datastore; Hadoop lets you do the same, and so do many other data stores. But if you are currently a relational-only shop, you can be sure that your developers are looking at one of those solutions with keen interest, and you would be wise to do the same.

Why is MongoDB a pain to manage?

The web is full of horror stories from companies regretting the moment they adopted MongoDB. Interestingly, those MongoDB war stories typically involve downtime far longer than anything you'd see with other data stores. It is very rare to see downtime on Oracle or MySQL lasting longer than 4 hours; with MongoDB, the stories often describe more than 12 hours of downtime. This is the main reason I kept nagging my managers to offer MongoDB support. I care about our customers and don't want to see them suffer 12 hours of downtime. As an IT professional, I find the idea unacceptable.

A lot of those outages look completely preventable, or at least solvable in much less time. The main causes boil down to two:

1. A company adopted MongoDB without fully understanding the model, its benefits, and its limitations. Especially its limitations. Any database has to be well understood before it goes into production, and MongoDB is no different. Assuming you can just install a database, throw data in, and expect it to work is lunacy. This kind of lunacy used to be all too common among MongoDB adopters. I wrote two blog posts ("No silver bullet" and "Difficulty of migrations") just ranting about this kind of optimism in the MongoDB community.

2. MongoDB was managed by a small team of developers. Developers are naturally not as good as experienced administrators at detecting problems early and responding accordingly, and they can often be out of their depth when things really go wrong. DBAs and sysadmins read mailing lists where people describe how their database crashed and discuss possible solutions. Developers are more likely to read mailing lists about programming problems and solutions. So guess who is more likely to recognize a production problem and know the solution?

Why are these problems worse in MongoDB than in other databases? Part of it is maturity: MongoDB is not as mature as Oracle or MySQL. The documentation (official, and in blogs and forums) is not as complete, the error messages are not as clear, and instrumentation is still lacking, so finding the root cause of an issue still takes longer. But the main problem is that MongoDB is being adopted into production by teams of developers with little or no operational commitment. Even if the operations team knows MongoDB is running somewhere (and that's not always the case!), they probably haven't gone to MongoDB training or even read a book about it. They have enough other work to do and hope the developers who pushed it into production know what they are doing. I saw this happen with the early adoption of MySQL, and it's déjà vu all over again.
This is why our most experienced MySQL admins have been busy learning MongoDB: they've been there before.

I hope you can now see why I'm so excited about managed services for MongoDB. Your developers can have the fun and leave the pain to us.

To answer the most frequently asked question: no, we are not competing with 10gen. First, because we like 10gen and hope to work with them a lot. Second, because we can't hope to compete with 10gen: they employ many MongoDB developers, they know the code inside out, and they can fix bugs for you when you need it. How could we compete with that? Just as our Oracle customers still use Oracle support, we encourage our MongoDB customers to also have 10gen support. The third reason is that 10gen can't compete with us either. We offer full managed services: we will configure complete monitoring of MongoDB based on our experience and best practices, and the alerts will go to our pagers, providing 24/7 support. We aim to fix problems before you even notice they're there. We are very proactive about your high availability, recoverability, and performance. We have tons of experience making systems run so smoothly that you'll forget they are there, until your customers call to ask "Why is everything so much faster now?" (True story!)

Speaking of monitoring, our team of experts is building a MongoDB monitoring system as I write this blog post. I think this is the secret sauce for MongoDB success. Monitoring and capacity planning go hand in hand and are the keys to keeping systems up and performing well, so we are putting a lot of focus on getting the basics right. Of course, if you already have your own monitoring solution (or use 10gen's), we'll integrate our pagers and capacity planning systems with whatever monitoring you use.

Are you running MongoDB in production? How's the experience so far?