Love Your MongoDB

Feb 4, 2013 / By Gwen Shapira

Pythian now officially supports MongoDB both as On-Demand and Managed Services offerings.
We’ve been dipping our toes into the MongoDB pool for some time now, as more and more of our customers adopt MongoDB into their data infrastructure, but now it’s official.

Part of the decision to officially support MongoDB was driven by our desire to offer our customers full managed services for their entire data stack, which includes MongoDB now. But I’d like to believe that my own advocacy to expand our services was part of the decision, and I kept advocating this because MongoDB is the perfect database to have with managed services.

MongoDB is perfect for managed services because there is no other database that is so much fun to use as a developer and so challenging to support as an administrator. Even as an experienced database administrator and a newbie developer, I much prefer programming on MongoDB to managing it. This is not the case with Oracle or MySQL, where tuning and tinkering are actually more fun than writing SQL.

Why is MongoDB such amazing fun for developers? MongoDB is a JSON document store. You can store any JSON doc there through a very easy-to-use API. Tons of RESTful web APIs send back JSON docs, and being able to easily dump them into a database makes MongoDB just perfect. I’m working on a small program to provide some social network analysis from Twitter, and because Twitter throttles some of the APIs rather aggressively, I need to slowly grab the data I need over many hours. Using MongoDB to store all that data is a no-brainer. It’s as easy as writing into text files.
Of course, unlike text files, MongoDB lets you build indexes and run queries.
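
As a rough sketch of that workflow in Python: the method names `insert_one`, `create_index`, and `find` follow the pymongo driver, but the `InMemoryCollection` class, the `store_and_query` helper, and the sample tweet fields are hypothetical stand-ins so that the example runs without a MongoDB server. It only supports exact-match filters, which is far less than a real collection offers.

```python
import json


class InMemoryCollection:
    """Hypothetical stand-in mimicking a small part of a pymongo collection."""

    def __init__(self):
        self.docs = []
        self.indexes = {}  # field name -> {value: [documents]}

    def insert_one(self, doc):
        self.docs.append(doc)
        # Keep any existing indexes up to date.
        for field, index in self.indexes.items():
            if field in doc:
                index.setdefault(doc[field], []).append(doc)

    def create_index(self, field):
        # Build a lookup table for one field over the documents stored so far.
        index = {}
        for doc in self.docs:
            if field in doc:
                index.setdefault(doc[field], []).append(doc)
        self.indexes[field] = index

    def find(self, query):
        # Use the first indexed field in the query to narrow the candidates,
        # then check every condition with an exact match.
        candidates = self.docs
        for field, value in query.items():
            if field in self.indexes:
                candidates = self.indexes[field].get(value, [])
                break
        return [d for d in candidates
                if all(d.get(k) == v for k, v in query.items())]


def store_and_query(collection, raw_responses, user):
    """Store raw JSON API responses as-is, then query them by user."""
    for raw in raw_responses:
        collection.insert_one(json.loads(raw))
    collection.create_index("user")
    return collection.find({"user": user})


# Made-up API responses; note that the documents do not share a fixed shape.
responses = [
    '{"user": "alice", "text": "hello", "followers": 10}',
    '{"user": "bob", "text": "hi", "lang": "en"}',
    '{"user": "alice", "text": "again"}',
]
matches = store_and_query(InMemoryCollection(), responses, "alice")
print([doc["text"] for doc in matches])
```

Because `store_and_query` only relies on those three methods, a real pymongo collection should be usable in place of the stand-in, although that is an assumption I have not tested here.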

This brings me to the most important reason MongoDB is awesome. MongoDB supports the most important paradigm of the post-relational-database era: Grab data first, structure it later. As much as DBAs don’t want to hear about it, the requirement to define a data model and schema before you start collecting data creates serious friction in the early phases of development. It’s not a big deal for a multi-month large-scale project, but it can be a problem for the small hacks I’m typically involved in. In a recent project, we needed to store events from a queue in a database. All events have an originator, a timestamp, and a priority. But they also have a bunch of “other data”. In a pure relational schema, we can only keep the data we know we need and know how to structure, and maybe store the “other data” as a CLOB and be totally unable to use it later. With MongoDB, I can store everything, since the events themselves are JSON. And I will still be able to query the data, even though I don’t know in advance what the data will contain or what I’d like to get from the queries.
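
To make "grab first, structure later" concrete, here is a small sketch in plain Python, with a list of parsed JSON events standing in for a collection. Only the `originator`, `timestamp`, and `priority` fields come from the project described above; the payloads, the extra fields, and the helper names `field_counts` and `where` are made up for illustration.

```python
import json
from collections import Counter

# Made-up queue events: a shared core plus arbitrary "other data".
raw_events = [
    '{"originator": "billing", "timestamp": 1359936000, "priority": 2, "invoice_id": 17}',
    '{"originator": "auth", "timestamp": 1359936060, "priority": 1, "user": "u1", "failed_logins": 3}',
    '{"originator": "auth", "timestamp": 1359936120, "priority": 3, "user": "u2"}',
]

# Grab first: keep every event whole, without declaring a schema.
events = [json.loads(raw) for raw in raw_events]


def field_counts(docs):
    """Structure later: count how often each field actually appears."""
    return Counter(key for doc in docs for key in doc)


def where(docs, **conditions):
    """Return documents whose fields exactly match all given conditions."""
    return [d for d in docs
            if all(k in d and d[k] == v for k, v in conditions.items())]


counts = field_counts(events)
auth_users = [e["user"] for e in where(events, originator="auth")]
suspicious = where(events, failed_logins=3)

print(counts.most_common())
print(auth_users)
print(suspicious)
```

The point of the sketch is that `failed_logins` was never declared anywhere, yet it can still be discovered and filtered on after the fact. With a CLOB column, the same question would mean parsing every row by hand.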

MongoDB is not the only “store now, structure later” datastore; Hadoop allows you to do the same and so do many other data stores. But if you are now a relational-only shop, you can be sure that your developers are looking at one of those solutions with keen interest, and you’ll be wise to do the same.

Why is MongoDB a pain to manage? The web is full of horror stories from companies regretting the moment they adopted MongoDB. It is interesting to see how those MongoDB war stories typically involve downtimes far longer than anything you’d see in other data stores. It is very rare to see downtime on Oracle or MySQL lasting longer than 4 hours. With MongoDB, stories often include over 12h of downtime. This is the main reason I was nagging my managers to provide MongoDB support. I care about our customers and don’t want to see them experience 12h downtime. As an IT professional, I find the idea unacceptable.

A lot of those downtimes look completely preventable or at least solvable in shorter amounts of time. The main causes can be reduced to:

1. A company adopted MongoDB without fully understanding the model, its benefits, and its limitations. Especially its limitations. Any database has to be well understood before it goes into production, and MongoDB is no different. Assuming that you can just install a database, throw data in, and have it work is lunacy. This kind of lunacy used to be too common with MongoDB adopters. I wrote two blog posts (No silver bullet and Difficulty of migrations) just ranting about this kind of optimism in the MongoDB community.

2. MongoDB was managed by a small team of developers. Developers are naturally not as good as experienced administrators at detecting problems early and responding accordingly, and they can often be out of their depth when things really go wrong. DBAs and sysadmins read mailing lists where people describe how their database crashed and discuss possible solutions. Developers are more likely to read mailing lists with programming problems and solutions. So guess who is more likely to recognize a production problem and know the solution?

Why are those problems worse in MongoDB than in other databases? Part of it is maturity – MongoDB is not as mature as Oracle or MySQL. The documentation (official and in blogs and forums) is not as complete, the error messages are not as clear, and instrumentation is still lacking. Finding the root cause of issues still takes longer. But the main problem is that MongoDB is getting adopted into production by teams of developers with little to no operational commitment. Even if the operations team knows MongoDB is running somewhere (and that’s not always the case!), they probably don’t go to MongoDB training or even read a book. They have enough other work to do and hope the developers who pushed it into production know what they are doing. I’ve seen this happen with the early adoption of MySQL, and it’s deja vu all over again. This is why our most experienced MySQL admins have been busy learning MongoDB – they’ve been there before too.

I hope you see now why I’m so excited about managed services for MongoDB. Your developers can have fun, and leave the pain to us.

To answer the most frequently asked question: No, we are not competing with 10Gen. First, because we like 10Gen and hope to work with them a lot. Second, because we can’t hope to compete with 10Gen – they employ many MongoDB developers, they know the code inside out, and they can fix bugs for you when you need it. How can we compete with that? Just like our Oracle customers still use Oracle support, we encourage our MongoDB customers to also have 10Gen support.
The third reason is that 10Gen can’t compete with us either. We offer full managed services – we will configure full monitoring of MongoDB, based on our experience and best practices, and the alerts will go to our pagers, providing 24/7 support. We aim to fix problems before you even notice they’re there. We are very proactive about your high availability, recoverability, and performance. We have tons of experience making systems run so smoothly that you’ll forget they are there, until your customers call to ask “Why is everything so much faster now?” (True story!)

Speaking of monitoring, our team of experts is building a MongoDB monitoring system as I’m writing this blog post. I think that this is the secret sauce for MongoDB success. Monitoring and capacity planning go hand in hand and are the keys to keeping systems up and performing well, so we put a lot of focus on getting the basics right. Of course, if you already have your own monitoring solution (or use 10Gen’s), we’ll integrate our pagers and capacity planning systems with whatever monitoring you use.

Are you running MongoDB in production? How’s the experience so far?

10 Responses to “Love Your MongoDB”

  • Yury says:

    Thanks for the blog post. I found it very educational. I am sure that admins will find their fun managing MongoDB too. It almost looks like a perfect place to be for an admin who dreams of getting his hands dirty with code fixes and who enjoys being an integrated part of the development team :)

  • Chetan says:

    Thanks for the blog post, it is very educational and nicely presented. (True Story!)

  • Andreas says:

    Thanks for a very informative and practical post.

    I read a lot about NoSQL but never had a personal experience – just playgrounds. You wrote
    “MongoDB war stories typically involve downtimes far longer than anything you’d see in other data stores”
    1.) I always thought one of THE advantages of “NoSQL” is availability and avoiding upgrade nightmares because of schemaless design (at least many NoSQL companies and promoters tell you so at every conference and in every paper). Could you please give some examples why the downtimes happened?
    2.) How many shards do you typically operate?

    • Gwen Shapira says:

      1) MongoDB is different from other NoSQL databases in that regard. You create shards, but those don’t provide replication and high availability (as they would in Cassandra); you need to create replication slaves for each shard to get HA. And some sorts of failures will cause them to fail as well (running out of connections or memory can cascade to slaves, and global locks are global). That’s what I meant by “know the features and limitations” – each NoSQL is different.

      2) Clusters are typically small, as our existing customers are just starting out with MongoDB. 3-5 shards are common. Sizing depends mostly on data sizes and size of memory per server.

  • Jared says:

    Thanks Gwen – very thoughtful and informative.

  • Dean Langford says:

    Thanks for the post, Gwen. I’m glad to hear Pythian is now supporting MongoDB.

    I’ve been working with Oracle for almost my entire career, first as a developer, then as a DBA. I’ve been working with MongoDB for the past 18+ months and I have to admit that the developer in me loves it (mostly; “What? No joins?!”). The DBA in me was hesitant at first (global lock, JSON commands, lack of good tools).

    Probably the biggest thing I miss in working with MongoDB, having worked for so long on an RDBMS, is SQL. I lost count of how many times I said to myself “if I could just write some SQL, this problem would be solved; but no, I have to write a program!”. Aggregation is another downside. Before version 2.2, you had to use MongoDB’s MapReduce (JavaScript) or aggregate in code. Things are better since 2.2, which includes the new Aggregation Framework, but I have not had a chance to use it yet.

    We were lucky when we started using MongoDB: the person who chose to use it for their project was very knowledgeable and experienced with data systems, and the company’s senior DBA (not me) was involved from the start. I think they did a lot of things right and that has helped minimize problems. MongoDB was a good fit for the project at the time, and we have used it in a couple of other projects since then. That’s important: MongoDB is not a fit for all applications. I think a lot of the horror stories you see on Hacker News, Reddit, or wherever are probably a case of “wrong tool for the job”.

    I can think of a few areas where we need improvement in managing MongoDB, some of which we can hopefully solve ourselves, some not:

    1) Backups. We’re using replica sets for high availability (as should everyone). They are easy to set up, as is just about everything else in MongoDB. We’re using the mongodump utility for a basic, non-consistent (useless, CYA) backup. Ideally, we would want to have more replicas for other purposes. One would be on a time delay, for those times when someone does something stupid, like drop a collection. Another replica could be for backup purposes: lock the replica and do a filesystem snapshot.

    2) Capacity Planning. You’ll hear this a lot, but the most important key to MongoDB performance is to size the system to keep the “working set” in memory. What is the “working set”? It depends on the application. At minimum, the collection’s indexes need to fit in memory. It is difficult to know the total size of the working set, though. My fear is that one day, the database performance will drop off a cliff. Monitoring is one component of avoiding that. Another is knowing the data usage patterns of the application. All of this is to answer the question “should I shard now?”.

    3) Monitoring. We’re using 10gen’s MMS service for monitoring. We’re also using a modified PostgreSQL adapter for Oracle Grid Control (still on 11) to monitor and alert on some basic metrics. MMS is nice, but I wish it had more alerting capabilities than simply “node is up/down”. 10gen has been awesome in supporting it, though, and giving it away for free, so who can complain?

    I think Pythian can provide value to companies that are starting to use MongoDB for areas such as these.

    • Ivan Saez says:

      Dean,

      You wrote:
      “….
      Monitoring. We’re using 10gen’s MMS service for monitoring. We’re also using a modified PostgreSQL adapter for Oracle Grid Control (still on 11) to monitor and alert on some basic metrics.
      ….”

      We also use Oracle Grid (OG) to monitor all our databases and are going to experiment with Mongodb and would like to monitor it with OG. Could you please let me know how you can monitor Mongodb with OG?
      I could write some UDM but I don’t want to re-invent the wheel.
      Thanks in advance.

      Ivan

  • Alex Popescu says:

    You guys are always preparing and ready for the future. In that sense and considering your dedication to customers, I think something that might be interesting to keep on your radar is http://www.rethinkdb.com. It’s still very young and there aren’t yet customers needing support for it. But I’m pretty sure this will change very soon. Indeed all the decisions that make it in the product are meant *not* to lead to another painful to manage database, but there’ll always be a need for experts.

    Have fun and good luck with Mongo!

  • I. Gorbatovsky says:

    Thank you for the article! Good to hear Mongo now is in support list.
