Most people also agree that MongoDB is very easy to prototype with. I'm talking especially about the kind of prototyping where you don't really know what the stress points of your application are going to be, what data will be most critical to access fast, what sort of questions you will need to ask of the data after the fact... In other words, exactly the situation you're in when you first start developing any new application.
MongoDB is easy to install, easy to use with your choice of development languages, and it has some wonderful operational features. Okay, we already know all this, so what is this trap that I am talking about?
I've seen time and time again what happens to a super-fast little MongoDB-backed app when the demand on it suddenly skyrockets: if there was no performance/load testing before launch, the bottlenecks come as a surprise, and it's not always clear what they are, where they are, and how to "fix" them.
Here is where people can fall into one of two traps:
1. assume that their application load is not a good fit for MongoDB at scale.
2. assume that MongoDB can handle this if they can figure out how to "tune" it properly.
"Wait a minute," I hear you say. "Isn't it going to be the case that one of those things is true?"
Absolutely. It's possible that the requirements of the application at scale are not a good fit for a document database. It's also possible that MongoDB is perfectly suited to the workload being thrown at it, and it's the use of MongoDB (schema design, indexing, hardware configuration) that wasn't done with high scale in mind.
"The trap" is when the assumption about which scenario you're in and reality don't match:
1. Scenario: the assumption is made that the application is not a good fit for MongoDB, when in fact the workload is a perfect fit and the bad performance is caused by poor schema design, a bad indexing strategy, or suboptimal topology or hardware choices.
Result: the team will spend tremendous effort architecting a new system to store their data when they could have improved their MongoDB performance by several orders of magnitude by fixing their design or hardware or cluster topology.
Here are some examples of this:
- There is a missing index. Solution: you don't fix the slow query by moving the task into a different system; you simply add the missing index (see the first sketch after this list).
- The indexes that exist are not a good match for the queries/updates running on the system; they take up RAM without providing much benefit. Solution: review and fix your indexes.
- The schema splits things into separate collections that should be stored together, so every request for the object takes multiple round trips to the database instead of just one. Solution: reconsider/redesign those parts of your schema (see the second sketch below).
- You have multiple shards, but every request is sent to every shard, increasing the overall number of requests the system must handle. Sharding scales best when each shard only has to do work on its portion of the dataset, not the entire dataset. Solution: choose a shard key that lets your most common queries target a single shard (see the third sketch below).
- You have a system that does heavy writes, but you chose an extremely slow storage system. Solution: get faster disks (after making sure that your writes aren't unnecessarily inefficient).
- You've made some inappropriate assumptions about how various parts of the system will benefit your use case, or followed outdated or flat-out wrong advice about how to use MongoDB. Solution: examine your assumptions that don't seem to be holding up, and remove any inefficiency that was created by following the incorrect assumption.
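To make the index bullets concrete, here's a minimal sketch using pymongo (the server address, database, collection, and field names are all hypothetical): ask the query planner how a slow query runs, and if it's doing a full collection scan, add the index it needs.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # assumed local server
orders = client.shop.orders  # hypothetical database/collection

# Ask the planner how this query would execute.
plan = orders.find({"customer_id": 42, "status": "open"}).explain()
stage = plan["queryPlanner"]["winningPlan"]["stage"]

# "COLLSCAN" means every document in the collection is read
# to answer the query.
if stage == "COLLSCAN":
    # A compound index matching the query predicate turns the
    # full scan into an index lookup ("IXSCAN").
    orders.create_index([("customer_id", ASCENDING), ("status", ASCENDING)])
```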
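For the schema bullet, here's a sketch of the same read done two ways (again with hypothetical names): when related records are split into a separate collection, assembling one logical object costs extra round trips; embedding them lets a single query return the whole thing.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.shop  # hypothetical database

# Split schema: one query for the order, then another round trip
# for its line items.
order = db.orders.find_one({"_id": 123})
items = list(db.order_items.find({"order_id": 123}))  # extra round trip

# Embedded schema: the line items live inside the order document,
# so a single query returns the complete object.
order = db.orders.find_one({"_id": 123})
# order["items"] is already populated; no second query needed.
```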
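And for the sharding bullet, a sketch of the difference between a targeted query and a scatter-gather one, assuming a hypothetical orders collection sharded on customer_id:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed mongos router
db = client.shop

# Targeted: the filter includes the shard key, so mongos routes the
# query to the single shard that owns this customer's chunk.
open_for_customer = list(db.orders.find({"customer_id": 42, "status": "open"}))

# Scatter-gather: no shard key in the filter, so mongos must broadcast
# the query to every shard and merge the results, multiplying the
# number of requests the cluster handles.
all_open = list(db.orders.find({"status": "open"}))
```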
A variant of these scenarios is when an assumption is made about how MongoDB will work with the application, and then time is wasted trying to "tune" the wrong thing. Or worse, "pre"-optimizations are made to cater to some rumored limitation of MongoDB, and it's the "optimization" that ends up killing the performance, not the original thing it was meant to "correct".
2. Scenario: the application load is a terrible fit for MongoDB, but for whatever reason the assumption is made that MongoDB will be able to handle it if only the right "tuning" is applied.
Result: The team will spend tremendous effort trying to improve performance of a square peg in a round hole, instead of finding a round peg.
Examples of this include any scenario where you find yourself implementing more database work in your application than you are asking the database to do for you. Out in the real world, I've seen situations where, after migrating from MongoDB to another datastore, the application ended up with a lot less code - that tends to tell me that MongoDB was a poor choice from the start.
To be honest, I've seen a lot more examples of scenario 1 than scenario 2. Because MongoDB is quite flexible, it can be a good fit for an extremely wide range of application needs, but if no thought is given to proper schema, indexes, and cluster configuration to serve those needs, there is no end to the number of ways it can fall short of your expectations.
The worst of them all is the assumption that because MongoDB was so easy to install and so easy to get started with (not to mention so fast when it was running with just some test data), it will somehow tune itself at scale, without the developer having to give any thought to it. It would be wonderful if that were true, but at the end of the day, MongoDB is a database, and there is no magic pixie dust you can sprinkle on it to say "just go faster" - it is my humble opinion that reports of the death of the MongoDB DBA role have been greatly exaggerated.