Most people also agree that MongoDB is very easy to prototype with. I'm talking especially about the kind of prototyping where you don't really know what the stress points of your application are going to be, what data will be most critical to access fast, what sort of questions you will need to ask of the data after the fact... In other words, exactly the situation you're in when you first start developing any new application.
MongoDB is easy to install, easy to use with your choice of development languages, and it has some wonderful operational features. Okay, we already know all this, so what is this trap that I am talking about?
I've seen time and time again what happens to a super-fast little MongoDB-backed app when the demand on it suddenly skyrockets: if there was no performance/load testing before launch, the bottlenecks come as a surprise, and it's not always clear what they are, where they are, and how to "fix" them.
Here is where people can fall into one of two traps:
1. assume that their application load is not a good fit for MongoDB at scale.
2. assume that MongoDB can handle this if they can figure out how to "tune" it properly.
"Wait a minute," I hear you say. "Isn't it going to be the case that one of those things is true?"
Absolutely. It's possible that the requirements of the application at scale are not a good fit for a document database. It's also possible that MongoDB is perfectly suited to the workload being thrown at it, and it's the use of MongoDB (schema design, indexing, hardware configuration) that wasn't done with high scale in mind.
"The trap" is when the assumption about which scenario you're in and reality don't match:
1. Scenario: the assumption is made that the application is not a good fit for MongoDB, when in fact the workload is a perfect fit and the bad performance is caused by poor schema design, a bad indexing strategy, or suboptimal topology or hardware choices.
Result: the team will spend tremendous effort architecting a new system to store their data when they could have improved their MongoDB performance by several orders of magnitude by fixing their design or hardware or cluster topology.
Here are some examples of this:
- There is a missing index. Solution: you don't fix the slow query by moving the task into a different system; you simply add the missing index (see the first sketch after this list).
- The indexes that exist are not a good match for the queries/updates running on the system; they take up RAM without providing much benefit. Solution: review and fix your indexes.
- The schema splits things into separate collections that should be stored together, so every request for the object takes multiple round trips to the database instead of just one. Solution: reconsider/redesign those parts of your schema (see the second sketch below).
- You have multiple shards, but every request is sent to every shard, increasing the overall number of requests the system must handle. Sharding scales best when each shard only has to do work on its portion of the dataset, not the entire dataset. Solution: choose a shard key that lets your most common queries target a single shard (see the third sketch below).
- You have a system that does heavy writes, but you chose an extremely slow storage system. Solution: get faster disks (after making sure that your writes aren't unnecessarily inefficient).
- You've made some inappropriate assumptions about how various parts of the system will benefit your use case, or followed outdated or flat-out wrong advice about how to use MongoDB. Solution: examine your assumptions that don't seem to be holding up, and remove any inefficiency that was created by following the incorrect assumption.
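To make the index bullets concrete, here's a minimal sketch using pymongo (the server address, database, collection, and field names are all hypothetical): ask the query planner how a slow query runs, and if it's doing a full collection scan, add the index it needs.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")  # assumed local server
orders = client.shop.orders  # hypothetical database/collection

# Ask the planner how this query would execute.
plan = orders.find({"customer_id": 42, "status": "open"}).explain()
stage = plan["queryPlanner"]["winningPlan"]["stage"]

# "COLLSCAN" means every document in the collection is read
# to answer the query.
if stage == "COLLSCAN":
    # A compound index matching the query predicate turns the
    # full scan into an index lookup ("IXSCAN").
    orders.create_index([("customer_id", ASCENDING), ("status", ASCENDING)])
```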
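For the schema bullet, here's a sketch of the same read done two ways (again with hypothetical names): when related records are split into a separate collection, assembling one logical object costs extra round trips; embedding them lets a single query return the whole thing.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.shop  # hypothetical database

# Split schema: one query for the order, then another round trip
# for its line items.
order = db.orders.find_one({"_id": 123})
items = list(db.order_items.find({"order_id": 123}))  # extra round trip

# Embedded schema: the line items live inside the order document,
# so a single query returns the complete object.
order = db.orders.find_one({"_id": 123})
# order["items"] is already populated; no second query needed.
```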
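And for the sharding bullet, a sketch of the difference between a targeted query and a scatter-gather one, assuming a hypothetical orders collection sharded on customer_id:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed mongos router
db = client.shop

# Targeted: the filter includes the shard key, so mongos routes the
# query to the single shard that owns this customer's chunk.
open_for_customer = list(db.orders.find({"customer_id": 42, "status": "open"}))

# Scatter-gather: no shard key in the filter, so mongos must broadcast
# the query to every shard and merge the results, multiplying the
# number of requests the cluster handles.
all_open = list(db.orders.find({"status": "open"}))
```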
A variant of these scenarios is when an assumption is made about how MongoDB will work with the application, and then time is wasted trying to "tune" the wrong thing. Or worse, "pre"-optimizations are made to cater to some rumored limitation of MongoDB, and it's the "optimization" that ends up killing the performance, not the original thing it was meant to "correct".
2. Scenario: the application load is a terrible fit for MongoDB, but for whatever reason the assumption is made that MongoDB will be able to handle it if only the right "tuning" is applied.
Result: The team will spend tremendous effort trying to improve performance of a square peg in a round hole, instead of finding a round peg.
Examples of this include any scenario where you find yourself implementing more database work in your application than you are asking the database to do for you. Out in the real world, I've seen situations where, after migrating from MongoDB to another datastore, the application ended up with a lot less code - that tends to tell me that MongoDB was a poor choice from the start.
To be honest, I've seen a lot more examples of scenario 1 than scenario 2. Because MongoDB is quite flexible, it can be a good fit for an extremely wide range of application needs, but if no thought is given to proper schema, indexes, and cluster configuration to serve those needs, there is no end to the number of ways it can fall short of your expectations.
The worst of them all is the assumption that because MongoDB was so easy to install and so easy to get started with (not to mention so fast when it was running with just some test data), it will somehow tune itself at scale, without the developer having to give any thought to it. It would be wonderful if that were true, but at the end of the day, MongoDB is a database, and there is no magic pixie dust you can sprinkle on it to say "just go faster" - it is my humble opinion that reports of the death of the MongoDB DBA role have been greatly exaggerated.