It boggles my mind when someone runs a single-threaded test that sends work to the database serially and then complains about bad throughput. To measure throughput, you have to throw as much work at the database server as you can, and that usually means running many clients or multi-threaded clients.
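To make the difference concrete, here is a minimal sketch in Python of a serial versus a concurrent load test. It does not talk to a real database: `fake_query` is a hypothetical stand-in that just sleeps for an assumed 10 ms round trip, and `measure_throughput` is a name made up for this illustration. Swap in a real driver call to test an actual server.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_query(latency_s=0.01):
    """Hypothetical stand-in for one database round trip (assumed 10 ms)."""
    time.sleep(latency_s)
    return 1


def measure_throughput(num_ops, num_workers, latency_s=0.01):
    """Run num_ops operations across num_workers threads; return ops per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each task is one simulated round trip; sum() counts the completed ones.
        completed = sum(pool.map(lambda _: fake_query(latency_s), range(num_ops)))
    elapsed = time.perf_counter() - start
    return completed / elapsed


if __name__ == "__main__":
    serial = measure_throughput(num_ops=200, num_workers=1)
    concurrent = measure_throughput(num_ops=200, num_workers=20)
    print(f"1 client:   {serial:8.0f} ops/s")
    print(f"20 clients: {concurrent:8.0f} ops/s")
```

With one worker, the client spends nearly all of its time waiting on each round trip, so the number it reports says more about the test than about the server. A real server will stop scaling at some point as you add clients, and that point is what a throughput test is trying to find.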
Recently on the MongoDB Google Group someone asked if adding more shards to their cluster would make a particular MapReduce job faster. Here is how I answered them:
If performance is limited by a lack of certain resources, then adding those resources may improve it. But the problem with testing performance with a mapReduce job is ... well, it's a program that at the end tries to reduce everything into a single set of results. Sharding may or may not help at all, depending on which phase of mapReduce is taking the most time...
Think of it this way: if you want to make a baby, and it takes one woman nine months to carry a baby to term, will having nine women get a baby to term in one month? Or even eight months? Nope. However, having nine women can theoretically get nine babies to term in nine months. It's not any faster, but you're getting better throughput.
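The arithmetic behind the analogy can be sketched in a few lines of Python. The function name `latency_and_throughput` and the assumption that every worker is fully independent are mine, purely for illustration:

```python
def latency_and_throughput(workers, months_per_job=9):
    """Assume each worker finishes one job in months_per_job, independently."""
    # Time until any single job completes: adding workers does not change it.
    latency_months = months_per_job
    # Jobs completed per month across all workers: scales with the worker count.
    throughput_per_month = workers / months_per_job
    return latency_months, throughput_per_month


for workers in (1, 9):
    latency, throughput = latency_and_throughput(workers)
    print(f"{workers} worker(s): latency = {latency} months, "
          f"throughput = {throughput:.2f} per month")
```

Latency stays at nine months in both cases, while throughput goes from roughly 0.11 to 1 per month.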