Stupid Tricks with MongoDB

Using 3.4 Aggregation Enhancements for Parallel Array Processing

11/30/2016

Now that 3.4 is out, I thought I'd publish some example aggregations I've shown to various folks over the last few months as we were testing new features. One thing that I've seen people store in MongoDB documents is "parallel arrays": two arrays that are correlated by position, so the first elements of each array are related, as are the second ones, and so on.

Here's a simple pipeline to add up the Nth elements of the two arrays:

db.example.find()
{ "_id" : ObjectId("583f35399bb2f9300fd1effe"), "a" : [ 1, 2, 3, 4, 5 ], "b" : [ 10, 20, 30, 40, 50 ] }
{ "_id" : ObjectId("583f355a9bb2f9300fd1efff"), "a" : [ 6, 7, 8 ], "b" : [ 600, 700, 800 ] }

db.example.aggregate( [ { "$project" : {
           "aPlusb" : {
               "$map" : {
                   "input" : { "$zip" : { "inputs" : [ "$a", "$b" ] } },
                   "as" : "zipped",
                   "in" : { "$sum" : "$$zipped" }
               }
           }
} } ] )
{ "_id" : ObjectId("583f35399bb2f9300fd1effe"), "aPlusb" : [ 11, 22, 33, 44, 55 ] }
{ "_id" : ObjectId("583f355a9bb2f9300fd1efff"), "aPlusb" : [ 606, 707, 808 ] }

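This is possible thanks to the new "$zip" operator, which serves the same purpose as Python's zip function: it combines multiple arrays into a single array of tuples, one tuple per position. "$map" then walks those tuples and "$sum" adds up each one.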
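
One thing the example above doesn't exercise is arrays of different lengths. By default "$zip" stops at the shortest input; the "useLongestLength" and "defaults" options change that. Here is a minimal sketch, assuming a hypothetical document { "a" : [ 1, 2, 3 ], "b" : [ 10, 20 ] } in the same collection (the variable names zipLongest and pipeline are mine, purely for illustration):

```javascript
// Sketch only: assumes a hypothetical document { "a" : [ 1, 2, 3 ], "b" : [ 10, 20 ] }.
var zipLongest = { "$zip" : {
    "inputs" : [ "$a", "$b" ],
    "useLongestLength" : true,   // pad the shorter array instead of truncating
    "defaults" : [ 0, 0 ]        // one padding value per input array
} };

var pipeline = [ { "$project" : {
    "aPlusb" : {
        "$map" : {
            "input" : zipLongest,
            "as" : "zipped",
            "in" : { "$sum" : "$$zipped" }
        }
    }
} } ];

// Runs only inside the mongo shell, where "db" is defined.
if (typeof db !== "undefined") {
    db.example.aggregate(pipeline).forEach(printjson);
}
```

For the assumed document this should give "aPlusb" : [ 11, 22, 3 ], since the missing third element of "b" is padded with 0. Note that "defaults" is only accepted together with "useLongestLength" : true.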

Is "$zip" only useful when you already have parallel arrays in your document? It turns out there are other cases where you may want to keep it in mind. One is when you have an array and would like to "enumerate" each element's index, or location, in the array, but you don't want or need to "$unwind" the array first. (In previous versions you could "$unwind" with the "includeArrayIndex" option, but then to recreate the original array with indexes you would have to do a "$group", which is likely to be very inefficient.)
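
For comparison, the older "$unwind" plus "$group" approach mentioned above might look roughly like this. It is a sketch rather than something I'd recommend: the output field names "value" and "ix" are made up for illustration, and the result is an array of subdocuments rather than two-element arrays.

```javascript
// Sketch of the pre-3.4 approach: explode the array, then rebuild it per document.
var pipeline = [
    // One output document per array element; "ix" receives the element's position.
    { "$unwind" : { "path" : "$a", "includeArrayIndex" : "ix" } },
    // Regroup by the original _id, collecting each element with its index.
    { "$group" : {
        "_id" : "$_id",
        "aWithIx" : { "$push" : { "value" : "$a", "ix" : "$ix" } }
    } }
];

// Runs only inside the mongo shell, where "db" is defined.
if (typeof db !== "undefined") {
    db.example.aggregate(pipeline).forEach(printjson);
}
```

Every array element becomes its own document before being grouped back together, which is where the extra cost comes from. Any other fields in the original documents are also dropped unless you carry them through the "$group" explicitly.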

Here's a simple way to use the new "$range" operator combined with "$zip" to generate array indexes along with the original array elements:

db.example.find()
{ "_id" : ObjectId("583f37859bb2f9300fd1f000"), "a" : [ "first", "second", "third" ] }
{ "_id" : ObjectId("583f37949bb2f9300fd1f001"), "a" : [ "pizza", "sushi" ] }

db.example.aggregate( [ { "$project" : {
           "aWithIx" : {
               "$zip" : {
                   "inputs" : [ "$a", { "$range" : [ 0, { "$size" : "$a" } ] } ]
               }
           }
} } ] )
{ "_id" : ObjectId("583f37859bb2f9300fd1f000"), "aWithIx" : [ [ "first", 0 ], [ "second", 1 ], [ "third", 2 ] ] }
{ "_id" : ObjectId("583f37949bb2f9300fd1f001"), "aWithIx" : [ [ "pizza", 0 ], [ "sushi", 1 ] ] }

I'm sure you noticed that I made my range 0-based and used the size of each array "a" as the end value. The end value is exclusive, and the default "step" (the optional third argument) is 1, so that works fine for this simple example.
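
To see what a non-default "step" does, here is a small sketch that picks every other element of "a" by generating the even indexes with "$range" and looking each one up with "$arrayElemAt". The output field name "everyOther" is just something I made up for this illustration.

```javascript
// Sketch: a step of 2 makes "$range" produce 0, 2, 4, ... up to (not including) the array size.
var pipeline = [ { "$project" : {
    "everyOther" : {
        "$map" : {
            "input" : { "$range" : [ 0, { "$size" : "$a" }, 2 ] },
            "as" : "ix",
            // Fetch the element of "a" at each generated index.
            "in" : { "$arrayElemAt" : [ "$a", "$$ix" ] }
        }
    }
} } ];

// Runs only inside the mongo shell, where "db" is defined.
if (typeof db !== "undefined") {
    db.example.aggregate(pipeline).forEach(printjson);
}
```

Against the two documents above, this should return [ "first", "third" ] and [ "pizza" ] respectively.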

There are many other great new aggregation features in 3.4. In the future, I want to show examples with some of the new stages: "$replaceRoot" and "$addFields", which allow you to manipulate the shape of your documents without having to know all the existing fields in them, and "$facet", which allows you to run several "parallel" aggregations on the same input stream of documents.

Asya Kamsky
