Asya's Collection of Random Stuff - Stupid Tricks with MongoDB

Comparing Subdocuments In MongoDB Expressions

Thu, 16 Jan 2020 19:01:35 GMT

A coworker recently asked how to efficiently determine that two subdocuments are "equal". The issue of course is that in normal MongoDB query language semantics, if you just say:

{$eq: [ {"a":1, "b":2},  {"b":2, "a":1} ] }

the result is false because the two subdocuments are not "equal". So how do you determine if two subdocuments are logically equal (without regard to field order) or whether all subdocuments in the collection or a group are logically equal?

The challenge the coworker was working on was comparing index definitions across multiple shards. There are several parts to the index definition. The key part very much depends on the order of the fields - an index on {"a":1, "b":1} is not the same thing as the index on {"b":1, "a":1}.

However, the options on the index are not order dependent: if the specification part of the index is {"unique":true, "sparse":true}, it has exactly the same effect as {"sparse":true, "unique":true}.

Here are a couple of functions to the rescue. The first one does a comparison of two objects and considers them equal if they have the same top level fields with the same values. The second one will "normalize" an object so that no matter what order the fields are in, they will be in alphabetical order in the result document.

var unorderedEq = function(o1, o2) {
    return {$eq: [
       {$arrayToObject:{$setUnion:[{$objectToArray:o1}]}},
       {$arrayToObject:{$setUnion:[{$objectToArray:o2}]}}
    ] };
};
var normalize = function(o) {
    return {"$arrayToObject" : {"$setUnion" : [ {"$objectToArray" : o}]}};
};
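To see what the two expressions compute, here is a minimal plain-JavaScript sketch. It is an illustration only: normalizeJs and unorderedEqJs are hypothetical names, and the sketch assumes that $setUnion hands back the {k, v} pairs sorted by key, which is the behavior the helpers rely on.

```javascript
// Plain-JS stand-in for {$arrayToObject:{$setUnion:[{$objectToArray:o}]}}.
// Assumption: the set union returns the {k, v} pairs ordered by key.
function normalizeJs(o) {
    var pairs = Object.keys(o).map(function (k) { return { k: k, v: o[k] }; });
    pairs.sort(function (x, y) { return x.k < y.k ? -1 : x.k > y.k ? 1 : 0; });
    var result = {};
    pairs.forEach(function (p) { result[p.k] = p.v; });
    return result;
}

// Top-level comparison only: nested subdocuments keep their own field order.
function unorderedEqJs(o1, o2) {
    return JSON.stringify(normalizeJs(o1)) === JSON.stringify(normalizeJs(o2));
}

unorderedEqJs({ a: 1, b: 2 }, { b: 2, a: 1 });                 // true
unorderedEqJs({ a: { x: 1, y: 2 } }, { a: { y: 2, x: 1 } });   // false
```

The second call shows the "top level fields" caveat: a nested subdocument with reordered fields still compares as different.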

Check out lots of other useful functions in my github repo here: https://github.com/asya999/bits-n-pieces/tree/master/scripts and try them out!

How to Convert Epoch to ISODate

Thu, 31 Oct 2019 21:16:22 GMT

The great thing about recent versions of MongoDB is that they added a lot of new expressions for handling different types in aggregation. The not so great thing is sometimes you still have to get things done in an older version that doesn't have the same capabilities.

A colleague asked me today how you can convert an epoch (number of milliseconds since 1970/1/1) to a proper ISODate format. The answer comes from date math:

db.coll.aggregate({$addFields:{
  date:{$add:[
  ISODate("1970-01-01T00:00:00Z"),
  "$epoch"
  ]}
}})

Because $add supports adding a number to a date (treating the number as milliseconds), we get back the proper ISODate that the number in the "epoch" field represents!
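The same arithmetic is easy to check outside the server. The sketch below is plain JavaScript, with Date standing in for the shell's ISODate. epochToDate and epochToDateExpr are hypothetical helper names, and the unit parameter is an assumption added for epochs stored in seconds rather than milliseconds.

```javascript
// Adding milliseconds to the 1970-01-01 origin yields the date the epoch represents.
function epochToDate(epochMillis) {
    var origin = new Date(0);                       // 1970-01-01T00:00:00Z
    return new Date(origin.getTime() + epochMillis);
}

// Builds the $add expression; seconds are scaled to milliseconds first.
function epochToDateExpr(field, unit) {
    var millis = unit === "seconds" ? { $multiply: [field, 1000] } : field;
    return { $add: [new Date(0), millis] };
}

epochToDate(86400000).toISOString();   // "1970-01-02T00:00:00.000Z"
```

In the shell this could be used as db.coll.aggregate({$addFields:{date: epochToDateExpr("$epoch", "seconds")}}).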

How to get k-combinations from array in aggregation

Wed, 11 Sep 2019 17:09:34 GMT

A colleague asked me if it's possible to generate all combinations of 2 items for a given array using aggregation pipeline expression. In other words, something that gives this:

> db.combos.find()
{_id:1, a:[ 1,2,3]}
> db.combos.aggregate({$project:{_id:0, pairs: <expression> }})
{pairs: [  [1,2], [1,3], [2,3] ]}

I'm always game to challenge aggregation so here is what I came up with for the expression for k=2:

{$reduce:{
   input:{$range:[0,{$size:"$a"}]},
   initialValue:[],
   in:{$concatArrays:[
      "$$value",
      {$let:{
         vars:{i:"$$this"},
         in:{$map:{
            input:{$range:[{$add:[1,"$$i"]},{$size:"$a"}]},
            in:[ {$arrayElemAt:["$a","$$i"]}, {$arrayElemAt:["$a","$$this"]}]
         }}
      }}
   ]}
}}

This is the aggregation equivalent of looping over the array elements and for each one looping over the remaining array elements to create the pairs. If array is `a` it's:

pairs = [];
for (i=0; i<a.length; i++) {
    for (j=i+1; j<a.length; j++) {
        pairs.push([ a[i], a[j] ]);
    }
}
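For a runnable check of the expected output, and for values of k other than 2, here is a plain JavaScript sketch. kCombinations is a hypothetical helper, not an aggregation expression; it generalizes the nested loops by recursion.

```javascript
// Returns all k-element combinations of arr, preserving the original element order.
function kCombinations(arr, k) {
    if (k === 0) return [[]];
    var result = [];
    for (var i = 0; i <= arr.length - k; i++) {
        // Pick arr[i], then choose the remaining k-1 items from what follows it.
        var tails = kCombinations(arr.slice(i + 1), k - 1);
        for (var j = 0; j < tails.length; j++) {
            result.push([arr[i]].concat(tails[j]));
        }
    }
    return result;
}

kCombinations([1, 2, 3], 2);   // [ [1,2], [1,3], [2,3] ]
```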

Aggregation expressions are pretty powerful. I gave a talk about the power of expressions over arrays at MongoDB World and local events in 2017/2018: if you missed it watch it HERE.

More Truncation of Dates in Aggregation

Tue, 23 Jan 2018 22:13:57 GMT

A long time ago, I wrote about how to convert an ISODate() that's got time into just a date (meaning zeroing out the hours/minutes/seconds/milliseconds). We did it by subtracting from the date the number of milliseconds since midnight which we calculated using some date math and $hour, $minute, $second and $millisecond expressions.

As of 3.6 there's a slightly simpler way to achieve the same thing using the "$dateFromParts" expression.

Here's an example:

> db.dates.find({},{_id:0})
{ "sys_created_on" : ISODate("2012-02-18T03:04:49Z") }
{ "sys_created_on" : ISODate("2012-02-18T03:04:49Z") }
{ "sys_created_on" : ISODate("2012-02-18T03:04:49Z") }
{ "sys_created_on" : ISODate("2012-02-10T03:04:49Z") }
{ "sys_created_on" : ISODate("2012-02-28T03:04:49Z") }
{ "sys_created_on" : ISODate("2012-03-18T03:04:49Z") }
> db.dates.aggregate({$project:{_id:0, roundDate:{$dateFromParts:{
     year:{$year:"$sys_created_on"},
     month:{$month:"$sys_created_on"},
     day:{$dayOfMonth:"$sys_created_on"}
}}}})
{ "roundDate" : ISODate("2012-02-18T00:00:00Z") }
{ "roundDate" : ISODate("2012-02-18T00:00:00Z") }
{ "roundDate" : ISODate("2012-02-18T00:00:00Z") }
{ "roundDate" : ISODate("2012-02-10T00:00:00Z") }
{ "roundDate" : ISODate("2012-02-28T00:00:00Z") }
{ "roundDate" : ISODate("2012-03-18T00:00:00Z") }
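The same pattern extends to coarser truncation. Here is a sketch of a shell helper that builds the expression; truncateDate and its unit argument are hypothetical names. It relies on $dateFromParts defaulting an omitted month or day to 1 and the time fields to 0. The second function is a plain-JavaScript check of what truncating to the day means in UTC.

```javascript
// Builds a $dateFromParts expression that zeroes out everything finer than `unit`.
function truncateDate(field, unit) {
    var parts = { year: { $year: field } };
    if (unit === "month" || unit === "day") parts.month = { $month: field };
    if (unit === "day") parts.day = { $dayOfMonth: field };
    return { $dateFromParts: parts };
}

// Plain-JS equivalent of "day" truncation, in UTC.
function truncateToDayJs(d) {
    return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
}

truncateToDayJs(new Date("2012-02-18T03:04:49Z")).toISOString();   // "2012-02-18T00:00:00.000Z"
```

With this helper, truncateDate("$sys_created_on", "day") produces the same expression as the example above, and "month" or "year" drop the finer parts.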

Aggregation Helper Functions: lpad

Thu, 07 Dec 2017 17:54:48 GMT

MongoDB aggregation provides quite a few string manipulation functions. There are many that it doesn't provide (yet), but we can express them ourselves using existing string expressions.

Today's example is 'lpad': given a string, a desired length and a character to pad with, return a string that is at least that length, padding it on the front (i.e. left side) with the provided pad character if it was shorter (by default we will pad with a space).

lpad = function (str, len, padstr=" ") {
    var redExpr={$reduce:{
        input:{$range:[0,{$subtract:[len, {$strLenCP:str}]}]},
        initialValue:"",
        in:{$concat:["$$value",padstr]}}};
    return {$cond:{
        if:{$gte:[{$strLenCP:str},len]},
        then:str,
        else:{$concat:[ redExpr, str]}
    }};
}
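To make the semantics of the generated expression concrete, here is a plain-JavaScript mirror of it. lpadJs is a hypothetical name used only for this sketch; it is not what runs on the server.

```javascript
// Mirrors the lpad expression: append padstr once per missing character, then the string.
function lpadJs(str, len, padstr) {
    if (padstr === undefined) padstr = " ";
    var missing = len - Array.from(str).length;   // counts code points, like $strLenCP
    if (missing <= 0) return str;
    var pad = "";
    for (var i = 0; i < missing; i++) pad += padstr;
    return pad + str;
}

lpadJs("4", 2, "0");     // "04"
lpadJs("11", 2, "0");    // "11"
lpadJs("7", 3, "ab");    // "abab7"
```

The last call shows one consequence of the design: the pad string is repeated once per missing character, so a multi-character pad overshoots the requested length.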

To test the function, let's look at converting one date format to another:

db.d2s.aggregate({$project:{_id:0}})
{ "d" : "1/4/2017" }
{ "d" : "1/14/2017" }
{ "d" : "11/8/2017" }
{ "d" : "09/6/2017" }

To convert to "YYYY-MM-DD" we can use $split and $arrayElemAt expressions:

db.d2s.aggregate({$project:{_id:0, dt:{$let:{
    vars:{parts:{$split:["$d","/"]}},
    in:{$concat:[
        {$arrayElemAt:["$$parts",2]},'-',
        {$arrayElemAt:["$$parts",0]} ,'-',
        {$arrayElemAt:["$$parts",1]}
    ]}
}}}})
{ "dt" : "2017-1-4" }
{ "dt" : "2017-1-14" }
{ "dt" : "2017-11-8" }
{ "dt" : "2017-09-6" }

But to get it to look right, we want to pad single digit days and months with '0' and we can use our function for this:

db.d2s.aggregate({$project:{_id:0, dt:{$let:{
    vars:{parts:{$split:["$d","/"]}},
    in:{$concat:[
        {$arrayElemAt:["$$parts",2]},'-',
        lpad({$arrayElemAt:["$$parts",0]},2,"0") ,'-',
        lpad({$arrayElemAt:["$$parts",1]},2,"0")
    ]}
}}}})
{ "dt" : "2017-01-04" }
{ "dt" : "2017-01-14" }
{ "dt" : "2017-11-08" }
{ "dt" : "2017-09-06" }

Great news is that in 3.6 (out earlier this week!) you can take advantage of some great new date expressions to avoid all this extra work like this:

db.d2s.aggregate({$project:{_id:0, dt:{
    $dateToString:{
        format:'%Y-%m-%d',
        date:{$dateFromString:{dateString:"$d"}}
    }
}}})
{ "dt" : "2017-01-04" }
{ "dt" : "2017-01-14" }
{ "dt" : "2017-11-08" }
{ "dt" : "2017-09-06" }

$dateFromString is new and while it does not take a format specifier, it can handle just about every format of date string I tried to throw at it!

Luckily we still can use lpad helper when we want to line up string columns.

Rank and Dense Rank in Aggregation

Wed, 19 Jul 2017 19:28:41 GMT

Earlier today someone asked me if it was possible to do dense ranking using aggregation framework. If you need a reminder of rank vs dense rank (I did) rank is the one that ranks sequentially making ties have the same rank but then skipping the rank that would have been used if there was no tie. So if the values we have are [ 100, 96, 96, 25, 25, 1 ] then ranks would be [ 1, 2, 2, 4, 4, 6]. Dense rank will also give ties the same rank, but it doesn't skip any "position" so the dense ranks for the same set would be: [ 1, 2, 2, 3, 3, 4 ].
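To make the two definitions concrete, here is a small plain-JavaScript sketch that computes both kinds of rank for an already sorted list. It is not the aggregation helper discussed in this post; ranks is a hypothetical function that only illustrates the difference.

```javascript
// Expects scores sorted in descending order; returns the rank of each position.
function ranks(sortedScores, dense) {
    var out = [];
    for (var i = 0; i < sortedScores.length; i++) {
        if (i > 0 && sortedScores[i] === sortedScores[i - 1]) {
            out.push(out[i - 1]);                       // tie: same rank as previous
        } else if (dense) {
            out.push(i === 0 ? 1 : out[i - 1] + 1);     // next rank, no gaps
        } else {
            out.push(i + 1);                            // position-based, gaps after ties
        }
    }
    return out;
}

ranks([100, 96, 96, 25, 25, 1], false);   // [ 1, 2, 2, 4, 4, 6 ]
ranks([100, 96, 96, 25, 25, 1], true);    // [ 1, 2, 2, 3, 3, 4 ]
```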

Since ranking is done within specific "grouping", you're probably not surprised that $group stage is going to be involved. To get the scores (whatever you want to rank by) in order you can use $sort on that field first (assuming you have an index to support it) or you can $group with $push first and then sort the array in each document. Then you need to do ranking. Because it's a pretty complex expression, I created a helper function that generates it based on appropriate inputs:

That's it! Now that I have that function, I can pass in my array of objects, specifying which field is being used for ranking, and whether or not I want dense ranking or regular ranking.

The most sharp-eyed of you may have noticed that I must be running this in the latest 3.5 development release, because I used the "$mergeObjects" expression in my rankArray function. You can simulate the functionality of "$mergeObjects" by using $objectToArray, $concatArrays and $arrayToObject to do exactly the same thing in 3.4.4 or later, or if you are on an earlier version, instead of $mergeObjects of "$$this" and {"rank": "$$rank"} you can write the new object yourself explicitly:
{
"emp": "$$this.emp",
"sal" : "$$this.sal",
"rank": "$$rank"
}
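For the 3.4.4 route, a sketch of the simulation could look like the following. mergeObjectsExpr is a hypothetical helper name. The sketch assumes the two objects have no field names in common (as with "$$this" and a new "rank" field), since how repeated names are resolved by $arrayToObject is something to verify on your version.

```javascript
// Approximates {$mergeObjects:[a, b]} using expressions available in 3.4.4.
// Assumption: a and b do not share field names.
function mergeObjectsExpr(a, b) {
    return { $arrayToObject: { $concatArrays: [
        { $objectToArray: a },
        { $objectToArray: b }
    ] } };
}

mergeObjectsExpr("$$this", { rank: "$$rank" });
```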

$mergeObjects is only one of many great enhancements coming to MongoDB 3.6.

How to Match a Strict Subset of an Array in Order

Thu, 29 Jun 2017 16:44:27 GMT

While reviewing an old jira case for MongoDB that asked for a way to query for a strict subset of an array, I realized this can very easily be done in aggregation. Since I've been talking a lot recently about the power of aggregation (and MongoDB schema) lying in being able to query things stored in arrays, I thought I'd write up this example here.

The simple example will use a simple array of scalars representing "actions" like the example in the ticket.

db.test.find({},{})
{ "_id" : 1, "actions" : [ 2, 6, 3, 8, 5, 3 ] }
{ "_id" : 2, "actions" : [ 6, 4, 2, 8, 4, 3 ] }
{ "_id" : 3, "actions" : [ 6, 4, 6, 4, 3 ] }
{ "_id" : 4, "actions" : [ 6, 8, 3 ] }
{ "_id" : 5, "actions" : [ 6, 8 ] }
{ "_id" : 6, "actions" : [ 6, 3, 11, 8, 3 ] }
{ "_id" : 7, "actions" : [ 6, 3, 8 ] }

We want to find only the documents which contain actions [6, 3, 8] and in exactly this order with no intervening actions.

let wantedActions = [6, 3, 8];
db.test.aggregate([
  {$match:{actions:{$all:wantedActions}}},
])

Note that first we match to reduce the documents we will be processing only to the ones that contain all of the actions we are interested in (but in any order).

Next we create an array of indexes which will let us step through the actions array creating a new array of all three element sub-arrays. At the end of the first two stages, our results are:

db.test.aggregate([
    {$match:{actions:{$all:[6,3,8]}}},
    {$project:{actions638:{$map:{
       input:{$range:[0,{$subtract:[{$size:"$actions"},2]}]},
       in:{$slice:["$actions","$$this",3]}
    }}}}
])
{ "_id" : 1, "actions638" : [ [ 2, 6, 3 ], [ 6, 3, 8 ], [ 3, 8, 5 ], [ 8, 5, 3 ] ] }
{ "_id" : 2, "actions638" : [ [ 6, 4, 2 ], [ 4, 2, 8 ], [ 2, 8, 4 ], [ 8, 4, 3 ] ] }
{ "_id" : 4, "actions638" : [ [ 6, 8, 3 ] ] }
{ "_id" : 6, "actions638" : [ [ 6, 3, 11 ], [ 3, 11, 8 ], [ 11, 8, 3 ] ] }
{ "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }

Now it's easy to add another $match stage to get just the documents we want:

db.test.aggregate([
  {$match:{actions:{$all:wantedActions}}},
  {$project:{actions638:{$map:{
        input:{$range:[0,{$subtract:[{$size:"$actions"},2]}]},
        in:{$slice:["$actions","$$this",3]}
  }}}},
  {$match:{actions638:wantedActions}}
])
{ "_id" : 1, "actions638" : [ [ 2, 6, 3 ], [ 6, 3, 8 ], [ 3, 8, 5 ], [ 8, 5, 3 ] ] }
{ "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }

If the action is an object inside an array, note that we can perform necessary transformations on it during the $map stage - rather than outputting subarray of original elements, we can extract only a single element from the subobjects.

What if we care about finding all actions "in order" but they don't have to be in strict sequence - that is, other actions are allowed in between, as long as the order of the actions we are looking for is correct?

The simplest way to achieve that (out of many) would be to add a $filter expression to remove all actions which are not in our wantedActions list and then proceed with exact same processing we've already seen:

db.test.aggregate([
   {$match:{actions:{$all:wantedActions}}},
   {$project:{actions638:{
       $let: {
          vars: {ouractions:{$filter:{input:"$actions",cond:{$in:["$$this", wantedActions]}}}},
          in: {$map:{
               input:{$range:[0,{$subtract:[{$size:"$$ouractions"},2]}]},
               in:{$slice:["$$ouractions","$$this",3]}
          }}
      }
   }}},
   {$match:{actions638:wantedActions}}
])
{ "_id" : 1, "actions638" : [ [ 6, 3, 8 ], [ 3, 8, 3 ] ] }
{ "_id" : 6, "actions638" : [ [ 6, 3, 8 ], [ 3, 8, 3 ] ] }
{ "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }
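Both variants can be checked against the sample documents with a short plain-JavaScript sketch. The function names windows and matchesInOrder are hypothetical; the logic mirrors the $range/$slice windows and the optional $filter step, but runs client-side rather than in the pipeline.

```javascript
// All consecutive windows of length n, like the $range/$slice $map stage.
function windows(arr, n) {
    var out = [];
    for (var i = 0; i + n <= arr.length; i++) out.push(arr.slice(i, i + n));
    return out;
}

// strict=true: wanted must appear with no intervening actions.
// strict=false: other actions may sit in between (they are filtered out first).
function matchesInOrder(actions, wanted, strict) {
    var candidates = strict ? actions
        : actions.filter(function (a) { return wanted.indexOf(a) !== -1; });
    return windows(candidates, wanted.length).some(function (w) {
        return w.every(function (v, i) { return v === wanted[i]; });
    });
}

var docs = [
    { _id: 1, actions: [2, 6, 3, 8, 5, 3] },
    { _id: 2, actions: [6, 4, 2, 8, 4, 3] },
    { _id: 3, actions: [6, 4, 6, 4, 3] },
    { _id: 4, actions: [6, 8, 3] },
    { _id: 5, actions: [6, 8] },
    { _id: 6, actions: [6, 3, 11, 8, 3] },
    { _id: 7, actions: [6, 3, 8] }
];

function matchingIds(strict) {
    return docs
        .filter(function (d) { return matchesInOrder(d.actions, [6, 3, 8], strict); })
        .map(function (d) { return d._id; });
}

matchingIds(true);    // [ 1, 7 ]
matchingIds(false);   // [ 1, 6, 7 ]
```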

Converting ObjectId Values to year-month labels v2

Mon, 05 Jun 2017 17:38:27 GMT

A long time ago I wrote a blog post showing how to convert ObjectId value field to corresponding "YYYY-MM" string for reporting type applications. Since I wrote that, aggregation pipeline gained the "$switch" expression which makes the syntax a lot shorter and easier to express (and read).

For variety, this version converts *string* type that represents ObjectId value to corresponding year-month:

d=[];
o=[];
/* pads the 8 hex digits of the timestamp to the 24 digits of an ObjectId */
var pad="f000000000000000";
for (yr=2011;  yr < 2018; yr++ ) {
    for (m=1; m<13; m++) {
        if (m<10) mo="0"+m; else mo=""+m;
        var dt=new ISODate(""+yr+"-"+mo+"-01T00:00:00Z");
        d.push(""+yr+"-"+mo);
        /* wrap string in 'new ObjectId()' to convert OID rather than string type */
        o.push(""+(dt.getTime()/1000).toString(16)+pad);
    }
}
makeLabeledSwitch = function(field, keys, values) {
    var sw = {$switch:{
        "branches":[ ],
        default:"other"}
    };
    var br=[];
    var maxPos=keys.length;
    var first="<" + keys[0];
    br.push({case:{$lt:[field,values[0]]}, then:first})
    for (pos = 0; pos < maxPos-1; pos++) {
        br.push({case:{$lt:[field,values[pos+1]]}, then: keys[pos] });
    }
    var last=">" + keys[maxPos-1];
    sw["$switch"]["default"] = last;
    sw["$switch"]["branches"] = br;
    return sw;
}

This syntax is more straightforward, and what it does is quite similar: for every "YYYY-MM" string in the range you're interested in, it maps the range of ObjectId string values (or technically, their first four bytes) to that year-month. If you want to make this work with the actual ObjectId type rather than the string type, change the loop to populate the "o" array with the ObjectId() of the corresponding string.
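To see which label the generated branches select for a given value, here is a plain-JavaScript sketch that walks the same comparisons in the same order. All names in it (oidStringFor, monthBoundaries, pickLabel) are hypothetical, the pad value is an assumed filler for the remaining 16 hex digits, and Date stands in for the shell's ISODate.

```javascript
var pad = "f000000000000000";   // assumed filler for the last 16 hex digits

// 8 hex digits of seconds since the epoch, padded to ObjectId string length.
function oidStringFor(date) {
    return Math.floor(date.getTime() / 1000).toString(16) + pad;
}

// keys: "YYYY-MM" labels; values: ObjectId strings for the first of each month.
function monthBoundaries(fromYear, toYear) {
    var keys = [], values = [];
    for (var yr = fromYear; yr < toYear; yr++) {
        for (var m = 1; m < 13; m++) {
            var mo = (m < 10 ? "0" : "") + m;
            keys.push(yr + "-" + mo);
            values.push(oidStringFor(new Date(yr + "-" + mo + "-01T00:00:00Z")));
        }
    }
    return { keys: keys, values: values };
}

// Walks the branches in the order $switch would evaluate them.
function pickLabel(oidStr, keys, values) {
    if (oidStr < values[0]) return "<" + keys[0];
    for (var pos = 0; pos < keys.length - 1; pos++) {
        if (oidStr < values[pos + 1]) return keys[pos];
    }
    return ">" + keys[keys.length - 1];
}

var b = monthBoundaries(2011, 2018);
pickLabel(oidStringFor(new Date("2017-06-05T17:38:27Z")), b.keys, b.values);   // "2017-06"
```

Note that values at or after the last boundary fall through to the default label, so the final month in the list is reported with the ">" prefix.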

How to do intra-array comparisons

Fri, 05 May 2017 20:14:14 GMT

A colleague asked me how to find documents where the array of objects called "trans" has the following properties: one element contains a:0 and it's immediately followed by an element where a:1 and s>3.

In other words, flag the document that has trans array with element { ..., a:0, ...} immediately followed by element with { ..., a:1, s: N, ... } where N is greater than 3 or one that looks like this:

{
...
"trans" : [..., {...}, {..., a:0, ...}, {.., a:1, s:4, ...}, {...}, ...],
...
}

Here's the aggregation stage that adds a true or false field that indicates whether such a pattern was found in the "trans" array:

{$addFields: {
    bad: {$in:[
        true,
        {$map:{
            input: {$range:[0,{$subtract:[{$size:"$trans"},1]} ]},
            as: "z",
            in: {$let: {
                vars: {
                    e: {$arrayElemAt:["$trans","$$z"]},
                    e1: {$arrayElemAt:["$trans",{$add:[1,"$$z"]}]}
                },
                in: {$cond: {
                    if: {$and:[ {$eq:["$$e.a",0] },{$eq:["$$e1.a",1]}, {$gt:["$$e1.s",3]} ]},
                    then: true,
                    else: false
                }}
            }}
        }}
    ]}
}}

To elaborate: using "$range" we generate an array of integers that we $map over (as "z") to traverse "trans", and then we create two variables with "$let" which represent the array elements at position "z" and at position "z+1". We then check our conditions and if all of them are true, we output "true", otherwise "false". The resultant array of booleans is checked using the "$in" expression to see if "true" appears anywhere in it.
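The same check is easy to express client-side, which can help when validating the stage against sample data. This is a plain-JavaScript sketch; hasBadPair is a hypothetical name and the sample arrays are made up for illustration.

```javascript
// True if some element with a:0 is immediately followed by one with a:1 and s>3.
function hasBadPair(trans) {
    for (var z = 0; z < trans.length - 1; z++) {
        var e = trans[z], e1 = trans[z + 1];
        if (e.a === 0 && e1.a === 1 && e1.s > 3) return true;
    }
    return false;
}

hasBadPair([{ a: 1, s: 9 }, { a: 0 }, { a: 1, s: 4 }]);   // true
hasBadPair([{ a: 0 }, { a: 1, s: 3 }]);                   // false: s is not greater than 3
hasBadPair([{ a: 0 }, { a: 2 }, { a: 1, s: 4 }]);         // false: not adjacent
```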

This stage uses several 3.4 features, the "$addFields" stage, as well as the $range and $in expressions. We could have used "$anyElementTrue" expression instead of "$in" and "$project" instead of "$addFields" (though then we would need to know all the fields we wanted to pass through) but there is no equivalent to "$range" before 3.4, so without it, we would need to do far more complex manipulation involving "$unwind" with "includeArrayIndex" option (which was introduced in 3.2) followed by "$group". If at all possible, just upgrade to 3.4 if you need to do intra-array comparisons.

Using 3.4 Aggregation Enhancements for Parallel Array Processing

Wed, 30 Nov 2016 20:20:26 GMT

Now that 3.4 is out, I thought I'd publish some example aggregations I've shown to various folks over the last few months as we were testing new features. One thing that I've seen people store in MongoDB documents is "parallel arrays": two arrays that are somehow correlated, where the first elements of each array are related, so are the second ones, etc.

Here's a simple pipeline to add up each Nth element from each array:

db.example.find()
{ "_id" : ObjectId("583f35399bb2f9300fd1effe"), "a" : [ 1, 2, 3, 4, 5 ], "b" : [ 10, 20, 30, 40, 50 ] }
{ "_id" : ObjectId("583f355a9bb2f9300fd1efff"), "a" : [ 6, 7, 8 ], "b" : [ 600, 700, 800 ] }

db.example.aggregate( [ { "$project" : {
    "aPlusb" : { "$map" : {
        "input" : { "$zip" :{ "inputs" :["$a","$b"]}},
        "as" : "zipped",
        "in" : { "$sum":"$$zipped"}
    }}
}}])
{ "_id" : ObjectId("583f35399bb2f9300fd1effe"), "aPlusb" : [ 11, 22, 33, 44, 55 ] }
{ "_id" : ObjectId("583f355a9bb2f9300fd1efff"), "aPlusb" : [ 606, 707, 808 ] }

This is possible thanks to the new "$zip" operator, which works like Python's zip function and lets you combine multiple arrays into one.
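For readers who have not used zip before, here is a plain-JavaScript sketch of what the pipeline computes. The helper names zip and sumPairs are hypothetical; the sketch follows the default "$zip" behavior of stopping at the shortest input array.

```javascript
// Like $zip with default options: stops at the shortest input array.
function zip(a, b) {
    var n = Math.min(a.length, b.length), out = [];
    for (var i = 0; i < n; i++) out.push([a[i], b[i]]);
    return out;
}

// Equivalent of mapping $sum over the zipped pairs.
function sumPairs(a, b) {
    return zip(a, b).map(function (pair) { return pair[0] + pair[1]; });
}

sumPairs([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]);   // [ 11, 22, 33, 44, 55 ]
```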

Is "$zip" only useful when you already have parallel arrays in your document? It turns out there are other cases you may want to keep it in mind. One situation may be when you have an array and you would like to "enumerate" each element's index or location in the array, but you don't want or need to "$unwind" the array first (in previous versions you could "$unwind" with "includeArrayIndex" option but then to recreate the original array with indexes you would have to do a "$group" which is likely to be very inefficient.)

Here's a simple way to use new "$range" operator combined with "$zip" to generate array indexes along with original array elements.

db.example.find()
{ "_id" : ObjectId("583f37859bb2f9300fd1f000"), "a" : [ "first", "second", "third" ] }
{ "_id" : ObjectId("583f37949bb2f9300fd1f001"), "a" : [ "pizza", "sushi" ] }

db.example.aggregate( [ { "$project" : {
    "aWithIx" : {
        "$zip" : {
            "inputs" : [ "$a", { "$range" : [ 0, { "$size" : "$a" } ] } ]
        }
    }
} } ] )
{ "_id" : ObjectId("583f37859bb2f9300fd1f000"), "aWithIx" : [ [ "first", 0 ], [ "second", 1 ], [ "third", 2 ] ] }
{ "_id" : ObjectId("583f37949bb2f9300fd1f001"), "aWithIx" : [ [ "pizza", 0 ], [ "sushi", 1 ] ] }

I'm sure you noticed that I made my range 0 based and I used size of each array "a" as the end value. The default "step" (optional third argument) is 1 so that works fine for this simple example.

There are many other great new aggregation features in 3.4. In the future, I want to show examples with some of the new stages: "$replaceRoot" and "$addFields", which allow you to manipulate the shape of your documents without having to know all the existing fields in them as well as "$facet" which allows you to run several "parallel" aggregations on the same input stream of documents.

Using 3.4 Aggregation to Return Documents in Same Order as "$in" Expression

Mon, 24 Oct 2016 19:57:24 GMT

Sometimes when performing a MongoDB query with a long "$in" list, you might want the documents returned in the same order as the elements of the "$in" array. This request is Jira ticket SERVER-7528. Upcoming version 3.4 adds many cool new features, and some of the newly available aggregation stages and expressions make it pretty easy to do this.

Example of our collection:

{ "_id" : ObjectId("580e51fc87a0572ee623854f"), "name" : "Asya" }
{ "_id" : ObjectId("580e520087a0572ee6238550"), "name" : "Charlie" }
{ "_id" : ObjectId("580e520587a0572ee6238551"), "name" : "Tess" }
{ "_id" : ObjectId("580e520887a0572ee6238552"), "name" : "David" }
{ "_id" : ObjectId("580e520c87a0572ee6238553"), "name" : "Kyle" }
{ "_id" : ObjectId("580e521287a0572ee6238554"), "name" : "Aly" }

The query we want to run is one that will return all documents where name is one of "David", "Charlie" or "Tess" and we want them in that exact order.

> db.people.find({"name":{"$in": ["David", "Charlie", "Tess"]}}).sort({ ??? })

Let's define a variable called "order" so we don't have to keep typing the names in the array:

> order = [ "David", "Charlie", "Tess" ]

Here's how we can do this with aggregation framework:

m = { "$match" : { "name" : { "$in" : order } } };
a = { "$addFields" : { "__order" : { "$indexOfArray" : [ order, "$name" ] } } };
s = { "$sort" : { "__order" : 1 } };
db.people.aggregate( [ m, a, s ] );

Our result:
{ "_id" : ObjectId("580e520887a0572ee6238552"), "name" : "David", "__order" : 0 }
{ "_id" : ObjectId("580e520087a0572ee6238550"), "name" : "Charlie", "__order" : 1 }
{ "_id" : ObjectId("580e520587a0572ee6238551"), "name" : "Tess", "__order" : 2 }

The "$addFields" stage is new in 3.4 and it allows you to "$project" new fields to existing documents without knowing all the other existing fields. The new "$indexOfArray" expression returns position of particular element in a given array.

The result of this aggregation will be documents that match your condition, in order specified in the input array "order", and the documents will include all original fields, plus an additional field called "__order". If we want to remove this field, 3.4 allows "$project" stage with just exclusion specification, so we would just add { "$project": {"__order":0}} at the end of our pipeline.
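The three stages map one-to-one onto ordinary array operations, which this plain-JavaScript sketch illustrates. findInOrder is a hypothetical name, and the sample data keeps only the "name" field of the documents above.

```javascript
var people = [
    { name: "Asya" }, { name: "Charlie" }, { name: "Tess" },
    { name: "David" }, { name: "Kyle" }, { name: "Aly" }
];
var order = ["David", "Charlie", "Tess"];

function findInOrder(docs, order) {
    return docs
        // $match with $in
        .filter(function (d) { return order.indexOf(d.name) !== -1; })
        // $addFields with $indexOfArray
        .map(function (d) { return { name: d.name, __order: order.indexOf(d.name) }; })
        // $sort on the computed position
        .sort(function (x, y) { return x.__order - y.__order; });
}

findInOrder(people, order).map(function (d) { return d.name; });   // [ "David", "Charlie", "Tess" ]
```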

Lots of great new things coming in 3.4 - I'll post some more tricks soon.

Converting ObjectId to dates in aggregation

Wed, 17 Feb 2016 22:00:08 GMT

Someone just asked me how they can do reporting grouping on "year-month" when they only have the ObjectId generated by MongoDB to represent creation date.

ObjectId is very useful: its first four bytes are the timestamp of when it was generated. There's a simple way to convert it to a full date in JavaScript (for example, in the mongo shell), but there is no way to convert it to a timestamp in the aggregation pipeline (although there is a request for such a feature).
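For reference, the JavaScript conversion is short. In the mongo shell, calling getTimestamp() on an ObjectId returns its creation time as a date; the sketch below does the same arithmetic on the hex string in plain JavaScript. oidHexToDate is a hypothetical helper and the sample value is made up.

```javascript
// The first 8 hex digits (4 bytes) of an ObjectId are seconds since the Unix epoch.
function oidHexToDate(hex) {
    var seconds = parseInt(hex.substring(0, 8), 16);
    return new Date(seconds * 1000);
}

oidHexToDate("000000010000000000000000").toISOString();   // "1970-01-01T00:00:01.000Z"
```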

Since we can't do it in aggregation natively, we can use a stupid trick to generate "YEAR-MONTH" from ObjectId during $project stage so that we can group by it. Here is how I did it.

Working in the shell, first I generated an array of objects which represent all the months I want to report for (so I only generated a few years worth of months):

var d = [];
var o = [];
var pad="f000000000000000";
for (yr=2014;  yr < 2017; yr++ ) {
    for (m=1; m<13; m++) {
        if (m<10) mo="0"+m; else mo=""+m;
        var dt=new ISODate(""+yr+"-"+mo+"-01T00:00:00Z");
        d.push(""+yr+"-"+mo);
        o.push(new ObjectId( (dt.getTime()/1000).toString(16)+pad));
    }
}

This generated two arrays of "YYYY-MM" strings and their corresponding ObjectId() values.

Now I can create a shell function which takes a field and two arrays, and creates an expression we can use in $project stage to map ranges in the second array to labels in the first array:

makeLabeledBuckets=function( field, keys, values) {
    var con=[];
    var maxPos=keys.length;
    con[maxPos]=">" + keys[maxPos-1];
    for (pos = maxPos-1; pos > 0; pos--) {
        con[pos] = {"$cond":{
            if: {$lt:[field, values[pos]]},
            then:  keys[pos-1],
            else:  con[pos+1]
        }};
    }
    var first = "< " + keys[0];
    con[0]={"$cond":{if: {$lt:[field,values[0]]}, then: first, else: con[1] }};
    return con[0];
}

Now we can run our aggregation in the shell like this:

> db.collection.aggregate( [
   { $project: { yearMonth: makeLabeledBuckets("$_id", d, o) } }
] )
{ "_id" : ObjectId("55af2194cd214aaa0a5e3545"), "yearMonth" : "2015-08" }
{ "_id" : ObjectId("55af21b5cd214aaa0a5e3548"), "yearMonth" : "2015-08" }
{ "_id" : ObjectId("55aff3f78909abe4721284bc"), "yearMonth" : "2015-08" }
{ "_id" : ObjectId("55aff4bd8909abe4721284c0"), "yearMonth" : "2015-08" }
{ "_id" : ObjectId("56900c440172f6f5768fb249"), "yearMonth" : "2016-02" }
{ "_id" : ObjectId("56900d780172f6f5768fb24c"), "yearMonth" : "2016-02" }
{ "_id" : ObjectId("56900dc80172f6f5768fb24e"), "yearMonth" : "2016-02" }
{ "_id" : ObjectId("569014240172f6f5768fb251"), "yearMonth" : "2016-02" }

As you can see, each ObjectId in "_id" field got converted to corresponding "year-month" string, which we can now use to aggregate other metrics by.

Determining Type of Field in MongoDB Aggregation

Wed, 15 Jul 2015 19:53:19 GMT

It can be useful to determine what data type a particular field is during aggregation. It's most useful to determine whether something is an array or not - mainly so that you don't try to get its $size, for example, but it can be useful for other types - you may need to apply some sort of transformation, or conversion before the next stage.

Aggregation does not (yet) have a $typeOf expression, otherwise you could just $project a new field with {"$typeOf" : "$field"} as its value. So we have to be more tricky and starting with MongoDB 3.0 we can be, due to SERVER-3304 which ensures consistent total ordering across all different types.

The ordering is documented and you can always double-check the source code just to be sure.

Let's look at an example collection:

How can we create an aggregation to output the type of each "f1" value? One thing that will help in the future will be the $isArray operator coming in 3.2 (now available as part of development version 3.1.5), but we don't really need it here. Knowing the total ordering across all types, we can figure out each type by comparing the value to lowest possible value of that data type, ordering the comparisons in such a way as to always get at most one type.
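As a rough sketch of that comparison idea (this is not the getTypes function used in this post), the helper below builds nested "$cond" tests from a list of boundaries. typeCondExpr is a hypothetical name. The three boundary values are assumptions based on the documented ordering, in which strings, objects and arrays are adjacent types: "" is taken as the lowest string, {} as the lowest object and [] as the lowest array. A complete version would need one boundary per type, which is why the outermost labels here are deliberately vague.

```javascript
// boundaries: [label, lowestValue] pairs in ascending type order.
// Produces nested $cond tests that check the highest type first.
function typeCondExpr(field, boundaries, otherwise) {
    var expr = otherwise;
    for (var i = 0; i < boundaries.length; i++) {
        expr = { $cond: {
            if: { $gte: [field, { $literal: boundaries[i][1] }] },
            then: boundaries[i][0],
            else: expr
        } };
    }
    return expr;
}

var typeF1 = typeCondExpr("$f1",
    [ ["string", ""], ["object", {}], ["array or higher", []] ],
    "number or lower");
```

The resulting object could be used as the value of a new field in a $project stage; each value lands in the first branch whose boundary it is greater than or equal to.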

Because we don't want to write out long "if then else" type conditions, let's generate them in the mongo shell with a function:

When we include a call to this function with a string representing value of a field, it generates a very long string of "$cond" "if:, then:, else:" tests, which gives us a type. Let's now include it in our aggregation call and check out the results:

The $project stage passed through _id (default) and "f1" plus we added a new field "typeF1" which was equal to the long and nested conditional generated by getTypes js function we just created.

Until SERVER-13447 gives us an operator/expression to get the same value simply, this will work just fine.

Troubles with Stepdown? No problem...

Mon, 23 Jun 2014 15:35:43 GMT

Here is a stupid MongoDB Trick for someone who accidentally tells their primary to step down and not be eligible for election for some number of seconds - and that number is higher than they intended.

How do you cut that time short?

Normally when you run:

rs.stepDown(120)

your primary will step down (relinquish its Primary role) and it won't be eligible to be re-elected for two minutes. What if you realize that you didn't really mean to do this, or whatever you meant took about 5 seconds instead of 120?

You can't do another rs.stepDown with a different time value, because it will rightfully give you an error saying it cannot step down, not being a primary.

But what you can do is use the rs.freeze() command - this would normally be run on a secondary to prevent it from being eligible for the election for some number of seconds, but it has a special treatment for being passed 0 seconds:

rs.freeze(0)
{ "info" : "unfreezing", "ok" : 1 }

Well, isn't that convenient!

New "Little MongoDB Book" Updated for 2.6.

Sat, 31 May 2014 18:51:25 GMT

More than three years ago when MongoDB was newer (1.8) and not as well known as it is today, Karl Seguin wrote a free ebook called "The Little MongoDB Book".

I read it about two years ago - it was only slightly out of date - and I really enjoyed the high level introduction it gives people. While it may be addressing developers, architects or managers, and doesn't have as much for DBAs, it was still a great place to get a quick intro.

MongoDB (the company) frequently hands out printed copies of this book at Meetups and MongoDB days. Over the last two years I was saddened that while still being a good intro, the technical details were getting out of date.

The great thing about all things open is that you can fork a github repo, make updates and then create a pull request, which is just a fancy way of saying you can make updates yourself and then ask the owner to include them.

And that's what I did. And yesterday, Karl announced the newly updated book is available to all.

The Little MongoDB Book has finally been updated thanks to @asya999 http://t.co/KJNxA3Iks7 http://t.co/vUf96rbbtp
— karlseguin (@karlseguin) May 30, 2014

Enjoy!