Someone just asked me how they can do reporting grouping on "year-month" when they only have the ObjectId generated by MongoDB to represent creation date.
While ObjectId is very useful - its first four bytes are the timestamp when it was generated - there's a simple way to convert it to a full date in Javascript (like mongo shell) but there is no way to convert it to a timestamp in aggregation pipeline (although there is a request for such a feature).
Since we can't do it in aggregation natively, we can use a stupid trick to generate "YEAR-MONTH" from ObjectId during $project stage so that we can group by it. Here is how I did it.
Working in the shell, first I generated an array of objects which represent all the months I want to report for (so I only generated a few years worth of months):
While ObjectId is very useful - its first four bytes are the timestamp when it was generated - there's a simple way to convert it to a full date in Javascript (like mongo shell) but there is no way to convert it to a timestamp in aggregation pipeline (although there is a request for such a feature).
Since we can't do it in aggregation natively, we can use a stupid trick to generate "YEAR-MONTH" from ObjectId during $project stage so that we can group by it. Here is how I did it.
Working in the shell, first I generated an array of objects which represent all the months I want to report for (so I only generated a few years worth of months):
var d = []; var o = []; var pad="f000000000000000"; for (yr=2014; yr < 2017; yr++ ) { for (m=1; m<13; m++) { if (m<10) mo="0"+m; else mo=""+m; var dt=new ISODate(""+yr+"-"+mo+"-01T00:00:00Z"); d.push(""+yr+"-"+mo); o.push(new ObjectId( (dt.getTime()/1000).toString(16)+pad)); } }
This generated two arrays of "YYYY-MM" strings and their corresponding ObjectId() values.
Now I can create a shell function which takes a field and two arrays, and creates an expression we can use in $project stage to map ranges in the second array to labels in the first array:
Now I can create a shell function which takes a field and two arrays, and creates an expression we can use in $project stage to map ranges in the second array to labels in the first array:
makeLabeledBuckets=function( field, keys, values) { var con=[]; var maxPos=keys.length; con[maxPos]=">" + keys[maxPos-1]; for (pos = maxPos-1; pos > 0; pos--) { con[pos] = {"$cond":{ if: {$lt:[field, values[pos]]}, then: keys[pos-1], else: con[pos+1] }}; } var first = "< " + keys[0]; con[0]={"$cond":{if: {$lt:[field,values[0]]}, then: first, else: con[1] }}; return con[0]; }
Now we can run our aggregation in the shell like this:
> db.collection.aggregate( [ { $project: { yearMonthStr: makeLabeledBuckets("$_id", d, o) } } ] ) { "_id" : ObjectId("55af2194cd214aaa0a5e3545"), "yearMonth" : "2015-08" } { "_id" : ObjectId("55af21b5cd214aaa0a5e3548"), "yearMonth" : "2015-08" } { "_id" : ObjectId("55aff3f78909abe4721284bc"), "yearMonth" : "2015-08" } { "_id" : ObjectId("55aff4bd8909abe4721284c0"), "yearMonth" : "2015-08" } { "_id" : ObjectId("56900c440172f6f5768fb249"), "yearMonth" : "2016-02" } { "_id" : ObjectId("56900d780172f6f5768fb24c"), "yearMonth" : "2016-02" } { "_id" : ObjectId("56900dc80172f6f5768fb24e"), "yearMonth" : "2016-02" } { "_id" : ObjectId("569014240172f6f5768fb251"), "yearMonth" : "2016-02" }
As you can see, each ObjectId in "_id" field got converted to corresponding "year-month" string, which we can now use to aggregate other metrics by.