While reviewing an old jira case for MongoDB that asked for a way to query for a strict subset of an array, I realized this can very easily be done in aggregation. Since I've been talking a lot recently about the power of aggregation (and MongoDB schema) lying in being able to query things stored in arrays, I thought I'd write up this example here.
The simple example will use a simple array of scalars representing "actions" like the example the ticket.
db.test.find({},{}) { "_id" : 1, "actions" : [ 2, 6, 3, 8, 5, 3 ] } { "_id" : 2, "actions" : [ 6, 4, 2, 8, 4, 3 ] } { "_id" : 3, "actions" : [ 6, 4, 6, 4, 3 ] } { "_id" : 4, "actions" : [ 6, 8, 3 ] } { "_id" : 5, "actions" : [ 6, 8 ] } { "_id" : 6, "actions" : [ 6, 3, 11, 8, 3 ] } { "_id" : 7, "actions" : [ 6, 3, 8 ] }
We want to find only the documents which contain actions [6, 3, 8] and in exactly this order with no intervening actions.
let wantedActions = [6, 3, 8]; db.test.aggregate([ {$match:{actions:{$all:wantedActions}}}, ])
Note that first we match to reduce the documents we will be processing only to the ones that contain all of the actions we are interested in (but in any order).
Next we create an array of indexes which will let us step through the actions array creating a new array of all three element sub-arrays. At the end of the first two stages, our results are:
Next we create an array of indexes which will let us step through the actions array creating a new array of all three element sub-arrays. At the end of the first two stages, our results are:
db.test.aggregate([ {$match:{actions:{$all:[6,3,8]}}}, {$project:{actions638:{$map:{ input:{$range:[0,{$subtract:[{$size:"$actions"},2]}]}, in:{$slice:["$actions","$$this",3]} }}}} ]) { "_id" : 1, "actions638" : [ [ 2, 6, 3 ], [ 6, 3, 8 ], [ 3, 8, 5 ], [ 8, 5, 3 ] ] } { "_id" : 2, "actions638" : [ [ 6, 4, 2 ], [ 4, 2, 8 ], [ 2, 8, 4 ], [ 8, 4, 3 ] ] } { "_id" : 4, "actions638" : [ [ 6, 8, 3 ] ] } { "_id" : 6, "actions638" : [ [ 6, 3, 11 ], [ 3, 11, 8 ], [ 11, 8, 3 ] ] } { "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }
Now it's easy to add another $match stage to get just the documents we want:
db.test.aggregate([ {$match:{actions:{$all:wantedActions}}}, {$project:{actions638:{$map:{ input:{$range:[0,{$subtract:[{$size:"$actions"},2]}]}, in:{$slice:["$actions","$$this",3]} }}}}, {$match:{actions638:wantedActions}} ]) { "_id" : 1, "actions638" : [ [ 2, 6, 3 ], [ 6, 3, 8 ], [ 3, 8, 5 ], [ 8, 5, 3 ] ] } { "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }
If the action is an object inside an array, note that we can perform necessary transformations on it during the $map stage - rather than outputting subarray of original elements, we can extract only a single element from the subobjects.
What if we care about finding all actions "in order" but they don't have to be in strict sequence - that is, other actions are allowed in between, as long as the order of the actions we are looking for is correct?
The simplest way to achieve that (out of many) would be to add a $filter expression to remove all actions which are not in our wantedActions list and then proceed with exact same processing we've already seen:
What if we care about finding all actions "in order" but they don't have to be in strict sequence - that is, other actions are allowed in between, as long as the order of the actions we are looking for is correct?
The simplest way to achieve that (out of many) would be to add a $filter expression to remove all actions which are not in our wantedActions list and then proceed with exact same processing we've already seen:
db.test.aggregate([ {$match:{actions:{$all:wantedActions}}}, {$project:{actions638:{ $let: { vars: {ouractions:{$filter:{input:"$actions",cond:{$in:["$$this", wantedActions]}}}}, in: {$map:{ input:{$range:[0,{$subtract:[{$size:"$$ouractions"},2]}]}, in:{$slice:["$$ouractions","$$this",3]} }} } }}}, {$match:{actions638:wantedActions}} ]) { "_id" : 1, "actions638" : [ [ 6, 3, 8 ], [ 3, 8, 3 ] ] } { "_id" : 6, "actions638" : [ [ 6, 3, 8 ], [ 3, 8, 3 ] ] } { "_id" : 7, "actions638" : [ [ 6, 3, 8 ] ] }