Stupid Tricks with MongoDB

Using aggregation framework to reshape schema

4/21/2013

When I first started playing with MongoDB aggregation framework queries, I quickly realized that '$project' stage could be hugely useful to completely reshape documents in cases where you structured them one way, but you need to get the results shaped differently.

I thought it would be particularly helpful if you could map keys (aka field names) to be field values and vice versa. This feature did not exist (and still doesn't) so I filed a request for it in MongoDB issue tracking system.

Meanwhile, as I show here it is still possible to do this "projection" via aggregation if you know the names of the fields in advance.

But what if you get the names of fields dynamically, rather than always knowing they will be field1, field2?

I'm going to show how you can generate the appropriate aggregation framework pipeline programmatically based on the set of field names passed in. I'm going to use Javascript in the shell as the most general example, but you can translate this into your language of choice and adjust accordingly.

My sample documents and schema will be like this:

 {
"_id" : 1,
    "attr" : [
        {  "k": "firstName",
           "v":  "Asya" }, 
        {  "k": "lastName",
           "v": "Kamsky" }, 
        {  "k": "employer",
           "v": "10gen, the MongoDB company" }, 
        {  "k": "URL",
           "v": "http://www.kamsky.org" } 
    ]
}

This is a pretty standard way of storing dynamic attributes - properties of the document that can't be easily enumerated in advance either because they are not all known, because they can vary widely for different types of documents or both. It makes it possible to index the attributes by creating a compound index on {"attr.k":1,"attr.v":1} fields and now querying on something like {"attr.k":"color","attr.v":"blue"} will use the index. An alternative way of storing keys as field names and values as field values has other advantages but it makes it difficult to have a good indexing strategy as you may end up with a large number of indexes and every time you add a new type of attribute you have to create a new index to support it. Sparse indexes can help, but they can create their own challenges worthy of a separate blog post.

Now imagine I want to output documents which have this shape:

9 Comments

Using aggregation framework to reshape schema

Asya Kamsky

Archives

Categories