Someone on Stack Overflow recently asked about projecting a substring out of a string field using a regular expression. This capability does not currently exist in the aggregation framework; all we have is the "$substr" operator, along with several related enhancement requests in MongoDB Jira. But the specific request was fairly simple, and I wondered whether I could find a way to do it with functionality that already exists.
The existing "$substr" operator takes a string field and two numbers, a starting position and a length, and returns the substring of that field that begins at that position and has that length.
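For reference, here is a rough plain-JavaScript model of "$substr" for ASCII strings, where one character is one byte. It is an approximation, not MongoDB's implementation: "$substr" counts bytes, so multi-byte UTF-8 text behaves differently, and negative arguments are not modeled. The "hostname" field in the sample stage is an assumed name.

```javascript
// Rough model of "$substr" for ASCII strings (one character == one byte).
// A length running past the end is clipped and a start past the end gives "",
// as with "$substr"; negative arguments are not modeled here.
function substrModel(str, start, length) {
  // "$substr" takes (position, length); JavaScript's slice takes (begin, end).
  return str.slice(start, start + length);
}

// The corresponding server-side expression in a "$project" stage
// ("hostname" is an assumed field name): 7 characters from position 4.
const stage = { $project: { part: { $substr: ["$hostname", 4, 7] } } };

console.log(substrModel("www.example.com", 4, 7)); // example
```

The only real difference from JavaScript's own `slice` is that the second argument is a length rather than an end index.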
What if I could use a "$project" stage to find the position of the first separator? Then I could use "$substr" and I'd be all set. Sure, it will only handle the first occurrence of that character, but it's something.
This particular question was about ignoring the first part of the machine name, i.e. everything up to and including the first '.' character.
Imagine we have a collection with documents like these, each containing a machine's hostname, and we want to report on the domain only, that is, remove everything up to and including the first '.' character.
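The output we are after is easy to state in plain JavaScript, and a reference like this is handy for checking the aggregation results later. It follows the two-step plan: find the first '.', then take everything after it. `domainOf` is a hypothetical helper name, and returning a hostname with no '.' unchanged is a choice made for this sketch.

```javascript
// Reference behavior: drop everything up to and including the first ".".
function domainOf(hostname) {
  const pos = hostname.indexOf("."); // step 1: first "." or -1 if absent
  // Step 2: substring starting just past the ".". With pos === -1 this
  // starts at 0, so a hostname without a "." comes back unchanged.
  return hostname.substring(pos + 1);
}

console.log(domainOf("db1.prod.example.com")); // prod.example.com
console.log(domainOf("localhost")); // localhost
```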
We construct the expression documents we need in the "$project" stage programmatically, so we first set a few variables: the starting position in the string for the search, the maximum position at which the character is expected to appear, and the character itself.
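A minimal sketch of that setup in mongo-shell-style JavaScript. The variable names, the limit of 30, the "hostname" field, and the -1 "not found" result are assumptions for illustration, not necessarily what the original code used. The loop runs from the last position down to the first so that the test for the first position ends up outermost and is evaluated first.

```javascript
// Search parameters (assumed values, not from the original post).
const start = 0; // first position to examine
const stop = 30; // last position where the separator may appear
const sep = "."; // the separator character

// Builds nested {$cond: [if, then, else]} expressions from the innermost
// "else" outward. Looping from `to` down to `from` leaves the test for
// position `from` outermost, so positions are checked in increasing order
// and the first match wins.
function buildFirstIndexExpr(field, ch, from, to) {
  let expr = -1; // result when ch does not occur in positions [from, to]
  for (let i = to; i >= from; i--) {
    expr = {
      $cond: [
        { $eq: [{ $substr: [field, i, 1] }, ch] }, // is character i the separator?
        i, // yes: report this position
        expr, // no: go on to the test for position i + 1
      ],
    };
  }
  return expr;
}

// "hostname" is an assumed field name.
const dotPosExpr = buildFirstIndexExpr("$hostname", sep, start, stop);
```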
Basically, we've built a giant nested "if-then-else" expression by looping over the positions and checking the character at each one against the one we are interested in. Because every position needs its own test, there has to be a limit on how far into the string we search.
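To check the idea without a database, the sketch below builds a two-stage pipeline (compute the position, then cut after it) and runs it through a tiny interpreter for just the operators involved ("$cond", "$eq", "$substr", "$add" and "$field" references). The interpreter is a toy for ASCII strings, not MongoDB's evaluator. The limit of 10, the field names "hostname", "pos" and "domain", the collection name "hosts", and the 255-character length for the final cut are all assumptions. It also shows the length limit in action: a '.' beyond the last tested position is simply not found.

```javascript
// Nested $cond: first position of ch in [0, stop], else -1.
function firstIndexExpr(field, ch, stop) {
  let expr = -1;
  for (let i = stop; i >= 0; i--) {
    expr = { $cond: [{ $eq: [{ $substr: [field, i, 1] }, ch] }, i, expr] };
  }
  return expr;
}

// Adding 1 skips the "." itself (and turns -1 "not found" into 0, i.e. the
// whole string); 255 is simply a length longer than any hostname.
const pipeline = [
  { $project: { hostname: 1, pos: firstIndexExpr("$hostname", ".", 10) } },
  { $project: { domain: { $substr: ["$hostname", { $add: ["$pos", 1] }, 255] } } },
];
// In the mongo shell this would be run as db.hosts.aggregate(pipeline),
// where "hosts" is a hypothetical collection name.

// Toy evaluator for the operators above (ASCII strings only).
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) return doc[expr.slice(1)];
  if (expr === null || typeof expr !== "object") return expr; // a literal
  if ("$cond" in expr) {
    const [test, thenExpr, elseExpr] = expr.$cond;
    // Only the chosen branch is evaluated, as in an if-then-else.
    return evaluate(test, doc) ? evaluate(thenExpr, doc) : evaluate(elseExpr, doc);
  }
  if ("$eq" in expr) {
    const [a, b] = expr.$eq.map((x) => evaluate(x, doc));
    return a === b;
  }
  if ("$add" in expr) {
    return expr.$add.reduce((sum, x) => sum + evaluate(x, doc), 0);
  }
  if ("$substr" in expr) {
    const [s, from, len] = expr.$substr.map((x) => evaluate(x, doc));
    return s.slice(from, from + len);
  }
  throw new Error("unsupported expression: " + JSON.stringify(expr));
}

// Applies each "$project" stage to one document: a spec of 1 copies the
// field, anything else is evaluated as an expression (_id is ignored here).
function runPipeline(stages, doc) {
  return stages.reduce((current, stage) => {
    const out = {};
    for (const [name, spec] of Object.entries(stage.$project)) {
      out[name] = spec === 1 ? current[name] : evaluate(spec, current);
    }
    return out;
  }, doc);
}

console.log(runPipeline(pipeline, { hostname: "web01.example.com" }));
// { domain: 'example.com' }
console.log(runPipeline(pipeline, { hostname: "averyverylonghost.example.com" }));
// the "." sits at position 17, past the limit of 10, so nothing is cut
```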
We now use the first array element to get the appropriate substring, and here are our results:
If you think it would be useful to have a more straightforward way to manipulate strings, please comment on and vote for SERVER-8951 in Jira.