Page Content

Tutorials

What are the String Expression Operators in MongoDB?

String Expression Operators in MongoDB

Within its aggregation framework, MongoDB provides a set of string expression operators that are mostly used to work with string data in documents and produce calculated outcomes [6.4.1, 155, 299, 451, 538]. These operators are frequently used in stages, such as $project, to specify _id fields in a $group operator or to alter documents [6.3.1, 149].

String Expression Operators in MongoDB
String Expression Operators in MongoDB

The given string expression operators are explained as follows:

$concat Operator

Two or more strings can be concatenated into a single string using the $concat operator [6.4.1, 156, 299, 452]. Each of the expressions that it accepts must resolve to a string. Regardless of the characters used, $concat behaves in a well-defined manner.

Syntax:

"$concat" : [expr1[, expr2, ..., exprN]]

Example: In a $project stage, an example of how to use $concat to join first and last names with a space between them, in addition to other string manipulations.

db.users.aggregate([
    {$match: {username: 'kbanker'}},
    {$project: {
        name: {$concat:['$first_name', ' ', '$last_name']}, // Concatenates first_name, a space, and last_name
        firstInitial: {$substr: ['$first_name',0,1]},
        usernameUpperCase: {$toUpper: '$username'}
    }}
])

Example Output:

{
    "_id" : ObjectId("4c4b1476238d3b4dd5000001"),
    "name" : "Kyle Banker",
    "firstInitial" : "K",
    "usernameUpperCase" : "KBANKER"
}

The values of the first_name field, a literal space, and the last_name field are concatenated into a new field called name in this example: “name”: $concat:[‘$first_name’,”, ‘$last_name’].

$strcasecmp Operator

A case-insensitive string comparison is carried out using the $strcasecmp operator [6.4.1, 156, 301, 452]. It comes back:

  • 0 if the two strings are equivalent (case-insensitive).
  • 1 if the first string is greater than the second.
  • -1 if the first string is less than the second.

Only Roman characters are compatible with this operator.

Syntax:

"$strcasecmp" : [string1, string2]

Example: While a direct code example with output is not provided for $strcasecmp, its functionality is described as a case-insensitive comparison. It would typically be used in expressions where you need to sort or filter based on string equality without considering case. For instance, to sort a collection of documents by a field in a case-insensitive manner.

// Example of how it might be used within an aggregation pipeline,
// for instance, in a $project stage to create a comparison result.
db.myCollection.aggregate([
  {
    $project: {
      compareResult: { $strcasecmp: [ "$fieldA", "$fieldB" ] }
    }
  }
])

Sample Data for db.myCollection:

db.myCollection.drop(); // Clear existing data for a clean example
db.myCollection.insertMany([
    { "_id": 1, "fieldA": "apple", "fieldB": "Apple" },
    { "_id": 2, "fieldA": "Banana", "fieldB": "apple" },
    { "_id": 3, "fieldA": "cat", "fieldB": "dog" },
    { "_id": 4, "fieldA": "dog", "fieldB": "cat" },
    { "_id": 5, "fieldA": "hello", "fieldB": "world" },
    { "_id": 6, "fieldA": "WORLD", "fieldB": "hello" }
]);
db.myCollection.aggregate([
  {
    $project: {
      _id: 0, // Exclude _id from output for brevity in this example
      fieldA: 1,
      fieldB: 1,
      compareResult: { $strcasecmp: [ "$fieldA", "$fieldB" ] }
    }
  }
])

Output:

[
  {
    "fieldA": "apple",
    "fieldB": "Apple",
    "compareResult": 0
  },
  {
    "fieldA": "Banana",
    "fieldB": "apple",
    "compareResult": 1
  },
  {
    "fieldA": "cat",
    "fieldB": "dog",
    "compareResult": -1
  },
  {
    "fieldA": "dog",
    "fieldB": "cat",
    "compareResult": 1
  },
  {
    "fieldA": "hello",
    "fieldB": "world",
    "compareResult": -1
  },
  {
    "fieldA": "WORLD",
    "fieldB": "hello",
    "compareResult": 1
  }
]

$toUpper Operator

A string can be converted to all capital letters [6.4.1, 156, 300, 437, 452, 538] using the $toUpper operator. It is required that the expression passed to $toUpper evaluate to a string. It is only guaranteed to function on Roman alphabet characters, just like other case-affecting procedures.

Syntax:

"$toUpper" : expr

Example: As seen in the $concat example, $toUpper is used to convert the username field to uppercase.

db.users.aggregate([
    {$match: {username: 'kbanker'}},
    {$project: {
        name: {$concat:['$first_name', ' ', '$last_name']},
        firstInitial: {$substr: ['$first_name',0,1]},
        usernameUpperCase: {$toUpper: '$username'} // Converts the 'username' field to uppercase
    }}
])

Example Output:

{
    "_id" : ObjectId("4c4b1476238d3b4dd5000001"),
    "name" : "Kyle Banker",
    "firstInitial" : "K",
    "usernameUpperCase" : "KBANKER" // The username "kbanker" becomes "KBANKER"
}

Another example shows its use to normalise and sort usernames:

db.users.aggregate( [
  { $project : { name:{$toUpper:"$_id"} , _id:0 } }, // Converts _id to uppercase for the 'name' field
  { $sort : { name : 1 } }
] )

$toLower Operator

A string can be converted to all lowercase characters with the $toLower operator [6.4.1, 156, 300, 437, 452, 538]. The expression must evaluate to a string, much as $toUpper, and it is certain to function with Roman characters.

Syntax:

"$toLower" : expr

Example: Although $toLower’s symmetric function is suggested, a direct example with output using $toLower is not as clearly shown in the aggregate examples as $toUpper. As indicated by the recommendation to store data in a normalised case format or to utilise $toLower in the aggregate framework for query efficiency, it can be used, for example, to normalise case for comparison or storage.

Initial products collection data:

db.products.drop(); // Clear any previous data
db.products.insertMany([
    { "_id": 1, "name": "Laptop", "price": 1200, "stock": 50, "lastUpdated": ISODate("2024-01-15T10:00:00Z"), "status": "active" },
    { "_id": 2, "name": "Mouse", "price": 25, "stock": 100, "lastUpdated": ISODate("2024-02-01T14:30:00Z"), "status": "active" },
    { "_id": 3, "name": "Keyboard", "price": 75, "stock": 30, "lastUpdated": ISODate("2024-03-10T09:15:00Z"), "status": "active" },
    { "_id": 4, "name": "Monitor", "price": 300, "stock": 20, "lastUpdated": ISODate("2024-04-05T11:00:00Z"), "status": "active" }
]);
db.products.aggregate([
  {
    $project: {
      productNameLower: { $toLower: "$name" } // Converts the 'name' field to lowercase
    }
  }
])

Output:

[
  {
    "_id": 1,
    "productNameLower": "laptop"
  },
  {
    "_id": 2,
    "productNameLower": "mouse"
  },
  {
    "_id": 3,
    "productNameLower": "keyboard"
  },
  {
    "_id": 4,
    "productNameLower": "monitor"
  }
]

$substrCP Operator

The $substrCP operator is not specifically mentioned or explained. The $substr operator is what they are referring to instead [6.4.1, 156, 299, 452, 539].

A substring of a specified string can be created using the $substr operator [6.4.1, 156, 299, 452]. Three arguments are required:

  1. expr: The string expression that will be used to extract the substring.
  2. startOffset: An integer that indicates the offset of the first byte (0-indexed).
  3. numToReturn: The number of bytes to be returned is indicated by the integer numToReturn.

It’s important to remember that $substr uses bytes, not characters, to measure length and offset. Because a single character may span multiple bytes, users should exercise caution when working with multibyte encodings (such as UTF-8, which MongoDB strings are) as improper handling could result in unexpected outcomes.

Syntax:

"$substr" : [expr, startOffset, numToReturn]

Example: As seen in the $concat example, $substr is used to get the first initial of the first_name field.

db.users.aggregate([
    {$match: {username: 'kbanker'}},
    {$project: {
        name: {$concat:['$first_name', ' ', '$last_name']},
        firstInitial: {$substr: ['$first_name',0,1]}, // Gets a substring starting at byte 0, with a length of 1 byte
        usernameUpperCase: {$toUpper: '$username'}
    }}
])

Example Output:

{
    "_id" : ObjectId("4c4b1476238d3b4dd5000001"),
    "name" : "Kyle Banker",
    "firstInitial" : "K", // The first character of "Kyle"
    "usernameUpperCase" : "KBANKER"
}

General Context and Usage

Especially in the $project stage of MongoDB’s aggregation pipeline, these string expression operators are strong instruments. Selecting particular fields from a collection and reshaping documents including generating new fields based on calculations involving existing fields are made possible by the $project stage. Requirements are reduced and application code is made simpler by using these procedures directly on the MongoDB server. Because of its inherent operations and efficiency, the aggregation pipeline is regarded as the recommended approach for data aggregation. However, $substr works with bytes, which necessitates careful attention for multi-byte character sets, and actions impacting case ($toUpper, $toLower) are mainly assured for Roman characters.

Index