How do I Create a Text Search in MongoDB?

Text Search in MongoDB

MongoDB Text Search lets you run full-text queries against the string content of your documents. It works through a text index and applies language-aware analysis, which goes beyond basic string matching.

MongoDB Text Search has several important features, such as:

  1. Stemming: Text search reduces words to their stem, or root form, both when indexing and when querying. A search for “gardens” can therefore match documents about “gardening”, because both words reduce to “garden”.
  2. Stop Words: Language-specific stop words (e.g., “a”, “an”, “the”) are dropped during indexing and searching because they rarely carry meaning. MongoDB uses a separate stop-word list for each supported language.
  3. Language Support: MongoDB supports text search in about 15 languages, including English, French, German, and Spanish. The language determines the stemming and stop-word rules. You can set a default language for a text index (English if you do not specify one) and override it for individual documents. If you set the language to “none”, only simple tokenization is applied: there is no stemming and no stop-word removal.
  4. Relevance Scoring: Text search assigns each matching document a relevance score, so you can order results by how well they match the query.
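To make stemming and stop-word removal concrete, here is a toy Python sketch of the idea. It is not MongoDB's analyzer: the tiny stop-word set, the naive suffix-stripping stemmer, and the helper names naive_stem, analyze, and matches are illustrative assumptions. MongoDB itself uses full language-specific stemmers and stop-word lists.

```python
import re

# Toy analyzer, NOT MongoDB's: the stop-word set and suffix rules below are
# hypothetical and far smaller than the real language-specific ones.
STOP_WORDS = {"a", "an", "the", "and", "of", "in", "for"}
SUFFIXES = ("ing", "s")  # naive suffixes, checked in this order


def naive_stem(word: str) -> str:
    """Strip one common English suffix, keeping a root of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list:
    """Lowercase, split into words, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


def matches(query: str, document: str) -> bool:
    """A document matches if it shares at least one analyzed term with the query."""
    return bool(set(analyze(query)) & set(analyze(document)))


print(analyze("The gardens of the city"))         # ['garden', 'city']
print(matches("gardens", "A guide to gardening"))  # True
print(matches("the", "the end"))                   # False: stop words alone never match
```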

Implementation

  • Enabling Text Search: Text search is enabled by default starting with MongoDB 2.6. In MongoDB 2.4, where it is disabled by default, enable it manually with db.adminCommand({setParameter:true,textSearchEnabled:true}).
  • Creating Text Indexes: A collection can have at most one text index, but that index can cover multiple fields.
    • On Specific Fields: Give each field you want to index the value “text” in the index specification, e.g. { title: "text", body: "text" }.
    • Wildcard Text Indexes: Use the wildcard specifier $** ({ "$**": "text" }) to index every field that contains string content.
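  • Using Text Search: Query with the $text operator and pass the search terms in its $search field.
  • Phrases and Negation: To match an exact phrase, wrap it in double quotes, such as “impact crater” (escaped as \"impact crater\" inside the $search string). Prefix a word or phrase with a minus sign (-) to exclude documents that contain it.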
  • Controlling Search Results with Weights: Assign a weight to a field in the text index definition to indicate how much matches in that field count toward the score. Fields without an explicit weight default to 1.
  • Sorting by Text Score: Project the score with { $meta: "textScore" } and sort on that field to order results by relevance. A textScore sort always orders results from highest to lowest score.
  • Text Search in Aggregation Pipeline: In an aggregation pipeline, $text is only allowed inside a $match stage, and that $match stage must be the first stage of the pipeline.
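The steps above can be sketched in Python as the plain documents you would hand to a driver such as PyMongo. This is a hedged sketch: the fields title and body, the weight of 10, and the helper text_query are hypothetical, and the code only builds the documents. The commented PyMongo calls show where they would be used against a live collection.

```python
# Hypothetical collection with "title" and "body" fields. Nothing here talks
# to a server; each constant is the document a driver would send.

# Text index on two specific fields. "title" gets weight 10; "body" has no
# explicit weight, so it defaults to 1.
# PyMongo: coll.create_index(INDEX_KEYS, weights=INDEX_WEIGHTS, default_language="english")
INDEX_KEYS = [("title", "text"), ("body", "text")]
INDEX_WEIGHTS = {"title": 10}

# Alternative: a wildcard text index over every string field.
WILDCARD_INDEX_KEYS = [("$**", "text")]


def text_query(search: str) -> dict:
    """Build a filter document that runs a $text search for the given string."""
    return {"$text": {"$search": search}}


# Exact phrase (inner double quotes) plus an excluded term (leading minus):
# documents must contain "impact crater" and must not contain "moon".
phrase_filter = text_query('"impact crater" -moon')

# Project the relevance score and sort on it, highest score first.
# PyMongo: coll.find(phrase_filter, SCORE_PROJECTION).sort(SCORE_SORT)
SCORE_PROJECTION = {"score": {"$meta": "textScore"}}
SCORE_SORT = [("score", {"$meta": "textScore"})]

# Aggregation: the $match holding $text must be the first stage.
# PyMongo: coll.aggregate(pipeline)
pipeline = [
    {"$match": text_query("crater")},
    {"$sort": {"score": {"$meta": "textScore"}}},
    {"$project": {"title": 1, "score": {"$meta": "textScore"}}},
]

print(phrase_filter)  # {'$text': {'$search': '"impact crater" -moon'}}
```

Keeping the documents as plain data makes them easy to inspect or unit-test before they reach a real server.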

Limitations of Text Search

  1. Only one text index may be included in a collection.
  2. Word proximity information and phrases are not stored in text indexes.
  3. A text index cannot be used directly to determine sort order; $meta: “textScore” must be used instead.
  4. A compound text index cannot include other special index types, such as multi-key or geospatial fields. If a compound text index has keys that precede the text key, queries must include equality matches on those preceding keys.
  5. MongoDB doesn’t let you change the stop word list or alter the stemming algorithms.
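Limitation 4 is easiest to see with a compound index. The Python sketch below assumes a hypothetical index with an ascending category key in front of a text key on body, and uses hypothetical helpers (preceding_keys, is_equality, satisfies_prefix_rule) as a rough stand-in for the rule. It is not how MongoDB's query planner actually checks it.

```python
# Hypothetical compound text index: ascending "category", then a text key on "body".
COMPOUND_INDEX_KEYS = [("category", 1), ("body", "text")]


def preceding_keys(index_keys):
    """Return the fields that come before the first text key in the index."""
    fields = []
    for field, kind in index_keys:
        if kind == "text":
            break
        fields.append(field)
    return fields


def is_equality(condition) -> bool:
    """True for a plain value or {"$eq": value}; False for operators like $in or $gt."""
    if isinstance(condition, dict) and condition and all(k.startswith("$") for k in condition):
        return set(condition) == {"$eq"}
    return True


def satisfies_prefix_rule(index_keys, query: dict) -> bool:
    """Rough check: every key before the text key needs an equality match."""
    return all(
        field in query and is_equality(query[field])
        for field in preceding_keys(index_keys)
    )


ok = {"category": "astronomy", "$text": {"$search": "crater"}}
missing = {"$text": {"$search": "crater"}}
print(satisfies_prefix_rule(COMPOUND_INDEX_KEYS, ok))       # True
print(satisfies_prefix_rule(COMPOUND_INDEX_KEYS, missing))  # False
```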

Regular Expression in MongoDB

Regular expressions (regex) provide flexible, powerful pattern matching within strings. MongoDB uses Perl Compatible Regular Expressions (PCRE) syntax. Unlike Text Search, regular expression queries need no special setup or index.

Implementation

  • Using $regex Operator: Write a pattern either as a regular expression literal (e.g., /^Ba/ in the mongo shell) or with the $regex operator.
  • Case-Insensitive Search: Set $options to “i” (or add the /i flag to a regex literal) for a case-insensitive search.
  • Regex for Array Elements: Regular expressions also work on array fields; a document matches if any element matches the pattern, which is useful for fields such as tags.
  • Regular Expression Query Optimisation:
    • Indexed Fields: For a case-sensitive regex on an indexed field, MongoDB matches the pattern against the index values instead of scanning every document, which can be much faster.
    • Prefix Expressions: If the pattern is a prefix expression (it begins with ^ or \A followed by literal characters), MongoDB can restrict the index scan to the range of values that start with that prefix, because every possible match must begin with those characters.
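The options above can be sketched in Python. The query documents use the $regex/$options form that drivers such as PyMongo send to the server; the field names name and tags are hypothetical. Because no server is involved, the hypothetical helpers regex_filter and any_element_matches imitate the matching rules locally with Python's re module, which behaves like PCRE for simple patterns such as these but is not identical to it in general. Only the "i" option is handled.

```python
import re

# Query documents as a driver would send them (field names are hypothetical).
prefix_query = {"name": {"$regex": "^Ba"}}                         # prefix: can use an index on "name"
case_insensitive_query = {"name": {"$regex": "ba", "$options": "i"}}
tags_query = {"tags": {"$regex": "^mongo"}}                        # matches if ANY tag matches


def _compile(condition: dict):
    """Compile a {"$regex": ..., "$options": ...} condition; only "i" is supported here."""
    flags = re.IGNORECASE if "i" in condition.get("$options", "") else 0
    return re.compile(condition["$regex"], flags)


def regex_filter(values, condition):
    """Local stand-in for $regex on a string field: an unanchored search per value."""
    compiled = _compile(condition)
    return [value for value in values if compiled.search(value)]


def any_element_matches(elements, condition):
    """Local stand-in for $regex on an array field: true if any element matches."""
    compiled = _compile(condition)
    return any(compiled.search(element) for element in elements)


names = ["Barbara", "Bailey", "Abba", "barry"]
print(regex_filter(names, prefix_query["name"]))            # ['Barbara', 'Bailey']
print(regex_filter(names, case_insensitive_query["name"]))  # ['Barbara', 'Bailey', 'Abba', 'barry']
print(any_element_matches(["database", "mongodb"], tags_query["tags"]))  # True
```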

Limitations of Regular Expressions

  1. Index Usage: Prefix regular expressions can use indexes efficiently, but most other regular expression queries, especially case-insensitive ones ($options: "i" or /i), cannot, and may have to examine every index key or document. On large collections this can make them slow.
  2. The $where clause and arithmetic operators like $mod are unable to use indexes.
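As a rough guide, the hypothetical helper below flags patterns that fit the prefix rule described above. It is a simplified heuristic based on the rules in this section (case-sensitive, anchored with ^ or \A), not a reproduction of MongoDB's query planner. It also treats ^ combined with the m (multiline) option as unanchored, since ^ then matches at every line start. Run explain() on a real query to see whether an index is actually used.

```python
def looks_index_friendly(pattern: str, options: str = "") -> bool:
    """Rough heuristic, not MongoDB's planner: does this regex look like a
    case-sensitive prefix expression that an index range scan could serve?"""
    if "i" in options:
        # Case-insensitive regexes generally cannot use indexes efficiently.
        return False
    if pattern.startswith("\\A"):
        # \A always anchors to the start of the whole string.
        return True
    # With the "m" option, ^ also matches after each newline, so it no longer
    # pins the match to the start of the string.
    return pattern.startswith("^") and "m" not in options


for pattern, options in [("^Ba", ""), ("^Ba", "i"), ("Ba", ""), ("\\ABa", "")]:
    print(pattern, options or "-", looks_index_friendly(pattern, options))
```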

When to Use Which

  • Use Text Search for full-text enquiries that need linguistic features such as stemming, stop-word removal, and relevance scoring, especially over longer passages of text. It is well suited to keyword searches over document content.
  • Use regular expressions when you need flexible pattern matching within strings and do not need linguistic analysis. They suit pattern discovery and partial matches, such as names that begin with “Ba”. Keep performance in mind when the pattern cannot use an index.

In short, regular expressions match character patterns exactly as written, while Text Search performs language-aware keyword searches with stemming, stop-word removal, and relevance scoring.
