Text index usage within MongoDB

3 min read

Jul 30, 2018 12:00:00 AM

[caption id="" align="aligncenter" width="600"] MongoDB text

MongoDB full text[/caption] Recently a client came to me asking “How do we verify if a full text search index is being used on MongoDB?” The db.showIndexes() command shows an index on a text field, but explain() shows COLLSCAN and the query is really slow (More about explain() here). Since it was an interesting case, I decided to write this blog post, describing the use of text indexes within MongoDB. First, let’s see how to create a text index. The command below will create one for the data.entry_text key in the entries collection:

db.entries.createIndex( { "data.entry_text": "text" } )

If we check the index definition, the output would look like this:

db.entries.getIndexes()
 . . . . . .
 [
 {
  "v": 1,
  "key": {
  "_fts": "text",
  "_ftsx": 1
  },
  "name": "data.entry_text_text",
  "ns": "database.entries",
  "background": false,
  "weights": {
  "data.entry_text": 1
  },
  "default_language": "english",
  "language_override": "language",
  "textIndexVersion": 3
 }
 ]
 . . . . . .

We can see the index is of type “text”, created on the namespace database.entries and only for the field data.entry_text. The index version is "textIndexVersion": 3 and that is the default version since MongoDB 3.2. More information about changes in text indexes introduced changes in version 3 can be found here. Going back to my customer’s issue, they were testing with a query using a search pattern similar to the one below:

db.entries.find({"data.entry_text": /cats/})

If we look at the explain plan for this query, we can see that no index is being used and the query is doing a full collection scan.

{
  "queryPlanner": {
  "plannerVersion": 1,
  "namespace": "database.entries",
  "indexFilterSet": false,
  "parsedQuery": {
  "data.entry_text": {
  "$regex": "cats"
  }
  },
  "winningPlan": {
  "stage": "COLLSCAN",
  "filter": {
  "data.entry_text": {
  "$regex": "cats"
  }
  },
  "direction": "forward"
  },
  "rejectedPlans": [ ]
  },
  "executionStats": {
  "executionSuccess": true,
  "nReturned": 20,
  "executionTimeMillis": 1795,
  "totalKeysExamined": 0,
  "totalDocsExamined": 133414,
  "executionStages": {
  "stage": "COLLSCAN",
  "filter": {
  "data.entry_text": {
  "$regex": "cats"
  }
  },
  "nReturned": 20,
  "executionTimeMillisEstimate": 1799,
  "works": 133416,
  "advanced": 20,
  "needTime": 133395,
  "needYield": 0,
  "saveState": 1098,
  "restoreState": 1098,
  "isEOF": 1,
  "invalidates": 0,
  "direction": "forward",
  "docsExamined": 133414
  },
  "allPlansExecution": [ ]
  },
  "serverInfo": {
  "host": "sanitized",
  "port": 27017,
  "version": "3.4.10",
  "gitVersion": "078f28920cb24de0dd479b5ea6c66c644f6326e9"
  },
  "ok": 1
 }

So, what is wrong with the index and why it is not picked up by the optimizer? If you worked with text search in MongoDB before, you probably have noted the problem already. First, a text search requires the $text operator in order to indicate the server on which we want to perform these type of queries. Furthermore, regex syntax is using (/ /) , which will not be considered a full text search.

db.entries.find({"data.entry_text": /cats/}) -- Regex search

Here is how the text search should look like:

db.entries.find({$text : {$search: "cats"}}) -- Text search

Now, if we run explain on the query and check the plan, the WinningPlan key shows that the FTS index is being used for the query. IndexName is our index "data.entry_text_text" as expected.

db.entries.find({$text : {$search: "cats"}}).explain()
 . . . . . .
 {
  "queryPlanner" : {
  "plannerVersion" : 1,
  "namespace" : "mtkiller.original-mt-entries",
  "indexFilterSet" : false,
  "parsedQuery" : {
  "$text" : {
  "$search" : "cats",
  "$language" : "english",
  "$caseSensitive" : false,
  "$diacriticSensitive" : false
  }
  },
  "winningPlan" : {
  "stage" : "TEXT",
  "indexPrefix" : {
  
  },
  "indexName" : "data.entry_text_text",
  "parsedTextQuery" : {
  "terms" : [
  "cat"
  ],
  "negatedTerms" : [ ],
  "phrases" : [ ],
  "negatedPhrases" : [ ]
  },
  "textIndexVersion" : 3,
  "inputStage" : {
  "stage" : "TEXT_MATCH",
  "inputStage" : {
  "stage" : "TEXT_OR",
  "inputStage" : {
  "stage" : "IXSCAN",
  "keyPattern" : {
  "_fts" : "text",
  "_ftsx" : 1
  },
  "indexName" : "data.entry_text_text",
  "isMultiKey" : true,
  "isUnique" : false,
  "isSparse" : false,
  "isPartial" : false,
  "indexVersion" : 2,
  "direction" : "backward",
  "indexBounds" : {
  
  }
  }
  }
  }
  },
  "rejectedPlans" : [ ]
  },
  "serverInfo" : {
  "host" : "sanitized",
  "port" : 37017,
  "version" : "3.4.15",
  "gitVersion" : "52e5b5fbaa3a2a5b1a217f5e647b5061817475f9"
  },
  "ok" : 1
 }

Conclusion

Regular expressions utilize B+ tree indexes and work well for search patterns that match the regular expressions against the values in the index. Further optimization can occur if the regular expression is a “prefix expression,” which means that all potential matches start with the same string. However, text search on any field whose value is a string or an array of string elements requires text index. Both regex and text search have their own operators and syntax, so the right ones should be used in each case for the optimizer to choose the expected index.

On this page

Ready to unlock value from your data?

With Pythian, you can accomplish your data transformation goals and more.

Speak with Pythian consultants now →

What We Do

Text index usage within MongoDB

Conclusion

Share this

Share this

More resources

Choosing best index for your MongoDB query

Practical Tips For Moving Indexes Between Tablespaces

How to find unused indexes and drop them safely in MongoDB

Ready to unlock value from your data?