MongoDB Full and Partial Text Search

Tags: MongoDB, MongoDB Query, Aggregation Framework, Spring Data MongoDB, Full-Text Indexing

Mongodb Problem Overview


Env:

  • MongoDB (3.2.0) with Mongoose

Collection:

  • users

Text Index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name","text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);
  
  userCollection.createIndex(keys, options); // using MongoTemplate

Document:

  • {"name":"LEONEL"}

Queries:

  • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
  • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search caseSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONÉL" } } ) => FOUND (search with diacriticSensitive is false)
  • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (Partial search)
  • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (Partial search)

Any idea why I get 0 results using as query "LEO" or "L"?

Regex with Text Index Search is not allowed.

db.getCollection('users')
     .find( { "$text" : { "$search" : "/LEO/i", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

db.getCollection('users')
     .find( { "$text" : { "$search" : "LEO", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

MongoDB Documentation:

Mongodb Solutions


Solution 1 - Mongodb

As at MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to be working as such. For example: "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
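
To make the "whole word only" behaviour concrete, here is a deliberately simplified model of the matching in plain JavaScript. It is a sketch, not MongoDB's implementation: it folds case and diacritics and then compares whole tokens, and it leaves out stemming and stop words entirely, so terms that match only through a shared stem are not modelled. The function names are made up for illustration.

```javascript
// Simplified model of $text matching: whole tokens only, after folding
// case and diacritics. Stemming and stop words are deliberately left out,
// so this is NOT MongoDB's actual implementation.
function foldToken(token) {
  return token
    .normalize("NFD")                 // split letters from combining accents
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accents
    .toLowerCase();
}

function tokenize(text) {
  return text.split(/\s+/).filter(Boolean).map(foldToken);
}

// True when any search token equals an indexed token exactly.
function textMatches(indexedValue, search) {
  const indexed = new Set(tokenize(indexedValue));
  return tokenize(search).some((t) => indexed.has(t));
}

console.log(textMatches("LEONEL", "leonel")); // true
console.log(textMatches("LEONEL", "LEONÉL")); // true
console.log(textMatches("LEONEL", "LEO"));    // false
```

Under this model "LEO" and "L" fail for the same reason: neither is equal to a complete indexed token, and a prefix is never enough.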

If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.

Solution 2 - Mongodb

As Mongo currently does not support partial search by default...

I created a simple static method.

import mongoose from 'mongoose'

const PostSchema = new mongoose.Schema({
    title: { type: String, default: '', trim: true },
    body: { type: String, default: '', trim: true },
});

PostSchema.index({ title: "text", body: "text",},
    { weights: { title: 5, body: 3, } })

PostSchema.statics = {
    searchPartial: function(q, callback) {
        return this.find({
            $or: [
                { "title": new RegExp(q, "gi") },
                { "body": new RegExp(q, "gi") },
            ]
        }, callback);
    },

    searchFull: function (q, callback) {
        return this.find({
            $text: { $search: q, $caseSensitive: false }
        }, callback)
    },

    search: function(q, callback) {
        // Try the text index first; fall back to a regex scan only when it finds nothing.
        this.searchFull(q, (err, data) => {
            if (err) return callback(err, data);
            if (data.length) return callback(null, data);
            return this.searchPartial(q, callback);
        });
    },
}

export default mongoose.models.Post || mongoose.model('Post', PostSchema)

How to use:

import Post from '../models/post'

Post.search('Firs', function(err, data) {
   console.log(data);
})
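
One caveat with searchPartial as written: q goes straight into new RegExp, so user input such as "(" throws a SyntaxError, and "a.b" is treated as a pattern rather than as literal text. A minimal sketch of escaping the input first follows. escapeRegExp and partialPattern are hypothetical helper names, not part of the answer above. The g flag is dropped here because MongoDB's regex options have no equivalent for it.

```javascript
// Hypothetical helper (not part of the answer above): escape characters
// that have a special meaning in a JavaScript regular expression.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive "contains" pattern from untrusted user input.
function partialPattern(q) {
  return new RegExp(escapeRegExp(q), "i");
}

console.log(partialPattern("Firs").test("My First Post")); // true
console.log(partialPattern("a.b").test("axb"));            // false
console.log(partialPattern("(").test("f(x)"));             // true
```

The result of partialPattern(q) could then be used in place of new RegExp(q, "gi") inside searchPartial.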

Solution 3 - Mongodb

Without creating an index, we can simply use a case-insensitive regex:

db.users.find({ name: /<full_or_partial_text>/i })

Solution 4 - Mongodb

If you want to use all the benefits of MongoDB's full-text search AND want partial matches (maybe for auto-complete), the n-gram based approach mentioned by Shrikant Prabhu was the right solution for me. Obviously your mileage may vary, and this might not be practical when indexing huge documents.

In my case I mainly needed the partial matches to work for just the title field (and a few other short fields) of my documents.

I used an edge n-gram approach. What does that mean? In short, you turn a string like "Mississippi River" into a string like "Mis Miss Missi Missis Mississ Mississi Mississip Mississipp Mississippi Riv Rive River".

Inspired by this code by Liu Gen, I came up with this method:

function createEdgeNGrams(str) {
	if (str && str.length > 3) {
		const minGram = 3
		const maxGram = str.length
		
		return str.split(" ").reduce((ngrams, token) => {
			if (token.length > minGram) {	
				for (let i = minGram; i <= maxGram && i <= token.length; ++i) {
					ngrams = [...ngrams, token.substr(0, i)]
				}
			} else {
				ngrams = [...ngrams, token]
			}
			return ngrams
		}, []).join(" ")
	} 
	
	return str
}

let res = createEdgeNGrams("Mississippi River")
console.log(res)

Now to make use of this in Mongo, I add a searchTitle field to my documents and set its value by converting the actual title field into edge n-grams with the above function. I also create a "text" index for the searchTitle field.
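
The step just described could be wired up roughly as follows. This is a sketch under assumptions: collection is taken to be a MongoDB Node.js driver collection, toNGrams stands for the createEdgeNGrams function from this answer, and saveWithSearchTitle is a made-up name.

```javascript
// Sketch only: `collection` is assumed to be a MongoDB Node.js driver
// collection, and `toNGrams` the createEdgeNGrams function from this answer.
async function saveWithSearchTitle(collection, doc, toNGrams) {
  // Text index on the derived field; creating an identical index again is harmless.
  await collection.createIndex({ searchTitle: "text" });

  // Store the edge n-grams next to the original title.
  const stored = { ...doc, searchTitle: toNGrams(doc.title) };
  await collection.insertOne(stored);
  return stored;
}

// Possible usage (requires a connected driver collection):
// await saveWithSearchTitle(db.collection('my-collection'),
//                           { title: "Mississippi River" }, createEdgeNGrams)
```

In a real application the index would normally be created once at startup rather than on every save. Keep in mind that a collection can have only one text index.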

I then exclude the searchTitle field from my search results by using a projection:

db.collection('my-collection')
  .find({ $text: { $search: mySearchTerm } }, { projection: { searchTitle: 0 } })

Solution 5 - Mongodb

I wrapped @Ricardo Canelas' answer in a mongoose plugin, available on npm: https://www.npmjs.com/package/mongoose-partial-full-search

Two changes made:

  • Uses promises
  • Search on any field with type String

Here's the important source code:

// mongoose-partial-full-search

module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        val.instance == "String" &&
          queries.push({
            [path]: new RegExp(q, "gi")
          });
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },

    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },

    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}

exports.version = require('../package').version;

Usage

// PostSchema.js
import addPartialFullSearch from 'mongoose-partial-full-search';
PostSchema.plugin(addPartialFullSearch);

// some other file.js
import Post from '../wherever/models/post'

Post.search('Firs').then(data => console.log(data));

Solution 6 - Mongodb

If the string or value to search for is stored in a variable, build the regex with the RegExp constructor:

collection.find({ fieldName: new RegExp(variableName, 'i') })

Here fieldName stands for the name of your MongoDB field, and the 'i' flag is the ignore-case option.

Solution 7 - Mongodb

The quick and dirty solution that worked for me: use text search first and, if nothing is found, make another query with a regexp. If you don't want to make two queries, $or works too, but it requires all fields in the query to be indexed.

Also, it's better not to use a case-insensitive regex, because it can't make effective use of indexes. In my case I made lowercase copies of the fields I search on.
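
A minimal sketch of the lowercase-copy idea, assuming a field called nameLower (a made-up name) that carries an ordinary index. The query uses a case-sensitive regex anchored with ^, which is the form MongoDB can serve best from an index. The trade-off is that it only finds prefixes: "LEO" would match "LEONEL", but "ONEL" would not.

```javascript
// Hypothetical helpers illustrating the "lowercase copy" idea.
// `nameLower` is an assumed field name, not one from the answer.

// Add a lowercase copy of `name` before saving the document.
function withLowercaseCopy(doc) {
  return { ...doc, nameLower: doc.name.toLowerCase() };
}

// Case-sensitive, ^-anchored regex query on the lowercase copy.
function prefixQuery(term) {
  const escaped = term.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return { nameLower: { $regex: "^" + escaped } };
}

console.log(JSON.stringify(withLowercaseCopy({ name: "LEONEL" })));
// {"name":"LEONEL","nameLower":"leonel"}
console.log(JSON.stringify(prefixQuery("LEO")));
// {"nameLower":{"$regex":"^leo"}}
```

The object returned by prefixQuery can be passed to find() as the filter.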

Solution 8 - Mongodb

A good n-gram based approach for fuzzy matching is explained here (it also explains how to score results higher using prefix matching): https://medium.com/xeneta/fuzzy-search-with-mongodb-and-python-57103928ee5d

Note: n-gram based approaches can be storage-intensive, and the MongoDB collection size will increase.
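
To show why n-grams tolerate typos and why they cost storage, here is a generic sketch in plain JavaScript. It is not the code from the linked article. charNGrams and ngramOverlap are illustrative names, and the n-gram size of 3 is an assumption.

```javascript
// All character n-grams of a given size, lowercased.
function charNGrams(text, size = 3) {
  const s = text.toLowerCase();
  const grams = [];
  for (let i = 0; i + size <= s.length; i++) {
    grams.push(s.slice(i, i + size));
  }
  return grams;
}

// Share of the query's n-grams that also occur in the candidate.
function ngramOverlap(query, candidate, size = 3) {
  const q = charNGrams(query, size);
  if (q.length === 0) return 0;
  const c = new Set(charNGrams(candidate, size));
  return q.filter((g) => c.has(g)).length / q.length;
}

console.log(charNGrams("Leonel").join(" "));   // leo eon one nel
console.log(ngramOverlap("leonal", "Leonel")); // 0.5
```

A word of length n yields n - 2 trigrams, so storing them alongside the original text roughly multiplies the indexed data, which is the storage cost mentioned in the note.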

Solution 9 - Mongodb

Full/partial search in MongoDB for a "pure" Meteor project

I adapted flash's code to use it with Meteor collections and simpleSchema but without mongoose. This meant removing the use of the .plugin() method and of schema.path (although that looks like a simpleSchema attribute in flash's code, it did not resolve for me) and returning the result array instead of a cursor.

Thought that this might help someone, so I share it.

export function partialFullTextSearch(meteorCollection, searchString) {

    // builds an "or"-mongoDB-query for all fields with type "String" with a regEx as search parameter
    const makePartialSearchQueries = () => {
        if (!searchString) return {};
        const $or = Object.entries(meteorCollection.simpleSchema().schema())
            .reduce((queries, [name, def]) => {
                def.type.definitions.some(t => t.type === String) &&
                queries.push({[name]: new RegExp(searchString, "gi")});
                return queries
            }, []);
        return {$or}
    };

    // returns a promise with result as array
    const searchPartial = () => meteorCollection.rawCollection()
        .find(makePartialSearchQueries(searchString)).toArray();

    // returns a promise with result as array
    const searchFull = () => meteorCollection.rawCollection()
        .find({$text: {$search: searchString}}).toArray();

    return searchFull().then(result => {
        if (result.length === 0) throw null
        else return result
    }).catch(() => searchPartial());

}

This returns a Promise, so call it like this (e.g. as the return value of an async Meteor method searchContact on the server side). It assumes that you attached a simpleSchema to your collection before calling this method.

return partialFullTextSearch(Contacts, searchString).then(result => result);

Solution 10 - Mongodb

I create an additional field which combines all the fields within a document that I want to search. Then I just use regex:

const user = {
    firstName: 'Bob',
    lastName: 'Smith',
    address: {
        street: 'First Ave',
        city: 'New York City',
    },
    notes: 'Bob knows Mary'
}

// add combined search field with '+' separator to preserve spaces
user.searchString = `${user.firstName}+${user.lastName}+${user.address.street}+${user.address.city}+${user.notes}`

db.users.find({searchString: {$regex: 'mar', $options: 'i'}})
// returns Bob because 'mar' matches his notes field

// TODO write a client-side function to highlight the matching fragments
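
One possible shape for the highlighting function mentioned in the TODO, as a rough sketch. highlightMatches is a hypothetical name, the <mark> wrapper is an arbitrary choice, and the code assumes the text has already been made safe for HTML output.

```javascript
// Hypothetical client-side helper for the TODO above. Assumes `text` is
// plain text that has already been made safe for HTML output.
function highlightMatches(text, term) {
  if (!term) return text;
  // Escape regex metacharacters so the term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The replacer function keeps the original casing of each match.
  return text.replace(new RegExp(escaped, "gi"), (m) => `<mark>${m}</mark>`);
}

console.log(highlightMatches("Bob knows Mary", "mar"));
// Bob knows <mark>Mar</mark>y
```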

Solution 11 - Mongodb

import re

db.collection.find({"$or": [{"your field name": re.compile(text, re.IGNORECASE)},{"your field name": re.compile(text, re.IGNORECASE)}]})

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Leonel | View Question on Stackoverflow
Solution 1 - Mongodb | Stennie | View Answer on Stackoverflow
Solution 2 - Mongodb | Ricardo Canelas | View Answer on Stackoverflow
Solution 3 - Mongodb | nurealam siddiq | View Answer on Stackoverflow
Solution 4 - Mongodb | Johannes Fahrenkrug | View Answer on Stackoverflow
Solution 5 - Mongodb | flash | View Answer on Stackoverflow
Solution 6 - Mongodb | vigviswa | View Answer on Stackoverflow
Solution 7 - Mongodb | Tactical Catgirl | View Answer on Stackoverflow
Solution 8 - Mongodb | Shrikant Prabhu | View Answer on Stackoverflow
Solution 9 - Mongodb | Niklas D | View Answer on Stackoverflow
Solution 10 - Mongodb | MFB | View Answer on Stackoverflow
Solution 11 - Mongodb | Hrishikesh | View Answer on Stackoverflow