Overflow sort stage buffered data usage exceeds internal limit

MongodbMongodb QueryMongodb Indexes

Mongodb Problem Overview


Using the code:

all_reviews = db_handle.find().sort('reviewDate', pymongo.ASCENDING)
print all_reviews.count()

print all_reviews[0]
print all_reviews[2000000]

The count prints 2043484, and it prints all_reviews[0].

However when printing all_reviews[2000000], I get the error:

>pymongo.errors.OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33554495 bytes exceeds internal limit of 33554432 bytes

How do I handle this?

Mongodb Solutions


Solution 1 - Mongodb

You're running into the 32MB limit on an in-memory sort:

https://docs.mongodb.com/manual/reference/limits/#Sort-Operations

Add an index to the sort field. That allows MongoDB to stream documents to you in sorted order, rather than attempting to load them all into memory on the server and sort them in memory before sending them to the client.

Solution 2 - Mongodb

As said by kumar_harsh in the comments section, i would like to add another point.

You can view the current buffer usage using the below command over the admin database:

> use admin
switched to db admin
> db.runCommand( { getParameter : 1, "internalQueryExecMaxBlockingSortBytes" : 1 } )
{ "internalQueryExecMaxBlockingSortBytes" : 33554432, "ok" : 1 }

It has a default value of 32 MB(33554432 bytes).In this case you're running short of buffer data so you can increase buffer limit with your own defined optimal value, example 50 MB as below:

>  db.adminCommand({setParameter: 1, internalQueryExecMaxBlockingSortBytes:50151432})
{ "was" : 33554432, "ok" : 1 }

We can also set this limit permanently by the below parameter in the mongodb config file:

setParameter=internalQueryExecMaxBlockingSortBytes=309715200

Hope this helps !!!

Note:This commands supports only after version 3.0 +

Solution 3 - Mongodb

solved with indexing

db_handle.ensure_index([("reviewDate", pymongo.ASCENDING)])

Solution 4 - Mongodb

If you want to avoid creating an index (e.g. you just want a quick-and-dirty check to explore the data), you can use aggregation with disk usage:

all_reviews = db_handle.aggregate([{$sort: {'reviewDate': 1}}], {allowDiskUse: true})

(Not sure how to do this in pymongo, though).

Solution 5 - Mongodb

JavaScript API syntax for the index:

db_handle.ensureIndex({executedDate: 1})

Solution 6 - Mongodb

In my case, it was necessary to fix nessary indexes in code and recreate them:

rake db:mongoid:create_indexes RAILS_ENV=production

As the memory overflow does not occur when there is a needed index of field.

PS Before this I had to disable the errors when creating long indexes:

# mongo
MongoDB shell version: 2.6.12
connecting to: test
> db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

Also may be needed reIndex:

# mongo
MongoDB shell version: 2.6.12
connecting to: test
> use your_db
switched to db your_db
> db.getCollectionNames().forEach( function(collection){ db[collection].reIndex() } )

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionsheetal_158View Question on Stackoverflow
Solution 1 - MongodbA. Jesse Jiryu DavisView Answer on Stackoverflow
Solution 2 - MongodbJerryView Answer on Stackoverflow
Solution 3 - Mongodbsheetal_158View Answer on Stackoverflow
Solution 4 - MongodbporoszdView Answer on Stackoverflow
Solution 5 - MongodbwyttenView Answer on Stackoverflow
Solution 6 - MongodbshilovkView Answer on Stackoverflow