Understanding MongoDB BSON Document size limit


Mongodb Problem Overview


From MongoDB The Definitive Guide:

> Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance.

I don't understand this limit. Does this mean that a document containing a blog post with a lot of comments, which just so happens to be larger than 4MB, cannot be stored as a single document?

Also, does this count nested documents too?

What if I wanted a document that audits the changes to a value? (It may eventually grow beyond the 4MB limit.)

Hope someone explains this correctly.

I have just started reading about MongoDB (the first NoSQL database I'm learning about).

Thank you.

Mongodb Solutions


Solution 1 - Mongodb

First off, this is actually being raised in the next version to 8MB or 16MB.

EDIT: The size has officially been raised to 16MB.

To put this into perspective, Eliot from 10gen (the company that developed MongoDB) puts it best:

> So, on your blog example, 4MB is actually a whole lot.. For example, the full uncompresses text of "War of the Worlds" is only 364k (html): http://www.gutenberg.org/etext/36
>
> If your blog post is that long with that many comments, I for one am not going to read it :)
>
> For trackbacks, if you dedicated 1MB to them, you could easily have more than 10k (probably closer to 20k)
>
> So except for truly bizarre situations, it'll work great. And in the exception case or spam, I really don't think you'd want a 20mb object anyway. I think capping trackbacks as 15k or so makes a lot of sense no matter what for performance. Or at least special casing if it ever happens.
>
> -Eliot

I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.

The main point of the limit is to keep a single document from using up all the RAM on your server, since the whole document has to be loaded into RAM when you query it.

So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.

Note on Storing Files in MongoDB

If you need to store documents (or files) larger than 16MB you can use the GridFS API which will automatically break up the data into segments and stream them back to you (thus avoiding the issue with size limits/RAM.)

> Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.

> GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.

You can use this method to store images, files, videos, etc. in the database much as you might in a SQL database. I have even used this to store multi-gigabyte video files.
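To make the chunking idea concrete, here is a rough sketch in plain Python that only simulates what GridFS does; it is not the driver API. The helper names `split_into_chunks` and `reassemble` are made up for illustration, the field names follow the usual GridFS layout (`files_id`, `n`, `data`, `length`, `chunkSize`), and the 255 KiB chunk size is the default in current drivers, treated here as an assumption.

```python
# Simulation only: real applications should use their driver's GridFS API.
CHUNK_SIZE = 255 * 1024  # assumed default GridFS chunk size, in bytes


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs), mimicking the two GridFS collections."""
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    # The metadata document describes the file but holds none of its bytes.
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


if __name__ == "__main__":
    payload = bytes(600_000)  # stand-in for a file's contents
    files_doc, chunks = split_into_chunks("video-1", payload)
    print(files_doc["length"], len(chunks))
    print(reassemble(chunks) == payload)
```

Every chunk document stays far below the 16MB limit no matter how large the file is, which is why the file as a whole can be multiple gigabytes.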

Solution 2 - Mongodb

Many in the community would prefer no limit with warnings about performance, see this comment for a well reasoned argument: https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283

My take: the lead developers are stubborn about this issue because they decided it was an important "feature" early on. They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. It is another example of personality and politics detracting from a product in open source communities, but this is not really a crippling issue.

Solution 3 - Mongodb

Posting a clarification here for those who get directed here by Google.

The document size includes everything in the document including the subdocuments, nested objects etc.

So a document of:

{
  "_id": {},
  "na": [1, 2, 3],
  "naa": [
    { "w": 1, "v": 2, "b": [1, 2, 3] },
    { "w": 5, "b": 2, "h": [{ "d": 5, "g": 7 }, {}] }
  ]
}

has a maximum size of 16 MB.

Subdocuments and nested objects are all counted towards the size of the document.
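To see how nested content adds to the total, here is a minimal sketch of a BSON size calculator in plain Python. It follows the BSON layout for a small subset of types only (null, bool, int, float, string, document, array); `bson_size` and its helpers are hypothetical names, and the int32/int64 choice for integers is an assumption about how a driver would encode them. A real driver's own encoder is the authoritative way to measure a document.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # the 16 MB limit, in bytes


def _cstring_size(name):
    # Field names are stored as UTF-8 bytes plus a trailing NUL byte.
    return len(name.encode("utf-8")) + 1


def _document_size(items):
    size = 4  # int32 length prefix of the document
    for key, value in items:
        # one type byte + field name + encoded value
        size += 1 + _cstring_size(str(key)) + _value_size(value)
    return size + 1  # trailing NUL terminator


def _value_size(value):
    if value is None:
        return 0
    if isinstance(value, bool):  # must come before int: bool subclasses int
        return 1
    if isinstance(value, int):
        # Assumption: small integers are encoded as int32, larger as int64.
        return 4 if -2**31 <= value < 2**31 else 8
    if isinstance(value, float):
        return 8
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1
    if isinstance(value, dict):
        return _document_size(value.items())
    if isinstance(value, list):
        # Arrays are documents whose keys are "0", "1", "2", ...
        return _document_size((str(i), v) for i, v in enumerate(value))
    raise TypeError(f"type not covered by this sketch: {type(value).__name__}")


def bson_size(document):
    """Approximate encoded size in bytes of a dict, nested values included."""
    if not isinstance(document, dict):
        raise TypeError("the top-level value must be a document (dict)")
    return _document_size(document.items())


if __name__ == "__main__":
    print(bson_size({"hello": "world"}))  # 22
    post = {"text": "a post", "comments": [{"text": "hi"}]}
    print(bson_size(post), bson_size(post) <= MAX_BSON_SIZE)
```

Each comment appended to the embedded array increases the size of the parent document, so it is the whole tree that has to stay under the limit.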

Solution 4 - Mongodb

I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?

JSON is a text format. So, if you are accessing your data through JSON, this overhead is especially significant for binary files, because they have to be encoded as text in uuencode, hexadecimal, or Base64. The conversion path might look like

binary file <> JSON (encoded) <> BSON (encoded)

It would be more efficient to put the path (URL) to the data file in your document and keep the data itself in binary.
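As a rough illustration of that encoding cost, the sketch below measures how much a binary payload grows when it is turned into text. It uses only the Python standard library; `text_encoding_sizes` is a hypothetical helper name and the payload is arbitrary sample data.

```python
import base64
import binascii


def text_encoding_sizes(raw):
    """Return the byte counts of a payload as raw, Base64, and hexadecimal."""
    return {
        "raw": len(raw),
        "base64": len(base64.b64encode(raw)),  # 4 output bytes per 3 input bytes
        "hex": len(binascii.hexlify(raw)),     # 2 output bytes per input byte
    }


if __name__ == "__main__":
    payload = bytes(range(256)) * 12  # 3072 bytes of sample binary data
    print(text_encoding_sizes(payload))
    # {'raw': 3072, 'base64': 4096, 'hex': 6144}
```

Base64 adds about a third and hexadecimal doubles the size, which also brings a text-encoded file closer to the document limit than the same file kept in binary.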

If you really want to keep these files of unknown length in your DB, then you would probably be better off putting these in GridFS and not risking killing your concurrency when the large files are accessed.

Solution 5 - Mongodb

Nested Depth for BSON Documents: MongoDB supports no more than 100 levels of nesting for BSON documents.

For more info, visit
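If you want to check how deeply a document is nested before saving it, a small recursive helper is enough. This is only a sketch: `nesting_depth` is a hypothetical name, and counting the top-level document as level 1 and each array as a level is an assumption about how the server counts.

```python
MAX_NESTING_DEPTH = 100  # the limit quoted above


def nesting_depth(value):
    """Count nested document/array levels; a flat document has depth 1."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # scalars add no nesting level


if __name__ == "__main__":
    doc = {"post": {"comments": [{"replies": [{"text": "hi"}]}]}}
    depth = nesting_depth(doc)
    print(depth, depth <= MAX_NESTING_DEPTH)
```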

Solution 6 - Mongodb

Perhaps storing a blog post -> comments relation in a non-relational database is not really the best design.

You should probably store comments in a separate collection from blog posts anyway.

Solution 7 - Mongodb

According to https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1

If you expect that a blog post may exceed the 16MB document limit, you should extract the comments into a separate collection, reference the blog post from each comment, and do an application-level join.

// posts
[
  {
    _id: ObjectID('AAAA'),
    text: 'a post',
    ...
  }
]

// comments
[
  {
    text: 'a comment',
    post: ObjectID('AAAA')
  },
  {
    text: 'another comment',
    post: ObjectID('AAAA')
  }
]
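An application-level join over the two collections above could look roughly like this. The sketch uses in-memory lists in place of real collections and plain strings in place of ObjectIDs, and `post_with_comments` is a hypothetical helper; with a real driver, each lookup would be a separate query.

```python
posts = [
    {"_id": "AAAA", "text": "a post"},
]

comments = [
    {"text": "a comment", "post": "AAAA"},
    {"text": "another comment", "post": "AAAA"},
]


def post_with_comments(post_id, posts, comments):
    """Fetch one post, then its comments, and combine them in the application."""
    # Lookup 1: the post itself (would be a find-one on the posts collection).
    post = next((p for p in posts if p["_id"] == post_id), None)
    if post is None:
        return None
    # Lookup 2: every comment that references the post.
    related = [c for c in comments if c["post"] == post_id]
    # The join happens here, in application code, not in the database.
    return {**post, "comments": related}


if __name__ == "__main__":
    print(post_with_comments("AAAA", posts, comments))
```

Because comments live in their own documents, the post document stays small however many comments it collects.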

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | 0xdeadbeef | View Question on Stackoverflow
Solution 1 - Mongodb | Justin Jenkins | View Answer on Stackoverflow
Solution 2 - Mongodb | marr75 | View Answer on Stackoverflow
Solution 3 - Mongodb | Sammaye | View Answer on Stackoverflow
Solution 4 - Mongodb | Chris Golledge | View Answer on Stackoverflow
Solution 5 - Mongodb | user2903536 | View Answer on Stackoverflow
Solution 6 - Mongodb | Mchl | View Answer on Stackoverflow
Solution 7 - Mongodb | mzarrugh | View Answer on Stackoverflow