To what extent are 'lost data' criticisms still valid of MongoDB?

Mongodb

Mongodb Problem Overview


To what extent are 'lost data' criticisms still valid of MongoDB? I'm referring to the following:

> 1. MongoDB issues writes in unsafe ways by default in order to win benchmarks

>If you don't issue getLastError(), MongoDB doesn't wait for any confirmation from the database that the command was processed. This introduces at least two classes of problems:

>* In a concurrent environment (connection pools, etc), you may have a subsequent read fail after a write has "finished"; there is no barrier condition to know at what point the database will recognize a write commitment >* Any unknown number of save operations can be dropped on the floor due to queueing in various places, things outstanding in the TCP buffer, etc, when your connection drops of the db were to be KILL'd or segfault, hardware crash, you name it

>2. MongoDB can lose data in many startling ways

>Here is a list of ways we personally experienced records go missing:

>1. They just disappeared sometimes. Cause unknown. >2. Recovery on corrupt database was not successful, pre transaction log. >3. Replication between master and slave had gaps in the oplogs, causing slaves to be missing records the master had. Yes, there is no checksum, and yes, the replication status had the slaves current >4. Replication just stops sometimes, without error. Monitor your replication status!

> ...[other criticisms]

If still valid, these criticisms would be worrying to some extent. The article primarily references v1.6 and v1.8, but since then v2 has been released. Are the shortcomings discussed in the article still outstanding as of the current release?

Mongodb Solutions


Solution 1 - Mongodb

Note on Context:

This question was asked in 2012, but still sees traffic and votes to this day. The original answer was specifically to refute a particular post that was popular at the time of the question. Things have changed (and continue to change) massively since this answer was written. MongoDB has certainly become far more durable and reliable than it was in 2012 when even things like basic journaling were relatively new. I get downvotes and comments on this answer because people feel I don't address the current (for a given value of current) general answer to the titular question (not the detail): "are lost data criticisms still valid?". I have attempted to clarify in updates below, but there is basically no perfect answer to this question, it depends on your perspective, what your expectations are/were, what version you are using, what configuration, whether you feel upset about the default settings etc.

Original Answer:

That particular post was debunked, point by point by the MongoDB CTO and co-founder, Eliot Horowitz, here:

http://news.ycombinator.com/item?id=3202959

There is also a good summary here:

http://www.betabeat.com/2011/11/10/the-trolls-come-out-for-10gen/

The short version is, it looks like this was basically someone trolling for attention (successfully), with no solid evidence or corroboration. There have been genuine incidents in the past, which have been dealt with as the product evolved (see the introduction of journaling in 1.8 for example) or as more specific bugs were found and fixed.

Disclaimer: I do work for MongoDB (formerly 10gen), and love the fact that philnate got here and refuted this independently first - that probably says more about the product than anything else :)

Update: August 19th 2013

I've seen quite a bit of activity on this answer recently, which I assume is related to the announcement of the bug in SERVER-10478 - it is most certainly an edge case, but I would still recommend anyone using sharding with large documents to upgrade ASAP to v2.2.6 and v2.4.6 which include the fix for this issue.

Update: March 24th 2017

I no longer work for MongoDB, but stand behind this answer nonetheless. Given that this answer continues to get up (and down) votes and receives a lot of views I would like to point people at this post which shows the progress MongoDB has made since this question was posed. The database now passes the Jepsen tests, and has integrated the tests into its build process, there are plenty of far more mature systems that do not pass. Anyone still beating the data loss drum in 2017 really hasn't been paying attention.

Update: May 24th 2020

Jepsen has re-analyzed MongoDB 4.2.6 given that MongoDB now offers "full ACID transactions" and while it gets quite technical in parts, I highly recommend reading the article if data loss in MongoDB is a concern for you (I would recommend checking out any database you use that Jepsen tests, you might be surprised at their weak spots). The report summarizes the weaknesses in the default read and write concerns, talks about how reliable non-transaction reads and writes are with appropriate read and write concerns, addresses flaws in the documentation, and then provides significant details about the issues encountered when testing the new ACID transactions (and associated read/write concerns).

So, can you still lose data with MongoDB? Yes, especially with default settings, but that is true of most databases. Things are vastly better than they were back when this question was answered, and the capabilities are there for more reliability and durability, and they seem to work (transactions aside). My advice is to learn what the limitations of the configuration are that you operate and to then determine whether the data loss risk is acceptable or not for your product/business/use case.

Solution 2 - Mongodb

I can't speak for every case, only my own. However unlike the other answer I don't work for Mongo or its competitors, I have lost data when using MongoDB, and used Mongo for around ten years, so here goes.

2010

This is when I first began using Mongo, the main criticisms of Mongo around the time were:

  • Supposedly stable versions of Mongo had major data-losing bugs that weren't made explicit to users. Eg, prior to 1.8 non-clustered configurations were likely to lose data. This was documented by Mongo, but not to the extent a data losing bug in a stable-versioned DB would normally be.

The main defence of that criticism was:

  • Users were informed of this danger, albeit not so explicitly. Users should read all the documentation before they begin.

In my own case, I was using 1.7 in a single server configuration but aware of the risk. I shut down the DB to take a back up. The act of shutting down the DB itself lost my data, 10gen assisted (for free) but were unable to recover the data.

2013

Later, in 2013, a study came out revealing MongoDB defaults can cause significant loss of acknowledged writes during network partitions.

Also in 2013 Mongo the official production node Mongo drivers wrapped and threw away all errors.

2014

Since then, in 2014 a completely different bug in the stable MongoDB driver bit me and many other users.

2016 (and again in 2020)

In 2016, the Meteor project has issues with MongoDB queries not always returning all matching documents.

Later MongoDB's policy of listening on all interfaces by default with no admin password set has also worked out badly for many users. We knew two decades ago (and probably earlier, but I wasn't in tech at the time) that listening on all ports by default was a bad idea, which is why other software avoids this.

2020 update: the Meow attack now automatically destroys databases with Mongo's old network defaults.

2020

Jepsen evaluated MongoDB 4.2.6 and concluded:

> even at the strongest levels of read and write concern, MongoDB 4.2.6 failed to preserve snapshot isolation. Instead, Jepsen observed read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level.

Conclusion

There have been general, repeated observations, over many years, that Mongo has unsafe defaults to win performance benchmarks. Mongo generally responds that the user should be aware of these by reading all the relevant docs and may use choose to use safe options if they are needed.

As of 2020 I feel like MongoDB now is actually a more stable product simply through time and investment, however I will never trust the company for using our data to beta test for a decade, and I would not be surprised at all if another data loss condition was revealed. I have used Postgres JSONB, FoundationDB and RethinkDB as structured data stores which may be valid alternatives.

enter image description here

Solution 3 - Mongodb

As of February 2017, the most recent Jepsen analysis of MongoDB suggests that data loss was possible in all versions of MongoDB up to MongoDB 3.2.11 and 3.4.0-rc4.

So at the time the question was written (2012) the answer should've been yes, those criticisms were valid from theoretical perspective. But it looks like customers don't care about the theory. As RethinkDB fail has shown, correctness doesn't matter. The only thing that matters is time to market. Very sad.

As of Oct 2018, On MongoDB 3.4 - This is still an issue.

Solution 4 - Mongodb

Never heard of those severe problems in recent versions. What you need to consider is that MongoDB has no decade of development as relational Systems in the back. Further it may be true that MongoDB doesn't offer that much functionality to avoid data loss at all. But even with relational Systems you won't be ever sure that you'll never loose any data. It highly depends on your system configuration (so with Replication and manual data backups you should be quite safe).

As a general guideline to avoid Beta Bugs or bugs from early versions, avoid to use fresh versions in productions (there's a reason why debian is so popular for servers). If MongoDB would suffer such severe problems (all the time) the list of users would be smaller: https://www.mongodb.com/community/deployments Additionally I don't really trust this pastebin message, why is this published anonymously? Is this person company shamed to tell that they used mongodb, do they fear 10gen? Where a links to those Bug reports (or did 10gen delete them from JIRA?)

So lets talk shortly about those points:

  1. Yep MongoDB operates normally in fire and forget mode. But you can modify this bevavior with several options: https://docs.mongodb.com/manual/reference/command/getLastError/#dbcmd.getLastError. So only because MongoDB defaults to it, it doesn't mean you can't change it to your needs. But you need to live less performance if you don't fire and forget within your app, as you're adding a roundtrip.

    Update: Since version 2.6, the commands insert, update, save, remove by default acknowledges the write.

  2. Never heard of such problems, except those caused to own failure...but that can happen with relational systems as well. I guess this point only talks about Master-Slave Replication. Replica-Sets are much never and stable. Some links from the web where other dbms caused data loss due to malfunction as well: http://blog.lastinfirstout.net/2010/04/bit-by-bug-data-loss-running-oracle-on.html http://dbaspot.com/oracle-server/430465-parallel-cause-data-lost-another-oracle-bug.html http://bugs.mysql.com/bug.php?id=18014 (Those posted links aren't in any favor of a given system or should imply anything else than showing that there are bugs in other systems as well, who can cause data loss.)

  3. Yes actually there's Locking at instance level, I don't think that in sharded environment this is a global one, I think this will be at instance level for each shard separate, as there's no need to lock other instances as there are no consistency checks needed. The upcoming Version 2.2 will lock at DB Level, tickets for Collection Level and maybe extend or document exists as well (https://jira.mongodb.org/browse/SERVER-4328). But locking at deeper levels may affect the actual performance of MongoDB, as a lock management is expensive.

  4. Moving chunks shouldn't cause much problems as rebalancing should take a few chunks from each node and move them to the new one. It never should cause ping/pong of chunks nor does rebalancing start just because of a difference of one or two chunks. What can be problematic is when your shard key is choosen wrong. So you may end up with all new entries inserted to one node rather than all. So you would see more often rebalancing which can cause problems, but that would be not due to mongo rather than your poorly choosen shardkey.

  5. Can't comment on this one

  6. Not 100% sure, but I think Replicasets where introduced in 1.6, so as told earlier never use the latest version for production, except you can live with loss of data. As with every new feature there's the possibility of bugs. Even extensive testing may not reveal all problems. Again always run some manual backup for gods sake, except you can live with data loss.

  7. Can't comment on this. But in reality software may contain severe bugs. Many games suffer those problems as well and there are other areas as well where banana software was quite well known or is. Can't Comment about something concrete as this was before my MongoDB time.

  8. Replication can cause such problems. Depending on the replication strategy this may be a problem and another system may fit better. But without a really really write intensive workload you may not encounter such problems. Indeed it may be problematic to have 3 replicas polling changes from one master. You could cure the problem by adding more shards.

As a general conclusion: Yeah it may be that those problems were existent, but MongoDB did much in this direction and further I doubt that other DBMS never had the one or other problem itself. Just take traditional relational dbms, would those scale well to web-scale there would be no need for Systems like MongoDB, HBase and what else. You can't get a system which fits all needs. So you have to live with the downsides of one or try to build a combined system of multiple to get what you need.

Disclaimer: I'm not affiliated with MongoDB or 10gen, I'm just working with MongoDB and telling my opinion about it.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiondeltanovemberView Question on Stackoverflow
Solution 1 - MongodbAdam ComerfordView Answer on Stackoverflow
Solution 2 - MongodbmikemaccanaView Answer on Stackoverflow
Solution 3 - MongodbVanuanView Answer on Stackoverflow
Solution 4 - MongodbphilnateView Answer on Stackoverflow