Using S3 as a database vs. database (e.g. MongoDB)

Tags: MongoDB, Amazon Web Services, Amazon S3, NoSQL

Mongodb Problem Overview


Because of the simple setup and low cost, I am considering using an AWS S3 bucket instead of a NoSQL database to store simple user settings as JSON (around 30 documents).

I found the following disadvantages of not using a database, none of which are relevant to my use case:

  • Listing of buckets/files will cost you money.
  • No updates - you cannot update a file, just replace it.
  • No indexes.
  • Versioning will cost you $$.
  • No search
  • No transactions
  • No query API (SQL or NoSQL)

Are there any other disadvantages of using an S3 bucket instead of a database?

Mongodb Solutions


Solution 1 - Mongodb

You are "considering using AWS S3 bucket instead of a NoSQL database", but the fact is that Amazon S3 effectively is a NoSQL database.

It is a very large Key-Value store. The Key is the filename, the Value is the contents of the file.

If your needs are simply "Store a value with this key" and "Retrieve a value with this key", then it would work just fine!
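As a rough illustration of those two operations for the JSON user settings in the question, here is a minimal Python sketch. It assumes `client` is a boto3 S3 client (for example `boto3.client("s3")`); the `settings/<user_id>.json` key layout and the function names are hypothetical choices for this example, not something from the question.

```python
import json


def settings_key(user_id: str) -> str:
    # Hypothetical key layout: one JSON object per user.
    return f"settings/{user_id}.json"


def save_settings(client, bucket: str, user_id: str, settings: dict) -> None:
    # "Store a value with this key": the whole object is written in one call.
    client.put_object(
        Bucket=bucket,
        Key=settings_key(user_id),
        Body=json.dumps(settings).encode("utf-8"),
        ContentType="application/json",
    )


def load_settings(client, bucket: str, user_id: str) -> dict:
    # "Retrieve a value with this key": get_object returns a streaming Body.
    response = client.get_object(Bucket=bucket, Key=settings_key(user_id))
    return json.loads(response["Body"].read())


def update_settings(client, bucket: str, user_id: str, changes: dict) -> dict:
    # There is no partial update, so read the object, change it locally,
    # and replace the whole object.
    settings = load_settings(client, bucket, user_id)
    settings.update(changes)
    save_settings(client, bucket, user_id, settings)
    return settings
```

Note that `update_settings` is a plain read-modify-write with no transaction around it, so two concurrent writers to the same key would simply overwrite each other. For around 30 small documents edited by their own users, that may well be acceptable.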

In fact, old orders on Amazon.com (more than a year old) are apparently archived to Amazon S3 since they are read-only (no returns, no changes).

While slower than DynamoDB, Amazon S3 certainly costs significantly less for storage!

Solution 2 - Mongodb

Context: we use S3 as a "database" of sorts (literally key/value structured storage).

It should be noted that S3 does actually have search and, depending on how you structure your data, queries in the form of S3 Select (and, if you have the time: Athena).
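To give an idea of what an S3 Select query looks like, here is a minimal Python sketch. It assumes `client` is a boto3 S3 client and that the object holds JSON documents; the function name and the sample expression are hypothetical. S3 Select runs against a single object per call, so it suits "filter inside one file" rather than searching across a bucket, and it is worth checking that the feature is available to your account before relying on it.

```python
import json


def select_settings(client, bucket: str, key: str, expression: str) -> list:
    """Run an S3 Select SQL expression against one JSON object and return the matching records."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"JSON": {"Type": "DOCUMENT"}},
        OutputSerialization={"JSON": {"RecordDelimiter": "\n"}},
    )
    # The response Payload is an event stream; only "Records" events carry data.
    # A record may be split across events, so join the bytes before decoding.
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

A call might look like `select_settings(client, "my-bucket", "settings/all.json", "SELECT s.theme FROM S3Object s WHERE s.theme = 'dark'")`, where the bucket, key, and field names are made up for illustration.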

Edit: prior to December 2020, S3 was eventually consistent. It is now strongly consistent. The following disadvantages no longer apply, but are kept here for historical reasons.


Before December, 2020, the biggest disadvantage/architectural challenge was that S3 was eventually consistent (which was actually the reason why you could not "update" a file). This manifested itself in some behaviours which your architecture needed to tolerate:

  • Operations were cached by key, so if you attempted to get an object that did not exist and then created it, for a period of time* any gets on that object would return that it did not exist.
  • There was no global cache, so you could get two different versions of the same object for a period of time* after it had been overwritten.
  • List operations provided a semi-unstable iterator. If you listed a large number of objects in a bucket that was being updated, chances were that you would not visit all the objects by the end of the iterator.

*The period of time was purposely left undefined by AWS; however, from observation, it was rarely more than a minute.
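For historical illustration only, one common way an architecture could tolerate the "object not visible yet" behaviour was to retry reads with a backoff. The sketch below is generic Python rather than anything S3-specific: `fetch` is a hypothetical callable that raises `KeyError` while the object is missing (with boto3 you would translate the client's `NoSuchKey` error into that), and the attempt count and delays are arbitrary assumptions. With today's strong consistency this should no longer be needed for read-after-write.

```python
import time


def get_with_retry(fetch, attempts: int = 5, delay: float = 0.5, sleep=time.sleep):
    """Call fetch() until it stops raising KeyError, doubling the wait between attempts."""
    for attempt in range(attempts):
        try:
            return fetch()
        except KeyError:
            # Treat a missing key as "not visible yet"; re-raise on the last attempt.
            if attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))
```

The `sleep` parameter exists only so the wait can be swapped out in tests. Note that a retry like this only helps with the first behaviour in the list; reading a stale version after an overwrite would not raise an error at all.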

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Simon Thiel | View Question on Stackoverflow
Solution 1 - Mongodb | John Rotenstein | View Answer on Stackoverflow
Solution 2 - Mongodb | thomasmichaelwallace | View Answer on Stackoverflow