Database vs File system storage


Database Problem Overview


A database ultimately stores its data in files, and a file system also stores data in files. So what is the difference between a DB and a file system? Is it in the way the data is retrieved, or something else?

Database Solutions


Solution 1 - Database

> A database is generally used for storing related, structured data, with well defined data formats, in an efficient manner for insert, update and/or retrieval (depending on application).
>
> On the other hand, a file system is a more unstructured data store for storing arbitrary, probably unrelated data. The file system is more general, and databases are built on top of the general data storage services provided by file systems. [Quora]

The file system is useful if you are looking for a particular file, as operating systems maintain a sort of index. However, the contents of a txt file won't be indexed, which is one of the main advantages of a database.

For complex operations, such as searching across the contents of many files, the filesystem is likely to be very slow.

Main RDBMS advantages:

  • Tables are related to each other

  • SQL query/data processing language

  • Transaction processing in addition to SQL (for example, via procedural extensions such as Transact-SQL)

  • Server-client implementation with server-side objects like stored procedures, functions, triggers, views, etc.
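To make the first two points concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an embedded engine rather than a client-server RDBMS, so it has no stored procedures, but it is enough to show related tables, a view and a SQL query. The table and column names (authors, books, book_list) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES authors(id)
    );
    -- A view is a server-side object: a stored, named query.
    CREATE VIEW book_list AS
        SELECT books.title AS title, authors.name AS author
        FROM books JOIN authors ON authors.id = books.author_id;
""")
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ann'), (2, 'Bob')")
conn.execute(
    "INSERT INTO books (title, author_id) "
    "VALUES ('Files', 1), ('Tables', 2), ('Indexes', 1)"
)

# The relation between the tables is resolved by the engine, not by our code.
rows = conn.execute("SELECT title, author FROM book_list ORDER BY title").fetchall()
print(rows)  # [('Files', 'Ann'), ('Indexes', 'Ann'), ('Tables', 'Bob')]

# The engine also refuses a book that points at a non-existent author.
try:
    conn.execute("INSERT INTO books (title, author_id) VALUES ('Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

With plain files you would have to write the join and the integrity check yourself.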

Advantage of a file system over a database management system:

When handling small data sets with arbitrary, probably unrelated data, a file is more efficient than a database. For simple read and write operations, file operations are faster and simpler.

You can find many more differences listed on the internet.

Solution 2 - Database

Something one should be aware of is that Unix filesystems have what is called an inode limit. If you are storing millions of records as individual files, this can be a serious problem. You should run df -i to view the % of inodes used, as this is effectively a limit on the number of files in the filesystem - EVEN IF you have plenty of disk space.
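If you would rather check this from code than from the shell, a rough equivalent can be sketched with os.statvfs (Unix only). The function name inode_usage is made up for this example, and the percentage is only an approximation of what df -i prints. Some filesystems report zero total inodes because they allocate them dynamically, in which case no percentage is returned.

```python
import os

def inode_usage(path="/"):
    """Return (total, used, percent_used) inodes for the filesystem holding path.

    percent_used is None when the filesystem does not report an inode total.
    """
    st = os.statvfs(path)
    total = st.f_files           # total number of inodes
    used = total - st.f_ffree    # f_ffree is the number of free inodes
    percent = (100.0 * used / total) if total else None
    return total, used, percent

total, used, percent = inode_usage("/")
print(f"inodes: {used} of {total} used ({percent}%)")
```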

Solution 3 - Database

"They're the same"

Yes, storing data is just storing data. At the end of the day, you have files. You can store lots of stuff in lots of files and folders, and there are situations where this is the right way. A well-known version control system (svn) eventually moved to a filesystem-based model to store its data, ditching BerkeleyDB. Rare, but it happens.

"They're quite different"

In a database, you have options you don't have with files. Imagine a text file (something like tsv/csv) with 99999 rows. Now try to:

  • Insert a column. It's painful, you have to alter each row and read+write the whole file.
  • Find a row. You either scan the whole file or build an index yourself.
  • Delete a row. Find row, then read+write everything after it.
  • Reorder columns. Again, full read+write.
  • Sort rows. Full read, some kind of sort - then do it next time all over.
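The "delete a row" case from the list above can be sketched in Python, assuming a small CSV file with an id column and an SQLite table named items (both made up for this example). The CSV version below is the simplest variant: it rewrites the whole file rather than only the part after the deleted row.

```python
import csv
import os
import sqlite3
import tempfile

def delete_row_csv(path, row_id):
    """Delete the row whose first column equals row_id by rewriting the file."""
    tmp_path = path + ".tmp"
    with open(path, newline="") as src, open(tmp_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):        # full read ...
            if row and row[0] != str(row_id):
                writer.writerow(row)       # ... and full write
    os.replace(tmp_path, path)             # swap the new file into place

def delete_row_db(conn, row_id):
    """Delete the same row in a database: one statement, located via the key."""
    with conn:
        conn.execute("DELETE FROM items WHERE id = ?", (row_id,))

# Build the same five rows in both stores, then delete id 3 from each.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "items.csv")
with open(csv_path, "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["id", "name"])
    w.writerows([[i, f"item{i}"] for i in range(1, 6)])
delete_row_csv(csv_path, 3)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 6)])
delete_row_db(conn, 3)
```

Both end up with rows 1, 2, 4 and 5, but the file version touches every row to get there.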

There are lots of other good points but these are the first mountains you're trying to climb when you think of a file-based db alternative. The database developers have programmed all this for you, it's yours to use; think of the likely (most frequent) scenarios, enumerate all possible actions you want to perform on your data, and decide which one works better for you. Think in benefits, not fashion.

Again, if you're storing JPG pictures and only ever look for them by one key (their id maybe?), a well-thought-out filesystem storage is better. Filesystems, btw, are close to databases today, as many of them use a balanced tree approach, so on a BTRFS you can just put all your pictures in one folder - and the OS will silently perform something like a simple indexed database lookup each time you access your files.
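A minimal sketch of such a key-based picture store in Python follows. The function names (blob_path, save_blob, load_blob) are hypothetical, and the two-level directory sharding is an assumption of mine, not something the answer prescribes. It is a common precaution for filesystems that cope badly with huge directories, and it can be dropped where one big folder is fine. Keys are assumed to be safe to use in a file name.

```python
import hashlib
import tempfile
from pathlib import Path

def blob_path(root, key):
    """Map a key to a file path, spreading files over hashed subdirectories."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / f"{key}.jpg"

def save_blob(root, key, data):
    """Write the bytes for key, creating the shard directories if needed."""
    path = blob_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path

def load_blob(root, key):
    """Read the bytes for key; the lookup is a single path computation."""
    return blob_path(root, key).read_bytes()

store_root = tempfile.mkdtemp()
saved_path = save_blob(store_root, 12345, b"fake jpeg bytes")
loaded = load_blob(store_root, 12345)
```

No query language and no server are involved: the key is turned into a path and the filesystem does the rest.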

So, database or files?...
Let's see a few typical examples when one is better than the other. (These are not complete lists; surely you can add a lot more on both sides.)

DB tables are much better when:

  • You want to store many rows with the exact same structure (no block waste)
  • You need lightning-fast lookup / sorting by more than one value (indexed tables)
  • You need atomic transactions (data safety)
  • Your users will read/write the same data all the time (better locking)
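Two of the points above, indexed lookup and atomic transactions, can be sketched with Python's built-in sqlite3 module. The accounts table, the index name and the transfer function are invented for this example. The transfer deliberately credits first and debits second, so that a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " owner TEXT,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
# A secondary index: lookups by owner no longer need a full table scan.
conn.execute("CREATE INDEX idx_accounts_owner ON accounts(owner)")
conn.executemany("INSERT INTO accounts (owner, balance) VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # overdraft, fully rolled back
print(ok, failed, balances(conn))  # True False {'alice': 70, 'bob': 80}

# Ask the engine how it would run a lookup by owner.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM accounts WHERE owner = ?", ("alice",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

The query plan text should mention idx_accounts_owner, which is the engine saying it will use the index instead of reading every row.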

Filesystem is way better if:

  • You like to use version control on your data (a nightmare with dbs)
  • You have big chunks of data that grow frequently (typically, logfiles)
  • You want other apps to access your data without API (like text editors)
  • You want to store lots of binary content (pictures or mp3s)


TL;DR

Programming rarely says "never" or "always". Those who say "database always wins" or "files always win" probably just don't know enough. Think of the possible actions (now + future), consider both ways, and choose the fastest / most efficient for the case. That's it.

Solution 4 - Database

The differences between a file processing system and a database management system are as follows:

  1. A file processing system is a collection of programs that store and manage files on the computer's hard disk. A database management system, on the other hand, is a collection of programs that enables you to create and maintain a database.

  2. A file processing system has more data redundancy; a DBMS has less.

  3. A file processing system provides less flexibility in accessing data, whereas a DBMS provides more.

  4. A file processing system does not provide data consistency, whereas a DBMS provides data consistency through normalization.

  5. A file processing system is less complex, whereas a DBMS is more complex.

Solution 5 - Database

Context: I've written a filesystem that has been running in production for 7 years now. [1]

The key difference between a filesystem and a database is that the filesystem API is part of the OS, thus filesystem implementations have to implement that API and thus follow certain rules, whereas databases are built by 3rd parties having complete freedom.

Historically, databases were created when the filesystems provided by the OS were not good enough for the problem at hand. Just think about it: if you had special requirements, you couldn't just call Microsoft or Apple to redesign their filesystem API. You would either go ahead and write your own storage software or you would look around for existing alternatives. So the need created a market for 3rd party data storage software which ended up being called databases. That's about it.

While it may seem that filesystems have certain rules like having files and directories, this is not true. The biggest operating systems work like that but there are many small OSs that work differently. It's certainly not a hard requirement. (Just remember, to build a new filesystem, you also need to write a new OS, which will make adoption quite a bit harder. Why not focus on just the storage engine and call it a database instead?)

In the end, both databases and filesystems come in all shapes and sizes. Transactional, relational, hierarchical, graph, tabled; whatever you can think of.

[1] I've worked on the Boomla Filesystem which is the storage system behind the Boomla OS & Web Application Platform.

Solution 6 - Database

The main differences between database and file system storage are:

> 1. A database is a software application used to insert, update and delete data, while a file system is software used to add, update and delete files.
> 2. Saving and retrieving files is simpler in a file system, while SQL needs to be learned to query a database to get (SELECT), add (INSERT) and update data.
> 3. A database provides a proper data recovery process, while a file system does not.
> 4. In terms of security, a database is (usually) more secure than a file system.
> 5. Migration is very easy with a file system (just copy and paste to the target), while for a database this task is not as simple.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Sriram | View Question on Stackoverflow |
| Solution 1 - Database | Vicky | View Answer on Stackoverflow |
| Solution 2 - Database | Antony | View Answer on Stackoverflow |
| Solution 3 - Database | dkellner | View Answer on Stackoverflow |
| Solution 4 - Database | rashedcs | View Answer on Stackoverflow |
| Solution 5 - Database | zupa | View Answer on Stackoverflow |
| Solution 6 - Database | Tahir Alvi | View Answer on Stackoverflow |