database vs. flat files

DatabaseFile

Database Problem Overview


The company I work for is trying to switch a product that uses flat file format to a database format. We're handling pretty big files of data (ie: 25GB/file) and they get updated really quick. We need to run queries that randomly access the data, as well as in a contiguous way. I am trying to convince them of the advantages of using a database, but some of my colleagues seem reluctant to this. So I was wondering if you guys can help me out here with some reasons or links to posts of why we should use databases, or at least clarify why flat files are better (if they are).

Database Solutions


Solution 1 - Database

  1. Databases can handle querying tasks, so you don't have to walk over files manually. Databases can handle very complicated queries.
  2. Databases can handle indexing tasks, so if tasks like get record with id = x can be VERY fast
  3. Databases can handle multiprocess/multithreaded access.
  4. Databases can handle access from network
  5. Databases can watch for data integrity
  6. Databases can update data easily (see 1) )
  7. Databases are reliable
  8. Databases can handle transactions and concurrent access
  9. Databases + ORMs let you manipulate data in very programmer friendly way.

Solution 2 - Database

This is an answer I've already given some time ago:

> It depends entirely on the > domain-specific application needs. A > lot of times direct text file/binary > files access can be extremely fast, > efficient, as well as providing you > all the file access capabilities of > your OS's file system. > > Furthermore, your programming language > most likely already has a built-in > module (or is easy to make one) for > specific parsing. > > If what you need is many appends > (INSERTS?) and sequential/few access > little/no concurrency, files are the > way to go. > > On the other hand, when your > requirements for concurrency, > non-sequential reading/writing, > atomicity, atomic permissions, your > data is relational by the nature etc., > you will be better off with a > relational or OO database. > > There is a lot that can be > accomplished with SQLite3, which > is extremely light (under 300kb), ACID > compliant, written in C/C++, and > highly ubiquitous (if it isn't already > included in your programming language > -for example Python-, there is surely one available). It can be useful even > on db files as big as 140 terabytes, or 128 tebibytes (Link to Database Size), possible > more. > > If your requirements where bigger, > there wouldn't even be a discussion, > go for a full-blown RDBMS.

As you say in a comment that "the system" is merely a bunch of scripts, then you should take a look at pgbash.

Solution 3 - Database

Don't build it if you can buy it.

I heard this quote recently, and it really seems fitting as a guide line. Ask yourself this... How much time was spent working on the file handling portion of your app? I suspect a fair amount of time was spent optimizing this code for performance. If you had been using a relational database all along, you would have spent considerably less time handling this portion of your application. You would have had more time for the true "business" aspect of your app.

Solution 4 - Database

They're faster; unless you're loading the entire flat file into memory, a database will allow faster access in almost all cases.

They're safer; databases are easier to safely backup; they have mechanisms to check for file corruption, which flat files do not. Once corruption in your flat file migrates to your backups, you're done, and you might not even know it yet.

They have more features; databases can allow many users to read/write at the same time.

They're much less complex to work with, once they're setup.

Solution 5 - Database

What types of files is not mentioned. If they're media files, go ahead with flat files. You probably just need a DB for tags and some way to associate the "external BLOBs" to the records in the DB. But if full text search is something you need, there's no other way to go but migrate to a full DB.

Another thing, your filesystem might provide the ceiling as far as number of physical files are concerned.

Solution 6 - Database

Databases all the way.

However, if you still have a need for storing files, don't have the capacity to take on a new RDBMS (like Oracle, SQLServer, etc), than look into XML.

XML is a structure file format which offers you the ability to store things as a file but give you query power over the file and data within it. XML Files are easier to read than flat files and can be easily transformed applying an XSLT for even better human-readability. XML is also a great way to transport data around if you must.

I strongly suggest a DB, but if you can't go that route, XML is an ok second.

Solution 7 - Database

What about a non-relational (NoSQL) database such as Amazon's SimpleDB, Tokio Cabinet, etc? I've heard that Google, Facebook, LinkedIn are using these to store their huge datasets.

Can you tell us if your data is structured, if your schema is fixed, if you need easy replicability, if access times are important, etc?

Solution 8 - Database

Difference between database and flat files are given below:

  • Database provide more flexibility whereas flat file provide less flexibility.

  • Database system provide data consistency whereas flat file can not provide data consistency.

  • Database is more secure over flat files.

  • Database support DML and DDL whereas flat files can not support these.

  • Less data redundancy in database whereas more data redundancy in flat files.

Solution 9 - Database

SQL ad hoc query abilities are enough of a reason for me. With a good schema and indexing on the tables, this is fast and effective and will have good performance.

Solution 10 - Database

Unless you are loading the files into memory each time you boot, use a database. Simple as that.

That is assuming that your colleges already have the program to handle queries to the files. If not, then use a database.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionhyperboreeanView Question on Stackoverflow
Solution 1 - DatabaseAndreyView Answer on Stackoverflow
Solution 2 - DatabaseEsteban KüberView Answer on Stackoverflow
Solution 3 - DatabaseGeorge MastrosView Answer on Stackoverflow
Solution 4 - DatabaseDean JView Answer on Stackoverflow
Solution 5 - DatabasebcoscaView Answer on Stackoverflow
Solution 6 - DatabaseScott RootView Answer on Stackoverflow
Solution 7 - DatabaseVictorView Answer on Stackoverflow
Solution 8 - DatabaserashedcsView Answer on Stackoverflow
Solution 9 - DatabaseOdedView Answer on Stackoverflow
Solution 10 - DatabaseLay GonzálezView Answer on Stackoverflow