Dynamic Database Schema

Tags: Sql, Database Design, Architecture, Dynamic Data

Sql Problem Overview


What is a recommended architecture for providing storage for a dynamic logical database schema?

To clarify: Where a system is required to provide storage for a model whose schema may be extended or altered by its users once in production, what are some good technologies, database models or storage engines that will allow this?

A few possibilities to illustrate:

  • Creating/altering database objects via dynamically generated DML
  • Creating tables with large numbers of sparse physical columns and using only those required for the 'overlaid' logical schema
  • Creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity
  • Using a BigTable/SimpleDB PropertyBag type system
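
As a concrete illustration of the third option, here is a minimal sketch of a 'long, narrow' table and an application-side pivot. SQLite is used purely for brevity, and the table and attribute names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_values (   -- the 'long, narrow' table
        entity_id INTEGER,
        attr_name TEXT,
        attr_value TEXT
    )
""")
conn.executemany(
    "INSERT INTO entity_values VALUES (?, ?, ?)",
    [(1, "name", "Widget"), (1, "color", "red"),
     (2, "name", "Gadget"), (2, "weight", "3kg")],
)

# Pivot: collect each entity's rows back into one 'short, wide' record.
wide = {}
for entity_id, attr, value in conn.execute(
        "SELECT entity_id, attr_name, attr_value FROM entity_values"):
    wide.setdefault(entity_id, {})[attr] = value

print(wide[1])  # {'name': 'Widget', 'color': 'red'}
```

Note that the pivot happens in application code; doing it in SQL requires either dynamic pivot support or one self-join per attribute.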

Any answers based on real-world experience would be greatly appreciated.

Sql Solutions


Solution 1 - Sql

What you are proposing is not new. Plenty of people have tried it... most have found that they chase "infinite" flexibility and instead end up with much, much less than that. It's the "roach motel" of database designs -- data goes in, but it's almost impossible to get it out. Try and conceptualize writing the code for ANY sort of constraint and you'll see what I mean.

The end result is typically a system that is MUCH more difficult to debug and maintain, and full of data consistency problems. This is not always the case, but more often than not, that is how it ends up -- mostly because the programmer(s) don't see this train wreck coming and fail to code defensively against it. It also often turns out that the "infinite" flexibility really isn't that necessary; it's a very bad "smell" when the dev team gets a spec that says "Gosh, I have no clue what sort of data they are going to put here, so let 'em put WHATEVER"... and the end users are just fine having pre-defined attribute types that they can use (code up a generic phone #, and let them create any # of them -- this is trivial in a nicely normalized system and maintains flexibility and integrity!).

If you have a very good development team and are intimately aware of the problems you'll have to overcome with this design, you can successfully code up a well designed, not terribly buggy system. Most of the time.

Why start out with the odds stacked so much against you, though?

Don't believe me? Google "One True Lookup Table" or "single table design". Some good results: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

http://thedailywtf.com/Comments/Tom_Kyte_on_The_Ultimate_Extensibility.aspx?pg=3

http://www.dbazine.com/ofinterest/oi-articles/celko22

http://thedailywtf.com/Comments/The_Inner-Platform_Effect.aspx?pg=2

Solution 2 - Sql

A strongly typed XML field in MSSQL has worked for us.

Solution 3 - Sql

Like some others have said, don't do this unless you have no other choice. One case where this is required is if you are selling an off-the-shelf product that must allow users to record custom data. My company's product falls into this category.

If you do need to allow your customers to do this, here are a few tips:

  • Create a robust administrative tool to perform the schema changes, and do not allow these changes to be made any other way.
  • Make it an administrative feature; don't allow normal users to access it.
  • Log every detail about every schema change. This will help you debug problems, and it will also give you CYA data if a customer does something stupid.

If you can do those things successfully (especially the first one), then any of the architectures you mentioned will work. My preference is to dynamically change the database objects, because that allows you to take advantage of your DBMS's query features when you access the data stored in the custom fields. The other three options require you to load large chunks of data and then do most of your data processing in code.
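
A minimal sketch of the dynamic-DDL approach with the logging tip applied, using SQLite and invented table and column names. A real administrative tool would also validate and whitelist the identifiers before interpolating them into DDL:

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE schema_change_log (
        changed_at TEXT, changed_by TEXT, ddl TEXT
    )
""")

def add_custom_field(conn, table, column, sql_type, admin_user):
    """Admin-only entry point: apply the DDL and log every detail."""
    ddl = f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}"
    conn.execute(ddl)  # real code must whitelist/validate identifiers first
    conn.execute(
        "INSERT INTO schema_change_log VALUES (?, ?, ?)",
        (datetime.datetime.now(datetime.timezone.utc).isoformat(),
         admin_user, ddl),
    )

add_custom_field(conn, "customer", "loyalty_tier", "TEXT", "admin")
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(cols)  # ['id', 'name', 'loyalty_tier']
```

The log table gives you both a debugging trail and the CYA data mentioned above.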

Solution 4 - Sql

I have a similar requirement and decided to use the schema-less MongoDB.

> MongoDB (from "humongous") is an open source, scalable, high-performance, schema-free, document-oriented database written in the C++ programming language. (Wikipedia)

Highlights:

  • has rich query functionality (perhaps the closest to SQL DBs)
  • production ready (Foursquare and SourceForge use it)

Lowlights (stuff you need to understand so you can use Mongo correctly):

Solution 5 - Sql

The whole point of having a relational DB is keeping your data safe and consistent. The moment you allow users to alter the schema, there goes your data integrity...

If your need is to store heterogeneous data, for example like a CMS scenario, I would suggest storing XML validated by an XSD in a row. Of course you lose performance and easy search capabilities, but it's a good trade off IMHO.

Since it's 2016, forget XML! Use JSON to store the non-relational data bag, with an appropriately typed column as backend. You shouldn't normally need to query by value inside the bag, which will be slow even though many contemporary SQL databases understand JSON natively.
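
A minimal sketch of the JSON-bag idea, using SQLite and an invented table. The bag is stored as text alongside the ordinary relational columns and parsed in application code rather than queried by value:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        title TEXT,          -- relational, indexed and queried as usual
        extra_json TEXT      -- the non-relational "bag"
    )
""")
bag = {"author_twitter": "@someone", "reading_time_min": 7}
conn.execute("INSERT INTO article (title, extra_json) VALUES (?, ?)",
             ("Hello", json.dumps(bag)))

# Fetch the row, then deserialize the bag on the application side.
title, extra = conn.execute(
    "SELECT title, extra_json FROM article WHERE id = 1").fetchone()
print(json.loads(extra)["reading_time_min"])  # 7
```

Everything you need to filter or join on stays in typed columns; only the truly free-form attributes go in the bag.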

Solution 6 - Sql

I did this once in a real project:

The database consisted of one table with one field, which was an array of 50 elements. A 'word' index was set on it. All the data was typeless, so the 'word' index worked as expected. Numeric fields were represented as characters, and the actual sorting was done on the client side. (It is still possible to have several array fields for each data type if needed.)

The logical data schema for the logical tables was held within the same database, with a different table-row 'type' (the first array element). It also supported simple versioning in copy-on-write style, using the same 'type' field.

Advantages:

  1. You can rearrange and add/delete your columns dynamically, with no need to dump/reload the database. Any new column's data may be set to its initial value in (virtually) zero time.
  2. Fragmentation is minimal, since all records and tables are the same size; sometimes this gives better performance.
  3. The entire table schema is virtual. Any logical schema structure is possible (even recursive or object-oriented).
  4. It is good for "write-once, read-mostly, no-delete/mark-as-deleted" data (and most Web apps actually are like that).

Disadvantages:

  1. Indexing only by full words, no abbreviations.
  2. Complex queries are possible, but with a slight performance degradation.
  3. It depends on whether your preferred database system supports arrays and word indexes (it was implemented in the PROGRESS RDBMS).
  4. The relational model exists only in the programmer's mind (i.e. only at run time).

And now I'm thinking the next step could be to implement such a database at the file-system level. That might be relatively easy.

Solution 7 - Sql

Sounds to me like what you really want is some sort of "meta-schema", a database schema which is capable of describing a flexible schema for storing the actual data. Dynamic schema changes are touchy and not something you want to mess with, especially not if users are allowed to make the change.

You're not going to find a database which is more suited to this task than any other, so your best bet is just to select one based on other criteria. For example: What platform are you using to host the DB? What language is the app written in? And so on.

To clarify what I mean by "meta-schema":

CREATE TABLE data (
    id INTEGER NOT NULL AUTO_INCREMENT,
    `key` VARCHAR(255),
    data TEXT,

    PRIMARY KEY (id)
);

This is a very simple example; you would likely have something more specific to your needs (and hopefully a little easier to work with), but it does serve to illustrate my point. You should consider the database schema itself to be immutable at the application level; any structural changes should be reflected in the data (that is, in the instantiation of that schema).
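
A short usage sketch of that meta-schema, adapted to SQLite for the example (note the quoted "key", which clashes with reserved words in some dialects). Structural changes become ordinary row inserts, so the physical schema never changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite flavour of the meta-schema above.
conn.execute("""
    CREATE TABLE data (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        "key" VARCHAR(255),
        data TEXT
    )
""")

# "Adding a field" is just inserting rows -- no DDL, no migration.
conn.executemany('INSERT INTO data ("key", data) VALUES (?, ?)',
                 [("phone", "555-0100"), ("fax", "555-0199")])

phone = conn.execute(
    'SELECT data FROM data WHERE "key" = ?', ("phone",)).fetchone()[0]
print(phone)  # 555-0100
```

The trade-off, as noted elsewhere on this page, is that typing, constraints, and efficient per-field queries now live in your application rather than the DBMS.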

Solution 8 - Sql

I know that the models indicated in the question are used in production systems all over. A rather large one is in use at the large university/teaching institution that I work for. They specifically use the long, narrow table approach to map data gathered by many varied data-acquisition systems.

Also, Google recently released their internal data-interchange format, Protocol Buffers, as open source via their code site. A database system modeled on this approach would be quite interesting.

Check the following:

Entity-attribute-value model

Google Protocol Buffer

Solution 9 - Sql

Create 2 databases

  • DB1 contains static tables, and represents the "real" state of the data.
  • DB2 is free for users to do with as they wish - they (or you) will have to write code to populate their odd-shaped tables from DB1.

Solution 10 - Sql

The EAV approach is, I believe, the best approach, but it comes with a heavy cost.

Solution 11 - Sql

I know it's an old topic, but I guess it never loses relevance. I'm developing something like that right now. Here is my approach. I use a server setup with MySQL, Apache, PHP, and Zend Framework 2 as the application framework, but it should work just as well with any other setup.

Here is a simple implementation guide, you can evolve it yourself further from this.

You would need to implement your own query language interpreter, because the effective SQL would be too complicated.

Example:

select id, password from user where email_address = "[email protected]"

The physical database layout:

Table 'specs': (should be cached in your data access layer)

  • id: int
  • parent_id: int
  • name: varchar(255)

Table 'items':

  • id: int
  • parent_id: int
  • spec_id: int
  • data: varchar(20000)

Contents of table 'specs':

  • 1, 0, 'user'
  • 2, 1, 'email_address'
  • 3, 1, 'password'

Contents of table 'items' (illustrative rows matching the specs above; the password value is just a placeholder):

  • 1, 0, 1, '' (a 'user' item)
  • 2, 1, 2, '[email protected]' (its 'email_address' child)
  • 3, 1, 3, '<password value>' (its 'password' child)

The translation of the example in our own query language:

select id, password from user where email_address = "[email protected]"

to standard SQL would look like this:

select 
    parent_id, -- user id
    data -- password
from 
    items 
where 
    spec_id = 3 -- make sure this is a 'password' item
    and 
    parent_id in 
    ( -- get the 'user' item to which this 'password' item belongs
        select 
            id 
        from 
            items 
        where 
            spec_id = 1 -- make sure this is a 'user' item
            and 
            id in 
            ( -- fetch all item id's with the desired 'email_address' child item
                select 
                    parent_id -- id of the parent item of the 'email_address' item
                from 
                    items 
                where 
                    spec_id = 2 -- make sure this is a 'email_address' item
                    and
                    data = "[email protected]" -- with the desired data value
            )
    )

You will need to have the specs table cached in an associative array or hashtable or something similar to get the spec_id's from the spec names. Otherwise you would need to insert some more SQL overhead to get the spec_id's from the names, like in this snippet:
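
A minimal sketch of that caching step in application code (Python here purely for illustration): load 'specs' once into a hashtable and resolve names to spec_id's without any further SQL round-trips:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specs (id INTEGER, parent_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO specs VALUES (?, ?, ?)",
                 [(1, 0, "user"), (2, 1, "email_address"), (3, 1, "password")])

# Load once at startup (or on cache invalidation); afterwards every
# name -> spec_id lookup is a plain hashtable access.
spec_ids = {name: spec_id
            for spec_id, _parent_id, name in conn.execute("SELECT * FROM specs")}

print(spec_ids["password"])  # 3
```

With this in place, the query interpreter can substitute literal spec_id values into the generated SQL, avoiding the nested `(select id from specs where name = ...)` subqueries of the bad example below.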

Bad example, don't use this, avoid this, cache the specs table instead!

select 
    parent_id, 
    data 
from 
    items 
where 
    spec_id = (select id from specs where name = "password") 
    and 
    parent_id in (
        select 
            id 
        from 
            items 
        where 
            spec_id = (select id from specs where name = "user") 
            and 
            id in (
                select 
                    parent_id 
                from 
                    items 
                where 
                    spec_id = (select id from specs where name = "email_address") 
                    and 
                    data = "[email protected]"
            )
    )

I hope you get the idea and can determine for yourself whether that approach is feasible for you.

Enjoy! :-)

Solution 12 - Sql

Over at the c2.com wiki, the idea of "Dynamic Relational" was explored. You DON'T need a DBA: columns and tables are Create-On-Write, unless you start adding constraints to make it act more like a traditional RDBMS: as a project matures, you can incrementally "lock it down".

Conceptually you can think of each row as an XML statement. For example, an employee record could be represented as:

<employee lastname="Li" firstname="Joe" salary="120000" id="318"/>

This does not imply it has to be implemented as XML; it's just a handy conceptualization. If you ask for a non-existing column, such as "SELECT madeUpColumn ...", it's treated as blank or null (unless added constraints forbid it). And it's possible to use SQL, although one has to be careful about comparisons because of the implied type model. But other than type handling, users of a Dynamic Relational system would feel right at home, because they can leverage most of their existing RDBMS knowledge. Now, if somebody would just build it...
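
Nobody has built it as described, but the Create-On-Write half of the idea can be approximated on top of an ordinary RDBMS. Here is a hedged sketch using SQLite's dynamic typing, with invented table and helper names; note that a SELECT of a missing column would still need interception, which this sketch does not attempt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY)")

def write(conn, table, row):
    """Create-On-Write: add any column the row mentions that doesn't exist yet."""
    existing = {r[1] for r in conn.execute(f"PRAGMA table_info({table})")}
    for col in row:
        if col not in existing:
            # Untyped column, mirroring the typeless XML-ish conceptual view.
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {col}")
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})",
                 list(row.values()))

write(conn, "employee", {"id": 318, "lastname": "Li", "salary": 120000})
row = conn.execute("SELECT lastname, salary FROM employee").fetchone()
print(row)  # ('Li', 120000)
```

As the answer suggests, "locking down" would then mean adding CHECK constraints and types to columns as the project matures.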

Solution 13 - Sql

In the past I've chosen option C -- creating a 'long, narrow' table that stores dynamic column values as rows that then need to be pivoted to create a 'short, wide' rowset containing all the values for a specific entity. However, I was using an ORM, and that REALLY made things painful. I can't think of how you'd do it in, say, LinqToSql. I guess I'd have to create a Hashtable to reference the fields.

@Sklivvz: I'm guessing he's more interested in allowing users to create user-defined fields.

Solution 14 - Sql

ElasticSearch. You should consider it especially if you're dealing with datasets that you can partition by date, can use JSON for your data, and are not fixed on using SQL to retrieve the data.

ES infers your schema for any new JSON fields you send, either automatically or with hints, and you can define or change the schema manually with a single HTTP command ("mappings"). Although it does not support SQL, it has some great lookup capabilities and even aggregations.

Solution 15 - Sql

I know this is a super old post, and much has changed in the last 11 years, but I thought I would add this as it might be helpful to future readers. One of the reasons my co-founders and I created HarperDB is to natively accomplish a dynamic schema in a single, unduplicated data set while providing full indexing capability. You can read more about it here:
https://harperdb.io/blog/dynamic-schema-the-harperdb-way/

Solution 16 - Sql

SQL already provides a way to change your schema: the ALTER command.

Simply have a table that lists the fields users are not allowed to change, and write a nice interface for ALTER.
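
A minimal sketch of such an interface, using SQLite and an invented 'protected_fields' table; a real version would also validate the identifiers before building the ALTER statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE protected_fields (tbl TEXT, col TEXT)")
conn.execute("INSERT INTO protected_fields VALUES ('invoice', 'total')")

def user_add_column(conn, table, column, sql_type):
    """User-facing schema change: refuse anything the app has locked down."""
    locked = conn.execute(
        "SELECT 1 FROM protected_fields WHERE tbl = ? AND col = ?",
        (table, column)).fetchone()
    if locked:
        raise PermissionError(f"{table}.{column} is not user-modifiable")
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}")

user_add_column(conn, "invoice", "po_number", "TEXT")   # allowed

blocked = False
try:
    user_add_column(conn, "invoice", "total", "REAL")   # refused by the guard
except PermissionError:
    blocked = True

cols_after = [r[1] for r in conn.execute("PRAGMA table_info(invoice)")]
```

The same guard table can drive the UI, graying out protected fields so users never hit the error in the first place.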

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Original authors (content type: original author, as credited on Stackoverflow):

  • Question: Fake Jim
  • Solution 1 - Sql: Matt Rogish
  • Solution 2 - Sql: Mike Becatti
  • Solution 3 - Sql: Josh Yeager
  • Solution 4 - Sql: clyfe
  • Solution 5 - Sql: Sklivvz
  • Solution 6 - Sql: Thevs
  • Solution 7 - Sql: Daniel Spiewak
  • Solution 8 - Sql: siculars
  • Solution 9 - Sql: AJ.
  • Solution 10 - Sql: kamal
  • Solution 11 - Sql: Oliver Konig
  • Solution 12 - Sql: FloverOwe
  • Solution 13 - Sql: Danimal
  • Solution 14 - Sql: Oren
  • Solution 15 - Sql: sgoldberg
  • Solution 16 - Sql: longneck