Alternatives to Entity-Attribute-Value (EAV)?

DatabaseDatabase DesignData StructuresData ModelingEntity Attribute-Value

Database Problem Overview


Our database is designed based on EAV (Entity-Attribute-Value) model. Those who have worked with EAV models know all the crap that comes with for the purpose of flexibility.

I asked my client about the reasons why using EAV model (flexibility), and their response was: Their entities change over time. So, today they may have a table with a few attributes, but in a month time, a few new attributes may be added, or an existing attribute may be renamed. They need to produce reports to get back to any stage in time and query the data based on the shape of entities at that stage.

I understand this is not feasible with a conventional relational model, but I personally see EAV as anti-pattern. Are there any other alternative models that enables us to capture the time dimension in changes to the entities and instances?

Cheers, Mosh

Database Solutions


Solution 1 - Database

There is a difference between EAV done faithfully or badly; 5NF done by skilled people or by those who are clueless.

Sixth Normal Form is the Irreducible Normal Form (no further Normalisation is possible). It eliminates many of the problems that are common, such as The Null Problem, and provides the ultimate method identifying missing values. It is the academically and technically robust NF. There are no products to support it, and it is not commonly used. To be implemented properly and consistently, it requires a catalogue for metadata to be implemented. Of course, the SQL required to navigate it becomes even more cumbersome (SQL already being cumbersome re joins), but this is easily overcome by automating the production of SQL from the metadata.

EAV is a partial set or a subset of 6NF. The problem is, usually it is done for a purpose (to allow columns to be added without having to make DDL changes), and by people who are not aware of the 6NF, and who do not implement metadata. The point is, 6NF and EAV as principles and concepts offer substantial benefits, and performance increases; but commonly it is not implemented properly, and the benefits are not realised. Quite a few EAV implementations are disasters, not because EAV is bad, but because the implementation is poor.

Eg. Some people think that the SQL required to construct the 3NF rows from the 6NF/EAV database is complex: no, it is cumbersome but not complex. More important, an ordinary SQL VIEW can be provided, so that all users and report tools see only the straight 3NF VIEW, and the 6NF/EAV issues are transparent to them. Last, the SQL required can be automated, so the labour cost that many people endure is quite unnecessary.

So the answer really is, Sixth Normal Form, being the father of EAV, and a purer form, is the replacement for it. The Caveat is, ensure it is done properly. I have one large 6NF db, and it suffers none of the problems people post about, it performs beautifully, the customer is very happy (no further work is a sign of complete functional satisfaction).

I have already posted a very detailed answer to another question which applies to your question as well, which you may be interested in.

Other EAV Question

Solution 2 - Database

Regardless of the kind of relational model you use, tracking field name changes requires a lot of meta data which you must keep track of in either transaction logs or audit tables. Unfortunately, querying either of those for state at a particular date is very complicated. If your client only requires state at a particular time date however, meaning the entire state, not just with respect to name changes, you can duplicate the database and roll back the transaction log to the particular time required and run your queries on the new instance. If entities added after the specified date need to show up in the query with the old field names however, you have a very large engineering problem ahead of you. In that case, with the information you provided in your question, I would suggest either negotiating alternatives with the client or getting more information about the use of the reports to find alternative solutions.

You could move to a document based datastore, but that still wouldn't solve the problem in the second case. Sorry this isn't really an answer, but having worked through similar situations, the client likely needs a more realistic reporting solution or a number of other investors willing to front the capital for the engineering.

When this problem came up for us, we kept the db schema constant and implemented an entity mapping factory based on a timestamp. In the end, the client continually changed requirements (on a weekly to monthly basis) as to how aggregate fields were calculated and were never fully satisfied.

Solution 3 - Database

To add to the answers from @NickLarsen and @PerformanceDBA

If you need to track historical changes to things like field name, you may want to look into something like Slowly Changing Dimensions. It appears to me like you are using the EAV to model dynamic dimensional models (probably lookup lists).

The simplest (and probably least efficient) way of achieving this would be to include an "as of" date field on EAV tables, and whenever a change occurs, insert a new record (instead of updating an existing record) with the current date. This means that you need to alter your queries to always include or look for an "as of" date, or deafult to "now" if none provided. Your base entity that joins to the EAV objects would then have to query "top 1" from the EAV table where "as of" date is less than or equal to the 'last updated' date of the row, ordered by "as of" descending. Worst case scenario, if you need to track the most recent change to a given row where both the name (stored in the 'attribute' table) and the value have changed, you would chain this logic to the value table using 'last modified' of the row to find the appropriate value for that particular date.

This obviously has the potential to generate LARGE amounts of data if there are a lot of changes. That's why this approach is referred to as "slowly" changing. It's intended for dimensional values that may change, but not very often. To help with query performance, indexes on the "as of" and "last modified" fields should help.

Solution 4 - Database

Create a new table description for each Entity description Version and one additional table that tells you which table is which version. The query system should be updated as well.

I think creating a script that generates, tables and queries is your best shot.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionMoshView Question on Stackoverflow
Solution 1 - DatabasePerformanceDBAView Answer on Stackoverflow
Solution 2 - DatabaseNick LarsenView Answer on Stackoverflow
Solution 3 - DatabaseCodeMonkeyView Answer on Stackoverflow
Solution 4 - DatabasefabrizioMView Answer on Stackoverflow