What are design patterns to support custom fields in an application?

Database DesignData Modeling

Database Design Problem Overview


We develop a commercial application. Our customers are asking for custom fields support. For instance, they want to add a field to the Customer form.

What are the known design patterns to store the field values and the meta-data about the fields?

I see these options for now:

Option 1: Add Field1, Field2, Field3, Field4 columns of type varchar to my Customer table.

Option 2: Add a single column of type XML in the customer table and store the custom fields' values in xml.

Option 3: Add a CustomerCustomFieldValue table with a column of type varchar and store values in that column. That table would also have a CustomerID, a CustomFieldID.

CustomerID,  CustomFieldID, Value
10001,       1001,          '02/12/2009 8:00 AM'
10001,       1002,          '18.26'
10002,       1001,          '01/12/2009 8:00 AM'
10002,       1002,          '50.26'

CustomFieldID would be an ID from another table called CustomField with these columns: CustomFieldID, FieldName, FieldValueTypeID.

Option 4: Add a CustomerCustomFieldValue table with a column of each possible value type and store values in the right column. Similar to #3 but field values are stored using a strongly-type column.

CustomerID,  CustomFieldID, DateValue,           StringValue,       NumericValue                 
10001,       1001,          02/12/2009 8:00 AM,  null,              null
10001,       1002,          null,                null,              18.26
10002,       1001,          01/12/2009 8:00 AM,  null,              null
10002,       1002,          null,                null,              50.26

Option 5: Options 3 and 4 use a table specific to a single concept (Customer). Our clients are asking for custom field in other forms as well. Should we instead have a system-wide custom field storage system? So instead of having multiple tables such as CustomerCustomFieldValue, EmployeeCustomFieldValue, InvoiceCustomFieldValue, we would have a single table named CustomFieldValue? Although it seems more elegant to me, wouldn't that cause a performance bottleneck?

Have you used any of those approaches? Were you successful? What approach would you select? Do you know any other approach that I should consider?

Also, my clients want the custom field to be able to refer to data in other tables. For instance a client might want to add a "Favorite Payment Method" field to the Customer. Payment methods are defined elsewhere in the system. That brings the subject of "foreign keys" in the picture. Should I try to create constraints to ensure that values stored in the custom field tables are valid values?

Thanks

======================

EDIT 07-27-2009:

Thank you for your answers. It seems like the list of approaches is now quite comprehensive. I have selected the option 2 (a single XML column). It was the easiest to implement for now. I will probably have to refractor to a more strongly-defined approach as my requirements will get more complex and as the number of custom fields to support will get larger.

Database Design Solutions


Solution 1 - Database Design

I do agree with posters below that Options 3, 4, or 5 are most likely to be appropriate. However, each of your suggested implementations has its benefits and costs. I'd suggest choosing one by matching it to your specific requirements. For example:

  1. Option 1 pros: Fast to implement. Allows DB actions on custom fields (searching, sorting.)
    Option 1 cons: Custom fields are generic, so no strongly-typed fields. Database table is inefficient, size-wise with many extraneous fields that will never be used. Number of custom fields allowed needs to be anticipated.
  2. Option 2 pros: Fast to implement. Flexible, allowing arbitrary number and type of custom fields.
    Option 2 cons: No DB actions possible on custom fields. This is best if all you need to do is display the custom fields, later, or do minor manipulations of the data only on a per-Customer basis.
  3. Option 3 pros: Both flexible and efficient. DB actions can be performed, but the data is normalized somewhat to reduce wasted space. I agree with unknown (google)'s suggestion that you add an additional column that can be used to specify type or source information. Option 3 cons: Slight increase in development time and complexity of your queries, but there really aren't too many cons, here.
  4. Option 4 is the same as Option 3, except that your typed data can be operated on at the DB level. The addition of type information to the link table in Option 3 allows you to do more operations at our application level, but the DB won't be able to do comparisons or sort, for example. The choice between 3 and 4 depends on this requirement.
  5. Option 5 is the same as 3 or 4, but with even more flexibility to apply the solution to many different tables. The cost in this case will be that the size of this table will grow much larger. If you are doing many expensive join operations to get to your custom fields, this solution may not scale well.

P.S. As noted below, the term "design pattern" usually refers to object-oriented programming. You're looking for a solution to a database design problem, which means that most advice regarding design patterns won't be applicable.

Solution 2 - Database Design

As far as the application code is concerned I'm unsure. I do know that custom fields benefit greatly from a EAV model in the database.

Per the comments below, the most significant mistake you can make with this model is putting foreign keys into it. Never ever put something like FriendID or TypeID into this model. Use this model in conjunction with the typical relational model and keep foreign key fields in table columns as they should.

A second significant mistake is placing data in this model that needs to be reported with every element. For example putting something like Username in this model would mean that anytime you want to access a user and need to know their username you've committed yourself to a join at best or 2n queries where n is the number of users you're looking at. When you consider that you are usually going to need the Username property for every User element it becomes obvious this too should remain in the table columns.

However, if you're just using this model with custom user fields you'll be fine. I can't imagine many situations where a user would be entering in relational data and the EAV model is not too significantly detrimental to searches.

Lastly, don't try to join data from this and get a nice pretty recordset. Grab the original record and then grab the set of records for the entity. If you find yourself tempted to join the tables you've probably made the second mistake as mentioned above.

Solution 3 - Database Design

If you're developing with an object oriented language, we're talking about adaptive object models here. There are quite a few articles written about how you can implement them in oo-languages, but not so much information about how to design the data store side.

In the company where I work, we have solved the problem by using a relational database to store AOM data. We have central entity table for presenting all the different "entities" in the domain, like people, network devices, companies, etc... We store the actual "form fields" to data tables that are typed, so we have one table for strings, one for dates and so on. All the data tables have a foreign key pointing to the entity table. We also need tables to present the type-side, i.e. what kind of attributes (form fields) can certain entity have and this information is used to interpret the data in data tables.

Pros of our solution are that anything can be modeled without code changes, including references between entities, multivalues and so on. It's also possible to add business rules and validations to fields and they can be reused in all form. Cons are that the programming model is not very easy to understand and query performance will be worse than with a more typical DB design. Some other solution than relational database could have been better and easier for AOM.

Building a good AOM with a working data store for it is a lot of work and I wouldn't recommend it if you don't have highly skilled developers. Maybe one day there will be an OS solution for these kinds of requirements.

Custom fields have been discussed before in SO:

Solution 4 - Database Design

Something like Option 3 is the way to go and i have used this method previously. Create a single table to define additional properties and their corresponding values. This would be a 1-N relationship between your Customer and CustomerCustomField table (respectively). Your second question regarding defining relationships with custom properties would be something to think about. The first thing that comes to mind is adding a DataSource field, which would contain the table to which the property value is bound to. So essentially your CustomerCustomField would look like:

  1. CustomerId
  2. Property
  3. Value
  4. ValueDataSource (nullable)

This should allow you to either bind to a specific data structure or simply allow you to specify unbound values. You can further normalize this model, but something like this could work and should be easy enough to handle in code.

Solution 5 - Database Design

Option 4 or 5 would be my choice. If your data is important, I wouldn't go tossing away your type information with Option 3. (You might try to implement full type-checking yourself, but it's a pretty big job, and the database engine already does it for you.)

Some thoughts:

  • Make sure your CustomFields has a DataType column.
  • Use a UDF-based check constraint on CustomFieldValues to ensure that the column specified by CustomFields.DataType is non-null.
  • You'll also want a standard check constraint to make sure you have exactly one non-null value.
  • Regarding foreign keys, I would model these as a separate DataType.
  • Each potential cross-table reference would require its own column. This is good, because it maintains referential integrity.
  • You would have to support these relationships in application code anyway, so the fact that they are hard-coded in the database does not actually limit functionality.
  • This will also jive well with your ORM, if you're using one.
  • For Option 5, use intermediary tables to model the relationships.
  • You would still have a CustomerCustomFieldValue, but instead with only CustomerID and CustomFieldValueID columns.
  • Think long and hard about your constraints every step of the way. This is tricky stuff, and one misstep can cause utter havok down the line.

I am using this in an application currently in development. There haven't been any problems yet, but EAV designs still scare the daylights out of me. Just be careful.

As an aside, XML may also be a good choice. I don't know as much about it from direct experience, but it was one of the options I considered when starting the data design, and it looked pretty promising.

Solution 6 - Database Design

if those 'extra' fields are incidental and don't care to do searches on them, I usually go for option 2 (but like JSON better than XML). If there's going to be searches on custom fields, option 3 isn't hard to do, and usually the SQL optimizer can get reasonable performance out of it.

Solution 7 - Database Design

I am currently working on a project with this same problem, and I have chosen to use option 3, but I added a FieldType field and a ListSource field in case the FieldType="list". The ListSource field could be a query, an sql view, a function name, or something that results in a list of options for the list. The biggest problem with trying to store fields like this in my situation is that this field list can change, and the users are allowed to edit the data later. So what to do if the field list has changed and they go to edit. My solution to that scenario was to allow editing only if the list hasn't changed and to display read-only data if it has.

Solution 8 - Database Design

The table does not matter, imagine a table for stations. The fields of this can go to a form. Obviously this form is static, since the fields are fixed in the model.

What a CCK does (it's an old Drupal 6 module) is create other tables that relate extra fields to one or more tables. To do this, 3 tables are created called:

  • table
  • attributes
  • values

You can put a prefix on them if you do not want to touch your current model.

Note: The antipattern is called the Entity–Attribute–Value model (EAV). It is not recommended to use in large-scale databases due to the delay in obtaining the associated records. Or, if that is the case, at least use on tables that do not require frequent queries and that these tables are not important within your model.

In table goes a list of table names to which you will allow extra fields and an id. Here only goes the id and name (which is the name of the table you want to expand). If you want to give a table the possibility of defining new fields, you just have to add a record here.

In attributes fields called id, name, type and tables_id are created. The name is the field name that would be associated in the new version of the table. And in type is associated the type of value that it uses (integer, string, date, time, etc). The data types depend on the DB engine, but you can start with only some that the engine allows. Only there will be no relations or indexes for these fields since they are optional. table_id is associated with the id of the record used in table (table.id).

In values fields are written with all the possible types that you have defined for attributes. For example, if you only allow integers and strings, you only have to create two fields, one for integer and one for string. It is important to note that the default values of these types will always be null. Together with two other fields, one, let's call it attributes_id, associated with the attributes record that will indicate the name and type of field that you must use to store the value (that is, the field of this table according to what the attributes table defines). And another one called objective_id that points to the record inside your final table (let's say a record in stations).

Stations
id name lat long
1 Alto Hospicio -20.290467 -70.100192
2 Gobernación -22.093082 -70.201210
3 Hospital el Cobre -22.450496 -68.908442
Table
id name
10 stations
20 fooTable
Attributes
id name type tables_id
100 key integer 10
200 email string 20
300 start date 10
Values
objective_id attributes_id integer date string
1 100 117 NULL NULL
1 300 NULL 2022-01-01 NULL
3 100 217 NULL NULL

Now, in your view with the form, you can check whether or not to add a new form field according to whether or not there is a record associated with that table in the attributes table (you know in this view that the final table is stations by which is just doing a SELECT of all the records associated with that table name). Depending on its type, you can add validations. When you store or save this field you create or update it in the values table and write the value provided by your form in the appropriate field, the id of your record in the final table (stations) is copied to objective_id. To be dynamic you must incorporate this generation in all the forms associated with a table. Or at least in those that you previously define in tables.

The same analysis to present the values in a report. You iterate in the attributes table in case there is an attribute to your report (stations) and for each record obtained you consult the values table filtering by attributes_id and objective_id (which is the id of the current station to report).

Stations report table (view)
name lat long key start
Alto Hospicio -20.290467 -70.100192 117 2022-01-01
Gobernación -22.093082 -70.201210
Hospital el Cobre -22.450496 -68.908442 217

You can delete the values directly in the values table, but you cannot delete records from the tables table or attributes. This to avoid future inconsistencies. To simplify the latter, you can opt for logical deletions by adding date attributes like deleted_at both of this tables.

To prevent inconsistencies you must create a unique constraint that prevents associated duplicate records.

  • table: name
  • attributes: (name, tables_id)
  • values: (objective_id, attributes_id)

To add custom validations you could add a field in the attributes table with a regular expression that administrators can configure. They should contain default values depending on the data type. This is optional.

Attributes
id name type tables_id regex
100 key integer 10 ^\d+$
200 email string 20 [^@ \t\r\n]+@[^@ \t\r\n]+\.[^@ \t\r\n]+
300 start date 10 `^\d{4}-([0]\d

Obviously this applies to many uses and facilitates the growth of models by the user. Against it is the lack of relationships that will simplify queries.

Check this considerations for EAV.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionSylvainView Question on Stackoverflow
Solution 1 - Database DesignEric NguyenView Answer on Stackoverflow
Solution 2 - Database DesignSpencer RuportView Answer on Stackoverflow
Solution 3 - Database DesignKaitsuView Answer on Stackoverflow
Solution 4 - Database DesignSergeyView Answer on Stackoverflow
Solution 5 - Database DesignIsabelle WedinView Answer on Stackoverflow
Solution 6 - Database DesignJavierView Answer on Stackoverflow
Solution 7 - Database DesignjbairView Answer on Stackoverflow
Solution 8 - Database DesignnhereveriView Answer on Stackoverflow