Why are composite primary keys still around?

MySQL, SQL Server, Database Design

Mysql Problem Overview


I've been assigned to migrate a database to a mid-class ERP. The new system uses composite primary keys here and there. From a pragmatic point of view, why?

Compared to autogenerated IDs, I can only see negative aspects:

  • Foreign keys become blurry
  • Harder migrations or db-redesigns
  • Inflexible as the business changes. (My car has no reg. plate...)
  • The same integrity is better achieved with constraints.

It's falling back to the design concept of candidate keys, which I don't see the point of either.

Is it a habit/artifact from the floppy-days (minimizing space/indexes), or am I missing something?

Edit: just found a good SO post: https://stackoverflow.com/questions/159087/composite-primary-keys-versus-unique-object-id-field

Mysql Solutions


Solution 1 - Mysql

Composite keys are required when your primary keys are non-surrogate and inherently, um, composite, that is, breakable into several non-related parts.

Some real-world examples:

  • Many-to-many link tables, in which the primary keys are composed of the keys of the entities related.

  • Multi-tenant applications where tenant_id is part of the primary key of each entity and the entities are only linkable within the same tenant (constrained by a foreign key).

  • Applications processing third-party data (with already provided primary keys)

Note that logically, all this can be achieved using a UNIQUE constraint (additional to a surrogate PRIMARY KEY).

However, there are some implementation-specific considerations:

  • Some systems won't let a FOREIGN KEY refer to anything that is not a PRIMARY KEY.

  • Some systems would only cluster a table on a PRIMARY KEY, hence making the composite the PRIMARY KEY would improve performance of the queries joining on the composite.
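The link-table case and the UNIQUE-constraint alternative can be compared side by side. The following is a minimal sketch, assuming SQLite driven through Python's sqlite3 module rather than MySQL or SQL Server; the student, course and enrollment tables are hypothetical names chosen for illustration. It only shows that both variants enforce the same uniqueness rule, and says nothing about the clustering or foreign-key restrictions of other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY);

-- Variant A: the pair of foreign keys is itself the primary key.
CREATE TABLE enrollment_composite (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    PRIMARY KEY (student_id, course_id)
);

-- Variant B: surrogate primary key plus a UNIQUE constraint on the pair.
CREATE TABLE enrollment_surrogate (
    enrollment_id INTEGER PRIMARY KEY,
    student_id    INTEGER NOT NULL REFERENCES student (student_id),
    course_id     INTEGER NOT NULL REFERENCES course (course_id),
    UNIQUE (student_id, course_id)
);

INSERT INTO student VALUES (1);
INSERT INTO course  VALUES (10);
""")

def rejects_duplicate(table):
    """Insert the same (student, course) pair twice; True if the second insert fails."""
    sql = f"INSERT INTO {table} (student_id, course_id) VALUES (1, 10)"
    conn.execute(sql)
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

results = {
    table: rejects_duplicate(table)
    for table in ("enrollment_composite", "enrollment_surrogate")
}
print(results)
```

Both variants refuse the duplicate pair, so the choice between them comes down to the engine-specific points listed above and to what child tables need to reference.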

Solution 2 - Mysql

Personally I prefer the use of surrogate keys. However, in joining tables that consist only of the ids from two other tables (to create a many-to-many relationship), composite keys are the way to go, and taking them out would make things more difficult.

There is a school of thought that surrogate keys are always bad and that if you can't guarantee the uniqueness of a record through natural keys, you have a bad design. I strongly disagree with this (if you aren't storing SSN or some other unique value, I defy you to come up with a natural key for a person table, for instance). But many people feel that it is necessary for proper normalization.

Sometimes having a composite key reduces the need to join to another table. Sometimes it doesn't. So there are times when a composite key can boost performance as well as times when it can harm performance. If the key is relatively stable, you may be fine with faster performance on select queries. However, if it is something that is subject to change, like a company name, you could be in a world of hurt when company A changes its name and you have to update a million associated records.
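The cost of a changing natural key can be made concrete with a small sketch. It assumes SQLite via Python's sqlite3 module, a hypothetical company/invoice schema, and 1,000 child rows standing in for the million mentioned above; ON UPDATE CASCADE is used here only as one way the rename could be propagated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Natural key: the company name itself identifies the company.
CREATE TABLE company (company_name TEXT PRIMARY KEY);

-- The child's composite key embeds the parent's natural key.
CREATE TABLE invoice (
    invoice_no   INTEGER NOT NULL,
    company_name TEXT NOT NULL
        REFERENCES company (company_name) ON UPDATE CASCADE,
    PRIMARY KEY (company_name, invoice_no)
);

INSERT INTO company VALUES ('Company A');
""")
conn.executemany(
    "INSERT INTO invoice (invoice_no, company_name) VALUES (?, 'Company A')",
    [(n,) for n in range(1, 1001)],
)

# One logical rename on the parent row...
conn.execute(
    "UPDATE company SET company_name = 'Company B' WHERE company_name = 'Company A'"
)

# ...has rewritten the key column of every child row.
rewritten = conn.execute(
    "SELECT COUNT(*) FROM invoice WHERE company_name = 'Company B'"
).fetchone()[0]
print(rewritten)
```

With a surrogate company id in the child table, the same rename would touch a single row in the parent and nothing else.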

There is no one-size-fits-all in database design. There are times when composite keys are helpful and times when they are horrible. There are times when surrogate keys are helpful and times when they are not.

Solution 3 - Mysql

Composite primary keys can provide better performance when they are used as foreign keys in other tables, because they reduce table reads. Sometimes they can be life savers. If you use surrogate keys, you have to go back to the referenced table to get the natural key information.

For example (pure example, so we are not talking DB design here), let's say you have an ORDER table and an ORDER_ITEM table. If you use ProductId and LineNumber (UPDATE: and, as Pedro mentioned, OrderId or even better OrderNumber) as the composite primary key in ORDER_ITEM, then your cross table for SHIPPING, SHIPPING_ORDERITEM, will carry ProductId. This can massively boost performance if, for example, you have run out of a product and need to find all items with that ProductId that still need to be shipped, without needing a join.

On the other hand, if you use a surrogate key, you have to join, and you can end up with a very inefficient SQL execution plan that has to do bookmark lookups on several indexes.

See more on bookmark lookups, which become a major issue when using surrogate keys.
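The difference in query shape can be sketched as follows. This assumes SQLite via Python's sqlite3 module; the table and column names are hypothetical lower-case versions of the ones in the example, and the `_s` suffix marks the surrogate-key variant. The sketch shows only that the composite design answers the question from one table; actual performance depends on the engine, indexes and data volume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Composite design: the key columns travel into the child table.
CREATE TABLE order_item (
    order_id    INTEGER NOT NULL,
    line_number INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    PRIMARY KEY (order_id, line_number, product_id)
);
CREATE TABLE shipping_orderitem (
    shipping_id INTEGER NOT NULL,
    order_id    INTEGER NOT NULL,
    line_number INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    PRIMARY KEY (shipping_id, order_id, line_number, product_id),
    FOREIGN KEY (order_id, line_number, product_id)
        REFERENCES order_item (order_id, line_number, product_id)
);

-- Surrogate design: the child only knows an opaque item id.
CREATE TABLE order_item_s (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL,
    line_number   INTEGER NOT NULL,
    product_id    INTEGER NOT NULL,
    UNIQUE (order_id, line_number)
);
CREATE TABLE shipping_orderitem_s (
    shipping_id   INTEGER NOT NULL,
    order_item_id INTEGER NOT NULL REFERENCES order_item_s (order_item_id),
    PRIMARY KEY (shipping_id, order_item_id)
);

INSERT INTO order_item VALUES (1, 1, 500), (1, 2, 501), (2, 1, 500);
INSERT INTO shipping_orderitem VALUES (90, 1, 1, 500), (90, 1, 2, 501), (91, 2, 1, 500);

INSERT INTO order_item_s VALUES (1, 1, 1, 500), (2, 1, 2, 501), (3, 2, 1, 500);
INSERT INTO shipping_orderitem_s VALUES (90, 1), (90, 2), (91, 3);
""")

# Composite design: shipments containing product 500, from the shipping table alone.
no_join = conn.execute(
    "SELECT shipping_id FROM shipping_orderitem "
    "WHERE product_id = 500 ORDER BY shipping_id"
).fetchall()

# Surrogate design: the same question needs a join back to the item table.
with_join = conn.execute("""
    SELECT s.shipping_id
    FROM shipping_orderitem_s AS s
    JOIN order_item_s AS i ON i.order_item_id = s.order_item_id
    WHERE i.product_id = 500
    ORDER BY s.shipping_id
""").fetchall()

print(no_join, with_join)
```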

Solution 4 - Mysql

Natural primary keys are brittle.

Suppose we have built a system around a natural PK on (CountryCode, PhoneNumber), and several years down the road we need to add Extension, or change the PK to one column: Email. If these PK columns are propagated to all child tables, this becomes very expensive.

A few years ago there were some systems that were built assuming that Social Security Number is a natural PK, and had to be redesigned to use identities, when the SSN became non-unique and nullable.

Because we cannot predict the future, we don't know if later on some change will render obsolete what used to be a perfectly correct and complete model.

Solution 5 - Mysql

The very simple answer is data integrity. If the data is to be useful and accurate then the keys are presumably required. Having an "autogenerated id" doesn't remove the requirement for other keys as well. The alternative is not to enforce uniqueness and accept that data will be duplicated, will almost inevitably contain anomalies, and will lead to errors as a result. Why would you want that?

Solution 6 - Mysql

In short, the purpose of composite keys is to use the database to enforce one or more business rules. In other words: protect the integrity of your data.

Ex. You have a list of parts that you buy from suppliers. You could create your supplier and parts tables like this:

SUPPLIER
SupplierId
SupplierName
   
PART
PartId
PartName
SupplierId

Uh oh. The parts table allows for duplicate data. Since you used a surrogate key that was autogenerated, you're not enforcing the fact that a part from a supplier should only be entered once. Instead, you should create the PART table like such:

PART
SupplierId
SupplierPartId
PartName

In this example, your parts come from specific suppliers and you want to enforce the rule "A single supplier can only supply a single part once" in the PART table. Hence the composite key: it prevents accidental duplicate entry of a part.

You can always leave business rules out of your database and leave them to your application, but by keeping the rule in the database (via a composite key), you ensure that the business rule is enforced everywhere, especially if you should ever decide to allow multiple applications to access the data.
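A minimal sketch of the two PART designs, assuming SQLite via Python's sqlite3 module. The snake_case table and column names, the supplier 'Acme' and the part number 'W-100' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT NOT NULL
);

-- First design: autogenerated PartId only, nothing stops a repeat entry.
CREATE TABLE part_surrogate (
    part_id     INTEGER PRIMARY KEY,
    part_name   TEXT NOT NULL,
    supplier_id INTEGER NOT NULL REFERENCES supplier (supplier_id)
);

-- Second design: (SupplierId, SupplierPartId) is the primary key.
CREATE TABLE part_composite (
    supplier_id      INTEGER NOT NULL REFERENCES supplier (supplier_id),
    supplier_part_id TEXT NOT NULL,
    part_name        TEXT NOT NULL,
    PRIMARY KEY (supplier_id, supplier_part_id)
);

INSERT INTO supplier VALUES (1, 'Acme');
""")

# The surrogate-only table silently accepts the same part twice.
for _ in range(2):
    conn.execute(
        "INSERT INTO part_surrogate (part_name, supplier_id) VALUES ('Widget', 1)"
    )
surrogate_rows = conn.execute("SELECT COUNT(*) FROM part_surrogate").fetchone()[0]

# The composite key rejects the second entry of the same supplier part.
conn.execute("INSERT INTO part_composite VALUES (1, 'W-100', 'Widget')")
try:
    conn.execute("INSERT INTO part_composite VALUES (1, 'W-100', 'Widget')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(surrogate_rows, duplicate_rejected)
```

The same rule could also be enforced on the first design by adding a UNIQUE constraint on the supplier and part-number columns next to the surrogate key.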

Solution 7 - Mysql

Just as functions encapsulate a set of instructions, or database views abstract base table connections, so too do surrogate keys abstract the meaning of the entity they are placed on.

If, for example, you have a table that holds vehicle data, applying a surrogate VehicleId abstracts what it means to be a vehicle from a data point of view. When you reference VehicleId = 1, you are most surely talking about a vehicle of some sort, but do we know if it is a 2008 Chevy Impala, or a 1991 Ford F-150? No. Can the underlying data of whatever Vehicle #1 is change at any time? Yes.

Solution 8 - Mysql

Short answer: Multi-column foreign keys naturally refer to multi-column primary keys. There can still be an autogenerated id column that is part of the primary key.

Philosophical answer: The primary key is the identity of the row. If there is a bit of information that is an intrinsic part of the identity of the row (such as which customer an article belongs to in a multi-customer wiki), that information should be part of the primary key.

An example: System for organizing LAN parties

The system supports several LAN parties with the same people and organizers attending thus:

CREATE TABLE users ( users_id serial PRIMARY KEY, ... );

And there are several parties:

CREATE TABLE parties ( parties_id serial PRIMARY KEY, ... );

But most of the other stuff needs to carry the information about which party it is linked to:

CREATE TABLE ticket_types (
    ticket_types_id serial,
    parties_id integer REFERENCES parties,
    name text,
    ....
    PRIMARY KEY(ticket_types_id, parties_id)
);

...this is because we want to refer to primary keys. Foreign key on table attendances points to table ticket_types.

CREATE TABLE attendances (
    attendances_id serial,
    parties_id integer REFERENCES parties,
    ticket_types_id integer,
    PRIMARY KEY (attendances_id, parties_id),
    FOREIGN KEY (ticket_types_id, parties_id) REFERENCES ticket_types
);
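What a composite foreign key from attendances into ticket_types buys can be sketched like this. It assumes SQLite via Python's sqlite3 module instead of the PostgreSQL-style `serial` columns above, so ids are supplied by hand, and the 'Weekend pass' row is made up. The foreign key is written with an explicit column list pointing at ticket_types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE parties (parties_id INTEGER PRIMARY KEY);

CREATE TABLE ticket_types (
    ticket_types_id INTEGER NOT NULL,
    parties_id      INTEGER NOT NULL REFERENCES parties (parties_id),
    name            TEXT,
    PRIMARY KEY (ticket_types_id, parties_id)
);

CREATE TABLE attendances (
    attendances_id  INTEGER NOT NULL,
    parties_id      INTEGER NOT NULL REFERENCES parties (parties_id),
    ticket_types_id INTEGER NOT NULL,
    PRIMARY KEY (attendances_id, parties_id),
    FOREIGN KEY (ticket_types_id, parties_id)
        REFERENCES ticket_types (ticket_types_id, parties_id)
);

INSERT INTO parties VALUES (1), (2);
INSERT INTO ticket_types VALUES (10, 1, 'Weekend pass');  -- belongs to party 1
""")

# Attendance at party 1 using party 1's ticket type: accepted.
conn.execute("INSERT INTO attendances VALUES (100, 1, 10)")

# Attendance at party 2 using party 1's ticket type: rejected by the composite FK.
try:
    conn.execute("INSERT INTO attendances VALUES (101, 2, 10)")
    cross_party_rejected = False
except sqlite3.IntegrityError:
    cross_party_rejected = True

print(cross_party_rejected)
```

With a single-column foreign key on ticket_types_id alone, the second insert would be accepted and an attendance could end up pointing at a ticket type from a different party.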

Solution 9 - Mysql

While I prefer surrogate keys, I use composite keys in a few cases. The composite key may consist entirely or partially of surrogate key fields.

  • Many-to-many join tables. These usually require a unique key on the key pair anyway. In some cases additional columns may be included in the key.
  • Weak child tables. Things like order lines do not stand on their own. In this case I use the parent (orders) table's primary key as part of the composite key.

When there are multiple weak tables related to an entity, it may be possible to eliminate a table from the join set when querying child data. In the case of grandchild tables, it is possible to join the grandparent to grandchild without involving the table in the middle.
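Skipping the middle table can be sketched as follows, assuming SQLite via Python's sqlite3 module and a hypothetical customers, orders, order_lines hierarchy in which each weak table inherits its parent's full key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);

-- Weak child: an order is identified within its customer.
CREATE TABLE orders (
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    order_no    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, order_no)
);

-- Weak grandchild: inherits the whole parent key, so it carries customer_id.
CREATE TABLE order_lines (
    customer_id INTEGER NOT NULL,
    order_no    INTEGER NOT NULL,
    line_no     INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    PRIMARY KEY (customer_id, order_no, line_no),
    FOREIGN KEY (customer_id, order_no) REFERENCES orders (customer_id, order_no)
);

INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1), (1, 2), (2, 1);
INSERT INTO order_lines VALUES (1, 1, 1, 3), (1, 2, 1, 4), (2, 1, 1, 5);
""")

# Grandparent joined straight to grandchild; the orders table is not referenced.
totals = conn.execute("""
    SELECT c.name, SUM(l.quantity)
    FROM customers AS c
    JOIN order_lines AS l ON l.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

print(totals)
```

With surrogate keys on every level, order_lines would hold only an order id, and the same query would have to go through orders to reach the customer.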

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author   Original Content on Stackoverflow
Question            Teson             View Question on Stackoverflow
Solution 1 - Mysql  Quassnoi          View Answer on Stackoverflow
Solution 2 - Mysql  HLGEM             View Answer on Stackoverflow
Solution 3 - Mysql  Aliostad          View Answer on Stackoverflow
Solution 4 - Mysql  A-K               View Answer on Stackoverflow
Solution 5 - Mysql  nvogel            View Answer on Stackoverflow
Solution 6 - Mysql  John              View Answer on Stackoverflow
Solution 7 - Mysql  ses011            View Answer on Stackoverflow
Solution 8 - Mysql  jkj               View Answer on Stackoverflow
Solution 9 - Mysql  BillThor          View Answer on Stackoverflow