Is a Primary Key necessary in SQL Server?

SqlSql ServerSql Server-2005

Sql Problem Overview


This may be a pretty naive and stupid question, but I'm going to ask it anyway

I have a table with several fields, none of which are unique, and a primary key, which obviously is.

This table is accessed via the non-unique fields regularly, but no user SP or process access data via the primary key. Is the primary key necessary then? Is it used behind the scenes? Will removing it affect performance Positively or Negatively?

Sql Solutions


Solution 1 - Sql

Necessary? No. Used behind the scenes? Well, it's saved to disk and kept in the row cache, etc. Removing will slightly increase your performance (use a watch with millisecond precision to notice).

But ... the next time someone needs to create references to this table, they will curse you. If they are brave, they will add a PK (and wait for a long time for the DB to create the column). If they are not brave or dumb, they will start creating references using the business key (i.e. the data columns) which will cause a maintenance nightmare.

Conclusion: Since the cost of having a PK (even if it's not used ATM) is so small, let it be.

Solution 2 - Sql

Do you have any foreign keys, do you ever join on the PK?

If the answer to this is no, and your app never retrieves an item from the table by its PK, and no query ever uses it in a where clause, therefore you just added an IDENTITY column to have a PK, then:

  • the PK in itself adds no value, but does no damage either
  • the fact that the PK is very likely the clustered index too is .. it depends.

If you have NC indexes, then the fact that you have a narrow artificial clustered key (the IDENTITY PK) is helpful in keeping those indexes narrow (the CDX key is reproduced in every NC leaf slots). So a PK, even if never used, is helpful if you have significant NC indexes.

On the other hand, if you have a prevalent access pattern, a certain query that outweighs all the other is frequency and importance, or which is part of a critical time code path (eg. is the query run on every page visit on your site, or every second by and app etc) then that query is a good candidate to dictate the clustered key order.

And finally, if the table is seldom queried but often written to then it may be a good candidate for a HEAP (no clustered key at all) since heaps are so much better at inserts. See Comparing Tables Organized with Clustered Indexes versus Heaps.

Solution 3 - Sql

The primary key is behind the scenes a clustered index (by default unless generated as a non clustered index) and holds all the data for the table. If the PK is an identity column the inserts will happen sequentially and no page splits will occur.

But if you don't access the id column at all then you probably want to add some indexes on the other columns. Also when you have a PK you can setup FK relationships

Solution 4 - Sql

In the logical model, a table must have at least one key. There is no reason to arbitarily specify that one of the keys is 'primary'; all keys are equal. Although the concept of 'primary key' can be traced back to Ted Codd's early work, the mistake was picked up early on has long been corrected in relational theory.

Sadly, PRIMARY KEY found it's way into SQL and we've had to live with it ever since. SQL tables can have duplicate rows and, if you consider the resultset of a SELECT query to also be a table, then SQL tables can have duplciate rows too. Relational theorists dislike SQL a lot. However, just because SQL lets you do all kinds of wacky non-relational things, that doesn't mean that you have to actually do them. Is is good practise to ensure that every SQL table has at least one key.

In SQL, using PRIMARY KEY on its own has implications e.g. NOT NULL, UNIQUE, the table's default reference for foreign keys. In SQL Server, using PRIMARY KEY on its own has implications e.g. the table's clustered index. However, in all these cases, the implicit behaviour can be made explicit using specific syntax.

You can use UNIQUE (constraint rather than index) and NOT NULL in combination to enforce keys in SQL. Therefore, no, a primary key (or even PRIMARY KEY) is not necessary in SQL Server.

Solution 5 - Sql

I would never have a table without a primary key. Suppose you ever need to remove a duplicate - how would you identify which one to remove and which to keep?

Solution 6 - Sql

The primary key when defined will help improve performance within the database for indexing and relationships.

I always tend to define a primary key as an auto incrementing integer in all my tables, regardless of if I access it or not, this is because when you start to scale up your application, you may find you do actually need it, and it makes life a lot simpler.

Solution 7 - Sql

A PK is not necessary.

But you should consider to place a non-unique index on the columns that you use for querying (i.e. that appear in the WHERE-clause). This will considerably boost lookup performance.

Solution 8 - Sql

A primary key is really a property of your domain model, and it uniquely identifies an instance of a domain object.

Having a clustered index on a montonically increasing column (such as an identity column) will mean page splits will not occur, BUT insertions will unbalance the index over time and therefore rebuilding indexes needs to be done regulary (or when fragmentation reaches a certain threshold).

I have to have a very good reason to create a table without a primary key.

Solution 9 - Sql

If you are accessing them via non-key fields the performance probably will not change. However it might be nice to keep the PK for future enhancements or interfaces to these tables. Does your application only use this one table?

Solution 10 - Sql

As SQLMenace said, the clustered index is an important column for the physical layout of the table. In addition, having a clustered index, especially a well chosen one on a skinny column like an integer pk, actually increases insert performance.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionroryokView Question on Stackoverflow
Solution 1 - SqlAaron DigullaView Answer on Stackoverflow
Solution 2 - SqlRemus RusanuView Answer on Stackoverflow
Solution 3 - SqlSQLMenaceView Answer on Stackoverflow
Solution 4 - SqlonedaywhenView Answer on Stackoverflow
Solution 5 - SqlHLGEMView Answer on Stackoverflow
Solution 6 - SqlTom GullenView Answer on Stackoverflow
Solution 7 - SqlThomas WellerView Answer on Stackoverflow
Solution 8 - SqlMitch WheatView Answer on Stackoverflow
Solution 9 - SqlProfessionalAmateurView Answer on Stackoverflow
Solution 10 - Sqlmarr75View Answer on Stackoverflow