Use email address as primary key?

SqlDatabaseDatabase DesignPostgresql

Sql Problem Overview


Is email address a bad candidate for primary when compared to auto incrementing numbers?

Our web application needs the email address to be unique in the system. So, I thought of using email address as primary key. However my colleague suggests that string comparison will be slower than integer comparison.

Is it a valid reason to not use email as primary key?

We are using PostgreSQL.

Sql Solutions


Solution 1 - Sql

String comparison is slower than int comparison. However, this does not matter if you simply retrieve a user from the database using the e-mail address. It does matter if you have complex queries with multiple joins.

If you store information about users in multiple tables, the foreign keys to the users table will be the e-mail address. That means that you store the e-mail address multiple times.

Solution 2 - Sql

I will also point out that email is a bad choice to make a unique field, there are people and even small businesses that share an email address. And like phone numbers, emails can get re-used. [email protected] can easily belong to John Smith one year and Julia Smith two years later.

Another problem with emails is that they change frequently. If you are joining to other tables with that as the key, then you will have to update the other tables as well which can be quite a performance hit when an entire client company changes their emails (which I have seen happen.)

Solution 3 - Sql

the primary key should be unique and constant

email addresses change like the seasons. Useful as a secondary key for lookup, but a poor choice for the primary key.

Solution 4 - Sql

Disadvantages of using an email address as a primary key:

  1. Slower when doing joins.

  2. Any other record with a posted foreign key now has a larger value, taking up more disk space. (Given the cost of disk space today, this is probably a trivial issue, except to the extent that the record now takes longer to read. See #1.)

  3. An email address could change, which forces all records using this as a foreign key to be updated. As email address don't change all that often, the performance problem is probably minor. The bigger problem is that you have to make sure to provide for it. If you have to write the code, this is more work and introduces the possibility of bugs. If your database engine supports "on update cascade", it's a minor issue.

Advantages of using email address as a primary key:

  1. You may be able to completely eliminate some joins. If all you need from the "master record" is the email address, then with an abstract integer key you would have to do a join to retrieve it. If the key is the email address, then you already have it and the join is unnecessary. Whether this helps you any depends on how often this situation comes up.

  2. When you are doing ad hoc queries, it's easy for a human being to see what master record is being referenced. This can be a big help when trying to track down data problems.

  3. You almost certainly will need an index on the email address anyway, so making it the primary key eliminates one index, thus improving the performance of inserts as they now have only one index to update instead of two.

In my humble opinion, it's not a slam-dunk either way. I tend to prefer to use natural keys when a practical one is available because they're just easier to work with, and the disadvantages tend to not really matter much in most cases.

Solution 5 - Sql

It is pretty bad. Assume some e-mail provider goes out of business. Users will then want to change their e-mail. If you have used e-mail as primary key, all foreign keys for users will duplicate that e-mail, making it pretty damn hard to change ...

... and I haven't even started talking about performance considerations.

Solution 6 - Sql

I don't know if that might be an issue in your setup, but depending on your RDBMS the values of a columns might be case sensitive. PostgreSQL docs say: „If you declare a column as UNIQUE or PRIMARY KEY, the implicitly generated index is case-sensitive“. In other words, if you accept user input for a search in a table with email as primary key, and the user provides "[email protected]", you won't find “[email protected]".

Solution 7 - Sql

No one seems to have mentioned a possible problem that email addresses could be considered private. If the email address is the primary key, a profile page URL most likely will look something like ..../Users/[email protected]. What if you don't want to expose the user's email address? You'd have to find some other way of identifying the user, possibly by a unique integer value to make URLs like ..../Users/1. Then you'd end up with a unique integer value after all.

Solution 8 - Sql

At the logical level, the email is the natural key. At the physical level, given you are using a relational database, the natural key doesn't fit well as the primary key. The reason is mainly the performance issues mentioned by others.

For that reason, the design can be adapted. The natural key becomes the alternate key (UNIQUE, NOT NULL), and you use a surrogate/artificial/technical key as the primary key, which can be an auto-increment in your case.

systempuntoout asked,

> What if someone wants to change his email address? Are you going to change all the foreign keys too?

That's what cascading is for.

Another reason to use a numeric surrogate key as the primary key is related to how the indexing works in your platform. In MySQL's InnoDB, for example, all indexes in a table have the primary key pre-pended to them, so you want the PK to be as small as possible (for speed's and size's sakes). Also related to this, InnoDB is faster when the primary key is stored in sequence, and a string would not help there.

Another thing to take into consideration when using a string as an alternate key, is that using a hash of the actual string that you want might be faster, skipping things like upper and lower cases of some letters. (I actually landed here while looking for a reference to confirm what I just said; still looking...)

Solution 9 - Sql

Yes, it is a bad primary key because your users will want to update their email addresses.

Solution 10 - Sql

yes, it is better if you use an integer instead. you can also set your email column as unique constraint.

like this:

CREATE TABLE myTable(
    id integer primary key,
    email text UNIQUE
);

Solution 11 - Sql

Another reason why integer primary key is better is when you refer to email address in different table. If address itself is a primary key then in another table you have to use it as a key. So you store email addresses multiple time.

Solution 12 - Sql

I am not too familiar with postgres. Primary Keys is a big topic. I've seen some excellent questions and answers on this site (stackoverflow.com).

I think you may have better performance by having a numeric primary key and use a UNIQUE INDEX on the email column. Emails tend to vary in length and may not be proper for primary key index.

some reading here and here.

Solution 13 - Sql

Personally, I do not use any information for primary key when designing database, because it is very likely that I might need to alter any information later. The sole reason that I provide primary key is, it is convenience to do most SQL operation from client-side, and my choice for that has been always auto-increment integer type.

Solution 14 - Sql

You may need to consider any applicable data regulation legislation. Email is personal information, and if your users are EU citizens for instance then under GDPR they can instruct you to delete their information from your records (remember this applies regardless of which country you are based).

If you need to keep the record itself in the database for referential integrity or historical reasons such as audit, using a surrogate key would allow you to just NULL all the personal data field. This obviously isn't as easy if their personal data is the primary key

Solution 15 - Sql

Your colleague is right: Use an autoincrementing integer for your primary key.

You can implement the email-uniqueness either at the application level, or you coudl mark your email address column as unique, and add an index on that column.

Adding the field as unique will cost you string comparision only when inserting into that table, and not when performing joins and foreign key constraint checks.

Of course, you must note that adding any constraints to your application at the database level can cause your app to become inflexible. Always give due consideration before you make any field "unique" or "not null" just because your application needs it to be unique or non-empty.

Solution 16 - Sql

Use a GUID as a primary key... that way you can generate it from your program when you do an INSERT and you don't need to get a response from the server to find out what the primary key is. It will also be unique accross tables and databases and you don't have to worry about what happens if you truncate the table some day and the auto-increment gets reset to 1.

Solution 17 - Sql

I know this is a bit of a late entry but i would like to add that people abandon email accounts and service providers recover the address allowing another person to use it.

As @HLGEM pointed out "[email protected] can easily belong to John Smith one year and Julia Smith two years later." in this case should John Smith want your service you either have to refuse to use his email address or delete all your records pertaining to Julia Smith.

If you have to delete records and they relate to the financial history of the business depending on local law you could find yourself in hot water.

So i would never use data like email addresses, number plates, etc. as a primary keys because no matter how unique they seem they are out of your control and can provide some interesting challenges that you may not have time to deal with.

Solution 18 - Sql

you can boost the performance by using integer primary key.

Solution 19 - Sql

you should use an integer primary key. if you need the email-column to be unique, why don't you simply set an unique-index on that column?

Solution 20 - Sql

If you have a non int value as primary key then insertions and retrievals will be very slow on large data.

Solution 21 - Sql

primary key should be chosen a static attribute. Since email addresses are not static and can be shared by multiple candidates so it is not a good idea to use them as primary key. Moreover email addresses are strings usually of a certain length which may be greater than unique id we would like to use[len(email_address)>len(unique_id)] so it would require more space and even worst they are stored multiple times as foreign key. And consequently it will lead to degrade the performance.

Solution 22 - Sql

It depends on the table. If the rows in your table represent email addresses, then email is the best ID. If not, then email is not a good ID.

Solution 23 - Sql

If it's simply a matter of requiring the email to be unique then you can just create a unique index with that column.

Solution 24 - Sql

Email is a good unique index candidate, but not for primary key, if it is a primary key, you will be no able to change the contact's emails address for example. I think your join querys will be slower too.

Solution 25 - Sql

don not use email address as primary key , keep email as unique but don not use it as primary key, use user id or username as primary key

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionrobertView Question on Stackoverflow
Solution 1 - SqlSjoerdView Answer on Stackoverflow
Solution 2 - SqlHLGEMView Answer on Stackoverflow
Solution 3 - SqlSteven A. LoweView Answer on Stackoverflow
Solution 4 - SqlJayView Answer on Stackoverflow
Solution 5 - SqlmeritonView Answer on Stackoverflow
Solution 6 - SqlxlttjView Answer on Stackoverflow
Solution 7 - SqlSimen EchholtView Answer on Stackoverflow
Solution 8 - SqlRafaView Answer on Stackoverflow
Solution 9 - SqlBryan LegendView Answer on Stackoverflow
Solution 10 - SqlibramView Answer on Stackoverflow
Solution 11 - SqlklewView Answer on Stackoverflow
Solution 12 - SqlSaif KhanView Answer on Stackoverflow
Solution 13 - SqltiaView Answer on Stackoverflow
Solution 14 - SqlStuart ParkerView Answer on Stackoverflow
Solution 15 - SqljrharshathView Answer on Stackoverflow
Solution 16 - SqlJoelFanView Answer on Stackoverflow
Solution 17 - SqlRobertView Answer on Stackoverflow
Solution 18 - SqlxportView Answer on Stackoverflow
Solution 19 - SqloeziView Answer on Stackoverflow
Solution 20 - SqlAmareswarView Answer on Stackoverflow
Solution 21 - Sqluser2719152View Answer on Stackoverflow
Solution 22 - SqlLajos ArpadView Answer on Stackoverflow
Solution 23 - SqlMicahView Answer on Stackoverflow
Solution 24 - SqlChocolimView Answer on Stackoverflow
Solution 25 - SqlNikkiView Answer on Stackoverflow