Does the size used with NVARCHAR matter?
Sql Server-2008 Problem Overview
Every time I create a table I wonder if there is any performance difference whether I say nvarchar(100) or nvarchar(1000), assuming that the actual string size will be less than 100. So is there?
Sql Server-2008 Solutions
Solution 1 - Sql Server-2008
According to the documentation:
> nvarchar [ ( n | max ) ]
>
> Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes.
So, only the actual length of the data entered matters when calculating the storage size.
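To make the quoted formula concrete, here is a minimal Python sketch of the arithmetic. The function name `nvarchar_storage_bytes` and the sample value are made up for illustration; it only models the "two times the actual length + 2 bytes" rule from the quote and is not how SQL Server itself is implemented.

```python
def nvarchar_storage_bytes(value: str, declared_n: int) -> int:
    """Approximate bytes used by one nvarchar(declared_n) value, per the quoted formula."""
    # UTF-16 encodes each basic character as 2 bytes
    data_bytes = len(value.encode("utf-16-le"))
    # The declared length only acts as a limit on what may be stored
    if data_bytes // 2 > declared_n:
        raise ValueError("value does not fit in nvarchar(%d)" % declared_n)
    return data_bytes + 2  # plus 2 bytes of length overhead


# Same value, different declared lengths: the storage size is identical
print(nvarchar_storage_bytes("Alice", 100))
print(nvarchar_storage_bytes("Alice", 1000))
```

Both calls return the same number, because the declared length never enters the size calculation; it only decides whether the value is accepted.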
The documentation doesn't say why it's there, but the length parameter is useful because it enforces simple limit constraints (for example, so someone can't enter 2 GB of text as their "name").
Solution 2 - Sql Server-2008
The reason not to use nvarchar(1000) when you need nvarchar(10) is to help prevent bad data from being entered into your database, unless you like it when phone numbers say things like "call the fat secretary not the cute one if you want a real answer" (a not-so-random example I once found in an actual customer file sent to us). Users figure out pretty quickly which fields are big enough to store notes in, which tends to make the data in the field useless over time.
And as for nvarchar(Max), it is a bad idea to use this at all unless you expect to have over 4000 characters. Look up indexing and varchar(max) to see why.
Solution 3 - Sql Server-2008
Yes, it matters from the performance point-of-view.
The query optimizer looks at this metadata when planning the query. It estimates the row size from the declared length, and this can cause a performance issue. For example, when you need to sort on a column that is varchar(10), it may plan to run the sort in RAM, but the same query on a varchar(1000) column may be planned to run on secondary storage.
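As a rough illustration of why the declared length can change the plan, here is a small Python sketch. It assumes the commonly cited rule of thumb that the optimizer treats a variable-width column as half full on average; that 50% figure is an assumption of this sketch and is not stated in this answer. The function names are hypothetical as well.

```python
def estimated_column_bytes(declared_n: int, bytes_per_char: int = 1) -> int:
    """Estimated bytes per row for one variable-width column."""
    # ASSUMPTION: the column is treated as 50% full, regardless of real data
    return declared_n * bytes_per_char // 2


def estimated_sort_bytes(row_count: int, declared_n: int) -> int:
    """Estimated memory needed to sort row_count values of varchar(declared_n)."""
    return row_count * estimated_column_bytes(declared_n)


rows = 1_000_000
print(estimated_sort_bytes(rows, 10))    # estimate for varchar(10)
print(estimated_sort_bytes(rows, 1000))  # estimate for varchar(1000)
```

Under that assumption the estimate for varchar(1000) is a hundred times larger than for varchar(10), even if both columns hold exactly the same short strings.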
I try to use domain knowledge to estimate the required size. In addition, you may want to leave some headroom for future maintenance. For example, if you think that your data will have at most 50 characters, use varchar(70) instead of varchar(50) so that it can handle unpredictable future changes in how the application is used.
I got to know about it from this blog post (I am NOT the author): http://aboutsqlserver.com/2010/08/18/what-is-the-optimal-size-for-variable-width-columns/
NOTE: Don't choose smaller lengths blindly. Changing the field size may become a big maintenance headache. I remember choosing a small length for a LastName field, and some users couldn't sign up in the system because of it. We had to update a critical database while it was in use (it takes time to increase a field length), then compile the program and redeploy. If I had chosen a proper field size, I could have avoided all these headaches.
You may also want to read about the differences between nvarchar(max) and nvarchar(n), since needing more than 4000 characters effectively means using nvarchar(max). (https://stackoverflow.com/questions/148398/are-there-any-disadvantages-to-always-using-nvarcharmax)
Solution 4 - Sql Server-2008
As for size versus performance, remember that SQL Server stores only the value as initially entered for nvarchar/varchar, but the whole declared width for nchar/char. For example, nvarchar(1000) with the data 'test data' stored will initially take 9*2 bytes of space, or 18 bytes, while an nchar(1000) will take 1000*2 bytes (2000 bytes) no matter what.
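The arithmetic in that example can be checked with a short Python sketch. The function names are hypothetical, and like the example it counts only the character data at 2 bytes per character, ignoring any per-value overhead.

```python
BYTES_PER_CHAR = 2  # nvarchar and nchar use 2 bytes per character


def nvarchar_data_bytes(value: str) -> int:
    """Variable width: space depends on the value actually stored."""
    return BYTES_PER_CHAR * len(value)


def nchar_data_bytes(declared_n: int) -> int:
    """Fixed width: space depends only on the declared length."""
    return BYTES_PER_CHAR * declared_n


print(nvarchar_data_bytes("test data"))  # 9 characters * 2 bytes
print(nchar_data_bytes(1000))            # 1000 characters * 2 bytes
```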
Then it goes on its merry way adding the next set of data on the page (which is 8k) until the page meets (or is close to) the fill-factor set for the table. Then starts a new page. Now let's say a user needs to update that data and enters something with some substance in the previous field, let's say something 800 chars long. Now that value needs to update and will grow significantly, but now the page is full and when the data for that field has to grow, the page needs to split and make way for the data (unless the fill factor is low enough to allow for the growth).
Those page splits accumulate as index fragmentation and result in slower search/seek times and longer update times. So there may be a performance impact if the data changes significantly.
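The effect of the fill factor on such an update can be sketched with some simple arithmetic in Python. This is a deliberately simplified model: it treats the whole 8k page as usable (real pages reserve some bytes for headers), assumes 2 bytes per character, and uses made-up function names.

```python
PAGE_BYTES = 8192   # ASSUMPTION: the full "8k" page is usable for row data
BYTES_PER_CHAR = 2  # nvarchar data


def page_used_bytes(fill_factor_percent: int) -> int:
    """Bytes already occupied on a page filled up to the given fill factor."""
    return PAGE_BYTES * fill_factor_percent // 100


def update_causes_split(used_bytes: int, old_chars: int, new_chars: int) -> bool:
    """True if growing one value from old_chars to new_chars overflows the page."""
    growth = BYTES_PER_CHAR * (new_chars - old_chars)
    return used_bytes + growth > PAGE_BYTES


# A 9-character value grows to 800 characters
print(update_causes_split(page_used_bytes(100), 9, 800))  # full page
print(update_causes_split(page_used_bytes(70), 9, 800))   # page filled to 70%
```

In this model the full page has to split, while the page filled to 70% still has room to absorb the larger value.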
As is often the case, the answer is: "depends".
Solution 5 - Sql Server-2008
At least in SQL Server, you are not allowed to create a unique constraint on a column whose type is nvarchar(max). The column has to be limited to nvarchar(450) or less for the constraint to be added successfully.
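The 450 figure follows from the index key size limit. The sketch below assumes the 900-byte maximum index key size that applies to SQL Server 2008; that limit is not stated in this answer, and the names are made up for illustration.

```python
MAX_INDEX_KEY_BYTES = 900  # ASSUMPTION: index key size limit in SQL Server 2008


def max_indexable_length(bytes_per_char: int) -> int:
    """Longest declared length whose values always fit in an index key."""
    return MAX_INDEX_KEY_BYTES // bytes_per_char


print(max_indexable_length(2))  # nvarchar: 2 bytes per character
print(max_indexable_length(1))  # varchar: 1 byte per character
```

A unique constraint is backed by an index, which is why the key size limit caps the declared length of the column.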
Solution 6 - Sql Server-2008
Since nvarchar is a variable-length data type, it only stores the data you assign to it (2 bytes per character) plus 2 bytes for length information. It is primarily used for double-byte languages like Chinese.
Personally, I use varchar(n) when I know of a certain limitation (e.g. URL query string limit, file path size limit, or my own limit). I use varchar(max) when the maximum length is undefined and could go beyond 8000 characters. And I almost never use nvarchar, primarily because our application will never go international.
Solution 7 - Sql Server-2008
The documentation is rather messy about the actual size occupied by varchar(n) or nvarchar(n) fields. @Brendan Long probably quoted text from the Microsoft documentation that has since been changed. Today the documentation says that the actual size occupied by a varchar(n) or nvarchar(n) field depends only on the value of n and not on the actual length of the stored string. Another strange thing is that the documentation for the varbinary(n) data type states that the actual storage size for this type is the actual length of the data entered plus 2 bytes. This information seems to be incorrect or messy.