Why not use varchar(max)?

SqlTsqlSql Server-2008Varcharmax

Sql Problem Overview


I'm a bit old school when it comes to database design, so I'm totally for using the correct data sizes in columns. However, when reviewing a database for a friend, I noticed he used varchar(max) a lot. Now, my immediate thought was to throw it back to him and tell him to change it. But then I thought about it and couldn't come up with a good reason for him not to use it (he'd used a case type tool to generate the db, if you're wondering).

I've been researching the topic of varchar(max) usage and I can't really come up with any good reason for him not to use it.

He doesn't use the columns for indexes, the application that sits on the db has limitations on the input, so it won't allow massive entries in the fields.

Any help would be appreciated to help me make him see the light :).

Sql Solutions


Solution 1 - Sql

My answer to this, isn't about the usage of Max, as much as it is about the reason for VARCHAR(max) vs TEXT.

In my book; first of all, Unless you can be absolutely certain that you'll never encode anything but english text and people won't refer to names of foreign locations, then you should use NVARCHAR or NTEXT.

Secondly, it's what the fields allow you to do.

TEXT is hard to update in comparison to VARCHAR, but you get the advantage of Full Text Indexing and lots of clever things.

On the other hand, VARCHAR(MAX) has some ambiguity, if the size of the cell is < 8000 chars, it will be treated as Row data. If it's greater, it will be treated as a LOB for storage purposes. Because you can't know this without querying RBAR, this may have optimization strategies for places where you need to be sure about your data and how many reads it costs.

Otherwise, if your usage is relatively mundane and you don't expect to have problems with the size of data (IE you're using .Net and therefore don't have to be concerned about the size of your string/char* objects) then using VARCHAR(max) is fine.

Solution 2 - Sql

There is a blog post about why not to use varchar max here

Edit

The basic difference is where the data is stored. A SQL Data row has a max size of 8000 bytes (or was it 8K). Then a 2GB varchar(max) cannot be stored in the data row. SQL Server stores it "Out of row".

Therefore you could get a performance hit since the data will not be in the same place on disk, see: http://msdn.microsoft.com/en-us/library/ms189087.aspx

Solution 3 - Sql

If you are working in an OLTP environment, you are all about the performance. From overhead and tuning concerns to indexing limitations and query bottlenecks. Using a varcahr(max) or any other LOB type will most likely contravene most design best practices, so unless there is a specific business need that cannot be handled through the use of some other typing mechanism and only a varchar(max) will fit the bill then why subject your system and applications to the kind of overhead and performance issues inherent in one of the LOB datatypes?

If on the other hand you are working in an OLAP environment or in a Star Schema DW environment with Dimension tables with descriptors fields that naturally need to be verbose then a varchar(max), as long as you are not adding that to an index, may be useful. Still I would recommend even then to use a char(x) varchar(x) As it is always a best practice to only use those resources you absolutely must have to get the job done.

Solution 4 - Sql

They should NOT be used unless you expect large amounts of data and here is the reason why (directly from Books Online):

> Columns that are of the large object (LOB) data types ntext, text, > varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be > specified as key columns for an index.

If you want to cripple performance, use nvarchar for everything.

Solution 5 - Sql

Redgate wrote a great article about this.
https://www.red-gate.com/simple-talk/sql/database-administration/whats-the-point-of-using-varcharn-anymore/

Conclusions

  • Where appropriate, use VARCHAR(n) over VARCHAR(MAX) for reasons of good design if not performance benefits, and because VARCHAR(MAX) data does not compress
  • Storing large strings takes longer than storing small strings.
  • Updating an in-row VARCHAR(MAX) value from below 8,000 to over 8,000 will be relatively slow, but the difference for a single transaction will likely not be measurable.
  • Updating an in-row VARCHAR(MAX) value from over 8,000 to below 8,000 will be faster than if the table is set to store data out-of-row.
  • Using the out-of-row option for VARCHAR(MAX) will cause slower writes until the strings are very long.

Solution 6 - Sql

Why not? There are reasons to not use varchar(max):

  1. Just like a good old BLOB, SQL Server cannot index a varchar(max) column.
  2. It's just wasteful and lazy to overprovision, specially varchar(max) since it allocates AT LEAST 8-bytes per row. I've seen developers assigning "max" to single-byte binary (True / False) variables, only to find out later that the system is slow as molasses when discriminating data using those values.
  3. You cannot infer what data type is getting saved there. The obvious use case exception is saving actual large text chunks of up to 8K.

Solution 7 - Sql

I don't know how sql server handles large (declared) varchar fields from a performance, memory and storage perspective.. but assuming it does so as efficiently as smaller declared varchar fields, there's still the benefit of integrity constraints.

The application sitting on the db is supposed to have limits on the input, but the database can properly report an error if the application has a bug in this respect.

Solution 8 - Sql

The diff is in next:
VARCHAR(X) can be indexed
VARCHAR(MAX) can't be indexed

Solution 9 - Sql

   It is somewhat old-fashioned to believe that the application will only pass short strings to the database, and that will make it okay.

   In modern times, you HAVE to anticipate that the database will be accessed primarily by the current application, but there may be a future version of the application, (will the developer of that version know to keep strings below a certain length?)

   You MUST anticipate that web services, ETL processes, LYNC to SQL, and any other number of already existing, and/or not-yet-existing technologies will be used to access your database.

   Generally speaking I try not to go over varchar(4000), because it's four-thousand characters, after all. If I exceed that, then I look to other datatypes to store whatever it is I am trying to store. Brent Ozar has written some pretty great stuff on this.

   All that said, it is important to evaluate the current design's approach to your current requirements when you are working on a project. Have an idea of how the various parts work, understand the trade-offs of various approaches and solve the problem at hand. Exercising some great axiom can lead to blind adherence which might turn you into a lemming.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionAtaLossView Question on Stackoverflow
Solution 1 - SqlRuss ClarkeView Answer on Stackoverflow
Solution 2 - SqlShiraz BhaijiView Answer on Stackoverflow
Solution 3 - SqlScott JohnstonView Answer on Stackoverflow
Solution 4 - SqlHLGEMView Answer on Stackoverflow
Solution 5 - SqlDonny V.View Answer on Stackoverflow
Solution 6 - SqlalejandrobView Answer on Stackoverflow
Solution 7 - Sqlat.View Answer on Stackoverflow
Solution 8 - SqlVlad KirovView Answer on Stackoverflow
Solution 9 - SqlStephen LauzonView Answer on Stackoverflow