Should I index a bit field in SQL Server?

Sql ServerIndexing

Sql Server Problem Overview


I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing. I admit I don't know enough about how indexes work to understand why that is.

So what if I have a table with 100 million rows in it, and I am selecting records where a bit field is 1? And let's say that at any point in time, there are only a handful of records where the bit field is 1 (as opposed to 0). Is it worth indexing that bit field or not? Why?

Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?

Sql Server Solutions


Solution 1 - Sql Server

Consider what an index is in SQL - and index is really a chunk of memory pointing at other chunks of memory (i.e. pointers to rows). The index is broken into pages so that portions of the index can be loaded and unloaded from memory depending on usage.

When you ask for a set of rows, SQL uses the index to find the rows more quickly than table scanning (looking at every row).

SQL has clustered and non-clustered indexes. My understanding of clustered indexes is that they group similar index values into the same page. This way when you ask for all the rows matching an index value, SQL can return those rows from a clustered page of memory. This is why trying to cluster index a GUID column is a bad idea - you don't try to cluster random values.

When you index an integer column, SQL's index contains a set of rows for each index value. If you have a range of 1 to 10, then you would have 10 index pointers. Depending on how many rows there are this can be paged differently. If your query looks for the index matching "1" and then where Name contains "Fred" (assuming the Name column is not indexed), SQL gets the set of rows matching "1" very quickly, then table scans to find the rest.

So what SQL is really doing is trying to reduce the working set (number of rows) it has to iterate over.

When you index a bit field (or some narrow range), you only reduce the working set by the number of rows matching that value. If you have a small number of rows matching it would reduce your working set a lot. For a large number of rows with 50/50 distribution, it might buy you very little performance gain vs. keeping the index up to date.

The reason everyone says to test is because SQL contains a very clever and complex optimizer that may ignore an index if it decides table scanning is faster, or may use a sort, or may organize memory pages however it darn well likes.

Solution 2 - Sql Server

I just came across this question by way of another. Assuming that your statement that only a handful of the records assume the value of 1 (and that those are the ones you're interested in), then a filtered index could be a good choice. Something like:

create index [IX_foobar] on dbo.Foobar (FooID) where yourBitColumn = 1

This will create a substantially smaller index that the optimizer is smart enough to use when that is a predicate in your query.

Solution 3 - Sql Server

100-million records with only a few having the bit field set to 1? Yes, I would think indexing the bit field would definitely speed up querying the bit=1 records. You should get logarithmic search time from the index and then only touch the few pages with bit=1 records. Otherwise, you'd have to touch all pages of the 100-million record table.

Then again, I'm definitely not a database expert and could be missing something important.

Solution 4 - Sql Server

Solution 5 - Sql Server

If your distribution is pretty known and unbalanced, like 99% of the rows are bit = 1 and the 1% are bit = 0, when you do a WHERE clause with bit = 1, a full table scan will be around the same time as the index scan. If you want to have a fast query where bit = 0, the best way I know is create a filtered index, adding a clause WHERE bit = 0. That way, that index will only store the 1% row. Then doing a WHERE bit = 0 will simply let the query optimizer choose that index, and all rows from it will be bit = 0. You also have the benefit to have a very small amount of disk space required compare a full index on the bit.

Solution 6 - Sql Server

While I don't think I would index JUST a bit column by itself, it is very common to include bit columns as part of a compound index.

A simple example would be an index on ACTIVE, LASTNAME instead of just lastname, when your application is almost always looking for active customers.

Solution 7 - Sql Server

If you want to know if an index has the effects you desire: test and test again.

In general you don't want an index that doesn't narrow down your table enough, because of the cost to maintain an index. (cost > profit). But if the index in your case will cut the table in half, you may gain something but putting it on the table. It all depends on the exact size/structure of your table and how you are using it (number of reads/writes).

Solution 8 - Sql Server

Of course it worths, especially if you need to retrieve the data by that value. It would be similar to using a sparse matrix instead of using a normal matrix.

Now with SQL 2008 you can use partitioning functions, and you are able to filter the data that goes in an index. The disadvantage for earlier versions would be that the index would be made for all the data, but this can be optimized by storing the interesting values in a separate file group.

Solution 9 - Sql Server

As others have said, you'll want to measure this. I don't recall where I've read this, but a column needs to have very high cardinality (around 95%) in order for an index to be effective. Your best test for this would be to build the index and examine the execution plans for the 0 and 1 values of the BIT field. If you see an index seek operation in the execution plan then you know that your index will be used.

Your best course of action would be to test the with a basic SELECT * FROM table WHERE BitField = 1; query and slowly build out the functionality from there step by step until you have a realistic query for your application, examining the execution plan with every step to make sure that the index seek is still being utilized. Admittedly, there is no guarantee that this execution plan will be used in production, but there is a good chance that it will be.

Some information can be found on the sql-server-performance.com forums and in the referenced article

Solution 10 - Sql Server

"I remember reading at one point that indexing a field with low cardinality (a low number of distinct values) is not really worth doing"

That because SQL Server will almost always find its more efficient to just do a table-scan than to read the index. So basically your index will never get used and it's a waste to maintain it. As others have said it might be ok in a compound index.

Solution 11 - Sql Server

If your goal is to make querying for records where the bit field value equals '1' faster you might try an indexed view of your base table which only contains records where your bit field equals '1'. In enterprise edition if a query could make use of an indexed view instead of a specified table to improve query performance it will use the view. In theory this would increase the speed of select queries which only look for records with a bit field value of '1'.

http://www.microsoft.com/technet/prodtechnol/sql/2005/impprfiv.mspx

All this assumes you are Microsoft SQL Server 2005 Enterprise. The same might apply to 2008, I'm not familiar with that version.

Solution 12 - Sql Server

You can't index a bit field in SQL Server 2000, as was indicated in the Books Online at the time:

> bit > > Integer data type 1, 0, or NULL. > > Remarks > > Columns of type bit cannot > have indexes on them.

Yes, if you have only a handful of rows, out of millions, an index will help. But if you want to do it in this case you need to make the column a tinyint.

Note: Enterprise Manager will not let you create an index on a bit column. If you wish you can still manually create an index on a bit column:

CREATE INDEX IX_Users_IsActiveUsername ON Users
(
   IsActive,
   Username
)

But SQL Server 2000 will not actually use such an index - running a query where the index would be a perfect candidate, e.g.:

SELECT TOP 1 Username 
FROM Users
WHERE IsActive = 0

SQL Server 2000 will do a table scan instead, acting as though the index doesn't even exist. If you change the column to a tinyint SQL Server 2000 will do an index seek. Also, the following non-covered query:

SELECT TOP 1 * 
FROM Users
WHERE IsActive = 0

It will perform an index seek, followed by a bookmark lookup.


SQL Server 2005 does have limited support for indexes on bit columns. For example:

SELECT TOP 1 Username 
FROM Users
WHERE IsActive = 0

will cause an index seek through the covering index. But the non-covered case:

SELECT TOP 1 * 
FROM Users
WHERE IsActive = 0

will not cause an index seek followed by a bookmark lookup, it will perform a table scan (or clustered index scan), rather than performing the index seek followed by a bookmark lookup.

Verified by experimentation and direct observation.

Solution 13 - Sql Server

On its own, no as it results in very little selectivity. As part of a compound index. quite possibly but only after other equality columns.

Solution 14 - Sql Server

very late answer...

Yes, it can be useful according to SQL CAT team (updated, has been consolidated)

Solution 15 - Sql Server

measure response time before and after and see if it is worthwhile; theoretically it should improve performance for queries using the indexed fields but it really depends on the distribution of true/false values and the other fields involved in the queries that you're concerned about

Solution 16 - Sql Server

Is this a common query? It may be worth it when looking for the "handful" of records but won't help you much on the other rows. Are there other ways to identify the data?

Solution 17 - Sql Server

Cardinality is one factor, the other is how well does the index divide your data. If you have about half 1s and half 0s, then it will help. (Assuming that that index is a better path to choose than some other index). However, how often are you inserting and updating? Adding indexes for SELECT performance also hurt the INSERT, UPDATE and DELETE performance, so keep that in mind.

I would say, if the 1s to 0s (or vice versa) isn't better than 75% to 25%, don't bother.

Solution 18 - Sql Server

Ian Boyd is correct when he says that you could not do it via Enterprise Manager for SQL 2000 (see his note regarding creating it throught T-SQL.

Solution 19 - Sql Server

You need to be smart here to query, you must know the load value on your column if the load of true is more in your system and you want to check all the true values writ your query to check not false.. it will help alot, it just trick.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionjeremccView Question on Stackoverflow
Solution 1 - Sql ServerGeoff CoxView Answer on Stackoverflow
Solution 2 - Sql ServerBen ThulView Answer on Stackoverflow
Solution 3 - Sql ServerC. Dragon 76View Answer on Stackoverflow
Solution 4 - Sql ServerJeffView Answer on Stackoverflow
Solution 5 - Sql ServerPhilippe BoucherView Answer on Stackoverflow
Solution 6 - Sql ServerBradCView Answer on Stackoverflow
Solution 7 - Sql ServerthijsView Answer on Stackoverflow
Solution 8 - Sql ServerBogdan MaximView Answer on Stackoverflow
Solution 9 - Sql ServerJeremiah PeschkaView Answer on Stackoverflow
Solution 10 - Sql ServerDJ.View Answer on Stackoverflow
Solution 11 - Sql ServerJeremyView Answer on Stackoverflow
Solution 12 - Sql ServerIan BoydView Answer on Stackoverflow
Solution 13 - Sql ServerCraig NicholsonView Answer on Stackoverflow
Solution 14 - Sql ServergbnView Answer on Stackoverflow
Solution 15 - Sql ServerSteven A. LoweView Answer on Stackoverflow
Solution 16 - Sql Serverjason saldoView Answer on Stackoverflow
Solution 17 - Sql ServerAnthony PottsView Answer on Stackoverflow
Solution 18 - Sql ServerJohn BView Answer on Stackoverflow
Solution 19 - Sql ServerChetan VermaView Answer on Stackoverflow