How get the T-SQL code to find duplicates?

Sql Server-2005TsqlSsms

Sql Server-2005 Problem Overview


MS Access has a button to generate sql code for finding duplicated rows. I don't know if SQL Server 2005/2008 Managment Studio has this.

  1. If it has, please point where

  2. If it has not, please tell me how can I have a T-SQL helper for creating code like this.

Sql Server-2005 Solutions


Solution 1 - Sql Server-2005

Well, if you have entire rows as duplicates in your table, you've at least not got a primary key set up for that table, otherwise at least the primary key value would be different.

However, here's how to build a SQL to get duplicates over a set of columns:

SELECT col1, col2, col3, col4
FROM table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1

This will find rows which, for columns col1-col4, has the same combination of values, more than once.

For instance, in the following table, rows 2+3 would be duplicates:

PK    col1    col2    col3    col4    col5
1       1       2       3       4      6
2       1       3       4       7      7
3       1       3       4       7      10
4       2       3       1       4      5

The two rows share common values in columns col1-col4, and thus, by that SQL, is considered duplicates. Expand the list of columns to contain all the columns you wish to analyze this for.

Solution 2 - Sql Server-2005

If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:

SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM table

Youd can also delete (or otherwise work with) duplicates using this technique:

WITH cte AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
    FROM table
)
DELETE FROM cte WHERE DuplicateRowNumber > 1

ROW_NUMBER is extremely powerful - there is much you can do with it - see the BOL article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx

Solution 3 - Sql Server-2005

I found this solution when I need to dump entire rows with one or more duplicate fields but I don't want to type every field name in the table:

SELECT * FROM db WHERE col IN
    (SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
    ORDER BY col

Solution 4 - Sql Server-2005

AFAIK, it doesn't. Just make a select statement grouping by all the fields of a table, and filtering using a having clause where the count is greater than 1.

If your rows are duplicated except by the key, then don't include the key in the select fields.

Solution 5 - Sql Server-2005

Another way one can do this is by joining a table on itself.

SELECT *
FROM dbo.TableA aBase
JOIN dbo.TableA aDupes ON aDupes.ColA = aBase.ColA AND
                          aDupes.ColB = aBase.ColB
WHERE aBase.Pkey < aDupes.Pkey

Note: The aBase.Pkey < aDupes.Pkey is there because joining a table against itself will create two rows per match since the condition will always be true twice.

In other words: If table aBase has a row equal to a row from aDupes (based on ColA and ColB), the reflection of that match will also be true - that aDupes has a row equal to a row aBase based on ColA and ColB. Therefore both of those matches will be returned in the result set.

Narrow this down/eliminate this reflection by arbitrarily picking all results where one of the tables has a lower key.

< or > doesn't matter, as long as the keys are different.

This also takes care of filtering out matches with a row upon itself because aBase.Pkey < aDupes.Pkey forces the primary keys to be different.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJader DiasView Question on Stackoverflow
Solution 1 - Sql Server-2005Lasse V. KarlsenView Answer on Stackoverflow
Solution 2 - Sql Server-2005Mike DeFehrView Answer on Stackoverflow
Solution 3 - Sql Server-2005FerruccioView Answer on Stackoverflow
Solution 4 - Sql Server-2005eKek0View Answer on Stackoverflow
Solution 5 - Sql Server-2005boylec1986View Answer on Stackoverflow