Finding duplicate rows in SQL Server

SqlSql ServerDuplicates

Sql Problem Overview


I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.

A statement like:

SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

Will return something like

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2 

But I'd also like to grab the IDs of them. Is there any way to do this? Maybe like a

orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2  

The reason being that there is also a separate table of users that link to these organizations, and I would like to unify them (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I would like part manually so I don't screw anything up, but I would still need a statement returning the IDs of all the dupe orgs so I can go through the list of users.

Sql Solutions


Solution 1 - Sql

select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

Solution 2 - Sql

You can run the following query and find the duplicates with max(id) and delete those rows.

SELECT orgName, COUNT(*), Max(ID) AS dupes 
FROM organizations 
GROUP BY orgName 
HAVING (COUNT(*) > 1)

But you'll have to run this query a few times.

Solution 3 - Sql

You can do it like this:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return just the records that can be deleted (leaving one of each), you can use:

SELECT
    id, orgName
FROM (
     SELECT 
         orgName, id,
         ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
     FROM organizations
) AS d
WHERE intRow != 1

Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id

Solution 4 - Sql

You can try this , it is best for you

 WITH CTE AS
    (
    SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations 
    )
    select * from CTE where RN>1
    go

Solution 5 - Sql

The solution marked as correct didn't work for me, but I found this answer that worked just great: https://stackoverflow.com/questions/2881774/get-list-of-duplicate-rows-in-mysql

SELECT n1.* 
FROM myTable n1
INNER JOIN myTable n2 
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id

Solution 6 - Sql

If you want to delete duplicates:

WITH CTE AS(
   SELECT orgName,id,
       RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
   FROM organizations
)
DELETE FROM CTE WHERE RN > 1

Solution 7 - Sql

select * from [Employees]

For finding duplicate Record 1)Using CTE

with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte

2)By Using GroupBy

select Name,EmailId,COUNT(name) as Duplicate from  [Employees] group by Name,EmailId 

Solution 8 - Sql

Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1

So the records with rowum> 1 will be the duplicate records in your table. ‘Partition by’ first group by the records and then serialize them by giving them serial nos. So rownum> 1 will be the duplicate records which could be deleted as such.

Solution 9 - Sql

select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;

Src : https://stackoverflow.com/a/59242/1465252

Solution 10 - Sql

select a.orgName,b.duplicate, a.id
from organizations a
inner join (
    SELECT orgName, COUNT(*) AS duplicate
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) b on o.orgName = oc.orgName
group by a.orgName,a.id

Solution 11 - Sql

select orgname, count(*) as dupes, id 
from organizations
where orgname in (
	select orgname
	from organizations
	group by orgname
	having (count(*) > 1)
)
group by orgname, id

Solution 12 - Sql

You have several way for Select duplicate rows.

for my solutions , first consider this table for example

CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution :

SELECT DISTINCT *
FROM   #Employee;

WITH #DeleteEmployee AS (
                     SELECT ROW_NUMBER()
                            OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
                            RNUM
                     FROM   #Employee
                 )

SELECT *
FROM   #DeleteEmployee
WHERE  RNUM > 1

SELECT DISTINCT *
FROM   #Employee

Secound solution : Use identity field

SELECT DISTINCT *
FROM   #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM   #Employee a2
    WHERE  #Employee.ID = a2.ID
           AND #Employee.FIRST_NAME = a2.FIRST_NAME
           AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT *
FROM   #Employee

and end of all solution use this command

DROP TABLE #Employee

Solution 13 - Sql

i think i know what you need i needed to mix between the answers and i think i got the solution he wanted:

select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
    SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

having the max id will give you the id of the dublicate and the one of the original which is what he asked for:

id org name , dublicate count (missing out in this case) 
id doublicate org name , doub count (missing out again because does not help in this case)

only sad thing you get it put out in this form

id , name , dubid , name

hope it still helps

Solution 14 - Sql

Suppose we have table the table 'Student' with 2 columns:

  • student_id int

  • student_name varchar

      Records:
      +------------+---------------------+
      | student_id | student_name        |
      +------------+---------------------+
      |        101 | usman               |
      |        101 | usman               |
      |        101 | usman               |
      |        102 | usmanyaqoob         |
      |        103 | muhammadusmanyaqoob |
      |        103 | muhammadusmanyaqoob |
      +------------+---------------------+
      
    

Now we want to see duplicate records Use this query:

select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

Solution 15 - Sql

I got a better option to get the duplicate records in a table

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

Result of the above query shows all the duplicate names with unique student ids and number of duplicate occurances

Click here to see the result of the sql

Solution 16 - Sql

 /*To get duplicate data in table */

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1 
  GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

Solution 17 - Sql

I use two methods to find duplicate rows. 1st method is the most famous one using group by and having. 2nd method is using CTE - Common Table Expression.

As mentioned by @RedFilter this way is also right. Many times I find CTE method is also useful for me.

WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName) 
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations   e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1

In the example above we collected the result by finding repeat occurrence using ROW_NUMBER and PARTITION BY. Then we applied where clause to select only rows which are on repeat count more than 1. All the result is collected CTE table and joined with Organizations table.

Source : CodoBee

Solution 18 - Sql

Try

SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionxtineView Question on Stackoverflow
Solution 1 - SqlD'Arcy RittichView Answer on Stackoverflow
Solution 2 - SqlAykut AkıncıView Answer on Stackoverflow
Solution 3 - SqlPaulView Answer on Stackoverflow
Solution 4 - Sqlcode saveView Answer on Stackoverflow
Solution 5 - SqlecairolView Answer on Stackoverflow
Solution 6 - SqlakdView Answer on Stackoverflow
Solution 7 - SqlDebendra DashView Answer on Stackoverflow
Solution 8 - SqlMike ClarkView Answer on Stackoverflow
Solution 9 - SqliCrazybestView Answer on Stackoverflow
Solution 10 - Sqluser5336758View Answer on Stackoverflow
Solution 11 - SqlJordãoView Answer on Stackoverflow
Solution 12 - SqlArdalan ShahgholiView Answer on Stackoverflow
Solution 13 - SqlArthur KielbasaView Answer on Stackoverflow
Solution 14 - SqlUsman YaqoobView Answer on Stackoverflow
Solution 15 - SqlSoftIdeaView Answer on Stackoverflow
Solution 16 - SqlCodeView Answer on Stackoverflow
Solution 17 - SqlIshrarView Answer on Stackoverflow
Solution 18 - SqlryanView Answer on Stackoverflow