Delete duplicate records from a SQL table without a primary key

SqlSql Server-2005TsqlDuplicate Removal

Sql Problem Overview


I have the below table with the below records in it

create table employee
(
 EmpId number,
 EmpName varchar2(10),
 EmpSSN varchar2(11)
);

insert into employee values(1, 'Jack', '555-55-5555');
insert into employee values (2, 'Joe', '555-56-5555');
insert into employee values (3, 'Fred', '555-57-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');
insert into employee values (1, 'Jack', '555-55-5555');
insert into employee values (4, 'Mike', '555-58-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6 ,'Lisa', '555-70-5555');
insert into employee values (5, 'Cathy', '555-59-5555');
insert into employee values (6, 'Lisa', '555-70-5555');

I dont have any primary key in this table .But i have the above records in my table already. I want to remove the duplicate records which has the same value in EmpId and EmpSSN fields.

Ex : Emp id 5

Can any one help me to frame a query to delete those duplicate records

Thanks in advance

Sql Solutions


Solution 1 - Sql

It is very simple. I tried in SQL Server 2008

DELETE SUB FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY EmpId, EmpName, EmpSSN ORDER BY EmpId) cnt
 FROM Employee) SUB
WHERE SUB.cnt > 1

Solution 2 - Sql

Add a Primary Key (code below)

Run the correct delete (code below)

Consider WHY you woudln't want to keep that primary key.


Assuming MSSQL or compatible:

ALTER TABLE Employee ADD EmployeeID int identity(1,1) PRIMARY KEY;

WHILE EXISTS (SELECT COUNT(*) FROM Employee GROUP BY EmpID, EmpSSN HAVING COUNT(*) > 1)
BEGIN
    DELETE FROM Employee WHERE EmployeeID IN 
    (
        SELECT MIN(EmployeeID) as [DeleteID]
        FROM Employee
        GROUP BY EmpID, EmpSSN
        HAVING COUNT(*) > 1
    )
END

Solution 3 - Sql

Use the row number to differentiate between duplicate records. Keep the first row number for an EmpID/EmpSSN and delete the rest:

    DELETE FROM Employee a
     WHERE ROW_NUMBER() <> ( SELECT MIN( ROW_NUMBER() )
                               FROM Employee b
                              WHERE a.EmpID  = b.EmpID
                                AND a.EmpSSN = b.EmpSSN )

Solution 4 - Sql

With duplicates

As
(Select *, ROW_NUMBER() Over (PARTITION by EmpID,EmpSSN Order by EmpID,EmpSSN) as Duplicate From Employee)

delete From duplicates

Where Duplicate > 1 ;

This will update Table and remove all duplicates from the Table!

Solution 5 - Sql

select distinct * into newtablename from oldtablename

Now, the newtablename will have no duplicate records.

Simply change the table name(newtablename) by pressing F2 in object explorer in sql server.

Solution 6 - Sql

Code

DELETE DUP 
FROM 
( 
    SELECT ROW_NUMBER() OVER (PARTITION BY Clientid ORDER BY Clientid ) AS Val 
    FROM ClientMaster 
) DUP 
WHERE DUP.Val > 1

Explanation

Use an inner query to construct a view over the table which includes a field based on Row_Number(), partitioned by those columns you wish to be unique.

Delete from the results of this inner query, selecting anything which does not have a row number of 1; i.e. the duplicates; not the original.

The order by clause of the row_number window function is needed for a valid syntax; you can put any column name here. If you wish to change which of the results is treated as a duplicate (e.g. keep the earliest or most recent, etc), then the column(s) used here do matter; i.e. you want to specify the order such that the record you wish to keep will come first in the result.

Solution 7 - Sql

You could create a temporary table #tempemployee containing a select distinct of your employee table. Then delete from employee. Then insert into employee select from #tempemployee.

Like Josh said - even if you know the duplicates, deleting them will be impossile since you cannot actually refer to a specific record if it is an exact duplicate of another record.

Solution 8 - Sql

If you don't want to create a new primary key you can use the TOP command in SQL Server:

declare @ID int
while EXISTS(select count(*) from Employee group by EmpId having count(*)> 1)
begin
	select top 1 @ID = EmpId
	from Employee 
	group by EmpId
	having count(*) > 1

	DELETE TOP(1) FROM Employee WHERE EmpId = @ID
end

Solution 9 - Sql

ITS easy use below query

WITH Dups AS
(
  SELECT col1,col2,col3,
ROW_NUMBER() OVER(PARTITION BY col1,col2,col3 ORDER BY (SELECT 0)) AS rn
 FROM mytable
)
DELETE FROM Dups WHERE rn > 1

Solution 10 - Sql

delete sub from (select ROW_NUMBER() OVer(Partition by empid order by empid)cnt from employee)sub where sub.cnt>1

Solution 11 - Sql

I'm not an SQL expert so bear with me. I'm sure you'll get a better answer soon enough. Here's how you can find the duplicate records.

select t1.empid, t1.empssn, count(*)
from employee as t1 
inner join employee as t2 on (t1.empid=t2.empid and t1.empssn = t2.empssn)
group by t1.empid, t1.empssn
having count(*) > 1

Deleting them will be more tricky because there is nothing in the data that you could use in a delete statement to differentiate the duplicates. I suspect the answer will involve row_number() or adding an identity column.

Solution 12 - Sql

create unique clustered index Employee_idx
on Employee ( EmpId,EmpSSN )
with ignore_dup_key

You can drop the index if you don't need it.

Solution 13 - Sql

no ID, no rowcount() or no temp table needed....

WHILE 
  (
     SELECT  COUNT(*) 
     FROM TBLEMP  
     WHERE EMPNO 
            IN (SELECT empno  from tblemp group by empno having count(empno)>1)) > 1 


DELETE top(1)  
FROM TBLEMP 
WHERE EMPNO IN (SELECT empno  from tblemp group by empno having count(empno)>1)

Solution 14 - Sql

there are two columns in the a table ID and name where names are repeating with different IDs so for that you may use this query: . .

DELETE FROM dbo.tbl1
WHERE id NOT IN (
     Select MIN(Id) AS namecount FROM tbl1
     GROUP BY Name
)

Solution 15 - Sql

Having a database table without Primary Key is really and will say extremely BAD PRACTICE...so after you add one (ALTER TABLE)

Run this until you don't see any more duplicated records (that is the purpose of HAVING COUNT)

DELETE FROM [TABLE_NAME] WHERE [Id] IN 
(
	SELECT MAX([Id])
	FROM [TABLE_NAME]
	GROUP BY [TARGET_COLUMN]
	HAVING COUNT(*) > 1
)


SELECT MAX([Id]),[TABLE_NAME], COUNT(*) AS dupeCount
FROM [TABLE_NAME]
GROUP BY [TABLE_NAME]
HAVING COUNT(*) > 1

MAX([Id]) will cause to delete latest records (ones added after first created) in case you want the opposite meaning that in case of requiring deleting first records and leave the last record inserted please use MIN([Id])

Solution 16 - Sql

Let's think out of the box.

I don't delete from the table, I make a new table first, for safety. I personally prefer do a

INSERT INTO new_table SELECT DISTINCT * FROM orig_table;

Now, new_table now should contains the expected data I want. I can check new_table to ensure that.

Then I have 2 options to replace the orig_table

A. delete orig_table; rename new_table to orig_table

B. truncate orig_table; insert data from new_table to orig_table; delete new_table (Recommended: in case you have some trigger/something else linked to the original orig_table)

Solution 17 - Sql

select t1.* from employee t1, employee t2 where t1.empid=t2.empid and t1.empname = t2.empname and t1.salary = t2.salary
group by t1.empid, t1.empname,t1.salary having count(*) > 1

Solution 18 - Sql

delete from employee where rowid in (select rowid from (select rowid, name_count from (select rowid, count(emp_name) as name_count from employee group by emp_id, emp_name) where name_count>1))

Solution 19 - Sql

DELETE FROM 'test' 
USING 'test' , 'test' as vtable
WHERE test.id>vtable.id and test.common_column=vtable.common_column  

Using this we can remove duplicate records

Solution 20 - Sql

ALTER IGNORE TABLE test
ADD UNIQUE INDEX 'test' ('b');

@ here 'b' is column name to uniqueness, @ here 'test' is index name.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionShyjuView Question on Stackoverflow
Solution 1 - SqlAnjib RajkhowaView Answer on Stackoverflow
Solution 2 - SqlcjkView Answer on Stackoverflow
Solution 3 - SqlPaul MorganView Answer on Stackoverflow
Solution 4 - SqlNirav ParikhView Answer on Stackoverflow
Solution 5 - Sqlnaga vara prasadView Answer on Stackoverflow
Solution 6 - Sqlkamz kamarajanView Answer on Stackoverflow
Solution 7 - SqlDaren ThomasView Answer on Stackoverflow
Solution 8 - SqlJoeView Answer on Stackoverflow
Solution 9 - SqlAbhishek JaiswalView Answer on Stackoverflow
Solution 10 - SqlSudhar PView Answer on Stackoverflow
Solution 11 - SqlJoshView Answer on Stackoverflow
Solution 12 - SqlSudhirView Answer on Stackoverflow
Solution 13 - SqlAshish SahuView Answer on Stackoverflow
Solution 14 - Sqluser2497372View Answer on Stackoverflow
Solution 15 - Sqld1jhoni1bView Answer on Stackoverflow
Solution 16 - Sqluser3713573View Answer on Stackoverflow
Solution 17 - Sqltejas rupaniView Answer on Stackoverflow
Solution 18 - SqlSantosh kumarView Answer on Stackoverflow
Solution 19 - Sqljayaram.pagotiView Answer on Stackoverflow
Solution 20 - Sqljayaram.pagotiView Answer on Stackoverflow