How to delete duplicate rows in SQL Server?

SqlSql Server-2008DuplicatesSql Delete

Sql Problem Overview


How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries but I think they depend on having a row id as I don't get the desired result. For example:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

Sql Solutions


Solution 1 - Sql

I like CTEs and ROW_NUMBER as the two combined allow us to see which rows are deleted (or updated), therefore just change the DELETE FROM CTE... to SELECT * FROM CTE:

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part)

COL1	COL2	COL3	COL4	COL5	COL6	COL7
john	1	     1	     1	     1	     1	     1
sally	2	     2       2	     2	     2	     2

This example determines duplicates by a single column col1 because of the PARTITION BY col1. If you want to include multiple columns simply add them to the PARTITION BY:

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)

Solution 2 - Sql

I would prefer CTE for deleting duplicate rows from sql server table

strongly recommend to follow this article ::http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

> by keeping original

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)
 
DELETE FROM CTE WHERE RN<>1

> without keeping original

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
 
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)

Solution 3 - Sql

Without using CTE and ROW_NUMBER() you can just delete the records just by using group by with MAX function here is and example

DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

Solution 4 - Sql

If you have no references, like foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]

INTO [newTable]

FROM [oldTable]

Go into the object explorer and delete the old table.

Rename the new table with the old table's name.

Solution 5 - Sql

DELETE from search
where id not in (
   select min(id) from search
   group by url
   having count(*)=1

   union

   SELECT min(id) FROM search
   group by url
   having count(*) > 1
)

Solution 6 - Sql

There are two solutions in mysql:

A) Delete duplicate rows using DELETE JOIN statement

DELETE t1 FROM contacts t1
INNER JOIN contacts t2 
WHERE 
    t1.id < t2.id AND 
    t1.email = t2.email;

This query references the contacts table twice, therefore, it uses the table alias t1 and t2.

The output is:

> 1 Query OK, 4 rows affected (0.10 sec)

In case you want to delete duplicate rows and keep the lowest id, you can use the following statement:

DELETE c1 FROM contacts c1
INNER JOIN contacts c2 
WHERE
    c1.id > c2.id AND 
    c1.email = c2.email;

   

B) Delete duplicate rows using an intermediate table

The following shows the steps for removing duplicate rows using an intermediate table:

    1. Create a new table with the structure the same as the original table that you want to delete duplicate rows.

    2. Insert distinct rows from the original table to the immediate table.

    3. Insert distinct rows from the original table to the immediate table.

 

Step 1. Create a new table whose structure is the same as the original table:

CREATE TABLE source_copy LIKE source;

Step 2. Insert distinct rows from the original table to the new table:

INSERT INTO source_copy
SELECT * FROM source
GROUP BY col; -- column that has duplicate values

Step 3. drop the original table and rename the immediate table to the original one

DROP TABLE source;
ALTER TABLE source_copy RENAME TO source;

Source: http://www.mysqltutorial.org/mysql-delete-duplicate-rows/

Solution 7 - Sql

Remove all duplicates, but the very first ones (with min ID)

should work equally in other SQL servers, like Postgres:

DELETE FROM table
WHERE id NOT IN (
   select min(id) from table
   group by col1, col2, col3, col4, col5, col6, col7
)

Solution 8 - Sql

Please see the below way of deletion too.

Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values 
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)

Created a sample table named @table and loaded it with given data.

enter image description here

Delete	aliasName from (
Select	*,
		ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From	@table) aliasName 
Where	rowNumber > 1

Select * from @table

enter image description here

Note: If you are giving all columns in the Partition by part, then order by do not have much significance.

I know, the question is asked three years ago, and my answer is another version of what Tim has posted, But posting just incase it is helpful for anyone.

Solution 9 - Sql

It can be done by many ways in sql server the most simplest way to do so is: Insert the distinct rows from the duplicate rows table to new temporary table. Then delete all the data from duplicate rows table then insert all data from temporary table which has no duplicates as shown below.

select distinct * into #tmp From table
   delete from table
   insert into table
   select * from #tmp drop table #tmp

   select * from table

Delete duplicate rows using Common Table Expression(CTE)

With CTE_Duplicates as 
(select id,name , row_number() 
over(partition by id,name order by id,name ) rownumber  from table  ) 
delete from CTE_Duplicates where rownumber!=1

Solution 10 - Sql

Microsoft has a vey ry neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444

In brief, here is the easiest way to delete duplicates when you have just a few rows to delete:

SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;

myprimarykey is the identifier for the row.

I set rowcount to 1 because I only had two rows that were duplicated. If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1.

Solution 11 - Sql

Try to Use:

SELECT linkorder
	,Row_Number() OVER (
		PARTITION BY linkorder ORDER BY linkorder DESC
		) AS RowNum
FROM u_links

enter image description here

Solution 12 - Sql

After trying the suggested solution above, that works for small medium tables. I can suggest that solution for very large tables. since it runs in iterations.

  1. Drop all dependency views of the LargeSourceTable

  2. you can find the dependecies by using sql managment studio, right click on the table and click "View Dependencies"

  3. Rename the table:

  4. sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp'; GO

  5. Create the LargeSourceTable again, but now, add a primary key with all the columns that define the duplications add WITH (IGNORE_DUP_KEY = ON)

  6. For example:

    CREATE TABLE [dbo].[LargeSourceTable] ( ID int IDENTITY(1,1), [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL, [Column1] CHAR (36) NOT NULL, [Column2] NVARCHAR (100) NOT NULL, [Column3] CHAR (36) NOT NULL, PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON) ); GO

  7. Create again the views that you dropped in the first place for the new created table

  8. Now, Run the following sql script, you will see the results in 1,000,000 rows per page, you can change the row number per page to see the results more often.

  9. Note, that I set the IDENTITY_INSERT on and off because one the columns contains auto incremental id, which I'm also copying

SET IDENTITY_INSERT LargeSourceTable ON DECLARE @PageNumber AS INT, @RowspPage AS INT DECLARE @TotalRows AS INT declare @dt varchar(19) SET @PageNumber = 0 SET @RowspPage = 1000000 select @TotalRows = count (*) from LargeSourceTable_TEMP

While ((@PageNumber - 1) * @RowspPage < @TotalRows )
Begin
	begin transaction tran_inner
		; with cte as
		(
			SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
			OFFSET ((@PageNumber) * @RowspPage) ROWS
			FETCH NEXT @RowspPage ROWS ONLY
		)

		INSERT INTO LargeSourceTable 
		(
			 ID 					
			,[CreateDate]       
			,[Column1]   
			,[Column2] 
			,[Column3]       
		)		
		select 
			 ID 					
			,[CreateDate]       
			,[Column1]   
			,[Column2] 
			,[Column3]       
		from cte
		
	commit transaction tran_inner

	PRINT 'Page: ' + convert(varchar(10), @PageNumber)
	PRINT 'Transfered: ' + convert(varchar(20), @PageNumber * @RowspPage)
	PRINT 'Of: ' + convert(varchar(20), @TotalRows)

	SELECT @dt = convert(varchar(19), getdate(), 121)
	RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
	SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF

Solution 13 - Sql

To delete the duplicate rows from the table in SQL Server, you follow these steps:

  1. Find duplicate rows using GROUP BY clause or ROW_NUMBER() function.
  2. Use DELETE statement to remove the duplicate rows.

Setting up a sample table

DROP TABLE IF EXISTS contacts;

CREATE TABLE contacts(
    contact_id INT IDENTITY(1,1) PRIMARY KEY,
    first_name NVARCHAR(100) NOT NULL,
    last_name NVARCHAR(100) NOT NULL,
    email NVARCHAR(255) NOT NULL,
);

Insert values

INSERT INTO contacts
    (first_name,last_name,email) 
VALUES
    ('Syed','Abbas','[email protected]'),
    ('Catherine','Abel','[email protected]'),
    ('Kim','Abercrombie','[email protected]'),
    ('Kim','Abercrombie','[email protected]'),
    ('Kim','Abercrombie','[email protected]'),
    ('Hazem','Abolrous','[email protected]'),
    ('Hazem','Abolrous','[email protected]'),
    ('Humberto','Acevedo','[email protected]'),
    ('Humberto','Acevedo','[email protected]'),
    ('Pilar','Ackerman','[email protected]');

enter image description here

Query

	SELECT 
   contact_id, 
   first_name, 
   last_name, 
   email
FROM 
   contacts;

Delete duplicate rows from a table

   WITH cte AS (
    SELECT 
        contact_id, 
        first_name, 
        last_name, 
        email, 
        ROW_NUMBER() OVER (
            PARTITION BY 
                first_name, 
                last_name, 
                email
            ORDER BY 
                first_name, 
                last_name, 
                email
        ) row_num
     FROM 
        contacts
)
DELETE FROM cte
WHERE row_num > 1;

Should delete the record now

enter image description here

Solution 14 - Sql

with myCTE
as

(
select productName,ROW_NUMBER() over(PARTITION BY productName order by slno) as Duplicate from productDetails
)
Delete from myCTE where Duplicate>1

Solution 15 - Sql

-- this query will keep only one instance of a duplicate record.
;WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3-- based on what? --can be multiple columns
                                       ORDER BY ( SELECT 0)) RN
         FROM   Mytable)
         
    
         
delete  FROM cte
WHERE  RN > 1

Solution 16 - Sql

You need to group by the duplicate records according to the field(s), then hold one of the records and delete the rest. For example:

DELETE prg.Person WHERE Id IN (
SELECT dublicateRow.Id FROM
(
select MIN(Id) MinId, NationalCode
 from  prg.Person group by NationalCode  having count(NationalCode ) > 1
 ) GroupSelect
 JOIN  prg.Person dublicateRow ON dublicateRow.NationalCode = GroupSelect.NationalCode 
 WHERE dublicateRow.Id <> GroupSelect.MinId)

Solution 17 - Sql

Deleting duplicates from a huge(several millions of records) table might take long time . I suggest that you do a bulk insert into a temp table of the selected rows rather than deleting.

--REWRITING YOUR CODE(TAKE NOTE OF THE 3RD LINE) WITH CTE AS(SELECT NAME,ROW_NUMBER() 
OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB) SELECT * INTO #unique_records FROM 
CTE WHERE ID =1;

Solution 18 - Sql

This might help in your case

DELETE t1 FROM table t1 INNER JOIN table t2 WHERE t1.id > t2.id AND t1.col1 = t2.col1 

Solution 19 - Sql

With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server

The idea of removing duplicate involves

  • a) Protecting those rows that are not duplicate
  • b) Retain one of the many rows that qualified together as duplicate.

Step-by-step

    1. First identify the rows those satisfy the definition of duplicate and insert them into temp table, say #tableAll .
    1. Select non-duplicate(single-rows) or distinct rows into temp table say #tableUnique.
    1. Delete from source table joining #tableAll to delete the duplicates.
    1. Insert into source table all the rows from #tableUnique.
    1. Drop #tableAll and #tableUnique

Solution 20 - Sql

If you have the ability to add a column to the table temporarily, this was a solution that worked for me:

ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)

Then perform a DELETE using a combination of MIN and GROUP BY

DELETE b
FROM dbo.DUPPEDTABLE b
WHERE b.RowID NOT IN (
                     SELECT MIN(RowID) AS RowID
                     FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
                     GROUP BY a.ITEM_NUMBER,
                              a.CHARACTERISTIC,
                              a.INTVALUE,
                              a.FLOATVALUE,
                              a.STRINGVALUE
                 );

Verify that the DELETE performed correctly:

SELECT a.ITEM_NUMBER,
    a.CHARACTERISTIC,
    a.INTVALUE,
    a.FLOATVALUE,
    a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
    a.CHARACTERISTIC,
    a.INTVALUE,
    a.FLOATVALUE,
    a.STRINGVALUE
ORDER BY COUNT(*) DESC 

The result should have no rows with a count greater than 1. Finally, remove the rowid column:

ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;

Solution 21 - Sql

Oh wow, i feel so stupid by ready all this answers, they are like experts' answer with all CTE and temp table and etc.

And all I did to get it working was simply aggregated the ID column by using MAX.

DELETE FROM table WHERE col1 IN (
    SELECT MAX(id) FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
)

NOTE: you might need to run it multiple time to remove duplicate as this will only delete one set of duplicate rows at a time.

Solution 22 - Sql

Another way of removing dublicated rows without loosing information in one step is like following:

delete from dublicated_table t1 (nolock)
join (
	select t2.dublicated_field
    , min(len(t2.field_kept)) as min_field_kept
	from dublicated_table t2 (nolock)
	group by t2.dublicated_field having COUNT(*)>1
) t3 
on t1.dublicated_field=t3.dublicated_field 
	and len(t1.field_kept)=t3.min_field_kept

Solution 23 - Sql

DECLARE @TB TABLE(NAME VARCHAR(100));
INSERT INTO @TB VALUES ('Red'),('Red'),('Green'),('Blue'),('White'),('White')
--**Delete by Rank**
;WITH CTE AS(SELECT NAME,DENSE_RANK() OVER (PARTITION BY NAME ORDER BY NEWID()) ID FROM @TB)
DELETE FROM CTE WHERE ID>1
SELECT NAME FROM @TB;
--**Delete by Row Number** 
;WITH CTE AS(SELECT NAME,ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
DELETE FROM CTE WHERE ID>1;
SELECT NAME FROM @TB;

Solution 24 - Sql

DELETE FROM TBL1  WHERE ID  IN
(SELECT ID FROM TBL1  a WHERE ID!=
(select MAX(ID) from TBL1  where DUPVAL=a.DUPVAL 
group by DUPVAL
having count(DUPVAL)>1))

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionFearghalView Question on Stackoverflow
Solution 1 - SqlTim SchmelterView Answer on Stackoverflow
Solution 2 - SqlShamseer KView Answer on Stackoverflow
Solution 3 - SqlAamirView Answer on Stackoverflow
Solution 4 - SqlRhysView Answer on Stackoverflow
Solution 5 - SqlShoja HamidView Answer on Stackoverflow
Solution 6 - SqlMahdi BashirpourView Answer on Stackoverflow
Solution 7 - SqlepoxView Answer on Stackoverflow
Solution 8 - SqlJithin ShajiView Answer on Stackoverflow
Solution 9 - SqlMd Masududzaman KhanView Answer on Stackoverflow
Solution 10 - SqloabarcaView Answer on Stackoverflow
Solution 11 - SqlFezal halaiView Answer on Stackoverflow
Solution 12 - SqlMoshe TaiebView Answer on Stackoverflow
Solution 13 - SqlSan JaisyView Answer on Stackoverflow
Solution 14 - SqlDebendra DashView Answer on Stackoverflow
Solution 15 - SqlHasan ShoumanView Answer on Stackoverflow
Solution 16 - SqlHadi SalehyView Answer on Stackoverflow
Solution 17 - SqlEmmanuel BullView Answer on Stackoverflow
Solution 18 - SqlAnkit JindalView Answer on Stackoverflow
Solution 19 - SqlrajibdotnetView Answer on Stackoverflow
Solution 20 - Sqlj.hullView Answer on Stackoverflow
Solution 21 - Sqlmessed-upView Answer on Stackoverflow
Solution 22 - SqlTolga GölelçinView Answer on Stackoverflow
Solution 23 - SqlSurinder SinghView Answer on Stackoverflow
Solution 24 - SqlDr.StarkView Answer on Stackoverflow