Removing duplicate rows from table in Oracle

Sql, Oracle, Duplicates, Delete Row

Sql Problem Overview


I'm testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.

How can I delete all duplicate rows and leave only one of them?

Sql Solutions


Solution 1 - Sql

Use the rowid pseudocolumn.

DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);

Here column1, column2, and column3 make up the identifying key for each record. If the rows are duplicated in full, list all of the table's columns.
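
Before running the delete, it can help to preview how many rows it would remove, and afterwards to create the key the question was after. This is only a sketch using the same placeholder names as above; the constraint name your_table_pk is made up for illustration, and a primary key also requires that none of the key columns contain NULL.

```sql
-- Preview: how many rows the DELETE above would remove.
SELECT COUNT(*) AS rows_to_delete
FROM your_table
WHERE rowid NOT IN
      (SELECT MIN(rowid)
       FROM your_table
       GROUP BY column1, column2, column3);

-- After the DELETE (and a COMMIT) the key columns are unique,
-- so the primary key can be created.
-- your_table_pk is a hypothetical constraint name.
ALTER TABLE your_table
  ADD CONSTRAINT your_table_pk PRIMARY KEY (column1, column2, column3);
```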

Solution 2 - Sql

From Ask Tom

delete from t
 where rowid IN ( select rid
                    from (select rowid rid, 
                                 row_number() over (partition by 
                         companyid, agentid, class , status, terminationdate
                                   order by rowid) rn
                            from t)
                   where rn <> 1);


Solution 3 - Sql

From DevX.com:

DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3...) ;

Where column1, column2, etc. are the key columns you want to use.

Solution 4 - Sql

DELETE FROM tablename a
      WHERE a.ROWID > ANY (SELECT b.ROWID
                             FROM tablename b
                            WHERE a.fieldname = b.fieldname
                              AND a.fieldname2 = b.fieldname2)

Solution 5 - Sql

#Solution 1)

delete from emp
where rowid not in
(select max(rowid) from emp group by empno);

#Solution 2)

delete from emp where rowid in
               (
                 select rid from
                  (
                    select rowid rid,
                      row_number() over(partition by empno order by empno) rn
                      from emp
                  )
                where rn > 1
               );

#Solution 3)

delete from emp e1
         where rowid not in
          (select max(rowid) from emp e2
           where e1.empno = e2.empno ); 

Solution 6 - Sql

create table t2 as select distinct * from t1;
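
The copy still has to take the place of the original table. One possible way to finish the job is sketched below, assuming t1 is the original table; t1_old is a spare name chosen here for illustration. As far as I know, CREATE TABLE ... AS SELECT does not carry over indexes, grants, or constraints other than NOT NULL, so those would need to be recreated.

```sql
-- Build the deduplicated copy (only rows identical in every column collapse).
CREATE TABLE t2 AS SELECT DISTINCT * FROM t1;

-- Swap the tables; t1_old is a hypothetical backup name.
ALTER TABLE t1 RENAME TO t1_old;
ALTER TABLE t2 RENAME TO t1;

-- Recreate indexes, constraints, and grants on the new t1 as needed,
-- then drop the backup once the result has been checked:
-- DROP TABLE t1_old;
```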

Solution 7 - Sql

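You could write a small PL/SQL block that uses a cursor FOR loop and deletes the rows you don't want to keep. For instance:

declare
  prev_var my_table.var1%TYPE; -- stays NULL until the first row is read

begin

  -- rows come back sorted, so equal values of var1 are adjacent
  for t in (select rowid as rid, var1 from my_table order by var1) LOOP

    -- if previous var equals current var, delete the row, else keep on going.
    if t.var1 = prev_var then
      delete from my_table where rowid = t.rid;
    else
      prev_var := t.var1;
    end if;
  end loop;

end;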

Solution 8 - Sql

To select only the duplicates, the query can take this form:

SELECT GroupFunction(column1), GroupFunction(column2),..., 
COUNT(column1), column1, column2...
FROM our_table
GROUP BY column1, column2, column3...
HAVING COUNT(column1) > 1
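
As a concrete instance of the template above, here is a sketch that assumes duplicates are defined by column1 and column2 alone. It uses COUNT(*) so that rows with NULL in column1 are counted too.

```sql
-- One output row per duplicated key, with the number of copies found.
SELECT column1, column2, COUNT(*) AS copies
FROM our_table
GROUP BY column1, column2
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```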

So the correct delete, as suggested in another answer, is:

DELETE FROM tablename a
      WHERE a.ROWID > ANY (SELECT b.ROWID
                             FROM tablename b
                            WHERE a.fieldname = b.fieldname
                              AND a.fieldname2 = b.fieldname2
                              AND ....so on.. to identify the duplicate rows....)

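This query keeps the row with the lowest ROWID in each group of duplicates, for the criteria chosen in the WHERE clause. That is often the oldest record, but ROWID order does not guarantee insertion order.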


Solution 9 - Sql

create table abcd(id number(10),name varchar2(20));

insert into abcd values(1,'abc');

insert into abcd values(2,'pqr');

insert into abcd values(3,'xyz');

insert into abcd values(1,'abc');

insert into abcd values(2,'pqr');

insert into abcd values(3,'xyz');


select * from abcd
id  Name
1	abc
2	pqr
3	xyz
1	abc
2	pqr
3	xyz

Delete the duplicate records but keep one copy of each distinct record in the table:

DELETE 
FROM abcd a
WHERE ROWID > (SELECT MIN(ROWID) FROM abcd b
WHERE b.id=a.id
);

Running the query above deletes 3 rows.

select * from abcd

id  Name 
1	abc
2	pqr
3	xyz

Solution 10 - Sql

This blog post was really helpful for general cases:

> If the rows are fully duplicated (all values in all columns can have copies) there are no columns to use! But to keep one you still need a unique identifier for each row in each group. Fortunately, Oracle already has something you can use. The rowid. All rows in Oracle have a rowid. This is a physical locator. That is, it states where on disk Oracle stores the row. This is unique to each row. So you can use this value to identify and remove copies. To do this, replace min() with min(rowid) in the uncorrelated delete:

delete films
where  rowid not in (
  select min(rowid)
  from   films
  group  by title, uk_release_date
)

Solution 11 - Sql

The Fastest way for really big tables

  1. Create exception table with structure below: exceptions_table

    ROW_ID ROWID
    OWNER VARCHAR2(30)
    TABLE_NAME VARCHAR2(30)
    CONSTRAINT VARCHAR2(30)
    
  2. Try to create a unique constraint or primary key that the duplicates will violate. You will get an error message because of the duplicates, and the exceptions table will then contain the rowids of all rows that violate the constraint. (The table name original_dups is taken from step 3; the constraint name dups_uk is a placeholder.)

    alter table original_dups add constraint dups_uk
    unique --or primary key
    (dupfield1,dupfield2) exceptions into exceptions_table;
    
  3. Join your table with exceptions_table by rowid and delete the dups

    delete original_dups where rowid in (select ROW_ID from exceptions_table);
    
  4. If the number of rows to delete is large, create a new table instead (with all grants and indexes) by anti-joining with exceptions_table on rowid, then rename the original table to original_dups and rename new_table_with_no_dups to the original table name

    create table new_table_with_no_dups AS (
        select field1, field2 ........ 
        from original_dups t1
        where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
    )
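
One caveat with steps 3 and 4: as far as I know, EXCEPTIONS INTO records every row of a violating group, including the copy you want to keep, so deleting everything listed would remove all copies. Below is a hedged sketch of a delete that spares one row per group, assuming the table and column names used in the steps above.

```sql
-- Delete the rows listed in exceptions_table, except the lowest rowid
-- of each (dupfield1, dupfield2) group, which is kept.
DELETE FROM original_dups
WHERE rowid IN (SELECT row_id FROM exceptions_table)
  AND rowid NOT IN (SELECT MIN(rowid)
                    FROM original_dups
                    GROUP BY dupfield1, dupfield2);
```

The same extra condition would be needed in the anti-join of step 4, otherwise the new table ends up with none of the duplicated rows.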
    

Solution 12 - Sql

Using rowid-

delete from emp
 where rowid not in
 (select max(rowid) from emp group by empno);

Using a correlated subquery-

delete from emp e1
 where rowid not in
 (select max(rowid) from emp e2
 where e1.empno = e2.empno );

Solution 13 - Sql

Solution 4)

 delete from emp where rowid in
            (
             select rid from
                (
                  select rowid rid,
                  dense_rank() over(partition by empno order by rowid
                ) rn
             from emp
            )
 where rn > 1
);

Solution 14 - Sql

1. solution

delete from emp
    where rowid not in
    (select max(rowid) from emp group by empno);

2. solution

delete from emp where rowid in
               (
                 select rid from
                  (
                    select rowid rid,
                      row_number() over(partition by empno order by empno) rn
                      from emp
                  )
                where rn > 1
               );

3. solution

delete from emp e1
         where rowid not in
          (select max(rowid) from emp e2
           where e1.empno = e2.empno ); 

4. solution

 delete from emp where rowid in
            (
             select rid from
                (
                  select rowid rid,
                  dense_rank() over(partition by empno order by rowid
                ) rn
             from emp
            )
 where rn > 1
);

Solution 15 - Sql

5. solution

delete from emp where rowid in 
    (
      select  rid from
       (
         select rowid rid,rank() over (partition by emp_id order by rowid)rn from emp     
       )
     where rn > 1
    );

Solution 16 - Sql

DELETE from table_name where rowid not in (select min(rowid) FROM table_name group by column_name);

You can also delete duplicate records another way:

DELETE from table_name a where rowid > (select min(rowid) FROM table_name b where a.column_name=b.column_name);

Solution 17 - Sql

DELETE FROM tableName WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM tableName GROUP BY columnname);

Solution 18 - Sql

delete from dept
where rowid in (
     select rowid
     from dept
     minus
     select max(rowid)
     from dept
     group by DEPTNO, DNAME, LOC
);

Solution 19 - Sql

For better performance, here is what I wrote:

DELETE FROM your_table
WHERE rowid IN 
  (select t1.rowid from your_table  t1
      LEFT OUTER JOIN (
      -- ROWID is a reserved word, so the kept rowid gets its own alias
      SELECT MIN(rowid) as min_rid, column1,column2, column3
      FROM your_table 
      GROUP BY column1, column2, column3
  )  co1 ON (t1.rowid = co1.min_rid)
  WHERE co1.min_rid IS NULL
);

Solution 20 - Sql

Check the scripts below.

1. Create the table:

Create table test(id int,sal int); 

2. Insert the sample data:

    insert into test values(1,100);    
    insert into test values(1,100);    
    insert into test values(2,200);    
    insert into test values(2,200);    
    insert into test values(3,300);    
    insert into test values(3,300);    
    commit;

3. Check the contents:

 select * from test;    

You will see 6 records here.

4. Run the query below (the rowid is aliased as rid so the outer query can select it from the inline view):

delete from 
   test
where rowid in
 (select rid from 
   (select 
     rowid rid,
     row_number()
    over 
     (partition by id order by sal) dup
    from test)
  where dup > 1);

5. select * from test;

You will see that the duplicate records have been deleted.
Hope this solves your query. Thanks :)

Solution 21 - Sql

I didn't see any answers that use common table expressions and window functions. This is what I find easiest to work with.

DELETE FROM
 YourTable
WHERE
 ROWID IN
    (WITH Duplicates
          AS (SELECT
               ROWID RID, 
               ROW_NUMBER() 
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID)
                  AS RN,
               SUM(1)
               OVER(
               PARTITION BY First_Name, Last_Name, Birth_Date
               ORDER BY ROWID ROWS BETWEEN UNBOUNDED PRECEDING 
                                       AND UNBOUNDED FOLLOWING)
                   AS CNT
              FROM
               YourTable
              WHERE
               Load_Date IS NULL)
     SELECT
      RID
     FROM
      duplicates
     WHERE
      RN > 1);

Some things to note:

  1. We are only checking for duplication on the fields in the partition clause.

  2. If you have some reason to pick one duplicate over the others, you can change the ORDER BY clause of ROW_NUMBER() so that the row you want to keep gets row_number() = 1.

  3. You can change the number of duplicates preserved by changing the final where clause to "Where RN > N" with N >= 1 (I was thinking N = 0 would delete all rows that have duplicates, but it would just delete all rows).

  4. The CTE also has a SUM window column, CNT, which tags each row with the number of rows in its group. So to select rows that have duplicates, including the first item, use "WHERE cnt > 1".
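
To illustrate note 2, here is a sketch that keeps the most recently updated row of each group. Updated_At is a hypothetical column that is not part of the original table; substitute whatever column expresses your preference.

```sql
DELETE FROM YourTable
WHERE ROWID IN (
    SELECT RID
    FROM (SELECT ROWID AS RID,
                 -- The preferred row sorts first and gets RN = 1;
                 -- ROWID breaks ties so the result is deterministic.
                 ROW_NUMBER() OVER (
                     PARTITION BY First_Name, Last_Name, Birth_Date
                     ORDER BY Updated_At DESC NULLS LAST, ROWID) AS RN
          FROM YourTable)
    WHERE RN > 1);
```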

Solution 22 - Sql

solution :

delete from emp where rowid in
(
    select rid from
    (
        select rowid rid,
        row_number() over(partition by empno order by empno) rn
        from emp
    )
    where rn > 1
);

Solution 23 - Sql

create or replace procedure delete_duplicate_enq as
  	cursor c1 is
  	select *
  	from enquiry;
begin
  	for z in c1 loop
   		delete enquiry
   		where enquiry.enquiryno = z.enquiryno
   		and rowid > any
   		(select rowid
   		from enquiry
   		where enquiry.enquiryno = z.enquiryno);
  	end loop;
 end delete_duplicate_enq;

Solution 24 - Sql

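This is similar to the top answer but gives me a much better explain plan. Note that each run removes only one row per duplicate group, so it has to be run again if any group has more than two copies: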

delete from your_table
 where rowid in (
        select max(rowid)
          from your_table
         group by column1, column2, column3
        having count(*) > 1
       );
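
One way to run the statement repeatedly until no duplicates remain is a small PL/SQL loop. This is a minimal sketch, assuming the same placeholder table and column names.

```sql
BEGIN
  LOOP
    -- Each pass removes one surplus row (the highest rowid)
    -- from every group that still has duplicates.
    DELETE FROM your_table
     WHERE rowid IN (SELECT MAX(rowid)
                       FROM your_table
                      GROUP BY column1, column2, column3
                     HAVING COUNT(*) > 1);
    -- Stop once a pass finds nothing left to delete.
    EXIT WHEN SQL%ROWCOUNT = 0;
  END LOOP;
  COMMIT;
END;
/
```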

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | juan | View Question on Stackoverflow
Solution 1 - Sql | Bill the Lizard | View Answer on Stackoverflow
Solution 2 - Sql | Dead Programmer | View Answer on Stackoverflow
Solution 3 - Sql | Mark | View Answer on Stackoverflow
Solution 4 - Sql | user187624 | View Answer on Stackoverflow
Solution 5 - Sql | DoOrDie | View Answer on Stackoverflow
Solution 6 - Sql | Mohammed khaled | View Answer on Stackoverflow
Solution 7 - Sql | Nick | View Answer on Stackoverflow
Solution 8 - Sql | user1799846 | View Answer on Stackoverflow
Solution 9 - Sql | Krunal Patel | View Answer on Stackoverflow
Solution 10 - Sql | ala | View Answer on Stackoverflow
Solution 11 - Sql | user2158672 | View Answer on Stackoverflow
Solution 12 - Sql | Dnyaneshwar Tandale | View Answer on Stackoverflow
Solution 13 - Sql | DoOrDie | View Answer on Stackoverflow
Solution 14 - Sql | user5906974 | View Answer on Stackoverflow
Solution 15 - Sql | DoOrDie | View Answer on Stackoverflow
Solution 16 - Sql | Md Wasi | View Answer on Stackoverflow
Solution 17 - Sql | JgSudhakar | View Answer on Stackoverflow
Solution 18 - Sql | user3655760 | View Answer on Stackoverflow
Solution 19 - Sql | Enguerrand JORE | View Answer on Stackoverflow
Solution 20 - Sql | Rakesh Roshan | View Answer on Stackoverflow
Solution 21 - Sql | Darrel Lee | View Answer on Stackoverflow
Solution 22 - Sql | sandeep gupta | View Answer on Stackoverflow
Solution 23 - Sql | Ashish sinha | View Answer on Stackoverflow
Solution 24 - Sql | Howd | View Answer on Stackoverflow