Delete all Duplicate Rows except for One in MySQL?

MysqlSqlDuplicates

Mysql Problem Overview


How would I delete all duplicate data from a MySQL Table?

For example, with the following data:

SELECT * FROM names;

+----+--------+
| id | name   |
+----+--------+
| 1  | google |
| 2  | yahoo  |
| 3  | msn    |
| 4  | google |
| 5  | google |
| 6  | yahoo  |
+----+--------+

I would use SELECT DISTINCT name FROM names; if it were a SELECT query.

How would I do this with DELETE to only remove duplicates and keep just one record of each?

Mysql Solutions


Solution 1 - Mysql

>Editor warning: This solution is computationally inefficient and may bring down your connection for a large table.

NB - You need to do this first on a test copy of your table!

When I did it, I found that unless I also included AND n1.id <> n2.id, it deleted every row in the table.

  1. If you want to keep the row with the lowest id value:

     DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
    
  2. If you want to keep the row with the highest id value:

     DELETE n1 FROM names n1, names n2 WHERE n1.id < n2.id AND n1.name = n2.name
    

I used this method in MySQL 5.1

Not sure about other versions.


Update: Since people Googling for removing duplicates end up here
Although the OP's question is about DELETE, please be advised that using INSERT and DISTINCT is much faster. For a database with 8 million rows, the below query took 13 minutes, while using DELETE, it took more than 2 hours and yet didn't complete.

INSERT INTO tempTableName(cellId,attributeId,entityRowId,value)
    SELECT DISTINCT cellId,attributeId,entityRowId,value
    FROM tableName;

Solution 2 - Mysql

If you want to keep the row with the lowest id value:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MIN(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

If you want the id value that is the highest:

DELETE FROM NAMES
 WHERE id NOT IN (SELECT * 
                    FROM (SELECT MAX(n.id)
                            FROM NAMES n
                        GROUP BY n.name) x)

The subquery in a subquery is necessary for MySQL, or you'll get a 1093 error.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionHighway of LifeView Question on Stackoverflow
Solution 1 - Mysqlmartin.masaView Answer on Stackoverflow
Solution 2 - MysqlOMG PoniesView Answer on Stackoverflow