Finding duplicate values in MySQL

Mysql

Mysql Problem Overview


I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?

Mysql Solutions


Solution 1 - Mysql

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

Solution 2 - Mysql

SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING COUNT(*) > 1;

Solution 3 - Mysql

SELECT  *
FROM    mytable mto
WHERE   EXISTS
        (
        SELECT  1
        FROM    mytable mti
        WHERE   mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
        )

This query returns complete records, not just distinct varchar_column's.

This query doesn't use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the whole COUNT(*), you just need to know if there are two rows with same value.

This is achieved by the LIMIT 1, 1 at the bottom of the correlated query (essentially meaning "return the second row"). EXISTS would only return true if the aforementioned second row exists (i. e. there are at least two rows with the same value of varchar_column) .

Having an index on varchar_column will, of course, speed up this query greatly.

Solution 4 - Mysql

Building off of levik's answer to get the IDs of the duplicate rows you can do a GROUP_CONCAT if your server supports it (this will return a comma separated list of ids).

SELECT GROUP_CONCAT(id), name, COUNT(*) c
FROM documents
GROUP BY name
HAVING c > 1;

Solution 5 - Mysql

to get all the data that contains duplication i used this:

SELECT * FROM TableName INNER JOIN(
  SELECT DupliactedData FROM TableName GROUP BY DupliactedData HAVING COUNT(DupliactedData) > 1 order by DupliactedData)
  temp ON TableName.DupliactedData = temp.DupliactedData;

TableName = the table you are working with.

DupliactedData = the duplicated data you are looking for.

Solution 6 - Mysql

Assuming your table is named TableABC and the column which you want is Col and the primary key to T1 is Key.

SELECT a.Key, b.Key, a.Col 
FROM TableABC a, TableABC b
WHERE a.Col = b.Col 
AND a.Key <> b.Key

The advantage of this approach over the above answer is it gives the Key.

Solution 7 - Mysql

Taking @maxyfc's answer further, I needed to find all of the rows that were returned with the duplicate values, so I could edit them in MySQL Workbench:

SELECT * FROM table
   WHERE field IN (
     SELECT field FROM table GROUP BY field HAVING count(*) > 1
   ) ORDER BY field

Solution 8 - Mysql

SELECT * 
FROM `dps` 
WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)

Solution 9 - Mysql

To find how many records are duplicates in name column in Employee, the query below is helpful;

Select name from employee group by name having count(*)>1;

Solution 10 - Mysql

My final query incorporated a few of the answers here that helped - combining group by, count & GROUP_CONCAT.

SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c 
FROM product_variant 
GROUP BY `magento_simple` HAVING c > 1;

This provides the id of both examples (comma separated), the barcode I needed, and how many duplicates.

Change table and columns accordingly.

Solution 11 - Mysql

I am not seeing any JOIN approaches, which have many uses in terms of duplicates.

This approach gives you actual doubled results.

SELECT t1.* FROM my_table as t1 
LEFT JOIN my_table as t2 
ON t1.name=t2.name and t1.id!=t2.id 
WHERE t2.id IS NOT NULL 
ORDER BY t1.name

Solution 12 - Mysql

I saw the above result and query will work fine if you need to check single column value which are duplicate. For example email.

But if you need to check with more columns and would like to check the combination of the result so this query will work fine:

SELECT COUNT(CONCAT(name,email)) AS tot,
       name,
       email
FROM users
GROUP BY CONCAT(name,email)
HAVING tot>1 (This query will SHOW the USER list which ARE greater THAN 1
              AND also COUNT)

Solution 13 - Mysql

I prefer to use windowed functions(MySQL 8.0+) to find duplicates because I could see entire row:

WITH cte AS (
  SELECT *
    ,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
    ,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
  FROM table
)
SELECT *
FROM cte
WHERE num_of_duplicates_group > 1;

DB Fiddle Demo

Solution 14 - Mysql

SELECT t.*,(select count(*) from city as tt
  where tt.name=t.name) as count
  FROM `city` as t
  where (
     select count(*) from city as tt
     where tt.name=t.name
  ) > 1 order by count desc

Replace city with your Table. Replace name with your field name

Solution 15 - Mysql

SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1

Solution 16 - Mysql

SELECT 
    t.*,
    (SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count 
FROM `city` AS t 
WHERE 
    (SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC

Solution 17 - Mysql

CREATE TABLE tbl_master
	(`id` int, `email` varchar(15));

INSERT INTO tbl_master
	(`id`, `email`) VALUES
	(1, '[email protected]'),
	(2, '[email protected]'),
	(3, '[email protected]'),
	(4, '[email protected]'),
	(5, '[email protected]');

QUERY : SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1)

Solution 18 - Mysql

The following will find all product_id that are used more than once. You only get a single record for each product_id.

SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1

Code taken from : http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html

Solution 19 - Mysql

I improved from this:

SELECT 
    col, 
    COUNT(col)
FROM
    table_name
GROUP BY col
HAVING COUNT(col) > 1; 

Solution 20 - Mysql

SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;

Solution 21 - Mysql

Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1

Solution 22 - Mysql

For removing duplicate rows with multiple fields , first cancate them to the new unique key which is specified for the only distinct rows, then use "group by" command to removing duplicate rows with the same new unique key:

Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;

Solution 23 - Mysql

One very late contribution... in case it helps anyone waaaaaay down the line... I had a task to find matching pairs of transactions (actually both sides of account-to-account transfers) in a banking app, to identify which ones were the 'from' and 'to' for each inter-account-transfer transaction, so we ended up with this:

SELECT 
    LEAST(primaryid, secondaryid) AS transactionid1,
    GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
    SELECT table1.transactionid AS primaryid, 
        table2.transactionid AS secondaryid
    FROM financial_transactions table1
    INNER JOIN financial_transactions table2 
    ON table1.accountid = table2.accountid
    AND table1.transactionid <> table2.transactionid 
    AND table1.transactiondate = table2.transactiondate
    AND table1.sourceref = table2.destinationref
    AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;

The result is that the DuplicateResultsTable provides rows containing matching (i.e. duplicate) transactions, but it also provides the same transaction id's in reverse the second time it matches the same pair, so the outer SELECT is there to group by the first transaction ID, which is done by using LEAST and GREATEST to make sure the two transactionid's are always in the same order in the results, which makes it safe to GROUP by the first one, thus eliminating all the duplicate matches. Ran through nearly a million records and identified 12,000+ matches in just under 2 seconds. Of course the transactionid is the primary index, which really helped.

Solution 24 - Mysql

If you want to remove duplicate use DISTINCT

Otherwise use this query:

SELECT users.*,COUNT(user_ID) as user FROM users GROUP BY user_name HAVING user > 1;

Solution 25 - Mysql

As a variation on Levik's answer that allows you to find also the ids of the duplicate results, I used the following:

SELECT * FROM table1 WHERE column1 IN (SELECT column1 AS duplicate_value FROM table1 GROUP BY column1 HAVING COUNT(*) > 1)

Solution 26 - Mysql

Thanks to @novocaine for his great answer and his solution worked for me. I altered it slightly to include a percentage of the recurring values, which was needed in my case. Below is the altered version. It reduces the percentage to two decimal places. If you change the ,2 to 0, it will display no decimals, and to 1, then it will display one decimal place, and so on.

SELECT GROUP_CONCAT(id), name, COUNT(*) c, 
COUNT(*) OVER() AS totalRecords, 
CONCAT(FORMAT(COUNT(*)/COUNT(*) OVER()*100,2),'%') as recurringPecentage
FROM table
GROUP BY name
HAVING c > 1

Solution 27 - Mysql

Try using this query:

SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJon TackaburyView Question on Stackoverflow
Solution 1 - MysqllevikView Answer on Stackoverflow
Solution 2 - MysqlmaxyfcView Answer on Stackoverflow
Solution 3 - MysqlQuassnoiView Answer on Stackoverflow
Solution 4 - MysqlMatt R.View Answer on Stackoverflow
Solution 5 - MysqludiView Answer on Stackoverflow
Solution 6 - MysqlTechTravelThinkView Answer on Stackoverflow
Solution 7 - MysqlAbsoluteĈµERØView Answer on Stackoverflow
Solution 8 - MysqlstrustamView Answer on Stackoverflow
Solution 9 - Mysqluser5599549View Answer on Stackoverflow
Solution 10 - MysqlJonathanView Answer on Stackoverflow
Solution 11 - MysqlAdam FischerView Answer on Stackoverflow
Solution 12 - Mysqluser2235601View Answer on Stackoverflow
Solution 13 - MysqlLukasz SzozdaView Answer on Stackoverflow
Solution 14 - MysqlLalit PatelView Answer on Stackoverflow
Solution 15 - MysqlScott FergusonView Answer on Stackoverflow
Solution 16 - MysqlmageshView Answer on Stackoverflow
Solution 17 - MysqlBijesh ShethView Answer on Stackoverflow
Solution 18 - MysqlChandreshView Answer on Stackoverflow
Solution 19 - MysqlNhlanhla RihlapfuView Answer on Stackoverflow
Solution 20 - MysqlPawel FurmaniakView Answer on Stackoverflow
Solution 21 - MysqlVipin JainView Answer on Stackoverflow
Solution 22 - Mysqluser3162712View Answer on Stackoverflow
Solution 23 - MysqlfortyninthnetView Answer on Stackoverflow
Solution 24 - MysqlHassan Latif ButtView Answer on Stackoverflow
Solution 25 - MysqlDavid RobertsonView Answer on Stackoverflow
Solution 26 - MysqlIwan RossView Answer on Stackoverflow
Solution 27 - MysqlAtul AkabariView Answer on Stackoverflow