How do I find duplicates across multiple columns?

SqlSql ServerSql Server-2008Duplicates

Sql Problem Overview


So I want to do something like this sql code below:

select s.id, s.name,s.city 
from stuff s
group by s.name having count(where city and name are identical) > 1

To produce the following, (but ignore where only name or only city match, it has to be on both columns):

id      name  city   
904834  jim   London  
904835  jim   London  
90145   Fred  Paris   
90132   Fred  Paris
90133   Fred  Paris

Sql Solutions


Solution 1 - Sql

Duplicated id for pairs name and city:

select s.id, t.* 
from [stuff] s
join (
	select name, city, count(*) as qty
	from [stuff]
	group by name, city
	having count(*) > 1
) t on s.name = t.name and s.city = t.city

Solution 2 - Sql

 SELECT name, city, count(*) as qty 
 FROM stuff 
 GROUP BY name, city HAVING count(*)> 1

Solution 3 - Sql

Something like this will do the trick. Don't know about performance, so do make some tests.

select
  id, name, city
from
  [stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)

Solution 4 - Sql

Using count(*) over(partition by...) provides a simple and efficient means to locate unwanted repetition, whilst also list all affected rows and all wanted columns:

SELECT
    t.*
FROM (
    SELECT
        s.*
      , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
    FROM stuff s
    ) t
WHERE t.qty > 1
ORDER BY t.name, t.city

While most recent RDBMS versions support count(*) over(partition by...) MySQL V 8.0 introduced "window functions", as seen below (in MySQL 8.0) > CREATE TABLE stuff( > id INTEGER NOT NULL > ,name VARCHAR(60) NOT NULL > ,city VARCHAR(60) NOT NULL > );

> INSERT INTO stuff(id,name,city) VALUES > (904834,'jim','London') > , (904835,'jim','London') > , (90145,'Fred','Paris') > , (90132,'Fred','Paris') > , (90133,'Fred','Paris') >
> , (923457,'Barney','New York') # not expected in result > ; >

> SELECT > t.* > FROM ( > SELECT > s.* > , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty > FROM stuff s > ) t > WHERE t.qty > 1 > ORDER BY t.name, t.city > > >

>     id | name | city   | qty
> -----: | :--- | :----- | --:
>  90145 | Fred | Paris  |   3
>  90132 | Fred | Paris  |   3
>  90133 | Fred | Paris  |   3
> 904834 | jim  | London |   2
> 904835 | jim  | London |   2
> 

db<>fiddle here

> Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to > that row. These include functions such as RANK(), LAG(), and NTILE(). > In addition, several existing aggregate functions now can be used as > window functions; for example, SUM() and AVG(). For more information, > see Section 12.21, “Window Functions”.

Solution 5 - Sql

A little late to the game on this post, but I found this way to be pretty flexible / efficient

select 
    s1.id
    ,s1.name
    ,s1.city 
from 
    stuff s1
    ,stuff s2
Where
    s1.id <> s2.id
    and s1.name = s2.name
    and s1.city = s2.city

Solution 6 - Sql

SELECT Feild1, Feild2, COUNT(*)
FROM table name
GROUP BY Feild1, Feild2
HAVING COUNT(*)>1

This will give you all yours answers.

Solution 7 - Sql

You have to self join stuff and match name and city. Then group by count.

select 
   s.id, s.name, s.city 
from stuff s join stuff p ON (
   s.name = p.city OR s.city = p.name
)
group by s.name having count(s.name) > 1

Solution 8 - Sql

From OP question, OP wants to group columns and get additional columns that aren't grouping columns.

so that regular group by + having might not be worked.

I would use EXISTS subquery with HAVING.

we can try to add columns which you want to mark duplicate in a subquery.

SELECT s.id, s.name,s.city 
FROM stuff s
WHERE EXISTS (
   SELECT 1
   FROM stuff ss
   WHERE 
      s.name = ss.name
   AND
      s.city = ss.city
   GROUP BY ss.name,ss.city
   HAVING COUNT(*) > 1
)

If we create a suitable Index might get better performance than join

CREATE INDEX IX_name ON stuff (
	name,
	city
);

Another way we can use COUNT window function with filter condition to make it which add grouping columns in PARTITION BY part

SELECT s.id, s.name,s.city 
FROM (
   SELECT *,COUNT(*) OVER(PARTITION BY name,city) cnt
   FROM stuff 
) s
WHERE cnt > 1

sqlfiddle

Solution 9 - Sql

Given a staging table with 70 columns and only 4 representing duplicates, this code will return the offending columns:

SELECT 
	COUNT(*)
	,LTRIM(RTRIM(S.TransactionDate)) 
	,LTRIM(RTRIM(S.TransactionTime))
	,LTRIM(RTRIM(S.TransactionTicketNumber)) 
	,LTRIM(RTRIM(GrossCost)) 
FROM Staging.dbo.Stage S
GROUP BY 
	LTRIM(RTRIM(S.TransactionDate)) 
	,LTRIM(RTRIM(S.TransactionTime))
	,LTRIM(RTRIM(S.TransactionTicketNumber)) 
	,LTRIM(RTRIM(GrossCost)) 
HAVING COUNT(*) > 1

.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionNimChimpskyView Question on Stackoverflow
Solution 1 - SqlMichał PowagaView Answer on Stackoverflow
Solution 2 - SqlSunnnyView Answer on Stackoverflow
Solution 3 - SqlssarabandoView Answer on Stackoverflow
Solution 4 - SqlPaul MaxwellView Answer on Stackoverflow
Solution 5 - SqlMattDView Answer on Stackoverflow
Solution 6 - SqlArunav dutta guptaView Answer on Stackoverflow
Solution 7 - SqlAnjaView Answer on Stackoverflow
Solution 8 - SqlD-ShihView Answer on Stackoverflow
Solution 9 - SqlDon G.View Answer on Stackoverflow