How do I find duplicates across multiple columns?
SqlSql ServerSql Server-2008DuplicatesSql Problem Overview
So I want to do something like this sql code below:
select s.id, s.name,s.city
from stuff s
group by s.name having count(where city and name are identical) > 1
To produce the following, (but ignore where only name or only city match, it has to be on both columns):
id name city
904834 jim London
904835 jim London
90145 Fred Paris
90132 Fred Paris
90133 Fred Paris
Sql Solutions
Solution 1 - Sql
Duplicated id
for pairs name
and city
:
select s.id, t.*
from [stuff] s
join (
select name, city, count(*) as qty
from [stuff]
group by name, city
having count(*) > 1
) t on s.name = t.name and s.city = t.city
Solution 2 - Sql
SELECT name, city, count(*) as qty
FROM stuff
GROUP BY name, city HAVING count(*)> 1
Solution 3 - Sql
Something like this will do the trick. Don't know about performance, so do make some tests.
select
id, name, city
from
[stuff] s
where
1 < (select count(*) from [stuff] i where i.city = s.city and i.name = s.name)
Solution 4 - Sql
Using count(*) over(partition by...)
provides a simple and efficient means to locate unwanted repetition, whilst also list all affected rows and all wanted columns:
SELECT
t.*
FROM (
SELECT
s.*
, COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty
FROM stuff s
) t
WHERE t.qty > 1
ORDER BY t.name, t.city
While most recent RDBMS versions support count(*) over(partition by...)
MySQL V 8.0 introduced "window functions", as seen below (in MySQL 8.0)
> CREATE TABLE stuff(
> id INTEGER NOT NULL
> ,name VARCHAR(60) NOT NULL
> ,city VARCHAR(60) NOT NULL
> );
> INSERT INTO stuff(id,name,city) VALUES
> (904834,'jim','London')
> , (904835,'jim','London')
> , (90145,'Fred','Paris')
> , (90132,'Fred','Paris')
> , (90133,'Fred','Paris')
>
> , (923457,'Barney','New York') # not expected in result
> ;
>
> SELECT > t.* > FROM ( > SELECT > s.* > , COUNT(*) OVER (PARTITION BY s.name, s.city) AS qty > FROM stuff s > ) t > WHERE t.qty > 1 > ORDER BY t.name, t.city > > >
> id | name | city | qty > -----: | :--- | :----- | --: > 90145 | Fred | Paris | 3 > 90132 | Fred | Paris | 3 > 90133 | Fred | Paris | 3 > 904834 | jim | London | 2 > 904835 | jim | London | 2 >
db<>fiddle here
> Window functions. MySQL now supports window functions that, for each row from a query, perform a calculation using rows related to > that row. These include functions such as RANK(), LAG(), and NTILE(). > In addition, several existing aggregate functions now can be used as > window functions; for example, SUM() and AVG(). For more information, > see Section 12.21, “Window Functions”.
Solution 5 - Sql
A little late to the game on this post, but I found this way to be pretty flexible / efficient
select
s1.id
,s1.name
,s1.city
from
stuff s1
,stuff s2
Where
s1.id <> s2.id
and s1.name = s2.name
and s1.city = s2.city
Solution 6 - Sql
SELECT Feild1, Feild2, COUNT(*)
FROM table name
GROUP BY Feild1, Feild2
HAVING COUNT(*)>1
This will give you all yours answers.
Solution 7 - Sql
You have to self join stuff and match name and city. Then group by count.
select
s.id, s.name, s.city
from stuff s join stuff p ON (
s.name = p.city OR s.city = p.name
)
group by s.name having count(s.name) > 1
Solution 8 - Sql
From OP question, OP wants to group columns and get additional columns that aren't grouping columns.
so that regular group by
+ having
might not be worked.
I would use EXISTS
subquery with HAVING
.
we can try to add columns which you want to mark duplicate in a subquery.
SELECT s.id, s.name,s.city
FROM stuff s
WHERE EXISTS (
SELECT 1
FROM stuff ss
WHERE
s.name = ss.name
AND
s.city = ss.city
GROUP BY ss.name,ss.city
HAVING COUNT(*) > 1
)
If we create a suitable Index might get better performance than join
CREATE INDEX IX_name ON stuff (
name,
city
);
Another way we can use COUNT
window function with filter condition to make it which add grouping columns in PARTITION BY
part
SELECT s.id, s.name,s.city
FROM (
SELECT *,COUNT(*) OVER(PARTITION BY name,city) cnt
FROM stuff
) s
WHERE cnt > 1
Solution 9 - Sql
Given a staging table with 70 columns and only 4 representing duplicates, this code will return the offending columns:
SELECT
COUNT(*)
,LTRIM(RTRIM(S.TransactionDate))
,LTRIM(RTRIM(S.TransactionTime))
,LTRIM(RTRIM(S.TransactionTicketNumber))
,LTRIM(RTRIM(GrossCost))
FROM Staging.dbo.Stage S
GROUP BY
LTRIM(RTRIM(S.TransactionDate))
,LTRIM(RTRIM(S.TransactionTime))
,LTRIM(RTRIM(S.TransactionTicketNumber))
,LTRIM(RTRIM(GrossCost))
HAVING COUNT(*) > 1
.