Find duplicate rows with PostgreSQL

Tags: SQL, Database, Ruby on Rails 3, Duplicates, PostgreSQL 9.2

Sql Problem Overview


We have a table of photos with the following columns:

id, merchant_id, url 

This table contains duplicate values for the combination (merchant_id, url), so the same row can appear several times:

234 some_merchant  http://www.some-image-url.com/abscde1213
235 some_merchant  http://www.some-image-url.com/abscde1213
236 some_merchant  http://www.some-image-url.com/abscde1213

What is the best way to delete those duplicates? (I use PostgreSQL 9.2 and Rails 3.)

Sql Solutions


Solution 1 - Sql

Here is my take on it.

-- List every row except the first (lowest id) in each (merchant_id, url) group.
SELECT *
FROM (
  SELECT id,
         ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS row_num
  FROM photos
) dups
WHERE dups.row_num > 1;

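The query returns the rows you would delete. Adjust the ORDER BY to control which row of each group is kept: with ORDER BY id ASC, the lowest id gets row number 1 and is left out of the result, so it survives.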

SQL Fiddle => http://sqlfiddle.com/#!15/d6941/1/0


SQL Fiddle no longer supports Postgres 9.2, so the fiddle above has been updated to Postgres 9.3.
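
The query in this solution only lists the extra copies. A minimal sketch of turning it into the actual delete follows, assuming id is unique in photos (if it is not, the IN filter would also remove the row you meant to keep):

```sql
-- Delete every row that is not the first of its (merchant_id, url) group.
-- Assumes id is unique, so "id IN (...)" targets exactly the extra copies.
DELETE FROM photos
WHERE id IN (
  SELECT id
  FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY merchant_id, url ORDER BY id ASC) AS row_num
    FROM photos
  ) dups
  WHERE dups.row_num > 1
);
```

Running it between BEGIN and ROLLBACK first is a cheap way to check the reported row count before committing.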

Solution 2 - Sql

The second part of sgeddes's answer (Solution 3 below) doesn't work on Postgres, because it uses MySQL's multi-table DELETE syntax (its fiddle runs on MySQL). Here is an updated version of his answer using Postgres: http://sqlfiddle.com/#!12/6b1a7/1

DELETE FROM Photos AS P1  
USING Photos AS P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  
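
To see what the self-join delete does, here is a small sketch that replays it on the sample rows from the question. The photos_demo table, the column types, and the extra other_merchant row are assumptions made for illustration only:

```sql
-- Throwaway copy so the real photos table is untouched.
-- Column types are assumed; the question does not state them.
CREATE TEMP TABLE photos_demo (
  id          integer PRIMARY KEY,
  merchant_id text NOT NULL,
  url         text NOT NULL
);

INSERT INTO photos_demo (id, merchant_id, url) VALUES
  (234, 'some_merchant',  'http://www.some-image-url.com/abscde1213'),
  (235, 'some_merchant',  'http://www.some-image-url.com/abscde1213'),
  (236, 'some_merchant',  'http://www.some-image-url.com/abscde1213'),
  (237, 'other_merchant', 'http://www.some-image-url.com/abscde1213');

-- A row is deleted when another row with the same (merchant_id, url)
-- and a lower id exists.
DELETE FROM photos_demo AS p1
USING photos_demo AS p2
WHERE p1.id > p2.id
  AND p1.merchant_id = p2.merchant_id
  AND p1.url = p2.url;

-- Rows 235 and 236 are gone; 234 and 237 remain.
SELECT id, merchant_id, url FROM photos_demo ORDER BY id;
```

Because the join compares with =, rows whose merchant_id or url is NULL never match each other and are left in place.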

Solution 3 - Sql

I see a couple of options for you.

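For a quick way of doing it, use something like this. It only removes rows that are identical in every column, so it assumes your ID column is not unique (as you mention 234 multiple times above); if every row has its own id, DISTINCT * removes nothing: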

CREATE TABLE tmpPhotos AS SELECT DISTINCT * FROM Photos;
DROP TABLE Photos;
ALTER TABLE tmpPhotos RENAME TO Photos;

Here is the SQL Fiddle.

CREATE TABLE ... AS copies only the data, so you will need to re-create any constraints, indexes, and column defaults on the new table.
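
As a rough sketch of that last step, assuming the original table had a primary key on id (the question does not say which constraints existed, and the constraint name below is a placeholder):

```sql
-- Assumption: the original table had a primary key on id; adjust to
-- whatever constraints and indexes your table really had.
ALTER TABLE photos ADD PRIMARY KEY (id);

-- Optional and not part of the original answer: a unique constraint on
-- (merchant_id, url) stops the same duplicates from being inserted again.
-- The constraint name is a placeholder.
ALTER TABLE photos
  ADD CONSTRAINT photos_merchant_id_url_key UNIQUE (merchant_id, url);
```

Either statement fails with an error if violating rows are still present, which doubles as a check that the cleanup worked.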

If your ID column is unique, you could do something like this to keep the lowest id (note that this is MySQL's multi-table DELETE syntax; Solution 2 gives the PostgreSQL form):

DELETE FROM P1  
USING Photos P1, Photos P2
WHERE P1.id > P2.id
   AND P1.merchant_id = P2.merchant_id  
   AND P1.url = P2.url;  

And the Fiddle.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | schlubbi | View Question on Stackoverflow
Solution 1 - Sql | MatthewJ | View Answer on Stackoverflow
Solution 2 - Sql | 11101101b | View Answer on Stackoverflow
Solution 3 - Sql | sgeddes | View Answer on Stackoverflow