DISTINCT for only one column

SqlSql Server

Sql Problem Overview


Let's say I have the following query.

SELECT ID, Email, ProductName, ProductModel FROM Products

How can I modify it so that it returns no duplicate Emails?

In other words, when several rows contain the same email, I want the results to include only one of those rows (preferably the last one). Duplicates in other columns should be allowed.

Clauses like DISTINCT and GROUP BY appear to work on entire rows. So I'm not sure how to approach this.

Sql Solutions


Solution 1 - Sql

If you are using SQL Server 2005 or above use this:

SELECT *
  FROM (
				SELECT  ID, 
						Email, 
						ProductName, 
						ProductModel,
						ROW_NUMBER() OVER(PARTITION BY Email ORDER BY ID DESC) rn
					FROM Products
			  ) a
WHERE rn = 1

EDIT: Example using a where clause:

SELECT *
  FROM (
                SELECT  ID, 
                        Email, 
                        ProductName, 
                        ProductModel,
                        ROW_NUMBER() OVER(PARTITION BY Email ORDER BY ID DESC) rn
                    FROM Products
                   WHERE ProductModel = 2
                     AND ProductName LIKE 'CYBER%'

              ) a
WHERE rn = 1

Solution 2 - Sql

This assumes SQL Server 2005+ and your definition of "last" is the max PK for a given email

WITH CTE AS
(
SELECT ID, 
       Email, 
       ProductName, 
       ProductModel, 
       ROW_NUMBER() OVER (PARTITION BY Email ORDER BY ID DESC) AS RowNumber 
FROM   Products
)
SELECT ID, 
       Email, 
       ProductName, 
       ProductModel
FROM CTE 
WHERE RowNumber = 1

Solution 3 - Sql

You can over that by using GROUP BY like this:

SELECT ID, Email, ProductName, ProductModel
FROM Products
GROUP BY Email

Solution 4 - Sql

When you use DISTINCT think of it as a distinct row, not column. It will return only rows where the columns do not match exactly the same.

SELECT DISTINCT ID, Email, ProductName, ProductModel
FROM Products

----------------------
1 | something@something.com | ProductName1 | ProductModel1
2 | something@something.com | ProductName1 | ProductModel1

The query would return both rows because the ID column is different. I'm assuming that the ID column is an IDENTITY column that is incrementing, if you want to return the last then I recommend something like this:

SELECT DISTINCT TOP 1 ID, Email, ProductName, ProductModel
FROM Products
ORDER BY ID DESC

The TOP 1 will return only the first record, by ordering it by the ID descending it will return the results with the last row first. This will give you the last record.

Solution 5 - Sql

The reason DISTINCT and GROUP BY work on entire rows is that your query returns entire rows.

To help you understand: Try to write out by hand what the query should return and you will see that it is ambiguous what to put in the non-duplicated columns.

If you literally don't care what is in the other columns, don't return them. Returning a random row for each e-mail address seems a little useless to me.

Solution 6 - Sql

For Access, you can use the SQL Select query I present here:

For example you have this table:

CLIENTE|| NOMBRES || MAIL

888 || T800 ARNOLD || [email protected]

123 || JOHN CONNOR || [email protected]

125 || SARAH CONNOR ||[email protected]

And you need to select only distinct mails. You can do it with this:

SQL SELECT:

SELECT MAX(p.CLIENTE) AS ID_CLIENTE
, (SELECT TOP 1 x.NOMBRES 
    FROM Rep_Pre_Ene_MUESTRA AS x 
    WHERE x.MAIL=p.MAIL 
     AND x.CLIENTE=(SELECT MAX(l.CLIENTE) FROM Rep_Pre_Ene_MUESTRA AS l WHERE x.MAIL=l.MAIL)) AS NOMBRE, 
p.MAIL
FROM Rep_Pre_Ene_MUESTRA AS p
GROUP BY p.MAIL;

You can use this to select the maximum ID, the correspondent name to that maximum ID , you can add any other attribute that way. Then at the end you put the distinct column to filter and you only group it with that last distinct column.

This will bring you the maximum ID with the correspondent data, you can use min or any other functions and you replicate that function to the sub-queries.

This select will return:

CLIENTE|| NOMBRES || MAIL

888 || T800 ARNOLD || [email protected]

125 || SARAH CONNOR ||[email protected]

Remember to index the columns you select and the distinct column must have not numeric data all in upper case or in lower case, or else it won't work. This will work with only one registered mail as well. Happy coding!!!

Solution 7 - Sql

Try This

;With Tab AS (SELECT DISTINCT Email FROM  Products)
SELECT Email,ROW_NUMBER() OVER(ORDER BY Email ASC) AS  Id FROM Tab
ORDER BY Email ASC

Solution 8 - Sql

Try this:

SELECT ID, Email, ProductName, ProductModel FROM Products WHERE ID IN (SELECT MAX(ID) FROM Products GROUP BY Email)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJonathan WoodView Question on Stackoverflow
Solution 1 - SqlChanduView Answer on Stackoverflow
Solution 2 - SqlPero P.View Answer on Stackoverflow
Solution 3 - SqlMarshall UnduemiView Answer on Stackoverflow
Solution 4 - Sqljon3lazeView Answer on Stackoverflow
Solution 5 - SqlJohnFxView Answer on Stackoverflow
Solution 6 - SqljRam90View Answer on Stackoverflow
Solution 7 - SqlAbdullah YousufView Answer on Stackoverflow
Solution 8 - SqlСергей ПустовитView Answer on Stackoverflow