MySQL sum() for distinct rows


Mysql Problem Overview


I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

I use DISTINCT because I'm doing GROUP BY, and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" of each row more than once (due to the GROUP BY).

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?
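To make the duplication concrete, here is a minimal sketch that runs the query above against an in-memory SQLite database from Python's standard library. SQLite stands in for MySQL, and the column types and sample rows are invented for illustration: link 1 has 3 clicks and 2 conversions worth 10 and 5, link 2 has no clicks and one conversion worth 7, link 3 has 2 clicks and no conversions.

```python
import sqlite3

# Hypothetical stand-in for the asker's MySQL tables; types and rows are assumptions.
SCHEMA_AND_DATA = """
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2011-01-01'), (2, '2011-01-02'), (3, '2011-01-03');
-- link 1 has 3 clicks, link 3 has 2, link 2 has none
INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1), (4, 3), (5, 3);
-- link 1 has two conversions (10 and 5), link 2 has one (7)
INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
"""

# The query from the question.
QUESTION_QUERY = """
SELECT links.id,
       count(DISTINCT stats.id) as clicks,
       count(DISTINCT conversions.id) as conversions,
       sum(conversions.value) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(QUESTION_QUERY).fetchall()
for row in rows:
    print(row)  # (id, clicks, conversions, conversion_value)
```

With this data, link 1 reports a conversion_value of 45 instead of 15: each conversion row is repeated once per matching stats row, so its value is summed three times.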

Mysql Solutions


Solution 1 - Mysql

I may be wrong, but from what I understand:

  • conversions.id is the primary key of your table conversions
  • stats.id is the primary key of your table stats

Thus each conversions.id is tied to at most one links.id.

Your request is a bit like taking the Cartesian product of two sets:

[clicks]
SELECT *
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 

[conversions]
SELECT *
FROM links 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 

and for each link you get sizeof([clicks]) x sizeof([conversions]) rows.

As you noted, the number of unique conversions in your request can be obtained via

count(distinct conversions.id) = sizeof([conversions])

This DISTINCT removes all the repetitions caused by the [clicks] rows in the Cartesian product,

but clearly

sum(conversions.value) = sum([conversions].value) * sizeof([clicks])

In your case, since

count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)

you have

sizeof([clicks]) = count(*)/count(distinct conversions.id)

so I would test your request with

SELECT links.id, 
   count(DISTINCT stats.id) as clicks, 
   count(DISTINCT conversions.id) as conversions, 
   sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;
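A minimal check of this formula, again with SQLite from Python standing in for MySQL and an invented schema and data set. Because conversions.id is a primary key here, every conversion of a link is repeated the same number of times, which is the case the formula assumes. One difference to keep in mind: SQLite's / on two integers is integer division, while MySQL returns a decimal; with this data the division happens to be exact.

```python
import sqlite3

# Hypothetical tables and rows (assumptions, not the asker's real data).
SCHEMA_AND_DATA = """
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2011-01-01'), (2, '2011-01-02'), (3, '2011-01-03');
INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1), (4, 3), (5, 3);
INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
"""

# Jerome's query: scale the inflated sum by count(DISTINCT id) / count(*).
CORRECTED_QUERY = """
SELECT links.id,
   count(DISTINCT stats.id) as clicks,
   count(DISTINCT conversions.id) as conversions,
   sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(CORRECTED_QUERY).fetchall()
for row in rows:
    print(row)  # link 1: 45 * 2 / 6 = 15
```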

Keep me posted! Jerome

Solution 2 - Mysql

Jerome's solution is actually wrong and can produce incorrect results!

sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value

Let's assume the following table:

conversions
id value
1 5
1 5
1 5
2 2
3 1

The correct sum of values over distinct ids would be 8 (5 + 2 + 1). Jerome's formula produces:

sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 10.8 != 8
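The arithmetic can be replayed in plain Python using the hypothetical table from this answer. It also shows where the formula's assumption breaks: the correction expects every id to be repeated equally often, but here id 1 appears three times while ids 2 and 3 appear once. Keeping one value per id (assuming repeated ids always carry the same value) gives the intended 8.

```python
# The hypothetical conversions table from the example above, as (id, value) rows.
rows = [(1, 5), (1, 5), (1, 5), (2, 2), (3, 1)]

total = sum(value for _, value in rows)               # sum(conversions.value)
distinct_ids = len({conv_id for conv_id, _ in rows})  # count(distinct conversions.id)
row_count = len(rows)                                 # count(*)
formula = total * distinct_ids / row_count            # Jerome's correction

# Keep one value per id; assumes repeated ids always carry the same value.
correct = sum(dict(rows).values())

print(total, distinct_ids, row_count, formula, correct)
```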

Solution 3 - Mysql

For an explanation of why you were seeing incorrect numbers, read this.

I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.

Solution 4 - Mysql

Use the following query:

SELECT links.id
  , (
    SELECT COUNT(*)
    FROM stats
    WHERE links.id = stats.parent_id
  ) AS clicks
  , conversions.conversions
  , conversions.conversion_value
FROM links
LEFT JOIN (
  SELECT link_id
    , COUNT(id) AS conversions
    , SUM(conversions.value) AS conversion_value
  FROM conversions
  GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
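A quick way to check this approach, sketched with SQLite from Python in place of MySQL; the schema and rows are invented for illustration, and the query is the one above. Aggregating conversions per link_id before the join means each value is summed exactly once, and clicks come from a separate correlated subquery, so the two tables never multiply each other.

```python
import sqlite3

# Hypothetical tables and rows (assumptions, not the asker's real data).
SCHEMA_AND_DATA = """
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2011-01-01'), (2, '2011-01-02'), (3, '2011-01-03');
INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1), (4, 3), (5, 3);
INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
"""

# Clicks via a correlated subquery; conversions pre-aggregated in a derived table.
PREAGGREGATED_QUERY = """
SELECT links.id
  , (
    SELECT COUNT(*)
    FROM stats
    WHERE links.id = stats.parent_id
  ) AS clicks
  , conversions.conversions
  , conversions.conversion_value
FROM links
LEFT JOIN (
  SELECT link_id
    , COUNT(id) AS conversions
    , SUM(conversions.value) AS conversion_value
  FROM conversions
  GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(PREAGGREGATED_QUERY).fetchall()
for row in rows:
    print(row)
```

Because the pre-aggregated conversions are LEFT JOINed, a link without conversions gets NULL rather than 0 in the conversions column; COALESCE(conversions.conversions, 0) would turn that into 0 if needed.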

Solution 5 - Mysql

I use a subquery to do this. It eliminates the problems with grouping. So the query would be something like:

SELECT COUNT(DISTINCT conversions.id)
...
     (SELECT SUM(conversions.value) FROM ....) AS Vals
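The fragment above leaves the details out. One possible way to fill it in, offered as an assumption rather than as the author's exact query, is a correlated subquery that sums the conversions of the current link straight from the table, so the SUM never sees the rows multiplied by the stats join. SQLite (via Python) stands in for MySQL; the alias c2 and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical tables and rows (assumptions, not the asker's real data).
SCHEMA_AND_DATA = """
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
INSERT INTO links VALUES (1, '2011-01-01'), (2, '2011-01-02'), (3, '2011-01-03');
INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1), (4, 3), (5, 3);
INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5), (3, 2, 7);
"""

SUBQUERY_SUM_QUERY = """
SELECT links.id,
       COUNT(DISTINCT stats.id) AS clicks,
       COUNT(DISTINCT conversions.id) AS conversions,
       -- hypothetical correlated subquery: reads this link's conversions
       -- directly, so the stats join cannot repeat them
       (SELECT SUM(c2.value)
        FROM conversions AS c2
        WHERE c2.link_id = links.id) AS Vals
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
rows = conn.execute(SUBQUERY_SUM_QUERY).fetchall()
for row in rows:
    print(row)
```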

Solution 6 - Mysql

How about something like this:

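select t.id,
       max(t.clicks) clicks,                  -- same click count on every row of a link
       count(t.conversions) conversions,      -- one non-NULL row per conversion
       sum(t.conversion_value) conversion_value
from  (SELECT l.id id, l.created created,
              count(distinct s.id) clicks,
              c.id conversions,
              max(c.value) conversion_value   -- a single copy of each conversion's value
       FROM links l
       LEFT JOIN stats s ON l.id = s.parent_id
       LEFT JOIN conversions c ON l.id = c.link_id
       GROUP BY l.id, l.created, c.id) t
group by t.id, t.created
order by t.created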

Solution 7 - Mysql

This will do the trick: just divide each conversion's sum by the number of times that conversion id is duplicated.

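SELECT a.id,
       MAX(a.clicks) AS clicks,
       COUNT(a.conversion_id) AS conversions,
       -- each conversion's value was repeated once per matching stats row,
       -- so dividing by that repeat count undoes the duplication
       SUM(a.conversion_value/a.conversions) AS conversion_value
FROM (SELECT links.id,
       links.created,
       conversions.id AS conversion_id,
       COUNT(DISTINCT stats.id) AS clicks,
       COUNT(conversions.id) AS conversions,
       SUM(conversions.value) AS conversion_value
      FROM links
      LEFT OUTER JOIN stats ON links.id = stats.parent_id
      LEFT OUTER JOIN conversions ON links.id = conversions.link_id
      GROUP BY conversions.id, links.id, links.created) AS a
GROUP BY a.id, a.created
ORDER BY a.created DESC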
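The idea behind this answer can be illustrated without a database: simulate the joined rows for one link, total each conversion's value together with how many times it was repeated, and divide. The rows below are hypothetical (3 clicks and 2 conversions worth 10 and 5), and this is only a sketch of the arithmetic, not a replacement for the query.

```python
from collections import defaultdict

# Hypothetical joined rows for one link: (stats_id, conversion_id, value).
# Three clicks x two conversions -> six rows, each conversion repeated three times.
joined = [(s, c, v) for s in (1, 2, 3) for c, v in ((1, 10), (2, 5))]

sums = defaultdict(int)     # like SUM(conversions.value) grouped by conversion id
repeats = defaultdict(int)  # like COUNT(conversions.id) grouped by conversion id
for _stats_id, conv_id, value in joined:
    sums[conv_id] += value
    repeats[conv_id] += 1

naive_total = sum(v for _, _, v in joined)                       # inflated by the click count
deduplicated_total = sum(sums[c] / repeats[c] for c in sums)     # one copy per conversion
print(naive_total, deduplicated_total)
```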

Solution 8 - Mysql

SELECT x.id,
       MAX(x.clicks) AS clicks,
       SUM(x.conversions) AS conversions,
       SUM(x.value) AS conversion_value
FROM
(SELECT links.id,
        links.created,
        COUNT(DISTINCT stats.id) AS clicks,
        COUNT(DISTINCT conversions.id) AS conversions,  -- 1 per conversion, 0 for a link without any
        MAX(conversions.value) AS value                 -- a single copy of this conversion's value
 FROM links
 LEFT OUTER JOIN stats ON links.id = stats.parent_id
 LEFT OUTER JOIN conversions ON links.id = conversions.link_id
 GROUP BY links.id, links.created, conversions.id) x
GROUP BY x.id, x.created
ORDER BY x.created DESC;

I believe this will give you the answer that you are looking for.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | makeee | View Question on Stackoverflow
Solution 1 - Mysql | Jerome WAGNER | View Answer on Stackoverflow
Solution 2 - Mysql | Clemens Valiente | View Answer on Stackoverflow
Solution 3 - Mysql | TehShrike | View Answer on Stackoverflow
Solution 4 - Mysql | Bryson | View Answer on Stackoverflow
Solution 5 - Mysql | Dave | View Answer on Stackoverflow
Solution 6 - Mysql | Quesi | View Answer on Stackoverflow
Solution 7 - Mysql | Dipu Raj | View Answer on Stackoverflow
Solution 8 - Mysql | Nikhil Kumar Singh | View Answer on Stackoverflow