Using LIMIT within GROUP BY to get N results per group?

MysqlSqlGroup ByGreatest N-per-GroupRanking

Mysql Problem Overview


The following query:

SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC

yields:

year	id	rate
2006	p01	8
2003	p01	7.4
2008	p01	6.8
2001	p01	5.9
2007	p01	5.3
2009	p01	4.4
2002	p01	3.9
2004	p01	3.5
2005	p01	2.1
2000	p01	0.8
2001	p02	12.5
2004	p02	12.4
2002	p02	12.2
2003	p02	10.3
2000	p02	8.7
2006	p02	4.6
2007	p02	3.3

What I'd like is only the top 5 results for each id:

2006	p01	8
2003	p01	7.4
2008	p01	6.8
2001	p01	5.9
2007	p01	5.3
2001	p02	12.5
2004	p02	12.4
2002	p02	12.2
2003	p02	10.3
2000	p02	8.7

Is there a way to do this using some kind of LIMIT like modifier that works within the GROUP BY?

Mysql Solutions


Solution 1 - Mysql

You could use GROUP_CONCAT aggregated function to get all years into a single column, grouped by id and ordered by rate:

SELECT   id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM     yourtable
GROUP BY id

Result:

-----------------------------------------------------------
|  ID | GROUPED_YEAR                                      |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007                |
-----------------------------------------------------------

And then you could use FIND_IN_SET, that returns the position of the first argument inside the second one, eg.

SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1

SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6

Using a combination of GROUP_CONCAT and FIND_IN_SET, and filtering by the position returned by find_in_set, you could then use this query that returns only the first 5 years for every id:

SELECT
  yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT
      id,
      GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
    FROM
      yourtable
    GROUP BY id) group_max
  ON yourtable.id = group_max.id
     AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
  yourtable.id, yourtable.year DESC;

Please see fiddle here.

Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.

The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you need to select a few records for every group.

Solution 2 - Mysql

You want to find top n rows per group. This answer provides a generic solution using example data that is different from OP.

In MySQL 8 or later you can use the ROW_NUMBER, RANK or DENSE_RANK function depending on the exact definition of top 5. Below are the numbers generated by these functions based on value sorted descending. Notice how ties are handled:

pkid catid value row_number rank dense_rank
1 p01 100 *1 *1 *1
2 p01 90 *2 *2 *2
3 p01 90 *3 *2 *2
4 p01 80 *4 *4 *3
5 p01 80 *5 *4 *3
6 p01 80 6 *4 *3
7 p01 70 7 7 *4
8 p01 60 8 8 *5
9 p01 50 9 9 6
10 p01 40 10 10 7

Once you have chosen the function, use it like so:

SELECT *
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY value DESC) AS n
    FROM t
) AS x
WHERE n <= 5

DB<>Fiddle


In MySQL 5.x you can use poor man's rank over partition to achieve desired result: outer join the table with itself and for each row, count the number of rows before it (e.g. the before row could be the one with higher value).

The following will produce results similar to RANK function:

SELECT t.pkid, t.catid, t.value, COUNT(b.value) + 1 AS rank
FROM t
LEFT JOIN t AS b ON b.catid = t.catid AND b.value > t.value
GROUP BY t.pkid, t.catid, t.value
HAVING COUNT(b.value) + 1 <= 5
ORDER BY t.catid, t.value DESC, t.pkid

Make the following change to produce results similar to DENSE_RANK function:

COUNT(DISTINCT b.value)

Or make the following change to produce results similar to ROW_NUMBER function:

ON b.catid = t.catid AND (b.value > t.value OR b.value = t.value AND b.pkid < t.pkid)

DB<>Fiddle

Solution 3 - Mysql

For me something like

SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N) 

works perfectly. No complicated query.


for example: get top 1 for each group

SELECT 
    *
FROM
    yourtable
WHERE
    id IN (SELECT 
            SUBSTRING_INDEX(GROUP_CONCAT(id
                            ORDER BY rate DESC),
                        ',',
                        1) id
        FROM
            yourtable
        GROUP BY year)
ORDER BY rate DESC;

Solution 4 - Mysql

No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).

This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle that which can be more efficient for some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.

As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate value, so you may still need a quantity of post-processing to check for that.

Solution 5 - Mysql

This requires a series of subqueries to rank the values, limit them, then perform the sum while grouping

@Rnk:=0;
@N:=2;
select
  c.id,
  sum(c.val)
from (
select
  b.id,
  b.bal
from (
select   
  if(@last_id=id,@Rnk+1,1) as Rnk,
  a.id,
  a.val,
  @last_id=id,
from (   
select 
  id,
  val 
from list
order by id,val desc) as a) as b
where b.rnk < @N) as c
group by c.id;

Solution 6 - Mysql

SELECT year, id, rate
FROM (SELECT
  year, id, rate, row_number() over (partition by id order by rate DESC)
  FROM h
  WHERE year BETWEEN 2000 AND 2009
  AND id IN (SELECT rid FROM table2)
  GROUP BY id, year
  ORDER BY id, rate DESC) as subquery
WHERE row_number <= 5

The subquery is almost identical to your query. Only change is adding

row_number() over (partition by id order by rate DESC)

Solution 7 - Mysql

Try this:

SELECT h.year, h.id, h.rate 
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx 
	  FROM (SELECT h.year, h.id, h.rate 
			FROM h
			WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
			GROUP BY id, h.year
			ORDER BY id, rate DESC
			) h, (SELECT @lastid:='', @index:=0) AS a
	) h 
WHERE h.indx <= 5;

Solution 8 - Mysql

Build the virtual columns(like RowID in Oracle)

Table:

CREATE TABLE `stack` 
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL) 
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

Data:

insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);

SQL like this:

select t3.year,t3.id,t3.rate 
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3 
where rownum <=3 order by id,rate DESC;

If delete the where clause in t3, it shows like this:

enter image description here

GET "TOP N Record" --> add the rownum <=3 in where clause (the where-clause of t3);

CHOOSE "the year" --> add the BETWEEN 2000 AND 2009 in where clause (the where-clause of t3);

Solution 9 - Mysql

Took some working, but I thougth my solution would be something to share as it is seems elegant as well as quite fast.

SELECT h.year, h.id, h.rate 
  FROM (
	SELECT id, 
	  SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
	  FROM h
      WHERE year BETWEEN 2000 AND 2009
	  GROUP BY id
      ORDER BY id
  ) AS h_temp
    LEFT JOIN h ON h.id = h_temp.id 
      AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l
		

Note that this example is specified for the purpose of the question and can be modified quite easily for other similar purposes.

Solution 10 - Mysql

The following post: sql: selcting top N record per group describes the complicated way of achieving this without subqueries.

It improves on other solutions offered here by:

  • Doing everything in a single query
  • Being able to properly utilize indexes
  • Avoiding subqueries, notoriously known to produce bad execution plans in MySQL

It is however not pretty. A good solution would be achievable were Window Functions (aka Analytic Functions) enabled in MySQL -- but they are not. The trick used in said post utilizes GROUP_CONCAT, which is sometimes described as "poor man's Window Functions for MySQL".

Solution 11 - Mysql

for those like me that had queries time out. I made the below to use limits and anything else by a specific group.

DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
	DECLARE a INT Default 0;
	DECLARE stop_loop INT Default 0;
	DECLARE domain_val VARCHAR(250);
	DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;

	OPEN domain_list;

	SELECT COUNT(DISTINCT(domain)) INTO stop_loop 
	FROM db.one;
	-- BEGIN LOOP
	loop_thru_domains: LOOP
		FETCH domain_list INTO domain_val;
		SET a=a+1;

		INSERT INTO db.two(book,artist,title,title_count,last_updated) 
		SELECT * FROM 
		(
			SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW() 
			FROM db.one 
			WHERE book = domain_val
			GROUP BY artist,title
			ORDER BY book,titleCount DESC
			LIMIT 200
		) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();

		IF a = stop_loop THEN
			LEAVE loop_thru_domain;
		END IF;
	END LOOP loop_thru_domain;
END $$

it loops through a list of domains and then inserts only a limit of 200 each

Solution 12 - Mysql

Try this:

SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
    @num := if(@type = `id`, @num + 1, 1) AS `row_number`,
    @type := `id` AS `dummy`
FROM (
    SELECT *
    FROM `h`
    WHERE (
        `year` BETWEEN '2000' AND '2009'
        AND `id` IN (SELECT `rid` FROM `table2`) AS `temp_rid`
    )
    ORDER BY `id`
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number`<='5'
ORDER BY `id`, `rate DESC;

Solution 13 - Mysql

Please try below stored procedure. I have already verified. I am getting proper result but without using groupby.

CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
     
    SET @query_string='';

      OPEN cur1;
      read_loop: LOOP
      
      FETCH cur1 INTO tenants ;
      
      IF done THEN
		LEAVE read_loop;
      END IF;
      
      SET @datasource1 = tenants;
      SET @query_string = concat(@query_string,'(select * from demo  where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');
      
       END LOOP; 
      close cur1;
      
    SET @query_string  = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));  
  select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

END

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionWellsView Question on Stackoverflow
Solution 1 - MysqlfthiellaView Answer on Stackoverflow
Solution 2 - MysqlSalman AView Answer on Stackoverflow
Solution 3 - MysqlVishal KumarView Answer on Stackoverflow
Solution 4 - MysqlbobinceView Answer on Stackoverflow
Solution 5 - MysqlBrian L CartwrightView Answer on Stackoverflow
Solution 6 - MysqlRicky MorenoView Answer on Stackoverflow
Solution 7 - MysqlSaharsh ShahView Answer on Stackoverflow
Solution 8 - MysqlWang Wen'anView Answer on Stackoverflow
Solution 9 - MysqlJohnView Answer on Stackoverflow
Solution 10 - MysqlShlomi NoachView Answer on Stackoverflow
Solution 11 - MysqlDev-RiaView Answer on Stackoverflow
Solution 12 - MysqlMLFR2kxView Answer on Stackoverflow
Solution 13 - MysqlHimanshu PatelView Answer on Stackoverflow