MySQL Order before Group by
MysqlWordpressGroup BySql Order-ByMysql Problem Overview
I need to find the latest post for each author and then group the results so I only a single latest post for each author.
SELECT wp_posts.* FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author
ORDER BY wp_posts.post_date DESC
This is correctly grouping the output so I only get one post per author, but it is ordering the results after they have been grouped and not before they have been selected.
Mysql Solutions
Solution 1 - Mysql
select wp_posts.* from wp_posts
where wp_posts.post_status='publish'and wp_posts.post_type='post'
group by wp_posts.post_author
having wp_posts.post_date = MAX(wp_posts.post_date) /* ONLY THE LAST POST FOR EACH AUTHOR */
order by wp_posts.post_date desc
EDIT:
After some comments I have decided to add some additional informations.
The company I am working at also uses Postgres and especially SQL Server. This databases don't allow such queries. So I know that there is a other way to do this (I write a solution below). You shoud also have to know what you do if you don't group by all columns treated in the projection or use aggregate functions. Otherwise let it be!
I chose the solution above, because it's a specific question. Tom want to get the recent post for each author in a wordpress site. In my mind it is negligible for the analysis if a author do more than one post per second. Wordpress should even forbid it by its spam-double-post detection. I know from personal experience that there is a really significant benefit in performance doing a such dirty group by with MySQL. But if you know what you do, then you can do it! I have such dirty groups in apps where I'm professionally accountable for. Here I have tables with some mio rows which need 5-15s instead of 100++ seconds.
May be useful about some pros and cons: http://ftp.nchu.edu.tw/MySQL/tech-resources/articles/debunking-group-by-myths.html
SELECT
wp_posts.*
FROM
wp_posts
JOIN
(
SELECT
g.post_author
MAX(g.post_date) AS post_date
FROM wp_posts as g
WHERE
g.post_status='publish'
AND g.post_type='post'
GROUP BY g.post_author
) as t
ON wp_posts.post_author = t.post_author AND wp_posts.post_date = t.post_date
ORDER BY wp_posts.post_date
But if here is more then one post per second for a author you will get more then one row and not the only last one.
Now you can spin the wheel again and get the post with the highest Id
. Even here it is at least not guaranteed that you really get the last one.
Solution 2 - Mysql
Not sure if I understand your requirement correct but following inner statement gets the list of the latest post_date for each author and joins these back with the wp_posts table to get a complete record.
SELECT *
FROM wp_posts wp
INNER JOIN (
SELECT post_author
, MAX(post_date) AS post_date
FROM wp_posts
WHERE post_status = 'publish'
AND post_type = 'post'
GROUP BY
post.author
) wpmax ON wpmax.post_author = wp.post_author
AND wpmax.post_date = wp.post_date
ORDER BY
wp.post_date DESC
Solution 3 - Mysql
I think that @edze response is wrong.
In the MySQL manual you can read:
> MySQL extends the use of GROUP BY so that the select list can refer to > nonaggregated columns not named in the GROUP BY clause. You can use > this feature to get better performance by avoiding unnecessary column > sorting and grouping. However, this is useful primarily when all > values in each nonaggregated column not named in the GROUP BY are the > same for each group. The server is free to choose any value from each > group, so unless they are the same, the values chosen are > indeterminate. Furthermore, the selection of values from each group > cannot be influenced by adding an ORDER BY clause. Sorting of the > result set occurs after values have been chosen, and ORDER BY does > not affect which values the server chooses.
Two great references:
- http://kristiannielsen.livejournal.com/6745.html
- http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Sorry, but I can not comment the @edze response because of my reputation, so I have written a new answer.
Solution 4 - Mysql
Do a GROUP BY after the ORDER BY by wrapping your query with the GROUP BY like this:
SELECT t.* FROM (SELECT * FROM table ORDER BY time DESC) t GROUP BY t.author
Solution 5 - Mysql
it doesn't matter if you order before or after the group-statement, because order means only that 213 goes to 123 or 321 and not more. group by takes only SOME entry per column, not only the latest. I consider you working with subselects here like
SELECT wp_posts.* FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
AND wp_posts.post_date = (Select max(post_date) from wp_posts where author = ... )
Solution 6 - Mysql
What do you think about this?? Seems to work for me
SELECT wp_posts.post_author, MAX(wp_posts.post_date), wp_posts.status, wp_posts.post_type
FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author
It brings me all the Authors with the most updated post_date ... Do you identify a problem there?? I don't
Solution 7 - Mysql
SELECT wp_posts.*,max(wp_posts.post_date) FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author
Solution 8 - Mysql
When our table became large, performance need to checked also. I checked all the options in the questions here, with a PM system with a 136K messages and link table with 83K rows.
When you need only count, or only IDs - Alex's solution is the best.
SELECT wp_posts.post_author, MAX(wp_posts.post_date), wp_posts.status, wp_posts.post_type
FROM wp_posts
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
GROUP BY wp_posts.post_author
When you need other fields, I need to modify Husky110 solution (to my table design - here it is only example - not checked), that in my tables 10x faster than the subquery option:
SELECT wp_posts.* FROM wp_posts,
(Select post_id as pid, max(post_date) maxdate from wp_posts where author = ... group by author order by maxdate desc limit 4) t
WHERE wp_posts.post_status='publish'
AND wp_posts.post_type='post'
AND wp_posts.post_id = pid
This change can select more than one post (one for user, for example), and can be modified to other solutions.
Moshe.
Solution 9 - Mysql
Use the below code...
<?php
//get all users, iterate through users, query for one post for the user,
//if there is a post then display the post title, author, content info
$blogusers = get_users_of_blog();
if ($blogusers) {
foreach ($blogusers as $bloguser) {
$args = array(
'author' => $bloguser->user_id,
'showposts' => 1,
'caller_get_posts' => 1
);
$my_query = new WP_Query($args);
if( $my_query->have_posts() ) {
// $user = get_userdata($bloguser->user_id);
// echo 'This is one post for author with User ID: ' . $user->ID . ' ' . $user- >user_firstname . ' ' . $user->user_lastname;
while ($my_query->have_posts()) : $my_query->the_post(); ?>
<a href="<?php the_permalink() ?>" rel="bookmark" title="Permanent Link to <? php the_title_attribute(); ?>"><?php the_title(); ?></a>
<small><?php the_time('F jS, Y') ?> by <?php the_author_posts_link() ?> </small><?php
the_content();
endwhile;
}
}
}
?>
Solution 10 - Mysql
HERE a simple answer from http://www.cafewebmaster.com/mysql-order-sort-group
SELECT * FROM
(
select * from `my_table` order by timestamp desc
) as my_table_tmp
GROUP BY catid
ORDER BY nid desc
it worked wonders for me