Postgresql aggregate array
SqlArraysPostgresqlSql Problem Overview
I have a two tables
Student
--------
Id Name
1 John
2 David
3 Will
Grade
---------
Student_id Mark
1 A
2 B
2 B+
3 C
3 A
Is it possible to make native Postgresql SELECT to get results like below:
Name Array of marks
-----------------------
'John', {'A'}
'David', {'B','B+'}
'Will', {'C','A'}
But not like below
Name Mark
----------------
'John', 'A'
'David', 'B'
'David', 'B+'
'Will', 'C'
'Will', 'A'
Sql Solutions
Solution 1 - Sql
Use array_agg: http://www.sqlfiddle.com/#!1/5099e/1
SELECT s.name, array_agg(g.Mark) as marks
FROM student s
LEFT JOIN Grade g ON g.Student_id = s.Id
GROUP BY s.Id
By the way, if you are using Postgres 9.1, you don't need to repeat the columns on SELECT to GROUP BY, e.g. you don't need to repeat the student name on GROUP BY. You can merely GROUP BY on primary key. If you remove the primary key on student, you need to repeat the student name on GROUP BY.
CREATE TABLE grade
(Student_id int, Mark varchar(2));
INSERT INTO grade
(Student_id, Mark)
VALUES
(1, 'A'),
(2, 'B'),
(2, 'B+'),
(3, 'C'),
(3, 'A');
CREATE TABLE student
(Id int primary key, Name varchar(5));
INSERT INTO student
(Id, Name)
VALUES
(1, 'John'),
(2, 'David'),
(3, 'Will');
Solution 2 - Sql
What I understand you can do something like this:
SELECT p.p_name,
STRING_AGG(Grade.Mark, ',' ORDER BY Grade.Mark) As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
EDIT
I am not sure. But maybe something like this then:
SELECT p.p_name,
array_to_string(ARRAY_AGG(Grade.Mark),';') As marks
FROM Student
LEFT JOIN Grade ON Grade.Student_id = Student.Id
GROUP BY Student.Name;
Reference here
Solution 3 - Sql
You could use the following:
SELECT Student.Name as Name,
(SELECT array(SELECT Mark FROM Grade WHERE Grade.Student_id = Student.Id))
AS ArrayOfMarks
FROM Student
As described here: http://www.mkyong.com/database/convert-subquery-result-to-array/
Solution 4 - Sql
@Michael Buen got it right. I got what I needed using array_agg.
Here just a basic query example in case it helps someone:
SELECT directory, ARRAY_AGG(file_name) FROM table WHERE type = 'ZIP' GROUP BY directory;
And the result was something like:
parent_directory | array_agg | ------------------------+----------------------------------------+ /home/postgresql/files | {zip_1.zip,zip_2.zip,zip_3.zip} | /home/postgresql/files2 | {file1.zip,file2.zip} |
This post also helped me a lot: "Group By" in SQL and Python Pandas. It basically says that it is more convenient to use only PSQL when possible, but that Python Pandas can be useful to achieve extra functionalities in the filtering process.
I hope it helps