how to exclude null values in array_agg like in string_agg using postgres?
SqlPostgresqlPostgresql 9.1Postgresql 8.4Sql Problem Overview
If I use array_agg
to collect names, I get my names separated by commas, but in case there is a null
value, that null is also taken as a name in the aggregate. For example :
SELECT g.id,
array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END) canonical_users,
array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END) non_canonical_users
FROM groups g
GROUP BY g.id;
it returns ,Larry,Phil
instead of just Larry,Phil
(in my 9.1.2, it shows NULL,Larry,Phil
).
Instead, if I use string_agg()
, it shows me only the names (without empty commas or nulls).
The problem is that I have Postgres 8.4
installed on the server, and string_agg()
doesn't work there. Is there any way to make array_agg work similar to string_agg() ?
Sql Solutions
Solution 1 - Sql
With postgresql-9.3 one can do this;
SELECT g.id,
array_remove(array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END), NULL) canonical_users,
array_remove(array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END), NULL) non_canonical_users
FROM groups g
GROUP BY g.id;
Update: with postgresql-9.4;
SELECT g.id,
array_agg(g.users) FILTER (WHERE g.canonical = 'Y') canonical_users,
array_agg(g.users) FILTER (WHERE g.canonical = 'N') non_canonical_users
FROM groups g
GROUP BY g.id;
Update (2022-02-19): also with postgresql-9.4;
This results in an empty array when all values in an array are null instead of returning null;
SELECT g.id,
coalesce( array_agg(g.users) FILTER (WHERE g.canonical = 'Y'), '{}' ) canonical_users,
coalesce( array_agg(g.users) FILTER (WHERE g.canonical = 'N'), '{}' ) non_canonical_users
FROM groups g
GROUP BY g.id;
Solution 2 - Sql
If you are looking for a modern answer to the general question of how to remove a NULL from an array, it is:
array_remove(your_array, NULL)
I was specifically curious about performance and wanted to compare this to the best possible alternative:
CREATE OR REPLACE FUNCTION strip_nulls(
IN array_in ANYARRAY
)
RETURNS anyarray AS
'
SELECT
array_agg(a)
FROM unnest(array_in) a
WHERE
a IS NOT NULL
;
'
LANGUAGE sql
;
Doing a pgbench test proved (with high confidence) that array_remove() is a little more than twice as fast. I did my test on double precision numbers with a variety of array sizes (10, 100 and 1000 elements) and random NULLs in between.
It's also worth noting that this can be used to remove blanks ('' != NULL). But the second parameter accepts anyelement
, and since it is most likely they you'd be indicating a blank with a string literal, make sure to cast it to the form you want, usually a non-array.
For example:
select array_remove(array['abc', ''], ''::text);
If you try:
select array_remove(array['abc', ''], '');
it will assume that the '' is TEXT[] (array) and will throw this error:
> ERROR: malformed array literal: ""
Solution 3 - Sql
select
id,
(select array_agg(a) from unnest(canonical_users) a where a is not null) canonical_users,
(select array_agg(a) from unnest(non_canonical_users) a where a is not null) non_canonical_users
from (
SELECT g.id,
array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END) canonical_users,
array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END) non_canonical_users
FROM groups g
GROUP BY g.id
) s
Or, simpler and may be cheaper, using array_to_string
which eliminates nulls:
SELECT
g.id,
array_to_string(
array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END)
, ','
) canonical_users,
array_to_string(
array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END)
, ','
) non_canonical_users
FROM groups g
GROUP BY g.id
Solution 4 - Sql
In solving the general question of removing nulls from array aggregates there are two main ways of attacking the problem: either doing array_agg(unnest(array_agg(x)) or creating a custom aggregate.
The first is of the form shown above:
SELECT
array_agg(u)
FROM (
SELECT
unnest(
array_agg(v)
) as u
FROM
x
) un
WHERE
u IS NOT NULL;
The second:
/*
With reference to
http://ejrh.wordpress.com/2011/09/27/denormalisation-aggregate-function-for-postgresql/
*/
CREATE OR REPLACE FUNCTION fn_array_agg_notnull (
a anyarray
, b anyelement
) RETURNS ANYARRAY
AS $$
BEGIN
IF b IS NOT NULL THEN
a := array_append(a, b);
END IF;
RETURN a;
END;
$$ IMMUTABLE LANGUAGE 'plpgsql';
CREATE AGGREGATE array_agg_notnull(ANYELEMENT) (
SFUNC = fn_array_agg_notnull,
STYPE = ANYARRAY,
INITCOND = '{}'
);
Calling the second is (naturally) a little nicer looking than the first:
> select array_agg_notnull(v) from x;
Solution 5 - Sql
I am adding this even though this this thread is quite old, but I ran into this neat trick that works quite well on small arrays. It runs on Postgres 8.4+ without additional libraries or functions.
string_to_array(array_to_string(array_agg(my_column)))::int[]
The array_to_string()
method actually gets rid of the nulls.
Solution 6 - Sql
You should wrap your array_agg
with array_remove.
SELECT g.id,
array_remove(array_agg(CASE WHEN g.canonical = 'Y' THEN g.users ELSE NULL END), NULL) canonical_users,
array_remove(array_agg(CASE WHEN g.canonical = 'N' THEN g.users ELSE NULL END), NULL) non_canonical_users
FROM groups g
GROUP BY g.id;
Solution 7 - Sql
As has been suggested in the comments you can write a function to replace nulls in an array, however as also pointed out in the thread linked to in the comments, this kind of defeats the efficiency of the aggregate function if you have to create an aggregate, split it then aggregate it again.
I think keeping nulls in the array is just a (perhaps unwanted) feature of Array_Agg. You could use subqueries to avoid this:
SELECT COALESCE(y.ID, n.ID) ID,
y.Users,
n.Users
FROM ( SELECT g.ID, ARRAY_AGG(g.Users) AS Users
FROM Groups g
WHERE g.Canonical = 'Y'
GROUP BY g.ID
) y
FULL JOIN
( SELECT g.ID, ARRAY_AGG(g.Users) AS Users
FROM Groups g
WHERE g.Canonical = 'N'
GROUP BY g.ID
) n
ON n.ID = y.ID
Solution 8 - Sql
It is very simple, just first of all create a new - (minus) operator for text[]:
CREATE OR REPLACE FUNCTION diff_elements_text
(
text[], text[]
)
RETURNS text[] as
$$
SELECT array_agg(DISTINCT new_arr.elem)
FROM
unnest($1) as new_arr(elem)
LEFT OUTER JOIN
unnest($2) as old_arr(elem)
ON new_arr.elem = old_arr.elem
WHERE old_arr.elem IS NULL
$$ LANGUAGE SQL IMMUTABLE;
CREATE OPERATOR - (
PROCEDURE = diff_elements_text,
leftarg = text[],
rightarg = text[]
);
And simply subtract the array[null]:
select
array_agg(x)-array['']
from
( select 'Y' x union all
select null union all
select 'N' union all
select ''
) x;
That's all:
{Y, N}