PostgreSQL - max number of parameters in "IN" clause?

Postgresql

Postgresql Problem Overview


In Postgres, you can specify an IN clause, like this:

SELECT * FROM user WHERE id IN (1000, 1001, 1002)

Does anyone know what's the maximum number of parameters you can pass into IN?

Postgresql Solutions


Solution 1 - Postgresql

According to the source code located here, starting at line 850, PostgreSQL doesn't explicitly limit the number of arguments.

The following is a code comment from line 870:

/*
 * We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
 * possible if the inputs are all scalars (no RowExprs) and there is a
 * suitable array type available.  If not, we fall back to a boolean
 * condition tree with multiple copies of the lefthand expression.
 * Also, any IN-list items that contain Vars are handled as separate
 * boolean conditions, because that gives the planner more scope for
 * optimization on such clauses.
 *
 * First step: transform all the inputs, and detect whether any are
 * RowExprs or contain Vars.
 */

Solution 2 - Postgresql

This is not really an answer to the present question, however it might help others too.

At least I can tell there is a technical limit of 32767 values (=Short.MAX_VALUE) passable to the PostgreSQL backend, using Posgresql's JDBC driver 9.1.

This is a test of "delete from x where id in (... 100k values...)" with the postgresql jdbc driver:

Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
	at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)

Solution 3 - Postgresql

explain select * from test where id in (values (1), (2));

QUERY PLAN

 Seq Scan on test  (cost=0.00..1.38 rows=2 width=208)
   Filter: (id = ANY ('{1,2}'::bigint[]))

But if try 2nd query:

explain select * from test where id = any (values (1), (2));

QUERY PLAN

Hash Semi Join  (cost=0.05..1.45 rows=2 width=208)
       Hash Cond: (test.id = "*VALUES*".column1)
       ->  Seq Scan on test  (cost=0.00..1.30 rows=30 width=208)
       ->  Hash  (cost=0.03..0.03 rows=2 width=4)
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)

We can see that postgres build temp table and join with it

Solution 4 - Postgresql

As someone more experienced with Oracle DB, I was concerned about this limit too. I carried out a performance test for a query with ~10'000 parameters in an IN-list, fetching prime numbers up to 100'000 from a table with the first 100'000 integers by actually listing all the prime numbers as query parameters.

My results indicate that you need not worry about overloading the query plan optimizer or getting plans without index usage, since it will transform the query to use = ANY({...}::integer[]) where it can leverage indices as expected:

-- prepare statement, runs instantaneous:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN ($1, $2, $3, ..., $9592);

-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);

-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes  (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
"  Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"

-- setup, should you care:
CREATE TABLE public.primes
(
  n integer NOT NULL,
  prime boolean,
  CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.primes
  OWNER TO postgres;

INSERT INTO public.primes
SELECT generate_series(1,100000);

However, this (rather old) thread on the pgsql-hackers mailing list indicates that there is still a non-negligible cost in planning such queries, so take my word with a grain of salt.

Solution 5 - Postgresql

There is no limit to the number of elements that you are passing to IN clause. If there are more elements it will consider it as array and then for each scan in the database it will check if it is contained in the array or not. This approach is not so scalable. Instead of using IN clause try using INNER JOIN with temp table. Refer http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/ for more info. Using INNER JOIN scales well as query optimizer can make use of hash join and other optimization. Whereas with IN clause there is no way for the optimizer to optimize the query. I have noticed speedup of at least 2x with this change.

Solution 6 - Postgresql

Just tried it. the answer is -> out-of-range integer as a 2-byte value: 32768

Solution 7 - Postgresql

You might want to consider refactoring that query instead of adding an arbitrarily long list of ids... You could use a range if the ids indeed follow the pattern in your example:

SELECT * FROM user WHERE id >= minValue AND id <= maxValue;

Another option is to add an inner select:

SELECT * 
FROM user 
WHERE id IN (
    SELECT userId
    FROM ForumThreads ft
    WHERE ft.id = X
);

Solution 8 - Postgresql

If you have query like:

SELECT * FROM user WHERE id IN (1, 2, 3, 4 -- and thousands of another keys)

you may increase performace if rewrite your query like:

SELECT * FROM user WHERE id = ANY(VALUES (1), (2), (3), (4) -- and thousands of another keys)

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionView Question on Stackoverflow
Solution 1 - PostgresqlJordan S. JonesView Answer on Stackoverflow
Solution 2 - PostgresqlnimaiView Answer on Stackoverflow
Solution 3 - PostgresqlYevhen SurovskyiView Answer on Stackoverflow
Solution 4 - PostgresqlblubbView Answer on Stackoverflow
Solution 5 - PostgresqlPrasanth JayachandranView Answer on Stackoverflow
Solution 6 - PostgresqlAndrewView Answer on Stackoverflow
Solution 7 - PostgresqlPatrikAkerstrandView Answer on Stackoverflow
Solution 8 - PostgresqlYevhen SurovskyiView Answer on Stackoverflow