SQL WHERE ID IN (id1, id2, ..., idn)

SqlSelect

Sql Problem Overview


I need to write a query to retrieve a big list of ids.

We support many backends (MySQL, Firebird, SQLServer, Oracle, PostgreSQL ...), so I need to write standard SQL.

The id set could be large, and the query would be generated programmatically. So, what is the best approach?

1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

My question here is: what happens if n is very big? Also, what about performance?

2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn

I think that this approach has no limit on n, but what about performance if n is very big?

3) Writing a programmatic solution:
  foreach (var id in myIdList)
  {
      var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
      myObjectList.Add(item);
  }

We experienced some problems with this approach when the database server is queried over the network. Normally it is better to run one query that retrieves all the results than to make a lot of small queries. Maybe I'm wrong.

What would be a correct solution for this problem?

Sql Solutions


Solution 1 - Sql

Option 1 is the only good solution.

Why?
  • Option 2 does the same, but you repeat the column name many times; additionally, the SQL engine doesn't immediately know that you want to check whether the value is one of the values in a fixed list. However, a good SQL engine could optimize it to perform the same as IN. There's still the readability issue though...

  • Option 3 is simply horrible performance-wise. It sends a query on every loop iteration and hammers the database with small queries. It also prevents the engine from using any optimizations for "value is one of those in a given list".
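Since the question says the query is generated programmatically, here is a minimal sketch of option 1 with bound parameters instead of string concatenation. It assumes Python with the stdlib sqlite3 module purely so it can run standalone; the `items` table, its columns, and the `fetch_by_ids` helper are hypothetical names, and the `?` placeholder style is sqlite3's (other drivers use `%s` or `:name`).

```python
import sqlite3

def fetch_by_ids(conn, ids):
    """Return rows of the hypothetical `items` table whose id is in `ids`."""
    ids = list(ids)
    if not ids:
        return []  # avoid emitting "IN ()", which most backends reject
    # One placeholder per id; the values themselves are bound, never concatenated.
    placeholders = ", ".join(["?"] * len(ids))
    sql = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

# Demo data (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])

rows = fetch_by_ids(conn, [2, 5, 9, 42])  # 42 does not exist and is simply skipped
print(rows)
```

This is one round trip for the whole list, which is the point of option 1 compared with option 3.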

Solution 2 - Sql

An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.

You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.

Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond that described here.
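A rough sketch of this id-table approach, assuming Python with the stdlib sqlite3 module so it runs standalone. The `wanted_ids` and `items` tables and the `fetch_via_id_table` helper are made-up names, and DELETE stands in for TRUNCATE because SQLite has no TRUNCATE statement.

```python
import sqlite3

def fetch_via_id_table(conn, ids):
    """Load `ids` into the hypothetical `wanted_ids` table, then join on it."""
    # Clear the previous batch (TRUNCATE on backends that support it).
    conn.execute("DELETE FROM wanted_ids")
    # set() removes duplicates so the primary key on wanted_ids is not violated.
    conn.executemany("INSERT INTO wanted_ids (id) VALUES (?)",
                     [(i,) for i in set(ids)])
    # The SELECT itself is static: no dynamic SQL, no long IN list.
    return conn.execute(
        "SELECT t.id, t.name FROM items t "
        "INNER JOIN wanted_ids w ON w.id = t.id "
        "ORDER BY t.id"
    ).fetchall()

# Demo data (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# The primary key doubles as the index that helps the join.
conn.execute("CREATE TABLE wanted_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])

first = fetch_via_id_table(conn, [3, 1, 3, 99])
second = fetch_via_id_table(conn, [7])
print(first, second)
```

If several sessions share a permanent id table, each batch would need some kind of batch or session key column; that is left out here to keep the sketch short.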

Solution 3 - Sql

What Ed Guiness suggested is really a performance booster. I had a query like this:

select * from table where id in (id1,id2.........long list)

What I did:

DECLARE @temp table(
			ID  int
			)
insert into @temp 
select * from dbo.fnSplitter('#idlist#')

Then I inner joined the temp table with the main table:

select * from table inner join @temp t on t.ID = table.id

And performance improved drastically.

Solution 4 - Sql

First option is definitely the best option.

SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

However, if the list of ids is huge, say millions, you should consider splitting it into chunks like this:

  • Divide your list of Ids into chunks of a fixed size, say 100
  • Chunk size should be decided based upon the memory size of your server
  • Suppose you have 10000 Ids, you will have 10000/100 = 100 chunks
  • Process one chunk at a time resulting in 100 database calls for select

Why should you divide into chunks?

  • You avoid the memory overflow exceptions that are very common in scenarios like yours.
  • You make a controlled number of database calls, resulting in better performance.

It has always worked like a charm for me. Hope it works for my fellow developers as well :)
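The chunking steps above could be sketched like this, assuming Python with the stdlib sqlite3 module; the `items` table and the `chunked` / `fetch_in_chunks` helpers are hypothetical, and the chunk size of 100 is just the figure used in the example above.

```python
import sqlite3

def chunked(ids, size):
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_in_chunks(conn, ids, chunk_size=100):
    """Run one IN query per chunk and concatenate the results."""
    rows = []
    for chunk in chunked(list(ids), chunk_size):
        placeholders = ", ".join(["?"] * len(chunk))
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
    return rows

# Demo data (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 20001)])

wanted = list(range(1, 10001))        # 10000 ids
chunks = list(chunked(wanted, 100))   # 10000 / 100 = 100 chunks
rows = fetch_in_chunks(conn, wanted, chunk_size=100)
print(len(chunks), len(rows))
```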

Solution 5 - Sql

Running SELECT * FROM MyTable where id in () on an Azure SQL table with 500 million records took more than 7 minutes!

Doing this instead returned results immediately:

select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id

Use a join.
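For anyone generating this from code, here is a sketch of the same "join against an inline list" idea with bound parameters. It assumes Python with the stdlib sqlite3 module so it runs standalone, and it writes the inline list as a common table expression, a form SQLite accepts; the `items` table and the `fetch_by_values_join` helper are hypothetical. It says nothing about how Azure SQL plans the query.

```python
import sqlite3

def fetch_by_values_join(conn, ids):
    """Join the hypothetical `items` table against an inline VALUES list."""
    ids = list(ids)
    if not ids:
        return []
    # One "(?)" row per id, e.g. "VALUES (?), (?), (?)".
    values = ", ".join(["(?)"] * len(ids))
    sql = (
        f"WITH b(id) AS (VALUES {values}) "
        "SELECT a.id, a.name FROM items a "
        "JOIN b ON a.id = b.id "
        "ORDER BY a.id"
    )
    return conn.execute(sql, ids).fetchall()

# Demo data (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 1001)])

rows = fetch_by_values_join(conn, [25, 250, 2600])  # 2600 is not in the table
print(rows)
```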

Solution 6 - Sql

In most database systems, IN (val1, val2, …) and a series of OR are optimized to the same plan.

The third way would be to import the list of values into a temporary table and join against it, which is more efficient in most systems if there are lots of values.


Solution 7 - Sql

I think you mean SqlServer, but on Oracle there is a hard limit on how many elements you can specify in an IN list: 1000.
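One common workaround for a per-list limit like this is to split the ids into several IN lists joined with OR. Below is a minimal sketch of a generator for such a predicate, in Python; `build_in_predicate` is a hypothetical helper, and the `?` placeholders are an assumption (an Oracle driver would typically use numbered or named binds such as `:1`).

```python
def build_in_predicate(column, count, max_per_list=1000):
    """Build "(col IN (?, ...) OR col IN (?, ...))" with `count` placeholders.

    No single IN list gets more than `max_per_list` placeholders.
    """
    if count <= 0:
        return "1 = 0"  # an empty id list matches nothing
    groups = []
    remaining = count
    while remaining > 0:
        n = min(remaining, max_per_list)
        groups.append(f"{column} IN ({', '.join(['?'] * n)})")
        remaining -= n
    # Parentheses keep the OR chain safe when combined with other conditions.
    return "(" + " OR ".join(groups) + ")"

small = build_in_predicate("ID", 5, max_per_list=2)
large = build_in_predicate("ID", 2500)  # 1000 + 1000 + 500 placeholders
print(small)
```

The ids are then bound in order as a single flat parameter list. This only sidesteps the per-list limit; it does not make a very long list any cheaper to parse or plan.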

Solution 8 - Sql

Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.

Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.

Solution 9 - Sql

  1. For the 1st option
    Add the IDs to a temp table and inner join it with the main table.
CREATE TABLE #temp (id int)
INSERT INTO #temp (id)
SELECT t.id FROM (VALUES (1),(2),(3),...(10000)) AS t(id)

Solution 10 - Sql

Try this

SELECT Position_ID , Position_Name
FROM 
position
WHERE Position_ID IN (6 ,7 ,8)
ORDER BY Position_Name

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Daniel Peñalba | View Question on Stackoverflow
Solution 1 - Sql | ThiefMaster | View Answer on Stackoverflow
Solution 2 - Sql | Ed Guiness | View Answer on Stackoverflow
Solution 3 - Sql | Ritu | View Answer on Stackoverflow
Solution 4 - Sql | Adarsh Kumar | View Answer on Stackoverflow
Solution 5 - Sql | JakeJ | View Answer on Stackoverflow
Solution 6 - Sql | Quassnoi | View Answer on Stackoverflow
Solution 7 - Sql | flq | View Answer on Stackoverflow
Solution 8 - Sql | judda | View Answer on Stackoverflow
Solution 9 - Sql | Aradhana Godhani | View Answer on Stackoverflow
Solution 10 - Sql | SIAMWEBSITE | View Answer on Stackoverflow