How can I get all rows with keys provided in a list using SQLalchemy?

PythonSelectSqlalchemy

Python Problem Overview


I have sequence of IDs I want to retrieve. It's simple:

session.query(Record).filter(Record.id.in_(seq)).all()

Is there a better way to do it?

Python Solutions


Solution 1 - Python

Your code is absolutety fine.

IN is like a bunch of X=Y joined with OR and is pretty fast in contemporary databases.

However, if your list of IDs is long, you could make the query a bit more efficient by passing a sub-query returning the list of IDs.

Solution 2 - Python

The code as is is completely fine. However, someone is asking me for some system of hedging between the two approaches of doing a big IN vs. using get() for individual IDs.

If someone is really trying to avoid the SELECT, then the best way to do that is to set up the objects you need in memory ahead of time. Such as, you're working on a large table of elements. Break up the work into chunks, such as, order the full set of work by primary key, or by date range, whatever, then load everything for that chunk locally into a cache:

 all_ids = [<huge list of ids>]
 
 all_ids.sort()
 while all_ids:
     chunk = all_ids[0:1000]

     # bonus exercise!  Throw each chunk into a multiprocessing.pool()!
     all_ids = all_ids[1000:]

     my_cache = dict(
           Session.query(Record.id, Record).filter(
                 Record.id.between(chunk[0], chunk[-1]))
     )

     for id_ in chunk:
         my_obj = my_cache[id_]
         <work on my_obj>

That's the real world use case.

But to also illustrate some SQLAlchemy API, we can make a function that does the IN for records we don't have and a local get for those we do. Here is that:

from sqlalchemy import inspect


def get_all(session, cls, seq):
    mapper = inspect(cls)
    lookup = set()
    for ident in seq:
        key = mapper.identity_key_from_primary_key((ident, ))
        if key in session.identity_map:
            yield session.identity_map[key]
        else:
            lookup.add(ident)
    if lookup:
        for obj in session.query(cls).filter(cls.id.in_(lookup)):
            yield obj

Here is a demonstration:

from sqlalchemy import Column, Integer, create_engine, String
from sqlalchemy.orm import Session
from sqlalchemy.ext.declarative import declarative_base
import random

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    data = Column(String)

e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

ids = range(1, 50)

s = Session(e)
s.add_all([A(id=i, data='a%d' % i) for i in ids])
s.commit()
s.close()

already_loaded = s.query(A).filter(A.id.in_(random.sample(ids, 10))).all()

assert len(s.identity_map) == 10

to_load = set(random.sample(ids, 25))
all_ = list(get_all(s, A, to_load))

assert set(x.id for x in all_) == to_load

Solution 3 - Python

If you use composite primary keys, you can use tuple_, as in

from sqlalchemy import tuple_
session.query(Record).filter(tuple_(Record.id1, Record.id2).in_(seq)).all()

Note that this is not available on SQLite (see doc).

Solution 4 - Python

I'd recommend to take a look at the SQL it produces. You can just print str(query) to see it.

I'm not aware of an ideal way of doing it with standard SQL.

Solution 5 - Python

There is one other way; If it's reasonable to expect that the objects in question are already loaded into the session; you've accessed them before in the same transaction, you can instead do:

map(session.query(Record).get, seq)

In the case where those objects are already present, this will be much faster, since there won't be any queries to retrieve those objects; On the other hand, if more than a tiny number of those objects are not loaded, it will be much, much slower, since it will cause a query per missing instance, instead of a single query for all objects.

This can be useful when you are doing joinedload() queries before reaching the above step, so you can be sure that they have been loaded already. In general, you should use the solution in the question by default, and only explore this solution when you have seen that you are querying for the same objects over and over.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionCheeryView Question on Stackoverflow
Solution 1 - PythonAdam DziendzielView Answer on Stackoverflow
Solution 2 - PythonzzzeekView Answer on Stackoverflow
Solution 3 - PythonG.JView Answer on Stackoverflow
Solution 4 - PythoninyView Answer on Stackoverflow
Solution 5 - PythonSingleNegationEliminationView Answer on Stackoverflow