SQLAlchemy ORM conversion to pandas DataFrame

Python, Pandas, Sqlalchemy, Flask Sqlalchemy

Python Problem Overview


Is there a way to convert a SQLAlchemy <Query object> to a pandas DataFrame?

Pandas has the capability to use pandas.read_sql but this requires use of raw SQL. I have two reasons for wanting to avoid it:

  1. I already have everything using the ORM (a good reason in and of itself) and
  2. I'm using Python lists as part of the query, e.g. db.session.query(Item).filter(Item.symbol.in_(add_symbols)), where Item is my model class and add_symbols is a list. This is the equivalent of SQL SELECT ... FROM ... WHERE ... IN.

Is anything possible?

Python Solutions


Solution 1 - Python

Below should work in most cases:

df = pd.read_sql(query.statement, query.session.bind)

See pandas.read_sql documentation for more information on the parameters.
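
To see the one-liner above working end to end with the IN filter from the question, here is a minimal sketch. It assumes SQLAlchemy 1.4 or newer with a pandas version that supports it, and it uses an in-memory SQLite database; the Item model and the sample symbols are made up for illustration. It passes the session's connection rather than query.session.bind, which should behave the same way here.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Item(Base):
    """Hypothetical stand-in for the `Item` model from the question."""
    __tablename__ = "item"

    id = Column(Integer, primary_key=True)
    symbol = Column(String(16), nullable=False)


# In-memory SQLite keeps the sketch runnable without a database server
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Item(symbol=s) for s in ("AAA", "BBB", "CCC")])
    session.commit()

    add_symbols = ["AAA", "CCC"]  # plain Python list used in the IN filter
    query = (
        session.query(Item)
        .filter(Item.symbol.in_(add_symbols))
        .order_by(Item.id)
    )

    # query.statement is the SELECT built by the ORM; pandas executes it
    # on the given connection, so no raw SQL string is written by hand
    df = pd.read_sql(query.statement, session.connection())

print(df)
```

The list values are sent as bound parameters by SQLAlchemy, so nothing has to be formatted into a SQL string.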

Solution 2 - Python

To make this clearer for novice pandas programmers, here is a concrete example:

pd.read_sql(session.query(Complaint).filter(Complaint.id == 2).statement, session.bind)

Here we select the complaint with id = 2 from the complaints table (the SQLAlchemy model is Complaint).

Solution 3 - Python

For completeness' sake: as an alternative to the pandas function read_sql_query(), you can also use the pandas DataFrame function from_records() to convert a structured or record ndarray to a DataFrame.
This comes in handy if, for example, you have already executed the query in SQLAlchemy and have the results available:

import pandas as pd 
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker


SQLALCHEMY_DATABASE_URI = 'postgresql://postgres:postgres@localhost:5432/my_database'
engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_pre_ping=True, echo=False)
db = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base(bind=engine)


class Currency(Base):
    """The `Currency`-table"""
    __tablename__ = "currency"
    __table_args__ = {"schema": "data"}

    id = Column(Integer, primary_key=True, nullable=False)
    name = Column(String(64), nullable=False)


# Defining the SQLAlchemy-query
currency_query = db.query(Currency).with_entities(Currency.id, Currency.name)

# Getting all the entries via SQLAlchemy
currencies = currency_query.all()

# We provide also the (alternate) column names and set the index here,
# renaming the column `id` to `currency__id`
df_from_records = pd.DataFrame.from_records(
    currencies,
    index='currency__id',
    columns=['currency__id', 'name'],
)
print(df_from_records.head(5))

# Or getting the entries via Pandas instead of SQLAlchemy using the
# aforementioned function `read_sql_query()`. We can set the index-columns here as well
df_from_query = pd.read_sql_query(currency_query.statement, db.bind, index_col='id')
# Renaming the index-column(s) from `id` to `currency__id` needs another statement
df_from_query.index.rename(name='currency__id', inplace=True)
print(df_from_query.head(5))

Solution 4 - Python

The selected solution didn't work for me; I kept getting the error AttributeError: 'AnnotatedSelect' object has no attribute 'lower'.

I found the following worked:

df = pd.read_sql_query(query.statement, engine)

Solution 5 - Python

import pandas as pd
from sqlalchemy import Column, Date, Integer, create_engine, select
from sqlalchemy.dialects.postgresql import DOUBLE_PRECISION
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://postgres:postgres@localhost:5432/DB', echo=False)
Base = declarative_base(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()

conn = session.bind

class DailyTrendsTable(Base):

    __tablename__ = 'trends'
    __table_args__ = ({"schema": 'mf_analysis'})

    company_code = Column(DOUBLE_PRECISION, primary_key=True)
    rt_bullish_trending = Column(Integer)
    rt_bearish_trending = Column(Integer)
    rt_bullish_non_trending = Column(Integer)
    rt_bearish_non_trending = Column(Integer)
    gen_date = Column(Date, primary_key=True)

# Select every column of the mapped table (SQLAlchemy 1.x list-style select)
df_query = select([DailyTrendsTable])

# Let pandas execute the statement on the session's engine
df_data = pd.read_sql(df_query, con=conn)

Solution 6 - Python

If you want to compile a query with parameters and dialect specific arguments, use something like this:

c = query.statement.compile(query.session.bind)
df = pandas.read_sql(c.string, query.session.bind, params=c.params)
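
To make it clearer what c.string and c.params hold, here is a small sketch that compiles a statement against the PostgreSQL dialect without connecting to anything. The item table and the filter value are hypothetical, and the dialect is passed explicitly instead of query.session.bind so the snippet runs without a database.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects import postgresql

# Hypothetical table, used only to have something to compile
meta = MetaData()
item = Table(
    "item", meta,
    Column("id", Integer, primary_key=True),
    Column("symbol", String(16)),
)

stmt = item.select().where(item.c.symbol == "AAA")

# Compile against the PostgreSQL dialect; no database connection is needed
c = stmt.compile(dialect=postgresql.dialect())

print(c.string)  # SQL text containing a named placeholder for the value
print(c.params)  # dict mapping the placeholder name to "AAA"
```

The placeholder style in c.string follows the dialect, so it has to match what the driver behind the connection expects. As far as I know, IN filters on lists are expanded only at execution time in SQLAlchemy 1.4 and later, so c.string may not be directly executable for such queries.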

Solution 7 - Python

This answer provides a reproducible example using a SQLAlchemy select statement and returning a pandas data frame. It is based on an in-memory SQLite database so that anyone can reproduce it without installing a database engine.

import pandas
from sqlalchemy import create_engine
from sqlalchemy import MetaData, Table, Column, Text
from sqlalchemy.orm import Session

Define table metadata and create a table

engine = create_engine('sqlite://')
meta = MetaData()
user_table = Table('user', meta,
                   Column("name", Text),
                   Column("full_name", Text))
# Pass the engine explicitly rather than binding it to the metadata
user_table.create(engine)

Insert some data into the user table

stmt = user_table.insert().values(name='Bob', full_name='Sponge Bob')
with Session(engine) as session:
    result = session.execute(stmt)
    session.commit()

Read the result of a select statement into a pandas data frame

# Select data into a pandas data frame
stmt = user_table.select().where(user_table.c.name == 'Bob')
df = pandas.read_sql_query(stmt, engine)
df
Out:
  name   full_name
0  Bob  Sponge Bob

Solution 8 - Python

If you use a raw SQL query:

def generate_df_from_sqlquery(query):
    from pandas import DataFrame
    result = db.session.execute(query)
    df = DataFrame(result.fetchall())
    if len(df) > 0:
        df.columns = list(result.keys())
    else:
        # No rows: return an empty frame that still has the column names
        df = DataFrame(columns=list(result.keys()))
    return df

profile_df = generate_df_from_sqlquery(profile_query)
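
For anyone who wants to try the raw-SQL route without a Flask app, here is a self-contained sketch of the same idea. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the helper name df_from_sql_text, the profile table and its rows are all invented for the example.

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")


def df_from_sql_text(connection, sql, params):
    """Run a textual SQL statement and return its rows as a DataFrame."""
    result = connection.execute(text(sql), params)
    # keys() supplies the column names even when no rows come back
    columns = list(result.keys())
    rows = [tuple(row) for row in result]
    return pd.DataFrame(rows, columns=columns)


with engine.begin() as conn:
    conn.execute(text("CREATE TABLE profile (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(text("INSERT INTO profile (name) VALUES ('Ann'), ('Bo')"))

    profile_df = df_from_sql_text(
        conn,
        "SELECT id, name FROM profile WHERE id >= :min_id ORDER BY id",
        {"min_id": 1},
    )
    empty_df = df_from_sql_text(
        conn,
        "SELECT id, name FROM profile WHERE id < :min_id",
        {"min_id": 0},
    )

print(profile_df)
print(list(empty_df.columns))
```

Passing the column names to the DataFrame constructor covers the empty-result case without a separate branch.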

Solution 9 - Python

Using the SQLAlchemy 2.0 syntax (also available in 1.4 with the flag future=True), it looks like pd.read_sql is not implemented yet, and it raises:

NotImplementedError: This method is not implemented for SQLAlchemy 2.0.

This is an open issue that won't be solved until pandas 2.0.

I didn't find a satisfactory workaround, but some people seem to be using two configurations of the engine, one of them with the flag future=False:

engine2 = create_engine(URL_string, echo=False, future=False)

This would be OK if you query with strings, but when using the ORM, the best I could do is a custom function that is yet to be optimized, but works:

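Conditions = session.query(ExampleTable)
def df_from_sql(query):
    # One column per ORM instance, then transpose so each instance becomes a row
    data = {i: j.__dict__ for i, j in enumerate(query.all())}
    return pd.DataFrame(data).T.drop(columns='_sa_instance_state')
# Pass the Query object (not the model class), because the function calls query.all()
df = df_from_sql(Conditions)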

In any case, this solution is provisional until pd.read_sql supports the new syntax.
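
A variation on the same idea that avoids reaching into __dict__ is to read only the mapped column attributes of each instance. This is a rough sketch rather than a tested replacement: it assumes SQLAlchemy 1.4 or newer, the helper name df_from_instances is made up, and the ExampleTable model below is a hypothetical two-column stand-in.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ExampleTable(Base):
    """Hypothetical model standing in for `ExampleTable` above."""
    __tablename__ = "example"

    id = Column(Integer, primary_key=True)
    name = Column(String(32))


def df_from_instances(instances, model):
    """Build a DataFrame with one row per ORM instance of `model`."""
    # Only mapped column attributes are read, so `_sa_instance_state`
    # never ends up in the frame and nothing has to be dropped afterwards
    names = [attr.key for attr in inspect(model).column_attrs]
    rows = [[getattr(obj, name) for name in names] for obj in instances]
    return pd.DataFrame(rows, columns=names)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([ExampleTable(name="x"), ExampleTable(name="y")])
    session.commit()
    instances = session.query(ExampleTable).order_by(ExampleTable.id).all()
    df = df_from_instances(instances, ExampleTable)

print(df)
```

Because it never calls pd.read_sql, this does not depend on which SQLAlchemy versions pandas supports, at the cost of loading full ORM objects first.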

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Jared | View Question on Stackoverflow
Solution 1 - Python | van | View Answer on Stackoverflow
Solution 2 - Python | Chandan Purohit | View Answer on Stackoverflow
Solution 3 - Python | taffit | View Answer on Stackoverflow
Solution 4 - Python | jorr45 | View Answer on Stackoverflow
Solution 5 - Python | Akshay Salvi | View Answer on Stackoverflow
Solution 6 - Python | Johan Dahlin | View Answer on Stackoverflow
Solution 7 - Python | Paul Rougieux | View Answer on Stackoverflow
Solution 8 - Python | Ramesh Ponnusamy | View Answer on Stackoverflow
Solution 9 - Python | Pablo Ruiz | View Answer on Stackoverflow