python-pandas and databases like mysql

Python Problem Overview


The documentation for Pandas has numerous examples of best practices for working with data stored in various formats.

However, I am unable to find any good examples for working with databases such as MySQL.

Can anyone point me to links or give some code snippets showing how to efficiently convert query results obtained with mysql-python into Pandas data frames?

Python Solutions


Solution 1 - Python

As Wes says, read_sql from pandas.io.sql will do it once you have a database connection from a DB-API compatible library. Here are two short examples that use the cx_Oracle and MySQLdb libraries to connect to Oracle and MySQL and query their data dictionaries. First, the cx_Oracle example:

import pandas as pd
import cx_Oracle

ora_conn = cx_Oracle.connect('your_connection_string')
df_ora = pd.read_sql('select * from user_objects', con=ora_conn)    
print('loaded dataframe from Oracle. # Records: ', len(df_ora))
ora_conn.close()

And here is the equivalent example for MySQLdb:

import pandas as pd
import MySQLdb

mysql_cn = MySQLdb.connect(host='myhost',
                           port=3306, user='myusername', passwd='mypassword',
                           db='information_schema')
df_mysql = pd.read_sql('select * from VIEWS;', con=mysql_cn)
print('loaded dataframe from MySQL. records:', len(df_mysql))
mysql_cn.close()
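
Both examples pass a fixed SQL string. When part of the query comes from outside, read_sql also accepts a params argument that is handed to the driver. Here is a minimal sketch of that idea, using an in-memory SQLite database so it runs without a server. The views table and its rows are made up for illustration, and the placeholder style depends on the driver: sqlite3 uses ?, while MySQLdb uses %s.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for a real server: an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE views (table_schema TEXT, table_name TEXT)")
con.executemany(
    "INSERT INTO views VALUES (?, ?)",
    [("sales", "v_orders"), ("sales", "v_customers"), ("hr", "v_staff")],
)

# The value is passed separately from the SQL text, so the driver handles quoting.
df_views = pd.read_sql(
    "SELECT table_name FROM views WHERE table_schema = ? ORDER BY table_name",
    con=con,
    params=("sales",),
)
con.close()
print("loaded dataframe. # records:", len(df_views))
```

The same call shape should carry over to the Oracle and MySQL connections above once the placeholder is changed to the style the driver expects.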

Solution 2 - Python

For recent readers of this question: the pandas docs for version 0.14.0 carry the following warning:

> Warning: Some of the existing functions or function aliases have been deprecated and will be removed in future versions. This includes: tquery, uquery, read_frame, frame_query, write_frame.

And:

> Warning: The support for the ‘mysql’ flavor when using DBAPI connection objects has been deprecated. MySQL will be further supported with SQLAlchemy engines (GH6900).

This makes many of the answers here outdated. You should use SQLAlchemy instead:

from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('dialect://user:pass@host:port/schema', echo=False)
f = pd.read_sql_query('SELECT * FROM mytable', engine, index_col = 'ID')
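
The connection string above is only a template. If the password contains characters such as @ or /, writing the URL by hand gets error-prone, so one option is to let SQLAlchemy assemble it. This is a sketch assuming SQLAlchemy 1.4 or later; the mysql+pymysql driver name and all credentials are placeholders, not values from this answer.

```python
from sqlalchemy.engine import URL

# All values below are illustrative placeholders.
url = URL.create(
    drivername="mysql+pymysql",
    username="user",
    password="p@ss/word",  # contains characters that must be escaped in a URL
    host="localhost",
    port=3306,
    database="mydb",
)

# Render the full URL, including the (escaped) password, to inspect it.
rendered = url.render_as_string(hide_password=False)
print(rendered)
```

The url object can be passed to create_engine in place of the string, provided the named driver is installed.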

Solution 3 - Python

For the record, here is an example using a sqlite database:

import pandas as pd
import sqlite3

with sqlite3.connect("whatever.sqlite") as con:
    sql = "SELECT * FROM table_name"
    df = pd.read_sql_query(sql, con)
    print(df.shape)
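
Since the question asks about efficiency: for results that may not fit in memory, read_sql_query accepts a chunksize argument and then yields DataFrames piece by piece. Below is a small sketch with an in-memory SQLite database and made-up rows. It also uses contextlib.closing, because a sqlite3 connection used directly as a context manager only commits or rolls back and does not close itself.

```python
import sqlite3
from contextlib import closing

import pandas as pd

with closing(sqlite3.connect(":memory:")) as con:
    # Made-up table and data, only so the sketch can run on its own.
    con.execute("CREATE TABLE table_name (id INTEGER, value REAL)")
    con.executemany(
        "INSERT INTO table_name VALUES (?, ?)",
        [(i, i * 0.5) for i in range(10)],
    )

    total_rows = 0
    total_value = 0.0
    # With chunksize set, the call returns an iterator of DataFrames.
    for chunk in pd.read_sql_query("SELECT * FROM table_name", con, chunksize=4):
        total_rows += len(chunk)
        total_value += chunk["value"].sum()

print(total_rows, total_value)
```

Each chunk is an ordinary DataFrame, so any per-chunk aggregation or write-out can replace the running totals used here.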

Solution 4 - Python

I prefer to create queries with SQLAlchemy, and then make a DataFrame from it. SQLAlchemy makes it easier to combine SQL conditions Pythonically if you intend to mix and match things over and over.

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from pandas import DataFrame
import datetime

# We are connecting to an existing service
engine = create_engine('dialect://user:pwd@host:port/db', echo=False)
Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()

# And we want to query an existing table
tablename = Table('tablename', 
    Base.metadata, 
    autoload=True, 
    autoload_with=engine, 
    schema='ownername')

# These are the "Where" parameters, but I could as easily 
# create joins and limit results
us = tablename.c.country_code.in_(['US','MX'])
dc = tablename.c.locn_name.like('%DC%')
dt = tablename.c.arr_date >= datetime.date.today() # Give me convenience or...

q = session.query(tablename).\
            filter(us & dc & dt) # That's where the magic happens!!!

def querydb(query):
    """
    Function to execute query and return DataFrame.
    """
    df = DataFrame(query.all())
    df.columns = [x['name'] for x in query.column_descriptions]
    return df

querydb(q)
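
For readers on newer versions, the same idea of composing conditions can be sketched without a session, by handing a select() statement straight to pandas. This assumes SQLAlchemy 1.4 or later and a pandas version that accepts SQLAlchemy selectables; the arrivals table, its rows, and the in-memory SQLite engine are invented for illustration and stand in for the existing table above.

```python
import datetime

import pandas as pd
from sqlalchemy import Column, Date, Integer, MetaData, String, Table, create_engine, select

# In-memory SQLite stands in for the real service (assumption for the sketch).
engine = create_engine("sqlite://")
metadata = MetaData()
arrivals = Table(
    "arrivals",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("country_code", String),
    Column("locn_name", String),
    Column("arr_date", Date),
)
metadata.create_all(engine)

today = datetime.date.today()
yesterday = today - datetime.timedelta(days=1)
with engine.begin() as conn:
    conn.execute(
        arrivals.insert(),
        [
            {"country_code": "US", "locn_name": "Washington DC", "arr_date": today},
            {"country_code": "US", "locn_name": "New York", "arr_date": today},
            {"country_code": "CA", "locn_name": "DC Annex", "arr_date": today},
            {"country_code": "MX", "locn_name": "DC Norte", "arr_date": yesterday},
        ],
    )

# Conditions are plain Python objects and can be combined with &.
us = arrivals.c.country_code.in_(["US", "MX"])
dc = arrivals.c.locn_name.like("%DC%")
dt = arrivals.c.arr_date >= today

stmt = select(arrivals).where(us & dc & dt)

with engine.connect() as conn:
    df = pd.read_sql(stmt, conn)

print(df)
```

Column names come from the statement itself, so no separate step is needed to label the DataFrame.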

Solution 5 - Python

MySQL example:

import MySQLdb as db
import pandas as pd

database = db.connect('localhost', 'username', 'password', 'database')
# frame_query is deprecated; pd.read_sql is its replacement
data = pd.read_sql("SELECT * FROM data", database)

Solution 6 - Python

The same syntax also works for Microsoft SQL Server using pyodbc.

import pyodbc
import pandas as pd

cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=servername;DATABASE=mydb;UID=username;PWD=password')
sql = "select * from mytable"

# frame_query is deprecated; pd.read_sql is its replacement
df = pd.read_sql(sql, cnxn)
cnxn.close()

Solution 7 - Python

And this is how you connect to PostgreSQL using the psycopg2 driver (install it with "apt-get install python-psycopg2" if you're on a Debian-derived Linux distribution).

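import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")

# sum() needs a GROUP BY on the non-aggregated column
q = """select month_idx, sum(payment) from bi_some_table group by month_idx"""

# frame_query is deprecated; pd.read_sql is its replacement
df3 = pd.read_sql(q, conn)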

Solution 8 - Python

For Sybase the following works (with http://python-sybase.sourceforge.net):

import pandas as pd
import Sybase

# frame_query is deprecated; pd.read_sql is its replacement
df = pd.read_sql("<Query>", con=Sybase.connect("<dsn>", "<user>", "<pwd>"))

Solution 9 - Python

pandas.io.sql.frame_query is deprecated. Use pandas.read_sql instead.

Solution 10 - Python

Import the modules:

import pandas as pd
import oursql

Connect and run the query:

conn = oursql.connect(host="localhost", user="me", passwd="mypassword", db="classicmodels")
sql = "Select customerName, city, country from customers order by customerName, country, city"
df_mysql = pd.read_sql(sql, conn)
print(df_mysql)

That works just fine, and so does pandas.io.sql.frame_query (with the deprecation warning). The database used is the sample database from the MySQL tutorial.

Solution 11 - Python

This should work just fine.

import MySQLdb as mdb
import pandas as pd

con = mdb.connect('127.0.0.1', 'root', 'password', 'database_name')
with con:
    cur = con.cursor()
    cur.execute("select random_number_one, random_number_two, random_number_three from randomness.a_random_table")
    rows = cur.fetchall()
    df = pd.DataFrame([[ij for ij in i] for i in rows])
    df.rename(columns={0: 'Random Number One', 1: 'Random Number Two', 2: 'Random Number Three'}, inplace=True)
    print(df.head(20))
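
If you would rather not rename columns by position, the DB-API cursor exposes the column names through cursor.description, whose entries start with the column name. Here is a sketch of that variant, run against an in-memory SQLite database with made-up rows so it works without a MySQL server; the table name mirrors the one above.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
# Made-up table and rows, only so the sketch can run on its own.
con.execute(
    "CREATE TABLE a_random_table ("
    "random_number_one INTEGER, random_number_two INTEGER, random_number_three INTEGER)"
)
con.executemany("INSERT INTO a_random_table VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])

cur = con.cursor()
cur.execute(
    "SELECT random_number_one, random_number_two, random_number_three FROM a_random_table"
)
rows = cur.fetchall()

# The first item of each description entry is the column name.
columns = [col[0] for col in cur.description]
df = pd.DataFrame(rows, columns=columns)
con.close()

print(df.head(20))
```

This keeps the DataFrame labels in sync with the query if columns are added or reordered later.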

Solution 12 - Python

This worked for me when connecting to AWS MySQL (RDS) from a Python 3.x based Lambda function and loading the result into a pandas DataFrame:

import json
import boto3
import pymysql
import pandas as pd
user = 'username'
password = 'XXXXXXX'
client = boto3.client('rds')
def lambda_handler(event, context):
    conn = pymysql.connect(
        host='xxx.xxxxus-west-2.rds.amazonaws.com', port=3306,
        user=user, passwd=password, db='database name', connect_timeout=5)
    df = pd.read_sql('select * from TableName limit 10', con=conn)
    print(df)
    # TODO implement
    #return {
    #    'statusCode': 200,
    #    'df': df
    #}

Solution 13 - Python

For Postgres users

import psycopg2
import pandas as pd

# In a connection string the key is dbname (database is only valid as a keyword argument)
conn = psycopg2.connect("dbname='datawarehouse' user='user1' host='localhost' password='uberdba'")

customers = 'select * from customers'

customers_df = pd.read_sql(customers,conn)

customers_df

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user1320615 | View Question on Stackoverflow
Solution 1 - Python | Keith C Campbell | View Answer on Stackoverflow
Solution 2 - Python | Korem | View Answer on Stackoverflow
Solution 3 - Python | mbatchkarov | View Answer on Stackoverflow
Solution 4 - Python | dmvianna | View Answer on Stackoverflow
Solution 5 - Python | aerkenemesis | View Answer on Stackoverflow
Solution 6 - Python | hedgcutter | View Answer on Stackoverflow
Solution 7 - Python | Will | View Answer on Stackoverflow
Solution 8 - Python | user1827356 | View Answer on Stackoverflow
Solution 9 - Python | ajkl | View Answer on Stackoverflow
Solution 10 - Python | user5925400 | View Answer on Stackoverflow
Solution 11 - Python | MontyPython | View Answer on Stackoverflow
Solution 12 - Python | Dheeraj | View Answer on Stackoverflow
Solution 13 - Python | EvaMwangi | View Answer on Stackoverflow