Pandas update sql

PythonPostgresqlPandas

Python Problem Overview


Is there any way to do an SQL update-where from a dataframe without iterating through each line? I have a postgresql database and to update a table in the db from a dataframe I would use psycopg2 and do something like:

con = psycopg2.connect(database='mydb', user='abc', password='xyz')
cur = con.cursor()

for index, row in df.iterrows():
    sql = 'update table set column = %s where column = %s'
    cur.execute(sql, (row['whatver'], row['something']))
con.commit()

But on the other hand if im either reading a table from sql or writing an entire dataframe to sql (with no update-where), then I would just use pandas and sqlalchemy. Something like:

engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('table', engine, if_exists='append')

It's great just having a 'one-liner' using to_sql. Isn't there something similar to do an update-where from pandas to postgresql? Or is the only way to do it by iterating through each row like i've done above. Isn't iterating through each row an inefficient way to do it?

Python Solutions


Solution 1 - Python

Consider a temp table which would be exact replica of your final table, cleaned out with each run:

engine = create_engine('postgresql+psycopg2://user:pswd@mydb')
df.to_sql('temp_table', engine, if_exists='replace')

sql = """
    UPDATE final_table AS f
    SET col1 = t.col1
    FROM temp_table AS t
    WHERE f.id = t.id
"""

with engine.begin() as conn:     # TRANSACTION
    conn.execute(sql)
  

Solution 2 - Python

It looks like you are using some external data stored in df for the conditions on updating your database table. If it is possible why not just do a one-line sql update?

If you are working with a smallish database (where loading the whole data to the python dataframe object isn't going to kill you) then you can definitely conditionally update the dataframe after loading it using read_sql. Then you can use a keyword arg if_exists="replace" to replace the DB table with the new updated table.

df = pandas.read_sql("select * from your_table;", engine)

#update information (update your_table set column = "new value" where column = "old value")
#still may need to iterate for many old value/new value pairs
df[df['column'] == "old value", "column"] = "new value"

#send data back to sql
df.to_sql("your_table", engine, if_exists="replace")

Pandas is a powerful tool, where limited SQL support was just a small feature at first. As time goes by people are trying to use pandas as their only database interface software. I don't think pandas was ever meant to be an end-all for database interaction, but there are a lot of people working on new features all the time. See: https://github.com/pandas-dev/pandas/issues

Solution 3 - Python

I have so far not seen a case where the pandas sql connector can be used in any scalable way to update database data. It may have seemed like a good idea to build one, but really, for operational work it just does not scale.

What I would recommend is to dump your entire dataframe as CSV using

df.to_csv('filename.csv', encoding='utf-8')

Then loading the CSV into the database using COPY for PostgreSQL or LOAD DATA INFILE for MySQL.

If you do not make other changes to the table in question while the data is being manipulated by pandas, you can just load into the table.

If there are concurrency issues, you will have to load the data into a staging table that you then use to update your primary table from.

In the later case, your primary table needs to have a datetime which tells you when the latest modification to it was so you can determine if your pandas changes are the latest or if the database changes should remain.

Solution 4 - Python

I was wondering why donnt you update the df first based on your equation and then store the df to the database, you could use if_exists='replace', to store on the same table.

Solution 5 - Python

In case the column names have not changed I prefer removing all rows and then appending the data to the now empty table. Otherwise, dependent views will have to be regenerated as well:

from sqlalchemy import create_engine
from sqlalchemy import MetaData

engine = create_engine(f'postgresql://postgres:{pw}@localhost:5432/table')   

# Get main table and delete all rows
# without deleting the table
meta = MetaData(engine)
meta.reflect(engine)
table = meta.tables['table']
del_st = table.delete()

conn = engine.connect()
res = conn.execute(del_st)

# Insert new data    
df.to_sql('table', engine, if_exists='append', index=False)

Solution 6 - Python

I try the first answer and find it works not so well, then I change some parts to pass all situation by using pandas+sqlalchemy to update.

def update_to_sql(self, table_name, key_name)
    a = []
    self.table = table_name
    self.primary_key = key_name
    for col in df.columns:
        if col == self.primary_key:
            continue
        a.append("f.{col}=t.{col}".format(col=col))
    df.to_sql('temporary_table', self.sql_engine, if_exists='replace', index=False)
    update_stmt_1 = "UPDATE {final_table} AS f".format(final_table=self.table)
    update_stmt_2 = " INNER JOIN (SELECT * FROM temporary_table) AS t ON t.{primary_key}=f.{primary_key} ".format(primary_key=self.primary_key)
    update_stmt_3 = "SET "
    update_stmt_4 = ", ".join(a)
    update_stmt_5 = update_stmt_1 + update_stmt_2 + update_stmt_3 + update_stmt_4 + ";"
    print(update_stmt_5)
    with self.sql_engine.begin() as cnx:
        cnx.execute(update_stmt_5)

Solution 7 - Python

Here is an approach that I found to be somewhat clean. This utilizes sqlalchemy. It only updates one column at a time but can easily be generalized.

def dataframe_update(df, table, engine, primary_key, column):
  md = MetaData(engine)
  table = Table(table, md, autoload=True)
  session = sessionmaker(bind=engine)()
  for _, row in df.iterrows():
    session.query(table).filter(table.columns[primary_key] == row[primary_key]).update({column: row[column]})
  session.commit()

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestiondarkpoolView Question on Stackoverflow
Solution 1 - PythonParfaitView Answer on Stackoverflow
Solution 2 - Pythonjeffery_the_windView Answer on Stackoverflow
Solution 3 - PythonfirelynxView Answer on Stackoverflow
Solution 4 - PythonnabazView Answer on Stackoverflow
Solution 5 - PythonSorenView Answer on Stackoverflow
Solution 6 - PythonWelsonlemonView Answer on Stackoverflow
Solution 7 - PythonslanderaanView Answer on Stackoverflow