How to load existing db file to memory in Python sqlite3?

Python, Performance, Sqlite

Python Problem Overview


I have an existing sqlite3 db file, on which I need to make some extensive calculations. Doing the calculations from the file is painfully slow, and since the file is not large (~10 MB), there should be no problem loading it into memory.

Is there a Pythonic way to load the existing file into memory in order to speed up the calculations?

Python Solutions


Solution 1 - Python

Here is the snippet that I wrote for my Flask application:

import sqlite3
from io import StringIO

def init_sqlite_db(app):
    # Read database to tempfile
    con = sqlite3.connect(app.config['SQLITE_DATABASE'])
    tempfile = StringIO()
    for line in con.iterdump():
        tempfile.write('%s\n' % line)
    con.close()
    tempfile.seek(0)

    # Create a database in memory and import from tempfile
    app.sqlite = sqlite3.connect(":memory:")
    app.sqlite.cursor().executescript(tempfile.read())
    app.sqlite.commit()
    app.sqlite.row_factory = sqlite3.Row
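
A minimal usage sketch, assuming init_sqlite_db(app) has already run; the users table name is hypothetical and only for illustration. Because sqlite3.Row is set as the row factory, rows can be accessed by column name:

# Hypothetical usage after init_sqlite_db(app); the "users" table is illustrative only.
cur = app.sqlite.execute("SELECT name FROM users WHERE active = 1")
for row in cur:
    print(row["name"])  # sqlite3.Row allows access by column name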

Solution 2 - Python

What about sqlite3.Connection.backup(...)? "This method makes a backup of a SQLite database even while it’s being accessed by other clients, or concurrently by the same connection." Availability: SQLite 3.6.11 or higher. New in Python 3.7.

import sqlite3

source = sqlite3.connect('existing_db.db')  # on-disk database
dest = sqlite3.connect(':memory:')          # empty in-memory database
source.backup(dest)                         # copy the whole database into memory

Solution 3 - Python

sqlite3.Connection.iterdump "[r]eturns an iterator to dump the database in an SQL text format. Useful when saving an in-memory database for later restoration. This function provides the same capabilities as the .dump command in the sqlite3 shell."

Get such an iterator and dump the disk-based database into a memory-based one, and you're ready to compute. When the computation is done, just dump the other way around back to disk.
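
A minimal sketch of that round trip; the file names (existing_db.db, results.db) are placeholders for your own:

import sqlite3

# Disk -> memory: replay the SQL dump of the on-disk database into :memory:
disk = sqlite3.connect('existing_db.db')   # assumed input file name
mem = sqlite3.connect(':memory:')
mem.executescript('\n'.join(disk.iterdump()))
disk.close()

# ... run the calculations against `mem` ...

# Memory -> disk: dump the in-memory database into a (new) file
out = sqlite3.connect('results.db')        # assumed output file name
out.executescript('\n'.join(mem.iterdump()))
out.close()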

Solution 4 - Python

First you should try and find out what is causing the slowness you are observing. Are you writing to tables? Are your writes within large enough transactions so that you don't save needless temporary results to disk? Can you change writes to go to temporary tables (with pragma temp_store=memory)? Can you live with pragma synchronous=off?
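
A sketch of what those pragmas look like from Python; treat it as an experiment, since synchronous=off trades crash safety for speed, and the results table name is hypothetical:

import sqlite3

con = sqlite3.connect('existing_db.db')     # assumed file name
con.execute('PRAGMA temp_store = MEMORY')   # keep temporary tables and indices in RAM
con.execute('PRAGMA synchronous = OFF')     # skip fsync; faster, but unsafe on power loss

# Batch many writes into one transaction instead of committing each statement.
with con:
    con.executemany('INSERT INTO results VALUES (?, ?)',  # hypothetical table
                    [(1, 'a'), (2, 'b')])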

I don't think this functionality is exposed in the Python module, but sqlite has a backup API that sounds like exactly what you are asking for: a way to copy from one database to another (either one of which may be an in-memory database) that works pretty much automatically without any user-visible enumeration of tables. (Maybe APSW exposes this? As noted in Solution 2, Python 3.7+ now exposes it as sqlite3.Connection.backup.)

Another option is to create a RAM disk (if you have sufficient control of the environment) and copy the file there.

Solution 5 - Python

If we must use a Python wrapper, then there is no better solution than the two-pass, read-and-write approach. But beginning with version 3.7.17, SQLite has the option of accessing disk content directly using memory-mapped I/O (see the SQLite mmap documentation).

If you want to use mmap, you have to use the C interface, since no wrapper provides it.

And there is another, hardware solution: the memory disk (RAM disk). Then you have the convenience of file I/O and the speed of memory.

Solution 6 - Python

This has already been answered before, including code examples, at https://stackoverflow.com/questions/3826552

You do not mention the operating system, but one gotcha of Windows XP is that it defaults to a 10 MB file cache, no matter how much memory you have. (This made sense in the days when systems came with 64 MB, etc.) This message has several links:

http://marc.info/?l=sqlite-users&m=116743785223905&w=2

Solution 7 - Python

Here is a relatively simple way to read a SQLite db into memory. Depending upon your preferences with regard to manipulating data, you can either work with a pandas DataFrame or write your table to an in-memory sqlite3 database. Similarly, after manipulating your data, you use the same df.to_sql approach to store your results back into a db table.

import sqlite3 as lite
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite://')      # SQLAlchemy engine backed by an in-memory SQLite database
c = engine.connect()
conmem = c.connection                    # raw DBAPI connection to that in-memory database
con = lite.connect('ait.sqlite', isolation_level=None)  # connection to <ait.sqlite> residing on disk
cur = con.cursor()

# Read the SQLite table into a pandas DataFrame ('Table' is a placeholder table name)
sqlx = 'SELECT * FROM Table'
df = pd.read_sql(sqlx, con, coerce_float=True, params=None)

# Write the DataFrame into the in-memory database
# (the flavor= argument was removed from pandas; to_sql infers the dialect from the connection)
df.to_sql(con=conmem, name='Table', if_exists='replace')
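
As the answer notes, storing results back follows the same pattern in the other direction; a brief sketch, with Results as an assumed table name:

# After manipulating df, persist the results back to the on-disk database.
df.to_sql(con=con, name='Results', if_exists='replace', index=False)  # 'Results' is hypothetical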

Solution 8 - Python

With the solution of Cenk Alti, I always got a MemoryError with Python 3.7 when the process reached 500 MB. Only with the backup functionality of sqlite3 (mentioned by thinwybk) was I able to load and save bigger SQLite databases. You can also do the same with just 3 lines of code, each way.
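
A sketch of those three lines in each direction; the file names are placeholders:

import sqlite3

# Disk -> memory
src = sqlite3.connect('existing_db.db')    # assumed input file name
mem = sqlite3.connect(':memory:')
src.backup(mem)

# ... work on `mem` ...

# Memory -> disk
dst = sqlite3.connect('processed_db.db')   # assumed output file name
mem.backup(dst)
dst.close()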

Solution 9 - Python

The answers of @thinwybk and Crooner are both excellent.

When you have multiple connections to :memory: sqlite databases, for instance when using SQLAlchemy together with the source.backup(dest) function, you may end up not placing the backup into the "right" in-memory DB.

This can be fixed with a proper connection string (https://stackoverflow.com/a/65429612/1617295) and does not involve any hack or use of undocumented features.
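
A sketch of the kind of connection string that answer describes: a named, shared-cache in-memory database, so every connection opening the same URI (including one handed to SQLAlchemy) sees the same data. The name memdb1 is arbitrary:

import sqlite3

# Named in-memory database; all connections opening this URI share the same DB.
uri = 'file:memdb1?mode=memory&cache=shared'   # 'memdb1' is an arbitrary name
dest = sqlite3.connect(uri, uri=True)

source = sqlite3.connect('existing_db.db')     # assumed on-disk file
source.backup(dest)                            # the backup lands in the shared in-memory DB

other = sqlite3.connect(uri, uri=True)         # another connection sees the same data
# Note: the in-memory database lives only while at least one connection to it stays open.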

Solution 10 - Python

SQLite supports in-memory databases.

In Python, you would use the :memory: database name for that.

Perhaps you could open two databases (one from the file, an empty one in memory), migrate everything from the file database into memory, and then use the in-memory database for the calculations.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Adam Matan | View Question on Stackoverflow
Solution 1 - Python | Cenk Alti | View Answer on Stackoverflow
Solution 2 - Python | thinwybk | View Answer on Stackoverflow
Solution 3 - Python | Fred Foo | View Answer on Stackoverflow
Solution 4 - Python | Jouni K. Seppänen | View Answer on Stackoverflow
Solution 5 - Python | obgnaw | View Answer on Stackoverflow
Solution 6 - Python | Roger Binns | View Answer on Stackoverflow
Solution 7 - Python | Crooner | View Answer on Stackoverflow
Solution 8 - Python | wenga | View Answer on Stackoverflow
Solution 9 - Python | Raffi | View Answer on Stackoverflow
Solution 10 - Python | jedi_coder | View Answer on Stackoverflow