What is an efficient way of inserting thousands of records into an SQLite table using Django?

PythonSqlDjangoSqliteInsert

Python Problem Overview


I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:

for item in items:
    entry = Entry(a1=item.a1, a2=item.a2)
    entry.save()

What is an efficient way of doing this?

Edit: A little comparison between the two insertion methods.

Without commit_manually decorator (11245 records):

nox@noxdevel marinetraffic]$ time python manage.py insrec             

real	1m50.288s
user	0m6.710s
sys	    0m23.445s

Using commit_manually decorator (11245 records):

[nox@noxdevel marinetraffic]$ time python manage.py insrec                

real	0m18.464s
user	0m5.433s
sys	    0m10.163s

Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.

Python Solutions


Solution 1 - Python

You want to check out django.db.transaction.commit_manually.

http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually

So it would be something like:

from django.db import transaction

@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    transaction.commit()

Which will only commit once, instead at each save().

In django 1.3 context managers were introduced. So now you can use transaction.commit_on_success() in a similar way:

from django.db import transaction

def viewfunc(request):
    ...
    with transaction.commit_on_success():
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()

In django 1.4, bulk_create was added, allowing you to create lists of your model objects and then commit them all at once.

NOTE the save method will not be called when using bulk create.

>>> Entry.objects.bulk_create([
...     Entry(headline="Django 1.0 Released"),
...     Entry(headline="Django 1.1 Announced"),
...     Entry(headline="Breaking: Django is awesome")
... ])

In django 1.6, transaction.atomic was introduced, intended to replace now legacy functions commit_on_success and commit_manually.

from the django documentation on atomic:

atomic is usable both as a decorator:

from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()

and as a context manager:

from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()

Solution 2 - Python

Solution 3 - Python

Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.

Solution 4 - Python

You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.

Solution 5 - Python

To answer the question particularly with regard to SQLite, as asked, while I have just now confirmed that bulk_create does provide a tremendous speedup there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."

The quoted stuff is from the docs--- A-IV provided a link.

What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It's a little wrapper that breaks the big batch that you want to process into smaller batches, managing the 999 variables limit.

Solution 6 - Python

You should check out DSE. I wrote DSE to solve these kinds of problems ( massive insert or updates ). Using the django orm is a dead-end, you got to do it in plain SQL and DSE takes care of much of that for you.

Thomas

Solution 7 - Python

def order(request):    
    if request.method=="GET":
        cust_name = request.GET.get('cust_name', '')
        cust_cont = request.GET.get('cust_cont', '')
        pincode = request.GET.get('pincode', '')
        city_name = request.GET.get('city_name', '')
        state = request.GET.get('state', '')
        contry = request.GET.get('contry', '')
        gender = request.GET.get('gender', '')
        paid_amt = request.GET.get('paid_amt', '')
        due_amt = request.GET.get('due_amt', '')
        order_date = request.GET.get('order_date', '')
        print(order_date)
        prod_name = request.GET.getlist('prod_name[]', '')
        prod_qty = request.GET.getlist('prod_qty[]', '')
        prod_price = request.GET.getlist('prod_price[]', '')
        print(prod_name)
        print(prod_qty)
        print(prod_price)
        # insert customer information into customer table
        try:
            # Insert Data into customer table
            cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
            cust_tab.save()
            # Retrive Id from customer table
            custo_id = Customer.objects.values_list('customer_id').last()   #It is return
Tuple as result from Queryset
            custo_id = int(custo_id[0]) #It is convert the Tuple in INT
            # Insert Data into Order table
            order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
            order_tab.save()
            # Insert Data into Products table
            # insert multiple data at a one time from djanog using while loop
            i=0
            while(i<len(prod_name)):
                p_n = prod_name[i]
                p_q = prod_qty[i]
                p_p = prod_price[i]
                # this is checking the variable, if variable is null so fill the varable value in database
                if p_n != "" and p_q != "" and p_p != "":
                    prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
                    prod_tab.save()
                i=i+1

Solution 8 - Python

I recommend using plain SQL (not ORM) you can insert multiple rows with a single insert:

insert into A select from B;

The select from B portion of your sql could be as complicated as you want it to get as long as the results match the columns in table A and there are no constraint conflicts.

Solution 9 - Python

def order(request):    
    if request.method=="GET":
        # get the value from html page
        cust_name = request.GET.get('cust_name', '')
        cust_cont = request.GET.get('cust_cont', '')
        pincode = request.GET.get('pincode', '')
        city_name = request.GET.get('city_name', '')
        state = request.GET.get('state', '')
        contry = request.GET.get('contry', '')
        gender = request.GET.get('gender', '')
        paid_amt = request.GET.get('paid_amt', '')
        due_amt = request.GET.get('due_amt', '')
        order_date = request.GET.get('order_date', '')
        prod_name = request.GET.getlist('prod_name[]', '')
        prod_qty = request.GET.getlist('prod_qty[]', '')
        prod_price = request.GET.getlist('prod_price[]', '')
        
        # insert customer information into customer table
        try:
            # Insert Data into customer table
            cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
            cust_tab.save()
            # Retrive Id from customer table
            custo_id = Customer.objects.values_list('customer_id').last()   #It is return Tuple as result from Queryset
            custo_id = int(custo_id[0]) #It is convert the Tuple in INT
            # Insert Data into Order table
            order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
            order_tab.save()
            # Insert Data into Products table
            # insert multiple data at a one time from djanog using while loop
            i=0
            while(i<len(prod_name)):
                p_n = prod_name[i]
                p_q = prod_qty[i]
                p_p = prod_price[i]

                # this is checking the variable, if variable is null so fill the varable value in database
                if p_n != "" and p_q != "" and p_p != "":
                    prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
                    prod_tab.save()
                i=i+1
            
            return HttpResponse('Your Record Has been Saved')
        except Exception as e:
            return HttpResponse(e)     
        
    return render(request, 'invoice_system/order.html')

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser11617View Question on Stackoverflow
Solution 1 - PythonmonkutView Answer on Stackoverflow
Solution 2 - PythonA-IVView Answer on Stackoverflow
Solution 3 - PythonVinay SajipView Answer on Stackoverflow
Solution 4 - PythonConcernedOfTunbridgeWellsView Answer on Stackoverflow
Solution 5 - PythonMike O'ConnorView Answer on Stackoverflow
Solution 6 - PythonWeholtView Answer on Stackoverflow
Solution 7 - PythonMohit MishraView Answer on Stackoverflow
Solution 8 - PythonSergey GolovchenkoView Answer on Stackoverflow
Solution 9 - PythonMohit MishraView Answer on Stackoverflow