Converting Django QuerySet to pandas DataFrame

PythonDjangoPandas

Python Problem Overview


I am going to convert a Django QuerySet to a pandas DataFrame as follows:

qs = SomeModel.objects.select_related().filter(date__year=2012)
q = qs.values('date', 'OtherField')
df = pd.DataFrame.from_records(q)

It works, but is there a more efficient way?

Python Solutions


Solution 1 - Python

import pandas as pd
import datetime
from myapp.models import BlogPost

df = pd.DataFrame(list(BlogPost.objects.all().values()))
df = pd.DataFrame(list(BlogPost.objects.filter(date__gte=datetime.datetime(2012, 5, 1)).values()))

# limit which fields
df = pd.DataFrame(list(BlogPost.objects.all().values('author', 'date', 'slug')))

The above is how I do the same thing. The most useful addition is specifying which fields you are interested in. If it's only a subset of the available fields you are interested in, then this would give a performance boost I imagine.

Solution 2 - Python

Convert the queryset on values_list() will be more memory efficient than on values() directly. Since the method values() returns a queryset of list of dict (key:value pairs), values_list() only returns list of tuple (pure data). It will save about 50% memory, just need to set the column information when you call pd.DataFrame().

Method 1:
queryset = models.xxx.objects.values("A","B","C","D")
df = pd.DataFrame(list(queryset))  ## consumes much memory
#df = pd.DataFrame.from_records(queryset) ## works but no much change on memory usage

Method 2: queryset = models.xxx.objects.values_list("A","B","C","D") df = pd.DataFrame(list(queryset), columns=["A","B","C","D"]) ## this will save 50% memory #df = pd.DataFrame.from_records(queryset, columns=["A","B","C","D"]) ##It does not work. Crashed with datatype is queryset not list.

I tested this on my project with >1 million rows data, the peak memory is reduced from 2G to 1G.

Solution 3 - Python

Django Pandas solves this rather neatly: https://github.com/chrisdev/django-pandas/

From the README:

class MyModel(models.Model):
    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

from django_pandas.io import read_frame
qs = MyModel.objects.all()
df = read_frame(qs)

Solution 4 - Python

From the Django perspective (I'm not familiar with pandas) this is fine. My only concern is that if you have a very large number of records, you may run into memory problems. If this were the case, something along the lines of this memory efficient queryset iterator would be necessary. (The snippet as written might require some rewriting to allow for your smart use of .values()).

Solution 5 - Python

You maybe can use model_to_dict

import datetime
from django.forms import model_to_dict
pallobjs = [ model_to_dict(pallobj) for pallobj in PalletsManag.objects.filter(estado='APTO_PARA_VENTA')] 
df = pd.DataFrame(pallobjs)
df.head()

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionFranco MariluisView Question on Stackoverflow
Solution 1 - PythonlexualView Answer on Stackoverflow
Solution 2 - Pythonshengyang wangView Answer on Stackoverflow
Solution 3 - PythonDavid WatsonView Answer on Stackoverflow
Solution 4 - PythonDavid EykView Answer on Stackoverflow
Solution 5 - PythonPjlView Answer on Stackoverflow