How to get primary keys of objects created using django bulk_create

DjangoDjango ModelsDjango ViewsDjango Queryset

Django Problem Overview


Is there a way to get the primary keys of the items you have created using the bulk_create feature in django 1.4+?

Django Solutions


Solution 1 - Django

2016

Since Django 1.10 - it's now supported (on Postgres only) here is a link to the doc.

>>> list_of_objects = Entry.objects.bulk_create([
...     Entry(headline="Django 2.0 Released"),
...     Entry(headline="Django 2.1 Announced"),
...     Entry(headline="Breaking: Django is awesome")
... ])
>>> list_of_objects[0].id
1

From the change log:

> Changed in Django 1.10: Support for setting primary keys on objects created using bulk_create() when using PostgreSQL was added

Solution 2 - Django

According to the documentation you can't do it: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create

bulk-create is just for that: create a lot of objects in an efficient way saving a lot of queries. But that means that the response you get is kind of incomplete. If you do:

>>> categories = Category.objects.bulk_create([
    Category(titel="Python", user=user),
    Category(titel="Django", user=user),
    Category(titel="HTML5", user=user),
])

>>> [x.pk for x in categories]
[None, None, None]

That doesn't mean your categories doesn't have pk, just that the query didn't retrieve them (if the key is an AutoField). If you want the pks for some reason you will need to save the objects in a classic way.

Solution 3 - Django

Two approaches I can think of:

a) You could do

category_ids = Category.objects.values_list('id', flat=True)
categories = Category.objects.bulk_create([
    Category(title="title1", user=user, created_at=now),
    Category(title="title2", user=user, created_at=now),
    Category(title="title3", user=user, created_at=now),
])
new_categories_ids = Category.objects.exclude(id__in=category_ids).values_list('id', flat=True)

This could be a little expensive if the queryset is extremely huge.

b) If the model has a created_at field,

now = datetime.datetime.now()
categories = Category.objects.bulk_create([
    Category(title="title1", user=user, created_at=now),
    Category(title="title2", user=user, created_at=now),
    Category(title="title3", user=user, created_at=now),
])

new_cats = Category.objects.filter(created_at >= now).values_list('id', flat=True)

This has the limitation of having a field that stores when the object was created.

Solution 4 - Django

Actually my colleague has suggested the following solution which seems all so obvious now. Add a new column called bulk_ref which you populate with a unique value and insert for every row. Afterwards simply query the table with the bulk_ref set beforehand and voila, your inserted records are retrieved. e.g.:

cars = [Car(
    model="Ford",
    color="Blue",
    price="5000",
    bulk_ref=5,
),Car(
    model="Honda",
    color="Silver",
    price="6000",
    bulk_ref=5,
)]
Car.objects.bulk_create(cars)
qs = Car.objects.filter(bulk_ref=5)

Solution 5 - Django

I will share you AUTO_INCREMENT handling in InnoDB (MySQL) and approach to get primary key when bulk_create (Django)

According to bulk_create doc If the model’s primary key is an AutoField it does not retrieve and set the primary key attribute, as save() does, unless the database backend supports it (currently PostgreSQL). so we need to find out the cause of the problem in Django or MySQL before looking for a solution.

The AUTO FIELD in Django is actually AUTO_INCREMENT in MySQL. It used to generate a unique identity for new rows (ref)

You want to bulk_create objects (Django) means insert multiple rows in a single SQL query. But how you can retrieve the most recent automatically generated PK (primary key)? Thanks to LAST_INSERT_ID. It returns first value automatically generated of the most recently executed INSERT statement...This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.

I encourage you to read AUTO_INCREMENT Handling in InnoDB and read Django code django.db.models.query.QuerySet.bulk_create to know why Django not support it for MySQl yet. It's interesting. Come back here and comment your idea please.

Next, I will show you sample code:

from django.db import connections, models, transaction
from django.db.models import AutoField, sql

def dict_fetch_all(cursor):
    """Return all rows from a cursor as a dict"""
    columns = [col[0] for col in cursor.description]
    return [
        dict(zip(columns, row))
        for row in cursor.fetchall()
    ]

class BulkQueryManager(models.Manager):
    def bulk_create_return_with_id(self, objs, batch_size=2000):
        self._for_write = True
        fields = [f for f in self.model._meta.concrete_fields if not isinstance(f, AutoField)]
        created_objs = []
        with transaction.atomic(using=self.db):
            with connections[self.db].cursor() as cursor:
                for item in [objs[i:i + batch_size] for i in range(0, len(objs), batch_size)]:
                    query = sql.InsertQuery(self.model)
                    query.insert_values(fields, item)
                    for raw_sql, params in query.get_compiler(using=self.db).as_sql():
                        cursor.execute(raw_sql, params)
                    raw = "SELECT * FROM %s WHERE id >= %s ORDER BY id DESC LIMIT %s" % (
                        self.model._meta.db_table, cursor.lastrowid, cursor.rowcount
                    )
                    cursor.execute(raw)
                    created_objs.extend(dict_fetch_all(cursor))

        return created_objs

class BookTab(models.Model):
    name = models.CharField(max_length=128)
    bulk_query_manager = BulkQueryManager()

    class Meta:
        db_table = 'book_tab'


def test():
    x = [BookTab(name="1"), BookTab(name="2")]
    create_books = BookTab.bulk_query_manager.bulk_create_return_with_id(x)
    print(create_books)  # [{'id': 2, 'name': '2'}, {'id': 1, 'name': '1'}]

The idea is using cursor to execute raw insert sql and then get back created_records. According to AUTO_INCREMENT handling in InnoDB, it make sure that there will be no records interrupting your objs from PK cursor.lastrowid - len(objs) + 1 to cursor.lastrowid (cursor.lastrowid).

Bonus: It's running production in my company. But you need to care about size affect that why Django not support it.

Solution 6 - Django

# datatime.py
# my datatime function
def getTimeStamp(needFormat=0, formatMS=True):
    if needFormat != 0:
        return datetime.datetime.now().strftime(f'%Y-%m-%d %H:%M:%S{r".%f" if formatMS else ""}')
    else:
        ft = time.time()
        return (ft if formatMS else int(ft))


def getTimeStampString():
    return str(getTimeStamp()).replace('.', '')


# model
    bulk_marker = models.CharField(max_length=32, blank=True, null=True, verbose_name='bulk_marker', help_text='ONLYFOR_bulkCreate')



# views
import .........getTimeStampString

data_list(
Category(title="title1", bulk_marker=getTimeStampString()),
...
)
# bulk_create
Category.objects.bulk_create(data_list)
# Get primary Key id
Category.objects.filter(bulk_marker=bulk_marker).values_list('id', flat=True)

Solution 7 - Django

I have tried many strategies to get around this limitation of MariaDB/MySQL. The only reliable solution I came up with at the end was to generate the primary keys in the application. DO NOT generate INT AUTO_INCREMENT PK fields yourself, it won't work, not even in a transaction with isolation level serializable, because the PK counter in MariaDB is not protected by transaction locks.

The solution is to add unique UUID fields to the models, generate their values in the model class, and then use that as their identifier. When you save a bunch of models to the database, you still won't get back their actual PK but that's fine, because in subsequent queries you can uniquely identify them with their UUID.

Solution 8 - Django

The django documentation currently states under the limitations:

> If the model’s primary key is an AutoField it does not retrieve and > set the primary key attribute, as save() does.

But, there is good news. There has been a couple of tickets talking about bulk_create from memory. The ticket listed above is the most likely to have a solution which will soon be implemented but obviously there is no guarantee on time or if it will ever make it.

So there are two possible solutions,

  1. Wait and see if this patch makes it to production. You can help with this by testing out the stated solution and let the django community know your thoughts / issues. https://code.djangoproject.com/attachment/ticket/19527/bulk_create_and_create_schema_django_v1.5.1.patch

  2. Override / write your own bulk insert solution.

Solution 9 - Django

Probably the simplest workaround is manually assigning primary keys. It depends on particular case, but sometimes it's enough to start with max(id)+1 from table and assign numbers incrementing on every object. However if several clients may insert records simultaneously some lock may be needed.

Solution 10 - Django

This doesn't work in stock Django, but there is a patch in the Django bug tracker that makes bulk_create set the primary keys for created objects.

Solution 11 - Django

The approach suggested by @Or Duan works for PostgreSQL when using bulk_create with ignore_conflicts=False. When ignore_conflicts=True is set then you don't get the values for the AutoField (usually ID) in the returned objects.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionmikecView Question on Stackoverflow
Solution 1 - DjangoOr DuanView Answer on Stackoverflow
Solution 2 - DjangopyrikuView Answer on Stackoverflow
Solution 3 - DjangokarthikrView Answer on Stackoverflow
Solution 4 - DjangoDanHView Answer on Stackoverflow
Solution 5 - DjangoHoang Dung PhamView Answer on Stackoverflow
Solution 6 - DjangoKevinView Answer on Stackoverflow
Solution 7 - DjangoPatrickView Answer on Stackoverflow
Solution 8 - DjangoMatt SeymourView Answer on Stackoverflow
Solution 9 - Djangopeper0View Answer on Stackoverflow
Solution 10 - Djangouser3175220View Answer on Stackoverflow
Solution 11 - DjangoKunal YadavView Answer on Stackoverflow