How to do bulk (multi row) inserts with JpaRepository?

HibernateSpring BootKotlinSpring Data-JpaCockroachdb

Hibernate Problem Overview


When calling the saveAll method of my JpaRepository with a long List<Entity> from the service layer, trace logging of Hibernate shows single SQL statements being issued per entity.

Can I force it to do a bulk insert (i.e. multi-row) without needing to manually fiddle with EntityManger, transactions etc. or even raw SQL statement strings?

With multi-row insert I mean not just transitioning from:

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

to:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

but instead to:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

In PROD I'm using CockroachDB, and the difference in performance is significant.

Below is a minimal example that reproduces the problem (H2 for simplicity).


./src/main/kotlin/ThingService.kt:

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties:

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts:

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

Run:

./gradlew bootRun

Trigger DB INSERTs:

curl http://localhost:8080/test_trigger

Log output:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)

Hibernate Solutions


Solution 1 - Hibernate

To get a bulk insert with Spring Boot and Spring Data JPA you need only two things:

  1. set the option spring.jpa.properties.hibernate.jdbc.batch_size to appropriate value you need (for example: 20).

  2. use saveAll() method of your repo with the list of entities prepared for inserting.

Working example is here.

Regarding the transformation of the insert statement into something like this:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

the such is available in PostgreSQL: you can set the option reWriteBatchedInserts to true in jdbc connection string:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

then jdbc driver will do this transformation.

Additional info about batching you can find here.

UPDATED

Demo project in Kotlin: sb-kotlin-batch-insert-demo

UPDATED

> Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.

Solution 2 - Hibernate

The underlying issues is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

In addition to the batch size property settings, you have to make sure that the class SimpleJpaRepository calls persist and not merge. There are a few approaches to resolve this: use an @Id generator that does not query sequence, like

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: Long

Or forcing the persistence to treat the records as new by having your entity implement Persistable and overriding the isNew() call

@Entity
class Thing implements Pesistable<Long> {
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
    @Transient
    private boolean isNew = true;
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
    @Override
    boolean isNew() {
        return isNew;
    }
}

Or override the save(List) and use the entity manager to call persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

The above code is based on the following links:

Solution 3 - Hibernate

You can configure Hibernate to do bulk DML. Have a look at https://stackoverflow.com/questions/36356473/spring-data-jpa-concurrent-bulk-inserts-updates. I think section 2 of the answer could solve your problem:

> Enable the batching of DML statements Enabling the batching support > would result in less number of round trips to the database to > insert/update the same number of records. > > Quoting from batch INSERT and UPDATE statements: > > hibernate.jdbc.batch_size = 50 > > hibernate.order_inserts = true > > hibernate.order_updates = true > > hibernate.jdbc.batch_versioned_data = true

UPDATE: You have to set the hibernate properties differently in your application.properties file. They are under the namespace: spring.jpa.properties.*. An example could look like the following:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....

Solution 4 - Hibernate

All mentioned methods work but will be slow especially if the source for inserted data lies in some other table. Firstly, even with batch_size>1 the insert operation will be executed in multiple SQL queries. Secondly, if the source data lies in the other table you need to fetch the data with other queries (and in the worst case scenario load all data into memory), and convert it to static bulk inserts. Thirdly, with separate persist() call for each entity (even if batch is enabled) you will bloat entity manager first level cache with all these entity instances.

But there's another option for Hibernate. If you use Hibernate as a JPA provider you can fallback to HQL which supports bulk inserts natively with subselect from another table. The example:

Session session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
  .executeUpdate();

Whether this will work depends on your ID generation strategy. If the Entity.id is generated by the database (for example MySQL auto increment), it will be executed successfully. If the Entity.id is generated by your code (especially true for UUID generators), it will fail with "unsupported id generation method" exception.

However, in the latter scenario this problem can be overcome by custom SQL function. For example in PostgreSQL I use uuid-ossp extension which provides uuid_generate_v4() function, which I finally register in my custom dialog:

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

	public MyPostgresDialect() {
		registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
	}
}

And then I register this class as a hibernate dialog:

hibernate.dialect=MyPostgresDialect

Finally I can use this function in bulk insert query:

SessionImpl session = entityManager.unwrap(Session::class.java);
session.createQuery("insert into Entity (id, field1, field2) "+
  "select uuid_generate_v4(), [...] from [...]")
  .executeUpdate();

The most important is the underlying SQL generated by Hibernate to accomplish this operation and it's just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionTobias HermannView Question on Stackoverflow
Solution 1 - HibernateCepr0View Answer on Stackoverflow
Solution 2 - HibernateJean MaroisView Answer on Stackoverflow
Solution 3 - HibernaterieckpilView Answer on Stackoverflow
Solution 4 - HibernateLukasz FrankowskiView Answer on Stackoverflow