Bulk inserts taking longer than expected using Dapper

Tags: Performance, SqlBulkCopy, Dapper

Performance Problem Overview


After reading this article I decided to take a closer look at the way I was using Dapper.

I ran this code on an empty database

var members = new List<Member>();
for (int i = 0; i < 50000; i++)
{
	members.Add(new Member()
	{
		Username = i.ToString(),
		IsActive = true
	});
}

using (var scope = new TransactionScope())
{
	connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

	scope.Complete();
}

It took about 20 seconds. That's 2,500 inserts/second. Not bad, but not great either, considering the blog was achieving 45k inserts/second. Is there a more efficient way to do this in Dapper?

Also, as a side note, running this code through the Visual Studio debugger took over 3 minutes! I figured the debugger would slow it down a little, but I was really surprised to see that much.

UPDATE

So this

using (var scope = new TransactionScope())
{
	connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

	scope.Complete();
}

and this

	connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

both took 20 seconds.

But this took 4 seconds!

SqlTransaction trans = connection.BeginTransaction();

connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members, transaction: trans);

trans.Commit();

Performance Solutions


Solution 1 - Performance

The best I was able to achieve was 50k records in 4 seconds using this approach

SqlTransaction trans = connection.BeginTransaction();

connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members, transaction: trans);

trans.Commit();

Solution 2 - Performance

I stumbled across this recently and noticed that the TransactionScope is created after the connection is opened (I assume this since Dapper's Execute doesn't open the connection, unlike Query). According to the answer to Q4 here: https://stackoverflow.com/a/2886326/455904, that will not result in the connection being handled by the TransactionScope. My workmate did some quick tests, and opening the connection outside the TransactionScope drastically decreased performance.

So changing to the following should work:

// Assuming the connection isn't already open
using (var scope = new TransactionScope())
{
    connection.Open();
    connection.Execute(@"
insert Member(Username, IsActive)
values(@Username, @IsActive)", members);

    scope.Complete();
}

Solution 3 - Performance

I created an extension method that would allow you to do a bulk insert very quickly.

public static class DapperExtensions
{
    public static async Task BulkInsert<T>(
        this IDbConnection connection,
        string tableName,
        IReadOnlyCollection<T> items,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        const int MaxBatchSize = 1000;
        const int MaxParameterSize = 2000;

        // Keep each batch under both the row cap and the parameter cap
        // (SQL Server allows at most 2,100 parameters per command).
        var batchSize = Math.Min((int)Math.Ceiling((double)MaxParameterSize / dataFunc.Keys.Count), MaxBatchSize);
        var numberOfBatches = (int)Math.Ceiling((double)items.Count / batchSize);
        var columnNames = dataFunc.Keys;
        var insertSql = $"INSERT INTO {tableName} ({string.Join(", ", columnNames.Select(e => $"[{e}]"))}) VALUES ";
        var sqlToExecute = new List<Tuple<string, DynamicParameters>>();
        
        for (var i = 0; i < numberOfBatches; i++)
        {
            var dataToInsert = items.Skip(i * batchSize)
                .Take(batchSize);
            var valueSql = GetQueries(dataToInsert, dataFunc);

            sqlToExecute.Add(Tuple.Create($"{insertSql}{string.Join(", ", valueSql.Item1)}", valueSql.Item2));
        }
        
        foreach (var sql in sqlToExecute)
        {
            await connection.ExecuteAsync(sql.Item1, sql.Item2, commandTimeout: int.MaxValue);
        }
    }
    
    private static Tuple<IEnumerable<string>, DynamicParameters> GetQueries<T>(
        IEnumerable<T> dataToInsert,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        var parameters = new DynamicParameters();

        return Tuple.Create(
            dataToInsert.Select(e => $"({string.Join(", ", GenerateQueryAndParameters(e, parameters, dataFunc))})"),
            parameters);
    }

    private static IEnumerable<string> GenerateQueryAndParameters<T>(
        T entity,
        DynamicParameters parameters,
        Dictionary<string, Func<T, object>> dataFunc)
    {
        var paramTemplateFunc = new Func<Guid, string>(guid => $"@p{guid.ToString().Replace("-", "")}");
        var paramList = new List<string>();

        foreach (var key in dataFunc)
        {
            var paramName = paramTemplateFunc(Guid.NewGuid());
            parameters.Add(paramName, key.Value(entity));
            paramList.Add(paramName);
        }

        return paramList;
    }
}

Then to use this extension method, you would write code like the following:

await dbConnection.BulkInsert(
    "MySchemaName.MyTableName",
    myCollectionOfItems,
    new Dictionary<string, Func<MyObjectToInsert, object>>
        {
            { "ColumnOne", u => u.ColumnOne },
            { "ColumnTwo", u => u.ColumnTwo },
            ...
        });

This is quite primitive and has further room for improvement, such as passing in a transaction or a commandTimeout value, but it does the trick for me.
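As a sketch of one of those improvements (the extra optional parameters below are hypothetical additions, not part of the original extension method), the signature could thread a transaction and timeout straight through to Dapper's ExecuteAsync:

```csharp
// Hypothetical variant of the extension's signature: the optional
// transaction and commandTimeout parameters are simply forwarded
// to Dapper. Untested sketch, not the original author's code.
public static async Task BulkInsert<T>(
    this IDbConnection connection,
    string tableName,
    IReadOnlyCollection<T> items,
    Dictionary<string, Func<T, object>> dataFunc,
    IDbTransaction transaction = null,
    int? commandTimeout = null)
{
    // ...build the batched SQL exactly as in the original method, then:
    // await connection.ExecuteAsync(sql.Item1, sql.Item2,
    //     transaction: transaction, commandTimeout: commandTimeout);
}
```

Dapper's ExecuteAsync already accepts both of these as optional arguments, so no other part of the method would need to change.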

Solution 4 - Performance

Using the Execute method with a single insert statement will never perform a bulk insert, nor be efficient. Even the accepted answer with a transaction doesn't do a bulk insert.

If you want to perform a bulk insert, use SqlBulkCopy: https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy

You will not find anything faster than this.
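For the Member table from the question, a minimal SqlBulkCopy sketch might look like this (it assumes connection is a SqlConnection rather than a plain IDbConnection, and that members is the list built in the question; it needs a live SQL Server to run):

```csharp
using System.Data;
using System.Data.SqlClient;

// Stage the rows in a DataTable, then stream them to the server
// in batches with SqlBulkCopy.
var table = new DataTable();
table.Columns.Add("Username", typeof(string));
table.Columns.Add("IsActive", typeof(bool));

foreach (var member in members)
{
    table.Rows.Add(member.Username, member.IsActive);
}

using (var bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.DestinationTableName = "Member";
    bulkCopy.ColumnMappings.Add("Username", "Username");
    bulkCopy.ColumnMappings.Add("IsActive", "IsActive");
    bulkCopy.BatchSize = 5000; // tune for your workload
    bulkCopy.WriteToServer(table);
}
```

SqlBulkCopy bypasses per-row INSERT statements entirely, which is why it beats every parameterized-INSERT approach above.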

Dapper Plus

Disclaimer: I'm the owner of the project Dapper Plus

This project is not free but offers all bulk operations:

  • BulkInsert
  • BulkUpdate
  • BulkDelete
  • BulkMerge

(They use SqlBulkCopy under the hood.)

And some more options such as outputting identity values:

// CONFIGURE & MAP entity
DapperPlusManager.Entity<Order>()
                 .Table("Orders")
                 .Identity(x => x.ID);

// CHAIN & SAVE entity
connection.BulkInsert(orders)
          .AlsoInsert(order => order.Items)
          .Include(x => x.ThenMerge(order => order.Invoice)
                         .AlsoMerge(invoice => invoice.Items))
          .AlsoMerge(x => x.ShippingAddress);

Our library supports multiple providers:

  • SQL Server
  • SQL Compact
  • Oracle
  • MySql
  • PostgreSQL
  • SQLite
  • Firebird

Solution 5 - Performance

I found all these examples incomplete.

Here is some code that properly closes the connection after use, and also correctly uses the TransactionScope to enhance the Execute performance, based on the more recent and better answers in this thread.

using (var scope = new TransactionScope()) 
{
	Connection.Open();
	Connection.Execute(sqlQuery, parameters);

	scope.Complete();
}

Solution 6 - Performance

The fastest variant for me:

var dynamicParameters = new DynamicParameters();
var selects = new List<string>();
for (var i = 0; i < members.Count; i++)
{
    var member = members[i];
    var pUsername = $"u{i}";
    var pIsActive = $"a{i}";
    dynamicParameters.Add(pUsername, member.Username);
    dynamicParameters.Add(pIsActive, member.IsActive);
    selects.Add($"select @{pUsername}, @{pIsActive}");
}
con.Execute($"insert into Member(Username, IsActive) {string.Join(" union all ", selects)}", dynamicParameters);

which generates SQL like:

INSERT INTO Member (Username, IsActive)
 SELECT @u0, @a0
 UNION ALL
 SELECT @u1, @a1
 UNION ALL
 SELECT @u2, @a2

This query works faster because SQL Server adds a set of rows instead of adding one row at a time. The bottleneck is not writing the data; it's writing what you're doing to the transaction log.

Also, look into the rules of minimally logged transactions.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | kenwarner | View Question on Stackoverflow
Solution 1 - Performance | kenwarner | View Answer on Stackoverflow
Solution 2 - Performance | Fredrik Ljung | View Answer on Stackoverflow
Solution 3 - Performance | CallumVass | View Answer on Stackoverflow
Solution 4 - Performance | Jonathan Magnan | View Answer on Stackoverflow
Solution 5 - Performance | Erik Bergstedt | View Answer on Stackoverflow
Solution 6 - Performance | razon | View Answer on Stackoverflow