Most efficient way to insert Rows into MySQL Database

C#, MySQL, Performance

C# Problem Overview


I've read a lot of questions about this, but I couldn't find an answer that is fast enough. I think there must be better ways to insert a lot of rows into a MySQL database.

I use the following code to insert 100k rows into my MySQL database:

public static void CSVToMySQL()
{
    string ConnectionString = "server=192.168.1xxx";
    string Command = "INSERT INTO User (FirstName, LastName) VALUES (@FirstName, @LastName);";
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    {
        mConnection.Open();

        for (int i = 0; i < 100000; i++) // inserting 100k items
        {
            using (MySqlCommand myCmd = new MySqlCommand(Command, mConnection))
            {
                myCmd.CommandType = CommandType.Text;
                myCmd.Parameters.AddWithValue("@FirstName", "test");
                myCmd.Parameters.AddWithValue("@LastName", "test");
                myCmd.ExecuteNonQuery();
            }
        }
    }
}

This takes about 40 seconds for 100k rows. How can I make it faster or more efficient?

Might it be faster to insert multiple rows via a DataTable/DataAdapter, or all at once:

INSERT INTO User (Fn, Ln) VALUES (@Fn1, @Ln1), (@Fn2, @Ln2)...

Due to security restrictions I can't load the data into a file and use MySqlBulkLoader on it.

C# Solutions


Solution 1 - C#

Here is my "multiple inserts" code.

Inserting the 100k rows took only 3 seconds instead of 40!

public static void BulkToMySQL()
{
    string ConnectionString = "server=192.168.1xxx";
    StringBuilder sCommand = new StringBuilder("INSERT INTO User (FirstName, LastName) VALUES ");           
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    {
        List<string> Rows = new List<string>();
        for (int i = 0; i < 100000; i++)
        {
            Rows.Add(string.Format("('{0}','{1}')", MySqlHelper.EscapeString("test"), MySqlHelper.EscapeString("test")));
        }
        sCommand.Append(string.Join(",", Rows));
        sCommand.Append(";");
        mConnection.Open();
        using (MySqlCommand myCmd = new MySqlCommand(sCommand.ToString(), mConnection))
        {
            myCmd.CommandType = CommandType.Text;
            myCmd.ExecuteNonQuery();
        }
    }
}

The created SQL-statement looks like this:

INSERT INTO User (FirstName, LastName) VALUES ('test','test'),('test','test'),... ;

Update: Thanks to Salman A, I added MySqlHelper.EscapeString to avoid SQL injection; it is what gets used internally when you use parameters.
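
For completeness, the multi-row statement can also be built with parameters instead of escaped literals, as the question itself suggests. A minimal sketch (the parameter naming is my own, and I have not benchmarked it against the version above):

// Sketch: parameterized multi-row INSERT for the same User table.
// Produces: INSERT INTO User (FirstName, LastName) VALUES (@f0,@l0),(@f1,@l1),...
public static void BulkToMySQLParameterized()
{
    string ConnectionString = "server=192.168.1xxx";
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    using (MySqlCommand myCmd = mConnection.CreateCommand())
    {
        var values = new List<string>();
        for (int i = 0; i < 1000; i++) // 1000 rows per statement keeps the packet small
        {
            values.Add(string.Format("(@f{0},@l{0})", i));
            myCmd.Parameters.AddWithValue("@f" + i, "test");
            myCmd.Parameters.AddWithValue("@l" + i, "test");
        }
        myCmd.CommandText = "INSERT INTO User (FirstName, LastName) VALUES "
                            + string.Join(",", values) + ";";
        mConnection.Open();
        myCmd.ExecuteNonQuery();
    }
}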

Solution 2 - C#

I did a small test using three things: MySqlDataAdapter, transactions, and UpdateBatchSize. It is about 30 times faster than your first example. MySQL is running on a separate box, so there is latency involved. The batch size might need some tuning. Code follows:

string ConnectionString = "server=xxx;Uid=xxx;Pwd=xxx;Database=xxx";

string Command = "INSERT INTO User2 (FirstName, LastName ) VALUES (@FirstName, @LastName);";


 using (var mConnection = new MySqlConnection(ConnectionString))
     {
         mConnection.Open();
         MySqlTransaction transaction = mConnection.BeginTransaction();

        //Obtain a dataset, obviously a "select *" is not the best way...
        var mySqlDataAdapterSelect = new MySqlDataAdapter("select * from User2", mConnection);

        var ds = new DataSet();

        mySqlDataAdapterSelect.Fill(ds, "User2");


        var mySqlDataAdapter = new MySqlDataAdapter();

        mySqlDataAdapter.InsertCommand = new MySqlCommand(Command, mConnection);


        mySqlDataAdapter.InsertCommand.Parameters.Add("@FirstName", MySqlDbType.VarChar, 32, "FirstName");
        mySqlDataAdapter.InsertCommand.Parameters.Add("@LastName", MySqlDbType.VarChar, 32, "LastName");
        mySqlDataAdapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        for (int i = 0; i < 50000; i++)
        {
            DataRow row = ds.Tables["User2"].NewRow();
            row["FirstName"] = "1234";
            row["LastName"] = "1234";
            ds.Tables["User2"].Rows.Add(row);
        }

         mySqlDataAdapter.UpdateBatchSize = 100;
         mySqlDataAdapter.Update(ds, "User2");

         transaction.Commit();

         stopwatch.Stop();
         Debug.WriteLine(" inserts took " + stopwatch.ElapsedMilliseconds + "ms");
    }
}

Solution 3 - C#

Execute the command in a transaction and reuse the same command instance for each iteration. For further optimization, send 100 queries in one command. Parallel execution could give better performance (Parallel.For), but make sure each parallel loop gets its own MySqlCommand instance.

public static void CSVToMySQL()
{
	string ConnectionString = "server=192.168.1xxx";
	string Command = "INSERT INTO User (FirstName, LastName ) VALUES (@FirstName, @LastName);";
	using (MySqlConnection mConnection = new MySqlConnection(ConnectionString)) 
	{
		mConnection.Open();
		using (MySqlTransaction trans = mConnection.BeginTransaction()) 
		{
			using (MySqlCommand myCmd = new MySqlCommand(Command, mConnection, trans)) 
			{
				myCmd.CommandType = CommandType.Text;
				for (int i = 0; i <= 99999; i++) 
				{
					//inserting 100k items
					myCmd.Parameters.Clear();
					myCmd.Parameters.AddWithValue("@FirstName", "test");
					myCmd.Parameters.AddWithValue("@LastName", "test");
					myCmd.ExecuteNonQuery();
				}
				trans.Commit();
			}
		}
	}
}
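
A further micro-optimization along the same lines, as a hedged sketch (not benchmarked): add the two parameters once and only update their values inside the loop, instead of clearing and re-adding them on every iteration:

// Sketch: same transaction approach, but the parameters are declared once
// and only their values change per row.
public static void CSVToMySQLReusedParams()
{
    string ConnectionString = "server=192.168.1xxx";
    string Command = "INSERT INTO User (FirstName, LastName) VALUES (@FirstName, @LastName);";
    using (MySqlConnection mConnection = new MySqlConnection(ConnectionString))
    {
        mConnection.Open();
        using (MySqlTransaction trans = mConnection.BeginTransaction())
        using (MySqlCommand myCmd = new MySqlCommand(Command, mConnection, trans))
        {
            myCmd.Parameters.Add("@FirstName", MySqlDbType.VarChar);
            myCmd.Parameters.Add("@LastName", MySqlDbType.VarChar);
            for (int i = 0; i < 100000; i++)
            {
                myCmd.Parameters["@FirstName"].Value = "test";
                myCmd.Parameters["@LastName"].Value = "test";
                myCmd.ExecuteNonQuery();
            }
            trans.Commit();
        }
    }
}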

Solution 4 - C#

This way may not be faster than the StringBuilder approach, but it is parameterized:

/// <summary>
    /// Bulk insert some data, uses parameters
    /// </summary>
    /// <param name="table">The Table Name</param>
    /// <param name="inserts">Holds list of data to insert</param>
    /// <param name="batchSize">executes the insert after batch lines</param>
    /// <param name="progress">Progress reporting</param>
    public void BulkInsert(string table, MySQLBulkInsertData inserts, int batchSize = 100, IProgress<double> progress = null)
    {
        if (inserts.Count <= 0) throw new ArgumentException("Nothing to Insert");

        string insertcmd = string.Format("INSERT INTO `{0}` ({1}) VALUES ", table,
                                         inserts.Fields.Select(p => p.FieldName).ToCSV());
        StringBuilder sb = new StringBuilder(); 
        using (MySqlConnection conn = new MySqlConnection(ConnectionString))
        using (MySqlCommand sqlExecCommand = conn.CreateCommand())
        {
            conn.Open();
            sb.AppendLine(insertcmd);
            for (int i = 0; i < inserts.Count; i++)
            {
                sb.AppendLine(ToParameterCSV(inserts.Fields, i));
                for (int j = 0; j < inserts[i].Count(); j++)
                {
                    sqlExecCommand.Parameters.AddWithValue(string.Format("{0}{1}",inserts.Fields[j].FieldName,i), inserts[i][j]);
                }
                //commit if we are on the batch sizeor the last item
                if (i > 0 && (i%batchSize == 0 || i == inserts.Count - 1))
                {
                    sb.Append(";");
                    sqlExecCommand.CommandText = sb.ToString();
                    sqlExecCommand.ExecuteNonQuery();
                    //reset the stringBuilder
                    sb.Clear();
                    sb.AppendLine(insertcmd);
                    if (progress != null)
                    {
                        progress.Report((double)i/inserts.Count);
                    }
                }
                else
                {
                    sb.Append(",");
                }
            }
        }
    }

This uses the helper classes as below:

/// <summary>
/// Helper class to bulk insert data into a table
/// </summary>
public struct MySQLFieldDefinition
{
    public MySQLFieldDefinition(string field, MySqlDbType type) : this()
    {
        FieldName = field;
        ParameterType = type;
    }

    public string FieldName { get; private set; }
    public MySqlDbType ParameterType { get; private set; }
}

/// <summary>
/// You need to ensure the field names are in the same order as the object[] array
/// </summary>
public class MySQLBulkInsertData : List<object[]>
{
    public MySQLBulkInsertData(params MySQLFieldDefinition[] fieldnames)
    {
        Fields = fieldnames;
    }

    public MySQLFieldDefinition[] Fields { get; private set; }
}

And this helper method:

    /// <summary>
    /// Return a CSV string of the values in the list
    /// </summary>
    /// <returns></returns>
    /// <exception cref="ArgumentNullException"></exception>
    private string ToParameterCSV(IEnumerable<MySQLFieldDefinition> p, int row)
    {
        string csv = p.Aggregate(string.Empty,
            (current, i) => string.IsNullOrEmpty(current)
                    ? string.Format("@{0}{1}",i.FieldName, row)
                    : string.Format("{0},@{2}{1}", current, row, i.FieldName));
        return string.Format("({0})", csv);
    }

Maybe not super elegant, but it works well. I need progress tracking, so that is included; feel free to remove that part.

This will produce SQL commands similar to your desired output.

EDIT: ToCSV:

        /// <summary>
    /// Return a CSV string of the values in the list
    /// </summary>
    /// <param name="intValues"></param>
    /// <param name="separator"></param>
    /// <param name="encloser"></param>
    /// <returns></returns>
    /// <exception cref="ArgumentNullException"></exception>
    public static string ToCSV<T>(this IEnumerable<T> intValues, string separator = ",", string encloser = "")
    {
        string result = String.Empty;
        foreach (T value in intValues)
        {
            result = String.IsNullOrEmpty(result)
                ? string.Format("{1}{0}{1}", value, encloser)
                : String.Format("{0}{1}{3}{2}{3}", result, separator, value, encloser);
        }
        return result;
    }
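
A hypothetical usage sketch (assuming the question's User table and that BulkInsert lives on a class exposing ConnectionString; field sizes and names are my guess):

// Define the fields once, add the rows, then insert in batches of 500
// while reporting progress to the console.
var data = new MySQLBulkInsertData(
    new MySQLFieldDefinition("FirstName", MySqlDbType.VarChar),
    new MySQLFieldDefinition("LastName", MySqlDbType.VarChar));

for (int i = 0; i < 100000; i++)
{
    data.Add(new object[] { "test", "test" });
}

BulkInsert("User", data, 500, new Progress<double>(p => Console.WriteLine("{0:P0}", p)));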

Solution 5 - C#

If Add or AddWithValue does not escape strings, you must do so in advance to avoid SQL injection and syntax errors.

Build INSERT statements with only 1000 rows at a time. That should easily run 10 times as fast as what you started with (1 row per INSERT). Doing all 100K at once is risky and possibly slower: risky because you might blow past some limit (packet size, etc.); slower because of the need for a huge ROLLBACK log. COMMIT after each batch, or use autocommit=1.
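
A minimal sketch of that advice, reusing the escaping approach from Solution 1 (connection string and table as in the question):

// Sketch: 100k rows, 1000 rows per INSERT statement.
// With autocommit=1 each statement commits on its own.
string ConnectionString = "server=192.168.1xxx";
const int batchSize = 1000;
using (var mConnection = new MySqlConnection(ConnectionString))
{
    mConnection.Open();
    for (int batch = 0; batch < 100000 / batchSize; batch++)
    {
        var rows = new List<string>();
        for (int i = 0; i < batchSize; i++)
        {
            rows.Add(string.Format("('{0}','{1}')",
                MySqlHelper.EscapeString("test"), MySqlHelper.EscapeString("test")));
        }
        string sql = "INSERT INTO User (FirstName, LastName) VALUES " + string.Join(",", rows) + ";";
        using (var myCmd = new MySqlCommand(sql, mConnection))
        {
            myCmd.ExecuteNonQuery();
        }
    }
}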

Solution 6 - C#

One way to accelerate would be to wrap all inserts into ONE transaction (SQL Server code):

using (SqlConnection conn = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    conn.Open();
    SqlTransaction transaction = conn.BeginTransaction();

    try
    {
        foreach (string commandString in dbOperations)
        {
            SqlCommand cmd = new SqlCommand(commandString, conn, transaction);
            cmd.ExecuteNonQuery();
        }
        transaction.Commit(); // Here the execution is committed to the DB
    }
    catch (Exception)
    {
        transaction.Rollback();
        throw;
    }
    conn.Close();
}

Another way is to load the CSV file into a DataTable and use the batching feature of the DataAdapter:

DataTable dtInsertRows = GetDataTable();

SqlConnection connection = new SqlConnection(connectionString);
SqlCommand command = new SqlCommand("sp_BatchInsert", connection);
command.CommandType = CommandType.StoredProcedure;
command.UpdatedRowSource = UpdateRowSource.None;

// Set the parameters with the appropriate source column names
command.Parameters.Add("@PersonId", SqlDbType.Int, 4, dtInsertRows.Columns[0].ColumnName);
command.Parameters.Add("@PersonName", SqlDbType.VarChar, 100, dtInsertRows.Columns[1].ColumnName);

SqlDataAdapter adpt = new SqlDataAdapter();
adpt.InsertCommand = command;
// Specify the number of records to be inserted/updated in one go. Default is 1.
adpt.UpdateBatchSize = 2;

connection.Open();
int recordsInserted = adpt.Update(dtInsertRows);
connection.Close();

You can find a nice example here.

Or you can use the MySqlBulkLoader class of the MySQL connector:

var bl = new MySqlBulkLoader(connection);
bl.TableName = "mytable";
bl.FieldTerminator = ",";
bl.LineTerminator = "\r\n";
bl.FileName = "myfileformytable.csv";
bl.NumberOfLinesToSkip = 1;
var inserted = bl.Load();
Debug.Print(inserted + " rows inserted.");

If you do multiple inserts in one command, you might still squeeze an inch or two out by using StringBuilder instead of string.

Solution 7 - C#

As Stefan Steiger says, bulk insert is suitable for your situation.

Another trick is to use staging tables: instead of writing directly to the production table, you write to a staging one (which has the same structure). Having written all the info, you just swap the tables. With the staging approach you avoid locking tables for insertion (it can be used for updates and deletes too), and this pattern is heavily used with MySQL in some projects.

Also, disabling table keys may speed up insertion, but it can also introduce some problems when you enable them again (only for the MyISAM engine).
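
For the MyISAM case, a hedged sketch of the disable/enable pattern (table name from the question; note DISABLE KEYS only affects non-unique secondary indexes):

// Sketch: suspend index maintenance during a MyISAM bulk load,
// then rebuild the indexes once at the end.
using (var conn = new MySqlConnection(ConnectionString))
{
    conn.Open();
    using (var cmd = new MySqlCommand("ALTER TABLE User DISABLE KEYS;", conn))
        cmd.ExecuteNonQuery();
    try
    {
        BulkToMySQL(); // any of the insert approaches above
    }
    finally
    {
        using (var cmd = new MySqlCommand("ALTER TABLE User ENABLE KEYS;", conn))
            cmd.ExecuteNonQuery();
    }
}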

Added:

Let's say you have a table Products:

  • ProductId
  • ProductName
  • ProductPrice

For staging purposes you create a staging table called ProductsStaging, with the same set of columns.

You do all your operations on the staging table:

UpdateStagingTable();
SwapTables();
UpdateStagingTable();

Because after the swap your staging table no longer contains the new data, you invoke the same method once again. In the SwapTables() method you execute one SQL statement:

RENAME TABLE Products TO ProductsTemp,
             ProductsStaging TO Products,
             ProductsTemp TO ProductsStaging;

The speed of data manipulation depends on the MySQL storage engine (e.g. InnoDB, MyISAM, etc.), so you may also be able to speed up inserts by changing the engine.

Solution 8 - C#

I stumbled upon a similar problem while working with EF and MySQL. The EF inserts were way too slow, so I used the approach mentioned by fubo. To start with, the performance improved drastically (~20K records were inserted in ~10 seconds) but degraded as the table grew in size; with ~1M records in the table, an insertion took ~250 seconds.

Finally figured out the issue! The PK of the table was of type GUID (UUID, char(36)). Because UUIDs do not index sequentially, every insert required the index to be reorganized, which slowed things down.

The fix was to replace the PK with bigint (or int) and set it as an identity (auto-increment) column. This improved the performance: insertions took an average of ~12 seconds with ~2M+ records in the table!
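
For illustration, a hedged sketch of such a schema change (table and column names are hypothetical, and any foreign keys referencing the old GUID key would need handling first):

// Sketch: swap a char(36) GUID primary key for an auto-increment bigint.
const string migrate = @"ALTER TABLE User
    DROP PRIMARY KEY,
    ADD Id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;";
using (var conn = new MySqlConnection(ConnectionString))
using (var cmd = new MySqlCommand(migrate, conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}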

Thought I'd share this finding here just in case someone gets stuck on a similar problem!

Solution 9 - C#

I have found a way to avoid using a file for the bulk insert. This connector implements loading from a stream, so loading can be done something like this:

public void InsertData(string table, List<string> columns, List<List<object>> data)
{
    using (var con = OpenConnection() as MySqlConnection)
    {
        var bulk = new MySqlBulkLoader(con);
        using (var stream = new MemoryStream())
        {
            bulk.SourceStream = stream;
            bulk.TableName = table;
            bulk.FieldTerminator = ";";
            bulk.Columns.AddRange(columns); // map the fields to the target columns

            // Write the rows to the in-memory stream instead of a file
            var writer = new StreamWriter(stream);
            foreach (var d in data)
                writer.WriteLine(string.Join(";", d));

            writer.Flush();
            stream.Position = 0;
            bulk.Load();
        }
    }
}
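
A hypothetical call (assuming the question's User table; note the values must not themselves contain the ";" field terminator):

// Build 100k rows in memory and stream them to the server in one load.
var rows = new List<List<object>>();
for (int i = 0; i < 100000; i++)
    rows.Add(new List<object> { "test", "test" });

InsertData("User", new List<string> { "FirstName", "LastName" }, rows);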

Solution 10 - C#

My suggestion is an idea, not an example or a solution. What if you don't use INSERTs, but instead pass the data as multiple parameters (not necessarily all 100K at once; you can use bundles of 1K, for example) to a STORED PROCEDURE which itself does the INSERTs?
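
A hedged sketch of the calling side of that idea (the stored procedure sp_BulkInsertUsers and its single-TEXT-parameter format are entirely hypothetical; the procedure itself would have to split the bundle and do the INSERTs):

// Sketch: send rows in 1K bundles to a hypothetical stored procedure
// sp_BulkInsertUsers(IN rowsText TEXT) that parses and inserts them.
using (var conn = new MySqlConnection(ConnectionString))
{
    conn.Open();
    var bundle = new StringBuilder();
    for (int i = 1; i <= 100000; i++)
    {
        bundle.Append("test,test\n");
        if (i % 1000 == 0)
        {
            using (var cmd = new MySqlCommand("sp_BulkInsertUsers", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.AddWithValue("@rowsText", bundle.ToString());
                cmd.ExecuteNonQuery();
            }
            bundle.Clear();
        }
    }
}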

Solution 11 - C#

A bulk operation would be a good way to deal with that: something that reads your properties and then creates the bulk query for you...

There's a GitHub repository that contains both useful methods: BulkInsert and BulkUpdate, using MySql and EF6+.

BulkUpdate/BulkInsert basically reads all the properties from your generic entity and then creates the bulk query for you.

PS: This was developed for my own needs, and the project is open to anyone who wants to improve it or change it for a better solution that will be of worth to the community.

PS²: If it doesn't solve your problem, try making changes to the project to improve it and achieve what you want; it's a good start, at least.

Please take a look here.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

  • Question: fubo (View Question on Stackoverflow)
  • Solution 1: fubo (View Answer on Stackoverflow)
  • Solution 2: Konstantin (View Answer on Stackoverflow)
  • Solution 3: Sarvesh Mishra (View Answer on Stackoverflow)
  • Solution 4: Simon (View Answer on Stackoverflow)
  • Solution 5: Rick James (View Answer on Stackoverflow)
  • Solution 6: Stefan Steiger (View Answer on Stackoverflow)
  • Solution 7: Alex Sikilinda (View Answer on Stackoverflow)
  • Solution 8: ashin (View Answer on Stackoverflow)
  • Solution 9: vik_78 (View Answer on Stackoverflow)
  • Solution 10: Dzianis Yafimau (View Answer on Stackoverflow)
  • Solution 11: Bruno Henrique (View Answer on Stackoverflow)