Excel Interop - Efficiency and performance

C#ExcelPerformanceInteropVsto

C# Problem Overview


I was wondering what I could do to improve the performance of Excel automation, as it can be quite slow if you have a lot going on in the worksheet...

Here's a few I found myself:

  • ExcelApp.ScreenUpdating = false -- turn off the redrawing of the screen

  • ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual -- turning off the calculation engine so Excel doesn't automatically recalculate when a cell value changes (turn it back on after you're done)

  • Reduce calls to Worksheet.Cells.Item(row, col) and Worksheet.Range -- I had to poll hundreds of cells to find the cell I needed. Implementing some caching of cell locations, reduced the execution time from ~40 to ~5 seconds.

What kind of interop calls take a heavy toll on performance and should be avoided? What else can you do to avoid unnecessary processing being done?

C# Solutions


Solution 1 - C#

When using C# or VB.Net to either get or set a range, figure out what the total size of the range is, and then get one large 2 dimensional object array...

//get values
object[,] objectArray = shtName.get_Range("A1:Z100").Value2;
iFace = Convert.ToInt32(objectArray[1,1]);

//set values
object[,] objectArray = new object[3,1] {{"A"}{"B"}{"C"}};
rngName.Value2 = objectArray;

Note that its important you know what datatype Excel is storing (text or numbers) as it won't automatically do this for you when you are converting the type back from the object array. Add tests if necessary to validate the data if you can't be sure beforehand of the type of data.

Solution 2 - C#

This is for anyone wondering what the best way is to populate an excel sheet from a db result set. This is not meant to be a full list by any means but it does list a few options.

Some performance numbers while attempting to populate an excel sheet with 155 columns and 4200 records on an old Pentium 4 3GHz box including data retrieval time which was never more than 10 seconds in order of slowest to fastest is as follows...

  1. One cell at a time - Just under 11 minutes

  2. Populating a dataset by converting to html + Saving html to disk + Loading html into excel and saving worksheet as xls/xlsx - 5 minutes

  3. One column at a time - 4 minutes

  4. Using the deprecated sp_makewebtask procedure in SQL 2005 to create an HTML file - 9 Seconds + Followed by loading the html file in excel and saving as XLS/XLSX - About 2 minutes.

  5. Convert .Net dataset to ADO RecordSet and use the WorkSheet.Range[].CopyFromRecordset function to populate excel - 45 seconds!

I ended up using option 5. Hope this helps.

Solution 3 - C#

If you're polling values of many cells you can get all the cell values in a range stored in a variant array in one fell swoop:

Dim CellVals() as Variant
CellVals = Range("A1:B1000").Value

There is a tradeoff here, in terms of the size of the range you're getting values for. I'd guess if you need a thousand or more cell values this is probably faster than just looping through different cells and polling the values.

Solution 4 - C#

Use excels builtin functionality whenever possible, for example: Instead of searching a whole column for a given string, use the find command available in the GUI by Ctrl-F:

Set Found = Cells.Find(What:=SearchString, LookIn:=xlValues, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, _
    MatchCase:=False, SearchFormat:=False)
               
If Not Found Is Nothing Then
    Found.Activate
    (...)
EndIf

If you want to sort some lists, use the excel sort command, don't do it manually in VBA:

Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
    OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
    DataOption1:=xlSortNormal

Solution 5 - C#

Performance also depends a lot on how you automate Excel. VBA is faster than COM automation is faster than .NET automation. And typically early (compile time) binding is faster than late binding, too.

If you have serious performance problems you could think of moving the critical parts of the code to a VBA module and call that code from your COM/.NET automation code.

If you use .NET you should also use the optimized primary interop assemblies available from Microsoft and not use custom-built interop assemblies.

Solution 6 - C#

As Anonymous Type says: reading/writing large range blocks is very important to performance.

In cases where the COM-Interop overhead is still too large you may want to switch to using the XLL interface, which is the fastest Excel interface.

Although the XLL interface is primarily meant for C++ users, both XL DNA and Addin Express provide .NET to XLL bridge capability which is significantly faster than COM-Interop.

Solution 7 - C#

Another big thing you can do in VBA is to use Option Explicit and avoid Variants wherever possible. Variants are not 100% avoidable in VBA, but they make the interpreter do more work at runtime and waste memory.

I found this article very helpful when I was starting with VBA in Excel.
http://www.ozgrid.com/VBA/SpeedingUpVBACode.htm

And this book

http://www.amazon.com/VB-VBA-Nutshell-Language-OReilly/dp/1565923588

Similar to

 app.ScreenUpdates = false //and
 app.Calculation = xlCalculationManual

you can also set

 app.EnableEvents = false //Prevent Excel events
 app.Interactive = false  //Prevent user clicks and keystrokes

although they don't seem to make as big a difference as the first two.

Similar to setting Range values to arrays, if you are working with data that is mostly tables with the same formula in every row of a column, you can use R1C1 formula notation for your formula and set an entire column equal to the formula string to set the whole thing in one call.

app.ReferenceStyle = xlR1C1
app.ActiveSheet.Columns(2) = "=SUBSTITUTE(C[-1],"foo","bar")"

Also, creating XLL add-ins using ExcelDNA & .NET (or the hard way in C) is also the only way you can get UDFs to run on multiple threads. (See Excel DNA's ExcelFunction attribute's IsThreadSafe property.)

Before I transitioned to Excel DNA completely, I also experimented with creating COM visible libraries in .NET to reference in VBA projects. Heavy text processing is a bit faster than VBA that way, as are using wrapped .NET List classes instead of VBA's Collection, but Excel DNA is better.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionVincent Van Den BergheView Question on Stackoverflow
Solution 1 - C#Anonymous TypeView Answer on Stackoverflow
Solution 2 - C#RiteshView Answer on Stackoverflow
Solution 3 - C#Jon FournierView Answer on Stackoverflow
Solution 4 - C#TrebView Answer on Stackoverflow
Solution 5 - C#Dirk VollmarView Answer on Stackoverflow
Solution 6 - C#Charles WilliamsView Answer on Stackoverflow
Solution 7 - C#JamesFaixView Answer on Stackoverflow