.Include() vs .Load() performance in EntityFramework

C#.NetEntity Framework

C# Problem Overview


When querying a large table where you need to access the navigation properties later on in code (I explicitly don't want to use lazy loading) what will perform better .Include() or .Load()? Or why use the one over the other?

In this example the included tables all only have about 10 entries and employees has about 200 entries, and it can happen that most of those will be loaded anyway with include because they match the where clause.

Context.Measurements.Include(m => m.Product)
                    .Include(m => m.ProductVersion)
                    .Include(m => m.Line)
                    .Include(m => m.MeasureEmployee)
                    .Include(m => m.MeasurementType)
                    .Where(m => m.MeasurementTime >= DateTime.Now.AddDays(-1))
                    .ToList();

or

Context.Products.Load();
Context.ProductVersions.Load();
Context.Lines.Load();
Context.Employees.Load();
Context.MeasurementType.Load();

Context.Measurements.Where(m => m.MeasurementTime >= DateTime.Now.AddDays(-1))
                    .ToList();

C# Solutions


Solution 1 - C#

It depends, try both

When using Include(), you get the benefit of loading all of your data in a single call to the underlying data store. If this is a remote SQL Server, for example, that can be a major performance boost.

The downside is that Include() queries tend to get really complicated, especially if you have any filters (Where() calls, for example) or try to do any grouping. EF will generate very heavily nested queries using sub-SELECT and APPLY statements to get the data you want. It is also much less efficient -- you get back a single row of data with every possible child-object column in it, so data for your top level objects will be repeated a lot of times. (For example, a single parent object with 10 children will product 10 rows, each with the same data for the parent-object's columns.) I've had single EF queries get so complex they caused deadlocks when running at the same time as EF update logic.

The Load() method is much simpler. Each query is a single, easy, straightforward SELECT statement against a single table. These are much easier in every possible way, except you have to do many of them (possibly many times more). If you have nested collections of collections, you may even need to loop through your top level objects and Load their sub-objects. It can get out of hand.

Quick rule-of-thumb

Try to avoid having any more than three Include calls in a single query. I find that EF's queries get too ugly to recognize beyond that; it also matches my rule-of-thumb for SQL Server queries, that up to four JOIN statements in a single query works very well, but after that it's time to consider refactoring.

However, all of that is only a starting point.

It depends on your schema, your environment, your data, and many other factors.

In the end, you will just need to try it out each way.

Pick a reasonable "default" pattern to use, see if it's good enough, and if not, optimize to taste.

Solution 2 - C#

Include() will be written to SQL as JOIN: one database roundtrip.

Each Load()-instruction is "explicitly loading" the requested entities, so one database roundtrip per call.

Thus Include() will most probably be the more sensible choice in this case, but it depends on the database layout, how often this code is called and how long your DbContext lives. Why don't you try both ways and profile the queries and compare the timings?

See Loading Related Entities.

Solution 3 - C#

I agree with @MichaelEdenfield in his answer but I did want to comment on the nested collections scenario. You can get around having to do nested loops (and the many resulting calls to the database) by turning the query inside out.

Rather than loop down through a Customer's Orders collection and then performing another nested loop through the Order's OrderItems collection say, you can query the OrderItems directly with a filter such as the following.

context.OrderItems.Where(x => x.Order.CustomerId == customerId);

You will get the same resulting data as the Loads within nested loops but with just a single call to the database.

Also, there is a special case that should be considered with Includes. If the relationship between the parent and the child is one to one then the problem with the parent data being returned multiple times would not be an issue.

I am not sure what the effect would be if the majority case was where no child existed - lots of nulls? Sparse children in a one to one relationship might be better suited to the direct query technique that I outlined above.

Solution 4 - C#

Include is an example of eager loading, where as you not only load the entities you are querying for, but also all related entities.

Load is an manual override of the EnableLazyLoading. If this one is set to false. You can still lazily load the entity you asked for with .Load()

Solution 5 - C#

It's always hard to decide whether to go with Eager, Explicit or even Lazy Loading.
What I would recommend anyway is always to perform some profiling. That's the only way to be sure your request will be performant or not.
There're a lot of tools that will help you out. Have a look at this article from Julie Lerman where she lists several different ways to do profiling. One simple solution is to start profiling in your SQL Server Management Studio.
Do not hesitate to talk with a DBA (if you have on near you) that will help you to understand the execution plan.
You could also have a look a this presentation where I wrote a section about loading data and performance.

Solution 6 - C#

One more thing to add to this thread. It depends on what server you use. If you are working on sql server it's ok to use eager loading but for sqlite you will have to use .Load() to avoid crossloading exception cause sqlite can not deal with some include statements that go deeper than one dependency level

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionStaeffView Question on Stackoverflow
Solution 1 - C#Michael EdenfieldView Answer on Stackoverflow
Solution 2 - C#CodeCasterView Answer on Stackoverflow
Solution 3 - C#Scott MunroView Answer on Stackoverflow
Solution 4 - C#Henk JansenView Answer on Stackoverflow
Solution 5 - C#MaxSCView Answer on Stackoverflow
Solution 6 - C#PernerOlView Answer on Stackoverflow