Should LINQ be avoided because it's slow?

Tags: c#, .net, linq, performance

C# Problem Overview


I've been told that since .NET LINQ is so slow we shouldn't use it, and I was wondering whether anyone else has come to the same conclusion. An example:

Took 1443ms to do 1000000000 compares non-LINQ.
Took 4944ms to do 1000000000 compares with LINQ.
(243% slower)

the non-LINQ code:

for (int i = 0; i < 10000; i++)
{
    foreach (MyLinqTestClass1 item in lst1) //100000 items in the list
    {
        if (item.Name == "9999")
        {
            isInGroup = true;
            break;
        }
    }
}

Took 1443ms to do 1000000000 compares non-LINQ.

LINQ code:

for (int i = 0; i < 10000; i++)  
    isInGroup = lst1.Cast<MyLinqTestClass1>().Any(item => item.Name == "9999");  

Took 4944ms to do 1000000000 compares with LINQ.

I guess it's possible to optimize the LINQ code, but my point was that it's easy to end up with really slow LINQ code, and given that, it shouldn't be used. If LINQ is slow, then it would also follow that PLINQ is slow and NHibernate LINQ is slow, so no kind of LINQ statement should be used.

Has anyone else found LINQ so slow that they wished they had never used it, or am I drawing too general a conclusion from benchmarks like this?

C# Solutions


Solution 1 - C#

> Should Linq be avoided because it's slow?

No. It should be avoided if it is not fast enough. Slow and not fast enough are not at all the same thing!

Slow is irrelevant to your customers, your management and your stakeholders. Not fast enough is extremely relevant. Never measure how fast something is; that tells you nothing that you can use to base a business decision on. Measure how close to being acceptable to the customer it is. If it is acceptable then stop spending money on making it faster; it's already good enough.

Performance optimization is expensive. Writing code so that it can be read and maintained by others is expensive. Those goals are frequently in opposition to each other, so in order to spend your stakeholder's money responsibly you've got to ensure that you're only spending valuable time and effort doing performance optimizations on things that are not fast enough.

You've found an artificial, unrealistic benchmark situation where LINQ code is slower than some other way of writing the code. I assure you that your customers care not a bit about the speed of your unrealistic benchmark. They only care if the program you're shipping to them is too slow for them. And I assure you, your management cares not a bit about that (if they're competent); they care about how much money you're spending needlessly to make stuff that is already fast enough unnoticeably faster, and making the code more expensive to read, understand, and maintain in the process.

Solution 2 - C#

Why are you using Cast<T>()? You haven't given us enough code to really judge the benchmark, basically.

Yes, you can use LINQ to write slow code. Guess what? You can write slow non-LINQ code, too.

LINQ greatly aids expressiveness of code dealing with data... and it's not that hard to write code which performs well, so long as you take the time to understand LINQ to start with.

If anyone told me not to use LINQ (especially LINQ to Objects) for perceived reasons of speed I would laugh in their face. If they came up with a specific bottleneck and said, "We can make this faster by not using LINQ in this situation, and here's the evidence" then that's a very different matter.

Solution 3 - C#

Maybe I've missed something, but I'm pretty sure your benchmarks are off.

I tested with the following methods:

  • The Any extension method ("LINQ")
  • A simple foreach loop (your "optimized" method)
  • Using the ICollection.Contains method
  • The Any extension method using an optimized data structure (HashSet<T>)

Here is the test code:

class Program
{
    static void Main(string[] args)
    {
        var names = Enumerable.Range(1, 10000).Select(i => i.ToString()).ToList();
        var namesHash = new HashSet<string>(names);
        string testName = "9999";
        for (int i = 0; i < 10; i++)
        {
            Profiler.ReportRunningTimes(new Dictionary<string, Action>() 
            {
                { "Enumerable.Any", () => ExecuteContains(names, testName, ContainsAny) },
                { "ICollection.Contains", () => ExecuteContains(names, testName, ContainsCollection) },
                { "Foreach Loop", () => ExecuteContains(names, testName, ContainsLoop) },
                { "HashSet", () => ExecuteContains(namesHash, testName, ContainsCollection) }
            },
            (s, ts) => Console.WriteLine("{0, 20}: {1}", s, ts), 10000);
            Console.WriteLine();
        }
        Console.ReadLine();
    }

    static bool ContainsAny(ICollection<string> names, string name)
    {
        return names.Any(s => s == name);
    }

    static bool ContainsCollection(ICollection<string> names, string name)
    {
        return names.Contains(name);
    }

    static bool ContainsLoop(ICollection<string> names, string name)
    {
        foreach (var currentName in names)
        {
            if (currentName == name)
                return true;
        }
        return false;
    }

    static void ExecuteContains(ICollection<string> names, string name,
        Func<ICollection<string>, string, bool> containsFunc)
    {
        if (containsFunc(names, name))
            Trace.WriteLine("Found element in list.");
    }
}

Don't worry about the internals of the Profiler class. It just runs the Action in a loop and uses a Stopwatch to time it. It also makes sure to call GC.Collect() before each test to eliminate as much noise as possible.

Here were the results:

      Enumerable.Any: 00:00:03.4228475
ICollection.Contains: 00:00:01.5884240
        Foreach Loop: 00:00:03.0360391
             HashSet: 00:00:00.0016518

      Enumerable.Any: 00:00:03.4037930
ICollection.Contains: 00:00:01.5918984
        Foreach Loop: 00:00:03.0306881
             HashSet: 00:00:00.0010133

      Enumerable.Any: 00:00:03.4148203
ICollection.Contains: 00:00:01.5855388
        Foreach Loop: 00:00:03.0279685
             HashSet: 00:00:00.0010481

      Enumerable.Any: 00:00:03.4101247
ICollection.Contains: 00:00:01.5842384
        Foreach Loop: 00:00:03.0234608
             HashSet: 00:00:00.0010258

      Enumerable.Any: 00:00:03.4018359
ICollection.Contains: 00:00:01.5902487
        Foreach Loop: 00:00:03.0312421
             HashSet: 00:00:00.0010222

The data is very consistent and tells the following story:

  • Naïvely using the Any extension method is about 9% slower than naïvely using a foreach loop.

  • Using the most appropriate method (ICollection<string>.Contains) with an unoptimized data structure (List<string>) is approximately 50% faster than naïvely using a foreach loop.

  • Using an optimized data structure (HashSet<string>) completely blows any of the other methods out of the water in performance terms.

I have no idea where you got 243% from. My guess is it has something to do with all that casting. If you're using an ArrayList then not only are you using an unoptimized data structure, you're using a largely obsolete data structure.

I can predict what comes next. "Yeah, I know you can optimize it better, but this was just an example to compare the performance of LINQ vs. non-LINQ."

Yeah, but if you couldn't even be thorough in your example, how can you possibly expect to be this thorough in production code?

The bottom line is this:

> How you architect and design your software is exponentially more important than what specific tools you use and when.

If you run into performance bottlenecks - which is every bit as likely to happen with LINQ vs. without - then solve them. Eric's suggestion of automated performance tests is an excellent one; that will help you to identify the problems early so that you can solve them properly - not by shunning an amazing tool that makes you 80% more productive but happens to incur a < 10% performance penalty, but by actually investigating the issue and coming up with a real solution that can boost your performance by a factor of 2, or 10, or 100 or more.
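To make the automated-performance-test idea concrete, here is a minimal hedged sketch. The class name, list sizes, and the time budget are all illustrative assumptions of mine, not anything from the answers; the point is only that a regression fails loudly instead of going unnoticed.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class PerfGuard
{
    // Times a repeated LINQ search over a freshly built list and returns the
    // elapsed milliseconds, so a test can assert it stays within a budget.
    public static long TimeSearch(int listSize, int iterations, string target)
    {
        var names = Enumerable.Range(1, listSize).Select(i => i.ToString()).ToList();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            names.Any(n => n == target);
        sw.Stop();

        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        long elapsed = TimeSearch(100_000, 100, "9999");
        const long budgetMs = 5000; // deliberately generous, illustrative budget
        if (elapsed > budgetMs)
            throw new Exception($"Perf budget exceeded: {elapsed} ms");
        Console.WriteLine($"Within budget ({elapsed} ms).");
    }
}
```

In practice the budget would come from an agreed requirement ("fast enough"), not from a guess, which is exactly the point of the surrounding answer.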

Creating high-performance applications is not about using the right libraries. It's about profiling, making good design choices, and writing good code.

Solution 4 - C#

Is LINQ a real-world bottleneck (one affecting the overall or perceived performance of the application)?

Will your application be performing this operation on 1,000,000,000+ records in the real world? If so, then you might want to consider alternatives; if not, it's like saying "we can't buy this family sedan because it doesn't drive well at 180+ MPH".

If it's "just slow" then that's not a very good reason... by that reasoning you should be writing everything in asm/C/C++, and C# should be off the table for being "too slow".

Solution 5 - C#

While premature pessimization is (imho) as bad as premature optimization, you shouldn't rule out an entire technology based on absolute speed without taking usage context into consideration. Yes, if you're doing some really heavy number-crunching and this is a bottleneck, LINQ could be problematic - profile it.

An argument you could use in favour of LINQ is that, while you can probably outperform it with handwritten code, the LINQ version could likely be clearer and easier to maintain - plus, there's the advantage of PLINQ compared to complex manual parallelization.

Solution 6 - C#

The problem with this sort of comparison is that it's meaningless in the abstract.

One could beat either approach by first hashing the MyLinqTestClass1 objects by their Name property. Between the two, one could sort them by Name and later do a binary search. Indeed, we don't need to store the MyLinqTestClass1 objects for that; we just need to store the names.

Memory size a problem? Maybe store the names in a DAWG structure, combining suffixes, and then use that for this check?

Does the extra overhead in setting these data structures up make any sense? It's impossible to tell.
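For concreteness, the sort-then-binary-search idea above can be sketched as follows, storing only the names as the answer suggests. The class and method names here are my own illustration, not from the question.

```csharp
using System;
using System.Collections.Generic;

// Sketch: pay a one-time O(n log n) sort so each membership check is O(log n),
// instead of the O(n) scan that Any/foreach performs.
class SortedNameIndex
{
    private readonly List<string> _names;

    public SortedNameIndex(IEnumerable<string> names)
    {
        _names = new List<string>(names);
        _names.Sort(StringComparer.Ordinal); // one-time setup cost
    }

    public bool Contains(string name) =>
        _names.BinarySearch(name, StringComparer.Ordinal) >= 0;

    static void Main()
    {
        var index = new SortedNameIndex(new[] { "123", "9999", "42" });
        Console.WriteLine(index.Contains("9999")); // True
        Console.WriteLine(index.Contains("0"));    // False
    }
}
```

Whether the setup cost pays off depends on how many lookups are performed against each list, which is exactly the "impossible to tell in the abstract" point being made.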

A further matter is a different problem with the concept of LINQ, which is its name. It's great for marketing purposes for MS to be able to say "here's a bunch of cool new stuff that works together", but less good when it comes to the sort of analysis where the pieces should be pulled apart. You've got a call to Any that basically implements the filter-on-enumerable pattern common in .NET 2.0 days (and not unknown in .NET 1.1, though being more awkward to write it was only used where its efficiency benefits really mattered), you've got lambda expressions, and you've got query trees, all bunged together in one concept. Which is the slow one?

I'd bet the answer here is the lambda and not the use of Any, but I wouldn't bet a large amount (e.g. the success of a project); I'd test and be sure. Meanwhile, the way lambda expressions work with IQueryable can make for particularly efficient code that it would be extremely difficult to write with equivalent efficiency without the use of lambdas.

Should we refuse to use LINQ where it is efficient, just because it failed an artificial benchmark? I don't think so.

Use LINQ where it makes sense.

In bottleneck conditions, move away from (or toward) LINQ as an optimisation, regardless of whether it seems appropriate. Don't write hard-to-understand code on the first pass, as you'll just make real optimisation harder.

Solution 7 - C#

Maybe LINQ is slow, but with LINQ I can parallelize my code very simply.

Like this:

lst1.Cast<MyLinqTestClass1>().AsParallel().Any(item => item.Name == "9999");

How would you parallelize the loop?
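To illustrate the contrast, here is a hedged sketch of what hand-parallelizing the loop might look like, using Parallel.ForEach with ParallelLoopState.Stop to mimic Any's short-circuiting. The class and method names are my own, not from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelSearch
{
    // Hand-rolled parallel "Any": noticeably more ceremony than
    // lst1.AsParallel().Any(...), which is the one-liner in the answer.
    public static bool AnyParallel(IEnumerable<string> names, string target)
    {
        bool found = false;
        Parallel.ForEach(names, (name, state) =>
        {
            if (name == target)
            {
                found = true;
                state.Stop(); // request early termination, like Any's short-circuit
            }
        });
        return found;
    }

    static void Main()
    {
        var names = Enumerable.Range(1, 100_000).Select(i => i.ToString()).ToList();
        Console.WriteLine(AnyParallel(names, "9999")); // True
    }
}
```

Even this sketch glosses over details (partitioning, the cost of the shared flag) that AsParallel handles for you.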

Solution 8 - C#

Here's an interesting observation, since you mention NHibernate being slow as a consequence of LINQ being slow. If you're doing LINQ to SQL (or the NHibernate equivalent), then your LINQ code translates to an EXISTS query on the SQL server, whereas your loop code must first fetch all rows and then iterate over them. Now, you could easily write such a test so that the loop code reads all the data once (a single DB lookup) for all 10K runs while the LINQ code actually performs 10K SQL queries. That would probably show a big speed advantage for the loop version which doesn't exist in reality. In reality, a single EXISTS query is going to outperform the table scan and loop every time, even if you don't have an index on the column being queried (which you probably would if this query is run very often).

I'm not saying that it is the case with your test -- we don't have enough code to see -- but it could be. It could also be that there really is a performance difference with LINQ to Objects, but that may not translate to LINQ to SQL at all. You need to know what you're measuring and how applicable it is to your real world needs.

Solution 9 - C#

To me, this sounds like you're working on a contract and the employer either doesn't understand LINQ or doesn't understand the performance bottlenecks of the system. If you're writing an application with a GUI, the minor performance impact of using LINQ is negligible. In a typical GUI/web app, in-memory calls make up less than 1% of all wait time. You, or rather your employer, are trying to optimize that 1%. Is that really beneficial?

However, if you are writing an application that is scientific or heavily math oriented, with very little disk or database access, then I'd agree that LINQ is not the way to go.

BTW, the cast is not needed. The following is functionally equivalent to your first test:

for (int i = 0; i < 10000; i++)
    isInGroup = lst1.Any(item => item.Name == "9999");

When I ran this using a test list containing 10,000 MyLinqTestClass1 objects, the original ran in 2.79 seconds and the revised version in 3.43 seconds. Saving 30% on operations that likely take up less than 1% of CPU time is not a good use of your time.

Solution 10 - C#

"I've had been told [by whom?] that since .net linq is so slow [for what?] we shouldn't use it"

In my experience, basing decisions such as what technique, library or language to use solely on what someone has once told you is a bad idea.

First of all, does the information come from a source you trust? If not, you might be making a huge mistake trusting this (perhaps unknown) person to make your design decisions. Secondly, is this information still relevant today? But okay, based on your simple and not very realistic benchmark, you've concluded that LINQ is slower than manually performing the same operation. The natural question to ask yourself is this: is this code performance critical? Will the performance of this code be limited by other factors than the execution speed of my LINQ query -- think database queries, waiting on I/O, etc?

Here's how I like to work:

  1. Identify the problem to be solved, and write the simplest feature-complete solution given the requirements and limitations you already know of
  2. Determine whether your implementation actually fulfills the requirements (is it fast enough? Is the resource consumption kept at an acceptable level?).
  3. If it does, you're done. If not, look for ways to optimize and refine your solution until it passes the test at #2. This is where you may need to consider giving up on something because it's too slow. Maybe. Chances are, though, that the bottleneck isn't where you expected it to be at all.

To me, this simple method serves a single purpose: maximizing my productivity by minimizing the time I spend improving code that is already perfectly adequate.

Yes, the day might come when you find that your original solution doesn't cut it any more. Or it might not. If it does, deal with it then and there. I suggest you avoid wasting your time trying to solve hypothetical (future) problems.

Solution 11 - C#

Yes, you're right. It's easy to write slow code in LINQ. The others are right, too: it's easy to write slow code in C# without LINQ.

I wrote the same loop as you in C and it ran some number of milliseconds faster. The conclusion I draw from this is that C# itself is slow.

As with your LINQ->loop expansion, in C it will take more than 5 times as many lines of code to do the same thing, making it slower to write, harder to read, more likely to have bugs, and tougher to find and fix them, but if saving a few milliseconds for every billion iterations is important, that's often what it takes.

Solution 12 - C#

As you demonstrated, it is possible to write non-LINQ code that performs better than LINQ code. But the reverse is also possible. Given the maintenance advantage that LINQ can provide, you might consider defaulting to LINQ as it is unlikely you will run into any performance bottlenecks that can be attributed to LINQ.

That said, there are some scenarios where LINQ just won't work. For example, if you are importing a ton of data, you might find that the act of executing individual inserts is slower than sending the data to SQL Server in batches of XML. In this example, it's not that the LINQ insert is slower than the non-LINQ insert; rather, it's not prudent to execute individual SQL inserts for bulk data imports.

Solution 13 - C#

I would rather say you should avoid trying too hard to write the most efficient code, unless it is mandatory.

Solution 14 - C#

> Given that LINQ is slow then it would also follow that PLINQ is slow and NHibernate LINQ would be slow, so any kind of LINQ statement should not be used.

That's a completely different context. A difference of 1.4 vs. 5 seconds for a whole billion in-memory operations is irrelevant when you are talking about data access operations.

Solution 15 - C#

Your test case is a little skewed. The Any operator will begin enumerating through your results, return true on the first match it finds, and quit. Try this with simple lists of strings to see the result. To answer your question about avoiding LINQ: you should really transition toward using LINQ. It makes code easier to read when doing complex queries, in addition to providing compile-time checking. Also, you don't need the Cast operator in your example.

string compareMe = "Success";
string notEqual = "Not Success";

List<string> headOfList = new List<string>();
List<string> midOfList = new List<string>();
List<string> endOfList = new List<string>();

//Create a list of 999,999 items
List<string> masterList = new List<string>();
masterList.AddRange(Enumerable.Repeat(notEqual, 999999));

//put the true case at the head of the list
headOfList.Add(compareMe);
headOfList.AddRange(masterList);

//insert the true case in the middle of the list
midOfList.AddRange(masterList);
midOfList.Insert(masterList.Count/2, compareMe);

//insert the true case at the tail of the list
endOfList.AddRange(masterList);
endOfList.Add(compareMe);


// Note: Dump() below is a LINQPad extension method; use Console.WriteLine outside LINQPad.
Stopwatch stopWatch = new Stopwatch();

stopWatch.Start();
headOfList.Any(p=>p == compareMe);
stopWatch.ElapsedMilliseconds.Dump();
stopWatch.Reset();

stopWatch.Start();
midOfList.Any(p=>p == compareMe);
stopWatch.ElapsedMilliseconds.Dump();
stopWatch.Reset();

stopWatch.Start();
endOfList.Any(p=>p == compareMe);
stopWatch.ElapsedMilliseconds.Dump();
stopWatch.Stop();

Solution 16 - C#

The type casting is of course going to slow your code down. If you care that much, at least use a strongly typed IEnumerable<T> for the comparison. I myself try to use LINQ wherever possible. It makes your code much more concise. It's nice not to have to worry about the imperative details of your code. LINQ is a functional concept, which means you spell out what you want to happen and don't worry about how.
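A minimal sketch of the strongly typed version, assuming lst1 can be declared as a generic list (the sample data here is my own illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyLinqTestClass1
{
    public string Name { get; set; }
}

class Demo
{
    static void Main()
    {
        // Declaring the list as List<MyLinqTestClass1> rather than a
        // non-generic collection removes the need for Cast<T>() entirely.
        var lst1 = new List<MyLinqTestClass1>
        {
            new MyLinqTestClass1 { Name = "1" },
            new MyLinqTestClass1 { Name = "9999" }
        };

        bool isInGroup = lst1.Any(item => item.Name == "9999");
        Console.WriteLine(isInGroup); // True
    }
}
```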

Solution 17 - C#

There are far better reasons than speed to avoid LINQ.

The following quotes from a discussion on LINQ name a few of them:

QUOTE1

"For instance this works:

var a = new { x = 1, y = 2 };
a = new { x = 1, y = 3 };

But this does not work:

var a = new { x = 1, y = 2 };
a = new { x = 1, y = 2147483649 };

It returns : Error 1 Cannot implicitly convert type 'AnonymousType#1' to 'AnonymousType#2'

But this works:

var a = new { x = 1, y = 2147483648 };
a = new { x = 1, y = 2147483649 };

When you compile:

var a = new { x = 1, y = 2 };

The type of the x and y components is arbitrarily declared as a 32 bit signed integer, and it is one of the many integer types the platform has, without anything special.

But there is more. For instance this works:

double x = 1.0;
x = 1;

But this does not work:

var a = new { x = 1.0, y = 0 }; 
a = new { x = 1, y = 0 };

The numeric conversion rules are not applicable to this kind of types. As you can see, elegance is in every detail."

QUOTE2

"It appears, then, that 'AnonymousType#1' and 'AnonymousType#2' are not synonymous--they name distinct types. And as { x = 1, y = 2 } and { y = 2, x = 1 } are expressions of those two types, respectively, not only do they denote distinct values, but also values of distinct types.

So, I was right to be paranoid. Now my paranoia extends even further and I have to ask what LinQ makes of the following comparison:

new { x = 1, y = 2 } == new { x = 1, y = 2 }

The result is false because this is a pointer comparison.

But the result of:

(new { x = 1, y = 2 }).Equals(new { x = 1, y = 2 })

Is true.

And the result of:

(new { x = 1, y = 2 }).Equals(new { y = 2, x = 1 })

and

(new { x = 1, y = 2 }).Equals(new { a = 1, b = 2 })

Is false."

QUOTE3

"updates are record oriented :-O

This, I agree, is problematic, and derives from LINQ's sequence-oriented nature.

This is a show stopper for me. If I have to use SQL for my updates anyway why to bother about LinQ?

the optimization in LinQ to objects is unexistent.

There is not any algebraic optimization nor automatic expression rewrite. Many people don't want to use LinQ to Objects because they lose a lot of performance. Queries are executed in the same way as you write them."

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | user455095 | View Question on Stackoverflow
Solution 1 - C# | Eric Lippert | View Answer on Stackoverflow
Solution 2 - C# | Jon Skeet | View Answer on Stackoverflow
Solution 3 - C# | Aaronaught | View Answer on Stackoverflow
Solution 4 - C# | STW | View Answer on Stackoverflow
Solution 5 - C# | snemarch | View Answer on Stackoverflow
Solution 6 - C# | Jon Hanna | View Answer on Stackoverflow
Solution 7 - C# | gandjustas | View Answer on Stackoverflow
Solution 8 - C# | tvanfosson | View Answer on Stackoverflow
Solution 9 - C# | Beep beep | View Answer on Stackoverflow
Solution 10 - C# | Martin Törnwall | View Answer on Stackoverflow
Solution 11 - C# | Ken | View Answer on Stackoverflow
Solution 12 - C# | Mayo | View Answer on Stackoverflow
Solution 13 - C# | tia | View Answer on Stackoverflow
Solution 14 - C# | eglasius | View Answer on Stackoverflow
Solution 15 - C# | Steve G. | View Answer on Stackoverflow
Solution 16 - C# | A-Dubb | View Answer on Stackoverflow
Solution 17 - C# | Erwin Smout | View Answer on Stackoverflow