What is the fastest way to determine if a row exists using Linq to SQL?

C#Linq to-SqlLambda

C# Problem Overview


I am not interested in the contents of a row, I just want to know if a row exists. The Name column is a primary key, so there will either be 0 or 1 matching rows. Currently, I am using:

if ((from u in dc.Users where u.Name == name select u).Count() > 0)
    // row exists
else
    // row doesn't exist

While the above works, it does a lot of unnecessary work by selecting all the contents of the row (if it exists). Does the following create a faster query:

if (dc.Users.Where(u => u.Name == name).Any())

...or is there an even faster query?

C# Solutions


Solution 1 - C#

The Count() approach may do extra work, as (in TSQL) EXISTS or TOP 1 are often much quicker; the db can optimise "is there at least one row". Personally, I would use the any/predicate overload:

if (dc.Users.Any(u => u.Name == name)) {...}

Of course, you can compare what each one does by watching the TSQL:

dc.Log = Console.Out;

Solution 2 - C#

Of Course

if (dc.Users.Where(u => u.Name == name).Any())

this is best and if multiple conditions to check then it is very simple to write as

Say you want to check the user for company then

if (dc.Users.Where(u => u.ID== Id && u.Company==company).Any())

Solution 3 - C#

I think:

if (dc.Users.Any(u => u.Name == name)) {...}

is the best approach.

Solution 4 - C#

For those people claiming Any() is the way forward I've done a simple test in LinqPad against a SQL database of CommonPasswords, 14 million give or take. Code:

var password = "qwertyuiop123";

var startTime = DateTime.Now;
"From DB:".Dump();
startTime = DateTime.Now;

if (CommonPasswords.Any(c => System.Data.Linq.SqlClient.SqlMethods.Like(c.Word, password)))
{
    $"FOUND: processing time: {(DateTime.Now - startTime).TotalMilliseconds}\r\n".Dump();
}
else
{
    $"NOT FOUND: processing time: {(DateTime.Now - startTime).TotalMilliseconds}\r\n".Dump();
}

"From DB:".Dump();
startTime = DateTime.Now;
if (CommonPasswords.Where(c => System.Data.Linq.SqlClient.SqlMethods.Like(c.Word, password)).Count() > 0)
{
    $"FOUND: processing time: {(DateTime.Now - startTime).TotalMilliseconds}\r\n".Dump();
}
else
{
    $"NOT FOUND: processing time: {(DateTime.Now - startTime).TotalMilliseconds}\r\n".Dump();
}

"From DB:".Dump();
startTime = DateTime.Now;
if (CommonPasswords.Where(c => c.Word.ToLower() == password).Take(1).Any())
{
    $"FOUND: processing time: {(DateTime.Now - startTime).TotalMilliseconds}\r\n".Dump();
}
else
{
    $"NOT FOUND: processing time: {(DateTime.Now - startTime).TotalMilliseconds}\r\n".Dump();
}

Here is the translated SQL:

-- Region Parameters
DECLARE @p0 NVarChar(1000) = 'qwertyuiop123'
-- EndRegion
SELECT 
    (CASE 
        WHEN EXISTS(
            SELECT NULL AS [EMPTY]
            FROM [Security].[CommonPasswords] AS [t0]
            WHERE [t0].[Word] LIKE @p0
            ) THEN 1
        ELSE 0
     END) AS [value]
GO

-- Region Parameters
DECLARE @p0 NVarChar(1000) = 'qwertyuiop123'
-- EndRegion
SELECT COUNT(*) AS [value]
FROM [Security].[CommonPasswords] AS [t0]
WHERE [t0].[Word] LIKE @p0
GO

-- Region Parameters
DECLARE @p0 NVarChar(1000) = 'qwertyuiop123'
-- EndRegion
SELECT 
    (CASE 
        WHEN EXISTS(
            SELECT NULL AS [EMPTY]
            FROM (
                SELECT TOP (1) NULL AS [EMPTY]
                FROM [Security].[CommonPasswords] AS [t0]
                WHERE LOWER([t0].[Word]) = @p0
                ) AS [t1]
            ) THEN 1
        ELSE 0
     END) AS [value]

You can see that ANY wraps the query up in another layer of code to do a CASE Where Exists Then 1 where as Count() just adds in a Count command. Problem with both of these is you can't do a Top(1) but I can't see a better way using Top(1)

Results:

From DB: FOUND: processing time: 13.3962

From DB: FOUND: processing time: 12.0933

From DB: FOUND: processing time: 787.8801

Again:

From DB: FOUND: processing time: 13.3878

From DB: FOUND: processing time: 12.6881

From DB: FOUND: processing time: 780.2686

Again:

From DB: FOUND: processing time: 24.7081

From DB: FOUND: processing time: 23.6654

From DB: FOUND: processing time: 699.622

Without Index:

From DB: FOUND: processing time: 2395.1988

From DB: FOUND: processing time: 390.6334

From DB: FOUND: processing time: 664.8581

Now some of you may be thinking it's only a millisecond or two. However the varience was much greater before I put an index on it; by a few seconds.

The last calculation is there as I started with the notion that ToLower() would be faster than LIKE, and I was right, until I tried count and put an Index on it. I guess the Lower() makes the index irrelavent.

Solution 5 - C#

I do not agree that selecting top 1 will always outperform select count for all SQL implementations. It's all implementation dependent, you know. Curiously, even the nature of data stored in a particular database also affects the overall outcome.

Let's examine both of them the way I would implement them if I were to do so: For both cases, the projection (WHERE clause) evaluation is a common step.

Next for select top 1, you will have to do a read of all fields (unless you did select top 1 'x' eg: select top 1 1). This will be functionally equivalent to IQueryable.Any(...)., except that you will spend some time flashing in the value for each column of the first encountered record if EXISTS. If SELECT TOP is found in the statement, the projection is truncation if there's no post-projection proc (eg ORDER BY clause). This preprocess incurs a small cost but this is extra cost if no record does exist, in which case, a full project is still done.

For select count, the preprocess is not done. A projection is done and if EXISTS is false, the result is instant. If EXISTS is true, the count is still fast because it will be a mere dW_Highest_Inclusive - dW_Lowest_Exclusive. As quick as 500 - 26. If exists is false, the result is even more instant.

The remaining case therefore is: How fast is the projection and what do you loose by doing full projection? And the answer leads to the most crucial issue here which is: is the [NAME] field indexed or not! If you have an index on [NAME] the performance of either query will be so close that it boils down to developer's preference.

By and large, I will simply write two to four linq queries and output difference in time before and after.

  1. select count
  2. select top 1
  3. select top 1 1
  4. select any

Repeat all 4 with an nonclustered index on [NAME];

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionProtagonistView Question on Stackoverflow
Solution 1 - C#Marc GravellView Answer on Stackoverflow
Solution 2 - C#RajuView Answer on Stackoverflow
Solution 3 - C#MRFerociusView Answer on Stackoverflow
Solution 4 - C#TodView Answer on Stackoverflow
Solution 5 - C#Pita.OView Answer on Stackoverflow