Why does this method result in an infinite loop?

C#LinqStack OverflowInfinite Loop

C# Problem Overview


One of my coworkers came to me with a question about this method that results in an infinite loop. The actual code is a bit too involved to post here, but essentially the problem boils down to this:

private IEnumerable<int> GoNuts(IEnumerable<int> items)
{
    items = items.Select(item => items.First(i => i == item));
    return items;
}

This should (you would think) just be a very inefficient way to create a copy of a list. I called it with:

var foo = GoNuts(new[]{1,2,3,4,5,6});

The result is an infinite loop. Strange.

I think that modifying the parameter is, stylistically a bad thing, so I changed the code slightly:

var foo = items.Select(item => items.First(i => i == item));
return foo;

That worked. That is, the program completed; no exception.

More experiments showed that this works, too:

items = items.Select(item => items.First(i => i == item)).ToList();
return items;

As does a simple

return items.Select(item => .....);

Curious.

It's clear that the problem has to do with reassigning the parameter, but only if evaluation is deferred beyond that statement. If I add the ToList() it works.

I have a general, vague, idea of what's going wrong. It looks like the Select is iterating over its own output. That's a little bit strange in itself, because typically an IEnumerable will throw if the collection it's iterating changes.

What I don't understand, because I'm not intimately familiar with the internals of how this stuff works, is why re-assigning the parameter causes this infinite loop.

Is there somebody with more knowledge of the internals who would be willing to explain why the infinite loop occurs here?

C# Solutions


Solution 1 - C#

The key to answering this is deferred execution. When you do this

items = items.Select(item => items.First(i => i == item));

you do not iterate the items array passed into the method. Instead, you assign it a new IEnumerable<int>, which references itself back, and starts iterating only when the caller starts enumerating the results.

That is why all your other fixes have dealt with the problem: all you needed to do is to stop feeding IEnumerable<int> back to itself:

  • Using var foo breaks self-reference by using a different variable,
  • Using return items.Select... breaks self-reference by not using intermediate variables at all,
  • Using ToList() breaks self-reference by avoiding deferred execution: by the time items is re-assigned, old items has been iterated over, so you end up with a plain in-memory List<int>.

> But if it's feeding on itself, how does it get anything at all?

That's right, it does not get anything! The moment you try iterating items and ask it for the first item, the deferred sequence asks the sequence fed to it for the first item to process, which means that the sequence is asking itself for the first item to process. At this point, it's turtles all the way down, because in order to return the first item to process the sequence must first get the first item to process from itself.

Solution 2 - C#

>It looks like the Select is iterating over its own output

You are correct. You are returning a query that iterates over itself.

The key is that you reference items within the lambda. The items reference is not resolved ("closed over") until the query iterates, at which point items now references the query instead of the source collection. That's where the self-reference occurs.

Picture a deck of cards with a sign in front of it labelled items. Now picture a man standing beside the deck of cards whose assignment is to iterate the collection called items. But then you move the sign from the deck to the man. When you ask the man for the first "item" - he looks for the collection marked "items" - which is now him! So he asks himself for the first item, which is where the circular reference occurs.

When you assign the result to a new variable, you then have a query that iterates over a different collection, and so does not result in an infinite loop.

When you call ToList, you hydrate the query to a new collection and also do not get an infinite loop.

Other things that would break the circular reference:

  • Hydrating items within the lambda by calling ToList
  • Assigning items to another variable and referencing that within the lambda.

Solution 3 - C#

After studying the two answers given and poking around a bit, I came up with a little program that better illustrates the problem.

    private int GetFirst(IEnumerable<int> items, int foo)
    {
        Console.WriteLine("GetFirst {0}", foo);
        var rslt = items.First(i => i == foo);
        Console.WriteLine("GetFirst returns {0}", rslt);
        return rslt;
    }

    private IEnumerable<int> GoNuts(IEnumerable<int> items)
    {
        items = items.Select(item =>
        {
            Console.WriteLine("Select item = {0}", item);
            return GetFirst(items, item);
        });
        return items;
    }

If you call that with:

var newList = GoNuts(new[]{1, 2, 3, 4, 5, 6});

You'll get this output repeatedly until you finally get StackOverflowException.

Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
...

What this shows is exactly what dasblinkenlight made clear in his updated answer: the query goes into an infinite loop trying to get the first item.

Let's write GoNuts a slightly different way:

    private IEnumerable<int> GoNuts(IEnumerable<int> items)
    {
	    var originalItems = items;
        items = items.Select(item =>
        {
            Console.WriteLine("Select item = {0}", item);
            return GetFirst(originalItems, item);
        });
        return items;
    }

If you run that, it succeeds. Why? Because in this case it's clear that the call to GetFirst is passing a reference to the original items that were passed to the method. In the first case, GetFirst is passing a reference to the new items collection, which hasn't yet been realized. In turn, GetFirst says, "Hey, I need to enumerate this collection." And thus begins the first recursive call that eventually leads to StackOverflowException.

Interestingly, I was right and wrong when I said that it was consuming its own output. The Select is consuming the original input, as I would expect. The First is trying to consume the output.

Lots of lessons to be learned here. To me, the most important is "don't modify the value of input parameters."

Thanks to dasblinkenlight, D Stanley, and Lucas Trzesniewski for their help.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJim MischelView Question on Stackoverflow
Solution 1 - C#Sergey KalinichenkoView Answer on Stackoverflow
Solution 2 - C#D StanleyView Answer on Stackoverflow
Solution 3 - C#Jim MischelView Answer on Stackoverflow