How to loop through IEnumerable in batches

C#Ienumerable

C# Problem Overview


I am developing a C# program which has an "IEnumerable users" that stores the ids of 4 million users. I need to loop through the IEnumerable and extract a batch 1000 ids each time to perform some operations in another method.

How do I extract 1000 ids at a time from start of the IEnumerable, do some thing else, then fetch the next batch of 1000 and so on?

Is this possible?

C# Solutions


Solution 1 - C#

You can use MoreLINQ's Batch operator (available from NuGet):

foreach(IEnumerable<User> batch in users.Batch(1000))
   // use batch

If simple usage of library is not an option, you can reuse implementation:

public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int size)
{
    T[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
       if (bucket == null)
           bucket = new T[size];

       bucket[count++] = item;

       if (count != size)                
          continue;
       
       yield return bucket.Select(x => x);
           
       bucket = null;
       count = 0;
    }

    // Return the last bucket with all remaining elements
    if (bucket != null && count > 0)
    {
        Array.Resize(ref bucket, count);
        yield return bucket.Select(x => x);
    }
}

BTW for performance you can simply return bucket without calling Select(x => x). Select is optimized for arrays, but selector delegate still would be invoked on each item. So, in your case it's better to use

yield return bucket;

Solution 2 - C#

Sounds like you need to use Skip and Take methods of your object. Example:

users.Skip(1000).Take(1000)

this would skip the first 1000 and take the next 1000. You'd just need to increase the amount skipped with each call

You could use an integer variable with the parameter for Skip and you can adjust how much is skipped. You can then call it in a method.

public IEnumerable<user> GetBatch(int pageNumber)
{
    return users.Skip(pageNumber * 1000).Take(1000);
}

Solution 3 - C#

The easiest way to do this is probably just to use the GroupBy method in LINQ:

var batches = myEnumerable
    .Select((x, i) => new { x, i })
    .GroupBy(p => (p.i / 1000), (p, i) => p.x);

But for a more sophisticated solution, see this blog post on how to create your own extension method to do this. Duplicated here for posterity:

public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
    List<T> nextbatch = new List<T>(batchSize);
    foreach (T item in collection)
    {
        nextbatch.Add(item);
        if (nextbatch.Count == batchSize)
        {
            yield return nextbatch;
            nextbatch = new List<T>(); 
            // or nextbatch.Clear(); but see Servy's comment below
        }
    }

    if (nextbatch.Count > 0)
        yield return nextbatch;
}

Solution 4 - C#

How about

int batchsize = 5;
List<string> colection = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"};
for (int x = 0; x < Math.Ceiling((decimal)colection.Count / batchsize); x++)
{
    var t = colection.Skip(x * batchsize).Take(batchsize);
}

Solution 5 - C#

try using this:

  public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
        this IEnumerable<TSource> source,
        int batchSize)
    {
        var batch = new List<TSource>();
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                 yield return batch;
                 batch = new List<TSource>();
            }
        }
    
        if (batch.Any()) yield return batch;
    }

and to use above function:

foreach (var list in Users.Batch(1000))
{
    
}

Solution 6 - C#

You can achieve that using Take and Skip Enumerable extension method. For more information on usage checkout linq 101

Solution 7 - C#

Something like this would work:

List<MyClass> batch = new List<MyClass>();
foreach (MyClass item in items)
{
    batch.Add(item);

    if (batch.Count == 1000)
    {
        // Perform operation on batch
        batch.Clear();
    }
}

// Process last batch
if (batch.Any())
{
    // Perform operation on batch
}

And you could generalize this into a generic method, like this:

static void PerformBatchedOperation<T>(IEnumerable<T> items, 
                                       Action<IEnumerable<T>> operation, 
                                       int batchSize)
{
    List<T> batch = new List<T>();
    foreach (T item in items)
    {
        batch.Add(item);

        if (batch.Count == batchSize)
        {
            operation(batch);
            batch.Clear();
        }
    }

    // Process last batch
    if (batch.Any())
    {
        operation(batch);
    }
}

Solution 8 - C#

Solution 9 - C#

In a streaming context, where the enumerator might get blocked in the middle of the batch, simply because the value is not yet produced (yield) it is useful to have a timeout method so that the last batch is produced after a given time. I used this for example for tailing a cursor in MongoDB. It's a little bit complicated, because the enumeration has to be done in another thread.

    public static IEnumerable<List<T>> TimedBatch<T>(this IEnumerable<T> collection, double timeoutMilliseconds, long maxItems)
    {
        object _lock = new object();
        List<T> batch = new List<T>();
        AutoResetEvent yieldEventTriggered = new AutoResetEvent(false);
        AutoResetEvent yieldEventFinished = new AutoResetEvent(false);
        bool yieldEventTriggering = false; 

        var task = Task.Run(delegate
        {
            foreach (T item in collection)
            {
                lock (_lock)
                {
                    batch.Add(item);

                    if (batch.Count == maxItems)
                    {
                        yieldEventTriggering = true;
                        yieldEventTriggered.Set();
                    }
                }

                if (yieldEventTriggering)
                {
                    yieldEventFinished.WaitOne(); //wait for the yield to finish, and batch to be cleaned 
                    yieldEventTriggering = false;
                }
            }
        });

        while (!task.IsCompleted)
        {
            //Wait for the event to be triggered, or the timeout to finish
            yieldEventTriggered.WaitOne(TimeSpan.FromMilliseconds(timeoutMilliseconds));
            lock (_lock)
            {
                if (batch.Count > 0) //yield return only if the batch accumulated something
                {
                    yield return batch;
                    batch.Clear();
                    yieldEventFinished.Set();
                }
            }
        }
        task.Wait();
    }

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionuser1526912View Question on Stackoverflow
Solution 1 - C#Sergey BerezovskiyView Answer on Stackoverflow
Solution 2 - C#BillView Answer on Stackoverflow
Solution 3 - C#p.s.w.gView Answer on Stackoverflow
Solution 4 - C#KabindasView Answer on Stackoverflow
Solution 5 - C#ZakiView Answer on Stackoverflow
Solution 6 - C#Krishnaswamy SubramanianView Answer on Stackoverflow
Solution 7 - C#JLRisheView Answer on Stackoverflow
Solution 8 - C#Aghilas YakoubView Answer on Stackoverflow
Solution 9 - C#Mihai CiureanuView Answer on Stackoverflow