
Return of the Batch Enumerable

Posted by Nicholas Blumhardt | November 17, 2007 09:58

Just a quick post about a little extension to IEnumerable<T> that caused a nasty case of VM upsets when Mark Monsour and I tried to run it after building it with the VS2005 compiler. (Where's your blog, Mark? :P)

It turns out that the solution works under the VS2008 compiler (perhaps we were doing something crazy in our last attempt a year ago), so for interest's sake here is the code anyway:

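using System.Collections.Generic;

static class InBatchesExtension
{
    // Splits source into consecutive batches of up to batchSize elements.
    // All batches share one enumerator, so each batch should be fully
    // enumerated before moving on to the next one.
    public static IEnumerable<IEnumerable<T>> InBatches<T>(
        this IEnumerable<T> source, 
        int batchSize)
    { 
        // Disposes the source enumerator when this iterator is disposed,
        // e.g. when the caller breaks out of its foreach early.
        using (var enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                // The current element starts a new batch.
                yield return TakeBatch(enumerator.Current, enumerator, batchSize);
    }
    
    // Yields 'item', then up to max - 1 further elements from 'rest'.
    static IEnumerable<T> TakeBatch<T>(T item, IEnumerator<T> rest, int max)
    {
        yield return item;
        int count = 1;
        // Test the count first so we never advance past the end of this batch.
        while (count++ < max && rest.MoveNext())
            yield return rest.Current;
    }
}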

This code breaks an IEnumerable<T> into consecutive batches of at most batchSize elements, so that you can write code like:

var ints = new[] { 1, 2, 3, 4, 5, 6, 7 };

foreach (var batch in ints.InBatches(3))
{
    Console.Write("Batch: ");
    foreach (var item in batch)
        Console.Write(item.ToString() + " ");
    Console.WriteLine();
}

This prints:

Batch: 1 2 3
Batch: 4 5 6
Batch: 7

See the discussion on Luke's blog if you're curious as to why the using statement is necessary.
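Luke's discussion isn't reproduced here, but one consequence of the using statement is easy to show. InBatches calls GetEnumerator itself instead of going through foreach, so nothing else will dispose the source enumerator. With the using block inside the iterator, a caller that breaks out of its foreach early disposes the outer iterator, and that in turn disposes the source. Below is a minimal sketch of this behaviour. TrackedSource, SourceDisposed and FirstBatchOnly are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class InBatchesExtension
{
    public static IEnumerable<IEnumerable<T>> InBatches<T>(
        this IEnumerable<T> source, int batchSize)
    {
        using (var enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeBatch(enumerator.Current, enumerator, batchSize);
    }

    static IEnumerable<T> TakeBatch<T>(T item, IEnumerator<T> rest, int max)
    {
        yield return item;
        int count = 1;
        while (count++ < max && rest.MoveNext())
            yield return rest.Current;
    }
}

static class DisposalDemo
{
    // Hypothetical flag, set when the source iterator's finally block runs.
    public static bool SourceDisposed;

    // Hypothetical source that records when it has been disposed.
    static IEnumerable<int> TrackedSource()
    {
        try
        {
            for (int i = 1; i <= 7; i++)
                yield return i;
        }
        finally
        {
            SourceDisposed = true;
        }
    }

    // Reads the first batch, then breaks out of the loop early.
    public static List<int> FirstBatchOnly()
    {
        SourceDisposed = false;
        var taken = new List<int>();
        foreach (var batch in TrackedSource().InBatches(3))
        {
            taken.AddRange(batch);
            break; // foreach disposes the outer iterator here
        }
        return taken;
    }

    static void Main()
    {
        var taken = FirstBatchOnly();
        Console.WriteLine(string.Join(" ", taken)); // 1 2 3
        Console.WriteLine(SourceDisposed);          // True
    }
}
```

Without the using statement, nothing would dispose the source enumerator in this scenario, so TrackedSource's finally block would never run.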

Comments

Posted by Nicholas Blumhardt | November 18, 2007 21:52

This algorithm works when translated and run under VS2005. I wish I still had the original attempt...

I like it... but it's not quite right

Posted by Luke Marshall | November 19, 2007 15:51

Love the idea, but the batches still depend on each other, i.e. they share the same enumerator. While this is very efficient, it won't work as intended if you want to skip processing one of the batches. Here is my version, which solves that issue:

public static IEnumerable<IEnumerable<T>> InBatches<T>(this IEnumerable<T> source, int batchSize)
{
    // Needs System.Linq for Any, Skip and Take.
    for (IEnumerable<T> s = source; s.Any(); s = s.Skip(batchSize))
        yield return s.Take(batchSize);
}

Is it cheating to use LINQ?
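The sketch below makes this dependence concrete. It steps past the first batch without enumerating it and then reads the second. InBatchesLinq is Luke's Skip/Take version under a hypothetical name so that both versions can coexist. SecondBatch is a hypothetical helper. With the shared enumerator, skipping a batch only advances the source by one element, so the "second" batch starts at 2. The Skip/Take version returns the real second batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Batching
{
    // Shared-enumerator version from the post.
    public static IEnumerable<IEnumerable<T>> InBatches<T>(
        this IEnumerable<T> source, int batchSize)
    {
        using (var enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeBatch(enumerator.Current, enumerator, batchSize);
    }

    static IEnumerable<T> TakeBatch<T>(T item, IEnumerator<T> rest, int max)
    {
        yield return item;
        int count = 1;
        while (count++ < max && rest.MoveNext())
            yield return rest.Current;
    }

    // Luke's Skip/Take version, renamed (hypothetically) so both can coexist.
    public static IEnumerable<IEnumerable<T>> InBatchesLinq<T>(
        this IEnumerable<T> source, int batchSize)
    {
        for (IEnumerable<T> s = source; s.Any(); s = s.Skip(batchSize))
            yield return s.Take(batchSize);
    }

    // Hypothetical helper: steps past the first batch without enumerating it,
    // then reads the second batch while the outer enumerator is still alive.
    public static string SecondBatch(IEnumerable<IEnumerable<int>> batches)
    {
        using (var outer = batches.GetEnumerator())
        {
            outer.MoveNext(); // first batch: never enumerated
            outer.MoveNext(); // second batch
            return string.Join(" ", outer.Current);
        }
    }

    static void Main()
    {
        var ints = new[] { 1, 2, 3, 4, 5, 6, 7 };
        Console.WriteLine(SecondBatch(ints.InBatches(3)));     // 2 3 4
        Console.WriteLine(SecondBatch(ints.InBatchesLinq(3))); // 4 5 6
    }
}
```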

Right you are!

Posted by Nicholas Blumhardt | November 19, 2007 20:47

Also, the cost of the extra enumerators is negligible once you're processing a non-trivial sequence. I should have put the standard "don't use this but..." disclaimer on the article ;)

Posted by Luke Marshall | November 20, 2007 10:12

Yes, the enumerators themselves are cheap, but if the underlying enumeration is expensive then my version will be a lot slower because of Skip. Skip still evaluates every element; it just ignores the results. (Note that this is the case for LINQ to Objects.) I'm curious to see how this would work on top of my file-reading enumeration. I think your version would read the file once, whereas mine would read it many times... an interesting prospect.
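Luke's point about Skip can be checked with a small sketch. It counts how many fresh passes each approach makes over the source and how many elements the source produces in total.

The names used here are hypothetical stand-ins:
- ExpensiveSource represents an expensive sequence such as a file reader.
- Measure and its two counters exist only for this illustration.
- InBatchesLinq is Luke's version, renamed so it can sit beside the original.

The shared-enumerator version makes one pass and produces 7 elements. The exact figures for the Skip/Take version depend on the LINQ to Objects implementation, so the sketch only claims that they are higher.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Batching
{
    // Hypothetical counters, for this illustration only.
    static int passes; // fresh enumerations of the source
    static int pulls;  // elements produced by the source

    // Stand-in for an expensive source such as a file reader:
    // counts every pass over it and every element it yields.
    static IEnumerable<int> ExpensiveSource(int n)
    {
        passes++;
        for (int i = 1; i <= n; i++)
        {
            pulls++;
            yield return i;
        }
    }

    // Shared-enumerator version from the post.
    public static IEnumerable<IEnumerable<T>> InBatches<T>(
        this IEnumerable<T> source, int batchSize)
    {
        using (var enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeBatch(enumerator.Current, enumerator, batchSize);
    }

    static IEnumerable<T> TakeBatch<T>(T item, IEnumerator<T> rest, int max)
    {
        yield return item;
        int count = 1;
        while (count++ < max && rest.MoveNext())
            yield return rest.Current;
    }

    // Luke's Skip/Take version, renamed (hypothetically) so both can coexist.
    public static IEnumerable<IEnumerable<T>> InBatchesLinq<T>(
        this IEnumerable<T> source, int batchSize)
    {
        for (IEnumerable<T> s = source; s.Any(); s = s.Skip(batchSize))
            yield return s.Take(batchSize);
    }

    // Fully consumes every batch of a 7-element source and reports the cost.
    public static (int Passes, int Pulls) Measure(
        Func<IEnumerable<int>, IEnumerable<IEnumerable<int>>> batcher)
    {
        passes = 0;
        pulls = 0;
        foreach (var batch in batcher(ExpensiveSource(7)))
            foreach (var item in batch) { }
        return (passes, pulls);
    }

    static void Main()
    {
        var shared = Measure(s => s.InBatches(3));
        var linq = Measure(s => s.InBatchesLinq(3));
        // Shared enumerator: one pass, 7 elements.
        Console.WriteLine($"shared enumerator: {shared.Passes} pass(es), {shared.Pulls} elements");
        // Skip/Take: several passes and more elements (exact numbers vary).
        Console.WriteLine($"Skip/Take: {linq.Passes} pass(es), {linq.Pulls} elements");
    }
}
```

The trade-off matches the thread. The shared enumerator reads the source once but couples the batches. The Skip/Take version keeps the batches independent at the price of re-reading the source.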


Disclaimer: These articles represent the opinions of the authors and may not match the official position of Ubik Systems Pty. Ltd. Confirmation should be sought on all matters involving professional advice.