Sum(), ForEach() with Lambda in a hierarchical List Collection

Posted on

Problem

I have to list the number of documents related to items in a hierarchical tree.

The specification states:
The tree will only ever be 2 levels deep.
The higher level items will list not only the number of documents in it’s folder, but also the total count of documents in the child folders. They want this total to include the documents in the current folder.

I’ve developed the solution below, it works, however it just seems complicated, hard to maintain and probably not the most efficient way to do this. Also, if the tree is ever more than 2 levels deep, this will fail.

class Item
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public int DocumentCount { get; set; }
    public int ChildItemCount { get; set; }
    public int ChildCount { get; set; }
}

static class Program
{
    static void Main()
    {
        var items = new List<Item>
        {
            new Item {Id = 1, ParentId = 0},
            new Item {Id = 2, ParentId = 1},
            new Item {Id = 3, ParentId = 1},
            new Item {Id = 4, ParentId = 2},
            new Item {Id = 5, ParentId = 2},
            new Item {Id = 6, ParentId = 3},
            new Item {Id = 7, ParentId = 3},
            new Item {Id = 8, ParentId = 3}
        };
        // setup
        var rnd = new Random();
        items.ForEach(c => c.DocumentCount = rnd.Next(1, 10));

        // act
        items.ForEach(z => z.ChildCount = items.Count(x => x.ParentId == z.Id));
        items.ForEach(z => z.ChildItemCount = items.Where(x => x.ParentId == z.Id).Sum(y => y.DocumentCount) + z.DocumentCount);
        items.Where(k => k.ParentId == 0).ToList().ForEach(z => z.ChildItemCount = items.Where(x => x.ParentId == z.Id).Sum(y => y.ChildItemCount) + z.DocumentCount);

        // output
        items.ForEach(i => Console.WriteLine("Id: {0}, ParentId: {1}, DocumentCount: {2}, ChildCount: {3}, ChildItemCount: {4}", i.Id, i.ParentId, i.DocumentCount,i.ChildCount, i.ChildItemCount));
        Console.ReadKey();
    }
}

This gives the output below which is technically correct and meets the spec, but I just feel the solution is not the best solution. Can anyone think of a better way to do this?

Id: 1, ParentId: 0, DocumentCount: 8, ChildCount: 2, ChildItemCount: 46
Id: 2, ParentId: 1, DocumentCount: 7, ChildCount: 2, ChildItemCount: 16
Id: 3, ParentId: 1, DocumentCount: 8, ChildCount: 3, ChildItemCount: 22
Id: 4, ParentId: 2, DocumentCount: 2, ChildCount: 0, ChildItemCount: 2
Id: 5, ParentId: 2, DocumentCount: 7, ChildCount: 0, ChildItemCount: 7
Id: 6, ParentId: 3, DocumentCount: 9, ChildCount: 0, ChildItemCount: 9
Id: 7, ParentId: 3, DocumentCount: 1, ChildCount: 0, ChildItemCount: 1
Id: 8, ParentId: 3, DocumentCount: 4, ChildCount: 0, ChildItemCount: 4

Solution

You can create lookup for children. That will involve single enumeration over items collection instead of enumerating all items for each calculation of each item.

I.e. you currently have 2 * N * N iterations for calculating child count and child item count. With lookup you will have N + N + N iterations

var children = items.ToLookup(i => i.ParentId);

foreach(var item in items) 
{
    var subitems = children[item.Id];
    item.ChildCount = subitems.Count();
    item.ChildItemCount = subitems.Sum(i => i.DocumentCount) + item.DocumentCount;
}

Explanation: Let’s look on this line

items.ForEach(z => z.ChildItemCount = 
    items.Where(x => x.ParentId == z.Id).Sum(y => y.DocumentCount) + z.DocumentCount);

Here for each item in items collection you enumerate every item in items collection to get only items which are children of current. That gives you N * N iterations totally. Count is calculated same way.

Now lets look on my code.

var children = items.ToLookup(i => i.ParentId);

Here lookup is created. Internally it looks like enumerating over items and adding them to appropriate groupings of lookup. You can think of lookup as a Dictionary<TKey, List<TValue>>.

Next we do enumerating of all items (that’s another N). In body of this loop we get children of item (getting appropriate group by key is pretty fast O(1) operation, comparing to enumerating all items and filtering them). Good news here is behavior of lookup when key not found – it simply returns EmptyEnumerable (i.e. empty array of TValue), so we don’t need to check anything – further calculations will work without problems:

foreach(var item in items)
{
    var subitems = children[item.Id];
    // ...
}

Next we do two operations. First is getting count of items in group. Grouping implements ICollection with Count property, and it simply returns count of items in group, without enumerating them:

item.ChildCount = subitems.Count();

Second operation is calculating sum of documents in group. Thus total number of items in all groups will be N then enumerating all groups will take N iterations:

item.ChildItemCount = subitems.Sum(i => i.DocumentCount) + item.DocumentCount;

UPDATE: Thus you want ChildItemCount to contain sum

@Sergey’s answer does a beautiful job at flattening your solution’s complexity; this is more of a style review.

Your lambda parameters are arbitrarily named:

items.ForEach(z => z.ChildCount = items.Count(x => x.ParentId == z.Id));

Consider giving a somewhat meaningful name to these identifiers:

items.ForEach(item => item.ChildCount = items.Count(child => child.ParentId == item.Id));

Better naming and proper whitespace make it easier to read your code; you didn’t need to come up with y, it’s in a scope that’s a sibling of that of x, but it’s harder to tell when everything is on one line:

items.ForEach(z => z.ChildItemCount = items.Where(x => x.ParentId == z.Id).Sum(y => y.DocumentCount) + z.DocumentCount);

Breaking the line and lining up the dots makes it easier to see the scopes created (and iterations), at a glance:

items.ForEach(item => item.ChildItemCount = items.Where(child => child.ParentId == item.Id)
                                                 .Sum(child => child.DocumentCount) + item.DocumentCount);

Leave a Reply

Your email address will not be published. Required fields are marked *