Group by and sum query on multiple columns

Posted on

Problem

var fpslist = db.FPSinformations.Where(x => x.Godown_Code != null && x.Godown_Code == godownid).ToList();
var data1 = fpslist.GroupBy(x => x.Ration_Card_Type1)
    .Select(x => new
    {
        CardType_Name = x.Key,
        CardType_Count = x.Sum(y => y.Ration_Card_Count1)
    }).ToList();
var data2 = fpslist.GroupBy(x => x.Ration_Card_Type2)
    .Select(x => new
    {
        CardType_Name = x.Key,
        CardType_Count = x.Sum(y => y.Ration_Card_Count2)
    }).ToList();
var data3 = fpslist.GroupBy(x => x.Ration_Card_Type3)
    .Select(x => new
    {
        CardType_Name = x.Key,
        CardType_Count = x.Sum(y => y.Ration_Card_Count3)
    }).ToList();
var data4 = fpslist.GroupBy(x => x.Ration_Card_Type4)
    .Select(x => new
    {
        CardType_Name = x.Key,
        CardType_Count = x.Sum(y => y.Ration_Card_Count4)
    }).ToList();
var data5 = fpslist.GroupBy(x => x.Ration_Card_Type5)
    .Select(x => new
    {
        CardType_Name = x.Key,
        CardType_Count = x.Sum(y => y.Ration_Card_Count5)
    }).ToList();

var GodownRCCount = data1.Where(x => x.CardType_Name != null).ToList();
var GodownRCCounts = GodownRCCount;
GodownRCCount = data2.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);
GodownRCCount = data3.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);
GodownRCCount = data4.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);
GodownRCCount = data5.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);

I have 10 columns in my database like this:

Ration_Card_Type1
Ration_card_count1
Ration_Card_Type2
Ration_card_count2
Ration_Card_Type3
Ration_card_count3
Ration_Card_Type4
Ration_card_count4
Ration_Card_Type5
Ration_card_count5

Now what I want is to get the sum of Ration_Card_Counts and its type from its type.

Expected output:

CardType_Name
CardType_Count

It works fine, but I want to optimize it in max possible way as this will be inside a loop.

Solution

From what you gave us, there’s nothing you can do to optimize your query except changing the way your data is stored.

Having Ration_Card_Count[1..5] is a big code smell. There’s something wrong in the implementation.

Though, you could change this code to be a little less repetitive.

Most code is duplicated in there, so let’s refactor it :

var data1 = fpslist.GroupBy(x => x.Ration_Card_Type1)
.Select(x => new
{
    CardType_Name = x.Key,
    CardType_Count = x.Sum(y => y.Ration_Card_Count1)
}).ToList();

This could be extracted in a method but beforehand, we need to create a small data type for your anonymous type because methods cannot return anonymous types.

struct CardGrouping
{
    public string Name { get; set; }
    public int Count { get; set; }
}

private static IEnumerable<CardGrouping> GetCardGrouping(IQueryable<??> queryable, Func<???, int> groupingFunction)
{
    return queryable.GroupBy(groupingFunction)
        .Select(x => new CardGrouping()
        {
            Name = x.Key,
            Count = x.Sum(groupingFunction)
        }).ToList();
}

Notice the ???, they’re here because there’s not enough context in your question, I can’t figure what type to use. In your next question, please add more context! 🙂

Now, we’ve got a method that does the nasty work for us, let’s see what the code looks like :

var data1 = GetCardGrouping(fpsList, y => y.Ration_Card_Count1);
var data2 = GetCardGrouping(fpsList, y => y.Ration_Card_Count2);
var data3 = GetCardGrouping(fpsList, y => y.Ration_Card_Count3);
var data4 = GetCardGrouping(fpsList, y => y.Ration_Card_Count4);
var data5 = GetCardGrouping(fpsList, y => y.Ration_Card_Count5);

var GodownRCCount = data1.Where(x => x.CardType_Name != null).ToList();
var GodownRCCounts = GodownRCCount;
GodownRCCount = data2.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);
GodownRCCount = data3.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);
GodownRCCount = data4.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);
GodownRCCount = data5.Where(x => x.CardType_Name != null).ToList();
GodownRCCounts.AddRange(GodownRCCount);

You repeat x => x.CardType_Name != null with every data, so maybe we should add this to our method :

private static IEnumerable<CardGrouping> GetCardGrouping(IQueryable<??> queryable, Func<??, int> groupingFunction)
{
    return queryable.GroupBy(groupingFunction)
            .Where(x => x.Key != null)
            .Select(x => new CardGrouping
            {
                Name = x.Key,
                Count = x.Sum(groupingFunction)
            }).ToList();
}

Now what does it look like :

var data1 = GetCardGrouping(fpsList, y => y.Ration_Card_Count1);
var data2 = GetCardGrouping(fpsList, y => y.Ration_Card_Count2);
var data3 = GetCardGrouping(fpsList, y => y.Ration_Card_Count3);
var data4 = GetCardGrouping(fpsList, y => y.Ration_Card_Count4);
var data5 = GetCardGrouping(fpsList, y => y.Ration_Card_Count5);

var GodownRCCounts = new List<???>();
GodownRCCounts.AddRange(data1);
GodownRCCounts.AddRange(data2);
GodownRCCounts.AddRange(data3);
GodownRCCounts.AddRange(data4);
GodownRCCounts.AddRange(data5);

The naming is pretty terrible. Snake_Case shouldn’t be used in C#. If you can change all those Ration_Card_Count5 to RationCardCount5, it’d be great. Also, variable shouldn’t be PascalCased, but camelCased. So GodownRCCounts = godownRCCounts.

Lastly, try to use var only when it’s possible to guess the type used by reading the code. Right now it is impossible and it makes the code harder to read/review.

Edit : I don’t think this code even works. Is Ration_Card_Count an int or a string? Or maybe a Nullable<int>? Because you use it in the Sum function, which takes an int, but you also check for null, which isn’t a possible value for int. What does that mean?? If you check int == null, you can remove all the checks for that, it’ll never happen.

First and most obvious issue is usage of ToList(), it’s not required and you force creation of multiple (and possibly big) lists moreover you’re repeating almost same code again and again. Step by step:

var fpslist = db.FPSinformations.Where(x => x.Godown_Code == godownid);

Unless comparison operator for Godown_Code is broken then you shouldn’t need to check for null (in case you may opt for ?. to avoid it). Also you don’t need .ToList(), drop it. If you will execute such code inside a loop you may consider to move this outside the loop (to be sure more context is needed).

Now let’s make a reusable function (and also avoid to create multiple anonymous types, it’s a micro optimization but it also simplifies our code):

static IEnumerable<Tuple<TName, TCount>> CountByType<T, TName, TCount>(
    IEnumerable<T> list,
    Func<T, TName> nameSelector,
    Func<IEnumerable<T>, TCount> aggregator)
{
    return list.Where(x => Object.Equals(nameSelector(x), null) == false)
        .GroupBy(x => nameSelector(x))
        .Select(x => Tuple.Create(x.Key, aggregator(x)));
}

Now your code can be greatly simplified (I assumed you cannot change database columns otherwise everything can be even simpler, see last section of this answer). I also used generics because type of your columns is unknown.

    var data1 = CountByType(fpslist, 
        x => x.Ration_Card_Type1,
        x => x.Sum(y => y.Ration_Card_Count1));

    var data2 = CountByType(fpslist, 
        x => x.Ration_Card_Type2,
        x => x.Sum(y => y.Ration_Card_Count2));

   // Repeat for each set

   var result = data1
       .Concatenate(data2)
       .Concatenate(data3)
       .Concatenate(data4)
       .Concatenate(data5);

Note that from your code it seems you can have multiple rows with a value for Ration_Card_CountX. If you have just one row then all code will be greatly reduced. Here I used a Tuple<T1, T2> because I do not know exact types of your objects, you’d better use a proper class instead.

Note that you can consume result as-is (an enumeration) or materialize a list (don’t forget this if data comes from DB and connection will be disposed, LINQ execution is deferred).


About DB

I do not know if you can change your database schema and I do not know exact layout of your data then this section may not apply in your case.

If your data is in the form:

Ration_Card_Type1   Ration_Card_Count1   Ration_Card_Type2   Ration_Card_Count2
Type1                                5   NULL                                 0
NULL                                 0   Type2                               11
Type1                               12   Type2                                3

Then you have to keep your actual code but I would suggest to do not map 1:1 column names with C# property names (conventions are pretty different) then Ration_Card_Type1 should be RationCardType1 (assuming you can’t simply have Type1, Type2 and so on).

If, for example, you have data with this simplified layout:

Ration_Card_Type1   Ration_Card_Count1   Ration_Card_Type2   Ration_Card_Count2
Type1                                5   NULL                                 0
NULL                                 0   Type2                               11
Type1                               12   NULL                                 0

Then you may both optimize and simplify your DB (it will also impact code):

Type   Count
1          5
2         11
1         12

In that case all you need (assuming your C# entities maps 1:1 with DB columns):

var result = db.GroupBy(x => x.Type)
    .Select(x => new { Type = x.Key, Count = x.Sum(y => y.Count) });

Leave a Reply

Your email address will not be published. Required fields are marked *