How to iterate through data and write to excel file quicker with C# and openXML

Posted on

Problem

I have a pretty simple method that grabs data from an sql query and then puts it inside of a list and finally iterates through two for each loops onto an excel file. It works fine with small amounts of data, but not at all with larger amounts. What would be the best way to iterate and also write to my workbook the quickest? I am using OpenXML.

Here is the code

  private UInt32 AddSheets(SpreadsheetDocument document, WorkbookPart workbookPart, Sheets sheets, string sheetName, UInt32 sheetNumber, List<RentalFeeReportingRecord> currentRecordList)
    {
        Dictionary<int, string> listofCoreBankingIds = new Dictionary<int, string>();
        listofCoreBankingIds = currentRecordList.Select(x => new { id = x.CoreBankingId, regionName = x.RegionName }).Distinct().ToDictionary(y => y.id, y => y.regionName);
        foreach (var coreBankingId in listofCoreBankingIds.OrderBy(x => x.Value))
        {

            WorksheetPart worksheetPart1 = workbookPart.AddNewPart<WorksheetPart>();
            string currentCurrency = null;
            string regionName = null;

            Worksheet worksheet1 = new Worksheet();
            SheetData sheetData1 = new SheetData();

            Row rowInSheet1 = new Row();
            Row firstRow = new Row();
            Row lastRow = new Row();

            rowInSheet1.Append(
              excelController.ConstructCell("Region", CellValues.String, 3), excelController.ConstructCell("Merchant - Terminal", CellValues.String, 3),
              excelController.ConstructCell("Fee Charged", CellValues.String, 3),
              excelController.ConstructCell("Currency", CellValues.String, 3),
              excelController.ConstructCell("Processing Date", CellValues.String, 3),
              excelController.ConstructCell("Description", CellValues.String, 3)
            );

            sheetData1.Append(rowInSheet1);


            foreach (var currentRecord in currentRecordList.OrderBy(x => x.RegionName).ThenBy(z => z.Currency).ThenBy(y => y.DateProcessed))
            {
                if (coreBankingId.Key == currentRecord.CoreBankingId)
                {

                    regionName = currentRecord.RegionName;
                    currentCurrency = currentRecord.Currency;
                    double currentTotal = currentRecord.TotalFeeAmount;
                    string merchant = currentRecord.MerchantRecord.MerchantID + " - " + currentRecord.MerchantTerminalRecord.terminalID;

                    if (currentRecord.IsPinPad)
                    {
                        merchant = merchant + " - PIN Pad";
                    }

                    string description = currentRecord.TerminalRecord.description;
                    if (currentRecord.IsProRated)
                    {
                        int days = DateTime.DaysInMonth(currentRecord.DeployedDate.Year, currentRecord.DeployedDate.Month);

                        description = currentRecord.TerminalRecord.description + " (" + currentRecord.MonthlyFee + " Fee \ " + days.ToString() + " Days) * " + currentRecord.DaysActive.ToString() + " Days Active";
                    }

                    rowInSheet1 = new Row();
                    firstRow = new Row();
                    lastRow = new Row();

                    rowInSheet1.Append(
                      excelController.ConstructCell(regionName, CellValues.String, 2),
                      excelController.ConstructCell(merchant, CellValues.String, 2),
                      excelController.ConstructCell(currentRecord.TotalFee.ToString("N2"), CellValues.String, 1),
                      excelController.ConstructCell(currentRecord.Currency, CellValues.String, 2),
                      excelController.ConstructCell(currentRecord.DateProcessed.ToString("MMM dd, yyyy"), CellValues.String, 2),
                      excelController.ConstructCell(description, CellValues.String, 2)
                    );

                    sheetData1.Append(rowInSheet1);

                }

            }


            Columns columns = excelController.AutoSize(sheetData1);

            worksheet1.AppendChild(columns);
            worksheet1.AppendChild(sheetData1);
            worksheetPart1.Worksheet = worksheet1;


            Sheet sheet1 = new Sheet()
            {

                Id = document.WorkbookPart.GetIdOfPart(worksheetPart1),
                SheetId = sheetNumber,
                Name = regionName
            };

            sheets.Append(sheet1);
            sheetNumber++;
        }
        return sheetNumber;

    }

Any help would be appreciated.

Solution

It would be worth doing a high-level analysis of your algorithm’s complexity. I won’t use big o to keep things simple.

// ...
foreach (var coreBankingId in listofCoreBankingIds.OrderBy(x => x.Value))
{
    // ...
    foreach (var currentRecord in currentRecordList.OrderBy(x => x.RegionName).ThenBy(z => z.Currency).ThenBy(y => y.DateProcessed))
    {
        if (coreBankingId.Key == currentRecord.CoreBankingId)
        {
            // ...
        }
    }
}

So:

  • for every iteration of coreBankingId, you are reordering the currentRecordList in exactly the same way
  • the inner loop only cares about 1 set of values (when the CoreBankingId is equal to the outer loop’s key)

With those two pieces of knowledge, I suggest you look at using GroupBy instead. This code is untested straight into CR and I’m a bit rusty on C# but something like:

var groups = currentRecordList
    .GroupBy(r => new { Id = r.CoreBankingId, RegionName = r.RegionName });

foreach (var group in groups.OrderBy(g => g.Key.RegionName)
{
    var coreBankingId = group.Key.Id;

    foreach (var currentRecord in group.OrderBy(x => x.RegionName).ThenBy(z => z.Currency).ThenBy(y => y.DateProcessed))
    {
        // Your loop
    }
}

That should have scale much better to bigger datasets.


Original suggestion (which I think is wrong):

var data = currentRecordList
    .GroupBy(r => new { Id = r.CoreBankingId, RegionName = r.RegionName })
    .OrderBy(g => g.Key.RegionName)
    // I can't remember if ToLookup preserves order. If it doesn't, you'll have to tweak this to order later.
    .ToLookup(
        g => g.Key.Id, 
        g => g.OrderBy(x => x.RegionName)
            .ThenBy(z => z.Currency)
            .ThenBy(y => y.DateProcessed)
            .ToList());

foreach (var datum in data)
{
    var coreBankingId = datum.Key;
    foreach (var activity in datum)
    {
        // Your loop body without the if.
    }
}

Quick remarks:

  • I don’t see the point of listofCoreBankingIds, especially since you only use the key, never the value. Just loop through the currentRecordList. I’m guessing you might need to GroupBy the CoreBankingId and loop through that, but your logic is obscured by the creation of that dictionary.

  • currentRecordList is a bad name IMHO. The “List” part is pointless. Why not call it rentalFeeReportingRecords or currentRecords?

  • Six parameters, almost a hundred lines: I’d turn this method into a class of its own and split it into several descriptively named methods. For instance, the compilation of the description could be a method.

  • Your logic jumps all over the place. For instance, worksheetPart1 gets defined early on, but doesn’t come into play until almost at the end. Same for worksheet1.

  • currentCurrency is never used. regionName only exists because you do the pointless “create dictionary and loop through its items” logic.

  • Why the “1” suffix in sheetData1 and worksheet1 and rowInSheet1?

  • Use string interpolation instead of concatenating strings.

Leave a Reply

Your email address will not be published. Required fields are marked *