Reading data from a CSV and pushing it to a Salesforce application

Problem

I’m trying to write code that reads data from a CSV and pushes it to a Salesforce application using the API. My code processes the data in a for loop, but the function takes a long time (about 3 hours) to run. What can I do to optimize my code so it runs faster?

Here’s an example of my code, which reads Patient Diagnosis data from a flat file with more than 200k records. Inside the for loop, I query the patient list, which has 100k+ records, transform the object, then add it to a list for bulk processing. My code looks like this:

Iterating over ptdiags, which contains the flat file data:

for (int i = 0; i < ptdiags.Count; i += BATCH_SIZE)
{
  var batchContents = SFToBTMapping.Bulk_PtDiag_Content(ptdiags.Skip(i).Take(BATCH_SIZE).ToList(),sfPatients);
  var batch = BulkUpsert(job.Id, batchContents);
}

Function that transforms the object. Here I query sfpatients to link a patient Id to the diagnosis object:

    public static string Bulk_PtDiag_Content(List<Ptdiag> ptdiags, List<SfPatient__c> sfpatients)
    {
        string res = "Patient__c,DiagKey__c,NickName__c" +
            ",Sequence__c,ShortDescr__c,PTDiagKey__c" + Environment.NewLine;

        foreach (var d in ptdiags)
        {
            var sfd = Map_BTSQL_Patientdiag_To_SF_Patientdiag(d);
            sfd.Patient__c = sfpatients.FirstOrDefault(c => c.PatientKey__c == d.Ptkey.ToString())?.Id;

            res += string.Join(",", sfd.Patient__c, sfd.DiagKey__c, sfd.NickName__c
                                , sfd.Sequence__c, sfd.ShortDescr__c.Replace(",",""), sfd.PTDiagKey__c);
            if (ptdiags.Last() != d)
                res += Environment.NewLine;
        }

        return res;
    }

Method that creates a mapping for Ptdiag:

    public static SfPatientDiag__c Map_BTSQL_Patientdiag_To_SF_Patientdiag(Ptdiag d)
    {
        return new SfPatientDiag__c
        {
            DiagKey__c = d.Diagkey.ToString(),
            Diagnosis__r = new SfDiagnosis__c { Diagnosis_Key__c = d.Diagkey.ToString() },
            NickName__c = d.Nickname,
            Patient__r = new SfPatient__c { PatientKey__c = d.Ptkey.ToString() },
            Sequence__c = d.Sequence != null ? Convert.ToDouble(d.Sequence) : 0,
            ShortDescr__c = d.Shortdescr,
            PTDiagKey__c = d.Ptdiagkey.ToString()
        };

    }

Solution

In addition to the changes suggested in the post by @aepot, I made the following changes, which reduced the completion time significantly.

I used a dictionary instead of a list to look up the patient Id. Searching a large list of objects inside a for loop is what really slowed down the processing. Here’s what the code looks like now:

// Build the patient lookup once, outside the loop, so the dictionary
// isn't rebuilt from the 100k-record list on every batch iteration
var patientLookup = sfPatients.ToDictionary(p => p.PatientKey__c);

for (int i = 0; i < ptdiags.Count; i += BATCH_SIZE)
{
  var batchContents = SFToBTMapping.Bulk_PtDiag_Content(ptdiags.Skip(i).Take(BATCH_SIZE).ToList(), patientLookup);
  var batch = BulkUpsert(job.Id, batchContents);
}

public static string Bulk_PtDiag_Content(List<Ptdiag> ptdiags, Dictionary<string, SfPatient__c> sfpatients)
{
    var sb = new StringBuilder();
    sb.AppendLine("Patient__c,DiagKey__c,NickName__c,Sequence__c,ShortDescr__c,PTDiagKey__c");

    for (int i = 0; i < ptdiags.Count; i++)
    {
        Ptdiag d = ptdiags[i];
        SfPatientDiag__c sfd = Map_BTSQL_Patientdiag_To_SF_Patientdiag(d);

        // Single O(1) lookup via TryGetValue instead of GetValueOrDefault
        // followed by a second lookup through the indexer
        sfd.Patient__c = sfpatients.TryGetValue(d.Ptkey.ToString(), out var patient) ? patient.Id : "";

        sb.Append(string.Join(",", sfd.Patient__c, sfd.DiagKey__c, sfd.NickName__c,
                              sfd.Sequence__c, sfd.ShortDescr__c.Replace(",", ""), sfd.PTDiagKey__c));

        // Index comparison avoids the O(n) ptdiags.Last() call on every iteration
        if (i < ptdiags.Count - 1)
            sb.AppendLine();
    }

    return sb.ToString();
}
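
For context, here is a rough sketch (not from the original post; the `Patient` record and the data sizes are made up to mirror the scenario) of why the dictionary matters: `List.FirstOrDefault` scans elements one by one, so 200k diagnoses each scanning a 100k-entry list is on the order of 10^10 comparisons, while a `Dictionary` resolves each key in roughly constant time.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class LookupDemo
{
    // Hypothetical stand-in for SfPatient__c
    record Patient(string Key, string Id);

    static void Main()
    {
        // Data sized like the post's patient list (100k records)
        var patients = Enumerable.Range(0, 100_000)
            .Select(i => new Patient(i.ToString(), "Id" + i))
            .ToList();
        var byKey = patients.ToDictionary(p => p.Key);

        // 10k lookups, a small fraction of the 200k the post performs
        var keys = Enumerable.Range(0, 10_000)
            .Select(i => (i * 7 % 100_000).ToString())
            .ToList();

        var sw = Stopwatch.StartNew();
        foreach (var k in keys)
            _ = patients.FirstOrDefault(p => p.Key == k)?.Id; // O(n) scan per key
        Console.WriteLine($"List scan:  {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        foreach (var k in keys)
            _ = byKey.TryGetValue(k, out var p) ? p.Id : null; // O(1) hash lookup
        Console.WriteLine($"Dictionary: {sw.ElapsedMilliseconds} ms");
    }
}
```

On typical hardware the list-scan loop takes seconds while the dictionary loop finishes in a few milliseconds, which is consistent with the dramatic runtime drop described above.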
