Search datatable faster

Problem

I would like to make the search faster, as it takes a long time to retrieve the data that I require. Is there a faster method? I have written the code below to search through a data table and retrieve data, and it is currently fully functional. It's the nested for loops that concern me: is there another way of completing the tasks that the for loops do?

    DataTable sqlData = GetConfiguration();

    // var q = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5");

    var w = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5")
        .Select(data => data.Field<String>("QuestionStartText")).Distinct();

    List<String> queryResult = new List<String>();
    foreach (var item in w.ToArray<string>())
    {
        if (item != null)
        {
            String queryString = item;
            //queryResult.Clear();
            for (int i = 0; i < excelDataTable.Columns.Count; i++)
            {
                for (int k = 2; k < excelDataTable.Rows.Count; k++)
                {
                    String row = "";
                    bool check = excelDataTable.Rows[k][0].ToString().StartsWith(queryString);
                    if (check)
                    {
                        for (int j = 0; j < excelDataTable.Columns.Count; j++)
                        {
                            string value = excelDataTable.Rows[k][j].ToString() + ":";
                            row += value;
                        }
                        if (!queryResult.Contains(row))
                        {
                            queryResult.Add(row);
                        }
                    }
                }
            }
        }
    }

Solution

The most obvious problem is that when there is a large number of strings, string concatenation is inherently slow. Try using a StringBuilder instead, like this:

    DataTable sqlData = GetConfiguration();

    // var q = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5");

    var w = sqlData.AsEnumerable().Where(data => data.Field<String>("slideNo") == "5")
        .Select(data => data.Field<String>("QuestionStartText")).Distinct();

    List<String> queryResult = new List<String>();
    StringBuilder sb = new StringBuilder();
    foreach (var item in w.ToArray<string>())
    {
        if (item != null)
        {
            String queryString = item;
            //queryResult.Clear();
            for (int i = 0; i < excelDataTable.Columns.Count; i++)
            {
                for (int k = 2; k < excelDataTable.Rows.Count; k++)
                {
                    sb.Clear();
                    bool check = excelDataTable.Rows[k][0].ToString().StartsWith(queryString);
                    if (check)
                    {
                        for (int j = 0; j < excelDataTable.Columns.Count; j++)
                        {
                            sb.Append(excelDataTable.Rows[k][j].ToString());
                            sb.Append(':');
                        }
                        string row = sb.ToString();
                        if (!queryResult.Contains(row))
                        {
                            queryResult.Add(row);
                        }
                    }
                }
            }
        }
    }

Well, you can certainly make it easier to read by cleaning it up a little: get rid of the redundant outer (for i) loop, and substitute a string.Join plus Select for the inner (for j) loop's concatenation:

    List<String> queryResult = new List<String>();
    foreach (var queryString in w.ToArray<string>())
    {
        if (string.IsNullOrEmpty(queryString)) continue;
        for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
        {
            var excelRow = excelDataTable.Rows[rowIndex];
            if (!excelRow[0].ToString().StartsWith(queryString)) continue;
            // DataRow itself is not enumerable, so go through ItemArray
            var row = string.Join(":", excelRow.ItemArray.Select(c => c.ToString()).ToArray());
            if (!queryResult.Contains(row)) queryResult.Add(row);
        }
    }

Now that we can see we're looping over all the rows once for each queryString, we can switch to traversing the rows once and picking up any matching queryStrings along the way. This effectively flips the order of iteration (I'm assuming there are more rows than query strings). To get rid of the nested loops altogether, we'll use Any to find matches. We'll also drop the queryResult.Contains check on each iteration in favor of a single Distinct call at the end, which keeps us from iterating queryResult multiple times.

    var queryStrings = w.ToArray<string>();
    List<String> queryResult = new List<String>();
    for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
    {
        var row = excelDataTable.Rows[rowIndex];
        var rowStart = row[0].ToString();
        if (!queryStrings.Any(q => rowStart.StartsWith(q))) continue;
        var s = string.Join(":", row.ItemArray.Select(c => c.ToString()).ToArray());
        queryResult.Add(s);
    }
    queryResult = queryResult.Distinct().ToList();

That's not going to do a whole lot for performance (it basically gets rid of one iteration over the rows), but everything else I can see is so context specific that it may end up performing worse, assuming your data looks like I expect it does (< 10 queryStrings, < 50 queryResults and thousands of rows).

A couple of additional thoughts you can try, though, depending on your data:

  • If queryStrings and/or queryResults are large, a HashSet<string> may be a better choice. You'll take a hit on insert, but exact-match lookups (Contains) will be faster; see the first sketch after this list.
  • The queryResult.Contains call can effectively invalidate all of our work in looking through queryStrings and concatenating row. Depending on how many duplicate results you expect versus how many queryString matches, swapping it with the queryStrings.Any check could help.
  • You could partition the rows and then process them in parallel; see the second sketch after this list. This may help depending on the number of rows and the CPUs available.
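
For the HashSet idea, here is a minimal sketch of how it might slot into the loop above (my assumption, reusing w and excelDataTable from before): HashSet<string>.Add returns false for a duplicate instead of inserting it again, so it replaces both the Contains check and the trailing Distinct.

    var queryStrings = w.Where(s => !string.IsNullOrEmpty(s)).ToArray();
    var queryResult = new HashSet<string>();
    for (int rowIndex = 2; rowIndex < excelDataTable.Rows.Count; rowIndex++)
    {
        var row = excelDataTable.Rows[rowIndex];
        var rowStart = row[0].ToString();
        if (!queryStrings.Any(q => rowStart.StartsWith(q))) continue;
        // Add is a no-op on duplicates, so no Contains check or Distinct call is needed
        queryResult.Add(string.Join(":", row.ItemArray.Select(c => c.ToString()).ToArray()));
    }

Note that the StartsWith test still scans queryStrings linearly; hashing only helps the exact-match deduplication on queryResult.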
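
For the parallel idea, PLINQ will handle the partitioning for you. A sketch, under the assumption that nothing writes to excelDataTable while it runs (DataTable is only thread-safe for reads):

    var results = excelDataTable.AsEnumerable()
        .Skip(2)                // same as starting at row index 2
        .AsParallel()
        .Where(r => queryStrings.Any(q => r[0].ToString().StartsWith(q)))
        .Select(r => string.Join(":", r.ItemArray.Select(c => c.ToString()).ToArray()))
        .Distinct()
        .ToList();

Whether this wins depends on the row count; for a few thousand rows the threading overhead may eat the gain, so measure before committing to it.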

From there, any further optimization would depend heavily on your data, and would need example data of the correct relative sizes to profile and test against.

Guessing at intent a little here, but the version below should be simpler and more performant (I don't see the point of the outer loop over the columns). That said, where is your bottleneck? I'd almost guess it's in the database operation GetConfiguration() rather than in any of this code.

    var sqlData = GetConfiguration();

    ////var q = sqlData.AsEnumerable().Where(data => data.Field<string>("slideNo") == "5");
    var w = sqlData
        .AsEnumerable()
        .Where(data => data.Field<string>("slideNo") == "5")
        .Select(data => data.Field<string>("QuestionStartText")).Distinct();
    var queryResult = new List<string>();

    foreach (var queryString in w.Where(item => item != null))
    {
        ////queryResult.Clear();
        for (var k = 2; k < excelDataTable.Rows.Count; k++)
        {
            if (!excelDataTable.Rows[k][0].ToString().StartsWith(queryString))
            {
                continue;
            }

            var rowSb = new StringBuilder();

            for (var j = 0; j < excelDataTable.Columns.Count; j++)
            {
                rowSb.Append(excelDataTable.Rows[k][j]).Append(':');
            }

            var row = rowSb.ToString();

            if (!queryResult.Contains(row))
            {
                queryResult.Add(row);
            }
        }
    }
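
If you want to confirm where the time actually goes, a quick Stopwatch around each stage will tell you. A minimal sketch (the Console.WriteLine calls are just stand-ins for whatever logging you have):

    var sw = System.Diagnostics.Stopwatch.StartNew();
    DataTable sqlData = GetConfiguration();
    Console.WriteLine("GetConfiguration: {0} ms", sw.ElapsedMilliseconds);

    sw.Restart();
    // ... run the search code above against sqlData and excelDataTable ...
    Console.WriteLine("Search: {0} ms", sw.ElapsedMilliseconds);

If GetConfiguration() dominates, no amount of tuning these loops will help.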
