Linq query performance improvements


Problem

Now that my LINQ query is functional, I am looking at all the “Any” calls and wondering whether they should use a different method. I also have data conversions going on.

Does anything jump out as being a performance issue? What is recommended to make this more performant? (Yes, I need all the &&.)

etchVector =
    from vio in list
    where excelViolations.Any(excelVio => vio.VioID.Formatted.Equals(excelVio.VioID.ToString()))
    && excelViolations.Any(excelVio => vio.RuleType.Formatted.Equals(excelVio.RuleType))
    && excelViolations.Any(excelVio => vio.VioType.Formatted.Equals(excelVio.VioType)) 
    && excelViolations.Any(excelVio => vio.EtchVects.Any(x => x.XCoordinate.Equals(excelVio.XCoordinate)))
    && excelViolations.Any(excelVio => vio.EtchVects.Any(y => y.YCoordinate.Equals(excelVio.YCoordinate)))
    select new EtchVectorShapes
    {
        VioID = Convert.ToInt32(vio.EtchVects.Select(x => x.VioID)),
        ObjectType = vio.EtchVects.Select(x => x.ObjectType).ToString(),
        XCoordinate = Convert.ToDouble(vio.EtchVects.Select(x => x.XCoordinate)),
        YCoordinate = Convert.ToDouble(vio.EtchVects.Select(x => x.YCoordinate)),
        Layer = vio.EtchVects.Select(x => x.Layer).ToString()      
    };

Solution

This does not address optimization, but the errors you describe seem to come from how you are getting the data.

Based on your lists within a list and the error message you are getting, try something like this:

etchVector = list
    .Where(vio => excelViolations.Any(currVio =>
        vio.VioID.Formatted.Equals(currVio.VioID.ToString())
        && vio.RuleType.Formatted.Equals(currVio.RuleType)
        && vio.VioType.Formatted.Equals(currVio.VioType)
        && vio.Bows.Any(bw => bw.XCoordinate.Equals(currVio.XCoordinate))
        && vio.Bows.Any(bw1 => bw1.YCoordinate.Equals(currVio.YCoordinate))))
    .SelectMany(vi => vi.EtchVects)
    .ToList();
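A minimal runnable sketch of this filter-then-flatten approach, under stated assumptions: anonymous types and made-up values stand in for the question's classes (their definitions are not shown), the Formatted values and RuleType/VioType are assumed to be strings, coordinates are assumed to be double, and the coordinate checks run against EtchVects, the collection in the question's model, where the answer's code uses Bows. Note that this version puts all five conditions inside one Any, so a single Excel row must match every field; the question's version lets each condition be satisfied by a different row.

```csharp
using System;
using System.Linq;

// Hypothetical violations; anonymous types stand in for the question's model.
var list = new[]
{
    new
    {
        VioID = new { Formatted = "101" },
        RuleType = new { Formatted = "Spacing" },
        VioType = new { Formatted = "Etch" },
        EtchVects = new[]
        {
            new { VioID = "101", XCoordinate = 1.5, YCoordinate = 2.5 },
            new { VioID = "101", XCoordinate = 3.0, YCoordinate = 4.0 }
        }
    },
    new
    {
        VioID = new { Formatted = "202" },
        RuleType = new { Formatted = "Width" },
        VioType = new { Formatted = "Etch" },
        EtchVects = new[]
        {
            new { VioID = "202", XCoordinate = 9.0, YCoordinate = 9.0 }
        }
    }
};

// Hypothetical Excel rows. Row 1 matches violation 101 on every field (its X
// matches one vector, its Y another); row 2 has the wrong RuleType for 202.
var excelViolations = new[]
{
    new { VioID = 101, RuleType = "Spacing", VioType = "Etch", XCoordinate = 3.0, YCoordinate = 2.5 },
    new { VioID = 202, RuleType = "Spacing", VioType = "Etch", XCoordinate = 9.0, YCoordinate = 9.0 }
};

// A single Any per vio: the same Excel row must satisfy all five conditions.
var etchVector = list
    .Where(vio => excelViolations.Any(currVio =>
        vio.VioID.Formatted.Equals(currVio.VioID.ToString())
        && vio.RuleType.Formatted.Equals(currVio.RuleType)
        && vio.VioType.Formatted.Equals(currVio.VioType)
        && vio.EtchVects.Any(bw => bw.XCoordinate.Equals(currVio.XCoordinate))
        && vio.EtchVects.Any(bw => bw.YCoordinate.Equals(currVio.YCoordinate))))
    .SelectMany(vi => vi.EtchVects) // flatten the matching vios' vectors into one list
    .ToList();

Console.WriteLine(etchVector.Count); // 2
```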

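You can definitely optimize this query. If list has M items and excelViolations has N items, the query can read up to 5 * M * N elements of excelViolations: for each of the M items it makes five Any calls, and each call may scan all N elements.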

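It is possible to reduce this to M * N reads by enumerating excelViolations only once per item in list. You can do this with a subquery on excelViolations that evaluates all five conditions for each excelVio, then ORs each of the five conditions across all excelVio instances. Keep in mind that Any stops at its first match and && skips the remaining calls once a check fails, so 5 * M * N is the worst case, while the single pass always reads exactly M * N.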

If that is unclear, this walkthrough may help:

from vio in list
// Perform all five checks for each excelVio
let checks = excelViolations
    .Select(excelVio => new
    {
        HasVioID = vio.VioID.Formatted.Equals(excelVio.VioID.ToString()),
        HasRuleType = vio.RuleType.Formatted.Equals(excelVio.RuleType),
        HasVioType = vio.VioType.Formatted.Equals(excelVio.VioType),
        HasXCoordinate = vio.EtchVects.Any(x => x.XCoordinate.Equals(excelVio.XCoordinate)),
        HasYCoordinate = vio.EtchVects.Any(y => y.YCoordinate.Equals(excelVio.YCoordinate))
    })
    // From left to right, OR each of the five results (equivalent to .Any).
    // The all-false seed avoids the InvalidOperationException that Aggregate
    // throws when excelViolations is empty.
    .Aggregate(
        new { HasVioID = false, HasRuleType = false, HasVioType = false, HasXCoordinate = false, HasYCoordinate = false },
        (left, right) => new
        {
            HasVioID = left.HasVioID || right.HasVioID,
            HasRuleType = left.HasRuleType || right.HasRuleType,
            HasVioType = left.HasVioType || right.HasVioType,
            HasXCoordinate = left.HasXCoordinate || right.HasXCoordinate,
            HasYCoordinate = left.HasYCoordinate || right.HasYCoordinate
        })
// Filter to only those that pass every check
where checks.HasVioID
    && checks.HasRuleType
    && checks.HasVioType
    && checks.HasXCoordinate
    && checks.HasYCoordinate
// Same projection
select new EtchVectorShapes
{
    VioID = Convert.ToInt32(vio.EtchVects.Select(x => x.VioID)),
    ObjectType = vio.EtchVects.Select(x => x.ObjectType).ToString(),
    XCoordinate = Convert.ToDouble(vio.EtchVects.Select(x => x.XCoordinate)),
    YCoordinate = Convert.ToDouble(vio.EtchVects.Select(x => x.YCoordinate)),
    Layer = vio.EtchVects.Select(x => x.Layer).ToString()
};
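To check the 5 * M * N versus M * N claim, here is a minimal sketch that counts how many elements of excelViolations each query shape reads. It is simplified under stated assumptions: the fields are flattened into plain anonymous-type properties (the Formatted wrappers are dropped), the data is made up, Counted is a hypothetical helper, and the only matching Excel row is the last one, so every Any call has to scan all N rows. The sketch passes an all-false seed to Aggregate so that an empty excelViolations yields all-false flags rather than an exception.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int visits = 0;

// Hypothetical helper: passes a sequence through unchanged while counting
// every element it yields, i.e. every read of excelViolations.
IEnumerable<T> Counted<T>(IEnumerable<T> source)
{
    foreach (var item in source)
    {
        visits++;
        yield return item;
    }
}

// M = 2 violations (simplified, made-up fields).
var list = new[]
{
    new { VioID = "7", RuleType = "Width", VioType = "Etch", Xs = new[] { 1.0 }, Ys = new[] { 2.0 } },
    new { VioID = "7", RuleType = "Width", VioType = "Etch", Xs = new[] { 1.0 }, Ys = new[] { 2.0 } }
};

// N = 3 Excel rows; only the last one matches, forcing full scans.
var excel = new[]
{
    new { VioID = 1, RuleType = "a", VioType = "a", X = 0.0, Y = 0.0 },
    new { VioID = 2, RuleType = "b", VioType = "b", X = 0.0, Y = 0.0 },
    new { VioID = 7, RuleType = "Width", VioType = "Etch", X = 1.0, Y = 2.0 }
};
var excelViolations = Counted(excel);

// Shape 1: five separate Any calls per vio.
var fiveAny = (from vio in list
               where excelViolations.Any(e => vio.VioID == e.VioID.ToString())
                  && excelViolations.Any(e => vio.RuleType == e.RuleType)
                  && excelViolations.Any(e => vio.VioType == e.VioType)
                  && excelViolations.Any(e => vio.Xs.Any(x => x == e.X))
                  && excelViolations.Any(e => vio.Ys.Any(y => y == e.Y))
               select vio).ToList();
int fiveAnyVisits = visits;

// Shape 2: one pass per vio, OR-ing the five flags with a seeded Aggregate.
visits = 0;
var allFalse = new { HasVioID = false, HasRuleType = false, HasVioType = false, HasX = false, HasY = false };
var singlePass = (from vio in list
                  let checks = excelViolations
                      .Select(e => new
                      {
                          HasVioID = vio.VioID == e.VioID.ToString(),
                          HasRuleType = vio.RuleType == e.RuleType,
                          HasVioType = vio.VioType == e.VioType,
                          HasX = vio.Xs.Any(x => x == e.X),
                          HasY = vio.Ys.Any(y => y == e.Y)
                      })
                      .Aggregate(allFalse, (left, right) => new
                      {
                          HasVioID = left.HasVioID || right.HasVioID,
                          HasRuleType = left.HasRuleType || right.HasRuleType,
                          HasVioType = left.HasVioType || right.HasVioType,
                          HasX = left.HasX || right.HasX,
                          HasY = left.HasY || right.HasY
                      })
                  where checks.HasVioID && checks.HasRuleType && checks.HasVioType
                        && checks.HasX && checks.HasY
                  select vio).ToList();
int singlePassVisits = visits;

Console.WriteLine($"five Any calls: {fiveAnyVisits} visits, single pass: {singlePassVisits} visits");
// five Any calls: 30 visits, single pass: 6 visits
```

With M = 2 and N = 3, both shapes keep the same two violations; only the number of reads differs.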

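The underlying problem is that vio.EtchVects is an IEnumerable<T>, and calling Select on an IEnumerable<T> returns another sequence (an IEnumerable<TResult>), not a single value. Your Convert.ToX calls expect scalar values: passing them a sequence throws an InvalidCastException at runtime, and calling ToString() on a sequence returns its type name rather than its contents.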
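A small sketch of that failure, assuming (hypothetically) that each vector stores VioID and Layer as strings; anonymous types stand in for the real vector class. It shows that Convert.ToInt32 on the result of Select throws, that ToString() on it yields a type name, and that converting a single element or joining the projected values gives real data.

```csharp
using System;
using System.Linq;

// Hypothetical vectors; the real EtchVects element type is not shown in the question.
var etchVects = new[]
{
    new { VioID = "101", Layer = "M1" },
    new { VioID = "101", Layer = "M2" }
};

// Select produces a lazy sequence of VioIDs, not a single VioID.
var ids = etchVects.Select(x => x.VioID);

// Convert.ToInt32(object) casts its argument to IConvertible; a LINQ
// iterator does not implement it, so the call throws at runtime.
bool threw = false;
try
{
    Convert.ToInt32(ids);
}
catch (InvalidCastException)
{
    threw = true;
}

// ToString() on a sequence falls back to the iterator's type name, not the data.
string layerText = etchVects.Select(x => x.Layer).ToString();

// Converting one element, or joining the projected values, gives real data.
int firstId = Convert.ToInt32(etchVects.First().VioID);
string allLayers = string.Join(",", etchVects.Select(x => x.Layer));

Console.WriteLine(threw);     // True
Console.WriteLine(layerText); // an internal type name such as System.Linq.Enumerable+...
Console.WriteLine(firstId);   // 101
Console.WriteLine(allLayers); // M1,M2
```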

For simplicity, I will build from @Bryan Watts’ answer, starting only with the select statement.

If you want only the EtchVectors as a flat list, you can do the following:

from vect in vio.EtchVects
select new EtchVectorShapes
{
    VioID = Convert.ToInt32(vect.VioID),
    ObjectType = vect.ObjectType.ToString(),
    XCoordinate = Convert.ToDouble(vect.XCoordinate),
    YCoordinate = Convert.ToDouble(vect.YCoordinate),
    Layer = vect.Layer.ToString()
};
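A runnable sketch of the flattening query, assuming (hypothetically) that VioID is a string, the coordinates are double, and ObjectType and Layer are strings. An anonymous type stands in for EtchVectorShapes, and `matched` is made-up data playing the role of the violations that survived the where clause. The second from clause compiles to SelectMany, so each vector becomes its own row.

```csharp
using System;
using System.Linq;

// Hypothetical violations that already passed the filter.
var matched = new[]
{
    new
    {
        EtchVects = new[]
        {
            new { VioID = "101", ObjectType = "Poly", XCoordinate = 1.5, YCoordinate = 2.5, Layer = "M1" },
            new { VioID = "101", ObjectType = "Poly", XCoordinate = 3.0, YCoordinate = 4.0, Layer = "M1" }
        }
    },
    new
    {
        EtchVects = new[]
        {
            new { VioID = "202", ObjectType = "Path", XCoordinate = 9.0, YCoordinate = 9.0, Layer = "M2" }
        }
    }
};

// The second from clause flattens: one output row per vector, across all violations.
var flat =
    (from vio in matched
     from vect in vio.EtchVects
     select new
     {
         VioID = Convert.ToInt32(vect.VioID), // converts one element, not a sequence
         ObjectType = vect.ObjectType.ToString(),
         XCoordinate = Convert.ToDouble(vect.XCoordinate),
         YCoordinate = Convert.ToDouble(vect.YCoordinate),
         Layer = vect.Layer.ToString()
     }).ToList();

foreach (var shape in flat)
{
    Console.WriteLine($"{shape.VioID} {shape.ObjectType} {shape.Layer}");
}
```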

If you want them to retain the same organizational hierarchy they started with, you can do something similar to the following:

select new EtchVectorShapes
{
    Shapes = from vect in vio.EtchVects
             select new EtchVectorShape
             {
                 VioID = Convert.ToInt32(vect.VioID),
                 ObjectType = vect.ObjectType.ToString(),
                 XCoordinate = Convert.ToDouble(vect.XCoordinate),
                 YCoordinate = Convert.ToDouble(vect.YCoordinate),
                 Layer = vect.Layer.ToString()
             }
};
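A runnable sketch of the hierarchical version under the same hypothetical assumptions: anonymous types stand in for EtchVectorShapes and EtchVectorShape, and the sample data is made up. The inner query is materialized with ToList() so each Shapes list is built once; left as a bare query, it would be re-run every time Shapes is enumerated.

```csharp
using System;
using System.Linq;

// Hypothetical violations that already passed the filter.
var matched = new[]
{
    new
    {
        EtchVects = new[]
        {
            new { VioID = "101", ObjectType = "Poly", XCoordinate = 1.5, YCoordinate = 2.5, Layer = "M1" },
            new { VioID = "101", ObjectType = "Poly", XCoordinate = 3.0, YCoordinate = 4.0, Layer = "M1" }
        }
    },
    new
    {
        EtchVects = new[]
        {
            new { VioID = "202", ObjectType = "Path", XCoordinate = 9.0, YCoordinate = 9.0, Layer = "M2" }
        }
    }
};

// One output object per violation, each holding its own list of shapes.
var grouped =
    (from vio in matched
     select new
     {
         Shapes = (from vect in vio.EtchVects
                   select new
                   {
                       VioID = Convert.ToInt32(vect.VioID),
                       ObjectType = vect.ObjectType.ToString(),
                       XCoordinate = Convert.ToDouble(vect.XCoordinate),
                       YCoordinate = Convert.ToDouble(vect.YCoordinate),
                       Layer = vect.Layer.ToString()
                   }).ToList()
     }).ToList();

foreach (var vioShapes in grouped)
{
    Console.WriteLine($"{vioShapes.Shapes.Count} shape(s), first VioID {vioShapes.Shapes[0].VioID}");
}
```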

If you only want one of the vectors from each vio, you could use one of LINQ's element operators (e.g., First or Last) or one of its aggregate functions (e.g., Max or Sum):

let vect = vio.EtchVects.First()
select new EtchVectorShapes
{
    VioID = Convert.ToInt32(vect.VioID),
    ObjectType = vect.ObjectType.ToString(),
    XCoordinate = Convert.ToDouble(vect.XCoordinate),
    YCoordinate = Convert.ToDouble(vect.YCoordinate),
    Layer = vect.Layer.ToString()
};
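A runnable sketch of the one-vector-per-vio option, again with anonymous types and made-up data standing in for the real classes. It takes the first vector with First() and, to illustrate an aggregate, the largest X with Max(). Both First() and Max() throw InvalidOperationException when EtchVects is empty; FirstOrDefault() returns null instead, which the projection would then need to handle.

```csharp
using System;
using System.Linq;

// Hypothetical violations that already passed the filter.
var matched = new[]
{
    new
    {
        EtchVects = new[]
        {
            new { VioID = "101", ObjectType = "Poly", XCoordinate = 1.5, YCoordinate = 2.5, Layer = "M1" },
            new { VioID = "101", ObjectType = "Poly", XCoordinate = 3.0, YCoordinate = 4.0, Layer = "M1" }
        }
    },
    new
    {
        EtchVects = new[]
        {
            new { VioID = "202", ObjectType = "Path", XCoordinate = 9.0, YCoordinate = 9.0, Layer = "M2" }
        }
    }
};

// One row per violation: pick a single vector, plus one aggregate over all of them.
var perVio =
    (from vio in matched
     let vect = vio.EtchVects.First() // throws if EtchVects is empty
     select new
     {
         VioID = Convert.ToInt32(vect.VioID),
         ObjectType = vect.ObjectType.ToString(),
         XCoordinate = Convert.ToDouble(vect.XCoordinate),
         YCoordinate = Convert.ToDouble(vect.YCoordinate),
         Layer = vect.Layer.ToString(),
         MaxX = vio.EtchVects.Max(v => v.XCoordinate) // aggregate alternative
     }).ToList();

foreach (var row in perVio)
{
    Console.WriteLine($"{row.VioID} {row.Layer}");
}
```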
