Count words that are longer than a set Length

Posted on

Problem

This is working, but I want to know if is this a place where IEnumerable can be used?

public static int LongerThanStingCompair(string s, int n)
    {

        int counter = 0;
        // splits based on nextLine break
        var sCount = s.Split('n');
        // Cast to list
        sCount.ToList();
        foreach (var item in sCount)
        {
            //checks to see if words are a certain charLength
            if (item.Length > n)
            {
                counter++;
            }
        }

        return counter;
    }

Test class gives it the input like

Lion, Ape, Tiger, Zebra and 4

So it would return 2 since there are two words with more than 4 letters

Solution

Signature:

public static int LongerThanStingCompair(string s, int n)

Compair is misspelled, as is Sting – they should be Compare and String, respectively. Although, a better name might be CountLinesLongerThan.

s and n aren’t great names for parameters.

On to your code:

// splits based on nextLine break
var sCount = s.Split('n');

The comment is unnecessary and slightly wrong. The code splits on all new line characters.

// Cast to list
sCount.ToList();

This comment is also incorrect – that doesn’t simply cast it creates a new list from an array. It’s unnecessary because you only need to enumerate the lines – you don’t need the functionality of the list.

Here’s a Linq based alternate for you:

public static int CountLinesLongerThan(string source, int characterLimit)
{
    if (source == null) 
    {
        throw new ArgumentNullException("source");
    }
    return source.Split('n')
        .Count(line => line.Length > characterLimit);
}

Since this is tagged with performance:

var sCount = s.Split('n');

If you want performance in .net you want to avoid allocations. If the string is millions of lines this call will allocate an array with millions of strings. Allocating is relatively cheap but cleaning up is not. The garbage collector will have to work hard to clean up after each call to your method.

sCount is a poor name should be rows.

sCount.ToList();

This is another disaster for many reasons, it copies the array of strings to a list that is then not used. You definitely don’t want to allocate a list here for performance reasons and you don’t want list for semantic reasons.

You want a List<T> when you are adding and or removing items from it.
In release build this will be optimized away so it will have no cost but it is very smelly.

For performance you want something like this:

private static int CountRowsLongerThan(string text, int min)
{
    var counter = 0;
    var index = 0;
    while (index < text.Length)
    {
        if (text[index] == 'n') // using 'n' although Environment.NewLine is probably what you want.
        {
            index++;
        }
        var rowLength = GetRowLength(text, index);
        if (rowLength > min)
        {
            counter++;
        }

        index += rowLength;
    }

    return counter;
}

private static int GetRowLength(string text, int index)
{
    var result = 0;
    while (index < text.Length && text[index] != 'n')  // again using 'n' although Environment.NewLine is probably what you want.
    {
        index++;
        result++;
    }
    return result;
}

As you can see the optimized code reads very poorly so only do this kind of optimization if it solves a real problem. The profiler will tell you if it is a problem although profiling allocations is nontrivial since the big cost comes when cleaning up.

This kind of code is nontrivial and requires unit tests. Using LINQ makes it trivial enough to not need to be tested.

General remarks
You probably want to split on Environment.NewLine
Like so:

var count = yourText.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
                    .Count(x => x.Length > 4);

Using LINQ I would do it inline and not wrap it in a method since it is a trivial call.

Leave a Reply

Your email address will not be published. Required fields are marked *