Which approach is preferable string parsing?

Posted on

Problem

I want to find total word count in string. I have two methods as mentioned below.

public static int Wordcount(string strName)
{
    bool wordfound;
    int ctr = 0;

    for (int i = 0; i < strName.Length; )
    {
        wordfound = false;

        while ((i < strName.Length) && (strName[i] == ' '))
            i++;

        while ((i < strName.Length) && (strName[i] != ' '))
        {
            if (!wordfound)
            {
                ctr++;
                wordfound = true;
            }
            i++;
        }
    }

    return ctr;
}

public static int Wordcount1(string strName)
{
    string temp = null;
    int ctr = 0;

    for (int i = 0; i < strName.Length; i++)
    {
        if (strName[i] != ' ')
        {
            if (temp == null)
                ctr++;

            temp += strName[i];
        }
        else if (strName[i] == ' ')
            temp = null;
    }

    return ctr;
}

Can anyone please comment on better approach w.r.t. logic, performance and memory utilization?

Solution

Let’s analyze each approach:

public static int Wordcount1(string strName)
{
    string temp = null;
    int ctr = 0;

    for (int i = 0; i < strName.Length; i++)
    {
        if (strName[i] != ' ')
        {
            if (temp == null)
                ctr++;

            temp += strName[i];
        }
        else if (strName[i] == ' ')
            temp = null;
    }

    return ctr;
}

You are using temp string only as boolean flag (whether its null or not). So, you don’t need to store current word in this variable. Actually temp += strName[i] creates many strings in memory, because strings are immutable. So, its performance and memory consuming. Remove it.

Also you don’t need second if. If char is not white space, then it is white space. There is no third option. And improve naming – don’t use prefixes for variable names.

public static int Wordcount(string text)
{
    bool wordFound = false
    int count = 0;

    for (int i = 0; i < text.Length; i++)
    {
        if (text[i] != ' ' && !wordFound)
        {             
            wordFound = true;
            count++
            continue;
        }

        if (wordFound)
            wordFound = false;
    }

    return count;
}

Another approach:

public static int GetWordsCount(string strName)
{
    bool wordfound;
    int ctr = 0;

    for (int i = 0; i < strName.Length; )
    {
        wordfound = false;

        while ((i < strName.Length) && (strName[i] == ' '))
            i++;

        while ((i < strName.Length) && (strName[i] != ' '))
        {
            if (!wordfound)
            {
                ctr++;
                wordfound = true;
            }
            i++;
        }
    }

    return ctr;
}

For me its a little harder to understand, because you are incrementing loop variable in loop body on some conditions inside other loops. But you are not creating strings here as you do in your second approach. So it’s definitely better in terms of performance and memory usage, than you original second approach.

In second approach you can eliminate usage of boolean flag, and condition check in second while loop. Also with some comments this code becomes easier to understand:

public static int GetWordsCount(string text)
{        
    int count = 0;

    for (int i = 0; i < text.Length; )
    {   
        // process to next word 
        while ((i < text.Length) && (text[i] == ' '))
            i++;

        if (i < text.Length)
            count++;

        // loop till the word end
        while ((i < text.Length) && (text[i] != ' '))
            i++;            
    }

    return count;
}

But of course its better to use already existing functionality:

return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;

Its not that fast, and it creates array of words which uses memory. But it’s easy to understand for other developers. So, instead of doing premature optimization, I’d go with this approach and modified it only in case of performance problems.

Leave a Reply

Your email address will not be published. Required fields are marked *