Function for encoding strings in-place

Posted on

Problem

I recently wrote a function that replaces every white space with ‘%20’, just for fun (and sharping my coding skills). The input string is terminated with extra white spaces, with the length that the string should have after encoding takes place. For example:

input = "Not encoded  " //2 white spaces in the end
output = "Not%20encoded"

Although I was able to implement such encoding in-place, I was still not pleased with the result, specially due to the inner loop that shifts the whole string to the right everytime a white space is found. Here is my solution, written in C#:

    public string encode(string str)
    {
        var array = str.ToCharArray();
        for (int i = 0; i < array.Length; i++) {
            if (array[i] != ' ') continue;

            //switch every char >> 2, from the end of the string to i
            for (int j = array.Length - 1; j > i; j--) {
                array[j] = array[j - 2];                    
            }

            array[i++] = '%';
            array[i++] = '2';
            array[i] = '0';
        }
        return new String(array);
    }

I’m looking for a solution that operates in O(n), possibly performing a single traversal on the input string. Additional feedback on how to improve performance/readability of my function are always very appreciated 🙂

Solution

As strings are immutable you can’t really avoid intermediate store of the data as it’s being encoded.

As you already know the length of the final string you can avoid the shuffling:

    int spaceCount = 0;
    for (int i = str.Length - 1; str[i] == ' '; i--)
    {
        spaceCount++;
    }
    if (spaceCount == 0) { return str; }

    var array = new char[str.Length];
    int idx = 0;
    for (int i = 0; i < str.Length - spaceCount; i++)
    {
        var current = str[i];
        if (current  == ' ')
        {
            array[idx++] = '%';
            array[idx++] = '2';
            array[idx++] = '0';
        }
        else 
        {
             array[idx++] = current ;
        }
    }

    var encoded = new string(array);

Space required is 2 * O(n) (array and final string) and time is 2 * O(n) (encoding string into array, building final string)

Your solution:

  • copies the whole string twice (1. string.ToCharArray() 2. new string())
  • needs n bytes of extra storage (apart from the input and output; n is the size of the output)
  • potentially moves the characters in the string a lot (O(n2))
  • requires the input to already have free space for the expansion

Solution using StringBuilder:

  • copies the whole string twice (1. calls to Append() 2. StringBuilder.ToString())
  • needs about n bytes of extra storage
  • doesn’t need to move characters at all
  • doesn’t require the input to be in any special format

Based on this, StringBuilder is clearly the better solution.

But there is even better solution: just use string.Replace(). That’s likely going to be quite efficient and it’s much shorter than any other of the proposed alternatives.

Here’s how I would do it (sans assertions/error checks):

private static string Encode(string input)
{
    var output = new char[input.Length];
    var i = 0;
    var o = 0;

    while (o < output.Length)
    {
        if (input[i] != ' ')
        {
            output[o++] = input[i++];
        }
        else
        {
            output[o++] = '%';
            output[o++] = '2';
            output[o++] = '0';
            ++i;
        }
    }

    return new string(output);
}

It’s O(n) in both time and memory.

Using a StringBuilder doesn’t gain you much in this instance, because you know exactly how large your output will be (the same length as your input). It’s just a wrapper around a char[] anyway, so you’d end up with slightly slower code without any benefit.

Leave a Reply

Your email address will not be published. Required fields are marked *