Partitioning a string into chunks

Posted on

Problem

Inspired by this question: Split a string into chunks of the same length

The code is designed to work on text elements rather than chars to avoid unicode problems.

public static class StringExtensions
{
    public static IEnumerable<string> Partition(this string value, int chunkSize)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }
        if (chunkSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }
        var sb = new StringBuilder(chunkSize);
        var enumerator = StringInfo.GetTextElementEnumerator(value);
        while (enumerator.MoveNext())
        {
            sb.Append(enumerator.GetTextElement());
            for (var i = 0; i < chunkSize - 1; i++)
            {
                if (!enumerator.MoveNext())
                {
                    break;
                }
                sb.Append(enumerator.GetTextElement());
            }
            yield return sb.ToString();
            sb.Length = 0;
        }
    }
}

And a couple of unit tests

[TestMethod]
public void Partition_SplittingAnAsciiString_ShouldSplitTheStringIntoTheRequiredChunkSize()
{
    string input = "123456";
    string[] expected = { "123", "456" };
    string[] actual = input.Partition(3).ToArray();

    CollectionAssert.AreEqual(expected, actual);
}

[TestMethod]
public void Partition_SplittingAPartiallyDecomposedString_ShouldSplitTheStringIntoTheRequiredChunkSize()
{
    string input = "ééeu0301";
    string[] expected = { "é", "é", "eu0301" };
    string[] actual = input.Partition(1).ToArray();

    CollectionAssert.AreEqual(expected, actual);
}

I usually use Machine Specifications for unit tests so any tips on writing MSTest unit tests would be great especially naming conventions!

The method feels slightly too long to me but I couldn’t see a nice way of splitting it up.

Solution

Nice and clean implementation. I tested this with the tests (the new ones) of my implementation and they all passed (skipping the tests with chunkSize == 0).

The code could be sligthly more readable by having some vertical space to group related code, for instance after validating the input.

You could improve this a little bit by just having this

if (chunkSize == 1)
{
    yield return value;
    yield break;
}  

after the validation, in this way you wouldn’t need to create a StringBuilder nor having the enumerator.


A slightly different approach could be to remove the for loop. It makes the intent more clear (IMO) and removes the need to double check return value of enumerator.MoveNext().

Unfortunately this makes the code 2 lines (3 with the vertical spacing) longer

        var sb = new StringBuilder(chunkSize);
        var enumerator = StringInfo.GetTextElementEnumerator(value);
        var counter = 0;
        while (enumerator.MoveNext())
        {
            counter++;
            sb.Append(enumerator.GetTextElement());
            if (counter == chunkSize)
            {
                yield return sb.ToString();

                sb.Length = 0;
                counter = 0;
            }
        }
        if (counter > 0)
        {
            yield return sb.ToString();
        }

Regarding naming of unit tests, I usually (not in the posted question of mine) use the pattern

UnitOfWork_StateUnderTest_ExpectedBehavior

like shown in the accepted answer over here: unit test naming best practices

I think this type of problems are better solved with LINQ since there is little to no control-flow that can go wrong.

public static class StringExtensions
{
    public static IEnumerable<string> Partition(this string value, int chunkSize)
    {
        if (value == null)
        {
            throw new ArgumentNullException(nameof(value));
        }

        if (chunkSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        var stringInfo = new StringInfo(value);

        return Enumerable.Range(0, stringInfo.LengthInTextElements)
            .GroupBy(index => index / chunkSize)
            .Select(@group => stringInfo.SubstringByTextElements(@group.First(), @group.Count()));          
     }      
 }

Leave a Reply

Your email address will not be published. Required fields are marked *