Problem
Inspired by this question: Split a string into chunks of the same length
The code is designed to work on text elements rather than chars to avoid unicode problems.
public static class StringExtensions
{
public static IEnumerable<string> Partition(this string value, int chunkSize)
{
if (value == null)
{
throw new ArgumentNullException(nameof(value));
}
if (chunkSize < 1)
{
throw new ArgumentOutOfRangeException(nameof(chunkSize));
}
var sb = new StringBuilder(chunkSize);
var enumerator = StringInfo.GetTextElementEnumerator(value);
while (enumerator.MoveNext())
{
sb.Append(enumerator.GetTextElement());
for (var i = 0; i < chunkSize - 1; i++)
{
if (!enumerator.MoveNext())
{
break;
}
sb.Append(enumerator.GetTextElement());
}
yield return sb.ToString();
sb.Length = 0;
}
}
}
And a couple of unit tests
[TestMethod]
public void Partition_SplittingAnAsciiString_ShouldSplitTheStringIntoTheRequiredChunkSize()
{
string input = "123456";
string[] expected = { "123", "456" };
string[] actual = input.Partition(3).ToArray();
CollectionAssert.AreEqual(expected, actual);
}
[TestMethod]
public void Partition_SplittingAPartiallyDecomposedString_ShouldSplitTheStringIntoTheRequiredChunkSize()
{
string input = "ééeu0301";
string[] expected = { "é", "é", "eu0301" };
string[] actual = input.Partition(1).ToArray();
CollectionAssert.AreEqual(expected, actual);
}
I usually use Machine Specifications for unit tests so any tips on writing MSTest unit tests would be great especially naming conventions!
The method feels slightly too long to me but I couldn’t see a nice way of splitting it up.
Solution
Nice and clean implementation. I tested this with the tests (the new ones) of my implementation and they all passed (skipping the tests with chunkSize == 0).
The code could be sligthly more readable by having some vertical space to group related code, for instance after validating the input.
You could improve this a little bit by just having this
if (chunkSize == 1)
{
yield return value;
yield break;
}
after the validation, in this way you wouldn’t need to create a StringBuilder
nor having the enumerator
.
A slightly different approach could be to remove the for
loop. It makes the intent more clear (IMO) and removes the need to double check return value of enumerator.MoveNext()
.
Unfortunately this makes the code 2 lines (3 with the vertical spacing) longer
var sb = new StringBuilder(chunkSize);
var enumerator = StringInfo.GetTextElementEnumerator(value);
var counter = 0;
while (enumerator.MoveNext())
{
counter++;
sb.Append(enumerator.GetTextElement());
if (counter == chunkSize)
{
yield return sb.ToString();
sb.Length = 0;
counter = 0;
}
}
if (counter > 0)
{
yield return sb.ToString();
}
Regarding naming of unit tests, I usually (not in the posted question of mine) use the pattern
UnitOfWork_StateUnderTest_ExpectedBehavior
like shown in the accepted answer over here: unit test naming best practices
I think this type of problems are better solved with LINQ since there is little to no control-flow that can go wrong.
public static class StringExtensions
{
public static IEnumerable<string> Partition(this string value, int chunkSize)
{
if (value == null)
{
throw new ArgumentNullException(nameof(value));
}
if (chunkSize < 1)
{
throw new ArgumentOutOfRangeException(nameof(chunkSize));
}
var stringInfo = new StringInfo(value);
return Enumerable.Range(0, stringInfo.LengthInTextElements)
.GroupBy(index => index / chunkSize)
.Select(@group => stringInfo.SubstringByTextElements(@group.First(), @group.Count()));
}
}