Problem
I’ve got a fairly simple task here, but I think it’s worth trying to perfect the implementation, because I can probably learn a thing or two about using .NET’s enumerables correctly in the process.
I have two “lists” of strings, let’s call them values
and separators
. The former is one element longer than the latter, and I want to interleave them into a single string (so I want to riffle the separators in between the values). Example:
var values = new []{ "abc", "def", "ghi" };
var separators = new []{ "123", "456" };
// expected result: "abc123def456ghi"
I need this in a few places, and in my actual code these might be List
s or other enumerables, so I figured I’d add an extension method to IEnumerable<string>
, which I call on the values and pass the separators as an argument. Here’s what I’ve got:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Retina.Extensions
{
public static class EnumerableExtension
{
public static string Riffle(this IEnumerable<string> values, IEnumerable<string> separators)
{
if (values.Count() != separators.Count() + 1)
throw new Exception("Riffled enumerable needs to be one shorter than this enumerable.");
var builder = new StringBuilder();
var zipped = values.Zip(separators, (val, sep) => new { val, sep });
foreach (var pair in zipped)
{
builder.Append(pair.val);
builder.Append(pair.sep);
}
builder.Append(values.Last());
return builder.ToString();
}
}
}
As I understand it, IEnumerable
is generally a lazy list, so both Count()
and Last()
potentially incur an additional iteration through the entire list, which seems unnecessary. Also, the intermediate zipped
enumerable seems awkward for something as simple as alternatingly iterating through two lists.
Is there a more idiomatic way to implement this and make better use of .NET’s collections/enumerables? Is IEnumerable
even the correct choice here?
Solution
You’re right about enumerating both parameters multiple times. Each call to Count
or Last
will execute the enumerator. This might of course become very inefficient, especially if your collections have a lot of items.
So what could you do instead? Well, one option would be to replace IEnumerable
with the next closest type which is the ICollection
. This would guarantee that both collections are already materialized. This however is rarely a good solution so it’s preferable to keep it lazy.
But for this, simple LINQ is not enough. You need to work with enumerators directly and implement your extension in such a way that it uses the GetEnumerator
method. With it you have the complete control over the enumeration process. With a little bit of logic you can keep you parameters lazy and still be able to validate that the are more values then separators.
You do this by first getting the first value. Then the while
will try to get a pair of separators and values to append them to the string. If at some point it could get a separator but not a value, it means that there is at least the same number of separators as values and at this point you can throw an exception. The other case is when there are not enough separators. To detect this you just need to try to get another value after the while
. If it works then it means there were not enough separators.
public static string Join(this IEnumerable<string> values, IEnumerable<string> separators)
{
var result = new StringBuilder();
using (var value = values.GetEnumerator())
using (var separator = separators.GetEnumerator())
{
if (value.MoveNext())
{
result.Append(value.Current);
while (separator.MoveNext())
{
if (value.MoveNext())
{
result
.Append(separator.Current)
.Append(value.Current);
}
else
{
throw new ArgumentException("There are too many separators!");
}
}
if (value.MoveNext())
{
throw new ArgumentException("There are not enough separators!");
}
}
return result.ToString();
}
}
I would also call this extension Join
because this is what it does.
Whether or not Count()
requires a new iteration depends on how the IEnumerable<T>
object is wired. The framework is smart enough to know that if the object implements ICollection<T>
(“unordered”) it should invoke the Count
property. If not, GetEnumerator()
is invoked and the returned enumerator iterated.
The same logic applies for Last()
. If the object implements IList<T>
(ordered), then the Count
property is used to calculate the position, and a call to the indexer returns the element.
So by using your example variables, no extra iteration is required as string[]
implements both ICollection<string>
and IList<string>
.
Now, with that being said, I would rather move the constraint check from the callee to the caller, rending the function more flexible.
Then invoke string.Join
which under the hood use StringBuilder
to compose the final string.
public static class EnumerableExtensions
{
public static IEnumerable<TSource> Interleave<TSource>(this IEnumerable<TSource> source1, IEnumerable<TSource> source2)
{
if (source1 == null) { throw new ArgumentNullException(nameof(source1)); }
if (source2 == null) { throw new ArgumentNullException(nameof(source2)); }
using (var enumerator1 = source1.GetEnumerator())
{
using (var enumerator2 = source2.GetEnumerator())
{
var continue1 = true;
var continue2 = true;
do
{
if (continue1 = enumerator1.MoveNext())
{
yield return enumerator1.Current;
}
if (continue2 = enumerator2.MoveNext())
{
yield return enumerator2.Current;
}
}
while (continue1 || continue2);
}
}
}
}
Example:
https://dotnetfiddle.net/CbSXp2
string[] values;
string[] separators;
values = new[] { "a", "b", "c" };
separators = new[] { "1", "2" };
Console.WriteLine(string.Join("", values.Interleave(separators)));
values = new[] { "a", "b" };
separators = new[] { "1", "2", "3", "4", "5", "6" };
Console.WriteLine(string.Join("", values.Interleave(separators)));
values = new[] { "a", "b", "c", "d", "e", "f" };
separators = new[] { "1", "2" };
Console.WriteLine(string.Join("", values.Interleave(separators)));
a1b2c
a1b23456
a1b2cdef