LINQ for extracting N bottom items (async streams)

Posted on

Problem

source.Bottom(n, x => x) should be the same as well known LINQ source.OrderBy(x => x).Take(n) but is more memory/run-time efficient as it does not keep all items in memory as OrderBy does:

public static class BottomN
{
    public static async IAsyncEnumerable<T> Bottom<T, TValue>(
        this IAsyncEnumerable<T> source, int number, Func<T, TValue> selector)
        where TValue : IComparable<TValue>
    {
        var list = new List<T>();
        await foreach(var item in source)
        {
            list.Insert(list.BestIndex(item, selector), item);
            if (list.Count > number)
                list.RemoveAt(number);                
        }

        foreach (var item in list)
            yield return item;
    }

    static int BestIndex<T, TValue>(this IList<T> list, T value, Func<T, TValue> selector) 
        where TValue : IComparable<TValue>
    {
        return bestIndex(0, list.Count);
        int bestIndex(int s, int e) =>
            s == e ? s :
                selector(list[(s + e) / 2]).CompareTo(selector(value)) > 0 
                ? bestIndex(s, (s + e) / 2) 
                : bestIndex((s + e) / 2 + 1, e);
    }
}

To test:

    [TestMethod]
    public async Task Keep_Three()
    {
        var values = new[] { 10, 22, 3, 55, 66, 100, 200, 2 };
        var bottom = await values.ToAsyncEnumerable().Bottom(3, v => v).ToArrayAsync();
        CollectionAssert.AreEqual(new int[] { 2, 3, 10 }, bottom);
    }

Solution

where TValue : IComparable<TValue>

Don’t do this. OrderBy doesn’t require its type parameters to implement this interface like this either. Instead it uses the IComparer<T> interface. What’s the difference? With IComparable<T>, the implementation of the comparison logic is sitting on the class T itself. This means that there is one and only one way of ordering elements of T. If I wanted to sort them using some customized logic, I would be out of luck.

Instead, IComparer<T> is a separate class that compares Ts. Providing an instance of that interface to the method allows me to use whatever logic I want to order T.

But what if I don’t want to implement an entire class, but instead want to use IComparable<T>? This is where Comparer<T>.Default comes to play. This static property provides a default implementation for IComparer<T> for T, which, if T implements IComparable<T>, will default to call that logic.

So how would your interface look? We have an overload with the IComparer<TValue> argument, and an overload without:

public static async IAsyncEnumerable<T> Bottom<T, TValue>(
        this IAsyncEnumerable<T> source, int number, Func<T, TValue> selector, IComparer<TValue> comparer)
{
    return Bottom(source, number, selector, Comparer<TValue>.Default);
}

public static async IAsyncEnumerable<T> Bottom<T, TValue>(
        this IAsyncEnumerable<T> source, int number, Func<T, TValue> selector, IComparer<TValue> comparer)
{
    // Actual implementation.
}

Binary tree

I think this problem would call for a binary tree, instead of a list that’s sorted/ranked continuously. This would you can quickly check whether the item you are iterating would even be in the top-number of items in the collection, without having to add and subsequently having to remove it again. The downside is that C# doesn’t have a built-in Binary Tree that isn’t hidden within the SortedX collection set. These classes sadly require unique values in their collections, which isn’t guaranteed here.

Alternatively, if you can handle adding another few lines to your solution, you can check if the index returned by BestIndex is equal to number, and skip adding and removing it from the list if this is the case.

Code style

This needs to be said. Your code is really compact. It took me multiple reads to figure out what on earth BestIndex was actually doing. Docstrings, comments and/or intermediate variables with clear names please. Something as simple as “Returns the rank of value in the list by performing a merge-sort style algorithm.” is enough to understand its role in Bottom.

Leave a Reply

Your email address will not be published. Required fields are marked *