Find median of list of integers

Posted on

Problem

This code is meant to find the median of a list of integers. The list is assumed to be of odd length.

This is based on this HackerRank question.

I’m doing this to improve my style and to improve my knowledge of fundamental algorithms/data structures for an upcoming coding interview.

from heapq import heappush, heappop

def median(l):
    heap = []
    for i in l:
        heappush(heap, i)
    cur_int = None
    mid = len(l) // 2
    for i in range(0, mid+1):
        cur_int = heappop(heap)
    median = cur_int
    return median

I think this algorithm is O(n) time and space complexity. Please correct me if I’m wrong. I’m also open to any suggestions to improving its complexities or general performance.

Solution

  1. This code has O(NlogN)O(NlogN) time complexity. Building the heap by pushing elements one by one already requires O(NlogN)O(NlogN). The heapify function works in linear time. But it doesn’t matter that much, anyway, as popping n/2n/2 elements requires O(NlogN)O(NlogN) time.

  2. You can achieve the same time complexity by just sorting the list. This solution is also simpler.

  3. It is possible to get O(N)O(N) time complexity on average (and O(N2)O(N2) in the worst case) using the following algorithm:

    • Let’s choose a random pivot element and split the list into two parts: with smaller or equal elements and with bigger ones.

    • We know in which half the median is by looking at sizes of these lists.

    • Thus, we can go to the half that contains it and solve the problem recursively.

    • One can show that this algorithm has a linear time complexity in average case.

  4. There is also a deterministic algorithm for finding a median in linear time in the worst case, but it’s quite complex so I will not describe it here.

To clean up the code a bit and increase performance, try this one line that does everything your code does 🙂

from statistics import median

As far as your algorithm goes, I think that the speed would likely be improved if you just sorted the list in the first place, then selected the middle one (or ones and took the average). You don’t correctly handle the case where the size of the list is even.

Leave a Reply

Your email address will not be published. Required fields are marked *