Implementing basic sorting in Python

Posted on

Problem

I would like to get my code for implementation of simple sorting algorithms in idiomatic python code reviewed. I am looking for feedback on python idioms used, better ways of doing the same thing. I am familiar with FP, and I am willing to put in effort to get myself familiar with pythonic idioms such as list comprehensions and enumerators. Hence I would also like feedback on if any of the explicit loops can be converted to these.

import doctest

Bubble Sort

def bubblesort(items):
    """
    >>> bubblesort([])
    []
    >>> bubblesort([1])
    [1]
    >>> bubblesort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> bubblesort([4,5,3,1,2])
    [1, 2, 3, 4, 5]
    """
    for i in range(len(items)):
        mod = False
        for j in range(len(items)-1):
            if items[i] < items[j]:
                mod = True
                items[i], items[j] = items[j], items[i]
        if not mod: break
    return items

Selection Sort

def selectionsort(items):
    """
    >>> selectionsort([])
    []
    >>> selectionsort([1])
    [1]
    >>> selectionsort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> selectionsort([4,5,3,1,2])
    [1, 2, 3, 4, 5]
    """
    for k in range(len(items)-1,0,-1):
        v,i = max((v,i) for i,v in enumerate(items[:k]))
        if items[k] < v:
            items[k],items[i] = items[i], items[k]
    return items

Insertion Sort

def insertionsort(items):
    """
    >>> insertionsort([])
    []
    >>> insertionsort([1])
    [1]
    >>> insertionsort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> insertionsort([4,5,3,1,2])
    [1, 2, 3, 4, 5]

    """
    def find_the_place(v, till):
        """
        >>> items = [1,2,4]
        >>> find_the_place(items, 3, 3)
        2
        """
        for i in range(0, till):
            if v < items[i]:
                return i
        return till
    def shift_things_from(i, till):
        items[i+1:till+1] = items[i:till]

    for k in range(1,len(items)):
        v = items[k]
        i = find_the_place(v, k)
        shift_things_from(i, k)
        items[i] = v
    return items

Shell Sort

def shellsort(items):
    """
    >>> shellsort([])
    []
    >>> shellsort([1])
    [1]
    >>> shellsort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> shellsort([4,5,3,1,2])
    [1, 2, 3, 4, 5]
    """
    def sort_gap(start, gap):
        for k in range(len(items)-1, 0, -1*gap):
            v,i = max((v,i) for i,v in enumerate(items[:k:gap]))
            if items[k] < v:
                items[k], items[i] = items[i], items[k]

    gap = len(items)//2
    while gap > 0:
        for i in range(gap):
            sort_gap(i, gap)
        gap = gap//2
    return items

Merge Sort

def mergesort(items):
    """
    >>> mergesort([])
    []
    >>> mergesort([1])
    [1]
    >>> mergesort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> mergesort([4,5,3,1,2])
    [1, 2, 3, 4, 5]
    """
    def merge(left, right):
        """
        >>> merge([2,4,6], [])
        [2, 4, 6]
        >>> merge([], [2,4,6])
        [2, 4, 6]
        >>> merge([1,3,6], [2,4,6])
        [1, 2, 3, 4, 6, 6]
        """
        if not left: return right
        elif not right: return left
        elif left[0] < right[0]:
            return left[:1] + merge(left[1:], right)
        else:
            return right[:1] + merge(left, right[1:])
    if len(items) <= 1 : return items
    mid = len(items)//2
    left = mergesort(items[:mid])
    right = mergesort(items[mid:])
    return merge(left, right)

Quick Sort

def quicksort(items):
    """
    >>> quicksort([])
    []
    >>> quicksort([1])
    [1]
    >>> quicksort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> quicksort([4,5,3,1,2])
    [1, 2, 3, 4, 5]
    """
    def partition(f, l, pivot):
        while f < l:
            while items[f] < pivot: f+=1
            while items[l] > pivot: l-=1
            items[l], items[f] = items[f], items[l]
            l, f = l-1, f+1
        f,l = l,f # swap because while switches it at the end.
        return (f, l)
    def qsort(fst, lst):
        if fst >= lst: return
        pivot = items[fst]
        (f, l) = partition(fst, lst, pivot)
        qsort(fst, f)
        qsort(l, lst)

    if not items: return items
    qsort(0, len(items)-1)
    return items

Heap Sort

def heapify(items):
    """
    >>> heapify([])
    []
    >>> heapify([1])
    [1]
    >>> heapify([2,1])
    [2, 1]
    >>> heapify([2,1,3])
    [3, 2, 1]
    >>> heapify([2,1,4,3])
    [4, 3, 1, 2]
    >>> heapify([5,3,6,7,1,9,4,8])
    [9, 8, 6, 7, 1, 3, 4, 5]
    """
    for i,t in enumerate(items):
        if i == len(items): break
        while i > 0:
            p = i//2
            if items[p] < items[i]: items[p], items[i] = items[i], items[p]
            i = p
    return items

def siftdown(items, i, size):
    l = i * 2
    r = l * 2 + 1
    largest = i
    if l < size and items[i] < items[l]:
        largest = l
    if r < size and items[i] < items[r]:
        largest = r
    if largest != i:
        items[largest], items[i] = items[i], items[largest]
        items = siftdown(items, largest, size)
    return items

def heapsort(items):
    """
    >>> heapsort([])
    []
    >>> heapsort([1])
    [1]
    >>> heapsort([1,2,3,4,5])
    [1, 2, 3, 4, 5]
    >>> heapsort([4,5,3,1,2])
    [1, 2, 3, 4, 5]
    """
    items = heapify(items)
    for i in range(len(items)-1, 0, -1):
        items[i], items[0] = items[0], items[i] #swap first and last
        items = siftdown(items, 0, i)
    return items

Tests

doctest.testmod()

New question after fixing is asked here.

Solution

I’ll do a proper review later, but for now, I’ll just point out that there seem to be some bugs in your implementation:

>>> from sorting import *
>>> bubblesort([1, 0])
[1, 0]
>>> quicksort([0, 0, 0, 1, 0, 0, 0])
[0, 0, 0, 0, 0, 1, 0]
>>> heapsort([1, 0, 1, 1, 0, 0])
[0, 0, 1, 0, 1, 1]

I used Hypothesis to create some randomised list of integers, throw them into your function, and compare them to the builtin sorted(). I couldn’t get any crashers, but those results seem a bit off to me.

The lesson here: you need more than a few simple test cases. Only having a few tests means that you’re more likely to miss edge cases, and have subtle bugs in your code.

Bubble sort

As above, this doesn’t actually work. There are a couple of things wrong here:

  • This isn’t a bubble sort. In bubble sort, you compare adjacent pairs of items until the list is sorted – here, you’re doing pairwise comparisons of non-adjacent elements. If I try with a list of length 4, and print the values of i and j, this is what I get:

    0 0
    0 1
    0 2
    1 0
    1 1
    1 2
    2 0
    2 1
    2 2
    3 0
    3 1
    3 2
    

    I would expect to see 0 1, 1 2 and 2 3. This is actually much more inefficient than bubble sort.

  • If you want to iterate over multiple ranges like this, you’d be better off using something like the itertools module, than constructing multiple nested loops. This products the same iterations of i and j as your code:

    for i, j in itertools.product(range(len(items)), range(len(items) - 1)):
        print(i, j)
    

Selection sort

  • This is a slightly unusual implementation of selection sort – you’re building up the sorted subset from the right (with the largest elements sorted first) rather than the left. Usually, I see selection sort as building up a sorted subset with the smallest elements first. But this seems to work, so it’s fine.

  • Personally I’m not a fan of range() with three parts, especially if the step is –1, i.e. range(len(items)-1,0,-1). I think they’re a little cramped, and prefer to use reversed(range(len(items)-1)) – I find that easier to read.

  • Small thing, but make sure you add a space after commas. Good for readability.

Insertion sort

  • It would be helpful to have some docstrings on your utility functions. It’s not obvious to me what find_the_place() or shift_things_from() are supposed to do, which makes it hard to tell if they’re doing that correctly.

  • You can improve the for loop in find_the_place with enumerate():

    for idx, item in enumerate(till):
        if v < item:
            return idx
    

Leave a Reply

Your email address will not be published. Required fields are marked *