Find Missing Numbers in an int list

Posted on

Problem

I have an alternative algorithm to the same problem as Finding missing items in an int list.

My implementation includes optional min and max bounds to selectively fill in the list.

The linked implementation is much simpler.

My Questions are:

  • How do our two algorithms compare, efficiency wise?

  • Besides the algorithm linked, are there more efficient algorithms?

  • Did I identify and does my algorithm work for the edge cases?


I tried not to use min and max because I heard that they are O(n)O(n).

However, I used the .index method for lists. I am not sure how efficient .index is. I would think that it would be O(logn)O(logn) for a sorted but I am not sure if or how the .index function would know that a given list is sorted.

Additionally, I used exceptions, which do give a performance penalty.

7 Cases

  1. One missing number – Normal

  2. Multiple Missing Numbers – Normal

  3. No missing numbers – Normal

  4. Repeats – Normal

  5. Array too small – Case for empty array: returns an array of the numbers from min_val to max_val inclusive

  6. Missing numbers at end – Filled in based on the min and max number

  7. Numbers out of range – Ignored those which were out of range


from bisect import bisect_left

def find_missing(int_array, min_val = False, max_val = False):

"""Doctest

>>> find_missing([1,2,3,5,6,7], 1, 7)

[4]
>>> find_missing([2,3,6,4,8], 2, 8)

[5, 7]
>>> find_missing([1,2,3,4], 1, 4)

[]
>>> find_missing([11,1,1,2,3,2,3,2,3,2,4,5,6,7,8,9],1,11)

[10]
>>> find_missing([-1,0,1,3,7,20], -1, 7)

[2, 4, 5, 6]

>>> find_missing([-2,0,3], -5, 2)

[-5, -4, -3, -1, 1, 2]

>>> find_missing([2],4,5)

[4, 5]

>>> find_missing([3,5,6,7,8], -3, 5)

[-3, -2, -1, 0, 1, 2, 4]

>>> find_missing([1,2,4])

[3]

"""
if len(int_array) == 0:
    return list(range(min_val, max_val+1))

int_array = sorted(int_array)
first_in_int_array = int_array[0]
last_in_int_array = int_array[len(int_array)-1]

if not min_val:
    min_val = first_in_int_array

if not max_val:
    max_val = last_in_int_array
result = []

#Checking to see if the min & max values currently exist
#If they do not then add them to the resultant array as they are missing numbers
if min_val < first_in_int_array:            
    int_array.insert(0, min_val)
    result.append(min_val)

if max_val > last_in_int_array:
    int_array.insert(len(int_array), max_val)
    result.append(max_val)

#Dealing with case where max_val < last_in_int_array but not in array 
#Example: max_val = 2 int_array = [1,3]
#Also dealing with the analogous case for min_val and first_in_int_array
#Deal with this using exceptions
try:
    min_pos = int_array.index(min_val)

except Exception as e:
    result.append(min_val)
    min_pos = bisect_left(int_array, min_val)
    int_array.insert(min_pos, min_val)



try:
    max_pos = int_array.index(max_val)
except Exception as e:
    result.append(max_val)
    max_pos = bisect_left(int_array, max_val)
    int_array.insert(max_pos, max_val)


for i in range(min_pos,max_pos):
    difference = int_array[i+1] - int_array[i]
    if difference != 1 and difference != 0:
        result += range(int_array[i]+1,int_array[i+1])

result.sort()
return result

Solution

Your doctest is misformatted; fix that. Also, don’t add a space before the docstring and don’t give it a useless header like “Doctest”: give it real documentation. Beware of PEP-8 spacing, too.

Your parameter int_array is misnamed; you don’t take an array but a list. Best call it just integers, which is good for duck-typing.

You default min_val and max_val to False… but False == 0! This means that if the user passes in 0 as min_val or max_val it would be ignored. You should default unknown values to, say, None. I would also rename these to smallest and largest.

You do

integers = sorted(integers)

which is O(nlogn)O(nlogn). This is OK, but is a suboptimal running time. I shall leave it for now.

You do mention

I tried not to use min and max because I heard that they are O(n)O(n).

but you run these at most twice and

2O(n)=O(2n)=O(n)2O(n)=O(2n)=O(n)

Since the overall cost is at best (hopefully) O(n+(largestsmallest))>O(n)O(n+(largestsmallest))>O(n), this does not affect overall running times. They are also very fast operations. However, since you’re already sorting the array, indexing will be faster.

All of this is not needed if you make your main loop more robust:

#Checking to see if the min & max values currently exist
#If they do not then add them to the resultant array as they are missing numbers
if min_val < first_in_int_array:            
    int_array.insert(0, min_val)
    result.append(min_val)

if max_val > last_in_int_array:
    int_array.insert(len(int_array), max_val)
    result.append(max_val)

#Dealing with case where max_val < last_in_int_array but not in array 
#Example: max_val = 2 int_array = [1,3]
#Also dealing with the analogous case for min_val and first_in_int_array
#Deal with this using exceptions
try:
    min_pos = int_array.index(min_val)

except Exception as e:
    result.append(min_val)
    min_pos = bisect_left(int_array, min_val)
    int_array.insert(min_pos, min_val)



try:
    max_pos = int_array.index(max_val)
except Exception as e:
    result.append(max_val)
    max_pos = bisect_left(int_array, max_val)
    int_array.insert(max_pos, max_val)

You should also fix up the default case to be more robust:

if not integers:
    if smallest is None or largest is None:
        return []

    return list(range(smallest, largest+1))

Maybe even throw an error with smallest or largest undefined for an empty list. Maybe.

Your main loop should probably loop over the whole array, not some subsection.

for integer in integers:

Then one can do just

result.extend(range(smallest+1, integer))

since that is the difference between them. You can then set

smallest = integer

and carry on. You will need to deal with integer < smallest and largest > integers[-1], and also missing smallest. One quick fix is

result = []
for integer in integers:
    if not smallest <= integer <= largest:
        continue

    result += range(smallest, integer)

    if integer >= smallest:
        smallest = integer+1

result += range(smallest, largest+1)
return result

result will already be sorted, so don’t call .sort. Use .extend over += as well.

More efficient than sorting (O(nlogn))(O(nlogn)) is to use a set. However, to produce output one would have to loop over a range. This can be done with:

seen = set(integers)
return [i for i in range(smallest, largest) if i not in seen]

This condenses the whole function down to just

if smallest is None:
    smallest = min(integers)

if largest is None:
    largest = max(integers)

seen = set(integers)
return [i for i in range(smallest, largest+1) if i not in seen]

This is similar to Caridorc’s except

  • I am not using inline if statements,
  • I have made a set before checking if i not in seen; this is O(n)O(n) for lists but O(1)O(1) for sets.

Consider, document and implement what find_missing([]) should do.

  • You say ‘Doctest‘ but then you format them badly with unecessary newlines and you do not actually run the tests.

  • Your code is much more complex than it should be.
    When writing a programme remember that it should be the simplest possible.

The function down below behaves exactly like yours but:

  1. It is 3 lines of code instead of 50.
  2. The doctest is properly formatted so that you can actually run it.
  3. The doctest is actually run when the script is executed as the main file.

:

import doctest

def find_missing(integers_list,start=None,limit=None):
    """
    Given a list of integers and optionally a start and an end, finds all
    the integers from start to end that are not in the list.

    'start' and 'end' default respectivly to the first and the last item of the list.

    Doctest:

    >>> find_missing([1,2,3,5,6,7], 1, 7)
    [4]

    >>> find_missing([2,3,6,4,8], 2, 8)
    [5, 7]

    >>> find_missing([1,2,3,4], 1, 4)
    []

    >>> find_missing([11,1,1,2,3,2,3,2,3,2,4,5,6,7,8,9],1,11)
    [10]

    >>> find_missing([-1,0,1,3,7,20], -1, 7)
    [2, 4, 5, 6]

    >>> find_missing([-2,0,3], -5, 2)
    [-5, -4, -3, -1, 1, 2]

    >>> find_missing([2],4,5)
    [4, 5]

    >>> find_missing([3,5,6,7,8], -3, 5)
    [-3, -2, -1, 0, 1, 2, 4]

    >>> find_missing([1,2,4])
    [3]

    """
    start = start if start is not None else integers_list[0]
    limit = limit if limit is not None else integers_list[-1]
    return [i for i in range(start,limit + 1) if i not in integers_list]


def _test():
    doctest.testmod()

if __name__ == "__main__":
    _test()

Leave a Reply

Your email address will not be published. Required fields are marked *