Problem
I have an alternative algorithm to the same problem as Finding missing items in an int list.
My implementation includes optional min and max bounds to selectively fill in the list.
The linked implementation is much simpler.
My Questions are:
-
How do our two algorithms compare, efficiency wise?
-
Besides the algorithm linked, are there more efficient algorithms?
-
Did I identify and does my algorithm work for the edge cases?
I tried not to use min
and max
because I heard that they are O(n).
However, I used the .index
method for lists. I am not sure how efficient .index
is. I would think that it would be O(logn) for a sorted but I am not sure if or how the .index
function would know that a given list is sorted.
Additionally, I used exceptions, which do give a performance penalty.
7 Cases
-
One missing number – Normal
-
Multiple Missing Numbers – Normal
-
No missing numbers – Normal
-
Repeats – Normal
-
Array too small – Case for empty array: returns an array of the numbers from
min_val
tomax_val
inclusive -
Missing numbers at end – Filled in based on the min and max number
-
Numbers out of range – Ignored those which were out of range
from bisect import bisect_left
def find_missing(int_array, min_val = False, max_val = False):
"""Doctest
>>> find_missing([1,2,3,5,6,7], 1, 7)
[4]
>>> find_missing([2,3,6,4,8], 2, 8)
[5, 7]
>>> find_missing([1,2,3,4], 1, 4)
[]
>>> find_missing([11,1,1,2,3,2,3,2,3,2,4,5,6,7,8,9],1,11)
[10]
>>> find_missing([-1,0,1,3,7,20], -1, 7)
[2, 4, 5, 6]
>>> find_missing([-2,0,3], -5, 2)
[-5, -4, -3, -1, 1, 2]
>>> find_missing([2],4,5)
[4, 5]
>>> find_missing([3,5,6,7,8], -3, 5)
[-3, -2, -1, 0, 1, 2, 4]
>>> find_missing([1,2,4])
[3]
"""
if len(int_array) == 0:
return list(range(min_val, max_val+1))
int_array = sorted(int_array)
first_in_int_array = int_array[0]
last_in_int_array = int_array[len(int_array)-1]
if not min_val:
min_val = first_in_int_array
if not max_val:
max_val = last_in_int_array
result = []
#Checking to see if the min & max values currently exist
#If they do not then add them to the resultant array as they are missing numbers
if min_val < first_in_int_array:
int_array.insert(0, min_val)
result.append(min_val)
if max_val > last_in_int_array:
int_array.insert(len(int_array), max_val)
result.append(max_val)
#Dealing with case where max_val < last_in_int_array but not in array
#Example: max_val = 2 int_array = [1,3]
#Also dealing with the analogous case for min_val and first_in_int_array
#Deal with this using exceptions
try:
min_pos = int_array.index(min_val)
except Exception as e:
result.append(min_val)
min_pos = bisect_left(int_array, min_val)
int_array.insert(min_pos, min_val)
try:
max_pos = int_array.index(max_val)
except Exception as e:
result.append(max_val)
max_pos = bisect_left(int_array, max_val)
int_array.insert(max_pos, max_val)
for i in range(min_pos,max_pos):
difference = int_array[i+1] - int_array[i]
if difference != 1 and difference != 0:
result += range(int_array[i]+1,int_array[i+1])
result.sort()
return result
Solution
Your doctest is misformatted; fix that. Also, don’t add a space before the docstring and don’t give it a useless header like “Doctest”: give it real documentation. Beware of PEP-8 spacing, too.
Your parameter int_array
is misnamed; you don’t take an array
but a list
. Best call it just integers
, which is good for duck-typing.
You default min_val
and max_val
to False
… but False == 0
! This means that if the user passes in 0
as min_val
or max_val
it would be ignored. You should default unknown values to, say, None
. I would also rename these to smallest
and largest
.
You do
integers = sorted(integers)
which is O(nlogn). This is OK, but is a suboptimal running time. I shall leave it for now.
You do mention
I tried not to use min and max because I heard that they are O(n).
but you run these at most twice and
Since the overall cost is at best (hopefully) O(n+(largest−smallest))>O(n), this does not affect overall running times. They are also very fast operations. However, since you’re already sorting the array, indexing will be faster.
All of this is not needed if you make your main loop more robust:
#Checking to see if the min & max values currently exist
#If they do not then add them to the resultant array as they are missing numbers
if min_val < first_in_int_array:
int_array.insert(0, min_val)
result.append(min_val)
if max_val > last_in_int_array:
int_array.insert(len(int_array), max_val)
result.append(max_val)
#Dealing with case where max_val < last_in_int_array but not in array
#Example: max_val = 2 int_array = [1,3]
#Also dealing with the analogous case for min_val and first_in_int_array
#Deal with this using exceptions
try:
min_pos = int_array.index(min_val)
except Exception as e:
result.append(min_val)
min_pos = bisect_left(int_array, min_val)
int_array.insert(min_pos, min_val)
try:
max_pos = int_array.index(max_val)
except Exception as e:
result.append(max_val)
max_pos = bisect_left(int_array, max_val)
int_array.insert(max_pos, max_val)
You should also fix up the default case to be more robust:
if not integers:
if smallest is None or largest is None:
return []
return list(range(smallest, largest+1))
Maybe even throw an error with smallest
or largest
undefined for an empty list. Maybe.
Your main loop should probably loop over the whole array, not some subsection.
for integer in integers:
Then one can do just
result.extend(range(smallest+1, integer))
since that is the difference between them. You can then set
smallest = integer
and carry on. You will need to deal with integer < smallest
and largest > integers[-1]
, and also missing smallest
. One quick fix is
result = []
for integer in integers:
if not smallest <= integer <= largest:
continue
result += range(smallest, integer)
if integer >= smallest:
smallest = integer+1
result += range(smallest, largest+1)
return result
result
will already be sorted, so don’t call .sort
. Use .extend
over +=
as well.
More efficient than sorting (O(nlogn)) is to use a set
. However, to produce output one would have to loop over a range. This can be done with:
seen = set(integers)
return [i for i in range(smallest, largest) if i not in seen]
This condenses the whole function down to just
if smallest is None:
smallest = min(integers)
if largest is None:
largest = max(integers)
seen = set(integers)
return [i for i in range(smallest, largest+1) if i not in seen]
This is similar to Caridorc’s except
- I am not using inline
if
statements, - I have made a
set
before checkingif i not in seen
; this is O(n) for lists but O(1) for sets.
Consider, document and implement what find_missing([])
should do.
-
You say ‘
Doctest
‘ but then you format them badly with unecessary newlines and you do not actually run the tests. -
Your code is much more complex than it should be.
When writing a programme remember that it should be the simplest possible.
The function down below behaves exactly like yours but:
- It is 3 lines of code instead of 50.
- The doctest is properly formatted so that you can actually run it.
- The doctest is actually run when the script is executed as the main file.
:
import doctest
def find_missing(integers_list,start=None,limit=None):
"""
Given a list of integers and optionally a start and an end, finds all
the integers from start to end that are not in the list.
'start' and 'end' default respectivly to the first and the last item of the list.
Doctest:
>>> find_missing([1,2,3,5,6,7], 1, 7)
[4]
>>> find_missing([2,3,6,4,8], 2, 8)
[5, 7]
>>> find_missing([1,2,3,4], 1, 4)
[]
>>> find_missing([11,1,1,2,3,2,3,2,3,2,4,5,6,7,8,9],1,11)
[10]
>>> find_missing([-1,0,1,3,7,20], -1, 7)
[2, 4, 5, 6]
>>> find_missing([-2,0,3], -5, 2)
[-5, -4, -3, -1, 1, 2]
>>> find_missing([2],4,5)
[4, 5]
>>> find_missing([3,5,6,7,8], -3, 5)
[-3, -2, -1, 0, 1, 2, 4]
>>> find_missing([1,2,4])
[3]
"""
start = start if start is not None else integers_list[0]
limit = limit if limit is not None else integers_list[-1]
return [i for i in range(start,limit + 1) if i not in integers_list]
def _test():
doctest.testmod()
if __name__ == "__main__":
_test()