Problem
I’ve implemented the sieve of Eratosthenes in Python 3 as follows:
def sieve_of_eratosthenes(n):
is_prime = [True] * (n+1)
is_prime[0] = False
is_prime[1] = False
p = 0
while True:
for i, prime in enumerate(is_prime):
if prime and i > p:
p = i
break
else:
break
multiple = p + p
while multiple <= n:
is_prime[multiple]= False
multiple = multiple + p
r = []
for i, prime in enumerate(is_prime):
if prime: r.append(i)
return r
Running this to 100,000 takes ~25 seconds.
I felt that was on the slow side, so I decided to take a different approach. The Sieve works prospectivly: it works ahead to clear out all multiples. I made a retrospective function, trial-dividing everything lower than the current prime-candidate:
def retrospect(n):
primes = [2, 3]
i = 5
while i <= n:
isprime = True
lim = math.sqrt(n)+1
for p in primes:
if p > lim:
break
if i % p == 0:
isprime = False
break
if isprime:
primes.append(i)
i += 2
return primes
This is way faster!
Questions:
- Are there any obvious shortcomings to the Sieve implementation?
- Is there a point where the retrospect gets slower than the sieve?
Solution
while True:
for i, prime in enumerate(is_prime):
if prime and i > p:
p = i
break
else:
break
This is the cause of the slowness: track the state between loops and the 25 seconds drop to 0.05 seconds.
multiple = p + p
while multiple <= n:
is_prime[multiple]= False
multiple = multiple + p
It’s probably more Pythonic to use range
here:
for multiple in range(p + p, n + 1, p):
is_prime[multiple] = False
When handling larger inputs, it becomes worthwhile starting at p * p
instead of p + p
(on the basis that every composite multiple of p
smaller than p * p
also has a smaller prime factor).
r = []
for i, prime in enumerate(is_prime):
if prime: r.append(i)
return r
IMO it makes more sense to inline this into the main loop.
Edited code at this point:
def sieve_of_eratosthenes(n):
is_prime = [True] * (n+1)
is_prime[0] = False
is_prime[1] = False
primes = []
for p in range(2, n+1):
if is_prime[p]:
primes.append(p)
for multiple in range(p * p, n + 1, p):
is_prime[multiple] = False
return primes
Now if you want more speed at the cost of complicating things ever so slightly, you can apply a simple wheel by special-casing the only even prime:
def sieve_of_eratosthenes(n):
is_prime = [True] * (n+1)
primes = [2]
for p in range(3, n + 1, 2):
if is_prime[p]:
primes.append(p)
for multiple in range(p * p, n + 1, p + p):
is_prime[multiple] = False
return primes
Your prime sieve is overly complicated. You only need a single loop (and can also make it a generator to potentially save some memory):
def prime_sieve(limit):
prime = [True] * limit
prime[0] = prime[1] = False
for p, is_prime in enumerate(prime):
if is_prime:
yield p
for n in range(p * p, limit, p):
prime[n] = False
This also uses the fact that p + p
is already known to be composite for most numbers (since it is 2 * p
and all multiples of 2
have already been marked composite once you have added 2
to the primes) and indeed all multiples of n * p
for n < p
have already been excluded. So it is sufficient to start at p * p
, because that is the first multiple that has not been marked of as composite.
This is strictly faster than your retrospect
function:
It is also linear with n
, whereas retrospect
is roughly O(n2)O(n2), due to having to iterate over all already found primes.
In your other function I would also use a for
loop. It has the nice feature that it has an else
clause that gets executed if the loop was not interrupted with a break
statement:
def retrospect(n):
primes = [2, 3]
lim = math.sqrt(n) + 1
for i in range(5, n + 1, 2):
for p in primes:
if p > lim:
break
if i % p == 0:
break
else:
primes.append(i)
return primes
It is also sufficient to calculate lim
only once.
Your sieve function is rather convoluted and slow. If we think carefully about what’s actually required, for each identified prime number, we cross off multiples. We can stop when we get to n−−√n and each prime cross-out operation for a prime ii can start with i2i2.
def sieve_of_eratosthenes(n):
is_prime = [True] * (n+1)
is_prime[0] = False
is_prime[1] = False
limit = int(math.sqrt(n))+1
for i in range(2, limit):
if is_prime[i]:
for j in range(i*i, n+1, i):
is_prime[j] = False
r = []
for i, prime in enumerate(is_prime):
if prime: r.append(i)
return r
A still faster way to do it would be to note that all even numbers except for 2 are composite (that is, non-prime). We can slightly tweak the code to account for that:
def sieve_of_eratosthenes(n):
is_prime = [False, True] * int(n/2 + 1)
del is_prime[n+1:]
is_prime[1] = False
is_prime[2] = True
limit = int(math.sqrt(n))+1
for i in range(3, limit):
if is_prime[i]:
for j in range(i*i, n+1, 2*i):
is_prime[j] = False
r = []
for i, prime in enumerate(is_prime):
if prime:
r.append(i)
return r
Performance
Just for fun, I decided to repeat and expand the performance measurement that @Graipher made. Here’s the code I used to do it:
import time
import matplotlib.pyplot as plt
def time_algo(algo, times):
result = []
for t in times:
start = time.time()
algo(t)
result.append((time.time() - start)*1000)
return result
names = [
#'retrospect',
taylor1,
graipher,
edward1,
taylor2,
edward2,
]
numarray = [1000, 3000,
10000, 30000,
100000, 300000,
1000000, 3000000,
10000000, 30000000,]
trials = []
for algo in names:
trials.append(time_algo(algo, numarray))
for t in trials:
plt.plot(numarray, t, '.-')
plt.ylabel('time (ms)')
plt.xlabel('n')
plt.title("Prime algorithm perfomance")
plt.legend([alg.__name__ for alg in names])
plt.show()
@Edward: corrected limit
def sieve_of_eratosthenes(n):
is_prime = [False, True] * int(n/2 + 1)
n_plus_one = n + 1 # save n+1 for later use
del is_prime[n_plus_one:] #
is_prime[1] = False
is_prime[2] = True
#limit = int(math.sqrt(n))
limit = int(n**.5) + 1 # import math not required, limit corrected!
for i in range(3, limit):
if is_prime[i]:
for j in range(i*i, n_plus_one, 2*i): # limit corrected!
is_prime[j] = False
r = []
for i, prime in enumerate(is_prime):
if prime:
r.append(i)
return r
without corrected limit s_o_e(9)
yields [2,3,5,7,9], so 9 would be prime (as would 11,13,17,…)
On the performance side, using numpy arrays makes the computations 10 times faster of course.
Using Edward’s code for performance, together with the following code (“Edward1” algorithm adapted to numpy) :
def sieve_of_eratosthenes(n):
import numpy as np
is_prime = np.ones(n,dtype=bool)
is_prime[0:2] = False
limit = int(math.sqrt(n))+1
for i in range(2, limit):
if is_prime[i]:
is_prime[np.arange(i*i,n,i)] = False
r = np.where(is_prime == True)[0]
return r