Problem
I’m working on the third Euler problem and I’ve written a simple little function that returns the list of prime factors of a given number.
The number for which I have to get all the factors is over 6 billion, so building this huge freakin list of numbers in memory struck me as wonky.
I decided I’d build a factor
function that generates output by the chunk and spits it out so that I can write it to a text file chunk by chunk. A buffered_factor
function:
Ok, I’ve decided that doing the ‘buffered’ thing isn’t needful. The computation is just going to take a while, and making my factor
function a generator will keep my program light in memory.
I’m looking for best practices and practical advice about (what I think is describable as) buffering the output of a function.
Here’s the prime sieve:
def is_prime(n):
if n == 1:
return False
if n == 2:
return True
for i in range(2, int(math.ciel(math.sqrt(n))):
if n % i:
return False
return True
And the factoratorizer:
def factor(num):
"""Get factors of given number.
:param num: get us them there factors o this number here
:type num: integer
:param chunk_size: number of operations to work through before crapping out a list
:rtype: list of factors
:raises: ValueError on num < 1
"""
if num < 1:
raise ValueError('Given %s. Numbers >= 1, please' % num)
for i in (n + 1 for n in xrange(num)):
if num % i == 0:
if is_prime(i):
yield i
I’m trying to use it thusly:
import os
from factor import factor
TARGET = 600851475143
OUTFILE = 'factors.txt'
with open(OUTFILE, 'w') as f:
for output in buffered_factor(TARGET):
f.write(str(output))
f.flush()
Nothing’s being written to file, so obviously I’m doing something wrong.
Solution
You’ve changed the question several times. To be clear, I’m reviewing revision 6.
Be careful not to introduce bugs when revising your code. You’re still calling buffered_factor(TARGET)
when you have changed the function name to factor()
. Also, the optional chunk_size
parameter is no longer being used, but you haven’t removed it.
There is a lot of duplication between your is_prime()
and factor()
functions, which is not surprising, since they do similar things. Checking for primality is going to drastically hurt performance — it’s like starting to factor the result all over again. So, let’s just decide that your is_prime()
function needs to be eliminated. You have to rewrite factor()
such that once it yields a factor, you eliminate all of its multiples from consideration. (Hint: You know that num % i == 0
— now modify num
.)
The only even prime number is 2. Use that fact to modify your xrange()
to halve your work. Also, your n + 1
is weird, and should be eliminated by using xrange()
correctly.
Why do you want to remove your output file before open
ing it? It would be simpler to truncate it automatically if necessary with open(..., 'w')
. (Note that truncation is not the same as removing and recreating, when it comes to preserving the file’s creation time and extended attributes (on systems that keep track of such things), the behaviour when you don’t have write access to the containing directory, the effect on existing open filehandles, and the effect of existing open filehandles on the removal. However, I doubt that you care about any of those considerations for this exercise.) Better yet, for greater flexibility, just print your results to standard output and let your shell command redirect the output to wherever you wish.
Some advice about maths and algorithms which will be useful for this problem and for future problems:

You are interested in prime factors.

You don’t care about the nonbiggest factors (you might go through them but there is not much point in storing them).

If a number
n
can be decomposed inn = m*p
then at least one ofm
andp
is smaller thansqrt(n)
.