I’m working on the third Euler problem and I’ve written a simple little function that returns the list of prime factors of a given number.
The number for which I have to get all the factors is over 6 billion, so building this huge freakin list of numbers in memory struck me as wonky.
I decided I’d build a
factor function that generates output by the chunk and spits it out so that I can write it to a text file chunk by chunk. A
Ok, I’ve decided that doing the ‘buffered’ thing isn’t needful. The computation is just going to take a while, and making my
factor function a generator will keep my program light in memory.
I’m looking for best practices and practical advice about (what I think is describable as) buffering the output of a function.
Here’s the prime sieve:
def is_prime(n): if n == 1: return False if n == 2: return True for i in range(2, int(math.ciel(math.sqrt(n))): if n % i: return False return True
And the factoratorizer:
def factor(num): """Get factors of given number. :param num: get us them there factors o this number here :type num: integer :param chunk_size: number of operations to work through before crapping out a list :rtype: list of factors :raises: ValueError on num < 1 """ if num < 1: raise ValueError('Given %s. Numbers >= 1, please' % num) for i in (n + 1 for n in xrange(num)): if num % i == 0: if is_prime(i): yield i
I’m trying to use it thusly:
import os from factor import factor TARGET = 600851475143 OUTFILE = 'factors.txt' with open(OUTFILE, 'w') as f: for output in buffered_factor(TARGET): f.write(str(output)) f.flush()
Nothing’s being written to file, so obviously I’m doing something wrong.
You’ve changed the question several times. To be clear, I’m reviewing revision 6.
Be careful not to introduce bugs when revising your code. You’re still calling
buffered_factor(TARGET) when you have changed the function name to
factor(). Also, the optional
chunk_size parameter is no longer being used, but you haven’t removed it.
There is a lot of duplication between your
factor() functions, which is not surprising, since they do similar things. Checking for primality is going to drastically hurt performance — it’s like starting to factor the result all over again. So, let’s just decide that your
is_prime() function needs to be eliminated. You have to rewrite
factor() such that once it yields a factor, you eliminate all of its multiples from consideration. (Hint: You know that
num % i == 0 — now modify
The only even prime number is 2. Use that fact to modify your
xrange() to halve your work. Also, your
n + 1 is weird, and should be eliminated by using
Why do you want to remove your output file before
opening it? It would be simpler to truncate it automatically if necessary with
open(..., 'w'). (Note that truncation is not the same as removing and re-creating, when it comes to preserving the file’s creation time and extended attributes (on systems that keep track of such things), the behaviour when you don’t have write access to the containing directory, the effect on existing open filehandles, and the effect of existing open filehandles on the removal. However, I doubt that you care about any of those considerations for this exercise.) Better yet, for greater flexibility, just print your results to standard output and let your shell command redirect the output to wherever you wish.
Some advice about maths and algorithms which will be useful for this problem and for future problems:
You are interested in prime factors.
You don’t care about the non-biggest factors (you might go through them but there is not much point in storing them).
If a number
ncan be decomposed in
n = m*pthen at least one of
pis smaller than