# Attempting to efficiently compute the largest prime factor

Posted on

Problem

I’m working on the third Euler problem and I’ve written a simple little function that returns the list of prime factors of a given number.

The number for which I have to get all the factors is over 6 billion, so building this huge freakin list of numbers in memory struck me as wonky.

I decided I’d build a `factor` function that generates output by the chunk and spits it out so that I can write it to a text file chunk by chunk. A `buffered_factor` function:

Ok, I’ve decided that doing the ‘buffered’ thing isn’t needful. The computation is just going to take a while, and making my `factor` function a generator will keep my program light in memory.

I’m looking for best practices and practical advice about (what I think is describable as) buffering the output of a function.

Here’s the prime sieve:

``````def is_prime(n):
if n == 1:
return False
if n == 2:
return True

for i in range(2, int(math.ciel(math.sqrt(n))):
if n % i:
return False
return True
``````

And the factoratorizer:

``````def factor(num):
"""Get factors of given number.

:param num: get us them there factors o this number here
:type num: integer
:param chunk_size: number of operations to work through before crapping out a list
:rtype: list of factors
:raises: ValueError on num < 1
"""
if num < 1:
raise ValueError('Given %s. Numbers >= 1, please' % num)

for i in (n + 1 for n in xrange(num)):
if num % i == 0:
if is_prime(i):
yield i
``````

I’m trying to use it thusly:

``````import os
from factor import factor
TARGET = 600851475143
OUTFILE = 'factors.txt'

with open(OUTFILE, 'w') as f:
for output in buffered_factor(TARGET):
f.write(str(output))
f.flush()
``````

Nothing’s being written to file, so obviously I’m doing something wrong.

Solution

You’ve changed the question several times. To be clear, I’m reviewing revision 6.

Be careful not to introduce bugs when revising your code. You’re still calling `buffered_factor(TARGET)` when you have changed the function name to `factor()`. Also, the optional `chunk_size` parameter is no longer being used, but you haven’t removed it.

There is a lot of duplication between your `is_prime()` and `factor()` functions, which is not surprising, since they do similar things. Checking for primality is going to drastically hurt performance — it’s like starting to factor the result all over again. So, let’s just decide that your `is_prime()` function needs to be eliminated. You have to rewrite `factor()` such that once it yields a factor, you eliminate all of its multiples from consideration. (Hint: You know that `num % i == 0` — now modify `num`.)

The only even prime number is 2. Use that fact to modify your `xrange()` to halve your work. Also, your `n + 1` is weird, and should be eliminated by using `xrange()` correctly.

Why do you want to remove your output file before `open`ing it? It would be simpler to truncate it automatically if necessary with `open(..., 'w')`. (Note that truncation is not the same as removing and re-creating, when it comes to preserving the file’s creation time and extended attributes (on systems that keep track of such things), the behaviour when you don’t have write access to the containing directory, the effect on existing open filehandles, and the effect of existing open filehandles on the removal. However, I doubt that you care about any of those considerations for this exercise.) Better yet, for greater flexibility, just print your results to standard output and let your shell command redirect the output to wherever you wish.

Some advice about maths and algorithms which will be useful for this problem and for future problems:

• You are interested in prime factors.

• You don’t care about the non-biggest factors (you might go through them but there is not much point in storing them).

• If a number `n` can be decomposed in `n = m*p` then at least one of `m` and `p` is smaller than `sqrt(n)`.