Problem

The prime 41, can be written as the sum of six consecutive primes:

41 = 2 + 3 + 5 + 7 + 11 + 13

This is the longest sum of consecutive primes that adds to a prime below one-hundred.

The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.

Which prime, below one-million, can be written as the sum of the most consecutive primes?

```
from time import time
def is_prime(n):
"""returns True for a prime number, False otherwise."""
if n % 2 == 0 and n > 2:
return False
for i in range(3, int(n ** 0.5) + 1, 2):
if n % i == 0:
return False
return True
def sieve(n):
"""generates primes in range n."""
initial = [True] * n
initial[0] = initial[1] = False
for (i, prime) in enumerate(initial):
if prime:
yield i
for index in range(i * i, n, i):
initial[index] = False
def make_prime_partitions(min_size):
"""Assumes minsize an integer, initializes a prime sequence of length min_size."""
primes = sieve(1000000)
prime_sequence = [next(primes) for _ in range(min_size)]
start_point = 0
while start_point < len(prime_sequence):
for i in range(start_point, min_size):
yield prime_sequence[start_point: i + 1]
start_point += 1
def check_partitions(min_size=1000, max_total=1000000):
"""Assumes min_size is the minimum size of the initial prime sequence to generate
and max total is the maximum total of consecutive prime sequences generated.
generates sequences that sum up to less than max total."""
prime_partitions = make_prime_partitions(min_size)
for sequence in prime_partitions:
if len(sequence) > min_size // 2:
total = sum(sequence)
if is_prime(total) and total <= max_total:
yield sequence
if __name__ == '__main__':
start_time = time()
all_sequences = check_partitions()
longest_sequence = max(all_sequences, key=len)
print(f'Maximum length: {len(longest_sequence)}')
print(f'Maximum sum: {sum(longest_sequence)}')
print(f'Time: {time() - start_time} seconds.')
```

Solution

## Meta Information

I’m glad to see you are now tagging your questions with all of the appropriate tags. Well done.

## Project Euler

Project Euler is a series of challenging mathematical/computer programming problems. Since all problems should be solvable in less than 60 seconds on a computer, efficiency is important when attempting to solve problems. This can often mean sacrificing proper code structuring in an effort to eek out every last cycle of computing speed.

While programming exercises are essential in learning how to program, or learning how to program in a new language, the Project Euler problems are not designed to target learning aspects of any given language. They are problem solving exercises designed to tax the limits of computers if solved via naive methods. The solution isn’t to write a program; it is to write a smart program. If you are trying to learn how to program, perhaps Project Euler is not the problem set you should be tackling.

This Stack Exchange is dedicated to Reviewing Code, in order to increase the quality of written code. It is not intended as a evaluation forum for Project Euler solutions. We are not grading your solutions. More over, Project Euler asks that you not post solutions to problems in other online forums in order to allow others the euphoric experience of solving the problems themselves. If you do post your code here for review, you should first review it yourself looking for good variable names, understandable comments, no unused variables or functions, etc. You shouldn’t post a hacky “it works, please make it pretty” solution; you should be proud of code you submit to Code Review before you submit it. Then the members here will feel more inclined to help you polish the code.

## Yield

The `yield`

statement is powerful, but comes with several costs. The generator function requires its own call stack. The interpreter must keep switching back and forth between the tasks, which takes time, evicts memory out of cache, so takes even more time. The `yield`

statement slows down code.

If you can write the code without the `yield`

, your code will be faster. It will be more understandable, too, because functions run to completion. There are uses for `yield`

, and there are times when you trying to avoid `yield`

is more work than its worth. This isn’t it.

## Doc-Strings

Doc-strings are designed to be extracted by various tools and turned into help documentation for other users to read in order to understand what the function does, and what it requires. In a Python shell, try out `help(make_prime_partitions)`

. Does the description make sense? It says it “initializes” something. But the function actually is a generator, and it returns stuff. A bit of a disconnect, and less than helpful.

Then there is this function:

```
def check_partitions(min_size=1000, max_total=1000000):
"""Assumes min_size is the minimum size of the initial prime sequence to generate
and max total is the maximum total of consecutive prime sequences generated.
generates sequences that sum up to less than max total."""
prime_partitions = make_prime_partitions(min_size)
for sequence in prime_partitions:
if len(sequence) > min_size // 2:
# ...
```

This looks like a wall of text. My aging eyes can’t see where the doc-string ends and the code begins. Putting the `"""`

delimiters on their own line, and add a blank line after the doc-string goes a long way towards improving readability.

```
def check_partitions(min_size=1000, max_total=1000000):
"""
Assumes min_size is the minimum size of the initial prime sequence to generate
and max total is the maximum total of consecutive prime sequences generated.
generates sequences that sum up to less than max total.
"""
prime_partitions = make_prime_partitions(min_size)
for sequence in prime_partitions:
if len(sequence) > min_size // 2:
# ...
```

## Magic Numbers

Where did these numbers come from?

```
def check_partitions(min_size=1000, max_total=1000000):
```

The `1000000`

is for problem you are trying to solve. It should not be a default argument. The caller, who wants to solve PE50 should pass in the desired limit.

What is the `1000`

? Is it the square-root of `max_total`

? Where did it come from and what does it mean? Is it a “minimum size”? It seems to be the maximum length of the sequence of prime numbers you generate … not a minimum of anything!

```
primes = sieve(1000000)
```

Why is `1000000`

hard-coded in the `make_prime_partitions`

function? Is this not the same as `max_total`

? Perhaps it should be passed as an argument into the function??

## Testing

When I am trying to solve a Project Euler problem, I write my main code like:

```
if __name__ == '__main__':
pe50_solve(100)
pe50_solve(1_000)
pe50_solve(1_000_000)
```

And I expect to get the output showing that 41 is the sum of 6 consecutive primes (from 2 to 13) under 100, that 953 is the sum of 21 consecutive primes (and what the start/end primes are) under 1000, and the actual desired answer to the problem.

- This gives me confidence in my solution. It works for the two examples given in the problem text.
- It requires a reasonable interface for solving the problem. No extra magic numbers are being passed in.

## Assumptions

You can speed up your code by making assumptions about the answer. For this question, you’ve are generating 1000 primes, assuming the answer will use primes in that range.

Is it reasonable?

Below 1000, longest sequence was 21 terms. So below 1 million, the longest sequence will be at least 21 terms. Since these primes are sequential, they will all be fairly close together, so we can reason about them. They would fall around an average. What would the largest possible average be? 1000000 / 21 = 41619.04… so perhaps stopping the prime generation at 50000 would be a reasonable limit? The longer the sequence, the smaller the average, and the lower the limit would become, but how can you come up with an upper limit for the longest sequence?

What is this `len(sequence) > min_size // 2`

? Oh sure, it speeds up your code. But why? Because you know the answer’s sequence length is between 500 and 1000? What if the question is changed to a limit of 750 thousand? 2 million? 2.5 million? 10 million? Is it still valid? You only know it is valid for the 1 million case because you’ve already run it the slow way, and found the answer. But it isn’t justifiable.

`is_prime`

In this function, you are going through successive numbers up to n−−√$\sqrt{n}$ looking for any number that evenly divides n. This is an O(n−−√)$O(\sqrt{n})$ operation. If only you could weed out some of the divided you don’t need to check. You skip the even numbers by counting by 2’s, but you don’t skip `9`

, or `15`

, or `21`

, or `25`

. If only you generated some prime numbers you could use as a list of possible divider, you could reduce this to O(n√logn√)$O(\frac{\sqrt{n}}{\mathrm{log}\sqrt{n}})$. Oh wait, you have generated a list of prime numbers … why aren’t you using it?

Better yet, the `sieve()`

produces an array of boolean flags in the `initial`

list, and `initial[x]`

is `True`

if `x`

is prime. You could make the `is_prime(x)`

function O(1)$O(1)$. But, you’d have to expose `initial`

to the outside world, instead of it being an internal workspace. And no I don’t recommend using `global initial`

; there is a better way. You could simply `return initial`

from your sieve. But then it can’t be a generator, but I advocated getting rid of the `yield`

above anyway.