Problem

There are multiple Project Euler problems (e.g. 7, 10) that deal with prime numbers. I decided make a reusable solution that could be used for most of them.

I wrote a method to generate an *infinite** prime number sequence, which would enumerate lazily (that is – computing only when requested), using `yield return`

. At this point I can use the whole power of LINQ, MoreLINQ, and any other `IEnumerable`

extension method.

It doesn’t perform great though – finding prime numbers below 2 million takes over a second. I don’t think I can use a sieve as I don’t know the upper bound. What could be done to improve it? Any other non-performance improvements?

_{*The sequence isn’t really infinite, only up to 263-1 or until an OutOfMemoryException. }

**The prime sequence generator:**

```
class PrimeSequenceGenerator : ISequenceGenerator
{
public IEnumerable<long> GetSequence()
{
var primeCache = new List<long>();
long curentNumber = 2;
while (true)
{
var isPrime = true;
var currentRoot = Math.Sqrt(curentNumber);
foreach (var cachedPrime in primeCache)
{
if (curentNumber % cachedPrime == 0)
{
isPrime = false;
break;
}
if (cachedPrime > currentRoot)
{
break;
}
}
if (isPrime)
{
primeCache.Add(curentNumber);
yield return curentNumber;
}
curentNumber++;
}
}
}
```

**An interface for all my sequence generators:**

```
interface ISequenceGenerator
{
IEnumerable<long> GetSequence();
}
```

**Usage:**

```
var generator = new PrimeSequenceGenerator();
// Euler problem 7 - Find the 10001st prime.
var problem7Solution = generator.GetSequence().ElementAt(10001 - 1);
// Euler problem 10 - Find the sum of all primes below two million.
var problem10Solution = generator.GetSequence().TakeUntil(x => x > 2000000).SkipLast(1).Sum();
```

Solution

For small primes – i.e. up to 2^32 and maybe up to 2^40..2^50ish regions – you can improve speed by several orders of magnitude if you use a Sieve of Eratosthenes.

In order to get decent speed the factor primes – i.e. those up to `sqrt(limit)`

, whatever your limit might be – should be sieved up front and stored in an array, alongside with the current working offset for each prime (initialised to the square of the prime). The sieving of the small factor primes can also be done with a – simpler – Sieve of Eratosthenes.

The size of the sieve should be equal to the L1 cache size of the expected target machine (usually 32 KByte), and for maximum efficiency you should use a packed odds-only bitmap. `BitArray`

is slow as molasses, so either use `bool[1<<15]`

, or `byte[1<<15]`

with manual bit indexing (i.e. `sieve[i>>3] |= 1<<(i&7)`

to set bit i). Using wheels like mod30 can improve efficiency further, but only in connection with major unrolling of loops and lots of other complications.

Another performance boost is offered by presieving. Instead of clearing the sieve prior to sieving a new window’s worth of primes, you simply blast over the sieve a pre-computed bit pattern that corresponds to sieving by a certain number of (odd) small primes. Presieving up to 13 hits a sweet spot in a situation like this, but going up to 17 or 19 can realise further (tiny) speed gains at the cost of much increased size of the precomputed pattern.

In order to illustrate what the code would look like and how the various techniques connect, here’s the actual sieving function from my SPOJ PRIME1 submission:

```
internal static byte[] presieve_pattern;
internal static int presieve_modulus;
internal static int first_sieve_prime_index;
internal static byte[] sieve;
internal static int[] result;
internal static int result_size;
// ...
internal static void sieve_primes_between_ (int m, int n)
{
if (m > n)
return;
Trace.Assert(first_sieve_prime_index == 0 || m > small_odd_primes[first_sieve_prime_index - 1]);
Trace.Assert((m & 1) != 0 && (n & 1) != 0);
Trace.Assert(m > 1);
int sqrt_n = (int)Math.Sqrt(n);
int max_bit = (n - m) >> 1; // (the +1 is redundant here since both are odd)
Trace.Assert(max_bit < sieve.Length * 8);
if (first_sieve_prime_index == 0) // without presieving
Array.Clear(sieve, 0, (max_bit >> 3) + 1);
else
{
int base_bit = (m >> 1) % presieve_modulus;
while ((base_bit & 7) != 0)
base_bit += presieve_modulus;
Buffer.BlockCopy(presieve_pattern, base_bit >> 3, sieve, 0, (max_bit >> 3) + 1);
}
for (int i = first_sieve_prime_index, e = small_odd_primes.Length; i < e; ++i)
{
int prime = small_odd_primes[i];
if (prime > sqrt_n)
break;
int start = prime * prime, stride = prime << 1;
if (start < m)
{
start = (stride - 1) - (m - start - 1) % stride;
}
else
start -= m;
for (int j = start >> 1; j <= max_bit; j += prime)
sieve[j >> 3] |= (byte)(1 << (j & 7));
}
Trace.Assert(m > 3 && m % 3 != 0);
for (int i = 0, d = 3 - m % 3, e = (n - m) >> 1; i <= e; i += d, d ^= 3)
if ((sieve[i >> 3] & (1 << (i & 7))) == 0)
result[result_size++] = m + (i << 1); // m is odd already
}
```

The code should answer quite a few questions that I simply glossed over in the text. This version is geared towards experimentation with various aspects (like number of presieve primes and so on), and thus it lacks a lot of – uglifying and code-bloating – optimisations.

The wrapper function needs to take care of things like non-sieve primes and normalising arguments, which is why I’m showing it here as well:

```
internal static void sieve_primes_between (int m, int n)
{
result_size = 0;
if (m < small_odd_primes[first_sieve_prime_index])
{
for (int i = -1; i < first_sieve_prime_index; ++i)
{
int prime = i < 0 ? 2 : small_odd_primes[i];
if (m <= prime && prime <= n)
{
result[result_size++] = prime;
}
}
m = small_odd_primes[first_sieve_prime_index];
}
n -= 1 - (n & 1); // make it odd
m |= 1; // make it odd
m += m % 3 == 0 ? 2 : 0; // skip if multiple of 3
sieve_primes_between_(m, n);
}
```

This illustrates the kind of complications that you have to deal with when certain optimisations are applied, like special handling for the wheel primes (in this case only the number 2) and the presieve primes.

What is not shown here – because it does not fit the profile of the PRIME1 task – is the remembering of offsets. I don’t have working C# code that uses the technique at the moment, because I just started learning the language. Remembering the working offsets shouldn’t be too difficult to implement, despite the peculiarities of C# like lack of reference types.

The code takes 3..4 milliseconds to sieve 10 x 100,000 numbers just below the PRIME1 limit (10^9). A version using `bool[]`

could be marginally faster for the low PRIME1 limit but certain further optimisations require the use of `byte[]`

or `int[]`

, especially as C# has major performance problems with indexing.

This type of sieve performs decently in the 32-bit range. It can be modified to operate beyond 32 bits but in higher regions of the 64-bit range it gets bogged down by the sheer number of potential factor primes (hundreds of millions, at the end of the range) each of which needs to be considered for sieving each window unless a technique like bucketing is used.