Sieve of Sundaram for Project Euler 7

Posted on

Problem

This is a sequel to my previous question: Sieve of Sundaram for Project Euler 7: Python implementation slower than C++ and R

I am trying to implement the Sieve of Sundaram in various languages to solve Project Euler Problem #7.

I have written a solution in Haskell (to which I am still very new, but eager to learn) based loosely on the example available on this blog.

The approach is to find the upper bound for the value of the nth prime:

Bound(n)=n×(log(n)+log(log(n)))Bound(n)=n×(log(n)+log(log(n)))

Then sieve all of the numbers up to the bound.

I would like some advice on improving my style (conciseness, clarity), and the performance of my code, which is about 30x slower than the C++ solution included in my previous question.

import Data.List

bound :: Int -> Int
bound n = floor (n' * ((log n') + log (log n')))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n = sort [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i]]
    where 
        iMax = floor (sqrt (fromIntegral n))
        jMax i = quot (n - i) (2*i + 1)

removeComposites :: [Int] -> [Int] -> [Int]
removeComposites [] _          = []
removeComposites (s:ss) []     = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
    | s == c    = removeComposites ss cs
    | s > c     = removeComposites (s:ss) cs
    | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2:(removeComposites [1..n'] (marked n'))
    where 
        n' = quot (bound n) 2

pe_007 = last (take n (sieveSundaram n))
    where n = 10001

main :: IO ()
main = do
    print pe_007

Solution

The first thing to try is enable compiler optimization level 2 by adding:

{-# OPTIONS_GHC -O2 #-}

to the start of your file. This reduced my execution time by 9%.

Next I removed the limits to take advantage of Haskell’s laziness.

{-# OPTIONS_GHC -O2 #-}
-- doesn't work
import Data.List

marked :: [Int]
marked = [ i + j + 2*i*j | i <- [1..], j <- [i..]]

removeComposites :: [Int] -> [Int] -> [Int]
removeComposites [] _          = []
removeComposites (s:ss) []     = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
    | s == c    = removeComposites ss cs
    | s > c     = removeComposites (s:ss) cs
    | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: [Int]
sieveSundaram = 2:(removeComposites [1..] marked)

pe_007 = last (take n (sieveSundaram))
    where n = 100001

main :: IO ()
main = do
    print pe_007

Unfortunately that doesn’t work because marked ends up being [4,7,10,13,16,19,22,25..] which is just i=1 and j<-[1..]. To solve this I generated a list for each i and merged the lists together using a fold.

{-# OPTIONS_GHC -O2 #-}
-- compiled execution time for 100001 is 0.877s (about 27% better)

marked :: [Int]
marked = fold $ map mark [1..]
    where
        fold ((x:xs):t) = x : (xs `union` fold t)
        mark i = map (calc i) [i..]
        calc i j = i + j + 2*i*j

union :: [Int] -> [Int] -> [Int]
union a         []        = a
union []        b         = b
union (x:xs) (y:ys)
    | x < y     = x : union xs (y:ys)
    | x == y    = x : union xs ys
    | otherwise = y : union (x:xs) ys

removeComposites :: [Int] -> [Int] -> [Int]
removeComposites [] _          = []
removeComposites (s:ss) []     = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
    | s == c    = removeComposites ss cs
    | s > c     = removeComposites (s:ss) cs
    | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: [Int] 
sieveSundaram = 2:(removeComposites [1..] marked)

pe_007 = last $ take n sieveSundaram
    where n = 100001

main :: IO ()
main = do
    print pe_007

In addition to reducing the execution time by a modest amount it also makes it simpler to debug things inside ghci. You could also change the main to take n as an command line input so you wouldn’t need to recompile.

There is probably some inefficiency that I missed. For comparison, my Sieve of Eratosthenes code uses a similar merge fold and runs in 0.26s.


Update

I was curious about the drop in performance of my solution as N increased so I took some more time measurements. I also measured my Sieve of Eratosthenes code that uses a similar fold merge and a faster version I found that uses a treefold merge with a wheel.

                                10k     100k    200k    1M
Original:                       0.073   1.210   2.788   20.983
My simple fold merge:           0.038   0.875   2.634   38.835
SoE simple fold merge:          0.028   0.270   0.646    6.214
SoE treefold merge with wheel:  0.019   0.079   0.142    0.900
HJulle's Unboxed STUarray:      0.015   0.021   0.038    0.172

So it is possible to make a sieve using lists that is quite a bit faster than the one I posted. I find it useful to have the infinite list version for use in other Project Euler problems so I can avoid issues with setting the upper bound incorrectly.

As far as my SoS code goes I’m not sure why it’s so much worse than the SoE but I’m guessing it has to do with the double map in marked.

One last thing to note is the original uses much more memory (1.7GB for N=100k) than the other solutions.

Prime sieves are really not well suited for linked lists. While sorting or merging are clever way to handle the lists in a functional context, the easiest way to improve preformance is to simply use mutable arrays, so it’s more similar to the C++ version.

Here I just replaced the sorting and removeComposites with an unboxed STArray and indexing. The benifit with using STArray is that you still get a pure function, even though the array is mutable while generating. The times are now comparable with the C++ version.

import Data.Array.ST
import Data.Array.Unboxed

bound :: Int -> Int
bound n = floor (n' * (log n' + log (log n')))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n =  [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i]]
    where 
        iMax = floor (sqrt (fromIntegral n))
        jMax i = quot (n - i) (2*i + 1)

marked' :: Int -> UArray Int Bool
marked' n = runSTUArray $ do
    arr <- newArray (1,n) True
    mapM_ (i -> writeArray arr i False) (marked n)
    return arr

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2: [2*x+1 | x <- [1..n'], marked' n' ! x]
    where 
        n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
    where n = 100001

main :: IO ()
main = print pe_007

On my system, with n=200001, the times are:

Haskell, Original version: 5.930s
Haskell, Neil's version:   5.916s
Haskell, Data.IntSet:      2.853s
Haskell, Boxed STArray:    0.209s
Haskell, Unboxed STUArray: 0.055s
C++:                       0.019s

It is probably possible to improve the times even more, but this takes care of most of the difference.


Update

Continuing on the theme of using the right data structure for the job, I tried using IntSets for aditional comparison. It is about twice as fast as the list based solutions, but still a lot slower than using mutable arrays (see updated table). So if you want to stay with pure haskell98, this might be an alternative. Here is my (naive) translation. I just create the set of numbers [1,n] and the set of numbers to remove (from marked) and take the difference.

import qualified Data.IntSet as IS

bound :: Int -> Int
bound n = floor $ n' * (log n' + log (log n'))
    where 
        n' = fromIntegral n

marked :: Int -> [Int]
marked n =  [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i]]
    where 
        iMax = floor (sqrt (fromIntegral n))
        jMax i = quot (n - i) (2*i + 1)

marked' :: Int -> [Int]
marked' n = let
    l1 = IS.fromList [1..n]
    l2 = IS.fromList (marked n)
    in IS.toList $ l1 IS.\ l2

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2: [2*x+1 | x <- marked' n']
    where 
        n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
    where n = 200001

main :: IO ()
main = print pe_007

IntSets are a really good data structure if you need to repeatedly modify the sets in a pure context, but for this problem nothing can beat an unboxed mutable array.

The Sieve of Sundaram is essentially a buggy reinvention of the Sieve of Eratosthenes and it requires strictly more work for sieving the same range. Conversely, if you do due diligence when implementing Sundaram’s and add one fairly obvious optimisation then you end up with an implementation of the SoE. See my answer in Sieve of Eratosthenes has huge ‘overdraw’ – is Sundaram’s better after all? for the details of how getting Sundaram’s right makes you end up with the Greek classic.

In short: if performance matters, ignore curious museum pieces like Sundaram’s and concentrate on algorithms that have some merit (beyond the intellectual excercise).

Over on Code Review there are quite a few topics dedicated to optimising the Sieve of Eratosthenes, in various languages. There’s also a topic of mine that shows how segmented sieving can speed up things by several orders of magnitude; the sample code is C++ but the principles apply to most languages: Sieve of Eratosthenes – segmented to increase speed and range

Languages like Haskell make you go hunt for a efficient bit set implementations but apart from that most of the other lessons from these optimisation topics can be applied, with a bit of creativity. These topics also have some real performance numbers for sieving non-homoeopathic amounts of data, as a reference.

Leave a Reply

Your email address will not be published. Required fields are marked *