Problem
This is a sequel to my previous question: Sieve of Sundaram for Project Euler 7: Python implementation slower than C++ and R
I am trying to implement the Sieve of Sundaram in various languages to solve Project Euler Problem #7.
I have written a solution in Haskell (to which I am still very new, but eager to learn) based loosely on the example available on this blog.
The approach is to find an upper bound for the value of the nth prime, using the standard estimate p_n < n(ln n + ln ln n) (valid for n ≥ 6), and then sieve all of the numbers up to that bound.
I would like some advice on improving my style (conciseness, clarity), and the performance of my code, which is about 30x slower than the C++ solution included in my previous question.
import Data.List

bound :: Int -> Int
bound n = floor (n' * (log n' + log (log n')))
  where
    n' = fromIntegral n

marked :: Int -> [Int]
marked n = sort [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i] ]
  where
    iMax = floor (sqrt (fromIntegral n))
    jMax i = quot (n - i) (2*i + 1)

removeComposites :: [Int] -> [Int] -> [Int]
removeComposites [] _ = []
removeComposites (s:ss) [] = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
  | s == c    = removeComposites ss cs
  | s > c     = removeComposites (s:ss) cs
  | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2 : removeComposites [1..n'] (marked n')
  where
    n' = quot (bound n) 2

pe_007 = last (take n (sieveSundaram n))
  where
    n = 10001

main :: IO ()
main = print pe_007
Solution
The first thing to try is enabling compiler optimization level 2 by adding

{-# OPTIONS_GHC -O2 #-}

to the start of your file. This reduced my execution time by 9%.
Next I removed the limits to take advantage of Haskell’s laziness.
{-# OPTIONS_GHC -O2 #-}
-- doesn't work
import Data.List

marked :: [Int]
marked = [ i + j + 2*i*j | i <- [1..], j <- [i..] ]

removeComposites :: [Int] -> [Int] -> [Int]
removeComposites [] _ = []
removeComposites (s:ss) [] = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
  | s == c    = removeComposites ss cs
  | s > c     = removeComposites (s:ss) cs
  | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: [Int]
sieveSundaram = 2 : removeComposites [1..] marked

pe_007 = last (take n sieveSundaram)
  where
    n = 100001

main :: IO ()
main = print pe_007
Unfortunately that doesn’t work, because marked ends up being [4,7,10,13,16,19,22,25..], which is just i = 1 with j <- [1..]. To solve this I generated a list for each i and merged the lists together using a fold.
{-# OPTIONS_GHC -O2 #-}
-- compiled execution time for 100001 is 0.877s (about 27% better)

marked :: [Int]
marked = fold $ map mark [1..]
  where
    fold ((x:xs):t) = x : (xs `union` fold t)
    mark i = map (calc i) [i..]
    calc i j = i + j + 2*i*j

union :: [Int] -> [Int] -> [Int]
union a [] = a
union [] b = b
union (x:xs) (y:ys)
  | x < y     = x : union xs (y:ys)
  | x == y    = x : union xs ys
  | otherwise = y : union (x:xs) ys

removeComposites :: [Int] -> [Int] -> [Int]
removeComposites [] _ = []
removeComposites (s:ss) [] = 2*s + 1 : removeComposites ss []
removeComposites (s:ss) (c:cs)
  | s == c    = removeComposites ss cs
  | s > c     = removeComposites (s:ss) cs
  | otherwise = 2*s + 1 : removeComposites ss (c:cs)

sieveSundaram :: [Int]
sieveSundaram = 2 : removeComposites [1..] marked

pe_007 = last $ take n sieveSundaram
  where
    n = 100001

main :: IO ()
main = print pe_007
In addition to reducing the execution time by a modest amount, this also makes it simpler to debug things inside GHCi. You could also change main to take n as a command-line argument so you wouldn’t need to recompile.
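That command-line change might look something like the sketch below (my own names; the trial-division primes list is just a self-contained stand-in so the example compiles on its own — substitute your sieveSundaram there):

```haskell
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- Stand-in infinite prime list (simple trial division), used here only
-- so the sketch is self-contained; replace with sieveSundaram.
primes :: [Int]
primes = 2 : filter isPrime [3,5..]
  where
    isPrime m = all (\p -> m `mod` p /= 0) (takeWhile (\p -> p*p <= m) primes)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [s] | Just n <- readMaybe s, n > 0 -> print (primes !! (n - 1))
    _                                  -> putStrLn "usage: pe007 <n>"
```

With this you can run `./pe007 10001` without recompiling for each new n.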
There is probably some inefficiency that I missed. For comparison, my Sieve of Eratosthenes code, which uses a similar merge fold, runs in 0.26s.
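The "similar merge fold" Sieve of Eratosthenes I have in mind is essentially Richard Bird's list sieve. A standalone sketch (my own names, not my exact code):

```haskell
-- Bird-style fold-merge Sieve of Eratosthenes: subtract the merged
-- streams of composites (each starting at p^2) from the candidates.
primesSoE :: [Int]
primesSoE = 2 : minus [3..] composites
  where
    -- the cons of p*p before the merge keeps the foldr productive
    composites = foldr (\p r -> let q = p*p in q : merge [q+p, q+2*p ..] r) [] primesSoE

-- difference of two ordered lists
minus :: [Int] -> [Int] -> [Int]
minus (x:xs) (y:ys)
  | x < y     = x : minus xs (y:ys)
  | x == y    = minus xs ys
  | otherwise = minus (x:xs) ys
minus xs _ = xs

-- ordered merge, dropping duplicates
merge :: [Int] -> [Int] -> [Int]
merge (x:xs) (y:ys)
  | x < y     = x : merge xs (y:ys)
  | x == y    = x : merge xs ys
  | otherwise = y : merge (x:xs) ys
merge xs ys = xs ++ ys
```

Unlike the Sundaram version, each prime contributes only its own multiples starting at p², so far less is generated and merged.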
Update
I was curious about the drop in performance of my solution as N increased, so I took some more time measurements. I also measured my Sieve of Eratosthenes code, which uses a similar fold merge, and a faster version I found that uses a treefold merge with a wheel.
                                 10k    100k    200k      1M
Original:                      0.073   1.210   2.788  20.983
My simple fold merge:          0.038   0.875   2.634  38.835
SoE simple fold merge:         0.028   0.270   0.646   6.214
SoE treefold merge with wheel: 0.019   0.079   0.142   0.900
HJulle's Unboxed STUArray:     0.015   0.021   0.038   0.172
So it is possible to make a list-based sieve that is quite a bit faster than the one I posted. I still find the infinite-list version useful for other Project Euler problems, since it avoids the risk of setting the upper bound incorrectly.
As far as my Sieve of Sundaram code goes, I’m not sure why it’s so much worse than the SoE, but I’m guessing it has to do with the double map in marked.
One last thing to note: the original uses much more memory (1.7 GB for N = 100k) than the other solutions.
Prime sieves are really not well suited to linked lists. While sorting and merging are clever ways to handle the lists in a functional context, the easiest way to improve performance is simply to use mutable arrays, which makes the solution more similar to the C++ version.
Here I just replaced the sorting and removeComposites with an unboxed STUArray and indexing. The benefit of using an STUArray is that you still get a pure function, even though the array is mutated while it is being built. The times are now comparable with the C++ version.
import Data.Array.ST
import Data.Array.Unboxed

bound :: Int -> Int
bound n = floor (n' * (log n' + log (log n')))
  where
    n' = fromIntegral n

marked :: Int -> [Int]
marked n = [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i] ]
  where
    iMax = floor (sqrt (fromIntegral n))
    jMax i = quot (n - i) (2*i + 1)

marked' :: Int -> UArray Int Bool
marked' n = runSTUArray $ do
  arr <- newArray (1, n) True
  mapM_ (\i -> writeArray arr i False) (marked n)
  return arr

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2 : [ 2*x + 1 | x <- [1..n'], marked' n' ! x ]
  where
    n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
  where
    n = 100001

main :: IO ()
main = print pe_007
On my system, with n=200001, the times are:
Haskell, original version:   5.930s
Haskell, Neil's version:     5.916s
Haskell, Data.IntSet:        2.853s
Haskell, boxed STArray:      0.209s
Haskell, unboxed STUArray:   0.055s
C++:                         0.019s
It is probably possible to improve the times even more, but this takes care of most of the difference.
Update
Continuing on the theme of using the right data structure for the job, I tried using IntSets for additional comparison. It is about twice as fast as the list-based solutions, but still a lot slower than the mutable arrays (see the updated table). So if you want to stay with pure Haskell 98, this might be an alternative. Here is my (naive) translation: I just create the set of numbers [1..n] and the set of numbers to remove (from marked) and take the difference.
import qualified Data.IntSet as IS

bound :: Int -> Int
bound n = floor $ n' * (log n' + log (log n'))
  where
    n' = fromIntegral n

marked :: Int -> [Int]
marked n = [ i + j + 2*i*j | i <- [1..iMax], j <- [i..jMax i] ]
  where
    iMax = floor (sqrt (fromIntegral n))
    jMax i = quot (n - i) (2*i + 1)

marked' :: Int -> [Int]
marked' n = let
    l1 = IS.fromList [1..n]
    l2 = IS.fromList (marked n)
  in IS.toList (l1 IS.\\ l2)

sieveSundaram :: Int -> [Int]
sieveSundaram n = 2 : [ 2*x + 1 | x <- marked' n' ]
  where
    n' = quot (bound n) 2

pe_007 = last . take n $ sieveSundaram n
  where
    n = 200001

main :: IO ()
main = print pe_007
IntSets are a really good data structure if you need to repeatedly modify the sets in a pure context, but for this problem nothing can beat an unboxed mutable array.
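To illustrate the "repeatedly modify" use case, the marked numbers can also be deleted from the candidate set one at a time instead of building a second set and taking the difference; every intermediate set is still an ordinary pure value. A sketch with illustrative names:

```haskell
import qualified Data.IntSet as IS

-- Incremental IntSet sieve: fold over the Sundaram-marked numbers,
-- deleting each from the candidate set [1..n] as it is generated.
sieveIncremental :: Int -> [Int]
sieveIncremental n = 2 : map (\x -> 2*x + 1) (IS.toAscList survivors)
  where
    survivors = foldl (flip IS.delete) (IS.fromList [1..n]) marks
    marks     = [ i + j + 2*i*j | i <- [1..iMax], j <- [i .. quot (n - i) (2*i + 1)] ]
    iMax      = floor (sqrt (fromIntegral n :: Double))
```

Each IS.delete returns a new (shared-structure) set, so any intermediate state can be kept or inspected without side effects.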
The Sieve of Sundaram is essentially a buggy reinvention of the Sieve of Eratosthenes, and it requires strictly more work for sieving the same range. Conversely, if you do due diligence when implementing Sundaram’s and add one fairly obvious optimisation, you end up with an implementation of the SoE. See my answer in Sieve of Eratosthenes has huge ‘overdraw’ – is Sundaram’s better after all? for the details of how getting Sundaram’s right makes you end up with the Greek classic.
In short: if performance matters, ignore curious museum pieces like Sundaram’s and concentrate on algorithms that have some merit (beyond the intellectual exercise).
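To make that concrete, here is a sketch (my own names, not the code from the linked answer) of Sundaram with the one optimisation applied: skip any index i whose odd number 2i+1 has already been crossed off. Since the first number each i marks is already at index 2i(i+1), i.e. (2i+1)² in odd terms, the marking pass then does exactly the work of an odds-only Sieve of Eratosthenes:

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, (!))

-- Sundaram with the "skip composite 2i+1" fix; index k stands for the
-- odd number 2k+1, so the result is the odd primes up to 2n+1.
oddPrimes :: Int -> [Int]
oddPrimes n = [ 2*k + 1 | k <- [1..n], sieve ! k ]
  where
    sieve :: UArray Int Bool
    sieve = runSTUArray $ do
      arr <- newArray (1, n) True
      -- only i with 2i(i+1) <= n can mark anything in range
      forM_ (takeWhile (\i -> 2*i*(i+1) <= n) [1..]) $ \i -> do
        unmarked <- readArray arr i
        when unmarked $                      -- 2i+1 is prime: sieve with it
          forM_ [2*i*(i+1), 2*i*(i+1) + (2*i+1) .. n] $ \k ->
            writeArray arr k False
      return arr
```

The inner loop marks k = i + j(2i+1) for j ≥ i, i.e. 2k+1 = (2i+1)(2j+1), which is precisely crossing off odd multiples of the prime 2i+1 starting at its square.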
Over on Code Review there are quite a few topics dedicated to optimising the Sieve of Eratosthenes in various languages. There’s also a topic of mine that shows how segmented sieving can speed things up by several orders of magnitude; the sample code is C++, but the principles apply to most languages: Sieve of Eratosthenes – segmented to increase speed and range.
Languages like Haskell make you go hunting for an efficient bit set implementation, but apart from that, most of the other lessons from these optimisation topics can be applied with a bit of creativity. These topics also include some real performance numbers for sieving non-homoeopathic amounts of data, for reference.