Posted on

Problem

I’m working on a task from the one of Coursera programming courses. The purpose is to estimate the amount of inversions that are required to sort the array of numbers.   Also I’m trying to learn some Haskell basics, here is my solution:

``````fsthalf :: [a] -> [a]
fsthalf xs = take (length xs `div` 2) xs

sndhalf :: [a] -> [a]
sndhalf xs = drop (length xs `div` 2) xs

union :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
union (a, ac) (b, bc) = (a++b, ac+bc)

merge :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
merge ([], ac) (bs, bc) = (bs, ac + bc)
merge (as, ac) ([], bc) = (as, ac + bc)
merge (a:as, ac) (b:bs, bc)
| a <= b = ([a], 0) `union` merge (as, ac) (b:bs, bc)
| otherwise = ([b], length(a:as)) `union` merge (a:as, ac) (bs, bc)

mergeSort :: ([Int], Int) -> ([Int], Int)
mergeSort (s, c)
| length s `elem` [0, 1]  = (s, c)
| otherwise = merge (mergeSort (fsthalf s, c)) (mergeSort (sndhalf s, c))

main :: IO()
main = do
_ <- readLn :: IO Int
rawValues <- getLine
let intValues = map read \$ words rawValues :: [Int]
print \$ snd \$ mergeSort (intValues, 0)
``````

It seems to work properly, but exceeds time limits. It takes from 9 to 10 seconds to sort the longest possible list of integers (100000 values), while the limit is 6 seconds. Could anyone please point out some possible optimizations for this solution.

Solution

General review tips:

• It’d be better to document your code. In particular functions like

``````union :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
``````

need documentation – it’s not clear which `Int` has what purpose at all without examining the source.

• Some of your functions are polymorphic (`sndhalf`), others not. Why? I’d suggest to have all types consistent. Either just sort `Int`s, or make the types as generic as possible, for example

``````union :: (Ord a) => ([a], Int) -> ([a], Int) -> ([a], Int)
``````

Notice how this generality also makes the types easier to understand.

• I’d suggest to wrap list + its inversion count into a `newtype` such as:

``````newtype List a = List [a] Int
``````

This makes the code better structured, and also allows you to use smart
constructors
, if you decide
so.

`sort`
for inspiration.

Now specific to performance:
– For each pass you traverse the list several times,
which can impact the performance considerably. In particular: `take` and
`length` in `fsthalf`, the same in `sndhalf`. Also `length` in `mergeSort`. This
makes it 5 traversals before the merging pass. I’d try to reduce this number.

• Matching on `length` is wasteful, if you need just to distinguish if it’s 0 or 1, you traverse the whole list needlessly:

``````mergeSort (s, c) | length s `elem` [0, 1]  = (s, c)
``````

``````mergeSort ([], _) = ([], 0)
• Make sure you’re compiling with `-O`.