Posted on

Problem

I’m working on a task from the one of Coursera programming courses. The purpose is to estimate the amount of inversions that are required to sort the array of numbers.

Also I’m trying to learn some Haskell basics, here is my solution:

fsthalf :: [a] -> [a]
fsthalf xs = take (length xs `div` 2) xs

sndhalf :: [a] -> [a]
sndhalf xs = drop (length xs `div` 2) xs

union :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
union (a, ac) (b, bc) = (a++b, ac+bc)

merge :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
merge ([], ac) (bs, bc) = (bs, ac + bc)
merge (as, ac) ([], bc) = (as, ac + bc)
merge (a:as, ac) (b:bs, bc)
| a <= b = ([a], 0) `union` merge (as, ac) (b:bs, bc)
| otherwise = ([b], length(a:as)) `union` merge (a:as, ac) (bs, bc)

mergeSort :: ([Int], Int) -> ([Int], Int)
mergeSort (s, c)
| length s `elem` [0, 1]  = (s, c)
| otherwise = merge (mergeSort (fsthalf s, c)) (mergeSort (sndhalf s, c))

main :: IO()
main = do
_ <- readLn :: IO Int
rawValues <- getLine
let intValues = map read \$ words rawValues :: [Int]
print \$ snd \$ mergeSort (intValues, 0)

It seems to work properly, but exceeds time limits. It takes from 9 to 10 seconds to sort the longest possible list of integers (100000 values), while the limit is 6 seconds. Could anyone please point out some possible optimizations for this solution.

Solution

General review tips:

• It’d be better to document your code. In particular functions like

union :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)

need documentation – it’s not clear which Int has what purpose at all without examining the source.

• Some of your functions are polymorphic (sndhalf), others not. Why? I’d suggest to have all types consistent. Either just sort Ints, or make the types as generic as possible, for example

union :: (Ord a) => ([a], Int) -> ([a], Int) -> ([a], Int)

Notice how this generality also makes the types easier to understand.

• I’d suggest to wrap list + its inversion count into a newtype such as:

newtype List a = List [a] Int

This makes the code better structured, and also allows you to use smart
constructors
, if you decide
so.

sort
for inspiration.

Now specific to performance:
– For each pass you traverse the list several times,
which can impact the performance considerably. In particular: take and
length in fsthalf, the same in sndhalf. Also length in mergeSort. This
makes it 5 traversals before the merging pass. I’d try to reduce this number.

• Matching on length is wasteful, if you need just to distinguish if it’s 0 or 1, you traverse the whole list needlessly:

mergeSort (s, c) | length s `elem` [0, 1]  = (s, c)