Problem
I’m working on a task from the one of Coursera programming courses. The purpose is to estimate the amount of inversions that are required to sort the array of numbers.
Also I’m trying to learn some Haskell basics, here is my solution:
fsthalf :: [a] -> [a]
fsthalf xs = take (length xs `div` 2) xs
sndhalf :: [a] -> [a]
sndhalf xs = drop (length xs `div` 2) xs
union :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
union (a, ac) (b, bc) = (a++b, ac+bc)
merge :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
merge ([], ac) (bs, bc) = (bs, ac + bc)
merge (as, ac) ([], bc) = (as, ac + bc)
merge (a:as, ac) (b:bs, bc)
| a <= b = ([a], 0) `union` merge (as, ac) (b:bs, bc)
| otherwise = ([b], length(a:as)) `union` merge (a:as, ac) (bs, bc)
mergeSort :: ([Int], Int) -> ([Int], Int)
mergeSort (s, c)
| length s `elem` [0, 1] = (s, c)
| otherwise = merge (mergeSort (fsthalf s, c)) (mergeSort (sndhalf s, c))
main :: IO()
main = do
_ <- readLn :: IO Int
rawValues <- getLine
let intValues = map read $ words rawValues :: [Int]
print $ snd $ mergeSort (intValues, 0)
It seems to work properly, but exceeds time limits. It takes from 9 to 10 seconds to sort the longest possible list of integers (100000 values), while the limit is 6 seconds. Could anyone please point out some possible optimizations for this solution.
Solution
General review tips:
-
It’d be better to document your code. In particular functions like
union :: ([Int], Int) -> ([Int], Int) -> ([Int], Int)
need documentation – it’s not clear which
Int
has what purpose at all without examining the source. -
Some of your functions are polymorphic (
sndhalf
), others not. Why? I’d suggest to have all types consistent. Either just sortInt
s, or make the types as generic as possible, for exampleunion :: (Ord a) => ([a], Int) -> ([a], Int) -> ([a], Int)
Notice how this generality also makes the types easier to understand.
-
I’d suggest to wrap list + its inversion count into a
newtype
such as:newtype List a = List [a] Int
This makes the code better structured, and also allows you to use smart
constructors, if you decide
so. - Investigate Haskell’s
sort
for inspiration.
Now specific to performance:
– For each pass you traverse the list several times,
which can impact the performance considerably. In particular: take
and
length
in fsthalf
, the same in sndhalf
. Also length
in mergeSort
. This
makes it 5 traversals before the merging pass. I’d try to reduce this number.
-
Matching on
length
is wasteful, if you need just to distinguish if it’s 0 or 1, you traverse the whole list needlessly:mergeSort (s, c) | length s `elem` [0, 1] = (s, c)
Instead:
mergeSort ([], _) = ([], 0) mergeSort (xs@[_], _) = (xs, 1) ...
-
Make sure you’re compiling with
-O
.
I hope this helps!