- Create a function called
wordsAndSpacesthat splits a string on groups of spaces, preserving each group of space characters as a single token.
wordsAndSpaces "extra spaces"gives
["extra", " ", "spaces"]
- Input strings will contains only alphanumeric characters and space characters (no need to handle special characters)
- Only Prelude functions are allowed (no imports e.g.
- I’m looking specifically for feedback on how I can break thought patterns I have developed from working in imperative languages. My code likely has the telltale signs of a recovering Java programmer 🙂
wordsAndSpaces :: String -> [String] wordsAndSpaces = wordsAndSpaces'  wordsAndSpaces' :: [String] -> String -> [String] wordsAndSpaces' tokens  = tokens wordsAndSpaces' tokens (char : xs) | shouldAddToLastToken char tokens = wordsAndSpaces' (addToLastToken char tokens) xs | otherwise = wordsAndSpaces' (tokens ++ [[char]]) xs shouldAddToLastToken :: Char -> [String] -> Bool shouldAddToLastToken c tokens | null tokens = False | otherwise = (isSpace c && (isSpace . last . last) tokens) || ((not . isSpace) c && (not . isSpace . last . last) tokens) addToLastToken :: Char -> [String] -> [String] addToLastToken char tokens = init tokens ++ [last tokens ++ [char]] isSpace :: Char -> Bool isSpace = (== ' ')
The big thing that jumped out to me was how you seem to be treating Haskell lists like Java… Arrays? (I haven’t used Java since… 2006?) If you’re repeatedly fiddling with the elements at the end of lists, you’re probably doing something wrong. I’m not sure what the complexity of this all is, but I think it might be something like O(m2n2) where m is the length of the original string and n is the number of tokens. Haskell lists are linked lists, finding arbitrary elements (of which the
last is one) is an O(n) operation, as is appending any element to the tail of a list (
I think largely as you gain familiarity with the Prelude you’ll begin to intuit the shape of efficient Haskell algorithms. Now that’s not to say Haskell doesn’t also have arrays, but we tend to use them differently. (How exactly that is being way out of the scope of this question.)
In this case the concept you’re looking for is a
span. That is, the part of a list at the front that matches a predicate, and then the rest. In this case we care about the predicates
not . isSpace.
-- REPL session, if you're not familiar with REPLs yet coming from Java, check out GHCi > span isSpace " word word" (" ","word word") > span (not . isSpace) "wordword word word" ("wordword"," word word")
span is in the Prelude, but it’s worth taking a crack at implementing it yourself to start building an intuition for living in Linked List World. Try it before you move on and I start to really blow your mind.
Here’s a solution using
span. Spoiler’d so you can figure it out first.
wordsAndSpaces :: String -> [String] wordsAndSpaces  =  wordsAndSpaces cs@(c:_) = case isSpace c of True -> let (w, rest) = span isSpace cs in w : wordsAndSpaces rest False -> let (w, rest) = break isSpace cs in w : wordsAndSpaces rest -- break p = span (not . p), just a convenient name
In that solution you might notice how both tines of the fork (as it were) are very, very similar. Generally this is a hint that a handy library function exists or there’s some common trick that has a deeply unintuitive category theoretical name and/or requires a flash of abstractly reasoned insight to recognize. Luckily in this case you would just have to know about
Data.List.groupBy :: (a -> a -> Bool) -> [a] -> [[a]].
Coming from outside the Prelude
groupBy doesn’t fit your criteria for usage in this practice problem, but as with
span it’s something worth knowing for the way we think about decomposing problems. If you didn’t immediately understand what
groupBy is or does from its type, that’s more than alright for a beginner, but give it a think before reading on.
groupBy breaks a list into sublists based on which stretches of values are like one another. This probably makes most sense when you look at an example.
> groupBy (==) [1,1,2,2,2,3,4,5,5,5,5] [[1,1],[2,2,2],,,[5,5,5,5]] groupBy :: (a -> a -> Bool) -> [a] -> [[a]] isSpace :: (a -> Bool)
So the question is, how do we turn the unary predicate
isSpace into a binary predicate to pass to
groupBy? The first step to recognize is that the
Bool return values mean different things.
isSpace says whether or not a character is a space,
groupBy’s predicate is asking whether two things are the same. So we can test the result of isSpace on two elements for equality.
predicate :: Char -> Char -> Bool predicate c d = isSpace c == isSpace d
But it turns out we even do that so frequently that there’s a name for it also,
Data.Function.on :: (b -> b -> c) -> (a -> b) -> a -> a -> c. Used like—
predicate = (==) `on` isSpace
on takes a binary operation (
b -> b -> c) and a unary operation (
a -> b) and lifts the binop to work on
as, whatever those may be.
groupBy is a good example, sorts are another where you might do something like
compare `on` color, if
color had a type like
Shape -> Color. We would like to sort
Color, so you use a higher order function (
on) to build a function to do that! Try to implement
So, here’s how a Haskeller would write
wordsAndSpaces = groupBy ((==) `on` isSpace)
With that (probably, hopefully, shocking) motivating example, now try to write
groupBy. Think back to the solution using
span, consider its symmetries, and try to go one level up the abstraction stack. Ultimately Haskell coding is about solving the smallest part of a problem that you can, recognizing which higher order function can plumb that into solving a slightly larger problem, then repeating that process over and over again until you reach the end.
In this case (though I’ve kind of meandered), first we had to have figured out how to say whether or not something is a space. Next you figure out identifying a single stretch of spaces or not-spaces. Then the trick is categorizing all of the maybe-or-not-spaces. Each step in the path is something small and tackle-able on its own, and we try to consider the most general possible solution to it so that we don’t have to think about the whole problem at once.