Problem
I put together a little tool for finding where people have suppressed warnings, and to see if they have commented why. Its in Haskell for fun and some practice. Its split into two files, the thinking here was to keep all of the IO functions in one place and the pure parser code in another.
My immediate concerns are:
- I use Text, Text.Lazy fairly indiscriminately, not sure if there is a better way?
- It works blazingly fast on a pure source code tree but on an entire project tree, with binaries in it, it seems to stall, so I think either I’m not guarding against reading the binaries or there is a space leak?
Main.hs
{-# LANGUAGE DeriveDataTypeable #-}
module Main where
import Control.Arrow
import qualified Data.Foldable as F
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy.IO as TLIO
import JavaParser
import System.Directory.Tree
import System.Environment
import System.FilePath
import Text.Hastache
import Text.Hastache.Context
import Data.Data
import Data.Generics
import Data.Functor.Identity
main :: IO ()
main = do
args <- getArgs
handleArgs args
-- | simple CL args handling before main entry point
handleArgs :: [String] -> IO ()
handleArgs [templatePath, outputPath, sourceDir] = walkFiles templatePath outputPath sourceDir
handleArgs _ = putStrLn "Usage: suppr <template-path> <output-path> <source-dir>"
-- | the real main entry point after handling CL args
walkFiles :: FilePath -> FilePath -> FilePath -> IO ()
walkFiles templatePath outputPath sourceDir = do
tree <- readDirectoryWith TIO.readFile sourceDir
testTemplate <- readFile templatePath
temp <- render testTemplate . getResults $ tree
TLIO.writeFile outputPath temp
where
getResults = F.foldr processFile (Tables [] []) . zipPaths . filterAnchoredTree isJavaDirTree
-- | data structures that mainly exist to make rendering the template easy cos of deriving Data, Typeable
data Tables = Tables {unjust :: [Table], just :: [Table]} deriving (Show, Data, Typeable)
data Table = Table {path :: FilePath, suppressions :: [Line] } deriving (Show, Data, Typeable)
-- | render the Tables to a hastache template
render :: String -> Tables -> IO TL.Text
render template = hastacheStr defaultConfig (encodeStr template) . mkGenericContext
-- | Given a source file build the actual tables of unjust and just suppressions
processFile :: (FilePath, T.Text) -> Tables -> Tables
processFile (source, content) (Tables unjust just) =
makeTables <<< addTable source unjust *** addTable source just $ getSuppressions content
-- | push out data into the record structure that hastache can understand
makeTables :: ([Table],[Table]) -> Tables
makeTables (unjust, just) = Tables unjust just
-- | Add new table to list of tables i.e. add the lines for a single file
addTable :: FilePath -> [Table] -> [Line] -> [Table]
addTable source table [] = table
addTable source table lines = Table source lines:table
-- | Predicate to filter out parts of the DirTree we dont care about i.e. non java files
isJavaDirTree :: DirTree a -> Bool
isJavaDirTree (File path _) = takeExtension path == ".java"
isJavaDirTree (Dir _ _) = True
isJavaDirTree _ = False
-- | Apply filter to dire tree preserving anchor
filterAnchoredTree :: (DirTree a -> Bool) -> AnchoredDirTree a -> AnchoredDirTree a
filterAnchoredTree f (b:/a) = b :/ filterDir f a
JavaParser.hs
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}
module JavaParser
( Line(..),
getSuppressions,
javaParser,
javaLineParser,
suppressionParser,
reasonParser,
quotedString,
skipString,
skipToEOL
) where
import Control.Applicative
import Data.Attoparsec.Text
import Data.List
import qualified Data.Text as T
import Data.Data
import Data.Generics
-- |Represents a code line we may or may not be interested in
--
-- * 'Justified' is a warning suppression with a justification (good)
--
-- * 'Unjustified' is a warning suppression without a justification (bad)
--
-- * 'Code' is any other line (uninteresting)
data Line = Justified {rule :: T.Text, reason :: T.Text}
| Unjustified {rule :: T.Text}
| Code deriving (Show, Eq, Data, Typeable)
isJustified :: Line -> Bool
isJustified (Justified _ _) = True
isJustified _ = False
-- |Tell us which lines are interesting
-- True if a 'Justified' or 'Unjustifed' 'Line', False if a code line
isSuppression :: Line -> Bool
isSuppression Code = False
isSuppression _ = True
-- |Return all the suppresssions in the text
getSuppressions :: T.Text -> ([Line],[Line])
getSuppressions content = case parseOnly javaParser content of
Right result -> partition (not.isJustified) $ filter isSuppression result
_ -> ([],[])
-- |Parse all lines
javaParser :: Parser [Line]
javaParser = many1 javaLineParser
-- |Parse a single java line
javaLineParser :: Parser Line
javaLineParser = suppressionParser <|> skipToEOL *> pure Code
-- |Try to parse a line as a suppression, either 'Justified' or 'Unjustified'
suppressionParser :: Parser Line
suppressionParser = do
rule <- ruleParser
option (Unjustified rule) $ fmap (Justified rule) reasonParser <* skipSpace
-- |Try to parse a rule
ruleParser :: Parser T.Text
ruleParser = annotationParser *> char '(' *> skipSpace *>
quotedString <* skipSpace <* char ')'
-- |Try to parse the annotations we are interested in
annotationParser :: Parser ()
annotationParser = skipString "@SuppressWarnings" <|> skipString "@SuppressFBWarnings"
-- |Try to parse a reason (i.e. a comment)
reasonParser :: Parser T.Text
reasonParser = skipString "//" *> Data.Attoparsec.Text.takeWhile (/= 'n') <?> "reasonParser"
-- |Try to parse a string in quotes returning the string without quotes
quotedString :: Parser T.Text
quotedString = char '"' *> Data.Attoparsec.Text.takeWhile (/= '"') <* char '"' <?> "quotedString"
-- |Skip a specific string plus any whitespace before and after
skipString :: T.Text -> Parser ()
skipString str = skipSpace *> string str *> skipSpace <?> ("skipString " ++ T.unpack str)
-- |Skip all chars upto and including the end of line
skipToEOL :: Parser ()
skipToEOL = skipWhile (/= 'n') *> endOfLine
repo is at https://github.com/j-a-k/suppr
Solution
-
haskell is Lazy, so should you be. You should consider using
readDirectoryWithL
overreadDirectoryWith
. That should have no heavy performance implications but still allow us to save a lot of memory. -
testTemplate
is a bit confusing as a name. Why notoutputTemplate
? This IMO fits the usage better. -
You’re not quite consistent as to when you use
import qualified
. I’d have expectedData.Attoparsec.Text
to be imported qualified. -
UX for erroneous arguments is bad. I’m not told that
outputPath
andtemplatePath
better be files instead of directories. I’m not quite sure as to how TLIO handles this, but I expect problems in case this is used incorrectly. -
Nitpick + Design Critique: I’d prefer a multiline comment when explaining what
Line
is / does. You could even go so far as to not force the grouping intoJustified
andUnjustified
into the types you define and instead use the followingLine
:data Line = Suppression { rule :: T.Text, reason :: Maybe T.Text } | Code deriving (Show, Eq, Data, Typeable)
Then again this seems like bad FP advice from a mostly OO person, so … Note that it would make
isSuppression
a bit more straightforward and greatly simplify suppressionParser and functions called by it. It also makes sure you’re not going to have that an easy time partitioning that list.
Thepartition (not.isJustified)
shouldn’t be the parser’s responsibility.getSuppressions
should just return a[Line]
consuming the results is the caller’s responsibility. -
The
Data.Attoparsec.ByteString
documentation explicitly states to usetakeWhile1
overmany1
where possible. To be more correct it encourages using the ByteString parsers over the others. In the current case I’m reasonably sure you can rework your code to usescan
and consume the lines you’re looking for.