Haskell tool for find @SuppressWarnings in Java source code

Posted on

Problem

I put together a little tool for finding where people have suppressed warnings, and to see if they have commented why. Its in Haskell for fun and some practice. Its split into two files, the thinking here was to keep all of the IO functions in one place and the pure parser code in another.
My immediate concerns are:

  • I use Text, Text.Lazy fairly indiscriminately, not sure if there is a better way?
  • It works blazingly fast on a pure source code tree but on an entire project tree, with binaries in it, it seems to stall, so I think either I’m not guarding against reading the binaries or there is a space leak?

Main.hs

{-# LANGUAGE DeriveDataTypeable #-}

module Main where

import Control.Arrow
import qualified Data.Foldable as F
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.IO as TIO
import qualified Data.Text.Lazy.IO as TLIO
import JavaParser
import System.Directory.Tree
import System.Environment
import System.FilePath
import Text.Hastache
import Text.Hastache.Context
import Data.Data
import Data.Generics
import Data.Functor.Identity

main :: IO ()
main = do
  args <- getArgs
  handleArgs args

-- | simple CL args handling before main entry point
handleArgs :: [String] -> IO ()
handleArgs [templatePath, outputPath, sourceDir] = walkFiles templatePath outputPath sourceDir
handleArgs _ = putStrLn "Usage: suppr <template-path> <output-path> <source-dir>"

-- | the real main entry point after handling CL args
walkFiles :: FilePath -> FilePath -> FilePath -> IO ()
walkFiles templatePath outputPath sourceDir = do
  tree <- readDirectoryWith TIO.readFile sourceDir
  testTemplate <- readFile templatePath
  temp <- render testTemplate . getResults $ tree
  TLIO.writeFile outputPath temp
  where
    getResults = F.foldr processFile (Tables [] []) . zipPaths .  filterAnchoredTree isJavaDirTree

-- | data structures that mainly exist to make rendering the template easy cos of deriving Data, Typeable
data Tables = Tables {unjust :: [Table], just :: [Table]} deriving (Show, Data, Typeable)
data Table = Table {path :: FilePath, suppressions :: [Line] } deriving (Show, Data, Typeable)

-- | render the Tables to a hastache template
render :: String -> Tables -> IO TL.Text
render template = hastacheStr defaultConfig (encodeStr template) . mkGenericContext

-- | Given a source file build the actual tables of unjust and just suppressions
processFile :: (FilePath, T.Text) -> Tables -> Tables
processFile (source, content) (Tables unjust just) =
  makeTables <<< addTable source unjust *** addTable source just $ getSuppressions content

-- | push out data into the record structure that hastache can understand
makeTables :: ([Table],[Table]) -> Tables
makeTables (unjust, just) = Tables unjust just

-- | Add new table to list of tables i.e. add the lines for a single file
addTable :: FilePath -> [Table] -> [Line] -> [Table]
addTable source table [] = table
addTable source table lines = Table source lines:table

-- | Predicate to filter out parts of the DirTree we dont care about i.e. non java files
isJavaDirTree :: DirTree a -> Bool
isJavaDirTree (File path _) = takeExtension path == ".java"
isJavaDirTree (Dir _ _) = True
isJavaDirTree _ = False

-- | Apply filter to dire tree preserving anchor
filterAnchoredTree :: (DirTree a -> Bool) -> AnchoredDirTree a -> AnchoredDirTree a
filterAnchoredTree f (b:/a) = b :/ filterDir f a

JavaParser.hs

{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE OverloadedStrings #-}

module JavaParser
( Line(..),
  getSuppressions,
  javaParser,
  javaLineParser,
  suppressionParser,
  reasonParser,
  quotedString,
  skipString,
  skipToEOL
) where


import Control.Applicative
import Data.Attoparsec.Text
import Data.List
import qualified Data.Text as T
import Data.Data
import Data.Generics

-- |Represents a code line we may or may not be interested in
--
-- * 'Justified' is a warning suppression with a justification (good)
--
-- * 'Unjustified' is a warning suppression without a justification (bad)
--
-- * 'Code' is any other line (uninteresting)
data Line = Justified {rule :: T.Text, reason :: T.Text}
          | Unjustified {rule :: T.Text}
          | Code deriving (Show, Eq, Data, Typeable)

isJustified :: Line -> Bool
isJustified (Justified _ _) = True
isJustified _ = False


-- |Tell us which lines are interesting
-- True if a 'Justified' or 'Unjustifed' 'Line', False if a code line
isSuppression :: Line -> Bool
isSuppression Code = False 
isSuppression _ = True

-- |Return all the suppresssions in the text
getSuppressions :: T.Text -> ([Line],[Line])
getSuppressions content = case parseOnly javaParser content of
                           Right result -> partition (not.isJustified) $ filter isSuppression result
                           _ -> ([],[])

-- |Parse all lines
javaParser :: Parser [Line]
javaParser = many1 javaLineParser

-- |Parse a single java line
javaLineParser :: Parser Line
javaLineParser = suppressionParser <|> skipToEOL *> pure Code

-- |Try to parse a line as a suppression, either 'Justified' or 'Unjustified'
suppressionParser :: Parser Line
suppressionParser = do
  rule <- ruleParser
  option (Unjustified rule) $ fmap (Justified rule) reasonParser <* skipSpace

-- |Try to parse a rule
ruleParser :: Parser T.Text
ruleParser = annotationParser *> char '(' *> skipSpace *>
             quotedString <* skipSpace <* char ')'

-- |Try to parse the annotations we are interested in
annotationParser :: Parser ()
annotationParser = skipString "@SuppressWarnings" <|> skipString "@SuppressFBWarnings"

-- |Try to parse a reason (i.e. a comment)
reasonParser :: Parser T.Text
reasonParser = skipString "//" *> Data.Attoparsec.Text.takeWhile (/= 'n') <?> "reasonParser"

-- |Try to parse a string in quotes returning the string without quotes
quotedString :: Parser T.Text
quotedString = char '"' *> Data.Attoparsec.Text.takeWhile (/= '"') <* char '"' <?> "quotedString"

-- |Skip a specific string plus any whitespace before and after
skipString :: T.Text -> Parser ()
skipString str = skipSpace *> string str *> skipSpace <?> ("skipString " ++ T.unpack str)

-- |Skip all chars upto and including the end of line
skipToEOL :: Parser ()
skipToEOL = skipWhile (/= 'n') *> endOfLine

repo is at https://github.com/j-a-k/suppr

Solution

  1. haskell is Lazy, so should you be. You should consider using readDirectoryWithL over readDirectoryWith. That should have no heavy performance implications but still allow us to save a lot of memory.

  2. testTemplate is a bit confusing as a name. Why not outputTemplate? This IMO fits the usage better.

  3. You’re not quite consistent as to when you use import qualified. I’d have expected Data.Attoparsec.Text to be imported qualified.

  4. UX for erroneous arguments is bad. I’m not told that outputPath and templatePath better be files instead of directories. I’m not quite sure as to how TLIO handles this, but I expect problems in case this is used incorrectly.

  5. Nitpick + Design Critique: I’d prefer a multiline comment when explaining what Line is / does. You could even go so far as to not force the grouping into Justified and Unjustified into the types you define and instead use the following Line:

    data Line = Suppression { rule :: T.Text, reason :: Maybe T.Text }
              | Code deriving (Show, Eq, Data, Typeable)
    

    Then again this seems like bad FP advice from a mostly OO person, so … Note that it would make isSuppression a bit more straightforward and greatly simplify suppressionParser and functions called by it. It also makes sure you’re not going to have that an easy time partitioning that list.
    The partition (not.isJustified) shouldn’t be the parser’s responsibility. getSuppressions should just return a [Line] consuming the results is the caller’s responsibility.

  6. The Data.Attoparsec.ByteString documentation explicitly states to use takeWhile1 over many1 where possible. To be more correct it encourages using the ByteString parsers over the others. In the current case I’m reasonably sure you can rework your code to use scan and consume the lines you’re looking for.

Leave a Reply

Your email address will not be published. Required fields are marked *