Kotlin program to summarize git commits by regex match

Posted on

Problem

I am trying to learn Kotlin. I am coming from some background in Java. As a learning exercise, I wrote this simple program to summarize occurrences of a string by author email in a list of git repositories.

I am curious if I am approaching things in an idiomatic Kotlin way. Extension functions are awesome but I suspect it’s something that I should not overuse.

I plan to make this a command line application and maybe use recursion to get the git repositories from the source directory without explicitly providing them. Before I get too far though I would like to see how I can improve what I have.

Any feedback is appreciated!

import java.io.File
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.concurrent.TimeUnit
import java.util.regex.Pattern
import kotlin.concurrent.thread

data class CommitSummary(
        var author: String,
        var regexString: String,
        var count: Int)

data class Commit(
        var commit: String = "",
        var author: String = "",
        var date: OffsetDateTime? = null,
        var message: String = "")

fun main(args: Array<String>) {

    val cmd = "git log --all --branches=* --remotes=*"
    val matchStr = "d'{0,1}oh" //case-insensitive
    val gitDir = "C:\dev\git\"

    arrayListOf("repo-1", "repo-2")
            .forEach { repo ->
                thread {
                    val log = cmd.run(File(gitDir + repo), createTempFile())
                    val commits = log.parseGitLog()
                    val summaries = commits.summarizeGitMessages(matchStr)
                    summaries.forEach { author, summary ->
                        println(String.format("repo: %s, author: %s, %s count: %s", repo, author, summary.regexString, summary.count))
                    }
                }
            }
}

fun File.parseGitLog(): ArrayList<Commit> {
    return this.readLines().fold(arrayListOf()) { accumulated, current ->

        val startingWord = current.split("\s".toRegex())[0]

        when (startingWord) {
            "commit" -> accumulated.add(Commit(current.split("\s".toRegex())[1]))
            "Author:" -> accumulated.last().author = current.substring(current.indexOf("<") + 1, current.indexOf(">"))
            "Date:" -> accumulated.last().date = current.split("Date:")[1].trim().parseGitDateString()
            else -> accumulated.last().message += current.trim()
        }

        return@fold accumulated
    }
}

fun ArrayList<Commit>.summarizeGitMessages(regexString: String): MutableMap<String, CommitSummary> {
    val pattern = Pattern.compile(regexString, Pattern.MULTILINE or Pattern.CASE_INSENSITIVE)

    return this.fold(mutableMapOf()) { acc, commit ->
        val summary: CommitSummary = acc.getOrPut(commit.author) { CommitSummary(commit.author, regexString, 0) }
        val match = pattern.matcher(commit.message)
        while (match.find()) {
            summary.count = summary.count.plus(1)
        }
        return@fold acc
    }
}

// string extensions
fun String.run(workingDir: File, targetFile: File = File("C:\dev\tmpFile")): File {

    if (targetFile.exists()) {
        targetFile.delete()
    }

    ProcessBuilder(*split(" ").toTypedArray())
            .directory(workingDir)
            .redirectOutput(ProcessBuilder.Redirect.appendTo(targetFile))
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start()
            .waitFor(60, TimeUnit.MINUTES)

    return targetFile
}

fun String.parseGitDateString(format: DateTimeFormatter = DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss yyyy Z")): OffsetDateTime {
    return OffsetDateTime.parse(this, format)
}

Solution

You already got some remarks, I think in general you could look at how
command line arguments could be used to pass in what’s currently hard
coded and make it a more generally usable command line tool. Overall
looking good to though, the folds are nice.


Code-wise:

  • The point about data classes is good, in production code I’d perhaps
    consider a builder pattern since you definitely need to accumulate the
    state over multiple steps before you have everything to construct the
    instance. But, that would then make it possible to define it
    immutably!
  • summary.count = summary.count.plus(1) might be valid in Kotlin, but
    still ++summary.count/summary.count++/summary += 1 are likely
    all a bit better (less verbose).
  • The whole loop though can also be phrased more succinctly:
    summary.count += pattern.matcher(commit.message).results().count().toInt(),
    c.f. this SO post.
  • I’m not recommending it “just because”, but take a look at
    Kotlin’s string templates.
  • Since targetFile.delete() will only raise an exception via the
    security manager, so you can just call it without the extra check,
    eliminating a few more lines. The return value will of course still
    be ignored, which is fine.
  • In fact, if you’re just concerned with whether the invoked process has
    a clean file to write to, ProcessBuilder with the regular .to()
    redirect
    will truncate the output file
    via the created FileOutputStream.
  • Apart from the extension method comment I’d also leave out the default
    argument for the target. Or, if you want a default, create a new
    temporary file by default and return that instead of a fixed one.
  • If this were production code I’d also recommend that the regex splits
    have the limit set since you only ever use the first or second
    result anyway. And of course to reuse the result and not split
    multiple times.
  • Not quite sure why multiple threads are spawned. In any case the
    output might be mangled pretty easily without any locking going on.
  • In case you want to embrace the more functional parts of Kotlin,
    consider replacing return@fold acc with … just acc.
  • git can also be convinced to produce dates in some
    more standard formats, meaning you
    can also find some already existing constants in libraries (or even
    the standard library, not sure) that handle them. In general you’d
    want to define constants for these things anyway.
  • And I’ve just noticed you’re concatenating things for File, please
    use Path so it works on more than a single OS.

I have never used Kotlin, so I can’t give a comprehensive review or comment particularly on specific idioms.

I’m slightly bothered by this line, because null objects seems very much like a Java habit which Kotlin’s paradigm generally tries to avoid.

var date: OffsetDateTime? = null,

It seems to me that Commit is the sort of thing whose values should be known at construction, so consider using the data class constructor.


I’m also a bit nervous about this line:

val gitDir = "C:\dev\git\"

File(gitDir + repo)

I know you’ll parameterise the list of repositories you want to use. When you do, it’s worth thinking about things like cross platform usability. I can’t give a specific recommendation, but it’s generally better to use some sort of filesystem library rather than stitching together strings like that.


// string extensions
fun String.run(workingDir: File, targetFile: File = File("C:\dev\tmpFile")): File 

Class extensions is a really cool capability, but it also feels like a dangerous use of that capabiltiy. As a rule of thumb, you wouldn’t have a class method which isn’t meaningful and legal to run for whatever instance of that class you have, and similarly be wary of adding extensions that don’t make sense of most versions of a class. Since matchStr.run(...) wouldn’t make any sense I’d be suspicious of extending String like this. I’d say either have a specific class for runnable strings, or just have a function that takes the string to run as a parameter.

Leave a Reply

Your email address will not be published. Required fields are marked *