Problem
I am trying to learn Kotlin. I am coming from some background in Java. As a learning exercise, I wrote this simple program to summarize occurrences of a string by author email in a list of git repositories.
I am curious if I am approaching things in an idiomatic Kotlin way. Extension functions are awesome but I suspect it’s something that I should not overuse.
I plan to make this a command line application and maybe use recursion to get the git repositories from the source directory without explicitly providing them. Before I get too far though I would like to see how I can improve what I have.
Any feedback is appreciated!
import java.io.File
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import java.util.concurrent.TimeUnit
import java.util.regex.Pattern
import kotlin.concurrent.thread
data class CommitSummary(
var author: String,
var regexString: String,
var count: Int)
data class Commit(
var commit: String = "",
var author: String = "",
var date: OffsetDateTime? = null,
var message: String = "")
fun main(args: Array<String>) {
val cmd = "git log --all --branches=* --remotes=*"
val matchStr = "d'{0,1}oh" //case-insensitive
val gitDir = "C:\dev\git\"
arrayListOf("repo-1", "repo-2")
.forEach { repo ->
thread {
val log = cmd.run(File(gitDir + repo), createTempFile())
val commits = log.parseGitLog()
val summaries = commits.summarizeGitMessages(matchStr)
summaries.forEach { author, summary ->
println(String.format("repo: %s, author: %s, %s count: %s", repo, author, summary.regexString, summary.count))
}
}
}
}
fun File.parseGitLog(): ArrayList<Commit> {
return this.readLines().fold(arrayListOf()) { accumulated, current ->
val startingWord = current.split("\s".toRegex())[0]
when (startingWord) {
"commit" -> accumulated.add(Commit(current.split("\s".toRegex())[1]))
"Author:" -> accumulated.last().author = current.substring(current.indexOf("<") + 1, current.indexOf(">"))
"Date:" -> accumulated.last().date = current.split("Date:")[1].trim().parseGitDateString()
else -> accumulated.last().message += current.trim()
}
return@fold accumulated
}
}
fun ArrayList<Commit>.summarizeGitMessages(regexString: String): MutableMap<String, CommitSummary> {
val pattern = Pattern.compile(regexString, Pattern.MULTILINE or Pattern.CASE_INSENSITIVE)
return this.fold(mutableMapOf()) { acc, commit ->
val summary: CommitSummary = acc.getOrPut(commit.author) { CommitSummary(commit.author, regexString, 0) }
val match = pattern.matcher(commit.message)
while (match.find()) {
summary.count = summary.count.plus(1)
}
return@fold acc
}
}
// string extensions
fun String.run(workingDir: File, targetFile: File = File("C:\dev\tmpFile")): File {
if (targetFile.exists()) {
targetFile.delete()
}
ProcessBuilder(*split(" ").toTypedArray())
.directory(workingDir)
.redirectOutput(ProcessBuilder.Redirect.appendTo(targetFile))
.redirectError(ProcessBuilder.Redirect.INHERIT)
.start()
.waitFor(60, TimeUnit.MINUTES)
return targetFile
}
fun String.parseGitDateString(format: DateTimeFormatter = DateTimeFormatter.ofPattern("EEE MMM d HH:mm:ss yyyy Z")): OffsetDateTime {
return OffsetDateTime.parse(this, format)
}
Solution
You already got some remarks, I think in general you could look at how
command line arguments could be used to pass in what’s currently hard
coded and make it a more generally usable command line tool. Overall
looking good to though, the fold
s are nice.
Code-wise:
- The point about data classes is good, in production code I’d perhaps
consider a builder pattern since you definitely need to accumulate the
state over multiple steps before you have everything to construct the
instance. But, that would then make it possible to define it
immutably! summary.count = summary.count.plus(1)
might be valid in Kotlin, but
still++summary.count
/summary.count++
/summary += 1
are likely
all a bit better (less verbose).- The whole loop though can also be phrased more succinctly:
summary.count += pattern.matcher(commit.message).results().count().toInt()
,
c.f. this SO post. - I’m not recommending it “just because”, but take a look at
Kotlin’s string templates. - Since
targetFile.delete()
will only raise an exception via the
security manager, so you can just call it without the extra check,
eliminating a few more lines. The return value will of course still
be ignored, which is fine. - In fact, if you’re just concerned with whether the invoked process has
a clean file to write to,ProcessBuilder
with the regular.to()
redirect
will truncate the output file
via the createdFileOutputStream
. - Apart from the extension method comment I’d also leave out the default
argument for the target. Or, if you want a default, create a new
temporary file by default and return that instead of a fixed one. - If this were production code I’d also recommend that the regex splits
have thelimit
set since you only ever use the first or second
result anyway. And of course to reuse the result and not split
multiple times. - Not quite sure why multiple threads are spawned. In any case the
output might be mangled pretty easily without any locking going on. - In case you want to embrace the more functional parts of Kotlin,
consider replacingreturn@fold acc
with … justacc
. git
can also be convinced to produce dates in some
more standard formats, meaning you
can also find some already existing constants in libraries (or even
the standard library, not sure) that handle them. In general you’d
want to define constants for these things anyway.- And I’ve just noticed you’re concatenating things for
File
, please
usePath
so it works on more than a single OS.
I have never used Kotlin, so I can’t give a comprehensive review or comment particularly on specific idioms.
I’m slightly bothered by this line, because null objects seems very much like a Java habit which Kotlin’s paradigm generally tries to avoid.
var date: OffsetDateTime? = null,
It seems to me that Commit
is the sort of thing whose values should be known at construction, so consider using the data class constructor.
I’m also a bit nervous about this line:
val gitDir = "C:\dev\git\"
File(gitDir + repo)
I know you’ll parameterise the list of repositories you want to use. When you do, it’s worth thinking about things like cross platform usability. I can’t give a specific recommendation, but it’s generally better to use some sort of filesystem library rather than stitching together strings like that.
// string extensions
fun String.run(workingDir: File, targetFile: File = File("C:\dev\tmpFile")): File
Class extensions is a really cool capability, but it also feels like a dangerous use of that capabiltiy. As a rule of thumb, you wouldn’t have a class method which isn’t meaningful and legal to run for whatever instance of that class you have, and similarly be wary of adding extensions that don’t make sense of most versions of a class. Since matchStr.run(...)
wouldn’t make any sense I’d be suspicious of extending String
like this. I’d say either have a specific class for runnable strings, or just have a function that takes the string to run as a parameter.