Problem
I had some time to kill today, and I found the Karate Chop Kata.
Specification:
Write a binary chop method that takes an integer search target and a sorted array of integers. It should return the
integer index of the target in the array, or -1 if the target is not
in the array.
I’ve never implemented a binary search before. So, even though all of my tests pass, I’m not sure that I’ve covered all of the corner cases. It also doesn’t look very elegant. How can I improve on this?
I’m also still not sure that I’m unit testing in a “proper” way. How can I improve them? (Please keep in mind that Rubberduck automatically inserts boilerplate for new test methods.)
Chop
Option Explicit
' Returns index of the target number in a given array.
' If not found returns -1
Public Function Chop(target As Long, Arr() As Long, Optional midpoint As Long = -1) As Long
Dim result As Long
Dim currentTest As Long
If Not IsArrayAllocated(Arr) Then
Chop = -1
Exit Function
End If
If midpoint < 0 Then
midpoint = UBound(Arr) 2 'integer division
End If
currentTest = Arr(midpoint)
If target = currentTest Then
result = midpoint
ElseIf midpoint = 0 Or (Arr(UBound(Arr)) < target) Then
result = -1 'not found
Else
If target > currentTest Then
midpoint = midpoint + (midpoint 2) ' go up by half
If midpoint > UBound(Arr) Then midpoint = UBound(Arr)
Else
midpoint = midpoint 2
End If
result = Chop(target, Arr, midpoint)
End If
Chop = result
End Function
'Borrorwed from Chip Pearson
' http://www.cpearson.com/excel/isarrayallocated.aspx
Function IsArrayAllocated(Arr As Variant) As Boolean
On Error Resume Next
IsArrayAllocated = IsArray(Arr) And _
Not IsError(LBound(Arr, 1)) And _
LBound(Arr, 1) <= UBound(Arr, 1)
End Function
Rubberduck Unit Tests
Option Explicit
Option Private Module
'@TestModule
Private Assert As New Rubberduck.AssertClass
'@TestMethod
Public Sub EmptyArrayReturnsNegativeOne()
On Error GoTo TestFail
Arrange:
Const expected As Long = -1
Dim integers() As Long
Act:
Assert:
Assert.AreEqual expected, Chop(2, integers)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub TargetNotInArrayReturnsNegativeOne()
On Error GoTo TestFail
Arrange:
Const expected As Long = -1
Const target As Long = 2
Dim integers(0) As Long
integers(0) = 3
Act:
Assert:
Assert.AreEqual expected, Chop(target, integers)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub TargetNotInArrayReturnsNegativeOne_LargerArray()
On Error GoTo TestFail
Arrange:
Const expected As Long = -1
Act:
Assert:
Assert.AreEqual expected, Chop(10, EvenSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub OddSizeArray_FirstReturnsZero()
On Error GoTo TestFail
Arrange:
Const expected As Long = 0
Act:
Assert:
Assert.AreEqual expected, Chop(1, OddSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub OddSizeArray_MiddleReturnsTwo()
On Error GoTo TestFail
Arrange:
Const expected As Long = 2
Act:
Assert:
Assert.AreEqual expected, Chop(3, OddSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub OddSizeArray_LastReturnsFour()
On Error GoTo TestFail
Arrange:
Const expected As Long = 4
Act:
Assert:
Assert.AreEqual expected, Chop(5, OddSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub EvenSizeArray_FirstReturnsZero()
On Error GoTo TestFail
Arrange:
Const expected As Long = 0
Act:
Assert:
Assert.AreEqual expected, Chop(1, EvenSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub EvenSizeArray_LastReturnsFive()
On Error GoTo TestFail
Arrange:
Const expected As Long = 5
Act:
Assert:
Assert.AreEqual expected, Chop(6, EvenSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
'@TestMethod
Public Sub EvenSizeArray_NextToLastReturnsFour()
On Error GoTo TestFail
Arrange:
Const expected As Long = 4
Act:
Assert:
Assert.AreEqual expected, Chop(5, EvenSizeArray)
TestExit:
Exit Sub
TestFail:
If Err.Number <> 0 Then
Assert.Fail "Test raised an error: #" & Err.Number & " - " & Err.Description
Else
Resume TestExit
End If
End Sub
Private Function OddSizeArray() As Long()
Dim result(0 To 4) As Long
Dim i As Long
For i = LBound(result) To UBound(result)
result(i) = i + 1
Next
OddSizeArray = result
End Function
Private Function EvenSizeArray() As Long()
Dim result(0 To 5) As Long
Dim i As Long
For i = LBound(result) To UBound(result)
result(i) = i + 1
Next
EvenSizeArray = result
End Function
Solution
I see some weird stuff going on here, let me see if I can verbalize my concerns
If target = currentTest Then
result = midpoint
ElseIf midpoint = 0 Or (Arr(UBound(Arr)) < target) Then
result = -1 'not found
Else
If target > currentTest Then
midpoint = midpoint + (midpoint 2) ' go up by half
If midpoint > UBound(Arr) Then midpoint = UBound(Arr)
Else
midpoint = midpoint 2
End If
result = Chop(target, Arr, midpoint)
End If
One of the first things I would do is to move the ElseIf
statement out front, dump out right away, but I have a feeling that this won’t happen as often as the current If
statement and that the Arr(UBound(Arr)) < target
must be more Resource intensive than comparing two objects.
so I guess that I am going to leave that one alone.
The next thing that I noticed was that you nested an If
structure inside the Else
statement, and it made me stop and think for a little bit.
So I wrote it out like this
If target = currentTest Then
result = midpoint
ElseIf midpoint = 0 Or (Arr(UBound(Arr)) < target) Then
result = -1 'not found
Else If target > currentTest Then
midpoint = midpoint + (midpoint 2) ' go up by half
If midpoint > UBound(Arr) Then midpoint = UBound(Arr)
result = Chop(target, Arr, midpoint)
Else
midpoint = midpoint 2
result = Chop(target, Arr, midpoint)
End If
But then the code isn’t dry, it repeats result = Chop(target, Arr, midpoint)
which is annoying, and now I understand why you did it like that, it smells but seems to be what is necessary.
But your If
statement structure is inconsistent, you one line in a weird spot and I almost missed it. You were also missing an End If
statement. I came up with
If target = currentTest Then
result = midpoint
ElseIf midpoint = 0 Or (Arr(UBound(Arr)) < target) Then
result = -1 'not found
Else
If target > currentTest Then
midpoint = midpoint + (midpoint 2) ' go up by half
If midpoint > UBound(Arr) Then
midpoint = UBound(Arr)
End If
Else
midpoint = midpoint 2
End If
result = Chop(target, Arr, midpoint)
End If
- Even though it matches the spec,
Chop
isn’t a very good name for
what this function does.IndexOf
would be a better name. Ubound(Arr)
gets called in this code a lot. It happens more
than enough to declare a variable for it.- All of the parameters are being implicitly passed
ByRef
which isn’t nice. (And you have no excuse. The add-in you’re using warns you about this.) -
The optional parameter for
midpoint
is okay, but it exposes implementation details to the client. It would be better to create a private function that does all of the heavy lifting. It can require that the midpoint be specified.Public Function IndexOf(ByVal target As Long, ByRef Arr() As Long) As Long IndexOf = RecursiveIndexOf(target, Arr, -1) End Function
-
This is a useless comment. If the maintainer doesn’t know the difference between
/
andthey can look it up.
midpoint = upperBoundry 2 'integer division
-
-1
is a magic number with two separate meanings. Constants should be defined.Private Const DefaultMidpoint As Integer = -1 Public Const IndexOfResultNotFound As Long = -1
-
It’s not semantically correct to initialize the midpoint if
midpoint < 0
. It’s correct to initialize it ifmidpoint = DefaultMidpoint
.If midpoint = DefaultMidpoint Then midpoint = upperBoundry 2 End If
-
It’s not quite as DRY, but an
ElseIf
here instead of anOr
improves readability. Considering it’s merely assignment, I think it’s a good change.ElseIf midpoint = 0 Or (Arr(upperBoundry) < target) Then result = IndexOfResultNotFound Else
Becomes
ElseIf midpoint = 0 Then result = IndexOfResultNotFound ElseIf Arr(upperBoundry) < target Then result = IndexOfResultNotFound Else
-
This logic can and should be extracted into a
GetNextMidpoint
function.Else If target > currentTest Then midpoint = midpoint + (midpoint 2) ' go up by half If midpoint > upperBoundry Then midpoint = upperBoundry End If Else midpoint = midpoint 2 End If result = RecursiveIndexOf(target, Arr, midpoint) End If
Which gives us the refactored code below.
Option Explicit
Private Const DefaultMidpoint As Integer = -1
Public Const IndexOfResultNotFound As Long = -1
' Returns index of the target number in a given array.
' If not found returns -1
Public Function IndexOf(ByVal target As Long, ByRef Arr() As Long) As Long
IndexOf = RecursiveIndexOf(target, Arr, DefaultMidpoint)
End Function
Private Function RecursiveIndexOf(ByVal target As Long, ByRef Arr() As Long, ByVal midpoint As Long)
Dim result As Long
Dim currentTest As Long
If Not IsArrayAllocated(Arr) Then
RecursiveIndexOf = IndexOfResultNotFound
Exit Function
End If
Dim upperBoundry As Long
upperBoundry = UBound(Arr)
If midpoint = DefaultMidpoint Then
midpoint = upperBoundry 2
End If
currentTest = Arr(midpoint)
If target = currentTest Then
result = midpoint
ElseIf midpoint = 0 Then
result = IndexOfResultNotFound
ElseIf Arr(upperBoundry) < target Then
result = IndexOfResultNotFound
Else
midpoint = GetNextMidpoint(currentTest, target, upperBoundry, midpoint)
result = RecursiveIndexOf(target, Arr, midpoint)
End If
RecursiveIndexOf = result
End Function
Private Function GetNextMidpoint(ByVal current As Long, ByVal target As Long, ByVal upperBoundry As Long, ByVal midpoint As Long) As Long
Dim result As Long
If target > current Then
result = midpoint + (midpoint 2) ' go up by half
If result > upperBoundry Then
result = upperBoundry
End If
Else
result = midpoint 2
End If
GetNextMidpoint = result
End Function
'Borrorwed from Chip Pearson
' http://www.cpearson.com/excel/isarrayallocated.aspx
Function IsArrayAllocated(Arr As Variant) As Boolean
On Error Resume Next
IsArrayAllocated = IsArray(Arr) And _
Not IsError(LBound(Arr, 1)) And _
LBound(Arr, 1) <= UBound(Arr, 1)
End Function
I’ll just comment on your unit testing code.
Your unit tests aren’t really AAA:
Arrange:
Const expected As Long = -1
Dim integers() As Long
Act:
Assert:
Assert.AreEqual expected, Chop(2, integers)
You’re doing the “Act” step in the “Assert” part, which should do nothing other than assert – like this:
Arrange:
Const expected As Long = -1
Dim integers() As Long
Dim result As Long
Act:
result = Chop(2, integers)
Assert:
Assert.AreEqual expected, result
OddSizeArray
and EvenSizeArray
are setup code that shouldn’t be called in the “Act” part (even less so in the “Assert” part).
Rubberduck unit tests allow you to move the call to that setup code outside the actual test method, so that creating the setup array doesn’t affect the measurements (although 0-1ms is arguably not a big deal).
Consider adding private fields:
Private oddSizedArray() As Long
Private evenSizedArray() As Long
And then add a TestInitialize
method that will populate them:
'@TestInitialize
Public Sub SetupTestArrays()
oddSizedArray = OddSizeArray
evenSizedArray = EvenSizeArray
End Sub
Alternatively you can drop the '@TestInitialize
marker comment and name the method like this:
Public Sub TestInitialize()
Every method called TestInitialize
or marked with a @TestInitialize
marker right above the signature line will execute before every unit test in a test module, i.e. before Rubberduck makes the timed method call to run the test method. As a result, setting up the two arrays will not count in the Duration
of a test.
The fields should “die” by themselves when all tests have executed, but if you want you can also add a TestCleanup
method that runs after every unit test, to deallocate the arrays:
Public Sub TestCleanup()
Erase oddSizedArray
Erase evenSizedArray
End Sub
In this specific case, we’re talking about nanoseconds. But in other unit testing code you could be setting up a mock of an interface implementation to supply fake content to a method that would otherwise access a database or the file system: that setup code belongs in the “Arrange” part of your tests – not in “Act”, not in “Assert”.