Problem
I use the following code in many of my answers on Stack Overflow, to mimic the new TEXTJOIN function available in Office 365 Excel:
Function TEXTJOIN(delim As String, skipblank As Boolean, arr)
Dim d As Long
Dim c As Long
Dim arr2()
Dim t As Long, y As Long
t = -1
y = -1
If TypeName(arr) = "Range" Then
arr2 = arr.Value
Else
arr2 = arr
End If
On Error Resume Next
t = UBound(arr2, 2)
y = UBound(arr2, 1)
On Error GoTo 0
If t >= 0 And y >= 0 Then
For c = LBound(arr2, 1) To UBound(arr2, 1)
For d = LBound(arr2, 1) To UBound(arr2, 2)
If arr2(c, d) <> "" Or Not skipblank Then
TEXTJOIN = TEXTJOIN & arr2(c, d) & delim
End If
Next d
Next c
Else
For c = LBound(arr2) To UBound(arr2)
If arr2(c) <> "" Or Not skipblank Then
TEXTJOIN = TEXTJOIN & arr2(c) & delim
End If
Next c
End If
TEXTJOIN = Left(TEXTJOIN, Len(TEXTJOIN) - Len(delim))
End Function
I have successfully made it so it works with both ranges and arrays, so it works with:
=TEXTJOIN(",",TRUE,A1:B7)
As well as
{=TEXTJOIN(",",TRUE,IF(A1:A7 = "x",B1:B7,""))}
The question is: Is this the best method to check for array vs. range? Is there a better method?
As stated this works, nicely with both, but I have to think that I am doing it in a round about manner.
Solution
Okay, let’s first eliminate the easy stuff Rubberduck points out:
- Function is implicitly
Public
and implicitly returns aVariant
– should be aString
. - All parameters are implicitly passed
ByRef
(and semantically should beByVal
). - Parameter
arr
is implicitlyVariant
, - Local variables
c
,d
,t
,y
,arr2
all have terrible, meaningless names (arr
, matey! ..I like calling it Pirate Notation 😉 String
-returningLeft$
function should be used overVariant
-returningLeft
function.vbNullString
should probably be preferred over""
empty string literal.
With default settings Rubberduck will also complain about Dim t As Long, y As Long
, because having multiple declarations in a single instruction isn’t ideal.
The variables are declared at the top of the function rather than as close as possible to their usage, which makes it hard to see what’s used where.
Type-checking isn’t type safe:
If TypeName(arr) = "Range" Then
arr2 = arr.Value
Else
arr2 = arr
End If
If the Microsoft Word object model is referenced, you can pass a Word.Range
object and it will happily take it; same if I made my own Range
class and gave it an instance. Use the TypeOf
operator to perform compile-time type-safe type checks:
If TypeOf arr Is Excel.Range Then
'definitely an Excel Range object
Else
'could be anything
End If
Note, in the Else
branch arr
could literally be anything – but you’re assuming it’s an array. You could use the IsArray
function to make sure of that, and then you can also assert on the upper bound using a rather ugly helper function, to make sure you’re looking at the 2D array you’re expecting.
Else
Debug.Assert IsArray(arr)
Debug.Assert GetArrayDimSize(arr) = 2
arr2 = arr
End If
With the array dimension count known/asserted, the On Error Resume Next
statement that follows can be removed… and t
can be renamed sourceColumns
or colCount
or whatever, and y
can be renamed sourceRows
or rowCount
or whatever – and arr2
could be renamed to sourceArray
:
sourceColumns = UBound(sourceArray, 2)
sourceRows = UBound(sourceArray, 1)
And with that naming scheme I think I’d rename the arr
parameter to source
.
If t >= 0 And y >= 0 Then
Ah, ok, so that condition is leveraging the fact that VBA would have thrown an error given a 1D array, leaving t = 0
. Not very obvious, let’s improve that.
But first we need to tweak the assertions – we don’t really want a 2D array, we just want any array with at most two dimensions. So… let’s be less forgiving given anything other than that:
Dim dimensionCount As Long
If TypeOf(arr) Is Excel.Range Then
sourceArray = source.Value
dimensionCount = 2 'Range.Value is always a 2D array
Else
Dim isValidArray As Boolean
isValidArray = IsArray(source)
If isValidArray Then dimensionCount = GetArrayDimSize(source)
isValidArray = dimensionCount <> 0 And dimensionCount <= 2
If Not isValidArray Then Err.Raise 5, "TEXTJOIN", "Expected: 1D or 2D array"
End If
And now that If
statement can be much more explicit about what’s going on and why:
If dimensionCount = 2 Then
'handle 2D array
Else
Debug.Assert dimensionCount = 1
'handle 1D array
End If
So, c
iterates dimension 1 / rows, d
iterates dimension 2 / columns.
Now all these concatenations are inherently slow. I realize that’s “just a worksheet function” and you’d like to share it as a single, simple, cohesive and focused little piece of code… but given thousands of iterations, VBA’s string-handling will start becoming the performance bottleneck of the function.
There’s a lightning-fast StringBuilder class right here on this site that you can use to address that.
As a bonus, using a StringBuilder
makes the function assign to its return value only once – as opposed to what you have now, which is treating the return-value-identifier as if it were a local variable (which it technically is, …I just don’t like doing that).
Not sure why the d
/ column loop is inconsistent here:
For d = LBound(arr2, 1) To UBound(arr2, 2)
You’re iterating the 2nd dimension, the LBound
should be off the 2nd dimension too (yes, it should be the very same as that of the 1st dimension). Also, the upper bounds of both dimensions are already known and stored in local variables:
For currentRow = LBound(sourceArray, 1) To sourceRows
For currentColumn = LBound(sourceArray, 2) To sourceColumns
This code will throw an error (by design?) if the array contains an Error
value:
If arr2(c, d) <> "" Or Not skipblank Then
You could have a parameter that helps you decide how to treat errors – perhaps an Enum
could be used:
Public Enum TEXTJOIN_ErrorValues
ThrowOnError
SkipError
IncludeErrorText
End Enum
ThrowOnError
would be the current/default behavior; SkipError
would treat errors as blanks, and IncludeErrorText
would include the e.g. #N/A
error text into the result.
Except, once you have an Variant/Error
value and not a Range
, it’s pretty much impossible to get the text back (unless you want to map CVErr(xlErrWhatever)
values to a corresponding string… probably not worth it) – so scratch that enum value, and the behavior becomes either throw on error, or skip error values. And that’s entirely possible, and quite easy to do – an additional optional Boolean
parameter could be helpful for that.
Dim sb As StringBuilder
Set sb = New StringBuilder
'...
For currentRow = LBound(sourceArray, 1) To sourceRows
For currentColumn = LBound(sourceArray, 2) To sourceColumns
If Not IsError(sourceArray(currentRow, currentColumn)) Then
If sourceArray(currentRow, currentColumn) <> vbNullString Or Not skipBlank Then
sb.Append sourceArray(currentRow, currentColumn)
sb.Append delim
End If
ElseIf Not skipErrors Then
sb.Append delim
End If
Next
Next
TEXTJOIN = sb.ToString
Now, that’s the 2D loop.. the 1D loop does essentially the same thing.. and that’s annoying. I’d extract a method for that.
Private Sub ProcessValue(ByVal value As Variant, ByVal sb As StringBuilder, ByVal delim As String, ByVal skipBlanks As Boolean, ByVal skipErrors As Boolean)
If Not IsError(value) Then
If CStr(value) <> vbNullString Or Not skipBlanks Then
sb.Append CStr(value)
sb.Append delim
End If
ElseIf Not skipErrors Then
sb.Append delim
End If
End Sub
That turns the logic into:
If dimensionCount = 2 Then
For currentRow = LBound(sourceArray, 1) To sourceRows
For currentColumn = LBound(sourceArray, 2) To sourceColumns
ProcessValue sourceArray(currentRow, currentColumn), sb, delim, skipBlanks, skipErrors
Next
Next
Else
Debug.Assert dimensionCount = 1
For currentRow = LBound(sourceArray, 1) To sourceRows
ProcessValue sourceArray(currentRow), sb, delim, skipBlanks, skipErrors
Next
End If
TEXTJOIN = sb.ToString
The order of the parameters strikes me as unintuitive – might be by design to match Microsoft’s function, by I would have made the source array/range the first parameter, followed by an optional delimiter, followed by an optional flag to skip blanks (followed by an optional flag to skip errors).
Disclaimer: I know you are asking a relatively straightforward question, but this is CR after all…
Checking an array versus a range is perfectly fine (assuming this function will generally be called from the worksheet), but there are some important considerations to be made.
First, it took me a few minutes to decode what your code is doing. Given that you are quite active in the community (and thus you are helping reach many people) it would be worth investing in a copy of ‘Clean Code’ by Robert C Martin. This is a book that was recommended to me by Mat’s Mug, and has completely change how I think about code, and how I approach coding.
As a result, the first thing I noticed was how the small things within your code add up towards a larger potential for bugs. For example:
On Error Resume Next
t = UBound(arr2, 2)
y = UBound(arr2, 1)
On Error GoTo 0
If t >= 0 And y >= 0 Then
...
End If
If I interpret this bit correctly, you’re just testing whether there is a second dimension within the array, and if there is, you’re looping through the array as a 2d array. Otherwise, you’re looping through as a one-dimensional array. Wouldnt it be nice if we could explicitly say that in VBA?
Private Function ArrayIsTwoDimensional(ByVal TestArray As Variant) As Boolean
Dim Test As Variant
On Error Resume Next
Test = TestArray(LBound(TestArray, 1), LBound(TestArray, 2))
' Note: This will work for a 3d array, or a 4d array, etc. If there is any risk of a higher-dimension array, use a function that returns
' the exact number of dimensions.
ArrayIsTwoDimensional = (Err.Number = 0)
On Error GoTo 0
End Function
Private Function JoinFromTwoDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant)
Dim Join As String
Dim i As Long
For i = LBound(InputArray, 1) To UBound(InputArray, 1)
Dim j As Long
For j = LBound(InputArray, 2) To UBound(InputArray, 2)
If InputArray(i, j) <> vbNullString Or Not SkipBlanks Then
If Join <> vbNullString Then Join = Join & Delimeter
Join = Join & InputArray(i, j)
End If
Next
Next
JoinFromTwoDimensionalArray= Join
End Function
Private Function JoinFromOneDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant) As String
Dim Join As String
Dim i As Long
For i = LBound(InputArray) To UBound(InputArray)
If InputArray(i) <> vbNullString Or Not SkipBlanks Then
' Note the placement of the delimeter forces the joined text to only ever add a delimeter when needed.
If Join <> vbNullString Then Join = Join & Delimeter
Join = Join & InputArray(i)
End If
Next i
JoinFromOneDimensionalArray = Join
End Function
That was the first step I took in making your code make a little bit more sense. By extracting the two loops, and the dimension check, the main routine becomes much cleaner, and gets to rely on a few Private Function
s to do the work it needs to do. The beauty of this is that your code now explicitly says what it is doing (anyone could read the code, regardless of whether they have been coding for days or years).
The next step I took was explicitly checking for the supported types. For example:
If TypeName(arr) = "Range" Then
arr2 = arr.Value
Else
arr2 = arr
End If
If arr
is a Worksheet
for example, you will get a With block or Object variable not set
error when trying to assign arr2
to arr
(the names here are troubling as well). Let’s explicitly raise an error if someone passes a value we don’t currently support:
Private Const ERR_NUMBER_TYPE_NOT_SUPPORTED As Long = 513
Private Const ERR_MESSAGE_TYPE_NOT_SUPPORTED As String = "Please provide a Variant() or Range for the InputValues argument of TEXTJOIN. Alternate types are not currently supported."
Dim Values As Variant
Select Case TypeName(InputValues)
Case "Range"
Values = InputValues.Value
Case "Variant()"
Values = InputValues
Case Else
Err.Raise ERR_NUMBER_TYPE_NOT_SUPPORTED, ERR_MESSAGE_TYPE_NOT_SUPPORTED
End Select
The beauty here is that not only will our code raise an error explicitly related to the source of the problem, but we also have a very modular way of adding additional support. For example, if we wanted to support a worksheet (for whatever reason) we would want to update the error message, and add just a bit of additional code:
Dim Values As Variant
Select Case TypeName(InputValues)
Case "Range"
Values = InputValues.Value
Case "Variant()"
Values = InputValues
Case "Worksheet"
Values = GetArrayFromWorksheet(InputValues)
Case Else
Err.Raise ERR_NUMBER_TYPE_NOT_SUPPORTED, ERR_MESSAGE_TYPE_NOT_SUPPORTED
End Select
Finally, making these few small changes has a huge impact on the readability/maintainability of the code. Here is the finished product (identical in function):
Private Const ERR_NUMBER_TYPE_NOT_SUPPORTED As Long = 513
Private Const ERR_MESSAGE_TYPE_NOT_SUPPORTED As String = "Please provide a Variant() or Range for the InputValues argument of TEXTJOIN. Alternate types are not currently supported."
Public Function TEXTJOIN(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputValues As Variant) As String
Dim Values As Variant
Select Case TypeName(InputValues)
Case "Range"
Values = InputValues.Value
Case "Variant()"
Values = InputValues
Case Else
Err.Raise ERR_NUMBER_TYPE_NOT_SUPPORTED, ERR_MESSAGE_TYPE_NOT_SUPPORTED
End Select
If ArrayIsTwoDimensional(InputValues) Then
TEXTJOIN = JoinFromTwoDimensionalArray(Delimeter, SkipBlanks, InputValues)
Else
TEXTJOIN = JoinFromOneDimensionalArray(Delimeter, SkipBlanks, InputValues)
End If
End Function
Private Function ArrayIsTwoDimensional(ByVal TestArray As Variant) As Boolean
Dim Test As Variant
On Error Resume Next
Test = TestArray(LBound(TestArray, 1), LBound(TestArray, 2))
' Note: This will work for a 3d array, or a 4d array, etc. If there is any risk of a higher-dimension array, use a function that returns
' the exact number of dimensions.
ArrayIsTwoDimensional = (Err.Number = 0)
On Error GoTo 0
End Function
Private Function JoinFromTwoDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant)
Dim Join As String
Dim i As Long
For i = LBound(InputArray, 1) To UBound(InputArray, 1)
Dim j As Long
For j = LBound(InputArray, 2) To UBound(InputArray, 2)
If InputArray(i, j) <> vbNullString Or Not SkipBlanks Then
If Join <> vbNullString Then Join = Join & Delimeter
Join = Join & InputArray(i, j)
End If
Next
Next
JoinFromTwoDimensionalArray= Join
End Function
Private Function JoinFromOneDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant) As String
Dim Join As String
Dim i As Long
For i = LBound(InputArray) To UBound(InputArray)
If InputArray(i) <> vbNullString Or Not SkipBlanks Then
' Note the placement of the delimeter forces the joined text to only ever add a delimeter when needed.
If Join <> vbNullString Then Join = Join & Delimeter
Join = Join & InputArray(i)
End If
Next i
JoinFromOneDimensionalArray = Join
End Function
We get rid of all of those counter variables (opting instead for i and j which are pretty standard). Our other variables are quite clearly named, and everything is what you would expect. As a result if you (however many years from now) come back to the function to add something new, or if it breaks, you can quickly find the source instead of trying to remember what t
and y
were and why you set t
or y
to -1
and so on so forth.
Again, highly recommend picking up a copy of Clean Code
. The idea of levels of abstraction, functions, classes, etc all can be difficult to understand at first but once you get going your code will become much easier to read.
I really like the OP’s concept but I feel that a pseudo Excel Application.WorksheetFunction.TextJoin
function should take a ParamArray of mixed data types.
In my implementation I use a combination of a string buffer and the Mid function improve the speed by avoiding concatenating large strings. To test the speed of my function I filled 500K cells with random strings ranging from 5 to 50 characters in length. Using the TheSpreadsheetGuru: Timer I determined that it took 1.95 seconds to create a string of 14,256,557 characters.
I’m sure that by utilizing the lightning-fast StringBuilder that @Mat’sMug mentioned would increase the speed 5 fold. I did not use it myself because I wanted convenience of having a single function do all the work.
Formulas
=TextJoin2(“,”,FALSE,”Numbers”,A6:C6,A7:C9,{10,11,12})
=TextJoin2(“,”,TRUE,”Numbers”,A6:C6,A7:C9,{10,11,12})
Code
Function TextJoin2(Delimiter As String, Ignore_Emtpy As Boolean, ParamArray Args() As Variant) As Variant
Dim results As String
Dim count As Long, i As Long, j As Long, length As Long, pos As Long
Dim argument As Variant, v As Variant
Select Case TypeName(Args(0))
Case "Empty"
argument = Array()
Case "Range"
If Args(0).count = 1 Then
argument = Array(Args(0).value)
Else
argument = Args(0).value
End If
Case "String"
argument = Array(Args(0))
Case "Variant()"
argument = Args(0)
End Select
For Each v In argument
length = length + Len(v)
count = count + 1
Next
results = Space(length + count * Len(Delimiter))
If count - 1 + LBound(argument) = UBound(argument) Then
For Each v In argument
If Not Ignore_Emtpy Or Len(v) > 0 Then
Mid(results, pos + 1, Len(v) + Len(Delimiter)) = v & Delimiter
pos = pos + Len(v) + Len(Delimiter)
End If
Next
Else
For i = LBound(argument) To UBound(argument)
For j = LBound(argument, 2) To UBound(argument, 2)
If Not Ignore_Emtpy Or Len(argument(i, j)) > 0 Then
Mid(results, pos + 1, Len(argument(i, j)) + Len(Delimiter)) = argument(i, j) & Delimiter
pos = pos + Len(argument(i, j)) + Len(Delimiter)
End If
Next
Next
End If
'Trim results needed to adjust for skipping empty values
results = Left(results, pos)
For i = 1 To UBound(Args)
results = results & TextJoin2(Delimiter, Ignore_Emtpy, Args(i)) & Delimiter
Next
Debug.Print Left(results, Len(results) - Len(Delimiter))
TextJoin2 = Left(results, Len(results) - Len(Delimiter))
End Function