I am trying to apply this simple macro to merge two big sized sheets (around 30000 rows each), but the process is too slow and never ends. The macro works perfectly with smaller sheets. Could you give me any advice for the optimization of my code please ?
My Macro consists in an Userform :
Public listChoice As String 'Using your code to get the sheet names for the ListBox rowsource. Private Sub UserForm_Activate() For n = 1 To ActiveWorkbook.Sheets.Count With SelectSheet .AddItem ActiveWorkbook.Sheets(n).Name End With Next n End Sub 'Including an update event for the ListBox Private Sub SelectSheet_AfterUpdate() listChoice = SelectSheet.Text End Sub 'Including a test just to demonstrate that the result is still retained. You don't need this, it demonstrates the results on the screenshot. Private Sub CommandButton1_Click() Dim sh1 As Worksheet Dim sh2 As Worksheet Dim sh3 As Worksheet Dim lc As String Dim letter2 As String Dim letter3 As String Dim mrgKeyRange1 As Range Dim mrgKeyRange2 As Range Dim cell As Range Dim lastC1 As Integer Dim lastC2 As Integer Dim lastC3 As Integer Dim lrow As Integer Dim currentR As Integer Dim key As Variant lc = listChoice 'closing the UserForm Unload Me Set mrgKeyRange1 = Application.InputBox("Select the range by wich the rows of the current sheet will be merged with the sheet " & lc, Type:=8) 'type 8 serve a fargli pigliare un range Set mrgKeyRange2 = Application.InputBox("Select the corresponding range in " & lc, Type:=8) Application.ScreenUpdating = False Application.EnableEvents = False Application.Calculation = xlCalculationManual 'if there are many calculations, it helps to speed the macro up Set sh1 = ActiveSheet Set sh2 = ActiveWorkbook.Sheets(lc) Set sh3 = Sheets.Add 'renaming the new sheet If Len(sh1.Name) < 26 Then 'the limit of a sheet's title is 31 chars sh3.Name = "Merged" & sh1.Name Else sh3.Name = "MergedSheet" End If 'adding the headers to the new sheet sh1.Rows(1).Copy Destination:=sh3.Rows(1) lastC1 = LastColumn(sh1) lastC2 = LastColumn(sh2) 'LastCol() is defined in the module LastRowColumn lastC3 = LastColumn(sh3) + 1 letter2 = NumberToLetter(lastC2) 'NumberToLetter is described in NumToLet module sh2.Range("A1:" & letter2 & "1").Copy Destination:=sh3.Cells(1, lastC3) 'formatting the headers of the new sheet (sh3) With sh3 lastC3 = LastColumn(sh3) letter3 = NumberToLetter(lastC3) .Cells(1, 1).Copy .Range("B1:" & letter3 & "1").PasteSpecial Paste:=xlPasteFormats .Range("A1:" & letter3 & "1").Columns.AutoFit End With '>>CR note: the macro begins to be veeeeeery slow starting from here: 'For each value in the Merging Key range, it finds the corresponding row in the other sheet For Each key In mrgKeyRange1 If Trim(key) <> "" Then With mrgKeyRange2 Set cell = .Find(What:=key, _ LookIn:=xlValues, _ LookAt:=xlWhole, _ SearchDirection:=xlNext, _ MatchCase:=False) If Not cell Is Nothing Then lrow = LastRow(sh3) key.EntireRow.Copy sh3.Rows(lrow + 1).PasteSpecial xlPasteValues currentR = cell.row sh2.Range("A" & currentR & ":" & letter2 & currentR).Copy sh3.Cells(lrow + 1, lastC1 + 1).PasteSpecial xlPasteValues End If End With End If Next Application.ScreenUpdating = True Application.EnableEvents = True Application.Calculation = xlCalculationAutomatic End Sub
In order to call the userform, I use this very simple code, written in another module:
Sub MergeSheets() Merge_UserForm.Show vbModeless End Sub
Here is an example of two sheets that I want to Merge by ID. Normally my sheets contain much more columns and the ID is variable (sometimes are numbers, sometimes strings,…)
@Mat’sMug does an excellent job with covering the OOP aspects of the code, so I’ll just dive straight into the performance issues.
1 – Just a tiny drop in the performance bucket, but you call
lastC3 = LastColumn(sh3) twice in this section:
lastC3 = LastColumn(sh3) + 1 letter2 = NumberToLetter(lastC2) 'NumberToLetter is described in NumToLet module sh2.Range("A1:" & letter2 & "1").Copy Destination:=sh3.Cells(1, lastC3) 'formatting the headers of the new sheet (sh3) With sh3 lastC3 = LastColumn(sh3)
You could omit the second call by simply calculating the new last column instead:
lastC3 = LastColumn(sh3) + 1 letter2 = NumberToLetter(lastC2) 'NumberToLetter is described in NumToLet module sh2.Range("A1:" & letter2 & "1").Copy Destination:=sh3.Cells(1, lastC3) 'formatting the headers of the new sheet (sh3) With sh3 lastC3 = lastC3 + lastC2
However, if you glance up about 13 lines of code you’ll notice that
LastColumn(sh3) will always be the same as
lastC1 at that point because you’ve just added it on the line
Set sh3 = Sheets.Add and its only contents are the
sh1 headers. Best would be to always use the calculated values when you can.
2 – The
NumberToLetter should never be used unless you’re displaying a human readable column address (and in that case you should let Excel do it for you). Computers work with numbers well, but strings, not so much. What happens with code like this…
letter2 = NumberToLetter(lastC2) 'NumberToLetter is described in NumToLet module sh2.Range("A1:" & letter2 & "1").Copy Destination:=sh3.Cells(1, lastC3 + 1)
…is that you take your last column number (let’s say for the sake of argument that it’s 4 like your top data example). You make a function call to convert it to “D”, then concatenate it into a
"A1:" & "D" & "1", and pass the value to
sh.Range as “A1:D1”. And then… Excel parses the string. It determines that “A” refers to column 1, and “D” refers to column 4. You’re doing a bunch of work with the sole effect of making Excel do more work. Do Excel a solid and use the numeric interfaces:
With sh2 .Range(.Cells(1, 1), .Cells(lastC2, 1)).Copy Destination:=sh3.Cells(1, lastC3 + 1) End With
3 – Don’t use
.Copy. At all. Ever. It actually does 2 copies internally – one to the clipboard and one to the destination. Then comes the really annoying UX part – it wipes out whatever the user already had in the clipboard. Ctrl-V didn’t do anything? Oh. Somebody used
.Copy in a macro. Too bad I closed that file/navigated away from that page/deselected the 100 unordered items I laboriously Ctrl-clicked… You get the point. Instead, just assign the values directly:
sh3.Range(sh3.Cells(1, lastC3 + 1),sh3.Cells(1, lastC3 + lastC2 + 1)).Value = _ sh2.Range(sh2.Cells(1, 1), sh2.Cells(lastC2, 1)).Value
It’s much faster, and doesn’t blow away the clipboard.
4 – While I have UX on the brain, you shouldn’t make the assumption that the user has
Application.Calculation set to
xlCalculationAutomatic. This is an application wide setting, and it holds its state after the your code completes. Instead, cache the current setting at the start of the code, do what you need to, and then change it back to the setting it was when you found it. Also, if you’re going to alter the state of the application, you should have error handling to make sure that it gets set back if you have an exception somewhere:
On Error GoTo CleanExit Dim calcState As XlCalculation Dim eventState As Boolean calcState = Application.Calculation Application.Calculation = xlCalculationManual eventState = Application.EnableEvents Application.EnableEvents = False '... CleanExit: Application.Calculation = calcState Application.EnableEvents = eventState End Sub
5 – Only copy data that needs to be copied.
.EntireRow is well named – it’s every single cell in the row. All 16,384 of them. You already know how wide the data is because you measure it here:
lastC1 = LastColumn(sh1) lastC2 = LastColumn(sh2) lastC3 = LastColumn(sh3)
Just figure out which sheet
mrgKeyRange1 belongs to (hint, it’s
mrgKeyRange1.Parent) and only copy the appropriate number of columns.
6 – Similar to #1, but with a much, much higher impact on your overall performance is the call
lrow = LastRow(sh3) inside your main loop. This value will never increase by more than 1. You enter the loop with only header rows in
sh3, and you’re adding rows to
sh3 one at a time. It can be replaced with this…
lrow = 1 For Each key In mrgKeyRange1 If Trim(key) <> "" Then With mrgKeyRange2 '... If Not cell Is Nothing Then lrow = lrow + 1 key.EntireRow.Copy sh3.Rows(lrow).PasteSpecial xlPasteValues
7 – Pausing for a brief nitpick, the
If block beginning with
If Not cell Is Nothing Then should be indented another level.
8 – Avoid round trips to the Worksheets, especially in the loops. You can pull all of the values from a range of cells as an array like this:
Dim allValues As Variant allValues = ActiveSheet.Range("A1:D1000").Value 'allValues is now a 2D array containing all the cell values 'in "A1:D1000".
Reading individual values from a worksheet is an expensive operation. Reading values from an array is really, really cheap. This is where the rubber really hits the road. You’ll see an immediate performance gain by restructuring your main loop like this:
Dim searchItems As Variant searchItems = mrgKeyRange1.Value Dim searchRow As Long Dim searchCol As Long For searchRow = LBound(searchItems, 1) To UBound(searchItems, 1) For searchCol = LBound(searchItems, 2) To UBound(searchItems, 2) key = searchItems(searchRow, searchCol) '... Next Next
9 – I don’t know enough about the data to recommend a specific alternative approach, but my guess is that the
.Find call is another big source of burned CPU cycles. If you’re only checking one column in SheetX against one column in SheetY and at least one contains only unique values, it would probably be fastest to just sort both of them on the target columns and use “seeking” indexes to ladder your way down both sets of data at the same time. Pseudo-code:
Sort both lists. Do until index1 hits the end of array1 Do until array2(index2) = array1(index1) or index2 hits the end of array2 Increment index2 Loop If index2 is at the end of array2, Exit. Process the match. Do until array1(index1) = array2(index2) or index1 hits the end of array1 Increment index1 Loop Loop
Otherwise it would probably be faster to read all of the
mrgKeyRange2 values into a
Dictionary and use its hash lookups instead of doing individual searches. Store the “key” as the key and the row number (or collection of row numbers if you have duplicates) as the value. Then you can just test to see if they’re in the
Dictionary and retrieve the row number(s) as you iterate through
mrgKeyRange1. There’s an example of something similar to this method here.
The first thing I expect to see in any code module – be it a standard module, a class module, a worksheet module,
ThisWorkbook, or a
UserForm‘s code-behind, is this:
With this, your code wouldn’t compile… because you’re not declaring every variable that you’re using – in the
n is an implicit
Variant that VBA allocates on the fly… and Variant should be avoided for routine tasks such as looping:
Private Sub UserForm_Activate() For n = 1 To ActiveWorkbook.Sheets.Count With SelectSheet .AddItem ActiveWorkbook.Sheets(n).Name End With Next n End Sub
There are a few things to improve here. A
For Each loop performs best when iterating an array; when iterating a collection of objects (such as the
Sheets collection of the
ActiveWorkbook), it’s best to use a
For Each loop:
Dim sheet As Worksheet For Each sheet In ActiveWorkbook.Worksheets SelectSheet.AddItem sheet.Name Next
Notice this loop is iterating the
Worksheets collection – the
Sheets collection contains sheets that aren’t worksheets, such as chart sheets.
sheet reference is “captured” at each iteration, so you don’t need to access the
Worksheets collection every time – it’s simply given to you by the iteration mechanism; that’s why a
For Each loop performs better with collections. Keep
For...Next loops for iterating arrays.
Public listChoice As String
Given that a
UserForm is really a class module with a designer and a default instance, it should be used as an object – and in object-oriented code, this
listChoice module-level variable is a public field.
A public field makes the
String value readable from the calling code. The problem is that is also makes the value writable from the calling code… which doesn’t always make sense and makes it easier to introduce bugs.
I like that you’ve abstracted away the
ListBox, so the caller doesn’t need to know that the selection is coming from a specific control. A better and more object-oriented way to do exactly that is to expose a property:
Public Property Get SelectedSheetName() As String With SelectSheet If .ListIndex = -1 Then Exit Property 'nothing is selected SelectedSheetName = .List(.ListIndex) End With End Property
Notice that this eliminates the need for a public field for the client code to access the selected sheet name.
Now, looking at the rest of the code, it seems the
listChoice field /
SelectedSheetName property can very well be
Private, given it’s only used in the module it’s declared in: variables and members should always have the tightest possible scope – I like that all members are
Private, but that
Public field makes this code possible:
Merge_UserForm.listChoice = "potato" Merge_UserForm.Show vbModeless
And then the user could click the
CommandButton1 without making a valid selection in the
SelectSheet listbox, and then this line would blow up:
Set sh2 = ActiveWorkbook.Sheets(lc)
Side note, what’s the
lc local variable needed for?
listChoice is already there, in scope, waiting to be used… but I’ll get back to this in a moment.
I think the form,
CommandButton1 in particular, is responsible for way too many things. A
UserForm is a view, a user interface: a UI exists because a program needs to collect user input.
I think these
InputBox calls are a missed opportunity to have two
RefEdit controls on that form, to collect
Range selections from the user without ugly and annoying
InputBox prompts – not to mention that
Excel.Application.InputBox is a bit confusing when there’s also the standard
VBA.Interaction.InputBox. And you’re not validating that the selected range is actually on the
lc sheet, which can lead to some interesting bugs.
Why do you need that ListBox at all anyway then? Let the user select a range, and extract the sheet’s name from that range!
So back the form’s responsibilities: collecting user input. What are the actual inputs you need?
mrgKeyRange2? I’m not sure I completely understand exactly what’s happening (haven’t looked at the actual “do work” code yet), but it seems to me your UI could look something like this:
Then there would be some logic to validate the selected ranges and make sure the form can’t be OK’d without consistent input values (e.g. if the selected columns/rows need to line up, or if the two ranges must be on separate sheets, etc.) – the entire role and purpose of a
UserForm is to collect and validate the user’s input.
So the only code that should be in a form’s code-behind, is simple code that deals with simple mechanics, e.g. making sure that the object instance doesn’t get destroyed when the user decides to X-out and cancel everything:
Private cancelled As Boolean Public Property Get IsCancelled() As Boolean IsCancelled = cancelled End Property Public Property Get Selection1() As Range 'todo: rename On Error Resume Next Set Selection1 = Application.Range(RefEdit1.Value) On Error GoTo 0 End Property Public Property Get Selection2() As Range 'todo: rename On Error Resume Next Set Selection2 = Application.Range(RefEdit2.Value) On Error GoTo 0 End Property Private Sub OkButton_Click() cancelled = False Me.Hide End Sub Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer) If CloseMode = VbQueryClose.vbFormControlMenu Then cancelled = True Cancel = True Me.Hide End Sub
If the input ranges can be invalid and you would want to prevent the user from OK’ing the form with such invalid input, I would suggest to keep that responsibility outside the form still – by implementing it in dedicated class modules.
I don’t know what the rules are, and I don’t know how many there are – so I’ll go and add a new class module and call it
IRangeValidationRule to define a simple interface:
Option Explicit Public Function Validate(ByVal range1 As Range, ByVal range2 As Range) As Boolean End Function
And then say one of the rules is that
range2 must refer to distinct sheets – we would have another class module,
Option Explicit Implements IRangeValidationRule Private Function IRangeValidationRule_Validate(ByVal range1 As Range, ByVal range2 As Range) As Boolean On Error GoTo CleanFail Dim result As Boolean result = Not range1.Parent Is range2.Parent CleanExit: IRangeValidationRule_Validate = result Exit Function CleanFail: result = False Resume CleanExit End Function
And so on for each validation rule you might have. Then the form could have this:
Private rules As New Collection Public Sub AddValidationRule(ByVal rule As IRangeValidationRule) rules.Add rule End Sub
And you could then determine whether or not the selected ranges are valid by simply iterating the rules:
Private Function IsValid() As Boolean Dim rule As IRangeValidationRule For Each rule In rules If Not rule.Validate(Selection1, Selection2) Then IsValid = False Exit Function End If Next IsValid = True End Function
And then you could handle the two
AfterUpdate handlers to run the validation and disable the
OkButton until the input is valid:
Private Sub RefEdit1_AfterUpdate() OkButton.Enabled = IsValid End Sub Private Sub RefEdit2_AfterUpdate() OkButton.Enabled = IsValid End Sub
So, what would the calling code look like then? This couldn’t work:
Sub MergeSheets() Merge_UserForm.Show vbModeless End Sub
First, it’s working off the default instance of the form, and then with the actual “do work” code gone, that wouldn’t be doing anything. We need to create an instance and work with that – but first let’s remove that underscore and keep the form’s name
PascalCase; the underscore makes it look like some event handler or interface member implementation procedure. Then we’ll restrict user interactions to entering valid input or cancelling the form by using
vbModal, so while the form is displayed, the user can’t interact with anything other than the form:
Public Sub MergeSheets() With New MergeUserForm .Show vbModal If Not .IsCancelled Then DoWork .Selection1, .Selection2 'todo: rename all these End With End Sub
Notice the conceptual difference here: instead of fire-and-forget displaying a form and not knowing what’s happening afterwards, we can see that we’re displaying a form, collecting
Selection2, allowing the user to cancel everything, and passing the inputs to some
DoWork procedure that’s responsible for the actual work – the form itself doesn’t do much.
If we have implemented validation rules as I’ve shown above, we could have this:
Public Sub MergeSheets() With New MergeUserForm .AddValidationRule New DistinctSheetValidationRule .AddValidationRule New SomeOtherValidationRule .Show vbModal If Not .IsCancelled Then DoWork .Selection1, .Selection2 'todo: rename all these End With End Sub
The biggest advantage with this approach, is that you’ve decoupled the actual work (and input validation rules) from the UI, and you can even write unit tests for them if you want – and because the
DoWork procedure is taking the user input as parameters rather than prompting for it, you can write unit tests for it too!
I’ll end this answer here and let other reviewers address the actual do work procedure and its performance problems.