Converting from .xlsx to .txt

Posted on

Problem

I have code that converts each sheet of a spreadsheet into a .txt file. It works well, however, given the big number of exports (about 90 .txt files), I’d like to seek advice on how to speed it up.

Sub xlsxTotxt()

Dim i As Integer
Dim directory As String
Dim fname As String
Dim xWs As Worksheet
Dim xTextFile As String
Dim rdate As String

directory = ThisWorkbook.Sheets("Macro").Range("D576").Value
rdate = ThisWorkbook.Sheets("Macro").Range("E47").Value
i = 0
Application.ScreenUpdating = False
Application.DisplayAlerts = False
Do While ThisWorkbook.Sheets("Macro").Range("D577").Offset(i).Value <> ""
fname = Sheets("Macro").Range("D577").Offset(i).Value
Workbooks.Open (directory & fname)
For Each xWs In Workbooks(fname).Worksheets
xWs.Copy
xTextFile = directory & rdate & " - " & xWs.name & ".txt"
Application.ActiveWorkbook.SaveAs filename:=xTextFile, FileFormat:=xlText
Application.ActiveWorkbook.Saved = True
Application.ActiveWorkbook.Close
Next
Workbooks(fname).Close
i = i + 1
Loop
Application.ScreenUpdating = True
Application.DisplayAlerts = True

End Sub

Solution


Your code is incredibly fragile

directory = ThisWorkbook.Sheets("Macro").Range("D576").Value
rdate = ThisWorkbook.Sheets("Macro").Range("E47").Value

If a single row/column gets added or deleted or moved, those ranges are going to move and your code is going to fail completely.

If you can, those values should be in a dedicated sheet, not buried below 500 lines of other stuff. If not, name your ranges.

enter image description here

So, let’s say you take cell D576 and name it Directory_Path or something similar. Now, rather than

directory = ThisWorkbook.Sheets("Macro").Range("D576").Value

which is incredibly fragile, you can use

directory = ThisWorkbook.Range("Directory_Path").Value

and so long as nobody actually deletes said row/column, that will always point to the right place.


Use With

Rather than re-referencing the same object over and over, you can use a With statement to hold a reference. Like so:

Application.ActiveWorkbook.SaveAs Filename:=xTextFile, FileFormat:=xlText
Application.ActiveWorkbook.Saved = True
Application.ActiveWorkbook.Close

Becomes

With Application.ActiveWorkbook
    .SaveAs Filename:=xTextFile, FileFormat:=xlText
    .Saved = True
    .Close
End With

much clearer and easier to read.


Handle things in the right order

Specifically your Application.[Settings]. Anything *meta* like that should go right at the start and right at the end of the method(s) it applies to. This allows you to confirm, at a glance, what the internal state of your method is and check that things are set/reset correctly. You should also disable Application.EnableEvents and Application.Calculation for significant extra speed. Like so:

Sub xlsxTotxt()

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
        .EnableEvents = False
        .Calculation = xlCalculationManual
    End With


    ...


    Code


    ...


    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
        .EnableEvents = True
        .Calculation = xlCalculationAutomatic
    End With

End Sub

Use descriptive naming

Code should be written for other people (including future you) to read. Names should be descriptive, unambiguous and concise. In that order. I highly recommend the excellent, classic article on naming by Joel Spolsky. But in short, things should sound like what they are.

xTextFile. What on earth is that? Looking at it I have no idea. If I had to guess, I’d say it’s some kind of file object. Oh, it’s the filename you want to save your text file under? Why not call it newFilename? Since it’s a full filename (including directory path). Maybe newFullfilename or just fullFilename might be even more descriptive.

xWs suffers from the same problem. I can guess it’s a worksheet object but beyond that? no idea. Since you just use it to iterate over the sheets in your workbook, maybe just call it currentSheet?

fname. Same thing. filename. Since it’s the filename for the workbook you’re opening, how about targetWorkbookFilename? Sure, it’s long, but screen real estate is cheap, cognitive processing is not and that name’s an awful lot easier to understand and work with.

Also, move your declarations near to where they’re used. This helps you *see* the different parts of your method and helps you keep track of where you are, refer back to your declarations and see ways to split things up into logical sub-methods. Personally, I prefer to keep declarations outside of Loop structures, but that’s personal preference.


Objects are your friend

VBA has a huge, extensive, comprehensive object model for every Office Application. Use it.

ThisWorkbook.Sheets("Macro")

You refer to that sheet multiple times. Rather than continually repeating that reference, put it in a worksheet object and then just refer to the object.

Dim macroSheet As Worksheet
Set macroSheet = ThisWorkbook.Sheets("Macro")

Dim dateString As String
dateString = macroSheet.Range("E47").Value

Same with Workbooks.

Do While ThisWorkbook.Sheets("Macro").Range("D577").Offset(i).Value <> ""
fname = Sheets("Macro").Range("D577").Offset(i).Value
Workbooks.Open (directory & fname)
For Each xWs In Workbooks(fname).Worksheets
xWs.Copy
xTextFile = directory & rdate & " - " & xWs.name & ".txt"
Application.ActiveWorkbook.SaveAs filename:=xTextFile, FileFormat:=xlText
Application.ActiveWorkbook.Saved = True
Application.ActiveWorkbook.Close

Relying on ActiveWorkbook being the one you want is very fragile. Just make it a proper object and then your references will always be accurate:

Dim targetWorkbook As Workbook
Set targetWorkbook  = Workbooks.Open (directory & fname)

...

For Each currentSheet In targetWorkbook.Sheets()

    ...

    With targetWorkbook
        .SaveAs filename:=xTextFile, FileFormat:=xlText
        .Saved = True
        .Close
    End With

Indenting

Indenting is a wonderful way to let you see the structure of your Sub. It’ll let you pick out nested-logic (Loops, Ifs, Fors, Withs etc.) and very quickly get a rough idea for where the important stuff in your sub is. Compare:

Do While ThisWorkbook.Sheets("Macro").Range("D577").Offset(i).Value <> ""
fname = Sheets("Macro").Range("D577").Offset(i).Value
Workbooks.Open (directory & fname)
For Each xWs In Workbooks(fname).Worksheets
xWs.Copy
xTextFile = directory & rdate & " - " & xWs.name & ".txt"
Application.ActiveWorkbook.SaveAs filename:=xTextFile, FileFormat:=xlText
Application.ActiveWorkbook.Saved = True
Application.ActiveWorkbook.Close
Next
Workbooks(fname).Close
i = i + 1
Loop

With

Do While ThisWorkbook.Sheets("Macro").Range("D577").Offset(i).Value <> ""

    fname = Sheets("Macro").Range("D577").Offset(i).Value
    Workbooks.Open (directory & fname)

    For Each xWs In Workbooks(fname).Worksheets

        xWs.Copy
        xTextFile = directory & rdate & " - " & xWs.Name & ".txt"

        With Application.ActiveWorkbook
            .SaveAs Filename:=xTextFile, FileFormat:=xlText
            .Saved = True
            .Close
        End With

    Next

    Workbooks(fname).Close
    i = i + 1

Loop

And suddenly you can see your program flow, and where the real work is happening, and where loops start/end, in just a glance. Adding some whitespace also goes a long way to helping readability.


Putting it all together

Sub xlsxTotxt()

    With Application
        .ScreenUpdating = False
        .DisplayAlerts = False
        .EnableEvents = False
        .Calculation = xlCalculationManual
    End With

    Dim parentDirectoryCell As Range
    Set parentDirectoryCell = ThisWorkbook.Range("Directory_Path")

    Dim parentDirectoryPath As String
    parentDirectoryPath = parentDirectoryCell.Value

    Dim dateString As String
    dateString = ThisWorkbook.Sheets("Macro").Range("E47").Value


    Dim fullFilename As String
    Dim currentSheet As Worksheet
    Dim targetWorkbook As Workbook
    Dim targetWorkbookFilename As String

    Dim rowOffset As Long
    rowOffset = 1

    targetWorkbookFilename = parentDirectoryCell.Offset(rowOffset).Value
    Do While targetWorkbookFilename <> ""

        Set targetWorkbook = Workbooks.Open(parentDirectoryPath & targetWorkbookFilename)

        For Each currentSheet In targetWorkbook.Sheets()

            currentSheet.Copy
            fullFilename = parentDirectoryPath & dateString & " - " & currentSheet.Name & ".txt"

            With targetWorkbook
                .SaveAs Filename:=fullFilename, FileFormat:=xlText
                .Saved = True
                .Close
            End With

        Next currentSheet

        targetWorkbook.Close

        rowOffset = rowOffset + 1
        targetWorkbookFilename = parentDirectoryCell.Offset(rowOffset).Value

    Loop

    With Application
        .ScreenUpdating = True
        .DisplayAlerts = True
        .EnableEvents = True
        .Calculation = xlCalculationAutomatic
    End With

End Sub

Nicola,
To answer your original question.. How to make it go faster. Since you say you are having to take this over, and there is a wider program involved, here’s an interesting solution to add to your environment.

Setup

You will need 2 workbooks.

Workbook1 : Manager.xls

Workbook2 : Runner.xls

Manager.xls has the following code:

Option Explicit

Public Sub manage()

    Dim item As Variant
    For Each item In Range("booksToExtract")
         shellout item
    Next item

End Sub

Public Sub shellout(ByVal val As String)

    ChDir ThisWorkbook.Path
    Shell """c:Program Files (x86)Microsoft OfficeOffice12EXCEL.EXE"" /r " & val & " runner.xls /r "

End Sub

Runner.xls will do you work of extracting the code; there is a bit of management code to identify when a book to extract is available.

Option Explicit

Public Function waitForNewWorkBook() As String

    While Workbooks.Count = 1
        DoEvents
    Wend

    If Workbooks(1).Name = ThisWorkbook.Name Then
        waitForNewWorkBook = Workbooks(2).Name
        Exit Function
    End If

    waitForNewWorkBook = Workbooks(1).Name

End Function
Public Sub auto_open()

    Dim workbookName As String

    workbookName = waitForNewWorkBook
    ExtractCode workbookName

    'Cleanup
    Workbooks(workbookName).Close
    Application.Quit

End Sub

Public Sub ExtractCode(ByVal workbookName As String)

    Dim oWorksheet As Worksheet
    Dim oBook As Workbook

    Set oBook = Workbooks(workbookName)

    ' get your filename format here.
    ' loop through your worksheets and extract here.

    oBook.SaveAs oBook.Name & ".txt", xlText

    Call oBook.Close(False)

End Sub

Assumptions

I’ve assumed all the workbooks are in the same folder, and the extract folder is also in the same folder. You can modify if needed.

How does this work

  • Part One

The Manager workbook loops through all your available workbooks, and calls the shellout method.

The shellout method, calls the VBA Shell method, which runs the string provided and opens a new excel session in a new process, and in that excel opens the runner.xls book, and the next book in your list, both as readonly.

The shellout method will not wait for the excel session to die before returning to the manager, so the manager will just “spawn” as many excel sessions as there are books in the list. You can modify this behaviour to check the number of concurrently running excel session in the processors if you want.

In my example I have I created 10 workbooks, and ran them, at this point I should see in the process explorer, 10 excel sessions, each trying to get processor time. Sadly I only have 2 cores, so the manager sheet is taking one process, and I can only run one other process. But if you have 4 or 16 processors, you could extract 3 or 15 excels at a time.

  • Part Two

Now that each of the excel extracts ‘runner.xls myExtractBookX.xls’ is opened the runner.xls will have started the auto_open method.

waitForNewWorkbook does as it says. Wait untill the myExtractBookX.xls is open and returns the name of the file.

then we run ExtractCode which is the code to extract your workbook. I leave it to you to implement how you want to extract a workbook.

Now you have a multithreaded excel application. but don’t tell anyone that, because excel is actually single threaded, and actually we’ve just spawned multiple excel processes to run on any free cores.

Finale

If you run this from a command line it can really get interesting…

for /f "usebackq" %i in (`dir /s/b pnl*.xls`) do "c:Program Files (x86)Microsoft OfficeOffice12EXCEL.EXE" /r %i /r runner.xls

I found that 2 additional Application properties make a noticeable difference: .Calculation and .EnableEvents. Runtime came down from 20s to 6.5s with 2 files, from 30s to 13s with 3 files:

Sub xlsxTotxt()
    Dim fileno As Integer
    Dim directory As String
    Dim fname As String
    Dim xWs As Worksheet
    Dim xTextFile As String
    Dim rdate As String
    Dim namelist As Range
    Dim runtime As Variant

    ' runtime = Timer  # decomment if runtime is to be measured
    ToggleWaitMode  ' switch Application properties

    rdate = ThisWorkbook.Sheets("Macro").Range("E47").Value
    directory = ThisWorkbook.Sheets("Macro").Range("D576").Value
    Set namelist = ThisWorkbook.Sheets("Macro").Range("D577")

    fileno = 0
    Do While namelist.Offset(fileno).Value <> ""
        fname = namelist.Offset(fileno).Value
        Workbooks.Open (directory & fname)
        For Each xWs In Workbooks(fname).Worksheets
            xTextFile = directory & rdate & " - " & xWs.name & ".txt"
            xWs.Copy   ' create a new workbook from this sheet
            With ActiveWorkbook
                .SaveAs filename:=xTextFile, FileFormat:=xlText
                .Saved = True
                .Close
            End With
        Next xWs
        Workbooks(fname).Close
        fileno = fileno + 1
    Loop

    ToggleWaitMode False
    ' runtime = Timer - runtime
    ' MsgBox runtime
End Sub

Public Sub ToggleWaitMode(Optional ByVal wait As Boolean = True)
    With Excel.Application
        .Calculation = IIf(wait, xlCalculationManual, xlCalculationAutomatic)
        .Cursor = IIf(wait, xlWait, xlDefault)
        .StatusBar = IIf(wait, "Please wait...", False)

        .DisplayAlerts = Not wait
        .ScreenUpdating = Not wait
        .EnableEvents = Not wait
    End With
End Sub

Switching properties of the Application object is so common in macros that I use a public function of it’s own for this, ToggleWaitMode() (found here). In addition to the original settings it suppresses auto-macros and switches off the recalculation.
Finally I’ve used some of the good advice about use of objects instead of direct references etc. found in the other answers.

The vast majority of time in your program will be spent opening the workbooks. It’s a terribly slow operation. If you can avoid that, you’ll see massive performance gains.

That’s great, but how do you expect me to export these without opening them?

By using ADODB to connect to the closed file and query the sheets. Then you can write the results of those queries to text file.

Leave a Reply

Your email address will not be published. Required fields are marked *