Read/write a pipe-delimited file line by line with some simple text manipulation


Problem

This code that I wrote is supposed to read a pipe-delimited file line by line and write it to a new file with some simple text manipulation (it also adds two new columns). It prints a “Status Update” every 100,000 lines to keep me updated on how close it is to completion.

I previously posted this code on Stack Overflow to get help with incrementing, and someone mentioned that it would be faster if I did not keep opening the second text file. Being extremely new to Python, I do not understand how to do that without potentially breaking the code.

counter=1
for line in open(r"C:Pathname.txt"):
    spline = line.split("|")
    if counter==1:
        with open(r"C:PATH2019.txt",'a') as NewFile:
            spline.insert(23,"Column A")
            spline.insert(23,"Column B")
            s="|"
            newline=s.join(spline)
            NewFile.write(newline)

    elif counter > 1 and not spline[22]=="0.00":
        spline.insert(23,"")
        spline.insert(23,"")
        gl=spline[0]
        gl=gl.strip()
        if gl[0]=="-": gl="000" + gl
        gl=gl.upper()
        spline[0]=gl

        if gl[:3]=="000": spline[24]="Incorrect"

        s="|"
        newline=s.join(spline)

        with open(r"C:PATHPythonWrittenData.txt",'a') as NewFile:
            NewFile.write(newline)

    counter+=1          
    if counter%100000==0: print("Status Update: \n", "{:,}".format(counter))

Solution

A nice trick you can use in python is to open two (or more) files at once in one line. This is done with something like:

with open('file_one.txt', 'r') as file_one, open('file_two.txt', 'w') as file_two:
    for line in file_one:
        # ... process the line here ...
        file_two.write(line)

This is a very common way of reading from one file and writing to another without continually opening and closing one of them.

Currently, you’re opening and closing a file on every iteration of the loop. Your program loops through the lines in name.txt and checks an if / elif condition; whenever either branch runs, it opens an output file, writes to it, and closes it again.

Simply by opening the files once, before the loop, you stop opening and closing them repeatedly.
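A minimal sketch of the question's loop with every file opened once, in a single with statement before the loop. The function name rewrite, its parameters and the demo file contents are hypothetical; the per-line logic follows the question, except that gl.startswith("-") replaces gl[0] == "-" so an empty first field does not raise IndexError. The first-line check is kept so the change is easy to compare with the original.

```python
import os
import tempfile


def rewrite(source_path, header_path, data_path, every=100_000):
    """Same per-line logic as the question, but each file is opened only once."""
    s = "|"
    with open(source_path) as source, \
            open(header_path, "a") as header_file, \
            open(data_path, "a") as data_file:
        counter = 1
        for line in source:
            spline = line.split(s)
            if counter == 1:
                # As in the question, this leaves "Column B" before "Column A".
                spline.insert(23, "Column A")
                spline.insert(23, "Column B")
                header_file.write(s.join(spline))
            elif spline[22] != "0.00":
                spline.insert(23, "")
                spline.insert(23, "")
                gl = spline[0].strip()
                if gl.startswith("-"):  # also safe when the field is empty
                    gl = "000" + gl
                gl = gl.upper()
                spline[0] = gl
                if gl[:3] == "000":
                    spline[24] = "Incorrect"
                data_file.write(s.join(spline))
            counter += 1
            if counter % every == 0:
                print("Status Update: \n", "{:,}".format(counter))


# Demo on a tiny made-up file with 25 pipe-separated fields per row.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.txt")
    hdr = os.path.join(tmp, "header.txt")
    out = os.path.join(tmp, "data.txt")
    rows = [
        [f"h{i}" for i in range(25)],
        ["-ab"] + ["x"] * 21 + ["1.00", "y", "z"],  # kept and flagged
        ["cd"] + ["x"] * 21 + ["0.00", "y", "z"],   # skipped: field 22 is 0.00
    ]
    with open(src, "w") as f:
        f.writelines("|".join(r) + "\n" for r in rows)
    rewrite(src, hdr, out)
    with open(hdr) as f:
        header_text = f.read()
    with open(out) as f:
        data_text = f.read()

print(header_text, data_text)
```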

For more information on the with statement and other context managers, see the Python documentation.


Another small improvement can be made. At the moment, you check the first if condition on every iteration, but you know it will only ever evaluate to True once. It would be better to remove that check and perform that block exactly once, before the loop. Assign counter after that first block (where if counter == 1 currently is), then replace the elif statement with a loop over the remaining lines.
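A small sketch of handling the first line once: next() reads only the header, and a plain for loop then carries on from the second line with no per-line check. io.StringIO stands in for an open file here (it iterates the same way), and split_header is a hypothetical name.

```python
import io


def split_header(f):
    """Handle the first line once, then loop over the rest with no header check."""
    header = next(f)      # consumes only the first line
    counter = 1           # counter is set after the header has been handled
    rows = []
    for line in f:        # iteration resumes at the second line
        counter += 1
        rows.append(line.split("|"))
    return header, rows, counter


sample = io.StringIO("col1|col2\n1|a\n2|b\n")
header, rows, counter = split_header(sample)
print(repr(header))
print(rows)
print(counter)
```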


It would be worth getting familiar with PEP 8 if you’re going to use Python a lot in the future. It’s the standard style guide and will make your code easier to read (for you and others). Even small things help, like putting the body of an if on a new line after the colon, or putting spaces on either side of assignments and comparisons.
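As a small example, here is the first-column cleanup from the question written in PEP 8 style (one statement per line, spaces around = and ==), wrapped in a hypothetical helper called clean_account. It uses str.startswith("-") instead of gl[0] == "-", which also avoids an IndexError on an empty field.

```python
def clean_account(field):
    """Strip the field, prefix "000" if it starts with "-", then upper-case it."""
    gl = field.strip()
    if gl.startswith("-"):
        gl = "000" + gl
    gl = gl.upper()
    return gl


print(clean_account(" -ab12 "))
print(clean_account("x1"))
```

The caller can then test clean_account(spline[0])[:3] == "000" to decide whether to mark the row as "Incorrect".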


If you include an example file and desired output, there may be more I can help with.

Here is another way to organize your code. Instead of an if within the loop, use iterators more explicitly. Concretely:

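with open(r"C:Pathname.txt") as source:
    lines = iter(source)  # a file object is already an iterator; iter() just makes that explicit

    # first line (the header)
    first_line = next(lines)
    with open(r"C:PATH2019.txt", 'a') as summary:
        ...  # omitted: add the new columns and write the header

    # remaining lines
    with open(r"C:PATHPythonWrittenData.txt", 'a') as dest:
        for counter, line in enumerate(lines, start=1):
            ...  # omitted: process the line and write it to dest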

I have also used enumerate to update counter and line simultaneously.
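A short sketch of using enumerate for the status update from the question, wrapped as a generator so the main loop stays focused on processing. with_progress and its report parameter are hypothetical names; with a real file you would write for line in with_progress(source): .... Here counter is the number of lines read so far, so each update fires one line later than in the question's version.

```python
def with_progress(lines, every=100_000, report=print):
    """Yield each line unchanged, calling report() every `every` lines."""
    for counter, line in enumerate(lines, start=1):
        if counter % every == 0:
            report(f"Status Update: {counter:,}")
        yield line


# Collect the messages in a list instead of printing them.
messages = []
out = list(with_progress(("x\n" for _ in range(250_000)), report=messages.append))
print(len(out), messages)
```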

The other answer has some more tips on writing good Python code. But as far as structuring the opening and closing of files, as well as the main loop, this approach should get you started.
