Problem
I have the following functions which help me to open a text file and remove any blank (empty) lines:
def fileExists(file):
try:
f = open(file,'r')
f.close()
except FileNotFoundError:
return False
return True
def isLineEmpty(line):
return len(line.strip()) < 1
def removeEmptyLines(file):
lines = []
if not fileExists(file):
print ("{} does not exist ".format(file))
return
out = open(file,'r')
lines = out.readlines()
out.close()
out = open(file,'w')
t=[]
for line in lines:
if not isLineEmpty(line):
t.append(line)
out.writelines(t)
out.close()
As you can see I open a file 2 times. One for reading and one for writing. Is there anything I can to to improve this code?
Solution
First thing is that your function fileExists can be replaced with os.path.isfile
function
Now here you cannot have negative size of string so to make it less confusing you can do just:
def isLineEmpty(line):
return len(line.strip()) == 0
however I would not even create a function for this, because empty string is a False
in boolean meaning so this:
for line in lines:
if not isLineEmpty(line):
t.append(line)
will be replaced with:
for line in lines:
if line.strip():
t.append(line)
which can be prettifed using filter function:
lines = filter(lambda x: x.strip(), lines)
Now speaking about opening files, it is better to use with
statement, it is safer, prettier and more pythonic way.
So this:
out = open(file,'r')
lines = out.readlines()
out.close()
will be just:
with open(file,'r') as out:
lines = out.readlines()
Another thing is that python functions/variables should named using underscore as separtor. So in the end what we have is:
import os
def remove_empty_lines(filename):
if not os.path.isfile(filename):
print("{} does not exist ".format(filename))
return
with open(filename) as filehandle:
lines = filehandle.readlines()
with open(filename, 'w') as filehandle:
lines = filter(lambda x: x.strip(), lines)
filehandle.writelines(lines)
I see two good strategies to accomplish the task.
One solution is to read all of the text into a list, then rewind to the start of the file and write out the desired lines.
def remove_empty_lines(filename):
"""Overwrite the file, removing empty lines and lines that contain only whitespace."""
with open(filename, 'r+') as f:
lines = f.readlines()
f.seek(0)
f.writelines(line for line in lines if line.strip())
f.truncate()
The disadvantage of that approach is that it scales poorly if the file is huge, because the entire contents have to be read into memory first. The other approach would be to overwrite the file while reading from it. This uses two file objects, each keeping track of its own position within the file.
def remove_empty_lines(filename):
"""Overwrite the file, removing empty lines and lines that contain only whitespace."""
with open(filename) as in_file, open(filename, 'r+') as out_file:
out_file.writelines(line for line in in_file if line.strip())
out_file.truncate()
Note some other issues with your code.
First, PEP 8, the official Python style guide, recommends lower_case_with_underscore
for function names, and two blank lines between functions.
The fileExists
function does not just test for the existence of the file — it actually checks whether you can open it for reading, which is a more stringent condition, since it also involves file permissions. But I don’t see any reason to check specifically for file existence. All kinds of I/O errors are possible, such as filesystem permission denial, read-only filesystem, disk quota, or hardware failure. Furthermore, even if the file exists when you check, it could disappear during the split second between if not fileExists(…)
and the real open(…)
call. On top of that, your rmoveEmptyLines
function has no way of reporting failure to its caller. (Printing an error message doesn’t count!) Therefore, the only reasonable approach is to Just Do It, and handle any exception that might occur.
Any open()
call should be written using a with
block, which will automatically close the file handle when exiting the block.
I usually find it a bad idea to do a bunch of checks before doing things. This will avoid race conditions like when a file is deleted between when you check and when you actually open the file.
I also like to do “atomic” writes. By this I mean writing the entire file in one call rather than writing each line. I usually accomplish this by writing to a temporary file and then moving that file to overwrite the original. This also can help to avoid certain race conditions.
For instance I would do something like the following:
from tempfile import NamedTemporaryFile
from shutil import move
def remove_empty_lines(filename):
try:
with open(filename, "rb") as fin, NamedTemporaryFile(delete=False) as fout:
temp_filename = fout.name
for line in fin:
if line.strip():
fout.write(line)
except FileNotFoundError:
print("{} does not exist.".format(filename))
else:
move(temp_filename, filename)