# Calculate query coverage from BLAST output

Posted on

Problem

I have a BLAST output file and want to calculate query coverage, appending the query lengths as an additional column to the output. Let’s say I have

``````2 7 15
``````
``````f=open('file.txt', 'r')
import re
for line in lines:
new_list=re.split(r't+',line.strip())
q_start=new_list
q_end=new_list
q_len=new_list
q_cov=((float(q_end)-float(q_start))/float(q_len))*100
q_cov=round(q_cov,1)
q_cov=str(q_cov)
new_list.append(q_cov)
r=open('results.txt', 'a')
x='t'.join(new_list)
x=x+'n'
r.writelines(x)
f.close()
r.close()
``````

Solution

One serious bug is that you open `results.txt` for each line of input. It’s almost always better to open files in a `with` block. Then, you won’t have to worry about closing your filehandles, even if the code exits abnormally. The `with` block would have made your `results.txt` mistake obvious as well.

Since you want to treat your `q_start`, `q_end`, and `q_len` as numbers, I wouldn’t even bother to assign their string representations to a variable. Just convert them to a `float` as soon as possible. Similarly, `q_cov` should be a `float`; I would just stringify it at the last moment. I would also postpone rounding just for the purposes of formatting the output, preferring to preserve precision in `q_cov` itself.

Put your `import` statements at the beginning of the program.

``````import re

with open('file.txt') as input, open('results.txt', 'a') as output: