Problem
The Problem
After his brush with the Justice Department over fundraising with Buddhist monks, the Vice President devised a plan to
ensure that such activities are carried out in a more discrete manner and are kept less noticeable. Realizing the Democratic
National Committee’s need for more and more money in the campaign war chest, Mr. Gore devised a “rhythm method” for
accepting campaign donations from Buddhist monks and yet avoiding the conception of an independent counsel. Gore’s
theory is that if the donations are spaced no less than 28 days apart, an independent counsel will not be necessary.To help Mr. Gore keep track of when it’s okay to accept money from Buddhist monks, you must write a program to
automatically scan the White House email logs for messages from “veep@whitehouse.gov” addressed to“buddha@whitehouse.gov” (the code name for the Al Gore Rhythm Method). Each such email is a secret entry in Mr.Gore’s Buddhist monk fundraising dairy. Your program must send Mr. Gore (“veep@whitehouse.gov”) a reply to
each such email advising him when the next donation may be accepted. You should assume that the email was sent the day
that the donation was accepted. To maintain more secrecy, Mr. Gore refers to a Buddhist monk as a “BM.”Sample Input
Your program must process the White House electronic mail log, stored in the file whmail.log as input. The mail log is
an ASCII text file which contains a sequence of one or more valid email messages. Each email message consists of a header
followed by the body. A header consists of exactly four lines, each of which begins with a unique keyword. The first line is
the sender line and begins with the keyword From which is followed by a character string representing the email address of
the sender. The second line is the recipient line and begins with the keyword To which is followed by a character string
representing the email address of the recipient of the message. The third line is the date line and begins with the keyword
Date
which is followed by a character string representing the date on which the email message was received. The fourth
line is the subject line and begins with the keyword Subject which is followed by an arbitrary character string. A single
blank space separates the keyword on each header line from the character string that follows. The body of an email message
is simply a sequence of one or more lines of text any of which may be blank. The body begins on the fifth line of the email
message. There will be normal White House email interspersed with the Al Gore Rhythm email, but your program should
ignore all email other than those from “veep” to “buddha.”
Sample contents ofwhmail.log
could appear as:From bill@whitehouse.gov To buffy@airhead.net Date Saturday, October 4, 1997 Subject Get together Hey, honeypie. Ole Big Daddy sure is missin you. I’ll be a lookin for you to come around again this weekend. Love, Bill From veep@whitehouse.gov To buddha@whitehouse.gov Date Monday, October 6, 1997 Subject BM Dear Buddha, I just had a BM. Please advise. From reno@justice.gov To bill@whitehouse.gov Date Wednesday, October 8, 1997 Subject Roby Ridge Mr. President: The situation with the lady in Roby is quickly deteriorating. I advise we use an Apache loaded with napalm to flush the crazy woman out. If it kills her, it serves her right for that Vaseline trick. Dead or alive, at least it will be over. If I don’t hear from you within the next hour, I’ll send for the chopper. Janet
Sample Output
The output of your program must be directed to the screen and must consist of a reply email for each Al Gore Rhythm email
found in the log. Each reply must specify the date on which the next donation may be accepted. Your output must be
formatted exactly as that below, which is the output corresponding to the sample input above.From buddha@whitehouse.gov To veep@whitehouse.gov Date Saturday, November 8, 1997 Subject Re: BM Thank you for advising me of your BM. You may not have another BM until Monday, November 3, 1997.
algore.py
import datetime
def search_match(t, f):
return t.startswith('From veep@whitehouse.gov') and f.startswith('To buddha@whitehouse.gov')
def get_date(date):
prefix = date.strftime('%A, %B ')
day = date.strftime('%d').lstrip('0')
postfix = date.strftime(', %Y')
return '{0}{1}{2}'.format(prefix, day, postfix)
with open('whmail.log') as f:
for line in f:
if search_match(line, f.readline()):
date, subject = datetime.datetime.strptime(f.readline().strip()[5:], "%A, %B %d, %Y"), f.readline()
subject = 'Re:' + subject[7:]
limit = date + datetime.timedelta(days=28)
sent = limit + datetime.timedelta(days=5)
print('From veep@whitehouse.govnTo buddha@whitehouse.govnDate {0}nSubject {1}nThank you for advising me of your BM. You may not havenanother BM until {2}'.format(get_date(sent), subject, get_date(limit)))
Any advice on performance enhancement and solution simplification is appreciated, as are topical comments!
Solution
Buggy / fragile code
As evidenced by the comments debating whether this code works or not, this code is fragile, if not outright broken.
-
In Python 2: If you run the program in Python 2, it will crash and explicitly tell you what the problem is:
$ python algore.py Traceback (most recent call last): File "algore.py", line 14, in <module> if search_match(line, f.readline()): ValueError: Mixing iteration and read methods would lose data
The problem is that you are using both
for line in f: …
andf.readline()
to fetch text fromf
, and the two mechanisms don’t interact well with each other:Imagine that you are implementing the code to support
for line in f: …
. Rather than reading one character at a time from the buffer, you’d want to fetch a block of characters to fill a buffer, then scan that buffer for the line break. But then,f.readline()
also reads fromf
into a buffer, but starting from where the first read left off, which is likely beyond where the first line break is.One simple fix is to replace all calls to
f.readline()
withnext(f, '')
. Then, you would be consistently treatingf
as a line-by-line iterator, and not mixing the two mechanisms. But I’d still consider it to be poor style, since most people would assume that each iteration through thefor line in f: …
loop would consume just one line. -
In Python 3: In Python 3 appears to not suffer from that problem:
f.readline()
uses the same buffer as the line-by-line iterator forf
. However, your problem is thatif search_match(line, f.readline()):
always consumes another line from the buffer, whether or notline
is aFrom
header. Therefore, if you put aprint(line)
immediately afterfor line in f:
, you’ll see that it’s skipping some lines.
Style
This simultaneous assignment of two variables is unnecessarily lengthening a line of code that is very long already. It should be split up into two statements.
date, subject = datetime.datetime.strptime(f.readline().strip()[5:], "%A, %B %d, %Y"), f.readline()
Suggested solution
Since the challenge states that every message contains headers that occur in a predictable order, we can simply do pattern matching using a regular expression. If whmail.log
is not too large to fit entirely in memory, then the task would be greatly simplified by treating the file as one big string rather than a collection of lines.
The code to print the output would be more readable if you wrote the template as a triple-quoted multi-line string, with named placeholders instead of numbers.
Your get_date()
function is poorly named, as “get” implies that you are retrieving a piece of data from some existing place. It’s actually a date-formatting function, though.
from datetime import datetime, timedelta
import re
BM_EMAIL_RE = re.compile(
r'^From (?P<From>veep@whitehouse.gov)$s+'
r'^To (?P<To>buddha@whitehouse.gov)$s+'
r'^Date (?P<Date>.*)$s+'
r'^Subject (?P<Subject>.*)$s+',
re.MULTILINE
)
REPLY_TEMPLATE = """From buddha@whitehouse.gov
To veep@whitehouse.gov
Date {reply_date}
Subject Re: {subject}
Thank you for advising me of your BM. You may not have
another BM until {limit_date}."""
def format_date(date):
# We want %e (day of month with no zero padding) instead of %d (day of
# month with zero padding), but %e is not portable. Perform substitution
# as a workaround.
return re.sub('0([0-9],)', r'1', date.strftime('%A, %B %d, %Y'))
with open('whmail.log') as f:
for email in BM_EMAIL_RE.finditer(f.read()):
date = datetime.strptime(email.group('Date'), '%A, %B %d, %Y')
print(REPLY_TEMPLATE.format(
subject=email.group('Subject'),
reply_date=format_date(date + timedelta(days=33)),
limit_date=format_date(date + timedelta(days=28)),
))