Contest Solution: Al Gore Rhythm

Posted on

Problem

The Problem

After his brush with the Justice Department over fundraising with Buddhist monks, the Vice President devised a plan to
ensure that such activities are carried out in a more discrete manner and are kept less noticeable. Realizing the Democratic
National Committee’s need for more and more money in the campaign war chest, Mr. Gore devised a “rhythm method” for
accepting campaign donations from Buddhist monks and yet avoiding the conception of an independent counsel. Gore’s
theory is that if the donations are spaced no less than 28 days apart, an independent counsel will not be necessary.

To help Mr. Gore keep track of when it’s okay to accept money from Buddhist monks, you must write a program to
automatically scan the White House email logs for messages from “veep@whitehouse.gov” addressed to“buddha@whitehouse.gov” (the code name for the Al Gore Rhythm Method). Each such email is a secret entry in Mr.Gore’s Buddhist monk fundraising dairy. Your program must send Mr. Gore (“veep@whitehouse.gov”) a reply to
each such email advising him when the next donation may be accepted. You should assume that the email was sent the day
that the donation was accepted. To maintain more secrecy, Mr. Gore refers to a Buddhist monk as a “BM.”

Sample Input

Your program must process the White House electronic mail log, stored in the file whmail.log as input. The mail log is
an ASCII text file which contains a sequence of one or more valid email messages. Each email message consists of a header
followed by the body. A header consists of exactly four lines, each of which begins with a unique keyword. The first line is
the sender line and begins with the keyword From which is followed by a character string representing the email address of
the sender. The second line is the recipient line and begins with the keyword To which is followed by a character string
representing the email address of the recipient of the message. The third line is the date line and begins with the keyword
Date which is followed by a character string representing the date on which the email message was received. The fourth
line is the subject line and begins with the keyword Subject which is followed by an arbitrary character string. A single
blank space separates the keyword on each header line from the character string that follows. The body of an email message
is simply a sequence of one or more lines of text any of which may be blank. The body begins on the fifth line of the email
message. There will be normal White House email interspersed with the Al Gore Rhythm email, but your program should
ignore all email other than those from “veep” to “buddha.”
Sample contents of whmail.log could appear as:

From bill@whitehouse.gov
To buffy@airhead.net
Date Saturday, October 4, 1997
Subject Get together

Hey, honeypie. Ole Big Daddy sure is missin you. I’ll be a
lookin for you to come around again this weekend.
Love,
Bill

From veep@whitehouse.gov
To buddha@whitehouse.gov
Date Monday, October 6, 1997
Subject BM
Dear Buddha, I just had a BM. Please advise.

From reno@justice.gov
To bill@whitehouse.gov
Date Wednesday, October 8, 1997
Subject Roby Ridge

Mr. President:
  The situation with the lady in Roby is quickly deteriorating.
I advise we use an Apache loaded with napalm to flush the crazy
woman out. If it kills her, it serves her right for that Vaseline
trick. Dead or alive, at least it will be over. If I don’t hear
from you within the next hour, I’ll send for the chopper.

Janet

Sample Output

The output of your program must be directed to the screen and must consist of a reply email for each Al Gore Rhythm email
found in the log. Each reply must specify the date on which the next donation may be accepted. Your output must be
formatted exactly as that below, which is the output corresponding to the sample input above.

From buddha@whitehouse.gov
To veep@whitehouse.gov
Date Saturday, November 8, 1997
Subject Re: BM

Thank you for advising me of your BM. You may not have
another BM until Monday, November 3, 1997.

algore.py

import datetime

def search_match(t, f):
    return t.startswith('From veep@whitehouse.gov') and f.startswith('To buddha@whitehouse.gov')

def get_date(date):
    prefix = date.strftime('%A, %B ')
    day = date.strftime('%d').lstrip('0')
    postfix = date.strftime(', %Y')
    return '{0}{1}{2}'.format(prefix, day, postfix)

with open('whmail.log') as f:
    for line in f:
        if search_match(line, f.readline()):
            date, subject = datetime.datetime.strptime(f.readline().strip()[5:], "%A, %B %d, %Y"), f.readline()
            subject = 'Re:' + subject[7:]
            limit = date + datetime.timedelta(days=28)
            sent = limit + datetime.timedelta(days=5)
            print('From veep@whitehouse.govnTo buddha@whitehouse.govnDate {0}nSubject {1}nThank you for advising me of your BM. You may not havenanother BM until {2}'.format(get_date(sent), subject, get_date(limit)))

Any advice on performance enhancement and solution simplification is appreciated, as are topical comments!

Solution

Buggy / fragile code

As evidenced by the comments debating whether this code works or not, this code is fragile, if not outright broken.

  • In Python 2: If you run the program in Python 2, it will crash and explicitly tell you what the problem is:

    $ python algore.py 
    Traceback (most recent call last):
      File "algore.py", line 14, in <module>
        if search_match(line, f.readline()):
    ValueError: Mixing iteration and read methods would lose data
    

    The problem is that you are using both for line in f: … and f.readline() to fetch text from f, and the two mechanisms don’t interact well with each other:

    Imagine that you are implementing the code to support for line in f: …. Rather than reading one character at a time from the buffer, you’d want to fetch a block of characters to fill a buffer, then scan that buffer for the line break. But then, f.readline() also reads from f into a buffer, but starting from where the first read left off, which is likely beyond where the first line break is.

    One simple fix is to replace all calls to f.readline() with next(f, ''). Then, you would be consistently treating f as a line-by-line iterator, and not mixing the two mechanisms. But I’d still consider it to be poor style, since most people would assume that each iteration through the for line in f: … loop would consume just one line.

  • In Python 3: In Python 3 appears to not suffer from that problem: f.readline() uses the same buffer as the line-by-line iterator for f. However, your problem is that if search_match(line, f.readline()): always consumes another line from the buffer, whether or not line is a From header. Therefore, if you put a print(line) immediately after for line in f:, you’ll see that it’s skipping some lines.

Style

This simultaneous assignment of two variables is unnecessarily lengthening a line of code that is very long already. It should be split up into two statements.

date, subject = datetime.datetime.strptime(f.readline().strip()[5:], "%A, %B %d, %Y"), f.readline()

Suggested solution

Since the challenge states that every message contains headers that occur in a predictable order, we can simply do pattern matching using a regular expression. If whmail.log is not too large to fit entirely in memory, then the task would be greatly simplified by treating the file as one big string rather than a collection of lines.

The code to print the output would be more readable if you wrote the template as a triple-quoted multi-line string, with named placeholders instead of numbers.

Your get_date() function is poorly named, as “get” implies that you are retrieving a piece of data from some existing place. It’s actually a date-formatting function, though.

from datetime import datetime, timedelta
import re

BM_EMAIL_RE = re.compile(
    r'^From (?P<From>veep@whitehouse.gov)$s+'
    r'^To (?P<To>buddha@whitehouse.gov)$s+'
    r'^Date (?P<Date>.*)$s+'
    r'^Subject (?P<Subject>.*)$s+',
    re.MULTILINE
)

REPLY_TEMPLATE = """From buddha@whitehouse.gov
To veep@whitehouse.gov
Date {reply_date}
Subject Re: {subject}

Thank you for advising me of your BM. You may not have
another BM until {limit_date}."""

def format_date(date):
    # We want %e (day of month with no zero padding) instead of %d (day of
    # month with zero padding), but %e is not portable.  Perform substitution
    # as a workaround.
    return re.sub('0([0-9],)', r'1', date.strftime('%A, %B %d, %Y'))

with open('whmail.log') as f:
    for email in BM_EMAIL_RE.finditer(f.read()):
        date = datetime.strptime(email.group('Date'), '%A, %B %d, %Y')
        print(REPLY_TEMPLATE.format(
            subject=email.group('Subject'),
            reply_date=format_date(date + timedelta(days=33)),
            limit_date=format_date(date + timedelta(days=28)),
        ))

Leave a Reply

Your email address will not be published. Required fields are marked *