Repeatable HTTP requests in Python

Posted on

Problem

I have a simple python script that’s repeating a HTTP response/request. I would like to know how I could have constructed this better, particularly if there’s a way I could have incorporated the slice operator to cut down on my variable usage.

For context – the response contained a JSON string in between the <ExecuteResult> flags that I’m removing currently with split()

import requests
import json
from pprint import pprint

url = "<redacted>"
cookies = {<redacted>}
headers = {<redacted>}
data="<redacted>"

submissions = input("How many submissions to create:")

while submissions > 0:
    response = requests.post(url, headers=headers, cookies=cookies, data=data)
    xml_string = response.content
    json_file = xml_string.split('<ExecuteResult>')[1]
    json_file = (json_file.split('</ExecuteResult>')[0]).replace("\", "")
    json_response = json.loads(json_file)
    print "Created Id: " + json_response['_entity']['Id']
    submissions -= 1

Solution

If you are making many requests, it is worth it to have a look at requests.Session. This re-uses the connection to the server for subsequent requests (and also the headers and cookies).

I would also start separating your different responsibilities. One might be global constants (like your header and cookies), one a function that actually does the submission, one that creates the session and does each submission and then some code under a if __name__ == '__main__': guard that executes the code.

import requests
import json

COOKIES = {<redacted>}
HEADERS = {<redacted>}

def post(session, url, data):
    """Use session to post data to url.
    Returns json object with execute result"""
    response = session.post(url, data=data)
    json_file = response.content.split('<ExecuteResult>')[1]
    json_file = (json_file.split('</ExecuteResult>')[0]).replace("\", "")
    return json.loads(json_file)


def post_submisssions(url, data, n):
    """Create a session and post `n` submissions to `url` using `data`."""
    session = requests.Session()
    session.headers.update(HEADERS)
    session.cookies = COOKIES

    for _ in xrange(n):
        json_response = post(session, url, data)
        print "Created Id:", json_response['_entity']['Id']


if __name__ == "__main__":
    url = "<redacted>"
    data = "<redacted>"
    post_submisssions(url, data, input("How many submissions to create:"))

I also added some rudimentary docstrings.


You could also change your parsing by using BeautifulSoup:

from bs4 import BeautifulSoup

def post(session, url, data):
    """Use session to post data to url.
    Returns json object with execute result"""
    response = session.post(url, data=data)
    soup = BeautifulSoup(response.content, 'lxml')
    return json.loads(soup.find('ExecuteResult'.lower()).replace("\", ""))

I’ll start with the obvious:

while submissions > 0:
    ...
    submissions -= 1

can be transformed into:

for _ in xrange(submissions):
    ...

There are other changes that I think would make the code slightly better, albeit not necessarily more pythonic:

  • submissions should probably be named submission_count (thanks, Ludisposed!).
  • You appear to be parsing HTML using string.split(). It might work OK now, but if you need to add more functionality in the future this approach will be very cumbersome. Try using a library instead (for example, Beautiful Soup).
  • You’re not using the pprint import, so you might as well remove that line.

Leave a Reply

Your email address will not be published. Required fields are marked *