Python 3 Simple Xor Pipe/CLI Program

Problem

I wrote a simple Python 3 program that reads data from either a file or standard input and XOR-encrypts or decrypts it. By default the output is base64-encoded, but the --raw flag disables that. It works as intended except in raw mode, where an extra line containing a stray character is appended to the output when decrypting.

#!/usr/bin/env python3
from itertools import cycle
import argparse
import base64

import re
def xor(data, key):
    return ''.join(chr(ord(str(a)) ^ ord(str(b))) for (a, b) in zip(data, cycle(key)))

# check if a string is base64 encoded.
def is_base64(s):
    pattern = re.compile("^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$")
    if not s or len(s) < 1:
        return False
    else:
        return pattern.match(s)



if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--infile', type=argparse.FileType('r'), default='-', dest='data', help='Data to encrypt '
                                                                                                      'or decrypt')
    parser.add_argument('-r', '--raw', dest='raw', default=False, action='store_true', help='Do not use base64 encoding')
    parser.add_argument('-k', '--key', dest='key', help='Key to encrypt with', required=True)
    args = parser.parse_args()

    data = args.data.read()
    key = args.key
    raw = args.raw
    if raw:
        ret = xor(data, key)
        print(str(ret))
    else:
        if is_base64(data):
            # print('is base64')
            decoded = base64.b64decode(data).decode()
            ret = xor(decoded, key)
            print(ret)
        else:
            # print('is not base64')
            ret = xor(data, key)
            encoded = base64.b64encode(bytes(ret, "utf-8"))
            print(encoded.decode())

When running without the --raw flag, everything performs as intended:

$ echo lol|./xor.py -k 123
XV1fOw==
$ echo lol|./xor.py -k 123 |./xor.py -k 123
lol

However, if I disable base64, something rather odd happens. It's easier to demonstrate than it is to explain:

$ echo lol|./xor.py -k 123 -r |./xor.py -k 123 -r
lol
8

Does anyone know why I am seeing the character 8 appended to the output of xor decrypted data? I have a C program called xorpipe that I use for this exact use case, and it does not suffer from this bug. I wanted to rewrite it in Python.

I am also looking for other constructive criticism, suggestions, or reviews. In particular, I would like argparse to be able to determine whether the supplied input is a file, a string, or data piped from standard input. This is easy to accomplish in bash or C, but I am not sure how best to do it in Python.

Solution

Does anyone know why I am seeing the character 8 appended to the output of xor decrypted data?

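The statement echo lol pipes lol\n (with a trailing newline) to Python. The first pass XORs that newline with the key character 1, producing ;, and print() then appends a newline of its own, so the second pass receives ]]_;\n. Decrypting that yields lol\n, and the extra newline XORed with the key character 2 becomes 8. Unfortunately echo -n doesn't help here, because print() in the first pass still adds a newline, but adding .strip() to the input data in the Python script fixes this issue (bear in mind that .strip() would also remove any ciphertext characters at either end that happen to be whitespace).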
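The origin of the stray 8 can be checked by replaying the two raw-mode passes in memory. This is a minimal sketch that assumes the key 123 and the echo lol input from the question; the xor helper mirrors the question's function, and the names piped_in, first_pass, and second_pass are illustrative only.

```python
from itertools import cycle


def xor(data, key):
    # Character-wise XOR, equivalent to the question's xor() for str input.
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(data, cycle(key)))


key = '123'
piped_in = 'lol\n'                       # what `echo lol` writes to the pipe
first_pass = xor(piped_in, key) + '\n'   # print() appends its own newline
second_pass = xor(first_pass, key)       # the second ./xor.py -r invocation

print(repr(first_pass))    # ']]_;\n'
print(repr(second_pass))   # 'lol\n8'
```

The final print() in the real script then adds one more newline after the 8, which is why it appears on a line of its own.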

PEP 8 is the standard style guide for Python code; it specifies how line breaks and code layout should be structured: https://www.python.org/dev/peps/pep-0008/ . The guide is very long, so you can use a tool such as autopep8 to auto-format the code.

    if raw:
        ret = xor(data, key)
        print(str(ret))
    else:
        if is_base64(data):
            # print('is base64')
            decoded = base64.b64decode(data).decode()
            ret = xor(decoded, key)
            print(ret)
        else:
            # print('is not base64')
            ret = xor(data, key)
            encoded = base64.b64encode(bytes(ret, "utf-8"))
            print(encoded.decode())

I would simplify the nested if statements. One option is to move this logic into a function and return right after print(str(ret)), so that the is_base64 branch can be unindented. Another is to assign the final string to a variable in each branch and print it once at the end of the if/elif chain.
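One way the flattened version might look is sketched below. It assumes the branching logic is moved into a helper, here given the hypothetical name transform, which returns the string to print instead of printing it; xor and is_base64 follow the question's code (with the regex copied from it), and nothing else about the script is changed.

```python
import base64
import re
from itertools import cycle

# Same pattern as in the question, compiled once at import time.
BASE64_RE = re.compile(
    "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$")


def xor(data, key):
    return ''.join(chr(ord(a) ^ ord(b)) for a, b in zip(data, cycle(key)))


def is_base64(s):
    return bool(s) and BASE64_RE.match(s) is not None


def transform(data, key, raw=False):
    """Return the text to print (hypothetical helper, not in the original)."""
    if raw:
        return xor(data, key)
    if is_base64(data):
        # Input looks like base64: decode it, then XOR back to plaintext.
        return xor(base64.b64decode(data).decode(), key)
    # Otherwise treat the input as plaintext: XOR it, then base64-encode.
    return base64.b64encode(xor(data, key).encode('utf-8')).decode()


print(transform('lol', '123'))
print(transform(transform('lol', '123'), '123'))
```

With early returns each branch sits at the same indentation level, and the caller needs only a single print(transform(data, key, raw)).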

Instead of using the regex, is_base64(s) could simply try base64.b64decode(s).decode() and return False if an exception is raised during decoding.
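A minimal sketch of that exception-based check follows. Two details are assumptions added here rather than part of the suggestion above: validate=True is passed so that characters outside the base64 alphabet raise an error instead of being silently discarded, and surrounding whitespace is stripped so that a trailing newline from a pipe does not cause a rejection.

```python
import base64


def is_base64(s):
    """Return True if s decodes as base64 into valid UTF-8 text."""
    s = s.strip()  # assumption: ignore the newline that a pipe or print() adds
    if not s:
        return False
    try:
        # binascii.Error (bad alphabet or padding) and UnicodeDecodeError
        # are both subclasses of ValueError.
        base64.b64decode(s, validate=True).decode()
    except ValueError:
        return False
    return True


print(is_base64('XV1fOw=='))  # True
print(is_base64('lol'))       # False
```

Because the check also requires the decoded bytes to be valid UTF-8, it is slightly stricter than the regex, which matches the .decode() call the script performs right afterwards.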

I would remove the commented-out code such as # print('is base64').
