Problem
This piece of code is supposed to walk through a TLV string and print out its contents. For this particular example, tag field length is 2, size field length is 3.
As a mostly C programmer, this is what I came up with. I know I can parametrize the tag and length field sizes, but I don’t care about that right now. Just interested to know the more pythonic way of parsing the string.
tlv = "01011Lorem ipsum02014dolor sit amet03027consectetur adipiscing elit"
temp = tlv[:]
while len(temp):
    print('tag: {}'.format(temp[0:2]))
    print('length: {}'.format(temp[2:2+3]))
    tam = int(temp[2:2+3])
    print('value: \'{}\'\n'.format(temp[5:5+tam]))
    temp = temp[5+tam:]
Solution
Some initial comments:
- Good use of string slicing to extract each field, though ideally we should not need to make a copy of the string. I propose a different way of doing this below.
- The loop condition
      while len(temp):
  works, but a more Pythonic way of writing this would be
      while temp:
  because any object can be tested for a truth value in Python, and non-empty strings evaluate to true.
- A popular new way of string interpolation introduced in Python 3.6 is the f-string. So instead of
      'tag: {}'.format(temp[0:2])
  you could do
      f'tag: {temp[0:2]}'
  which I personally find much easier to read.
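To make the last two comments concrete, here is a minimal sketch of the original loop with only those two changes applied (truthiness test and f-strings). The list named seen is a hypothetical addition, kept only so the parsed fields can be inspected afterwards; it is not part of the original code.

```python
tlv = "01011Lorem ipsum02014dolor sit amet03027consectetur adipiscing elit"

temp = tlv
seen = []  # hypothetical helper list, only here so the result can be checked
while temp:  # an empty string is falsy, so the loop stops when nothing is left
    tam = int(temp[2:5])        # 3-character length field
    value = temp[5:5 + tam]     # value field of `tam` characters
    print(f"tag: {temp[0:2]}")
    print(f"length: {temp[2:5]}")
    print(f"value: {value!r}\n")
    seen.append((temp[0:2], tam, value))
    temp = temp[5 + tam:]       # still builds a new string on every iteration
```

This still slices off a new string on each pass, so it addresses readability only, not the copying.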
To avoid copying the string tlv, we can instead work with an iterator over the characters in the string, i.e. iter(tlv). Then we can use itertools, a standard-library module for working with iterators, and specifically itertools.islice, to extract/consume chunks of arbitrary length from the iterator:
from itertools import islice

TAG_FIELD_LENGTH = 2
LENGTH_FIELD_LENGTH = 3

def tlv_parser(tlv_string):
    it = iter(tlv_string)
    while tag := "".join(islice(it, TAG_FIELD_LENGTH)):
        length = int("".join(islice(it, LENGTH_FIELD_LENGTH)))
        value = "".join(islice(it, length))
        yield (tag, length, value)
Notes on the above:
- The strategy of parsing is very similar to yours, with the only difference being that we’re working with an iterator of characters, and we’re consuming from the iterator chunk by chunk so we don’t need to calculate indices for field boundaries.
- We’re concatenating the characters in each iterator to a string with "".join(...), i.e. str.join.
- := is the “walrus operator” (introduced in Python 3.8) that binds values to variables as part of a larger expression, so we’re binding the value of "".join(islice(it, TAG_FIELD_LENGTH)) to tag and at the same time testing its truth value.
- The yield keyword makes tlv_parser a generator of 3-tuples of (tag, length, value).
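To see the chunk-by-chunk consumption in isolation, here is a small sketch that pulls a single record off an iterator by hand. The variable leftover is a hypothetical name used only to show what happens once the iterator is exhausted.

```python
from itertools import islice

it = iter("01011Lorem ipsum")

tag = "".join(islice(it, 2))            # consumes "01"
length = int("".join(islice(it, 3)))    # consumes "011" and converts it to 11
value = "".join(islice(it, length))     # consumes the next 11 characters
leftover = "".join(islice(it, 2))       # nothing left, so this is the empty string

print(tag, length, repr(value), repr(leftover))
# prints: 01 11 'Lorem ipsum' ''
```

Each islice call picks up where the previous one stopped, and the empty string returned at the end is what makes the while condition in tlv_parser false.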
Example usage:
>>> tlv = "01011Lorem ipsum02014dolor sit amet03027consectetur adipiscing elit"
>>> for t in tlv_parser(tlv):
...     print(t)
...
('01', 11, 'Lorem ipsum')
('02', 14, 'dolor sit amet')
('03', 27, 'consectetur adipiscing elit')
>>> for tag, length, value in tlv_parser(tlv):
...     print(f"tag: {tag}")
...     print(f"length: {length}")
...     print(f"value: {value!r}\n")
...
tag: 01
length: 11
value: 'Lorem ipsum'

tag: 02
length: 14
value: 'dolor sit amet'

tag: 03
length: 27
value: 'consectetur adipiscing elit'
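The question mentions that the tag and length field sizes could be parametrized. One possible sketch of that, using the same islice approach, is below. The function name parse_tlv and the keyword arguments tag_len and length_len are hypothetical names chosen for illustration, and the short input string is made up for the example.

```python
from itertools import islice


def parse_tlv(tlv_string, tag_len=2, length_len=3):
    """Yield (tag, length, value) tuples; field widths are passed as arguments."""
    it = iter(tlv_string)
    # An empty tag means the iterator is exhausted, which ends the loop.
    while tag := "".join(islice(it, tag_len)):
        length = int("".join(islice(it, length_len)))
        value = "".join(islice(it, length))
        yield tag, length, value


# Made-up input with a 1-character tag and a 2-character length field.
records = list(parse_tlv("A05helloB03abc", tag_len=1, length_len=2))
print(records)
# prints: [('A', 5, 'hello'), ('B', 3, 'abc')]
```

Like the version above, this sketch does no validation: a truncated value is yielded as-is, and a missing or non-numeric length field raises ValueError from int().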