# Parsing a TLV string

## Problem

This piece of code is supposed to walk through a TLV (tag-length-value) string and print out its contents. In this particular example, the tag field is 2 characters long and the length field is 3.

As a mostly C programmer, this is what I came up with. I know I can parametrize the tag and length field sizes, but I don’t care about that right now. Just interested to know the more pythonic way of parsing the string.

```python
tlv = "01011Lorem ipsum02014dolor sit amet03027consectetur adipiscing elit"

temp = tlv[:]

while len(temp):
    print('tag: {}'.format(temp[0:2]))
    print('length: {}'.format(temp[2:2+3]))
    tam = int(temp[2:2+3])
    print('value: \'{}\'\n'.format(temp[5:5+tam]))
    temp = temp[5+tam:]
```

## Solution

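• Good use of string slicing to extract each field, though ideally we should not need to copy the string: `temp = tlv[:]` is unnecessary because strings are immutable, and `temp = temp[5+tam:]` builds a new string on every iteration. I propose a different way of doing this below.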
• `while len(temp):` works, but a more Pythonic way of writing this would be `while temp:` because any object can be tested for a truth value in Python, and non-empty strings evaluate to true.
• A popular new way of string interpolation introduced in Python 3.6 is the f-string. So instead of `'tag: {}'.format(temp[0:2])` you could do `f'tag: {temp[0:2]}'` which I personally find much easier to read.
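Applied to the original loop, the last two points might look like the sketch below. The function name `describe_tlv` and the choice to collect the lines in a list before printing are illustrative assumptions, not part of the original code; the slicing logic itself is unchanged.

```python
def describe_tlv(tlv):
    """Return the lines to print for each record (hypothetical helper name)."""
    lines = []
    temp = tlv  # strings are immutable, so no copy is needed
    while temp:  # an empty string is falsy, so this stops when nothing is left
        tam = int(temp[2:5])
        lines.append(f"tag: {temp[0:2]}")
        lines.append(f"length: {temp[2:5]}")
        lines.append(f"value: {temp[5:5 + tam]!r}")
        temp = temp[5 + tam:]
    return lines


tlv = "01011Lorem ipsum02014dolor sit amet03027consectetur adipiscing elit"
print("\n".join(describe_tlv(tlv)))
```

The `!r` conversion inside the f-string calls `repr()` on the value, which supplies the surrounding quotes that the original code wrote by hand.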

To avoid copying the string `tlv` we can instead work with an iterator over the characters in the string, i.e. `iter(tlv)`. Then we can use `itertools`, a standard-library module for working with iterators, and specifically `itertools.islice`, to extract/consume chunks of arbitrary length from the iterator:

```python
from itertools import islice

TAG_FIELD_LENGTH = 2
LENGTH_FIELD_LENGTH = 3

def tlv_parser(tlv_string):
    it = iter(tlv_string)
    while tag := "".join(islice(it, TAG_FIELD_LENGTH)):
        length = int("".join(islice(it, LENGTH_FIELD_LENGTH)))
        value = "".join(islice(it, length))
        yield (tag, length, value)
```

Notes on the above:

• The strategy of parsing is very similar to yours, with the only difference being that we’re working with an iterator of characters, and we’re consuming from the iterator chunk by chunk so we don’t need to calculate indices for field boundaries.
• We’re concatenating the characters in each iterator to a string with `"".join(...)`, i.e. `str.join`.
• `:=` is the “walrus operator” (introduced in Python 3.8) that binds values to variables as part of a larger expression, so we’re binding the value of `"".join(islice(it, TAG_FIELD_LENGTH))` to `tag` and at the same time testing its truth value.
• The `yield` keyword makes `tlv_parser` a generator of 3-tuples of (tag, length, value).
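To see why no index arithmetic is needed, it may help to watch a single record being consumed step by step. This is only a sketch: the input `"01005hello"` is a made-up record (tag `01`, length 5), and the field widths 2 and 3 are hard-coded to match the example in the question.

```python
from itertools import islice

# Hypothetical demo input: one TLV record with tag "01" and length 5.
it = iter("01005hello")

tag = "".join(islice(it, 2))           # consumes "01"
length = int("".join(islice(it, 3)))   # consumes "005", converted to 5
value = "".join(islice(it, length))    # consumes "hello"
rest = "".join(islice(it, 2))          # iterator is exhausted, so this is ""

print(tag, length, repr(value), repr(rest))
```

Each `islice` call picks up where the previous one stopped because all of them share the same iterator. Once the iterator is exhausted, joining an empty slice gives `""`, which is falsy, and that is what ends the `while tag := ...` loop in `tlv_parser`.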

Example usage:

```python
>>> tlv = "01011Lorem ipsum02014dolor sit amet03027consectetur adipiscing elit"

>>> for t in tlv_parser(tlv):
...     print(t)
...
('01', 11, 'Lorem ipsum')
('02', 14, 'dolor sit amet')
('03', 27, 'consectetur adipiscing elit')

>>> for tag, length, value in tlv_parser(tlv):
...     print(f"tag: {tag}")
...     print(f"length: {length}")
...     print(f"value: {value!r}\n")
...
tag: 01
length: 11
value: 'Lorem ipsum'

tag: 02
length: 14
value: 'dolor sit amet'

tag: 03
length: 27