Problem
I made a Python function that takes an XML file as a parameter and returns a JSON.
My goal is to convert some XML get from an old API to convert into Restful API.
This function works, but it is really ugly. I try to clean everything, without success.
Here are my tests:
def test_one_node_with_more_than_two_children(self):
xml = '<a id="0" name="foo"><b id="00" name="moo" /><b id="01" name="goo" /><b id="02" name="too" /></a>'
expected_output = {
"a": {
"@id": "0",
"@name": "foo",
"b": [
{
"@id": "00",
"@name": "moo"
},
{
"@id": "01",
"@name": "goo"
},
{
"@id": "02",
"@name": "too"
}
]
}
}
self.assertEqual(expected_output, self.parser.convertxml2json(xml))
And my function:
from xml.dom.minidom import parseString, Node
class Parser(object):
def get_attributes_from_node(self, node):
attributes = {}
for attrName, attrValue in node.attributes.items():
attributes["@" + attrName] = attrValue
return attributes
def convertxml2json(self, xml):
parsedXml = parseString(xml)
return self.recursing_xml_to_json(parsedXml)
def recursing_xml_to_json(self, parsedXml):
output_json = {}
for node in parsedXml.childNodes:
attributes = ""
if node.nodeType == Node.ELEMENT_NODE:
if node.hasAttributes():
attributes = self.get_attributes_from_node(node)
if node.hasChildNodes():
attributes[node.firstChild.nodeName] = self.recursing_xml_to_json(node)[node.firstChild.nodeName]
if node.nodeName in output_json:
if type(output_json[node.nodeName]) == dict:
output_json[node.nodeName] = [output_json[node.nodeName]] + [attributes]
else:
output_json[node.nodeName] = [x for x in output_json[node.nodeName]] + [attributes]
else:
output_json[node.nodeName] = attributes
return output_json
Can someone give me some tips to improve this code?
def recursing_xml_to_json(self, parsedXml)
is really bad.
I am ashamed to produce it 🙂
Solution
There is a structural issue with your code: the recursive function works at two levels. 1) It constructs a dict representing a node, and 2) it does some work in constructing the representation for the children. This makes it unnecessarily complicated. Instead, the function should focus on handling one node, and delegate the creation of child nodes to itself via recursion.
You seem to have requirements such as:
- A node that has no children nor attributes converts to an empty string. My implementation:
return output_dict or ""
- Multiple children with same nodeName convert to a list of dicts, while a single one converts to just a dict. My implementation makes this explicit by constructing lists first, then applying this conversion:
v if len(v) > 1 else v[0]
- A node with children but no attributes raises a TypeError. I suspect this is an oversight and did not reproduce the behavior.
Note that (2) means that a consumer of your JSON that expects a variable number of nodes must handle one node as a special case. I don’t think that is good design.
def get_attributes_from_node(self, node):
attributes = {}
if node.attributes:
for attrName, attrValue in node.attributes.items():
attributes["@" + attrName] = attrValue
return attributes
def recursing_xml_to_json(self, node):
d = collections.defaultdict(list)
for child in node.childNodes:
if child.nodeType == Node.ELEMENT_NODE:
d[child.nodeName].append(self.recursing_xml_to_json(child))
output_dict = self.get_attributes_from_node(node)
output_dict.update((k, v if len(v) > 1 else v[0])
for k, v in d.iteritems())
return output_dict or ""