How to remove goto statement from iterator

Posted on

Problem

I’m implementing a simple iterator in C# over an xml document, in order to process an xml document in as much a streaming manner as possible. For this reason, I’m using an XmlReader variable in a while loop.

This iterator must either return each first-level child elements, depending upon the name of the root tag. Otherwise, it should return the single root element as a whole.

In order to return an xml element, I have a choice of using XNode.ReadFrom() or XmlReader.ReadOuterXml(). Both of which perform an additionnal read on the XmlReader variable. Therefore, I must find a way to prevent reading past interesting data in the underlying xml.

The first draft I came up with had a goto statement that I now would like to avoid, without sacrificing readability. Here is the code:

    public IEnumerable<XElement> Elements()
    {
        while (reader_.Read())
        {
        next_element:
            switch (reader_.NodeType)
            {
                case XmlNodeType.Element:
                    {
                        // This is a composite document
                        // skip to first-level elements
                        if (reader_.LocalName.StartsWith("CompositeDocument"))
                        {
                            RootNamespace = reader_.NamespaceURI;
                            IsComposite = true;
                            continue;
                        }

                        var element = XNode.ReadFrom(reader_) as XElement;
                        System.Diagnostics.Debug.Assert(element != null);
                        yield return element;
                        goto next_element;
                    }
                    break;

                case XmlNodeType.Text:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                case XmlNodeType.CDATA:
                case XmlNodeType.EntityReference:
                case XmlNodeType.XmlDeclaration:
                case XmlNodeType.ProcessingInstruction:
                case XmlNodeType.DocumentType:
                case XmlNodeType.Comment:
                case XmlNodeType.EndElement:
                    break;
            }
        }
    }

A first attempt to remove the goto statement, involves using two embedded loops, and a boolean flag indicating whether to stay in the inner loop or break back to the outer loop. I feel this makes the code less readable. Here is the code:

    public IEnumerable<XElement> Elements()
    {
        while (reader_.Read())
        {
            var inner_loop = true;
            while (inner_loop)
            {
                switch (reader_.NodeType)
                {
                    case XmlNodeType.Element:
                        {
            // this is a composite document
            // skip to first-level elements
                            if (reader_.LocalName.StartsWith("CompositeDocument"))
                            {
                                RootNamespace = reader_.NamespaceURI;
                                IsComposite = true;
                                inner_loop = false;
                            }
                            else
                            {
                                var element = XNode.ReadFrom(reader_) as XElement;
                                System.Diagnostics.Debug.Assert(element != null);
                                yield return element;
                                inner_loop = true;                                    
                            }
                        }
                        break;

                    case XmlNodeType.Text:
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                    case XmlNodeType.CDATA:
                    case XmlNodeType.EntityReference:
                    case XmlNodeType.XmlDeclaration:
                    case XmlNodeType.ProcessingInstruction:
                    case XmlNodeType.DocumentType:
                    case XmlNodeType.Comment:
                    case XmlNodeType.EndElement:
                        inner_loop = false;
                        break;
                }
            }
        }
    }

The reason I’m inclined to keed the goto statement is that, in iterators, the yield statement is already akin to a goto statement, albeit to the next line of code.

Please, can anyone suggest how to best rewrite this code snippet ?

Solution

Could you do it with a boolean and short circuiting your while loop condition?

...
    var isElement = false;
    while (isElement || reader_.Read())
    {
        isElement = false;
        switch (reader_.NodeType)
        {
            case XmlNodeType.Element:
                {
                    // This is a composite document
                    // skip to first-level elements
                    if (reader_.LocalName.StartsWith("CompositeDocument"))
                    {
                        RootNamespace = reader_.NamespaceURI;
                        IsComposite = true;
                        continue;
                    }

                    var element = XNode.ReadFrom(reader_) as XElement;
                    System.Diagnostics.Debug.Assert(element != null);
                    yield return element;
                    isElement = true;
                }
                break;
...

Leave a Reply

Your email address will not be published. Required fields are marked *