Reading denormalized data using a stateful loop body

Posted on

Problem

I’m reading denormalized data with a SqlDataReader, the data has this form:

SELECT
    Objects.ObjectName, Objects.OwnerId, Owners.Name
FROM
    Objects
    INNER JOIN Owners ON Objects.OwnerId = Owners.OwnerId
ORDER BY
    Objects.OwnerId

Which has this output:

OjectName      OwnerId    OwnerName
Foo            1          Mr. Fu
Bar            1          Mr. Fu
Baz            2          Mrs. Daz

My reading loop is this:

using( SqlDataReader rdr = cmd.ExecuteReader() )
{
    Int64 lastOwnerId = -1;
    List<String> objectNames = new List<String>();
    while( rdr.Read() )
    {
        String objName   = rdr.GetString( 0 );
        Int64  ownerId   = rdr.GetInt64 ( 1 );
        String ownerName = rdr.GetString( 2 );

        if( ownerId != lastOwnerId && objectNames.Count > 0 )
        {
            // logic to populate and yield-return new Owner w/ ObjectNames

            objectNames.Clear();
            lastOwnerId = ownerId;
        }
        else
        {
            objectNames.Add( objName );
        }
    }

    if( ownerId != lastOwnerId && objectNames.Count > 0 )
    {
        // logic to populate and yield-return new Owner w/ ObjectNames          
    }
}

The logic to populate and yield-return new Owner w/ObjectNames exists twice. Short of moving it to its own function, is there another loop pattern that is DRY?

Solution

In essence your code is currently:

while (rdr.Read()) {
    if (new owner) {
       // do duplicated logic yield stuff
       // Prepare for new owner
    } else {
       // Add owner names
    }
}

if (new owner) {
    // do duplicated logic yield stuff
}

To avoid the duplicated logic, the nicest solution would be to use an extra function, but alternatively you could also change the loop logic into being an eternal loop, and breaking out of it when reaching the end.

In other words something similar to the following:

while (True) {
    if (!rdr.read()) {
        last_record == true;
    }

    if (new owner || last_record ) {
       // do duplicated logic yield stuff
       if (!last_record) {
           // Prepare for new owner
       } else {
          break; // Break out of loop
       }
    } else {
       // Add owner names
    }
}

In this way you call the duplicated logic block both if there are data and you have a new owner, and if you’ve read the last record. Note that the !rdr.read() will read records even though the if condition is negated afterwards.

Whether you feel this is a better approach or not, is left for you to decide, but it does avoid the duplication of logic you are concerned about.

The way you are reading from the database is fairly standard. It would really be best to put your owner logic in a new function. If you can’t do that for some reason, you could put the logic in a lambda at the start of your function and call it in both places, but this would be less readable than having a separate function.

Also note that a SqlDataReader object can be accessed like an array.

String objName   = (string)rdr.Item["OjectName"];
Int64  ownerId   = (Int64)rdr.Item["OwnerId"];
String ownerName = (string)rdr.Item["OwnerName"];

If I understood correctly what you are trying to achieve (concatenating objects for owners), I think one way is to actually process all the data on SQL Server.

Setup:

-- drop table #ObjectOwner
create table #ObjectOwner
(
    ObjectName VARCHAR(64),
    OwnerId INT,
    OwnerName VARCHAR(64) 
)

insert into #ObjectOwner values ('Foo', 1, 'Mr. Fu'), ('Bar', 1, 'Mr. Fu'), ('Baz', 2, 'Mrs. Daz'), ('Other', 3, 'Anonymous')
GO

select * from #ObjectOwner
GO

Query:

-- get all owners and a concatenation of all their objects
SELECT OwnerId, OwnerName,
  STUFF((
    SELECT ', ' + ObjectName
    FROM #ObjectOwner 
    WHERE (OwnerId = Results.OwnerId) 
    FOR XML PATH(''),TYPE).value('(./text())[1]','VARCHAR(MAX)')
  ,1,2,'') AS NameValues
FROM #ObjectOwner Results
GROUP BY OwnerId, OwnerName
ORDER BY OwnerId

The advantage is that you avoid fetching lots of VARCHARs from SQL Server. However, the query is more complex and does not allow to easily change it. Also, result is obtained as a concatenation, not a list of objects, so it must be split, if a list is really needed.

My opinion is to stick with holroy‘s approach, but I felt like sharing this alternative.

Leave a Reply

Your email address will not be published. Required fields are marked *