Problem
I can’t decide whether my method that returns an IEnumerable<T>
should itself be lazy or whether it should build a list and return that. I often opt for the latter so I can be sure the enumeration of each value isn’t performed multiple times.
For example:
public IEnumerable<UserProfile> GetUsers()
{
var allDepartments = GetAllDepartments(active: true); // Returns IEnumerable<Department>
var allUsers = GetAllUserDepartments(active: true); // Returns IEnumerable<User>
var users
= allDepartments
.Join(allUsers, x => x.Department, y => y.Department, (x, y) => y, StringComparer.OrdinalIgnoreCase)
.Distinct()
.Select(x => GetUserProfile(x))
.ToList(); // Is this recommended or not
return users;
}
The key part is each enumeration is doing something (GetUserProfile
) non trivial, it might be expensive, it might not, but what’s the recommendation for a method that returns an IEnumerable<T>
? Should it care about whether the caller might enumerate it several times or not?
Assume only IEnumerable<T>
functionality is required by the caller and my question can be reworded into:
How do I signify to the caller that each enumeration may be expensive?
How expensive it is, may change their implementation decisions. If they enumerate the whole lot several times, then a ToList()
makes sense to them to perform. If they don’t enumerate the whole lot, then a ToList()
would be a waste for me (or the caller) to perform.
Solution
Does code that calls the method always expect List
functionality (access by index, etc.)? Return a List
. Does code that calls the method only expect to iterate over it? Return an IEnumerable
.
You shouldn’t care about what the caller does with it, because the return type clearly states what the returned value is capable of doing. Any caller that gets an IEnumerable
result knows that if they want indexed access of the result, they will have to convert to a List
, because IEnumerable
simple isn’t capable of it until it’s been enumerated and put into an indexed structure. Don’t assume that the callers are stupid, otherwise you end up taking functionality away from them. For example, by returning a List
, you’ve taken away the ability to stream results which can have its own performance benefits. Your implementation may change, but the caller can always turn an IEnumerable
into a List
if they need to.
Now that you’ve decided what the return type should be, you have to decide what the implementation should use. Deferred execution has benefits, but can also be confusing. Take this example:
public static IEnumerable<int> GetRecordIds()
{
return dbContext.Records.Select(r => r.Id);
}
IEnumerable<int> currentRecordIds = GetRecordIds();
dbContext.Add(new Record { Id = 7 });
// Includes "7", despite GetRecordIds() being called before the Add():
currentRecordIds.Dump();
This can be “remedied” by GetRecordIds
calling ToList
before returning. The “correct” use here simply depends on what is expected of the class (“live” results, or time of call results).
Again, don’t assume the caller is stupid, and don’t take away functionality from the caller by making assumptions about how it will be used. Remember, the interface tells you what is expected to be returned. IEnumerable
only means you are getting something that can be iterated over (potentially streaming results and making use of deferred execution), and List
only means you’re getting an in-memory collection that can be added to, removed from, accessed by index, etc.
Edit – To address your “Update 1”:
If you are really concerned, you might add some information into the documentation, but I don’t think it’s necessary. The IEnumerable
interface doesn’t claim to make any promises about performance on repeat enumeration (or even each iteration for that matter), so it’s on the caller to handle it intelligently.
I think the real question here is can the return value be lazily evaluated? This is essentially what this comes down to.
IEnumerable
is lazily evaluated in that any operations you apply to it (through Linq, or any sensible author) will be deferred until the last possible moment i.e, until you try and observe the IEnumerable
by iterating or by using a fold (First()
, FirstOrDefault()
). This is a good thing in most cases because it means you can completely avoid doing any work until you need to – which sometimes means that a potentially expensive operation never actually occurs. Awesome!
However, this doesn’t work so well when it comes to sequences that are retrieved from some unmanaged resource. Your example of this is perfect – a database. An IEnumerable
backed by a database should not be lazily evaluated, as the connection it was bound to can potentially be closed by the time it is enumerated.
The exception to this is if you can find some way of binding the lifecycle of a connection to an IEnumerable
– in other words, can you teach the IEnumerable
how to instantiate a connection/other resource, use it, and then dispose it. In some applications this solution might not be viable due to the performance impact, and you might feel uneasy about the loss of control here (this is where IoC comes in handy).
There is a framework called Rx (Reactive Extensions) for .NET that actually has a feature in this built in called Observable.Using()
. An observable is a class that can be observed, and is lazily evaluated – much like IEnumerable
. In fact, IObservable
implements IEnumerable
. By using Observable.Using()
, you can bind the creation of an object through a factory method to the lifetime of an IObservable
. When the IObservable
is first observed, the resource is created. When the IObservable
is finished being observed, the resource is Dispose()
d. Neat and tidy.
Back to the crux of your question, though; in this case, I would say, yes, ToList()
the IEnumerable
before returning the control to the caller; you will run into concurrency issues otherwise, and you have no guarantee that the connection will still be alive when you return control to the caller. But make sure to return an IEnumerable
still – IEnumerable
has a dual purpose in that it is very good at representing a read-only collection.
You shouldn’t use var
if it is to put a comment stating the return type, if it isn’t clear without the comment, put the real type instead of var
, it is more readable this way.
To answer you question, I think that yes you should call ToList
before returning the enumeration mainly because your IEnumerable<UserProfile>
could be an EF query and there’s no way for the user of your class to know this, meaning the person could use this enumeration multiple times and launch tons of SQL queries without knowing it (until there would be performance problems and diagnostic would lead to that IEnumerable
launching queries).
The performance cost of using ToList
is the same as using new List<>([your enumerable])
constructor, which is an O(n) operation since you need to copy the whole array. If you have a very big enumerable that you don’t want to copy all the time, maybe you should consider adding a parameter to your method that would let you decide if you execute the IEnumerable
or not this way :
public IEnumerable<UserProfile> GetUsers(bool deferExecution)
{
IEnumerable<Department> allDepartments = GetAllDepartments(active: true);
IEnumerable<User> allUsers = GetAllUserDepartments(active: true);
IEnumerable<UserProfile> users
= allDepartments
.Join(allUsers, x => x.Department, y => y.Department, (x, y) => y, StringComparer.OrdinalIgnoreCase)
.Distinct()
.Select(x => GetUserProfile(x));
if(!deferExecution)
{
users = users.ToList();
}
return users;
}
Though honestly, I’ve never tried it with the boolean in real life so I can’t tell if it is a really good idea, but I feel it wouldn’t be bad!
If you want more info about the performance cost of ToList
, check this post.