ToList method in Linq

Question

If I am not wrong, the ToList() method iterate on each element of provided collection and add them to new instance of List and return this instance.Suppose an example

//using linqlist = Students.Where(s => s.Name == "ABC").ToList();//traditional wayforeach (var student in Students){if (student.Name == "ABC")list.Add(student);}

I think the traditional way is faster, as it loops only once, where as of above of Linq iterates twice once for Where method and then for ToList() method.

The project I am working on now has extensive use of Lists all over and I see there is alot of such kind of use of ToList() and other Methods that can be made better like above if I take list variable as IEnumerable and remove .ToList() and use it further as IEnumerable.

Do these things make any impact on performance?

Best Answer

var list = Students.Where(s=>s.Name == "ABC");

This will only create a query and not loop the elements until the query is used. By calling ToList() will first then execute the query and thus only loop your elements once.

List<Student> studentList = new List<Student>();var list = Students.Where(s=>s.Name == "ABC");foreach(Student s in list){studentList.add(s);}

this example will also only iterate once. Because its only used once. Keep in mind that list will iterate all students everytime its called.. Not only just those whose names are ABC. Since its a query.

And for the later discussion Ive made a testexample. Perhaps its not the very best implementation of IEnumable but it does what its supposed to do.

First we have our list

public class TestList<T> : IEnumerable<T>{private TestEnumerator<T> _Enumerator;public TestList(){_Enumerator = new TestEnumerator<T>();}public IEnumerator<T> GetEnumerator(){return _Enumerator;}System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator(){throw new NotImplementedException();}internal void Add(T p){_Enumerator.Add(p);}}

And since we want to count how many times MoveNext is called we have to implement our custom enumerator aswel. Observe in MoveNext we have a counter that is static in our program.

public class TestEnumerator : IEnumerator{public Item FirstItem = null;public Item CurrentItem = null;

 public TestEnumerator(){}public T Current{get { return CurrentItem.Value; }}public void Dispose(){}object System.Collections.IEnumerator.Current{get { throw new NotImplementedException(); }}public bool MoveNext(){Program.Counter++;if (CurrentItem == null){CurrentItem = FirstItem;return true;}if (CurrentItem != null && CurrentItem.NextItem != null){CurrentItem = CurrentItem.NextItem;return true;}return false;}public void Reset(){CurrentItem = null;}internal void Add(T p){if (FirstItem == null){FirstItem = new Item<T>(p);return;}Item<T> lastItem = FirstItem;while (lastItem.NextItem != null){lastItem = lastItem.NextItem;}lastItem.NextItem = new Item<T>(p);}}

And then we have a custom item that just wraps our value

public class Item<T>{public Item(T item){Value = item;}public T Value;public Item<T> NextItem;}

To use the actual code we create a "list" with 3 entries.

 public static int Counter = 0;static void Main(string[] args){TestList<int> list = new TestList<int>();list.Add(1);list.Add(2);list.Add(3);var v = list.Where(c => c == 2).ToList(); //will use movenext 4 timesvar v = list.Where(c => true).ToList(); //will also use movenext 4 timesList<int> tmpList = new List<int>(); //And the loop in OP questionforeach(var i in list){tmpList.Add(i);} //Also 4 times.}

And conclusion? How does it hit performance?The MoveNext is called n+1 times in this case. Regardless of how many items we have. And also the WhereClause does not matter, he will still run MoveNext 4 times. Because we always run our query on our initial list. The only performance hit we will take is the actual LINQ framework and its calls. The actual loops made will be the same.

And before anyone asks why its N+1 times and not N times. Its because he returns false the last time when he is out of elements. Making it the number of elements + end of list.

To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.

In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration creating a temporary list and then passing those results to the rest of the query. It just determines if an item meets a criteria and only passes on those that match.

First of all, Why are you even asking me? Measure for yourself and see.

That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:

var list = Students.Where(s => s.Name == "ABC");

won't execute anything. This will return momentarily even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".

There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.

If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.

Accepted Answer

Do these things make any impact on performance?

That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases, this hit can be significant for you, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is reason why your code is slow).

But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).

Regarding the number of iterations: it depends on what exactly do you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):

Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…

What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().

ToList method in Linq

Best Answer

Random Posts