Monday, July 16, 2012

Facts about the yield keyword

When I interview candidates for SE (Software Engineer) positions or above, I normally ask about less frequently used keywords such as yield to assess the candidate's C# language skills. Someone who has only read the theory will say something like "yield is something we use with foreach statements". But they go silent on the next question: "What is the difference between returning a List<T> and using yield return?" If they have actually used yield, they will say that a method can have more than one yield return, whereas a List<T> can be returned only once. But is that the only difference?


Some facts about yield
  • Introduced in C# 2.0 (.NET 2.0) with the intention of making iteration easier by eliminating the need for a hand-written IEnumerator implementation.
  • yield can only be used in methods, operators or property get accessors that return IEnumerable, IEnumerator or their generic variants.
  • It is not a .NET runtime feature. It is implemented at the language level and is converted to an IEnumerator implementation during compilation, i.e. you can't see the yield keyword if you open the assembly in Reflector.
  • Each method or get accessor that uses yield adds one compiler-generated private class to your assembly, no matter how many yield statements it contains.
  • The generated class implements IEnumerator, and quite often you can see a goto statement in its MoveNext method.
  • There is no one-to-one relation between foreach and yield, i.e. you can use foreach without yield and yield without foreach.
  • yield depends on IEnumerable, but IEnumerable does not depend on yield.
  • There is a "yield break" statement to stop returning values and end the sequence.
  • yield allows you to write a C# method that returns a value yet contains a code path with no return statement.
  • VB.NET got yield support only recently, in VB 11 with .NET 4.5.
  • yield return cannot appear in a try block that has a catch clause (a try block with only a finally is fine), and it is never allowed inside a catch or finally block.
  • It cannot be used inside anonymous methods or lambda expressions.
I don't think every item listed above needs an explanation, so let me go over just some of them.
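Before that, a few of the facts in the list can be seen together in one small sketch. The names used here (YieldFacts, TakeUntilNegative, SumByHand) are made up for illustration and do not come from any library. The sketch shows yield break ending a sequence, a value-returning method that has no return statement at all, and a caller that consumes the result without foreach.

```csharp
using System.Collections.Generic;

public static class YieldFacts
{
    // Hypothetical helper: yields items until the first negative number.
    // Note that there is no ordinary "return" statement in this method.
    public static IEnumerable<int> TakeUntilNegative(IEnumerable<int> source)
    {
        foreach (int item in source)
        {
            if (item < 0)
                yield break;    // ends the sequence; nothing more is returned
            yield return item;
        }
    }

    // Consumes a sequence without foreach by driving the enumerator by hand.
    public static int SumByHand(IEnumerable<int> numbers)
    {
        int sum = 0;
        using (IEnumerator<int> e = numbers.GetEnumerator())
        {
            while (e.MoveNext())
                sum += e.Current;
        }
        return sum;
    }
}
```

Calling YieldFacts.SumByHand(YieldFacts.TakeUntilNegative(new[] { 1, 2, -1, 10 })) adds up only 1 and 2, because yield break ends the sequence at -1 and the 10 is never reached.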


History of yield keyword
Earlier, if we wanted to make something iterable, we had to implement the IEnumerable interface, whose GetEnumerator method returns an IEnumerator for the iteration. The IEnumerator interface has the members Current, MoveNext and Reset, which provide the foreach functionality. Look at a sample implementation here. It is a little difficult, isn't it? This led to the invention of the yield keyword, which makes iteration easier.
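To give a feel for the difference, here is a rough sketch of the same countdown sequence written both ways. The names (Countdown, CountdownEnumerator, CountdownWithYield) are invented for this example, and the hand-written version is only one possible way to do it.

```csharp
using System.Collections;
using System.Collections.Generic;

// The old way: a hand-written IEnumerable<int> plus its IEnumerator<int>.
public sealed class Countdown : IEnumerable<int>
{
    private readonly int _from;
    public Countdown(int from) { _from = from; }

    public IEnumerator<int> GetEnumerator() { return new CountdownEnumerator(_from); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class CountdownEnumerator : IEnumerator<int>
    {
        private readonly int _from;
        private int _current;

        public CountdownEnumerator(int from)
        {
            _from = from;
            _current = from + 1;    // positioned before the first element
        }

        public int Current { get { return _current; } }
        object IEnumerator.Current { get { return _current; } }

        public bool MoveNext()
        {
            if (_current <= 1) return false;    // 1 was the last value (or there were none)
            _current--;
            return true;
        }

        public void Reset() { _current = _from + 1; }
        public void Dispose() { }
    }
}

public static class CountdownWithYield
{
    // The same sequence with yield: the compiler writes the enumerator class for us.
    public static IEnumerable<int> From(int from)
    {
        for (int i = from; i >= 1; i--)
            yield return i;
    }
}
```

Both new Countdown(3) and CountdownWithYield.From(3) can be passed to foreach and produce 3, 2, 1. The first needs two classes and explicit bookkeeping of the position; the second is a three-line loop.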

Dependency
Nothing in the C# language depends on yield. Even without the yield keyword we could accomplish all the same tasks. But think about IEnumerable: yield cannot exist without it (or IEnumerator), and foreach is built around the same enumerator pattern.


Why does goto appear in the generated code?
As I said earlier, yield is not a .NET Framework / runtime feature. It is a C# and VB language feature that is converted to an IEnumerator-based equivalent during compilation. This means you cannot expect to find the keyword in every .NET-compatible language. Sometimes, if you look at the generated IEnumerator class, you can see a goto statement. When does it appear? Let's take it case by case. The method below returns 2 integers using yield.
private IEnumerable<int> GetFirst2Integers()
{
    yield return 0;
    yield return 1;
}



If you open the assembly in Reflector, you can see a compiler-generated private class, named after GetFirst2Integers, which implements IEnumerator. Its MoveNext method looks as follows.

private bool MoveNext()
{
    switch (this.<>1__state)
    {
        case 0:
            this.<>1__state = -1;
            this.<>2__current = 0;
            this.<>1__state = 1;
            return true;

        case 1:
            this.<>1__state = -1;
            this.<>2__current = 1;
            this.<>1__state = 2;
            return true;

        case 2:
            this.<>1__state = -1;
            break;
    }
    return false;
}
 



No goto statement. Now let's involve a method parameter in the return logic. Say we pass a startFrom parameter.

private IEnumerable<int> Get2Integers(int startFrom)
{
    int i = startFrom;
    while (i < startFrom + 2)
        yield return i++;
}



This produces a MoveNext method with a goto.

private bool MoveNext()
{
    switch (this.<>1__state)
    {
        case 0:
            this.<>1__state = -1;
            this.<i>5__a = this.startFrom;
            while (this.<i>5__a < (this.startFrom + 2))
            {
                this.<>2__current = this.<i>5__a++;
                this.<>1__state = 1;
                return true;
            Label_0055:
                this.<>1__state = -1;
            }
            break;

        case 1:
            goto Label_0055;
    }
    return false;
}



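I was not able to pin down the exact scenarios that put a goto statement in the generated code, but in most cases that process the method parameters a goto appears. In the example above, the label sits inside the while loop: when MoveNext is called again in state 1, execution has to resume in the middle of the loop body, and the decompiled code expresses that jump as a goto.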
http://blogs.msdn.com/b/oldnewthing/archive/2008/08/12/8849519.aspx

Returning infinite times
So what are the actual differences? One we saw is the ability of a method to return multiple times. Another is the ability to return an unlimited number of times. For example, we can have a method that yield returns the odd numbers indefinitely. The process stops only when the consuming foreach loop exits via break or return.
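A minimal sketch of that odd-number example might look like this. The class and method names (OddNumbers, All, FirstOddsBelow) are my own choices for illustration.

```csharp
using System.Collections.Generic;

public static class OddNumbers
{
    // Never finishes on its own: the loop has no exit condition.
    // (It assumes the caller stops long before int overflow.)
    public static IEnumerable<int> All()
    {
        int n = 1;
        while (true)
        {
            yield return n;
            n += 2;
        }
    }

    // The consumer decides when to stop, here with break.
    public static List<int> FirstOddsBelow(int limit)
    {
        var result = new List<int>();
        foreach (int odd in All())
        {
            if (odd >= limit)
                break;          // stops the iteration; the generator is not resumed
            result.Add(odd);
        }
        return result;
    }
}
```

OddNumbers.FirstOddsBelow(10) collects 1, 3, 5, 7 and 9 and then leaves the loop. A method returning List<int> could never express All(), since it would have to build the whole infinite list before returning.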

Delayed return
If the method takes a long time to compute each return value, it is advisable to use IEnumerable with yield, because each value can be processed by the foreach loop before the next value is computed. That doesn't mean we get any parallel processing; we just get the chance to delay generating the next value until the previous one has been processed. If the foreach loop is satisfied with the current value, it can even stop the loop, and the remaining values are never computed. If we return a List, this is not possible at all. In this way yield can save a lot of time in the generating method.
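The difference in timing can be made visible with a small sketch that logs the order of events. Everything here (LazyDemo, Compute, SquaresAsList, SquaresWithYield, FirstAbove) is a hypothetical name, and Compute is just a stand-in for an expensive calculation.

```csharp
using System.Collections.Generic;

public static class LazyDemo
{
    // Shared log so the order of events can be inspected afterwards.
    public static readonly List<string> Log = new List<string>();

    // Stand-in for an expensive computation (hypothetical).
    private static int Compute(int i)
    {
        Log.Add("compute " + i);
        return i * i;
    }

    // Eager: every value is computed before the caller sees the first one.
    public static List<int> SquaresAsList(int count)
    {
        var list = new List<int>();
        for (int i = 1; i <= count; i++)
            list.Add(Compute(i));
        return list;
    }

    // Lazy: each value is computed only when the caller asks for it.
    public static IEnumerable<int> SquaresWithYield(int count)
    {
        for (int i = 1; i <= count; i++)
            yield return Compute(i);
    }

    // Returns the first value greater than threshold, or -1 if there is none.
    public static int FirstAbove(IEnumerable<int> values, int threshold)
    {
        foreach (int v in values)
        {
            Log.Add("consume " + v);
            if (v > threshold)
                return v;       // satisfied: stop asking for more values
        }
        return -1;
    }
}
```

With LazyDemo.FirstAbove(LazyDemo.SquaresWithYield(5), 3), the log alternates between compute and consume entries and Compute runs only twice, because the loop stops at 4. With SquaresAsList(5) in its place, all five compute entries appear in the log before the first consume entry.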

Some more links

http://msdn.microsoft.com/en-us/library/9k7k7cf0(v=vs.80).aspx
http://msdn.microsoft.com/en-us/library/hh156729(v=vs.110).aspx
