
Fundamentals of LINQ
(Language-Integrated Query)

Octavio Hernandez

About the Author
Octavio Hernandez currently lives and works in Santa Clarita,
California.
He is a seasoned developer with many years of experience with
Microsoft technologies. He is also the author or co-author of
several books.
From 2004 to 2010 he was recognized by Microsoft as a
Visual C# MVP.


Notice of Liability
The author and publisher have made every effort to ensure the accuracy of the
information herein. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author nor Krasis Consulting S.L., nor
its dealers or distributors, will be held liable for any damages caused either directly
or indirectly by the instructions contained in this book, or by the software or hardware
products described herein.

Trademark Notice
Rather than indicating every occurrence of a trademarked name as such, this book
uses the names only in an editorial fashion and to the benefit of the trademark owner
with no intention of infringement of the trademark.

Krasis Consulting, S. L. 2013


www.campusmvp.net

ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED, IN ANY
FORM OR BY ANY MEANS, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
CHAPTER 1
Fundamentals of LINQ

Note: In this chapter you will see references to C# 3.0. This is not a misprint, even
though later versions of the language are now available. Version 3.0 is cited because
it is the version in which LINQ was introduced.

1.- INTRODUCTION
Language-Integrated Query (the source of the LINQ acronym) is one of the most
significant improvements of recent years in the field of programming languages
and systems. It plays an essential role in minimizing the effects of the
phenomenon known as impedance mismatch. Impedance mismatch forces us to use,
when developing applications, not only our ordinary programming language but
also a whole series of other languages to access a wide variety of data sources,
such as SQL or XPath/XQuery, whose constructs have traditionally been embedded
literally (typically as strings) inside the C# or Visual Basic code.
Thanks to LINQ, an integral part of these languages from .NET Framework 3.5
onwards, we can really start to create applications that contain only .NET code,
using a clear and natural syntax to access those data sources and leaving the task
of generating those foreign constructs to the compiler and the support libraries.


1.1.- Presenting LINQ


LINQ is a combination of language extensions and managed code libraries that
allows queries against collections of data from very diverse sources, including
objects in memory, XML documents or relational databases, to be expressed in a
uniform way.
The meaning of the acronym itself (Language-Integrated Query) gives us the key
to understanding the main purpose of LINQ: to allow developers to express
queries against data from very diverse sources using the constructs of their own
programming language. The main advantages of this approach are the following:

• When expressing queries, developers can make use of all the benefits that the
compiler and the integrated environment offer (syntax and type checking by the
compiler, IntelliSense help within the environment, metadata access by both).
• Although LINQ will not eliminate it completely, it greatly reduces the impedance
mismatch caused by the differences between the programming model of
general-purpose languages and those of the query languages for relational
databases (SQL), XML documents (XPath/XQuery) and other data sources.
• Finally, LINQ raises the level of abstraction and clarity of query code. For
example, a developer who needs to access a database today must draw up a
meticulous plan specifying how to retrieve the data they need. Query expressions,
on the other hand, are a much more declarative tool that largely allows us to just
indicate what we want to obtain, leaving the details of how to achieve it to the
expression evaluation engine.

The other key element of LINQ technology is its open architecture, which makes
extensibility possible. The semantics of the operators in query expressions is
in no way hardwired into the language, so it can be modified (or extended) by
Microsoft or third-party libraries in order to access specific data sources. Apart
from the standard mechanism to apply LINQ to arrays and generic collections
(LINQ to Objects), Microsoft itself provides at least four other technologies:
on the one hand, LINQ to XML (to execute integrated queries against XML
documents) and LINQ to DataSets (to execute LINQ-style queries against typed
and untyped datasets); on the other, LINQ to SQL and LINQ to Entities, which
make it possible to run integrated queries against relational databases.
Moreover, this architecture has allowed the emergence of third-party technologies
(providers) that simplify and homogenize access to many other data sources: LINQ
to Amazon and LINQ to SharePoint are two of the most notable examples that can
be found on the Web.
The architecture of LINQ can be graphically described as follows:

Figure A1_1.- Graphic representation of the architecture of LINQ

1.2.- Query expressions

Query expressions are the mechanism through which LINQ technology comes to life. They
are simply expressions written in a new syntax added to C# 3.0 and Visual Basic 9.0,
and they can act on any object implementing the generic interface IEnumerable<T>
(in particular, arrays and the generic collections of .NET 2.0 and above implement
this interface), transforming it into other objects by means of a set of operators;
generally (but not always), objects that implement that same interface.
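As a minimal sketch of this point, the same query syntax can be applied to an array of integers and to a string, since string implements IEnumerable<char>. The SequenceSourcesDemo class and its sample data are hypothetical, invented only for this illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: one query syntax, two different IEnumerable<T> sources.
public static class SequenceSourcesDemo
{
    // int[] implements IEnumerable<int>
    public static List<int> EvensFromArray()
    {
        int[] numbers = { 5, 2, 8, 3, 6 };
        var evens = from n in numbers
                    where n % 2 == 0
                    select n;
        return evens.ToList();
    }

    // string implements IEnumerable<char>: each range variable is a char
    public static string VowelsFromString()
    {
        string text = "Language-Integrated Query";
        var vowels = from c in text
                     where "aeiouAEIOU".IndexOf(c) >= 0
                     select char.ToUpper(c);
        return new string(vowels.ToArray());
    }
}
```

Here EvensFromArray() returns 2, 8 and 6 (in source order), and VowelsFromString() returns the vowels of the text in upper case.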

Note: When referring to an object that implements the IEnumerable<T>
interface, we will frequently use the term "sequence". It would be incorrect to use
"array" or "collection", and "enumerable" (which is what they are, in truth: objects
that can be enumerated) could be confused with enumerated (enum) types.

Query expressions rely, in turn, on other new features included in C# 3.0 and
Visual Basic 9.0, such as:

• Implicit declaration of the type of local variables
• Object and collection initializers
• Anonymous types
• Extension methods
• Lambda expressions
• Expression trees

From now on, we will use C# for all our examples.


We will begin with a small example that illustrates what query expressions basically
consist of. For this purpose, we will assume that we have defined the Person and
Country classes shown below.

using System;

namespace CampusMVP.Classes
{
    public class Country
    {
        public string Code { get; set; }
        public string Name { get; set; }

        public Country(string code, string name)
        {
            Code = code;
            Name = name;
        }
    }

    public enum Gender { Female, Male }

    public class Person
    {
        #region Properties
        public string Name { get; set; }
        public string CodeCountryOfBirth { get; set; }
        public DateTime? DateOfBirth { get; set; }
        public Gender? Gender { get; set; }

        // Age in completed years, or null if the date of birth is unknown
        public int? Age
        {
            get
            {
                if (DateOfBirth == null)
                    return null;
                DateTime today = DateTime.Today;
                int age = today.Year - DateOfBirth.Value.Year;
                // Not a full year yet if this year's birthday is still ahead
                if (DateOfBirth.Value.Date > today.AddYears(-age))
                    age--;
                return age;
            }
        }
        #endregion

        #region Constructors
        public Person() { }
        public Person(string name, string country) : this()
        {
            this.Name = name;
            this.CodeCountryOfBirth = country;
        }
        #endregion

        #region Methods
        public override string ToString()
        {
            // The enum type is fully qualified because, inside this class,
            // the simple name Gender refers to the Gender? property
            return
                (Name == null ? "???" : Name) +
                (CodeCountryOfBirth == null ? " (??)" :
                    " (" + CodeCountryOfBirth + ")") +
                (Gender == null ? "" :
                    (Gender.Value == CampusMVP.Classes.Gender.Female ? " (F)"
                                                                     : " (M)")) +
                (DateOfBirth == null ? "" :
                    " (" + DateOfBirth.Value.ToString("dd/MM/yyyy") + ")");
        }
        #endregion
    }
}

Additionally, the static Data class introduced below defines some test data as static
fields:

using System;
using System.Collections.Generic;

namespace CampusMVP.Classes
{
    public static class Data
    {
        public static List<Country> Countries = new List<Country> {
            new Country("ES", "SPAIN"),
            new Country("CU", "CUBA"),
            new Country("RU", "RUSSIA"),
            new Country("US", "UNITED STATES")
        };

        public static List<Person> People = new List<Person> {
            new Person {
                Name = "Diana",
                CodeCountryOfBirth = "ES",
                DateOfBirth = new DateTime(1996, 2, 4),
                Gender = Gender.Female
            },
            new Person {
                Name = "Dennis",
                CodeCountryOfBirth = "RU",
                DateOfBirth = new DateTime(1983, 12, 27),
                Gender = Gender.Male
            },
            new Person {
                Name = "Claudia",
                CodeCountryOfBirth = "CU",
                DateOfBirth = new DateTime(1989, 7, 26),
                Gender = Gender.Female
            },
            new Person {
                Name = "Jennifer",
                CodeCountryOfBirth = "CU",
                DateOfBirth = new DateTime(1982, 8, 12),
                Gender = Gender.Female
            }
        };
    }
}

Suppose that, based on the above data, we want to obtain a collection with
the names and ages of the people in the original list who are older
than 20, with their names converted to uppercase. The objects in the resulting
collection must also be sorted alphabetically by name.
If you are not yet familiar with the wonders of integrated queries, the first
thing that comes to mind will probably be to write a loop that traverses the
original collection, creating new objects with the required characteristics
for the people who meet the condition, and then to make another pass to sort
these objects alphabetically by name. Thanks to LINQ, this procedure becomes
obsolete.
In C# 3.0, we have a much clearer and more elegant way of obtaining that result in a
single statement:

using System;
using System.Linq;

namespace Demo1
{
    using CampusMVP.Classes;

    class Program
    {
        static void Main(string[] args)
        {
            var older20 = from h in Data.People
                          where h.Age > 20
                          orderby h.Name
                          select new
                          {
                              Name = h.Name.ToUpper(),
                              Age = h.Age
                          };

            foreach (var h in older20)
                Console.WriteLine(h.Name + " (" + h.Age + ")");
        }
    }
}

The semantics of the assignment statement that contains the integrated query
will be intuitively clear to anyone familiar with the SELECT statement of
SQL. What is less usual is that the select clause, where we specify what we
want to obtain, is located at the end, unlike in SQL. The reason for this change
in position is quite logical: if select came first, it would be impossible to
offer IntelliSense help to the developer when typing the data to be selected, because
at that point they would not yet have specified the data collection on which the query
will be executed. With the from clause in first place, however, it is easy for the
integrated environment to help the developer throughout the whole expression: if
Data.People is of type List<Person> (and hence IEnumerable<Person>), then h (the
name we chose for the variable that will successively refer to each of the elements
in the sequence) is of type Person, and the system will be able to determine whether
each of the expressions appearing in the where, orderby, etc. clauses is correct or not.
The statement makes use of three features that appeared with C# 3.0:

• Anonymous types: since the structure of the desired results does
not match the structure of the original type of the elements, in the select
clause of the expression we define ad hoc a new data type, which the compiler
generates automatically. The result produced by the query expression is a
sequence of objects of this anonymous type.
• Object initializers: anonymous types themselves rely on object initializers,
which allow us to assign initial values to the properties of the anonymous type objects.
• Implicitly typed local variables: the object produced by the query expression is
assigned to a variable whose type is automatically inferred. Although in some cases
automatic determination of the type of a local variable is merely convenient,
when combined with anonymous types its use is simply indispensable. This
feature is also used when declaring the variable that iterates over the resulting
collection in the foreach loop.

Next we will see how two other new features of C# 3.0 come into play behind the scenes:
lambda expressions and extension methods.

1.3.- Rewriting query expressions

What does the compiler do when it encounters a query expression? The rules of
C# 3.0 stipulate that, before being compiled, any query expression appearing in the
source code is mechanically transformed (rewritten) into a sequence of calls to methods
with predetermined names and signatures. These methods are known as query
operators. The expressions in each of the clauses that make up the query expression are
also rewritten to adapt them to the requirements of those predetermined methods. Before
actually compiling it, the compiler will translate the query expression of our previous
example into the following:

var older20 = Data.People
    .Where(h => h.Age > 20)
    .OrderBy(h => h.Name)
    .Select(h => new
    {
        Name = h.Name.ToUpper(),
        Age = h.Age
    });

The where clause of the query expression becomes a call to a method named Where().
In order to pass it to that method, the expression that accompanied where, i.e.

h.Age > 20

is transformed into a lambda expression that produces true or false for each person:

h => h.Age > 20

Hopefully, you will find this perfectly logical: if the Where() method is to be
general, that is, able to work with any condition that a developer might
need, it must receive a delegate as a parameter, and this delegate will point
to a Boolean method that checks the condition to be met. Lambda
expressions are just that: a more concise way of writing anonymous methods.
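The following minimal sketch shows three equivalent ways of obtaining the kind of delegate that Where() expects. To keep it self-contained it works on plain int ages (a Func<int, bool>) rather than on Person objects; the PredicateFormsDemo class and its method names are hypothetical:

```csharp
using System;

// Hypothetical demo class: three ways to build the same Func<int, bool>.
public static class PredicateFormsDemo
{
    // 1. A named method with a compatible signature
    private static bool IsGreaterThan20(int age)
    {
        return age > 20;
    }

    public static Func<int, bool> FromNamedMethod()
    {
        return IsGreaterThan20;  // method group conversion to the delegate type
    }

    // 2. An anonymous method (C# 2.0 syntax)
    public static Func<int, bool> FromAnonymousMethod()
    {
        return delegate (int age) { return age > 20; };
    }

    // 3. A lambda expression (C# 3.0 syntax), the form produced by rewriting
    public static Func<int, bool> FromLambda()
    {
        return age => age > 20;
    }
}
```

All three delegates behave identically; the lambda simply lets the compiler infer the parameter type from the target delegate type.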
Once the above is understood, our next questions would be: what is (or should be)
the signature of this Where() method into which the where clause translates, and where
should the method be located? Note that, after the first step of rewriting our example
query expression, we would have this:

var older20 = Data.People.Where(h => h.Age > 20)

For our call to be valid, Where() must be (a) an instance method of the
class of People (List<Person>), or (b) a method of an interface implemented by
that class (and since we started with the premise that objects that can serve as
the source of integrated queries must implement IEnumerable<T>, this interface
would be a strong candidate to be extended with methods such as Where(), etc.).
Either of these two paths could work, but the designers of C# 3.0 considered that
the resulting architecture would not be sufficiently open and extensible. This is
where extension methods come in. With extension methods at our disposal, Where(),
like OrderBy(), Select() and the other operators, can be an extension method of
IEnumerable<T> defined in any static class that is in scope when the query
expression is compiled.
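As a minimal sketch of the mechanism, the following code declares an extension method on IEnumerable<T> in an ordinary static class. EveryOther() is a hypothetical operator invented for this illustration (it is not part of System.Linq), and ExtensionMethodDemo is likewise hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical static class hosting an extension method on IEnumerable<T>.
public static class SequenceExtensions
{
    // Yields the elements at positions 0, 2, 4, ... of the source sequence
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take)
                yield return item;
            take = !take;
        }
    }
}

public static class ExtensionMethodDemo
{
    public static List<string> Run()
    {
        var names = new List<string> { "Diana", "Dennis", "Claudia", "Jennifer" };
        // Written with instance-method syntax, although List<string> knows nothing
        // about EveryOther(); the compiler turns it into
        // SequenceExtensions.EveryOther(names).
        return names.EveryOther().ToList();
    }
}
```

Run() returns "Diana" and "Claudia". The same lookup that finds EveryOther() here is what finds Where(), OrderBy() and Select() when a query expression is rewritten.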
To check the previous theory, do this small experiment: in the source code of
the example, comment out the line at the top of the file that reads:

using System.Linq;

You will see that the program no longer compiles (analyze the error message in detail:
"Could not find an implementation of the query pattern for source type List<Person>.
Where not found."). The reason is that when you commented out the using directive, the
compiler was deprived of the definitions of the extension methods Where(), OrderBy(),
Select(), etc. (which are generically known as query operators) into which the
different clauses of the integrated query are translated. The default implementations of
these operators are held in a static class fittingly called System.Linq.Enumerable,
implemented in the System.Core.dll assembly, which .NET Framework 3.5 (and later)
projects created with Visual Studio reference automatically.

1.4.- The (non) semantics of query operators

As you will have concluded, query expressions are pure syntactic sugar. In the
previous section we saw how these expressions are mechanically translated into a
sequence of method calls, following a set of predefined rules in the language
specification. We have also seen how, when we remove the using directive that imports
the System.Linq namespace, code containing a query expression stops compiling. In this
regard, we should emphasize that if we bring into scope another namespace containing
static classes with the relevant definitions of those extension methods, the query
expression will compile again without any problems, and it will use the new set of
operator methods for its execution. This is what the open architecture of LINQ consists
of: C# does not define specific semantics for the operators implementing query
expressions, and anyone can create one or more classes with custom implementations of
query operators for generic or specific collections and plug them into the system by
bringing them into scope, so that they are used when compiling integrated queries on
sequences of those types. In fact, this is the path through which the predefined
extensions of LINQ, such as LINQ to XML or LINQ to Entities, are integrated into the
language, as well as the path through which third parties can develop proprietary providers.
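Because the rewriting is purely syntactic, the compiler only requires that methods with the expected names and shapes be found; it does not even require the source to implement IEnumerable<T>. The following minimal sketch illustrates this with a hypothetical Box<T> type (holding zero or one value) and a hypothetical CustomPatternDemo class, neither of which exists in any library. There is no using directive for System.Linq, so the query can only bind to Box<T>'s own Where() and Select() methods:

```csharp
using System;

// Hypothetical container holding zero or one value. It does NOT implement
// IEnumerable<T>, yet it supports where/select in query expressions.
public sealed class Box<T>
{
    public bool HasValue { get; private set; }
    public T Value { get; private set; }

    private Box() { }

    public static Box<T> Empty() { return new Box<T>(); }
    public static Box<T> Of(T value) { return new Box<T> { HasValue = true, Value = value }; }

    // Target of the rewritten "where" clause
    public Box<T> Where(Func<T, bool> predicate)
    {
        return HasValue && predicate(Value) ? this : Empty();
    }

    // Target of the rewritten "select" clause
    public Box<U> Select<U>(Func<T, U> selector)
    {
        return HasValue ? Box<U>.Of(selector(Value)) : Box<U>.Empty();
    }
}

public static class CustomPatternDemo
{
    // Rewritten by the compiler into box.Where(n => n > 0).Select(n => n * 2)
    public static Box<int> DoubleIfPositive(Box<int> box)
    {
        return from n in box
               where n > 0
               select n * 2;
    }
}
```

For instance, applying DoubleIfPositive() to Box<int>.Of(21) yields a box containing 42, while a negative or empty input yields an empty box.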

1.5.- Resolving calls to operators

Although the development of LINQ extensions (providers) is beyond the scope of
this chapter, we are going to look at an example that will clarify a related topic:
call resolution. Suppose that we want our integrated queries on sequences of the Person
type to behave in such a way that, when we ask which people meet a condition P, only
the females meeting that condition are returned (assume that, in our application,
returning males would be incorrect). We could define the following Where() method
(for simplicity, we will do so in the Program class itself, which must then be declared
as static, because extension methods can only be declared in non-generic static
classes). This Where() method operates on a closed generic type (IEnumerable<Person>):

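// Requires "using System.Collections.Generic;" at the top of Program.cs,
// and Program must be declared as "static class Program".
public static IEnumerable<Person> Where(
    this IEnumerable<Person> source,
    Func<Person, bool> filter)
{
    foreach (Person p in source)
        // Yield only the elements that satisfy the filter AND are female
        if (filter(p) && p.Gender == Gender.Female)
        {
            yield return p;
        }
}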

If you now compile and run the query that selects the people older than 20,
you will see that our newly created method is used instead of the standard query
operator, and thus there will only be females in the result. Our method is applicable
(a List<Person> is an IEnumerable<Person>) and is declared in the same namespace as
the query, so it takes precedence over the method of the System.Linq.Enumerable class.
And what happens with OrderBy(), Select() and the rest? Well, we have not defined those
methods, but IEnumerable<Person> is just a particular case of IEnumerable<T>, and
therefore the implementations of those methods located in the base class library will be used.

One final aspect to bear in mind: What if we had defined the Where() method so that
it operated on IEnumerable<T>, just like in the default implementation? The answer is
that our version would be likewise used, because it is located in the same namespace as
the class where the query is executed. The call resolution algorithm of the compiler
searches the namespaces of our code from inside out, and only if it does not find anything
this way does it use the methods that it finds in static classes belonging to other
namespaces in scope.
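The following minimal sketch illustrates this rule. LocalOperators and Query are hypothetical classes: LocalOperators declares a Where() with the same generic signature as the one in System.Linq.Enumerable, plus a call counter (an illustration-only device). Because the compiler finds it in the query's own namespace before looking at the namespaces imported with using, the where clause binds to it, while ToList() still comes from System.Linq:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace ResolutionDemo
{
    // Hypothetical operator class declared in the same namespace as the query
    public static class LocalOperators
    {
        public static int WhereCalls;  // how many times Where() has been called

        public static IEnumerable<T> Where<T>(
            this IEnumerable<T> source, Func<T, bool> predicate)
        {
            WhereCalls++;              // counted at call time, not at enumeration
            return Filter(source, predicate);
        }

        private static IEnumerable<T> Filter<T>(
            IEnumerable<T> source, Func<T, bool> predicate)
        {
            foreach (T item in source)
                if (predicate(item))
                    yield return item;
        }
    }

    public static class Query
    {
        public static List<int> Evens(int[] numbers)
        {
            // "where" binds to LocalOperators.Where() (found first, in this
            // namespace); ToList() is resolved in System.Linq.Enumerable.
            var evens = from n in numbers
                        where n % 2 == 0
                        select n;
            return evens.ToList();
        }
    }
}
```

Calling Query.Evens(new[] { 1, 2, 3, 4, 5, 6 }) returns 2, 4 and 6 and leaves LocalOperators.WhereCalls at 1, confirming that the local operator was the one chosen.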

1.6.- Deferred execution

The default implementations of the query operators that return sequences work through
iterators, in a similar way to the method in the previous code; this is reflected in
the use of the yield return statement. This has an important consequence: assigning
a query expression such as the following:

var older20 = from h in Data.People
              where h.Age > 20
              orderby h.Name
              select new
              {
                  Name = h.Name.ToUpper(),
                  Age = h.Age
              };

merely prepares the objects needed to enumerate the result. The result of a query will
not really be obtained until it is iterated over. This evaluation on demand, also known as
lazy or deferred evaluation, is the default behavior of LINQ. We must always take into
account a possible side effect of this: two successive evaluations of the same query
can produce different results if the underlying source of information changes
between them. In spite of this, deferred execution is the best option in most cases.
Nevertheless, sometimes we may want to completely cache the result of a query in
memory for its later reuse. For this purpose, we have the standard query operators
ToArray(), ToList(), ToLookup() and ToDictionary(). For example, we can obtain the
results of the previous query in one go using the statement:

var listOlder20 = older20.ToList();
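The following minimal sketch contrasts both behaviors. DeferredExecutionDemo and its sample ages are hypothetical; the query is evaluated again every time it is enumerated, whereas the list returned by ToList() keeps the results obtained at the moment it was called:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: deferred query versus ToList() snapshot.
public static class DeferredExecutionDemo
{
    // Returns { count seen by the deferred query, count in the snapshot }
    // after the source has been modified.
    public static int[] CountsAfterChange()
    {
        var ages = new List<int> { 27, 15, 31 };

        // Nothing is evaluated here: the query only describes the work
        var older20 = from a in ages where a > 20 select a;

        // ToList() runs the query now and caches its results (27, 31)
        List<int> snapshot = older20.ToList();

        // Modify the underlying source after both objects were created
        ages.Add(45);

        // Count() enumerates the query again over the current contents
        return new[] { older20.Count(), snapshot.Count };
    }
}
```

In this sketch the deferred query reports three elements (27, 31 and the newly added 45), while the snapshot taken before the change still holds two.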



1.7.- Standard query operators

By now it should be clear that, potentially, we can make the methods implementing
query operators do whatever we want, as long as they comply with the signatures
required by the compiler. However, we are expected to give them a behavior
consistent with what is generally expected of them. The Where() method, for
example, is supposed to filter the input sequence, passing to the output only the
elements that satisfy the specified condition. OrderBy(), for its part, is supposed to
collect the input sequence and produce another one containing the same elements as the
original one, but in ascending order according to a certain criterion.
Returning to Where(), by now we are well acquainted with the signature of the
method, at least of its main overload. Implemented as an extension method, it
receives the input sequence (marked with this) as its first argument, and its second
parameter is a delegate to a function that receives a T and returns a bool. The return
type is IEnumerable<T>, as you can conclude from our last code example; this is easy to
see, considering that the result produced by Where() is going to serve as the input
for OrderBy(), Select() or one of the other methods in the cascade of calls
generated by the rewriting.
The main overload of the standard query operator Where() can be implemented
essentially like this:

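// *** Alternative Where() operator
// The public method is not an iterator, so the argument checks run
// immediately when Where() is called; the filtering loop itself is
// deferred until the result is enumerated.
public static IEnumerable<T> Where<T>(
    this IEnumerable<T> source, Func<T, bool> filter)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (filter == null)
        throw new ArgumentNullException("filter");
    return WhereIterator(source, filter);
}

private static IEnumerable<T> WhereIterator<T>(
    IEnumerable<T> source, Func<T, bool> filter)
{
    foreach (T t in source)
        if (filter(t))      // produce only the elements meeting the condition
            yield return t;
}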

The method first checks the validity of the input arguments. Next, it enters a loop
which iterates through the input sequence, and for each of its elements it calls the
predicate to check whether the element meets the condition or not. Only if the element
meets the condition does the method produce it in the output sequence. In our example,
where we wanted people older than 20, that output sequence would be, in turn, the input
sequence for the OrderBy() operator.
As another example, look at how the Select() operator is implemented:

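// *** Alternative Select() operator
public static IEnumerable<V> Select<T, V>(
    this IEnumerable<T> source, Func<T, V> selector)
{
    // Validate eagerly, then defer the projection to the iterator
    if (source == null)
        throw new ArgumentNullException("source");
    if (selector == null)
        throw new ArgumentNullException("selector");
    return SelectIterator(source, selector);
}

private static IEnumerable<V> SelectIterator<T, V>(
    IEnumerable<T> source, Func<T, V> selector)
{
    foreach (T t in source)
        yield return selector(t);   // transform each element
}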

Here, the type of the elements of the resulting sequence (V) is determined by the
return type of the selection (transformation) function used.
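As a minimal sketch of this inference (SelectTypeDemo and its data are hypothetical), projecting a sequence of strings through a selector that returns an int yields a sequence of int:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: the selector's return type fixes the result's element type.
public static class SelectTypeDemo
{
    public static List<int> NameLengths()
    {
        string[] names = { "Diana", "Dennis", "Claudia" };
        // Source element type: string (from the array);
        // result element type: int (from w.Length), hence IEnumerable<int>.
        IEnumerable<int> lengths = names.Select(w => w.Length);
        return lengths.ToList();
    }
}
```

NameLengths() returns 5, 6 and 7; no type argument has to be written, because both are inferred from the source and from the lambda.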
The default implementation (in the System.Linq.Enumerable class of
System.Core.dll) of a set of extension methods including Where(), Select(), OrderBy()
and some others, which can be used to execute integrated queries against any enumerable
sequence of objects in memory, is known as LINQ to Objects, and these methods are
called standard query operators.

1.8.- The query expression pattern

Not all standard query operators are reflected in the language syntax. For example,
there is an operator called Reverse() that produces the elements of its sequence in
reverse order (from last to first). However, C# offers no syntactic mapping for this
operator. When we need it, we will have to use the ordinary method-call notation:

var reverseOrder = older20.Reverse();

Likewise, not all the overloads of a given query operator are reflected in the syntax
of C# query expressions, but just some of them. For example, the Where() operator has
two overloads, but only one (the one presented above) is used for rewriting query
expressions.
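The following minimal sketch (MethodSyntaxDemo is a hypothetical class) shows both situations: Reverse() chained with method syntax onto the result of a query expression, and the second overload of Where(), whose predicate also receives the index of each element and which no query clause maps to:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: operators used through ordinary method calls.
public static class MethodSyntaxDemo
{
    // Reverse() has no query keyword, so it is appended as a method call
    public static List<string> ReversedSorted(string[] names)
    {
        var sorted = from n in names orderby n select n;
        return sorted.Reverse().ToList();
    }

    // Where() overload taking Func<T, int, bool>: element plus its position
    public static List<string> EvenPositions(string[] names)
    {
        return names.Where((n, index) => index % 2 == 0).ToList();
    }
}
```

With the names Diana, Dennis, Claudia and Jennifer, ReversedSorted() returns Jennifer, Diana, Dennis, Claudia, and EvenPositions() returns Diana and Claudia.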
The subset of standard query operators on which the syntax of C# 3.0 query
expressions directly depends (and which any LINQ provider should therefore support)
is known as the query expression pattern, or LINQ pattern: a specification of the
set of methods (a subset of the standard query operators) that must be available in
order to ensure full support for integrated queries.

1.9.- Syntax of query expressions

The full syntax of query expressions is as follows:

<query expr.> ::= <from clause> <query body>

<from clause> ::=
    from <element> in <source expr.>

<query body> ::=
    <query body clause>*
    <final query clause>
    <continuation>?

<query body clause> ::=
    (<from clause>
    | <join clause>
    | <join-into clause>
    | <let clause>
    | <where clause>
    | <orderby clause>)

<let clause> ::=
    let <element> = <selection expr.>

<where clause> ::=
    where <filter expr.>

<join clause> ::=
    join <element> in <source expr.>
    on <key expr.> equals <key expr.>

<join-into clause> ::=
    join <element> in <source expr.>
    on <key expr.> equals <key expr.>
    into <element>

<orderby clause> ::=
    orderby <orderings>

<orderings> ::=
    <ordering>
    | <orderings> , <ordering>

<ordering> ::=
    <key expr.> (ascending | descending)?

<final query clause> ::=
    (<select clause> | <groupby clause>)

<select clause> ::=
    select <selection expr.>

<groupby clause> ::=
    group <selection expr.> by <key expr.>

<continuation> ::=
    into <element> <query body>

Meta-language:
    *              - zero or more times
    ( ... | ... )  - alternative
    ?              - optional element

Basically, a query expression always begins with a from clause, where the source
collection on which the query will be executed is specified. Next, there may be zero or
more from, join, let, where or orderby clauses, and finally a select or group ... by clause.
Optionally, at the end of the expression there may be a continuation, which begins
with the keyword into and is followed by the body of another query. Remember
that all the keywords used here are contextual: they only have a special meaning within
query expressions.
We will next show examples of use of the different syntactic elements of query
expressions.
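As a minimal sketch that exercises several productions of the grammar at once (join ... on ... equals, let, where, orderby, group ... by and an into continuation), the following query uses data that mirrors the chapter's Data class, simplified to anonymous types with a hypothetical bool Female property instead of the Gender enum. The SyntaxDemo class is hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class exercising several clauses of the query syntax.
public static class SyntaxDemo
{
    // Returns, for each country, the initials of the women born there
    public static List<string> CountriesWithWomen()
    {
        var countries = new[] {
            new { Code = "ES", Name = "SPAIN" },
            new { Code = "CU", Name = "CUBA" },
            new { Code = "RU", Name = "RUSSIA" },
            new { Code = "US", Name = "UNITED STATES" }
        };
        var people = new[] {
            new { Name = "Diana",    Country = "ES", Female = true  },
            new { Name = "Dennis",   Country = "RU", Female = false },
            new { Name = "Claudia",  Country = "CU", Female = true  },
            new { Name = "Jennifer", Country = "CU", Female = true  }
        };

        var query =
            from p in people
            join c in countries on p.Country equals c.Code   // join clause
            let initial = p.Name[0]                           // let clause
            where p.Female                                    // where clause
            orderby p.Name                                    // orderby clause
            group initial by c.Name into g                    // group ... by + continuation
            orderby g.Key
            select g.Key + ": " + new string(g.ToArray());

        return query.ToList();
    }
}
```

The query returns "CUBA: CJ" and "SPAIN: D": Dennis is filtered out by the where clause, the United States has no matching person, and the continuation lets us sort the groups by their key.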

1.10.- Table of standard query operators

The following table lists the available standard query operators, grouped by category.
We have first highlighted the basic operators, for which the syntax of query
expressions offers a special clause (where, orderby, select, group, join) and which have
some overloads that are part of the aforementioned query expression pattern, which is
precisely defined in the C# 3.0 specification. For the rest of the standard operators
there is no direct linguistic support in C# 3.0, and to use them we will have to
employ the ordinary method-call syntax. Note that, although the basic operators and
many of the non-basic ones produce another sequence as a result, several of the
remaining operators produce scalar values, which means that they always end the
chain of method calls.

Table 1.- Standard query operators

Operators of the LINQ pattern


Filters the original sequence based on a logical
Where()
predicate.
Projects the original sequence into another one
Select()/SelectMany()
based on a transformation function.
Rearranges the original sequence in ascending
OrderBy()/ThenBy()
order based on a function calculating the sort key.
OrderByDescending()/ Rearranges the original sequence in descending
ThenByDescending() order based on a function calculating the sort key.
Creates groups from the original sequence based
GroupBy()
on a function calculating the grouping key.
Performs an inner join of the original sequence
and another sequence based on functions
Join()
calculating the matching keys for each of the
sequences.
Performs a grouped join of the original sequence
and another sequence based on functions
GroupJoin()
calculating the matching keys for each of the
sequences.
Partitioning operators
Returns a specified number of contiguous
Take()
elements from the start of a sequence.
Bypasses a specified number of elements in a
Skip()
sequence and then returns the remaining ones.
Selects the elements from the original sequence
TakeWhile() while a predicate is satisfied and bypasses the
rest.
Fundamentals of LINQ 17

Bypasses the elements from the original


SkipWhile() sequence while a predicate is satisfied and then
returns the remaining ones.
Set operators
Selects unique elements from the original
Distinct()
sequence.
Produces the set union of the original sequence
Union()
and another sequence.
Produces the set intersection of the original
Intersect()
sequence and another sequence.
Produces the set difference between the original
Except()
sequence and another sequence.
Conversion operators
Creates an array from the elements of the
ToArray()
original sequence.
Creates a generic list (List<T>) from the
ToList()
elements of the original sequence.
Creates a dictionary of key/value pairs
(Dictionary<K, V>) from the elements of the
ToDictionary()
original sequence based on functions calculating
the keys and the values.
Creates a dictionary of key/sequence of elements
with that key value (Lookup<K, V>) pairs, from
ToLookup()
the elements of the original sequence based on
functions calculating the keys and the values.
Changes the type of the original sequence to
AsEnumerable()
IEnumerable<T>.
Casts the type of the elements of the original
Cast<T>()
sequence to T.
Filters the elements of the original sequence
OfType<T>()
which are of type T.
Sequence generation operators
Generates a sequence made up of n consecutive
Range()
integers starting from a specific m value.
Generates a sequence where an element of type
Repeat<T>()
T is repeated n times.
Generates an empty sequence of elements of
Empty<T>()
type T.
Sequence transformation operators
Concat() Concatenates two sequences.
Reverse() Reverses the elements of a sequence.
Quantifiers
18 Fundamentals of LINQ

Existential quantifier: it returns true if any of the


Any() elements of the original sequence satisfies a
logical predicate; otherwise it returns false.
Universal quantifier: it returns true if all the
All() elements in the original sequence satisfy a logical
predicate; otherwise it returns false.
Checks the existence of a specified element
Contains()
within the original sequence.
Checks if the original sequence and another one
SequenceEqual()
are equal.
Elements
First() Returns the first element of the original sequence, or the first element that satisfies a specified condition. If no such element exists, it throws an exception.
FirstOrDefault() Returns the first element of the original sequence, or the first element that satisfies a specified condition. If no such element exists, it returns the default value of the element type of the original sequence.
Last() Returns the last element of the original sequence, or the last element that satisfies a specified condition. If no such element exists, it throws an exception.
LastOrDefault() Returns the last element of the original sequence, or the last element that satisfies a specified condition. If no such element exists, it returns the default value of the element type of the original sequence.
Single() Returns the single element of the original sequence, or the single element that satisfies a specified condition. If there is not exactly one such element, it throws an exception.
SingleOrDefault() Returns the single element of the original sequence, or the single element that satisfies a specified condition. If no such element exists, it returns the default value of the element type of the original sequence; if there is more than one, it throws an exception.
ElementAt() Returns the element of the original sequence located at the specified (zero-based) position. If the position is out of range, it throws an exception.
ElementAtOrDefault() Returns the element of the original sequence located at the specified position. If no such element exists, it returns the default value of the element type of the original sequence.
DefaultIfEmpty() Returns the original sequence itself or, if the original sequence is empty, a sequence containing a single default value.

Aggregate operators
Count() / LongCount() Returns the number of elements in the original sequence, or the number of elements that satisfy a specified logical predicate (LongCount() returns it as a long).
Max() / Min() Returns the maximum (or minimum) of the elements in the original sequence.
Sum() Returns the sum of the elements in the original (numeric) sequence.
Average() Returns the average of the elements in the original (numeric) sequence.
Aggregate() Returns the result of applying a specified aggregate function to the elements of the original sequence.
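The following sketch exercises a handful of the operators in the table above against in-memory data (LINQ to Objects). The OperatorDemo class and its sample words are made up for illustration; the operators themselves are the standard ones defined in System.Linq.Enumerable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch: a few standard query operators applied to in-memory data.
public static class OperatorDemo
{
    // Hypothetical sample data.
    static readonly string[] Words = { "apple", "avocado", "banana", "blueberry", "cherry" };

    // ToDictionary(): one entry per key (a duplicate key would throw).
    public static Dictionary<string, int> LengthsByWord() =>
        Words.ToDictionary(w => w, w => w.Length);

    // ToLookup(): each key maps to the sequence of elements having that key.
    public static ILookup<char, string> WordsByInitial() =>
        Words.ToLookup(w => w[0]);

    // OfType<T>(): keeps only the elements that are of type T.
    public static int[] IntsOnly(object[] mixed) => mixed.OfType<int>().ToArray();

    // Range(m, n): n consecutive integers starting at m.
    public static int[] FiveFromTen() => Enumerable.Range(10, 5).ToArray();

    // Repeat(x, n): the value x, n times.
    public static string[] ThreeStars() => Enumerable.Repeat("*", 3).ToArray();

    // Quantifiers.
    public static bool AnyLongerThan8() => Words.Any(w => w.Length > 8);
    public static bool AllLowerCase() => Words.All(w => w == w.ToLowerInvariant());

    // FirstOrDefault(): null (the default for string) when nothing matches.
    public static string FirstStartingWith(char c) => Words.FirstOrDefault(w => w[0] == c);

    // Single(): throws InvalidOperationException unless exactly one element matches.
    public static bool SingleThrowsWhenTwoMatch()
    {
        try
        {
            Words.Single(w => w[0] == 'a');   // "apple" and "avocado" both match
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Aggregate(): folds the sequence with an accumulator; here it computes n!.
    public static int Factorial(int n) =>
        Enumerable.Range(1, n).Aggregate(1, (acc, x) => acc * x);
}
```

Note that FirstOrDefault() and Single() illustrate the two behaviors described in the table: a default value on no match versus an exception when the expected number of elements is not found.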

1.11.-Some examples

The purpose of this section is to present some examples of the things we can
accomplish using query expressions.

Basic examples
Given that any object implementing IEnumerable<T> can serve as the source of an
integrated query, and that this interface is implemented by objects as common as arrays,
generic collections and even character strings (which allows us to enumerate the
characters constituting them), it is clear that we can apply integrated queries in a large
number of everyday situations for which we previously used loops, counters and various
other techniques.
Now we will give some examples of query expressions applied to strings and arrays:

string s = "Hasta la vista, baby";

// produces the vowels of string 's' in alphabetical order
var s1 = from c in s
         where "AEIOU".Contains(char.ToUpper(c))
         orderby c
         select c;

// counts the spaces in string 's'
// note that there is no query syntax for the Count() operator
int n1 = (from c in s
          where c == ' '
          select c).Count();

// produces the distinct words of a sentence, in lowercase
// string.Split() produces an array of strings
// Distinct() doesn't have "sugared syntax" either
var n2 = (from w in s.Split(new char[] { ' ', '\t', '\n' },
                            StringSplitOptions.RemoveEmptyEntries)
          orderby w.ToLower()
          select w.ToLower()).Distinct();

int[] arr = { 2, 4, 3, 7, 25, 9, 6 };

// produces a sequence with the even numbers in arr
var evens = from n in arr
            where n % 2 == 0
            select n;
// could also have been:
var evens2 = arr.Where(n => n % 2 == 0);

// produces the sum of the numbers in the sequence
int sum = arr.Sum();
// the same as:
int sum2 = (from n in arr select n).Sum();

// produces the numbers of the sequence, incremented by 1
var other = from n in arr
            select n + 1;
// the same as:
var other2 = arr.Select(n => n + 1);

Finally, let us look at some examples using the list of people:

// names beginning with 'D'
var namesD = from h in Data.People
             where h.Name.StartsWith("D")
             select h.Name;

// list ordered first by gender,
// then by age in descending order
var order = from h in Data.People
            orderby h.Gender, h.Age descending
            select h;

Cartesian products
In the world of relational databases, a Cartesian product of two tables is simply the
set of rows resulting from combining each row from the first table with each row from
the second one. Here, the same concept can be applied to the combination of two
sequences: if we have two from clauses (which act as generators) one after the other, all
the elements of the second sequence will be produced for each element of the first
sequence. For example:

/* CARTESIAN PRODUCT */
var pc1 = from co in Data.Countries
from pe in Data.People
select new {
co.Name,
NamePerson = pe.Name
};

Cartesian products are implemented through calls to the standard query operator
SelectMany(), which is in charge of producing a sequence in which each of the elements
of the first sequence is combined with each of the elements of the second one.
The main danger of Cartesian products is the combinatorial explosion of results that
they can produce: combining a sequence of m elements with one of n elements yields
m × n results. For this reason, it is generally recommended to avoid Cartesian products
whenever possible, perhaps by applying the following techniques.
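To see the SelectMany() translation at work, here is a minimal, self-contained sketch. The Countries and People arrays are hypothetical stand-ins for the chapter's Data.Countries and Data.People (modeled as tuples with only the fields used), and the query projects tuples instead of anonymous types so that it can be returned from a method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CartesianDemo
{
    // Hypothetical stand-ins for Data.Countries and Data.People.
    static readonly (string Code, string Name)[] Countries =
    {
        ("ES", "Spain"),
        ("US", "United States"),
    };

    static readonly (string Name, string CodeCountryOfBirth)[] People =
    {
        ("Diana", "ES"),
        ("Octavio", "US"),
        ("Pedro", "ES"),
    };

    // Query syntax: two consecutive from clauses.
    public static List<(string Country, string Person)> QuerySyntax() =>
        (from co in Countries
         from pe in People
         select (Country: co.Name, Person: pe.Name)).ToList();

    // The SelectMany() call that the two from clauses translate into.
    public static List<(string Country, string Person)> MethodSyntax() =>
        Countries.SelectMany(co => People,
                             (co, pe) => (Country: co.Name, Person: pe.Name))
                 .ToList();
}
```

With 2 countries and 3 people, both methods return 2 × 3 = 6 combinations, which is exactly the growth the previous paragraph warns about.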

Restricted products and query optimization
Suppose we want to obtain a list of pairs of people in which the first person is male and
the second one is female. A first attempt might be:

var pc2 = from p1 in Data.People
          from p2 in Data.People
          where p1.Gender == Gender.Male &&
                p2.Gender == Gender.Female
          select new { He = p1.Name, She = p2.Name };

The example above is what we might call a restricted Cartesian product: a Cartesian
product to which we attach filter conditions that reduce the size of the resulting
sequence.

If you analyze the previous query closely, you will agree that the following option
performs better, because the elements of the first sequence that will ultimately be
discarded are eliminated earlier in the pipeline of query operators, so the second
sequence is only traversed for the males:

var pc3 = from p1 in Data.People
          where p1.Gender == Gender.Male
          from p2 in Data.People
          where p2.Gender == Gender.Female
          select new { He = p1.Name, She = p2.Name };

Although the study of specific optimization techniques for LINQ queries falls beyond
the scope of this chapter, we have deemed it relevant to note this fact so that you can
take it into account when programming integrated queries. A completely different matter
is whether an intelligent compiler could transform the first expression into the second
one transparently to the developer. The C# compiler does not do so: it translates query
expressions purely syntactically, keeping the clauses in the order in which they were
written.
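The difference between pc2 and pc3 can be measured with a small experiment. In the sketch below (hypothetical data: two males and three females, modeled as tuples), a side-effecting counter in the second where clause records how many pairs reach that filter. The counter exists only for measurement; side effects inside queries are not a pattern to imitate.

```csharp
using System.Linq;

public static class RestrictedProductDemo
{
    public enum Gender { Female, Male }

    // Hypothetical sample data: 2 males and 3 females.
    static readonly (string Name, Gender Gender)[] People =
    {
        ("Ann", Gender.Female), ("Bob", Gender.Male), ("Cid", Gender.Male),
        ("Dot", Gender.Female), ("Eve", Gender.Female),
    };

    // pc2-style query: the filter sees every one of the 5 x 5 pairs.
    public static (int Results, int PairsFiltered) FilterLate()
    {
        int filtered = 0;
        var query = from p1 in People
                    from p2 in People
                    // ++filtered only counts how many times the predicate runs
                    where ++filtered > 0
                          && p1.Gender == Gender.Male
                          && p2.Gender == Gender.Female
                    select new { He = p1.Name, She = p2.Name };
        int results = query.ToList().Count;
        return (results, filtered);
    }

    // pc3-style query: the inner sequence is only paired with males.
    public static (int Results, int PairsFiltered) FilterEarly()
    {
        int filtered = 0;
        var query = from p1 in People
                    where p1.Gender == Gender.Male
                    from p2 in People
                    where ++filtered > 0 && p2.Gender == Gender.Female
                    select new { He = p1.Name, She = p2.Name };
        int results = query.ToList().Count;
        return (results, filtered);
    }
}
```

Both queries return the same 6 pairs, but FilterLate() evaluates its filter on all 25 combinations, while FilterEarly() only builds and tests the 2 × 5 = 10 pairs whose first element is male.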

Joins
Joins are another of the typical constructions of relational languages such as SQL that
have been added to C# 3.0 query expressions. A join is basically a Cartesian product on
two sequences which is limited to the tuples (t1, t2), where the value of a certain
expression applied to the element of the first sequence t1 is equal to the value of another
expression applied to the element of the second sequence t2. The point is to drastically
reduce the combinations that a full Cartesian product would produce, keeping only the
pairs of elements that match according to a certain shared criterion.
For example, the query presented earlier that combines the names of people and
countries would be much better written like this:

/* JOIN */
var enc1 = from co in Data.Countries
join pe in Data.People
on co.Code equals pe.CodeCountryOfBirth
select new {
co.Name,
NamePerson = pe.Name
};

The join condition comprises a key selector for the outer sequence, the contextual
keyword equals and another key selector for the inner sequence. Each key selector can
be any expression computed from the identifier that represents the element of the
corresponding sequence.
The same result could have been obtained using a restricted Cartesian product, but
the join performs much better. The extension method Join() (to which joins are
translated) is implemented using hash tables: it builds a lookup of the inner sequence
by key and then probes it with the key of each outer element, in a similar way to how
indexes are used in the world of databases.
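The idea behind a hash-based join can be sketched in a few lines. HashJoin below is a hypothetical, simplified helper, not the actual source of Enumerable.Join() (it ignores details such as custom key comparers): it reads the inner sequence once into a lookup and then probes that lookup with each outer key, so the work grows roughly with the sizes of the two sequences rather than with their product. The sample data are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashJoinDemo
{
    // Simplified hash join: one pass over the inner sequence builds a lookup
    // (a hash table from key to matching elements); each outer element then
    // probes it instead of scanning the whole inner sequence.
    public static IEnumerable<TResult> HashJoin<TOuter, TInner, TKey, TResult>(
        IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        ILookup<TKey, TInner> table = inner.ToLookup(innerKeySelector);
        foreach (TOuter o in outer)
            foreach (TInner i in table[outerKeySelector(o)])   // empty when no match
                yield return resultSelector(o, i);
    }

    // Hypothetical sample data.
    static readonly (string Code, string Name)[] Countries =
        { ("ES", "Spain"), ("US", "United States"), ("FR", "France") };
    static readonly (string Name, string CodeCountryOfBirth)[] People =
        { ("Diana", "ES"), ("Octavio", "US"), ("Pedro", "ES") };

    public static List<string> WithHashJoin() =>
        HashJoin(Countries, People, co => co.Code, pe => pe.CodeCountryOfBirth,
                 (co, pe) => co.Name + ": " + pe.Name).ToList();

    public static List<string> WithEnumerableJoin() =>
        Countries.Join(People, co => co.Code, pe => pe.CodeCountryOfBirth,
                       (co, pe) => co.Name + ": " + pe.Name).ToList();
}
```

Both methods produce "Spain: Diana", "Spain: Pedro" and "United States: Octavio"; France, with no matching people, does not appear in the output of either.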

Groups
The syntax of query expressions also supports the organization of the elements of a
sequence into groups according to the different values of a grouping key which is
calculated for each element. For example, the following statement

/* GROUP */
var groupsGender =
from h in Data.People
group new { h.Name, h.Age } by h.Gender;

groups the elements of the original sequence according to the different values of the
h.Gender expression. In this case, we will obtain a sequence of two elements, which are
in turn sequences: one containing objects of an anonymous type with the requested data
(name and age) of all the females (the objects for which the value of the grouping key is
Gender.Female), and another with the data of all the males, for which the value of
h.Gender equals Gender.Male. The groups appear in the order in which their keys are
first encountered in the original sequence.
C# 3.0 translates the previous query expression into a call to the GroupBy() standard
operator. The result is a sequence, and each of its elements is, in turn, an inner sequence
associated with a group, implementing an interface called IGrouping<TKey, TElement>,
which inherits from IEnumerable<TElement>. Basically, this interface adds a read-only Key
property, whose type is the type of the grouping key. The following loop shows the
structure of the result of the query:

foreach (var hh in groupsGender)
{
    // the key of the group
    Console.WriteLine(hh.Key);
    // the elements of the group
    foreach (var hhh in hh)
        Console.WriteLine(" - " + hhh.Name +
                          " (" + hhh.Age + ")");
}
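As a self-contained sketch of what the compiler produces for groupsGender, the following code calls GroupBy() directly with a key selector and an element selector, and then walks the result just like the loop above. The People tuples are hypothetical sample data.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupDemo
{
    public enum Gender { Female, Male }

    // Hypothetical sample data.
    static readonly (string Name, int Age, Gender Gender)[] People =
        { ("Ann", 30, Gender.Female), ("Bob", 25, Gender.Male), ("Eve", 41, Gender.Female) };

    // Method-call form of: group new { h.Name, h.Age } by h.Gender
    public static List<string> Describe()
    {
        var groups = People.GroupBy(h => h.Gender,
                                    h => new { Name = h.Name, Age = h.Age });
        var lines = new List<string>();
        foreach (var g in groups)            // g is IGrouping<Gender, (anonymous type)>
        {
            lines.Add(g.Key.ToString());     // the read-only Key property
            foreach (var x in g)             // the group is itself a sequence
                lines.Add(" - " + x.Name + " (" + x.Age + ")");
        }
        return lines;
    }
}
```

Because Ann (a female) comes first in this sample, the Female group is produced first, followed by the Male group.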

Below is an example which involves our two tables of people and countries. The
following statement allows us to group the people by their country of birth:

Console.WriteLine("GROUPS BY COUNTRY");
var groupsCountries =
from co in Data.Countries
join pe in Data.People
on co.Code equals pe.CodeCountryOfBirth
group new { pe.Name, pe.Age }
by co.Name;

foreach (var hh in groupsCountries)
{
    // the value of the key
    Console.WriteLine(hh.Key);
    // the elements of the group
    foreach (var hhh in hh)
        Console.WriteLine(" - " + hhh.Name +
                          " (" + hhh.Age + ")");
}

Note the similarity between this last query expression and the one presented in our
previous section on joins: the difference lies in the presence of the group...by clause
instead of select. These two clauses are precisely the ones that can end a query
expression.

Continuations
If you execute the previous grouping, you will see that the different groups appear in
the resulting sequence in the same order as the countries appear in the original sequence.
What if we wanted to obtain the groups in alphabetical order of the countries? We could
resort to explicit method-call syntax:

var groupsCountries2 =
(from co in Data.Countries
join pe in Data.People
on co.Code equals pe.CodeCountryOfBirth
group new { pe.Name, pe.Age }
by co.Name).OrderBy(g => g.Key);

For this kind of situation, the language provides a better mechanism: continuations.
Continuations allow us to chain queries in cascade, so that the results of one query are
used as the input sequence of the next. For this, we use the into clause:

var groupsCountries3 =
from co in Data.Countries
join pe in Data.People
on co.Code equals pe.CodeCountryOfBirth
group new { pe.Name, pe.Age } by co.Name
into tmp
orderby tmp.Key
select tmp;

In practice, continuations are especially useful for processing the results produced by
a group...by clause. Observe the following example, similar to the previous one:

var summaryCountries =
from co in Data.Countries
join pe in Data.People
on co.Code equals pe.CodeCountryOfBirth
group new { pe.Name, pe.Age } by co.Name
into tmp
orderby tmp.Count() descending
select new {
Name = tmp.Key,
Number = tmp.Count()
};

This query produces an ordered sequence of objects with two properties: the name of
the country and the number of people born in that country.
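Continuations do not add new operators: the part after into simply continues the chain of method calls. The sketch below (hypothetical sample data, and a simplified group element containing just the name) puts a summaryCountries-style query next to an approximation of its method-call form; the anonymous type { co, pe } plays the role of the compiler's transparent identifier.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ContinuationDemo
{
    // Hypothetical sample data.
    static readonly (string Code, string Name)[] Countries =
        { ("ES", "Spain"), ("US", "United States"), ("FR", "France") };
    static readonly (string Name, string CodeCountryOfBirth)[] People =
        { ("Diana", "ES"), ("Octavio", "US"), ("Pedro", "ES") };

    public static List<(string Name, int Number)> QuerySyntax() =>
        (from co in Countries
         join pe in People on co.Code equals pe.CodeCountryOfBirth
         group pe.Name by co.Name into tmp
         orderby tmp.Count() descending
         select (Name: tmp.Key, Number: tmp.Count())).ToList();

    // Approximately the chain of calls the query above is translated into.
    public static List<(string Name, int Number)> MethodSyntax() =>
        Countries.Join(People, co => co.Code, pe => pe.CodeCountryOfBirth,
                       (co, pe) => new { co, pe })
                 .GroupBy(x => x.co.Name, x => x.pe.Name)
                 .OrderByDescending(tmp => tmp.Count())
                 .Select(tmp => (Name: tmp.Key, Number: tmp.Count()))
                 .ToList();
}
```

Both versions return Spain with 2 people and United States with 1; France, where nobody in the sample was born, is absent because a plain join discards unmatched countries.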

Grouped joins
The second and most important application of the into clause has the purpose of
implementing what is known as grouped joins. This is a type of join that has no direct
equivalent in the world of relational databases. Instead of producing the typical sequence
of pairs yielded by a normal join, it produces a sequence in which each element of the
first sequence is paired up with the group of elements of the second one whose keys
match the key of that outer element (a group that may be empty). A join...into
construction translates into a call to the GroupJoin() standard operator, which, like
Join(), is based on the use of hash tables.
For example, a more concise way to obtain the list of countries with the number of
people in each country, similar to the one in the previous section, would have been this
one:

var summaryCountries2 =
from co in Data.Countries
orderby co.Name
join pe in Data.People
on co.Code equals pe.CodeCountryOfBirth
into gp
select new {
Country = co.Name,
Number = gp.Count()
};

This query expression translates into:

var summaryCountries3 =
    Data.Countries.OrderBy(co => co.Name).
        GroupJoin(Data.People,
                  co => co.Code,
                  pe => pe.CodeCountryOfBirth,
                  (co, gp) => new { Country = co.Name,
                                    Number = gp.Count() });

Note that, as opposed to our previous query, the countries where nobody has been
born will also be included in the result this time.
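The difference between a grouped join and a join followed by group...by can be checked with hypothetical data in which one country (France) has no people. The sketch compares a summaryCountries2-style query, its GroupJoin() translation, and a plain join plus grouping:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupJoinDemo
{
    // Hypothetical sample data: nobody was born in France.
    static readonly (string Code, string Name)[] Countries =
        { ("ES", "Spain"), ("US", "United States"), ("FR", "France") };
    static readonly (string Name, string CodeCountryOfBirth)[] People =
        { ("Diana", "ES"), ("Octavio", "US"), ("Pedro", "ES") };

    // join...into: every country is kept, paired with a (possibly empty) group.
    public static List<(string Country, int Number)> QuerySyntax() =>
        (from co in Countries
         orderby co.Name
         join pe in People on co.Code equals pe.CodeCountryOfBirth into gp
         select (Country: co.Name, Number: gp.Count())).ToList();

    // The GroupJoin() call the query above translates into.
    public static List<(string Country, int Number)> MethodSyntax() =>
        Countries.OrderBy(co => co.Name)
                 .GroupJoin(People, co => co.Code, pe => pe.CodeCountryOfBirth,
                            (co, gp) => (Country: co.Name, Number: gp.Count()))
                 .ToList();

    // For comparison: join + group...by only sees countries with matches.
    public static List<string> JoinThenGroup() =>
        (from co in Countries
         join pe in People on co.Code equals pe.CodeCountryOfBirth
         group pe by co.Name into g
         select g.Key).ToList();
}
```

The grouped join yields France with a count of 0, followed by Spain (2) and United States (1), whereas the join plus grouping yields only Spain and United States.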

The let clause
Finally, the let clause is a convenience mechanism that is very useful when we need to
assign a name to an intermediate result in order to reuse it later, or to execute a
subsidiary query (subquery) whose result is needed inside the outer query.
For example, suppose that we have a sequence of integers and we want to group them
according to their last digit. We will need to obtain the last digit of each number in the
sequence, since the grouping will be performed based on it. We could do it like this:

var let1 = from n in arr
           let ending = n % 10
           orderby ending
           group n by ending;

The compiler translates a let clause by injecting what is known as a transparent
identifier into the source code. In practice, this translates into a call to the Select()
operator, which pairs each element with the value of the new variable, adding it as an
extra property of the elements of the output sequence. The previous statement will be
translated into this:

var let2 = arr.
    Select(n => new { n, ending = n % 10 }).
    OrderBy(x => x.ending).
    GroupBy(x => x.ending, x => x.n);
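The equivalence between the let query and its translation can be verified with a short sketch. The input array is hypothetical (chosen so that some numbers share their last digit), and Render() is a helper added only to make the groups easy to compare:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LetDemo
{
    // Hypothetical input: 21 and 31 end in 1, 4 and 14 end in 4.
    static readonly int[] Numbers = { 21, 4, 31, 14, 5 };

    public static IEnumerable<IGrouping<int, int>> WithLet() =>
        from n in Numbers
        let ending = n % 10
        orderby ending
        group n by ending;

    // The translation: Select() carries 'ending' alongside each n.
    public static IEnumerable<IGrouping<int, int>> Translated() =>
        Numbers.Select(n => new { n, ending = n % 10 })
               .OrderBy(x => x.ending)
               .GroupBy(x => x.ending, x => x.n);

    // Renders groups as "key:elements; ..." for easy comparison.
    public static string Render(IEnumerable<IGrouping<int, int>> groups) =>
        string.Join("; ", groups.Select(g => g.Key + ":" + string.Join(",", g)));
}
```

Both forms render as "1:21,31; 4:4,14; 5:5".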

As an example of using let to express subqueries, suppose that we want to obtain the
people whose age is equal to or greater than the average age of the set. In principle,
we would have to execute two queries: the first one to calculate the average, and the
second one to obtain the elements whose age is equal to or greater than the previously
calculated average. With the aid of let, we could express everything in one stroke like this:

var let3 = from p in Data.People
           let average = Data.People.Average(pp => pp.Age)
           where p.Age >= average
           select p.Name;
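One consequence of the translation described above is worth keeping in mind: since let becomes a Select() that runs for every element, in LINQ to Objects the subquery in let3 is evaluated once per person. The hypothetical sketch below counts those evaluations and contrasts them with computing the average once, beforehand, in a local variable:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LetSubqueryDemo
{
    // Hypothetical sample data; the average age is 30.
    static readonly (string Name, int Age)[] People = { ("Ann", 20), ("Bob", 30), ("Cid", 40) };

    // let-based version (like let3): the subquery runs once per person.
    public static (List<string> Names, int AverageCalls) WithLet()
    {
        int calls = 0;
        double ComputeAverage() { calls++; return People.Average(pp => pp.Age); }

        var names = (from p in People
                     let average = ComputeAverage()
                     where p.Age >= average
                     select p.Name).ToList();
        return (names, calls);
    }

    // Hoisted version: the average is computed a single time beforehand.
    public static (List<string> Names, int AverageCalls) Hoisted()
    {
        int calls = 0;
        double ComputeAverage() { calls++; return People.Average(pp => pp.Age); }

        double average = ComputeAverage();
        var names = (from p in People
                     where p.Age >= average
                     select p.Name).ToList();
        return (names, calls);
    }
}
```

Both versions return Bob and Cid, but the let-based one computes the average three times (once per person) while the hoisted one computes it once.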

1.12.-LINQ extensions

As we have already said, query operators do not have predetermined semantics.
Instead, their semantics are plugged in at compile time depending on the type of the data
source to which the query is applied and on the sets of extension methods that are in
scope at that moment. The following table shows, for each of the LINQ extensions
available as standard in .NET Framework 3.5 and above, the assembly that contains its
extension classes and the namespace to which those classes belong:
Table 2.- LINQ extensions

Technology          Assembly                             Namespace
Local providers
LINQ to Objects     System.Core.dll                      System.Linq
LINQ to XML         System.Xml.Linq.dll                  System.Xml.Linq
LINQ to DataSets    System.Data.DataSetExtensions.dll    System.Data
Remote providers
LINQ to SQL         System.Data.Linq.dll                 System.Data.Linq
LINQ to Entities    System.Data.Entity.dll               System.Data.Objects and others

As you can see, LINQ providers can be classified into two very different categories:
local providers and remote providers. The following table briefly summarizes the main
differences between them.
Table 3.- Differences between local and remote providers

                  Local providers        Remote providers
Interface         IEnumerable<T>         IQueryable<T>
Execution         Local, in memory       Generally remote
Implementation    Anonymous delegates    Expression trees
Examples          Parallel LINQ          LINQ to Amazon, LINQ to LDAP

Local providers are those which operate on data sources available in the memory of
the computer where the application is executed. Apart from LINQ to Objects, in this
category we also find LINQ to XML (which allows querying XML documents loaded as
trees of nodes in memory) and LINQ to DataSets (intended for querying in-memory
ADO.NET datasets). Since they actually operate on sequences (which implement
IEnumerable<T>), such providers generally rely on the implementations of the standard
query operators provided by LINQ to Objects. For example, if you use the Object
Browser to examine the System.Xml.Linq.dll assembly, which contains the
implementation of LINQ to XML, you will not find any class having extension methods
called Where(), Select(), etc.: LINQ to XML relies on LINQ to Objects. Therefore, to
program an integrated query against an XML document loaded with LINQ to XML, it is
necessary to import not only the System.Xml.Linq namespace, but also System.Linq.
These providers also offer specific query operators, which must always be expressed
using functional (method-call) notation. For example, LINQ to XML offers several
operators (such as Elements()) which emulate the selection capabilities of the XPath
language.
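The following sketch illustrates the point about namespaces: Elements() comes from LINQ to XML (System.Xml.Linq), while the where and select clauses are resolved to the LINQ to Objects operators in System.Linq, because Elements() returns an IEnumerable<XElement>. The XML document and the AdultNames() method are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;       // Where(), Select(), ToList(): LINQ to Objects
using System.Xml.Linq;   // XElement, XAttribute, Elements(): LINQ to XML

public static class XmlQueryDemo
{
    // Hypothetical document.
    const string Xml =
        "<people>" +
        "<person name='Ann' age='30'/>" +
        "<person name='Bob' age='17'/>" +
        "<person name='Eve' age='41'/>" +
        "</people>";

    public static List<string> AdultNames()
    {
        XElement root = XElement.Parse(Xml);
        return (from p in root.Elements("person")               // LINQ to XML
                where (int)p.Attribute("age") >= 18              // Enumerable.Where
                select (string)p.Attribute("name")).ToList();   // Enumerable.Select
    }
}
```

Removing the using System.Linq directive would make the where and select clauses fail to compile, even though the query only touches XML types.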
Much more powerful and interesting are remote providers, a category to which
LINQ to Entities belongs, and to which we will devote the following sections.

1.13.-The IQueryable<T> interface

As soon as the idea of implementing a LINQ extension for querying relational
databases arose, it became evident that the mechanism based on going through sequences
in memory was not the most convenient. Any implementation of query operators based
on client-side cursor navigation, even with lazy evaluation, would be disastrous for
performance. The creators of this technology realized that the only right way to
implement integrated queries against a relational store would be to capture, step by step,
the specifications contained in each of the where, orderby, etc. clauses, and to compose
a single SQL SELECT statement and send it to the database engine only once the whole
chain of calls to extension methods had been analyzed.
To that end, a new interface was defined: IQueryable<T>, which inherits from
IEnumerable<T> (hence the objects implementing it are potential sources of integrated
queries). This new interface, however, is equipped with its own set of extension methods
(whose basic implementation is contained in the System.Linq.Queryable static class, in
the same way that the extension methods for IEnumerable<T> are implemented in
System.Linq.Enumerable), and these extension methods work in a completely different
way from those of LINQ to Objects. This interface is better suited to cases where the
data source we want to access is remote, as happens with LINQ to SQL and LINQ to Entities.
The Queryable class implements practically the same query operators as Enumerable,
with one logical difference: instead of an AsEnumerable() operator, Queryable offers the
AsQueryable() operator. But the principal difference between the extension methods of
the Enumerable class and those of Queryable is that the latter do not receive delegates
(references of type Func<T, TResult>) as parameters, but expression trees (references
of type Expression<Func<T, TResult>>) instead. As you probably know, the compiler can
convert a lambda expression into either of the two, depending on the type expected where
the lambda appears; thus a statement such as the following

var older20 = SourcePeople
    .Where(h => h.Age > 20)
    .OrderBy(h => h.Name)
    .Select(h => new
    {
        Name = h.Name.ToUpper(),
        Age = h.Age
    });

will compile correctly regardless of whether the source of the query (SourcePeople) is
IQueryable<Person> or simply IEnumerable<Person>. In the case of LINQ to SQL, LINQ to
Entities and other providers based on IQueryable<T>, the implementations of the query
operators do not receive, together with the input sequence, delegates to functions that
they can call when appropriate, but instead expression trees that describe what those
functions do.
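The dual conversion of lambda expressions can be seen in isolation in the sketch below. The same lambda text becomes a delegate or an expression tree depending on the declared target type, and a tree can be inspected or compiled back into a delegate. AsQueryable() is used only to obtain an IQueryable<T> over an in-memory array; the class and member names are illustrative.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class LambdaTargetsDemo
{
    // The same lambda text, converted to two different target types:
    public static readonly Func<int, bool> AsDelegate = n => n > 20;           // executable code
    public static readonly Expression<Func<int, bool>> AsTree = n => n > 20;   // data describing code

    // The tree can be inspected...
    public static ExpressionType BodyNodeType() => AsTree.Body.NodeType;

    // ...or compiled back into a delegate.
    public static bool RunCompiledTree(int n) => AsTree.Compile()(n);

    // Enumerable.Where() receives a Func<int, bool>;
    // Queryable.Where() receives an Expression<Func<int, bool>>.
    public static int[] ViaEnumerable(int[] ages) => ages.Where(a => a > 20).ToArray();
    public static int[] ViaQueryable(int[] ages) => ages.AsQueryable().Where(a => a > 20).ToArray();
}
```

The two Where() calls look identical in source, which is exactly why the older20 query above compiles against either kind of source.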

1.14.-What do operators of IQueryable<T> do?

In order to better understand what providers based on IQueryable<T> do, we must be
aware that every object implementing this interface carries inside it an expression tree
(exposed through its Expression property) which reflects the algorithm for obtaining the
sequence. Specifically, source objects (those serving as sources of the queries) contain a
single-node tree of the ConstantExpression type (that is, they are constants).
In IQueryable<T> the default implementations of query operators generate a new
IQueryable<T> object, whose expression tree is the result of combining the expression
tree of the input object with the expression trees corresponding to the other arguments of
the method. For example, in our previous query expression, the Where() method will
produce an output object whose expression tree will have a node of the
MethodCallExpression (call to a method) type as a root node. In turn, this root node will
have the original collection as the first argument of the call, and the tree corresponding
to the predicate to be checked as the second argument:

Figure A1_2.- Predicate tree

The same process is repeated for the subsequent query operators (OrderBy() and
Select() in our example), with the IQueryable<T> object returned by each call serving as
the input object for the next one. At the end of the chain, we have a complex tree that
reflects everything the query expression must do. When iteration over the query results
begins, this tree is used as the source from which to build a statement in the language of
the remote data source being queried (some dialect of SQL, in the case of LINQ to SQL
and LINQ to Entities). To translate LINQ expression trees into syntactic constructions in
the language of the store, LINQ providers rely on auxiliary objects known as query
providers (implementations of the IQueryProvider interface).
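The chain of nodes described above can be observed without any database by wrapping an array with AsQueryable(), which yields an IQueryable<T> whose source is a constant node. The following sketch (illustrative names, made-up data) builds a Where()/OrderBy()/Select() query without executing it and walks its tree from the root down to the source:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableTreeDemo
{
    // Builds a query over an in-memory source; nothing is executed,
    // each operator only wraps the previous expression tree in a new node.
    public static Expression BuildTree()
    {
        IQueryable<int> source = new[] { 25, 18, 40 }.AsQueryable();
        IQueryable<string> query = source
            .Where(a => a > 20)
            .OrderBy(a => a)
            .Select(a => "Age " + a);
        return query.Expression;
    }

    // Walks from the root (last operator) through each call's first
    // argument (its input) until it reaches the source node.
    public static List<string> OperatorChain()
    {
        var nodes = new List<string>();
        Expression e = BuildTree();
        while (e is MethodCallExpression call)
        {
            nodes.Add(call.Method.Name);
            e = call.Arguments[0];
        }
        nodes.Add(e.NodeType.ToString());   // the source's single Constant node
        return nodes;
    }

    // Second argument of the root Select() call: the lambda, wrapped in a Quote node.
    public static ExpressionType SecondArgumentNodeType() =>
        ((MethodCallExpression)BuildTree()).Arguments[1].NodeType;
}
```

OperatorChain() returns Select, OrderBy, Where and finally Constant: the root of the tree is the last operator applied, and the original collection sits at the bottom.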

1.15.-On the availability of operators and functions

As we have remarked, in principle the only standard query operators that a LINQ
provider must implement are the overloads of the operators reflected in the query syntax,
known in the C# language specification as the query expression pattern (the LINQ
pattern). Also remember that, in the case of the providers we have called remote, a query
expression ultimately translates into a large expression tree, which the query provider
later translates into a statement in the language of the remote store being queried.
For some types of stores, some of the additional standard query operators (those
invoked only through functional notation) may be meaningless or even impossible to
implement. Of course, in such cases we must avoid using those operators; if we do use
them, we will get an exception of type System.NotSupportedException when the query
is executed.
For example, consider the ElementAt() operator, which returns the element located
at a specific position in the input sequence. The creators of LINQ to SQL and LINQ to
Entities decided that this operator should not be supported, so a query such
as the following

using (var ctx = new MAmazonEntities())
{
    // produces an exception
    var x = (from p in ctx.Product
             orderby p.Title
             select p).ElementAt(2);
    Console.WriteLine(x.Title);
}

will produce an exception. Note, however, that the following alternative works properly:

using (var ctx = new MAmazonEntities())
{
    // works
    var y = (from p in ctx.Product
             orderby p.Title
             select p).Skip(2).Take(1).First();
    Console.WriteLine(y.Title);
}

Similarly, when writing the lambda expressions corresponding to the where, orderby,
etc. clauses of our integrated queries, we must not forget that the query will eventually
need to be translated into a statement in the language of the store we are working
against. For example, the following query

using (var ctx = new MAmazonEntities())
{
    // produces an exception
    var x = (from p in ctx.Product
             // titles longer than three words
             where p.Title.Split(new char[] { ' ' }).Length > 3
             orderby p.Title
             select p.Title);
    foreach (var s in x)
        Console.WriteLine(s);
}

will produce an exception, because the LINQ to Entities query provider has no way to
express in SQL the call to the Split() method of the .NET Framework string class; SQL is
much poorer than .NET with regard to programming support. You cannot get blood out
of a turnip :-)
Complete information about which operators and functions are supported, supported
with limitations or not supported at all by LINQ to SQL and LINQ to Entities is available
in the MSDN documentation.
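To make the idea concrete, the following is a deliberately tiny, hypothetical translator, not the code of any real provider: it turns predicates over a string title into SQL-like text, supports only comparisons involving string.Length and constants, and throws NotSupportedException for anything else, including a call to Split(). Real query providers are far more complete; the sketch only shows why a call with no equivalent in the target language ends in an exception.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical toy translator, for illustration only.
public static class ToySqlTranslator
{
    public static string TranslateWhere(Expression<Func<string, bool>> predicate) =>
        "WHERE " + Visit(predicate.Body);

    static string Visit(Expression e)
    {
        switch (e)
        {
            case BinaryExpression b when b.NodeType == ExpressionType.GreaterThan:
                return Visit(b.Left) + " > " + Visit(b.Right);
            case MemberExpression m when m.Member.Name == "Length"
                                      && m.Expression is ParameterExpression:
                return "LEN(Title)";          // string.Length mapped to a SQL function
            case ParameterExpression _:
                return "Title";
            case ConstantExpression c:
                return Convert.ToString(c.Value);
            default:
                // Anything else (such as string.Split) has no translation here.
                throw new NotSupportedException("Cannot translate: " + e);
        }
    }

    public static bool RejectsSplit()
    {
        try
        {
            TranslateWhere(t => t.Split(new char[] { ' ' }).Length > 3);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

For example, TranslateWhere(t => t.Length > 3) produces "WHERE LEN(Title) > 3", while the Split()-based predicate from the query above is rejected.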

1.16.-Update mechanisms

Finally, we should mention that although, in principle, LINQ is only associated with
the retrieval (querying) of information, most LINQ providers have been equipped with
additional mechanisms that also allow data updates and, more generally, the
manipulation (in the broadest sense of the word) of the containers where the data are
stored, be they XML documents, relational databases or others. In particular, the
LINQ-based technologies for accessing relational databases (LINQ to SQL and LINQ to
Entities) allow us to make in-memory changes to objects obtained by executing queries,
and then apply those changes to the relational store. For this purpose, these technologies
are also capable of generating the necessary SQL INSERT, UPDATE and DELETE
statements.