Lecture Notes
Contents

1 Mathematical Prologue
  1.1 Combinatorial Analysis
    1.1.1 Counting
    1.1.2 Permutations
    1.1.3 Combinations
    1.1.4 Exercises
  1.2 Sets
    1.2.1 Set operations
    1.2.2 Exercises
2 Probability Theory
  2.1 The Axioms of Probability
    2.1.1 Exercises
  2.2 σ-fields
    2.2.1 Exercises
  2.3 Random Variables
    2.3.1 Exercises
  2.4
  2.5 Equiprobable Outcomes
    2.5.1 Exercises
  2.6 Conditional Probability
    2.6.1 Exercises
  2.7 Bayes' Theorem
    2.7.1 Exercises
3 Random Variables
  3.1 Distribution of Random Variables
    3.1.1 Exercises
  3.2 Expected Value
    3.2.1 Exercises
  3.3 Variance
    3.3.1 Exercises
  3.4 Covariance and Correlation
    3.4.1 Exercises
4 Discrete Distributions
  4.1 Bernoulli Distribution
  4.2 Binomial Distribution
    4.2.1 Exercises
  4.3 Geometric Distribution
    4.3.1 Exercises
  4.4 Negative Binomial Distribution
    4.4.1 Exercises
  4.5 Hypergeometric Distribution
    4.5.1 Exercises
  4.6 Poisson Distribution
    4.6.1 Exercises
  4.7
    4.7.1 Exercises
5 Continuous Random Variables
  5.1 Introduction
    5.1.1 Exercises
  5.2 Uniform Distribution
    5.2.1 Exercises
  5.3 Exponential Distribution
    5.3.1 Exercises
  5.4 Gamma Distribution
    5.4.1 Exercises
  5.5 Beta Distribution
    5.5.1 Exercises
  5.6 Normal Distribution
    5.6.1 Exercises
6 Bibliography
1 Mathematical Prologue
1.1 Combinatorial Analysis
Being able to accurately count the possible results of an experiment is key in probability theory, and combinatorial analysis is a powerful tool for doing so. This section contains a brief introduction to it.
1.1.1 Counting
Assume r experiments will be performed. If the i-th experiment has n_i possible outcomes, i = 1, . . . , r, then the total number of outcomes of the r experiments is
∏_{i=1}^{r} n_i := n_1 · n_2 · · · n_r.
Example 1. If we flip a coin r times, the total number of outcomes in each experiment is two: either heads or tails. It follows from the counting principle that there are ∏_{i=1}^{r} 2 = 2^r different results in the joint experiment, i.e., there is a total of 2^r sequences of heads and tails.
Example 2. If we flip a coin and then roll a die, there is a total of 2 · 6 = 12 results one can observe.
1.1.2 Permutations
Consider a set with n objects. The total number of ways we can sort these objects is given by
n! := n · (n − 1) · · · 2 · 1.
Example 3. There are 4! = 24 ways of displaying four shirts in a closet.
1.1.3 Combinations
Consider a set with n objects. The total number of different groups of size r ≤ n that can be formed with these n objects is given by the binomial coefficient
C(n, r) := n! / (r!(n − r)!).
Example 4. There are C(10, 2) = 45 ways of choosing two questions to study from a section with 10 questions from this book.
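These counts are easy to verify numerically. Below is a minimal sketch in Python (assuming Python 3.8+, whose math module provides comb):

import math

# Counting principle: 2^r sequences of heads/tails in r coin flips.
print(2 ** 3)             # 8
# Permutations: n! orderings of n objects (Example 3).
print(math.factorial(4))  # 24
# Combinations: groups of size r out of n objects (Example 4).
print(math.comb(10, 2))   # 45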
1.1.4 Exercises
Exercise 1. A restaurant offers three different entrées, five main dishes and six desserts. How many different meals does this restaurant offer?
Exercise 2. A student has 2 math books, 3 geography books and 4 chemistry books. In how many ways can these books be displayed on a shelf:
If the student does not care whether books on the same subject are close together?
If the student wants books on the same subject to stay together?
Exercise 3. How many different words can be formed with two letters A, three letters B and one letter C?
Exercise 4. A student needs to study for three exams this week. The math teacher gave him 6 exercises to help him study for the test, the geography teacher gave him 7 exercises, and the chemistry teacher gave him 5 exercises. Considering the student does not have much time, how many different sets of exercises can he pick if he wants to choose only 2 math exercises, 4 geography exercises, and 1 chemistry exercise?
Exercise 5. Eight people, named A, B, . . . , H, are going to make a line.
In how many ways can these people be placed?
In how many ways can these people be placed if A and B have to be next to each other?
In how many ways can these people be placed if A and B have to be next to each other, and C and D also
have to be next to each other?
1.2 Sets
A set is a collection of objects, for example {o_1, o_2, . . . , o_n}. We denote the set of natural numbers by N, the set of integers by Z and the set of reals by R. In probability, sets are fundamental for describing the outcomes of an experiment.
Example 5 (Sets).
The set of possible outcomes of a six-sided die: {1, 2, 3, 4, 5, 6}.
The set of outcomes of a coin flip: {T, H}.
The set of outcomes of two coin flips: {(T, T), (T, H), (H, T), (H, H)}.
The set of all odd numbers: {2n + 1 : n ∈ N} or {1, 3, 5, 7, . . .}.
The set of non-negative real numbers: {x ∈ R : x ≥ 0}.
A disk of radius 1: {(x, y) ∈ R² : x² + y² ≤ 1}.
Definition 1 (∈ and ∉). We write o ∈ S if object o is an element of set S, and o ∉ S otherwise.
Example 6 (∈ and ∉).
T ∈ {T, H}.
7 ∉ {1, 2, 3, 4, 5, 6}.
7 ∈ {2n + 1 : n ∈ N}.
Definition 2 (empty set - ∅). ∅ is the only set with no elements. That is, for every object o, o ∉ ∅.
Definition 3 (disjoint sets).
Two sets A and B are disjoint if, for every o ∈ A, we have that o ∉ B and, for every o ∈ B, o ∉ A.
A sequence of sets (A_n)_{n∈N} is disjoint if, for every i ≠ j, A_i is disjoint from A_j.
Example 7 (Disjoint sets).
{1, 2} and {3, 4} are disjoint.
{1, 2} and {2, 3} are not disjoint, since 2 ∈ {1, 2} and 2 ∈ {2, 3}.
Definition 4 (⊆ and =). Let A and B be two sets. We say that:
A ⊆ B if, for every o ∈ A, o ∈ B.
A = B if A ⊆ B and B ⊆ A.
Example 8 (⊆ and =).
{1, 2} ⊆ {1, 2, 3, 4}.
{n ∈ Z : n ≥ 1} ⊆ N.
{n ∈ Z : n ≥ 0} = N.
We reserve the symbol Ω for the set of all objects we are considering in a given model. Ω is often called the sample space in probability theory. That is, for every set A we consider in that model, A ⊆ Ω.
1.2.1 Set operations
Definition 5 (complement - ᶜ). Let A be a set. o is an element of Aᶜ if and only if o ∉ A. That is, the complement of A is formally defined as Aᶜ = {o : o ∉ A}.
Example 9 (ᶜ).
Let Ω = {T, H}; {T}ᶜ = {H}.
Let Ω = {1, 2, 3, 4, 5, 6}; {1, 2}ᶜ = {3, 4, 5, 6}.
Let Ω = N; {n ∈ N : n > 0}ᶜ = {0}.
Definition 6 (union - ∪).
Let A and B be two sets. o is an element of the union of A and B, A ∪ B, if and only if either o is an element of A or o is an element of B. That is, A ∪ B = {o : o ∈ A or o ∈ B}.
Let (A_n)_{n∈N} be a sequence of sets. o is an element of the union of (A_n)_{n∈N}, ∪_{n∈N} A_n, if and only if there exists n ∈ N such that o ∈ A_n. That is, ∪_{n∈N} A_n = {o : there exists n ∈ N such that o ∈ A_n}.
Example 10 (∪).
{T} ∪ {H} = {T, H}.
{1, 2} ∪ {2, 3} = {1, 2, 3}.
{1} ∪ {3} ∪ {5} = {1, 3, 5}.
{n ∈ Z : n > 0} ∪ {n ∈ Z : n < 0} = {n ∈ Z : n ≠ 0}.
∪_{n∈N} {n} = N.
∪_{n∈N} {x ∈ R : x ≥ n} = {x ∈ R : x ≥ 0}.
∪_{n∈N} {x ∈ R : x ≥ 1/(n + 1)} = {x ∈ R : x > 0}.
Definition 7 (intersection - ∩).
Let A and B be two sets. o is an element of the intersection of A and B, A ∩ B, if and only if o is an element of A and o is an element of B. That is, A ∩ B = {o : o ∈ A and o ∈ B}.
Let (A_n)_{n∈N} be a sequence of sets. o is an element of the intersection of (A_n)_{n∈N}, ∩_{n∈N} A_n, if and only if, for every n ∈ N, o ∈ A_n. That is, ∩_{n∈N} A_n = {o : for every n ∈ N, o ∈ A_n}.
Example 11 (∩).
{T} ∩ {H} = ∅.
{1, 2} ∩ {2, 3} = {2}.
({1, 2} ∩ {2, 3}) ∪ {5} = {2, 5}.
{n ∈ Z : n ≤ 0} ∩ {n ∈ Z : n ≥ 0} = {0}.
∩_{n∈N} {i ∈ N : i ≥ n} = ∅.
∩_{n∈N} {x ∈ R : x ≤ n} = {x ∈ R : x ≤ 0}.
Theorem 1 (De Morgan's laws). Let (A_n)_{n∈N} be a sequence of subsets of Ω. Then, for every n ∈ N,
(∪_{i=1}^{n} A_i)ᶜ = ∩_{i=1}^{n} A_iᶜ
(∩_{i=1}^{n} A_i)ᶜ = ∪_{i=1}^{n} A_iᶜ
Moreover,
(∪_{i∈N} A_i)ᶜ = ∩_{i∈N} A_iᶜ
(∩_{i∈N} A_i)ᶜ = ∪_{i∈N} A_iᶜ
Definition 8 (Partition). Let (A_n)_{n∈N} be a sequence of sets. We say that (A_n)_{n∈N} partitions Ω if:
for every i, j ∈ N such that i ≠ j, A_i and A_j are disjoint.
∪_{n∈N} A_n = Ω.
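The operations above can be experimented with directly in Python, whose built-in set type implements them; the names Omega, A and B below are just illustrative:

Omega = {1, 2, 3, 4, 5, 6}       # sample space of a six-sided die
A = {1, 2}
B = {2, 3}
print(A | B)                     # union: {1, 2, 3}
print(A & B)                     # intersection: {2}
print(Omega - A)                 # complement of A relative to Omega
print(A.isdisjoint({3, 4}))      # True: A and {3, 4} are disjoint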
1.2.2 Exercises
Exercise 6. Let A = {1, 3, 5}, B = {1, 2} and Ω = {1, 2, 3, 4, 5, 6}. Find:
a. A ∪ B
b. A ∩ Bᶜ
c. B ∩ Bᶜ
d. (A ∪ B)ᶜ
e. (A ∩ B)ᶜ
Exercise 7. Consider that a given day can be rainy (R) or not rainy (NR). We are interested in the weather of the next two days.
a. How would you formally write Ω?
b. How would you formally write "the outcomes such that both days are rainy"? Call this set A.
c. How would you formally write "the outcomes such that at least one day is rainy"? Call this set B.
d. Is it true that A ⊆ B?
e. Find Bᶜ. How would you describe this set in English?
f. Is it true that A and Bᶜ are disjoint?
Exercise 8. Are the following statements true or false?
a. ∅ ∈ ∅.
b. ∅ ∈ {∅}.
c. {∅} ∈ ∅.
d. ∅ ⊆ ∅.
e. ∅ ⊆ {∅}.
f. {∅} ⊆ ∅.
Exercise 9. Prove the following:
a. S and T are disjoint if and only if S ∩ T = ∅.
b. S ∪ T = T ∪ S
c. S ∩ T = T ∩ S
d. S = (Sᶜ)ᶜ
e. S ∪ Sᶜ = Ω
f. S ∪ ∅ = S
g. S ∩ ∅ = ∅
h. (S ∪ T)ᶜ = Sᶜ ∩ Tᶜ
i. (S ∩ T)ᶜ = Sᶜ ∪ Tᶜ
j. ({n})_{n∈N} partitions N.
k. ({1, 2}, {3, 4}, ∅, ∅, ∅, . . .) partitions {1, 2, 3, 4}.
2 Probability Theory
2.1 The Axioms of Probability
We denote by Ω the sample space, that is, the set of all possible outcomes in the situation we are interested in. For example, one might be interested in the outcome of a coin flip (Ω = {H, T}), the number of rainy days next year (Ω = {n ∈ N : n ≤ 366}) or the time until a given transistor fails (Ω = {x ∈ R : x > 0}).
For every A ⊆ Ω, one might be uncertain whether A is true or not. For example, {T} ⊆ {H, T} and one cannot usually determine whether the outcome of a coin flip will be heads or tails before the coin is tossed. A probability is a function defined on subsets of Ω which is usually interpreted in one of the following ways:
(Frequency) P(A) denotes the relative frequency with which outcomes in A will be observed if the same experiment is performed many times independently.
(Degree of Belief) P(A) represents the evaluator's degree of belief that the outcome will be an element of A.
Events are subsets of Ω to which we attribute probabilities. We denote by F the set of all events. In order for P to be a probability it must satisfy the Axioms of Probability.
Definition 9 (Axioms of Probability). P : F → R is a probability if:
1. (Non-negativity) For all A ∈ F, P(A) ≥ 0.
2. (Countable additivity) If (A_n)_{n∈N} is a sequence of disjoint sets in F, P(∪_{n∈N} A_n) = ∑_{n∈N} P(A_n).
3. (Normalization) P(Ω) = 1.
Next, we prove some consequences of these axioms that are constantly used.
Lemma 1. P(∅) = 0.
Proof. Let (A_n)_{n∈N} be a sequence of subsets of Ω such that A_0 = Ω and A_i = ∅ for all i > 0. By construction, this is a sequence of disjoint sets and, therefore, by countable additivity:
P(Ω) = P(∪_{n∈N} A_n) = ∑_{n∈N} P(A_n) = P(Ω) + ∑_{n>0} P(∅).
Hence ∑_{n>0} P(∅) = 0 and, by non-negativity, conclude that P(∅) = 0.
Lemma 2 (Finite additivity). If A_1, . . . , A_n are disjoint events, then P(A_1 ∪ A_2 ∪ . . . ∪ A_n) = ∑_{i=1}^{n} P(A_i).
Proof. Consider the sequence (B_i)_{i∈N} such that B_i = A_i for i ≤ n and B_i = ∅ for i > n. Observe that (B_i)_{i∈N} is a sequence of disjoint sets. Hence, by countable additivity:
P(A_1 ∪ A_2 ∪ . . . ∪ A_n) = P(∪_{i∈N} B_i) = ∑_{i∈N} P(B_i) = ∑_{i=1}^{n} P(A_i) + ∑_{i>n} P(∅) = ∑_{i=1}^{n} P(A_i).
. . . 1/18. What is the sample space in this problem description? What were the mistakes in the argument?
Exercise 15. Let A and B be two events.
a. If P(A) = 0.7, P(B) = 0.4 and P(A ∪ B) = 0.8, what is the value of P(A ∩ B)?
b. If P(A ∩ B) = 0.25, P(A ∪ B) = 0.75 and P(A) = 2P(B), what is P(A)?
c. Prove that if P(A) = P(Bᶜ), then P(Aᶜ) = P(B).
Exercise 16. Show that P((A ∩ Bᶜ) ∪ (Aᶜ ∩ B)) = P(A) + P(B) − 2P(A ∩ B).
Exercise 17. Prove that
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
Exercise 18. Let Ω be the sample space and A and B be events. Prove that:
a. P(A ∪ B) = 1 − P(Aᶜ ∩ Bᶜ).
b. P(A) = P(A ∩ B) + P(A ∩ Bᶜ).
c. max(P(A), P(B)) ≤ P(A ∪ B) ≤ min(1, P(A) + P(B)).
d. max(0, P(A) + P(B) − 1) ≤ P(A ∩ B) ≤ min(P(A), P(B)).
Exercise 19. Let A_1, A_2, . . . ∈ F. Prove that
P(∪_{n≥1} A_n) ≤ ∑_{n≥1} P(A_n)
Exercise 20. Let A, B and C be events of Ω. Using set operations, describe the following sets:
"At least one of A, B or C happens"
"All of A, B and C happen"
"A and B happen, but C does not"
"Exactly one of A, B and C happens"
Exercise 21 (Challenge). Let A_1, A_2, . . . be a sequence of events with A_i ⊆ A_{i+1} for every i ≥ 1. Show that
P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n),
where
lim_{n→∞} A_n := ∪_{n≥1} A_n.
Hint: Define F_n := A_n ∩ (∪_{i=1}^{n−1} A_i)ᶜ and notice they are disjoint.
Exercise 22. Assuming that P(A_n) = 1 for every n ∈ N, prove that P(∩_{n∈N} A_n) = 1.
2.2 σ-fields
Some σ-fields F are more useful than others. For instance, it is possible to define the continuous uniform measure over a special σ-field called the Borel sets of (0, 1); however, it is not possible to do so over the power set of (0, 1). This is the reason why σ-fields are necessary in modern probability theory. For more on this issue, see, for instance, Billingsley (2008).
2.2.1 Exercises
Exercise 23. Let Ω = {a, b, c, d}. Is {∅, {a}, {b, c}, Ω} a σ-field of Ω?
Exercise 24. Let F be a σ-field. Prove that if A_i ∈ F for every i ∈ N, then ∩_{i∈N} A_i ∈ F.
2.3 Random Variables
Definition 13 (Indicator function). The indicator function is a special type of random variable. Consider an event A ∈ F. The indicator function of A is denoted by I_A : Ω → R and defined as:
I_A(w) = 1, if w ∈ A; 0, otherwise.
Consider the following betting setup: for each event A, you announce a price Pr(A) and your opponent chooses a stake α_A ∈ {−1, 1}; your payoff from that bet is α_A (I_A − Pr(A)).
If Pr(Ω) < 1, I_Ω − Pr(Ω) > 0 and, therefore, the opponent can make you a sure loser by setting α_Ω = −1.
If Pr(Ω) > 1, I_Ω − Pr(Ω) < 0 and, therefore, the opponent can make you a sure loser by setting α_Ω = 1.
Finally, let A and B be disjoint events. Observe that I_{A∪B} − I_A − I_B = 0. Hence, if the opponent sets α_{A∪B} = 1, α_A = −1, α_B = −1, your final outcome is:
α_{A∪B}(I_{A∪B} − Pr(A ∪ B)) + α_A(I_A − Pr(A)) + α_B(I_B − Pr(B))
= (I_{A∪B} − I_A − I_B) + (Pr(A) + Pr(B) − Pr(A ∪ B))
= Pr(A) + Pr(B) − Pr(A ∪ B).
Similarly, if α_{A∪B} = −1, α_A = 1, α_B = 1, then your final outcome is Pr(A ∪ B) − Pr(A) − Pr(B). Hence, if Pr(A ∪ B) ≠ Pr(A) + Pr(B), there exists an option such that the opponent makes you a sure loser. That is, in order for you to avoid sure loss, Pr must satisfy:
1. Pr(A) ≥ 0.
2. Pr(Ω) = 1.
3. If A and B are disjoint, Pr(A ∪ B) = Pr(A) + Pr(B).
These rules closely resemble the axioms of probability. Hence, if one believes that how much one is willing to bet can approximate degrees of belief, this reasoning can justify the axioms of probability.
2.5 Equiprobable Outcomes
Consider a finite sample space Ω in which all outcomes are equiprobable, that is, P({w}) is the same for every w ∈ Ω. Then:
∑_{w∈Ω} P({w}) = P(Ω)   (Lemma 2)
∑_{w∈Ω} P({w}) = 1   (Definition 9)
|Ω| · P({w}) = 1   (Equiprobable outcomes)
P({w}) = |Ω|⁻¹
Moreover, for every event A,
P(A) = ∑_{w∈A} P({w})   (Lemma 2)
= ∑_{w∈A} |Ω|⁻¹   (Lemma 6)
= |A| / |Ω|.
That is, in problems with equiprobable outcomes, the probability of a set is proportional to its size. In order to determine the sizes of sets, some counting principles are of great help. We saw some combinatorial analysis in Chapter 1; here's a quick review in the context of the examples we have been using:
Consider you have n different objects. The number of ways you can select k times among them with replacement is n^k.
Example 16. Consider a six-sided die. It can assume 6 different values. Hence, if we throw the die 3 times in a row, we can observe 6³ different outcomes.
Consider you have n different objects. If the order of selection matters, the number of ways you can select k ≤ n of these objects is n!/(n − k)!.
Example 17. Consider there are n participants in a given chess tournament. In how many ways can the list of top 3 be composed? n!/(n − 3)!.
Consider you have n different objects. If the order of selection doesn't matter, the number of ways you can select k ≤ n of these objects is C(n, k) = n!/(k!(n − k)!).
Example 18. The number of poker hands is C(52, 5).
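For small sample spaces, P(A) = |A|/|Ω| can also be checked by brute-force enumeration. A sketch in Python for Example 16's three die throws, with the event "the sum is 10" chosen just for illustration:

from itertools import product

Omega = list(product(range(1, 7), repeat=3))  # all 6^3 outcomes of 3 throws
A = [w for w in Omega if sum(w) == 10]        # outcomes whose sum is 10
print(len(Omega))                             # 216
print(len(A) / len(Omega))                    # |A|/|Omega| = 27/216 = 0.125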
2.5.1 Exercises
Exercise 27. Consider we throw a fair coin 3 times.
a. What is the sample space?
b. What is the probability of each outcome?
c. What set corresponds to obtain at least two heads? What is the probability of this set?
Exercise 28.
a. How many distinct 7-digit phone numbers are there? Assume that the first digit of any telephone number cannot be 0 or 1.
b. If I pick a number equally likely among all valid phone numbers, what is the probability that I pick 8939240?
c. What about any number of the form 893924z, where z can be any digit between 0 and 8?
d. What about a number such that no 2 digits are the same?
Exercise 29. Suppose there are 4 people in a room. What is the probability that no two of them celebrate their birthdays in the same month? Assume the probabilities of having a birthday in each month are equal and the birthdays of two people are independent from one another. Can you think of a situation in which assuming independence wouldn't be reasonable?
Exercise 30. Consider that a fair six-sided die is tossed twice and that all outcomes are equally probable. Let X be a random variable that denotes the outcome of the first toss and Y be a random variable that denotes the outcome of the second toss. Find P(|X − Y| ∈ {0, 1, 2}).
Exercise 31. (Challenge) Player A has n+1 coins, while player B has n coins. Both players throw all of their
coins simultaneously and observe the number of heads each one gets. Assuming all the coins are fair, what is the
probability that A obtains more heads than B?
Exercise 32 (Challenge). A rook can capture another if they are both in the same row or column of the chess board. If one puts 8 rooks on the board such that the different placements are equally probable, what is the probability that no rook can capture another?
Exercise 33 (Challenge). Two magicians perform the following trick: A person randomly picks 5 cards out of
a fair deck and hands them out to Magician 1. Magician 1 puts the five cards in the following way. The first
card is facing down. The next four cards are facing up in the order he chooses. Magician 2 looks at the arranged
cards and guesses which one is facing down. What plan can the magicians agree upon so that Magician 2 always
answers correctly?
2.6 Conditional Probability
Definition 14 (Conditional probability). Let A and B be events. The conditional probability of A given B, P(A|B), satisfies
P(A ∩ B) = P(A|B)P(B).
Similarly, P(A ∩ B) = P(B|A)P(A).
For example, let Ω = {1, 2, 3, 4, 5, 6} be the outcomes of a fair die, A = {3, 4, 5} and B = {1, 2, 3, 4}. Then
P(A ∩ B) = P(A|B)P(B), that is, P({3, 4}) = P(A|B)P({1, 2, 3, 4}), so P(A|B) = (2/6)/(4/6) = 1/2.
P(A ∩ B) = P(B|A)P(A), that is, P({3, 4}) = P(B|A)P({3, 4, 5}), so P(B|A) = (2/6)/(3/6) = 2/3.
Figure 1: cartoon from http://xkcd.com/795/.
Exercise 35. Explain how the cartoon in Figure 1, from http://xkcd.com/795/, relates to the concept of conditional probability.
Exercise 36. Consider that a coin is flipped twice. I announce that there was at least 1 heads. What is the
probability that there were 2 heads?
Exercise 37. A drug prescription states the following information:
There is a 10% chance of experiencing headache (event H).
There is a 15% chance of experiencing nausea (event N ).
There is a 5% chance of experiencing both side effects.
a. Are the events H and N disjoint? Why?
b. Are the events H and N independent? Why?
c. What is the probability of experiencing at least one of the two side effects?
d. What is the probability of experiencing exactly one of the two side effects?
e. What is the probability of experiencing neither headache nor nausea?
f. What is the probability of experiencing headache given that you experienced at least one of the two side
effects?
Exercise 38. Assume a card is taken from a well-shuffled deck. Let A denote "the card is an ace" and B denote "the card's suit is diamonds". Show that A and B are independent.
Exercise 39. Two cards are taken from a well-shuffled deck:
a. Let A be "the first card's suit is red" and B be "the second card's suit is red". Compute P(A ∩ B). Are A and B independent?
b. Let A be "the first card's suit is red" and B be "the second card is a J, Q or K". Compute P(A ∩ B). Are A and B independent?
Exercise 40. Show that:
1. A and B are independent if and only if P(B|A) = P(B).
2. A and B are independent if and only if P(A|B c ) = P(A).
Exercise 41.
a. If A is independent of B and B is independent of C, are A and C independent? Why?
b. If A is independent of B, B is independent of C and A is independent of C, is P(A ∩ B ∩ C) = P(A)P(B)P(C)?
Exercise 42. Prove Theorem 2.
Exercise 43. A fair coin is tossed 10 times. What is the probability that no two consecutive heads and no two
consecutive tails appear?
Exercise 44. Let A be an event. Prove that P(·|A) is indeed a probability measure.
Exercise 45. Let X denote the outcome of a fair six-sided die. Find P((X − 3)² = 1 | X ≠ 3).
Exercise 46 (Challenge). Player A has n + 1 coins, while player B has n coins. Both players throw all of their coins simultaneously and observe the number of heads each one gets. Assuming all the coins are fair, what is the probability that A obtains at least as many heads as B? (This problem is quite hard. You are not required to know how to solve this one.)
Exercise 47. Prove that if A1 , . . . , An are independent events and their union is the sample space, then P(Ai ) = 1
for at least one i.
2.7 Bayes' Theorem
Example 23. There are two coins: coin C_{1/2} has probability 1/2 of heads and coin C_{1/3} has probability 1/3 of heads. Suppose you remove either one of the coins with equal probability, flip it twice, and let H_2 denote the event that you obtain two heads. The problem states that P(C_{1/2}) = P(C_{1/3}) = 1/2. We also know that P(H_2|C_{1/2}) = 1/4 and P(H_2|C_{1/3}) = 1/9. What is your belief about which coin you picked after performing the flips? In other words, can you use the previous probability values and the axioms of probability to determine P(C_{1/2}|H_2)?
Theorem 3 (Law of total probability). Let (A_n)_{n∈N} be a partition of Ω and B be an event. Then
P(B) = ∑_{n∈N} P(A_n)P(B|A_n).
Proof. Since (A_n)_{n∈N} partitions Ω, Ω = ∪_{n∈N} A_n and
B = B ∩ Ω = B ∩ (∪_{n∈N} A_n) = ∪_{n∈N} (B ∩ A_n)
and, therefore,
P(B) = P(∪_{n∈N} (B ∩ A_n)).   (8)
Next, since (A_n)_{n∈N} is a partition, it is a disjoint sequence. Hence, for every i ≠ j, A_i ∩ A_j = ∅ and
(B ∩ A_i) ∩ (B ∩ A_j) = B ∩ (A_i ∩ A_j) = B ∩ ∅ = ∅.
That is, (B ∩ A_n)_{n∈N} also is a disjoint sequence. Thus, from the axioms of probability (Definition 9),
P(∪_{n∈N} (B ∩ A_n)) = ∑_{n∈N} P(B ∩ A_n).   (12)
Finally, from the axiom of conditional probability (Definition 14), for each n ∈ N, P(B ∩ A_n) = P(A_n)P(B|A_n). Using equations (8) and (12), conclude that:
P(B) = ∑_{n∈N} P(A_n)P(B|A_n).   (13)
The same holds for finite partitions: if A_1, . . . , A_n partitions Ω, then P(B) = ∑_{i=1}^{n} P(A_i)P(B|A_i).
Proof. If A_1, . . . , A_n partitions Ω, the sequence (C_i)_{i∈N} such that C_i = A_i for i ≤ n and C_i = ∅ for i > n also partitions Ω. The proof follows directly from Theorem 3 and P(∅) = 0.
The previous result allows us to relate an unconditional probability P(B) to conditional probabilities P(B|An ).
Example 24 (Example 23 continued). The problem description tells us the probability of obtaining heads once the coin is decided, that is, P(H_2|C_{1/2}) and P(H_2|C_{1/3}). Nevertheless, it doesn't directly tell us the probability of obtaining H_2 without knowing which coin was picked, P(H_2). Observe that C_{1/2}, C_{1/3} partitions Ω and, therefore,
P(H_2) = P(C_{1/2})P(H_2|C_{1/2}) + P(C_{1/3})P(H_2|C_{1/3}) = (1/2)(1/4) + (1/2)(1/9) = 13/72.
Simple, right? We're starting to benefit from building a rigorous theory instead of relying on intuition alone.
Theorem 4 (Bayes' Theorem). Let (A_i)_{i∈N} be a partition of Ω and B be an event. For every n ∈ N,
P(A_n|B) = P(A_n)P(B|A_n) / ∑_{i∈N} P(A_i)P(B|A_i).
Proof. Recall from the axiom of conditional probability (Definition 14) that
P(A_n|B) = P(A_n ∩ B) / P(B).   (14)
Using the axiom of conditional probability again, P(A_n ∩ B) = P(A_n)P(B|A_n). Using equation (14),
P(A_n|B) = P(A_n)P(B|A_n) / P(B).   (15)
Finally, observe that (A_i)_{i∈N} is a partition of Ω and, therefore, using Theorem 3, P(B) = ∑_{i∈N} P(A_i)P(B|A_i). Conclude that
P(A_n|B) = P(A_n)P(B|A_n) / ∑_{i∈N} P(A_i)P(B|A_i).   (16)
Example 25 (Example 24 continued). Recall that P(H_2|C_{1/2}) = 1/4 and P(H_2|C_{1/3}) = 1/9. It is less intuitive to find out what one learns about the bias of the coin by observing the two heads, P(C_i|H_2). Bayes' Theorem provides the answer. Recall that C_{1/2}, C_{1/3} is a partition of Ω. Hence, by Lemma 10,
P(C_{1/2}|H_2) = P(C_{1/2})P(H_2|C_{1/2}) / (P(C_{1/2})P(H_2|C_{1/2}) + P(C_{1/3})P(H_2|C_{1/3}))
= (1/2 · 1/4) / (1/2 · 1/4 + 1/2 · 1/9) = 9/13.
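A quick numerical check of this computation in Python (the dictionary keys are just illustrative labels for the two coins):

# Priors (Example 23) and likelihoods of two heads for each coin.
prior = {"C_half": 1 / 2, "C_third": 1 / 2}
lik = {"C_half": 1 / 4, "C_third": 1 / 9}

# Bayes' Theorem: posterior is prior times likelihood, normalized.
evidence = sum(prior[c] * lik[c] for c in prior)        # P(H2) = 13/72
posterior = {c: prior[c] * lik[c] / evidence for c in prior}
print(posterior["C_half"])                              # 0.6923... = 9/13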
2.7.1 Exercises
Exercise 48. Assume 5% of men and 0.25% of women are color-blind, and that a randomly selected person turns out to be color-blind. Assuming that there are the same number of men and women, what is the probability that this person is a man?
Exercise 49. Assume that there exist 4 fair coins in a bag. A person picks a number of coins with equal probability
among {1, 2, 3, 4}. Next, the person throws all the coins he picked. What is the probability that all coins land tails,
without knowing how many were picked?
Exercise 50. Medical case histories indicate that different illnesses may produce identical symptoms. Suppose that a particular set of symptoms, H, occurs only when one of three illnesses, I_1, I_2 or I_3, occurs. Assume that the simultaneous occurrence of more than one of these illnesses is impossible. Also assume that:
a. P(I_1) = 0.01; P(I_2) = 0.05; P(I_3) = 0.02.
b. P(H|I_1) = 0.90; P(H|I_2) = 0.95; P(H|I_3) = 0.75.
Assuming that an ill person exhibits the symptoms H, what is the probability that the person has illness I_1?
Exercise 51. Consider that each component C_1, . . . , C_n in a system fails independently with probability p. A system fails if there is no operational path from B to E. What is the probability that the following systems fail?
a. Series system: B → C_1 → C_2 → E.
b. Parallel system: two direct paths from B to E, one through C_1 and one through C_2.
c. Mixed system: a path from B to E through C_2, in parallel with a path through C_1 and C_3 (diagram in the original).
Exercise 52 (Monty Hall). There are 3 doors: A, B and C. There is a car behind one of these doors. If you pick the right one, you get the car. At first, you believe that it is equally likely that the car is behind any one of the doors. You pick door A. Next, the show's presenter opens door B, shows that it is empty and allows you to change to door C. Is it a good idea to change? Assume the presenter would always open a door with no prize and offer you the chance to change your door, whether or not you initially chose the door with the prize.
Exercise 53 (Pólya's urn). Consider that an urn has 1 black ball and 1 white ball. Every time you draw a ball, you must put it back into the urn and add an extra ball of the same color. What is the probability that you get a white ball in your 3rd draw from the urn?
Exercise 54. You have 12 red balls, 12 yellow balls and 2 urns. Consider that the balls are distributed among the urns, one of the urns is randomly selected (the probabilities that you choose each urn are equal) and a ball is drawn from the selected urn.
a. Assume you put 2 red balls and 4 yellow balls in the first urn and the rest of the balls in the second urn.
What is the probability that you draw a yellow ball?
b. (Challenge) What way of distributing the balls among the urns maximizes your probability of drawing a
yellow ball?
Exercise 55. Probability theory was used in the criminal case People v. Collins. In this case, a woman had her purse robbed. Witnesses claimed that a couple running from the scene was composed of a black man with a beard and a mustache and a blond girl with her hair in a ponytail. Witnesses also said the couple drove off in a yellow car. Malcolm and Janet Collins were found to satisfy all the traits previously presented. A professor of mathematics also stated in Court that, if a couple were randomly selected, one would obtain the following probabilities:

Event                          Probability
Man with moustache             1/4
Girl with blonde hair          1/3
Girl with ponytail             1/10
Black man with beard           1/10
Interracial couple in a car    1/1000
Partly yellow car              1/10

Hence, the probability that all of these characteristics are found in a randomly chosen couple is 1 in 12,000,000. The prosecution claimed that this constituted proof beyond reasonable doubt that the defendants were guilty. Do you find this argument compelling? Why?
Exercise 56. Prove that if (A_n)_{n∈N} and (B_n)_{n∈N} are two sequences of events such that lim_{n→∞} P(A_n) = 1 and lim_{n→∞} P(B_n) = p, then lim_{n→∞} P(A_n ∩ B_n) = p.
3 Random Variables
3.1 Distribution of Random Variables
Recall that a random variable is an unknown number, that is, a function from to R (Definition 12). A discrete
random variable is a random variable that only assumes a countable number of values.
Example 26. Consider that a person flips a fair coin once. The possible outcomes are heads or tails and Ω = {H, T}. Let X denote the number of heads that are observed. That is, X(H) = 1 and X(T) = 0.
Example 27. Consider that a person throws a fair coin until he obtains the first heads or completes 3 tosses, Ω = {H, TH, TTH, TTT}. Denote by X the number of coin flips the person performs. X is a random variable such that X(H) = 1, X(TH) = 2, X(TTH) = 3 and X(TTT) = 3. X is discrete and only assumes a finite number of values: 1, 2 or 3.
Example 28. A fair coin is tossed until it lands heads, Ω = {H, TH, TTH, TTTH, . . .}. Denote by X the number of coin flips the person performs. X is a discrete random variable and can assume a countably infinite number of values: X can assume any value in N \ {0}. For example, X(H) = 1, X(TH) = 2, X(TTH) = 3, . . .
Definition 17. Let X be a random variable. For x ∈ R, we define p_X(x) = P(X = x) = P({w : X(w) = x}). We call the function p_X : R → [0, 1] the probability mass function (pmf) of X.
Remark: The set {w : X(w) = x} is often denoted by X⁻¹({x}).
Example 29. Consider Example 26.
p_X(0) = P(X = 0) = P({w : X(w) = 0}) = P({T}) = 0.5
p_X(1) = P(X = 1) = P({w : X(w) = 1}) = P({H}) = 0.5
Doesn't it feel good to write p_X(1) instead of P({w : X(w) = 1})? This is the power of good notation. . .
Example 30. Consider Example 27. Observe that the elements of Ω are not equiprobable. Hence, even though {H} and {TH} are unitary sets, their probabilities are different.
In order to obtain the probabilities of the unitary subsets of Ω, define the event H_i as "the i-th coin flip is heads". Observe that all H_i are jointly independent and, since the coin is fair, P(H_i) = 0.5. Hence,
P({H}) = P(H_1) = 0.5
P({TH}) = P(H_1ᶜ ∩ H_2) = P(H_1ᶜ)P(H_2) = 0.25
P({TTH}) = P(H_1ᶜ ∩ H_2ᶜ ∩ H_3) = P(H_1ᶜ)P(H_2ᶜ)P(H_3) = 0.125
P({TTT}) = P(H_1ᶜ ∩ H_2ᶜ ∩ H_3ᶜ) = P(H_1ᶜ)P(H_2ᶜ)P(H_3ᶜ) = 0.125
Example 31. Consider Example 28. Observe that, once again, the elements of Ω are not equiprobable. Define the event H_i as "the i-th coin flip is heads". Observe that all H_i are jointly independent and, since the coin is fair, P(H_i) = 0.5. Hence,
p_X(1) = P(X = 1) = P({w : X(w) = 1}) = P(H_1) = 0.5
p_X(2) = P(X = 2) = P({w : X(w) = 2}) = P(H_1ᶜ ∩ H_2) = 0.25
. . .
p_X(n) = P(X = n) = P({w : X(w) = n}) = P(H_1ᶜ ∩ . . . ∩ H_{n−1}ᶜ ∩ H_n) = 1/2ⁿ
. . .
Figure 2: a pmf plotted as probability against the number of heads.
Lemma 11. Let X be a discrete random variable and p_X be the pmf of X. Let 𝒳 be the set of possible values of X. Then:
For every x ∈ 𝒳, 0 ≤ p_X(x) ≤ 1.
∑_{x∈𝒳} p_X(x) = 1.
Proof. p_X(x) = P(X = x) = P({w : X(w) = x}), and recall that 0 ≤ P({w : X(w) = x}) ≤ 1. Moreover,
∑_{x∈𝒳} p_X(x) = ∑_{x∈𝒳} P({w : X(w) = x}) = P(Ω) = 1,
since the events ({w : X(w) = x})_{x∈𝒳} partition Ω.
There are many ways to represent a pmf: for example, a formula, a table of values or just a list of values. A useful visualization tool is to plot the probability values against the possible values of X, as shown in Figure 2.
Any probability associated to X may be calculated using its pmf. One may also use cumulative distribution functions, i.e., F_X(x) := P(X ≤ x); however, we leave this for continuous random variables (Chapter 5).
Definition 18. Let X_1, . . . , X_n be discrete random variables. We say that they are independent if, for every x_1, . . . , x_n ∈ R,
P(X_1 = x_1, . . . , X_n = x_n) := P(∩_{i=1}^{n} {X_i = x_i}) = ∏_{i=1}^{n} P(X_i = x_i).
3.2 Expected Value
Definition 19 (Conditional expectation). Let X be a discrete random variable and A be an event. The expected value of X given A is
E[X|A] = ∑_{w∈Ω} X(w)P({w}|A).
Definition 20. Let X be a discrete random variable. The expected value of X is denoted by E[X] and is equal to E[X|Ω]. That is,
E[X] = ∑_{w∈Ω} X(w)P({w}).
Lemma 12 (Law of the unconscious statistician). Let X be a discrete random variable with pmf p_X that assumes values in 𝒳. Then
E[f(X)] = ∑_{x∈𝒳} f(x) p_X(x).
Proof.
E[f(X)] = ∑_{w∈Ω} f(X(w))P({w})
= ∑_{x∈𝒳} ∑_{w:X(w)=x} f(X(w))P({w})
= ∑_{x∈𝒳} f(x) ∑_{w:X(w)=x} P({w})
= ∑_{x∈𝒳} f(x) p_X(x).
Lemma 13. Let X be a discrete random variable that assumes values in N. Then
E[X] = ∑_{i=1}^{∞} P(X ≥ i).
Proof.
E[X] = ∑_{j=0}^{∞} j p_X(j) = ∑_{j=1}^{∞} ∑_{i=1}^{j} p_X(j) = ∑_{i=1}^{∞} ∑_{j=i}^{∞} p_X(j) = ∑_{i=1}^{∞} P(X ≥ i).
Figure 3: pmfs for the number of heads in 20 coin flips with probability p of heads, for p = 0.1, 0.25, 0.5 and 0.75. The red dotted line is the expected value.
" n
X
i=1
# X
n
ci Xi A =
ci E[Xi A]
i=1
Proof.
"
E
n
X
i=1
#
n
XX
ci Xi A =
ci Xi (w)P({w}A)
=
w i=1
n
X
X
ci
i=1
n
X
Xi (w)P({w}A)
ci E[Xi A]
i=1
Example 34 (20 coin flips). Consider that you flip 20 coins with probability of heads p and count the number of heads. Figure 3 presents pmfs for different values of p. Their means correspond to the vertical red lines.
Lemma 15 (Law of total expectation). Let A_1, . . . , A_n be a partition of Ω and X be a discrete random variable. Then
E[X] = ∑_{i=1}^{n} E[X|A_i] P(A_i).
Proof.
∑_{i=1}^{n} E[X|A_i] P(A_i) = ∑_{i=1}^{n} ∑_{w∈Ω} X(w)P({w}|A_i)P(A_i)
= ∑_{i=1}^{n} ∑_{w∈Ω} X(w)P({w} ∩ A_i)   (Definition 14)
= ∑_{w∈Ω} X(w) ∑_{i=1}^{n} P({w} ∩ A_i)
= ∑_{w∈Ω} X(w) P(∪_{i=1}^{n} ({w} ∩ A_i))
= ∑_{w∈Ω} X(w)P({w}) = E[X].
3.2.1 Exercises
Exercise 64. Recall that I_A denotes the indicator function (Definition 13) of event A. If P(A) = 0.2, P(B) = 0.9 and P(A ∩ B) = 0.1,
a. Compute the pmf of 10·I_A + 5·I_B.
b. Compute E[10·I_A + 5·I_B].
Exercise 65. Consider that you throw 200 times a coin with probability p of landing heads. Let X denote the total number of heads. Compute E[X].
Hint: Denote by H_i the event that the i-th coin flip is heads. Observe that X = ∑_{i=1}^{200} I_{H_i}.
Exercise 66. Let X be the number of heads observed in two flips of a fair coin. Compute E[X 2 ] and E[X]2 .
Exercise 67. Let X_1, . . . , X_n be discrete random variables such that, for every i ∈ {1, . . . , n}, E[X_i] = μ ∈ R.
a. Let p_i ≥ 0 be such that ∑_{i=1}^{n} p_i = 1. Find E[∑_{i=1}^{n} p_i X_i].
b. Let X̄ = ∑_{i=1}^{n} X_i / n. Find E[X̄].
Exercise 68. Consider that you either choose a four-sided die or a six-sided die with equal probability. Next, you
throw the chosen die 1000 times. Let X denote the sum of the outcomes of the 1000 die throws. Compute E[X].
Exercise 69. Consider an urn with balls numbered from 1 to n. If one samples one ball at a time with replacement from the urn, what is the pmf of the random variable X, the number of samples needed until the same ball is drawn twice for the first time? What is its expectation? Hint: use Lemma 13.
Exercise 70. Assume you have six keys with you. You are not sure which one opens the door you need to open, so you start trying each of these keys. What is the average number of keys you'll need to try until you are able to open the door?
Exercise 71 (Challenge). At iteration 1, a bacterial colony has a single bacterium. At each new iteration, each bacterium in the colony can either die (with probability 1 − p) or divide into two bacteria (with probability p). Let X_i denote the number of bacteria in the colony at iteration i.
a. Find E[X_2].
b. Find E[X_n|X_2 = 0] and E[X_n|X_2 = 2].
c. Find lim_{n→∞} E[X_n].
Exercise 72. Prove that:
If, for every w ∈ Ω, X(w) = c for some c ∈ R, then E[X] = c.
If, for every w ∈ Ω, Z(w) ≤ X(w) ≤ Y(w), then E[Z] ≤ E[X] ≤ E[Y].
3.3 Variance
Let X be a random variable. In the previous subsection, we saw the definition of the expected value of X, E[X].
We saw that, intuitively, E[X] is a central value around which the possible values of X are dispersed.
In this subsection we present the variance of X, V ar[X]. The variance of X is a measure of the concentration
of the possible values of X around E[X].
Definition 21 (Variance). The variance of a discrete random variable X is defined as E[(X E[X])2 ] and denoted
by V ar[X].
Example 35. Let H denote the event that the outcome of a single coin flip is heads. Consider that P(H) = p. Let's compute Var[I_H]. Observe that p_{I_H}(0) = 1 − p and p_{I_H}(1) = p. Hence,
E[I_H] = 0 · (1 − p) + 1 · p = p.
Hence, Var[I_H] = E[(I_H − p)²]. If I_H = 0, then (I_H − p)² = p², and if I_H = 1, then (I_H − p)² = (1 − p)². Conclude that
Var[I_H] = E[(I_H − p)²] = (1 − p) · p² + p · (1 − p)² = p(1 − p)(p + 1 − p) = p(1 − p).
Observe that p(1 − p) is a parabola with roots 0 and 1 that assumes its maximum value at 0.5. In other words, the variance of I_H is minimized if p = 0 or p = 1, since in these cases we know for sure whether we will observe heads or not. The variance is maximized for a fair coin and, in this sense, the flips of a fair coin oscillate the most around the expected value.
Example 36. Let X denote the number of heads observed in two flips of a coin with probability p of heads. Let H_i denote the event that the i-th flip is heads. Let's compute Var[X]. The pmf of X is:
p_X(0) = P({TT}) = P(H_1ᶜ ∩ H_2ᶜ) = (1 − p)²
p_X(1) = P({TH, HT}) = P(H_1ᶜ ∩ H_2) + P(H_1 ∩ H_2ᶜ) = 2p(1 − p)
p_X(2) = P({HH}) = P(H_1 ∩ H_2) = p²
Hence,
E[X] = 0 · (1 − p)² + 1 · 2p(1 − p) + 2 · p² = 2p.
Thus, Var[X] = E[(X − 2p)²] and
E[(X − 2p)²] = (0 − 2p)²(1 − p)² + (1 − 2p)² · 2p(1 − p) + (2 − 2p)² · p²
= 2p(1 − p)(2p(1 − p) + (1 − 2p)² + 2(1 − p)p) = 2p(1 − p).
That is, the variance of X is a parabola with minima at p = 0 and p = 1 and maximum at p = 0.5.
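The formulas E[X] = 2p and Var[X] = 2p(1 − p) can be checked numerically from the pmf; a minimal sketch in Python, with p = 0.3 as an arbitrary illustrative value:

p = 0.3
pmf = {0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}

# E[X] and Var[X] computed directly from the pmf.
mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean) ** 2 * px for x, px in pmf.items())
print(mean, 2 * p)            # both 0.6
print(var, 2 * p * (1 - p))   # both 0.42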
Lemma 16. Var[aX + b] = a² Var[X].
Proof.
Var[aX + b] = E[(aX + b − E[aX + b])²]   (Definition 21)
= E[(aX + b − aE[X] − b)²]   (Lemma 14)
= E[a²(X − E[X])²]
= a² E[(X − E[X])²] = a² Var[X].   (Definition 21)
Lemma 17. Var[X] = E[X²] − E[X]².
Proof.
Var[X] = E[(X − E[X])²]   (Definition 21)
= E[X² − 2XE[X] + E[X]²]
= E[X²] − 2E[X]E[X] + E[X]²   (Lemma 14)
= E[X²] − E[X]².
Lemma 18. If X and Y are independent discrete random variables, then E[XY] = E[X]E[Y].
Proof.
E[XY] = ∑_{x∈Im(X), y∈Im(Y)} x·y·P(X = x ∩ Y = y)
= ∑_{x∈Im(X)} ∑_{y∈Im(Y)} x·y·P(X = x)P(Y = y)   (independence)
= (∑_{x∈Im(X)} x·P(X = x)) · (∑_{y∈Im(Y)} y·P(Y = y))
= (∑_{x∈Im(X)} x·p_X(x)) · (∑_{y∈Im(Y)} y·p_Y(y))
= E[X]E[Y].
Observation: E[XY] = E[X]E[Y] does not imply that X and Y are independent. We'll see a counter-example in a future homework.
Lemma 19. If X and Y are independent, V ar[X + Y ] = V ar[X] + V ar[Y ].
Proof.
Var[X + Y] = E[(X + Y − E[X + Y])²]
= E[(X − E[X] + Y − E[Y])²]
= E[(X − E[X])²] + E[(Y − E[Y])²] + E[2(X − E[X])(Y − E[Y])]
= Var[X] + Var[Y] + E[2(X − E[X])(Y − E[Y])].
Hence, it remains to show that E[2(X − E[X])(Y − E[Y])] = 0.
E[2(X − E[X])(Y − E[Y])] = 2E[XY − E[X]Y − XE[Y] + E[X]E[Y]]
= 2(E[XY] − E[X]E[Y] − E[X]E[Y] + E[X]E[Y])
= 2(E[XY] − E[X]E[Y]).
Example 39 (Continuation of Examples 36 and 38). We showed that X = IH1 + IH2 and IH1 and IH2 are
independent. Hence, V ar[X] = V ar[IH1 ] + V ar[IH2 ]. From Example 35, V ar[IH1 ] = V ar[IH2 ] = p(1 p). Hence,
V ar[X] = 2p(1 p), which again confirms the calculations in Example 36.
Lemma 20. If X is a discrete random variable, Var[X] = 0 if, and only if, X is constant (i.e., there exists c ∈ R such that P(X = c) = 1).
Proof. Assume there exists c ∈ R such that P(X = c) = 1. Then E[X] = c and E[X²] = c², which implies that Var[X] = c² − c² = 0.
Now, let 𝒳 be the set of values X assumes. If Var[X] = 0, we have that
∑_{x∈𝒳} (x − E[X])² p_X(x) = 0.
Now, because p_X(x) > 0 for every x ∈ 𝒳, this implies that (x − E[X])² = 0 for every x ∈ 𝒳, i.e., x = E[X] for every x ∈ 𝒳, which concludes the proof.
3.3.1 Exercises
Exercise 73. Show that V ar[X] 0.
Exercise 74. Let X assume values in {−1, 0, 1}, with p_X(−1) = p_X(1) = (1 − p)/2. Find Var[X].
e. Combine your knowledge from Exercises 67 and 80. Provide arguments in English for why, among all possible random variables of the form ∑_{i=1}^{n} p_i X_i, X̄ provides the values that are closest to μ.
3.4 Covariance and Correlation
Definition (Covariance). Cov[X, Y] := E[(X − E[X])(Y − E[Y])].
Lemma 21. Cov[X, Y] = E[XY] − E[X]E[Y].
Proof.
Cov[X, Y] = E[XY − XE[Y] − E[X]Y + E[X]E[Y]] = E[XY] − E[X]E[Y].   (Lemma 14)
Lemma 22. Let X, Y and Z be discrete random variables and a, b ∈ R. Then:
a. Cov[X, X] = Var[X].
c. Cov[aX + bY, Z] = a·Cov[X, Z] + b·Cov[Y, Z].
Proof.
a.
Cov[X, X] = E[X·X] − E[X]E[X]   (Lemma 21)
= E[X²] − E[X]² = Var[X].   (Lemma 17)
c.
Cov[aX + bY, Z] = E[(aX + bY)Z] − E[aX + bY]E[Z]   (Lemma 21)
= a(E[XZ] − E[X]E[Z]) + b(E[YZ] − E[Y]E[Z])   (Lemma 14)
= a·Cov[X, Z] + b·Cov[Y, Z].   (Lemma 21)
Hence, √Var[X] is a norm in V.
Lemma 25. |Cov[X, Y]| ≤ √Var[X] · √Var[Y]. The equality holds if, and only if, there exist a, b ∈ R such that Y = aX + b.
Proof. Let V = X − E[X] and W = Y − E[Y]. Because V, W ∈ V and, from Lemma 24, covariance is an inner product, it follows that
|Cov[X, Y]| = |Cov[V, W]| ≤ √Cov[V, V] · √Cov[W, W]   (Cauchy-Schwarz inequality)
= √Var[V] · √Var[W]   (Lemma 22)
= √Var[X] · √Var[Y].
Now, from the Cauchy-Schwarz inequality, we know that the equality holds if, and only if, there exists b ∈ R, b ≠ 0, such that W = bV. In other words, the equality holds if, and only if, there exists b ∈ R such that Y − E[Y] = b(X − E[X]), which concludes the proof.
Lemma 26 (Pythagorean Theorem for random variables).
Var[X + Y] = Var[X] + Var[Y] + 2Cov[X, Y].
Hence, if Cov[X, Y] = 0, Var[X + Y] = Var[X] + Var[Y].
Proof.
Var[X + Y] = Cov[X + Y, X + Y]   (Lemma 22)
= Cov[X, X] + Cov[Y, Y] + 2Cov[X, Y]   (Lemma 24)
= Var[X] + Var[Y] + 2Cov[X, Y].   (Lemma 22)
Definition 24 (Correlation).
Corr[X, Y] = Cov[X, Y] / (√Var[X] · √Var[Y]).
Let ⟨·, ·⟩ be an inner product and ‖·‖ be the norm generated by the inner product. Recall from linear algebra class that ⟨v_1, v_2⟩ / (‖v_1‖ · ‖v_2‖) is the cosine of the angle between v_1 and v_2. Hence, using Lemma 24, we can interpret Corr[X, Y] as the cosine of the angle between random variables X and Y. In other words, Corr[X, Y] is a measure of the linear association between X and Y.
Lemma 27.
|Corr[X, Y ]| 1
Proof. Follows directly from applying Lemma 25 to Definition 24.
Lemma 28. Let X be a discrete random variable.
a. Let b ∈ R. Then Cov[b, X] = 0.
b. Let a ≠ 0. Then
Corr[aX + b, X] = 1, if a > 0; −1, if a < 0.
Proof.
a. Cov[b, X] = E[bX] − E[b]E[X] = bE[X] − bE[X] = 0.
b.
Corr[aX + b, X] = Cov[aX + b, X] / (√Var[aX + b] · √Var[X])
= (a·Cov[X, X] + Cov[b, X]) / (√Var[aX + b] · √Var[X])   (Lemma 22)
= a·Var[X] / (√(a² Var[X]) · √Var[X])   (Lemmas 22 and 16)
= a / |a|.
Lemma 30.
Var[∑_{i=1}^{n} X_i] = ∑_{i=1}^{n} Var[X_i] + ∑_{i≠j} Cov[X_i, X_j].
Lemma 30 is a generalization of Lemma 26, and we leave its proof to the reader.
Example 40 (Ross). A group of n people throw their hats into the center of a room. The hats are mixed up, and then each person randomly selects one hat. Let X be the number of people who select their own hat. X can be written as the sum X = ∑_{i=1}^{n} X_i, where X_i is one if the i-th person selects his own hat, and zero otherwise. Now, E[X_i] = 1 · p_{X_i}(1) = 1/n. It follows that E[X] = n · 1/n = 1. We can also compute the variance of X. First notice that X_i² = X_i, and hence Var(X_i) = E[X_i²] − E[X_i]² = 1/n − (1/n)². Also, for i ≠ j, X_i X_j is a random variable that can only assume two values, 0 and 1. Moreover,
P(X_i X_j = 1) = P(X_i = 1, X_j = 1) = P(X_j = 1)P(X_i = 1|X_j = 1) = (1/n) · (1/(n − 1)).
It follows that E[X_i X_j] = 1/(n(n − 1)), and therefore
Cov[X_i, X_j] = 1/(n(n − 1)) − 1/n² = 1/(n²(n − 1)).
Finally, we notice that Cov[X_i, X_j] is the same for every i ≠ j. Because there are 2·C(n, 2) ordered pairs (i, j) with i ≠ j,
Var[∑_{i=1}^{n} X_i] = n(1/n − 1/n²) + 2·C(n, 2)·1/(n²(n − 1)) = 1.
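The surprising conclusion that E[X] = 1 and Var[X] = 1 for every n can be checked by simulation; a minimal Monte Carlo sketch in Python:

import random

def own_hat_count(n):
    # One round of the hat problem: a uniformly random assignment of hats.
    hats = list(range(n))
    random.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

samples = [own_hat_count(10) for _ in range(100000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)   # both close to 1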
3.4.1 Exercises
Exercise 81. Let P(A) = p_A, P(B) = p_B and P(A ∩ B) = p_{AB}.
a. Find Cov[I_A, I_B] and Corr[I_A, I_B].
b. Find a numerical value for Corr[I_A, I_B] when A = B.
c. Find a numerical value for Corr[I_A, I_B] when A is independent of B.
d. Find a numerical value for Corr[I_A, I_B] when A and B partition Ω.
e. Provide an interpretation in English for the previous items.
Exercise 82. Let Ω = {−1, 0, 1} and all outcomes be equally likely. Let X(w) = w and Y(w) = w².
a. Find Corr[X, Y]. Are X and Y linearly associated?
b. Are X and Y independent?
Exercise 83. Let X_1, . . . , X_n, Y_1, . . . , Y_m be random variables, and let a_1, . . . , a_n, b_1, . . . , b_m be real numbers. Prove that
Cov(∑_{i=1}^{n} a_i X_i, ∑_{j=1}^{m} b_j Y_j) = ∑_{i=1}^{n} ∑_{j=1}^{m} a_i b_j Cov(X_i, Y_j).
Exercise 84. Assume X_1, . . . , X_n are independent and identically distributed (a.k.a. iid) random variables with variance σ² < ∞. Show that Cov[X_i − X̄, X̄] = 0.
Exercise 85. Assume X_1, . . . , X_n are identically distributed variables with covariance Cov[X_i, X_j] = c for i ≠ j, . . .
4 Discrete Distributions
4.1 Bernoulli Distribution
Definition 25. We say that a sequence of random variables X_1, X_2, . . . is a Bernoulli Process with parameter p if the random variables are jointly independent and such that, for each i ∈ N, X_i ∼ Bernoulli(p).
Example 42 (Coding). Write a code that generates a number according to the Bernoulli(p) distribution.
import random

def rbernoulli(p):
    # random.random() is Uniform(0, 1), so the comparison below
    # is True with probability p; int() maps True/False to 1/0.
    return int(random.random() < p)
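A quick usage check (0.3 is an arbitrary illustrative value): the empirical frequency of ones should approach p.

draws = [rbernoulli(0.3) for _ in range(100000)]
print(sum(draws) / len(draws))   # close to 0.3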
4.2 Binomial Distribution
If X_1, X_2, . . . is a Bernoulli Process with parameter p, then ∑_{i=1}^{n} X_i ∼ Binomial(n, p).
One can interpret the Binomial distribution in the following way. Consider that one performs n independent trials and each trial can be a success with probability p or a failure with probability 1 − p. Let X denote the number of successes we get after performing the n trials. X has distribution Binomial(n, p), where n is the number of trials and p is the probability of success in each trial.
Lemma 32. If X has distribution Binomial(n, p), then, for 0 ≤ i ≤ n,
P(X = i) = C(n, i) p^i (1 − p)^{n−i}.
Proof. In order for X = i, one must observe i successes and n − i failures. Observe that, since the trials are independent, the probability of every outcome with i successes and n − i failures is p^i (1 − p)^{n−i}.
Hence P(X = i) = c_{n,i} · p^i (1 − p)^{n−i}, where c_{n,i} is the number of outcomes such that there are i successes and n − i failures. Note that c_{n,i} corresponds to the number of anagrams of S. . .S (i times) F. . .F (n − i times). There are n! permutations of the letters, but the i! permutations among the S's are the same, and so are the (n − i)! permutations among the F's. Hence c_{n,i} = n!/(i!(n − i)!) = C(n, i) and P(X = i) = C(n, i) p^i (1 − p)^{n−i}.
The following result is commonly useful when performing calculations with binomials.
Lemma 33 (Binomial Theorem).
∑_{i=0}^{n} C(n, i) a^i b^{n−i} = (a + b)^n.
Example 43. Let X have distribution Binomial(n, p). Let's use the Binomial Theorem to perform a sanity check and prove that P(X ∈ {0, 1, . . . , n}) = 1.
P(X ∈ {0, 1, . . . , n}) = ∑_{i=0}^{n} P(X = i) = ∑_{i=0}^{n} C(n, i) p^i (1 − p)^{n−i} = (p + 1 − p)^n = 1.
Lemma 34. If X has distribution Binomial(n, p), then E[X] = np and Var[X] = np(1 − p).
Proof.
E[X] = ∑_{i=0}^{n} i P(X = i)
= ∑_{i=1}^{n} i C(n, i) p^i (1 − p)^{n−i}
= ∑_{i=1}^{n} i · n!/((n − i)! i!) · p^i (1 − p)^{n−i}
= ∑_{i=1}^{n} n!/((n − i)!(i − 1)!) · p^i (1 − p)^{n−i}   (since i! = i · (i − 1)!)
= np ∑_{i=1}^{n} C(n − 1, i − 1) p^{i−1} (1 − p)^{n−i}
= np ∑_{j=0}^{n−1} C(n − 1, j) p^j (1 − p)^{n−1−j}   (calling j = i − 1)
= np (p + 1 − p)^{n−1} = np.
Also,
E[X(X − 1)] = ∑_{i=0}^{n} i(i − 1) P(X = i)
= ∑_{i=2}^{n} i(i − 1) · n!/((n − i)! i!) · p^i (1 − p)^{n−i}
= n(n − 1)p² ∑_{i=2}^{n} (n − 2)!/((n − i)!(i − 2)!) · p^{i−2} (1 − p)^{n−i}
= n(n − 1)p² ∑_{j=0}^{n−2} C(n − 2, j) p^j (1 − p)^{n−2−j}   (calling j = i − 2)
= n(n − 1)p².
We proved that
E[X] = np
E[X(X − 1)] = n(n − 1)p².
Hence,
Var[X] = E[X²] − E[X]² = E[X(X − 1)] + E[X] − E[X]²
= n(n − 1)p² + np − n²p²
= n²p² − np² + np − n²p²
= np(1 − p).
As an alternative proof, let S_i denote the event that trial i was a success. We know that X = ∑_{i=1}^{n} I_{S_i} and that the I_{S_i} are independent random variables with distribution Bernoulli(p). Hence,
E[X] = E[∑_{i=1}^{n} I_{S_i}] = ∑_{i=1}^{n} E[I_{S_i}] = np
and
Var[X] = Var[∑_{i=1}^{n} I_{S_i}] = ∑_{i=1}^{n} Var[I_{S_i}] = np(1 − p).
Example 44 (Coding). Write a code that generates a number according to the Binomial(n, p) distribution. Consider the rbernoulli(p) function in Example 42.
def rbinom(n, p):
    # A Binomial(n, p) variable is a sum of n independent Bernoulli(p) trials.
    total = 0
    for ii in range(n):
        total += rbernoulli(p)
    return total
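A sanity check of rbinom against Lemma 34, with illustrative parameters n = 20 and p = 0.5:

draws = [rbinom(20, 0.5) for _ in range(100000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(mean, var)   # close to np = 10 and np(1 - p) = 5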
4.2.1 Exercises
Exercise 88. I throw a coin with probability p of heads 5 times. What is the probability that exactly 3 outcomes
are heads?
Exercise 89. I throw a fair coin 7 times. What is the probability that 1 or more outcomes are heads?
Exercise 90. The products created by a machine are defective independently with probability p. Out of a batch of
1000 products, how many are expected to be defective? What is the variance of the number of defective products?
Exercise 91. I throw a fair coin 4 times and a coin with probability p of heads 8 times. Let X be the total number
of heads. Compute E[X] and V ar[X].
Exercise 92. Consider that a box has o ≥ 1 orange balls and p ≥ 1 pink balls, and that I remove 2 balls without replacement from the box. Let X denote the total number of pink balls removed. Is the distribution of X a Binomial? If yes, find the parameters of the distribution.
Exercise 93. Assume you throw a coin n times and observe the total number of heads X. Say the probability of heads is p. If you observe exactly i heads, i ∈ {0, 1, . . . , n}, what is the value of p that maximizes the probability P(X = i)? This is an example of a statistical inference methodology: given the observed value of a random variable, you try to infer which probability distribution it came from.
Exercise 94.
a. A fair coin is thrown 1001 times. Find the probability that one observes strictly more heads than tails.
b. A fair coin is thrown 1000 times. Find the probability that one observes strictly more heads than tails.
Exercise 95 (Challenge). Let X ∼ Binomial(1000, 0.5) and Y ∼ Binomial(1001, 0.5) be independent random variables. Find the probability that Y > X.
4.3 Geometric Distribution
Consider a Bernoulli process with parameter p and let X denote the number of trials until the first success. We say that X has distribution Geometric(p), with pmf P(X = i) = p(1 − p)^{i−1} for i ∈ {1, 2, . . .}.
Recall the geometric series: for 0 ≤ q < 1,
∑_{i=j}^{∞} q^i = q^j / (1 − q).
Example 46. We can perform a sanity check and show that the pmf of the Geometric sums to 1. Observe that:
∑_{i=1}^{∞} p(1 − p)^{i−1} = p ∑_{j=0}^{∞} (1 − p)^j = p / (1 − (1 − p)) = 1.
Observe that P(X ≥ i) corresponds to the probability that one observes at least i − 1 failures. The following lemma proves a way to compute P(X ≥ i) using the geometric series.
Lemma 37. Let X ∼ Geom(p). We have that P(X ≥ i) = (1 − p)^{i−1}.
Proof.
P(X ≥ i) = ∑_{j=i}^{∞} P(X = j)
= ∑_{j=i}^{∞} (1 − p)^{j−1} p
= p ∑_{j=i}^{∞} (1 − p)^{j−1}
= p · (1 − p)^{i−1} / p = (1 − p)^{i−1}.
Lemma 38. Let X ∼ Geom(p). Then E[X] = 1/p and Var[X] = (1 − p)/p².
Proof.
E[X] = ∑_{i=1}^{∞} i P(X = i) = ∑_{i=1}^{∞} i p(1 − p)^{i−1}
= −p (∂/∂p) ∑_{i=1}^{∞} (1 − p)^i
= −p (∂/∂p) [(1 − p)/p]
= −p · (−1/p²) = 1/p.
Also,
E[X(X + 1)] = ∑_{i=1}^{∞} i(i + 1) P(X = i) = ∑_{i=1}^{∞} i(i + 1) p(1 − p)^{i−1}
= p (∂²/∂p²) ∑_{i=1}^{∞} (1 − p)^{i+1}
= p (∂²/∂p²) [(1 − p)²/p]
= p · 2/p³ = 2/p².
Finally,
Var[X] = E[X²] − E[X]²   (Lemma 17)
= E[X(X + 1)] − E[X] − E[X]² = 2/p² − 1/p − 1/p² = (1 − p)/p².
Example 47 (Coding). Write a code that generates a number according to the Geometric(p) distribution. Consider the rbernoulli(p) function in Example 42.
def rgeom(p):
    # Count trials until the first success.
    ii = 1
    while not rbernoulli(p):
        ii += 1
    return ii
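A sanity check of rgeom against Lemma 38, with illustrative parameter p = 0.25:

draws = [rgeom(0.25) for _ in range(100000)]
print(sum(draws) / len(draws))   # close to 1/p = 4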
4.3.1 Exercises
Exercise 96. In a certain population, 10% of people have blood type O, 40% have blood type A, 45% have blood
type B, and 5% have blood type AB. Let Y denote the number of donors who enter a blood bank on a given day
until the first potential donor for a patient with type B (i.e. until a donor has either blood type O or B).
a. Find P (Y = 1).
b. Find P (Y 4).
c. Find E[Y ].
d. Find V ar[Y ].
Exercise 97. Let X ∼ Geom(p). Find E[X(X + 1)(X + 2)].
Exercise 98. Let X ∼ Geom(p). Find P(X ≥ t + s | X > s).
Exercise 99. Suppose a person is waiting at a bus stop. The person believes that the event of a bus coming in
each five-minute period is independent of each other five-minute period, and that the probability that a bus will
come in any given five-minute period is p, where p is assumed to be known. This belief is different from believing
that the buses operate on a fixed schedule. Having waited 20 minutes already, is the bus more likely, less likely or
equally likely to come in the next five-minute period? Why?
Exercise 100. Compute the expectation of a geometric random variable using Lemma 13.
Exercise 101. Consider the following game: a fair coin is flipped until the first heads appears. If n trials are made in total, you receive a prize of 2^n dollars. How much money do you expect to receive in this game?
4.4 Negative Binomial Distribution
Consider a Bernoulli process with parameter p and let Y denote the number of trials until the r-th success. We say that Y has distribution Negative Binomial(r, p).
Lemma 39. If Y ∼ Negative Binomial(r, p), then, for i ≥ r,
P(Y = i) = C(i − 1, r − 1) p^r (1 − p)^{i−r}.
Proof. Observe that P(Y = i) corresponds to the event that the r-th success occurs exactly at the i-th trial. By definition, this event is true if and only if there are r successes and i − r failures in the first i trials and the last trial is a success. The probability of every outcome with r successes and i − r failures in i trials is p^r (1 − p)^{i−r}. Furthermore, there are C(i − 1, r − 1) possible permutations of the i − r failures and r − 1 first successes. Since all these permutations are equally likely, P(Y = i) = C(i − 1, r − 1) p^r (1 − p)^{i−r}.
Example 48. Observe that if Y ∼ Negative Binomial(1, p), P(Y = i) = C(i − 1, 0) p (1 − p)^{i−1} = p(1 − p)^{i−1}. This expression is exactly the pmf of a geometric distribution and, therefore, Y ∼ Geom(p). That is, the geometric distribution is a particular case of the negative binomial when r = 1.
Lemma 40. If Y ∼ Negative Binomial(r, p), then E[Y] = r/p and Var[Y] = r(1 − p)/p².
Proof. Recall that if Y has distribution Negative Binomial(r, p), then there exist X_1, . . . , X_r independent variables such that X_i ∼ Geom(p) and Y = ∑_{i=1}^{r} X_i. Hence,
E[Y] = E[∑_{i=1}^{r} X_i] = ∑_{i=1}^{r} E[X_i] = ∑_{i=1}^{r} 1/p = r/p
and
Var[Y] = Var[∑_{i=1}^{r} X_i] = ∑_{i=1}^{r} Var[X_i] = ∑_{i=1}^{r} (1 − p)/p² = r(1 − p)/p².
Example 49 (Coding). Write a code that generates a number according to the Negative Binomial(r, p) distribution. Consider the rbernoulli(p) and rgeom(p) functions in Examples 42 and 47.
def rnbinom(r, p):
    # Sum of r independent Geometric(p) variables.
    total = 0
    for ii in range(r):
        total += rgeom(p)
    return total

def rnbinom(r, p):
    # Equivalent version: count trials until the r-th success.
    number_of_ones = value = 0
    while number_of_ones < r:
        value += 1
        number_of_ones += rbernoulli(p)
    return value
4.4.1 Exercises
Exercise 102. An exploratory oil well has a probability of 10% of striking oil. Consider that different wells are independent.
a. What is the probability that the 3rd time a company strikes oil happens on the 8th try?
b. What is the expected number of tries until the company strikes oil for the 5th time?
c. What is the variance of the number of tries until the company strikes oil for the 8th time?
Exercise 103. Consider that I throw a coin 5 times. Let X denote the total number of heads. After those flips,
I also throw the same coin until I get 2 heads. Let Y denote the total number of trials. What are the distributions
of X and Y ? Is P(X = 2) = P(Y = 5)? Interpret this result.
Exercise 104. Consider the same X and Y as in Exercise 103. Also consider that I picked the coin with equal
probability among a fair coin and a coin with probability 0.1 of heads and did not tell you which one I got. Let F
denote the event that I got the fair coin. Compute P(F |X = 2) and P(F |Y = 2).
50
Exercise 105. Argue that if X ∼ Negative Binomial(r, p) and Y ∼ Binomial(n, p), then P(X > n) = P(Y < r).
Exercise 106. You have two jars with candies, each of them with n candies. Each time you want one candy, you choose at random whether to get it from jar A or jar B. What is the probability that, by the time you empty the first jar, there are exactly i candies left in the other one?
4.5 Hypergeometric Distribution
Consider that there exist N individuals and that k of them have a given property of interest. A sample of size n is drawn without replacement and all groups of size n are equally probable. Let X denote the number of sampled individuals that have the property of interest. We say that X ∼ Hypergeometric(N, n, k).
Lemma 41. If X ∼ Hypergeometric(N, n, k), then
P(X = i) = C(k, i) C(N − k, n − i) / C(N, n).
Proof. Since all of the C(N, n) groups of size n are equally probable, it follows from Lemma 7 that it is enough to determine the number of groups of size n such that i individuals have the property of interest. Observe that such a group must contain i individuals that have the property of interest and n − i individuals that don't have that property. That is, we want to determine in how many ways it is possible to select a group of i individuals out of the k with the property and a group of n − i individuals out of the N − k that don't have the property of interest. This number is C(k, i) · C(N − k, n − i). Conclude that
P(X = i) = C(k, i) C(N − k, n − i) / C(N, n).
Lemma 42. Consider that there exist N individuals and k of them have a given property. Consider that a sample of size n is drawn without replacement from the population and all groups of size n are equally likely. Let M_i denote the event that the i-th drawn individual has the property of interest. For every 1 ≤ i ≤ N, P(M_i) = k/N.
Proof. We pick the first individual with equal probability among N and, hence, P(M_1) = k/N. Let X_i denote the number of individuals with the property of interest that are selected after i draws. Since the sample is selected without replacement, observe that P(M_{i+1}|X_i = j) = (k − j)/(N − i). Hence,
P(M_{i+1}) = ∑_{j=0}^{i} P(X_i = j) P(M_{i+1}|X_i = j)   (Theorem 3)
= ∑_{j=0}^{i} [C(k, j) C(N − k, i − j) / C(N, i)] · (k − j)/(N − i).   (Lemma 41)
Observe that C(k, j)(k − j) = k · C(k − 1, j) and C(N, i)(N − i) = N · C(N − 1, i). Therefore,
P(M_{i+1}) = (k/N) ∑_{j=0}^{i} C(k − 1, j) C(N − k, i − j) / C(N − 1, i) = k/N,
where the last equality follows because C(k − 1, j) C(N − k, i − j) / C(N − 1, i) is the pmf of a Hypergeometric(N − 1, i, k − 1) and the sum of the pmf over all its possible values is 1 (Lemma 11).
Lemma 43. If X ∼ Hypergeometric(N, n, k), then E[X] = nk/N and

Var[X] = (nk/N) · ((N−k)/N) · ((N−n)/(N−1)).
Proof. Let M_i denote the event that the i-th element of the sample has the property of interest. Also observe that X = Σ_{i=1}^{n} I_{M_i} and I_{M_i} ∼ Bernoulli(P(M_i)). Therefore,

E[X] = E[ Σ_{i=1}^{n} I_{M_i} ] = Σ_{i=1}^{n} E[I_{M_i}] = Σ_{i=1}^{n} P(M_i) = Σ_{i=1}^{n} k/N = nk/N.          (Lemma 42)

Nevertheless, since the sampling is without replacement, the I_{M_i} are not independent. Therefore, we cannot use the same trick to compute the variance. Instead, we compute E[X(X−1)]:
E[X(X−1)] = Σ_{i=0}^{n} i(i−1) P(X = i)
= Σ_{i=1}^{n} i(i−1) \binom{k}{i} \binom{N−k}{n−i} / \binom{N}{n}
= Σ_{i=1}^{n} i(i−1) [k!/(i!(k−i)!)] \binom{N−k}{n−i} / [N!/(n!(N−n)!)]
= Σ_{i=1}^{n} (i−1) [k(k−1)!/((i−1)!(k−i)!)] \binom{N−k}{n−i} / [N(N−1)!/(n(n−1)!(N−n)!)]
= (kn/N) Σ_{i=1}^{n} (i−1) \binom{k−1}{i−1} \binom{N−k}{n−i} / \binom{N−1}{n−1}.

Observe that \binom{k−1}{i−1}\binom{N−k}{n−i} / \binom{N−1}{n−1} is the pmf of a Hypergeometric(N−1, n−1, k−1) evaluated at i−1. Therefore, letting Y ∼ Hypergeometric(N−1, n−1, k−1),

E[X(X−1)] = (kn/N) Σ_{i=1}^{n} (i−1) P(Y = i−1)
= (kn/N) Σ_{j=0}^{n−1} j P(Y = j)
= (kn/N) E[Y]
= (kn/N) · (k−1)(n−1)/(N−1).          (17)

Finally, combining equation (17) with Lemma 17,

Var[X] = E[X(X−1)] + E[X] − E[X]²
= (kn/N) (k−1)(n−1)/(N−1) + kn/N − (kn/N)²
= (kn/N) · ((N−k)/N) · ((N−n)/(N−1)).
Example 50 (Coding). Write a code that generates a number according to the Hypergeometric(N, n, k).
def rhyper(N,n,k):
    # Sample n individuals without replacement from a population of N
    # in which k have the property of interest.
    number_of_ones = 0
    for ii in range(n):
        # The next draw has the property with probability
        # (remaining marked individuals) / (remaining population size).
        if k > 0 and rbernoulli(float(k) / (N - ii)):
            k -= 1
            number_of_ones += 1
    return number_of_ones
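In this version, the success probability of each draw is the fraction of individuals with the property that are still in the population, which shrinks by one individual per draw; this mirrors the conditional probabilities P(M_{i+1} | X_i = j) = (k−j)/(N−i) used in the proof of Lemma 42.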
4.5.1 Exercises
Exercise 107. In order to sell a batch of toys in Brazil, the batch must pass a test that the toys are safe. This test consists of evaluating the safety of a sample without replacement of the toys in the batch. If any of the sampled toys are found unsafe, none of the products of the batch can be sold. Assume that a batch has 100 products and 10 of them are defective and unsafe. What is the smallest sample size such that the probability that the batch doesn't pass the test is larger than 95%?
Exercise 108. A box has 5 orange balls and 8 pink balls.
a. Consider that 3 balls are taken without replacement. Let X denote the total number of pink balls. Find the distribution of X, E[X] and V ar[X].
b. Consider that 3 balls are taken with replacement. Let Y denote the total number of pink balls. Find the
distribution of Y , E[Y ] and V ar[Y ].
c. Compare V ar[X] to V ar[Y ].
Exercise 109. What is the distribution of the number of heads I get on the first 10 flips of a coin with bias p
given that on the first 20 flips I got 11 heads?
Exercise 110 (Challenge). Write a code that outputs with equal probability one of the permutations of (0, . . . , n − 1).
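One possible approach (a sketch, not the only solution) is the Fisher–Yates shuffle, written here with Python's standard random.random() as the only source of randomness:

import random

def rperm(n):
    # At step ii, swap position ii with a position chosen uniformly from
    # {ii, ..., n-1}; this makes all n! permutations equally likely.
    values = list(range(n))
    for ii in range(n - 1):
        jj = ii + int(random.random() * (n - ii))
        values[ii], values[jj] = values[jj], values[ii]
    return values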
4.6 Poisson Distribution

Recall that if X ∼ Poisson(λ), then P(X = i) = e^{−λ} λ^i / i!, and recall the Taylor series

e^x = Σ_{i=0}^{∞} x^i / i!.

Hence,

E[X] = Σ_{i=0}^{∞} i P(X = i)
= Σ_{i=1}^{∞} i e^{−λ} λ^i / i!
= λ Σ_{i=1}^{∞} e^{−λ} λ^{i−1} / (i−1)!
= λ Σ_{j=0}^{∞} e^{−λ} λ^j / j!
= λ Σ_{j=0}^{∞} P(X = j) = λ.

Similarly,

E[X(X−1)] = Σ_{i=0}^{∞} i(i−1) P(X = i)
= Σ_{i=2}^{∞} i(i−1) e^{−λ} λ^i / i!
= λ² Σ_{i=2}^{∞} e^{−λ} λ^{i−2} / (i−2)!
= λ² Σ_{j=0}^{∞} P(X = j) = λ².

Finally,

Var[X] = E[X(X−1)] + E[X] − E[X]² = λ² + λ − λ² = λ.
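A generator can be sketched for this distribution as well. The sketch below assumes only Python's standard random.random(); it uses the classical method of multiplying uniform draws until the running product falls below e^{−λ}, at which point the number of multiplications performed is Poisson(λ) distributed.

import math
import random

def rpois(lam):
    # Multiply Uniform(0,1) draws until the product drops below exp(-lam);
    # the number of multiplications performed is then Poisson(lam).
    threshold = math.exp(-lam)
    count = 0
    product = random.random()
    while product > threshold:
        count += 1
        product *= random.random()
    return count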
Distribution              E[Y]      Var[Y]
Bernoulli(p)              p         p(1−p)
Binomial(n, p)            np        np(1−p)
Geometric(p)              1/p       (1−p)/p²
Negative Binomial(r, p)   r/p       r(1−p)/p²
Hypergeometric(N, n, k)   nk/N      (nk/N)(1 − k/N)(N−n)/(N−1)
Poisson(λ)                λ         λ

Table 2: Review of Discrete Distributions
4.7.1 Exercises
Exercise 114. Can you find an example of a random variable X such that E[X(X + 1)] = 10 and E[X] = 5?
Exercise 115. Let A and B be events such that P(A) = 0.4, P(B) = 0.8 and P(A ∩ B) = 0.3. Find E[I_A + I_B] and V ar[I_A + I_B].
Exercise 116. A system is composed of 4 components and works when at least 3 of them are operational. At
each minute, each component is operational independently with probability p. Let X denote the number of minutes
until the system works for the first time. Find the distribution of X. What is the probability that the system will
work in 5 minutes or less?
Exercise 117.
a. Let X ∼ Bernoulli(p). Compute E[e^{tX}].
b. Let X_1, ..., X_n be independent and Bernoulli(p). Compute E[e^{t Σ_{i=1}^{n} X_i}].
c. Denoting by f(t) the expectation obtained in item b, compute ∂f/∂t (0) and ∂²f/∂t² (0).
Exercise 123. Person A throws a fair coin 5 times. Person B throws a fair coin 6 times. Let X and Y denote
respectively the number of heads obtained by person A and B. Compute P(X > Y ).
Exercise 124. An urn contains 3 black balls and 4 white balls. You pick 4 balls at random. If exactly 2 of these balls are black, you stop. Otherwise, you replace the balls in the urn and repeat the whole procedure. This continues until exactly 2 of the picked balls are black. What is the probability that you make exactly n selections?
3. ∫_{−∞}^{+∞} f_X(x) dx = 1.

Lemma 46. Let X be a continuous random variable with density f_X(x). For every x ∈ R, P(X = x) = 0.

Proof. Observe that P(X = x) = P(x ≤ X ≤ x). Using Definition 27, P(x ≤ X ≤ x) = ∫_{x}^{x} f_X(t) dt = 0.
Example 51. Consider a random variable X with density

f_X(x) = 0, if x < 0;  3x², if 0 ≤ x ≤ 1;  0, if x > 1.

Observe that

P(X ∈ R) = ∫_{−∞}^{+∞} f_X(x) dx = ∫_{0}^{1} 3x² dx = x³ |_{0}^{1} = 1.

Also,

P(X ≤ 0.5) = ∫_{0}^{0.5} 3x² dx = x³ |_{0}^{0.5} = 0.125.
Example 52. Consider that X ≥ 1 and f_X(x) = c/x⁴, for some c > 0. Then

∫_{−∞}^{+∞} f_X(x) dx = ∫_{1}^{∞} c x^{−4} dx = −(c/3) x^{−3} |_{1}^{∞} = c/3,

and ∫_{−∞}^{+∞} f_X(x) dx = 1. Hence c = 3.
Example 53. Consider that X ≥ 0 is a continuous random variable with density f_X(x) = c e^{−λx}, for some c > 0. Then

∫_{−∞}^{+∞} f_X(x) dx = ∫_{0}^{∞} c e^{−λx} dx = −(c/λ) e^{−λx} |_{0}^{∞} = c/λ = 1.

Hence, c = λ.
Definition 28. Let X be a continuous random variable with density f_X(x). The expected value of g(X) is

E[g(X)] = ∫_{−∞}^{+∞} g(x) f_X(x) dx.

Comment: All the rules for E[·] we saw in the discrete case still apply for continuous random variables. For example, the additivity of expected values (Lemma 14) still applies.
Example 54. Consider Example 51. Then

E[X] = ∫_{−∞}^{+∞} x f_X(x) dx = ∫_{0}^{1} x · 3x² dx = ∫_{0}^{1} 3x³ dx = (3/4) x⁴ |_{0}^{1} = 3/4.

Also,

E[X²] = ∫_{−∞}^{+∞} x² f_X(x) dx = ∫_{0}^{1} x² · 3x² dx = ∫_{0}^{1} 3x⁴ dx = (3/5) x⁵ |_{0}^{1} = 3/5.

Hence, Var[X] = E[X²] − E[X]² = 3/5 − 9/16 = 0.0375.
Definition 29. The cumulative distribution function (cdf) of a random variable X is the function F_X : R → R,

F(x) = P(X ≤ x).

Lemma 47. A cdf F of a random variable X must satisfy the following properties, even if X is not continuous:
1. F is non-decreasing.
2. lim_{x→−∞} F(x) = 0.
3. lim_{x→+∞} F(x) = 1.
4. F is right-continuous.
Proof.
1. Note that, for every t_1 ≤ t_2, {ω : X(ω) ≤ t_1} ⊆ {ω : X(ω) ≤ t_2}. It follows from Lemma 5 that F(t_1) := P(X ≤ t_1) ≤ P(X ≤ t_2) := F(t_2).
2. Let (y_n)_{n∈N} be a sequence such that y_n → −∞. Define B_n := {ω : X(ω) ≤ y_n}. By construction, B_{n+1} ⊆ B_n and ∩_{n∈N} B_n = ∅. From Exercise 21, it follows that

lim_{x→−∞} F(x) = lim_{n} F(y_n) = lim_{n} P(B_n) = P(∩_{n∈N} B_n) = P(∅) = 0.
For continuous random variables, F is always continuous. However, for discrete random variables, F has left-discontinuities at the points in the image of X.
Example 55. Consider Example 51. For 0 ≤ x ≤ 1,

F(x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{0}^{x} 3t² dt = t³ |_{0}^{x} = x³.
Lemma 48. Let X be a continuous random variable with density f_X and cdf F. Then

F(b) − F(a) = ∫_{−∞}^{b} f_X(x) dx − ∫_{−∞}^{a} f_X(x) dx = ∫_{a}^{b} f_X(x) dx = P(a ≤ X ≤ b).
Lemma 49. Let F_X(x) be the cdf of X and f_X(x) be the pdf of X. Then

∂F_X(x)/∂x = f_X(x).

Proof.

∂F_X(x)/∂x = ∂/∂x ∫_{−∞}^{x} f_X(y) dy = f_X(x).
Lemma 50. Let X be a continuous random variable with density f_X. Then, for any set A ∈ F,

P(X ∈ A) = ∫_{A} f_X(x) dx.
Definition 30. Let X be a continuous random variable. We denote the median of X by Med[X] and define it as the m ∈ R such that P(X ≤ m) = F(m) = 0.5. That is,

∫_{−∞}^{m} f_X(x) dx = 0.5.

For instance, for the X of Example 51,

P(X ≤ m) = ∫_{−∞}^{m} f_X(x) dx = ∫_{0}^{m} 3x² dx = x³ |_{0}^{m} = m³,

so Med[X] is the solution of m³ = 0.5, that is, Med[X] = 0.5^{1/3}.
Exercise 128. Consider X ≥ 1. Can there exist c > 0 such that the density of X is f_X(x) = c/x, for x ≥ 1?
Exercise 129 (Challenge). Let X be a continuous random variable. Find the d ∈ R that minimizes E[|X − d|]. (This is not required for this course.)
Hint: show that, for every d ∈ R, E[|X − d|] ≥ E[|X − m|], where m = Med[X]. Conclude that m minimizes E[|X − d|].
As in the case of discrete random variables, several continuous distributions are often used. We now explore
some of them.
5.2 Uniform Distribution

Definition 31. We say that X follows the Uniform distribution over (a, b) and denote it by X ∼ U(a, b) if the density of X is

f_X(x) = 1/(b−a), if a ≤ x ≤ b;  0, otherwise.

Lemma 51. If X ∼ U(a, b), then, for every interval (c, d) such that (c, d) ⊆ (a, b), P(c ≤ X ≤ d) = (d−c)/(b−a). That is, the probability that X is inside an interval that is a subset of (a, b) is proportional to the length of the interval. Hence, every two intervals of equal length have the same probability.
Proof.

P(c ≤ X ≤ d) = ∫_{c}^{d} 1/(b−a) dx = x/(b−a) |_{c}^{d} = (d−c)/(b−a).

Lemma 52. If X ∼ U(a, b), then E[X] = (a+b)/2 and Var[X] = (b−a)²/12.
Proof.

E[X] = ∫_{−∞}^{+∞} x f_X(x) dx = ∫_{a}^{b} x/(b−a) dx = x²/(2(b−a)) |_{a}^{b} = (b²−a²)/(2(b−a)) = (b+a)/2.

E[X²] = ∫_{−∞}^{+∞} x² f_X(x) dx = ∫_{a}^{b} x²/(b−a) dx = x³/(3(b−a)) |_{a}^{b} = (b³−a³)/(3(b−a)) = (a²+ab+b²)/3.

Hence,

Var[X] = E[X²] − E[X]² = (a²+ab+b²)/3 − ((b+a)/2)² = (4a²+4ab+4b² − 3a² − 3b² − 6ab)/12 = (a−b)²/12.
5.2.1 Exercises
Exercise 130. Let X ∼ U(0, 1). Find P(X ∈ (0.5, 0.25) ∪ X ∈ (0.5, 0.75)).
Exercise 131. Let X ∼ U(0, a) and Y ∼ U(0, b). If f_X(a/2) > f_Y(b/2), is a < b? Why?
Exercise 132. Let X be a continuous random variable with cdf F_X(x). Let U ∼ Uniform(0, 1). Let Y = F_X^{−1}(U). Find the cdf of Y.
Exercise 133 (Coding). In the computer language Python you can generate a Uniform(0,1) by using the function
random.random(). Consider that the function Inv F(x) gives you the inverse of the cdf F . Use Exercise 132 to
write a short code that generates a random variable that has the cdf F .
Exercise 134 (Coding). Use Exercise 132 to write a function that simulates a random variable with distribution
U (a, b) using the function random.random().
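A minimal sketch along the lines of Exercises 133 and 134, assuming Python's standard random.random(); the parameter name inv_F below is illustrative and stands in for the Inv F function mentioned in Exercise 133:

import random

def rvs_from_cdf(inv_F):
    # Exercise 132: if U ~ Uniform(0, 1), then inv_F(U) has cdf F.
    return inv_F(random.random())

def runif(a, b):
    # For U(a, b), F(x) = (x - a)/(b - a), so the inverse is a + (b - a)*u.
    return rvs_from_cdf(lambda u: a + (b - a) * u)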
5.3 Exponential Distribution

Definition 32. We say that X follows the Exponential distribution with parameter λ and denote it by X ∼ Exp(λ) if the density of X is

f_X(x) = λ e^{−λx}, if x ≥ 0;  0, otherwise.

Lemma 53. If X ∼ Exp(λ), then, for every x ≥ 0, F_X(x) = 1 − e^{−λx}. Equivalently, P(X > x) = e^{−λx}.

Proof.

F_X(x) = ∫_{0}^{x} λ e^{−λy} dy = −e^{−λy} |_{0}^{x} = 1 − e^{−λx}.

Lemma 54. If X ∼ Exp(λ), then E[X] = 1/λ and Var[X] = 1/λ².

Proof.

E[X] = ∫_{−∞}^{+∞} x f_X(x) dx = ∫_{0}^{∞} x λ e^{−λx} dx
= −x e^{−λx} |_{0}^{∞} + ∫_{0}^{∞} e^{−λx} dx          (integration by parts)
= 0 − (1/λ) e^{−λx} |_{0}^{∞} = 1/λ.
A second way of deriving the expected value of the exponential distribution is by using the fact that, if X is a non-negative continuous random variable, then E[X] = ∫_{0}^{∞} P(X > x) dx (this is the continuous version of Lemma 13; try to prove it!). From Lemma 53,

E[X] = ∫_{0}^{∞} e^{−λx} dx = −(1/λ) e^{−λx} |_{0}^{∞} = 1/λ.
Similarly,

E[X²] = ∫_{−∞}^{+∞} x² f_X(x) dx = ∫_{0}^{∞} x² λ e^{−λx} dx
= −x² e^{−λx} |_{0}^{∞} + ∫_{0}^{∞} 2x e^{−λx} dx          (integration by parts)
= 0 − (2x/λ) e^{−λx} |_{0}^{∞} + ∫_{0}^{∞} (2/λ) e^{−λx} dx          (integration by parts)
= −(2/λ²) e^{−λx} |_{0}^{∞} = 2/λ².

Hence, Var[X] = E[X²] − E[X]² = 2/λ² − 1/λ² = 1/λ².
Lemma 55. If X ∼ Exp(λ), then X is memoryless. That is, P(X > t+s | X > t) = P(X > s).

Proof.

P(X > t+s | X > t) = P(X > t+s, X > t) / P(X > t)
= P(X > t+s) / P(X > t)
= e^{−λ(t+s)} / e^{−λt}
= e^{−λs} = P(X > s).

Notice that this is very similar to what happens with the geometric distribution (Exercise 98).
5.3.1 Exercises
Exercise 135. Let X ∼ Exp(1). Find E[X³].
Exercise 136. A light bulb lasts, on average, 300 hours, and assume that the time it lasts follows an exponential distribution. What is the probability that it will last more than 700 hours given that it has already lasted 300 hours?
Exercise 137. The mode of a continuous distribution is the value that maximizes fX . What is the mode of an
exponential distribution?
Exercise 138 (Coding). Use Lemma 53 and Exercise 133 to simulate from an Exponential(λ).
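A minimal sketch of this idea: inverting F(x) = 1 − e^{−λx} from Lemma 53 gives F^{−1}(u) = −log(1−u)/λ, so it suffices to apply this inverse to a Uniform(0, 1) draw.

import math
import random

def rexp(lam):
    # Inverse-cdf method: if U ~ Uniform(0,1), then -log(1 - U)/lam ~ Exp(lam).
    return -math.log(1.0 - random.random()) / lam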
Exercise 139. Let X_1 ∼ Exponential(λ_1) and X_2 ∼ Exponential(λ_2) be independent random variables. Let Z = min(X_1, X_2). Find the cdf of Z. Do you recognize this cdf? Generalize this result for X_1, ..., X_n such that X_i ∼ Exponential(λ_i) and Z = min(X_1, ..., X_n).
Hint: for any a ∈ R, min{x, y} > a if, and only if, both x and y are larger than a.
F_Z(z) is the cdf of an Exponential distribution with parameter Σ_{i=1}^{n} λ_i. Hence, Z ∼ Exp(Σ_{i=1}^{n} λ_i).
Exercise 140. Consider a system with n components in series. Recall that this system fails whenever any of the components fails. Assume that each component lasts on average 1 year before failing. How long do you expect the system to last until failing? Use Exercise 139.
5.4 Gamma Distribution

Definition 33 (Gamma Function). The Gamma function is

Γ(z) = ∫_{0}^{∞} t^{z−1} e^{−t} dt.

Lemma 56. For every a > 1, Γ(a) = (a−1) Γ(a−1). Also, Γ(1) = 1.

Proof.

Γ(a) = ∫_{0}^{∞} t^{a−1} e^{−t} dt
= −t^{a−1} e^{−t} |_{0}^{∞} + ∫_{0}^{∞} (a−1) t^{a−2} e^{−t} dt          (integration by parts)
= 0 + (a−1) ∫_{0}^{∞} t^{a−2} e^{−t} dt = (a−1) Γ(a−1).          (Gamma Function, Definition 33)

Also, Γ(1) = ∫_{0}^{∞} e^{−t} dt = −e^{−t} |_{0}^{∞} = 1.
Definition 34. We say that a random variable X follows the Gamma distribution with parameters (k, λ) and denote it by X ∼ Gamma(k, λ) if the density of X is

f_X(x) = x^{k−1} e^{−x/λ} / (Γ(k) λ^k), if x > 0;  0, otherwise.
Lemma 57. The density of the Gamma(k, λ) distribution integrates to 1.

Proof.

∫_{0}^{∞} x^{k−1} e^{−x/λ} / (Γ(k) λ^k) dx
= ∫_{0}^{∞} (λt)^{k−1} e^{−t} λ / (Γ(k) λ^k) dt          (calling x = λt)
= (1/Γ(k)) ∫_{0}^{∞} t^{k−1} e^{−t} dt = 1.          (Gamma Function, Definition 33)
Observe that, in the special case in which k = 1, the Gamma distribution corresponds to the Exponential
distribution.
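This connection suggests a simple way of simulating a Gamma(k, λ) when k is an integer, using the standard fact (not proven in these notes) that the sum of k independent exponential variables with mean λ, each being a Gamma(1, λ) in the parametrization of Definition 34, follows a Gamma(k, λ). A minimal sketch, assuming Python's standard random.random():

import math
import random

def rgamma_int(k, lam):
    # Sum of k independent exponentials with mean lam (i.e., Gamma(1, lam));
    # valid for integer k only.
    total = 0.0
    for _ in range(k):
        total += -lam * math.log(1.0 - random.random())
    return total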
Lemma 58. If X ∼ Gamma(k, λ), then, for every a > 0, E[X^a] = Γ(k+a) λ^a / Γ(k).

Proof.

E[X^a] = ∫_{0}^{∞} x^a · x^{k−1} e^{−x/λ} / (Γ(k) λ^k) dx
= (1/(Γ(k) λ^k)) ∫_{0}^{∞} x^{k+a−1} e^{−x/λ} dx
= (Γ(k+a) λ^a / Γ(k)) ∫_{0}^{∞} x^{k+a−1} e^{−x/λ} / (Γ(k+a) λ^{k+a}) dx          (density of a Gamma(k+a, λ))
= Γ(k+a) λ^a / Γ(k).

In particular, using the properties of the Gamma function (Lemma 56),

E[X] = Γ(k+1) λ / Γ(k) = k Γ(k) λ / Γ(k) = kλ,
E[X²] = Γ(k+2) λ² / Γ(k) = k(k+1) Γ(k) λ² / Γ(k) = k(k+1) λ².

Hence, Var[X] = E[X²] − E[X]² = k(k+1)λ² − (kλ)² = kλ².
5.4.1 Exercises
Exercise 141. Let X be the time a student takes to solve this exercise. Assume that X follows a Gamma distribution. If E[X] = 6 minutes and Var[X] = 12 minutes², what are the parameters of the Gamma distribution?
Exercise 142. Let X ∼ Gamma(k, λ). Find E[X³].
Exercise 143. Let X ∼ Gamma(k, λ). Let c > 0 and Y = cX.
a. What are E[Y] and Var[Y]?
b. If you knew that Y follows a Gamma distribution, what would be the parameters of the distribution?
c. Use the fact that ∂P(Y ≤ y)/∂y = f_Y(y) to verify your answer to the previous item.
[Figure 4: densities f(x) of the Gamma(k, λ) distribution; panels Gamma(0.5, 1), Gamma(0.5, 2), Gamma(1, 0.5), Gamma(1, 1), Gamma(1, 2), Gamma(2, 0.5), Gamma(2, 1) and Gamma(2, 2).]
5.5 Beta Distribution

Definition 35. We say that X follows the Beta distribution with parameters (α, β) and denote it by X ∼ Beta(α, β) if the density of X is

f_X(x) = [Γ(α+β) / (Γ(α)Γ(β))] x^{α−1} (1−x)^{β−1}, if 0 < x < 1;  0, otherwise.

Lemma 59.

∫_{0}^{1} [Γ(α+β) / (Γ(α)Γ(β))] x^{α−1} (1−x)^{β−1} dx = 1.
Proof.

Γ(α) Γ(β) = ∫_{0}^{∞} x^{α−1} e^{−x} dx ∫_{0}^{∞} y^{β−1} e^{−y} dy          (Definition 33)
= ∫_{0}^{∞} ∫_{0}^{∞} x^{α−1} y^{β−1} e^{−(x+y)} dx dy
= ∫_{0}^{1} ∫_{0}^{∞} (tu)^{α−1} (t(1−u))^{β−1} e^{−t} t dt du          (calling t = x + y, u = x(x+y)^{−1})
= ∫_{0}^{1} u^{α−1} (1−u)^{β−1} du · ∫_{0}^{∞} t^{α+β−1} e^{−t} dt
= Γ(α+β) ∫_{0}^{1} u^{α−1} (1−u)^{β−1} du.          (Definition 33)

Hence, ∫_{0}^{1} u^{α−1} (1−u)^{β−1} du = Γ(α)Γ(β)/Γ(α+β) and ∫_{0}^{1} [Γ(α+β)/(Γ(α)Γ(β))] x^{α−1} (1−x)^{β−1} dx = 1.
Since the Beta distribution assumes values in (0, 1), it is commonly used to model relative frequencies and
ratios. It is a flexible distribution and can assume many different shapes. Figure 5 shows some densities that the
Beta distribution can assume.
Lemma 60. If X ∼ Beta(α, β), then, for every c, d ≥ 0,

E[X^c (1−X)^d] = [Γ(α+β) / (Γ(α)Γ(β))] · [Γ(α+c) Γ(β+d) / Γ(α+β+c+d)].

Hence, E[X] = α/(α+β) and Var[X] = αβ / ((α+β)² (α+β+1)).
Proof.

E[X^c (1−X)^d] = ∫_{0}^{1} x^c (1−x)^d f_X(x) dx
= [Γ(α+β)/(Γ(α)Γ(β))] ∫_{0}^{1} x^{α+c−1} (1−x)^{β+d−1} dx
= [Γ(α+β)/(Γ(α)Γ(β))] · [Γ(α+c)Γ(β+d)/Γ(α+β+c+d)] ∫_{0}^{1} [Γ(α+β+c+d)/(Γ(α+c)Γ(β+d))] x^{α+c−1} (1−x)^{β+d−1} dx
= [Γ(α+β)/(Γ(α)Γ(β))] · [Γ(α+c)Γ(β+d)/Γ(α+β+c+d)].          (Lemma 59)
[Figure 5: densities f(x) of the Beta(α, β) distribution; panels Beta(0.5, 1), Beta(0.5, 2), Beta(1, 0.5), Beta(1, 1), Beta(1, 2), Beta(2, 0.5), Beta(2, 1) and Beta(2, 2).]
In particular, using the properties of the Gamma function (Lemma 56),

E[X] = [Γ(α+β)/(Γ(α)Γ(β))] · [Γ(α+1)Γ(β)/Γ(α+β+1)] = [α Γ(α) Γ(α+β)] / [Γ(α) (α+β) Γ(α+β)] = α/(α+β)

and

E[X²] = [Γ(α+β)/(Γ(α)Γ(β))] · [Γ(α+2)Γ(β)/Γ(α+β+2)] = [(α+1)α Γ(α) Γ(α+β)] / [Γ(α) (α+β+1)(α+β) Γ(α+β)] = α(α+1)/((α+β)(α+β+1)).

Hence,

Var[X] = E[X²] − E[X]² = α(α+1)/((α+β)(α+β+1)) − α²/(α+β)²
= [α(α+1)(α+β) − α²(α+β+1)] / ((α+β+1)(α+β)²)
= αβ / ((α+β+1)(α+β)²).
5.5.1 Exercises
Exercise 144. Let X ∼ Beta(α, β). Find E[X(1−X)].
Exercise 145. Let X ∼ Beta(1, β). Find F_X(x).
Exercise 146. Let X ∼ Beta(α, 1). Find F_X(x).
Exercise 147. Let X ∼ Beta(α, β).
a. Let Y = 1 − X. What are E[Y] and Var[Y]?
b. If you knew that Y follows a Beta distribution, what would be the parameters of the distribution?
c. Use the fact that ∂P(Y ≤ y)/∂y = f_Y(y) to verify your answer to the previous item.
5.6 Normal Distribution

Lemma 61. If X ∼ N(μ, σ²), then Z = (X−μ)/σ ∼ N(0, 1).

Proof.

F_Z(z) = P(Z ≤ z) = P((X−μ)/σ ≤ z) = P(X ≤ σz + μ) = F_X(σz + μ).

Hence,

f_Z(z) = ∂F_Z(z)/∂z = ∂F_X(σz+μ)/∂z          (Lemma 49)
= [∂F_X(σz+μ)/∂(σz+μ)] · [∂(σz+μ)/∂z]          (chain rule)
= f_X(σz+μ) σ
= σ · (1/(√(2π) σ)) e^{−(σz+μ−μ)²/(2σ²)}
= (1/√(2π)) e^{−z²/2},

which is the density of a N(0, 1).
It follows from Lemma 61 that every parametrization of the normal distribution can be obtained by appropriate
rescaling of the standard normal. Figure 6 presents the density of the standard normal distribution.
[Figure 6: density f(x) of the standard normal distribution.]
Lemma 62. If X ∼ N(μ, σ²), then E[X] = μ and Var[X] = σ².

Proof. Consider first that X ∼ N(0, 1). Then

E[X] = ∫_{−∞}^{+∞} x (1/√(2π)) e^{−x²/2} dx = 0,

since the integrand is an odd function. Also,

E[X²] = ∫_{−∞}^{+∞} x² (1/√(2π)) e^{−x²/2} dx
= −(1/√(2π)) x e^{−x²/2} |_{−∞}^{+∞} + ∫_{−∞}^{+∞} (1/√(2π)) e^{−x²/2} dx          (integration by parts)
= 0 + 1 = 1,

so Var[X] = 1. Next, let X ∼ N(μ, σ²). Then

E[X] = E[ σ (X−μ)/σ + μ ] = σ · 0 + μ = μ          (Lemma 61)
Var[X] = Var[ σ (X−μ)/σ + μ ] = σ² Var[(X−μ)/σ] = σ² · 1 = σ².          (Lemma 61)
Lemma 63 (Binomial Approximation and Central Limit Theorem). Consider that X ∼ Binomial(n, p) and that n is large (n = 30 is often enough for a good result). Then

P( (X − np) / √(np(1−p)) ≤ z ) ≈ Φ(z).

This is a specific instance of a more general rule. If X_1, X_2, ... is a sequence of independent random variables with the same distribution, E[X_i] = μ and Var[X_i] = σ², then, for large n,

P( (Σ_{i=1}^{n} X_i − nμ) / (σ√n) ≤ z ) ≈ Φ(z).

Proof. We will make the statement more precise as well as prove it in Chapter ??.
5.6.1 Exercises
Exercise 148. Let X ∼ N(μ, σ²). Find E[X³].
Exercise 149. Let Y ∼ N(μ, σ²). Let a, b ∈ R. Show that

P(a < Y < b) = Φ((b−μ)/σ) − Φ((a−μ)/σ),

where Φ is the cumulative distribution function of a standard normal distribution (i.e., mean zero and variance 1).
This shows that one can compute any probability related to a normal random variable by computing the cumulative distribution function of a standard normal distribution. This was very important when computers were not available, because it allowed one to compute all probabilities of interest by having access to a single table. Not many distributions have this property.
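A minimal sketch of this reduction in Python, assuming the standard library's math.erf: Φ is recovered from the error function, and any normal interval probability reduces to two evaluations of Φ, as in Exercise 149.

import math

def Phi(z):
    # Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(a, b, mu, sigma):
    # Exercise 149: P(a < Y < b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma).
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

For instance, Phi(1.0) evaluates to approximately 0.8413.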
Exercise 150. Let X denote the number of heads on 10000 throws of a fair coin. Approximate P(X ≤ 5050).
Distribution      pdf f_Y(y)                                              E[Y]       Var[Y]
Uniform(a, b)     1/(b−a), y ∈ (a, b)                                     (a+b)/2    (b−a)²/12
Exponential(λ)    λ e^{−λy}, y ∈ R₊                                       1/λ        1/λ²
Gamma(k, λ)       y^{k−1} e^{−y/λ} / (Γ(k)λ^k), y ∈ R₊                    kλ         kλ²
Beta(α, β)        [Γ(α+β)/(Γ(α)Γ(β))] y^{α−1}(1−y)^{β−1}, y ∈ (0, 1)      α/(α+β)    αβ/((α+β)²(α+β+1))
Normal(μ, σ²)     (1/√(2πσ²)) e^{−(y−μ)²/(2σ²)}, y ∈ R                    μ          σ²

Table 3: Review of Continuous Distributions