
Introduction to Probability Theory and Random Processes
Lecture Notes

Rafael Bassi Stern, with edits from Rafael Izbicki

Last revised: January 20, 2016


Please send remarks, comments, typos and mistakes to rafaelizbicki@gmail.com

Contents
1 Mathematical Prologue
  1.1 Combinatorial Analysis
    1.1.1 Counting
    1.1.2 Permutations
    1.1.3 Combinations
    1.1.4 Exercises
  1.2 (Basic) Set Theory
    1.2.1 Set operations
    1.2.2 Exercises

2 Probability Theory
  2.1 The Axioms of Probability
    2.1.1 Exercises
  2.2 *The Truth About Probability
    2.2.1 Exercises
  2.3 Random Variables
    2.3.1 Exercises
  2.4 *Why were these axioms chosen?
  2.5 Equiprobable Outcomes
    2.5.1 Exercises
  2.6 Conditional Probability
    2.6.1 Exercises
  2.7 The Law of Total Probability and Bayes Theorem
    2.7.1 Exercises

3 Random Variables
  3.1 Distribution of Random Variables
    3.1.1 Exercises
  3.2 (Conditional) Expected value
    3.2.1 Exercises
  3.3 Variance
    3.3.1 Exercises
  3.4 Covariance and the Vector Space of Random Variables
    3.4.1 Exercises

4 Bernoulli Processes (a long example. . . )
  4.1 Bernoulli Distribution
  4.2 Binomial Distribution
    4.2.1 Exercises
  4.3 Geometric Distribution
    4.3.1 Exercises
  4.4 Negative Binomial Distribution
    4.4.1 Exercises
  4.5 Hypergeometric Distribution
    4.5.1 Exercises
  4.6 Poisson Distribution
    4.6.1 Exercises
  4.7 Review of Special Discrete Distributions
    4.7.1 Exercises

5 Continuous Random Variables
  5.1 Introduction
    5.1.1 Exercises
  5.2 Uniform Distribution
    5.2.1 Exercises
  5.3 Exponential distribution
    5.3.1 Exercises
  5.4 Gamma Distribution
    5.4.1 Exercises
  5.5 Beta Distribution
    5.5.1 Exercises
  5.6 Normal Distribution
    5.6.1 Exercises
  5.7 Review of Special Continuous Distributions

6 Bibliography

1 Mathematical Prologue
1.1 Combinatorial Analysis
Being able to accurately count the possible results of an experiment is key in probability theory. Combinatorial
analysis is a powerful tool towards achieving this goal. This section contains a brief introduction to it.
1.1.1 Counting
Assume r experiments will be performed. If the i-th experiment has ni possible outcomes, i = 1, . . . , r, then the total number of outcomes of the r experiments is
∏_{i=1}^{r} ni := n1 · n2 · · · nr.
Example 1. If we flip a coin r times, the total number of outcomes in each experiment is two: either heads or tails. It follows from the counting principle that there are ∏_{i=1}^{r} 2 = 2^r different results in the joint experiment, i.e., there is a total of 2^r sequences of heads and tails.
Example 2. If we flip a coin and then roll a six-sided die, there is a total of 2 · 6 = 12 results one can observe.
1.1.2 Permutations
Consider a set with n objects. The total number of ways we can order these objects is given by
n! := n · (n − 1) · · · 2 · 1
Example 3. There are 4!=24 ways of displaying four shirts in a closet.
1.1.3 Combinations
Consider a set with n objects. The total number of different groups of size r ≤ n that can be formed with these n objects is given by
(n choose r) := n! / (r!(n − r)!)
Example 4. There are (10 choose 2) = 45 ways of choosing two questions to study from a section of this book with 10 questions.
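These counting rules are easy to check numerically. The following minimal Python sketch (not part of the original notes; it only re-derives the numbers from Examples 1, 3 and 4 using the standard library, and math.comb requires Python 3.8 or later) may help fix the ideas:

    from itertools import product, permutations, combinations
    from math import factorial, comb

    # Counting principle: 2 outcomes per coin flip, r = 3 flips -> 2^3 joint outcomes.
    assert len(list(product("HT", repeat=3))) == 2 ** 3

    # Permutations: 4 shirts can be ordered in 4! = 24 ways (Example 3).
    assert len(list(permutations(range(4)))) == factorial(4) == 24

    # Combinations: choosing 2 questions out of 10 (Example 4).
    assert len(list(combinations(range(10), 2))) == comb(10, 2) == 45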
1.1.4 Exercises
Exercise 1. A restaurant offers three different entrées, five main dishes and six desserts. How many different meals does this restaurant offer?
Exercise 2. A student has 2 math books, 3 geography books and 4 chemistry books. In how many ways can these books be displayed on a shelf:
a. If the student does not care whether books on the same subject are next to each other?
b. If the student wants books on the same subject to stay together?
Exercise 3. How many different words can be formed with two letters A, three letters B and one letter C?

Exercise 4. A student needs to study for three exams this week. The math teacher gave him 6 exercises to help him study for the test, the geography teacher gave him 7 exercises, and the chemistry teacher gave him 5 exercises. Considering the student does not have much time, how many different sets of exercises can he pick if he wants to choose only 2 math exercises, 4 geography exercises, and 1 chemistry exercise?
Exercise 5. Eight people, named A, B, . . . , H, are going to make a line.
a. In how many ways can these people be placed?
b. In how many ways can these people be placed if A and B have to be next to each other?
c. In how many ways can these people be placed if A and B have to be next to each other, and C and D also have to be next to each other?

1.2 (Basic) Set Theory


As we saw in the last section, combinatorial analysis is used to count the possible results of an experiment. In
probability theory, we also have to be able to accurately describe such results. This is where set theory becomes
important.
A set is a collection of objects.

If a set has a finite number of objects, o1 , o2 , . . . , on , we denote it by

{o1 , o2 , . . . , on }. We denote the set of natural numbers by N, the set of integers by Z and the set of reals by
R. In probability, sets are fundamental for describing outcomes of an experiment.
Example 5 (Sets).
The set of possible outcomes of a six-sided die: {1, 2, 3, 4, 5, 6}.
The set of outcomes of a coin flip: {T, H}.
The set of outcomes of two coin flips: {(T, T), (T, H), (H, T), (H, H)}.
The set of all odd numbers: {2n + 1 : n ∈ N} or {1, 3, 5, 7, . . .}.
The set of non-negative real numbers: {x ∈ R : x ≥ 0}.
A circle of radius 1: {(x, y) ∈ R² : x² + y² ≤ 1}.
Definition 1 (∈ and ∉). We write o ∈ S if object o is an element of set S and o ∉ S, otherwise.
Example 6 (∈ and ∉).
T ∈ {T, H}.
7 ∉ {1, 2, 3, 4, 5, 6}.
7 ∈ {2n + 1 : n ∈ N}.
Definition 2 (empty set - ∅). ∅ is the only set with no elements. That is, for every object o, o ∉ ∅.
Definition 3 (disjoint sets).
Two sets A and B are disjoint if, for every o ∈ A, we have that o ∉ B and, for every o ∈ B, o ∉ A.
A sequence of sets (An)n∈N is disjoint if, for every i ≠ j, Ai is disjoint from Aj.
Example 7 (Disjoint sets).
{1, 2} and {3, 4} are disjoint.
{1, 2} and {2, 3} are not disjoint since 2 ∈ {1, 2} and 2 ∈ {2, 3}.
Definition 4 (⊆ and =). Let A and B be two sets. We say that:
A ⊆ B if, for every o ∈ A, o ∈ B.
A = B if A ⊆ B and B ⊆ A.
Example 8 (⊆ and =).
{1, 2} ⊆ {1, 2, 3, 4}.
{n ∈ Z : n ≥ 1} ⊆ N.
{n ∈ Z : n ≥ 0} = N.
We reserve the symbol Ω for the set of all objects we are considering in a given model. Ω is often called the sample space in probability theory. That is, for every set A we consider in that model, A ⊆ Ω.
1.2.1 Set operations
Definition 5 (complement - Aᶜ). Let A be a set. o is an element of Aᶜ if and only if o ∉ A. That is, the complement of A is formally defined as Aᶜ = {o : o ∉ A}.
Example 9 (complement).
Let Ω = {T, H}; {T}ᶜ = {H}.
Let Ω = {1, 2, 3, 4, 5, 6}; {1, 2}ᶜ = {3, 4, 5, 6}.
Let Ω = N; {n ∈ N : n > 0}ᶜ = {0}.
Definition 6 (union - ∪).
Let A and B be two sets. o is an element of the union of A and B, A ∪ B, if and only if either o is an element of A or o is an element of B. That is, A ∪ B = {o : o ∈ A or o ∈ B}.
Let (An)n∈N be a sequence of sets. o is an element of the union of (An)n∈N, ∪_{n∈N} An, if and only if there exists n ∈ N such that o ∈ An. That is, ∪_{n∈N} An = {o : there exists n ∈ N such that o ∈ An}.
Example 10 (∪).
{T} ∪ {H} = {T, H}.
{1, 2} ∪ {2, 3} = {1, 2, 3}.
{1} ∪ {3} ∪ {5} = {1, 3, 5}.
{n ∈ Z : n > 0} ∪ {n ∈ Z : n < 0} = {n ∈ Z : n ≠ 0}.
∪_{n∈N} {n} = N.
∪_{n∈N} {x ∈ R : x ≥ n} = {x ∈ R : x ≥ 0}.
∪_{n∈N} {x ∈ R : x ≥ 1/(n + 1)} = {x ∈ R : x > 0}.
Definition 7 (intersection - ∩).
Let A and B be two sets. o is an element of the intersection of A and B, A ∩ B, if and only if o is an element of A and o is an element of B. That is, A ∩ B = {o : o ∈ A and o ∈ B}.
Let (An)n∈N be a sequence of sets. o is an element of the intersection of (An)n∈N, ∩_{n∈N} An, if and only if, for every n ∈ N, o ∈ An. That is, ∩_{n∈N} An = {o : for every n ∈ N, o ∈ An}.
Example 11 (∩).
{T} ∩ {H} = ∅.
{1, 2} ∩ {2, 3} = {2}.
({1, 2} ∩ {2, 3}) ∪ {5} = {2, 5}.
{n ∈ Z : n ≥ 0} ∩ {n ∈ Z : n ≤ 0} = {0}.
∩_{n∈N} {i ∈ N : i ≥ n} = ∅.
∩_{n∈N} {x ∈ R : x ≤ n} = {x ∈ R : x ≤ 0}.
Theorem 1 (De Morgan's laws). Let (An)n∈N be a sequence of subsets of Ω. Then, for every n ∈ N,
(∪_{i=1}^{n} Ai)ᶜ = ∩_{i=1}^{n} Aiᶜ
(∩_{i=1}^{n} Ai)ᶜ = ∪_{i=1}^{n} Aiᶜ
Moreover,
(∪_{i∈N} Ai)ᶜ = ∩_{i∈N} Aiᶜ
(∩_{i∈N} Ai)ᶜ = ∪_{i∈N} Aiᶜ
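As an aside (not part of the original notes), these set operations and De Morgan's laws can be checked directly with Python's built-in set type, representing the complement as a difference from Ω:

    Omega = set(range(1, 7))            # Omega = {1, ..., 6}, e.g. a six-sided die
    A, B = {1, 2}, {2, 3}

    def complement(S):
        return Omega - S                # S^c = {o in Omega : o not in S}

    # De Morgan's laws for two sets
    assert complement(A | B) == complement(A) & complement(B)
    assert complement(A & B) == complement(A) | complement(B)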
Definition 8 (Partition). Let (An)n∈N be a sequence of sets. We say that (An)n∈N partitions Ω if:
for every i, j ∈ N such that i ≠ j, Ai and Aj are disjoint;
∪_{n∈N} An = Ω.
1.2.2 Exercises
Exercise 6. Let A = {1, 3, 5}, B = {1, 2} and Ω = {1, 2, 3, 4, 5, 6}. Find:
a. A ∪ B
b. A ∩ Bᶜ
c. B ∩ Bᶜ
d. (A ∪ B)ᶜ
e. (A ∩ B)ᶜ
Exercise 7. Consider that a given day can be rainy (R) or not rainy (NR). We are interested in the weather of the next two days.
a. How would you formally write Ω?
b. How would you formally write "The outcomes such that both days are rainy"? Call this set A.
c. How would you formally write "The outcomes such that at least one day is rainy"? Call this set B.
d. Is it true that A ⊆ B?
e. Find Bᶜ. How would you describe this set in English?
f. Is it true that A and Bᶜ are disjoint?
Exercise 8. Are the following statements true or false?
a. ∅ ∈ ∅.
b. ∅ ∈ {∅}.
c. {∅} ∈ ∅.
d. ∅ ⊆ ∅.
e. ∅ ⊆ {∅}.
f. {∅} ⊆ ∅.
Exercise 9. Prove the following:
a. S and T are disjoint if and only if S ∩ T = ∅.
b. S ∪ T = T ∪ S
c. S ∩ T = T ∩ S
d. S = (Sᶜ)ᶜ
e. S ∪ Sᶜ = Ω
f. S ∪ ∅ = S
g. S ∩ ∅ = ∅
h. (S ∪ T)ᶜ = Sᶜ ∩ Tᶜ
i. (S ∩ T)ᶜ = Sᶜ ∪ Tᶜ
j. ({n})n∈N partitions N.
k. ({1, 2}, {3, 4}, ∅, ∅, ∅, . . .) partitions {1, 2, 3, 4}.

Exercise 10. Prove De Morgan's laws.


Exercise 11. In a class of 25 students, consider the following statements:
14 will major in Computer Science.
12 will major in Engineering.
5 will major both in Computer Science and Engineering.
How many won't major in either Computer Science or Engineering?

2 Probability Theory
2.1 The Axioms of Probability
We denote by Ω the sample space, that is, the set of all possible outcomes in the situation we are interested in. For example, one might be interested in the outcome of a coin flip (Ω = {H, T}), the number of rainy days next year (Ω = {n ∈ N : n ≤ 366}) or the time until a given transistor fails (Ω = {x ∈ R : x > 0}).
For every A ⊆ Ω, one might be uncertain whether A is true or not. For example, {T} ⊆ {H, T} and one cannot usually determine if the outcome of a coin flip will be heads or tails before the coin is tossed. A probability is a function defined on subsets of Ω which is usually interpreted in one of the following ways:
(Frequency) P(A) denotes the relative frequency with which outcomes in A will be observed if the same experiment is performed many times independently.
(Degree of Belief) P(A) represents the evaluator's degree of belief that the outcome will be an element of A.
Events are subsets of Ω to which we attribute probabilities. We denote by F the set of all events. In order for P to be a probability it must satisfy the Axioms of Probability.
Definition 9 (Axioms of Probability). P : F → R is a probability if:
1. (Non-negativity) For all A ∈ F, P(A) ≥ 0.
2. (Countable additivity) If (An)n∈N is a sequence of disjoint sets in F, P(∪_{n∈N} An) = ∑_{n∈N} P(An).
3. (Normalization) P(Ω) = 1.
Next, we prove some consequences of these axioms that are constantly used:
Lemma 1. P(∅) = 0.
Proof. Let (An)n∈N be a sequence of subsets of Ω such that A0 = Ω and Ai = ∅ for all i > 0. By construction, this is a sequence of disjoint sets and, therefore, by countable additivity:
P(∪_{n∈N} An) = ∑_{n∈N} P(An) = P(Ω) + ∑_{n>0} P(∅)
Observe that ∪_{n∈N} An = Ω and, applying this fact to the previous equation:
P(Ω) = P(Ω) + ∑_{n>0} P(∅)
Conclude that P(∅) = 0.


Lemma 2. If A1, A2, . . . , An are disjoint, then P(A1 ∪ A2 ∪ . . . ∪ An) = ∑_{i=1}^{n} P(Ai).
Proof. Consider the sequence (Bi)i∈N such that Bi = Ai for i ≤ n and Bi = ∅ for i > n. Observe that (Bi)i∈N is a sequence of disjoint sets. Hence, by countable additivity:
P(A1 ∪ A2 ∪ . . . ∪ An) = P(∪_{i∈N} Bi) = ∑_{i∈N} P(Bi) = ∑_{i=1}^{n} P(Ai) + ∑_{i>n} P(∅)
Since P(∅) = 0, conclude that P(A1 ∪ A2 ∪ . . . ∪ An) = ∑_{i=1}^{n} P(Ai).

Lemma 3. For every event A, P(Aᶜ) = 1 − P(A).
Proof. Observe that A and Aᶜ are disjoint and A ∪ Aᶜ = Ω. Hence:
1 = P(Ω) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ)
Putting the extremes together, 1 = P(A) + P(Aᶜ), that is, P(Aᶜ) = 1 − P(A).
Lemma 4. For all events A and B, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof. Observe that Aᶜ ∩ B and A ∩ B are disjoint and (Aᶜ ∩ B) ∪ (A ∩ B) = B. Hence,
P(B) = P(Aᶜ ∩ B) + P(A ∩ B)    (1)
P(Aᶜ ∩ B) = P(B) − P(A ∩ B)    (2)
Also observe that A and Aᶜ ∩ B are disjoint and A ∪ (Aᶜ ∩ B) = A ∪ B. Hence,
P(A ∪ B) = P(A) + P(Aᶜ ∩ B)    (3)
Using Equation 2 on Equation 3:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Lemma 5. If A ⊆ B, then P(B) ≥ P(A).
Proof. Observe that A ∩ B and Aᶜ ∩ B are disjoint and (A ∩ B) ∪ (Aᶜ ∩ B) = B. Hence,
P(B) = P(A ∩ B) + P(Aᶜ ∩ B)
Since A ⊆ B, conclude that A ∩ B = A. Hence, P(B) = P(A) + P(Aᶜ ∩ B). From the non-negativity of probability, P(Aᶜ ∩ B) ≥ 0 and, therefore, P(B) ≥ P(A).
Example 12. Consider a coin flip. Let Ω = {H, T} and F = {∅, {H}, {T}, {H, T}}. It follows from the axioms of probability that P(∅) = 0 and P({H, T}) = 1. We say that the coin is fair if P(H) = 0.5 and, therefore, P(T) = 1 − P(H) = 0.5.
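These lemmas can also be verified numerically on a small finite sample space. The sketch below (mine, not from the notes) treats every outcome of a fair six-sided die as equally likely, so that P(A) = |A|/|Ω|; this rule is only formally justified in Section 2.5, so take it here as an assumption:

    from fractions import Fraction

    Omega = set(range(1, 7))                      # fair six-sided die
    P = lambda A: Fraction(len(A), len(Omega))    # equiprobable outcomes (see Section 2.5)

    A, B = {1, 3, 5}, {1, 2, 3, 4}

    assert P(set()) == 0                          # Lemma 1
    assert P(Omega - A) == 1 - P(A)               # Lemma 3
    assert P(A | B) == P(A) + P(B) - P(A & B)     # Lemma 4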
2.1.1 Exercises
Exercise 12. If the probability that I get a pocket pair in a poker match is 1/18, what is the probability that I don't get a pocket pair?


Exercise 13. Let A and B be two disjoint events. Is it possible for P(A) = 0.4 and P(B) = 0.7? Why?
Exercise 14. Consider the following incorrect argument:
Consider a fair coin and a fair six-sided die. Let the set of all possible outcomes for the coin be C,
which implies P(C) = 1. Let the set of all possible outcomes for the die be D, which implies P(D) = 1.
Now P(C ∪ D), which is the probability that either C or D occurs, is 1. Nevertheless, C and D are also disjoint events, hence P(C ∪ D) = P(C) + P(D) = 1 + 1. That is, 1 + 1 = 1.

What is the sample space in this problem description? What were the mistakes in the argument?
Exercise 15. Let A and B be two events.
a. If P(A) = 0.7, P(B) = 0.4 and P(A ∪ B) = 0.8, what is the value of P(A ∩ B)?
b. If P(A ∩ B) = 0.25, P(A ∪ B) = 0.75 and P(A) = 2P(B), what is P(A)?
c. Prove that if P(A) = P(Bᶜ), then P(Aᶜ) = P(B).
Exercise 16. Show that P((A ∩ Bᶜ) ∪ (Aᶜ ∩ B)) = P(A) + P(B) − 2P(A ∩ B).
Exercise 17. Prove that
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
Exercise 18. Let Ω be the sample space and A and B be events. Prove that:
a. P(A ∪ B) = 1 − P(Aᶜ ∩ Bᶜ).
b. P(A) = P(A ∩ B) + P(A ∩ Bᶜ).
c. max(P(A), P(B)) ≤ P(A ∪ B) ≤ min(1, P(A) + P(B)).
d. max(0, P(A) + P(B) − 1) ≤ P(A ∩ B) ≤ min(P(A), P(B)).
Exercise 19. Let A1, A2, . . . ∈ F. Prove that
P(∪_{n≥1} An) ≤ ∑_{n≥1} P(An)

Exercise 20. Let A, B and C be events of Ω. Using set operations, describe the following sets:
At least one of A, B or C happens
All of A, B and C happen
A and B happen, but C does not
Exactly one of A, B and C happens
Exercise 21 (Challenge). Let A1, A2, . . . be a sequence of events with Ai ⊆ A_{i+1} for every i ≥ 1. Show that
P(lim_{n→∞} An) = lim_{n→∞} P(An),
where lim_{n→∞} An := ∪_{n≥1} An.
Hint: Define Fn := An ∩ (∪_{i=1}^{n−1} Ai)ᶜ and notice they are disjoint.
Also show that if A_{i+1} ⊆ Ai for every i ≥ 1 instead, then
P(lim_{n→∞} An) = lim_{n→∞} P(An),
where lim_{n→∞} An := ∩_{n≥1} An.

Exercise 22. Assuming that P(An) = 1 for every n ∈ N, prove that P(∩_{n∈N} An) = 1.

2.2 *The Truth About Probability


Although when presenting the axioms of probability we assume that P is defined over every subset E of the sample space Ω, for uncountable spaces this is actually too strong: for instance, it is not possible to define a uniform distribution on (0, 1) (see Section 5) that satisfies all axioms. Modern probability (more precisely, measure theory) shows us that it is only possible to define such a P over some events. In this section we present a brief overview of this issue. A probability measure is actually defined over a σ-field of the sample space:
Definition 10 (σ-field). We say that a set F of subsets of Ω is a σ-field if
(i) Ω ∈ F
(ii) A ∈ F implies Aᶜ ∈ F
(iii) Ai ∈ F for every i ∈ N implies ∪_{i∈N} Ai ∈ F
Example 13 (σ-fields). The following are σ-fields:
F = {∅, Ω},
F = {∅, {a}, {b, c}, Ω} when Ω = {a, b, c},
F = P(Ω), i.e., the power set of Ω (that is, P(Ω) = {E : E ⊆ Ω}).
We can now state the general probability axioms:
Definition 11. Let F be a σ-field over Ω. A probability measure over F is a function P : F → R that satisfies:
1. (Non-negativity) For all A ∈ F, P(A) ≥ 0.
2. (Countable additivity) If (An)n∈N is a sequence of disjoint sets in F, P(∪_{n∈N} An) = ∑_{n∈N} P(An).
3. (Normalization) P(Ω) = 1.
Some σ-fields F are more useful than others. For instance, it is possible to define the continuous uniform measure over a special σ-field called the Borel sets of (0, 1); however, it is not possible to do so over the power set of (0, 1). This is the reason why σ-fields are necessary in modern probability theory. For more on this issue, see for instance (Billingsley, 2008).
2.2.1 Exercises
Exercise 23. Let Ω = {a, b, c, d}. Is {∅, {a}, {b, c}, Ω} a σ-field of Ω?
Exercise 24. Let F be a σ-field. Prove that if Ai ∈ F for every i ∈ N, then ∩_{i∈N} Ai ∈ F.


2.3 Random Variables


Numbers we are uncertain about occupy a special position in probability theory. Random variables are functions from Ω to R and are used to represent uncertainty about numbers.
Definition 12. A random variable X is a function such that X : Ω → R.
Example 14. Consider a single coin flip. Ω = {H, T}. The number of times we observe tails is represented through a random variable X such that X(H) = 0 and X(T) = 1.
Example 15. Consider that a given day can be rainy (R) or not rainy (NR). We are interested in the weather of the next two days. Ω = {(NR, NR), (NR, R), (R, NR), (R, R)}. The number of times it will be rainy in the next two days is represented through the random variable X defined in Table 1.

w                    X(w)
(NR, NR)             0
(NR, R), (R, NR)     1
(R, R)               2

Table 1: Number of rainy days (Example 15)

Definition 13 (Indicator function). The indicator function is a special type of random variable. Consider an event A ∈ F. The indicator function of A is denoted by IA : Ω → R and defined as:
IA(w) = 1, if w ∈ A
IA(w) = 0, otherwise

We will further discuss random variables in Chapter 3.
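Since a random variable is nothing more than a function on Ω, it can be coded directly as one. The short Python sketch below (an illustration of mine, with hypothetical names; it is not from the notes) implements the X of Example 15 and an indicator function:

    Omega = [("NR", "NR"), ("NR", "R"), ("R", "NR"), ("R", "R")]

    def X(w):                          # number of rainy days in the next two days (Table 1)
        return sum(1 for day in w if day == "R")

    def indicator(A):                  # I_A : Omega -> {0, 1}
        return lambda w: 1 if w in A else 0

    A = [w for w in Omega if X(w) >= 1]    # the event "at least one rainy day"
    I_A = indicator(A)
    print([X(w) for w in Omega])           # [0, 1, 1, 2]
    print([I_A(w) for w in Omega])         # [0, 1, 1, 1]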


2.3.1 Exercises
Exercise 25. Let A and B be two disjoint events. Can you write IA + IB as a single indicator function?
Exercise 26. Let A and B be two events. Is it true that I_{A∪B} − IA − IB = 0? Why?

2.4 *Why were these axioms chosen?


There are many possible justifications for the axioms of probability. A simple one involves certain types of bets.
Consider a lottery ticket in which you get $1 if event A happens and $0, otherwise. Consider this game:
1. You define a price for this ticket, Pr(A).
2. An opponent determines whether you buy the ticket from him or sell the ticket to him.
Your monetary outcome is αA · (IA − Pr(A)), where αA is −1 if the opponent makes you sell and αA = 1 if he makes you buy. You are a sure loser if the opponent has the option to make you lose money for sure. When can you be made a sure loser?
Consider that Pr(A) < 0. In this case, IA − Pr(A) > 0 and, therefore, the opponent can make you a sure loser by setting αA = −1.
Consider that Pr(Ω) ≠ 1. Observe that IΩ = 1. Hence IΩ − Pr(Ω) = 1 − Pr(Ω).
If Pr(Ω) < 1, IΩ − Pr(Ω) > 0 and, therefore, the opponent can make you a sure loser by setting αΩ = −1.
If Pr(Ω) > 1, IΩ − Pr(Ω) < 0 and, therefore, the opponent can make you a sure loser by setting αΩ = 1.
Finally, let A and B be disjoint events. Observe that I_{A∪B} − IA − IB = 0. Hence, if the opponent sets α_{A∪B} = 1, αA = −1, αB = −1, your final outcome is:
α_{A∪B}(I_{A∪B} − Pr(A ∪ B)) + αA(IA − Pr(A)) + αB(IB − Pr(B)) =
(I_{A∪B} − IA − IB) + (Pr(A) + Pr(B) − Pr(A ∪ B)) =
Pr(A) + Pr(B) − Pr(A ∪ B)
Similarly, if α_{A∪B} = −1, αA = 1, αB = 1, then your final outcome is Pr(A ∪ B) − Pr(A) − Pr(B). Hence, if Pr(A ∪ B) ≠ Pr(A) + Pr(B) there exists an option such that the opponent makes you a sure loser. That is, in order for you to avoid sure loss, Pr must satisfy:
1. Pr(A) ≥ 0.
2. Pr(Ω) = 1.
3. If A and B are disjoint, Pr(A ∪ B) = Pr(A) + Pr(B).
These rules closely resemble the axioms of probability. Hence, if one believes that how much one is willing to
bet can approximate degrees of belief, this reasoning can justify the axioms of probability.

2.5 Equiprobable Outcomes


Consider the specific case in which Ω is a finite set and, for all w1, w2 ∈ Ω, P({w1}) = P({w2}). That is, the probability of every possible outcome is the same. Equiprobability is a special case and does not apply to all probability models.
Lemma 6. If all outcomes are equiprobable, then for every w ∈ Ω, P({w}) = |Ω|⁻¹.
Proof. Notice that Ω = ∪_{w∈Ω} {w}. Also observe that, if w1 ≠ w2, then {w1} is disjoint from {w2}. Hence,
∑_{w∈Ω} P({w}) = P(Ω)    (Lemma 2)
∑_{w∈Ω} P({w}) = 1    (Definition 9)
|Ω| · P({w}) = 1    (Equiprobable outcomes)
P({w}) = |Ω|⁻¹

Lemma 7. For every event A, P(A) = |A| / |Ω|.
Proof. Observe that A = ∪_{w∈A} {w}. Since the events are disjoint,
P(A) = ∑_{w∈A} P({w})    (Lemma 2)
     = ∑_{w∈A} |Ω|⁻¹    (Lemma 6)
     = |A| / |Ω|

That is, in problems with equiprobable outcomes, the probability of a set is proportional to its size. In order to determine sizes of sets, some counting principles are of great help. We saw some combinatorial analysis in Chapter 1; here's a quick review in the context of the examples we have been using:
Consider you have n different objects. The number of ways you can select k times among them with replacement is n^k.
Example 16. Consider a six-sided die. It can assume 6 different values. Hence, if we throw the die 3 times in a row, we can observe 6³ different outcomes.
Consider you have n different objects. If the order of selection matters, the number of ways you can select k ≤ n of these objects is n! / (n − k)!.
Example 17. Consider there are n participants in a given chess tournament. In how many ways can the list of top 3 be composed? n! / (n − 3)!.
Consider you have n different objects. If the order of selection doesn't matter, the number of ways you can select k ≤ n of these objects is (n choose k) = n! / (k!(n − k)!).
Example 18. The number of poker hands is (52 choose 5).
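Lemma 7 reduces probability computations to counting, so equiprobable problems can be checked by brute-force enumeration. A minimal Python sketch (mine, for an example that is not among the exercises: the probability that two fair dice sum to 7):

    from itertools import product
    from fractions import Fraction

    Omega = list(product(range(1, 7), repeat=2))     # all 36 equiprobable outcomes of two dice
    A = [w for w in Omega if sum(w) == 7]            # event: the two dice sum to 7

    print(Fraction(len(A), len(Omega)))              # 1/6, i.e. |A| / |Omega|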
2.5.1 Exercises
Exercise 27. Consider we throw a fair coin 3 times.
a. What is the sample space?
b. What is the probability of each outcome?
c. What set corresponds to obtain at least two heads? What is the probability of this set?
Exercise 28.
a. How many distinct 7-digit phone numbers are there? Assume that the first digit of any telephone number
cannot be 0 or 1.
b. If I pick a number equally likely among all valid phone numbers, what is the probability that I pick 8939240?
c. What about any number of the form 893 924z, where z can be any digit between 0 and 8?
d. What about a number such that no 2 digits are the same?


Exercise 29. Suppose there are 4 people in a room. What is the probability that no two of them celebrate their birthdays in the same month? Assume the probabilities of having a birthday in each month are equal and the birthdays of two people are independent from one another. Can you think of a situation in which assuming independence wouldn't be reasonable?
Exercise 30. Consider that a fair six-sided die is tossed twice. Consider that all outcomes are equally probable.
Let X be a random variable that denotes the outcome of the first die. Let Y be a random variable that denotes the
outcome of the second die. Find P(|X − Y| ∈ {0, 1, 2}).
Exercise 31. (Challenge) Player A has n+1 coins, while player B has n coins. Both players throw all of their
coins simultaneously and observe the number of heads each one gets. Assuming all the coins are fair, what is the
probability that A obtains more heads than B?
Exercise 32 (Challenge). A rook can capture another if they are both in the same row or column of the chess
board. If one puts 8 rooks on the board such that the different placements are equally probable, what is the probability
that no rook can capture another?
Exercise 33 (Challenge). Two magicians perform the following trick: A person randomly picks 5 cards out of
a well-shuffled deck and hands them to Magician 1. Magician 1 lays out the five cards in the following way: the first card faces down, and the next four cards face up in an order he chooses. Magician 2 looks at the arranged
cards and guesses which one is facing down. What plan can the magicians agree upon so that Magician 2 always
answers correctly?


2.6 Conditional Probability


Conditional probability is an extension of probability. Let A and B be two events; the conditional probability of A given B is denoted by P(A|B). We interpret this probability as the uncertainty about A assuming that B is true. In this sense, regular probability is a particular case of conditional probability, since Ω is always true and, therefore, P(A) = P(A|Ω). Besides the usual axioms of probability (Definition 9), conditional probabilities also satisfy an extra axiom:
Definition 14 (Axiom of Conditional Probabilities).
P(A ∩ B) = P(A)P(B|A)
That is, the probability of A and B happening is the same as the probability of A happening multiplied by the
probability of B happening given that A happened.
Example 19. Consider the outcome of a fair six-sided die. Ω = {1, 2, 3, 4, 5, 6} and all elements of Ω are equally likely. Let Odd correspond to the event that the outcome is an odd number, Odd = {1, 3, 5}. Observe that P(Odd) = 1/2 and P({3}) = 1/6. Using Definition 14,
P({3} ∩ Odd) = P(Odd)P({3}|Odd)
Since {3} ⊆ Odd, {3} ∩ Odd = {3} and
P({3}) = P(Odd)P({3}|Odd)
Substituting the values for P({3}) and P(Odd), we get P({3}|Odd) = 1/3.
Example 20. Consider the outcome of a fair six-sided die. Ω = {1, 2, 3, 4, 5, 6} and all elements of Ω are equally likely. Let A = {3, 4, 5} and B = {1, 2, 3, 4}. Using Definition 14,
P(A ∩ B) = P(A|B)P(B)
P({3, 4}) = P(A|B)P({1, 2, 3, 4})
Hence, P(A|B) = 2/4. We can also compute P(B|A):
P(A ∩ B) = P(B|A)P(A)
P({3, 4}) = P(B|A)P({3, 4, 5})
That is, P(B|A) = 2/3 and, therefore, P(A|B) ≠ P(B|A).

Example 21. Let A ⊆ B. Using Definition 14,
P(A ∩ B) = P(A)P(B|A)
Since A ⊆ B, P(A ∩ B) = P(A). Hence, P(B|A) = 1. Similarly,
P(A ∩ B) = P(B)P(A|B)
P(A) = P(B)P(A|B)
and P(A|B) = P(A)/P(B).
Definition 15. Two events A and B are independent if P(A ∩ B) = P(A)P(B).


Lemma 8. A and B are independent if and only if P(A|B) = P(A). In other words, A and B are independent if and only if one's uncertainty about A does not change assuming that B is true.
Proof. Assume P(A|B) = P(A). Recall from Definition 14 that P(A ∩ B) = P(A|B)P(B). Using the assumption, P(A ∩ B) = P(A)P(B) and A and B are independent.
Assume A and B are independent. Hence, P(A)P(B) = P(A ∩ B) = P(A|B)P(B). Putting both ends of the equality together and dividing by P(B) (assumed positive), P(A|B) = P(A).
Example 22. Let Ω = {(H, H), (H, T), (T, H), (T, T)} and all outcomes be equiprobable. Let A denote "The outcome of the first coin flip is heads", A = {(H, H), (H, T)}. Let B be "The outcome of the second coin flip is heads", B = {(H, H), (T, H)}. Observe that A ∩ B = {(H, H)} and P(A ∩ B) = 1/4. Also, P(A) = P(B) = 1/2. Hence, since P(A ∩ B) = P(A)P(B), A and B are independent.
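Independence can be verified by the same kind of enumeration used in Example 22. A minimal Python sketch (mine, again using P(A) = |A|/|Ω| for equiprobable outcomes):

    from itertools import product
    from fractions import Fraction

    Omega = list(product("HT", repeat=2))            # two coin flips, 4 equiprobable outcomes
    P = lambda E: Fraction(len(E), len(Omega))

    A = [w for w in Omega if w[0] == "H"]            # first flip is heads
    B = [w for w in Omega if w[1] == "H"]            # second flip is heads
    A_and_B = [w for w in A if w in B]

    assert P(A_and_B) == P(A) * P(B)                 # A and B are independent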


A common mistake is to believe that one can conclude independence from disjointness or the reverse. Recall that A and B are disjoint if A ∩ B = ∅, that is, P(A ∩ B) = 0. In words, A and B are disjoint if it is impossible for A and B to happen simultaneously. On the other hand, A and B are independent if P(A ∩ B) = P(A)P(B). Hence, for example, if P(A) > 0 and P(B) > 0, two events cannot simultaneously be independent and disjoint.
Definition 16. The events A1, A2, . . . , An are jointly independent if, for every I ⊆ {1, 2, . . . , n},
P(∩_{i∈I} Ai) = ∏_{i∈I} P(Ai)

Theorem 2 (Multiplication rule). Let A1, A2, . . . , An be events. Then
P(∩_{i=1}^{n} Ai) = P(A1)P(A2|A1)P(A3|A1 ∩ A2) · · · P(An | ∩_{i=1}^{n−1} Ai)
Remark: it is common to denote P(∩_{i=1}^{n} Ai) by P(A1, . . . , An).


2.6.1 Exercises
Exercise 34. Let A and B be two events such that P(A) = 1/2, P(B) = 1/3 and P(A ∩ B) = 1/4. Find:
a. P(A|B)
b. P(A|Bᶜ)
c. P(B|A)
d. P(Aᶜ|Bᶜ)

Figure 1
Exercise 35. Explain how the cartoon in Figure 1, from http://xkcd.com/795/, relates to the concept of
conditional probability.
Exercise 36. Consider that a coin is flipped twice. I announce that there was at least 1 heads. What is the
probability that there were 2 heads?
Exercise 37. A drug prescription states the following information:
There is a 10% chance of experiencing headache (event H).
There is a 15% chance of experiencing nausea (event N ).
There is a 5% chance of experiencing both side effects.
a. Are the events H and N disjoint? Why?
b. Are the events H and N independent? Why?
c. What is the probability of experiencing at least one of the two side effects?
d. What is the probability of experiencing exactly one of the two side effects?
e. What is the probability of experiencing neither headache nor nausea?
f. What is the probability of experiencing headache given that you experienced at least one of the two side
effects?
Exercise 38. Assume a card is taken from a well-shuffled deck. Let A denote "the card is an ace" and B denote "the card's suit is diamonds". Show that A and B are independent.
Exercise 39. Two cards are taken from a well-shuffled deck:
a. Let A be "the first card is red" and B be "the second card is red". Compute P(A ∩ B). Are A and B independent?
b. Let A be "the first card is red" and B be "the second card is a J, Q or K". Compute P(A ∩ B). Are A and B independent?
Exercise 40. Show that:
1. A and B are independent if and only if P(B|A) = P(B).
2. A and B are independent if and only if P(A|Bᶜ) = P(A).
Exercise 41.
a. If A is independent of B and B is independent of C, are A and C independent? Why?
b. If A is independent of B, B is independent of C and A is independent of C, is P(A ∩ B ∩ C) = P(A)P(B)P(C)?
Exercise 42. Prove Theorem 2.
Exercise 43. A fair coin is tossed 10 times. What is the probability that no two consecutive heads and no two
consecutive tails appear?
Exercise 44. Let A be an event. Prove that P(·|A) is indeed a probability measure.
Exercise 45. Let X denote the outcome of a fair six-sided die. Find P((X − 3)² = 1 | X ≠ 3).
Exercise 46. (Challenge) Player A has n+1 coins, while player B has n coins. Both players throw all of their
coins simultaneously and observe the number of heads each one gets. Assuming all the coins are fair, what is the
probability that A obtains at least as many heads as B? (This problem is quite hard. You are not required to know
how to solve this one.)
Exercise 47. Prove that if A1 , . . . , An are independent events and their union is the sample space, then P(Ai ) = 1
for at least one i.


2.7 The Law of Total Probability and Bayes Theorem


Example 23. Consider that there are 2 coins inside a box. You know that one of them is fair and that the other one is biased, with probability 1/3 of heads. Suppose you remove either one of the coins with equal probability, flip that coin twice and obtain two heads.
Denote by C_{1/2} the event that you remove the fair coin, by C_{1/3} the event that you remove the biased coin and by H2 the event that you obtain two heads. The problem states that P(C_{1/2}) = P(C_{1/3}) = 1/2. We also know that P(H2|C_{1/2}) = 1/4 and P(H2|C_{1/3}) = 1/9. What is your belief about which coin you picked after performing the flips? In other words, can you use the previous probability values and the axioms of probability to determine P(C_{1/2}|H2)? What about the value of P(H2)?
The answer is yes and, by the end of class, we'll be able to solve this problem!
In the previous classes we proved various simple lemmas that follow from the axioms of probability. In this section we'll use these lemmas as building blocks for two general results. In order to fully appreciate the beauty of the next results, try your best shot at Example 23 for a few minutes.
Theorem 3 (Law of Total Probability). Let (An)n∈N be a partition of Ω and B be an event.
P(B) = ∑_{n∈N} P(An)P(B|An)
Proof. Observe that
B = B ∩ Ω    (4)
Also, since (An)n∈N is a partition of Ω,
Ω = ∪_{n∈N} An    (5)
Using equation 5 on equation 4, obtain:
B = B ∩ (∪_{n∈N} An)    (6)
and, therefore,
B = ∪_{n∈N} (B ∩ An)    (7)
P(B) = P(∪_{n∈N} (B ∩ An))    (8)
Next, since (An)n∈N is a partition, it is a disjoint sequence. Hence, for every i ≠ j, Ai ∩ Aj = ∅ and
(B ∩ Ai) ∩ (B ∩ Aj) = B ∩ (Ai ∩ Aj)    (9)
= B ∩ ∅    (10)
= ∅    (11)
That is, (B ∩ An)n∈N is also a disjoint sequence. Thus, from the axioms of probability (Definition 9),
P(∪_{n∈N} (B ∩ An)) = ∑_{n∈N} P(B ∩ An)    (12)
Finally, from the axiom of conditional probability (Definition 14), for each n ∈ N, P(B ∩ An) = P(An)P(B|An). Using equations 8 and 12, conclude that:
P(B) = ∑_{n∈N} P(An)P(B|An)    (13)

Lemma 9. If A1, . . . , An partitions Ω and B is an event, then P(B) = ∑_{i=1}^{n} P(Ai)P(B|Ai).
Proof. If A1, . . . , An partitions Ω, the sequence (Ci)i∈N such that Ci = Ai for i ≤ n and Ci = ∅ for i > n also partitions Ω. The proof follows straight from Theorem 3 and P(∅) = 0.
The previous result allows us to relate an unconditional probability P(B) to conditional probabilities P(B|An ).
Example 24 (Example 23 continued). The problem description tells us the probability of obtaining two heads once the coin is decided, that is, P(H2|C_{1/2}) and P(H2|C_{1/3}). Nevertheless, it doesn't directly tell us the probability of obtaining H2 without knowing which coin was picked, P(H2). Observe that C_{1/2}, C_{1/3} partitions Ω and, therefore, it follows from Lemma 9 that
P(H2) = P(C_{1/2})P(H2|C_{1/2}) + P(C_{1/3})P(H2|C_{1/3})
      = (1/2)(1/4) + (1/2)(1/9) = 13/72
Simple, right? We're starting to benefit from building a rigorous theory instead of relying on intuition alone.
Theorem 4 (Bayes Theorem). Let (Ai)i∈N be a partition of Ω and B be an event. For every n ∈ N,
P(An|B) = P(An)P(B|An) / ∑_{i∈N} P(Ai)P(B|Ai)
Proof. Recall from the axiom of conditional probability (Definition 14) that
P(An|B) = P(An ∩ B) / P(B)    (14)
Using the axiom of conditional probability again, P(An ∩ B) = P(An)P(B|An). Using equation 14,
P(An|B) = P(An)P(B|An) / P(B)    (15)
Finally, observe that (Ai)i∈N is a partition of Ω and, therefore, using Theorem 3, P(B) = ∑_{i∈N} P(Ai)P(B|Ai). Plugging this value into equation 15,
P(An|B) = P(An)P(B|An) / ∑_{i∈N} P(Ai)P(B|Ai)    (16)

Lemma 10. Let A1, . . . , An partition Ω and B be an event.
P(Ai|B) = P(Ai)P(B|Ai) / ∑_{j=1}^{n} P(Aj)P(B|Aj)
Proof. Follow the proof of Theorem 4. In the last step, use Lemma 9 instead of Theorem 3.
Example 25 (Example 23 continued). The problem description tells us the probability of obtaining two heads given the bias of the coin, P(H2|C_{1/2}) = 1/4 and P(H2|C_{1/3}) = 1/9. It is less intuitive to find out what one learns about the bias of the coin by observing the two heads, P(Ci|H2). Bayes Theorem provides the answer.
Recall that C_{1/2}, C_{1/3} is a partition of Ω. Hence, by Lemma 10,
P(C_{1/2}|H2) = P(C_{1/2})P(H2|C_{1/2}) / [P(C_{1/2})P(H2|C_{1/2}) + P(C_{1/3})P(H2|C_{1/3})]
             = [(1/2)(1/4)] / [(1/2)(1/4) + (1/2)(1/9)]
             = 9/13
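The computations in Examples 24 and 25 amount to a weighted sum followed by a normalization, which is easy to reproduce in code. The following Python sketch (mine; the dictionary keys are hypothetical labels for the two coins) computes P(H2) by the law of total probability and P(C_{1/2}|H2) by Bayes Theorem:

    from fractions import Fraction as F

    prior = {"fair": F(1, 2), "biased": F(1, 2)}              # P(C_1/2) and P(C_1/3)
    lik   = {"fair": F(1, 2) ** 2, "biased": F(1, 3) ** 2}    # P(H2 | coin) = (prob. of heads)^2

    p_H2 = sum(prior[c] * lik[c] for c in prior)              # law of total probability
    posterior = {c: prior[c] * lik[c] / p_H2 for c in prior}  # Bayes Theorem

    print(p_H2)               # 13/72
    print(posterior["fair"])  # 9/13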

2.7.1 Exercises
Exercise 48. Assume 5% of men and 0.25% of women are color-blind. Assume a random person is selected and he/she is color-blind. Assuming that there are the same number of men and women, what is the probability that this person is a man?
Exercise 49. Assume that there exist 4 fair coins in a bag. A person picks a number of coins with equal probability
among {1, 2, 3, 4}. Next, the person throws all the coins he picked. What is the probability that all coins land tails,
without knowing how many were picked?
Exercise 50. Medical case histories indicate that different illnesses may produce identical symptoms. Suppose
that a particular set of symptoms, H, occurs only when one of three illnesses, I1 , I2 or I3 occurs. Assume that the
simultaneous occurrence of more than one of these illnesses is impossible. Also assume that:
a. P(I1 ) = 0.01; P(I2 ) = 0.05; P(I3 ) = 0.02.
b. P(H|I1 ) = 0.90; P(H|I2 ) = 0.95; P(H|I3 ) = 0.75.
Assuming that an ill person exhibits the symptoms, H, what is the probability that the person has illness I1 ?

Exercise 51. Consider that each component C1, . . . , Cn in a system fails independently with probability p. A system fails if there is no operational path from B to E. What is the probability that the following systems fail?
a. Series system: B → C1 → C2 → E
b. Parallel system: B → C1 → E and B → C2 → E (C1 and C2 are in parallel)
c. Mixed system: B → C1, followed by C1 → C2 → E or C1 → C3 → E (C1 in series with C2 and C3, which are in parallel)
Exercise 52. (Monty Hall) There are 3 doors: A, B and C. There is a car behind one of these doors. If you
pick the right one, you get the car. At first, you believe that it is equally likely that the car is behind any one of
the doors. You pick door A. Next, the show's presenter opens door B, shows that it is empty and allows you to
change to door C. Is it a good idea to change? Assume the showman would always open a door with no prize and
offer you the chance to change your door, no matter if you initially chose the door with the prize or not.
Exercise 53. (Polya's urn) Consider that an urn has 1 black ball and 1 white ball. Every time you draw a ball,
you must put it back into the urn and add an extra ball with the same color. What is the probability that you get
a white ball in your 3rd draw from the urn?
Exercise 54. You have 12 red, 12 yellow balls and 2 urns. Consider that the balls are distributed among the urns,
one of the urns is randomly selected (the probabilities that you choose each urn are equal) and a ball is drawn from
the urn that was selected.
a. Assume you put 2 red balls and 4 yellow balls in the first urn and the rest of the balls in the second urn.
What is the probability that you draw a yellow ball?
b. (Challenge) What way of distributing the balls among the urns maximizes your probability of drawing a
yellow ball?
Exercise 55. Probability Theory was used in the criminal case People v. Collins. In this case, a woman had
her purse robbed. Witnesses claimed that a couple running from the scene was composed of a black man with a beard and a mustache and a blond girl with her hair in a ponytail. Witnesses also said the couple drove off in
a yellow car. Malcolm and Janet Collins were found to satisfy all the traits previously presented. A professor
of mathematics also stated in Court that, if a couple were randomly selected, one would obtain the following
probabilities:

Event                          Probability
Man with moustache             1/4
Girl with blonde hair          1/3
Girl with ponytail             1/10
Black man with beard           1/10
Interracial couple in a car    1/1000
Partly yellow car              1/10

Hence, the probability that all of these characteristics are found in a randomly chosen couple is 1 in 12,000,000.
The prosecution claimed that this constituted proof beyond reasonable doubt that the defendants were guilty. Do
you find this argument compelling? Why?
Exercise 56. Prove that if (An)n∈N and (Bn)n∈N are two sequences of events such that lim_{n→∞} P(An) = 1 and lim_{n→∞} P(Bn) = p, then lim_{n→∞} P(An ∩ Bn) = p.


3 Random Variables
3.1 Distribution of Random Variables
Recall that a random variable is an unknown number, that is, a function from Ω to R (Definition 12). A discrete
random variable is a random variable that only assumes a countable number of values.
Example 26. Consider that a person flips a fair coin once. The possible outcomes are heads or tails and Ω = {H, T}. Let X denote the number of heads that are observed. That is, X(H) = 1 and X(T) = 0.
Example 27. Consider that a person tosses a fair coin until he obtains the first heads or completes 3 tosses, Ω = {H, TH, TTH, TTT}. Denote by X the number of coin flips the person performs. X is a random variable such that X(H) = 1, X(TH) = 2, X(TTH) = 3 and X(TTT) = 3. X is discrete and only assumes a finite number of values: 1, 2 or 3.
Example 28. A fair coin is tossed until it lands heads, Ω = {H, TH, TTH, TTTH, . . .}. Denote by X the number of coin flips the person performs. X is a discrete random variable and can assume a countably infinite number of values: X can assume any value in N \ {0}. For example, X(H) = 1, X(TH) = 2, X(TTH) = 3, . . .
Definition 17. Let X be a random variable. For x ∈ R, we define pX(x) = P(X = x) = P({w : X(w) = x}). We call the function pX : R → [0, 1] the probability mass function (pmf) of X.
Remark: The set {w : X(w) = x} is often denoted by X⁻¹({x}).
Example 29. Consider Example 26.
pX (0) = P(X = 0) = P({w : X(w) = 0}) = P({T }) = 0.5
pX (1) = P(X = 1) = P({w : X(w) = 1}) = P({H}) = 0.5
Doesn't it feel good to write pX(1) instead of P({w : X(w) = 1})? This is the power of good notation. . .
Example 30. Consider Example 27. Observe that the elements of Ω are not equiprobable. Hence, even though {H} and {TH} are unitary sets, their probabilities are different.
In order to obtain the probabilities of the unitary subsets of Ω, define the event Hi as "the i-th coin flip is heads". Observe that all Hi are jointly independent and, since the coin is fair, P(Hi) = 0.5. Hence,
P({H}) = P(H1) = 0.5
P({TH}) = P(H1ᶜ ∩ H2) = P(H1ᶜ)P(H2) = 0.25
P({TTH}) = P(H1ᶜ ∩ H2ᶜ ∩ H3) = P(H1ᶜ)P(H2ᶜ)P(H3) = 0.125
P({TTT}) = P(H1ᶜ ∩ H2ᶜ ∩ H3ᶜ) = P(H1ᶜ)P(H2ᶜ)P(H3ᶜ) = 0.125
Thus, we can compute the pmf of X:
pX(1) = P(X = 1) = P({w : X(w) = 1}) = P({H}) = 0.5
pX(2) = P(X = 2) = P({w : X(w) = 2}) = P({TH}) = 0.25
pX(3) = P(X = 3) = P({w : X(w) = 3}) = P({TTH, TTT}) = 0.25

Example 31. Consider Example 28. Observe that, once again, the elements of Ω are not equiprobable. Define the event Hi as "the i-th coin flip is heads". Observe that all Hi are jointly independent and, since the coin is fair, P(Hi) = 0.5. Hence,
pX(1) = P(X = 1) = P({w : X(w) = 1}) = P(H1) = 0.5
pX(2) = P(X = 2) = P({w : X(w) = 2}) = P(H1ᶜ ∩ H2) = 0.25
...
pX(n) = P(X = n) = P({w : X(w) = n}) = P(H1ᶜ ∩ . . . ∩ H_{n−1}ᶜ ∩ Hn) = 1/2^n
...
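The pmf pX(n) = 1/2^n can also be estimated by simulation. A minimal Python sketch (mine; the sample size is arbitrary) that flips a virtual fair coin until the first heads and compares the empirical frequencies with the exact values:

    import random
    from collections import Counter

    def flips_until_heads():
        n = 1
        while random.random() >= 0.5:       # tails with probability 1/2
            n += 1
        return n

    random.seed(0)
    counts = Counter(flips_until_heads() for _ in range(100_000))
    for n in range(1, 6):
        print(n, counts[n] / 100_000, 1 / 2 ** n)    # empirical pX(n) vs. exact 1/2^n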

[Figure 2: Probability of obtaining each number of heads in 20 coin flips (probability vs. number of heads).]
Lemma 11. Let X be a discrete random variable and pX be the pmf of X. Let 𝒳 be the set of possible values of X.
For every x ∈ 𝒳, 0 ≤ pX(x) ≤ 1.
∑_{x∈𝒳} pX(x) = 1.
Proof.
pX(x) = P(X = x) = P({w : X(w) = x}). Recall that 0 ≤ P({w : X(w) = x}) ≤ 1.
∑_{x∈𝒳} pX(x) = ∑_{x∈𝒳} P({w : X(w) = x}). Observe that, for x1 ≠ x2, the events {w : X(w) = x1} and {w : X(w) = x2} are disjoint. Hence, by countable additivity,
∑_{x∈𝒳} pX(x) = P(∪_{x∈𝒳} {w : X(w) = x}) = P(Ω) = 1

There are many ways to represent a pmf: for example, a formula, a table of values or just a list of values. A useful visualization tool is to plot the probability values against the possible values of X, as shown in Figure 2. Any probability associated to X may be calculated using a pmf. One may also use cumulative distribution functions, i.e., FX(x) := P(X ≤ x); however, we leave this for continuous random variables (Chapter 5).
Definition 18. Let X1, . . . , Xn be discrete random variables. We say that they are independent if, for every x1, . . . , xn ∈ R,
P(X1 = x1, . . . , Xn = xn) := P(∩_{i=1}^{n} {Xi = xi}) = ∏_{i=1}^{n} P(Xi = xi)
Therefore, for every x1, . . . , xn, the events {w ∈ Ω : Xi(w) = xi} are jointly independent.


3.1.1 Exercises
Exercise 57. Let X be a random variable such that X ∈ N and pX(i) = c · 2⁻ⁱ. Find c.
Exercise 58. Consider that you flip a coin with probability p of heads 3 times. Let X be the number of heads you observe. Find the pmf of X.
Exercise 59. You throw a four-sided die twice. Let X1 and X2 denote, respectively, the outcomes of the first
and second throw. Let Y = X1 + X2 . Find the pmf of Y and of 2X1 . Observe that the pmf of summing two die
outcomes is different from the pmf of multiplying a die outcome by two.
Exercise 60. Consider you pick a coin with equal probability among a fair coin and a coin with probability p of
landing heads. Next, you throw the coin you picked 3 times. Let X denote the number of heads you observe. Find
the pmf of X.
Exercise 61. A box has 4 blue balls and 4 red balls.
a. You draw 3 balls with replacement and X denotes the number of blue balls drawn. Find the pmf of X.
b. You draw 3 balls without replacement and Y denotes the number of blue balls drawn. Find the pmf of Y.
c. Describe in English the difference between the pmfs of X and Y.
Exercise 62. Let X and Y be two independent discrete variables. Are eX and eY independent?
Exercise 63.
a. Let X and Y be two independent random variables. Show that, for any functions f and g, f (X) is independent of g(Y ).
b. Show an example of discrete random variables X, Y and functions f and g such that f(X) is independent of g(Y) but X is not independent of Y.


3.2 (Conditional) Expected value


Definition 19. Let X be a discrete random variable and A an event. The expected value of X given A is denoted by E[X|A] and
E[X|A] = ∑_{w∈Ω} X(w) P({w}|A)

Definition 20. Let X be a discrete random variable. The expected value of X is denoted by E[X] and is equal to E[X|Ω]. That is,
E[X] = ∑_{w∈Ω} X(w) P({w})

Lemma 12 (Law of the unconscious statistician). Let X be a discrete random variable with pmf pX that assumes values in 𝒳:
E[f(X)] = ∑_{x∈𝒳} f(x) pX(x)
Proof.
E[f(X)] = ∑_{w∈Ω} f(X(w)) P({w})
        = ∑_{x∈𝒳} ∑_{w:X(w)=x} f(X(w)) P({w})
        = ∑_{x∈𝒳} f(x) ∑_{w:X(w)=x} P({w})
        = ∑_{x∈𝒳} f(x) pX(x)

In particular, observe that it follows from Lemma 12 that E[X] = ∑_{x∈𝒳} x pX(x). That is, E[X] is the weighted average of the values in 𝒳 using pX as the weights.

Example 32 (Continuation of Example 26).
E[X] = 0 · pX(0) + 1 · pX(1) = 1 · 0.5 = 0.5
Example 33 (Continuation of Example 27).
E[X] = 1 · pX(1) + 2 · pX(2) + 3 · pX(3) = 1 · 0.5 + 2 · 0.25 + 3 · 0.25 = 1.75
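Because E[X] is just a weighted average over the pmf, it can be computed from any pmf stored as a dictionary. A minimal Python sketch (mine) that reproduces Example 33 and also uses Lemma 12 to obtain E[X²]:

    def expected_value(pmf, f=lambda x: x):
        # Law of the unconscious statistician: E[f(X)] = sum over x of f(x) * pX(x)
        return sum(f(x) * p for x, p in pmf.items())

    pmf_X = {1: 0.5, 2: 0.25, 3: 0.25}                 # pmf from Example 30
    print(expected_value(pmf_X))                       # 1.75, as in Example 33
    print(expected_value(pmf_X, lambda x: x ** 2))     # E[X^2]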
Lemma 13. Let X be a discrete random variable such that X ∈ N,
E[X] = ∑_{i=1}^{∞} P(X ≥ i)
Proof.
E[X] = ∑_{j=0}^{∞} j · pX(j) = ∑_{j=1}^{∞} ∑_{i=1}^{j} pX(j) = ∑_{i=1}^{∞} ∑_{j=i}^{∞} pX(j) = ∑_{i=1}^{∞} P(X ≥ i)

[Figure 3: pmfs for the number of heads in 20 coin flips with probability p of heads, for p = 0.1, 0.25, 0.5, 0.75. The red dotted line is the expected value.]

Lemma 14 (Linearity of Expected Value).
E[∑_{i=1}^{n} ci Xi | A] = ∑_{i=1}^{n} ci E[Xi | A]
Proof.
E[∑_{i=1}^{n} ci Xi | A] = ∑_{w∈Ω} ∑_{i=1}^{n} ci Xi(w) P({w}|A)
                         = ∑_{i=1}^{n} ci ∑_{w∈Ω} Xi(w) P({w}|A)
                         = ∑_{i=1}^{n} ci E[Xi|A]

Example 34 (20 coin flips). Consider that you flip 20 coins with probability of heads p and count the number of heads. Figure 3 presents pmfs for different values of p. Their means correspond to the vertical red lines.


Lemma 15 (Law of total expectation). Let A1, . . . , An be a partition of Ω and X be a discrete random variable.
E[X] = ∑_{i=1}^{n} E[X|Ai] P(Ai)
Proof.
∑_{i=1}^{n} E[X|Ai] P(Ai) = ∑_{i=1}^{n} ∑_{w∈Ω} X(w) P({w}|Ai) P(Ai)
= ∑_{i=1}^{n} ∑_{w∈Ω} X(w) P({w} ∩ Ai)    (Definition 14)
= ∑_{w∈Ω} X(w) ∑_{i=1}^{n} P({w} ∩ Ai)
= ∑_{w∈Ω} X(w) P(∪_{i=1}^{n} ({w} ∩ Ai))    (Lemma 2 and the Ai disjoint)
= ∑_{w∈Ω} X(w) P({w}) = E[X]    (Since (Ai)_{i=1}^{n} partition Ω, ∪_{i=1}^{n} Ai = Ω)
3.2.1 Exercises
Exercise 64. Recall that IA denotes the indicator function (Definition 13) of event A. If P(A) = 0.2, P(B) = 0.9 and P(A ∩ B) = 0.1,
a. Compute the pmf of 10IA + 5IB .
b. Compute E[10IA + 5IB ].
Exercise 65. Consider that you flip a coin with probability p of landing heads 200 times. Let X denote the total number of heads. Compute E[X].
Hint: Denote by Hi the event that the i-th coin flip is heads. Observe that X = ∑_{i=1}^{200} I_{Hi}.

Exercise 66. Let X be the number of heads observed in two flips of a fair coin. Compute E[X²] and E[X]².
Exercise 67. Let X1, . . . , Xn be discrete random variables such that, for every i ∈ {1, . . . , n}, E[Xi] = μ ∈ R.
a. Let pi ≥ 0 be such that ∑_{i=1}^{n} pi = 1. Find E[∑_{i=1}^{n} pi Xi].
b. Let X̄ = (∑_{i=1}^{n} Xi)/n, commonly called the sample mean. Find E[X̄].

Exercise 68. Consider that you either choose a four-sided die or a six-sided die with equal probability. Next, you
throw the chosen die 1000 times. Let X denote the sum of the outcomes of the 1000 die throws. Compute E[X].
Exercise 69. Consider an urn with balls numbered from 1 to n. If one samples one ball at a time with replacement from the urn, what is the p.m.f. of the random variable X: the number of samples needed until the same ball is drawn twice for the first time? What is its expectation? Hint: use Lemma 13.

Exercise 70. Assume you have six keys with you. You are not sure which one opens the door you need to open,
so you start trying each of these keys. What is the average number of keys you'll need to try until you are able to
open the door?
Exercise 71. (Challenge) At iteration 1, a bacterial colony has a single bacterium. At each new iteration, each bacterium in the colony can either die (with probability 1 − p) or divide into two bacteria (with probability p). Let Xi denote the number of bacteria in the colony at iteration i.
a. Find E[X2].
b. Find E[Xn|X2 = 0] and E[Xn|X2 = 2].
c. Find lim_{n→∞} E[Xn].
Exercise 72. Prove that:
If, for every w ∈ Ω, X(w) = c for some c ∈ R, then E[X] = c.
If, for every w ∈ Ω, Z(w) ≤ X(w) ≤ Y(w), then E[Z] ≤ E[X] ≤ E[Y].


3.3 Variance
Let X be a random variable. In the previous subsection, we saw the definition of the expected value of X, E[X].
We saw that, intuitively, E[X] is a central value around which the possible values of X are dispersed.
In this subsection we present the variance of X, V ar[X]. The variance of X is a measure of the concentration
of the possible values of X around E[X].
Definition 21 (Variance). The variance of a discrete random variable X is defined as E[(X − E[X])²] and denoted by Var[X].
Example 35. Let H denote the event that the outcome of a single coin flip is heads. Consider that P(H) = p. Let's compute Var[IH]. Observe that pIH(0) = 1 − p and pIH(1) = p. Hence,
E[IH] = 0 · (1 − p) + 1 · p = p
Hence, Var[IH] = E[(IH − p)²]. If IH = 0, then (IH − p)² = p², and if IH = 1, then (IH − p)² = (1 − p)². Conclude that
Var[IH] = E[(IH − p)²]
= (1 − p) · p² + p · (1 − p)²
= p(1 − p)(p + 1 − p) = p(1 − p)
Observe that p(1 − p) is a parabola with roots 0 and 1 that assumes its maximum value at 0.5. In other words, the variance of IH is minimized if p = 0 or p = 1, since in these cases we know for sure whether we will observe heads or not. The variance is maximized for a fair coin and, in this sense, the flips of a fair coin oscillate the most around the expected value.
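The identity Var[IH] = p(1 − p) is easy to confirm numerically for any p by computing the variance directly from the pmf of IH. A minimal Python sketch (mine):

    def var_indicator(p):
        mean = 0 * (1 - p) + 1 * p                        # E[I_H] = p
        return (0 - mean) ** 2 * (1 - p) + (1 - mean) ** 2 * p

    for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
        print(p, var_indicator(p), p * (1 - p))           # the two columns agree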
Example 36. Let X denote the number of heads observed in two flips of a coin with probability p of heads. Let Hi denote the event that the i-th flip is heads. Let's compute Var[X]. The pmf of X is:
pX(0) = P({TT}) = P(H1ᶜ ∩ H2ᶜ) = (1 − p)²
pX(1) = P({TH, HT}) = P(H1ᶜ ∩ H2) + P(H1 ∩ H2ᶜ) = 2p(1 − p)
pX(2) = P({HH}) = P(H1 ∩ H2) = p²
Hence,
E[X] = 0 · (1 − p)² + 1 · 2p(1 − p) + 2 · p² = 2p
Thus, Var[X] = E[(X − 2p)²] and
E[(X − 2p)²] = (0 − 2p)²(1 − p)² + (1 − 2p)² · 2p(1 − p) + (2 − 2p)² · p²
= 2p(1 − p) · (2p(1 − p) + (1 − 2p)² + 2(1 − p)p) = 2p(1 − p)
That is, the variance of X is a parabola with minima at p = 0 and p = 1 and maximum at p = 0.5.
Lemma 16. Var[aX + b] = a² Var[X]
Proof.
Var[aX + b] = E[(aX + b − E[aX + b])²]    (Definition 21)
= E[(aX + b − aE[X] − b)²]    (Lemma 14)
= E[a²(X − E[X])²]
= a² E[(X − E[X])²] = a² Var[X]

The following lemma can often be useful for computing variances:


Lemma 17.
Var[X] = E[X²] − E[X]²
Proof.
Var[X] = E[(X − E[X])²]    (Definition 21)
= E[X² − 2E[X]X + E[X]²]
= E[X²] − 2E[X]E[X] + E[X]²    (E[X] ∈ R, Lemma 14)
= E[X²] − E[X]²

Example 37. In Example 36, we have
E[X²] = 0² · (1 − p)² + 1² · 2p(1 − p) + 2² · p² = 2p² + 2p.
It follows from Lemma 17 that
Var[X] = 2p² + 2p − (E[X])² = 2p² + 2p − (2p)² = 2p(1 − p),
which matches the value found in Example 36.
Example 38. Consider two coin flips of a coin with probability p of heads. Let Hi denote that the i-th flip is heads. Let's show that IH1 and IH2 are independent:
P(IH1 = 0 ∩ IH2 = 0) = (1 − p)² = P(IH1 = 0)P(IH2 = 0)
P(IH1 = 0 ∩ IH2 = 1) = (1 − p)p = P(IH1 = 0)P(IH2 = 1)
P(IH1 = 1 ∩ IH2 = 0) = p(1 − p) = P(IH1 = 1)P(IH2 = 0)
P(IH1 = 1 ∩ IH2 = 1) = p² = P(IH1 = 1)P(IH2 = 1)
Since IH1 and IH2 assume values in {0, 1} and, for every i1, i2 ∈ {0, 1}, P(IH1 = i1 ∩ IH2 = i2) is equal to P(IH1 = i1)P(IH2 = i2), conclude that IH1 and IH2 are independent.
Lemma 18. If X and Y are independent, E[XY ] = E[X]E[Y ].


Proof.
E[XY] = ∑_{x∈Im(X), y∈Im(Y)} x y P(X = x ∩ Y = y)    (for a proof of the first step, see Lemma ??)
= ∑_{x∈Im(X)} ∑_{y∈Im(Y)} x y P(X = x)P(Y = y)
= ∑_{y∈Im(Y)} y P(Y = y) ∑_{x∈Im(X)} x P(X = x)
= ∑_{x∈Im(X)} x pX(x) · ∑_{y∈Im(Y)} y pY(y)
= E[X]E[Y]

Observation: E[XY] = E[X]E[Y] does not imply that X and Y are independent. We'll see a counter-example in a future homework.
Lemma 19. If X and Y are independent, Var[X + Y] = Var[X] + Var[Y].
Proof.
Var[X + Y] = E[(X + Y − E[X + Y])²]
= E[(X − E[X] + Y − E[Y])²]
= E[(X − E[X])²] + E[(Y − E[Y])²] + E[2(X − E[X])(Y − E[Y])]
= Var[X] + Var[Y] + E[2(X − E[X])(Y − E[Y])]
Hence, it remains to show that E[2(X − E[X])(Y − E[Y])] = 0.
E[2(X − E[X])(Y − E[Y])] = 2E[XY − E[X]Y − XE[Y] + E[X]E[Y]]
= 2(E[XY] − E[X]E[Y] − E[X]E[Y] + E[X]E[Y])
= 2(E[XY] − E[X]E[Y])
Since X and Y are independent, E[XY] = E[X]E[Y], which completes the proof.
Example 39 (Continuation of Examples 36 and 38). We showed that X = I_{H_1} + I_{H_2} and that I_{H_1} and I_{H_2} are independent. Hence, Var[X] = Var[I_{H_1}] + Var[I_{H_2}]. From Example 35, Var[I_{H_1}] = Var[I_{H_2}] = p(1 − p). Hence, Var[X] = 2p(1 − p), which again confirms the calculations in Example 36.
Lemma 20. If X is a discrete random variable, Var[X] = 0 if, and only if, X is constant (i.e., there exists c ∈ R such that P(X = c) = 1).
Proof. Assume there exists c ∈ R such that P(X = c) = 1. Then E[X] = c and E[X²] = c², which implies that Var[X] = c² − c² = 0.
Now, let χ be the set of values that X assumes. If Var[X] = 0, we have that
Σ_{x ∈ χ} (x − E[X])² p_X(x) = 0.
Because p_X(x) > 0 for every x ∈ χ, this implies that (x − E[X])² = 0 for every x ∈ χ, i.e., x = E[X] for every x ∈ χ, which concludes the proof.
3.3.1 Exercises
Exercise 73. Show that Var[X] ≥ 0.
Exercise 74. Let X assume values in {−1, 0, 1}. Let p_X(−1) = p_X(1) = (1 − p)/2 and p_X(0) = p. Find Var[X]. For what value of p is it maximized? For what values of p is it minimized?


Exercise 75. Consider that X assumes values in {−a, a} and p_X(−a) = p_X(a) = 0.5. Find E[X] and Var[X]. What is the behavior of Var[X] as a function of a?
Exercise 76. Consider 3 flips of a coin with probability p of heads. Let X denote the number of heads that is
observed. Find V ar[X].
Exercise 77. Consider that I remove two balls without replacement and with equal probability from a box with 3
pink balls and 3 orange balls. Let X denote the number of orange balls that I remove. Find E[X] and V ar[X].
Exercise 78. If E[X²] = 4, E[X] = 1, E[Y²] = 9, E[Y] = 2 and X and Y are independent, find Var[X + 2Y + 3].
Exercise 79. Let X be a discrete random variable and d ∈ R.
a. Show that E[(X − d)²] = Var[X] + (E[X] − d)².
b. Prove that E[(X − d)²] is minimized when d = E[X].
Exercise 80. Let X_1, ..., X_n be independent random variables such that, for every i ∈ {1, ..., n}, E[X_i] = μ and Var[X_i] = σ². Let p_i ≥ 0 be such that Σ_{i=1}^n p_i = 1.
a. Show that p_1 X_1, ..., p_n X_n are jointly independent.
b. Find Var[Σ_{i=1}^n p_i X_i].
c. Recall that X̄ = (Σ_{i=1}^n X_i)/n. Find Var[X̄].
d. Show that, for any p_1, ..., p_n, Var[Σ_{i=1}^n p_i X_i] ≥ Var[X̄].
e. Combine your knowledge from Exercises 67 and 80. Provide arguments in English for why, among all random variables of the form Σ_{i=1}^n p_i X_i, X̄ provides the values that are closest to μ.

3.4 Covariance and the Vector Space of Random Variables


Definition 22 (Covariance). Let X and Y be two discrete random variables. Their covariance is
Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
Lemma 21. Cov[X, Y] = E[XY] − E[X]E[Y].
Proof.
Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
          = E[XY − XE[Y] − Y E[X] + E[X]E[Y]]
          = E[XY] − E[Y]E[X] − E[X]E[Y] + E[X]E[Y]        (Lemma 14)
          = E[XY] − E[X]E[Y]

Lemma 22 (Properties of Covariance). If X and Y are discrete random variables,
a. Cov[X, X] = Var[X] ≥ 0. Hence, Cov[X, X] = 0 only if X is a constant random variable.
b. Cov[X, Y] = Cov[Y, X].
c. Cov[aX + bY, Z] = a·Cov[X, Z] + b·Cov[Y, Z].
Proof.
a.
Cov[X, X] = E[X · X] − E[X]E[X]                            (Lemma 21)
          = E[X²] − E[X]² = Var[X]                         (Lemma 17)
The conclusion follows from Lemma 20.
b.
Cov[X, Y] = E[XY] − E[X]E[Y]                               (Lemma 21)
          = E[Y X] − E[Y]E[X] = Cov[Y, X]                  (Lemma 21)
c.
Cov[aX + bY, Z] = E[(aX + bY)Z] − E[aX + bY]E[Z]           (Lemma 21)
                = E[aXZ + bY Z] − E[aX + bY]E[Z]
                = aE[XZ] + bE[Y Z] − (aE[X] + bE[Y])E[Z]   (Lemma 14)
                = a(E[XZ] − E[X]E[Z]) + b(E[Y Z] − E[Y]E[Z])
                = a·Cov[X, Z] + b·Cov[Y, Z]                (Lemma 21)

Definition 23. Let Ω be a countable set and P a probability function on Ω.
1. Define V = {X : Ω → R such that E[X] = 0}.
2. For every X ∈ V and Y ∈ V, we define (X + Y) : Ω → R such that (X + Y)(ω) = X(ω) + Y(ω).
3. For every X ∈ V and a ∈ R, we define (a · X) : Ω → R such that (a · X)(ω) = aX(ω).
Lemma 23. (V, +, ·) is a vector space over R.
Proof. In order to prove that V is a vector space, it is enough to show the following:
1. Since E[0] = 0, 0 ∈ V.
2. If X ∈ V and Y ∈ V, E[X + Y] = E[X] + E[Y] = 0. Hence, X + Y ∈ V.
3. If a ∈ R and X ∈ V, E[aX] = aE[X] = 0. Hence, aX ∈ V.
Lemma 24. Cov is an inner product in V. Hence, √(Var[X]) is a norm in V.
Proof. Follows directly from Lemma 22.


Lemma 25 (Cauchy-Schwarz for random variables).
|Cov[X, Y]| ≤ √(Var[X]) · √(Var[Y]).
The equality holds if, and only if, there exist a, b ∈ R such that Y = aX + b.
Proof. Let V = X − E[X] and W = Y − E[Y]. Because V, W ∈ V and, from Lemma 24, covariance is an inner product, it follows that
|Cov[X, Y]| = |Cov[V, W]|
            ≤ √(Cov[V, V]) · √(Cov[W, W])                  (Cauchy-Schwarz inequality)
            = √(Var[V]) · √(Var[W])                        (Lemma 22)
            = √(Var[X]) · √(Var[Y])
Now, from the Cauchy-Schwarz inequality, we know that equality holds if, and only if, there exists b ∈ R, b ≠ 0, such that W = bV. In other words, equality holds if, and only if, there exists b ∈ R such that Y − E[Y] = b(X − E[X]), which concludes the proof.
Lemma 26 (Pythagorean Theorem for random variables).
Var[X + Y] = Var[X] + Var[Y] + 2Cov[X, Y]
Hence, if Cov[X, Y] = 0, then Var[X + Y] = Var[X] + Var[Y].
Proof.
Var[X + Y] = Cov[X + Y, X + Y]                             (Lemma 22)
           = Cov[X, X] + Cov[Y, Y] + 2Cov[X, Y]            (Lemma 24)
           = Var[X] + Var[Y] + 2Cov[X, Y]                  (Lemma 22)


Definition 24 (Correlation).
Corr[X, Y] = Cov[X, Y] / (√(Var[X]) · √(Var[Y]))
Let ⟨·, ·⟩ be an inner product and ‖·‖ be the norm generated by the inner product. Recall from linear algebra class that ⟨v₁, v₂⟩ / (‖v₁‖ ‖v₂‖) is the cosine of the angle between v₁ and v₂. Hence, using Lemma 24, we can interpret Corr[X, Y] as the cosine of the angle between the random variables X and Y. In other words, Corr[X, Y] is a measure of the linear association between X and Y.
Lemma 27. |Corr[X, Y]| ≤ 1.
Proof. Follows directly from applying Lemma 25 to Definition 24.
Lemma 28. Let X be a discrete random variable.
a. Let b ∈ R. Then Cov[b, X] = 0.
b. Let a ≠ 0. Then Corr[aX + b, X] = 1 if a > 0, and Corr[aX + b, X] = −1 if a < 0.
Proof.
a. Cov[b, X] = E[bX] − E[b]E[X] = bE[X] − bE[X] = 0.
b.
Corr[aX + b, X] = Cov[aX + b, X] / (√(Var[aX + b]) √(Var[X]))
                = (a·Cov[X, X] + Cov[b, X]) / (√(Var[aX + b]) √(Var[X]))    (Lemma 22)
                = a·Var[X] / (√(a² Var[X]) √(Var[X]))                       (Lemmas 22 and 16)
                = a/|a|

Under independence, the covariance and correlation must be zero:
Lemma 29. If X and Y are independent, then Cov(X, Y) = Corr(X, Y) = 0.
Proof. It follows from the fact that if X and Y are independent, then E[XY] = E[X]E[Y] (Lemma 18).
The opposite, however, is not true; see the exercises for an example.
It follows from Lemma 25 that the correlation has maximum absolute value if, and only if, X and Y are linearly dependent. This is why we've stated that covariance is a measure of linear association.
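To see Lemma 28 in action, the following is a minimal simulation sketch (not part of the original notes): it estimates Corr[aX + b, X] from a sample. The helper name sample_corr and the choices a = −2, b = 3 are illustrative assumptions.

import random

def sample_corr(xs, ys):
    # Sample correlation: estimate Cov[X,Y] / sqrt(Var[X] Var[Y]) from paired data.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

random.seed(0)
xs = [random.random() for _ in range(10000)]   # X uniform on (0, 1)
ys = [-2 * x + 3 for x in xs]                  # Y = aX + b with a = -2, b = 3
print(sample_corr(xs, ys))                     # approximately -1, as Lemma 28 predicts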
Covariance allows us to easily compute the variance of a sum of random variables that are not independent:

Lemma 30. Let X_1, ..., X_n be random variables. We have that
Var[Σ_{i=1}^n X_i] = Σ_{i=1}^n Var[X_i] + Σ_{i≠j} Cov[X_i, X_j]
Lemma 30 is a generalization of Lemma 26, and we leave its proof to the reader.
Example 40 (Ross). A group of n people throw their hats into the center of a room. The hats are mixed up, and then each person randomly selects one hat. Let X be the number of people that select their own hat. X can be written as the sum X = Σ_{i=1}^n X_i, where X_i is one if the i-th person selects his own hat, and zero otherwise. Now, E[X_i] = 1 · p_{X_i}(1) = 1/n. It follows that E[X] = n · 1/n = 1. We can also compute the variance of X. First notice that X_i² = X_i, and hence Var[X_i] = E[X_i²] − E[X_i]² = 1/n − (1/n)². Also, for i ≠ j, X_i X_j is a random variable that can only assume two values, 0 and 1. Moreover,
P(X_i X_j = 1) = P(X_i = 1, X_j = 1) = P(X_j = 1)P(X_i = 1 | X_j = 1) = (1/n) · 1/(n − 1).
It follows that E[X_i X_j] = 1/(n(n − 1)), and therefore
Cov[X_i, X_j] = E[X_i X_j] − E[X_i]E[X_j] = 1/(n(n − 1)) − (1/n)(1/n) = 1/(n²(n − 1)).
Finally, we notice that Cov[X_i, X_j] is the same for every i ≠ j. Because there are 2·(n choose 2) = n(n − 1) such pairs, it follows from Lemma 30 that
Var[X] = Var[Σ_{i=1}^n X_i] = n(1/n − 1/n²) + 2·(n choose 2)·1/(n²(n − 1)) = 1
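As a quick sanity check on Example 40, here is a small simulation sketch (not from the original notes; the function name and the choice n = 10 are illustrative). It estimates E[X] and Var[X] by repeatedly shuffling the hats and counting fixed points; both estimates should be close to 1.

import random

def hat_matches(n):
    # Shuffle the n hats and count how many people get their own hat back.
    hats = list(range(n))
    random.shuffle(hats)
    return sum(1 for person, hat in enumerate(hats) if person == hat)

random.seed(1)
n, reps = 10, 100000
samples = [hat_matches(n) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean, var)  # both should be close to 1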

3.4.1 Exercises
Exercise 81. Let P(A) = p_A, P(B) = p_B and P(A ∩ B) = p_{A∩B}.
a. Find Cov[I_A, I_B] and Corr[I_A, I_B].
b. Find a numerical value for Corr[I_A, I_B] when A = B.
c. Find a numerical value for Corr[I_A, I_B] when A is independent of B.
d. Find a numerical value for Corr[I_A, I_B] when A and B partition Ω.
e. Provide an interpretation in English for the previous items.
Exercise 82. Let Ω = {−1, 0, 1} and all outcomes be equally likely. Let X(ω) = ω and Y(ω) = ω².
a. Find Corr[X, Y ]. Are X and Y linearly associated?
b. Are X and Y independent?
Exercise 83. Let X_1, ..., X_n, Y_1, ..., Y_m be random variables, and let a_1, ..., a_n, b_1, ..., b_m be real numbers. Prove that
Cov(Σ_{i=1}^n a_i X_i, Σ_{j=1}^m b_j Y_j) = Σ_{i=1}^n Σ_{j=1}^m a_i b_j Cov(X_i, Y_j).

Exercise 84. Assume X_1, ..., X_n are independent and identically distributed (a.k.a. iid) random variables with variance σ² < ∞. Show that Cov[X_i − X̄, X̄] = 0.
Exercise 85. Assume X_1, ..., X_n are identically distributed random variables with a common covariance Cov[X_i, X_j] for every i ≠ j, variance σ² and mean μ. Compute the variance of X̄.


Exercise 86. Let X_1, X_2, ... be independent with common mean μ and common variance σ². For n = 1, 2, ..., let Y_n = X_n + X_{n+1} + X_{n+2}. For j = 0, 1, ..., find Cov(Y_n, Y_{n+j}).
Exercise 87. (Challenge) Prove the triangle inequality for random variables:
√(Var[X + Y]) ≤ √(Var[X]) + √(Var[Y])

4 Bernoulli Processes (a long example. . . )


4.1 Bernoulli Distribution
We say that the distribution of a random variable X is Bernoulli if X can assume values in {0, 1}. In certain
contexts, 0 is interpreted as a failure and 1 as a success. We say that X has distribution Bernoulli with parameter
p if P(X = 1) = p and write X Ber(p).
Example 41. Let H denote the event that the outcome of a fair coin flip is heads. I_H assumes values in {0, 1} and its distribution is Bernoulli. Furthermore, since the coin is fair, I_H ∼ Ber(0.5).
Lemma 31. If X ∼ Ber(p), then E[X] = p and Var[X] = p(1 − p).
Proof.
E[X] = 0 · (1 − p) + 1 · p = p
and
Var[X] = E[X²] − E[X]²
       = (0² · (1 − p) + 1² · p) − p²
       = p(1 − p)

Definition 25. We say that a sequence of random variables X_1, X_2, ... is a Bernoulli Process with parameter p if the random variables are jointly independent and, for each i ∈ N, X_i ∼ Bernoulli(p).
Example 42 (Coding). Write a code that generates a number according to the Bernoulli(p) distribution.

import random

def rbernoulli(p):
    # Returns True (interpreted as 1) with probability p and False (0) otherwise.
    return random.random() < p

Remark: All the code we present in this book is written in Python.

4.2 Binomial Distribution


Definition 26. Let X_1, X_2, ... be a Bernoulli Process with parameter p. We say that Σ_{i=1}^n X_i ∼ Binomial(n, p). That is, Σ_{i=1}^n X_i follows a Binomial distribution with parameters n and p.

One can interpret the Binomial distribution in the following way. Consider that one performs n independent
trials and each trial can be a success with probability p or a failure with probability 1 p. Let X denote the
number of successes we get after performing the n trials. X has distribution Binomial(n, p), where n is the number
of trials and p is the probability of success in each trial.
Lemma 32. If X has distribution Binomial(n, p), then, for 0 ≤ i ≤ n, P(X = i) = (n choose i) p^i (1 − p)^{n−i}.

Proof. In order for X = i, one must observe i successes and n − i failures. Observe that, since the trials are independent, the probability of every outcome with i successes and n − i failures is p^i (1 − p)^{n−i}.
Hence P(X = i) = c_{n,i} p^i (1 − p)^{n−i}, where c_{n,i} is the number of outcomes with i successes and n − i failures. Note that c_{n,i} corresponds to the number of anagrams of S...S F...F (i letters S and n − i letters F). There are n! permutations of the letters, but the i! permutations among the S's are the same, and so are the (n − i)! permutations among the F's. Hence c_{n,i} = n!/(i!(n − i)!) = (n choose i) and P(X = i) = (n choose i) p^i (1 − p)^{n−i}.
The following result is commonly useful when performing calculations with binomials.
Lemma 33 (Binomial Theorem).
Σ_{i=0}^n (n choose i) a^i b^{n−i} = (a + b)^n

Example 43. Let X have distribution Binomial(n, p). Let's use the Binomial Theorem to perform a sanity check and prove that P(X ∈ {0, 1, ..., n}) = 1.
P(X ∈ {0, 1, ..., n}) = Σ_{i=0}^n P(X = i)
                      = Σ_{i=0}^n (n choose i) p^i (1 − p)^{n−i}
                      = (p + 1 − p)^n = 1
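As an illustration of Lemma 32 and of the sanity check above, here is a small sketch (not part of the original notes) that evaluates the Binomial(n, p) pmf with math.comb and verifies numerically that it sums to 1; the function name and the values n = 10, p = 0.3 are illustrative.

from math import comb

def binomial_pmf(i, n, p):
    # P(X = i) = (n choose i) * p^i * (1 - p)^(n - i), as in Lemma 32.
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3
total = sum(binomial_pmf(i, n, p) for i in range(n + 1))
print(total)  # should be 1 (up to floating-point error), as in Example 43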
Lemma 34. If X has distribution Binomial(n, p), then E[X] = np and Var[X] = np(1 − p).
Proof.
E[X] = Σ_{i=0}^n i · P(X = i)
     = Σ_{i=1}^n i · (n choose i) p^i (1 − p)^{n−i}                            (canceling the i = 0 term)
     = Σ_{i=1}^n i · n!/((n − i)! i!) · p^i (1 − p)^{n−i}
     = np · Σ_{i=1}^n (n − 1)!/((n − i)!(i − 1)!) · p^{i−1}(1 − p)^{n−i}        (since i! = i · (i − 1)!)
     = np · Σ_{i=1}^n (n−1 choose i−1) p^{i−1}(1 − p)^{n−i}
     = np · Σ_{j=0}^{n−1} (n−1 choose j) p^j (1 − p)^{n−1−j}                    (calling j = i − 1)
     = np (p + 1 − p)^{n−1} = np                                                (Binomial Theorem, Lemma 33)
Also,
E[X(X − 1)] = Σ_{i=0}^n i(i − 1) · P(X = i)
            = Σ_{i=2}^n i(i − 1) · n!/((n − i)! i!) · p^i (1 − p)^{n−i}          (canceling the i = 0 and i = 1 terms)
            = n(n − 1)p² · Σ_{i=2}^n (n − 2)!/((n − i)!(i − 2)!) · p^{i−2}(1 − p)^{n−i}    (since i! = i(i − 1)·(i − 2)!)
            = n(n − 1)p² · Σ_{j=0}^{n−2} (n−2 choose j) p^j (1 − p)^{n−2−j}      (calling j = i − 2)
            = n(n − 1)p² (p + 1 − p)^{n−2} = n(n − 1)p²                          (Binomial Theorem, Lemma 33)
We proved that
E[X] = np and E[X(X − 1)] = n(n − 1)p².
Recall from Lemma 17 that Var[X] = E[X²] − E[X]² and, therefore,
Var[X] = E[X²] − E[X] + E[X] − E[X]²
       = E[X(X − 1)] + E[X] − E[X]²                                              (additivity of expectation, Lemma 14)
       = n(n − 1)p² + np − n²p²
       = n²p² − np² + np − n²p²
       = np(1 − p)
As an alternative proof, let S_i denote the event that trial i was a success. We know that X = Σ_{i=1}^n I_{S_i} and that the I_{S_i} are independent random variables with distribution Bernoulli(p). Hence,
E[X] = E[Σ_{i=1}^n I_{S_i}]
     = Σ_{i=1}^n E[I_{S_i}]                                                      (additivity of expectation, Lemma 14)
     = np                                                                        (properties of Bernoulli(p), Lemma 31)
and
Var[X] = Var[Σ_{i=1}^n I_{S_i}]
       = Σ_{i=1}^n Var[I_{S_i}]                                                  (additivity of variance, Lemma 19)
       = np(1 − p)                                                               (properties of Bernoulli(p), Lemma 31)

Example 44 (Coding). Write a code that generates a number according to the Binomial(n, p) distribution. Consider the rbernoulli(p) function in Example 42.

def rbinom(n, p):
    # Sum of n independent Bernoulli(p) trials (Definition 26).
    total = 0
    for ii in range(n):
        total += rbernoulli(p)
    return total

4.2.1 Exercises
Exercise 88. I throw a coin with probability p of heads 5 times. What is the probability that exactly 3 outcomes
are heads?
Exercise 89. I throw a fair coin 7 times. What is the probability that 1 or more outcomes are heads?
Exercise 90. The products created by a machine are defective independently with probability p. Out of a batch of
1000 products, how many are expected to be defective? What is the variance of the number of defective products?
Exercise 91. I throw a fair coin 4 times and a coin with probability p of heads 8 times. Let X be the total number
of heads. Compute E[X] and V ar[X].
Exercise 92. Consider that a box has o ≥ 1 orange balls and p ≥ 1 pink balls. Consider that I remove 2 balls without replacement from the box. Let X denote the total number of pink balls. Is the distribution of X a Binomial? If yes, find the parameters of the distribution.
Exercise 93. Assume you throw a coin n times and observe the total number of heads X. Say the probability of heads is p. If you observe exactly i heads, i ∈ {0, 1, ..., n}, what is the value of p that maximizes the probability P(X = i)? This is an example of a statistical inference methodology: given the observed value of a random variable, you try to infer which probability distribution it came from.
Exercise 94.
a. A fair coin is thrown 1001 times. Find the probability that one observes strictly more heads than tails.
b. A fair coin is thrown 1000 times. Find the probability that one observes strictly more heads than tails.
Exercise 95 (Challenge). Let X ∼ Binomial(1000, 0.5) and Y ∼ Binomial(1001, 0.5) be independent random variables. Find the probability that Y > X.

4.3 Geometric Distribution


Let X_1, X_2, ... be a Bernoulli Process with parameter p. Define Y as the smallest index i such that X_i = 1. That is, for each ω ∈ Ω, Y(ω) = min{i ∈ N : X_i(ω) = 1}. Y can be interpreted as the number of trials in a Bernoulli Process until one observes a 1. We say that Y has Geometric distribution with parameter p and write Y ∼ Geom(p).
Example 45. Consider that a fair coin is tossed repeatedly. Let Y be the number of tosses until the first heads is
observed. Y Geom(0.5).
If Y has Geometric distribution, Y can assume any value greater or equal to 1. That is, contrary to the Binomial
distribution, the Geometric can assume infinitely many values.
Lemma 35. Let Y ∼ Geom(p). Then p_Y(i) = p(1 − p)^{i−1}.
Proof. Observe that there is only one way for Y = i: the first i − 1 trials must be failures and the subsequent trial must be a success. Since the trials are independent, the probability of this event is (1 − p)^{i−1} p.
Lemma 36 (Geometric Series). If 0 < q < 1,
Σ_{i=j}^∞ q^i = q^j / (1 − q)

Example 46. We can perform a sanity check and show that the pmf of the Geometric sums to 1. Observe that:
Σ_{i=1}^∞ p(1 − p)^{i−1} = p · Σ_{j=0}^∞ (1 − p)^j
                         = p / (1 − (1 − p)) = 1                (Geometric Series, Lemma 36)

Observe that P(X ≥ i) corresponds to the probability that one observes at least i − 1 failures. The following lemma gives a way to compute P(X ≥ i) using the geometric series.
Lemma 37. Let X ∼ Geom(p). We have that P(X ≥ i) = (1 − p)^{i−1}.
Proof.
P(X ≥ i) = Σ_{j=i}^∞ P(X = j)
         = Σ_{j=i}^∞ (1 − p)^{j−1} p
         = p · Σ_{j=i}^∞ (1 − p)^{j−1}
         = p · (1 − p)^{i−1}/p = (1 − p)^{i−1}                  (Geometric Series, Lemma 36)

Lemma 38. If X has distribution Geom(p), then E[X] = 1/p and Var[X] = (1 − p)/p².

Proof.
E[X] = Σ_{i=1}^∞ i·P(X = i)
     = Σ_{i=1}^∞ i·p(1 − p)^{i−1}
     = −p · Σ_{i=1}^∞ ∂(1 − p)^i/∂p
     = −p · ∂(Σ_{i=1}^∞ (1 − p)^i)/∂p
     = −p · ∂((1 − p)/p)/∂p                                     (Geometric Series, Lemma 36)
     = −p · (−1/p²) = 1/p
Similarly,
E[X(X + 1)] = Σ_{i=1}^∞ i(i + 1)P(X = i)
            = Σ_{i=1}^∞ i(i + 1)p(1 − p)^{i−1}
            = p · Σ_{i=1}^∞ ∂²(1 − p)^{i+1}/∂p²
            = p · ∂²(Σ_{i=1}^∞ (1 − p)^{i+1})/∂p²
            = p · ∂²((1 − p)²/p)/∂p²                            (Geometric Series, Lemma 36)
            = p · 2/p³ = 2/p²
Finally,
Var[X] = E[X²] − E[X]²                                          (Lemma 17)
       = E[X²] + E[X] − E[X] − E[X]²
       = E[X² + X] − E[X] − E[X]²                               (additivity of expectation, Lemma 14)
       = E[X(X + 1)] − E[X] − E[X]²
       = 2/p² − 1/p − 1/p² = (1 − p)/p²

Example 47 (Coding). Write a code that generates a number according to the Geometric(p) distribution. Consider the rbernoulli(p) function in Example 42.

def rgeom(p):
    # Count the number of Bernoulli trials until the first success.
    ii = 1
    while not rbernoulli(p):
        ii += 1
    return ii

4.3.1 Exercises
Exercise 96. In a certain population, 10% of people have blood type O, 40% have blood type A, 45% have blood
type B, and 5% have blood type AB. Let Y denote the number of donors who enter a blood bank on a given day
until the first potential donor for a patient with type B (i.e. until a donor has either blood type O or B).
a. Find P (Y = 1).
b. Find P(Y ≥ 4).
c. Find E[Y ].
d. Find V ar[Y ].
Exercise 97. Let X Geom(p). Find E[X(X + 1)(X + 2)].
Exercise 98. Let X ∼ Geom(p). Find P(X > t + s | X > s).
Exercise 99. Suppose a person is waiting at a bus stop. The person believes that the event of a bus coming in
each five-minute period is independent of each other five-minute period, and that the probability that a bus will
come in any given five-minute period is p, where p is assumed to be known. This belief is different from believing
that the buses operate on a fixed schedule. Having waited 20 minutes already, is the bus more likely, less likely or
equally likely to come in the next five-minute period? Why?
Exercise 100. Compute the expectation of a geometric random variable using Lemma 13.
Exercise 101. Consider the following game: a fair coin is flipped until the first heads appears. If n trials are made in total, you receive a prize of 2^n dollars. How much money do you expect to receive in this game?

4.4 Negative Binomial Distribution


Consider a sequence of independent Bernoulli trials, all having the same probability of success, p. Recall that the Geometric distribution involves performing trials until the first success. The negative binomial is a generalization of the Geometric distribution and corresponds to the number of trials until the first r successes are observed. In this sense, if we consider X_1, ..., X_r independent variables such that X_i ∼ Geom(p), then Σ_{i=1}^r X_i has distribution Negative Binomial(r, p).
Lemma 39. If Y has distribution Negative Binomial(r, p), then p_Y(i) = (i−1 choose r−1)(1 − p)^{i−r} p^r.

Proof. Observe that P(Y = i) corresponds to the event that the r-th success occurs exactly at the i-th trial. By definition, this event happens if and only if there are r successes and i − r failures in the first i trials and the last trial is a success. The probability of every outcome with r successes and i − r failures in i trials is p^r (1 − p)^{i−r}. Furthermore, there are (i−1 choose r−1) possible permutations of the i − r failures and the first r − 1 successes. Since all these permutations are equally likely, P(Y = i) = (i−1 choose r−1) p^r (1 − p)^{i−r}.
Example 48. Observe that if Y ∼ Negative Binomial(1, p), then P(Y = i) = (i−1 choose 0) p¹(1 − p)^{i−1} = p(1 − p)^{i−1}. This expression is exactly the pmf of a geometric distribution and, therefore, Y ∼ Geom(p). That is, the geometric distribution is a particular case of the negative binomial when r = 1.

Lemma 40. If Y has distribution Negative Binomial(r, p), then E[Y] = r/p and Var[Y] = r(1 − p)/p².
Proof. Recall that if Y has distribution Negative Binomial(r, p), then there exist independent variables X_1, ..., X_r such that X_i ∼ Geom(p) and Y = Σ_{i=1}^r X_i. Hence,
E[Y] = E[Σ_{i=1}^r X_i] = Σ_{i=1}^r E[X_i] = Σ_{i=1}^r 1/p = r/p
Similarly, since the X_i's are independent,
Var[Y] = Var[Σ_{i=1}^r X_i] = Σ_{i=1}^r Var[X_i] = Σ_{i=1}^r (1 − p)/p² = r(1 − p)/p²

Example 49 (Coding). Write a code that generates a number according to the Negative Binomial(r, p) distribution. Consider the rbernoulli(p) and rgeom(p) functions in Examples 42 and 47.

def rnbinom(r, p):
    # Sum of r independent Geometric(p) random variables.
    total = 0
    for ii in range(r):
        total += rgeom(p)
    return total

An alternative implementation counts Bernoulli trials directly until the r-th success:

def rnbinom(r, p):
    # Count trials until the r-th success is observed.
    number_of_ones = value = 0
    while number_of_ones < r:
        value += 1
        number_of_ones += rbernoulli(p)
    return value

4.4.1 Exercises
Exercise 102. An exploratory oil well has a probability of 10% of striking oil. Consider that different wells are
independent
a. What is the probability that the 3rd time a company strikes oil happens on the 8-th try?
b. What is the expected number of tries until the company strikes oil for the 5th time?
c. What is the variance of the number of tries until the company strikes oil for the 8-th time?
Exercise 103. Consider that I throw a coin 5 times. Let X denote the total number of heads. After those flips,
I also throw the same coin until I get 2 heads. Let Y denote the total number of trials. What are the distributions
of X and Y ? Is P(X = 2) = P(Y = 5)? Interpret this result.
Exercise 104. Consider the same X and Y as in Exercise 103. Also consider that I picked the coin with equal
probability among a fair coin and a coin with probability 0.1 of heads and did not tell you which one I got. Let F
denote the event that I got the fair coin. Compute P(F |X = 2) and P(F |Y = 2).

Exercise 105. Argue that if X Negative Binomial(r, p) and Y Binomial(n, p), then P(X > n) = P(Y < r).
Exercise 106. You have two jars with candies, each of them with n candies. Each time you want a candy, you choose at random whether to take it from jar A or jar B. What is the probability that, by the time you empty the first jar, there are exactly i candies left in the other one?

4.5 Hypergeometric Distribution


A population has N individuals and k of them have a given property of interest. Consider a sample of size n
without replacement out of the population such that all groups of size n are equally likely. Let X denote the
number of members of the population with the property of interest that were sampled. We say that X has
Hypergeometric distribution with parameters N , n and k and write X Hypergeometric(N, n, k).
Lemma 41. If X ∼ Hypergeometric(N, n, k), then P(X = i) = (k choose i)(N−k choose n−i)/(N choose n).
Proof. Recall that there exist (N choose n) ways to choose a group of n out of a population of N. Since all groups are equally probable, it follows from Lemma 7 that it is enough to determine the number of groups of size n such that i individuals have the property of interest. Observe that such a group must contain i individuals that have the property of interest and n − i individuals that don't have that property. That is, we want to determine in how many ways it is possible to select a group of i individuals out of the k with the property and a group of n − i individuals out of the N − k that don't have the property of interest. This number is (k choose i)(N−k choose n−i). Conclude that P(X = i) = (k choose i)(N−k choose n−i)/(N choose n).
Lemma 42. Consider that there exist N individuals and k of them have a given property. Consider that a sample of size n is drawn without replacement from the population and all groups of size n are equally likely. Let M_i denote the event that the i-th drawn individual has the property of interest. For every 1 ≤ i ≤ N, P(M_i) = k/N.
Proof. We pick the first individual with equal probability among N and, hence, P(M_1) = k/N. Let X_i denote the number of individuals with the property of interest that are selected after i draws. Since the sample is selected without replacement, observe that P(M_{i+1} | X_i = j) = (k − j)/(N − i). Also observe that X_i ∼ Hypergeometric(N, i, k). Hence,
P(M_{i+1}) = Σ_{j=0}^i P(M_{i+1} | X_i = j) P(X_i = j)                               (Theorem 3)
           = Σ_{j=0}^i [(k − j)/(N − i)] · (k choose j)(N−k choose i−j)/(N choose i)  (Lemma 41)
           = (k/N) Σ_{j=0}^i (k−1 choose j)(N−k choose i−j)/(N−1 choose i)
           = k/N
where we used that (k − j)(k choose j) = k(k−1 choose j) and (N − i)(N choose i) = N(N−1 choose i). The last equality holds because (k−1 choose j)(N−k choose i−j)/(N−1 choose i) is the pmf of a Hypergeometric(N − 1, i, k − 1) and the sum of the pmf over all its possible values is 1 (Lemma 11).

Lemma 43. If X ∼ Hypergeometric(N, n, k), then E[X] = nk/N and Var[X] = (nk/N)·((N − k)/N)·((N − n)/(N − 1)).
Proof. Let M_i denote the event that the i-th element of the sample has the property of interest. Observe that X = Σ_{i=1}^n I_{M_i} and I_{M_i} ∼ Bernoulli(P(M_i)). Therefore,
E[X] = E[Σ_{i=1}^n I_{M_i}]
     = Σ_{i=1}^n E[I_{M_i}]                                   (additivity of expected value, Lemma 14)
     = Σ_{i=1}^n P(M_i)                                       (properties of Bernoulli, Lemma 31)
     = Σ_{i=1}^n k/N = nk/N                                   (Lemma 42)
Nevertheless, since the sampling is without replacement, the I_{M_i} are not independent. Therefore, we cannot use the same trick to compute the variance. Instead we compute E[X(X − 1)]:
E[X(X − 1)] = Σ_{i=0}^n i(i − 1)P(X = i)
            = Σ_{i=1}^n i(i − 1)·(k choose i)(N−k choose n−i)/(N choose n)
            = (kn/N) Σ_{i=1}^n (i − 1)·(k−1 choose i−1)(N−k choose n−i)/(N−1 choose n−1)        (17)
Let Y ∼ Hypergeometric(N − 1, n − 1, k − 1). Observe that P(Y = i − 1) = (k−1 choose i−1)(N−k choose n−i)/(N−1 choose n−1). Therefore, equation (17) simplifies to
E[X(X − 1)] = (kn/N) Σ_{i=1}^n (i − 1)P(Y = i − 1)
            = (kn/N) Σ_{j=0}^{n−1} j·P(Y = j)
            = (kn/N) E[Y] = (kn/N)·(k − 1)(n − 1)/(N − 1)
Finally, recall that
Var[X] = E[X²] − E[X]²                                        (Lemma 17)
       = E[X²] − E[X] + E[X] − E[X]²
       = E[X(X − 1)] + E[X] − E[X]²                           (additivity of expected value, Lemma 14)
       = (kn/N)·(k − 1)(n − 1)/(N − 1) + kn/N − (kn/N)²
       = (kn/N)·((N − k)/N)·((N − n)/(N − 1))

Example 50 (Coding). Write a code that generates a number according to the Hypergeometric(N, n, k) distribution. Consider the rbernoulli(p) function in Example 42.

def rhyper(N, n, k):
    # Draw n individuals without replacement: at each draw the probability of
    # picking one of the remaining "successes" is (remaining k) / (remaining population).
    number_of_ones = 0
    for ii in range(n):
        if k > 0 and rbernoulli(float(k) / (N - ii)):
            k -= 1
            number_of_ones += 1
    return number_of_ones

4.5.1 Exercises
Exercise 107. In order to sell a batch of toys in Brazil, the batch must pass a test that the toys are safe. This test
consists of evaluating the safety of a sample without replacement of the toys in the batch. If any of the sampled
toys are found unsafe, none of the products of the batch can be sold. Assume that a batch has 100 products and 10
of them are defective and unsafe. What is the smallest sample size such that the probability that the batch doesn't pass the test is larger than 95%?
Exercise 108. A box has 5 orange balls and 8 pink balls.
a. Consider that 3 balls are taken without replacement. Let X denote the total number of pink balls. Find the distribution of X, E[X] and Var[X].
b. Consider that 3 balls are taken with replacement. Let Y denote the total number of pink balls. Find the
distribution of Y , E[Y ] and V ar[Y ].
c. Compare V ar[X] to V ar[Y ].
Exercise 109. What is the distribution of the number of heads I get on the first 10 flips of a coin with bias p
given that on the first 20 flips I got 11 heads?
Exercise 110 (Challenge). Write a code that outputs with equal probability one of the permutations of (0, ..., n − 1).

4.6 Poisson Distribution


The Poisson distribution is commonly used to model the occurrence of rare events and can be used as an approximation of the Binomial(n, p) distribution when n is large and p is small. Formally, if X has distribution Poisson with parameter λ, we write X ∼ Poisson(λ). The Poisson is defined such that X ∈ N has pmf:
P(X = i) = e^{−λ} λ^i / i!
Lemma 44 (Taylor expansion of e^x).
e^x = Σ_{i=0}^∞ x^i / i!
Lemma 45. If X ∼ Poisson(λ), then E[X] = λ and Var[X] = λ.
Proof.
E[X] = Σ_{i=0}^∞ i·P(X = i)
     = Σ_{i=1}^∞ i · e^{−λ} λ^i / i!
     = λ · Σ_{i=1}^∞ e^{−λ} λ^{i−1} / (i − 1)!
     = λ · Σ_{j=0}^∞ e^{−λ} λ^j / j!
     = λ · Σ_{j=0}^∞ P(X = j) = λ
Next, we compute E[X(X − 1)] in order to find Var[X]:
E[X(X − 1)] = Σ_{i=0}^∞ i(i − 1)P(X = i)
            = Σ_{i=2}^∞ i(i − 1) e^{−λ} λ^i / i!
            = λ² · Σ_{i=2}^∞ e^{−λ} λ^{i−2} / (i − 2)!
            = λ² · Σ_{j=0}^∞ P(X = j) = λ²
Finally,
Var[X] = E[X²] − E[X]²
       = E[X(X − 1)] + E[X] − E[X]²
       = λ² + λ − λ² = λ
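There is no coding example for the Poisson distribution in the original notes, so the following is a sketch under assumptions: it inverts the cdf by accumulating the pmf above, and the function name rpoisson is ours.

import math
import random

def rpoisson(lam):
    # Inverse-cdf sampling: accumulate P(X = 0), P(X = 1), ... until the
    # cumulative probability exceeds a uniform draw.
    u = random.random()
    i, cdf = 0, 0.0
    while True:
        cdf += math.exp(-lam) * lam**i / math.factorial(i)
        if u < cdf:
            return i
        i += 1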
4.6.1 Exercises
Exercise 111. The average demand for a given product is 4 per day. The store owner wants to be 99% certain
that he does not run out of the product during the day. How many products does he need to stock?
Exercise 112. In a transmission line, there is a fault in the insulation every 2.5 miles, on the average. What is
the probability of having 2 faults in less than 7.5 miles?
Exercise 113. The average number of earthquakes of magnitude eight or higher in the Richter scale is 1 per year.
Let X denote the number of earthquakes of magnitude eight or higher next year. Find P(X = 0) and V ar[X].


4.7 Review of Special Discrete Distributions


1. The Binomial Random Variable - X ∼ Binomial(n, p).
X counts the number of successes out of n independent (Bernoulli) trials, each having a probability of success p.
2. The Geometric Random Variable - X ∼ Geom(p).
Consider a series of independent (Bernoulli) trials, each having probability of success p. X is the number of trials until the first success.
3. The Negative Binomial Random Variable - X ∼ Negative Binomial(r, p).
Consider a series of independent (Bernoulli) trials, each having probability of success p. X is the number of trials until we get r successes. By definition, Negative Binomial(1, p) is the same as Geom(p).
4. The Hypergeometric Random Variable - X ∼ Hypergeometric(N, n, k).
A sample of size n is chosen (without replacement) from a group of size N that has k successes (and N − k failures). X is the number of successes in the sample.
5. The Poisson Random Variable - X ∼ Poisson(λ).
The Poisson family of distributions often provides a good model for the number of events (in particular rare events) that occur in a fixed time period (or other fixed unit). For example, the number of customers arriving in an hour, the number of insurance claims in a month, the number of earthquakes in a year, the number of typing mistakes in a page. λ is the mean (average) number of events.
Random Variable (Y)        | pmf: p_Y(y) = P(Y = y)                                                      | E[Y]  | Var[Y]
Binomial(n, p)             | (n choose y) p^y (1 − p)^{n−y},  y ∈ {0, 1, 2, ..., n}                      | np    | np(1 − p)
Hypergeometric(N, n, k)    | (k choose y)(N−k choose n−y)/(N choose n),  max(0, n − N + k) ≤ y ≤ min(n, k) | nk/N  | n(k/N)(1 − k/N)(N − n)/(N − 1)
Geometric(p)               | p(1 − p)^{y−1},  y ∈ {1, 2, 3, ...}                                          | 1/p   | (1 − p)/p²
Negative Binomial(r, p)    | (y−1 choose r−1) p^r (1 − p)^{y−r},  y ∈ {r, r + 1, r + 2, ...}              | r/p   | r(1 − p)/p²
Poisson(λ)                 | e^{−λ} λ^y / y!,  y ∈ N                                                      | λ     | λ
Table 2

4.7.1 Exercises
Exercise 114. Can you find an example of a random variable X such that E[X(X + 1)] = 10 and E[X] = 5?
Exercise 115. Let A and B be events such that P(A) = 0.4, P(B) = 0.8 and P(A ∩ B) = 0.3. Find E[I_A + I_B] and Var[I_A + I_B].


Exercise 116. A system is composed of 4 components and works when at least 3 of them are operational. At
each minute, each component is operational independently with probability p. Let X denote the number of minutes
until the system works for the first time. Find the distribution of X. What is the probability that the system will
work in 5 minutes or less?
Exercise 117.
a. Let X ∼ Bernoulli(p). Compute E[e^{tX}].
b. Let X_1, ..., X_n be independent and Bernoulli(p). Compute E[e^{t Σ_{i=1}^n X_i}].
c. Let Y ∼ Binomial(n, p). Use the previous items to compute E[e^{tY}].


Exercise 118. Messages arrive to a server at an average rate of 6 per hour. What is the probability that, in a
given hour, 4 or more messages arrive to the server?
Exercise 119. I tell you that I picked a coin equally likely among a fair one and another with probability 0.2 of heads. What is the probability that I picked the fair coin given that I threw the coin 5 times until I obtained 2 heads?
Exercise 120. There are N white balls inside a box. A person independently paints red each ball in the box with
probability p. Let X denote the number of red balls in the box. What is the distribution of X? Next, you take a
sample without replacement of size n from the box. Call the number of red balls you get in the sample Y . What
is the distribution of Y given that X = k? What is the distribution of Y if you don't know the value of X?
Exercise 121. A population has size 20. 10 individuals have been infected with a given disease. A given test detects
infected individuals with probability p and always says that a healthy individual is not infected. If 2 individuals are
sampled without replacement, what is the probability that the test detects no infected individuals?
Exercise 122. Let X ∼ Poisson(λ).
a. Find f(t) = E[t^X].
b. Compute ∂f/∂t (0).
c. Compute ∂²f/∂t² (0).
d. Compare the previous items with E[X] and (1/2)E[X²].

Exercise 123. Person A throws a fair coin 5 times. Person B throws a fair coin 6 times. Let X and Y denote
respectively the number of heads obtained by person A and B. Compute P(X > Y ).
Exercise 124. An urn contains 3 black balls and 4 white balls. You pick 4 balls at random. If exactly 2 of these balls are black, you stop. Otherwise, you replace these balls in the urn and repeat the whole procedure. This continues until exactly 2 of the balls you pick are black. What is the probability that you make exactly n selections?


5 Continuous Random Variables


5.1 Introduction
Continuous random variables can assume an uncountable number of values. They are commonly used to model
quantities such as time, stock returns, height, weight, etc . . . While the distribution of a discrete random variable is
defined by its pmf, the distribution of a continuous random variable is defined by its probability density function.
Definition 27. Let X be a continuous random variable. We denote the probability density function of X by f_X : R → R. It satisfies the following properties:
1. f_X(x) ≥ 0.
2. ∫_{−∞}^{∞} f_X(x)dx = 1.
3. ∫_a^b f_X(x)dx = P(a ≤ X ≤ b).

Lemma 46. Let X be a continuous random variable with density f_X(x). For every x ∈ R, P(X = x) = 0.
Proof. Observe that P(X = x) = P(x ≤ X ≤ x). Using Definition 27, P(x ≤ X ≤ x) = ∫_x^x f_X(t)dt = 0.

Example 51. Consider that X has the pdf:
f_X(x) = 0, if x < 0;   f_X(x) = 3x², if 0 ≤ x ≤ 1;   f_X(x) = 0, if x > 1.
f_X satisfies properties (1) and (2):
P(X ∈ R) = ∫_{−∞}^{∞} f_X(x)dx = ∫_0^1 3x² dx = [x³]_0^1 = 1
We can use f_X to compute P(X < 0.5):
P(X < 0.5) = ∫_{−∞}^{0.5} f_X(x)dx = ∫_0^{0.5} 3x² dx = [x³]_0^{0.5} = 0.125

Example 52. Consider that X ≥ 1 and f_X(x) = c/x³ for some c ∈ R. What is the only value of c such that f_X is a legitimate density function?
∫_{−∞}^{∞} f_X(x)dx = ∫_1^{∞} c/x³ dx = [−c/(2x²)]_1^{∞} = c/2
In order for f_X to be legitimate, ∫_{−∞}^{∞} f_X(x)dx = 1. Hence c = 2.

Example 53. Consider that X ≥ 0 is a continuous random variable with density f_X(x) = c·e^{−λx}, for some c > 0.
∫_{−∞}^{∞} f_X(x)dx = ∫_0^{∞} c·e^{−λx} dx = [−(c/λ)e^{−λx}]_0^{∞} = c/λ = 1
Hence, c = λ.
Definition 28. Let X be a continuous random variable with density f_X(x). The expected value of g(X) is
E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx
Comment: All the rules for E[·] we saw in the discrete case still apply to continuous random variables. For example, the additivity of expected values (Lemma 14) still applies.
Example 54. Consider Example 51.
E[X] = ∫_{−∞}^{∞} x f_X(x)dx = ∫_0^1 x·3x² dx = ∫_0^1 3x³ dx = [3x⁴/4]_0^1 = 3/4
Also,
E[X²] = ∫_{−∞}^{∞} x² f_X(x)dx = ∫_0^1 x²·3x² dx = ∫_0^1 3x⁴ dx = [3x⁵/5]_0^1 = 3/5
Hence, Var[X] = E[X²] − E[X]² = 3/5 − 9/16 = 0.0375.

Definition 29. The cumulative distribution function (cdf) of a random variable X is a function F_X : R → R,
F_X(x) = P(X ≤ x)
Lemma 47. A cdf F of a random variable X must satisfy the following properties, even if X is not continuous:
1. F is non-decreasing.
2. lim_{x→−∞} F(x) = 0.
3. lim_{x→∞} F(x) = 1.
4. F is right-continuous.
Proof.
1. Note that, for every t_1 ≤ t_2, {ω ∈ Ω : X(ω) ≤ t_1} ⊆ {ω ∈ Ω : X(ω) ≤ t_2}. It follows from Lemma 5 that F(t_1) := P(X ≤ t_1) ≤ P(X ≤ t_2) =: F(t_2).
2. Let (y_n)_{n∈N} be a sequence such that y_n ↓ −∞. Define B_n := {ω ∈ Ω : X(ω) ≤ y_n}. By construction, B_{n+1} ⊆ B_n and ∩_{n∈N} B_n = ∅. From Exercise 21, it follows that
lim_{x→−∞} F(x) = lim_n F(y_n) = lim_n P(B_n) = P(∩_{n∈N} B_n) = P(∅) = 0
3. This is left as an exercise.
4. Proving that F is right-continuous requires showing that, for every t_0, F(t_0) = lim_{t↓t_0} F(t). Let (t_n)_{n∈N} be a sequence such that t_n ↓ t_0 (i.e., t_n ≥ t_{n+1} for every n). Define B_n := {ω ∈ Ω : X(ω) ≤ t_n}. By construction, B_{n+1} ⊆ B_n and ∩_{n∈N} B_n = {ω ∈ Ω : X(ω) ≤ t_0}. From Exercise 21, it follows that
lim_{t↓t_0} F(t) = lim_n F(t_n) = lim_n P(B_n) = P(∩_{n∈N} B_n) = P(X ≤ t_0) = F(t_0).

For continuous random variables, F is always continuous. However, for discrete random variables, F has left-discontinuities at the image values of X.
Example 55. Consider Example 51. For 0 ≤ x ≤ 1,
F(x) = ∫_{−∞}^x f_X(t)dt = ∫_0^x 3t² dt = [t³]_0^x = x³
For x ≤ 0, F(x) = 0 and for x ≥ 1, F(x) = 1.


Lemma 48. Let X be a continuous random variable with cdf F_X. For b ≥ a, F_X(b) − F_X(a) = P(a ≤ X ≤ b).
Proof. Using the definition of the cdf,
F(b) − F(a) = ∫_{−∞}^b f_X(x)dx − ∫_{−∞}^a f_X(x)dx = ∫_a^b f_X(x)dx = P(a ≤ X ≤ b)

Lemma 49. Let F_X(x) be the cdf of X and f_X(x) be the pdf of X. Then
∂F_X(x)/∂x = f_X(x)
Proof.
∂F_X(x)/∂x = ∂(∫_{−∞}^x f_X(y)dy)/∂x = f_X(x)        (Fundamental Theorem of Calculus)
Lemma 50. Let X be a continuous random variable with density f_X. Then, for any set A ∈ F,
P(X ∈ A) = ∫_A f_X(x)dx

Definition 30. Let X be a continuous random variable. We denote the median of X by Med[X] and define it as the m ∈ R such that P(X ≤ m) = F(m) = 0.5. That is,
∫_{−∞}^m f_X(x)dx = 0.5

Example 56. Consider Example 51.
P(X ≤ m) = ∫_0^m f(x)dx = ∫_0^m 3x² dx = [x³]_0^m = m³
Hence, Med[X] corresponds to the m such that m³ = 0.5, i.e., Med[X] = (0.5)^{1/3}.
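As a quick numerical check of Examples 51, 54 and 56 (this sketch is not part of the original notes; the step size is an arbitrary choice), we can approximate the integrals defining P(X < 0.5), E[X] and Var[X] with a simple Riemann sum over [0, 1]:

def f(x):
    # Density from Example 51: 3x^2 on [0, 1].
    return 3 * x**2

dx = 1e-5
xs = [i * dx for i in range(int(1 / dx))]
prob = sum(f(x) * dx for x in xs if x < 0.5)     # should be close to 0.125
mean = sum(x * f(x) * dx for x in xs)            # should be close to 3/4
second = sum(x**2 * f(x) * dx for x in xs)       # should be close to 3/5
print(prob, mean, second - mean**2)              # variance close to 0.0375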


5.1.1 Exercises
Exercise 125. Let 0 ≤ X ≤ 1 be a continuous random variable such that f_X(x) = c for 0 ≤ x ≤ 1. Find c, E[X], Var[X], Med[X], E[e^{tX}] and P(0.25 ≤ X ≤ 0.75).
Exercise 126. Let X 0 be a continuous random variable such that fX (x) = cex for x 0. Find c, E[X],
V ar[X] and P(X 1).
Exercise 127. Let 0 ≤ X ≤ n, for some n ∈ N, n > 0. If f_X(x) = c, what is the value of c? Compute P(X ≤ 1).

Exercise 128. Consider X ≥ 1. Can there exist c > 0 such that the density of X is f_X(x) = c·x^{−1}, for x ≥ 1?
Exercise 129 (Challenge). Let X be a continuous random variable. Find the d ∈ R that minimizes E[|X − d|]. (This is not required for this course.)
Hint: show that, for every d ∈ R, E[|X − d|] ≥ E[|X − m|], where m = Med[X]. Conclude that the median minimizes E[|X − d|].

As in the case of discrete random variables, several continuous distributions are often used. We now explore
some of them.

5.2 Uniform Distribution


Definition 31. We say that X follows the uniform distribution on the interval (a, b), and denote this by X ∼ U(a, b), if the density of X is:
f_X(x) = 1/(b − a), if a ≤ x ≤ b;   and f_X(x) = 0, otherwise.

Lemma 51. If X ∼ U(a, b), then for every interval (c, d) such that (c, d) ⊆ (a, b), P(c ≤ X ≤ d) = (d − c)/(b − a). That is, the probability that X is inside an interval that is a subset of (a, b) is proportional to the length of the interval. Hence, every two intervals of equal length have the same probability.
Proof.
P(c ≤ X ≤ d) = ∫_c^d 1/(b − a) dx = [x/(b − a)]_c^d = (d − c)/(b − a)

Lemma 52. If X ∼ U(a, b), then E[X] = (a + b)/2 and Var[X] = (b − a)²/12.
Proof.
E[X] = ∫_{−∞}^{∞} x f_X(x)dx = ∫_a^b x/(b − a) dx = [x²/(2(b − a))]_a^b = (b² − a²)/(2(b − a)) = (b + a)/2
E[X²] = ∫_{−∞}^{∞} x² f_X(x)dx = ∫_a^b x²/(b − a) dx = [x³/(3(b − a))]_a^b = (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3
Var[X] = E[X²] − E[X]²
       = (a² + ab + b²)/3 − (a + b)²/4
       = (4a² + 4ab + 4b² − 3a² − 3b² − 6ab)/12
       = (a − b)²/12
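The following is a small simulation sketch (not in the original notes; the choice a = 2, b = 5 is arbitrary) that checks Lemma 52 using Python's built-in uniform generator:

import random

random.seed(2)
a, b, reps = 2.0, 5.0, 100000
xs = [random.uniform(a, b) for _ in range(reps)]   # X ~ U(a, b)
mean = sum(xs) / reps
var = sum((x - mean) ** 2 for x in xs) / reps
print(mean, (a + b) / 2)        # sample mean vs (a + b)/2
print(var, (b - a) ** 2 / 12)   # sample variance vs (b - a)^2 / 12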
=


5.2.1 Exercises
Exercise 130. Let X U(0, 1). Find P(X (0.5, 0.25) X (0.5, 0.75)).
Exercise 131. Let X U(0, a) and Y U(0, b). If fX (a/2) > fY (b/2), is a < b? Why?
Exercise 132. Let X be a continuous random variable with cdf F_X(x). Let U ∼ Uniform(0, 1). Let Y = F_X^{−1}(U). Find the cdf of Y.
Exercise 133 (Coding). In the computer language Python you can generate a Uniform(0,1) by using the function
random.random(). Consider that the function Inv F(x) gives you the inverse of the cdf F . Use Exercise 132 to
write a short code that generates a random variable that has the cdf F .
Exercise 134 (Coding). Use Exercise 132 to write a function that simulates a random variable with distribution
U (a, b) using the function random.random().

5.3 Exponential distribution


Definition 32. We say that a random variable X follows an exponential distribution with parameter θ and denote this by X ∼ Exp(θ) if the density of X is:
f_X(x) = (1/θ)e^{−x/θ}, if x ≥ 0;   and f_X(x) = 0, otherwise.
Lemma 53. If X ∼ Exp(θ), then F_X(x) = 1 − e^{−x/θ}. Hence, P(X > x) = e^{−x/θ}.


Proof.
F_X(x) = ∫_0^x (1/θ)e^{−y/θ} dy = [−e^{−y/θ}]_0^x = 1 − e^{−x/θ}
Lemma 54. If X ∼ Exp(θ), then E[X] = θ and Var[X] = θ².


Proof.
E[X] = ∫_{−∞}^{∞} x f_X(x)dx
     = ∫_0^{∞} x (1/θ) e^{−x/θ} dx
     = [−x e^{−x/θ}]_0^{∞} + ∫_0^{∞} e^{−x/θ} dx            (integration by parts)
     = 0 + [−θ e^{−x/θ}]_0^{∞} = θ
A second way of deriving the expectation of the exponential distribution is by using the fact that, if X is a non-negative continuous random variable, E[X] = ∫_0^{∞} P(X > x)dx (this is the continuous version of Lemma 13 — try to prove it!). From Lemma 53,
E[X] = ∫_0^{∞} P(X > x)dx = ∫_0^{∞} e^{−x/θ} dx = [−θ e^{−x/θ}]_0^{∞} = θ
Much easier, right?
To derive the variance, we compute
E[X²] = ∫_{−∞}^{∞} x² f_X(x)dx
      = ∫_0^{∞} x² (1/θ) e^{−x/θ} dx
      = [−x² e^{−x/θ}]_0^{∞} + ∫_0^{∞} 2x e^{−x/θ} dx        (integration by parts)
      = 0 + 2θ ∫_0^{∞} x (1/θ) e^{−x/θ} dx = 2θ·θ = 2θ²
Var[X] = E[X²] − E[X]² = 2θ² − θ² = θ²

Lemma 55. If X ∼ Exp(θ), then X is memoryless. That is, P(X > t + s | X > t) = P(X > s).
Proof.
P(X > t + s | X > t) = P(X > t + s ∩ X > t)/P(X > t)         (conditional probability, Definition 14)
                     = P(X > t + s)/P(X > t)
                     = e^{−(t+s)/θ}/e^{−t/θ}                  (exponential cdf, Lemma 53)
                     = e^{−s/θ} = P(X > s)                    (exponential cdf, Lemma 53)
Notice that this is very similar to what happens with the geometric distribution (Exercise 98).
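As a small illustration of Lemma 55 (a sketch that is not part of the original notes; the threshold values s and t are arbitrary), we can estimate both probabilities by simulation and check that they agree. Python's random.expovariate takes the rate 1/θ as its argument.

import random

random.seed(3)
theta, reps = 2.0, 200000
xs = [random.expovariate(1 / theta) for _ in range(reps)]   # X ~ Exp(theta)

s, t = 1.0, 3.0
p_greater_s = sum(x > s for x in xs) / reps
survivors = [x for x in xs if x > t]
p_memoryless = sum(x > t + s for x in survivors) / len(survivors)
print(p_greater_s, p_memoryless)   # both should be close to e^(-s/theta)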
5.3.1 Exercises
Exercise 135. Let X Exp(1). Find E[X 3 ].
Exercise 136. A light bulb lasts, on average, 300 hours. What is the probability that it will last more than 700 hours given that it has already lasted 300 hours? Assume the time it lasts follows an exponential distribution.
Exercise 137. The mode of a continuous distribution is the value that maximizes fX . What is the mode of an
exponential distribution?

Exercise 138 (Coding). Use Lemma 53 and Exercise 133 to simulate from an Exponential().
Exercise 139. Let X_1 ∼ Exponential(θ_1) and X_2 ∼ Exponential(θ_2) be independent random variables. Let Z = min(X_1, X_2). Find the cdf of Z. Do you recognize this cdf? Generalize this result to X_1, ..., X_n such that X_i ∼ Exponential(θ_i) and Z = min(X_1, ..., X_n).
Hint: for any a ∈ R, min{x, y} > a if, and only if, both x and y are larger than a.
F_Z(z) is the cdf of an Exponential distribution with parameter (Σ_{i=1}^n 1/θ_i)^{−1}. Hence, Z ∼ Exp((Σ_{i=1}^n 1/θ_i)^{−1}).
Exercise 140. Consider a system with n components in series. Recall that this system fails whenever either of
the components fail. Assume that each component lasts on average 1 year before failing. How long do you expect
the system to last until failing? Use Exercise 139.

5.4 Gamma Distribution


Definition 33. Γ : R⁺ → R⁺ is called the Gamma function and is such that:
Γ(z) = ∫_0^{∞} t^{z−1} e^{−t} dt

Lemma 56. The Γ function satisfies the following properties:
1. For a > 1, Γ(a) = (a − 1)Γ(a − 1).
2. If n ∈ N, Γ(n) = (n − 1)!.
Proof.
1.
Γ(a) = ∫_0^{∞} t^{a−1} e^{−t} dt
     = [−t^{a−1} e^{−t}]_0^{∞} + ∫_0^{∞} (a − 1)t^{a−2} e^{−t} dt        (integration by parts)
     = 0 + (a − 1) ∫_0^{∞} t^{a−2} e^{−t} dt = (a − 1)Γ(a − 1)           (Gamma function, Definition 33)
2. Using the previous item and iterating over n, observe that
Γ(n) = (n − 1)·(n − 2)·...·2·1·Γ(1) = (n − 1)!·Γ(1)
Hence, it is enough to show that Γ(1) = 1:
Γ(1) = ∫_0^{∞} e^{−t} dt = [−e^{−t}]_0^{∞} = 1

Definition 34. We say that a random variable X follows the Gamma distribution with parameters (k, θ) and denote this by X ∼ Gamma(k, θ) if the density of X is
f_X(x) = x^{k−1} e^{−x/θ}/(Γ(k)θ^k), if x > 0;   and f_X(x) = 0, otherwise.

Figure 4 presents some possible densities for the Gamma distribution.


Lemma 57. The density of the Gamma distribution integrates to 1.
Proof.
∫_0^{∞} x^{k−1} e^{−x/θ}/(Γ(k)θ^k) dx = ∫_0^{∞} (tθ)^{k−1} e^{−t}/(Γ(k)θ^{k−1}) dt      (calling x = tθ)
                                       = ∫_0^{∞} t^{k−1} e^{−t}/Γ(k) dt = 1              (Gamma function, Definition 33)

Observe that, in the special case in which k = 1, the Gamma distribution corresponds to the Exponential distribution.
Lemma 58. If X ∼ Gamma(k, θ), then for every a > 0, E[X^a] = Γ(k + a)θ^a/Γ(k). In particular, E[X] = kθ and Var[X] = kθ².

Proof.
E[X^a] = ∫_0^{∞} x^a · x^{k−1} e^{−x/θ}/(Γ(k)θ^k) dx
       = ∫_0^{∞} x^{k+a−1} e^{−x/θ}/(Γ(k)θ^k) dx
       = (Γ(k + a)θ^a/Γ(k)) ∫_0^{∞} x^{k+a−1} e^{−x/θ}/(Γ(k + a)θ^{k+a}) dx       (density of a Gamma(k + a, θ))
       = Γ(k + a)θ^a/Γ(k)                                                          (properties of the Gamma function, Lemma 56)
E[X] = Γ(k + 1)θ/Γ(k) = kΓ(k)θ/Γ(k) = kθ                                           (use a = 1 in the previous item)
E[X²] = Γ(k + 2)θ²/Γ(k) = k(k + 1)Γ(k)θ²/Γ(k) = k(k + 1)θ²                         (use a = 2 in the previous item)
Var[X] = E[X²] − E[X]² = k(k + 1)θ² − k²θ² = kθ²
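There is no coding example for the Gamma distribution in the original notes. The sketch below relies on the fact, mentioned in Section 5.7, that for a natural number k a Gamma(k, θ) is the sum of k independent Exponential(θ) variables; random.expovariate takes the rate 1/θ, and the function name is ours.

import random

def rgamma_int(k, theta):
    # Valid only for integer k >= 1: sum of k independent Exponential(theta) draws.
    return sum(random.expovariate(1 / theta) for _ in range(k))

random.seed(4)
k, theta, reps = 3, 2.0, 100000
xs = [rgamma_int(k, theta) for _ in range(reps)]
print(sum(xs) / reps, k * theta)   # sample mean vs k*theta (Lemma 58)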

5.4.1 Exercises
Exercise 141. Let X be the time a student takes to solve this exercise. Assume that X follows a Gamma
distribution. If E[X] = 6 minutes and V ar[X] = 12 minutes2 , what are the parameters of the Gamma distribution?
Exercise 142. Let X Gamma(k, ). Find E[X 3 ].
Exercise 143. Let X ∼ Gamma(k, θ). Let c > 0 and Y = cX.
a. What are E[Y] and Var[Y]?
b. If you knew that Y follows a Gamma distribution, what would be the parameters of the distribution?
c. Use the fact that ∂P(Y ≤ y)/∂y = f_Y(y) to show that f_Y(y) = (1/c) f_X(y/c).
d. What is the distribution of Y?


[Figure 4: densities of some Gamma distributions — panels for Gamma(0.5, 0.5), Gamma(0.5, 1), Gamma(0.5, 2), Gamma(1, 0.5), Gamma(1, 1), Gamma(1, 2), Gamma(2, 0.5), Gamma(2, 1) and Gamma(2, 2); each panel plots f(x) against x.]

5.5 Beta Distribution


Definition 35. We say that X follows the Beta distribution with parameters (α, β) and denote this by X ∼ Beta(α, β) if the density of X is
f_X(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1}, if 0 < x < 1;   and f_X(x) = 0, otherwise.

Lemma 59. The density in Definition 35 is a valid pdf. In particular,
∫_0^1 [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1} dx = 1
Proof.
Γ(α)Γ(β) = ∫_0^{∞} x^{α−1} e^{−x} dx · ∫_0^{∞} y^{β−1} e^{−y} dy                    (Definition 33)
         = ∫_0^{∞} ∫_0^{∞} x^{α−1} y^{β−1} e^{−(x+y)} dx dy
         = ∫_0^{∞} ∫_0^1 (tu)^{α−1} (t(1 − u))^{β−1} e^{−t} t du dt                  (t = x + y, u = x(x + y)^{−1})
         = ∫_0^{∞} t^{α+β−1} e^{−t} dt · ∫_0^1 u^{α−1}(1 − u)^{β−1} du
         = Γ(α + β) ∫_0^1 u^{α−1}(1 − u)^{β−1} du                                    (Definition 33)
Hence, ∫_0^1 u^{α−1}(1 − u)^{β−1} du = Γ(α)Γ(β)/Γ(α + β) and ∫_0^1 [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1} dx = 1.

Since the Beta distribution assumes values in (0, 1), it is commonly used to model relative frequencies and ratios. It is a flexible distribution and can assume many different shapes. Figure 5 shows some densities that the Beta distribution can assume.
Lemma 60. If X ∼ Beta(α, β), then for every c, d ≥ 0,
E[X^c(1 − X)^d] = Γ(α + β)Γ(α + c)Γ(β + d)/(Γ(α)Γ(β)Γ(α + β + c + d)).
Hence, E[X] = α/(α + β) and Var[X] = αβ/((α + β)²(α + β + 1)).

Proof.
E[X^c(1 − X)^d] = ∫_{−∞}^{∞} x^c(1 − x)^d f_X(x)dx
                = [Γ(α + β)/(Γ(α)Γ(β))] ∫_0^1 x^{α+c−1}(1 − x)^{β+d−1} dx
                = [Γ(α + β)Γ(α + c)Γ(β + d)/(Γ(α)Γ(β)Γ(α + β + c + d))] ∫_0^1 [Γ(α + β + c + d)/(Γ(α + c)Γ(β + d))] x^{α+c−1}(1 − x)^{β+d−1} dx
                = Γ(α + β)Γ(α + c)Γ(β + d)/(Γ(α)Γ(β)Γ(α + β + c + d))                (Lemma 59)

[Figure 5: densities of some Beta distributions — panels for Beta(0.5, 0.5), Beta(0.5, 1), Beta(0.5, 2), Beta(1, 0.5), Beta(1, 1), Beta(1, 2), Beta(2, 0.5), Beta(2, 1) and Beta(2, 2); each panel plots f(x) against x ∈ (0, 1).]

In particular,
E[X] = [Γ(α + β)/(Γ(α)Γ(β))] · Γ(α + 1)Γ(β)/Γ(α + β + 1)
     = αΓ(α)Γ(α + β)/(Γ(α)(α + β)Γ(α + β)) = α/(α + β)                                            (Lemma 56)
E[X²] = [Γ(α + β)/(Γ(α)Γ(β))] · Γ(α + 2)Γ(β)/Γ(α + β + 2)
      = (α + 1)αΓ(α)Γ(α + β)/(Γ(α)(α + β + 1)(α + β)Γ(α + β)) = α(α + 1)/((α + β)(α + β + 1))     (Lemma 56)
Var[X] = E[X²] − E[X]²
       = α(α + 1)/((α + β)(α + β + 1)) − α²/(α + β)²
       = α[(α + β)(α + 1) − α(α + β + 1)]/((α + β + 1)(α + β)²)
       = αβ/((α + β + 1)(α + β)²)
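As a quick check of Lemma 60 (a sketch not in the original notes; α = 2, β = 5 are arbitrary choices), Python's random.betavariate samples from a Beta(α, β) distribution:

import random

random.seed(5)
alpha, beta, reps = 2.0, 5.0, 100000
xs = [random.betavariate(alpha, beta) for _ in range(reps)]
mean = sum(xs) / reps
var = sum((x - mean) ** 2 for x in xs) / reps
print(mean, alpha / (alpha + beta))                                     # E[X]
print(var, alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1)))   # Var[X]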

5.5.1 Exercises
Exercise 144. Let X ∼ Beta(α, β). Find E[X(1 − X)].
Exercise 145. Let X ∼ Beta(1, β). Find F_X(x).
Exercise 146. Let X ∼ Beta(α, 1). Find F_X(x).
Exercise 147. Let X ∼ Beta(α, β).
a. Let Y = 1 − X. What are E[Y] and Var[Y]?
b. If you knew that Y follows a Beta distribution, what would be the parameters of the distribution?
c. Use the fact that ∂P(Y ≤ y)/∂y = f_Y(y) to show that f_Y(y) = f_X(1 − y).
d. What is the distribution of Y?

5.6 Normal Distribution


Definition 36. We say that X follows the Normal distribution with parameters (μ, σ²) and denote this by X ∼ N(μ, σ²) if the density of X is
f_X(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}
If μ = 0 and σ = 1, we say that X follows the standard normal distribution.


Definition 37. Let Z ∼ N(0, 1). Define
Φ(z) = P(Z ≤ z) = F_Z(z)
Φ cannot be computed in closed form. Two ways to obtain approximate values of Φ are:
1. Many computer languages have a statistical library with these values. For example, R, Python, C, ...
2. Consult a standard normal table, available in many Statistics books.

Lemma 61. If X ∼ N(μ, σ²), then Z = (X − μ)/σ ∼ N(0, 1).
Proof. Let Z = (X − μ)/σ.
F_Z(z) = P(Z ≤ z) = P((X − μ)/σ ≤ z) = P(X ≤ σz + μ) = F_X(σz + μ)
Hence,
f_Z(z) = ∂F_Z(z)/∂z = ∂F_X(σz + μ)/∂z                                  (Lemma 49)
       = [∂F_X(σz + μ)/∂(σz + μ)] · [∂(σz + μ)/∂z]                     (chain rule)
       = σ f_X(σz + μ)
       = σ · (1/(σ√(2π))) e^{−(σz + μ − μ)²/(2σ²)} = (1/√(2π)) e^{−z²/2}
It follows from Lemma 61 that every parametrization of the normal distribution can be obtained by an appropriate rescaling of the standard normal. Figure 6 presents the density of the standard normal distribution.

[Figure 6: density of the standard normal distribution, f(x) plotted against x.]
Lemma 62. If X ∼ N(μ, σ²), then E[X] = μ and Var[X] = σ².
Proof. First consider that X ∼ N(0, 1).
E[X] = ∫_{−∞}^{∞} x (1/√(2π)) e^{−x²/2} dx
Since x·e^{−x²/2} is an odd function, E[X] = 0.
Recall that Var[X] = E[X²] − E[X]². Since E[X] = 0, Var[X] = E[X²].
E[X²] = ∫_{−∞}^{∞} x² (1/√(2π)) e^{−x²/2} dx
      = [−x (1/√(2π)) e^{−x²/2}]_{−∞}^{∞} + ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2} dx     (integration by parts)
      = 0 + 1                                                                     (standard normal pdf, Definitions 27 and 36)
Next, consider that X ∼ N(μ, σ²). By Lemma 61, (X − μ)/σ ∼ N(0, 1). Hence,
E[X] = E[σ·(X − μ)/σ + μ] = σ·E[(X − μ)/σ] + μ = 0 + μ = μ                        (Lemma 61)
Var[X] = Var[σ·(X − μ)/σ + μ] = σ²·Var[(X − μ)/σ] = σ²·1 = σ²                     (Lemma 61)
Lemma 63 (Binomial Approximation and Central Limit Theorem). Consider that X ∼ Binomial(n, p) and that n is large (n = 30 is often enough for a good result). Then
P((X − np)/√(np(1 − p)) ≤ z) ≈ Φ(z)
This is a specific instance of a more general rule. If X_1, X_2, ... is a sequence of independent random variables with the same distribution, E[X_i] = μ and Var[X_i] = σ², then, for large n,
P(Σ_{i=1}^n (X_i − μ)/(σ√n) ≤ z) ≈ Φ(z)
Proof. We will make the statement more precise as well as prove it in Chapter ??.
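The following sketch (not part of the original notes; n = 100, p = 0.5 and the chosen x are illustrative) compares the exact binomial probability with the normal approximation of Lemma 63, using Φ(z) = (1 + erf(z/√2))/2:

from math import comb, erf, sqrt

def binom_cdf(x, n, p):
    # Exact P(X <= x) for X ~ Binomial(n, p).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(int(x) + 1))

def phi(z):
    # Standard normal cdf.
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, x = 100, 0.5, 55
z = (x - n * p) / sqrt(n * p * (1 - p))
print(binom_cdf(x, n, p), phi(z))   # the two values should be close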
5.6.1 Exercises
Exercise 148. Let X ∼ N(μ, σ²). Find E[X³].
Exercise 149. Let Y ∼ N(μ, σ²). Let a, b ∈ R. Show that
P(a < Y < b) = Φ((b − μ)/σ) − Φ((a − μ)/σ),
where Φ is the cumulative distribution function of a standard normal distribution (i.e., mean zero and variance 1).
This shows that one can compute any probability related to a normal random variable by computing the cumulative distribution function of a standard normal distribution. This was very important when computers were not available, because it allowed one to compute all probabilities of interest by having access to a single table. Not many distributions have this property.
Exercise 150. Let X denote the number of heads on 10000 throws of a fair coin. Approximate P(X ≥ 5050).


5.7 Review of Special Continuous Distributions


1. The Uniform Random Variable - X ∼ U(a, b).
All subsets of (a, b) with the same length are equally probable. This distribution is usually used when all points in (a, b) are equally likely.
2. The Exponential Random Variable - X ∼ Exp(θ).
This distribution is commonly used to model the time until a certain event happens. It is the only continuous distribution with the memoryless property.
3. The Gamma Random Variable - X ∼ Gamma(k, θ).
A Gamma(1, θ) is an Exponential(θ) and, thus, the Gamma distribution is a generalization of the Exponential. If k is a natural number, X = Σ_{i=1}^k Y_i, where the Y_i are independent random variables and Y_i ∼ Exponential(θ).
4. The Beta Random Variable - X ∼ Beta(α, β).
Since the Beta distribution assumes values in (0, 1), it is commonly used to model frequencies and ratios.
5. The Normal (Gaussian) Random Variable - X ∼ N(μ, σ²).
Using the Central Limit Theorem (Lemma 63), the Normal distribution is commonly used to approximate the distribution of the average of a sequence of independent and identically distributed random variables.
Random Variable (Y)   | pdf: f_Y(y)                                                          | E[Y]        | Var[Y]
Uniform(a, b)         | 1/(b − a),  y ∈ (a, b)                                               | (a + b)/2   | (b − a)²/12
Exponential(θ)        | (1/θ)e^{−y/θ},  y ∈ R⁺                                               | θ           | θ²
Gamma(k, θ)           | y^{k−1}e^{−y/θ}/(Γ(k)θ^k),  y ∈ R⁺                                   | kθ          | kθ²
Beta(α, β)            | [Γ(α + β)/(Γ(α)Γ(β))] y^{α−1}(1 − y)^{β−1},  y ∈ (0, 1)              | α/(α + β)   | αβ/((α + β)²(α + β + 1))
Normal(μ, σ²)         | (1/(σ√(2π))) e^{−(y−μ)²/(2σ²)},  y ∈ R                               | μ           | σ²
Table 3: Review of Continuous Distributions
