
EEM 2046 Engineering Mathematics IV Random Variables and Stochastic Processes

Random Variables and Stochastic Processes


Refer to Lecture Notes Series: Engineering Mathematics Volume 2,
Second Edition Prentice Hall, 2006 for more Examples

Random Variables
Probability:
Symbol: P(⋅)

Example:
P ( X ≥ 5)
P(− 1 < X < 1)
P(|X| > 1) = P( X < −1) + P( X > 1)

Always true: 0 ≤ P(A) ≤ 1 for any event A

Sample space: the set of all possible outcomes of an experiment


Symbol: S

Example:
Two balls are drawn in succession without replacement from a box
containing 4 yellow balls and 3 green balls.

S = {YY, GG, YG, GY}

Example:
A fair coin was tossed twice. S = {TT, TH, HT, HH}.


Random variable: A function X with the sample space S as the
domain and a set of real numbers R_X as the range.

X : S → R_X   (R_X is the range of X)

Symbol for random variable: Uppercase (for example, X)


Value for random variable: lowercase (for example, x)

Example:
Two balls are drawn in succession without replacement from a box
containing 4 yellow balls and 3 green balls.
Let X = “number of yellow balls”.

X(YY) = 2, X(YG) = 1, X(GY) = 1, X(GG) = 0.

Then R_X = {0, 1, 2}.

Example:
A fair coin was tossed twice. S = {TT, TH, HT, HH}.
Let X = “number of heads that appear”
R X = {0, 1, 2}


Random variables are classified into two types:
- Discrete random variables
- Continuous random variables

Discrete random variable: a random variable that can take on at most a countable
number of possible values.

Example:
Two balls are drawn in succession without replacement from a box
containing 4 yellow balls and 3 green balls.
Let X = “number of yellow balls”.

X(YY) = 2, X(YG) = 1, X(GY) = 1, X(GG) = 0.

Then R_X = {0, 1, 2}.

Discrete Random Variables

Example:
A fair coin was tossed twice. S = {TT, TH, HT, HH}.
Let X = “number of heads that appear”. R_X = {0, 1, 2}

Discrete Random Variables


Probability function for discrete random variables


Probability mass function (pmf)
Probability distribution function
Symbol: f_X(x). The subscript indicates that the random variable in the pmf is X;
similarly we can have f_Y(y), f_Z(z), etc.

Properties:
(1) f_X(x) ≥ 0
(2) ∑_x f_X(x) = 1
(3) P(X = x) = f_X(x)

Example:
Given f_X(x) = kx, x = 1, 2, 3. Find k.
∑_{x=1}^{3} f_X(x) = 1
k·1 + k·2 + k·3 = 1
6k = 1
k = 1/6

Example:
Given f_X(x) = x/6, x = 1, 2, 3.
(i) Find P(X = 1).
    P(X = 1) = f_X(1) = 1/6
(ii) Find P(X < 3).
    P(X < 3) = ∑_{x<3} f_X(x) = f_X(1) + f_X(2) = 1/6 + 2/6 = 1/2
(iii) Find P(X = 4).
    P(X = 4) = 0
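The normalisation step above is easy to check numerically. Below is a minimal Python sketch (the support {1, 2, 3} and the form f(x) = kx come from the example; the variable names are mine):

```python
# Find k so that f(x) = k*x is a pmf on {1, 2, 3},
# then evaluate a few probabilities, mirroring the worked example.
support = [1, 2, 3]

k = 1 / sum(x for x in support)            # solves k*(1 + 2 + 3) = 1  ->  k = 1/6
f = {x: k * x for x in support}            # pmf values f_X(x) = x/6

print(k)                                   # 0.1666...
print(f[1])                                # P(X = 1) = 1/6
print(sum(f[x] for x in support if x < 3)) # P(X < 3) = 1/2
print(f.get(4, 0))                         # P(X = 4) = 0
```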


Example:
A fair coin was tossed twice. S = {TT, TH, HT, HH}.
Let X = “number of heads that appear”
R_X = {0, 1, 2}

f_X(0) = P(X = 0) = P({TT}) = 1/4
f_X(1) = P(X = 1) = P({TH, HT}) = 2/4 = 1/2
f_X(2) = P(X = 2) = P({HH}) = 1/4

Figure 1: The graph of the probability mass function (bars of height 1/4, 1/2 and 1/4 at x = 0, 1, 2).

Example:
Determine the value c so that the function f(x) = c(x² + 4) for x = 0, 1, 2, 3 is a
probability mass function of the discrete random variable X.

Solution:
From Property 2: ∑_x f_X(x) = 1

∑_{x=0}^{3} c(x² + 4) = 1
4c + 5c + 8c + 13c = 1
30c = 1
c = 1/30


Cumulative distribution function (cdf)


Symbol: FX ( x )

F_X(x) = P(X ≤ x) = ∑_{t ≤ x} f_X(t)   for −∞ < x < ∞

Example:
A fair coin was tossed twice. S = {TT, TH, HT, HH}.
Let X = “number of heads that appear”. We know that R_X = {0, 1, 2}.
If B is the event that “X ≤ 1”, then find
(a) P(B)
(b) F_X(x)
(c) Sketch the graph of the cumulative distribution function F_X(x).

Solution:
(a) P(B) = P(X ≤ 1) = ∑_{t=0}^{1} f_X(t) = f_X(0) + f_X(1) = 1/4 + 2/4 = 3/4

(b) The cumulative distribution function of X is

F_X(x) = 0      for x < 0
         1/4    for 0 ≤ x < 1
         3/4    for 1 ≤ x < 2
         1      for x ≥ 2

(c) The graph of F_X(x) is a step function: it equals 0 for x < 0, jumps to 1/4 at x = 0,
to 3/4 at x = 1, and to 1 at x = 2.
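For a quick numerical picture of this step function, here is a small Python sketch (the pmf values are from the example; the helper name cdf is mine):

```python
# Build the cdf F_X(x) of the coin-toss example from its pmf.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

def cdf(x, pmf=pmf):
    """F_X(x) = P(X <= x) for a discrete random variable."""
    return sum(p for value, p in pmf.items() if value <= x)

for x in [-1, 0, 0.5, 1, 1.5, 2, 3]:
    print(x, cdf(x))   # 0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0
```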


Continuous random variable: a random variable whose range contains an interval of real
numbers.

For example: 0 < x < 1 , 5 ≤ y ≤ 9

Probability function for continuous random variables


Probability density function (pdf)
Probability distribution function
Symbol: f_X(x). The subscript indicates that the random variable in the pdf is X;
similarly we can have f_Y(y), f_Z(z), etc.

Properties:
(1) f_X(x) ≥ 0 for all x ∈ R
(2) ∫_{−∞}^{∞} f_X(x) dx = 1
(3) P(a < X < b) = ∫_a^b f_X(x) dx

Example:
Given f_X(x) = 0.25, 0 < x < k.
(i) Find k.
    ∫_{−∞}^{∞} f_X(x) dx = 1
    ∫_0^k 0.25 dx = 1
    [0.25x]_0^k = 1
    0.25k = 1
    k = 4
(ii) Find P(0 < X < 2.5).
    P(0 < X < 2.5) = ∫_0^{2.5} 0.25 dx = [0.25x]_0^{2.5} = 0.625

Question: If X is a continuous random variable, find P(X = 6).
P(X = 6) = 0. WHY?
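The same answers can be reproduced numerically. A minimal sketch using scipy's quad routine (assumed available; the pdf and the limits come from the example):

```python
from scipy.integrate import quad

f = lambda x: 0.25 if 0 < x < 4 else 0.0   # pdf of the example once k = 4

print(quad(f, 0, 4)[0])      # total probability, 1.0 -- consistent with k = 4
print(quad(f, 0, 2.5)[0])    # P(0 < X < 2.5) = 0.625
print(quad(f, 6, 6)[0])      # a single point has zero width, so P(X = 6) = 0
```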


Cumulative distribution function (cdf)


Symbol: FX ( x )

F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} f_X(t) dt   for −∞ < x < ∞

Example:
Given f X ( x ) = 0.25 , 0 < x < 4 . Find FX ( x ) .

0, x≤0

Answer: FX ( x ) = 0.25 x, 0 < x ≤ 4
1, x > 4.

WHY??????????

Case 1: the value x falls into the region x ≤ 0.
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} f_X(t) dt = ∫_{−∞}^{x} 0 dt = 0   for x ≤ 0

Case 2: the value x falls into the region 0 < x ≤ 4.
F_X(x) = P(X ≤ x) = ∫_0^x 0.25 dt = 0.25x   for 0 < x ≤ 4


Case 3: the value x falls into the region x > 4.
F_X(x) = P(X ≤ x) = ∫_0^4 0.25 dt = 1   for x > 4

Two random variables

X_1 maps each outcome to a value x_1, and X_2 maps each outcome to a value x_2.

Example:
Two balls are drawn in succession without replacement from a box
containing 4 yellow balls and 3 green balls.

Let X = “number of yellow balls”.


R X = {0, 1, 2}

Define another random variable Z as follows


Z = the number of green balls
What is the range of two random variables X and Z.
R( X , Z ) = {(2, 0 ), (1, 1), (0, 2 )}
(Recall your sample space, which is given by S = {YY, GG, YG, GY})


Two discrete random variables


-countable

Joint probability mass function / Joint probability function


Symbol: f XY ( x, y )

Properties:
(1) f XY ( x, y ) ≥ 0 ∀( x, y )
(2) ∑∑ f XY ( x, y ) = 1
x y
(3) P ( X = x, Y = y ) = f XY ( x, y )

Example:
Given f XY ( x, y ) = kxy , ( x, y ) ∈ {(1, 2 ), (2, 1)}.
Find k.

Correct: summing over the support pairs (1, 2) and (2, 1),
∑_{(x,y)} kxy = 1
2k + 2k = 1
k = 1/4

Incorrect: treating x and y separately,
∑_{y=1}^{2} ∑_{x=1}^{2} kxy = 1
9k = 1
k = 1/9          Different. WHY????

You MUST consider the values of (x, y) in PAIRS and not break them up one by one.

It is WRONG to say that x = 1, 2 and y = 1, 2. WHY??
Because that would describe the four points (1, 1), (1, 2), (2, 1) and (2, 2), whereas the
joint pmf is nonzero only at the two pairs (1, 2) and (2, 1).
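A short sketch of the correct computation, summing only over the support pairs (the support and the form kxy are from the example; the code itself is just an illustration):

```python
from fractions import Fraction

support = [(1, 2), (2, 1)]     # the only pairs with nonzero probability

k = Fraction(1, sum(x * y for x, y in support))                    # 2k + 2k = 1 -> k = 1/4
k_wrong = Fraction(1, sum(x * y for x in (1, 2) for y in (1, 2)))  # treats x, y separately

print(k)        # 1/4 (correct)
print(k_wrong)  # 1/9 (wrong: includes (1,1) and (2,2), which are not in the support)
```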


Joint Cumulative Distribution Function


Symbol: FXY ( x, y )

F_XY(x, y) = P(X ≤ x, Y ≤ y) = ∑_{v ≤ y} ∑_{u ≤ x} f_XY(u, v),   −∞ < x < ∞, −∞ < y < ∞

Example:
Let X and Y be two discrete random variables with joint probability
x+ y
distribution f XY ( x, y ) = , for x = 0,1,2,3; y = 0,1,2 . Find FXY (1, 2 ) .
30

Solution:
F_XY(1, 2) = P(X ≤ 1, Y ≤ 2)
           = ∑_{y=0}^{2} ∑_{x=0}^{1} f_XY(x, y)
           = ∑_{y=0}^{2} ∑_{x=0}^{1} (x + y)/30
           = ∑_{y=0}^{2} [y/30 + (1 + y)/30]
           = 0/30 + 1/30 + 1/30 + 2/30 + 2/30 + 3/30
           = 9/30
           = 3/10


Marginal Probability Distributions/ marginal probability mass


function

Find f X ( x ) or f Y ( y ) from f XY ( x, y ) .

How to find f X ( x ) or f Y ( y ) from f XY ( x, y ) ?

f X ( x ) = P( X = x ) = ∑ f XY ( x, y )
y

f Y ( y ) = P(Y = y ) = ∑ f XY ( x, y )
x

Example:
Given f_XY(x, y) = xy/4, (x, y) ∈ {(1, 2), (2, 1)}. Find the marginal probability
distribution of X alone.

Solution 1 (sum only over the y values paired with each x):
f_X(1) = f_XY(1, 2) = (1·2)/4 = 2/4
f_X(2) = f_XY(2, 1) = (2·1)/4 = 2/4
f_X(x) = 1/2,   x = 1, 2

Solution 2 (sum over y = 1, 2 regardless of the support):
f_X(x) = ∑_{y=1}^{2} xy/4 = x/4 + 2x/4 = 3x/4,   x = 1, 2

Which one is correct? Solution 1 or Solution 2?


Conditional Probability:

Conditional Probability distribution of X given Y = y:


f_{X|Y}(x|y) = f_XY(x, y) / f_Y(y),   f_Y(y) > 0.

Conditional Probability distribution of Y given X = x:


f_{Y|X}(y|x) = f_XY(x, y) / f_X(x),   f_X(x) > 0.

Example:
Given f_XY(x, y) = xy/4, (x, y) ∈ {(1, 2), (2, 1)}. Find the conditional probability
distribution of Y given X = 1.

f_{Y|X}(y|x) = f_XY(x, y) / f_X(x)

f_{Y|X}(y|1) = f_XY(1, y) / f_X(1) = (y/4) / (1/2) = y/2,   y = 2
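Marginal and conditional pmfs can be read off a joint pmf mechanically. A small illustrative Python sketch for this example (the joint pmf is from the notes; the helper names are mine):

```python
from fractions import Fraction
from collections import defaultdict

# Joint pmf of the example: f_XY(x, y) = xy/4 on the pairs (1, 2) and (2, 1).
joint = {(1, 2): Fraction(2, 4), (2, 1): Fraction(2, 4)}

f_X = defaultdict(Fraction)
for (x, y), p in joint.items():
    f_X[x] += p                     # marginal of X: sum the joint pmf over y

cond_Y_given_1 = {y: p / f_X[1] for (x, y), p in joint.items() if x == 1}

print(dict(f_X))                    # marginal of X: both values get probability 1/2
print(cond_Y_given_1)               # f_{Y|X}(2|1) = (2/4)/(1/2) = 1, i.e. y/2 at y = 2
```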


Two continuous random variables:


- will give an area on xy-plane

Example:
0 < x < 1, 0 < y < 1 describes the unit square in the xy-plane.

Joint probability density function:


- can be viewed as a surface lying above xy-plane
Symbol: f XY ( x, y )

Example:
Given a joint density function f_XY(x, y) = 1, 0 < x < 1, 0 < y < 1. Its graph is a flat
surface of height 1 lying above the unit square.


IMPORTANT:
For discrete random variables:
f XY ( x, y ) = P( X = x, Y = y ) .

For continuous random variables:


f XY ( x, y ) ≠ P ( X = x, Y = y ) .

Example:
Given a joint density function f XY ( x, y ) = 1 , 0 < x < 1 , 0 < y < 1 .
Find P(0 < X < 1, 0 < Y < 1) .

P(0 < X < 1, 0 < Y < 1) is the volume bounded by the surface f_XY(x, y) and the region
0 < x < 1, 0 < y < 1. This volume is 1 × 1 × 1 = 1, so P(0 < X < 1, 0 < Y < 1) = 1.

Properties for joint pdf:


1. f_XY(x, y) ≥ 0 for all (x, y)
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) dx dy = 1
3. For any region A of two-dimensional space, P[(X, Y) ∈ A] = ∫∫_A f_XY(x, y) dx dy


Example:
Given a joint density function f XY ( x, y ) = 1 , 0 < x < 1 , 0 < y < 1 .
Find P(0 < X < 0.5, 0 < Y < 0.5) .

P(0 < X < 0.5, 0 < Y < 0.5) = ∫_0^{0.5} ∫_0^{0.5} 1 dx dy = 0.25

Example:
Given a joint density function f XY ( x, y ) = 2 , 0 < x ≤ y < 1 .
Find P(0 < X < 0.5, 0 < Y < 0.5) .

Since f_XY(x, y) = 0 for x > y, the inner integral only runs up to x = y:
P(0 < X < 0.5, 0 < Y < 0.5) = ∫_0^{0.5} ∫_0^{y} 2 dx dy = ∫_0^{0.5} 2y dy = 0.25
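A numerical check of this integral over the triangular support, using scipy's dblquad (assumed available; the density and limits come from the example):

```python
from scipy.integrate import dblquad

# Joint density of the example: f_XY(x, y) = 2 on 0 < x <= y < 1, 0 elsewhere.
# dblquad(func, a, b, gfun, hfun) integrates func(inner, outer) with the outer
# variable running over [a, b]; here the outer variable is y in (0, 0.5) and the
# inner variable is x in (0, y).
prob, err = dblquad(lambda x, y: 2.0, 0, 0.5, lambda y: 0, lambda y: y)
print(prob)   # 0.25
```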


Joint Cumulative Probability Distribution Function

Symbol: FXY ( x, y )

F_XY(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_XY(u, v) du dv,   −∞ < x, y < ∞

Example:
Let X and Y be two continuous random variables with joint probability
distribution
3 2 2
( )
 x + y , for 0 ≤ x < 1; 0 ≤ y < 1
f XY ( x , y ) =  2 .
0 , elsewhere.
 1
Find FXY 1,  .
 2

 1  1
Solution: FXY 1,  = P  X ≤ 1, Y ≤ 
 2  2
1
2 1
= ∫ ∫ f XY ( x, y)dxdy
−∞ −∞
1 1 1
1
21
3 2 2
3 x 3
 3 1 2
= ∫∫ (
x + y 2 dxdy ) = ∫
20 3
+ xy 2  dy = ∫ + y 2 dy
002 0 203
1
3  y y3  2
=  + 
2  3 3 0
5
=
16


Marginal Probability Distributions/ marginal probability density


function

Find f X ( x ) or f Y ( y ) from f XY ( x, y ) .

How to find f X ( x ) or f Y ( y ) from f XY ( x, y ) ?

∞ ∞
f X (x) = ∫ f XY (x, y )dy or fY ( y) = ∫ f XY (x, y )dx
−∞ −∞

Example:
Given a joint density function f XY ( x, y ) = 1 , 0 < x < 1 , 0 < y < 1 . Find
marginal probability density function of X alone.


f_X(x) = ∫_{−∞}^{∞} f_XY(x, y) dy = ∫_0^1 1 dy = 1,   0 < x < 1


Conditional Probability:

Conditional Probability distribution of X given Y = y:


f_{X|Y}(x|y) = f_XY(x, y) / f_Y(y),   f_Y(y) > 0.

Conditional Probability distribution of Y given X = x:


f_{Y|X}(y|x) = f_XY(x, y) / f_X(x),   f_X(x) > 0.

Example:
 32 1 1
 ( x + y − 2 xy ), 0 ≤ x ≤ ;0 ≤ y ≤
f XY ( x, y ) =  3 2 2
0, elsewhere.

Find the conditional probability distribution of Y given X = 0 .


f ( x, y )
From, f Y X ( y x ) = XY
f X (x)

f X ( x) = ∫ f XY ( x, y)dy
−∞
1
2
32
=∫ ( x + y − 2 xy )dy
0
3
1
32  y 2
2
=  xy + − xy 2 

3  2 0
32  x 1  1
=  +  0≤x≤
3  4 8 2
f (0, y )
f (Y X = 0) = XY
f X (0)
1
= 8y 0≤ y≤
2


One random variable

Two random variables

Three random variables

Multiple random variables
(refer to the Lecture Notes Series)

Probability
Sample space
Random variable
- Discrete
- Continuous
Probability mass function
Probability density function
Cumulative distribution function
Marginal probability mass function
Marginal probability density function
Conditional probability


Independence
Recall from Math 1, the events A and B are said to be independent if and
only if P ( A B ) = P ( A) .

A card is drawn at random from a deck of 52, and its face value and suit are noted. The
event that an ace was drawn is denoted by A, and the event that a club was drawn is
denoted by B. There are four aces, so P(A) = 4/52 = 1/13, and there are 13 clubs, so
P(B) = 13/52 = 1/4. A ∩ B denotes the event that the ace of clubs was drawn, and since
there is only one such card in the deck,
P(A ∩ B) = 1/52 = (1/13) × (1/4) = P(A) × P(B).
Thus,
P(A|B) = P(A ∩ B) / P(B) = (1/52) / (1/4) = 1/13 = P(A).

In other words, knowing that the card selected was a club did not change the probability
that the card selected was an ace. We say that the event A is independent of the event B.

Independent of two random variables


X and Y are statistically independent if and only if
f XY ( x, y ) = f X ( x ) f Y ( y )

TRUE for discrete and continuous random variables.

For continuous random variables X and Y, if the product of f X ( x ) and


f Y ( y ) equals the joint probability density function, then they are said to be
statistically independent.

For discrete random variables X and Y, the product of f X ( x ) and fY ( y )


might equal to the joint probability distribution function for some but not
all combinations of ( x, y ) . If there exists a point ( x0 , y 0 ) such that
f XY ( x0 , y 0 ) ≠ f X ( x0 ) f Y ( y 0 ) , then the discrete variables are said to be NOT
statistically independent.

Extend the idea to p random variables:


The random variables X 1 , X 2 , ⋯, X p are said to be mutually statistically


independent if and only if
f (x1 , x 2 , ⋯ , x p ) = f X1 ( x1 ) f X 2 ( x 2 )⋯ f X p (x p ).

Example(discrete case):
Let
f_{X1X2X3}(x1, x2, x3) = 1/4 for (x1, x2, x3) ∈ {(−1, −1, 1), (1, 1, 1), (−1, 1, −1), (1, −1, −1)},
and 0 elsewhere.

(a) Find the joint marginal probability distribution of X i and X j , i ≠ j ;


i, j = 1, 2, 3.
(b) Find the marginal probability distribution of X i , i = 1, 2, 3.
(c) Determine whether the two random variables X i and X j are
statistically independent or dependent where i ≠ j ; i, j = 1, 2, 3.
(d) Determine whether the three random variables X 1 , X 2 and X 3 are
statistically independent or dependent.
Solution:
(a) We see that
f_{X1X2}(−1, −1) = f_{X1X2}(1, 1) = f_{X1X2}(−1, 1) = f_{X1X2}(1, −1) = 1/4,
f_{X1X3}(−1, −1) = f_{X1X3}(1, 1) = f_{X1X3}(−1, 1) = f_{X1X3}(1, −1) = 1/4,
f_{X2X3}(−1, −1) = f_{X2X3}(1, 1) = f_{X2X3}(−1, 1) = f_{X2X3}(1, −1) = 1/4.
The joint marginal probability distribution of X_i and X_j is
f_{XiXj}(x_i, x_j) = 1/4 for (x_i, x_j) ∈ {(−1, −1), (1, 1), (−1, 1), (1, −1)}, and 0 elsewhere.

(b) We have
f_{X1}(−1) = f_{X2}(−1) = f_{X3}(−1) = 1/2 and f_{X1}(1) = f_{X2}(1) = f_{X3}(1) = 1/2.
The marginal probability distribution of X_i is
f_{Xi}(x_i) = 1/2 for x_i = −1, 1, and 0 elsewhere.


(c) Obviously, if i ≠ j, we have
f_{XiXj}(x_i, x_j) = f_{Xi}(x_i) f_{Xj}(x_j)
and thus X_i and X_j are statistically independent.

(d) We see that
f_{X1X2X3}(−1, −1, −1) = 0 while f_{X1}(−1) f_{X2}(−1) f_{X3}(−1) = 1/8,
which means
f_{X1X2X3}(−1, −1, −1) ≠ f_{X1}(−1) f_{X2}(−1) f_{X3}(−1).
Thus, X_1, X_2 and X_3 are statistically dependent.
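This classic example (pairwise independent but not mutually independent) is easy to verify by brute force. A minimal Python sketch using the pmf above (all helper names are mine):

```python
from fractions import Fraction
from itertools import product

pmf = {p: Fraction(1, 4) for p in [(-1, -1, 1), (1, 1, 1), (-1, 1, -1), (1, -1, -1)]}
vals = [-1, 1]

def marg(indices):
    """Marginal pmf of the coordinates listed in `indices`."""
    out = {}
    for point, p in pmf.items():
        key = tuple(point[i] for i in indices)
        out[key] = out.get(key, Fraction(0)) + p
    return out

singles = [marg([i]) for i in range(3)]

pairwise = all(marg([i, j])[(a, b)] == singles[i][(a,)] * singles[j][(b,)]
               for i in range(3) for j in range(i + 1, 3)
               for a, b in product(vals, repeat=2))
mutual = all(pmf.get((a, b, c), Fraction(0)) ==
             singles[0][(a,)] * singles[1][(b,)] * singles[2][(c,)]
             for a, b, c in product(vals, repeat=3))

print(pairwise, mutual)   # True False
```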

Example (continuous case):


1 , 0 < x1 < 1,0 < x 2 < 1,0 < x3 < 2,
Let f X1X 2 X 3 (x1 , x 2 , x3 ) =  2
0, elsewhere.

(a) Find the joint marginal probability distribution of X i and X j , i ≠ j ;


i, j = 1, 2, 3.
(b) Find the marginal probability distribution of X i , i = 1, 2, 3.
(c) Determine whether the two random variables X i and X j are
statistically independent or dependent where i ≠ j ; i, j = 1, 2, 3.
(d) Determine whether the three random variables X 1 , X 2 and X 3 are
statistically independent or dependent.
Solution:
(a) We see that
f_{X1X2}(x1, x2) = ∫_0^2 (1/2) dx3 = 1,   0 < x1 < 1, 0 < x2 < 1,
f_{X1X3}(x1, x3) = ∫_0^1 (1/2) dx2 = 1/2,   0 < x1 < 1, 0 < x3 < 2,
f_{X2X3}(x2, x3) = ∫_0^1 (1/2) dx1 = 1/2,   0 < x2 < 1, 0 < x3 < 2.


(b) The marginal probability distributions of X_i, i = 1, 2, 3, are as follows:

f_{X1}(x1) = ∫_0^1 f_{X1X2}(x1, x2) dx2 = ∫_0^1 1 dx2 = 1,   0 < x1 < 1, or
f_{X1}(x1) = ∫_0^2 f_{X1X3}(x1, x3) dx3 = ∫_0^2 (1/2) dx3 = 1,   0 < x1 < 1, or
f_{X1}(x1) = ∫_0^2 ∫_0^1 f_{X1X2X3}(x1, x2, x3) dx2 dx3 = ∫_0^2 ∫_0^1 (1/2) dx2 dx3 = 1,   0 < x1 < 1.

Similarly, for f_{X2}(x2) and f_{X3}(x3):
f_{X2}(x2) = ∫_0^1 f_{X1X2}(x1, x2) dx1 = ∫_0^1 1 dx1 = 1,   0 < x2 < 1,
f_{X3}(x3) = ∫_0^1 f_{X1X3}(x1, x3) dx1 = ∫_0^1 (1/2) dx1 = 1/2,   0 < x3 < 2.

(c) For X 1 and X 2 , we see that


f X1X 2 ( x1 , x 2 ) = f X1 ( x1 ) f X 2 (x 2 )
and thus X 1 and X 2 are statistically independent.

For X 1 and X 3 , we see that


f X1 X 3 ( x1 , x3 ) = f X1 ( x1 ) f X 3 ( x3 )
and thus X 1 and X 3 are statistically independent.

For X 2 and X 3 , we see that


f X 2 X 3 ( x 2 , x3 ) = f X 2 ( x 2 ) f X 3 ( x3 )
and thus X 2 and X 3 are statistically independent.

Thus X i and X j are statistically independent where i ≠ j ;


i, j = 1, 2, 3.

(d) For X 1 , X 2 and X 3 , we see that


f X1X 2 X 3 ( x1 , x 2 , x3 ) = f X1 ( x1 ) f X 2 ( x 2 ) f X 3 ( x3 )
and thus X 1 , X 2 and X 3 are statistically independent.


Question (a):
If X 1 , X 2 and X 3 are independent, does it imply that X i and X j are
independent where i ≠ j ; i, j = 1, 2, 3 ?

Yes.
In discrete case:
For X 1 and X 2 , we have
f X1 X 2 (x1 , x 2 ) = ∑ f X1 X 2 X 3 ( x1 , x 2 , x 3 ) = ∑ f X 1 ( x1 ) f X 2 ( x 2 ) f X 3 ( x3 ) = f X 1 ( x1 ) f X 2 ( x 2 )
x3 x3

Thus, X 1 and X 2 are independent.


Similarly for the cases of
(i) X 1 and X 3
(ii) X 2 and X 3 .

In continuous case, we have integration instead of summation.

Question (b):
If X i and X j are independent where i ≠ j ; i, j = 1, 2, 3 , does it imply that
X 1 , X 2 and X 3 are independent?

No.
Refer to previous example (discrete case)


Transformation of Variables
Given f_X(x) and a relation between X and Y, namely Y = g(X), find f_Y(y).

Transformation of Variables for one discrete random variable

Example:
Let X be a random variable with the following probability mass function,
f_X(x) = (1/16)(2x + 1) for x = 0, 1, 2, 3, and 0 elsewhere, and let Y = 2X. Find f_Y(y).

1. The transformation maps the space R X = {0, 1, 2,3} to


RY = {0, 2, 4, 6}.

2. The transformation y = 2 x sets up a one-to-one correspondence


between the point of R X and those of RY (one-to-one
transformation).

3. The inverse function is x = y/2.

4. f_Y(y) = P(Y = y) = P(2X = y) = P(X = y/2)
          = (1/16)(2(y/2) + 1) = (1/16)(y + 1) for y = 0, 2, 4, 6, and 0 elsewhere.


Example:
Let X be a geometric random variable with probability distribution
f_X(x) = (2/5)(3/5)^(x−1), x = 1, 2, 3, …. Find the probability distribution function of
the random variable Y = 2X².

Solution:
1. The transformation maps the space R_X = {1, 2, ⋯} to R_Y = {2, 8, 18, ⋯}.
2. Since the values of X are all positive, the transformation y = 2x² defines a one-to-one
   correspondence between the values of X and the values of Y.
3. The inverse function of y = 2x² is x = √(y/2).

Hence
g_Y(y) = f_X(√(y/2)) = (2/5)(3/5)^(√(y/2) − 1) for y = 2, 8, 18, ⋯, and 0 elsewhere.


Transformation of Variables for one discrete random variable

Starting from f_X(x):
1. The transformation Y = g(X) maps the space R_X to R_Y. (Find R_Y.)
2. Make sure that the transformation Y = g(X) sets up a one-to-one correspondence
   between the points of R_X and those of R_Y.
3. Find the inverse function x = w(y).
4. Replace x in f_X(x) by w(y). This gives the function f_Y(y).

Transformation of Variables for one continuous random variable

Starting from f_X(x):
1. The transformation Y = g(X) maps the space R_X to R_Y. (Find R_Y.)
2. Make sure that the transformation Y = g(X) sets up a one-to-one correspondence
   between the points of R_X and those of R_Y.
3. Find the inverse function x = w(y).
4. From the inverse function, find the Jacobian J = dx/dy.
5. Replace x in f_X(x) by w(y), then multiply by the modulus of the Jacobian |J|.
   This gives the function f_Y(y).


Example:
Let X be a continuous random variable with probability distribution
function
1

f X ( x ) = 12
(
1 + x2 , ) 0 < x < 3,
0, elsewhere.
Find the probability distribution function of the random variable Y = X 2 .
Solution:
1. The one-to-one transformation y = x 2 maps the space {x 0 < x < 3}
onto the space {y 0 < y < 9}.

2. The transformation Y = X 2 sets up a one-to-one correspondence


between the points of R X and those of RY (one-to-one
transformation).

3. The inverse of y = x² is x = √y.

4. We obtain the Jacobian J = dx/dy = 1/(2√y).

5. Therefore, f_Y(y) = f_X(√y)|J|
                     = (1/12)(1 + y) · 1/(2√y)
                     = (1 + y)/(24√y),   0 < y < 9.
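A quick numerical cross-check of this derived density (a sketch only; scipy is assumed available and the integrals below are my own consistency test, not part of the notes):

```python
from scipy.integrate import quad

fX = lambda x: (1 + x**2) / 12            # pdf of X on (0, 3)
fY = lambda y: (1 + y) / (24 * y**0.5)    # derived pdf of Y = X^2 on (0, 9)

# If the derivation is right, f_Y integrates to 1 and E(Y) matches E(X^2).
print(quad(fY, 0, 9)[0])                        # ~1.0
print(quad(lambda x: x**2 * fX(x), 0, 3)[0])    # E(X^2)
print(quad(lambda y: y * fY(y), 0, 9)[0])       # E(Y) -- same value as E(X^2)
```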


Example:

Let X be a continuous random variable with probability distribution


function

e − x , x > 0,
f X (x ) = 
0, elsewhere.
Find the probability distribution of the random variable Y = e − X .

Solution:
1. The transformation maps the space R X = {x x > 0} to
RY = {y 0 < y < 1}.

2. The transformation Y = e − X sets up a one-to-one correspondence


between the points of R X and those of RY (one-to-one transformation).

3. The inverse function is x = − ln y .

4. Jacobian: J = dx/dy = −1/y.

5. f_Y(y) = f_X(−ln y)|J| = e^(ln y) · (1/y) = y · (1/y) = 1 for 0 < y < 1, and 0 elsewhere.

Transformation of two random variables


HOW?????????????


Example (Refer to Lecture Notes Series)


Let X1 and X2 be two independent random variables that have Poisson
distributions with means µ1 and µ2 respectively. Find the probability
distribution function of Y1 = X 1 + X 2 and Y 2= X 2 .
f_{X1}(x1) = μ1^{x1} e^{−μ1} / x1!   for x1 = 0, 1, 2, 3, ⋯, and 0 elsewhere
f_{X2}(x2) = μ2^{x2} e^{−μ2} / x2!   for x2 = 0, 1, 2, 3, ⋯, and 0 elsewhere
f_{X1X2}(x1, x2) = μ1^{x1} μ2^{x2} e^{−μ1} e^{−μ2} / (x1! x2!)   for x1, x2 = 0, 1, 2, 3, ⋯, and 0 elsewhere

(since X_1 and X_2 are independent: f_{X1X2}(x1, x2) = f_{X1}(x1) f_{X2}(x2))

1. The transformation of y1 = x1 + x 2 and y 2 = x 2 maps the space


R( X1 , X 2 ) = {( x1 , x2 ) x1 = 0,1,2,3,⋯; x2 = 0,1,2,3,⋯} to ??
How to find the range of y1 and y 2 ?

x1 and x2 are always positive so the summation of x1 and x2 are always


positive too, this implies that y1 is always positive. So y1 = 0,1,2,3,⋯ .

How about y 2 ?
Is y_2 = 0, 1, 2, 3, ... simply because y_2 = x_2 and x_2 = 0, 1, 2, 3, ...? NO!
From y_1 = x_1 + x_2 and y_2 = x_2, we have
(1) y_2 = x_2, which means y_2 is always non-negative (since x_2 is), and
(2) y_2 = y_1 − x_1, which means y_2 can be at most y_1.
From (1) and (2) we get the range of y_2: y_2 = 0, 1, 2, 3, ..., y_1.

So, R(Y1 ,Y2 ) = {( y1 , y 2 ) y1 = 0,1,2,3,⋯; y 2 = 0,1,2,3,⋯, y1 }


2. The transformation y1 = x1 + x 2 and y 2 = x 2 sets up a one-to-one


correspondence between the points of R( X1 , X 2 ) and those of
R(Y1 ,Y2 ) (one-to-one transformation).

3. The inverse functions are x1 = y1 − y 2 and x 2 = y 2 .

4. f_{Y1Y2}(y1, y2) = μ1^{y1−y2} μ2^{y2} e^{−μ1} e^{−μ2} / [(y1 − y2)! y2!]   for (y1, y2) ∈ R_{(Y1,Y2)},
   and 0 elsewhere.
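A simulation sketch that checks this joint pmf empirically (numpy assumed; the means 1.5 and 2.0 are arbitrary choices of mine for μ1 and μ2):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(0)
mu1, mu2, n = 1.5, 2.0, 200_000

x1 = rng.poisson(mu1, n)
x2 = rng.poisson(mu2, n)
y1, y2 = x1 + x2, x2                        # the transformation of the example

def f_Y(a, b):                              # derived joint pmf of (Y1, Y2) at (a, b)
    return mu1**(a - b) * mu2**b * exp(-mu1 - mu2) / (factorial(a - b) * factorial(b))

print(np.mean((y1 == 3) & (y2 == 1)))       # empirical P(Y1 = 3, Y2 = 1)
print(f_Y(3, 1))                            # value from the formula, should be close
```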

Transformation of Variables for TWO discrete random variables

Starting from f_{X1X2}(x1, x2):
1. The transformation Y1 = g1(X1, X2) and Y2 = g2(X1, X2) maps the space R_{(X1,X2)} to
   R_{(Y1,Y2)}. (Find R_{(Y1,Y2)}.)
2. Make sure that the transformation sets up a one-to-one correspondence between the
   points of R_{(X1,X2)} and those of R_{(Y1,Y2)}.
3. Find the inverse functions x1 = w1(y1, y2) and x2 = w2(y1, y2).
4. Replace x1 and x2 in f_{X1X2}(x1, x2) with w1(y1, y2) and w2(y1, y2) respectively.
   This gives the function f_{Y1Y2}(y1, y2).

Symbol: R_{(X1,X2)} = R_{X1X2}


Transformation of Variables for TWO continuous random variables

Starting from f_{X1X2}(x1, x2):
1. The transformation Y1 = g1(X1, X2) and Y2 = g2(X1, X2) maps the space R_{(X1,X2)} to
   R_{(Y1,Y2)}. (Find R_{(Y1,Y2)}.)
2. Make sure that the transformation sets up a one-to-one correspondence between the
   points of R_{(X1,X2)} and those of R_{(Y1,Y2)}.
3. Find the inverse functions x1 = w1(y1, y2) and x2 = w2(y1, y2).
4. From the inverse functions, find the Jacobian
   J = det [ ∂x1/∂y1  ∂x1/∂y2 ; ∂x2/∂y1  ∂x2/∂y2 ] ≠ 0.
5. Replace x1 and x2 in f_{X1X2}(x1, x2) with w1(y1, y2) and w2(y1, y2) respectively,
   then multiply by the modulus of the Jacobian |J|. This gives f_{Y1Y2}(y1, y2).


Example (Refer to Lecture notes series)


Let X 1 and X 2 be two continuous random variables with joint
probability distribution
f_{X1X2}(x1, x2) = 4x1x2 for 0 < x1 < 1, 0 < x2 < 1, and 0 elsewhere.

Find the joint probability density function of Y1 = X 1 + X 2 and Y2 = X 2 .

Solution:
The one-to-one transformation y1 = x1 + x2 and y2 = x2 maps the space
R_{(X1,X2)} = {(x1, x2) | 0 < x1 < 1, 0 < x2 < 1} onto the space
R_{(Y1,Y2)} = {(y1, y2) | y2 < y1 < 1 + y2, 0 < y2 < 1}.

How do we determine the set of points in the y1y2-plane?
First, we write x1 = y1 − y2 and x2 = y2. Setting x1 = 0, x2 = 0, x1 = 1 and x2 = 1, the
boundaries of R_{(X1,X2)} are transformed to y1 = y2, y2 = 0, y1 = 1 + y2 and y2 = 1. (The
unit square in the x1x2-plane is mapped to the parallelogram bounded by these four lines
in the y1y2-plane.) Clearly, the transformation is one-to-one.

The inverse functions of y1 = x1 + x2 and y2 = x2 are x1 = y1 − y2 and x2 = y2. Then the
Jacobian of the transformation is
J = det [ 1  −1 ; 0  1 ] = 1,
hence the joint probability density function of Y1 and Y2 is
f_{Y1Y2}(y1, y2) = 4(y1 − y2)y2 for (y1, y2) ∈ R_{(Y1,Y2)}, and 0 elsewhere.


Transformation of ONE random variable (Discrete and Continuous)

Extended to

Transformation of TWO random variables (Discrete and Continuous)

Extended to

Transformation of MULTIPLE random variables (Discrete and


Continuous)

Example(Refer to Lecture Notes series)


Let X 1 , X 2 ,⋯, X k +1 be mutually independent with Gamma distribution,
i.e, X i ~ Gamma (α i ,1) . And the joint probability distribution function is
f_{X1X2⋯Xk+1}(x1, x2, ⋯, xk+1) = ∏_{i=1}^{k+1} [1/Γ(αi)] xi^{αi−1} e^{−xi}   for 0 < xi < ∞,
and 0 elsewhere.

Given Yi = Xi / (X1 + X2 + ⋯ + Xk+1), i = 1, 2, ⋯, k, and Yk+1 = X1 + X2 + ⋯ + Xk+1, find
the joint probability distribution function f_{Y1Y2⋯Yk+1}(y1, y2, ⋯, yk+1).


Solution:
1. The transformation maps the space
   R_{X1X2⋯Xk+1} = {(x1, x2, ⋯, xk+1) | 0 < xi < ∞, i = 1, 2, ⋯, k+1} to
   R_{Y1Y2⋯Yk+1} = {(y1, y2, ⋯, yk+1) | yi > 0, y1 + y2 + ⋯ + yk < 1, 0 < yk+1 < ∞}.

2. The transformations yi = xi / (x1 + x2 + ⋯ + xk+1), i = 1, 2, ⋯, k, and
   yk+1 = x1 + x2 + ⋯ + xk+1 set up a one-to-one correspondence between the points of
   R_X and those of R_Y (one-to-one transformation).

3. The inverse functions are x1 = y1·yk+1, ⋯, xk = yk·yk+1 and
   xk+1 = yk+1(1 − y1 − y2 − ⋯ − yk).

4. The Jacobian is the determinant of the (k+1)×(k+1) matrix whose first k rows are
   (yk+1, 0, ⋯, 0, y1), (0, yk+1, ⋯, 0, y2), ⋯, (0, 0, ⋯, yk+1, yk) and whose last row is
   (−yk+1, −yk+1, ⋯, −yk+1, 1 − y1 − ⋯ − yk); this determinant equals J = yk+1^k.

5. So the probability distribution of (Y1, Y2, ⋯, Yk+1) is
   f_{Y1Y2⋯Yk+1}(y1, y2, ⋯, yk+1)
   = yk+1^{α1+α2+⋯+αk+1−1} y1^{α1−1} y2^{α2−1} ⋯ yk^{αk−1} (1 − y1 − ⋯ − yk)^{αk+1−1} e^{−yk+1}
     / [Γ(α1)Γ(α2)⋯Γ(αk+1)]
   for (y1, y2, ⋯, yk+1) ∈ R_{Y1Y2⋯Yk+1}, and 0 elsewhere.



So far the transformations involved have been ONE-TO-ONE. What happens if the
transformation is NOT one-to-one?

PARTITION the range of x, R X into a few intervals.


R X = A1 ∪ A2 ∪ A3 ∪ ⋯ ∪ An with conditions
1. Ai ∩ A j = φ , i ≠ j .
2. y = g ( x ) define a one-to-one transformation from Ai to RY .

For each interval A_i you obtain one function of y. If these functions share the same
range of y, sum them up to form f_Y(y); otherwise keep them separate on their own ranges.

You may extend the idea to two or multiple random variables.

Example:
Given f_X(x) = (1/√(2π)) e^{−x²/2}, −∞ < x < ∞. Find f_Y(y) if Y = X².

Solution:
Clearly the transformation y = x² is NOT one-to-one (each y > 0 comes from both x = −√y
and x = √y).

Partition R_X = {x | −∞ < x < ∞} into A1 = {x | −∞ < x < 0} and A2 = {x | 0 < x < ∞}.


For the range A1:

1. The transformation maps the space A1 = {x | −∞ < x < 0} to R_Y = {y | 0 < y < ∞}.

2. The transformation y = x² sets up a one-to-one correspondence between the points of
   A1 and those of R_Y (one-to-one transformation).

3. The inverse function is x = −√y.

4. Jacobian: J = −1/(2√y).

5. g_Y(y) = f_X(−√y)|J| = (1/√(2π)) e^{−y/2} · 1/(2√y),   0 < y < ∞

For the range A2:

6. The transformation maps the space A2 = {x | 0 < x < ∞} to R_Y = {y | 0 < y < ∞}.

7. The transformation y = x² sets up a one-to-one correspondence between the points of
   A2 and those of R_Y (one-to-one transformation).

8. The inverse function is x = √y.

9. Jacobian: J = 1/(2√y).

10. h_Y(y) = f_X(√y)|J| = (1/√(2π)) e^{−y/2} · 1/(2√y),   0 < y < ∞

Finally,
f Y ( y ) = gY ( y ) + hY ( y ) , 0 < y < ∞


f_Y(y) = (1/√(2π)) e^{−y/2} · 1/(2√y) + (1/√(2π)) e^{−y/2} · 1/(2√y)
       = (1/√(2π)) y^{−1/2} e^{−y/2},   0 < y < ∞

We can sum up g_Y(y) and h_Y(y) because both have the same range of y.
If the ranges of y were different, simply leave the answer in the form
f_Y(y) = g_Y(y) for y ∈ R_{Y1}, and f_Y(y) = h_Y(y) for y ∈ R_{Y2}.


Example(Refer to Lecture Notes series)


Show that Y = (X − μ)²/σ² has a chi-squared distribution with 1 degree of freedom when X
has a normal distribution with mean μ and variance σ².

Solution:
Let Z = (X − μ)/σ, where the random variable Z has the standard normal distribution
f_Z(z) = (1/√(2π)) e^{−z²/2},   −∞ < z < ∞.

We shall now find the distribution of the random variable Y = Z². The inverse solutions of
y = z² are z = ±√y. If we designate z1 = −√y and z2 = √y, then J1 = −1/(2√y) and
J2 = 1/(2√y). Hence we have
g_Y(y) = (1/√(2π)) e^{−y/2} · 1/(2√y) + (1/√(2π)) e^{−y/2} · 1/(2√y)
       = [1/(2^{1/2} √π)] y^{1/2 − 1} e^{−y/2},   y > 0.

Since g_Y(y) is a density function, it follows that
1 = [1/(2^{1/2} √π)] ∫_0^∞ y^{1/2 − 1} e^{−y/2} dy
  = [Γ(1/2)/√π] ∫_0^∞ [1/(Γ(1/2) 2^{1/2})] y^{1/2 − 1} e^{−y/2} dy
  = Γ(1/2)/√π,
the integral being the area under a gamma probability curve with parameters α = 1/2 and
β = 2. Therefore √π = Γ(1/2), and the probability distribution of Y is given by
g_Y(y) = [1/(2^{1/2} Γ(1/2))] y^{1/2 − 1} e^{−y/2} for y > 0, and 0 elsewhere,
which is seen to be a chi-squared distribution with 1 degree of freedom.
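A simulation sketch that supports this result (numpy and scipy assumed; the chosen mean, standard deviation and sample size are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma = 10.0, 2.0                      # any mean and standard deviation

x = rng.normal(mu, sigma, 100_000)
y = ((x - mu) / sigma) ** 2                # Y = (X - mu)^2 / sigma^2

# Compare a few empirical quantiles of Y with the chi-square(1) quantiles.
for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(y, q), stats.chi2.ppf(q, df=1))
```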


Expected Values / Mean


- average value of the occurrence of outcomes
- describe where the probability distribution centered
Symbol: X , µ X , E ( X )

Expected value for ONE random variable



∑ X
xf ( x ) if X is discrete
E(X ) =  x
 ∞ xf ( x )dx if X is continuous.
∫−∞ X

Example (Discrete case)


The probability distribution function of the discrete random variable X is
f_X(x) = (1/16)(2x + 1) for x = 0, 1, 2, 3, and 0 elsewhere. Find the mean of X.

Solution:
μ_X = E(X) = ∑_{x=0}^{3} x · (1/16)(2x + 1)
           = 0·(1/16) + 1·(3/16) + 2·(5/16) + 3·(7/16)
           = 17/8


Example(Continuous case)
The probability density function of the continuous random variable X is
f_X(x) = (1/12)(1 + x²) for 0 < x < 3, and 0 elsewhere. Find the mean of X.

Solution:
μ_X = E(X) = ∫_0^3 x (1 + x²)/12 dx
           = (1/12) ∫_0^3 (x + x³) dx
           = (1/12)[x²/2 + x⁴/4]_0^3
           = 33/16


Example:
Suppose in a computer game competition, the probabilities for Ali to
score 10, 20 and 30 points are given by 1/3, 1/5 and 7/15, respectively.
The probabilities for Ahmad to score 10, 20 and 30 points are given by
1/6, 1/3 and 1/2, respectively.
By using expected value, determine who is having a better skill in playing
the computer game?

Solution:
Let X be the points scored by Ali and Y be the points scored by Ahmad. Then
E(X) = 10 × 1/3 + 20 × 1/5 + 30 × 7/15 = 64/3
E(Y) = 10 × 1/6 + 20 × 1/3 + 30 × 1/2 = 70/3
Since E(Y) > E(X), we may conclude that Ahmad has better skill than Ali.

To find E(X), we have
E(X) = ∑_x x f_X(x)               if X is discrete
E(X) = ∫_{−∞}^{∞} x f_X(x) dx     if X is continuous.

How about E[g(X)]?
E[g(X)] = ∑_x g(x) f_X(x)               if X is discrete
E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx     if X is continuous.


The following results are true for both discrete and continuous of
ONE random variable:

1. E[aX + b] = aE[ X ] + b
2. E [g ( X ) ± h( X )] = E[g ( X )] ± E [h( X )]

Expected value for TWO random variables

μ_{g(X,Y)} = E[g(X, Y)] = ∑_x ∑_y g(x, y) f_XY(x, y)                        if X and Y are discrete
μ_{g(X,Y)} = E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_XY(x, y) dx dy     if X and Y are continuous

Example(Discrete case)
Let X and Y be the random variables with joint probability distribution
function indicated as below:

f_XY(x, y)       x = 0    x = 1    Row total
y = 0             1/2      1/4      3/4
y = 1             1/8      1/8      1/4
Column total      5/8      3/8      1


(i) Find E(XY).

E(XY) = ∑_{x=0}^{1} ∑_{y=0}^{1} xy f_XY(x, y)
      = (0)(0) f_XY(0,0) + (0)(1) f_XY(0,1) + (1)(0) f_XY(1,0) + (1)(1) f_XY(1,1)
      = f_XY(1,1)
      = 1/8

(ii) Find E(X).

E(X) = ∑_{x=0}^{1} ∑_{y=0}^{1} x f_XY(x, y)
     = ∑_{x=0}^{1} x [f_XY(x,0) + f_XY(x,1)]
     = (0)[f_XY(0,0) + f_XY(0,1)] + (1)[f_XY(1,0) + f_XY(1,1)]
     = 3/8

Example (Continuous case)
Let the joint probability density function be f_XY(x, y) = x + y for 0 < x < 1, 0 < y < 1,
and 0 elsewhere. Find E(XY) and E(Y).

E(XY) = ∫_0^1 ∫_0^1 xy(x + y) dx dy
      = ∫_0^1 ∫_0^1 (x²y + xy²) dx dy
      = ∫_0^1 [x³y/3 + x²y²/2]_{x=0}^{1} dy
      = ∫_0^1 (y/3 + y²/2) dy
      = [y²/6 + y³/6]_0^1
      = 1/3


E(Y) = ∫_0^1 ∫_0^1 y f_XY(x, y) dx dy
     = ∫_0^1 y [∫_0^1 (x + y) dx] dy
     = ∫_0^1 y [x²/2 + xy]_{x=0}^{1} dy
     = ∫_0^1 (y/2 + y²) dy
     = [y²/4 + y³/3]_0^1
     = 7/12

The following results are true for both discrete and continuous of TWO
random variables, X and Y:

1. E [g ( X , Y ) ± h( X , Y )] = E [g ( X , Y )] ± E [h( X , Y )]
2. X and Y are independent ⇒ E[ XY ] = E[ X ]E[Y ] .

Example:
Let the joint probability density function be f_XY(x, y) = x + y for 0 < x < 1, 0 < y < 1,
and 0 elsewhere. Find E(X + Y).

From the previous example, E(Y) = 7/12.
E(X) = ∫_0^1 ∫_0^1 x f_XY(x, y) dx dy = ∫_0^1 x [∫_0^1 (x + y) dy] dx = 7/12.
E(X + Y) = E(X) + E(Y) = 7/12 + 7/12 = 7/6.


Example:
Given two independent random variables X and Y with joint pdf
f_XY(x, y) = 1 for 0 < x < 1, 0 < y < 1, and 0 otherwise.

Find E(X), E(Y) and E(XY). Then illustrate that E(XY) = E(X)E(Y).

Solution:
E(X) = ∫_0^1 ∫_0^1 x dx dy = 1/2
E(Y) = ∫_0^1 ∫_0^1 y dx dy = 1/2
E(XY) = ∫_0^1 ∫_0^1 xy dx dy = 1/4
We see that E(X)E(Y) = (1/2)(1/2) = 1/4 = E(XY). Hence E(XY) = E(X)E(Y).

Important remark:
E [ XY ] ≠ E [ X ]E [Y ] ⇒ X and Y are dependent.
E [ XY ] = E [ X ]E [Y ] , DOES NOT IMPLY X and Y are independent.

It is a ONE WAY statement !!!


HOW TO prove X and Y are independent?
X and Y are statistically independent if and only if
f XY ( x, y ) = f X ( x ) f Y ( y ) .

HOW TO prove X and Y are dependent?


Prove EITHER of the following:
1. f XY ( x, y ) ≠ f X ( x ) f Y ( y )
2. E [ XY ] ≠ E [ X ]E [Y ]


Variance (one random variable)


- A measure of the variability of a random variable X. OR, A
measure of the dispersion or spread of a distribution.

(Two density curves with the same mean μ: one with small σ², one with large σ².)

Symbol: σ²_X, var(X) (some books use V(X))

σ²_X = E[(X − μ_X)²] = ∑_x (x − μ_X)² f_X(x)               if X is discrete
σ²_X = E[(X − μ_X)²] = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx     if X is continuous

Note:
σ X : standard deviation


Example (Discrete case):


The probability distribution function of the discrete random variable X is
f_X(x) = (1/16)(2x + 1) for x = 0, 1, 2, 3, and 0 elsewhere. Find the variance of X.

Solution:
σ²_X = E[(X − μ_X)²] = ∑_{x=0}^{3} (x − μ_X)² f_X(x) = ∑_{x=0}^{3} (x − 17/8)² (1/16)(2x + 1)
     = (1/16)[(−17/8)²(1) + (−9/8)²(3) + (−1/8)²(5) + (7/8)²(7)]
     = 55/64

Example(Continuous case):
The probability density function of the continuous random variable X is
f_X(x) = (1/12)(1 + x²) for 0 < x < 3, and 0 elsewhere. Find the variance of X.

Solution:
σ²_X = E[(X − μ_X)²] = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx
     = ∫_0^3 (x − 33/16)² (1 + x²)/12 dx
     = 699/1280


Example:
Consider the following pmf:
f_X(x) = 1/5, x = −2, −1, 0, 1, 2, and
f_Y(y) = 1/5, y = −4, −2, 0, 2, 4.

We may calculate E(X) and E(Y) as follows:
E(X) = ∑_x x · (1/5) = −2/5 − 1/5 + 0 + 1/5 + 2/5 = 0
E(Y) = ∑_y y · (1/5) = −4/5 − 2/5 + 0 + 2/5 + 4/5 = 0

We may calculate var(X) and var(Y) as follows:
var(X) = ∑_x x² f_X(x) = (−2)²(1/5) + (−1)²(1/5) + 0²(1/5) + 1²(1/5) + 2²(1/5) = 2
var(Y) = ∑_y y² f_Y(y) = (−4)²(1/5) + (−2)²(1/5) + 0²(1/5) + 2²(1/5) + 4²(1/5) = 8

The means of X and Y are both zero.
From the variances, the standard deviations of X and Y are σ_X = √2 and σ_Y = 2√2
respectively. The standard deviation of Y is twice that of X, reflecting the fact that the
probability of Y is spread out twice as much as that of X.


Covariance (Two random variables)


- A measurement of the nature of the association between two
random variables (for example dependency of two random
variables).
- A positive value of covariance indicates that X and Y tend to
increase together, whereas a negative value indicates that an
increase in X is accompanied by a decrease in Y.

Symbol: cov( X , Y ) or σ XY

To calculate cov(X, Y), we have

cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)]
          = ∑_x ∑_y (x − μ_X)(y − μ_Y) f_XY(x, y)                        if X and Y are discrete
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_X)(y − μ_Y) f_XY(x, y) dx dy     if X and Y are continuous

OR
σ_XY = E(XY) − μ_X μ_Y

Example:
Let X and Y be the random variables with joint probability distribution
function indicated as below:

f_XY(x, y)       x = 0    x = 1    Row total
y = 0             1/2      1/4      3/4
y = 1             1/8      1/8      1/4
Column total      5/8      3/8      1

Find cov( X , Y ) .


Solution 1:
From the previous example, we see that μ_X = E(X) = 3/8.
μ_Y = E(Y) = ∑_{y=0}^{1} ∑_{x=0}^{1} y f_XY(x, y)
           = ∑_{y=0}^{1} y [f_XY(0, y) + f_XY(1, y)]
           = (0)[f_XY(0,0) + f_XY(1,0)] + (1)[f_XY(0,1) + f_XY(1,1)] = 1/4.

cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)] = ∑_{x=0}^{1} ∑_{y=0}^{1} (x − 3/8)(y − 1/4) f_XY(x, y)
          = ∑_{x=0}^{1} [(x − 3/8)(0 − 1/4) f_XY(x, 0) + (x − 3/8)(1 − 1/4) f_XY(x, 1)]
          = 1/32.

Solution 2:
From previous examples, we see that μ_X = E(X) = 3/8, μ_Y = E(Y) = 1/4 and E(XY) = 1/8.
cov(X, Y) = E(XY) − μ_X μ_Y = 1/8 − (3/8)(1/4) = 1/32

X and Y are statistically independent ⇒ cov(X, Y) = 0 (that is, X and Y are uncorrelated).

cov(X, Y) = 0 does NOT imply that X and Y are statistically independent.

cov(X, Y) ≠ 0 ⇒ X and Y are statistically dependent.


Example:
Given two independent random variables with pdf
1, 0 < x < 1, 0 < y < 1,
f XY ( x, y ) = 
0, otherwise.
Show that cov( X , Y ) = 0 .
Solution:
E(X) = ∫_0^1 ∫_0^1 x dx dy = 1/2,  E(Y) = ∫_0^1 ∫_0^1 y dx dy = 1/2,  E(XY) = ∫_0^1 ∫_0^1 xy dx dy = 1/4
cov(X, Y) = E(XY) − μ_X μ_Y = 1/4 − (1/2)(1/2) = 0

Example:
Let X and Y have the joint pmf
f_XY(x, y) = 1/3,   (x, y) = (0, 1), (1, 0), (2, 1).
(i) Determine whether X and Y are independent.
(ii) Find cov(X, Y).

Solution:
(i) f_X(0) = 1/3, f_X(1) = 1/3, f_X(2) = 1/3 and f_Y(0) = 1/3, f_Y(1) = 2/3.
We see that f_XY(0, 1) = 1/3, which is not equal to f_X(0) f_Y(1) = (1/3)(2/3) = 2/9.
Thus, X and Y are dependent.

(ii) The means of X and Y are μ_X = 1 and μ_Y = 2/3, respectively. Hence
cov(X, Y) = E(XY) − μ_X μ_Y
          = (0)(1)(1/3) + (1)(0)(1/3) + (2)(1)(1/3) − (1)(2/3)
          = 0.
That is, cov(X, Y) = 0, but X and Y are dependent.
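The same conclusion can be checked numerically from the joint pmf (a small illustrative sketch, not part of the notes):

```python
from fractions import Fraction

joint = {(0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3), (2, 1): Fraction(1, 3)}

E = lambda g: sum(g(x, y) * p for (x, y), p in joint.items())   # expectation of g(X, Y)

mx, my = E(lambda x, y: x), E(lambda x, y: y)
cov = E(lambda x, y: x * y) - mx * my
fX0 = sum(p for (x, y), p in joint.items() if x == 0)
fY1 = sum(p for (x, y), p in joint.items() if y == 1)

print(cov)                        # 0  -> X and Y are uncorrelated
print(joint[(0, 1)], fX0 * fY1)   # 1/3 vs 2/9 -> X and Y are dependent
```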


Variance of Linear Combination of Random Variables

σ²_{aX+bY} = a²σ²_X + b²σ²_Y + 2ab σ_XY

var(aX + bY) = a² var(X) + b² var(Y) + 2ab cov(X, Y)

If X and Y are statistically independent,
σ²_{aX+bY} = a²σ²_X + b²σ²_Y
var(aX + bY) = a² var(X) + b² var(Y)

(because X and Y statistically independent ⇒ cov(X, Y) = 0)

Proof:
From the definition,
σ²_{aX+bY} = E{[(aX + bY) − μ_{aX+bY}]²}.
Now,
μ_{aX+bY} = E(aX + bY) = aE(X) + bE(Y) = aμ_X + bμ_Y.
Therefore,
σ²_{aX+bY} = E{[(aX + bY) − (aμ_X + bμ_Y)]²}
           = E{[a(X − μ_X) + b(Y − μ_Y)]²}
           = a²E[(X − μ_X)²] + b²E[(Y − μ_Y)²] + 2abE[(X − μ_X)(Y − μ_Y)]
           = a²σ²_X + b²σ²_Y + 2ab σ_XY

σ²_{aX−bY} = ?


If X and Y are independent, then


1. f XY ( x, y ) = f X ( x ) f Y ( y )
2. E ( XY ) = E ( X )E (Y )
3. cov( X , Y ) = 0
4. X and Y are uncorrelated

Moment
- Useful information to the shape and spread of the distribution
function.
- Used to construct estimators for population parameters via the so-
called method of moment.

The kth moment (about the origin) of a random variable X is
μ_k = E(X^k) = ∑_x x^k f_X(x)               if X is discrete
μ_k = E(X^k) = ∫_{−∞}^{∞} x^k f_X(x) dx     if X is continuous.

The kth moment about the mean of a random variable X is
E[(X − μ)^k] = ∑_x (x − μ)^k f_X(x)               if X is discrete
E[(X − μ)^k] = ∫_{−∞}^{∞} (x − μ)^k f_X(x) dx     if X is continuous.


First moment about the origin (E ( X )) gives the mean value which is a
measurement to describe central tendency.

Second moment about the mean tells about the dispersion of pdf (the
spread of random variables).

The skewness of a pdf can be measured in terms of its third moment


about the mean.
If the pdf is symmetric, then E[(X − μ)³] = 0.

Fourth moment about the mean has been used as a measure of kurtosis
and peakedness.

Example(Refer to Lecture Notes Series)


1
 , x = 1, ⋯ , N
Let probability function f X ( x ) =  N . Show that
0, elsewhere

µ1 =
N +1
and µ 2 =
( N + 1)(2 N + 1) .
2 6

Solution:
The well-known formulas for the sums of powers of the first N integers
are as follows.
N ( N + 1) N ( N + 1)(2 N + 1)
∑ x = 2 and ∑ x 2 = 6
1≤ x ≤ N 1≤ x ≤ N

Thus, µ1 = ∑ xf X ( x ) . Similarly, µ 2 = ∑x
1≤ x ≤ N
2
f X (x )
1≤ x ≤ N

∑x ∑ x2
= 1≤ x≤ N = 1≤ x ≤ N
N N
N ( N + 1) N ( N + 1)(2 N + 1)
= =
2N 6N
=
N +1
=
(N + 1)(2 N + 1)
2 6


Example(Refer to Lecture Notes Series)


The skewness of a pdf can be measured in terms of its third moment about the mean. If a
pdf is symmetric, E[(X − μ_X)³] will obviously be 0; for pdf's that are not symmetric,
E[(X − μ_X)³] will not be zero. In practice, the symmetry (or lack of symmetry) of a pdf
is often measured by the coefficient of skewness, γ_1, where
γ_1 = E[(X − μ_X)³] / σ³_X.
Dividing E[(X − μ_X)³] by σ³_X makes γ_1 dimensionless.
A second “shape” parameter in common use is the coefficient of kurtosis, γ_2, which
involves the fourth moment about the mean. Specifically,
γ_2 = E[(X − μ_X)⁴] / σ⁴_X − 3.
For certain pdf's, γ_2 is a useful measure of peakedness; relatively “flat” pdf's are said
to be platykurtic; more peaked pdf's are called leptokurtic.

What is the meaning of “skewness”?

- A distribution is skewed if one of its tails is longer than the other. Examples:
  positive skew (skew to the right), negative skew (skew to the left), and no skew or a
  skew of 0 (a symmetric distribution).

What is the meaning of “kurtosis”?


- Kurtosis measures the degree of peakedness of a distribution.
- A distribution with positive kurtosis is called leptokurtic(sharper
“peak” and fatter “tails”). For example, Laplace distribution and
logistic distribution.
- A distribution with negative kurtosis is called platykurtic(rounded
peak with wider “shoulders”). For example, continuous uniform
distribution.


Correlation Coefficient

- The correlation coefficient measures the strength of the linear


relationship between random variables X and Y.

The correlation coefficient of X and Y is given by ρ_XY = cov(X, Y) / (σ_X σ_Y).
If ρ_XY = 0, then the random variables X and Y are said to be uncorrelated.

Remarks:
For any two random variables X and Y,
(a) the correlation coefficient satisfies |ρ_XY| ≤ 1;
(b) there is an exact linear dependency (Y = aX + b) when
    (i) ρ_XY = 1 if a > 0, or
    (ii) ρ_XY = −1 if a < 0.

Example(Refer to Lecture Notes Series)


Let X and Y have the joint pmf
f(x, y) = 1/3,   (x, y) = (0, 1), (1, 0), (2, 1).

Since the support is not “rectangular,” X and Y must be dependent. The means of X and Y
are μ_X = 1 and μ_Y = 2/3, respectively. Hence
cov(X, Y) = E(XY) − μ_X μ_Y = (0)(1)(1/3) + (1)(0)(1/3) + (2)(1)(1/3) − (1)(2/3) = 0.
That is, ρ = 0, but X and Y are dependent.

Uncorrelated ≠ Independent


Relation between Two variables:


- Functional relation
- Statistical relation

Functional relation between two variables:
Y = f(X), where X is the independent variable and Y is the dependent variable.

Example [see page 395, Example 6.15.1, Engineering Mathematics Volume 1,


second Edition, Prentice Hall]
Consider the relation between the number of products(Y) produced in an hour and
number of hours(X). If 15 products are produced in an hour, the relation is expressed
as follows:
Y = 15 X

Number of hours Number of products


1 15
2 30
3 45
4 60
The observations are plotted in Figure 6.15.1.

Figure 6.15.1 Functional relation between number of products and number of hours (the
four observations plotted fall exactly on the straight line Y = 15X).

Statistical relation between two variables:


The observations for a statistical relation do not fall directly on the curve
of relationship.

Example [see page 396, Example 6.15.1, Engineering Mathematics Volume 1,


second Edition, Prentice Hall]
Consider the experimental data of Table 6.15.1, which was obtained from 33 samples
of chemically treated waste in the study conducted at the Virginia Polytechnic
Institute and State University. Reading on the percent reduction in total solids, and
the percent reduction in chemical demand for 33 samples, were recorded.

Table 6.15.1 Measures of Solids and Chemical Oxygen Demand


Solids reduction, Chemical oxygen Solids reduction, Chemical oxygen
x(%) demand, y(%) x(%) demand, y(%)
3 5 36 34
7 11 37 36
11 21 38 38
15 16 39 37
18 16 39 36
27 28 39 45
29 27 40 39
30 25 41 41
30 35 42 40
31 30 42 44
31 40 43 37
32 32 44 44
33 34 45 46
33 32 46 46
34 34 47 49
36 37 50 51
36 38

A diagram is plotted (Figure 6.15.2) based on the data in Table 6.15.1. The percent
reduction in chemical oxygen demand is taken as the dependent variable or response,
y, and the percent reduction in total solids as the independent variable or regressor, x.
Figure 6.15.2 is called a scatter diagram. In statistical terminology, each point in the
scatter diagram represents a trial or a case. Note that most of the points do not fall
directly on the line of statistical relationship (which do not have the exactitude of a
functional relation) but it can be highly useful.


Figure 6.15.2 Statistical relation between Solids Reduction (%) and Chemical Oxygen
Demand (%) (a scatter diagram of the 33 observations).

Simple linear regression model:

y i = α + βxi + ε i

where
i. α and β are unknown intercept and slope parameters respectively.
ii. y i is the value of the response variable in the ith trial.
iii. xi is a known constant, namely, the value of the independent variable in the ith
trial.
iv. ε i is a random error with E (ε i ) = 0 and var (ε i ) = σ 2 . The quantity σ 2 is
often called the error variance or residual variance.


Fitted Regression Line:

ŷ i = c1 + c 2 xi

where
i. c1 and c 2 are estimated values for α and β (unknown parameters, so-called
regression coefficient), respectively.
ii. ŷ i is the predicted or fitted value.
- We expect to have a fitted line which is close to the true regression line.
- In order to find “good” estimators of regression coefficients α and β , the method of
least squares is used.

Method of Least Squares:


Before we go into details of the method of least squares, we need to study what
residual is because it plays an important role in the method of least squares.

Residual: Error in Fit


A residual ei , is an error in the fit of the model ŷ i = c1 + c 2 xi and it is given by
ei = y i − ŷ i .

Method of least squares: to minimize the sum of squares of the residuals (the sum of
squares of the errors about the regression line, SSE), we see that
SSE = ∑_{i=1}^{n} e_i² = ∑_{i=1}^{n} (y_i − ŷ_i)² = ∑_{i=1}^{n} (y_i − c_1 − c_2 x_i)²

Differentiating SSE with respect to c_1 and c_2, we have
∂SSE/∂c_1 = −2 ∑_{i=1}^{n} (y_i − c_1 − c_2 x_i)
∂SSE/∂c_2 = −2 ∑_{i=1}^{n} (y_i − c_1 − c_2 x_i) x_i

Setting the partial derivatives equal to zero, we obtain the following equations:
∑_{i=1}^{n} y_i = n c_1 + c_2 ∑_{i=1}^{n} x_i
∑_{i=1}^{n} x_i y_i = c_1 ∑_{i=1}^{n} x_i + c_2 ∑_{i=1}^{n} x_i²

The above equations are called the normal equations. The quantities ∑ x_i, ∑ y_i, ∑ x_i y_i
and ∑ x_i² can be calculated from the relevant data. Solving the normal equations
simultaneously, we have


c_2 = [n ∑_{i=1}^{n} x_i y_i − (∑_{i=1}^{n} x_i)(∑_{i=1}^{n} y_i)] / [n ∑_{i=1}^{n} x_i² − (∑_{i=1}^{n} x_i)²]
    = ∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^{n} (x_i − x̄)²

and

c_1 = [∑_{i=1}^{n} y_i − c_2 ∑_{i=1}^{n} x_i] / n = ȳ − c_2 x̄
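These formulas are straightforward to apply in code. A minimal numpy sketch using a few of the (x, y) pairs from Table 6.15.1 (only a subset of the data, chosen by me for illustration):

```python
import numpy as np

# A few (solids reduction, chemical oxygen demand) pairs from Table 6.15.1.
x = np.array([3, 7, 11, 15, 18, 27, 29, 30], dtype=float)
y = np.array([5, 11, 21, 16, 16, 28, 27, 25], dtype=float)
n = len(x)

c2 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
c1 = y.mean() - c2 * x.mean()

print(c1, c2)            # fitted intercept and slope
print(c1 + c2 * 20)      # predicted y-hat at x = 20
```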


Stochastic Processes
A collection of random variables {X (t ),t ∈ T } defined on a given
probability space, indexed by the time parameter t where t is in index set
T.

For example, the price of a particular stock counter listed on the stock
exchange as a function of time is a stochastic process.

Example of stochastic process


(Refer to Example in Lecture Notes Series)
Let Xn be a random variable denoting the position at time n of a moving
particle (n=0, 1, 2, 3, …). The particle will move around the integer
{⋯, − 2, − 1, 0, 1, 2, ⋯} . For every single point of time, there is a jump of
1
one step for the particle with probability ( a jump could be upwards or
2
downwards). Those jumps at time n = 1, 2, 3, … are being independent.
This process is called Simple Random Walk.

In general,
X_n = X_{n−1} + Z_n, with Z_n = 1 or −1,
P(Z_n = 1) = 1/2,  P(Z_n = −1) = 1/2,
X_n = X_0 + Z_1 + Z_2 + ... + Z_n and X_0 = 0.

Figure 1: An example of a simple random walk (one realization of X(n) for n = 0, 1, ..., 5,
moving up or down by one step at each time).

Suppose that an absorbing barrier is placed at state a. That is, the


random walk continues until state a is first reached. The process stops
and the particle stops at state a thereafter. a is then known as an
absorbing state.
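A simulation sketch of the simple random walk described above (numpy assumed; the number of steps is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 20

z = rng.choice([1, -1], size=n_steps)    # Z_n = +1 or -1, each with probability 1/2
x = np.concatenate(([0], np.cumsum(z)))  # X_0 = 0, X_n = Z_1 + ... + Z_n

print(x)                                 # one realization of the walk
```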


State space
State space contains all the possible values of X (t ) .
Symbol is S.

In the stock counter example, the state space is the set of all prices of that
particular counter throughout the day.

Discrete state space: S consists of a finite or at most countably infinite set of values.
Continuous state space: S consists of a finite or infinite interval of the real line.

Index Parameter
Index parameter normally refers to time parameter t.

Discrete time: the index set consists of discrete points of time.
Continuous time: the index set is an interval of the real line.

Example (Refer to Example in Lecture Notes Series)


Successive observations of tossing a coin:
X(t) = 1 if the t-th toss is a head, and X(t) = 0 if the t-th toss is a tail.
State space S = {0, 1}. This is a stochastic process with discrete time and a discrete
state space.

Example (Refer to Example in Lecture Notes Series)


Number of customers in the interval time [0, t).
State space, S ={0, 1, 2, …}. This is the stochastic process with
continuous time and discrete state space. (Number of customers is
countable)


Classification of Stochastic process

                     State space: Discrete                        State space: Continuous
Discrete time        Discrete-time stochastic chain/process       Discrete-time stochastic process
                     with a discrete state space                  with a continuous state space
Continuous time      Continuous-time stochastic chain/process     Continuous-time stochastic process
                     with a discrete state space                  with a continuous state space

Stochastic process with discrete time parameter


Symbol: {X t } or {X (t )}
Example: {X t , t = 0,1,2,...} or {X (t ), t = 0,1,2,...}

Stochastic process with continuous time parameter


Symbol: {X (t ), t ≥ 0}

Common Examples
A game whose moves are determined entirely by dice, such as snakes and ladders or
Monopoly, is characterized by a discrete time stochastic process with a discrete state
space.

The number of web page requests arriving at a web server is


characterized by a continuous time stochastic process with discrete
state space. However this is not true when the server is under coordinated
denial of service attacks.

The number of telephone calls arriving at a switchboard or an automatic


phone-switching system is characterized by a continuous time stochastic
process with discrete state space.


Example (Refer to Example in Lecture Notes Series)


(Discrete time process with a discrete state space)
Suppose X k is the beginning price for day k of a particular counter listed
on the Kuala Lumpur Stock Exchange (KLSE). If we observed the prices
from day 1 to 5, then the sequence {X k } is a stochastic sequence. The
following are the prices from day 1 to 5:
X 1 = RM 3.10 X 2 = RM 3.15 X 3 = RM 3.13 X 4 = RM 3.10
X 5 = RM 2.90

Example (Refer to Example in Lecture Notes Series)


(Continuous time process with a discrete state space)
If we are interested in the price at any time t on a given day, then the
following figure is a realization of a continuous time process with a
discrete state space.

[Figure: a realization of X(t), the price of a particular counter at time t on a given day; a step function taking values such as RM 3.10, RM 3.15 and RM 3.18 between 9.00 am and 12.00 pm.]


Realization
Assignment to each t of a possible value of X(t)

If the process corresponds to discrete units of time then the realization is a


sequence.

If the process corresponds to continuous units of time T=[0, ∞ ), then the


realization is a function of t.

Example (Refer to Example in Lecture Notes Series)


Successive observation of tossing a coin.
X(t) = 1 if the t-th toss is a head, and X(t) = 0 if the t-th toss is a tail.

One of the realizations is 0, 0, 1, 1, 0, 1, 0 …


Another realization though unlikely is 1, 1, 1, 1, 1, 1, 1…
Can you give another realization?

Example (Refer to Example in Lecture Notes Series)


Number of customers in the time interval [0, t).
[Figure: a realization of X(t), a step function of t that increases by one at each customer arrival time t1, t2, t3, …]


Discrete time Markov chain


The following conditional probability holds for all i, i0 , i1 , …, ik −1 , j in S
and all k = 0, 1, 2, ⋯ .
P{X k +1 = j X 0 = i0 , X 1 = i1 , ⋯, X k −1 = ik −1 , X k = i} = P{X k +1 = j X k = i}= Pij

(The subscripts k, k + 1 on X refer to time; i and j refer to states.)

Future probabilistic development of the chain depends only on its current


state and not on how the chain has arrived at the current state. The
system here has no memory of the past – a memoryless chain
(Markovian property).

Markov Matrix or Transition Probability Matrix of the process


The elements inside the matrix are probabilities. The rows and columns are indexed by the state space 0, 1, 2, …:

P = [ P00  P01  P02  ⋯ ]
    [ P10  P11  P12  ⋯ ]
    [ P20  P21  P22  ⋯ ]
    [  ⋮    ⋮    ⋮      ]

In this matrix (the entries are one-step transition probabilities),
(i) What is the probability of going from state 1 to state 0? Answer: P10.
(ii) What is the probability of going from state 0 to state 0?
(iii) What is the probability of going from state 0 to state 2?


One-step transition probabilities


Symbol: Pij^(n,n+1) (the superscripts give the times, from time n to time n + 1, i.e. one unit of time; the subscripts give the states, i → j).

Pij^(n,n+1) = P(X_{n+1} = j | X_n = i), n = 0, 1, 2, …

When one-step transition probabilities are independent of the time


variables, we say that the Markov process has stationary transition
probabilities. In here, we limit our discussion on Markov chain having
stationary transition probabilities, i.e. such that P( X n+1 = j X n = i ) is
independent of n.

In this case, for each i and j,

P(X_{n+1} = j | X_n = i) = P(X_1 = j | X_0 = i),

or Pij^(n,n+1) = Pij for all n = 0, 1, 2, …

Pij satisfies the conditions


(a) 0 ≤ Pij ≤ 1, i, j = 0, 1, 2, ⋯

(b) ∑_{j∈S} Pij = 1, i = 0, 1, 2, ⋯

The transition probability matrix P is also called one-step transition


matrix.

How to make sure that a given matrix is a transition matrix?


Example:
Is

T = [ 0.2  1  0.5 ]
    [ 0.3  0  0.5 ]     (rows and columns labelled 1, 2, 3)
    [ 0.5  0  0   ]

a transition matrix?

Yes. The way to read the transition probability for this type of matrix is
from ‘horizontal’ to ‘vertical’

In T the columns are indexed by the current state and the rows by the next state, so the entry in row j, column i is Pij:

T = [ 0.2  1  0.5 ]
    [ 0.3  0  0.5 ]     (columns = current state 1, 2, 3; rows = next state 1, 2, 3)
    [ 0.5  0  0   ]

For example, P11 = 0.2, P21 = 1, P31 = 0.5.

Once we transpose the matrix, we have

P = T^T = [ 0.2  0.3  0.5 ]
          [ 1    0    0   ]     (rows = current state 1, 2, 3; columns = next state 1, 2, 3)
          [ 0.5  0.5  0   ]

This is the form that we use throughout the lecture notes, the way to read
this type of matrix is from ‘vertical’ to ‘horizontal’.

Remarks:
1. To verify whether a given matrix is a transition matrix, either
summation of a row equals 1 or summation of a column equals 1,
depending on the form of the matrix given.
2. In lecture notes, the way to read the transition matrix is from
‘vertical’ to ‘horizontal’.
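
As a small sketch of Remark 1 (my own, assuming the matrix is stored in the row-wise 'vertical to horizontal' form used in these notes), the following Python function checks that a candidate matrix has entries in [0, 1] and rows summing to 1; the tolerance value is an arbitrary choice.

import numpy as np

def is_transition_matrix(P, tol=1e-9):
    """Check that every entry lies in [0, 1] and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    if P.ndim != 2 or P.shape[0] != P.shape[1]:
        return False
    entries_ok = np.all((P >= -tol) & (P <= 1 + tol))
    rows_ok = np.allclose(P.sum(axis=1), 1.0, atol=tol)
    return bool(entries_ok and rows_ok)

# The matrix P = T^T from the example above (rows = current state).
P = [[0.2, 0.3, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
print(is_transition_matrix(P))   # True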


Example (Refer to Example in Lecture Notes Series)


Let a component be inspected everyday and be classified into three states:
State 0 – satisfactory
State 1 – unsatisfactory
State 2 – defective
Assume that the performance of the unsatisfactory component cannot be
improved further, and that the defective component cannot be repaired.

{Xn, n = 0, 1, 2, 3, …} is a stochastic process which shows the state of the


component at nth day.

The model for this system is as below:


Suppose the component is in state 0 at time n, the probability for it to
achieve state 0, 1, 2 at time n+1 is P00, P01, P02, respectively.
( P00 + P01 + P02 = 1)

If the component is in state 1 at time n, then the probability for it to


achieve state 0,1, 2 at time n+1 is P10 = 0, P11 and P12, respectively.
(P11 + P12 = 1)
(By assuming that the performance of the unsatisfactory component
cannot be improved further, P10 = 0.)

If the component is in state 2 at time n, then it must also be in state 2 at


time n+1.
(Assume that the defective component cannot be repaired.)

Pij is called transition probability.

In general, Pij = P{X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ⋯, X_1 = i_1, X_0 = i_0} for all states i_0, i_1, …, i_{n−1}, i, j and all n ≥ 0.
0 1 2
0  P00 P01 P02 
For this process, P = 1  0 P11 P12  .
 
2  0 0 1 

Whenever state 2 is reached, the realization can be regarded as ended.
Such a stochastic process is known as a Markov Chain.


A transition matrix may also be represented by a directed graph, called a state transition diagram, in which each node represents a state and each arc (i, j) represents the transition probability Pij.

Example:
Given a transition matrix as below, draw the state transition diagram.
1 2
1 0.90 0.10
2 0.20 0.80

State transition diagram: two nodes, 1 and 2, with self-loops P11 = 0.90 and P22 = 0.80, an arc from 1 to 2 labelled P12 = 0.10, and an arc from 2 to 1 labelled P21 = 0.20.

Example:
Given a transition matrix P with state space S = {1, 2, 3, 4} as follows:
P = [ 0.7  a    0    0   ]
    [ c    1−a  a−b  0   ]
    [ 0    0    1    d   ]     (rows and columns labelled 1, 2, 3, 4)
    [ 0    0    0.2  1−b ]

(a) Find the value of a, b, c and d.


(b) Draw the state transition diagram.

Solution:
(a)
Row 1: 0.7 + a = 1 ⇒ a = 0.3
Row 2: c + (1 − a) + (a − b) = 1 ⇒ c = b
Row 3: 1 + d = 1 ⇒ d = 0
Row 4: 0.2 + 1 − b = 1 ⇒ b = 0.2
Thus, a = 0.3, b = 0.2, c = 0.2, d = 0.

(b) The state transition diagram has nodes 1, 2, 3, 4, with an arc for each nonzero Pij found in (a).


Example
A connection between two communication nodes is modeled by a discrete
time Markov chain. The connection is in any of the following three states.
State 0 – No connection
State 1 – Slow connection
State 2 – Fast connection

When the connection is very unstable, there is a 50% chance that any connection will be disconnected. Once disconnected, there is a 70% chance it will remain disconnected and a 10% chance it will reconnect at fast speed. If it is already in fast connection, it is equally likely to remain in fast connection or drop to slow connection. If it is in slow connection, there is only a 10% chance it will improve to a fast connection.

For this process, the transition probability matrix is given below; the corresponding state transition diagram has nodes 0, 1 and 2 with arcs labelled by these probabilities.

P = [ 0.7  0.2   0.1  ]
    [ 0.5  0.4   0.1  ]     (rows and columns indexed by the states 0, 1, 2)
    [ 0.5  0.25  0.25 ]

In the extreme case, once the connection is disconnected it will no longer be able to reconnect. The transition probability matrix (and the corresponding state transition diagram) is then

P = [ 1    0     0    ]
    [ 0.5  0.4   0.1  ]     (rows and columns indexed by the states 0, 1, 2)
    [ 0.5  0.25  0.25 ]

In this case, state 0 is the absorbing state.


Example (Refer to Example in Lecture Notes Series)


Suppose the entire industry produces only two types of batteries. Given
that if a person last purchased battery 1, there is 80% possibility that the
next purchase will be battery 1. Given that if a person last purchased
battery 2, there is 90% possibility that the next purchase will be battery 2.
Let Xn denote the type of the nth battery purchased by a person. Construct the transition matrix.
Solution:
Let state 1: battery 1 is purchased,
state 2: battery 2 is purchased.

1 2
1 0.80 0.20
2 0.10 0.90

n-step Transition Probability


In order to study n-step transition probability, let’s study 2-step
transition probability first.

How to find 2-step transition probability?


Pij(2) = P(X_{m+2} = j | X_m = i) = P(X_2 = j | X_0 = i)

(two steps: i → k → j for some intermediate state k)

1. Chapman-Kolmogorov Equations:
Pij(2 ) = ∑ Pik(1) Pkj(1) where Pik(1) = Pik and Pkj(1) = Pkj
k∈S

2. From multiplication of Transition Probability Matrix


We have P (2 ) = P × P where P is a transition probability matrix.
Pij(2 ) is the entry (i, j) of the matrix P ( 2 ) .


Example:
Given a transition probability matrix with state space {1, 2} as below:

P= 0.90 0.10 . Find P12( 2 ) .


0.20 0.80
 

Solution:
Method 1:
From Chapman-Kolmogorov Equations we have
P12(2) = ∑_{k=1}^{2} P1k(1) Pk2(1)
       = P11 P12 + P12 P22
       = (0.90)(0.10) + (0.10)(0.80)
       = 0.17

Method 2:
0.90 0.10 0.90 0.10
P (2 ) =  ×
0.20 0.80 0.20 0.80
0.83 0.17
=
0.34 0.66
P12(2 ) = 0.17
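
A minimal numpy sketch (not from the notes) reproducing both methods for this example; note that Python indexes from 0, so P12(2) is the entry [0, 1] of the squared matrix.

import numpy as np

P = np.array([[0.90, 0.10],
              [0.20, 0.80]])

# Method 1: Chapman-Kolmogorov sum over the intermediate state k.
p12_2_sum = sum(P[0, k] * P[k, 1] for k in range(2))

# Method 2: square the transition matrix and read off the (1, 2) entry.
P2 = P @ P
p12_2_mat = P2[0, 1]

print(p12_2_sum, p12_2_mat)   # both give 0.17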

How to find n-step transition probability?


(Extend the idea from 2-step transition probability)
Symbol: Pij( n )
Pij( n ) = P ( X n+ m = j X m = i ) = P ( X n = j X 0 = i ), n, m ≥ 0; i, j ≥ 0.

To find n-step transition probability Pij( n ) , we also have 2 methods.


1. Chapman-Kolmogorov Equations

The Chapman-Kolmogorov equations provide a method for


computing the n-step transition probabilities:

Pij(n1 + n2) = ∑_{k∈S} Pik(n1) Pkj(n2), where n1 + n2 = n and n1, n2 ≥ 0


2. From multiplication of Transition Probability Matrix


We have P (n ) = P × P (n−1) = P × P × P (n−2 ) = P n where P is a
transition probability matrix.
Pij(n ) is the entry (i, j) of the matrix P ( n ) .

Example (Refer to Example in Lecture Notes Series)


Referring to the earlier example on batteries:
(a) If a person is currently a battery 2 purchaser, what is the
probability that he will purchase battery 1, after 2 purchases from
now?
(b) If a person is currently a battery 1 purchaser, what is the
probability that he will purchase battery 1, after 3 purchases from
now?
Solution:
(a)
P(2) = [ 0.66  0.34 ]
       [ 0.17  0.83 ]

P21(2) = 0.17

(b)
P(3) = P P^2 = [ 0.80  0.20 ] [ 0.66  0.34 ] = [ 0.562  × ]
               [ 0.10  0.90 ] [ 0.17  0.83 ]   [ ×      × ]

∴ P11(3) = 0.562.
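
As a further sketch (assuming the battery transition matrix above), numpy's matrix_power gives the n-step matrix directly; the 0-based indices shift the state labels by one.

import numpy as np

P = np.array([[0.80, 0.20],    # state 1: battery 1
              [0.10, 0.90]])   # state 2: battery 2

P2 = np.linalg.matrix_power(P, 2)
P3 = np.linalg.matrix_power(P, 3)

print(P2[1, 0])   # P21(2) = 0.17  (battery 2 now, battery 1 after 2 purchases)
print(P3[0, 0])   # P11(3) = 0.562 (battery 1 now, battery 1 after 3 purchases)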


Example:
Given a one-step transition matrix P as below:
P = [ 0    1    0    0   ]
    [ 0.2  0    0.8  0   ]
    [ 0    0.3  0.3  0.4 ]     (rows and columns indexed by the states 0, 1, 2, 3)
    [ 0    0    1    0   ]

Initially, the particle is in position 2. What is the probability the particle


will be in position 1 after 2 transitions?

Solution:
We want to find P ( X 2 = 1 X 0 = 2) = P21(2 )

The 2-step transition matrix is

P(2) = P × P = [ 0.2   0     0.8   0    ]
               [ 0     0.44  0.24  0.32 ]
               [ 0.06  0.09  0.73  0.12 ]
               [ 0     0.3   0.3   0.4  ]

Which is the correct answer?


P21(2 ) = 0.09 or P21(2 ) = 0 ?

Remark:
Pij(0) = 1 if i = j, and Pij(0) = 0 if i ≠ j.
(With no movement the process stays in its starting state, so the probability of being in the same state is 1, and the probability of going from one state to a different state is 0.)

State Probabilities
Symbol: p j (k ) = P[ X k = j ]

What is the meaning of X k = j ?


The chain is said to be in state j at time k.
How to find state probability?
We have 2 methods.


Method 1:
P[X_{k+1} = j] = ∑_{i=0}^{∞} P[X_k = i] P[X_{k+1} = j | X_k = i] = ∑_{i=0}^{∞} p_i(k) Pij.

Method 2:
By using iteration formula (which involves state probability vector, will
be discussed later)

What is the difference between transition probability and state


probability?

Transition probability: the "moving probability" from one state to another,
Pij(n) = P(X_n = j | X_0 = i).

State probability: the probability of being in a certain state, without knowing the state the process came from,
p_j(k) = P(X_k = j).

Example (Refer to Example in Lecture Notes Series)


Let X k denote the position of a particle after k transitions and X 0 be the
particle’s initial position, pi (k ) be the probability of the particle in state i
after k transitions.
The table below shows the probability for the movement of the particle.
(Assume that the particle’s initial position is in state 0.)

Probability of moving to next position


Current state State 0 State 1 State 2
State 0 0 0.5 0.5
State 1 0.75 0 0.25
State 2 0.75 0.25 0

(i) Find the probability of the particle’s position after first


transition.
(ii) Find the probability of the particle’s position after second
transition.


Solution:
(i) p0(1) = P(X_1 = 0)
          = ∑_i P(X_1 = 0 | X_0 = i) P(X_0 = i)
          = P(X_1 = 0 | X_0 = 0) P(X_0 = 0)
          = 0

p1(1) = P(X_1 = 1)
      = ∑_i P(X_1 = 1 | X_0 = i) P(X_0 = i)
      = P(X_1 = 1 | X_0 = 0) P(X_0 = 0)
      = 0.5

Similarly, p2(1) = 0.5 × 1 = 0.5

(ii) p0(2) = P(X_2 = 0)
           = ∑_i P(X_2 = 0 | X_1 = i) P(X_1 = i)
           = 0.75 × 0.5 + 0.75 × 0.5
           = 0.75
p1(2) = P(X_2 = 1) = 0.125
p2(2) = P(X_2 = 2) = 0.125


State Probability Vector

Symbol: p(n) = [ p0 (n )........ pk (n )]


From a state probability vector, we get the information about the
probability in different states at time n.

For example,
p0 (n ) = P ( X n = 0 ), which is the state probability in state 0 at time n.
p1 (n ) = P( X n = 1), which is the state probability in state 1 at time n.
.
.
.
pk (n ) = P( X n = k ), which is the state probability in state k at time n.

Property:
∑_{j=0}^{k} p_j(n) = 1, and each element p_j(n) is nonnegative.

How to find state probability vector?


Method 1:
By one iteration with n-step transition matrix
p(n ) = p(0 )P n

Method 2:
By n iterations with the one-step transition matrix
p (n ) = p (n − 1)P

Using p(0) = [p0  p1] to denote the probabilities of states 0 and 1 at time n = 0, and with the state transition matrix

P = [ 1−p   p  ]
    [  q   1−q ] ,

it can be shown that the state probabilities at time n are

p(n) = [p0(n)  p1(n)] = [q/(p+q)  p/(p+q)] + λ2^n [(p0 p − p1 q)/(p+q)   (−p0 p + p1 q)/(p+q)]

where λ2 = 1 − (p + q).


The derivation of the state probability vector

p(n) = [p0(n)  p1(n)] = [q/(p+q)  p/(p+q)] + λ2^n [(p0 p − p1 q)/(p+q)   (−p0 p + p1 q)/(p+q)],

where λ2 = 1 − (p + q), is shown below.
Step 1: Find the eigenvalues
Step 2: Find the eigenvectors
Step 3: Form a Q matrix from eigenvectors
Step 4: Diagonalization
Step 5: By using one iteration with n-step transition matrix

Step 1: find the eigenvalues


| 1−p−λ     p    |
|   q     1−q−λ  | = λ² − [1 + {1 − (p + q)}]λ + 1 − (p + q)

Hence, λ = 1, 1 − (p + q).

Step 2: find the eigenvectors


For λ = 1:
[ −p   p ] [x]   [0]
[  q  −q ] [y] = [0]
−p·x + p·y = 0, so x = y.
Choosing x = 1, we have v1 = [1]
                             [1]

For λ = 1 − (p + q):
[ q  p ] [x]   [0]
[ q  p ] [y] = [0]
q·x + p·y = 0.
Choosing x = p, we have v2 = [ p]
                             [−q]


Step 3: Form a Q matrix from eigenvectors


Hence, we have Q = [ 1   p ]
                   [ 1  −q ]   and

Q⁻¹ = 1/(−q−p) [ −q  −p ] = 1/(p+q) [ q   p ]
               [ −1   1 ]           [ 1  −1 ]

Step 4: Diagonalization
By diagonalizing the matrix P we obtain P = QDQ⁻¹, where D is a diagonal matrix with the eigenvalues on the diagonal:

P = [ 1   p ] [ 1      0      ] 1/(p+q) [ q   p ]
    [ 1  −q ] [ 0  1−(p+q)    ]         [ 1  −1 ]

Hence (P^n is found easily because P^n = QD^nQ⁻¹):

P^n = [ 1   p ] [ 1        0        ] 1/(p+q) [ q   p ]
      [ 1  −q ] [ 0  {1−(p+q)}^n    ]         [ 1  −1 ]

    = 1/(p+q) [ 1   p{1−(p+q)}^n ] [ q   p ]
              [ 1  −q{1−(p+q)}^n ] [ 1  −1 ]

    = 1/(p+q) [ q + p{1−(p+q)}^n    p − p{1−(p+q)}^n ]
              [ q − q{1−(p+q)}^n    p + q{1−(p+q)}^n ]

Step 5: By using one iteration with n-step transition matrix


Let λ = 1 − (p + q). Then
p(n) = p(0) P^n
     = [p0  p1] · 1/(p+q) [ q + pλ^n    p − pλ^n ]
                          [ q − qλ^n    p + qλ^n ]
     = 1/(p+q) [ p0 q + p0 pλ^n + p1 q − p1 qλ^n    p0 p − p0 pλ^n + p1 p + p1 qλ^n ]
     = 1/(p+q) [ (p0 + p1)q    (p0 + p1)p ] + λ^n/(p+q) [ p0 p − p1 q    −p0 p + p1 q ]
     = 1/(p+q) [ q    p ] + λ^n/(p+q) [ p0 p − p1 q    −p0 p + p1 q ]     (since p0 + p1 = 1)
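
The closed form for P^n can be checked numerically. The sketch below (my own, with arbitrary values p = 0.2, q = 0.1 and n = 5) compares the diagonalization formula with a direct matrix power.

import numpy as np

p, q, n = 0.2, 0.1, 5
lam = 1 - (p + q)

P = np.array([[1 - p, p],
              [q, 1 - q]])

# Closed form: P^n = 1/(p+q) [[q + p*lam^n, p - p*lam^n],
#                             [q - q*lam^n, p + q*lam^n]]
Pn_formula = (1 / (p + q)) * np.array([[q + p * lam**n, p - p * lam**n],
                                       [q - q * lam**n, p + q * lam**n]])

Pn_direct = np.linalg.matrix_power(P, n)
print(np.allclose(Pn_formula, Pn_direct))   # True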


Example (Refer to Example in Lecture Notes Series)


We may solve the previous example by using the iteration formula (under the state probability vector).

Let X k denote the position of a particle after k transitions and X 0 be the


particle’s initial position, pi (k ) be the probability of the particle in state i
after k transitions.
The table below shows the probability for the movement of the particle.
(Assume that the particle’s initial position is in state 0.)
Probability of moving to next position
Current state State 0 State 1 State 2
State 0 0 0.5 0.5
State 1 0.75 0 0.25
State 2 0.75 0.25 0
(i) Find the probability of the particle’s position after first
transition.
(ii) Find the probability of the particle’s position after second
transition.

Solve the question by using the iteration formula above.


(i) p(1) = p(0)P
         = (1  0  0) [ 0     0.5   0.5  ]
                     [ 0.75  0     0.25 ]
                     [ 0.75  0.25  0    ]
         = (0  0.5  0.5)

So, p0(1) = 0, p1(1) = 0.5, p2(1) = 0.5

(ii) p(2) = p(1)P
          = (0  0.5  0.5) [ 0     0.5   0.5  ]
                          [ 0.75  0     0.25 ]
                          [ 0.75  0.25  0    ]
          = (0.75  0.125  0.125)

So, p0(2) = 0.75, p1(2) = 0.125, p2(2) = 0.125


Example
Refer to the earlier example on the connection between two communication nodes. The connection is in any of the following three states.
State 0 – No connection, State 1 – Slow connection, State 2 – Fast
connection
For this process, the transition probability matrix is given as below:
0.7 0.2 0.1 
P = 0.5 0.4 0.1 
0.5 0.25 0.25
Assume initially the connection is at full speed: p(0) = (0 0 1)
Then the probabilities of each type of connection after increasing number
of transitions are:
 0.7 0.2 0.1   0.7 0.2 0.1 
   
p (1) = p (0 ) 0.5 0.4 0.1  = (0 0 1) 0.5 0.4 0.1 
 0.5 0.25 0.25   0.5 0.25 0.25 
   
= (0.5 0.25 0.25)
 0.7 0.2 0.1   0.7 0.2 0.1 
   
p (2 ) = p (1) 0.5 0.4 0.1  = (0.5 0.25 0.25) 0.5 0.4 0.1 
 0.5 0.25 0.25   0.5 0.25 0.25 
   
= (0.6 0.2625 0.1375)
p(3) = p(2)P = (0.62    0.2594  0.1206)
p(4) = p(3)P = (0.6240  0.2579  0.1181)
p(5) = (0.6248  0.2575  0.1177)
p(6) = (0.6250  0.2574  0.1177)
p(7) = (0.6250  0.2574  0.1176)
p(8) = (0.6250  0.2574  0.1176)

If we assume that initially the connection is equally likely to be in any of the 3 states, then
p(0) = (1/3  1/3  1/3)
p (1) = (0.5667 0.2833 0.1500 )
p (2 ) = (0.6133 0.2642 0.1225)
p (3) = (0.6227 0.2590 0.1184 )
p (4 ) = (0.6245 0.2577 0.1178)
p (5) = (0.6249 0.2574 0.1177 )
p (6 ) = (0.6250 0.2574 0.1176)

Notice the probabilities converge to certain values independent of p(0) .
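
A short sketch (not from the notes) reproducing the iteration p(n) = p(n−1)P for this connection example; it shows the state probability vectors settling to roughly the same values for both choices of p(0).

import numpy as np

P = np.array([[0.70, 0.20, 0.10],
              [0.50, 0.40, 0.10],
              [0.50, 0.25, 0.25]])

for p0 in (np.array([0.0, 0.0, 1.0]),        # start at full speed
           np.array([1/3, 1/3, 1/3])):       # start equally likely in each state
    p = p0
    for n in range(1, 9):
        p = p @ P                             # p(n) = p(n-1) P
    print(np.round(p, 4))                     # both approach (0.625, 0.2574, 0.1176)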


Limiting State Probabilities Symbol: π j

What is limiting state probability, π j ?


The probability that the system will stay in state j in the future (or after a
long run)

How to find limiting state probability (if they exist)?

π_j = lim_{n→∞} p_j(n) = lim_{n→∞} P[X_n = j]

Example:
Consider a transition matrix as follows:
0 1
0 0.8 0.2
1  0.1 0.9
What is the limiting state (stationary) probability vector [π 0 π1 ] ?

Solution:
Compare the transition matrix with P = [ 1−p   p  ]
                                       [  q   1−q ] .

We see that p = 0.2 and q = 0.1.
First, we may use the following formula to find the state probabilities at time n:

p(n) = [p0(n)  p1(n)] = [q/(p+q)  p/(p+q)] + λ2^n [(p0 p − p1 q)/(p+q)   (−p0 p + p1 q)/(p+q)], where λ2 = 1 − (p + q).

p(n) = [p0(n)  p1(n)] = [1/3  2/3] + λ2^n [(2/3)p0 − (1/3)p1    −(2/3)p0 + (1/3)p1]

Since |λ2| < 1, the limiting state probabilities are

[π0  π1] = lim_{n→∞} p(n) = [q/(p+q)  p/(p+q)] = [1/3  2/3]


If the Markov chain fulfils the following properties, then we have another method to solve the above example:
(i) aperiodic,
(ii) irreducible and
(iii) finite Markov chain.
This method is given under the section on the stationary probability vector (discussed later).

State Classification of a Markov chain:

Communication

State j is said to be accessible from state i if for some n ≥ 0, Pij( n ) > 0 .


(There exists a path from i to j)

If there is a path from i to j but no path back from j to i, then i and j do not communicate.

If two states i and j do not communicate, then either


(i) Pij(n ) = 0 ∀n ≥ 0 or
(ii) Pji(n ) = 0 ∀n ≥ 0 or
(iii) both relations are true.

If there exists a path from i to j and a path from j to i, then i and j communicate.


The concept of communication is an equivalence relation.


(i) i ↔ i (reflexivity)
(ii) If i ↔ j then j ↔ i (symmetry)
(iii) If i ↔ j and j ↔ k then i ↔ k (transitivity)
As a result of these three properties, the state space can be partitioned into
disjoint classes.

How to specify the classes?
The states in an equivalence class are those that communicate with each other.

Example:

Given a state transition diagram with state space S = {1, 2, 3, 4} as shown


below:

[state transition diagram with nodes 1, 2, 3, 4]

Specify the classes.

Solution:
C1 = {1}
C2 = {2, 3}
C3 = {4}


Example:
Given a transition probability matrix with state space S = {1, 2, 3} as
shown below:

P = [ 0.5  0.5  0   ]
    [ 0.7  0    0.3 ]     (rows and columns labelled 1, 2, 3)
    [ 0.1  0.9  0   ]

Specify the classes.

Solution:
There is only one class (all states communicate with each other), so the Markov chain is said to be irreducible.
C = {1, 2, 3}

Example:
Given a Markov chain with state space S = {1, 2, 3, 4, 5} and transition probability matrix as follows:
P = [ 0.4  0.6  0    0  0   ]
    [ 0.5  0.5  0    0  0   ]
    [ 0    0    0    1  0   ]     (rows and columns labelled 1, 2, 3, 4, 5)
    [ 0    0    0.8  0  0.2 ]
    [ 0    0    0    1  0   ]
Decompose the state space, S into equivalence classes.
Solution:


There are two classes: C1 ={1, 2} and C2 ={3, 4, 5}.
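
As a hypothetical helper (not part of the notes), communicating classes can be found from the reachability relation of the positive entries of P: two states belong to the same class exactly when each is accessible from the other. The function below is a sketch of that idea.

import numpy as np

def communicating_classes(P):
    """Partition the states of a transition matrix into communicating classes."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    reach = (P > 0) | np.eye(n, dtype=bool)     # accessible in 0 or 1 steps
    for k in range(n):                          # Floyd-Warshall style transitive closure
        reach |= reach[:, [k]] & reach[[k], :]
    classes, seen = [], set()
    for i in range(n):
        if i not in seen:
            cls = {j for j in range(n) if reach[i, j] and reach[j, i]}
            classes.append(sorted(cls))
            seen |= cls
    return classes

P = [[0.4, 0.6, 0, 0, 0],
     [0.5, 0.5, 0, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0.8, 0, 0.2],
     [0, 0, 0, 1, 0]]
print(communicating_classes(P))   # [[0, 1], [2, 3, 4]], i.e. {1, 2} and {3, 4, 5} in 1-based labels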


Periodicity
Symbol: d(i) [denotes the period of state i]

d(i) = g.c.d{ n : Pii(n) > 0 }.
(g.c.d is the largest integer that divides all such n exactly.)

- n is the number of steps taken to go from i back to i.
- In between, the process MAY or MAY NOT return to state i.

Example:
Given a Markov Chain with transition matrix:
P = [ 0  1  0  0 ]
    [ 0  0  1  0 ]     (rows and columns labelled 1, 2, 3, 4)
    [ 0  0  0  1 ]
    [ 1  0  0  0 ]

Find d(i), i = 1, 2, 3, 4.

Solution:
d(1) = g.c.d{ n : P11(n) > 0 }
     = g.c.d{4, 8, 12, ⋯}
     = 4
Similarly, d(2) = d(3) = d(4) = 4.


Some remarks:
(i) If Pii(n ) = 0 ∀ n ≥ 1 , define d(i) = 0.
(ii) If Pii(n) > 0 for two consecutive values n = s and n = s + 1 (s ≥ 1), then d(i) = 1.
(iii) If i ↔ j , then d(i) = d(j).
(iv) If d (i ) = 1 , then i is said to be aperiodic.
(v) If d (i ) ≥ 2 , then i is said to be periodic.

Periodicity is a class property. If state i in a class has period t, then all


states in that class have period t.

Example:
Find the period of all the states:
P = [ 0    0.3  0    0.7 ]
    [ 1    0    0    0   ]     (rows and columns labelled 1, 2, 3, 4)
    [ 0    0    0    1   ]
    [ 0.2  0    0.8  0   ]

Solution:
First, we determine the number of classes. All four states communicate, so there is only one class, C = {1, 2, 3, 4}.

We can see that P11(n) > 0 for n = 2, 4, 6, ⋯
∴ d(1) = 2
⇒ d(2) = d(3) = d(4) = 2 (since periodicity is a class property)
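
A quick numerical check (my own sketch): the period of a state can be estimated as the g.c.d of the return times n ≤ N for which Pii(n) > 0, for some cutoff N (here N = 20, an arbitrary choice).

import numpy as np
from math import gcd
from functools import reduce

def period(P, i, n_max=20):
    """g.c.d of the return times n <= n_max with (P^n)[i, i] > 0 (returns 0 if none)."""
    P = np.asarray(P, dtype=float)
    return_times = []
    Pn = np.eye(P.shape[0])
    for n in range(1, n_max + 1):
        Pn = Pn @ P
        if Pn[i, i] > 1e-12:
            return_times.append(n)
    return reduce(gcd, return_times, 0)

P = [[0, 0.3, 0, 0.7],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0.2, 0, 0.8, 0]]
print([period(P, i) for i in range(4)])   # [2, 2, 2, 2]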


Stationary probability vector


Symbol: π , π = [π 0 ⋯ π n ]

Recall from Limiting State Probability


Symbol: π j

Can you see the relations between stationary probability vector and
limiting state probability?

How to find stationary probability vector π ?


For an aperiodic, irreducible, finite Markov chain with transition matrix
P, the stationary probability vector π is the unique solution of

π = πP and ∑_{j∈S} π_j = 1.
The above formula can also be used for an irreducible, recurrent,
periodic, finite Markov chain.

Example:
Consider a transition matrix as follows:
0 1
0 0.8 0.2
1  0.1 0.9
What is the limiting state (stationary) probability vector [π 0 π1 ] ?

Solution
The above Markov chain fulfils the conditions of being
(i) aperiodic,
(ii) irreducible and
(iii) a finite Markov chain.

The Markov chain given yields the following three equations:

π 0 = 0.8π 0 + 0.1π1
π1 = 0.2π 0 + 0.9π1
π 0 + π1 = 1


From the first two equations, we see that π0 = 0.5π1 and π1 = 2π0.

Applying π0 + π1 = 1:
⇒ π0 + 2π0 = 1 ⇒ π0 = 1/3
⇒ 0.5π1 + π1 = 1 ⇒ π1 = 2/3

Thus π0 = 1/3 and π1 = 2/3.

Example
Refer to earlier example on connection between two communication
nodes. The connection is in any of the following three states.
State 0 – No connection,
State 1 – Slow connection
State 2 – Fast connection
For this process, the transition probability matrix is given as below:
0.7 0.2 0.1 
P = 0.5 0.4 0.1 
0.5 0.25 0.25
Then the probabilities of each type of connection in the long run are given by π = πP:

(π0  π1  π2) = (π0  π1  π2) [ 0.7  0.2   0.1  ]
                            [ 0.5  0.4   0.1  ]
                            [ 0.5  0.25  0.25 ]
Equation from the first column
π 0 = 0.7π 0 + 0.5π 1 + 0.5π 2 → 0.3π 0 − 0.5π 1 − 0.5π 2 = 0
Equation from the second column
π 1 = 0.2π 0 + 0.4π 1 + 0.25π 2 → 0.2π 0 − 0.6π 1 + 0.25π 2 = 0
Plus the standard equation
π 0 + π1 + π 2 = 1
This forms a 3 × 3 matrix equation:

[ 3   −5   −5 ] [π0]   [0]
[ 4  −12    5 ] [π1] = [0]
[ 1    1    1 ] [π2]   [1]

so

[π0]   [ 3   −5   −5 ]⁻¹ [0]            [−85]
[π1] = [ 4  −12    5 ]   [0] = 1/(−136) [−35]
[π2]   [ 1    1    1 ]   [1]            [−16]

π0 = 5/8 = 0.625,  π1 = 35/136 = 0.2574,  π2 = 2/17 = 0.1176
Compare these probabilities with p(8) in the earlier example.
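
The same stationary vector can be obtained numerically. The sketch below (mine, not from the notes) keeps two of the balance equations from π = πP, replaces the third with the normalisation ∑ πj = 1, and solves the resulting linear system.

import numpy as np

P = np.array([[0.70, 0.20, 0.10],
              [0.50, 0.40, 0.10],
              [0.50, 0.25, 0.25]])

n = P.shape[0]
A = P.T - np.eye(n)          # balance equations: (P^T - I) pi = 0
A[-1, :] = 1.0               # replace the last equation by pi_0 + pi_1 + pi_2 = 1
b = np.zeros(n)
b[-1] = 1.0

pi = np.linalg.solve(A, b)
print(np.round(pi, 4))       # [0.625  0.2574 0.1176]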


The difference between fii(n) and Pii(n):

What is f ii( n ) ?
f ii( n ) is the probability that, starting from state i, the first return to state i
occur at the nth transition.

f ii( n ) = P{X n = i, X υ ≠ i,υ = 1,2,⋯, n − 1 X 0 = i} for n ≥ 1 ,

Can you see the difference between f ii(n ) and Pii( n ) ?

For fii(n), state i does NOT appear in between the start and the first return; for Pii(n), state i may appear in between.

We can see that


(i) f ii(1) = Pii and
(ii) f ii(0 ) = 0 ∀ i .


Recurrence and Transient

If the process starts from state i and is certain to return to state i after some time, then we say that state i is a recurrent state.

States which are not recurrent are said to be transient.
In other words, a state i is transient if there is a way to leave state i and never return to state i.

How to determine whether a state is recurrent or transient?

Method 1
Draw and check the state transition diagram.

Method 2
Specify the classes and determine whether or not each class is a closed set.
A finite closed class is a recurrent class.
A set of states C in a Markov Chain is a closed set if no state outside of C is accessible from any state in C.

Method 3
A state i is recurrent if and only if ∑_{n=1}^{∞} fii(n) = 1.
A state i is transient if and only if ∑_{n=1}^{∞} fii(n) < 1.

Method 4
A state i is recurrent if and only if ∑_{n=1}^{∞} Pii(n) = ∞ (the series diverges).
A state i is transient if and only if ∑_{n=1}^{∞} Pii(n) < ∞ (the series converges).


A special case of a recurrent state is an absorbing state.

Some properties for recurrent states:


(i) If i ↔ j and if i is recurrent then j is recurrent.
(ii) A finite and closed set of state space is recurrent.
(iii) All states in a class are either recurrent or transient.
Suppose C is a finite class, class C is recurrent if and only if it is a
closed set.

Example:
Markov Chain with transition matrix:

P = [ 0    0    1    0   ]
    [ 1    0    0    0   ]     (rows and columns labelled 1, 2, 3, 4)
    [ 1/2  1/2  0    0   ]
    [ 1/4  1/4  1/4  1/4 ]

and S = {1, 2, 3, 4}

(a) Decompose the state space S into equivalence classes.
(b) Determine whether these equivalence classes are recurrent or transient.

Solution:

(a) C1 = {1, 2, 3}; C2 = {4}

(b) C1 is a closed set.
C2 is not a closed set.
So C1 is recurrent and C2 is transient.


Example (Refer to Example in Lecture Notes Series)

The following transition matrix represents the Markov success chain:

P = [ q  p  0  0      ]
    [ q  0  p  0  ⋯   ]
    [ q  0  0  p      ]     (rows and columns labelled 0, 1, 2, 3, ⋯)
    [ q  0  0  0  p ⋯ ]
    [ ⋮               ]

where S = {0, 1, 2, 3, ⋯}, p + q = 1, P_{i,i+1} = p and P_{i,0} = q for all i.
Is state 0 recurrent?
Solution:
∑_{n=1}^{∞} f00(n) = ∑_{n=1}^{∞} p^{n−1} q
                   = q ∑_{n=1}^{∞} p^{n−1}
                   = q/(1 − p)
                   = 1
⇒ state 0 is recurrent

Ergodic
The most important case is that in which a class is both recurrent and
aperiodic. Such classes are called ergodic and a chain consisting entirely
of one ergodic class is called an ergodic chain. These chains have the
property that Pij(n) becomes independent of the starting state i as n → ∞.

First Passage Times


For any state i and j, f ij( n ) is defined to be the probability that starting in i
the first transition into j occurs at time n. This length of time (normally
in terms of number of transition) is known as the first passage times.

These probabilities can be computed by the recursive relationship

fij(n) = Pij(n) − fij(1) Pjj(n−1) − fij(2) Pjj(n−2) − ... − fij(n−1) Pjj.

Theorem: Pij(n) = ∑_{k=0}^{n} fij(k) Pjj(n−k), n ≥ 1


Example:
Given a transition matrix as below
1 2
1 0.90 0.10
2 0.20 0.80

Find f12(3) .

Solution:
f12(2) = P12(2) − f12(1) P22(1)
       = 0.17 − (0.10)(0.80)
       = 0.09

(Directly: the only path that reaches state 2 for the first time at step 3 is 1 → 1 → 1 → 2, so f12(3) = 0.90 × 0.90 × 0.10 = 0.081.)

Using the recursion:
f12(3) = P12(3) − f12(1) P22(2) − f12(2) P22(1)
       = 0.219 − (0.10)(0.66) − (0.09)(0.80)
       = 0.081
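
A small sketch (not from the notes) implementing this recursion: it first builds the powers P, P², …, P^N and then peels off fij(1), …, fij(N).

import numpy as np

def first_passage_probs(P, i, j, n_max):
    """Return [f_ij(1), ..., f_ij(n_max)] using
    f_ij(n) = P_ij(n) - sum_{k=1}^{n-1} f_ij(k) P_jj(n-k)."""
    P = np.asarray(P, dtype=float)
    powers = [np.eye(P.shape[0])]            # powers[n] = P^n, with P^0 = I
    for _ in range(n_max):
        powers.append(powers[-1] @ P)
    f = []
    for n in range(1, n_max + 1):
        val = powers[n][i, j] - sum(f[k - 1] * powers[n - k][j, j] for k in range(1, n))
        f.append(val)
    return f

P = [[0.90, 0.10],
     [0.20, 0.80]]
print(np.round(first_passage_probs(P, 0, 1, 3), 4))   # [0.1  0.09  0.081]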


When ∑ f ij(n ) equals 1, f ij( n ) can be considered as a probability
n =1
distribution for the random variable, the first passage time.

Consider an ergodic chain. Denote by µij the expected number of transitions needed to travel from state i to state j for the first time, defined by

µij = ∞                         if ∑_{n=1}^{∞} fij(n) < 1,
µij = ∑_{n=1}^{∞} n fij(n)      if ∑_{n=1}^{∞} fij(n) = 1.

Whenever ∑_{n=1}^{∞} fij(n) = 1, µij satisfies uniquely the equation

µij = 1 + ∑_{k≠j} Pik µkj.


Example (Refer to Example in Lecture Notes Series)

Referring to previous example:


“Suppose the entire industry produces only two types of batteries. Given
that if a person last purchased battery 1, there is 80% possibility that the
next purchase will be battery 1. Given that if a person last purchased
battery 2, there is 90% possibility that the next purchase will be battery 2.
Let Xn denote the type of the nth battery purchased by a person. Construct the transition matrix.”

(a) Find µ12 and µ 21 .


(b) Interpret µ12 .

Solution:
Let state 1: battery 1 is purchased,
state 2: battery 2 is purchased.

(a) µ12 = 1 + P11 µ12 = 1 + 0.8 µ12
∴ µ12 = 5
µ21 = 1 + P22 µ21 = 1 + 0.9 µ21
∴ µ21 = 10.

(b) µ12 = 5 means that, on average, a person who last purchased battery 1 needs 5 more purchases until battery 2 is purchased for the first time.
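
As a numerical check (my own sketch), for a fixed target state j the equations µij = 1 + ∑_{k≠j} Pik µkj form a small linear system that can be solved directly; here it reproduces µ12 = 5 and µ21 = 10.

import numpy as np

def mean_first_passage_to(P, j):
    """Solve mu_ij = 1 + sum_{k != j} P_ik mu_kj for every starting state i,
    with the target state j fixed (for i = j this gives the mean return time)."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    Q = P.copy()
    Q[:, j] = 0.0                                  # remove the k = j terms from the sum
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))

P = np.array([[0.80, 0.20],     # state 1 (index 0): battery 1
              [0.10, 0.90]])    # state 2 (index 1): battery 2

print(mean_first_passage_to(P, 1)[0])   # mu_12 ≈ 5.0
print(mean_first_passage_to(P, 0)[1])   # mu_21 ≈ 10.0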

~END~

