
EE 5375/7375.

Random Processes

0-0
Outline for first part

• Axioms
• Conditional Probability
• Independence
• Sequences of Independent Experiments
• Random Variables

EE 5375/7375 p1 SMU Dept of EE


Probability Axioms

• Axiomatic approach to probability attempts to ensure logical


consistency by defining probability with a small number of pre-
cise axioms (postulates), then deducing rest of theory from the
axioms
• Kolmogorov’s axioms are the most widely accepted and taught
(in hindsight, they seem more obvious)
• We don’t really need the formalities to work with probabilities,
but should be aware of them

EE 5375/7375 p2 SMU Dept of EE


Experiments

• Probability is defined in context of a repeatable random experi-


ment or trial
• An experiment consists of
– Procedure
– Observation/Outcome
• A model is also assigned to the outcomes

EE 5375/7375 p3 SMU Dept of EE


Sample Space

• Sample space Ω is defined as the finest grain, mutually exclusive,


collectively exhaustive set of all possible outcomes of the random
experiment
• A sample point ωi is an element of Ω representing a particular
outcome, so Ω = {ω1 , ω2 , . . .}
• Ω can be finite, countably infinite, or uncountably infinite
– For two coin tosses, Ω = {HH, HT, T H, T T } (finite)
– For number of times required to transmit a data frame over
a noisy channel until an error-free frame is received, Ω =
{1, 2, 3, . . .} (countably infinite)
– For angle in wheel of fortune problem, Ω = {θ : 0 ≤ θ ≤ 2π}
(uncountably infinite)

EE 5375/7375 p4 SMU Dept of EE


Events

• Event A is any subset of Ω


• Ω is a subset of itself, so Ω is a valid event - called the certain
event
• Empty set ∅ is also a valid event - called impossible event
• Examples
– For two coin tosses, event of “both tosses same” A = {HH, T T }
– For a die toss, event of “less than 3” A = {1, 2}
• Events can be defined by set operations on other events
– Eg, event C = A ∪ B occurs if A or B occurs
– Event C = A ∩ B occurs if both A and B occur
• If A ∩ B = ∅ then A and B are disjoint or mutually exclusive

EE 5375/7375 p5 SMU Dept of EE


Relationship

Set Theory          Probability


Universal set       Sample space
Element of a set    Outcome
Set                 Event

EE 5375/7375 p6 SMU Dept of EE


Sigma Fields

• Let F be a collection of events defined for sample space Ω


• F is called a σ-field if
1. F includes ∅ and Ω (ie, we must be able to talk about the impos-
sible and the certain events)
2. If A is in F , then so is Ac (ie, if we can talk about the probability
of A, then we must be able to talk about "not A")
3. If A1 , A2 , . . . are in F , then so are their countable unions and
intersections (ie, F is closed under countable set operations; if
we can talk about the probabilities of A and B, we must be able
to talk about the events “A or B” and “A and B”)

EE 5375/7375 p7 SMU Dept of EE


Probability Measure

• Given a σ-field F , a probability measure P (·) is a mapping of


every event A ∈ F to a number P (A), called the probability of
A, satisfying these axioms for all events A and B in F :
1. P (A) ≥ 0
2. P (Ω) = 1; the total probability mass is 1
3. If A ∩ B = ∅, then P (A ∪ B) = P (A) + P (B); Probability
mass of disjoint events can be added together.

EE 5375/7375 p8 SMU Dept of EE


• From these axioms, it can be deduced that
– P (∅) = 0
– P (A) ≤ 1
– P (A) = 1 − P (Ac )
– If A ⊂ B, then P (A) ≤ P (B)
– If events A1 , A2 , . . . are all mutually exclusive, then
P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)

– P (A ∪ B) = P (A) + P (B) − P (A ∩ B); (how can we show this?)

EE 5375/7375 p9 SMU Dept of EE


Probability Space

• (Ω, F, P ) defines a probability space: a sample space Ω, a σ-field


F , a probability measure P defined on F
• Eg, for a coin toss,
– Ω = {H, T }
– F consists of these sets: {H}, {T }, Ω, ∅
– P ({H}) = P ({T }) = 1/2, P (Ω) = 1, P (∅) = 0
• For a die toss,
– Ω = {1, 2, 3, 4, 5, 6}
– F consists of all possible subsets of Ω including Ω and ∅
– Probability of any number is 1/6 → can calculate probability
of any event

EE 5375/7375 p10 SMU Dept of EE


Example

• A bucket contains 10 identical balls (0,1,...,9) and one is selected


at random
– Sample space Ω = {0, 1, . . . , 9}
– Each sample point has probability 1/10, eg, P ({0}) = 1/10
• Define events
– A = selected ball is odd = {1, 3, 5, 7, 9}
– B = selected ball is multiple of 3 = {3, 6, 9}
– C = selected ball is less than 5 = {0, 1, 2, 3, 4}
Since any ball is equally likely with probability 1/10,
– P (A) =?
– P (B) =?
– P (C) =?

EE 5375/7375 p11 SMU Dept of EE


• What is P (A ∩ B)?
• What is P (A ∪ B)?

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= ?

This can also be found directly by

P (A ∪ B) =?

• What is P (A ∪ B ∪ C)?

P (A ∪ B ∪ C) =?
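A minimal Python sketch of these calculations, assuming only the equally likely model above (event names follow the slide):

from fractions import Fraction

omega = set(range(10))                 # sample space {0, 1, ..., 9}
A = {1, 3, 5, 7, 9}                    # odd
B = {3, 6, 9}                          # multiple of 3
C = {0, 1, 2, 3, 4}                    # less than 5

def P(event):
    # equally likely outcomes: P(E) = |E| / |Omega|
    return Fraction(len(event), len(omega))

print(P(A), P(B), P(C))                        # 1/2, 3/10, 1/2
print(P(A & B))                                # P(A ∩ B) = 1/5
print(P(A | B), P(A) + P(B) - P(A & B))        # both give 3/5
print(P(A | B | C))                            # 9/10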

EE 5375/7375 p12 SMU Dept of EE


Conditional Probability

• The probability of event A given that event B has occurred, de-


noted P (A|B)
• Conditional probability P (A|B) is defined by

P (A|B) =?

EE 5375/7375 p13 SMU Dept of EE


Example: Die roll

Two fair dice are rolled. Let X1 denote the number that shows up
on die 1. Let X2 be the number that shows up on die 2. Define
A : X1 ≥ 4 and B : X1 + X2 is even.
Find the following
• P (A) =?
• P (B) =?
• P (A ∩ B)
• P (A|B)
• P (A|B c )
Can you express P (A) in terms of the conditional probabilities? Can
you express P (A) in terms of P (A ∩ B) and P (A ∩ B c )?
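One way to answer these is to enumerate all 36 equally likely outcomes; a minimal Python sketch (the event definitions follow the slide):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # all (X1, X2) pairs
A = {o for o in outcomes if o[0] >= 4}               # X1 >= 4
B = {o for o in outcomes if (o[0] + o[1]) % 2 == 0}  # X1 + X2 even
Bc = set(outcomes) - B

def P(event):
    return Fraction(len(event), len(outcomes))

print(P(A), P(B), P(A & B))        # 1/2, 1/2, 1/4
print(P(A & B) / P(B))             # P(A|B)   = 1/2
print(P(A & Bc) / P(Bc))           # P(A|B^c) = 1/2
# total probability check: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)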

EE 5375/7375 p14 SMU Dept of EE


Example: Binary Symmetric channel

• X is transmitted symbol (0 or 1)
• Y is received symbol (0 or 1)
• Channel noise may cause X and Y to be different
• Sample space Ω = {(X, Y )} = {(0, 0), (0, 1), (1, 0), (1, 1)}
• Suppose by design, P (X = 0) = P (X = 1) = 0.5, and from
measurements,

P (Y = 1|X = 1) = P (Y = 0|X = 0) = 0.9

P (Y = 0|X = 1) = P (Y = 1|X = 0) = 0.1


• calculate the following probabilities

P (X = 0, Y = 0) =

EE 5375/7375 p15 SMU Dept of EE


P (X = 0, Y = 1) =
P (X = 1, Y = 0) =
P (X = 1, Y = 1) =

• What is P (Y = 0)?
• This is an example of the theorem of total probability, which is
useful for finding unconditional probabilities from conditional ones
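A small sketch of these computations, using P (X = x, Y = y) = P (Y = y|X = x)P (X = x) and then the total-probability step (values from the slide):

p_x = {0: 0.5, 1: 0.5}                         # a priori probabilities of X
p_y_given_x = {(0, 0): 0.9, (1, 0): 0.1,       # P(Y=y | X=0), keyed as (y, x)
               (0, 1): 0.1, (1, 1): 0.9}       # P(Y=y | X=1)

# joint probabilities P(X=x, Y=y) = P(Y=y|X=x) P(X=x)
joint = {(x, y): p_y_given_x[(y, x)] * p_x[x] for x in (0, 1) for y in (0, 1)}
print(joint)     # {(0,0): 0.45, (0,1): 0.05, (1,0): 0.05, (1,1): 0.45}

# theorem of total probability: P(Y=0) = sum over x of P(Y=0|X=x) P(X=x)
print(sum(joint[(x, 0)] for x in (0, 1)))      # 0.5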

EE 5375/7375 p16 SMU Dept of EE


Theorem of Total Probability

Suppose B1 , B2 ,..., Bn are mutually exclusive events and B1 ∪ B2 ∪


· · ·∪Bn = Ω (ie, these events are said to partition the sample space),
then the (unconditional) probability of event A is

P (A) = P (A|B1 )P (B1 ) + · · · + P (A|Bn )P (Bn )

This theorem leads directly to Bayes’ rule or Bayes’ theorem.

EE 5375/7375 p17 SMU Dept of EE


Bayes’ Theorem

Suppose A1 , A2 ,..., An are mutually exclusive events and A1 ∪ A2 ∪


· · · ∪ An = Ω (these events partition the sample space), then the
conditional probability of Aj given B is
P(A_j | B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B|A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B|A_i)\, P(A_i)}

The numerator is simply the definition of conditional probability.


The denominator is expanded by theorem of total probability.

EE 5375/7375 p18 SMU Dept of EE


Example: binary communication system

• Given Y = 1 was received, what is the probability that the


transmitted symbol was X = 1?

P (X = 1|Y = 1) = ?

• P (X = 1) is called a priori probability of X


• P (X = 1|Y = 1) is a posteriori probability of X given Y
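A minimal sketch of this Bayes' rule computation, reusing the channel numbers from the earlier slide:

p_x1 = p_x0 = 0.5          # a priori probabilities
p_y1_given_x1 = 0.9        # channel measurements from the earlier slide
p_y1_given_x0 = 0.1

# P(X=1 | Y=1) = P(Y=1|X=1) P(X=1) / P(Y=1), with P(Y=1) by total probability
p_y1 = p_y1_given_x1 * p_x1 + p_y1_given_x0 * p_x0
print(p_y1_given_x1 * p_x1 / p_y1)     # a posteriori probability, 0.9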

EE 5375/7375 p19 SMU Dept of EE


Example: 2 Dice roll

For the earlier 2 dice roll example, find P (B|A).


Compute it directly and also using Bayes’ theorem.

EE 5375/7375 p20 SMU Dept of EE


Independence

• Events A and B are independent if

P (A ∩ B) = P (A)P (B)

or equivalently, in terms of conditional probabilities,

P (A|B) = P (A)

P (B|A) = P (B)

• What is the difference between independence and mutual exclusivity?


• If A and B are independent, what can we say about A and B c ?

EE 5375/7375 p21 SMU Dept of EE


Example

Let a card be drawn at random from a regular pack of 52 cards.


Let A be the event that the card is a spade and B be the event that
the card is a 7. Are events A and B independent?

EE 5375/7375 p22 SMU Dept of EE


Multiple independent events

• Three events A, B, and C are independent iff ...


• Does pairwise independence imply joint independence?
• Consider the example of rolling 2 dice. Define event A: the sum of
the rolls is 7, event B: the first die shows 4, and event C: the second
die shows 3 (checked numerically below).
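A short enumeration for this example; it checks each pairwise product and then the three-way product (a sketch using nothing beyond the events defined above):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) == 7}     # sum of the rolls is 7
B = {o for o in outcomes if o[0] == 4}       # first die shows 4
C = {o for o in outcomes if o[1] == 3}       # second die shows 3

def P(e):
    return Fraction(len(e), len(outcomes))

print(P(A & B) == P(A) * P(B))     # True (both sides are 1/36)
print(P(A & C) == P(A) * P(C))     # True
print(P(B & C) == P(B) * P(C))     # True, so pairwise independent
# but jointly: P(A ∩ B ∩ C) = 1/36 while P(A)P(B)P(C) = 1/216
print(P(A & B & C), P(A) * P(B) * P(C))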

EE 5375/7375 p23 SMU Dept of EE


Sequence of Independent Experiments

• In many problems, we are concerned with probabilities related


to sequences of independent experiments or trials
– Eg, coin tosses, dice throws
• Simplest example of independent experiments is Bernoulli trials
– Outcome of each experiment is a success or failure.
• Eg, what is the probability of 2 heads occurring in 3 coin tosses?

EE 5375/7375 p24 SMU Dept of EE


Binomial Probabilities

• This can be generalized by the binomial probability law: let k


be number of successes in n Bernoulli trials with probability of
success p, then probability of k successes is

pn (k) =?

EE 5375/7375 p25 SMU Dept of EE


Example

• A lottery consists of drawing 3 numbers randomly from a bucket


of 20 different numbered balls, without replacing the balls
• A ticket wins if the 3 numbers match, regardless of order
– There are \binom{20}{3} = \frac{20!}{3!\,17!} = \frac{(20)(19)(18)}{(1)(2)(3)} = 1140 ways of choosing
3 numbers out of 20
– Since each is equally likely, any lottery ticket has probability
1/1140 of winning
• What if the sequential order of the 3 numbers is important (ie, ticket
must match the order of 3 balls drawn)?

EE 5375/7375 p26 SMU Dept of EE


Example

• A long distance carrier saves bandwidth by carrying 8 telephone


conversations over 6 transmission channels, choosing only con-
versations that are active at the time
• Assume conversations are independent, and any conversation is
active at any time with probability p = 1/3
– If more than 6 conversations are active, only 6 can be carried
and the others are “clipped” off
• Probability of exactly k active conversations is binomial

p8 (k) =

• Probability of more than 6 active conversations is ?
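A minimal sketch of the binomial tail computation, with n = 8 and p = 1/3 as assumed above:

from math import comb

n, p = 8, 1/3

def p_n(k):
    # binomial probability of exactly k active conversations
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(p_n(7) + p_n(8))   # P(more than 6 active) is about 0.0026, so clipping is rare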

EE 5375/7375 p27 SMU Dept of EE


Example

• Suppose a transmitter is sending 10 bits/second over a channel


that has random bit errors with probability p = 0.001
• What is probability of at least one error in one second?
• Consider number of errors in one sec as number of successes in
n = 10 Bernoulli trials

P (at least one error/sec) = 1 − P (no errors/sec)


= 1 − p_{10}(0)
= 1 − \binom{10}{0} (0.001)^0 (0.999)^{10}
= 1 − (0.999)^{10} ≈ 0.01

EE 5375/7375 p28 SMU Dept of EE


Example

• Bits are transmitted over a communication channel that has ran-


dom bit errors with probability p = 0.001
• To compensate, each bit is transmitted 3 times and receiver takes
a majority vote of received bits to decide on transmitted bit
– Receiver will make incorrect decision if channel has 2 or more
errors for 3 bits
– Probability of 2 or more successes (errors) in 3 Bernoulli trials
is
p_3(2) + p_3(3) = \binom{3}{2}(0.001)^2(0.999) + \binom{3}{3}(0.001)^3 ≈ 3 \times 10^{-6}
• Receiver is very likely correct but this method is very costly in
bandwidth

EE 5375/7375 p29 SMU Dept of EE


Example

• Suppose 5 missiles are fired at a battleship, but each missile gets


past the ship’s defenses with probability 0.1 (independently of
each other)
• At least 2 missiles are required to sink the ship
• Number of missiles getting through is number of successes in 5
Bernoulli trials
• What is the probability that the ship survives the attack?
• Does the answer make sense? (see the quick check below)
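The ship survives only if 0 or 1 missiles get through; a quick sketch with the binomial PMF (n = 5, p = 0.1 from the slide):

from math import comb

n, p = 5, 0.1

def p_n(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(p_n(0) + p_n(1))     # P(ship survives) is about 0.92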

EE 5375/7375 p30 SMU Dept of EE


Geometric Probabilities

• Suppose we continue Bernoulli trials until the first success, and


ask what is the probability that the first success occurs on the kth trial?
• Packet retransmission in TCP

p(k) = (1 − p)^{k−1}\, p

• Probabilities can be visualized in a tree diagram

EE 5375/7375 p31 SMU Dept of EE


Example: TCP

• Each packet transmission succeeds with probability p = 0.9


and fails (is errored) with probability 1 − p = 0.1
• Each transmission is a Bernoulli trial with probability of success
p
• Probability that exactly k transmissions are needed for a message
is p(k) = (1 − p)^{k−1}\, p
• Probability that more than 2 transmissions are needed = ? Com-
pute your answer in two different ways.
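The two ways agree; a small sketch with p = 0.9 as above:

p = 0.9

def p_k(k):
    # probability that exactly k transmissions are needed
    return (1 - p)**(k - 1) * p

way1 = 1 - p_k(1) - p_k(2)   # complement of "1 or 2 transmissions suffice"
way2 = (1 - p)**2            # the first two transmissions both fail
print(way1, way2)            # both 0.01 (up to floating-point rounding)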

EE 5375/7375 p32 SMU Dept of EE


Random Variables

• Random variable X is a function that maps each sample point


ω to a real number, X(ω)
• Types of random variables
– discrete
– continuous
– mixed type
• Example for random variable?

EE 5375/7375 p33 SMU Dept of EE


Cumulative Distribution Function

• Cumulative distribution function (CDF) for random variable X


is defined as probability of the event {X ≤ x}:
FX (x) = P (X ≤ x)

• In other words, it is the probability that X takes a value in


(−∞, x]
• Properties of CDF
– 0 ≤ FX (x) ≤ 1
– FX (−∞) =?, FX (∞) =?
– FX (x) is nondecreasing function of x
– FX (x) is continuous from the right: FX (x) = \lim_{\epsilon \to 0^+} FX (x + \epsilon)
– P (a < X ≤ b) = FX (b) − FX (a). Does the equality (at b)
matter?
EE 5375/7375 p34 SMU Dept of EE
Probability Mass Function - discrete RV

• The PMF of a discrete rv X is simply given by PX (x) = P (X = x)


• Note the notational difference between X and x!
• Example:
• Next, we look at some commonly occurring discrete RVs (why?)

EE 5375/7375 p35 SMU Dept of EE


Relation between PMF and CDF

• What is probability of event {X = a} in terms of CDF?


– We know

P (a − ε < X ≤ a) = FX (a) − FX (a − ε)

– To find P (X = a), we take the limit ε → 0:

P (X = a) = FX (a) − FX (a−)

• It’s easier to visualize the relationship between PMF and CDF

EE 5375/7375 p36 SMU Dept of EE


Bernoulli PMF

• Recall Bernoulli trial has two possible outcomes, success or fail-


ure
• Bernoulli random variable has two possible values: 1 (success)
with probability p or 0 (failure) with probability 1 − p
• Bernoulli PMF

PX (x) = p^x (1 − p)^{1−x} for x = 0, 1

EE 5375/7375 p37 SMU Dept of EE


Binomial PMF

• Recall binomial probabilities


p_n(k) = \binom{n}{k}\, p^k (1 − p)^{n−k}

are associated with k successes occurring in n Bernoulli trials
with probability of success p
• Likewise, a binomial random variable has PMF

P_X(x) = \binom{n}{x}\, p^x (1 − p)^{n−x} for x = 0, 1, ..., n

EE 5375/7375 p38 SMU Dept of EE


Geometric PMF

• Geometric random variable has PMF

PX (x) = (1 − p)^x\, p for x = 0, 1, . . .

• This has exponential decrease

EE 5375/7375 p39 SMU Dept of EE


Poisson PMF

• Poisson random variable has PMF


P_X(x) = e^{-a}\, \frac{a^x}{x!} for x = 0, 1, . . .
for a > 0
• This arises in situations where events occur “completely at ran-
dom” in time

EE 5375/7375 p40 SMU Dept of EE


Discrete Uniform PMF

PX (x) = \frac{1}{b − a + 1} for x = a, a + 1, . . . , b.
Pascal RV

A biased coin is flipped until Heads appears exactly k times. What


is the PMF of L, the number of flips required for this?

EE 5375/7375 p41 SMU Dept of EE


Relationship between Discrete Random Variables

• Relations between Bernoulli, binomial, and Poisson?


• Binomial is the number of successes in n Bernoulli trials
• Binomial PMF will converge to Poisson PMF if we hold np = a
constant in the limit n → ∞
– We will show this later
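The convergence can also be seen numerically; a sketch that holds a = np fixed while n grows (the values a = 2 and k = 3 are arbitrary illustrations):

from math import comb, exp, factorial

a, k = 2.0, 3

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, a):
    return exp(-a) * a**k / factorial(k)

for n in (10, 100, 1000):
    p = a / n                          # keep np = a constant
    print(n, binom_pmf(k, n, p), poisson_pmf(k, a))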

EE 5375/7375 p42 SMU Dept of EE


Probability Density Function - continuous RV

• For a continuous and differentiable CDF, the probability density func-


tion (pdf) for random variable X is defined as:

f_X(x) = \frac{d}{dx} F_X(x)
• If pdf fX (x) exists, then
– fX (x) ≥ 0
– \int_{-\infty}^{\infty} f_X(x)\, dx = 1
– F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy
– \int_a^b f_X(y)\, dy = P (a < X ≤ b)
– fX (x) can have a value greater than 1

EE 5375/7375 p43 SMU Dept of EE


• Interpretation of pdf if it is continuous:
P (x < X ≤ x + ∆x) = \int_x^{x+\Delta x} f_X(y)\, dy ≈ f_X(x)\, \Delta x

for very small ∆x


– So pdf fX (x) is proportional to the probability that X will be
“around” the value x
• It is incorrect to think that fX (x) = P (X = x)
– What is probability that X = x exactly?
– X = x means (x < X ≤ x + ∆x) with ∆x = 0
– So we have to say P (X = x) = fX (x) · 0 = 0
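A quick numerical illustration of the approximation P (x < X ≤ x + ∆x) ≈ fX (x)∆x, using an exponential pdf with λ = 1 (an arbitrary choice):

from math import exp

lam, x, dx = 1.0, 0.5, 0.01

exact = exp(-lam * x) - exp(-lam * (x + dx))   # from the exponential CDF
approx = lam * exp(-lam * x) * dx              # f_X(x) * dx
print(exact, approx)                           # both about 0.006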

EE 5375/7375 p44 SMU Dept of EE


PMF and PDF for discrete RVs

• If the CDF FX (x) is a sequence of steps


– Strictly speaking, FX (x) is not differentiable so pdf fX (x)
does not exist
– Instead, the pdf is defined in terms of Dirac delta functions
(sometimes called impulse functions)
• Dirac delta function δ(x) is 0 everywhere except at x = 0 where
it is infinite such that
\int_{-\infty}^{\infty} \delta(x)\, dx = 1

or equivalently,

\int_{-\infty}^{\infty} f(y)\, \delta(x − y)\, dy = f(x)

EE 5375/7375 p45 SMU Dept of EE


• It can be visualized as a tall skinny rectangle of width 1/a and
height a (such that its area is always 1) in the limit a → ∞
• The Dirac delta function can be considered the derivative of the unit
step function u(x):

\delta(x) = \frac{d}{dx} u(x)
• Using Dirac delta functions, we can say that the CDF for a discrete
random variable X consists of step functions:

F_X(x) = \sum_i P_X(x_i)\, u(x − x_i)

and the pdf consists of delta (impulse) functions:

f_X(x) = \sum_i P_X(x_i)\, \delta(x − x_i)

where PX (xi ) is the probability that X = xi

EE 5375/7375 p46 SMU Dept of EE


Uniform PDF

• Uniform random variable has equal probability over some inter-


val:
f_X(x) = \frac{1}{b − a} for a ≤ x ≤ b

EE 5375/7375 p47 SMU Dept of EE


Exponential PDF

• Exponential random variable is often used to model the lifetime


of components, and has an important role in queueing theory

fX (x) = λe−λx for x ≥ 0

• Parameter λ determines the height and rate of decay


• A single parameter λ (equivalently, the mean 1/λ) determines
the entire distribution

EE 5375/7375 p48 SMU Dept of EE


Rayleigh PDF

• Rayleigh random variable is useful for certain types of signal


noise
f_X(x) = \frac{x}{\sigma^2}\, e^{-x^2/2\sigma^2} for x ≥ 0
• Parameter σ determines the height and rate of decay

EE 5375/7375 p49 SMU Dept of EE


Laplacian PDF

• Laplacian random variable is useful in modeling speech sources


and image gray levels
f_X(x) = \frac{c}{2}\, e^{-c|x|}
for c > 0
• Parameter c determines the height and rate of decay

EE 5375/7375 p50 SMU Dept of EE


Normal (Gaussian) PDF

• Normal or Gaussian random variable is widely useful in situ-


ations where many small variables are summed (central limit
theorem will be covered later)
• PDF depends on 2 parameters: mean µ (center) and variance σ 2
(width), and we denote this by X ∼ N (µ, σ 2 )
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/2\sigma^2}
• The standard normal distribution refers to N (0, 1), ie, zero mean and
unit variance
– Common notation Φ(x) refers to the standard normal CDF

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt

EE 5375/7375 p51 SMU Dept of EE


– There are common tables of Φ(x)
– Normal random variable X ∼ N (µ, σ 2 ) can always be related
to a standard normal random variable Y ∼ N (0, 1) by

X = σY + µ

so the CDF of X can be found by


F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)
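A small sketch of this standardization step; Φ is computed from the error function, using the standard relation Φ(z) = (1 + erf(z/√2))/2:

from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    # F_X(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2)
    return Phi((x - mu) / sigma)

print(Phi(0.0))                    # 0.5
print(normal_cdf(5.0, 3.0, 2.0))   # = Phi(1.0), about 0.841 (illustrative values)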

EE 5375/7375 p52 SMU Dept of EE


Gamma PDF

• Gamma random variable has 2 parameters: α > 0 and λ > 0


f_X(x) = \frac{\lambda(\lambda x)^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)} for x ≥ 0
where Γ(x) is the gamma function defined

\Gamma(x) = \int_0^{\infty} y^{x-1} e^{-y}\, dy

• In special case α = 1, X is exponential random variable


• In special case λ = 1/2 and α = k/2 where k is positive integer,
X is chi-square (χ2 ) random variable
• In special case α = m where m is positive integer, X is m-Erlang
random variable

EE 5375/7375 p53 SMU Dept of EE


Outline for next part

• Joint Random Variables


• Joint CDF
• Joint pdf
• Joint PMF
• Conditional Probabilities

EE 5375/7375 p54 SMU Dept of EE


Joint Random Variables

• Recall that CDF FX (x) of random variable X is probability of


event {X ≤ x}
• Joint CDF of two random variables X and Y is probability of
event {X ≤ x, Y ≤ y} = {X ≤ x} ∩ {Y ≤ y} denoted by

FXY (x, y) = P (X ≤ x, Y ≤ y)

• Can we visualize pictorially?

EE 5375/7375 p55 SMU Dept of EE


Example

• Suppose there are a number of students in a classroom


– Let X be the random height of a student
– Let Y be the random weight of the student
– Let Z be the random age of the student
• Joint CDF is

FXY Z (x, y, z) = P (X ≤ x, Y ≤ y, Z ≤ z)
= P (height ≤ x and weight ≤ y and age ≤ z)

EE 5375/7375 p56 SMU Dept of EE


Joint CDFs - Properties

• As in the one-dimensional case, joint CDFs are functions with


these properties:
– 0 ≤ FXY (x, y) ≤ 1

FXY (−∞, −∞) =?

FXY (∞, ∞) =?
– Since {X ≤ ∞} and {Y ≤ ∞} are certain events,
FXY (x, ∞) =?
FXY (∞, y) =?
FX (x) and FY (y) are called the marginal CDFs, obtained from
the joint CDF
EE 5375/7375 p57 SMU Dept of EE
– FXY (x, y) is nondecreasing function of x and nondecreasing
function of y
– FXY (x, y) is continuous from the right and above

F_{XY}(x, y) = \lim_{\epsilon \to 0^+} F_{XY}(x + \epsilon, y)

F_{XY}(x, y) = \lim_{\epsilon \to 0^+} F_{XY}(x, y + \epsilon)

– Probability of a joint interval

P (a < X ≤ b, c < Y ≤ d) =?
Express your answer in terms of joint CDFs.

EE 5375/7375 p58 SMU Dept of EE


Events

• For joint variables X and Y , they make a 2-dimensional space


(X, Y )
• Events can be defined in the (X, Y ) space
• For example,
– A = {X + Y ≤ 10}
– B = {min(X, Y ) ≤ 5}
– C = {X 2 + Y 2 ≤ 100}

EE 5375/7375 p59 SMU Dept of EE


Joint Probability Density Function - continuous RV’s

• Recall in one-dimensional case, pdf of X is defined as


f_X(x) = \frac{d}{dx} F_X(x)

if CDF FX (x) is continuous and differentiable
• The joint probability density function (pdf) is similarly defined
as

f_{XY}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{XY}(x, y)
• Properties are similar to the one-dimensional case:
– fXY (x, y) ≥ 0
– \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = ?
– F_{XY}(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(\eta, \xi)\, d\eta\, d\xi

EE 5375/7375 p60 SMU Dept of EE


– \int_c^d\int_a^b f_{XY}(x, y)\, dx\, dy = P (a < X ≤ b, c < Y ≤ d)
• Also, marginal pdf of X can be calculated by

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy

and marginal pdf of Y by

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx

EE 5375/7375 p61 SMU Dept of EE


Example

• Joint CDF

FXY (x, y) = (1 − e−αx )(1 − e−βy ) for x ≥ 0, y ≥ 0

• Joint pdf
fXY (x, y) =

• Marginal CDFs
FX (x) =
FY (y) =

• Marginal pdfs: calculate using i) the marginal CDFs and ii) the joint


pdf

EE 5375/7375 p62 SMU Dept of EE


Example

• Given that joint pdf has the form:

fXY (x, y) = ce−x e−y for 0 ≤ y ≤ x < ∞

what is the constant c?


• What is the marginal pdf of X?
• What is the marginal pdf of Y ?
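One way to pin down c is to force the pdf to integrate to 1 over the region 0 ≤ y ≤ x; a crude Riemann-sum sketch (grid size and cutoff are arbitrary):

from math import exp

# integrate e^{-x} e^{-y} over 0 <= y <= x, truncating x at 20
h, x_max = 0.02, 20.0
total, x = 0.0, h / 2
while x < x_max:
    y = h / 2
    while y < x:
        total += exp(-x) * exp(-y) * h * h
        y += h
    x += h

print(total)          # about 0.5, so c = 1 / 0.5 = 2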

EE 5375/7375 p63 SMU Dept of EE


Interpretation

• Interpretation of joint pdf if it is continuous:

P (x < X ≤ x + ∆x, y < Y ≤ y + ∆y)

= \int_y^{y+\Delta y}\int_x^{x+\Delta x} f_{XY}(\eta, \xi)\, d\eta\, d\xi
≈ f_{XY}(x, y)\, \Delta x\, \Delta y

for very small ∆x and ∆y


– So pdf fXY (x, y) is proportional to the probability that (X, Y )
will be “around” the value (x, y)

EE 5375/7375 p64 SMU Dept of EE


Probability of Events

• If an event has form {(X, Y ) ∈ A} where A is some region in the


(X, Y ) space, then its probability can be found by
P ({(X, Y ) ∈ A}) = \int\!\!\int_A f_{XY}(x, y)\, dx\, dy

• Eg, say joint pdf

f_{XY}(x, y) = \frac{1}{4} for 0 < x < 2, 0 < y < 2
– What is probability of event {0 < X < 1, 0 < Y < 1}?
– What is probability of event {X + Y ≤ 1}?
– What is probability of event {max(X, Y ) ≤ 1}?
– What is probability of event {min(X, Y ) ≤ 1}?
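Each of these is just (area of the region) × 1/4; a Monte Carlo sketch as a sanity check (the sample count is arbitrary):

import random

random.seed(0)
N = 200_000
hits = {"square": 0, "sum": 0, "max": 0, "min": 0}

for _ in range(N):
    x, y = random.uniform(0, 2), random.uniform(0, 2)   # matches f_XY = 1/4
    hits["square"] += (0 < x < 1 and 0 < y < 1)
    hits["sum"]    += (x + y <= 1)
    hits["max"]    += (max(x, y) <= 1)
    hits["min"]    += (min(x, y) <= 1)

for name, c in hits.items():
    print(name, c / N)
# expected: square 0.25, sum 0.125, max 0.25, min 0.75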

EE 5375/7375 p65 SMU Dept of EE


Joint Probability Mass Function - discrete RVs

• If X and Y are discrete random variables, then similar to one-


dimensional case,
– Joint CDF FXY (x, y) looks like step functions in x and y
– Joint pdf does not strictly exist due to discontinuities in the CDF
• Joint PMF is

PXY (xi , yj ) = P (X = xi , Y = yj )

for i = 1, 2, . . . ; j = 1, 2, . . .
• Probability of event A is sum of PMF over the outcomes in A:
P (A) = \sum_{(x_i, y_j) \in A} P_{XY}(x_i, y_j)

EE 5375/7375 p66 SMU Dept of EE


• Marginal PMF of X can be found by

PX (xi ) = P (X = xi )
= P (X = xi , Y ≤ ∞)
= \sum_{j=1}^{\infty} P_{XY}(x_i, y_j)

and similarly for marginal PMF of Y

EE 5375/7375 p67 SMU Dept of EE


Example

• Suppose the number of bytes in a message is geometrically dis-


tributed with PMF

PN (x) = (1 − p)\, p^x for x = 0, 1, . . .

• Messages are segmented into small packets of maximum length


M bytes
• Say X is number of full packets, Y is number of remaining bytes
• What is joint PMF of X and Y ?

PXY (x, y) = P (X = x, Y = y)
= P (N = xM + y)
= (1 − p)\, p^{xM + y}

EE 5375/7375 p68 SMU Dept of EE


• What is marginal PMF of X?

P (X = x) = P (N = xM ) + P (N = xM + 1) + · · ·
+P (N = xM + (M − 1))
= \sum_{j=0}^{M-1} (1 − p)\, p^{xM + j}

= (1 − p)\, p^{xM}\, \frac{1 − p^M}{1 − p}
= (1 − p^M)(p^M)^x

so X has geometric PMF with parameter p^M


• What is marginal PMF of Y ?

P (Y = y) = P (N = y) + P (N = M + y) + P (N = 2M + y) + · · ·
= \sum_{i=0}^{\infty} (1 − p)\, p^{iM + y}

EE 5375/7375 p69 SMU Dept of EE


= \frac{1 − p}{1 − p^M}\, p^y
for y = 0, 1, . . . , M − 1, so Y has truncated geometric PMF
• In summary, we need to know how to find probability of any
event in terms of the joint PMF/PDF/CDF
• Given joint CDF, we know how to find marginal CDF. What
about the reverse?

EE 5375/7375 p70 SMU Dept of EE


Conditional Probabilities

• Recall for events A and B, conditional probability P (A|B) is


P (A|B) = \frac{P(A \cap B)}{P(B)}

• For continuous joint pdf fXY (x, y), conditional pdf of Y given X
is

f_Y(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}

• For discrete joint PMF PXY (x, y), conditional PMF of Y given
X is

P_Y(y|x) = \frac{P_{XY}(x, y)}{P_X(x)}

EE 5375/7375 p71 SMU Dept of EE


Independence

• Similar to one-dimensional case, if X and Y are independent,


then
fY (y|x) = fY (y)
or
fXY (x, y) = fX (x)fY (y)

• In discrete case,
PY (y|x) = PY (y)
or
PXY (x, y) = PX (x)PY (y)

EE 5375/7375 p72 SMU Dept of EE


Example

• Given joint pdf (from earlier example):

fXY (x, y) = 2e−x e−y for 0 ≤ y ≤ x < ∞

• What is conditional pdf of X given Y ?


f_X(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{2 e^{-x} e^{-y}}{2 e^{-2y}} = e^{-(x-y)} for 0 ≤ y ≤ x

• Are X and Y independent?


– Recall fX (x) = 2e−x (1−e−x ) but this is not equal to fX (x|y)
– X and Y are dependent

EE 5375/7375 p73 SMU Dept of EE


Example

• Suppose a server has 2 communication lines


• Number of messages per hour received on the lines is X and Y
with joint PMF

PXY (x, y) = (1 − p)(1 − q)\, p^x q^y for x, y = 0, 1, . . .

• Are X and Y independent?

EE 5375/7375 p74 SMU Dept of EE


Example

• Suppose X is chosen uniformly from (0,1), then Y is chosen


uniformly from (0, X)
• Therefore fX (x) = 1 for 0 < x < 1 and
f_Y(y|x) = \frac{1}{x} for 0 < y < x
• Joint pdf is
f_{XY}(x, y) = f_Y(y|x)\, f_X(x) = \frac{1}{x} for 0 < y < x, 0 < x < 1
• Marginal pdf of Y :

fY (y) = \int_y^1 \frac{1}{x}\, dx = − ln y for 0 < y < 1 (how?)
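A quick Monte Carlo check of fY (y) = − ln y (sample size and bin width are arbitrary):

import random
from math import log

random.seed(0)
N, dy = 200_000, 0.05

for y0 in (0.1, 0.3, 0.6):
    hits = 0
    for _ in range(N):
        x = random.random()              # X ~ uniform(0, 1)
        y = random.uniform(0.0, x)       # Y | X = x ~ uniform(0, x)
        hits += (y0 <= y < y0 + dy)
    print(y0, hits / (N * dy), -log(y0 + dy / 2))   # empirical density vs -ln y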

EE 5375/7375 p75 SMU Dept of EE


Caveat

• There is a subtle difference between independence of events and


independence of discrete random variables

EE 5375/7375 p76 SMU Dept of EE


Outline for next part

• Functions of Random Variables


• Sums of 2 Random Variables
• Expectations

EE 5375/7375 p77 SMU Dept of EE


Functions of Random Variables

• If X is a random variable and g(x) is a real-valued function, then


Y = g(X) defines a new random variable
– Why? Think in terms of mappings?
– Can view g(x) as the function of a system with X as input
and Y as output
• Given the CDF/PMF/pdf of X, how do we find the same for Y ?
– discrete case
– continuous case

EE 5375/7375 p78 SMU Dept of EE


Example

• Suppose

Y = g(X) = (X)^+ = \begin{cases} 0 & \text{if } X < 0 \\ X & \text{if } X ≥ 0 \end{cases}

This keeps the positive part of X


• If FX (x) is the CDF of X, what is the CDF of Y = (X)^+ ?
– Note g(x) = (x)^+ maps all negative values of X to 0 but all
positive values of X are unchanged, so

F_Y(y) = \begin{cases} 0 & \text{if } y < 0 \\ F_X(y) & \text{if } y ≥ 0 \end{cases}

EE 5375/7375 p79 SMU Dept of EE


Example

• Suppose
Y = g(X) = aX + b
for constants a, b
• This is a linear function; note (assuming a > 0)

F_Y(y) = P (Y ≤ y) = P (aX + b ≤ y) = P\!\left(X ≤ \frac{y - b}{a}\right)

• Therefore

F_Y(y) = F_X\!\left(\frac{y - b}{a}\right)

EE 5375/7375 p80 SMU Dept of EE


Example

• Indicator function I{c(x)} where c(x) is a condition on x (eg,


x > 0) is defined as

I\{c(x)\} = \begin{cases} 0 & \text{if } c(x) \text{ is not true} \\ 1 & \text{if } c(x) \text{ is true} \end{cases}

• Suppose
Y = I{X > a}
for some a
• Note Y is discrete with only 2 possible values, 0 or 1, so Y has
PMF:
P (Y = 0) = P (X ≤ a) = FX (a)
P (Y = 1) = P (X > a) = 1 − FX (a)

EE 5375/7375 p81 SMU Dept of EE


Example

• Suppose
Y = X2
for continuous random variable X
• Note
P (Y ≤ y) = P (X^2 ≤ y) = P (−\sqrt{y} ≤ X ≤ \sqrt{y})

• Therefore

F_Y(y) = \begin{cases} 0 & \text{if } y < 0 \\ F_X(\sqrt{y}) − F_X(−\sqrt{y}) & \text{if } y ≥ 0 \end{cases}

EE 5375/7375 p82 SMU Dept of EE


Functions of 2 Random Variables

• What about functions of 2 or more random variables, eg, Z =


g(X, Y )?
– In principle, the approach is the same
– CDF of Z is determined by joint CDF of (X, Y )
• By far, most common case is sum Z = X + Y
– Note
FZ (z) = P (Z ≤ z) = P (X + Y ≤ z)
identifies the area under the line y = z − x

EE 5375/7375 p83 SMU Dept of EE


Sums of 2 Random Variables

• So we need to integrate this area of the joint pdf fXY (x, y):
F_Z(z) = \int\!\!\int_{x+y \le z} f_{XY}(x, y)\, dx\, dy
       = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{z-y} f_{XY}(x, y)\, dx\right) dy

• The pdf of Z can be found by differentiating:


f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\, dx

or equivalently,

f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(z - y, y)\, dy

• In the common case that X and Y are independent, then fXY (x, y) =

EE 5375/7375 p84 SMU Dept of EE


fX (x)fY (y) and
f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx

or equivalently,

f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\, dy

this is the convolution of fX and fY


– Think of “flip and slide” - take one function, reverse around
0, slide by z
– Then multiply two functions, and integrate
• If X and Y are independent discrete random variables, PMF of
Z = X + Y is found by discrete convolution
P_Z(n) = \sum_k P_X(k)\, P_Y(n - k)
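A minimal sketch of the discrete convolution, using the sum of two fair dice as the example (any two independent PMFs would do):

from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}     # PMF of one fair die

def convolve(pmf_x, pmf_y):
    # PMF of Z = X + Y for independent discrete X and Y
    pmf_z = {}
    for k, px in pmf_x.items():
        for m, py in pmf_y.items():
            pmf_z[k + m] = pmf_z.get(k + m, 0) + px * py
    return pmf_z

two_dice = convolve(die, die)
print(two_dice[7])                 # 1/6, the most likely sum
print(sum(two_dice.values()))      # 1, so it is a valid PMF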

EE 5375/7375 p85 SMU Dept of EE


Example

• Suppose X and Y are independent, exponentially distributed

fX (x) = λe−λx for x ≥ 0


fY (y) = γe−γy for y ≥ 0

• Now the convolution is (assuming λ ≠ γ)

f_Z(z) = \int_0^z \lambda e^{-\lambda x}\, \gamma e^{-\gamma(z-x)}\, dx
       = \lambda\gamma\, e^{-\gamma z} \int_0^z e^{-(\lambda-\gamma)x}\, dx
       = \frac{\lambda\gamma}{\lambda - \gamma}\, e^{-\gamma z}\left(1 - e^{-(\lambda-\gamma)z}\right)

EE 5375/7375 p86 SMU Dept of EE


Expectations

• Expected value or mean of continuous rv X is defined as


E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx

and for discrete random variable X is

E(X) = \sum_k x_k\, P_X(x_k)

– Mean can be visualized as “center of gravity” of a pdf

• In general, expected value E(g(X)) is defined as

E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx

or E(g(X)) = \sum_k g(x_k)\, P_X(x_k)

EE 5375/7375 p87 SMU Dept of EE


Means of Some Common Random Variables

• Bernoulli
E(X) = 1 · p + 0 · (1 − p) = p

• Binomial
E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n-x} = np

• Geometric
E(X) = \sum_{x=0}^{\infty} x (1 - p)^x p = \frac{1 - p}{p}

• Poisson
E(X) = \sum_{x=0}^{\infty} x\, e^{-a} \frac{a^x}{x!} = a

EE 5375/7375 p88 SMU Dept of EE


• Uniform
E(X) = \int_a^b x\, \frac{1}{b - a}\, dx = \frac{a + b}{2}

• Exponential
E(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}

• Normal pdf f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2} is symmetric around
x = µ so
E(X) = µ

EE 5375/7375 p89 SMU Dept of EE


Outline for next part

• Moments
• Variance
• Markov and Chebyshev Inequalities
• Sample Mean

EE 5375/7375 p90 SMU Dept of EE


Example

• Suppose g(X) = I{X > a}, ie,



g(X) = \begin{cases} 0 & \text{if } X ≤ a \\ 1 & \text{if } X > a \end{cases}

• Define Y = g(X) = I{X > a}, then

E(Y ) = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx = \int_a^{\infty} 1 \cdot f_X(x)\, dx = P (X > a)

EE 5375/7375 p91 SMU Dept of EE


Properties of Expectation

• We can easily show that, for constants a, b, and functions g(x)


and h(y),
1. E(ag(X) + bh(Y )) = aE(g(X)) + bE(h(Y ))
• When there are two RVs, which distribution is the expectation taken over?

EE 5375/7375 p92 SMU Dept of EE


Moments

• kth moment of X is the special case g(X) = X k , the expected


value of X k defined as
E(X^k) = \int_{-\infty}^{\infty} x^k f_X(x)\, dx

for a continuous variable, and for discrete random variable X is

E(X^k) = \sum_j (x_j)^k P_X(x_j)

EE 5375/7375 p93 SMU Dept of EE


Variance

• Second moment E(X 2 ) is of special interest in the form of the


variance:

var(X) = E[(X − E(X))2 ] = E(X 2 ) − [E(X)]2

– Sometimes var(X) is denoted by σ 2


– Standard deviation is the square root of variance, σ = \sqrt{var(X)}
• Variance or standard deviation is a rough measure of the “spread”
of pdf

EE 5375/7375 p94 SMU Dept of EE


Example

• Suppose X is uniformly distributed in (a, b)


• Mean

E(X) = \int_a^b x\, \frac{1}{b - a}\, dx = \frac{1}{b - a}\, \frac{b^2 - a^2}{2} = \frac{a + b}{2}

– This can also be seen from the symmetry of the pdf around \frac{a+b}{2}

• Variance

var(X) = \int_a^b \left(x - \frac{a + b}{2}\right)^2 \frac{1}{b - a}\, dx = \frac{(b - a)^2}{12}

which is proportional to the square of the width of the interval (a, b)

EE 5375/7375 p95 SMU Dept of EE


Example

• Suppose X is geometric with PMF

PX (x) = p(1 − p)x , x = 0, 1, . . .

• We will need to know the series



X 1
qk = 1 + q + q2 + · · · = for q < 1
1−q
k=0

Differentiating this, we get another series



X 1
kq k−1 = 1 + 2q + 3q 2 + · · · =
(1 − q)2
k=1

EE 5375/7375 p96 SMU Dept of EE


Differentiating again, we get a third series

\sum_{k=2}^{\infty} k(k - 1) q^{k-2} = 2 + 6q + 12q^2 + \cdots = \frac{2}{(1 - q)^3}

• Recall from earlier that mean is



E(X) = \sum_{k=0}^{\infty} k\, p(1 - p)^k = p(1 - p) \sum_{k=1}^{\infty} k(1 - p)^{k-1}
     = \frac{p(1 - p)}{p^2} = \frac{1 - p}{p}
• 2nd moment is

E(X^2) = \sum_{k=0}^{\infty} k^2\, p(1 - p)^k = \sum_{k=1}^{\infty} (k(k - 1) + k)\, p(1 - p)^k

       = p(1 - p)^2 \sum_{k=2}^{\infty} k(k - 1)(1 - p)^{k-2} + \sum_{k=1}^{\infty} k\, p(1 - p)^k

EE 5375/7375 p97 SMU Dept of EE


= \frac{2p(1 - p)^2}{p^3} + \frac{1 - p}{p}

= \frac{2 - 3p + p^2}{p^2}

• Variance is

var(X) = E(X 2 ) − [E(X)]2


= \frac{2 - 3p + p^2}{p^2} - \left(\frac{1 - p}{p}\right)^2

= \frac{2 - 3p + p^2 - (1 - 2p + p^2)}{p^2}

= \frac{1 - p}{p^2}

EE 5375/7375 p98 SMU Dept of EE


Variance of Common Random Variables

• Bernoulli
E(X) = p, var(X) = p(1 − p)

• Binomial
E(X) = np, var(X) = np(1 − p)

• Poisson
E(X) = a, var(X) = a

• Exponential
E(X) = 1/λ, var(X) = 1/λ^2

• Rayleigh
E(X) = σ\sqrt{\pi/2}, var(X) = (2 − π/2)σ^2

EE 5375/7375 p99 SMU Dept of EE


• Laplacian
E(X) = 0, var(X) = \frac{2}{c^2}
• Normal (Gaussian)

E(X) = µ, var(X) = σ 2

• Gamma
E(X) = \frac{\alpha}{\lambda}, var(X) = \frac{\alpha}{\lambda^2}

EE 5375/7375 p100 SMU Dept of EE


Properties of Variance

• We can easily show that, for constant a,


1. var(X + a) = var(X)
2. var(aX) = a2 var(X)
3. var(X+Y ) = var(X)+var(Y )+2cov(X, Y ) where covariance
of X and Y is

cov(X, Y ) = E[(X − E(X))(Y − E(Y ))] =?

EE 5375/7375 p101 SMU Dept of EE


Correlation Coefficient

• Correlation coefficient of X and Y is a normalized version of the


covariance:
\rho_{XY} = \frac{cov(X, Y)}{\sqrt{var(X)\, var(Y)}}
which ranges between −1 and 1
• X and Y are uncorrelated if ρXY = 0
• note: independent random variables will be uncorrelated, but
uncorrelated random variables are not necessarily independent

EE 5375/7375 p102 SMU Dept of EE


Example: uncorrelated but not independent

• Let θ be uniformly distributed between 0 and 2π. Let X = cos(θ)


and Y = sin(θ).
• Find E[X], E[Y ] and E[XY ]
• Are X and Y independent?
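A Monte Carlo sketch for this example; the closed-form answers are E[X] = E[Y ] = E[XY ] = 0, so X and Y are uncorrelated even though X^2 + Y^2 = 1 makes them dependent:

import random
from math import cos, sin, pi

random.seed(0)
N = 200_000
sx = sy = sxy = 0.0
for _ in range(N):
    theta = random.uniform(0.0, 2 * pi)
    x, y = cos(theta), sin(theta)
    sx, sy, sxy = sx + x, sy + y, sxy + x * y

print(sx / N, sy / N, sxy / N)   # all near 0, so cov(X, Y) = 0
# yet X and Y are dependent: knowing X pins down |Y| through X**2 + Y**2 == 1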

EE 5375/7375 p103 SMU Dept of EE


Conditional Expectations

• A generalization to conditional expectations is straightforward,


just use conditional probabilities
• Conditional expected value of continuous random variable X
given Y is defined as
E(X|Y = y) = \int_{-\infty}^{\infty} x\, f_X(x|y)\, dx

and for discrete random variable X is

E(X|Y = y) = \sum_k x_k\, P_X(x_k|y)

• In general, conditional expected value E(g(X)|Y = y) is defined


as

E(g(X)|Y = y) = \int_{-\infty}^{\infty} g(x)\, f_X(x|y)\, dx

EE 5375/7375 p104 SMU Dept of EE


for a continuous random variable X and real-valued function
g(x), and for discrete random variable is

E(g(X)|Y = y) = \sum_k g(x_k)\, P_X(x_k|y)

• Conditional variance is

var(X|Y = y) = E[(X − E(X|Y = y))^2 |Y = y]


= E(X 2 |Y = y) − [E(X|Y = y)]2

EE 5375/7375 p105 SMU Dept of EE


Properties

• Note that E(X|Y = y) depends on the value of y; viewed as a


function of the random variable Y , E(X|Y ) is itself a random variable
• What is
E[E(X|Y )]

EE 5375/7375 p106 SMU Dept of EE


Markov Inequality

• Suppose X is non-negative random variable with mean E(X)



Then E(X) = \int_0^a x f_X(x)\, dx + \int_a^{\infty} x f_X(x)\, dx

           ≥ \int_a^{\infty} x f_X(x)\, dx
           ≥ \int_a^{\infty} a f_X(x)\, dx
           = a P (X ≥ a)

⇒ P (X ≥ a) ≤ \frac{E(X)}{a}   (Markov inequality)
• This uses only the mean to give a bound on the tail probability
• Usually the bound may be quite loose

EE 5375/7375 p107 SMU Dept of EE


Example

• Suppose the average height of children in a first grade class is 42


inches (3 feet 6 inches)
• By Markov inequality,
P (height ≥ 96 inches) ≤ \frac{42}{96} ≈ 0.44
• In this case, actual probability of height more than 96 inches (8
feet) is very small, and Markov bound is very loose because only
mean is used
• Perhaps a bound using mean and variance might be better?

EE 5375/7375 p108 SMU Dept of EE


Chebyshev Inequality

• Let D^2 = (X − m)^2 be the squared deviation from the mean m = E(X)


• Then Markov inequality applied to D 2 gives

P (D^2 ≥ a^2) ≤ \frac{E[(X - m)^2]}{a^2} = \frac{var(X)}{a^2}

• Note that P (D^2 ≥ a^2 ) = P (|X − m| ≥ a), so the Chebyshev
inequality says

P (|X − m| ≥ a) ≤ \frac{var(X)}{a^2}

EE 5375/7375 p109 SMU Dept of EE


Example

• Suppose the average height of children in a first grade class is 42


inches (3 feet 6 inches) and standard deviation is 6 inches
• By Chebyshev inequality,
P (|height − 42 inches| ≥ 12) ≤ \frac{6^2}{12^2} = 0.25

P (|height − 42 inches| ≥ 24) ≤ \frac{6^2}{24^2} ≈ 0.06
• Probability of height deviating more than 12 inches from average
is bounded by 0.25, and deviating more than 24 inches from
average is less than 0.06 probability
• Chebyshev inequality can be very loose in many cases, but is
practically useful if only mean and variance are known (but en-
tire pdf is unknown)

EE 5375/7375 p110 SMU Dept of EE


Putting Things Together - the Sample Mean

• Notions of mean and variance are important for the example of


the sample mean or sample average
• Suppose we have N samples {x1 , x2 , . . . , xN } that are iid (inde-
pendent and identically distributed), drawn from a common CDF
FX (x) with unknown mean µ = E(X) and variance σ 2 = var(X)
• We might ask: what value do we expect the next sample xN +1 to
take?
• Essentially we are asking what is the mean µ? If mean is viewed
as “center of gravity” of the pdf, then the center of gravity of
the samples might be a good estimate
• Center of gravity M will minimize the total squared distance to

EE 5375/7375 p111 SMU Dept of EE


all samples:
D^2 = \sum_{i=1}^{N} (M - x_i)^2

• To minimize, differentiate with respect to M and set to 0:

\frac{d}{dM} D^2 = 2NM - 2 \sum_{i=1}^{N} x_i = 0

which leads to the sample mean

M = \frac{1}{N} \sum_{i=1}^{N} x_i

• Expected value of sample mean is


E(M) = E\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) = \frac{1}{N} \sum_{i=1}^{N} E(X) = µ

– M is unbiased estimator of µ because E(M ) = µ

EE 5375/7375 p112 SMU Dept of EE


• Variance of sample mean is
var(M) = var\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) = \frac{1}{N^2} \sum_{i=1}^{N} var(X) = \frac{\sigma^2}{N}

– Note that variance becomes 0 when N → ∞


• Applying Chebyshev inequality,
P (|M − µ| ≥ δ) ≤ \frac{var(X)}{N\delta^2}
– Note that bound can be made arbitrarily small by increasing
N
– Sample mean is called consistent estimator because it con-
verges to µ eventually
– This expresses weak law of large numbers
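A quick simulation of var(M) = σ^2/N, using uniform(0, 1) samples so that σ^2 = 1/12 (the replication counts are arbitrary):

import random
from statistics import mean, variance

random.seed(0)
sigma2 = 1.0 / 12.0                  # variance of uniform(0, 1)

for N in (10, 100, 1000):
    # many independent sample means, each from N iid uniform(0, 1) samples
    sample_means = [mean(random.random() for _ in range(N)) for _ in range(2000)]
    print(N, variance(sample_means), sigma2 / N)   # empirical vs sigma^2 / N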

EE 5375/7375 p113 SMU Dept of EE


Weak Law of Large Numbers

• Chebyshev inequality implies that as N → ∞, the sample mean


M will converge to the statistical mean µ = E(X) in the sense

\lim_{N \to \infty} P (|M − µ| ≥ δ) = 0

• Sample mean M is a random variable but becomes more “pre-


dictable” when number of samples N is large
• Law of Large Numbers implies that behavior of large homoge-
neous populations tends to be more predictable than individuals
or small groups
• Practically important in communications and networking, so
networks can be designed to meet predictable demand

EE 5375/7375 p114 SMU Dept of EE
