
EE 5375/7375.

Random Processes

0-0
Outline for first part

• Axioms
• Conditional Probability
• Independence
• Sequences of Independent Experiments
• Random Variables

EE 5375/7375 p1 SMU Dept of EE


Probability Axioms

• Axiomatic approach to probability attempts to ensure logical


consistency by defining probability with a small number of pre-
cise axioms (postulates), then deducing rest of theory from the
axioms
• Kolmogorov’s axioms are the most widely accepted and taught
(in hindsight, they seem more obvious)
• We don’t really need the formalities to work with probabilities,
but should be aware of them

EE 5375/7375 p2 SMU Dept of EE


Experiments

• Probability is defined in context of a repeatable random experi-


ment or trial
• An experiment consists of
– Procedure
– Observation/Outcome
• A model is also assigned to the outcomes

EE 5375/7375 p3 SMU Dept of EE


Sample Space

• Sample space Ω is defined as the finest grain, mutually exclusive,


collectively exhaustive set of all possible outcomes of the random
experiment
• A sample point ωi is an element of Ω representing a particular
outcome, so Ω = {ω1 , ω2 , . . .}
• Ω can be finite, countably infinite, or uncountably infinite
– For two coin tosses, Ω = {HH, HT, T H, T T } (finite)
– For number of times required to transmit a data frame over
a noisy channel until an error-free frame is received, Ω =
{1, 2, 3, . . .} (countably infinite)
– For angle in wheel of fortune problem, Ω = {θ : 0 ≤ θ ≤ 2π}
(uncountably infinite)

EE 5375/7375 p4 SMU Dept of EE


Events

• Event A is any subset of Ω


• Ω is a subset of itself, so Ω is a valid event - called the certain
event
• Empty set ∅ is also a valid event - called impossible event
• Examples
– For two coin tosses, event of “both tosses same” A = {HH, T T }
– For a die toss, event of “less than 3” A = {1, 2}
• Events can be defined by set operations on other events
– Eg, event C = A ∪ B occurs if A or B occurs
– Event C = A ∩ B occurs if both A and B occur
• If A ∩ B = ∅ then A and B are disjoint or mutually exclusive

EE 5375/7375 p5 SMU Dept of EE


Relationship

Set Theory          Probability


Universal set       Sample space
Element of a set    Outcome
Set                 Event

EE 5375/7375 p6 SMU Dept of EE


Sigma Fields

• Let F be a collection of events defined for sample space Ω


• F is called a σ-field if
1. F includes ∅ and Ω (ie, we must be able to talk about the impos-
sible and the certain events)
2. If A is in F , then so is Ac (ie, if we can talk about the probability
of A, then we must be able to talk about "not A")
3. If A1 , A2 , . . . are in F , then so are their countable unions and
intersections (ie, F is closed under countable set operations; if
we can talk about the probabilities of A and B, we must be able
to talk about the events “A or B” and “A and B”)

EE 5375/7375 p7 SMU Dept of EE


Probability Measure

• Given a σ-field F , a probability measure P (·) is a mapping of


every event A ∈ F to a number P (A), called the probability of
A, satisfying these axioms for all events A and B in F :
1. P (A) ≥ 0
2. P (Ω) = 1; the total probability mass is 1
3. If A ∩ B = ∅, then P (A ∪ B) = P (A) + P (B); Probability
mass of disjoint events can be added together.

EE 5375/7375 p8 SMU Dept of EE


• From these axioms, it can be deduced that
– P (∅) = 0
– P (A) ≤ 1
– P (A) = 1 − P (Ac )
– If A ⊂ B, then P (A) ≤ P (B)
– If events A1 , A2 , . . . are all mutually exclusive, then
P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)

– P (A ∪ B) = P (A) + P (B) − P (A ∩ B); (how can we show this?)

EE 5375/7375 p9 SMU Dept of EE


Probability Space

• (Ω, F, P ) defines a probability space: a sample space Ω, a σ-field


F , a probability measure P defined on F
• Eg, for a coin toss,
– Ω = {H, T }
– F consists of these sets: {H}, {T }, Ω, ∅
– P ({H}) = P ({T }) = 1/2, P (Ω) = 1, P (∅) = 0
• For a die toss,
– Ω = {1, 2, 3, 4, 5, 6}
– F consists of all possible subsets of Ω including Ω and ∅
– Probability of any number is 1/6 → can calculate probability
of any event

EE 5375/7375 p10 SMU Dept of EE


Example

• A bucket contains 10 identical balls (0,1,...,9) and one is selected


at random
– Sample space Ω = {0, 1, . . . , 9}
– Each sample point has probability 1/10, eg, P ({0}) = 1/10
• Define events
– A = selected ball is odd = {1, 3, 5, 7, 9}
– B = selected ball is multiple of 3 = {3, 6, 9}
– C = selected ball is less than 5 = {0, 1, 2, 3, 4}
Since any ball is equally likely with probability 1/10,
– P (A) =?
– P (B) =?
– P (C) =?

EE 5375/7375 p11 SMU Dept of EE


• What is P (A ∩ B)?
• What is P (A ∪ B)?

P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
= ?

This can also be found directly by

P (A ∪ B) =?

• What is P (A ∪ B ∪ C)?

P (A ∪ B ∪ C) =?
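A minimal Python sketch of these calculations, assuming only the equally likely model above (event names follow the slide):

from fractions import Fraction

omega = set(range(10))                 # sample space {0, 1, ..., 9}
A = {1, 3, 5, 7, 9}                    # odd
B = {3, 6, 9}                          # multiple of 3
C = {0, 1, 2, 3, 4}                    # less than 5

def P(event):
    # equally likely outcomes: P(E) = |E| / |Omega|
    return Fraction(len(event), len(omega))

print(P(A), P(B), P(C))                        # 1/2, 3/10, 1/2
print(P(A & B))                                # P(A ∩ B) = 1/5
print(P(A | B), P(A) + P(B) - P(A & B))        # both give 3/5
print(P(A | B | C))                            # 9/10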

EE 5375/7375 p12 SMU Dept of EE


Conditional Probability

• The probability of event A given that event B has occurred, de-


noted P (A|B)
• Conditional probability P (A|B) is defined by

P (A|B) =?

EE 5375/7375 p13 SMU Dept of EE


Example: Die roll

Two fair dice are rolled. Let X1 denote the number that shows up
on die 1. Let X2 be the number that shows up on die 2. Define
A : X1 ≥ 4 and B : X1 + X2 is even.
Find the following
• P (A) =?
• P (B) =?
• P (A ∩ B)
• P (A|B)
• P (A|B c )
Can you express P (A) in terms of the conditional probabilities? Can
you express P (A) in terms of P (A ∩ B) and P (A ∩ B c )?
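One way to answer these is to enumerate all 36 equally likely outcomes; a minimal Python sketch (the event definitions follow the slide):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))      # all (X1, X2) pairs
A = {o for o in outcomes if o[0] >= 4}               # X1 >= 4
B = {o for o in outcomes if (o[0] + o[1]) % 2 == 0}  # X1 + X2 even
Bc = set(outcomes) - B

def P(event):
    return Fraction(len(event), len(outcomes))

print(P(A), P(B), P(A & B))        # 1/2, 1/2, 1/4
print(P(A & B) / P(B))             # P(A|B)   = 1/2
print(P(A & Bc) / P(Bc))           # P(A|B^c) = 1/2
# total probability check: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)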

EE 5375/7375 p14 SMU Dept of EE


Example: Binary Symmetric channel

• X is transmitted symbol (0 or 1)
• Y is received symbol (0 or 1)
• Channel noise may cause X and Y to be different
• Sample space Ω = {(X, Y )} = {(0, 0), (0, 1), (1, 0), (1, 1)}
• Suppose by design, P (X = 0) = P (X = 1) = 0.5, and from
measurements,

P (Y = 1|X = 1) = P (Y = 0|X = 0) = 0.9

P (Y = 0|X = 1) = P (Y = 1|X = 0) = 0.1


• calculate the following probabilities

P (X = 0, Y = 0) =

EE 5375/7375 p15 SMU Dept of EE


P (X = 0, Y = 1) =
P (X = 1, Y = 0) =
P (X = 1, Y = 1) =

• What is P (Y = 0)?
• This is an example of the theorem of total probability, which is
useful for finding unconditional probabilities from conditional ones
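A small sketch of these computations, using P (X = x, Y = y) = P (Y = y|X = x)P (X = x) and then the total-probability step (values from the slide):

p_x = {0: 0.5, 1: 0.5}                         # a priori probabilities of X
p_y_given_x = {(0, 0): 0.9, (1, 0): 0.1,       # P(Y=y | X=0), keyed as (y, x)
               (0, 1): 0.1, (1, 1): 0.9}       # P(Y=y | X=1)

# joint probabilities P(X=x, Y=y) = P(Y=y|X=x) P(X=x)
joint = {(x, y): p_y_given_x[(y, x)] * p_x[x] for x in (0, 1) for y in (0, 1)}
print(joint)     # {(0,0): 0.45, (0,1): 0.05, (1,0): 0.05, (1,1): 0.45}

# theorem of total probability: P(Y=0) = sum over x of P(Y=0|X=x) P(X=x)
print(sum(joint[(x, 0)] for x in (0, 1)))      # 0.5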

EE 5375/7375 p16 SMU Dept of EE


Theorem of Total Probability

Suppose B1 , B2 ,..., Bn are mutually exclusive events and B1 ∪ B2 ∪


· · ·∪Bn = Ω (ie, these events are said to partition the sample space),
then the (unconditional) probability of event A is

P (A) = P (A|B1 )P (B1 ) + · · · + P (A|Bn )P (Bn )

This theorem leads directly to Bayes’ rule or Bayes’ theorem.

EE 5375/7375 p17 SMU Dept of EE


Bayes’ Theorem

Suppose A1 , A2 ,..., An are mutually exclusive events and A1 ∪ A2 ∪


· · · ∪ An = Ω (these events partition the sample space), then the
conditional probability of Aj given B is
P(A_j | B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B|A_j)\, P(A_j)}{\sum_{i=1}^{n} P(B|A_i)\, P(A_i)}

The numerator is simply the definition of conditional probability.


The denominator is expanded by theorem of total probability.

EE 5375/7375 p18 SMU Dept of EE


Example: binary communication system

• Given Y = 1 was received, what is the probability that the


transmitted symbol was X = 1?

P (X = 1|Y = 1) = ?

• P (X = 1) is called a priori probability of X


• P (X = 1|Y = 1) is a posteriori probability of X given Y
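A minimal sketch of this Bayes' rule computation, reusing the channel numbers from the earlier slide:

p_x1 = p_x0 = 0.5          # a priori probabilities
p_y1_given_x1 = 0.9        # channel measurements from the earlier slide
p_y1_given_x0 = 0.1

# P(X=1 | Y=1) = P(Y=1|X=1) P(X=1) / P(Y=1), with P(Y=1) by total probability
p_y1 = p_y1_given_x1 * p_x1 + p_y1_given_x0 * p_x0
print(p_y1_given_x1 * p_x1 / p_y1)     # a posteriori probability, 0.9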

EE 5375/7375 p19 SMU Dept of EE


Example: 2 Dice roll

For the earlier 2 dice roll example, find P (B|A).


Compute it directly and also using Bayes’ theorem.

EE 5375/7375 p20 SMU Dept of EE


Independence

• Events A and B are independent if

P (A ∩ B) = P (A)P (B)

or equivalently, in terms of conditional probabilities,

P (A|B) = P (A)

P (B|A) = P (B)

• What is the difference between independence and mutual exclusivity?


• If A and B are independent, what can we say about A and B c ?

EE 5375/7375 p21 SMU Dept of EE


Example

Let a card be drawn at random from a regular pack of 52 cards.


Let A be the event that the card is a spade and B be the event that
the card is a 7. Are events A and B independent?

EE 5375/7375 p22 SMU Dept of EE


Multiple independent events

• Three events A, B, and C are independent iff ...


• Does pairwise independence imply joint independence?
• Consider the example of rolling 2 dice. Define event A: the sum of
the rolls is 7, event B: the first die shows 4, and event C: the second
die shows 3 (checked numerically below).
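A short enumeration for this example; it checks each pairwise product and then the three-way product (a sketch using nothing beyond the events defined above):

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
A = {o for o in outcomes if sum(o) == 7}     # sum of the rolls is 7
B = {o for o in outcomes if o[0] == 4}       # first die shows 4
C = {o for o in outcomes if o[1] == 3}       # second die shows 3

def P(e):
    return Fraction(len(e), len(outcomes))

print(P(A & B) == P(A) * P(B))     # True (both sides are 1/36)
print(P(A & C) == P(A) * P(C))     # True
print(P(B & C) == P(B) * P(C))     # True, so pairwise independent
# but jointly: P(A ∩ B ∩ C) = 1/36 while P(A)P(B)P(C) = 1/216
print(P(A & B & C), P(A) * P(B) * P(C))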

EE 5375/7375 p23 SMU Dept of EE


Sequence of Independent Experiments

• In many problems, we are concerned with probabilities related


to sequences of independent experiments or trials
– Eg, coin tosses, dice throws
• Simplest example of independent experiments is Bernoulli trials
– Outcome of each experiment is a success or failure.
• Eg, what is the probability of 2 heads occurring in 3 coin tosses?

EE 5375/7375 p24 SMU Dept of EE


Binomial Probabilities

• This can be generalized by the binomial probability law: let k


be number of successes in n Bernoulli trials with probability of
success p, then probability of k successes is

pn (k) =?

EE 5375/7375 p25 SMU Dept of EE


Example

• A lottery consists of drawing 3 numbers randomly from a bucket


of 20 different numbered balls, without replacing the balls
• A ticket wins if the 3 numbers match, regardless of order
– There are \binom{20}{3} = \frac{20!}{3!\,17!} = \frac{(20)(19)(18)}{(1)(2)(3)} = 1140 ways of choosing
3 numbers out of 20
– Since each is equally likely, any lottery ticket has probability
1/1140 of winning
• What if the sequential order of the 3 numbers is important (ie, ticket
must match the order of 3 balls drawn)?

EE 5375/7375 p26 SMU Dept of EE


Example

• A long distance carrier saves bandwidth by carrying 8 telephone


conversations over 6 transmission channels, choosing only con-
versations that are active at the time
• Assume conversations are independent, and any conversation is
active at any time with probability p = 1/3
– If more than 6 conversations are active, only 6 can be carried
and the others are “clipped” off
• Probability of exactly k active conversations is binomial

p8 (k) =

• Probability of more than 6 active conversations is ?
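A minimal sketch of the binomial tail computation, with n = 8 and p = 1/3 as assumed above:

from math import comb

n, p = 8, 1/3

def p_n(k):
    # binomial probability of exactly k active conversations
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(p_n(7) + p_n(8))   # P(more than 6 active) is about 0.0026, so clipping is rare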

EE 5375/7375 p27 SMU Dept of EE


Example

• Suppose a transmitter is sending 10 bits/second over a channel


that has random bit errors with probability p = 0.001
• What is probability of at least one error in one second?
• Consider number of errors in one sec as number of successes in
n = 10 Bernoulli trials

P (at least one error/sec) = 1 − P (no errors/sec)


= 1 − p_{10}(0)
= 1 − \binom{10}{0} (0.001)^0 (0.999)^{10}
= 1 − (0.999)^{10} ≈ 0.01

EE 5375/7375 p28 SMU Dept of EE


Example

• Bits are transmitted over a communication channel that has ran-


dom bit errors with probability p = 0.001
• To compensate, each bit is transmitted 3 times and receiver takes
a majority vote of received bits to decide on transmitted bit
– Receiver will make incorrect decision if channel has 2 or more
errors for 3 bits
– Probability of 2 or more successes (errors) in 3 Bernoulli trials
is
p_3(2) + p_3(3) = \binom{3}{2}(0.001)^2(0.999) + \binom{3}{3}(0.001)^3 ≈ 3 \times 10^{-6}
• Receiver is very likely correct but this method is very costly in
bandwidth

EE 5375/7375 p29 SMU Dept of EE


Example

• Suppose 5 missiles are fired at a battleship, but each missile gets


past the ship’s defenses with probability 0.1 (independently of
each other)
• At least 2 missiles are required to sink the ship
• Number of missiles getting through is number of successes in 5
Bernoulli trials
• What is the probability that the ship survives the attack?
• Does the answer make sense? (see the quick check below)
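The ship survives only if 0 or 1 missiles get through; a quick sketch with the binomial PMF (n = 5, p = 0.1 from the slide):

from math import comb

n, p = 5, 0.1

def p_n(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(p_n(0) + p_n(1))     # P(ship survives) is about 0.92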

EE 5375/7375 p30 SMU Dept of EE


Geometric Probabilities

• Suppose we continue Bernoulli trials until the first success, and


ask what is the probability that the first success occurs on the kth trial?
• Packet retransmission in TCP

p(k) = (1 − p)^{k−1}\, p

• Probabilities can be visualized in a tree diagram

EE 5375/7375 p31 SMU Dept of EE


Example: TCP

• Each packet transmission succeeds with probability p = 0.9


and fails (is errored) with probability 1 − p = 0.1
• Each transmission is a Bernoulli trial with probability of success
p
• Probability that exactly k transmissions are needed for a message
is p(k) = (1 − p)^{k−1}\, p
• Probability that more than 2 transmissions are needed = ? Com-
pute your answer in two different ways.
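The two ways agree; a small sketch with p = 0.9 as above:

p = 0.9

def p_k(k):
    # probability that exactly k transmissions are needed
    return (1 - p)**(k - 1) * p

way1 = 1 - p_k(1) - p_k(2)   # complement of "1 or 2 transmissions suffice"
way2 = (1 - p)**2            # the first two transmissions both fail
print(way1, way2)            # both 0.01 (up to floating-point rounding)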

EE 5375/7375 p32 SMU Dept of EE


Random Variables

• Random variable X is a function that maps each sample point


ω to a real number, X(ω)
• Types of random variables
– discrete
– continuous
– mixed type
• Example for random variable?

EE 5375/7375 p33 SMU Dept of EE


Cumulative Distribution Function

• Cumulative distribution function (CDF) for random variable X


is defined as probability of the event {X ≤ x}:
FX (x) = P (X ≤ x)

• In other words, it is the probability that X takes a value in


(−∞, x]
• Properties of CDF
– 0 ≤ FX (x) ≤ 1
– FX (−∞) =?, FX (∞) =?
– FX (x) is nondecreasing function of x
– FX (x) is continuous from the right: FX (x) = \lim_{\epsilon \to 0^+} FX (x + \epsilon)
– P (a < X ≤ b) = FX (b) − FX (a). Does the equality (at b)
matter?
EE 5375/7375 p34 SMU Dept of EE
Probability Mass Function - discrete RV

• The PMF of a discrete rv X is simply given by PX (x) = P (X = x)


• Note the notational difference between X and x!
• Example:
• Next, we look at some commonly occurring discrete RVs (why?)

EE 5375/7375 p35 SMU Dept of EE


Relation between PMF and CDF

• What is probability of event {X = a} in terms of CDF?


– We know

P (a − ε < X ≤ a) = FX (a) − FX (a − ε)

– To find P (X = a), we take the limit ε → 0:

P (X = a) = FX (a) − FX (a−)

• It’s easier to visualize the relationship between PMF and CDF

EE 5375/7375 p36 SMU Dept of EE


Bernoulli PMF

• Recall Bernoulli trial has two possible outcomes, success or fail-


ure
• Bernoulli random variable has two possible values: 1 (success)
with probability p or 0 (failure) with probability 1 − p
• Bernoulli PMF

PX (x) = p^x (1 − p)^{1−x} for x = 0, 1

EE 5375/7375 p37 SMU Dept of EE


Binomial PMF

• Recall binomial probabilities


p_n(k) = \binom{n}{k}\, p^k (1 − p)^{n−k}

are associated with k successes occurring in n Bernoulli trials
with probability of success p
• Likewise, a binomial random variable has PMF

P_X(x) = \binom{n}{x}\, p^x (1 − p)^{n−x} for x = 0, 1, ..., n

EE 5375/7375 p38 SMU Dept of EE


Geometric PMF

• Geometric random variable has PMF

PX (x) = (1 − p)^x\, p for x = 0, 1, . . .

• This has exponential decrease

EE 5375/7375 p39 SMU Dept of EE


Poisson PMF

• Poisson random variable has PMF


P_X(x) = e^{-a}\, \frac{a^x}{x!} for x = 0, 1, . . .
for a > 0
• This arises in situations where events occur “completely at ran-
dom” in time

EE 5375/7375 p40 SMU Dept of EE


Discrete Uniform PMF

PX (x) = \frac{1}{b − a + 1} for x = a, a + 1, . . . , b.
Pascal RV

A biased coin is flipped until Heads appears exactly k times. What


is the PMF of L, the number of flips required for this?

EE 5375/7375 p41 SMU Dept of EE


Relationship between Discrete Random Variables

• Relations between Bernoulli, binomial, and Poisson?


• Binomial is the number of successes in n Bernoulli trials
• Binomial PMF will converge to Poisson PMF if we hold np = a
constant in the limit n → ∞
– We will show this later
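The convergence can also be seen numerically; a sketch that holds a = np fixed while n grows (the values a = 2 and k = 3 are arbitrary illustrations):

from math import comb, exp, factorial

a, k = 2.0, 3

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, a):
    return exp(-a) * a**k / factorial(k)

for n in (10, 100, 1000):
    p = a / n                          # keep np = a constant
    print(n, binom_pmf(k, n, p), poisson_pmf(k, a))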

EE 5375/7375 p42 SMU Dept of EE


Probability Density Function - continuous RV

• For a continuous and differentiable CDF, the probability density func-


tion (pdf) for random variable X is defined as:

f_X(x) = \frac{d}{dx} F_X(x)
• If pdf fX (x) exists, then
– fX (x) ≥ 0
– \int_{-\infty}^{\infty} f_X(x)\, dx = 1
– F_X(x) = \int_{-\infty}^{x} f_X(y)\, dy
– \int_a^b f_X(y)\, dy = P (a < X ≤ b)
– fX (x) can have a value greater than 1

EE 5375/7375 p43 SMU Dept of EE


• Interpretation of pdf if it is continuous:
P (x < X ≤ x + ∆x) = \int_x^{x+\Delta x} f_X(y)\, dy ≈ f_X(x)\, \Delta x

for very small ∆x


– So pdf fX (x) is proportional to the probability that X will be
“around” the value x
• It is incorrect to think that fX (x) = P (X = x)
– What is probability that X = x exactly?
– X = x means (x < X ≤ x + ∆x) with ∆x = 0
– So we have to say P (X = x) = fX (x) · 0 = 0
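A quick numerical illustration of the approximation P (x < X ≤ x + ∆x) ≈ fX (x)∆x, using an exponential pdf with λ = 1 (an arbitrary choice):

from math import exp

lam, x, dx = 1.0, 0.5, 0.01

exact = exp(-lam * x) - exp(-lam * (x + dx))   # from the exponential CDF
approx = lam * exp(-lam * x) * dx              # f_X(x) * dx
print(exact, approx)                           # both about 0.006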

EE 5375/7375 p44 SMU Dept of EE


PMF and PDF for discrete RVs

• If the CDF FX (x) is a sequence of steps


– Strictly speaking, FX (x) is not differentiable so pdf fX (x)
does not exist
– Instead, the pdf is defined in terms of Dirac delta functions
(sometimes called impulse functions)
• Dirac delta function δ(x) is 0 everywhere except at x = 0 where
it is infinite such that
\int_{-\infty}^{\infty} \delta(x)\, dx = 1

or equivalently,

\int_{-\infty}^{\infty} f(y)\, \delta(x − y)\, dy = f(x)

EE 5375/7375 p45 SMU Dept of EE


• It can be visualized as a tall skinny rectangle of width 1/a and
height a (such that its area is always 1) in the limit a → ∞
• The Dirac delta function can be considered the derivative of the unit
step function u(x):

\delta(x) = \frac{d}{dx} u(x)
• Using Dirac delta functions, we can say that the CDF for a discrete
random variable X consists of step functions:

F_X(x) = \sum_i P_X(x_i)\, u(x − x_i)

and the pdf consists of delta (impulse) functions:

f_X(x) = \sum_i P_X(x_i)\, \delta(x − x_i)

where PX (xi ) is the probability that X = xi

EE 5375/7375 p46 SMU Dept of EE


Uniform PDF

• Uniform random variable has equal probability over some inter-


val:
f_X(x) = \frac{1}{b − a} for a ≤ x ≤ b

EE 5375/7375 p47 SMU Dept of EE


Exponential PDF

• Exponential random variable is often used to model the lifetime


of components, and has an important role in queueing theory

fX (x) = λe−λx for x ≥ 0

• Parameter λ determines the height and rate of decay


• A single parameter λ (equivalently, the mean 1/λ) determines
the entire distribution

EE 5375/7375 p48 SMU Dept of EE


Rayleigh PDF

• Rayleigh random variable is useful for certain types of signal


noise
f_X(x) = \frac{x}{\sigma^2}\, e^{-x^2/2\sigma^2} for x ≥ 0
• Parameter σ determines the height and rate of decay

EE 5375/7375 p49 SMU Dept of EE


Laplacian PDF

• Laplacian random variable is useful in modeling speech sources


and image gray levels
f_X(x) = \frac{c}{2}\, e^{-c|x|}
for c > 0
• Parameter c determines the height and rate of decay

EE 5375/7375 p50 SMU Dept of EE


Normal (Gaussian) PDF

• Normal or Gaussian random variable is widely useful in situ-


ations where many small variables are summed (central limit
theorem will be covered later)
• PDF depends on 2 parameters: mean µ (center) and variance σ 2
(width), and we denote this by X ∼ N (µ, σ 2 )
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/2\sigma^2}
• The standard normal distribution refers to N (0, 1), ie, zero mean and
unit variance
– Common notation Φ(x) refers to the standard normal CDF

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt

EE 5375/7375 p51 SMU Dept of EE


– There are common tables of Φ(x)
– Normal random variable X ∼ N (µ, σ 2 ) can always be related
to a standard normal random variable Y ∼ N (0, 1) by

X = σY + µ

so the CDF of X can be found by


F_X(x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)
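A small sketch of this standardization step; Φ is computed from the error function, using the standard relation Φ(z) = (1 + erf(z/√2))/2:

from math import erf, sqrt

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    # F_X(x) = Phi((x - mu) / sigma) for X ~ N(mu, sigma^2)
    return Phi((x - mu) / sigma)

print(Phi(0.0))                    # 0.5
print(normal_cdf(5.0, 3.0, 2.0))   # = Phi(1.0), about 0.841 (illustrative values)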

EE 5375/7375 p52 SMU Dept of EE


Gamma PDF

• Gamma random variable has 2 parameters: α > 0 and λ > 0


f_X(x) = \frac{\lambda(\lambda x)^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)} for x ≥ 0
where Γ(x) is the gamma function defined

\Gamma(x) = \int_0^{\infty} y^{x-1} e^{-y}\, dy

• In special case α = 1, X is exponential random variable


• In special case λ = 1/2 and α = k/2 where k is positive integer,
X is chi-square (χ2 ) random variable
• In special case α = m where m is positive integer, X is m-Erlang
random variable

EE 5375/7375 p53 SMU Dept of EE


Outline for next part

• Joint Random Variables


• Joint CDF
• Joint pdf
• Joint PMF
• Conditional Probabilities

EE 5375/7375 p54 SMU Dept of EE


Joint Random Variables

• Recall that CDF FX (x) of random variable X is probability of


event {X ≤ x}
• Joint CDF of two random variables X and Y is probability of
event {X ≤ x, Y ≤ y} = {X ≤ x} ∩ {Y ≤ y} denoted by

FXY (x, y) = P (X ≤ x, Y ≤ y)

• Can we visualize pictorially?

EE 5375/7375 p55 SMU Dept of EE


Example

• Suppose there are a number of students in a classroom


– Let X be the random height of a student
– Let Y be the random weight of the student
– Let Z be the random age of the student
• Joint CDF is

FXY Z (x, y, z) = P (X ≤ x, Y ≤ y, Z ≤ z)
= P (height ≤ x and weight ≤ y and age ≤ z)

EE 5375/7375 p56 SMU Dept of EE


Joint CDFs - Properties

• As in the one-dimensional case, joint CDFs are functions with


these properties:
– 0 ≤ FXY (x, y) ≤ 1

FXY (−∞, −∞) =?

FXY (∞, ∞) =?
– Since {X ≤ ∞} and {Y ≤ ∞} are certain events,
FXY (x, ∞) =?
FXY (∞, y) =?
FX (x) and FY (y) are called the marginal CDFs, obtained from
the joint CDF
EE 5375/7375 p57 SMU Dept of EE
– FXY (x, y) is nondecreasing function of x and nondecreasing
function of y
– FXY (x, y) is continuous from the right and above

F_{XY}(x, y) = \lim_{\epsilon \to 0^+} F_{XY}(x + \epsilon, y)

F_{XY}(x, y) = \lim_{\epsilon \to 0^+} F_{XY}(x, y + \epsilon)

– Probability of a joint interval

P (a < X ≤ b, c < Y ≤ d) =?
Express your answer in terms of joint CDFs.

EE 5375/7375 p58 SMU Dept of EE


Events

• For joint variables X and Y , they make a 2-dimensional space


(X, Y )
• Events can be defined in the (X, Y ) space
• For example,
– A = {X + Y ≤ 10}
– B = {min(X, Y ) ≤ 5}
– C = {X 2 + Y 2 ≤ 100}

EE 5375/7375 p59 SMU Dept of EE


Joint Probability Density Function - continuous RV’s

• Recall in one-dimensional case, pdf of X is defined as


f_X(x) = \frac{d}{dx} F_X(x)

if CDF FX (x) is continuous and differentiable
• The joint probability density function (pdf) is similarly defined
as

f_{XY}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{XY}(x, y)
• Properties are similar to the one-dimensional case:
– fXY (x, y) ≥ 0
– \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = ?
– F_{XY}(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f_{XY}(\eta, \xi)\, d\eta\, d\xi

EE 5375/7375 p60 SMU Dept of EE


– \int_c^d\int_a^b f_{XY}(x, y)\, dx\, dy = P (a < X ≤ b, c < Y ≤ d)
• Also, marginal pdf of X can be calculated by

f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy

and marginal pdf of Y by

f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx

EE 5375/7375 p61 SMU Dept of EE


Example

• Joint CDF

FXY (x, y) = (1 − e−αx )(1 − e−βy ) for x ≥ 0, y ≥ 0

• Joint pdf
fXY (x, y) =

• Marginal CDFs
FX (x) =
FY (y) =

• Marginal pdfs: calculate using i) the marginal CDFs and ii) the joint


pdf

EE 5375/7375 p62 SMU Dept of EE


Example

• Given that joint pdf has the form:

fXY (x, y) = ce−x e−y for 0 ≤ y ≤ x < ∞

what is the constant c?


• What is the marginal pdf of X?
• What is the marginal pdf of Y ?
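One way to pin down c is to force the pdf to integrate to 1 over the region 0 ≤ y ≤ x; a crude Riemann-sum sketch (grid size and cutoff are arbitrary):

from math import exp

# integrate e^{-x} e^{-y} over 0 <= y <= x, truncating x at 20
h, x_max = 0.02, 20.0
total, x = 0.0, h / 2
while x < x_max:
    y = h / 2
    while y < x:
        total += exp(-x) * exp(-y) * h * h
        y += h
    x += h

print(total)          # about 0.5, so c = 1 / 0.5 = 2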

EE 5375/7375 p63 SMU Dept of EE


Interpretation

• Interpretation of joint pdf if it is continuous:

P (x < X ≤ x + ∆x, y < Y ≤ y + ∆y)

= \int_y^{y+\Delta y}\int_x^{x+\Delta x} f_{XY}(\eta, \xi)\, d\eta\, d\xi
≈ f_{XY}(x, y)\, \Delta x\, \Delta y

for very small ∆x and ∆y


– So pdf fXY (x, y) is proportional to the probability that (X, Y )
will be “around” the value (x, y)

EE 5375/7375 p64 SMU Dept of EE


Probability of Events

• If an event has form {(X, Y ) ∈ A} where A is some region in the


(X, Y ) space, then its probability can be found by
P ({(X, Y ) ∈ A}) = \int\!\!\int_A f_{XY}(x, y)\, dx\, dy

• Eg, say joint pdf

f_{XY}(x, y) = \frac{1}{4} for 0 < x < 2, 0 < y < 2
– What is probability of event {0 < X < 1, 0 < Y < 1}?
– What is probability of event {X + Y ≤ 1}?
– What is probability of event {max(X, Y ) ≤ 1}?
– What is probability of event {min(X, Y ) ≤ 1}?
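Each of these is just (area of the region) × 1/4; a Monte Carlo sketch as a sanity check (the sample count is arbitrary):

import random

random.seed(0)
N = 200_000
hits = {"square": 0, "sum": 0, "max": 0, "min": 0}

for _ in range(N):
    x, y = random.uniform(0, 2), random.uniform(0, 2)   # matches f_XY = 1/4
    hits["square"] += (0 < x < 1 and 0 < y < 1)
    hits["sum"]    += (x + y <= 1)
    hits["max"]    += (max(x, y) <= 1)
    hits["min"]    += (min(x, y) <= 1)

for name, c in hits.items():
    print(name, c / N)
# expected: square 0.25, sum 0.125, max 0.25, min 0.75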

EE 5375/7375 p65 SMU Dept of EE


Joint Probability Mass Function - discrete RVs

• If X and Y are discrete random variables, then similar to one-


dimensional case,
– Joint CDF FXY (x, y) looks like step functions in x and y
– Joint pdf does not strictly exist due to discontinuities in the CDF
• Joint PMF is

PXY (xi , yj ) = P (X = xi , Y = yj )

for i = 1, 2, . . . ; j = 1, 2, . . .
• Probability of event A is sum of PMF over the outcomes in A:
P (A) = \sum_{(x_i, y_j) \in A} P_{XY}(x_i, y_j)

EE 5375/7375 p66 SMU Dept of EE


• Marginal PMF of X can be found by

PX (xi ) = P (X = xi )
= P (X = xi , Y ≤ ∞)
= \sum_{j=1}^{\infty} P_{XY}(x_i, y_j)

and similarly for marginal PMF of Y

EE 5375/7375 p67 SMU Dept of EE


Example

• Suppose the number of bytes in a message is geometrically dis-


tributed with PMF

PN (x) = (1 − p)\, p^x for x = 0, 1, . . .

• Messages are segmented into small packets of maximum length


M bytes
• Say X is number of full packets, Y is number of remaining bytes
• What is joint PMF of X and Y ?

PXY (x, y) = P (X = x, Y = y)
= P (N = xM + y)
= (1 − p)\, p^{xM + y}

EE 5375/7375 p68 SMU Dept of EE


• What is marginal PMF of X?

P (X = x) = P (N = xM ) + P (N = xM + 1) + · · ·
+P (N = xM + (M − 1))
= \sum_{j=0}^{M-1} (1 − p)\, p^{xM + j}

= (1 − p)\, p^{xM}\, \frac{1 − p^M}{1 − p}
= (1 − p^M)(p^M)^x

so X has geometric PMF with parameter p^M


• What is marginal PMF of Y ?

P (Y = y) = P (N = y) + P (N = M + y) + P (N = 2M + y) + · · ·
= \sum_{i=0}^{\infty} (1 − p)\, p^{iM + y}

EE 5375/7375 p69 SMU Dept of EE


= \frac{1 − p}{1 − p^M}\, p^y
for y = 0, 1, . . . , M − 1, so Y has truncated geometric PMF
• In summary, we need to know how to find probability of any
event in terms of the joint PMF/PDF/CDF
• Given joint CDF, we know how to find marginal CDF. What
about the reverse?

EE 5375/7375 p70 SMU Dept of EE


Conditional Probabilities

• Recall for events A and B, conditional probability P (A|B) is


P (A|B) = \frac{P(A \cap B)}{P(B)}

• For continuous joint pdf fXY (x, y), conditional pdf of Y given X
is

f_Y(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}

• For discrete joint PMF PXY (x, y), conditional PMF of Y given
X is

P_Y(y|x) = \frac{P_{XY}(x, y)}{P_X(x)}

EE 5375/7375 p71 SMU Dept of EE


Independence

• Similar to one-dimensional case, if X and Y are independent,


then
fY (y|x) = fY (y)
or
fXY (x, y) = fX (x)fY (y)

• In discrete case,
PY (y|x) = PY (y)
or
PXY (x, y) = PX (x)PY (y)

EE 5375/7375 p72 SMU Dept of EE


Example

• Given joint pdf (from earlier example):

fXY (x, y) = 2e−x e−y for 0 ≤ y ≤ x < ∞

• What is conditional pdf of X given Y ?


f_X(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)} = \frac{2 e^{-x} e^{-y}}{2 e^{-2y}} = e^{-(x-y)} for 0 ≤ y ≤ x

• Are X and Y independent?


– Recall fX (x) = 2e−x (1−e−x ) but this is not equal to fX (x|y)
– X and Y are dependent

EE 5375/7375 p73 SMU Dept of EE


Example

• Suppose a server has 2 communication lines


• Number of messages per hour received on the lines is X and Y
with joint PMF

PXY (x, y) = (1 − p)(1 − q)\, p^x q^y for x, y = 0, 1, . . .

• Are X and Y independent?

EE 5375/7375 p74 SMU Dept of EE


Example

• Suppose X is chosen uniformly from (0,1), then Y is chosen


uniformly from (0, X)
• Therefore fX (x) = 1 for 0 < x < 1 and
f_Y(y|x) = \frac{1}{x} for 0 < y < x
• Joint pdf is
f_{XY}(x, y) = f_Y(y|x)\, f_X(x) = \frac{1}{x} for 0 < y < x, 0 < x < 1
• Marginal pdf of Y :

fY (y) = \int_y^1 \frac{1}{x}\, dx = − ln y for 0 < y < 1 (how?)
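A quick Monte Carlo check of fY (y) = − ln y (sample size and bin width are arbitrary):

import random
from math import log

random.seed(0)
N, dy = 200_000, 0.05

for y0 in (0.1, 0.3, 0.6):
    hits = 0
    for _ in range(N):
        x = random.random()              # X ~ uniform(0, 1)
        y = random.uniform(0.0, x)       # Y | X = x ~ uniform(0, x)
        hits += (y0 <= y < y0 + dy)
    print(y0, hits / (N * dy), -log(y0 + dy / 2))   # empirical density vs -ln y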

EE 5375/7375 p75 SMU Dept of EE


Caveat

• There is a subtle difference between independence of events and


independence of discrete random variables

EE 5375/7375 p76 SMU Dept of EE


Outline for next part

• Functions of Random Variables


• Sums of 2 Random Variables
• Expectations

EE 5375/7375 p77 SMU Dept of EE


Functions of Random Variables

• If X is a random variable and g(x) is a real-valued function, then


Y = g(X) defines a new random variable
– Why? Think in terms of mappings?
– Can view g(x) as the function of a system with X as input
and Y as output
• Given the CDF/PMF/pdf of X, how do we find the same for Y ?
– discrete case
– continuous case

EE 5375/7375 p78 SMU Dept of EE


Example

• Suppose

Y = g(X) = (X)^+ = \begin{cases} 0 & \text{if } X < 0 \\ X & \text{if } X ≥ 0 \end{cases}

This keeps the positive part of X


• If FX (x) is the CDF of X, what is the CDF of Y = (X)^+ ?
– Note g(x) = (x)^+ maps all negative values of X to 0 but all
positive values of X are unchanged, so

F_Y(y) = \begin{cases} 0 & \text{if } y < 0 \\ F_X(y) & \text{if } y ≥ 0 \end{cases}

EE 5375/7375 p79 SMU Dept of EE


Example

• Suppose
Y = g(X) = aX + b
for constants a, b
• This is a linear function; note (assuming a > 0)

F_Y(y) = P (Y ≤ y) = P (aX + b ≤ y) = P\!\left(X ≤ \frac{y - b}{a}\right)

• Therefore

F_Y(y) = F_X\!\left(\frac{y - b}{a}\right)

EE 5375/7375 p80 SMU Dept of EE


Example

• Indicator function I{c(x)} where c(x) is a condition on x (eg,


x > 0) is defined as

I\{c(x)\} = \begin{cases} 0 & \text{if } c(x) \text{ is not true} \\ 1 & \text{if } c(x) \text{ is true} \end{cases}

• Suppose
Y = I{X > a}
for some a
• Note Y is discrete with only 2 possible values, 0 or 1, so Y has
PMF:
P (Y = 0) = P (X ≤ a) = FX (a)
P (Y = 1) = P (X > a) = 1 − FX (a)

EE 5375/7375 p81 SMU Dept of EE


Example

• Suppose
Y = X2
for continuous random variable X
• Note
P (Y ≤ y) = P (X^2 ≤ y) = P (−\sqrt{y} ≤ X ≤ \sqrt{y})

• Therefore

F_Y(y) = \begin{cases} 0 & \text{if } y < 0 \\ F_X(\sqrt{y}) − F_X(−\sqrt{y}) & \text{if } y ≥ 0 \end{cases}

EE 5375/7375 p82 SMU Dept of EE


Functions of 2 Random Variables

• What about functions of 2 or more random variables, eg, Z =


g(X, Y )?
– In principle, the approach is the same
– CDF of Z is determined by joint CDF of (X, Y )
• By far, most common case is sum Z = X + Y
– Note
FZ (z) = P (Z ≤ z) = P (X + Y ≤ z)
identifies the area under the line y = z − x

EE 5375/7375 p83 SMU Dept of EE


Sums of 2 Random Variables

• So we need to integrate this area of the joint pdf fXY (x, y):
F_Z(z) = \int\!\!\int_{x+y \le z} f_{XY}(x, y)\, dx\, dy
       = \int_{-\infty}^{\infty}\left(\int_{-\infty}^{z-y} f_{XY}(x, y)\, dx\right) dy

• The pdf of Z can be found by differentiating:


f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\, dx

or equivalently,

f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(z - y, y)\, dy

• In the common case that X and Y are independent, then fXY (x, y) =

EE 5375/7375 p84 SMU Dept of EE


fX (x)fY (y) and
f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx

or equivalently,

f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\, dy

this is the convolution of fX and fY


– Think of “flip and slide” - take one function, reverse around
0, slide by z
– Then multiply two functions, and integrate
• If X and Y are independent discrete random variables, PMF of
Z = X + Y is found by discrete convolution
P_Z(n) = \sum_k P_X(k)\, P_Y(n - k)
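A minimal sketch of the discrete convolution, using the sum of two fair dice as the example (any two independent PMFs would do):

from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}     # PMF of one fair die

def convolve(pmf_x, pmf_y):
    # PMF of Z = X + Y for independent discrete X and Y
    pmf_z = {}
    for k, px in pmf_x.items():
        for m, py in pmf_y.items():
            pmf_z[k + m] = pmf_z.get(k + m, 0) + px * py
    return pmf_z

two_dice = convolve(die, die)
print(two_dice[7])                 # 1/6, the most likely sum
print(sum(two_dice.values()))      # 1, so it is a valid PMF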

EE 5375/7375 p85 SMU Dept of EE


Example

• Suppose X and Y are independent, exponentially distributed

fX (x) = λe−λx for x ≥ 0


fY (y) = γe−γy for y ≥ 0

• Now the convolution is (assuming λ ≠ γ)

f_Z(z) = \int_0^z \lambda e^{-\lambda x}\, \gamma e^{-\gamma(z-x)}\, dx
       = \lambda\gamma\, e^{-\gamma z} \int_0^z e^{-(\lambda-\gamma)x}\, dx
       = \frac{\lambda\gamma}{\lambda - \gamma}\, e^{-\gamma z}\left(1 - e^{-(\lambda-\gamma)z}\right)

EE 5375/7375 p86 SMU Dept of EE


Expectations

• Expected value or mean of continuous rv X is defined as


E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx

and for discrete random variable X is

E(X) = \sum_k x_k\, P_X(x_k)

– Mean can be visualized as “center of gravity” of a pdf

• In general, expected value E(g(X)) is defined as

E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx

or E(g(X)) = \sum_k g(x_k)\, P_X(x_k)

EE 5375/7375 p87 SMU Dept of EE


Means of Some Common Random Variables

• Bernoulli
E(X) = 1 · p + 0 · (1 − p) = p

• Binomial
E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n-x} = np

• Geometric
E(X) = \sum_{x=0}^{\infty} x (1 - p)^x p = \frac{1 - p}{p}

• Poisson
E(X) = \sum_{x=0}^{\infty} x\, e^{-a} \frac{a^x}{x!} = a

EE 5375/7375 p88 SMU Dept of EE


• Uniform
E(X) = \int_a^b x\, \frac{1}{b - a}\, dx = \frac{a + b}{2}

• Exponential
E(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}

• Normal pdf f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/2\sigma^2} is symmetric around
x = µ so
E(X) = µ

EE 5375/7375 p89 SMU Dept of EE


Outline for next part

• Moments
• Variance
• Markov and Chebyshev Inequalities
• Sample Mean

EE 5375/7375 p90 SMU Dept of EE


Example

• Suppose g(X) = I{X > a}, ie,



g(X) = \begin{cases} 0 & \text{if } X ≤ a \\ 1 & \text{if } X > a \end{cases}

• Define Y = g(X) = I{X > a}, then

E(Y ) = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx = \int_a^{\infty} 1 \cdot f_X(x)\, dx = P (X > a)

EE 5375/7375 p91 SMU Dept of EE


Properties of Expectation

• We can easily show that, for constants a, b, and functions g(x)


and h(y),
1. E(ag(X) + bh(Y )) = aE(g(X)) + bE(h(Y ))
• When there are two RVs, which distribution is the expectation taken over?

EE 5375/7375 p92 SMU Dept of EE


Moments

• kth moment of X is the special case g(X) = X k , the expected


value of X k defined as
E(X^k) = \int_{-\infty}^{\infty} x^k f_X(x)\, dx

for a continuous variable, and for discrete random variable X is

E(X^k) = \sum_j (x_j)^k P_X(x_j)

EE 5375/7375 p93 SMU Dept of EE


Variance

• Second moment E(X 2 ) is of special interest in the form of the


variance:

var(X) = E[(X − E(X))2 ] = E(X 2 ) − [E(X)]2

– Sometimes var(X) is denoted by σ 2


– Standard deviation is the square root of variance, σ = \sqrt{var(X)}
• Variance or standard deviation is a rough measure of the “spread”
of pdf

EE 5375/7375 p94 SMU Dept of EE


Example

• Suppose X is uniformly distributed in (a, b)


• Mean

E(X) = \int_a^b x\, \frac{1}{b - a}\, dx = \frac{1}{b - a}\, \frac{b^2 - a^2}{2} = \frac{a + b}{2}

– This can also be seen from the symmetry of the pdf around \frac{a+b}{2}

• Variance

var(X) = \int_a^b \left(x - \frac{a + b}{2}\right)^2 \frac{1}{b - a}\, dx = \frac{(b - a)^2}{12}

which is proportional to the square of the width of the interval (a, b)

EE 5375/7375 p95 SMU Dept of EE


Example

• Suppose X is geometric with PMF

PX (x) = p(1 − p)x , x = 0, 1, . . .

• We will need to know the series



X 1
qk = 1 + q + q2 + · · · = for q < 1
1−q
k=0

Differentiating this, we get another series



X 1
kq k−1 = 1 + 2q + 3q 2 + · · · =
(1 − q)2
k=1

EE 5375/7375 p96 SMU Dept of EE


Differentiating again, we get a third series

\sum_{k=2}^{\infty} k(k - 1) q^{k-2} = 2 + 6q + 12q^2 + \cdots = \frac{2}{(1 - q)^3}

• Recall from earlier that mean is



E(X) = \sum_{k=0}^{\infty} k\, p(1 - p)^k = p(1 - p) \sum_{k=1}^{\infty} k(1 - p)^{k-1}
     = \frac{p(1 - p)}{p^2} = \frac{1 - p}{p}
• 2nd moment is

E(X^2) = \sum_{k=0}^{\infty} k^2\, p(1 - p)^k = \sum_{k=1}^{\infty} (k(k - 1) + k)\, p(1 - p)^k

       = p(1 - p)^2 \sum_{k=2}^{\infty} k(k - 1)(1 - p)^{k-2} + \sum_{k=1}^{\infty} k\, p(1 - p)^k

EE 5375/7375 p97 SMU Dept of EE


= \frac{2p(1 - p)^2}{p^3} + \frac{1 - p}{p}

= \frac{2 - 3p + p^2}{p^2}

• Variance is

var(X) = E(X 2 ) − [E(X)]2


= \frac{2 - 3p + p^2}{p^2} - \left(\frac{1 - p}{p}\right)^2

= \frac{2 - 3p + p^2 - (1 - 2p + p^2)}{p^2}

= \frac{1 - p}{p^2}

EE 5375/7375 p98 SMU Dept of EE


Variance of Common Random Variables

• Bernoulli
E(X) = p, var(X) = p(1 − p)

• Binomial
E(X) = np, var(X) = np(1 − p)

• Poisson
E(X) = a, var(X) = a

• Exponential
E(X) = 1/λ, var(X) = 1/λ^2

• Rayleigh
E(X) = σ\sqrt{\pi/2}, var(X) = (2 − π/2)σ^2

EE 5375/7375 p99 SMU Dept of EE


• Laplacian
E(X) = 0, var(X) = \frac{2}{c^2}
• Normal (Gaussian)

E(X) = µ, var(X) = σ 2

• Gamma
E(X) = \frac{\alpha}{\lambda}, var(X) = \frac{\alpha}{\lambda^2}

EE 5375/7375 p100 SMU Dept of EE


Properties of Variance

• We can easily show that, for constant a,


1. var(X + a) = var(X)
2. var(aX) = a2 var(X)
3. var(X+Y ) = var(X)+var(Y )+2cov(X, Y ) where covariance
of X and Y is

cov(X, Y ) = E[(X − E(X))(Y − E(Y ))] =?

EE 5375/7375 p101 SMU Dept of EE


Correlation Coefficient

• Correlation coefficient of X and Y is a normalized version of the


covariance:
\rho_{XY} = \frac{cov(X, Y)}{\sqrt{var(X)\, var(Y)}}
which ranges between −1 and 1
• X and Y are uncorrelated if ρXY = 0
• note: independent random variables will be uncorrelated, but
uncorrelated random variables are not necessarily independent

EE 5375/7375 p102 SMU Dept of EE


Example: uncorrelated but not independent

• Let θ be uniformly distributed between 0 and 2π. Let X = cos(θ)


and Y = sin(θ).
• Find E[X], E[Y ] and E[XY ]
• Are X and Y independent?
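A Monte Carlo sketch for this example; the closed-form answers are E[X] = E[Y ] = E[XY ] = 0, so X and Y are uncorrelated even though X^2 + Y^2 = 1 makes them dependent:

import random
from math import cos, sin, pi

random.seed(0)
N = 200_000
sx = sy = sxy = 0.0
for _ in range(N):
    theta = random.uniform(0.0, 2 * pi)
    x, y = cos(theta), sin(theta)
    sx, sy, sxy = sx + x, sy + y, sxy + x * y

print(sx / N, sy / N, sxy / N)   # all near 0, so cov(X, Y) = 0
# yet X and Y are dependent: knowing X pins down |Y| through X**2 + Y**2 == 1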

EE 5375/7375 p103 SMU Dept of EE


Conditional Expectations

• A generalization to conditional expectations is straightforward,


just use conditional probabilities
• Conditional expected value of continuous random variable X
given Y is defined as
E(X|Y = y) = \int_{-\infty}^{\infty} x\, f_X(x|y)\, dx

and for discrete random variable X is

E(X|Y = y) = \sum_k x_k\, P_X(x_k|y)

• In general, conditional expected value E(g(X)|Y = y) is defined


as

E(g(X)|Y = y) = \int_{-\infty}^{\infty} g(x)\, f_X(x|y)\, dx

EE 5375/7375 p104 SMU Dept of EE


for a continuous random variable X and real-valued function
g(x), and for discrete random variable is

E(g(X)|Y = y) = \sum_k g(x_k)\, P_X(x_k|y)

• Conditional variance is

var(X|Y = y) = E[(X − E(X|Y = y))^2 |Y = y]


= E(X 2 |Y = y) − [E(X|Y = y)]2

EE 5375/7375 p105 SMU Dept of EE


Properties

• Note that E(X|Y = y) depends on the value of y; viewed as a


function of the random variable Y , E(X|Y ) is itself a random variable
• What is
E[E(X|Y )]

EE 5375/7375 p106 SMU Dept of EE


Markov Inequality

• Suppose X is non-negative random variable with mean E(X)



Then E(X) = \int_0^a x f_X(x)\, dx + \int_a^{\infty} x f_X(x)\, dx

           ≥ \int_a^{\infty} x f_X(x)\, dx
           ≥ \int_a^{\infty} a f_X(x)\, dx
           = a P (X ≥ a)

⇒ P (X ≥ a) ≤ \frac{E(X)}{a}   (Markov inequality)
• This uses only the mean to give a bound on the tail probability
• Usually the bound may be quite loose

EE 5375/7375 p107 SMU Dept of EE


Example

• Suppose the average height of children in a first grade class is 42


inches (3 feet 6 inches)
• By Markov inequality,
P (height ≥ 96 inches) ≤ \frac{42}{96} ≈ 0.44
• In this case, actual probability of height more than 96 inches (8
feet) is very small, and Markov bound is very loose because only
mean is used
• Perhaps a bound using mean and variance might be better?

EE 5375/7375 p108 SMU Dept of EE


Chebyshev Inequality

• Let D^2 = (X − m)^2 be the squared deviation from the mean m = E(X)


• Then Markov inequality applied to D 2 gives

P (D^2 ≥ a^2) ≤ \frac{E[(X - m)^2]}{a^2} = \frac{var(X)}{a^2}

• Note that P (D^2 ≥ a^2 ) = P (|X − m| ≥ a), so the Chebyshev
inequality says

P (|X − m| ≥ a) ≤ \frac{var(X)}{a^2}

EE 5375/7375 p109 SMU Dept of EE


Example

• Suppose the average height of children in a first grade class is 42


inches (3 feet 6 inches) and standard deviation is 6 inches
• By Chebyshev inequality,
P (|height − 42 inches| ≥ 12) ≤ \frac{6^2}{12^2} = 0.25

P (|height − 42 inches| ≥ 24) ≤ \frac{6^2}{24^2} ≈ 0.06
• Probability of height deviating more than 12 inches from average
is bounded by 0.25, and deviating more than 24 inches from
average is less than 0.06 probability
• Chebyshev inequality can be very loose in many cases, but is
practically useful if only mean and variance are known (but en-
tire pdf is unknown)

EE 5375/7375 p110 SMU Dept of EE


Putting Things Together - the Sample Mean

• Notions of mean and variance are important for the example of


the sample mean or sample average
• Suppose we have N samples {x1 , x2 , . . . , xN } that are iid (inde-
pendent and identically distributed), drawn from a common CDF
FX (x) with unknown mean µ = E(X) and variance σ 2 = var(X)
• We might ask: what value do we expect the next sample xN +1 to
take?
• Essentially we are asking what is the mean µ? If mean is viewed
as “center of gravity” of the pdf, then the center of gravity of
the samples might be a good estimate
• Center of gravity M will minimize the total squared distance to

EE 5375/7375 p111 SMU Dept of EE


all samples:
D^2 = \sum_{i=1}^{N} (M - x_i)^2

• To minimize, differentiate with respect to M and set to 0:

\frac{d}{dM} D^2 = 2NM - 2 \sum_{i=1}^{N} x_i = 0

which leads to the sample mean

M = \frac{1}{N} \sum_{i=1}^{N} x_i

• Expected value of sample mean is


E(M) = E\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) = \frac{1}{N} \sum_{i=1}^{N} E(X) = µ

– M is unbiased estimator of µ because E(M ) = µ

EE 5375/7375 p112 SMU Dept of EE


• Variance of sample mean is
var(M) = var\left(\frac{1}{N} \sum_{i=1}^{N} x_i\right) = \frac{1}{N^2} \sum_{i=1}^{N} var(X) = \frac{\sigma^2}{N}

– Note that variance becomes 0 when N → ∞


• Applying Chebyshev inequality,
P (|M − µ| ≥ δ) ≤ \frac{var(X)}{N\delta^2}
– Note that bound can be made arbitrarily small by increasing
N
– Sample mean is called consistent estimator because it con-
verges to µ eventually
– This expresses weak law of large numbers
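A quick simulation of var(M) = σ^2/N, using uniform(0, 1) samples so that σ^2 = 1/12 (the replication counts are arbitrary):

import random
from statistics import mean, variance

random.seed(0)
sigma2 = 1.0 / 12.0                  # variance of uniform(0, 1)

for N in (10, 100, 1000):
    # many independent sample means, each from N iid uniform(0, 1) samples
    sample_means = [mean(random.random() for _ in range(N)) for _ in range(2000)]
    print(N, variance(sample_means), sigma2 / N)   # empirical vs sigma^2 / N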

EE 5375/7375 p113 SMU Dept of EE


Weak Law of Large Numbers

• Chebyshev inequality implies that as N → ∞, the sample mean


M will converge to the statistical mean µ = E(X) in the sense

\lim_{N \to \infty} P (|M − µ| ≥ δ) = 0

• Sample mean M is a random variable but becomes more “pre-


dictable” when number of samples N is large
• Law of Large Numbers implies that behavior of large homoge-
neous populations tends to be more predictable than individuals
or small groups
• Practically important in communications and networking, so
networks can be designed to meet predictable demand

EE 5375/7375 p114 SMU Dept of EE
