Marginal Probability Mass Functions
The marginal probability mass functions of X and Y, denoted p_X(x) and p_Y(y), are given by

$$p_X(x) = \sum_y p(x, y) \qquad\qquad p_Y(y) = \sum_x p(x, y)$$
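As a quick illustration, here is a minimal Python sketch (the joint pmf values are an assumed example, not from the slides): each marginal is obtained by summing the joint pmf over the other variable.

```python
# Marginal pmfs from a joint pmf stored as {(x, y): p(x, y)}.
# The joint pmf below is an assumed example, not from the slides.
joint_pmf = {
    (0, 0): 0.125, (0, 1): 0.125,
    (1, 0): 0.250, (1, 1): 0.500,
}

def marginal_x(joint):
    """p_X(x) = sum over y of p(x, y)."""
    pX = {}
    for (x, y), p in joint.items():
        pX[x] = pX.get(x, 0.0) + p
    return pX

def marginal_y(joint):
    """p_Y(y) = sum over x of p(x, y)."""
    pY = {}
    for (x, y), p in joint.items():
        pY[y] = pY.get(y, 0.0) + p
    return pY

print(marginal_x(joint_pmf))  # {0: 0.25, 1: 0.75}
print(marginal_y(joint_pmf))  # {0: 0.375, 1: 0.625}
```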
Joint Probability Density Function
Let X and Y be continuous rvs. Then f(x, y) is a joint probability density function for X and Y if for any two-dimensional set A

$$P[(X, Y) \in A] = \iint_A f(x, y)\, dx\, dy$$

If A is the two-dimensional rectangle {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d},

$$P[(X, Y) \in A] = \int_a^b \int_c^d f(x, y)\, dy\, dx$$
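A numerical check of the rectangle formula (a sketch; f(x, y) = x + y on the unit square is an assumed example density, valid because it is nonnegative and integrates to 1). Note that scipy.integrate.dblquad takes the integrand as func(y, x), inner variable first:

```python
from scipy.integrate import dblquad

# Assumed example density: f(x, y) = x + y on [0, 1] x [0, 1].
f = lambda y, x: x + y  # dblquad passes the inner variable (y) first

# The density integrates to 1 over the whole square.
total, _ = dblquad(f, 0, 1, 0, 1)

# P(0 <= X <= 0.5, 0 <= Y <= 0.5); by hand this is 0.125.
p_rect, _ = dblquad(f, 0, 0.5, 0, 0.5)

print(total)   # ~1.0
print(p_rect)  # ~0.125
```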
[Figure: P[(X, Y) ∈ A] = volume under the density surface f(x, y) above the shaded rectangle A.]
Marginal Probability Density Functions
The marginal probability density functions of X and Y, denoted f_X(x) and f_Y(y), are given by

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy \quad \text{for } -\infty < x < \infty$$

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx \quad \text{for } -\infty < y < \infty$$
In the discrete case, with R_x denoting the set of points in the range of (X, Y) for which X = x (and R_y defined analogously),

$$f_X(x) = P(X = x) = \sum_{R_x} f_{XY}(x, y) \qquad f_Y(y) = P(Y = y) = \sum_{R_y} f_{XY}(x, y)$$
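A symbolic sketch of the continuous marginal formulas above, reusing the assumed density f(x, y) = x + y on the unit square: integrating out one variable leaves the marginal pdf of the other.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x + y  # assumed joint pdf on [0, 1] x [0, 1]

# Integrate out the other variable to get each marginal pdf.
f_X = sp.integrate(f, (y, 0, 1))  # -> x + 1/2
f_Y = sp.integrate(f, (x, 0, 1))  # -> y + 1/2

# Each marginal is itself a pdf: it integrates to 1 over its range.
assert sp.integrate(f_X, (x, 0, 1)) == 1
assert sp.integrate(f_Y, (y, 0, 1)) == 1
print(f_X, f_Y)
```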
Mean and Variance
- If the marginal probability distribution of X has the probability function f(x), then

$$E(X) = \mu_X = \sum_x x\, f_X(x) = \sum_x x \Bigl( \sum_{R_x} f_{XY}(x, y) \Bigr) = \sum_R x\, f_{XY}(x, y)$$

$$V(X) = \sigma_X^2 = \sum_x (x - \mu_X)^2 f_X(x) = \sum_x (x - \mu_X)^2 \sum_{R_x} f_{XY}(x, y) = \sum_R (x - \mu_X)^2 f_{XY}(x, y)$$

- R = set of all points in the range of (X, Y).
- Example 5-4.
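A short sketch of these formulas in Python, using an assumed joint pmf; E(X) sums x f_XY(x, y) over the whole range R, and the variance uses the equivalent shortcut V(X) = E(X²) − [E(X)]²:

```python
# E(X) and V(X) computed directly from a joint pmf {(x, y): f_XY(x, y)}.
# The pmf values are an assumed example, not from the slides.
joint = {(1, 1): 0.125, (1, 2): 0.125, (2, 1): 0.25, (2, 2): 0.5}

EX  = sum(x * p for (x, y), p in joint.items())   # sum over R of x f_XY(x, y)
EX2 = sum(x**2 * p for (x, y), p in joint.items())
VX  = EX2 - EX**2  # shortcut form of V(X)

print(EX, VX)  # 1.75 0.1875
```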
Joint probability mass function example
The joint probability mass function, P{X = i, Y = k}, of the number of minutes waiting to catch the first fish, X, and the number of minutes waiting to catch the second fish, Y, is given below.
P{X = i, Y = k}          k = 1    k = 2    k = 3    Row sum P{X = i}
i = 1                    0.01     0.02     0.08     0.11
i = 2                    0.01     0.02     0.08     0.11
i = 3                    0.07     0.08     0.63     0.78
Column sum P{Y = k}      0.09     0.12     0.79     1.00
The (joint) chance of waiting 3 minutes to catch the first fish and 3 minutes to
catch the second fish is:
The (marginal) chance of waiting 3 minutes to catch the first fish is:
The (marginal) chance of waiting 2 minutes to catch the first fish is (circle all
that are correct):
The chance of waiting at least two minutes to catch the first fish is (circle
none, one or more):
The chance of waiting at most two minutes to catch the first fish and at most
two minutes to catch the second fish is (circle none, one or more):
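The slide leaves these as exercises; a short script reads the answers directly off the table (joint entry, row/column marginals, and sums of entries):

```python
# Joint pmf from the fishing table: keys are (i, k), minutes for fish 1 and 2.
p = {(1, 1): 0.01, (1, 2): 0.02, (1, 3): 0.08,
     (2, 1): 0.01, (2, 2): 0.02, (2, 3): 0.08,
     (3, 1): 0.07, (3, 2): 0.08, (3, 3): 0.63}

print(p[(3, 3)])                                              # P(X=3, Y=3)   = 0.63
print(round(sum(v for (i, k), v in p.items() if i == 3), 2))  # P(X=3)        = 0.78
print(round(sum(v for (i, k), v in p.items() if i == 2), 2))  # P(X=2)        = 0.11
print(round(sum(v for (i, k), v in p.items() if i >= 2), 2))  # P(X>=2)       = 0.89
print(round(sum(v for (i, k), v in p.items()
               if i <= 2 and k <= 2), 2))                     # P(X<=2, Y<=2) = 0.06
```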
Conditional probability
- Given discrete random variables X and Y with joint probability mass function f_XY(x, y), the conditional probability mass function of Y given X = x is

$$f_{Y|x}(y \mid x) = f_{Y|x}(y) = \frac{f_{XY}(x, y)}{f_X(x)} \quad \text{for } f_X(x) > 0$$
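A sketch of this definition applied to the fishing-table pmf above, conditioning on X = 3:

```python
# Conditional pmf f_{Y|x}(y) = f_XY(x, y) / f_X(x), using the fishing table.
p = {(1, 1): 0.01, (1, 2): 0.02, (1, 3): 0.08,
     (2, 1): 0.01, (2, 2): 0.02, (2, 3): 0.08,
     (3, 1): 0.07, (3, 2): 0.08, (3, 3): 0.63}

def conditional_y_given_x(joint, x):
    """Return {y: f_{Y|x}(y)} for the given x."""
    fx = sum(v for (i, _), v in joint.items() if i == x)  # marginal f_X(x)
    if fx <= 0:
        raise ValueError("requires f_X(x) > 0")
    return {k: v / fx for (i, k), v in joint.items() if i == x}

print(conditional_y_given_x(p, 3))
# {1: 0.0897..., 2: 0.1026..., 3: 0.8077...}  -- sums to 1
```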
Conditional probability (Cont.)
- Because a conditional probability mass function f_{Y|x}(y) is a probability mass function for all y in R_x, the following properties are satisfied:

$$(1)\ f_{Y|x}(y) \ge 0 \qquad (2)\ \sum_{R_x} f_{Y|x}(y) = 1 \qquad (3)\ P(Y = y \mid X = x) = f_{Y|x}(y)$$
Conditional probability (Cont.)
- Let R_x denote the set of all points in the range of (X, Y) for which X = x. The conditional mean of Y given X = x, denoted as E(Y|x) or μ_{Y|x}, is

$$E(Y \mid x) = \mu_{Y|x} = \sum_{R_x} y\, f_{Y|x}(y)$$

- And the conditional variance of Y given X = x, denoted as V(Y|x) or σ²_{Y|x}, is

$$V(Y \mid x) = \sigma^2_{Y|x} = \sum_{R_x} (y - \mu_{Y|x})^2 f_{Y|x}(y) = \sum_{R_x} y^2 f_{Y|x}(y) - \mu_{Y|x}^2$$
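Continuing the fishing example as a sketch, the conditional mean and variance of Y given X = 3 follow directly from the conditional pmf computed above:

```python
# E(Y|x) and V(Y|x) from a conditional pmf {y: f_{Y|x}(y)}; here x = 3.
cond = {1: 0.07 / 0.78, 2: 0.08 / 0.78, 3: 0.63 / 0.78}  # f_{Y|3}(y)

mean = sum(y * f for y, f in cond.items())               # E(Y | X = 3)
var = sum(y**2 * f for y, f in cond.items()) - mean**2   # V(Y | X = 3)

print(mean)  # ~2.718
print(var)   # ~0.382
```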
Independence
- For discrete random variables X and Y, if any one of the following properties is true, the others are also true, and X and Y are independent.

(1) f_XY(x, y) = f_X(x) f_Y(y) for all x and y
(2) f_{Y|x}(y) = f_Y(y) for all x and y with f_X(x) > 0
(3) f_{X|y}(x) = f_X(x) for all x and y with f_Y(y) > 0
(4) P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B) for any sets A and B in the range of X and Y, respectively.
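A sketch checking property (1) numerically: the assumed joint pmf below is built as a product of two marginals, so the check reports independence.

```python
import math

# Assumed example: a joint pmf constructed so that X and Y are independent.
pX = {0: 0.25, 1: 0.75}
pY = {0: 0.4, 1: 0.6}
joint = {(x, y): pX[x] * pY[y] for x in pX for y in pY}

def is_independent(joint, tol=1e-12):
    """Property (1): f_XY(x, y) == f_X(x) f_Y(y) for all x and y."""
    fX, fY = {}, {}
    for (x, y), p in joint.items():
        fX[x] = fX.get(x, 0.0) + p
        fY[y] = fY.get(y, 0.0) + p
    return all(math.isclose(p, fX[x] * fY[y], abs_tol=tol)
               for (x, y), p in joint.items())

print(is_independent(joint))  # True
```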
5.2 Expected Values, Covariance, and Correlation
Expected Value
Let X and Y be jointly distributed rvs with pmf
p(x, y) or pdf f (x, y) according to whether the
variables are discrete or continuous. Then the
expected value of a function h(X, Y), denoted
E[h(X, Y)] or μ_{h(X,Y)}, is

$$E[h(X, Y)] = \begin{cases} \sum_x \sum_y h(x, y)\, p(x, y) & \text{(discrete)} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y)\, f(x, y)\, dx\, dy & \text{(continuous)} \end{cases}$$
Covariance

The covariance between two rvs X and Y is

$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
Short-cut Formula for Covariance
$$\operatorname{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$$
Correlation
The correlation coefficient of X and Y, denoted by Corr(X, Y), ρ_{X,Y}, or just ρ, is defined by

$$\rho_{X,Y} = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
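A sketch computing Cov(X, Y) by the short-cut formula and Corr(X, Y) from it, reusing the fishing-table pmf (the two waiting times turn out to be only weakly correlated):

```python
import math

# Joint pmf from the fishing table.
p = {(1, 1): 0.01, (1, 2): 0.02, (1, 3): 0.08,
     (2, 1): 0.01, (2, 2): 0.02, (2, 3): 0.08,
     (3, 1): 0.07, (3, 2): 0.08, (3, 3): 0.63}

EX  = sum(x * v for (x, y), v in p.items())      # 2.67
EY  = sum(y * v for (x, y), v in p.items())      # 2.70
EXY = sum(x * y * v for (x, y), v in p.items())  # 7.23
cov = EXY - EX * EY                              # short-cut formula

VX = sum(x**2 * v for (x, y), v in p.items()) - EX**2
VY = sum(y**2 * v for (x, y), v in p.items()) - EY**2
corr = cov / math.sqrt(VX * VY)

print(cov, corr)  # ~0.021  ~0.051
```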
In addition, with T_o = X_1 + ⋯ + X_n,

$$E(T_o) = n\mu, \qquad V(T_o) = n\sigma^2, \qquad \sigma_{T_o} = \sqrt{n}\,\sigma.$$
Normal Population Distribution
Let X_1, …, X_n be a random sample from a normal distribution with mean value μ and standard deviation σ. Then for any n, X̄ is normally distributed, as is T_o.
The Central Limit Theorem
Let X_1, …, X_n be a random sample from a distribution with mean value μ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma^2_{\bar{X}} = \frac{\sigma^2}{n},$$

and T_o also has approximately a normal distribution with

$$\mu_{T_o} = n\mu, \qquad \sigma^2_{T_o} = n\sigma^2.$$

The larger the value of n, the better the approximation.
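A simulation sketch of the theorem (numpy assumed): sample means of size n = 30 from the skewed exponential(1) distribution, whose mean and standard deviation are both 1, already have mean ≈ μ and standard deviation ≈ σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = sigma = 1.0     # the exponential(1) distribution has mean 1 and sd 1
n, reps = 30, 100_000

# 100,000 sample means, each computed from a sample of size n.
xbars = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(xbars.mean())        # ~1.0, i.e. mu
print(xbars.std())         # ~0.1826, i.e. sigma / sqrt(n)
print(sigma / np.sqrt(n))  # 0.1826 for comparison
```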
The Central Limit Theorem

[Figure: the population distribution, and the sampling distribution of X̄ for small to moderate n and for large n.]
Rule of Thumb
If n > 30, the Central Limit Theorem can
be used.
Approximate Lognormal Distribution
Let X_1, …, X_n be a random sample from a distribution for which only positive values are possible [P(X_i > 0) = 1]. Then if n is sufficiently large, the product Y = X_1 X_2 ⋯ X_n has approximately a lognormal distribution.
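A sketch of why this holds: ln Y = ln X_1 + ⋯ + ln X_n is a sum of iid terms, so the CLT makes ln Y approximately normal, which is exactly the lognormal statement for Y. For uniform(0, 1) factors (an assumed example), ln X_i has mean −1 and variance 1, so ln Y ≈ N(−n, n):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 100_000

# Y = X_1 * X_2 * ... * X_n with X_i ~ Uniform(0, 1), so P(X_i > 0) = 1.
Y = rng.uniform(0.0, 1.0, size=(reps, n)).prod(axis=1)

# ln Y should be approximately normal with mean -n and variance n.
logY = np.log(Y)
print(logY.mean(), logY.var())  # ~-50  ~50
```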
5.5 The Distribution of a Linear Combination
Linear Combination
Given a collection of n random variables X_1, …, X_n and n numerical constants a_1, …, a_n, the rv

$$Y = a_1 X_1 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i$$

is called a linear combination of the X_i's.
Expected Value of a Linear Combination
Let X_1, …, X_n have mean values μ_1, μ_2, …, μ_n and variances σ_1², σ_2², …, σ_n², respectively. Whether or not the X_i's are independent,

$$E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n) = a_1 \mu_1 + \cdots + a_n \mu_n$$
Variance of a Linear Combination
If X_1, …, X_n are independent,

$$V(a_1 X_1 + \cdots + a_n X_n) = a_1^2 V(X_1) + \cdots + a_n^2 V(X_n) = a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2$$

and

$$\sigma_{a_1 X_1 + \cdots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2}$$
Variance of a Linear Combination
For any X_1, …, X_n,

$$V(a_1 X_1 + \cdots + a_n X_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \operatorname{Cov}(X_i, X_j)$$
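A sketch verifying the general formula by Monte Carlo for a linear combination of correlated normals (the covariance matrix below is an assumed example); the double sum equals the quadratic form aᵀΣa:

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],   # assumed covariance matrix of (X1, X2, X3)
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# Theoretical variance: sum_i sum_j a_i a_j Cov(X_i, X_j) = a^T Sigma a.
print(a @ Sigma @ a)  # 3.775

# Monte Carlo check with correlated normal draws.
X = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
print((X @ a).var())  # ~3.775
```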
Difference Between Two Random Variables
$$E(X_1 - X_2) = E(X_1) - E(X_2)$$

and, if X_1 and X_2 are independent,

$$V(X_1 - X_2) = V(X_1) + V(X_2)$$
Difference Between Normal Random Variables
If X_1, X_2, …, X_n are independent, normally distributed rvs, then any linear combination of the X_i's also has a normal distribution. The difference X_1 − X_2 between two independent, normally distributed variables is itself normally distributed.
Central Limit Theorem: heuristic formulation

When sampling from almost any distribution, X̄ is approximately Normally distributed in large samples.

Show Sampling Distribution Simulation Applet:
file:///C:/Ivo.dir/UCLA_Classes/Winter2002/AdditionalInstructorAids/SamplingDistributionApplet.html
- For the sample mean calculated from a random sample, E(X̄) = μ and SD(X̄) = σ/√n, provided X̄ = (X_1 + X_2 + ⋯ + X_n)/n and X_k ~ N(μ, σ). Then X̄ ~ N(μ, σ/√n).
- The variability from sample to sample in the sample means is given by the variability of the individual observations divided by the square root of the sample size. In a way, averaging decreases variability.
Central Limit Effect: histograms of sample means

[Figure: Triangular distribution, density y = 2x on [0, 1] (area = 1). Histograms of 500 sample means for sample sizes n = 1 and n = 2.]
[Figure: Triangular distribution, continued. Histograms of sample means for sample sizes n = 4 and n = 10.]
Central Limit Effect: histograms of sample means

[Figure: Uniform distribution on [0, 1] (area = 1). Histograms of 500 sample means for sample sizes n = 1 and n = 2.]
[Figure: Uniform distribution, continued. Histograms of sample means for sample sizes n = 4 and n = 10.]
Central Limit Effect: histograms of sample means

[Figure: Exponential distribution, density e^(−x) on [0, ∞). Histograms of 500 sample means for sample sizes n = 1 and n = 2.]
[Figure: Exponential distribution, continued. Histograms of sample means for sample sizes n = 4 and n = 10.]
Central Limit Effect: histograms of sample means

[Figure: Quadratic U distribution, density y = 12(x − 1/2)² on [0, 1] (area = 1). Histograms of 500 sample means for sample sizes n = 1 and n = 2.]
[Figure: Quadratic U distribution, continued. Histograms of sample means for sample sizes n = 4 and n = 10.]
Central Limit Theorem: theoretical formulation

Let {X_1, X_2, …, X_k, …} be a sequence of independent observations from one specific random process. Let E(X) = μ and SD(X) = σ, and let both be finite (0 < σ < ∞; |μ| < ∞). If the sample average is

$$\bar{X}_n = \frac{1}{n} \sum_{k=1}^{n} X_k,$$

then X̄_n has a distribution which approaches N(μ, σ²/n) as n → ∞.