
UNIVERSITY OF ILLINOIS

Department of Economics
Course: Econ 506, Fall 2012 August 28, 2012
Instructor: Anil K. Bera (abera@illinois.edu), 225E DKH
Class Hours: 1:30 - 3:10 TuTh
Class Room: 215 DKH
Office Hours: 12:00 - 1:00 TuTh
TA: Yu-Hsien Kao (kao21@illinois.edu)
Prologue
April 1242
Baghdad, Iraq
Baghdad took no note of the arrival of Shams (Sun) of Tabriz, a wandering Sufi
saint, from Samarkand to the city's famous Dervish Lodge. Shams told the master of
the lodge, Baba Zaman, that he wanted to share his accumulated knowledge with the
most competent student. Why? Because, Shams said, "Knowledge is like brackish
water at the bottom of an old vase unless it flows somewhere." Baba Zaman got serious
and asked a bizarre question: "You say you are ready to deliver all your knowledge
to another person. You want to hold the Truth in your palm as if it were a precious
pearl and offer it to someone special. That is no small task for a human being. Aren't
you asking too much? What are you willing to pay in return?"
Raising an eyebrow, Shams of Tabriz said firmly, "I am willing to give my head."
This is an introductory course in mathematical statistics, and its purpose is to
prepare you for the econometrics course, Econ 507 (Spring 2013). To carry out a
good applied econometrics study, it is necessary to master econometric theory.
Econometric theory requires a good knowledge of statistical theory, which in turn has
its foundation in probability theory. Finally, one cannot study probability without
set theory. Therefore, we will begin at the beginning. We will start with set
theory, and discuss probability and the basic structure of statistics. Then we will
slowly move into different probability distributions, asymptotic theory, estimation
and hypothesis testing.
After doing all this, the whole course will be just like a candle. It will provide
us with much valuable light. But let us not forget that a candle helps us go from one
place to another in the dark. If, however, we forget where we are headed and instead
concentrate on the candle, what good will it be?
As you may have guessed, the course material will be highly theoretical. No statistical
background will be assumed. However, I will take it for granted that you already
know differential and integral calculus and linear algebra. Good luck!
Course Outline:
1. Introduction
(a) Why statistics?
(b) Statistical data analysis: Life by numbers
2. Probability Theory
(a) Algebra of sets
(b) Random variable
(c) Distribution function of a random variable
(d) Probability mass and density functions
(e) Conditional probability distribution
(f) Bayes theorem and its applications
(g) More on conditional probability distribution
(h) Mathematical expectation
(i) Bivariate moments
(j) Generating functions
(k) Distribution of a function of a random variable
3. Univariate Discrete and Continuous Distributions
(a) The basic distribution: hypergeometric
(b) Binomial distribution (as a limit of hypergeometric)
(c) Poisson distribution (as a limit of binomial)
(d) Normal distribution
(e) Properties of normal distribution
(f) Distributions derived from normal (χ², t and F)
(g) Distributions of sample mean and variance
4. Asymptotic Theory
(a) Law of large numbers
(b) Central limit theorems
5. Estimation
(a) Properties of an estimator
(b) Cramer-Rao inequality
(c) Sufficiency and minimal sufficiency
(d) Minimum variance unbiased estimator and Rao-Blackwell theorem
(e) Maximum likelihood estimation
(f) Nonparametric method and density estimation
6. Hypothesis Testing
(a) Notion of statistical hypothesis testing
(b) Type I and II errors
(c) Uniformly most powerful test and Neyman-Pearson lemma
(d) Likelihood ratio (LR) test
(e) Examples on hypothesis testing
(f) Raos score or the Lagrange multiplier (LM) test
(g) Wald (W) test
Recommended Text:
A First Course in Probability and Statistics by B.L.S. Prakasa Rao, 2008, World
Scientic.
However, I will not follow this book closely. For your convenience, detailed notes
(in four volumes) on the whole course will be made available on the course webpage.
As you will notice, the lecture notes, given the subject matter, are very
dry and mechanical. We will try to make things more lively by analyzing some
interesting data sets (some even depicting your lives) and contemporary real-world
problems.
Yu-Hsien Kao, the TA for this course, will meet with the class on Fridays, 1:30 -
2:50pm, in 215 DKH. Her office hours will be 11:00am - 12:30pm on Mondays.
Course Webpage: Please check Compass regularly for Announcements/ Updates
on Homeworks, Exams etc.
Assessment: There will be two closed book examinations. You will also receive four
homework assignments. The grading of the course will be based on:
Homework 20%
First Exam (around mid-semester on a Th) 40%
Second Exam (on the last day of class) 40%
Epilogue
In late October of 1244 in Konya, Turkey, Shams found the student he was looking
for: Jalaluddin Rumi, already a famous Islamic scholar in Turkey. Under the tutelage
of Shams, Rumi became one of the most revered poets in the world. As Rumi said,
"I was raw. I was cooked. I was burned."
March 1248
Konya, Turkey
Rumi's son Aladdin hired a killer who did not require much convincing.
It was a windy night, unusually chilly for this time of the year. A few nocturnal
animals hooted and howled from afar. The killer was waiting. Shams of Tabriz came
out of the house holding an oil lamp in his hand and walked in the direction of the
killer and stopped only a few steps away from the bush where the killer was hiding.
"It is a lovely night, isn't it?" Shams asked.
Did he know the killer was there? Soon six others joined the killer. The seven
of them knocked Shams to the ground, and the killer pulled his dagger out of his
belt...
Together they lifted Shams' body, which was strangely light, and dumped him into
a well. Gasping loudly for air, each of them took a step back and waited to hear the
sound of Shams' body hitting the water.
It never came.
Taken from: Elif Shafak (2010), The Forty Rules of Love, Penguin Books.
CONTENTS
1. Introduction
(a) Structure of the course
2. Probability Theory
(a) Algebra of sets
(b) Random variable
(c) Distribution function of a random variable
(d) Probability mass and density functions
(e) Conditional probability distribution
(f) Bayes theorem and its applications
(g) More on conditional probability distribution
(h) Mathematical expectation
(i) Bivariate moments
(j) Generating functions
(k) Distribution of a function of a random variable
3. Univariate Discrete and Continuous Distributions
(a) The basic distribution: hypergeometric
(b) Binomial distribution (as a limit of hypergeometric)
(c) Poisson distribution (as a limit of binomial)
(d) Normal distribution
(e) Properties of normal distribution
(f) Distributions derived from normal (χ², t and F)
Introduction to Statistics for
Econometricians
by
Anil K. Bera
1.1 Introduction.
If you look around, you will notice the world is full of uncertainty. Even with
an enormous amount of past information, we can never tell the exact
weather conditions of tomorrow. The same is true for many economic variables, such
as stock prices, exchange rates, inflation, unemployment, interest rates, mortgage
rates, etc. [If you knew the exact future price of a major stock, you could make a
million! In that case, of course, you wouldn't be taking this course.] Then what is
the role of statistics in this uncertain world? The basic foundation of statistics
is the idea that there is an underlying principle or common rule in
the midst of all the chaos and irregularities. Statistics is a science for formulating
these common rules in a systematic way. Econometrics is the field of science
which deals with the application of statistics to economics. Statistics is applicable
to all branches of science and the humanities. You might have heard of fields like
sociometry, psychometry, cliometrics and biometrics. These are applications of
statistics to sociology, psychology, history and biology, respectively. The application of
statistics in economics is somewhat controversial since, unlike the physical or biological
sciences, in economics we cannot conduct purely random experiments. In most
cases what we have is historical data on certain economic variables. For all practical
purposes, we can view these data as the result of some random experiment and then
use statistical tools to analyze them. For example, regarding stock price
movements, based on the available data we can try to find the underlying probability
distribution. This distribution will depend on some unknown parameters which
can be estimated using the data. We can also test hypotheses regarding the
parameters, or we can even test whether the (assumed) probability distribution is
valid or not.
Just like any other science, statistics has many approaches: classical
and Bayesian, parametric and nonparametric, etc. These are not always substitutes
for each other, and in many cases they can be successfully used as complements of
each other. However, in this course we will concentrate on the classical parametric
approach.
2.1 Basic Set Theory.
The objective of econometrics is "advancement of economic theory in its relation
to statistics and mathematics." This course is concerned with the "statistics"
part, and the foundation of statistics is in probability theory. Again, "probability"
is defined for events, and these events can be described as sets or subsets. To see
the link:
Econometrics → Statistics → Probability → Event → Set.
Let us start with a definition of a "set".
Definition 2.1.1: A set is any (well defined) collection of objects.
Example 2.1.1:
i) C = {1, 2, 3, ...}, the set of all positive integers.
ii) D = {2, 4, 6, ...}, the set of all positive even integers.
iii) F = {students attending Econ 472}
iv) G = {students attending Econ 402}
An object in a set is an element; e.g., 1 is an element of the set C. We will
denote this as 1 ∈ C, where "∈" means "belongs to." Note that the set C contains more
elements than the set D. We can say that D is a "subset" of C and will denote
this as D ⊂ C. Formally,
Definition 2.1.2: Set B is a subset of set A, denoted by B ⊂ A, if x ∈ B implies
x ∈ A.
You know that with real numbers we can do a lot of operations, like addition
(+), subtraction (−), multiplication (×), etc. Similar operations can also be done
with sets; e.g., we can "add" (sort of) two sets, subtract one set from another,
etc. Two very important operations are "union" and "intersection" of sets.
Definition 2.1.3: The union of two sets A and B is C, denoted by C = A ∪ B, if
C = {x : x ∈ A and/or x ∈ B}.
In other words, by the union of two sets we mean the collection of elements which
belong to at least one of the two sets.
Example 2.1.2: In Example 2.1.1, C ∪ D = C. If we define another set E =
{1, 3, 5, ...}, the set of all positive odd integers, then C = D ∪ E.
The operation can be defined for more than two sets. Suppose we have n sets
A₁, A₂, ..., Aₙ. Then A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ, denoted by ⋃_{i=1}^{n} Aᵢ, is defined as
⋃_{i=1}^{n} Aᵢ = {x : x ∈ at least one Aᵢ, i = 1, 2, ..., n}.
In a similar fashion, we can define the union of an infinite (but countable) number
of sets A₁, A₂, A₃, ... as ⋃_{i=1}^{∞} Aᵢ = A₁ ∪ A₂ ∪ A₃ ∪ ⋯.
Example 2.1.3: Let Aᵢ = {i}, i.e., A₁ = {1}, A₂ = {2}, etc.
Then ⋃_{i=1}^{∞} Aᵢ = C of Example 2.1.1.
Or let Aᵢ = [−i, i], an interval on the real line ℝ; then ⋃_{i=1}^{∞} Aᵢ = ℝ.
i=I
The next concept we discuss is "intersection". Intersection of two sets A and
B, denoted by A n B, consists of all the common elements of A and B. Formally,
Definition 2.1.4 : Intersection of two sets A and B is C, denoted by C An B,
if C = {xix E A and x E B}.
Example 2.1.4 : In Example 2.1.1, enD D and F n G={students attending
both Econ 472 and 402}
As in the case of "union", we can also define the operation "∩" for more than two
sets. For example,
⋂_{i=1}^{n} Aᵢ = A₁ ∩ A₂ ∩ ⋯ ∩ Aₙ = {x : x ∈ Aᵢ for all i = 1, 2, ..., n}
⋂_{i=1}^{∞} Aᵢ = A₁ ∩ A₂ ∩ A₃ ∩ ⋯ = {x : x ∈ Aᵢ for all i = 1, 2, 3, ...}
It is easy to represent the above two concepts diagrammatically with a Venn
diagram [see Figure 2.1.1].
[Figure 2.1.1: Venn diagram of two sets A and B, showing A ∪ B and A ∩ B]
[Figure 2.1.2: Venn diagram of the difference A − B]
Continuing with Example 2.1.1, suppose the students taking Econ 472 have
already attended Econ 402, i.e., there is no student in the Econ 472 class who is taking
Econ 402 now. Then if we talk about F ∩ G, the set will be empty. We will call
such a set a null set and will denote it by ∅. By definition, for any set A, A ∪ ∅ = A
and A ∩ ∅ = ∅.
Example 2.1.5: In Examples 2.1.1 and 2.1.2, D ∩ E = ∅.
Earlier we noted, in Example 2.1.1, that D ⊂ C. Now remove the elements of D
from the set C; what we are left with is the set E of Example 2.1.2. We write this
as E = C − D, i.e., the difference between two sets consists of the elements of one set
after removing those elements that belong to the other set. Formally,
Definition 2.1.5: The difference between two sets A and B, denoted by A − B,
is defined as C = A − B = {x : x ∈ A and x ∉ B}. Note, "∉" means "does not
belong to." In a Venn diagram, A − B can be represented as in Figure 2.1.2.
Now it is clear that a set consists of elements satisfying certain properties.
We can imagine a big set which consists of elements with very little restriction.
For example, in Example 2.1.1, regarding sets C and D, we can think of ℝ, the set
of all real numbers. We will vaguely call such a big (reference) set a space
and will denote it by S. Note here that C ⊂ S and D ⊂ S. So let S = ℝ. Define
Q = {set of all rational numbers}; then S − Q = {set of all irrational numbers}.
Another way to think about S − Q is as the "complement" of Q in S, which is denoted
as Qᶜ|S.
Definition 2.1.6: The complement of a set A with respect to a space S is denoted by
Aᶜ|S = {x ∈ S : x ∉ A}.
In most cases, the reference set S will be obvious from the context; we will then
omit S from the notation and write Aᶜ|S simply as Aᶜ.
Example 2.1.6: In Examples 2.1.1 and 2.1.2, Dᶜ|C = E and Eᶜ|C = D.
See the Venn diagrams in Figure 2.1.3.
[Figure 2.1.3: Venn diagrams of complements]
Consider the identity (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. Even without the diagram, we can
easily prove this. The trick is: if we want to show that a set C is equal to another
set D, show the following:
If for every x, x ∈ C then x ∈ D ⟹ C ⊂ D.
If for every x, x ∈ D then x ∈ C ⟹ D ⊂ C.
Combine these two and obtain C = D.
Let us prove the above identity. Let x ∈ (A ∪ B)ᶜ, so x ∉ (A ∪ B), i.e.,
x ∉ A and x ∉ B. In other words, x ∈ Aᶜ and x ∈ Bᶜ, i.e., x ∈ Aᶜ ∩ Bᶜ.
Therefore, (A ∪ B)ᶜ ⊂ Aᶜ ∩ Bᶜ. Next assume x ∈ Aᶜ ∩ Bᶜ; reversing the
above arguments, we see that x ∈ (A ∪ B)ᶜ. So we have Aᶜ ∩ Bᶜ ⊂ (A ∪ B)ᶜ.
Hence (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
Now try to prove the following identity:
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
These identities are known as De Morgan's laws. Try to prove the following
generalizations:
(⋃_{i} Aᵢ)ᶜ = ⋂_{i} Aᵢᶜ and (⋂_{i} Aᵢ)ᶜ = ⋃_{i} Aᵢᶜ.
Let us now link up set theory with the concepts of "event" and "probability".
Suppose we throw one coin twice. The coin has two sides, head (H) and tail (T).
What are the possible outcomes?
Both tails (T T)
Both heads (H H)
Tail, head (T H)
Head, tail (H T)
Collect these together in a set Ω = {(TT), (HH), (TH), (HT)}; this is the
collection of all possible outcomes. We may be interested in the following special
outcomes:
A₁ = {outcomes with a head on the first toss} = {(HH), (HT)}.
A₂ = {outcomes with at least one head} = {(HH), (HT), (TH)}.
A₃ = {outcomes with no tail} = {(HH)}.
A₁, A₂, A₃, ... are all events, and note that A₁, A₂, A₃ ⊂ Ω. We can think
of a collection of subsets of Ω, and a particular event will be an element of that
collection. Under this framework we can define the probabilities of different events.
So far we have considered sets which are collections of single elements; e.g., we
had a set C = {1, 2, 3, ...}. We can also think of a set whose elements are themselves
sets, i.e., a set of sets. We can call this a collection or a class of sets. By giving
different structures to such a class of sets, we can define many concepts, such as ring
and field. For our future purposes, all we need is the concept of a σ-field (sigma
field). This will be denoted by 𝒜 (script A). A σ-field is nothing but a collection
of sets A₁, A₂, A₃, ... satisfying the following properties:
(i) A₁, A₂, ... ∈ 𝒜 ⟹ ⋃_{i=1}^{∞} Aᵢ ∈ 𝒜.
(ii) If A ∈ 𝒜 then Aᶜ ∈ 𝒜.
In other words, 𝒜 is closed under the formation of countable unions and under
complementation. From the above two conditions, it is clear that for 𝒜 to be a
σ-field, the null set ∅ and the space Ω must belong to 𝒜.
Example 2.1.7: Ω = {1, 2, 3, 4}. A σ-field on Ω can be written as
𝒜 = {∅, {1, 2}, {3, 4}, {1, 2, 3, 4}}.
Example 2.1.8:
Ω = ℝ (the real line)
𝒜 = {countable unions of intervals like (a, b]}
𝒜 is called the Borel field, and the members of 𝒜 are called Borel sets of ℝ.
2.2 Random Variable.
As you can guess, the word "random" is associated with some sort of uncertainty.
If we toss a coin, we know the possibilities, head (H) or tail (T), but we are
uncertain about exactly which one will appear. Therefore, "tossing a coin" can be
regarded as a random experiment, where the possibilities are known but not the
exact outcome. In probability theory, the collection of all possible outcomes is
known as the sample space.
Example 2.2.1:
(i) Toss a coin. The sample space is Ω = {H, T}.
(ii) Toss two coins, or one coin twice; the sample space is
Ω = {(HH), (TT), (HT), (TH)}.
(iii) Throw a die:
Ω = {(·), (:), (:·), (::), (::·), (:::)}, the six faces of the die.
Instead of assigning symbols, we can give these outcomes some numbers (real
numbers). For example, for Example (i) above, we can define
X = 0 if the outcome is T
X = 1 if the outcome is H.
For Example (iii) above, X can take the values 1, 2, 3, 4, 5, 6. An X defined in such a
way is called a random variable. Once a random variable is defined, we can talk
about the probability distribution of the random variable.
Let us first formally define "probability". For Example (i), we have the sample
space Ω = {H, T}. The σ-field defined on Ω is 𝒜 = {∅, Ω, {H}, {T}}. The elements
of 𝒜 are called events. "Probability" is nothing but an assignment of real numbers
(satisfying some conditions) to each of these events.
Definition 2.2.1: Probability, denoted by P, is a function from 𝒜 to [0, 1],
P : 𝒜 → [0, 1],
satisfying the following axioms:
(i) P(Ω) = 1.
(ii) If A₁, A₂, A₃, ... ∈ 𝒜 are disjoint (i.e., Aᵢ ∩ Aⱼ = ∅ for all i ≠ j), then
P(⋃_{i=1}^{∞} Aᵢ) = Σ_{i=1}^{∞} P(Aᵢ).
Example 2.2.2:
Ω = {H, T}
𝒜 = {∅, Ω, {H}, {T}}
P(∅) = 0, P(Ω) = 1, P(H) = 1/2, P(T) = 1/2.
Earlier we indicated that a random variable can be defined by assigning real
numbers to the elements of Ω. Now define a σ-field on the real line ℝ and denote
it by ℬ. Formally, we can define a random variable X as follows.
Definition 2.2.2: A random variable X is a function X : Ω → ℝ such that for
all B ∈ ℬ, X⁻¹(B) ∈ 𝒜.
Note that here X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B}. For a diagrammatic representation
of the random variable X as a function, see Figure 2.2.1.
In other words, X(·) is a measurable function from the sample space to the
real line. "Measurability" is defined by requiring that the inverse image under X of
any Borel set is an element of the σ-field 𝒜, i.e., an event. Recall that probability is
defined only for events. By requiring that X is measurable, in a sense, we are assuring
the existence of its probability distribution.
Example 2.2.3: Toss a coin twice; then the sample space Ω and a σ-field 𝒜 can
be defined as
Ω = {(HH), (TT), (HT), (TH)}
𝒜 = the collection of all 16 subsets of Ω: ∅, Ω, the four singletons {(HH)}, {(TT)},
{(HT)}, {(TH)}, the six pairs {(HH),(TT)}, {(TT),(HT)}, {(HT),(TH)},
{(HH),(TH)}, {(HH),(HT)}, {(TT),(TH)}, and the four triples {(HH),(TT),(HT)},
{(TT),(HT),(TH)}, {(HH),(TT),(TH)}, {(HH),(HT),(TH)}.
Define X = number of heads. Then X takes three values: x = 0, 1, 2.
[Figure 2.2.1: X as a mapping from Ω to ℝ; X⁻¹(B) is the pre-image in Ω of a Borel set B]
First assign the following probabilities:
P(HH) = 1/4, P(TT) = 1/4, P(HT) = 1/4, P(TH) = 1/4.
The triplet (Ω, 𝒜, P) is called a probability space, and P(A) is the probability
of the event A.
Corresponding to (Ω, 𝒜, P), there exists another probability space (ℝ, ℬ, Pˣ),
where Pˣ is defined as
Pˣ(B) = P[X⁻¹(B)] for B ∈ ℬ.
In the above example, take B = {1}; then
Pˣ(1) = P[X⁻¹(1)]
      = P[{(HT), (TH)}]
      = P[(HT) ∪ (TH)]
      = P(HT) + P(TH) (why?)
      = 1/4 + 1/4 = 1/2.
Similarly, we can show that
Pˣ(0) = 1/4 and Pˣ(2) = 1/4.
Pˣ(·) is called the probability measure induced by X. To summarize, we have
defined two functions:
X : Ω → ℝ
Pˣ : ℬ → [0, 1],
where ℬ is a σ-field defined on ℝ [see Example 2.1.8].
For the above example, the two functions can be described as

ω            X(ω)   Pˣ
(TT)           0    1/4
(HT), (TH)     1    1/2
(HH)           2    1/4
The last two columns describe the probability distribution of the random
variable X. Sometimes we will simply denote it by P(X).
x P(X)
0 1/4
1 1/2
2 1/4
Most of the time, probability distributions (of discrete random variables) are
presented this way. From the above discussion, it is clear that each such probability
distribution originates from an Ω, the sample space of a random experiment.
Definition 2.2.3: Listing of the values along with the corresponding probabilities
is called the probability distribution of a random variable.
Note: Strictly speaking, this definition applies to "discrete" random variables only.
Later, we will define "discrete" and "continuous" random variables.
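To see how the induced distribution Pˣ arises mechanically from (Ω, 𝒜, P), here is a minimal Python sketch for the two-coin example: sum P over the pre-image X⁻¹(x) of each value x.

```python
# From (Omega, A, P) to the induced distribution P^X:
# P^X(x) = P[X^{-1}(x)], i.e., sum P over all sample points mapped to x.
from collections import defaultdict

P = {('H', 'H'): 0.25, ('T', 'T'): 0.25,
     ('H', 'T'): 0.25, ('T', 'H'): 0.25}   # probability on Omega
X = lambda w: w.count('H')                 # X = number of heads

PX = defaultdict(float)
for w, p in P.items():
    PX[X(w)] += p
print(dict(PX))   # P^X(0) = 0.25, P^X(1) = 0.5, P^X(2) = 0.25
```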
2.3 Distribution Function of a Random Variable.
Sometimes this is also called the cumulative probability distribution, and it is denoted
by F(·). Let us denote by "x" the value(s) X can take; then F(·) is simply defined
as
F(x) = probability of the event X ≤ x = Pr(X ≤ x).
Note: We will use "Pr(·)" to denote the probability of an event without defining the
set explicitly, and P(·) or Pˣ(·) when the set is explicitly stated in the argument.
Also note that the probability spaces for P and Pˣ are, respectively, (Ω, 𝒜, P) and
(ℝ, ℬ, Pˣ).
Let us now provide a formal definition of the distribution function. Let
W(x) = {ω ∈ Ω : X(ω) ≤ x}.
Since X is measurable, W(x) ∈ 𝒜. In the probability space (ℝ, ℬ, Pˣ), we can
write the probability of W(x) as
P(W(x)) = Pˣ[(−∞, x]].
This is well defined since (−∞, x] ∈ ℬ. This probability is called the distribution
function of X, i.e.,
F(x) = Pr(X ≤ x) = P(W(x)) = Pˣ[(−∞, x]].
For our example:

ω            x   Pˣ = Pr(X = x)   F(x) = Pr(X ≤ x)
(TT)         0        1/4          1/4
(HT), (TH)   1        1/2          1/4 + 1/2 = 3/4
(HH)         2        1/4          3/4 + 1/4 = 1

Or simply

x    F(x)
0    1/4
1    3/4
2    1
If we plot it, F(x) will look as in Figure 2.3.1. Note that it is a step function.
Also notice the discontinuities at x = 0, 1 and 2.
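A short sketch of this step function: compute F(x) = Pr(X ≤ x) by summing the mass at all support points xᵢ ≤ x; the printed values show the flat pieces and the jumps.

```python
# The step-function CDF F(x) = Pr(X <= x) for the two-coin example.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def F(x):
    # sum the mass at all support points x_i <= x
    return sum(p for xi, p in pmf.items() if xi <= x)

for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
    print(f"F({x}) = {F(x)}")   # 0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0
```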
2.3.1 Properties of the Distribution Function.
(i) 0 ≤ F(x) ≤ 1. Since F(x) is nothing but a probability, the result follows from
the definition of probability.
(ii) F(x) is a nondecreasing function of x, i.e., if x₁ > x₂, then F(x₁) ≥ F(x₂).
Proof:
F(x₁) = Pr(X ≤ x₁) = Pˣ[(−∞, x₁]] = Pˣ(A₁) (say)
F(x₂) = Pr(X ≤ x₂) = Pˣ[(−∞, x₂]] = Pˣ(A₂) (say)
Since A₂ ⊂ A₁, we have
Pˣ(A₁) ≥ Pˣ(A₂), (why?)
i.e., F(x₁) ≥ F(x₂).
[Figure 2.3.1: plot of the step function F(x), with jumps at x = 0, 1 and 2]
(iii) F(−∞) = 0, where F(−∞) = lim_{n→∞} F(−n).
Proof: Define the event
Aₙ = {ω ∈ Ω : X(ω) ≤ −n}.
Note that P(Aₙ) = Pr(X ≤ −n) = Pˣ[(−∞, −n]] = F(−n). Now lim_{n→∞} Aₙ = ∅.
Therefore,
F(−∞) = lim_{n→∞} F(−n) = lim_{n→∞} P(Aₙ)
      = P(lim_{n→∞} Aₙ) (why?)
      = P(∅) = 0. (why?)
Note: The first (why?) follows from the "continuity" property of P(·). It says:
if {Aₙ} is a monotone sequence of events, then P(lim_{n→∞} Aₙ) = lim_{n→∞} P(Aₙ).
(Try to prove this; see Workout Examples-I, Question 6.)
(iv) F(∞) = 1, where F(∞) = lim_{n→∞} F(n).
The proof is similar to (iii). Define
Aₙ = {ω ∈ Ω : X(ω) ≤ n}.
Then F(∞) = lim_{n→∞} P(Aₙ) = P(lim_{n→∞} Aₙ) = P(Ω) = 1.
(v) For all x, F(x) is continuous to the right, or right continuous. [What this
really means is that F(x+0) = F(x), where F(x+0) = lim_{ε↓0} F(x+ε).]
Proof: Define the set
Aₙ = {ω ∈ Ω : X(ω) ≤ x + 1/n}.
Then
F(x + 1/n) = P(Aₙ)
lim_{n→∞} F(x + 1/n) = lim_{n→∞} Pˣ[(−∞, x + 1/n]] = Pˣ[(−∞, x]] = F(x).
Also,
F(x+0) = lim_{ε↓0} F(x+ε) = lim_{n→∞} F(x + 1/n).
Therefore, F(x+0) = F(x).
We can show that F(x) may not be continuous to the left, i.e., F(x−0) ≠ F(x) in
general, where F(x−0) = lim_{ε↓0} F(x−ε). To prove this, define
Bₙ = {ω ∈ Ω : X(ω) ≤ x − 1/n}.
Then
F(x−0) = lim_{n→∞} F(x − 1/n) = lim_{n→∞} P(Bₙ)
       = P(lim_{n→∞} Bₙ) = P({ω ∈ Ω : X(ω) < x}) = Pr(X < x).
However,
F(x) = Pr(X ≤ x) = Pr(X < x) + Pr(X = x). (why?)
Hence,
F(x) − F(x−0) = Pr(X = x).
Therefore, whenever Pr(X = x) > 0, there will be a jump in F(x) at X = x, i.e., a
discontinuity at X = x. In Figure 2.3.1, we noted the discontinuities at x = 0, 1
and 2. Also note that
Pr(X = 0) = 1/4 > 0
Pr(X = 1) = 1/2 > 0
Pr(X = 2) = 1/4 > 0.
If Pr(X = x) = 0 for all x, then F(x) will be continuous, since in that case
F(x) = F(x+0) = F(x−0) for all x.
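The jump relation F(x) − F(x−0) = Pr(X = x) can be illustrated numerically by approximating F(x−0) with F(x − ε) for a small ε:

```python
# Jump of F at x equals Pr(X = x): approximate F(x-0) by F(x - eps).
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
F = lambda x: sum(p for xi, p in pmf.items() if xi <= x)

eps = 1e-9
for x in (0, 1, 2):
    print(x, F(x) - F(x - eps))   # 0.25, 0.5, 0.25: the point masses
```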
2.4 Probability Mass and Density Functions.
Once we have defined the distribution function, we can talk about the "probability
mass function" (for discrete variables) and the "probability density function"
(for continuous variables).
Let Ω contain a finite (or countably infinite) number of elements. Here by
countably infinite we mean in one-to-one correspondence with the set of positive
integers, N = {1, 2, 3, ...}. To see an example, consider the experiment of tossing a
coin until we get a head. Then Ω = {H, TH, TTH, ...}. If we define X as the number
of trials needed to get a head, then X = 1, 2, 3, .... Denote Ω = {ω₁, ω₂, ω₃, ...}.
Therefore, Ω contains discrete points. For any event A ∈ 𝒜, we define the probability
P(A) = Σ_{ωᵢ ∈ A} P(ωᵢ).
A random variable X constructed on such an Ω will also take discrete values. Let us
now denote the range of X by 𝒳 and the associated probability space by (𝒳, ℬ, Pˣ).
Therefore, we will have a discrete random variable X with discrete probability
distribution Pˣ. Given that
Pˣ(𝒳) = 1,
the total mass will be distributed over a discrete number of points. Therefore,
the probability distribution of X, Pˣ, is sometimes called the probability mass
function (pmf).
Example 2.4.1:
Ω = {(HH), (TT), (HT), (TH)}
X = # heads.
Then

x    Pˣ
0    1/4
1    1/2
2    1/4

i.e., Pˣ(𝒳) = Σ_{i=1}^{3} Pr(X = xᵢ) = 1.
Example 2.4.2:
(i) Toss a coin n times and let X = # heads. Then X takes (n+1) values, namely
X = 0, 1, 2, ..., n. The probability distribution of X, with the corresponding
points in the sample space, can be written as

ω               X       Pˣ
TTTT...TTT      0       (1/2)ⁿ
HTTT...TTT      1       (1/2)ⁿ
THTT...TTT      1       (1/2)ⁿ   (n such outcomes; they add to n(1/2)ⁿ)
...
TTTT...TTH      1       (1/2)ⁿ
HHTT...TTT      2       (1/2)ⁿ
THHT...TTT      2       (1/2)ⁿ
...
TTTT...THH      2       (1/2)ⁿ
...
THHH...HHH      n−1     (1/2)ⁿ
HTHH...HHH      n−1     (1/2)ⁿ   (n such outcomes; they add to n(1/2)ⁿ)
...
HHHH...HHT      n−1     (1/2)ⁿ
HHHH...HHH      n       (1/2)ⁿ

So here Pr(X = 1) = n(1/2)ⁿ, Pr(X = 2) = (n choose 2)(1/2)ⁿ, and so on. Later we
will derive this probability distribution simply as a special case of the binomial
distribution. Check here that if we add Pˣ over all the values of X, the total is equal
to one.
(ii) Let us now consider our earlier example of tossing a coin until we get a head,
and define X = # tosses needed. Then X will take a countably infinite number of
values with the following probability distribution:

ω        x     Pˣ
H        1     1/2
TH       2     (1/2)²
TTH      3     (1/2)³
...     ...     ...

It is easy to check that here the total probability is equal to 1/2 + (1/2)² + (1/2)³ +
⋯ = 1.
(iii) Now suppose X takes n values, (x₁, x₂, ..., xₙ) = {xᵢ, i = 1, 2, ..., n}.
Let Pr(X = xᵢ) = pᵢ, i = 1, 2, ..., n.
The distribution function for this probability mass function is
F(x) = Pr(X ≤ x) = Σ_{xᵢ ≤ x} pᵢ.
Any set of pᵢ's can serve our purpose (see the numerical check after this example).
All we need is to satisfy the following two conditions:
(i) pᵢ ≥ 0 ∀i.
(ii) Σᵢ pᵢ = 1.
As we noted before, when the distribution is discrete there will be jumps in F(x);
therefore, it will not be continuous and hence not differentiable. Now suppose
F(x) is continuous and differentiable except at a few points, and
f(x) = dF(x)/dx,
where f(x) is a continuous function (except at a few points). We will then call X
a continuous random variable with probability density function (p.d.f.) f(x).
Therefore, the relation between f(x) and F(x) can also be written as
F(x) = ∫_{−∞}^{x} f(t) dt.
Recall that F(∞) = 1; therefore,
∫_{−∞}^{∞} f(x) dx = 1.
Also, we noted earlier that F(x) is nondecreasing; therefore, we should have f(x) ≥ 0
∀x. We define f(x) to be a p.d.f. of a continuous random variable X if the
following two conditions are satisfied:
(i) f(x) ≥ 0 ∀x ∈ 𝒳.
(ii) ∫_{−∞}^{∞} f(x) dx = ∫_𝒳 f(x) dx = 1.
Note: Here 𝒳 denotes the range of X.
For a continuous variable X,
Pr(a ≤ X ≤ b) = Pr[X ≤ b] − Pr[X ≤ a]
             = F(b) − F(a)
             = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx
             = ∫_{a}^{b} f(x) dx.
Note that for the discrete case, this probability can be written as
Pr(a ≤ X ≤ b) = Σ_{a ≤ xᵢ ≤ b} Pr(X = xᵢ).
When F is continuous, Pr(X = a) = F(a) − F(a−) = 0. Therefore, for the
continuous case, Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = Pr(a ≤ X < b) = Pr(a <
X < b). [See Figure 2.4.1.]
[Figure 2.4.1]
Example 2.4.3: Let
F(x) = 0 for x < 0,
     = x for x ∈ [0, 1],
     = 1 for x > 1,
as given in Figure 2.4.2.
Here F(x) is "differentiable," therefore we can construct f(x) as
f(x) = 0 for x < 0,
     = 1 for x ∈ [0, 1],
     = 0 for x > 1.
Simply, we can write this as [see Figure 2.4.3]
f(x) = 1 for x ∈ [0, 1], and 0 elsewhere.
Here X is a continuous random variable; however, note the discontinuities of f(x)
at 0 and 1. This distribution is known as the uniform distribution [since for x ∈ [0, 1],
f(x) is constant].
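A quick numerical check of the two p.d.f. conditions and of Pr(a ≤ X ≤ b) = F(b) − F(a) for this uniform distribution. It assumes scipy is available, and the endpoints a = 0.2, b = 0.7 are arbitrary illustration values.

```python
# Uniform distribution on [0, 1]: the density integrates to one, and
# Pr(a <= X <= b) = F(b) - F(a). The 'points' argument tells quad where
# the discontinuities of f are.
from scipy.integrate import quad

f = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
F = lambda x: min(max(x, 0.0), 1.0)

total, _ = quad(f, -1.0, 2.0, points=[0.0, 1.0])
print(total)                       # 1.0
a, b = 0.2, 0.7
area, _ = quad(f, a, b)
print(area, F(b) - F(a))           # 0.5 and 0.5
```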
So far we have talked about variables which are either discrete or continuous. A
random variable, however, could be of mixed type. Let
X = expenditure on cars.
If we assume X is continuous, then Pr(X = 0) = 0. But there will be many
individuals who do not have any expenditure on cars. Suppose half of the people
have no expenditure on cars during a certain period; then it is reasonable
to put Pr(X = 0) = 0.5. Suppose we assume F(x) = 0.5 + 0.5(1 − e⁻ˣ) for x > 0,
and F(x) = 0 for x < 0. The corresponding probability function is [see Figure
2.4.4]
Pr(X < 0) = 0
Pr(X = 0) = 0.5
f(x) = 0.5e⁻ˣ for x > 0.
[Figure 2.4.2: F(x) for the uniform distribution]
[Figure 2.4.3: f(x) for the uniform distribution]
[Figure 2.4.4: the distribution of the mixed random variable X]
Note that here f(x) ≥ 0, and the total probability is
Pr(X = 0) + ∫₀^∞ 0.5e⁻ˣ dx = 0.5 + 0.5 ∫₀^∞ e⁻ˣ dx = 1.0.
Hence, this is a well-defined probability distribution.
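The same check numerically, again assuming scipy/numpy are available: a point mass of 0.5 at zero plus the mass of the continuous part 0.5e⁻ˣ on (0, ∞) should total one.

```python
# Total mass of the mixed distribution: point mass 0.5 at zero plus the
# continuous part 0.5 * exp(-x) on (0, inf).
import numpy as np
from scipy.integrate import quad

continuous_mass, _ = quad(lambda x: 0.5 * np.exp(-x), 0.0, np.inf)
print(0.5 + continuous_mass)       # 1.0
```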
2.5 Conditional Probability Distribution.
Let us consider two events A, B ∈ 𝒜. We are interested in evaluating the probability
of A only for those cases when B also occurs. We will denote this probability
by P(A|B) and will assume P(B) > 0. We can treat B as the (total) sample space.
First note that
P(A|B) = P(A ∩ B|B).
This is true because, when Ω is the sample space,
P(A) = P(A|Ω) = P(A ∩ Ω|Ω).
Here B is our sample space. Also note that P(B|B) = 1. Now,
P(A|B) = P(A ∩ B|B) = P(A ∩ B|Ω)/P(B|Ω) (why?) = P(A ∩ B)/P(B).
We will write this conditional probability simply as P(A|B) = P(A ∩ B)/P(B). This is
called the conditional probability of (event) A given (event) B.
Note: (Above why?) Use the old (counting) definition of probability:
P(A ∩ B|B) = (# cases for A ∩ B)/(# cases for B)
           = (# cases for A ∩ B / # cases in Ω)/(# cases for B / # cases in Ω)
           = P(A ∩ B|Ω)/P(B|Ω).
Example 2.5.1: Let
Ω = {(HH), (TT), (HT), (TH)}
and A = {(HT)}, B = {(HT), (TH)}, so A ∩ B = {(HT)}.
Therefore,
P(A) = 1/4, P(B) = 1/2, P(A ∩ B) = 1/4.
Let us first intuitively find the conditional probabilities. For (A|B), we know
that either (HT) or (TH) has appeared, and we want to find the probability that
(HT) has occurred. Since all the elements of Ω have equal probability, P(A|B) = 1/2.
Similarly, P(B|A) = 1, since (HT) has already occurred. Now let us use the formula
to get the conditional probabilities:
P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 ≠ P(A)
P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/4) = 1 ≠ P(B). (Interpret this result.)
Here the probability of A (or B) changes after it has been given that B (or A) has
appeared. In such a case, we say that the two events A and B are dependent.
Example 2.5.2: Let us continue with the same sample space
Ω = {(HH), (TT), (HT), (TH)},
but now assume A = {(TT), (HT)}, B = {(HT), (TH)}.
We have A ∩ B = {(HT)}.
Therefore,
P(A) = P(B) = 1/2, P(A ∩ B) = 1/4.
P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 = P(A) (Interpret this result.)
P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/2) = 1/2 = P(B).
Therefore, we have P(A|B) = P(A), i.e., P(A ∩ B) = P(A)·P(B).
In this case, we say that A and B are independent.
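Both examples can be reproduced by simple counting, since the four sample points are equally likely. A minimal sketch:

```python
# Conditional probabilities for Examples 2.5.1 and 2.5.2 by counting
# equally likely sample points: P(A|B) = P(A ∩ B)/P(B).
omega = {('H','H'), ('T','T'), ('H','T'), ('T','H')}
P = lambda E: len(E & omega) / len(omega)

def cond(A, B):
    return P(A & B) / P(B)

B = {('H','T'), ('T','H')}
A1 = {('H','T')}                       # Example 2.5.1
print(cond(A1, B), P(A1))              # 0.5 vs 0.25 -> dependent
A2 = {('T','T'), ('H','T')}            # Example 2.5.2
print(cond(A2, B), P(A2))              # 0.5 vs 0.5  -> independent
```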
Result 2.5.1: Conditional probability satisfies the axioms of probability.
Proof:
(i) P(A|B) = P(A ∩ B)/P(B) ≥ 0.
(ii) P(Ω|B) = P(Ω ∩ B)/P(B) = P(B)/P(B) = 1.
(iii) Let A₁, A₂, A₃, ... be a sequence of disjoint events. Then
P(⋃_{i=1}^{∞} Aᵢ | B) = P(⋃_{i=1}^{∞} (Aᵢ ∩ B))/P(B)
                     = Σ_{i=1}^{∞} P(Aᵢ ∩ B)/P(B)
                     = Σ_{i=1}^{∞} P(Aᵢ|B).
Note that the (Aᵢ ∩ B)'s are disjoint, since (Aᵢ ∩ B) ∩ (Aⱼ ∩ B) = Aᵢ ∩ Aⱼ ∩ B = ∅
for i ≠ j.
Note: Conditional distributions are very useful in many practical applications,
such as:
(i) Forecasting: Given data on T periods, X₁, X₂, ..., X_T, if we want to forecast
the value in the (T+1)th period, that could be obtained from the conditional
distribution P(X_{T+1}|X₁, X₂, ..., X_T).
(ii) Duration dependence: We can consider the conditional probability of getting
a job given the duration of unemployment.
(iii) Wage differentials: Wage distributions could be different for unionized and
non-unionized workers.
