Introduction
to Econometrics
Chapters 1 and 2
The statistical analysis of
economic (and related)
data
and
Review of Probability
Copyright 2011 Pearson Addison-Wesley. All rights reserved.
Introduction to Econometrics is the title of the text
What is econometrics?
What is it?
Science (& art!)
Broadly, using theory and statistical methods to analyze
data
What are some uses?
Test theories
Forecast values (e.g., firms' sales, unemployment, stock
prices, path of a hurricane, & much, much more)
Fit mathematical economic models to data
Use data to make numerical policy recommendations in
govt. and business
Brief Overview of the Course
Economics suggests important relationships, often
with policy implications, but virtually never
suggests quantitative magnitudes of causal
effects.
What is the quantitative effect of reducing class size on
student achievement?
How does a bachelor's degree change earnings?
What is the price elasticity of cigarettes?
What is the effect on output growth of a 1 percentage
point increase in interest rates by the Fed?
What is the effect on housing prices of environmental
improvements?
How much does knowing econometrics improve your love
life?
Economic Questions We'll Examine
1. Does reducing class size improve elementary
school education?
2. Is there racial discrimination in the market for
home loans?
3. How much do cigarette taxes reduce smoking?
4. What will be the rate of inflation next year?
(in today's economy, a bigger question might be "What
will be the unemployment rate next year?")
5. How much does knowing econometrics improve
your love life?
This course is about using data to
measure causal effects.
Ideally, we would like an experiment
What would be an experiment to estimate the effect of class
size on standardized test scores?
But almost always we only have observational
(nonexperimental) data.
returns to education
cigarette prices
monetary policy
Most of the course deals with difficulties arising from using
observational data to estimate causal effects
confounding effects (omitted factors)
simultaneous causality
correlation does not imply causation
In this course you will:
Learn methods for estimating causal effects using
observational data;
Learn some tools that can be used for other purposes, for
example, forecasting using time series data;
Focus on applications (theory is used only as needed to
understand the whys of the methods);
Learn to evaluate the regression analysis of others (this
means you will be able to read/understand empirical
economics papers in other econ courses);
Get some hands-on experience with regression analysis in
your problem sets.
Three types of data
Cross-sectional
Different entities, single time period
Time series
Single entity, multiple time periods
Panel
Multiple entities, two or more time periods
Speaking of using observational data. . .
Review of Probability and Statistics
(Chapter 2)
Empirical problem: Class size and educational
output
Policy question: What is the effect on test scores (or
some other outcome measure) of reducing class size by
one student per class? by 8 students/class?
We must use data to find out (is there any way to answer
this without data?)
The California Test Score Data Set (note 1-1)
All K through 8 California school districts (n = 420)
1999
Variables:
5th grade test scores
district-wide mean of reading and math scores for fifth
graders.
Student-teacher ratio (STR)
no. of students in the district divided by no. of full-time
teachers
Initial look at the data: (note 1-2)
(You should already know how to interpret this table)
What does this table tell us about the relationship between test
scores and the STR?
Do districts with smaller classes have
higher test scores?
Scatterplot of test score v. student-teacher ratio
What does this figure show?
We need to get some numerical evidence on whether
districts with low STRs have higher test scores, but how?
1. Compare average test scores in districts with low STRs to
those with high STRs (estimation)
2. Test the null hypothesis that the mean test scores in the
two types of districts are the same, against the
alternative hypothesis that they differ (hypothesis
testing)
3. Estimate an interval for the difference in the mean test
scores, high v. low STR districts (confidence interval)
Initial data analysis: Compare districts with "small" (STR < 20)
and "large" (STR ≥ 20) class sizes: (note 1-3)
1. Estimation of $\Delta$ = difference between group means
2. Test the hypothesis that $\Delta = 0$
3. Construct a confidence interval for $\Delta$

Class Size   Average score ($\bar{Y}$)   Standard deviation ($s_Y$)   n
Small        657.4                       19.4                         238
Large        650.0                       17.9                         182
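The three steps can be sketched numerically from the summary table alone. This is a minimal sketch, not the slides' own calculation: the standard-error formula for a difference of two independent group means and the large-sample normal approximation are assumptions brought in here, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics copied from the table above.
small = {"mean": 657.4, "sd": 19.4, "n": 238}   # districts with STR < 20
large = {"mean": 650.0, "sd": 17.9, "n": 182}   # districts with STR >= 20

# 1. Estimation: difference between the two group means.
delta_hat = small["mean"] - large["mean"]

# Standard error of the difference.
# Assumption: the two groups are independent samples.
se = math.sqrt(small["sd"] ** 2 / small["n"] + large["sd"] ** 2 / large["n"])

# 2. Hypothesis test of Delta = 0.
# Assumption: n is large enough for the normal approximation.
t_stat = delta_hat / se
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# 3. 95% confidence interval for Delta.
z = NormalDist().inv_cdf(0.975)
ci = (delta_hat - z * se, delta_hat + z * se)

print(f"difference = {delta_hat:.1f}, SE = {se:.2f}, t = {t_stat:.2f}")
print(f"p-value = {p_value:.5f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```

Whether the estimated difference matters in a real-world sense is a separate question from whether it is statistically distinguishable from zero.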
1. Estimation (note 1-4)
$\bar{Y}_{small} - \bar{Y}_{large} = \frac{1}{n_{small}} \sum_{i=1}^{n_{small}} Y_i - \frac{1}{n_{large}} \sum_{i=1}^{n_{large}} Y_i$
= 657.4 - 650.0
= 7.4
Is this a large difference in a real-world sense?
Standard deviation across districts = 19.1
Is this a big enough difference to be important for
school reform discussions, for parents, or for a
school committee?
What does this tell us about the population?
Moments (cont.) (note 1-17)
kurtosis = $\frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$
= measure of mass in tails
= measure of probability of large values
kurtosis = 3: normal distribution
kurtosis > 3: heavy tails ("leptokurtic")
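A small simulation can make the kurtosis benchmark concrete. The sketch below replaces the population moments in the definition with sample averages. The Laplace distribution, built as a difference of two exponential draws, is an illustrative heavy-tailed comparison chosen here and does not come from the slides.

```python
import random
from statistics import fmean

def kurtosis(values):
    """E[(Y - mu)^4] / sigma^4, with population moments replaced by sample averages."""
    mu = fmean(values)
    m2 = fmean([(y - mu) ** 2 for y in values])   # second central moment (variance)
    m4 = fmean([(y - mu) ** 4 for y in values])   # fourth central moment
    return m4 / m2 ** 2

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 100_000

normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two independent exponential draws follows a Laplace
# distribution, which has heavier tails than the normal.
laplace_draws = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

print(f"normal sample kurtosis:  {kurtosis(normal_draws):.2f}")
print(f"Laplace sample kurtosis: {kurtosis(laplace_draws):.2f}")
```

The normal sample should land near 3, and the heavy-tailed sample clearly above it.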
Two random variables
Random variables X and Y
Together they have a joint distribution
Each one has a marginal distribution
Each one has a conditional distribution
Joint distribution of two discrete X and Y
Probability that X and Y simultaneously take on certain values, say x
and y.
Pr(X = x, Y = y) or Pr(x, y) or P(X = x, Y = y) or P(x, y)
NOTE lower case symbols x and y denote values and. . .
Upper case symbols X and Y denote random variables
Probabilities of all possible (x, y) combinations sum to what?
Joint distribution (cont.)
After recording data for many commutes
prob. of long, rainy commute = P(X=0,Y=0) = .15
prob. of long, clear commute = P(X=1,Y=0) = ??
prob. of short, rainy commute = P(X=0,Y=1) = ??
prob. of short, clear commute = P(X=1,Y=1) = ??
These four outcomes are mutually exclusive and exhaust all possibilities
So, they must sum to ??
Marginal distribution
Marginal distribution is P(X=x) or P(Y=y)
Sum of joint probabilities:
$P(Y = y) = \sum_{i=1}^{L} P(X = x_i, Y = y)$
prob. of long commute = P(X=0,Y=0) + P(X=1,Y=0) = .15 + .07 = .22
prob. of short commute = ??
prob. of rainy commute = ??
prob. of clear commute = ??
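Summing joint probabilities to get a marginal distribution can be sketched in a few lines. The joint table below is hypothetical, with made-up numbers that are deliberately not the commute probabilities from the slides. The function name `marginal` is likewise illustrative.

```python
from collections import defaultdict

# Hypothetical joint distribution Pr(X = x, Y = y); keys are (x, y) pairs.
# Illustrative numbers only, not the commute example's probabilities.
joint = {
    (0, 0): 0.10, (1, 0): 0.20,
    (0, 1): 0.30, (1, 1): 0.40,
}

def marginal(joint_probs, axis):
    """Sum joint probabilities over the other variable (axis 0 gives X, axis 1 gives Y)."""
    totals = defaultdict(float)
    for outcome, prob in joint_probs.items():
        totals[outcome[axis]] += prob
    return dict(totals)

p_x = marginal(joint, 0)   # P(X = x) = sum over y of P(X = x, Y = y)
p_y = marginal(joint, 1)   # P(Y = y) = sum over x of P(X = x, Y = y)

print("P(X = x):", p_x)
print("P(Y = y):", p_y)
```

The same function can be applied to the commute table once its four joint probabilities are filled in.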
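$E(\bar{Y}) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) = \frac{1}{n} \sum_{i=1}^{n} \mu_Y = \mu_Y$
$\mathrm{var}(\bar{Y}) = E[\bar{Y} - \mu_Y]^2 = E\left[\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right) - \mu_Y\right]^2 = E\left[\frac{1}{n} \sum_{i=1}^{n} (Y_i - \mu_Y)\right]^2$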
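so var($\bar{Y}$) = $E\left[\frac{1}{n} \sum_{i=1}^{n} (Y_i - \mu_Y)\right]^2$
= $E\left\{\left[\frac{1}{n} \sum_{i=1}^{n} (Y_i - \mu_Y)\right] \times \left[\frac{1}{n} \sum_{j=1}^{n} (Y_j - \mu_Y)\right]\right\}$
= $\frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E[(Y_i - \mu_Y)(Y_j - \mu_Y)]$
= $\frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{cov}(Y_i, Y_j)$
= $\frac{1}{n^2} \sum_{i=1}^{n} \sigma_Y^2 = \frac{1}{n^2} \, n \, \sigma_Y^2$
[note: for i.i.d. draws, cov($Y_i$, $Y_j$) = 0 when i ≠ j, and cov($Y_i$, $Y_i$) = var($Y_i$) = $\sigma_Y^2$]
= $\frac{\sigma_Y^2}{n}$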
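The results E($\bar{Y}$) = $\mu_Y$ and var($\bar{Y}$) = $\sigma_Y^2/n$ can be checked by simulation. This is a rough sketch under assumed settings: Bernoulli draws with p = 0.78 (the value used in the slides' Fig. 2.8 example), with the number of replications and the seed picked arbitrarily.

```python
import random
from statistics import fmean, pvariance

def simulate_sample_means(n, reps, p, rng):
    """Return `reps` sample means, each computed from n i.i.d. Bernoulli(p) draws."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(42)   # fixed seed so the run is repeatable
p, reps = 0.78, 10_000
sigma2 = p * (1 - p)      # population variance of a Bernoulli(p) variable

results = {}
for n in (25, 100):
    means = simulate_sample_means(n, reps, p, rng)
    # (simulated mean of Ybar, simulated variance of Ybar, theoretical sigma^2 / n)
    results[n] = (fmean(means), pvariance(means), sigma2 / n)
    sim_mean, sim_var, theory_var = results[n]
    print(f"n = {n:3d}: mean of Ybar = {sim_mean:.4f}, "
          f"var of Ybar = {sim_var:.6f}, sigma^2/n = {theory_var:.6f}")
```

Quadrupling n from 25 to 100 should cut the simulated variance to roughly a quarter, which halves the standard deviation.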
Mean and variance of sampling
distribution of $\bar{Y}$ (cont.) (note 1-32)
E($\bar{Y}$) = $\mu_Y$
var($\bar{Y}$) = $\frac{\sigma_Y^2}{n}$
Implications:
1. $\bar{Y}$ is an unbiased estimator of $\mu_Y$ (that is, E($\bar{Y}$) = $\mu_Y$)
2. var($\bar{Y}$) is inversely proportional to n
1. the spread (standard deviation) of the sampling
distribution is proportional to $1/\sqrt{n}$
2. Thus the sampling uncertainty associated with $\bar{Y}$
is proportional to $1/\sqrt{n}$ (larger samples, less
uncertainty, but square-root law)
The sampling distribution of $\bar{Y}$ when n is
large (note 1-33)
For small sample sizes, the distribution of $\bar{Y}$ will
usually be complicated (unless. . . what is true about
the distribution of the $Y_i$ values in the population?)
But if n is large, the sampling distribution is simple!
1. As n increases, the distribution of $\bar{Y}$ becomes more
tightly centered around $\mu_Y$ (the Law of Large Numbers)
2. Moreover, the distributions of both (1) $\bar{Y}$ and (2) $\bar{Y} - \mu_Y$
become normal (the Central Limit Theorem)
The Law of Large Numbers: (note 1-34)
An estimator is consistent if the probability that it falls within
an interval of the true population value tends to one as the
sample size increases.
If ($Y_1, \ldots, Y_n$) are i.i.d. and $\sigma_Y^2 < \infty$, then $\bar{Y}$ is a consistent
estimator of $\mu_Y$, that is,
$\Pr[|\bar{Y} - \mu_Y| < \varepsilon] \to 1$ as $n \to \infty$
which can be written, $\bar{Y} \xrightarrow{p} \mu_Y$
("$\bar{Y} \xrightarrow{p} \mu_Y$" means "$\bar{Y}$ converges in probability to $\mu_Y$").
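The consistency statement can be illustrated by estimating Pr[|$\bar{Y}$ - $\mu_Y$| < $\varepsilon$] for growing n. This is a Monte Carlo sketch under assumed settings (Bernoulli draws with p = 0.78, $\varepsilon$ = 0.05, 2000 replications, arbitrary seed), and none of these values come from this slide.

```python
import random

def prob_within(n, eps, p, reps, rng):
    """Monte Carlo estimate of Pr[|Ybar - mu_Y| < eps] for n i.i.d. Bernoulli(p) draws."""
    hits = 0
    for _ in range(reps):
        ybar = sum(rng.random() < p for _ in range(n)) / n
        if abs(ybar - p) < eps:   # mu_Y equals p for a Bernoulli(p) variable
            hits += 1
    return hits / reps

rng = random.Random(1)   # fixed seed so the run is repeatable
p, eps, reps = 0.78, 0.05, 2000

estimates = {n: prob_within(n, eps, p, reps, rng) for n in (10, 100, 1000)}
for n, prob in estimates.items():
    print(f"n = {n:4d}: Pr[|Ybar - mu_Y| < {eps}] is about {prob:.3f}")
```

The estimated probability should rise toward one as n grows, which is what convergence in probability asserts.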
The Central Limit Theorem (CLT): (note 1-35)
If ($Y_1, \ldots, Y_n$) are i.i.d. and $0 < \sigma_Y^2 < \infty$, then when n is
large the distribution of $\bar{Y}$ is well approximated by
a normal distribution.
$\bar{Y}$ is approximately distributed $N(\mu_Y, \sigma_Y^2/n)$ ("normal
distribution with mean $\mu_Y$ and variance $\sigma_Y^2/n$") AND. . .
$\sqrt{n}(\bar{Y} - \mu_Y)/\sigma_Y$ is approximately distributed N(0,1)
(standard normal)
That is, "standardized" $\bar{Y}$ = $\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{var}(\bar{Y})}}$ = $\frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}}$ is
approximately distributed as N(0,1)
VIP: The larger is n, the better is the
approximation.
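The standardization can be tried out by simulation. The sketch below assumes Bernoulli draws with p = 0.78 and n = 100 (matching the slides' Fig. 2.8 example), with 10,000 replications and an arbitrary seed. It standardizes each simulated $\bar{Y}$ and compares the result with N(0,1) benchmarks.

```python
import math
import random
from statistics import fmean, pvariance

rng = random.Random(7)   # fixed seed so the run is repeatable
p, n, reps = 0.78, 100, 10_000

mu_y = p                            # mean of a Bernoulli(p) variable
sigma_y = math.sqrt(p * (1 - p))    # its standard deviation

z_values = []
for _ in range(reps):
    ybar = sum(rng.random() < p for _ in range(n)) / n
    # Standardized sample mean: (Ybar - mu_Y) / (sigma_Y / sqrt(n))
    z_values.append((ybar - mu_y) / (sigma_y / math.sqrt(n)))

# Under N(0,1) the share within +/-1.96 would be about 0.95.
share_within = sum(abs(z) < 1.96 for z in z_values) / reps

print(f"mean of z     = {fmean(z_values):.3f}  (N(0,1) benchmark: 0)")
print(f"variance of z = {pvariance(z_values):.3f}  (N(0,1) benchmark: 1)")
print(f"share with |z| < 1.96 = {share_within:.3f}  (N(0,1) benchmark: about 0.95)")
```

Because $\bar{Y}$ is discrete here, the share is only approximately 0.95. Rerunning with a larger n should tighten the match.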
Fig. 2.8 Sampling distribution of $\bar{Y}$ when Y is
Bernoulli, p = 0.78 (n = 2, 5, 25, 100)
Fig. 2.8 Sampling distribution of $\bar{Y}$ (cont.)
(note 1-36)
In the figure on the previous slide (Fig. 2.8), when
n = 100, it might not be easy to see that
the distribution of $\bar{Y}$ is approximately normal.
It's easier to see this if we examine the
distribution of "standardized" $\bar{Y}$ = $\frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}}$
See next slide
Same example: sampling distribution of $\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{var}(\bar{Y})}}$
(n = 2, 5, 25, 100) (Fig. 2.9 in book)
Summary: The Sampling Distribution of $\bar{Y}$
For $Y_1, \ldots, Y_n$ i.i.d. with $0 < \sigma_Y^2 < \infty$,
The exact (finite sample) sampling distribution of $\bar{Y}$ has mean
$\mu_Y$ ("$\bar{Y}$ is an unbiased estimator of $\mu_Y$") and variance $\sigma_Y^2/n$
Other than its mean and variance, the exact distribution of $\bar{Y}$ is
complicated and depends on the distribution of Y (the
population distribution)
When n is large, the sampling distribution simplifies:
$\bar{Y} \xrightarrow{p} \mu_Y$ (Law of large numbers)
$\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{var}(\bar{Y})}} = \frac{\bar{Y} - \mu_Y}{\sqrt{\mathrm{var}(\bar{Y})}}$ is approximately N(0,1) (CLT)