
Introduction to Econometrics

Chapter 3: Review of Statistics

Copyright 2011 Pearson Addison-Wesley. All rights reserved.


Remember we said econometrics =
disciplined data analysis + statistical inference

Three Main Types of Statistical Methods

Estimation
- Computing the best guess for an unknown characteristic of a population distribution
- Often a function of a sample of data randomly drawn from the population
- e.g. the mean or variance

Hypothesis Testing
- Formulating a hypothesis about a population, then using data to learn/infer whether it is true

Confidence Intervals
- Using a set of data to estimate an interval, or range, for an unknown population characteristic

Review of Probability and Statistics
(SW Chapters 2, 3)

Empirical problem: Class size and educational output

Policy question: Does class size have a significant impact on educational outcomes?
- What is the effect of reducing class size by one student per class on test scores?
- We must use data to find out. (Is there any way to answer this without data?)

The California Test Score Data Set

K-6 and K-8 California school districts (n = 420)

Variables:
5th grade test scores (Stanford-9 achievement test,
combined math and reading), district average
Student-teacher ratio (STR) = number of students in the district divided by the number of full-time-equivalent teachers

Initial look at the data:

This table doesn't tell us anything about the relationship between test scores and the STR.

Do districts with smaller classes have
higher test scores?

Scatterplot of test score v. student-teacher ratio

We need to get some numerical evidence on whether districts with low STRs have higher test scores, but how?

1. Compare average test scores in districts with low STRs to those with high STRs ("estimation")

1. Estimation

$$\bar{Y}_{GroupA} - \bar{Y}_{GroupB} = \frac{1}{n_{GroupA}} \sum_{i=1}^{n_{GroupA}} Y_i \;-\; \frac{1}{n_{GroupB}} \sum_{i=1}^{n_{GroupB}} Y_i$$

Two main questions:

1) Is this a large difference in a statistical sense?
   i.e. Can we say with confidence that the difference is not explained by sampling variation?

2) Is this a large difference in a real-world sense?
   i.e. Is the difference economically significant?
BUT... Why Do We Use $\bar{Y}$ To Estimate $\mu_Y$?

- We want the estimator to get as close as possible to the unknown true value
- We want its sampling distribution to be as tightly centered on the unknown value as possible

Desirable characteristics of an estimator:

- 1. Unbiasedness: if you took many samples, then on average the estimator would give the right answer
- Recall: Central Limit Theorem

Desirable characteristics of an estimator (continued):

- 2. Consistency: the probability that the estimator is within a small interval of the true value approaches 1 as the sample size increases
- i.e. When n is large, uncertainty about the estimator due to random variation in the sample is small
- Recall: Law of Large Numbers

- 3. Variance and efficiency
- Assuming multiple estimators are consistent and unbiased... choose the one with the tightest sampling distribution
- i.e. Minimum variance
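A minimal simulation sketch (not from the slides; the population parameters below are made up) illustrating unbiasedness and consistency of the sample mean:

```python
# Draw many random samples and look at the distribution of the sample mean:
# its average stays at the true mean (unbiasedness) and its spread shrinks
# as n grows (consistency). All numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma_true = 5.0, 2.0   # hypothetical population mean and std dev
n_reps = 10_000                  # number of repeated samples

for n in (10, 100, 1000):
    means = rng.normal(mu_true, sigma_true, size=(n_reps, n)).mean(axis=1)
    print(f"n={n:5d}  avg of sample means={means.mean():.3f}  "
          f"std of sample means={means.std():.3f}")
```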

Your old buddy... the Central Limit Theorem (CLT):

If $(Y_1, \ldots, Y_n)$ are i.i.d. and $0 < \sigma_Y^2 < \infty$, then when $n$ is large the distribution of the sample mean is well approximated by a normal distribution.

- $\bar{Y}$ is approximately distributed $N(\mu_Y, \, \sigma_Y^2 / n)$, i.e. normal with mean $\mu_Y$ and variance $\sigma_Y^2 / n$
- $(\bar{Y} - \mu_Y) / \sigma_{\bar{Y}}$ is approximately distributed $N(0, 1)$ (standard normal)
- As the sample size grows, the variance of $\bar{Y}$ shrinks
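A quick simulation sketch of the CLT (the skewed Exponential(1) population here is an assumption chosen for illustration): standardized sample means behave approximately like N(0,1) draws when n is large.

```python
# Standardize sample means from a non-normal population and compare their
# quantiles with the standard normal. The population choice is hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0          # mean and std dev of an Exponential(1) population
n, n_reps = 200, 5_000

samples = rng.exponential(scale=1.0, size=(n_reps, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))

for q in (0.025, 0.5, 0.975):
    print(f"q={q}: empirical={np.quantile(z, q):+.2f}  N(0,1)={stats.norm.ppf(q):+.2f}")
```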

$\bar{Y}$ as the Least Squares Estimator of $\mu_Y$

Consider the problem of finding the estimator $m$ that minimizes the sum of squared errors:

$$\sum_{i=1}^{n} (Y_i - m)^2$$

You don't have to know the proof, but it is:

$$\frac{d}{dm} \sum_{i=1}^{n} (Y_i - m)^2 = -2 \sum_{i=1}^{n} (Y_i - m) = -2 \sum_{i=1}^{n} Y_i + 2nm$$

Set this equal to zero:

$$-2 \sum_{i=1}^{n} Y_i + 2nm = 0 \;\Rightarrow\; m = \frac{1}{n} \sum_{i=1}^{n} Y_i$$
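A quick numerical check of this derivation (the data vector is made up for illustration): over a grid of candidate values m, the sum of squared errors is smallest at the sample mean.

```python
# Evaluate the sum of squared errors for many candidate values of m and
# confirm the minimizer coincides with the sample mean. Data are hypothetical.
import numpy as np

y = np.array([3.0, 7.0, 4.5, 6.0, 5.5])
m_grid = np.linspace(y.min(), y.max(), 1001)
sse = ((y[:, None] - m_grid[None, :]) ** 2).sum(axis=0)

print("m minimizing SSE:", m_grid[sse.argmin()])
print("sample mean     :", y.mean())
```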
Initial data analysis: Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes:

Class Size    Average score ($\bar{Y}$)    Standard deviation ($s_Y$)    n
Small         657.4                        19.4                          238
Large         650.0                        17.9                          182

1. Estimation of $\Delta$ = difference between group means

1. Estimation: Is this difference large?

$$\bar{Y}_{small} - \bar{Y}_{large} = \frac{1}{n_{small}} \sum_{i=1}^{n_{small}} Y_i - \frac{1}{n_{large}} \sum_{i=1}^{n_{large}} Y_i = 657.4 - 650.0 = 7.4$$

- The average score for small-class districts is about 1.1% higher than for large-class districts
- The difference between the 60th and 75th percentiles of the test score distribution is 667.6 - 659.4 = 8.2
- Variation within groups is much larger:
  - Std dev across small-class districts = 19.4
  - Std dev across large-class districts = 17.9
OK, so you find that the average test score for students in smaller classes is higher than the average test score for students in larger classes. We can thus conclude that smaller classes lead to higher test scores.

A. True
B. False

2. Hypothesis testing

The hypothesis testing problem (for the mean):

- Based on the sample data, test whether a null hypothesis is true... or instead whether some alternative hypothesis is true.

$H_0: E(Y) = \mu_{Y,0}$ vs. $H_1: E(Y) > \mu_{Y,0}$ (1-sided, alternative is >)

$H_0: E(Y) = \mu_{Y,0}$ vs. $H_1: E(Y) < \mu_{Y,0}$ (1-sided, alternative is <)

$H_0: E(Y) = \mu_{Y,0}$ vs. $H_1: E(Y) \neq \mu_{Y,0}$ (2-sided, alternative is > or <)

Hypothesis testing terminology:

Type I error is the incorrect rejection of a true null hypothesis
- a "false positive"
- We reject the null hypothesis when it is actually true

Type II error is incorrectly retaining a false null hypothesis
- a "false negative"
- We accept the null hypothesis when it is actually false

More hypothesis testing terminology:

- The significance level of a test is a pre-specified probability of incorrectly rejecting the null when the null is true
  - i.e. a pre-specified allowance for the probability of a Type I error
- The critical value of a test statistic is the value of the test statistic needed to reject the null at a given significance level
- The power of the test is the probability that the test correctly rejects the null when it is false
  - As power increases, the probability of a Type II error decreases

Comments on the Student t distribution

If the sample size is moderate (several dozen) or large (hundreds or more), the difference between the t-distribution and N(0,1) critical values is negligible. Here are some 5% critical values for 2-sided tests:

degrees of freedom (n - 1)    5% t-distribution critical value
10                            2.23
20                            2.09
30                            2.04
60                            2.00
∞                             1.96
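These critical values can be reproduced with scipy, as a sketch (scipy.stats.t.ppf is the quantile function of the t-distribution):

```python
# Two-sided 5% critical values: the 97.5th percentile of the t-distribution
# for each degrees-of-freedom value, plus the N(0,1) limit.
from scipy import stats

for df in (10, 20, 30, 60):
    print(f"df={df:3d}  critical value={stats.t.ppf(0.975, df):.2f}")
print(f"N(0,1)   critical value={stats.norm.ppf(0.975):.2f}")
```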

Difference-in-means: unpooled variance

$$t = \frac{\bar{Y}_A - \bar{Y}_B - d_0}{SE(\bar{Y}_A - \bar{Y}_B)} = \frac{\bar{Y}_A - \bar{Y}_B - d_0}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$

- where $d_0$ = hypothesized difference between group means (often 0)
- where $SE(\bar{Y}_A - \bar{Y}_B)$ is the standard error of $(\bar{Y}_A - \bar{Y}_B)$; the subscripts A and B refer to the two groups of interest
- where $s^2$ = sample variance for each group
- "unpooled" means we don't assume the two groups have the same variance
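A minimal sketch of this unpooled (Welch) t-statistic computed by hand and cross-checked with scipy; the two group samples below are hypothetical placeholders:

```python
# Unpooled difference-in-means t-statistic; scipy's ttest_ind with
# equal_var=False uses the same standard error. Groups are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y_a = rng.normal(657, 19, size=238)   # hypothetical group A scores
y_b = rng.normal(650, 18, size=182)   # hypothetical group B scores

d0 = 0.0
se = np.sqrt(y_a.var(ddof=1) / len(y_a) + y_b.var(ddof=1) / len(y_b))
t_by_hand = (y_a.mean() - y_b.mean() - d0) / se

t_scipy, p_scipy = stats.ttest_ind(y_a, y_b, equal_var=False)
print(t_by_hand, t_scipy, p_scipy)
```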
Difference-in-means: pooled variance

$$t = \frac{\bar{Y}_A - \bar{Y}_B - d_0}{SE(\bar{Y}_A - \bar{Y}_B)} = \frac{\bar{Y}_A - \bar{Y}_B - d_0}{S(Y_i)\sqrt{\dfrac{1}{n_A} + \dfrac{1}{n_B}}}$$

- where $d_0$ = hypothesized difference between group means (often 0)
- where $SE(\bar{Y}_A - \bar{Y}_B)$ is the standard error of $(\bar{Y}_A - \bar{Y}_B)$
- where $S(Y_i)$ = sample standard deviation when both groups are pooled together (this formula is a bit involved...)
- "pooled" means we assume the two groups have the same variance
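The pooled counterpart as a sketch, under the equal-variance assumption; it matches scipy's ttest_ind with equal_var=True (its default). The samples are again hypothetical:

```python
# Pooled difference-in-means t-statistic: combine the two sample variances
# into one pooled standard deviation, then form the standard error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y_a = rng.normal(657, 19, size=238)   # hypothetical group A
y_b = rng.normal(650, 19, size=182)   # hypothetical group B

n_a, n_b = len(y_a), len(y_b)
s_pooled = np.sqrt(((n_a - 1) * y_a.var(ddof=1) + (n_b - 1) * y_b.var(ddof=1))
                   / (n_a + n_b - 2))
t_by_hand = (y_a.mean() - y_b.mean()) / (s_pooled * np.sqrt(1 / n_a + 1 / n_b))

t_scipy, _ = stats.ttest_ind(y_a, y_b, equal_var=True)
print(t_by_hand, t_scipy)
```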
We need to get some numerical evidence on whether districts with low STRs have higher test scores, but how?

1. Compare average test scores in districts with low STRs to those with high STRs ("estimation")

2. Test the null hypothesis that the mean test scores in the two types of districts are the same, against the alternative hypothesis that they differ ("hypothesis testing")

Initial data analysis: Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes:

Class Size    Average score ($\bar{Y}$)    Standard deviation ($s_Y$)    n
Small         657.4                        19.4                          238
Large         650.0                        17.9                          182

1. Estimation of $\Delta$ = difference between group means

2. Test the hypothesis that $\Delta$ = 0

Ceteris paribus, which of the following is most accurate?

A. As the difference in average test scores between large and small classrooms increases, you are less likely to reject the null
B. The difference in average test scores between large and small classrooms has no impact on your likelihood of accepting or rejecting the null
C. As variance in large classroom test scores falls, you are less likely to reject the null
D. As variance in large classroom test scores falls, you are more likely to reject the null

Compute the difference-of-means t-statistic:

Size     $\bar{Y}$    $s_Y$    n
small    657.4        19.4     238
large    650.0        17.9     182

$$t = \frac{\bar{Y}_s - \bar{Y}_l - d_0}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{657.4 - 650.0 - 0}{\sqrt{\dfrac{19.4^2}{238} + \dfrac{17.9^2}{182}}} = \frac{7.4}{1.83} = 4.05$$

|t| > 1.96, so reject (at the 5% significance level) the null hypothesis that the two means are the same.
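This arithmetic can be checked directly from the summary statistics on the slide:

```python
# Recompute the difference-of-means t-statistic from the reported group
# means, standard deviations, and sample sizes.
import math

y_s, s_s, n_s = 657.4, 19.4, 238   # small-class districts
y_l, s_l, n_l = 650.0, 17.9, 182   # large-class districts

se = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)
t = (y_s - y_l - 0.0) / se
print(f"SE = {se:.2f}, t = {t:.2f}")   # SE is about 1.83, t is about 4.05
```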

OK so you find that the calculated |t| is greater
than 1.96. So you conclude that class size is
economically significant with respect to test scores.

A. True
B. False

More hypothesis testing terminology:

The p-value = the probability of drawing a statistic (e.g. $\bar{Y}$) at least as adverse to the null as the value actually computed, assuming the null hypothesis is true.
- i.e. the probability that the difference between some random sample mean and the hypothesized population mean is GREATER than the difference between the actual sample mean we observe and the hypothesized population mean

Calculating the p-value based on $\bar{Y}$:

$$\text{p-value} = \Pr_{H_0}\left[\, |\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}| \,\right]$$

where $\bar{Y}^{act}$ is the value of $\bar{Y}$ actually observed.


Calculating the p-value with $\sigma_Y$ known:

- For large n, the p-value = the probability that a N(0,1) random variable falls outside $|(\bar{Y}^{act} - \mu_{Y,0}) / \sigma_{\bar{Y}}|$
- In practice, $\sigma_Y$ is unknown, so it must be estimated
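As a sketch, the large-sample two-sided p-value for the class-size t-statistic computed earlier (t ≈ 4.05), using the standard normal approximation:

```python
# Two-sided p-value under the N(0,1) approximation: the probability that a
# standard normal draw exceeds |t| in absolute value.
from scipy import stats

t_act = 4.05
p_value = 2 * (1 - stats.norm.cdf(abs(t_act)))
print(f"p-value = {p_value:.5f}")   # far below 0.05, so reject at the 5% level
```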

What is the link between the p-value and the significance level?

- The significance level is pre-specified. For example, if the pre-specified significance level is 5%, you reject the null hypothesis if |t| ≥ 1.96. Equivalently, you reject if p ≤ 0.05.
- The p-value is sometimes called the marginal significance level.
- Often, it is better to communicate the p-value than simply whether a test rejects or not: the p-value contains more information than the yes/no statement about whether the test rejects.

Student's t distribution for small samples

- If $Y_i$, i = 1, ..., n is i.i.d. $N(\mu_Y, \sigma_Y^2)$, then the t-statistic has the Student t distribution with n - 1 degrees of freedom.
- The critical values of the Student t distribution are tabulated in the back of all statistics books. Remember the recipe?
  1. Compute the t-statistic
  2. Compute the degrees of freedom, which is n - 1
  3. Look up the 5% critical value
  4. If the t-statistic exceeds (in absolute value) this critical value, reject the null hypothesis.
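A sketch following this recipe for a one-sample test of H0: μ = μ0; the small sample and the value of μ0 below are hypothetical:

```python
# Steps 1-4 of the recipe: t-statistic, degrees of freedom, 5% two-sided
# critical value from the t distribution, and the reject decision.
import numpy as np
from scipy import stats

y = np.array([62.0, 71.0, 58.0, 66.0, 74.0, 69.0, 63.0])   # hypothetical sample
mu0 = 60.0                                                  # hypothetical null value

t = (y.mean() - mu0) / (y.std(ddof=1) / np.sqrt(len(y)))   # step 1
df = len(y) - 1                                            # step 2
crit = stats.t.ppf(0.975, df)                              # step 3
print(f"t = {t:.2f}, critical value = {crit:.2f}, reject = {abs(t) > crit}")  # step 4
```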

Student's t distribution for small samples

- So the Student t distribution is highly relevant when the sample size is very small;
- BUT, for it to be correct, you must be sure that the population distribution of Y is normal.
- In economic data, the normality assumption is rarely credible, e.g.:
  - Earnings
  - Financial returns

3. Confidence Interval

- An X% confidence interval for $\mu_Y$ is an interval that contains the true value of $\mu_Y$ in X% of repeated samples.
- Digression: what is random here? The values of $Y_1, \ldots, Y_n$, and thus any functions of them, including the confidence interval.
- The confidence interval will differ from one sample to the next. The population parameter, $\mu_Y$, is not random; we just don't know it.

95% Confidence Interval

A 95% confidence interval for the difference between the means is:

$$(\bar{Y}_A - \bar{Y}_B) \pm 1.96 \, SE(\bar{Y}_A - \bar{Y}_B)$$

Two equivalent statements:
1) The 95% confidence interval for $\Delta$ doesn't include 0;
2) The hypothesis that $\Delta$ = 0 is rejected at the 5% level.

90% and 99% Confidence Intervals

A 90% confidence interval for the difference between the means is:

$$(\bar{Y}_A - \bar{Y}_B) \pm 1.64 \, SE(\bar{Y}_A - \bar{Y}_B)$$

A 99% confidence interval for the difference between the means is:

$$(\bar{Y}_A - \bar{Y}_B) \pm 2.58 \, SE(\bar{Y}_A - \bar{Y}_B)$$
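As a sketch, these intervals for the class-size difference in means, built from the slide's summary statistics:

```python
# Confidence intervals for the difference in mean test scores
# (small minus large class-size districts) at the 90%, 95%, and 99% levels.
import math

diff = 657.4 - 650.0
se = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)   # about 1.83

for label, z in (("90%", 1.64), ("95%", 1.96), ("99%", 2.58)):
    print(f"{label} CI: ({diff - z * se:.1f}, {diff + z * se:.1f})")
```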

We need to get some numerical evidence on whether districts with low STRs have higher test scores, but how?

1. Compare average test scores in districts with low STRs to those with high STRs ("estimation")

2. Test the null hypothesis that the mean test scores in the two types of districts are the same, against the alternative hypothesis that they differ ("hypothesis testing")

3. Estimate an interval for the difference in the mean test scores, high vs. low STR districts ("confidence interval")
Initial data analysis: Compare districts with small (STR < 20) and large (STR ≥ 20) class sizes:

Class Size    Average score ($\bar{Y}$)    Standard deviation ($s_Y$)    n
Small         657.4                        19.4                          238
Large         650.0                        17.9                          182

1. Estimation of $\Delta$ = difference between group means

2. Test the hypothesis that $\Delta$ = 0

3. Construct a confidence interval for $\Delta$

Summary:

From the two assumptions of:
1. simple random sampling of a population, that is, {$Y_i$, i = 1, ..., n} are i.i.d.
2. 0 < E($Y^4$) < ∞

we developed, for large samples (large n):
- Theory of estimation (sampling distribution of $\bar{Y}$)
- Theory of hypothesis testing (large-n distribution of the t-statistic and computation of the p-value)
- Theory of confidence intervals (constructed by inverting the test statistic)

Are assumptions (1) & (2) plausible in practice? Yes.

Let's go back to the original policy question:

What is the effect on test scores of reducing STR by one student/class?

Have we answered this question?

Excel: We want to know whether studying results in higher exam grades.

1. Compare average test scores for students with low Hours Studied to those with high Hours Studied ("estimation")

2. Test the null hypothesis that the mean test scores in the two student groups are the same, against the alternative hypothesis that they differ ("hypothesis testing")

3. Estimate an interval for the difference in the mean test scores, high vs. low Hours Studied ("confidence interval")
OK so you conduct a hypothesis test and find
that the average exam grade for students who
study a lot is statistically significantly higher
than the average exam grade for students who
study a little. We can thus conclude that
studying more leads to higher exam grades.

A. True
B. False

Internal validity: the statistical inferences about causal effects are valid for the population being studied.

- Often complicated by omitted variables bias and/or selection bias
- Omitted variables bias means there are variables missing that are predictive of Y and correlated with X (Chapter 6)
- Selection bias occurs when there are inherent differences between groups, which are predictive of the outcome, that have not been held constant
Determining causality

- Selection bias can be thought of as an omitted variables bias... just with individuals opting into the groups
- We have selection bias when the average outcome without intervention differs between groups
- Difference in group means = Average causal effect + Selection bias
Determining causality

- To the extent that the selection bias can be fully explained by observable variables, we can use regression analysis.
- What variables might we start with?
- Will this get us fully to ceteris paribus?