
Introduction

to Econometrics
James H. Stock
HARVARD UNIVERSITY

Mark W. Watson
PRINCETON UNIVERSITY

Boston  San Francisco  New York
London  Toronto  Sydney  Tokyo  Singapore  Madrid
Mexico City  Munich  Paris  Cape Town  Hong Kong  Montreal

Brief Contents

PART ONE    Introduction and Review
CHAPTER 1   Economic Questions and Data
CHAPTER 2   Review of Probability   17
CHAPTER 3   Review of Statistics   65

PART TWO    Fundamentals of Regression Analysis   109
CHAPTER 4   Linear Regression with One Regressor   111
CHAPTER 5   Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals   148
CHAPTER 6   Linear Regression with Multiple Regressors   186
CHAPTER 7   Hypothesis Tests and Confidence Intervals in Multiple Regression   220
CHAPTER 8   Nonlinear Regression Functions   254
CHAPTER 9   Assessing Studies Based on Multiple Regression   312

PART THREE  Further Topics in Regression Analysis   347
CHAPTER 10  Regression with Panel Data   349
CHAPTER 11  Regression with a Binary Dependent Variable   383
CHAPTER 12  Instrumental Variables Regression   421
CHAPTER 13  Experiments and Quasi-Experiments   468

PART FOUR   Regression Analysis of Economic Time Series Data   523
CHAPTER 14  Introduction to Time Series Regression and Forecasting   525
CHAPTER 15  Estimation of Dynamic Causal Effects   591
CHAPTER 16  Additional Topics in Time Series Regression   637

PART FIVE   The Econometric Theory of Regression Analysis   675
CHAPTER 17  The Theory of Linear Regression with One Regressor   677
CHAPTER 18  The Theory of Multiple Regression   704

Contents

Preface   xxvii

PART ONE  Introduction and Review

CHAPTER 1  Economic Questions and Data

1.1  Economic Questions We Examine   4
    Question #1: Does Reducing Class Size Improve Elementary School Education?   4
    Question #2: Is There Racial Discrimination in the Market for Home Loans?
    Question #3: How Much Do Cigarette Taxes Reduce Smoking?   5
    Question #4: What Will the Rate of Inflation Be Next Year?   6
    Quantitative Questions, Quantitative Answers   7

1.2  Causal Effects and Idealized Experiments
    Estimation of Causal Effects   8
    Forecasting and Causality   9

1.3  Data: Sources and Types   10
    Experimental versus Observational Data   10
    Cross-Sectional Data   11
    Time Series Data   11
    Panel Data   13
CHAPTER 2  Review of Probability   17

2.1  Random Variables and Probability Distributions   18
    Probabilities, the Sample Space, and Random Variables   18
    Probability Distribution of a Discrete Random Variable   19
    Probability Distribution of a Continuous Random Variable   21

2.2  Expected Values, Mean, and Variance   23
    The Expected Value of a Random Variable   23
    The Standard Deviation and Variance   24
    Mean and Variance of a Linear Function of a Random Variable   25
    Other Measures of the Shape of a Distribution   26

2.3  Two Random Variables   29
    Joint and Marginal Distributions   29
    Conditional Distributions   30
    Independence   34
    Covariance and Correlation   34
    The Mean and Variance of Sums of Random Variables   35

2.4  The Normal, Chi-Squared, Student t, and F Distributions   39
    The Normal Distribution   39
    The Chi-Squared Distribution   43
    The Student t Distribution   44
    The F Distribution   44

2.5  Random Sampling and the Distribution of the Sample Average   45
    Random Sampling   45
    The Sampling Distribution of the Sample Average   46

2.6  Large-Sample Approximations to Sampling Distributions   48
    The Law of Large Numbers and Consistency   49
    The Central Limit Theorem   52

APPENDIX 2.1  Derivation of Results in Key Concept 2.3   63

CHAPTER 3  Review of Statistics   65

3.1  Estimation of the Population Mean   66
    Estimators and Their Properties   67
    Properties of Ȳ   68
    The Importance of Random Sampling   70

3.2  Hypothesis Tests Concerning the Population Mean   71
    Null and Alternative Hypotheses   72
    The p-Value   72
    Calculating the p-Value When σY Is Known   74
    The Sample Variance, Sample Standard Deviation, and Standard Error   75
    Calculating the p-Value When σY Is Unknown   76
    The t-Statistic   77
    Hypothesis Testing with a Prespecified Significance Level   78
    One-Sided Alternatives   80

3.3  Confidence Intervals for the Population Mean   81

3.4  Comparing Means from Different Populations   83
    Hypothesis Tests for the Difference Between Two Means   83
    Confidence Intervals for the Difference Between Two Population Means   84

3.5  Differences-of-Means Estimation of Causal Effects Using Experimental Data   85
    The Causal Effect as a Difference of Conditional Expectations   85
    Estimation of the Causal Effect Using Differences of Means   87

3.6  Using the t-Statistic When the Sample Size Is Small   88
    The t-Statistic and the Student t Distribution   88
    Use of the Student t Distribution in Practice   92

3.7  Scatterplot, the Sample Covariance, and the Sample Correlation   92
    Scatterplots   93
    Sample Covariance and Correlation   94

APPENDIX 3.1  The U.S. Current Population Survey   105
APPENDIX 3.2  Two Proofs That Ȳ Is the Least Squares Estimator of μY   106
APPENDIX 3.3  A Proof That the Sample Variance Is Consistent   107

PART TWO  Fundamentals of Regression Analysis   109

CHAPTER 4  Linear Regression with One Regressor   111

4.1  The Linear Regression Model   112

4.2  Estimating the Coefficients of the Linear Regression Model   116
    The Ordinary Least Squares Estimator   118
    OLS Estimates of the Relationship Between Test Scores and the Student-Teacher Ratio   120
    Why Use the OLS Estimator?   121

4.3  Measures of Fit   123
    The R²   123
    The Standard Error of the Regression   124
    Application to the Test Score Data   125

4.4  The Least Squares Assumptions   126
    Assumption #1: The Conditional Distribution of ui Given Xi Has a Mean of Zero   126
    Assumption #2: (Xi, Yi), i = 1, ..., n Are Independently and Identically Distributed   128
    Assumption #3: Large Outliers Are Unlikely   129
    Use of the Least Squares Assumptions   130

4.5  The Sampling Distribution of the OLS Estimators   131
    The Sampling Distribution of the OLS Estimators   132

4.6  Conclusion   135

APPENDIX 4.1  The California Test Score Data Set   143
APPENDIX 4.2  Derivation of the OLS Estimators   143
APPENDIX 4.3  Sampling Distribution of the OLS Estimator   144

CHAPTER 5  Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals   148

5.1  Testing Hypotheses About One of the Regression Coefficients   149
    Two-Sided Hypotheses Concerning β1   149
    One-Sided Hypotheses Concerning β1   153
    Testing Hypotheses About the Intercept β0   155

5.2  Confidence Intervals for a Regression Coefficient   155

5.3  Regression When X Is a Binary Variable   158
    Interpretation of the Regression Coefficients   158

5.4  Heteroskedasticity and Homoskedasticity   160
    What Are Heteroskedasticity and Homoskedasticity?   160
    Mathematical Implications of Homoskedasticity   163
    What Does This Mean in Practice?   164

5.5  The Theoretical Foundations of Ordinary Least Squares   166
    Linear Conditionally Unbiased Estimators and the Gauss-Markov Theorem   167
    Regression Estimators Other Than OLS   168

5.6  Using the t-Statistic in Regression When the Sample Size Is Small   169
    The t-Statistic and the Student t Distribution   170
    Use of the Student t Distribution in Practice   170

5.7  Conclusion   171

APPENDIX 5.1  Formulas for OLS Standard Errors   180
APPENDIX 5.2  The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem   182

CHAPTER 6  Linear Regression with Multiple Regressors   186

6.1  Omitted Variable Bias   186
    Definition of Omitted Variable Bias   187
    A Formula for Omitted Variable Bias   189
    Addressing Omitted Variable Bias by Dividing the Data into Groups   191

6.2  The Multiple Regression Model   193
    The Population Regression Line   193
    The Population Multiple Regression Model   194

6.3  The OLS Estimator in Multiple Regression   196
    The OLS Estimator   197
    Application to Test Scores and the Student-Teacher Ratio   198

6.4  Measures of Fit in Multiple Regression   200
    The Standard Error of the Regression (SER)   200
    The R²   200
    The "Adjusted R²"   201
    Application to Test Scores   202

6.5  The Least Squares Assumptions in Multiple Regression   202
    Assumption #1: The Conditional Distribution of ui Given X1i, X2i, ..., Xki Has a Mean of Zero   203
    Assumption #2: (X1i, X2i, ..., Xki, Yi), i = 1, ..., n Are i.i.d.   203
    Assumption #3: Large Outliers Are Unlikely   203
    Assumption #4: No Perfect Multicollinearity   203

6.6  The Distribution of the OLS Estimators in Multiple Regression   205

6.7  Multicollinearity   206
    Examples of Perfect Multicollinearity   206
    Imperfect Multicollinearity   209

6.8  Conclusion   210

APPENDIX 6.1  Derivation of Equation (6.1)   218
APPENDIX 6.2  Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors   218

CHAPTER 7  Hypothesis Tests and Confidence Intervals in Multiple Regression   220

7.1  Hypothesis Tests and Confidence Intervals for a Single Coefficient   221
    Standard Errors for the OLS Estimators   221
    Hypothesis Tests for a Single Coefficient   221
    Confidence Intervals for a Single Coefficient   223
    Application to Test Scores and the Student-Teacher Ratio   223

7.2  Tests of Joint Hypotheses   225
    Testing Hypotheses on Two or More Coefficients   225
    The F-Statistic   227
    Application to Test Scores and the Student-Teacher Ratio   229
    The Homoskedasticity-Only F-Statistic   230

7.3  Testing Single Restrictions Involving Multiple Coefficients   232

7.4  Confidence Sets for Multiple Coefficients   234

7.5  Model Specification for Multiple Regression   235
    Omitted Variable Bias in Multiple Regression   236
    Model Specification in Theory and in Practice   236
    Interpreting the R² and the Adjusted R² in Practice   237

7.6  Analysis of the Test Score Data Set   239

7.7  Conclusion   244

APPENDIX 7.1  The Bonferroni Test of a Joint Hypothesis   251

CHAPTER 8  Nonlinear Regression Functions   254

8.1  A General Strategy for Modeling Nonlinear Regression Functions   256
    Test Scores and District Income   256
    The Effect on Y of a Change in X in Nonlinear Specifications   260
    A General Approach to Modeling Nonlinearities Using Multiple Regression   264

8.2  Nonlinear Functions of a Single Independent Variable   264
    Polynomials   265
    Logarithms   267
    Polynomial and Logarithmic Models of Test Scores and District Income   275

8.3  Interactions Between Independent Variables   277
    Interactions Between Two Binary Variables   277
    Interactions Between a Continuous and a Binary Variable   280
    Interactions Between Two Continuous Variables   286

8.4  Nonlinear Effects on Test Scores of the Student-Teacher Ratio   290
    Discussion of Regression Results   291
    Summary of Findings   295

8.5  Conclusion   296

APPENDIX 8.1  Regression Functions That Are Nonlinear in the Parameters   307
CHAPTER 9  Assessing Studies Based on Multiple Regression   312

9.1  Internal and External Validity   313
    Threats to Internal Validity   313
    Threats to External Validity   314

9.2  Threats to Internal Validity of Multiple Regression Analysis   316
    Omitted Variable Bias   316
    Misspecification of the Functional Form of the Regression Function   319
    Errors-in-Variables   319
    Sample Selection   322
    Simultaneous Causality
    Sources of Inconsistency of OLS Standard Errors   325

9.3  Internal and External Validity When the Regression Is Used for Forecasting   327
    Using Regression Models for Forecasting   327
    Assessing the Validity of Regression Models for Forecasting   328

9.4  Example: Test Scores and Class Size   329
    External Validity   329
    Internal Validity   336
    Discussion and Implications   337

9.5  Conclusion   338

APPENDIX 9.1  The Massachusetts Elementary School Testing Data   344

PART THREE  Further Topics in Regression Analysis   347

CHAPTER 10  Regression with Panel Data   349

10.1  Panel Data   350
    Example: Traffic Deaths and Alcohol Taxes   351

10.2  Panel Data with Two Time Periods: "Before and After" Comparisons   353

10.3  Fixed Effects Regression   356
    The Fixed Effects Regression Model   356
    Estimation and Inference   359
    Application to Traffic Deaths   360

10.4  Regression with Time Fixed Effects   361
    Time Effects Only   361
    Both Entity and Time Fixed Effects   362

10.5  The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression   364
    The Fixed Effects Regression Assumptions   364
    Standard Errors for Fixed Effects Regression   366

10.6  Drunk Driving Laws and Traffic Deaths   367

10.7  Conclusion   371

APPENDIX 10.1  The State Traffic Fatality Data Set   378
APPENDIX 10.2  Standard Errors for Fixed Effects Regression with Serially Correlated Errors   379

CHAPTER 11  Regression with a Binary Dependent Variable   383

11.1  Binary Dependent Variables and the Linear Probability Model   384
    Binary Dependent Variables   385
    The Linear Probability Model   387

11.2  Probit and Logit Regression   389
    Probit Regression   389
    Logit Regression   394
    Comparing the Linear Probability, Probit, and Logit Models   396

11.3  Estimation and Inference in the Logit and Probit Models   396
    Nonlinear Least Squares Estimation   397
    Maximum Likelihood Estimation   398
    Measures of Fit   399

11.4  Application to the Boston HMDA Data   400

11.5  Summary   407

APPENDIX 11.1  The Boston HMDA Data Set   415
APPENDIX 11.2  Maximum Likelihood Estimation   415
APPENDIX 11.3  Other Limited Dependent Variable Models   418

CHAPTER 12  Instrumental Variables Regression   421

12.1  The IV Estimator with a Single Regressor and a Single Instrument   422
    The IV Model and Assumptions   422
    The Two Stage Least Squares Estimator   423
    Why Does IV Regression Work?   424
    The Sampling Distribution of the TSLS Estimator   428
    Application to the Demand for Cigarettes   430

12.2  The General IV Regression Model   432
    TSLS in the General IV Model   433
    Instrument Relevance and Exogeneity in the General IV Model   434
    The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator   434
    Inference Using the TSLS Estimator   437
    Application to the Demand for Cigarettes   437

12.3  Checking Instrument Validity   439
    Assumption #1: Instrument Relevance   439
    Assumption #2: Instrument Exogeneity   443

12.4  Application to the Demand for Cigarettes   445

12.5  Where Do Valid Instruments Come From?   450
    Three Examples   451

12.6  Conclusion   455

APPENDIX 12.1  The Cigarette Consumption Panel Data Set   462
APPENDIX 12.2  Derivation of the Formula for the TSLS Estimator in Equation (12.4)   462
APPENDIX 12.3  Large-Sample Distribution of the TSLS Estimator   463
APPENDIX 12.4  Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid   464
APPENDIX 12.5  Instrumental Variables Analysis with Weak Instruments   466

CHAPTER 13  Experiments and Quasi-Experiments   468

13.1  Idealized Experiments and Causal Effects   470
    Ideal Randomized Controlled Experiments   470
    The Differences Estimator   471

13.2  Potential Problems with Experiments in Practice   472
    Threats to Internal Validity   472
    Threats to External Validity   475

13.3  Regression Estimators of Causal Effects Using Experimental Data   477
    The Differences Estimator with Additional Regressors   477
    The Differences-in-Differences Estimator   480
    Estimation of Causal Effects for Different Groups   484
    Estimation When There Is Partial Compliance   484
    Testing for Randomization   485

13.4  Experimental Estimates of the Effect of Class Size Reductions   486
    Experimental Design   486
    Analysis of the STAR Data   487
    Comparison of the Observational and Experimental Estimates of Class Size Effects   492

13.5  Quasi-Experiments   494
    Examples   495
    Econometric Methods for Analyzing Quasi-Experiments   497

13.6  Potential Problems with Quasi-Experiments   500
    Threats to Internal Validity   500
    Threats to External Validity   502


13.7  Experimental and Quasi-Experimental Estimates in Heterogeneous Populations   502
    Population Heterogeneity: Whose Causal Effect?   502
    OLS with Heterogeneous Causal Effects   503
    IV Regression with Heterogeneous Causal Effects   504

13.8  Conclusion   507

APPENDIX 13.1  The Project STAR Data Set   516
APPENDIX 13.2  Extension of the Differences-in-Differences Estimator to Multiple Time Periods   517
APPENDIX 13.3  Conditional Mean Independence   518
APPENDIX 13.4  IV Estimation When the Causal Effect Varies Across Individuals   520

PART FOUR  Regression Analysis of Economic Time Series Data   523

CHAPTER 14  Introduction to Time Series Regression and Forecasting   525

14.1  Using Regression Models for Forecasting   527

14.2  Introduction to Time Series Data and Serial Correlation   528
    The Rates of Inflation and Unemployment in the United States   528
    Lags, First Differences, Logarithms, and Growth Rates   528
    Autocorrelation   532
    Other Examples of Economic Time Series   533

14.3  Autoregressions   535
    The First Order Autoregressive Model   535
    The pth Order Autoregressive Model   538

14.4  Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model   541
    Forecasting Changes in the Inflation Rate Using Past Unemployment Rates   541
    Stationarity   544
    Time Series Regression with Multiple Predictors   545
    Forecast Uncertainty and Forecast Intervals   548

14.5  Lag Length Selection Using Information Criteria   549
    Determining the Order of an Autoregression   551
    Lag Length Selection in Time Series Regression with Multiple Predictors   553

14.6  Nonstationarity I: Trends   554
    What Is a Trend?   554
    Problems Caused by Stochastic Trends   557
    Detecting Stochastic Trends: Testing for a Unit AR Root   560
    Avoiding the Problems Caused by Stochastic Trends   564

14.7  Nonstationarity II: Breaks   565
    What Is a Break?   565
    Testing for Breaks   566
    Pseudo Out-of-Sample Forecasting   571
    Avoiding the Problems Caused by Breaks   576

14.8  Conclusion   577

APPENDIX 14.1  Time Series Data Used in Chapter 14   586
APPENDIX 14.2  Stationarity in the AR(1) Model   586
APPENDIX 14.3  Lag Operator Notation   588
APPENDIX 14.4  ARMA Models   589
APPENDIX 14.5  Consistency of the BIC Lag Length Estimator   589

CHAPTER 15  Estimation of Dynamic Causal Effects   591

15.1  An Initial Taste of the Orange Juice Data   593

15.2  Dynamic Causal Effects   595
    Causal Effects and Time Series Data   596
    Two Types of Exogeneity   598

15.3  Estimation of Dynamic Causal Effects with Exogenous Regressors   600
    The Distributed Lag Model Assumptions   601
    Autocorrelated ut, Standard Errors, and Inference   601
    Dynamic Multipliers and Cumulative Dynamic Multipliers   602

15.4  Heteroskedasticity- and Autocorrelation-Consistent Standard Errors   604
    Distribution of the OLS Estimator with Autocorrelated Errors   604
    HAC Standard Errors   606

15.5  Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors   608
    The Distributed Lag Model with AR(1) Errors   609
    OLS Estimation of the ADL Model   612
    GLS Estimation   613
    The Distributed Lag Model with Additional Lags and AR(p) Errors   615

15.6  Orange Juice Prices and Cold Weather   618

15.7  Is Exogeneity Plausible? Some Examples   624
    U.S. Income and Australian Exports   625
    Oil Prices and Inflation   626
    Monetary Policy and Inflation   626
    The Phillips Curve   627

15.8  Conclusion   627

APPENDIX 15.1  The Orange Juice Data Set   634
APPENDIX 15.2  The ADL Model and Generalized Least Squares in Lag Operator Notation   634

CHAPTER 16  Additional Topics in Time Series Regression   637

16.1  Vector Autoregressions   638
    The VAR Model   638
    A VAR Model of the Rates of Inflation and Unemployment   641

16.2  Multiperiod Forecasts   642
    Iterated Multiperiod Forecasts   643
    Direct Multiperiod Forecasts   645
    Which Method Should You Use?   647

16.3  Orders of Integration and the DF-GLS Unit Root Test   648
    Other Models of Trends and Orders of Integration   648
    The DF-GLS Test for a Unit Root   650
    Why Do Unit Root Tests Have Non-normal Distributions?   653

16.4  Cointegration   655
    Cointegration and Error Correction   655
    How Can You Tell Whether Two Variables Are Cointegrated?   658
    Estimation of Cointegrating Coefficients   660
    Extension to Multiple Cointegrated Variables   661
    Application to Interest Rates   662

16.5  Volatility Clustering and Autoregressive Conditional Heteroskedasticity   664
    Volatility Clustering   665
    Autoregressive Conditional Heteroskedasticity   666
    Application to Stock Price Volatility   667

16.6  Conclusion   669

APPENDIX 16.1  U.S. Financial Data Used in Chapter 16   674

PART FIVE  The Econometric Theory of Regression Analysis   675

CHAPTER 17  The Theory of Linear Regression with One Regressor   677

17.1  The Extended Least Squares Assumptions and the OLS Estimator   678
    The Extended Least Squares Assumptions   678
    The OLS Estimator   680

17.2  Fundamentals of Asymptotic Distribution Theory   680
    Convergence in Probability and the Law of Large Numbers   681
    The Central Limit Theorem and Convergence in Distribution   683
    Slutsky's Theorem and the Continuous Mapping Theorem   685
    Application to the t-Statistic Based on the Sample Mean   685

17.3  Asymptotic Distribution of the OLS Estimator and t-Statistic   686
    Consistency and Asymptotic Normality of the OLS Estimators   686
    Consistency of Heteroskedasticity-Robust Standard Errors   686
    Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic   688

17.4  Exact Sampling Distributions When the Errors Are Normally Distributed   688
    Distribution of β̂1 with Normal Errors   688
    Distribution of the Homoskedasticity-only t-Statistic   690

17.5  Weighted Least Squares   691
    WLS with Known Heteroskedasticity   691
    WLS with Heteroskedasticity of Known Functional Form   692
    Heteroskedasticity-Robust Standard Errors or WLS?   695

APPENDIX 17.1  The Normal and Related Distributions and Moments of Continuous Random Variables   700
APPENDIX 17.2  Two Inequalities   702

CHAPTER 18  The Theory of Multiple Regression   704

18.1  The Linear Multiple Regression Model and OLS Estimator in Matrix Form   706
    The Multiple Regression Model in Matrix Notation   706
    The Extended Least Squares Assumptions   707
    The OLS Estimator   708

18.2  Asymptotic Distribution of the OLS Estimator and t-Statistic   710
    The Multivariate Central Limit Theorem   710
    Asymptotic Normality of β̂   710
    Heteroskedasticity-Robust Standard Errors   711
    Confidence Intervals for Predicted Effects   712
    Asymptotic Distribution of the t-Statistic   713

18.3  Tests of Joint Hypotheses   713
    Joint Hypotheses in Matrix Notation   713
    Asymptotic Distribution of the F-Statistic   714
    Confidence Sets for Multiple Coefficients   714

18.4  Distribution of Regression Statistics with Normal Errors   715
    Matrix Representations of OLS Regression Statistics   715
    Distribution of β̂ with Normal Errors   716
    Distribution of s²û   717
    Homoskedasticity-Only Standard Errors   717
    Distribution of the t-Statistic   718
    Distribution of the F-Statistic   718

18.5  Efficiency of the OLS Estimator with Homoskedastic Errors   719
    The Gauss-Markov Conditions for Multiple Regression   719
    Linear Conditionally Unbiased Estimators   719
    The Gauss-Markov Theorem for Multiple Regression   720

18.6  Generalized Least Squares   721
    The GLS Assumptions   722
    GLS When Ω Is Known   724
    GLS When Ω Contains Unknown Parameters   725
    The Zero Conditional Mean Assumption and GLS   725

18.7  Instrumental Variables and Generalized Method of Moments Estimation   727
    The IV Estimator in Matrix Form   728
    Asymptotic Distribution of the TSLS Estimator   729
    Properties of TSLS When the Errors Are Homoskedastic   730
    Generalized Method of Moments Estimation in Linear Models   733

APPENDIX 18.1  Summary of Matrix Algebra   743
APPENDIX 18.2  Multivariate Distributions   747
APPENDIX 18.3  Derivation of the Asymptotic Distribution of β̂   748
APPENDIX 18.4  Derivations of Exact Distributions of OLS Test Statistics with Normal Errors   749
APPENDIX 18.5  Proof of the Gauss-Markov Theorem for Multiple Regression   751
APPENDIX 18.6  Proof of Selected Results for IV and GMM Estimation   752

Appendix   755
References   763
Answers to "Review the Concepts" Questions   767
Glossary   775
Index   783

Key Concepts

PART ONE  Introduction and Review
1.1   Cross-Sectional, Time Series, and Panel Data   15
2.1   Expected Value and the Mean   24
2.2   Variance and Standard Deviation   25
2.3   Means, Variances, and Covariances of Sums of Random Variables   38
2.4   Computing Probabilities Involving Normal Random Variables   40
2.5   Simple Random Sampling and i.i.d. Random Variables   47
2.6   Convergence in Probability, Consistency, and the Law of Large Numbers   50
2.7   The Central Limit Theorem   55
3.1   Estimators and Estimates   67
3.2   Bias, Consistency, and Efficiency   68
3.3   Efficiency of Ȳ: Ȳ Is BLUE   70
3.4   The Standard Error of Ȳ   76
3.5   The Terminology of Hypothesis Testing   79
3.6   Testing the Hypothesis E(Y) = μY,0 Against the Alternative E(Y) ≠ μY,0   80
3.7   Confidence Intervals for the Population Mean   82

PART TWO  Fundamentals of Regression Analysis   109
4.1   Terminology for the Linear Regression Model with a Single Regressor   115
4.2   The OLS Estimator, Predicted Values, and Residuals   119
4.3   The Least Squares Assumptions   131
4.4   Large-Sample Distributions of β̂0 and β̂1   133
5.1   General Form of the t-Statistic   150
5.2   Testing the Hypothesis β1 = β1,0 Against the Alternative β1 ≠ β1,0   152
5.3   Confidence Interval for β1   157
5.4   Heteroskedasticity and Homoskedasticity   162
5.5   The Gauss-Markov Theorem for β̂1   168
6.1   Omitted Variable Bias in Regression with a Single Regressor   189
6.2   The Multiple Regression Model   196
6.3   The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model   198
6.4   The Least Squares Assumptions in the Multiple Regression Model   204
6.5   Large Sample Distribution of β̂0, β̂1, ..., β̂k   206
7.1   Testing the Hypothesis βj = βj,0 Against the Alternative βj ≠ βj,0   222
7.2   Confidence Intervals for a Single Coefficient in Multiple Regression   223
7.3   Omitted Variable Bias in Multiple Regression   237
7.4   R² and R̄²: What They Tell You — and What They Don't   238
8.1   The Expected Effect on Y of a Change in X in the Nonlinear Regression Model (8.3)   261
8.2   Logarithms in Regression: Three Cases   273
8.3   A Method for Interpreting Coefficients in Regressions with Binary Variables   279
8.4   Interactions Between Binary and Continuous Variables   282
8.5   Interactions in Multiple Regression   287
9.1   Internal and External Validity   313
9.2   Omitted Variable Bias: Should I Include More Variables in My Regression?   318
9.3   Functional Form Misspecification   319
9.4   Errors-in-Variables Bias   321
9.5   Sample Selection Bias   323
9.6   Simultaneous Causality Bias   325
9.7   Threats to the Internal Validity of a Multiple Regression Study   327

PART THREE  Further Topics in Regression Analysis   347
10.1  Notation for Panel Data   350
10.2  The Fixed Effects Regression Model   359
10.3  The Fixed Effects Regression Assumptions   365
11.1  The Linear Probability Model   388
11.2  The Probit Model, Predicted Probabilities, and Estimated Effects   392
11.3  Logit Regression   394
12.1  The General Instrumental Variables Regression Model and Terminology   433
12.2  Two Stage Least Squares   435
12.3  The Two Conditions for Valid Instruments   436
12.4  The IV Regression Assumptions   437
12.5  A Rule of Thumb for Checking for Weak Instruments   441
12.6  The Overidentifying Restrictions Test (the J-Statistic)   444

PART FOUR  Regression Analysis of Economic Time Series Data   523
14.1  Lags, First Differences, Logarithms, and Growth Rates   530
14.2  Autocorrelation (Serial Correlation) and Autocovariance   532
14.3  Autoregressions   539
14.4  The Autoregressive Distributed Lag Model   544
14.5  Stationarity   545
14.6  Time Series Regression with Multiple Predictors   546
14.7  Granger Causality Tests (Tests of Predictive Content)   547
14.8  The Augmented Dickey-Fuller Test for a Unit Autoregressive Root   562
14.9  The QLR Test for Coefficient Stability   569
14.10 Pseudo Out-of-Sample Forecasts   572
15.1  The Distributed Lag Model and Exogeneity   600
15.2  The Distributed Lag Model Assumptions   602
15.3  HAC Standard Errors   609
15.4  Estimation of Dynamic Multipliers Under Strict Exogeneity   617
16.1  Vector Autoregressions   639
16.2  Iterated Multiperiod Forecasts   645
16.3  Direct Multiperiod Forecasts   647
16.4  Orders of Integration, Differencing, and Stationarity   650
16.5  Cointegration   658

PART FIVE  The Econometric Theory of Regression Analysis   675
17.1  The Extended Least Squares Assumptions for Regression with a Single Regressor   680
18.1  The Extended Least Squares Assumptions in the Multiple Regression Model   707
18.2  The Multivariate Central Limit Theorem   710
18.3  Gauss-Markov Theorem for Multiple Regression   721
18.4  The GLS Assumptions   723

General Interest Boxes

The Distribution of Earnings in the United States in 2004   35
A Bad Day on Wall Street   42
Landon Wins!   71
The Gender Gap of Earnings of College Graduates in the United States   86
A Novel Way to Boost Retirement Savings   90
The "Beta" of a Stock   122
The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity?   165
The Mozart Effect: Omitted Variable Bias?   190
The Returns to Education and the Gender Gap   284
The Demand for Economics Journals   288
Do Stock Mutual Funds Outperform the Market?   323
James Heckman and Daniel McFadden, Nobel Laureates   407
Who Invented Instrumental Variable Regression?   425
A Scary Regression   426
The Externalities of Smoking   446
The Hawthorne Effect   474
What Is the Effect on Employment of the Minimum Wage?   498
Can You Beat the Market? Part I   540
The River of Blood   550
Can You Beat the Market? Part II   573
NEWS FLASH: Commodity Traders Send Shivers Through Disney World   625
Robert Engle and Clive Granger, Nobel Laureates   657

Preface
Econometrics can be a fun course for both teacher and student. The real world of economics, business, and government is a complicated and messy place, full of competing ideas and questions that demand answers. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can you make money in the stock market by buying when prices are historically low, relative to earnings, or should you just sit tight as the random walk theory of stock prices suggests? Can we improve elementary education by reducing class size, or should we simply have our children listen to Mozart for ten minutes a day? Econometrics helps us to sort out sound ideas from crazy ones and to find quantitative answers to important quantitative questions. Econometrics opens a window on our complicated world that lets us see the relationships on which people, businesses, and governments base their decisions.
This textbook is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. This simple principle represents a significant departure from the older generation of econometrics books, in which theoretical models and assumptions do not match the applications. It is no wonder that some students question the relevance of econometrics after they spend much of their time learning assumptions that they subsequently realize are unrealistic, so that they must then learn "solutions" to "problems" that arise when the applications do not match the assumptions. We believe that it is far better to motivate the need for tools with a concrete application, and then to provide a few simple assumptions that match the application. Because the theory is immediately relevant to the applications, this approach can make econometrics come alive.
The second edition benefits from the many constructive suggestions of teachers who used the first edition, while maintaining the philosophy that applications should drive the theory, not the other way around. The single greatest change in the second edition is a reorganization and expansion of the material on core regression analysis: Part II, which covers regression with cross-sectional data, has been expanded from four chapters to six. We have added new empirical examples (as boxes) drawn from economics and finance; some new optional sections on


classical regression theory; and many new exercises, both paper-and-pencil and computer-based empirical exercises using data sets newly placed on the textbook Web site. A more detailed description of changes to the second edition can be found on page xxxii.

Features of This Book


This textbook differs from others in three main ways. First, we integrate real-world questions and data into the development of the theory, and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of topics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.

Real-world Questions and Data


We organize each methodological topic around an important real-world question that demands a specific numerical answer. For example, we teach single-variable regression, multiple regression, and functional form analysis in the context of estimating the effect of school inputs on school outputs. (Do smaller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as the empirical application for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variables estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics course work. Thus the instructor can focus on teaching econometrics, not microeconomics or macroeconomics.
We treat all our empirical applications seriously and in a way that shows students how they can learn from data but at the same time be self-critical and aware of the limitations of empirical analyses. Through each application, we teach students to explore alternative specifications and thereby to assess whether their substantive findings are robust. The questions asked in the empirical applications are important, and we provide serious and, we think, credible answers. We encourage students and instructors to disagree, however, and invite them to reanalyze the data, which are provided on the textbook's companion Web site (www.aw-bc.com/stock_watson).


Contemporary Choice of Topics

Econometrics has come a long way in the past two decades. The topics we cover reflect the best of contemporary applied econometrics. One can only do so much in an introductory course, so we focus on procedures and tests that are commonly used in practice. For example:

Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrument, exogeneity and relevance, are given equal billing. We follow that presentation with an extended discussion of where instruments come from, and with tests of overidentifying restrictions and diagnostics for weak instruments, and we explain what to do if these diagnostics suggest problems.

Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables, simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data.

Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information criterion, that work well in practice. This chapter also features a practically oriented treatment of stochastic trends (unit roots), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo out-of-sample forecasting, all in the context of developing stable and reliable time series forecasting models.

Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal inferences, and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard errors.


Theory That Matches Applications

Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to understand the strengths and limitations of those tools. We provide a modern treatment in which the fit between theory and applications is as tight as possible, while keeping the mathematics at a level that requires only algebra.
Modern empirical applications share some common characteristics: the data sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are collected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic).
These observations lead to important differences between the theoretical development in this textbook and other textbooks.

Large-sample approach. Because data sets are large, from the outset we use large-sample normal approximations to sampling distributions for hypothesis testing and confidence intervals. Our experience is that it takes less time to teach the rudiments of large-sample approximations than to teach the Student t and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence intervals carries directly through multiple regression analysis, logit and probit, instrumental variables estimation, and time series methods.

Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data; it extends readily to panel and time series data; and because of our large-sample approach, it poses no additional conceptual or mathematical difficulties.

Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity is present or not. In this book, we move beyond treating heteroskedasticity as an exception or a "problem" to be "solved"; instead, we allow for heteroskedasticity from the outset and simply use heteroskedasticity-robust standard errors. We present homoskedasticity as a special case that provides a theoretical motivation for OLS.

Skilled Producers, Sophisticated Consumers


We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis, but also how to assess the validity of empirical analyses presented to them.
Our approach to teaching how to assess an empirical study is threefold. First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other settings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and simultaneity, and ways to recognize these threats in practice.
Second, we apply these methods for assessing empirical studies to the empirical analysis of the ongoing examples in the book. We do so by considering alternative specifications and by systematically addressing the various threats to validity of the analyses presented in the book.
Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason, the textbook Web site features data sets, software, and suggestions for empirical exercises of differing scopes. These Web resources have been expanded considerably for the second edition.

Approach to Mathematics and Level of Rigor


Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a "high" or a "low" level of mathematics. Parts I-IV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts I-IV have fewer equations, and more applications, than many introductory econometrics books, and far fewer equations than books aimed at mathematical sections of undergraduate courses. But more equations do not imply a more sophisticated treatment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students.
This said, different students learn differently, and for the mathematically well-prepared student, learning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that


is appropriate for students with a stronger mathematical background. We believe that, when the mathematical chapters in Part V are used in conjunction with the material in Parts I-IV, this book is suitable for advanced undergraduate or master's level econometrics courses.

Changes to the Second Edition


The changes introduced in the second edition fall into three categories: more empirical examples; expanded theoretical material, especially in the treatment of the core regression topics; and additional student exercises.

More empirical examples. The second edition retains the empirical examples from the first edition, and adds a significant number of new ones. These additional examples include estimation of the returns to education; inference about the gender gap in earnings; the difficulty of forecasting the stock market; and modeling the volatility clustering in stock returns. The data sets for these empirical examples are posted on the course Web site. The second edition also includes more general-interest boxes, for example, how sample selection bias ("survivorship bias") can produce misleading conclusions about whether actively managed mutual funds actually beat the market.

Expanded theoretical material. The philosophy of this and the previous edition is that the modeling assumptions should be motivated by empirical applications. For this reason, our three basic least squares assumptions that underpin regression with a single regressor include neither normality nor homoskedasticity, both of which are arguably the exception in econometric applications. This leads directly to large-sample inference using heteroskedasticity-robust standard errors. Our experience is that students don't find this difficult; in fact, what they find difficult is the traditional approach of introducing the homoskedasticity and normality assumptions, learning how to use t- and F-tables, then being told that what they just learned is not reliable in applications because of the failure of these assumptions and that these "problems" must be "fixed." But not all instructors share this view, and some find it useful to introduce the homoskedastic normal regression model. Moreover, even if homoskedasticity is the exception instead of the rule, assuming homoskedasticity permits discussing the Gauss-Markov theorem, a key motivation for using ordinary least squares (OLS).
For these reasons, the treatment of the core regression material has been significantly expanded in the second edition, and now includes sections on the theoretical motivation for OLS (the Gauss-Markov theorem), small-sample inference in the homoskedastic normal model, and multicollinearity and the dummy variable trap. To accommodate these new sections, the new empirical examples, the new general-interest boxes, and the many new exercises, the core regression chapters have been expanded from two to four: the linear regression model with a single regressor and OLS (Chapter 4); inference in regression with a single regressor (Chapter 5); the multiple regression model and OLS (Chapter 6); and inference in the multiple regression model (Chapter 7). This expanded and reorganized treatment of the core regression material constitutes the single greatest change in the second edition.
The second edition also includes some additional topics requested by some instructors. One such addition is specification and estimation of models that are nonlinear in the parameters (Appendix 8.1). Another is how to compute standard errors in panel data regression when the error term is serially correlated for a given entity (clustered standard errors; Section 10.5 and Appendix 10.2). A third addition is an introduction to current best practices for detecting and handling weak instruments (Appendix 12.5), and a fourth addition is a treatment, in a new final section of the last chapter (Section 18.7), of efficient estimation in the heteroskedastic linear IV regression model using generalized method of moments.

Additional student exercises. The second edition contains many new exercises, both "paper and pencil" and empirical exercises that involve the use of databases, supplied on the course Web site, and regression software. The data section of the course Web site has been significantly enhanced by the addition of numerous databases.

Contents and Organization


There are five parts to the textbook. This textbook assumes that the student has had a course in probability and statistics, although we review that material in Part I. We cover the core material of regression analysis in Part II. Parts III, IV, and V present additional topics that build on the core treatment in Part II.

Part I
Chapter 1 introduces econometrics and stresses the importance of providing quantitative answers to quantitative questions. It discusses the concept of causality in statistical studies and surveys the different types of data encountered in econometrics. Material from probability and statistics is reviewed in Chapters 2 and 3, respectively; whether these chapters are taught in a given course, or simply provided as a reference, depends on the background of the students.


Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that the parameters can be estimated by OLS). In Chapter 9, students step back and learn how to identify the strengths and limitations of regression studies, seeing in the process how to apply the concepts of internal and external validity.

Part III
Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental variables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural, experiments, topics often referred to as "program evaluation."

Part IV
Part IV takes up regression with time series data. Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions, such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.

Part V
Part V is an introduction to econometric theory. This part is more than an appendix that fills in mathematical details omitted from the text. Rather, it is a self-contained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, although it does demand a higher level of mathematical sophistication than the rest of the text.


TABLE I  Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V

              Prerequisite parts or chapters
              Part I   Part II   Part III                  Part IV                   Part V
Chapter       1-3      4-7, 9    10.1, 10.2   12.1, 12.2   14.1-14.4   14.5-14.8     15     17
10            Xa       X
11            Xa       X
12.1, 12.2    Xa       X
12.3-12.6     Xa       X                      X
13            Xa       X         X            X
14            Xa       Xb
15            Xa       Xb                                  X
16            Xa       Xb                                  X           X             X
17            X        X
18            X        X         X            X                                             X

This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1-14.4.
aChapters 10-16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.
bChapters 14-16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.

Chapter 18 presents and studies the multiple regression model, instrumental variables regression, and generalized method of moments estimation of the linear model, all in matrix form.

Prerequisites Within the Book


Because different instructors like to emphasize different material, we wrote this book with diverse teaching preferences in mind. To the maximum extent possible, the chapters in Parts III, IV, and V are "stand-alone" in the sense that they do not require first teaching all the preceding chapters. The specific prerequisites for each chapter are described in Table 1. Although we have found that the sequence of topics adopted in the textbook works well in our own courses, the chapters are written in a way that allows instructors to present topics in a different order if they so desire.


Sample Courses
This book accommodates several different course structures.

Standard Introductory Econometrics


This course introduces econometrics (Chapter 1) and reviews probability and statistics as needed (Chapters 2 and 3). It then moves on to regression with a single regressor, multiple regression, the basics of functional form analysis, and the evaluation of regression studies (all of Part II). The course proceeds to cover regression with panel data (Chapter 10), regression with a limited dependent variable (Chapter 11), and/or instrumental variables regression (Chapter 12), as time permits. The course concludes with experiments and quasi-experiments in Chapter 13, topics that provide an opportunity to return to the questions of estimating causal effects raised at the beginning of the semester and to recapitulate core regression methods. Prerequisites: Algebra II and introductory statistics.

Introductory Econometrics with Time Series and Forecasting Applications
Like the standard introductory course, this course covers all of Part I (as needed) and all of Part II. Optionally, the course next provides a brief introduction to panel data (Sections 10.1 and 10.2) and takes up instrumental variables regression (Chapter 12, or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering forecasting (Chapter 14) and estimation of dynamic causal effects (Chapter 15). If time permits, the course can include some advanced topics in time series analysis such as volatility clustering and conditional heteroskedasticity (Section 16.5). Prerequisites: Algebra II and introductory statistics.

Applied Time Series Analysis and Forecasting


This book also can be used for a short course on applied time series and forecasting, for which a course on regression analysis is a prerequisite. Some time is spent reviewing the tools of basic regression analysis in Part II, depending on student preparation. The course then moves directly to Part IV and works through forecasting (Chapter 14), estimation of dynamic causal effects (Chapter 15), and advanced topics in time series analysis (Chapter 16), including vector autoregressions and conditional heteroskedasticity. An important component of this course is hands-on forecasting exercises, available to instructors on the book's accompanying Web site. Prerequisites: Algebra II and basic introductory econometrics or the equivalent.


Introduction to Econometric Theory


This book is also suitable for an advanced undergraduate course in which the students have a strong mathematical preparation, or for a master's level course in econometrics. The course briefly reviews the theory of statistics and probability as necessary (Part I). The course introduces regression analysis using the nonmathematical, applications-based treatment of Part II. This introduction is followed by the theoretical development in Chapters 17 and 18 (through Section 18.5). The course then takes up regression with a limited dependent variable (Chapter 11) and maximum likelihood estimation (Appendix 11.2). Next, the course optionally turns to instrumental variables regression and generalized method of moments (Chapter 12 and Section 18.7), time series methods (Chapter 14), and/or the estimation of causal effects using time series data and generalized least squares (Chapter 15 and Section 18.6). Prerequisites: calculus and introductory statistics. Chapter 18 assumes previous exposure to matrix algebra.

Pedagogical Features
The textbook has a variety of pedagogical features aimed at helping students to understand, to retain, and to apply the essential ideas. Chapter introductions provide a real-world grounding and motivation, as well as a brief road map highlighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General interest boxes provide interesting excursions into related topics and highlight real-world studies that use the methods or concepts being discussed in the text. A numbered Summary concluding each chapter serves as a helpful framework for reviewing the main points of coverage. The questions in the Review the Concepts section check students' understanding of the core content, Exercises give more intensive practice working with the concepts and techniques introduced in the chapter, and Empirical Exercises allow the students to apply what they have learned to answer real-world empirical questions. At the end of the textbook, the References section lists sources for further reading, the Appendix provides statistical tables, and a Glossary conveniently defines all the key terms in the book.

Supplements to Accompany the Textbook


The online supplements accompanying the Second Edition of Introduction to Econometrics include the Solutions Manual, Test Bank (by Manfred W. Keil of Claremont McKenna College), and PowerPoint Lecture Notes with text figures, tables, and Key Concepts. The Solutions Manual includes solutions to all the end-of-chapter exercises, while the Test Bank, offered in Test Generator Software (TestGen with QuizMaster), provides a rich supply of easily edited test problems and questions of various types to meet specific course needs. These resources are available for download from the Instructor's Resource Center at www.aw-bc.com/irc. If instructors prefer their supplements on a CD-ROM, our Instructor's Resource Disk, available for Windows and Macintosh, contains the PowerPoint Lecture Notes, the Test Bank, and the Solutions Manual.
In addition, a Companion Web site, found at www.aw-bc.com/stock_watson, provides a wide range of additional resources for students and faculty. These include data sets for all the text examples, replication files for empirical results reported in the text, data sets for the end-of-chapter Empirical Exercises, EViews and STATA tutorials for students, and an Excel add-in for OLS regression.

Acknowledgments
A great many people contributed to the first edition of this book. Our biggest debts of gratitude are to our colleagues at Harvard and Princeton who used early drafts of this book in their classrooms. At Harvard's Kennedy School of Government, Suzanne Cooper provided invaluable suggestions and detailed comments on multiple drafts. As a co-teacher with one of the authors (Stock), she also helped to vet much of the material in this book while it was being developed for a required course for master's students at the Kennedy School. We are also indebted to two other Kennedy School colleagues, Alberto Abadie and Sue Dynarski, for their patient explanations of quasi-experiments and the field of program evaluation and for their detailed comments on early drafts of the text. At Princeton, Eli Tamer taught from an early draft and also provided helpful comments on the penultimate draft of the book.
We also owe much to many of our friends and colleagues in econometrics who spent time talking with us about the substance of this book and who collectively made so many helpful suggestions. Bruce Hansen (University of Wisconsin, Madison) and Bo Honore (Princeton) provided helpful feedback on very early outlines and preliminary versions of the core material in Part II. Joshua Angrist (MIT) and Guido Imbens (University of California, Berkeley) provided thoughtful suggestions about our treatment of materials on program evaluation. Our presentation of the material on time series has benefited from discussions with Yacine Ait-Sahalia (Princeton), Graham Elliott (University of California, San Diego), Andrew Harvey (Cambridge University), and Christopher Sims (Princeton). Finally, many people made helpful suggestions on parts of the manuscript close to their areas of expertise: Don Andrews (Yale), John Bound (University of Michigan), Gregory Chow (Princeton), Thomas Downes (Tufts), David Drukker (StataCorp), Jean Baldwin Grossman (Princeton), Eric Hanushek (the Hoover Institution), James Heckman (University of Chicago), Han Hong (Princeton), Caroline Hoxby (Harvard), Alan Krueger (Princeton), Steven Levitt (University of Chicago), Richard Light (Harvard), David Neumark (Michigan State University), Joseph Newhouse (Harvard), Pierre Perron (Boston University), Kenneth Warner (University of Michigan), and Richard Zeckhauser (Harvard).
Many people were very generous in providing us with data. The California test score data were constructed with the assistance of Les Axelrod of the Standards and Assessments Division, California Department of Education. We are grateful to Charlie DePascale, Student Assessment Services, Massachusetts Department of Education, for his help with aspects of the Massachusetts test score data set. Christopher Ruhm (University of North Carolina, Greensboro) graciously provided us with his data set on drunk driving laws and traffic fatalities. The research department at the Federal Reserve Bank of Boston deserves thanks for putting together their data on racial discrimination in mortgage lending; we particularly thank Geoffrey Tootell for providing us with the updated version of the data set we use in Chapter 9, and Lynn Browne for explaining its policy context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales, which we analyze in Chapter 10, and Alan Krueger (Princeton) for his help with the Tennessee STAR data that we analyze in Chapter 11.
We thank several people for carefully checking the page proofs for errors. Kerry Griffin and Yair Listokin read the entire manuscript, and Andrew Fraker, Ori Heffetz, Amber Henry, Hong Li, Alessandro Tarozzi, and Matt Watson worked through several chapters.
We are also grateful for the many constructive, detailed, and thoughtful comments we received from those who reviewed various drafts for Addison-Wesley:
Michael Abbott, Queen's University, Canada
Richard J. Agnello, University of Delaware
Clopper Almon, University of Maryland
Joshua Angrist, Massachusetts Institute of Technology
Swarnjit S. Arora, University of Wisconsin, Milwaukee
Christopher F. Baum, Boston College
McKinley L. Blackburn, University of South Carolina
Alok Bohara, University of New Mexico
Chi-Young Choi, University of New Hampshire
Dennis Coates, University of Maryland, Baltimore County
Tim Conley, Graduate School of Business, University of Chicago
Douglas Dalenberg, University of Montana


Antony Davies, Duquesne University
Joanne M. Doyle, James Madison University
David Eaton, Murray State University
Adrian R. Fleissig, California State University, Fullerton
Rae Jean B. Goodman, United States Naval Academy
Bruce E. Hansen, University of Wisconsin, Madison
Peter Reinhard Hansen, Brown University
Ian T. Henry, University of Melbourne, Australia
Marc Henry, Columbia University
William Horrace, University of Arizona
Oscar Jorda, University of California, Davis
Frederick L. Joutz, The George Washington University
Elia Kacapyr, Ithaca College
Manfred W. Keil, Claremont McKenna College
Eugene Kroch, Villanova University
Gary Krueger, Macalester College
Kajal Lahiri, State University of New York, Albany
Daniel Lee, Shippensburg University
Tung Liu, Ball State University
Ken Matwiczak, LBJ School of Public Affairs, University of Texas, Austin
KimMarie McGoldrick, University of Richmond
Robert McNown, University of Colorado, Boulder
H. Naci Mocan, University of Colorado, Denver
Mototsugu Shintani, Vanderbilt University
Mico Mrkaic, Duke University
Serena Ng, Johns Hopkins University
Jan Ondrich, Syracuse University
Pierre Perron, Boston University
Robert Phillips, The George Washington University
Simran Sahi, University of Minnesota
Sunil Sapra, California State University, Los Angeles
Frank Schorfheide, University of Pennsylvania
Leslie S. Stratton, Virginia Commonwealth University
Jane Sung, Truman State University
Christopher Taber, Northwestern University
Petra Todd, University of Pennsylvania
John Veitch, University of San Francisco
Edward J. Vytlacil, Stanford University
M. Daniel Westbrook, Georgetown University
Tiemen Woutersen, University of Western Ontario
Phanindra V. Wunnava, Middlebury College
Zhenhui Xu, Georgia College and State University
Yong Yin, State University of New York, Buffalo
Jiangfeng Zhang, University of California, Berkeley
John Xu Zheng, University of Texas, Austin


In the first edition we benefited from the help of an exceptional development editor, Jane Tufts, whose creativity, hard work, and attention to detail improved the book in many ways large and small. Addison-Wesley provided us with first-rate support, starting with our excellent editor, Sylvia Mallory, and extending through the entire publishing team. Jane and Sylvia patiently taught us a lot about writing, organization, and presentation, and their efforts are evident on every page of this book. We extend our thanks to the superb Addison-Wesley team, who worked with us on the second edition: Adrienne D'Ambrosio (senior acquisitions editor), Bridget Page (associate media producer), Charles Spaulding (senior designer), Nancy Fenton (managing editor) and her selection of Nancy Freihofer and Thompson Steele Inc. who handled the entire production process, Heather McNally (supplements coordinator), and Denise Clinton (editor-in-chief). Finally, we had the benefit of Kay Ueno's skilled editing in the second edition.
We also received a great deal of help preparing the second edition. We have been especially pleased by the number of instructors who contacted us directly with thoughtful suggestions for this edition. In particular, the changes made in the second edition incorporate or reflect suggestions, corrections, comments, and help provided by Michael Ash, Laura Chioda, Avinash Dixit, Tom Doan, Susan Dynarski, Graham Elliott, Chris Foote, Roberto E. Jalon Gardella, William Greene, Peter R. Hansen, Bo Honore, Weibin Huang, Michael Jansson, Manfred Keil, Jeffrey Kling, Alan Krueger, Jean-Francois Lamarche, Hong Li, Jeffrey Liebman, Ed McKenna, Chris Murray, Giovanni Oppenheim, Ken Simons, Douglas Staiger, Steve Stauss, George Tauchen, and Samuel Thompson.
This edition (including the new exercises) uses data generously supplied by Marianne Bertrand, John Donohue, Liran Einav, William Evans, Daniel Hamermesh, Ross Levine, John List, Robert Porter, Harvey Rosen, Cecilia Rouse, and Motohiro Yogo. Jim Bathgate, Craig A. Depken II, Elena Pesavento, and Della Lee Sue helped with the exercises and solutions.
We also benefited from thoughtful reviews for the second edition prepared for Addison-Wesley by:

Necati Aydin, Florida A&M University
Jim Bathgate, Linfield College
James Cardon, Brigham Young University
I-Ming Chiu, Minot State University
R. Kim Craft, Southern Utah University
Brad Curs, University of Oregon
Jamie Emerson, Clarkson University
Scott England, California State University, Fresno
Bradley Ewing, Texas Tech University
Barry Falk, Iowa State University
Gary Ferrier, University of Arkansas



Rudy Fichtenbaum, Wright State University
Brian Karl Finch, San Diego State University
Shelby Gerking, University of Central Florida
Edward Greenberg, Washington University
Carolyn J. Heinrich, University of Wisconsin-Madison
Christina Hilmer, Virginia Polytechnic Institute
Luojia Hu, Northwestern University
Tomomi Kumagai, Wayne State University
Tae-Hwy Lee, University of California, Riverside
Elena Pesavento, Emory University
Susan Porter-Hudak, Northern Illinois University
Louis Putterman, Brown University
Sharon Ryan, University of Missouri, Columbia
John Spitzer, SUNY at Brockport
Kyle Steigert, University of Wisconsin, Madison
Norman Swanson, Rutgers University
Justin Tobias, Iowa State University
Charles S. Wassell, Jr., Central Washington University
Rob Wassmer, California State University, Sacramento
Ron Warren, University of Georgia
William Wood, James Madison University

Above all, we are indebted to our families for their endurance throughout this project. Writing this book took a long time; for them, the project must have seemed endless. They, more than anyone, bore the burden of this commitment, and for their help and support we are deeply grateful.

CHAPTER 1

Economic Questions and Data

Ask a half dozen econometricians what econometrics is and you could get a half dozen different answers. One might tell you that econometrics is the science of testing economic theories. A second might tell you that econometrics is the set of tools used for forecasting future values of economic variables, such as a firm's sales, the overall growth of the economy, or stock prices. Another might say that econometrics is the process of fitting mathematical economic models to real-world data. A fourth might tell you that it is the science and art of using historical data to make numerical, or quantitative, policy recommendations in government and business.
In fact, all these answers are right. At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. Econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology.
This book introduces you to the core set of methods used by econometricians. We will use these methods to answer a variety of specific, quantitative questions taken from the world of business and government policy. This chapter poses four of those questions and discusses, in general terms, the econometric approach to answering them. The chapter concludes with a survey of the main types of data available to econometricians for answering these and other quantitative economic questions.

1.1
Economic Questions We Examine


Many decisions in economics, business, and government hinge on understanding relationships among variables in the world around us. These decisions require quantitative answers to quantitative questions.
This book examines several quantitative questions taken from current issues in economics. Four of these questions concern education policy, racial bias in mortgage lending, cigarette consumption, and macroeconomic forecasting.

Question #1: Does Reducing Class Size Improve Elementary School Education?
Proposals for reform of the U.S. public education system generate heated debate. Many of the proposals concern the youngest students, those in elementary schools. Elementary school education has various objectives, such as developing social skills, but for many parents and educators the most important objective is basic academic learning: reading, writing, and basic mathematics. One prominent proposal for improving basic learning is to reduce class sizes at elementary schools. With fewer students in the classroom, the argument goes, each student gets more of the teacher's attention, there are fewer class disruptions, learning is enhanced, and grades improve.
But what, precisely, is the effect on elementary school education of reducing class size? Reducing class size costs money: It requires hiring more teachers and, if the school is already at capacity, building more classrooms. A decision maker contemplating hiring more teachers must weigh these costs against the benefits. To weigh costs and benefits, however, the decision maker must have a precise quantitative understanding of the likely benefits. Is the beneficial effect on basic learning of smaller classes large or small? Is it possible that smaller class size actually has no effect on basic learning?
Although common sense and everyday experience may suggest that more learning occurs when there are fewer students, common sense cannot provide a quantitative answer to the question of what exactly is the effect on basic learning of reducing class size. To provide such an answer, we must examine empirical evidence, that is, evidence based on data, relating class size to basic learning in elementary schools.
In this book, we examine the relationship between class size and basic learning using data gathered from 420 California school districts in 1998. In the California data, students in districts with small class sizes tend to perform better on standardized tests than students in districts with larger classes. While this fact is consistent with the idea that smaller classes produce better test scores, it might simply reflect many other advantages that students in districts with small classes have over their counterparts in districts with large classes. For example, districts with small class sizes tend to have wealthier residents than districts with large classes, so students in small-class districts could have more opportunities for learning outside the classroom. It could be these extra learning opportunities that lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other factors, such as the economic background of the students.

Question #2: Is There Racial Discrimination in the Market for Home Loans?
Most people buy their homes with the help of a mortgage, a large loan secured by the value of the home. By law, U.S. lending institutions cannot take race into account when deciding to grant or deny a request for a mortgage: Applicants who are identical in all ways but their race should be equally likely to have their mortgage applications approved. In theory, then, there should be no racial bias in mortgage lending.
In contrast to this theoretical conclusion, researchers at the Federal Reserve Bank of Boston found (using data from the early 1990s) that 28% of black applicants are denied mortgages, while only 9% of white applicants are denied. Do these data indicate that, in practice, there is racial bias in mortgage lending? If so, how large is it?
The fact that more black than white applicants are denied in the Boston Fed data does not by itself provide evidence of discrimination by mortgage lenders, because the black and white applicants differ in many ways other than their race. Before concluding that there is bias in the mortgage market, these data must be examined more closely to see if there is a difference in the probability of being denied for otherwise identical applicants and, if so, whether this difference is large or small. To do so, in Chapter 11 we introduce econometric methods that make it possible to quantify the effect of race on the chance of obtaining a mortgage, holding constant other applicant characteristics, notably their ability to repay the loan.

Question #3: How Much Do Cigarette Taxes Reduce Smoking?
Cigarette smoking is a major public health concern worldwide. Many of the costs of smoking, such as the medical expenses of caring for those made sick by smoking and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand cigarette smoke, are borne by other members of society. Because these costs are borne by people other than the smoker, there is a role for government intervention in reducing cigarette consumption. One of the most flexible tools for cutting consumption is to increase taxes on cigarettes.
Basic economics says that if cigarette prices go up, consumption will go down. But by how much? If the sales price goes up by 1%, by what percentage will the quantity of cigarettes sold decrease? The percentage change in the quantity demanded resulting from a 1% increase in price is the price elasticity of demand. If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then we need to know the price elasticity to calculate the price increase necessary to achieve this reduction in consumption. But what is the price elasticity of demand for cigarettes?
Although economic theory provides us with the concepts that help us answer this question, it does not tell us the numerical value of the price elasticity of demand. To learn the elasticity we must examine empirical evidence about the behavior of smokers and potential smokers; in other words, we need to analyze data on cigarette consumption and prices.
The data we examine are cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus low cigarette prices, have high smoking rates, and states with high prices have low smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers in the state then local politicians might try to keep cigarette taxes low to satisfy their smoking constituents. In Chapter 12 we study methods for handling this "simultaneous causality" and use those methods to estimate the price elasticity of cigarette demand.

Question #4: What Will the Rate of Inflation Be Next Year?
It seems that people always want a sneak preview of the future. What will sales be next year at a firm considering investing in new equipment? Will the stock market go up next month and, if so, by how much? Will city tax receipts next year cover planned expenditures on city services? Will your microeconomics exam next week focus on externalities or monopolies? Will Saturday be a nice day to go to the beach?
One aspect of the future in which macroeconomists and financial economists are particularly interested is the rate of overall price inflation during the next year. A financial professional might advise a client whether to make a loan or to take one out at a given rate of interest, depending on her best guess of the rate of inflation over the coming year. Economists at central banks, like the Federal Reserve Board in Washington, D.C., and the European Central Bank in Frankfurt, Germany, are responsible for keeping the rate of price inflation under control, so their decisions about how to set interest rates rely on the outlook for inflation over the next year. If they think the rate of inflation will increase by a percentage point, then they might increase interest rates by more than that to slow down an economy that, in their view, risks overheating. If they guess wrong, they risk causing either an unnecessary recession or an undesirable jump in the rate of inflation.
Professional economists who rely on precise numerical forecasts use econometric models to make those forecasts. A forecaster's job is to predict the future using the past, and econometricians do this by using economic theory and statistical techniques to quantify relationships in historical data.
The data we use to forecast inflation are the rates of inflation and unemployment in the United States. An important empirical relationship in macroeconomic data is the "Phillips curve," in which a currently low value of the unemployment rate is associated with an increase in the rate of inflation over the next year. One of the inflation forecasts we develop and evaluate in Chapter 14 is based on the Phillips curve.

Quantitative Questions, Quantitative Answers

Each of these four questions requires a numerical answer. Economic theory provides clues about that answer (cigarette consumption ought to go down when the price goes up) but the actual value of the number must be learned empirically, that is, by analyzing data. Because we use data to answer quantitative questions, our answers always have some uncertainty: A different set of data would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question and a measure of how precise the answer is.
The conceptual framework used in this book is the multiple regression model, the mainstay of econometrics. This model, introduced in Part II, provides a mathematical way to quantify how a change in one variable affects another variable, holding other things constant. For example, what effect does a change in class size have on test scores, holding constant student characteristics (such as family income) that a school district administrator cannot control? What effect does your race have on your chances of having a mortgage application granted, holding constant other factors such as your ability to repay the loan? What effect does a 1% increase in the price of cigarettes have on cigarette consumption, holding constant the income of smokers and potential smokers? The multiple regression model and its extensions provide a framework for answering these questions using data and for quantifying the uncertainty associated with those answers.
1.2
Causal Effects and Idealized Experiments


Like many questions e ncountered in econome rrics, the rtrst three questions in Seclion 1.1 con~rn causal relationships among varia bles. In common usage. a n action
is said to cause an outcome if the outcome is rhe direct result, or conseque nce, of
that action. Touching a hot stove causes you to get burned; drinking wa ter causes
you to be less thirsty; putting air in your tires causes them to inflate; pulling te rtilizer o n your tomato plants causes the m to produce mo re tomatoes. Causality
means that a specific actio n (applying fe rtilizer) leads to a specific, measurable
conseque nce (more tomatoes).

Estimation of Causal Effects


How best might we measure the causal effect on tomato yield (measured in kilograms) of applying a certain amount of fertilizer, say 100 grams of fertilizer per square meter?
One way to measure this causal effect is to conduct an experiment. In that experiment, a horticultural researcher plants many plots of tomatoes. Each plot is tended identically, with one exception: Some plots get 100 grams of fertilizer per square meter, while the rest get none. Moreover, whether a plot is fertilized or not is determined randomly by a computer, ensuring that any other differences between the plots are unrelated to whether they receive fertilizer. At the end of the growing season, the horticulturalist weighs the harvest from each plot. The difference between the average yield per square meter of the treated and untreated plots is the effect on tomato production of the fertilizer treatment.
This is an example of a randomized controlled experiment. It is controlled in the sense that there are both a control group that receives no treatment (no fertilizer) and a treatment group that receives the treatment (100 g/m² of fertilizer). It is randomized in the sense that the treatment is assigned randomly. This random assignment eliminates the possibility of a systematic relationship between, for example, how sunny the plot is and whether it receives fertilizer, so that the only systematic difference between the treatment and control groups is the treatment. If this experiment is properly implemented on a large enough scale, then it will yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m² of fertilizer).
In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.
It is possible to imagine an ideal randomized controlled experiment to answer each of the first three questions in Section 1.1. For example, to study class size one can imagine randomly assigning "treatments" of different class sizes to different groups of students. If the experiment is designed and executed so that the only systematic difference between the groups of students is their class size, then in theory this experiment would estimate the effect on test scores of reducing class size, holding all else constant.
The concept of an ideal randomized controlled experiment is useful because it gives a definition of a causal effect. In practice, however, it is not possible to perform ideal experiments. In fact, experiments are rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment does, however, provide a theoretical benchmark for an econometric analysis of causal effects using actual data.
Forecasting and Causality


Although the first three questions in Section 1.1 concern causal effects, the fourth, forecasting inflation, does not. You do not need to know a causal relationship to make a good forecast. A good way to "forecast" if it is raining is to observe whether pedestrians are using umbrellas, but the act of using an umbrella does not cause it to rain.
Even though forecasting need not involve causal relationships, economic theory suggests patterns and relationships that might be useful for forecasting. As we see in Chapter 14, multiple regression analysis allows us to quantify historical relationships suggested by economic theory, to check whether those relationships have been stable over time, to make quantitative forecasts about the future, and to assess the accuracy of those forecasts.

1.3
Data: Sources and Types


In cconumetn~ uata come from one ol t\\O siJU Tl ~: ~;xp(.. 1mc'1h nr nunexplnmcntaloh-.crvatinn... IJf the woriJ. Th1') boL'k exa mmt:~ both t:C\pcnm~;nt I anJ nunexperimental Jata "ct-..

Experimental versus Observational Data


Experimental data come from experiments designed to evaluate a treatment or policy or to investigate a causal effect. For example, the state of Tennessee financed a large randomized controlled experiment examining class size in the 1980s. In that experiment, which we examine in Chapter 13, thousands of students were randomly assigned to classes of different sizes for several years and were given annual standardized tests.
The Tennessee class size experiment cost millions of dollars and required the ongoing cooperation of many administrators, parents, and teachers over several years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances experiments are not only expensive and difficult to administer but also unethical. (Would it be ethical to offer randomly selected teenagers inexpensive cigarettes to see how many they buy?) Because of these financial, practical, and ethical problems, experiments in economics are rare. Instead, most economic data are obtained by observing real-world behavior.
Data obtained by observing actual behavior outside an experimental setting are called observational data. Observational data are collected using surveys, such as a telephone survey of consumers, and administrative records, such as historical records on mortgage applications maintained by lending institutions.
Observational data pose major challenges to econometric attempts to estimate causal effects, and the tools of econometrics are designed to tackle these challenges. In the real world, levels of "treatment" (the amount of fertilizer in the tomato example, the student-teacher ratio in the class size example) are not assigned at random, so it is difficult to sort out the effect of the "treatment" from other relevant factors. Much of econometrics, and much of this book, is devoted to methods for meeting the challenges encountered when real-world data are used to estimate causal effects.
Whether the data are experimental or observational, data sets come in three main types: cross-sectional data, time series data, and panel data. In this book you will encounter all three types.

Cross-Sectional Data
Data on different entities (workers, consumers, firms, governmental units, and so forth) for a single time period are called cross-sectional data. For example, the data on test scores in California school districts are cross-sectional. Those data are for 420 entities (school districts) for a single time period (1998). In general, the number of entities on which we have observations is denoted by n; so, for example, in the California data set n = 420.
The California test score data set contains measurements of several different variables for each district. Some of these data are tabulated in Table 1.1. Each row lists data for a different district. For example, the average test score for the first district ("district #1") is 690.8; this is the average of the math and science test scores for all fifth graders in that district in 1998 on a standardized test (the Stanford Achievement Test). The average student-teacher ratio in that district is 17.89; that is, the number of students in district #1 divided by the number of classroom teachers in district #1 is 17.89. Average expenditure per pupil in district #1 is $6,385. The percentage of students in that district still learning English, that is, the percentage of students for whom English is a second language and who are not yet proficient in English, is 0%.
The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, all the variables listed vary considerably.
With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms, or other economic entities during a single time period.

Time Series Data


TABLE 1.1  Selected Observations on Test Scores and Other Variables for California School Districts in 1998

Observation          District Average Test   Student-Teacher   Expenditure     Percentage of Students
(District) Number    Score (Fifth Grade)     Ratio             per Pupil ($)   Learning English
1                    690.8                   17.89             6385            0.0
2                    661.2                   21.52             5099            4.6
3                    641.6                   18.70             5502            30.0
4                    647.7                   17.36             7102            0.0
5                    640.8                   18.67             5236            13.9
...
418                  645.0                   21.89             4403            24.3
419                  672.2                   20.20             4776            3.0
420                  655.8                   19.04             5993            5.0

Time series data are data for a single entity (person, firm, country) collected at multiple time periods. Our data set on the rates of inflation and unemployment in the United States is an example of a time series data set. The data set contains observations on two variables (the rates of inflation and unemployment) for a single entity (the United States) for 183 time periods. Each time period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The observations in this data set begin in the second quarter of 1959, which is denoted 1959:II, and end in the fourth quarter of 2004 (2004:IV). The number of observations (that is, time periods) in a time series data set is denoted by T. Because there are 183 quarters from 1959:II to 2004:IV, this data set contains T = 183 observations.
Some observations in this data set are listed in Table 1.2. The data in each row correspond to a different time period (year and quarter). In the second quarter of 1959, for example, the rate of price inflation was 0.7% per year at an annual rate. In other words, if inflation had continued for 12 months at its rate during the second quarter of 1959, the overall price level (as measured by the Consumer Price Index, CPI) would have increased by 0.7%. In the second quarter of 1959, the rate of unemployment was 5.1%; that is, 5.1% of the labor force reported that they did not have a job but were looking for work. In the third quarter of 1959, the rate of CPI inflation was 2.1%, and the rate of unemployment was 5.3%.


TABLE 1.2  Selected Observations on the Rates of Consumer Price Index (CPI) Inflation and Unemployment in the United States: Quarterly Data, 1959-2004

Observation Number   CPI Inflation Rate               Unemployment
(Year:quarter)       (% per year at an annual rate)   Rate (%)
1959:II              0.7                              5.1
1959:III             2.1                              5.3
1959:IV              0.4                              5.6
1960:I               2.4                              5.2
...
2004:II              4.3                              5.6
2004:III             1.6                              5.4
2004:IV              3.5                              5.4

By tracking a single entity over time, time series data can be used to study the evolution of variables over time and to forecast future values of those variables.

Panel Data
Panel data, also called longitudinal data, are data for multiple entities in which each entity is observed at two or more time periods. Our data on cigarette consumption and prices are an example of a panel data set, and selected variables and observations in that data set are listed in Table 1.3. The number of entities in a panel data set is denoted by n, and the number of time periods is denoted by T. In the cigarette data set, we have observations on n = 48 continental U.S. states (entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total of n × T = 48 × 11 = 528 observations.
Some data from the cigarette consumption data set are listed in Table 1.3. The first block of 48 observations lists the data for each state in 1985, organized alphabetically from Alabama to Wyoming. The next block of 48 observations lists the data for 1986, and so forth, through 1995. For example, in 1985, cigarette sales in Arkansas were 128.5 packs per capita (the total number of packs of cigarettes sold

CHAPTER 1   Economic Questions and Data

TABLE 1.3 Selected Observations on Cigarette Sales, Prices, and Taxes, by State and Year for U.S. States, 1985–1995

Observation Number   State           Year   Cigarette Sales (packs per capita)   Average Price per Pack (including taxes)   Total Taxes (cigarette excise tax + sales tax)
1                    Alabama         1985   116.5                                $1.022                                     $0.333
2                    Arkansas        1985   128.5                                1.015                                      0.370
3                    Arizona         1985   104.5                                1.086                                      0.362
…
47                   West Virginia   1985   112.8                                1.089                                      0.382
48                   Wyoming         1985   129.4                                0.935                                      0.240
49                   Alabama         1986   117.2                                1.080                                      0.334
…
96                   Wyoming         1986   …                                    1.007                                      0.240
97                   Alabama         1987   115.8                                1.135                                      0.335
…
528                  Wyoming         1995   112.2                                …                                          0.360

Note: The cigarette consumption data set is described in Appendix 12.1.

in Arkansas in 1985 divided by the total population of Arkansas in 1985 equals 128.5). The average price of a pack of cigarettes in Arkansas in 1985, including tax, was $1.015, of which 37¢ went to federal, state, and local taxes.
Panel data can be used to learn about economic relationships from the experiences of the many different entities in the data set and from the evolution over time of the variables for each entity.
The definitions of cross-sectional data, time series data, and panel data are summarized in Key Concept 1.1.
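The three kinds of data sets can be sketched in a few lines of code. The sketch below is illustrative only and is not from the text: the entity names, periods, and values are placeholders, and the n × T count mirrors the cigarette panel (n = 48, T = 11). Python is used here and in the sketches that follow.

```python
# Illustrative sketch (hypothetical placeholder values, not the textbook's data sets).

# Cross-sectional data: many entities, one time period.
cross_section = {"Alabama": 116.5, "Arkansas": 128.5, "Arizona": 104.5}  # sales, 1985

# Time series data: one entity, many time periods.
time_series = {"1959:II": 0.7, "1959:III": 2.1}  # CPI inflation, % per year

# Panel data: every entity observed in every time period.
states = ["state_%d" % i for i in range(48)]   # n = 48 entities (placeholder names)
years = range(1985, 1996)                      # T = 11 time periods
panel = [(s, y) for s in states for y in years]

# A balanced panel has n x T observations.
assert len(panel) == 48 * 11 == 528
```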

KEY CONCEPT 1.1
Cross-Sectional, Time Series, and Panel Data

Cross-sectional data consist of multiple entities observed at a single time period.
Time series data consist of a single entity observed at multiple time periods.
Panel data (also known as longitudinal data) consist of multiple entities, where each entity is observed at two or more time periods.

Summary
1. Many decisions in business and economics require quantitative estimates of how a change in one variable affects another variable.
2. Conceptually, the way to estimate a causal effect is in an ideal randomized controlled experiment, but performing such experiments in economic applications is usually unethical, impractical, or too expensive.
3. Econometrics provides tools for estimating causal effects using either observational (nonexperimental) data or data from real-world, imperfect experiments.
4. Cross-sectional data are gathered by observing multiple entities at a single point in time; time series data are gathered by observing a single entity at multiple points in time; and panel data are gathered by observing multiple entities, each of which is observed at multiple points in time.

Key Terms
randomized controlled experiment (8)
control group (8)
treatment group (9)
causal effect (9)
experimental data (10)
observational data (10)
cross-sectional data (11)
observation number (11)
time series data (11)
panel data (13)
longitudinal data (13)


Review the Concepts
1.1 Design a hypothetical ideal randomized controlled experiment to study the effect of hours spent studying on performance on microeconomics exams. Suggest some impediments to implementing this experiment in practice.
1.2 Design a hypothetical ideal randomized controlled experiment to study the effect on highway traffic deaths of wearing seat belts. Suggest some impediments to implementing this experiment in practice.
1.3 You are asked to study the relationship between hours spent on employee training (measured in hours per worker per week) in a manufacturing plant and the productivity of its workers (output per worker per hour). Describe:
a. an ideal randomized controlled experiment to measure this causal effect;
b. an observational cross-sectional data set with which you could study this effect;
c. an observational time series data set for studying this effect; and
d. an observational panel data set for studying this effect.

CHAPTER 2
Review of Probability

This chapter reviews the core ideas of the theory of probability that are needed to understand regression analysis and econometrics. We assume that you have taken an introductory course in probability and statistics. If your knowledge of probability is stale, you should refresh it by reading this chapter. If you feel confident with the material, you still should skim the chapter and the terms and concepts at the end to make sure you are familiar with the ideas and notation.
Most aspects of the world around us have an element of randomness. The theory of probability provides mathematical tools for quantifying and describing this randomness. Section 2.1 reviews probability distributions for a single random variable, and Section 2.2 covers the mathematical expectation, mean, and variance of a single random variable. Most of the interesting problems in economics involve more than one variable, and Section 2.3 introduces the basic elements of probability theory for two random variables. Section 2.4 discusses three special probability distributions that play a central role in statistics and econometrics: the normal, chi-squared, and F distributions.
The final two sections of this chapter focus on a specific source of randomness of central importance in econometrics: the randomness that arises by randomly drawing a sample of data from a larger population. For example, suppose you survey ten recent college graduates selected at random, record (or "observe") their earnings, and compute the average earnings using these ten data points (or "observations"). Because you chose the sample at random, you


could have chosen ten different graduates by pure random chance; had you done so, you would have observed ten different earnings and you would have computed a different sample average. Because the average earnings vary from one randomly chosen sample to the next, the sample average is itself a random variable. Therefore, the sample average has a probability distribution, referred to as its sampling distribution because this distribution describes the different possible values of the sample average that might have occurred had a different sample been drawn.
Section 2.5 discusses random sampling and the sampling distribution of the sample average. This sampling distribution is, in general, complicated. When the sample size is sufficiently large, however, the sampling distribution of the sample average is approximately normal, a result known as the central limit theorem, which is discussed in Section 2.6.

2.1 Random Variables and Probability Distributions

Probabilities, the Sample Space, and Random Variables
Probabilities and outcomes. The gender of the next new person you meet, your grade on an exam, and the number of times your computer will crash while you are writing a term paper all have an element of chance or randomness. In each of these examples, there is something not yet known that is eventually revealed.
The mutually exclusive potential results of a random process are called the outcomes. For example, your computer might never crash, it might crash once, it might crash twice, and so on. Only one of these outcomes will actually occur (the outcomes are mutually exclusive), and the outcomes need not be equally likely.
The probability of an outcome is the proportion of the time that the outcome occurs in the long run. If the probability of your computer not crashing while you are writing a term paper is 80%, then over the course of writing many term papers you will complete 80% without a crash.


The sample space and events. The set of all possible outcomes is called the sample space. An event is a subset of the sample space, that is, an event is a set of one or more outcomes. The event "my computer will crash no more than once" is the set consisting of two outcomes: "no crashes" and "one crash."

Random variables. A random variable is a numerical summary of a random outcome. The number of times your computer crashes while you are writing a term paper is random and takes on a numerical value, so it is a random variable.
Some random variables are discrete and some are continuous. As their names suggest, a discrete random variable takes on only a discrete set of values, like 0, 1, 2, …, whereas a continuous random variable takes on a continuum of possible values.

Probability Distribution of a Discrete Random Variable

Probability distribution. The probability distribution of a discrete random variable is the list of all possible values of the variable and the probability that each value will occur. These probabilities sum to 1.
For example, let M be the number of times your computer crashes while you are writing a term paper. The probability distribution of the random variable M is the list of probabilities of each possible outcome: the probability that M = 0, denoted Pr(M = 0), is the probability of no computer crashes; Pr(M = 1) is the probability of a single computer crash; and so forth. An example of a probability distribution for M is given in the second row of Table 2.1; in this distribution, if your computer crashes four times, you will quit and write the paper by hand. According to this distribution, the probability of no crashes is 80%; the probability of one crash is 10%; and the probability of two, three, or four crashes is, respectively, 6%, 3%, and 1%. These probabilities sum to 100%. This probability distribution is plotted in Figure 2.1.

Probabilities of events. The probability of an event can be computed from the probability distribution. For example, the probability of the event of one or two crashes is the sum of the probabilities of the constituent outcomes. That is, Pr(M = 1 or M = 2) = Pr(M = 1) + Pr(M = 2) = 0.10 + 0.06 = 0.16, or 16%.
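As a quick illustrative check (a sketch, not part of the text), the distribution in Table 2.1 and this event probability can be written out directly; the dictionary below just transcribes the second row of the table:

```python
# Probability distribution of M from Table 2.1.
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

# The probabilities of the mutually exclusive outcomes sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Pr(M = 1 or M = 2) = Pr(M = 1) + Pr(M = 2) = 0.16.
p_one_or_two = pmf[1] + pmf[2]
assert abs(p_one_or_two - 0.16) < 1e-12
```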

Cumulative probability distribution. The cumulative probability distribution is the probability that the random variable is less than or equal to a particular value. The last row of Table 2.1 gives the cumulative probability distribution of the random

TABLE 2.1 Probability of Your Computer Crashing M Times

Outcome (number of crashes)            0      1      2      3      4
Probability distribution               0.80   0.10   0.06   0.03   0.01
Cumulative probability distribution    0.80   0.90   0.96   0.99   1.00

variable M. For example, the probability of at most one crash, Pr(M ≤ 1), is 90%, which is the sum of the probabilities of no crashes (80%) and of one crash (10%).
A cumulative probability distribution is also referred to as a cumulative distribution function, a c.d.f., or a cumulative distribution.
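The last row of Table 2.1 is just the running sum of the second row. A minimal sketch of that computation (illustrative, not from the text):

```python
from itertools import accumulate

probs = [0.80, 0.10, 0.06, 0.03, 0.01]          # Pr(M = 0), ..., Pr(M = 4)
cdf = [round(c, 2) for c in accumulate(probs)]  # running sums, rounded for display

# Matches the last row of Table 2.1.
assert cdf == [0.80, 0.90, 0.96, 0.99, 1.00]

# Pr(M <= 1) = 90%: the sum of Pr(M = 0) and Pr(M = 1).
assert cdf[1] == 0.90
```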

The Bernoulli distribution. An important special case of a discrete random variable is when the random variable is binary, that is, the outcomes are 0 or 1. A binary random variable is called a Bernoulli random variable (in honor of the seventeenth-century Swiss mathematician and scientist Jacob Bernoulli), and its probability distribution is called the Bernoulli distribution.

FIGURE 2.1 Probability Distribution of the Number of Computer Crashes

[Bar chart of probability against the number of crashes.] The height of each bar is the probability that the computer crashes the indicated number of times. The height of the first bar is 0.8, so the probability of 0 computer crashes is 80%. The height of the second bar is 0.1, so the probability of 1 computer crash is 10%, and so forth for the other bars.


For example, let G be the gender of the next new person you meet, where G = 0 indicates that the person is male and G = 1 indicates that she is female. The outcomes of G and their probabilities thus are

G = 1 with probability p, and
G = 0 with probability 1 − p,   (2.1)

where p is the probability of the next new person you meet being a woman. The probability distribution in Equation (2.1) is the Bernoulli distribution.

Probability Distribution of a Continuous Random Variable

Cumulative probability distribution. The cumulative probability distribution for a continuous variable is defined just as it is for a discrete random variable. That is, the cumulative probability distribution of a continuous random variable is the probability that the random variable is less than or equal to a particular value.
For example, consider a student who drives from home to school. This student's commuting time can take on a continuum of values and, because it depends on random factors such as the weather and traffic conditions, it is natural to treat it as a continuous random variable. Figure 2.2a plots a hypothetical cumulative distribution of commuting times. For example, the probability that the commute takes less than 15 minutes is 20% and the probability that it takes less than 20 minutes is 78%.

Probability density function. Because a continuous random variable can take on a continuum of possible values, the probability distribution used for discrete variables, which lists the probability of each possible value of the random variable, is not suitable for continuous variables. Instead, the probability is summarized by the probability density function. The area under the probability density function between any two points is the probability that the random variable falls between those two points. A probability density function is also called a p.d.f., a density function, or simply a density.
Figure 2.2b plots the probability density function of commuting times corresponding to the cumulative distribution in Figure 2.2a. The probability that the commute takes between 15 and 20 minutes is given by the area under the p.d.f. between 15 minutes and 20 minutes, which is 0.58, or 58%. Equivalently, this probability can be seen on the cumulative distribution in Figure 2.2a as the difference


FIGURE 2.2 Cumulative Distribution and Probability Density Functions of Commuting Time

[Panel (a): the cumulative distribution function of commuting time in minutes. Panel (b): the probability density, with the area between 15 and 20 minutes shaded and Pr(Commuting time > 20) = 0.22 marked.]

Figure 2.2a shows the cumulative probability distribution (or c.d.f.) of commuting times. The probability that a commuting time is less than 15 minutes is 0.20 (or 20%), and the probability that it is less than 20 minutes is 0.78 (78%). Figure 2.2b shows the probability density function (or p.d.f.) of commuting times. Probabilities are given by areas under the p.d.f. The probability that a commuting time is between 15 and 20 minutes is 0.58 (58%), and is given by the area under the curve between 15 and 20 minutes.


between the probability that the commute is less than 20 minutes (78%) and the probability that it is less than 15 minutes (20%). Thus, the probability density function and the cumulative probability distribution show the same information in different formats.

2.2 Expected Values, Mean, and Variance


The Expected Value of a Random Variable

Expected value. The expected value of a random variable Y, denoted E(Y), is the long-run average value of the random variable over many repeated trials or occurrences. The expected value of a discrete random variable is computed as a weighted average of the possible outcomes of that random variable, where the weights are the probabilities of that outcome. The expected value of Y is also called the expectation of Y or the mean of Y and is denoted μ_Y.
For example, suppose you loan a friend $100 at 10% interest. If the loan is repaid you get $110 (the principal of $100 plus interest of $10), but there is a risk of 1% that your friend will default and you will get nothing at all. Thus, the amount you are repaid is a random variable that equals $110 with probability 0.99 and equals $0 with probability 0.01. Over many such loans, 99% of the time you would be paid back $110, but 1% of the time you would get nothing, so on average you would be repaid $110 × 0.99 + $0 × 0.01 = $108.90. Thus the expected value of your repayment (or the "mean repayment") is $108.90.
As a second example, consider the number of computer crashes M with the probability distribution given in Table 2.1. The expected value of M is the average number of crashes over many term papers, weighted by the frequency with which a crash of a given size occurs. Accordingly,

E(M) = 0 × 0.80 + 1 × 0.10 + 2 × 0.06 + 3 × 0.03 + 4 × 0.01 = 0.35.   (2.2)

That is, the expected number of computer crashes while writing a term paper is 0.35. Of course, the actual number of crashes must always be an integer; it makes no sense to say that the computer crashed 0.35 times while writing a particular term paper! Rather, the calculation in Equation (2.2) means that the average number of crashes over many such term papers is 0.35.
The formula for the expected value of a discrete random variable Y that can take on k different values is given as Key Concept 2.1.
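The probability-weighted sum in Key Concept 2.1 translates directly into code. The following sketch (illustrative, not from the text) reproduces the calculation in Equation (2.2):

```python
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # Table 2.1

def expected_value(pmf):
    """E(Y) = y1*p1 + y2*p2 + ... + yk*pk, the probability-weighted average."""
    return sum(y * p for y, p in pmf.items())

# Equation (2.2): E(M) = 0.35 crashes per term paper.
assert abs(expected_value(pmf) - 0.35) < 1e-12
```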

Expected value of a Bernoulli random variable. An important special case of the general formula in Key Concept 2.1 is the mean of a Bernoulli random

KEY CONCEPT 2.1
EXPECTED VALUE AND THE MEAN

Suppose the random variable Y takes on k possible values y_1, …, y_k, where y_1 denotes the first value, y_2 denotes the second value, and so forth, and that the probability that Y takes on y_1 is p_1, the probability that Y takes on y_2 is p_2, and so forth. The expected value of Y, denoted E(Y), is

E(Y) = y_1 p_1 + y_2 p_2 + ⋯ + y_k p_k = Σ_{i=1}^{k} y_i p_i,   (2.3)

where the notation "Σ_{i=1}^{k} y_i p_i" means "the sum of y_i p_i for i running from 1 to k." The expected value of Y is also called the mean of Y or the expectation of Y and is denoted μ_Y.

variable. Let G be the Bernoulli random variable with the probability distribution in Equation (2.1). The expected value of G is

E(G) = 1 × p + 0 × (1 − p) = p.   (2.4)

Thus the expected value of a Bernoulli random variable is p, the probability that it takes on the value "1."
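That E(G) = p is a long-run average can be checked by simulation: the sample mean of many Bernoulli draws should be close to p. This sketch is illustrative only; the value p = 0.6 is an arbitrary choice, not from the text:

```python
import random

random.seed(0)                 # fixed seed so the check is reproducible
p = 0.6
n = 100_000
draws = [1 if random.random() < p else 0 for _ in range(n)]

# The sample average of many draws is close to E(G) = p,
# up to sampling error of order sqrt(p*(1-p)/n) (about 0.0015 here).
sample_mean = sum(draws) / n
assert abs(sample_mean - p) < 0.01
```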

Expected value of a continuous random variable. The expected value of a continuous random variable is also the probability-weighted average of the possible outcomes of the random variable. Because a continuous random variable can take on a continuum of possible values, the formal mathematical definition of its expectation involves calculus and its definition is given in Appendix 17.1.

The Standard Deviation and Variance

The variance and standard deviation measure the dispersion or the "spread" of a probability distribution. The variance of a random variable Y, denoted var(Y), is the expected value of the square of the deviation of Y from its mean: var(Y) = E[(Y − μ_Y)²].
Because the variance involves the square of Y, the units of the variance are the units of the square of Y, which makes the variance awkward to interpret. It is therefore common to measure the spread by the standard deviation, which is the square root of the variance and is denoted σ_Y. The standard deviation has the same units as Y. These definitions are summarized in Key Concept 2.2.


KEY CONCEPT 2.2
VARIANCE AND STANDARD DEVIATION

The variance of the discrete random variable Y, denoted σ_Y², is

σ_Y² = var(Y) = E[(Y − μ_Y)²] = Σ_{i=1}^{k} (y_i − μ_Y)² p_i.   (2.5)

The standard deviation of Y is σ_Y, the square root of the variance. The units of the standard deviation are the same as the units of Y.

For example, the variance of the number of computer crashes M is the probability-weighted average of the squared difference between M and its mean, 0.35:

var(M) = (0 − 0.35)² × 0.80 + (1 − 0.35)² × 0.10 + (2 − 0.35)² × 0.06 + (3 − 0.35)² × 0.03 + (4 − 0.35)² × 0.01 = 0.6475.   (2.6)

The standard deviation of M is the square root of the variance, so σ_M = √0.6475 ≅ 0.80.
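Key Concept 2.2 likewise translates directly into code; this sketch (illustrative, not from the text) reproduces Equation (2.6):

```python
import math

pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # Table 2.1
mu = sum(y * p for y, p in pmf.items())              # E(M) = 0.35

# var(M): probability-weighted average of squared deviations from the mean.
variance = sum((y - mu) ** 2 * p for y, p in pmf.items())
sigma = math.sqrt(variance)

assert abs(variance - 0.6475) < 1e-12   # Equation (2.6)
assert abs(sigma - 0.80) < 0.01         # sigma_M = sqrt(0.6475), roughly 0.80
```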

Variance of a Bernoulli random variable. The mean of the Bernoulli random variable G with probability distribution in Equation (2.1) is μ_G = p [Equation (2.4)], so its variance is

var(G) = σ_G² = (0 − p)² × (1 − p) + (1 − p)² × p = p(1 − p).   (2.7)

Thus the standard deviation of a Bernoulli random variable is σ_G = √(p(1 − p)).

Mean and Variance of a Linear Function of a Random Variable

This section discusses random variables (say, X and Y) that are related by a linear function. For example, consider an income tax scheme under which a worker is taxed at a rate of 20% on his or her earnings and then given a (tax-free) grant of $2,000. Under this tax scheme, after-tax earnings Y are related to pre-tax earnings X by the equation

Y = 2000 + 0.8X.   (2.8)

That is, after-tax earnings Y is 80% of pre-tax earnings X, plus $2,000.



Suppose an individual's pre-tax earnings next year are a random variable with mean μ_X and variance σ_X². Because pre-tax earnings are random, so are after-tax earnings. What are the mean and standard deviation of her after-tax earnings under this tax? After taxes, her earnings are 80% of the original pre-tax earnings, plus $2,000. Thus the expected value of her after-tax earnings is

E(Y) = μ_Y = 2000 + 0.8μ_X.   (2.9)

The variance of after-tax earnings is the expected value of (Y − μ_Y)². Because Y = 2000 + 0.8X, Y − μ_Y = 2000 + 0.8X − (2000 + 0.8μ_X) = 0.8(X − μ_X). Thus, E[(Y − μ_Y)²] = E{[0.8(X − μ_X)]²} = 0.64E[(X − μ_X)²]. It follows that var(Y) = 0.64var(X), so, taking the square root of the variance, the standard deviation of Y is

σ_Y = 0.8σ_X.   (2.10)

That is, the standard deviation of the distribution of her after-tax earnings is 80% of the standard deviation of the distribution of pre-tax earnings.
This analysis can be generalized so that Y depends on X with an intercept a (instead of $2,000) and a slope b (instead of 0.8), so that

Y = a + bX.   (2.11)

Then the mean and variance of Y are

μ_Y = a + bμ_X and   (2.12)
σ_Y² = b²σ_X²,   (2.13)

and the standard deviation of Y is σ_Y = bσ_X. The expressions in Equations (2.9) and (2.10) are applications of the more general formulas in Equations (2.12) and (2.13) with a = 2000 and b = 0.8.
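Equations (2.12) and (2.13) can be illustrated by simulation for the tax example. In the sketch below the pre-tax earnings distribution (normal with mean $30,000 and standard deviation $5,000) is invented purely for illustration; it is not from the text:

```python
import random
import statistics

random.seed(1)
a, b = 2000, 0.8                                          # grant and (1 - tax rate)
X = [random.gauss(30_000, 5_000) for _ in range(50_000)]  # hypothetical pre-tax earnings
Y = [a + b * x for x in X]                                # after-tax earnings, Y = a + bX

# mu_Y = a + b*mu_X  [Equation (2.12)] and sigma_Y = b*sigma_X (for b > 0).
assert abs(statistics.mean(Y) - (a + b * statistics.mean(X))) < 1e-6
assert abs(statistics.stdev(Y) - b * statistics.stdev(X)) < 1e-6
```

Note that these identities hold exactly for any sample, because the mean and standard deviation are themselves linear in the transformation a + bX; the simulation merely makes that concrete.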

Other Measures of the Shape of a Distribution

The mean and standard deviation measure two important features of a distribution: its center (the mean) and its spread (the standard deviation). This section discusses measures of two other features of a distribution: the skewness, which measures the lack of symmetry of a distribution, and the kurtosis, which measures


how thick, or "heavy," are its tails. The mean, variance, skewness, and kurtosis are all based on what are called the moments of a distribution.

Skewness. Figure 2.3 plots four distributions, two which are symmetric and two which are not. Visually, the distribution in Figure 2.3d appears to deviate more from symmetry than does the distribution in Figure 2.3c. The skewness of a distribution provides a mathematical way to describe how much a distribution deviates from symmetry.
The skewness of the distribution of a random variable Y is

Skewness = E[(Y − μ_Y)³] / σ_Y³,   (2.14)

where σ_Y is the standard deviation of Y. For a symmetric distribution, a value of Y a given amount above its mean is just as likely as a value of Y the same amount below its mean. If so, then positive values of (Y − μ_Y)³ will be offset on average (in expectation) by equally likely negative values. Thus, for a symmetric distribution, E[(Y − μ_Y)³] = 0: the skewness of a symmetric distribution is zero. If a distribution is not symmetric, then a positive value of (Y − μ_Y)³ generally is not offset on average by an equally likely negative value, so the skewness is nonzero for a distribution that is not symmetric. Dividing by σ_Y³ in the denominator of Equation (2.14) cancels the units of Y³ in the numerator, so the skewness is unit free; in other words, changing the units of Y does not change its skewness.
Below each of the four distributions in Figure 2.3 is its skewness. If a distribution has a long right tail, positive values of (Y − μ_Y)³ are not fully offset by negative values, and the skewness is positive. If a distribution has a long left tail, its skewness is negative.

Kurtosis. The kurtosis of a distribution is a measure of how much mass is in its tails and, therefore, is a measure of how much of the variance of Y arises from extreme values. An extreme value of Y is called an outlier. The greater the kurtosis of a distribution, the more likely are outliers.
The kurtosis of the distribution of Y is

Kurtosis = E[(Y − μ_Y)⁴] / σ_Y⁴.   (2.15)

If a distribution has a large amount of mass in its tails, then some extreme departures of Y from its mean are likely, and these very large values will lead to large values, on average (in expectation), of (Y − μ_Y)⁴. Thus, for a distribution with a large amount of mass in its tails, the kurtosis will be large. Because (Y − μ_Y)⁴ cannot be negative, the kurtosis cannot be negative.
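For a discrete distribution, both measures can be computed directly from the probability distribution. An illustrative sketch (not from the text), applied to the crash distribution of Table 2.1, which has a long right tail:

```python
pmf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}  # Table 2.1

mu = sum(y * p for y, p in pmf.items())
sigma = sum((y - mu) ** 2 * p for y, p in pmf.items()) ** 0.5

# Equations (2.14) and (2.15): standardized third and fourth moments.
skewness = sum((y - mu) ** 3 * p for y, p in pmf.items()) / sigma ** 3
kurtosis = sum((y - mu) ** 4 * p for y, p in pmf.items()) / sigma ** 4

assert skewness > 0   # long right tail: most mass at 0, occasional large counts
assert kurtosis > 3   # heavier-tailed than a normal random variable
```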

FIGURE 2.3 Four Distributions with Different Skewness and Kurtosis

[Four probability densities, panels (a)–(d), each labeled below with its skewness and kurtosis.] All of these distributions have a mean of 0 and a variance of 1. The distributions with skewness of zero (a and b) are symmetric; the distributions with nonzero skewness (c and d) are not symmetric. The distributions with kurtosis exceeding 3 (b–d) have heavy tails.

The kurtosis of a normally distributed random variable is 3, so a random variable with kurtosis exceeding 3 has more mass in its tails than a normal random variable. A distribution with kurtosis exceeding 3 is called leptokurtic or, more simply, heavy-tailed. Like skewness, the kurtosis is unit free: changing the units of Y does not change its kurtosis.
Below each of the four distributions in Figure 2.3 is its kurtosis. The distributions in Figures 2.3b–d are heavy-tailed.


Moments. The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, E(Y²), is called the second moment of Y. In general, the expected value of Y^r is called the r-th moment of the random variable Y. That is, the r-th moment of Y is E(Y^r). The skewness is a function of the first, second, and third moments of Y, and the kurtosis is a function of the first through fourth moments of Y.

2.3 Two Random Variables

Most of the interesting questions in economics involve two or more variables. Are college graduates more likely to have a job than nongraduates? How does the distribution of income for women compare to that for men? These questions concern the distribution of two random variables, considered together (education and employment status in the first example, income and gender in the second). Answering such questions requires an understanding of the concepts of joint, marginal, and conditional probability distributions.

Joint and Marginal Distributions

Joint distribution. The joint probability distribution of two discrete random variables, say X and Y, is the probability that the random variables simultaneously take on certain values, say x and y. The probabilities of all possible (x, y) combinations sum to 1. The joint probability distribution can be written as the function Pr(X = x, Y = y).
For example, weather conditions (whether or not it is raining) affect the commuting time of the student commuter in Section 2.1. Let Y be a binary random variable that equals 1 if the commute is short (less than 20 minutes) and equals 0 otherwise, and let X be a binary random variable that equals 0 if it is raining and 1 if not. Between these two random variables, there are four possible outcomes: it rains and the commute is long (X = 0, Y = 0); rain and short commute (X = 0, Y = 1); no rain and long commute (X = 1, Y = 0); and no rain and short commute (X = 1, Y = 1). The joint probability distribution is the frequency with which each of these four outcomes occurs over many repeated commutes.
An example of a joint distribution of these two variables is given in Table 2.2. According to this distribution, over many commutes, 15% of the days have rain and a long commute (X = 0, Y = 0); that is, the probability of a long, rainy commute is 15%, or Pr(X = 0, Y = 0) = 0.15. Also, Pr(X = 0, Y = 1) = 0.15, Pr(X = 1, Y = 0) = 0.07, and Pr(X = 1, Y = 1) = 0.63. These four possible outcomes are mutually exclusive and constitute the sample space, so the four probabilities sum to 1.

TABLE 2.2 Joint Distribution of Weather Conditions and Commuting Times

                        Rain (X = 0)   No Rain (X = 1)   Total
Long Commute (Y = 0)    0.15           0.07              0.22
Short Commute (Y = 1)   0.15           0.63              0.78
Total                   0.30           0.70              1.00

Marginal probability distribution. The marginal probability distribution of a random variable Y is just another name for its probability distribution. This term is used to distinguish the distribution of Y alone (the marginal distribution) from the joint distribution of Y and another random variable.
The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value. If X can take on l different values x_1, …, x_l, then the marginal probability that Y takes on the value y is

Pr(Y = y) = Σ_{i=1}^{l} Pr(X = x_i, Y = y).   (2.16)

For example, in Table 2.2, the probability of a long rainy commute is 15% and the probability of a long commute with no rain is 7%, so the probability of a long commute (rainy or not) is 22%. The marginal distribution of commuting times is given in the final column of Table 2.2. Similarly, the marginal probability that it will rain is 30%, as shown in the final row of Table 2.2.
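Equation (2.16) amounts to summing the joint probabilities over the values of X. A minimal sketch (illustrative, not from the text) using the numbers in Table 2.2:

```python
# Pr(X = x, Y = y) from Table 2.2; X: 0 = rain, 1 = no rain;
# Y: 0 = long commute, 1 = short commute.
joint = {(0, 0): 0.15, (1, 0): 0.07,
         (0, 1): 0.15, (1, 1): 0.63}

def marginal_y(joint, y):
    """Pr(Y = y): sum the joint probabilities over all values of X."""
    return sum(p for (x, y_val), p in joint.items() if y_val == y)

assert abs(marginal_y(joint, 0) - 0.22) < 1e-12   # long commute, rainy or not
assert abs(marginal_y(joint, 1) - 0.78) < 1e-12   # short commute
```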

Conditional Distributions

Conditional distribution. The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. The conditional probability that Y takes on the value y when X takes on the value x is written Pr(Y = y|X = x).

For example, what is the probability of a long commute (Y = 0) if you know it is raining (X = 0)? From Table 2.2, the joint probability of a rainy short commute is 15% and the joint probability of a rainy long commute is 15%, so if it is raining, a long commute and a short commute are equally likely. Thus the probability of a long commute (Y = 0), conditional on it being rainy (X = 0), is 50%, or Pr(Y = 0|X = 0) = 0.50. Equivalently, the marginal probability of rain is 30%; that is, over many commutes it rains 30% of the time. Of these 30% of commutes, 50% of the time the commute is long (0.15/0.30).

2.3  Two Random Variables

TABLE 2.3  Joint and Conditional Distributions of Computer Crashes (M) and Computer Age (A)

A. Joint Distribution

                        M = 0    M = 1    M = 2    M = 3    M = 4    Total
Old computer (A = 0)     0.35    0.065     0.05    0.025     0.01     0.50
New computer (A = 1)     0.45    0.035     0.01    0.005     0.00     0.50
Total                    0.80    0.10      0.06    0.03      0.01     1.00

B. Conditional Distributions of M given A

                M = 0    M = 1    M = 2    M = 3    M = 4    Total
Pr(M|A = 0)      0.70     0.13     0.10     0.05     0.02     1.00
Pr(M|A = 1)      0.90     0.07     0.02     0.01     0.00     1.00

In general, the conditional distribution of Y given X = x is

    Pr(Y = y|X = x) = Pr(X = x, Y = y) / Pr(X = x).    (2.17)

For example, the conditional probability of a long commute given that it is rainy is Pr(Y = 0|X = 0) = Pr(X = 0, Y = 0)/Pr(X = 0) = 0.15/0.30 = 0.50.

As a second example, consider a modification of the crashing computer example. Suppose you use a computer in the library to type your term paper and the librarian randomly assigns you a computer from those available, half of which are new and half of which are old. Because you are randomly assigned to a computer, the age of the computer you use, A (= 1 if the computer is new, = 0 if it is old), is a random variable. Suppose the joint distribution of the random variables M and A is given in Part A of Table 2.3. Then the conditional distribution of computer crashes, given the age of the computer, is given in Part B of the table. For example, the joint probability M = 0 and A = 0 is 0.35; because half the computers are old, the conditional probability of no crashes, given that you are using an old computer, is Pr(M = 0|A = 0) = Pr(M = 0, A = 0)/Pr(A = 0) = 0.35/0.50 = 0.70, or 70%. In contrast, the conditional probability of no crashes given that you are assigned a new computer is 90%. According to the conditional distributions in Part B of Table 2.3, the newer computers are less likely to crash than the old ones; for example, the probability of three crashes is 5% with an old computer but 1% with a new computer.
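The division in Equation (2.17) can be checked the same way. The sketch below (an illustration, not part of the text) encodes Part A of Table 2.3 and reproduces the conditional distributions in Part B.

```python
# Joint distribution of Table 2.3, Part A: keys are (m, a), where m is the
# number of crashes and a is the computer's age (0 = old, 1 = new).
joint = {(0, 0): 0.35, (1, 0): 0.065, (2, 0): 0.05, (3, 0): 0.025, (4, 0): 0.01,
         (0, 1): 0.45, (1, 1): 0.035, (2, 1): 0.01, (3, 1): 0.005, (4, 1): 0.00}

def conditional(joint, a):
    """Pr(M = m | A = a) = Pr(M = m, A = a) / Pr(A = a), Equation (2.17)."""
    pr_a = sum(p for (m, aa), p in joint.items() if aa == a)
    return {m: p / pr_a for (m, aa), p in joint.items() if aa == a}

pr_m_old = conditional(joint, 0)   # Part B, first row:  0.70, 0.13, 0.10, 0.05, 0.02
pr_m_new = conditional(joint, 1)   # Part B, second row: 0.90, 0.07, 0.02, 0.01, 0.00
```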


Conditional expectation. The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation is the expected value of Y, computed using the conditional distribution of Y given X. If Y takes on k values y_1, ..., y_k, then the conditional mean of Y given X = x is

    E(Y|X = x) = Σ_{i=1}^{k} y_i Pr(Y = y_i|X = x).    (2.18)

For example, based on the conditional distributions in Table 2.3, the expected number of computer crashes, given that the computer is old, is E(M|A = 0) = 0 × 0.70 + 1 × 0.13 + 2 × 0.10 + 3 × 0.05 + 4 × 0.02 = 0.56. The expected number of computer crashes, given that the computer is new, is E(M|A = 1) = 0.14, less than for the old computers.

The conditional expectation of Y given X = x is just the mean value of Y when X = x. In the example of Table 2.3, the mean number of crashes is 0.56 for old computers, so the conditional expectation of Y given that the computer is old is 0.56. Similarly, among new computers, the mean number of crashes is 0.14; that is, the conditional expectation of Y given that the computer is new is 0.14.

The law of iterated expectations. The mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X. For example, the mean height of adults is the weighted average of the mean height of men and the mean height of women, weighted by the proportions of men and women. Stated mathematically, if X takes on the l values x_1, ..., x_l, then

    E(Y) = Σ_{i=1}^{l} E(Y|X = x_i) Pr(X = x_i).    (2.19)

Equation (2.19) follows from Equations (2.18) and (2.17) (see Exercise 2.19).

Stated differently, the expectation of Y is the expectation of the conditional expectation of Y given X,

    E(Y) = E[E(Y|X)],    (2.20)

where the inner expectation on the right-hand side of Equation (2.20) is computed using the conditional distribution of Y given X and the outer expectation is computed using the marginal distribution of X. Equation (2.20) is known as the law of iterated expectations.
For example, the mean number of crashes M is the weighted average of the conditional expectation of M given that it is old and the conditional expectation of M given that it is new, so E(M) = E(M|A = 0) × Pr(A = 0) + E(M|A = 1) × Pr(A = 1) = 0.56 × 0.50 + 0.14 × 0.50 = 0.35. This is the mean of the marginal distribution of M, as calculated in Equation (2.2).

The law of iterated expectations implies that if the conditional mean of Y given X is zero, then the mean of Y is zero. This is an immediate consequence of Equation (2.20): if E(Y|X) = 0, then E(Y) = E[E(Y|X)] = E[0] = 0. Said differently, if the mean of Y given X is zero, then the probability-weighted average of these conditional means is zero, so the mean of Y must be zero.

The law of iterated expectations also applies to expectations that are conditional on multiple random variables. For example, let X, Y, and Z be random variables that are jointly distributed. Then the law of iterated expectations says that E(Y) = E[E(Y|X, Z)], where E(Y|X, Z) is the conditional expectation of Y given both X and Z. For example, in the computer crash illustration of Table 2.3, let P denote the number of programs installed on the computer; then E(M|A, P) is the expected number of crashes for a computer with age A that has P programs installed. The expected number of crashes overall, E(M), is the weighted average of the expected number of crashes for a computer with age A and number of programs P, weighted by the proportion of computers with that value of both A and P.

Exercise 2.20 provides some additional properties of conditional expectations with multiple variables.
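The law of iterated expectations can be verified numerically for Table 2.3. The sketch below (not part of the text) reproduces E(M|A = 0) = 0.56, E(M|A = 1) = 0.14, and E(M) = 0.35.

```python
# Conditional means from Table 2.3, Part B, and the law of iterated expectations.
pr_m_old = {0: 0.70, 1: 0.13, 2: 0.10, 3: 0.05, 4: 0.02}  # Pr(M = m | A = 0)
pr_m_new = {0: 0.90, 1: 0.07, 2: 0.02, 3: 0.01, 4: 0.00}  # Pr(M = m | A = 1)

def cond_mean(dist):
    """E(M | A = a) = sum of m * Pr(M = m | A = a), Equation (2.18)."""
    return sum(m * p for m, p in dist.items())

e_m_old = cond_mean(pr_m_old)            # 0.56
e_m_new = cond_mean(pr_m_new)            # 0.14

# Equation (2.19): weight the conditional means by Pr(A = a) = 0.50 each.
e_m = e_m_old * 0.50 + e_m_new * 0.50    # 0.35
```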

Conditional variance. The variance of Y conditional on X is the variance of the conditional distribution of Y given X. Stated mathematically, the conditional variance of Y given X is

    var(Y|X = x) = Σ_{i=1}^{k} [y_i − E(Y|X = x)]² Pr(Y = y_i|X = x).    (2.21)

For example, the conditional variance of the number of crashes, given that the computer is old, is var(M|A = 0) = (0 − 0.56)² × 0.70 + (1 − 0.56)² × 0.13 + (2 − 0.56)² × 0.10 + (3 − 0.56)² × 0.05 + (4 − 0.56)² × 0.02 = 0.99. The standard deviation of the conditional distribution of M given that A = 0 is thus √0.99 = 0.99. The conditional variance of M given that A = 1 is the variance of the distribution in the second row of Panel B of Table 2.3, which is 0.22, so the standard deviation of M for new computers is √0.22 = 0.47. For the conditional distributions in Table 2.3, the expected number of crashes for new computers (0.14) is less than that for old computers (0.56), and the spread of the distribution of the number of crashes, as measured by the conditional standard deviation, is smaller for new computers (0.47) than for old (0.99).
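The arithmetic in Equation (2.21) is easy to reproduce; the sketch below (not part of the text) computes both conditional variances and their standard deviations, which round to the values quoted above.

```python
# Conditional variance of M given A, Equation (2.21), for Table 2.3.
import math

pr_m_old = {0: 0.70, 1: 0.13, 2: 0.10, 3: 0.05, 4: 0.02}
pr_m_new = {0: 0.90, 1: 0.07, 2: 0.02, 3: 0.01, 4: 0.00}

def cond_var(dist):
    mean = sum(m * p for m, p in dist.items())
    return sum((m - mean) ** 2 * p for m, p in dist.items())

var_old = cond_var(pr_m_old)      # about 0.99
var_new = cond_var(pr_m_new)      # about 0.22
sd_old = math.sqrt(var_old)       # about 0.99
sd_new = math.sqrt(var_new)       # about 0.47
```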


Independence

Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. Specifically, X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y. That is, X and Y are independently distributed if, for all values of x and y,

    Pr(Y = y|X = x) = Pr(Y = y)  (independence of X and Y).    (2.22)

Substituting Equation (2.22) into Equation (2.17) gives an alternative expression for independent random variables in terms of their joint distribution. If X and Y are independent, then

    Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).    (2.23)

That is, the joint distribution of two independent random variables is the product of their marginal distributions.
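Equation (2.23) gives a direct numerical test of independence. Applied to Table 2.2 (a sketch, not part of the text), the test fails: weather and commute length are dependent, since the joint probabilities do not factor into the product of the marginals.

```python
# Check Equation (2.23) on the commute example of Table 2.2.
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

pr_x = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
pr_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

independent = all(abs(joint[(x, y)] - pr_x[x] * pr_y[y]) < 1e-9
                  for x in (0, 1) for y in (0, 1))
# Pr(X = 0, Y = 0) = 0.15, but Pr(X = 0)Pr(Y = 0) = 0.30 * 0.22 = 0.066,
# so X and Y are not independent.
```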

Covariance and Correlation

Covariance. One measure of the extent to which two random variables move together is their covariance. The covariance between X and Y is the expected value E[(X − μ_X)(Y − μ_Y)], where μ_X is the mean of X and μ_Y is the mean of Y. The covariance is denoted by cov(X, Y) or by σ_XY. If X can take on l values and Y can take on k values, then the covariance is given by the formula

    cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)]
              = Σ_{i=1}^{k} Σ_{j=1}^{l} (x_j − μ_X)(y_i − μ_Y) Pr(X = x_j, Y = y_i).    (2.24)

To interpret this formula, suppose that when X is greater than its mean (so that X − μ_X is positive), then Y tends to be greater than its mean (so that Y − μ_Y is positive), and when X is less than its mean (so that X − μ_X < 0), then Y tends to be less than its mean (so that Y − μ_Y < 0). In both cases, the product (X − μ_X) × (Y − μ_Y) tends to be positive, so the covariance is positive. In contrast, if X and Y tend to move in opposite directions (so that X is large when Y is small, and vice versa), then the covariance is negative. Finally, if X and Y are independent, then the covariance is zero (see Exercise 2.19).


Correlation. Because the covariance is the product of X and Y, deviated from their means, its units are, awkwardly, the units of X times the units of Y. This "units" problem can make numerical values of the covariance difficult to interpret.

The correlation is an alternative measure of dependence between X and Y that solves the "units" problem of the covariance. Specifically, the correlation between X and Y is the covariance between X and Y, divided by their standard deviations:

    corr(X, Y) = cov(X, Y) / √[var(X) var(Y)] = σ_XY / (σ_X σ_Y).    (2.25)

Because the units of the numerator in Equation (2.25) are the same as those of the denominator, the units cancel and the correlation is unitless. The random variables X and Y are said to be uncorrelated if corr(X, Y) = 0.

The correlation always is between −1 and 1; that is, as proven in Appendix 2.1,

    −1 ≤ corr(X, Y) ≤ 1  (correlation inequality).    (2.26)
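Equations (2.24) and (2.25) can be applied to the commute example of Table 2.2 as a small worked example. This is an illustrative sketch; the resulting correlation, about 0.44, is computed here for illustration and is not a number from the text.

```python
# Covariance and correlation, Equations (2.24) and (2.25), for Table 2.2.
import math

joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

mu_x = sum(x * p for (x, y), p in joint.items())    # 0.70
mu_y = sum(y * p for (x, y), p in joint.items())    # 0.78

cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
corr = cov / math.sqrt(var_x * var_y)
# cov = 0.084 > 0: no rain (X = 1) and short commutes (Y = 1) move together.
```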

Correlation and conditional mean. If the conditional mean of Y does not depend on X, then Y and X are uncorrelated. That is,

    if E(Y|X) = μ_Y, then cov(Y, X) = 0 and corr(Y, X) = 0.    (2.27)

We now show this result. First suppose that Y and X have mean zero, so that cov(Y, X) = E[(Y − μ_Y)(X − μ_X)] = E(YX). By the law of iterated expectations [Equation (2.20)], E(YX) = E[E(YX|X)] = E[E(Y|X)X] = 0 because E(Y|X) = 0, so cov(Y, X) = 0. Equation (2.27) follows by substituting cov(Y, X) = 0 into the definition of correlation in Equation (2.25). If Y and X do not have mean zero, first subtract off their means, then the preceding proof applies.

It is not necessarily true, however, that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X. Said differently, it is possible for the conditional mean of Y to be a function of X but for Y and X nonetheless to be uncorrelated. An example is given in Exercise 2.23.
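In the spirit of Exercise 2.23, here is a hypothetical numerical example (not from the text) of variables that are dependent yet uncorrelated: X uniform on {−1, 0, 1} and Y = X². Then E(Y|X) = X² clearly depends on X, yet cov(X, Y) = E(XY) − E(X)E(Y) = E(X³) = 0.

```python
# Uncorrelated but dependent: X uniform on {-1, 0, 1}, Y = X**2.
outcomes = [(-1, 1), (0, 0), (1, 1)]    # (x, y) pairs, each with probability 1/3
prob = 1.0 / 3.0

mu_x = sum(x * prob for x, y in outcomes)                        # 0
mu_y = sum(y * prob for x, y in outcomes)                        # 2/3
cov = sum((x - mu_x) * (y - mu_y) * prob for x, y in outcomes)   # 0
```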

The Mean and Variance of Sums of Random Variables

The mean of the sum of two random variables, X and Y, is the sum of their means:

    E(X + Y) = E(X) + E(Y) = μ_X + μ_Y.    (2.28)


The Distribution of Earnings in the United States in 2004

Some parents tell their children that they will be able to get a better, higher-paying job if they get a college degree than if they skip higher education. Are these parents right? Does the distribution of earnings differ between workers who are college graduates and workers who have only a high school diploma, and, if so, how? Among workers with a similar education, does the distribution of earnings differ for men and women? For example, are the best-paid college-educated women paid as well as the best-paid college-educated men?

One way to answer these questions is to examine the distribution of earnings conditional on the highest educational degree achieved (high school diploma or bachelor's degree) and on gender. These four conditional distributions are shown in Figure 2.4, and the mean, standard deviation, and some percentiles of the conditional distributions are presented in Table 2.4.¹ For example, the conditional mean of earnings for women whose highest degree is a high school diploma, that is, E(Earnings|Highest degree = high school diploma, Gender = female), is $13.25 per hour.

The distribution of average hourly earnings for female college graduates (Figure 2.4b) is shifted to the right of the distribution for women with only a high school degree (Figure 2.4a); the same shift can be seen for the two groups of men (Figure 2.4d and Figure 2.4c). For both men and women, mean earnings are higher for those with a college degree (Table 2.4, first numeric column).

Interestingly, the spread of the distribution of earnings, as measured by the standard deviation, is greater for those with a college degree than for those with a high school diploma. In addition, for both men and women, the 90th percentile of earnings is much higher for workers with a college degree than for workers with only a high school diploma. This final comparison is consistent with the parental admonition that a college degree opens doors that remain closed to individuals with only a high school diploma.

TABLE 2.4  Summaries of the Conditional Distribution of Average Hourly Earnings of U.S. Full-Time Workers in 2004, Given Education Level and Gender

[For each of four groups, the table reports the mean, the standard deviation, and the 25%, 50% (median), 75%, and 90% percentiles of average hourly earnings. The groups are (a) women with a high school diploma, (b) women with a college degree, (c) men with a high school diploma, and (d) men with a four-year college degree.]

Average hourly earnings are total annual pretax earnings (wages, salaries, tips, and bonuses), divided by the number of hours worked annually. The distributions were computed using data from the March 2005 Current Population Survey, which is described in Appendix 3.1.

Another feature of these distributions is that the distribution of earnings for men is shifted to the right of the distribution of earnings for women. This "gender gap" in earnings is an important (and to many, troubling) aspect of the distribution of earnings. We return to this topic in later chapters.

¹The distributions were estimated using data from the March 2005 Current Population Survey, which is discussed in more detail in Appendix 3.1.

FIGURE 2.4  Conditional Distribution of Average Hourly Earnings of U.S. Full-Time Workers in 2004, Given Education Level and Gender

The four distributions of earnings are for women and men, for those with only a high school diploma (a and c) and those whose highest degree is from a four-year college (b and d). [Each panel plots the density of earnings against dollars per hour: (a) women with a high school diploma; (b) women with a college degree; (c) men with a high school diploma; (d) men with a college degree.]
KEY CONCEPT 2.3
MEANS, VARIANCES, AND COVARIANCES OF SUMS OF RANDOM VARIABLES

Let X, Y, and V be random variables, let μ_X and σ²_X be the mean and variance of X, let σ_XY be the covariance between X and Y (and so forth for the other variables), and let a, b, and c be constants. The following facts follow from the definitions of the mean, variance, and covariance:

    E(a + bX + cY) = a + bμ_X + cμ_Y,    (2.29)
    var(a + bY) = b²σ²_Y,    (2.30)
    var(aX + bY) = a²σ²_X + 2abσ_XY + b²σ²_Y,    (2.31)
    E(Y²) = σ²_Y + μ²_Y,    (2.32)
    cov(a + bX + cV, Y) = bσ_XY + cσ_VY,    (2.33)
    E(XY) = σ_XY + μ_Xμ_Y,    (2.34)
    |corr(X, Y)| ≤ 1 and |σ_XY| ≤ √(σ²_X σ²_Y)  (correlation inequality).    (2.35)

The variance of the sum of X and Y is the sum of their variances, plus twice their covariance:

    var(X + Y) = var(X) + var(Y) + 2cov(X, Y) = σ²_X + σ²_Y + 2σ_XY.    (2.36)

If X and Y are independent, then the covariance is zero and the variance of their sum is the sum of their variances:

    var(X + Y) = var(X) + var(Y) = σ²_X + σ²_Y  (if X and Y are independent).    (2.37)

Useful expressions for means, variances, and covariances involving weighted sums of random variables are collected in Key Concept 2.3. The results in Key Concept 2.3 are derived in Appendix 2.1.
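Equation (2.36) can be verified numerically on the joint distribution of Table 2.2: the variance of X + Y computed directly equals var(X) + var(Y) + 2cov(X, Y). This is an illustrative sketch, not part of the text.

```python
# Check Equation (2.36) on the joint distribution of Table 2.2.
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

def ev(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = ev(lambda x, y: x), ev(lambda x, y: y)
var_x = ev(lambda x, y: (x - mu_x) ** 2)
var_y = ev(lambda x, y: (y - mu_y) ** 2)
cov_xy = ev(lambda x, y: (x - mu_x) * (y - mu_y))

# Direct variance of the sum X + Y ...
mu_s = ev(lambda x, y: x + y)
var_sum = ev(lambda x, y: (x + y - mu_s) ** 2)
# ... equals var(X) + var(Y) + 2 cov(X, Y), Equation (2.36).
```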

FIGURE 2.5  The Normal Probability Density

The normal probability density function with mean μ and variance σ² is a bell-shaped curve, centered at μ. The area under the normal p.d.f. between μ − 1.96σ and μ + 1.96σ is 0.95. The normal distribution is denoted N(μ, σ²).

2.4  The Normal, Chi-Squared, Student t, and F Distributions

The probability distributions most often encountered in econometrics are the normal, chi-squared, Student t, and F distributions.

The Normal Distribution

A continuous random variable with a normal distribution has the familiar bell-shaped probability density shown in Figure 2.5. The specific function defining the normal probability density is given in Appendix 17.1. As Figure 2.5 shows, the normal density with mean μ and variance σ² is symmetric around its mean and has 95% of its probability between μ − 1.96σ and μ + 1.96σ.

Some special notation and terminology have been developed for the normal distribution. The normal distribution with mean μ and variance σ² is expressed concisely as N(μ, σ²). The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1 and is denoted N(0, 1). Random variables that have a N(0, 1) distribution are often denoted by Z, and the standard normal cumulative distribution function is denoted by the Greek letter Φ; accordingly, Pr(Z ≤ c) = Φ(c), where c is a constant. Values of the standard normal cumulative distribution function are tabulated in Appendix Table 1.

To compute probabilities for a normal variable with a general mean and variance, it must be standardized by first subtracting the mean, then dividing the result


KEY CONCEPT 2.4
COMPUTING PROBABILITIES INVOLVING NORMAL RANDOM VARIABLES

Suppose Y is normally distributed with mean μ and variance σ²; in other words, Y is distributed N(μ, σ²). Then Y is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing Z = (Y − μ)/σ.

Let c₁ and c₂ denote two numbers with c₁ < c₂, and let d₁ = (c₁ − μ)/σ and d₂ = (c₂ − μ)/σ. Then

    Pr(Y ≤ c₂) = Pr(Z ≤ d₂) = Φ(d₂),    (2.38)

    Pr(Y ≥ c₁) = Pr(Z ≥ d₁) = 1 − Φ(d₁),    (2.39)

    Pr(c₁ ≤ Y ≤ c₂) = Pr(d₁ ≤ Z ≤ d₂) = Φ(d₂) − Φ(d₁).    (2.40)

The normal cumulative distribution function is tabulated in Appendix Table 1.

by the standard deviation. For example, suppose Y is distributed N(1, 4); that is, Y is normally distributed with a mean of 1 and a variance of 4. What is the probability that Y ≤ 2, that is, what is the shaded area in Figure 2.6a? The standardized version of Y is Y minus its mean, divided by its standard deviation, that is, (Y − 1)/√4 = ½(Y − 1). Accordingly, the random variable ½(Y − 1) is normally distributed with mean zero and variance one (see Exercise 2.8); it has the standard normal distribution shown in Figure 2.6b. Now Y ≤ 2 is equivalent to ½(Y − 1) ≤ ½(2 − 1), that is, ½(Y − 1) ≤ ½. Thus,

    Pr(Y ≤ 2) = Pr[½(Y − 1) ≤ ½] = Pr(Z ≤ ½) = Φ(0.5) = 0.691,    (2.41)

where the value 0.691 is taken from Appendix Table 1.

The same approach can be applied to compute the probability that a normally distributed random variable exceeds some value or that it falls in a certain range. These steps are summarized in Key Concept 2.4. The box "A Bad Day on Wall Street" presents an unusual application of the cumulative normal distribution.

The normal distribution is symmetric, so its skewness is zero. The kurtosis of the normal distribution is 3.
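Because Φ(c) = ½[1 + erf(c/√2)], the calculation in Equation (2.41) can be checked without a table. A short sketch (not part of the text):

```python
# Standard normal CDF via the error function: Phi(c) = 0.5 * (1 + erf(c / sqrt(2))).
import math

def phi(c):
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

# Pr(Y <= 2) for Y ~ N(1, 4): standardize with mu = 1, sigma = 2 (Key Concept 2.4).
mu, sigma = 1.0, 2.0
p = phi((2.0 - mu) / sigma)    # Phi(0.5), about 0.691
```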

The multivariate normal distribution. The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case,

FIGURE 2.6  Calculating the Probability that Y ≤ 2 When Y Is Distributed N(1, 4)

To calculate Pr(Y ≤ 2), standardize Y, then use the standard normal distribution table. Y is standardized by subtracting its mean (μ = 1) and dividing by its standard deviation (σ = 2). The probability that Y ≤ 2 is shown in Figure 2.6a, and the corresponding probability after standardizing Y is shown in Figure 2.6b. Because the standardized random variable, (Y − 1)/2, is a standard normal (Z) random variable, Pr(Y ≤ 2) = Pr[(Y − 1)/2 ≤ (2 − 1)/2] = Pr(Z ≤ 0.5). From Appendix Table 1, Pr(Z ≤ 0.5) = Φ(0.5) = 0.691. [Panel (a) shows the N(1, 4) density; panel (b) shows the N(0, 1) density.]

the distribution is called the multivariate normal distribution, or, if only two variables are being considered, the bivariate normal distribution. The formula for the bivariate normal p.d.f. is given in Appendix 17.1, and the formula for the general multivariate normal p.d.f. is given in Appendix 18.1.

The multivariate normal distribution has three important properties. If X and Y have a bivariate normal distribution with covariance σ_XY, and if a and b are two constants, then aX + bY has the normal distribution:

    aX + bY is distributed N(aμ_X + bμ_Y, a²σ²_X + b²σ²_Y + 2abσ_XY)  (X, Y bivariate normal).    (2.42)

More generally, if n random variables have a multivariate normal distribution, then any linear combination of these variables (such as their sum) is normally distributed.
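The mean and variance in Equation (2.42) can be illustrated by simulation. The parameter values below (means 0, unit variances, covariance 0.5, a = 2, b = 3) are chosen for illustration and are not from the text.

```python
# Monte Carlo check of the mean and variance in Equation (2.42).
import math
import random

random.seed(1)
rho, a, b = 0.5, 2.0, 3.0     # cov_xy = rho when both variances are 1
n = 200_000
draws = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    # Y = rho*X + sqrt(1 - rho^2)*eps gives var(Y) = 1 and cov(X, Y) = rho.
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    draws.append(a * x + b * y)

mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
var_formula = a * a + b * b + 2 * a * b * rho   # a^2 var_X + b^2 var_Y + 2ab cov
```

The simulated mean is near a·μ_X + b·μ_Y = 0 and the simulated variance is near the formula value 19.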


A Bad Day on Wall Street

On a typical day the overall value of stocks traded on the U.S. stock market can rise or fall by 1% or even more. This is a lot, but nothing compared to what happened on Monday, October 19, 1987. On "Black Monday," the Dow Jones Industrial Average (an average of 30 large industrial stocks) fell by 25.6%! From January 1, 1980, to October 16, 1987, the standard deviation of daily percentage price changes on the Dow was 1.16%, so the drop of 25.6% was a negative return of 22 (= 25.6/1.16) standard deviations. The enormity of this drop can be seen in Figure 2.7, a plot of the daily returns on the Dow during the 1980s.

If daily percentage price changes are normally distributed, then the probability of a drop of at least 22 standard deviations is Pr(Z ≤ −22) = Φ(−22). You will not find this value in Appendix Table 1, but you can calculate it using a computer (try it!). This probability is 1.4 × 10⁻¹⁰⁷, that is, 0.000...00014, where there are a total of 106 zeros!

continued

FIGURE 2.7  Daily Percentage Changes in the Dow Jones Industrial Average in the 1980s

During the 1980s, the average percentage daily change of the index was 0.05% and its standard deviation was 1.16%. On October 19, 1987 (Black Monday), the index fell 25.6%, or more than 22 standard deviations. [The figure plots the daily percent change against time; the October 19, 1987 observation stands far below the rest.]

How small is 1.4 × 10⁻¹⁰⁷? Consider the following:

• The world population is about 6 billion, so the probability of winning a random lottery among all living people is about one in 6 billion, or 2 × 10⁻¹⁰.

• The universe is believed to have existed for 15 billion years, or about 5 × 10¹⁷ seconds, so the probability of choosing a particular second at random from all the seconds since the beginning of time is 2 × 10⁻¹⁸.

• There are approximately 10⁴³ molecules of gas in the first kilometer above the earth's surface. The probability of choosing one at random is 10⁻⁴³.

Although Wall Street did have a bad day, the fact that it happened at all suggests that its probability was more than 1.4 × 10⁻¹⁰⁷. In fact, stock price percentage changes have a distribution with heavier tails than the normal distribution; in other words, there are more days with large positive or large negative changes than the normal distribution would suggest. For this reason, finance professionals use econometric models in which the variance of the percentage change in stock prices can evolve over time, so some periods have higher volatility than others. These models with changing variances are more consistent with the very bad (and very good) days we actually see on Wall Street.
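The Φ(−22) value quoted in the box can be reproduced in a few lines. Because math.erf rounds to −1 this far in the tail, the sketch below uses the standard large-x tail approximation Φ(−x) ≈ φ(x)/x (the leading term of the Mills-ratio expansion) rather than a direct CDF call.

```python
# Far-tail normal probability: for large x, Phi(-x) is approximately
# phi(x)/x, where phi is the standard normal density. A naive
# 0.5*(1 + erf(-x/sqrt(2))) would return exactly 0.0 at x = 22.
import math

def normal_tail(x):
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return pdf / x

p = normal_tail(22.0)    # about 1.4e-107, as in the text
```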

Second, if a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal [this follows from Equation (2.42) by setting a = 1 and b = 0].

Third, if variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Thus, if X and Y have a bivariate normal distribution and σ_XY = 0, then X and Y are independent. In Section 2.3 it was stated that if X and Y are independent, then, regardless of their joint distribution, σ_XY = 0. If X and Y are jointly normally distributed, then the converse is also true. This result, that zero covariance implies independence, is a special property of the multivariate normal distribution that is not true in general.

The Chi-Squared Distribution


The chi-squared distribution is used when testing certain types of hypotheses in statistics and econometrics.

The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables. This distribution depends on m, which is called the degrees of freedom of the chi-squared distribution. For example, let Z₁, Z₂, and Z₃ be independent standard normal random variables. Then Z₁² + Z₂² + Z₃² has a chi-squared distribution with 3 degrees of freedom. The name for this distribution derives from the Greek letter used to denote it: a chi-squared distribution with m degrees of freedom is denoted χ²_m.

Selected percentiles of the χ²_m distribution are given in Appendix Table 3. For example, Appendix Table 3 shows that the 95th percentile of the χ²₃ distribution is 7.81, so Pr(Z₁² + Z₂² + Z₃² ≤ 7.81) = 0.95.
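The definition can be illustrated by simulation (a sketch, not part of the text): draw many values of Z₁² + Z₂² + Z₃² and check that about 95% fall at or below 7.81.

```python
# Monte Carlo illustration: the sum of 3 squared independent standard
# normals is chi-squared with 3 degrees of freedom.
import random

random.seed(0)
n = 100_000
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(n)]
frac_below = sum(1 for d in draws if d <= 7.81) / n   # close to 0.95
```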

The Student t Distribution


The Student t distribution with m degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. That is, let Z be a standard normal random variable, let W be a random variable with a chi-squared distribution with m degrees of freedom, and let Z and W be independently distributed. Then the random variable Z/√(W/m) has a Student t distribution (also called the t distribution) with m degrees of freedom. This distribution is denoted t_m. Selected percentiles of the Student t distribution are given in Appendix Table 2.

The Student t distribution depends on the degrees of freedom m. Thus the 95th percentile of the t_m distribution depends on the degrees of freedom m. The Student t distribution has a bell shape similar to that of the normal distribution, but when m is small (20 or less) it has more mass in the tails; that is, it is a "fatter" bell shape than the normal. When m is 30 or more, the Student t distribution is well approximated by the standard normal distribution, and the t_∞ distribution equals the standard normal distribution.
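The construction Z/√(W/m) can also be simulated. With m = 5 (an illustrative choice, not from the text), the simulated |t| exceeds 1.96 noticeably more often than the 5% a standard normal would give, showing the fatter tails described above.

```python
# Monte Carlo illustration of the Student t construction Z / sqrt(W/m).
import math
import random

random.seed(0)
n, m = 100_000, 5
tail = 0
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    w = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))  # chi-squared, m df
    t = z / math.sqrt(w / m)
    if abs(t) > 1.96:
        tail += 1
frac = tail / n    # roughly 0.11 for m = 5, versus 0.05 for the normal
```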

The F Distribution
The F distribution with m and n degrees of freedom, denoted F_{m,n}, is defined to be the distribution of the ratio of a chi-squared random variable with degrees of freedom m, divided by m, to an independently distributed chi-squared random variable with degrees of freedom n, divided by n. To state this mathematically, let W be a chi-squared random variable with m degrees of freedom and let V be a chi-squared random variable with n degrees of freedom, where W and V are independently distributed. Then (W/m)/(V/n) has an F_{m,n} distribution; that is, an F distribution with numerator degrees of freedom m and denominator degrees of freedom n.

In statistics and econometrics, an important special case of the F distribution arises when the denominator degrees of freedom is large enough that the F_{m,n} distribution can be approximated by the F_{m,∞} distribution. In this limiting case, the denominator random variable V/n is the mean of infinitely many squared standard normal random variables, and that mean is 1 because the mean of a squared standard normal random variable is 1. Thus the F_{m,∞} distribution is the distribution of a chi-squared random variable with m degrees of freedom, divided by m: W/m is distributed F_{m,∞}. For example, from Appendix Table 4, the 95th percentile of the F₃,∞ distribution is 2.60, which is the same as the 95th percentile of the χ²₃ distribution, 7.81 (from Appendix Table 3), divided by the degrees of freedom, which is 3 (7.81/3 = 2.60).

The 90th, 95th, and 99th percentiles of the F_{m,n} distribution are given in Appendix Table 5 for selected values of m and n. For example, the 95th percentile of the F₃,₃₀ distribution is 2.92, and the 95th percentile of the F₃,₉₀ distribution is 2.71. As the denominator degrees of freedom n increases, the 95th percentile of the F₃,ₙ distribution tends to the F₃,∞ limit of 2.60.

2.5

Random Sampling and


the Distribution of the Sample Average
Almost all the statistical and econometric procedures used in this book involve averages or weighted averages of a sample of data. Characterizing the distributions of sample averages therefore is an essential step toward understanding the performance of econometric procedures.

This section introduces some basic concepts about random sampling and the distributions of averages that are used throughout the book. We begin by discussing random sampling. The act of random sampling (that is, randomly drawing a sample from a larger population) has the effect of making the sample average itself a random variable. Because the sample average is a random variable, it has a probability distribution, which is called its sampling distribution. This section concludes with some properties of the sampling distribution of the sample average.

Random Sampling

Simple random sampling. Suppose our commuting student from Section 2.1 aspires to be a statistician and decides to record her commuting times on various days. She selects these days at random from the school year, and her daily commuting time has the cumulative distribution function in Figure 2.2a. Because these days were selected at random, knowing the value of the commuting time on one of these randomly selected days provides no information about the commuting time on another of the days; that is, because the days were selected at random, the values of the commuting time on each of the different days are independently distributed random variables.

46

CHAPTER 2

Review of Probability
The situation described in the previous paragraph is an example of the simplest sampling scheme used in statistics, called simple random sampling, in which n objects are selected at random from a population (the population of commuting days) and each member of the population (each day) is equally likely to be included in the sample.

The n observations in the sample are denoted Y₁, ..., Yₙ, where Y₁ is the first observation, Y₂ is the second observation, and so forth. In the commuting example, Y₁ is the commuting time on the first of her n randomly selected days and Yᵢ is the commuting time on the ith of her randomly selected days.

Because the members of the population included in the sample are selected at random, the values of the observations Y₁, ..., Yₙ are themselves random. If different members of the population are chosen, their values of Y will differ. Thus the act of random sampling means that Y₁, ..., Yₙ can be treated as random variables. Before they are sampled, Y₁, ..., Yₙ can take on many possible values; after they are sampled, a specific value is recorded for each observation.
i.i.d. draws. Because Y₁, ..., Yₙ are randomly drawn from the same population, the marginal distribution of Yᵢ is the same for each i = 1, ..., n; this marginal distribution is the distribution of Y in the population being sampled. When Yᵢ has the same marginal distribution for i = 1, ..., n, then Y₁, ..., Yₙ are said to be identically distributed.

Under simple random sampling, knowing the value of Y₁ provides no information about Y₂, so the conditional distribution of Y₂ given Y₁ is the same as the marginal distribution of Y₂. In other words, under simple random sampling, Y₁ is distributed independently of Y₂, ..., Yₙ.

When Y₁, ..., Yₙ are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed, or i.i.d.

Simple random sampling and i.i.d. draws are summarized in Key Concept 2.5.

The Sampling Distribution of the Sample Average

The sample average, Y̅, of the n observations Y₁, ..., Yₙ is

    Y̅ = (1/n)(Y₁ + Y₂ + ... + Yₙ) = (1/n) Σᵢ₌₁ⁿ Yᵢ.    (2.43)

An essential concept is that the act of drawing a random sample has the effect of making the sample average Y̅ a random variable. Because the sample was drawn at random, the value of each Yᵢ is random. Because Y₁, ..., Yₙ are random, their average is random. Had a different sample been drawn, then the observations and


KEY CONCEPT 2.5
SIMPLE RANDOM SAMPLING AND I.I.D. RANDOM VARIABLES

In a simple random sample, n objects are drawn at random from a population and each object is equally likely to be drawn. The value of the random variable Y for the ith randomly drawn object is denoted Yᵢ. Because each object is equally likely to be drawn and the distribution of Yᵢ is the same for all i, the random variables Y₁, ..., Yₙ are independently and identically distributed (i.i.d.); that is, the distribution of Yᵢ is the same for all i = 1, ..., n, and Y₁ is distributed independently of Y₂, ..., Yₙ, and so forth.

their sample average would have been different: the value of Y̅ differs from one randomly drawn sample to the next.


For example, suppose our student commuter selected five days at ra ndo m to
record her commute ti mes. then compmed the average of those five times. Had
she chosen five differe nt days, she would have recorded five different t1mes- a nd
thus wo uld bave computed a different value of the sam ple average.
Because Y is random, it has a probability d iStribution. Th~ d i~tribution of Y is
called the ampling distribution of Y , because it is lhe pro bability distribution associated w)th possible values of Y that could be computed fo r diffc:ren t possible samples Y1, Y,.
The sampling distribution of averages and weighted averag~:s plays a central
role in sta tis tics aod econometr ics. We start our disc.:usl>ion of the sampling distri
button of Y by computing its mean and vanance unde r gene ra l conditions o n the
popu la tio n dJstributioo o f Y.

Mean and variance of Y̅. Suppose that the observations Y₁, ..., Yₙ are i.i.d., and let μ_Y and σ²_Y denote the mean and variance of Yᵢ (because the observations are i.i.d., the mean and variance are the same for all i = 1, ..., n). When n = 2, the mean of the sum Y₁ + Y₂ is given by applying Equation (2.28): E(Y₁ + Y₂) = μ_Y + μ_Y = 2μ_Y. Thus the mean of the sample average is E[½(Y₁ + Y₂)] = ½ × 2μ_Y = μ_Y. In general,

    E(Y̅) = (1/n) Σᵢ₌₁ⁿ E(Yᵢ) = μ_Y.    (2.44)

The variance of Y̅ is found by applying Equation (2.37). For example, for n = 2, var(Y₁ + Y₂) = 2σ²_Y, so [applying Equation (2.31) with a = b = ½ and cov(Y₁, Y₂) = 0] var(Y̅) = ½σ²_Y. For general n, because Y₁, ..., Yₙ are i.i.d., Yᵢ and Yⱼ are independently distributed for i ≠ j, so cov(Yᵢ, Yⱼ) = 0. Thus,

    var(Y̅) = var((1/n) Σᵢ₌₁ⁿ Yᵢ) = σ²_Y / n.    (2.45)

The standard deviation of Y̅ is the square root of the variance, σ_Y/√n.

In summary, the mean, the variance, and the standard deviation of Y̅ are

    E(Y̅) = μ_Y,    (2.46)
    var(Y̅) = σ²_Y̅ = σ²_Y / n, and    (2.47)
    std.dev(Y̅) = σ_Y̅ = σ_Y / √n.    (2.48)

These results hold whatever the distribution of Yᵢ is; that is, the distribution of Yᵢ does not need to take on a specific form, such as the normal distribution, for Equations (2.46), (2.47), and (2.48) to hold.

The notation σ²_Y̅ denotes the variance of the sampling distribution of the sample average Y̅. In contrast, σ²_Y is the variance of each individual Yᵢ, that is, the variance of the population distribution from which the observation is drawn. Similarly, σ_Y̅ denotes the standard deviation of the sampling distribution of Y̅.
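Equations (2.46) through (2.48) can be illustrated by simulation. The sketch below (not part of the text) repeatedly draws samples of size n = 25 from a hypothetical population with μ_Y = 70 and σ_Y = 7 (values assumed here only for illustration) and checks that the simulated mean and variance of Y̅ are close to μ_Y and σ²_Y/n:

```python
import random
import statistics

# Simulate the sampling distribution of the sample average Ybar.
random.seed(1)
mu, sigma, n = 70.0, 7.0, 25  # hypothetical population mean, sd, sample size

def sample_average():
    """The average of one random sample of n i.i.d. N(mu, sigma^2) draws."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

averages = [sample_average() for _ in range(50_000)]
print(statistics.fmean(averages))     # close to mu = 70
print(statistics.variance(averages))  # close to sigma**2 / n = 49/25 = 1.96
```

The simulated variance of the 50,000 sample averages is close to σ²_Y/n = 1.96, not to the population variance σ²_Y = 49, which is exactly what Equation (2.47) predicts.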

Sampling distribution of Y̅ when Y is normally distributed. Suppose that Y₁, ..., Yₙ are i.i.d. draws from the N(μ_Y, σ²_Y) distribution. As stated following Equation (2.42), the sum of n normally distributed random variables is itself normally distributed. Because the mean of Y̅ is μ_Y and the variance of Y̅ is σ²_Y/n, this means that, if Y₁, ..., Yₙ are i.i.d. draws from the N(μ_Y, σ²_Y) distribution, then Y̅ is distributed N(μ_Y, σ²_Y/n).

2.6 Large-Sample Approximations to Sampling Distributions

Sampling distributions play a central role in the development of statistical and econometric procedures, so it is important to know, in a mathematical sense, what the sampling distribution of Y̅ is. There are two approaches to characterizing sampling distributions: an "exact" approach and an "approximate" approach.

The "exact" approach entails deriving a formula for the sampling distribution that holds exactly for any value of n. The sampling distribution that exactly


describes the distribution of Y̅ for any n is called the exact distribution or finite-sample distribution of Y̅. For example, if Y is normally distributed and Y₁, ..., Yₙ are i.i.d., then (as discussed in Section 2.5) the exact distribution of Y̅ is normal with mean μ_Y and variance σ²_Y/n. Unfortunately, if the distribution of Y is not normal, then in general the exact sampling distribution of Y̅ is very complicated and depends on the distribution of Y.

The "approximate" approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution, "asymptotic" because the approximations become exact in the limit that n → ∞. As we see in this section, these approximations can be very accurate even if the sample size is only n = 30 observations. Because sample sizes used in practice in econometrics typically number in the hundreds or thousands, these asymptotic distributions can be counted on to provide very good approximations to the exact sampling distribution.

This section presents the two key tools used to approximate sampling distributions when the sample size is large, the law of large numbers and the central limit theorem. The law of large numbers says that, when the sample size is large, Y̅ will be close to μ_Y with very high probability. The central limit theorem says that, when the sample size is large, the sampling distribution of the standardized sample average, (Y̅ − μ_Y)/σ_Y̅, is approximately normal.

Although exact sampling distributions are complicated and depend on the distribution of Y, the asymptotic distributions are simple. Moreover, remarkably, the asymptotic normal distribution of (Y̅ − μ_Y)/σ_Y̅ does not depend on the distribution of Y. This normal approximate distribution provides enormous simplifications and underlies the theory of regression used throughout this book.

The Law of Large Numbers and Consistency

The law of large numbers states that, under general conditions, Y̅ will be near μ_Y with very high probability when n is large. This is sometimes called the "law of averages." When a large number of random variables with the same mean are averaged together, the large values balance the small values and their sample average is close to their common mean.

For example, consider a simplified version of our student commuter's experiment, in which she simply records whether her commute was short (less than 20 minutes) or long. Let Yᵢ equal 1 if her commute was short on the ith randomly selected day and equal 0 if it was long. Because she used simple random sampling, Y₁, ..., Yₙ are i.i.d. Thus Yᵢ, i = 1, ..., n, are i.i.d. draws of a Bernoulli random variable,


KEY CONCEPT 2.6
CONVERGENCE IN PROBABILITY, CONSISTENCY, AND THE LAW OF LARGE NUMBERS

The sample average Y̅ converges in probability to μ_Y (or, equivalently, Y̅ is consistent for μ_Y) if the probability that Y̅ is in the range μ_Y − c to μ_Y + c becomes arbitrarily close to one as n increases for any constant c > 0. This is written as Y̅ →ᵖ μ_Y.

The law of large numbers says that if Yᵢ, i = 1, ..., n, are independently and identically distributed with E(Yᵢ) = μ_Y and if large outliers are unlikely (technically, if var(Yᵢ) = σ²_Y < ∞), then Y̅ →ᵖ μ_Y.

where (from Table 2.2) the probability that Yᵢ = 1 is 0.78. Because the expectation of a Bernoulli random variable is its success probability, E(Yᵢ) = μ_Y = 0.78. The sample average Y̅ is the fraction of days in her sample in which her commute was short.

Figure 2.8 shows the sampling distribution of Y̅ for various sample sizes n. When n = 2 (Figure 2.8a), Y̅ can take on only three values: 0, ½, and 1 (neither commute was short, one was short, and both were short), none of which is particularly close to the true proportion in the population, 0.78. As n increases, however (Figures 2.8b-d), Y̅ takes on more values and the sampling distribution becomes tightly centered on μ_Y.

The property that Y̅ is near μ_Y with increasing probability as n increases is called convergence in probability or, more concisely, consistency (see Key Concept 2.6). The law of large numbers states that, under certain conditions, Y̅ converges in probability to μ_Y or, equivalently, that Y̅ is consistent for μ_Y.

The conditions for the law of large numbers that we will use in this book are that Yᵢ, i = 1, ..., n, are i.i.d. and that the variance of Yᵢ, σ²_Y, is finite. The mathematical role of these conditions is made clear in Section 17.2, where the law of large numbers is proven. If the data are collected by simple random sampling, then the i.i.d. assumption holds. The assumption that the variance is finite says that extremely large values of Yᵢ, that is, outliers, are unlikely and observed infrequently; otherwise, these large values could dominate Y̅ and the sample average would be unreliable. This assumption is plausible for the applications in this book. For example, because there is an upper limit to our student's commuting time (she could park and walk if the traffic is dreadful), the variance of the distribution of commuting times is finite.
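The law of large numbers can be illustrated with the commuting example. In the sketch below (not part of the text), each day is a Bernoulli draw that equals 1 with probability p = 0.78, and the sample fraction of short commutes settles near p as n grows:

```python
import random

# Law of large numbers: the fraction of "short commute" days converges
# in probability to p = 0.78 as the number of sampled days n increases.
random.seed(2)
p = 0.78

def sample_fraction(n):
    """Fraction of 1's in n i.i.d. Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(n)) / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, sample_fraction(n))  # drifts toward 0.78 as n grows
```

Running this shows the small-n fractions scattered around 0.78, with the spread shrinking as n increases, which is the pattern in Figure 2.8.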


FIGURE 2.8 Sampling Distribution of the Sample Average of n Bernoulli Random Variables

[Four panels plot probability against the value of the sample average for (a) n = 2, (b) n = 5, (c) n = 25, and (d) n = 100.]

The distributions are the sampling distributions of Y̅, the sample average of n independent Bernoulli random variables with p = Pr(Yᵢ = 1) = 0.78 (the probability of a short commute is 78%). The variance of the sampling distribution of Y̅ decreases as n gets larger, so the sampling distribution becomes more tightly concentrated around its mean μ_Y = 0.78 as the sample size n increases.


The Central Limit Theorem

The central limit theorem says that, under general conditions, the distribution of Y̅ is well approximated by a normal distribution when n is large. Recall that the mean of Y̅ is μ_Y and its variance is σ²_Y̅ = σ²_Y/n. According to the central limit theorem, when n is large, the distribution of Y̅ is approximately N(μ_Y, σ²_Y̅). As discussed at the end of Section 2.5, the distribution of Y̅ is exactly N(μ_Y, σ²_Y̅) when the sample is drawn from a population with the normal distribution N(μ_Y, σ²_Y). The central limit theorem says that this same result is approximately true when n is large even if Y₁, ..., Yₙ are not themselves normally distributed.

The convergence of the distribution of Y̅ to the bell-shaped, normal approximation can be seen (a bit) in Figure 2.8. However, because the distribution gets quite tight for large n, this requires some squinting. It would be easier to see the shape of the distribution of Y̅ if you used a magnifying glass or had some other way to zoom in or to expand the horizontal axis of the figure.

One way to do this is to standardize Y̅ by subtracting its mean and dividing by its standard deviation, so that it has a mean of 0 and a variance of 1. This leads to examining the distribution of the standardized version of Y̅, (Y̅ − μ_Y)/σ_Y̅. According to the central limit theorem, this distribution should be well approximated by a N(0, 1) distribution when n is large.

The distribution of the standardized average (Y̅ − μ_Y)/σ_Y̅ is plotted in Figure 2.9 for the distributions in Figure 2.8; the distributions in Figure 2.9 are exactly the same as in Figure 2.8, except that the scale of the horizontal axis is changed so that the standardized variable has a mean of 0 and a variance of 1. After this change of scale, it is easy to see that, if n is large enough, the distribution of Y̅ is well approximated by a normal distribution.

One might ask, how large is "large enough"? That is, how large must n be for the distribution of Y̅ to be approximately normal? The answer is, "It depends." The quality of the normal approximation depends on the distribution of the underlying Yᵢ that make up the average. At one extreme, if the Yᵢ are themselves normally distributed, then Y̅ is exactly normally distributed for all n. In contrast, when the underlying Yᵢ have a distribution that is far from normal, then this approximation can require n = 30 or even more.

This point is illustrated in Figure 2.10 for a population distribution, shown in Figure 2.10a, that is quite different from the Bernoulli distribution. This distribution has a long right tail (it is "skewed" to the right). The sampling distribution of Y̅, after centering and scaling, is shown in Figures 2.10b, c, and d for n = 5, 25, and 100, respectively. Although the sampling distribution is approaching the bell shape for n = 25, the normal approximation still has noticeable imperfections.
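The central limit theorem itself can be checked the same way. The sketch below (not part of the text) uses an exponential population, which, like the distribution in Figure 2.10a, is skewed to the right; its mean and standard deviation are both 1. With n = 100, roughly 95% of the standardized sample averages should fall within ±1.96, as the normal approximation predicts:

```python
import math
import random
import statistics

# Central limit theorem: standardized averages of skewed exponential draws
# are approximately N(0, 1) once n is reasonably large.
random.seed(3)
mu, sigma, n = 1.0, 1.0, 100  # exponential(rate=1) has mean 1 and sd 1

def standardized_average():
    """(Ybar - mu) / (sigma / sqrt(n)) for one sample of n draws."""
    ybar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    return (ybar - mu) / (sigma / math.sqrt(n))

draws = [standardized_average() for _ in range(20_000)]
share_within = sum(-1.96 <= z <= 1.96 for z in draws) / len(draws)
print(share_within)  # close to 0.95 under the normal approximation
```

Repeating this with n = 5 instead of n = 100 leaves visible skewness in the simulated distribution, matching the imperfections seen in Figure 2.10b.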


FIGURE 2.9 Distribution of the Standardized Sample Average of n Bernoulli Random Variables with p = 0.78

[Four panels plot probability against the standardized value of the sample average for (a) n = 2, (b) n = 5, (c) n = 25, and (d) n = 100.]

The sampling distributions of Y̅ in Figure 2.8 are plotted here after standardizing Y̅. This centers the distributions in Figure 2.8 and magnifies the scale on the horizontal axis by a factor of √n. When the sample size is large, the sampling distributions are increasingly well approximated by the normal distribution (the solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.

FIGURE 2.10 Distribution of the Standardized Sample Average of n Draws from a Skewed Distribution

[Four panels plot probability against the standardized value of the sample average for (a) n = 1 (the population distribution), (b) n = 5, (c) n = 25, and (d) n = 100.]

The figures show the sampling distribution of the standardized sample average of n draws from the skewed (asymmetric) population distribution shown in Figure 2.10a. When n is small (n = 5), the sampling distribution, like the population distribution, is skewed. But when n is large (n = 100), the sampling distribution is well approximated by a standard normal distribution (solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.


KEY CONCEPT 2.7
THE CENTRAL LIMIT THEOREM

Suppose that Y₁, ..., Yₙ are i.i.d. with E(Yᵢ) = μ_Y and var(Yᵢ) = σ²_Y, where 0 < σ²_Y < ∞. As n → ∞, the distribution of (Y̅ − μ_Y)/σ_Y̅ (where σ²_Y̅ = σ²_Y/n) becomes arbitrarily well approximated by the standard normal distribution.

By n = 100, however, the normal approximation is quite good. In fact, for n ≥ 100 the normal approximation to the distribution of Y̅ typically is very good for a wide variety of population distributions.

The central limit theorem is a remarkable result. While the "small n" distributions of Y̅ in parts b and c of Figures 2.9 and 2.10 are complicated and quite different from each other, the "large n" distributions in Figures 2.9d and 2.10d are simple and, amazingly, have a similar shape. Because the distribution of Y̅ approaches the normal as n grows large, Y̅ is said to be asymptotically normally distributed.

The convenience of the normal approximation, combined with its wide applicability because of the central limit theorem, makes it a key underpinning of modern applied econometrics. The central limit theorem is summarized in Key Concept 2.7.

Summary

1. The probabilities with which a random variable takes on different values are summarized by the cumulative distribution function, the probability distribution function (for discrete random variables), and the probability density function (for continuous random variables).
2. The expected value of a random variable Y (also called its mean, μ_Y), denoted E(Y), is its probability-weighted average value. The variance of Y is σ²_Y = E[(Y − μ_Y)²], and the standard deviation of Y is the square root of its variance.
3. The joint probabilities for two random variables X and Y are summarized by their joint probability distribution. The conditional probability distribution of Y given X = x is the probability distribution of Y, conditional on X taking on the value x.
4. A normally distributed random variable has the bell-shaped probability density in Figure 2.5. To calculate a probability associated with a normal random variable,

first standardize the variable, then use the standard normal cumulative distribution tabulated in Appendix Table 1.
5. Simple random sampling produces n random observations Y₁, ..., Yₙ that are independently and identically distributed (i.i.d.).
6. The sample average, Y̅, varies from one randomly chosen sample to the next and thus is a random variable with a sampling distribution. If Y₁, ..., Yₙ are i.i.d., then:
a. the sampling distribution of Y̅ has mean μ_Y and variance σ²_Y̅ = σ²_Y/n;
b. the law of large numbers says that Y̅ converges in probability to μ_Y; and
c. the central limit theorem says that the standardized version of Y̅, (Y̅ − μ_Y)/σ_Y̅, has a standard normal distribution [N(0, 1) distribution] when n is large.

Key Terms
outcomes (18)
probability (l8)
sample space (19)
event (19)
discrete random variable (19)
continuous random variable (19)
probability distribution (19)
cumulative probability distribution (19)
cumulative distribution function (c.d.f.) (20)

Bernoulli random variable (20)


Bernoulli distribution (20)
probability density function (p.d.f.) (21)
density function (21)
density (21)
expected va lue (23)
expectation (23)
mean (23)
varia nce (24)
standard deviation (24)
moments of a distribution (27)
skewness (27)
kurtosis (27)


ou tlier (27)
leptokurtic (28)
joint probability distribution (29)
marginal probability distnbution (30)
conditional distribution (30)
conditional expectation (32)
conditional mean (32)
law of iterated expectations (32)
conditional variance (33)
indepe ndence (34)
covariance (34)
correlation (35)
uncorrelated (35)
normal distribution (39)
standard normal distribution (39)
standardize a variable (39)
multivariate normal distribution (41)
bivariate normal distribution (41)
chi-squared distribution (43)
Student t distribution (44)
F distribution (44)


simple random sampli ng (46)


population (46)
identically distributed (46)
independently and identically distributed
(i.i.d.) (46)
sampling distribution (47)
exact (finite-sample) distribution (49)


asymptotic distribution (49)
law of large numbers (49)
convergence in probability (50)
consistency (50)
central limit theorem (52)
asymptotic normal distribution (55)

Review the Concepts


2.1

Examples of random variables used in this chapter included: (a) the gender of the next person you meet, (b) the number of times a computer crashes, (c) the time it takes to commute to school, (d) whether the computer you are assigned in the library is new or old, and (e) whether it is raining or not. Explain why each can be thought of as random.

2.2

Suppose that the random variables X and Y are independent and you know their distributions. Explain why knowing the value of X tells you nothing about the value of Y.

2.3

Suppose that X denotes the amount of rainfall in your hometown during a given month and Y denotes the number of children born in Los Angeles during the same month. Are X and Y independent? Explain.

2.4

An econometrics class has 80 students, and the mean student weight is 145 lbs. A random sample of 4 students is selected from the class and their average weight is calculated. Will the average weight of the students in the sample equal 145 lbs.? Why or why not? Use this example to explain why the sample average, Y̅, is a random variable.

2.5

Suppose that Y₁, ..., Yₙ are i.i.d. random variables with a N(1, 4) distribution. Sketch the probability density of Y̅ when n = 2. Repeat this for n = 10 and n = 100. In words, describe how the densities differ. What is the relationship between your answer and the law of large numbers?

2.6

Suppose that Y₁, ..., Yₙ are i.i.d. random variables with the probability distribution given in Figure 2.10a. You want to calculate Pr(Y̅ ≤ 0.1). Would it be reasonable to use the normal approximation if n = 5? What about n = 25 or n = 100? Explain.

2.7

Y is a random variable with μ_Y = 0, σ_Y = 1, skewness = 0, and kurtosis = 100. Sketch a hypothetical probability distribution of Y. Explain why n random variables drawn from this distribution might have some large outliers.


Exercises
2.1 Let Y denote the number of "heads" that occur when two coins are tossed.
a. Derive the probability distribution of Y.
b. Derive the cumulative probability distribution of Y.
c. Derive the mean and variance of Y.

2.2 Use the probability distribution given in Table 2.2 to compute (a) E(Y) and E(X); (b) σ²_Y and σ²_X; and (c) σ_XY and corr(X, Y).


2.3 Using the random variables X and Y from Table 2.2, consider two new random variables W = 3 + 6X and V = 20 − 7Y. Compute (a) E(W) and E(V); (b) σ²_W and σ²_V; and (c) σ_WV and corr(W, V).


2.4 Suppose X is a Bernoulli random variable with P(X = 1) = p.
a. Show E(X³) = p.
b. Show E(Xᵏ) = p for k > 0.
c. Suppose that p = 0.3. Compute the mean, variance, skewness, and kurtosis of X. (Hint: You might find it helpful to use the formulas given in Exercise 2.21.)
2.5 In September, Seattle's daily high temperature has a mean of 70°F and a standard deviation of 7°F. What are the mean, standard deviation, and variance in °C?

2.6 The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population, based on the 1990 U.S. Census.

Joint Distribution of Employment Status and College Graduation in the U.S. Population Aged 25-64, 1990

                             Unemployed (Y = 0)   Employed (Y = 1)   Total
Non-college grads (X = 0)          0.045               0.709         0.754
College grads (X = 1)              0.005               0.241         0.246
Total                              0.050               0.950         1.000

a. Compute E(Y).
b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by 1 − E(Y).
c. Calculate E(Y | X = 1) and E(Y | X = 0).
d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
f. Are educational achievement and employment status independent? Explain.
2.7 In a given population of two-earner male-female couples, male earnings have a mean of $40,000 per year and a standard deviation of $12,000. Female earnings have a mean of $45,000 per year and a standard deviation of $18,000. The correlation between male and female earnings for a couple is 0.80. Let C denote the combined earnings for a randomly selected couple.
a. What is the mean of C?
b. What is the covariance between male and female earnings?
c. What is the standard deviation of C?
d. Convert the answers to (a)-(c) from $ (dollars) to € (euros).

2.8 The random variable Y has a mean of 1 and a variance of 4. Let Z = ½(Y − 1). Show that μ_Z = 0 and σ²_Z = 1.

2.9 X and Y are discrete random variables with the following joint distribution:

                         Value of Y
              14      22      30      40      65
Value    1   0.02    0.05    0.10    0.03    0.01
of X     5   0.17    0.15    0.05    0.02    0.01
         8   0.02    0.03    0.15    0.10    0.09

That is, Pr(X = 1, Y = 14) = 0.02, and so forth.
a. Calculate the probability distribution, mean, and variance of Y.
b. Calculate the probability distribution, mean, and variance of Y given X = 8.
c. Calculate the covariance and correlation between X and Y.


2.10 Compute the following probabilities:
a. If Y is distributed N(1, 4), find Pr(Y ≤ 3).
b. If Y is distributed N(3, 9), find Pr(Y > 0).
c. If Y is distributed N(50, 25), find Pr(40 ≤ Y ≤ 52).
d. If Y is distributed N(5, 2), find Pr(6 ≤ Y ≤ 8).
2.11 Compute the following probabilities:
a. If Y is distributed χ²_4, find Pr(Y ≤ 7.78).
b. If Y is distributed χ²_10, find Pr(Y > 18.31).
c. If Y is distributed F_{10,∞}, find Pr(Y > 1.83).
d. Why are the answers to (b) and (c) the same?
e. If Y is distributed χ²_1, find Pr(Y ≤ 1.0). (Hint: Use the definition of the χ²_1 distribution.)
2.12 Compute the following probabilities:
a. If Y is distributed t_15, find Pr(Y > 1.75).
b. If Y is distributed t_90, find Pr(−1.99 ≤ Y ≤ 1.99).
c. If Y is distributed N(0, 1), find Pr(−1.99 ≤ Y ≤ 1.99).
d. Why are the answers to (b) and (c) approximately the same?
e. If Y is distributed F_{7,4}, find Pr(Y > 4.12).
f. If Y is distributed F_{7,120}, find Pr(Y > 2.79).


2.13 X is a Bernoulli random variable with Pr(X = 1) = 0.99, Y is distributed N(0, 1), and W is distributed N(0, 100). Let S = XY + (1 − X)W. (That is, S = Y when X = 1, and S = W when X = 0.)
a. Show that E(Y²) = 1 and E(W²) = 100.
b. Show that E(Y³) = 0 and E(W³) = 0. (Hint: What is the skewness for a symmetric distribution?)
c. Show that E(Y⁴) = 3 and E(W⁴) = 3 × 100². (Hint: Use the fact that the kurtosis is 3 for a normal distribution.)
d. Derive E(S), E(S²), E(S³), and E(S⁴). (Hint: Use the law of iterated expectations conditioning on X = 0 and X = 1.)
e. Derive the skewness and kurtosis for S.

2.14 In a population, μ_Y = 100 and σ²_Y = 43. Use the central limit theorem to answer the following questions:
a. In a random sample of size n = 100, find Pr(Y̅ < 101).
b. In a random sample of size n = 165, find Pr(Y̅ > 98).
c. In a random sample of size n = 64, find Pr(101 ≤ Y̅ ≤ 103).


2.15 Suppose Yᵢ, i = 1, 2, ..., n, are i.i.d. random variables, each distributed N(10, 4).
a. Compute Pr(9.6 ≤ Y̅ ≤ 10.4) when (i) n = 20, (ii) n = 100, and (iii) n = 1,000.
b. Suppose c is a positive number. Show that Pr(10 − c ≤ Y̅ ≤ 10 + c) becomes close to 1.0 as n grows large.
c. Use your answer in (b) to argue that Y̅ converges in probability to 10.

2.16 Y is distributed N(5, 100) and you want to calculate Pr(Y < 3.6). Unfortunately, you do not have your textbook and do not have access to a normal probability table like Appendix Table 1. However, you do have your computer and a computer program that can generate i.i.d. draws from the N(5, 100) distribution. Explain how you can use your computer to compute an accurate approximation for Pr(Y < 3.6).
2.17 Yᵢ, i = 1, ..., n, are i.i.d. Bernoulli random variables with p = 0.4. Let Y̅ denote the sample mean.
a. Use the central limit theorem to compute approximations for
i. Pr(Y̅ ≥ 0.43) when n = 100.
ii. Pr(Y̅ ≤ 0.37) when n = 400.
b. How large would n need to be to ensure that Pr(0.39 ≤ Y̅ ≤ 0.41) ≥ 0.95? (Use the central limit theorem to compute an approximate answer.)

2.18 In any year, the weather can inflict storm damage to a home. From year to year, the damage is random. Let Y denote the dollar value of damage in any given year. Suppose that in 95% of the years Y = $0, but in 5% of the years Y = $20,000.

a. What are the mean and standard deviation of the damage in any year?

b. Consider an "insurance pool" of 100 people whose homes are sufficiently dispersed so that, in any year, the damage to different homes can be viewed as independently distributed random variables. Let Ȳ denote the average damage to these 100 homes in a year. (i) What is the expected value of the average damage Ȳ? (ii) What is the probability that Ȳ exceeds $2000?

CHAPTER 2  Review of Probability

2.19 Consider two random variables X and Y. Suppose that Y takes on k values y₁, ..., y_k and that X takes on l values x₁, ..., x_l.

a. Show that Pr(Y = y_j) = Σᵢ₌₁ˡ Pr(Y = y_j | X = xᵢ)Pr(X = xᵢ). [Hint: Use the definition of Pr(Y = y_j | X = xᵢ).]

b. Use your answer to (a) to verify Equation (2.19).

c. Suppose that X and Y are independent. Show that σXY = 0 and corr(X, Y) = 0.

2.20 Consider three random variables X, Y, and Z. Suppose that Y takes on k values y₁, ..., y_k, that X takes on l values x₁, ..., x_l, and that Z takes on m values z₁, ..., z_m. The joint probability distribution of X, Y, Z is Pr(X = x, Y = y, Z = z), and the conditional probability distribution of Y given X and Z is

Pr(Y = y | X = x, Z = z) = Pr(Y = y, X = x, Z = z) / Pr(X = x, Z = z).

a. Explain how the marginal probability that Y = y can be calculated from the joint probability distribution. [Hint: This is a generalization of Equation (2.16).]

b. Show that E(Y) = E[E(Y | X, Z)]. [Hint: This is a generalization of Equations (2.19) and (2.20).]
2.21 X is a random variable with moments E(X), E(X²), E(X³), and so forth.

a. Show E(X − μ)³ = E(X³) − 3[E(X²)][E(X)] + 2[E(X)]³.

b. Show E(X − μ)⁴ = E(X⁴) − 4[E(X)][E(X³)] + 6[E(X)]²[E(X²)] − 3[E(X)]⁴.

2.22 Suppose you have some money to invest (for simplicity, $1) and you are planning to put a fraction w into a stock market mutual fund and the rest, 1 − w, into a bond mutual fund. Suppose that $1 invested in a stock fund yields Rs after one year and that $1 invested in a bond fund yields Rb. Suppose that Rs is random with mean 0.08 (8%) and standard deviation 0.07, and that Rb is random with mean 0.05 (5%) and standard deviation 0.04. The correlation between Rs and Rb is 0.25. If you place a fraction w of your money in the stock fund and the rest, 1 − w, in the bond fund, then the return on your investment is R = wRs + (1 − w)Rb.

a. Suppose that w = 0.5. Compute the mean and standard deviation of R.

b. Suppose that w = 0.75. Compute the mean and standard deviation of R.

c. What value of w makes the mean of R as large as possible? What is the standard deviation of R for this value of w?

d. (Harder) What is the value of w that minimizes the standard deviation of R? (You can show this using a graph, algebra, or calculus.)
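Parts (a) and (b) are direct applications of the mean and variance formulas for a weighted sum of two random variables (Key Concept 2.3). A sketch of the arithmetic (the code organization is ours):

```python
import math

mu_s, sd_s = 0.08, 0.07   # stock fund: mean and standard deviation of Rs
mu_b, sd_b = 0.05, 0.04   # bond fund: mean and standard deviation of Rb
rho = 0.25                # correlation between Rs and Rb

def portfolio(w):
    # R = w*Rs + (1 - w)*Rb, so
    # E(R)   = w*mu_s + (1 - w)*mu_b
    # var(R) = w^2*sd_s^2 + (1 - w)^2*sd_b^2 + 2*w*(1 - w)*cov(Rs, Rb)
    mean = w * mu_s + (1 - w) * mu_b
    var = w**2 * sd_s**2 + (1 - w)**2 * sd_b**2 + 2 * w * (1 - w) * rho * sd_s * sd_b
    return mean, math.sqrt(var)

print(portfolio(0.5))   # part (a): mean 0.065, standard deviation about 0.044
print(portfolio(0.75))  # part (b)
```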
2.23 This exercise provides an example of a pair of random variables X and Y for which the conditional mean of Y given X depends on X but corr(X, Y) = 0. Let X and Z be two independently distributed standard normal random variables, and let Y = X² + Z.

a. Show that E(Y | X) = X².

b. Show that μY = 1.

c. Show that E(XY) = 0. (Hint: Use the fact that the odd moments of a standard normal random variable are all zero.)

d. Show that cov(X, Y) = 0 and thus corr(X, Y) = 0.

2.24 Suppose Yᵢ is distributed i.i.d. N(0, σ²) for i = 1, 2, ..., n.

a. Show that E(Yᵢ²/σ²) = 1.

b. Show that W = (1/σ²)Σᵢ₌₁ⁿ Yᵢ² is distributed χ²ₙ.

c. Show that E(W) = n. [Hint: Use your answer to (a).]

d. Show that V = Y₁ / √[(Σᵢ₌₂ⁿ Yᵢ²)/(n − 1)] is distributed t with n − 1 degrees of freedom.
APPENDIX 2.1

Derivation of Results in Key Concept 2.3

This appendix derives the equations in Key Concept 2.3.

Equation (2.29) follows from the definition of the expectation.

To derive Equation (2.30), use the definition of the variance to write var(a + bY) = E{[a + bY − E(a + bY)]²} = E{[b(Y − μY)]²} = b²E[(Y − μY)²] = b²σ²Y.

To derive Equation (2.31), use the definition of the variance to write

var(aX + bY) = E{[(aX + bY) − (aμX + bμY)]²}
= E{[a(X − μX) + b(Y − μY)]²}
= E[a²(X − μX)²] + 2E[ab(X − μX)(Y − μY)] + E[b²(Y − μY)²]
= a²var(X) + 2ab cov(X, Y) + b²var(Y)
= a²σ²X + 2abσXY + b²σ²Y,   (2.49)

where the second equality follows by collecting terms, the third equality follows by expanding the quadratic, and the fourth equality follows by the definitions of the variance and the covariance.

To derive Equation (2.32), write E(Y²) = E{[(Y − μY) + μY]²} = E[(Y − μY)²] + 2μY E(Y − μY) + μ²Y = σ²Y + μ²Y, because E(Y − μY) = 0.

To derive Equation (2.33), use the definition of the covariance to write

cov(a + bX + cV, Y) = E{[a + bX + cV − E(a + bX + cV)][Y − μY]}
= E{[b(X − μX) + c(V − μV)][Y − μY]}
= E{[b(X − μX)][Y − μY]} + E{[c(V − μV)][Y − μY]}   (2.50)
= bσXY + cσVY,

which is Equation (2.33).

To derive Equation (2.34), write E(XY) = E{[(X − μX) + μX][(Y − μY) + μY]} = E[(X − μX)(Y − μY)] + μX E(Y − μY) + μY E(X − μX) + μXμY = σXY + μXμY.
We now prove the correlation inequality in Equation (2.35); that is, |corr(X, Y)| ≤ 1. Let a = −σXY/σ²X and b = 1. Applying Equation (2.31), we have that

var(aX + Y) = a²σ²X + σ²Y + 2aσXY
= (−σXY/σ²X)²σ²X + σ²Y + 2(−σXY/σ²X)σXY
= σ²Y − σ²XY/σ²X.   (2.51)

Because var(aX + Y) is a variance, it cannot be negative, so from the final line of Equation (2.51) it must be that σ²Y − σ²XY/σ²X ≥ 0. Rearranging this inequality yields

σ²XY ≤ σ²Xσ²Y  (covariance inequality).   (2.52)

The covariance inequality implies that σ²XY/(σ²Xσ²Y) ≤ 1 or, equivalently, |σXY/(σXσY)| ≤ 1, which (using the definition of the correlation) proves the correlation inequality, |corr(X, Y)| ≤ 1.
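The covariance inequality (2.52) also holds for sample moments (it is the Cauchy-Schwarz inequality), which can be illustrated on simulated data; a sketch with an arbitrary correlated pair (the coefficients and seed are our choices):

```python
import random

random.seed(1)

# Simulate correlated pairs (X, Y) and check the covariance inequality
# cov(X, Y)^2 <= var(X) * var(Y) on the sample moments.
n = 100_000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    y = 0.6 * x + random.gauss(0, 1)   # Y is correlated with X
    xs.append(x)
    ys.append(y)

mx = sum(xs) / n
my = sum(ys) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
var_y = sum((y - my) ** 2 for y in ys) / n
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

corr = cov_xy / (var_x * var_y) ** 0.5
print(cov_xy ** 2 <= var_x * var_y, corr)  # inequality holds, so |corr| <= 1
```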

CHAPTER 3

Review of Statistics

Statistics is the science of using data to learn about the world around us. Statistical tools help to answer questions about unknown characteristics of distributions in populations of interest. For example, what is the mean of the distribution of earnings of recent college graduates? Do mean earnings differ for men and women and, if so, by how much?
These questions relate to the distribution of earnings in the population of workers. One way to answer these questions would be to perform an exhaustive survey of the population of workers, measuring the earnings of each worker and thus finding the population distribution of earnings. In practice, however, such a comprehensive survey would be extremely expensive. The only comprehensive survey of the U.S. population is the decennial census. The 2000 U.S. Census cost $10 billion, and the process of designing the census forms, managing and conducting the surveys, and compiling and analyzing the data takes ten years. Despite this extraordinary commitment, many members of the population slip through the cracks and are not surveyed. Thus a different, more practical approach is needed.

The key insight of statistics is that one can learn about a population distribution by selecting a random sample from that population. Rather than survey the entire U.S. population, we might survey, say, 1000 members of the population selected at random by simple random sampling. Using statistical methods, we can use this sample to reach tentative conclusions (to draw statistical inferences) about characteristics of the full population.
Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals. Estimation entails computing a "best guess" numerical value for an unknown characteristic of a population distribution, such as its mean, from a sample of data. Hypothesis testing entails formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true. Confidence intervals use a set of data to estimate an interval or range for an unknown population characteristic. Sections 3.1, 3.2, and 3.3 review estimation, hypothesis testing, and confidence intervals in the context of statistical inference about an unknown population mean.

Most of the interesting questions in economics involve relationships between two or more variables or comparisons between different populations. For example, is there a gap between the mean earnings for male and female recent college graduates? In Section 3.4, the methods for learning about the mean of a single population in Sections 3.1-3.3 are extended to compare means in two different populations. Section 3.5 discusses how the methods for comparing the means of two populations can be used to estimate causal effects in experiments. Sections 3.2-3.5 focus on the use of the normal distribution for performing hypothesis tests and for constructing confidence intervals when the sample size is large. In some special circumstances, hypothesis tests and confidence intervals can be based on the Student t distribution instead of the normal distribution; these special circumstances are discussed in Section 3.6. The chapter concludes with a discussion of the sample correlation and scatterplots in Section 3.7.

3.1 Estimation of the Population Mean


Suppo'c you want to know the mean value of Y (J.4l ill popul.llinu . . uch ; , thc
mcm carnines of women recently graduated from colkAc. J\ naturo l way to csti
mat~ tim, mean is to compute the sample:: avt:rage r Irom a )ample of n ind~pcn
dcntly ,tnd ide ntically distribmcd (i.1.d.) obsc rvatl\)nS, Y1 . Y (recall that
Y 1, , Y,. .~re 1 t.d. If they are collected by simple ranc.lont !>amphn~ l lhb Sl:Clllln
di\CU"I.' 1. 'limatiun of J.L~ and the properties of Y a' an l;\1 im.llor {ll 11 t


KEY CONCEPT 3.1
ESTIMATORS AND ESTIMATES

An estimator is a function of a sample of data to be drawn randomly from a population. An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimator is a random variable because of randomness in selecting the sample, while an estimate is a nonrandom number.

Estimators and Their Properties


Estimators.

The sample average Y is a natural way to estimate JJ. y, but it is not


the only way. Por example. another way to estimate iJ.l' is simply to use the first
ob ervarion. Y1 Both Y and Y1 are functions of the:: d.tlu thul .m: dcs1gned to estimate p.,: U!)ing the tCffi1inology in Ke) Concept '\ I both Me e!)timators of J.l-l'
When evaluated in repeatt!d '\8mples. Y and Y1 take on diffc.:rent values (they produce different estimates) from one sample to the next. ' JllU!). the estimators Y and
Y1 both have sampling distributions. There a re, in fact , many estimators of J.Ly, of
wh1ch Y and Y1 are rwo examples.
There are many possible estimators, so what makes one estimator "better" than another? Because estimators are random variables, this question can be phrased more precisely: What are desirable characteristics of the sampling distribution of an estimator? In general, we would like an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. This observation leads to three specific desirable characteristics of an estimator: unbiasedness (a lack of bias), consistency, and efficiency.

Unbiasedness. Suppose you evaluate an estimator many times over repeated randomly drawn samples. It is reasonable to hope that, on average, you would get the right answer. Thus a desirable property of an estimator is that the mean of its sampling distribution equals μY; if so, the estimator is said to be unbiased.

To state this mathematically, let μ̂Y denote some estimator of μY, such as Ȳ or Y₁. The estimator μ̂Y is unbiased if E(μ̂Y) = μY, where E(μ̂Y) is the mean of the sampling distribution of μ̂Y; otherwise, μ̂Y is biased.


KEY CONCEPT 3.2
BIAS, CONSISTENCY, AND EFFICIENCY

Let μ̂Y be an estimator of μY. Then:

The bias of μ̂Y is E(μ̂Y) − μY.

μ̂Y is an unbiased estimator of μY if E(μ̂Y) = μY.

μ̂Y is a consistent estimator of μY if μ̂Y converges in probability to μY.

Let μ̃Y be another estimator of μY and suppose that both μ̂Y and μ̃Y are unbiased. Then μ̂Y is said to be more efficient than μ̃Y if var(μ̂Y) < var(μ̃Y).
Consistency. Another desirable property of an estimator μ̂Y is that, when the sample size is large, the uncertainty about the value of μY arising from random variations in the sample is very small. Stated more precisely, a desirable property of μ̂Y is that the probability that it is within a small interval of the true value μY approaches 1 as the sample size increases; that is, μ̂Y is consistent for μY (Key Concept 2.6).

Variance and efficiency. Suppose you have two candidate estimators, μ̂Y and μ̃Y, both of which are unbiased. How might you choose between them? One way to do so is to choose the estimator with the tightest sampling distribution. This suggests choosing between μ̂Y and μ̃Y by picking the estimator with the smallest variance. If μ̂Y has a smaller variance than μ̃Y, then μ̂Y is said to be more efficient than μ̃Y. The terminology "efficiency" stems from the notion that if μ̂Y has a smaller variance than μ̃Y, then it uses the information in the data more efficiently than does μ̃Y.

Bias, consistency, and efficiency are summarized in Key Concept 3.2.

Properties of Ȳ

How does Ȳ fare as an estimator of μY when judged by the three criteria of bias, consistency, and efficiency?

Bias and consistency. The sampling distribution of Ȳ has already been examined in Sections 2.5 and 2.6. As shown in Section 2.5, E(Ȳ) = μY, so Ȳ is an unbiased estimator of μY. Similarly, the law of large numbers (Key Concept 2.6) states that Ȳ converges in probability to μY; that is, Ȳ is consistent.

Efficiency. What can be said about the efficiency of Ȳ? Because efficiency entails a comparison of estimators, we need to specify the estimator or estimators to which Ȳ is to be compared.

We start by comparing the efficiency of Ȳ to the estimator Y₁. Because Y₁, ..., Yₙ are i.i.d., the mean of the sampling distribution of Y₁ is E(Y₁) = μY; thus Y₁ is an unbiased estimator of μY. Its variance is var(Y₁) = σ²Y. From Section 2.5, the variance of Ȳ is σ²Y/n. Thus, for n ≥ 2, the variance of Ȳ is less than the variance of Y₁; that is, Ȳ is a more efficient estimator than Y₁, so, according to the criterion of efficiency, Ȳ should be used instead of Y₁. The estimator Y₁ might strike you as an obviously poor estimator (why would you go to the trouble of collecting a sample of n observations only to throw away all but the first?), and the concept of efficiency provides a formal way to show that Ȳ is a more desirable estimator than Y₁.
What about a less obviously poor estimator? Consider the weighted average in which the observations are alternately weighted by 1/2 and 3/2:

Ỹ = (1/n)[(1/2)Y₁ + (3/2)Y₂ + (1/2)Y₃ + (3/2)Y₄ + ... + (1/2)Y_{n−1} + (3/2)Yₙ],   (3.1)

where the number of observations n is assumed to be even for convenience. The mean of Ỹ is μY and its variance is var(Ỹ) = 1.25σ²Y/n (Exercise 3.11). Thus Ỹ is unbiased and, because var(Ỹ) → 0 as n → ∞, Ỹ is consistent. However, Ỹ has a larger variance than Ȳ. Thus Ȳ is more efficient than Ỹ.
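The variance ranking of Ȳ, Y₁, and Ỹ can be checked by simulation; a sketch (the population values μ = 10, σ = 2, the sample size, and the replication count are our choices):

```python
import random

random.seed(2)

# Compare, across many simulated samples, the variability of three
# estimators of the mean: Ybar, Y1 (first observation), and the
# alternately weighted average Ytilde of Equation (3.1).
mu, sigma, n, reps = 10.0, 2.0, 100, 20_000

def ytilde(sample):
    weights = [0.5 if i % 2 == 0 else 1.5 for i in range(len(sample))]
    return sum(w * y for w, y in zip(weights, sample)) / len(sample)

est_bar, est_first, est_tilde = [], [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    est_bar.append(sum(sample) / n)
    est_first.append(sample[0])
    est_tilde.append(ytilde(sample))

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(variance(est_bar))    # about sigma^2/n      = 0.04
print(variance(est_tilde))  # about 1.25*sigma^2/n = 0.05
print(variance(est_first))  # about sigma^2        = 4.0
```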


The estimators Ȳ, Y₁, and Ỹ have a common mathematical structure: They are weighted averages of Y₁, ..., Yₙ. The comparisons in the previous two paragraphs show that the weighted averages Y₁ and Ỹ have larger variances than Ȳ. In fact, these conclusions reflect a more general result: Ȳ is the most efficient estimator of all unbiased estimators that are weighted averages of Y₁, ..., Yₙ. Said differently, Ȳ is the Best Linear Unbiased Estimator (BLUE); that is, it is the most efficient (best) estimator among all estimators that are unbiased and are linear functions of Y₁, ..., Yₙ. This result is stated in Key Concept 3.3 and is proven in Chapter 5.

Ȳ is the least squares estimator of μY. The sample average Ȳ provides the best fit to the data in the sense that the average squared differences between the observations and Ȳ are the smallest of all possible estimators.


KEY CONCEPT 3.3
EFFICIENCY OF Ȳ: Ȳ IS BLUE

Let μ̂Y be an estimator of μY that is a weighted average of Y₁, ..., Yₙ; that is, μ̂Y = (1/n)Σᵢ₌₁ⁿ aᵢYᵢ, where a₁, ..., aₙ are nonrandom constants. If μ̂Y is unbiased, then var(Ȳ) < var(μ̂Y) unless μ̂Y = Ȳ. Thus Ȳ is the Best Linear Unbiased Estimator (BLUE); that is, Ȳ is the most efficient estimator of μY among all unbiased estimators that are weighted averages of Y₁, ..., Yₙ.

Consider the problem of finding the estimator m that minimizes

Σᵢ₌₁ⁿ (Yᵢ − m)²,   (3.2)

which is a measure of the total squared gap or distance between the estimator m and the sample points. Because m is an estimator of E(Y), you can think of it as a prediction of the value of Yᵢ, so that the gap Yᵢ − m can be thought of as a prediction mistake. The sum of squared gaps in expression (3.2) can be thought of as the sum of squared prediction mistakes.

The estimator m that minimizes the sum of squared gaps Yᵢ − m in expression (3.2) is called the least squares estimator. One can imagine using trial and error to solve the least squares problem: Try many values of m until you are satisfied that you have the value that makes expression (3.2) as small as possible. Alternatively, as is done in Appendix 3.2, you can use algebra or calculus to show that choosing m = Ȳ minimizes the sum of squared gaps in expression (3.2), so that Ȳ is the least squares estimator of μY.
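The trial-and-error idea can be mimicked with a grid search, which lands on the sample average; a sketch (the data are hypothetical):

```python
# Numerical illustration that m = Ybar minimizes the sum of squared gaps
# in expression (3.2), using a grid search over candidate values of m.
data = [3.1, 4.7, 5.0, 6.2, 7.4, 8.1]  # hypothetical observations
ybar = sum(data) / len(data)

def sum_sq_gaps(m):
    return sum((y - m) ** 2 for y in data)

candidates = [i / 1000 for i in range(0, 12001)]  # grid from 0.000 to 12.000
best = min(candidates, key=sum_sq_gaps)
print(best, ybar)  # the grid minimizer agrees (to grid precision) with Ybar
```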

The Importance of Random Sampling

We have assumed that Y₁, ..., Yₙ are i.i.d. draws, such as those that would be obtained from simple random sampling. This assumption is important because nonrandom sampling can result in Ȳ being biased. Suppose that, to estimate the monthly national unemployment rate, a statistical agency adopts a sampling scheme in which interviewers survey working-age adults sitting in city parks at 10:00 A.M. on the second Wednesday of the month. Because most employed people are at work at that hour (not sitting in the park!), the unemployed are overly


Landon Wins!

Shortly before the 1936 Presidential election, the Literary Gazette published a poll indicating that Alf M. Landon would defeat the incumbent, Franklin D. Roosevelt, by a landslide: 57% to 43%. The Gazette was right that the election was a landslide, but it was wrong about the winner: Roosevelt won by 59% to 41%!

How could the Gazette have made such a big mistake? The Gazette's sample was chosen from telephone records and automobile registration files. But in 1936 many households did not have cars or telephones, and those that did tended to be richer, and were also more likely to be Republican. Because the telephone survey did not sample randomly from the population but instead undersampled Democrats, the estimator was biased and the Gazette made an embarrassing mistake.

Do you think surveys conducted over the Internet might have a similar problem with bias?

represented in the sample, and an estimate of the unemployment rate based on this sampling plan would be biased. This bias arises because this sampling scheme overrepresents, or oversamples, the unemployed members of the population. This example is fictitious, but the "Landon Wins!" box gives a real-world example of biases introduced by sampling that is not entirely random.

It is important to design sample selection schemes in a way that minimizes bias. Appendix 3.1 includes a discussion of what the Bureau of Labor Statistics actually does when it conducts the U.S. Current Population Survey (CPS), the survey it uses to estimate the monthly U.S. unemployment rate.

3.2 Hypothesis Tests Concerning the Population Mean

Many hypotheses about the world around us can be phrased as yes/no questions. Do the mean hourly earnings of recent U.S. college graduates equal $20/hour? Are mean earnings the same for male and female college graduates? Both these questions embody specific hypotheses about the population distribution of earnings. The statistical challenge is to answer these questions based on a sample of evidence. This section describes testing hypotheses concerning the population mean (Does the population mean of hourly earnings equal $20?). Hypothesis tests involving two populations are taken up in Section 3.4.


Null and Alternative Hypotheses

The starting point of statistical hypothesis testing is specifying the hypothesis to be tested, called the null hypothesis. Hypothesis testing entails using data to compare the null hypothesis to a second hypothesis, called the alternative hypothesis, that holds if the null does not.

The null hypothesis is that the population mean, E(Y), takes on a specific value, denoted by μY,0. The null hypothesis is denoted H₀ and thus is

H₀: E(Y) = μY,0.   (3.3)

For example, the conjecture that, on average in the population, college graduates earn $20/hour constitutes a null hypothesis about the population distribution of hourly earnings. Stated mathematically, if Y is the hourly earning of a randomly selected recent college graduate, then the null hypothesis is that E(Y) = 20; that is, μY,0 = 20 in Equation (3.3).

The alternative hypothesis specifies what is true if the null hypothesis is not. The most general alternative hypothesis is that E(Y) ≠ μY,0; this is called a two-sided alternative hypothesis because it allows E(Y) to be either less than or greater than μY,0. The two-sided alternative is written as

H₁: E(Y) ≠ μY,0  (two-sided alternative).   (3.4)

One-sided alternatives are also possible, and these are discussed later in this section.


p-value = Pr_H₀[ |Ȳ − μY,0| > |Ȳ^act − μY,0| ].   (3.5)

That is, the p-value is the area in the tails of the distribution of Ȳ under the null hypothesis beyond |Ȳ^act − μY,0|. If the p-value is large, then the observed value Ȳ^act is consistent with the null hypothesis, but if the p-value is small, it is not.

To compute the p-value, it is necessary to know the sampling distribution of Ȳ under the null hypothesis. As discussed in Section 2.6, when the sample size is small this distribution is complicated. However, according to the central limit theorem, when the sample size is large the sampling distribution of Ȳ is well approximated by a normal distribution. Under the null hypothesis the mean of this normal distribution is μY,0, so under the null hypothesis Ȳ is distributed N(μY,0, σ²Ȳ),

FIGURE 3.1  Calculating a p-value

The p-value is the probability of drawing a value of Ȳ that differs from μY,0 by at least as much as Ȳ^act. In large samples, Ȳ is distributed N(μY,0, σ²Ȳ) under the null hypothesis, so (Ȳ − μY,0)/σȲ is distributed N(0, 1). Thus the p-value is the shaded standard normal tail probability outside ±(Ȳ^act − μY,0)/σȲ.

where σ²Ȳ = σ²Y/n. This large-sample normal approximation makes it possible to compute the p-value without needing to know the population distribution of Y, as long as the sample size is large. The details of the calculation, however, depend on whether σY is known.

Calculating the p-Value When σY Is Known

The calculation of the p-value when σY is known is summarized in Figure 3.1. If the sample size is large, then under the null hypothesis the sampling distribution of Ȳ is N(μY,0, σ²Ȳ), where σ²Ȳ = σ²Y/n. Thus, under the null hypothesis, the standardized version of Ȳ, (Ȳ − μY,0)/σȲ, has a standard normal distribution. The p-value is the probability of obtaining a value of Ȳ farther from μY,0 than Ȳ^act, so the tail probability in Figure 3.1 (that is, the p-value) is

p-value = Pr_H₀( |(Ȳ − μY,0)/σȲ| > |(Ȳ^act − μY,0)/σȲ| ) = 2Φ( −|Ȳ^act − μY,0|/σȲ ),   (3.6)


where Φ is the standard normal cumulative distribution function. That is, the p-value is the area in the tails of a standard normal distribution outside ±|Ȳ^act − μY,0|/σȲ.

In practice, however, σY is typically unknown, so in general σY must be estimated before the p-value can be computed. We now turn to the problem of estimating σ²Y.
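Equation (3.6) translates directly into a small function; a sketch (phi is the standard normal CDF built from math.erf, and the numbers in the example call are hypothetical):

```python
import math

def phi(z):
    # Standard normal CDF, Phi(z), via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value_known_sigma(ybar_act, mu_0, sigma_y, n):
    # Equation (3.6): p-value = 2 * Phi( -|Ybar_act - mu_0| / sigma_ybar ),
    # where sigma_ybar = sigma_y / sqrt(n)
    sigma_ybar = sigma_y / math.sqrt(n)
    return 2.0 * phi(-abs(ybar_act - mu_0) / sigma_ybar)

# Hypothetical example: Ybar_act two standard errors above mu_0
print(p_value_known_sigma(ybar_act=20.8, mu_0=20.0, sigma_y=4.0, n=100))  # about 0.046
```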

The Sample Variance, Sample Standard Deviation, and Standard Error

The sample variance s²Y is an estimator of the population variance σ²Y; the sample standard deviation sY is an estimator of the population standard deviation σY; and the standard error of the sample average Ȳ is an estimator of the standard deviation of the sampling distribution of Ȳ.

The sample variance and standard deviation. The sample variance, s²Y, is

s²Y = (1/(n − 1)) Σᵢ₌₁ⁿ (Yᵢ − Ȳ)².   (3.7)

The sample standard deviation, sY, is the square root of the sample variance.

The formula for the sample variance is much like the formula for the population variance. The population variance, E(Y − μY)², is the average value of (Y − μY)² in the population distribution. Similarly, the sample variance is the sample average of (Yᵢ − μY)², i = 1, ..., n, with two modifications: First, μY is replaced by Ȳ, and second, the average uses the divisor n − 1 instead of n. The reason for the first modification is that μY is unknown and thus must be estimated; the natural estimator of μY is Ȳ. The reason for the second modification (dividing by n − 1 instead of n) is that estimating μY by Ȳ introduces a small downward bias in (Yᵢ − Ȳ)². Specifically, as shown in Exercise 3.18, E[(Yᵢ − Ȳ)²] = [(n − 1)/n]σ²Y. Thus E[Σᵢ₌₁ⁿ (Yᵢ − Ȳ)²] = nE[(Yᵢ − Ȳ)²] = (n − 1)σ²Y. Dividing by n − 1 in Equation (3.7) instead of n corrects for this small downward bias, and as a result s²Y is unbiased.

Dividing by n − 1 instead of n is called a degrees of freedom correction: Estimating the mean uses up some of the information, that is, uses up one "degree of freedom," in the data, so that only n − 1 degrees of freedom remain.


KEY CONCEPT 3.4
THE STANDARD ERROR OF Ȳ

The standard error of Ȳ is an estimator of the standard deviation of Ȳ. The standard error of Ȳ is denoted by SE(Ȳ) or by σ̂Ȳ. When Y₁, ..., Yₙ are i.i.d.,

SE(Ȳ) = σ̂Ȳ = sY/√n.   (3.8)

Consistency of the sample variance. The sample variance is a consistent estimator of the population variance:

s²Y converges in probability to σ²Y.   (3.9)

In other words, the sample variance is close to the population variance with high probability when n is large.
The result in Equation (3.9) is proven in Appendix 3.3 under the assumptions that Y₁, ..., Yₙ are i.i.d. and Yᵢ has a finite fourth moment; that is, E(Yᵢ⁴) < ∞. In other words, Yᵢ must have a finite fourth moment.

The standard error of Ȳ. Because the standard deviation of the sampling distribution of Ȳ is σȲ = σY/√n, Equation (3.9) justifies using sY/√n as an estimator of σȲ. The estimator of σȲ, sY/√n, is called the standard error of Ȳ and is denoted by SE(Ȳ) or by σ̂Ȳ (the "^" over the symbol means that this is an estimator of σȲ). The standard error of Ȳ is summarized in Key Concept 3.4.

When Y₁, ..., Yₙ are i.i.d. draws from a Bernoulli distribution with success probability p, the formula for the variance of Ȳ simplifies to p(1 − p)/n [see Equation (2.7)]. The formula for the standard error also takes on a simple form that depends only on Ȳ and n: SE(Ȳ) = √(Ȳ(1 − Ȳ)/n).
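The sample standard deviation of Equation (3.7) and the standard error of Key Concept 3.4 translate directly into code; a sketch (statistics.stdev uses the n − 1 divisor, and the data below are hypothetical, not from the text):

```python
import math
import statistics

# s_Y from Equation (3.7) uses the n - 1 divisor, which is exactly what
# statistics.stdev computes; then SE(Ybar) = s_Y / sqrt(n).
data = [19.0, 21.5, 18.2, 24.3, 20.0, 22.7, 17.9, 23.4]  # hypothetical wages
n = len(data)
s_y = statistics.stdev(data)      # sample standard deviation (divisor n - 1)
se_ybar = s_y / math.sqrt(n)      # standard error of the sample average
print(s_y, se_ybar)
```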

Calculating the p-Value When σY Is Unknown

Because s²Y is a consistent estimator of σ²Y, the p-value can be computed by replacing σȲ in Equation (3.6) by the standard error, SE(Ȳ) = σ̂Ȳ. That is, when σY is unknown and Y₁, ..., Yₙ are i.i.d., the p-value is calculated using the formula

p-value = 2Φ( −|Ȳ^act − μY,0| / SE(Ȳ) ).   (3.10)

The t-Statistic

The standardized sample average (Ȳ − μY,0)/SE(Ȳ) plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:

t = (Ȳ − μY,0) / SE(Ȳ).   (3.11)

In general, a test statistic is a statistic used to perform a hypothesis test. The t-statistic is an important example of a test statistic.

Large-sample distribution of the t-statistic. When n is large, s²Y is close to σ²Y with high probability. Thus the distribution of the t-statistic is approximately the same as the distribution of (Ȳ − μY,0)/σȲ, which in turn is well approximated by the standard normal distribution when n is large because of the central limit theorem (Key Concept 2.7). Accordingly, under the null hypothesis,

t is approximately distributed N(0, 1) for large n.   (3.12)

The formula for the p-value in Equation (3.10) can be rewritten in terms of the t-statistic. Let t^act denote the value of the t-statistic actually computed:

t^act = (Ȳ^act − μY,0) / SE(Ȳ).   (3.13)

Accordingly, when n is large, the p-value can be calculated using

p-value = 2Φ(−|t^act|).   (3.14)

As a hypothetical example, suppose that a sample of n = 200 recent college graduates is used to test the null hypothesis that the mean wage, E(Y), is $20/hour. The sample average wage is Ȳ^act = $22.64 and the sample standard deviation is sY = $18.14. Then the standard error of Ȳ is sY/√n = 18.14/√200 = 1.28. The value of the t-statistic is t^act = (22.64 − 20)/1.28 = 2.06. From Appendix Table 1, the p-value is 2Φ(−2.06) = 0.039, or 3.9%. That is, assuming the null hypothesis


to be true, the probability of obtaining a sample average at least as different from the null as the one actually computed is 3.9%.
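The arithmetic in this hypothetical example can be reproduced with Equations (3.8), (3.13), and (3.14); a sketch using math.erf for Φ:

```python
import math

def phi(z):
    # Standard normal CDF, Phi(z), via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Reproduce the example in the text: n = 200, Ybar = $22.64,
# s_Y = $18.14, null hypothesis mu_0 = $20.
n, ybar_act, s_y, mu_0 = 200, 22.64, 18.14, 20.0
se_ybar = s_y / math.sqrt(n)          # Equation (3.8): about 1.28
t_act = (ybar_act - mu_0) / se_ybar   # Equation (3.13): about 2.06
p_value = 2.0 * phi(-abs(t_act))      # Equation (3.14): about 0.039, or 3.9%
print(se_ybar, t_act, p_value)
```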

Hypothesis Testing with a Prespecified Significance Level

When you undertake a statistical hypothesis test, you can make two types of mistakes: You can incorrectly reject the null hypothesis when it is true, or you can fail to reject the null hypothesis when it is false. Hypothesis tests can be performed without computing the p-value if you are willing to specify in advance the probability you are willing to tolerate of making the first kind of mistake, that is, of incorrectly rejecting the null hypothesis when it is true. If you choose a prespecified probability of rejecting the null hypothesis when it is true (for example, 5%), then you will reject the null hypothesis if and only if the p-value is less than 0.05. This approach gives preferential treatment to the null hypothesis, but in many practical situations this preferential treatment is appropriate.

Hypothesis tests using a fixed significance level. Suppose it has been decided that the hypothesis will be rejected if the p-value is less than 5%. Because the area under the tails of the normal distribution outside ±1.96 is 5%, this gives a simple rule:

Reject H₀ if |t^act| > 1.96.   (3.15)

That is, reject if the absolute value of the t-statistic computed from the sample is greater than 1.96. If n is large enough, then under the null hypothesis the t-statistic has a N(0, 1) distribution. Thus the probability of erroneously rejecting the null hypothesis (rejecting the null hypothesis when it is in fact true) is 5%.

This framework for testing statistical hypotheses has some specialized terminology, summarized in Key Concept 3.5. The significance level of the test in Equation (3.15) is 5%, the critical value of this two-sided test is 1.96, and the rejection region is the values of the t-statistic outside ±1.96. If the test rejects at the 5% significance level, the population mean μY is said to be statistically significantly different from μY,0 at the 5% significance level.

Testing hypotheses using a prespecified significance level does not require computing p-values. In the previous example of testing the hypothesis that the mean earnings of recent college graduates is $20, the t-statistic was 2.06. This exceeds 1.96, so the hypothesis is rejected at the 5% level. Although performing the test with a 5% significance level is easy, reporting only whether the null

KEY CONCEPT 3.5: THE TERMINOLOGY OF HYPOTHESIS TESTING

A statistical hypothesis test can make two types of mistakes: a type I error, in which the null hypothesis is rejected when in fact it is true, and a type II error, in which the null hypothesis is not rejected when in fact it is false. The prespecified rejection probability of a statistical hypothesis test when the null hypothesis is true, that is, the prespecified probability of a type I error, is the significance level of the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null hypothesis is the rejection region, and the set of values of the test statistic for which it does not reject the null hypothesis is the acceptance region. The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.

The p-value is the probability of obtaining a test statistic, by random sampling variation, at least as adverse to the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct. Equivalently, the p-value is the smallest significance level at which you can reject the null hypothesis.

hypothesis is rejected at a prespecified significance level conveys less information than reporting the p-value.

What significance level should you use in practice?  In many cases, statisticians and econometricians use a 5% significance level. If you were to test many statistical hypotheses at the 5% level, you would incorrectly reject the null on average once in 20 cases. Sometimes a more conservative significance level might be in order. For example, legal cases sometimes involve statistical evidence, and the null hypothesis could be that the defendant is not guilty; then one would want to be quite sure that a rejection of the null (conclusion of guilt) is not just a result of random sample variation. In some legal settings the significance level used is 1% or even 0.1%, to avoid this sort of mistake. Similarly, if a government agency is considering permitting the sale of a new drug, a very conservative standard might be in order so that consumers can be sure that the drugs available in the market actually work.


KEY CONCEPT 3.6: TESTING THE HYPOTHESIS E(Y) = μ_Y,0 AGAINST THE ALTERNATIVE E(Y) ≠ μ_Y,0

1. Compute the standard error of Ȳ, SE(Ȳ) [Equation (3.8)].
2. Compute the t-statistic [Equation (3.13)].
3. Compute the p-value [Equation (3.14)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 (equivalently, if |t^act| > 1.96).

Being conservative, in the sense of using a very low significance level, has a cost: The smaller the significance level, the larger the critical value, and the more difficult it becomes to reject the null when the null is false. In fact, the most conservative thing to do is never to reject the null hypothesis; but if that is your view, then you never need to look at any statistical evidence, for you will never change your mind! The lower the significance level, the lower the power of the test. Many economic and policy applications can call for less conservatism than a legal case, so a 5% significance level is often considered to be a reasonable compromise.

Key Concept 3.6 summarizes hypothesis tests for the population mean against the two-sided alternative.

One-Sided Alternatives

In some circumstances, the alternative hypothesis might be that the mean exceeds μ_Y,0. For example, one hopes that education helps in the labor market, so the relevant alternative to the null hypothesis that earnings are the same for college graduates and nongraduates is not just that their earnings differ, but rather that graduates earn more than nongraduates. This is called a one-sided alternative hypothesis and can be written

H_1: E(Y) > μ_Y,0 (one-sided alternative).  (3.16)

The general approach to computing p-values and to hypothesis testing is the same for one-sided alternatives as it is for two-sided alternatives, with the modification that only large positive values of the t-statistic reject the null hypothesis, rather than values that are large in absolute value. Specifically, to test the one-sided hypothesis in Equation (3.16), construct the t-statistic in Equation (3.13). The p-value is the area under the standard normal distribution to the right of the calculated t-statistic. That is, the p-value, based on the N(0, 1) approximation to the distribution of the t-statistic, is

p-value = Pr_H0(Z > t^act) = 1 − Φ(t^act).  (3.17)

The N(0, 1) critical value for a one-sided test with a 5% significance level is 1.645. The rejection region for this test is all values of the t-statistic exceeding 1.645.

The one-sided hypothesis in Equation (3.16) concerns values of μ_Y exceeding μ_Y,0. If instead the alternative hypothesis is that E(Y) < μ_Y,0, then the discussion of the previous paragraph applies except that the signs are switched; for example, the 5% rejection region consists of values of the t-statistic less than −1.645.
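A minimal sketch of the one-sided p-value in Equation (3.17), again using Python's standard normal CDF (the function name is an illustrative label, not from the text):

```python
from statistics import NormalDist

def one_sided_p_value(t_act):
    # Area under the standard normal distribution to the right of t_act [Equation (3.17)]
    return 1 - NormalDist().cdf(t_act)

# For the earnings example, t = 2.06 exceeds the one-sided critical value 1.645,
# so the test rejects at the 5% level; the p-value is half the two-sided p-value.
p = one_sided_p_value(2.06)
print(round(p, 3))
```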

3.3  Confidence Intervals for the Population Mean

Because of random sampling error, it is impossible to learn the exact value of the population mean of Y using only the information in a sample. However, it is possible to use data from a random sample to construct a set of values that contains the true population mean μ_Y with a certain prespecified probability. Such a set is called a confidence set, and the prespecified probability that μ_Y is contained in this set is called the confidence level. The confidence set for μ_Y turns out to be all the possible values of the mean between a lower and an upper limit, so that the confidence set is an interval, called a confidence interval.

Here is one way to construct a 95% confidence set for the population mean. Begin by picking some arbitrary value for the mean; call it μ_Y,0. Test the null hypothesis that μ_Y = μ_Y,0 against the alternative that μ_Y ≠ μ_Y,0 by computing the t-statistic; if its absolute value is less than 1.96, this hypothesized value μ_Y,0 is not rejected at the 5% level, so write down this nonrejected value μ_Y,0. Now pick another arbitrary value of μ_Y,0 and test it; if you cannot reject it, write this value down on your list. Do this again and again; indeed, keep doing this for all possible values of the population mean. Continuing this process yields the set of all values of the population mean that cannot be rejected at the 5% level by a two-sided hypothesis test.

This list is useful because it summarizes the set of hypotheses you can and cannot reject (at the 5% level) based on your data: If someone walks up to you with a specific number in mind, you can tell him whether his hypothesis is rejected or not simply by looking up his number on your handy list. A bit of clever reasoning shows that this set of values has a remarkable property: The probability that it contains the true value of the population mean is 95%.


KEY CONCEPT 3.7: CONFIDENCE INTERVALS FOR THE POPULATION MEAN

A 95% two-sided confidence interval for μ_Y is an interval constructed so that it contains the true value of μ_Y in 95% of all possible random samples. When the sample size n is large, 95%, 90%, and 99% confidence intervals for μ_Y are

95% confidence interval for μ_Y = {Ȳ ± 1.96SE(Ȳ)},
90% confidence interval for μ_Y = {Ȳ ± 1.64SE(Ȳ)},
99% confidence interval for μ_Y = {Ȳ ± 2.58SE(Ȳ)}.

The clever reasoning goes like this. Suppose the true value of μ_Y is 21.5 (although we do not know this). Then Ȳ has a normal distribution centered on 21.5, and the t-statistic testing the null hypothesis μ_Y = 21.5 has a N(0, 1) distribution. Thus, if n is large, the probability of rejecting the null hypothesis μ_Y = 21.5 at the 5% level is 5%. But because you tested all possible values of the population mean in constructing your set, in particular you tested the true value, μ_Y = 21.5. In 95% of all samples, you will correctly accept 21.5; this means that in 95% of all samples, your list will contain the true value of μ_Y. Thus, the values on your list constitute a 95% confidence set for μ_Y.

This method of constructing a confidence set is impractical, for it requires you to test all possible values of μ_Y as null hypotheses. Fortunately, there is a much easier approach. According to the formula for the t-statistic in Equation (3.13), a trial value of μ_Y,0 is rejected at the 5% level if it is more than 1.96 standard errors away from Ȳ. Thus the set of values of μ_Y that are not rejected at the 5% level consists of those values within ±1.96SE(Ȳ) of Ȳ. That is, a 95% confidence interval for μ_Y is Ȳ − 1.96SE(Ȳ) ≤ μ_Y ≤ Ȳ + 1.96SE(Ȳ). Key Concept 3.7 summarizes this approach.

As an example, consider the problem of constructing a 95% confidence interval for the mean hourly earnings of recent college graduates using a hypothetical random sample of 200 recent college graduates where Ȳ = $22.64 and SE(Ȳ) = 1.28. The 95% confidence interval for mean hourly earnings is 22.64 ± 1.96 × 1.28 = 22.64 ± 2.51 = [$20.13, $25.15].

This discussion so far has focused on two-sided confidence intervals. One could instead construct a one-sided confidence interval as the set of values of μ_Y that cannot be rejected by a one-sided hypothesis test. Although one-sided confidence intervals have applications in some branches of statistics, they are uncommon in applied econometric analysis.
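The easier approach summarized in Key Concept 3.7 is a one-line computation. This sketch (the function name is illustrative) reproduces the interval for the hypothetical sample of 200 graduates, with Ȳ = 22.64 and SE(Ȳ) = 1.28:

```python
def confidence_interval(ybar, se, z=1.96):
    # 95% two-sided interval: all values within z standard errors of Ybar (Key Concept 3.7)
    return ybar - z * se, ybar + z * se

lo, hi = confidence_interval(22.64, 1.28)
print(round(lo, 2), round(hi, 2))  # 20.13 25.15, the interval reported in the text
```

Passing z=1.64 or z=2.58 instead gives the 90% and 99% intervals of Key Concept 3.7.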

Coverage probabilities.  The coverage probability of a confidence interval for the population mean is the probability, computed over all possible random samples, that it contains the true population mean.

3.4  Comparing Means from Different Populations

Do recent male and female college graduates earn the same amount on average? This question involves comparing the means of two different population distributions. This section summarizes how to test hypotheses and how to construct confidence intervals for the difference in the means from two different populations.

Hypothesis Tests for the Difference Between Two Means

Let μ_w be the mean hourly earnings in the population of women recently graduated from college and let μ_m be the population mean for recently graduated men. Consider the null hypothesis that earnings for these two populations differ by a certain amount, say d_0. Then the null hypothesis and the two-sided alternative hypothesis are

H_0: μ_m − μ_w = d_0 vs. H_1: μ_m − μ_w ≠ d_0.  (3.18)

The null hypothesis that men and women in these populations have the same earnings corresponds to H_0 in Equation (3.18) with d_0 = 0.

Because these population means are unknown, they must be estimated from samples of men and women. Suppose we have samples of n_m men and n_w women drawn at random from their populations. Let the sample average annual earnings be Ȳ_m for men and Ȳ_w for women. Then an estimator of μ_m − μ_w is Ȳ_m − Ȳ_w.

To test the null hypothesis that μ_m − μ_w = d_0 using Ȳ_m − Ȳ_w, we need to know the distribution of Ȳ_m − Ȳ_w. Recall that Ȳ_m is, according to the central limit theorem, approximately distributed N(μ_m, σ²_m/n_m), where σ²_m is the population variance of earnings for men. Similarly, Ȳ_w is approximately distributed N(μ_w, σ²_w/n_w),


where σ²_w is the population variance of earnings for women. Also, recall from Section 2.4 that a weighted average of two normal random variables is itself normally distributed. Because Ȳ_m and Ȳ_w are constructed from different randomly selected samples, they are independent random variables. Thus, Ȳ_m − Ȳ_w is distributed N[μ_m − μ_w, (σ²_m/n_m) + (σ²_w/n_w)].

If σ²_m and σ²_w are known, then this approximate normal distribution can be used to compute p-values for the test of the null hypothesis that μ_m − μ_w = d_0. In practice, however, these population variances are typically unknown, so they must be estimated. As before, they can be estimated using the sample variances, s²_m and s²_w, where s²_m is defined as in Equation (3.7), except that the statistic is computed only for the men in the sample, and s²_w is defined similarly for the women. Thus the standard error of Ȳ_m − Ȳ_w is

SE(Ȳ_m − Ȳ_w) = √(s²_m/n_m + s²_w/n_w).  (3.19)

The t-statistic for testing the null hypothesis is constructed analogously to the t-statistic for testing a hypothesis about a single population mean, by subtracting the null hypothesized value of μ_m − μ_w from the estimator Ȳ_m − Ȳ_w and dividing the result by the standard error of Ȳ_m − Ȳ_w:

t = [(Ȳ_m − Ȳ_w) − d_0] / SE(Ȳ_m − Ȳ_w)  (t-statistic for comparing two means).  (3.20)

If both n_m and n_w are large, then this t-statistic has a standard normal distribution.

Because the t-statistic in Equation (3.20) has a standard normal distribution under the null hypothesis when n_m and n_w are large, the p-value of the two-sided test is computed exactly as it was in the case of a single population; that is, the p-value is computed using Equation (3.14).

To conduct a test with a prespecified significance level, simply calculate the t-statistic in Equation (3.20) and compare it to the appropriate critical value. For example, the null hypothesis is rejected at the 5% significance level if the absolute value of the t-statistic exceeds 1.96.

If the alternative is one-sided rather than two-sided (that is, if the alternative is that μ_m − μ_w > d_0), then the test is modified as outlined in Section 3.2. The p-value is computed using Equation (3.17), and a test with a 5% significance level rejects when t > 1.645.

Confidence Intervals for the Difference Between Two Population Means

The method for constructing confidence intervals summarized in Section 3.3 extends to constructing a confidence interval for the difference between the means, d = μ_m − μ_w. Because the hypothesized value d_0 is rejected at the 5% level if |t| > 1.96, d_0 will be in the confidence set if |t| ≤ 1.96. But |t| ≤ 1.96 means that the estimated difference, Ȳ_m − Ȳ_w, is less than 1.96 standard errors away from d_0. Thus, the 95% two-sided confidence interval for d consists of those values of d within ±1.96 standard errors of Ȳ_m − Ȳ_w:

95% confidence interval for d = μ_m − μ_w is (Ȳ_m − Ȳ_w) ± 1.96SE(Ȳ_m − Ȳ_w).  (3.21)

With these formulas in hand, the box "The Gender Gap of Earnings of College Graduates in the U.S." contains an empirical investigation of gender differences in earnings of U.S. college graduates.
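Equations (3.19) through (3.21) can be sketched together in code. The sketch below is illustrative (the function name is a hypothetical label); as a check, it is applied to the 2004 figures from Table 3.1 (men: Ȳ_m = 21.99, s_m = 10.39, n_m = 1901; women: Ȳ_w = 18.47, s_w = 8.16, n_w = 1739), which should reproduce a standard error of about 0.31 and a large t-statistic.

```python
import math
from statistics import NormalDist

def compare_means(ybar_m, s_m, n_m, ybar_w, s_w, n_w, d_0=0.0):
    se = math.sqrt(s_m**2 / n_m + s_w**2 / n_w)   # Equation (3.19)
    t = (ybar_m - ybar_w - d_0) / se              # Equation (3.20)
    p_value = 2 * NormalDist().cdf(-abs(t))       # two-sided p-value [Equation (3.14)]
    d_hat = ybar_m - ybar_w
    ci = (d_hat - 1.96 * se, d_hat + 1.96 * se)   # Equation (3.21)
    return se, t, p_value, ci

se, t, p, ci = compare_means(21.99, 10.39, 1901, 18.47, 8.16, 1739)
print(round(se, 2), round(t, 1))  # SE is about 0.31 and t is about 11.4, so H0: d = 0 is rejected
```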

3.5  Differences-of-Means Estimation of Causal Effects Using Experimental Data

Recall from Section 1.2 that a randomized controlled experiment randomly selects subjects (individuals or, more generally, entities) from a population of interest, then randomly assigns them either to a treatment group, which receives the experimental treatment, or to a control group, which does not receive the treatment. The difference between the sample means of the treatment and control groups is an estimator of the causal effect of the treatment.

The Causal Effect as a Difference of Conditional Expectations

The causal effect of a treatment is the expected effect on the outcome of interest of the treatment as measured in an ideal randomized controlled experiment. This effect can be expressed as the difference of two conditional expectations. Specifically, the causal effect on Y of treatment level x is the difference in the conditional expectations, E(Y|X = x) − E(Y|X = 0), where E(Y|X = x) is the expected value of Y for the treatment group (which receives treatment level X = x) in an ideal randomized controlled experiment and E(Y|X = 0) is the expected value of Y for the control group (which receives treatment level X = 0). In the context of experiments, the causal effect is also called the treatment effect. If there are only two treatment levels (that is, if the treatment is binary), then we can let X = 0 denote the control group and X = 1 denote the treatment group. If the treatment is binary, then the causal effect (that is, the treatment effect) is E(Y|X = 1) − E(Y|X = 0) in an ideal randomized controlled experiment.

86

Review of Statistics

CHAPTER 3

The Gender Gap of Earnings of College Graduates in the U.S.

The box in Chapter 2, "The Distribution of Earnings in the United States in 2004," shows that, on average, male college graduates earn more than female college graduates. What are the recent trends in this "gender gap" in earnings? Social norms and laws governing gender discrimination in the workplace have changed substantially in the United States. Is the gender gap in earnings of college graduates stable, or has it diminished over time?

Table 3.1 gives estimates of hourly earnings for college-educated full-time workers aged 25-34 in the United States in 1992, 1996, 2000, and 2004, using data collected by the Current Population Survey. Earnings for 1992, 1996, and 2000 were adjusted for inflation by putting them in 2004 dollars using the Consumer Price Index.¹ In 2004, the average hourly earnings of the 1,901 men surveyed was $21.99, and the standard deviation of earnings for men was $10.39. The average hourly earnings in 2004 of the 1,739 women surveyed was $18.47, and the standard deviation of earnings was $8.16. Thus the estimate of the gender gap in earnings for 2004 is $3.52 (= $21.99 − $18.47), with a standard error of $0.31 (= √(10.39²/1901 + 8.16²/1739)). The 95% confidence interval for the gender gap in earnings in 2004 is 3.52 ± 1.96 × 0.31 = ($2.91, $4.13).

The results in Table 3.1 suggest four conclusions. First, the gender gap is large. An hourly gap of $3.52 might not sound like much, but over a year it adds up to $7,040, assuming a 40-hour work week and 50 paid weeks per year. Second, the estimated gender gap has increased by $0.79/hour in real terms over this sample, from $2.73/hour to $3.52/hour; however, this increase is not statistically significant at the 5% significance level (Exercise 3.17). Third, the gap is large if it is measured instead in percentage terms: According to the estimates in Table 3.1, in 2004 women

(continued)

Trend$ in Hourly Earnings in the United States


of Working College Graduates, Ages 25-34, 1992to 2004, in 2004 Dollars
Women

Men

Differenu., Men vs. Women


95%

Confide~><

,._,

lnmvol

Year

Y.,

...

n,.

v_

1992

20.33

8.70

1592

17.60

6.90

137(1

2.7:.

0.29

2. 1~3.30

19%

19.52

8.48

\'377

16.72

7.03

1235

2.80

0.30

2.22 1..10

2000

21.77

10.()0

1.300

18.21

8.20

ll!i2

3.56

0.37

2.HJ t 21J

201~

21.\N

10 39

19(11

" .47

s 16

1739

3 '\2

0. ~ 1

2.1-H -4 13

n_

Y.,.

r_

Sf Y..

ford

Thee ~liM tes an comrcued 1m11gd.ua on all !uUumc workers~ :.5-1.1 :.uf\c\ ~d 1 l Cun ~tl'l: pulatwn !>un~v ron
dueled tn \l..rdt ot lbL n~ ~ ar ( for C.'{~ o .. 11 dau for2(X)4 wen: c:olkL'tcd tl \ t 1n;b ~ 61 The dtrfclcnu: ' ''gmfi.-ant l)
dtU..-rcm tr..>m ltro ;n 1b.; I 51!tn't..:an.:e c\cl


earned 16% less per hour than men did ($3.52/$21.99), more than the gap of 13% seen in 1992 ($2.73/$20.33). Fourth, the gender gap is smaller for young college graduates (the group analyzed in Table 3.1) than it is for all college graduates (analyzed in Table 2.4): As reported in Table 2.4, the mean earnings for all college-educated women working full-time in 2004 was $21.12, while for men this mean was $27.83, which corresponds to a gender gap of 24% [= (27.83 − 21.12)/27.83] among all full-time college-educated workers.

This empirical analysis documents that the "gender gap" in hourly earnings is large and has been fairly stable (or perhaps increased slightly) over the recent past. The analysis does not, however, tell us why this gap exists. Does it arise from gender discrimination in the labor market? Does it reflect differences in skills, experience, or education between men and women? Does it reflect differences in choice of jobs? Or is there some other cause? We return to these questions once we have in hand the tools of multiple regression analysis, the topic of Part II.

¹Because of inflation, a dollar in 1992 was worth more than a dollar in 2004, in the sense that a dollar in 1992 could buy more goods and services than a dollar in 2004 could. Thus earnings in 1992 cannot be directly compared to earnings in 2004 without adjusting for inflation. One way to make this adjustment is to use the Consumer Price Index (CPI), a measure of the price of a "market basket" of consumer goods and services constructed by the Bureau of Labor Statistics. Over the twelve years from 1992 to 2004, the price of the CPI market basket rose by 34.6%; in other words, the CPI basket of goods and services that cost $100 in 1992 cost $134.60 in 2004. To make earnings in 1992 and 2004 comparable in Table 3.1, 1992 earnings are inflated by the amount of overall CPI price inflation, that is, by multiplying 1992 earnings by 1.346 to put them into 2004 dollars.
Estimation of the Causal Effect Using Differences of Means

If the treatment in a randomized controlled experiment is binary, then the causal effect can be estimated by the difference in the sample average outcomes between the treatment and control groups. The hypothesis that the treatment is ineffective is equivalent to the hypothesis that the two means are the same, which can be tested using the t-statistic for comparing two means, given in Equation (3.20). A 95% confidence interval for the difference in the means of the two groups is a 95% confidence interval for the causal effect, so a 95% confidence interval for the causal effect can be constructed using Equation (3.21).

A well-designed, well-run experiment can provide a compelling estimate of a causal effect. For this reason, randomized controlled experiments are commonly conducted in some fields, such as medicine. In economics, however, experiments tend to be expensive, difficult to administer, and, in some cases, ethically questionable, so they remain rare. For this reason, econometricians sometimes study "natural experiments," also called quasi-experiments, in which some event


unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment. The box "A Novel Way to Boost Retirement Savings" provides an example of such a quasi-experiment that yielded some surprising conclusions.
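For a binary outcome such as plan enrollment, the differences-of-means estimator takes a simple form, because the sample variance of a binary variable is approximately p(1 − p). The sketch below is illustrative code, not code from the study; plugging in the enrollment figures reported in the box "A Novel Way to Boost Retirement Savings" (85.9% with n = 5901 for the opt-out group, 37.4% with n = 4249 for the opt-in group) should reproduce the 95% confidence interval of roughly 46.8% to 50.2%.

```python
import math

def causal_effect_binary(p_treat, n_treat, p_ctrl, n_ctrl):
    effect = p_treat - p_ctrl  # difference of sample means (treatment minus control)
    # The sample variance of a binary outcome is approximately p(1 - p),
    # so Equation (3.19) gives the standard error of the difference:
    se = math.sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    ci = (effect - 1.96 * se, effect + 1.96 * se)  # Equation (3.21)
    return effect, se, ci

effect, se, ci = causal_effect_binary(0.859, 5901, 0.374, 4249)
print(round(effect, 3), round(ci[0], 3), round(ci[1], 3))  # 0.485 0.468 0.502
```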

3.6  Using the t-Statistic When the Sample Size Is Small

In Sections 3.2 through 3.5, the t-statistic is used in conjunction with critical values from the standard normal distribution for hypothesis testing and for the construction of confidence intervals. The use of the standard normal distribution is justified by the central limit theorem, which applies when the sample size is large. When the sample size is small, the standard normal distribution can provide a poor approximation to the distribution of the t-statistic. If, however, the population distribution is itself normally distributed, then the exact distribution (that is, the finite-sample distribution; see Section 2.6) of the t-statistic testing the mean of a single population is the Student t distribution with n − 1 degrees of freedom, and critical values can be taken from the Student t distribution.

The t-Statistic and the Student t Distribution

The t-statistic testing the mean.  Consider the t-statistic used to test the hypothesis that the mean of Y is μ_Y,0, using data Y_1, …, Y_n. The formula for this statistic is given by Equation (3.10), where the standard error of Ȳ is given by Equation (3.8). Substitution of the latter expression into the former yields the formula for the t-statistic:

t = (Ȳ − μ_Y,0) / √(s²_Y / n),  (3.22)

where s²_Y is given in Equation (3.7).

As discussed in Section 3.2, under general conditions the t-statistic has a standard normal distribution if the sample size is large and the null hypothesis is true [see Equation (3.12)]. Although the standard normal approximation to the t-statistic is reliable for a wide range of distributions of Y if n is large, it can be unreliable if n is small. The exact distribution of the t-statistic depends on the distribution of Y, and it can be very complicated. There is, however, one special case in which the exact distribution of the t-statistic is relatively simple: If Y is normally distributed, then the t-statistic in Equation (3.22) has a Student t distribution with n − 1 degrees of freedom.

To verify this result, recall from Section 2.4 that the Student t distribution with n − 1 degrees of freedom is defined to be the distribution of Z/√(W/(n − 1)), where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with n − 1 degrees of freedom, and Z and W are independently distributed. When Y_1, …, Y_n are i.i.d. and the population distribution of Y is N(μ_Y, σ²_Y), the t-statistic can be written as such a ratio. Specifically, let Z = (Ȳ − μ_Y,0)/√(σ²_Y/n) and let W = (n − 1)s²_Y/σ²_Y; then some algebra¹ shows that the t-statistic in Equation (3.22) can be written as t = Z/√(W/(n − 1)). Recall from Section 2.4 that if Y_1, …, Y_n are i.i.d. and the population distribution of Y is N(μ_Y, σ²_Y), then the sampling distribution of Ȳ is exactly N(μ_Y, σ²_Y/n) for all n; thus, if the null hypothesis μ_Y = μ_Y,0 is correct, then Z = (Ȳ − μ_Y,0)/√(σ²_Y/n) has a standard normal distribution for all n. In addition, W = (n − 1)s²_Y/σ²_Y has a chi-squared distribution with n − 1 degrees of freedom for all n, and Ȳ and s²_Y are independently distributed. It follows that, if the population distribution of Y is normal, then under the null hypothesis the t-statistic given in Equation (3.22) has an exact Student t distribution with n − 1 degrees of freedom.

If the population distribution is normally distributed, then critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals. As an example, consider a hypothetical problem in which t^act = 2.15 and n = 20, so that the degrees of freedom is n − 1 = 19. From Appendix Table 2, the 5% two-sided critical value for the t_19 distribution is 2.09. Because the t-statistic is larger in absolute value than the critical value (2.15 > 2.09), the null hypothesis would be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for μ_Y, constructed using the t_19 distribution, would be Ȳ ± 2.09SE(Ȳ). This confidence interval is somewhat wider than the confidence interval constructed using the standard normal critical value of 1.96.
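The small-sample behavior of the t-statistic can be checked by simulation. The sketch below is an illustration added here, not code from the text: it draws many samples of size n = 20 from a normal population under the null hypothesis and compares rejection rates using the t_19 critical value 2.09 (from Appendix Table 2) with the standard normal critical value 1.96. The first rate should be close to 5%, while the second is noticeably larger, showing that the normal approximation over-rejects when n is small.

```python
import math
import random
import statistics

random.seed(0)
n, reps = 20, 20_000
reject_t = reject_normal = 0
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # the null hypothesis mu = 0 is true
    t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))  # Equation (3.22)
    reject_t += abs(t) > 2.09        # Student t critical value, 19 degrees of freedom
    reject_normal += abs(t) > 1.96   # standard normal critical value
print(reject_t / reps, reject_normal / reps)
```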

The t-statistic testing differences of means.  The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.

¹The desired expression is obtained by multiplying and dividing by √(σ²_Y/n) and collecting terms.


A Novel Way to Boost Retirement Savings

Many economists think that workers tend not to save enough for their retirement. Conventional methods for encouraging retirement savings focus on financial incentives. Recently, however, economists have increasingly observed that behavior is not always in accord with conventional economic models. As a consequence, there has been an upsurge in interest in unconventional ways to influence economic decisions.

In an important study published in 2001, Brigitte Madrian and Dennis Shea considered one such unconventional method for stimulating retirement savings. Many firms offer retirement savings plans in which the firm matches, in full or in part, savings taken out of the paycheck of participating employees. Enrollment in such plans, called 401(k) plans after the applicable section of the U.S. tax code, is always optional. However, at some firms employees are automatically enrolled in such a plan unless they choose to opt out; at other firms employees are enrolled only if they choose to opt in. According to conventional economic models of behavior, the method of enrollment (opt out or opt in) should scarcely matter: An employee who wants to change his or her enrollment status simply fills out a form, and the dollar value of the time required to fill out the form is very small compared with the financial implications of this decision. But, Madrian and Shea wondered, could this conventional reasoning be wrong? Does the method of enrollment in a savings plan directly affect its enrollment rate?

To measure the effect of the method of enrollment, Madrian and Shea studied a large firm that changed the default option for its 401(k) plan from nonparticipation to participation. They compared two groups of workers: those hired the year before the change, who were not automatically enrolled (but could opt in), and those hired in the year after the change, who were automatically enrolled (but could opt out). The financial aspects of the plan were the same for both groups. Madrian and Shea argued that there were no systematic differences between the workers hired before and after the change in the enrollment default. Thus, from an econometrician's perspective, the change was like a randomly assigned treatment, and the causal effect of the change could be estimated by the difference in means between the two groups.

Madrian and Shea found that the default enrollment rule made a huge difference: The enrollment rate for the "opt-in" (control) group was 37.4% (n = 4249), whereas the enrollment rate for the "opt-out" (treatment) group was 85.9% (n = 5901). The estimate of the treatment effect is 48.5% (= 85.9% − 37.4%). Because their sample is large, the 95% confidence interval for the treatment effect is tight (46.8% to 50.2%).

To economists sympathetic to the conventional view that the default enrollment scheme should not matter, Madrian and Shea's finding was astonishing. One potential explanation for their finding is that many workers find these plans so confusing that they simply treat the default option as if it were reliable advice; another explanation is that young workers would simply rather not think about aging and retirement. Although neither explanation is economically rational in a conventional sense, both are consistent with the predictions of "behavioral economics," and both could lead to accepting the default enrollment option. Increasingly, many economists are starting to think that such details might be as important as financial aspects for boosting enrollment in retirement savings plans.

To learn more about behavioral economics and the design of retirement savings plans, see Thaler and Benartzi (2004).

A modified version of the differences-of-means t-statistic, based on a different standard error formula (the "pooled" standard error formula), has an exact Student t distribution when Y is normally distributed; however, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations (Exercise 3.21). Adopt the notation of Equation (3.19), so that the two groups are denoted as m and w. The pooled variance estimator is

s²_pooled = [1 / (n_m + n_w − 2)] × [ Σ_{i=1}^{n_m} (Y_i − Ȳ_m)²  +  Σ_{i=1}^{n_w} (Y_i − Ȳ_w)² ],    (3.23)

where the first summation is over the observations in group m and the second summation is over the observations in group w. The pooled standard error of the difference in means is SE_pooled(Ȳ_m − Ȳ_w) = s_pooled × √(1/n_m + 1/n_w), and the pooled t-statistic is computed using Equation (3.20), where the standard error is the pooled standard error, SE_pooled.

If the population distribution of Y in group m is N(μ_m, σ²_m), if the population distribution of Y in group w is N(μ_w, σ²_w), and if the two group variances are the same (that is, σ²_m = σ²_w), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with n_m + n_w − 2 degrees of freedom.

The drawback of using the pooled variance estimator s²_pooled is that it applies only if the two population variances are the same (assuming n_m ≠ n_w). If the population variances are different, the pooled variance estimator is biased and inconsistent. If the population variances are different but the pooled variance formula is used, the null distribution of the pooled t-statistic is not a Student t distribution, even if the data are normally distributed; in fact, it does not even have a standard normal distribution in large samples. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.
CHAPTER 3  Review of Statistics

Use of the Student t Distribution in Practice

For the problem of testing the mean of Y, the Student t distribution is applicable if the underlying population distribution of Y is normal. For economic variables, however, normal distributions are the exception (for example, see the boxes in Chapter 2, "The Distribution of Earnings in the United States in 2004" and "A Bad Day on Wall Street"). Even if the underlying data are not normally distributed, the normal approximation to the distribution of the t-statistic is valid if the sample size is large. Therefore, inferences (hypothesis tests and confidence intervals) about the mean of a distribution should be based on the large-sample normal approximation.

When comparing two means, any economic reason for two groups having different means typically implies that the two groups also could have different variances. Accordingly, the pooled standard error formula is inappropriate, and the correct standard error formula, which allows for different group variances, is as given in Equation (3.19). Even if the population distributions are normal, the t-statistic computed using the standard error formula in Equation (3.19) does not have a Student t distribution. In practice, therefore, inferences about differences in means should be based on Equation (3.19), used in conjunction with the large-sample standard normal approximation.

Even though the Student t distribution is rarely applicable in economics, some software uses the Student t distribution to compute p-values and confidence intervals. In practice, this does not pose a problem because the difference between the Student t distribution and the standard normal distribution is negligible if the sample size is large. For n > 15, the difference in the p-values computed using the Student t and standard normal distributions never exceeds 0.01; for n > 80, the difference never exceeds 0.002. In most modern applications, and in all applications in this textbook, the sample sizes are in the hundreds or thousands, large enough for the difference between the Student t distribution and the standard normal distribution to be negligible.

3.7  Scatterplots, the Sample Covariance, and the Sample Correlation

What is the relationship between age and earnings? This question, like many others, relates one variable, X (age), to another, Y (earnings). This section reviews three ways to summarize the relationship between variables: the scatterplot, the sample covariance, and the sample correlation coefficient.

FIGURE 3.2  Scatterplot of Average Hourly Earnings vs. Age

[Scatterplot: horizontal axis, Age (25 to 65); vertical axis, Average Hourly Earnings.] Each point in the plot represents the age and average earnings of one of the 200 workers in the sample. The colored dot corresponds to a 40-year-old worker who earns $31.25 per hour. The data are for technicians in the information industry from the March 2005 CPS.

Scatterplots

A scatterplot is a plot of n observations on X_i and Y_i, in which each observation is represented by the point (X_i, Y_i). For example, Figure 3.2 is a scatterplot of age (X) and hourly earnings (Y) for a sample of 200 workers in the information industry from the March 2005 CPS. Each dot in Figure 3.2 corresponds to an (X, Y) pair for one of the observations. For example, one of the workers in this sample is 40 years old and earns $31.25 per hour; this worker's age and earnings are indicated by the colored dot in Figure 3.2. The scatterplot shows a positive relationship between age and earnings in this sample: Older workers tend to earn more than younger workers. This relationship is not exact, however, and earnings could not be predicted perfectly using only a person's age.


Sample Covariance and Correlation

The covariance and correlation were introduced in Section 2.3 as two properties of the joint probability distribution of the random variables X and Y. Because the population distribution is unknown in practice, we do not know the population covariance or correlation. The population covariance and correlation can, however, be estimated by taking a random sample of n members of the population and collecting the data (X_i, Y_i), i = 1, …, n.

The sample covariance and correlation are estimators of the population covariance and correlation. Like the estimators discussed previously in this chapter, they are computed by replacing a population mean (the expectation) with a sample average. The sample covariance, denoted s_XY, is

s_XY = [1 / (n − 1)] Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ).    (3.24)

Like the sample variance, the average in Equation (3.24) is computed by dividing by n − 1 instead of n; here, too, this difference stems from using X̄ and Ȳ to estimate the respective population means. When n is large, it makes little difference whether division is by n or n − 1.

The sample correlation coefficient, or sample correlation, is denoted r_XY and is the ratio of the sample covariance to the sample standard deviations:

r_XY = s_XY / (s_X s_Y).    (3.25)

The sample correlation measures the strength of the linear association between X and Y in a sample of n observations. Like the population correlation, the sample correlation is unitless and lies between −1 and 1: |r_XY| ≤ 1.

The sample correlation equals 1 if X_i = Y_i for all i and equals −1 if X_i = −Y_i for all i. More generally, the correlation is ±1 if the scatterplot is a straight line. If the line slopes upward, then there is a positive relationship between X and Y and the correlation is 1. If the line slopes down, then there is a negative relationship and the correlation is −1. The closer the scatterplot is to a straight line, the closer is the correlation to ±1. A high correlation coefficient does not necessarily mean that the line has a steep slope; rather, it means that the points in the scatterplot fall very close to a straight line.
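Equations (3.24) and (3.25) translate directly into code. The sketch below (with hypothetical data, not the CPS sample from Figure 3.2) computes the sample covariance with the n − 1 divisor and the sample correlation as the ratio of the covariance to the two sample standard deviations.

```python
from math import sqrt

def sample_cov(x, y):
    """Sample covariance s_XY, Equation (3.24): divide by n - 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation r_XY, Equation (3.25); note sample_cov(x, x) is s_X^2."""
    return sample_cov(x, y) / sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical age/earnings pairs:
age = [25, 31, 38, 44, 52, 60]
earnings = [14.0, 18.5, 22.0, 25.5, 24.0, 30.0]
print(round(sample_corr(age, earnings), 3))  # positive, but less than 1
```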

Consistency of the sample covariance and correlation.  Like the sample variance, the sample covariance is consistent. That is,

s_XY  →p  σ_XY.    (3.26)

In other words, in large samples the sample covariance is close to the population covariance with high probability.

The proof of the result in Equation (3.26), under the assumption that (X_i, Y_i) are i.i.d. and that X_i and Y_i have finite fourth moments, is similar to the proof in Appendix 3.3 that the sample variance is consistent, and is left as an exercise (Exercise 3.20).

Because the sample variance and sample covariance are consistent, the sample correlation coefficient is consistent; that is, r_XY →p corr(X, Y).
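Equation (3.26) can be illustrated by simulation. In the sketch below (a made-up example, not from the text), pairs are drawn from a joint distribution whose population covariance is 2 by construction (Y = 2X + u with X and u independent standard normals, so cov(X, Y) = 2·var(X) = 2); the sample covariance settles near 2 as n grows.

```python
import random

random.seed(42)

def sim_sample_cov(n):
    """Draw n i.i.d. pairs with population covariance 2 and return s_XY."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

# Larger samples give estimates closer to the population value of 2.
for n in (10, 1000, 100000):
    print(n, round(sim_sample_cov(n), 3))
```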

Example.  As an example, consider the data on age and earnings in Figure 3.2. For these 200 workers, the sample standard deviation of age is s_A = 10.75 years, and the sample standard deviation of earnings is s_E = 13.79 $/hour. The sample covariance between age and earnings is s_AE = 37.01 (the units are years × dollars per hour, not readily interpretable). Thus the correlation coefficient is r_AE = 37.01/(10.75 × 13.79) = 0.25, or 25%. The correlation of 0.25 means that there is a positive relationship between age and earnings, but as is evident in the scatterplot, this relationship is far from perfect.

To verify that the correlation does not depend on the units of measurement, suppose that earnings had been reported in cents, in which case the sample standard deviation of earnings is 1379 ¢/hour and the covariance between age and earnings is 3701 (units are years × cents/hour); then the correlation is 3701/(10.75 × 1379) = 0.25, or 25%.

Figure 3.3 gives additional examples of scatterplots and correlation. Figure 3.3a shows a strong positive linear relationship between these variables, and the sample correlation is 0.9. Figure 3.3b shows a strong negative relationship with a sample correlation of −0.8. Figure 3.3c shows a scatterplot with no evident relationship, and the sample correlation is zero. Figure 3.3d shows a clear relationship: As X increases, Y initially increases but then decreases. Despite this discernable relationship between X and Y, the sample correlation is zero; the reason is that, for these data, small values of Y are associated with both large and small values of X.

This final example emphasizes an important point: The correlation coefficient is a measure of linear association. There is a relationship in Figure 3.3d, but it is not linear.
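The unit-invariance argument is just arithmetic and can be replayed directly with the numbers in the text: rescaling earnings from dollars to cents multiplies both the covariance and the earnings standard deviation by 100, so the factors cancel in the ratio.

```python
s_age = 10.75       # sample standard deviation of age, in years
s_earnings = 13.79  # sample standard deviation of earnings, in $/hour
s_cov = 37.01       # sample covariance, in years x $/hour

r = s_cov / (s_age * s_earnings)
print(round(r, 2))  # 0.25

# Same calculation with earnings in cents: both terms scale by 100.
r_cents = (100 * s_cov) / (s_age * (100 * s_earnings))
print(round(r_cents, 2))  # 0.25, unchanged
```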


FIGURE 3.3  Scatterplots for Four Hypothetical Data Sets

[Four scatterplots: (a) Correlation = +0.9; (b) Correlation = −0.8; (c) Correlation = 0.0; (d) Correlation = 0.0 (quadratic).] The scatterplots in Figures 3.3a and 3.3b show strong linear relationships between X and Y. In Figure 3.3c, X is independent of Y and the two variables are uncorrelated. In Figure 3.3d, the two variables also are uncorrelated even though they are related nonlinearly.
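The lesson of Figure 3.3d is easy to reproduce with a made-up data set: for X values symmetric around zero and Y = X², the sample covariance, and hence the correlation, is exactly zero even though Y is perfectly determined by X.

```python
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # an exact, but nonlinear, relationship

n = len(xs)
xbar = sum(xs) / n          # 0.0 by symmetry
ybar = sum(ys) / n
# Sample covariance, Equation (3.24):
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
print(s_xy)  # 0.0: the correlation coefficient misses the quadratic pattern
```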

Summary

1. The sample average, Ȳ, is an estimator of the population mean, μ_Y. When Y_1, …, Y_n are i.i.d.,
a. the sampling distribution of Ȳ has mean μ_Y and variance σ²_Ȳ = σ²_Y / n;
b. Ȳ is unbiased;
c. by the law of large numbers, Ȳ is consistent; and
d. by the central limit theorem, Ȳ has an approximately normal sampling distribution when the sample size is large.
2. The t-statistic is used to test the null hypothesis that the population mean takes on a particular value. If n is large, the t-statistic has a standard normal sampling distribution when the null hypothesis is true.
3. The t-statistic can be used to calculate the p-value associated with the null hypothesis. A small p-value is evidence that the null hypothesis is false.
4. A 95% confidence interval for μ_Y is an interval constructed so that it contains the true value of μ_Y in 95% of repeated samples.
5. Hypothesis tests and confidence intervals for the difference in the means of two populations are conceptually similar to tests and intervals for the mean of a single population.
6. The sample correlation coefficient is an estimator of the population correlation coefficient and measures the linear relationship between two variables; that is, how well their scatterplot is approximated by a straight line.

Key Terms

estimator
estimate
bias, consistency, and efficiency
BLUE
least squares estimator
hypothesis test
null and alternative hypotheses
two-sided alternative hypothesis
p-value (significance probability)
sample variance
sample standard deviation
degrees of freedom
standard error of an estimator
t-statistic (t-ratio)
test statistic
type I error
type II error
significance level
critical value
rejection region
acceptance region
size of a test
power
one-sided alternative hypothesis
confidence set
confidence level
confidence interval
coverage probability
test for the difference between two means
causal effect
treatment effect
scatterplot
sample covariance
sample correlation coefficient (sample correlation)


Review the Concepts

3.1  Explain the difference between the sample average Ȳ and the population mean.

3.2  Explain the difference between an estimator and an estimate. Provide an example of each.

3.3  A population distribution has a mean of 10 and a variance of 16. Determine the mean and variance of Ȳ from an i.i.d. sample from this population for (a) n = 10; (b) n = 100; and (c) n = 1000. Relate your answers to the law of large numbers.

3.4  What role does the central limit theorem play in statistical hypothesis testing? In the construction of confidence intervals?

3.5  What is the difference between a null and alternative hypothesis? Among size, significance level, and power? Between a one-sided and two-sided alternative hypothesis?

3.6  Why does a confidence interval contain more information than the result of a single hypothesis test?

3.7  Explain why the differences-of-means estimator, applied to data from a randomized controlled experiment, is an estimator of the treatment effect.

3.8  Sketch a hypothetical scatterplot for a sample of size 10 for two random variables with a population correlation of (a) 1.0; (b) −1.0; (c) 0.9; (d) −0.5; (e) 0.0.

Exercises

3.1  In a population, μ_Y = 100 and σ²_Y = 43. Use the central limit theorem to answer the following questions:
a. In a random sample of size n = 100, find Pr(Ȳ < 101).
b. In a random sample of size n = 64, find Pr(101 < Ȳ < 103).
c. In a random sample of size n = 165, find Pr(Ȳ > 98).

3.2  Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y_1, …, Y_n be i.i.d. draws from this distribution. Let p̂ be the fraction of successes (1s) in this sample.
a. Show that p̂ = Ȳ.
b. Show that p̂ is an unbiased estimator of p.
c. Show that var(p̂) = p(1 − p)/n.

3.3  In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and 185 responded that they would vote for the challenger. Let p denote the fraction of all likely voters who preferred the incumbent at the time of the survey, and let p̂ be the fraction of survey respondents who preferred the incumbent.
a. Use the survey results to estimate p.
b. Use the estimator of the variance of p̂, p̂(1 − p̂)/n, to calculate the standard error of your estimator.
c. What is the p-value for the test H_0: p = 0.5 vs. H_1: p ≠ 0.5?
d. What is the p-value for the test H_0: p = 0.5 vs. H_1: p > 0.5?
e. Why do the results from (c) and (d) differ?
f. Did the survey contain statistically significant evidence that the incumbent was ahead of the challenger at the time of the survey? Explain.
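A sketch of the calculations that parts (a) through (d) of Exercise 3.3 call for, using the large-sample normal approximation (`statistics.NormalDist` is Python's standard normal distribution):

```python
from math import sqrt
from statistics import NormalDist

n = 400
p_hat = 215 / n                      # part (a): the estimate of p
se = sqrt(p_hat * (1 - p_hat) / n)   # part (b): SE from p_hat(1 - p_hat)/n
t = (p_hat - 0.5) / se               # t-statistic for H0: p = 0.5

norm = NormalDist()
p_two_sided = 2 * (1 - norm.cdf(abs(t)))  # part (c): H1: p != 0.5
p_one_sided = 1 - norm.cdf(t)             # part (d): H1: p > 0.5

print(round(p_hat, 4), round(se, 4))
print(round(p_two_sided, 3), round(p_one_sided, 3))
```

The two-sided p-value is twice the one-sided p-value here because the t-statistic is positive, which is the point of part (e).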

3.4  Using the data in Exercise 3.3:
a. Construct a 95% confidence interval for p.
b. Construct a 99% confidence interval for p.
c. Why is the interval in (b) wider than the interval in (a)?
d. Without doing any additional calculations, test the hypothesis H_0: p = 0.50 vs. H_1: p ≠ 0.50 at the 5% significance level.
3.5  A survey of 1055 registered voters is conducted, and the voters are asked to choose between candidate A and candidate B. Let p denote the fraction of voters in the population who prefer candidate A, and let p̂ denote the fraction of voters in the sample who prefer candidate A.
a. You are interested in the competing hypotheses H_0: p = 0.5 vs. H_1: p ≠ 0.5. Suppose that you decide to reject H_0 if |p̂ − 0.5| > 0.02.
i. What is the size of this test?
ii. Compute the power of this test if p = 0.53.
b. In the survey, p̂ = 0.54.
i. Test H_0: p = 0.5 vs. H_1: p ≠ 0.5 using a 5% significance level.
ii. Test H_0: p = 0.5 vs. H_1: p > 0.5 using a 5% significance level.
iii. Construct a 95% confidence interval for p.
iv. Construct a 99% confidence interval for p.
v. Construct a 50% confidence interval for p.


c. Suppose that the survey is carried out 20 times, using independently selected voters in each survey. For each of these 20 surveys, a 95% confidence interval for p is constructed.
i. What is the probability that the true value of p is contained in all 20 of these confidence intervals?
ii. How many of these confidence intervals do you expect to contain the true value of p?
d. In survey jargon, the "margin of error" is 1.96 × SE(p̂); that is, it is half the length of the 95% confidence interval. Suppose you wanted to design a survey that had a margin of error of at most 1%; that is, you wanted Pr(|p̂ − p| > 0.01) ≤ 0.05. How large should n be if the survey uses simple random sampling?
3.6  Let Y_1, …, Y_n be i.i.d. draws from a distribution with mean μ. A test of H_0: μ = 5 versus H_1: μ ≠ 5 using the usual t-statistic yields a p-value of 0.03.
a. Does the 95% confidence interval contain μ = 5? Explain.
b. Can you determine if μ = 6 is contained in the 95% confidence interval? Explain.

3.7  In a given population, 11% of the likely voters are African American. A survey using a simple random sample of 600 land-line telephone numbers finds 8% African American. Is there evidence that the survey is biased? Explain.
3.8  A new version of the SAT test is given to 1000 randomly selected high school seniors. The sample mean test score is 1110, and the sample standard deviation is 123. Construct a 95% confidence interval for the population mean test score for high school seniors.

3.9  Suppose that a lightbulb manufacturing plant produces bulbs with a mean life of 2000 hours and a standard deviation of 200 hours. An inventor claims to have developed an improved process that produces bulbs with a longer mean life and the same standard deviation. The plant manager randomly selects 100 bulbs produced by the process. She says that she will believe the inventor's claim if the sample mean life of the bulbs is greater than 2100 hours; otherwise, she will conclude that the new process is no better than the old process. Let μ denote the mean of the new process. Consider the null and alternative hypotheses H_0: μ = 2000 vs. H_1: μ > 2000.
a. What is the size of the plant manager's testing procedure?
b. Suppose the new process is in fact better and has a mean bulb life of 2150 hours. What is the power of the plant manager's testing procedure?
c. What testing procedure should the plant manager use if she wants the size of her test to be 5%?
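Parts (a) and (b) of Exercise 3.9 are normal-distribution calculations on the sampling distribution of Ȳ; a sketch using Python's `statistics.NormalDist` (the standard normal):

```python
from statistics import NormalDist

norm = NormalDist()
se = 200 / 100 ** 0.5   # sigma / sqrt(n) = 200 / sqrt(100) = 20 hours

# Size: probability of rejecting (Ybar > 2100) when mu = 2000.
size = 1 - norm.cdf((2100 - 2000) / se)   # Pr(Z > 5): essentially zero

# Power when the new process really has mu = 2150.
power = 1 - norm.cdf((2100 - 2150) / se)  # Pr(Z > -2.5)

print(size, round(power, 4))
```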
3.10  Suppose a new standardized test is given to 100 randomly selected third-grade students in New Jersey. The sample average score Ȳ on the test is 58 points, and the sample standard deviation, s_Y, is 8 points.
a. The authors plan to administer the test to all third-grade students in New Jersey. Construct a 95% confidence interval for the mean score of all New Jersey third graders.
b. Suppose the same test is given to 200 randomly selected third graders from Iowa, producing a sample average of 62 points and sample standard deviation of 11 points. Construct a 90% confidence interval for the difference in mean scores between Iowa and New Jersey.
c. Can you conclude with a high degree of confidence that the population means for Iowa and New Jersey students are different? (What is the standard error of the difference in the two sample means? What is the p-value of the test of no difference in means versus some difference?)

3.11  Consider the estimator Ỹ defined in Equation (3.1). Show that (a) E(Ỹ) = μ_Y and (b) var(Ỹ) = 1.25σ²_Y/n.
3.12  To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. A summary of the resulting monthly salaries follows:

           Average Salary (Ȳ)    Standard Deviation (s_Y)
Men        $3100                 $200
Women      $2900                 $320

a. What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; third, compute the p-value associated with the t-statistic; and finally, use the p-value to answer the question.)
b. Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.

3.13  Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield Ȳ = 646.2 and standard deviation s_Y = 19.5.

a. Construct a 95% confidence interval for the mean test score in the population.
b. When the districts were divided into districts with small classes (< 20 students per teacher) and large classes (≥ 20 students per teacher), the following results were found:

Class Size    Average Score (Ȳ)    Standard Deviation (s_Y)    n
Small         657.4                19.4                        238
Large         650.0                17.9                        182

Is there statistically significant evidence that the districts with smaller classes have higher average test scores? Explain.
3.14  Values of height in inches (X) and weight in pounds (Y) are recorded from a sample of 300 male college students. The resulting summary statistics are X̄ = 70.5 inches; Ȳ = 158 lbs; s_X = 1.8 inches; s_Y = 14.2 lbs; s_XY = 21.73 inches × lbs; and r_XY = 0.85. Convert these statistics to the metric system (meters and kilograms).
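The conversions in Exercise 3.14 follow the scaling rules for means, standard deviations, and covariances: means and standard deviations scale by the unit factor, the covariance scales by the product of both factors, and the correlation, being unitless, does not change. A sketch using the exact factor 0.0254 m per inch and the approximation 0.453592 kg per lb:

```python
IN_TO_M = 0.0254     # meters per inch (exact)
LB_TO_KG = 0.453592  # kilograms per pound (approximate)

xbar, ybar = 70.5, 158.0                 # inches, pounds
sx, sy, sxy, rxy = 1.8, 14.2, 21.73, 0.85

xbar_m = xbar * IN_TO_M                  # mean height in meters
ybar_kg = ybar * LB_TO_KG                # mean weight in kilograms
sx_m = sx * IN_TO_M                      # standard deviations scale linearly
sy_kg = sy * LB_TO_KG
sxy_mkg = sxy * IN_TO_M * LB_TO_KG       # covariance scales by both factors

print(round(xbar_m, 3), round(ybar_kg, 1))
print(round(sx_m, 4), round(sy_kg, 2), round(sxy_mkg, 4))
print(rxy)  # correlation is unitless: still 0.85
```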

3.15  The CNN/USA Today/Gallup poll conducted on September 3–5, 2004, surveyed 755 likely voters; 405 reported a preference for President George W. Bush, and 350 reported a preference for Senator John Kerry. The CNN/USA Today/Gallup poll conducted on October 1–3, 2004, surveyed 756 likely voters; 378 reported a preference for Bush, and 378 reported a preference for Kerry.
a. Construct a 95% confidence interval for the fraction of likely voters in the population who favored Bush in early September 2004.
b. Construct a 95% confidence interval for the fraction of likely voters in the population who favored Bush in early October 2004.
c. Was there a statistically significant change in voters' opinions across the two dates?

3.16  Grades on a standardized test are known to have a mean of 1000 for students in the United States. The test is administered to 453 randomly selected students in Florida; in this sample, the mean is 1013 and the standard deviation (s) is 108.
a. Construct a 95% confidence interval for the average test score for Florida students.
b. Is there statistically significant evidence that Florida students perform differently than other students in the United States?
c. Another 503 students are selected at random from Florida. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95.
i. Construct a 95% confidence interval for the change in average test score associated with the prep course.
ii. Is there statistically significant evidence that the prep course helped?
d. The original 453 students are given the prep course and then asked to take the test a second time. The average change in their test scores is 9 points, and the standard deviation of the change is 60 points.
i. Construct a 95% confidence interval for the change in average test scores.
ii. Is there statistically significant evidence that students will perform better on their second attempt after taking the prep course?
iii. Students may have performed better in their second attempt because of the prep course or because they gained test-taking experience in their first attempt. Describe an experiment that would quantify these two effects.
3.17  Read the box "The Gender Gap in Earnings of College Graduates in the United States."
a. Construct a 95% confidence interval for the change in men's average hourly earnings between 1992 and 2004.
b. Construct a 95% confidence interval for the change in women's average hourly earnings between 1992 and 2004.
c. Construct a 95% confidence interval for the change in the gender gap in average hourly earnings between 1992 and 2004. (Hint: Ȳ_m,2004 − Ȳ_m,1992 is independent of Ȳ_w,2004 − Ȳ_w,1992.)

3.18  This exercise shows that the sample variance is an unbiased estimator of the population variance when Y_1, …, Y_n are i.i.d. with mean μ_Y and variance σ²_Y.
a. Use Equation (2.31) to show that E[(Y_i − Ȳ)²] = var(Y_i) − 2cov(Y_i, Ȳ) + var(Ȳ).

b. Use Equation (2.33) to show that cov(Ȳ, Y_i) = σ²_Y/n.
c. Use the results in parts (a) and (b) to show that E(s²_Y) = σ²_Y.

3.19  a. Ȳ is an unbiased estimator of μ_Y. Is Ȳ² an unbiased estimator of μ²_Y?
b. Ȳ is a consistent estimator of μ_Y. Is Ȳ² a consistent estimator of μ²_Y?

3.20  Suppose that (X_i, Y_i) are i.i.d. with finite fourth moments. Prove that the sample covariance is a consistent estimator of the population covariance; that is, s_XY →p σ_XY, where s_XY is defined in Equation (3.24). (Hint: Use the strategy of Appendix 3.3 and the Cauchy–Schwarz inequality.)

3.21  Show that the pooled standard error [SE_pooled(Ȳ_m − Ȳ_w)] given following Equation (3.23) equals the usual standard error for the difference in means in Equation (3.19) when the two group sizes are the same (n_m = n_w).

Empirical Exercise

E3.1  On the text Web site www.aw-bc.com/stock_watson you will find a data file CPS92_04 that contains an extended version of the dataset used in Table 3.1 of the text for the years 1992 and 2004. It contains data on full-time, full-year workers, age 25–34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS92_04_Description, available on the Web site. Use these data to answer the following questions.
a. Compute the sample mean for average hourly earnings (AHE) in 1992 and in 2004. Construct a 95% confidence interval for the population means of AHE in 1992 and 2004 and for the change between 1992 and 2004.
b. In 2004, the value of the Consumer Price Index (CPI) was 188.9. In 1992, the value of the CPI was 140.3. Repeat (a), but use AHE measured in real 2004 dollars ($2004); that is, adjust the 1992 data for the price inflation that occurred between 1992 and 2004.
c. If you were interested in the change in workers' purchasing power from 1992 to 2004, would you use the results from (a) or from (b)? Explain.
d. Use the 2004 data to construct a 95% confidence interval for the mean of AHE for high school graduates. Construct a 95% confidence interval for the mean of AHE for workers with a college degree. Construct a 95% confidence interval for the difference between the two means.
e. Repeat (d) using the 1992 data expressed in $2004.
f. Did real (inflation-adjusted) wages of high school graduates increase from 1992 to 2004? Explain. Did real wages of college graduates increase? Did the gap between earnings of college and high school graduates increase? Explain, using appropriate estimates, confidence intervals, and test statistics.
g. Table 3.1 presents information on the gender gap for college graduates. Prepare a similar table for high school graduates using the 1992 and 2004 data. Are there any notable differences between the results for high school and college graduates?
APPENDIX 3.1  The U.S. Current Population Survey

Each month the Bureau of Labor Statistics in the U.S. Department of Labor conducts the "Current Population Survey" (CPS), which provides data on labor force characteristics of the population, including the level of employment, unemployment, and earnings. More than 50,000 U.S. households are surveyed each month. The sample is chosen by randomly selecting addresses from a database of addresses from the most recent decennial census, augmented with data on new housing units constructed after the last census. The exact random sampling scheme is rather complicated (first, small geographical areas are randomly selected, then housing units within these areas are randomly selected); details can be found in the Handbook of Labor Statistics and on the Bureau of Labor Statistics Web site (www.bls.gov).

The survey conducted each March is more detailed than in other months and asks questions about earnings during the previous year. The statistics in Table 3.1 were computed using the March surveys. The CPS earnings data are for full-time workers, defined to be somebody employed more than 35 hours per week for at least 48 weeks in the previous year.

APPENDIX

3.2  Two Proofs That Ȳ Is the Least Squares Estimator of μ_Y

This appendix provides two proofs, one using calculus and one not, that Ȳ minimizes the
sum of squared prediction mistakes in Equation (3.2); that is, that Ȳ is the least squares
estimator of E(Y).

Calculus Proof

To minimize the sum of squared prediction mistakes, take its derivative with respect to m
and set it to zero:

    d/dm Σ_{i=1}^n (Y_i - m)² = -2 Σ_{i=1}^n (Y_i - m) = -2 Σ_{i=1}^n Y_i + 2nm = 0.

Solving the final equation for m shows that Σ_{i=1}^n (Y_i - m)² is minimized when
m = Ȳ.
Non-calculus Proof

The strategy is to show that the difference between the least squares estimator and Ȳ must
be zero, from which it follows that Ȳ is the least squares estimator. Let d = Ȳ - m, so that
m = Ȳ - d. Then (Y_i - m)² = (Y_i - [Ȳ - d])² = ([Y_i - Ȳ] + d)² = (Y_i - Ȳ)² +
2d(Y_i - Ȳ) + d². Thus, the sum of squared prediction mistakes [Equation (3.2)] is

    Σ_{i=1}^n (Y_i - m)² = Σ_{i=1}^n (Y_i - Ȳ)² + 2d Σ_{i=1}^n (Y_i - Ȳ) + nd²
                         = Σ_{i=1}^n (Y_i - Ȳ)² + nd²,                        (3.28)

where the second equality uses the fact that Σ_{i=1}^n (Y_i - Ȳ) = 0. Because both terms in
the final line of Equation (3.28) are nonnegative and because the first term does not depend
on d, Σ_{i=1}^n (Y_i - m)² is minimized by choosing d to make the second term nd² as small
as possible. This is done by setting d = 0, that is, by setting m = Ȳ, so that Ȳ is the least
squares estimator of E(Y).
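The result proved above can be checked numerically; the data values in this sketch are arbitrary:

```python
import numpy as np

# Numerical check: among all predictors m, the sum of squared prediction
# mistakes is smallest at m = Ybar.
Y = np.array([3.0, 5.0, 6.0, 10.0])
ybar = Y.mean()                      # 6.0

def sse(m):
    return ((Y - m) ** 2).sum()

# Search a grid of candidate values of m; the minimizer sits at Ybar.
grid = np.linspace(Y.min(), Y.max(), 1001)
best_m = grid[np.argmin([sse(m) for m in grid])]

# The non-calculus identity: SSE(m) = SSE(Ybar) + n * d^2, with d = Ybar - m.
d = ybar - 4.0
assert np.isclose(sse(4.0), sse(ybar) + len(Y) * d ** 2)
print(ybar, best_m)
```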

APPENDIX

3.3  A Proof That the Sample Variance Is Consistent

This appendix uses the law of large numbers to prove that the sample variance s_Y² is a
consistent estimator of the population variance σ_Y², as stated in Equation (3.9), when
Y_1, ... , Y_n are i.i.d. and E(Y⁴) < ∞.

First, add and subtract μ_Y to write (Y_i - Ȳ)² = [(Y_i - μ_Y) - (Ȳ - μ_Y)]² =
(Y_i - μ_Y)² - 2(Y_i - μ_Y)(Ȳ - μ_Y) + (Ȳ - μ_Y)². Substituting this expression for
(Y_i - Ȳ)² into the definition of s_Y² [Equation (3.7)], we have that

    s_Y² = [1/(n-1)] Σ_{i=1}^n (Y_i - Ȳ)²
         = [n/(n-1)] [(1/n) Σ_{i=1}^n (Y_i - μ_Y)²] - [n/(n-1)] (Ȳ - μ_Y)²,   (3.29)

where the final equality follows from the definition of Ȳ [which implies that
Σ_{i=1}^n (Y_i - μ_Y) = n(Ȳ - μ_Y)] and by collecting terms.

The law of large numbers can now be applied to the two terms in the final line of Equa-
tion (3.29). Define W_i = (Y_i - μ_Y)². Now E(W_i) = σ_Y² (by the definition of the variance).
Because the random variables Y_1, ... , Y_n are i.i.d., the random variables W_1, ... , W_n are
i.i.d. In addition, E(W_i²) = E[(Y_i - μ_Y)⁴] < ∞ because, by assumption, E(Y⁴) < ∞. Thus
W_1, ... , W_n are i.i.d. and var(W_i) < ∞, so W̄ satisfies the conditions for the law of large
numbers in Key Concept 2.6 and W̄ →p E(W_i). But W̄ = (1/n) Σ_{i=1}^n (Y_i - μ_Y)² and
E(W_i) = σ_Y², so (1/n) Σ_{i=1}^n (Y_i - μ_Y)² →p σ_Y². Also, n/(n-1) → 1, so the first term
in Equation (3.29) converges in probability to σ_Y². Because Ȳ →p μ_Y, (Ȳ - μ_Y)² →p 0,
so the second term converges in probability to zero. Combining these results yields
s_Y² →p σ_Y².
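A small simulation illustrates this consistency result; the population distribution and seed below are illustrative assumptions:

```python
import numpy as np

# As n grows, the sample variance (computed with the n - 1 divisor of
# Equation (3.7)) concentrates around the population variance,
# here sigma^2 = 4 for Y ~ N(0, 4).
rng = np.random.default_rng(42)
for n in (10, 1_000, 100_000):
    y = rng.normal(0.0, 2.0, size=n)
    print(n, round(y.var(ddof=1), 3))
```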

PART TWO

Fundamentals of
Regression Analysis

CHAPTER 4   Linear Regression with One Regressor

CHAPTER 5   Regression with a Single Regressor:
            Hypothesis Tests and Confidence Intervals

CHAPTER 6   Linear Regression with Multiple Regressors

CHAPTER 7   Hypothesis Tests and Confidence Intervals
            in Multiple Regression

CHAPTER 8   Nonlinear Regression Functions

CHAPTER 9   Assessing Studies Based on Multiple Regression
Linear Regression
with One Regressor

A state implements tough new penalties on drunk drivers: What is the effect
on highway fatalities? A school district cuts the size of its elementary
school classes: What is the effect on its students' standardized test scores? You
successfully complete one more year of college classes: What is the effect on
your future earnings?

All three of these questions are about the unknown effect of changing one
variable, X (X being penalties for drunk driving, class size, or years of schooling),
on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable,
X, to another, Y. This model postulates a linear relationship between X and Y;
the slope of the line relating X and Y is the effect of a one-unit change in X
on Y. Just as the mean of Y is an unknown characteristic of the population
distribution of Y, the slope of the line relating X and Y is an unknown
characteristic of the population joint distribution of X and Y. The econometric
problem is to estimate this slope, that is, to estimate the effect on Y of a unit
change in X, using a sample of data on these two variables.

This chapter describes methods for estimating this slope using a random
sample of data on X and Y. For instance, using data on class sizes and test scores
from different school districts, we show how to estimate the expected effect on
test scores of reducing class sizes by, say, one student per class. The slope and the
intercept of the line relating X and Y can be estimated by a method called
ordinary least squares (OLS).

4.1 The Linear Regression Model

The superintendent of an elementary school district must decide whether to hire
additional teachers, and she wants your advice. If she hires the teachers, she will
reduce the number of students per teacher (the student-teacher ratio) by two. She
faces a tradeoff. Parents want smaller classes so that their children can receive
more individualized attention. But hiring more teachers means spending more
money, which is not to the liking of those paying the bill! So she asks you: If she
cuts class sizes, what will the effect be on student performance?

In many school districts, student performance is measured by standardized
tests, and the job status or pay of some administrators can depend in part on how
well their students do on these tests. We therefore sharpen the superintendent's
question: If she reduces the average class size by two students, what will the effect
be on standardized test scores in her district?

A precise answer to this question requires a quantitative statement about
changes. If the superintendent changes the class size by a certain amount, what
would she expect the change in standardized test scores to be? We can write this
as a mathematical relationship using the Greek letter beta, β_ClassSize, where the
subscript "ClassSize" distinguishes the effect of changing the class size from other
effects. Thus,

    β_ClassSize = (change in TestScore)/(change in ClassSize)
                = ΔTestScore/ΔClassSize,                          (4.1)

where the Greek letter Δ (delta) stands for "change in." That is, β_ClassSize is the
change in the test score that results from changing the class size, divided by the
change in the class size.

If you were lucky enough to know β_ClassSize, you would be able to tell the
superintendent that decreasing class size by one student would change districtwide
test scores by β_ClassSize. You could also answer the superintendent's actual question,
which concerned changing class size by two students per class. To do so, rearrange
Equation (4.1) so that

    ΔTestScore = β_ClassSize × ΔClassSize.                        (4.2)

Suppose that β_ClassSize = -0.6. Then a reduction in class size of two students per
class would yield a predicted change in test scores of (-0.6) × (-2) = 1.2; that is,
you would predict that test scores would rise by 1.2 points as a result of the
reduction in class sizes by two students per class.


Equation (4.2) is the definition of the slope of a straight line relating test
scores and class size. This straight line can be written

    TestScore = β_0 + β_ClassSize × ClassSize,                    (4.3)

where β_0 is the intercept of this straight line and, as before, β_ClassSize is the slope.
According to Equation (4.3), if you knew β_0 and β_ClassSize, not only would you be
able to determine the change in test scores at a district associated with a change in
class size, but you also would be able to predict the average test score itself for a
given class size.

When you propose Equation (4.3) to the superintendent, she tells you that
something is wrong with this formulation. She points out that class size is just one
of many facets of elementary education, and that two districts with the same class
sizes can have different test scores for many other reasons; perhaps one district
has more immigrants (and thus fewer native English speakers). And even if two
districts are the same in all these ways, they might have different test scores for
essentially random reasons having to do with the performance of the individual
students on the day of the test. She is right, of course; for all these reasons,
Equation (4.3) will not hold exactly for all districts. Instead, it should be viewed
as a statement about a relationship that holds on average across the population of
districts.

A version of this linear relationship that holds for each district must incorpo-
rate these other factors influencing test scores, including each district's unique
characteristics (for example, quality of their teachers, background of their students,
how lucky the students were on test day). One approach would be to list the most
important factors and to introduce them explicitly into Equation (4.3) (an idea we
return to in Chapter 6). For now, however, we simply lump all these "other
factors" together and write the relationship for a given district as

    TestScore = β_0 + β_ClassSize × ClassSize + other factors.    (4.4)

Thus, the test score for the district is written in terms of one component, β_0 +
β_ClassSize × ClassSize, that represents the average effect of class size on scores in
the population of school districts, and a second component that represents all other
factors.

Although this discussion has focused on test scores and class size, the idea
expressed in Equation (4.4) is much more general, so it is useful to introduce more

general notation. Suppose you have a sample of n districts. Let Y_i be the average
test score in the i-th district, let X_i be the average class size in the i-th district, and
let u_i denote the other factors influencing the test score in the i-th district. Then
Equation (4.4) can be written more generally as

    Y_i = β_0 + β_1 X_i + u_i,                                    (4.5)

for each district (that is, i = 1, ... , n), where β_0 is the intercept of this line and β_1
is the slope. [The general notation "β_1" is used for the slope in Equation (4.5)
instead of "β_ClassSize" because this equation is written in terms of a general
variable X_i.]

Equation (4.5) is the linear regression model with a single regressor, in which
Y is the dependent variable and X is the independent variable or the regressor.

The first part of Equation (4.5), β_0 + β_1 X_i, is the population regression line or
the population regression function. This is the relationship that holds between Y
and X on average over the population. Thus, if you knew the value of X, according
to this population regression line you would predict that the value of the
dependent variable, Y, is β_0 + β_1 X.

The intercept β_0 and the slope β_1 are the coefficients of the population regres-
sion line, also known as the parameters of the population regression line. The slope
β_1 is the change in Y associated with a unit change in X. In some econometric
applications, the intercept has a meaningful economic interpretation. In other
applications, the intercept has no real-world meaning; for example, when X is the
class size, strictly speaking the intercept is the predicted value of test scores when
there are no students in the class! When the real-world meaning of the intercept is
nonsensical, it is best to think of it mathematically as the coefficient that determines
the level of the regression line.

The term u_i in Equation (4.5) is the error term. The error term incorporates
all of the factors responsible for the difference between the i-th district's average
test score and the value predicted by the population regression line. This error term
contains all the other factors besides X that determine the value of the dependent
variable, Y, for a specific observation, i. In the class size example, these other
factors include all the unique features of the i-th district that affect the performance
of its students on the test, including teacher quality, student economic background,
luck, and even any mistakes in grading the test.

The linear regression model and its terminology are summarized in Key
Concept 4.1.

KEY CONCEPT 4.1
TERMINOLOGY FOR THE LINEAR REGRESSION
MODEL WITH A SINGLE REGRESSOR

The linear regression model is

    Y_i = β_0 + β_1 X_i + u_i,

where

the subscript i runs over observations, i = 1, ... , n;

Y_i is the dependent variable, the regressand, or simply the left-hand variable;

X_i is the independent variable, the regressor, or simply the right-hand variable;

β_0 + β_1 X is the population regression line or population regression function;

β_0 is the intercept of the population regression line;

β_1 is the slope of the population regression line; and

u_i is the error term.
Figure 4.1 summarizes the linear regression model with a single regressor for
seven hypothetical observations on test scores (Y) and class size (X). The popula-
tion regression line is the straight line β_0 + β_1 X. The population regression line
slopes down (β_1 < 0), which means that districts with lower student-teacher ratios
(smaller classes) tend to have higher test scores. The intercept β_0 has a mathemat-
ical meaning as the value of the Y axis intersected by the population regression
line, but, as mentioned earlier, it has no real-world meaning in this example.

Because of the other factors that determine test performance, the hypotheti-
cal observations in Figure 4.1 do not fall exactly on the population regression line.
For example, the value of Y for district #1, Y_1, is above the population regression
line. This means that test scores in district #1 were better than predicted by the
population regression line, so the error term for that district, u_1, is positive. In
contrast, Y_2 is below the population regression line, so test scores for that district
were worse than predicted, and u_2 < 0.

Now return to your problem as advisor to the superintendent: What is the
expected effect on test scores of reducing the student-teacher ratio by two stu-
dents per teacher? The answer is easy: The expected change is (-2) × β_ClassSize.
But what is the value of β_ClassSize?



FIGURE 4.1  Scatterplot of Test Score vs. Student-Teacher Ratio (Hypothetical Data)

The scatterplot shows hypothetical observations for seven school districts. The population
regression line is β_0 + β_1 X. The vertical distance from the i-th point to the population
regression line is Y_i - (β_0 + β_1 X_i), which is the population error term u_i for the i-th
observation. (Axes: student-teacher ratio (X), horizontal; test score (Y), vertical.)

4.2 Estimating the Coefficients
    of the Linear Regression Model

In a practical situation, such as the application to class size and test scores, the
intercept β_0 and slope β_1 of the population regression line are unknown. There-
fore, we must use data to estimate the unknown slope and intercept of the popu-
lation regression line.

This estimation problem is similar to others you have faced in statistics. For
example, suppose you want to compare the mean earnings of men and women who
recently graduated from college. Although the population mean earnings are
unknown, we can estimate the population means using a random sample of male
and female college graduates. Then the natural estimator of the unknown popula-
tion mean earnings for women, for example, is the average earnings of the female
college graduates in the sample.

The same idea extends to the linear regression model. We do not know the
population value of β_ClassSize, the slope of the unknown population regression
line relating X (class size) and Y (test scores). But just as it was possible to
learn about the population mean using a sample of data drawn at random from that

TABLE 4.1  Summary of the Distribution of Student-Teacher Ratios and Fifth-Grade
           Test Scores for 420 K-8 Districts in California in 1998

                                               Percentile
                      Average  Standard    10%    25%    40%    50%     60%    75%    90%
                               Deviation                      (median)

Student-teacher ratio  19.6      1.9      17.3   18.6   19.3   19.7    20.1   20.9   21.9
Test score            654.2     19.1     630.4  640.0  649.1  654.5   659.4  666.7  679.1
population, so is it possible to learn about the population slope β_ClassSize using a
sample of data.

The data we analyze here consist of test scores and class sizes in 1999 in
420 California school districts that serve kindergarten through eighth grade. The
test score is the districtwide average of reading and math scores for fifth graders.
Class size can be measured in various ways. The measure used here is one of the
broadest, which is the number of students in the district divided by the number of
teachers; that is, the districtwide student-teacher ratio. These data are described
in more detail in Appendix 4.1.

Table 4.1 summarizes the distributions of test scores and class sizes for this
sample. The average student-teacher ratio is 19.6 students per teacher and the
standard deviation is 1.9 students per teacher. The 10th percentile of the distribu-
tion of the student-teacher ratio is 17.3 (that is, only 10% of districts have stu-
dent-teacher ratios below 17.3), while the district at the 90th percentile has a
student-teacher ratio of 21.9.

A scatterplot of these 420 observations on test scores and the student-teacher
ratio is shown in Figure 4.2. The sample correlation is -0.23, indicating a weak
negative relationship between the two variables. Although larger classes in this
sample tend to have lower test scores, there are other determinants of test scores
that keep the observations from falling perfectly along a straight line.

Despite this low correlation, if one could somehow draw a straight line
through these data, then the slope of this line would be an estimate of β_ClassSize
based on these data. One way to draw the line would be to take out a pencil and
a ruler and "eyeball" the best line you could. While this method is easy, it is very
unscientific, and different people will create different estimated lines.

How, then, should you choose among the many possible lines? By far the most
common way is to choose the line that produces the "least squares" fit to these
data, that is, to use the ordinary least squares (OLS) estimator.
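The sample correlation reported for these data takes one library call to compute. Since the California data set is not reproduced here, this sketch uses simulated districts with a weak negative relationship built in:

```python
import numpy as np

# Simulated stand-in for the 420 California districts: class size drawn
# around the values in Table 4.1, test scores with a weak negative link.
rng = np.random.default_rng(4)
n = 420
X = rng.normal(19.6, 1.9, size=n)                    # student-teacher ratio
Y = 699.0 - 2.3 * X + rng.normal(0.0, 18.5, size=n)  # test score with noise

r = np.corrcoef(X, Y)[0, 1]
print(round(r, 2))    # weakly negative by construction
```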

FIGURE 4.2  Scatterplot of Test Score vs. Student-Teacher Ratio (California School District Data)

Data from 420 California school districts. There is a weak negative relationship between the
student-teacher ratio and test scores: The sample correlation is -0.23. (Axes: student-teacher
ratio, horizontal; test score, vertical.)

The Ordinary Least Squares Estimator

The OLS estimator chooses the regression coefficients so that the estimated
regression line is as close as possible to the observed data, where closeness is mea-
sured by the sum of the squared mistakes made in predicting Y given X.

As discussed in Section 3.1, the sample average, Ȳ, is the least squares estima-
tor of the population mean, E(Y); that is, Ȳ minimizes the total squared estimation
mistakes Σ_{i=1}^n (Y_i - m)² among all possible estimators m [see expression (3.2)].

The OLS estimator extends this idea to the linear regression model. Let b_0
and b_1 be some estimators of β_0 and β_1. The regression line based on these esti-
mators is b_0 + b_1 X, so the value of Y_i predicted using this line is b_0 + b_1 X_i.
Thus, the mistake made in predicting the i-th observation is Y_i - (b_0 + b_1 X_i) =
Y_i - b_0 - b_1 X_i. The sum of these squared prediction mistakes over all n
observations is

    Σ_{i=1}^n (Y_i - b_0 - b_1 X_i)².                             (4.6)

The sum of the squared mistakes for the linear regression model in expres-
sion (4.6) is the extension of the sum of the squared mistakes for the problem of
estimating the mean in expression (3.2). In fact, if there is no regressor, then b_1
does not enter expression (4.6) and the two problems are identical except for the
different notation [m in expression (3.2), b_0 in expression (4.6)]. Just as there is a
unique estimator Ȳ that minimizes expression (3.2), so is there a unique pair of
estimators of β_0 and β_1 that minimize expression (4.6).


KEY CONCEPT 4.2
THE OLS ESTIMATOR, PREDICTED VALUES, AND RESIDUALS

The OLS estimators of the slope β_1 and the intercept β_0 are

    β̂_1 = [Σ_{i=1}^n (X_i - X̄)(Y_i - Ȳ)] / [Σ_{i=1}^n (X_i - X̄)²] = s_XY / s_X²,   (4.7)

    β̂_0 = Ȳ - β̂_1 X̄.                                                                (4.8)

The OLS predicted values Ŷ_i and residuals û_i are

    Ŷ_i = β̂_0 + β̂_1 X_i,   i = 1, ... , n,                                           (4.9)

    û_i = Y_i - Ŷ_i,        i = 1, ... , n.                                           (4.10)

The estimated intercept (β̂_0), slope (β̂_1), and residual (û_i) are computed from a
sample of n observations of X_i and Y_i, i = 1, ... , n. These are estimates of the
unknown true population intercept (β_0), slope (β_1), and error term (u_i).


The estimators of the intercept and slope that minimize the sum of squared
mistakes in expression (4.6) are called the ordinary least squares (OLS) estima-
tors of β_0 and β_1.

OLS has its own special notation and terminology. The OLS estimator of β_0
is denoted β̂_0, and the OLS estimator of β_1 is denoted β̂_1. The OLS regression
line is the straight line constructed using the OLS estimators: β̂_0 + β̂_1 X. The
predicted value of Y_i given X_i, based on the OLS regression line, is Ŷ_i = β̂_0 +
β̂_1 X_i. The residual for the i-th observation is the difference between Y_i and its
predicted value: û_i = Y_i - Ŷ_i.

You could compute the OLS estimators β̂_0 and β̂_1 by trying different values
of b_0 and b_1 repeatedly until you find those that minimize the total squared mis-
takes in expression (4.6); they are the least squares estimates. This method would
be quite tedious, however. Fortunately there are formulas, derived by minimizing
expression (4.6) using calculus, that streamline the calculation of the OLS
estimators.

The OLS formulas and terminology are collected in Key Concept 4.2. These
formulas are implemented in virtually all statistical and spreadsheet programs.
These formulas are derived in Appendix 4.2.
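The formulas in Key Concept 4.2 take only a few lines of code. This sketch, using simulated data rather than the California data set, computes the OLS estimates from Equations (4.7) and (4.8) and cross-checks them against a library least-squares routine:

```python
import numpy as np

# OLS slope and intercept computed directly from Equations (4.7) and (4.8).
def ols(X, Y):
    x_dev = X - X.mean()
    beta1 = (x_dev * (Y - Y.mean())).sum() / (x_dev ** 2).sum()
    beta0 = Y.mean() - beta1 * X.mean()
    return beta0, beta1

# Simulated data (illustrative values, not the California districts).
rng = np.random.default_rng(7)
X = rng.normal(20.0, 2.0, size=100)
Y = 700.0 - 2.0 * X + rng.normal(0.0, 10.0, size=100)

b0, b1 = ols(X, Y)
residuals = Y - (b0 + b1 * X)          # Equation (4.10)

# Cross-check against a library least-squares fit (highest degree first).
b1_np, b0_np = np.polyfit(X, Y, deg=1)
print(round(b0, 2), round(b1, 2))
```

With an intercept included, the OLS residuals average to zero by construction, which is a quick sanity check on any implementation.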


OLS Estimates of the Relationship Between
Test Scores and the Student-Teacher Ratio

When OLS is used to estimate a line relating the student-teacher ratio to test
scores using the 420 observations in Figure 4.2, the estimated slope is -2.28 and
the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420
observations is

    TestScore^ = 698.9 - 2.28 × STR,                              (4.11)

where TestScore is the average test score in the district and STR is the
student-teacher ratio. The symbol "^" over TestScore in Equation (4.11) indicates
that this is the predicted value based on the OLS regression line. Figure 4.3 plots
this OLS regression line superimposed over the scatterplot of the data previously
shown in Figure 4.2.

The slope of -2.28 means that an increase in the student-teacher ratio by one
student per class is, on average, associated with a decline in districtwide test scores
by 2.28 points on the test. A decrease in the student-teacher ratio by two students
per class is, on average, associated with an increase in test scores of 4.56 points
[= -2 × (-2.28)]. The negative slope indicates that more students per teacher
(larger classes) is associated with poorer performance on the test.

It is now possible to predict the districtwide test score given a value of the
student-teacher ratio. For example, for a district with 20 students per teacher, the
FIGURE 4.3  The Estimated Regression Line for the California Data

The estimated regression line shows a negative relationship between test scores and the
student-teacher ratio: If class sizes fall by 1 student, the estimated regression predicts that
test scores will increase by 2.28 points. The line plotted through the scatter is
TestScore^ = 698.9 - 2.28 × STR.

predicted test score is 698.9 - 2.28 × 20 = 653.3. Of course, this prediction will
not be exactly right because of the other factors that determine a district's per-
formance. But the regression line does give a prediction (the OLS prediction) of
what test scores would be for that district, based on their student-teacher ratio,
absent those other factors.

Is this estimate of the slope large or small? To answer this, we return to the
superintendent's problem. Recall that she is contemplating hiring enough teach-
ers to reduce the student-teacher ratio by 2. Suppose her district is at the median
of the California districts. From Table 4.1, the median student-teacher ratio is 19.7
and the median test score is 654.5. A reduction of 2 students per class, from 19.7
to 17.7, would move her student-teacher ratio from the 50th percentile to very near
the 10th percentile. This is a big change, and she would need to hire many new
teachers. How would it affect test scores?

According to Equation (4.11), cutting the student-teacher ratio by 2 is pre-
dicted to increase test scores by approximately 4.6 points; if her district's test scores
are at the median, 654.5, they are predicted to increase to 659.1. Is this improve-
ment large or small? According to Table 4.1, this improvement would move her
district from the median to just short of the 60th percentile. Thus a decrease in class
size that would place her district close to the 10% with the smallest classes would
move her test scores from the 50th to the 60th percentile. According to these esti-
mates, at least, cutting the student-teacher ratio by a large amount (2 students per
teacher) would help and might be worth doing depending on her budgetary situa-
tion, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change,
such as reducing the student-teacher ratio from 20 students per teacher to 5?
Unfortunately, the estimates in Equation (4.11) would not be very useful to her.
This regression was estimated using the data in Figure 4.2, and as the figure shows,
the smallest student-teacher ratio in these data is 14. These data contain no infor-
mation on how districts with extremely small classes perform, so these data alone
are not a reliable basis for predicting the effect of a radical move to such an
extremely low student-teacher ratio.
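The predictions discussed above follow directly from Equation (4.11); a minimal sketch:

```python
# Prediction from the estimated OLS regression line in Equation (4.11):
# predicted TestScore = 698.9 - 2.28 * STR.
def predicted_test_score(str_ratio):
    return 698.9 - 2.28 * str_ratio

print(predicted_test_score(20.0))   # approximately 653.3, as in the text
# A cut of 2 students per class raises the prediction by 4.56 points, but
# extrapolating far below the observed minimum STR (about 14) is
# unreliable, as the text cautions.
print(predicted_test_score(17.7) - predicted_test_score(19.7))
```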

Why Use the OLS Estimator?

There are both practical and theoretical reasons to use the OLS estimators β̂_0 and
β̂_1. Because OLS is the dominant method used in practice, it has become the com-
mon language for regression analysis throughout economics, finance (see the box),
and the social sciences more generally. Presenting results using OLS (or its vari-
ants discussed later in this book) means that you are "speaking the same language"


The "Beta" of a Stock

A fundamental idea of modern finance is that an investor needs a financial
incentive to take a risk. Said differently, the expected return on a risky
investment, R, must exceed the return on a safe, or risk-free, investment, R_f.
Thus the expected excess return, R - R_f, on a risky investment, like owning
stock in a company, should be positive.

At first it might seem like the risk of a stock should be measured by its
variance. Much of that risk, however, can be reduced by holding other stocks in
a portfolio; in other words, by diversifying your financial holdings. This means
that the right way to measure the risk of a stock is not by its variance but
rather by its covariance with the market.

The capital asset pricing model (CAPM) formalizes this idea. According to the
CAPM, the expected excess return on an asset is proportional to the expected
excess return on a portfolio of all available assets (the "market portfolio").
That is, the CAPM says that

    R - R_f = β(R_m - R_f),                                       (4.12)

where R_m is the expected return on the market portfolio and β is the coefficient
in the population regression of R - R_f on R_m - R_f. In practice, the risk-free
return is often taken to be the rate of interest on short-term U.S. government
debt. According to the CAPM, a stock with a β < 1 has less risk than the market
portfolio and therefore has a lower expected excess return than the market
portfolio. In contrast, a stock with a β > 1 is riskier than the market portfolio
and thus commands a higher expected excess return.

The "beta" of a stock has become a workhorse of the investment industry, and
you can obtain estimated β's for hundreds of stocks on investment firm Web
sites. Those β's typically are estimated by OLS regression of the actual excess
return on the stock against the actual excess return on a broad market index.

The table below gives estimated β's for several U.S. stocks. Low-risk consumer
products firms like Kellogg have stocks with low β's; riskier technology stocks
have high β's.

Company                                      Estimated β

Kellogg (breakfast cereal)                      0.03
Wal-Mart (discount retailer)                    0.65
Waste Management (waste disposal)               0.70
Sprint Nextel (telecommunications)              0.75
Barnes and Noble (book retailer)                1.03
Microsoft (software)                            1.14
Best Buy (electronic equipment retailer)        2.15
Amazon (online retailer)                        2.65

!;iOn" R - Rr on R - It n r tl11CC, 'he pc;k-rree


return 1~ often 11~.cn lO he the rJ tc tll mll!rest on
' h"' H.:rm l' <; OlWI..IIlmcut d.:bt. <\c,mdinu to thl)
( . l' \.1 .t st 1cj.. wtth 1 p "" I IM~ k..,., n:.k th.111 lhl
m;(rk~.t port (oltll .md tht:refon.: hitS a lower expected
l:\Ct,:::.s return than the; marl..l.'l p11rtfnho ln contr.t'it.

Th~ ~turn vn 1n invc)\m~ul ~ the. c;h;m!!c: 10 th prtce


pJU)> '10\ p.I\UUI ldi vHfl-OU) frnm lhC Jnvl'~ (mt:nl II~ II p~ 1

c;:ntagc o! 11~ tnllt.tl rn<'c l(n c:\.tmrk :1 \tad. ~~u~ht <ll'


J,,nu ~ ltt>r ~11(1, whtch th~:n {llltt.l ,, ., ~Odi\tJ~nd duung
lh~ \'ear :tnll 'uh.l un l>.;ccmhc '1 fur \HJ5. \\UUIIJ h'' ~ ll

r.:turn ol/<

I(SIO.~

'lltl) + 'l!50I $11Kl

-.s"

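That estimation approach can be sketched in a few lines. The returns below are simulated and every number is an illustrative assumption, not a figure from the text; the point is only that the OLS slope of a stock's excess return on the market's excess return recovers the stock's β:

```python
import numpy as np

# Simulated monthly returns; every number here is an illustrative
# assumption, not data from the text.
rng = np.random.default_rng(0)
T = 240
rf = 0.002                                    # assumed constant risk-free rate
r_mkt = rf + rng.normal(0.005, 0.04, size=T)  # market portfolio return
true_beta = 1.4
r_stock = rf + true_beta * (r_mkt - rf) + rng.normal(0.0, 0.03, size=T)

# Estimated beta: OLS slope of the stock's excess return on the market's
x = r_mkt - rf
y = r_stock - rf
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(abs(beta_hat - true_beta) < 0.2)   # the regression recovers beta
```

With real data, the same slope formula applied to a stock's and a broad index's excess returns yields the kind of estimated β reported on investment websites.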
as other economists and statisticians. The OLS formulas are built into virtually all spreadsheet and statistical software packages, making OLS easy to use.

The OLS estimators also have desirable theoretical properties. These are analogous to the desirable properties, studied in Section 3.1, of Ȳ as an estimator of the population mean. Under the assumptions introduced in Section 4.4, the OLS estimator is unbiased and consistent. The OLS estimator is also efficient among a certain class of unbiased estimators; however, this efficiency result holds under some additional special conditions, and further discussion of this result is deferred until Section 5.5.

4.3 Measures of Fit


Having estimated a linear regression, you might wonder how well that regression line describes the data. Does the regressor account for much or for little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

The R² and the standard error of the regression measure how well the OLS regression line fits the data. The R² ranges between 0 and 1 and measures the fraction of the variance of Y_i that is explained by X_i. The standard error of the regression measures how far Y_i typically is from its predicted value.

The R²

The regression R² is the fraction of the sample variance of Y_i explained by (or predicted by) X_i. The definitions of the predicted value and the residual (see Key Concept 4.2) allow us to write the dependent variable Y_i as the sum of the predicted value, Ŷ_i, plus the residual û_i:

Y_i = Ŷ_i + û_i.  (4.13)

In this notation, the R² is the ratio of the sample variance of Ŷ_i to the sample variance of Y_i.

Mathematically, the R² can be written as the ratio of the explained sum of squares to the total sum of squares. The explained sum of squares (ESS) is the sum of squared deviations of the predicted values of Y_i, Ŷ_i, from their average, and the total sum of squares (TSS) is the sum of squared deviations of Y_i from its average:

ESS = Σ_{i=1}^{n} (Ŷ_i − Ȳ)²,  (4.14)

TSS = Σ_{i=1}^{n} (Y_i − Ȳ)².  (4.15)

Equation (4.14) uses the fact that the sample average of the OLS predicted values equals Ȳ (proven in Appendix 4.3).
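The decomposition in Equation (4.13) and the averaging facts behind Equation (4.14) can be verified numerically. The data below are simulated purely for illustration; only the algebraic identities matter:

```python
import numpy as np

# Simulated data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(10.0, 2.0, size=200)
Y = 3.0 + 1.5 * X + rng.normal(0.0, 1.0, size=200)

# OLS intercept and slope (Key Concept 4.2 formulas)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Y_hat = b0 + b1 * X   # predicted values
u_hat = Y - Y_hat     # residuals

print(np.allclose(Y, Y_hat + u_hat))          # the decomposition in (4.13)
print(abs(u_hat.mean()) < 1e-8)               # residuals average to zero
print(abs(Y_hat.mean() - Y.mean()) < 1e-8)    # average predicted value equals Ybar
```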

124  CHAPTER 4  Linear Regression with One Regressor


The R² is the ratio of the explained sum of squares to the total sum of squares:

R² = ESS/TSS.  (4.16)

Alternatively, the R² can be written in terms of the fraction of the variance of Y_i not explained by X_i. The sum of squared residuals, or SSR, is the sum of the squared OLS residuals:

SSR = Σ_{i=1}^{n} û_i².  (4.17)

It is shown in Appendix 4.3 that TSS = ESS + SSR. Thus the R² also can be expressed as 1 minus the ratio of the sum of squared residuals to the total sum of squares:

R² = 1 − SSR/TSS.  (4.18)

Finally, the R² of the regression of Y on the single regressor X is the square of the correlation coefficient between Y and X.

The R² ranges between 0 and 1. If β̂1 = 0, then X_i explains none of the variation of Y_i, and the predicted value of Y_i based on the regression is just the sample average of Y_i. In this case, the explained sum of squares is zero and the sum of squared residuals equals the total sum of squares; thus the R² is zero. In contrast, if X_i explains all of the variation of Y_i, then Y_i = Ŷ_i for all i and every residual is zero (that is, û_i = 0), so that ESS = TSS and R² = 1. In general, the R² does not take on the extreme values of 0 or 1 but falls somewhere in between. An R² near 1 indicates that the regressor is good at predicting Y_i, while an R² near 0 indicates that the regressor is not very good at predicting Y_i.

The Standard Error of the Regression


The standard error of the regression (SER) is an estimator of the standard deviation of the regression error u_i. The units of u_i and Y_i are the same, so the SER is a measure of the spread of the observations around the regression line, measured in the units of the dependent variable. For example, if the units of the dependent variable are dollars, then the SER measures the magnitude of a typical deviation from the regression line, that is, the magnitude of a typical regression error, in dollars.


Because the regression errors u_1, …, u_n are unobserved, the SER is computed using their sample counterparts, the OLS residuals û_1, …, û_n. The formula for the SER is

SER = s_û, where s_û² = [1/(n − 2)] Σ_{i=1}^{n} û_i² = SSR/(n − 2),  (4.19)

where the formula for s_û² uses the fact (proven in Appendix 4.3) that the sample average of the OLS residuals is zero.

The formula for the SER in Equation (4.19) is similar to the formula for the sample standard deviation of Y given in Equation (3.7) in Section 3.2, except that Y_i − Ȳ in Equation (3.7) is replaced by û_i, and the divisor in Equation (3.7) is n − 1, whereas here it is n − 2. The reason for using the divisor n − 2 here (instead of n) is the same as the reason for using the divisor n − 1 in Equation (3.7): It corrects for a slight downward bias introduced because two regression coefficients were estimated. This is called a "degrees of freedom" correction: Because two coefficients were estimated (β0 and β1), two "degrees of freedom" of the data were lost, so the divisor in this formula is n − 2. (The mathematics behind this is discussed in Section 5.6.) When n is large, the difference between dividing by n, by n − 1, or by n − 2 is negligible.
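Equations (4.16) through (4.19) can be computed directly. The data below are simulated stand-ins (loosely patterned on a class-size example), not the book's data:

```python
import numpy as np

# Simulated stand-in data, for illustration only
rng = np.random.default_rng(1)
n = 100
X = rng.normal(20.0, 2.0, size=n)                    # e.g., a class-size variable
Y = 700.0 - 2.0 * X + rng.normal(0.0, 15.0, size=n)  # e.g., test scores

# OLS coefficients (Key Concept 4.2 formulas)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
u_hat = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares, Equation (4.15)
ESS = np.sum((Y_hat - Y.mean()) ** 2)  # explained sum of squares, Equation (4.14)
SSR = np.sum(u_hat ** 2)               # sum of squared residuals, Equation (4.17)

R2 = ESS / TSS                  # Equation (4.16); equals 1 - SSR/TSS
SER = np.sqrt(SSR / (n - 2))    # Equation (4.19), with the n - 2 divisor

print(np.isclose(TSS, ESS + SSR))                    # identity behind Equation (4.18)
print(np.isclose(R2, np.corrcoef(X, Y)[0, 1] ** 2))  # R2 = squared correlation
```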

Application to the Test Score Data


Equation (4.11) reports the regression line, estimated using the California test score data, relating the standardized test score (TestScore) to the student-teacher ratio (STR). The R² of this regression is 0.051, or 5.1%, and the SER is 18.6.

The R² of 0.051 means that the regressor STR explains 5.1% of the variance of the dependent variable TestScore. Figure 4.3 superimposes this regression line on the scatterplot of the TestScore and STR data. As the scatterplot shows, the student-teacher ratio explains some of the variation in test scores, but much variation remains unaccounted for.

The SER of 18.6 means that the standard deviation of the regression residuals is 18.6, where the units are points on the standardized test. Because the standard deviation is a measure of spread, the SER of 18.6 means that there is a large spread of the scatterplot in Figure 4.3 around the regression line, as measured in points on the test. This large spread means that predictions of test scores made using only the student-teacher ratio for that district will often be wrong by a large amount.

What should we make of this low R² and large SER? The fact that the R² of this regression is low (and the SER is large) does not, by itself, imply that this



regression is either "good" or "bad." What the low R² does tell us is that other important factors influence test scores. These factors could include differences in the student body across districts, differences in school quality unrelated to the student-teacher ratio, or luck on the test. The low R² and high SER do not tell us what these factors are, but they do indicate that the student-teacher ratio alone explains only a small part of the variation in test scores in these data.

4.4 The Least Squares Assumptions


This section presents a set of three assumptions on the linear regression model and the sampling scheme under which OLS provides an appropriate estimator of the unknown regression coefficients, β0 and β1. Initially these assumptions might appear abstract. They do, however, have natural interpretations, and understanding these assumptions is essential for understanding when OLS will, and will not, give useful estimates of the regression coefficients.

Assumption #1: The Conditional Distribution of u_i Given X_i Has a Mean of Zero
The first least squares assumption is that the conditional distribution of u_i given X_i has a mean of zero. This assumption is a formal mathematical statement about the "other factors" contained in u_i and asserts that these other factors are unrelated to X_i in the sense that, given a value of X_i, the mean of the distribution of these other factors is zero.

This is illustrated in Figure 4.4. The population regression is the relationship that holds on average between class size and test scores in the population, and the error term u_i represents the other factors that lead test scores at a given district to differ from the prediction based on the population regression line. As shown in Figure 4.4, at a given value of class size, say 20 students per class, sometimes these other factors lead to better performance than predicted (u_i > 0) and sometimes to worse performance (u_i < 0), but on average over the population the prediction is right. In other words, given X_i = 20, the mean of the distribution of u_i is zero. In Figure 4.4, this is shown as the distribution of u_i being centered on the population regression line at X_i = 20 and, more generally, at other values x of X_i. Said differently, the distribution of u_i, conditional on X_i = x, has a mean of zero; stated mathematically, E(u_i | X_i = x) = 0 or, in somewhat simpler notation, E(u_i | X_i) = 0.


FIGURE 4.4  The Conditional Probability Distributions and the Population Regression Line

[Figure: distributions of test scores at class sizes of 15, 20, and 25, plotted against the student-teacher ratio.]

The figure shows the conditional probability distributions of test scores for districts with class sizes of 15, 20, and 25 students. The mean of the conditional distribution of test scores, given the student-teacher ratio, E(Y | X), is the population regression line β0 + β1X. At a given value of X, Y is distributed around the regression line and the error, u = Y − (β0 + β1X), has a conditional mean of zero for all values of X.

As shown in Figure 4.4, the assumption that E(u_i | X_i) = 0 is equivalent to assuming that the population regression line is the conditional mean of Y_i given X_i (a mathematical proof of this is left as Exercise 4.6).

The conditional mean of u in a randomized controlled experiment.  In a randomized controlled experiment, subjects are randomly assigned to the treatment group (X = 1) or to the control group (X = 0). The random assignment typically is done using a computer program that uses no information about the subject, ensuring that X is distributed independently of all personal characteristics of the subject. Random assignment makes X and u independent, which in turn implies that the conditional mean of u given X is zero.

In observational data, X is not randomly assigned in an experiment. Instead, the best that can be hoped for is that X is as if randomly assigned, in the precise sense that E(u_i | X_i) = 0. Whether this assumption holds in a given empirical application with observational data requires careful thought and judgment, and we return to this issue repeatedly.


Correlation and conditional mean.  Recall from Section 2.3 that if the conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated [Equation (2.27)]. Thus the conditional mean assumption E(u_i | X_i) = 0 implies that X_i and u_i are uncorrelated, or corr(X_i, u_i) = 0. Because correlation is a measure of linear association, this implication does not go the other way; even if X_i and u_i are uncorrelated, the conditional mean of u_i given X_i might be nonzero. However, if X_i and u_i are correlated, then it must be the case that E(u_i | X_i) is nonzero. It is therefore often convenient to discuss the conditional mean assumption in terms of possible correlation between X_i and u_i. If X_i and u_i are correlated, then the conditional mean assumption is violated.
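A small simulation makes the asymmetry concrete. The construction below is mine, chosen only to illustrate the point: u = X² − 1 with X standard normal is uncorrelated with X even though E(u | X) is clearly nonzero.

```python
import numpy as np

# Illustrative construction (mine, not the book's): u = X**2 - 1 with X
# standard normal is uncorrelated with X, yet E(u | X) = X**2 - 1 != 0.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=1_000_000)
u = X ** 2 - 1.0

print(abs(np.corrcoef(X, u)[0, 1]) < 0.01)   # essentially uncorrelated
# Conditional on X being far from zero, u is systematically positive:
print(u[np.abs(X) > 2].mean() > 3.0)         # u > 3 whenever |X| > 2
```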
Assumption #2: (X_i, Y_i), i = 1, …, n, Are Independently and Identically Distributed

The second least squares assumption is that (X_i, Y_i), i = 1, …, n, are independently and identically distributed (i.i.d.) across observations. As discussed in Section 2.5 (Key Concept 2.5), this is a statement about how the sample is drawn. If the observations are drawn by simple random sampling from a single large population, then (X_i, Y_i), i = 1, …, n, are i.i.d. For example, let X be the age of a worker and Y be his or her earnings, and imagine drawing a person at random from the population of workers. That randomly drawn person will have a certain age and earnings (that is, X and Y will take on some values). If a sample of n workers is drawn from this population, then (X_i, Y_i), i = 1, …, n, necessarily have the same distribution. If they are drawn at random, they are also distributed independently from one observation to the next; that is, they are i.i.d.

The i.i.d. assumption is a reasonable one for many data collection schemes. For example, survey data from a randomly chosen subset of the population typically can be treated as i.i.d.

Not all sampling schemes produce i.i.d. observations on (X_i, Y_i), however. One example is when the values of X are not drawn from a random sample of the population but rather are set by a researcher as part of an experiment. For example, suppose a horticulturalist wants to study the effects of different organic weeding methods (X) on tomato production (Y) and accordingly grows different plots of tomatoes using different organic weeding techniques. If she picks the techniques (the level of X) to be used on the ith plot and applies the same technique to the ith plot in all repetitions of the experiment, then the value of X_i does not change from one sample to the next. Thus X_i is nonrandom (although the outcome Y_i is random), so the sampling scheme is not i.i.d. The results presented in this chapter


developed for i.i.d. regressors are also true if the regressors are nonrandom. The case of a nonrandom regressor is, however, quite special. For example, modern experimental protocols would have the horticulturalist assign the level of X to the different plots using a computerized random number generator, thereby circumventing any possible bias by the horticulturalist (she might use her favorite weeding method for the tomatoes in the sunniest plot). When this modern experimental protocol is used, the level of X is random and (X_i, Y_i) are i.i.d.

Another example of non-i.i.d. sampling is when observations refer to the same unit of observation over time. For example, we might have data on inventory levels (Y) at a firm and the interest rate at which the firm can borrow (X), where these data are collected over time from a specific firm; for example, they might be recorded four times a year (quarterly) for 30 years. This is an example of time series data, and a key feature of time series data is that observations falling close to each other in time are not independent but rather tend to be correlated with each other; if interest rates are low now, they are likely to be low next quarter. This pattern of correlation violates the "independence" part of the i.i.d. assumption. Time series data introduce a set of complications that are best handled after developing the basic tools of regression analysis.


Assumption #3: Large Outliers Are Unlikely


The third least squares assumption is that large outliers, that is, observations with values of X_i and/or Y_i far outside the usual range of the data, are unlikely. Large outliers can make OLS regression results misleading. This potential sensitivity of OLS to extreme outliers is illustrated in Figure 4.5 using hypothetical data.

In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments: 0 < E(X_i⁴) < ∞ and 0 < E(Y_i⁴) < ∞. Another way to state this assumption is that X and Y have finite kurtosis.

The assumption of finite kurtosis is used in the mathematics that justify the large-sample approximations to the distributions of the OLS test statistics. We encountered this assumption in Chapter 3 when discussing the consistency of the sample variance. Specifically, Equation (3.9) states that the sample variance s²_Y is a consistent estimator of the population variance σ²_Y. If Y_1, …, Y_n are i.i.d. and the fourth moment of Y_i is finite, then the law of large numbers in Key Concept 2.6 applies to the average (1/n) Σ_{i=1}^{n} (Y_i − μ_Y)², a key step in the proof in Appendix 3.3 showing that s²_Y is consistent.

One source of large outliers is data entry errors, such as a typographical error or incorrectly using different units for different observations: Imagine collecting


FIGURE 4.5  The Sensitivity of OLS to Large Outliers

This hypothetical data set has one outlier. The OLS regression line estimated with the outlier shows a strong positive relationship between X and Y, but the OLS regression line estimated without the outlier shows no relationship.

data on the height of students in meters, but inadvertently recording one student's height in centimeters instead. One way to find outliers is to plot your data. If you decide that an outlier is due to a data entry error, then you can either correct the error or, if that is impossible, drop the observation from your data set.

Data entry errors aside, the assumption of finite kurtosis is a plausible one in many applications with economic data. Class size is capped by the physical capacity of a classroom; the best you can do on a standardized test is to get all the questions right and the worst you can do is to get all the questions wrong. Because class size and test scores have a finite range, they necessarily have finite kurtosis. More generally, commonly used distributions such as the normal distribution have finite fourth moments. Still, as a mathematical matter, some distributions have infinite fourth moments, and this assumption rules out those distributions. If this assumption holds, then it is unlikely that statistical inference using OLS will be dominated by a few observations.
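The sensitivity illustrated in Figure 4.5 is easy to reproduce with simulated data (the numbers below are arbitrary choices, not the book's):

```python
import numpy as np

# Hypothetical data illustrating Figure 4.5: one large outlier can
# manufacture a steep OLS slope where the true slope is zero.
def ols_slope(x, y):
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 10.0, size=50)
Y = 5.0 + rng.normal(0.0, 1.0, size=50)   # true slope is zero

slope_clean = ols_slope(X, Y)

# Append one observation far outside the usual range of the data
X_out = np.append(X, 30.0)
Y_out = np.append(Y, 60.0)
slope_outlier = ols_slope(X_out, Y_out)

print(abs(slope_clean) < 0.3)   # near the true slope of zero
print(slope_outlier > 1.0)      # the single outlier dominates the fit
```

Plotting the data, as the text recommends, would reveal the added point immediately.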

Use of the Least Squares Assumptions


squares assumptions for the linear rcgrc~'>i<.>n model a1t: summarued rn KC) Cunc~pt 4.3. Thl! lea. I squares as!)umptions plav twm roles. and'' ~

1l1c thtcc

rcwrn

1u

k<~-.1

them r..:pcatculy throughout

tht~

le'\tbook.


KEY CONCEPT 4.3

THE LEAST SQUARES ASSUMPTIONS

Y_i = β0 + β1X_i + u_i, i = 1, …, n, where

1. The error term u_i has conditional mean zero given X_i: E(u_i | X_i) = 0;

2. (X_i, Y_i), i = 1, …, n, are independent and identically distributed (i.i.d.) draws from their joint distribution; and

3. Large outliers are unlikely: X_i and Y_i have nonzero finite fourth moments.

Their first role is mathematical: If these assumptions hold, then, as is shown in the next section, in large samples the OLS estimators have sampling distributions that are normal. In turn, this large-sample normal distribution lets us develop methods for hypothesis testing and constructing confidence intervals using the OLS estimators.

Their second role is to organize the circumstances that pose difficulties for OLS regression. As we will see, the first least squares assumption is the most important to consider in practice. One reason why the first least squares assumption might not hold in practice is discussed in Chapter 6, and additional reasons are discussed in Section 9.2.

It is also important to consider whether the second assumption holds in an application. Although it plausibly holds in many cross-sectional data sets, the independence assumption is inappropriate for time series data. Therefore, the regression methods developed under assumption 2 require modification for some applications with time series data.

The third assumption serves as a reminder that OLS, just like the sample mean, can be sensitive to large outliers. If your data set contains large outliers, you should examine those outliers carefully to make sure those observations are correctly recorded and belong in the data set.

4.5 Sampling Distribution of the OLS Estimators

Because the OLS estimators β̂0 and β̂1 are computed from a randomly drawn sample, the estimators themselves are random variables with a probability distribution (the sampling distribution) that describes the values they could take over different possible random samples. This section presents these sampling distributions. In small samples, the distributions are complicated, but in large samples they are approximately normal because of the central limit theorem.

The Sampling Distribution of the OLS Estimators

Review of the sampling distribution of Ȳ.  Recall the discussion in Sections 2.5 and 2.6 about the sampling distribution of the sample average, Ȳ, an estimator of the unknown population mean of Y, μ_Y. Because Ȳ is calculated using a randomly drawn sample, Ȳ is a random variable that takes on different values from one sample to the next; the probability of these different values is summarized in its sampling distribution. Although the sampling distribution of Ȳ can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the mean of the sampling distribution is μ_Y; that is, E(Ȳ) = μ_Y, so Ȳ is an unbiased estimator of μ_Y. If n is large, then more can be said about the sampling distribution. In particular, the central limit theorem (Section 2.6) states that this distribution is approximately normal.

The sampling distribution of β̂0 and β̂1.  These ideas carry over to the OLS estimators β̂0 and β̂1 of the unknown intercept β0 and slope β1 of the population regression line. Because the OLS estimators are calculated using a random sample, β̂0 and β̂1 are random variables that take on different values from one sample to the next; the probability of these different values is summarized in their sampling distributions.

Although the sampling distribution of β̂0 and β̂1 can be complicated when the sample size is small, it is possible to make certain statements about it that hold for all n. In particular, the means of the sampling distributions of β̂0 and β̂1 are β0 and β1. In other words, under the least squares assumptions in Key Concept 4.3,

E(β̂0) = β0 and E(β̂1) = β1;  (4.20)

that is, β̂0 and β̂1 are unbiased estimators of β0 and β1. The proof that β̂1 is unbiased is given in Appendix 4.3, and the proof that β̂0 is unbiased is left as Exercise 4.7.

If the sample is sufficiently large, by the central limit theorem the sampling distribution of β̂0 and β̂1 is well approximated by the bivariate normal distribution (Section 2.4). This implies that the marginal distributions of β̂0 and β̂1 are normal in large samples.


KEY CONCEPT 4.4

LARGE-SAMPLE DISTRIBUTIONS OF β̂0 AND β̂1

If the least squares assumptions in Key Concept 4.3 hold, then in large samples β̂0 and β̂1 have a jointly normal sampling distribution. The large-sample normal distribution of β̂1 is N(β1, σ²_β̂1), where the variance of this distribution, σ²_β̂1, is

σ²_β̂1 = (1/n) var[(X_i − μ_X)u_i] / [var(X_i)]².  (4.21)

The large-sample normal distribution of β̂0 is N(β0, σ²_β̂0), where

σ²_β̂0 = (1/n) var(H_i u_i) / [E(H_i²)]², where H_i = 1 − [μ_X / E(X_i²)]X_i.  (4.22)

This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like Ȳ). If you examine the numerator in Equation (4.7) for β̂1, you will see that it, too, is a type of average, not a simple average, like Ȳ, but an average of the product (Y_i − Ȳ)(X_i − X̄). As discussed further in Appendix 4.3, the central limit theorem applies to this average so that, like the simpler average Ȳ, it is normally distributed in large samples.

The normal approximation to the distribution of the OLS estimators in large samples is summarized in Key Concept 4.4. (Appendix 4.3 summarizes the derivation of these formulas.) A relevant question in practice is how large n must be for these approximations to be reliable. In Section 2.6 we suggested that n = 100 is sufficiently large for the sampling distribution of Ȳ to be well approximated by a normal distribution, and sometimes smaller n suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications n > 100, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.

The results in Key Concept 4.4 imply that the OLS estimators are consistent; that is, when the sample size is large, β̂0 and β̂1 will be close to the true population coefficients β0 and β1 with high probability. This is because the variances σ²_β̂0 and σ²_β̂1 of the estimators decrease to zero as n increases (n appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means, β0 and β1, when n is large.
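These properties (unbiasedness, and a variance that shrinks as n grows) can be seen in a short Monte Carlo sketch; all parameter values below are illustrative assumptions:

```python
import numpy as np

# Monte Carlo sketch; every parameter value is an illustrative assumption.
def slope_estimates(n, reps, rng, beta0=2.0, beta1=1.5):
    """OLS slope estimates across `reps` independent samples of size n."""
    est = np.empty(reps)
    for r in range(reps):
        X = rng.normal(5.0, 1.0, size=n)
        u = rng.normal(0.0, 1.0, size=n)   # satisfies E(u | X) = 0 by construction
        Y = beta0 + beta1 * X + u
        est[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    return est

rng = np.random.default_rng(42)
small = slope_estimates(n=25, reps=2000, rng=rng)
large = slope_estimates(n=400, reps=2000, rng=rng)

print(abs(small.mean() - 1.5) < 0.02)   # centered on the true slope (unbiased)
print(large.std() < small.std() / 3)    # spread shrinks as n grows (consistency)
```

A histogram of either set of estimates would also look approximately normal, as the central limit theorem argument predicts.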


FIGURE 4.6  The Variance of β̂1 and the Variance of X

The colored dots represent a set of X_i's with a small variance. The black dots represent a set of X_i's with a large variance. The regression line can be estimated more accurately with the black dots than with the colored dots.

Another implication of the distributions in Key Concept 4.4 is that, in general, the larger the variance of X_i, the smaller the variance σ²_β̂1 of β̂1. Mathematically, this arises because the variance of β̂1 in Equation (4.21) is inversely proportional to the square of the variance of X_i: the larger is var(X_i), the larger is the denominator in Equation (4.21) so the smaller is σ²_β̂1. To get a better sense of why this is so, look at Figure 4.6, which presents a scatterplot of 150 artificial data points on X and Y. The data points indicated by the colored dots are the 75 observations closest to X̄. Suppose you were asked to draw a line as accurately as possible through either the colored or the black dots; which would you choose? It would be easier to draw a precise line through the black dots, which have a larger variance than the colored dots. Similarly, the larger the variance of X, the more precise is β̂1.

The normal approximation to the sampling distribution of β̂0 and β̂1 is a powerful tool. With this approximation in hand, we are able to develop methods for making inferences about the true population values of the regression coefficients using only a sample of data.
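The effect of var(X) on the precision of β̂1 can also be simulated; the standard deviations below are arbitrary illustrative choices:

```python
import numpy as np

# Simulation of the Figure 4.6 point; numbers are illustrative assumptions.
def slope_sd(x_sd, reps=2000, n=100, seed=7):
    """Std. dev. of the OLS slope estimator when sd(X) = x_sd."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        X = rng.normal(100.0, x_sd, size=n)
        Y = 1.0 + 0.5 * X + rng.normal(0.0, 2.0, size=n)
        est[r] = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    return est.std()

sd_small_varX = slope_sd(x_sd=0.5)   # like the colored dots: X tightly bunched
sd_large_varX = slope_sd(x_sd=2.0)   # like the black dots: X spread out

print(sd_large_varX < sd_small_varX / 2)   # larger var(X), more precise slope
```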


4.6 Conclusion


This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of n observations on a dependent variable, Y, and a single regressor, X. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Moreover, if n is large, then the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions.

The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor X. This assumption implies that the OLS estimator is unbiased.

The second assumption is that (X_i, Y_i) are i.i.d., as is the case if the data are collected by simple random sampling. This assumption yields the formula, presented in Key Concept 4.4, for the variance of the sampling distribution of the OLS estimator.

The third assumption is that large outliers are unlikely. Stated more formally, X and Y have finite fourth moments (finite kurtosis). The reason for this assumption is that OLS can be unreliable if there are large outliers.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of β̂1 to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.

Summary
1. The population regression line, β0 + β1X, is the mean of Y as a function of the value of X. The slope, β1, is the expected change in Y associated with a 1-unit change in X. The intercept, β0, determines the level (or height) of the regression line. Key Concept 4.1 summarizes the terminology of the population linear regression model.

2. The population regression line can be estimated using sample observations (Y_i, X_i), i = 1, …, n, by ordinary least squares (OLS). The OLS estimators of the regression intercept and slope are denoted by β̂0 and β̂1.

3. The R² and standard error of the regression (SER) are measures of how close the values of Y_i are to the estimated regression line. The R² is between 0 and 1, with a larger value indicating that the Y_i's are closer to the line. The standard error of the regression is an estimator of the standard deviation of the regression error.

4. There are three key assumptions for the linear regression model: (1) The regression errors, u_i, have a mean of zero conditional on the regressors X_i; (2) the sample observations are i.i.d. random draws from the population; and (3) large outliers are unlikely. If these assumptions hold, the OLS estimators β̂0 and β̂1 are (1) unbiased, (2) consistent, and (3) normally distributed when the sample is large.

Key Terms
linear regression model with a single regressor (114)
dependent variable (114)
independent variable (114)
regressor (114)
population regression line (114)
population regression function (114)
population intercept and slope (114)
population coefficients (114)
parameters (114)
error term (114)
ordinary least squares (OLS) estimator (119)
OLS regression line (119)
predicted value (119)
residual (119)
regression R² (123)
explained sum of squares (ESS) (123)
total sum of squares (TSS) (123)
sum of squared residuals (SSR) (124)
standard error of the regression (SER) (124)
least squares assumptions (126)
Review the Concepts

4.1 Explain the difference between β̂1 and β1; between the residual ûi and the regression error ui; and between the OLS predicted value Ŷi and E(Yi|Xi).

4.2 For each least squares assumption, provide an example in which the assumption is valid, and then provide an example in which the assumption fails.

4.3 Sketch a hypothetical scatterplot of data for an estimated regression with R² = 0.9. Sketch a hypothetical scatterplot of data for a regression with R² = 0.5.

Exercises
4.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

TestScore = 520.4 − 5.82 × CS, R² = 0.08, SER = 11.5.

a. A classroom has 22 students. What is the regression's prediction for that classroom's average test score?

b. Last year a classroom had 19 students, and this year it has 23 students. What is the regression's prediction for the change in the classroom average test score?

c. The sample average class size across the 100 classrooms is 21.4. What is the sample average of the test scores across the 100 classrooms? (Hint: Review the formulas for the OLS estimators.)

d. What is the sample standard deviation of test scores across the 100 classrooms? (Hint: Review the formulas for the R² and SER.)
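A worked check of the kind of arithmetic parts (a) through (c) call for: the sketch below plugs the reported estimates into the fitted line. (This is an illustration of the mechanics, not a substitute for working the exercise.)

```python
# Reported OLS estimates from the regression of TestScore on class size (CS).
b0, b1 = 520.4, -5.82

# (a) Predicted average test score for a class of 22 students.
pred_22 = b0 + b1 * 22

# (b) Predicted change when class size rises from 19 to 23:
# only the slope matters for a change in X.
change = b1 * (23 - 19)

# (c) OLS forces the fitted line through (Xbar, Ybar), so
# Ybar = b0 + b1 * Xbar with Xbar = 21.4.
y_bar = b0 + b1 * 21.4

print(round(pred_22, 2), round(change, 2), round(y_bar, 3))
```

The same three moves (predict a level, predict a change, recover the sample mean) recur throughout the exercises in this chapter.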

4.2 Suppose that a random sample of 200 twenty-year-old men is selected from a population and that these men's height and weight are recorded. A regression of weight on height yields

Weight = −99.41 + 3.94 × Height, R² = 0.81, SER = 10.2,

where Weight is measured in pounds and Height is measured in inches.

a. What is the regression's weight prediction for someone who is 70 inches tall? 65 inches tall? 74 inches tall?

b. A man has a late growth spurt and grows 1.5 inches over the course of a year. What is the regression's prediction for the increase in this man's weight?

c. Suppose that instead of measuring weight and height in pounds and inches, these variables are measured in centimeters and kilograms. What are the regression estimates from this new centimeter-kilogram regression? (Give all results: estimated coefficients, R², and SER.)

4.3 A regression of average weekly earnings (AWE, measured in dollars) on age (measured in years) using a random sample of college-educated full-time workers aged 25-65 yields the following:

AWE = 696.7 + 9.6 × Age, R² = 0.023, SER = 624.1.



a. Explain what the coefficient values 696.7 and 9.6 mean.

b. The standard error of the regression (SER) is 624.1. What are the units of measurement for the SER (dollars? years? or is SER unit-free)?

c. The regression R² is 0.023. What are the units of measurement for the R² (dollars? years? or is R² unit-free)?

d. What is the regression's predicted earnings for a 25-year-old worker? A 45-year-old worker?

e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?

f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)

g. The average age in this sample is 41.6 years. What is the average value of AWE in the sample? (Hint: Review Key Concept 4.2.)
4.4 Read the box "The 'Beta' of a Stock" in Section 4.2.

a. Suppose that the value of β is greater than 1 for a particular stock. Show that the variance of (R − Rf) for this stock is greater than the variance of (Rm − Rf).

b. Suppose that the value of β is less than 1 for a particular stock. Is it possible that the variance of (R − Rf) for this stock is greater than the variance of (Rm − Rf)? (Hint: Don't forget the regression error.)

c. In a given year, the rate of return on 3-month Treasury bills is 3.5% and the rate of return on a large diversified portfolio of stocks (the S&P 500) is 7.3%. For each company listed in the table at the end of the box, use the estimated value of β to estimate the stock's expected rate of return.
4.5 A professor decides to run an experiment to measure the effect of time pressure on final exam scores. He gives each of the 400 students in his course the same final exam, but some students have 90 minutes to complete the exam while others have 120 minutes. Each student is randomly assigned one of the examination times based on the flip of a coin. Let Yi denote the number of points scored on the exam by the ith student (0 ≤ Yi ≤ 100), let Xi denote the amount of time that the student has to complete the exam (Xi = 90 or 120), and consider the regression model Yi = β0 + β1Xi + ui.

a. Explain what the term ui represents. Why will different students have different values of ui?

b. Explain why E(ui|Xi) = 0 for this regression model.

c. Are the other assumptions in Key Concept 4.3 satisfied? Explain.

d. The estimated regression is Ŷi = 49 + 0.24 × Xi.

i. Compute the estimated regression's prediction for the average score of students given 90 minutes to complete the exam; 120 minutes; and 150 minutes.

ii. Compute the estimated gain in score for a student who is given an additional 10 minutes on the exam.
4.6 Show that the first least squares assumption, E(ui|Xi) = 0, implies that E(Yi|Xi) = β0 + β1Xi.
4.7 Show that β̂0 is an unbiased estimator of β0. (Hint: Use the fact that β̂1 is unbiased, which is shown in Appendix 4.3.)
4.8 Suppose that all of the regression assumptions in Key Concept 4.3 are satisfied except that the first assumption is replaced with E(ui|Xi) = 2. Which parts of Key Concept 4.4 continue to hold? Which change? Why? (Is β̂1 normally distributed in large samples with mean and variance given in Key Concept 4.4? What about β̂0?)

4.9

u. A linear regression yidds ~~ - 0. ShO\\ that R2 0.


b. A linear regression yield!. R2

=0. Does this 1mpl~ th.11 p1 -

0?

4.10 Suppose that Yi = β0 + β1Xi + ui, where (Xi, ui) are i.i.d., and Xi is a Bernoulli random variable with Pr(X = 1) = 0.20. When X = 1, ui is N(0, 4); when X = 0, ui is N(0, 1).

a. Show that the regression assumptions in Key Concept 4.3 are satisfied.

b. Derive an expression for the large-sample variance of β̂1. [Hint: Evaluate the terms in Equation (4.21).]


4.11 Consider the regression model Yi = β0 + β1Xi + ui.

a. Suppose you know that β0 = 0. Derive a formula for the least squares estimator of β1.

b. Suppose you know that β0 = 4. Derive a formula for the least squares estimator of β1.


4.12 a. Show that the regression R² in the regression of Y on X is the squared value of the sample correlation between X and Y. That is, show that R² = r²XY.

b. Show that the R² from the regression of Y on X is the same as the R² from the regression of X on Y.
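Both claims in this exercise are easy to confirm numerically before proving them. The sketch below (synthetic data; the helper names are illustrative) checks that R² equals the squared sample correlation and is identical whichever variable is on the left-hand side:

```python
def r_squared(x, y):
    """R^2 = ESS/TSS from the OLS regression of y on x."""
    n = len(x); xb = sum(x) / n; yb = sum(y) / n
    b1 = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
          / sum((a - xb) ** 2 for a in x))
    b0 = yb - b1 * xb
    ess = sum((b0 + b1 * a - yb) ** 2 for a in x)   # explained sum of squares
    tss = sum((b - yb) ** 2 for b in y)             # total sum of squares
    return ess / tss

def corr(x, y):
    """Sample correlation coefficient r_XY."""
    n = len(x); xb = sum(x) / n; yb = sum(y) / n
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    sxx = sum((a - xb) ** 2 for a in x)
    syy = sum((b - yb) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.0, 1.0, 5.0, 4.0, 8.0]
print(abs(r_squared(x, y) - corr(x, y) ** 2) < 1e-12)    # part (a)
print(abs(r_squared(x, y) - r_squared(y, x)) < 1e-12)    # part (b)
```

The algebra behind the check: ESS = β̂1²·Σ(Xi − X̄)², so R² collapses to s²XY/(s²X·s²Y), which is symmetric in X and Y.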

Empirical Exercises

E4.1 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file CPS04 that contains an extended version of the data set used in Table 3.1 for 2004. It contains data for full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS04_Description, also available on the Web site. (These are the same data as in CPS92_04 but are limited to the year 2004.) In this exercise you will investigate the relationship between a worker's age and earnings. (Generally, older workers have more job experience, leading to higher productivity and earnings.)

a. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How much do earnings increase as workers age by one year?

b. Bob is a 26-year-old worker. Predict Bob's earnings using the estimated regression. Alexis is a 30-year-old worker. Predict Alexis's earnings using the estimated regression.

c. Does age account for a large fraction of the variance in earnings across individuals? Explain.
E4.2 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file TeachingRatings that contains data on course evaluations, course characteristics, and professor characteristics for 463 courses at the University of Texas at Austin.¹ A detailed description is given in TeachingRatings_Description, also available on the Web site. One of the characteristics is an index of the professor's "beauty" as rated by a panel of six judges. In this exercise you will investigate how course evaluations are related to the professor's beauty.

¹These data were provided by Professor Daniel Hamermesh of the University of Texas at Austin and were used in his paper with Amy Parker, "Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity," Economics of Education Review, August 2005, 24(4), pp. 369-376.


a. Construct a scatterplot of average course evaluations (Course_Eval) on the professor's beauty (Beauty). Does there appear to be a relationship between the variables?

b. Run a regression of average course evaluations (Course_Eval) on the professor's beauty (Beauty). What is the estimated intercept? What is the estimated slope? Explain why the estimated intercept is equal to the sample mean of Course_Eval. (Hint: What is the sample mean of Beauty?)

c. Professor Watson has an average value of Beauty, while Professor Stock's value of Beauty is one standard deviation above the average. Predict Professor Stock's and Professor Watson's course evaluations.

d. Comment on the size of the regression's slope. Is the estimated effect of Beauty on Course_Eval large or small? Explain what you mean by "large" and "small."

e. Does Beauty explain a large fraction of the variance in evaluations across courses? Explain.
E4.3 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file CollegeDistance that contains data from a random sample of high school seniors interviewed in 1980 and re-interviewed in 1986. In this exercise you will use these data to investigate the relationship between the number of completed years of education for young adults and the distance from each student's high school to the nearest four-year college. (Proximity to college lowers the cost of education, so that students who live closer to a four-year college should, on average, complete more years of higher education.) A detailed description is given in CollegeDistance_Description, also available on the Web site.²

a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist), where Dist is measured in tens of miles. (For example, Dist = 2 means that the distance is 20 miles.) What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How does the average value of years of completed schooling change when colleges are built close to where students go to high school?

²These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper "Democratization or Diversion? The Effect of Community Colleges on Educational Attainment," Journal of Business and Economic Statistics, April 1995, 12(2), pp. 217-224.


b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?

c. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.

d. What is the value of the standard error of the regression? What are the units for the standard error (meters, grams, years, dollars, cents, or something else)?
E4.4 On the text Web site (www.aw-bc.com/stock_watson), you will find a data file Growth that contains data on average growth rates over 1960-1995 for 65 countries, along with variables that are potentially related to growth. A detailed description is given in Growth_Description, also available on the Web site. In this exercise you will investigate the relationship between growth and trade.³

a. Construct a scatterplot of average annual growth rate (Growth) on the average trade share (TradeShare). Does there appear to be a relationship between the variables?

b. One country, Malta, has a trade share much larger than the other countries. Find Malta on the scatterplot. Does Malta look like an outlier?

c. Using all observations, run a regression of Growth on TradeShare. What is the estimated slope? What is the estimated intercept? Use the regression to predict the growth rate for a country with a trade share of 0.5 and with a trade share equal to 1.0.

d. Estimate the same regression excluding the data from Malta. Answer the same questions in (c).

e. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the analysis?

³These data were provided by Professor Ross Levine of Brown University and were used in his paper with Thorsten Beck and Norman Loayza, "Finance and the Sources of Growth," Journal of Financial Economics, 2000, 58, pp. 261-300.


APPENDIX 4.1  The California Test Score Data Set

The California Standardized Testing and Reporting data set contains data on test performance, school characteristics, and student demographic backgrounds. The data used here are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test scores are the average of the reading and math scores on the Stanford 9 Achievement Test, a standardized test administered to fifth-grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as "full-time equivalents"), number of computers per classroom, and expenditures per student. The student-teacher ratio used here is the number of students in the district divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. The demographic variables include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

APPENDIX 4.2  Derivation of the OLS Estimators

This appendix uses calculus to derive the formulas for the OLS estimators given in Key Concept 4.2. To minimize the sum of squared prediction mistakes Σ(Yi − b0 − b1Xi)² [Equation (4.6)], where all sums run over i = 1, …, n, first take the partial derivatives with respect to b0 and b1:

(∂/∂b0) Σ(Yi − b0 − b1Xi)² = −2 Σ(Yi − b0 − b1Xi) and  (4.23)

(∂/∂b1) Σ(Yi − b0 − b1Xi)² = −2 Σ(Yi − b0 − b1Xi)Xi.  (4.24)

The OLS estimators, β̂0 and β̂1, are the values of b0 and b1 that minimize Σ(Yi − b0 − b1Xi)² or, equivalently, the values of b0 and b1 for which the derivatives in Equations (4.23) and (4.24) equal zero. Accordingly, setting these derivatives equal to zero, collecting terms, and dividing by n shows that the OLS estimators, β̂0 and β̂1, must satisfy the two equations

Ȳ − β̂0 − β̂1X̄ = 0 and  (4.25)

(1/n) Σ XiYi − β̂0X̄ − β̂1 (1/n) Σ Xi² = 0.  (4.26)

Solving this pair of equations for β̂0 and β̂1 yields

β̂1 = [(1/n) Σ XiYi − X̄Ȳ] / [(1/n) Σ Xi² − (X̄)²] = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²  (4.27)

β̂0 = Ȳ − β̂1X̄.  (4.28)

Equations (4.27) and (4.28) are the formulas for β̂0 and β̂1 given in Key Concept 4.2; the formula β̂1 = sXY / s²X is obtained by dividing the numerator and denominator in Equation (4.27) by n − 1.
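The closed-form solutions (4.27) and (4.28) can be sanity-checked numerically: evaluated at (β̂0, β̂1), the derivatives (4.23) and (4.24) should both be zero. A minimal sketch on synthetic data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x); xb = sum(x) / n; yb = sum(y) / n
# Equation (4.27): slope as ratio of demeaned cross-products to squared deviations.
b1 = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
      / sum((a - xb) ** 2 for a in x))
# Equation (4.28): intercept.
b0 = yb - b1 * xb

# Partial derivatives (4.23) and (4.24), evaluated at (b0, b1):
d_b0 = -2 * sum(b - b0 - b1 * a for a, b in zip(x, y))
d_b1 = -2 * sum((b - b0 - b1 * a) * a for a, b in zip(x, y))
print(abs(d_b0) < 1e-9, abs(d_b1) < 1e-9)  # both first-order conditions hold
```

Both derivatives vanish up to floating-point rounding, confirming that the formulas deliver the minimizer of the sum of squared mistakes.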

APPENDIX 4.3  Sampling Distribution of the OLS Estimator

In this appendix, we show that the OLS estimator β̂1 is unbiased and, in large samples, has the normal sampling distribution given in Key Concept 4.4.

Representation of β̂1 in Terms of the Regressors and Errors

We start by providing an expression for β̂1 in terms of the regressors and errors. Because Yi = β0 + β1Xi + ui, Yi − Ȳ = β1(Xi − X̄) + (ui − ū), so the numerator of the formula for β̂1 in Equation (4.27) is (all sums run over i = 1, …, n)

Σ(Xi − X̄)(Yi − Ȳ) = Σ(Xi − X̄)[β1(Xi − X̄) + (ui − ū)] = β1 Σ(Xi − X̄)² + Σ(Xi − X̄)(ui − ū).  (4.29)

Now Σ(Xi − X̄)(ui − ū) = Σ(Xi − X̄)ui − [Σ(Xi − X̄)]ū = Σ(Xi − X̄)ui, where the final equality follows from the definition of X̄, which implies that [Σ(Xi − X̄)]ū = (Σ Xi − nX̄)ū = 0. Substituting Σ(Xi − X̄)(ui − ū) = Σ(Xi − X̄)ui into the final expression in Equation (4.29) yields Σ(Xi − X̄)(Yi − Ȳ) = β1 Σ(Xi − X̄)² + Σ(Xi − X̄)ui. Substituting this expression in turn into the formula for β̂1 in Equation (4.27) yields

β̂1 = β1 + [(1/n) Σ(Xi − X̄)ui] / [(1/n) Σ(Xi − X̄)²].  (4.30)

Proof That β̂1 Is Unbiased

The expectation of β̂1 is obtained by taking the expectation of both sides of Equation (4.30). Thus,

E(β̂1) = β1 + E{ [(1/n) Σ(Xi − X̄) E(ui | X1, …, Xn)] / [(1/n) Σ(Xi − X̄)²] } = β1,  (4.31)

where the second equality in Equation (4.31) follows by using the law of iterated expectations (Section 2.3). By the second least squares assumption, ui is distributed independently of X for all observations other than i, so E(ui | X1, …, Xn) = E(ui | Xi). By the first least squares assumption, however, E(ui | Xi) = 0. It follows that the conditional expectation in large brackets in the second line of Equation (4.31) is zero, so that E(β̂1 − β1 | X1, …, Xn) = 0. Equivalently, E(β̂1 | X1, …, Xn) = β1; that is, β̂1 is conditionally unbiased given X1, …, Xn. By the law of iterated expectations, E(β̂1 − β1) = E[E(β̂1 − β1 | X1, …, Xn)] = 0, so that E(β̂1) = β1; that is, β̂1 is unbiased.

Large-Sample Normal Distribution of the OLS Estimator

The large-sample normal approximation to the limiting distribution of β̂1 (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.30).

First consider the numerator of this term. Because X̄ is consistent, if the sample size is large, X̄ is nearly equal to μX. Thus, to a close approximation, the term in the numerator of Equation (4.30) is the sample average v̄, where vi = (Xi − μX)ui. By the first least squares assumption, vi has a mean of zero. By the second least squares assumption, vi is i.i.d. The variance of vi is σ²v = var[(Xi − μX)ui], which, by the third least squares assumption, is nonzero and finite. Therefore, v̄ satisfies all the requirements of the central limit theorem (Key Concept 2.7). Thus v̄/σv̄ is, in large samples, distributed N(0, 1), where σ²v̄ = σ²v/n; that is, the distribution of v̄ is well approximated by the N(0, σ²v/n) distribution.

Next consider the expression in the denominator in Equation (4.30); this is the sample variance of X (except dividing by n rather than n − 1, which is inconsequential if n is large). As discussed in Section 3.2 [Equation (3.8)], the sample variance is a consistent estimator of the population variance, so in large samples it is arbitrarily close to the population variance of X.

Combining these two results, we have that, in large samples, β̂1 − β1 ≅ v̄/var(Xi), so that the sampling distribution of β̂1 is, in large samples, N(β1, σ²β̂1), where σ²β̂1 = var(v̄)/[var(Xi)]² = var[(Xi − μX)ui] / {n[var(Xi)]²}, which is the expression in Equation (4.21).
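Both appendix results, unbiasedness and the variance formula in Equation (4.21), can be illustrated with a small Monte Carlo simulation: draw many samples, estimate β̂1 in each, and compare the empirical mean and variance with β1 and the formula. The sketch below assumes independent, homoskedastic standard normal X and u purely for simplicity, so that var[(Xi − μX)ui] = var(Xi)var(ui) = 1 and Equation (4.21) reduces to 1/n:

```python
import random

random.seed(0)
beta0, beta1, n, reps = 1.0, 2.0, 100, 2000
draws = []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]   # mu_X = 0, var(X) = 1
    u = [random.gauss(0.0, 1.0) for _ in range(n)]   # E(u|X) = 0, var(u) = 1
    y = [beta0 + beta1 * a + e for a, e in zip(x, u)]
    xb = sum(x) / n; yb = sum(y) / n
    b1 = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
          / sum((a - xb) ** 2 for a in x))
    draws.append(b1)

mean_b1 = sum(draws) / reps
var_b1 = sum((d - mean_b1) ** 2 for d in draws) / reps
# mean_b1 should be close to beta1 = 2.0 (unbiasedness) and
# var_b1 close to 1/n = 0.01 (Equation (4.21) under these assumptions).
print(mean_b1, var_b1)
```

A histogram of `draws` would also look approximately normal, in line with the large-sample result.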

Some Additional Algebraic Facts About OLS

The OLS residuals and predicted values satisfy (all sums run over i = 1, …, n)

(1/n) Σ ûi = 0,  (4.32)

(1/n) Σ Ŷi = Ȳ,  (4.33)

Σ ûiXi = 0 and sûX = 0, and  (4.34)

TSS = SSR + ESS.  (4.35)

Equations (4.32) through (4.35) say that the sample average of the OLS residuals is zero; the sample average of the OLS predicted values equals Ȳ; the sample covariance sûX between the OLS residuals and the regressors is zero; and the total sum of squares is the sum of the sum of squared residuals and the explained sum of squares. (The ESS, TSS, and SSR are defined in Equations (4.14), (4.15), and (4.17).)


To verify Equation (4.32), note that the definition of β̂0 lets us write the OLS residuals as ûi = Yi − β̂0 − β̂1Xi = (Yi − Ȳ) − β̂1(Xi − X̄); thus

Σ ûi = Σ(Yi − Ȳ) − β̂1 Σ(Xi − X̄).

But the definitions of Ȳ and X̄ imply that Σ(Yi − Ȳ) = 0 and Σ(Xi − X̄) = 0, so Σ ûi = 0.

To verify Equation (4.33), note that Yi = Ŷi + ûi, so Σ Yi = Σ Ŷi + Σ ûi = Σ Ŷi, where the second equality is a consequence of Equation (4.32).

To verify Equation (4.34), note that Σ ûi = 0 implies Σ ûiXi = Σ ûi(Xi − X̄), so

Σ ûiXi = Σ[(Yi − Ȳ) − β̂1(Xi − X̄)](Xi − X̄) = Σ(Yi − Ȳ)(Xi − X̄) − β̂1 Σ(Xi − X̄)² = 0,  (4.36)

where the final equality in Equation (4.36) is obtained using the formula for β̂1 in Equation (4.27). This result, combined with the preceding results, implies that sûX = 0.

Equation (4.35) follows from the previous results and some algebra:

TSS = Σ(Yi − Ȳ)² = Σ(Yi − Ŷi + Ŷi − Ȳ)²
    = Σ(Yi − Ŷi)² + Σ(Ŷi − Ȳ)² + 2 Σ(Yi − Ŷi)(Ŷi − Ȳ)
    = SSR + ESS + 2 Σ ûiŶi = SSR + ESS,  (4.37)

where the final equality follows from Σ ûiŶi = Σ ûi(β̂0 + β̂1Xi) = β̂0 Σ ûi + β̂1 Σ ûiXi = 0 by the previous results. (All sums run over i = 1, …, n.)
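Equations (4.32) through (4.35) hold exactly for any data set, so they are easy to verify numerically. A minimal sketch on synthetic data:

```python
x = [1.0, 3.0, 4.0, 6.0, 8.0]
y = [2.0, 3.5, 6.0, 5.5, 9.0]

n = len(x); xb = sum(x) / n; yb = sum(y) / n
b1 = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
      / sum((a - xb) ** 2 for a in x))
b0 = yb - b1 * xb
yhat = [b0 + b1 * a for a in x]          # predicted values
uhat = [b - f for b, f in zip(y, yhat)]  # residuals

tss = sum((b - yb) ** 2 for b in y)
ess = sum((f - yb) ** 2 for f in yhat)
ssr = sum(e ** 2 for e in uhat)

print(abs(sum(uhat)) < 1e-9)                              # (4.32)
print(abs(sum(yhat) / n - yb) < 1e-9)                     # (4.33)
print(abs(sum(e * a for e, a in zip(uhat, x))) < 1e-9)    # (4.34)
print(abs(tss - (ssr + ess)) < 1e-9)                      # (4.35)
```

All four checks print True up to floating-point rounding, mirroring the algebraic derivations above.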

CHAPTER 5

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

This chapter continues the treatment of linear regression with a single regressor. Chapter 4 explained how the OLS estimator β̂1 of the slope coefficient β1 differs from one sample to the next; that is, how β̂1 has a sampling distribution. In this chapter, we show how knowledge of this sampling distribution can be used to make statements about β1 that accurately summarize the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of β̂1. Section 5.1 provides an expression for this standard error (and for the standard error of the OLS estimator of the intercept), then shows how to use β̂1 and its standard error to test hypotheses. Section 5.2 explains how to construct confidence intervals for β1. Section 5.3 takes up the special case of a binary regressor.

Sections 5.1-5.3 assume that the three least squares assumptions of Chapter 4 hold. If, in addition, some stronger conditions hold, then some stronger results can be derived regarding the distribution of the OLS estimator. One of these stronger conditions is that the errors are homoskedastic, a concept introduced in Section 5.4. Section 5.5 presents the Gauss-Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators. Section 5.6 discusses the distribution of the OLS estimator when the population distribution of the errors is normal.

5.1  Testing Hypotheses About One of the Regression Coefficients
Your client, the superintendent, calls you with a problem. She has an angry taxpayer in her office who asserts that cutting class size will not help boost test scores, so that reducing class size further is a waste of money. Class size, the taxpayer claims, has no effect on test scores.

The taxpayer's claim can be rephrased in the language of regression analysis. Because the effect on test scores of a unit change in class size is βClassSize, the taxpayer is asserting that the population regression line is flat; that is, the slope βClassSize of the population regression line is zero. Is there, the superintendent asks, evidence in your sample of 420 observations on California school districts that this slope is nonzero? Can you reject the taxpayer's hypothesis that βClassSize = 0, or should you accept it, at least tentatively pending further new evidence?

This section discusses tests of hypotheses about the slope β1 or intercept β0 of the population regression line. We start by discussing two-sided tests of the slope β1 in detail, then turn to one-sided tests and to tests of hypotheses regarding the intercept β0.

Two-Sided Hypotheses Concerning β1

The general approach to testing hypotheses about these coefficients is the same as to testing hypotheses about the population mean, so we begin with a brief review.

Testing hypotheses about the population mean.  Recall from Section 3.2 that the null hypothesis that the mean of Y is a specific value μY,0 can be written as H0: E(Y) = μY,0, and the two-sided alternative is H1: E(Y) ≠ μY,0.

The test of the null hypothesis H0 against the two-sided alternative proceeds as in the three steps summarized in Key Concept 3.6. The first is to compute the standard error of Ȳ, SE(Ȳ), which is an estimator of the standard deviation of the sampling distribution of Ȳ. The second step is to compute the t-statistic, which has the general form given in Key Concept 5.1; applied here, the t-statistic is t = (Ȳ − μY,0)/SE(Ȳ).

The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed; equivalently, the p-value is the probability of obtaining a statistic, by random sampling variation, at least as different from the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct

KEY CONCEPT 5.1

GENERAL FORM OF THE t-STATISTIC

In general, the t-statistic has the form

t = (estimator − hypothesized value) / (standard error of the estimator).  (5.1)

(Key Concept 3.5). Because the t-statistic has a standard normal distribution in large samples under the null hypothesis, the p-value for a two-sided hypothesis test is 2Φ(−|tact|), where tact is the value of the t-statistic actually computed and Φ is the cumulative standard normal distribution tabulated in Appendix Table 1. Alternatively, the third step can be replaced by simply comparing the t-statistic to the critical value appropriate for the test with the desired significance level. For example, a two-sided test with a 5% significance level would reject the null hypothesis if |tact| > 1.96. In this case, the population mean is said to be statistically significantly different than the hypothesized value at the 5% significance level.

Testing hypotheses about the slope β1.  At a theoretical level, the critical feature justifying the foregoing testing procedure for the population mean is that, in large samples, the sampling distribution of Ȳ is approximately normal. Because β̂1 also has a normal sampling distribution in large samples, hypotheses about the true value of the slope β1 can be tested using the same general approach.

The null and alternative hypotheses need to be stated precisely before they can be tested. The angry taxpayer's hypothesis is that βClassSize = 0. More generally, under the null hypothesis the true population slope β1 takes on some specific value, β1,0. Under the two-sided alternative, β1 does not equal β1,0. That is, the null hypothesis and the two-sided alternative hypothesis are

H0: β1 = β1,0  vs.  H1: β1 ≠ β1,0  (two-sided alternative).  (5.2)

To test the null hypothesis H0, we follow the same three steps as for the population mean.

The first step is to compute the standard error of β̂1, SE(β̂1). The standard error of β̂1 is an estimator of σβ̂1, the standard deviation of the sampling distribution of β̂1. Specifically,

SE(β̂1) = √σ̂²β̂1,  (5.3)


where

σ̂²β̂1 = (1/n) × [ (1/(n − 2)) Σ(Xi − X̄)² ûi² ] / [ (1/n) Σ(Xi − X̄)² ]²,  (5.4)

where the sums run over i = 1, …, n.

The estimator of the variance in Equation (5.4) is discussed in Appendix 5.1. Although the formula for σ̂²β̂1 is complicated, in applications the standard error is computed by regression software so that it is easy to use in practice.
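Although Equation (5.4) is tedious by hand, it is only a few lines in code. The sketch below computes the variance estimator exactly as written, on synthetic data (in practice, as the text notes, regression software does this automatically):

```python
x = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0, 12.0, 14.0]
y = [3.1, 5.2, 5.9, 8.4, 9.7, 12.3, 12.8, 15.5]

n = len(x); xb = sum(x) / n; yb = sum(y) / n
b1 = (sum((a - xb) * (b - yb) for a, b in zip(x, y))
      / sum((a - xb) ** 2 for a in x))
b0 = yb - b1 * xb
uhat = [b - b0 - b1 * a for a, b in zip(x, y)]   # OLS residuals

# Equation (5.4): numerator and denominator of the variance estimator.
num = sum((a - xb) ** 2 * e ** 2 for a, e in zip(x, uhat)) / (n - 2)
den = (sum((a - xb) ** 2 for a in x) / n) ** 2
var_b1 = (1.0 / n) * num / den
se_b1 = var_b1 ** 0.5                            # Equation (5.3)
print(b1, se_b1)
```

The residuals enter the numerator weighted by (Xi − X̄)², which is what makes this estimator valid whether or not the errors are homoskedastic, a distinction taken up in Section 5.4.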
The second step is to compute the t-statistic,

t = (β̂1 − β1,0) / SE(β̂1).  (5.5)

The third step is to compute the p-value, the probability of observing a value of β̂1 at least as different from β1,0 as the estimate actually computed (β̂1act), assuming that the null hypothesis is correct. Stated mathematically,

p-value = PrH0[ |β̂1 − β1,0| > |β̂1act − β1,0| ]
        = PrH0[ |(β̂1 − β1,0)/SE(β̂1)| > |(β̂1act − β1,0)/SE(β̂1)| ] = PrH0( |t| > |tact| ),  (5.6)

where PrH0 denotes the probability computed under the null hypothesis, the second equality follows by dividing by SE(β̂1), and tact is the value of the t-statistic actually computed. Because β̂1 is approximately normally distributed in large samples, under the null hypothesis the t-statistic is approximately distributed as a standard normal random variable, so in large samples,

p-value = Pr( |Z| > |tact| ) = 2Φ(−|tact|).  (5.7)

A small value of the p-value, say less than 5%, provides evidence against the null hypothesis in the sense that the chance of obtaining a value of β̂1 at least as far from the null hypothesis value as the estimate actually observed, by pure random variation from one sample to the next, is less than 5% if, in fact, the null hypothesis is correct. If so, the null hypothesis is rejected at the 5% significance level.

Alternatively, the hypothesis can be tested at the 5% significance level simply by comparing the value of the t-statistic to ±1.96, the critical value for a two-sided test, and rejecting the null hypothesis at the 5% level if |tact| > 1.96.

These steps are summarized in Key Concept 5.2.


KEY CONCEPT 5.2

TESTING THE HYPOTHESIS β1 = β1,0 AGAINST THE ALTERNATIVE β1 ≠ β1,0

1. Compute the standard error of β̂1, SE(β̂1) [Equation (5.3)].

2. Compute the t-statistic [Equation (5.5)].

3. Compute the p-value [Equation (5.7)]. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if |tact| > 1.96.

The standard error and (typically) the t-statistic and p-value testing β1 = 0 are computed automatically by regression software.

Reporting regression equations and application to test scores.  The OLS regression of the test score against the student-teacher ratio, reported in Equation (4.11), yielded β̂0 = 698.9 and β̂1 = -2.28. The standard errors of these estimates are SE(β̂0) = 10.4 and SE(β̂1) = 0.52.

Because of the importance of the standard errors, by convention they are included when reporting the estimated OLS coefficients. One compact way to report the standard errors is to place them in parentheses below the respective coefficients of the OLS regression line:

TestScore = 698.9 - 2.28 × STR, R² = 0.051, SER = 18.6.  (5.8)
            (10.4)  (0.52)

Equation (5.8) also reports the regression R² and the standard error of the regression (SER) following the estimated regression line. Thus Equation (5.8) provides the estimated regression line, estimates of the sampling uncertainty of the slope and the intercept (the standard errors), and two measures of the fit of this regression line (the R² and the SER). This is a common format for reporting a single regression equation, and it will be used throughout the rest of this book.

Suppose you wish to test the null hypothesis that the slope β1 is zero in the population counterpart of Equation (5.8) at the 5% significance level. To do so, construct the t-statistic and compare it to 1.96, the 5% (two-sided) critical value taken from the standard normal distribution. The t-statistic is constructed by substituting the hypothesized value of β1 under the null hypothesis (zero), the estimated slope, and its standard error from Equation (5.8) into the general formula

FIGURE 5.1  Calculating the p-Value of a Two-Sided Test When t^act = -4.38

The p-value of a two-sided test is the probability that |Z| > |t^act|, where Z is a standard normal random variable and t^act is the value of the t-statistic calculated from the sample. When t^act = -4.38, the p-value is only 0.00001. [In the figure, the p-value is the shaded area under the standard normal density to the left of -4.38 plus the area to the right of +4.38.]

in Equation (5.5); the result is t^act = (-2.28 - 0)/0.52 = -4.38. This t-statistic exceeds (in absolute value) the 5% two-sided critical value of 1.96, so the null hypothesis is rejected in favor of the two-sided alternative at the 5% significance level.

Alternatively, we can compute the p-value associated with t^act = -4.38. This probability is the area in the tails of the standard normal distribution, as shown in Figure 5.1. This probability is extremely small, approximately 0.00001, or 0.001%. That is, if the null hypothesis β_ClassSize = 0 is true, the probability of obtaining a value of β̂1 as far from the null as the value we actually obtained is extremely small, less than 0.001%. Because this event is so unlikely, it is reasonable to conclude that the null hypothesis is false.

One-Sided Hypotheses Concerning β1

The discussion so far has focused on testing the hypothesis that β1 = β1,0 against the hypothesis that β1 ≠ β1,0. This is a two-sided hypothesis test, because under the alternative β1 could be either larger or smaller than β1,0. Sometimes, however, it is appropriate to use a one-sided hypothesis test. For example, in the student-teacher ratio/test score problem, many people think that smaller classes provide a better learning environment. Under that hypothesis, β1 is negative: Smaller classes lead to higher scores. It might make sense, therefore, to test the null hypothesis that β1 = 0 (no effect) against the one-sided alternative that β1 < 0.

For a one-sided test, the null hypothesis and the one-sided alternative hypothesis are

H0: β1 = β1,0 vs. H1: β1 < β1,0 (one-sided alternative),  (5.9)

where β1,0 is the value of β1 under the null (0 in the student-teacher ratio example) and the alternative is that β1 is less than β1,0. If the alternative is that β1 is greater than β1,0, the inequality in Equation (5.9) is reversed.
Because the null hypothesis is the same for a one- and a two-sided hypothesis test, the construction of the t-statistic is the same. The only difference between a one- and two-sided hypothesis test is how you interpret the t-statistic. For the one-sided alternative in Equation (5.9), the null hypothesis is rejected against the one-sided alternative for large negative, but not large positive, values of the t-statistic: Instead of rejecting if |t^act| > 1.96, the hypothesis is rejected at the 5% significance level if t^act < -1.645.

The p-value for a one-sided test is obtained from the cumulative standard normal distribution as

p-value = Pr(Z < t^act) = Φ(t^act) (p-value, one-sided left-tail test).  (5.10)

If the alternative hypothesis is that β1 is greater than β1,0, the inequalities in Equations (5.9) and (5.10) are reversed, so the p-value is the right-tail probability, Pr(Z > t^act).
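The relation between the one- and two-sided p-values can be checked numerically. The short sketch below (an illustration, not part of the original text) applies Equation (5.10) to the test-score t-statistic of -4.38 and compares it with the two-sided p-value from Equation (5.7).

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

t_act = -4.38
p_one_sided = phi(t_act)              # Equation (5.10): Pr(Z < t_act)
p_two_sided = 2.0 * phi(-abs(t_act))  # Equation (5.7), for comparison

# For a negative t-statistic the left-tail p-value is half the two-sided one.
print(math.isclose(p_one_sided, p_two_sided / 2.0))  # True
print(t_act < -1.645)  # True: reject at the 5% one-sided level
```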

When should a one-sided test be used?  In practice, one-sided alternative hypotheses should be used only when there is a clear reason for doing so. This reason could come from economic theory, prior empirical evidence, or both. However, even if it initially seems that the relevant alternative is one-sided, upon reflection this might not necessarily be so. A newly formulated drug undergoing clinical trials actually could prove harmful because of previously unrecognized side effects. In the class size example, we are reminded of the graduation joke that a university's secret of success is to admit talented students and then make sure that the faculty stays out of their way and does as little damage as possible. In practice, such ambiguity often leads econometricians to use two-sided tests.

Application to test scores.  The t-statistic testing the hypothesis that there is no effect of class size on test scores [so β1,0 = 0 in Equation (5.9)] is t^act = -4.38. This is less than -2.33 (the critical value for a one-sided test with a 1% significance level), so the null hypothesis is rejected against the one-sided alternative at the 1% level. In fact, the p-value is less than 0.0006%. Based on these data, you can reject the angry taxpayer's assertion that the negative estimate of the slope arose purely because of random sampling variation at the 1% significance level.

Testing Hypotheses About the Intercept β0

This discussion has focused on testing hypotheses about the slope, β1. Occasionally, however, the hypothesis concerns the intercept, β0. The null hypothesis concerning the intercept and the two-sided alternative are

H0: β0 = β0,0 vs. H1: β0 ≠ β0,0 (two-sided alternative).  (5.11)

The general approach to testing this null hypothesis consists of the three steps in Key Concept 5.2, applied to β0 (the formula for the standard error of β̂0 is given in Appendix 5.1). If the alternative is one-sided, this approach is modified as was discussed in the previous subsection for hypotheses about the slope.

Hypothesis tests are useful if you have a specific null hypothesis in mind (as did our angry taxpayer). Being able to accept or to reject this null hypothesis based on the statistical evidence provides a powerful tool for coping with the uncertainty inherent in using a sample to learn about the population. Yet, there are many times that no single hypothesis about a regression coefficient is dominant, and instead one would like to know a range of values of the coefficient that are consistent with the data. This calls for constructing a confidence interval.


5.2  Confidence Intervals for a Regression Coefficient

Because any statistical estimate of the slope β1 necessarily has sampling uncertainty, we cannot determine the true value of β1 exactly from a sample of data. It

is, however, possible to use the OLS estimator and its standard error to construct a confidence interval for the slope β1 or for the intercept β0.

Confidence interval for β1.  Recall that a 95% confidence interval for β1 has two equivalent definitions. First, it is the set of values that cannot be rejected using a two-sided hypothesis test with a 5% significance level. Second, it is an interval that has a 95% probability of containing the true value of β1; that is, in 95% of possible samples that might be drawn, the confidence interval will contain the true value of β1. Because this interval contains the true value in 95% of all samples, it is said to have a confidence level of 95%.

The reason these two definitions are equivalent is as follows. A hypothesis test with a 5% significance level will, by definition, reject the true value of β1 in only 5% of all possible samples; that is, in 95% of all possible samples the true value of β1 will not be rejected. Because the 95% confidence interval (as defined in the first definition) is the set of all values of β1 that are not rejected at the 5% significance level, it follows that the true value of β1 will be contained in the confidence interval in 95% of all possible samples.

As in the case of a confidence interval for the population mean (Section 3.3), in principle a 95% confidence interval can be computed by testing all possible values of β1 (that is, testing the null hypothesis β1 = β1,0 for all values of β1,0) at the 5% significance level using the t-statistic. The 95% confidence interval is then the collection of all the values of β1 that are not rejected. But constructing the t-statistic for all values of β1 would take forever.

An easier way to construct the confidence interval is to note that the t-statistic will reject the hypothesized value β1,0 whenever β1,0 is outside the range β̂1 ± 1.96SE(β̂1). That is, the 95% confidence interval for β1 is the interval [β̂1 - 1.96SE(β̂1), β̂1 + 1.96SE(β̂1)]. This argument parallels the argument used to develop a confidence interval for the population mean.

The construction of a confidence interval for β1 is summarized as Key Concept 5.3.

Confidence interval for β0.  A 95% confidence interval for β0 is constructed as in Key Concept 5.3, with β̂0 and SE(β̂0) replacing β̂1 and SE(β̂1).

Application to test scores.  The OLS regression of the test score against the student-teacher ratio, reported in Equation (5.8), yielded β̂1 = -2.28 and SE(β̂1) = 0.52. The 95% two-sided confidence interval for β1 is {-2.28 ± 1.96 × 0.52}, that is, -3.30 ≤ β1 ≤ -1.26. The value β1 = 0 is not contained in this confidence interval,


KEY CONCEPT 5.3
CONFIDENCE INTERVAL FOR β1

A 95% two-sided confidence interval for β1 is an interval that contains the true value of β1 with a 95% probability; that is, it contains the true value of β1 in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of β1 that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, it is constructed as

95% confidence interval for β1 = [β̂1 - 1.96SE(β̂1), β̂1 + 1.96SE(β̂1)].  (5.12)

so (as we knew already from Section 5.1) the hypothesis β1 = 0 can be rejected at the 5% significance level.
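Equation (5.12) is simple enough to verify by hand; as a sketch (not part of the original text), the test-score interval reported above can be reproduced from the rounded estimates β̂1 = -2.28 and SE(β̂1) = 0.52:

```python
# 95% CI for beta_1, Equation (5.12), with the rounded estimates from Eq. (5.8).
beta1_hat, se_beta1 = -2.28, 0.52
lo = beta1_hat - 1.96 * se_beta1
hi = beta1_hat + 1.96 * se_beta1

print(round(lo, 2), round(hi, 2))  # -3.3 -1.26
print(lo <= 0.0 <= hi)             # False: zero lies outside, so reject beta_1 = 0
```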

Confidence intervals for predicted effects of changing X.  The 95% confidence interval for β1 can be used to construct a 95% confidence interval for the predicted effect of a general change in X.

Consider changing X by a given amount, Δx. The predicted change in Y associated with this change in X is β1Δx. The population slope β1 is unknown, but because we can construct a confidence interval for β1, we can construct a confidence interval for the predicted effect β1Δx. Because one end of a 95% confidence interval for β1 is β̂1 - 1.96SE(β̂1), the predicted effect of the change Δx using this estimate of β1 is [β̂1 - 1.96SE(β̂1)] × Δx. The other end of the confidence interval is β̂1 + 1.96SE(β̂1), and the predicted effect of the change using that estimate is [β̂1 + 1.96SE(β̂1)] × Δx. Thus a 95% confidence interval for the effect of changing x by the amount Δx can be expressed as

95% confidence interval for β1Δx =
[β̂1Δx - 1.96SE(β̂1) × Δx, β̂1Δx + 1.96SE(β̂1) × Δx].  (5.13)

For example, our hypothetical superintendent is contemplating reducing the student-teacher ratio by 2. Because the 95% confidence interval for β1 is [-3.30, -1.26], the effect of reducing the student-teacher ratio by 2 could be as great as -3.30 × (-2) = 6.60 or as little as -1.26 × (-2) = 2.52. Thus decreasing the student-teacher ratio by 2 is predicted to increase test scores by between 2.52 and 6.60 points, with a 95% confidence level.
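The interval in Equation (5.13) can be sketched the same way for the superintendent's change of Δx = -2, again using the rounded estimates from Equation (5.8) (an illustration, not part of the original text):

```python
# 95% CI for the predicted effect beta_1 * delta_x, Equation (5.13),
# for the superintendent's contemplated change delta_x = -2.
beta1_hat, se_beta1, delta_x = -2.28, 0.52, -2.0

effect = beta1_hat * delta_x                 # predicted change in test scores
half_width = 1.96 * se_beta1 * abs(delta_x)  # sampling uncertainty in the effect
lo, hi = effect - half_width, effect + half_width

print(round(lo, 2), round(hi, 2))  # 2.52 6.6
```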

5.3  Regression When X Is a Binary Variable

The discussion so far has focused on the case that the regressor is a continuous variable. Regression analysis can also be used when the regressor is binary, that is, when it takes on only two values, 0 or 1. For example, X might be a worker's gender (= 1 if female, = 0 if male), whether a school district is urban or rural (= 1 if urban, = 0 if rural), or whether the district's class size is small or large (= 1 if small, = 0 if large). A binary variable is also called an indicator variable or sometimes a dummy variable.

Interpretation of the Regression Coefficients

The mechanics of regression with a binary regressor are the same as if it is continuous. The interpretation of β1, however, is different, and it turns out that regression with a binary variable is equivalent to performing a difference of means analysis, as described in Section 3.4.

To see this, suppose you have a variable Di that equals either 0 or 1, depending on whether the student-teacher ratio is less than 20:

Di = 1 if the student-teacher ratio in the ith district < 20;
Di = 0 if the student-teacher ratio in the ith district ≥ 20.  (5.14)

The population regression model with Di as the regressor is

Yi = β0 + β1Di + ui, i = 1, ..., n.  (5.15)

This is the same as the regression model with the continuous regressor Xi, except that now the regressor is the binary variable Di. Because Di is not continuous, it is not useful to think of β1 as a slope; indeed, because Di can take on only two values, there is no "line" so it makes no sense to talk about a slope. Thus we will not refer to β1 as the slope in Equation (5.15); instead we will simply refer to β1 as the coefficient multiplying Di in this regression or, more compactly, the coefficient on Di.

If β1 in Equation (5.15) is not a slope, then what is it? The best way to interpret β0 and β1 in a regression with a binary regressor is to consider one at a time the two possible cases, Di = 0 and Di = 1. If the student-teacher ratio is high, then Di = 0 and Equation (5.15) becomes

Yi = β0 + ui  (Di = 0).  (5.16)

Because E(ui|Di) = 0, the conditional expectation of Yi when Di = 0 is E(Yi|Di = 0) = β0; that is, β0 is the population mean value of test scores when the student-teacher ratio is high. Similarly, when Di = 1,

Yi = β0 + β1 + ui  (Di = 1).  (5.17)

Thus, when Di = 1, E(Yi|Di = 1) = β0 + β1; that is, β0 + β1 is the population mean value of test scores when the student-teacher ratio is low.

Because β0 + β1 is the population mean of Yi when Di = 1 and β0 is the population mean of Yi when Di = 0, the difference (β0 + β1) - β0 = β1 is the difference between these two means. In other words, β1 is the difference between the conditional expectation of Yi when Di = 1 and when Di = 0, or β1 = E(Yi|Di = 1) - E(Yi|Di = 0). In the test score example, β1 is the difference between the mean test score in districts with low student-teacher ratios and the mean test score in districts with high student-teacher ratios.

Because β1 is the difference in the population means, it makes sense that the OLS estimator β̂1 is the difference between the sample averages of Yi in the two groups, and in fact this is the case.

Hypothesis tests and confidence intervals.  If the two population means are the same, then β1 in Equation (5.15) is zero. Thus, the null hypothesis that the two population means are the same can be tested against the alternative hypothesis that they differ by testing the null hypothesis β1 = 0 against the alternative β1 ≠ 0. This hypothesis can be tested using the procedure outlined in Section 5.1. Specifically, the null hypothesis can be rejected at the 5% level against the two-sided alternative when the OLS t-statistic t = β̂1/SE(β̂1) exceeds 1.96 in absolute value. Similarly, a 95% confidence interval for β1, constructed as β̂1 ± 1.96SE(β̂1) as described in Section 5.2, provides a 95% confidence interval for the difference between the two population means.

Application to test scores.  As an example, a regression of the test score against the student-teacher ratio binary variable D defined in Equation (5.14), estimated by OLS using the 420 observations in Figure 4.2, yields

TestScore = 650.0 + 7.4D, R² = 0.037, SER = 18.7,  (5.18)
            (1.3)   (1.8)

where the standard errors of the OLS estimates of the coefficients β̂0 and β̂1 are given in parentheses below the OLS estimates. Thus the average test score for the subsample with student-teacher ratios greater than or equal to 20 (that is, for which D = 0) is 650.0, and the average test score for the subsample with student-teacher ratios less than 20 (so D = 1) is 650.0 + 7.4 = 657.4. The difference between the sample average test scores for the two groups is 7.4. This is the OLS estimate of β1, the coefficient on the student-teacher ratio binary variable D.

Is the difference in the population mean test scores in the two groups statistically significantly different from zero at the 5% level? To find out, construct the t-statistic on β1: t = 7.4/1.8 = 4.04. This exceeds 1.96 in absolute value, so the hypothesis that the population mean test scores in districts with high and low student-teacher ratios is the same can be rejected at the 5% significance level.

The OLS estimator and its standard error can be used to construct a 95% confidence interval for the true difference in means. This is 7.4 ± 1.96 × 1.8 = (3.9, 10.9). This confidence interval excludes β1 = 0, so that (as we know from the previous paragraph) the hypothesis β1 = 0 can be rejected at the 5% significance level.
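The equivalence between the OLS coefficient on a binary regressor and the difference in group means can be checked numerically. The six observations below are made up purely for illustration; they are not the California test-score data.

```python
# Six made-up (D, Y) observations; D = 1 marks "small class" districts.
d = [0, 0, 0, 1, 1, 1]
y = [650.0, 648.0, 652.0, 657.0, 659.0, 655.0]

mean0 = sum(yi for di, yi in zip(d, y) if di == 0) / d.count(0)
mean1 = sum(yi for di, yi in zip(d, y) if di == 1) / d.count(1)

# OLS slope: sum((D_i - Dbar)(Y_i - Ybar)) / sum((D_i - Dbar)^2)
dbar, ybar = sum(d) / len(d), sum(y) / len(y)
beta1_hat = (sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
             / sum((di - dbar) ** 2 for di in d))

print(abs(beta1_hat - (mean1 - mean0)) < 1e-12)  # True: OLS slope = difference in means
```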

5.4  Heteroskedasticity and Homoskedasticity

Our only assumption about the distribution of ui conditional on Xi is that it has a mean of zero (the first least squares assumption). If, furthermore, the variance of this conditional distribution does not depend on Xi, then the errors are said to be homoskedastic. This section discusses homoskedasticity, its theoretical implications, the simplified formulas for the standard errors of the OLS estimators that arise if the errors are homoskedastic, and the risks you run if you use these simplified formulas in practice.

What Are Heteroskedasticity and Homoskedasticity?

Definitions of heteroskedasticity and homoskedasticity.  The error term ui is homoskedastic if the variance of the conditional distribution of ui given Xi is constant for i = 1, ..., n and in particular does not depend on Xi. Otherwise, the error term is heteroskedastic.


FIGURE 5.2  An Example of Heteroskedasticity

Like Figure 4.4, this figure shows the conditional distribution of test scores for three different class sizes. Unlike Figure 4.4, these distributions become more spread out (have a larger variance) for larger class sizes. Because the variance of the distribution of u given X, var(u|X), depends on X, u is heteroskedastic. [The figure plots test score against the student-teacher ratio, showing the distribution of Y when X = 15 and its more dispersed counterpart when X = 25.]

As an illustration, return to Figure 4.4. The distribution of the errors ui is shown for various values of x. Because this distribution applies specifically for the indicated value of x, this is the conditional distribution of ui given Xi = x. As drawn in that figure, all these conditional distributions have the same spread; more precisely, the variance of these distributions is the same for the various values of x. That is, in Figure 4.4, the conditional variance of ui given Xi = x does not depend on x, so the errors illustrated in Figure 4.4 are homoskedastic.

In contrast, Figure 5.2 illustrates a case in which the conditional distribution of ui spreads out as x increases. For small values of x, this distribution is tight, but for larger values of x, it has a greater spread. Thus, in Figure 5.2 the variance of ui given Xi = x increases with x, so that the errors in Figure 5.2 are heteroskedastic.

The definitions of heteroskedasticity and homoskedasticity are summarized in Key Concept 5.4.

KEY CONCEPT 5.4
HETEROSKEDASTICITY AND HOMOSKEDASTICITY

The error term ui is homoskedastic if the variance of the conditional distribution of ui given Xi, var(ui|Xi = x), is constant for i = 1, ..., n, and in particular does not depend on x. Otherwise, the error term is heteroskedastic.

Example.  These terms are a mouthful and the definitions might seem abstract. To help clarify them with an example, we digress from the student-teacher ratio/test score problem and instead return to the example of earnings of male versus female college graduates considered in the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States." Let MALEi be a binary variable that equals 1 for male college graduates and equals 0 for female graduates. The binary variable regression model relating someone's earnings to his or her gender is

Earningsi = β0 + β1MALEi + ui  (5.19)

for i = 1, ..., n. Because the regressor is binary, β1 is the difference in the population means of the two groups, in this case, the difference in mean earnings between men and women who graduated from college.

The definition of homoskedasticity states that the variance of ui does not depend on the regressor. Here the regressor is MALEi, so at issue is whether the variance of the error term depends on MALEi. In other words, is the variance of the error term the same for men and for women? If so, the error is homoskedastic; if not, it is heteroskedastic.

Deciding whether the variance of ui depends on MALEi requires thinking hard about what the error term actually is. In this regard, it is useful to write Equation (5.19) as two separate equations, one for men and one for women:

Earningsi = β0 + ui  (women) and  (5.20)
Earningsi = β0 + β1 + ui  (men).  (5.21)

Thus, for women, ui is the deviation of the ith woman's earnings from the population mean earnings for women (β0), and for men, ui is the deviation of the ith man's earnings from the population mean earnings for men (β0 + β1). It follows that the

statement "the variance of ui does not depend on MALEi" is equivalent to the statement "the variance of earnings is the same for men as it is for women." In other words, in this example, the error term is homoskedastic if the variance of the population distribution of earnings is the same for men and women; if these variances differ, the error term is heteroskedastic.

Mathematical Implications of Homoskedasticity

The OLS estimators remain unbiased and asymptotically normal.  Because the least squares assumptions in Key Concept 4.3 place no restrictions on the conditional variance, they apply to both the general case of heteroskedasticity and the special case of homoskedasticity. Therefore, the OLS estimators remain unbiased and consistent even if the errors are heteroskedastic. In addition, the OLS estimators have sampling distributions that are normal in large samples even if the errors are heteroskedastic. Whether the errors are homoskedastic or heteroskedastic, the OLS estimator is unbiased, consistent, and asymptotically normal.

Efficiency of the OLS estimator when the errors are homoskedastic.  If the least squares assumptions in Key Concept 4.3 hold and the errors are homoskedastic, then the OLS estimators β̂0 and β̂1 are efficient among all estimators that are linear in Y1, ..., Yn and are unbiased, conditional on X1, ..., Xn. This result, which is called the Gauss-Markov theorem, is discussed in Section 5.5.

Homoskedasticity-only variance formula.  If the error term is homoskedastic, then the formulas for the variances of β̂0 and β̂1 in Key Concept 4.4 simplify. Consequently, if the errors are homoskedastic, then there is a specialized formula that can be used for the standard errors of β̂0 and β̂1. The homoskedasticity-only standard error of β̂1, derived in Appendix 5.1, is SE(β̂1) = √σ̃²_β̂1, where σ̃²_β̂1 is the homoskedasticity-only estimator of the variance of β̂1:

σ̃²_β̂1 = s²_û / Σᵢ₌₁ⁿ (Xi - X̄)²  (homoskedasticity-only),  (5.22)

where s²_û is given in Equation (4.19). The homoskedasticity-only formula for the standard error of β̂0 is given in Appendix 5.1. In the special case that X is a binary variable, the estimator of the variance of β̂1 under homoskedasticity (that is, the

square of the standard error of β̂1 under homoskedasticity) is the so-called pooled variance formula for the difference in means, given in Equation (3.23).

Because these alternative formulas are derived for the special case that the errors are homoskedastic and do not apply if the errors are heteroskedastic, they will be referred to as the "homoskedasticity-only" formulas for the variance and standard error of the OLS estimators. As the name suggests, if the errors are heteroskedastic, then the homoskedasticity-only standard errors are inappropriate. Specifically, if the errors are heteroskedastic, then the t-statistic computed using the homoskedasticity-only standard error does not have a standard normal distribution, even in large samples. In fact, the correct critical values to use for the homoskedasticity-only t-statistic depend on the precise nature of the heteroskedasticity, so those critical values cannot be tabulated. Similarly, if the errors are heteroskedastic but a confidence interval is constructed as ±1.96 homoskedasticity-only standard errors, in general the probability that this interval contains the true value of the coefficient is not 95%, even in large samples.

In contrast, because homoskedasticity is a special case of heteroskedasticity, the estimators σ̂²_β̂1 and σ̂²_β̂0 of the variances of β̂1 and β̂0 given in Equations (5.4) and (5.26) produce valid statistical inferences whether the errors are heteroskedastic or homoskedastic. Thus hypothesis tests and confidence intervals based on those standard errors are valid whether or not the errors are heteroskedastic. Because the standard errors we have used so far [i.e., those based on Equations (5.4) and (5.26)] lead to statistical inferences that are valid whether or not the errors are heteroskedastic, they are called heteroskedasticity-robust standard errors. Because such formulas were proposed by Eicker (1967), Huber (1967), and White (1980), they are also referred to as Eicker-Huber-White standard errors.
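To make the contrast concrete, the following sketch (not part of the original text) computes both the homoskedasticity-only variance of β̂1 from Equation (5.22) and one common version of a heteroskedasticity-robust variance of the Eicker-Huber-White form, which weights each squared residual by its own (Xi - X̄)². The data and the finite-sample degrees-of-freedom scaling are illustrative assumptions, not taken from the book.

```python
import math

# Hypothetical sample: (student-teacher ratio, test score) pairs.
x = [15.0, 17.0, 19.0, 21.0, 23.0, 25.0]
y = [680.0, 675.0, 662.0, 655.0, 650.0, 640.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# OLS fit and residuals
beta1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
beta0 = ybar - beta1 * xbar
u = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

# Homoskedasticity-only variance, Equation (5.22): s_u^2 / sum (x_i - xbar)^2
s2_u = sum(ui ** 2 for ui in u) / (n - 2)
var_homosk = s2_u / sxx

# Robust variance: residuals weighted by (x_i - xbar)^2, with an
# illustrative n/(n - 2) degrees-of-freedom adjustment.
var_robust = (n / (n - 2)) * sum((xi - xbar) ** 2 * ui ** 2
                                 for xi, ui in zip(x, u)) / sxx ** 2

se_homosk, se_robust = math.sqrt(var_homosk), math.sqrt(var_robust)
print(beta1 < 0 and se_homosk > 0 and se_robust > 0)  # True
```

Under homoskedastic errors the two estimators converge to the same quantity; under heteroskedasticity only the robust version remains valid, which is why the two can differ noticeably in samples like this one.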

What Does This Mean in Practice?

Which is more realistic, heteroskedasticity or homoskedasticity?  The answer to this question depends on the application. However, the issues can be clarified by returning to the example of the gender gap in earnings among college graduates. Familiarity with how people are paid in the world around us gives some clues as to which assumption is more sensible. For many years, and to a lesser extent today, women were not found in the top-paying jobs: There have always been poorly paid men, but there have rarely been highly paid women. This suggests that the distribution of earnings among women is tighter than among men (see the box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States"). In other words, the variance of the error term in Equa-

5.4

rolcd

the
.they

The Economic Value of a Year of Education: Homoskedasticity or Heteroskedasticity?

On average, workers with more education have higher earnings than workers with less education. But if the best-paying jobs mainly go to the college educated, it might also be that the spread of the distribution of earnings is greater for workers with more education. Does the distribution of earnings spread out as education increases?

This is an empirical question, so answering it requires analyzing data. Figure 5.3 is a scatterplot of the hourly earnings and the number of years of education for a sample of 2950 full-time workers in the United States in 2004, ages 29 and 30, with between 6 and 18 years of education. The data come from the March 2005 Current Population Survey, which is described in Appendix 3.1.

Figure 5.3 has two striking features. The first is that the mean of the distribution of earnings increases with the number of years of education. This increase is summarized by the OLS regression line,

    Earnings = -3.13 + 1.47 YearsEducation, R² = 0.130, SER = 8.77.    (5.23)
               (0.93)  (0.07)

This line is plotted in Figure 5.3. The coefficient of 1.47 in the OLS regression line means that, on average, hourly earnings increase by $1.47 for each additional year of education. The 95% confidence interval for this coefficient is 1.47 ± 1.96 × 0.07, or 1.33 to 1.61.

The second striking feature of Figure 5.3 is that the spread of the distribution of earnings increases with the years of education. While some workers with many years of education have low-paying jobs, very few workers with low levels of education have high-paying jobs. This can be stated more precisely by looking at the spread of the residuals around the OLS regression line. For workers with ten years of education, the standard deviation of the residuals is $5.46; for workers with a high school diploma, this standard deviation is $7.43; and for workers with a college degree, this standard deviation increases to $10.78. Because these standard deviations differ for different levels of education, the variance of the residuals in the regression of Equation (5.23) depends on the value of the regressor (the years of education); in other words, the regression errors are heteroskedastic. In real-world terms, not all college graduates will be earning $50/hour by the time they are 29, but some will, and workers with only ten years of education have no shot at those jobs.

FIGURE 5.3  Scatterplot of Hourly Earnings and Years of Education for 29- to 30-Year-Olds in the United States in 2004

[Figure: hourly earnings plotted against years of education for 2950 full-time, 29- to 30-year-old workers. The spread around the regression line increases with the years of education, suggesting that the regression errors are heteroskedastic.]

CHAPTER 5  Linear Regression with One Regressor


the variance of the error term in Equation (5.20) for women is plausibly less than the variance of the error term in Equation (5.21) for men. Thus, the presence of a "glass ceiling" for women's jobs and pay suggests that the error term in the binary variable regression model in Equation (5.19) is heteroskedastic. Unless there are compelling reasons to the contrary (and we can think of none), it makes sense to treat the error term in this example as heteroskedastic.

As this example of modeling earnings illustrates, heteroskedasticity arises in many econometric applications. At a general level, economic theory rarely gives any reason to believe that the errors are homoskedastic. It therefore is prudent to assume that the errors might be heteroskedastic unless you have compelling reasons to believe otherwise.

Practical implications. The main issue of practical relevance in this discussion is whether one should use heteroskedasticity-robust or homoskedasticity-only standard errors. In this regard, it is useful to imagine computing both, then choosing between them. If the homoskedasticity-only and heteroskedasticity-robust standard errors are the same, nothing is lost by using the heteroskedasticity-robust standard errors; if they differ, however, then you should use the more reliable ones that allow for heteroskedasticity. The simplest thing, then, is always to use the heteroskedasticity-robust standard errors.

For historical reasons, many software programs use the homoskedasticity-only standard errors as their default setting, so it is up to the user to specify the option of heteroskedasticity-robust standard errors. The details of how to implement heteroskedasticity-robust standard errors depend on the software package you use. All of the empirical examples in this book employ heteroskedasticity-robust standard errors unless explicitly stated otherwise.¹
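The two kinds of standard errors can be computed side by side. The sketch below is illustrative only (it uses simulated data, not any data set from the text) and implements the homoskedasticity-only variance estimator and the Eicker-Huber-White robust variance estimator directly for the slope of a simple regression; the error standard deviation is made to grow with X so that the two standard errors differ.

```python
import numpy as np

# Illustrative sketch (simulated data, not from the text): compute the
# homoskedasticity-only and the heteroskedasticity-robust (Eicker-Huber-White)
# standard errors for the OLS slope by hand. Here sd(u | X) = X^2, so the
# errors are strongly heteroskedastic and the two standard errors differ.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 4.0, n)
u = (x ** 2) * rng.normal(0.0, 1.0, n)   # heteroskedastic errors
y = 5.0 + 2.0 * x + u

xd = x - x.mean()
beta1 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
uhat = y - beta0 - beta1 * x

# Homoskedasticity-only variance estimator: s_u^2 / sum_i (Xi - Xbar)^2
s2u = (uhat ** 2).sum() / (n - 2)
se_homo = np.sqrt(s2u / (xd ** 2).sum())

# Robust variance estimator:
# sum_i (Xi - Xbar)^2 uhat_i^2 / [sum_i (Xi - Xbar)^2]^2,
# with an n/(n - 2) degrees-of-freedom correction
se_robust = np.sqrt((n / (n - 2)) * ((xd ** 2) * (uhat ** 2)).sum()
                    / ((xd ** 2).sum()) ** 2)

print(f"slope: {beta1:.3f}")
print(f"homoskedasticity-only SE: {se_homo:.4f}, robust SE: {se_robust:.4f}")
```

With heteroskedasticity of this form the robust standard error comes out larger, illustrating why a confidence interval built from the homoskedasticity-only standard error would be too narrow here.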

¹In case this book is used in conjunction with other texts, it might be helpful to note that some textbooks add homoskedasticity to the list of least squares assumptions. As just discussed, however, this additional assumption is not needed for the validity of OLS analysis as long as heteroskedasticity-robust standard errors are used.

5.5 The Theoretical Foundations of Ordinary Least Squares

As discussed in Section 4.5, the OLS estimator is unbiased, is consistent, has a variance that is inversely proportional to n, and has a normal sampling distribution
bt'tc: , l..cd.LtKII\ -robu~t


when the sample size is large. In addition, under certain conditions the OLS estimator is more efficient than some other candidate estimators. Specifically, if the least squares assumptions hold and if the errors are homoskedastic, then the OLS estimator has the smallest variance of all conditionally unbiased estimators that are linear functions of Y1, ..., Yn. This section explains and discusses this result, which is a consequence of the Gauss-Markov theorem. The section concludes with a discussion of alternative estimators that are more efficient than OLS when the conditions of the Gauss-Markov theorem do not hold.

Linear Conditionally Unbiased Estimators and the Gauss-Markov Theorem

If the three least squares assumptions (Key Concept 4.3) hold and if the error is homoskedastic, then the OLS estimator has the smallest variance, conditional on X1, ..., Xn, among all estimators in the class of linear conditionally unbiased estimators. In other words, the OLS estimator is the Best Linear conditionally Unbiased Estimator; that is, it is BLUE. This result extends to regression the result, summarized in Key Concept 3.3, that the sample average Ȳ is the most efficient estimator of the population mean among the class of all estimators that are unbiased and are linear functions (weighted averages) of Y1, ..., Yn.


Linear conditionally unbiased estimators. The class of linear conditionally unbiased estimators consists of all estimators of β1 that are linear functions of Y1, ..., Yn and that are unbiased, conditional on X1, ..., Xn. That is, if β̃1 is a linear estimator, then it can be written as

    β̃1 = Σ(i=1 to n) ai Yi    (β̃1 is linear),    (5.24)

where the weights a1, ..., an can depend on X1, ..., Xn but not on Y1, ..., Yn. The estimator β̃1 is conditionally unbiased if the mean of its conditional sampling distribution, given X1, ..., Xn, is β1. That is, the estimator β̃1 is conditionally unbiased if

    E(β̃1 | X1, ..., Xn) = β1    (β̃1 is conditionally unbiased).    (5.25)

The estimator β̃1 is a linear conditionally unbiased estimator if it can be written in the form of Equation (5.24) (it is linear) and if Equation (5.25) holds (it is conditionally unbiased). It is shown in Appendix 5.2 that the OLS estimator is linear and conditionally unbiased.

KEY CONCEPT 5.5  THE GAUSS-MARKOV THEOREM FOR β̂1

If the three least squares assumptions in Key Concept 4.3 hold and if errors are homoskedastic, then the OLS estimator β̂1 is the Best (most efficient) Linear conditionally Unbiased Estimator (is BLUE).
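A numerical sketch of what Equation (5.24) means for OLS (simulated data, not from the text): the OLS slope estimator can be written as Σ ai Yi with weights ai = (Xi − X̄)/Σj (Xj − X̄)², which depend only on the X's.

```python
import numpy as np

# Sketch (simulated data, not from the text): the OLS slope estimator is a
# linear estimator in the sense of Equation (5.24), with weights
# a_i = (Xi - Xbar) / sum_j (Xj - Xbar)^2 that depend only on the X's.
rng = np.random.default_rng(1)
n = 50
x = rng.normal(10.0, 2.0, n)
y = 3.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

xd = x - x.mean()
a = xd / (xd ** 2).sum()            # the weights a_1, ..., a_n
beta1_linear = (a * y).sum()        # beta1_hat written as sum_i a_i * Y_i

# The same number from the usual covariance/variance formula
beta1_usual = (xd * (y - y.mean())).sum() / (xd ** 2).sum()

# Two properties of these weights that drive conditional unbiasedness:
# sum_i a_i = 0 and sum_i a_i * Xi = 1, so E(beta1_hat | X's) = beta1.
print(beta1_linear, beta1_usual, a.sum(), (a * x).sum())
```

Because the weights sum to zero and satisfy Σ ai Xi = 1, substituting Yi = β0 + β1 Xi + ui into Σ ai Yi gives β1 + Σ ai ui, whose conditional mean is β1 under the first least squares assumption.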

The Gauss-Markov theorem. The Gauss-Markov theorem states that, under a set of conditions known as the Gauss-Markov conditions, the OLS estimator β̂1 has the smallest conditional variance, given X1, ..., Xn, of all linear conditionally unbiased estimators of β1; that is, the OLS estimator is BLUE. The Gauss-Markov conditions, which are stated in Appendix 5.2, are implied by the three least squares assumptions plus the assumption that the errors are homoskedastic. Consequently, if the three least squares assumptions hold and the errors are homoskedastic, then OLS is BLUE. The Gauss-Markov theorem is stated in Key Concept 5.5 and proven in Appendix 5.2.

Limitations of the Gauss-Markov theorem. The Gauss-Markov theorem provides a theoretical justification for using OLS. However, the theorem has two important limitations. First, its conditions might not hold in practice. In particular, if the error term is heteroskedastic, as it often is in economic applications, then the OLS estimator is no longer BLUE. As discussed in Section 5.4, the presence of heteroskedasticity does not pose a threat to inference based on heteroskedasticity-robust standard errors, but it does mean that OLS is no longer the efficient linear conditionally unbiased estimator. An alternative to OLS when there is heteroskedasticity of a known form, called the weighted least squares estimator, is discussed below.

The second limitation of the Gauss-Markov theorem is that even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS.

Regression Estimators Other Than OLS

Under certain conditions, some regression estimators are more efficient than OLS.


The weighted least squares estimator. If the errors are heteroskedastic, then OLS is no longer BLUE. If the nature of the heteroskedasticity is known (specifically, if the conditional variance of ui given Xi is known up to a constant factor of proportionality), then it is possible to construct an estimator that has a smaller variance than the OLS estimator. This method, called weighted least squares (WLS), weights the ith observation by the inverse of the square root of the conditional variance of ui given Xi. Because of this weighting, the errors in this weighted regression are homoskedastic, so OLS, when applied to the weighted data, is BLUE. Although theoretically elegant, the practical problem with weighted least squares is that you must know how the conditional variance of ui depends on Xi, something that is rarely known in applications.
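A small Monte Carlo sketch (simulated data, not from the text) illustrates the efficiency gain. Here var(u | X) is proportional to X², so dividing the whole equation by X makes the transformed errors homoskedastic, and the WLS slope varies less across samples than the OLS slope.

```python
import numpy as np

# Sketch (simulated data, not from the text): when var(u | X) = sigma^2 * X^2,
# weighting each observation by 1/X yields homoskedastic errors, and OLS on the
# weighted data (WLS) has a smaller sampling variance than OLS on the raw data.
rng = np.random.default_rng(2)

def slope_ols(x, y):
    xd = x - x.mean()
    return (xd * (y - y.mean())).sum() / (xd ** 2).sum()

ols_draws, wls_draws = [], []
for _ in range(500):
    x = rng.uniform(1.0, 5.0, 100)
    y = 1.0 + 2.0 * x + rng.normal(0.0, x)   # sd(u | X) = X
    ols_draws.append(slope_ols(x, y))
    # WLS: divide Y = b0 + b1 X + u through by X, then regress Y/X on the
    # regressors 1/X and 1 (the coefficient on the constant is the slope b1).
    w = 1.0 / x
    Z = np.column_stack([w, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(Z, y * w, rcond=None)
    wls_draws.append(coef[1])

print("sd of OLS slope across samples:", np.std(ols_draws))
print("sd of WLS slope across samples:", np.std(wls_draws))
```

Both estimators are centered on the true slope of 2; the WLS draws are simply less spread out, which is the Gauss-Markov efficiency claim applied to the transformed regression.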

The least absolute deviations estimator. As discussed in Section 4.3, the OLS estimator can be sensitive to outliers. If extreme outliers are not rare, then other estimators can be more efficient than OLS and can produce inferences that are more reliable. One such estimator is the least absolute deviations (LAD) estimator, in which the regression coefficients β0 and β1 are obtained by solving a minimization like that in Equation (4.6), except that the absolute value of the prediction "mistake" is used instead of its square. That is, the least absolute deviations estimators of β0 and β1 are the values of b0 and b1 that minimize Σ(i=1 to n) |Yi - b0 - b1Xi|. In practice, this estimator is less sensitive to large outliers than is OLS.

In many economic data sets, severe outliers in u are rare, so use of the LAD estimator, or other estimators with reduced sensitivity to outliers, is uncommon in applications. Thus the treatment of linear regression throughout the remainder of this text focuses exclusively on least squares methods.
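The outlier sensitivity can be seen in a tiny example (a sketch, not from the text). The LAD minimizer is approximated below by a crude iteratively reweighted least squares loop, which is just one simple way to approximately minimize the sum of absolute residuals; it is not the method the text proposes.

```python
import numpy as np

# Sketch (simulated data, not from the text): with one large outlier, the OLS
# slope is pulled far from the true value, while an approximate LAD fit is
# barely affected. LAD is approximated by iteratively reweighted least
# squares: repeated weighted least squares with weights 1/sqrt(|residual|)
# applied to the unsquared residual, so each pass minimizes ~ sum |residual|.
x = np.arange(10, dtype=float)
y = 2.0 * x                      # nine points exactly on the line y = 2x ...
y[9] += 100.0                    # ... plus one large outlier

ols_slope, ols_intercept = np.polyfit(x, y, 1)

b1, b0 = ols_slope, ols_intercept
for _ in range(50):
    r = y - (b0 + b1 * x)
    w = 1.0 / np.sqrt(np.maximum(np.abs(r), 1e-8))   # guard against |r| = 0
    b1, b0 = np.polyfit(x, y, 1, w=w)   # polyfit's w multiplies the residual

print("OLS slope:", ols_slope)          # dragged upward by the outlier
print("approx. LAD slope:", b1)         # stays near the true slope of 2
```

Because nine of the ten points lie exactly on y = 2x, the LAD criterion is minimized by fitting them and absorbing the outlier in a single large absolute residual, whereas squaring the residual makes OLS trade off all ten points.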

5.6 Using the t-Statistic in Regression When the Sample Size Is Small*

When the sample size is small, the exact distribution of the t-statistic is complicated and depends on the unknown population distribution of the data. If, however, the three least squares assumptions hold, the regression errors are homoskedastic, and the regression errors are normally distributed, then the OLS

*This section is optional and is not used in later chapters.


estimator is normally distributed and the homoskedasticity-only t-statistic has a Student t distribution. These five assumptions (the three least squares assumptions, that the errors are homoskedastic, and that the errors are normally distributed) are collectively called the homoskedastic normal regression assumptions.
The t-Statistic and the Student t Distribution

Recall from Section 2.4 that the Student t distribution with m degrees of freedom is defined to be the distribution of Z/sqrt(W/m), where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with m degrees of freedom, and Z and W are independent. Under the null hypothesis, the t-statistic computed using the homoskedasticity-only standard error can be written in this form.

The homoskedasticity-only t-statistic testing β1 = β1,0 is t = (β̂1 - β1,0)/σ̃β̂1, where σ̃²β̂1 is defined in Equation (5.22). Under the homoskedastic normal regression assumptions, Y has a normal distribution, conditional on X1, ..., Xn. As discussed in Section 5.5, the OLS estimator is a weighted average of Y1, ..., Yn, where the weights depend on X1, ..., Xn [see Equation (5.32) in Appendix 5.2]. Because a weighted average of independent normal random variables is normally distributed, β̂1 has a normal distribution, conditional on X1, ..., Xn. Thus (β̂1 - β1,0) has a normal distribution under the null hypothesis, conditional on X1, ..., Xn. In addition, the (normalized) homoskedasticity-only variance estimator has a chi-squared distribution with n - 2 degrees of freedom, divided by n - 2, and σ̃²β̂1 and β̂1 are independently distributed. Consequently, the homoskedasticity-only t-statistic has a Student t distribution with n - 2 degrees of freedom.

This result is closely related to a result discussed in Section 3.5 in the context of testing for the equality of the means in two samples. In that problem, if the two population distributions are normal with the same variance and if the t-statistic is constructed using the pooled standard error formula [Equation (3.23)], then the (pooled) t-statistic has a Student t distribution. When X is binary, the homoskedasticity-only standard error for β̂1 simplifies to the pooled standard error formula for the difference of means. It follows that the result of Section 3.5 is a special case of the result that, if the homoskedastic normal regression assumptions hold, then the homoskedasticity-only regression t-statistic has a Student t distribution (see Exercise 5.10).

Use of the Student t Distribution in Practice

If the regression errors are homoskedastic and normally distributed and if the homoskedasticity-only t-statistic is used, then critical values should be taken from


the Student t distribution (Appendix Table 2) instead of the standard normal distribution. Because the difference between the Student t distribution and the normal distribution is negligible if n is moderate or large, this distinction is relevant only if the sample size is small.

In econometric applications, there is rarely a reason to believe that the errors are homoskedastic and normally distributed. Because sample sizes typically are large, however, inference can proceed as described in Sections 5.1 and 5.2; that is, by first computing heteroskedasticity-robust standard errors, and then using the standard normal distribution to compute p-values, hypothesis tests, and confidence intervals.
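A short simulation (a sketch, not from the text) shows why the distinction matters in small samples. With n = 10 and homoskedastic normal errors, the homoskedasticity-only t-statistic is t-distributed with n - 2 = 8 degrees of freedom, whose 97.5th percentile is 2.306; testing at the 5% level with the normal critical value 1.96 therefore rejects a true null too often.

```python
import numpy as np

# Sketch: Monte Carlo under the null (true slope = 0) with n = 10 and
# homoskedastic normal errors. The homoskedasticity-only t-statistic then has
# a t(8) distribution, so the normal critical value 1.96 over-rejects while
# the t(8) critical value 2.306 gives roughly the nominal 5% rate.
rng = np.random.default_rng(3)
n, reps = 10, 20000
reject_normal_cv = 0
reject_t_cv = 0
for _ in range(reps):
    x = rng.normal(0.0, 1.0, n)
    y = rng.normal(0.0, 1.0, n)          # Y unrelated to X: beta1 = 0
    xd = x - x.mean()
    b1 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    uhat = y - b0 - b1 * x
    s2u = (uhat ** 2).sum() / (n - 2)    # homoskedasticity-only variance
    t = b1 / np.sqrt(s2u / (xd ** 2).sum())
    reject_normal_cv += abs(t) > 1.96    # normal critical value
    reject_t_cv += abs(t) > 2.306        # t(8) critical value

print("rejection rate with normal critical value:", reject_normal_cv / reps)
print("rejection rate with t(8) critical value:  ", reject_t_cv / reps)
```

The first rate comes out well above 5% (the t(8) distribution has fatter tails than the normal), while the second is close to the nominal level.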


5.7 Conclusion


Return for a moment to the problem that started Chapter 4: the superintendent who is considering hiring additional teachers to cut the student-teacher ratio. What have we learned that she might find useful?

Our regression analysis, based on the 420 observations for 1998 in the California test score data set, showed that there was a negative relationship between the student-teacher ratio and test scores: Districts with smaller classes have higher test scores. The coefficient is moderately large, in a practical sense: Districts with 2 fewer students per teacher have, on average, test scores that are 4.6 points higher. This corresponds to moving a district at the 50th percentile of the distribution of test scores to approximately the 60th percentile.

The coefficient on the student-teacher ratio is statistically significantly different from 0 at the 5% significance level. The population coefficient might be 0, and we might simply have estimated our negative coefficient by random sampling variation. However, the probability of doing so (and of obtaining a t-statistic on β1 as large as we did) purely by random variation over potential samples is exceedingly small, approximately 0.001%. A 95% confidence interval for β1 is -3.30 ≤ β1 ≤ -1.26.

This represents considerable progress toward answering the superintendent's question. Yet, a nagging concern remains. There is a negative relationship between the student-teacher ratio and test scores, but is this relationship necessarily the causal one that the superintendent needs to make her decision? Districts with lower student-teacher ratios have, on average, higher test scores. But does this mean that reducing the student-teacher ratio will, in fact, increase scores?


There is, in fact, reason to worry that it might not. Hiring more teachers, after all, costs money, so wealthier school districts can better afford smaller classes. But students at wealthier schools also have other advantages over their poorer neighbors, including better facilities, newer books, and better-paid teachers. Moreover, students at wealthier schools tend themselves to come from more affluent families, and thus have other advantages not directly associated with their school. For example, California has a large immigrant community; these immigrants tend to be poorer than the overall population and, in many cases, their children are not native English speakers. It thus might be that our negative estimated relationship between test scores and the student-teacher ratio is a consequence of large classes being found in conjunction with many other factors that are, in fact, the real cause of the lower test scores.

These other factors, or "omitted variables," could mean that the OLS analysis done so far has little value to the superintendent. Indeed, it could be misleading: Changing the student-teacher ratio alone would not change these other factors that determine a child's performance at school. To address this problem, we need a method that will allow us to isolate the effect on test scores of changing the student-teacher ratio, holding these other factors constant. That method is multiple regression analysis, the topic of Chapters 6 and 7.

Summary

1. Hypothesis testing for regression coefficients is analogous to hypothesis testing for the population mean: Use the t-statistic to calculate the p-values and either accept or reject the null hypothesis. Like a confidence interval for the population mean, a 95% confidence interval for a regression coefficient is computed as the estimator ± 1.96 standard errors.

2. When X is binary, the regression model can be used to estimate and test hypotheses about the difference between the population means of the "X = 0" group and the "X = 1" group.


3. In general the error ui is heteroskedastic; that is, the variance of ui at a given value of Xi, var(ui | Xi = x), depends on x. A special case is when the error is homoskedastic, that is, var(ui | Xi = x) is constant. Homoskedasticity-only standard errors do not produce valid statistical inferences when the errors are heteroskedastic, but heteroskedasticity-robust standard errors do.

4. If the three least squares assumptions hold and if the regression errors are homoskedastic, then, as a result of the Gauss-Markov theorem, the OLS estimator is BLUE.

5. If the three least squares assumptions hold, if the regression errors are homoskedastic, and if the regression errors are normally distributed, then the OLS t-statistic computed using homoskedasticity-only standard errors has a Student t distribution when the null hypothesis is true. The difference between the Student t distribution and the normal distribution is negligible if the sample size is moderate or large.

Key Terms

null hypothesis (150)
two-sided alternative hypothesis (150)
standard error of β̂1 (151)
t-statistic (151)
p-value (151)
confidence interval for β1 (156)
confidence level (156)
indicator variable (158)
dummy variable (158)
coefficient multiplying variable Di (158)
coefficient on Di (158)
heteroskedasticity and homoskedasticity (160)
homoskedasticity-only standard errors (163)
heteroskedasticity-robust standard error (164)
best linear unbiased estimator (BLUE) (168)
Gauss-Markov theorem (168)
weighted least squares (169)
homoskedastic normal regression assumptions (170)
Gauss-Markov conditions (182)


Review the Concepts

5.1 Outline the procedures for computing the p-value of a two-sided test of H0: μY = 0 using an i.i.d. set of observations Yi, i = 1, ..., n. Outline the procedures for computing the p-value of a two-sided test of H0: β1 = 0 in a regression model using an i.i.d. set of observations (Yi, Xi), i = 1, ..., n.

5.2 Explain how you could use a regression model to estimate the wage gender gap using the data on earnings of men and women. What are the dependent and independent variables?

5.3 Define homoskedasticity and heteroskedasticity. Provide a hypothetical empirical example in which you think the errors would be heteroskedastic and explain your reasoning.

Exercises

5.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

    TestScore = 520.4 - 5.82 × CS, R² = 0.08, SER = 11.5.
                (20.4)  (2.21)

a. Construct a 95% confidence interval for β1, the regression slope coefficient.

b. Calculate the p-value for the two-sided test of the null hypothesis H0: β1 = 0. Do you reject the null hypothesis at the 5% level? At the 1% level?

c. Calculate the p-value for the two-sided test of the null hypothesis H0: β1 = -5.6. Without doing any additional calculations, determine whether -5.6 is contained in the 95% confidence interval for β1.

d. Construct a 99% confidence interval for β0.
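The large-sample arithmetic used in exercises of this kind can be checked numerically. The sketch below is not part of the exercise; the helper function is hypothetical and simply applies the normal-approximation formulas of Sections 5.1 and 5.2 to a reported coefficient and standard error (illustrated here with the slope from Exercise 5.1).

```python
import math

def ci_and_pvalue(beta_hat, se, beta_null=0.0, z=1.96):
    """Large-sample 95% confidence interval and two-sided p-value
    for a single regression coefficient (normal approximation)."""
    t = (beta_hat - beta_null) / se
    # standard normal CDF via the error function
    phi = lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
    p_value = 2.0 * phi(-abs(t))
    return (beta_hat - z * se, beta_hat + z * se), p_value

# Applied to the slope reported in Exercise 5.1:
ci, p = ci_and_pvalue(-5.82, 2.21)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), two-sided p-value: {p:.4f}")
```

The same function with `beta_null=-5.6` answers part (c) directly, since a two-sided test at the 5% level rejects exactly when the hypothesized value lies outside the 95% confidence interval.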


5.2 Suppose that a researcher, using wage data on 250 randomly selected male workers and 280 female workers, estimates the OLS regression

    Wage = 12.52 + 2.12 × Male, R² = 0.06, SER = 4.2,
          (0.23)  (0.36)

where Wage is measured in $/hour and Male is a binary variable that is equal to 1 if the person is a male and 0 if the person is a female. Define the wage gender gap as the difference in mean earnings between men and women.

a. What is the estimated gender gap?

b. Is the estimated gender gap significantly different from zero? (Compute the p-value for testing the null hypothesis that there is no gender gap.)

c. Construct a 95% confidence interval for the gender gap.

d. In the sample, what is the mean wage of women? Of men?

e. Another researcher uses these same data, but regresses Wages on Female, a variable that is equal to 1 if the person is female and 0 if the person is male. What are the regression estimates calculated from this regression?

    Wage = ___ + ___ × Female, R² = ___, SER = ___.
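The general principle behind part (e), that re-coding a binary regressor swaps which group is the baseline, can be sketched with simulated data (illustrative only; these are not the data in the exercise, and the numbers below do not answer it).

```python
import numpy as np

# Sketch (simulated data, not the exercise's data): in a regression on a
# binary variable, the intercept is the mean outcome of the omitted group and
# the slope is the difference in group means. Re-coding the dummy
# (Male -> Female = 1 - Male) makes the new intercept beta0 + beta1 and the
# new slope -beta1, while the fitted values are unchanged.
rng = np.random.default_rng(4)
male = (rng.uniform(size=400) < 0.5).astype(float)
wage = 12.0 + 2.0 * male + rng.normal(0.0, 4.0, 400)

def ols(x, y):
    xd = x - x.mean()
    b1 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    return y.mean() - b1 * x.mean(), b1

b0_m, b1_m = ols(male, wage)       # intercept = sample mean wage of women
female = 1.0 - male
b0_f, b1_f = ols(female, wage)     # intercept = sample mean wage of men

print(b0_m, b1_m)
print(b0_f, b1_f)
```

The identities b0_f = b0_m + b1_m and b1_f = -b1_m hold exactly (up to floating-point error), whichever data set is used.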


5.3 Suppose that a random sample of 200 twenty-year-old men is selected from a population and their heights and weights are recorded. A regression of weight on height yields

    Weight = -99.41 + 3.94 × Height, R² = 0.81, SER = 10.2,
             (2.15)   (0.31)

where Weight is measured in pounds and Height is measured in inches. A man has a late growth spurt and grows 1.5 inches over the course of a year. Construct a 99% confidence interval for the person's weight gain.

5.4 Read the box "The Economic Value of a Year of Education: Heteroskedasticity or Homoskedasticity?" in Section 5.4. Use the regression reported in Equation (5.23) to answer the following.

a. A randomly selected 30-year-old worker reports an education level of 16 years. What is the worker's expected average hourly earnings?

b. A high school graduate (12 years of education) is contemplating going to a community college for a two-year degree. How much is this worker's average hourly earnings expected to increase?



c. A high school counselor tells a student tha t. on average, colh:ge gradu ut l.S e arn $10 per ho ur m ore than high school graduates. Is this stat~
ment consiste nt with the regressio n evide nce? What range of Htluc' i'
consiste nt with Lhe regression evidence?

5.5 In the 1980s, Tennessee conducted an experiment in which kindergarten students were randomly assigned to "regular" and "small" classes and given standardized tests at the end of the year. (Regular classes contained approximately 24 students and small classes contained approximately 15 students.) Suppose that, in the population, the standardized tests have a mean score of 925 points and a standard deviation of 75 points. Let SmallClass denote a binary variable equal to 1 if the student is assigned to a small class and equal to 0 otherwise. A regression of TestScore on SmallClass yields

    TestScore = 918.0 + 13.9 × SmallClass, R² = 0.01, SER = 74.6.
                (1.6)   (2.5)

a. Do small classes improve test scores? By how much? Is the effect large? Explain.

b. Is the estimated effect of class size on test scores statistically significant? Carry out a test at the 5% level.

c. Construct a 99% confidence interval for the effect of SmallClass on test scores.
5.6 Refer to the regression described in Exercise 5.5.

a. Do you think that the regression errors plausibly are homoskedastic? Explain.

b. SE(β̂1) was computed using Equation (5.3). Suppose that the regression errors were homoskedastic: Would this affect the validity of the confidence interval constructed in Exercise 5.5(c)? Explain.

5.7 Suppose that (Yi, Xi) satisfy the assumptions in Key Concept 4.3. A random sample of size n = 250 is drawn and yields

    Ŷ = 5.4 + 3.2X, R² = 0.26, SER = 6.2.
       (3.1) (1.5)

a. Test H0: β1 = 0 vs. H1: β1 ≠ 0 at the 5% level.

b. Construct a 95% confidence interval for β1.

c. Suppose you learned that Yi and Xi were independent. Would you be surprised? Explain.

d. Suppose that Yi and Xi are independent and many samples of size n = 250 are drawn, regressions estimated, and (a) and (b) answered. In what fraction of the samples would H0 from (a) be rejected? In what fraction of samples would the value β1 = 0 be included in the confidence interval from (b)?

5.8 Suppose that (Yi, Xi) satisfy the assumptions in Key Concept 4.3 and, in addition, ui is N(0, σ²u) and is independent of Xi. A sample of size n = 30 yields

    Ŷ = 43.2 + 61.5X, R² = 0.54, SER = 1.52,
       (10.2)  (7.4)

where the numbers in parentheses are the homoskedastic-only standard errors for the regression coefficients.

a. Construct a 95% confidence interval for β0.

b. Test H0: β1 = 55 vs. H1: β1 ≠ 55 at the 5% level.

c. Test H0: β1 = 55 vs. H1: β1 > 55 at the 5% level.


5.9 Consider the regression model

    Yi = βXi + ui,

where ui and Xi satisfy the assumptions in Key Concept 4.3. Let β̄ denote an estimator of β that is constructed as β̄ = Ȳ/X̄, where Ȳ and X̄ are the sample means of Yi and Xi, respectively.

a. Show that β̄ is a linear function of Y1, Y2, ..., Yn.

b. Show that β̄ is conditionally unbiased.

5.10 Let Xi denote a binary variable and consider the regression Yi = β0 + β1Xi + ui. Let Ȳ0 denote the sample mean for observations with X = 0 and Ȳ1 denote the sample mean for observations with X = 1. Show that β̂0 = Ȳ0 and β̂1 = Ȳ1 - Ȳ0.

5.11 A random sample of workers contains nm = 120 men and nw = 131 women. The sample average of men's weekly earnings (Ȳm = (1/nm) Σ(i=1 to nm) Ym,i) is $523.10, and the sample standard deviation (sm = sqrt[(1/(nm - 1)) Σ(i=1 to nm) (Ym,i - Ȳm)²]) is $68.10. The corresponding values for women are Ȳw = $485.10 and sw = $51.10. Let Women denote an indicator variable that is equal to 1 for women and 0 for men, and suppose that all 251 observations are used in the regression Yi = β0 + β1 Womeni + ui. Find the OLS estimates of β0 and β1 and their corresponding standard errors.
5.12 Starting from Equation (4.22), derive the variance of β̂0 under homoskedasticity given in Equation (5.28) in Appendix 5.1.

5.13 Suppose that (Yi, Xi) satisfy the assumptions in Key Concept 4.3 and, in addition, ui is N(0, σ²u) and is independent of Xi.

a. Is β̂1 conditionally unbiased?

b. Is β̂1 the best linear conditionally unbiased estimator of β1?

c. How would your answers to (a) and (b) change if you assumed only that (Yi, Xi) satisfied the assumptions in Key Concept 4.3 and var(ui | Xi = x) is constant?

d. How would your answers to (a) and (b) change if you assumed only that (Yi, Xi) satisfied the assumptions in Key Concept 4.3?

5.14 Suppose that $Y_i = \beta X_i + u_i$, where $(u_i, X_i)$ satisfy the Gauss-Markov conditions given in Equation (5.31).

a. Derive the least squares estimator of $\beta$ and show that it is a linear function of $Y_1, \ldots, Y_n$.

b. Show that the estimator is conditionally unbiased.

c. Derive the conditional variance of the estimator.

d. Prove that the estimator is BLUE.


5.15 A researcher has two independent samples of observations on $(Y_i, X_i)$. To be specific, suppose that $Y_i$ denotes earnings, $X_i$ denotes years of schooling, and the independent samples are for men and women. Write the regression for men as $Y_{m,i} = \beta_{m,0} + \beta_{m,1} X_{m,i} + u_{m,i}$ and the regression for women as $Y_{w,i} = \beta_{w,0} + \beta_{w,1} X_{w,i} + u_{w,i}$. Let $\hat{\beta}_{m,1}$ denote the OLS estimator constructed using the sample of men, $\hat{\beta}_{w,1}$ denote the OLS estimator constructed from the sample of women, and $SE(\hat{\beta}_{m,1})$ and $SE(\hat{\beta}_{w,1})$ denote the corresponding standard errors. Show that the standard error of $\hat{\beta}_{m,1} - \hat{\beta}_{w,1}$ is given by

$$SE(\hat{\beta}_{m,1} - \hat{\beta}_{w,1}) = \sqrt{[SE(\hat{\beta}_{m,1})]^2 + [SE(\hat{\beta}_{w,1})]^2}.$$


Empirical Exercises

E5.1 Using the data set CPS04 described in Empirical Exercise 4.1, run a regression of average hourly earnings (AHE) on Age and carry out the following exercises.

a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0: \beta_1 = 0$ versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?

b. Construct a 95% confidence interval for the slope coefficient.

c. Repeat (a) using only the data for high school graduates.

d. Repeat (a) using only the data for college graduates.

e. Is the effect of age on earnings different for high school graduates than for college graduates? Explain. (Hint: See Exercise 5.15.)
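If you are scripting these exercises yourself rather than using a dedicated econometrics package, the slope, its heteroskedasticity-robust standard error from Equation (5.4), the t-statistic, and the large-sample two-sided p-value can all be computed directly. The sketch below uses simulated stand-in data (the CPS04 file itself is not reproduced here, so the variable values are illustrative only):

```python
import math
import random

def ols_with_robust_se(x, y):
    """OLS of y on a constant and x, returning the slope, its
    heteroskedasticity-robust standard error (Equation (5.4)),
    the t-statistic, and the large-sample two-sided p-value."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    beta0 = ybar - beta1 * xbar
    u = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]
    # Robust variance: [(1/n)(1/(n-2)) sum (x_i - xbar)^2 u_i^2] / [(1/n) sum (x_i - xbar)^2]^2
    num = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / (n - 2)
    se1 = math.sqrt(num / n) / (sxx / n)
    t = beta1 / se1
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, standard normal approximation
    return beta1, se1, t, p

# Simulated stand-in for (Age, AHE); replace with the actual CPS04 data.
random.seed(0)
age = [random.uniform(25, 35) for _ in range(500)]
ahe = [5 + 0.5 * a + random.gauss(0, 4) for a in age]
b1, se, t, p = ols_with_robust_se(age, ahe)
print(round(b1, 2), round(t, 2), round(p, 4))
```

A 95% confidence interval for part (b) is then `b1 ± 1.96 * se`, as in Key Concept 5.3.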
E5.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, run a regression of Course_Eval on Beauty. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0: \beta_1 = 0$ versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?

E5.3 Using the data set CollegeDistance described in Empirical Exercise 4.3, run a regression of years of completed education (ED) on distance to the nearest college (Dist) and carry out the following exercises.

a. Is the estimated regression slope coefficient statistically significant? That is, can you reject the null hypothesis $H_0: \beta_1 = 0$ versus a two-sided alternative at the 10%, 5%, or 1% significance level? What is the p-value associated with the coefficient's t-statistic?

b. Construct a 95% confidence interval for the slope coefficient.

c. Run the regression using data only on females and repeat (b).

d. Run the regression using data only on males and repeat (b).

e. Is the effect of distance on completed years of education different for men than for women? (Hint: See Exercise 5.15.)

APPENDIX 5.1

Formulas for OLS Standard Errors

This appendix discusses the formulas for OLS standard errors. These are first presented under the least squares assumptions in Key Concept 4.3, which allow for heteroskedasticity; these are the "heteroskedasticity-robust" standard errors. Formulas for the variances of the OLS estimators and the associated standard errors are then given for the special case of homoskedasticity.

Heteroskedasticity-Robust Standard Errors

The estimator $\hat{\sigma}^2_{\hat{\beta}_1}$ defined in Equation (5.4) is obtained by replacing the population variances in Equation (4.21) by the corresponding sample variances, with a modification. The variance in the numerator of Equation (4.21) is estimated by $\frac{1}{n-2}\sum_{i=1}^{n}(X_i - \bar{X})^2\hat{u}_i^2$, where the divisor $n - 2$ (instead of $n$) incorporates a degrees-of-freedom adjustment to correct for downward bias, analogously to the degrees-of-freedom adjustment used in the definition of the SER in Section 4.3. The variance in the denominator is estimated by $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Replacing $\mathrm{var}[(X_i - \mu_X)u_i]$ and $\mathrm{var}(X_i)$ in Equation (4.21) by these two estimators yields $\hat{\sigma}^2_{\hat{\beta}_1}$ in Equation (5.4). The consistency of heteroskedasticity-robust standard errors is discussed in Section 17.3.

The estimator of the variance of $\hat{\beta}_0$ is

$$\hat{\sigma}^2_{\hat{\beta}_0} = \frac{1}{n} \times \frac{\frac{1}{n-2}\sum_{i=1}^{n}\hat{H}_i^2\hat{u}_i^2}{\left(\frac{1}{n}\sum_{i=1}^{n}\hat{H}_i^2\right)^2}, \tag{5.26}$$

where $\hat{H}_i = 1 - \left(\bar{X}\big/\tfrac{1}{n}\sum_{j=1}^{n}X_j^2\right)X_i$. The standard error of $\hat{\beta}_0$ is $SE(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_0}}$. The reasoning behind the estimator $\hat{\sigma}^2_{\hat{\beta}_0}$ is the same as behind $\hat{\sigma}^2_{\hat{\beta}_1}$ and stems from replacing population expectations with sample averages.
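Equation (5.26) is algebraically the same quantity as the intercept entry of the familiar "sandwich" robust covariance matrix (with the HC1 degrees-of-freedom scaling $n/(n-2)$), and that equivalence is easy to verify numerically. A sketch with simulated heteroskedastic data (the data and names here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(10, 30, n)
y = 2.0 + 0.7 * x + rng.normal(0, 1 + 0.1 * x)  # heteroskedastic errors

# OLS fit of y on a constant and x
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ beta

# Variance of beta0_hat via Equation (5.26): H_i = 1 - (xbar / mean(x^2)) x_i
H = 1 - (x.mean() / np.mean(x**2)) * x
var0_eq526 = (1 / n) * (np.sum(H**2 * u**2) / (n - 2)) / (np.mean(H**2) ** 2)

# The same quantity from the sandwich matrix formula with HC1 scaling
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * (u**2)[:, None]).T @ X
V = n / (n - 2) * XtX_inv @ meat @ XtX_inv

print(np.isclose(var0_eq526, V[0, 0]))
```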

Homoskedasticity-Only Variances

Under homoskedasticity, the conditional variance of $u_i$ given $X_i$ is a constant: $\mathrm{var}(u_i | X_i) = \sigma_u^2$. If the errors are homoskedastic, the formulas in Key Concept 4.4 simplify to

$$\sigma^2_{\hat{\beta}_1} = \frac{\sigma_u^2}{n\sigma_X^2} \quad \text{and} \tag{5.27}$$

$$\sigma^2_{\hat{\beta}_0} = \frac{E(X_i^2)}{n\sigma_X^2}\,\sigma_u^2. \tag{5.28}$$

To derive Equation (5.27), write the numerator in Equation (4.21) as $\mathrm{var}[(X_i - \mu_X)u_i] = E\{[(X_i - \mu_X)u_i - E((X_i - \mu_X)u_i)]^2\} = E\{[(X_i - \mu_X)u_i]^2\} = E[(X_i - \mu_X)^2u_i^2] = E[(X_i - \mu_X)^2\mathrm{var}(u_i | X_i)]$, where the second equality follows because $E[(X_i - \mu_X)u_i] = 0$ (by the first least squares assumption) and where the final equality follows from the law of iterated expectations (Section 2.3). If $u_i$ is homoskedastic, then $\mathrm{var}(u_i | X_i) = \sigma_u^2$, so $E[(X_i - \mu_X)^2\mathrm{var}(u_i | X_i)] = \sigma_u^2 E[(X_i - \mu_X)^2] = \sigma_u^2\sigma_X^2$. The result in Equation (5.27) follows by substituting this expression into the numerator of Equation (4.21) and simplifying. A similar calculation yields Equation (5.28).

Homoskedasticity-Only Standard Errors

The homoskedasticity-only standard errors are obtained by substituting sample means and variances for the population means and variances in Equations (5.27) and (5.28), and by estimating the variance of $u_i$ by the square of the SER. The homoskedasticity-only estimators of these variances are

$$\tilde{\sigma}^2_{\hat{\beta}_1} = \frac{s_{\hat{u}}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{(homoskedasticity-only) and} \tag{5.29}$$

$$\tilde{\sigma}^2_{\hat{\beta}_0} = \frac{\left(\frac{1}{n}\sum_{i=1}^{n}X_i^2\right)s_{\hat{u}}^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2} \quad \text{(homoskedasticity-only)}, \tag{5.30}$$

where $s_{\hat{u}}^2$ is as given in Equation (4.19). The homoskedasticity-only standard errors are the square roots of $\tilde{\sigma}^2_{\hat{\beta}_0}$ and $\tilde{\sigma}^2_{\hat{\beta}_1}$.
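Equations (5.29) and (5.30) translate directly into code. A minimal sketch with simulated homoskedastic data (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # homoskedastic errors

# OLS fit
X = np.column_stack([np.ones(n), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)
u = y - X @ beta

s2 = np.sum(u**2) / (n - 2)               # square of the SER, Equation (4.19)
sxx = np.sum((x - x.mean()) ** 2)

# Homoskedasticity-only variance estimators, Equations (5.29) and (5.30)
var1_homo = s2 / sxx
var0_homo = np.mean(x**2) * s2 / sxx

se0, se1 = np.sqrt(var0_homo), np.sqrt(var1_homo)
print(se0, se1)
```

Under homoskedastic errors, as here, these standard errors and the robust ones of Equations (5.4) and (5.26) estimate the same quantities; they diverge when the errors are heteroskedastic.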

APPENDIX 5.2

The Gauss-Markov Conditions and a Proof of the Gauss-Markov Theorem

As discussed in Section 5.5, the Gauss-Markov theorem states that if the Gauss-Markov conditions hold, then the OLS estimator is the best (most efficient) conditionally linear unbiased estimator (is BLUE). This appendix begins by stating the Gauss-Markov conditions and showing that they are implied by the three least squares conditions plus homoskedasticity. We next show that the OLS estimator is a linear conditionally unbiased estimator. Finally, we turn to the proof of the theorem.

The Gauss-Markov Conditions

The three Gauss-Markov conditions are

$$\begin{aligned}
\text{(i)}\quad & E(u_i | X_1, \ldots, X_n) = 0, \\
\text{(ii)}\quad & \mathrm{var}(u_i | X_1, \ldots, X_n) = \sigma_u^2, \quad 0 < \sigma_u^2 < \infty, \\
\text{(iii)}\quad & E(u_iu_j | X_1, \ldots, X_n) = 0, \quad i \neq j,
\end{aligned} \tag{5.31}$$

where the conditions hold for $i, j = 1, \ldots, n$. The three conditions, respectively, state that $u_i$ has mean zero, that $u_i$ has a constant variance, and that the errors are uncorrelated for different observations, where all these statements hold conditionally on all observed $X$'s $(X_1, \ldots, X_n)$.

The Gauss-Markov conditions are implied by the three least squares assumptions (Key Concept 4.3), plus the additional assumption that the errors are homoskedastic. Because the observations are i.i.d. (Assumption 2), $E(u_i | X_1, \ldots, X_n) = E(u_i | X_i)$, and by Assumption 1, $E(u_i | X_i) = 0$; thus condition (i) holds. Similarly, by Assumption 2, $\mathrm{var}(u_i | X_1, \ldots, X_n) = \mathrm{var}(u_i | X_i)$, and because the errors are assumed to be homoskedastic, $\mathrm{var}(u_i | X_i) = \sigma_u^2$, which is constant. Assumption 3 (nonzero finite fourth moments) ensures that $0 < \sigma_u^2 < \infty$, so condition (ii) holds. To show that condition (iii) is implied by the least squares assumptions, note that $E(u_iu_j | X_1, \ldots, X_n) = E(u_iu_j | X_i, X_j)$ because $(X_i, Y_i)$ are i.i.d. by Assumption 2. Assumption 2 also implies that $E(u_iu_j | X_i, X_j) = E(u_i | X_i)E(u_j | X_j)$ for $i \neq j$; because $E(u_i | X_i) = 0$ for all $i$, it follows that $E(u_iu_j | X_1, \ldots, X_n) = 0$ for all $i \neq j$, so condition (iii) holds. Thus, the least squares assumptions in Key Concept 4.3, plus homoskedasticity of the errors, imply the Gauss-Markov conditions in Equation (5.31).

The OLS Estimator $\hat{\beta}_1$ Is a Linear Conditionally Unbiased Estimator

To show that $\hat{\beta}_1$ is linear, first note that, because $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$ (by the definition of $\bar{X}$), $\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n}(X_i - \bar{X})Y_i - \bar{Y}\sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n}(X_i - \bar{X})Y_i$. Substituting this result into the formula for $\hat{\beta}_1$ in Equation (4.7) yields

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})Y_i}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \sum_{i=1}^{n}\hat{a}_iY_i, \quad \text{where } \hat{a}_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n}(X_j - \bar{X})^2}. \tag{5.32}$$

Because the weights $\hat{a}_i$, $i = 1, \ldots, n$, in Equation (5.32) depend on $X_1, \ldots, X_n$ but not on $Y_1, \ldots, Y_n$, the OLS estimator $\hat{\beta}_1$ is a linear estimator.

Under the Gauss-Markov conditions, $\hat{\beta}_1$ is conditionally unbiased, and the variance of the conditional distribution of $\hat{\beta}_1$ given $X_1, \ldots, X_n$ is

$$\mathrm{var}(\hat{\beta}_1 | X_1, \ldots, X_n) = \frac{\sigma_u^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}. \tag{5.33}$$

The result that $\hat{\beta}_1$ is conditionally unbiased was previously shown in Appendix 4.3.
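The linearity claim in Equation (5.32) is easy to see numerically: the weights $\hat{a}_i$ depend only on the $X$'s, and weighting the $Y$'s by them reproduces the usual OLS slope. A small sketch with simulated data (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(5, 2, 100)
y = 3.0 - 1.5 * x + rng.normal(0, 1, 100)

# Weights a_hat_i from Equation (5.32): they depend on x only.
a_hat = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

beta1_as_weighted_sum = np.sum(a_hat * y)   # beta1_hat as a linear function of the Y's
beta1_usual = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(np.isclose(beta1_as_weighted_sum, beta1_usual),
      np.isclose(np.sum(a_hat), 0),        # sum a_i = 0  (cf. Equation (5.35))
      np.isclose(np.sum(a_hat * x), 1))    # sum a_i X_i = 1
```

The last two checks show that the OLS weights satisfy the unbiasedness restrictions that appear as Equation (5.35) in the proof below.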

Proof of the Gauss-Markov Theorem

We start by deriving some facts that hold for all linear conditionally unbiased estimators, that is, for all estimators $\tilde{\beta}_1$ satisfying Equations (5.24) and (5.25). Substituting $Y_i = \beta_0 + \beta_1X_i + u_i$ into $\tilde{\beta}_1 = \sum_{i=1}^{n}a_iY_i$ and collecting terms, we have that

$$\tilde{\beta}_1 = \beta_0\left(\sum_{i=1}^{n}a_i\right) + \beta_1\left(\sum_{i=1}^{n}a_iX_i\right) + \sum_{i=1}^{n}a_iu_i. \tag{5.34}$$

By the first Gauss-Markov condition, $E\left(\sum_{i=1}^{n}a_iu_i \,\middle|\, X_1, \ldots, X_n\right) = \sum_{i=1}^{n}a_iE(u_i | X_1, \ldots, X_n) = 0$; thus, taking conditional expectations of both sides of Equation (5.34) yields $E(\tilde{\beta}_1 | X_1, \ldots, X_n) = \beta_0\left(\sum_{i=1}^{n}a_i\right) + \beta_1\left(\sum_{i=1}^{n}a_iX_i\right)$. Because $\tilde{\beta}_1$ is conditionally unbiased by assumption, it must be that $\beta_0\left(\sum_{i=1}^{n}a_i\right) + \beta_1\left(\sum_{i=1}^{n}a_iX_i\right) = \beta_1$; but for this equality to hold for all values of $\beta_0$ and $\beta_1$, it must be the case that, for $\tilde{\beta}_1$ to be conditionally unbiased,

$$\sum_{i=1}^{n}a_i = 0 \quad \text{and} \quad \sum_{i=1}^{n}a_iX_i = 1. \tag{5.35}$$

Under the Gauss-Markov conditions, the variance of $\tilde{\beta}_1$, conditional on $X_1, \ldots, X_n$, has a simple form. Substituting Equation (5.35) into Equation (5.34) yields $\tilde{\beta}_1 - \beta_1 = \sum_{i=1}^{n}a_iu_i$. Thus $\mathrm{var}(\tilde{\beta}_1 | X_1, \ldots, X_n) = \mathrm{var}\left(\sum_{i=1}^{n}a_iu_i \,\middle|\, X_1, \ldots, X_n\right) = \sum_{i=1}^{n}\sum_{j=1}^{n}a_ia_j\mathrm{cov}(u_i, u_j | X_1, \ldots, X_n)$; applying the second and third Gauss-Markov conditions, the cross terms in the double summation vanish and the expression for the conditional variance simplifies to

$$\mathrm{var}(\tilde{\beta}_1 | X_1, \ldots, X_n) = \sigma_u^2\sum_{i=1}^{n}a_i^2. \tag{5.36}$$

Note that Equations (5.35) and (5.36) apply to $\hat{\beta}_1$ with weights $a_i = \hat{a}_i$, given in Equation (5.32).

We now show that the two restrictions in Equation (5.35) and the expression for the conditional variance in Equation (5.36) imply that the conditional variance of $\tilde{\beta}_1$ exceeds the conditional variance of $\hat{\beta}_1$ unless $\tilde{\beta}_1 = \hat{\beta}_1$. Let $a_i = \hat{a}_i + d_i$, so $\sum_{i=1}^{n}a_i^2 = \sum_{i=1}^{n}\hat{a}_i^2 + 2\sum_{i=1}^{n}\hat{a}_id_i + \sum_{i=1}^{n}d_i^2$. Using the definition of $\hat{a}_i$, we have that

$$\sum_{i=1}^{n}\hat{a}_id_i = \frac{\sum_{i=1}^{n}(X_i - \bar{X})d_i}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = \frac{\sum_{i=1}^{n}d_iX_i - \bar{X}\sum_{i=1}^{n}d_i}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = \frac{\left(\sum_{i=1}^{n}a_iX_i - \sum_{i=1}^{n}\hat{a}_iX_i\right) - \bar{X}\left(\sum_{i=1}^{n}a_i - \sum_{i=1}^{n}\hat{a}_i\right)}{\sum_{j=1}^{n}(X_j - \bar{X})^2} = 0,$$

where the final equality follows from Equation (5.35) (which holds for both $a_i$ and $\hat{a}_i$). Thus $\sigma_u^2\sum_{i=1}^{n}a_i^2 = \sigma_u^2\sum_{i=1}^{n}\hat{a}_i^2 + \sigma_u^2\sum_{i=1}^{n}d_i^2 = \mathrm{var}(\hat{\beta}_1 | X_1, \ldots, X_n) + \sigma_u^2\sum_{i=1}^{n}d_i^2$; substituting this result into Equation (5.36) yields

$$\mathrm{var}(\tilde{\beta}_1 | X_1, \ldots, X_n) - \mathrm{var}(\hat{\beta}_1 | X_1, \ldots, X_n) = \sigma_u^2\sum_{i=1}^{n}d_i^2. \tag{5.37}$$

Thus $\tilde{\beta}_1$ has a greater conditional variance than $\hat{\beta}_1$ if $d_i$ is nonzero for any $i = 1, \ldots, n$. But if $d_i = 0$ for all $i$, then $a_i = \hat{a}_i$ and $\tilde{\beta}_1 = \hat{\beta}_1$, which proves that OLS is BLUE.
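The inequality proved above can be watched numerically. In this sketch (simulated $X$'s; the rival estimator is my own illustrative choice), the alternative linear estimator uses only the two extreme observations. Its weights satisfy the unbiasedness restrictions in Equation (5.35), so by Equation (5.36) comparing $\sum a_i^2$ compares the conditional variances, and OLS always comes out no larger:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(5, 2, 100)
n = len(x)

# OLS weights a_hat_i from Equation (5.32)
a_ols = (x - x.mean()) / np.sum((x - x.mean()) ** 2)

# A rival linear conditionally unbiased estimator: fit a line through
# the two extreme observations only. Check: sum a_i = 0, sum a_i X_i = 1.
a_alt = np.zeros(n)
i_min, i_max = np.argmin(x), np.argmax(x)
a_alt[i_max] = 1 / (x[i_max] - x[i_min])
a_alt[i_min] = -1 / (x[i_max] - x[i_min])

# Conditional variances are sigma_u^2 * sum(a_i^2) by Equation (5.36),
# so the comparison does not depend on sigma_u^2.
print(np.sum(a_alt**2) >= np.sum(a_ols**2))   # OLS wins, as the theorem says
```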

The Gauss-Markov Theorem When X Is Nonrandom

With a minor change in interpretation, the Gauss-Markov theorem also applies to nonrandom regressors; that is, it applies to regressors that do not change their values over repeated samples. Specifically, if the second least squares assumption is replaced by the assumption that $X_1, \ldots, X_n$ are nonrandom (fixed over repeated samples) and $u_1, \ldots, u_n$ are i.i.d., then the foregoing statement and proof of the Gauss-Markov theorem apply directly, except that all of the "conditional on $X_1, \ldots, X_n$" statements are unnecessary because $X_1, \ldots, X_n$ take on the same values from one sample to the next.
The Sample Average Is the Efficient Linear Estimator of E(Y)

An implication of the Gauss-Markov theorem is that the sample average, $\bar{Y}$, is the most efficient linear estimator of $E(Y_i)$ when $Y_1, \ldots, Y_n$ are i.i.d. To see this, consider the case of regression without an "X," so that the only regressor is the constant regressor $X_{0i} = 1$. Then the OLS estimator is $\hat{\beta}_0 = \bar{Y}$. It follows that, under the Gauss-Markov assumptions, $\bar{Y}$ is BLUE. Note that the Gauss-Markov requirement that the error be homoskedastic is irrelevant in this case because there is no regressor, so it follows that $\bar{Y}$ is BLUE if $Y_1, \ldots, Y_n$ are i.i.d. This result was stated previously in Key Concept 3.3.

CHAPTER 6

Linear Regression with Multiple Regressors

Chapter 5 ended on a worried note. Although school districts with lower student-teacher ratios tend to have higher test scores in the California data set, perhaps students from districts with small classes have other advantages that help them perform well on standardized tests. Could this have produced misleading results and, if so, what can be done?

Omitted factors, such as student characteristics, can in fact make the ordinary least squares (OLS) estimator of the effect of class size on test scores misleading or, more precisely, biased. This chapter explains this "omitted variable bias" and introduces multiple regression, a method that can eliminate omitted variable bias. The key idea of multiple regression is that, if we have data on these omitted variables, then we can include them as additional regressors and thereby estimate the effect of one regressor (the student-teacher ratio) while holding constant the other variables (such as student characteristics).

This chapter explains how to estimate the coefficients of the multiple linear regression model. Many aspects of multiple regression parallel those of regression with a single regressor, studied in Chapters 4 and 5. The coefficients of the multiple regression model can be estimated from data using OLS; the OLS estimators in multiple regression are random variables because they depend on data from a random sample; and in large samples the sampling distributions of the OLS estimators are approximately normal.

6.1 Omitted Variable Bias

By focusing only on the student-teacher ratio, the empirical analysis in Chapters 4 and 5 ignored some potentially important determinants of test scores by collecting their influences in the regression error term. These omitted factors include school characteristics, such as teacher quality and computer usage, and student characteristics, such as family background. We begin by considering an omitted student characteristic that is particularly relevant to California because of its large immigrant population: the prevalence in the school district of students who are still learning English.

By ignoring the percentage of English learners in the district, the OLS estimator of the slope in the regression of test scores on the student-teacher ratio could be biased; that is, the mean of the sampling distribution of the OLS estimator might not equal the true effect on test scores of a unit change in the student-teacher ratio. Here is the reasoning. Students who are still learning English might perform worse on standardized tests than native English speakers. If districts with large classes also have many students still learning English, then the OLS regression of test scores on the student-teacher ratio could erroneously find a correlation and produce a large estimated coefficient, when in fact the true causal effect of cutting class sizes on test scores is small, even zero. Accordingly, based on the analysis of Chapters 4 and 5, the superintendent might hire enough new teachers to reduce the student-teacher ratio by two, but her hoped-for improvement in test scores will fail to materialize if the true coefficient is small or zero.

A look at the California data lends credence to this concern. The correlation between the student-teacher ratio and the percentage of English learners (students who are not native English speakers and who have not yet mastered English) in the district is 0.19. This small but positive correlation suggests that districts with more English learners tend to have a higher student-teacher ratio (larger classes). If the student-teacher ratio were unrelated to the percentage of English learners, then it would be safe to ignore English proficiency in the regression of test scores against the student-teacher ratio. But because the student-teacher ratio and the percentage of English learners are correlated, it is possible that the OLS coefficient in the regression of test scores on the student-teacher ratio reflects that influence.

Definition of Omitted Variable Bias

If the regressor (the student-teacher ratio) is correlated with a variable that has been omitted from the analysis (the percentage of English learners) and that determines, in part, the dependent variable (test scores), then the OLS estimator will have omitted variable bias.

Omitted variable bias occurs when two conditions are true: (1) the omitted variable is correlated with the included regressor; and (2) the omitted variable is a determinant of the dependent variable. To illustrate these conditions, consider three examples of variables that are omitted from the regression of test scores on the student-teacher ratio.

188 CHAPTER 6 Linear Regression with Multiple Regressors

Example #1: Percentage of English learners. Because the percentage of English learners is correlated with the student-teacher ratio, the first condition for omitted variable bias holds. It is plausible that students who are still learning English will do worse on standardized tests than native English speakers, in which case the percentage of English learners is a determinant of test scores and the second condition for omitted variable bias holds. Thus, the OLS estimator in the regression of test scores on the student-teacher ratio could incorrectly reflect the influence of the omitted variable, the percentage of English learners. That is, omitting the percentage of English learners may introduce omitted variable bias.

Example #2: Time of day of the test. Another variable omitted from the analysis is the time of day that the test was administered. For this omitted variable, it is plausible that the first condition for omitted variable bias does not hold but the second condition does. For example, if the time of day of the test varies from one district to the next in a way that is unrelated to class size, then the time of day and class size would be uncorrelated so the first condition does not hold. Conversely, the time of day of the test could affect scores (alertness varies through the school day), so the second condition holds. However, because in this example the time that the test is administered is uncorrelated with the student-teacher ratio, the student-teacher ratio could not be incorrectly picking up the "time of day" effect. Thus omitting the time of day of the test does not result in omitted variable bias.

Example #3: Parking lot space per pupil. Another omitted variable is parking lot space per pupil (the area of the teacher parking lot divided by the number of students). This variable satisfies the first but not the second condition for omitted variable bias. Specifically, schools with more teachers per pupil probably have more teacher parking space, so the first condition would be satisfied. However, under the assumption that learning takes place in the classroom, not the parking lot, parking lot space has no direct effect on learning; thus the second condition does not hold. Because parking lot space per pupil is not a determinant of test scores, omitting it from the analysis does not lead to omitted variable bias.

Omitted variable bias is summarized in Key Concept 6.1.

Omitted variable bias and the first least squares assumption. Omitted variable bias means that the first least squares assumption, that $E(u_i | X_i) = 0$ as listed in Key Concept 4.3, is incorrect. To see why, recall that the error term $u_i$ in the linear regression model with a single regressor represents all factors, other than $X_i$, that are determinants of $Y_i$. If one of these other factors is correlated with $X_i$,

KEY CONCEPT 6.1

OMITTED VARIABLE BIAS IN REGRESSION WITH A SINGLE REGRESSOR

Omitted variable bias is the bias in the OLS estimator that arises when the regressor, X, is correlated with an omitted variable. For omitted variable bias to occur, two conditions must be true:

1. X is correlated with the omitted variable.

2. The omitted variable is a determinant of the dependent variable, Y.



this means that the error term (which contains this factor) is correlated with $X_i$. In other words, if an omitted variable is a determinant of $Y_i$, then it is in the error term, and if it is correlated with $X_i$, then the error term is correlated with $X_i$. Because $u_i$ and $X_i$ are correlated, the conditional mean of $u_i$ given $X_i$ is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: The OLS estimator is biased. This bias does not vanish even in very large samples, and the OLS estimator is inconsistent.

A Formula for Omitted Variable Bias

The discussion of the previous section about omitted variable bias can be summarized mathematically by a formula for this bias. Let the correlation between $X_i$ and $u_i$ be $\mathrm{corr}(X_i, u_i) = \rho_{Xu}$. Suppose that the second and third least squares assumptions hold, but the first does not because $\rho_{Xu}$ is nonzero. Then the OLS estimator has the limit (derived in Appendix 6.1)

$$\hat{\beta}_1 \xrightarrow{\ p\ } \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}. \tag{6.1}$$

That is, as the sample size increases, $\hat{\beta}_1$ is close to $\beta_1 + \rho_{Xu}(\sigma_u/\sigma_X)$ with increasingly high probability.

The formula in Equation (6.1) summarizes several of the ideas discussed above about omitted variable bias:

1. Omitted variable bias is a problem whether the sample size is large or small. Because $\hat{\beta}_1$ does not converge in probability to the true value $\beta_1$, $\hat{\beta}_1$ is inconsistent; that is, $\hat{\beta}_1$ is not a consistent estimator of $\beta_1$ when there is omitted variable bias. The term $\rho_{Xu}(\sigma_u/\sigma_X)$ in Equation (6.1) is the bias in $\hat{\beta}_1$ that persists even in large samples.

The Mozart Effect: Omitted Variable Bias?

A study published in Nature in 1993 (Rauscher, Shaw, and Ky, 1993) suggested that listening to Mozart for 10-15 minutes could temporarily raise your IQ by 8 or 9 points. That study made big news, and politicians and parents saw an easy way to make their children smarter. For a while, the state of Georgia even distributed classical music CDs to all infants in the state.

What is the evidence for the "Mozart effect"? A review of dozens of studies found that students who take optional music or arts courses in high school do in fact have higher English and math test scores than those who don't.¹ A closer look at these studies, however, suggests that the real reason for the better test performance has little to do with those courses. Instead, the authors of the review suggested that the correlation between testing well and taking art or music could arise from any number of things. For example, the academically better students might have more time to take optional music courses or more interest in doing so, or those schools with a deeper music curriculum might just be better schools across the board.

In the terminology of regression, the estimated relationship between test scores and taking optional music courses appears to have omitted variable bias. By omitting factors such as the student's innate ability or the overall quality of the school, studying music appears to have an effect on test scores when in fact it has none.

So is there a Mozart effect? One way to find out is to do a randomized controlled experiment. (As discussed in Chapter 4, randomized controlled experiments eliminate omitted variable bias by randomly assigning participants to "treatment" and "control" groups.) Taken together, the many controlled experiments on the Mozart effect fail to show that listening to Mozart improves IQ or general test performance. For reasons not fully understood, however, it seems that listening to classical music does help temporarily in one narrow area: folding paper and visualizing shapes. So the next time you cram for an origami exam, try to fit in a little Mozart, too.

¹See the Journal of Aesthetic Education 34: 3-4 (Fall/Winter 2000), especially the article by Ellen Winner and Monica Cooper (pp. 11-76) and the one by Lois Hetland (pp. 105-148).

2. Whether this bias is large or small in practice depends on the correlation $\rho_{Xu}$ between the regressor and the error term. The larger is $|\rho_{Xu}|$, the larger is the bias.

3. The direction of the bias in $\hat{\beta}_1$ depends on whether $X$ and $u$ are positively or negatively correlated. For example, we speculated that the percentage of students learning English has a negative effect on district test scores (students still learning English have lower scores), so that the percentage of English learners enters the error term with a negative sign. In our data, the fraction of English learners is positively correlated with the student-teacher ratio


(districts with more English learners have larger classes). Thus the student-teacher ratio ($X$) would be negatively correlated with the error term ($u$), so $\rho_{Xu} < 0$ and the coefficient on the student-teacher ratio $\hat{\beta}_1$ would be biased toward a negative number. In other words, having a small percentage of English learners is associated both with high test scores and low student-teacher ratios, so one reason that the OLS estimator suggests that small classes improve test scores may be that the districts with small classes have fewer English learners.
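Equation (6.1) can be illustrated with a short simulation. The sketch below (all parameter values are my own illustrative choices) draws $(X_i, u_i)$ with a chosen correlation $\rho_{Xu}$ and shows that in a large sample the OLS slope lands near $\beta_1 + \rho_{Xu}(\sigma_u/\sigma_X)$ rather than near the true $\beta_1$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta1, rho = 1.0, 0.5
sigma_x, sigma_u = 2.0, 4.0

# Draw (X, u) with corr(X, u) = rho and the stated standard deviations,
# then generate Y = beta0 + beta1*X + u.
x = rng.normal(0, sigma_x, n)
u = rho * (sigma_u / sigma_x) * x + rng.normal(0, sigma_u * np.sqrt(1 - rho**2), n)
y = 3.0 + beta1 * x + u

beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
plim = beta1 + rho * sigma_u / sigma_x     # the limit in Equation (6.1)
print(round(beta1_hat, 2), plim)
```

Here the true slope is 1.0 but the estimator concentrates around 2.0, and increasing $n$ does not shrink the gap, which is the sense in which the bias "persists even in large samples."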

Addressing Omitted Variable Bias by Dividing the Data into Groups

What can you do about omitted variable bias? Our superintendent is considering increasing the number of teachers in her district, but she has no control over the fraction of immigrants in her community. As a result, she is interested in the effect of the student-teacher ratio on test scores, holding constant other factors, including the percentage of English learners. This new way of posing her question suggests that, instead of using data for all districts, perhaps we should focus on districts with percentages of English learners comparable to hers. Among this subset of districts, do those with smaller classes do better on standardized tests?

Table 6.1 reports evidence on the relationship between class size and test scores within districts with comparable percentages of English learners. Districts

TABLE 6.1 Differences in Test Scores for California School Districts with Low and High Student-Teacher Ratios, by the Percentage of English Learners in the District

                                  Student-Teacher Ratio < 20   Student-Teacher Ratio >= 20   Difference in Test Scores, Low vs. High STR
                                  Average Test Score      n    Average Test Score      n     Difference   t-statistic
All districts                     657.4                 238    650.0                 182     7.4          4.04
Percentage of English learners
  < 1.9%                          664.5                  76    665.4                  27     -0.9         -0.30
  1.9-8.8%                        665.2                  64    661.8                  44     3.3          1.13
  8.8-23.0%                       654.9                  54    649.7                  50     5.2          1.72
  > 23.0%                         636.7                  44    634.8                  61     1.9          0.68
rtl tiO


are divided into eight groups. First, the districts are broken into four categories that correspond to the quartiles of the distribution of the percentage of English learners across districts. Second, within each of these four categories, districts are further broken down into two groups, depending on whether the student-teacher ratio is small (STR < 20) or large (STR >= 20).

The first row in Table 6.1 reports the overall difference in average test scores between districts with low and high student-teacher ratios, that is, the difference in test scores between these two groups without breaking them down further into the quartiles of English learners. (Recall that this difference was previously reported in regression form in Equation (5.18) as the OLS estimate of the coefficient on $D_i$ in the regression of TestScore on $D_i$, where $D_i$ is a binary regressor that equals 1 if $STR_i < 20$ and equals 0 otherwise.) Over the full sample of 420 districts, the average test score is 7.4 points higher in districts with a low student-teacher ratio than a high one; the t-statistic is 4.04, so the null hypothesis that the mean test score is the same in the two groups is rejected at the 1% significance level.

The final four rows in Table 6.1 report the difference in test scores between districts with low and high student-teacher ratios, broken down by the quartile of the percentage of English learners. This evidence presents a different picture. Of the districts with the fewest English learners (< 1.9%), the average test score for the 76 districts with low student-teacher ratios is 664.5 and the average for the 27 with high student-teacher ratios is 665.4. Thus, for the districts with the fewest English learners, test scores were on average 0.9 points lower in the districts with low student-teacher ratios! In the second quartile, districts with low student-teacher ratios had test scores that averaged 3.3 points higher than those with high student-teacher ratios; this gap was 5.2 points for the third quartile and only 1.9 points for the quartile of districts with the most English learners. Once we hold the percentage of English learners constant, the difference in performance between districts with high and low student-teacher ratios is perhaps half (or less) of the overall estimate of 7.4 points.

At first this finding might seem puzzling. How can the overall effect on test scores be twice the effect on test scores within any quartile? The answer is that the districts with the most English learners tend to have both the highest student-teacher ratios and the lowest test scores. The difference in the average test score between districts in the lowest and highest quartile of the percentage of English learners is large, approximately 30 points. The districts with few English learners tend to have lower student-teacher ratios: 74% (76 of 103) of the districts in the first quartile of English learners have small classes (STR < 20), while only 42% (44 of 105) of the districts in the quartile with the most English learners have small classes. So, the districts with the most English learners have both lower test scores and higher student-teacher ratios than the other districts.

This analysis reinforces the superintendent's worry that omitted variable bias is present in the regression of test scores against the student-teacher ratio. By looking within quartiles of the percentage of English learners, the test score differences in the second part of Table 6.1 improve upon the simple difference-of-means analysis in the first line of Table 6.1. Still, this analysis does not yet provide the superintendent with a useful estimate of the effect on test scores of changing class size, holding constant the fraction of English learners. Such an estimate can be provided, however, using the method of multiple regression.


6.2 The Multiple Regression Model

The multiple regression model extends the single-variable regression model of Chapters 4 and 5 to include additional variables as regressors. This model permits estimating the effect on Y_i of changing one variable (X_1i) while holding the other regressors (X_2i, X_3i, and so forth) constant. In the class size problem, the multiple regression model provides a way to isolate the effect on test scores (Y_i) of the student-teacher ratio (X_1i) while holding constant the percentage of students in the district who are English learners (X_2i).

The Population Regression Line

Suppose for the moment that there are only two independent variables, X_1i and X_2i. In the linear multiple regression model, the average relationship between these two independent variables and the dependent variable, Y, is given by the linear function

E(Y_i | X_1i = x_1, X_2i = x_2) = β_0 + β_1x_1 + β_2x_2,  (6.2)

where E(Y_i | X_1i = x_1, X_2i = x_2) is the conditional expectation of Y_i given that X_1i = x_1 and X_2i = x_2. That is, if the student-teacher ratio in the i-th district (X_1i) equals some value x_1 and the percentage of English learners in the i-th district (X_2i) equals x_2, then the expected value of Y_i given the student-teacher ratio and the percentage of English learners is given by Equation (6.2).
Equation (6.2) is the population regression line or population regression function in the multiple regression model. The coefficient β_0 is the intercept, the coefficient β_1 is the slope coefficient of X_1i or, more simply, the coefficient on X_1i, and the coefficient β_2 is the slope coefficient of X_2i or, more simply, the coefficient on X_2i. One or more of the independent variables in the multiple regression model are sometimes referred to as control variables.

CHAPTER 6  Linear Regression with Multiple Regressors

The interpretation of the coefficient β_1 in Equation (6.2) is different than it was when X_1i was the only regressor: In Equation (6.2), β_1 is the effect on Y of a unit change in X_1, holding X_2 constant or controlling for X_2.

This interpretation of β_1 follows from the definition that the expected effect on Y of a change in X_1, ΔX_1, holding X_2 constant, is the difference between the expected value of Y when the independent variables take on the values X_1 + ΔX_1 and X_2 and the expected value of Y when the independent variables take on the values X_1 and X_2. Accordingly, write the population regression function in Equation (6.2) as Y = β_0 + β_1X_1 + β_2X_2, and imagine changing X_1 by the amount ΔX_1 while not changing X_2, that is, while holding X_2 constant. Because X_1 has changed, Y will change by some amount, say ΔY. After this change, the new value of Y, Y + ΔY, is

Y + ΔY = β_0 + β_1(X_1 + ΔX_1) + β_2X_2.  (6.3)

An equation for ΔY in terms of ΔX_1 is obtained by subtracting the equation Y = β_0 + β_1X_1 + β_2X_2 from Equation (6.3), yielding ΔY = β_1ΔX_1. That is,

β_1 = ΔY/ΔX_1, holding X_2 constant.  (6.4)

The coefficient β_1 is the effect on Y (the expected change in Y) of a unit change in X_1, holding X_2 fixed. Another phrase used to describe β_1 is the partial effect on Y of X_1, holding X_2 fixed.

The interpretation of the intercept in the multiple regression model, β_0, is similar to the interpretation of the intercept in the single-regressor model: It is the expected value of Y_i when X_1i and X_2i are zero. Simply put, the intercept β_0 determines how far up the Y axis the population regression line starts.
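The algebra behind Equations (6.3) and (6.4) can be checked numerically. The sketch below uses the coefficient values that will appear later, in the estimated Equation (6.12), purely for concreteness; any values would do, since the point is only that the ratio ΔY/ΔX_1 recovers β_1 exactly when X_2 is held fixed.

```python
# Verify beta_1 = (change in Y) / (change in X1), holding X2 constant,
# for a linear population regression function as in Equation (6.2).
# Coefficient values are taken from Equation (6.12) just for illustration.

def pop_regression(x1, x2, b0=686.0, b1=-1.10, b2=-0.65):
    """Population regression function: E[Y | X1 = x1, X2 = x2]."""
    return b0 + b1 * x1 + b2 * x2

x1, x2 = 20.0, 10.0   # a student-teacher ratio and a percentage of English learners
dx1 = 2.0             # change X1 by 2 units, holding X2 fixed at 10.0

y_before = pop_regression(x1, x2)
y_after = pop_regression(x1 + dx1, x2)
dy = y_after - y_before

print(dy / dx1)  # equals b1 = -1.10 (up to floating-point error), as Equation (6.4) says
```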

The Population Multiple Regression Model


l11e population regression line in Equation (6.2) is the relationship between }'and
X 1 and X 2 that ho lds on average in the popuJation. Just as in the case of rcgr~'\:.H.JO
with a single regrCS$Or. however, this relationship does not hold e xactly because
many other factors influence the dependent var iable. In addition to the .;tu
dent- teacher ratio and the fraction of students still learning English. for example
test scores are influenced by school chara cteristics, other student cb aractc;:ristic~
and luck. Thus the population regression function in Equalion (6.2) needs to be:
augmcnt\!d to incorporate these additional factors.
Just as in the case of regression with a single regressor, the factors that determine Y_i in addition to X_1i and X_2i are incorporated into Equation (6.2) as an "error" term u_i. This error term is the deviation of a particular observation (test scores in the i-th district in our example) from the average population relationship. Accordingly, we have

Y_i = β_0 + β_1X_1i + β_2X_2i + u_i,  i = 1, ..., n,  (6.5)

where the subscript i indicates the i-th of the n observations (districts) in the sample.

Equation (6.5) is the population multiple regression model when there are two regressors, X_1i and X_2i.
In regression with binary regressors it can be useful to treat β_0 as the coefficient on a regressor that always equals 1; think of β_0 as the coefficient on X_0i, where X_0i = 1 for i = 1, ..., n. Accordingly, the population multiple regression model in Equation (6.5) can alternatively be written as

Y_i = β_0X_0i + β_1X_1i + β_2X_2i + u_i,  i = 1, ..., n.  (6.6)

The variable X_0i is sometimes called the constant regressor because it takes on the same value, the value 1, for all observations. Similarly, the intercept, β_0, is sometimes called the constant term in the regression.

The two ways of writing the population regression model, Equations (6.5) and (6.6), are equivalent.


The discussion so far has focused on the case of a single additional variable, X_2. In practice, however, there might be multiple factors omitted from the single-regressor model. For example, ignoring the students' economic background might result in omitted variable bias, just as ignoring the fraction of English learners did. This reasoning leads us to consider a model with three regressors or, more generally, a model that includes k regressors. The multiple regression model with k regressors, X_1i, X_2i, ..., X_ki, is summarized as Key Concept 6.2.
The definitions of homoskedasticity and heteroskedasticity in the multiple regression model are extensions of their definitions in the single-regressor model. The error term u_i in the multiple regression model is homoskedastic if the variance of the conditional distribution of u_i given X_1i, ..., X_ki, var(u_i | X_1i, ..., X_ki), is constant for i = 1, ..., n and thus does not depend on the values of X_1i, ..., X_ki. Otherwise, the error term is heteroskedastic.
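The distinction between the two cases can be seen by simulation: under homoskedasticity the conditional variance of u_i is the same number for every value of the regressors, while under heteroskedasticity it moves with them. The simulation design below (the uniform regressor, the variance function 0.5·x_1) is our own illustrative choice, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
x1 = rng.uniform(0, 10, n)

u_homo = rng.normal(0, 2.0, n)       # homoskedastic: var(u|x1) = 4 for every x1
u_hetero = rng.normal(0, 0.5 * x1)   # heteroskedastic: var(u|x1) = (0.5*x1)^2 grows with x1

def var_by_group(u, x, cutoff=5.0):
    """Sample variance of u for observations with small vs. large x."""
    return u[x < cutoff].var(), u[x >= cutoff].var()

print(var_by_group(u_homo, x1))    # both groups near 4.0
print(var_by_group(u_hetero, x1))  # the large-x1 group has a much larger variance
```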
The multiple regression model holds out the promise of providing just what the superintendent wants to know: the effect of changing the student-teacher ratio, holding constant other factors that are beyond her control. These factors include not just the percentage of English learners, but other measurable factors that might affect test performance, including the economic background of the students. To be


KEY CONCEPT 6.2: THE MULTIPLE REGRESSION MODEL

The multiple regression model is

Y_i = β_0 + β_1X_1i + β_2X_2i + ... + β_kX_ki + u_i,  i = 1, ..., n,  (6.7)

where

- Y_i is the i-th observation on the dependent variable; X_1i, X_2i, ..., X_ki are the i-th observations on each of the k regressors; and u_i is the error term.
- The population regression line is the relationship that holds between Y and the X's on average in the population:

  E(Y | X_1i = x_1, X_2i = x_2, ..., X_ki = x_k) = β_0 + β_1x_1 + β_2x_2 + ... + β_kx_k.

- β_1 is the slope coefficient on X_1, β_2 is the coefficient on X_2, and so on. The coefficient β_1 is the expected change in Y_i resulting from changing X_1i by one unit, holding constant X_2i, ..., X_ki. The coefficients on the other X's are interpreted similarly.
- The intercept β_0 is the expected value of Y when all the X's equal 0. The intercept can be thought of as the coefficient on a regressor, X_0i, that equals 1 for all i.

of practical help to the superintendent, however, we need to provide her with estimates of the unknown population coefficients β_0, ..., β_k of the population regression model calculated using a sample of data. Fortunately, these coefficients can be estimated using ordinary least squares.

6.3 The OLS Estimator in Multiple Regression

This section describes how the coefficients of the multiple regression model can be estimated using OLS.


The OLS Estimator


Section 4.2 shows how to estimate the intercept and slope coefficients in the single-regressor model by applying OLS to a sample of observations of Y and X. The key idea is that these coefficients can be estimated by minimizing the sum of squared prediction mistakes, that is, by choosing the estimators b_0 and b_1 so as to minimize Σ_{i=1}^{n} (Y_i - b_0 - b_1X_i)². The estimators that do so are the OLS estimators, β̂_0 and β̂_1.

The method of OLS also can be used to estimate the coefficients β_0, β_1, ..., β_k in the multiple regression model. Let b_0, b_1, ..., b_k be estimators of β_0, β_1, ..., β_k. The predicted value of Y_i, calculated using these estimators, is b_0 + b_1X_1i + ... + b_kX_ki, and the mistake in predicting Y_i is Y_i - (b_0 + b_1X_1i + ... + b_kX_ki) = Y_i - b_0 - b_1X_1i - ... - b_kX_ki. The sum of these squared prediction mistakes over all n observations thus is

Σ_{i=1}^{n} (Y_i - b_0 - b_1X_1i - ... - b_kX_ki)².  (6.8)

The sum of the squared mistakes for the linear regression model in expression (6.8) is the extension of the sum of the squared mistakes given in Equation (4.6) for the linear regression model with a single regressor.


The estimators of the coefficients {30 {31 , . .. , {3k tha t minimize the sum of
squared mistakes in expression (6.8 ) are called lhe ordinary J~as! squar~s (OLS)
estimators of {30, 13 1, , {3". The O LS estimators are denoted {30 {3 1 13k
The terminology of O LS in the linear multiple regression model is the same
as in the linear regression model with a single regressor. The OLS regression line
is the straight line construc(ed using the OLS estimators: ~0 + ~ 1X 1 + + ~kXk.
1l1e predicted value of Y; given X 1i, . . . , X ki based on the OLS regression line, is
Y, = ~0 + ~ 1 X1 ; + + ~t7kr The OLS residual for t he i'h observation is lhe difference between Y, a nd its OLS predicted value. that is, the OLS residual is it;=

Y,- Y;
Tbe O LS e!:>timators could be computed by trial a nd error, repeatedly trying
different values of b0, ... , bk until you are satisfied that you have minimized the
total sum of squares in expression (6.8). It is far easier. h owever, to use explicit for-

1 cnn

mulas for the O LS estimators that are derived using calculus. The formulas for the
OLS estimators i.n the multiple regression model are similar to those In Key Concept 4.2 for the single-regressor roodel. llJcse formulas are incorporated into mode rn s ta tistical software. In the multiple regression model, the formulas a re best
expressed a nd d i<:cussed using matrix nota tion, so the ir presentation is deferred
lO Section I R. I.
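Ahead of the matrix treatment in Section 18.1, the mechanics can be sketched numerically: stack the regressors, together with the constant regressor X_0 = 1, into a matrix and let a least squares routine minimize expression (6.8). The data below are simulated (the coefficient values and distributions are our own illustrative choices, not the California data).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 420
x1 = rng.normal(20, 2, n)            # e.g., a student-teacher ratio
x2 = rng.normal(15, 5, n)            # e.g., a percentage of English learners
u = rng.normal(0, 10, n)             # error term
y = 700.0 - 2.0 * x1 - 0.5 * x2 + u  # simulated version of the model in Equation (6.5)

# Design matrix with the "constant regressor" X0 = 1 (Equation (6.6))
X = np.column_stack([np.ones(n), x1, x2])

# OLS minimizes the sum of squared mistakes in expression (6.8);
# np.linalg.lstsq solves exactly that minimization.
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
ssr = residuals @ residuals

print(beta_hat)  # close to the true coefficients (700, -2.0, -0.5)
```

Any other coefficient vector gives a larger sum of squared residuals, which is what makes β̂ the OLS estimate.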


KEY CONCEPT 6.3: THE OLS ESTIMATORS, PREDICTED VALUES, AND RESIDUALS IN THE MULTIPLE REGRESSION MODEL

The OLS estimators β̂_0, β̂_1, ..., β̂_k are the values of b_0, b_1, ..., b_k that minimize the sum of squared prediction mistakes Σ_{i=1}^{n} (Y_i - b_0 - b_1X_1i - ... - b_kX_ki)². The OLS predicted values Ŷ_i and residuals û_i are

Ŷ_i = β̂_0 + β̂_1X_1i + ... + β̂_kX_ki,  i = 1, ..., n, and  (6.9)

û_i = Y_i - Ŷ_i,  i = 1, ..., n.  (6.10)

The OLS estimators β̂_0, β̂_1, ..., β̂_k and residuals û_i are computed from a sample of n observations of (X_1i, ..., X_ki, Y_i), i = 1, ..., n. These are estimators of the unknown true population coefficients β_0, β_1, ..., β_k and error term, u_i.

The definitions and terminology of OLS in multiple regression are summarized in Key Concept 6.3.

Application to Test Scores and the Student-Teacher Ratio

In Section 4.2, we used OLS to estimate the intercept and slope coefficient of the regression relating test scores (TestScore) to the student-teacher ratio (STR), using our 420 observations for California school districts; the estimated OLS regression line, reported in Equation (4.11), is

TestScore = 698.9 - 2.28 × STR.  (6.11)

Our concern has been that this relationship is misleading because the student-teacher ratio might be picking up the effect of having many English learners in districts with large classes. That is, it is possible that the OLS estimator is subject to omitted variable bias.

We are now in a position to address this concern by using OLS to estimate a multiple regression in which the dependent variable is the test score (Y_i) and there are two regressors: the student-teacher ratio (X_1i) and the percentage of English learners in the school district (X_2i) for our 420 districts (i = 1, ..., 420). The estimated OLS regression line for this multiple regression is

TestScore = 686.0 - 1.10 × STR - 0.65 × PctEL,  (6.12)

where PctEL is the percentage of students in the district who are English learners. The OLS estimate of the intercept (β̂_0) is 686.0, the OLS estimate of the coefficient on the student-teacher ratio (β̂_1) is -1.10, and the OLS estimate of the coefficient on the percentage of English learners (β̂_2) is -0.65.

The estimated effect on test scores of a change in the student-teacher ratio in the multiple regression is approximately half as large as when the student-teacher ratio is the only regressor: In the single-regressor equation [Equation (6.11)], a unit decrease in the STR is estimated to increase test scores by 2.28 points, but in the multiple regression equation [Equation (6.12)], it is estimated to increase test scores by only 1.10 points. This difference occurs because the coefficient on STR in the multiple regression is the effect of a change in STR, holding constant (or controlling for) PctEL, whereas in the single-regressor regression, PctEL is not held constant.

These two estimates can be reconciled by concluding that there is omitted variable bias in the estimate in the single-regressor model in Equation (6.11). In Section 6.1, we saw that districts with a high percentage of English learners tend to have not only low test scores but also a high student-teacher ratio. If the fraction of English learners is omitted from the regression, reducing the student-teacher ratio is estimated to have a larger effect on test scores, but this estimate reflects both the effect of a change in the student-teacher ratio and the omitted effect of having fewer English learners in the district.

We have reached the same conclusion that there is omitted variable bias in the relationship between test scores and the student-teacher ratio by two different paths: the tabular approach of dividing the data into groups (Section 6.1) and the multiple regression approach [Equation (6.12)]. Of these two methods, multiple regression has two important advantages. First, it provides a quantitative estimate of the effect of a unit decrease in the student-teacher ratio, which is what the superintendent needs to make her decision. Second, it readily extends to more than two regressors, so that multiple regression can be used to control for measurable factors other than just the percentage of English learners.

The rest of this chapter is devoted to understanding and to using OLS in the multiple regression model. Much of what you learned about the OLS estimator with a single regressor carries over to multiple regression with few or no modifications, so we will focus on that which is new with multiple regression. We begin by discussing measures of fit for the multiple regression model.
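The reconciliation of the short and long regressions can be mimicked on simulated data: when an omitted variable both affects Y and is correlated with the included regressor, the short regression's coefficient absorbs part of its effect, just as the STR coefficient moved from -2.28 to -1.10 above. All numbers and variable names here are illustrative, not the California data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Simulate districts where a higher percentage of English learners (pct_el)
# goes together with a higher student-teacher ratio (str_), as in Section 6.1.
pct_el = rng.uniform(0, 60, n)
str_ = 18 + 0.05 * pct_el + rng.normal(0, 1.5, n)
score = 700 - 1.0 * str_ - 0.6 * pct_el + rng.normal(0, 10, n)

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(score, str_)           # omits pct_el: the STR coefficient is biased
b_long = ols(score, str_, pct_el)    # controls for pct_el: close to the true -1.0

print(b_short[1], b_long[1])  # the short-regression coefficient is far more negative
```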


6.4 Measures of Fit in Multiple Regression

Three commonly used summary statistics in multiple regression are the standard error of the regression, the regression R², and the adjusted R² (also known as R̄²). All three statistics measure how well the OLS estimate of the multiple regression line describes, or "fits," the data.

The Standard Error of the Regression (SER)

The standard error of the regression (SER) estimates the standard deviation of the error term u_i. Thus, the SER is a measure of the spread of the distribution of Y around the regression line. In multiple regression, the SER is

SER = s_û, where s_û² = [1/(n - k - 1)] Σ_{i=1}^{n} û_i² = SSR/(n - k - 1),  (6.13)

where the SSR is the sum of squared residuals, SSR = Σ_{i=1}^{n} û_i².


The only difference between the definition in Equation (6.13) and the definition of the SER in Section 4.3 for the single-regressor model is that here the divisor is n - k - 1 rather than n - 2. In Section 4.3, the divisor n - 2 (rather than n) adjusts for the downward bias introduced by estimating two coefficients (the slope and intercept of the regression line). Here, the divisor n - k - 1 adjusts for the downward bias introduced by estimating k + 1 coefficients (the k slope coefficients plus the intercept). As in Section 4.3, using n - k - 1 rather than n is called a degrees-of-freedom adjustment. If there is a single regressor, then k = 1, so the formula in Section 4.3 is the same as in Equation (6.13). When n is large, the effect of the degrees-of-freedom adjustment is negligible.

The R²

The regression R² is the fraction of the sample variance of Y_i explained by (or predicted by) the regressors. Equivalently, the R² is 1 minus the fraction of the variance of Y_i not explained by the regressors.

The mathematical definition of the R² is the same as for regression with a single regressor:

R² = ESS/TSS = 1 - SSR/TSS,  (6.14)

where the explained sum of squares is ESS = Σ_{i=1}^{n} (Ŷ_i - Ȳ)² and the total sum of squares is TSS = Σ_{i=1}^{n} (Y_i - Ȳ)².


In multiple regression, the R² increases whenever a regressor is added, unless the estimated coefficient on the added regressor is exactly zero. To see this, think about starting with one regressor and then adding a second. When you use OLS to estimate the model with both regressors, OLS finds the values of the coefficients that minimize the sum of squared residuals. If OLS happens to choose the coefficient on the new regressor to be exactly zero, then the SSR will be the same whether or not the second variable is included in the regression. But if OLS chooses any value other than zero, then it must be that this value reduced the SSR relative to the regression that excludes this regressor. In practice it is extremely unusual for an estimated coefficient to be exactly zero, so in general the SSR will decrease when a new regressor is added. But this means that the R² generally increases (and never decreases) when a new regressor is added.

The "Adjusted R²"

Because the R² increases when a new variable is added, an increase in the R² does not mean that adding a variable actually improves the fit of the model. In this sense, the R² gives an inflated estimate of how well the regression fits the data. One way to correct for this is to deflate or reduce the R² by some factor, and this is what the adjusted R², or R̄², does.

The adjusted R², or R̄², is a modified version of the R² that does not necessarily increase when a new regressor is added. The R̄² is

R̄² = 1 - [(n - 1)/(n - k - 1)] × SSR/TSS = 1 - s_û²/s_Y².  (6.15)
The difference between this formula and the second definition of the R² in Equation (6.14) is that the ratio of the sum of squared residuals to the total sum of squares is multiplied by the factor (n - 1)/(n - k - 1). As the second expression in Equation (6.15) shows, this means that the adjusted R² is 1 minus the ratio of the sample variance of the OLS residuals [with the degrees-of-freedom correction in Equation (6.13)] to the sample variance of Y.


There are three useful things to know about the R̄². First, (n - 1)/(n - k - 1) is always greater than 1, so R̄² is always less than R².

Second, adding a regressor has two opposite effects on the R̄². On the one hand, the SSR falls, which increases the R̄². On the other hand, the factor (n - 1)/(n - k - 1) increases. Whether the R̄² increases or decreases depends on which of these two effects is stronger.

Third, the R̄² can be negative. This happens when the regressors, taken together, reduce the sum of squared residuals by such a small amount that this reduction fails to offset the factor (n - 1)/(n - k - 1).
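Equations (6.13) through (6.15) are simple to compute directly from the residuals. The sketch below does so on simulated data (our own design); the added "noise" regressor is pure random numbers, so including it leaves the R² essentially unchanged while the adjusted R² pays the degrees-of-freedom penalty.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)

def fit_measures(y, X):
    """Return SER, R^2, and adjusted R^2 (Equations (6.13)-(6.15))."""
    n_obs, kp1 = X.shape                    # kp1 = k + 1 columns (constant included)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    ser = np.sqrt(ssr / (n_obs - kp1))      # degrees-of-freedom adjustment
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (n_obs - 1) / (n_obs - kp1) * ssr / tss
    return ser, r2, adj_r2

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, rng.normal(size=n)])  # add an irrelevant regressor

ser_s, r2_s, adj_s = fit_measures(y, X_small)
ser_b, r2_b, adj_b = fit_measures(y, X_big)

print(r2_b >= r2_s)  # the R^2 (essentially) never falls when a regressor is added
print(adj_b < r2_b)  # the adjusted R^2 is always below the R^2 when k >= 1
```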


Application to Test Scores

Equation (6.12) reports the estimated regression line for the multiple regression relating test scores (TestScore) to the student-teacher ratio (STR) and the percentage of English learners (PctEL). The R² for this regression line is R² = 0.426, the adjusted R² is R̄² = 0.424, and the standard error of the regression is SER = 14.5.

Comparing these measures of fit with those for the regression in which PctEL is excluded [Equation (6.11)] shows that including PctEL in the regression increased the R² from 0.051 to 0.426. When the only regressor is STR, only a small fraction of the variation in TestScore is explained; however, when PctEL is added to the regression, more than two-fifths (42.6%) of the variation in test scores is explained. In this sense, including the percentage of English learners substantially improves the fit of the regression. Because n is large and only two regressors appear in Equation (6.12), the difference between R² and adjusted R² is very small (R² = 0.426 versus R̄² = 0.424).

The SER for the regression excluding PctEL is 18.6; this value falls to 14.5 when PctEL is included as a second regressor. The units of the SER are points on the standardized test. The reduction in the SER tells us that predictions about standardized test scores are substantially more precise if they are made using the regression with both STR and PctEL than if they are made using the regression with only STR as a regressor.
Using the R² and adjusted R². The R̄² is useful because it quantifies the extent to which the regressors account for, or explain, the variation in the dependent variable. Nevertheless, heavy reliance on the R² (or R̄²) can be a trap. In applications, "maximize the R²" is rarely the answer to any economically or statistically meaningful question. Instead, the decision about whether to include a variable in a multiple regression should be based on whether including that variable allows you better to estimate the causal effect of interest. We return to the issue of how to decide which variables to include, and which to exclude, in Chapter 7. First, however, we need to develop methods for quantifying the sampling uncertainty of the OLS estimators. The starting point for doing so is extending the least squares assumptions of Chapter 4 to the case of multiple regressors.

6.5 The Least Squares Assumptions in Multiple Regression

There are four least squares assumptions in the multiple regression model. The first three are the least squares assumptions of the single-regressor model (Key Concept 4.3), extended to allow for multiple regressors, and these are discussed only briefly. The fourth assumption is new and is discussed in more detail.

Assumption #1: The Conditional Distribution of u_i Given X_1i, X_2i, ..., X_ki Has a Mean of Zero


The first assumption is that the conditional distribution of u_i given X_1i, ..., X_ki has a mean of zero. This assumption extends the first least squares assumption with a single regressor to multiple regressors. This assumption means that sometimes Y_i is above the population regression line and sometimes Y_i is below the population regression line, but on average over the population Y_i falls on the population regression line. Therefore, for any value of the regressors, the expected value of u_i is zero. As is the case for regression with a single regressor, this is the key assumption that makes the OLS estimators unbiased. We return to omitted variable bias in multiple regression in Section 7.5.

Assumption #2: (X_1i, X_2i, ..., X_ki, Y_i), i = 1, ..., n, Are i.i.d.

The second assumption is that (X_1i, ..., X_ki, Y_i), i = 1, ..., n, are independently and identically distributed (i.i.d.) random variables. This assumption holds automatically if the data are collected by simple random sampling. The comments on this assumption appearing in Section 4.3 for a single regressor also apply to multiple regressors.

Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values far outside the usual range of the data, are unlikely. This assumption serves as a reminder that, as in the single-regressor case, the OLS estimator of the coefficients in the multiple regression model can be sensitive to large outliers.

The assumption that large outliers are unlikely is made mathematically precise by assuming that X_1i, ..., X_ki, and Y_i have nonzero finite fourth moments: 0 < E(X_1i⁴) < ∞, ..., 0 < E(X_ki⁴) < ∞ and 0 < E(Y_i⁴) < ∞. Another way to state this assumption is that the dependent variable and regressors have finite kurtosis. This assumption is used to derive the properties of OLS regression statistics in large samples.
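The sensitivity to large outliers is easy to demonstrate by simulation: corrupting a single observation can move the OLS slope far from the truth. The data-generating design below is our own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

def slope(x, y):
    """OLS slope coefficient from a regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

clean_slope = slope(x, y)          # close to the true value 2.0

x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = 10.0, -50.0   # one wild observation (e.g., a data-entry error)
outlier_slope = slope(x_out, y_out)

print(clean_slope, outlier_slope)  # the single outlier drags the slope far from 2.0
```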

Assumption #4: No Perfect Multicollinearity


The fourth assumption is new to the multiple regression model. It rules out an inconvenient situation, called perfect multicollinearity, in which it is impossible to


KEY CONCEPT 6.4: THE LEAST SQUARES ASSUMPTIONS IN THE MULTIPLE REGRESSION MODEL

1. u_i has conditional mean zero given X_1i, X_2i, ..., X_ki; that is, E(u_i | X_1i, X_2i, ..., X_ki) = 0.
2. (X_1i, X_2i, ..., X_ki, Y_i), i = 1, ..., n, are independently and identically distributed (i.i.d.) draws from their joint distribution.
3. Large outliers are unlikely: X_1i, ..., X_ki and Y_i have nonzero finite fourth moments.
4. There is no perfect multicollinearity.

compute the OLS estimator. The regressors are said to be perfectly multicollinear (or to exhibit perfect multicollinearity) if one of the regressors is a perfect linear function of the other regressors. The fourth least squares assumption is that the regressors are not perfectly multicollinear.

Why does perfect multicollinearity make it impossible to compute the OLS estimator? Suppose you want to estimate the coefficient on STR in a regression of TestScore_i on STR_i and PctEL_i, except that you make a typographical error and accidentally type in STR_i a second time instead of PctEL_i; that is, you regress TestScore_i on STR_i and STR_i. This is a case of perfect multicollinearity because one of the regressors (the first occurrence of STR) is a perfect linear function of another regressor (the second occurrence of STR). Depending on how your software package handles perfect multicollinearity, if you try to estimate this regression the software will do one of three things: (1) It will drop one of the occurrences of STR; (2) it will refuse to calculate the OLS estimates and give an error message; or (3) it will crash the computer. The mathematical reason for this failure is that perfect multicollinearity produces division by zero in the OLS formulas.

At an intuitive level, perfect multicollinearity is a problem because you are asking the regression to answer an illogical question. In multiple regression, the coefficient on one of the regressors is the effect of a change in that regressor, holding the other regressors constant. In the hypothetical regression of TestScore on STR and STR, the coefficient on the first occurrence of STR is the effect on test scores of a change in STR, holding constant STR. This makes no sense, and OLS cannot estimate this nonsensical partial effect.


The solution to perfect multicollinearity in this hypothetical regression is simply to correct the typo and to replace one of the occurrences of STR with the variable you originally wanted to include. This example is typical: When perfect multicollinearity occurs, it often reflects a logical mistake in choosing the regressors or some previously unrecognized feature of the data set. In general, the solution to perfect multicollinearity is to modify the regressors to eliminate the problem.

Additional examples of perfect multicollinearity are given in Section 6.7, which also defines and discusses imperfect multicollinearity.

The least squares assumptions for the multiple regression model are summarized in Key Concept 6.4.

6.6 The Distribution of the OLS Estimators in Multiple Regression
Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation across possible samples gives rise to the uncertainty associated with the OLS estimators of the population regression coefficients, β_0, β_1, ..., β_k. Just as in the case of regression with a single regressor, this variation is summarized in the sampling distribution of the OLS estimators.

Recall from Section 4.4 that, under the least squares assumptions, the OLS estimators (β̂_0 and β̂_1) are unbiased and consistent estimators of the unknown coefficients (β_0 and β_1) in the linear regression model with a single regressor. In addition, in large samples, the sampling distribution of β̂_0 and β̂_1 is well approximated by a bivariate normal distribution.
These results carry over to multiple regression a nalysis. That is, under the least

squares assumptions of Key Concept 6.4, tbe OLS estim ators ffio. ffi 1... , ffi,. are
ences
l;!ssagt!:

is thai

lou art:

iion. thC

or. hOJJ
)'co re 011
n LC"\
~ 0

~od Ol :>

unbiased aod consistent est1mators of {30, f3~o . . . f3k in the line~r ~ultipl~ regression model. In large samples, the joint sampUng distribution of {30 . {3 1.. {3k is well
approximated by a multivariate oormal d istribution, which is the extension of the
bivariate normal distribution to the general case of two or more jointly norma l
random variables (Section 2.4).
Although the a lgebra is more complicated when there are m ultiple regressors.
the central limit theorem applies to the OLS estimators in the multiple r egression
model for the same reason that it applies to Y and to the OLS estimators whe n
there is a single regressor: The OLS estimators '/30, ~ 1 , . , ~k. are averages o( the
randomly sampled data. and if the sample si.ze is suffic iently large the sampling
distribution of those averages hecomcl> normal. Because Lhe multivariate nMmal

206

cHAPTER 6

linear ~egression

with Multiple Regressors

lARGE SAMPLE DISTRIBUTION Of {3 0 , {3 1,

"'

f3 k

If lhe least squares assumptions (Key Concept 6.4) bold. then in large samples lh~
OLS estima tors ~11 ~ 1 {3~. arc jointly normally distributed and each {3, is dtstributed N(J3,.
).j = 0, ... . k .

uJ

distribution is best handled mathematically using matrix algebra, the expression.\


for the joint distribution of the OLS estimators are deferred t o Chapter 18.
Key Conce pt 6.5 summarizes th~ result that, in large samples. the distribution
of the OLS estimators in multiple regression is approximately jointly normal . In
general. the OLS estimators are conela ted; this correlation arises from the corn :Jat.ion between the regressors. The joint sampling distribution of the O LS estimators is discussed in more detail for the case that there are two regressors anJ
h omoskedast ic e rrors in Appendix 6.2, and the general case is discussed m
Section 18.2.
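The large-sample results above can be illustrated with a short simulation. The sketch below is not from the text: it invents a small population regression (coefficients 1, 2, and −0.5, correlated regressors, homoskedastic errors), draws many samples, and checks that the OLS estimates center on the true coefficients and that the slope estimators are correlated across samples, as Key Concept 6.5 and the discussion above describe.

```python
import numpy as np

# Monte Carlo sketch with invented population values: Y = 1 + 2*X1 - 0.5*X2 + u.
rng = np.random.default_rng(0)
n, reps = 200, 2000
estimates = np.empty((reps, 3))
for r in range(reps):
    X1 = rng.normal(size=n)
    X2 = 0.5 * X1 + rng.normal(size=n)      # regressors are positively correlated
    u = rng.normal(size=n)                  # homoskedastic error
    Y = 1.0 + 2.0 * X1 - 0.5 * X2 + u
    X = np.column_stack([np.ones(n), X1, X2])
    estimates[r] = np.linalg.lstsq(X, Y, rcond=None)[0]

# The OLS estimates center on the true coefficients (1, 2, -0.5) ...
print("mean of estimates:", estimates.mean(axis=0))
# ... and the slope estimators are correlated across samples (negatively here,
# because the regressors are positively correlated).
print("corr(b1_hat, b2_hat):", np.corrcoef(estimates[:, 1], estimates[:, 2])[0, 1])
```

The negative correlation between the two slope estimators anticipates Equation (6.18) in Appendix 6.2.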

6.7 Multicollinearity
As discussed in Section 6.5, perfect multicollinearity arises when one of the regressors is a perfect linear combination of the other regressors. This section provides some examples of perfect multicollinearity and discusses how perfect multicollinearity can arise, and can be avoided, in regressions with multiple binary regressors. Imperfect multicollinearity arises when one of the regressors is very highly correlated, but not perfectly correlated, with the other regressors. Unlike perfect multicollinearity, imperfect multicollinearity does not prevent estimation of the regression, nor does it imply a logical problem with the choice of regressors. However, it does mean that one or more regression coefficients could be estimated imprecisely.

Examples of Perfect Multicollinearity
We continue the discussion of perfect multicollinearity from Section 6.5 by examining three additional hypothetical regressions. In each, a third regressor is added to the regression of TestScorei on STRi and PctELi in Equation (6.12).

Example #1: Fraction of English learners. Let FracELi be the fraction of English learners in the ith district, which varies between 0 and 1. If the variable FracELi were included as a third regressor in addition to STRi and PctELi, the regressors would be perfectly multicollinear. The reason is that PctEL is the percentage of English learners, so that PctELi = 100 × FracELi for every district. Thus one of the regressors (PctELi) can be written as a perfect linear function of another regressor (FracELi).
Because of this perfect multicollinearity, it is impossible to compute the OLS estimates of the regression of TestScorei on STRi, PctELi, and FracELi. At an intuitive level, OLS fails because you are asking, What is the effect of a unit change in the percentage of English learners, holding constant the fraction of English learners? Because the percentage of English learners and the fraction of English learners move together in a perfect linear relationship, this question makes no sense and OLS cannot answer it.
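The failure in Example #1 is easy to reproduce. The sketch below uses simulated data (not the California school data set) with a hypothetical FracEL constructed as PctEL/100; the design matrix then has fewer independent columns than regressors, which is the division-by-zero problem in the OLS formulas.

```python
import numpy as np

# Simulated illustration: FracEL = PctEL/100 is a perfect linear function
# of PctEL, so the design matrix is rank deficient.
rng = np.random.default_rng(1)
n = 50
STR = rng.uniform(14, 26, size=n)      # student-teacher ratios
PctEL = rng.uniform(0, 90, size=n)     # percentage of English learners
FracEL = PctEL / 100.0                 # fraction of English learners

X_ok = np.column_stack([np.ones(n), STR, PctEL])
X_bad = np.column_stack([np.ones(n), STR, PctEL, FracEL])

print("rank of [1, STR, PctEL]:", np.linalg.matrix_rank(X_ok))           # 3 of 3 columns
print("rank of [1, STR, PctEL, FracEL]:", np.linalg.matrix_rank(X_bad))  # 3 of 4 columns
```

Because X_bad has rank 3 but 4 columns, X′X is singular and the OLS normal equations have no unique solution, which is why regression software refuses to compute (or silently drops a regressor in) this regression.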

Example #2: "Not very small" classes. Let NVSi be a binary variable that equals 1 if the student-teacher ratio in the ith district is "not very small"; specifically, NVSi equals 1 if STRi ≥ 12 and equals 0 otherwise. This regression also exhibits perfect multicollinearity, but for a more subtle reason than the regression in the previous example. There are in fact no districts in our data set with STRi < 12; as you can see in the scatterplot in Figure 4.2, the smallest value of STR is 14. Thus, NVSi = 1 for all observations. Now recall that the linear regression model with an intercept can equivalently be thought of as including a regressor, X0i, that equals 1 for all i, as is shown in Equation (6.6). Thus we can write NVSi = 1 × X0i for all the observations in our data set; that is, NVSi can be written as a perfect linear combination of the regressors; specifically, it equals X0i.
This illustrates two important points about perfect multicollinearity. First, when the regression includes an intercept, then one of the regressors that can be implicated in perfect multicollinearity is the constant regressor X0i. Second, perfect multicollinearity is a statement about the data set you have on hand. While it is possible to imagine a school district with fewer than 12 students per teacher, there are no such districts in our data set, so we cannot analyze them in our regression.

Example #3: Percentage of English speakers. Let PctESi be the percentage of "English speakers" in the ith district, defined to be the percentage of students who are not English learners. Again the regressors will be perfectly multicollinear. Like the previous example, the perfect linear relationship among the regressors involves the constant regressor X0i: For every district, PctESi = 100 × X0i − PctELi.



This exa mple illustrate a nother point: perfect multicollinearity is a feature n1
the e ntire set of regressors. If either the intercept (i.e., the regrec;sor Xn.) or p ,tEL.
were excluded fro m this regression, the regressors would not be perfec.t ,
mulhcollinear.

The dummy variable trap. Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors. For example, suppose you have partitioned the school districts into three categories: rural, suburban, and urban. Each district falls into one (and only one) category. Let these binary variables be Rurali, which equals 1 for a rural district and equals 0 otherwise; Suburbani; and Urbani. If you include all three binary variables in the regression along with a constant, the regressors will be perfectly multicollinear: Because each district belongs to one and only one category, Rurali + Suburbani + Urbani = 1 = X0i, where X0i denotes the constant regressor introduced in Equation (6.6). Thus, to estimate the regression, you must exclude one of these four variables, either one of the binary indicators or the constant term. By convention, the constant term is retained, in which case one of the binary indicators is excluded. For example, if Rurali were excluded, then the coefficient on Suburbani would be the average difference between test scores in suburban and rural districts, holding constant the other variables in the regression.
In general, if there are G binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all G binary variables are included as regressors, then the regression will fail because of perfect multicollinearity. This situation is called the dummy variable trap. The usual way to avoid the dummy variable trap is to exclude one of the binary variables from the multiple regression, so only G − 1 of the G binary variables are included as regressors. In this case, the coefficients on the included binary variables represent the incremental effect of being in that category, relative to the base case of the omitted category, holding constant the other regressors. Alternatively, all G binary regressors can be included if the intercept is omitted from the regression.
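A minimal sketch of the trap, using made-up categories rather than the California data: with an intercept, including all three region dummies produces a rank-deficient design matrix; dropping one dummy (the base category) restores full column rank.

```python
import numpy as np

# Made-up districts cycling through the three categories.
category = np.tile([0, 1, 2], 10)            # 0 = rural, 1 = suburban, 2 = urban
rural = (category == 0).astype(float)
suburban = (category == 1).astype(float)
urban = (category == 2).astype(float)
const = np.ones(category.size)               # the constant regressor X0i

# Dummy variable trap: rural + suburban + urban = 1 = X0i for every district.
X_trap = np.column_stack([const, rural, suburban, urban])
print("rank with all G dummies + intercept:", np.linalg.matrix_rank(X_trap))  # 3, not 4

# Usual fix: drop one dummy; coefficients on the rest are then measured
# relative to the omitted (rural) base category.
X_fixed = np.column_stack([const, suburban, urban])
print("rank after dropping rural:", np.linalg.matrix_rank(X_fixed))           # 3 of 3
```

The alternative fix mentioned in the text (keep all G dummies but drop the intercept) also yields a full-rank design matrix, since the sum-to-one relationship no longer involves a constant column.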
Solutions to perfect multicollinearity. Perfect multicollinearity typically arises when a mistake has been made in specifying the regression. Sometimes the mistake is easy to spot (as in the first example), but sometimes it is not (as in the second example). In one way or another, your software will let you know if you make such a mistake because it cannot compute the OLS estimator if you have perfect multicollinearity. When your software lets you know that you have perfect multicollinearity, it is important that you modify your regression to eliminate it. Some software is unreliable when there is perfect multicollinearity, and at a minimum you will be ceding control over your choice of regressors to your computer if your regressors are perfectly multicollinear.

Imperfect Multicollinearity
Despite its similar name, imperfect multicollinearity is conceptually quite different from perfect multicollinearity. Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated.
If the regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated. For example, consider the regression of TestScore on STR and PctEL. Suppose we were to add a third regressor, the percentage of the district's residents who are first-generation immigrants. First-generation immigrants often speak English as a second language, so the variables PctEL and percentage immigrants will be highly correlated: Districts with many recent immigrants will tend to have many students who are still learning English. Because these two variables are highly correlated, it would be difficult to use these data to estimate the partial effect on test scores of an increase in PctEL, holding constant the percentage immigrants. In other words, the data set provides little information about what happens to test scores when the percentage of English learners is low but the fraction of immigrants is high, or vice versa. If the least squares assumptions hold, then the OLS estimator of the coefficient on PctEL in this regression will be unbiased; however, it will have a larger variance than if the regressors PctEL and percentage immigrants were uncorrelated.
The effect of imperfect multicollinearity on the variance of the OLS estimators can be seen mathematically by inspecting Equation (6.17) in Appendix 6.2, which is the variance of β̂1 in a multiple regression with two regressors (X1 and X2) for the special case of a homoskedastic error. In this case, the variance of β̂1 is inversely proportional to 1 − ρ²X1,X2, where ρX1,X2 is the correlation between X1 and X2. The larger is the correlation between the two regressors, the closer is this term to zero and the larger is the variance of β̂1. More generally, when multiple regressors are imperfectly multicollinear, then the coefficients on one or more of these regressors will be imprecisely estimated; that is, they will have a large sampling variance.
Perfect multicollinearity is a problem that often signals the presence of a logical error. In contrast, imperfect multicollinearity is not necessarily an error, but rather just a feature of OLS, your data, and the question you are trying to answer. If the variables in your regression are the ones you meant to include (the ones you chose to address the potential for omitted variable bias), then imperfect multicollinearity implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand.
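The variance inflation described here can be checked by simulation. The sketch below invents a two-regressor population with homoskedastic errors and compares the Monte Carlo standard deviation of β̂1 when the regressors are uncorrelated and when their correlation is 0.9; Equation (6.17) in Appendix 6.2 predicts that the variance grows by the factor 1/(1 − 0.9²), about 5.3.

```python
import numpy as np

# Simulation sketch with invented population values: how does corr(X1, X2)
# affect the sampling spread of the OLS estimate of beta1?
rng = np.random.default_rng(3)
n, reps = 200, 3000

def sd_beta1_hat(rho):
    """Monte Carlo standard deviation of beta1_hat when corr(X1, X2) = rho."""
    draws = np.empty(reps)
    for r in range(reps):
        X1 = rng.normal(size=n)
        X2 = rho * X1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        Y = 1.0 + 0.5 * X1 + 0.5 * X2 + rng.normal(size=n)  # homoskedastic u
        X = np.column_stack([np.ones(n), X1, X2])
        draws[r] = np.linalg.lstsq(X, Y, rcond=None)[0][1]
    return draws.std()

sd_uncorrelated = sd_beta1_hat(0.0)
sd_correlated = sd_beta1_hat(0.9)
ratio = (sd_correlated / sd_uncorrelated) ** 2
print("variance ratio:", ratio)   # close to 1/(1 - 0.9**2), i.e. about 5.3
```

Note that β̂1 remains unbiased in both cases; only its sampling variance changes, which is exactly the sense in which imperfect multicollinearity makes coefficients imprecisely estimated.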

6.8 Conclusion
Regression with a single regressor is vulnerable to omitted variable bias: If an omitted variable is a determinant of the dependent variable and is correlated with the regressor, then the OLS estimator of the slope coefficient will be biased and will reflect both the effect of the regressor and the effect of the omitted variable. Multiple regression makes it possible to mitigate omitted variable bias by including the omitted variable in the regression. The coefficient on a regressor, X1, in multiple regression is the partial effect of a change in X1, holding constant the other included regressors. In the test score example, including the percentage of English learners as a regressor made it possible to estimate the effect on test scores of a change in the student-teacher ratio, holding constant the percentage of English learners. Doing so reduced by half the estimated effect on test scores of a change in the student-teacher ratio.
The statistical theory of multiple regression builds on the statistical theory of regression with a single regressor. The least squares assumptions for multiple regression are extensions of the three least squares assumptions for regression with a single regressor, plus a fourth assumption ruling out perfect multicollinearity. Because the regression coefficients are estimated using a single sample, the OLS estimators have a joint sampling distribution and, therefore, have sampling uncertainty. This sampling uncertainty must be quantified as part of an empirical study, and the ways to do so in the multiple regression model are the topic of the next chapter.

Summary
1. Omitted variable bias occurs when an omitted variable (1) is correlated with an included regressor and (2) is a determinant of Y.
2. The multiple regression model is a linear regression model that includes multiple regressors, X1, X2, …, Xk. Associated with each regressor is a regression coefficient, β1, β2, …, βk. The coefficient β1 is the expected change in Y associated with a one-unit change in X1, holding the other regressors constant. The other regression coefficients have an analogous interpretation.
3. The coefficients in multiple regression can be estimated by OLS. When the four least squares assumptions in Key Concept 6.4 are satisfied, the OLS estimators are unbiased, consistent, and normally distributed in large samples.
4. Perfect multicollinearity, which occurs when one regressor is an exact linear function of the other regressors, usually arises from a mistake in choosing which regressors to include in a multiple regression. Solving perfect multicollinearity requires changing the set of regressors.
5. The standard error of the regression, the R², and the R̄² are measures of fit for the multiple regression model.

Key Terms
omitted variable bias (187)
multiple regression model (193)
population regression line (193)
population regression function (193)
intercept (193)
slope coefficient of X1i (193)
coefficient on X1i (193)
slope coefficient of X2i (193)
coefficient on X2i (193)
control variable (193)
holding X2 constant (194)
controlling for X2 (194)
partial effect (194)
population multiple regression model (195)
constant regressor (195)
constant term (195)
homoskedastic (195)
heteroskedastic (195)
OLS estimators of β0, β1, …, βk (197)
OLS regression line (197)
predicted value (197)
OLS residual (197)
R² and adjusted R² (R̄²) (200, 201)
perfect multicollinearity or to exhibit perfect multicollinearity (204)
dummy variable trap (208)

Review the Concepts
6.1 A researcher is interested in the effect on test scores of computer usage. Using school district data like that used in this chapter, she regresses district average test scores on the number of computers per student. Will β̂1 be an unbiased estimator of the effect on test scores of increasing the number of computers per student? Why or why not? If you think β̂1 is biased, is it biased up or down? Why?
6.2 A multiple regression includes two regressors: Yi = β0 + β1X1i + β2X2i + ui. What is the expected change in Y if X1 increases by 3 units and X2 is unchanged? What is the expected change in Y if X2 decreases by 5 units and X1 is unchanged? What is the expected change in Y if X1 increases by 3 units and X2 decreases by 5 units?
6.3 Explain why two perfectly multicollinear regressors cannot be included in a linear multiple regression. Give two examples of a pair of perfectly multicollinear regressors.
6.4 Explain why it is difficult to estimate precisely the partial effect of X1, holding X2 constant, if X1 and X2 are highly correlated.

Exercises
The first four exercises refer to the table of estimated regressions on page 213, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contains information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)
6.1 Compute R̄² for each of the regressions.

6.2 Using the regression results in column (1):
a. Do workers with college degrees earn more, on average, than workers with only high school degrees? How much more?
b. Do men earn more than women on average? How much more?

6.3 Using the regression results in column (2):
a. Is age an important determinant of earnings? Explain.
b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Predict Sally's and Betsy's earnings.
6.4 Using the regression results in column (3):
a. Do there appear to be important regional differences?
b. Why is the regressor West omitted from the regression? What would happen if it were included?
c. Juanita is a 28-year-old female college graduate from the South. Jennifer is a 28-year-old female college graduate from the Midwest. Calculate the expected difference in earnings between Juanita and Jennifer.

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor           (1)       (2)       (3)
College (X1)        5.46      5.48      5.44
Female (X2)        -2.64     -2.62     -2.62
Age (X3)                      0.29      0.29
Northeast (X4)                          0.69
Midwest (X5)                            0.60
South (X6)                             -0.27
Intercept          12.69      4.40      3.75

Summary Statistics
SER                 6.27      6.22      6.21
R²                  0.176     0.190     0.194
n                   4000      4000      4000
6.5 Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000s), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as "poor." An estimated regression yields

Price = 119.2 + 0.485BDR + 23.4Bath + 0.156Hsize + 0.002Lsize + 0.090Age − 48.8Poor,  R̄² = 0.72, SER = 41.5.

a. Suppose that a homeowner converts part of an existing family room in her house into a new bathroom. What is the expected increase in the value of the house?
b. Suppose that a homeowner adds a new bathroom to her house, which increases the size of the house by 100 square feet. What is the expected increase in the value of the house?
c. What is the loss in value if a homeowner lets his house run down so that its condition becomes "poor"?
d. Compute the R² for the regression.


6.6 A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county's crime rate on the (per capita) size of the county's police force.
a. Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?
b. Use your answer to (a) and the expression for omitted variable bias given in Equation (6.1) to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. (That is, do you think that β̂1 > β1 or β̂1 < β1?)

6.7 Critique each of the following proposed research plans. Your critique should explain any problems with the proposed research and describe how the research plan might be improved. Include a discussion of any additional data that need to be collected and the appropriate statistical techniques for analyzing the data.
a. A researcher is interested in determining whether a large aerospace firm is guilty of gender bias in setting wages. To determine potential bias, the researcher collects salary and gender information for all of the firm's engineers. The researcher then plans to conduct a "difference in means" test to determine whether the average salary for women is significantly less than the average salary for men.
b. A researcher is interested in determining whether time spent in prison has a permanent effect on a person's wage rate. He collects data on a random sample of people who have been out of prison for at least fifteen years. He collects similar data on a random sample of people who have never served time in prison. The data set includes information on each person's current wage, education, age, ethnicity, gender, tenure (time in current job), occupation, and union status, as well as whether the person was ever incarcerated. The researcher plans to estimate the effect of incarceration on wages by regressing wages on an indicator variable for incarceration, including in the regression the other potential determinants of wages (education, tenure, union status, and so on).

6.8 A recent study found that the death rate for people who sleep six to seven hours per night is lower than the death rate for people who sleep eight or more hours, and higher than the death rate for people who sleep five or fewer hours. The 1.1 million observations used for this study came from a random survey of Americans aged 30 to 102. Each survey respondent was tracked for four years. The death rate for people sleeping seven hours was calculated as the ratio of the number of deaths over the span of the study among people sleeping seven hours to the total number of survey respondents who slept seven hours. This calculation was then repeated for people sleeping six hours, and so on. Based on this summary, would you recommend that Americans who sleep nine hours per night consider reducing their sleep to six or seven hours if they want to prolong their lives? Why or why not? Explain.

6.9 (Yi, X1i, X2i) satisfy the assumptions in Key Concept 6.4. You are interested in β1, the causal effect of X1 on Y. Suppose that X1 and X2 are uncorrelated. You estimate β1 by regressing Y onto X1 (so that X2 is not included in the regression). Does this estimator suffer from omitted variable bias? Explain.

6.10 (Yi, X1i, X2i) satisfy the assumptions in Key Concept 6.4; in addition, var(ui | X1i, X2i) = 4 and var(X1i) = 6. A random sample of size n = 400 is drawn from the population.
a. Assume that X1 and X2 are uncorrelated. Compute the variance of β̂1. [Hint: Look at Equation (6.17) in Appendix 6.2.]
b. Assume that corr(X1, X2) = 0.5. Compute the variance of β̂1.
c. Comment on the following statements: "When X1 and X2 are correlated, the variance of β̂1 is larger than it would be if X1 and X2 were uncorrelated. Thus, if you are interested in β1, it is best to leave X2 out of the regression if it is correlated with X1."
6.11 (Requires calculus) Consider the regression model

Yi = β1X1i + β2X2i + ui

for i = 1, …, n. (Notice that there is no constant term in the regression.) Following analysis like that used in Appendix 4.2:
a. Specify the least squares function that is minimized by OLS.
b. Compute the partial derivatives of the objective function with respect to b1 and b2.
c. Suppose Σ X1iX2i = 0, where all sums run over i = 1, …, n. Show that β̂1 = Σ X1iYi / Σ X1i².
d. Suppose Σ X1iX2i ≠ 0. Derive an expression for β̂1 as a function of the data (Yi, X1i, X2i), i = 1, …, n.
e. Suppose that the model includes an intercept: Yi = β0 + β1X1i + β2X2i + ui. Show that the least squares estimators satisfy β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2.

Empirical Exercises
E6.1 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Run a regression of Course_Eval on Beauty. What is the estimated slope?
b. Run a regression of Course_Eval on Beauty, including some additional variables to control for the type of course and professor characteristics. In particular, include as additional regressors Intro, OneCredit, Female, Minority, and NNEnglish. What is the estimated effect of Beauty on Course_Eval? Does the regression in (a) suffer from important omitted variable bias?
c. Professor Smith is a black male with average beauty and is a native English speaker. He teaches a three-credit upper-division course. Predict Professor Smith's course evaluation.
E6.2 Using the data set CollegeDistance described in Empirical Exercise 4.3, carry out the following exercises.
a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is the estimated slope?
b. Run a regression of ED on Dist, but include some additional regressors to control for characteristics of the student, the student's family, and the local labor market. In particular, include as additional regressors Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, Cue80, and Stwmfg80. What is the estimated effect of Dist on ED?

c. Is the estimated effect of Dist on ED in the regression in (b) substantively different from the regression in (a)? Based on this, does the regression in (a) seem to suffer from important omitted variable bias?
d. Compare the fit of the regression in (a) and (b) using the regression standard errors, R², and R̄². Why are the R² and R̄² so similar in regression (b)?
e. The value of the coefficient on DadColl is positive. What does this coefficient measure?
f. Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients (+ or −) what you would have believed? Interpret the magnitudes of these coefficients.
g. Bob is a black male. His high school was 20 miles from the nearest college. His base-year composite test score (Bytest) was 58. His family income in 1980 was $26,000, and his family owned a home. His mother attended college, but his father did not. The unemployment rate in his county was 7.5%, and the state average manufacturing hourly wage was $9.75. Predict Bob's years of completed schooling using the regression in (b).
h. Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college. Predict Jim's years of completed schooling using the regression in (b).

E6.3 Using the data set Growth described in Empirical Exercise 4.4, but excluding the data for Malta, carry out the following exercises.
a. Construct a table that shows the sample mean, standard deviation, and minimum and maximum values for the series Growth, TradeShare, YearsSchool, Oil, Rev_Coups, Assassinations, and RGDP60. Include the appropriate units for all entries.
b. Run a regression of Growth on TradeShare, YearsSchool, Rev_Coups, Assassinations, and RGDP60. What is the value of the coefficient on Rev_Coups? Interpret the value of this coefficient. Is it large or small in a real-world sense?
c. Use the regression to predict the average annual growth rate for a country that has average values for all regressors.
d. Repeat (c) but now assume that the country's value for TradeShare is one standard deviation above the mean.
e. Why is Oil omitted from the regression? What would happen if it were included?

APPENDIX
6.1 Derivation of Equation (6.1)

This appendix presents a derivation of the formula for omitted variable bias in Equation (6.1). Equation (4.30) in Appendix 4.3 states that

β̂1 = β1 + [(1/n) Σⁿᵢ₌₁ (Xi − X̄)ui] / [(1/n) Σⁿᵢ₌₁ (Xi − X̄)²].  (6.16)

Under the last two assumptions in Key Concept 4.3, (1/n) Σⁿᵢ₌₁ (Xi − X̄)² converges in probability to σ²X, and (1/n) Σⁿᵢ₌₁ (Xi − X̄)ui converges in probability to cov(ui, Xi) = ρXu σu σX. Substitution of these limits into Equation (6.16) yields Equation (6.1).

APPENDIX

6.2

Distribution ofthe OLS Estimators


When There Are Two Regressors
and Homoskedastic Errors
Although the general formula for the variance of the OLS estimators in multiple regression is complicated, if there are two regressors (k = 2) and the errors are homoskedastic, then the formula simplifies enough to provide some insights into the distribution of the OLS estimators.

Because the errors are homoskedastic, the conditional variance of uᵢ can be written
var(uᵢ | X₁ᵢ, X₂ᵢ) = σ²ᵤ. When there are two regressors, X₁ᵢ and X₂ᵢ, and the error term is
homoskedastic, in large samples the sampling distribution of β̂₁ is N(β₁, σ²_{β̂₁}), where the variance of this distribution, σ²_{β̂₁}, is

    σ²_{β̂₁} = (1/n) · [1/(1 − ρ²_{X₁,X₂})] · (σ²ᵤ / σ²_{X₁}),   (6.17)


where ρ_{X₁,X₂} is the population correlation between the two regressors X₁ and X₂, and σ²_{X₁} is the population variance of X₁.

The variance σ²_{β̂₁} of the sampling distribution of β̂₁ depends on the squared correlation between the regressors. If X₁ and X₂ are highly correlated, either positively or negatively, then ρ²_{X₁,X₂} is close to 1, and thus the term 1 − ρ²_{X₁,X₂} in the denominator of Equation (6.17) is small and the variance of β̂₁ is larger than it would be if ρ_{X₁,X₂} were close to 0.

Another feature of the joint normal large-sample distribution of the OLS estimators is that β̂₁ and β̂₂ are in general correlated. When the errors are homoskedastic, the correlation between the OLS estimators β̂₁ and β̂₂ is the negative of the correlation between the two regressors:

    corr(β̂₁, β̂₂) = −ρ_{X₁,X₂}.   (6.18)
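The variance inflation in Equation (6.17) can be sketched numerically. The values of n, σ²ᵤ, and σ²_{X₁} below are arbitrary placeholders, not from the text; only the 1/(1 − ρ²) factor is the point:

```python
# Equation (6.17): var(beta1_hat) = (1/n) * 1/(1 - rho^2) * (var_u / var_x1).
# n, var_u, and var_x1 are arbitrary illustrative values, not from the text.
n, var_u, var_x1 = 420, 1.0, 1.0

def var_beta1_hat(rho_x1x2):
    # Variance of the OLS estimator under homoskedasticity, two regressors
    return (1 / n) * (1 / (1 - rho_x1x2 ** 2)) * (var_u / var_x1)

base = var_beta1_hat(0.0)
for rho in (0.0, 0.5, 0.9):
    print(f"rho = {rho}: variance inflation factor {var_beta1_hat(rho) / base:.2f}")
```

At ρ = 0.9 the variance is more than five times what it would be with uncorrelated regressors, which is the imperfect-multicollinearity point made in Section 6.7.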

CHAPTER 7

Hypothesis Tests and Confidence Intervals in Multiple Regression

As discussed in Chapter 6, multiple regression analysis provides a way to
mitigate the problem of omitted variable bias by including additional
regressors, thereby controlling for the effects of those additional regressors. The
coefficients of the multiple regression model can be estimated by OLS. Like all
estimators, the OLS estimator has sampling uncertainty because its value differs
from one sample to the next.

This chapter presents methods for quantifying the sampling uncertainty of
the OLS estimator through the use of standard errors, statistical hypothesis
tests, and confidence intervals. One new possibility that arises in multiple
regression is a hypothesis that simultaneously involves two or more regression
coefficients. The general approach to testing such "joint" hypotheses involves a
new test statistic, the F-statistic.
Section 7.1 extends the methods for statistical inference in regression with a
single regressor to multiple regression. Sections 7.2 and 7.3 show how to test
hypotheses that involve two or more regression coefficients. Section 7.4 extends
the notion of confidence intervals for a single coefficient to confidence sets for
multiple coefficients. Deciding which variables to include in a regression is an
important practical issue, so Section 7.5 discusses ways to approach this
problem. In Section 7.6, we apply multiple regression analysis to obtain
improved estimates of the effect on test scores of a reduction in the
student-teacher ratio using the California test score data set.

7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient
This section describes how to compute the standard error, how to test hypotheses, and how to construct confidence intervals for a single coefficient in a multiple regression equation.

Standard Errors for the OLS Estimators

Recall that, in the case of a single regressor, it was possible to estimate the variance of the OLS estimator by substituting sample averages for expectations, which led to the estimator σ̂²_{β̂₁} given in Equation (5.4). Under the least squares assumptions, the law of large numbers implies that these sample averages converge to their population counterparts, so for example σ̂²_{β̂₁}/σ²_{β̂₁} →p 1. The square root of σ̂²_{β̂₁} is the standard error of β̂₁, SE(β̂₁), an estimator of the standard deviation of the sampling distribution of β̂₁.

All this extends directly to multiple regression. The OLS estimator β̂ⱼ of the jth
regression coefficient has a standard deviation, and this standard deviation is
estimated by its standard error, SE(β̂ⱼ). The formula for the standard error is most
easily stated using matrices (see Section 18.2). The important point is that, as far
as standard errors are concerned, there is nothing conceptually different between
the single- or multiple-regressor cases. The key ideas (the large-sample normality of the estimators and the ability to estimate consistently the standard deviation of their sampling distribution) are the same whether one has one, two, or 12 regressors.

Hypothesis Tests for a Single Coefficient

Suppose that you want to test the hypothesis that a change in the student-teacher
ratio has no effect on test scores, holding constant the percentage of English learners in the district. This corresponds to hypothesizing that the true coefficient β₁ on
the student-teacher ratio is zero in the population regression of test scores on STR
and PctEL. More generally, we might want to test the hypothesis that the true
coefficient βⱼ on the jth regressor takes on some specific value, β_{j,0}. The null value
β_{j,0} comes either from economic theory or, as in the student-teacher ratio example, from the decision-making context of the application. If the alternative hypothesis is two-sided, then the two hypotheses can be written mathematically as

    H₀: βⱼ = β_{j,0}  vs.  H₁: βⱼ ≠ β_{j,0}  (two-sided alternative).   (7.1)


KEY CONCEPT 7.1

TESTING THE HYPOTHESIS βⱼ = β_{j,0} AGAINST THE ALTERNATIVE βⱼ ≠ β_{j,0}

1. Compute the standard error of β̂ⱼ, SE(β̂ⱼ).

2. Compute the t-statistic,

    t = (β̂ⱼ − β_{j,0}) / SE(β̂ⱼ).   (7.2)

3. Compute the p-value,

    p-value = 2Φ(−|t^act|),   (7.3)

where t^act is the value of the t-statistic actually computed. Reject the hypothesis at
the 5% significance level if the p-value is less than 0.05 or, equivalently, if |t^act| > 1.96.

The standard error and (typically) the t-statistic and p-value testing βⱼ = 0 are
computed automatically by regression software.

For example, if the first regressor is STR, then the null hypothesis that changing
the student-teacher ratio has no effect on test scores corresponds to the null
hypothesis that β₁ = 0 (so β_{1,0} = 0). Our task is to test the null hypothesis H₀
against the alternative H₁ using a sample of data.
Key Concept 5.2 gives a procedure for testing this null hypothesis when there
is a single regressor. The first step in this procedure is to calculate the standard
error of the coefficient. The second step is to calculate the t-statistic using the general formula in Key Concept 5.1. The third step is to compute the p-value of the
test using the cumulative normal distribution in Appendix Table 1 or, alternatively,
to compare the t-statistic to the critical value corresponding to the desired significance level of the test. The theoretical underpinning of this procedure is that the
OLS estimator has a large-sample normal distribution which, under the null
hypothesis, has as its mean the hypothesized true value, and that the variance of
this distribution can be estimated consistently.

This underpinning is present in multiple regression as well. As stated in Key
Concept 6.5, the sampling distribution of β̂ⱼ is approximately normal. Under the
null hypothesis the mean of this distribution is β_{j,0}. The variance of this distribution can be estimated consistently. Therefore we can simply follow the same procedure as in the single-regressor case to test the null hypothesis in Equation (7.1).

The procedure for testing a hypothesis on a single coefficient in multiple
regression is summarized as Key Concept 7.1. The t-statistic actually computed

KEY CONCEPT 7.2

CONFIDENCE INTERVALS FOR A SINGLE COEFFICIENT IN MULTIPLE REGRESSION

A 95% two-sided confidence interval for the coefficient βⱼ is an interval that contains the true value of βⱼ with a 95% probability; that is, it contains the true value
of βⱼ in 95% of all possible randomly drawn samples. Equivalently, it is the set of
values of βⱼ that cannot be rejected by a 5% two-sided hypothesis test. When the
sample size is large, the 95% confidence interval is

    95% confidence interval for βⱼ = [β̂ⱼ − 1.96 SE(β̂ⱼ), β̂ⱼ + 1.96 SE(β̂ⱼ)].   (7.4)

A 90% confidence interval is obtained by replacing 1.96 in Equation (7.4)
with 1.645.

is denoted t^act in this Key Concept. However, it is customary to denote this simply
as t, and we adopt this simplified notation for the rest of the book.

Confidence Intervals for a Single Coefficient


The method for constructing a confidence interval in the multiple regression
model is also the same as in the single-regressor model. This method is summarized as Key Concept 7.2.

The method for conducting a hypothesis test in Key Concept 7.1 and the
method for constructing a confidence interval in Key Concept 7.2 rely on the large-sample normal approximation to the distribution of the OLS estimator β̂ⱼ. Accordingly, it should be kept in mind that these methods for quantifying the sampling
uncertainty are only guaranteed to work in large samples.

Application to Test Scores and the Student-Teacher Ratio

Can we reject the null hypothesis that a change in the student-teacher ratio has
no effect on test scores, once we control for the percentage of English learners in
the district? What is a 95% confidence interval for the effect on test scores of a
change in the student-teacher ratio, controlling for the percentage of English
learners? We are now able to find out. The regression of test scores against STR
and PctEL, estimated by OLS, was given in Equation (6.12) and is restated here
with standard errors in parentheses below the coefficients:


    TestScore^ = 686.0 − 1.10 × STR − 0.650 × PctEL    (7.5)
                  (8.7)  (0.43)         (0.031)

To test the hypothesis that the true coefficient on STR is 0, we first need to
compute the t-statistic in Equation (7.2). Because the null hypothesis says that the
true value of this coefficient is zero, the t-statistic is t = (−1.10 − 0)/0.43 = −2.54.
The associated p-value is 2Φ(−2.54) = 1.1%; that is, the smallest significance level
at which we can reject the null hypothesis is 1.1%. Because the p-value is less than
5%, the null hypothesis can be rejected at the 5% significance level (but not quite
at the 1% significance level).

A 95% confidence interval for the population coefficient on STR is −1.10 ±
1.96 × 0.43 = (−1.95, −0.26); that is, we can be 95% confident that the true value
of the coefficient is between −1.95 and −0.26. Interpreted in the context of the
superintendent's interest in decreasing the student-teacher ratio by 2, the 95%
confidence interval for the effect on test scores of this reduction is (−1.95 × 2,
−0.26 × 2) = (−3.90, −0.52).
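These calculations are easy to reproduce. The sketch below (ours, not the text's) uses the rounded coefficient and standard error from Equation (7.5), so the results differ in the last digit from the −2.54 and (−1.95, −0.26) above, which are based on unrounded estimates:

```python
import math

beta_hat, se = -1.10, 0.43  # rounded values from Equation (7.5)

t = (beta_hat - 0) / se                       # Equation (7.2)
p_value = math.erfc(abs(t) / math.sqrt(2))    # = 2 * Phi(-|t|), Equation (7.3)
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)  # Equation (7.4)

print(f"t = {t:.2f}, p-value = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The identity 2Φ(−|t|) = erfc(|t|/√2) lets the p-value be computed with the standard library alone.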
Adding expenditures per pupil to the equation. Your analysis of the multiple regression in Equation (7.5) has persuaded the superintendent that, based on
the evidence so far, reducing class size will help test scores in her district. Now,
however, she moves on to a more nuanced question. If she is to hire more teachers, she can pay for those teachers either through cuts elsewhere in the budget (no
new computers, reduced maintenance, and so on) or by asking for an increase in
her budget, which taxpayers do not favor. What, she asks, is the effect on test scores
of reducing the student-teacher ratio, holding expenditures per pupil (and the percentage of English learners) constant?

This question can be addressed by estimating a regression of test scores on the
student-teacher ratio, total spending per pupil, and the percentage of English
learners. The OLS regression line is

    TestScore^ = 649.6 − 0.29 × STR + 3.87 × Expn − 0.656 × PctEL    (7.6)
                  (15.5)  (0.48)        (1.59)         (0.032)

where Expn is total annual expenditures per pupil in the district in thousands of
dollars.

The result is striking. Holding expenditures per pupil and the percentage of
English learners constant, changing the student-teacher ratio is estimated to have
a very small effect on test scores: The estimated coefficient on STR is −1.10 in
Equation (7.5) but, after adding Expn as a regressor in Equation (7.6), it is only
−0.29. Moreover, the t-statistic for testing that the true value of the coefficient is

zero is now t = (−0.29 − 0)/0.48 = −0.60, so the hypothesis that the population
value of this coefficient is indeed zero cannot be rejected even at the 10% significance level (|−0.60| < 1.645). Thus Equation (7.6) provides no evidence that hiring more teachers improves test scores if overall expenditures per pupil are held
constant.

One interpretation of the regression in Equation (7.6) is that, in these California data, school administrators allocate their budgets efficiently. Suppose, counterfactually, that the coefficient on STR in Equation (7.6) were negative and large.
If so, school districts could raise their test scores simply by decreasing funding for
other purposes (textbooks, technology, sports, and so on) and transferring those
funds to hire more teachers, thereby reducing class sizes while holding expenditures constant. However, the small and statistically insignificant coefficient on STR
in Equation (7.6) indicates that this transfer would have little effect on test scores.
Put differently, districts are already allocating their funds efficiently.

Note that the standard error on STR increased when Expn was added, from
0.43 in Equation (7.5) to 0.48 in Equation (7.6). This illustrates the general point,
introduced in Section 6.7 in the context of imperfect multicollinearity, that correlation between regressors (the correlation between STR and Expn is −0.62) can
make the OLS estimators less precise.

What about our angry taxpayer? He asserts that the population values of both
the coefficient on the student-teacher ratio (β₁) and the coefficient on spending
per pupil (β₂) are zero; that is, he hypothesizes that both β₁ = 0 and β₂ = 0.
Although it might seem that we can reject this hypothesis because the t-statistic
testing β₂ = 0 in Equation (7.6) is t = 3.87/1.59 = 2.43, this reasoning is flawed.
The taxpayer's hypothesis is a joint hypothesis, and to test it we need a new tool,
the F-statistic.


7.2 Tests of Joint Hypotheses

This section describes how to formulate joint hypotheses on multiple regression
coefficients and how to test them using an F-statistic.

Testing Hypotheses on Two or More Coefficients

Joint null hypotheses. Consider the regression in Equation (7.6) of the test
score against the student-teacher ratio, expenditures per pupil, and the percentage of English learners. Our angry taxpayer hypothesizes that neither the student-teacher ratio nor expenditures per pupil have an effect on test scores, once



we control for the percentage of English learners. Because STR is the first regressor in Equation (7.6) and Expn is the second, we can write this hypothesis mathematically as

    H₀: β₁ = 0 and β₂ = 0  vs.  H₁: β₁ ≠ 0 and/or β₂ ≠ 0.   (7.7)

The hypothesis that both the coefficient on the student-teacher ratio (β₁) and
the coefficient on expenditures per pupil (β₂) are zero is an example of a joint
hypothesis on the coefficients in the multiple regression model. In this case, the
null hypothesis restricts the value of two of the coefficients, so as a matter of terminology we can say that the null hypothesis in Equation (7.7) imposes two restrictions on the multiple regression model: β₁ = 0 and β₂ = 0.
In general, a joint hypothesis is a hypothesis that imposes two or more restrictions on the regression coefficients. We consider joint null and alternative hypotheses of the form

    H₀: βⱼ = β_{j,0}, βₘ = β_{m,0}, ..., for a total of q restrictions, vs.
    H₁: one or more of the q restrictions under H₀ does not hold,   (7.8)

where βⱼ, βₘ, ..., refer to different regression coefficients, and β_{j,0}, β_{m,0}, ..., refer
to the values of these coefficients under the null hypothesis. The null hypothesis
in Equation (7.7) is an example of Equation (7.8). Another example is that, in a
regression with k = 6 regressors, the null hypothesis is that the coefficients on the
2nd, 4th, and 5th regressors are zero; that is, β₂ = 0, β₄ = 0, and β₅ = 0, so that there
are q = 3 restrictions. In general, under the null hypothesis H₀ there are q such
restrictions.

If any one (or more than one) of the equalities under the null hypothesis H₀
in Equation (7.8) is false, then the joint null hypothesis itself is false. Thus, the alternative hypothesis is that at least one of the equalities in the null hypothesis H₀
does not hold.

Why can't I just test the individual coefficients one at a time? Although
it seems it should be possible to test a joint hypothesis by using the usual t-statistics to test the restrictions one at a time, the following calculation shows that this
approach is unreliable. Specifically, suppose that you are interested in testing the
joint null hypothesis in Equation (7.7) that β₁ = 0 and β₂ = 0. Let t₁ be the t-statistic for testing the null hypothesis that β₁ = 0, and let t₂ be the t-statistic for testing the null hypothesis that β₂ = 0. What happens when you use the "one at a time"
testing procedure: Reject the joint null hypothesis if either t₁ or t₂ exceeds 1.96 in
absolute value?


Because this question involves the two random variables t₁ and t₂, answering
it requires characterizing the joint sampling distribution of t₁ and t₂. As mentioned
in Section 6.6, in large samples β̂₁ and β̂₂ have a joint normal distribution, so under
the joint null hypothesis the t-statistics t₁ and t₂ have a bivariate normal distribution where each t-statistic has mean equal to 0 and variance equal to 1.

First consider the special case in which the t-statistics are uncorrelated and
thus are independent. What is the size of the "one at a time" testing procedure;
that is, what is the probability that you will reject the null hypothesis when it is
true? More than 5%! In this special case we can calculate the rejection probability
of this method exactly. The null is not rejected only if both |t₁| ≤ 1.96 and
|t₂| ≤ 1.96. Because the t-statistics are independent, Pr(|t₁| ≤ 1.96 and |t₂| ≤ 1.96)
= Pr(|t₁| ≤ 1.96) × Pr(|t₂| ≤ 1.96) = 0.95² = 0.9025 = 90.25%. So the probability
of rejecting the null hypothesis when it is true is 1 − 0.95² = 9.75%. This "one at
a time" method rejects the null too often because it gives you too many chances:
If you fail to reject using the first t-statistic, you get to try again using the second.
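The 9.75% rejection rate can also be checked by simulation. This sketch (ours, not the text's) draws pairs of independent standard normal t-statistics under the null and applies the "one at a time" rule:

```python
import random

random.seed(1)
reps = 100_000
rejections = 0
for _ in range(reps):
    # Under the null, with uncorrelated t-statistics, t1 and t2 are
    # (in large samples) independent standard normals
    t1, t2 = random.gauss(0, 1), random.gauss(0, 1)
    if abs(t1) > 1.96 or abs(t2) > 1.96:
        rejections += 1

size = rejections / reps
print(f"'one at a time' size: {size:.4f}")  # theoretical value 1 - 0.95^2 = 0.0975
```

The simulated size lands near 0.0975 rather than the intended 0.05, which is exactly the wrong-size problem described in the text.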

If the regressors are correlated, the situation is even more complicated. The
size of the "one at a time" procedure depends on the value of the correlation
between the regressors. Because the "one at a time" testing approach has the
wrong size (that is, its rejection rate under the null hypothesis does not equal the
desired significance level), a new approach is needed.

One approach is to modify the "one at a time" method so that it uses different critical values that ensure that its size equals its significance level. This method,
called the Bonferroni method, is described in Appendix 7.1. The advantage of the
Bonferroni method is that it applies very generally. Its disadvantage is that it can
have low power; it frequently fails to reject the null hypothesis when in fact the
alternative hypothesis is true.

Fortunately, there is another approach to testing joint hypotheses that is more
powerful, especially when the regressors are highly correlated. That approach is
based on the F-statistic.

The F-Statistic

The F-statistic is used to test joint hypotheses about regression coefficients. The
formulas for the F-statistic are integrated into modern regression software. We
will first discuss the case of two restrictions, then turn to the general case of q
restrictions.

The F-statistic with q = 2 restrictions. When the joint null hypothesis has
the two restrictions that β₁ = 0 and β₂ = 0, the F-statistic combines the two t-statistics t₁ and t₂ using the formula


    F = (1/2) × [ (t₁² + t₂² − 2ρ̂_{t₁,t₂} t₁t₂) / (1 − ρ̂²_{t₁,t₂}) ],   (7.9)

where ρ̂_{t₁,t₂} is an estimator of the correlation between the two t-statistics.

To understand the F-statistic in Equation (7.9), first suppose that we know that
the t-statistics are uncorrelated so we can drop the terms involving ρ̂_{t₁,t₂}. If so, Equation (7.9) simplifies and F = (t₁² + t₂²)/2; that is, the F-statistic is the average of the
squared t-statistics. Under the null hypothesis, t₁ and t₂ are independent standard
normal random variables (because the t-statistics are uncorrelated by assumption),
so under the null hypothesis F has an F₂,∞ distribution (Section 2.4). Under the
alternative hypothesis that either β₁ is nonzero or β₂ is nonzero (or both), then
either t₁² or t₂² (or both) will be large, leading the test to reject the null hypothesis.

In general the t-statistics are correlated, and the formula for the F-statistic in
Equation (7.9) adjusts for this correlation. This adjustment is made so that, under
the null hypothesis, the F-statistic has an F₂,∞ distribution in large samples whether
or not the t-statistics are correlated.
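Equation (7.9) is simple enough to code directly. The helper function below is our own sketch, not part of any regression package; setting ρ̂ = 0 reproduces the average-of-squared-t-statistics special case:

```python
def f_stat_two_restrictions(t1, t2, rho_hat):
    """Equation (7.9): F-statistic combining two t-statistics whose
    estimated correlation is rho_hat."""
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho_hat * t1 * t2) / (1 - rho_hat ** 2)

# Uncorrelated case collapses to the average of the squared t-statistics:
print(f_stat_two_restrictions(2.0, 1.0, 0.0))   # (2^2 + 1^2) / 2 = 2.5
print(f_stat_two_restrictions(2.0, 1.0, 0.5))
```

Note how a positive estimated correlation between the t-statistics changes the value of F even though t₁ and t₂ are unchanged, which is the correlation adjustment described above.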

The F-statistic with q restrictions. The formula for the heteroskedasticity-robust F-statistic testing the q restrictions of the joint null hypothesis in Equation
(7.8) is given in Section 18.3. This formula is incorporated into regression software,
making the F-statistic easy to compute in practice.

Under the null hypothesis, the F-statistic has a sampling distribution that, in
large samples, is given by the F_{q,∞} distribution. That is, in large samples, under the
null hypothesis,

    the F-statistic is distributed F_{q,∞}.   (7.10)

Thus the critical values for the F-statistic can be obtained from the tables of
the F_{q,∞} distribution in Appendix Table 4 for the appropriate value of q and the
desired significance level.

Computing the heteroskedasticity-robust F-statistic in statistical
software. If the F-statistic is computed using the general heteroskedasticity-robust formula, its large-n distribution under the null hypothesis is F_{q,∞} regardless
of whether the errors are homoskedastic or heteroskedastic. As discussed in
Section 5.4, for historical reasons most statistical software computes homoskedasticity-only standard errors by default. Consequently, in some software packages
you must select a "robust" option so that the F-statistic is computed using heteroskedasticity-robust standard errors (and, more generally, a heteroskedasticity-robust estimate of the "covariance matrix"). The homoskedasticity-only version
of the F-statistic is discussed at the end of this section.


Computing the p-value using the F-statistic. The p-value of the F-statistic
can be computed using the large-sample F_{q,∞} approximation to its distribution. Let
F^act denote the value of the F-statistic actually computed. Because the F-statistic
has a large-sample F_{q,∞} distribution under the null hypothesis, the p-value is

    p-value = Pr[F_{q,∞} > F^act].   (7.11)


The p-value in Equation (7.11) can be evaluated using a table of the F_{q,∞} distribution (or, alternatively, a table of the χ²_q distribution, because a χ²_q-distributed
random variable is q times an F_{q,∞}-distributed random variable). Alternatively, the
p-value can be evaluated using a computer, because formulas for the cumulative
chi-squared and F distributions have been incorporated into most modern statistical software.
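For q = 2 the evaluation is especially simple, because a chi-squared random variable with 2 degrees of freedom has survival function exp(−x/2). The following sketch (our own helper, not from any statistical package) uses this closed form for Equation (7.11):

```python
import math

def f_pvalue_q2(f_act):
    """p-value of Equation (7.11) for q = 2: Pr(F_{2,inf} > f_act).
    A chi-squared(2) variable is 2 times an F_{2,inf} variable, and the
    chi-squared(2) survival function is exp(-x/2), so the p-value is
    exp(-f_act)."""
    return math.exp(-f_act)

print(f"p-value for F = 5.43: {f_pvalue_q2(5.43):.4f}")
```

Applied to an F-statistic of 5.43 this gives about 0.0044; any small difference from a p-value reported to one significant digit presumably reflects rounding of the F-statistic itself.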

The "overall" regression F-statistic. The "overall" regression F-statistic
tests the joint hypothesis that all the slope coefficients are zero. That is, the null
and alternative hypotheses are

    H₀: β₁ = 0, β₂ = 0, ..., βₖ = 0  vs.  H₁: βⱼ ≠ 0, at least one j, j = 1, ..., k.   (7.12)

Under this null hypothesis, none of the regressors explains any of the variation in
Yᵢ, although the intercept (which under the null hypothesis is the mean of Yᵢ) can
be nonzero. The null hypothesis in Equation (7.12) is a special case of the general
null hypothesis in Equation (7.8), and the overall regression F-statistic is the
F-statistic computed for the null hypothesis in Equation (7.12). In large samples,
the overall regression F-statistic has an F_{k,∞} distribution when the null hypothesis
is true.


The F-statistic when q = 1. When q = 1, the F-statistic tests a single restriction. Then the joint null hypothesis reduces to the null hypothesis on a single
regression coefficient, and the F-statistic is the square of the t-statistic.


Application to Test Scores and the Student-Teacher Ratio

We are now able to test the null hypothesis that the coefficients on both the student-teacher ratio and expenditures per pupil are zero, against the alternative that
at least one coefficient is nonzero, controlling for the percentage of English learners in the district.

To test this hypothesis, we need to compute the heteroskedasticity-robust
F-statistic of the test that β₁ = 0 and β₂ = 0 using the regression of TestScore on STR,



Expn, and Pet EL reported in Equation (7.6).This F-statistic is 5.43. Under then Jll

hypothesis, in large samples this statistic has uo F2: 11 disuiburion. The 5o ern ~ 1
value of the F2.?0 distribution is 3.00 (Appendix Table 4), and the 1 Clfu cnllcal' to "
is 4.61. The value of the F-stausuc computed from the data. 5.-4\ excc:cl:b 4.o.. ~v
the null hypothesis is rejected at the 1% level. It is very uoltkel) that \\ e vn ...
have drawn a sample that produced un F-statJStjc as large as 5.43 if tl e null h~ ~th
esis really were true (the pvalue is 0.005). Based on the evtdeoce in Equatton (7 6J
as summarized in this F-statistic, we can reject the taxpayer's h) pothe,is that r.ci.
ther the student-teacher ra tio nor expendi tures per pupil ha' e Jn effect on t ~ t
scores (holding constaot the percentage of English learners).

The Homoskedasticity-Only F-Statistic

One way to restate the question addressed by the F-statistic is to ask whether
relaxing the q restrictions that constitute the null hypothesis improves the fit of
the regression by enough that this improvement is unlikely to be the result merely
of random sampling variation if the null hypothesis is true. This restatement suggests that there is a link between the F-statistic and the regression R²: A large F-statistic should, it seems, be associated with a substantial increase in the R². In fact,
if the error uᵢ is homoskedastic, this intuition has an exact mathematical expression. That is, if the error term is homoskedastic, the F-statistic can be written in
terms of the improvement in the fit of the regression as measured either by the
sum of squared residuals or by the regression R². The resulting F-statistic is
referred to as the homoskedasticity-only F-statistic, because it is valid only if the
error term is homoskedastic. In contrast, the heteroskedasticity-robust F-statistic
computed using the formula in Section 18.3 is valid whether the error term is
homoskedastic or heteroskedastic. Despite this significant limitation of the
homoskedasticity-only F-statistic, its simple formula sheds light on what the F-statistic is doing. In addition, the simple formula can be computed using standard
regression output, such as might be reported in a table that includes regression
R²'s but not F-statistics.
The homoskedasticity-only F-statistic is computed using a simple formula
based on the sum of squared residuals from two regressions. In the first regression,
called the restricted regression, the null hypothesis is forced to be true. When the
null hypothesis is of the type in Equation (7.8), where all the hypothesized values
are zero, the restricted regression is the regression in which those coefficients are
set to zero; that is, the relevant regressors are excluded from the regression. In the
second regression, called the unrestricted regression, the alternative hypothesis is
allowed to be true. If the sum of squared residuals is sufficiently smaller in the unrestricted than the restricted regression, then the test rejects the null hypothesis.


The homoskedasticity-only F-statistic is given by the formula

    F = [ (SSR_restricted − SSR_unrestricted) / q ] / [ SSR_unrestricted / (n − k_unrestricted − 1) ],   (7.13)

where SSR_restricted is the sum of squared residuals from the restricted regression,
SSR_unrestricted is the sum of squared residuals from the unrestricted regression, q is
the number of restrictions under the null hypothesis, and k_unrestricted is the number
of regressors in the unrestricted regression. An alternative equivalent formula for
the homoskedasticity-only F-statistic is based on the R² of the two regressions:

    F = [ (R²_unrestricted − R²_restricted) / q ] / [ (1 − R²_unrestricted) / (n − k_unrestricted − 1) ].   (7.14)

If the errors are homoskedastic, then the difference between the homoskedasticity-only F-statistic computed using Equation (7.13) or (7.14) and the heteroskedasticity-robust F-statistic vanishes as the sample size n increases. Thus, if
the errors are homoskedastic, the sampling distribution of the rule-of-thumb F-statistic under the null hypothesis is, in large samples, F_{q,∞}.

These rule-of-thumb formulas are easy to compute and have an intuitive interpretation in terms of how well the unrestricted and restricted regressions fit the
data. Unfortunately, they are valid only if the errors are homoskedastic. Because
homoskedasticity is a special case that cannot be counted on in applications with
economic data, or more generally with data sets typically found in the social sciences, in practice the homoskedasticity-only F-statistic is not a satisfactory substitute for the heteroskedasticity-robust F-statistic.


Using the homoskedasticity-only F-statistic when n is small. If the errors
are homoskedastic and are i.i.d. normally distributed, then the homoskedasticity-only F-statistic defined in Equations (7.13) and (7.14) has an
F_{q, n−k_unrestricted−1} distribution under the null hypothesis. Critical values for this distribution, which
depend on both q and n − k_unrestricted − 1, are given in Appendix Table 5. As discussed in Section 2.4, the F_{q, n−k_unrestricted−1} distribution converges to the F_{q,∞} distribution as n increases; for large sample sizes, the differences between the two
distributions are negligible. For small samples, however, the two sets of critical values differ.

Application to Test Scores and the Student-Teacher Ratio. To test the null hypothesis that the population coefficients on STR and Expn are 0, controlling for PctEL, we need to compute the SSR (or R²) for the restricted and unrestricted regression. The unrestricted regression has the regressors STR, Expn, and PctEL, and is given in Equation (7.6); its R² is 0.4366, that is, R²_unrestricted = 0.4366.

232   CHAPTER 7   Hypothesis Tests and Confidence Intervals in Multiple Regression

The restricted regression imposes the joint null hypothesis that the true coefficients on STR and Expn are zero; that is, under the null hypothesis STR and Expn do not enter the population regression, although PctEL does (the null hypothesis does not restrict the coefficient on PctEL). The restricted regression, estimated by OLS, is

TestScore^ = 664.7 - 0.671 x PctEL,  R² = 0.4149,   (7.15)
             (1.0)   (0.032)

so R²_restricted = 0.4149. The number of restrictions is q = 2, the number of observations is n = 420, and the number of regressors in the unrestricted regression is k = 3. The homoskedasticity-only F-statistic, computed using Equation (7.14), is

F = [(0.4366 - 0.4149)/2] / [(1 - 0.4366)/(420 - 3 - 1)] = 8.01.

Because 8.01 exceeds the 1% critical value of 4.61, the hypothesis is rejected at the 1% level using this rule-of-thumb approach.

This example illustrates the advantages and disadvantages of the homoskedasticity-only F-statistic. Its advantage is that it can be computed using a calculator. Its disadvantage is that the values of the homoskedasticity-only and heteroskedasticity-robust F-statistics can be very different: The heteroskedasticity-robust F-statistic testing this joint hypothesis is 5.43, quite different from the less reliable homoskedasticity-only rule-of-thumb value of 8.01.
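The rule-of-thumb calculation above can be reproduced in a few lines. The sketch below implements the R² form of Equation (7.14); the helper name homoskedastic_f is ours, not from any package, and the inputs are the test-score numbers from the text.

```python
# Homoskedasticity-only ("rule-of-thumb") F-statistic via the R^2 form,
# Equation (7.14).  The function name is ours; the numbers below are the
# test-score values quoted in the text.
def homoskedastic_f(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """F = [(R2_unr - R2_restr)/q] / [(1 - R2_unr)/(n - k_unr - 1)]."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

f_stat = homoskedastic_f(0.4366, 0.4149, q=2, n=420, k_unrestricted=3)
print(round(f_stat, 2))  # 8.01, matching the rule-of-thumb value in the text
```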

7.3 Testing Single Restrictions Involving Multiple Coefficients

Sometimes economic theory suggests a single restriction that involves two or more regression coefficients. For example, theory might suggest a null hypothesis of the form β1 = β2; that is, the effects of the first and second regressor are the same. In this case, the task is to test this null hypothesis against the alternative that the two coefficients differ:

H0: β1 = β2  vs.  H1: β1 ≠ β2.   (7.16)

This null hypothesis has a single restriction, so q = 1, but that restriction involves multiple coefficients (β1 and β2). We need to modify the methods presented so far to test this hypothesis. There are two approaches; which one will be easiest depends on your software.

Approach #1: Test the restriction directly. Some statistical packages have a specialized command designed to test restrictions like Equation (7.16), and the result is an F-statistic that, because q = 1, has an F_1,∞ distribution under the null hypothesis. (Recall from Section 2.4 that the square of a standard normal random variable has an F_1,∞ distribution, so the 95th percentile of the F_1,∞ distribution is 1.96² = 3.84.)

Approach #2: Transform the regression. If your statistical package cannot test the restriction directly, the hypothesis in Equation (7.16) can be tested using a trick in which the original regression equation is rewritten to turn the restriction in Equation (7.16) into a restriction on a single regression coefficient. To be concrete, suppose there are only two regressors, X1i and X2i, in the regression, so the population regression has the form

Yi = β0 + β1X1i + β2X2i + ui.   (7.17)

Here is the trick: By subtracting and adding β2X1i, we have that β1X1i + β2X2i = β1X1i - β2X1i + β2X1i + β2X2i = (β1 - β2)X1i + β2(X1i + X2i) = γ1X1i + β2Wi, where γ1 = β1 - β2 and Wi = X1i + X2i. Thus, the population regression in Equation (7.17) can be rewritten as

Yi = β0 + γ1X1i + β2Wi + ui.   (7.18)

Because the coefficient γ1 in this equation is γ1 = β1 - β2, under the null hypothesis in Equation (7.16), γ1 = 0, while under the alternative, γ1 ≠ 0. Thus, by turning Equation (7.17) into Equation (7.18), we have turned a restriction on two regression coefficients into a restriction on a single regression coefficient.

Because the restriction now involves the single coefficient γ1, the null hypothesis in Equation (7.16) can be tested using the t-statistic method of Section 7.1. In practice, this is done by first constructing the new regressor Wi as the sum of the two original regressors, then estimating the regression of Yi on X1i and Wi. A 95% confidence interval for the difference in the coefficients β1 - β2 can be calculated as γ̂1 ± 1.96 SE(γ̂1).

This method can be extended to other restrictions on regression equations using the same trick (see Exercise 7.9).

The two methods (Approaches #1 and #2) are equivalent, in the sense that the F-statistic from the first method equals the square of the t-statistic from the second method.
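The transformation is easy to verify by simulation. The sketch below uses hypothetical simulated data (not the California data set): it generates a regression in which the null β1 = β2 holds, constructs W = X1 + X2, and checks that the OLS coefficient on X1 in the transformed regression estimates γ1 = β1 - β2, which is zero under the null.

```python
# Sketch of Approach #2 on hypothetical simulated data: rewriting
# Y = b0 + b1*X1 + b2*X2 + u as Y = b0 + g1*X1 + b2*W + u with W = X1 + X2
# turns the restriction b1 = b2 into the single restriction g1 = 0.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
beta1 = beta2 = 1.5                       # the null beta1 = beta2 holds here
y = 2.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

w = x1 + x2                               # the constructed regressor W
X = np.column_stack([np.ones(n), x1, w])  # regress Y on a constant, X1, and W
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
gamma1 = coef[1]                          # estimates beta1 - beta2
print(round(gamma1, 3))                   # close to zero under the null
```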


Extension to q > 1. In general, it is possible to have q restrictions under the null hypothesis in which some or all of these restrictions involve multiple coefficients. The F-statistic of Section 7.2 extends to this type of joint hypothesis. The F-statistic can be computed by either of the two methods just discussed for q = 1. Precisely how best to do this in practice depends on the specific regression software being used.

7.4 Confidence Sets for Multiple Coefficients

This section explains how to construct a confidence set for two or more regression coefficients. The method is conceptually similar to the method in Section 7.1 for constructing a confidence set for a single coefficient using the t-statistic, except that the confidence set for multiple coefficients is based on the F-statistic.

A 95% confidence set for two or more coefficients is a set that contains the true population values of these coefficients in 95% of randomly drawn samples. Thus, a confidence set is the generalization to two or more coefficients of a confidence interval for a single coefficient.

Recall that a 95% confidence interval is computed by finding the set of values of the coefficients that are not rejected using a t-statistic at the 5% significance level. This approach can be extended to the case of multiple coefficients. To make this concrete, suppose you are interested in constructing a confidence set for two coefficients, β1 and β2. Section 7.2 showed how to use the F-statistic to test a joint null hypothesis that β1 = β1,0 and β2 = β2,0. Suppose you were to test every possible value of β1,0 and β2,0 at the 5% level. For each pair of candidate values (β1,0, β2,0), you construct the F-statistic and reject it if it exceeds the 5% critical value of 3.00. Because the test has a 5% significance level, the true population values of β1 and β2 will not be rejected in 95% of all samples. Thus, the set of values not rejected at the 5% level by this F-statistic constitutes a 95% confidence set for β1 and β2.

Although this method of trying all possible values of β1,0 and β2,0 works in theory, in practice it is much simpler to use an explicit formula for the confidence set. This formula for the confidence set for an arbitrary number of coefficients is based on the formula for the F-statistic. When there are two coefficients, the resulting confidence sets are ellipses.

As an illustration, Figure 7.1 shows a 95% confidence set (confidence ellipse) for the coefficients on the student-teacher ratio and expenditure per pupil, holding constant the percentage of English learners, based on the estimated regression in Equation (7.6). This ellipse does not include the point (0,0). This means that the null hypothesis that these two coefficients are both zero is rejected using the F-statistic at the 5% significance level, which we already knew from Section 7.2.
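The not-rejected-points description translates directly into a membership test: a point (β1,0, β2,0) is in the 95% confidence set exactly when the F-statistic testing it is at most 3.00, the 5% critical value of F_2,∞. The sketch below writes this out for the two-coefficient case; the point estimates and covariance entries are hypothetical placeholders, not the estimates behind Figure 7.1.

```python
# Membership test for a 95% confidence set for two coefficients: the point
# (b1, b2) is inside iff the Wald/F statistic is at most 3.00 (5% critical
# value of F(2, infinity)).  The estimates and covariance matrix entries
# below are hypothetical placeholders.
beta_hat = (-0.29, 3.87)           # hypothetical (beta1_hat, beta2_hat)
v11, v12, v22 = 0.23, 0.15, 2.45   # hypothetical estimator covariance matrix

def in_confidence_set(b1, b2, crit=3.00):
    d1 = beta_hat[0] - b1
    d2 = beta_hat[1] - b2
    det = v11 * v22 - v12 * v12
    # Wald statistic d' V^{-1} d written out for the 2x2 case, divided by q = 2
    wald = (v22 * d1 * d1 - 2.0 * v12 * d1 * d2 + v11 * d2 * d2) / det
    return wald / 2.0 <= crit

print(in_confidence_set(*beta_hat))  # True: the point estimate is always inside
print(in_confidence_set(0.0, 0.0))   # is (0, 0) rejected at the 5% level?
```

Scanning this test over a grid of (b1, b2) values traces out exactly the kind of ellipse shown in Figure 7.1.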

FIGURE 7.1  95% Confidence Set for Coefficients on STR and Expn from Equation (7.6)

[Figure: The 95% confidence set for the coefficients on STR (β1) and Expn (β2) is an ellipse. The ellipse contains the pairs of values of β1 and β2 that cannot be rejected using the F-statistic at the 5% significance level. Horizontal axis: coefficient on STR (β1); vertical axis: coefficient on Expn (β2).]

The confidence ellipse is a fat sausage with the long part of the sausage oriented in the lower-left/upper-right direction. The reason for this orientation is that the estimated correlation between β̂1 and β̂2 is positive, which in turn arises because the correlation between the regressors STR and Expn is negative (schools that spend more per pupil tend to have fewer students per teacher).

7.5 Model Specification for Multiple Regression

The job of determining which variables to include in multiple regression (that is, the problem of choosing a regression specification) can be quite challenging, and no single rule applies in all situations. But do not despair, because some useful guidelines are available. The starting point for choosing a regression specification is thinking through the possible sources of omitted variable bias. It is important to rely on your expert knowledge of the empirical problem and to focus on obtaining an unbiased estimate of the causal effect of interest; do not rely solely on purely statistical measures of fit such as the R² or R̄².


Omitted Variable Bias in Multiple Regression

The OLS estimators of the coefficients in multiple regression will have omitted variable bias if an omitted determinant of Yi is correlated with at least one of the regressors. For example, students from affluent families often have more learning opportunities than do their less affluent peers, which could lead to better test scores. Moreover, if the district is a wealthy one, then the schools will tend to have larger budgets and lower student-teacher ratios. If so, the affluence of the students and the student-teacher ratio would be negatively correlated, and the OLS estimate of the coefficient on the student-teacher ratio would pick up the effect of average district income, even after controlling for the percentage of English learners. In short, omitting the students' economic background could lead to omitted variable bias in the regression of test scores on the student-teacher ratio and the percentage of English learners.

The general conditions for omitted variable bias in multiple regression are similar to those for a single regressor: If an omitted variable is a determinant of Yi, and if it is correlated with at least one of the regressors, then the OLS estimators will have omitted variable bias. As was discussed in Section 6.6, the OLS estimators are correlated, so in general the OLS estimators of all the coefficients will be biased. The two conditions for omitted variable bias in multiple regression are summarized in Key Concept 7.3.

At a mathematical level, if the two conditions for omitted variable bias are satisfied, then at least one of the regressors is correlated with the error term. This means that the conditional expectation of ui given X1i, ..., Xki is nonzero, so that the first least squares assumption is violated. As a result, the omitted variable bias persists even if the sample size is large; that is, omitted variable bias implies that the OLS estimators are inconsistent.
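Both conditions, and the resulting inconsistency, can be seen in a small simulation (a hypothetical data-generating process, not the school data): the omitted variable z is a determinant of y and is correlated with the regressor x, so the "short" regression that omits z stays biased no matter how large the sample, while the "long" regression that includes z recovers the true coefficient.

```python
# Omitted-variable bias in a hypothetical simulation: z determines y AND is
# correlated with x, so omitting z biases the coefficient on x even when n
# is very large.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
z = rng.normal(size=n)                        # omitted variable (e.g., affluence)
x = -0.8 * z + rng.normal(size=n)             # regressor correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)    # true coefficient on x is 1.0

X_short = np.column_stack([np.ones(n), x])    # omits z
X_long = np.column_stack([np.ones(n), x, z])  # includes z
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)
print(round(b_short[1], 2))  # far from 1.0: bias is 2*cov(x,z)/var(x)
print(round(b_long[1], 2))   # close to the true value 1.0
```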

Model Specification in Theory and in Practice

In theory, when data are available on the omitted variable, the solution to omitted variable bias is to include the omitted variable in the regression. In practice, however, deciding whether to include a particular variable can be difficult and requires judgment.

Our approach to the challenge of potential omitted variable bias is twofold. First, a core or base set of regressors should be chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected; the regression using this base set of regressors is sometimes referred to as a base specification. This base specification should contain the variables of primary interest and the control variables suggested by expert judgment and economic theory.

KEY CONCEPT 7.3
OMITTED VARIABLE BIAS IN MULTIPLE REGRESSION

Omitted variable bias is the bias in the OLS estimator that arises when one or more included regressors are correlated with an omitted variable. For omitted variable bias to arise, two things must be true:

1. At least one of the included regressors must be correlated with the omitted variable.

2. The omitted variable must be a determinant of the dependent variable, Y.

Expert judgment and economic theory are rarely decisive, however, and often the variables suggested by economic theory are not the ones on which you have data. Therefore the next step is to develop a list of candidate alternative specifications, that is, alternative sets of regressors. If the estimates of the coefficients of interest are numerically similar across the alternative specifications, then this provides evidence that the estimates from your base specification are reliable. If, on the other hand, the estimates of the coefficients of interest change substantially across specifications, this often provides evidence that the original specification had omitted variable bias. We elaborate on this approach to model specification in Section 9.2 after studying some tools for specifying regressions.

Interpreting the R² and the Adjusted R² in Practice

An R² or an R̄² near 1 means that the regressors are good at predicting the values of the dependent variable in the sample, and an R² or an R̄² near 0 means they are not. This makes these statistics useful summaries of the predictive ability of the regression. However, it is easy to read more into them than they deserve.

There are four potential pitfalls to guard against when using the R² or R̄²:

1. An increase in the R² or R̄² does not necessarily mean that an added variable is statistically significant. The R² increases whenever you add a regressor, whether or not it is statistically significant. The R̄² does not always increase, but if it does, this does not necessarily mean that the coefficient on that added regressor is statistically significant. To ascertain whether an added variable is statistically significant, you need to perform a hypothesis test using the t-statistic.


KEY CONCEPT 7.4
R² AND R̄²: WHAT THEY TELL YOU AND WHAT THEY DON'T

The R² and R̄² tell you whether the regressors are good at predicting, or "explaining," the values of the dependent variable in the sample of data on hand. If the R² (or R̄²) is nearly 1, then the regressors produce good predictions of the dependent variable in that sample, in the sense that the variance of the OLS residual is small compared to the variance of the dependent variable. If the R² (or R̄²) is nearly 0, the opposite is true.

The R² and R̄² do NOT tell you whether:

1. An included variable is statistically significant;

2. The regressors are a true cause of the movements in the dependent variable;

3. There is omitted variable bias; or

4. You have chosen the most appropriate set of regressors.

2. A high R² or R̄² does not mean that the regressors are a true cause of the dependent variable. Imagine regressing test scores against parking lot area per pupil. Parking lot area is correlated with the student-teacher ratio, with whether the school is in a suburb or a city, and possibly with district income, all of which are correlated with test scores. Thus the regression of test scores on parking lot area per pupil could have a high R² and R̄², but the relationship is not causal (try telling the superintendent that the way to increase test scores is to increase parking space!).

3. A high R² or R̄² does not mean there is no omitted variable bias. Recall the discussion of Section 6.1, which concerned omitted variable bias in the regression of test scores on the student-teacher ratio. The R² of the regression never came up because it played no logical role in this discussion. Omitted variable bias can occur in regressions with a low R², a moderate R², or a high R². Conversely, a low R² does not imply that there necessarily is omitted variable bias.

4. A high R² or R̄² does not necessarily mean you have the most appropriate set of regressors, nor does a low R² or R̄² necessarily mean you have an inappropriate set of regressors. The question of what constitutes the right set of regressors in multiple regression is difficult, and we return to it throughout this textbook. Decisions about the regressors must weigh issues of omitted variable bias, data availability, data quality, and, most importantly, economic theory and the nature of the substantive questions being addressed. None of these questions can be answered simply by having a high (or low) regression R² or R̄².

These points are summarized in Key Concept 7.4.

7.6 Analysis of the Test Score Data Set

This section presents an analysis of the effect on test scores of the student-teacher ratio using the California data set. Our primary purpose is to provide an example in which multiple regression analysis is used to mitigate omitted variable bias. Our secondary purpose is to demonstrate how to use a table to summarize regression results.

Discussion of the base and alternative specifications. This analysis focuses on estimating the effect on test scores of a change in the student-teacher ratio, holding constant student characteristics that the superintendent cannot control. Many factors potentially affect the average test score in a district. Some of the factors that could affect test scores are correlated with the student-teacher ratio, so omitting them from the regression will result in omitted variable bias. If data are available on these omitted variables, the solution to this problem is to include them as additional regressors in the multiple regression. When we do this, the coefficient on the student-teacher ratio is the effect of a change in the student-teacher ratio, holding constant these other factors.

Here we consider three variables that control for background characteristics of the students that could affect test scores. One of these control variables is the one we have used previously, the fraction of students who are still learning English. The two other variables are new and control for the economic background of the students. There is no perfect measure of economic background in the data set, so instead we use two imperfect indicators of low income in the district. The first new variable is the percentage of students who are eligible for receiving a subsidized or free lunch at school. Students are eligible for this program if their family income is less than a certain threshold (approximately 150% of the poverty line). The second new variable is the percentage of students in the district whose families qualify for a California income assistance program. Families are eligible for this income assistance program depending in part on their family income, but the threshold is lower (stricter) than the threshold for the subsidized lunch program. These two variables thus measure the fraction of economically disadvantaged children in the district; although they are related, they are not perfectly correlated (their correlation coefficient is 0.74). Although theory suggests that economic background could be an important omitted factor, theory and expert judgment do not really help us decide which of these two variables (percentage eligible for a subsidized lunch or percentage eligible for income assistance) is a better measure of background. For our base specification, we choose the percentage eligible for a subsidized lunch as the economic background variable, but we consider an alternative specification that includes the other variable as well.

Scatterplots of test scores and these variables are presented in Figure 7.2. Each of these variables exhibits a negative correlation with test scores. The correlation between test scores and the percentage of English learners is -0.64; between test scores and the percentage eligible for a subsidized lunch is -0.87; and between test scores and the percentage qualifying for income assistance is -0.63.

What scale should we use for the regressors? A practical question that arises in regression analysis is what scale you should use for the regressors. In Figure 7.2, the units of the variables are percent, so the maximum possible range of the data is 0 to 100. Alternatively, we could have defined these variables to be a decimal fraction rather than a percent; for example, PctEL could be replaced by the fraction of English learners, FracEL (= PctEL/100), which would range between 0 and 1 instead of between 0 and 100. More generally, in regression analysis some decision usually needs to be made about the scale of both the dependent and independent variables. How, then, should you choose the scale, or units, of the variables?

The general answer to the question of choosing the scale of the variables is to make the regression results easy to read and to interpret. In the test score application, the natural unit for the dependent variable is the score of the test itself. In the regression of TestScore on STR and PctEL reported in Equation (7.5), the coefficient on PctEL is -0.650. If instead the regressor had been FracEL, the regression would have had an identical R² and SER; however, the coefficient on FracEL would have been -65.0. In the specification with PctEL, the coefficient is the predicted change in test scores for a one-percentage-point increase in English learners, holding STR constant; in the specification with FracEL, the coefficient is the predicted change in test scores for an increase by 1 in the fraction of English learners (that is, for a 100-percentage-point increase), holding STR constant. Although these two specifications are mathematically equivalent, for the purposes of interpretation the one with PctEL seems, to us, more natural.

Another consideration when deciding on a scale is to choose the units of the regressors so that the resulting regression coefficients are easy to read. For example, if a regressor is measured in dollars and has a coefficient of 0.00000356, it is easier to read if the regressor is converted to millions of dollars and the coefficient 3.56 is reported.
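The rescaling claim is mechanical and easy to confirm: dividing a regressor by 100 multiplies its OLS coefficient by 100 and leaves the R² (and SER) unchanged. A sketch with hypothetical simulated data standing in for PctEL (the constants 700 and -0.65 echo the text, but the data are simulated, not the California sample):

```python
# Rescaling a regressor from percent to fraction multiplies its coefficient
# by 100 and leaves R^2 unchanged.  Hypothetical simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 400
pct = rng.uniform(0, 100, size=n)            # regressor measured in percent
y = 700.0 - 0.65 * pct + rng.normal(size=n)  # simulated test scores

def ols_slope_and_r2(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return b[1], 1.0 - resid.var() / y.var()

slope_pct, r2_pct = ols_slope_and_r2(pct, y)            # percent units
slope_frac, r2_frac = ols_slope_and_r2(pct / 100.0, y)  # fraction units
print(np.isclose(slope_frac, 100.0 * slope_pct))  # True
print(np.isclose(r2_pct, r2_frac))                # True
```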

FIGURE 7.2  Scatterplots of Test Scores vs. Three Student Characteristics

[Figure: three scatterplots of test score (vertical axis) against, in percent (horizontal axis): (a) the percentage of English language learners; (b) the percentage qualifying for reduced price lunch; (c) the percentage qualifying for income assistance.]

The scatterplots show a negative relationship between test scores and (a) the percentage of English learners (correlation = -0.64); (b) the percentage of students qualifying for a subsidized lunch (correlation = -0.87); and (c) the percentage qualifying for income assistance (correlation = -0.63).

Tabular presentation of results. We are now faced with a communication problem. What is the best way to show the results from several multiple regressions that contain different subsets of the possible regressors? So far, we have presented regression results by writing out the estimated regression equations, as in Equation (7.6). This works well when there are only a few regressors and only a few equations, but with more regressors and equations this method of presentation can be confusing. A better way to communicate the results of several regressions is in a table.

Table 7.1 summarizes the results of regressions of the test score on various sets of regressors. Each column summarizes a separate regression. Each regression has

TABLE 7.1  Results of Regressions of Test Scores on the Student-Teacher Ratio and Student Characteristic Control Variables Using California Elementary School Districts

Dependent variable: average test score in the district.

Regressor                                    (1)        (2)        (3)        (4)        (5)
--------------------------------------------------------------------------------------------
Student-teacher ratio (X1)                 -2.28**    -1.10*     -1.00**    -1.31**    -1.01**
                                           (0.52)     (0.43)     (0.27)     (0.34)     (0.27)
Percent English learners (X2)                         -0.650**   -0.122**   -0.488**   -0.130**
                                                      (0.031)    (0.033)    (0.030)    (0.036)
Percent eligible for subsidized
  lunch (X3)                                                     -0.547**              -0.529**
                                                                 (0.024)               (0.038)
Percent on public income
  assistance (X4)                                                           -0.790**    0.048
                                                                            (0.068)    (0.059)
Intercept                                  698.9**    686.0**    700.2**    698.0**    700.4**
                                           (10.4)     (8.7)      (5.6)      (6.9)      (5.5)

Summary Statistics
SER                                         18.58      14.46      9.08       11.65      9.08
R̄²                                          0.049      0.424      0.773      0.626      0.773
n                                           420        420        420        420        420

These regressions were estimated using the data on K-8 school districts in California, described in Appendix 4.1. Standard errors are given in parentheses under coefficients. The individual coefficient is statistically significant at the *5% level or **1% significance level using a two-sided test.

the same dependent variable, test score. The entries in the first five rows are the estimated regression coefficients, with their standard errors below them in parentheses. The asterisks indicate whether the t-statistics, testing the hypothesis that the relevant coefficient is zero, are significant at the 5% level (one asterisk) or the 1% level (two asterisks). The final three rows contain summary statistics for the regression (the standard error of the regression, SER, and the adjusted R², R̄²) and the sample size (which is the same for all of the regressions, 420 observations).

All the information that we have presented so far in equation format appears as a column of this table. For example, consider the regression of the test score against the student-teacher ratio, with no control variables. In equation form, this regression is

TestScore^ = 698.9 - 2.28 x STR,  R̄² = 0.049, SER = 18.58, n = 420.   (7.19)
             (10.4)  (0.52)


All this information appears in column (1) of Table 7.1. The estimated coefficient on the student-teacher ratio (-2.28) appears in the first row of numerical entries, and its standard error (0.52) appears in parentheses just below the estimated coefficient. The intercept (698.9) and its standard error (10.4) are given in the row labeled "Intercept." (Sometimes you will see this row labeled "constant" because, as discussed in Section 6.2, the intercept can be viewed as the coefficient on a regressor that is always equal to 1.) Similarly, the R̄² (0.049), the SER (18.58), and the sample size n (420) appear in the final rows. The blank entries in the rows of the other regressors indicate that those regressors are not included in this regression.

Although the table does not report t-statistics, these can be computed from the information provided; for example, the t-statistic testing the hypothesis that the coefficient on the student-teacher ratio in column (1) is zero is -2.28/0.52 = -4.38. This hypothesis is rejected at the 1% level, which is indicated by the double asterisk next to the estimated coefficient in the table.
Regressions that include the control variables measuring student characteristics are reported in columns (2)-(5). Column (2), which reports the regression of test scores on the student-teacher ratio and on the percentage of English learners, was previously stated as Equation (7.5).

Column (3) presents the base specification, in which the regressors are the student-teacher ratio and two control variables, the percentage of English learners and the percentage of students eligible for a free lunch.

Columns (4) and (5) present alternative specifications that examine the effect of changes in the way the economic background of the students is measured. In column (4), the percentage of students on income assistance is included as a regressor, and in column (5) both of the economic background variables are included.

Discussion of empirical results. These results suggest three conclusions:

1. Controlling for these student characteristics cuts the effect of the student-teacher ratio on test scores approximately in half. This estimated effect is not very sensitive to which specific control variables are included in the regression. In all cases the coefficient on the student-teacher ratio remains statistically significant at the 5% level. In the four specifications with control variables, regressions (2)-(5), reducing the student-teacher ratio by one student per teacher is estimated to increase average test scores by approximately one point, holding constant student characteristics.

2. The student characteristic variables are very useful predictors of test scores. The student-teacher ratio alone explains only a small fraction of the variation in test scores: The R̄² in column (1) is 0.049. The R̄² jumps, however, when the student characteristic variables are added. For example, the R̄² in the base
CHAPTER 7   Hypothesis Tests and Confidence Intervals in Multiple Regression

specification, regression (3), is 0.773. The signs of the coefficients on the student demographic variables are consistent with the patterns seen in Figure 7.2: Districts with many English learners and districts with many poor children have lower test scores.

3. The control variables are not always individually statistically significant: In specification (5), the hypothesis that the coefficient on the percentage qualifying for income assistance is zero is not rejected at the 5% level (the t-statistic is −0.82). Because adding this control variable to the base specification (3) has a negligible effect on the estimated coefficient for the student–teacher ratio and its standard error, and because the coefficient on this control variable is not significant in specification (5), this additional control variable is redundant, at least for the purposes of this analysis.

7.7 Conclusion

Chapter 6 began with a concern: In the regression of test scores against the student–teacher ratio, omitted student characteristics that influence test scores might be correlated with the student–teacher ratio in the district, and if so the student–teacher ratio in the district would pick up the effect on test scores of these omitted student characteristics. Thus, the OLS estimator would have omitted variable bias. To mitigate this potential omitted variable bias, we augmented the regression by including variables that control for various student characteristics (the percentage of English learners and two measures of student economic background). Doing so cuts the estimated effect of a unit change in the student–teacher ratio in half, although it remains possible to reject the null hypothesis that the population effect on test scores, holding these control variables constant, is zero at the 5% significance level. Because they eliminate omitted variable bias arising from these student characteristics, these multiple regression estimates, hypothesis tests, and confidence intervals are much more useful for advising the superintendent than the single-regressor estimates of Chapters 4 and 5.

The analysis in this and the preceding chapter has presumed that the population regression function is linear in the regressors, that is, that the conditional expectation of Yi given the regressors is a straight line. There is, however, no particular reason to think this is so. In fact, the effect of reducing the student–teacher ratio might be quite different in districts with large classes than in districts that already have small classes. If so, the population regression line is not linear in the X's but rather is a nonlinear function of the X's. To extend our analysis to regression functions that are nonlinear in the X's, however, we need the tools developed in the next chapter.


Summary

1. Hypothesis tests and confidence intervals for a single regression coefficient are carried out using essentially the same procedures that were used in the one-variable linear regression model of Chapter 5. For example, a 95% confidence interval for β1 is given by β̂1 ± 1.96 SE(β̂1).

2. Hypotheses involving more than one restriction on the coefficients are called joint hypotheses. Joint hypotheses can be tested using an F-statistic.

3. Regression specification proceeds by first determining a base specification chosen to address concern about omitted variable bias. The base specification can be modified by including additional regressors that address other potential sources of omitted variable bias. Simply choosing the specification with the highest R2 can lead to regression models that do not estimate the causal effect of interest.
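The confidence-interval formula in item 1 is mechanical; as a sketch, applying it to the column (1) student–teacher ratio estimate reported earlier in the chapter (−2.28 with standard error 0.52):

```python
# 95% confidence interval for a single coefficient: estimate +/- 1.96 SE.
beta_hat, se = -2.28, 0.52
lower, upper = beta_hat - 1.96 * se, beta_hat + 1.96 * se
print(round(lower, 2), round(upper, 2))  # -3.3 -1.26
```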

Key Terms

restrictions (226)
joint hypothesis (226)
F-statistic (227)
restricted regression (230)
unrestricted regression (230)
homoskedasticity-only F-statistic (231)
95% confidence set (234)
base specification (235)
alternative specifications (237)
Bonferroni test (251)

Review the Concepts

7.1 Explain how you would test the null hypothesis that β1 = 0 in the multiple regression model Yi = β0 + β1X1i + β2X2i + ui. Explain how you would test the null hypothesis that β2 = 0. Explain how you would test the joint hypothesis that β1 = 0 and β2 = 0. Why isn't the result of the joint test implied by the results of the first two tests?

7.2 Provide an example of a regression that arguably would have a high value of R2 but would produce biased and inconsistent estimators of the regression coefficient(s). Explain why the R2 is likely to be high. Explain why the OLS estimators would be biased and inconsistent.


Exercises

The first six exercises refer to the table of estimated regressions on page 247, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Ntheast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)

7.1 Add "*" (5%) and "**" (1%) to the table to indicate the statistical significance of the coefficients.

7.2 Using the regression results in column (1):
a. Is the college–high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval of the difference.
b. Is the male–female earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

7.3 Using the regression results in column (2):
a. Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence interval to explain your answer.
b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Construct a 95% confidence interval for the expected difference between their earnings.

7.4 Using the regression results in column (3):
a. Do there appear to be important regional differences? Use an appropriate hypothesis test to explain your answer.


Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE). Standard errors in parentheses.

Regressor               (1)             (2)             (3)
College (X1)            5.46 (0.21)     5.48 (0.21)     5.44 (0.21)
Female (X2)            -2.64 (0.20)    -2.62 (0.20)    -2.62 (0.20)
Age (X3)                                0.29 (0.04)     0.29 (0.04)
Northeast (X4)                                          0.69 (0.30)
Midwest (X5)                                            0.60 (0.28)
South (X6)                                             -0.27 (0.26)
Intercept              12.69 (0.14)     4.40 (1.05)     3.75 (1.06)

Summary Statistics and Joint Tests
F-statistic for regional effects                        6.10
SER                     6.27            6.22            6.21
R̄2                     0.176           0.190           0.194
n                       4000            4000            4000

b. Juanita is a 28-year-old female college graduate from the South. Molly is a 28-year-old female college graduate from the West. Jennifer is a 28-year-old female college graduate from the Midwest.
i. Construct a 95% confidence interval for the difference in expected earnings between Juanita and Molly.
ii. Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juanita and Jennifer. (Hint: What would happen if you included West and excluded Midwest from the regression?)

7.5 The regression shown in column (2) was estimated again, this time using data from 1992 (4000 observations selected at random from the March 1993 CPS, converted into 1998 dollars using the consumer price index). The results are


AHE = 0.77 + 5.29College − 2.59Female + 0.40Age,  SER = 5.85, R̄2 = 0.21.
      (0.98)  (0.20)        (0.18)        (0.03)

Comparing this regression to the regression for 1998 shown in column (2), was there a statistically significant change in the coefficient on College?

7.6 Evaluate the following statement: "In all of the regressions, the coefficient on Female is negative, large, and statistically significant. This provides strong statistical evidence of gender discrimination in the U.S. labor market."

7.7

Que-.tion 6.5 reportl!d the fo llowing regression (wnere standard e rrors ha\,.
been added):

Pmt

= 119.2 + 0.485RDR + 23.4Bath + 0. 156/b.i~e + 0.002Lsize


(23.9)

(2.6 L)

(8.94)

(0.0 11)

(0.0004S)

+ 0.090Age- 48.8Puor, R2 :: 0.72, SE R = 41.5


(0.311 )

(10.5)

a. Is the coefficient on BDR statistically sigmfica ntly different from


t.ero?
b. T)pically five- bedroom houses sell for much more than two-bedrotlm
house!::.. Is thiS consistent witb your an~wer lo (a) and wilh rhe regrl..''~ion more ge ner<~lly?

c. A homeowner purchases 2000 square fee t from an adJacent lot. Con


<;truct a 99% confident imerval for the change in the value o( her
house.
d.

Lot ::.tlC is measured in square feel. Do you think that another scale
might oe more appropriate? Wh~ or why not?

e. The Fsratistic for omitting BDR and Age h om the regression is r =O.OR.At c the cocfficienb on BDR and A ge ~tatistic a ll y different from
zero at the I 0/t~ level?

7.8

Referring to Table 7. 1 tn the text:

a. Construct the R2 for each of the regressions.


b. Construct the homoske<.Ja<;ticity-only f -stallstic for

te~ttnll /Ji

fJJ - tl

m the rcgr~ssion sbown rn column (5). Is the statistic ''gmfkant 11 1r


Yo lt:\d 1
'-" Tc-.t /3~ - /34 = () in the regrc,..wn -.hown in Cl1lumn (5) using the aonr~ I rnni t~o.st UbCUS'>CU in \ pJ)\:nUi '< 7.1.

d. Construct a 99% confidence interval for β1 for the regression in column (5).

7.9 Consider the regression model Yi = β0 + β1X1i + β2X2i + ui. Use "Approach #2" from Section 7.3 to transform the regression so that you can use a t-statistic to test
a. β1 = β2;
b. β1 + aβ2 = 0, where a is a constant;
c. β1 + β2 = 1. (Hint: You must redefine the dependent variable in the regression.)

7.10 Equations (7.13) and (7.14) show two formulas for the homoskedasticity-only F-statistic. Show that the two formulas are equivalent.
Empirical Exercises

E7.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.
a. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope?
b. Run a regression of AHE on Age, gender (Female), and education (Bachelor). What is the estimated effect of Age on earnings? Construct a 95% confidence interval for the coefficient on Age in the regression.
c. Are the results from the regression in (b) substantively different from the results in (a) regarding the effects of Age and AHE? Does the regression in (a) seem to suffer from omitted variable bias?
d. Bob is a 26-year-old male worker with a high school diploma. Predict Bob's earnings using the estimated regression in (b). Alexis is a 30-year-old female worker with a college degree. Predict Alexis's earnings using the regression.
e. Compare the fit of the regression in (a) and (b) using the regression standard errors, R2 and R̄2. Why are the R2 and R̄2 so similar in regression (b)?
f. Are gender and education determinants of earnings? Test the null hypothesis that Female can be deleted from the regression. Test the null hypothesis that Bachelor can be deleted from the regression. Test the null hypothesis that both Female and Bachelor can be deleted from the regression.
g. A regression will suffer from omitted variable bias when two conditions hold. What are these two conditions? Do these conditions seem to hold here?
E7.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.
a. Run a regression of Course_Eval on Beauty. Construct a 95% confidence interval for the effect of Beauty on Course_Eval.
b. Consider the various control variables in the data set. Which do you think should be included in the regression? Using a table like Table 7.1, examine the robustness of the confidence interval that you constructed in (a). What is a reasonable 95% confidence interval for the effect of Beauty on Course_Eval?

E7.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. An education advocacy group argues that, on average, a person's educational attainment would increase by approximately 0.15 year if distance to the nearest college is decreased by 20 miles. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated regression? Explain.
b. Other factors also affect how much college a person completes. Does controlling for these other factors change the estimated effect of distance on college years completed? To answer this question, construct a table like Table 7.1. Include a simple specification (constructed in (a)), a base specification (that includes a set of important control variables), and several modifications of the base specification. Discuss how the estimated effect of Dist on ED changes across the specifications.
c. It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this result consistent with the regressions that you constructed in part (b)?

E7.4 Using the data set Growth described in Empirical Exercise 4.4, but excluding the data for Malta, carry out the following exercises.
a. Run a regression of Growth on TradeShare, YearsSchool, Rev_Coups, Assassinations, and RGDP60. Construct a 95% confidence interval for the coefficient on TradeShare. Is the coefficient statistically significant at the 5% level?

b. Test whether, taken as a group, YearsSchool, Rev_Coups, Assassinations, and RGDP60 can be omitted from the regression. What is the p-value of the F-statistic?

APPENDIX 7.1

The Bonferroni Test of a Joint Hypothesis

The method of Section 7.2 is the preferred way to test joint hypotheses in multiple regression. However, if the author of a study presents regression results but did not test a joint restriction in which you are interested, and you do not have the original data, then you will not be able to compute the F-statistic of Section 7.2. This appendix describes a way to test joint hypotheses that can be used when you only have a table of regression results. This method is an application of a very general testing approach based on Bonferroni's inequality.

The Bonferroni test is a test of a joint hypothesis based on the t-statistics for the individual hypotheses; that is, the Bonferroni test is the one-at-a-time t-statistic test of Section 7.2 done properly. The Bonferroni test of the joint null hypothesis β1 = β1,0 and β2 = β2,0 based on the critical value c > 0 uses the following rule:

Accept if |t1| ≤ c and if |t2| ≤ c; otherwise, reject
(Bonferroni one-at-a-time t-statistic test),   (7.20)

where t1 and t2 are the t-statistics that test the restrictions on β1 and β2, respectively.

The trick is to choose the critical value c in such a way that the probability that the one-at-a-time test rejects when the null hypothesis is true is no more than the desired significance level, say 5%. This is done by using Bonferroni's inequality to choose the critical value c to allow both for the fact that two restrictions are being tested and for any possible correlation between t1 and t2.
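The rule in Equation (7.20) is simple enough to state directly as code; a minimal sketch (the function name is ours, and the example t-statistics are illustrative):

```python
def bonferroni_reject(t_stats, c):
    """Reject the joint null if any |t| exceeds the Bonferroni critical value c."""
    return any(abs(t) > c for t in t_stats)

# With q = 2 restrictions, Table 7.3 gives c = 2.241 at the 5% level
# and c = 2.807 at the 1% level. Illustrative t-statistics:
print(bonferroni_reject([-0.60, 2.43], 2.241))  # True: |2.43| > 2.241
print(bonferroni_reject([-0.60, 2.43], 2.807))  # False: neither |t| exceeds 2.807
```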

Bonferroni's Inequality

Bonferroni's inequality is a basic result of probability theory. Let A and B be events. Let A ∩ B be the event "both A and B" (the intersection of A and B), and let A ∪ B be the event "A or B or both" (the union of A and B). Then Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B). Because Pr(A ∩ B) ≥ 0, it follows that Pr(A ∪ B) ≤ Pr(A) + Pr(B). This inequality in turn implies that 1 − Pr(A ∪ B) ≥ 1 − [Pr(A) + Pr(B)]. Let Ac and Bc be the complements of



A and B, that is, the events "not A" and "not B." Because the complement of A ∪ B is Ac ∩ Bc, 1 − Pr(A ∪ B) = Pr(Ac ∩ Bc), which yields Bonferroni's inequality, Pr(Ac ∩ Bc) ≥ 1 − [Pr(A) + Pr(B)].

Now let A be the event that |t1| > c and B be the event that |t2| > c. Then the inequality Pr(A ∪ B) ≤ Pr(A) + Pr(B) yields

Pr(|t1| > c or |t2| > c or both) ≤ Pr(|t1| > c) + Pr(|t2| > c).   (7.21)
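Inequality (7.21) can be illustrated numerically. A small simulation sketch, assuming (arbitrarily) a correlation of 0.5 between the two t-statistics under the null:

```python
import math
import random

random.seed(0)
n, c, rho = 100_000, 2.241, 0.5
count_a = count_b = count_union = 0

for _ in range(n):
    # Correlated standard normal pair: t2 = rho*t1 + sqrt(1 - rho^2) * e.
    t1 = random.gauss(0.0, 1.0)
    t2 = rho * t1 + math.sqrt(1 - rho**2) * random.gauss(0.0, 1.0)
    a, b = abs(t1) > c, abs(t2) > c
    count_a += a
    count_b += b
    count_union += a or b

# The empirical frequency of "either rejects" never exceeds the sum of the
# individual rejection frequencies, whatever the correlation.
print(count_union <= count_a + count_b)  # True
```

The inequality holds for the empirical counts by construction, since every draw counted in the union is counted at least once in the sum.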

Bonferroni Tests

Because the event "|t1| > c or |t2| > c or both" is the rejection region of the one-at-a-time test, Equation (7.21) provides a way to choose the critical value c so that the one-at-a-time t-statistic test has the desired significance level in large samples. Under the null hypothesis in large samples, Pr(|t1| > c) = Pr(|t2| > c) = Pr(|Z| > c). Thus Equation (7.21) implies that, in large samples, the probability that the one-at-a-time test rejects under the null is

PrH0(one-at-a-time test rejects) ≤ 2 Pr(|Z| > c).   (7.22)

The inequality in Equation (7.22) provides a way to choose the critical value c so that the probability of the rejection under the null hypothesis equals the desired significance level. The Bonferroni approach can be extended to more than two coefficients; if there are q restrictions under the null, the factor of 2 on the right-hand side in Equation (7.22) is replaced by q.

Table 7.3 presents critical values c for the one-at-a-time Bonferroni test for various significance levels and q = 2, 3, and 4. For example, suppose the desired significance level is 5% and q = 2. According to Table 7.3, the critical value c is 2.241. This critical value is the upper 1.25% point of the standard normal distribution, so Pr(|Z| > 2.241) = 2.5%. Thus Equation (7.22) tells us that, in large samples, the one-at-a-time test in Equation (7.20) will reject at most 5% of the time under the null hypothesis.

The critical values in Table 7.3 are larger than the critical values for testing a single restriction. For example, with q = 2, the one-at-a-time test rejects if at least one t-statistic exceeds 2.241 in absolute value. This critical value is greater than 1.96 because it properly corrects for the fact that, by looking at two t-statistics, you get a second chance to reject the joint null hypothesis, as discussed in Section 7.2.

If the individual t-statistics are based on heteroskedasticity-robust standard errors, then the Bonferroni test is valid whether or not there is heteroskedasticity, but if the t-statistics are based on homoskedasticity-only standard errors, the Bonferroni test is valid only under homoskedasticity.

TABLE 7.3  Bonferroni Critical Values c for the One-at-a-Time t-Statistic Test of a Joint Hypothesis

                          Significance Level
Number of
Restrictions (q)     10%        5%         1%
2                   1.960      2.241      2.807
3                   2.128      2.394      2.935
4                   2.241      2.498      3.023
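The entries in Table 7.3 follow from Equation (7.22): c solves q · Pr(|Z| > c) = α, so c is the 1 − α/(2q) quantile of the standard normal distribution. A stdlib-only sketch that recovers this quantile by bisection (the helper names are ours):

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bonferroni_c(alpha, q):
    """Critical value c solving q * Pr(|Z| > c) = alpha, by bisection."""
    target = 1.0 - alpha / (2.0 * q)  # c is the `target` quantile of Z
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(bonferroni_c(0.05, 2), 3))  # 2.241
print(round(bonferroni_c(0.01, 2), 3))  # 2.807
print(round(bonferroni_c(0.05, 3), 3))  # 2.394
```

Setting q = 1 reproduces the usual single-restriction critical values (1.96 at the 5% level), which shows exactly how the Bonferroni correction inflates them.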

Application to Test Scores

The t-statistics testing the joint null hypothesis that the true coefficients on the student–teacher ratio and expenditures per pupil in Equation (7.6) are zero are, respectively, t1 = −0.60 and t2 = 2.43. Although |t1| < 2.241, because |t2| > 2.241, we can reject the joint null hypothesis at the 5% significance level using the Bonferroni test. However, both t1 and t2 are less than 2.807 in absolute value, so we cannot reject the joint null hypothesis at the 1% significance level using the Bonferroni test. In contrast, using the F-statistic in Section 7.2, we were able to reject this hypothesis at the 1% significance level.

CHAPTER 8

Nonlinear Regression Functions

In Chapters 4–7, the population regression function was assumed to be linear. In other words, the slope of the population regression function was constant, so that the effect on Y of a unit change in X does not itself depend on the value of X. But what if the effect on Y of a change in X does depend on the value of one or more of the independent variables? If so, the population regression function is nonlinear.

This chapter develops two groups of methods for detecting and modeling nonlinear population regression functions. The methods in the first group are useful when the effect on Y of a change in one independent variable, X1, depends on the value of X1 itself. For example, reducing class sizes by one student per teacher might have a greater effect if class sizes are already manageably small than if they are so large that the teacher can do little more than keep the class under control. If so, the test score (Y) is a nonlinear function of the student–teacher ratio (X1), where this function is steeper when X1 is small. An example of a nonlinear regression function with this feature is shown in Figure 8.1. Whereas the linear population regression function in Figure 8.1a has a constant slope, the nonlinear population regression function in Figure 8.1b has a steeper slope when X1 is small than when it is large. This first group of methods is presented in Section 8.2.

The methods in the second group are useful when the effect on Y of a change in X1 depends on the value of another independent variable, say X2. For example, students still learning English might especially benefit from having more one-on-one attention; if so, the effect on test scores of reducing the student–teacher ratio will be greater in districts with many students still learning English than in districts with few English learners. In this example, the effect on

FIGURE 8.1  Population Regression Functions with Different Slopes

[Figure: three panels showing population regression functions. (a) Constant slope. (b) Slope depends on the value of X1. (c) Slope depends on the value of X2, with separate population regression functions when X2 = 0 and when X2 = 1.]

In Figure 8.1a, the population regression function has a constant slope. In Figure 8.1b, the slope of the population regression function depends on the value of X1. In Figure 8.1c, the slope of the population regression function depends on the value of X2.

test scores (Y) of a reduction in the student–teacher ratio (X1) depends on the percentage of English learners in the district (X2). As shown in Figure 8.1c, the slope of this type of population regression function depends on the value of X2. This second group of methods is presented in Section 8.3.

In the models of Sections 8.2 and 8.3, the population regression function is a nonlinear function of the independent variables, that is, the conditional expectation E(Yi | X1i, ..., Xki) is a nonlinear function of one or more of the X's. Although they are nonlinear in the X's, these models are linear functions of the unknown coefficients (or parameters) of the population regression model and thus are versions of the multiple regression models of Chapters 6 and 7. Therefore, the


unknown parameters of these nonlinear regression functions can be estimated and tested using OLS and the methods of Chapters 6 and 7.

Sections 8.1 and 8.2 introduce nonlinear regression functions in the context of regression with a single independent variable, and Section 8.3 extends these methods to two independent variables. To keep things simple, additional control variables are omitted in the empirical examples of Sections 8.1–8.3. In practice, however, it is important to analyze nonlinear regression functions in models that control for omitted variable bias by including control variables as well. In Section 8.5 we combine nonlinear regression functions and additional control variables when we take a close look at possible nonlinearities in the relationship between test scores and the student–teacher ratio, holding student characteristics constant. In some applications, the regression function is a nonlinear function of the X's and of the parameters. If so, the parameters cannot be estimated by OLS, but they can be estimated using nonlinear least squares. Appendix 8.1 provides examples of such functions and describes the nonlinear least squares estimator.

8.1 A General Strategy for Modeling Nonlinear Regression Functions

This section lays out a general strategy for modeling nonlinear population regression functions. In this strategy, the nonlinear models are extensions of the multiple regression model and therefore can be estimated and tested using the tools of Chapters 6 and 7. First, however, we return to the California test score data and consider the relationship between test scores and district income.

Test Scores and District Income

In Chapter 7, we found that the economic background of the students is an important factor in explaining performance on standardized tests. That analysis used two economic background variables (the percentage of students qualifying for a subsidized lunch and the percentage of district families qualifying for income


FIGURE 8.2  Scatterplot of Test Score vs. District Income with a Linear OLS Regression Function

There is a positive correlation between test scores and district income (correlation = 0.71), but the linear OLS regression line does not adequately describe the relationship between these variables.

[Figure: scatterplot of test score (600 to 740) against district income (thousands of dollars), with the linear OLS regression line superimposed.]

assistance) to measure the fraction of students in the district coming from poor families. A different, broader measure of economic background is the average annual per capita income in the school district ("district income"). The California data set includes district income measured in thousands of 1998 dollars. The sample contains a wide range of income levels: For the 420 districts in our sample, the median district income is 13.7 (that is, $13,700 per person), and it ranges from 5.3 ($5300 per person) to 55.3 ($55,300 per person).

Figure 8.2 shows a scatterplot of fifth-grade test scores against district income for the California data set, along with the OLS regression line relating these two variables. Test scores and average income are strongly positively correlated, with a correlation coefficient of 0.71; students from affluent districts do better on the tests than students from poor districts. But this scatterplot has a peculiarity: Most of the points are below the OLS line when income is very low (under $10,000) or very high (over $40,000), but are above the line when income is between $15,000 and $30,000. There seems to be some curvature in the relationship between test scores and income that is not captured by the linear regression.

In short, it seems that the relationship between district income and test scores is not a straight line. Rather, it is nonlinear. A nonlinear function is a function with a slope that is not constant: The function f(X) is linear if the slope of f(X) is the same for all values of X, but if the slope depends on the value of X, then f(X) is nonlinear.


If a straight line is not an adequate description of the relationship between district income and test scores, what is? Imagine drawing a curve that fits the points in Figure 8.2. This curve would be steep for low values of district income, then would flatten out as district income gets higher. One way to approximate such a curve mathematically is to model the relationship as a quadratic function. That is, we could model test scores as a function of income and the square of income.

A quadratic population regression model relating test scores and income is written mathematically as

TestScorei = β0 + β1Incomei + β2Incomei² + ui,   (8.1)

where β0, β1, and β2 are coefficients, Incomei is the income in the ith district, Incomei² is the square of income in the ith district, and ui is an error term that, as usual, represents all the other factors that determine test scores. Equation (8.1) is called the quadratic regression model because the population regression function, E(TestScorei | Incomei) = β0 + β1Incomei + β2Incomei², is a quadratic function of the independent variable, Income.

If you knew the population coefficients β0, β1, and β2 in Equation (8.1), you could predict the test score of a district based on its average income. But these population coefficients are unknown and therefore must be estimated using a sample of data.

At first, it might seem difficult to find the coefficients of the quadratic function that best fits the data in Figure 8.2. If you compare Equation (8.1) with the multiple regression model in Key Concept 6.2, however, you will see that Equation (8.1) is in fact a version of the multiple regression model with two regressors: The first regressor is Income, and the second regressor is Income². Thus, after defining the regressors as Income and Income², the nonlinear model in Equation (8.1) is simply a multiple regression model with two regressors!
Becau~t: the 4uudratic regression model is u varhmt of multiple rl.!grcssat>n.H'
unknown popula tmn codfic1ents cnn be est1 matccl and tcs tc:d using thl 015
meth ods described in Chapters (i and 7. E~:otimuting tht. codficicnts ol fquutillll
(H. l) using OLS for the 420 obse rvatio ns in Figure H.2 yie ld'

TestScore^ = 607.3 + 3.85 Income − 0.0423 Income²,  R̄² = 0.554,   (8.2)
            (2.9)   (0.27)        (0.0048)

where (as usual) standard errors of the estimated coefficients are given in
parentheses. The estimated regression function (8.2) is plotted in Figure 8.3,
superimposed over the scatterplot of the data. The quadratic function captures the
curvature in the scatterplot: It is steep for low values of district income but flattens out when district income is high. In short, the quadratic regression function
seems to fit the data better than the linear one.

FIGURE 8.3  Scatterplot of Test Score vs. District Income with Linear and Quadratic Regression Functions: The quadratic OLS regression function fits the data better than the linear OLS regression function. [Test score (600–740) plotted against district income in thousands of dollars (0–60), with the linear and quadratic OLS regression functions superimposed.]
We can go one step beyond this visual comparison and formally test the
hypothesis that the relationship between income and test scores is linear, against
the alternative that it is nonlinear. If the relationship is linear, then the regression
function is correctly specified as Equation (8.1), except that the regressor Income²
is absent; that is, if the relationship is linear, then Equation (8.1) holds with β2 =
0. Thus, we can test the null hypothesis that the population regression function is
linear against the alternative that it is quadratic by testing the null hypothesis that
β2 = 0 against the alternative that β2 ≠ 0.

Because Equation (8.1) is just a variant of the multiple regression model, the
null hypothesis that β2 = 0 can be tested by constructing the t-statistic for this
hypothesis. This t-statistic is t = (β̂2 − 0)/SE(β̂2), which from Equation (8.2) is t =
−0.0423/0.0048 = −8.81. In absolute value, this exceeds the 5% critical value of
this test (which is 1.96). Indeed the p-value for the t-statistic is less than 0.01%, so
we can reject the hypothesis that β2 = 0 at all conventional significance levels. Thus
this formal hypothesis test supports our informal inspection of Figures 8.2 and 8.3:
The quadratic model fits the data better than the linear model.
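Because the quadratic model is, mechanically, a multiple regression of test scores on Income and Income², the estimation and the t-test for β2 = 0 can be sketched in a few lines. The snippet below is only an illustration: it uses simulated data (made-up coefficients and noise, not the actual California district data) and homoskedasticity-only standard errors rather than the robust ones reported in Equation (8.2).

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated districts: income in $1000s, quadratic test-score relationship
n = 420
income = rng.uniform(5, 55, n)
score = 607.3 + 3.85 * income - 0.0423 * income**2 + rng.normal(0, 9, n)

# OLS on (1, Income, Income^2): the quadratic model is multiple regression
X = np.column_stack([np.ones(n), income, income**2])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# Homoskedasticity-only t-statistic for H0: beta2 = 0 (linearity)
resid = score - X @ beta
s2 = resid @ resid / (n - 3)
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
t_beta2 = beta[2] / se[2]

print(beta.round(3), round(t_beta2, 1))
```

With a negative true β2 and 420 observations, the t-statistic comfortably rejects linearity, mirroring the test in the text.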


The Effect on Y of a Change in X in Nonlinear Specifications

Put aside the test score example for a moment and consider a general problem.
You want to know how the dependent variable Y is expected to change when the
independent variable X1 changes by the amount ΔX1, holding constant other independent variables X2, …, Xk. When the population regression function is linear,
this effect is easy to calculate: As shown in Equation (6.4), the expected change in
Y is ΔY = β1ΔX1, where β1 is the population regression coefficient multiplying X1.
When the regression function is nonlinear, however, the expected change in Y is
more complicated to calculate because it can depend on the values of the independent variables.

A general formula for a nonlinear population regression function.¹ The
nonlinear population regression models considered in this chapter are of the form

Yi = f(X1i, X2i, …, Xki) + ui,  i = 1, …, n,   (8.3)

where f(X1i, X2i, …, Xki) is the population nonlinear regression function, a possibly nonlinear function of the independent variables X1i, X2i, …, Xki, and ui is the
error term. For example, in the quadratic regression model in Equation (8.1), only
one independent variable is present, so X1 is Income and the population regression function is f(Incomei) = β0 + β1Incomei + β2Incomei².

Because the population regression function is the conditional expectation of
Yi given X1i, X2i, …, Xki, in Equation (8.3) we allow for the possibility that this
conditional expectation is a nonlinear function of X1i, X2i, …, Xki; that is,
E(Yi | X1i, X2i, …, Xki) = f(X1i, X2i, …, Xki), where f can be a nonlinear function.
If the population regression function is linear, then f(X1i, X2i, …, Xki) = β0 + β1X1i
+ β2X2i + ⋯ + βkXki, and Equation (8.3) becomes the linear regression model
in Key Concept 6.2. However, Equation (8.3) allows for nonlinear regression functions as well.

The effect on Y of a change in X1. As discussed in Section 6.2, the effect on
Y of a change in X1, ΔX1, holding X2, …, Xk constant, is the difference in the

¹The term "nonlinear regression" applies to two conceptually different families of models. In the first
family, the population regression function is a nonlinear function of the X's but a linear function of
the unknown parameters (the β's). In the second family, the population regression function is a nonlinear function of the unknown parameters and may or may not be a nonlinear function of the X's. The
models in the body of this chapter are all in the first family. The appendix to this chapter takes up models from the
second family.

KEY CONCEPT 8.1
THE EXPECTED EFFECT ON Y OF A CHANGE IN X1 IN THE
NONLINEAR REGRESSION MODEL (8.3)

The expected change in Y, ΔY, associated with the change in X1, ΔX1, holding X2,
…, Xk constant, is the difference between the value of the population regression
function before and after changing X1, holding X2, …, Xk constant. That is, the
expected change in Y is the difference:

ΔY = f(X1 + ΔX1, X2, …, Xk) − f(X1, X2, …, Xk).   (8.4)

The estimator of this unknown population difference is the difference between
the predicted values for these two cases. Let f̂(X1, X2, …, Xk) be the predicted
value of Y based on the estimator f̂ of the population regression function. Then
the predicted change in Y is

ΔŶ = f̂(X1 + ΔX1, X2, …, Xk) − f̂(X1, X2, …, Xk).   (8.5)

expected value of Y when the independent variables take on the values X1 + ΔX1,
X2, …, Xk and the expected value of Y when the independent variables take on
the values X1, X2, …, Xk. The difference between these two expected values,
say ΔY, is what happens to Y on average in the population when X1 changes
by an amount ΔX1, holding constant the other variables X2, …, Xk. In the
nonlinear regression model of Equation (8.3), this effect on Y is ΔY =
f(X1 + ΔX1, X2, …, Xk) − f(X1, X2, …, Xk).


Because the regression function f is unknown, the population effect on Y of a
change in X1 is also unknown. To estimate the population effect, first estimate the
population regression function. At a general level, denote this estimated function
by f̂; an example of such an estimated function is the estimated quadratic regression function in Equation (8.2). The estimated effect on Y (denoted ΔŶ) of the
change in X1 is the difference between the predicted value of Y when the independent variables take on the values X1 + ΔX1, X2, …, Xk and the predicted value
of Y when they take on the values X1, X2, …, Xk.

The method for calculating the expected effect on Y of a change in X1 is summarized in Key Concept 8.1.

Application to test scores and income. What is the predicted change in test
scores associated with a change in district income of $1000, based on the estimated
quadratic regression function in Equation (8.2)? Because that regression function
is quadratic, this effect depends on the initial district income. We therefore

consider two cases: an increase in district income from 10 to 11 (i.e., from $10,000
per capita to $11,000) and an increase in district income from 40 to 41.

To compute ΔŶ associated with the change in income from 10 to 11, we can
apply the general formula in Equation (8.5) to the quadratic regression model.
Doing so yields

ΔŶ = (β̂0 + β̂1 × 11 + β̂2 × 11²) − (β̂0 + β̂1 × 10 + β̂2 × 10²),   (8.6)

where β̂0, β̂1, and β̂2 are the OLS estimators.

The term in the first set of parentheses in Equation (8.6) is the predicted value
of Y when Income = 11, and the term in the second set of parentheses is the predicted value of Y when Income = 10. These predicted values are calculated using
the OLS estimates of the coefficients in Equation (8.2). Accordingly, when Income
= 10, the predicted value of test scores is 607.3 + 3.85 × 10 − 0.0423 × 10² =
641.57. When Income = 11, the predicted value is 607.3 + 3.85 × 11 − 0.0423 ×
11² = 644.53. The difference in these two predicted values is ΔŶ = 644.53 − 641.57
= 2.96 points; that is, the predicted difference in test scores between a district with
average income of $11,000 and one with average income of $10,000 is 2.96 points.

In the second case, when income changes from $40,000 to $41,000, the difference in the predicted values in Equation (8.6) is ΔŶ = (607.3 + 3.85 × 41 − 0.0423
× 41²) − (607.3 + 3.85 × 40 − 0.0423 × 40²) = 694.04 − 693.62 = 0.42 points.
Thus, a change of income of $1000 is associated with a larger change in predicted
test scores if the initial income is $10,000 than if it is $40,000 (the predicted changes
are 2.96 points versus 0.42 point). Said differently, the slope of the estimated quadratic regression function in Figure 8.3 is steeper at low values of income (like
$10,000) than at the higher values of income (like $40,000).

Standard errors of estimated effects. The estimator of the effect on Y of
changing X1 depends on the estimator of the population regression function, f̂,
which varies from one sample to the next. Therefore the estimated effect contains
sampling error. One way to quantify the sampling uncertainty associated with the
estimated effect is to compute a confidence interval for the true population effect.
To do so, we need to compute the standard error of ΔŶ in Equation (8.5).

It is easy to compute a standard error for ΔŶ when the regression function is
linear. The estimated effect of a change in X1 is β̂1ΔX1, so a 95% confidence interval for the estimated change is β̂1ΔX1 ± 1.96SE(β̂1)ΔX1.

In the nonlinear regression models of this chapter, the standard error of ΔŶ
can be computed using the tools introduced in Section 7.3 for testing a single
restriction involving multiple coefficients. To illustrate this method, consider the

estimated change in test scores associated with a change in income from 10 to 11
in Equation (8.6), which is ΔŶ = β̂1 × (11 − 10) + β̂2 × (11² − 10²) = β̂1 + 21β̂2.
The standard error of the predicted change therefore is

SE(ΔŶ) = SE(β̂1 + 21β̂2).   (8.7)

Thus, if we can compute the standard error of β̂1 + 21β̂2, then we have computed the standard error of ΔŶ. There are two methods for doing this using standard regression software, which correspond to the two approaches in Section 7.3
for testing a single restriction on multiple coefficients.

The first method is to use "approach #1" of Section 7.3, which is to compute
the F-statistic testing the hypothesis that β1 + 21β2 = 0. The standard error of ΔŶ
is then given by²

SE(ΔŶ) = |ΔŶ| / √F.   (8.8)

When applied to the quadratic regression in Equation (8.2), the F-statistic testing
the hypothesis that β1 + 21β2 = 0 is F = 299.94. Because ΔŶ = 2.96, applying
Equation (8.8) gives SE(ΔŶ) = 2.96/√299.94 = 0.17. Thus a 95% confidence
interval for the change in the expected value of Y is 2.96 ± 1.96 × 0.17 or
(2.63, 3.29).

The second method is to use "approach #2" of Section 7.3, which entails transforming the regressors so that, in the transformed regression, one of the coefficients is β1 + 21β2. Doing this transformation is left as an exercise (Exercise 8.9).

A comment on interpreting coefficients in nonlinear specifications. In
the multiple regression model of Chapters 6 and 7, the regression coefficients had
a natural interpretation. For example, β1 is the expected change in Y associated
with a change in X1, holding the other regressors constant. But, as we have seen,
this is not generally the case in a nonlinear model. That is, it is not very helpful to
think of β1 in Equation (8.1) as being the effect of changing the district's income,
holding the square of the district's income constant. This means that in nonlinear
models, the regression function is best interpreted by graphing it and by calculating the predicted effect on Y of changing one or more of the independent
variables.

²Equation (8.8) is derived by noting that the F-statistic is the square of the t-statistic testing this hypothesis; that is, F = t² = [(β̂1 + 21β̂2)/SE(β̂1 + 21β̂2)]² = [ΔŶ/SE(ΔŶ)]², and solving for SE(ΔŶ).
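The relationship between the F-statistic and SE(ΔŶ) can be checked numerically. The sketch below simulates data (made-up coefficients and noise), computes SE(β̂1 + 21β̂2) from the homoskedasticity-only covariance matrix, and confirms the identity SE(ΔŶ) = |ΔŶ|/√F from footnote 2. The textbook's own numbers use heteroskedasticity-robust standard errors, so this is an illustration of the method, not a replication.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated districts with a quadratic income/test-score relationship
# (made-up coefficients and noise; for illustration only)
n = 420
income = rng.uniform(5, 55, n)
score = 607.3 + 3.85 * income - 0.0423 * income**2 + rng.normal(0, 9, n)

X = np.column_stack([np.ones(n), income, income**2])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)

# Homoskedasticity-only covariance matrix of the OLS estimator
resid = score - X @ beta
s2 = resid @ resid / (n - 3)
V = s2 * np.linalg.inv(X.T @ X)

# dY for income 10 -> 11 is a'beta with a = (0, 1, 21)
a = np.array([0.0, 1.0, 21.0])
dY = a @ beta
se_dY = np.sqrt(a @ V @ a)        # SE(b1 + 21*b2) = sqrt(a' V a), Eq. (8.7)

# Footnote 2: with one restriction, F = t^2, so SE(dY) = |dY| / sqrt(F)
F = (dY / se_dY) ** 2
print(round(dY, 2), round(se_dY, 3), round(abs(dY) / np.sqrt(F), 3))
```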


A General Approach to Modeling Nonlinearities Using Multiple Regression

The general approach to modeling nonlinear regression functions taken in this
chapter has five elements:

1. Identify a possible nonlinear relationship. The best thing to do is to use economic theory and what you know about the application to suggest a possible
nonlinear relationship. Before you even look at the data, ask yourself whether
the slope of the regression function relating Y and X might reasonably depend
on the value of X or on another independent variable. Why might such nonlinear dependence exist? What nonlinear shapes does this suggest? For example, thinking about classroom dynamics with 11-year-olds suggests that cutting
class size from 18 students to 17 could have a greater effect than cutting it from
30 to 29.

2. Specify a nonlinear function and estimate its parameters by OLS. Sections
8.2 and 8.3 contain various nonlinear regression functions that can be estimated by OLS. After working through these sections you will understand the
characteristics of each of these functions.

3. Determine whether the nonlinear model improves upon a linear model. Just
because you think a regression function is nonlinear does not mean it really
is! You must determine empirically whether your nonlinear model is appropriate. Most of the time you can use t-statistics and F-statistics to test the null
hypothesis that the population regression function is linear against the alternative that it is nonlinear.

4. Plot the estimated nonlinear regression function. Does the estimated regression function describe the data well? Looking at Figures 8.2 and 8.3 suggested
that the quadratic model fit the data better than the linear model.

5. Estimate the effect on Y of a change in X. The final step is to use the estimated regression to calculate the effect on Y of a change in one or more
regressors X using the method in Key Concept 8.1.

8.2 Nonlinear Functions of a Single Independent Variable

This section provides two methods for modeling a nonlinear regression function.
To keep things simple, we develop these methods for a nonlinear regression


function that involves only one independent variable, X. As we see in Section 8.5,
however, these models can be modified to include multiple independent variables.

The first method discussed in this section is polynomial regression, an extension
of the quadratic regression used in the last section to model the relationship between
test scores and income. The second method uses logarithms of X and/or Y. Although
these methods are presented separately, they can be used in combination.
Polynomials

One way to specify a nonlinear regression function is to use a polynomial in X. In
general, let r denote the highest power of X that is included in the regression. The
polynomial regression model of degree r is

Yi = β0 + β1Xi + β2Xi² + ⋯ + βrXiʳ + ui.   (8.9)
When r = 2, Equation (8.9) is the quadratic regression model discussed in Section
8.1. When r = 3, so that the highest power of X included is X³, Equation (8.9) is
called the cubic regression model.

The polynomial regression model is similar to the multiple regression model
of Chapter 6, except that in Chapter 6 the regressors were distinct independent
variables, whereas here the regressors are powers of the same independent variable,
X; that is, the regressors are X, X², X³, and so on. Thus the techniques for estimation and inference developed for multiple regression can be applied here. In
particular, the unknown coefficients β0, β1, …, βr in Equation (8.9) can be estimated by OLS regression of Yi against Xi, Xi², …, Xiʳ.

Testing the null hypothesis that the population regression function is
linear. If the population regression function is linear, then the quadratic and
higher-order terms do not enter the population regression function. Accordingly,
the null hypothesis (H0) that the regression is linear and the alternative (H1) that
it is a polynomial of degree r correspond to

H0: β2 = 0, β3 = 0, …, βr = 0  vs.  H1: at least one βj ≠ 0, j = 2, …, r.   (8.10)

The null hypothesis that the population regression function is linear can be
tested against the alternative that it is a polynomial of degree r by testing H0
against H1 in Equation (8.10). Because H0 is a joint null hypothesis with q =
r − 1 restrictions on the coefficients of the population polynomial regression
model, it can be tested using the F-statistic as described in Section 7.2.



Which degree polynomial should I use? That is, how many powers of X
should be included in a polynomial regression? The answer balances a trade-off
between flexibility and statistical precision. Increasing the degree r introduces
more flexibility into the regression function and allows it to match more shapes;
a polynomial of degree r can have up to r − 1 bends (that is, inflection points) in
its graph. But increasing r means adding more regressors, which can reduce the
precision of the estimated coefficients.

Thus the answer to the question of how many terms to include is that you
should include enough to model the nonlinear regression function adequately, but
no more. Unfortunately, this answer is not very useful in practice!

A practical way to determine the degree of the polynomial is to ask whether
the coefficients in Equation (8.9) associated with largest values of r are zero. If so,
then these terms can be dropped from the regression. This procedure, which is
called sequential hypothesis testing because individual hypotheses are tested
sequentially, is summarized in the following steps:

1. Pick a maximum value of r and estimate the polynomial regression for that r.

2. Use the t-statistic to test the hypothesis that the coefficient on Xʳ [βr in Equation (8.9)] is zero. If you reject this hypothesis, then Xʳ belongs in the regression, so use the polynomial of degree r.

3. If you do not reject βr = 0 in step 2, eliminate Xʳ from the regression and estimate a polynomial regression of degree r − 1. Test whether the coefficient on
Xʳ⁻¹ is zero. If you reject, use the polynomial of degree r − 1.

4. If you do not reject βr−1 = 0 in step 3, continue this procedure until the coefficient on the highest power in your polynomial is statistically significant.

This recipe has one missing ingredient: the initial degree r of the polynomial.
In many applications involving economic data, the nonlinear functions are smooth;
that is, they do not have sharp jumps or "spikes." If so, then it is appropriate to
choose a small maximum order for the polynomial, such as 2, 3, or 4; that is, begin
with r = 2 or 3 or 4 in step 1.
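The sequential testing recipe in steps 1 through 4 can be sketched as a short function. Everything here is illustrative: simulated data, a made-up starting degree of 4, and homoskedasticity-only t-statistics with a fixed 1.96 cutoff.

```python
import numpy as np

def choose_degree(x, y, r_max=4, crit=1.96):
    """Sequential testing sketch: starting at r_max, drop the highest power
    while its (homoskedasticity-only) t-statistic is insignificant."""
    n = len(y)
    for r in range(r_max, 0, -1):
        X = np.column_stack([x**p for p in range(r + 1)])  # 1, x, ..., x^r
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (n - X.shape[1])
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
        if abs(beta[r] / se[r]) > crit:   # reject beta_r = 0: keep degree r
            return r
    return 1

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 2 + 0.5 * x - 0.3 * x**2 + rng.normal(0, 1, 500)  # true degree is 2
print(choose_degree(x, y))
```

With a strong quadratic signal the procedure should settle on a low degree, though (as with any sequence of 5% tests) it occasionally retains a spuriously significant higher power.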

Application to district income and test scores. The estimated cubic regression function relating district income to test scores is

TestScore^ = 600.1 + 5.02 Income − 0.096 Income² + 0.00069 Income³,  R̄² = 0.555.   (8.11)
            (5.1)   (0.71)        (0.029)          (0.00035)

The t-statistic on Income³ is 0.00069/0.00035 = 1.97, so the null hypothesis that
the regression function is a quadratic is rejected against the alternative that it is a cubic at the 5%

level. Moreover, the F-statistic testing the joint null hypothesis that the coefficients
on Income² and Income³ are both zero is 37.7, with a p-value less than 0.01%, so
the null hypothesis that the regression function is linear is rejected against the
alternative that it is either a quadratic or a cubic.
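On simulated data, this kind of joint test of linearity against a cubic can be carried out with the homoskedasticity-only F-statistic built from restricted and unrestricted sums of squared residuals (the textbook's F of 37.7 uses the robust form, so this is only a sketch of the mechanics):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with a genuinely nonlinear (quadratic) relationship
n = 420
x = rng.uniform(5, 55, n)
y = 600 + 5 * x - 0.1 * x**2 + rng.normal(0, 9, n)

def ssr(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Unrestricted: cubic (1, x, x^2, x^3); restricted: linear (1, x)
X_u = np.column_stack([np.ones(n), x, x**2, x**3])
X_r = X_u[:, :2]
q, k = 2, 3   # q = 2 restrictions (beta2 = beta3 = 0), k = 3 regressors

# Homoskedasticity-only F-statistic (Section 7.2 form)
F = ((ssr(X_r, y) - ssr(X_u, y)) / q) / (ssr(X_u, y) / (n - k - 1))
print(round(F, 1))
```

A value of F above the 5% critical value of about 3.00 for F(2, ∞) rejects linearity, as in the text.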

Interpretation of coefficients in polynomial regression models. The coefficients in polynomial regressions do not have a simple interpretation. The best
way to interpret polynomial regressions is to plot the estimated regression function and to calculate the estimated effect on Y associated with a change in X for
one or more values of X.


Logarithms
Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage
changes, and many relationships are naturally expressed in terms of percentages.
Here are some examples:

The box in Chapter 3, "The Gender Gap in Earnings of College Graduates in
the United States," examined the wage gap between male and female college
graduates. In that discussion, the wage gap was measured in terms of dollars.
However, it is easier to compare wage gaps across professions and over time
when they are expressed in percentage terms.

In Section 8.1, we found that district income and test scores were nonlinearly
related. Would this relationship be linear using percentage changes? That is,
might it be that a change in district income of 1% (rather than $1000) is associated with a change in test scores that is approximately constant for different
values of income?

In the economic analysis of consumer demand, it is often assumed that a 1%
increase in price leads to a certain percentage decrease in the quantity
demanded. The percentage decrease in demand resulting from a 1% increase
in price is called the price elasticity.

Regression specifications that use natural logarithms allow regression models to estimate percentage relationships such as these. Before introducing those
specifications, we review the exponential and natural logarithm functions.

The exponential function and the natural logarithm. The exponential
function and its inverse, the natural logarithm, play an important role in modeling
nonlinear regression functions. The exponential function of x is eˣ (that is, e raised
to the power x), where e is the constant 2.71828…; the exponential function is


FIGURE 8.4  The Logarithm Function, Y = ln(X): The logarithmic function Y = ln(X) is steeper for small than for large values of X, is only defined for X > 0, and has slope 1/X. [Y plotted from 0 to 5 against X from 0 to 100.]

also written as exp(x). The natural logarithm is the inverse of the exponential function; that is, the natural logarithm is the function for which x = ln(eˣ) or, equivalently, x = ln[exp(x)]. The base of the natural logarithm is e. Although there are
logarithms in other bases, such as base 10, in this book we consider only logarithms
in base e, that is, the natural logarithm, so when we use the term "logarithm" we
always mean "natural logarithm."

The logarithm function, y = ln(x), is graphed in Figure 8.4. Note that the logarithm function is defined only for positive values of x. The logarithm function has
a slope that is steep at first, then flattens out (although the function continues to
increase). The slope of the logarithm function ln(x) is 1/x.

The logarithm function has the following useful properties:

ln(1/x) = −ln(x);   (8.12)

ln(ax) = ln(a) + ln(x);   (8.13)

ln(x/a) = ln(x) − ln(a); and   (8.14)

ln(xᵃ) = a ln(x).   (8.15)

Logarithms and percentages. The link between logarithms and percentages relies on a key fact: When Δx is small, the difference between the logarithm
of x + Δx and the logarithm of x is approximately Δx/x, the percentage change in x
divided by 100. That is,

ln(x + Δx) − ln(x) ≅ Δx/x   (when Δx/x is small),   (8.16)

where "≅" means "approximately equal to." The derivation of this approximation
relies on calculus, but it is readily demonstrated by trying out some values of x and
Δx. For example, when x = 100 and Δx = 1, then Δx/x = 1/100 = 0.01 (or 1%),
while ln(x + Δx) − ln(x) = ln(101) − ln(100) = 0.00995 (or 0.995%). Thus Δx/x
(which is 0.01) is very close to ln(x + Δx) − ln(x) (which is 0.00995). When Δx =
5, Δx/x = 5/100 = 0.05, while ln(x + Δx) − ln(x) = ln(105) − ln(100) = 0.04879.
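The approximation in Equation (8.16) is easy to check numerically:

```python
import math

# Check the approximation ln(x + dx) - ln(x) ~ dx / x  (Equation (8.16))
x = 100.0
for dx in (1.0, 5.0):
    exact = math.log(x + dx) - math.log(x)
    approx = dx / x
    print(dx, round(exact, 5), approx)  # -> 1.0 0.00995 0.01 / 5.0 0.04879 0.05
```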

The three logarithmic regression models. There are three different cases in
which logarithms might be used: when X is transformed by taking its logarithm
but Y is not; when Y is transformed to its logarithm but X is not; and when both
Y and X are transformed to their logarithms. The interpretation of the regression
coefficients is different in each case. We discuss these three cases in turn.

Case I: X is in logarithms, Y is not. In this case, the regression model is

Yi = β0 + β1 ln(Xi) + ui,  i = 1, …, n.   (8.17)

Because Y is not in logarithms but X is, this is sometimes referred to as a linear-log model.

In the linear-log model, a 1% change in X is associated with a change in Y of
0.01β1. To see this, consider the difference between the population regression function at values of X that differ by ΔX: This is [β0 + β1 ln(X + ΔX)] − [β0 + β1 ln(X)]
= β1[ln(X + ΔX) − ln(X)] ≅ β1(ΔX/X), where the final step uses the approximation in Equation (8.16). If X changes by 1%, then ΔX/X = 0.01; thus, in this
model a 1% change in X is associated with a change of Y of 0.01β1.

The only difference between the regression model in Equation (8.17) and the
regression model of Chapter 4 with a single regressor is that the right-hand variable is now the logarithm of X rather than X itself. To estimate the coefficients β0
and β1 in Equation (8.17), first compute a new variable, ln(X); this is readily done
using a spreadsheet or statistical software. Then β0 and β1 can be estimated by the
OLS regression of Yi on ln(Xi), hypotheses about β1 can be tested using the t-statistic, and a 95% confidence interval for β1 can be constructed as β̂1 ± 1.96SE(β̂1).

As an example, return to the relationship between district income and test
scores. Instead of the quadratic specification, we could use the linear-log specification in Equation (8.17). Estimating this regression by OLS yields

TestScore^ = 557.8 + 36.42 ln(Income),  R̄² = 0.561.   (8.18)
            (3.8)   (1.40)

FIGURE 8.5  The Linear-Log Regression Function: The estimated linear-log regression function Ŷ = β̂0 + β̂1 ln(X) captures much of the nonlinear relation between test scores and district income. [Test score plotted against district income in thousands of dollars.]

According to Equation (8.18), a 1% increase in income is associated with an
increase in test scores of 0.01 × 36.42 = 0.36 points.

To estimate the effect on Y of a change in X in its original units of thousands
of dollars (not in logarithms), we can use the method in Key Concept 8.1. For
example, what is the predicted difference in test scores for districts with average
incomes of $10,000 versus $11,000? The estimated value of ΔY is the difference
between the predicted values: ΔŶ = [557.8 + 36.42 ln(11)] − [557.8 + 36.42 ln(10)] =
36.42 × [ln(11) − ln(10)] = 3.47. Similarly, the predicted difference between a
district with average income of $40,000 and a district with average income of
$41,000 is 36.42 × [ln(41) − ln(40)] = 0.90. Thus, like the quadratic specification,
this regression predicts that a $1000 increase in income has a larger effect on test
scores in poor districts than it does in affluent districts.

The estimated linear-log regression function in Equation (8.18) is plotted in
Figure 8.5. Because the regressor in Equation (8.18) is the natural logarithm of
income rather than income, the estimated regression function is not a straight line.
Like the quadratic regression function in Figure 8.3, it is initially steep but then
flattens out for higher levels of income.
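Applying Key Concept 8.1 to the linear-log estimates in Equation (8.18) reproduces the two predicted differences computed above:

```python
import math

# Estimated slope from the linear-log function in Equation (8.18):
# TestScore^ = 557.8 + 36.42*ln(Income)
b1 = 36.42

def diff(income0, income1):
    """Predicted change in test score (Key Concept 8.1 applied to Eq. (8.18));
    the intercept cancels in the difference."""
    return b1 * (math.log(income1) - math.log(income0))

print(round(diff(10, 11), 2), round(diff(40, 41), 2))  # -> 3.47 0.9
```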


Case II: Y is in logarithms, X is not. In this case, the regression model is

ln(Yi) = β0 + β1Xi + ui,  i = 1, …, n.   (8.19)

Because Y is in logarithms but X is not, this is referred to as a log-linear model.

In the log-linear model, a one-unit change in X (ΔX = 1) is associated with a
100 × β1% change in Y. To see this, compare the expected values of ln(Y) for values of X that differ by ΔX. The expected value of ln(Y) given X is ln(Y) = β0 +
β1X. When X is X + ΔX, the expected value is given by ln(Y + ΔY) = β0 +
β1(X + ΔX). Thus the difference between these expected values is ln(Y + ΔY) −
ln(Y) = [β0 + β1(X + ΔX)] − [β0 + β1X] = β1ΔX. From the approximation in
Equation (8.16), however, if β1ΔX is small, then ln(Y + ΔY) − ln(Y) ≅ ΔY/Y.
Thus ΔY/Y ≅ β1ΔX. If ΔX = 1, so that X changes by one unit, then ΔY/Y changes
by β1. Translated into percentages, a one-unit change in X is associated with a 100 ×
β1% change in Y.

As an illustration, we return to the empirical example of Section 3.7, the relationship between age and earnings of college graduates. Many employment contracts specify that, for each additional year of service, a worker gets a certain
percentage increase in his or her wage. This percentage relationship suggests estimating the log-linear specification in Equation (8.19) so that each additional year
of age (X) is, on average in the population, associated with some constant percentage increase in earnings (Y). By first computing the new dependent variable,
ln(Earningsi), the unknown coefficients β0 and β1 can be estimated by the OLS
regression of ln(Earningsi) against Agei. When estimated using the 12,777 observations on college graduates in the 2005 Current Population Survey (the data are
described in Appendix 3.1), this relationship is

ln(Earnings) = 2.655 + 0.0086Age,  R̄² = 0.030.    (8.20)
              (0.011)  (0.0005)

According to this regression, earnings are predicted to increase by 0.86% [(100 × 0.0086)%] for each additional year of age.
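The percentage interpretation can be checked by simulation. The sketch below is our own illustration, not part of the text: it generates data from a log-linear model using the coefficient values of Equation (8.20) (with a made-up error standard deviation and age range) and recovers them by OLS on ln(Y).

```python
import numpy as np

# Illustrative sketch: simulate ln(Y) = beta0 + beta1*X + u and check that
# OLS on ln(Y) recovers beta1, so a one-unit change in X is associated with
# a 100*beta1 percent change in Y. Coefficients borrowed from Equation
# (8.20); the noise level and X range are made up.
rng = np.random.default_rng(0)
n = 50_000
beta0, beta1 = 2.655, 0.0086
X = rng.uniform(25, 65, size=n)          # e.g., age in years
u = rng.normal(0, 0.5, size=n)
logY = beta0 + beta1 * X + u

# OLS of ln(Y) on a constant and X
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, logY, rcond=None)[0]

pct_per_unit = 100 * b[1]                # percent change in Y per unit of X
print(round(pct_per_unit, 2))            # close to 100*0.0086 = 0.86
```

With a large sample, the estimated slope is close to 0.0086, so each additional year of X is associated with roughly a 0.86% increase in Y, matching the interpretation in the text.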

Case III: Both X and Y are in logarithms. In this case, the regression model is

ln(Yi) = β0 + β1 ln(Xi) + ui.    (8.21)

Because both Y and X are specified in logarithms, this is referred to as a log-log model.

FIGURE 8.6  The Log-Linear and Log-Log Regression Functions
In the log-linear regression function, ln(Y) is a linear function of X. In the log-log regression function, ln(Y) is a linear function of ln(X). [The figure plots ln(Test score) against district income.]
In the log-log model, a 1% change in X is associated with a β1% change in Y. Thus, in this specification β1 is the elasticity of Y with respect to X. To see this, again apply Key Concept 8.1; thus ln(Y + ΔY) − ln(Y) = [β0 + β1 ln(X + ΔX)] − [β0 + β1 ln(X)] = β1[ln(X + ΔX) − ln(X)]. Application of the approximation in Equation (8.16) to both sides of this equation yields

ΔY/Y ≅ β1(ΔX/X), or

β1 = (ΔY/Y)/(ΔX/X) = [100 × (ΔY/Y)]/[100 × (ΔX/X)] = (percentage change in Y)/(percentage change in X).    (8.22)

Thus, in the log-log specification β1 is the ratio of the percentage change in Y associated with the percentage change in X. If the percentage change in X is 1% (that is, if ΔX = 0.01X), then β1 is the percentage change in Y associated with a 1% change in X. That is, β1 is the elasticity of Y with respect to X.
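The elasticity interpretation can be illustrated with a deterministic example. The sketch below is our own (the power function and its coefficients are made up): for Y = 2·X^0.7, the log-log slope recovers the elasticity 0.7 exactly, and raising X by 1% raises Y by about 0.7%.

```python
import numpy as np

# Illustrative sketch: if Y = c * X**beta1 exactly, then
# ln(Y) = ln(c) + beta1*ln(X), and the log-log slope is the elasticity of
# Y with respect to X. The choice beta1 = 0.7 is arbitrary.
X = np.linspace(5.0, 50.0, 200)
Y = 2.0 * X ** 0.7

A = np.column_stack([np.ones_like(X), np.log(X)])
b = np.linalg.lstsq(A, np.log(Y), rcond=None)[0]
print(round(b[1], 3))   # the estimated elasticity, 0.7

# Check the interpretation directly: raise X by 1% and look at Y.
ratio = (2.0 * (1.01 * X) ** 0.7) / Y
print(round(100 * (ratio[0] - 1), 3))   # roughly a 0.7% change in Y
```

Because the relationship is exact (no error term), the fitted slope equals the elasticity up to floating-point error; with data, the slope is an estimate of it.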
As an illustration, return to the relationship between income and test scores. When this relationship is specified in this form, the unknown coefficients are estimated by a regression of the logarithm of test scores against the logarithm of income. The resulting estimated equation is

ln(TestScore) = 6.336 + 0.0554 ln(Income),  R̄² = 0.557.    (8.23)
               (0.006)  (0.0021)

KEY CONCEPT 8.2
LOGARITHMS IN REGRESSION: THREE CASES
Logarithms can be used to transform the dependent variable Y, an independent variable X, or both (but the variables must be positive). The following table summarizes these three cases and the interpretation of the regression coefficient β1. In each case, β1 can be estimated by applying OLS after taking the logarithm of the dependent and/or independent variable.

Case              Regression Specification         Interpretation of β1
I. linear-log     Yi = β0 + β1 ln(Xi) + ui         A 1% change in X is associated with a change in Y of 0.01β1.
II. log-linear    ln(Yi) = β0 + β1Xi + ui          A change in X by one unit (ΔX = 1) is associated with a 100β1% change in Y.
III. log-log      ln(Yi) = β0 + β1 ln(Xi) + ui     A 1% change in X is associated with a β1% change in Y, so β1 is the elasticity of Y with respect to X.

According to this estimated regression function, a 1% increase in income is estimated to correspond to a 0.0554% increase in test scores.

The estimated log-log regression function in Equation (8.23) is plotted in Figure 8.6. Because Y is in logarithms, the vertical axis in Figure 8.6 is the logarithm of the test score, and the scatterplot is the logarithm of test scores versus district income. For comparison purposes, Figure 8.6 also shows the estimated regression function for a log-linear specification, which is

ln(TestScore) = 6.439 + 0.00284 Income,  R̄² = 0.497.    (8.24)
               (0.003)  (0.00018)

Because the vertical axis is in logarithms, the regression function in Equation (8.24) is the straight line in Figure 8.6.

As you can see in Figure 8.6, the log-log specification fits slightly better than the log-linear specification. This is consistent with the higher R̄² for the log-log regression (0.557) than for the log-linear regression (0.497). Even so, the log-log specification does not fit the data especially well: At the lower values of income, most of the observations fall below the log-log curve, while in the middle income range most of the observations fall above the estimated regression function.

The three logarithmic regression models are summarized in Key Concept 8.2.


A difficulty with comparing logarithmic specifications. Which of the log regression models best fits the data? As we saw in the discussion of Equations (8.23) and (8.24), the R̄² can be used to compare the log-linear and log-log models; as it happened, the log-log model had the higher R̄². Similarly, the R̄² can be used to compare the linear-log regression in Equation (8.18) and the linear regression of Y against X. In the test score and income regression, the linear-log regression has an R̄² of 0.561, while the linear regression has an R̄² of 0.508, so the linear-log model fits the data better.

How can we compare the linear-log model and the log-log model? Unfortunately, the R̄² cannot be used to compare these two regressions because their dependent variables are different [one is Yi, the other is ln(Yi)]. Recall that the R̄² measures the fraction of the variance of the dependent variable explained by the regressors. Because the dependent variables in the log-log and linear-log models are different, it does not make sense to compare their R̄²'s.

Because of this problem, the best thing to do in a particular application is to decide, using economic theory and either your or other experts' knowledge of the problem, whether it makes sense to specify Y in logarithms. For example, labor economists typically model earnings using logarithms because wage comparisons, contract wage increases, and so forth are often most naturally discussed in percentage terms. In modeling test scores, it seems (to us, anyway) natural to discuss test results in terms of points on the test rather than percentage increases in the test scores, so we focus on models in which the dependent variable is the test score rather than its logarithm.
Computing predicted values of Y when Y is in logarithms. If the dependent variable Y has been transformed by taking logarithms, the estimated regression can be used to compute directly the predicted value of ln(Y). However, it is a bit trickier to compute the predicted value of Y itself.

To see this, consider the log-linear regression model in Equation (8.19) and rewrite it so that it is specified in terms of Y rather than ln(Y). To do so, take the exponential function of both sides of Equation (8.19); the result is

Yi = exp(β0 + β1Xi + ui) = e^(β0 + β1Xi) e^(ui).    (8.25)

If ui is distributed independently of Xi, then the expected value of Yi given Xi is E(Yi | Xi) = E(e^(β0 + β1Xi) e^(ui) | Xi) = e^(β0 + β1Xi) E(e^(ui)). The problem is that, even if E(ui) = 0, E(e^(ui)) ≠ 1. Thus the appropriate predicted value of Yi is not simply obtained by taking the exponential function of β̂0 + β̂1Xi, that is, by setting Ŷi = e^(β̂0 + β̂1Xi). This predicted value is biased because of the missing factor E(e^(ui)).

One solution to this problem is to estimate the factor E(e^(ui)) and to use this estimate when computing the predicted value of Y, but this gets complicated and we do not pursue it further.

Another solution, which is the approach used in this book, is to compute predicted values of the logarithm of Y but not to transform them to their original units. In practice, this is often acceptable because when the dependent variable is specified as a logarithm, it is often most natural just to use the logarithmic specification (and the associated percentage interpretations) throughout the analysis.
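The missing-factor problem can be made concrete with a small Monte Carlo. In this sketch (our own illustration, not from the text), u is normal with a made-up standard deviation of 0.5; for a normal error, E(e^u) = e^(σ²/2), which exceeds 1, so the naive retransformation exp(β̂0 + β̂1X) underpredicts E(Y | X).

```python
import numpy as np

# Illustrative sketch of the retransformation problem: if
# ln(Y) = b0 + b1*X + u with u ~ N(0, sigma^2) independent of X, then
# E(e^u) = exp(sigma^2 / 2) != 1, so exp(b0 + b1*X) is a biased prediction
# of E(Y | X). The value sigma = 0.5 is made up.
rng = np.random.default_rng(2)
sigma = 0.5
u = rng.normal(0, sigma, size=1_000_000)

mc = np.exp(u).mean()               # Monte Carlo estimate of E(e^u)
theory = np.exp(sigma ** 2 / 2)     # lognormal mean: about 1.133, not 1
print(round(mc, 3), round(theory, 3))
```

The text's recommendation avoids this bias entirely by reporting predictions of ln(Y) rather than retransformed predictions of Y.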

Polynomial and Logarithmic Models of Test Scores and District Income

In practice, economic theory or expert judgment might suggest a functional form to use, but in the end the true form of the population regression function is unknown. In practice, fitting a nonlinear function therefore entails deciding which method or combination of methods works best. As an illustration, we compare logarithmic and polynomial models of the relationship between district income and test scores.

Polynomial specifications. We considered two polynomial specifications, specified using powers of Income: quadratic [Equation (8.2)] and cubic [Equation (8.11)]. Because the coefficient on Income³ in Equation (8.11) was significant at the 5% level, the cubic specification provided an improvement over the quadratic, so we select the cubic model as the preferred polynomial specification.
Logarithmic specifications. The logarithmic specification in Equation (8.18) seemed to provide a good fit to these data, but we did not test this formally. One way to do so is to augment it with higher powers of the logarithm of income. If these additional terms are not statistically different from zero, then we can conclude that the specification in Equation (8.18) is adequate in the sense that it cannot be rejected against a polynomial function of the logarithm. Accordingly, the estimated cubic regression (specified in powers of the logarithm of income) is

TestScore = 486.1 + 113.4 ln(Income) - 26.9[ln(Income)]² + 3.06[ln(Income)]³,  R̄² = 0.560.    (8.26)
           (79.4)  (87.9)            (31.7)              (3.74)


The t-statistic on the coefficient on the cubic term is 0.818, so the null hypothesis that the true coefficient is zero is not rejected at the 10% level. The F-statistic testing the joint hypothesis that the true coefficients on the quadratic and cubic terms are both zero is 0.44, with a p-value of 0.64, so this joint null hypothesis is not rejected at the 10% level. Thus the cubic logarithmic model in Equation (8.26) does not provide a statistically significant improvement over the model in Equation (8.18), which is linear in the logarithm of income.

Comparing the cubic and linear-log specifications. Figure 8.7 plots the estimated regression functions from the cubic specification in Equation (8.11) and the linear-log specification in Equation (8.18). The two estimated regression functions are quite similar. One statistical tool for comparing these specifications is the R̄². The R̄² of the logarithmic regression is 0.561, and for the cubic regression it is 0.555. Because the logarithmic specification has a slight edge in terms of the R̄², and because this specification does not need higher-order polynomials in the logarithm of income to fit these data, we adopt the logarithmic specification in Equation (8.18).

FIGURE 8.7  The Linear-Log and Cubic Regression Functions
The estimated cubic regression function [Equation (8.11)] and the estimated linear-log regression function [Equation (8.18)] are nearly identical in this sample. [The figure plots test score against district income (thousands of dollars).]

8.3

Interactions
Between Independent Variables
In the introduction to this chapter we wondered whether reducing the student-teacher ratio might have a bigger effect on test scores in districts where many students are still learning English than in those with few still learning English. This could arise, for example, if students who are still learning English benefit differentially from one-on-one or small-group instruction. If so, the presence of many English learners in a district would interact with the student-teacher ratio in such a way that the effect on test scores of a change in the student-teacher ratio would depend on the fraction of English learners.

This section explains how to incorporate such interactions between two independent variables into the multiple regression model. The possible interaction between the student-teacher ratio and the fraction of English learners is an example of the more general situation in which the effect on Y of a change in one independent variable depends on the value of another independent variable. We consider three cases: when both independent variables are binary, when one is binary and the other is continuous, and when both are continuous.

Interactions Between Two Binary Variables


Consider the population regression of log earnings [Yi, where Yi = ln(Earningsi)] against two binary variables: the individual's gender (D1i, which = 1 if the ith person is female) and whether he or she has a college degree (D2i, where D2i = 1 if the ith person graduated from college). The population linear regression of Yi on these two binary variables is

Yi = β0 + β1D1i + β2D2i + ui.    (8.27)

In this regression model, β1 is the effect on log earnings of being female, holding schooling constant, and β2 is the effect of having a college degree, holding gender constant.

The specification in Equation (8.27) has an important limitation: The effect of having a college degree in this specification, holding constant gender, is the same for men and women. There is, however, no reason that this must be so. Phrased mathematically, the effect of D2i on Yi, holding D1i constant, could depend on the value of D1i. In other words, there could be an interaction between gender and having a college degree so that the value in the job market of a degree is different for men and women.


Although the specification in Equation (8.27) does not allow for this interaction between gender and acquiring a college degree, it is easy to modify the specification so that it does by introducing another regressor, the product of the two binary variables, D1i × D2i. The resulting regression is

Yi = β0 + β1D1i + β2D2i + β3(D1i × D2i) + ui.    (8.28)

The new regressor, the product D1i × D2i, is called an interaction term or an interacted regressor, and the population regression model in Equation (8.28) is called a binary variable interaction regression model.

The interaction term in Equation (8.28) allows the population effect on log earnings (Yi) of having a college degree (changing D2i from D2i = 0 to D2i = 1) to depend on gender (D1i). To show this mathematically, calculate the population effect of a change in D2i using the general method laid out in Key Concept 8.1. The first step is to compute the conditional expectation of Yi for D2i = 0, given a value of D1i; this is E(Yi | D1i = d1, D2i = 0) = β0 + β1 × d1 + β2 × 0 + β3 × (d1 × 0) = β0 + β1d1. The next step is to compute the conditional expectation of Yi after the change, that is, for D2i = 1, given the same value of D1i; this is E(Yi | D1i = d1, D2i = 1) = β0 + β1 × d1 + β2 × 1 + β3 × (d1 × 1) = β0 + β1d1 + β2 + β3d1. The effect of this change is the difference of expected values [that is, the difference in Equation (8.4)], which is

E(Yi | D1i = d1, D2i = 1) - E(Yi | D1i = d1, D2i = 0) = β2 + β3d1.    (8.29)

Thus, in the binary variable interaction specification in Equation (8.28), the effect of acquiring a college degree (a unit change in D2i) depends on the person's gender [the value of D1i, which is d1 in Equation (8.29)]. If the person is male (d1 = 0), the effect of acquiring a college degree is β2, but if the person is female (d1 = 1), the effect is β2 + β3. The coefficient β3 on the interaction term is the difference in the effect of acquiring a college degree for women versus men.

Although this example was phrased using log earnings, gender, and acquiring a college degree, the point is a general one. The binary variable interaction regression allows the effect of changing one of the binary independent variables to depend on the value of the other binary variable.

The method we used here to interpret the coefficients was, in effect, to work through each possible combination of the binary variables. This method, which applies to all regressions with binary variables, is summarized in Key Concept 8.3.
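This logic can be checked numerically. In the sketch below (simulated data with made-up coefficients, not the CPS), the OLS coefficient on D1i × D2i in the saturated regression equals the difference-in-differences of the four cell means, exactly as Key Concept 8.1's method implies.

```python
import numpy as np

# Illustrative sketch: in Y = b0 + b1*D1 + b2*D2 + b3*(D1*D2) + u, the OLS
# coefficients are differences of group means. Think of D1 = female,
# D2 = college degree; all beta values below are made up.
rng = np.random.default_rng(3)
n = 40_000
D1 = rng.integers(0, 2, n)
D2 = rng.integers(0, 2, n)
A = np.column_stack([np.ones(n), D1, D2, D1 * D2])
Y = A @ np.array([2.5, -0.3, 0.5, 0.1]) + rng.normal(0, 0.4, n)

b = np.linalg.lstsq(A, Y, rcond=None)[0]

# The four cell means, one per (D1, D2) combination:
m = {(d1, d2): Y[(D1 == d1) & (D2 == d2)].mean()
     for d1 in (0, 1) for d2 in (0, 1)}

# b3 equals the difference-in-differences of the cell means:
did = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
print(round(b[3], 3), round(did, 3))   # the same number, near 0.1
```

Because the regression is saturated (one parameter per cell), the OLS fit reproduces the cell means exactly, which is why the coefficient and the difference of means coincide.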


KEY CONCEPT 8.3
A METHOD FOR INTERPRETING COEFFICIENTS IN REGRESSIONS WITH BINARY VARIABLES
First compute the expected values of Y for each possible case described by the set of binary variables. Next compare these expected values. Each coefficient can then be expressed either as an expected value or as the difference between two or more expected values.

Application to the student-teacher ratio and the percentage of English learners. Let HiSTRi be a binary variable that equals 1 if the student-teacher ratio is 20 or more and equals 0 otherwise, and let HiELi be a binary variable that equals 1 if the percentage of English learners is 10% or more and equals 0 otherwise. The interacted regression of test scores against HiSTRi and HiELi is

TestScore = 664.1 - 18.2HiEL - 1.9HiSTR - 3.5(HiSTR × HiEL),  R̄² = 0.290.    (8.30)
           (1.4)   (2.3)       (1.9)       (3.1)


The predicted effect of moving from a district with a low student-teacher ratio to one with a high student-teacher ratio, holding constant whether the percentage of English learners is high or low, is given by Equation (8.29), with estimated coefficients replacing the population coefficients. According to the estimates in Equation (8.30), this effect thus is -1.9 - 3.5HiEL. That is, if the fraction of English learners is low (HiEL = 0), then the effect on test scores of moving from HiSTR = 0 to HiSTR = 1 is for test scores to decline by 1.9 points. If the fraction of English learners is high, test scores are estimated to decline by 1.9 + 3.5 = 5.4 points.

The estimated regression in Equation (8.30) also can be used to estimate the mean test scores for each of the four possible combinations of the binary variables. This is done using the procedure in Key Concept 8.3. Accordingly, the sample average test score for districts with low student-teacher ratios (HiSTRi = 0) and low fractions of English learners (HiELi = 0) is 664.1. For districts with HiSTRi = 1 (high student-teacher ratios) and HiELi = 0 (low fractions of English learners), the sample average is 662.2 (= 664.1 - 1.9). When HiSTRi = 0 and HiELi = 1, the sample average is 645.9 (= 664.1 - 18.2), and when HiSTRi = 1 and HiELi = 1, the sample average is 640.5 (= 664.1 - 18.2 - 1.9 - 3.5).
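The four cell means can be reproduced directly from the coefficients reported in Equation (8.30); the small helper below is our own illustration of that arithmetic.

```python
# Arithmetic check of the four cell means implied by Equation (8.30):
# TestScore = 664.1 - 18.2*HiEL - 1.9*HiSTR - 3.5*(HiSTR*HiEL).
b0, b_el, b_str, b_int = 664.1, -18.2, -1.9, -3.5

def cell_mean(histr, hiel):
    """Sample average test score for the (HiSTR, HiEL) cell."""
    return b0 + b_el * hiel + b_str * histr + b_int * histr * hiel

print(round(cell_mean(0, 0), 1))  # 664.1
print(round(cell_mean(1, 0), 1))  # 662.2
print(round(cell_mean(0, 1), 1))  # 645.9
print(round(cell_mean(1, 1), 1))  # 640.5
```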

FIGURE 8.8  Regression Functions Using Binary and Continuous Variables
Interactions of binary variables and continuous variables can produce three different population regression functions: (a) β0 + β1X + β2D allows for different intercepts but has the same slope; (b) β0 + β1X + β2D + β3(X × D) allows for different intercepts and different slopes; and (c) β0 + β1X + β2(X × D) has the same intercept but allows for different slopes.

Interactions Between a Continuous and a Binary Variable

Next consider the population regression of log earnings [Yi = ln(Earningsi)] against one continuous variable, the individual's years of work experience (Xi), and one binary variable, whether the worker has a college degree (Di, where Di = 1 if the ith person is a college graduate). As shown in Figure 8.8, the population regression line relating Y and the continuous variable X can depend on the binary variable D in three different ways.

In Figure 8.8a, the two regression lines differ only in their intercept. The corresponding population regression model is

Yi = β0 + β1Xi + β2Di + ui.    (8.31)


This is the familiar multiple regression model with a population regression function that is linear in Xi and Di. When Di = 0, the population regression function is β0 + β1Xi, so the intercept is β0 and the slope is β1. When Di = 1, the population regression function is β0 + β1Xi + β2, so the slope remains β1 but the intercept is β0 + β2. Thus β2 is the difference between the intercepts of the two regression lines, as shown in Figure 8.8a. Stated in terms of the earnings example, β1 is the effect on log earnings of an additional year of work experience, holding college degree status constant, and β2 is the effect of a college degree on log earnings, holding years of experience constant. In this specification, the effect of an additional year of work experience is the same for college graduates and nongraduates; that is, the two lines in Figure 8.8a have the same slope.
In Figure 8.8b, the two lines have different slopes and intercepts. The different slopes permit the effect of an additional year of work to differ for college graduates and nongraduates. To allow for different slopes, add an interaction term to Equation (8.31):

Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui,    (8.32)

where Xi × Di is a new variable, the product of Xi and Di. To interpret the coefficients of this regression, apply the procedure in Key Concept 8.3. Doing so shows that, if Di = 0, the population regression function is β0 + β1Xi, whereas if Di = 1, the population regression function is (β0 + β2) + (β1 + β3)Xi. Thus, this specification allows for two different population regression functions relating Yi and Xi, depending on the value of Di, as is shown in Figure 8.8b. The difference between the two intercepts is β2, and the difference between the two slopes is β3. In the earnings example, β1 is the effect of an additional year of work experience for nongraduates (Di = 0) and β1 + β3 is this effect for graduates, so β3 is the difference in the effect of an additional year of work experience for college graduates versus nongraduates.

Earnill~' )j

A third possibility, shown in Figure 8.8c, is that the two lines have different
"lopes but the same intercept_ The in teracted regression model for this case is

Ice ( \ ) anJ
~re JJ,

o:=

II

tlion regre.;binarv '1 ;~rt

(8.33)
The coefficients of this specification also can be interpreted using K~y Concepl
...3. l n rerms of the earnings example, this specifica tion allows for different effects
of experience o n log earnings between college graduates and nongraduates. but
requires tha t e xpected log earnings be the same for b orh groups when they
have no prior experience. Said dtiferently, this specification corresponds to the
populntion mean entry-level wage bt:ing 1he Mime fur college gratluaks and


KEY CONCEPT 8.4
INTERACTIONS BETWEEN BINARY AND CONTINUOUS VARIABLES
Through the use of the interaction term Xi × Di, the population regression line relating Yi and the continuous variable Xi can have a slope that depends on the binary variable Di. There are three possibilities:
1. Different intercept, same slope (Figure 8.8a): Yi = β0 + β1Xi + β2Di + ui;
2. Different intercept and slope (Figure 8.8b): Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui;
3. Same intercept, different slope (Figure 8.8c): Yi = β0 + β1Xi + β2(Xi × Di) + ui.

nongraduates. This does not make much sense in this application, and in practice this specification is used less frequently than Equation (8.32), which allows for different intercepts and slopes.

All three specifications, Equations (8.31), (8.32), and (8.33), are versions of the multiple regression model of Chapter 6 and, once the new variable Xi × Di is created, the coefficients of all three can be estimated by OLS.

The three regression models with a binary and a continuous independent variable are summarized in Key Concept 8.4.
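The point that the interacted model is estimable by ordinary OLS once Xi × Di is created can be sketched as follows (simulated data with made-up coefficients; our own illustration, not the book's):

```python
import numpy as np

# Illustrative sketch of Equation (8.32): create the interaction variable
# X*D and run OLS. Think of X = years of experience, D = college degree;
# the true coefficients (2.0, 0.03, 0.5, 0.02) are made up.
rng = np.random.default_rng(4)
n = 20_000
X = rng.uniform(0, 30, n)
D = rng.integers(0, 2, n)
Y = 2.0 + 0.03 * X + 0.5 * D + 0.02 * (X * D) + rng.normal(0, 0.3, n)

A = np.column_stack([np.ones(n), X, D, X * D])   # the new variable is X*D
b0, b1, b2, b3 = np.linalg.lstsq(A, Y, rcond=None)[0]

print(round(b1, 3))        # slope for D = 0, near 0.03
print(round(b1 + b3, 3))   # slope for D = 1, near 0.05
print(round(b2, 2))        # intercept difference, near 0.5
```

The fitted coefficients reproduce the two population lines: intercept β0 and slope β1 for the D = 0 group, and intercept β0 + β2 and slope β1 + β3 for the D = 1 group.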

Application to the student-teacher ratio and the percentage of English learners. Does the effect on test scores of cutting the student-teacher ratio depend on whether the percentage of students still learning English is high or low? One way to answer this question is to use a specification that allows for two different regression lines, depending on whether there is a high or low percentage of English learners. This is achieved using the different intercept/different slope specification:

TestScore = 682.2 - 0.97STR + 5.6HiEL - 1.28(STR × HiEL),  R̄² = 0.305.    (8.34)
           (11.9)  (0.59)     (19.5)     (0.97)


where the binary variable HiELi equals 1 if the percentage of students still learning English in the district is greater than 10% and equals 0 otherwise.

For districts with a low fraction of English learners (HiELi = 0), the estimated regression line is 682.2 - 0.97STRi. For districts with a high fraction of English learners (HiELi = 1), the estimated regression line is 682.2 + 5.6 - 0.97STRi - 1.28STRi = 687.8 - 2.25STRi. According to these estimates, reducing the student-teacher ratio by 1 is predicted to increase test scores by 0.97 point in districts with low fractions of English learners but by 2.25 points in districts with high fractions of English learners. The difference between these two effects, 1.28 points, is the coefficient on the interaction term in Equation (8.34).
The OLS regression in Equation (8.34) can be used to test several hypotheses about the population regression line. First, the hypothesis that the two lines are in fact the same can be tested by computing the F-statistic testing the joint hypothesis that the coefficient on HiELi and the coefficient on the interaction term STRi × HiELi are both zero. This F-statistic is 89.9, which is significant at the 1% level.

Second, the hypothesis that the two lines have the same slope can be tested by testing whether the coefficient on the interaction term is zero. The t-statistic, -1.28/0.97 = -1.32, is less than 1.645 in absolute value, so the null hypothesis that the two lines have the same slope cannot be rejected using a two-sided test at the 10% significance level.

Third, the hypothesis that the lines have the same intercept can be tested by testing whether the population coefficient on HiEL is zero. The t-statistic is t = 5.6/19.5 = 0.29, so the hypothesis that the lines have the same intercept cannot be rejected at the 5% level.

These three tests produce seemingly contradictory results: The joint test using the F-statistic rejects the joint hypothesis that the slope and the intercept are the same, but the tests of the individual hypotheses using the t-statistic fail to reject it. The reason for this is that the regressors, HiEL and STR × HiEL, are highly correlated. This results in large standard errors on the individual coefficients. Even though it is impossible to tell which of the coefficients is nonzero, there is strong evidence against the hypothesis that both are zero.

Finally, the hypothesis that the student-teacher ratio does not enter this specification can be tested by computing the F-statistic for the joint hypothesis that the coefficients on STR and on the interaction term are both zero. This F-statistic is 5.64, which has a p-value of 0.004. Thus, the coefficients on the student-teacher ratio are statistically significant at the 1% significance level.
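The arithmetic behind the fitted lines and the individual t-statistics can be reproduced from the estimates reported in Equation (8.34); the sketch below is our own illustration of that arithmetic.

```python
# Arithmetic from the estimates in Equation (8.34):
# TestScore = 682.2 - 0.97*STR + 5.6*HiEL - 1.28*(STR x HiEL),
# with standard errors 19.5 on HiEL and 0.97 on the interaction term.
slope_low = -0.97              # slope of the line when HiEL = 0
slope_high = -0.97 - 1.28      # slope of the line when HiEL = 1
print(round(slope_high, 2))    # -2.25
print(round(682.2 + 5.6, 1))   # 687.8, the intercept when HiEL = 1

t_interaction = -1.28 / 0.97   # about -1.32: cannot reject equal slopes
t_hiel = 5.6 / 19.5            # about 0.29: cannot reject equal intercepts
print(round(t_interaction, 2), round(t_hiel, 2))
```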


The Return to Education and the Gender Gap

In addition to its intellectual pleasures, education has economic rewards. As the boxes in Chapters 3 and 5 show, workers with more education tend to earn more than their counterparts with less education. The analysis in those boxes was incomplete, however, for at least three reasons. First, it failed to control for other determinants of earnings that might be correlated with educational achievement, so the OLS estimator of the coefficient on education could have omitted variable bias. Second, the functional form used in Chapter 5, a simple linear relation, implies that earnings change by a constant dollar
(continued)
TABLE 8.1  The Return to Education and the Gender Gap: Regression Results for the United States in 2004

Dependent variable: logarithm of Hourly Earnings.

Regressor                        (1)         (2)         (3)         (4)
Years of education             0.0938      0.0930      0.0861      0.0899
                              (0.0008)    (0.0008)    (0.0011)    (0.0011)
Female                                    -0.237      -0.484      -0.521
                                          (0.004)     (0.023)     (0.022)
Female × Years of education                            0.0180      0.0207
                                                      (0.0016)    (0.0016)
Potential experience                                               0.0232
                                                                  (0.0008)
Potential experience²                                            -0.000368
                                                                (0.000018)
Midwest                                                           -0.0519
                                                                  (0.006)
South                                                             -0.078
                                                                  (0.006)
West                                                              -0.030
                                                                  (0.006)
Intercept                       1.545       1.621       1.721       1.215
                               (0.011)     (0.011)     (0.015)     (0.018)
R̄²                             0.174       0.220       0.221       0.242

The data are from the March 2005 Current Population Survey (see Appendix 3.1). The sample size is n = 57,863 observations for each regression. Female is an indicator variable that equals 1 for women and 0 for men. Midwest, South, and West are indicator variables denoting the region of the United States in which the worker lives: For example, Midwest equals 1 if the worker lives in the Midwest and equals 0 otherwise. Standard errors are reported in parentheses below the estimated coefficients. Individual coefficients are statistically significant at the 5%, 10%, or 1% significance level.


8.3

ot might
t so tbe

on could
.nctwual
el:tuon-

nt dollar
ntimuui1

:41
~99

Klll)

f2l

)2.2)

.U7b
l.ll06)

l.!l30..
) OO'i)

a constant dollar amount for each additional year of education, whereas one might suspect that the dollar change in earnings is actually larger at higher levels of education. Third, the box in Chapter 5 ignores the gender differences in earnings highlighted in the box in Chapter 3.

All of these limitations can be addressed by a multiple regression analysis that includes those determinants of earnings which, if omitted, could cause omitted variable bias, and that uses a nonlinear functional form relating education and earnings. Table 8.1 summarizes regressions estimated using data on full-time workers, ages 30 through 64, from the Current Population Survey (the CPS data are described in Appendix 3.1). The dependent variable is the logarithm of hourly earnings, so another year of education is associated with a constant percentage increase (not dollar increase) in earnings.

Table 8.1 has four salient results. First, the omission of gender in regression (1) does not result in substantial omitted variable bias: Even though gender enters regression (2) significantly and with a large coefficient, gender and years of education are uncorrelated; that is, on average, men and women have nearly the same levels of education. Second, the returns to education are economically and statistically significantly different for men and women: In regression (3), the t-statistic testing the hypothesis that they are the same is 11.25 (= 0.0180/0.0016). Third, regression (4) controls for the region of the country in which the individual lives, thereby addressing potential omitted variable bias that might arise if years of education differ systematically by region. Controlling for region makes a small difference to the estimated coefficients on the education terms, relative to those reported in regression (3). Fourth, regression (4) controls for the potential experience of the worker, as measured by years since completion of schooling. The estimated coefficients imply a declining marginal value for each year of potential experience.

The estimated economic return to education in regression (4) is 8.99% for each year of education for men and 11.06% (= 0.0899 + 0.0207, in percent) for women. Because the regression functions for men and women have different slopes, the gender gap depends on the years of education. For 12 years of education, the gender gap is estimated to be 27.3% (= 0.0207 × 12 − 0.521, in percent); for 16 years of education, the gender gap is less in percentage terms, 19.0%.

These estimates of the return to education and the gender gap still have limitations, including the possibility of other omitted variables, notably the native ability of the worker, and potential problems associated with the way variables are measured in the CPS. Nevertheless, the estimates in Table 8.1 are consistent with those obtained by economists who carefully address these limitations. A recent survey by the econometrician David Card (1999) of dozens of empirical studies concludes that labor economists' best estimates of the return to education generally fall between 8% and 11%, and that the return depends on the quality of the education. If you are interested in learning more about the economic return to education, see Card (1999).
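The gender-gap arithmetic in the box can be reproduced directly from the regression (4) estimates quoted above (-0.521 on Female and 0.0207 on Female × Years of education). A small sketch:

```python
# Gender gap implied by regression (4): the coefficient on Female plus the
# (Female x Years of education) coefficient times years of education.
# The coefficients are the estimates quoted in the box, not computed here.

def gender_gap(years_of_education, beta_female=-0.521, beta_interaction=0.0207):
    """Log-points gap in hourly earnings between women and men."""
    return beta_female + beta_interaction * years_of_education

# For 12 years of education the gap is about -27.3%; for 16 years, about -19.0%.
print(round(100 * gender_gap(12), 1))  # -27.3
print(round(100 * gender_gap(16), 1))  # -19.0
```

Because the two regression lines have different slopes, the gap shrinks (in log points) as education rises, exactly as the box reports.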

CHAPTER 8  Nonlinear Regression Functions

Interactions Between Two Continuous Variables

Now suppose that both independent variables (X1i and X2i) are continuous. An example is when Yi is log earnings of the ith worker, X1i is his or her years of work experience, and X2i is the number of years he or she went to school. If the population regression function is linear, the effect on wages of an additional year of experience does not depend on the number of years of education or, equivalently, the effect of an additional year of education does not depend on the number of years of work experience. In reality, however, there might be an interaction between these two variables so that the effect on wages of an additional year of experience depends on the number of years of education. This interaction can be modeled by augmenting the linear regression model with an interaction term that is the product of X1i and X2i:

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui.   (8.35)

The interaction term allows the effect of a unit change in X1 to depend on X2. To see this, apply the general method for computing effects in nonlinear regression models in Key Concept 8.1. The difference in Equation (8.4), computed for the interacted regression function in Equation (8.35), is ΔY = (β1 + β3X2)ΔX1 [Exercise 8.10(a)]. Thus, the effect on Y of a change in X1, holding X2 constant, is

ΔY/ΔX1 = β1 + β3X2,   (8.36)

which depends on X2. For example, in the earnings example, if β3 is positive, then the effect on log earnings of an additional year of experience is greater, by the amount β3, for each additional year of education the worker has.

A similar calculation shows that the effect on Y of a change ΔX2 in X2, holding X1 constant, is ΔY/ΔX2 = β2 + β3X1.

Putting these two effects together shows that the coefficient β3 on the interaction term is the effect of a unit increase in X1 and X2, above and beyond the sum of the effects of a unit increase in X1 alone and a unit increase in X2 alone. That is, if X1 changes by ΔX1 and X2 changes by ΔX2, then the expected change in Y is ΔY = (β1 + β3X2)ΔX1 + (β2 + β3X1)ΔX2 + β3ΔX1ΔX2 [Exercise 8.10(c)]. The first term is the effect from changing X1 holding X2 constant; the second term is the effect from changing X2 holding X1 constant; and the final term, β3ΔX1ΔX2, is the extra effect from changing both X1 and X2.

Interactions between two variables are summarized as Key Concept 8.5.

KEY CONCEPT 8.5: Interactions in Multiple Regression

When interactions are combined with logarithmic transformations, they can be used to estimate price elasticities when the price elasticity depends on the characteristics of the good (see the box "The Demand for Economic Journals" for an example).
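The algebra above, that in a regression with an X1 × X2 interaction the effect of a unit change in X1 is β1 + β3X2, can be checked numerically. This is an illustrative sketch with made-up coefficients (they are not estimates from the text):

```python
# Interacted regression function Y = b0 + b1*X1 + b2*X2 + b3*X1*X2,
# with coefficients invented purely for illustration.
b0, b1, b2, b3 = 1.0, 0.05, 0.08, 0.01

def reg_fn(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def effect_of_x1(x1, x2, dx1=1.0):
    """Effect on Y of changing X1 by dx1, holding X2 constant (Key Concept 8.1)."""
    return reg_fn(x1 + dx1, x2) - reg_fn(x1, x2)

# The effect of one more unit of X1 depends on the level of X2: b1 + b3*X2.
for x2 in (0, 8, 16):
    assert abs(effect_of_x1(10, x2) - (b1 + b3 * x2)) < 1e-12
    print(x2, round(effect_of_x1(10, x2), 3))
```

With β3 > 0, the computed effect of X1 rises with X2, which is the sense in which the interaction term lets one slope depend on the other variable.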

Application to the student-teacher ratio and the percentage of English learners.  The previous examples considered interactions between the student-teacher ratio and a binary variable indicating whether the percentage of English learners is large or small. A different way to study this interaction is to examine the interaction between the student-teacher ratio and the continuous variable, the percentage of English learners (PctEL). The estimated interaction regression is

TestScore = 686.3 − 1.12 STR − 0.67 PctEL + 0.0012 (STR × PctEL),  R̄² = 0.422.   (8.37)
            (11.8)   (0.59)     (0.37)      (0.019)

When the percentage of English learners is at the median (PctEL = 8.85), the slope of the line relating test scores and the student-teacher ratio is estimated to be −1.11 (= −1.12 + 0.0012 × 8.85). When the percentage of English learners is at the 75th percentile (PctEL = 23.0), this line is estimated to be flatter, with a slope of −1.09 (= −1.12 + 0.0012 × 23.0). That is, for a district with 8.85% English learners, the estimated effect of a one-unit reduction in the student-teacher ratio is to increase test scores by 1.11 points, but for a district with 23.0% English learners, reducing the student-teacher ratio by one unit is predicted to increase test scores by only 1.09 points. The difference between these estimated effects is not statistically significant, however: The t-statistic testing whether the coefficient on the interaction term is zero is t = 0.0012/0.019 = 0.06, which is not significant at the 10% level.
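As a quick check of the slope arithmetic, the sketch below plugs the Equation (8.37) estimates into the implied slope formula, slope = −1.12 + 0.0012 × PctEL:

```python
# Estimated coefficients and interaction standard error from Equation (8.37).
b_str, b_interaction = -1.12, 0.0012
se_interaction = 0.019

def slope_wrt_str(pct_el):
    """Estimated effect on test scores of a one-unit increase in STR."""
    return b_str + b_interaction * pct_el

print(round(slope_wrt_str(8.85), 2))             # median district: -1.11
print(round(slope_wrt_str(23.0), 2))             # 75th percentile: -1.09
print(round(b_interaction / se_interaction, 2))  # t-statistic: 0.06
```

The two slopes differ by only 0.02 points, and the t-statistic of 0.06 confirms that the interaction coefficient is statistically indistinguishable from zero.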


The Demand for Economics Journals

Professional economists follow the most recent research in their areas of specialization. Most research in economics first appears in economics journals, so economists (or their libraries) subscribe to economics journals.

How elastic is the demand by libraries for economics journals? To find out, we analyzed the relationship between the number of subscriptions to a journal at U.S. libraries (Yi) and its library subscription price using data for the year 2000 for 180 economics journals. Because the product of a journal is not the paper on which it is printed but rather the ideas it contains, its price is logically measured not in dollars per year or dollars per page but instead in dollars per idea. Although we cannot measure ideas directly, a good indirect measure is the number of times that articles in a journal are subsequently cited by other researchers. Accordingly, we measure price as the "price per citation" in the journal. The price range is enormous, from ½¢ per citation (the American Economic Review) to 20¢ per citation or more. Some journals are expensive per citation because they have few citations, others because their library subscription price per year is very high: In 2006, a library subscription to the Journal of Econometrics cost more than $2700, almost 9 times the price of a library subscription to the American Economic Review!

Because we are interested in estimating elasticities, we use a log-log specification (Key Concept 8.2). The scatterplots in Figures 8.9a and 8.9b provide empirical support for this transformation.

FIGURE 8.9  Library Subscriptions and Prices of Economics Journals

There is a nonlinear inverse relation between the number of U.S. library subscriptions (quantity) and the library price per citation (price), as shown in Figure 8.9a for 180 economics journals in 2000. But as seen in Figure 8.9b, the relation between log quantity and log price appears to be approximately linear. Figure 8.9c shows that demand is more elastic for young journals (Age = 5) than for old journals (Age = 80).

[Panels: (a) Subscriptions and Price per citation; (b) ln(Subscriptions) and ln(Price per citation); (c) ln(Subscriptions) and ln(Price per citation), with demand curves for young and old journals superimposed.]

TABLE 8.2  Estimates of the Demand for Economics Journals

Dependent variable: logarithm of subscriptions at U.S. libraries in the year 2000; 180 observations.

Regressor                              (1)          (2)          (3)          (4)
ln(Price per citation)               -0.533**     -0.400**     -0.961**     -0.899**
                                     (0.034)      (0.044)      (0.160)      (0.145)
[ln(Price per citation)]²                                       0.017
                                                               (0.025)
[ln(Price per citation)]³                                       0.0037
                                                               (0.0055)
ln(Age)                                            0.424**      0.373**      0.374**
                                                  (0.119)      (0.118)      (0.118)
ln(Age) × ln(Price per citation)                                             0.141**
                                                                            (0.040)
ln(Characters ÷ 1,000,000)                         0.206*       0.235*       0.229*
                                                  (0.098)      (0.098)      (0.096)
Intercept                             4.77**       3.21**       3.41**       3.41**
                                     (0.055)      (0.38)       (0.38)       (0.38)

F-Statistics and Summary Statistics
F-statistic testing coefficients on
quadratic and cubic terms (p-value)                             0.25
                                                               (0.779)
SER                                   0.750        0.705        0.691        0.688
R̄²                                    0.555        0.607        0.622        0.626

Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level. The F-statistic tests the hypothesis that the coefficients on [ln(Price per citation)]² and [ln(Price per citation)]³ are both zero.

Because some of the oldest and most prestigious journals are the cheapest per citation, a regression of log quantity against log price could have omitted variable bias. Our regressions therefore include two control variables, the logarithm of age and the logarithm of the number of characters per year in the journal.

The regression results are summarized in Table 8.2. Those results yield the following conclusions (see if you can find the basis for these conclusions in the table!):

1. Demand is less elastic for older than for newer journals.
2. The evidence supports a linear, rather than a cubic, function of log price.
3. Demand is greater for journals with more characters, holding price and age constant.

So what is the elasticity of demand for economics journals? It depends on the age of the journal. Demand curves for an 80-year-old journal and a 5-year-old upstart are superimposed on the scatterplot in Figure 8.9c; the older journal's demand elasticity is −0.28 (SE = 0.06), while the younger journal's is −0.67 (SE = 0.08).

This demand is very inelastic: Demand is very insensitive to price, especially for older journals. For libraries, having the most recent research on hand is a necessity, not a luxury. By way of comparison, experts estimate the demand elasticity for cigarettes to be in the range of −0.3 to −0.5. Economics journals are, it seems, as addictive as cigarettes, but a lot better for your health!¹

¹These data were graciously provided by Professor Theodore Bergstrom of the Department of Economics at the University of California, Santa Barbara. If you are interested in learning more about the economics of economics journals, see Bergstrom (2001).
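The age-dependent elasticities quoted in the box follow from the interacted log-log specification: the price elasticity implied by column (4) of Table 8.2 is the coefficient on ln(Price per citation) plus the interaction coefficient times ln(Age). A sketch using the estimates as read from the table (−0.899 and 0.141):

```python
import math

# Column (4) estimates as read from Table 8.2: the coefficient on
# ln(Price per citation) and on its interaction with ln(Age).
b_lnprice, b_lnprice_x_lnage = -0.899, 0.141

def price_elasticity(age):
    """Demand elasticity implied by the log-log model with an age interaction."""
    return b_lnprice + b_lnprice_x_lnage * math.log(age)

print(round(price_elasticity(80), 2))  # old journal: about -0.28
print(round(price_elasticity(5), 2))   # young journal: about -0.67
```

This is the same interaction logic as Equation (8.36), applied to variables in logs, which is why the slope of the demand curve varies with the journal's age.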

To keep the discussion focused on nonlinear models, the specifications in Sections 8.1-8.3 exclude additional control variables such as the students' economic background. Consequently, these results arguably are subject to omitted variable bias. To draw substantive conclusions about the effect on test scores of reducing the student-teacher ratio, these nonlinear specifications must be augmented with control variables, and it is to such an exercise that we now turn.

8.4 Nonlinear Effects on Test Scores of the Student-Teacher Ratio

This section addresses three specific questions about test scores and the student-teacher ratio. First, after controlling for differences in economic characteristics of different districts, does the effect on test scores of reducing the student-teacher ratio depend on the fraction of English learners? Second, does this effect depend on the value of the student-teacher ratio? Third, and most important, after taking economic factors and nonlinearities into account, what is the estimated effect on test scores of reducing the student-teacher ratio by two students per teacher, as our superintendent from Chapter 4 proposes to do?

We answer these questions by considering nonlinear regression specifications of the type discussed in Sections 8.2 and 8.3, extended to include two measures of the economic background of the students: the percentage of students eligible for a subsidized lunch and the logarithm of average district income. The logarithm of income is used because the empirical analysis of Section 8.2 suggests that this specification captures the nonlinear relationship between test scores and income. As in Section 7.6, we do not include expenditures per pupil as a regressor, and in so doing we are considering the effect of decreasing the student-teacher ratio, allowing expenditures per pupil to increase (that is, we are not holding expenditures per pupil constant).

Discussion of Regression Results

The OLS regression results are summarized in Table 8.3. The columns labeled (1) through (7) each report separate regressions. The entries in the table are the coefficients, standard errors, certain F-statistics and their p-values, and summary statistics, as indicated by the description in each row.

The first column of regression results, labeled regression (1) in the table, is regression (3) in Table 7.1, repeated here for convenience. This regression does not control for income, so the first thing we do is check whether the results change substantially when log income is included as an additional economic control variable. The results are given in regression (2) in Table 8.3. The log of income is statistically significant at the 1% level, and the coefficient on the student-teacher ratio becomes somewhat closer to zero, falling from −1.00 to −0.73, although it remains statistically significant at the 1% level. The change in the coefficient on STR is large enough between regressions (1) and (2) to warrant including the logarithm of income in the remaining regressions as a deterrent to omitted variable bias.

Regression (3) in Table 8.3 is the interacted regression in Equation (8.34) with the binary variable for a high or low percentage of English learners, but with no economic control variables. When the economic control variables (percentage eligible for subsidized lunch and log income) are added [regression (4) in the table], the coefficients change, but in neither case is the coefficient on the interaction term significant at the 5% level. Based on the evidence in regression (4), the hypothesis that the effect of STR is the same for districts with low and high percentages of English learners cannot be rejected at the 5% level (the t-statistic is t = −0.58/0.50 = −1.16).

Regression (5) examines whether the effect of changing the student-teacher ratio depends on the value of the student-teacher ratio by including a cubic specification in STR in addition to the other control variables in regression (4) [the interaction term, HiEL × STR, was dropped because it was not significant in

TABLE 8.3  Nonlinear Regression Models of Test Scores

Dependent variable: average test score in district; 420 observations.

[Table 8.3 reports seven OLS regressions, columns (1) through (7). The regressors are the student-teacher ratio (STR); STR²; STR³; a binary indicator for English learners ≥ 10% (HiEL); the interactions HiEL × STR, HiEL × STR², and HiEL × STR³; the percentage of students eligible for a subsidized lunch; the percentage of English learners; average district income (logarithm); and an intercept. The table also reports F-statistics and p-values on the joint hypotheses that (a) all STR variables and interactions are zero, (b) the coefficients on STR² and STR³ are zero, and (c) the coefficients on HiEL × STR, HiEL × STR², and HiEL × STR³ are zero, along with the SER and R̄² for each regression.]

These regressions were estimated using the data on California school districts described in Appendix 4.1. Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% or **1% significance level.

regression (4) at the 10% level]. The estimates in regression (5) are consistent with the student-teacher ratio having a nonlinear effect. The null hypothesis that the relationship is linear is rejected at the 1% significance level against the alternative that it is cubic (the F-statistic testing the hypothesis that the true coefficients on STR² and STR³ are zero is 6.17, with a p-value of <0.001).

Regression (6) further examines whether the effect of the student-teacher ratio depends not just on the value of the student-teacher ratio but also on the fraction of English learners. By including interactions between HiEL and STR, STR², and STR³, we can check whether the (possibly cubic) population regression functions relating test scores and STR are different for low and high percentages of English learners. To do so, we test the restriction that the coefficients on the three interaction terms are zero. The resulting F-statistic is 2.69, which has a p-value of 0.046 and thus is significant at the 5% but not the 1% significance level. This provides some evidence that the regression functions are different for districts with high and low percentages of English learners; however, comparing regressions (6) and (4) makes it clear that these differences are associated with the quadratic and cubic terms.

Regression (7) is a modification of regression (5), in which the continuous variable PctEL is used instead of the binary variable HiEL to control for the percentage of English learners in the district. The coefficients on the other regressors do not change substantially when this modification is made, indicating that the results in regression (5) are not sensitive to what measure of the percentage of English learners is actually used in the regression.

In all the specifications, the hypothesis that the student-teacher ratio does not enter the regressions is rejected at the 1% level.

The nonlinear specifications in Table 8.3 are most easily interpreted graphically. Figure 8.10 graphs the estimated regression functions relating test scores and the student-teacher ratio for the linear specification (2) and the cubic specifications (5) and (7), along with a scatterplot of the data.¹ These estimated regression functions show the predicted value of test scores as a function of the student-teacher ratio, holding fixed other values of the independent variables in the regression. The estimated regression functions are all close to each other, although the cubic regressions flatten out for large values of the student-teacher ratio.

¹For each curve, the predicted value was computed by setting each independent variable, other than STR, to its sample average value and computing the predicted value by multiplying these fixed values of the independent variables by the respective estimated coefficients from Table 8.3. This was done for various values of STR, and the graph of the resulting adjusted predicted values is the estimated regression line relating test scores and the STR, holding the other variables constant at their sample averages.


FIGURE 8.10  Three Regression Functions Relating Test Scores and Student-Teacher Ratio

The cubic regressions from columns (5) and (7) of Table 8.3 are nearly identical. They indicate a small amount of nonlinearity in the relation between test scores and the student-teacher ratio.

[Figure: scatterplot of test scores against the student-teacher ratio (12 to 26), with the linear regression (2) and the cubic regressions (5) and (7) superimposed.]

Regression (6) indicates a statistically significant difference in the cubic regression functions relating test scores and STR, depending on whether the percentage of English learners in the district is large or small. Figure 8.11 graphs these two estimated regression functions so that we can see whether this difference, in addition to being statistically significant, is of practical importance. As Figure 8.11 shows, for student-teacher ratios between 17 and 23 (a range that includes 88% of the observations), the two functions are separated by approximately ten points but otherwise are very similar; that is, for STR between 17 and 23, districts with a lower percentage of English learners do better, holding constant the student-teacher ratio, but the effect of a change in the student-teacher ratio is essentially the same for the two groups. The two regression functions are different for student-teacher ratios below 16.5, but we must be careful not to read more into this than is justified. The districts with STR < 16.5 constitute only 6% of the observations, so the differences between the nonlinear regression functions are reflecting differences in these very few districts with very low student-teacher ratios. Thus, based on Figure 8.11, we conclude that the effect on test scores of a change in the student-teacher ratio does not depend on the percentage of English learners for the range of student-teacher ratios for which we have the most data.

FIGURE 8.11  Regression Functions for Districts with High and Low Percentages of English Learners

Districts with low percentages of English learners (HiEL = 0) are shown by gray dots, and districts with HiEL = 1 are shown by colored dots. The cubic regression function for HiEL = 1 from regression (6) in Table 8.3 is approximately 10 points below the cubic regression function for HiEL = 0 for 17 ≤ STR ≤ 23, but otherwise the two functions have similar shapes and slopes in this range. The slopes of the regression functions differ most for very large and small values of STR, for which there are few observations.

[Figure: scatterplot of test scores against the student-teacher ratio with the two estimated cubic regression functions superimposed.]

Summary of Findings

These results let us answer the three questions raised at the start of this section.

First, after controlling for economic background, whether there are many or few English learners in the district does not have a substantial influence on the effect on test scores of a change in the student-teacher ratio. In the linear specifications, there is no statistically significant evidence of such a difference. The cubic specification in regression (6) provides statistically significant evidence (at the 5% level) that the regression functions are different for districts with high and low percentages of English learners; as shown in Figure 8.11, however, the estimated regression functions have similar slopes in the range of student-teacher ratios containing most of our data.

Second, after controlling for economic background, there is evidence of a nonlinear effect on test scores of the student-teacher ratio. This effect is statistically significant at the 1% level (the coefficients on STR² and STR³ are always significant at the 1% level).

Third, we now can return to the superintendent's problem that opened Chapter 4. She wants to know the effect on test scores of reducing the student-teacher ratio by two students per teacher. In the linear specification (2), this effect does not depend on the student-teacher ratio itself, and the estimated effect of this reduction is to improve test scores by 1.46 (= −0.73 × −2) points. In the



nonlinear specifications, this effect depends on the value of the student-teacher ratio. If her district currently has a student-teacher ratio of 20 and she is considering cutting it to 18, then based on regression (5) the estimated effect of this reduction is to improve test scores by 3.00 points, while based on regression (7) this estimate is 2.93. If her district currently has a student-teacher ratio of 22 and she is considering cutting it to 20, then based on regression (5) the estimated effect of this reduction is to improve test scores by 1.93 points, while based on regression (7) this estimate is 1.90. The estimates from the nonlinear specifications suggest that cutting the student-teacher ratio has a somewhat greater effect if this ratio is already small.
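The mechanics behind these estimates are Key Concept 8.1 applied to a cubic: the predicted effect of cutting the ratio from s to s − 2 is the difference in the estimated regression function at the two values. The sketch below uses hypothetical cubic coefficients, chosen only to mimic the magnitudes reported in the text (they are not the Table 8.3 estimates), to show how the effect shrinks as the starting ratio rises:

```python
# Hypothetical cubic coefficients, for illustration only (not Table 8.3 values).
b1, b2, b3 = 63.97, -3.406, 0.059

def cubic_part(str_ratio):
    """The STR terms of a cubic regression function, other regressors held fixed."""
    return b1 * str_ratio + b2 * str_ratio ** 2 + b3 * str_ratio ** 3

def effect_of_cut(start, cut=2):
    """Predicted test-score change from reducing STR by `cut` students per teacher."""
    return cubic_part(start - cut) - cubic_part(start)

print(round(effect_of_cut(20), 2))  # cutting 20 -> 18: about 3.0 points
print(round(effect_of_cut(22), 2))  # cutting 22 -> 20: about 1.93 points
```

Because the function is cubic, the same two-student cut is predicted to matter more when the starting ratio is smaller, which is exactly the pattern described in the paragraph above.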

8.5 Conclusion

This chapter presented several ways to model nonlinear regression functions. Because these models are variants of the multiple regression model, the unknown coefficients can be estimated by OLS, and hypotheses about their values can be tested using t- and F-statistics as described in Chapter 7. In these models, the expected effect on Y of a change in one of the independent variables, X1, holding the other independent variables X2, ..., Xk constant, in general depends on the values of X1, X2, ..., Xk.

There are many different models in this chapter, and you could not be blamed for being a bit bewildered about which to use in a given application. How should you analyze possible nonlinearities in practice? Section 8.1 laid out a general approach for such an analysis, but this approach requires you to make decisions and exercise judgment along the way. It would be convenient if there were a single recipe you could follow that would always work in every application, but in practice data analysis is rarely that simple.

The single most important step in specifying nonlinear regression functions is to "use your head." Before you look at the data, can you think of a reason, based on economic theory or expert judgment, why the slope of the population regression function might depend on the value of that, or another, independent variable? If so, what sort of dependence might you expect? And, most importantly, which nonlinearities (if any) could have major implications for the substantive issues addressed by your study? Answering these questions carefully will focus your analysis. In the test score application, for example, such reasoning led us to investigate whether hiring more teachers might have a greater effect in districts with a large percentage of students still learning English, perhaps because those students would differentially benefit from more personal attention. By making the question precise, we were able to find a precise answer: After controlling for the economic background of the students, we found no statistically significant evidence of such an interaction.

Summary

1. In a nonlinear regression, the slope of the population regression function depends on the value of one or more of the independent variables.

2. The effect on Y of a change in the independent variable(s) can be computed by evaluating the regression function at two values of the independent variable(s). The procedure is summarized in Key Concept 8.1.

3. A polynomial regression includes powers of X as regressors. A quadratic regression includes X and X², and a cubic regression includes X, X², and X³.

4. Small changes in logarithms can be interpreted as proportional or percentage changes in a variable. Regressions involving logarithms are used to estimate proportional changes and elasticities.

5. The product of two variables is called an interaction term. When interaction terms are included as regressors, they allow the regression slope of one variable to depend on the value of another variable.
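Summary point 4 can be checked with one line of arithmetic: for a small change, the difference in natural logs approximates the proportional change. A minimal sketch:

```python
import math

# A 2% increase: the log difference is approximately the proportional change.
x, x_new = 100.0, 102.0
log_change = math.log(x_new) - math.log(x)
pct_change = (x_new - x) / x

print(round(log_change, 4))  # about 0.0198
print(round(pct_change, 4))  # 0.02
```

The approximation is close here (0.0198 versus 0.02) and becomes exact as the change shrinks toward zero, which is why log-difference coefficients are read as percentage effects.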

Key Terms

quadratic regression model (256)
nonlinear regression function (260)
polynomial regression model (265)
cubic regression model (265)
elasticity (267)
exponential function (267)
natural logarithm (268)
linear-log model (269)
log-linear model (271)
log-log model (271)
interaction term (276)
interacted regressor (278)
interaction regression model (278)
nonlinear least squares (309)
nonlinear least squares estimators (309)

Review the Concepts


8.1 Sketch a regression function that is increasing (has a positive slope) and is steep for small values of X but less steep for large values of X. Explain how you would specify a nonlinear regression to model this shape. Can you think of an economic relationship with a shape like this?

CHAPTER 8  Nonlinear Regression Functions

8.2 A "Cobb-Douglas" production function relates production (Q) to factors of production, capital (K), labor (L), and raw materials (M), and an error term u using the equation Q = λK^β1 L^β2 M^β3 e^u, where λ, β1, β2, and β3 are production parameters. Suppose you have data on production and the factors of production from a random sample of firms with the same Cobb-Douglas production function. How would you use regression analysis to estimate the production parameters?

8.3 A standard "money demand" function used by macroeconomists has the form ln(m) = β0 + β1 ln(GDP) + β2 R, where m is the quantity of (real) money, GDP is the value of (real) gross domestic product, and R is the value of the nominal interest rate measured in percent per year. Suppose that β1 = 1.0 and β2 = -0.02. What will happen to the value of m if GDP increases by 2%? What will happen to m if the interest rate increases from 4% to 5%?

8.4 You have estimated a linear regression model relating Y to X. Your professor says, "I think that the relationship between Y and X is nonlinear." Explain how you would test the adequacy of your linear regression.

8.5 Suppose that in problem 8.2 you thought that the value of β2 was not constant, but rather increased when K increased. How could you use an interaction term to capture this effect?
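The arithmetic behind Concept 8.3 can be sketched numerically. The coefficient values are the ones given in the question; the helper names are ours:

```python
import math

# Concept 8.3: ln(m) = b0 + b1*ln(GDP) + b2*R, with b1 = 1.0 and b2 = -0.02.
# Small changes in ln(m) are approximately proportional changes in m;
# here we also convert the log change into an exact percentage change.

b1, b2 = 1.0, -0.02

# A 2% increase in GDP changes ln(GDP) by ln(1.02):
dlog_m_gdp = b1 * math.log(1.02)
pct_change_m_gdp = 100 * (math.exp(dlog_m_gdp) - 1)

# The nominal interest rate rising from 4% to 5% changes R by 1
# percentage point, so ln(m) changes by b2 * 1:
dlog_m_rate = b2 * (5 - 4)
pct_change_m_rate = 100 * (math.exp(dlog_m_rate) - 1)

print(round(pct_change_m_gdp, 2), round(pct_change_m_rate, 2))
```

With a unit elasticity (b1 = 1.0), real money demand moves one-for-one with GDP, while the interest-rate semi-elasticity b2 translates a 1-percentage-point rate increase into roughly a 2% fall in m.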

Exercises
8.1 Sales in a company are $196 million in 2001 and increase to $198 million in 2002.

a. Compute the percentage increase in sales using the usual formula 100 × (Sales2002 − Sales2001)/Sales2001. Compare this value to the approximation 100 × [ln(Sales2002) − ln(Sales2001)].

b. Repeat (a) assuming Sales2001 = 205 and Sales2002 = 250; then assuming Sales2002 = 500.

c. How good is the approximation when the change is small? Does the quality of the approximation deteriorate as the percentage change increases?
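The two formulas in Exercise 8.1 are easy to compare directly. This sketch uses the sales figures from parts (a) and (b); the function names are ours:

```python
import math

def pct_change(s0, s1):
    """The usual percentage change, 100 * (s1 - s0) / s0."""
    return 100 * (s1 - s0) / s0

def log_approx(s0, s1):
    """The logarithmic approximation, 100 * [ln(s1) - ln(s0)]."""
    return 100 * (math.log(s1) - math.log(s0))

# Small change (part a) versus a large change (part b):
for s0, s1 in [(196, 198), (205, 250)]:
    print(round(pct_change(s0, s1), 3), round(log_approx(s0, s1), 3))
```

For the small change the two numbers are nearly identical, while for the larger change the gap between them widens noticeably, which is the pattern part (c) asks about.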
8.2 Suppose that a researcher collects data on houses that have sold in a particular neighborhood over the past year and obtains the regression results in the table shown below.

a. Using the results in column (1), what is the expected change in price of building a 500-square-foot addition to a house? Construct a 95% confidence interval for the percentage change in price.


Regression Results for Exercise 8.2

Dependent variable: ln(Price)

Regressor        (1)          (2)        (3)        (4)        (5)
Size           0.00042
              (0.000038)
ln(Size)                     0.69       0.68       0.57       0.69
                            (0.054)    (0.087)    (2.03)     (0.055)
ln(Size)^2                                         0.0078
                                                  (0.14)
Bedrooms                                0.0036
                                       (0.037)
Pool           0.082         0.071      0.071      0.071      0.071
              (0.032)       (0.034)    (0.034)    (0.036)    (0.035)
View           0.037         0.027      0.026      0.027      0.027
              (0.029)       (0.028)    (0.026)    (0.029)    (0.030)
Pool × View                                                   0.0022
                                                             (0.10)
Condition      0.13          0.12       0.12       0.12       0.12
              (0.045)       (0.035)    (0.035)    (0.036)    (0.035)
Intercept     10.97          6.60       6.63       7.02       6.60
              (0.069)       (0.39)     (0.53)     (7.50)     (0.40)

Summary Statistics
SER            0.102         0.098      0.099      0.099      0.099
R-bar^2        0.72          0.74       0.73       0.73       0.73

Variable definitions: Price = sale price ($); Size = house size (in square feet); Bedrooms = number of bedrooms; Pool = binary variable (1 if house has a swimming pool, 0 otherwise); View = binary variable (1 if house has a nice view, 0 otherwise); Condition = binary variable (1 if realtor reports house is in excellent condition, 0 otherwise).

b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?

c. Using column (2), what is the estimated effect of Pool on price? (Make sure you get the units right.) Construct a 95% confidence interval for this effect.

d. The regression in column (3) adds the number of bedrooms to the regression. How large is the estimated effect of an additional bedroom? Is the effect statistically significant? Why do you think the estimated effect is so small? (Hint: Which other variables are being held constant?)


e. Is the quadratic term ln(Size)^2 important?

f. Use the regression in column (5) to compute the expected change in price when a pool is added to a house without a view. Repeat the exercise for a house with a view. Is there a large difference? Is the difference statistically significant?

8.3 After reading this chapter's analysis of test scores and class size, an educator comments: "In my experience, student performance depends on class size, but not in the way your regressions say. Rather, students do well when class size is less than 20 students and do very poorly when class size is greater than 25. There are no gains from reducing class size below 20 students, the relationship is constant in the intermediate region between 20 and 25 students, and there is no loss to increasing class size when it is already greater than 25." The educator is describing a "threshold effect" in which performance is constant for class sizes less than 20, then jumps and is constant for class sizes between 20 and 25, and then jumps again for class sizes greater than 25. To model these threshold effects, define the binary variables

STRsmall = 1 if STR < 20, and STRsmall = 0 otherwise;

STRmoderate = 1 if 20 ≤ STR ≤ 25, and STRmoderate = 0 otherwise; and

STRlarge = 1 if STR > 25, and STRlarge = 0 otherwise.
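The three binary variables defined above can be built mechanically from a list of student-teacher ratios. The sketch below uses made-up STR values; note that every observation falls in exactly one category, so the three dummies always sum to 1:

```python
# Constructing the threshold dummies of Exercise 8.3 from sample
# student-teacher ratios (the values here are invented for illustration).

str_values = [17.5, 20.0, 22.3, 25.0, 28.1]

str_small = [1 if s < 20 else 0 for s in str_values]
str_moderate = [1 if 20 <= s <= 25 else 0 for s in str_values]
str_large = [1 if s > 25 else 0 for s in str_values]

# The categories are mutually exclusive and exhaustive:
assert all(a + b + c == 1
           for a, b, c in zip(str_small, str_moderate, str_large))
print(str_small, str_moderate, str_large)
```

Keeping the fact that the dummies sum to 1 in mind is useful when thinking about part (b) of the exercise.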

a. Consider the regression TestScore_i = β0 + β1 STRsmall_i + β2 STRlarge_i + u_i. Sketch the regression function relating TestScore to STR for hypothetical values of the regression coefficients that are consistent with the educator's statement.

b. A researcher tries to estimate the regression TestScore_i = β0 + β1 STRsmall_i + β2 STRmoderate_i + β3 STRlarge_i + u_i and finds that her computer crashes. Why?

8.4 Read the box "The Returns to Education and the Gender Gap" in Section 8.3.

a. Consider a man with 16 years of education and 2 years of experience who is from a western state. Use the results from column (4) of Table 8.1 and the method in Key Concept 8.1 to estimate the expected change in the logarithm of average hourly earnings (AHE) associated with an additional year of experience.

b. Repeat (a), assuming 10 years of experience.

c. Explain why the answers to (a) and (b) are different.


d. Is the difference in the answers to (a) and (b) statistically significant at the 5% level? Explain.

e. Would your answers to (a)-(d) change if the person was a woman? From the South? Explain.

f. How would you change the regression if you suspected that the effect of experience on earnings was different for men than for women?

8.5 Read the box "The Demand for Economics Journals" in Section 8.3.

a. The box reaches three conclusions. Looking at the results in the table, what is the basis for each of these conclusions?

b. Using the results in regression (4), the box reports that the elasticity of demand for an 80-year-old journal is -0.28.

i. How was this value determined from the estimated regression?

ii. The box reports that the standard error for the estimated elasticity is 0.06. How would you calculate this standard error? (Hint: See the discussion "Standard errors of estimated effects" below Key Concept 8.1.)

c. Suppose that the variable Characters had been divided by 1,000 instead of 1,000,000. How would the results in column (4) change?

8.6 Refer to Table 8.3.

a. A researcher suspects that %Eligible for subsidized lunch has a nonlinear effect on test scores. In particular, he conjectures that increases in this variable from 10% to 20% have little effect on test scores, but that changes from 50% to 60% have a much larger effect.

i. Describe a nonlinear specification that can be used to model this form of nonlinearity.

ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?

b. A researcher suspects that the effect of income on test scores is different in districts with small classes than in districts with large classes.

i. Describe a nonlinear specification that can be used to model this form of nonlinearity.

ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?



8.7 This problem is inspired by a study of the "gender gap" in earnings in top corporate jobs [Bertrand and Hallock (2001)]. The study compares total compensation among top executives in a large set of U.S. public corporations in the 1990s. (Each year these publicly traded corporations must report total compensation levels for their top five executives.)

a. Let Female be an indicator variable that is equal to 1 for females and 0 for males. A regression of the logarithm of earnings onto Female yields

    ln(Earnings) = 6.48 − 0.44 Female,  SER = 2.65.
                  (0.01)  (0.05)

i. The estimated coefficient on Female is -0.44. Explain what this value means.

ii. The SER is 2.65. Explain what this value means.

iii. Does this regression suggest that female top executives earn less than top male executives? Explain.

iv. Does this regression suggest that there is gender discrimination? Explain.

b. Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:

    ln(Earnings) = 3.86 − 0.28 Female + 0.37 ln(MarketValue) + 0.004 Return,
                  (0.03)  (0.04)       (0.004)                (0.003)

    n = 46,670, R-bar^2 = 0.345.

i. The coefficient on ln(MarketValue) is 0.37. Explain what this value means.

ii. The coefficient on Female is now -0.28. Explain why it has changed from the regression in (a).

c. Are large firms more likely to have female top executives than small firms? Explain.
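A caution relevant to interpreting coefficients like those in this exercise: in a log-linear regression, the coefficient on a binary regressor is only approximately a percentage difference, and the approximation degrades for large coefficients. A minimal sketch of the exact conversion (the helper name is ours):

```python
import math

# In a regression of ln(Earnings) on a dummy, a coefficient b implies an
# exact proportional difference of exp(b) - 1; 100*b is only the
# small-change approximation.

def exact_pct_difference(b):
    """Exact percentage difference implied by coefficient b on a dummy."""
    return 100 * (math.exp(b) - 1)

b_female = -0.44   # the coefficient reported in part (a)
print(round(exact_pct_difference(b_female), 1))
```

The exact figure (about -35.6%) is noticeably smaller in magnitude than the naive "-44%" reading, precisely because -0.44 is not a small change in logs.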

8.8 X is a continuous variable that takes on values between 5 and 100. Z is a binary variable. Sketch the following regression functions (with values of X between 5 and 100 on the horizontal axis and values of Y on the vertical axis):


a. Y = 2.0 + 3.0X

b. Y = 2.0 + 3.0 ln(X)

c. i. Y = 2.0 + 3.0 ln(X) + 4.0Z, with Z = 1.
   ii. Same as (i), but with Z = 0.

d. i. Y = 2.0 + 3.0 ln(X) + 4.0Z − 1.0 × Z × ln(X), with Z = 1.
   ii. Same as (i), but with Z = 0.

e. Y = 1.0 + 125.0X − 0.01X^2.

8.9 Explain how you would use "Approach #2" of Section 7.3 to calculate the confidence interval discussed below Equation (8.8). [Hint: This requires estimating a new regression using a different definition of the regressors and the dependent variable. See Exercise 7.9.]

8.10 Consider the regression model Y_i = β0 + β1 X1_i + β2 X2_i + β3 (X1_i × X2_i) + u_i. Use Key Concept 8.1 to show:

a. ΔY/ΔX1 = β1 + β3 X2 (effect of change in X1, holding X2 constant).

b. ΔY/ΔX2 = β2 + β3 X1 (effect of change in X2, holding X1 constant).

c. If X1 changes by ΔX1 and X2 changes by ΔX2, then ΔY = (β1 + β3 X2)ΔX1 + (β2 + β3 X1)ΔX2 + β3 ΔX1 ΔX2.
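The identities in Exercise 8.10 can be spot-checked numerically; in the sketch below, every coefficient and change is an arbitrary made-up value, and each comparison holds exactly up to floating-point error:

```python
# Numeric check of the interaction-model identities in Exercise 8.10
# using arbitrary illustrative values.

b0, b1, b2, b3 = 1.0, 2.0, 3.0, 0.5

def y(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1, x2 = 4.0, 7.0
dx1, dx2 = 0.3, -0.2

# (a) and (b): partial effects holding the other regressor fixed
assert abs((y(x1 + dx1, x2) - y(x1, x2)) - (b1 + b3 * x2) * dx1) < 1e-12
assert abs((y(x1, x2 + dx2) - y(x1, x2)) - (b2 + b3 * x1) * dx2) < 1e-12

# (c): both regressors change, including the cross term b3*dx1*dx2
dy = y(x1 + dx1, x2 + dx2) - y(x1, x2)
formula = (b1 + b3 * x2) * dx1 + (b2 + b3 * x1) * dx2 + b3 * dx1 * dx2
print(abs(dy - formula) < 1e-12)
```

A numeric check is of course no substitute for the algebra the exercise asks for, but it is a quick way to confirm that the cross term β3 ΔX1 ΔX2 really is needed in part (c).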


Empirical Exercises


E8.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.

a. Run a regression of average hourly earnings (AHE) on age (Age), gender (Female), and education (Bachelor). If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

b. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female, and Bachelor. If Age increases from 25 to 26, how



are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?
d. Run a regression of the logarithm of average hourly earnings, ln(AHE), on Age, Age^2, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?

e. Do you prefer the regression in (c) to the regression in (b)? Explain.

f. Do you prefer the regression in (d) to the regression in (b)? Explain.

g. Do you prefer the regression in (d) to the regression in (c)? Explain.

h. Plot the regression relation between Age and ln(AHE) from (b), (c), and (d) for males with a high school diploma. Describe the similarities and differences between the estimated regression functions. Would your answer change if you plotted the regression function for females with college degrees?

i. Run a regression of ln(AHE) on Age, Age^2, Female, Bachelor, and the interaction term Female × Bachelor. What does the coefficient on the interaction term measure? Alexis is a 30-year-old female with a bachelor's degree. What does the regression predict for her value of ln(AHE)? Jane is a 30-year-old female with a high school degree. What does the regression predict for her value of ln(AHE)? What is the predicted difference between Alexis's and Jane's earnings? Bob is a 30-year-old male with a bachelor's degree. What does the regression predict for his value of ln(AHE)? Jim is a 30-year-old male with a high school degree. What does the regression predict for his value of ln(AHE)? What is the predicted difference between Bob's and Jim's earnings?

j. Is the effect of Age on earnings different for males than for females? Specify and estimate a regression that you can use to answer this question.

k. Is the effect of Age on earnings different for high school graduates than college graduates? Specify and estimate a regression that you can use to answer this question.

l. After running all of these regressions (and any others that you want to run), summarize the effect of age on earnings for young workers.
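For readers without the CPS04 file at hand, the kind of regression asked for in parts (b) and (i) can be sketched on simulated data. Everything below (the "true" coefficients, the sample design, and the small OLS solver) is made up for illustration; real work would use a statistics package:

```python
import math
import random

# A self-contained sketch: ln(earnings) regressed on Age, Female,
# Bachelor, and the interaction Female*Bachelor, estimated by OLS via
# the normal equations (X'X)b = X'y.

def ols(X, y):
    """Solve the normal equations by Gaussian elimination; X is a list of rows."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for p in range(k):                        # forward elimination
        for q in range(p + 1, k):
            f = A[q][p] / A[p][p]
            for r in range(p, k):
                A[q][r] -= f * A[p][r]
            b[q] -= f * b[p]
    coef = [0.0] * k
    for p in reversed(range(k)):              # back substitution
        coef[p] = (b[p] - sum(A[p][r] * coef[r]
                              for r in range(p + 1, k))) / A[p][p]
    return coef

random.seed(0)
rows, y = [], []
for _ in range(2000):
    age = random.uniform(25, 34)
    female = random.randint(0, 1)
    bachelor = random.randint(0, 1)
    # The coefficients below are invented for the simulation:
    ln_ahe = (1.5 + 0.02 * age - 0.2 * female + 0.4 * bachelor
              + 0.1 * female * bachelor + random.gauss(0, 0.3))
    rows.append([1.0, age, female, bachelor, female * bachelor])
    y.append(ln_ahe)

coef = ols(rows, y)
print([round(c, 2) for c in coef])
```

With 2,000 simulated observations the OLS estimates land close to the invented coefficients, and the estimated interaction coefficient answers the question posed in part (i): it is the extra log-earnings premium for holding a bachelor's degree when the worker is female.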


E8.2 Using the data set TeachingRatings described in Empirical Exercise 4.2, carry out the following exercises.

a. Estimate a regression of Course_Eval on Beauty, Intro, OneCredit, Female, Minority, and NNEnglish.

b. Add Age and Age^2 to the regression. Is there evidence that Age has a nonlinear effect on Course_Eval? Is there evidence that Age has any effect on Course_Eval?

c. Modify the regression in (a) so that the effect of Beauty on Course_Eval is different for men and women. Is the male-female difference in the effect of Beauty statistically significant?

d. Professor Smith is a man. He has cosmetic surgery that increases his beauty index from one standard deviation below the average to one standard deviation above the average. What is his value of Beauty before the surgery? After the surgery? Using the regression in (c), construct a 95% confidence interval for the increase in his course evaluation.

e. Repeat (d) for Professor Jones, who is a woman.

E8.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.

a. Run a regression of ED on Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (that is, from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (that is, from 60 to 70 miles), how are years of education expected to change?

b. Run a regression of ln(ED) on Dist, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?

c. Run a regression of ED on Dist, Dist^2, Female, Bytest, Tuition, Black, Hispanic, Incomehi, Ownhome, DadColl, MomColl, Cue80, and Stwmfg80. If Dist increases from 2 to 3 (from 20 to 30 miles), how are years of education expected to change? If Dist increases from 6 to 7 (from 60 to 70 miles), how are years of education expected to change?

d. Do you prefer the regression in (c) to the regression in (a)? Explain.

e. Consider a Hispanic female with Tuition = $950, Bytest = 58, Incomehi = 0, Ownhome = 0, DadColl = 1, MomColl = 1, Cue80 = 7.1, and Stwmfg = $10.06.



i. Plot tbe regression relation bt:twecn Diw and fJ) rrom (a) and ( c)
for Dist in the range of 0 \o 10 (I rom 0 to 100 mtk~ ). Dc:.cribe the
similanttes and differences bet\\t:cn the esumah:d rcgrc 'ton run~
tiOrll>. Would your anl>\\l!r change i[ you ploth:d tb-.. rcgrcl>:ilOn tu n~
tion for a white male with the same char.tch:.ri!-ti~?
ii. How does the regression function (q bl.!h.n~.- for Dt"it > 10'! llo''
many observation are there with Dist > 10?
f. Add the interacrion term DodCol/ X MomCo/l to the: rcgrc-.~ion in
(c). What docs the coc(fictent on the interaction term measure?

g. Mary, Jane, Alexis, and Bonnie have the same values of Dtst, Byte.\/
Tuition, Female, Black, Hispanic. Fincome, Ownhome, Cttt>80 and
Snvmfg80. l'cithcr of Mary's parents allcndcd college. Jancs fathc:r
anended college, but her mother did not. Alexis's mother attended
college, but h ~r father did not. Both of Bonnie's parents attended college. Using Lhe regressions (rom (I):
i. Wbat docs the regression pred ict tor the difference between Jath. s

and Marys years of eclucalion?

u. What doc!s the regression predict fo r the difference betwc~n


AJexis's and Mary's yeaJS of education?
iij. What does the regression predic1 for the difference between B

nie's and Mary's years of educat10n?


lt. Is there an) ~vtdence thmtbe effect of Di.H on ED dcpcndc; on the

family's income?
' A her running all of these regressions (and any others tllat you want iO

run), summarize the effect of Dist on ycnrs of educatton.


E8.4 Using the data set Growth described in Empirical Exercise 4.4, excluding the data for Malta, run the following five regressions: Growth on (1) TradeShare and YearsSchool; (2) TradeShare and ln(YearsSchool); (3) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60); (4) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, ln(RGDP60), and TradeShare × ln(YearsSchool); and (5) TradeShare, TradeShare^2, TradeShare^3, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60).

a. Construct a scatterplot of Growth on YearsSchool. Does the relationship look linear or nonlinear? Explain. Use the plot to explain why regression (2) fits better than regression (1).


b. In 1960, a country contemplates an education policy that will increase average years of schooling from 4 years to 6 years. Use regression (1) to predict the increase in Growth. Use regression (2) to predict the increase in Growth.

c. Test whether the coefficients on Assassinations and Rev_Coups are equal to zero using regression (3).

d. Using regression (4), is there evidence that the effect of TradeShare on Growth depends on the level of education in the country?

e. Using regression (5), is there evidence of a nonlinear relationship between TradeShare and Growth?

f. In 1960, a country contemplates a trade policy that will increase the average value of TradeShare from 0.5 to 1. Use regression (3) to predict the increase in Growth. Use regression (5) to predict the increase in Growth.

APPENDIX 8.1

Regression Functions That Are Nonlinear in the Parameters

The nonlinear regression functions considered in Sections 8.2 and 8.3 are nonlinear functions of the X's but are linear functions of the unknown parameters. Because they are linear in the unknown parameters, those parameters can be estimated by OLS after defining new regressors that are nonlinear transformations of the original X's. This family of nonlinear regression functions is both rich and convenient to use. In some applications, however, economic reasoning leads to regression functions that are not linear in the parameters. Although such regression functions cannot be estimated by OLS, they can be estimated using an extension of OLS called nonlinear least squares.

Functions That Are Nonlinear in the Parameters

We begin with two examples of functions that are nonlinear in the parameters. We then provide a general formulation.

Logistic curve. Suppose you are studying the market penetration of a technology, for example, the adoption of database management software in different industries. The dependent variable is the fraction of firms in the industry that have adopted the software, a single independent variable X describes an industry characteristic, and you have data on n industries. The dependent variable is between 0 (no adopters) and 1 (100% adoption). Because a linear regression model could produce predicted values less than 0 or greater than 1, it makes sense to use instead a function that produces predicted values between 0 and 1. The logistic function smoothly increases from a minimum of 0 to a maximum of 1. The logistic regression model with a single X is

    Y_i = 1 / (1 + e^(-(β0 + β1 X_i))) + u_i.    (8.38)

The logistic function with a single X is graphed in Figure 8.12a. As can be seen in the graph, the logistic function has an elongated "S" shape. For small values of X, the value of the function is nearly 0 and the slope is flat; the curve is steeper for moderate values of X; and for large values of X, the function approaches 1 and the slope is flat again.
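A minimal numeric sketch of the logistic regression function in Equation (8.38), with arbitrary illustrative coefficients, shows the three features just described:

```python
import math

# The logistic regression function of Equation (8.38) for a single X.
# The coefficient values b0 and b1 below are arbitrary illustrations.

def logistic(x, b0=-4.0, b1=0.1):
    """Predicted adoption share: 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Predicted values always lie strictly between 0 and 1; the slope is flat
# in both tails and steepest where b0 + b1*x = 0 (here, at x = 40):
values = [logistic(x) for x in (0, 20, 40, 60, 80)]
print([round(v, 3) for v in values])
```

Evaluating the function across its range confirms the elongated "S" shape: nearly 0 for small X, 0.5 at the midpoint, and approaching 1 for large X.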
Negative exponential growth. The functions used in Section 8.2 to model the relation between test scores and income have some deficiencies. For example, the polynomial models can produce a negative slope for some values of income, which is implausible. The logarithmic specification has a positive slope for all values of income; however, as income increases, the predicted values increase without bound, so for some incomes the predicted value for a district will exceed the maximum possible score on the test.

The negative exponential growth model provides a nonlinear specification that has a positive slope for all values of income, has a slope that is greatest at low values of income and decreases as income rises, and has an upper bound (that is, an asymptote as income increases to infinity). The negative exponential growth regression model is

    Y_i = β0[1 − e^(−β1(X_i − β2))] + u_i.    (8.39)

The negative exponential growth function is graphed in Figure 8.12b. The slope is steep for low values of X, but as X increases it reaches an asymptote of β0.

General functions that are nonlinear in the parameters. The logistic and negative exponential growth regression models are special cases of the general nonlinear regression model

    Y_i = f(X_1i, ..., X_ki; β0, ..., βm) + u_i,    (8.40)

in which there are k independent variables and m + 1 parameters, β0, ..., βm. In the models of Sections 8.2 and 8.3, the X's entered this function nonlinearly, but the parameters entered linearly. In the examples of this appendix, the parameters enter nonlinearly as well.

FIGURE 8.12  Two Functions That Are Nonlinear in Their Parameters

(a) A logistic curve    (b) A negative exponential growth curve

Part (a) plots the logistic function of Equation (8.38), which has predicted values that lie between 0 and 1. Part (b) plots the negative exponential growth function of Equation (8.39), which has a slope that is always positive and decreases as X increases, and an asymptote at β0 as X tends to infinity.

If the parameters are known, then predicted effects may be computed using the method described in Section 8.1. In applications, however, the parameters are unknown and must be estimated from the data. Parameters that enter nonlinearly cannot be estimated by OLS, but they can be estimated by nonlinear least squares.

Nonlinear Least Squares Estimation

Nonlinear least squares is a general method for estimating the unknown parameters of a regression function when those parameters enter the population regression function nonlinearly.

Recall the discussion in Section 5.3 of the OLS estimator of the coefficients of the linear multiple regression model. The OLS estimator minimizes the sum of squared prediction mistakes in Equation (5.8),

    Σ_{i=1}^{n} [Y_i − (b0 + b1 X_1i + ... + bk X_ki)]^2.

In principle, the OLS estimator can be computed by checking many trial values of b0, ..., bk and settling on the values that minimize the sum of squared mistakes.

The same approach can be used to estimate the parameters of the general nonlinear regression model in Equation (8.40). Because the regression model is nonlinear in the coefficients, this method is called nonlinear least squares. For a set of trial parameter values b0, b1, ..., bm, construct the sum of squared prediction mistakes:

    Σ_{i=1}^{n} [Y_i − f(X_1i, ..., X_ki; b0, ..., bm)]^2.    (8.41)


The nonlinear least squares estimators of β0, β1, ..., βm are the values of b0, b1, ..., bm that minimize the sum of squared prediction mistakes in Equation (8.41).

In linear regression, a relatively simple formula expresses the OLS estimator as a function of the data. Unfortunately, no such general formula exists for nonlinear least squares, so the nonlinear least squares estimator must be found numerically using a computer. Regression software incorporates algorithms for solving the nonlinear least squares minimization problem, which simplifies the task of computing the nonlinear least squares estimator in practice.

Under general conditions on the function f and the X's, the nonlinear least squares estimator shares two key properties with the OLS estimator in the linear regression model: It is consistent, and it is normally distributed in large samples. In regression software that supports nonlinear least squares estimation, the output typically reports standard errors for the estimated parameters. As a consequence, inference concerning the parameters can proceed as usual; in particular, t-statistics can be constructed using the general approach in Key Concept 5.1, and a 95% confidence interval can be constructed as the estimated coefficient, plus or minus 1.96 standard errors. Just as in linear regression, the error term in the nonlinear regression model can be heteroskedastic, so heteroskedasticity-robust standard errors should be used.
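The "many trial values" idea described above can be sketched directly. The toy example below simulates data from the negative exponential growth model of Equation (8.39) and recovers β1 by scanning a grid of trial values; for simplicity it holds β0 and β2 at their true values, and of course real software uses far better minimization algorithms than a grid search:

```python
import math
import random

# Toy nonlinear least squares: generate data from
#   Y = b0 * (1 - exp(-b1 * (X - b2))) + u
# and pick the trial value of b1 that minimizes the sum of squared
# prediction mistakes, Equation (8.41).  All values are made up.

random.seed(1)
b0_true, b1_true, b2_true = 700.0, 0.05, -30.0
x = [random.uniform(5, 55) for _ in range(200)]
y = [b0_true * (1 - math.exp(-b1_true * (xi - b2_true))) + random.gauss(0, 5)
     for xi in x]

def ssr(b1):
    """Sum of squared prediction mistakes as the trial value of b1 varies."""
    return sum((yi - b0_true * (1 - math.exp(-b1 * (xi - b2_true)))) ** 2
               for xi, yi in zip(x, y))

trials = [i / 1000 for i in range(10, 101)]   # trial values of b1 in [0.01, 0.10]
b1_hat = min(trials, key=ssr)
print(b1_hat)
```

The grid minimizer lands at (or next to) the true value 0.05, illustrating why NLS must be computed numerically: there is no closed-form expression for the minimizing b1.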

Application to the Test Score-Income Relation

A negative exponential growth model, fit to district income (X) and test scores (Y), has the desirable features of a slope that is always positive [if β1 in Equation (8.39) is positive] and an asymptote of β0 as income increases to infinity. The result of estimating β0, β1, and β2 in Equation (8.39) using the California test score data yields β0-hat = 703.2 (heteroskedasticity-robust standard error = 4.44), β1-hat = 0.0552 (SE = 0.0068), and β2-hat = -34.0 (SE = 4.48). Thus the estimated nonlinear regression function (with standard errors reported below the parameter estimates) is

    TestScore-hat = 703.2[1 − e^(−0.0552(Income + 34.0))].    (8.42)
                   (4.44)       (0.0068)        (4.48)

This estimated regression function is plotted in Figure 8.13, along with the logarithmic regression function and a scatterplot of the data. The two specifications are, in this case, quite similar. One difference is that the negative exponential growth curve flattens out at the highest levels of income, consistent with having an asymptote.

FIGURE 8.13  The Negative Exponential Growth and Linear-Log Regression Functions

The negative exponential growth regression function [Equation (8.42)] and the linear-log regression function [Equation (8.18)] both capture the nonlinear relation between test scores and district income. One difference between the two functions is that the negative exponential growth model has an asymptote as income increases to infinity, but the linear-log regression function does not.

[Scatterplot of test score against district income, with both estimated regression functions drawn through the data.]

CHAPTER 9  Assessing Studies Based on Multiple Regression

The preceding five chapters explain how to use multiple regression to analyze the relationship among variables in a data set. In this chapter, we step back and ask, What makes a study that uses multiple regression reliable or unreliable? We focus on statistical studies that have the objective of estimating the causal effect of a change in some independent variable, such as class size, on a dependent variable, such as test scores. For such studies, when will multiple regression provide a useful estimate of the causal effect and, just as importantly, when will it fail to do so?

To answer this question, this chapter presents a framework for assessing statistical studies in general, whether or not they use regression analysis. This framework relies on the concepts of internal and external validity. A study is internally valid if its statistical inferences about causal effects are valid for the population and setting studied; it is externally valid if its inferences can be generalized to other populations and settings. In Sections 9.1 and 9.2, we discuss internal and external validity, list a variety of possible threats to internal and external validity, and discuss how to identify those threats in practice. The discussion in Sections 9.1 and 9.2 focuses on the estimation of causal effects from observational data. Section 9.3 discusses a different use of regression models, forecasting, and provides an introduction to the threats to the validity of forecasts made using regression models.

As an illustration of the framework of internal and external validity, in Section 9.4 we assess the internal and external validity of the study of the effect on test scores of cutting the student-teacher ratio presented in Chapters 4 through 8.

312

KEY CONCEPT 9.1
INTERNAL AND EXTERNAL VALIDITY

A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied. The analysis is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.

9.1 Internal and External Validity


The concepts of internal and external validity, defined in Key Concept 9.1, provide a framework for evaluating whether a statistical or econometric study is useful for answering a specific question of interest.

Internal and external validity distinguish between the population and setting studied and the population and setting to which the results are generalized. The population studied is the population of entities (people, companies, school districts, and so forth) from which the sample was drawn. The population to which the results are generalized, or the population of interest, is the population of entities to which the causal inferences from the study are to be applied. For example, a high school (grades 9-12) principal might want to generalize our findings on class sizes and test scores in California elementary school districts (the population studied) to the population of high schools (the population of interest).

By "setting," we mean the institutional, legal, social, and economic environment. For example, it would be important to know whether the findings of a laboratory experiment assessing methods for growing organic tomatoes could be generalized to the field, that is, whether the organic methods that work in the setting of a laboratory also work in the setting of the real world. We provide other examples of differences in populations and settings later in this section.

Threats to Internal Validity


Internal validity has two components. First, the estimator of the causal effect should be unbiased and consistent. For example, if β̂_STR is the OLS estimator of the effect on test scores of a unit change in the student-teacher ratio in a certain regression, then β̂_STR should be an unbiased and consistent estimator of the true population causal effect of a change in the student-teacher ratio, β_STR.



Second, hypothesis tests should have the desired significance level (the actual rejection rate of the test under the null hypothesis should equal its desired significance level), and confidence intervals should have the desired confidence level. For example, if a confidence interval is constructed as β̂_STR ± 1.96 SE(β̂_STR), this confidence interval should contain the true population causal effect, β_STR, with probability 95% over repeated samples.

In regression analysis, causal effects are estimated using the estimated regression function, and hypothesis tests are performed using the estimated regression coefficients and their standard errors. Accordingly, in a study based on OLS regression, the requirements for internal validity are that the OLS estimator is unbiased and consistent, and that standard errors are computed in a way that makes confidence intervals have the desired confidence level. There are various reasons this might not happen, and these reasons constitute threats to internal validity. These threats lead to failures of one or more of the least squares assumptions in Key Concept 6.4. For example, one threat that we have discussed at length is omitted variable bias: it leads to correlation between one or more regressors and the error term, which violates the first least squares assumption. If data on the omitted variable are available, then this threat can be avoided by including that variable as an additional regressor.

Section 9.2 provides a detailed discussion of the various threats to internal validity in multiple regression analysis and suggests how to mitigate them.
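Nothing above pins down a particular data-generating process, so the sketch below invents one (the coefficient values, sample size, and error distribution are all assumptions) simply to show how the confidence-level requirement can be checked by simulation: when the least squares assumptions hold, the interval β̂ ± 1.96 SE(β̂) should cover the true slope in roughly 95% of repeated samples.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 5.0, -2.0   # true population coefficients (invented for illustration)
n, reps = 100, 2000
covered = 0

for _ in range(reps):
    x = rng.normal(20.0, 2.0, n)
    y = beta0 + beta1 * x + rng.normal(0.0, 10.0, n)
    # OLS slope and its homoskedasticity-only standard error
    xd, yd = x - x.mean(), y - y.mean()
    b1 = (xd @ yd) / (xd @ xd)
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    se_b1 = np.sqrt((resid @ resid) / (n - 2) / (xd @ xd))
    covered += abs(b1 - beta1) <= 1.96 * se_b1

coverage = covered / reps
print(f"coverage of the nominal 95% interval: {coverage:.3f}")
```

Swapping in a heteroskedastic error draw while keeping the homoskedasticity-only formula would push the coverage away from 95%, which is exactly the standard-error threat described above.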

Threats to External Validity


Potential threats to external validity arise from differences between the population and setting studied and the population and setting of interest.

Differences in populations. Differences between the population studied and the population of interest can pose a threat to external validity. For example, laboratory studies of the toxic effects of chemicals typically use animal populations like mice (the population studied), but the results are used to write health and safety regulations for human populations (the population of interest). Whether mice and men differ sufficiently to threaten the external validity of such studies is a matter of debate.

More generally, the true causal effect might not be the same in the population studied and the population of interest. This could be because the population was chosen in a way that makes it different from the population of interest, because of differences in characteristics of the populations, because of geographical differences, or because the study is out of date.


Differences in settings. Even if the population being studied and the population of interest are identical, it might not be possible to generalize the study results if the settings differ. For example, a study of the effect on college binge drinking of an antidrinking advertising campaign might not generalize to another identical group of college students if the legal penalties for drinking at the two colleges differ. In this case, the legal setting in which the study was conducted differs from the legal setting to which its results are applied.

More generally, examples of differences in settings include differences in the institutional environment (public universities versus religious universities), differences in laws (differences in legal penalties), or differences in the physical environment (tailgate-party binge drinking in southern California versus Fairbanks, Alaska).

Application to test scores and the student-teacher ratio. Chapters 7 and 8 reported statistically significant, but substantively small, estimated improvements in test scores resulting from reducing the student-teacher ratio. This analysis was based on test results for California school districts. Suppose for the moment that these results are internally valid. To what other populations and settings of interest could this finding be generalized?

The closer are the population and setting of the study to those of interest, the stronger is the case for external validity. For example, college students and college instruction are very different than elementary school students and instruction, so it is implausible that the effect of reducing class sizes estimated using the California elementary school district data would generalize to colleges. On the other hand, elementary school students, curriculum, and organization are broadly similar throughout the United States, so it is plausible that the California results might generalize to performance on standardized tests in other U.S. elementary school districts.

How to assess the external validity of a study. External validity must be judged using specific knowledge of the populations and settings studied and those of interest. Important differences between the two will cast doubt on the external validity of the study.

Sometimes there are two or more studies on different but related populations. If so, the external validity of both studies can be checked by comparing their results. For example, in Section 9.4 we analyze test score and class size data for elementary school districts in Massachusetts and compare the Massachusetts and California results. In general, similar findings in two or more studies bolster claims to


external validity, while differences in their findings that are not readily explained cast doubt on their external validity.

How to design an externally valid study. Because threats to external validity stem from a lack of comparability of populations and settings, these threats are best minimized at the early stages of a study, before the data are collected. Study design is beyond the scope of this textbook, and the interested reader is referred to Shadish, Cook, and Campbell (2002).

9.2 Threats to Internal Validity of Multiple Regression Analysis
Studies based on regression analysis are internally valid if the estimated regression coefficients are unbiased and consistent, and if their standard errors yield confidence intervals with the desired confidence level. This section surveys five reasons why the OLS estimator of the multiple regression coefficients might be biased, even in large samples: omitted variables, misspecification of the functional form of the regression function, imprecise measurement of the independent variables ("errors in variables"), sample selection, and simultaneous causality. All five sources of bias arise because the regressor is correlated with the error term in the population regression, violating the first least squares assumption in Key Concept 6.4. For each, we discuss what can be done to reduce this bias. The section concludes with a discussion of circumstances that lead to inconsistent standard errors and what can be done about it.

Omitted Variable Bias


Recall that omitted variable bias arises when a variable that both determines Y and is correlated with one or more of the included regressors is omitted from the regression. This bias persists even in large samples, so that the OLS estimator is inconsistent. How best to minimize omitted variable bias depends on whether or not data are available for the potential omitted variable.
¹A comparison of many related studies on the same topic is called a meta-analysis. The discussion in the box on the "Mozart effect" in Chapter 6 is based on a meta-analysis, for example. Performing a meta-analysis of many studies has its own challenges. How do you sort the good studies from the bad? How do you compare studies when the dependent variables differ? Should you put more weight on large studies than small studies? A discussion of meta-analysis and its challenges goes beyond the scope of this textbook. The interested reader is referred to Hedges and Olkin (1985) and Cooper and Hedges (1994).
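The mechanics of this threat can be reproduced in a few lines. The sketch below uses an invented data-generating process (all variable names and coefficient values are assumptions, loosely echoing the class-size example): an omitted determinant of the outcome that is correlated with the regressor biases the short regression, and including it removes the bias.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Invented DGP: scores depend on class size (true effect -1.0) and on
# district income, and richer districts also have smaller classes.
income = rng.normal(0.0, 1.0, n)
class_size = 20.0 - 2.0 * income + rng.normal(0.0, 1.0, n)
score = 650.0 - 1.0 * class_size + 5.0 * income + rng.normal(0.0, 5.0, n)

def first_slope(y, *cols):
    """Coefficient on the first regressor from OLS of y on the columns (plus intercept)."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_short = first_slope(score, class_size)          # income omitted
b_long = first_slope(score, class_size, income)   # income included

print(f"short regression (income omitted):  {b_short:.2f}")
print(f"long regression (income included):  {b_long:.2f}")
```

In this parameterization the omitted variable bias formula gives a probability limit for the short-regression slope of -1 + 5 × (-2/5) = -3, three times the true effect, which is what the simulation reproduces.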


Solutions to omitted variable bias when the omitted variable is observed. If you have data on the omitted variable, then you can include this variable in a multiple regression, thereby addressing the problem. However, adding a new variable has both costs and benefits. On the one hand, omitting the variable could result in omitted variable bias. On the other hand, including the variable when it does not belong (that is, when its population regression coefficient is zero) reduces the precision of the estimators of the other regression coefficients. In other words, the decision whether to include a variable involves a tradeoff between bias and variance of the coefficients of interest. In practice, there are four steps that can help you decide whether to include a variable or set of variables in a regression.

The first step is to identify the key coefficients of interest in your regression. In the test score regressions, this is the coefficient on the student-teacher ratio, because the question originally posed concerns the effect on test scores of reducing the student-teacher ratio.
The second step is to ask yourself: What are the most likely sources of important omitted variable bias in this regression? Answering this question requires applying economic theory and expert knowledge, and should occur before you actually run any regressions; because this is done before analyzing the data, this is referred to as a priori ("before the fact") reasoning. In the test score example, this step entails identifying those determinants of test scores that, if ignored, could bias our estimator of the class size effect. The result of this step is a base regression specification, the starting point for your empirical regression analysis, and a list of additional "questionable" variables that might help to mitigate possible omitted variable bias.

The third step is to augment your base specification with the additional questionable variables identified in the second step and to test the hypotheses that their coefficients are zero. If the coefficients on the additional variables are statistically significant, or if the estimated coefficients of interest change appreciably when the additional variables are included, then they should remain in the specification and you should modify your base specification. If not, then these variables can be excluded from the regression.

The fourth step is to present an accurate summary of your results in tabular form. This provides "full disclosure" to a potential skeptic, who can then draw his or her own conclusions. Tables 7.1 and 8.3 are examples of this strategy. For example, in Table 8.3, we could have presented only the regression in column (7), because that regression summarizes the relevant effects and nonlinearities in the other regressions in that table. Presenting the other regressions, however, permits the skeptical reader to draw his or her own conclusions.

These steps are summarized in Key Concept 9.2.


KEY CONCEPT 9.2
OMITTED VARIABLE BIAS: SHOULD I INCLUDE MORE VARIABLES IN MY REGRESSION?

If you include another variable in your multiple regression, you will eliminate the possibility of omitted variable bias from excluding that variable, but the variance of the estimator of the coefficients of interest can increase. Here are some guidelines to help you decide whether to include an additional variable:

1. Be specific about the coefficient or coefficients of interest.
2. Use a priori reasoning to identify the most important potential sources of omitted variable bias, leading to a base specification and some "questionable" variables.
3. Test whether additional questionable variables have nonzero coefficients.
4. Provide "full disclosure" representative tabulations of your results so that others can see the effect of including the questionable variables on the coefficient(s) of interest. Do your results change if you include a questionable variable?
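Steps 1 through 3 can be sketched on simulated data. Everything below, from the variable names to the coefficient values, is invented for illustration: estimate the base specification, augment it with a questionable variable, t-test that variable, and watch whether the coefficient of interest moves.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Invented district data: the questionable variable (share of English
# learners, say) raises the student-teacher ratio and lowers scores.
pct_el = rng.uniform(0.0, 1.0, n)
str_ratio = 18.0 + 4.0 * pct_el + rng.normal(0.0, 1.0, n)
score = 700.0 - 2.0 * str_ratio - 20.0 * pct_el + rng.normal(0.0, 10.0, n)

def ols(y, *cols):
    """OLS with an intercept: returns coefficient and standard-error arrays."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = (resid @ resid) / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, se

b_base, _ = ols(score, str_ratio)              # step 1: coefficient of interest
b_aug, se_aug = ols(score, str_ratio, pct_el)  # step 3: augmented specification
t_el = b_aug[2] / se_aug[2]                    # t-statistic on the questionable variable

print(f"STR coefficient, base specification:      {b_base[1]:.2f}")
print(f"STR coefficient, augmented specification: {b_aug[1]:.2f}")
print(f"t-statistic on pct_el:                    {t_el:.1f}")
```

Both checks in step 3 fire here: the t-statistic is far from zero and the coefficient of interest moves appreciably, so the questionable variable stays in the specification.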

Solutions to omitted variable bias when the omitted variable is not observed. Adding an omitted variable to a regression is not an option if you do not have data on that variable. Still, there are three other ways to solve omitted variable bias. Each of these three solutions circumvents omitted variable bias through the use of different types of data.

The first solution is to use data in which the same observational unit is observed at different points in time. For example, test score and related data might be collected for the same districts in 1995, then again in 2000. Data in this form are called panel data. As explained in Chapter 10, panel data make it possible to control for unobserved omitted variables as long as those omitted variables do not change over time.

The second solution is to use instrumental variables regression. This method relies on a new variable, called an instrumental variable. Instrumental variables regression is discussed in Chapter 12.

The third solution is to use a study design in which the effect of interest (for example, the effect of reducing class size on student achievement) is studied using a randomized controlled experiment. Randomized controlled experiments are discussed in Chapter 13.


KEY CONCEPT 9.3
FUNCTIONAL FORM MISSPECIFICATION

Functional form misspecification arises when the functional form of the estimated regression function differs from the functional form of the population regression function. If the functional form is misspecified, then the estimator of the partial effect of a change in one of the variables will, in general, be biased. Functional form misspecification often can be detected by plotting the data and the estimated regression function, and it can be corrected by using a different functional form.

Misspecification of the Functional Form of the Regression Function

If the true population regression function is nonlinear but the estimated regression is linear, then this functional form misspecification makes the OLS estimator biased. This bias is a type of omitted variable bias, in which the omitted variables are the terms that reflect the missing nonlinear aspects of the regression function. For example, if the population regression function is a quadratic polynomial, then a regression that omits the square of the independent variable would suffer from omitted variable bias. Bias arising from functional form misspecification is summarized in Key Concept 9.3.

Solutions to functional form misspecification. When the dependent variable is continuous (like test scores), this problem of potential nonlinearity can be solved using the methods of Chapter 8. If, however, the dependent variable is discrete or binary (for example, Y_i equals 1 if the ith person attended college and equals 0 otherwise), things are more complicated. Regression with a discrete dependent variable is discussed in Chapter 11.

Errors-in-Variables

Suppose that in our regression of test scores against the student-teacher ratio we had inadvertently mixed up our data, so that we ended up regressing test scores for fifth graders on the student-teacher ratio for tenth graders in that district. Although the student-teacher ratio for elementary school students and tenth graders might be correlated, they are not the same, so this mix-up would lead to bias in the estimated coefficient. This is an example of errors-in-variables bias



because its source is an error in the measurement of the independent variable. This bias persists even in very large samples, so that the OLS estimator is inconsistent if there is measurement error.

There are many possible sources of measurement error. If the data are collected through a survey, a respondent might give the wrong answer. For example, one question in the Current Population Survey involves last year's earnings. A respondent might not know his exact earnings, or he might misstate it for some other reason. If instead the data are obtained from computerized administrative records, there might have been typographical errors when the data were first entered.
To see that errors-in-variables results in correlation between the regressor and the error term, suppose there is a single regressor X_i (say, actual income) but that X_i is measured imprecisely by X̃_i (the respondent's estimate of income). Because X̃_i, not X_i, is observed, the regression equation actually estimated is the one based on X̃_i. Written in terms of the imprecisely measured variable X̃_i, the population regression equation Y_i = β_0 + β_1 X_i + u_i is

Y_i = β_0 + β_1 X̃_i + [β_1(X_i − X̃_i) + u_i]
    = β_0 + β_1 X̃_i + v_i,                                  (9.1)

where v_i = β_1(X_i − X̃_i) + u_i. Thus, the population regression equation written in terms of X̃_i has an error term that contains the difference between X_i and X̃_i. If this difference is correlated with the measured value X̃_i, then the regressor X̃_i will be correlated with the error term and β̂_1 will be biased and inconsistent.

The precise size and direction of the bias in β̂_1 depend on the correlation between X̃_i and (X_i − X̃_i). This correlation depends, in turn, on the specific nature of the measurement error.

As an example, suppose that the survey respondent provides her best estimate or recollection of the actual value of the independent variable X_i. A convenient way to represent this mathematically is to suppose that the measured value of X_i equals the actual, unmeasured value plus a purely random component, w_i. Accordingly, the measured value of the variable, denoted by X̃_i, is X̃_i = X_i + w_i. Because the error is purely random, we might suppose that w_i has mean zero and variance σ²_w and is uncorrelated with X_i and the regression error u_i. Under this assumption, a bit of algebra² shows that β̂_1 has the probability limit

β̂_1 →p [σ²_X / (σ²_X + σ²_w)] β_1.                          (9.2)



KEY CONCEPT 9.4
ERRORS-IN-VARIABLES BIAS

Errors-in-variables bias in the OLS estimator arises when an independent variable is measured imprecisely. This bias depends on the nature of the measurement error and persists even if the sample size is large. If the measured variable equals the actual value plus a mean-zero, independently distributed measurement error term, then the OLS estimator in a regression with a single right-hand variable is biased toward zero, and its probability limit is given in Equation (9.2).

That is, if the measurement imprecision has the effect of simply adding a random element to the actual value of the independent variable, then β̂_1 is inconsistent. Because the ratio σ²_X / (σ²_X + σ²_w) is less than 1, β̂_1 will be biased toward 0, even in large samples. In the extreme case that the measurement error is so large that essentially no information about X_i remains, the ratio of the variances in the final expression in Equation (9.2) is 0 and β̂_1 converges in probability to 0. In the other extreme, when there is no measurement error, σ²_w = 0, so β̂_1 →p β_1.

Although the result in Equation (9.2) is specific to this particular type of measurement error, it illustrates the more general proposition that if the independent variable is measured imprecisely, then the OLS estimator is biased, even in large samples. Errors-in-variables bias is summarized in Key Concept 9.4.
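Equation (9.2) is easy to verify by simulation. In the sketch below the variances and coefficients are invented: the regressor is observed with purely random noise, and the OLS slope settles at β_1 multiplied by the attenuation factor σ²_X / (σ²_X + σ²_w).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
beta0, beta1 = 1.0, 2.0
var_x, var_w = 4.0, 1.0   # invented variances of the true X and of the noise w

x = rng.normal(0.0, np.sqrt(var_x), n)            # true regressor
x_tilde = x + rng.normal(0.0, np.sqrt(var_w), n)  # mismeasured regressor
y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)

xd = x_tilde - x_tilde.mean()
b1_hat = (xd @ (y - y.mean())) / (xd @ xd)

attenuation = var_x / (var_x + var_w)   # the ratio in Equation (9.2)
print(f"OLS slope on the mismeasured regressor: {b1_hat:.3f}")
print(f"beta1 times the attenuation factor:     {beta1 * attenuation:.3f}")  # 1.600
```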

Solutions to errors-in-variables bias. The best way to solve the errors-in-variables problem is to get an accurate measure of X. If this is impossible, however, econometric methods can be used to mitigate errors-in-variables bias.

One such method is instrumental variables regression. It relies on having another variable (the "instrumental" variable) that is correlated with the actual value X_i but is uncorrelated with the measurement error. This method is studied in Chapter 12.

A second method is to develop a mathematical model of the measurement error and, if possible, to use the resulting formulas to adjust the estimates. For example, if a researcher believes that the measured variable is in fact the sum of the actual value and a random measurement error term, and if she knows or can estimate the ratio σ²_w/σ²_X, then she can use Equation (9.2) to compute an estimator of β_1 that corrects for the downward bias. Because this approach requires specialized knowledge about the nature of the measurement error, the details typically are specific to a given data set and its measurement problems, and we shall not pursue this approach further in this textbook.
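A minimal version of that correction, assuming the researcher somehow knows the ratio σ²_w/σ²_X (set to 0.25 by construction below; all numbers are invented): since Equation (9.2) implies plim β̂_1 = β_1 / (1 + σ²_w/σ²_X), multiplying the naive estimate by (1 + σ²_w/σ²_X) undoes the attenuation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta1 = 2.0
ratio = 0.25   # assumed-known var_w / var_x

x = rng.normal(0.0, 2.0, n)                # var_x = 4
x_tilde = x + rng.normal(0.0, 1.0, n)      # var_w = 1, so var_w/var_x = 0.25
y = 1.0 + beta1 * x + rng.normal(0.0, 1.0, n)

xd = x_tilde - x_tilde.mean()
b1_naive = (xd @ (y - y.mean())) / (xd @ xd)
# Undo the attenuation implied by Equation (9.2)
b1_corrected = b1_naive * (1.0 + ratio)

print(f"naive slope:     {b1_naive:.3f}")      # biased toward zero
print(f"corrected slope: {b1_corrected:.3f}")  # close to beta1 = 2
```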


Sample Selection
Sample selection bias occurs when the availability of the data is influenced by a selection process that is related to the value of the dependent variable. This selection process can introduce correlation between the error term and the regressors, which leads to bias in the OLS estimator.

Sample selection that is unrelated to the value of the dependent variable does not introduce bias. For example, if data are collected from a population by simple random sampling, the sampling method (being drawn at random from the population) has nothing to do with the value of the dependent variable. Such sampling does not introduce bias.

Bias can be introduced when the method of sampling is related to the value of the dependent variable. An example of sample selection bias in polling was given in a box in Chapter 3. In that example, the sample selection method (randomly selected phone numbers of automobile owners) was related to the dependent variable (who the individual supported for president in 1936), because in 1936 car owners with phones were more likely to be Republicans.

An example of sample selection in economics arises in using a regression of wages on education to estimate the effect on wages of an additional year of education. Only individuals who have a job have wages, by definition. The factors (observable and unobservable) that determine whether someone has a job (education, experience, where one lives, ability, luck, and so forth) are similar to the factors that determine how much that person earns when employed. Thus, the fact that someone has a job suggests that, all else equal, the error term in the wage equation for that person is positive. Said differently, whether someone has a job is in part determined by the omitted variables in the error term in the wage regression. Thus, the simple fact that someone has a job, and thus appears in the data set, provides information that the error term in the regression is positive, at least on average, and could be correlated with the regressors. This too can lead to bias in the OLS estimator.

Sample selection bias is summarized in Key Concept 9.5. The box "Do Stock Mutual Funds Outperform the Market?" provides an example of sample selection bias in financial economics.
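This mechanism, selection on the dependent variable, shows up clearly in a simulation. The sketch below (a stylized wage-style data-generating process, all numbers invented) compares a simple random subsample, which is harmless, with a sample that keeps only units whose outcome clears a threshold, which biases the slope.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

x = rng.normal(0.0, 1.0, n)                   # e.g. standardized years of education
y = 2.0 + 1.0 * x + rng.normal(0.0, 1.0, n)   # e.g. log wage; true slope is 1

def slope(yv, xv):
    xd = xv - xv.mean()
    return (xd @ (yv - yv.mean())) / (xd @ xd)

# Selection unrelated to y (simple random subsample): no bias
keep = rng.random(n) < 0.5
b_random = slope(y[keep], x[keep])

# Selection related to y (only units with y above its mean are observed): bias
sel = y > 2.0
b_selected = slope(y[sel], x[sel])

print(f"random-subsample slope:  {b_random:.3f}")    # near 1
print(f"selected-sample slope:   {b_selected:.3f}")  # pushed toward 0
```

Dropping low-y observations throws away exactly the draws with the most negative errors, so within the selected sample the error term is correlated with the regressor and the slope is attenuated.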

Solutions to selection bias. The methods we have discussed so far cannot eliminate sample selection bias. The methods for estimating models with sample selection are beyond the scope of this book. Those methods build on the techniques introduced in Chapter 11, where further references are provided.


KEY CONCEPT 9.5
SAMPLE SELECTION BIAS

Sample selection bias arises when a selection process influences the availability of data and that process is related to the dependent variable. Sample selection induces correlation between one or more regressors and the error term, leading to bias and inconsistency of the OLS estimator.

Do Stock Mutual Funds Outperform the Market?

Stock mutual funds are investment vehicles that hold a portfolio of stocks. By purchasing shares in a mutual fund, a small investor can hold a broadly diversified portfolio without the hassle and expense (transaction cost) of buying and selling shares in individual companies. Some mutual funds simply track the market (for example, by holding the stocks in the S&P 500), whereas others are actively managed by full-time professionals whose job is to make the fund earn a better return than the overall market and competitors' funds. But do these actively managed funds achieve this goal? Do some mutual funds consistently beat other funds and the market?

One way to answer these questions is to compare future returns on mutual funds that had high returns over the past year to future returns on other funds and on the market as a whole. In making such comparisons, financial economists know that it is important to select the sample of mutual funds carefully. This task is not as straightforward as it seems, however. Some databases include historical data on funds currently available for purchase, but this approach means that the dogs (the most poorly performing funds) are omitted from the data set because they went out of business or were merged into other funds. For this reason, a study using data on historical performance of currently available funds is subject to sample selection bias: The sample is selected based on the value of the dependent variable, returns, because funds with the lowest returns are eliminated. The mean return of all funds (including the defunct) over a ten-year period will be less than the mean return of those funds still in existence at the end of those ten years, so a study of only the latter funds will overstate performance. Financial economists refer to this selection bias as "survivorship bias" because only the better funds survive to be in the data set.

When financial econometricians correct for survivorship bias by incorporating data on defunct funds, the results do not paint a flattering portrait of mutual fund managers. Corrected for survivorship bias, the econometric evidence indicates that actively managed stock mutual funds do not outperform the market, on average, and past good performance does not predict future good performance. For further reading on mutual funds and survivorship bias, see Malkiel (2003, Chapter 11) and Carhart (1997).
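The survivorship-bias arithmetic in the box can be mimicked with invented numbers: give every fund the same true mean return (no skill anywhere), drop the funds that ever have a disastrous year, and the surviving funds' average return is overstated.

```python
import numpy as np

rng = np.random.default_rng(6)
n_funds, years = 5_000, 10

# Invented returns: every fund draws from the same distribution (no skill).
returns = rng.normal(0.06, 0.15, size=(n_funds, years))

# A fund "survives" in the database only if it never loses more than 20%
# in a year; the dogs are merged away or closed.
survived = (returns > -0.20).all(axis=1)

mean_all = returns.mean()
mean_survivors = returns[survived].mean()

print(f"share of funds surviving:     {survived.mean():.2f}")
print(f"mean return, all funds:       {mean_all:.3f}")
print(f"mean return, survivors only:  {mean_survivors:.3f}")
```

The survivors-only mean exceeds the all-funds mean even though no fund has any genuine edge, which is the selection-on-the-dependent-variable problem in miniature.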


Simultaneous Causality
So far. we have assumed that causaliry runs from the regre!osors to thl! tlcpend~.:nt
variable (X causes Y) . But whHt if causaliry also runs from the dependen t variable
to o ne or more regressors (Y causes X)? lf so. ca usality run<o bad;\\-artl .. a:.\\ 11
as forwa rd, lhat is, there is simultaneous causality. If rhere '" -,unullantou~ cau<~ (.
ity, an OLS regression picks up both effects so the OLS estimator j, hiased nnll
inconsistent.
For example, our study of test scores focused on the effect on test scores of reducing the student-teacher ratio, so that causality is presumed to run from the student-teacher ratio to test scores. Suppose, however, that a government initiative subsidized hiring teachers in school districts with poor test scores. If so, causality would run in both directions: For the usual educational reasons low student-teacher ratios would arguably lead to high test scores, but because of the government program low test scores would lead to low student-teacher ratios.

Simultaneous causality leads to correlation between the regressor and the error term. In the test score example, suppose there is an omitted factor that leads to poor test scores; because of the government program, this factor that produces low scores in turn results in a low student-teacher ratio. Thus, a negative error term in the population regression of test scores on the student-teacher ratio reduces test scores, but because of the government program it also leads to a decrease in the student-teacher ratio. In other words, the student-teacher ratio is positively correlated with the error term in the population regression. This in turn leads to simultaneous causality bias and inconsistency of the OLS estimator.

This correlation between the error term and the regressor can be made precise mathematically by introducing an additional equation that describes the reverse causal link. For convenience, consider just the two variables X and Y and ignore other possible regressors. Accordingly, there are two equations, one in which X causes Y, and one in which Y causes X:
Y_i = β0 + β1X_i + u_i and    (9.3)

X_i = γ0 + γ1Y_i + v_i.       (9.4)

Equation (9.3) is the familiar one in which β1 is the effect on Y of a change in X, where u represents other factors. Equation (9.4) represents the reverse causal effect of Y on X. In the test score problem, Equation (9.3) represents the educational effect of class size on test scores, while Equation (9.4) represents the reverse causal effect of test scores on class size induced by the government program.
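To see the resulting bias concretely, the two-equation system can be simulated. The sketch below is illustrative only: the parameter values (β1 = 1, γ1 = 0.5) and the standard normal errors are hypothetical choices, not taken from the text.

```python
import numpy as np

# Illustrative simulation of simultaneous causality bias.
# Equation (9.3): Y = beta0 + beta1*X + u; Equation (9.4): X = gamma0 + gamma1*Y + v.
# All parameter values are hypothetical choices for illustration.
rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 0.0, 1.0
gamma0, gamma1 = 0.0, 0.5
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# Solve the two equations jointly for the observed Y and X (reduced form).
Y = (beta0 + beta1 * gamma0 + beta1 * v + u) / (1 - beta1 * gamma1)
X = (gamma0 + gamma1 * beta0 + gamma1 * u + v) / (1 - beta1 * gamma1)

# X is correlated with u; the footnote's formula gives
# cov(X, u) = gamma1 * var(u) / (1 - gamma1*beta1) = 0.5/0.5 = 1.
cov_Xu = np.cov(X, u)[0, 1]

# The OLS slope is therefore inconsistent for beta1 = 1:
# plim = beta1 + cov(X, u)/var(X) = 1 + 1/5 = 1.2 here.
b1_ols = np.cov(X, Y)[0, 1] / np.var(X)
print(cov_Xu, b1_ols)   # near 1.0 (not 0) and near 1.2 (not 1.0)
```

The simulated slope picks up both the forward effect of X on Y and the reverse effect of Y on X, which is exactly the bias described above.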

9.2  Threats to Internal Validity of Multiple Regression Analysis

KEY CONCEPT 9.6
SIMULTANEOUS CAUSALITY BIAS

Simultaneous causality bias, also called simultaneous equations bias, arises in a regression of Y on X when, in addition to the causal link of interest from X to Y, there is a causal link from Y to X. This reverse causality makes X correlated with the error term in the population regression of interest.

Simultaneous causality leads to correlation between X_i and the error term u_i in Equation (9.3). To see this, imagine that u_i is negative, which decreases Y_i. However, this lower value of Y_i affects the value of X_i through the second of these equations, and if γ1 is positive, a low value of Y_i will lead to a low value of X_i. Thus, if γ1 is positive, X_i and u_i will be positively correlated.3

Because this can be expressed mathematically using two simultaneous equations, the simultaneous causality bias is sometimes called simultaneous equations bias. Simultaneous causality bias is summarized in Key Concept 9.6.

Solutions to simultaneous causality bias. There are two ways to mitigate simultaneous causality bias. One is to use instrumental variables regression, the topic of Chapter 12. The second is to design and to implement a randomized controlled experiment in which the reverse causality channel is nullified, and such experiments are discussed in Chapter 13.

Sources of Inconsistency of OLS Standard Errors

Inconsistent standard errors pose a different threat to internal validity. Even if the OLS estimator is consistent and the sample is large, inconsistent standard errors will produce hypothesis tests with size that differs from the desired significance level and "95%" confidence intervals that fail to include the true value in 95% of repeated samples.

3To show this mathematically, note that Equation (9.4) implies that cov(X_i, u_i) = cov(γ0 + γ1Y_i + v_i, u_i) = γ1cov(Y_i, u_i) + cov(v_i, u_i). Assuming that cov(v_i, u_i) = 0, by Equation (9.3) this in turn implies that cov(X_i, u_i) = γ1cov(Y_i, u_i) = γ1cov(β0 + β1X_i + u_i, u_i) = γ1β1cov(X_i, u_i) + γ1σ_u^2. Solving for cov(X_i, u_i) then yields the result cov(X_i, u_i) = γ1σ_u^2/(1 − γ1β1).


There are two main reasons for inconsistent standard errors: improperly handled heteroskedasticity and correlation of the error term across observations.

Heteroskedasticity. As discussed in Section 5.4, for historical reasons many regression software programs report homoskedasticity-only standard errors. If, however, the regression error is heteroskedastic, those standard errors are not a reliable basis for hypothesis tests and confidence intervals. The solution to this problem is to use heteroskedasticity-robust standard errors and to construct F-statistics using a heteroskedasticity-robust variance estimator. Heteroskedasticity-robust standard errors are provided as an option in modern software packages.

Correlation of the error term across observations. In some settings, the population regression error can be correlated across observations. This will not happen if the data are obtained by sampling at random from the population, because the randomness of the sampling process ensures that the errors are independently distributed from one observation to the next. Sometimes, however, sampling is only partially random. The most common circumstance is when the data are repeated observations on the same entity over time, for example, the same school district for different years. If the omitted variables that constitute the regression error are persistent (like district demographics), then this induces "serial" correlation in the regression error over time. Serial correlation in the error term can arise in panel data (data on multiple districts for multiple years) and in time series data (data on a single district for multiple years).
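The contrast between the two standard error formulas can be sketched in a few lines of Python. The simulated data below, with an error standard deviation that grows with the regressor, are a hypothetical illustration, and HC1 is one common heteroskedasticity-robust variance estimator.

```python
import numpy as np

# Sketch: homoskedasticity-only vs. heteroskedasticity-robust (HC1)
# standard errors for a simple regression. The data are simulated and
# the design (error sd proportional to x^2) is a hypothetical choice.
rng = np.random.default_rng(1)
n = 1_000
x = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x + rng.standard_normal(n) * x**2 / 10

xd = x - x.mean()
b1 = (xd * y).sum() / (xd**2).sum()        # OLS slope
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x                    # OLS residuals

# Homoskedasticity-only variance: s^2 / sum((x - xbar)^2)
var_homo = (u_hat**2).sum() / (n - 2) / (xd**2).sum()

# Heteroskedasticity-robust (HC1) variance:
# n/(n-2) * sum((x - xbar)^2 * u_hat^2) / (sum((x - xbar)^2))^2
var_hc1 = n / (n - 2) * (xd**2 * u_hat**2).sum() / (xd**2).sum()**2

print(np.sqrt(var_homo), np.sqrt(var_hc1))  # robust SE is larger here
```

With this design the homoskedasticity-only formula understates the sampling uncertainty, which is why the robust option is the safe default.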
Another situation in which the error term can be correlated across observations is when sampling is based on a geographical unit. If there are omitted variables that reflect geographic influences, these omitted variables could result in correlation of the regression errors for adjacent observations.

Correlation of the regression error across observations does not make the OLS estimator biased or inconsistent, but it does violate the second least squares assumption in Key Concept 6.4. The consequence is that the OLS standard errors, both homoskedasticity-only and heteroskedasticity-robust, are incorrect in the sense that they do not produce confidence intervals with the desired confidence level.

In many cases, this problem can be fixed by using an alternative formula for standard errors. We provide such a formula for computing standard errors that are robust to both heteroskedasticity and serial correlation in Chapter 10 (regression with panel data) and in Chapter 15 (regression with time series data).

Key Concept 9.7 summarizes the threats to internal validity of a multiple regression study.
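A preview of the panel-data case can make this concrete. In the sketch below (simulated, hypothetical data), both the regressor and the error contain a persistent district component, so errors are correlated within a district; a cluster-robust formula that sums residuals within each district yields a much larger, and more honest, standard error than a formula that treats all observations as independent. The specific estimator shown is one simple clustered variance formula, used here only as an illustration of the kind of adjustment Chapter 10 develops.

```python
import numpy as np

# Simulated panel: 50 districts observed for 10 years each.
# Both x and the error share a district-level component, so errors
# are correlated within a district. All values are hypothetical.
rng = np.random.default_rng(2)
districts, years = 50, 10
n = districts * years
x = np.repeat(rng.standard_normal(districts), years)            # district-level x
u = np.repeat(rng.standard_normal(districts), years) + rng.standard_normal(n)
y = 1.0 + 0.5 * x + u

xd = x - x.mean()
b1 = (xd * y).sum() / (xd**2).sum()
u_hat = y - y.mean() - b1 * xd                                  # OLS residuals

# HC1 ignores the within-district correlation of the errors...
var_hc1 = n / (n - 2) * (xd**2 * u_hat**2).sum() / (xd**2).sum()**2
# ...while the clustered variance sums x*residual within each district first.
scores = (xd * u_hat).reshape(districts, years).sum(axis=1)
var_cluster = (scores**2).sum() / (xd**2).sum()**2

print(np.sqrt(var_hc1), np.sqrt(var_cluster))  # clustered SE is much larger
```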


KEY CONCEPT 9.7
THREATS TO THE INTERNAL VALIDITY OF A MULTIPLE REGRESSION STUDY

There are five primary threats to the internal validity of a multiple regression study:

1. Omitted variables
2. Functional form misspecification
3. Errors-in-variables (measurement error in the regressors)
4. Sample selection
5. Simultaneous causality

Each of these, if present, results in failure of the first least squares assumption, E(u_i | X_1i, ..., X_ki) ≠ 0, which in turn means that the OLS estimator is biased and inconsistent.

Incorrect calculation of the standard errors also poses a threat to internal validity. Homoskedasticity-only standard errors are invalid if heteroskedasticity is present. If the variables are not independent across observations, as can arise in panel and time series data, then a further adjustment to the standard error formula is needed to obtain valid standard errors.

Applying this list of threats to a multiple regression study provides a systematic way to assess the internal validity of that study.

9.3  Internal and External Validity When the Regression Is Used for Forecasting

Up to now, the discussion of multiple regression analysis has focused on the estimation of causal effects. Regression models can be used for other purposes, however, including forecasting. When regression models are used for forecasting, concerns about external validity are very important, but concerns about unbiased estimation of causal effects are not.

Using Regression Models for Forecasting

Chapter 4 began by considering the problem of a school superintendent who wants to know how much test scores would increase if she reduced class sizes in her school district; that is, the superintendent wants to know the causal effect on test scores of a change in class size. Accordingly, Chapters 4-8 focused on using regression analysis to estimate causal effects using observational data.
Now consider a different problem. A parent moving to a metropolitan area plans to choose where to live based in part on the quality of the local schools. The parent would like to know how different school districts perform on standardized tests. Suppose, however, that test score data are not available (perhaps they are confidential) but data on class sizes are. In this situation, the parent must guess at how well the different districts perform on standardized tests based on a limited amount of information. That is, the parent's problem is to forecast average test scores in a given district based on information related to test scores, in particular, class size.

How can the parent make this forecast? Recall the regression of test scores on the student-teacher ratio (STR) from Chapter 4:

TestScore = 698.9 − 2.28 × STR.    (9.5)

We concluded that this regression is not useful for the superintendent: The OLS estimator of the slope is biased because of omitted variables such as the composition of the student body and students' other learning opportunities outside school.

Nevertheless, Equation (9.5) could be useful to the parent trying to choose a home. To be sure, class size is not the only determinant of test performance, but from the parent's perspective what matters is whether it is a reliable predictor of test performance. The parent interested in forecasting test scores does not care whether the coefficient in Equation (9.5) estimates the causal effect on test scores of class size. Rather, the parent simply wants the regression to explain much of the variation in test scores across districts and to be stable, that is, to apply to the districts to which the parent is considering moving. Although omitted variable bias renders Equation (9.5) useless for answering the causal question, it still can be useful for forecasting purposes.

More generally, regression models can produce reliable forecasts, even if their coefficients have no causal interpretation. This recognition underlies much of the use of regression models for forecasting.
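The parent's forecast is then a single plug-in calculation from Equation (9.5). In this sketch, the value 19.6 plugged in is the California average student-teacher ratio from Table 9.1; any district's value could be used instead.

```python
def forecast_test_score(str_ratio: float) -> float:
    """Forecast of a district's average test score from Equation (9.5)."""
    return 698.9 - 2.28 * str_ratio

# At the California average student-teacher ratio of 19.6 (Table 9.1),
# the forecast is 698.9 - 2.28*19.6 = 654.2, close to the sample mean score.
print(forecast_test_score(19.6))
```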

Assessing the Validity of Regression Models for Forecasting

Because the superintendent's problem and the parent's problem are conceptually very different, the requirements for the validity of the regression are different for their respective problems. To obtain credible estimates of causal effects, we must address the threats to internal validity summarized in Key Concept 9.7.

In contrast, if we are to obtain reliable forecasts, the estimated regression must have good explanatory power, its coefficients must be estimated precisely, and it must be stable in the sense that the regression estimated on one set of data can be reliably used to make forecasts using other data. When a regression model is used for forecasting, a paramount concern is that the model is externally valid, in the sense that it is stable and quantitatively applicable to the circumstance in which the forecast is made. In Part IV, we return to the problem of assessing the validity of a regression model for forecasting future values of time series data.

9.4  Example: Test Scores and Class Size

The framework of internal and external validity helps us to take a critical look at what we have learned (and what we have not) from our analysis of the California test score data.

External Validity

Whether the California analysis can be generalized, that is, whether it is externally valid, depends on the population and setting to which the generalization is made. Here, we consider whether the results can be generalized to performance on other standardized tests in other elementary public school districts in the United States.

Section 9.1 noted that having more than one study on the same topic provides an opportunity to assess the external validity of both studies by comparing their results. In the case of test scores and class size, other comparable data sets are, in fact, available. In this section, we examine a different data set, based on standardized test results for fourth graders in 220 public school districts in Massachusetts in 1998. Both the Massachusetts and California tests are broad measures of student knowledge and academic skills, although the details differ. Similarly, the organization of classroom instruction is broadly similar at the elementary school level in the two states (as it is in most U.S. elementary school districts), although aspects of elementary school funding and curriculum differ. Thus, finding similar results about the effect of the student-teacher ratio on test performance in the California and Massachusetts data would be evidence of external validity of the findings in California. Conversely, finding different results in the two states would raise questions about the internal or external validity of at least one of the studies.


TABLE 9.1  Summary Statistics for California and Massachusetts Test Score Data Sets

                                       California                Massachusetts
                                 Average   Std. Deviation    Average   Std. Deviation
Test scores                       654.1        19.1            709.8       15.1
Student-teacher ratio              19.6         1.9             17.3        2.3
% English learners                 15.8%       18.3%             1.1%       2.9%
% Receiving lunch subsidy          44.7%       27.1%            15.3%      15.1%
Average district income ($)     $15,317      $7,226          $18,747     $5,808
Number of observations              420                          220
Year                               1999                         1998

Comparison of the California and Massachusetts data. Like the California data, the Massachusetts data are at the school district level. The definitions of the variables in the Massachusetts data set are the same as those in the California data set, or nearly so. More information on the Massachusetts data set, including definitions of the variables, is given in Appendix 9.1.

Table 9.1 presents summary statistics for the California and Massachusetts samples. The average test score is higher in Massachusetts, but the test is different, so a direct comparison of scores is not appropriate. The average student-teacher ratio is higher in California (19.6 versus 17.3). Average district income is 20% higher in Massachusetts, but the standard deviation of income is greater in California; that is, there is a greater spread in average district incomes in California than in Massachusetts. The average percentage of students still learning English and the average percentage of students receiving subsidized lunches are both much higher in the California than in the Massachusetts districts.

Test scores and average district income. To save space, we do not present scatterplots of all the Massachusetts data. Because it was a focus in Chapter 8, however, it is interesting to examine the relationship between test scores and average district income in Massachusetts. This scatterplot is presented in Figure 9.1. The general pattern of this scatterplot is similar to that in Figure 8.2 for the California data: The relationship between income and test scores appears to be steep for low values of income and flatter for high values. Evidently, the linear regression plotted in the figure misses this apparent nonlinearity. Cubic and logarithmic regression functions are also plotted in Figure 9.1. The cubic regression function has a

FIGURE 9.1  Test Scores vs. Income for Massachusetts Data

The estimated linear regression function does not capture the nonlinear relation between income and test scores in the Massachusetts data. The estimated linear-log and cubic regression functions are similar for district incomes between $13,000 and $30,000, the region containing most of the observations.

[Scatterplot of district test scores against district income (thousands of dollars, roughly 0 to 50), with the estimated linear, linear-log, and cubic regression functions drawn through the points.]

slightl y higher R2 than the log<ritb mic speci fic<Hion (0.486 versus 0.455).
Com par ing Figu res 8.7 a nd 9.1 shows tha t 1h~ gene ra l pa ll ern of nonlinearity
found in the C.alifornia income and test score d:J ta 1~ abo prescm in the Massac husettl> dl\ta . The precise functional fo rms tha t bec;l describe this nonlinearity
diffe r, however, with the cubic specification fiumg best in Massachusetts but the
li near-log ~pecificai ion fining best in California.

Multiple regression results. Regression results for the Massachusetts data are presented in Table 9.2. The first regression, reported in column (1) in the table, has only the student-teacher ratio as a regressor. The slope is negative (−1.72), and the hypothesis that the coefficient is zero can be rejected at the 1% significance level (t = −1.72/0.50 = −3.44).

The remaining columns report the results of including additional variables that control for student characteristics and of introducing nonlinearities into the estimated regression function. Controlling for the percentage of English learners, the percentage of students eligible for a free lunch, and average district income reduces the estimated coefficient on the student-teacher ratio by 60%, from −1.72 in regression (1) to −0.69 in regression (2) and −0.64 in regression (3).


TABLE 9.2  Multiple Regression Estimates of the Student-Teacher Ratio and Test Scores: Data from Massachusetts

Dependent variable: average combined English, math, and science test score in the school district, fourth grade; 220 observations.

Regressor                                  (1)        (2)        (3)        (4)        (5)        (6)
Student-teacher ratio (STR)             -1.72**    -0.69*     -0.64*      12.4      -1.02**    -0.67*
                                        (0.50)     (0.27)     (0.27)     (14.0)     (0.37)     (0.27)
STR^2                                                                    -0.680
                                                                         (0.737)
STR^3                                                                     0.011
                                                                         (0.013)
% English learners                                 -0.411     -0.437     -0.434
                                                   (0.306)    (0.303)    (0.300)
% English learners > median?                                                        -12.6
  (Binary, HiEL)                                                                     (9.8)
HiEL × STR                                                                            0.80
                                                                                     (0.56)
% Eligible for free lunch                          -0.521**   -0.582**   -0.587**   -0.709**   -0.653**
                                                   (0.077)    (0.097)    (0.104)    (0.091)    (0.072)
District income (logarithm)                        16.53**
                                                   (3.15)
District income                                               -3.07      -3.38      -3.47      -3.22
                                                              (2.35)     (2.49)     (2.49)     (2.31)
District income^2                                              0.164      0.174      0.184*     0.165
                                                              (0.085)    (0.089)    (0.090)    (0.085)
District income^3                                             -0.0022*   -0.0023*   -0.0023*   -0.0022*
                                                              (0.0010)   (0.0010)   (0.0010)   (0.0010)
Intercept                               739.6**    682.4**    744.0**    665.5**    759.9**    747.4**
                                        (8.6)      (11.5)     (21.3)     (81.3)     (23.2)     (20.3)

(Table 9.2 continued)

Comparing the adjusted R^2's of regressions (2) and (3) indicates that the cubic specification (3) provides a better model of the relationship between test scores and income than does the logarithmic specification (2), even holding constant the student-teacher ratio. There is no statistically significant evidence of a nonlinear relationship between test scores and the student-teacher ratio: The F-statistic in regression (4) testing whether the population coefficients on STR^2 and STR^3 are zero has a p-value of 0.641. Similarly, there is no evidence that a reduction in the student-teacher ratio has a different effect in districts with many English learners


TABLE 9.2 (continued)

F-Statistics and p-Values Testing Exclusion of Groups of Variables

                                           (1)        (2)        (3)        (4)        (5)        (6)
All STR variables and interactions = 0                                     2.86       4.01
                                                                          (0.038)    (0.020)
STR^2, STR^3 = 0                                                           0.45
                                                                          (0.641)
Income^2, Income^3 = 0                                         7.74       7.75       5.85       6.55
                                                             (<0.001)   (<0.001)    (0.003)    (0.002)
HiEL, HiEL × STR = 0                                                                 1.58
                                                                                    (0.208)
SER                                       14.64       8.69       8.61       8.63       8.62       8.64
Adjusted R^2                               0.063      0.670      0.676      0.675      0.675      0.674

These regressions were estimated using the data on Massachusetts elementary school districts described in Appendix 9.1. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level.

than with few [the t-statistic on HiEL × STR in regression (5) is 0.80/0.56 = 1.43]. Finally, regression (6) shows that the estimated coefficient on the student-teacher ratio does not change substantially when the percentage of English learners [which is insignificant in regression (3)] is excluded. In short, the results in regression (3) are not sensitive to the changes in functional form and specification considered in regressions (4)-(6) in Table 9.2. Therefore we adopt regression (3) as our base estimate of the effect on test scores of a change in the student-teacher ratio based on the Massachusetts data.

Comparison of Massachusetts and California results. For the California data, we found:

1. Adding variables that control for student background characteristics reduced the coefficient on the student-teacher ratio from −2.28 [Table 7.1, regression (1)] to −0.73 [Table 8.3, regression (2)], a reduction of 68%.

2. The hypothesis that the true coefficient on the student-teacher ratio is zero was rejected at the 1% significance level, even after adding variables that control for student background and district economic characteristics.

3. The effect of cutting the student-teacher ratio did not depend in an important way on the percentage of English learners in the district.

4. There is some evidence that the relationship between test scores and the student-teacher ratio is nonlinear.


Do we find the same things in Massachusetts? For findings (1), (2), and (3), the answer is yes. Including the additional control variables reduces the coefficient on the student-teacher ratio from −1.72 [Table 9.2, regression (1)] to −0.69 [Table 9.2, regression (2)], a reduction of 60%. The coefficients on the student-teacher ratio remain significant after adding the control variables. Those coefficients are only significant at the 5% level in the Massachusetts data, whereas they are significant at the 1% level in the California data. However, there are nearly twice as many observations in the California data, so it is not surprising that the California estimates are more precise. As in the California data, there is no statistically significant evidence in the Massachusetts data of an interaction between the student-teacher ratio and the binary variable indicating a large percentage of English learners in the district.

Finding (4), however, does not hold up in the Massachusetts data: The hypothesis that the relationship between the student-teacher ratio and test scores is linear cannot be rejected at the 5% significance level when tested against a cubic specification.

Because the two standardized tests are different, the coefficients themselves cannot be compared directly: One point on the Massachusetts test is not the same as one point on the California test. If, however, the test scores are put into the same units, then the estimated class size effects can be compared. One way to do this is to transform the test scores by standardizing them: Subtract the sample average and divide by the standard deviation so that they have a mean of 0 and a variance of 1. The slope coefficients in the regression with the transformed test score equal the slope coefficients in the original regression, divided by the standard deviation of the test. Thus the coefficient on the student-teacher ratio, divided by the standard deviation of test scores, can be compared across the two data sets.

This comparison is undertaken in Table 9.3. The first column reports the OLS estimates of the coefficient on the student-teacher ratio in a regression with the percentage of English learners, the percentage of students eligible for a free lunch, and the average district income included as control variables. The second column reports the standard deviation of the test scores across districts. The final two columns report the estimated effect on test scores of reducing the student-teacher ratio by two students per teacher (our superintendent's proposal), first in the units of the test, and second in standard deviation units. For the linear specification, the OLS coefficient estimate using California data is −0.73, so cutting the student-teacher ratio by two is estimated to increase district test scores by −0.73 × (−2) = 1.46 points. Because the standard deviation of test scores is 19.1 points, this corresponds to 1.46/19.1 = 0.076 standard deviations of the distribution of test scores across districts. The standard error of this estimate is 0.26 × 2/19.1 =

TABLE 9.3  Student-Teacher Ratios and Test Scores: Comparing the Estimates from California and Massachusetts

                                                              Estimated Effect of Two Fewer
                                                              Students per Teacher, in Units of:
                                   OLS        Standard Deviation
                                   Estimate   of Test Scores       Points on     Standard
                                   (STR)      Across Districts     the Test      Deviations
California
  Linear                           -0.73          19.1             1.46          0.076
                                   (0.26)                          (0.52)        (0.027)
  Cubic: Reduce STR from 20 to 18                 19.1             2.93          0.153
                                                                   (0.70)        (0.037)
  Cubic: Reduce STR from 22 to 20                 19.1             1.90          0.099
                                                                   (0.69)        (0.036)
Massachusetts
  Linear                           -0.64          15.1             1.28          0.085
                                   (0.27)                          (0.54)        (0.036)

Standard errors are given in parentheses.

0.027. The estimated effects for the nonlinear models and their standard errors were computed using the method described in Section 8.1.

Based on the linear model using California data, a reduction of two students per teacher is estimated to increase test scores by 0.076 standard deviation unit, with a standard error of 0.027. The nonlinear models for California data suggest a somewhat larger effect, with the specific effect depending on the initial student-teacher ratio. Based on the Massachusetts data, this estimated effect is 0.085 standard deviation unit, with a standard error of 0.036.

These estimates are essentially the same. Cutting the student-teacher ratio is predicted to raise test scores, but the predicted improvement is small. In the California data, for example, the difference in test scores between the median district and a district at the 75th percentile is 12.2 test score points (Table 4.1), or 0.64 (= 12.2/19.1) standard deviations. The estimated effect from the linear model is just over one-tenth this size; in other words, according to this estimate, cutting the student-teacher ratio by two would move a district only one-tenth of the way from the median to the 75th percentile of the distribution of test scores across districts. Reducing the student-teacher ratio by two is a large change for a district, but the estimated benefits shown in Table 9.3, while nonzero, are small.
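The unit conversions behind the linear rows of Table 9.3 reduce to a few lines of arithmetic; every number below (coefficients, standard errors, test score standard deviations) comes from the text and tables.

```python
def effect_in_sd_units(coef, se_coef, delta_str, sd_test):
    """Effect of a delta_str change in STR, in standard deviation units,
    along with its standard error (also in standard deviation units)."""
    points = coef * delta_str                      # effect in points on the test
    return points / sd_test, abs(delta_str) * se_coef / sd_test

# California linear specification: coefficient -0.73 (SE 0.26), test sd 19.1
ca_effect, ca_se = effect_in_sd_units(-0.73, 0.26, -2, 19.1)
print(round(ca_effect, 3), round(ca_se, 3))   # 0.076 0.027

# Massachusetts linear specification: coefficient -0.64 (SE 0.27), test sd 15.1
ma_effect, ma_se = effect_in_sd_units(-0.64, 0.27, -2, 15.1)
print(round(ma_effect, 3), round(ma_se, 3))   # 0.085 0.036
```

Putting both estimates in standard deviation units is what makes the California and Massachusetts effects directly comparable despite the different tests.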



This analysis of Massachu::;etts data suggests that the California results <trl!
externally valid. at least when generalized 10 elementary school dbtncts el<;t,;wherc
in the United States.

Internal Validity
The similariry of the results for California a nd Massachuscw. d~s not en ure tt t:tr
internal validity. Section 9.2 listed five possible th reat~ to mtcmal vahdtty t 1
could induce bias in the estimated effect o n tes r scores on cla~s size. We conslt... r
these threats in turn .

Omitted variables. The multiple regressions reported in this and previous chapters control for a student characteristic (the percentage of English learners), a family economic characteristic (the percentage of students receiving a subsidized lunch), and a broader measure of the affluence of the district (average district income).

Possible omitted variables remain, such as other school and student characteristics, and their omission might cause omitted variables bias. For example, if the student-teacher ratio is correlated with teacher quality (perhaps because better teachers are attracted to schools with smaller student-teacher ratios), and if teacher quality affects test scores, then omission of teacher quality could bias the coefficient on the student-teacher ratio. Similarly, districts with a low student-teacher ratio might also offer many extracurricular learning opportunities. Also, districts with a low student-teacher ratio might attract families that are more committed to enhancing their children's learning at home. Such omitted factors could lead to omitted variable bias.

One way to eliminate omitted variable bias, at least in theory, is to conduct an experiment. For example, students could be randomly assigned to different size classes, and their subsequent performance on standardized tests could be compared. Such a study was in fact conducted in Tennessee, and we examine it in Chapter 13.

Functional form. The analysis here and in Chapter 8 explored a variety of functional forms. We found that some of the possible nonlinearities investigated were not statistically significant, while those that were did not substantially alter the estimated effect of reducing the student-teacher ratio. Although further functional form analysis could be carried out, this suggests that the main findings of these studies are unlikely to be sensitive to using different nonlinear regression specifications.


Errors-in-variables. The average student-teacher ratio in the district is a broad and potentially inaccurate measure of class size. For example, because students move in and out of districts, the student-teacher ratio might not accurately represent the actual class sizes experienced by the students taking the test, which in turn could lead to the estimated class size effect being biased toward zero. Another variable with potential measurement error is average district income. Those data were taken from the 1990 census, while the other data pertain to 1998 (Massachusetts) or 1999 (California). If the economic composition of the district changed substantially over the 1990s, this would be an imprecise measure of the actual average district income.
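The direction of the bias mentioned above (toward zero) can be illustrated with a short simulation of classical measurement error in the regressor; every parameter value below is a hypothetical choice made only for the illustration.

```python
import numpy as np

# Classical measurement error attenuates the OLS slope toward zero.
# True model: Y = beta1 * X + u, but we observe X + w instead of X.
# All parameter values here are hypothetical.
rng = np.random.default_rng(3)
n = 100_000
beta1 = 1.0
x_true = rng.standard_normal(n)                  # true regressor (centered)
y = beta1 * x_true + rng.standard_normal(n)
x_obs = x_true + rng.standard_normal(n)          # regressor measured with error

b_true = np.cov(x_true, y)[0, 1] / np.var(x_true)
b_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs)

# With var(x) = var(w) = 1, the attenuation factor is
# var(x) / (var(x) + var(w)) = 0.5, so the slope on x_obs is near 0.5.
print(b_true, b_obs)
```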

Selection. The California and the Massachusetts data cover all the public elementary school districts in the state that satisfy minimum size restrictions, so there is no reason to believe that sample selection is a problem here.
Simultaneous causality. Simultaneous causality would arise if the performance on standardized tests affected the student-teacher ratio. This could happen, for example, if there is a bureaucratic or political mechanism for increasing the funding of poorly performing schools or districts, which in turn resulted in hiring more teachers. In Massachusetts, no such mechanism for equalization of school financing was in place during the time of these tests. In California, a series of court cases led to some equalization of funding, but this redistribution of funds was not based on student achievement. Thus, in neither Massachusetts nor California does simultaneous causality appear to be a problem.

Heteroskedasticity and correlation of the error term across observations. All of the results reported here and in earlier chapters use heteroskedasticity-robust standard errors, so heteroskedasticity does not threaten internal validity. Correlation of the error term across observations, however, could threaten the consistency of the standard errors because simple random sampling was not used (the sample consists of all elementary school districts in the state). Although there are alternative standard error formulas that could be applied to this situation, the details are complicated and specialized and we leave them to more advanced texts.

rrune-

Discussion and Implications


The similarity between the Massachusetts and California results suggests that these
studies are externally valid, in the sense that the main findings can be generalized
to performance on standardized tests at other elementary school districts in the
United States.

338   CHAPTER 9   Assessing Studies Based on Multiple Regression


Some of the most important potential threats to internal validity have been
addressed by controlling for student background, family economic background,
and district affluence, and by checking for nonlinearities in the regression function. Still, some potential threats to internal validity remain. A leading candidate
is omitted variable bias, perhaps arising because the control variables do not
capture other characteristics of the school districts or extracurricular learning
opportunities.

Based on both the California and the Massachusetts data, we are able to
answer the superintendent's question from Section 4.1: After controlling for family economic background, student characteristics, and district affluence, and after
modeling nonlinearities in the regression function, cutting the student-teacher
ratio by two students per teacher is predicted to increase test scores by approximately 0.08 standard deviation of the distribution of test scores across districts.
This effect is statistically significant, but it is quite small. This small estimated effect
is in line with the results of the many studies that have investigated the effects on
test scores of class size reductions.⁴

The superintendent can now use this estimate to help her decide whether to
reduce class sizes. In making this decision, she will need to weigh the costs of the
proposed reduction against the benefits. The costs include teacher salaries and
expenses for additional classrooms. The benefits include improved academic performance, which we have measured by performance on standardized tests, but
there are other potential benefits that we have not studied, including lower
dropout rates and enhanced future earnings. The estimated effect of the proposal
on standardized test performance is one important input into her calculation of
costs and benefits.


9.5 Conclusion
The concepts of internal and external validity provide a framework for assessing
what has been learned from an econometric study.

A study based on multiple regression is internally valid if the estimated coefficients are unbiased and consistent, and if standard errors are consistent. Threats
to the internal validity of such a study include omitted variables, misspecification
of functional form (nonlinearities), imprecise measurement of the independent

⁴If you are interested in learning more about the relationship between class size and test scores, see
the reviews by Ehrenberg, Brewer, Gamoran, and Willms (2001a, 2001b).


variables (errors-in-variables), sample selection, and simultaneous causality. Each
of these introduces correlation between the regressor and the error term, which
in turn makes OLS estimators biased and inconsistent. If the errors are correlated
across observations, as they can be with time series data, or if they are heteroskedastic but the standard errors are computed using the homoskedasticity-only formula, then internal validity is compromised because the standard errors
will be inconsistent. These latter problems can be addressed by computing the standard errors properly.

A study using regression analysis, like any statistical study, is externally valid
if its findings can be generalized beyond the population and setting studied. Sometimes it can help to compare two or more studies on the same topic. Whether or
not there are two or more such studies, however, assessing external validity
requires making judgments about the similarities of the population and setting
studied and the population and setting to which the results are being generalized.

The next two parts of this textbook develop ways to address threats to internal validity that cannot be mitigated by multiple regression analysis alone. Part III
extends the multiple regression model in ways designed to mitigate all five
sources of potential bias in the OLS estimator; Part III also discusses a different
approach to obtaining internal validity, randomized controlled experiments. Part
IV develops methods for analyzing time series data and for using time series
data to estimate so-called dynamic causal effects, which are causal effects that vary
over time.
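The role of the standard error formula can be illustrated with a small simulation, a sketch with made-up data rather than any study from this chapter. It applies the textbook bivariate formulas: the homoskedasticity-only standard error and the heteroskedasticity-robust (HC0) standard error for the OLS slope. When the error variance depends on X, the two disagree:

```python
# Sketch: when errors are heteroskedastic, the homoskedasticity-only standard
# error formula and the heteroskedasticity-robust (HC0) formula disagree.
# Simulated data; the error standard deviation is proportional to |x|.
import random

random.seed(2)

n = 5000
x = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * a + abs(a) * random.gauss(0, 1) for a in x]  # var(u|x) = x^2

mx, my = sum(x) / n, sum(y) / n
xt = [a - mx for a in x]                       # demeaned regressor
sxx = sum(a * a for a in xt)
b1 = sum(a * (b - my) for a, b in zip(xt, y)) / sxx
b0 = my - b1 * mx
u = [b - (b0 + b1 * a) for a, b in zip(x, y)]  # OLS residuals

# Homoskedasticity-only SE: sqrt(s^2 / sum of squared demeaned x).
s2 = sum(e * e for e in u) / (n - 2)
se_homo = (s2 / sxx) ** 0.5

# Heteroskedasticity-robust (HC0) SE: sqrt(sum xt^2 u^2 / (sum xt^2)^2).
se_robust = (sum(a * a * e * e for a, e in zip(xt, u)) / sxx ** 2) ** 0.5

print(se_homo, se_robust)   # the robust SE is noticeably larger here
```

With this design the robust standard error is roughly 70% larger than the homoskedasticity-only one, so confidence intervals built from the latter would be too narrow.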

Summary


1. Statistical studies are evaluated by asking whether the analysis is internally and
externally valid. A study is internally valid if the statistical inferences about causal
effects are valid for the population being studied. A study is externally valid if its
inferences and conclusions can be generalized from the population and setting
studied to other populations and settings.

2. In regression estimation of causal effects, there are two types of threats to internal validity. First, OLS estimators will be inconsistent if the regressors and error
terms are correlated. Second, confidence intervals and hypothesis tests are not
valid when the standard errors are incorrect.

3. Regressors and error terms may be correlated when there are omitted variables,
an incorrect functional form is used, one or more of the regressors is measured
with error, the sample is chosen nonrandomly from the population, or there is
simultaneous causality between the regressors and dependent variables.


4. Standard errors are incorrect when the errors are heteroskedastic and the computer software uses the homoskedasticity-only standard errors, or when the error
term is correlated across different observations.

5. When regression models are used solely for forecasting, it is not necessary for the
regression coefficients to be unbiased estimates of causal effects. It is critical, however, that the regression model be externally valid for the forecasting application
at hand.

Key Terms
internal validity (313)
external validity (313)
population studied (313)
population of interest (313)
setting (315)
functional form misspecification (319)
errors-in-variables bias (319)
sample selection bias (322)
simultaneous causality (324)
simultaneous equations bias (325)

Review the Concepts


9.1 What is the difference between internal and external validity? Between the
population studied and the population of interest?

9.2 Key Concept 9.2 describes the problem of variable selection in terms of a
tradeoff between bias and variance. What is this tradeoff? Why could including an additional regressor decrease bias? Increase variance?

9.3 Economic variables are often measured with error. Does this mean that
regression analysis is unreliable? Explain.

9.4 Suppose that a state offered voluntary standardized tests to all of its third
graders, and that these data were used in a study of class size on student performance. Explain how sample selection bias might invalidate the results.

9.5 A researcher estimates the effect on crime rates of spending on police by
using city-level data. Explain how simultaneous causality might invalidate
the results.

9.6 A researcher estimates a regression using two different software packages.
The first uses the homoskedasticity-only formula for standard errors. The
second uses the heteroskedasticity-robust formula. The standard errors are
very different. Which should the researcher use? Why?


Exercises
9.1 Suppose that you have just read a careful statistical study of the effect of
advertising on the demand for cigarettes. Using data from New York during
the 1970s, it concluded that advertising on buses and subways was more
effective than print advertising. Use the concept of external validity to determine if these results are likely to apply to Boston in the 1970s; Los Angeles
in the 1970s; New York in 2006.

9.2 Consider the one-variable regression model Y_i = β0 + β1X_i + u_i, and suppose that it satisfies the assumptions in Key Concept 4.3. Suppose that Y_i is
measured with error, so that the data are Ỹ_i = Y_i + w_i, where w_i is the measurement error, which is i.i.d. and independent of Y_i and X_i. Consider the
population regression Ỹ_i = β0 + β1X_i + v_i, where v_i is the regression error
using the mismeasured dependent variable, Ỹ_i.

a. Show that v_i = u_i + w_i.

b. Show that the regression Ỹ_i = β0 + β1X_i + v_i satisfies the assumptions
in Key Concept 4.3. (Assume that w_i is independent of Y_j and X_j for all
values of i and j and has a finite fourth moment.)

c. Are the OLS estimators consistent?

d. Can confidence intervals be constructed in the usual way?

e. Evaluate these statements: "Measurement error in the X's is a serious
problem. Measurement error in Y is not."

9.3 Labor economists studying the determinants of women's earnings discovered a puzzling empirical result. Using randomly selected employed women,
they regressed earnings on the women's number of children and a set of control variables (age, education, occupation, and so forth). They found that
women with more children had higher wages, controlling for these other factors. Explain how sample selection might be the cause of this result. (Hint:
Notice that the sample includes only women who are working.) (This empirical puzzle motivated James Heckman's research on sample selection that
led to his 2000 Nobel Prize in economics.)

9.4 Using the regressions shown in column (2) of Table 8.3 and column (2) of
Table 9.2, construct a table like Table 9.3 to compare the estimated effects of
a 10% increase in district income on test scores in California and Massachusetts.


9.5 The demand for a commodity is given by Q = β0 + β1P + u, where Q
denotes quantity, P denotes price, and u denotes factors other than price that
determine demand. Supply for the commodity is given by Q = γ0 + γ1P + v,
where v denotes factors other than price that determine supply. Suppose
that u and v both have a mean of zero, have variances σ²_u and σ²_v, and are
mutually uncorrelated.

a. Solve the two simultaneous equations to show how Q and P depend
on u and v.

b. Derive the means of P and Q.

c. Derive the variance of P, the variance of Q, and the covariance
between Q and P.

d. A random sample of observations of (Q_i, P_i) is collected, and Q_i is
regressed on P_i. (That is, Q_i is the regressand and P_i is the regressor.)
Suppose that the sample is very large.

i. Use your answers to (b) and (c) to derive values of the regression
coefficients. [Hint: Use Equations (4.7) and (4.8).]

ii. A researcher uses the slope of this regression as an estimate of the
slope of the demand function (β1). Is the estimated slope too large
or too small? (Hint: Use the fact that demand curves slope down
and supply curves slope up.)
9.6 Suppose n = 100 i.i.d. observations for (Y_i, X_i) yield the following regression
results:

Ŷ = 32.1 + 66.8X, SER = 15.1, R² = 0.81.
    (15.1)  (12.2)

Another researcher is interested in the same regression, but he makes an
error when he enters the data into his regression program: He enters each
observation twice, so he has 200 observations (with observation 1 entered
twice, observation 2 entered twice, and so forth).

a. Using these 200 observations, what results will be produced by his
regression program? (Hint: Write the "incorrect" values of the sample
means, variances, and covariances of Y and X as functions of the "correct" values. Use these to determine the regression statistics.)

Ŷ = ___ + ___X, SER = ___, R² = ___.
    (___)  (___)


b. Which (if any) of the internal validity conditions are violated?


9.7 Are the following statements true or false? Explain your answer.

a. "An ordinary least squares regression of Y onto X will be internally
inconsistent if X is correlated with the error term."

b. "Each of the five primary threats to internal validity implies that X is
correlated with the error term."

9.8 Would the regression in Equation (9.5) be useful for predicting test scores
in a school district in Massachusetts? Why or why not?

9.9 Consider the linear regression of TestScore on Income shown in Figure 8.2
and the nonlinear regression in Equation (8.18). Would either of these
regressions provide a reliable estimate of the effect of income on test scores?
Would either of these regressions provide a reliable method for forecasting
test scores? Explain.

9.10 Read the box "The Returns to Education and the Gender Gap" in Section
8.3. Discuss the internal and external validity of the estimated effect of education on earnings.

9.11 Read the box "The Demand for Economics Journals" in Section 8.3. Discuss
the internal and external validity of the estimated effect of price per citation
on subscriptions.

Empirical Exercises
E9.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the
following questions.

a. Discuss the internal validity of the regressions that you used to answer
Empirical Exercise 8.1(l). Include a discussion of possible omitted
variable bias, misspecification of the functional form of the regression,
errors-in-variables, sample selection, simultaneous causality, and
inconsistency of the OLS standard errors.

b. The data set CPS92_04 described in Empirical Exercise 3.1 includes
data from 2004 and 1992. Use these data to investigate the (temporal)
external validity of the conclusions that you reached in Empirical
Exercise 8.1(l). [Note: Remember to adjust for inflation as explained
in Empirical Exercise 3.1(b).]


E9.2 A committee on improving undergraduate teaching at your college needs
your help before reporting to the Dean. The committee seeks your advice,
as an econometric expert, about whether your college should take physical
appearance into account when hiring teaching faculty. (This is legal as long
as doing so is blind to race, religion, age, and gender.) You do not have time
to collect your own data, so you must base your recommendations on the
analysis of the data set TeachingRatings described in Empirical Exercise 4.2
that has served as the basis for several Empirical Exercises in Part II of the
text. Based on your analysis of these data, what is your advice? Justify your
advice based on a careful and complete assessment of the internal and external validity of the regressions that you carried out to answer the Empirical
Exercises using these data in earlier chapters.

E9.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to
answer the following questions.

a. Discuss the internal validity of the regressions that you used to answer
Empirical Exercise 8.3(i). Include a discussion of possible omitted
variable bias, misspecification of the functional form of the regression,
errors-in-variables, sample selection, simultaneous causality, and
inconsistency of the OLS standard errors.

b. The data set CollegeDistance excluded students from western states;
data for these students are included in the data set CollegeDistanceWest. Use these data to investigate the (geographic) external
validity of the conclusions that you reached in Empirical Exercise 8.3(i).

APPENDIX 9.1

The Massachusetts Elementary School Testing Data
The Massachusetts data are districtwide averages for public elementary school districts in
1998. The test score is taken from the Massachusetts Comprehensive Assessment System
(MCAS) test, administered to all fourth graders in Massachusetts public schools in the
spring of 1998. The test is sponsored by the Massachusetts Department of Education and

is mandatory for all public schools. The data analyzed here are the overall total score, which
is the sum of the scores on the English, math, and science portions of the test.

Data on the student-teacher ratio, the percentage of students receiving a subsidized
lunch, and the percentage of students still learning English are averages for each elementary school district for the 1997-1998 school year and were obtained from the Massachusetts Department of Education. Data on average district income were obtained from the
1990 U.S. Census.

PART THREE

Further Topics in Regression Analysis

CHAPTER 10   Regression with Panel Data

CHAPTER 11   Regression with a Binary Dependent Variable

CHAPTER 12   Instrumental Variables Regression

CHAPTER 13   Experiments and Quasi-Experiments

CHAPTER 10

Regression with Panel Data

the variables, however, they cannot be included in the regression, and the OLS
estimators of the regression coefficients could have omitted variable bias.

This chapter describes a method for controlling for some types of omitted
variables without actually observing them. This method requires a specific type
of data, called panel data, in which each observational unit, or entity, is observed
at two or more time periods. By studying changes in the dependent variable over
time, it is possible to eliminate the effect of omitted variables that differ across
entities but are constant over time.

The empirical application in this chapter concerns drunk driving: What are
the effects of alcohol taxes and drunk driving laws on traffic fatalities? We
address this question using data on traffic fatalities, alcohol taxes, drunk driving
laws, and related variables for the 48 contiguous U.S. states for each of the seven
years from 1982 to 1988. This panel data set lets us control for unobserved
variables that differ from one state to the next, such as prevailing cultural
attitudes toward drinking and driving, but do not change over time. It also
allows us to control for variables that vary through time, like improvements in
the safety of new cars, but do not vary across states.

Section 10.1 describes the structure of panel data and introduces the traffic
fatality data set. Fixed effects regression, the main tool for regression analysis of
panel data, is an extension of multiple regression that exploits panel data to
control for variables that differ across entities but are constant over time. Fixed
effects regression is introduced in Sections 10.2 and 10.3, first for the case of
only two time periods, then for multiple time periods. In Section 10.4, these



350   CHAPTER 10   Regression with Panel Data

KEY CONCEPT 10.1

NOTATION FOR PANEL DATA

Panel data consist of observations on the same n entities at two or more time periods T. If the data set contains observations on the variables X and Y, then the data
are denoted

(X_it, Y_it), i = 1, ..., n and t = 1, ..., T,    (10.1)

where the first subscript, i, refers to the entity being observed and the second subscript, t, refers to the date at which it is observed.

methods are extended to incorporate so-called time fixed effects, which control
for unobserved variables that are constant across entities but change over time.
Section 10.5 discusses the panel data regression assumptions and standard
errors for panel data regression. In Section 10.6, we use these methods to study
the effect of alcohol taxes and drunk driving laws on traffic deaths.

10.1 Panel Data
Recall from Section 1.3 that panel data (also called longitudinal data) refers to
data for n different entities observed at T different time periods. The state traffic
fatality data studied in this chapter are panel data. Those data are for n = 48 entities (states), where each entity is observed in T = 7 time periods (each of the years
1982, ..., 1988), for a total of 7 × 48 = 336 observations.

When describing cross-sectional data it was useful to use a subscript to denote
the entity; for example, Y_i referred to the variable Y for the ith entity. When
describing panel data, we need some additional notation to keep track of both the
entity and the time period. This is done by using two subscripts rather than one:
The first, i, refers to the entity, and the second, t, refers to the time period of the
observation. Thus Y_it denotes the variable Y observed for the ith of n entities in the
tth of T periods. This notation is summarized in Key Concept 10.1.
Some additional terminology associated with panel data describes whether
some observations are missing. A balanced panel has all its observations; that is,
the variables are observed for each entity and each time period. A panel that has
some missing data for at least one time period for at least one entity is called an
unbalanced panel. The traffic fatality data set has data for all 48 U.S. states for all
seven years, so it is balanced. If, however, some data were missing (for example, if
we did not have data on fatalities for some states in 1983), then the data set would
be unbalanced. The methods presented in this chapter are described for a balanced
panel; however, all these methods can be used with an unbalanced panel, although
precisely how to do so in practice depends on the regression software being used.
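A minimal sketch of this bookkeeping, with made-up states, years, and values: store each observation under its (entity, time period) pair and check whether every pair is present.

```python
# Sketch: representing panel data as observations indexed by (entity, time)
# and checking whether the panel is balanced. States, years, and values here
# are made up for illustration.
states = ["AL", "AZ", "AR"]
years = [1982, 1983, 1984]

# fatality-rate observations keyed by (state, year); values are placeholders
panel = {(s, t): 2.0 for s in states for t in years}

def is_balanced(data, entities, periods):
    # Balanced: every entity is observed in every time period.
    return all((i, t) in data for i in entities for t in periods)

print(is_balanced(panel, states, years))   # True

del panel[("AZ", 1983)]                    # drop one observation
print(is_balanced(panel, states, years))   # False: now unbalanced
```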

Example: Traffic Deaths and Alcohol Taxes

There are approximately 40,000 highway traffic fatalities each year in the United
States. Approximately one-third of fatal crashes involve a driver who was drinking, and this fraction rises during peak drinking periods. One study (Levitt and
Porter, 2001) estimates that as many as 25% of drivers on the road between 1 A.M.
and 3 A.M. have been drinking, and that a driver who is legally drunk is at least 13
times as likely to cause a fatal crash as a driver who has not been drinking.
In this chapter, we study how effective various government policies designed
to discourage drunk driving actually are in reducing traffic deaths. The panel data
set contains variables related to traffic fatalities and alcohol, including the number of traffic fatalities in each state in each year, the type of drunk driving laws in
each state in each year, and the tax on beer in each state. The measure of traffic
deaths we use is the fatality rate, which is the number of annual traffic deaths per
10,000 people in the population in the state. The measure of alcohol taxes we use
is the "real" tax on a case of beer, which is the beer tax, put into 1988 dollars by
adjusting for inflation.¹ The data are described in more detail in Appendix 10.1.

Figure 10.1a is a scatterplot of the data for 1982 on two of these variables, the
fatality rate and the real tax on a case of beer. A point in this scatterplot represents the fatality rate in 1982 and the real beer tax in 1982 for a given state. The
OLS regression line obtained by regressing the fatality rate on the real beer tax is
also plotted in the figure; the estimated regression line is

FatalityRate = 2.01 + 0.15 BeerTax   (1982 data).   (10.2)
              (0.15)  (0.13)

The coefficient on the real beer tax is positive but not statistically significant at
the 10% level.

¹To make the taxes comparable over time, they are put into "1988 dollars" using the Consumer Price
Index (CPI). For example, because of inflation a tax of $1 in 1982 corresponds to a tax of $1.23 in 1988
dollars.
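The footnote's adjustment is just a CPI ratio. A minimal sketch, with placeholder index values chosen so that the ratio is 1.23 as in the footnote's example (these are not actual CPI figures):

```python
# Sketch of the inflation adjustment described in the footnote: converting a
# nominal beer tax into 1988 dollars by scaling with a CPI ratio. The CPI
# values below are made-up placeholders chosen so the ratio is 1.23.
def to_1988_dollars(nominal_tax, cpi_year, cpi_1988):
    # Scale by the ratio of the 1988 price level to the price level that year.
    return nominal_tax * cpi_1988 / cpi_year

cpi_1982, cpi_1988 = 100.0, 123.0   # placeholder index values (ratio = 1.23)
print(round(to_1988_dollars(1.00, cpi_1982, cpi_1988), 2))   # 1.23
```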


FIGURE 10.1   The Traffic Fatality Rate and the Tax on Beer

[Figure: two scatterplots of the traffic fatality rate (fatalities per 10,000) against the real beer tax (dollars per case, in 1988 dollars) for 48 states. Panel (a) shows the data for 1982 with the fitted line FatalityRate = 2.01 + 0.15 BeerTax; panel (b) shows the data for 1988 with the fitted line FatalityRate = 1.86 + 0.44 BeerTax. Both plots show a positive relationship between the fatality rate and the real beer tax.]


Because we have data for more than one year, we can reexamine this relationship for another year. This is done in Figure 10.1b, which is the same scatterplot as before, except that it uses the data for 1988. The OLS regression line
through these data is

FatalityRate = 1.86 + 0.44 BeerTax   (1988 data).   (10.3)
              (0.11)  (0.13)

In contrast to the regression using the 1982 data, the coefficient on the real beer
tax is statistically significant at the 1% level (the t-statistic is 3.43). Curiously, the

estimated coefficients for the 1982 and the 1988 data are positive! Taken literally,
higher real beer taxes are associated with more, not fewer, traffic fatalities.

Should we conclude that an increase in the tax on beer leads to more traffic
deaths? Not necessarily, because these regressions could have substantial omitted
variable bias. Many factors affect the fatality rate, including the quality of the automobiles driven in the state, whether the state highways are in good repair, whether
most driving is rural or urban, the density of cars on the road, and whether it is
socially acceptable to drink and drive. Any of these factors may be correlated with
alcohol taxes; and if they are, they will lead to omitted variable bias. One approach
to these potential sources of omitted variable bias would be to collect data on all
these variables and add them to the annual cross-sectional regressions in Equations (10.2) and (10.3). Unfortunately, some of these variables, such as the cultural
acceptance of drinking and driving, might be very hard or even impossible to
measure.
If these factors remain constant over time in a given state, however, then
another route is available. Because we have panel data, we can in effect hold these
factors constant, even though we cannot measure them. To do so, we use OLS
regression with fixed effects.

10.2 Panel Data with Two Time Periods: "Before and After" Comparisons
When data for each state are obtained for T = 2 time periods, it is possible to
compare values of the dependent variable in the second period to values in the

first period. By focusing on changes in the dependent variable, this "before and
after" comparison in effect holds constant the unobserved factors that differ from
one state to the next but do not change over time within the state.

Let Z_i be a variable that determines the fatality rate in the ith state but does
not change over time (so the t subscript is omitted). For example, Z_i might be the
local cultural attitude toward drinking and driving, which changes slowly and thus
could be considered to be constant between 1982 and 1988. Accordingly, the population linear regression relating Z_i and the real beer tax to the fatality rate is

FatalityRate_it = β0 + β1 BeerTax_it + β2 Z_i + u_it,    (10.4)

where u_it is the error term and i = 1, ..., 48, t = 1982, 1988.

Because Z_i does not change over time, in the regression model in Equation
(10.4) it will not produce any change in the fatality rate between 1982 and 1988.
Thus, in this regression model, the influence of Z_i can be eliminated by analyzing
the change in the fatality rate between the two periods. To see this mathematically,
consider Equation (10.4) for each of the two years, 1982 and 1988:

FatalityRate_i,1982 = β0 + β1 BeerTax_i,1982 + β2 Z_i + u_i,1982,    (10.5)

FatalityRate_i,1988 = β0 + β1 BeerTax_i,1988 + β2 Z_i + u_i,1988.    (10.6)

Subtracting Equation (10.5) from Equation (10.6) eliminates the effect of Z_i:

FatalityRate_i,1988 - FatalityRate_i,1982 = β1(BeerTax_i,1988 - BeerTax_i,1982) + u_i,1988 - u_i,1982.    (10.7)

This specification has an intuitive interpretation. Cultural attitudes toward drinking and driving affect the level of drunk driving, and thus the traffic fatality rate, in
a state. If, however, they did not change between 1982 and 1988, then they did not
produce any change in fatalities in the state. Rather, any changes in traffic fatalities over time must have arisen from other sources. In Equation (10.7), these other
sources are changes in the tax on beer or changes in the error term (which captures changes in other factors that determine traffic deaths).

Specifying the regression in changes in Equation (10.7) eliminates the effect
of the unobserved variables Z_i that are constant over time. In other words, analyzing changes in Y and X has the effect of controlling for variables that are constant over time, thereby eliminating this source of omitted variable bias.
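The algebra in Equations (10.5) through (10.7) can be checked by simulation. The sketch below uses made-up data (500 entities rather than 48 states, to keep simulation noise small) in which the fixed factor z is correlated with the regressor; the level regression is biased by the omitted z, while the regression in differences recovers the true slope:

```python
# Sketch: differencing two periods of panel data removes a time-invariant
# omitted factor z that is correlated with the regressor. Simulated data;
# the true slope is beta1 = -1 and the omitted factor enters with beta2 = 1.
import random

random.seed(1)

def ols_slope(x, y):
    # Slope of an OLS regression of y on x (with an intercept).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

n, beta1, beta2 = 500, -1.0, 1.0
z = [random.gauss(0, 1) for _ in range(n)]        # fixed entity factor
x1 = [zi + random.gauss(0, 1) for zi in z]        # regressor correlated with z
x2 = [zi + random.gauss(0, 1) for zi in z]
y1 = [beta1 * a + beta2 * zi + random.gauss(0, 0.1) for a, zi in zip(x1, z)]
y2 = [beta1 * a + beta2 * zi + random.gauss(0, 0.1) for a, zi in zip(x2, z)]

slope_level = ols_slope(x1, y1)   # biased upward: z is omitted
slope_diff = ols_slope([b - a for a, b in zip(x1, x2)],    # delta X
                       [b - a for a, b in zip(y1, y2)])    # delta Y: z drops out

print(slope_level, slope_diff)
```

The level slope sits well above the true value of -1 because z is omitted, while the differenced slope is close to -1, mirroring how Z_i drops out of Equation (10.7).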


FIGURE 10.2   Changes in Fatality Rates and Beer Taxes, 1982-1988

[Figure: a scatterplot of the change in the traffic fatality rate against the change in the real beer tax (dollars per case, in 1988 dollars) between 1982 and 1988 for 48 states. There is a negative relationship between changes in the fatality rate and changes in the beer tax.]

Figure 10.2 presents a scatterplot of the change in the fatality rate between
1982 and 1988 against the change in the real beer tax between 1982 and 1988 for
the 48 states in our data set. A point in Figure 10.2 represents the change in the
fatality rate and the change in the real beer tax between 1982 and 1988 for a given
state. The OLS regression line, estimated using these data and plotted in the figure, is

FatalityRate_1988 - FatalityRate_1982 = -0.072 - 1.04(BeerTax_1988 - BeerTax_1982),    (10.8)
                                        (0.065)  (0.36)

where including an intercept allows for the possibility that the mean change in the
fatality rate, in the absence of a change in the real beer tax, is nonzero.
In contrast to the cross-sectional regression results, the estimated effect of a
change in the real beer tax is negative, as predicted by economic theory. The hypothesis that the population slope coefficient is zero is rejected at the 5% significance
level. According to this estimated coefficient, an increase in the real beer tax by $1
per case reduces the traffic fatality rate by 1.04 deaths per 10,000 people. This estimated effect is very large: The average fatality rate is approximately 2 in these data,


so the estimate suggests that traffic fatalities can be cut in half merely by increasing the real tax on beer by $1 per case.

By examining changes in the fatality rate over time, the regression in Equation (10.8) controls for fixed factors such as cultural attitudes toward drinking and driving. But there are many factors that influence traffic safety, and if they change over time and are correlated with the real beer tax, then their omission will produce omitted variable bias. Section 10.6 undertakes a more careful analysis that controls for several such factors, so for now it is best to refrain from drawing any substantive conclusions about the effect of real beer taxes on traffic fatalities.

This "before and after" analysis works when the data are observed in two different years. Our data set, however, contains observations for seven different years, and it seems foolish to discard those potentially useful additional data. But the "before and after" method does not apply directly when T > 2. To analyze all the observations in our panel data set, we use the method of fixed effects regression.
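The differencing idea behind the "before and after" estimator can be sketched numerically. The following is a minimal simulation (the data, seed, and variable names are illustrative assumptions, not the chapter's actual data set): the unobserved state effect a_i is constant over time, so it cancels when 1988 values are differenced against 1982 values, and OLS on the changes recovers the slope even though the regressor is correlated with the omitted state effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 48  # number of states (simulated, mirroring the chapter's panel)

a = rng.normal(2.0, 0.5, size=n)                # unobserved state effects
x82 = 0.5 * a + rng.uniform(0.1, 1.0, size=n)   # beer tax correlated with a
x88 = x82 + rng.normal(0.2, 0.2, size=n)        # tax changes, 1982 -> 1988
beta1 = -1.0                                    # true effect (an assumption)
y82 = a + beta1 * x82 + rng.normal(0, 0.05, size=n)
y88 = a + beta1 * x88 + rng.normal(0, 0.05, size=n)

# The state effect a_i is constant over time, so it differences out:
# regress the change in Y on the change in X, with an intercept,
# as in Equation (10.8).
dy, dx = y88 - y82, x88 - x82
X = np.column_stack([np.ones(n), dx])
intercept, slope = np.linalg.lstsq(X, dy, rcond=None)[0]
print(slope)  # close to the true beta1 = -1.0, despite the omitted a_i
```

Regressing y88 on x88 in levels would instead be contaminated by the omitted a_i, which is exactly the omitted variable bias the text describes.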

10.3 Fixed Effects Regression


Fixed effects regression is a method for controlling for omitted variables in panel data when the omitted variables vary across entities (states) but do not change over time. Unlike the "before and after" comparisons of Section 10.2, fixed effects regression can be used when there are two or more time observations for each entity.

The fixed effects regression model has n different intercepts, one for each entity. These intercepts can be represented by a set of binary (or indicator) variables. These binary variables absorb the influences of all omitted variables that differ from one entity to the next but are constant over time.

The Fixed Effects Regression Model


Consider the regression model in Equation (10.4) with the dependent variable (FatalityRate) and observed regressor (BeerTax) denoted as Y_it and X_it, respectively:

Y_it = β0 + β1 X_it + β2 Z_i + u_it,   (10.9)

where Z_i is an unobserved variable that varies from one state to the next but does not change over time (for example, Z_i represents cultural attitudes toward drinking and driving). We want to estimate β1, the effect on Y of X holding constant the unobserved state characteristics Z.
Because Z_i varies from one state to the next but is constant over time, the population regression model in Equation (10.9) can be interpreted as having n intercepts, one for each state. Specifically, let α_i = β0 + β2 Z_i. Then Equation (10.9) becomes

Y_it = β1 X_it + α_i + u_it.   (10.10)

Equation (10.10) is the fixed effects regression model, in which α_1, ..., α_n are treated as unknown intercepts to be estimated, one for each state. The interpretation of α_i as a state-specific intercept in Equation (10.10) comes from considering the population regression line for the ith state: this population regression line is α_i + β1 X_it. The slope coefficient of the population regression line, β1, is the same for all states, but the intercept of the population regression line varies from one state to the next.

Because the intercept α_i in Equation (10.10) can be thought of as the "effect" of being in entity i (in the current application, entities are states), the terms α_1, ..., α_n are known as entity fixed effects. The variation in the entity fixed effects comes from omitted variables that, like Z_i in Equation (10.9), vary across entities but not over time.

The state-specific intercepts in the fixed effects regression model also can be expressed using binary variables to denote the individual states. Section 8.3 considered the case in which the observations belong to one of two groups and the population regression line has the same slope for both groups but different intercepts (see Figure 8.8a). That population regression line was expressed mathematically using a single binary variable indicating one of the groups (case #1 in Key Concept 8.4). If we had only two states in our data set, that binary variable regression model would apply here. Because we have more than two states, however, we need additional binary variables to capture all the state-specific intercepts in Equation (10.10).
To develop the fixed effects regression model using binary variables, let D1_i be a binary variable that equals 1 when i = 1 and equals 0 otherwise; let D2_i equal 1 when i = 2 and equal 0 otherwise; and so on. We cannot include all n binary variables plus a common intercept, for if we do the regressors will be perfectly multicollinear (this is the "dummy variable trap" of Section 6.7), so we arbitrarily omit the binary variable D1_i for the first group. Accordingly, the fixed effects regression model in Equation (10.10) can be written equivalently as



Y_it = β0 + β1 X_it + γ2 D2_i + γ3 D3_i + ... + γn Dn_i + u_it,   (10.11)

where β0, β1, γ2, ..., γn are unknown coefficients to be estimated. To derive the relationship between the coefficients in Equation (10.11) and the intercepts in Equation (10.10), compare the population regression lines for each state in the two equations. In Equation (10.11), the population regression equation for the first state is β0 + β1 X_it, so α_1 = β0. For the second and remaining states, it is β0 + γ_i + β1 X_it, so α_i = β0 + γ_i for i ≥ 2.

Thus, there are two equivalent ways to write the fixed effects regression model, Equations (10.10) and (10.11). In Equation (10.10), it is written in terms of n state-specific intercepts. In Equation (10.11), the fixed effects regression model has a common intercept and n − 1 binary regressors. In both formulations, the slope coefficient on X is the same from one state to the next. The state-specific intercepts in Equation (10.10) and the binary regressors in Equation (10.11) have the same source: the unobserved variable Z_i that varies across states but not over time.
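The dummy variable trap mentioned above can be checked directly: with all n entity dummies plus a common intercept, the design matrix is rank deficient because the dummy columns sum to the intercept column. A small sketch (the entity and period counts are arbitrary assumptions):

```python
import numpy as np

n_states, T = 3, 4
state = np.repeat(np.arange(n_states), T)  # entity index for each observation

# All n state dummies plus a common intercept: the dummy columns sum to
# the intercept column, so the regressors are perfectly multicollinear
# (the "dummy variable trap").
D_all = (state[:, None] == np.arange(n_states)).astype(float)
X_trap = np.column_stack([np.ones(n_states * T), D_all])
print(np.linalg.matrix_rank(X_trap))  # 3: one short of the 4 columns

# Dropping the first dummy, as in Equation (10.11), restores full rank.
X_ok = np.column_stack([np.ones(n_states * T), D_all[:, 1:]])
print(np.linalg.matrix_rank(X_ok))    # 3: full column rank
```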

Extension to multiple X's. If there are other observed determinants of Y that are correlated with X and that change over time, then these should also be included in the regression to avoid omitted variable bias. Doing so results in the fixed effects regression model with multiple regressors, summarized in Key Concept 10.2.

Estimation and Inference


In principle, the binary variable specification of the fixed effects regression model [Equation (10.13)] can be estimated by OLS. This regression, however, has k + n regressors (the k X's, the n − 1 binary variables, and the intercept), so in practice this OLS regression is tedious or, in some software packages, impossible to implement if the number of entities is large. Econometric software therefore has special routines for OLS estimation of fixed effects regression models. These special routines are equivalent to using OLS on the full binary variable regression, but are faster because they employ some mathematical simplifications that arise in the algebra of fixed effects regression.

The "entity-demeaned" OLS algorithm. Regression software typically computes the OLS fixed effects estimator in two steps. In the first step, the entity-specific average is subtracted from each variable. In the second step, the regression is estimated using "entity-demeaned" variables. Specifically, consider the case of a single regressor in the version of the fixed effects model in Equation (10.10) and

KEY CONCEPT 10.2: THE FIXED EFFECTS REGRESSION MODEL

The fixed effects regression model is

Y_it = β1 X1_it + β2 X2_it + ... + βk Xk_it + α_i + u_it,   (10.12)

where i = 1, ..., n and t = 1, ..., T, where X1_it is the value of the first regressor for entity i in time period t, X2_it is the value of the second regressor, and so forth, and α_1, ..., α_n are entity-specific intercepts.

Equivalently, the fixed effects regression model can be written in terms of a common intercept, the X's, and n − 1 binary variables representing all but one entity:

Y_it = β0 + β1 X1_it + ... + βk Xk_it + γ2 D2_i + γ3 D3_i + ... + γn Dn_i + u_it,   (10.13)

where D2_i = 1 if i = 2 and D2_i = 0 otherwise, and so forth.

take the average of both sides of Equation (10.10); then Ȳ_i = β1 X̄_i + α_i + ū_i, where Ȳ_i = (1/T) Σ_{t=1}^{T} Y_it, and X̄_i and ū_i are defined similarly. Thus Equation (10.10) implies that Y_it − Ȳ_i = β1(X_it − X̄_i) + (u_it − ū_i). Let Ỹ_it = Y_it − Ȳ_i, X̃_it = X_it − X̄_i, and ũ_it = u_it − ū_i; accordingly,

Ỹ_it = β1 X̃_it + ũ_it.   (10.14)

Thus β1 can be estimated by the OLS regression of the "entity-demeaned" variables Ỹ_it on X̃_it. In fact, this estimator is identical to the OLS estimator of β1 obtained by estimation of the fixed effects model in Equation (10.11) using n − 1 binary variables (Exercise 18.6).
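The equivalence between the entity-demeaned estimator of Equation (10.14) and the binary-variable estimator of Equation (10.11) can be verified numerically. This sketch uses simulated data; the sizes, seed, and true coefficient are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 10, 7
i = np.repeat(np.arange(n), T)               # entity index per observation

alpha = rng.normal(0, 1, size=n)             # entity fixed effects
x = rng.normal(0, 1, size=n * T) + alpha[i]  # X correlated with the effects
y = 2.0 * x + alpha[i] + rng.normal(0, 0.5, size=n * T)

# (a) Binary-variable (LSDV) estimator: intercept, X, and n - 1 dummies,
# as in Equation (10.11).
D = (i[:, None] == np.arange(1, n)).astype(float)
X_lsdv = np.column_stack([np.ones(n * T), x, D])
b_lsdv = np.linalg.lstsq(X_lsdv, y, rcond=None)[0][1]

# (b) Entity-demeaned estimator, as in Equation (10.14): subtract each
# entity's time average, then run OLS without an intercept.
def demean(v):
    entity_means = np.bincount(i, weights=v) / T
    return v - entity_means[i]

x_dm, y_dm = demean(x), demean(y)
b_within = (x_dm @ y_dm) / (x_dm @ x_dm)

print(b_lsdv, b_within)  # the two estimates agree to rounding error
```

The agreement is exact (up to floating-point error) because partialling out the entity dummies is algebraically the same operation as entity demeaning.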

The "before and after" regression vs. fixed effects estimation. Although Equation (10.11) with its binary variables looks quite different than the "before and after" regression model in Equation (10.7), in the special case that T = 2 the OLS estimator of β1 from the binary variable specification and from the "before and after" specification are identical if the intercept is excluded from the "before and after" specification. Thus, when T = 2, there are three ways to estimate β1 by OLS: the "before and after" specification in Equation (10.7) (without an intercept), the binary variable specification in Equation (10.11), and the "entity-demeaned" specification in Equation (10.14). These three methods are equivalent; that is, they produce identical OLS estimates.


The sampling distribution, standard errors, and statistical inference. In multiple regression with cross-sectional data, if the four least squares assumptions in Key Concept 6.4 hold, then the sampling distribution of the OLS estimator is normal in large samples. The variance of this sampling distribution can be estimated from the data, and the square root of this estimator of the variance (that is, the standard error) can be used to test hypotheses using a t-statistic and to construct confidence intervals.

Similarly, in multiple regression with panel data, if a set of assumptions (called the fixed effects regression assumptions) hold, then the sampling distribution of the fixed effects OLS estimator is normal in large samples. The variance of that distribution can be estimated from the data, the square root of that estimator is the standard error, and the standard error can be used to construct t-statistics and confidence intervals. Given the standard error, statistical inference (testing hypotheses, including joint hypotheses using F-statistics, and constructing confidence intervals) proceeds in exactly the same way as in multiple regression with cross-sectional data.

The fixed effects regression assumptions and standard errors for fixed effects regression are discussed further in Section 10.5.

Application to Traffic Deaths


The OLS estimate of the fixed effects regression line relating the real beer tax to the fatality rate, based on all seven years of data (336 observations), is

FatalityRate^ = −0.66 BeerTax + StateFixedEffects,   (10.15)
                (0.29)

where, as is conventional, the estimated state fixed effects are not reported to save space and because they are not of primary interest in this application.

Like the "differences" specification in Equation (10.8), the estimated coefficient in the fixed effects regression in Equation (10.15) is negative, so that, as predicted by economic theory, higher real beer taxes are associated with fewer traffic deaths, the opposite of what we found in the initial cross-sectional regressions of Equations (10.2) and (10.3). The two regressions are not identical because the "differences" regression in Equation (10.8) uses only the data for 1982 and 1988 (specifically, the difference between those two years), whereas the fixed effects regression in Equation (10.15) uses the data for all seven years. Because of the additional observations, the standard error is smaller in Equation (10.15) than in Equation (10.8).


Including state fixed effects in the fatality rate regression lets us avoid omitted variable bias arising from omitted factors, such as cultural attitudes toward drinking and driving, that vary across states but are constant over time within a state. Still, a skeptic might suspect that there are other factors that could lead to omitted variable bias. For example, over this period cars were getting safer and occupants were increasingly wearing seat belts; if the real tax on beer rose on average during the mid-1980s, then it could be picking up the effect of overall automobile safety improvements. If, however, safety improvements evolved over time but were the same for all states, then we can eliminate their influence by including time fixed effects.
10.4 Regression with Time Fixed Effects

Just as fixed effects for each entity can control for variables that are constant over time but differ across entities, so can time fixed effects control for variables that are constant across entities but evolve over time.

Because safety improvements in new cars are introduced nationally, they serve to reduce traffic fatalities in all states. So it is plausible to think of automobile safety as an omitted variable that changes over time but has the same value for all states. The population regression in Equation (10.9) can be modified to include the effect of automobile safety, which we will denote by S_t:

Y_it = β0 + β1 X_it + β2 Z_i + β3 S_t + u_it,   (10.16)

where S_t is unobserved and where the single "t" subscript emphasizes that safety changes over time but is constant across states. Because β3 S_t represents variables that determine Y_it, if S_t is correlated with X_it, then omitting S_t from the regression leads to omitted variable bias.

Time Effects Only

For the moment, suppose that the variables Z_i are not present, so that the term β2 Z_i can be dropped from Equation (10.16), although the term β3 S_t remains. Our objective is to estimate β1, controlling for S_t.

Although S_t is unobserved, its influence can be eliminated because it varies over time but not across states, just as it is possible to eliminate the effect of Z_i, which varies across states but not over time. In the entity fixed effects model, the presence of Z_i leads to the fixed effects regression model in Equation (10.10), in


which each state has its own intercept (or fixed effect). Similarly, because S_t varies over time but not over states, the presence of S_t leads to a regression model in which each time period has its own intercept.

The time fixed effects regression model with a single X regressor is

Y_it = β1 X_it + λ_t + u_it.   (10.17)

This model has a different intercept, λ_t, for each time period. The intercept λ_t in Equation (10.17) can be thought of as the "effect" on Y of year t (or, more generally, time period t), so the terms λ_1, ..., λ_T are known as time fixed effects. The variation in the time fixed effects comes from omitted variables that, like S_t in Equation (10.16), vary over time but not across entities.

Just as the entity fixed effects regression model can be represented using n − 1 binary indicators, so, too, can the time fixed effects regression model be represented using T − 1 binary indicators:

Y_it = β0 + β1 X_it + δ2 B2_t + ... + δT BT_t + u_it,   (10.18)

where δ2, ..., δT are unknown coefficients, and where B2_t = 1 if t = 2 and B2_t = 0 otherwise, and so forth. As in the fixed effects regression model in Equation (10.11), in this version of the time effects model the intercept is included and the first binary variable (B1_t) is omitted to prevent perfect multicollinearity.

When there are additional observed X regressors, then these regressors appear in Equations (10.17) and (10.18) as well.

In the traffic fatalities regression, the time fixed effects specification allows us to eliminate bias arising from omitted variables like nationally introduced safety standards that change over time but are the same across states in a given year.

Both Entity and Time Fixed Effects


If some omitted variables are constant over time but vary across states (such as cultural norms), while others are constant across states but vary over time (such as national safety standards), then it is appropriate to include both entity (state) and time effects.

The combined entity and time fixed effects regression model is

Y_it = β1 X_it + α_i + λ_t + u_it,   (10.19)

where α_i is the entity fixed effect and λ_t is the time fixed effect. This model can equivalently be represented using n − 1 entity binary indicators and T − 1 time binary indicators, along with an intercept:

Y_it = β0 + β1 X_it + γ2 D2_i + ... + γn Dn_i + δ2 B2_t + ... + δT BT_t + u_it,   (10.20)

where β0, β1, γ2, ..., γn, δ2, ..., δT are unknown coefficients.

When there are additional observed X regressors, then these appear in Equations (10.19) and (10.20) as well.

The combined state and time fixed effects regression model eliminates omitted variables bias arising both from unobserved variables that are constant over time and from unobserved variables that are constant across states.

Estimation. The time fixed effects model and the entity and time fixed effects model are both variants of the multiple regression model. Thus their coefficients can be estimated by OLS by including the additional time binary variables. Alternatively, in a balanced panel the coefficients on the X's can be computed by first deviating Y and the X's from their entity and time-period means and then estimating the multiple regression equation of deviated Y on the deviated X's. This algorithm, which is commonly implemented in regression software, eliminates the need to construct the full set of binary indicators that appear in Equation (10.20). An equivalent approach is to deviate Y, the X's, and the time indicators from their state (but not time) means and to estimate k + T coefficients by multiple regression of the deviated Y on the deviated X's and the deviated time indicators. Finally, if T = 2, the entity and time fixed effects regression can be estimated using the "before and after" approach of Section 10.2, including the intercept in the regression. Thus the "before and after" regression reported in Equation (10.8), in which the change in FatalityRate from 1982 to 1988 is regressed on the change in BeerTax from 1982 to 1988 including an intercept, provides the same estimate of the slope coefficient as the OLS regression of FatalityRate on BeerTax, including entity and time fixed effects, estimated using data for the two years 1982 and 1988.
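For a balanced panel, the double-demeaning shortcut described above gives the same slope as the full regression with entity and time dummies. A small simulated check (all names, sizes, and the true coefficient are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 6, 5
i = np.repeat(np.arange(n), T)   # entity index
t = np.tile(np.arange(T), n)     # time index

alpha, lam = rng.normal(0, 1, size=n), rng.normal(0, 1, size=T)
x = rng.normal(0, 1, size=n * T) + alpha[i] + lam[t]
y = 1.5 * x + alpha[i] + lam[t] + rng.normal(0, 0.3, size=n * T)

# (a) Full dummy regression: intercept, X, n-1 entity and T-1 time
# dummies, as in Equation (10.20).
Di = (i[:, None] == np.arange(1, n)).astype(float)
Bt = (t[:, None] == np.arange(1, T)).astype(float)
X_dum = np.column_stack([np.ones(n * T), x, Di, Bt])
b_dum = np.linalg.lstsq(X_dum, y, rcond=None)[0][1]

# (b) Balanced-panel shortcut: deviate each variable from its entity and
# time-period means (adding back the grand mean), then run OLS.
def deviate(v):
    return (v - (np.bincount(i, weights=v) / T)[i]
              - (np.bincount(t, weights=v) / n)[t] + v.mean())

x_dv, y_dv = deviate(x), deviate(y)
b_within = (x_dv @ y_dv) / (x_dv @ x_dv)

print(b_dum, b_within)  # identical up to rounding error
```

The shortcut works here because, in a balanced panel, projecting onto the span of the entity and time dummies is the same as subtracting entity and time means and adding back the grand mean.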

Application to traffic deaths. Adding time effects to the state fixed effects regression results in the OLS estimate of the regression line:

FatalityRate^ = −0.64 BeerTax + StateFixedEffects + TimeFixedEffects,   (10.21)
                (0.25)


This specification includes the beer tax, 47 state binary variables (state fixed effects), 6 year binary variables (time fixed effects), and an intercept, so this regression actually has 1 + 47 + 6 + 1 = 55 right-hand variables! The coefficients on the time and state binary variables and the intercept are not reported because they are not of primary interest.

Including the time effects has little impact on the estimated relationship between the real beer tax and the fatality rate [compare Equations (10.15) and (10.21)], and the coefficient on the real beer tax remains significant at the 5% level (t = −0.64/0.25 = −2.56).
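The significance calculation quoted above is simple arithmetic:

```python
# Coefficient and standard error from Equation (10.21).
coef, se = -0.64, 0.25
t_stat = coef / se
print(t_stat)  # -2.56
# |t| = 2.56 exceeds 1.96, the 5% two-sided critical value, so the
# coefficient differs significantly from zero at the 5% level.
```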
This estimated relationship between the real beer tax and traffic fatalities is immune to omitted variable bias from variables that are constant either over time or across states. However, many important determinants of traffic deaths do not fall into this category, so this specification could still be subject to omitted variable bias. Section 10.6 therefore undertakes a more complete empirical examination of the effect of the beer tax and of laws aimed directly at eliminating drunk driving, controlling for a variety of factors. Before turning to that study, we first discuss the assumptions underlying panel data regression and the construction of standard errors for fixed effects estimators.

10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression
The standard errors reported so far in this chapter were computed using the usual heteroskedasticity-robust formula. These heteroskedasticity-robust standard errors are valid in panel data when T is moderate or large, under a set of five assumptions called the fixed effects regression assumptions. The first four of these assumptions extend the four least squares assumptions for cross-sectional data (Key Concept 6.4) to panel data. The fifth assumption requires the errors u_it to be uncorrelated over time for each entity. In some panel data settings, the fifth assumption is implausible, in which case a different standard error formula should be used. To keep the notation as simple as possible, this section focuses on the entity fixed effects regression model of Section 10.3, in which there are no time effects.

The Fixed Effects Regression Assumptions


The fixed effects regression assumptions are summarized in Key Concept 10.3. The first four of these assumptions extend the four least squares assumptions, stated for cross-sectional data in Key Concept 6.4, to panel data.

KEY CONCEPT 10.3: THE FIXED EFFECTS REGRESSION ASSUMPTIONS

There are five assumptions for the panel data regression model with entity fixed effects (Key Concept 10.2). Stated for a single observed regressor, the five assumptions are as follows:

1. E(u_it | X_i1, X_i2, ..., X_iT, α_i) = 0.
2. (X_i1, X_i2, ..., X_iT, u_i1, u_i2, ..., u_iT), i = 1, ..., n, are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: (X_it, u_it) have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
5. The errors for a given entity are uncorrelated over time, conditional on the regressors; specifically, cov(u_it, u_is | X_i1, X_i2, ..., X_iT, α_i) = 0 for t ≠ s.

For multiple observed regressors, X_it should be replaced by the full list X1_it, X2_it, ..., Xk_it.

The first assumption, that u_it has conditional mean zero, plays the role of the first least squares assumption for cross-sectional data and implies that there is no omitted variable bias. The second assumption holds if entities are selected by simple random sampling from the population. The third and fourth assumptions for fixed effects regression are analogous to the third and fourth least squares assumptions for cross-sectional data in Key Concept 6.4.
The fifth assumption is that the errors u_it in the fixed effects regression model are uncorrelated over time, conditional on the regressors. This assumption is new and does not arise in cross-sectional data, which do not have a time dimension. One way to understand this assumption is to recall that u_it consists of time-varying factors that are determinants of Y_it but are not included as regressors. In the traffic fatalities application, one such factor is the weather: a particularly snowy winter in Minnesota (that is, a winter with more snow than average for Minnesota, because there already is a "Minnesota" fixed effect in the regression) could result


in unusually treacherous driving and unusually many fatal accidents. If the amount of snow in Minnesota in one year is uncorrelated with the amount of snow in the next year, then this omitted variable (snowfall) is uncorrelated from one year to the next. Stated more generally, if u_it consists of random factors (such as snowfall) that are uncorrelated from one year to the next, conditional on the regressors (the beer tax) and the state (Minnesota) fixed effect, then u_it is uncorrelated from year to year, conditional on the regressors, and the fifth assumption holds.
The fifth assumption might not hold in some applications, however. For example, if unusually snowy winters in Minnesota tend to follow one another in succession, then that omitted factor would be correlated over time. A downturn in the local economy might produce layoffs and diminish commuting traffic, thus reducing traffic fatalities for two or more years as the workers look for new jobs and commuting patterns slowly adjust. Similarly, a major road improvement project might reduce traffic accidents not only in the year of completion but also in future years. Omitted factors like these, which persist over multiple years, will produce correlation in the error term over time.

If u_it is correlated with u_is for different values of s and t, that is, if u_it is correlated over time for a given entity, then u_it is said to be autocorrelated (correlated with itself, at different dates) or serially correlated. Thus, Assumption #5 can be stated as requiring a lack of autocorrelation of u_it, conditional on the X's and the entity fixed effects. If u_it is autocorrelated, then Assumption #5 fails. Autocorrelation is an essential and pervasive feature of time series data and is discussed in detail in Part IV.
Standard Errors for Fixed Effects Regression

If Assumption #5 in Key Concept 10.3 holds, then the errors u_it are uncorrelated over time, conditional on the regressors. In this case, if T is moderate or large, then the usual (heteroskedasticity-robust) standard errors are valid.

If the errors are autocorrelated, then the usual standard error formula is not valid. One way to see this is to draw an analogy to heteroskedasticity. In a regression with cross-sectional data, if the errors are heteroskedastic, then (as discussed in Section 5.4) the homoskedasticity-only standard errors are not valid because they were derived under the false assumption of homoskedasticity. Similarly, if the errors in panel data are autocorrelated, then the usual standard errors will not be valid because they were derived under the false assumption that they are not autocorrelated. Appendix 10.2 provides a mathematical explanation of why the usual standard errors are not valid if the regression errors are autocorrelated.

Standard errors that are valid if u_it is potentially heteroskedastic and potentially correlated over time within an entity are referred to as heteroskedasticity- and autocorrelation-consistent (HAC) standard errors. One type of HAC standard errors are clustered standard errors, which allow the errors to be correlated within a cluster, or grouping, but assume that they are uncorrelated across clusters. In the context of autocorrelation in panel data, the cluster consists of the observations for the same entity, so clustered standard errors allow for correlation of u_it over time within an entity.
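A bare-bones sketch of an entity-clustered variance estimator (the simple sandwich form, with no finite-sample degrees-of-freedom correction; the data are simulated so that errors follow a random walk within each entity and are therefore serially correlated within clusters):

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 40, 7
i = np.repeat(np.arange(n), T)   # entity (cluster) index

x = rng.normal(0, 1, size=n * T)
# Errors follow a random walk within each entity: autocorrelated over
# time within a cluster, independent across clusters.
u = 0.5 * np.cumsum(rng.normal(0, 1, size=(n, T)), axis=1).ravel()
y = 1.0 * x + u

X = np.column_stack([np.ones(n * T), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b

# Cluster-robust sandwich: (X'X)^{-1} [sum_g X_g' u_g u_g' X_g] (X'X)^{-1}.
# Within a cluster the residuals may covary freely; across clusters
# they are treated as independent.
XtX_inv = np.linalg.inv(X.T @ X)
meat = np.zeros((2, 2))
for g in range(n):
    score = X[i == g].T @ resid[i == g]
    meat += np.outer(score, score)
V_cluster = XtX_inv @ meat @ XtX_inv
print("slope:", b[1], "clustered SE:", np.sqrt(V_cluster[1, 1]))
```

Production software typically adds a small-sample correction and supports multi-way clustering; this sketch only illustrates the basic sandwich structure.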

10.6 Drunk Driving Laws and Traffic Deaths

Alcohol taxes are only one way to discourage drinking and driving. States differ in their punishments for drunk driving, and a state that cracks down on drunk driving could do so across the board by toughening laws as well as raising taxes. If so, omitting these laws could produce omitted variable bias in the OLS estimator of the effect of real beer taxes on traffic fatalities, even in regressions with state and time fixed effects. In addition, because vehicle use depends in part on whether drivers have jobs and because tax changes can reflect economic conditions (a state budget deficit can lead to tax hikes), omitting state economic conditions also could result in omitted variable bias.

In this section, we extend the preceding analysis to study the effect on traffic fatalities of drunk driving laws (including beer taxes), holding economic conditions constant. This is done by estimating panel data regressions that include regressors representing other drunk driving laws and state economic conditions.

The results are summarized in Table 10.1. The format of the table is the same as that of the tables of regression results in Chapters 7, 8, and 9. Each column reports a different regression, and each row reports a coefficient estimate and standard error, an F-statistic and p-value, or other information about the regression.

Column (1) in Table 10.1 presents results for the OLS regression of the fatality rate on the real beer tax, and the column (1) estimate is statistically significantly different from zero at the 5% level. According to this estimate, increasing beer taxes increases traffic fatalities! However, the regression in column (2) [reported previously as Equation (10.15)], which includes state fixed effects, suggests that the positive coefficient in regression (1) is the result of omitted variable bias (the coefficient on the real beer tax is −0.66). The regression R̄² jumps from 0.090 to 0.889 when fixed effects are included; evidently, the state fixed effects account for a large amount of the variation in the data.

Little changes when time effects are added, as reported in column (3) [reported previously as Equation (10.21)]. The results in columns (1)–(3) are
TABLE 10.1 Regression Analysis of the Effect of Drunk Driving Laws on Traffic Deaths

Dependent variable: traffic fatality rate (deaths per 10,000).

[The printed table is not legible in this scan. Each column reports a separate regression; rows give coefficient estimates with standard errors in parentheses for the beer tax, the minimum legal drinking age variables (drinking age 18, 19, and 20, and drinking age as a continuous variable), mandatory jail or community service, average vehicle miles per driver, the unemployment rate, and real income per capita (logarithm); further rows indicate whether each specification includes state effects, time effects, and clustered standard errors, and report F-statistics with p-values testing exclusion of groups of variables (for example, time effects = 0), along with R̄².]

These regressions were estimated using panel data for 48 U.S. states from 1982 to 1988 (336 observations); the data are described in Appendix 10.1. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. Individual coefficients are marked when statistically significant at the 5% or 1% significance level.

The next three regressions in Table 10.1 include additional potential determinants of fatality rates, along with time and state effects. The base specification, reported in column (4), includes two sets of legal variables related to drunk driving plus variables that control for the amount of driving and overall state economic conditions.

1. Including the additional variables reduces the estimated coefficient on the real beer tax, although it remains significant at the 5% significance level. One way to evaluate the magnitude of the coefficient is to imagine a state with an average real beer tax doubling its tax; because the average real beer tax in these data is approximately $0.50/case, this entails increasing the tax by $0.50/case. According to the estimate in column (4), the effect of a $0.50 increase (in 1988 dollars) in the beer tax is a decrease in the expected fatality rate by 0.45 × 0.50 = 0.23 death per 10,000. This estimated effect is large: Because the average fatality rate is 2 per 10,000, a reduction of 0.23 corresponds to decreasing the fatality rate to 1.77 per 10,000. That said, the estimate is quite imprecise: Because the standard error on this coefficient is 0.22, the 95% confidence interval for this effect is −0.45 × 0.50 ± 1.96 × 0.22 × 0.50 = (−0.44, −0.01). This wide 95% confidence interval includes values of the true effect that are very nearly zero.
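The arithmetic behind this effect and its confidence interval can be checked directly. The following sketch is an illustration (not code from the text); it simply plugs the column (4) estimates quoted above, a coefficient of −0.45 with a standard error of 0.22, into the formulas for the predicted effect of a $0.50 tax increase and its 95% confidence interval.

```python
# Effect of a $0.50/case increase in the real beer tax, using the
# column (4) estimates quoted in the text (assumed values: coefficient
# -0.45 on the beer tax, standard error 0.22).
coef, se = -0.45, 0.22
dtax = 0.50                        # $0.50/case increase (1988 dollars)

effect = coef * dtax               # change in fatality rate per 10,000
ci_low = effect - 1.96 * se * dtax
ci_high = effect + 1.96 * se * dtax

print(f"effect: {effect:.2f} deaths per 10,000")
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Rounded to two decimals, this reproduces the interval (−0.44, −0.01) reported in the text.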

2. The minimum legal drinking age is estimated to have very little effect on traffic fatalities. The joint hypothesis that the coefficients on the minimum legal drinking age variables are zero cannot be rejected at the 10% significance level: The F-statistic testing the joint hypothesis that the three coefficients are zero is 0.48, with a p-value of 0.696. Moreover, the estimates are small in magnitude. For example, a state with a minimum legal drinking age of 18 is estimated to have a fatality rate higher by 0.028 death per 10,000 than a state with a minimum legal drinking age of 21, holding the other factors in the regression constant.

3. Similarly, the two punishment variables (mandatory jail time and mandatory community service) are estimated to have small effects on traffic fatalities that are not statistically significant.

4. The economic variables have considerable explanatory power for traffic fatalities. High unemployment rates are associated with fewer fatalities: An increase in the unemployment rate by one percentage point is estimated to reduce traffic fatalities by 0.063 death per 10,000. Similarly, high values of real per capita income are associated with high fatalities: The coefficient is 1.81, so a 1% increase in real per capita income is associated with an increase in traffic fatalities of 0.0181 death per 10,000 (see Case I in Key Concept 8.2 for interpretation of this coefficient). According to these estimates, good economic conditions are associated with higher fatalities, perhaps because of increased traffic density when the unemployment rate is low or greater alcohol consumption when income is high. The two economic variables are jointly significant at the 0.1% significance level (the F-statistic is 38.29).

Columns (5) and (6) of Table 10.1 report regressions that check the sensitivity of these conclusions to changes in the base specification. The regression in column (5) drops the variables that control for economic conditions. The result is an increase in the estimated effect of the real beer tax, but no appreciable change in the other coefficients. The sensitivity of the estimated beer tax coefficient to including the economic variables, combined with the statistical significance of the coefficients on those variables, indicates that the economic variables should remain in the base specification. The regression in column (6) examines the sensitivity of the results to using a different functional form for the drinking age (replacing the three indicator variables with the drinking age itself) and combining the two binary punishment variables. The results of regression (4) are not sensitive to these changes.

The final column in Table 10.1 is the regression of column (4), but with clustered standard errors that allow for autocorrelation of the error term within an entity, as discussed in Section 10.5 and Appendix 10.2. The estimated coefficients in columns (4) and (7) are the same; the only difference is the standard errors. The clustered standard errors in column (7) are larger than the standard errors in column (4). Consequently, the conclusion from regression (4) that the coefficients on


drunk driving laws and legal drinking ages are not statistically significant also obtains using the HAC standard errors in column (7). The F-statistics in column (7) are smaller than those in column (4), but there are no qualitative differences in the two sets of F-statistics and p-values. One substantive difference between columns (4) and (7) arises because the HAC standard error on the beer tax coefficient is larger than the standard error in column (4). Consequently, the 95% confidence interval for the effect on fatalities of a change in the beer tax using the HAC standard error, (−1.09, 0.19), is wider than the interval from column (4), (−0.89, −0.01), and the interval computed using the HAC standard error includes zero.

The strength of this analysis is that including state and time fixed effects mitigates the threat of omitted variable bias arising from unobserved variables that are constant either over time or across states. Still, some threats remain. One is that the measure of alcohol taxes used here, the real tax on beer, could move with other alcohol taxes; this suggests interpreting the results as pertaining more broadly than just to beer. A subtler possibility is that increases in the real beer tax could be associated with public education campaigns, perhaps in response to political pressure; if so, changes in the real beer tax could pick up the effect of a broader campaign to reduce drunk driving.

These results present a provocative picture of measures to control drunk driving and traffic fatalities. According to these estimates, neither stiff punishments nor increases in the minimum legal drinking age have important effects on fatalities. In contrast, there is some evidence that increasing alcohol taxes, as measured by the real tax on beer, does reduce traffic deaths. The magnitude of this effect, however, is imprecisely estimated.²

10.7 Conclusion

This chapter showed how multiple observations over time on the same entity can be used to control for unobserved omitted variables that differ across entities but are constant over time. The key insight is that if the unobserved variable does not change over time, then any changes in the dependent variable must be due to influences other than these fixed characteristics. If cultural attitudes toward drinking and driving do not change appreciably over seven years within a state, then explanations for changes in the traffic fatality rate over those seven years must lie elsewhere.

To exploit this insight, you need data in which the same entity is observed in two or more time periods; that is, you need panel data. With panel data, the multiple regression model of Part II can be extended to include a full set of entity binary variables; this is the fixed effects regression model, which can be estimated by OLS. A twist on the fixed effects regression model is to include time fixed effects, which control for unobserved variables that change over time but are constant across entities. Both entity and time fixed effects can be included in the regression to control for variables that vary across entities but are constant over time and for variables that vary over time but are constant across entities.

Despite these virtues, entity and time fixed effects regression cannot control for omitted variables that vary both across entities and over time. And obviously, panel data methods require panel data, which often are not available. Thus there remains a need for a method that can eliminate the influence of unobserved omitted variables when panel data methods cannot do the job. A powerful and general method for doing so is instrumental variables regression, the topic of Chapter 12.

Summary

1. Panel data consist of observations on multiple (n) entities (states, firms, people, and so forth), where each entity is observed at two or more time periods (T).
2. Regression with entity fixed effects controls for unobserved variables that differ from one entity to the next but remain constant over time.
3. When there are two time periods, fixed effects regression can be estimated by a "before and after" regression of the change in Y from the first period to the second on the change in X.
4. Entity fixed effects regression can be estimated by including binary variables for n − 1 entities, plus the observable independent variables (the X's) and an intercept.
5. Time fixed effects control for unobserved variables that are the same across entities but vary over time.
6. A regression with time and entity fixed effects can be estimated by including binary variables for n − 1 entities, binary variables for T − 1 time periods, plus the X's and an intercept.
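Points 3 through 6 can be illustrated numerically. The sketch below is an illustration of the method on simulated data (not code from the text); it verifies that OLS with an intercept and n − 1 entity dummies produces the same slope on X as the entity-demeaned ("within") regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 4                       # 5 entities observed over 4 time periods
alpha = rng.normal(size=n)        # entity fixed effects (unobserved)
X = rng.normal(size=(n, T))
Y = 2.0 * X + alpha[:, None] + rng.normal(scale=0.1, size=(n, T))

# (1) OLS with an intercept and n-1 entity dummies (entity 0 dropped)
y = Y.ravel()
x = X.ravel()
entity = np.repeat(np.arange(n), T)
dummies = (entity[:, None] == np.arange(1, n)).astype(float)
Z = np.column_stack([np.ones(n * T), x, dummies])
beta_dummy = np.linalg.lstsq(Z, y, rcond=None)[0][1]

# (2) Entity-demeaned ("within") regression
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta_within = (Xd * Yd).sum() / (Xd ** 2).sum()

print(beta_dummy, beta_within)    # the two slope estimates coincide
```

The equivalence of the two slopes is an instance of the Frisch–Waugh result: demeaning within entities is exactly what the entity dummies accomplish.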


Key Terms

panel data (350)
balanced panel (350)
unbalanced panel (351)
fixed effects regression model (357)
entity fixed effects (357)
time fixed effects regression model (362)
time fixed effects (362)
entity and time fixed effects regression model (362)
autocorrelated (366)
serially correlated (366)
heteroskedasticity- and autocorrelation-consistent (HAC) standard errors (366)
clustered standard errors (367)

Review the Concepts

10.1 Why is it necessary to use two subscripts, i and t, to describe panel data? What does i refer to? What does t refer to?

10.2 A researcher is using a panel data set on n = 1000 workers over T = 10 years (from 1996 to 2005) that contains the workers' earnings, gender, education, and age. The researcher is interested in the effect of education on earnings. Give some examples of unobserved person-specific variables that are correlated with both education and earnings. Can you think of examples of time-specific variables that might be correlated with education and earnings? How would you control for these person-specific and time-specific effects in a panel data regression?

10.3 Can the regression that you suggested in response to Question 10.2 be used to estimate the effect of gender on an individual's earnings? Can that regression be used to estimate the effect of the national unemployment rate on an individual's earnings? Explain.

Exercises

10.1 This question refers to the drunk driving panel data regression summarized in Table 10.1.

a. New Jersey has a population of 8.1 million people. Suppose that New Jersey increased the tax on a case of beer by $1 (in 1988 dollars). Use the results in column (4) to predict the number of lives that would be saved over the next year. Construct a 95% confidence interval for your answer.


b. The drinking age in New Jersey is 21. Suppose that New Jersey lowered its drinking age to 18. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. Construct a 95% confidence interval for your answer.

c. Suppose that real income per capita in New Jersey increases by 1% in the next year. Use the results in column (4) to predict the change in the number of traffic fatalities in the next year. Construct a 90% confidence interval for your answer.

d. Should time effects be included in the regression? Why or why not?

e. The estimate of the coefficient on beer tax in column (5) is significant at the 1% level. The estimate in column (4) is significant at the 5% level. Does this mean that the estimate in (5) is more reliable?

f. A researcher conjectures that the unemployment rate has a different effect on traffic fatalities in the western states than in the other states. How would you test this hypothesis? (Be specific about the specification of the regression and the statistical test you would use.)
10.2 Consider the binary variable version of the fixed effects model in Equation (10.11), except with an additional regressor, D1i; that is, let

Yit = β0 + β1Xit + γ1D1i + γ2D2i + ⋯ + γnDni + uit.

a. Suppose that n = 3. Show that the binary regressors and the "constant" regressor are perfectly multicollinear; that is, express one of the variables D1i, D2i, D3i, and X0,it as a perfect linear function of the others, where X0,it = 1 for all i, t.

b. Show the result in (a) for general n.

c. What will happen if you try to estimate the coefficients of the regression by OLS?
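The perfect multicollinearity in Exercise 10.2(a) is easy to see numerically. In this illustrative sketch (mine, not the book's), the three entity dummies sum to the constant regressor for every observation, so a regressor matrix containing all of them has deficient rank:

```python
import numpy as np

n, T = 3, 2
entity = np.repeat(np.arange(n), T)                   # i index per observation
D = (entity[:, None] == np.arange(n)).astype(float)   # columns D1, D2, D3
X0 = np.ones(n * T)                                   # the "constant" regressor

# D1 + D2 + D3 = X0 for every observation: perfect multicollinearity
print(np.allclose(D.sum(axis=1), X0))                 # True

M = np.column_stack([X0, D])                          # constant plus all n dummies
print(np.linalg.matrix_rank(M))                       # rank 3 < 4 columns
```

Because M is rank-deficient, the OLS normal equations have no unique solution, which is what part (c) is getting at.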
10.3 Section 9.2 gave a list of five potential threats to the internal validity of a regression study. Apply this list to the empirical analysis in Section 10.6 and thereby draw conclusions about its internal validity.

10.4 Using the regression in Equation (10.11), what are the slope and intercept for

a. Entity 1 in time period 1?

b. Entity 1 in time period 3?

c. Entity 3 in time period 1?

d. Entity 3 in time period 3?

10.5 Consider the model with a single regressor Yit = β1X1,it + αi + λt + uit. This model also can be written as

Yit = β0 + β1X1,it + δ2B2t + ⋯ + δTBTt + γ2D2i + ⋯ + γnDni + uit,

where B2t = 1 if t = 2 and 0 otherwise, D2i = 1 if i = 2 and 0 otherwise, and so forth. How are the coefficients (β0, δ2, …, δT, γ2, …, γn) related to the coefficients (α1, …, αn, λ1, …, λT)?

10.6 Suppose that the fixed effects regression assumptions from Section 10.5 are satisfied. Show that cov(ṽit, ṽis) = 0 for t ≠ s in Equation (10.28).

10.7 A researcher believes that traffic fatalities increase when roads are icy, so that states with more snow will have more fatalities than other states. Comment on the following methods designed to estimate the effect of snow on fatalities:

a. The researcher collects data on the average snowfall for each state and adds this regressor (AverageSnowi) to the regressions given in Table 10.1.

b. The researcher collects data on the snowfall in each state for each year in the sample (Snowit) and adds this regressor to the regressions.

10.8 Consider observations (Yit, Xit) from the linear panel data model

Yit = Xitβ1 + αi + λi t + uit,  t = 1, …, T, i = 1, …, N,

where αi + λi t is an unobserved individual-specific time trend. How would you estimate β1?

10.9 Explain why Assumption #5 given in Section 10.5 is important for fixed effects regression. What happens if Assumption #5 is not true?

10.10 a. In the fixed effects regression model, are the fixed entity effects, αi, consistently estimated as n → ∞ with T fixed? (Hint: Analyze the model with no X's: Yit = αi + uit.)

b. If n is large (say, n = 2000) but T is small (say, T = 4), do you think that the estimated values of αi are approximately normally distributed? Why or why not? (Hint: Analyze the model Yit = αi + uit.)

10.11 In a study of the effect on earnings of education using panel data on annual earnings for a large number of workers, a researcher regresses earnings in a given year on age, education, union status, and the worker's earnings in the previous year, using fixed effects regression. Will this regression give reliable estimates of the effects of the regressors (age, education, union status, and previous year's earnings) on earnings? Explain. (Hint: Check the fixed effects regression assumptions in Section 10.5.)

Empirical Exercises

E10.1 Some U.S. states have enacted laws that allow citizens to carry concealed weapons. These laws are known as "shall-issue" laws because they instruct local authorities to issue a concealed weapons permit to all applicants who are citizens, are mentally competent, and have not been convicted of a felony (some states have some additional restrictions). Proponents argue that, if more people carry concealed weapons, crime will decline because criminals are deterred from attacking other people. Opponents argue that crime will increase because of accidental or spontaneous use of the weapon. In this exercise, you will analyze the effect of concealed weapons laws on violent crimes. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Guns that contains a balanced panel of data from the 50 U.S. states, plus the District of Columbia, for the years 1977–1999.¹ A detailed description is given in Guns_Description, available on the Web site.

a. Estimate (1) a regression of ln(vio) against shall and (2) a regression of ln(vio) against shall, incarc_rate, density, avginc, pop, pb1064, pw1064, and pm1029.

i. Interpret the coefficient on shall in regression (2). Is this estimate large or small in a "real-world" sense?

ii. Does adding the control variables in regression (2) change the estimated effect of a shall-carry law in regression (1), as measured by statistical significance? As measured by the "real-world" significance of the estimated coefficient?

iii. Suggest a variable that varies across states but plausibly varies little, or not at all, over time, and that could cause omitted variable bias in regression (2).

b. Do the results change when you add fixed state effects? If so, which set of regression results is more credible, and why?

¹These data were provided by Professor John Donohue of Stanford University and were used in his paper with Ian Ayres, "Shooting Down the 'More Guns, Less Crime' Hypothesis," Stanford Law Review, 2003, 55: 1193–1312.


c. Do the results change when you add fixed time effects? If so, which set of regression results is more credible, and why?

d. Repeat the analysis using ln(rob) and ln(mur) in place of ln(vio).

e. In your view, what are the most important remaining threats to the internal validity of this regression analysis?

f. Based on your analysis, what conclusions would you draw about the effects of concealed-weapon laws on these crime rates?

E10.2

Traffic crashes are the leading cause of death for Americans between the ages of 5 and 32. Through various spending policies, the federal government has encouraged states to institute mandatory seat belt laws to reduce the number of fatalities and serious injuries. In this exercise you will investigate how effective these laws are in increasing seat belt use and reducing fatalities. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Seatbelts that contains a panel of data from 50 U.S. states, plus the District of Columbia, for the years 1983–1997.² A detailed description is given in Seatbelts_Description, available on the Web site.

a. Estimate the effect of seat belt use on fatalities by regressing FatalityRate on sb_useage, speed65, speed70, ba08, drinkage21, ln(income), and age. Does the estimated regression suggest that increased seat belt use reduces fatalities?

b. Do the results change when you add state fixed effects? Provide an intuitive explanation for why the results changed.

c. Do the results change when you add time fixed effects plus state fixed effects?

d. Which regression specification, (a), (b), or (c), is most reliable? Explain why.

e. Using the results in (c), discuss the size of the coefficient on sb_useage. Is it large? Small? How many lives would be saved if seat belt use increased from 52% to 90%?

f. There are two ways that mandatory seat belt laws are enforced: "Primary" enforcement means that a police officer can stop a car and ticket the driver if the officer observes an occupant not wearing a seat belt; "secondary" enforcement means that a police officer can write a ticket if an occupant is not wearing a seat belt but must have another reason to stop the car. In the data set, primary is a binary variable for primary enforcement and secondary is a binary variable for secondary enforcement. Run a regression of sb_useage on primary, secondary, speed65, speed70, ba08, drinkage21, ln(income), and age, including fixed state and time effects in the regression. Does primary enforcement lead to more seat belt use? What about secondary enforcement?

g. In 2000, New Jersey changed from secondary enforcement to primary enforcement. Estimate the number of lives saved per year by making this change.

²These data were provided by Professor Liran Einav of Stanford University and were used in his paper with Alma Cohen, "The Effects of Mandatory Seat Belt Laws on Driving Behavior and Traffic Fatalities," The Review of Economics and Statistics, 2003, 85(4): 828–843.

APPENDIX

10.1 The State Traffic Fatality Data Set

The data are for the "lower 48" U.S. states (excluding Alaska and Hawaii), annually for 1982 through 1988. The traffic fatality rate is the number of traffic deaths in a given state in a given year, per 10,000 people living in that state in that year. Traffic fatality data were obtained from the U.S. Department of Transportation Fatal Accident Reporting System. The beer tax is the tax on a case of beer, which is a measure of state alcohol taxes more generally. The drinking age variables in Table 10.1 are binary variables indicating whether the legal drinking age is 18, 19, or 20. The two binary punishment variables in Table 10.1 describe the state's minimum sentencing requirements for an initial drunk driving conviction: "Mandatory jail?" equals 1 if the state requires jail time and equals 0 otherwise, and "Mandatory community service?" equals 1 if the state requires community service and equals 0 otherwise. Data on the total vehicle miles traveled annually by state were obtained from the Department of Transportation. Personal income was obtained from the U.S. Bureau of Economic Analysis, and the unemployment rate was obtained from the U.S. Bureau of Labor Statistics.

These data were graciously provided to us by Professor Christopher J. Ruhm of the Department of Economics at the University of North Carolina.

APPENDIX

10.2 Standard Errors for Fixed Effects Regression with Serially Correlated Errors

This appendix provides formulas for standard errors for fixed effects regression when the errors are serially correlated, specifically when Assumption #5 of Key Concept 10.3 does not hold. If uit is autocorrelated, conditional on the X's, then the usual standard error formula is inappropriate. This appendix explains why this is so and presents an alternative formula for standard errors that are valid if there is heteroskedasticity and/or autocorrelation, that is, for heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.

This appendix considers entity-demeaned fixed effects regression with a single regressor X. The formulas given in this appendix are extended to multiple regressors in Exercise 18.15. Throughout, we focus on the case in which the number of entities n is large but the number of time periods T is small; mathematically, this corresponds to treating n as increasing to infinity while T remains fixed.

The Asymptotic Distribution of the Fixed Effects Estimator


The fixed effects estimator of β1 is the OLS estimator obtained using the entity-demeaned regression of Equation (10.14), in which Ỹit is regressed on X̃it, where Ỹit = Yit − Ȳi and X̃it = Xit − X̄i, with Ȳi = (1/T) Σt Yit and X̄i = (1/T) Σt Xit. The formula for the OLS estimator is obtained by replacing Xi − X̄ by X̃it and Yi − Ȳ by Ỹit in Equation (4.7) and by replacing the single summation in Equation (4.7) by two summations, one over entities (i = 1, …, n) and one over time periods (t = 1, …, T):

    β̂1 = [Σi Σt X̃it Ỹit] / [Σi Σt X̃it²].   (10.22)

The derivation of the sampling distribution of β̂1 parallels the derivation in Appendix 4.3 of the sampling distribution of the OLS estimator with cross-sectional data. First, substitute Ỹit = β1X̃it + ũit [Equation (10.14)] into the numerator of Equation (10.22), then rearrange the result to obtain

    β̂1 − β1 = [Σi Σt X̃it ũit] / [Σi Σt X̃it²].   (10.23)

Next, divide the numerator and denominator of the right side of Equation (10.23) by nT, multiply both sides by √(nT), and note that Σt X̃it ũit = Σt X̃it uit. Then

    √(nT)(β̂1 − β1) = [(1/√n) Σi ηi] / Q̂X̃,   (10.24)

where ηi = (1/√T) Σt ṽit, ṽit = X̃it uit, and Q̂X̃ = (1/nT) Σi Σt X̃it². The scaling factor in Equation (10.24), nT, is the total number of observations.

Under the first four assumptions of Key Concept 10.3, Q̂X̃ is consistent for QX̃ = (1/T) Σt E(X̃it²). Also, by the central limit theorem, (1/√n) Σi ηi is distributed N(0, σ²η) for n large, where σ²η is the variance of ηi. It follows from Equation (10.24) that, under Assumptions 1–4,

    √(nT)(β̂1 − β1) is distributed N(0, σ²η/Q²X̃),   (10.25)

where

    σ²η = var(ηi) = var[(1/√T) Σt ṽit].   (10.26)

From Equation (10.25), the variance of the large-sample distribution of β̂1 is

    var(β̂1) = (1/nT)(σ²η/Q²X̃).   (10.27)

Under Assumption 5 of Key Concept 10.3, the expression for σ²η in Equation (10.26) simplifies. Recall that, for two random variables U and V, var(U + V) = var(U) + var(V) + 2cov(U, V). The variance of the sum in Equation (10.26) therefore can be written as the sum of variances, plus covariances:

    σ²η = var[(1/√T)(ṽi1 + ṽi2 + ⋯ + ṽiT)]
        = (1/T)[var(ṽi1) + var(ṽi2) + ⋯ + var(ṽiT)
          + 2cov(ṽi1, ṽi2) + ⋯ + 2cov(ṽi,T−1, ṽiT)].   (10.28)

Under Assumption 5, the errors are uncorrelated across time periods, given the X's, so all the covariances in Equation (10.28) are zero (Exercise 10.6). But if uit is autocorrelated, then the covariances in Equation (10.28) are, in general, nonzero. The usual heteroskedasticity-robust variance estimator sets these covariances to zero, so if uit is autocorrelated the usual heteroskedasticity-robust variance estimator does not consistently estimate σ²η.

In contrast, the so-called clustered variance estimator is valid even if uit is conditionally autocorrelated. The clustered variance estimator is

    σ̂²η,clustered = (1/n) Σi η̂i²,  where η̂i = (1/√T) Σt X̃it ûit,   (10.29)

and ûit is the residual from the OLS fixed effects regression. (Some software implements the clustered variance formula with a degrees-of-freedom adjustment.) The clustered panel data standard errors are given by

    SE(β̂1) = √[(1/nT)(σ̂²η,clustered/Q̂²X̃)]  (clustered standard error).   (10.30)

The clustered variance estimator σ̂²η,clustered is a consistent estimator of σ²η as n → ∞ and T is a fixed constant, even if there is heteroskedasticity and/or autocorrelation (Exercise 18.15); that is, the variance estimator is heteroskedasticity- and autocorrelation-consistent. This variance estimator is called the clustered variance estimator because the errors are grouped into clusters of observations, where the errors can be correlated within the cluster (here, for different time periods but the same entity) but are assumed to be uncorrelated across clusters.

For the clustered variance estimator in Equation (10.29) to be reliable, n should be large. For empirical examples using HAC standard errors in economic panel data, see Bertrand, Duflo, and Mullainathan (2004).

Standard Errors When uit Is Correlated Across Entities

In some cases, uit might be correlated across entities. For example, in a study of earnings, suppose that the sampling scheme selects families by simple random sampling, then tracks all siblings within a family. Because the omitted factors that enter the error term could have common elements for siblings, it is not reasonable to assume that the errors are independent for siblings (even though they are independent for individuals from different families). In the siblings example, families are natural clusters, or groupings, of observations, where uit is correlated within the cluster but not across clusters. The derivation of clustered variances leading to Equation (10.29) can be modified to allow for clusters across entities (for example, families), or across both entities and time.
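Equations (10.22), (10.29), and (10.30) translate directly into code. The following sketch is my own illustration on simulated data (not code from the text): it computes the entity-demeaned fixed effects estimator and its clustered standard error for a single regressor, with errors that are deliberately autocorrelated within each entity, matching the large-n, small-T setting described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 500, 5
X = rng.normal(size=(n, T))
# AR(1) errors within each entity, so u_it is autocorrelated over t
u = np.empty((n, T))
u[:, 0] = rng.normal(size=n)
for t in range(1, T):
    u[:, t] = 0.7 * u[:, t - 1] + rng.normal(size=n)
Y = 1.5 * X + rng.normal(size=(n, 1)) + u    # true beta1 = 1.5, entity effects

# Entity demeaning and the fixed effects estimator, Equation (10.22)
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta1 = (Xd * Yd).sum() / (Xd ** 2).sum()

# Clustered variance and standard error, Equations (10.29)-(10.30)
uhat = Yd - beta1 * Xd                        # fixed effects residuals
eta = (Xd * uhat).sum(axis=1) / np.sqrt(T)    # eta_i = T^{-1/2} sum_t Xd*uhat
sigma2_eta = (eta ** 2).mean()                # (1/n) sum_i eta_i^2
Q = (Xd ** 2).mean()                          # Q-hat = (nT)^{-1} sum Xd^2
se_clustered = np.sqrt(sigma2_eta / Q ** 2 / (n * T))

print(beta1, se_clustered)
```

Because each entity forms one cluster, the within-entity autocorrelation is absorbed into η̂i before averaging across entities, which is exactly what makes the estimator HAC.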


CHAPTER 11

Regression with a Binary Dependent Variable

Two people, identical but for their race, walk into a bank and apply for a mortgage, a large loan so that each can buy an identical house. Does the bank treat them the same way? Are they both equally likely to have their mortgage application accepted? By law they must receive identical treatment. But whether they actually do is a matter of great concern among bank regulators.

Loans are made and denied for many legitimate reasons. For example, if the proposed loan payments take up most or all of the applicant's monthly income, then a loan officer might justifiably deny the loan. Also, even loan officers are human and they can make honest mistakes, so the denial of a single minority applicant does not prove anything about discrimination. Many studies of discrimination thus look for statistical evidence of discrimination, that is, evidence contained in large data sets showing that whites and minorities are treated differently.

But how, precisely, should one check for statistical evidence of discrimination in the mortgage market? A start is to compare the fraction of minority and white applicants who were denied a mortgage. In the data examined in this chapter, gathered from mortgage applications in 1990 in the Boston, Massachusetts, area, 28% of black applicants were denied mortgages but only 9% of white applicants were denied. But this comparison does not really answer the question that opened this chapter, because the black and white applicants were not necessarily "identical but for their race." Instead, we need a method for comparing rates of denial, holding other applicant characteristics constant.



Thi!\ sounds like a job for multiple regression ana lysii)- antl it h. but w11h "
twist The twist is that the dependent variable-whether the applicant is

denied- is binary. In Part II, we regularly used binary variables as regressor~


and they caused no particu lar problems. But when the dep~.:ndcnt vanabk 1!.
binary.th1ngs are more difficult: What does ir mean to fit a lm~ to a \kpcnd~:nt
variable t hat can take oo only two values. 0 and 1?

The answer to this question is to interpret the regression function as a predicted probability. This interpretation is discussed in Section 11.1, and it allows us to apply the multiple regression models from Part II to binary dependent variables. Section 11.1 goes over this "linear probability model." But the predicted probability interpretation also suggests that alternative, nonlinear regression models can do a better job modeling these probabilities. These methods, called "probit" and "logit" regression, are discussed in Section 11.2. Section 11.3, which is optional, discusses the method used to estimate the coefficients of the probit and logit regressions, the method of maximum likelihood estimation. In Section 11.4, we apply these methods to the Boston mortgage application data set to see whether there is evidence of racial bias in mortgage lending.

The binary dependent variable considered in this chapter is an example of a dependent variable with a limited range; in other words, it is a limited dependent variable. Models for other types of limited dependent variables, for example, dependent variables that take on multiple discrete values, are surveyed in Appendix 11.3.

11.1 Binary Dependent Variables and the Linear Probability Model

Whether a mortgage application is accepted or denied is one example of a binary variable. Many other important questions also concern binary outcomes. What is the effect of a tuition subsidy on an individual's decision to go to college? What determines whether a teenager takes up smoking? What determines whether a country receives foreign aid? What determines whether a job applicant is successful? In all these examples, the outcome of interest is binary: The student does or does not go to college, the teenager does or does not take up smoking, a country does or does not receive foreign aid, the applicant does or does not get a job.

This section discusses what distinguishes regression with a binary dependent variable from regression with a continuous dependent variable, then turns to the simplest model to use with binary dependent variables, the linear probability model.

Binary Dependent Variables

The application examined in this chapter is whether race is a factor in denying a mortgage application; the binary dependent variable is whether a mortgage application is denied. The data are a subset of a larger data set compiled by researchers at the Federal Reserve Bank of Boston under the Home Mortgage Disclosure Act (HMDA) and relate to mortgage applications filed in the Boston, Massachusetts, area in 1990. The Boston HMDA data are described in Appendix 11.1.


'vlortgage application~ att. c11mphcatcd anJ ~o " the process by "hich the
hank loan offJcd makes a d.: btnn DH! loau o {{tCI!l m ust forecast wherher

Lh~

; LJ,/c..

applicant will make his or her loan pa) mcnts. O nto impo rtant piece of information

;j ~

arc 10o. o f \ouri ncomc than "On:.! We t herefor~: begtn by looking a r the relationship bc t\\ CCn two variables: the hmnrv dcpenJent variable deny. w hich equals

( <

~ is the size of the requiTe d ltMn payme n ts re luthc to the <tpplicant's income. As
~~
anyone \\ ho has borrowed moncv know~ Ut $ much ea"ICI 10 make oavmcnrs that

1 7':{I
~
1

~,

-1.

I if the mortgage applicatio n was J e nie J and l!ljuah IJ 1f it wac; accepted, and the
contmuous vanaPic PI! ratw, wh1c.h b the ra tio o l the applicant's anllcipated total
1
5
mo nthly loan paymenb to llli or her monthly income.
S ~ )4
Fi~un: 1 1.1 pre e nts a scuttcrplo t o{ tlon '"'r'u' f I ratin for 12- of tht. 2..1; )
~
~n. .~.n llions in tbe datcl ~cl. (Tht. ~t.atlcrplot is ca~tcr to read usmg rhis -ubset of
~he data .) Thi:.. ~catterplot look' diffe rent than the scauerplob of Pan ll because
{' ~
~~ variahie deny is binary. Still. it set:m' to show a rel. tllono;hip between deny and
I
,t~'t!J ratio: f c" applicants with a pa) m.:nl to llli.Otnc,; r.llto ~s than 0.3 ha\C: their ...,
applicauon deml!d, bul m<.l\1 pphcanh Wllh o JM) mcnt-llHncomc ratio cxceedYl~ 111g 0.4 ar~. dcmed.
~ ~-7 ~
T his positi\'c relationship be twee n PI/ rntio a nd deny (t he h igher the PI! ratio.
~ till. greater the fraction of de nials) i<: 'ummn1i?cd in ligun:. II I h~ tht. OLS rcgres''on hnc t.'llmatcd usmg thes~. 12"' uh),l.f\.llton,. \s usual.t llii> line plots the predtct~d value o l deny a a function of tht. rcwcssm, tht. paymenHo-mcomc ra uo.

J wA

Y,.

_/)
-


FIGURE 11.1 Scatterplot of Mortgage Application Denial and the Payment-to-Income Ratio

Mortgage applicants with a high ratio of debt payments to income (P/I ratio) are more likely to have their application denied (deny = 1 if denied, deny = 0 if approved). The linear probability model uses a straight line to model the probability of denial, conditional on the P/I ratio. [Figure: scatterplot of deny versus P/I ratio (horizontal axis from 0.0 to 0.8) with the OLS regression line; "Mortgage denied" points at deny = 1, "Mortgage approved" points at deny = 0.]

For example, when P/I ratio = 0.3, the predicted value of deny is 0.20. But what, precisely, does it mean for the predicted value of the binary variable deny to be 0.20?

The key to answering this question, and more generally to understanding regression with a binary dependent variable, is to interpret the regression as modeling the probability that the dependent variable equals 1. Thus, the predicted value of 0.20 is interpreted as meaning that, when the P/I ratio is 0.3, the probability of denial is estimated to be 20%. Said differently, if there were many applications with P/I ratio = 0.3, then 20% of them would be denied.

This interpretation follows from two facts. First, from Part II, the population regression function is the expected value of Y given the regressors, E(Y | X1, ..., Xk). Second, from Section 2.2, if Y is a 0-1 binary variable, then its expected value (or mean) is the probability that Y = 1; that is, E(Y) = Pr(Y = 1). In the regression context, the expected value is conditional on the value of the regressors, so the probability is conditional on X. Thus, for a binary variable, E(Y | X1, ..., Xk) = Pr(Y = 1 | X1, ..., Xk). In short, for a binary variable the predicted value from the population regression is the probability that Y = 1, given X.

The linear multiple regression model applied to a binary dependent variable is called the linear probability model: "linear" because it is a straight line, and "probability model" because it models the probability that the dependent variable equals 1, in our example the probability of loan denial.

The Linear Probability Model

The linear probability model is the name for the multiple regression model of Part II when the dependent variable is binary rather than continuous. Because the dependent variable Y is binary, the population regression function corresponds to the probability that the dependent variable equals 1, given X. The population coefficient β1 on a regressor X is the change in the probability that Y = 1 associated with a unit change in X. Similarly, the OLS predicted value, computed using the estimated regression function, is the predicted probability that the dependent variable equals 1, and the OLS estimator of β1 estimates the change in the probability that Y = 1 associated with a unit change in X.

Almost all of the tools of Part II carry over to the linear probability model. The coefficients can be estimated by OLS. Ninety-five percent confidence intervals can be formed as ±1.96 standard errors; hypotheses concerning several coefficients can be tested using the F-statistic discussed in Chapter 7; and interactions between variables can be modeled using the methods of Section 8.3. Because the errors of the linear probability model are always heteroskedastic (Exercise 11.8), it is essential that heteroskedasticity-robust standard errors be used for inference.

One tool that does not carry over is the R². When the dependent variable is continuous, it is possible to imagine a situation in which the R² equals 1: All the data lie exactly on the regression line. This is impossible when the dependent variable is binary, unless the regressors are also binary. Accordingly, the R² is not a particularly useful statistic here. We return to measures of fit in the next section.

The linear probability model is summarized in Key Concept 11.1.

Application to the Boston HMDA data. The OLS regression of the binary dependent variable, deny, against the payment-to-income ratio, P/I ratio, estimated using all 2380 observations in our data set is

  predicted deny = −0.080 + 0.604 P/I ratio.   (11.1)
                   (0.032)  (0.098)

The estimated coefficient on P/I ratio is positive, and the population coefficient is statistically significantly different from zero at the 1% level (t = 0.604/0.098 ≅ 6.2). Thus applicants with higher debt payments as a fraction of income are more likely to have their application denied. This coefficient can be used to compute the predicted change in the probability of denial, given a change in the regressor. For example, according to Equation (11.1), if the P/I ratio increases by 0.1, then the probability of denial increases by 0.604 × 0.1 ≅ 0.060, that is, by 6.0 percentage points.
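The predicted probabilities and marginal effects implied by Equation (11.1) are simple arithmetic and can be checked with a short script. The following Python sketch is illustrative only; the coefficient values are hard-coded from the text, and the function name `lpm_prob` is ours, not the text's:

```python
# Linear probability model, Equation (11.1): estimated Pr(deny = 1)
B0, B1 = -0.080, 0.604  # OLS estimates reported in the text

def lpm_prob(pi_ratio):
    """Predicted denial probability at a given payment-to-income ratio."""
    return B0 + B1 * pi_ratio

p = lpm_prob(0.3)   # predicted probability at P/I ratio = 0.3, about 0.101
delta = B1 * 0.1    # effect of a 0.1 increase in the P/I ratio, about 0.060
```

Because the model is linear, the effect of a 0.1 increase in the P/I ratio is the same at every starting value, which is exactly the feature that the probit and logit models of Section 11.2 relax.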


KEY CONCEPT 11.1

THE LINEAR PROBABILITY MODEL

The linear probability model is the linear multiple regression model

  Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui,   (11.2)

where Yi is binary, so that

  Pr(Y = 1 | X1, X2, ..., Xk) = β0 + β1X1 + β2X2 + ... + βkXk.

The regression coefficient β1 is the change in the probability that Y = 1 associated with a unit change in X1, holding constant the other regressors, and so forth for β2, ..., βk. The regression coefficients can be estimated by OLS, and the usual (heteroskedasticity-robust) OLS standard errors can be used for confidence intervals and hypothesis tests.

The estimated linear probability model in Equation (11.1) can be used to compute predicted denial probabilities as a function of the P/I ratio. For example, if projected debt payments are 30% of an applicant's income, then the P/I ratio is 0.3 and the predicted value from Equation (11.1) is −0.080 + 0.604 × 0.3 = 0.101. That is, according to this linear probability model, an applicant whose projected debt payments are 30% of income has a probability of 10.1% that his or her application will be denied. (This is different than the probability of 20% based on the regression line in Figure 11.1, because that line was estimated using only 127 of the 2380 observations used to estimate Equation (11.1).)

What is the effect of race on the probability of denial, holding constant the P/I ratio? To keep things simple, we focus on differences between black and white applicants. To estimate the effect of race, holding constant the P/I ratio, we augment Equation (11.1) with a binary regressor that equals 1 if the applicant is black and equals 0 if the applicant is white. The estimated linear probability model is

  predicted deny = −0.091 + 0.559 P/I ratio + 0.177 black.   (11.3)
                   (0.029)  (0.089)           (0.025)

The coefficient on black, 0.177, indicates that an African American applicant has a 17.7% higher probability of having a mortgage application denied than a white applicant, holding constant their payment-to-income ratio. This coefficient is significant at the 1% level (the t-statistic is 7.11).

Taken literally, this estimate suggests that there might be racial bias in mortgage lending, but such a conclusion would be premature. Although the payment-to-income ratio plays a role in the loan officer's decision, so do many other factors, such as the applicant's earning potential and the individual's credit history. If any of these other factors differ systematically by race, their omission from Equation (11.3) will produce omitted variable bias. We therefore defer any conclusions about discrimination in mortgage lending until we complete the more thorough analysis in Section 11.4.

Shortcomings of the linear probability model. The linearity that makes the linear probability model easy to use is also its major flaw. Look again at Figure 11.1: The estimated line representing the predicted probabilities drops below 0 for very low values of the P/I ratio and exceeds 1 for high values! But this is nonsense: A probability cannot be less than 0 or greater than 1. This nonsensical feature is an inevitable consequence of the linear regression. To address this problem, we introduce new, nonlinear models specifically designed for binary dependent variables, the probit and logit regression models.

11.2 Probit and Logit Regression

Probit Regression

Probit regression with a single regressor. The probit regression model with a single regressor X is

  Pr(Y = 1 | X) = Φ(β0 + β1X),   (11.4)

¹Pronounced prō-bit and lō-jit.


FIGURE 11.2 Probit Model of the Probability of Denial, Given the P/I Ratio

The probit model uses the cumulative normal distribution function to model the probability of denial given the payment-to-income ratio or, more generally, to model Pr(Y = 1 | X). Unlike the linear probability model, the probit conditional probabilities are always between 0 and 1. [Figure: deny versus P/I ratio (horizontal axis from 0.0 to 0.8) with the estimated S-shaped probit regression function; "Mortgage denied" points at deny = 1, "Mortgage approved" points at deny = 0.]

where Φ is the cumulative standard normal distribution function (tabulated in Appendix Table 1).

For example, suppose Y is the binary mortgage denial variable, deny, X is the payment-to-income ratio (P/I ratio), β0 = −2, and β1 = 3. What then is the probability of denial if P/I ratio = 0.4? According to Equation (11.4), this probability is Φ(β0 + β1 P/I ratio) = Φ(−2 + 3 P/I ratio) = Φ(−2 + 3 × 0.4) = Φ(−0.8). According to the cumulative normal distribution table (Appendix Table 1), Φ(−0.8) = Pr(Z ≤ −0.8) = 21.2%. That is, when the P/I ratio is 0.4, the predicted probability that the application will be denied is 21.2%, computed using the probit model with the coefficients β0 = −2 and β1 = 3.

In the probit model, the term β0 + β1X plays the role of "z" in the cumulative standard normal distribution table in Appendix Table 1. Thus, the calculation in the previous paragraph can, equivalently, be done by first computing the "z-value," z = β0 + β1X = −2 + 3 × 0.4 = −0.8, then looking up the probability in the tail of the normal distribution to the left of z = −0.8, which is 21.2%.
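There is nothing special about the printed table: Φ can be computed from the error function available in most standard math libraries. A minimal Python sketch of the calculation above, using the hypothetical coefficients β0 = −2 and β1 = 3 from the text:

```python
from math import erf, sqrt

def Phi(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# z-value for beta0 = -2, beta1 = 3, P/I ratio = 0.4
z = -2 + 3 * 0.4          # equals -0.8
prob_denial = Phi(z)      # about 0.212, i.e. 21.2%
```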
If β1 in Equation (11.4) is positive, then an increase in X increases the probability that Y = 1; if β1 is negative, an increase in X decreases the probability that Y = 1. Beyond this, however, it is not easy to interpret the probit coefficients β0 and β1 directly. Instead, the coefficients are best interpreted indirectly by computing probabilities and/or changes in probabilities. When there is just one regressor, the easiest way to interpret a probit regression is to plot the probabilities. Figure 11.2 plots the estimated regression function produced by the probit regression of deny on P/I ratio for the 127 observations in the scatterplot. The estimated probit regression function has a stretched "S" shape: It is nearly 0 and flat for small values of P/I ratio; it turns and increases for intermediate values; and it flattens out again and is nearly 1 for large values. For small values of the payment-to-income ratio, the probability of denial is small. For example, for P/I ratio = 0.2, the estimated probability of denial based on the estimated probit function in Figure 11.2 is Pr(deny = 1 | P/I ratio = 0.2) = 2.1%. When the P/I ratio is 0.3, the estimated probability of denial is 16.1%. When the P/I ratio is 0.4, the probability of denial increases sharply to 51.9%, and when the P/I ratio is 0.6, the denial probability is 98%. According to this estimated probit model, for applicants with high payment-to-income ratios, the probability of denial is nearly 1.

Probit regression with multiple regressors. In all the regression problems we have studied so far, leaving out a determinant of Y that is correlated with the included regressors results in omitted variable bias. Probit regression is no exception. In linear regression, the solution is to include the additional variable as a regressor. This is also the solution to omitted variable bias in probit regression.

The probit model with multiple regressors extends the single-regressor probit model by adding regressors to compute the z-value. Accordingly, the probit population regression model with two regressors, X1 and X2, is

  Pr(Y = 1 | X1, X2) = Φ(β0 + β1X1 + β2X2).   (11.5)

For example, suppose β0 = −1.6, β1 = 2, and β2 = 0.5. If X1 = 0.4 and X2 = 1, then the z-value is z = −1.6 + 2 × 0.4 + 0.5 × 1 = −0.3. So, the probability that Y = 1 given X1 = 0.4 and X2 = 1 is Pr(Y = 1 | X1 = 0.4, X2 = 1) = Φ(−0.3) = 38%.

Effect of a change in X. In general, the effect on Y of a change in X is the expected change in Y arising from a change in X. When Y is binary, its conditional expectation is the conditional probability that it equals 1, so the expected change in Y arising from a change in X is the change in the probability that Y = 1.

Recall from Section 8.1 that, when the population regression function is a nonlinear function of X, this expected change is estimated in three steps: First, compute the predicted value at the original value of X using the estimated regression function; next, compute the predicted value at the changed value of X, X + ΔX; then compute the difference between the two predicted values. This procedure is summarized in Key Concept 8.1. As emphasized in Section 8.1, this method always works for computing predicted effects of a change in X, no matter how complicated the nonlinear model. When applied to the probit model, the method of Key Concept 8.1 yields the estimated effect on the probability that Y = 1 of a change in X.

The probit regression model, predicted probabilities, and estimated effects are summarized in Key Concept 11.2.

KEY CONCEPT 11.2

THE PROBIT MODEL, PREDICTED PROBABILITIES, AND ESTIMATED EFFECTS

The population probit model with multiple regressors is

  Pr(Y = 1 | X1, X2, ..., Xk) = Φ(β0 + β1X1 + β2X2 + ... + βkXk),   (11.6)

where the dependent variable Y is binary, Φ is the cumulative standard normal distribution function, and X1, X2, etc., are regressors. The probit coefficients β0, β1, ..., βk do not have simple interpretations. The model is best interpreted by computing predicted probabilities and the effect of a change in a regressor.

The predicted probability that Y = 1, given values of X1, X2, ..., Xk, is calculated by computing the z-value, z = β0 + β1X1 + β2X2 + ... + βkXk, and then looking up this z-value in the normal distribution table (Appendix Table 1).

The effect of a change in a regressor is computed by (1) computing the predicted probability for the initial value of the regressors; (2) computing the predicted probability for the new or changed value of the regressors; and (3) taking their difference.

Application to the mortgage data. As an illustration, we fit a probit model to the 2380 observations in our data set on mortgage denial (deny) and the payment-to-income ratio (P/I ratio):

  Pr(deny = 1 | P/I ratio) = Φ(−2.19 + 2.97 P/I ratio).   (11.7)
                               (0.16)  (0.47)

The estimated coefficients of −2.19 and 2.97 are difficult to interpret because they affect the probability of denial via the z-value. Indeed, the only thing that can be readily concluded from the estimated probit regression in Equation (11.7) is that the P/I ratio is positively related to the probability of denial (the coefficient on the P/I ratio is positive) and this relationship is statistically significant (t = 2.97/0.47 = 6.32).

What is the change in the predicted probability that an application will be denied when the payment-to-income ratio increases from 0.3 to 0.4? To answer this question, we follow the procedure in Key Concept 8.1: Compute the probability of denial for P/I ratio = 0.3, then for P/I ratio = 0.4, and then compute the difference. The probability of denial when P/I ratio = 0.3 is Φ(−2.19 + 2.97 × 0.3) = Φ(−1.30) = 0.097. The probability of denial when P/I ratio = 0.4 is Φ(−2.19 + 2.97 × 0.4) = Φ(−1.00) = 0.159. The estimated change in the probability of denial is 0.159 − 0.097 = 0.062. That is, an increase in the payment-to-income ratio from 0.3 to 0.4 is associated with an increase in the probability of denial of 6.2 percentage points, from 9.7% to 15.9%.
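The three-step procedure of Key Concept 8.1 is easy to automate. This illustrative Python sketch hard-codes the probit estimates from Equation (11.7); small discrepancies from the numbers above reflect the text's rounding of the z-values before looking them up in the table:

```python
from math import erf, sqrt

def Phi(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

B0, B1 = -2.19, 2.97  # probit estimates from Equation (11.7)

def denial_prob(pi_ratio):
    return Phi(B0 + B1 * pi_ratio)

# Effect of raising the P/I ratio from 0.3 to 0.4 (Key Concept 8.1):
p_before = denial_prob(0.3)   # about 0.097
p_after = denial_prob(0.4)    # about 0.158
effect = p_after - p_before   # about 0.061, roughly 6 percentage points
```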
Because the probit regression function is nonlinear, the effect of a change in X depends on the starting value of X. For example, if P/I ratio = 0.5, then the estimated denial probability based on Equation (11.7) is Φ(−2.19 + 2.97 × 0.5) = Φ(−0.71) = 0.239. Thus the change in the predicted probability when the P/I ratio increases from 0.4 to 0.5 is 0.239 − 0.159, or 8.0 percentage points, larger than the increase of 6.2 percentage points when the P/I ratio increases from 0.3 to 0.4.

What is the effect of race on the probability of mortgage denial, holding constant the payment-to-income ratio? To estimate this effect, we estimate a probit regression with both P/I ratio and black as regressors:

  Pr(deny = 1 | P/I ratio, black) = Φ(−2.26 + 2.74 P/I ratio + 0.71 black).   (11.8)
                                      (0.16)  (0.44)            (0.083)

Again, the values of the coefficients are difficult to interpret, but the sign and statistical significance are not. The coefficient on black is positive, indicating that an African American applicant has a higher probability of denial than a white applicant, holding constant their payment-to-income ratio. This coefficient is statistically significant at the 1% level (the t-statistic on black is 8.55). For a white applicant with P/I ratio = 0.3, the predicted denial probability is 7.5%, while for a black applicant with P/I ratio = 0.3 it is 23.3%; the difference in denial probabilities between these two hypothetical applicants is 15.8 percentage points.
Estimation of the probit coefficients. The probit coefficients reported here were estimated using the method of maximum likelihood, which produces efficient (minimum variance) estimators in a wide variety of applications, including regression with a binary dependent variable. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way.

Regression software for estimating probit models typically uses maximum likelihood estimation, so this is a simple method to use in practice. Standard errors produced by such software can be used in the same way as the standard errors of regression coefficients; for example, a 95% confidence interval for the


KEY CONCEPT 11.3

LOGIT REGRESSION

The population logit model of the binary dependent variable Y with multiple regressors is

  Pr(Y = 1 | X1, X2, ..., Xk) = F(β0 + β1X1 + β2X2 + ... + βkXk)
                              = 1 / [1 + e^(−(β0 + β1X1 + β2X2 + ... + βkXk))].   (11.9)

Logit regression is similar to probit regression, except that the cumulative distribution function is different.

true probit coefficient can be constructed as the estimated coefficient ±1.96 standard errors. Similarly, F-statistics computed using maximum likelihood estimators can be used to test joint hypotheses. Maximum likelihood estimation is discussed further in Section 11.3, with additional details given in Appendix 11.2.

Logit Regression

The logit regression model. The logit regression model is similar to the probit regression model, except that the cumulative standard normal distribution function Φ in Equation (11.6) is replaced by the cumulative standard logistic distribution function, which we denote by F. Logit regression is summarized in Key Concept 11.3. The logistic cumulative distribution function has a specific functional form, defined in terms of the exponential function, which is given as the final expression in Equation (11.9).

As with probit, the logit coefficients are best interpreted by computing predicted probabilities and differences in predicted probabilities.

The coefficients of the logit model can be estimated by maximum likelihood. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way.

The logit and probit regression functions are similar. This is illustrated in Figure 11.3, which graphs the probit and logit regression functions for the dependent variable deny and the single regressor P/I ratio, estimated by maximum likelihood


FIGURE 11.3 Probit and Logit Models of the Probability of Denial, Given the P/I Ratio

These logit and probit models produce nearly identical estimates of the probability that a mortgage application will be denied, given the payment-to-income ratio. [Figure: deny versus P/I ratio (horizontal axis from 0.0 to 0.8) with the estimated probit and logit regression functions plotted together; "Mortgage denied" points at deny = 1, "Mortgage approved" points at deny = 0.]

using the same 127 observations as in Figures 11.1 and 11.2. The differences between the two functions are small.

Historically, the main motivation for logit regression was that the logistic cumulative distribution function could be computed faster than the normal cumulative distribution function. With the advent of more efficient computers, this distinction is no longer important.

Application to the Boston HMDA data. A logit regression of deny against P/I ratio and black, using the 2380 observations in the data set, yields the estimated regression function

  Pr(deny = 1 | P/I ratio, black) = F(−4.13 + 5.37 P/I ratio + 1.27 black).   (11.10)
                                       (0.35)  (0.96)           (0.15)

The coefficient on black is positive and statistically significant at the 1% level (the t-statistic is 8.47). The predicted denial probability of a white applicant with P/I ratio = 0.3 is 1/[1 + e^(−(−4.13 + 5.37 × 0.3))] = 1/[1 + e^2.52] = 0.074, or 7.4%. The predicted denial probability of an African American applicant with P/I ratio = 0.3 is 1/[1 + e^1.25] = 0.222, or 22.2%, so the difference between the two probabilities is 14.8 percentage points.
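These logistic-CDF calculations can be verified numerically. An illustrative Python sketch with the logit estimates from Equation (11.10) hard-coded:

```python
from math import exp

def logistic_cdf(x):
    """Cumulative standard logistic distribution function F."""
    return 1.0 / (1.0 + exp(-x))

B0, B1, B2 = -4.13, 5.37, 1.27  # logit estimates from Equation (11.10)

def denial_prob(pi_ratio, black):
    return logistic_cdf(B0 + B1 * pi_ratio + B2 * black)

p_white = denial_prob(0.3, 0)  # about 0.074
p_black = denial_prob(0.3, 1)  # about 0.223
gap = p_black - p_white        # about 0.148, i.e. 14.8 percentage points
```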


Comparing the Linear Probability, Probit, and Logit Models

All three models (linear probability, probit, and logit) are just approximations to the unknown population regression function E(Y | X) = Pr(Y = 1 | X). The linear probability model is easiest to use and to interpret, but it cannot capture the nonlinear nature of the true population regression function. Probit and logit regressions model this nonlinearity in the probabilities, but their regression coefficients are more difficult to interpret. So which should you use in practice?

There is no one right answer, and different researchers use different models. Probit and logit regressions frequently produce similar results. For example, according to the estimated probit model in Equation (11.8), the difference in denial probabilities between a black applicant and a white applicant with P/I ratio = 0.3 was estimated to be 15.8 percentage points, whereas the logit estimate of this gap, based on Equation (11.10), was 14.8 percentage points. For practical purposes the two estimates are very similar. One way to choose between logit and probit is to pick the method that is easiest to use in your statistical software.

The linear probability model provides the least sensible approximation to the nonlinear population regression function. Even so, in some data sets there may be few extreme values of the regressors, in which case the linear probability model still can provide an adequate approximation. In the denial probability regression in Equation (11.3), the estimated black/white gap from the linear probability model is 17.7 percentage points, larger than the probit and logit estimates but still qualitatively similar. The only way to know this, however, is to estimate both a linear and a nonlinear model and to compare their predicted probabilities.
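The three black/white gap estimates quoted in this section all follow from the reported coefficients. An illustrative Python comparison, with the estimates hard-coded from Equations (11.3), (11.8), and (11.10):

```python
from math import erf, exp, sqrt

def Phi(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def F(x):
    """Cumulative standard logistic distribution function."""
    return 1.0 / (1.0 + exp(-x))

pi = 0.3  # payment-to-income ratio held fixed

lpm_gap = 0.177  # coefficient on black in Equation (11.3)
probit_gap = Phi(-2.26 + 2.74 * pi + 0.71) - Phi(-2.26 + 2.74 * pi)  # Eq. (11.8)
logit_gap = F(-4.13 + 5.37 * pi + 1.27) - F(-4.13 + 5.37 * pi)       # Eq. (11.10)
# probit_gap is about 0.158 and logit_gap about 0.148: close to each other,
# with the linear probability model's 0.177 somewhat larger
```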

11.3 Estimation and Inference in the Logit and Probit Models

The nonlinear models studied in Sections 8.2 and 8.3 are nonlinear functions of the independent variables but are linear functions of the unknown coefficients ("parameters"). Consequently, the unknown coefficients of those nonlinear regression functions can be estimated by OLS. In contrast, the probit and logit regression functions are a nonlinear function of the coefficients. That is, the probit coefficients β0, β1, ..., βk in Equation (11.6) appear inside the cumulative standard normal distribution function Φ, and the logit coefficients in Equation (11.9)


appear inside the cumulative standard logistic distribution function F. Because the population regression function is a nonlinear function of the coefficients β0, β1, ..., βk, those coefficients cannot be estimated by OLS.

This section provides an introduction to the standard method for estimation of probit and logit coefficients, maximum likelihood; additional mathematical details are given in Appendix 11.2. Because it is built into modern statistical software, maximum likelihood estimation of the probit coefficients is easy in practice. The theory of maximum likelihood estimation, however, is more complicated than the theory of least squares. We therefore first discuss another estimation method, nonlinear least squares, before turning to maximum likelihood.

Nonlinear Least Squares Estimation


Nonlinear least squares is a general method for estimating the unknown parameters of a regression function when, like the probit coefficients, those parameters enter the population regression function nonlinearly. The nonlinear least squares estimator, which was introduced in Appendix 8.1, extends the OLS estimator to regression functions that are nonlinear functions of the parameters. Like OLS, nonlinear least squares finds the values of the parameters that minimize the sum of squared prediction mistakes produced by the model.

To be concrete, consider the nonlinear least squares estimator of the parameters of the probit model. The conditional expectation of Y given the X's is E(Y|X1, …, Xk) = 1 × Pr(Y = 1|X1, …, Xk) + 0 × Pr(Y = 0|X1, …, Xk) = Pr(Y = 1|X1, …, Xk) = Φ(β0 + β1X1 + ⋯ + βkXk). Estimation by nonlinear least squares fits this conditional expectation function, which is a nonlinear function of the parameters, to the dependent variable. That is, the nonlinear least squares estimators of the probit coefficients are those values of b0, …, bk that minimize the sum of squared prediction mistakes:
∑ᵢ₌₁ⁿ [Yi − Φ(b0 + b1X1i + ⋯ + bkXki)]².    (11.11)

The nonlinear least squares estimator shares two key properties with the OLS estimator in linear regression: It is consistent (the probability that it is close to the true value approaches 1 as the sample size gets large), and it is normally distributed in large samples. There are, however, estimators that have a smaller variance than the nonlinear least squares estimator; that is, the nonlinear least squares estimator is inefficient. For this reason, the nonlinear least squares estimator of the probit coefficients is rarely used in practice, and instead the parameters are estimated by maximum likelihood.
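As a concrete (if crude) illustration of Equation (11.11), the sketch below simulates a small probit data set and minimizes the sum of squared prediction mistakes by grid search. The "true" coefficients (−1.0, 2.0), the sample size, and the grid are all illustrative assumptions; real software uses iterative numerical minimizers rather than a grid.

```python
import math
import random

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Simulate probit data; the "true" coefficients (-1.0, 2.0) are illustrative.
random.seed(0)
n = 1000
X = [random.uniform(0.0, 1.0) for _ in range(n)]
Y = [1 if random.random() < Phi(-1.0 + 2.0 * x) else 0 for x in X]

def ssr(b0, b1):
    """Sum of squared prediction mistakes, as in Equation (11.11)."""
    return sum((y - Phi(b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

# Crude grid search over candidate (b0, b1) pairs.
grid = [i / 5.0 for i in range(-15, 16)]  # -3.0 to 3.0 in steps of 0.2
b0_nls, b1_nls = min(((b0, b1) for b0 in grid for b1 in grid),
                     key=lambda b: ssr(*b))
print(b0_nls, b1_nls)  # should land near the true values
```

Because the estimator is consistent, the minimizer lands near (−1.0, 2.0) for a sample this size, up to the coarseness of the grid and sampling noise.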

398  CHAPTER 11  Regression with a Binary Dependent Variable

Maximum Likelihood Estimation

The likelihood function is the joint probability distribution of the data, treated as a function of the unknown coefficients. The maximum likelihood estimator (MLE) of the coefficients consists of the values of the coefficients that maximize the likelihood function. Because the MLE chooses the unknown coefficients to maximize the likelihood function, which is in turn the joint probability distribution, in effect the MLE chooses the values of the parameters to maximize the probability of drawing the data that are actually observed. In this sense, the MLE is the parameter value "most likely" to have produced the data.
To illustrate maximum likelihood estimation, consider the simplest possible case: two i.i.d. observations, Y1 and Y2, on a binary dependent variable with no regressors. Thus Y is a Bernoulli random variable, and the only unknown parameter to estimate is the probability p that Y = 1, which is also the mean of Y.

To obtain the maximum likelihood estimator, we need an expression for the likelihood function, which in turn requires an expression for the joint probability distribution of the data. The joint probability distribution of the two observations Y1 and Y2 is Pr(Y1 = y1, Y2 = y2). Because Y1 and Y2 are independently distributed, the joint distribution is the product of the individual distributions [Equation (2.23)], so Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) Pr(Y2 = y2). The Bernoulli distribution can be summarized in the formula Pr(Y = y) = p^y(1 − p)^(1−y): When y = 1, Pr(Y = 1) = p^1(1 − p)^0 = p, and when y = 0, Pr(Y = 0) = p^0(1 − p)^1 = 1 − p. Thus the joint probability distribution of Y1 and Y2 is Pr(Y1 = y1, Y2 = y2) = [p^y1(1 − p)^(1−y1)] × [p^y2(1 − p)^(1−y2)].
1
The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. For n = 2 i.i.d. observations on Bernoulli random variables, the likelihood function is

f(p; Y1, Y2) = p^Y1(1 − p)^(1−Y1) × p^Y2(1 − p)^(1−Y2).    (11.12)

The maximum likelihood estimator of p is the value of p that maximizes the likelihood function in Equation (11.12). As with all maximization or minimization problems, this can be done by trial and error; that is, you can try different values of p and compute the likelihood f(p; Y1, Y2) until you are satisfied that you have maximized this function. In this example, however, maximizing the likelihood function using calculus produces a simple formula for the MLE: The MLE is p̂ = ½(Y1 + Y2). In other words, the MLE of p is just the sample average! In fact, for general n, the MLE p̂ of the Bernoulli probability p is the sample average; that is, p̂ = Ȳ (this is shown in Appendix 11.2). In this example, the MLE is the usual estimator of p, the fraction of times Yi = 1 in the sample.
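To make the trial-and-error idea concrete, here is a minimal sketch (with made-up data, not from the text) that evaluates the Bernoulli likelihood on a fine grid of p values and confirms that the maximizer equals the sample average, as the calculus result says it must.

```python
# Likelihood for n i.i.d. Bernoulli draws, generalizing Equation (11.12):
# f(p; Y1,...,Yn) = prod_i p^Yi (1-p)^(1-Yi).
Y = [1, 0, 1, 1, 0, 1, 0, 1]  # illustrative data, not from the text

def likelihood(p, ys):
    value = 1.0
    for y in ys:
        value *= p ** y * (1 - p) ** (1 - y)
    return value

# "Trial and error" over a fine grid of candidate values of p.
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=lambda p: likelihood(p, Y))
print(p_mle, sum(Y) / len(Y))  # both are 0.625
```

With five ones in eight draws, the likelihood p^5(1 − p)^3 peaks exactly at the sample average 5/8 = 0.625, which is a point on the grid.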


This example is similar to the problem of estimating the unknown coefficients of the probit and logit regression models. In those models, the success probability p is not constant but rather depends on X; that is, it is the success probability conditional on X, which is given in Equation (11.6) for the probit model and Equation (11.9) for the logit model. Thus the probit and logit likelihood functions are similar to the likelihood function in Equation (11.12), except that the success probability varies from one observation to the next (because it depends on Xi). Expressions for the probit and logit likelihood functions are given in Appendix 11.2.
Like the nonlinear least squares estimator, the MLE is consistent and normally distributed in large samples. Because regression software commonly computes the MLE of the probit coefficients, this estimator is easy to use in practice. All the estimated probit and logit coefficients reported in this chapter are MLEs.
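In practice, software maximizes the logarithm of the likelihood. A minimal sketch of the probit log-likelihood, with illustrative data (not from the text): the summands are the logs of pᵢ^Yᵢ(1 − pᵢ)^(1−Yᵢ), where pᵢ = Φ(b0 + b1Xᵢ) and Φ is computed from the error function.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_log_likelihood(b0, b1, X, Y):
    """Log-likelihood with success probability p_i = Phi(b0 + b1 * X_i)."""
    total = 0.0
    for x, y in zip(X, Y):
        p = Phi(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Illustrative data; statistical software maximizes this function over
# (b0, b1) with iterative numerical optimizers to obtain the MLE.
X = [0.1, 0.4, 0.5, 0.8, 0.9]
Y = [0, 0, 1, 1, 1]
print(probit_log_likelihood(0.0, 1.0, X, Y))
```

Because these data slope upward, a positive candidate slope (b1 = 1) yields a higher log-likelihood than a flat one (b1 = 0), which is the direction the optimizer exploits.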

Statistical inference based on the MLE.  Because the MLE is normally distributed in large samples, statistical inference about the probit and logit coefficients based on the MLE proceeds in the same way as inference about the linear regression function coefficients based on the OLS estimator. That is, hypothesis tests are performed using the t-statistic, and 95% confidence intervals are formed as the estimated coefficient ± 1.96 standard errors. Tests of joint hypotheses on multiple coefficients use the F-statistic, in a way similar to that discussed in Chapter 7 for the linear regression model. All of this is completely analogous to statistical inference in the linear regression model.
An important practical point is that some statistical software reports tests of joint hypotheses using the F-statistic, while other software uses the chi-squared statistic, where the chi-squared statistic is q × F and q is the number of restrictions being tested. Because the F-statistic is, under the null hypothesis, distributed as χ²_q/q in large samples, q × F is distributed as χ²_q in large samples. Because the two approaches differ only in whether they divide by q, they produce identical inferences, but you need to know which approach is implemented in your software so that you use the correct critical values.
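The conversion between the two conventions can be sketched as follows. The statistic values are hypothetical; the large-sample 5% critical values (3.00 for an F-statistic with q = 2 restrictions, 5.99 for a chi-squared statistic with 2 degrees of freedom) come from standard tables.

```python
# Two conventions for the same joint hypothesis test:
# software reporting F compares F to an F(q, infinity) critical value, while
# software reporting the chi-squared statistic W = q * F compares W to a
# chi-squared(q) critical value. The inferences are identical.
q = 2            # number of restrictions being tested (illustrative)
F_stat = 3.5     # hypothetical reported F-statistic
W = q * F_stat   # the corresponding chi-squared statistic

# Large-sample 5% critical values from standard tables:
# F(2, infinity): 3.00; chi-squared(2): 5.99. Note 5.99 is about 2 * 3.00.
reject_F = F_stat > 3.00
reject_W = W > 5.99
print(reject_F, reject_W)  # both True: the two conventions agree
```

The critical values themselves differ by the factor q, so dividing (or not dividing) by q never changes whether the null is rejected.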

Measures of Fit
In Section 11.1, it was mentioned that the R² is a poor measure of fit for the linear probability model. This is also true for probit and logit regression. Two measures of fit for models with binary dependent variables are the "fraction correctly predicted" and the "pseudo-R²." The fraction correctly predicted uses the following rule: If Yi = 1 and the predicted probability exceeds 50%, or if Yi = 0 and the predicted probability is less than 50%, then Yi is said to be correctly predicted.


Otherwise, Yi is said to be incorrectly predicted. The "fraction correctly predicted" is the fraction of the n observations Y1, …, Yn that are correctly predicted.

An advantage of this measure of fit is that it is easy to understand. A disadvantage is that it does not reflect the quality of the prediction: If Yi = 1, the observation is treated as correctly predicted whether the predicted probability is 51% or 90%.
The pseudo-R² measures the fit of the model using the likelihood function. Because the MLE maximizes the likelihood function, adding another regressor to a probit or logit model increases the value of the maximized likelihood, just as adding a regressor necessarily reduces the sum of squared residuals in linear regression by OLS. This suggests measuring the quality of fit of a probit model by comparing the value of the maximized likelihood function with all the regressors to the value of the likelihood with none. This is, in fact, what the pseudo-R² does. A formula for the pseudo-R² is given in Appendix 11.2.
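A sketch of both measures of fit, using made-up predicted probabilities and outcomes. The pseudo-R² computed here is the common likelihood-ratio form 1 − ln L_full / ln L_null, where the "null" model uses only the sample mean of Y; this is an assumption on my part, so see Appendix 11.2 for the text's exact formula.

```python
import math

# Illustrative fitted probabilities and outcomes (not from the text).
p_hat = [0.9, 0.6, 0.4, 0.2, 0.7, 0.3]
Y     = [1,   0,   0,   0,   1,   1  ]

# Fraction correctly predicted: Yi = 1 counts as correct if p_hat > 0.5,
# and Yi = 0 counts as correct if p_hat < 0.5.
correct = sum(1 for p, y in zip(p_hat, Y)
              if (y == 1 and p > 0.5) or (y == 0 and p < 0.5))
fraction_correct = correct / len(Y)

def bernoulli_ll(probs, ys):
    """Bernoulli log-likelihood of outcomes ys at fitted probabilities."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p)
               for p, y in zip(probs, ys))

p_bar = sum(Y) / len(Y)                       # null model: constant probability
ll_full = bernoulli_ll(p_hat, Y)              # model with regressors
ll_null = bernoulli_ll([p_bar] * len(Y), Y)   # model with none
pseudo_r2 = 1 - ll_full / ll_null
print(fraction_correct, round(pseudo_r2, 3))
```

Here four of six observations are correctly predicted, and the pseudo-R² is strictly between 0 and 1 because the fitted probabilities track the outcomes better than the constant sample mean does.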

11.4 Application to the Boston HMDA Data


The regressions of the previous two sections indicated that denial rates were higher for black than white applicants, holding constant their payment-to-income ratio. Loan officers, however, legitimately weigh many factors when deciding on a mortgage application, and if any of those other factors differ systematically by race, then the estimators considered so far have omitted variable bias.

In this section, we take a closer look at whether there is statistical evidence of discrimination in the Boston HMDA data. Specifically, our objective is to estimate the effect of race on the probability of denial, holding constant those applicant characteristics that a loan officer might legally consider when deciding on a mortgage application.
The most important variables available to loan officers through the mortgage applications in the Boston HMDA data set are listed in Table 11.1; these are the variables we will focus on in our empirical models of loan decisions. The first two variables are direct measures of the financial burden the proposed loan would place on the applicant, measured in terms of his or her income. The first of these is the P/I ratio; the second is the ratio of housing-related expenses to income. The next variable is the size of the loan, relative to the assessed value of the home: If the loan-to-value ratio is nearly 1, then the bank might have trouble recouping the full amount of the loan if the applicant defaults on the loan and the bank forecloses. The final three financial variables summarize the applicant's credit history. If an applicant has been unreliable paying off debts in the past, then the loan officer

TABLE 11.1  Variables Included in Regression Models of Mortgage Decisions

Financial Variables
P/I ratio: Ratio of total monthly debt payments to total monthly income.
housing expense-to-income ratio: Ratio of monthly housing expenses to total monthly income.
loan-to-value ratio: Ratio of size of loan to assessed value of property.
consumer credit score: 1 if no slow payments or delinquencies; 2 if one or two slow payments or delinquencies; 3 if more than two slow payments; 4 if insufficient credit history for determination; 5 if delinquent credit history with payments 60 days overdue; 6 if delinquent credit history with payments 90 days overdue. Sample average: 2.1.
mortgage credit score: 1 if no late mortgage payments; 2 if no mortgage payment history; 3 if one or two late mortgage payments; 4 if more than two late mortgage payments. Sample average: 1.7.
public bad credit record: 1 if any public record of credit problems (bankruptcy, charge-offs, collection actions); 0 otherwise. Sample average: 0.074.

Additional Applicant Characteristics
denied mortgage insurance: 1 if applicant applied for mortgage insurance and was denied; 0 otherwise. Sample average: 0.020.
self-employed: 1 if self-employed; 0 otherwise. Sample average: 0.116.
single: 1 if applicant reported being single; 0 otherwise. Sample average: 0.393.
high school diploma: 1 if applicant graduated from high school; 0 otherwise. Sample average: 0.984.
unemployment rate: 1989 Massachusetts unemployment rate in the applicant's industry. Sample average: 3.8.
condominium: 1 if unit is a condominium; 0 otherwise. Sample average: 0.188.
black: 1 if applicant is black; 0 if white. Sample average: 0.142.
deny: 1 if mortgage application denied; 0 otherwise. Sample average: 0.120.

legitimately might worry about his or her ability or desire to make mortgage payments in the future. The three variables measure different types of credit histories, which the loan officer might weigh differently. The first concerns consumer credit, such as credit card debt; the second, previous mortgage payment history; and the third measures credit problems so severe that they appeared in a public legal record, such as filing for bankruptcy.


Table 11.1 also lists some other variables relevant to the loan officer's decision. Sometimes the applicant must apply for private mortgage insurance.3 The loan officer knows whether that application was denied, and that denial would weigh negatively with the loan officer. The next three variables, which concern the employment status, marital status, and educational attainment of the applicant, relate to the prospective ability of the applicant to repay. In the event of foreclosure, characteristics of the property are relevant as well, and the next variable indicates whether the property is a condominium. The final two variables in Table 11.1 are whether the applicant is black or white, and whether the application was denied or accepted. In these data, 14.2% of applicants are black, and 12.0% of applications are denied.

Table 11.2 presents regression results based on these variables. The base specifications, reported in columns (1)-(3), include the financial variables in Table 11.1 plus the variables indicating whether private mortgage insurance was denied and whether the applicant is self-employed. Loan officers commonly use thresholds, or cutoff values, for the loan-to-value ratio, so the base specification for that variable uses binary variables for whether the loan-to-value ratio is high (> 0.95), medium (between 0.8 and 0.95), or low (< 0.8; this case is omitted to avoid perfect multicollinearity). The regressors in the first three columns are similar to those in the base specification considered by the Federal Reserve Bank of Boston researchers in their original analysis of these data.4 The regressions in columns (1)-(3) differ only in how the denial probability is modeled, using a linear probability model, a logit model, and a probit model, respectively.
Because the regression in column (1) is a linear probability model, its coefficients are estimated changes in predicted probabilities arising from a unit change in the independent variable. Accordingly, an increase in the P/I ratio of 0.1 is estimated to increase the probability of denial by 4.5 percentage points (the coefficient on P/I ratio in column (1) is 0.449, and 0.449 × 0.1 ≅ 0.045). Similarly, having a high loan-to-value ratio increases the probability of denial: A loan-to-value ratio exceeding 95% is associated with an 18.9 percentage point increase (the

3 Mortgage insurance is an insurance policy under which the insurance company makes the monthly payment to the bank if the borrower defaults. During the period of this study, if the loan-to-value ratio exceeded 80%, the applicant typically was required to buy mortgage insurance.
4 The differences between the regressors in columns (1)-(3) and those in Munnell et al. (1996), regression 2(1), are that Munnell et al. include additional indicators for the location of the home and the identity of the lender, which are not publicly available; an indicator for a multifamily home, which is irrelevant here because our subset focuses on single-family homes; and net wealth, which we omit because this variable has a few extremely large positive and negative values and thus risks making the results sensitive to a few special "outlier" observations.

TABLE 11.2  Mortgage Denial Regressions Using the Boston HMDA Data

Dependent variable: deny = 1 if mortgage application is denied, 0 if accepted; 2380 observations.

[Coefficient panel, only partially legible in this scan. Columns: (1) linear probability model, estimated by OLS; (2) logit; (3) probit; (4)-(6) probit with additional regressors. Regressors: black; P/I ratio; housing expense-to-income ratio; medium loan-to-value ratio (0.80 ≤ loan-to-value ratio ≤ 0.95); high loan-to-value ratio (loan-to-value ratio > 0.95); consumer credit score; mortgage credit score; public bad credit record; denied mortgage insurance; self-employed; additional applicant characteristics (single, high school diploma, unemployment rate, condominium); additional credit rating indicator variables; interactions black × P/I ratio and black × housing expense-to-income ratio; constant. Standard errors appear in parentheses under the coefficients.]

F-Statistics and p-Values Testing Exclusion of Groups of Variables (p-values in parentheses)
applicant single; high school diploma; industry unemployment rate: 5.85 (<0.001) in column (4); 5.22 (<0.001) in column (5); 5.79 (<0.001) in column (6)
additional credit rating indicator variables: 1.22 (0.291) in column (5)
race interactions and black: 4.96 (0.002) in column (6)
race interactions only: 0.27 (0.766) in column (6)
difference in predicted probability of denial, white vs. black (percentage points): 8.4% (1); 6.0% (2); 7.1% (3); 6.6% (4); 6.3% (5); 6.5% (6)

These regressions were estimated using the n = 2380 observations in the Boston HMDA data set described in Appendix 11.1. The linear probability model was estimated by OLS, and the probit and logit regressions were estimated by maximum likelihood. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. The change in predicted probability in the final row was computed for a hypothetical applicant whose values of the regressors, other than race, equal the sample mean. Individual coefficients are statistically significant at the *5% or **1% level.

coefficient is 0.189) in the denial probability, relative to the omitted case of a loan-to-value ratio less than 80%, holding the other variables in column (1) constant. Applicants with a poor credit rating also have a more difficult time getting a loan, all else being constant, although interestingly the coefficient on consumer credit is statistically significant but the coefficient on mortgage credit is not. Applicants with a public record of credit problems, such as filing for bankruptcy, have much greater difficulty obtaining a loan: All else equal, a public bad credit record is estimated to increase the probability of denial by 0.197, or 19.7 percentage points. Being denied private mortgage insurance is estimated to be virtually decisive: The estimated coefficient of 0.702 means that being denied mortgage insurance increases your chance of being denied a mortgage by 70.2 percentage points, all else equal. Of the nine variables (other than race) in the regression, the coefficients on all but two are statistically significant at the 5% level, which is consistent with loan officers' considering many factors when they make their decisions.

The coefficient on black in regression (1) is 0.084, indicating that the difference in denial probabilities for black and white applicants is 8.4 percentage points, holding constant the other variables in the regression. This is statistically significant at the 1% significance level (t = 3.65).

The logit and probit estimates reported in columns (2) and (3) yield similar conclusions. In the logit and probit regressions, eight of the nine coefficients on variables other than race are individually statistically significantly different from

zero at the 5% level, and the coefficient on black is statistically significant at the


1% level. As discussed in Section 11.2, because these models are nonlinear, specific values of all the regressors must be chosen to compute the difference in predicted probabilities for white and black applicants. A conventional way to make this choice is to consider an "average" applicant who has the sample average values of all the regressors other than race. The final row in Table 11.2 reports this estimated difference in probabilities, evaluated for this average applicant. The estimated racial differentials are similar to each other: 8.4 percentage points for the linear probability model [column (1)], 6.0 percentage points for the logit model [column (2)], and 7.1 percentage points for the probit model [column (3)]. These estimated race effects and the coefficients on black are less than in the regressions of the previous sections, in which the only regressors were P/I ratio and black, indicating that those earlier estimates had omitted variable bias.
The regressions in columns (4)-(6) investigate the sensitivity of the results in column (3) to changes in the regression specification. Column (4) modifies column (3) by including additional applicant characteristics. These characteristics help to predict whether the loan is denied: For example, having at least a high school diploma reduces the probability of denial (the estimate is negative and the coefficient is statistically significant at the 1% level). However, controlling for these personal characteristics does not change the estimated coefficient on black or the estimated difference in denial probabilities (6.6%) in an important way.

Column (5) breaks out the six consumer credit categories and four mortgage credit categories to test the null hypothesis that these two variables enter linearly; this regression also adds a variable indicating whether the property is a condominium. The null hypothesis that the credit rating variables enter the expression for the z-value linearly is not rejected, nor is the condominium indicator significant, at the 5% level. Most importantly, the estimated racial difference in denial probabilities (6.3%) is essentially the same as in columns (3) and (4).

Column (6) examines whether there are interactions. Are different standards applied to evaluating the payment-to-income and housing expense-to-income ratios for black versus white applicants? The answer appears to be no: The interaction terms are not jointly statistically significant at the 5% level. However, race continues to have a significant effect, because the race indicator and the interaction terms are jointly statistically significant at the 1% level. Again, the estimated racial difference in denial probabilities (6.5%) is essentially the same as in the other probit regressions.

In all six specifications, the effect of race on the denial probability, holding other applicant characteristics constant, is statistically significant at the 1% level. The estimated difference in denial probabilities between black and white applicants ranges from 6.0 to 8.4 percentage points, depending on the specification.

One way to assess whether this differential is large or small is to return to a variation on the question posed at the beginning of this chapter. Suppose two individuals apply for a mortgage, one white and one black, but otherwise having the same values of the other independent variables in regression (3); specifically, aside from race, the values of the other variables in regression (3) are the sample average values in the HMDA data set. The white applicant faces a 7.4% chance of denial, but the black applicant faces a 14.5% chance of denial. The estimated racial difference in denial probabilities, 7.1 percentage points, means that the black applicant is nearly twice as likely to be denied as the white applicant.
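The arithmetic behind this comparison can be sketched with the probit formula. The index value for the white applicant (−1.45) and the coefficient on black (0.39) are hypothetical numbers, chosen only so that the implied probabilities roughly match the 7.4% and 14.5% in the text; they are not taken from Table 11.2.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# z_white stands in for b0 + b1*x1 + ... + bk*xk evaluated at the sample
# averages of the regressors other than race; black_coef stands in for the
# probit coefficient on black. Both values are illustrative assumptions.
z_white = -1.45
black_coef = 0.39

p_white = Phi(z_white)               # about 0.074
p_black = Phi(z_white + black_coef)  # about 0.145
print(round(p_white, 3), round(p_black, 3), round(p_black - p_white, 3))
```

Even though the shift in the index is the same for every applicant, the implied change in probability depends on where on the normal c.d.f. the applicant sits, which is why a specific (here, average) applicant must be chosen.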
The results in Table 11.2 (and in the original Boston Fed study) provide statistical evidence of racial patterns in mortgage denial that, by law, ought not be there. This evidence played an important role in spurring policy changes by bank regulators.5 But economists love a good argument, and not surprisingly these results have in turn stimulated a vigorous debate.

Because the suggestion that there is (or was) racial discrimination in lending is charged, we briefly review some points of this debate. In so doing, it is useful to adopt the framework of Chapter 9, that is, to consider the internal and external validity of the results in Table 11.2, which are representative of previous analyses of the Boston HMDA data. A number of the criticisms made of the original Federal Reserve Bank of Boston study concern internal validity: possible errors in the data, alternative nonlinear functional forms, additional interactions, and so forth. The original data were subjected to a careful audit; some errors were found, and the results reported here (and in the final published Boston Fed study) are based on the "cleaned" data set. Estimation of other specifications (different functional forms and/or additional regressors) also produces estimates of racial differentials comparable to those in Table 11.2. A potentially more difficult issue of internal validity is whether there is relevant nonracial financial information obtained during loan interviews, not recorded on the loan application itself, that is correlated with race; if so, there still might be omitted variable bias in the Table 11.2 regressions. Finally, some have questioned external validity: Even if there was racial discrimination in Boston in 1990, it is wrong to implicate lenders elsewhere today. The only way to resolve the question of external validity is to consider data from other locations and years.6

5 The policy changes include changes in the way that fair lending examinations are done by federal bank regulators, changes in inquiries made by the U.S. Department of Justice, and enhanced education programs for banks and other home loan origination companies.
6 If you are interested in further reading on this topic, a good place to start is the symposium on racial discrimination and economics in the Spring 1998 issue of The Journal of Economic Perspectives. The article in that symposium by Helen Ladd (1998) surveys the evidence and debate on racial discrimination in mortgage lending. A more detailed treatment is given in Goering and Wienk (1996).

James Heckman and Daniel McFadden, Nobel Laureates

The 2000 Nobel Prize in economics was awarded jointly to two econometricians: James J. Heckman of the University of Chicago and Daniel L. McFadden of the University of California at Berkeley.

McFadden was awarded the prize for developing models for analyzing discrete choice data (does a high school graduate enlist in the military, go to college, or get a job?). He started by considering the problem of an individual maximizing the expected utility of each possible choice, which could depend on observable variables (such as wages, job characteristics, and family background). He then derived models for the individual choice probabilities with unknown coefficients, which in turn could be estimated by maximum likelihood. These models and their extensions have proven widely useful in analyzing discrete choice data in many fields, including labor economics, health economics, and transportation economics.

Heckman was awarded the prize for developing tools for handling sample selection. As discussed in Section 9.2, sample selection bias occurs when the availability of data is influenced by a selection process related to the value of the dependent variable. For example, suppose you want to estimate the relationship between earnings and some regressor, X, using a random sample from the population. If you estimate the regression using the subsample of employed workers, that is, those reporting positive earnings, the OLS estimator could be subject to selection bias. Heckman's solution was to specify a preliminary equation with a binary dependent variable indicating whether the worker is in or out of the labor force (in or out of the subsample) and to treat this equation and the earnings equation as a system of simultaneous equations. This general strategy has been extended to selection problems that arise in many fields, ranging from labor economics to industrial organization to finance.

For more information on these and other Nobel laureates in economics, visit the Nobel Foundation Web site, www.nobel.se/economics.

11.5 Summary

When the dependent variable Y is binary, the population regression function is the probability that Y = 1, conditional on the regressors. Estimation of this population regression function entails finding a functional form that does justice to its probability interpretation, estimating the unknown parameters of that function, and interpreting the results. The resulting predicted values are predicted probabilities, and the estimated effect of a change in a regressor X is the estimated change in the probability that Y = 1 arising from the change in X.

A natural way to model the probability that Y = 1 given the regressors is to use a cumulative distribution function, where the argument of the c.d.f. depends on the regressors. Probit regression uses a normal c.d.f. as the regression function, and logit regression uses a logistic c.d.f. Because these models are nonlinear functions of the unknown parameters, those parameters are more complicated to estimate than linear regression coefficients. The standard estimation method is maximum likelihood. In practice, statistical inference using the maximum likelihood estimates proceeds the same way as it does in linear multiple regression; for example, 95% confidence intervals for a coefficient are constructed as the estimated coefficient ± 1.96 standard errors.

Despite its intrinsic nonlinearity, sometimes the population regression function can be adequately approximated by a linear probability model, that is, by the straight line produced by linear multiple regression. The linear probability model, probit regression, and logit regression all give similar "bottom line" answers when they are applied to the Boston HMDA data: All three methods estimate substantial differences in mortgage denial rates for otherwise similar black and white applicants.

Binary dependent variables are the most common example of limited dependent variables, which are dependent variables with a limited range. The final quarter of the twentieth century saw important advances in econometric methods for analyzing other limited dependent variables (see the Nobel Laureates box). Some of these methods are reviewed in Appendix 11.3.

Summary
1. When Y is a binary variable, the linear multiple regression model is called the linear probability model. The population regression line shows the probability that Y = 1 given the value of the regressors, X1, X2, ..., Xk.
2. Probit and logit regression models are nonlinear regression models used when Y is a binary variable. Unlike the linear probability model, probit and logit regression ensure that the predicted probability that Y = 1 is between 0 and 1 for all values of X.
3. Probit regression uses the standard normal cumulative distribution function. Logit regression uses the logistic cumulative distribution function. Logit and probit coefficients are estimated by maximum likelihood.
4. The values of coefficients in probit and logit regressions are not easy to interpret. Changes in the probability that Y = 1 associated with changes in one or more of the X's can be calculated using the general procedure for nonlinear models outlined in Key Concept 8.1.
5. Hypothesis tests on coefficients in the linear probability, logit, and probit models are performed using the usual t- and F-statistics.
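The contrast among the three models in points 1–4 can be sketched numerically. The coefficients below are hypothetical (they are not estimates from this chapter); the sketch only illustrates that probit and logit predicted probabilities always lie in (0, 1) while linear probability predictions need not, and that the effect of a change in X is computed as a difference in predicted probabilities, following the procedure of Key Concept 8.1.

```python
import math

def phi(z):
    # Standard normal c.d.f., computed from the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic(z):
    # Logistic c.d.f.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only
b0, b1 = -2.0, 0.8

for x in (0.0, 2.0, 6.0):
    lpm = b0 + b1 * x            # can fall outside [0, 1] (here, 2.8 at x = 6)
    probit = phi(b0 + b1 * x)    # always strictly between 0 and 1
    logit = logistic(b0 + b1 * x)
    print(f"x={x}: LPM={lpm:.3f}  probit={probit:.3f}  logit={logit:.3f}")

# Probit effect of changing x from 2 to 3: a difference in predicted probabilities
effect = phi(b0 + b1 * 3.0) - phi(b0 + b1 * 2.0)
print(f"probit effect of x: 2 -> 3: {effect:.3f}")
```

Note that the probit effect depends on the starting value of x, which is why probit and logit coefficients cannot be read directly as marginal effects.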

Key Terms
limited dependent variable (384)
linear probability model (387)
probit (389)
logit (389)
logistic regression (389)
likelihood function (398)
maximum likelihood estimator (MLE) (398)
fraction correctly predicted (399)
pseudo-R² (400)

Review the Concepts

11.1 Suppose that a linear probability model yields a predicted value of Y that is equal to 1.3. Explain why this is nonsensical.

11.2 In Table 11.2 the estimated coefficient on black is 0.084 in column (1), 0.688 in column (2), and 0.389 in column (3). In spite of these huge differences, all three models yield similar estimates of the marginal effect of race on the probability of mortgage denial. How can this be?

11.3 One of your friends is using data on individuals to study the determinants of smoking at your university. She asks you whether she should use a probit, logit, or linear probability model. What advice do you give her? Why?

11.4 Why are the coefficients of probit and logit models estimated by maximum likelihood instead of OLS?

Exercises
Exercises 11.1 through 11.5 are based on the following scenario: Four hundred driver's license applicants were randomly selected and asked whether they passed their driving test (Passᵢ = 1) or failed their test (Passᵢ = 0); data were also collected on their gender (Maleᵢ = 1 if male and = 0 if female) and their years of driving experience (Experienceᵢ, in years). The following table summarizes several estimated models.



Dependent variable: Pass

                      (1)       (2)       (3)       (4)       (5)       (6)       (7)
                    Probit     Logit     LPM      Probit     Logit     LPM      Probit

Experience           0.031     0.040     0.006                                    …
                    (0.009)   (0.016)   (0.002)

Male                                              -0.333    -0.622    -0.071    -0.174
                                                  (0.161)   (0.303)   (0.034)   (0.259)

Male × Experience                                                                 …

Constant             0.712     1.059     0.774     1.282     2.197     0.900     0.806
                    (0.126)   (0.221)   (0.034)   (0.124)   (0.242)   (0.022)   (0.200)

11.1 Using the results in column (1):

a. Does the probability of passing the test depend on Experience? Explain.

b. Matthew has 10 years of driving experience. What is the probability that he will pass the test?

c. Christopher is a new driver (zero years of experience). What is the probability that he will pass the test?

d. The sample included values of Experience between 0 and 40 years, and only four people in the sample had more than 10 years of driving experience. Jed is 95 years old and has been driving since he was 15. What is the model's prediction for the probability that Jed will pass the test? Do you think that this prediction is reliable? Why or why not?

11.2
a. Answer (a)–(c) from Exercise 11.1 using the results in column (2).

b. Sketch the predicted probabilities from the probit and logit in columns (1) and (2) for values of Experience between 0 and 60. Are the probit and logit models similar?

11.3
a. Answer (a)–(c) from Exercise 11.1 using the results in column (3).

b. Sketch the predicted probabilities from the probit and LPM in columns (1) and (3) as a function of Experience for values of Experience between 0 and 60. Do you think that the LPM is appropriate here? Why or why not?

11.4 Using the results in columns (4)–(6):

a. Compute the estimated probability of passing the test for men and for women.

b. Are the models in (4)–(6) different? Why or why not?

11.5 Using the results in column (7):

a. Akira is a man with 10 years of driving experience. What is the probability that he will pass the test?

b. Jane is a woman with 2 years of driving experience. What is the probability that she will pass the test?

c. Does the effect of experience on test performance depend on gender? Explain.
11.6 Use the estimated probit model in Equation (11.8) to answer the following questions:

a. A black mortgage applicant has a P/I ratio of 0.35. What is the probability that his application will be denied?

b. Suppose that the applicant reduced this ratio to 0.30. What effect would this have on his probability of being denied a mortgage?

c. Repeat (a) and (b) for a white applicant.

d. Does the marginal effect of the P/I ratio on the probability of mortgage denial depend on race? Explain.


car;. and
ing

11 . 7

as 1:'.
I pass

11.8

why

Repeat Excrclloe ll.o ulomg Lhl' lug.~t mot.ll'l tn L4uatton ( ll.lO).Are the logtt
anJ prohit rc~ult~ sim1lar? Expl:tm
Consider the linear prob.Jbility OlllOcl r, = 13u..:. /3,X. - U;. where
Pr( Y, = I X) = {3, -'- {3 ). .

:r1) =U.
Show that ''ar( u :r) = ({3

a. Show that E(u


b.

(2).
column~

c. ].., u,

e rr('lhil

d.
0 (3).

in
C.lptfl

n ate

.&.

IHmt. Re"tew &juauon (27)

ll.9

hetero~;kct.Ja..,tk'.'

(Require~

{3 \ ,)[1 - ({3

/3 XI].

c\plain.

Seclion 1 U) Derive the likelihood function.

l "'e the estimated lincnr prohahrhty moJd hown in column (1) ofTable
to answer the foUo\\ tnl!

11. ~

a. Two applicants_ one while and one hlack, applv for a mortgage. They
h.1ve tile same;' 1luc~ lor .tilt he rcl.!.rc~<;Of' other than race. Hov. much
morc liJ..cl) ts the black applicant to bt: dt:ntcd a mortgage?



b. Construct a 9So/o confidence intenal for your answer 10 (a).
c. Think of an importan t omitted variable that might bias thl.! answer , 11
(a). What is it and how would it bias the rcsullS?

11.10 (Requires Section 11.3 and calculus) Suppose that a random variable Y has the following probability distribution: Pr(Y = 1) = p, Pr(Y = 2) = q, and Pr(Y = 3) = 1 − p − q. A random sample of size n is drawn from this distribution, and the random variables are denoted Y₁, Y₂, ..., Yₙ.

a. Derive the likelihood function for the parameters p and q.

b. Derive formulas for the MLEs of p and q.

11.11 (Requires Appendix 11.3) Which model would you use for:

a. A study explaining the number of minutes that a person spends talking on a cellular phone during the month?

b. A study explaining grades (A–F) in a large Principles of Economics class?

c. A study of consumers' choices for Coke, Pepsi, or generic cola?

d. A study of the number of cellular phones owned by a family?
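As a worked illustration of the kind of calculation Exercises 11.1 and 11.2 call for, the sketch below evaluates the fitted probit from column (1), Pr(Pass = 1) = Φ(0.712 + 0.031 × Experience), and the fitted logit from column (2), using the point estimates as transcribed from the table above (a sketch, not a substitute for working the exercises).

```python
import math

def phi(z):
    # Standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Point estimates from the table: probit column (1), logit column (2)
probit_b0, probit_b1 = 0.712, 0.031
logit_b0, logit_b1 = 1.059, 0.040

def pass_prob_probit(experience):
    return phi(probit_b0 + probit_b1 * experience)

def pass_prob_logit(experience):
    z = logit_b0 + logit_b1 * experience
    return 1.0 / (1.0 + math.exp(-z))

for years in (0, 10, 80):   # 80 years is far outside the 0-40 range of the data
    print(years, round(pass_prob_probit(years), 3), round(pass_prob_logit(years), 3))
```

The last line illustrates the extrapolation problem raised in Exercise 11.1(d): the model confidently predicts a near-certain pass for 80 years of experience, even though almost no one in the sample had more than 10 years.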

Empirical Exercises
E11.1 It has been conjectured that workplace smoking bans induce smokers to quit by reducing their opportunities to smoke. In this assignment you will estimate the effect of workplace smoking bans on smoking, using data on a sample of 10,000 U.S. indoor workers from 1991–1993, available on the textbook Web site www.aw-bc.com/stock_watson in the file Smoking. The data set contains information on whether individuals were or were not subject to a workplace smoking ban, whether the individuals smoked, and other individual characteristics.⁷ A detailed description is given in Smoking_Description, available on the Web site.

a. Estimate the probability of smoking for (i) all workers, (ii) workers affected by workplace smoking bans, and (iii) workers not affected by workplace smoking bans.

⁷These data were provided by Professor William Evans of the University of Maryland and were used in his paper with Matthew Farrelly and Edward Montgomery, "Do Workplace Smoking Bans Reduce Smoking?" American Economic Review, 1999; 89(4): 728–747.


b. What is the difference in the probability of smoking between workers affected by a workplace smoking ban and workers not affected by a workplace smoking ban? Use a linear probability model to determine whether this difference is statistically significant.

c. Estimate a linear probability model with smoker as the dependent variable and the following regressors: smkban, female, age, age², hsdrop, hsgrad, colsome, colgrad, black, and hispanic. Compare the estimated effect of a smoking ban from this regression with your answer from (b). Suggest a reason, based on the substance of this regression, explaining the change in the estimated effect of a smoking ban between (b) and (c).

d. Test the hypothesis that the coefficient on smkban is zero in the population version of the regression in (c) against the alternative that it is nonzero, at the 5% significance level.

e. Test the hypothesis that the probability of smoking does not depend on the level of education in the regression in (c). Does the probability of smoking increase or decrease with the level of education?

f. Based on the regression in (c), is there a nonlinear relationship between age and the probability of smoking? Plot the relationship between the probability of smoking and age for 18 ≤ age ≤ 65 for a white, non-Hispanic male college graduate with no workplace smoking ban.

E11.2 This exercise uses the same data as Empirical Exercise 11.1.

a. Estimate a probit model using the same regressors as in Empirical Exercise 11.1(c).

b. Test the hypothesis that the coefficient on smkban is zero in the population version of this probit regression against the alternative that it is nonzero, at the 5% significance level. Compare your t-statistic and your conclusion with those of Empirical Exercise 11.1(d) based on the linear probability model.

c. Test the hypothesis that the probability of smoking does not depend on the level of education in this probit model. Compare your results with those in Empirical Exercise 11.1(e) using the linear probability model.

d. Mr. A is white, non-Hispanic, 20 years old, and a high school dropout. Using the probit regression from (a), and assuming that Mr. A is not subject to a workplace smoking ban, calculate the probability that


Mr. A smokes. Carry out the calculation again, assuming that he is subject to a workplace smoking ban. What is the effect of the smoking ban on the probability of smoking?

e. Repeat (d) for Ms. B, a female, black, 40-year-old college graduate.

f. Repeat (d) and (e) using the linear probability model from Empirical Exercise 11.1(c).

g. Based on the answers to (d)–(f), do the probit and linear probability model results differ? If they do, which results make more sense? Are the estimated effects large in a real-world sense?

h. Are there important remaining threats to internal validity?

E11.3 In this exercise you will study health insurance, health status, and employment using a random sample of more than 8000 workers in the United States. The data are available on the textbook Web site www.aw-bc.com/stock_watson in the file Insurance.⁸ A detailed description is given in Insurance_Description, available on the Web site.

a. Are the self-employed less likely to have health insurance than wage earners? If so, is the difference large in a real-world sense? Is the difference statistically significant?

b. The self-employed might systematically differ from wage earners in their age, education, and so forth. After you control for these other factors, are the self-employed less likely to have health insurance?

c. How does health insurance status vary with age? Are older workers more likely to have health insurance? Less likely?

d. Is the effect of self-employment on insurance status different for older workers than it is for younger workers?

e. It has been argued that the self-employed are less likely to be insured, but despite this, they are just as healthy as wage earners. Is this true? Does the argument hold up for young workers? For older workers? Are there potential two-way causality problems that might undermine the internal validity of this kind of statistical analysis?

⁸These data were provided by Professor Harvey Rosen of Princeton University and were used in his paper with Craig Perry, "The Self-Employed Are Less Likely Than Wage Earners to Have Health Insurance," in Douglas Holtz-Eakin and Harvey S. Rosen, eds., Public Policy and the Economics of Entrepreneurship, MIT Press, 2004.

APPENDIX
11.1 The Boston HMDA Data Set

The Boston HMDA data set was collected by researchers at the Federal Reserve Bank of Boston. The data combine information from mortgage applications and a follow-up survey of the banks and other lending institutions that received these mortgage applications. The data pertain to mortgage applications made in 1990 in the greater Boston metropolitan area. The full data set has 2925 observations, consisting of all mortgage applications by blacks and Hispanics plus a random sample of mortgage applications by whites.

To narrow the scope of the analysis in this chapter, we use a subset of the data for single-family residences only (thereby excluding data on multifamily homes) and for black and white applicants only (thereby excluding data on applicants from other minority groups). This leaves 2380 observations. Definitions of the variables used in this chapter are given in Table 11.1.

These data were provided to us by Geoffrey Tootell of the Research Department of the Federal Reserve Bank of Boston. More information about this data set, along with the conclusions reached by the Federal Reserve Bank of Boston researchers, is available in the article by Alicia H. Munnell, Geoffrey M. B. Tootell, Lynn E. Browne, and James McEneaney, "Mortgage Lending in Boston: Interpreting HMDA Data," American Economic Review, 1996, pp. 25–53.

APPENDIX
11.2 Maximum Likelihood Estimation

This appendix provides a brief introduction to maximum likelihood estimation in the context of the binary response models discussed in this chapter. We start by deriving the MLE of the success probability p for n i.i.d. observations of a Bernoulli random variable. We then turn to the probit and logit models and discuss the pseudo-R². We conclude with a discussion of standard errors for predicted probabilities. This appendix uses calculus at two points.


MLE for n i.i.d. Bernoulli Random Variables

The first step in computing the MLE is to derive the joint probability distribution. For n i.i.d. observations on a Bernoulli random variable, this joint probability distribution is the extension of the n = 2 case in Section 11.3 to general n:

Pr(Y₁ = y₁, Y₂ = y₂, ..., Yₙ = yₙ) = p^S (1 − p)^(n−S),   (11.13)

where S = y₁ + y₂ + ... + yₙ is the number of "successes."

The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. Let S = Y₁ + ... + Yₙ; then the likelihood function is

f_Bernoulli(p; Y₁, ..., Yₙ) = p^S (1 − p)^(n−S).   (11.14)

The MLE of p is the value of p that maximizes the likelihood in Equation (11.14). The likelihood function can be maximized using calculus. It is convenient to maximize not the likelihood but rather its logarithm (because the logarithm is a strictly increasing function, maximizing the likelihood or its logarithm gives the same estimator). The log likelihood is S ln(p) + (n − S) ln(1 − p), and the derivative of the log likelihood with respect to p is

d/dp ln[f_Bernoulli(p; Y₁, ..., Yₙ)] = S/p − (n − S)/(1 − p).   (11.15)

Setting the derivative in Equation (11.15) to zero and solving for p yields the MLE, p̂ = S/n = Ȳ.

MLE for the Probit Model

For the probit model, the probability that Yᵢ = 1, conditional on X₁ᵢ, ..., Xₖᵢ, is pᵢ = Φ(β₀ + β₁X₁ᵢ + ... + βₖXₖᵢ). The conditional probability distribution for the i-th observation is Pr(Yᵢ = yᵢ | X₁ᵢ, ..., Xₖᵢ) = pᵢ^yᵢ (1 − pᵢ)^(1−yᵢ). Assuming that (X₁ᵢ, ..., Xₖᵢ, Yᵢ) are i.i.d., i = 1, ..., n, the joint probability distribution of Y₁, ..., Yₙ, conditional on the X's, is

Pr(Y₁ = y₁, ..., Yₙ = yₙ | X₁ᵢ, ..., Xₖᵢ, i = 1, ..., n)
   = Pr(Y₁ = y₁ | X₁₁, ..., Xₖ₁) × ... × Pr(Yₙ = yₙ | X₁ₙ, ..., Xₖₙ)
   = p₁^y₁(1 − p₁)^(1−y₁) × ... × pₙ^yₙ(1 − pₙ)^(1−yₙ).   (11.16)

The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. It is conventional to consider the logarithm of the likelihood. Accordingly, the log likelihood function is


ln[f_probit(β₀, ..., βₖ; Y₁, ..., Yₙ | X₁ᵢ, ..., Xₖᵢ, i = 1, ..., n)]
   = Σᵢ₌₁ⁿ Yᵢ ln[Φ(β₀ + β₁X₁ᵢ + ... + βₖXₖᵢ)]
   + Σᵢ₌₁ⁿ (1 − Yᵢ) ln[1 − Φ(β₀ + β₁X₁ᵢ + ... + βₖXₖᵢ)],   (11.17)

where this expression incorporates the probit formula for the conditional probability, pᵢ = Φ(β₀ + β₁X₁ᵢ + ... + βₖXₖᵢ).

The MLE for the probit model maximizes the likelihood function or, equivalently, the logarithm of the likelihood function given in Equation (11.17). Because there is no simple formula for the MLE, the probit likelihood function must be maximized using a numerical algorithm on the computer.

Under general conditions, maximum likelihood estimators are consistent and have a normal sampling distribution in large samples.
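The numerical maximization just described can be sketched in a few lines. The code below (an illustration on simulated data, not the estimator behind the HMDA results in the text) minimizes the negative of Equation (11.17) for a single regressor with `scipy.optimize.minimize` and recovers coefficients close to the true values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
beta_true = np.array([0.5, 1.0])                 # [beta0, beta1], chosen for the simulation
p = norm.cdf(beta_true[0] + beta_true[1] * x)
y = rng.binomial(1, p)

def neg_log_likelihood(beta):
    # Negative of Equation (11.17) with a single regressor
    z = beta[0] + beta[1] * x
    pi = np.clip(norm.cdf(z), 1e-10, 1 - 1e-10)  # guard the logs numerically
    return -np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)   # close to beta_true in a sample this large
```

The probit log likelihood is globally concave, so the numerical search converges to the unique maximizer from essentially any starting point.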

MLE for the Logit Model

The likelihood for the logit model is derived in the same way as the likelihood for the probit model. The only difference is that the conditional success probability pᵢ for the logit model is given by Equation (11.9). Accordingly, the log likelihood of the logit model is given by Equation (11.17), with Φ(β₀ + β₁X₁ᵢ + ... + βₖXₖᵢ) replaced by [1 + e^−(β₀ + β₁X₁ᵢ + ... + βₖXₖᵢ)]⁻¹. Like the probit model, there is no simple formula for the MLE of the logit coefficients, so the log likelihood must be maximized numerically.

Pseudo-R²

The pseudo-R² compares the value of the likelihood of the estimated model to the value of the likelihood when none of the X's are included as regressors. Specifically, the pseudo-R² for the probit model is

pseudo-R² = 1 − ln(f_probit^max) / ln(f_Bernoulli^max),   (11.18)

where f_probit^max is the value of the maximized probit likelihood (which includes the X's) and f_Bernoulli^max is the value of the maximized Bernoulli likelihood (the probit model excluding all the X's).
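Once the two maximized log likelihoods are in hand, Equation (11.18) is a single line of code. The numbers below are hypothetical, chosen only to show the computation (log likelihoods are negative, so the ratio is between 0 and 1 when the probit fits better than the intercept-only model).

```python
def pseudo_r_squared(loglik_probit, loglik_bernoulli):
    # Equation (11.18): 1 - ln(f_probit^max) / ln(f_Bernoulli^max)
    return 1.0 - loglik_probit / loglik_bernoulli

# Hypothetical maximized log likelihoods (illustrative numbers only)
print(pseudo_r_squared(-635.6, -700.9))   # about 0.093
```

If the regressors add nothing, the two log likelihoods coincide and the pseudo-R² is zero.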

Standard Errors for Predicted Probabilities

For simplicity, consider the case of a single regressor in the probit model. Then the predicted probability at a fixed value of that regressor, x, is p̂(x) = Φ(β̂₀^MLE + β̂₁^MLE x), where β̂₀^MLE and β̂₁^MLE are the MLEs of the two probit coefficients. Because this predicted probability depends on the estimators β̂₀^MLE and β̂₁^MLE, and because those estimators have a sampling distribution, the predicted probability will also have a sampling distribution.

The variance of the sampling distribution of p̂(x) is calculated by approximating the function Φ(β̂₀^MLE + β̂₁^MLE x), a nonlinear function of β̂₀^MLE and β̂₁^MLE, by a linear function of β̂₀^MLE and β̂₁^MLE. Specifically, let

Φ(β̂₀^MLE + β̂₁^MLE x) ≅ c + a₀(β̂₀^MLE − β₀) + a₁(β̂₁^MLE − β₁),   (11.19)

where the constant c and the factors a₀ and a₁ depend on x and are obtained from a first-order Taylor series expansion: c = Φ(β₀ + β₁x), and a₀ and a₁ are the partial derivatives, a₀ = ∂Φ(β₀ + β₁x)/∂β₀ and a₁ = ∂Φ(β₀ + β₁x)/∂β₁. The variance of p̂(x) now can be calculated using the approximation in Equation (11.19) and the expression for the variance of the sum of two random variables in Equation (2.31):

var[p̂(x)] = var[c + a₀(β̂₀^MLE − β₀) + a₁(β̂₁^MLE − β₁)]
          = a₀² var(β̂₀^MLE) + a₁² var(β̂₁^MLE) + 2a₀a₁ cov(β̂₀^MLE, β̂₁^MLE).   (11.20)

Using Equation (11.20), the standard error of p̂(x) can be calculated using estimates of the variances and covariance of the MLEs.

APPENDIX
11.3 Other Limited Dependent Variable Models

This appendix surveys some models for limited dependent variables, other than binary variables, found in econometric applications. In most cases the OLS estimators of the parameters of limited dependent variable models are inconsistent, and estimation is routinely done using maximum likelihood. There are several advanced references available to the reader interested in further details; see, for example, Ruud (2000) and Maddala (1983).

Censored and Truncated Regression Models

Suppose you have cross-sectional data on car purchases by individuals in a given year. Car buyers have positive expenditures, which can reasonably be treated as continuous random variables, but nonbuyers spent $0. Thus the distribution of car expenditures is a combination of a discrete distribution (at zero) and a continuous distribution.

Nobel laureate James Tobin developed a useful model for a dependent variable with a partly continuous and partly discrete distribution (Tobin, 1958). Tobin suggested modeling the i-th individual in the sample as having a desired level of spending, Yᵢ*, that is related to the regressors (for example, family size) according to a linear regression model. That is, when there is a single regressor, the desired level of spending is

Yᵢ* = β₀ + β₁Xᵢ + uᵢ,  i = 1, ..., n.   (11.21)

If Yᵢ* (what the consumer wants to spend) exceeds some cutoff, such as the minimum price of a car, then the consumer buys the car and spends Yᵢ = Yᵢ*, which is observed. However, if Yᵢ* is less than the cutoff, then spending of Yᵢ = 0 is observed instead of Yᵢ*.

When Equation (11.21) is estimated using observed expenditures Yᵢ in place of Yᵢ*, the OLS estimator is inconsistent. Tobin solved this problem by deriving the likelihood function using the additional assumption that uᵢ has a normal distribution, and the resulting MLE has been used by applied econometricians to analyze many problems in economics. In Tobin's honor, Equation (11.21), combined with the assumption of normal errors, is called the tobit regression model. The tobit model is an example of a censored regression model, so called because the dependent variable has been "censored" above or below a certain cutoff.
Sample Selection Models


In the ccnl>Ore<l regression modd. thcrl' arc c.J,It,l on huy~;.rs and non buyers, as there would

be if the data were obained by '\lmph! random \arn phll ~ tlf the.: ,tc.lult population. u: however, the d<tta ar e collected fr<'rn' tic ta' rcwrJ~ th~:n tht d 111 \\I)Uid mclud..: ool~ l'luyers:
'!ben~ would he no daW at II f111 nnnhu\er;. 0 .111 Ill \\ h1d1 oh-.c.rvat1on:. .m~ una\'atlable
nho\'c or h.:It''' a threshold ldata for buycf' onl}) tTl ullcJ trunl ted dat. Thl truna ted
regre,, ion model rs a regn:-,~ton mudcl .tppl ~ J t 1 1. nt J n \\ hich oh'>d\ ttion' 1rl! ~imply
nrv \'llJI
par;tm~

don.:

rc.,dcr

una\atlahl~..

wbc.:o the dependent' .tuabk 1s abnn or I'> lo\\ .. cerlam ~:utull


1bt. t run~;ated regression modd 1., an exnmpk 01 .1 <;~tmph: ''-lccuon model. in whtch

the 'dt!Ctlon mccharu::.m (an mdt' 1c.J ua l1~ m th~o 'mph. h\ virtue of hu~ mg a c.tr) tl> rdated
lo t he \ ;tlu~ufthedepend~;nl ,,,n.ll'lh.. (the pncc ol th1.. ~ 11 \!>lh'<:U"!>I!Ut.;the hox inS~c
tllm II 4 on"' .tpproach to e~umatmn l'l s.tmph.. dc~.:uun modd-. 1s to dt.:vclop t\\O equatiunS,I)nc.: for }'" and one lor v.h~ thcr } " ( N ... .:d T11t' r lf!lrnctcrs or the:. model can then

vear car

randorn

he csttmutec.J by maximum h clihouJ, or in 1 ''..: P" ~ rrcx:cdure, e!>llmntmg the sd<!CUon


equation hrst. th.:n estimating the t.:ljUa\ton tor ) lot J\.ldiuonal discuss1on. ~ce R uud
(2000, Chapter 28), Grc.:c:nl! (:!lXXI. Section 20.4). m Wu<llc.lrtdg~; (200:!, Chitpter 17).


Count Data
Count data arise when the dependent variable is a counting number, for example, the number of restaurant meals eaten by a consumer in a week. When these numbers are large, the variable can be treated as approximately continuous, but when they are small, the continuous approximation is a poor one. The linear regression model, estimated by OLS, can be used for count data, even if the number of counts is small. Predicted values from the regression are interpreted as the expected value of the dependent variable, conditional on the regressors. So, when the dependent variable is the number of restaurant meals eaten, a predicted value of 1.7 means, on average, 1.7 restaurant meals per week. As in the binary response model, however, OLS does not take advantage of the special structure of count data and can yield nonsense predictions, for example, −0.2 restaurant meals per week. Just as probit and logit eliminate nonsense predictions when the dependent variable is binary, special models do so for count data. The two most widely used models are the Poisson and negative binomial regression models.

Ordered Responses
Ordered response data arise when mutually exclusive qualitative categories have a natural ordering, such as obtaining a high school degree, obtaining some college education (but not graduating), or graduating from college. Like count data, ordered response data have a natural ordering, but unlike count data, they do not have natural numerical values.

Because there are no natural numerical values for ordered response data, OLS is inappropriate. Instead, ordered data are often analyzed using a generalization of probit, called the ordered probit model, in which the probabilities of each outcome (e.g., a college education), conditional on the independent variables (such as the parents' income), are modeled using the cumulative normal distribution.

Discrete Choice Data

A discrete choice or multiple choice variable can take on multiple unordered qualitative values. One example in economics is the mode of transport chosen by a commuter: She might take the subway, ride the bus, drive, or make her way under her own power (walk, bicycle). If we were to analyze these choices, the dependent variable would have four possible outcomes (subway, bus, car, human-powered). These outcomes are not ordered in any natural way. Instead, the outcomes are a choice among distinct qualitative alternatives.

The econometric task is to model the probability of choosing the various options, given various regressors such as individual characteristics (how far the commuter's house is from the subway station) and the characteristics of each option (the price of the subway). As discussed in the box in Section 11.4, models for the analysis of discrete choice data can be developed from principles of utility maximization. Individual choice probabilities can be expressed in probit or logit form, and those models are called multinomial probit and multinomial logit regression models.
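In the multinomial logit model just mentioned, each alternative j has an index Vⱼ (a linear function of the regressors), and the choice probabilities take the form Pr(choice = j) = exp(Vⱼ) / Σₖ exp(Vₖ). The sketch below uses hypothetical utility indices for the four commuting modes in the example above.

```python
import math

def choice_probabilities(indices):
    # Multinomial logit: Pr(j) = exp(V_j) / sum_k exp(V_k)
    m = max(indices.values())                    # subtract the max for numerical stability
    exp_v = {j: math.exp(v - m) for j, v in indices.items()}
    total = sum(exp_v.values())
    return {j: e / total for j, e in exp_v.items()}

# Hypothetical utility indices for the commuter example (illustration only)
v = {"subway": 1.2, "bus": 0.8, "car": 1.5, "human-powered": -0.3}
probs = choice_probabilities(v)
print(probs)   # the probabilities sum to 1; "car" has the largest
```

Raising one alternative's index (say, by lowering the subway fare) raises its probability and lowers all the others, which is the sense in which the model captures substitution among the options.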

CHAPTER 12

Instrumental Variables Regression

Chapter 9 discussed several problems, including omitted variables, errors-in-variables, and simultaneous causality, that make the error term correlated with the regressor. Omitted variable bias can be addressed directly by including the omitted variable in a multiple regression, but this is only feasible if you have data on the omitted variable. And sometimes, such as when causality runs both from X to Y and from Y to X, so that there is simultaneous causality bias, multiple regression simply cannot eliminate the bias. If a direct solution to these problems is either infeasible or unavailable, then a new method is required.

Instrumental variables (IV) regression is a general way to obtain a consistent estimator of the unknown coefficients of the population regression function when the regressor, X, is correlated with the error term, u. To understand how IV regression works, think of the variation in X as having two parts: one part that, for whatever reason, is correlated with u (this is the part that causes the problems), and a second part that is uncorrelated with u. If you had information that allowed you to isolate the second part, then you could focus on those variations in X that are uncorrelated with u and disregard the variations in X that bias the OLS estimates. This is, in fact, what IV regression does. The information about the movements in X that are uncorrelated with u is gleaned from one or more additional variables, called instrumental variables or simply instruments. Instrumental variables regression uses these additional variables as tools or "instruments" to isolate the movements in X that are uncorrelated with u, which in turn permit consistent estimation of the regression coefficients.

422 CHAPTER 12 Instrumental Variables Regression

The first two sections of this chapter describe the mechanics and assumptions of IV regression: why IV regression works, what is a valid instrument, and how to implement and to interpret the most common IV regression method, two stage least squares. The key to successful empirical analysis using instrumental variables is finding valid instruments, and Section 12.3 takes up the question of how to assess whether a set of instruments is valid. As an illustration, Section 12.4 uses IV regression to estimate the elasticity of demand for cigarettes. Finally, Section 12.5 turns to the difficult question of where valid instruments come from in the first place.

12.1 The IV Estimator with a Single Regressor and a Single Instrument


We start with the case of a single regressor, X, which might be correlateu with the
regression e rroc.u. lf X and u are correlated, then the OLS cstimatoJ 1s inwn IStent.tbat is, it may not be close to the true value of the regression coeffictem even
when the sample tS \ Cit large [s~:!e Equation (6.1)]. As discussed in Section 9.2 this
correlation between X and u can stem from various source;;, mcludmg om ltl'd
variables. errors in variables (measurement errors in the regressors), or sm ltaneous causality (when causality runs "backward" from Y to X as well as " fun\ 1td''
from X toY). Wllatever the source of the correlation betwe~;:n X and u. if thc:rl! IS
a valid instrumental variable, Z, Lhen the effect on Y oC a unit change in X 111 be
estimated using the instrumental variables estimator.

The IV Model and Assumptions

The population regression model relating the dependent variable Y_i and regressor X_i is

Y_i = β0 + β1 X_i + u_i,  i = 1, ..., n,   (12.1)

where as usual u_i is the error term representing omitted factors that determine Y_i. If X_i and u_i are correlated, the OLS estimator is inconsistent. Instrumental variables estimation uses an additional, "instrumental" variable Z to isolate that part of X that is uncorrelated with u_i.


Endogeneity and exogeneity. Instrumental variables regression has some specialized terminology to distinguish variables that are correlated with the population error term u from ones that are not. Variables correlated with the error term are called endogenous variables, while variables uncorrelated with the error term are called exogenous variables. The historical source of these terms traces to models with multiple equations, in which an "endogenous" variable is determined within the model while an "exogenous" variable is determined outside the model. For example, Section 9.2 considered the possibility that, if low test scores produced decreases in the student-teacher ratio because of political intervention and increased funding, then causality would run both from the student-teacher ratio to test scores and from test scores to the student-teacher ratio. This was represented mathematically as a system of two simultaneous equations [Equations (9.3) and (9.4)], one for each causal connection. As discussed in Section 9.2, because both test scores and the student-teacher ratio are determined within the model, both are correlated with the population error term u; that is, in this example, both variables are endogenous. In contrast, an exogenous variable, which is determined outside the model, is uncorrelated with u.


The two conditions for a valid instrument. A valid instrumental variable ("instrument") must satisfy two conditions, known as instrument relevance and instrument exogeneity:

1. Instrument relevance: corr(Z_i, X_i) ≠ 0.

2. Instrument exogeneity: corr(Z_i, u_i) = 0.

If an instrument is relevant, then variation in the instrument is related to variation in X_i. If in addition the instrument is exogenous, then that part of the variation of X_i captured by the instrumental variable is exogenous. Thus, an instrument that is relevant and exogenous can capture movements in X_i that are exogenous. This exogenous variation can in turn be used to estimate the population coefficient β1.

The two conditions for a valid instrument are vital for instrumental variables regression, and we return to them (and their extension to multiple regressors and multiple instruments) repeatedly throughout this chapter.

The Two Stage Least Squares Estimator

If the instrument Z satisfies the conditions of instrument relevance and exogeneity, then the coefficient β1 can be estimated using an IV estimator called two stage least squares (TSLS). As the name suggests, the two stage least squares estimator is calculated in two stages. The first stage decomposes X into two components: a


problematic component that may be correlated with the regression error, and another, problem-free component that is uncorrelated with the error. The second stage uses the problem-free component to estimate β1.

The first stage begins with a population regression linking X and Z:

X_i = π0 + π1 Z_i + v_i,   (12.2)

where π0 is the intercept, π1 is the slope, and v_i is the error term. This regression provides the needed decomposition of X_i. One component is π0 + π1 Z_i, the part of X_i that can be predicted by Z_i. Because Z_i is exogenous, this component of X_i is uncorrelated with u_i, the error term in Equation (12.1). The other component of X_i is v_i, which is the problematic component of X_i that is correlated with u_i.

The idea behind TSLS is to use the problem-free component of X_i, π0 + π1 Z_i, and to disregard v_i. The only complication is that the values of π0 and π1 are unknown, so π0 + π1 Z_i cannot be calculated. Accordingly, the first stage of TSLS applies OLS to Equation (12.2) and uses the predicted values from the OLS regression, X̂_i = π̂0 + π̂1 Z_i, where π̂0 and π̂1 are the OLS estimates.

The second stage of TSLS is easy: Regress Y_i on X̂_i using OLS. The resulting estimators from the second-stage regression are the TSLS estimators, β̂0^TSLS and β̂1^TSLS.
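As a concrete illustration, the two stages can be carried out by hand. The sketch below uses simulated data with illustrative coefficients (nothing here comes from the chapter's datasets) and plain NumPy least squares; in practice TSLS software runs both stages automatically and, as discussed later, also computes correct standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated data: the error u feeds into both X and Y, so X is endogenous,
# while the instrument Z shifts X but is independent of u.
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # corr(X, u) != 0
y = 2.0 + 1.5 * x + u                       # true beta_1 = 1.5

# First stage: regress X on Z by OLS and keep the predicted values X-hat.
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Second stage: regress Y on X-hat by OLS; the slope is the TSLS estimator.
Xhat = np.column_stack([np.ones(n), x_hat])
beta_tsls = np.linalg.lstsq(Xhat, y, rcond=None)[0]

# For comparison, OLS on the raw X is inconsistent: cov(X, u) > 0 biases it up.
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

print(beta_tsls[1])  # close to the true value 1.5
print(beta_ols[1])   # noticeably above 1.5: OLS does not recover beta_1
```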

Why Does IV Regression Work?

Two examples provide some intuition for why IV regression solves the problem of correlation between X_i and u_i.

Example #1: Philip Wright's problem. The method of instrumental variables estimation was first published in 1928 in an appendix to a book written by Philip G. Wright (Wright, 1928), although the key ideas of IV regression appear to have been developed collaboratively with his son, Sewall Wright (see the box). Philip Wright was concerned with an important economic problem of his day: how to set an import tariff (a tax on imported goods) on animal and vegetable oils and fats, such as butter and soy oil. In the 1920s, import tariffs were a major source of tax revenue for the United States. The key to understanding the economic effect of a tariff was having quantitative estimates of the demand and supply curves of the goods. Recall that the supply elasticity is the percentage change in the quantity supplied arising from a 1% increase in the price, and the demand elasticity is the percentage change in the quantity demanded arising from a 1% increase in the price. Philip Wright needed estimates of these elasticities of supply and demand.


Who Invented Instrumental Variables Regression?

Instrumental variables regression was first proposed as a solution to the simultaneous causation problem in an econometrics appendix to Philip G. Wright's 1928 book, The Tariff on Animal and Vegetable Oils. If you want to know how animal and vegetable oils were produced, transported, and sold in the early twentieth century, then the first 285 pages of the book are for you. Econometricians, however, will be more interested in Appendix B. The appendix provides two derivations of "the method of introducing external factors," what we now call the instrumental variables estimator, and uses IV regression to estimate the supply and demand elasticities for butter and flaxseed oil. Philip was an obscure economist with a scant intellectual legacy other than this appendix, but his son Sewall went on to become a preeminent population geneticist and statistician. Because the mathematical material in the appendix is so different than the rest of the book, many econometricians assumed that Philip's son Sewall Wright wrote the appendix anonymously. So who wrote Appendix B?

In fact, either father or son could have been the author. Philip Wright (1861-1934) received a master's degree in economics from Harvard University in 1887, and he taught mathematics and economics (as well as literature and physical education) at a small college in Illinois. In a book review [Wright (1915)], he used a figure like Figure 12.1a and b to show how a regression of quantity on price will not, in general, estimate a demand curve, but instead estimates a combination of the supply and demand curves. In the early 1920s, Sewall Wright was researching the statistical analysis of multiple equations with multiple causal variables in the context of genetics research that in part led to his assuming a professorship in 1930 at the University of Chicago.

Although it is too late to ask Philip or Sewall who wrote Appendix B, it is never too late to do some statistical detective work. Stylometrics is the subfield of statistics, invented by Frederick Mosteller and David Wallace (1963), that uses subtle subconscious differences in writing styles to identify authorship of disputed texts using statistical analysis of grammatical constructions and word choice. The field has had verified successes, such as Donald Foster's (1996) uncovering of Joe Klein as the author of the political novel Primary Colors. When Appendix B is compared statistically to texts known to have been written independently by Philip and by Sewall, the results are clear: Philip was the author.

Does this mean that Philip G. Wright invented IV regression? Not quite. Recently, correspondence between Philip and Sewall in the mid-1920s has come to light, and this correspondence shows that the development of IV regression was a joint intellectual collaboration between father and son. To learn more, see Stock and Trebbi (2003).

To be concrete, consider the problem of estimating the elasticity of demand for butter. Recall from Key Concept 8.2 that the coefficient in a linear equation relating ln(Y_i) to ln(X_i) has the interpretation of the elasticity of Y with respect to X. In Wright's problem, this suggests the demand equation

ln(Q_i^butter) = β0 + β1 ln(P_i^butter) + u_i,   (12.3)



wh~::re (//''"~' is the i 111 ob~erva tion on the quanlity of butler con~ uml.!d . P:'"11<'' ~its

price. and u, represents other factors that affect demand. such as income and consumer tastes. In Equation (12.3), a 1% increase in the price of butter yields a {3 1
percent change in demand, so {31 is the demand elao;ticity.
Philip Wright had data on total annual butter consumption and its a' erage
annual price in the United States for 1912 to 1922. Ir would have been ~v to uc;e
these data to estimate the demand elasticity by applying OLS to Equat1on ( 12.3).
but he had a key insight: Because of the interactions between surP.I) :tnd demand
the regressor, in(~'"'') was likely to be correlated" itb the error term
To see this. look at Figure 12.1 a, which shows the market demand and supply
curves for buuer for three different years. The demand and supply curves for the
first peri od are denoted D 1 and 5 1 and th e first period's equilibrium price and
quantity arc determined by their intersection . In year 2, demand increa es trom
0 1 to D 2 (l)ct y, because of an increase in income) and supply decreast s from 5 1 to
52 (because of an increase in the cost of producing butter); the equilibrium pricl.!
and quantity are determined by the intersection of the new supply and dcmanJ
curves. Jn year 3, the factors affecting demand and supply change again : demand
increases again to D3, supply increases to 53. and a new equilibr,ium quantit) and
price are de termined. Figure 12. tb shows the equilibrium quantity and price p:ttr'\
for these Lh ree periods and for eight subsequent years. where in e8ch year the -.upply and demand curves are subject to shifts associated with !actors other than r K'C
that affect market supply and demand. This scatterplot is like the one that Wnght
would have seen when he plotted his data. As he reasont!d. fitting a line to the'\.!
pointl) by OLS \\ iiJ estimate neither a demand cune nor a uppl~ cur'"~: bccau'c
the pomts h ve b.:cn Jetermm~d by changes m both dcmano and !)upply.
Wright realized that a way to get around this problem wa~ to find some thud
variable that shifted ~upplv hut c.lid not shift demand. Figure 12.1c ho,.,. what h rpens\\ h n ucb a\ ariablc shifts the supply cunc. but demand remain~ stable ' l)W
all of the equilibrium price an<.l quantity pairs lie on a stable demand curve. and
the slope of the demand curve is easily estimated. In the instrumental variable fur
mulalion of Wright's problem. this third variable-the instrumental variable- '
correlated with price (il shifts tbe supply curve, which leads to a change in pncc)
but is uncorrelaled with u (the demand curve remains stable). Wright considcrrlJ
several potential instrumentaJ variables; ont! wa tbe weather. For example, beh"~'\
average rainfa ll in a dairy region could impair grazing and thus reuuce butter 1nl
duction at a ghen price (it would hift the supply curve to the left and incrca.;e 1ht:
equilibrium price), so dairy-region rainfa ll satisfies thl.! condition for instrumLtll
rele vance. But dairy-region rainfall should not have a <hrect influl!nce on n.:
demand for butter. so the correlation between dairy-region ramfaJ! and 11, ,H,uld


FIGURE 12.1

(a) Price and quantity are determined by the intersection of the supply and demand curves. The equilibrium in the first period is determined by the intersection of the demand curve D1 and the supply curve S1. Equilibrium in the second period is the intersection of D2 and S2, and equilibrium in the third period is the intersection of D3 and S3. [Panel (a): Demand and supply in three time periods; axes: Price vs. Quantity.]

(b) This scatterplot shows equilibrium price and quantity in 11 different time periods. The demand and supply curves are hidden. Can you determine the demand and supply curves from the points on the scatterplot? [Panel (b): Equilibrium price and quantity for 11 time periods.]

(c) When the supply curve shifts from S1 to S2 to S3 but the demand curve remains at D1, the equilibrium prices and quantities trace out the demand curve. [Panel (c): Equilibrium price and quantity when only the supply curve shifts.]


be zero; that is, dairy-region rainfall satisfies the condition for instrument exogeneity.
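The logic of Figure 12.1 can be checked numerically. The sketch below uses a hypothetical linear supply-and-demand system with simulated shocks (illustrative coefficients chosen for the example, not Wright's butter data): a variable z shifts only the supply curve, so it serves as an instrument for price in the demand equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical structural model (simulated, illustrative coefficients):
#   demand: q = -1.0 * p + d             (demand shock d)
#   supply: q = +1.0 * p + 0.5 * z + s   (supply shock s; z shifts supply only)
d = rng.normal(size=n)
s = rng.normal(size=n)
z = rng.normal(size=n)

# Equilibrium price and quantity solve the two equations jointly each period.
p = (d - s - 0.5 * z) / 2.0
q = -1.0 * p + d

# OLS of q on p mixes the two curves: it estimates neither slope.
slope_ols = np.cov(p, q)[0, 1] / np.var(p, ddof=1)

# IV using the supply shifter z recovers the demand slope.
slope_iv = np.cov(z, q)[0, 1] / np.cov(z, p)[0, 1]

print(slope_ols)  # about -0.11: neither the demand slope (-1) nor the supply slope (+1)
print(slope_iv)   # close to -1.0, the demand slope
```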

Example #2: Estimating the effect on test scores of class size. Despite controlling for student and district characteristics, the estimates of the effect on test scores of class size reported in Part II still might have omitted variables bias resulting from unmeasured variables such as learning opportunities outside school or the quality of the teachers. If data on these variables are unavailable, the omitted variables bias cannot be addressed by including the variables in the multiple regressions.

Instrumental variables regression provides an alternative approach to this problem. Consider the following hypothetical example: Some California schools are forced to close for repairs because of a summer earthquake. Districts closest to the epicenter are most severely affected. A district with some closed schools needs to "double up" its students, temporarily increasing class size. This means that distance from the epicenter satisfies the condition for instrument relevance because it is correlated with class size. But if distance to the epicenter is unrelated to any of the other factors affecting student performance (such as whether the students are still learning English), then it will be exogenous because it is uncorrelated with the error term. Thus the instrumental variable, distance to the epicenter, could be used to circumvent omitted variables bias and to estimate the effect of class size on test scores.

The Sampling Distribution of the TSLS Estimator

The exact distribution of the TSLS estimator in small samples is complicated. However, like the OLS estimator, its distribution in large samples is simple: The TSLS estimator is consistent and is normally distributed.

Formula for the TSLS estimator. Although the two stages of TSLS make the estimator seem complicated, when there is a single X and a single instrument Z, as we assume in this section, there is a simple formula for the TSLS estimator. Let s_ZY be the sample covariance between Z and Y and let s_ZX be the sample covariance between Z and X. As shown in Appendix 12.2, the TSLS estimator with a single instrument is

β̂1^TSLS = s_ZY / s_ZX.   (12.4)

That is, the TSLS estimator of β1 is the ratio of the sample covariance between Z and Y to the sample covariance between Z and X.
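Equation (12.4) is easy to verify numerically: on any dataset, the ratio of sample covariances coincides with the number produced by explicitly running the two OLS stages. A small sketch on simulated data (illustrative coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * z + 0.7 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + u                       # true beta_1 = 2.0

# Equation (12.4): beta_1_TSLS = s_ZY / s_ZX.
beta_ratio = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# The same number from the explicit two-stage computation.
x_hat = np.polyval(np.polyfit(z, x, 1), z)   # first stage: fitted values of X
beta_two_stage = np.polyfit(x_hat, y, 1)[0]  # second stage: slope on X-hat

print(np.isclose(beta_ratio, beta_two_stage))  # True: the two are algebraically identical
print(beta_ratio)                              # close to the true value 2.0
```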


Sampling distribution of β̂1^TSLS when the sample size is large. The formula in Equation (12.4) can be used to show that β̂1^TSLS is consistent and, in large samples, normally distributed. The argument is summarized here, with mathematical details given in Appendix 12.3.

The argument that β̂1^TSLS is consistent combines the assumptions that Z_i is relevant and exogenous with the consistency of sample covariances for population covariances. To begin, note that because Y_i = β0 + β1 X_i + u_i in Equation (12.1),

cov(Z_i, Y_i) = cov[Z_i, (β0 + β1 X_i + u_i)] = β1 cov(Z_i, X_i) + cov(Z_i, u_i),   (12.5)

where the second equality follows from the properties of covariances [Equation (2.33)]. By the instrument exogeneity assumption, cov(Z_i, u_i) = 0, so

β1 = cov(Z_i, Y_i) / cov(Z_i, X_i).   (12.6)

That is, the population coefficient β1 is the ratio of the population covariance between Z and Y to the population covariance between Z and X.

As discussed in Section 3.7, the sample covariance is a consistent estimator of the population covariance; that is, s_ZY → cov(Z_i, Y_i) and s_ZX → cov(Z_i, X_i) in probability. It follows from Equations (12.4) and (12.6) that the TSLS estimator is consistent:

β̂1^TSLS = s_ZY / s_ZX → cov(Z_i, Y_i) / cov(Z_i, X_i) = β1.   (12.7)

The formula in Equation (12.4) also can be used to show that the sampling distribution of β̂1^TSLS is normal in large samples. The reason is the same as for every other least squares estimator we have considered: The TSLS estimator is an average of random variables, and when the sample size is large the central limit theorem tells us that averages of random variables are normally distributed. Specifically, the numerator of the expression for β̂1^TSLS in Equation (12.4) is s_ZY = [1/(n - 1)] Σ_{i=1}^{n} (Z_i - Z̄)(Y_i - Ȳ), an average of (Z_i - Z̄)(Y_i - Ȳ). A bit of algebra, sketched out in Appendix 12.3, shows that because of this averaging the central limit theorem implies that, in large samples, β̂1^TSLS has a sampling distribution that is approximately N(β1, σ²_β̂1TSLS), where

σ²_β̂1TSLS = (1/n) var[(Z_i - μ_Z)u_i] / [cov(Z_i, X_i)]².   (12.8)


Statistical inference using the large-sample distribution. The variance σ²_β̂1TSLS can be estimated by estimating the variance and covariance terms appearing in Equation (12.8), and the square root of the estimate of σ²_β̂1TSLS is the standard error of the IV estimator. This is done automatically in TSLS regression commands in econometric software packages. Because β̂1^TSLS is normally distributed in large samples, hypothesis tests about β1 can be performed by computing the t-statistic, and a 95% large-sample confidence interval is given by β̂1^TSLS ± 1.96 SE(β̂1^TSLS).
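A sample analog of Equation (12.8) can be computed directly. The sketch below (simulated data, illustrative coefficients) estimates β̂1^TSLS by Equation (12.4), forms residuals, and plugs sample moments into Equation (12.8). Note that the residuals are computed with the actual X_i, not the first-stage fitted values, since u_i in Equation (12.8) is the error of the structural equation (12.1).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.9 * z + 0.6 * u + rng.normal(size=n)  # endogenous regressor
y = 0.5 + 1.0 * x + u                       # true beta_1 = 1.0

# TSLS point estimates via Equation (12.4).
b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
b0 = y.mean() - b1 * x.mean()

# Residuals for the variance formula use the actual X, not X-hat.
u_hat = y - b0 - b1 * x

# Sample analog of Equation (12.8): (1/n) * var[(Z - Zbar) u-hat] / s_ZX^2.
s_zx = np.cov(z, x)[0, 1]
var_b1 = np.mean(((z - z.mean()) * u_hat) ** 2) / (n * s_zx ** 2)
se_b1 = np.sqrt(var_b1)

# 95% large-sample confidence interval for beta_1.
ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)
print(b1, se_b1, ci)
```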

Application to the Demand for Cigarettes


Philip Wright was interested in the demand elasticity of butter, but today other commodities, such as cigarettes, figure more prominently in public policy debates. One tool in the quest for reducing illnesses and deaths from smoking, and the costs, or externalities, imposed by those illnesses on the rest of society, is to tax cigarettes so heavily that current smokers cut back and potential new smokers are discouraged from taking up the habit. But precisely how big a tax hike is needed to make a dent in cigarette consumption? For example, what would the after-tax sales price of cigarettes need to be to achieve a 20% reduction in cigarette consumption?

The answer to this question depends on the elasticity of demand for cigarettes. If the elasticity is -1, then the 20% target in consumption can be achieved by a 20% increase in price. If the elasticity is -0.5, then the price must rise 40% to decrease consumption by 20%. Of course, we do not know what the demand elasticity of cigarettes is in the abstract: We must estimate it from data on prices and sales. But, as with butter, because of the interactions between supply and demand, the elasticity of demand for cigarettes cannot be estimated consistently by an OLS regression of log quantity on log price.
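The arithmetic behind these numbers is just the elasticity definition, %ΔQ ≈ elasticity × %ΔP, rearranged for the price change (a small sketch; the helper function name is my own):

```python
# Percentage price change needed for a target percentage change in quantity,
# given a demand elasticity: %dQ = elasticity * %dP  =>  %dP = %dQ / elasticity.
def required_price_change(target_pct_dq, elasticity):
    return target_pct_dq / elasticity

print(required_price_change(-20, -1.0))  # 20.0: a 20% price rise cuts consumption 20%
print(required_price_change(-20, -0.5))  # 40.0: less elastic demand needs a 40% rise
```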
We therefore use TSLS to estimate the elasticity of demand for cigarettes using annual data for the 48 continental U.S. states for 1985-1995 (the data are described in Appendix 12.1). For now, all the results are for the cross section of states in 1995; results using data for earlier years (panel data) are presented in Section 12.4.

The instrumental variable, SalesTax_i, is the portion of the tax on cigarettes arising from the general sales tax, measured in dollars per pack (in real dollars, deflated by the Consumer Price Index). Cigarette consumption, Q_i^cigarettes, is the number of packs of cigarettes sold per capita in the state, and the price, P_i^cigarettes, is the average real price per pack of cigarettes including all taxes.

Before using TSLS it is essential to ask whether the two conditions for instrument validity hold. We return to this topic in detail in Section 12.3, when we


provide some statistical tools that help in this assessment. Even with those statistical tools, judgment plays an important role, so it is useful to think about whether the sales tax on cigarettes plausibly satisfies the two conditions.

First consider instrument relevance. Because a higher sales tax increases the total sales price of cigarettes, the sales tax per pack plausibly satisfies the condition for instrument relevance.

Next consider instrument exogeneity. For the sales tax to be exogenous, it must be uncorrelated with the error in the demand equation; that is, the sales tax must affect the demand for cigarettes only indirectly through the price. This seems plausible: General sales tax rates vary from state to state, but they do so mainly because different states choose different mixes of sales, income, property, and other taxes to finance public undertakings. Those choices about public finance are driven by political considerations, not by factors related to the demand for cigarettes. We discuss the credibility of this assumption more in Section 12.4, but for now we keep it as a working hypothesis.

In modern statistical software, the first stage of TSLS is estimated automatically, so you do not need to run this regression yourself to compute the TSLS estimator. Just this once, however, we present the first-stage regression explicitly: Using data for the 48 states in 1995, it is

ln(P_i^cigarettes)^ = 4.61 + 0.031 SalesTax_i.   (12.9)
                      (0.03)  (0.005)
The R² of this regression is 47%, so the variation in sales tax on cigarettes explains 47% of the variance of cigarette prices across states.

In the second stage of TSLS, ln(Q_i^cigarettes) is regressed on ln(P_i^cigarettes)^ using OLS. The resulting estimated regression function is

ln(Q_i^cigarettes)^ = 9.72 - 1.08 ln(P_i^cigarettes)^.   (12.10)

This estimated regression function is written using the regressor in the second stage, the predicted value ln(P_i^cigarettes)^. It is, however, conventional and less cumbersome simply to report the estimated regression function with ln(P_i^cigarettes) rather than ln(P_i^cigarettes)^. Reported in this notation, the TSLS estimates and heteroskedasticity-robust standard errors are

ln(Q_i^cigarettes)^ = 9.72 - 1.08 ln(P_i^cigarettes).   (12.11)
                      (1.53)  (0.32)


The TSLS estimate suggests that the demand for cigarettes is surprisingly elastic, in light of their addictive nature: An increase in the price of 1% reduces consumption by 1.08%. But, recalling our discussion of instrument exogeneity, perhaps this estimate should not yet be taken too seriously. Even though the elasticity was estimated using an instrumental variable, there might still be omitted variables that are correlated with the sales tax per pack. A leading candidate is income: States with higher incomes might depend relatively less on a sales tax and more on an income tax to finance state government. Moreover, the demand for cigarettes presumably depends on income. Thus we would like to reestimate our demand equation including income as an additional regressor. To do so, however, we must first extend the IV regression model to include additional regressors.

12.2 The General IV Regression Model

The general IV regression model has four types of variables: the dependent variable, Y; problematic endogenous regressors, like the price of cigarettes, which are potentially correlated with the error term and which we will label X; additional regressors that are not correlated with the error term, called included exogenous variables, which we will label W; and instrumental variables, Z. In general, there can be multiple endogenous regressors (X's), multiple included exogenous regressors (W's), and multiple instrumental variables (Z's).

For IV regression to be possible, there must be at least as many instrumental variables (Z's) as endogenous regressors (X's). In Section 12.1, there was a single endogenous regressor and a single instrument. Having (at least) one instrument for this single endogenous regressor was essential. Without the instrument, we could not have computed the instrumental variables estimator: there would be no first-stage regression in TSLS.

The relationship between the number of instruments and the number of endogenous regressors is sufficiently important to have its own terminology. The regression coefficients are said to be exactly identified if the number of instruments (m) equals the number of endogenous regressors (k), that is, m = k. The coefficients are overidentified if the number of instruments exceeds the number of endogenous regressors, that is, m > k. They are underidentified if the number of instruments is less than the number of endogenous regressors, that is, m < k. The coefficients must be either exactly identified or overidentified if they are to be estimated by IV regression.

The general IV regression model and its terminology are summarized in Key Concept 12.1.
_

111 /
lJ

JJ

rp

J 1(,7JPih

r.:::J /
V

qg,v

l
r"

JYd.t

THE G E

REGRES
11te gcncr
)

i = l. .. ..

Y,

u,
fa

XI

lm

wI
Iat

f3o,

Z t,,

The coeffiCJ ~
regressors (1

tificd if m

tion or O\'eri

12.2

lasonaps

was
bles

THE GENERAl INSTRUMENTAl VARIABLES


REGRESSIO~ MODEl AND TERMIN~, .C, ~4

~nl IV r.:grc.,io~~~:1

me:

I ... . n

varib are

. {Jk 1

tv1, -r + f3s..rW,, + 111,

wh~r~

e~ ~y.~ c...,~

Y, is the depc!ndent variable: ~"l!fjrt!SJ..91'S

umbi.'T
u ml'~r
II

<: k.

are to

12. ]

=IF ~~~~S~~ ~
,j e~ ~r~~H"'

u, i the error tcnn, which represents measurc;!ment error l'dl~f ~n~itrcd"'


~
factors:
f.t~ ~~~,J;.T;,.._. - -.u ~ .. N!

1:.

X 11 x~., are k endogenous r~g:rcs.wrs. which are pOlentiall>


Ia ted with u,:

W 11 W narc r included exogenous regressors. which arc

latcd wi th u ;;

{30 , {3 1 , f3t.:+r are. unknown

L 1,

corr~

L-k. ;

~Je..."r- ~ >

rcgrc~sion coefficient~; an~ Jft-18:; IJ(",~ ~/ ~

Zm, are m mstrumcmal vanables.

~e-~~f;l{

,. ~

'"'

TSLS in the General IV Model


nou~ rcgr~-.sor X

ofantc..rc,tl~

I(

uncorr~~f" 1 <'A'

TSLS with a single endogenous regressor.


ber of
). The
nstru
k. Tho::

- - t

(12.L)

1bc.. cod ticicnts are overident.ified if there arc more instruments than endogenous
regrc ~'>ors (m > k): Lhe} arc underidentified if m < k: and they arc exactly idenhfi~d 1f m
k. E.<itimation of the IV regression model requires exact idcntificatton or O\ emden 1ificalion.

nt we
he no

4 33

KEVCONCm

i( ~.uJ<-$ l'tp~

f34X 1

{3 :< 11

Y, .,.. {30

ore
our
'er.
s.

Tha General IV Regress1on Model

\\ h~n there.. is a ~ing.lc cndoge-

:md soml! additional tncludcd -.:xugcn ~~"..)1 riahle the equatjon


~
._f(J

u)

( 12.13)

where, as before. X; mig.bt be corrclatec.l with the error term. bur Wli. .. . W, arc
not.
The population fin,t-~rage rcgr c-.sion ofTSLS relates X to the exogenous variables. that is. the H sand lhe mstrumcnh (/\):
11m~, \V, - l',.

'' here 7T0, 71 1 , 1Tm

(12.14)

are unknown rc.:!.' r~o.:.:.ton codficrents anc.l '' is an error term.


Equation (12.14) is sometimes called the reduced form equation for X. It relates the endogenous variable X to all the available exogenous variables, both those included in the regression of interest (the W's) and the instruments (the Z's).

In the first stage of TSLS, the unknown coefficients in Equation (12.14) are estimated by OLS, and the predicted values from this regression are X̂_1, ..., X̂_n. In the second stage of TSLS, Equation (12.13) is estimated by OLS, except that X_i is replaced by its predicted value from the first stage. That is, Y_i is regressed on X̂_i, W_1i, ..., W_ri using OLS. The resulting estimator of β0, β1, ..., β_{1+r} is the TSLS estimator.

Extension to multiple endogenous regressors. When there are multiple endogenous regressors X_1i, ..., X_ki, the TSLS algorithm is similar, except that each endogenous regressor requires its own first-stage regression. Each of these first-stage regressions has the same form as Equation (12.14); that is, the dependent variable is one of the X's and the regressors are all the instruments (Z's) and all the included exogenous variables (W's). Together, these first-stage regressions produce predicted values of each of the endogenous regressors.

In the second stage of TSLS, Equation (12.12) is estimated by OLS, except that the endogenous regressors (X's) are replaced by their respective predicted values (X̂'s). The resulting estimator of β0, β1, ..., β_{k+r} is the TSLS estimator.

In practice, the two stages of TSLS are done automatically within TSLS estimation commands in modern econometric software. The general TSLS estimator is summarized in Key Concept 12.2.

Instrument Relevance and Exogeneity in the General IV Model

The conditions of instrument relevance and exogeneity need to be modified for the general IV regression model.

When there is one included endogenous variable but multiple instruments, the condition for instrument relevance is that at least one Z is useful for predicting X, given W. When there are multiple included endogenous variables, this condition is more complicated because we must rule out perfect multicollinearity in the second-stage population regression. Intuitively, when there are multiple included endogenous variables, the instruments must provide enough information about the exogenous movements in these variables to sort out their separate effects on Y.

KEY CONCEPT 12.2

TWO STAGE LEAST SQUARES

The TSLS estimator in the general IV regression model in Equation (12.12) with multiple instrumental variables is computed in two stages:

1. First-stage regression(s): Regress X_1i on the instrumental variables (Z_1i, ..., Z_mi) and the included exogenous variables (W_1i, ..., W_ri) using OLS. Compute the predicted values from this regression; call these X̂_1i. Repeat this for all the endogenous regressors X_2i, ..., X_ki, thereby computing the predicted values X̂_1i, ..., X̂_ki.

2. Second-stage regression: Regress Y_i on the predicted values of the endogenous variables (X̂_1i, ..., X̂_ki) and the included exogenous variables (W_1i, ..., W_ri) using OLS. The TSLS estimators β̂_0^TSLS, ..., β̂_{k+r}^TSLS are the estimators from the second-stage regression.

In practice, the two stages are done automatically within TSLS estimation commands in modern econometric software.
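The two stages in Key Concept 12.2 are easy to carry out directly with OLS. The following Python sketch does so on simulated data (the data-generating process, coefficient values, and variable names are illustrative assumptions, not anything from the text) and compares the result with the inconsistent OLS estimator:

```python
import numpy as np

def ols(A, b):
    """OLS coefficients of the regression of b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def tsls(y, X_endog, W_exog, Z_inst):
    """Two stage least squares, following Key Concept 12.2.

    Stage 1: regress each endogenous regressor on (1, Z's, W's), keep fitted values.
    Stage 2: regress y on (1, fitted X's, W's).
    """
    n = len(y)
    ones = np.ones((n, 1))
    stage1 = np.hstack([ones, Z_inst, W_exog])
    X_hat = stage1 @ ols(stage1, X_endog)          # predicted endogenous regressors
    stage2 = np.hstack([ones, X_hat, W_exog])
    return ols(stage2, y)

# Illustrative simulated data (NOT from the text): one endogenous X,
# one included exogenous W, two instruments Z, true slope on X = 1.5.
rng = np.random.default_rng(0)
n = 50_000
W = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)                             # structural error
X = (0.8 * Z[:, 0] + 0.5 * Z[:, 1] + 0.3 * W[:, 0]
     + 0.7 * u + rng.normal(size=n)).reshape(-1, 1)  # endogenous: corr(X, u) > 0
y = 2.0 + 1.5 * X[:, 0] - 1.0 * W[:, 0] + u

beta_tsls = tsls(y, X, W, Z)                       # close to (2.0, 1.5, -1.0)
beta_ols = ols(np.hstack([np.ones((n, 1)), X, W]), y)  # slope on X biased upward
```

Because the second stage regresses Y on the predicted values rather than on X itself, the endogenous variation in X is purged; the OLS slope, by contrast, is contaminated by the correlation between X and u. In practice one would use a dedicated TSLS command, which also computes correct standard errors.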

The general statement of the instrument exogeneity condition is that each instrument must be uncorrelated with the error term u_i. The general conditions for valid instruments are given in Key Concept 12.3.

The IV Regression Assumptions
and Sampling Distribution of the TSLS Estimator

Under the IV regression assumptions, the TSLS estimator is consistent and has a sampling distribution that, in large samples, is approximately normal.

The IV regression assumptions. The IV regression assumptions are modifications of the least squares assumptions for the multiple regression model in Key Concept 6.4.
The first IV regression assumption modifies the conditional mean assumption in Key Concept 6.4 to apply to the included exogenous variables only. Just like the second least squares assumption for the multiple regression model, the second IV


KEY CONCEPT 12.3

THE TWO CONDITIONS FOR VALID INSTRUMENTS

A set of m instruments Z_1i, ..., Z_mi must satisfy the following two conditions to be valid:

1. Instrument Relevance
In general, let X̂*_1i be the predicted value of X_1i from the population regression of X_1i on the instruments (Z's) and the included exogenous regressors (W's), and let "1" denote the constant regressor that takes on the value 1 for all observations. Then (X̂*_1i, ..., X̂*_ki, W_1i, ..., W_ri, 1) are not perfectly multicollinear.
If there is only one X, then for the previous condition to hold, at least one Z must enter the population regression of X on the Z's and the W's.

2. Instrument Exogeneity
The instruments are uncorrelated with the error term; that is, corr(Z_1i, u_i) = 0, ..., corr(Z_mi, u_i) = 0.

regression assumption is that the draws are i.i.d., as they are if the data are collected by simple random sampling. Similarly, the third IV assumption is that large outliers are unlikely.
The fourth IV regression assumption is that the two conditions for instrument validity in Key Concept 12.3 hold. The instrument relevance condition in Key Concept 12.3 subsumes the fourth least squares assumption in Key Concept 6.4 (no perfect multicollinearity) by assuming that the regressors in the second-stage regression are not perfectly multicollinear. The IV regression assumptions are summarized in Key Concept 12.4.

Sampling distribution of the TSLS estimator. Under the IV regression assumptions, the TSLS estimator is consistent and normally distributed in large samples. This is shown in Section 12.1 (and Appendix 12.3) for the special case of a single endogenous regressor, a single instrument, and no included exogenous variables. Conceptually, the reasoning in Section 12.1 carries over to the general case of multiple instruments and multiple included endogenous variables. The expressions in the general case are complicated, however, and are deferred to Chapter 18.

KEY CONCEPT 12.4

THE IV REGRESSION ASSUMPTIONS

The variables and errors in the IV regression model in Key Concept 12.1 satisfy:

1. E(u_i | W_1i, ..., W_ri) = 0;

2. (X_1i, ..., X_ki, W_1i, ..., W_ri, Z_1i, ..., Z_mi, Y_i) are i.i.d. draws from their joint distribution;

3. Large outliers are unlikely: The X's, W's, Z's, and Y have nonzero finite fourth moments; and

4. The two conditions for a valid instrument in Key Concept 12.3 hold.

Inference Using the TSLS Estimator

Because the sampling distribution of the TSLS estimator is normal in large samples, the general procedures for statistical inference (hypothesis tests and confidence intervals) in regression models extend to TSLS regression. For example, 95% confidence intervals are constructed as the TSLS estimator ± 1.96 standard errors. Similarly, joint hypotheses about the population values of the coefficients can be tested using the F-statistic, as described in Section 7.2.

Calculation of TSLS standard errors. There are two points to bear in mind about TSLS standard errors. First, the standard errors reported by OLS estimation of the second-stage regression are incorrect because they do not recognize that it is the second stage of a two-stage process. Specifically, the second-stage OLS standard errors fail to adjust for the fact that the second-stage regression uses the predicted values of the included endogenous variables. Formulas for standard errors that make the necessary adjustment are incorporated into (and automatically used by) TSLS regression commands in econometric software. Therefore this issue is not a concern in practice if you use a specialized TSLS regression command.
Second, as always, the errors might be heteroskedastic. It is therefore important to use heteroskedasticity-robust versions of the standard errors, for precisely the same reason that it is important to use heteroskedasticity-robust standard errors for the OLS estimators of the multiple regression model.
Application to the Demand for Cigarettes

In Section 12.1, we estimated the elasticity of demand for cigarettes using data on annual consumption in 48 U.S. states in 1995 using TSLS with a single regressor


(the logarithm of the real price per pack) and a single instrument (the real sales tax per pack). Income also affects demand, however, so it is part of the error term of the population regression. As discussed in Section 12.1, if the state sales tax is related to state income, then it is correlated with a variable in the error term of the cigarette demand equation, which violates the instrument exogeneity condition. If so, the IV estimator in Section 12.1 is inconsistent. That is, the IV regression suffers from a version of omitted variable bias. To solve this problem, we need to include income in the regression.
We therefore consider an alternative specification in which the logarithm of income is included in the demand equation. In the terminology of Key Concept 12.1, the dependent variable Y is the logarithm of consumption, ln(Q_i^cigarettes); the endogenous regressor X is the logarithm of the real after-tax price, ln(P_i^cigarettes); the included exogenous variable W is the logarithm of the real per capita state income, ln(Inc_i); and the instrument Z is the real sales tax per pack, SalesTax_i. The TSLS estimates and (heteroskedasticity-robust) standard errors are

ln(Q̂_i^cigarettes) = 9.43 - 1.14 ln(P_i^cigarettes) + 0.21 ln(Inc_i).    (12.15)
                     (1.26)  (0.37)                     (0.31)
This regression uses a single instrument, SalesTax_i, but in fact another candidate instrument is available. In addition to general sales taxes, states levy special taxes that apply only to cigarettes and other tobacco products. These cigarette-specific taxes (CigTax_i) constitute a possible second instrumental variable. The cigarette-specific tax increases the price of cigarettes paid by the consumer, so it arguably meets the condition for instrument relevance. If it is uncorrelated with the error term in the state cigarette demand equation, it is an exogenous instrument.
With this additional instrument in hand, we now have two instrumental variables, the real sales tax per pack and the real state cigarette-specific tax per pack. With two instruments and a single endogenous regressor, the demand elasticity is overidentified; that is, the number of instruments (SalesTax_i and CigTax_i, so m = 2) exceeds the number of included endogenous variables (ln(P_i^cigarettes), so k = 1). We can estimate the demand elasticity using TSLS, where the regressors in the first-stage regression are the included exogenous variable, ln(Inc_i), and both instruments.
The resulting TSLS estimate of the regression function using the two instruments SalesTax_i and CigTax_i is

ln(Q̂_i^cigarettes) = 9.89 - 1.28 ln(P_i^cigarettes) + 0.28 ln(Inc_i).    (12.16)
                     (0.96)  (0.25)                     (0.25)


Compare Equations (12.15) and (12.16): The standard error of the estimated price elasticity is smaller by one-third in Equation (12.16) [0.25 in Equation (12.16) versus 0.37 in Equation (12.15)]. The reason the standard error is smaller in Equation (12.16) is that this estimate uses more information than Equation (12.15): In Equation (12.15), only one instrument is used (the sales tax), but in Equation (12.16), two instruments are used (the sales tax and the cigarette-specific tax). Using two instruments explains more of the variation in cigarette prices than using just one, and this is reflected in smaller standard errors on the estimated demand elasticity.
Are these estimates credible? Ultimately, credibility depends on whether the set of instrumental variables (here, the two taxes) plausibly satisfies the two conditions for valid instruments. It is therefore vital that we assess whether these instruments are valid, and it is to this topic that we now turn.
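The claim that a second instrument buys precision can be checked by simulation. In the sketch below (simulated data; the design, with two equally strong instruments and a "demand" slope of -1, is an illustrative assumption, not the cigarette data), the Monte Carlo standard deviation of the TSLS estimator is visibly smaller when both instruments are used:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 2_000
est_one = np.empty(reps)
est_two = np.empty(reps)
for r in range(reps):
    Z = rng.normal(size=(n, 2))            # two equally strong instruments
    u = rng.normal(size=n)
    X = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.5 * u + rng.normal(size=n)
    y = 1.0 - 1.0 * X + u                  # illustrative "demand" slope of -1

    def tsls_slope(Zcols):
        # TSLS with intercept only (no W's): two OLS stages.
        S1 = np.column_stack([np.ones(n), Zcols])
        X_hat = S1 @ np.linalg.lstsq(S1, X, rcond=None)[0]
        S2 = np.column_stack([np.ones(n), X_hat])
        return np.linalg.lstsq(S2, y, rcond=None)[0][1]

    est_one[r] = tsls_slope(Z[:, :1])      # use only the first instrument
    est_two[r] = tsls_slope(Z)             # use both instruments

sd_one, sd_two = est_one.std(), est_two.std()   # sd_two < sd_one
```

Both estimators are centered near the true slope; using both instruments explains more of the variation in X, so the sampling spread shrinks, just as in the move from Equation (12.15) to Equation (12.16).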

12.3 Checking Instrument Validity


Whether instrumental variables regression is useful in a given application hinges on whether the instruments are valid: Invalid instruments produce meaningless results. It therefore is essential to assess whether a given set of instruments is valid in a particular application.

Assumption #1: Instrument Relevance

The role of the instrument relevance condition in IV regression is subtle. One way to think of instrument relevance is that it plays a role akin to the sample size: The more relevant the instruments (that is, the more the variation in X is explained by the instruments), the more information is available for use in IV regression. A more relevant instrument produces a more accurate estimator, just as a larger sample size produces a more accurate estimator. Moreover, statistical inference using TSLS is predicated on the TSLS estimator having a normal sampling distribution, but according to the central limit theorem the normal distribution is a good approximation in large, but not necessarily small, samples. If having a more relevant instrument is like having a larger sample size, this suggests, correctly, that the more relevant the instrument, the better the normal approximation to the sampling distribution of the TSLS estimator and its t-statistic.
Instruments that explain little of the variation in X are called weak instruments. In the cigarette example, the distance of the state from cigarette


manufacturing plants arguably would be a weak instrument: Although a greater distance increases shipping costs (thus shifting the supply curve in and raising the equilibrium price), cigarettes are lightweight, so shipping costs are a small component of the price of cigarettes. Thus the amount of price variation explained by shipping costs, and thus distance to manufacturing plants, probably is quite small.
This section discusses why weak instruments are a problem, how to check for weak instruments, and what to do if you have weak instruments. It is assumed throughout that the instruments are exogenous.


Why weak instruments are a problem. If the instruments are weak, then the normal distribution provides a poor approximation to the sampling distribution of the TSLS estimator, even if the sample size is large. Thus there is no theoretical justification for the usual methods for performing statistical inference, even in large samples. In fact, if instruments are weak, then the TSLS estimator can be badly biased in the direction of the OLS estimator. In addition, 95% confidence intervals constructed as the TSLS estimator ± 1.96 standard errors can contain the true value of the coefficient far less than 95% of the time. In short, if instruments are weak, TSLS is no longer reliable.
To see that there is a problem with the large-sample normal approximation to the sampling distribution of the TSLS estimator, consider the special case, introduced in Section 12.1, of a single included endogenous variable, a single instrument, and no included exogenous regressor. If the instrument is valid, then β̂_1^TSLS is consistent because the sample covariances s_ZY and s_ZX are consistent; that is, β̂_1^TSLS = s_ZY/s_ZX converges in probability to cov(Z_i, Y_i)/cov(Z_i, X_i) = β_1 [Equation (12.7)]. But now suppose that the instrument is not just weak but irrelevant, so that cov(Z_i, X_i) = 0. Then s_ZX converges in probability to cov(Z_i, X_i) = 0, so, taken literally, the denominator of the limit cov(Z_i, Y_i)/cov(Z_i, X_i) is zero! Clearly, the argument that β̂_1^TSLS is consistent breaks down when the instrument relevance condition fails. As shown in Appendix 12.4, this breakdown results in the TSLS estimator having a nonnormal sampling distribution, even if the sample size is very large. In fact, when the instrument is irrelevant, the large-sample distribution of β̂_1^TSLS is not that of a normal random variable, but rather the distribution of a ratio of two normal random variables!
While this circumstance of totally irrelevant instruments might not be encountered in practice, it raises a question: How relevant must the instruments be for the normal distribution to provide a good approximation in practice? The answer to this question in the general IV model is complicated. Fortunately, however, there is a simple rule of thumb available for the most common situation in practice, the case of a single endogenous regressor.
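A short simulation makes the ratio-of-normals point concrete. The sketch below (an illustrative simulated design, not from the text) computes the IV estimator s_ZY/s_ZX many times, first with an instrument that is completely irrelevant and then with a strong one. A simple tail statistic, the ratio of the central 90% spread to the interquartile range, is about 2.44 for a normal distribution but far larger for a heavy-tailed ratio of two normal variables:

```python
import numpy as np

def tail_ratio(draws):
    """(central 90% spread) / (interquartile range): about 2.44 for a normal."""
    q = np.quantile(draws, [0.05, 0.25, 0.75, 0.95])
    return (q[3] - q[0]) / (q[2] - q[1])

rng = np.random.default_rng(2)
n, reps = 200, 5_000
beta_irrel = np.empty(reps)
beta_strong = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)
    e = rng.normal(size=n)
    Z = rng.normal(size=n)
    X_irrel = 0.5 * u + e                 # cov(Z, X) = 0: instrument irrelevant
    X_strong = Z + 0.5 * u + e            # cov(Z, X) = 1: instrument relevant
    # IV estimator s_ZY / s_ZX with true slope 1 (Y = X + u, no intercept):
    beta_irrel[r] = np.cov(Z, X_irrel + u)[0, 1] / np.cov(Z, X_irrel)[0, 1]
    beta_strong[r] = np.cov(Z, X_strong + u)[0, 1] / np.cov(Z, X_strong)[0, 1]

r_irrel = tail_ratio(beta_irrel)      # heavy tails: far above the normal value
r_strong = tail_ratio(beta_strong)    # close to the normal value of about 2.44
```

With the irrelevant instrument, both s_ZY and s_ZX are approximately mean-zero normal, so their ratio has Cauchy-like tails; with the strong instrument, the denominator stays far from zero and the normal approximation works well.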

KEY CONCEPT 12.5

A RULE OF THUMB FOR CHECKING FOR WEAK INSTRUMENTS

The first-stage F-statistic is the F-statistic testing the hypothesis that the coefficients on the instruments Z_1i, ..., Z_mi equal zero in the first stage of two stage least squares. When there is a single endogenous regressor, a first-stage F-statistic less than 10 indicates that the instruments are weak, in which case the TSLS estimator is biased (even in large samples), and TSLS t-statistics and confidence intervals are unreliable.

Checking for weak instruments when there is a single endogenous regressor. One way to check for weak instruments when there is a single endogenous regressor is to compute the F-statistic testing the hypothesis that the coefficients on the instruments are all zero in the first-stage regression of TSLS. This first-stage F-statistic provides a measure of the information content contained in the instruments: The more information content, the larger is the expected value of the F-statistic. One simple rule of thumb is that you do not need to worry about weak instruments if the first-stage F-statistic exceeds 10. (Why 10? See Appendix 12.5.) This is summarized in Key Concept 12.5.
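Computing the first-stage F-statistic requires nothing beyond two OLS fits, one with and one without the instruments. A minimal sketch (simulated data; the coefficient values are assumptions for the illustration, and the cutoff of 10 is the rule of thumb from Key Concept 12.5):

```python
import numpy as np

def first_stage_F(X, Z, W=None):
    """Homoskedasticity-only F-statistic for H0: all coefficients on the
    instruments Z are zero in the first-stage regression of X on (1, Z, W)."""
    n = len(X)
    ones = np.ones((n, 1))
    restricted = ones if W is None else np.hstack([ones, W])
    unrestricted = np.hstack([restricted, Z])

    def ssr(A):
        return np.sum((X - A @ np.linalg.lstsq(A, X, rcond=None)[0]) ** 2)

    q = Z.shape[1]                          # number of instruments tested
    dof = n - unrestricted.shape[1]
    return ((ssr(restricted) - ssr(unrestricted)) / q) / (ssr(unrestricted) / dof)

# Illustrative simulated data: X loads heavily on Z_strong and essentially
# not at all on Z_weak.
rng = np.random.default_rng(3)
n = 500
W = rng.normal(size=(n, 1))
Z_strong = rng.normal(size=(n, 2))
Z_weak = rng.normal(size=(n, 2))
X = (0.8 * Z_strong[:, 0] + 0.5 * Z_strong[:, 1] + 0.02 * Z_weak[:, 0]
     + 0.3 * W[:, 0] + rng.normal(size=n))

F_strong = first_stage_F(X, Z_strong, W)    # far above the cutoff of 10
F_weak = first_stage_F(X, Z_weak, W)        # near 1, roughly its value under H0
```

In applied work one would report the heteroskedasticity-robust version of this F-statistic, which standard software computes; the homoskedasticity-only version above keeps the sketch self-contained.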

What do I do if I have weak instruments? If you have many instruments, then some of those instruments are probably weaker than others. If you have a small number of strong instruments and many weak ones, you will be better off discarding the weakest instruments and using the most relevant subset for your TSLS analysis. Your TSLS standard errors might increase when you drop weak instruments, but keep in mind that your original standard errors were not meaningful anyway!
If, however, the coefficients are exactly identified, you cannot discard the weak instruments. Even if the coefficients are overidentified, you might not have enough strong instruments to achieve identification, so discarding some weak instruments will not help. In this case, you have two options. The first option is to find additional, stronger instruments. This is easier said than done: It requires an intimate knowledge of the problem at hand and can entail redesigning the data set and the nature of the empirical study. The second option is to proceed with your empirical analysis using the weak instruments, but employing methods other than TSLS. Although this chapter has focused on TSLS, some other, less commonly used


A Scary Regression

One way to estimate the percentage increase in earnings from going to school for another year (the "return to education") is to regress the logarithm of earnings against years of school using data on individuals. But if more able individuals are both more successful in the labor market and attend school longer (perhaps because they find it easier), then years of schooling will be correlated with the omitted variable, innate ability, and the OLS estimator of the return to education will be biased. Because innate ability is extremely difficult to measure and thus cannot be used as a regressor, some labor economists have turned to IV regression to estimate the return to education. But what variable is correlated with years of education but not the error term in the earnings regression; that is, what is a valid instrumental variable?

Your birthday, suggested labor economists Joshua Angrist and Alan Krueger. Because of mandatory schooling laws, they reasoned, your birthday is correlated with your years of education. If the law requires you to attend school until your 16th birthday and you turn 16 in January while you are in tenth grade, you might drop out, but if you turn 16 in July you already will have completed tenth grade. If so, your birthday satisfies the instrument relevance condition. But being born in January or July should have no direct effect on your earnings (other than through years of education), so your birthday satisfies the instrument exogeneity condition. They implemented this idea by using the individual's quarter (three-month period) of birth as an instrumental variable. They used a very large sample of data from the U.S. Census (their regressions had at least 329,000 observations!), and they controlled for other variables such as the worker's age.

But John Bound, another labor economist, was skeptical. He knew that weak instruments cause TSLS to be unreliable and worried that, despite the extremely large sample size, the quarter of birth might be a weak instrument in some of their specifications. So when Bound and Krueger next met over lunch, the conversation inevitably turned to whether the Angrist-Krueger instruments were weak. Krueger thought not, and suggested a creative way to find out: Why not rerun the regressions using a truly irrelevant instrument, replacing each individual's real quarter of birth by a fake quarter of birth, randomly generated by the computer, and compare the results using the real and fake instruments? What they found was amazing: It didn't matter whether you used the real quarter of birth or the fake one as the instrument; TSLS gave basically the same answer!

This was a scary regression for labor econometricians. The TSLS standard error computed using the real data suggests that the return to education is precisely estimated, but so does the standard error computed using the fake data. Of course, the fake data cannot estimate the return to education precisely, because the fake instrument is totally irrelevant. The worry, then, is that the TSLS estimates based on the real data are just as unreliable as those based on the fake data.

The problem is that the instruments are in fact very weak in some of Angrist and Krueger's regressions. In some of their specifications, the first-stage F-statistic is less than 2, far less than the rule-of-thumb cutoff of 10. In other specifications, Angrist and Krueger have large first-stage F-statistics, and in those cases the TSLS inferences are not subject to the problem of weak instruments. By the way, in those specifications the return to education is estimated to be approximately 8%, somewhat greater than estimated by OLS.¹

¹The original IV regressions are reported in Angrist and Krueger (1991), and the reanalysis using the fake instruments is published in Bound, Jaeger, and Baker (1995).

methods for instrumental variables analysis are less sensitive to weak instruments than TSLS; some of these methods are discussed in Appendix 12.5.

Assumption #2: Instrument Exogeneity

If the instruments are not exogenous, then TSLS is inconsistent: The TSLS estimator converges in probability to something other than the population coefficient in the regression. After all, the idea of instrumental variables regression is that the instrument contains information about variation in X_i that is unrelated to the error term u_i. If, in fact, the instrument is not exogenous, then it cannot pinpoint this exogenous variation in X_i, and it stands to reason that IV regression fails to provide a consistent estimator. The math behind this argument is summarized in Appendix 12.4.

Can you test statistically the assumption that the instruments are exogenous? Yes and no. On the one hand, it is not possible to test the hypothesis that the instruments are exogenous when the coefficients are exactly identified. On the other hand, if the coefficients are overidentified, it is possible to test the overidentifying restrictions; that is, to test the hypothesis that the "extra" instruments are exogenous under the maintained assumption that there are enough valid instruments to identify the coefficients of interest.
First consider the case that the coefficients are exactly identified, so you have as many instruments as endogenous regressors. Then it is impossible to develop a statistical test of the hypothesis that the instruments are in fact exogenous. That is, empirical evidence cannot be brought to bear on the question of whether these instruments satisfy the exogeneity restriction. In this case, the only way to assess whether the instruments are exogenous is to draw on expert opinion and your personal knowledge of the empirical problem at hand. For example, Philip Wright's knowledge of agricultural supply and demand led him to suggest that below-average rainfall would plausibly shift the supply curve for butter but would not directly shift the demand curve.
Assessing whether the instruments are exogenous necessarily requires making an expert judgment based on personal knowledge of the application. If, however, there are more instruments than endogenous regressors, then there is a statistical tool that can be helpful in this process: the so-called test of overidentifying restrictions.

The overidentifying restrictions test. Suppose you have a single endogenous regressor, two instruments, and no included exogenous variables. Then you


KEY CONCEPT 12.6

THE OVERIDENTIFYING RESTRICTIONS TEST (THE J-STATISTIC)

Let û_i^TSLS be the residuals from TSLS estimation of Equation (12.12). Use OLS to estimate the regression coefficients in

û_i^TSLS = δ_0 + δ_1 Z_1i + ... + δ_m Z_mi + δ_{m+1} W_1i + ... + δ_{m+r} W_ri + e_i,    (12.17)

where e_i is the regression error term. Let F denote the homoskedasticity-only F-statistic testing the hypothesis that δ_1 = ... = δ_m = 0. The overidentifying restrictions test statistic is J = mF. Under the null hypothesis that all the instruments are exogenous, if e_i is homoskedastic, then in large samples J is distributed χ²_{m-k}, where m - k is the "degree of overidentification"; that is, the number of instruments minus the number of endogenous regressors.

could compute two different TSLS estimators: one using the first instrument, the other using the second. These two estimators will not be the same because of sampling variation, but if both instruments are exogenous then they will tend to be close to each other. But what if these two instruments produce very different estimates? You might sensibly conclude that there is something wrong with one or the other of the instruments, or both. That is, it would be reasonable to conclude that one or the other, or both, of the instruments are not exogenous.
The test of overidentifying restrictions implicitly makes this comparison. We say implicitly, because the test is carried out without actually computing all the different possible IV estimates. Here is the idea. Exogeneity of the instruments means that they are uncorrelated with u_i. This suggests that the instruments should be approximately uncorrelated with û_i^TSLS, where û_i^TSLS = Y_i - (β̂_0^TSLS + β̂_1^TSLS X_1i + ... + β̂_k^TSLS X_ki + β̂_{k+1}^TSLS W_1i + ... + β̂_{k+r}^TSLS W_ri) is the residual from the estimated TSLS regression using all the instruments (approximately rather than exactly because of sampling variation). (Note that these residuals are constructed using the true X's rather than their first-stage predicted values.) Accordingly, if the instruments are in fact exogenous, then the coefficients on the instruments in a regression of û_i^TSLS on the instruments and the included exogenous variables should all be zero, and this hypothesis can be tested.
This method for computing the overidentifying restrictions test is summarized in Key Concept 12.6. The statistic is computed using the homoskedasticity-only F-statistic. The test statistic is commonly called the J-statistic.
In large samples, if the instruments are not weak and the errors are homoskedastic, then, under the null hypothesis that the instruments are exogenous, the J-statistic has a chi-squared distribution with m - k degrees of freedom (χ²_{m-k}). It is important to remember that even though the number of restrictions being tested is m, the degrees of freedom of the asymptotic distribution of the J-statistic is m - k. The reason is that it is only possible to test the overidentifying restrictions, of which there are m - k. The modification of the J-statistic for heteroskedastic errors is given in Section 18.7.
The easiest way to see that you cannot test the exogeneity of the regressors when the coefficients are exactly identified (m = k) is to consider the case of a single included endogenous variable (k = 1). If there are two instruments, then you can compute two TSLS estimators, one for each instrument, and you can compare them to see if they are close. But if you have only one instrument, then you can compute only one TSLS estimator and you have nothing to compare it to. In fact, if the coefficients are exactly identified, so that m = k, then the overidentifying test statistic J is exactly zero.

12.4 Application to the Demand for Cigarettes²

Our attempt to estimate the elasticity of demand for cigarettes left off with the TSLS estimates summarized in Equation (12.16), in which income was an included exogenous variable and there were two instruments, the general sales tax and the cigarette-specific tax. We can now undertake a more careful evaluation of these instruments.
As in Section 12.1, it makes sense that the two instruments are relevant because taxes are a big part of the after-tax price of cigarettes, and shortly we will look at this empirically. First, however, we focus on the difficult question of whether the two tax variables are plausibly exogenous.
The first step in assessing whether an instrument is exogenous is to think through the arguments for why it may or may not be. This requires thinking about which factors account for the error term in the cigarette demand equation and whether these factors are plausibly related to the instruments.
Why do some states have higher per capita cigarette consumption than others? One reason might be variation in incomes across states, but state income is

²This section assumes knowledge of the material in Sections 10.1 and 10.2 on panel data with T = 2 time periods.


The Externalities of Smoking

Smoking imposes costs that are not fully borne by the smoker; that is, it generates externalities. One economic justification for taxing cigarettes therefore is to "internalize" these externalities. In theory, the per-pack tax on a pack of cigarettes should equal the dollar value of the externalities created by smoking that pack. But what, precisely, are the externalities of smoking, measured in dollars per pack?

Several studies have used econometric methods to estimate the externalities of smoking. The negative externalities (costs) borne by others include medical costs paid by the government to care for ill smokers, health care costs of nonsmokers associated with secondhand smoke, and fires caused by cigarettes.

But, from a purely economic point of view, smoking also has positive externalities, or benefits. The biggest economic benefit of smoking is that smokers, tending to die in late middle age, tend to pay much more in Social Security (public pension) taxes than they ever get back. There are also large savings in nursing home expenditures on the very old; smokers tend not to live that long.

Because the negative externalities of smoking occur while the smoker is alive but the positive ones accrue after death, the net present value of the externalities (the value of the net costs per pack, discounted to the present) depends on the discount rate.

The studies do not agree on a specific dollar value of the net externalities. Some suggest the net externalities, properly discounted, are quite small, less than current taxes. In fact, the most extreme estimates suggest that the net externalities are positive, so smoking should be subsidized! Other studies, which incorporate costs that are probably important but difficult to quantify (such as caring for babies who are unhealthy because their mothers smoke), suggest that externalities might be $1 per pack, possibly even more. But all the studies agree that, by tending to die in late middle age, smokers pay far more in taxes than they ever get back in their retirement.¹

¹An early calculation of the externalities of smoking was reported by Willard G. Manning et al. (1989). A calculation suggesting that health care costs would go up if everyone stopped smoking is presented in Barendregt et al. (1997). Other studies of the externalities of smoking are reviewed by Chaloupka and Warner (2000).

included in Equation (12.16), so this is not part of the error term. Another reason is that there are historical factors influencing demand. For example, states that grow tobacco have higher rates of smoking than most other states. Could this factor be related to taxes? Quite possibly: If tobacco farming and cigarette production are important industries in a state, then these industries could exert influence to keep cigarette-specific taxes low. This suggests that an omitted factor in cigarette demand, whether the state grows tobacco and produces cigarettes, could be correlated with cigarette-specific taxes.

One solution to this possible correlation between the error term and the instrument would be to include information on the size of the tobacco and cigarette industry in the state; this is the approach we took when we included income as a regressor in the demand equation. But because we have panel data on cigarette consumption, a different approach is available that does not require this information. As discussed in Chapter 10, panel data make it possible to eliminate the influence of variables that vary across entities (states) but do not change over time, such as the climate and historical circumstances that lead to a large tobacco and cigarette industry in a state. Two methods for doing this were given in Chapter 10: constructing data on changes in the variables between two different time periods, and using fixed effects regression. To keep the analysis here as simple as possible, we adopt the former approach and perform regressions of the type described in Section 10.2, based on the changes in the variables between two different years.
The time span between the two different years influences how the estimated elasticities are to be interpreted. Because cigarettes are addictive, changes in price will take some time to alter behavior. At first, an increase in the price of cigarettes might have little effect on demand. Over time, however, the price increase might contribute to some smokers' desire to quit and, importantly, it could discourage nonsmokers from taking up the habit. Thus the response of demand to a price increase could be small in the short run but large in the long run. Said differently, for an addictive product like cigarettes, demand might be inelastic in the short run, that is, it might have a short-run elasticity near zero, but it might be more elastic in the long run.

In this analysis, we focus on estimating the long-run price elasticity. We do this by considering quantity and price changes that occur over ten-year periods. Specifically, in the regressions considered here, the ten-year change in log quantity, ln(Q^{cigarettes}_{i,1995}) − ln(Q^{cigarettes}_{i,1985}), is regressed against the ten-year change in log price, ln(P^{cigarettes}_{i,1995}) − ln(P^{cigarettes}_{i,1985}), and the ten-year change in log income, ln(Inc_{i,1995}) − ln(Inc_{i,1985}). Two instruments are used: the change in the sales tax over ten years, SalesTax_{i,1995} − SalesTax_{i,1985}, and the change in the cigarette-specific tax over ten years, CigTax_{i,1995} − CigTax_{i,1985}.
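The two-stage calculation just described can be sketched in a few lines of code. The sketch below is purely illustrative, not the computation behind the results reported in this section: it simulates ten-year log changes for 48 hypothetical states (all parameter values are invented) and then runs both stages of TSLS by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 48  # one ten-year difference per state

# Simulated ten-year changes (hypothetical values, for illustration only).
d_tax = rng.normal(0.0, 0.5, n)          # change in the sales tax (instrument)
u = rng.normal(0.0, 0.1, n)              # demand shock, which also moves price
d_lprice = 0.4 * d_tax + 0.5 * u + rng.normal(0.0, 0.1, n)  # endogenous regressor
d_lincome = rng.normal(0.0, 0.2, n)      # included exogenous regressor
d_lquant = -0.9 * d_lprice + 0.5 * d_lincome + u  # true price elasticity is -0.9

# First stage: regress the endogenous regressor on the instrument and the
# included exogenous variable, then form fitted values.
Z = np.column_stack([np.ones(n), d_tax, d_lincome])
g = np.linalg.lstsq(Z, d_lprice, rcond=None)[0]
d_lprice_hat = Z @ g

# Second stage: regress the dependent variable on the first-stage fitted
# values and the included exogenous variable.
X2 = np.column_stack([np.ones(n), d_lprice_hat, d_lincome])
b = np.linalg.lstsq(X2, d_lquant, rcond=None)[0]
print("TSLS estimate of the price elasticity:", round(b[1], 2))
```

Because the demand shock u moves price as well as quantity, a plain OLS regression of d_lquant on d_lprice would be biased; replacing the endogenous regressor with its first-stage fitted values isolates the price variation coming from the instrument. (In practice the second-stage standard errors must also be corrected, which packaged TSLS routines do automatically.)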
The results are presented in Table 12.1. As usual, each column in the table presents the results of a different regression. All regressions have the same regressors, and all coefficients are estimated using TSLS; the only difference between the three regressions is the set of instruments used. In column (1), the only instrument is the sales tax; in column (2), the only instrument is the cigarette-specific tax; and in column (3), both taxes are used as instruments.

In IV regression, the reliability of the coefficient estimates hinges on the validity of the instruments, so the first things to look at in Table 12.1 are the diagnostic statistics assessing the validity of the instruments.


TABLE 12.1  Two Stage Least Squares Estimates of the Demand for Cigarettes
            Using Panel Data for 48 U.S. States

Dependent variable: ln(Q^{cigarettes}_{i,1995}) − ln(Q^{cigarettes}_{i,1985})

Regressor                                                  (1)          (2)          (3)
-----------------------------------------------------------------------------------------
ln(P^{cigarettes}_{i,1995}) − ln(P^{cigarettes}_{i,1985})  −0.94**      −1.34**      −1.20**
                                                           (0.21)       (0.23)       (0.20)

ln(Inc_{i,1995}) − ln(Inc_{i,1985})                         0.53         0.43         0.46
                                                           (0.34)       (0.30)       (0.31)

Intercept                                                  −0.12        −0.02        −0.05
                                                           (0.07)       (0.07)       (0.06)

Instrumental variable(s)                                   Sales tax    Cigarette-   Both sales tax and
                                                                        specific     cigarette-specific
                                                                        tax          tax

First-stage F-statistic                                    33.70        107.20       88.60

Overidentifying restrictions                               ---          ---          4.93
J-test and p-value                                                                   (0.026)
-----------------------------------------------------------------------------------------
These regressions were estimated using data for 48 U.S. states (48 observations on the ten-year differences). The data are described in Appendix 12.1. The J-test of overidentifying restrictions is described in Key Concept 12.6 (its p-value is given in parentheses), and the first-stage F-statistic is described in Key Concept 12.5. Individual coefficients are statistically significant at the *5% level or **1% significance level.

First, are the instruments relevant? The first-stage F-statistics in the three regressions are 33.7, 107.2, and 88.6, so in all three cases the first-stage F-statistics exceed 10. We conclude that the instruments are not weak, so we can rely on the standard methods for statistical inference (hypothesis tests, confidence intervals) using the estimated coefficients and standard errors.

Second, are the instruments exogenous? Because the regressions in columns (1) and (2) each have a single instrument and a single included endogenous regressor, the coefficients in those regressions are exactly identified. Thus we cannot deploy the J-test in either of those regressions. The regression in column (3), however, is overidentified because there are two instruments and a single included endogenous regressor, so there is one (m − k = 2 − 1 = 1) overidentifying restriction. The J-statistic is 4.93; this has a χ²₁ distribution, so the 5% critical value is 3.84 (Appendix Table 3), and the null hypothesis that both instruments are exogenous is rejected at the 5% significance level (this deduction also can be made directly from the p-value of 0.026, reported in the table).
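The rejection reported above can be reproduced with elementary arithmetic. The snippet below is a verification sketch (not part of the original analysis): since a chi-squared variable with one degree of freedom is the square of a standard normal, both the 5% critical value and the p-value of the J-statistic follow from the normal distribution.

```python
import math

j_stat = 4.93   # J-statistic from column (3) of Table 12.1

def chi2_1_sf(x):
    """P(W > x) for W ~ chi-squared with 1 degree of freedom.

    W is the square of a standard normal Z, so P(W > x) = P(|Z| > sqrt(x)).
    """
    z = math.sqrt(x)
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

critical_5pct = 1.96 ** 2   # because P(|Z| > 1.96) = 5%
p_value = chi2_1_sf(j_stat)

print(round(critical_5pct, 2))   # 3.84, matching Appendix Table 3
print(round(p_value, 3))         # 0.026, matching Table 12.1
```

Since J = 4.93 exceeds the critical value of 3.84, the null hypothesis that both instruments are exogenous is rejected at the 5% level.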


The reason the J-statistic rejects the null hypothesis that both instruments are exogenous is that the two instruments produce rather different estimated coefficients. When the only instrument is the sales tax [column (1)], the estimated price elasticity is −0.94, but when the only instrument is the cigarette-specific tax, the estimated price elasticity is −1.34. Recall the basic idea of the J-statistic: If both instruments are exogenous, then the two TSLS estimators using the individual instruments are consistent and differ from each other only because of random sampling variation. If, however, one of the instruments is exogenous and one is not, then the estimator based on the endogenous instrument is inconsistent, which is detected by the J-statistic. In this application, the difference between the two estimated price elasticities is sufficiently large that it is unlikely to be the result of pure sampling variation, so the J-statistic rejects the null hypothesis that both the instruments are exogenous.

The J-statistic rejection means that the regression in column (3) is based on invalid instruments (the instrument exogeneity condition fails). What does this imply about the estimates in columns (1) and (2)? The J-statistic rejection says that at least one of the instruments is endogenous, so there are three logical possibilities: The sales tax is exogenous but the cigarette-specific tax is not, in which case the column (1) regression is reliable; the cigarette-specific tax is exogenous but the sales tax is not, so the column (2) regression is reliable; or neither tax is exogenous, so neither regression is reliable. The statistical evidence cannot tell us which possibility is correct, so we must use our judgment.

We think that the case for the exogeneity of the general sales tax is stronger than that for the cigarette-specific tax, because the political process can link changes in the cigarette-specific tax to changes in the cigarette market and smoking policy. For example, if smoking decreases in a state because it falls out of fashion, there will be fewer smokers and a weakened lobby against cigarette-specific tax increases, which in turn could lead to higher cigarette-specific taxes. Thus, changes in tastes (which are part of u) could be correlated with changes in cigarette-specific taxes (the instrument). This suggests discounting the IV estimates that use the cigarette-specific tax as an instrument and adopting the price elasticity estimated using the general sales tax as an instrument, −0.94.

The estimate of −0.94 indicates that cigarette consumption is not very inelastic: An increase in price of 1% leads to a decrease in consumption of 0.94%. This may seem surprising for an addictive product like cigarettes. But remember that this elasticity is computed using changes over a ten-year period, so it is a long-run elasticity. This estimate suggests that increased taxes can make a substantial dent in cigarette consumption, at least in the long run.
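To see what a long-run elasticity of −0.94 means in levels rather than logs, here is the implied arithmetic for a hypothetical permanent 10% price increase (an illustrative calculation, not one performed in the text):

```python
import math

elasticity = -0.94                 # long-run estimate from column (1)

d_log_price = math.log(1.10)       # a permanent 10% price increase
d_log_quantity = elasticity * d_log_price
pct_change_quantity = (math.exp(d_log_quantity) - 1.0) * 100.0

print(round(pct_change_quantity, 1))   # about -8.6 (percent, in the long run)
```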



Wl1e n the elast icity is estimutcJ using five -year change~ from 1~X5 to 1~IJO
rather than the ten-year change reponed in Table 12.1, th~ d.tsticit) (~"-llmat d
~ith the general sales tax as the instrument) is - 0.79: for ch a n~e' from ll)lJil to
1995, the elasticity is -0.61-). These estimates suggest that demand is k~ t.lt ... tic
over ho rizons o( five years thao over len years. This lindtne ol greutct fhlc_ eta
ucity at lo nger horizons is consistent with the large body of rc.:.clrch l)n ; ~ reuc
demand. Demand elasticity estimates in that li teratu re typicall) fall m th~. r tnge
- 0.3 to - 0.5, but these are mainl y short-run e lasticirics: some recent '>tUUic-. '~U gest that the long-run e lasticity could be perhaps twice the short-run cla.,tidty.

12.5 Where Do Valid Instruments Come From?


In practice the most difficult aspect of IV estimation is finding instruments that are both relevant and exogenous. There are two main approaches, which reflect two different perspectives on econometric and statistical modeling.

The first approach is to use economic theory to suggest instruments. For example, Philip Wright's understanding of the economics of agricultural markets led him to look for an instrument that shifted the supply curve but not the demand curve; this in turn led him to consider weather conditions in agricultural regions. One area where this approach has been particularly successful is the field of financial economics. Some economic models of investor behavior involve statements about how investors forecast, which then imply sets of variables that are uncorrelated with the error term. Those models sometimes are nonlinear in the data and in the parameters, in which case the IV estimator discussed in this chapter cannot be used. An extension of IV methods to nonlinear models, called generalized method of moments estimation, is used instead. Economic theories are, however, abstractions that often do not take into account the nuances and details necessary for analyzing a particular data set. Thus this approach does not always work.

The second approach to constructing instruments is to look for some exogenous source of variation in X arising from what is, in effect, a random phenomenon that induces shifts in the endogenous regressor. For example, in our hypothetical example in Section 12.1, earthquake damage increased average class size in some school districts, and this variation in class size was unrelated to potential omitted variables that affect student achievement. This approach typically requires knowledge of the problem being studied and careful attention to the details of the data, and is best explained through examples.

²If you are interested in learning more about the economics of smoking, see Chaloupka and Warner (2000) and Gruber (2001).

Three Examples
We now turn to three empirical applications of IV regression that provide examples of how different researchers used their expert knowledge of their empirical problem to find instrumental variables.

Does putting criminals in jail reduce crime? This is a question only an economist would ask. After all, a criminal cannot commit a crime outside jail while in prison, and the fact that some criminals are caught and jailed serves to deter others. But the magnitude of the combined effect, the change in the crime rate associated with a 1% increase in the prison population, is an empirical question.

One strategy for estimating this effect is to regress crime rates (crimes per 100,000 members of the general population) against incarceration rates (prisoners per 100,000), using annual data at a suitable level of jurisdiction (for example, U.S. states). This regression could include some control variables measuring economic conditions (crime increases when general economic conditions worsen), demographics (youths commit more crimes than the elderly), and so forth. There is, however, a serious potential for simultaneous causality bias that undermines such an analysis: If the crime rate goes up and the police do their job, there will be more prisoners. On the one hand, increased incarceration reduces the crime rate; on the other hand, an increased crime rate increases incarceration. As in the butter example in Figure 12.1, because of this simultaneous causality an OLS regression of the crime rate on the incarceration rate will estimate some complicated combination of these two effects. This problem cannot be solved by finding better control variables.

This simultaneous causality bias, however, can be eliminated by finding a suitable instrumental variable and using TSLS. The instrument must be correlated with the incarceration rate (it must be relevant), but it must also be uncorrelated with the error term in the crime rate equation of interest (it must be exogenous). That is, it must affect the incarceration rate but be unrelated to any of the unobserved factors that determine the crime rate.

Where does one find something that affects incarceration but has no direct effect on the crime rate? One place is exogenous variation in the capacity of existing prisons. Because it takes time to build a prison, short-term capacity restrictions can force states to release prisoners prematurely or otherwise reduce incarceration rates. Using this reasoning, Levitt (1996) suggested that lawsuits aimed at


reducing prison overcrowding could serve as an instrumental variable, and he implemented this idea using panel data for the U.S. states from 1972 to 1993.

Are variables measuring overcrowding litigation valid instruments? Although Levitt did not report first-stage F-statistics, the prison overcrowding litigation slowed the growth of prisoner incarcerations in his data, suggesting that this instrument is relevant. To the extent that overcrowding litigation is induced by prison conditions but not by the crime rate or its determinants, this instrument is exogenous. Because Levitt breaks down overcrowding litigation into several types, and thus has several instruments, he is able to test the overidentifying restrictions and fails to reject them using the J-statistic, which bolsters the case that his instruments are valid.

Using these instruments and TSLS, Levitt estimated the effect of incarceration on the crime rate to be substantial. This estimated effect was three times larger than the effect estimated using OLS, suggesting that OLS suffered from large simultaneous causality bias.

Does cutting class sizes increase test scores? As we saw in the empirical analysis of Part II, schools with small classes tend to be wealthier, and their students have access to enhanced learning opportunities both in and out of the classroom. In Part II, we used multiple regression to tackle the threat of omitted variables bias by controlling for various measures of student affluence, ability to speak English, and so forth. Still, a skeptic could wonder whether we did enough: If we left out something important, our estimates of the class size effect would still be biased.

This potential omitted variables bias could be addressed by including the right control variables, but if these data are unavailable (some, like outside learning opportunities, are hard to measure) then an alternative approach is to use IV regression. This regression requires an instrumental variable correlated with class size (relevance) but uncorrelated with the omitted determinants of test performance that make up the error term, such as parental interest in learning, learning opportunities outside the classroom, quality of the teachers and school facilities, and so forth (exogeneity).

Where does one look for an instrument that induces random, exogenous variation in class size, but is unrelated to the other determinants of test performance? Hoxby (2000) suggested biology. Because of random fluctuations in the timing of births, the size of the incoming kindergarten class varies from one year to the next. Although the actual number of children entering kindergarten might be endogenous (recent news about the school might influence whether parents send a child to a private school), she argued that the potential number of children entering


kindergarten, the number of four-year-olds in the district, is mainly a matter of random fluctuations in the birth dates of children.

Is potential enrollment a valid instrument? Whether it is exogenous depends on whether it is correlated with unobserved determinants of test performance. Surely biological fluctuations in potential enrollment are exogenous, but potential enrollment also fluctuates because parents with young children choose to move into an improving school district and out of one in trouble. If so, an increase in potential enrollment could be correlated with unobserved factors such as the quality of school management, rendering this instrument invalid. Hoxby addressed this problem by reasoning that growth or decline in the potential student pool for this reason would occur smoothly over several years, whereas random fluctuations in birth dates would produce short-term "spikes" in potential enrollment. Thus, she used as her instrument not potential enrollment, but the deviation of potential enrollment from its long-term trend. These deviations satisfy the criterion for instrument relevance (the first-stage F-statistics all exceed 100). She makes a good case that this instrument is exogenous, but, as in all IV analysis, the credibility of this assumption is ultimately a matter of judgment.

Hoxby implemented this strategy using detailed panel data on elementary schools in Connecticut in the 1980s and 1990s. The panel data set permitted her to include school fixed effects, which, in addition to the instrumental variables strategy, attack the problem of omitted variables bias at the school level. Her TSLS estimates suggested that the effect on test scores of class size is small; most of her estimates were statistically insignificantly different from zero.
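The construction of an instrument as the deviation of potential enrollment from its long-term trend can be sketched as follows. The numbers below are made up for illustration (this is not Hoxby's data or code): fit a linear time trend to a district's potential enrollment by OLS and keep the residuals, the short-term "spikes," as the instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1980, 2000)

# Hypothetical potential enrollment: smooth growth (possibly endogenous,
# e.g. families moving toward improving schools) plus short-term birth-date
# fluctuations (plausibly exogenous).
trend_component = 500.0 + 6.0 * (years - years[0])
spikes = rng.normal(0.0, 15.0, years.size)
potential_enrollment = trend_component + spikes

# Instrument: residuals from an OLS fit of enrollment on a linear time trend.
T = np.column_stack([np.ones(years.size), years - years[0]])
coef = np.linalg.lstsq(T, potential_enrollment, rcond=None)[0]
deviation = potential_enrollment - T @ coef

# The deviations are purged of the smooth trend: by construction they have
# mean zero and are orthogonal to the time variable.
print("sd of deviation instrument:", round(float(deviation.std()), 1))
```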

Does aggressive treatment of heart attacks prolong lives? New aggressive treatments for victims of heart attacks (technically, acute myocardial infarctions, or AMI) hold the potential for saving lives. Before a new medical procedure (in this example, cardiac catheterization³) is approved for general use, it goes through clinical trials, a series of randomized controlled experiments designed to measure its effects and side effects. But strong performance in a clinical trial is one thing; actual performance in the real world is another.

A natural starting point for estimating the real-world effect of cardiac catheterization is to compare patients who received the treatment to those who did not. This leads to regressing the length of survival of the patient against the binary treatment variable (whether the patient received cardiac catheterization) and other control variables that affect mortality (age, weight, other measured

³Cardiac catheterization is a procedure in which a catheter, or tube, is inserted into a blood vessel and guided all the way to the heart to obtain information about the heart and coronary arteries.


health conditions, and so forth). The population coefficient on the indicator variable is the increment to the patient's life expectancy provided by the treatment. Unfortunately, the OLS estimator is subject to bias: Cardiac catheterization does not "just happen" to a patient randomly; rather, it is performed because the doctor and patient decide that it might be effective. If their decision is based in part on unobserved factors relevant to health outcomes not in the data set, then the treatment decision will be correlated with the regression error term. If the healthiest patients are the ones who receive the treatment, the OLS estimator will be biased (treatment is correlated with an omitted variable), and the treatment will appear more effective than it really is.

This potential bias can be eliminated by IV regression using a valid instrumental variable. The instrument must be correlated with treatment (must be relevant) but must be uncorrelated with the omitted health factors that affect survival (must be exogenous).

Where does one look for something that affects treatment but not the health outcome, other than through its effect on treatment? McClellan, McNeil, and Newhouse (1994) suggested geography. Most hospitals in their data set did not specialize in cardiac catheterization, so many patients were closer to "regular" hospitals that did not offer this treatment than to cardiac catheterization hospitals. McClellan, McNeil, and Newhouse therefore used as an instrumental variable the difference between the distance from the AMI patient's home to the nearest cardiac catheterization hospital and the distance to the nearest hospital of any sort; this distance is zero if the nearest hospital is a cardiac catheterization hospital, otherwise it is positive. If this relative distance affects the probability of receiving the treatment, then it is relevant. If it is distributed randomly across AMI victims, then it is exogenous.

Is relative distance to the nearest cardiac catheterization hospital a valid instrument? McClellan, McNeil, and Newhouse do not report first-stage F-statistics, but they do provide other empirical evidence that it is not weak. Is this distance measure exogenous? They make two arguments. First, they draw on their medical expertise and knowledge of the health care system to argue that distance to a hospital is plausibly uncorrelated with any of the unobservable variables that determine AMI outcomes. Second, they have data on some of the additional variables that affect AMI outcomes, such as the weight of the patient, and in their sample distance is uncorrelated with these observable determinants of survival; this, they argue, makes it more credible that distance is uncorrelated with the unobservable determinants in the error term as well.

Using 205,021 observations on Americans aged at least 64 who had an AMI in 1987, McClellan, McNeil, and Newhouse reached a striking conclusion: Their


TSLS estimates suggest that cardiac catheterization has a small, possibly zero effect on health outcomes; that is, cardiac catheterization does not substantially prolong life. In contrast, the OLS estimates suggest a large positive effect. They interpret this difference as evidence of bias in the OLS estimates.

McClellan, McNeil, and Newhouse's IV method has an interesting interpretation. The OLS analysis used actual treatment as the regressor, but because actual treatment is itself the outcome of a decision by patient and doctor, they argue that the actual treatment is correlated with the error term. Instead, TSLS uses predicted treatment, where the variation in predicted treatment arises because of variation in the instrumental variable: Patients closer to a cardiac catheterization hospital are more likely to receive this treatment.

This interpretation has two implications. First, the IV regression actually estimates the effect of the treatment not on a "typical" randomly selected patient, but rather on patients for whom distance is an important consideration in the treatment decision. The effect on those patients might differ from the effect on a typical patient, which provides one explanation of the greater estimated effectiveness of the treatment in clinical trials than in McClellan, McNeil, and Newhouse's IV study. Second, it suggests a general strategy for finding instruments in this type of setting: Find an instrument that affects the probability of treatment, but does so for reasons that are unrelated to the outcome except through their effect on the likelihood of treatment. Both these implications have applicability to experimental and "quasi-experimental" studies, the topic of Chapter 13.

12.6 Conclusion
From the humble start of estimating how much less butter people will buy if its price rises, IV methods have evolved into a general approach for estimating regressions when one or more variables are correlated with the error term. Instrumental variables regression uses the instruments to isolate variation in the endogenous regressors that is uncorrelated with the error in the regression of interest; this is the first stage of two stage least squares. This in turn permits estimation of the effect of interest in the second stage of two stage least squares.

Successful IV regression requires valid instruments, that is, instruments that are both relevant (not weak) and exogenous. If the instruments are weak, then the TSLS estimator can be biased, even in large samples, and statistical inferences based on TSLS t-statistics and confidence intervals can be misleading. Fortunately, when there is a single endogenous regressor it is possible to check for weak instruments simply by checking the first-stage F-statistic.
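The first-stage F-statistic check can itself be sketched in code. The example below uses simulated data (all values invented for illustration): regress the endogenous regressor on the instruments, form the homoskedasticity-only F-statistic for the hypothesis that the instruments' coefficients are all zero, and compare it with the rule-of-thumb threshold of 10.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 48

z1 = rng.normal(size=n)     # instrument 1 (e.g. a sales tax change)
z2 = rng.normal(size=n)     # instrument 2 (e.g. a cigarette-specific tax change)
x = 0.8 * z1 + 0.5 * z2 + rng.normal(size=n)   # endogenous regressor

def ssr(design, y):
    """Sum of squared residuals from an OLS regression of y on design."""
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    resid = y - design @ coef
    return float(resid @ resid)

# Restricted first stage: intercept only (instrument coefficients set to zero).
# Unrestricted first stage: intercept plus both instruments.
ssr_restricted = ssr(np.ones((n, 1)), x)
ssr_unrestricted = ssr(np.column_stack([np.ones(n), z1, z2]), x)

q = 2    # number of restrictions (instrument coefficients) tested
k = 3    # regressors in the unrestricted first stage
F = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))
print("first-stage F-statistic:", round(F, 1))
print("instruments judged non-weak (F > 10)?", F > 10)
```

If the regression also contained included exogenous regressors, they would appear in both the restricted and the unrestricted first stages, so the F-statistic would test only the instruments' coefficients.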


If the instruments are not exogenous, that is, if one or more instruments is correlated with the error term, then the TSLS estimator is inconsistent. If there are more instruments than endogenous regressors, then instrument exogeneity can be examined by using the J-statistic to test the overidentifying restrictions. However, the core assumption, that there are at least as many exogenous instruments as there are endogenous regressors, cannot be tested. It is therefore incumbent on both the empirical analyst and the critical reader to use their own understanding of the empirical application to evaluate whether this assumption is reasonable.

The interpretation of IV regression as a way to exploit known exogenous variation in the endogenous regressor can be used to guide the search for potential instrumental variables in a particular application. This interpretation underlies much of the empirical analysis in the area that goes under the broad heading of program evaluation, in which experiments or quasi-experiments are used to estimate the effect of programs, policies, or other interventions on some outcome measure. A variety of additional issues arises in those applications, for example, the interpretation of IV results when, as in the cardiac catheterization example, different "patients" might have different responses to the same "treatment." These and other aspects of empirical program evaluation are taken up in Chapter 13.

Summary
1. Instrumental variables regression is a way to estimate regression coefficients when one or more regressors are correlated with the error term.
2. Endogenous variables are correlated with the error term in the equation of interest; exogenous variables are uncorrelated with this error term.
3. For an instrument to be valid, it must (1) be correlated with the included endogenous variable and (2) be exogenous.
4. IV regression requires at least as many instruments as included endogenous variables.
5. The TSLS estimator has two stages. First, the included endogenous variables are regressed against the included exogenous variables and the instruments. Second, the dependent variable is regressed against the included exogenous variables and the predicted values of the included endogenous variables from the first-stage regression(s).
6. Weak instruments (instruments that are nearly uncorrelated with the included endogenous variables) make the TSLS estimator biased and TSLS confidence intervals and hypothesis tests unreliable.
7. If an instrument is not exogenous, then the TSLS estimator is inconsistent.


Key Terms
instrumental variables (IV) regression (421)
instrumental variable (instrument) (421)
endogenous variable (423)
exogenous variable (423)
instrument relevance condition (423)
instrument exogeneity condition (423)
two stage least squares (423)
included exogenous variables (432)
exact identification (432)
overidentification (432)
underidentification (432)
reduced form (434)
first-stage regression (435)
second-stage regression (435)
weak instruments (439)
first-stage F-statistic (441)
test of overidentifying restrictions (444)

Review the Concepts


12.1 In the demand curve regression model of Equation (12.3), is ln(P^{cigarettes}_i) positively or negatively correlated with the error, u_i? If β₁ is estimated by OLS, would you expect the estimated value to be larger or smaller than the true value of β₁? Explain.

12.2 In the study of cigarette demand in this chapter, suppose that we used as an instrument the number of trees per capita in the state. Is this instrument relevant? Is it exogenous? Is it a valid instrument?

12.3 In his study of the effect of incarceration on crime rates, suppose that Levitt had used the number of lawyers per capita as an instrument. Is this instrument relevant? Is it exogenous? Is it a valid instrument?

12.4 In their study of the effectiveness of cardiac catheterization, McClellan, McNeil, and Newhouse (1994) used as an instrument the difference in distance to cardiac catheterization and regular hospitals. How could you determine whether this instrument is relevant? How could you determine whether this instrument is exogenous?

Exercises

12.1 This question refers to the panel data regressions summarized in Table 12.1.

a. Suppose that the federal government is considering a new tax on cigarettes that is estimated to increase the retail price by $0.10 per pack. If the current price per pack is $2.00, use the regression in column (1) to predict the change in demand. Construct a 95% confidence interval for the change in demand.

458  CHAPTER 12  Instrumental Variables Regression

b. Suppose that the United States enters a recession and income falls by 2%. Use the regression in column (1) to predict the change in demand.

c. Recessions typically last less than one year. Do you think that the regression in column (1) will provide a reliable answer to the question in (b)? Why or why not?

d. Suppose that the F-statistic in column (1) was 3.6 instead of 33.6. Would the regression provide a reliable answer to the question posed in (a)? Why or why not?

12.2 Consider the regression model with a single regressor: Y_i = β₀ + β₁X_i + u_i. Suppose that the assumptions in Key Concept 4.3 are satisfied.

a. Show that X_i is a valid instrument. That is, show that Key Concept 12.3 is satisfied with Z_i = X_i.

b. Show that the IV regression assumptions in Key Concept 12.4 are satisfied with this choice of Z_i.

c. Show that the IV estimator constructed using Z_i = X_i is identical to the OLS estimator.

12.3 A classmate is interested in estimating the variance of the error term in Equation (12.1).

a. Suppose that she uses the estimator from the second-stage regression of TSLS: σ̂²_a = [1/(n − 2)] Σᵢ₌₁ⁿ (Y_i − β̂₀^TSLS − β̂₁^TSLS X̂_i)², where X̂_i is the fitted value from the first-stage regression. Is this estimator consistent? (For the purposes of this question suppose that the sample is very large and the TSLS estimators are essentially identical to β₀ and β₁.)

b. Is σ̂²_b = [1/(n − 2)] Σᵢ₌₁ⁿ (Y_i − β̂₀^TSLS − β̂₁^TSLS X_i)², which uses the actual value X_i rather than the fitted value, consistent?

12.4 Consider TSLS estimation with a single included endogenous variable and a single instrument. Then the predicted value from the first-stage regression is X̂_i = π̂₀ + π̂₁Z_i. Use the definition of the sample variance and covariance to show that s_X̂Y = π̂₁s_ZY and s²_X̂ = π̂₁²s²_Z. Use this result to fill in the steps of the derivation in Appendix 12.2 of Equation (12.4).

12.5 Consider the instrumental variable regression model

Y_i = β₀ + β₁X_i + β₂W_i + u_i,

where X_i is correlated with u_i and Z_i is an instrument. Suppose that the first three assumptions in Key Concept 12.4 are satisfied. Which IV assumption is not satisfied when:

a. Z_i is independent of (Y_i, X_i, W_i)?

b. Z_i = W_i?

c. W_i = 1 for all i?

d. Z_i = X_i?
12.6 In an instrumental variable regression model with one regressor, X_i, and one instrument, Z_i, the regression of X_i onto Z_i has R² = 0.05 and n = 100. Is Z_i a strong instrument? [Hint: See Equation (7.14).] Would your answer change if R² = 0.05 and n = 500?

12.7 In an instrumental variable regression model with one regressor, X_i, and two instruments, Z_{1i} and Z_{2i}, the value of the J-statistic is J = 18.2.

a. Does this suggest that E(u_i | Z_{1i}, Z_{2i}) ≠ 0? Explain.

b. Does this suggest that E(u_i | Z_{1i}) ≠ 0? Explain.

12.8 Consider a product market with a supply function Q_i^s = β₀ + β₁P_i + u_i^s, a demand function Q_i^d = γ₀ + u_i^d, and a market equilibrium condition Q_i^s = Q_i^d, where u_i^s and u_i^d are mutually independent i.i.d. random variables, both with a mean of zero.

a. Show that P_i and u_i^s are correlated.

b. Show that the OLS estimator of β₁ is inconsistent.

c. How would you estimate β₀, β₁, and γ₀?
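A short simulation (an editorial illustration, not part of the original exercise; the parameter values β₀ = 1, β₁ = 0.5, γ₀ = 3 and the unit shock variances are invented) shows the simultaneity at work in Exercise 12.8: solving the equilibrium condition makes the price depend mechanically on the supply shock, and OLS converges to β₁·var(u^d)/[var(u^d) + var(u^s)] rather than to β₁.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1, gamma0 = 1.0, 0.5, 3.0   # hypothetical true parameters

u_s = rng.normal(size=n)               # supply shock
u_d = rng.normal(size=n)               # demand shock, independent of u_s

# Market equilibrium Q^s = Q^d pins down price and quantity:
# beta0 + beta1*P + u_s = gamma0 + u_d  =>  P = (gamma0 - beta0 + u_d - u_s)/beta1
p = (gamma0 - beta0 + u_d - u_s) / beta1
q = gamma0 + u_d

# (a) P and u_s are correlated by construction of the equilibrium:
print("corr(P, u_s) =", round(np.corrcoef(p, u_s)[0, 1], 3))

# (b) OLS of Q on P does not recover beta1; with equal shock variances its
# probability limit is beta1 * var(u_d) / (var(u_d) + var(u_s)) = beta1 / 2.
beta1_ols = np.cov(p, q)[0, 1] / np.var(p, ddof=1)
print("OLS slope =", round(beta1_ols, 3), " true beta1 =", beta1)
```

With these variances the OLS slope settles near 0.25, half the true supply elasticity, which is the inconsistency that part (b) asks you to derive analytically.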

12.9 A researcher is interested in the effect of military service on human capital. He collects data from a random sample of 4000 workers aged 40 and runs the OLS regression Y_i = β₀ + β₁X_i + u_i, where Y_i is the worker's annual earnings and X_i is a binary variable that is equal to 1 if the person served in the military and is equal to 0 otherwise.

a. Explain why the OLS estimates are likely to be unreliable. (Hint: Which variables are omitted from the regression? Are they correlated with military service?)

b. During the Vietnam War there was a draft, where priority for the draft was determined by a national lottery. (Birthdates were randomly selected and ordered 1 through 365. Those with birthdates ordered first were drafted before those with birthdates ordered second, and so forth.) Explain how the lottery might be used as an instrument to estimate the effect of military service on earnings. (For more about this issue, see Joshua D. Angrist, "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, June 1990.)


12.10 Consider the instrumental variable regression model Y_i = β₀ + β₁X_i + β₂W_i + u_i, where Z_i is an instrument. Suppose that data on W_i are not available and the model is estimated omitting W_i from the regression.

a. Suppose Z_i and W_i are uncorrelated. Is the IV estimator consistent?

b. Suppose Z_i and W_i are correlated. Is the IV estimator consistent?

Empirical Exercises

E12.1 During the 1880s, a cartel known as the Joint Executive Committee (JEC) controlled the rail transport of grain from the Midwest to eastern cities in the United States. The cartel preceded the Sherman Antitrust Act of 1890, and it legally operated to increase the price of grain above what would have been the competitive price. From time to time, cheating by members of the cartel brought about a temporary collapse of the collusive price-setting agreement. In this exercise, you will use variations in supply associated with the cartel's collapses to estimate the elasticity of demand for rail transport of grain. On the textbook Web site www.aw-bc.com/stock_watson, you will find a data file JEC that contains weekly observations on the rail shipping price and other factors from 1880 to 1886.⁴ A detailed description of the data is contained in JEC_Description, available on the Web site.

Suppose that the demand curve for rail transport of grain is specified as ln(Q_i) = β₀ + β₁ln(P_i) + β₂Ice_i + Σⱼ β₂₊ⱼSeas_{j,i} + u_i, where Q_i is the total tonnage of grain shipped in week i, P_i is the price of shipping a ton of grain by rail, Ice_i is a binary variable that is equal to 1 if the Great Lakes are not navigable because of ice, and Seas_{j,i} is a binary variable that captures seasonal variation in demand. Ice is included because grain could also be transported by ship when the Great Lakes were navigable.

a. Estimate the demand equation by OLS. What is the estimated value of the demand elasticity and its standard error?

b. Explain why the interaction of supply and demand could make the OLS estimator of the elasticity biased.

c. Consider using the variable cartel as an instrumental variable for ln(P). Use economic reasoning to argue whether cartel plausibly satisfies the two conditions for a valid instrument.

⁴These data were provided by Professor Robert Porter of Northwestern University and were used in his paper "A Study of Cartel Stability: The Joint Executive Committee, 1880-1886," The Bell Journal of Economics, 1983, 14(2): 301-314.

Empirical Exerci~

d. Esumate the first

~tagc

461

rcgrcl>sio n. h wrtel a wc.1k in!o.trumcnt?

e. Estimate the demand equation by instJ umental vanable regrcs ion.


\Vhat is the estimated demand d.tsttctt) and its standard e rror?
f. Does the evidence suggec:tthat thl. ca rtel v.as charging the profit -max
imizing monopoly price? Explain. (Hmt: What sho uld a monopolist do
if the price e lasticity is less than I?)
E12.2 How does fertility affect labor supply? That is, how much does a woman's labor supply fall when she has an additional child? In this exercise you will estimate this effect using data for married women from the 1980 U.S. Census.⁵ The data are available on the textbook Web site www.aw-bc.com/stock_watson in the file Fertility and described in the file Fertility_Description. The data set contains information on married women aged 21-35 with two or more children.

a. Regress weeksworked on the indicator variable morekids using OLS. On average, do women with more than two children work less than women with two children? How much less?

b. Explain why the OLS regression estimated in (a) is inappropriate for estimating the causal effect of fertility (morekids) on labor supply (weeksworked).

c. The data set contains the variable samesex, which is equal to 1 if the first two children are of the same sex (boy-boy or girl-girl) and equal to 0 otherwise. Are couples whose first two children are of the same sex more likely to have a third child? Is the effect large? Is it statistically significant?

d. Explain why samesex is a valid instrument for the instrumental variable regression of weeksworked on morekids.

e. Is samesex a weak instrument?

f. Estimate the regression of weeksworked on morekids using samesex as an instrument. How large is the fertility effect on labor supply?

g. Do the results change when you include the variables agem1, black, hispan, and othrace in the labor supply regression (treating these variables as exogenous)? Explain why or why not.

⁵These data were provided by Professor William Evans of the University of Maryland and were used in his paper with Joshua Angrist, "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size," American Economic Review, 1998, 88(3): 450-477.


E12.3 (This requires Appendix 12.5.) On the textbook Web site www.aw-bc.com/stock_watson you will find the data set WeakInstrument that contains observations on (Y_i, X_i, Z_i) for the instrumental variable regression Y_i = β₀ + β₁X_i + u_i.

a. Construct β̂₁^TSLS, its standard error, and the usual 95% confidence interval for β₁.

b. Compute the F-statistic for the regression of X_i on Z_i. Is there evidence of a "weak instrument" problem?

c. Compute a 95% confidence interval for β₁ using the Anderson-Rubin procedure. (To implement the procedure, assume that −5 ≤ β₁ ≤ 5.)

d. Comment on the differences in the confidence intervals in (a) and (c). Which is more reliable?

APPENDIX 12.1  The Cigarette Consumption Panel Data Set

The data set consists of annual data for the 48 continental U.S. states from 1985 to 1995. Quantity consumed is measured by annual per capita cigarette sales in packs per fiscal year, as derived from state tax collection data. The price is the real (that is, inflation-adjusted) average retail cigarette price per pack during the fiscal year, including taxes. Income is real per capita income. The general sales tax is the average tax, in cents per pack, due to the broad-based state sales tax applied to all consumption goods. The cigarette-specific tax is the tax applied to cigarettes only. All prices, income, and taxes used in the regressions in this chapter are deflated by the Consumer Price Index and thus are in constant (real) dollars. We are grateful to Professor Jonathan Gruber of MIT for providing us with these data.

APPENDIX 12.2  Derivation of the Formula for the TSLS Estimator in Equation (12.4)

The first stage of TSLS is to regress X_i on the instrument Z_i by OLS and to compute the OLS predicted value X̂_i, and the second stage is to regress Y_i on X̂_i by OLS. Accordingly, the formula for the TSLS estimator, expressed in terms of the predicted value X̂_i, is the formula for the OLS estimator in Key Concept 4.2, with X̂_i replacing X_i. That is, β̂₁^TSLS = s_X̂Y / s²_X̂, where s_X̂Y is the sample covariance between Y_i and X̂_i, and s²_X̂ is the sample variance of X̂_i. Because X̂_i is the predicted value of X_i from the first-stage regression, X̂_i = π̂₀ + π̂₁Z_i, the definitions of sample variances and covariances imply that s_X̂Y = π̂₁s_ZY and s²_X̂ = π̂₁²s²_Z (Exercise 12.4). Thus the TSLS estimator can be written as β̂₁^TSLS = s_X̂Y / s²_X̂ = s_ZY / (π̂₁s²_Z). Finally, π̂₁ is the OLS slope coefficient from the first stage of TSLS, so π̂₁ = s_ZX / s²_Z. Substitution of this formula for π̂₁ into β̂₁^TSLS = s_ZY / (π̂₁s²_Z) yields the formula for the TSLS estimator in Equation (12.4).
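The two-stage derivation above can be checked numerically. The sketch below is an editorial illustration, not part of the textbook: the data-generating process and all parameter values (including the true coefficient 1.5) are invented. It runs the two stages of TSLS by hand on simulated data and confirms that the resulting slope equals the one-step formula s_ZY / s_ZX of Equation (12.4).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data with an endogenous regressor: u and v are correlated,
# so X is correlated with the error u, while Z is exogenous and relevant.
z = rng.normal(size=n)
u = rng.normal(size=n)
v = 0.8 * u + 0.6 * rng.normal(size=n)
x = 0.7 * z + v                # first-stage relation: X depends on Z
y = 2.0 + 1.5 * x + u          # structural equation with beta_1 = 1.5

# Stage 1: regress X on Z by OLS and form the predicted values X-hat.
pi1 = np.cov(z, x)[0, 1] / np.var(z, ddof=1)
pi0 = x.mean() - pi1 * z.mean()
x_hat = pi0 + pi1 * z

# Stage 2: regress Y on X-hat by OLS; the slope is the TSLS estimator.
beta1_tsls = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)

# One-step formula of Equation (12.4): s_ZY / s_ZX.
beta1_ratio = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print("two-stage:", beta1_tsls, " ratio formula:", beta1_ratio)
```

Because the identity holds exactly in any sample, the two numbers agree to floating-point precision, and with a strong instrument and n = 5000 both sit close to the true coefficient.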

APPENDIX 12.3  Large-Sample Distribution of the TSLS Estimator

This appendix studies the large-sample distribution of the TSLS estimator in the case considered in Section 12.1, that is, with a single instrument, a single included endogenous variable, and no included exogenous variables.

To start, we derive a formula for the TSLS estimator in terms of the errors that forms the basis for the remaining discussion, similar to the expression for the OLS estimator in Equation (4.30) in Appendix 4.3. From Equation (12.1), Y_i − Ȳ = β₁(X_i − X̄) + (u_i − ū). Accordingly, the sample covariance between Z and Y can be expressed as

s_ZY = [1/(n − 1)] Σᵢ₌₁ⁿ (Z_i − Z̄)(Y_i − Ȳ) = [1/(n − 1)] Σᵢ₌₁ⁿ (Z_i − Z̄)[β₁(X_i − X̄) + (u_i − ū)] = β₁s_ZX + [1/(n − 1)] Σᵢ₌₁ⁿ (Z_i − Z̄)u_i,   (12.18)

where s_ZX = [1/(n − 1)] Σᵢ₌₁ⁿ (Z_i − Z̄)(X_i − X̄), and where the final equality follows because Σᵢ₌₁ⁿ (Z_i − Z̄)ū = 0. Substituting the definition of s_ZX and the final expression in Equation (12.18) into the definition of β̂₁^TSLS and multiplying the numerator and denominator by (n − 1)/n yields

β̂₁^TSLS = β₁ + [(1/n) Σᵢ₌₁ⁿ (Z_i − Z̄)u_i] / [(1/n) Σᵢ₌₁ⁿ (Z_i − Z̄)(X_i − X̄)].   (12.19)

Large-Sample Distribution of β̂₁^TSLS When the IV Regression Assumptions in Key Concept 12.4 Hold

Equation (12.19) for the TSLS estimator is similar to Equation (4.30) in Appendix 4.3 for the OLS estimator, with the exceptions that Z rather than X appears in the numerator, and the denominator is the covariance between Z and X rather than the variance of X. Because of these similarities, and because Z is exogenous, the argument in Appendix 4.3 that the OLS estimator is normally distributed in large samples extends to β̂₁^TSLS. Specifically, when the sample is large, Z̄ ≅ μ_Z, so the numerator is approximately q̄ = (1/n) Σᵢ₌₁ⁿ q_i, where q_i = (Z_i − μ_Z)u_i. Because the instrument is exogenous, E(q_i) = 0. By the IV regression assumptions in Key Concept 12.4, q_i is i.i.d. with variance σ²_q = var[(Z_i − μ_Z)u_i]. It follows that var(q̄) = σ²_q̄ = σ²_q/n and, by the central limit theorem, q̄/σ_q̄ is, in large samples, distributed N(0, 1).

Because the sample covariance is consistent for the population covariance, s_ZX →p cov(Z_i, X_i), which, because the instrument is relevant, is nonzero. Thus, by Equation (12.19), β̂₁^TSLS ≅ β₁ + q̄/cov(Z_i, X_i), so that in large samples β̂₁^TSLS is approximately distributed N(β₁, σ²_β̂₁TSLS), where σ²_β̂₁TSLS = σ²_q̄/[cov(Z_i, X_i)]² = (1/n)var[(Z_i − μ_Z)u_i]/[cov(Z_i, X_i)]², which is the expression given in Equation (12.8).
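The normal approximation derived above can be seen in a small Monte Carlo sketch. This is an editorial illustration on a simulated design, with all parameter values invented: across repeated samples the TSLS estimates cluster around the true β₁ with a spread close to the standard deviation implied by Equation (12.8).

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, beta1 = 500, 2000, 1.0

estimates = []
for _ in range(reps):
    z = rng.normal(size=n)
    u = rng.normal(size=n)
    x = 1.0 * z + 0.8 * u + 0.6 * rng.normal(size=n)  # strong instrument
    y = beta1 * x + u
    estimates.append(np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1])
estimates = np.array(estimates)

# Large-sample sd from Equation (12.8): sqrt{ var[(Z - mu_Z)u] / n } / |cov(Z, X)|.
# In this simulated design var(Zu) = 1 and cov(Z, X) = 1 by construction.
sd_theory = np.sqrt(1.0 / n) / 1.0

print("mean of TSLS draws:", round(estimates.mean(), 3))
print("Monte Carlo sd:", round(estimates.std(), 4), " theory:", round(sd_theory, 4))
```

The Monte Carlo standard deviation matches the asymptotic formula closely here because the first-stage relation is strong; Appendix 12.4 shows how badly this approximation fails when it is not.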

APPENDIX 12.4  Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid

This appendix considers the large-sample distribution of the TSLS estimator in the setup of Section 12.1 (one X, one Z) when one or the other of the conditions for instrument validity fails. If the instrument relevance condition fails (that is, the instrument is weak), the large-sample distribution of the TSLS estimator is not normal; in fact, its distribution is that of a ratio of two normal random variables. If the instrument exogeneity condition fails, the TSLS estimator is inconsistent.

Large-Sample Distribution of β̂₁^TSLS When the Instrument Is Weak

First consider the case that the instrument is irrelevant, so that cov(Z_i, X_i) = 0. Then the argument in Appendix 12.3 entails division by zero. To avoid this problem, we need to take a closer look at the behavior of the term in the denominator of Equation (12.19) when the population covariance is zero.

We start by rewriting Equation (12.19). Because of the consistency of the sample average, in large samples Z̄ is close to μ_Z and X̄ is close to μ_X. Thus the term in the denominator of Equation (12.19) is approximately (1/n) Σᵢ₌₁ⁿ (Z_i − μ_Z)(X_i − μ_X). Let r_i = (Z_i − μ_Z)(X_i − μ_X) and r̄ = (1/n) Σᵢ₌₁ⁿ r_i; let σ²_r = var[(Z_i − μ_Z)(X_i − μ_X)] and σ²_r̄ = σ²_r/n; and let q_i, σ²_q, and σ²_q̄ be as defined in Appendix 12.3. Then Equation (12.19) implies that, in large samples,

β̂₁^TSLS − β₁ ≅ q̄/r̄ = (σ_q̄/σ_r̄)(q̄/σ_q̄)/(r̄/σ_r̄).   (12.20)

If the instrument is irrelevant, E(r_i) = cov(Z_i, X_i) = 0. Thus r̄ is the sample average of the random variables r_i, i = 1, ..., n, which are i.i.d. (by the second least squares assumption), have variance σ²_r = var[(Z_i − μ_Z)(X_i − μ_X)] (which is finite by the third IV regression assumption), and have a mean of zero (because the instrument is irrelevant). It follows that the central limit theorem applies to r̄; specifically, r̄/σ_r̄ is approximately distributed N(0, 1). Therefore, the final expression of Equation (12.20) implies that, in large samples, the distribution of β̂₁^TSLS − β₁ is the distribution of aS, where a = σ_q̄/σ_r̄ and S is the ratio of two random variables, each of which has a standard normal distribution (these two standard normal random variables are correlated).

In other words, when the instrument is irrelevant, the central limit theorem applies to the denominator as well as the numerator of the TSLS estimator, so that in large samples the distribution of the TSLS estimator is the distribution of the ratio of two normal random variables. Because X_i and u_i are correlated, these normal random variables are correlated, and the large-sample distribution of the TSLS estimator when the instrument is irrelevant is complicated. In fact, the large-sample distribution of the TSLS estimator with irrelevant instruments is centered on the probability limit of the OLS estimator. Thus, when the instrument is irrelevant, TSLS does not eliminate the bias in OLS and, moreover, has a nonnormal distribution, even in large samples.

When the instrument is weak but not irrelevant, the distribution of the TSLS estimator continues to be nonnormal, so the general lesson here about the extreme case of an irrelevant instrument carries over to weak instruments.

Large-Sample Distribution of β̂₁^TSLS When the Instrument Is Endogenous

The numerator in the final expression in Equation (12.19) converges in probability to cov(Z_i, u_i). If the instrument is exogenous, this is zero, and the TSLS estimator is consistent (assuming the instrument is not weak). If, however, the instrument is not exogenous and the instrument is not weak, then β̂₁^TSLS →p β₁ + cov(Z_i, u_i)/cov(Z_i, X_i) ≠ β₁. That is, if the instrument is not exogenous, then the TSLS estimator is inconsistent.
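The claim that with an irrelevant instrument TSLS is centered on the probability limit of OLS can be illustrated with a simulation sketch (an editorial illustration with invented parameter values): here cov(Z, X) = 0 by construction, OLS converges to β₁ + cov(X, u)/var(X) = 1.8, and the TSLS draws center there rather than at the true β₁ = 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta1 = 500, 2000, 1.0

tsls = []
for _ in range(reps):
    z = rng.normal(size=n)                  # irrelevant: Z never enters X
    u = rng.normal(size=n)
    x = 0.8 * u + 0.6 * rng.normal(size=n)  # X endogenous, cov(Z, X) = 0
    y = beta1 * x + u
    tsls.append(np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1])
tsls = np.array(tsls)

# plim of OLS in this design: beta1 + cov(X, u)/var(X) = 1 + 0.8/1.0 = 1.8.
ols_plim = beta1 + 0.8 / (0.8 ** 2 + 0.6 ** 2)

print("median of TSLS draws:", round(float(np.median(tsls)), 2))
print("OLS probability limit:", ols_plim)
```

The heavy tails of the draws reflect the ratio-of-normals distribution described above; the mean of the draws is not well behaved, which is why the median is reported.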


APPENDIX 12.5  Instrumental Variables Analysis with Weak Instruments

This appendix discusses some methods for instrumental variables analysis in the presence of potentially weak instruments. The appendix focuses on the case of a single included endogenous regressor [Equations (12.13) and (12.14)].

Testing for Weak Instruments

The rule of thumb in Key Concept 12.5 says that a first-stage F-statistic less than 10 indicates that the instruments are weak. One motivation for this rule of thumb arises from an approximate expression for the bias of the TSLS estimator. Let β₁^OLS denote the probability limit of the OLS estimator β̂₁, and let β₁^OLS − β₁ denote the asymptotic bias of the OLS estimator (if the regressor is endogenous, then β̂₁ →p β₁^OLS ≠ β₁). It is possible to show that, when there are many instruments, the bias of the TSLS estimator is approximately E(β̂₁^TSLS) − β₁ ≅ (β₁^OLS − β₁)/[E(F) − 1], where E(F) is the expectation of the first-stage F-statistic. If E(F) = 10, then the bias of TSLS, relative to the bias of OLS, is approximately 1/9, which is small enough to be acceptable in many applications. Replacing E(F) with the computed F-statistic yields the rule of thumb of Key Concept 12.5.

The motivation in the previous paragraph involved an approximate formula for the bias of the TSLS estimator when there are many instruments. In most applications, however, the number of instruments is small. Stock and Yogo (2005) provide a formal test for weak instruments that avoids the approximation that the number of instruments is large. In the Stock-Yogo test, the null hypothesis is that the instruments are weak and the alternative hypothesis is that the instruments are strong, where strong instruments are defined to be instruments for which the bias of the TSLS estimator is at most 10% of the bias of the OLS estimator. The test compares the first-stage F-statistic to a critical value that depends on the significance level and the number of instruments; when the number of instruments is small, the rule of thumb of comparing F to 10 is a good approximation to the Stock-Yogo test.
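As a concrete illustration of the rule of thumb, the sketch below computes a first-stage F-statistic on simulated data (the design and the first-stage coefficient 0.6 are invented for this illustration, not taken from the text). With a single instrument, the homoskedasticity-only first-stage F-statistic is just the squared t-statistic on Z in the regression of X on a constant and Z.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)   # hypothetical first stage: X = pi1*Z + v

# OLS of X on a constant and Z; with one instrument the homoskedasticity-only
# first-stage F-statistic equals the squared t-statistic on Z.
zc = z - z.mean()
pi1 = (zc @ x) / (zc @ zc)
resid = (x - x.mean()) - pi1 * zc
se_pi1 = np.sqrt(resid @ resid / (n - 2) / (zc @ zc))
F = (pi1 / se_pi1) ** 2

print("first-stage F = %.1f ->" % F, "strong" if F > 10 else "weak")
```

In this design the instrument is strong, so F lands far above the threshold of 10; shrinking the first-stage coefficient toward zero drives F toward the weak-instrument region.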

Hypothesis Tests and Confidence Sets for β₁

If the instruments are weak, the TSLS estimator is biased and has a non-normal distribution, so the usual TSLS confidence intervals and hypothesis tests are unreliable. There are, however, tests of the null hypothesis β₁ = β₁,₀ that are valid whether the instruments are strong, weak, or even irrelevant, although these tests require the instrument exogeneity condition to hold. The simplest and oldest of these tests is based on the Anderson-Rubin (1949) statistic.

The Anderson-Rubin test of the null hypothesis β₁ = β₁,₀ proceeds in two steps. In the first step, a new variable is computed: Ỹ_i = Y_i − β₁,₀X_i. In the second step, Ỹ_i is regressed against the included exogenous regressors (the W's) and the instruments (the Z's). The Anderson-Rubin statistic is the F-statistic testing the hypothesis that the coefficients on the Z's are all zero. Under the null hypothesis that β₁ = β₁,₀, if the instruments satisfy the exogeneity condition (condition 2 in Key Concept 12.3), then they will be uncorrelated with the error term in this regression and the null hypothesis will be rejected in 5% of all samples.

As discussed in Sections 3.3 and 7.4, a confidence set can be constructed as the set of values of the parameters that are not rejected by a hypothesis test. Accordingly, the set of values of β₁ that are not rejected by a 5% Anderson-Rubin test constitutes a 95% confidence set for β₁. When the Anderson-Rubin F-statistic is computed using the homoskedasticity-only formula, the Anderson-Rubin confidence set can be constructed by solving a quadratic equation (see Empirical Exercise 12.3).

The logic behind the Anderson-Rubin statistic never assumes instrument relevance, and the Anderson-Rubin confidence set will have a coverage probability of 95% in large samples, whether the instruments are strong, weak, or even irrelevant. Anderson-Rubin confidence sets have some peculiar properties: for example, they can be empty or disjoint. A drawback is that, when instruments are strong (so TSLS is valid) and the coefficient is overidentified, Anderson-Rubin intervals are inefficient in the sense that they are wider than confidence intervals based on TSLS.
Estimation of β₁

If the instruments are irrelevant, it is not possible to obtain an unbiased estimator of β₁, even in large samples. Nevertheless, when instruments are weak, some IV estimators tend to be more centered on the true value of β₁ than is TSLS. One such estimator is the limited information maximum likelihood (LIML) estimator. As its name implies, the LIML estimator is the maximum likelihood estimator of β₁ in the system of Equations (12.13) and (12.14) (for a discussion of maximum likelihood estimation, see Appendix 11.2). The LIML estimator also is the value of β₁,₀ that minimizes the homoskedasticity-only Anderson-Rubin test statistic. Thus, if the Anderson-Rubin confidence set is not empty, it will contain the LIML estimator.

If the instruments are weak, the LIML estimator is more nearly centered on the true value of β₁ than is TSLS. If the instruments are strong, the LIML and TSLS estimators coincide in large samples. A drawback of the LIML estimator is that it can produce extreme outliers. Confidence intervals constructed around the LIML estimator using the LIML standard error are more reliable than intervals constructed around the TSLS estimator using the TSLS standard error, but are less reliable than Anderson-Rubin intervals when the instruments are weak.

The problems of estimation, testing, and confidence intervals in IV regression with weak instruments constitute an area of ongoing research. To learn more about this topic, visit the Web site for this book.

CHAPTER 13

Experiments and Quasi-Experiments

In many fields, such as psychology and medicine, causal effects are commonly estimated using experiments. Before being approved for widespread use, for example, a new drug must be subjected to experimental trials in which some patients are randomly selected to receive the drug while others are given a harmless ineffective substitute (a "placebo"); the drug is approved only if this randomized controlled experiment provides convincing statistical evidence that the drug is safe and effective.


Although randomized controlled experiments in economics are uncommon, there are three reasons to study them in an econometrics course. First, at a conceptual level, the notion of an ideal randomized controlled experiment provides a benchmark against which to judge estimates of causal effects in practice. Second, when experiments are actually conducted, their results can be very influential, so it is important to understand the limitations and threats to validity of actual experiments, as well as their strengths. Third, external circumstances sometimes produce what appears to be randomization; that is, because of external events, the treatment of some individuals occurs "as if" it is random. For example, suppose that a law is passed in one state but not its neighboring state. If the state of residence of the individual is thought of "as if" it is randomly assigned, then when the law passes it is "as if" some people are randomly subjected to the law (the treatment group) while others are not (the control group). Thus passage of the law produces a "quasi-experiment," also referred to as a "natural experiment," and many of the lessons learned by studying actual experiments can be applied (with some modifications) to quasi-experiments.

This chapter examines experiments and quasi-experiments in economics. The statistical tools used in this chapter are multiple regression analysis,

regression analysis of panel data, and instrumental variables (IV) regression. What distinguishes the discussion in this chapter is not the tools used, but rather the type of data analyzed and the special opportunities and challenges posed when analyzing experiments and quasi-experiments.

The methods developed in this chapter are often used for program evaluation. Program evaluation is the field of study that concerns estimating the effect of a program, a policy, or some other intervention or "treatment." What is the effect on earnings of going through a job training program? What is the effect on employment of low-skilled workers of an increase in the minimum wage? What is the effect on college attendance of making low-cost student aid loans available to middle-class students? This chapter discusses how such programs or policies can be evaluated using experiments or quasi-experiments.

We begin in Section 13.1 by elaborating on the discussion in Chapter 1 of an ideal randomized controlled experiment and causal effects. In reality, actual experiments with human subjects encounter practical problems that constitute threats to their internal and external validity, and these threats are discussed in Section 13.2. As discussed in Section 13.3, some of these threats can be addressed or evaluated using regression methods, including the "differences-in-differences" estimator and instrumental variables regression. Section 13.4 uses these methods to analyze a randomized controlled experiment in which elementary school students were randomly assigned to different-sized classes in the state of Tennessee in the late 1980s.

Section 13.5 turns to quasi-experiments and the estimation of causal effects using quasi-experiments. Threats to the validity of quasi-experiments are discussed in Section 13.6. One issue that arises in both experiments and quasi-experiments is that treatment effects can differ from one member of the population to the next, and the matter of interpreting the resulting estimates of causal effects when the population is heterogeneous is taken up in Section 13.7.


13.1 Idealized Experiments and Causal Effects

Recall from Section 1.2 that a randomized controlled experiment randomly selects subjects (individuals or, more generally, entities) from a population of interest, then randomly assigns them either to the treatment group, which receives the treatment, or to the control group, which does not. The causal effect is the effect on the outcome of interest of the treatment as measured in an ideal randomized controlled experiment.

Ideal Randomized Controlled Experiments

Initially, one might think that an ideal experiment would take two otherwise identical individuals, treat one of them, and compare the difference in their outcomes while holding constant all other influences. This is not, however, a practical experimental design, for it is impossible to find two identical individuals: Even identical twins have different life experiences, so they are not identical in every way.

The central idea of an ideal randomized experiment is that the causal effect can be estimated by randomly selecting individuals from the population and then randomly giving some of the individuals the treatment. If the treatment is assigned at random (for example, by flipping a coin or by using a computerized random number generator), then the treatment level is distributed independently of any of the other determinants of the outcome, thereby eliminating the possibility of omitted variable bias (Key Concept 6.1). Suppose, for example, that individuals are randomly assigned to attend a job training program; because assignment is random, an individual's prior work experience and other characteristics are distributed independently of receipt of the treatment. Consider the regression model

Y_i = β₀ + β₁X_i + u_i,   (13.1)

where X_i is the treatment level and, as usual, u_i contains all the additional determinants of the outcome Y_i. If the treatment is the same for all members of the


treatment group. then X, is binary. where X ;= 1 indicates that the jlh individual
received the trea tment and l , 0 mdtcates that he or she did not receive the treatmen!. If the treatment level vanes among those in the Lreatment group, then X; is
the level of treatment received. For exam!;. X , might be the dose of a drug or the
number of weeks in a job Lrainin ro ram. where X, - 0 li lbe treatment is not
receiv
ro .
; is binary. the n t e tnear regress10n unctton m

The Differences Estimator

If X_i is binary, the causal effect can be estimated by the difference in the sample average outcomes of the treatment and control groups, that is, by the OLS estimator β̂1 from the regression of Y_i on X_i. If the treatment is randomly assigned, then E(u_i | X_i) = 0 in Equation (13.1) and β̂1 is unbiased. The OLS estimator β̂1 from the regression of Y_i on X_i is called the differences estimator because, when the treatment is binary, it is the difference between the sample average outcome of the treatment group and the sample average outcome of the control group.

By randomly assigning treatment, an ideal randomized controlled experiment eliminates correlation between the treatment X_i and the error term u_i, so the differences estimator is unbiased and consistent. In practice, however, real-world experiments deviate from an ideal experiment, and problems arise that can introduce correlation between X_i and u_i.
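To make the algebra concrete, here is a minimal simulated sketch (not from the text; the variable names and numbers are invented for illustration) showing that, for a binary randomly assigned treatment, the OLS slope from regressing Y_i on X_i equals the difference in sample average outcomes:

```python
# Hypothetical illustration: the differences estimator for a binary treatment
# equals the difference in sample mean outcomes between the two groups.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.integers(0, 2, size=n)           # randomly assigned binary treatment
beta0, beta1 = 2.0, 5.0                  # assumed intercept and causal effect
u = rng.normal(0, 1, size=n)             # all other determinants of Y
y = beta0 + beta1 * x + u

# Differences estimator: difference in sample average outcomes
diff_means = y[x == 1].mean() - y[x == 0].mean()

# OLS slope from regressing Y on X (with an intercept)
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(diff_means, beta_hat[1])           # numerically identical
```

Because the regressor is a single binary variable, the equality of the two estimates is an algebraic identity, not an artifact of this particular simulated sample.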

CHAPTER 13  Experiments and Quasi-Experiments

13.2 Potential Problems with Experiments in Practice

Recall from Key Concept 9.1 that a statistical study is internally valid if its statistical inferences about causal effects are valid for the population studied, and that it is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings. Experiments in practice face threats to both types of validity.

Threats to Internal Validity

Threats to the internal validity of randomized controlled experiments include failure to randomize, failure to follow the treatment protocol, attrition, experimental effects, and small sample sizes.

Failure to randomize.  Random assignment to the treatment and control groups is the fundamental feature of a randomized controlled experiment that makes it possible to estimate the causal effect. If the treatment is not assigned randomly, but instead is based in part on the characteristics or preferences of the subject, then experimental outcomes will reflect both the effect of the treatment and the effect of the nonrandom assignment. For example, suppose that participants in a job training program experiment are assigned to the treatment group depending on whether their last name falls in the first or second half of the alphabet. Because of ethnic differences in last names, ethnicity could differ systematically between the treatment and control groups.

Failure to follow treatment protocol.  In an actual experiment, people do not always do what they are told. In a job training program experiment, for example, some of the subjects assigned to the treatment group might not show up for the training sessions.

The failure of individuals to follow completely the randomized treatment protocol is called partial compliance with the treatment protocol. In some cases, the experimenter knows whether the treatment was actually received (for example, if the trainee attended class), and the treatment actually received is recorded as X_i. Because there is an element of choice in whether the subject receives the treatment, X_i (the treatment actually received) will be correlated with u_i (which includes motivation and innate ability) even if there is random assignment. In other words, with partial compliance the treatment and control groups no longer differ solely because of random assignment, which introduces selection bias. Thus, failure to follow the treatment protocol leads to bias in the OLS estimator.

In other cases, the experimenter might not know whether the treatment was actually received. For example, if a subject in a medical experiment is provided with the drug but, unbeknownst to the researchers, simply does not take it, then the recorded treatment ("received drug") is incorrect. Incorrect measurement of the treatment actually received also leads to bias in the differences estimator.

Attrition.  Attrition refers to subjects dropping out of the study after being randomly assigned to the treatment or control group. Sometimes attrition occurs for reasons unrelated to the treatment program; for example, a participant in a job training experiment might need to leave town to care for a sick relative. But if the reason for attrition is related to the treatment itself, then the attrition results in bias in the OLS estimator of the causal effect. For example, suppose the most able trainees drop out of the job training program experiment because they get out-of-town jobs acquired using the job training skills, so that at the end of the experiment only the less able members of the treatment group remain and the differences estimator is biased. Because attrition results in a nonrandomly selected sample, attrition that is related to the treatment leads to selection bias (Key Concept 9.4).


The Hawthorne Effect

During the 1920s and 1930s, the General Electric Company conducted a series of studies of worker productivity at its Hawthorne plant. In one set of experiments, the researchers varied light bulb wattage to see how lighting affected the productivity of women assembling electrical parts. In other experiments, they increased or decreased rest periods, changed the workroom layout, and shortened workdays. Influential early reports on these studies concluded that productivity continued to rise whether the lights were dimmer or brighter, whether workdays were longer or shorter, whether conditions improved or worsened. Researchers concluded that the productivity improvements were not the consequence of changes in the workplace, but instead came about because the workers' special role in the experiment made them feel noticed and valued, so they worked harder and harder. Over the years, the idea that being in an experiment influences subject behavior has come to be known as the Hawthorne effect.

But there is a glitch to this story: Careful examination of the actual Hawthorne data reveals no Hawthorne effect (Gillespie, 1991; Jones, 1992)! Still, in some experiments, especially ones in which the subjects have a stake in the outcome, merely being in an experiment could affect behavior. The Hawthorne effect and experimental effects more generally can pose threats to internal validity, even though the Hawthorne effect is not evident in the original Hawthorne data.

Experimental effects.  In experiments with human subjects, the mere fact that the subjects are in an experiment can change their behavior, a phenomenon sometimes called the Hawthorne effect (see the box on this page). For example, the excitement created by, or the attention resulting from, being in an experimental program might bring forth extra effort that could affect the outcomes.

In some experiments, a "double-blind" protocol can mitigate the effect of being in an experiment: Although subjects and experimenters both know that they are in an experiment, neither knows whether a subject is in the treatment group or the control group. In a medical drug experiment, for example, sometimes the drug and the placebo can be made to look the same, so that neither the medical professional dispensing the drug nor the patient knows whether the administered drug is real or the placebo. If the experiment is double blind, then both the treatment and control groups should experience the same experimental effects, so different outcomes between the two groups can be attributed to the drug.

Double-blind experiments are clearly infeasible in real-world experiments in economics: Both the experimental subject and the instructor know whether the subject is attending the job training program. In a poorly designed experiment, this experimental effect could be substantial. For example, teachers in an experimental program might work harder than usual if they run the risk of losing their jobs should the program perform poorly in the experiment. Deciding whether experimental results are biased because of experimental effects requires making judgments based on what the experiment is evaluating and on the details of how the experiment was conducted.

Small sample sizes.  Because experiments with human subjects can be expensive, experimental samples are often small. A small sample does not by itself introduce bias, but it does mean that the causal effect is estimated imprecisely.

Threats to External Validity

Threats to external validity compromise the ability to generalize the results of the study to other populations and settings. Two such threats arise when the experimental sample is not representative of the population of interest and when the treatment being studied is not representative of the treatment that would be implemented more broadly.

Nonrepresentative sample.  The population studied and the population of interest must be sufficiently similar to justify generalizing the experimental results. If a job training program is evaluated in an experiment with former prison inmates, then it might be possible to generalize the study results to other former prison inmates. Because a criminal record weighs heavily on the minds of potential employers, however, the results might not generalize to workers who have never been in prison. Another example arises when the experimental participants are volunteers: Even if the volunteers are randomly assigned to treatment and control groups, these volunteers might be more motivated than the overall population and, for them, the treatment could have a greater effect. More generally, selecting the sample nonrandomly from the greater population of interest can compromise the ability to generalize the results from the population studied (such as volunteers) to the population of interest.

Nonrepresentative program or policy.  The policy or program of interest also must be sufficiently similar to the program studied to justify generalizing the experimental results. A full-scale program might not provide the same quality control as the experimental version, or it might be funded at a lower level; either possibility could result in the full-scale program being less effective than the smaller experimental program. Another difference between an experimental program and an actual program is its duration: The experimental program lasts only for the length of the experiment, while the actual program under consideration might be available for longer periods of time.

General equilibrium effects.  An issue related to scale and duration concerns what economists call "general equilibrium" effects. Turning a small, temporary experimental program into a widespread, permanent program might change the economic environment sufficiently that the results from the experiment cannot be generalized. A small, experimental job training program, for example, might supplement training by employers, but if the program were made widely available, it could displace employer-provided training, thereby reducing the net benefits of the program. Similarly, a widespread educational reform, such as school vouchers or sharply reducing class sizes, could increase the demand for teachers and change the type of person who is attracted to teaching, so the eventual net effect of the widespread reform would reflect these induced changes in school personnel. Phrased in econometric terms, an internally valid small experiment might correctly measure a causal effect, holding constant the market or policy environment, but general equilibrium effects mean that these other factors are not, in fact, held constant when the program is implemented broadly.

Treatment vs. eligibility effects.  Another potential threat to external validity arises because, in economics and social programs more generally, participation in an actual (nonexperimental) program is usually voluntary. Thus an experiment that measures the effect of the program on randomly selected members of the population will not, in general, provide an unbiased estimator of the program effect when the recipients of the actual implemented program are permitted to decide whether to participate. A job training program might be quite effective for those who choose to take it, yet be relatively ineffective for a randomly selected member of the population. One way to address this issue is to design the experiment so that it mimics as closely as possible the real-world program that would be implemented. For example, if the real-world job training program is made available to individuals meeting certain income cutoffs, then the experimental protocol could adopt a similar rule: The randomly selected treatment group would be given a "treatment" of eligibility for the program, whereas the control group would not be made eligible. In this case, the differences estimator would estimate the effect of eligibility for the program, which is different than the job training treatment effect for a randomly selected member of the eligible population.

13.3 Regression Estimators of Causal Effects Using Experimental Data

In an ideal randomized controlled experiment with a binary treatment, the causal effect can be estimated by the differences estimator, that is, by the OLS estimator of β1 in Equation (13.1). If treatment is randomly received, then the differences estimator is unbiased; however, it is not necessarily efficient. Moreover, if some of the problems with actual experiments discussed in Section 13.2 are present, then X_i and u_i are correlated, so β̂1 is biased.

This section presents some additional regression-based methods for analyzing experimental data. The aim is to obtain a more efficient estimator than the differences estimator when the treatment is randomly received, or at least a consistent estimator of the causal effect when certain threats to internal validity are present. This section concludes with a discussion of how to test for randomization.

The Differences Estimator with Additional Regressors

Often data are available on other characteristics of the subjects that are relevant to determining the experimental outcome. Because earnings depend on prior education, for example, earnings in an experimental job training program evaluation will depend on prior education as well as on the job training program itself. In a medical drug test, the health outcome could depend on patient characteristics, such as age, weight, gender, and preexisting medical conditions, in addition to the drug treatment itself. Let W_1i, ..., W_ri denote variables measuring r individual characteristics. The differences estimator can be augmented so that these characteristics enter the regression explicitly; assuming that these characteristics enter linearly, this leads to the multiple regression model

Y_i = β0 + β1X_i + β2W_1i + ... + β(1+r)W_ri + u_i.   (13.2)

The OLS estimator of β1 in Equation (13.2) is the differences estimator with additional regressors.

In Equation (13.2), X_i is the treatment variable and the W variables are control variables. Although we have frequently differentiated between treatment and control variables, we have not yet made a precise distinction between the two.

What is a control variable?  Throughout this book, we have used the term control variable for a factor that, if omitted, would lead to omitted variable bias for the coefficient of interest. In the empirical application to class size and the student-teacher ratio in Section 7.6, we included the percentage of students receiving a subsidized lunch (LchPct) as a control variable, even though LchPct itself does not depend on the variable of interest. In the class size example, LchPct is correlated with factors, such as learning opportunities outside school, that enter the error term; indeed, it is because of this correlation that LchPct is such a useful control variable. The correlation between LchPct and the error term means that the coefficient on LchPct does not have a causal interpretation. What the conditional mean independence assumption says is that, given the control variables in the regression (including LchPct), the mean of the error term does not depend on the student-teacher ratio. Under conditional mean independence, the coefficient on the student-teacher ratio can have a causal interpretation even though the coefficient on LchPct does not.

In the context of experimental data, there are two relevant cases in which the conditional mean-zero assumption fails but the conditional mean independence assumption holds.

Consider, for example, an experiment in which the probability of being assigned to the treatment group depends on whether the subject graduated from high school. If each graduate has the same chance of being assigned to the treatment group, the mean of u_i is the same for graduates in the treatment and control groups. Similarly, the mean of u_i is the same for nongraduates in the treatment and control groups. The mean of u_i, however, differs by graduation status: Because earnings depend on education, u_i is correlated with graduation status W_i. In this case, X_i is randomly assigned conditional on graduation status W_i, and, as is discussed further in Appendix 13.1, the conditional mean independence assumption holds and the differences estimator with additional regressors is consistent.

It is important that the W regressors in Equation (13.2) not be experimental outcomes. For example, suppose that Y_i is earnings after the job training program, W_i indicates getting a job after the program, and X_i indicates treatment. Including the employment status in the regression changes the question being asked to be the effect of the program, holding constant future employment. Moreover, future employment could be correlated with the treatment (the program helps in getting a job) and with the error term (more able trainees get a job). In this case, conditional mean independence would not hold. We therefore restrict attention to W variables in Equation (13.2) that measure pretreatment characteristics, which are not influenced by the experimental treatment.

Consistency of the differences estimator with additional regressors.  If the least squares assumptions for multiple regression hold (Key Concept 6.4), then the OLS estimators of all the coefficients in Equation (13.2) are consistent and normally distributed in large samples.

There are three reasons for including the additional regressors W_1i, ..., W_ri.

1. Efficiency. The differences estimator with additional regressors can be more efficient (have a smaller variance) than the OLS estimator in the single-regressor model [Equation (13.1)]. The reason for this is that including the additional regressors, which explain some of the variation of Y in Equation (13.2), reduces the variance of the error term (Exercise 18.7).

2. Check for randomization. If the treatment is not randomly assigned, and in particular is assigned in a way that is related to the W's, then the differences estimator is biased. Because randomly assigned treatment is uncorrelated with pretreatment characteristics, comparing the estimate of β1 with and without the W regressors provides a check of whether the treatment was randomly assigned.

3. Adjust for "conditional" randomization. As previously discussed, the probability of being assigned to the treatment group can differ from one group of subjects to another and can depend on pretreatment characteristics. If so, including the W variables controls for the probability of being assigned to the treatment group.

In practice, the second and third of these reasons can be related. If the check for randomization in reason 2 indicates that the treatment was not randomly assigned, it might be possible to adjust for this nonrandom assignment by using the differences estimator with regression controls. Whether this is in fact possible, however, depends on the details of the nonrandom assignment. If the assignment probability depends only on the observable variables W, then Equation (13.2) adjusts for this nonrandom assignment, but if the assignment probability depends on unobserved variables as well, then the adjustment made by including the W regressors is incomplete.

The Differences-in-Differences Estimator

Experimental data are often panel data, that is, observations on the same subjects before and after the experiment. With panel data, the causal effect can be estimated using the "differences-in-differences" estimator, which is the average change in Y in the treatment group over the course of the experiment, minus the average change in Y in the control group over the same period. This differences-in-differences estimator can be computed using a regression, which can be augmented with additional regressors measuring subject characteristics.

The differences-in-differences estimator.  Let Ȳ^treatment,before be the sample average of Y for those in the treatment group before the experiment and Ȳ^treatment,after be the corresponding posttreatment average, and let Ȳ^control,before and Ȳ^control,after be the corresponding pretreatment and posttreatment averages for the control group. The average change in Y over the course of the experiment for those in the treatment group is Ȳ^treatment,after − Ȳ^treatment,before, and the average change in Y over this period for those in the control group is Ȳ^control,after − Ȳ^control,before. The differences-in-differences estimator is the average change in Y for those in the treatment group, minus the average change in Y for those in the control group:

β̂1^diffs-in-diffs = (Ȳ^treatment,after − Ȳ^treatment,before) − (Ȳ^control,after − Ȳ^control,before)
                  = ΔȲ^treatment − ΔȲ^control,   (13.3)

where ΔȲ^treatment is the average change in Y in the treatment group and ΔȲ^control is the average change in Y in the control group. If the treatment is randomly assigned, then β̂1^diffs-in-diffs is an unbiased and consistent estimator of the causal effect.

The differences-in-differences estimator can be written in regression notation. Let ΔY_i be the change in the value of Y for the ith individual over the course of the experiment; then the differences-in-differences estimator is the OLS estimator of β1 in the regression

ΔY_i = β0 + β1X_i + u_i.   (13.4)

Reasons for using the differences-in-differences estimator.  The differences-in-differences estimator has two potential advantages over the single-difference estimator of Equation (13.1).

1. Efficiency. If the treatment is randomly received, then the differences-in-differences estimator can be more efficient than the differences estimator. This


FIGURE 13.1  The Differences-in-Differences Estimator

The posttreatment difference between the treatment and control groups is 80 − 30 = 50, but this overstates the treatment effect because before the treatment Y was higher for the treatment group than for the control group by 40 − 20 = 20. The differences-in-differences estimator is the difference between the final and initial gaps, so that β̂1^diffs-in-diffs = (80 − 30) − (40 − 20) = 50 − 20 = 30. Equivalently, the differences-in-differences estimator is the average change for the treatment group minus the average change for the control group, that is, ΔȲ^treatment − ΔȲ^control = (80 − 40) − (30 − 20) = 30. [Figure: outcome plotted at t = 0 and t = 1 for the treatment and control groups.]


will be the case if some of the unobserved determinants of Y_i are persistent over time for a given individual, as are, for example, gender and prior education. Whether the differences-in-differences estimator is more efficient depends on whether these persistent individual-specific characteristics explain much of the variance in Y_i (Exercise 13.6).

2. Eliminate pretreatment differences in Y. If treatment is correlated with the initial level of Y, then the differences estimator is biased, but the differences-in-differences estimator is not. This is illustrated in Figure 13.1. In that figure, the sample average of Y for the treatment group is 40 before the experiment, whereas the pretreatment sample average of Y for the control group is 20. Over the course of the experiment, the sample average of Y increases in the control group to 30, whereas it increases to 80 for the treatment group. Thus the difference of the posttreatment sample averages is 80 − 30 = 50. However, some of this difference arises because the treatment and control groups had different pretreatment means: The treatment group started out ahead of the control group. The differences-in-differences estimator measures the gains of the treatment group relative to the control group, which in this example is (80 − 40) − (30 − 20) = 30. More generally, by focusing on the change in Y over the course of the experiment, the differences-in-differences estimator removes the influence of initial values of Y that vary systematically between the treatment and control groups.
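A small simulated sketch (invented numbers, mirroring the logic of Figure 13.1) can make the contrast concrete: When the treatment group starts out ahead, the posttreatment differences estimator overstates the effect, while the differences-in-differences estimator of Equation (13.3) recovers it:

```python
# Hypothetical simulation: the treatment group starts 20 points ahead, so the
# differences estimator is biased upward, but differencing out the
# pretreatment level recovers the causal effect.
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.integers(0, 2, size=n)                     # treatment indicator
# Pretreatment level is systematically higher for the treated group:
y_before = 20 + 20 * x + rng.normal(0, 2, size=n)
true_effect = 30                                   # assumed causal effect
# Both groups drift upward by 10; the treated group also gains the effect:
y_after = y_before + 10 + true_effect * x + rng.normal(0, 2, size=n)

# Differences estimator (posttreatment levels only): contaminated by
# the pretreatment gap of 20.
diff = y_after[x == 1].mean() - y_after[x == 0].mean()

# Differences-in-differences estimator, Equation (13.3):
dy = y_after - y_before
diff_in_diff = dy[x == 1].mean() - dy[x == 0].mean()

print(round(diff), round(diff_in_diff))            # roughly 50 and 30
```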

The differences-in-differences estimator with additional regressors.  The differences-in-differences estimator can be extended to include additional regressors W_1i, ..., W_ri, which measure individual characteristics prior to the experiment. For example, in a job training program evaluation in which Y is earnings, a W variable could be the prior education of the participant. These additional regressors can be incorporated using the multiple regression model

ΔY_i = β0 + β1X_i + β2W_1i + ... + β(1+r)W_ri + u_i.   (13.5)

The OLS estimator of β1 in Equation (13.5) is the differences-in-differences estimator with additional regressors. If X_i is randomly assigned, then the OLS estimator of β1 in Equation (13.5) is unbiased. The reasons for including the additional W regressors in Equation (13.5) are the same three reasons as for including them in Equation (13.2), which uses only the posttreatment outcomes.
Extension of differences-in-differences to multiple time periods.  In some experiments the individual is observed for multiple periods, not just two. In a job training program experiment, the individual's income and employment status might be observed monthly for a year or more. In this case, the population regression models in Equations (13.4) and (13.5), which are based on the change in the outcome between a single pretreatment observation and a single posttreatment observation, are not applicable. Such data can, however, be analyzed using a panel data regression model of the sort discussed in Chapter 10; the details are provided in an appendix to this chapter.

Estimation of Causal Effects for Different Groups

The causal effect can differ from one subject to the next depending on individual characteristics. For example, the effect on cholesterol levels of a cholesterol-lowering drug could be greater for a patient with a high cholesterol level than for one whose cholesterol level is already low. Similarly, a job training program might be more effective for women than for men, and it might be more effective for motivated than for unmotivated subjects. More generally, the causal effect can depend on the value of one or more variables, which can either be observed (like gender) or unobserved (like motivation).

If the variable on which the causal effect depends is observed, the causal effect for different groups can be estimated by including an interaction between that variable and the treatment as a regressor, an application of the interaction methods discussed in Section 8.3. The topic of interpreting estimates of causal effects when the causal effect depends on the value of an unobservable variable is taken up later in this chapter.

Estimation When There Is Partial Compliance

If there is partial compliance with the experimental protocol, then the treatment level X_i can be correlated with the unobserved individual characteristics in u_i, and the differences estimator is biased. As discussed in Chapter 12, instrumental variables regression provides a general solution to the problem of correlation between a regressor and the error term, assuming that there is an instrumental variable available. In an experiment with partial compliance, the assigned treatment level can serve as an instrumental variable for the actual treatment level.

Recall that a variable must satisfy the two conditions of instrument relevance and instrument exogeneity (Key Concept 12.3) to be a valid instrumental variable. As long as the protocol is partially followed, the actual treatment level X_i is partially determined by the assigned treatment level Z_i, so the instrumental variable Z_i is relevant. If the assigned treatment level is determined randomly, that is, if the experiment has random assignment, and if the assignment has no effect on the outcome other than through its influence on whether treatment is received, then Z_i is exogenous; that is, random assignment of Z_i implies that corr(u_i, Z_i) = 0, where u_i is the error term in the differences specification in Equation (13.1) or in the differences-in-differences specification in Equation (13.4), depending upon which estimator is being used. Thus, in an experiment with partial compliance and randomly assigned treatment, the original random assignment is a valid instrumental variable.
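As a rough sketch of this idea (a simulation with invented parameters, not an example from the text), the Wald form of the IV estimator, the reduced-form effect of Z on Y divided by the first-stage effect of Z on X, corrects the bias that partial compliance induces in the differences estimator:

```python
# Hypothetical simulation: with partial compliance, OLS of Y on the treatment
# actually received is biased, but using the random assignment Z as an
# instrument for X recovers the (assumed constant) causal effect.
import numpy as np

rng = np.random.default_rng(3)
n = 50000
z = rng.integers(0, 2, size=n)          # randomly assigned treatment
ability = rng.normal(0, 1, size=n)      # unobserved, part of the error term
# Noncompliance: only assigned subjects with higher ability show up.
x = ((z == 1) & (ability > -0.5)).astype(float)
beta1 = 5.0                             # assumed true causal effect
y = 2.0 + beta1 * x + 2.0 * ability + rng.normal(0, 1, size=n)

# Differences estimator using treatment received: biased upward because
# x is correlated with ability, which is in the error term.
ols = y[x == 1].mean() - y[x == 0].mean()

# IV (Wald) estimator: reduced-form effect of Z on Y divided by the
# first-stage effect of Z on X.
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = x[z == 1].mean() - x[z == 0].mean()
iv = reduced_form / first_stage

print(round(ols, 2), round(iv, 2))      # OLS well above 5; IV near 5
```

With a binary instrument and a binary treatment, this ratio of differences in means is numerically the same as two stage least squares with Z instrumenting X.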

Testing for Randomization

Testing for random receipt of treatment.  It is possible to test for randomization by checking whether the randomized variable actually depends on observable individual characteristics. If the treatment is randomly received, then X_i will be uncorrelated with the observable individual characteristics. Thus, the hypothesis that the treatment is randomly received can be tested by testing the hypothesis that the coefficients on W_1i, ..., W_ri are zero in a regression of X_i on W_1i, ..., W_ri. In the job training program example, regressing receipt of job training (X_i) on gender, race, and prior education (the W's), and computing the F-statistic testing whether the coefficients on the W's are zero, provides a test of the null hypothesis that the treatment was randomly received.¹
Testing for random assignment.  If the treatment is randomly assigned, then the assignment Z_i will be uncorrelated with the observable individual characteristics. Thus, the hypothesis that treatment is randomly assigned can be tested by regressing Z_i on W_1i, ..., W_ri and testing the null hypothesis that the regression coefficients are zero.
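A minimal sketch of such a check (invented data; for simplicity it uses the homoskedasticity-only F-statistic, whereas the footnote notes that heteroskedasticity-robust inference is essential when the dependent variable is binary):

```python
# Hypothetical randomization check: regress the treatment indicator on
# pretreatment characteristics W and compute the homoskedasticity-only
# F-statistic for the null that all W coefficients are zero.
import numpy as np

def randomization_f_stat(x, W):
    """F-statistic for H0: all coefficients on the columns of W are zero
    in a regression of x on a constant and W."""
    n, q = W.shape
    Xu = np.column_stack([np.ones(n), W])            # unrestricted regressors
    resid_u = x - Xu @ np.linalg.lstsq(Xu, x, rcond=None)[0]
    ssr_u = resid_u @ resid_u
    ssr_r = np.sum((x - x.mean()) ** 2)              # restricted: constant only
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - q - 1))

rng = np.random.default_rng(4)
n = 2000
W = rng.normal(0, 1, size=(n, 2))                    # e.g., education, age

x_random = rng.integers(0, 2, size=n).astype(float)  # truly random receipt
x_biased = (W[:, 0] + rng.normal(0, 1, size=n) > 0).astype(float)  # depends on W

print(randomization_f_stat(x_random, W))   # small: no evidence against H0
print(randomization_f_stat(x_biased, W))   # large: randomization fails
```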

¹In this example, X_i is binary, so, as discussed in Chapter 11, the regression of X_i on W_1i, ..., W_ri is a linear probability model and heteroskedasticity-robust standard errors are essential. Another way to test the hypothesis that E(X_i | W_1i, ..., W_ri) does not depend on W_1i, ..., W_ri when X_i is binary is to use a probit or logit model (discussed in Section 11.2).

13.4 Experimental Estimates of the Effect of Class Size Reductions

In this section we return to a question addressed in Part II: What is the effect on test scores of reducing class size in the early grades? In the late 1980s, Tennessee conducted a large, multimillion-dollar randomized controlled experiment to ascertain whether class size reduction was an effective way to improve elementary education. The results of this experiment have strongly influenced our understanding of the effect of class size reductions.

Experimental Design

The Tennessee class size reduction experiment, known as Project STAR (Student-Teacher Achievement Ratio), was a four-year experiment designed to evaluate the effect on learning of small class sizes. Funded by the Tennessee state legislature, the experiment cost approximately $12 million over four years. The study compared three different class arrangements for kindergarten through third grade: a regular class size, with 22-25 students per class, a single teacher, and no aide; a small class size, with 13-17 students per class and no aide; and a regular-sized class plus a teacher's aide.

Each school participating in the experiment had at least one class of each type, and students entering kindergarten in a participating school were randomly assigned to one of these three groups at the beginning of the 1985-1986 academic year. Teachers were also assigned randomly to one of the three types of classes. According to the original experimental protocol, students would stay in their initially assigned class arrangement for the four years of the experiment (kindergarten through third grade). However, because of parent complaints, students initially assigned to a regular class (with or without an aide) were randomly reassigned at the beginning of first grade to regular classes with an aide or to regular classes without an aide; students initially assigned to a small class remained in a small class. Students entering school in first grade (kindergarten was optional), in the second year of the experiment, were randomly assigned to one of the three groups. Each year, students in the experiment were given standardized tests (the Stanford Achievement Test) in reading and math.

The project paid for the additional teachers and aides necessary to achieve the target class sizes. During the first year of the study, approximately 6400 students participated in 108 small classes, 101 regular-sized classes, and 99 regular classes with aides. Over all four years of the study, a total of approximately 11,600 students at 80 schools participated in the study.

Deviations from the experimental design. The experimental protocol specified that the students should not switch between class groups, other than through the re-randomization at the beginning of first grade. However, approximately 10% of the students switched in subsequent years for reasons including incompatible children and behavioral problems. These switches represent a departure from the randomization scheme and, depending on the true nature of the switches, have the potential to introduce bias into the results. Switches made purely to avoid personality conflicts might be sufficiently unrelated to the experiment that they would not introduce bias. If, however, the switches arose because the parents most concerned with their children's education pressured the school into switching a child into a small class, then this failure to follow the experimental protocol could bias the results toward overstating the effectiveness of small classes. Another deviation from the experimental protocol was that the class sizes changed over time because students switched between classes and moved in and out of the school district.

Analysis of the STAR Data

Because there are two treatment groups (small class and regular class with an aide), the regression version of the differences estimator needs to be modified to handle the two treatment groups and the control group. This is done by introducing two binary variables, one indicating whether the student is in a small class and another indicating whether the student is in a regular-sized class with an aide. This leads to the population regression model

Y_i = β_0 + β_1 SmallClass_i + β_2 RegAide_i + u_i,    (13.6)

where SmallClass_i = 1 if the ith student is in a small class and = 0 otherwise, RegAide_i = 1 if the ith student is in a regular class with an aide and = 0 otherwise, and Y_i is a test score. The effect on the test score of a small class, relative to a regular class, is β_1, and the effect of a regular class with an aide, relative to a regular class, is β_2. The differences estimators can be computed by estimating β_1 and β_2 in Equation (13.6) by OLS.

Table 13.1 presents the differences estimates of the effect on test scores of being in a small class or in a regular-sized class with an aide. The dependent variable Y_i in the regressions in Table 13.1 is the student's total score on the combined math and reading portions of the Stanford Achievement Test. According to the estimates in Table 13.1, for students in kindergarten, the effect of being in a small class is an increase of 13.9 points on the test, relative to being in a regular class; the estimated effect of being in a regular class with an aide is 0.31 point on the test.
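The regression in Equation (13.6) is an ordinary OLS regression with two mutually exclusive treatment dummies. A minimal sketch in Python, using synthetic data rather than the actual STAR data (all numbers below are invented for illustration):

```python
import numpy as np

# Sketch of the differences regression in Equation (13.6),
#   Y_i = b0 + b1*SmallClass_i + b2*RegAide_i + u_i,
# estimated by OLS. The data are synthetic, NOT the STAR data.
rng = np.random.default_rng(0)
n = 3000
small = rng.integers(0, 2, n)                          # 1 if in a small class
aide = np.where(small == 1, 0, rng.integers(0, 2, n))  # 1 if regular class with aide
y = 918.0 + 14.0 * small + 0.3 * aide + rng.normal(0.0, 70.0, n)

X = np.column_stack([np.ones(n), small, aide])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = beta
# b1 estimates the small-class effect relative to a regular class;
# b2 estimates the effect of a regular class with an aide.
print(round(b1, 1), round(b2, 1))
```

Because the two dummies are mutually exclusive, the omitted category (a regular class with no aide) is the control group, and each coefficient is a difference in group means.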

CHAPTER 13  Experiments and Quasi-Experiments

TABLE 13.1  Project STAR: Differences Estimates of Effect on Standardized Test Scores of Class Size Treatment Group

| Regressor               | Grade K         | Grade 1          | Grade 2          | Grade 3          |
|-------------------------|-----------------|------------------|------------------|------------------|
| Small class             | 13.90** (2.45)  | 29.78** (2.83)   | 19.39** (2.71)   | 15.59** (2.40)   |
| Regular size with aide  | 0.31 (2.27)     | 11.96** (2.65)   | 3.48 (2.54)      | -0.29 (2.27)     |
| Intercept               | 918.04** (1.63) | 1039.39** (1.78) | 1157.81** (1.82) | 1228.51** (1.68) |
| Number of observations  | 5786            | 6379             | 6049             | 5967             |

These regressions were estimated using the Project STAR Public Access Data Set described in Appendix 13.1. The dependent variable is the student's combined score on the math and reading portions of the Stanford Achievement Test. Standard errors are given in parentheses under the coefficients. The individual coefficient is statistically significant at the **1% significance level using a two-sided test.

For each grade, the null hypothesis that small classes provide no improvement is rejected at the 1% (two-sided) significance level. However, it is not possible to reject the null hypothesis that having an aide in a regular class provides no improvement, relative to not having an aide, except in first grade. The estimated magnitudes of the improvements in small classes are broadly similar in grades K, 2, and 3, although the estimate is larger for first grade.

The differences estimates in Table 13.1 suggest that reducing class size has an effect on test performance, but adding an aide to a regular-sized class has a much smaller effect, possibly zero. As discussed in Section 13.3, augmenting the regressions in Table 13.1 with additional regressors [the W regressors in Equation (13.2)] can provide more efficient estimates of the causal effects. Moreover, if the treatment received is not random because of failures to follow the treatment protocol, then the estimates of the experimental effects based on regressions with additional regressors could differ from the difference estimates reported in Table 13.1. For these two reasons, estimates of the experimental effects in which additional regressors are included in Equation (13.6) are reported for kindergarten in Table 13.2; the first column of Table 13.2 repeats the results of the first column (for kindergarten) from Table 13.1, and the remaining three columns include additional regressors that measure teacher, school, and student characteristics.

The main conclusion from Table 13.2 is that the multiple regression estimates of the causal effects of the two treatments (small class and regular-sized class with aide) in the final three columns of Table 13.2 are similar to the differences estimates reported in the first column. The fact that adding these observable regressors

TABLE 13.2  Project STAR: Differences Estimates with Additional Regressors for Kindergarten

| Regressor                        | (1)             | (2)             | (3)            | (4)             |
|----------------------------------|-----------------|-----------------|----------------|-----------------|
| Small class                      | 13.90** (2.45)  | 14.00** (2.45)  | 15.93** (2.24) | 15.89** (2.16)  |
| Regular size with aide           | 0.31 (2.27)     | -0.60 (2.25)    | 1.22 (2.04)    | 1.79 (1.96)     |
| Teacher's years of experience    |                 | 1.47** (0.17)   | 0.74** (0.17)  | 0.66** (0.17)   |
| Boy                              |                 |                 |                | -12.09** (1.67) |
| Free lunch eligible              |                 |                 |                | -34.70** (1.99) |
| Black                            |                 |                 |                | -25.43** (3.56) |
| Race other than black or white   |                 |                 |                | -8.50 (12.52)   |
| Intercept                        | 918.04** (1.63) | 904.72** (2.22) |                |                 |
| School indicator variables?      | no              | no              | yes            | yes             |
| R̄²                               | 0.01            | 0.02            | 0.22           | 0.28            |
| Number of observations           | 5786            | 5766            | 5766           | 5748            |

These regressions were estimated using the Project STAR Public Access Data Set described in Appendix 13.1. The dependent variable is the combined test score on the math and reading portions of the Stanford Achievement Test. The number of observations differs across regressions because of missing data. Standard errors are given in parentheses under coefficients. The individual coefficient is statistically significant at the *5% level or **1% significance level using a two-sided test.

does not substantially change the estimated causal effects of the two treatments makes it more plausible that the random assignment to the smaller classes also does not depend on unobserved variables. As expected, these additional regressors increase the R̄² of the regression, and the standard error of the estimated class size effect decreases from 2.45 in column (1) to 2.16 in column (4).

Because teachers were randomly assigned to class types within a school, the experiment also provides an opportunity to estimate the effect on test scores of teacher experience. Teachers were not, however, randomly assigned across participating schools, and some schools had more experienced teachers than others. Thus teacher experience could be correlated with the error term, as it would be if the more experienced teachers work at schools with more resources and with higher average test scores. Accordingly, to estimate the effect of teacher experience on test scores, we need to control for the other characteristics of the school, which is accomplished using a complete set of indicator variables for each school ("school


effects""), Lhot is. indicator variables denoting the school the st udent aueudcc.l
Because teachers are randomly assigned within a school. the condnton,tl m~.;an ,
u; gt\ Cn the school doc not depend o n the Lreatment; tn the terminology of Sl ~
tioo 13.3, because of random assignment within a school. the condittonal 011. .. ,.
independence assumption holds.. where the add.illonal W regressors are the <.choll
effects. Wben school effects are included. the estima~~ o r the dh.:ct of cxpcri cn~
drops in hatr. rrom 1.47 in column (2) to 0.74 in column (3) FH;n -.o.thc e~timatc
in column (3) remain!> statistically significant and mode rate!} I re.c, tt.U \'t' ,-... o,
experience corre~ponds to a predicted increase in test scores or 7.4 pomts..
It is tempting to interpret some of the other coefricien ts in Table 11 2 r1r
example, kinde rgarten boys perform worse than girls on these standard11cd te-.t-But these individual student characteristics are not ramlomly assigned (the pender of the student taking the test is not randomly assigned!). so these addiuun 1
regressors could bt: correlated with omitted variables. For example, if race or dlgi bil ity for a fret lunch is correlated with reduced learning opportunities outstJc
school (which i) om itted from the Table 13.2 regressio ns). then lheir estimated
coefficients would re flect these omitted influences. As discussed in Section 11 1 if
the treatment is randomly assigned then the estimator of its coefficient is com;,.
tent, whether or not the other regressors are corre lated wiLh the error term hut I
the additional regressors are corre lared with the erro r term then their coefftc~nt
es1imators have omiued variable bias.
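The role of the school indicator variables can be illustrated with a small simulation. The sketch below does not use the STAR data; it constructs synthetic data in which teacher experience is correlated with an unobserved school effect, so that OLS without school indicators is biased upward, while OLS with a full set of school dummies compares teachers within the same school and recovers the true coefficient:

```python
import numpy as np

# Synthetic illustration of "school effects" (NOT the STAR data):
# teacher experience is correlated with unobserved school quality.
rng = np.random.default_rng(1)
n_schools, per_school = 40, 50
school = np.repeat(np.arange(n_schools), per_school)
quality = rng.normal(0, 20, n_schools)        # unobserved school effect
exper = rng.uniform(0, 20, n_schools * per_school) + 0.3 * quality[school]
y = 900 + 1.0 * exper + quality[school] + rng.normal(0, 30, exper.size)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# (a) No school indicators: experience picks up school quality (biased).
b_no_fe = ols(np.column_stack([np.ones_like(exper), exper]), y)[1]

# (b) Full set of school indicator variables ("school effects").
dummies = (school[:, None] == np.arange(n_schools)).astype(float)
b_fe = ols(np.column_stack([exper, dummies]), y)[0]
print(round(b_no_fe, 2), round(b_fe, 2))  # (a) well above 1, (b) close to 1
```

The dummy-variable regression in (b) is numerically identical to demeaning experience and test scores within each school, which is why it isolates the within-school variation in experience.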

Interpreting the estimated effects of class size. Are the estimated effects of class size reported in Tables 13.1 and 13.2 large or small in a practical sense? There are two ways to answer this: first, by translating the estimated changes in raw test scores into units of standard deviations of test scores, so that the estimates in Table 13.1 are comparable across grades; and second, by comparing the estimated class size effect to the other coefficients in Table 13.2.

Because the distribution of test scores is not the same for each grade, the estimated effects in Table 13.1 are not directly comparable across grades. We faced this problem in Section 9.4, when we wanted to compare the effect on test scores of a reduction in the student-teacher ratio estimated using data from California to the estimate based on data from Massachusetts. Because the two tests differed, the coefficients could not be compared directly. The solution in Section 9.4 was to translate the estimated effects into units of standard deviations of the test, so that a unit decrease in the student-teacher ratio corresponds to a change of an estimated fraction of a standard deviation of test scores. We adopt this approach here so that the estimated effects in Table 13.1 can be compared across grades. For example, the standard deviation of test scores for children in kindergarten is 73.7, so the effect of being in a small class in kindergarten, based on the estimate in

TABLE 13.3  Estimated Class Size Effects in Units of Standard Deviations of the Test Score Across Students

| Treatment Group                                 | Grade K       | Grade 1       | Grade 2       | Grade 3       |
|-------------------------------------------------|---------------|---------------|---------------|---------------|
| Small class                                     | 0.19** (0.03) | 0.33** (0.03) | 0.23** (0.03) | 0.21** (0.03) |
| Regular size with aide                          | 0.00 (0.03)   | 0.13** (0.03) | 0.04 (0.03)   | -0.00 (0.03)  |
| Sample standard deviation of test scores (s_Y)  | 73.7          | 91.3          | 84.1          | 73.7          |

The estimates and standard errors in the first two rows are the estimated effects in Table 13.1, divided by the sample standard deviation of the Stanford Achievement Test for that grade (the final row in this table), computed using data on the students in the experiment. Standard errors are given in parentheses under coefficients. The individual coefficient is statistically significant at the **1% significance level using a two-sided test.


Table 13.1, is 13.9/73.7 = 0.19, with a standard error of 2.45/73.7 = 0.03. The estimated effects of class size from Table 13.1, converted into units of the standard deviation of test scores across students, are summarized in Table 13.3. Expressed in standard deviation units, the estimated effect of being in a small class is similar for grades K, 2, and 3, and is approximately one-fifth of a standard deviation of test scores. Similarly, the effect of being in a regular-sized class with an aide is approximately zero for grades K, 2, and 3. The estimated treatment effects are larger for first grade; however, the estimated difference between the small class and the regular-sized class with an aide is 0.20 for first grade, the same as the other grades. Thus, one interpretation of the first-grade results is that the students in the control group (the regular-sized class without an aide) happened to do poorly on the test that year for some unusual reason, perhaps simply random sampling variation.
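The conversion used in Table 13.3 is simple arithmetic: divide the raw estimate and its standard error by the sample standard deviation of test scores for the relevant grade. For the kindergarten numbers just cited:

```python
# Converting a raw test-score effect into standard deviation units:
# divide the estimate and its standard error by the sample standard
# deviation of scores for that grade (kindergarten values from
# Tables 13.1 and 13.3).
effect_raw, se_raw = 13.9, 2.45   # small-class effect and SE, Table 13.1
sd_scores = 73.7                  # sample SD of scores across students

effect_sd = effect_raw / sd_scores
se_sd = se_raw / sd_scores
print(round(effect_sd, 2), round(se_sd, 2))   # 0.19 0.03
```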
Another way to gauge the magnitude of the estimated effect of being in a small class is to compare the estimated treatment effects to the other coefficients in Table 13.2. In kindergarten, the estimated effect of being in a small class is 13.9 points on the test (first row of Table 13.2). Holding constant race, teacher's years of experience, eligibility for free lunch, and the treatment group, boys score lower on the standardized test than girls by approximately 12 points, according to the estimates in column (4) of Table 13.2. Thus, the estimated effect of being in a small class is somewhat larger than the performance gap between girls and boys. As another comparison, the estimated coefficient on the teacher's years of experience in column (4) is 0.66, so having a teacher with 20 years of experience is estimated to improve test performance by 13 points. Thus, the estimated effect of being in a small class is approximately the same as the effect of having a 20-year veteran as



a teacher, relative to having a new teacher. These comparisons suggest that the estimated effect of being in a small class is substantial.

Additional results. Econometricians, statisticians, and specialists in elementary education have studied various aspects of this experiment, and we briefly summarize some of those findings here. One of these findings is that the effect of a small class is concentrated in the earliest grades. This can be seen in Table 13.3: Except for the anomalous first-grade results, the test score gap between regular and small classes reported in Table 13.3 is essentially constant across grades (0.19 standard deviation unit in kindergarten, 0.23 in second grade, and 0.21 in third grade). Because the children initially assigned to a small class stayed in that small class, this means that staying in a small class did not result in additional gains; rather, the gains made upon initial assignment were retained in the higher grades, but the gap between the treatment and control groups did not increase. Another finding is that, as indicated in the second row of Table 13.3, this experiment shows little benefit of having an aide in a regular-sized classroom. One potential concern about interpreting the results of the experiment is the failure to follow the treatment protocol for some students (some students switched from the small classes). If initial placement in a kindergarten classroom is random and has no direct effect on test scores, then initial placement can be used as an instrumental variable, as it partially, but not entirely, influences placement. This strategy was pursued by Krueger (1999), who used two stage least squares (TSLS) to estimate the effect on test scores of class size, using initial classroom placement as the instrumental variable; he found that the TSLS and OLS estimates were similar, leading him to conclude that deviations from the experimental protocol did not introduce substantial bias into the OLS estimates.

Comparison of the Observational and
Experimental Estimates of Class Size Effects

Part II presented multiple regression estimates of the class size effect based on observational data for California and Massachusetts school districts. In those data, class size was not randomly assigned, but instead was determined by local school officials trying to balance educational objectives against budgetary realities. How do those observational estimates compare with the experimental estimates from Project STAR?

For further reading about Project STAR, see Mosteller (1995), Mosteller, Light, and Sachs (1996), and Krueger (1999). Ehrenberg, Brewer, Gamoran, and Willms (2001a, 2001b) discuss Project STAR and place it in the context of the broader debate on class size and related research on this topic. For a critique of Project STAR, see Hanushek (1999a), and for a skeptical view of the relationship between class size and performance more generally, see Hanushek (1999b).

TABLE 13.4  Estimated Effects of Reducing the Student-Teacher Ratio by 7.5, Based on the STAR Data and the California and Massachusetts Observational Data

| Study          | Estimated β̂    | Change in Student-Teacher Ratio | Standard Deviation of Test Scores Across Students | Estimated Effect | 95% Confidence Interval |
|----------------|----------------|---------------------------------|---------------------------------------------------|------------------|-------------------------|
| STAR (grade K) | 13.90** (2.45) | Small class vs. regular class   | 73.8                                              | 0.19 (0.03)      | (0.13, 0.25)            |
| California     | -0.73** (0.26) | -7.5                            | 38.0                                              | 0.14 (0.05)      | (0.04, 0.24)            |
| Massachusetts  | -0.64** (0.27) | -7.5                            | 39.0                                              | 0.12 (0.05)      | (0.02, 0.22)            |

The estimated coefficient β̂ for the STAR study is taken from column (1) of Table 13.2. The estimated coefficients for the California and Massachusetts studies are taken from the first column of Table 9.3. The estimated effect is the effect of being in a small class versus a regular class (for STAR) or the effect of reducing the student-teacher ratio by 7.5 (for the California and Massachusetts studies). The 95% confidence interval for the reduction in the student-teacher ratio is the estimated effect ± 1.96 standard errors. Standard errors are given in parentheses under estimated effects. The estimated effects are statistically significantly different from zero at the *5% level or **1% significance level using a two-sided test.

To compare the California and Massachusetts estimates to those in Table 13.3, it is necessary to evaluate the same class size reduction and to express the predicted effect in units of standard deviations of test scores. Over the four years of the STAR experiment, the small classes had, on average, approximately 7.5 fewer students than the large classes, so we use the observational estimates to predict the effect on test scores of a reduction of 7.5 students per class. Based on the OLS estimates for the linear specifications summarized in the first column of Table 9.3, the California estimates predict an increase of 5.5 points on the test for a 7.5 student reduction in the student-teacher ratio (0.73 × 7.5 ≅ 5.5 points). The standard deviation of the test across students in California is approximately 38 points, so the estimated effect of the reduction of 7.5 students, expressed in units of standard deviations across students, is 5.5/38 ≅ 0.14 standard deviation. The standard error of the estimated slope coefficient for California is 0.26 (Table 9.3), so the standard error of the estimated effect of a 7.5 student reduction in standard deviation units is 0.26 × 7.5/38 ≅ 0.05. Thus, based on the California data, the estimated effect of reducing classes by 7.5 students, expressed in units of standard deviations of test scores across students, is 0.14 standard deviation, with a standard error of 0.05. These calculations, and similar calculations for Massachusetts, are summarized in Table 13.4, along with the STAR estimates for kindergarten taken from column (1) of Table 13.2.

In Table 13.4, as in Table 13.3, the estimated effects are presented in terms of the standard deviation of test scores across students, whereas the estimated effects in Chapter 9 are in terms of the standard deviation of test scores across districts. The standard deviation across students is greater than the standard deviation across districts: For California, the standard deviation across students is 38, but the standard deviation across districts is 19.1.
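The California calculation just described, together with its 95% confidence interval as reported in Table 13.4, can be reproduced directly:

```python
# Predicted effect of a 7.5-student reduction in the student-teacher
# ratio, in units of the standard deviation of test scores across
# students, with its standard error and 95% confidence interval
# (California values from Table 9.3 as cited in the text).
beta_hat, se_beta = -0.73, 0.26   # OLS slope and standard error
reduction, sd_students = 7.5, 38.0

effect = -beta_hat * reduction / sd_students   # a reduction raises scores
se = se_beta * reduction / sd_students
ci = (effect - 1.96 * se, effect + 1.96 * se)
print(round(effect, 2), round(se, 2))          # 0.14 0.05
print(round(ci[0], 2), round(ci[1], 2))        # 0.04 0.24
```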


The estimated effects from the California and Massachusetts observational studies are somewhat smaller than the STAR estimates. One reason that estimates from different studies differ, however, is random sampling variability, so it makes sense to compare confidence intervals for the estimated effects from the three studies. Based on the STAR data for kindergarten, the 95% confidence interval for the effect of being in a small class (reported in the final column of Table 13.4) is 0.13 to 0.25. The comparable 95% confidence interval based on the California observational data is 0.04 to 0.24, and for Massachusetts it is 0.02 to 0.22. Thus the 95% confidence intervals from the California and Massachusetts studies contain most of the 95% confidence interval from the STAR kindergarten data. Viewed in this way, the three studies give strikingly similar ranges of estimates.

There are many reasons why the experimental and observational estimates might differ. One reason is that, as discussed in Chapter 9, there are remaining threats to the internal validity of the observational studies; for example, because children move in and out of districts, the district student-teacher ratio might not reflect the student-teacher ratio actually experienced by the students, so the coefficient on the student-teacher ratio in the Massachusetts and California studies could be biased toward zero because of errors-in-variables bias. Other reasons concern external validity: The district average student-teacher ratio used in the observational studies is not the same thing as the actual number of children in the class, the STAR experimental variable; Project STAR was in a southern state in the 1980s, potentially different than California and Massachusetts in 1998; and the grades being compared differ (K-3 in STAR, fourth grade in Massachusetts, fifth grade in California). In light of all these reasons to expect different estimates, the findings of the three studies are remarkably similar. The fact that the observational studies are so similar to the Project STAR estimates suggests that the remaining threats to the internal validity of the observational estimates are minor.

13.5 Quasi-Experiments

True randomized controlled experiments can be expensive (the STAR experiment cost $12 million), and they often raise ethical concerns. In medicine, it would be unethical to determine the effect on longevity of smoking by randomly assigning subjects to a smoking treatment group and a no-smoking control group. In economics, it would be unethical to estimate the demand elasticity for cigarettes among teenagers by selling subsidized cigarettes to randomly selected high school students. For cost, ethical, and practical reasons, true randomized controlled experiments are rare in economics.


Nevertheless, the statistical insights and methods of randomized controlled experiments can carry over to nonexperimental settings. In a quasi-experiment, also called a natural experiment, randomness is introduced by variations in individual circumstances that make it appear as if the treatment is randomly assigned. These variations in individual circumstances might arise because of vagaries in legal institutions, location, timing of policy or program implementation, natural randomness such as birth dates or rainfall, or other factors that are unrelated to the causal effect under study.

There are two types of quasi-experiments. In the first, whether an individual (or, more generally, an entity) receives treatment is viewed as if it is randomly determined. In this case, the causal effect can be estimated by OLS using the treatment, X_i, as a regressor.

In the second type of quasi-experiment, the "as if" random variation is only a partial determinant of treatment. As discussed earlier in this chapter, in an experiment, random assignment can be used as an instrumental variable when it influences the treatment actually received. Similarly, in a quasi-experiment, "as if" random variation sometimes provides an instrumental variable (Z_i) that influences the treatment actually received (X_i). Accordingly, the causal effect is estimated by instrumental variables regression, where the "as if" random source of variation provides the instrumental variable.

Examples

We illustrate the two types of quasi-experiments by examples. The first example is a quasi-experiment in which the treatment is "as if" randomly determined. The second and third examples illustrate quasi-experiments in which the "as if" random variation influences, but does not entirely determine, the level of the treatment.

Example #1: Labor market effects of immigration. Does immigration reduce wages? Economic theory suggests that if the supply of labor increases because of an influx of immigrants, the "price" of labor (the wage) should fall. However, all else being equal, immigrants are attracted to cities with high labor demand, so the OLS estimator of the effect on wages of immigration will be biased. An ideal randomized controlled experiment for estimating the effect on wages of immigration would randomly assign different numbers of immigrants (different "treatments") to different labor markets ("subjects") and measure the effect on wages (the outcome). Such an experiment, however, faces severe practical, financial, and ethical problems.


The labor economist David Card (1990) therefore used a quasi-experiment in which a large number of Cuban immigrants entered the Miami, Florida, labor market in the "Mariel boatlift," which resulted from a temporary lifting of restrictions on emigration from Cuba in 1980. Half of the immigrants settled in Miami, in part because it had a large preexisting Cuban community. Card used the differences-in-differences estimator to estimate the causal effect on wages of an increase in immigration by comparing the change in wages of low-skilled workers in Miami to the change in wages of similar workers in other comparable U.S. cities over the same period. He concluded that this influx of immigrants had a negligible effect on wages of less-skilled workers.
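The differences-in-differences estimator used in this example is simply the difference of two changes in group means: the change for the treated market minus the change for the comparison markets. A minimal sketch (the wage numbers below are invented for illustration, not Card's data):

```python
# Differences-in-differences estimator: the change in the mean outcome
# for the treatment group minus the change for the control group.
# The numbers below are hypothetical, not Card's (1990) data.
def diff_in_diff(treat_before, treat_after, ctrl_before, ctrl_after):
    """Each argument is a mean outcome for one group/period cell."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)

# Hypothetical mean log wages for low-skilled workers:
estimate = diff_in_diff(treat_before=1.80, treat_after=1.85,
                        ctrl_before=1.75, ctrl_after=1.81)
print(round(estimate, 2))   # -0.01
```

Subtracting the control group's change removes common trends that affect both markets, which is what lets the comparison cities stand in for the counterfactual Miami without the boatlift.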

Example #2: Effects on civilian earnings of military service. Does serving in the military improve your prospects on the labor market? The military provides training that future employers might find attractive. However, an OLS regression of individual civilian earnings against prior military service could produce a biased estimator of the effect on civilian earnings of military service because military service is determined, at least in part, by individual choices and characteristics. For example, the military accepts only applicants who meet minimum physical requirements, and a lack of success in the private sector labor market might make an individual more likely to sign up for the military.

To circumvent this selection bias, Joshua Angrist (1990) used a quasi-experimental design in which he examined labor market histories of those who served in the U.S. military during the Vietnam War. During this period, whether a young man was drafted into the military was determined in part by a national lottery system based on birthdays: Men randomly assigned low lottery numbers were eligible to be drafted, while those with high numbers were not. Actual entry into the military was determined by complicated rules, including physical screening and certain exemptions, and some young men volunteered for service, so serving in the military was only partially influenced by whether you were draft-eligible. Thus being draft-eligible serves as an instrumental variable that partially determines military service but is randomly assigned. In this case, there was true random assignment of draft eligibility via the lottery, but because this randomization was not done as part of an experiment to evaluate the effect of military service, this is a quasi-experiment. Angrist concluded that the long-term effect of military service was to reduce earnings of white, but not nonwhite, veterans.

Example #3: The effect of cardiac catheterization. Section 12.5 described the study by McClellan, McNeil, and Newhouse (1994), in which they used the distance from a heart attack patient's home to a cardiac catheterization hospital, relative to the distance to a hospital lacking catheterization facilities, as an instrumental variable for actual treatment by cardiac catheterization. This study is a quasi-experiment with a variable that partially determines the treatment. The treatment itself, cardiac catheterization, is determined by personal characteristics of the patient and by the decision of the patient and doctor; however, it is also influenced by whether a nearby hospital is capable of performing this procedure. If the location of the patient is "as if" randomly assigned and has no direct effect on health outcomes, other than through its effect on the probability of catheterization, then the relative distance to a catheterization hospital is a valid instrumental variable.

Other examples. The quasi-experiment research strategy has been applied in other areas as well. Garvey and Hanka (1999) used variation in U.S. state laws to examine the effect on corporate financial structure (for example, the use of debt by corporations) of anti-takeover laws. Meyer, Viscusi, and Durbin (1995) used large discrete changes in the generosity of unemployment insurance benefits in Kentucky and Michigan, which differentially affected workers with high but not low earnings, to estimate the effect on time out of work of a change in unemployment benefits. The surveys of Meyer (1995), Rosenzweig and Wolpin (2000), and Angrist and Krueger (2001) give other examples of quasi-experiments in the fields of economics and social policy.

Econometric Methods for Analyzing Quasi-Experiments

The econometric methods for analyzing quasi-experiments are for the most part the same as those laid out in Section 13.3 for analyzing true experiments. If the treatment level X is "as if" randomly determined, then the OLS estimator of the coefficient of X is an unbiased estimator of the causal effect. If the treatment level is only partially random but is influenced by a variable Z that is "as if" randomly assigned, then the causal effect can be estimated by instrumental variables regression using Z as an instrument.

Because quasi-experiments typically do not have true randomization, there can be systematic differences between the treatment and control groups. If so, it is important to include observable measures of pretreatment characteristics of the individual subjects in the regression (the W's in the regressions in Section 13.3). As discussed in Section 13.3, including W regressors that are results of the treatment in general results in an inconsistent estimator of the causal effect.

Data in quasi-experiments typically are collected for reasons other than the particular study, so panel data on the "subjects" of the quasi-experiment sometimes are unavailable (an exception is discussed in the box on the minimum wage). If so, one way to proceed is to use a series of cross sections collected over time, and to modify the methods of Section 13.3 for repeated cross-sectional data.
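The two strategies described above can be contrasted in a small simulation. The sketch below is purely illustrative (the data-generating process, the confounder, and all coefficient values are assumptions, not from the text): OLS is biased when the treatment is only partially random, while the IV estimator that uses an "as if" randomly assigned Z recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical quasi-experiment: Z is an "as if" randomly assigned influence
# (e.g., a law change) that shifts the probability of treatment X, while an
# unobserved confounder c affects both X and the outcome Y.
z = rng.integers(0, 2, n)                       # as-if random instrument
c = rng.normal(size=n)                          # unobserved confounder
x = (0.5 * z + 0.8 * c + rng.normal(size=n) > 0).astype(float)
beta1 = 2.0                                     # true causal effect (assumed)
y = 1.0 + beta1 * x + 1.5 * c + rng.normal(size=n)

# OLS slope: cov(X, Y) / var(X) -- biased here, because X is only partly random
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# IV (2SLS with a single binary instrument): cov(Z, Y) / cov(Z, X)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS estimate: {ols:.2f}  (biased upward by the confounder)")
print(f"IV  estimate: {iv:.2f}  (close to the true effect of {beta1})")
```

With this design the IV estimate is close to the true effect of 2, while OLS is badly biased because the confounder moves X and Y together.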

498  CHAPTER 13  Experiments and Quasi-Experiments

What Is the Effect on Employment of the Minimum Wage?

How much does an increase in the minimum wage reduce demand for low-skilled workers? Economic theory says that demand falls when the price rises, but precisely how much is an empirical question. Because prices and quantities are determined by supply and demand, the OLS estimator in a regression of employment against wages has simultaneous causality bias. Hypothetically, a randomized controlled experiment might randomly assign different minimum wages to different employers and then compare changes in employment (outcomes) in the treatment and control groups, but how could this hypothetical experiment be done in practice?

The labor economists David Card and Alan Krueger (1994) decided to conduct such an experiment, but to let nature (or, more precisely, geography) perform the randomization for them. In 1992, the minimum wage in New Jersey rose from $4.25 to $5.05 per hour, but the minimum wage in neighboring Pennsylvania stayed constant. In this experiment, the "treatment" of the minimum wage increase (being located in New Jersey rather than Pennsylvania) is viewed as "as if" randomly assigned, in the sense that being subject to the wage hike is assumed to be uncorrelated with the other determinants of employment changes over this period. Card and Krueger collected data on employment at fast-food restaurants before and after the wage increase in the two states. When they computed the differences-in-differences estimator, they found a surprising result: There was no evidence that employment fell at New Jersey fast-food restaurants, relative to those in Pennsylvania. In fact, some of their estimates actually suggest that employment increased in New Jersey restaurants after its minimum wage went up, relative to Pennsylvania!

This finding conflicts with basic microeconomic theory and has been quite controversial. Subsequent analysis, using a different source of employment data, suggests that there might have been a small drop in employment in New Jersey after the wage hike, but even so the estimated labor demand curve is very inelastic (Neumark and Wascher, 2000). Although the exact wage elasticity in this quasi-experiment is a matter of debate, the effect on employment of a hike in the minimum wage appears to be smaller than many economists had previously thought.

Differences-in-differences using repeated cross-sectional data. A repeated cross-sectional data set is a collection of cross-sectional data sets, where each cross-sectional data set corresponds to a different time period. For example, the data set might contain observations on 400 individuals in the year 2004 and on 500 different individuals in 2005, for a total of 900 different individuals. One example of repeated cross-sectional data is political polling data, in which political preferences are measured by a series of surveys of randomly selected potential voters, where the surveys are taken at different dates and each survey has different respondents.

The premise of using repeated cross-sectional data is that, if the individuals (more generally, entities) are randomly drawn from the same population, then the


individuals in the earlier cross section can be used as surrogates for the individuals in the treatment and control groups in the later cross section. For example, suppose that, because of an increase of funds that had nothing to do with the local labor market, a job training program was expanded in southern but not northern California. Suppose you have survey data on two randomly selected cross sections of adult Californians, with one survey taken before the training program expanded and one after the expansion occurred. Then the "treatment group" would be southern Californians and the "control group" would be northern Californians. You do not have data on the southern Californians actually treated before the treatment (because you do not have panel data), but you do have data on southern Californians who are statistically similar to those who were treated. Thus you can use the cross-sectional data on southern Californians in the first period as a surrogate for the pretreatment observations on the treatment group, and the cross-sectional data on northern Californians as a surrogate for the pretreatment observations on the control group.
When there are two time periods, the regression model for repeated cross-sectional data is

Yit = β0 + β1Xit + β2Gi + β3Dt + uit,    (13.7)

where Xit is the actual treatment of the ith individual (entity) in the cross section in period t (t = 1, 2), Dt is the binary indicator that equals 0 in the first period and equals 1 in the second period, and Gi is a binary variable indicating whether the individual is in the treatment group (or in the surrogate treatment group, if the observation is in the pretreatment period). The ith individual receives treatment if he or she is in the treatment group in the second period, so in Equation (13.7), Xit = Gi × Dt; that is, Xit is the interaction between Gi and Dt.
If the quasi-experiment makes Xit "as if" randomly received, then the causal effect can be estimated by the OLS estimator of β1 in Equation (13.7). If there are more than two time periods, then Equation (13.7) is modified to contain T − 1 binary variables indicating the different time periods (see Appendix 13.2).

If the quasi-experiment makes the treatment Xit only partially randomly received, then in general Xit will be correlated with uit and the OLS estimator is biased and inconsistent. In this case, the source of randomness in the quasi-experiment takes the form of the instrumental variable Zit that partially influences the treatment level and is "as if" randomly assigned. As usual, for Zit to be a valid instrumental variable it must be relevant (that is, it must be related to the actual treatment Xit) and exogenous.
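A minimal numerical sketch of estimating β1 in Equation (13.7) by OLS with simulated repeated cross sections follows. The group sizes, coefficient values, and error distribution are illustrative assumptions, not from the text; the key feature is that different individuals are observed in each period (no panel).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_cross_section(n, period, beta1=3.0):
    """One cross section: a fresh random sample of individuals each period."""
    g = rng.integers(0, 2, n)          # treatment-group indicator G_i
    d = np.full(n, period)             # period indicator D_t (0 or 1)
    x = g * d                          # treated only if in group G in period 2
    # Group and period effects plus the causal effect beta1 of treatment
    y = 1.0 + beta1 * x + 0.5 * g + 2.0 * d + rng.normal(size=n)
    return y, x, g, d

# Two independent samples drawn before and after the policy change
y0, x0, g0, d0 = simulate_cross_section(400, period=0)
y1, x1, g1, d1 = simulate_cross_section(500, period=1)

y = np.concatenate([y0, y1])
X = np.column_stack([
    np.ones(900),
    np.concatenate([x0, x1]),          # X_it = G_i x D_t
    np.concatenate([g0, g1]),          # G_i
    np.concatenate([d0, d1]),          # D_t
])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated causal effect beta1: {beta_hat[1]:.2f}")
```

The coefficient on the interaction term recovers the causal effect (3.0 here) even though no individual is observed in both periods.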


13.6 Potential Problems with Quasi-Experiments

Like all empirical studies, quasi-experiments face threats to internal and external validity. A particularly important potential threat to internal validity is whether the "as if" randomization in fact can be treated reliably as true randomization.

Threats to Internal Validity


The threats to the internal validity of true randomized controlled experiments listed in Section 13.2 also apply to quasi-experiments, but with some modifications.

Failure of randomization. Quasi-experiments rely on differences in individual circumstances (legal changes, sudden unrelated events, and so forth) to provide the "as if" randomization in the treatment level. If this "as if" randomization fails to produce a treatment level X (or an instrumental variable Z) that is random, then in general the OLS estimator is biased (or the instrumental variables estimator is not consistent).

As in a true experiment, one way to test for failure of randomization is to check for systematic differences between the treatment and control groups, for example by regressing X (or Z) on the individual characteristics (the W's) and testing the hypothesis that the coefficients on the W's are zero. If differences exist that are not readily explained by the nature of the quasi-experiment, then this is evidence that the quasi-experiment did not produce true randomization. Even if there is no relationship between X (or Z) and the W's, X (or Z) could be related to some of the unobserved factors in the error term u. Because these factors are unobserved, this cannot be tested, and the validity of the assumption of "as if" randomization must be evaluated using expert knowledge and judgment applied to the application at hand.
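The randomization check described above, regressing X (or Z) on the W's and testing whether the coefficients are jointly zero, can be sketched as follows. This is a simulated illustration (the data and the homoskedasticity-only F-statistic are assumptions made for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 1_000, 3

# Pretreatment characteristics W and a candidate "as if" random instrument Z.
# Under true randomization Z is unrelated to W, so in a regression of Z on W
# the coefficients on the W's should all be (statistically) zero.
W = rng.normal(size=(n, q))
z = rng.integers(0, 2, n).astype(float)        # genuinely random here

X_u = np.column_stack([np.ones(n), W])          # unrestricted: intercept + W's
X_r = np.ones((n, 1))                           # restricted: intercept only

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

rss_u, rss_r = rss(X_u, z), rss(X_r, z)
F = ((rss_r - rss_u) / q) / (rss_u / (n - q - 1))
print(f"F-statistic for H0: all W coefficients are zero: {F:.2f}")
# Values near 1 are consistent with successful "as if" randomization;
# a large F would be evidence of systematic treatment/control differences.
```

As the text notes, this check is limited to observed characteristics: passing it does not rule out correlation between Z and unobserved factors in the error term.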

Failure to follow treatment protocol. In a true experiment, failure to follow the treatment protocol arises when members of the treatment group fail to receive treatment and/or members of the control group actually receive treatment; in consequence, the OLS estimator of the causal effect has selection bias. The counterpart to this in a quasi-experiment is when the "as if" randomization influences but does not determine the treatment level. In this case, the instrumental variables estimator based on the quasi-experimental influence Z can be consistent even though the OLS estimator is not.

13.6

Poronhol Problem' With O.oosi Expenments

501

Attrition. Attrition in a quasi-experiment is similar to attrition in a true experiment in the sense that if attrition arises because of personal choices or characteristics, then attrition can induce correlation between the treatment level and the error term. This results in sample selection bias, so the OLS estimator of the causal effect is biased and inconsistent.

Experimental effects. An advantage of quasi-experiments is that, because they are not true experiments, there typically is no reason for individuals to think they are experimental subjects. Thus experimental effects such as the Hawthorne effect generally are not germane to quasi-experiments.

Instrument validity in quasi-experiments. An important step in evaluating a study that uses instrumental variables regression is careful consideration of whether the instrument is in fact valid. This general statement remains true in quasi-experimental studies in which the instrument is "as if" randomly determined. As discussed in Chapter 12, instrument validity requires both instrument relevance and instrument exogeneity. Because instrument relevance can be checked using the statistical methods summarized in Key Concept 12.5, here we focus on the second, more judgmental requirement of instrument exogeneity.

Although it might seem that a randomly assigned instrumental variable is necessarily exogenous, this is not so. Consider the examples of Section 13.5. In Angrist's (1990) use of draft lottery numbers as an instrumental variable in studying the effect on civilian earnings of military service, the lottery number was in fact randomly assigned. But, as Angrist (1990) points out and discusses, if a low draft number results in behavior aimed at avoiding the draft, and that avoidance behavior subsequently affects civilian earnings, then a low lottery number (Z) could be related to unobserved factors that determine civilian earnings (u); that is, Zi and ui are correlated even though Zi is randomly assigned. As a second example, McClellan, McNeil, and Newhouse's (1994) study of the effect on heart attack patients of cardiac catheterization treated the relative distance to a catheterization hospital as if it were randomly assigned. But, as the authors highlight and examine, if patients who live close to a catheterization hospital are healthier than those who live far away (perhaps because of better access to medical care generally), then the relative distance to a catheterization hospital would be correlated with unobserved variables in the error term of the health outcome equation. In short, just because an instrument is randomly determined or "as if" randomly determined does not necessarily mean that it is exogenous in the sense that corr(Zi, ui) = 0. Thus the case for exogeneity must be scrutinized closely even if the instrument arises from a quasi-experiment.


Threats to External Validity


Quasi-experimental studies use observational data, and the threats to the external validity of a study based on a quasi-experiment are generally similar to the threats discussed in Section 9.1 for conventional regression studies using observational data.

One important consideration is that the special events that create the "as if" randomness at the core of a quasi-experimental study can result in other special features that threaten external validity. For example, Card's (1990) study of the labor market effects of immigration discussed in Section 13.5 used the "as if" randomness induced by the influx of Cuban immigrants in the Mariel boatlift. There were, however, special features of the Cuban immigrants, Miami, and its Cuban community that might make it difficult to generalize these findings to immigrants from other countries or to other destinations. Similarly, Angrist's (1990) study of the labor market effects of serving in the U.S. military during the Vietnam War presumably would not generalize to peacetime military service. As usual, whether a study generalizes to a specific population and setting of interest depends on the details of the study and must be assessed on a case-by-case basis.

13.7 Experimental and Quasi-Experimental Estimates in Heterogeneous Populations
The causal effect can depend on individual characteristics; that is, it can vary from one member of the population to the next. Section 13.3 discusses estimating causal effects for different groups using interactions when the source of the variation in the effect, such as gender, is observed. In this section, we consider the consequences of unobserved variation in the causal effect. We refer to this circumstance, in which there is unobserved variation in the causal effect within the population, as having a heterogeneous population. This section begins with a discussion of population heterogeneity, then turns to the interpretation of the OLS and IV estimators when there is a heterogeneous population. To keep things simple, this discussion focuses on the case of a binary treatment variable Xi (which may or may not be randomly assigned) with no additional regressors.

Population Heterogeneity: Whose Causal Effect?

If the causal effect is the same for every member of the population, then in this sense the population is homogeneous and Equation (13.1), with its single causal effect β1, applies to all members of the population. In reality, however, the


population studied can be heterogeneous; specifically, the causal effect can vary from one individual to the next based on the individual's circumstances, background, and other characteristics. For example, the effect on employment prospects of a job training program that teaches resume-writing skills presumably is greater for workers who lack resume-writing skills than for those who already have those skills. Similarly, the effect of a medical procedure could depend on the eating, smoking, and drinking habits of the patient.

If the causal effect is different for different people, Equation (13.1) no longer applies. Instead, the ith individual now has his or her own intercept, β0i, and causal effect, β1i, the effect of the treatment on that person. Thus the population regression equation can be written

Yi = β0i + β1iXi + ui.    (13.8)

For example, β1i might be zero for a resume-writing training program if the ith individual already knows how to write a resume. Because β1i varies from one individual to the next in the population and the individuals are selected from the population at random, we can think of β1i as a random variable that, just like ui, reflects unobserved variation across individuals (for example, variation in preexisting resume-writing skills).

As discussed in Section 13.1, the causal effect in a given population is the expected effect from an experiment in which members of the population are selected at random. When the population is heterogeneous, this causal effect is in fact the average causal effect, also called the average treatment effect, which is the population mean of the individual causal effects. In terms of Equation (13.8), the average causal effect in the population is the population mean value of the causal effect, E(β1i); that is, the expected causal effect of a randomly selected member of the population.
What do the estimators of Section 13.3 estimate if there is population heterogeneity of the form in Equation (13.8)? We first consider the OLS estimator in the case that Xi is "as if" randomly determined; in this case, the OLS estimator is a consistent estimator of the average causal effect. This is generally not true for the IV estimator, however. Instead, if Xi is partially influenced by Zi, then the IV estimator using the instrument Z estimates a weighted average of the causal effects, where those for whom the instrument is most influential receive the most weight.
OLS with Heterogeneous Causal Effects

Suppose that the treatment received, Xi, is randomly assigned with perfect compliance (in an experiment) or "as if" randomly assigned (in a quasi-experiment), so that E(ui|Xi) = 0. Then it is reasonable to consider using the differences estimator, that is, the OLS estimator β̂1 obtained from a regression of Yi on Xi.


We now show that if there is heterogeneity in the causal effect in the population and if Xi is randomly assigned, then the differences estimator is a consistent estimator of the average causal effect. The OLS estimator is β̂1 = sXY/s²X [Equation (4.7)]. If the observations are i.i.d., the sample covariance and variance are consistent estimators of the population covariance and variance, so β̂1 →p cov(Yi, Xi)/var(Xi). If Xi is randomly assigned, then Xi is distributed independently of other individual characteristics, both observed and unobserved, and in particular is distributed independently of β0i and β1i. Accordingly, the OLS estimator β̂1 has the limit

β̂1 →p cov(Yi, Xi)/var(Xi) = cov(β0i + β1iXi + ui, Xi)/var(Xi) = [cov(β0i, Xi) + cov(β1iXi, Xi)]/var(Xi) = E(β1i),    (13.9)

where the second equality uses the facts about covariances in Key Concept 2.3 and cov(ui, Xi) = 0, which is implied by E(ui|Xi) = 0 [Equation (2.27)], and the final equality follows from β0i and β1i being distributed independently of Xi, which they are if Xi is randomly determined (Exercise 13.9). Thus, if Xi is randomly assigned, β̂1 is a consistent estimator of the average causal effect E(β1i).
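A quick Monte Carlo check of this result confirms that the differences estimator recovers the average causal effect E(β1i) when Xi is randomly assigned. The distributions of β0i, β1i, and ui below are illustrative assumptions chosen for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Heterogeneous causal effects: each individual has their own beta_1i.
# Here E(beta_1i) = 1.5 (an assumed value for this illustration).
beta1_i = rng.normal(loc=1.5, scale=1.0, size=n)
beta0_i = rng.normal(loc=0.5, scale=1.0, size=n)
x = rng.integers(0, 2, n).astype(float)         # randomly assigned treatment
u = rng.normal(size=n)
y = beta0_i + beta1_i * x + u                   # Equation (13.8)

# Differences (OLS) estimator: sample cov(X, Y) / sample var(X)
beta1_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"OLS estimate: {beta1_hat:.3f}  vs  E(beta_1i) = 1.5")
```

Random assignment makes X independent of the individual effects, so the estimator converges to the population average even though no single β1 exists.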

IV Regression with Heterogeneous Causal Effects

Suppose that treatment is only partially randomly determined, that Zi is a valid instrumental variable (relevant and exogenous), and that there is heterogeneity in the effect on Xi of Zi. Specifically, suppose that Xi is related to Zi by the linear model

Xi = π0i + π1iZi + vi,    (13.10)

where the coefficients π0i and π1i vary from one individual to the next. Equation (13.10) is the first-stage equation of TSLS [Equation (12.2)] with the modification that the effect on Xi of a change in Zi is allowed to vary from one individual to the next.

The TSLS estimator is β̂1^TSLS = sZY/sZX [Equation (12.4)], the ratio of the sample covariance between Z and Y to the sample covariance between Z and X. If the observations are i.i.d., then these sample covariances are consistent estimators of the population covariances, so β̂1^TSLS →p cov(Zi, Yi)/cov(Zi, Xi). Suppose that π0i, π1i, β0i, and β1i are distributed independently of ui, vi, and Zi; that E(ui|Zi) = E(vi|Zi) = 0; and that E(π1i) ≠ 0 (instrument relevance). It is shown in Appendix 13.3 that, under these assumptions,


β̂1^TSLS →p E(β1iπ1i)/E(π1i).    (13.11)
That is, the TSLS estimator converges in probability to the ratio of the expected value of the product of β1i and π1i to the expected value of π1i.

The final ratio in Equation (13.11) is a weighted average of the individual causal effects β1i. The weights are π1i/E(π1i), which measure the relative degree to which the instrument influences whether the ith individual receives treatment. Thus, the TSLS estimator is a consistent estimator of a weighted average of the individual causal effects, where the individuals who receive the most weight are those for whom the instrument is most influential. The weighted average causal effect that is estimated by TSLS is called the local average treatment effect. The term "local" emphasizes that it is the weighted average that places the most weight on those individuals (more generally, entities) whose treatment probability is most influenced by the instrumental variable.
There are three special cases in which the local average treatment effect equals the average treatment effect:

1. The treatment effect is the same for all individuals. This corresponds to β1i = β1 for all i. Then the final expression in Equation (13.11) simplifies to E(β1iπ1i)/E(π1i) = β1E(π1i)/E(π1i) = β1.

2. The instrument affects each individual equally. This corresponds to π1i = π1 for all i. In this case, the final expression in Equation (13.11) simplifies to E(β1iπ1i)/E(π1i) = E(β1i)π1/π1 = E(β1i).

3. The heterogeneity in the treatment effect and the heterogeneity in the effect of the instrument are uncorrelated. This corresponds to β1i and π1i being random but cov(β1i, π1i) = 0. Because E(β1iπ1i) = cov(β1i, π1i) + E(β1i)E(π1i) [Equation (2.34)], if cov(β1i, π1i) = 0 then E(β1iπ1i) = E(β1i)E(π1i) and the final expression in Equation (13.11) simplifies to E(β1iπ1i)/E(π1i) = E(β1i)E(π1i)/E(π1i) = E(β1i).
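These simplifications can be checked numerically. The sketch below uses assumed illustrative distributions for β1i and π1i to contrast case 3 (uncorrelated heterogeneity, so the local average treatment effect equals the average treatment effect) with the general case in which the instrument is most influential where the effect is largest:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Heterogeneous causal effects beta_1i and first-stage effects pi_1i
# (all distributions here are assumed purely for illustration).
beta1 = rng.normal(loc=2.0, scale=1.0, size=n)

# Case 3: pi_1i independent of beta_1i, so cov(beta_1i, pi_1i) = 0
pi_uncorr = rng.uniform(0.1, 1.0, size=n)
late_uncorr = np.mean(beta1 * pi_uncorr) / np.mean(pi_uncorr)

# General case: the instrument is most influential exactly where the
# treatment effect is largest, so cov(beta_1i, pi_1i) > 0
pi_corr = np.clip(0.5 + 0.2 * (beta1 - 2.0), 0.05, None)
late_corr = np.mean(beta1 * pi_corr) / np.mean(pi_corr)

print(f"ATE  E(beta_1i)   = {beta1.mean():.3f}")
print(f"LATE with cov = 0 = {late_uncorr:.3f}")  # equals the ATE
print(f"LATE with cov > 0 = {late_corr:.3f}")    # exceeds the ATE
```

With uncorrelated heterogeneity the ratio E(β1iπ1i)/E(π1i) reproduces E(β1i); with positive correlation it overweights the large-effect individuals and exceeds it.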


Jn each of these three cases, lhere is population heterogeneity in the effect of the
instrument, in the effect of the treatment, or both, buttbe local average tre atment
effect equ<tls the average treatment e ffe ct. That is. in a ll three cases, TSLS is a consiste nt ~::.ti m ator of the average treatment effect.
A":>ide from rbese tbree special cases. in genera l tbe local average treatment
effect d iffe rs from the average treatment effect. For e xample, suppose that Z; has
no innue nce on the treatme nt decision for half the population (for them, r. 11 = 0)
and tbat Z; has the same, nonzero intluence on the treatment decision for the other
half (for them, -rr 1; is a nonzero constant). n1e n TSLS is a consistent estirnaLOr of
the avcrugc treatment effect in the ha lf of the popu la tion for which the instrument
innucnces the trl!atmen t tlcci~ion To be concrete. suppose that workers arc


eligible for a job training program and are randomly assigned a priority number Zi, which influences how likely they are to be admitted to the program. Half the workers know they will benefit from the program; for them, β1i = β′ > 0 and π1i = π′ > 0. The other half know that, for them, the program is ineffective, so they would never enroll even if admitted; for them, β1i = 0 and π1i = 0. The average treatment effect is E(β1i) = ½(β′ + 0) = ½β′. The local average treatment effect is E(β1iπ1i)/E(π1i). Now E(β1iπ1i) = ½β′π′ + ½ × 0 × 0 = ½β′π′ and E(π1i) = ½π′, so E(β1iπ1i)/E(π1i) = β′. Thus, in this example, the local average treatment effect is the causal effect for those workers who are likely to enroll in the program, and it gives no weight to those who will not enroll under any circumstance. In contrast, the average treatment effect places equal weight on all individuals, regardless of whether they would enroll. Because individuals decide to enroll based in part on their knowledge of how effective the program will be for them, in this example the local average treatment effect exceeds the average treatment effect.
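The job training example above can be simulated directly. The specific values (β′ = 4 and the enrollment probabilities) are illustrative assumptions; the point is that TSLS recovers the responders' effect β′ rather than the average treatment effect ½β′:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000

# Half the workers benefit (beta' = 4, an assumed value) and respond to the
# priority number; the other half gain nothing and never enroll
# (beta_1i = 0, pi_1i = 0).
responder = rng.integers(0, 2, n).astype(bool)
beta1_i = np.where(responder, 4.0, 0.0)

z = rng.integers(0, 2, n).astype(float)          # randomly assigned priority
# Responders enroll with higher probability when Z = 1; never-takers do not.
enroll_prob = np.where(responder, 0.2 + 0.6 * z, 0.0)
x = (rng.uniform(size=n) < enroll_prob).astype(float)
y = 1.0 + beta1_i * x + rng.normal(size=n)

ate = beta1_i.mean()                             # average treatment effect
tsls = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # local average effect

print(f"average treatment effect: {ate:.2f}")
print(f"TSLS (local average treatment effect): {tsls:.2f}")
```

The TSLS estimate is close to 4 (the responders' effect) while the average treatment effect is about 2, illustrating the weighting in Equation (13.11).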
Implications. This discussion has two implications. First, in the circumstances in which OLS would normally be consistent (that is, when E(ui|Xi) = 0), the OLS estimator continues to be consistent in the presence of heterogeneous causal effects in the population; however, because there is no single causal effect, the OLS estimator is properly interpreted as a consistent estimator of the average causal effect in the population being studied.

Second, if an individual's decision to receive treatment depends on the effectiveness of the treatment for that individual, then the TSLS estimator is not in general a consistent estimator of the average causal effect. Instead, TSLS estimates a local average treatment effect, where the causal effects of the individuals who are most influenced by the instrument receive the greatest weight. This leads to the disconcerting situation in which two researchers, armed with different instrumental variables that are both valid in the sense that both are relevant and exogenous, would obtain different estimates of "the" causal effect, even in large samples. Although both estimators provide some insight into the distribution of the causal effects via their respective weighted averages of the form in Equation (13.11), neither estimator is in general a consistent estimator of the average causal effect.4

4There are several good (but advanced) discussions of the effect of population heterogeneity on program evaluation. These include the survey by Heckman, LaLonde, and Smith (1999, Section 7) and James Heckman's lecture delivered when he received the Nobel Prize in Economics (Heckman, 2001, Section 7). The latter reference and Angrist, Graddy, and Imbens (2000) provide detailed discussions of the random effects model (which treats β1i as varying across individuals) and provide more general versions of the result in Equation (13.11). The concept of the local average treatment effect was introduced by Angrist and Imbens (1994), who showed that in general TSLS does not estimate the average treatment effect.


Example: The cardiac catheterization study. Sections 12.5 and 13.5 discuss McClellan, McNeil, and Newhouse's (1994) study of the effect on mortality of cardiac catheterization of heart attack patients. The authors used instrumental variables regression, with the relative distance to a cardiac catheterization hospital as the instrumental variable. Based on their TSLS estimates, they found that cardiac catheterization had little or no effect on health outcomes. This result is surprising: Medical procedures such as cardiac catheterization are subjected to rigorous clinical trials prior to approval for widespread use. Moreover, cardiac catheterization allows surgeons to perform medical interventions that would have required major surgery a decade earlier, making these interventions safer and, presumably, better for long-term patient health. How could this econometric study fail to find beneficial effects of cardiac catheterization?

One possible answer is that there is heterogeneity in the treatment effect of cardiac catheterization. For some patients, this is an effective intervention, but for others (perhaps those who are healthier) this procedure is less effective or, given the risks involved with any surgery, perhaps on net ineffective. Thus the average causal effect in the population of heart attack patients could be, and presumably is, positive. The IV estimator, however, measures a marginal effect, not an average effect, where the marginal effect is the effect of the procedure on those patients for whom distance to the hospital is an important factor in whether they receive treatment. But those patients could be just the relatively healthy patients for whom, on the margin, cardiac catheterization is a relatively ineffective procedure. If so, McClellan, McNeil, and Newhouse's (1994) TSLS estimator measures the effect of the procedure for the marginal patient (for whom it is relatively ineffective), not for the average patient (for whom it might be effective).

13.8 Conclusion
In Chapter 1, we defined the causal effect in terms of the expected outcome of an ideal randomized controlled experiment. If a randomized controlled experiment is available or can be performed, it can provide compelling evidence on the causal effect under study, although even randomized controlled experiments are subject to potentially important threats to internal and external validity.

Despite their advantages, randomized controlled experiments in economics face severe hurdles, including ethical concerns and cost. The insights of experimental methods can, however, be applied to quasi-experiments, in which special circumstances make it seem "as if" randomization has occurred. In quasi-experiments, the causal effect can be estimated using a differences-in-differences estimator, possibly augmented with additional regressors; if the "as if" randomization only partly influences the treatment, then instrumental variables regression can be used instead. An important advantage of quasi-experiments is that the source of the "as if" randomness in the data is usually transparent and thus can be evaluated in a concrete way. An important threat confronting quasi-experiments is that sometimes the "as if" randomization is not really random, so the treatment (or the instrumental variable) is correlated with omitted variables and the resulting estimator of the causal effect is biased.

Quasi-experiments provide a bridge between observational data sets and true randomized controlled experiments. The econometric methods used in this chapter for analyzing quasi-experiments are those developed, in different contexts, in earlier chapters: OLS, panel data estimation methods, and instrumental variables regression. What differentiates quasi-experiments from the applications examined in Part II and earlier in Part III is the way in which these methods are interpreted and the data sets to which they are applied. Quasi-experiments provide econometricians with a way to think about how to acquire new data sets, how to think of instrumental variables, and how to evaluate the plausibility of the exogeneity assumptions that underlie OLS and instrumental variables estimation.5

Summary
1. The causal effect is defined in terms of an ideal randomized controlled experiment, and the causal effect can be estimated by the difference in the average outcomes for the treatment and control groups. Actual experiments with human subjects deviate from an ideal experiment for various practical reasons, especially the failure of people to comply with the experimental protocol.
2. If the actual treatment level X_i is random, then the treatment effect can be estimated by regressing the outcome on the treatment, optionally using additional pretreatment characteristics as regressors to improve efficiency. If the assigned treatment Z_i is random but the actual treatment X_i is partly determined by individual choice, then the causal effect can be estimated by instrumental variables regression, using Z_i as an instrument.
3. In a quasi-experiment, variations in laws or circumstances or accidents of nature are treated "as if" they induce random assignment to treatment and control groups.

⁵Shadish, Cook, and Campbell (2002) provide a comprehensive treatment of experiments and quasi-experiments in the social sciences and in psychology. Examples of experiments in economics include negative income tax experiments (for example, see …) and the RAND health insurance experiment (Newhouse, 1993).

If the actual treatment X_i is "as if" random, then the causal effect can be estimated by regression (possibly with additional pretreatment characteristics as regressors); if the assigned treatment Z_i is "as if" random, then the causal effect can be estimated by instrumental variables regression.
4. A key threat to the internal validity of a quasi-experimental study is whether the "as if" randomization actually results in exogeneity. Because of behavioral responses, just because an instrument is generated by "as if" randomization does not mean it is necessarily exogenous in the sense required for a valid instrumental variable.
5. When the treatment effect varies from one individual to the next, the OLS estimator is a consistent estimator of the average causal effect if the actual treatment is randomly assigned or "as if" randomly assigned. However, the instrumental variables estimator is a weighted average of the individual causal effects, in which the individuals for whom the instrument is most influential receive the greatest weight.
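Points 2 and 5 can be illustrated with a short simulation. This sketch is not part of the text: the data-generating process, parameter values, and function names below are all invented. Take-up of a randomly assigned treatment Z_i is voluntary and correlated with an unobservable that also raises the outcome, so the naive regression of the outcome on the actual treatment X_i is biased upward, while the IV (Wald) estimator using Z_i as the instrument recovers the causal effect of 2.0.

```python
import random

def simulate(n=20000, beta=2.0, seed=1):
    """Randomized experiment with partial compliance: assignment Z is
    random, but take-up of the treatment X is voluntary and more likely
    for individuals with a high unobservable u (which also raises Y)."""
    rng = random.Random(seed)
    Z, X, Y = [], [], []
    for _ in range(n):
        z = rng.random() < 0.5                 # random assignment
        u = rng.gauss(0.0, 1.0)                # unobserved determinant of Y
        takes_up = rng.random() < (0.9 if u > 0 else 0.3)
        x = 1.0 if (z and takes_up) else 0.0   # actual treatment received
        y = beta * x + u
        Z.append(1.0 if z else 0.0); X.append(x); Y.append(y)
    return Z, X, Y

def slope(x, y):
    """OLS slope of y on x (with an intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

Z, X, Y = simulate()
naive = slope(X, Y)                # differences estimator using actual X: biased
wald = slope(Z, Y) / slope(Z, X)   # IV estimator using assigned Z: consistent
```

In this design the naive estimate exceeds 2.0 because compliers have above-average u, while the Wald ratio stays near the true effect.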

Key Terms
program evaluation (469)
causal effect (471)
treatment effect (471)
differences estimator (471)
partial compliance (473)
attrition (473)
Hawthorne effect (474)
differences estimator with additional regressors (477)
conditional mean independence (478)
differences-in-differences estimator
differences-in-differences estimator with additional regressors
quasi-experiment (495)
natural experiment (495)
repeated cross-sectional data (498)
average causal effect (503)
average treatment effect (503)
local average treatment effect (505)

Review the Concepts


13.1  A researcher studying the effects of a new fertilizer on crop yields plans to carry out an experiment in which different amounts of the fertilizer are applied to 100 different one-acre parcels of land. There will be four treatment levels. Treatment level one is no fertilizer; treatment level two is 50% of the manufacturer's recommended amount of fertilizer; treatment level three is 100%; and treatment level four is 150%. The researcher plans to apply treatment level one to the first 25 parcels of land, treatment level two to the second 25 parcels, and so forth. Can you suggest a better way to assign treatment levels? Why is your proposal better than the researcher's method?

13.2  A clinical trial is carried out for a new cholesterol-lowering drug. The drug is given to 500 patients and a placebo is given to another 500 patients, using random assignment of the patients. How would you estimate the treatment effect of the drug? Suppose that you had data on the weight, age, and gender of each patient. Could you use these data to improve your estimate? Explain. Suppose that you had data on the cholesterol levels of each patient before he or she entered the experiment. Could you use these data to improve your estimate? Explain.

13.3  Researchers studying the STAR data report anecdotal evidence that school principals were pressured by some parents to place their children in the small classes. Suppose that some principals succumbed to this pressure and transferred some children into the small classes. How would this compromise the internal validity of the study? Suppose that you had data on the original random assignment of each student before the principal's intervention. How could you use this information to restore the internal validity of the study?

13.4  Explain whether experimental effects (like the Hawthorne effect) might be important in each of the experiments in the previous three questions.

13.5  Section 12.1 gives a hypothetical example in which some schools were damaged by an earthquake. Explain why this is an example of a quasi-experiment. How could you use the induced changes in class size to estimate the effect of class size on test scores?

Exercises
13.1  Using the results in Table 13.1, calculate the following for each grade: an estimate of the small-class treatment effect, relative to the regular class; its standard error; and its 95% confidence interval. (For this exercise, ignore the results for regular classes with aides.)

13.2  For the following calculations, use the results in column (4) of Table 13.2. Consider two classrooms, A and B, with identical values of the regressors in column (4) of Table 13.2, except that:

a. Classroom A is a "small class" and classroom B is a "regular class." Construct a 95% confidence interval for the expected difference in average test scores.

b. Classroom A has a teacher with 5 years of experience and classroom B has a teacher with 10 years of experience. Construct a 95% confidence interval for the expected difference in average test scores.


c. Classroom A is a small class with a teacher with 5 years of experience and classroom B is a regular class with a teacher with 10 years of experience. Construct a 95% confidence interval for the expected difference in average test scores. (Hint: In STAR, the teachers were randomly assigned to the different types of classrooms.)

d. Why is the intercept missing from column (4)?


13.3  Suppose that, in a randomized controlled experiment of the effect of an SAT preparatory course on SAT scores, the following results are reported:

                                        Treatment Group    Control Group
Average SAT score (X̄)                        1241              1201
Standard deviation of SAT score (s)            …                97.1
Number of men                                  55                45
Number of women                                45                55

a. Estimate the average treatment effect on test scores.

b. Is there evidence of nonrandom assignment? Explain.
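One way to examine part (b), sketched below using only the counts in the table, is a two-sample z-test of whether the fraction of men differs between the treatment and control groups; under random assignment the two group compositions should differ only by chance. This calculation is an illustrative sketch, not part of the text.

```python
from math import sqrt

# Counts from the table: 100 subjects in each group.
men_treat, n_treat = 55, 100
men_ctrl, n_ctrl = 45, 100

p1 = men_treat / n_treat                     # fraction of men, treatment group
p2 = men_ctrl / n_ctrl                       # fraction of men, control group
p_pool = (men_treat + men_ctrl) / (n_treat + n_ctrl)

# z-statistic for H0: the fraction of men is the same in both groups
se = sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
z = (p1 - p2) / se
```

Here z = 0.10/0.0707 ≈ 1.41 < 1.96, so the gender imbalance by itself is not statistically significant at the 5% level; gender is, of course, only one observable dimension on which assignment could be checked.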


13.4  Read the box "What Is the Effect on Employment of the Minimum Wage?" in Section 13.5. Suppose, for concreteness, that Card and Krueger collected their data in 1991 (before the change in the New Jersey minimum wage) and in 1993 (after the change in the New Jersey minimum wage). Consider Equation (13.7) with the W regressors excluded.

a. What are the values of X_it, G_i, and D_t for:
 i. A New Jersey restaurant in 1991?
 ii. A New Jersey restaurant in 1993?
 iii. A Pennsylvania restaurant in 1991?
 iv. A Pennsylvania restaurant in 1993?

b. In terms of the coefficients β0, β1, β2, and β3, what is the expected number of employees in:
 i. A New Jersey restaurant in 1991?
 ii. A New Jersey restaurant in 1993?
 iii. A Pennsylvania restaurant in 1991?
 iv. A Pennsylvania restaurant in 1993?

c. In terms of the coefficients β0, β1, β2, and β3, what is the average causal effect of the minimum wage on employment?


d. Explain why Card and Krueger used a differences-in-differences estimator of the causal effect instead of the "New Jersey after – New Jersey before" differences estimator or the "1993 New Jersey – 1993 Pennsylvania" differences estimator.
13.5  Consider a study to evaluate the effect on college student grades of dorm room Internet connections. In a large dorm, half the rooms are randomly wired for high-speed Internet connections (the treatment group), and final course grades are collected for all residents. Which of the following pose threats to internal validity, and why?

a. Midway through the year all the male athletes move into a fraternity and drop out of the study (their final grades are not observed).

b. Engineering students assigned to the control group put together a local area network so that they can share a private wireless Internet connection that they pay for jointly.

c. The art majors in the treatment group never learn how to access their Internet accounts.

d. The economics majors in the treatment group provide access to their Internet connection to those in the control group, for a fee.
13.6  Suppose that there are panel data for T = 2 time periods for a randomized controlled experiment, where the first observation (t = 1) is taken before the experiment and the second observation (t = 2) is for the post-treatment period. Suppose that the treatment is binary; that is, X_it = 1 if the ith individual is in the treatment group and t = 2, and X_it = 0 otherwise. Further suppose that the treatment effect can be modeled using the specification

Y_it = α_i + β1 X_it + u_it,

where α_i are individual-specific effects [see Equation (13.10)] with a mean of zero and a variance of σ²_α, and u_it is an error term, where u_it is homoskedastic, cov(u_i1, u_i2) = 0, and cov(u_it, α_i) = 0 for all i. Let β̂1^differences denote the differences estimator, that is, the OLS estimator in a regression of Y_i2 on X_i2 with an intercept, and let β̂1^diffs-in-diffs denote the differences-in-differences estimator, that is, the estimator of β1 based on the OLS regression of ΔY_i = Y_i2 − Y_i1 against ΔX_i = X_i2 − X_i1 and an intercept.

a. Show that n var(β̂1^differences) → (σ²_u + σ²_α)/var(X_i2). (Hint: Use the homoskedasticity-only formulas for the variance of the OLS estimator in Appendix 5.1.)

b. Show that n var(β̂1^diffs-in-diffs) → 2σ²_u/var(X_i2). (Hint: Note that ΔX_i = X_i2; why?)

c. Based on your answers to (a) and (b), when would you prefer the differences-in-differences estimator over the differences estimator, based purely on efficiency considerations?
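The efficiency comparison in parts (a)–(c) can be checked by Monte Carlo under the stated model Y_it = α_i + β1 X_it + u_it. This is an illustrative sketch with arbitrary parameter values: with σ²_α = 3 and σ²_u = 1, the theory above predicts var(differences)/var(diffs-in-diffs) → (σ²_u + σ²_α)/(2σ²_u) = 2.

```python
import random
from statistics import mean, pvariance

def one_draw(rng, n=200, beta=2.0, var_alpha=3.0, var_u=1.0):
    """One experiment: half the individuals are treated in period 2.
    Returns (differences estimate, differences-in-differences estimate)."""
    y2_t, y2_c, dy_t, dy_c = [], [], [], []
    for i in range(n):
        treated = i < n // 2
        alpha = rng.gauss(0.0, var_alpha ** 0.5)            # individual effect
        y1 = alpha + rng.gauss(0.0, var_u ** 0.5)           # X_i1 = 0 for all i
        y2 = alpha + beta * treated + rng.gauss(0.0, var_u ** 0.5)
        (y2_t if treated else y2_c).append(y2)
        (dy_t if treated else dy_c).append(y2 - y1)
    diff = mean(y2_t) - mean(y2_c)       # differences estimator
    did = mean(dy_t) - mean(dy_c)        # differences-in-differences estimator
    return diff, did

rng = random.Random(7)
draws = [one_draw(rng) for _ in range(600)]
var_diff = pvariance([d for d, _ in draws])
var_did = pvariance([d for _, d in draws])
ratio = var_diff / var_did    # theory: (var_u + var_alpha) / (2 * var_u) = 2
```

Both estimators center on the true effect of 2.0, but differencing removes α_i, so the differences-in-differences estimator has roughly half the sampling variance here.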
13.7  Suppose you have panel data from an experiment with T = 2 periods (so t = 1, 2). Consider the panel data regression model with fixed individual and time effects and individual characteristics W_i that do not change over time, such as gender. Let the treatment be binary, so that X_it = 1 for t = 2 for the individuals in the treatment group and X_it = 0 otherwise. Consider the population regression model

Y_it = α_i + β1 X_it + β2 D_t + β3 (D_t × W_i) + u_it,

where α_i are individual fixed effects, D_t is the binary variable that equals 1 if t = 2 and equals 0 if t = 1, D_t × W_i is the product of D_t and W_i, and the α's and β's are unknown coefficients. Let ΔY_i = Y_i2 − Y_i1. Derive Equation (13.5) (in the case of a single W regressor, so r = 1) from this population regression model.
13.8  Suppose you have the same data as in Exercise 13.7 (panel data with two periods, n observations), but ignore the W regressor. Consider the alternative regression model

Y_it = β0 + β1 (G_i × D_t) + β2 G_i + β3 D_t + u_it,

where G_i = 1 if the individual is in the treatment group and G_i = 0 if the individual is in the control group. Show that the OLS estimator of β1 is the differences-in-differences estimator in Equation (13.3). (Hint: See Section 8.3.)
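The equivalence asserted in this exercise can be verified numerically on a toy data set (the panel below is invented): because the regression on G_i, D_t, and their interaction is saturated, the OLS coefficient on G_i × D_t equals the differences-in-differences of the four group means. The hand-rolled `ols` helper is an illustrative sketch.

```python
def ols(rows, y):
    """OLS coefficients via the normal equations (X'X)b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    A = [row[:] + [b] for row, b in zip(XtX, Xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Tiny made-up balanced panel: (G, D, Y) for two groups and two periods.
data = [(0, 0, 10.0), (0, 0, 12.0), (0, 1, 13.0), (0, 1, 15.0),
        (1, 0, 20.0), (1, 0, 22.0), (1, 1, 28.0), (1, 1, 30.0)]
X = [[1.0, g * d, g, d] for g, d, _ in data]   # intercept, G x D, G, D
y = [v for _, _, v in data]
beta1 = ols(X, y)[1]                           # OLS coefficient on G x D

def cell_mean(g, d):
    vals = [v for gg, dd, v in data if gg == g and dd == d]
    return sum(vals) / len(vals)

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
```

Both quantities equal 5 for these numbers; the equivalence is exact, not approximate, because the saturated regression fits the four cell means perfectly.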
13.9  Derive the final equality in Equation (13.9). (Hint: Use the definition of the covariance and the fact that, because the actual treatment X_i is random, β_1i and X_i are independently distributed.)
13.10  Consider the regression model with heterogeneous regression coefficients

Y_i = β_0i + β_1i X_i + v_i,

where (v_i, X_i, β_0i, β_1i) are i.i.d. random variables with β0 = E(β_0i) and β1 = E(β_1i).

a. Show that the model can be written as Y_i = β0 + β1 X_i + u_i, where u_i = (β_0i − β0) + (β_1i − β1)X_i + v_i.

b. Suppose that E[β_0i | X_i] = β0, E[β_1i | X_i] = β1, and E[v_i | X_i] = 0. Show that E[u_i | X_i] = 0.

c. Show that assumptions 1 and 2 of Key Concept 4.3 are satisfied.

d. Suppose that outliers are rare, so that (u_i, X_i) have finite fourth moments. Is it appropriate to use OLS and the methods of Chapters 4 and 5 to estimate and carry out inference about the average values of β_0i and β_1i?

e. Suppose that β_1i and X_i are positively correlated, so that observations with larger-than-average values of X_i tend to have larger-than-average values of β_1i. Are the assumptions in Key Concept 4.3 satisfied? If not, which assumption(s) is (are) violated? Is it appropriate to use OLS and the methods of Chapters 4 and 5 to estimate and carry out inference about the average values of β_0i and β_1i?
13.11  In Chapter 12, state-level panel data were used to estimate the price elasticity of demand for cigarettes, using the state sales tax as an instrumental variable. Consider in particular regression (1) in Table 12.1. In this case, in your judgment, does the local average treatment effect differ from the average treatment effect? Explain.

Empirical Exercises

E13.1  A prospective employer receives two resumes: a resume from a white job applicant and a similar resume from an African American applicant. Is the employer more likely to call back the white applicant to arrange an interview? Marianne Bertrand and Sendhil Mullainathan carried out a randomized controlled experiment to answer this question. Because race is not typically included on a resume, they differentiated resumes on the basis of "white-sounding names" (such as Emily Walsh or Gregory Baker) and "African American–sounding names" (such as Lakisha Washington or Jamal Jones). A large collection of fictitious resumes was created, and the presupposed "race" (based on the "sound" of the name) was randomly assigned to each resume. These resumes were sent to prospective employers to see which resumes generated a phone call (a "call back") from the prospective employer. Data from the experiment and a detailed data description are on the textbook Web site http://www.aw-bc.com/stock_watson in the files Names and Names_Description.⁶

a. Define the "call-back rate" as the fraction of resumes that generate a phone call from the prospective employer. What was the call-back rate for whites? For African Americans? Construct a 95% confidence interval for the difference in the call-back rates. Is the difference statistically significant? Is it large in a real-world sense?

b. Is the African American/white call-back rate differential different for men than for women?

c. What is the difference in call-back rates for high-quality versus low-quality resumes? What is the high-quality/low-quality difference for white applicants? For African American applicants? Is there a significant difference in this high-quality/low-quality difference for whites versus African Americans?

d. The authors of the study claim that race was assigned randomly to the resumes. Is there any evidence of nonrandom assignment?

⁶These data were provided by Professor Marianne Bertrand of the University of Chicago and were used in her paper with Sendhil Mullainathan, "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination," American Economic Review, 2004, 94(4): 991–1013.
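The calculation in part (a) is a difference in two sample proportions with its 95% confidence interval. A sketch of the arithmetic with made-up counts (the numbers below are invented, not the actual Names data):

```python
from math import sqrt

# Made-up counts, NOT the actual experimental data.
calls_w, n_w = 200, 2000     # call backs / resumes, white-sounding names
calls_b, n_b = 140, 2000     # call backs / resumes, African American names

p_w, p_b = calls_w / n_w, calls_b / n_b
diff = p_w - p_b                                  # difference in call-back rates
se = sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
ci = (diff - 1.96 * se, diff + 1.96 * se)         # 95% confidence interval
t_stat = diff / se                                # t-statistic for H0: no difference
```

With these invented counts the difference is 0.03 with t ≈ 3.4, so the interval excludes zero; the same arithmetic applies to the actual call-back counts in the Names file.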
E13.2  A consumer is given the chance to buy a baseball card for $1, but he declines the trade. If the consumer is now given the baseball card, will he be willing to sell it for $1? Standard consumer theory suggests yes, but behavioral economists have found that "ownership" tends to increase the value of goods to consumers. That is, the consumer may hold out for some amount more than $1 (for example, $1.20) when selling the card, even though he was willing to pay only some amount less than $1 (for example, $0.88) when buying it. Behavioral economists call this phenomenon the "endowment effect." John List investigated the endowment effect in a randomized experiment involving sports memorabilia traders at a sports-card show. Traders were randomly given one of two sports collectibles, say good A or good B, that had approximately equal market value.⁷ Those receiving good A were then given the option of trading good A for good B with the experimenter; those receiving good B were given the option of trading good B for good A with the experimenter. Data from the experiment and a detailed description can be found on the textbook Web site http://www.aw-bc.com/stock_watson in the files Sportscards and Sportscards_Description.⁸

a. i. Suppose that, absent any endowment effect, all of the subjects prefer good A to good B. What fraction of the experiment's subjects would you expect to trade the good that they were given for the other good? (Hint: Random assignment means that approximately 50% of the subjects received good A and 50% received good B.)

ii. Suppose that, absent any endowment effect, 50% of the subjects prefer good A to good B, and the other 50% prefer good B to good A. What fraction of the subjects would you expect to trade the good that they were given for the other good?

iii. Suppose that, absent any endowment effect, X% of the subjects prefer good A to good B, and the other (1 − X)% prefer good B to good A. Show that you would expect 50% of the subjects to trade the good that they were given for the other good.

b. Using the sportscard data, what fraction of the subjects traded the good they were given? Is the fraction significantly different from 50%? What fraction of the subjects who received good A traded for good B? What fraction of the subjects who received good B traded for good A? Is there evidence of an endowment effect?

c. Some have argued that the endowment effect may be present, but that it is likely to disappear as traders gain more trading experience. Half of the experimental subjects were dealers and the other half were nondealers. Dealers have more experience than nondealers. Repeat (b) for dealers and nondealers. Is there a significant difference in their behavior? Is the evidence consistent with the hypothesis that the endowment effect disappears as traders gain more experience?

d. The data set contains two additional measures of experience: number of trades per month and number of years trading. Is there evidence that for nondealers the endowment effect decreases as their trading experience increases?

⁷Good A was a ticket stub from the game in which Cal Ripken, Jr., set the record for consecutive games played, and good B was a souvenir from the game in which Nolan Ryan won his 300th game.

⁸These data were provided by Professor John List of the University of Chicago and were used in his paper "Does Market Experience Eliminate Market Anomalies?" Quarterly Journal of Economics, 2003, 118(1): 41–71.

APPENDIX

13.1  The Project STAR Data Set

The Project STAR public access data set contains data on test scores, treatment groups, and student and teacher characteristics for the four years of the experiment, from academic year 1985–1986 to academic year 1988–1989. The test score data analyzed in this chapter are the sum of the scores on the math and reading portions of the Stanford Achievement Test. The binary variable "Boy" in Table 13.2 indicates whether the student is a boy (= 1) or a girl (= 0); the binary variables "Black" and "Race other than black or white" indicate the student's race. The binary variable "Free lunch eligible" indicates whether the student is eligible for a free lunch during that school year. The teacher's years of experience are the total years of experience of the teacher whom the student had in the grade for which the test data apply. The data set also indicates which school the student attended in a given year, making it possible to construct binary school-specific indicator variables.

APPENDIX

13.2  Extension of the Differences-in-Differences Estimator to Multiple Time Periods

When there are more than two time periods, the causal effect can be estimated using the fixed effects regression model of Chapter 10.

First consider the case that there are no additional W regressors. Then the population regression model is the combined time and individual fixed effects regression model [Equation (10.20)]:

Y_it = β0 + β1 X_it + γ2 D2_i + ⋯ + γn Dn_i + δ2 B2_t + ⋯ + δT BT_t + v_it,   (13.12)

where i = 1, ..., n denotes the individual, t = 1, ..., T denotes the time period of measurement, X_it = 1 if the ith individual has received the treatment by date t and = 0 otherwise, D2_i is a binary variable indicating the second individual (that is, D2_i = 1 for i = 2 and = 0 otherwise), B2_t is a binary variable indicating the second time period, the other binary variables are defined similarly, v_it is an error term, and β0, β1, γ2, ..., γn, δ2, ..., δT are unknown coefficients. Including binary variables indicating each individual controls for unobserved individual characteristics that affect Y. Including the binary variables indicating the time period controls for differences from one period to the next that affect the outcome regardless of whether the individual is in the treatment or control group, for example, an economic recession that occurs during the course of a job training program experiment. When T = 2, the time and fixed effects regression model in Equation (13.12) simplifies to the differences-in-differences regression model in Equation (13.4). Methods for estimating β1 in Equation (13.12) are discussed in Section 10.4.

Additional regressors (W) that measure pretreatment characteristics, or characteristics that do not change over time, can be incorporated into the fixed effects regression framework. As discussed in the context of Equation (13.5), in the differences-in-differences specification with additional regressors the W regressors affect the change in Y from one period to the next, not its level. An individual's prior education, for example, is an observable factor that might influence the change in earnings whether or not he or she is in the job training program. Thus, to extend Equation (13.5) to multiple periods, the W regressors are interacted with the time effect binary variables. For convenience, suppose there is a single W regressor; then the multiperiod extension of Equation (13.5) is

Y_it = β0 + β1 X_it + β2 (B2_t × W_i) + ⋯ + β_T (BT_t × W_i) + γ2 D2_i + ⋯ + γn Dn_i + δ2 B2_t + ⋯ + δT BT_t + v_it,   (13.13)

where the regressor B2_t × W_i is the interaction between the binary variable B2_t and W_i. When there are only two time periods, the population regression model with individual fixed effects, time effects, the W regressors, and the W's interacted with the single time binary variable B2_t is the same as the population regression model in Equation (13.5) (Exercise 13.7).

Panel data with multiple time periods also can be used to trace out causal effects over time, thereby asking, for example, whether the effect on income of a job training program persists or wears off over time. The methods for doing this are discussed in Chapter 15 in the context of estimating causal effects using time series data.
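For a balanced panel, β1 in a time and individual fixed effects model like Equation (13.12), without W regressors, can be computed without building the n − 1 individual and T − 1 time dummies, because the dummy-variable regression is numerically equivalent to OLS on entity- and time-demeaned data (the two-way within transformation). The simulation below is an illustrative sketch; all parameter values are invented.

```python
import random

def twoway_fe_slope(y, x):
    """Slope from the two-way fixed effects regression of y on x for a
    balanced panel, via the within transformation applied to y and x:
    z_it - (entity mean) - (time mean) + (grand mean)."""
    n, T = len(y), len(y[0])
    def demean(z):
        zi = [sum(row) / T for row in z]                              # entity means
        zt = [sum(z[i][t] for i in range(n)) / n for t in range(T)]   # time means
        zb = sum(zi) / n                                              # grand mean
        return [[z[i][t] - zi[i] - zt[t] + zb for t in range(T)] for i in range(n)]
    yd, xd = demean(y), demean(x)
    num = sum(yd[i][t] * xd[i][t] for i in range(n) for t in range(T))
    den = sum(xd[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den

# Simulated panel: the first half of the entities is treated from period 3 on.
rng = random.Random(3)
n, T, beta1 = 300, 4, 1.5
x = [[1.0 if (i < n // 2 and t >= 2) else 0.0 for t in range(T)] for i in range(n)]
alpha = [rng.gauss(0.0, 2.0) for _ in range(n)]    # entity fixed effects
delta = [0.0, 0.5, 1.0, 1.5]                       # common time effects
y = [[beta1 * x[i][t] + alpha[i] + delta[t] + rng.gauss(0.0, 1.0)
      for t in range(T)] for i in range(n)]

beta1_hat = twoway_fe_slope(y, x)
```

The estimate lands near the true β1 = 1.5 even though the entity and time effects were never estimated explicitly; for balanced panels the within estimator and the full dummy-variable regression give identical coefficients.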

APPENDIX

13.3  Conditional Mean Independence

This appendix discusses the conditional mean independence assumption mentioned in Section 13.3 and its role in the estimation of a common treatment effect β1. This discussion focuses on the differences estimator with additional regressors [β1 in Equation (13.2)], but the ideas generalize to the differences-in-differences estimator with additional regressors.

The conditional mean independence assumption is that the conditional mean of the error term u_i in Equation (13.2) can depend on the control variables W_1i, ..., W_ri, but not on the treatment variable X_i; specifically,

E(u_i | X_i, W_1i, ..., W_ri) = E(u_i | W_1i, ..., W_ri) = γ0 + γ1 W_1i + ⋯ + γr W_ri.   (13.14)

Under the conditional mean independence assumption, the unobserved characteristics in u_i can be correlated with the observed control variables (the W's), but, given the W's, the conditional mean of u_i does not depend on the treatment.

The assumption of linearity in Equation (13.14) is not restrictive if W_i is a complete set of binary indicator variables. If a W variable is continuous, then the linear conditional expectation in Equation (13.14) can be interpreted as a nonlinear conditional expectation with suitable redefinition of the W's. As discussed in Section 8.2, for example, the additional terms on the right-hand side of Equation (13.14) can be polynomial functions of an original continuous W.

It is useful to consider three cases in which Equation (13.14) holds. First, if the first least squares assumption of Key Concept 6.4 holds, then E(u_i | X_i, W_1i, ..., W_ri) = 0, so Equation (13.14) is satisfied and the conditional expectation equals zero.

Second, Equation (13.14) holds if the treatment X_i is randomly assigned experimentally and thus is distributed independently of all individual characteristics, whether observed and included in the regression (the W variables) or unobserved and in the error term u_i. If X_i is distributed independently of u_i and W_i, then the conditional distribution of u_i given W_i and X_i does not depend on X_i, so in particular the mean of that conditional distribution does not depend on X_i (even though it might depend on W_i). In the job training program example, if treatment is assigned randomly, then it will not pick up the effect of prior education, whether education is an included regressor or an omitted part of the error term.

In the third case, the treatment X_i is assigned randomly, conditional on W_i. In this case, the mean of u_i does not depend on X_i because, given W_i, treatment is randomly assigned: if, conditional on W_i, u_i and X_i are independent, then the conditional distribution of u_i given W_i does not depend on X_i, so its conditional mean does not depend on X_i even though it might depend on W_i. If W_i is a set of indicator variables, conditional mean independence means that X_i is randomly assigned within each group, or "block," defined by the indicator variables, but the assignment probability can vary from one block to the next. Random assignment within blocks of individuals is sometimes called block randomization.

Under the conditional mean independence assumption, β1 is the treatment effect. To see this, compute the conditional expectation of both sides of Equation (13.2):

E(Y_i | X_i, W_1i, ..., W_ri)
= β0 + β1 X_i + β2 W_1i + ⋯ + β_r+1 W_ri + E(u_i | X_i, W_1i, ..., W_ri)
= β0 + β1 X_i + β2 W_1i + ⋯ + β_r+1 W_ri + γ0 + γ1 W_1i + ⋯ + γr W_ri,   (13.15)

where the second equality follows from the conditional mean independence assumption [Equation (13.14)]. Evaluating the conditional expectation in Equation (13.15) at X_i = 1 (treatment group) and at X_i = 0 (control group) and subtracting yields

E(Y_i | X_i = 1, W_1i, ..., W_ri) − E(Y_i | X_i = 0, W_1i, ..., W_ri) = β1.   (13.16)

The left-hand side of Equation (13.16) is the causal effect defined by an experiment in which individuals with given W characteristics are randomly assigned to treatment and control groups, where the causal effect is on the expected value of the outcome. Because this causal effect does not depend on W, it is also the causal effect for a randomly selected member of the population.

When Equation (13.14) holds (along with the second through fourth least squares assumptions in Key Concept 6.4), the differences estimator with additional regressors is consistent. Intuitively, by including W as a regressor, the differences estimator controls for the fact that the treatment probability can depend on W. The mathematical argument that β̂1 is consistent under the conditional mean independence assumption involves matrix algebra and is left to Exercise 18.9.

Conditional mean independence provides a framework for interpreting regressions with observational data in which coefficients on control variables do not have a causal interpretation but other coefficients do, as in Tables 7.1, 8.3, and 9.2.
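The block-randomization case can be illustrated by simulation (all numbers below are invented, not from the text). The probability of treatment depends on a binary W that also shifts the outcome, so the raw difference in means is biased; within each block defined by W, however, treatment is random, and the within-block differences in means recover β1, which is what the regression that controls for W accomplishes.

```python
import random

rng = random.Random(11)
beta1 = 1.0
rows = []                                   # (w, x, y) observations
for _ in range(40000):
    w = 1 if rng.random() < 0.5 else 0
    p_treat = 0.8 if w == 1 else 0.2        # assignment probability varies by block
    x = 1 if rng.random() < p_treat else 0  # but treatment is random within blocks
    y = beta1 * x + 2.0 * w + rng.gauss(0.0, 1.0)   # W also shifts the outcome
    rows.append((w, x, y))

def mean_y(w=None, x=None):
    """Average outcome in the subsample selected by w and/or x."""
    vals = [y for ww, xx, y in rows
            if (w is None or ww == w) and (x is None or xx == x)]
    return sum(vals) / len(vals)

raw_diff = mean_y(x=1) - mean_y(x=0)        # ignores W: biased (about 2.2 here)
within = [mean_y(w=g, x=1) - mean_y(w=g, x=0) for g in (0, 1)]  # near 1.0 each
```

The raw comparison mixes the treatment effect with the fact that treated individuals disproportionately come from the high-W block; conditioning on the block removes that contamination.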

APPENDIX

13.4  IV Estimation When the Causal Effect Varies Across Individuals

This appendix derives the probability limit of the TSLS estimator in Equation (13.11) when there is population heterogeneity in the treatment effect and in the influence of the instrument on the receipt of treatment. Specifically, it is assumed that the IV regression assumptions in Key Concept 12.4 hold, except that Equations (13.9) and (13.10) hold with heterogeneous effects β_1i and π_1i, where (β_1i, π_1i) are distributed independently of (u_i, v_i, Z_i), E(u_i | Z_i) = 0, and E(v_i | Z_i) = 0. Because (X_i, Y_i, Z_i), i = 1, ..., n, are i.i.d. with four moments, the law of large numbers (Key Concept 2.6) applies and

β̂1^TSLS →p σ_ZY / σ_ZX   (13.17)

(see Appendix 3.3 and Exercise 17.2). The task thus is to obtain expressions for σ_ZX and σ_ZY in terms of the moments of π_1i and β_1i.

First consider σ_ZX. Now σ_ZX = E[(Z_i − μ_Z)(X_i − μ_X)] = E[(Z_i − μ_Z)X_i]. Substituting Equation (13.10) into this expression for σ_ZX yields

σ_ZX = E[(Z_i − μ_Z)(π_0 + π_1i Z_i + v_i)]
     = E[π_1i Z_i(Z_i − μ_Z)] + cov(Z_i, v_i)
     = σ²_Z E(π_1i),   (13.18)

where the second equality uses E(Z_i − μ_Z) = 0, and the third equality obtains because cov(Z_i, v_i) = 0 [which follows from the assumption E(v_i | Z_i) = 0; see Equation (2.27)] and because E[π_1i Z_i(Z_i − μ_Z)] = E{E[π_1i Z_i(Z_i − μ_Z) | Z_i]} = E(π_1i) × E[Z_i(Z_i − μ_Z)] = σ²_Z E(π_1i), which uses the law of iterated expectations and the assumption that π_1i is independent of Z_i.

Next consider σ_ZY. Substituting Equation (13.10) into Equation (13.9) yields

Y_i = β_0 + β_1i(π_0 + π_1i Z_i + v_i) + u_i,   (13.19)

so

σ_ZY = E[(Z_i − μ_Z)Y_i]
     = E{(Z_i − μ_Z)[β_0 + β_1i π_0 + β_1i π_1i Z_i + β_1i v_i + u_i]}
     = cov(Z_i, β_1i π_0) + E[β_1i π_1i Z_i(Z_i − μ_Z)] + E[β_1i v_i(Z_i − μ_Z)] + cov(Z_i, u_i).

Because (β_1i π_0) and Z_i are independently distributed, cov(Z_i, β_1i π_0) = 0; because β_1i is distributed independently of v_i and Z_i and E(v_i | Z_i) = 0, E[β_1i v_i(Z_i − μ_Z)] = E(β_1i)E[v_i(Z_i − μ_Z)] = 0; because E(u_i | Z_i) = 0, cov(Z_i, u_i) = 0; and because β_1i and π_1i are distributed independently of Z_i, E[β_1i π_1i Z_i(Z_i − μ_Z)] = σ²_Z E(β_1i π_1i). Thus

σ_ZY = σ²_Z E(β_1i π_1i).   (13.20)

Substituting Equations (13.18) and (13.20) into Equation (13.17) yields β̂1^TSLS →p σ²_Z E(β_1i π_1i)/[σ²_Z E(π_1i)] = E(β_1i π_1i)/E(π_1i), which is the result stated in Equation (13.11).

PART FOUR

Regression Analysis
of Economic Time
Series Data

CHAPTER 14 Introduction to Time Series Regression and Forecasting

CHAPTER 15 Estimation of Dynamic Causal Effects

CHAPTER 16 Additional Topics in Time Series Regression

CHAPTER 14

Introduction to Time Series
Regression and Forecasting

Time series data (data collected for a single entity at multiple points in time) can be used to answer quantitative questions for which cross-sectional data are inadequate. One such question is, what is the causal effect on a variable of interest, Y, of a change in another variable, X, over time? In other words, what is the dynamic causal effect on Y of a change in X? For example, what is the effect on traffic fatalities of a law requiring passengers to wear seatbelts, both initially and subsequently as drivers adjust to the law? Another such question is, what is your best forecast of the value of some variable at a future date? For example, what is your best forecast of next month's rate of inflation, interest rates, or stock prices? Both of these questions, one about dynamic causal effects, the other about economic forecasting, can be answered using time series data. But time series data pose special challenges, and overcoming those challenges requires some new techniques.
Chapters 14-16 introduce techniques for the econometric analysis of time series data and apply these techniques to the problems of forecasting and estimating dynamic causal effects. Chapter 14 introduces the basic concepts and tools of regression with time series data and applies them to economic forecasting. In Chapter 15, the concepts and tools developed in Chapter 14 are applied to the problem of estimating dynamic causal effects using time series data. Chapter 16 takes up some more advanced topics in time series analysis, including forecasting multiple time series and modeling changes in volatility over time.

The empirical problem studied in this chapter is forecasting the rate of inflation, that is, the percentage increase in overall prices. While in a sense forecasting is just an application of regression analysis, forecasting is quite

different from the estimation of causal effects, the focus of this book until now. As discussed in Section 14.1, models that are useful for forecasting need not have a causal interpretation: If you see pedestrians carrying umbrellas, you might forecast rain, even though carrying an umbrella does not cause rain.

Section 14.2 introduces some basic concepts of time series analysis and presents some examples of economic time series. Section 14.3 presents time series regression models in which the regressors are past values of the dependent variable; these "autoregressive" models use the history of inflation to forecast its future. Often, forecasts based on autoregressions can be improved by adding additional predictor variables and their past values, or "lags," as regressors, and these so-called autoregressive distributed lag models are introduced in Section 14.4. In the inflation rate example, we find that inflation forecasts made using lagged values of the rate of unemployment in addition to lagged inflation (that is, forecasts based on an empirical Phillips curve) improve upon the autoregressive inflation forecasts. A practical issue is deciding how many past values to include in autoregressions and autoregressive distributed lag models, and Section 14.5 describes methods for making this decision.

The assumption that the future will be like the past is an important one in time series regression, sufficiently so that it is given its own name, "stationarity." Time series variables can fail to be stationary in various ways, but two are especially relevant for regression analysis of economic time series data: (1) the series can have trends, and (2) the population regression can be unstable over time; that is, the population regression can have breaks. These departures from stationarity jeopardize forecasts and inferences based on time series regression. Fortunately, there are statistical procedures for detecting trends and breaks and, once detected, for adjusting the model specification. These procedures are presented in Sections 14.6 and 14.7.


14.1 Using Regression Models for Forecasting

The empirical application of Chapters 4-9 focused on estimating the causal effect on test scores of the student-teacher ratio. The simplest regression model in Chapter 4 related test scores to the student-teacher ratio (STR):

TestScorê = 698.9 - 2.28 × STR.   (14.1)

As was discussed in Chapter 6, a school superintendent, contemplating hiring more teachers to reduce class sizes, would not consider this equation to be very helpful. The estimated slope coefficient in Equation (14.1) fails to provide a useful estimate of the causal effect on test scores because of probable omitted variable bias arising from the omission of school and student characteristics that are determinants of test scores.
In contrast, as was discussed in Chapter 9, a parent who is considering moving to a school district might find Equation (14.1) more helpful. Even though the coefficient does not have a causal interpretation, the regression could help the parent forecast test scores in a district for which they are not publicly available. More generally, a regression model can be useful for forecasting even if none of its coefficients have causal interpretations. From the perspective of forecasting, what is important is that the model provides as accurate a forecast as possible. Although there is no such thing as a perfect forecast, regression models can nevertheless provide forecasts that are accurate and reliable.

The applications in this chapter differ from the test score/class size prediction problem because this chapter focuses on using time series data to forecast future events. For example, the prospective parent actually would be interested in test scores next year, after his or her child has enrolled in a school. Of course, those tests have not yet been given, so the parent must forecast the scores using currently available information. If test scores are available for past years, then a good starting point is to use data on current and past test scores to forecast future test scores.

This reasoning leads directly to the autoregressive models presented in Section 14.3, in which past values of a variable are used in a linear regression to forecast future values of the series. The next step, which is taken in Section 14.4, is to extend these models to include additional predictor variables such as data on class size. Like Equation (14.1), such a regression model can produce accurate and reliable forecasts even if its coefficients have no causal interpretation. In Chapter 15, we return to problems like that faced by the school superintendent and discuss the estimation of causal effects using time series variables.


14.2 Introduction to Time Series Data and Serial Correlation

This section introduces some basic concepts and terminology that are used in time series econometrics. A good place to start any analysis of time series data is by plotting the data, so that is where we begin.

The Rates of Inflation and Unemployment in the United States

Figure 14.1a plots the U.S. rate of inflation, the annual percentage change in prices in the United States as measured by the Consumer Price Index (CPI), from 1960 to 2004 (the data are described in Appendix 14.1). The inflation rate was low in the 1960s, rose through the 1970s to a postwar peak of 15.5% in the first quarter of 1980 (that is, January, February, and March 1980), and then fell to less than 3% by the end of the 1990s. As can be seen in Figure 14.1a, the inflation rate also can fluctuate by one percentage point or more from one quarter to the next.

The U.S. unemployment rate, the fraction of the labor force out of work as measured in the Current Population Survey (see Appendix 3.1), is plotted in Figure 14.1b. Changes in the unemployment rate are mainly associated with the business cycle in the United States. For example, the unemployment rate increased during the recessions of 1960-1961, 1970, 1974-1975, the twin recessions of 1980 and 1981-1982, and the recessions of 1990-1991 and 2001, episodes denoted by shading in Figure 14.1b.

Lags, First Differences, Logarithms, and Growth Rates

The observation on the time series variable Y made at date t is denoted Y_t, and the total number of observations is denoted T. The interval between observations, that is, the period of time between observation t and observation t + 1, is some unit of time such as weeks, months, quarters (three-month units), or years. For example, the inflation data studied in this chapter are quarterly, so the unit of time (a "period") is a quarter of a year.

Special terminology and notation are used to indicate future and past values of Y. The value of Y in the previous period is called its first lagged value or, more simply, its first lag, and is denoted Y_{t-1}. Its jth lagged value (or simply its jth lag) is its value j periods ago, which is Y_{t-j}. Similarly, Y_{t+1} denotes the value of Y one period into the future.


FIGURE 14.1 Inflation and Unemployment in the United States, 1960-2004
(a) U.S. CPI Inflation Rate. (b) U.S. Unemployment Rate.
[Figure: two time series plots, percent against year.] Price inflation in the United States (Figure 14.1a) drifted upward from 1960 until 1980, and then fell sharply during the early 1980s. The unemployment rate in the United States (Figure 14.1b) rises during recessions (the shaded episodes) and falls during expansions.

KEY CONCEPT 14.1

LAGS, FIRST DIFFERENCES, LOGARITHMS, AND GROWTH RATES


The fi~t lag of a time series Y, is Yt-1: its ,-u..lag is Y,_,.

The first diffcrl!nce of a series. ~ Y,. is its change


thatts. ~ Y, = Y,- Y, 1.

betw~en

pcriiJds r

I and 1

The lir..t difference of Lhe logarithm of Y, is .lJn( Y,) = In( Y,) - In( Y,_ ).
The percentage change of a time series Y, betwee n period!> t - 1 and r i
approximately lOOAln( Y,). where the approximation is most accurate \\hen
the percentage change is small.

The change in the value of Y between period t - 1 and period t is Y_t - Y_{t-1}; this change is called the first difference in the variable Y_t. In time series data, Δ is used to represent the first difference, so that ΔY_t = Y_t - Y_{t-1}.

Economic time series are often analyzed after computing their logarithms or the changes in their logarithms. One reason for this is that many economic series, such as gross domestic product (GDP), exhibit growth that is approximately exponential; that is, over the long run the series tends to grow by a certain percentage per year on average; if so, the logarithm of the series grows approximately linearly. Another reason is that the standard deviation of many economic time series is approximately proportional to its level; that is, the standard deviation is well expressed as a percentage of the level of the series; if so, then the standard deviation of the logarithm of the series is approximately constant. In either case, it is useful to transform the series so that changes in the transformed series are proportional (or percentage) changes in the original series, and this is achieved by taking the logarithm of the series.¹

Lags, first differences, and growth rates are summarized in Key Concept 14.1. Lags, changes, and percentage changes are illustrated using the U.S. inflation rate in Table 14.1. The first column shows the date, or period, where the first quarter of 2004 is denoted 2004:I, the second quarter of 2004 is denoted 2004:II, and

¹The change of the logarithm of a variable is approximately equal to the proportional change of that variable; that is, ln(X + a) ≅ ln(X) + a/X, where the approximation works best when a/X is small [see Equation (8.16) and the surrounding discussion]. Now replace X with Y_{t-1} and a with ΔY_t, and note that Y_t = Y_{t-1} + ΔY_t. This means that the proportional change in the series Y_t between periods t - 1 and t is approximately ln(Y_t) - ln(Y_{t-1}) = ln(Y_{t-1} + ΔY_t) - ln(Y_{t-1}) ≅ ΔY_t/Y_{t-1}. The expression ln(Y_t) - ln(Y_{t-1}) is the first difference of ln(Y_t), Δln(Y_t); thus Δln(Y_t) ≅ ΔY_t/Y_{t-1}. The percentage change is 100 times the fractional change, so the percentage change in the series Y_t is approximately 100Δln(Y_t).
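The definitions in Key Concept 14.1 translate directly into code. A small sketch (in Python, which the text itself does not use) computing lags, first differences, log differences, and percentage changes for the CPI values listed in Table 14.1:

```python
import math

# Quarterly CPI values, 2004:I-2005:I (Table 14.1)
y = [186.57, 188.60, 189.37, 191.03, 192.17]

# First lag Y_{t-1}, first difference dY_t = Y_t - Y_{t-1},
# and log difference dln(Y_t) = ln(Y_t) - ln(Y_{t-1})
first_lag = y[:-1]
first_diff = [y[t] - y[t - 1] for t in range(1, len(y))]
log_diff = [math.log(y[t] / y[t - 1]) for t in range(1, len(y))]

# Exact percentage change vs. the approximation 100 * dln(Y_t)
pct_change = [100 * d / lag for d, lag in zip(first_diff, first_lag)]
approx_pct = [100 * ld for ld in log_diff]
```

For quarterly CPI changes of about 1%, the exact and approximate percentage changes agree closely, as Key Concept 14.1 states.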

TABLE 14.1 Inflation in the United States in 2004 and the First Quarter of 2005

Quarter  | U.S. CPI | Rate of Inflation at an Annual Rate (Inf_t) | First Lag (Inf_{t-1}) | Change in Inflation (ΔInf_t)
2004:I   | 186.57   | 3.8  | 0.9 | 2.9
2004:II  | 188.60   | 4.4  | 3.8 | 0.6
2004:III | 189.37   | 1.6  | 4.4 | -2.8
2004:IV  | 191.03   | 3.5  | 1.6 | 1.9
2005:I   | 192.17   | 2.4  | 3.5 | -1.1

The annualized rate of inflation is the percentage change in the CPI from the previous quarter to the current quarter, times four. The first lag of inflation is its value in the previous quarter, and the change in inflation is the current inflation rate minus its first lag. All entries are rounded to the nearest decimal.

so forth. The second column shows the value of the CPI in that quarter, and the third column shows the rate of inflation. For example, from the first to the second quarter of 2004, the index increased from 186.57 to 188.60, a percentage increase of 100 × (188.60 - 186.57)/186.57 ≅ 1.09%. This is the percentage increase from one quarter to the next. It is conventional to report rates of inflation (and other growth rates in macroeconomic time series) on an annual basis, which is the percentage increase in prices that would occur over a year, if the series were to continue to increase at the same rate. Because there are four quarters a year, the annualized rate of inflation in 2004:II is 1.09 × 4 = 4.36, or 4.4% per year after rounding.

This percentage change can also be computed using the differences-of-logarithms approximation in Key Concept 14.1. The difference in the logarithm of the CPI from 2004:I to 2004:II is ln(188.60) - ln(186.57) = 0.0108, yielding the approximate quarterly percentage difference 100 × 0.0108 = 1.08%. On an annualized basis, this is 1.08 × 4 = 4.32, or 4.3% after rounding, essentially the same as obtained by directly computing the percentage growth. These calculations can be summarized as

Annualized rate of inflation = Inf_t = 400[ln(CPI_t) - ln(CPI_{t-1})] = 400Δln(CPI_t),   (14.2)

where CPI_t is the value of the Consumer Price Index at date t. The factor of 400 arises from converting fractional change to percentages (multiplying by 100) and converting quarterly percentage change to an equivalent annual rate (multiplying by 4).

The final two columns of Table 14.1 illustrate lags and changes. The first lag of inflation in 2004:II is 3.8%, the inflation rate in 2004:I. The change in the rate of inflation from 2004:I to 2004:II was 4.4% - 3.8% = 0.6%.
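The arithmetic in this passage can be reproduced in a few lines; the following sketch (Python is used for illustration and is not part of the text) checks the direct and logarithmic versions of the annualized 2004:II inflation rate:

```python
import math

cpi_2004_1, cpi_2004_2 = 186.57, 188.60  # CPI in 2004:I and 2004:II (Table 14.1)

# Direct computation: quarterly percentage growth, annualized (times 4)
annualized_direct = 4 * 100 * (cpi_2004_2 - cpi_2004_1) / cpi_2004_1

# Equation (14.2): Inf_t = 400 * [ln(CPI_t) - ln(CPI_{t-1})]
annualized_log = 400 * (math.log(cpi_2004_2) - math.log(cpi_2004_1))
```

The two computations give about 4.4% and 4.3% after rounding, matching the numbers in the text.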


KEY CONCEPT 14.2

AUTOCORRELATION (SERIAL CORRELATION) AND AUTOCOVARIANCE

The jth autocovariance of a series Y_t is the covariance between Y_t and its jth lag, Y_{t-j}, and the jth autocorrelation coefficient is the correlation between Y_t and Y_{t-j}. That is,

jth autocovariance = cov(Y_t, Y_{t-j})   (14.3)

jth autocorrelation = ρ_j = corr(Y_t, Y_{t-j}) = cov(Y_t, Y_{t-j}) / √[var(Y_t)var(Y_{t-j})].   (14.4)

The jth autocorrelation coefficient is sometimes called the jth serial correlation coefficient.

Autocorrelation

In time series data, the value of Y in one period typically is correlated with its value in the next period. The correlation of a series with its own lagged values is called autocorrelation or serial correlation. The first autocorrelation (or autocorrelation coefficient) is the correlation between Y_t and Y_{t-1}, that is, the correlation between values of Y at two adjacent dates. The second autocorrelation is the correlation between Y_t and Y_{t-2}, and the jth autocorrelation is the correlation between Y_t and Y_{t-j}. Similarly, the jth autocovariance is the covariance between Y_t and Y_{t-j}. Autocorrelation and autocovariance are summarized in Key Concept 14.2.

The jth population autocovariances and autocorrelations in Key Concept 14.2 can be estimated by the jth sample autocovariances and autocorrelations, côv(Y_t, Y_{t-j}) and ρ̂_j:

côv(Y_t, Y_{t-j}) = (1/T) Σ_{t=j+1}^{T} (Y_t - Ȳ_{j+1,T})(Y_{t-j} - Ȳ_{1,T-j}),   (14.5)

ρ̂_j = côv(Y_t, Y_{t-j}) / vâr(Y_t),   (14.6)

where Ȳ_{j+1,T} denotes the sample average of Y_t computed over observations t = j + 1, ..., T, and where vâr(Y_t) is the sample variance of Y.²

²The summation in Equation (14.5) is divided by T, whereas in the usual formula for the sample covariance [see Equation (3.24)] the summation is divided by the number of observations in the summation, minus a degrees-of-freedom adjustment. The formula in Equation (14.5) is conventional for the purpose of computing the autocovariance. Equation (14.6) uses the assumption that var(Y_t) and var(Y_{t-j}) are the same, an implication of the assumption that Y_t is stationary, which is discussed in Section 14.6.
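Equations (14.5) and (14.6) are straightforward to implement. A sketch of the two estimators, applied here to a short made-up series (the text's inflation data are not reproduced here):

```python
def sample_autocovariance(y, j):
    # Equation (14.5): (1/T) * sum_{t=j+1}^{T} (Y_t - Ybar_{j+1,T})(Y_{t-j} - Ybar_{1,T-j}),
    # with the summation divided by T (see the footnote in the text).
    T = len(y)
    ybar_late = sum(y[j:]) / (T - j)       # average of Y_{j+1}, ..., Y_T
    ybar_early = sum(y[:T - j]) / (T - j)  # average of Y_1, ..., Y_{T-j}
    return sum((y[t] - ybar_late) * (y[t - j] - ybar_early) for t in range(j, T)) / T

def sample_autocorrelation(y, j):
    # Equation (14.6): jth autocovariance divided by the sample variance
    # (the lag-0 autocovariance).
    return sample_autocovariance(y, j) / sample_autocovariance(y, 0)

# A short, made-up series of quarterly inflation rates (illustrative only)
series = [3.8, 4.4, 1.6, 3.5, 2.4, 3.0, 2.1, 3.3]
rho_1 = sample_autocorrelation(series, 1)
```

By construction the lag-0 autocorrelation is exactly 1, and for this illustrative series the first autocorrelation is negative, like the change of inflation in Table 14.2.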

TABLE 14.2 First Four Sample Autocorrelations of the U.S. Inflation Rate and Its Change, 1960:I-2004:IV

Lag | Inflation Rate (Inf_t) | Change of Inflation Rate (ΔInf_t)
1   | 0.84 | -0.26
2   | 0.76 | -0.25
3   | 0.76 | 0.29
4   | 0.67 | -0.06

The first four sample autocorrelations of the inflation rate and of the change in the inflation rate are listed in Table 14.2. These entries show that inflation is strongly positively autocorrelated: The first autocorrelation is 0.84. The sample autocorrelation declines as the lag increases, but it remains large even at a lag of four quarters. The change in inflation is negatively autocorrelated: An increase in the rate of inflation in one quarter tends to be associated with a decrease in the next quarter.

At first, it might seem contradictory that the level of inflation is strongly positively correlated but its change is negatively correlated. These two autocorrelations, however, measure different things. The strong positive autocorrelation in inflation reflects the long-term trends in inflation evident in Figure 14.1a: Inflation was low in the first quarter of 1965 and again in the second; it was high in the first quarter of 1981 and again in the second. In contrast, the negative autocorrelation of the change of inflation means that, on average, an increase in inflation in one quarter is associated with a decrease in inflation in the next.

Other Examples of Economic Time Series


Ecnnnnw: t1me "'-TI"' differ greath. Four e\amplc' of cclmnmic tim~. 'eric, arc
pltlttc:tl in Fig.urc 14.::? th e L'.S. federal fund<; inteft.''t r.ctc; t h~.: r.t!C of 1. \Ch.lni!C
hci\\Ct: n the dollar :md the Briti. h pound; th~ lol!arithm ol Japane'e ~TlW\ dumcstic pwduct: and the d a11~ return on lhe St andanJ t~nd Poor\ 5UU (~~ P 500) ~lock
mnrkttcndc:x.
The U.S. kd..:rat funds rate (Figure! 14 2a) i~ the intc:rc'l r:tll." that hanb pay
to esH:h other to horrow funds owmighl Thl\ r.uc: b important h~.:c11usc it i' contr,,llc:d hy the: Federal Re_c:rve and i-. the F~od\ primal) m(IOI."':trv policy instrum~onl. II ~ou Ct.,mpare the plot' llf lhe f~.:dcrnl funds r.tl
nJ the r llt!~ of
ununplll\mcnlilnu inflation m Fcgurc 14.1. you" til 'I."'- tll.lt !ih 1rp tnc:r~oa:o.~.:<. m th~
fl:c.llr.tllunJ<> rate often haw hccn d'soctatcu with ... uh'ClJIH. nl rccc-. ...Hln ....


FIGURE 14.2 Four Economic Time Series
(a) Federal Funds Interest Rate. (b) U.S. Dollar/British Pound Exchange Rate. (c) Logarithm of GDP in Japan. (d) Percentage Changes in Daily Values of the NYSE Composite Stock Index.
[Figure: four time series plots.] The four time series have markedly different patterns. The federal funds rate (Figure 14.2a) has a pattern similar to price inflation. The exchange rate between the U.S. dollar and the British pound (Figure 14.2b) shows a discrete change after the 1972 collapse of the Bretton Woods system of fixed exchange rates. The logarithm of GDP in Japan (Figure 14.2c) shows relatively smooth growth, although the growth rate decreases in the 1970s and again in the 1990s. The daily percentage changes in the NYSE stock price index (Figure 14.2d) are essentially unpredictable, but its variance changes: This series shows "volatility clustering."

The dollar/pound exchange rate (Figure 14.2b) is the price of a British pound in U.S. dollars. Before 1972, the developed economies ran a system of fixed exchange rates, called the "Bretton Woods" system, under which governments worked to keep exchange rates from fluctuating. In 1972, inflationary pressures led to the breakdown of this system; thereafter, the major currencies were allowed to "float," that is, their values were determined by the supply and demand for currencies in the market for foreign exchange. Prior to 1972, the exchange rate was approximately constant, with the exception of a single devaluation in 1967, in which the official value of the pound, relative to the dollar, was decreased to $2.40.


Quarterly Japanese GDP (Figure 14.2c) is the total value of goods and services produced in Japan during a quarter. GDP is the broadest measure of total economic activity. The logarithm of the series is plotted in Figure 14.2c, and changes in this series can be interpreted as (decimal) growth rates. During the 1960s and early 1970s, Japanese GDP grew quickly, but this growth slowed in the late 1970s and 1980s. Growth slowed further during the 1990s, averaging only 1.2% per year from 1990 to 2004.

The NYSE Composite market index is a broad index of the share prices of all firms traded on the New York Stock Exchange. Figure 14.2d plots the daily percentage changes in this index for trading days from January 2, 1990, to November 11, 2005 (a total of 4003 observations). Unlike the other series in Figure 14.2, there is very little serial correlation in these daily percentage changes: If there were, then you could predict them using past daily changes and make money by buying when you expect the market to rise and selling when you expect it to fall. Although the changes are essentially unpredictable, inspection of Figure 14.2d reveals patterns in their volatility. For example, the standard deviation of daily percentage changes was relatively large in 1990-1991 and 1998-2003, and relatively small in 1995 and 2005. This "volatility clustering" is found in many financial time series, and econometric models for modeling this special type of heteroskedasticity are taken up in Section 16.5.

14.3 Autoregressions

What will the rate of price inflation, the percentage increase in overall prices, be next year? Wall Street investors rely on forecasts of inflation when deciding how much to pay for bonds. Economists at central banks, like the U.S. Federal Reserve Bank, use inflation forecasts when they set monetary policy. Firms use inflation forecasts when they forecast sales of their products, and local governments use inflation forecasts when they develop their budgets for the upcoming year. In this section, we consider forecasts made using an autoregression, a regression model that relates a time series variable to its past values.

The First Order Autoregressive Model

If you want to predict the future of a time series, a good place to start is in the immediate past. For example, if you want to forecast the change in inflation from this quarter to the next, you might see whether inflation rose or fell last quarter. A systematic way to forecast the change in inflation, ΔInf_t, using the previous quarter's change, ΔInf_{t-1}, is to estimate an OLS regression of ΔInf_t on ΔInf_{t-1}. Estimated using data from 1962 to 2004, this regression is


ΔInf̂_t = 0.017 - 0.238 ΔInf_{t-1},   (14.7)
         (0.126)  (0.096)

where, as usual, standard errors are given in parentheses under the estimated coefficients, and ΔInf̂_t is the predicted value of ΔInf_t based on the estimated regression line. The model in Equation (14.7) is called a first order autoregression: an autoregression because it is a regression of the series onto its own lag, ΔInf_{t-1}, and first order because only one lag is used as a regressor. The coefficient in Equation (14.7) is negative, so an increase in the inflation rate in one quarter is associated with a decline in the inflation rate in the next quarter.

A first order autoregression is abbreviated AR(1), where the "1" indicates that it is first order. The population AR(1) model for the series Y_t is

Y_t = β_0 + β_1 Y_{t-1} + u_t,   (14.8)

where u_t is an error term.

Forecasts and forecast errors. Suppose you have historical data on Y and you want to forecast its future value. If Y_t follows the AR(1) model in Equation (14.8) and if β_0 and β_1 are known, then the forecast of Y_{T+1} based on Y_T is β_0 + β_1 Y_T.

In practice, β_0 and β_1 are unknown, so forecasts must be based on estimates of β_0 and β_1. We will use the OLS estimators β̂_0 and β̂_1, which are constructed using historical data. In general, Ŷ_{T+1|T} will denote the forecast of Y_{T+1} based on information through period T, using a model estimated with data through period T. Accordingly, the forecast based on the AR(1) model in Equation (14.8) is

Ŷ_{T+1|T} = β̂_0 + β̂_1 Y_T,   (14.9)

where β̂_0 and β̂_1 are estimated using historical data through time T.

The forecast error is the mistake made by the forecast; this is the difference between the value of Y_{T+1} that actually occurred and its forecasted value based on Y_T:

Forecast error = Y_{T+1} - Ŷ_{T+1|T}.   (14.10)

Forecasts vs. predicted values. The forecast is not an OLS predicted value, and the forecast error is not an OLS residual. OLS predicted values are calculated for the observations in the sample used to estimate the regression. In contrast, the forecast is made for some date beyond the data set used to estimate the regression, so the data on the actual value of the forecasted dependent variable are not in the sample used to estimate the regression. Similarly, the OLS residual is the difference between the actual value of Y and its predicted value for observations in the sample, whereas the forecast error is the difference between the future value of Y, which is not contained in the estimation sample, and the forecast of that future value. Said differently, forecasts and forecast errors pertain to "out-of-sample" observations, whereas predicted values and residuals pertain to "in-sample" observations.

Root mean squared forecast error. The root mean squared forecast error (RMSFE) is a measure of the size of the forecast error, that is, of the magnitude of a typical mistake made using a forecasting model. The RMSFE is the square root of the mean squared forecast error:

RMSFE = √E[(Y_{T+1} - Ŷ_{T+1|T})²].   (14.11)

The RMSFE has two sources of error: the error arising because future values of u_t are unknown and the error in estimating the coefficients β_0 and β_1. If the first source of error is much larger than the second, as it can be if the sample size is large, then the RMSFE is approximately √var(u_t), the standard deviation of the error u_t in the population autoregression [Equation (14.8)]. The standard deviation of u_t is in turn estimated by the standard error of the regression (SER; see Section 4.3). Thus, if uncertainty arising from estimating the regression coefficients is small enough to be ignored, the RMSFE can be estimated by the standard error of the regression. Estimation of the RMSFE including both sources of forecast error is taken up in Section 14.4.

Application to inflation. What is the forecast of inflation in the first quarter of 2005 (2005:I) that a forecaster would have made in 2004:IV, based on the estimated AR(1) model in Equation (14.7) (which was estimated using data through 2004:IV)? From Table 14.1, the inflation rate in 2004:IV was 3.5% (so Inf_{2004:IV} = 3.5%), an increase of 1.9 percentage points from 2004:III (so ΔInf_{2004:IV} = 1.9). Plugging these values into Equation (14.7), the forecast of the change in inflation from 2004:IV to 2005:I is ΔInf̂_{2005:I|2004:IV} = 0.017 - 0.238 × ΔInf_{2004:IV} = 0.017 - 0.238 × 1.9 ≅ -0.4 (rounded to the nearest tenth). The predicted rate of inflation is the last rate of inflation plus its predicted change:

Inf̂_{T+1|T} = Inf_T + ΔInf̂_{T+1|T}.   (14.12)



Because Tnf-:.tt:l4 rv = 3.5% and the predicted change in the inflation rate from
2004:1V to 2~1 is - 0.4, the predicted rate of mflation tn 2005:J i" lnj 2' ,. 1 ==
lnf'Zf:J.: 1v + 6./n( q = 3.5% - 0.4% = 3.1 o/o. TilUs, the AR(l) model fon..cd ... ts
that inflauon Will drop slightly from 3.5% in 2004:1V to 3.1 'o in 2005.!.
How accurate was this A R(l) forecast? From Table 14. 1. the actual value ol
mfla tion in2005:1 was 2.4%,so the AR( I) forecast is high by 0.7 pt!rcentage point:
that is. the forecast error is - 0.7. The R2 of the AR( 1) model in Equatron (14.7) ll>
only 0.05, so the lagged c hange of inflation explains a very small fraction of the
variation in inflation in the sample used to fit the autoregl ession Thb Jo,.,. 7f ~
con~btcnt with the poor forecast of inflation in 2005:1 produced using E qu.ninn
(14.7). More generally, the low R2 suggests 1hat this A R( t) model will fo recast only
a ~mnll amount of the variation in the change of inflation.
TI1e standard error of the regression in Equation (14.7) is 1.65; ignoring unc~.or
tainty a n siog from estimation of Ihe coefficiems, our estimate o( the RMSFE for
rorecasts based o n Equation (14.7) therefore is 1.65 percentage points.
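The forecast arithmetic above can be sketched in a few lines (Python is used for illustration; it is not part of the text):

```python
inf_2004q4 = 3.5   # inflation in 2004:IV, percent (Table 14.1)
dinf_2004q4 = 1.9  # change in inflation in 2004:IV (Table 14.1)

# Equation (14.7): forecasted change in inflation from 2004:IV to 2005:I
dinf_forecast = 0.017 - 0.238 * dinf_2004q4          # about -0.4

# Equation (14.12): forecasted inflation = last inflation + forecasted change,
# rounding the change to the nearest tenth as in the text
inf_forecast = inf_2004q4 + round(dinf_forecast, 1)  # 3.5 - 0.4 = 3.1

# Forecast error: actual minus forecast
actual_2005q1 = 2.4
forecast_error = actual_2005q1 - inf_forecast        # about -0.7
```

This reproduces the text's numbers: a forecast of 3.1% for 2005:I and a forecast error of -0.7 percentage point.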

The p

th

Order Autoregressive Model

The AR(1) model uses Y_{t-1} to forecast Y_t, but doing so ignores potentially useful information in the more distant past. One way to incorporate this information is to include additional lags in the AR(1) model; this yields the pth order autoregressive, or AR(p), model.

The pth order autoregressive model [the AR(p) model] represents Y_t as a linear function of p of its lagged values; that is, in the AR(p) model, the regressors are Y_{t-1}, Y_{t-2}, ..., Y_{t-p}, plus an intercept. The number of lags, p, included in an AR(p) model is called the order, or lag length, of the autoregression.

For example, an AR(4) model of the change in inflation uses four lags of the change in inflation as regressors. Estimated by OLS over the period 1962-2004, the AR(4) model is

ΔInf̂_t = 0.02 - 0.26 ΔInf_{t-1} - 0.32 ΔInf_{t-2} + 0.16 ΔInf_{t-3} - 0.03 ΔInf_{t-4}    (14.13)
         (0.12)  (0.09)           (0.08)            (0.08)            (0.09)

The coefficients on the final three additional lags in Equation (14.13) are jointly significantly different from zero at the 5% significance level: the F-statistic is 6.91 (p-value < 0.001). This is reflected in an improvement in the R̄² from 0.05 for the AR(1) model in Equation (14.7) to 0.18 for the AR(4). Similarly, the SER of the AR(4) model in Equation (14.13) is 1.52, an improvement over the SER of the AR(1) model, which is 1.65.

The AR(p) model is summarized in Key Concept 14.3.


KEY CONCEPT 14.3

AUTOREGRESSIONS

The pth order autoregressive model [the AR(p) model] represents Y_t as a linear function of p of its lagged values:

Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + ... + β_p Y_{t-p} + u_t,    (14.14)

where E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0. The number of lags p is called the order, or the lag length, of the autoregression.

Properties of the forecast and error term in the AR(p) model. The assumption that the conditional expectation of u_t is zero given past values of Y_t [that is, E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0] has two important implications.

The first implication is that the best forecast of Y_{T+1} based on its entire history depends on only the most recent p past values. Specifically, let Y_{T+1|T} = E(Y_{T+1} | Y_T, Y_{T-1}, ...) denote the conditional mean of Y_{T+1} given its entire history. Then Y_{T+1|T} has the smallest RMSFE of any forecast based on the history of Y (Exercise 14.5). If Y_t follows an AR(p), then the best forecast of Y_{T+1} based on Y_T, Y_{T-1}, ... is

Y_{T+1|T} = β_0 + β_1 Y_T + β_2 Y_{T-1} + ... + β_p Y_{T-p+1},    (14.15)

which follows from the AR(p) model in Equation (14.14) and the assumption that E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0. In practice, the coefficients β_0, β_1, ..., β_p are unknown, so actual forecasts from an AR(p) use Equation (14.15) with estimated coefficients.

The second implication is that the errors u_t are serially uncorrelated, a result that follows from Equation (2.27) (Exercise 14.5).

Application to inflation. What is the forecast of inflation in 2005:I using data through 2004:IV, based on the AR(4) model of inflation in Equation (14.13)? To compute this forecast, substitute the values of the change of inflation in each of the four quarters of 2004 into Equation (14.13): ΔInf̂_{2005:I|2004:IV} = 0.02 - 0.26 ΔInf_{2004:IV} - 0.32 ΔInf_{2004:III} + 0.16 ΔInf_{2004:II} - 0.03 ΔInf_{2004:I} = 0.02 - 0.26 × 1.9 - 0.32 × (-2.8) + 0.16 × 0.6 - 0.03 × 2.9 ≅ 0.4, where the 2004 values for the change of inflation are taken from the final column of Table 14.1.

The corresponding forecast of inflation in 2005:I is the value of inflation in 2004:IV, plus the forecasted change, that is, 3.5% + 0.4% = 3.9%. The forecast error is the actual value, 2.4%, minus the forecast, or 2.4% - 3.9% = -1.5 percentage points, greater in absolute value than the AR(1) forecast error of -0.7 percentage point.
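This computation can be checked with a small helper that evaluates Equation (14.15) at estimated coefficients; the following Python sketch (not part of the original text; names are illustrative) uses the AR(4) estimates from Equation (14.13) and the 2004 values from Table 14.1:

```python
# One-step-ahead AR(p) forecast, Equation (14.15), at estimated coefficients
def ar_forecast(intercept, coeffs, recent_values):
    """recent_values[0] is the most recent lag (Y_T), recent_values[1] is Y_T-1, etc."""
    return intercept + sum(b * y for b, y in zip(coeffs, recent_values))

coeffs = [-0.26, -0.32, 0.16, -0.03]   # AR(4) coefficients on lags t-1, ..., t-4
dinf_2004 = [1.9, -2.8, 0.6, 2.9]      # changes of inflation: 2004:IV, III, II, I

dinf_forecast = ar_forecast(0.02, coeffs, dinf_2004)
inf_forecast = 3.5 + dinf_forecast     # inflation in 2004:IV plus the forecasted change
print(round(dinf_forecast, 1))  # 0.4
print(round(inf_forecast, 1))   # 3.9
```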


Can You Beat the Market? Part I


Have you ever dreamed of getting rich quick by beating the stock market? If you think that the market will be going up, you should buy stocks today and sell them later, before the market turns down. If you are good at forecasting swings in stock prices, then this active trading strategy will produce better returns than a passive "buy and hold" strategy in which you purchase stocks and just hang onto them. The trick, of course, is having a reliable forecast of future stock returns.

Forecasts based on past values of stock returns are sometimes called "momentum" forecasts: If the value of a stock rose this month, perhaps it has momentum and will also rise next month. If so, then returns will be autocorrelated, and the autoregressive model will provide useful forecasts. You can implement a momentum-based strategy for a specific stock or for a stock index that measures the overall value of the market.

continued on next page

TABLE 14.3  Autoregressive Models of Monthly Excess Stock Returns, 1960:1-2002:12

Dependent variable: excess returns on the CRSP value-weighted index

Specification:    (1) AR(1)    (2) AR(2)    (3) AR(4)

Regressors: excess return_{t-1} [all three specifications]; excess return_{t-2} [specifications (2) and (3)]; excess return_{t-3} and excess return_{t-4} [specification (3)]; and an intercept. The final rows report, for each specification, the F-statistic (with p-value) testing that all coefficients are zero, and the adjusted R². [Numeric entries illegible in the source.]

Excess returns are measured in percent per month. The data are described in Appendix 14.1. All regressions are estimated over 1960:1-2002:12, with earlier observations used for initial values of lagged variables. Entries in the regressor rows are coefficients, with heteroskedasticity-robust standard errors in parentheses.


Table 14.3 presents autoregressive models of the excess return on a broad-based index of stock prices called the CRSP value-weighted index, using monthly data from 1960:1 to 2002:12. The monthly excess return is what you earn, in percentage terms, by purchasing a stock at the end of the previous month and selling it at the end of this month, minus what you would have earned had you purchased a safe asset (a U.S. Treasury bill). The return on the stock includes the capital gain (or loss) from the change in price, plus any dividends you receive during the month. The data are described further in Appendix 14.1.

Sadly, the results in Table 14.3 are negative. The coefficient on lagged returns in the AR(1) model is not statistically significant, and we cannot reject the null hypothesis that the coefficients on lagged returns are all zero in the AR(2) or AR(4) model. In fact, the adjusted R² of one of the models is negative and the other two are only slightly positive, suggesting that none of these models is useful for forecasting.

These negative results are consistent with the theory of efficient capital markets, which holds that excess returns should be unpredictable because stock prices already embody all currently available information. The reasoning is simple: If market participants think that a stock will have a positive excess return next month, then they will buy that stock now; but doing so will drive up the price of the stock to exactly the point at which there is no expected excess return. As a result, we should not be able to forecast future excess returns by using past publicly available information, nor can we, at least using the regressions in Table 14.3.

14.4 Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model

Economic theory often suggests other variables that could help to forecast the variable of interest. These other variables, or predictors, can be added to an autoregression to produce a time series regression model with multiple predictors. When other variables and their lags are added to an autoregression, the result is an autoregressive distributed lag model.

Forecasting Changes in the Inflation Rate Using Past Unemployment Rates

A high value of the unemployment rate tends to be associated with a future decline in the rate of inflation. This negative relationship, known as the short-run Phillips curve, is evident in the scatterplot of Figure 14.3, in which year-to-year changes in the rate of price inflation are plotted against the rate of unemployment


in the previous year. For example, in 1982 the unemployment rate averaged 9.7%, and during the next year the rate of inflation fell by 2.9%. Overall, the correlation in Figure 14.3 is -0.36.

The scatterplot in Figure 14.3 suggests that past values of the unemployment rate might contain information about the future course of inflation that is not already contained in past changes of inflation. This conjecture is readily checked by augmenting the AR(4) model in Equation (14.13) to include the first lag of the unemployment rate:

ΔInf̂_t = 1.28 - 0.31 ΔInf_{t-1} - 0.39 ΔInf_{t-2} + 0.09 ΔInf_{t-3} - 0.08 ΔInf_{t-4} - 0.21 Unemp_{t-1}    (14.16)
         (0.53)  (0.09)           (0.09)            (0.09)            (0.08)            (0.09)

The t-statistic on Unemp_{t-1} is -2.23, so this term is significant at the 5% level.


The R̄² of this regression is 0.21, an improvement over the AR(4) R̄² of 0.18. The forecast of the change of inflation in 2005:I is obtained by substituting the 2004 values of the change of inflation into Equation (14.16), along with the value of the unemployment rate in 2004:IV (which is 5.4%); the resulting forecast is ΔInf̂_{2005:I|2004:IV} ≅ 0.4. Thus the forecast of inflation in 2005:I is 3.5% + 0.4% = 3.9%, and the forecast error is -1.5%.

If one lag of the unemployment rate is helpful for forecasting inflation, several lags might be even more helpful; adding three more lags of the unemployment rate yields
ΔInf̂_t = 1.30 - 0.42 ΔInf_{t-1} - 0.37 ΔInf_{t-2} + 0.06 ΔInf_{t-3} - 0.04 ΔInf_{t-4}
         (0.44)  (0.08)           (0.09)            (0.08)            (0.08)
        - 2.64 Unemp_{t-1} + 3.04 Unemp_{t-2} - 0.38 Unemp_{t-3} - 0.25 Unemp_{t-4}    (14.17)
          (0.46)             (0.86)             (0.89)             (0.45)

The F-statistic testing the joint significance of the second through fourth lags of the unemployment rate is 10.76 (p-value < 0.001), so they are jointly significant. The R̄² of the regression in Equation (14.17) is 0.34, a solid improvement over 0.21 for Equation (14.16). The F-statistic on all the unemployment coefficients is 8.91 (p-value < 0.001), indicating that this model represents a statistically significant improvement over the AR(4) model of Section 14.3 [Equation (14.13)]. The standard error of the regression in Equation (14.17) is 1.36, a substantial improvement over the SER of 1.52 for the AR(4).


FIGURE 14.3  Scatterplot of Change in Inflation Between Year t and Year t + 1 Versus the Unemployment Rate in Year t, 1961-2004

In 1982 the U.S. unemployment rate was 9.7% and the rate of inflation in 1983 fell by 2.9% (the large dot). In general, high values of the unemployment rate in year t tend to be followed by decreases in the rate of price inflation in the next year, year t + 1, with a correlation of -0.36. [Vertical axis: change in inflation between year t and year t + 1; horizontal axis: unemployment rate in year t.]

The forecasted change in inflation from 2004:IV to 2005:I using Equation (14.17) is computed by substituting the values of the variables into the equation. The unemployment rate was 5.7% in 2004:I, 5.6% in 2004:II, and 5.4% in 2004:III and 2004:IV. The forecast of the change in inflation from 2004:IV to 2005:I, based on Equation (14.17), is

ΔInf̂_{2005:I|2004:IV} = 1.30 - 0.42 × 1.9 - 0.37 × (-2.8) + 0.06 × 0.6 - 0.04 × 2.9
                         - 2.64 × 5.4 + 3.04 × 5.4 - 0.38 × 5.6 - 0.25 × 5.7 ≅ 0.1.    (14.18)

Thus the forecast of inflation in 2005:I is 3.5% + 0.1% = 3.6%. The forecast error is -1.2.
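As a check on this arithmetic, the computation in Equation (14.18) can be sketched in a few lines of Python. This is illustrative only and not part of the original text; the coefficients are read off Equation (14.17) as printed above, so treat them as assumed inputs:

```python
# ADL(4,4) forecast of the change in inflation, Equation (14.18)
dinf = [1.9, -2.8, 0.6, 2.9]      # changes of inflation: 2004:IV, III, II, I
unemp = [5.4, 5.4, 5.6, 5.7]      # unemployment rate: 2004:IV, III, II, I

b_inf = [-0.42, -0.37, 0.06, -0.04]      # coefficients on lagged changes of inflation
b_unemp = [-2.64, 3.04, -0.38, -0.25]    # coefficients on lagged unemployment rates

dinf_forecast = (1.30
                 + sum(b * y for b, y in zip(b_inf, dinf))
                 + sum(d * u for d, u in zip(b_unemp, unemp)))
inf_forecast = 3.5 + dinf_forecast       # inflation in 2004:IV plus the change
print(round(dinf_forecast, 1))  # 0.1
print(round(inf_forecast, 1))   # 3.6
```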

The autoregressive distributed lag model. The models in Equations (14.16) and (14.17) are autoregressive distributed lag (ADL) models: "autoregressive" because lagged values of the dependent variable are included as regressors, as in an autoregression, and "distributed lag" because the regression also includes multiple lags (a "distributed lag") of an additional predictor. In general, an autoregressive distributed lag model with p lags of the dependent variable Y_t and q lags of an additional predictor X_t is called an ADL(p,q) model. In this notation, the


KEY CONCEPT 14.4

THE AUTOREGRESSIVE DISTRIBUTED LAG MODEL

The autoregressive distributed lag model with p lags of Y_t and q lags of X_t, denoted ADL(p,q), is

Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + ... + β_p Y_{t-p} + δ_1 X_{t-1} + δ_2 X_{t-2} + ... + δ_q X_{t-q} + u_t,    (14.19)

where β_0, β_1, ..., β_p, δ_1, ..., δ_q are unknown coefficients and u_t is the error term with E(u_t | Y_{t-1}, Y_{t-2}, ..., X_{t-1}, X_{t-2}, ...) = 0.

The model in Equation (14.16) is an ADL(4,1) model and the model in Equation (14.17) is an ADL(4,4) model. The autoregressive distributed lag model is summarized in Key Concept 14.4.
I'he autoregressive distributed lag model is summarized in Key Concept l4A.

With all these regressors, the notation in Equation (14.19) is somewhat cumbersome, and alternative optional notation, based on the so-called lag operator, is presented in Appendix 14.3.

The assumption that the errors in the ADL model have a conditional mean of zero given all past values of Y and X, that is, that E(u_t | Y_{t-1}, Y_{t-2}, ..., X_{t-1}, X_{t-2}, ...) = 0, implies that no additional lags of either Y or X belong in the ADL model. In other words, the lag lengths p and q are the true lag lengths, and the coefficients on additional lags are zero.

The ADL model contains lags of the dependent variable (the autoregressive component) and a distributed lag of a single additional predictor, X. In general, however, forecasts can be improved by using multiple predictors. But before turning to the general time series regression model with multiple predictors, we first introduce the concept of stationarity, which will be used in that discussion.

Stationarity

Regression analysis of time series data necessarily uses data from the past to quantify historical relationships. If the future is like the past, then these historical relationships can be used to forecast the future. But if the future differs fundamentally from the past, then those historical relationships might not be reliable guides to the future.

In the context of time series regression, the idea that historical relationships can be generalized to the future is formalized by the concept of stationarity. The precise definition of stationarity, given in Key Concept 14.5, is that the distribution of the time series variable does not change over time.


KEY CONCEPT 14.5

STATIONARITY

A time series Y_t is stationary if its probability distribution does not change over time, that is, if the joint distribution of (Y_{s+1}, Y_{s+2}, ..., Y_{s+T}) does not depend on s; otherwise, Y_t is said to be nonstationary. A pair of time series, X_t and Y_t, are said to be jointly stationary if the joint distribution of (X_{s+1}, Y_{s+1}, X_{s+2}, Y_{s+2}, ..., X_{s+T}, Y_{s+T}) does not depend on s. Stationarity requires the future to be like the past, at least in a probabilistic sense.

Time Series Regression with Multiple Predictors

The general time series regression model with multiple predictors extends the ADL model to include multiple predictors and their lags. The model is summarized in Key Concept 14.6. The presence of multiple predictors and their lags leads to double subscripting of the regression coefficients and regressors.

The time series regression model assumptions. The assumptions in Key Concept 14.6 modify the four least squares assumptions of the multiple regression model for cross-sectional data (Key Concept 6.4) for time series data.

The first assumption is that u_t has conditional mean zero, given all the regressors and the additional lags of the regressors beyond the lags included in the regression. This assumption extends the assumption used in the AR and ADL models and implies that the best forecast of Y_t using all past values of Y and the X's is given by the regression in Equation (14.20).

The second least squares assumption for cross-sectional data (Key Concept 6.4) is that (X_{1i}, ..., X_{ki}, Y_i), i = 1, ..., n, are independently and identically distributed (i.i.d.). The second assumption for time series regression replaces the i.i.d. assumption by a more appropriate one with two parts. Part (a) is that the data are drawn from a stationary distribution, so that the distribution of the data today is the same as its distribution in the past. This assumption is a time series version of the "identically distributed" part of the i.i.d. assumption: The cross-sectional requirement of each draw being identically distributed is replaced by the time series requirement that the joint distribution of the variables, including lags, does not change over time. In practice, many economic time series appear to be nonstationary, which means that this assumption can fail to hold in applications. If the time series variables are nonstationary, then one or more problems can arise in time series regression: The forecast can be biased, the forecast can be inefficient (there can be alternative forecasts based on the same data with lower variance),


KEY CONCEPT 14.6

TIME SERIES REGRESSION WITH MULTIPLE PREDICTORS

The general time series regression model allows for k additional predictors, where q_1 lags of the first predictor are included, q_2 lags of the second predictor are included, and so forth:

Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + ... + β_p Y_{t-p}
      + δ_{11} X_{1,t-1} + δ_{12} X_{1,t-2} + ... + δ_{1q_1} X_{1,t-q_1}
      + ... + δ_{k1} X_{k,t-1} + δ_{k2} X_{k,t-2} + ... + δ_{kq_k} X_{k,t-q_k} + u_t.    (14.20)

1. E(u_t | Y_{t-1}, Y_{t-2}, ..., X_{1,t-1}, X_{1,t-2}, ..., X_{k,t-1}, X_{k,t-2}, ...) = 0;

2. (a) The random variables (Y_t, X_{1t}, ..., X_{kt}) have a stationary distribution, and (b) (Y_t, X_{1t}, ..., X_{kt}) and (Y_{t-j}, X_{1,t-j}, ..., X_{k,t-j}) become independent as j gets large;

3. X_{1t}, ..., X_{kt} and Y_t have nonzero, finite fourth moments; and

4. There is no perfect multicollinearity.
or conventional OLS-based statistical inferences (for example, performing a hypothesis test by comparing the OLS t-statistic to ±1.96) can be misleading. Precisely which of these problems occurs, and its remedy, depends on the source of the nonstationarity. In Sections 14.6 and 14.7, we study the problems posed by, tests for, and solutions to two empirically important types of nonstationarity in economic time series: trends and breaks. For now, however, we simply assume that the series are jointly stationary and accordingly focus on regression with stationary variables.

Part (b) of the second assumption requires that the random variables become independently distributed when the amount of time separating them becomes large. This replaces the cross-sectional requirement that the variables be independently distributed from one observation to the next with the time series requirement that they be independently distributed when they are separated by long periods of time. This assumption is sometimes referred to as weak dependence, and it ensures that in large samples there is sufficient randomness in the data for the law of large numbers and the central limit theorem to hold. We do not provide a precise mathematical statement of the weak dependence condition; rather, the reader is referred to Hayashi (2000, Chapter 2).

The third assumption, which is the same as the third least squares assumption for cross-sectional data, is that all the variables have nonzero, finite fourth moments. Finally, the fourth assumption, which is also the same as for cross-sectional data, is that the regressors are not perfectly multicollinear.


KEY CONCEPT 14.7

GRANGER CAUSALITY TESTS (TESTS OF PREDICTIVE CONTENT)

The Granger causality statistic is the F-statistic testing the hypothesis that the coefficients on all the values of one of the variables in Equation (14.20) (for example, the coefficients on X_{1,t-1}, X_{1,t-2}, ..., X_{1,t-q_1}) are zero. This null hypothesis implies that these regressors have no predictive content for Y_t beyond that contained in the other regressors, and the test of this null hypothesis is called the Granger causality test.

Statistical inference and the Granger causality test.

Under the assumptions of Key Concept 14.6, inference on the regression coefficients using OLS proceeds in the same way as it usually does using cross-sectional data.

One useful application of the F-statistic in time series forecasting is to test whether the lags of one of the included regressors have useful predictive content, above and beyond the other regressors in the model. The claim that a variable has no predictive content corresponds to the null hypothesis that the coefficients on all lags of that variable are zero. The F-statistic testing this null hypothesis is called the Granger causality statistic, and the associated test is called a Granger causality test (Granger, 1969). This test is summarized in Key Concept 14.7.

Granger causality has little to do with causality in the sense that it is used elsewhere in this book. In Chapter 1, causality was defined in terms of an ideal randomized controlled experiment, in which different values of X are applied experimentally and we observe the subsequent effect on Y. In contrast, Granger causality means that if X Granger-causes Y, then X is a useful predictor of Y, given the other variables in the regression. While "Granger predictability" is a more accurate term than "Granger causality," the latter has become part of the jargon of econometrics.

As an example, consider the relationship between the change in the inflation rate and its past values and past values of the unemployment rate. Based on the OLS estimates in Equation (14.17), the F-statistic testing the null hypothesis that the coefficients on all four lags of the unemployment rate are zero is 8.91 (p-value < 0.001). In the jargon of Key Concept 14.7, we can conclude (at the 1% significance level) that the unemployment rate Granger-causes changes in the inflation rate. This does not necessarily mean that a change in the unemployment rate will cause, in the sense of Chapter 1, a subsequent change in the inflation rate. It does mean that the past values of the unemployment rate appear to contain information that is useful for forecasting changes in the inflation rate, beyond that contained in past values of the inflation rate.


Forecast Uncertainty and Forecast Intervals

In any estimation problem, it is good practice to report a measure of the uncertainty of that estimate, and forecasting is no exception. One measure of the uncertainty of a forecast is its root mean squared forecast error. Under the additional assumption that the errors u_t are normally distributed, the RMSFE can be used to construct a forecast interval, that is, an interval that contains the future value of the variable with a certain probability.

Forecast uncertainty. The forecast error consists of two components: uncertainty arising from estimation of the regression coefficients, and uncertainty associated with the future unknown value of u_t. For regressions with few coefficients and many observations, the uncertainty arising from the future u_t can be much larger than the uncertainty associated with estimation of the parameters. In general, however, both sources of uncertainty are important, so we now develop an expression for the RMSFE that incorporates these two sources of uncertainty.

To keep the notation simple, consider forecasts of Y_{T+1} based on an ADL(1,1) model with a single predictor, that is, Y_t = β_0 + β_1 Y_{t-1} + δ_1 X_{t-1} + u_t, and suppose that u_t is homoskedastic. The forecast is Ŷ_{T+1|T} = β̂_0 + β̂_1 Y_T + δ̂_1 X_T, and the forecast error is

Y_{T+1} - Ŷ_{T+1|T} = u_{T+1} - [(β̂_0 - β_0) + (β̂_1 - β_1)Y_T + (δ̂_1 - δ_1)X_T].    (14.21)

Because u_{T+1} has conditional mean zero and is homoskedastic, u_{T+1} has variance σ_u² and is uncorrelated with the final expression in brackets in Equation (14.21). Thus the mean squared forecast error (MSFE) is

MSFE = E[(Y_{T+1} - Ŷ_{T+1|T})²]
     = σ_u² + var[(β̂_0 - β_0) + (β̂_1 - β_1)Y_T + (δ̂_1 - δ_1)X_T],    (14.22)

and the RMSFE is the square root of the MSFE.

Estimation of the MSFE entails estimation of the two parts in Equation (14.22). The first term, σ_u², can be estimated by the square of the standard error of the regression, as discussed in Section 14.3. The second term requires estimating the variance of a weighted average of the regression coefficients, and methods for doing so were discussed in Section 8.1 [see the discussion following Equation (8.7)].

An alternative method for estimating the MSFE is to use the variance of pseudo out-of-sample forecasts, a procedure discussed in Section 14.7.


Forecast intervals. A forecast interval is like a confidence interval, except that it pertains to a forecast. That is, a 95% forecast interval is an interval that contains the future value of the series in 95% of repeated applications.

One important difference between a forecast interval and a confidence interval is that the usual formula for a 95% confidence interval (the estimator ±1.96 standard errors) is justified by the central limit theorem and therefore holds for a wide range of distributions of the error term. In contrast, because the forecast error in Equation (14.21) includes the future value of the error u_{T+1}, to compute a forecast interval requires either estimating the distribution of the error term or making some assumption about that distribution.

In practice, it is convenient to assume that u_{T+1} is normally distributed. If so, Equation (14.21) and the central limit theorem applied to β̂_0, β̂_1, and δ̂_1 imply that the forecast error is the sum of two independent, normally distributed terms, so that the forecast error is itself normally distributed with variance equaling the MSFE. It follows that a 95% forecast interval is given by Ŷ_{T+1|T} ± 1.96SE(Y_{T+1} - Ŷ_{T+1|T}), where SE(Y_{T+1} - Ŷ_{T+1|T}) is an estimator of the RMSFE.

This discussion has focused on the case that the error term u_{T+1} is homoskedastic. If instead u_{T+1} is heteroskedastic, then one needs to develop a model of the heteroskedasticity so that the term σ_u² in Equation (14.22) can be estimated, given the most recent values of Y and X, and methods for modeling this conditional heteroskedasticity are presented in Section 16.5.

Because of uncertainty about future events, that is, uncertainty about u_{T+1}, 95% forecast intervals can be so wide that they have limited use in decision making. Professional forecasters therefore often report forecast intervals that are tighter than 95%, for example, one standard error forecast intervals (which are 68% forecast intervals if the errors are normally distributed). Alternatively, some forecasters report multiple forecast intervals, as is done by the economists at the Bank of England when they publish their inflation forecasts (see the "river of blood" box on the following page).
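As a small worked sketch (not part of the original text), take the SER of Equation (14.17), 1.36, as the estimate of the RMSFE, which ignores coefficient-estimation uncertainty as noted above, and apply it to the 3.6% point forecast of inflation in 2005:I from Section 14.4:

```python
# 95% and one-standard-error forecast intervals, assuming normal errors
forecast = 3.6    # point forecast of inflation in 2005:I, in percent
rmsfe = 1.36      # SER-based estimate of the RMSFE (estimation uncertainty ignored)

lo95, hi95 = forecast - 1.96 * rmsfe, forecast + 1.96 * rmsfe
lo68, hi68 = forecast - rmsfe, forecast + rmsfe   # roughly a 68% interval

print(round(lo95, 1), round(hi95, 1))  # 0.9 6.3
print(round(lo68, 1), round(hi68, 1))  # 2.2 5.0
```

The width of the 95% interval, more than five percentage points, illustrates why forecasters often prefer to report the tighter one-standard-error band.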

14.5 Lag Length Selection Using Information Criteria

The estimated inflation regressions in Sections 14.3 and 14.4 have either one or four lags of the predictors. One lag makes some sense, but why four? More generally, how many lags should be included in a time series regression? This section discusses statistical methods for choosing the number of lags, first in an autoregression, then in a time series regression model with multiple predictors.


The River of Blood

As part of its efforts to inform the public about monetary policy decisions, the Bank of England regularly publishes forecasts of inflation. These forecasts combine output from econometric models maintained by professional econometricians at the bank with the expert judgment of the members of the bank's senior staff and Monetary Policy Committee. The forecasts are presented as a set of forecast intervals designed to reflect what these economists consider to be the range of probable paths that inflation might take. In its Inflation Report, the bank prints these ranges in red, with the darkest red reserved for the central band. Although the bank prosaically refers to this as the "fan chart," the press has called these spreading shades of red the "river of blood."

The river of blood for May 2005 is shown in Figure 14.4 (in this figure the blood is blue, not red, so you will need to use your imagination). This chart shows that, as of May 2005, the bank's economists expected the rate of inflation to climb to approximately 2%, then to hold steady for the foreseeable future. The economists expressed considerable uncertainty about this forecast, however. In their written discussion, they cited the outlook for consumer spending and prospects for the world economy as sources of forecast uncertainty. As it happened, their forecast of inflation six months ahead (in November 2005) was 2.1% and actual inflation in November was 2.3%, very close, considering the considerable jump in oil prices in the intervening six months.

continued on next page

FIGURE 14.4  The River of Blood

The Bank of England's fan chart for May 2005 shows forecast ranges for inflation. The dashed line indicates May 2007, two years after the release of the report. [Vertical axis: percentage increase in prices from a year earlier; horizontal axis: year.]


The Bank of England has been a pioneer in the movement toward greater openness by central banks, and other central banks now also publish inflation forecasts. The decisions made by monetary policymakers are difficult ones and affect the lives, and wallets, of many of their fellow citizens. In a democracy in the information age, reasoned the economists at the Bank of England, it is particularly important for citizens to understand the bank's economic outlook and the reasoning behind its difficult decisions.

To see the river of blood in its original red hue, visit the Bank of England's Web site at www.bankofengland.co.uk. To learn more about the performance of the Bank of England inflation forecasts, see Clements (2004).

Determining the Order of an Autoregression

In practice, choosing the order p of an autoregression requires balancing the marginal benefit of including more lags against the marginal cost of additional estimation uncertainty. On the one hand, if the order of an estimated autoregression is too low, you will omit potentially valuable information contained in the more distant lagged values. On the other hand, if it is too high, you will be estimating more coefficients than necessary, which in turn introduces additional estimation error into your forecasts.

The F-statistic approach. One approach to choosing p is to start with a model with many lags and to perform hypothesis tests on the final lag. For example, you might start by estimating an AR(6) and test whether the coefficient on the sixth lag is significant at the 5% level; if not, drop it and estimate an AR(5), test the coefficient on the fifth lag, and so forth. The drawback of this method is that it will produce too large a model, at least some of the time. Even if the true AR order is five, so the sixth coefficient is zero, a 5% test using the t-statistic will incorrectly reject this null hypothesis 5% of the time just by chance. Thus, when the true value of p is five, this method will estimate p to be six 5% of the time.
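The overfitting tendency of this rule can be illustrated with a small Monte Carlo sketch (not from the text; the AR(1) coefficient, sample size, seed, and replication count below are arbitrary choices). We simulate data whose true order is one, estimate an AR(6) with a pure-Python OLS, and test the sixth lag at the 5% level; the first step of the "start big and test down" rule rejects about 5% of the time even though the extra lags are irrelevant:

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def t_stat_last_lag(y, p):
    """t-statistic on the p-th lag in an AR(p) with an intercept."""
    rows = range(p, len(y))
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in rows]
    yy = [y[t] for t in rows]
    k = p + 1
    XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(k)]
           for r in range(k)]
    Xty = [sum(X[i][r] * yy[i] for i in range(len(X))) for r in range(k)]
    beta = solve(XtX, Xty)
    ssr = sum((yy[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
              for i in range(len(X)))
    s2 = ssr / (len(X) - k)            # homoskedasticity-only variance estimate
    e = [0.0] * k
    e[-1] = 1.0
    inv_col = solve(XtX, e)            # last column of (X'X)^{-1}
    se = math.sqrt(s2 * inv_col[-1])
    return beta[-1] / se

rng = random.Random(6)
reps, reject = 300, 0
for _ in range(reps):
    y = [0.0]
    for _ in range(200):               # true model is an AR(1)
        y.append(0.4 * y[-1] + rng.gauss(0, 1))
    if abs(t_stat_last_lag(y, 6)) > 1.96:   # test the 6th lag at the 5% level
        reject += 1
rate = reject / reps
print(rate)
```

The rejection rate hovers near 0.05, which is exactly the overfitting probability described above.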
The BIC. A way around this problem is to estimate p by minimizing an "information criterion." One such information criterion is the Bayes information criterion (BIC), also called the Schwarz information criterion (SIC), which is

BIC(p) = ln[SSR(p)/T] + (p + 1) (ln T)/T,   (14.23)

where SSR(p) is the sum of squared residuals of the estimated AR(p). The BIC estimator of p, p̂, is the value that minimizes BIC(p) among the possible choices p = 0, 1, ..., pmax, where pmax is the largest value of p considered.

CHAPTER 14  Introduction to Time Series Regression and Forecasting

TABLE 14.4  The Bayes Information Criterion (BIC) and the R² for Autoregressive Models of U.S. Inflation, 1962-2004

 p    SSR(p)/T    ln[SSR(p)/T]    (p + 1)ln(T)/T    BIC(p)     R²
 0     2.900         1.065             0.030         1.095    0.000
 1     2.737         1.007             0.060         1.067    0.056
 2     2.375         0.865             0.090         0.955    0.181
 3     2.311         0.838             0.120         0.958    0.203
 4     2.309         0.837             0.150         0.987    0.204
 5     2.284         0.826             0.180         1.006    0.212
 6     2.279         0.824             0.210         1.034    0.214

This formula for the BIC might look a bit mysterious at first, but it has an intuitive appeal. Consider the first term in Equation (14.23). Because the regression coefficients are estimated by OLS, the sum of squared residuals necessarily decreases (or at least does not increase) when you add a lag. In contrast, the second term is the number of estimated regression coefficients (the number of lags, p, plus one for the intercept) times the factor (ln T)/T. This second term increases when you add a lag. The BIC trades off these two forces so that the number of lags that minimizes the BIC is a consistent estimator of the true lag length. The mathematics of this argument is given in Appendix 14.5.
As an example, consider estimating the AR order for an autoregression of the change in the inflation rate. The various steps in the calculation of the BIC are carried out in Table 14.4 for autoregressions of maximum order six (pmax = 6). For example, for the AR(1) model in Equation (14.7), SSR(1)/T = 2.737, so ln[SSR(1)/T] = 1.007. Because T = 172 (43 years, four quarters per year), (ln T)/T = 0.030 and (p + 1)(ln T)/T = 2 × 0.030 = 0.060. Thus BIC(1) = 1.007 + 0.060 = 1.067.

The BIC is smallest when p = 2 in Table 14.4. Thus the BIC estimate of the lag length is 2. As can be seen in Table 14.4, as the number of lags increases, the R² increases and the SSR decreases. The increase in the R² is large from one to two lags, smaller from two to three, and quite small from three to four. The BIC decides precisely how large the increase in the R² must be to justify including the additional lag.
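The calculation in Equation (14.23) can be sketched in a few lines of code. The example below is illustrative (simulated data, not the inflation series; the AR(2) coefficients, sample size, and seed are arbitrary choices): it fits AR(p) models for p = 0, ..., pmax on a common estimation sample and picks the p that minimizes the BIC.

```python
import math
import random

def ols(X, y):
    """OLS via the normal equations (X'X) b = X'y, solved by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(k)]
         for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta

def bic_for_ar(y, p, pmax):
    """BIC(p) = ln(SSR(p)/T) + (p+1) ln(T)/T, all models fit on the same T obs."""
    T = len(y) - pmax                  # common estimation sample for every p
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(pmax, len(y))]
    yy = y[pmax:]
    beta = ols(X, yy)
    ssr = sum((yy[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
              for i in range(T))
    return math.log(ssr / T) + (p + 1) * math.log(T) / T

# Simulate a stationary AR(2): Y_t = 0.5 Y_{t-1} + 0.3 Y_{t-2} + u_t
random.seed(0)
y = [0.0, 0.0]
for _ in range(500):
    y.append(0.5 * y[-1] + 0.3 * y[-2] + random.gauss(0, 1))

pmax = 6
bics = {p: bic_for_ar(y, pmax=pmax, p=p) for p in range(pmax + 1)}
p_hat = min(bics, key=bics.get)
print("BIC estimate of p:", p_hat)
```

With a clear AR(2) data-generating process and a few hundred observations, the minimizer is almost always the true order, illustrating the consistency result cited above.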


The AIC. The BIC is not the only information criterion; another is the Akaike information criterion (AIC):

AIC(p) = ln[SSR(p)/T] + (p + 1) (2/T).   (14.24)

The difference between the AIC and the BIC is that the term "ln T" in the BIC is replaced by "2" in the AIC, so the second term in the AIC is smaller. For example, for the 172 observations used to estimate the inflation autoregression, ln T = ln(172) = 5.15, so that the second term for the BIC is more than twice as large as the second term in the AIC. Thus a smaller decrease in the SSR is needed in the AIC to justify including another lag. As a matter of theory, the second term in the AIC is not large enough to ensure that the correct lag length is chosen, even in large samples, so the AIC estimator of p is not consistent. As is discussed in Appendix 14.5, in large samples the AIC will overestimate p with nonzero probability.

Despite this theoretical blemish, the AIC is widely used in practice. If you are concerned that the BIC might yield a model with too few lags, the AIC provides a reasonable alternative.
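The penalty comparison for T = 172 is simple arithmetic; this snippet just reproduces it:

```python
import math

T = 172                       # observations used for the inflation autoregressions
bic_term = math.log(T) / T    # per-coefficient penalty in the BIC
aic_term = 2 / T              # per-coefficient penalty in the AIC
print(bic_term, aic_term)     # about 0.030 versus 0.0116: the BIC penalty
                              # is more than twice the AIC penalty
```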

A note on calculating information criteria. How well two estimated regressions fit the data is best assessed when they are estimated using the same data sets. Because the BIC and AIC are formal methods for making this comparison, the autoregressions under consideration should be estimated using the same observations. For example, in Table 14.4 all the regressions were estimated using data from 1962:I to 2004:IV, for a total of 172 observations. Because the autoregressions involve lags of the change of inflation, this means that earlier values of the change of inflation (values before 1962:I) were used as regressors for the preliminary observations. Said differently, the regressions examined in Table 14.4 each include observations on ΔInf_t, ΔInf_{t-1}, ..., ΔInf_{t-p} for t = 1962:I, ..., 2004:IV, corresponding to 172 observations on the dependent variable and regressors, so T = 172 in Equations (14.23) and (14.24).

Lag Length Selection in Time Series Regression with Multiple Predictors

The tradeoff involved with lag length choice in the general time series regression model with multiple predictors [Equation (14.20)] is similar to that in an autoregression: Using too few lags can decrease forecast accuracy because valuable information is lost, but adding lags increases estimation uncertainty. The choice of lags must balance the benefit of using additional information against the cost of estimating the additional coefficients.


The F-statistic approach. As in the univariate autoregression, one way to determine the number of lags to include is to use F-statistics to test joint hypotheses that sets of coefficients equal zero. For example, in the discussion of Equation (14.17), we tested the hypothesis that the coefficients on the second through fourth lags of the unemployment rate equal zero against the alternative that they are nonzero; this hypothesis was rejected at the 1% significance level, lending support to the longer lag specification. If the number of models being compared is small, then this F-statistic method is easy to use. In general, however, the F-statistic method can produce models that are too large, in the sense that the true lag order is overestimated.

Information criteria. As in an autoregression, the BIC and AIC can be used to estimate the number of lags and variables in the time series regression model with multiple predictors. If the regression model has K coefficients (including the intercept), the BIC is

BIC(K) = ln[SSR(K)/T] + K (ln T)/T.   (14.25)

The AIC is defined in the same way, but with 2 replacing ln T in Equation (14.25). For each candidate model the BIC (or AIC) can be evaluated, and the model with the lowest value of the BIC (or AIC) is the preferred model, based on the information criterion.
There are two important practical considerations when using an information criterion to estimate the lag lengths. First, as is the case for the autoregression, all the candidate models must be estimated over the same sample: in the notation of Equation (14.25), the number of observations used to estimate the model, T, must be the same for all models. Second, when there are multiple predictors, this approach is computationally demanding because it requires computing many different models (many combinations of the lag parameters). In practice, a convenient shortcut is to require all the regressors to have the same number of lags, that is, to require that p = q₁ = ... = q_K, so that only pmax + 1 models need to be compared (corresponding to p = 0, 1, ..., pmax).
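The computational burden just described is easy to quantify. As a hypothetical illustration (the numbers of predictors and lags are arbitrary): with four predictors and up to six lags each, letting every regressor have its own lag length means evaluating (pmax + 1)⁴ candidate models, while tying the lag lengths together leaves only pmax + 1.

```python
pmax, m = 6, 4                   # hypothetical: four predictors, up to six lags each
unrestricted = (pmax + 1) ** m   # every regressor gets its own lag length
tied = pmax + 1                  # all regressors forced to share one lag length p
print(unrestricted, tied)        # 2401 candidate models versus 7
```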

14.6 Nonstationarity I: Trends

In Key Concept 14.6, it was assumed that the dependent variable and the regressors are stationary. If this is not the case, that is, if the dependent variable and/or regressors are nonstationary, then conventional hypothesis tests, confidence intervals, and forecasts can be unreliable. The precise problem created by nonstationarity, and the solution to that problem, depends on the nature of that nonstationarity.


In this and the next section, we examine two of the most important types of nonstationarity in economic time series data: trends and breaks. In each section, we first describe the nature of the nonstationarity, then discuss the consequences for time series regression if this type of nonstationarity is present but is ignored. We next present tests for nonstationarity and discuss remedies for, or solutions to, the problems caused by that particular type of nonstationarity. We begin by discussing trends.

What Is a Trend?

A trend is a persistent long-term movement of a variable over time. A time series variable fluctuates around its trend.

Inspection of Figure 14.1a suggests that the U.S. inflation rate has a trend consisting of a general upward tendency through 1982 and a downward tendency thereafter. The series in Figures 14.2a, b, and c also have trends, but their trends are quite different. The trend in the U.S. federal funds interest rate is similar to the trend in the U.S. inflation rate. The $/£ exchange rate clearly had a prolonged downward trend after the collapse of the fixed exchange rate system in 1972. The logarithm of Japanese GDP has a complicated trend: fast growth at first, then moderate growth, and finally slow growth.

Deterministic and stochastic trends. There are two types of trends seen in time series data: deterministic and stochastic. A deterministic trend is a nonrandom function of time. For example, a deterministic trend might be linear in time; if inflation had a deterministic linear trend so that it increased by 0.1 percentage point per quarter, this trend could be written as 0.1t, where t is measured in quarters. In contrast, a stochastic trend is random and varies over time. For example, a stochastic trend in inflation might exhibit a prolonged period of increase followed by a prolonged period of decrease, like the inflation trend in Figure 14.1a.

Like many econometricians, we think it is more appropriate to model economic time series as having stochastic rather than deterministic trends. Economics is complicated stuff. It is hard to reconcile the predictability implied by a deterministic trend with the complications and surprises faced year after year by workers, businesses, and governments. For example, although U.S. inflation rose through the 1970s, it was neither destined to rise forever nor destined to fall again. Rather, the slow rise of inflation is now understood to have occurred because of bad luck and monetary policy mistakes, and its taming was in large part a consequence of tough decisions made by the Board of Governors of the Federal Reserve. Similarly, the $/£ exchange rate trended down from 1972 to 1985 and subsequently drifted up, but these movements too were the consequences of complex economic forces; because these forces change unpredictably, these trends are usefully thought of as having a large unpredictable, or random, component.
For these reasons, our treatment of trends in economic time series focuses on stochastic rather than deterministic trends, and when we refer to "trends" in time series data, we mean stochastic trends unless we explicitly say otherwise. This section presents the simplest model of a stochastic trend, the random walk model; other models of trends are discussed in Section 16.3.

The random walk model of a trend. The simplest model of a variable with a stochastic trend is the random walk. A time series Y_t is said to follow a random walk if the change in Y_t is i.i.d., that is, if

Y_t = Y_{t-1} + u_t,   (14.26)

where u_t is i.i.d. We will, however, use the term "random walk" more generally to refer to a time series that follows Equation (14.26), where u_t has conditional mean zero, that is, E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0.
The basic idea of a random walk is that the value of the series tomorrow is its value today, plus an unpredictable change. Because the path followed by Y_t consists of random "steps" u_t, that path is a "random walk." The conditional mean of Y_t based on data through time t - 1 is Y_{t-1}; that is, because E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0, E(Y_t | Y_{t-1}, Y_{t-2}, ...) = Y_{t-1}. In other words, if Y_t follows a random walk, then the best forecast of tomorrow's value is its value today.
Some series, such as the logarithm of Japanese GDP in Figure 14.2c, have an obvious upward tendency, in which case the best forecast of the series must include an adjustment for the tendency of the series to increase. This adjustment leads to an extension of the random walk model to include a tendency to move, or "drift," in one direction or the other. This extension is referred to as a random walk with drift:

Y_t = β₀ + Y_{t-1} + u_t,   (14.27)

where E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0 and β₀ is the "drift" in the random walk. If β₀ is positive, then Y_t increases on average. In the random walk with drift model, the best forecast of the series tomorrow is the value of the series today, plus the drift β₀.
The random walk model (with drift as appropriate) is simple yet versatile, and it is the primary model for trends used in this book.
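A short simulation (illustrative only; the path length, number of paths, and seed are arbitrary choices) makes the nonstationarity of a random walk concrete. Across many simulated driftless paths, the cross-sectional variance of Y_t roughly doubles when t doubles, consistent with var(Y_t) = t·σ²_u when Y_0 = 0:

```python
import random
import statistics

random.seed(1)
T, n_paths = 400, 2000
vals_half, vals_full = [], []
for _ in range(n_paths):
    y = 0.0
    y_half = 0.0
    for t in range(1, T + 1):
        y += random.gauss(0, 1)        # Y_t = Y_{t-1} + u_t with sigma_u = 1
        if t == T // 2:
            y_half = y                 # record the path at t = T/2
    vals_half.append(y_half)
    vals_full.append(y)

var_half = statistics.pvariance(vals_half)  # should be near T/2 = 200
var_full = statistics.pvariance(vals_full)  # should be near T = 400
print(var_half, var_full)
```

Because the variance keeps growing with t, no single unconditional distribution describes Y_t at all dates, which is exactly what nonstationarity means here.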

A random walk is nonstationary. If Y_t follows a random walk, then it is not stationary: The variance of a random walk increases over time, so the distribution of Y_t changes over time. One way to see this is to recognize that, because the steps u_t are uncorrelated, the variance of Y_t (for Y_0 = 0) is var(Y_t) = t·σ²_u, which depends on t.


Three problems arise: (1) the estimated autoregressive coefficient is biased toward zero if its true value is 1; (2) t-statistics on regressors with stochastic trends can have a nonnormal distribution, even in large samples; and (3) an extreme example of the risks posed by stochastic trends is that two series that are independent will, with high probability, misleadingly appear to be related if they both have stochastic trends, a situation known as spurious regression.

Problem #1: Autoregressive coefficients that are biased toward zero. Suppose that Y_t follows the random walk in Equation (14.26), but this is unknown to the econometrician, who instead estimates the AR(1) model by OLS. Because Y_t is nonstationary, the least squares assumptions for time series regression in Key Concept 14.6 do not hold, so as a general matter we cannot rely on estimators and test statistics having their usual large-sample normal distributions. In fact, in this example the OLS estimator of the autoregressive coefficient, β̂₁, is consistent, but it has a nonnormal distribution, even in large samples: The asymptotic distribution of β̂₁ is shifted toward zero. The expected value of β̂₁ is approximately E(β̂₁) = 1 - 5.3/T. This results in a large bias in sample sizes typically encountered in economic applications. For example, 20 years of quarterly data contain 80 observations, in which case the expected value of β̂₁ is E(β̂₁) = 1 - 5.3/80 = 0.934. Moreover, this distribution has a long left tail: The 5th percentile of β̂₁ is approximately 1 - 14.1/T, which, for T = 80, corresponds to 0.824, so that 5% of the time β̂₁ < 0.824.

One implication of this bias toward zero is that, if Y_t follows a random walk, then forecasts based on the AR(1) model can perform substantially worse than those based on the random walk model, which imposes the true value β₁ = 1. This conclusion also applies to higher-order autoregressions, in which there are forecasting gains from imposing a unit root (that is, from estimating the autoregression in first differences instead of in levels) when in fact the series contains a unit root.
Problem #2: Non -normal distributions oft-statistics.

If a r~l!l l '-' r 1 ~

,, stocha')ttc trt:nll, then lt!) usual O LS t!)tatl-.uc can h,l\c. .1 mm nonn.tl ~h,t r.
tilm under thc null hypo the''' even m large samples. llu::.non-nmmal Jl,llt1'lltJ<in
mc.lll!.> lh.ll cutwemional c:onfidcncc interval~ nrc not valid aml h) pmhc~t., t ~~ r ...
cnnno1 be C'()I1ULH.:tt:d as u~ua l. Jn general. the di'itrihutinn of this t-<;tati,tic i'- n 11
rcadtlv t.tbul,llcd bccau..,e the di-;tribution dcpcmh on th~. rc.lat iunship hl!l" ~d 1
tht.o regr~. ''"r 10 que!> liOn amltbe other rCI!rc,sor;. An impon:tnt c-.:ampk ol l h ~
prohkm ,trlsc.' 10 rcgrcs:;ton.., that attempt to torc~asl .,tod. rcturn-. usm~ r _:..,or., th.ll could h,11.e stocha<.ttc trends (sec the box rn Scettllll 11.7, can \ou li
the,; \t ukct '! Part 11").


One important case in which it is possible to tabulate the distribution of the t-statistic when the regressor has a stochastic trend is in the context of an autoregression with a unit root. We return to this special case when we take up the problem of testing whether a time series contains a stochastic trend.

Problem #3: Spurious regression. Stochastic trends can lead two time series to appear related when they are not, a problem called spurious regression.

For example, U.S. inflation was steadily rising from the mid-1960s through the early 1980s, and at the same time Japanese GDP (plotted in logarithms in Figure 14.2c) was steadily rising. These two trends conspire to produce a regression that appears to be "significant" using conventional measures. Estimated by OLS using data from 1965 through 1981, this regression is

U.S. Inflation_t = -37.78 + 3.50 × ln(Japanese GDP_t),  R² = 0.56.   (14.28)
                    (3.99)  (0.35)

The t-statistic on the slope coefficient exceeds 10, which by usual standards indicates a strong positive relationship between the two series, and the R² is high. However, running this regression using data from 1982 through 2004 yields

U.S. Inflation_t = 31.20 - 2.17 × ln(Japanese GDP_t),  R² = 0.08.   (14.29)
                   (10.41) (1.10)

The regressions in Equations (14.28) and (14.29) could hardly be more different. Interpreted literally, Equation (14.28) indicates a strong positive relationship, while Equation (14.29) indicates a weak, but apparently statistically significant, negative relationship.

The source of these conflicting results is that both series have stochastic trends. These trends happened to align from 1965 through 1981 but did not align from 1982 through 2004. There is in fact no compelling economic or political reason to think that the trends in these two series are related. In short, these regressions are spurious.
The regressions in Equations (14.28) and (14.29) illustrate empirically the theoretical point that OLS can be misleading when the series contain stochastic trends (see Exercise 14.6 for a computer simulation that demonstrates this result). One special case in which certain regression-based methods are reliable is when the trend component of the two series is the same, that is, when the series contain a common stochastic trend; if so, the series are said to be cointegrated. Econometric methods for detecting and analyzing cointegrated economic time series are discussed in Section 16.4.
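A minimal pure-Python version of the spurious regression simulation (in the spirit of Exercise 14.6, but with arbitrary sample size, seed, and replication settings of our own choosing) is sketched below. Independent random walks produce a large R² far more often than independent i.i.d. series do:

```python
import random

def walk(T, rng):
    """Driftless random walk of length T starting at 0."""
    y, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0, 1)
        y.append(level)
    return y

def r_squared(y, x):
    """R^2 from an OLS regression of y on x with an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    ssr = sum((yi - my - b * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return 1 - ssr / tss

rng = random.Random(3)
reps, T = 500, 100
big_rw = sum(r_squared(walk(T, rng), walk(T, rng)) > 0.25 for _ in range(reps))
big_iid = sum(
    r_squared([rng.gauss(0, 1) for _ in range(T)],
              [rng.gauss(0, 1) for _ in range(T)]) > 0.25
    for _ in range(reps)
)
frac_rw, frac_iid = big_rw / reps, big_iid / reps
print(frac_rw, frac_iid)
```

Even though the two walks are independent by construction, a sizable fraction of the replications deliver R² above 0.25, while the i.i.d. baseline essentially never does.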

Detecting Stochastic Trends: Testing for a Unit AR Root

Trends in time series data can be detected by informal and formal methods. The informal methods involve inspecting a time series plot of the data and computing the autocorrelation coefficients, as we did in Section 14.2. Because the first autocorrelation coefficient will be near 1 if the series has a stochastic trend, at least in large samples, a small first autocorrelation coefficient combined with a time series plot that has no apparent trend suggests that the series does not have a trend. If doubt remains, however, there are formal statistical procedures that can be used to test the hypothesis that there is a stochastic trend in the series against the alternative that there is no trend.

In this section, we use the Dickey-Fuller test (named after its inventors David Dickey and Wayne Fuller, 1979) to test for a stochastic trend. Although the Dickey-Fuller test is not the only test for a stochastic trend (another test is discussed in Section 16.3), it is the most commonly used test in practice, and it is one of the most reliable.

The Dickey-Fuller test in the AR(1) model. The starting point for the Dickey-Fuller test is the autoregressive model. As discussed earlier, the random walk in Equation (14.27) is a special case of the AR(1) model with β₁ = 1. If β₁ = 1, Y_t is nonstationary and contains a (stochastic) trend. Thus, within the AR(1) model, the hypothesis that Y_t has a trend can be tested by testing

H₀: β₁ = 1 vs. H₁: β₁ < 1 in Y_t = β₀ + β₁Y_{t-1} + u_t.   (14.30)

If β₁ = 1, the AR(1) has an autoregressive root of 1, so the null hypothesis in Equation (14.30) is that the AR(1) has a unit root, and the alternative is that it is stationary.

This test is most easily implemented by estimating a modified version of Equation (14.30), obtained by subtracting Y_{t-1} from both sides. Let δ = β₁ - 1; then Equation (14.30) becomes

H₀: δ = 0 vs. H₁: δ < 0 in ΔY_t = β₀ + δY_{t-1} + u_t.   (14.31)

The OLS t-statistic testing δ = 0 in Equation (14.31) is called the Dickey-Fuller statistic. The formulation in Equation (14.31) is convenient because regression software automatically prints out the t-statistic testing δ = 0. Note that the Dickey-Fuller test is one-sided, because the relevant alternative is that Y_t is stationary, so β₁ < 1 or, equivalently, δ < 0. The Dickey-Fuller statistic is computed using "nonrobust" standard errors, that is, the "homoskedasticity-only" standard errors presented in Appendix 5.1 [Equation (5.29) for the case of a single regressor] and in Section 18.4 for the multiple regression model.3

The Dickey-Fuller test in the AR(p) model. The Dickey-Fuller statistic presented in the context of Equation (14.31) applies only to an AR(1). As discussed in Section 14.3, for some series the AR(1) model does not capture all the serial correlation in Y_t, in which case a higher-order autoregression is more appropriate.

The extension of the Dickey-Fuller test to the AR(p) model is summarized in Key Concept 14.8. Under the null hypothesis, δ = 0 and ΔY_t is a stationary AR(p). Under the alternative hypothesis, δ < 0, so that Y_t is stationary. Because the regression used to compute this version of the Dickey-Fuller statistic is augmented by lags of ΔY_t, the resulting t-statistic is referred to as the augmented Dickey-Fuller (ADF) statistic.

In general the lag length p is unknown, but it can be estimated using an information criterion applied to regressions of the form in Equation (14.32) for various values of p. Studies of the ADF statistic suggest that it is better to have too many lags than too few, so it is recommended to use the AIC instead of the BIC to estimate p for the ADF statistic.4

Testing against the alternative of stationarity around a linear deterministic time trend. The discussion so far has considered the null hypothesis that the series has a unit root and the alternative hypothesis that it is stationary. This alternative hypothesis of stationarity is appropriate for series, like the rate of inflation, that do not exhibit long-term growth. But other economic time series, like Japanese GDP (Figure 14.2c), exhibit long-run growth, and for such series the alternative of stationarity without a trend is inappropriate. Instead, a commonly used alternative is that the series are stationary around a deterministic time trend, that is, a trend that is a deterministic function of time.

One specific formulation of this alternative hypothesis is that the time trend is linear, that is, the trend is a linear function of t; thus the null hypothesis is that the series has a unit root and the alternative is that it does not have a unit root but does have a deterministic time trend. The Dickey-Fuller regression must be
3Under the null hypothesis of a unit root, the usual "nonrobust" standard errors produce a t-statistic that is in fact robust to heteroskedasticity, a surprising and special result.
4See Stock (1994) and Haldrup and Jansson (2006) for reviews of simulation studies of the finite-sample properties of the Dickey-Fuller and other unit root test statistics.


THE AUGMENTED DICKEY-FULLER TEST FOR A UNIT AUTOREGRESSIVE ROOT

KEY CONCEPT 14.8

The augmented Dickey-Fuller (ADF) test for a unit autoregressive root tests the null hypothesis H₀: δ = 0 against the one-sided alternative H₁: δ < 0 in the regression

ΔY_t = β₀ + δY_{t-1} + γ₁ΔY_{t-1} + γ₂ΔY_{t-2} + ... + γ_pΔY_{t-p} + u_t.   (14.32)

Under the null hypothesis, Y_t has a stochastic trend; under the alternative hypothesis, Y_t is stationary. The ADF statistic is the OLS t-statistic testing δ = 0 in Equation (14.32).

If instead the alternative hypothesis is that Y_t is stationary around a deterministic linear time trend, then this trend, t (the observation number), must be added as an additional regressor, in which case the Dickey-Fuller regression becomes

ΔY_t = β₀ + αt + δY_{t-1} + γ₁ΔY_{t-1} + γ₂ΔY_{t-2} + ... + γ_pΔY_{t-p} + u_t,   (14.33)

where α is an unknown coefficient and the ADF statistic is the OLS t-statistic testing δ = 0 in Equation (14.33).

The lag length p can be estimated using the BIC or AIC. The ADF statistic does not have a normal distribution, even in large samples. Critical values for the one-sided ADF test depend on whether the test is based on Equation (14.32) or (14.33) and are given in Table 14.5.

modified to test the null hypothesis of a unit root against the alternative that it is stationary around a linear time trend. As summarized in Equation (14.33) in Key Concept 14.8, this is accomplished by adding a time trend (the regressor X_t = t) to the regression.

A linear time trend is not the only way to specify a deterministic time trend; for example, the deterministic time trend could be quadratic, or it could be linear but have breaks (that is, be linear with slopes that differ in two parts of the sample). The use of alternatives like these with nonlinear deterministic trends should be motivated by economic theory. For a discussion of unit root tests against stationarity around nonlinear deterministic trends, see Maddala and Kim (1998, Chapter 11).

Critical values for the ADF statistic. Under the null hypothesis of a unit root, the ADF statistic does not have a normal distribution, even in large samples.

TABLE 14.5  Large-Sample Critical Values of the Augmented Dickey-Fuller Statistic

Deterministic Regressors      10%      5%       1%
Intercept only               -2.57    -2.86    -3.43
Intercept and time trend     -3.12    -3.41    -3.96
Because its distribution is nonstandard, the usual critical values from the normal distribution cannot be used when using the ADF statistic to test for a unit root; a special set of critical values, based on the distribution of the ADF statistic under the null hypothesis, must be used instead.

The critical values for the ADF test are given in Table 14.5. Because the alternative hypothesis of stationarity implies that δ < 0 in Equations (14.32) and (14.33), the ADF test is one-sided. For example, if the regression does not include a time trend, then the hypothesis of a unit root is rejected at the 5% significance level if the ADF statistic is less than -2.86. If a time trend is included in the regression, the critical value is instead -3.41.

The critical values in Table 14.5 are substantially larger (more negative) than the one-sided critical values of -1.28 (at the 10% level) and -1.645 (at the 5% level) from the standard normal distribution. The nonstandard distribution of the ADF statistic is an example of how OLS t-statistics for regressors with stochastic trends can have nonnormal distributions. Why the large-sample distribution of the ADF statistic is nonstandard is discussed further in Section 16.3.
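The one-sided decision rule implied by Table 14.5 can be wrapped in a small helper (hypothetical code, not part of the text; the dictionary entries are the critical values from the table):

```python
# Large-sample ADF critical values from Table 14.5 (one-sided test)
CRITICAL = {
    "intercept_only": {0.10: -2.57, 0.05: -2.86, 0.01: -3.43},
    "intercept_and_trend": {0.10: -3.12, 0.05: -3.41, 0.01: -3.96},
}

def reject_unit_root(adf_stat, spec="intercept_only", level=0.05):
    """Reject H0 (unit root) when the ADF statistic is more negative
    than the critical value for the chosen specification and level."""
    return adf_stat < CRITICAL[spec][level]

print(reject_unit_root(-2.69))              # the inflation example at the 5% level
print(reject_unit_root(-2.69, level=0.10))  # the same statistic at the 10% level
```

Applied to the inflation example discussed next, a statistic of -2.69 fails to reject at the 5% level (-2.69 > -2.86) but rejects at the 10% level (-2.69 < -2.57).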

Does U.S. inflation have a stochastic trend? The null hypothesis that inflation has a stochastic trend can be tested against the alternative that it is stationary by performing the ADF test for a unit autoregressive root. The ADF regression with four lags of ΔInf_t is

ΔInf_t = 0.52 - 0.11 Inf_{t-1} - 0.19 ΔInf_{t-1} - 0.26 ΔInf_{t-2} + 0.21 ΔInf_{t-3} - 0.03 ΔInf_{t-4}.   (14.34)
         (0.21)  (0.04)         (0.08)           (0.08)           (0.08)           (0.08)

The ADF t-statistic is the t-statistic testing the hypothesis that the coefficient on Inf_{t-1} is zero; this is t = -2.69. From Table 14.5, the 5% critical value is -2.86. Because the ADF statistic of -2.69 is less negative than -2.86, the test does not reject the null hypothesis at the 5% significance level. Based on the regression in Equation (14.34), we therefore cannot reject (at the 5% significance level) the null hypothesis that inflation has a unit autoregressive root, that is, that inflation contains a stochastic trend, against the alternative that it is stationary.


The ADF regression in Equation (14.34) includes four lags of ΔInf_t to compute the ADF statistic. When the number of lags is estimated using the AIC, where 0 ≤ p ≤ 5, the AIC estimator of the lag length is, however, three. When three lags are used (that is, when ΔInf_{t-1}, ΔInf_{t-2}, and ΔInf_{t-3} are included as regressors), the ADF statistic is -2.72, which is less negative than -2.86. Thus, when the number of lags in the ADF regression is chosen by AIC, the hypothesis that inflation contains a stochastic trend is not rejected at the 5% significance level.

These tests were performed at the 5% significance level. At the 10% significance level, however, the tests reject the null hypothesis of a unit root: The ADF statistics of -2.69 (four lags) and -2.72 (three lags) are more negative than the 10% critical value of -2.57. Thus the ADF statistics paint a rather ambiguous picture, and the forecaster must make an informed judgment about whether to model inflation as having a stochastic trend. Clearly, inflation in Figure 14.1a exhibits long-run swings, consistent with the stochastic trend model. In practice, many forecasters treat U.S. inflation as having a stochastic trend, and we follow that strategy here.

Avoiding the Problems Caused by Stochastic Trends

The most reliable way to handle a trend in a series is to transform the series so that it does not have the trend. If the series has a stochastic trend, that is, if the series has a unit root, then the first difference of the series does not have a trend. For example, if Y_t follows a random walk, so Y_t = β₀ + Y_{t-1} + u_t, then ΔY_t = β₀ + u_t is stationary. Thus using first differences eliminates random walk trends in a series.

In practice, you can rarely be sure whether a series has a stochastic trend. Recall that, as a general point, failure to reject the null hypothesis does not necessarily mean that the null hypothesis is true; rather, it simply means that you have insufficient evidence to conclude that it is false. Thus, failure to reject the null hypothesis of a unit root using the ADF test does not mean that the series actually has a unit root. For example, in an AR(1) model the true coefficient β₁ might be very close to 1, say 0.98, in which case the ADF test would have low power, that is, a low probability of correctly rejecting the null hypothesis in samples the size of our inflation series. Even though failure to reject the null hypothesis of a unit root does not mean the series has a unit root, it still can be reasonable to approximate the true autoregressive root as equaling 1 and therefore to use differences of the series rather than its levels.5
5. For additional discussion of stochastic trends in economic time series and of the problems they pose for regression analysis, see Stock and Watson (1988).
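A quick simulation illustrates why differencing works. This sketch is hypothetical, not part of the text's empirical analysis: it generates a random walk with drift, Y_t = β0 + Y_{t-1} + u_t, and checks that the first difference ΔY_t = β0 + u_t behaves like a stationary series, with a sample mean near β0 and a variance far smaller than that of the trending level.

```python
import random

random.seed(1)
beta0, T = 0.1, 2000
y = [0.0]
for _ in range(T):
    y.append(beta0 + y[-1] + random.gauss(0, 1))    # random walk with drift

dy = [y[t] - y[t - 1] for t in range(1, T + 1)]     # first difference = beta0 + u_t

def var(x):
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

mean_dy = sum(dy) / T            # close to the drift beta0
level_var = var(y)               # grows with the sample: the level is nonstationary
diff_var = var(dy)               # stays near the variance of u_t
```

Running the ADF regression on `y` versus `dy` would, with high probability, fail to reject a unit root for the level while rejecting it decisively for the difference.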


14.7 Nonstationarity II: Breaks


;\

~C<lrlt..ll \ p..:

Ill nomaauonanry an:.es ''hen tht. populatwn 1t.gn.:<.c;lun tuncrion


chang~ on.r the cour!>e ot tb~ :.ampk. In cconomu:,, th1' cm1 t'C\:UT for" 'ariel~
PI n.::.l,llll'. 'u\.h ,h t.h mgc' m e<:OJhlmic fX.'UC). ch,rngc' in th~ 'tructurc (.lf the \!conom).lll nn invc:-ntion that change'> a ... pccific industr) . lf such ch.tn~c'.llT "brc ak._:
occUI , then a rcgre,!>ion model that neglects those ch.IIH!C" c:rn prm1Jc a mlsk,ldllw bn<>1s lor inference anl.l forcca-:.ling
l111s sc~t1un prcst.nls two strareg1es tor cb~,.;c:l,.m\! lor bn.1ks Ill a ume s~..:ncs
Tl ,,I~,.;S'Iun functton over ume. The fu~t strategy look!- for pot~.;nllal brc.1ks from the
f 1. t'P'-I.llh of hypotht:!>i~ ll.:sting. 3ntl entail' tcsllng lur ch.mgc' m th~: n.:grc,,ion
Ci'lCIIIcienh u'ing F-,tatisticc.. nle second --trah:g~ hxlk' fnr pntcntial hn:aks fwm lhl
pcn.pccthc ,,f f,lrccasting: You prLicnJ that your ample emf... '<liC.llh. r than it actually
unco; and e\'aluate the torccasts you would have m.t<.k h1J th1' been c;o. Breab arc:
dch:ch:d ''hen the forecasttny. pcrft~rmancc ic; c;u~-;t<~nlt.!llv ('( lTll tl1.1n C\fll-'Ctcu.

What Is a Break?
Breaks can arise either from a discrete change in the population regression coefficients at a distinct date or from a gradual evolution of the coefficients over a longer period of time.


One source of discrete breaks in macroeconomic data is a major change in macroeconomic policy. For example, the breakdown of the Bretton Woods system of fixed exchange rates in 1972 produced the break in the time series behavior of the U.S. dollar/British pound exchange rate that is evident in Figure 14.2b. Prior to 1972, the exchange rate was essentially constant, with the exception of a single devaluation in 1967, in which the official value of the pound, relative to the dollar, was decreased. In contrast, since 1972 the exchange rate has fluctuated over a very wide range.

Breaks also can occur more slowly as the population regression function evolves over time. For example, such changes can arise because of slow evolution of economic policy and ongoing changes in the structure of the economy. The methods for detecting breaks described in this section can detect both types of breaks: distinct changes and slow evolution.

Problems caused by breaks.   If a break occurs in the population regression function during the sample, then the OLS regression estimates over the full sample will estimate a relationship that holds "on average," in the sense that the estimate combines the two different periods. Depending on the location and the size of the break, the "average" regression function can be quite different than the true regression function at the end of the sample, and this leads to poor forecasts.


Testing for Breaks


One way to detect breaks is to test for discrete changes, or breaks, in the regression coefficients. How this is done depends on whether the date of the suspected break (the break date) is known.

Testing for a break at a known date.   In some applications you might suspect that there is a break at a known date. For example, if you are studying international trade relationships using data from the 1970s, you might hypothesize that there is a break in the population regression function of interest in 1972, when the Bretton Woods system of fixed exchange rates was abandoned in favor of floating exchange rates.
If the date of the hypothesized break in the coefficients is known, then the null hypothesis of no break can be tested using a binary variable interaction regression of the type discussed in Chapter 8 (Key Concept 8.4). To keep things simple, consider an ADL(1,1) model, so there is an intercept, a single lag of Y_t, and a single lag of X_t. Let τ denote the hypothesized break date and let D_t(τ) be a binary variable that equals 0 before the break date and 1 after, so D_t(τ) = 0 if t ≤ τ and D_t(τ) = 1 if t > τ. Then the regression including the binary break indicator and the interaction terms is

Y_t = β0 + β1Y_{t-1} + δ1X_{t-1} + γ0D_t(τ) + γ1[D_t(τ) × Y_{t-1}] + γ2[D_t(τ) × X_{t-1}] + u_t.    (14.35)

If there is not a break, then the population regression function is the same over both parts of the sample, so the terms involving the break binary variable D_t(τ) do not enter Equation (14.35). That is, under the null hypothesis of no break, γ0 = γ1 = γ2 = 0. Under the alternative hypothesis that there is a break, the population regression function is different before and after the break date τ, in which case at least one of the γ's is nonzero. Thus the hypothesis of a break can be tested using the F-statistic that tests the hypothesis that γ0 = γ1 = γ2 = 0 against the hypothesis that at least one of the γ's is nonzero. This is often called a Chow test for a break at a known break date, named for its inventor, Gregory Chow (1960).

If there are multiple predictors or more lags, then this test can be extended by constructing binary variable interaction variables for all the regressors and testing the hypothesis that all the coefficients on terms involving D_t(τ) are zero.

This approach can be modified to check for a break in a subset of the coefficients by including only the binary variable interactions for that subset of regressors of interest.
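To make the mechanics concrete, here is a small simulated sketch of a Chow test at a known break date using the interaction regression in Equation (14.35). The data, break date, and coefficients are all invented for illustration, and the F-statistic computed below is the homoskedasticity-only version, ((SSR_restricted - SSR_unrestricted)/q) / (SSR_unrestricted/(T - k)); the chapter's applications use heteroskedasticity-robust F-statistics instead.

```python
import random

def ols_ssr(y, X):
    """OLS via the normal equations (Gauss-Jordan solve); returns the SSR."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] +
         [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(bi * xi for bi, xi in zip(b, row))) ** 2
               for row, yi in zip(X, y))

random.seed(2)
T, tau = 200, 120                                   # hypothetical break date
x = [random.gauss(0, 1) for _ in range(T + 1)]
y = [0.0]
for t in range(1, T + 1):
    b0 = 0.0 if t <= tau else 2.0                   # intercept shifts after tau
    y.append(b0 + 0.5 * y[t - 1] + x[t - 1] + random.gauss(0, 1))

Y, Xu, Xr = [], [], []
for t in range(1, T + 1):
    d = 1.0 if t > tau else 0.0                     # the break indicator D_t(tau)
    Xr.append([1.0, y[t - 1], x[t - 1]])            # restricted: no break terms
    Xu.append([1.0, y[t - 1], x[t - 1], d, d * y[t - 1], d * x[t - 1]])
    Y.append(y[t])

ssr_r, ssr_u = ols_ssr(Y, Xr), ols_ssr(Y, Xu)
q, k = 3, 6                                         # H0: gamma_0 = gamma_1 = gamma_2 = 0
F = ((ssr_r - ssr_u) / q) / (ssr_u / (T - k))       # large F rejects "no break"
```

With a simulated intercept shift this large, F lands well above conventional F(3, T - 6) critical values, so the (true) break is detected.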


Testing for a break at an unknown break date.   Often the date of a possible break is unknown or known only within a range. Suppose, for example, you suspect that a break occurred sometime between two dates, τ0 and τ1. The Chow test can be modified to handle this by testing for breaks at all possible dates τ between τ0 and τ1, then using the largest of the resulting F-statistics to test for a break at an unknown date. This modified Chow test is variously called the Quandt likelihood ratio (QLR) statistic (Quandt, 1960) (the term we shall use) or, more obscurely, the sup-Wald statistic.

Because the QLR statistic is the largest of many F-statistics, its distribution is not the same as an individual F-statistic's. Instead, the critical values for the QLR statistic must be obtained from a special distribution. Like the F-statistic's, this distribution depends on the number of restrictions being tested, q, that is, the number of coefficients (including the intercept) that are being allowed to break, or change, under the alternative hypothesis. The distribution of the QLR statistic also depends on τ0/T and τ1/T, that is, on the endpoints, τ0 and τ1, of the subsample over which the F-statistics are computed, expressed as a fraction of the total sample size.

For the large-sample approximation to the distribution of the QLR statistic to be a good one, the subsample endpoints, τ0 and τ1, cannot be too close to the beginning or the end of the sample. For this reason, in practice the QLR statistic is computed over a "trimmed" range, or subset, of the sample. A common choice is to use 15% trimming, that is, to set τ0 = 0.15T and τ1 = 0.85T (rounded to the nearest integer). With 15% trimming, the F-statistic is computed for break dates in the central 70% of the sample.

The critical values for the QLR statistic, computed with 15% trimming, are given in Table 14.6. Comparing these critical values with those of the F_{q,∞} distribution (Appendix Table 4) shows that the critical values for the QLR statistic are larger. This reflects the fact that the QLR statistic looks at the largest of many individual F-statistics. By examining F-statistics at many possible break dates, the QLR statistic has many opportunities to reject the null hypothesis, leading to QLR critical values that are larger than the individual F-statistic critical values.

Like the Chow test, the QLR test can be used to focus on the possibility that there are breaks in only some of the regression coefficients. This is done by first computing the Chow tests at different break dates using binary variable interactions only for the variables with the suspect coefficients, then computing the maximum of those Chow tests over the range τ0 ≤ τ ≤ τ1. The critical values for this version of the QLR test are also taken from Table 14.6, where the number of restrictions (q) is the number of restrictions tested by the constituent F-statistics.
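The QLR procedure can be sketched for the simplest case: a break in the mean of a series, so q = 1 and the unrestricted model at each trial date τ simply allows a different mean before and after τ. Everything below is simulated for illustration, and the F-statistics are homoskedasticity-only versions, whereas the chapter's applications use robust ones.

```python
import random

random.seed(3)
T, true_break = 200, 140
y = [(1.0 if t <= true_break else 2.0) + random.gauss(0, 1)
     for t in range(1, T + 1)]                      # mean shifts at the true break

def sse(seg):
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

ssr_r = sse(y)                                      # restricted: one common mean
t0, t1 = int(0.15 * T), int(0.85 * T)               # 15% trimming
F = {}
for tau in range(t0, t1 + 1):                       # Chow F at each trial break date
    ssr_u = sse(y[:tau]) + sse(y[tau:])             # separate means before and after tau
    F[tau] = (ssr_r - ssr_u) / (ssr_u / (T - 2))    # q = 1 restriction
qlr = max(F.values())                               # the QLR (sup-Wald) statistic
tau_hat = max(F, key=F.get)                         # break-date estimate
```

With a one-standard-deviation mean shift, `qlr` comfortably exceeds the q = 1 critical values with 15% trimming given in Table 14.6, and `tau_hat` lands near the true break date, illustrating the consistency of the break-date estimator discussed below.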


TABLE 14.6   Critical Values of the QLR Statistic with 15% Trimming

Number of restrictions (q)     10%      5%      1%
 1                             7.12     8.68   12.16
 2                             5.00     5.86    7.78
 3                             4.09     4.71    6.02
 4                             3.59     4.09    5.12
 5                             3.26     3.66    4.53
 6                             3.02     3.37    4.12
 7                             2.84     3.15    3.82
 8                             2.69     2.98    3.57
 9                             2.58     2.84    3.38
10                             2.48     2.71    3.23
11                             2.40     2.62    3.09
12                             2.33     2.54    2.97
13                             2.27     2.46    2.87
14                             2.21     2.40    2.78
15                             2.16     2.34    2.71
16                             2.12     2.29    2.64
17                             2.08     2.25    2.58
18                             2.05     2.20    2.53
19                             2.01     2.17    2.48
20                             1.99     2.13    2.43

These critical values apply when τ0 = 0.15T and τ1 = 0.85T (rounded to the nearest integer), so that the F-statistic is computed for all potential break dates in the central 70% of the sample. The number of restrictions q is the number of restrictions tested by each individual F-statistic. Critical values for other trimming percentages are given in Andrews (2003).

If there is a discrete break at a date within the range tested, then the QLR statistic will reject with high probability in large samples. Moreover, the date at which the constituent F-statistic is at its maximum, τ̂, is an estimator of the break date. This estimate is a good one in the sense that, under certain technical conditions, τ̂/T converges in probability to τ/T; that is, the fraction of the way through the sample at which the break occurs is estimated consistently.

KEY CONCEPT 14.9
THE QLR TEST FOR COEFFICIENT STABILITY

Let F(τ) denote the F-statistic testing the hypothesis of a break in the regression coefficients at date τ; in the regression in Equation (14.35), for example, this is the F-statistic testing the null hypothesis that γ0 = γ1 = γ2 = 0. The QLR (or sup-Wald) test statistic is the largest of such statistics in the range τ0 ≤ τ ≤ τ1:

QLR = max[F(τ0), F(τ0 + 1), . . . , F(τ1)].    (14.36)

1. Like the F-statistic, the QLR statistic can be used to test for a break in all or just some of the regression coefficients.


2. In large samples, the distribution of the QLR statistic under the null hypothesis depends on the number of restrictions being tested, q, and on the endpoints τ0 and τ1, expressed as a fraction of T. Critical values are given in Table 14.6 for 15% trimming (τ0 = 0.15T and τ1 = 0.85T, rounded to the nearest integer).

3. The QLR test can detect a single discrete break, multiple discrete breaks, and/or slow evolution of the regression function.

4. If there is a distinct break in the regression function, the date at which the largest Chow statistic occurs is an estimator of the break date.

The QLR statistic also rejects the null hypothesis with high probability in large samples when there are multiple discrete breaks or when the break comes in the form of a slow evolution of the regression function. This means that the QLR statistic detects forms of instability other than a single discrete break. As a result, if the QLR statistic rejects the null hypothesis, it can mean that there is a single discrete break, that there are multiple discrete breaks, or that there is slow evolution of the regression function.

The QLR statistic is summarized in Key Concept 14.9.

Warning: You probably don't know the break date even if you think you do.   Sometimes an expert might believe that he or she knows the date of a possible break, so that the Chow test can be used instead of the QLR test. But if this knowledge is based on the expert's familiarity with the series being analyzed, then in fact this date was estimated using the data, albeit in an informal way. Preliminary estimation of the break date means that the usual F critical values cannot be


FIGURE 14.5   F-Statistics Testing for a Break in Equation (14.17) at Different Dates

[Figure: F-statistics plotted by break date over 1965-2000, with the QLR statistic of 5.16 and the 1% critical value marked; horizontal axis: Break Date (Year).]

At a given break date, the F-statistic plotted here tests the null hypothesis of a break in at least one of the coefficients on Unemp_{t-1}, Unemp_{t-2}, Unemp_{t-3}, Unemp_{t-4}, or the intercept in Equation (14.17). For example, the F-statistic testing for a break in 1980:I is 2.85. The QLR statistic is the largest of these F-statistics, which is 5.16. This exceeds the 1% critical value of 4.53.

used for the Chow test for a break at that date. Thus it remains appropriate to use the QLR statistic in this circumstance.

Application: Has the Phillips curve been stable?   The QLR test provides a way to check whether the Phillips curve has been stable from 1962 to 2004. Specifically, we focus on whether there have been changes in the coefficients on the lagged values of the unemployment rate and the intercept in the ADL(4,4) specification in Equation (14.17), containing four lags each of ΔInf and Unemp.

The Chow F-statistics testing the hypothesis that the intercept and the coefficients on Unemp_{t-1}, ..., Unemp_{t-4} in Equation (14.17) are constant, against the alternative that they break at a given date, are plotted in Figure 14.5 for breaks in the central 70% of the sample. For example, the F-statistic testing for a break in 1980:I is 2.85, the value plotted at that date in the figure. Each F-statistic tests five restrictions (no change in the intercept and in the four coefficients on lags of the

unemployment rate), so q = 5. The largest of these F-statistics is 5.16, which occurs in 1981:IV; this is the QLR statistic. Comparing 5.16 to the critical values for q = 5 in Table 14.6 indicates that the hypothesis that these coefficients are stable is rejected at the 1% significance level (the critical value is 4.53). Thus there is evidence that at least one of these five coefficients changed over the sample.

Pseudo Out-of-Sample Forecasting


The ultimate test of a forecasting model is its out-of-sample performance, that is, its forecasting performance in "real time," after the model has been estimated. Pseudo out-of-sample forecasting is a method for simulating the real-time performance of a forecasting model. The idea of pseudo out-of-sample forecasting is simple: Pick a date near the end of the sample, estimate your forecasting model using data up to that date, then use that estimated model to make a forecast. Performing this exercise for multiple dates near the end of your sample yields a series of pseudo forecasts and thus pseudo forecast errors. The pseudo forecast errors can then be examined to see whether they are representative of what you would expect if the forecasting relationship were stationary.

The reason this is called "pseudo" out-of-sample forecasting is that it is not true out-of-sample forecasting. True out-of-sample forecasting occurs in real time; that is, you make your forecast without the benefit of knowing the future values of the series. In pseudo out-of-sample forecasting, you simulate real-time forecasting using your model, but you have the "future" data against which to assess those simulated, or pseudo, forecasts. Pseudo out-of-sample forecasting mimics the forecasting process that would occur in real time, but without having to wait for new data to arrive.

Pseudo out-of-sample forecasting gives a forecaster a sense of how well the model has been forecasting at the end of the sample. This can provide valuable information, either bolstering confidence that the model has been forecasting well or suggesting that the model has gone off track in the recent past. The methodology of pseudo out-of-sample forecasting is summarized in Key Concept 14.10.

Other uses of pseudo out-of-sample forecasting.   A second use of pseudo out-of-sample forecasting is to estimate the RMSFE. Because the pseudo out-of-sample forecasts are computed using only data prior to the forecast date, the pseudo out-of-sample forecast errors reflect both the uncertainty associated with future values of the error term and the uncertainty arising because the regression coefficients were estimated; that is, the pseudo out-of-sample forecast errors include both sources of error in Equation (14.21).


KEY CONCEPT 14.10
PSEUDO OUT-OF-SAMPLE FORECASTS

Pseudo out-of-sample forecasts are computed using the following steps:

1. Choose a number of observations, P, for which you will generate pseudo out-of-sample forecasts; for example, P might be 10% or 15% of the sample size. Let s = T − P.

2. Estimate the forecasting regression using the shortened data set for t = 1, . . . , s.

3. Compute the forecast for the first period beyond this shortened sample, s + 1; call this Ỹ_{s+1|s}.

4. Compute the forecast error, ũ_{s+1} = Y_{s+1} − Ỹ_{s+1|s}.

5. Repeat steps 2–4 for the remaining dates, s = T − P + 1 to T − 1 (re-estimate the regression at each date). The pseudo out-of-sample forecasts are {Ỹ_{s+1|s}, s = T − P, . . . , T − 1}, and the pseudo out-of-sample forecast errors are {ũ_{s+1}, s = T − P, . . . , T − 1}.
Thus the simple standard deviation of the pseudo out-of-sample forecast errors is an estimator of the RMSFE. As discussed in Section 14.4, this estimator of the RMSFE can be used to quantify forecast uncertainty and to construct forecast intervals.
A third use of pseudo out-of-sample forecasting is to compare two or more candidate forecasting models. Two models that appear to fit the data equally well can perform quite differently in a pseudo out-of-sample forecasting exercise. When the models are different, for example, when they include different predictors, pseudo out-of-sample forecasting provides a convenient way to compare the two models that focuses on their potential to provide reliable forecasts.
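The five steps in Key Concept 14.10 can be sketched with a simple AR(1) forecasting model. The series, sample size, and choice of P below are invented for illustration; the point is the re-estimation loop and the RMSFE computed from the resulting pseudo forecast errors.

```python
import math
import random

random.seed(4)
T = 300
series = [0.0]
for _ in range(T):
    series.append(0.3 + 0.6 * series[-1] + random.gauss(0, 1))
y = series[1:]                                      # observations y[0], ..., y[T-1]

def ar1_fit(obs):
    """OLS of y_t on (1, y_{t-1}); returns (intercept, slope)."""
    xs, ys = obs[:-1], obs[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
          / sum((a - mx) ** 2 for a in xs))
    return my - b1 * mx, b1

P = 30                                              # number of pseudo forecasts (step 1)
errors = []
for s in range(T - P, T):                           # steps 2-5 of Key Concept 14.10
    b0, b1 = ar1_fit(y[:s])                         # re-estimate using data through s
    forecast = b0 + b1 * y[s - 1]                   # pseudo forecast of period s + 1
    errors.append(y[s] - forecast)                  # pseudo forecast error
rmsfe = math.sqrt(sum(e * e for e in errors) / P)   # estimator of the RMSFE
```

Because each forecast uses only data through date s, `rmsfe` reflects both future-error uncertainty and estimation uncertainty, exactly the two sources of error discussed above.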

Application: Did the Phillips curve change during the 1990s?   Using the QLR statistic, we rejected the null hypothesis that the Phillips curve has been stable against the alternative of a break at the 1% significance level (see Figure 14.5). The maximal F-statistic occurred in 1981:IV, indicating that a break occurred in the early 1980s. This suggests that a forecaster using lagged unemployment to forecast inflation should use an estimation sample starting after the break in 1981:IV. Even so, a question remains: Does the Phillips curve provide a stable forecasting model subsequent to the 1981:IV break?

If the coefficients of the Phillips curve changed sometime during the 1982:I-2004:IV period, then pseudo out-of-sample forecasts computed using data

Perhaps you have heard the advice that you should buy a stock when its earnings are high relative to its price. Buying a stock is, in effect, buying the stream of future dividends paid by that company out of its earnings. If the dividend stream is unusually large relative to the price of the company's stock, then the company could be considered undervalued. If current dividends are an indicator of future dividends, then the dividend yield (the ratio of current dividends to the stock price) might forecast future excess stock returns. If the dividend yield is high, the stock is undervalued and returns would be forecasted to go up.

This reasoning suggests examining autoregressive distributed lag models of excess returns, where the predictor variable is the dividend yield. But a difficulty arises with this approach: The dividend yield is highly persistent and might even contain a stochastic trend. Using monthly data from 1960:1 to 2002:12 on the logarithm of the dividend-price ratio for the CRSP value-weighted index (the data are described in Appendix 14.1), a Dickey-Fuller unit root test including an intercept fails to reject the null hypothesis of a unit root at the 10% significance level. As always, this failure to reject the null hypothesis does not mean that the null hypothesis is true, but it does underscore the fact that the dividend yield is a highly persistent regressor. Following the logic of Section 14.6, this result suggests that we should use the first difference of the log dividend yield as a regressor, not the level of the log dividend yield.

Table 14.7 presents ADL models of excess returns on the CRSP value-weighted index. In columns (1) and (2), the dividend yield appears in first differences, and the individual t-statistics and joint F-statistics fail to reject the null hypothesis of no predictability. But while these specifications accord with the modeling recommendations of Section 14.6, they do not correspond to the economic reasoning in the introductory paragraph, which relates returns to the level of the dividend yield. Column (3) of Table 14.7 therefore reports an ADL(1,1) model of excess returns using the log dividend yield, estimated through 1992:12. The t-statistic is 2.25, which exceeds the usual 5% critical value of 1.96. However, because the regressor is highly persistent, the distribution of this t-statistic is suspect and the 1.96 critical value may be inappropriate. (The F-statistic for this regression is not reported because it does not necessarily have a chi-squared distribution, even in large samples, because of the persistence of the regressor.)

One way to evaluate the apparent predictability found in column (3) of Table 14.7 is to conduct a pseudo out-of-sample forecasting analysis. Doing so over the out-of-sample period 1993:1-2002:12 provides a sample root mean square forecast error of 4.08%. In contrast, the sample RMSFE of always forecasting excess returns to be zero is 4.00%, and the sample RMSFE of a "constant forecast" (in which the recursively estimated forecasting model includes only an intercept) is 3.98%. The pseudo out-of-sample forecast based on the ADL(1,1) model with the log dividend yield does worse than forecasts in which there are no predictors!

This lack of predictability is consistent with the strong form of the efficient markets hypothesis, which holds that all publicly available information is incorporated into stock prices, so that returns should not be predictable using publicly available information (the weak form concerns forecasts based on past returns only). The core message, that excess returns are not easily predicted, makes sense: If they were, the prices of stocks would be driven up to the point that no expected excess returns would exist.

The interpretation of results like those in Table 14.7 is a matter of heated debate among financial


economists. Some consider the lack of predictability in predictive regressions to be an indication of the efficient markets hypothesis (see, for example, Goyal and Welch, 2003). Others say that regressions over longer time periods and longer horizons, when analyzed using tools that are specifically designed to handle persistent regressors, show signs of predictability (see Campbell and Yogo, 2006). This predictability might arise from rational economic behavior, in which investor attitudes toward risk change over the business cycle (Campbell, 2003), or it might reflect "irrational exuberance" (Shiller, 2005).

The results in Table 14.7 concern monthly returns, but some financial econometricians have focused on ever-shorter horizons. The theory of market microstructure, which concerns the minute-to-minute movements of the stock market, suggests that there can be fleeting predictability and that money can be made by the clever and nimble. But doing so requires speed, plus lots of computing power, and a staff of talented econometricians.

TABLE 14.7   Autoregressive Distributed Lag Models of Monthly Excess Stock Returns

Dependent variable: excess returns on the CRSP value-weighted index

Specification:       (1) ADL(1,1)       (2) ADL(2,2)       (3) ADL(1,1)
Estimation period:   1960:1-2002:12     1960:1-2002:12     1960:1-1992:12

Regressors: excess return_{t-1} and Δln(dividend yield_{t-1}) in column (1); those plus excess return_{t-2} and Δln(dividend yield_{t-2}) in column (2); excess return_{t-1} and the level ln(dividend yield_{t-1}) in column (3); plus an intercept in each column. [Table entries: coefficient estimates for the regressors in each specification, with standard errors in parentheses, followed by the joint F-statistic with its p-value and the adjusted R².]

Notes: The data are described in Appendix 14.1. Entries in the regressor rows are coefficients, with heteroskedasticity-robust standard errors in parentheses. The final two rows report the heteroskedasticity-robust F-statistic testing the hypothesis that all the coefficients in the regression are zero, with its p-value in parentheses, and the adjusted R².

starting in 1982:I should deteriorate. The pseudo out-of-sample forecasts of inflation for the period 1999:I-2004:IV, computed using the four-lag Phillips curve estimated with data starting in 1982:I, are plotted in Figure 14.6, along with the actual values of inflation. For example, the forecast of inflation for 1999:I was computed by regressing ΔInf_t on ΔInf_{t-1}, ..., ΔInf_{t-4} and Unemp_{t-1}, ..., Unemp_{t-4}, with an intercept, using the data through 1998:IV, then computing the forecast ΔInf_{1999:I|1998:IV} using these estimated coefficients and the data through 1998:IV. The inflation forecast for 1999:I is then Inf_{1998:IV} + ΔInf_{1999:I|1998:IV}. This entire procedure was repeated using data through 1999:I to compute the forecast Inf_{1999:II|1999:I}. Doing this for all 24 quarters from 1999:I to 2004:IV creates 24 pseudo out-of-sample forecasts, which are plotted in Figure 14.6. The pseudo out-of-sample forecast errors are the differences between actual inflation and its pseudo out-of-sample forecast, that is, the differences between the two lines in Figure 14.6. For example, in 2000:IV the inflation rate fell by 0.8 percentage point, but the pseudo out-of-sample forecast of ΔInf_{2000:IV} was 0.3 percentage point, so the pseudo out-of-sample forecast error was ΔInf_{2000:IV} − ΔInf_{2000:IV|2000:III} = −0.8 − 0.3 = −1.1 percentage points. In other words, a forecaster using the ADL(4,4) model of the Phillips curve, estimated through 2000:III, would have forecasted that inflation would increase by 0.3 percentage point in 2000:IV, whereas in reality it fell by 0.8 percentage point.

How do the mean and standard deviation of the pseudo out-of-sample forecast errors compare with the in-sample fit of the model? The standard error of the regression of the four-lag Phillips curve, fit using data from 1982:I through 1998:IV, is 1.30, so based on the in-sample fit we would expect the out-of-sample forecast errors to have mean zero and root mean square forecast error of 1.30. In fact, over the 1999:I-2004:IV pseudo out-of-sample forecast period, the average forecast error is small, and the t-statistic testing the hypothesis that the mean forecast error equals zero is 0.41; thus the hypothesis that the forecast errors have mean zero is not rejected. In addition, the RMSFE over the pseudo out-of-sample forecast period is 1.32, very close to the value of 1.30 for the standard error of the regression for the 1982:I-1998:IV period. Moreover, the plot of the forecasts and the forecast errors in Figure 14.6 shows no major outliers or unusual discrepancies.
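The diagnostics just computed (the mean forecast error, its t-statistic, and the RMSFE) are easy to reproduce for any vector of pseudo out-of-sample forecast errors. The 24 error values below are made up for illustration; only the formulas mirror the text's calculation.

```python
import math

# Hypothetical pseudo out-of-sample forecast errors for 24 quarters
# (percentage points), standing in for the 1999:I-2004:IV errors
errors = [0.4, -1.1, 0.2, 0.7, -0.3, -0.6, 1.2, 0.1, -0.9, 0.5, 0.3, -0.2,
          0.8, -1.3, 0.6, -0.4, 0.9, -0.7, 0.2, -0.1, 1.0, -0.5, 0.3, -0.8]

n = len(errors)
mean_err = sum(errors) / n
s = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (n - 1))
t_stat = mean_err / (s / math.sqrt(n))              # tests H0: mean forecast error = 0
rmsfe = math.sqrt(sum(e * e for e in errors) / n)   # root mean square forecast error
```

If |t_stat| is below the usual critical value, the hypothesis of mean-zero forecast errors is not rejected, and comparing `rmsfe` with the in-sample standard error of the regression gives the stability check performed in the text.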
According to the pseudo out-of-sample forecasting exercise, the performance of the Phillips curve forecasting model during the pseudo out-of-sample period of 1999:I-2004:IV was comparable to its performance during the in-sample period of 1982:I-1998:IV. Although the QLR test points to instability in the Phillips curve in the early 1980s, this pseudo out-of-sample analysis suggests that, after the early 1980s break, the Phillips curve forecasting model has been stable.

FIGURE 14.6   U.S. Inflation and Pseudo Out-of-Sample Forecasts

[Figure: actual inflation (percent per annum) plotted quarterly from 1994 to 2005, with the pseudo out-of-sample forecasts for 1999:I-2004:IV overlaid.]

The pseudo out-of-sample forecasts made using a four-lag Phillips curve of the form in Equation (14.17) generally track actual inflation and are consistent with a stable post-1982 Phillips curve forecasting model.

Avoiding the Problems Caused by Breaks


The best way to adjust for a break in the population regression function depends on the source of that break. If a distinct break occurs at a specific date, this break will be detected with high probability by the QLR statistic, and the break date can be estimated. Thus the regression function can be estimated using a binary variable indicating the two subsamples associated with this break, interacted with the other regressors as needed. If all the coefficients break, then this regression takes the form of Equation (14.35), where τ is replaced by the estimated break date τ̂, while if only some of the coefficients break, then only the relevant interaction terms appear in the regression. If there is in fact a distinct break, then inference on the regression coefficients can proceed as usual, for example, using the usual normal critical values for hypothesis tests based on t-statistics. In addition, forecasts can be produced using the estimated regression function that applies to the end of the sample.

If the break is not distinct but rather arises from a slowly evolving change in the parameters, the remedy is more difficult and goes beyond the scope of this book.8
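The binary-variable approach just described can be sketched in a few lines. This is a minimal illustration on simulated data, not the chapter's empirical example: the AR(1) form, the break date tau, and all numerical values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical AR(1) whose intercept shifts at a known break date tau.
T, tau = 200, 100
y = np.zeros(T)
for t in range(1, T):
    intercept = 1.0 if t <= tau else 3.0
    y[t] = intercept + 0.5 * y[t - 1] + rng.standard_normal()

# Fully interacted regression: constant, lag, break dummy, dummy x lag.
ylag = y[:-1]                                  # regressor Y_{t-1}
d = (np.arange(1, T) > tau).astype(float)      # D_t = 1(t > tau)
X = np.column_stack([np.ones(T - 1), ylag, d, d * ylag])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(beta)  # beta[2], beta[3]: post-break shifts in intercept and slope
```

With all interactions included, this mirrors the fully interacted form discussed in the text; dropping some interaction terms corresponds to allowing only some coefficients to break.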

14.8 Conclusion

In time series data, a variable generally is correlated from one observation, or date, to the next. A consequence of this correlation is that linear regression can be used to forecast future values of a time series based on its current and past values. The starting point for time series regression is an autoregression, in which the regressors are lagged values of the dependent variable. If additional predictors are available, then their lags can be added to the regression.

This chapter has considered several technical issues that arise when estimating and using regressions with time series data. One such issue is determining the number of lags to include in the regressions. As discussed in Section 14.5, if the number of lags is chosen to minimize the BIC, then the estimated lag length is consistent for the true lag length.

Another of these issues concerns whether the series being analyzed are stationary. If the series are stationary, then the usual methods of statistical inference (such as comparing t-statistics to normal critical values) can be used, and, because the population regression function is stable over time, regressions estimated using historical data can be used reliably for forecasting. If, however, the series are nonstationary, then things become more complicated, where the specific complication depends on the nature of the nonstationarity. For example, if the series is nonstationary because it has a stochastic trend, then the OLS estimator and t-statistic can have nonstandard (nonnormal) distributions, even in large samples, and forecast performance can be improved by specifying the regression in first differences. A test for detecting this type of nonstationarity, the augmented Dickey-Fuller test for a unit root, was introduced in Section 14.6. Alternatively, if the population regression function has a break, then neglecting this break results in estimating an average version of the population regression function that in turn can lead to biased and/or imprecise forecasts. Procedures for detecting a break in the population regression function were introduced in Section 14.7.

In this chapter, the methods of time series regression were applied to economic forecasting, and the coefficients in these forecasting models were not given a causal interpretation. You do not need a causal relationship to forecast, and ignoring causal interpretations liberates the quest for good forecasts. In some

8 For an additional discussion of estimation and testing in the presence of discrete breaks, see Hansen (2001). For an advanced discussion of estimation and forecasting when there are slowly varying coefficients, see Hamilton (1994, Chapter 13).


applications, however, the task is not to develop a forecasting model but rather to estimate causal relationships among time series variables, that is, to estimate the dynamic causal effect on Y over time of a change in X. Under the right conditions, the methods of this chapter, or closely related methods, can be used to estimate dynamic causal effects, and that is the topic of the next chapter.

Summary

1. Regression models used for forecasting need not have a causal interpretation.
2. A time series variable generally is correlated with one or more of its lagged values; that is, it is serially correlated.
3. An autoregression of order p is a linear multiple regression model in which the regressors are the first p lags of the dependent variable. The coefficients of an AR(p) can be estimated by OLS, and the estimated regression function can be used for forecasting. The lag order p can be estimated using an information criterion such as the BIC.
4. Adding other variables and their lags to an autoregression can improve forecasting performance. Under the least squares assumptions for time series regression (Key Concept 14.6), the OLS estimators have normal distributions in large samples and statistical inference proceeds the same way as for cross-sectional data.
5. Forecast intervals are one way to quantify forecast uncertainty. If the errors are normally distributed, an approximate 68% forecast interval can be constructed as the forecast plus or minus an estimate of the root mean squared forecast error.
6. A series that contains a stochastic trend is nonstationary, violating the second least squares assumption in Key Concept 14.6. The OLS estimator and t-statistic for the coefficient of a regressor with a stochastic trend can have a nonstandard distribution, potentially leading to biased estimators, inefficient forecasts, and misleading inferences. The ADF statistic can be used to test for a stochastic trend. A random walk stochastic trend can be eliminated by using first differences of the series.
7. If the population regression function changes over time, then OLS estimates neglecting this instability are unreliable for statistical inference or forecasting. The QLR statistic can be used to test for a break and, if a discrete break is found, the regression function can be re-estimated in a way that allows for the break.
8. Pseudo out-of-sample forecasts can be used to assess model stability toward the end of the sample, to estimate the root mean squared forecast error, and to compare different forecasting models.


Key Terms
first lag (528)
jth lag (528)
first difference (530)
autocorrelation (532)
serial correlation (532)
autocorrelation coefficient (532)
autocovariance (532)
autoregression (535)
forecast error (536)
root mean squared forecast error (537)
AR(p) (538)
autoregressive distributed lag (ADL) model (543)
ADL(p,q) (543)
stationarity (544)
weak dependence (546)
Granger causality statistic (547)
Granger causality test (547)
forecast interval (549)
Bayes information criterion (BIC) (551)
Akaike information criterion (AIC) (552)
trend (555)
deterministic trend (555)
stochastic trend (555)
random walk (556)
random walk with drift (556)
unit root (557)
spurious regression (559)
Dickey-Fuller test (560)
Dickey-Fuller statistic (560)
augmented Dickey-Fuller (ADF) statistic (561)
break date (566)
Quandt likelihood ratio (QLR) statistic (567)
pseudo out-of-sample forecasting (571)

Review the Concepts


14.1 Look at the plot of the logarithm of GDP for Japan in Figure 14.2c. Does this time series appear to be stationary? Explain. Suppose that you calculated the first difference of this series. Would it appear to be stationary? Explain.

14.2 Many financial economists believe that the random walk model is a good description of the logarithm of stock prices. It implies that the percentage changes in stock prices are unforecastable. A financial analyst claims to have a new model that makes better predictions than the random walk model. Explain how you would examine the analyst's claim that his model is superior.

14.3 A researcher estimates an AR(1) with an intercept and finds that the OLS estimate of β1 is 0.95, with a standard error of 0.02. Does a 95% confidence interval include β1 = 1? Explain.


14.4 Suppose that you suspected that the intercept in Equation (14.17) changed in 1992:I. How would you modify the equation to incorporate this change? How would you test for a change in the intercept? How would you test for a change in the intercept if you did not know the date of the change?

Exercises
14.1 Consider the AR(1) model Y_t = β0 + β1Y_{t-1} + u_t. Suppose that the process is stationary.

a. Show that E(Y_t) = E(Y_{t-1}). (Hint: Read Key Concept 14.5.)

b. Show that E(Y_t) = β0/(1 - β1).

14.2

The index of industrial production (IP_t) is a monthly time series that measures the quantity of industrial commodities produced in a given month. This problem uses data on this index for the United States. All regressions are estimated over the sample period 1960:1 to 2000:12 (that is, January 1960 through December 2000). Let Y_t = 1200 × ln(IP_t/IP_{t-1}).

a. A forecaster states that Y_t shows the monthly percentage change in IP, measured in percentage points at an annual rate. Is this correct? Why?

b. Suppose that a forecaster estimates the following AR(4) model for Y_t:

Y_t = 1.377 + 0.318Y_{t-1} + 0.123Y_{t-2} + 0.068Y_{t-3} - 0.001Y_{t-4}.
     (0.062)  (0.078)       (0.055)       (0.068)       (0.056)

Use this AR(4) to forecast the value of Y_t in January 2001, using the following values of IP for August 2000 through December 2000:

Date  2000:7   2000:8   2000:9   2000:10  2000:11  2000:12
IP    147.595  148.650  148.973  148.660  148.206  147.300

c. Worried about potential seasonal fluctuations in production, the forecaster adds Y_{t-12} to the autoregression. The estimated coefficient on Y_{t-12} is 0.054 with a standard error of 0.053. Is this coefficient statistically significant?

d. Worried about a potential break, she computes a QLR test (with 15% trimming) on the constant and AR coefficients in the AR(4) model. The resulting QLR statistic is 3.45. Is there evidence of a break? Explain.


e. Worried that she might have included too few or too many lags in the model, the forecaster estimates AR(p) models for p = 1, ..., 6 over the same sample period. The sum of squared residuals from each of these estimated models is shown in the table. Use the BIC to estimate the number of lags that should be included in the autoregression. Do the results differ if you use the AIC?

[Table: sum of squared residuals by AR order, p = 1 through 6; entries not legible in this copy.]

14.3 Using the same data as in Exercise 14.2, a researcher tests for a stochastic trend in ln(IP_t), using the following regression:

Δln(IP_t) = 0.061 + 0.00004t - 0.018 ln(IP_{t-1}) + 0.333Δln(IP_{t-1}) + 0.162Δln(IP_{t-2}),
           (0.024) (0.00001)  (0.007)             (0.075)              (0.055)

where the standard errors shown in parentheses are computed using the homoskedasticity-only formula and the regressor "t" is a linear time trend.

a. Use the ADF statistic to test for a stochastic trend (unit root) in ln(IP).

b. Do these results support the specification used in Exercise 14.2? Explain.
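An ADF regression of this form can be computed by hand with OLS. The sketch below uses a simulated random walk rather than the IP data (the series, sample size, and seed are hypothetical); because the unit-root null is true by construction, the ADF t-statistic on the lagged level should typically lie above the 5% critical value of about -3.41 when an intercept and trend are included.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated random walk standing in for a log level series such as ln(IP).
T = 500
y = np.cumsum(rng.standard_normal(T))
dy = np.diff(y)                      # dy[i] = y[i+1] - y[i]

# ADF regression: dy_t on constant, trend, y_{t-1}, and 2 lagged diffs.
rows = range(2, T - 1)
X = np.array([[1.0, i, y[i], dy[i - 1], dy[i - 2]] for i in rows])
Y = dy[2:]
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta
s2 = resid @ resid / (len(Y) - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
t_adf = beta[2] / se                 # ADF t-statistic on the lagged level
print(t_adf)
```

The t-statistic must be compared with Dickey-Fuller critical values, not normal ones, because of the nonstandard distribution discussed in Section 14.6.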
14.4 The forecaster in Exercise 14.2 augments her AR(4) model for IP growth to include four lagged values of ΔR_t, where R_t is the interest rate on three-month U.S. Treasury bills (measured in percentage points at an annual rate).

a. The Granger-causality F-statistic on the four lags of ΔR_t is 2.35. Do interest rates help to predict IP growth? Explain.

b. The researcher also regresses ΔR_t on a constant, four lags of ΔR_t, and four lags of IP growth. The resulting Granger-causality F-statistic on the four lags of IP growth is 2.87. Does IP growth help to predict interest rates? Explain.
14.5 Prove the following results about conditional means, forecasts, and forecast errors:

a. Let W be a random variable with mean μ_W and variance σ_W^2, and let c be a constant. Show that E[(W - c)^2] = σ_W^2 + (μ_W - c)^2.

b. Consider the problem of forecasting Y_t using data on Y_{t-1}, Y_{t-2}, .... Let f_{t-1} denote some forecast of Y_t, where the subscript t - 1 on f_{t-1} indicates that the forecast is a function of data through date t - 1. Let E[(Y_t - f_{t-1})^2 | Y_{t-1}, Y_{t-2}, ...] be the conditional mean squared error of the forecast f_{t-1}, conditional on Y observed through date t - 1. Show that the conditional mean squared forecast error is minimized when f_{t-1} = Y_{t|t-1}, where Y_{t|t-1} = E(Y_t | Y_{t-1}, Y_{t-2}, ...). [Hint: Extend the result in (a) to conditional expectations.]

c. Let u_t denote the error in Equation (14.14). Show that cov(u_t, u_{t-j}) = 0 for j ≠ 0. [Hint: Use Equation (2.27).]
14.6 In this exercise you will conduct a Monte Carlo experiment that studies the phenomenon of spurious regression discussed in Section 14.6. In a Monte Carlo study, artificial data are generated using a computer; then these artificial data are used to calculate the statistics being studied. This makes it possible to compute the distribution of statistics for known models when mathematical expressions for those distributions are complicated (as they are here) or even unknown. In this exercise, you will generate data so that two series, Y_t and X_t, are independently distributed random walks. The specific steps are:

i. Use your computer to generate a sequence of T = 100 i.i.d. standard normal random variables. Call these variables e_1, e_2, ..., e_100. Set Y_1 = e_1 and Y_t = Y_{t-1} + e_t for t = 2, 3, ..., 100.

ii. Use your computer to generate a new sequence, a_1, a_2, ..., a_100, of T = 100 i.i.d. standard normal random variables. Set X_1 = a_1 and X_t = X_{t-1} + a_t for t = 2, 3, ..., 100.

iii. Regress Y_t onto a constant and X_t. Compute the OLS estimator, the regression R^2, and the (homoskedastic-only) t-statistic testing the null hypothesis that β1 (the coefficient on X_t) is zero.

Use this algorithm to answer the following questions:

a. Run the algorithm (i)-(iii) once. Use the t-statistic from (iii) to test the null hypothesis that β1 = 0 using the usual 5% critical value of 1.96. What is the R^2 of your regression?

b. Repeat (a) 1000 times, saving each value of R^2 and the t-statistic. Construct a histogram of the R^2 and t-statistic. What are the 5%, 50%, and 95% percentiles of the distributions of the R^2 and the t-statistic? In what fraction of your 1000 simulated data sets does the t-statistic exceed 1.96 in absolute value?

c. Repeat (b) for different numbers of observations, for example, T = 50 and T = 200. As the sample size increases, does the fraction of times that you reject the null hypothesis approach 5%, as it should because you have generated X and Y to be independently distributed? Does this fraction seem to approach some other limit as T gets large? What is that limit?
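Steps (i)-(iii) above can be sketched as follows; the seed and the helper-function name are arbitrary choices, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_draw(T=100):
    """Steps (i)-(iii): two independent random walks, then OLS of Y on X."""
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (T - 2)                       # homoskedastic-only
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta[1] / se, r2

tstats, r2s = zip(*(one_draw() for _ in range(1000)))
reject = np.mean(np.abs(tstats) > 1.96)
print(reject, np.median(r2s))  # rejection rate far above 5%: spurious regression
```

Rerunning with T = 50 and T = 200 (part c) shows the rejection rate rising, not falling, with the sample size, which is the hallmark of spurious regression between independent random walks.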
14.7 Suppose that Y_t follows the stationary AR(1) model Y_t = 2.5 + 0.7Y_{t-1} + u_t, where u_t is i.i.d. with E(u_t) = 0 and var(u_t) = 9.

a. Compute the mean and variance of Y_t. (Hint: See Exercise 14.1.)

b. Compute the first two autocovariances of Y_t. (Hint: Read Appendix 14.2.)

c. Compute the first two autocorrelations of Y_t.

d. Suppose that Y_T = 102.3. Compute Y_{T+1|T} = E(Y_{T+1} | Y_T, Y_{T-1}, ...).

14.8 Suppose that Y_t is the monthly value of the number of new home construction projects started in the United States. Because of the weather, Y_t has a pronounced seasonal pattern; for example, housing starts are low in January and high in June. Let μ_Jan denote the average value of housing starts in January and μ_Feb, μ_Mar, ..., μ_Dec denote the average values in the other months. Show that the values of μ_Jan, μ_Feb, ..., μ_Dec can be estimated from the OLS regression Y_t = β0 + β1Feb_t + β2Mar_t + ... + β11Dec_t + u_t, where Feb_t is a binary variable equal to 1 if t is February, Mar_t is a binary variable equal to 1 if t is March, and so forth. Show that β0 = μ_Jan, β0 + β1 = μ_Feb, β0 + β2 = μ_Mar, and so forth.
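The regression in Exercise 14.8 is easy to illustrate numerically. The monthly means below are made up; the sketch only verifies that the OLS coefficients recover β0 = μ_Jan and β0 + β1 = μ_Feb, which is the relationship the exercise asks you to prove.

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up month-specific means mu_Jan, ..., mu_Dec.
mu = np.array([1.0, 1.2, 2.0, 2.5, 3.0, 4.0, 3.5, 3.0, 2.5, 2.0, 1.5, 1.1])
months = np.tile(np.arange(12), 20)          # 20 years of monthly data
y = mu[months] + 0.1 * rng.standard_normal(len(months))

# Constant plus Feb..Dec dummies (January is the omitted base month).
X = np.column_stack([np.ones(len(months))] +
                    [(months == m).astype(float) for m in range(1, 12)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta[0])            # estimates mu_Jan (near 1.0 here)
print(beta[0] + beta[1])  # estimates mu_Feb (near 1.2 here)
```

Omitting the January dummy avoids perfect multicollinearity with the constant, which is why the intercept picks up the base month's mean.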

14.9 The moving average model of order q has the form

Y_t = β0 + e_t + b1e_{t-1} + ... + bqe_{t-q},

where e_t is a serially uncorrelated random variable with mean 0 and variance σ_e^2.

a. Show that E(Y_t) = β0.

b. Show that the variance of Y_t is var(Y_t) = σ_e^2(1 + b1^2 + b2^2 + ... + bq^2).

c. Show that ρ_j = 0 for j > q.

d. Suppose that q = 1. Derive the autocovariances for Y_t.

14.10 A researcher carries out a QLR test using 25% trimming, and there are q = 5 restrictions. Answer the following questions, using the values in Table 14.6 ("Critical Values of the QLR Statistic with 15% Trimming") and Appendix Table 4 ("Critical Values of the F_{m,∞} Distribution").

a. The QLR F-statistic is 4.2. Should the researcher reject the null hypothesis at the 5% level?

b. The QLR F-statistic is 2.1. Should the researcher reject the null hypothesis at the 5% level?


c. The QLR F-statistic is 3.5. Should the researcher reject the null hypothesis at the 5% level?

14.11 Suppose that ΔY_t follows the AR(1) model ΔY_t = β0 + β1ΔY_{t-1} + u_t.

a. Show that Y_t follows an AR(2) model.

b. Derive the AR(2) coefficients for Y_t as a function of β0 and β1.

Empirical Exercises

E14.1 On the textbook Web site www.aw-bc.com/stock_watson, you will find a data file USMacro_Quarterly that contains quarterly data on several macroeconomic series for the United States; the data are described in the file USMacro_Description. Compute Y_t = ln(GDP_t), the logarithm of real GDP, and ΔY_t, the quarterly growth rate of GDP. In Empirical Exercises 14.1-14.6, use the sample period 1955:1-2004:4 (where data before 1955 may be used, as necessary, as initial values for lags in regressions).

a. Compute the mean of ΔY_t.

b. Express the mean growth rate in percentage points at an annual rate. (Hint: Multiply the sample mean in (a) by 400.)

c. Estimate the standard deviation of ΔY_t. Express your answer in percentage points at an annual rate.

d. Estimate the first four autocorrelations of ΔY_t. What are the units of the autocorrelations (quarterly rates of growth, percentage points at an annual rate, or no units at all)?

E14.2

a. Estimate an AR(1) model for ΔY_t. What is the estimated AR(1) coefficient? Is the coefficient statistically significantly different from zero? Construct a 95% confidence interval for the population AR(1) coefficient.

b. Estimate an AR(2) model for ΔY_t. Is the AR(2) coefficient statistically significantly different from zero? Is this model preferred to the AR(1) model?

c. Estimate AR(3) and AR(4) models. Using the estimated AR(1) through AR(4) models, (i) use BIC to choose the number of lags in the AR model; (ii) how many lags does AIC choose?
E14.3 Use the augmented Dickey-Fuller statistic to test for a unit autoregressive root in an AR model for Y_t. As an alternative, suppose that Y_t is stationary around a deterministic trend.
E14.4 Test for a break in the AR(1) model for ΔY_t using a QLR test.

E14.5 a. Let R_t denote the interest rate for three-month Treasury bills. Estimate an ADL(1,4) model for ΔY_t, using lags of ΔR_t as additional predictors. Comparing the ADL(1,4) model to the AR(1) model, by how much has the R^2 changed?

b. Is the Granger causality F-statistic significant?

c. Test for a break in the coefficients on the constant term and the coefficients on the lagged values of ΔR_t using a QLR test. Is there evidence of a break?

E14.6

a. Construct pseudo out-of-sample forecasts using the AR(1) model beginning in 1984:4 and going through the end of the sample. (That is, compute ΔY_{1985:1|1984:4}, ΔY_{1985:2|1985:1}, and so forth.)

b. Construct pseudo out-of-sample forecasts using the ADL(1,4) model.

c. Construct pseudo out-of-sample forecasts using the following naive model: ΔY_{t+1|t} = ΔY_t.

d. Compute the pseudo out-of-sample forecast errors for each model. Are any of the forecasts biased? Which model has the smallest root mean squared forecast error (RMSFE)? How large is the RMSFE (expressed in percentage points at an annual rate) for the best model?

E14.7 Read the boxes "Can You Beat the Market? Part I" and "Can You Beat the Market? Part II" in this chapter. Next, go to the course Web site, where you will find an extended version of the dataset described in the boxes; the data are in the file Stock_Returns_1931_2002 and are described in the file Stock_Returns_1931_2002_Description.

a. Repeat the calculations reported in Table 14.3, using regressions estimated over the 1932:1-2002:12 sample period.

b. Repeat the calculations reported in Table 14.7, using regressions estimated over the 1932:1-2002:12 sample period.

c. Is the variable ln(dividend yield) highly persistent? Explain.

d. Construct pseudo out-of-sample forecasts of excess returns over the 1983:1-2002:12 period, using regressions that begin in 1932:1.

e. Do the results in (a)-(d) suggest any important changes to the conclusions reached in the boxes? Explain.


APPENDIX

14.1 Time Series Data Used in Chapter 14


Macroeconomic time series data for the United States are collected and published by various government agencies. The U.S. Consumer Price Index is measured using monthly surveys and is compiled by the Bureau of Labor Statistics (BLS). The unemployment rate is computed from the BLS's Current Population Survey (see Appendix 3.1). The quarterly data used here were computed by averaging the monthly values. The federal funds rate data are the monthly average of daily rates as reported by the Federal Reserve, and the dollar-pound exchange rate data are the monthly average of daily rates; both are for the final month in the quarter. Japanese GDP data were obtained from the OECD. The daily percentage change in the NYSE Composite Index was computed as 100Δln(NYS_t), where NYS_t is the value of the index at the daily close of the New York Stock Exchange; because the stock exchange is not open on weekends and holidays, the time period of analysis is a business day. These and thousands of other economic time series are freely available on the Web sites maintained by various data-collecting agencies.

The regressions in Tables 14.3 and 14.7 use monthly financial data for the United States. Stock prices (P_t) are measured by the broad-based (NYSE and AMEX) value-weighted index of stock prices constructed by the Center for Research in Security Prices (CRSP). The monthly percent excess return is 100 × {ln[(P_t + Div_t)/P_{t-1}] - ln(TBill_t)}, where Div_t is the dividends paid on the stocks in the CRSP index and TBill_t is the gross return (1 plus the interest rate) on a 30-day Treasury bill during month t. The dividend-price ratio is constructed as the dividends over the past 12 months, divided by the price in the current month.

We thank Motohiro Yogo for his help and for providing these data.
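As a check on the excess-return formula above, here is a one-month numerical example; the price, dividend, and T-bill figures are invented, not CRSP data.

```python
import math

# One-month example of 100 x { ln[(P_t + Div_t)/P_{t-1}] - ln(TBill_t) }.
p_prev, p_now = 100.0, 102.0    # hypothetical index level last month / now
div = 0.3                        # hypothetical dividends paid this month
tbill_gross = 1.004              # hypothetical gross 30-day T-bill return

excess = 100.0 * (math.log((p_now + div) / p_prev) - math.log(tbill_gross))
print(round(excess, 3))          # about 1.875 percent for this month
```

The log-difference form means the excess return is approximately the percentage capital gain plus dividend yield, minus the T-bill interest rate.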

APPENDIX

14.2 Stationarity in the AR(1) Model


This appendix shows that, if |β1| < 1 and u_t is stationary, then Y_t is stationary. Recall from Key Concept 14.5 that the time series variable Y_t is stationary if the joint distribution of (Y_{s+1}, ..., Y_{s+T}) does not depend on s. To streamline the argument, we show this for T = 2 under the simplifying assumptions that β0 = 0 and the u_t are i.i.d. N(0, σ_u^2).


The first step is deriving an expression for Y_t in terms of the u_t's. Because β0 = 0, Equation (14.8) implies that Y_t = β1Y_{t-1} + u_t. Substituting Y_{t-1} = β1Y_{t-2} + u_{t-1} into this expression yields Y_t = β1(β1Y_{t-2} + u_{t-1}) + u_t = β1^2 Y_{t-2} + β1u_{t-1} + u_t. Continuing this substitution another step yields Y_t = β1^3 Y_{t-3} + β1^2 u_{t-2} + β1u_{t-1} + u_t, and continuing indefinitely yields

Y_t = u_t + β1u_{t-1} + β1^2 u_{t-2} + β1^3 u_{t-3} + ... = Σ_{i=0}^∞ β1^i u_{t-i}.   (14.37)

Thus Y_t is a weighted average of current and past u_t's. Because the u_t's are normally distributed and because the weighted average of normal random variables is normal (Section 2.4), Y_{s+1} and Y_{s+2} have a bivariate normal distribution. Recall from Section 2.4 that the bivariate normal distribution is completely determined by the means of the two variables, their variances, and their covariance. Thus, to show that Y_t is stationary, we need to show that the means, variances, and covariance of (Y_{s+1}, Y_{s+2}) do not depend on s. An extension of the argument used below can be used to show that the distribution of (Y_{s+1}, Y_{s+2}, ..., Y_{s+T}) does not depend on s.

The means and variances of Y_{s+1} and Y_{s+2} can be computed using Equation (14.37), with the subscript s + 1 or s + 2 replacing t. First, because E(u_t) = 0 for all t, E(Y_t) = Σ_{i=0}^∞ β1^i E(u_{t-i}) = 0, so the means of Y_{s+1} and Y_{s+2} are both zero and in particular do not depend on s. Second, var(Y_t) = var(Σ_{i=0}^∞ β1^i u_{t-i}) = Σ_{i=0}^∞ (β1^i)^2 var(u_{t-i}) = σ_u^2 Σ_{i=0}^∞ β1^{2i} = σ_u^2/(1 - β1^2), where the final equality follows from the fact that, if |a| < 1, Σ_{i=0}^∞ a^i = 1/(1 - a); thus var(Y_{s+1}) = var(Y_{s+2}) = σ_u^2/(1 - β1^2), which does not depend on s as long as |β1| < 1. Finally, because Y_{s+2} = β1Y_{s+1} + u_{s+2}, cov(Y_{s+1}, Y_{s+2}) = E(Y_{s+1}Y_{s+2}) = E[Y_{s+1}(β1Y_{s+1} + u_{s+2})] = β1var(Y_{s+1}) + cov(Y_{s+1}, u_{s+2}) = β1var(Y_{s+1}) = β1σ_u^2/(1 - β1^2). The covariance does not depend on s, so Y_{s+1} and Y_{s+2} have a joint probability distribution that does not depend on s; that is, their joint distribution is stationary. If |β1| ≥ 1, this calculation breaks down because the infinite sum in Equation (14.37) does not converge and the variance of Y_t is infinite. Thus Y_t is stationary if |β1| < 1 but not if |β1| ≥ 1.

The preceding argument was made under the assumptions that β0 = 0 and u_t is normally distributed. If β0 ≠ 0, the argument is similar except that the means of Y_{s+1} and Y_{s+2} are β0/(1 - β1) and Equation (14.37) must be modified for this nonzero mean. The assumption that u_t is i.i.d. normal can be replaced with the assumption that u_t is stationary with a finite variance because, by Equation (14.37), Y_t can still be expressed as a function of current and past u_t's, so the distribution of Y_t is stationary as long as the distribution of u_t is stationary and the infinite sum expression in Equation (14.37) is meaningful in the sense that it converges, which requires |β1| < 1.
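The appendix's variance formula can be checked by simulation. The parameter values and seed below are arbitrary; the sketch only verifies that the sample variance of a simulated stationary AR(1) is close to σ_u^2/(1 - β1^2).

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate a stationary AR(1) with beta0 = 0 and compare the sample
# variance with the formula var(Y) = sigma_u^2 / (1 - beta1^2).
beta1, sigma_u, T = 0.6, 1.0, 200_000
y = np.zeros(T)
for t in range(1, T):
    y[t] = beta1 * y[t - 1] + sigma_u * rng.standard_normal()

sample_var = y[1000:].var()               # drop a burn-in period
theory_var = sigma_u**2 / (1 - beta1**2)  # = 1.5625 for these values
print(sample_var, theory_var)
```

The burn-in discards the early observations, which start from the arbitrary initial value y[0] = 0 rather than a draw from the stationary distribution.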


APPENDIX

14.3 Lag Operator Notation


The notation in this and the next two chapters is streamlined considerably by adopting what is known as lag operator notation. Let L denote the lag operator, which has the property that it transforms a variable into its lag. That is, the lag operator L has the property LY_t = Y_{t-1}. By applying the lag operator twice, one obtains the second lag: L^2 Y_t = L(LY_t) = LY_{t-1} = Y_{t-2}. More generally, by applying the lag operator j times, one obtains the jth lag. In summary, the lag operator has the property that

LY_t = Y_{t-1}, L^2 Y_t = Y_{t-2}, and L^j Y_t = Y_{t-j}.   (14.38)

The lag operator notation permits us to define the lag polynomial, which is a polynomial in the lag operator:

a(L) = a_0 + a_1 L + a_2 L^2 + ... + a_p L^p = Σ_{j=0}^p a_j L^j,   (14.39)

where a_0, ..., a_p are the coefficients of the lag polynomial and L^0 = 1. The degree of the lag polynomial a(L) in Equation (14.39) is p. Multiplying Y_t by a(L) yields

a(L)Y_t = (Σ_{j=0}^p a_j L^j)Y_t = Σ_{j=0}^p a_j L^j Y_t = Σ_{j=0}^p a_j Y_{t-j} = a_0 Y_t + a_1 Y_{t-1} + ... + a_p Y_{t-p}.   (14.40)

The expression in Equation (14.40) implies that the AR(p) model in Equation (14.13) can be written compactly as

a(L)Y_t = β0 + u_t,   (14.41)

where a_0 = 1 and a_j = -β_j for j = 1, ..., p. Similarly, an ADL(p,q) model can be written

a(L)Y_t = β0 + c(L)X_{t-1} + u_t,   (14.42)

where a(L) is a lag polynomial of degree p (with a_0 = 1) and c(L) is a lag polynomial of degree q - 1.
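A lag polynomial applied to a series is just a weighted sum of its lags, as in Equation (14.40). The helper below (its name is an invented convenience, not book notation) implements this directly; applying (1 - L) reproduces the first difference.

```python
import numpy as np

def apply_lag_polynomial(coeffs, y):
    """Compute a(L)y_t = a0*y_t + a1*y_{t-1} + ... + ap*y_{t-p}.

    Values are returned only for dates t >= p, where all lags exist.
    """
    p = len(coeffs) - 1
    out = np.zeros(len(y) - p)
    for j, a in enumerate(coeffs):
        out += a * y[p - j : len(y) - j]
    return out

y = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
# (1 - L)y_t is the first difference Delta y_t:
print(apply_lag_polynomial([1.0, -1.0], y))  # -> [1. 2. 3. 4.]
```

Writing the AR(p) as a(L)Y_t = β0 + u_t then amounts to choosing coeffs = [1, -β1, ..., -βp].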


APPENDIX

14.4

ARMA Models
The autoregressive-moving average (ARMA) model extends the autoregressive model by modeling u_t as serially correlated, specifically, as being a distributed lag (or "moving average") of another unobserved error term. In the lag operator notation of Appendix 14.3, let u_t = b(L)e_t, where e_t is a serially uncorrelated, unobserved random variable and b(L) is a lag polynomial of degree q with b_0 = 1. Then the ARMA(p,q) model is

a(L)Y_t = β0 + b(L)e_t,   (14.43)

where a(L) is a lag polynomial of degree p with a_0 = 1.

Both AR and ARMA models can be thought of as ways to approximate the autocovariances of Y_t. The reason for this is that any stationary time series Y_t with a finite variance can be written either as an AR or as an MA with a serially uncorrelated error term, although the AR or MA models might need to have an infinite order. The second of these results, that a stationary process can be written in moving average form, is known as the Wold decomposition theorem and is one of the fundamental results underlying the theory of stationary time series analysis.

As a theoretical matter, the families of AR, MA, and ARMA models are equally rich, as long as the lag polynomials have a sufficiently high degree. Still, in some cases the autocovariances can be better approximated using an ARMA(p,q) model with small p and q than by a pure AR model with only a few lags. As a practical matter, however, the estimation of ARMA models is more difficult than the estimation of AR models, and ARMA models are more difficult to extend to additional regressors than are AR models.
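The moving-average building block of the ARMA model is easy to simulate. The sketch below checks the MA(1) case Y_t = e_t + b1*e_{t-1} against its known first autocorrelation b1/(1 + b1^2) (a standard result; compare Exercise 14.9); the coefficient, sample size, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate an MA(1): Y_t = e_t + b1 * e_{t-1}, with e_t i.i.d. N(0, 1).
b1, T = 0.5, 200_000
e = rng.standard_normal(T + 1)
y = e[1:] + b1 * e[:-1]

rho1 = np.corrcoef(y[1:], y[:-1])[0, 1]
print(rho1, b1 / (1 + b1**2))  # sample estimate vs. theoretical value 0.4
```

Autocorrelations at lags beyond q = 1 are zero for this process, which is the sharp cutoff that distinguishes MA from AR autocovariance patterns.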

APPENDIX

14.5

Consistency of the BIC Lag Length Estimator


This appendix summarizes the argument that the BIC estimator of the lag length, p̂, in an autoregression is correct in large samples, that is, Pr(p̂ = p) → 1. This is not true for the AIC estimator, which can overestimate p even in large samples.


BIC
Fir-t ~un,IUt r lhs.;

~p.:u.ll C.N: that the

BIC Ill Us<.:d to cll\l\1\<.. .unong olllhlll. '''-'''-IIIII~ "llh

cru, one, ur ~~~ ll.?' \\hen tb..: trul.' lag k 1gth '" on... II ' .,hm\ n hdo\\ thJt (I 1 l'r(p
0) -

0.md

(ti) Pr(p

0. from ~hi.:" =t fvllo\\' that l'r(fl

2) -

e"tcn,ton o r !-tic; argument'" t.,e g~.:ncrnl ca..e "f '" 1 d ,.,over 0 s I'

m_ tltJt Pr(p < p) - , II ,,nJ Prr p > fl)

---4

I) -

I The

enuuls h "

0: the str leg_\ for ~hu\\ing these i' the !'i,lntc.:

'w.~u m {IJ Jnd 1i1) ~lo\\

Proof of (i} and (ii}


Pruof of (1). To chr ~ p = (Itt mu't tx. theca~ that Ul("(lll < BJC( 1) th 11 , BIC(O\
BLC.c )<ll. ~uw Ulqllt-BIC(I)
2{1 7) I

In( 'i5R(O) I)

'\\R I) 1'\ -+

[ 1t\\RIOJ.T)-L1ll/Ji/)

ln(5SR{!)/

n - (In[),/.

I'.U\\

IU -

s.'RlOI. I

'L

1),

-"- n 1 \Sf<{!) J ......:....,, "~ami (In T) J'---> 0. pullm~ these ptc<.:c."> ltgcth~:J , Btl (OJ

-RIC'( I)

-''-

lntr;. - Jrur~

> 0 hc.cau'e u~ > ,,,:.

II tollm~' rh.tt Pr(BI( (II)

lHCII)) - O,,uthatl'r(iJ - 0) - 0.

Proof of (ii). To choose p̂ = 2, it must be the case that BIC(2) < BIC(1), that is, BIC(2) - BIC(1) < 0. Now T[BIC(2) - BIC(1)] = T{[ln(SSR(2)/T) + 3(ln T)/T] - [ln(SSR(1)/T) + 2(ln T)/T]} = T ln[SSR(2)/SSR(1)] + ln T = -T ln[1 + F/(T - 2)] + ln T, where F = [SSR(1) - SSR(2)]/[SSR(2)/(T - 2)] is the homoskedasticity-only F-statistic testing the null hypothesis that β_2 = 0 in the AR(2). If u_t is homoskedastic, F has a χ²_1 asymptotic distribution; if not, it has some other asymptotic distribution. Thus Pr(BIC(2) - BIC(1) < 0) = Pr(T[BIC(2) - BIC(1)] < 0) = Pr(-T ln[1 + F/(T - 2)] + ln T < 0) = Pr(T ln[1 + F/(T - 2)] > ln T). As T increases, T ln[1 + F/(T - 2)] → F (a consequence of the logarithmic approximation ln(1 + x) ≅ x, which becomes exact as x → 0); thus Pr(T ln[1 + F/(T - 2)] > ln T) → Pr(F > ln T) → 0. Thus Pr(BIC(2) - BIC(1) < 0) → 0, so that Pr(p̂ = 2) → 0.

AIC

In the special case of an AR(1) when zero, one, or two lags are considered, (i) applies to the AIC, with the modification that the term ln T is replaced by 2, so Pr(p̂ = 0) → 0. All the steps in the proof of (ii) also apply to the AIC, with the modification that ln T is replaced by 2; thus Pr(AIC(2) - AIC(1) < 0) → Pr(F > 2). If u_t is homoskedastic, Pr(F > 2) = Pr(χ²_1 > 2) = 0.16, so that Pr(p̂ = 2) → 0.16. In general, when p̂ is chosen using the AIC, Pr(p̂ < p) → 0, but Pr(p̂ > p) tends to a positive number, so Pr(p̂ = p) does not tend to 1.
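As an illustration (not part of the text), the contrast between the two criteria can be checked in a small simulation. The sketch below uses the information-criterion form from Section 14.5, ln(SSR(p)/T) + (p + 1)·penalty/T, with penalty ln T for the BIC and 2 for the AIC; the sample size, number of replications, and AR coefficient are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def choose_lag(y, max_p, penalty):
    """Pick p in 0..max_p minimizing ln(SSR(p)/T) + (p + 1) * penalty / T."""
    T = len(y) - max_p                        # common estimation sample
    Y = y[max_p:]
    ic = []
    for p in range(max_p + 1):
        X = np.column_stack([np.ones(T)] +
                            [y[max_p - j:len(y) - j] for j in range(1, p + 1)])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        ic.append(np.log(np.sum((Y - X @ b) ** 2) / T) + (p + 1) * penalty / T)
    return int(np.argmin(ic))

n_obs, reps = 400, 200
bic_picks = np.empty(reps, dtype=int)
aic_picks = np.empty(reps, dtype=int)
for i in range(reps):
    eps = rng.normal(size=n_obs)
    y = np.zeros(n_obs)
    for t in range(1, n_obs):
        y[t] = 0.5 * y[t - 1] + eps[t]        # true lag length is p = 1
    bic_picks[i] = choose_lag(y, 2, np.log(n_obs - 2))   # BIC penalty: ln T
    aic_picks[i] = choose_lag(y, 2, 2.0)                 # AIC penalty: 2

print("BIC chooses p = 1:", np.mean(bic_picks == 1))  # close to 1
print("AIC chooses p = 2:", np.mean(aic_picks == 2))  # near Pr(chi2_1 > 2), about 0.16
```

Consistent with the proofs, the BIC rarely selects a wrong lag at this sample size, while the AIC overselects p = 2 in a nonvanishing fraction of replications.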

CHAPTER 15

Estimation of Dynamic Causal Effects

In the 1983 movie Trading Places, the characters played by Dan Aykroyd and Eddie Murphy used inside information on how well Florida oranges had fared over the winter to make millions in the orange juice concentrate futures market, a market for contracts to buy or sell large quantities of orange juice concentrate at a specified price on a future date. In real life, traders in orange juice futures in fact do pay close attention to the weather in Florida: Freezes in Florida kill Florida oranges, the source of almost all frozen orange juice concentrate made in the United States, so its supply falls and the price rises. But precisely how much does the price rise when the weather in Florida turns sour? Does the price rise all at once, or are there delays; if so, for how long? These are questions that real-life traders in orange juice futures need to answer if they want to succeed.

This chapter takes up the problem of estimating the effect on Y now and in the future of a change in X, that is, the dynamic causal effect on Y of a change in X. What, for example, is the effect on the path of orange juice prices over time of a freezing spell in Florida? The starting point for modeling and estimating dynamic causal effects is the so-called distributed lag regression model, in which Y_t is expressed as a function of current and past values of X_t. Section 15.1 introduces the distributed lag model in the context of estimating the effect of cold weather in Florida on the price of orange juice concentrate over time. Section 15.2 takes a closer look at what, precisely, is meant by a dynamic causal effect.


One way to estimate dynamic causal effects is to estimate the coefficients of the distributed lag regression model using OLS. As discussed in Section 15.3, this estimator is consistent if the regression error has a conditional mean of zero given current and past values of X, a condition that (as in Chapter 12) is referred to as exogeneity. Because the omitted determinants of Y are correlated over time (that is, they are serially correlated), the error term in the distributed lag model can be serially correlated. This possibility in turn requires "heteroskedasticity- and autocorrelation-consistent" (HAC) standard errors, the topic of Section 15.4.

A second way to estimate dynamic causal effects, discussed in Section 15.5, is to model the serial correlation in the error term as an autoregression and then to use this autoregressive model to derive an autoregressive distributed lag (ADL) model. Alternatively, the coefficients of the original distributed lag model can be estimated by generalized least squares (GLS). Both the ADL and GLS methods, however, require a stronger version of exogeneity than we have used so far: strict exogeneity, under which the regression errors have a conditional mean of zero given past, present, and future values of X.

Section 15.6 provides a more complete analysis of the relationship between orange juice prices and the weather. In this application, the weather is beyond human control and thus is exogenous (although, as discussed in Section 15.6, economic theory suggests that it is not necessarily strictly exogenous). Because exogeneity is necessary for estimating dynamic causal effects, Section 15.7 examines this assumption in several applications taken from macroeconomics and finance.

This chapter builds on the material in Sections 14.1-14.4 but, with the exception of a subsection (that can be skipped) of the empirical analysis in Section 15.6, does not require the material in Sections 14.5-14.8.


15.1 An Initial Taste of the Orange Juice Data


Orlando, the center of Florida's orange-growing region, is normally sunny and warm. But now and then there is a cold snap, and if temperatures drop below freezing for too long, the trees drop many of their oranges; if the cold snap is severe, the trees freeze. Following a freeze, the supply of orange juice concentrate falls and its price rises. The timing of the price increases is rather complicated, however. Orange juice concentrate is a "durable," or storable, commodity; that is, it can be stored in its frozen state, albeit at some cost (to run the freezer). Thus the price of orange juice concentrate depends not only on current supply but also on expectations of future supply. A freeze today means that future supplies of concentrate will be low, but because concentrate currently in storage can be used to meet either current or future demand, the price of existing concentrate rises today. But precisely how much does the price of concentrate rise when there is a freeze? The answer to this question is of interest not just to orange juice traders but more generally to economists interested in studying the operations of modern commodity markets. To learn how the price of orange juice changes in response to weather conditions, we must analyze data on orange juice prices and the weather.

Monthly data on the price of frozen orange juice concentrate, its monthly percentage change, and temperatures in the orange-growing region of Florida from January 1950 to December 2000 are plotted in Figure 15.1. The price, plotted in Figure 15.1a, is a measure of the average real price of frozen orange juice concentrate paid by wholesalers. This price was deflated by the overall producer price index for finished goods to eliminate the effects of overall price inflation. The percentage price change plotted in Figure 15.1b is the percent change in the price over the month. The temperature data plotted in Figure 15.1c are the number of "freezing degree days" at the Orlando, Florida, airport, calculated as the sum of the number of degrees Fahrenheit that the minimum temperature falls below freezing in a given day, summed over all days in the month; for example, in November 1950 the airport temperature dropped below freezing twice, on the 25th (31°F) and on the 29th (29°F), for a total of four freezing degree days [(32 - 31) + (32 - 29) = 4]. (The data are described in more detail in Appendix 15.1.) As you can see by comparing the panels in Figure 15.1, the price of orange juice concentrate has large swings, some of which appear to be associated with cold weather in Florida.

We begin our quantitative analysis of the relationship between orange juice prices and the weather by using a regression to estimate the amount by which orange juice prices rise when the weather turns cold. The dependent variable is the percentage change in the price over that month (%ChgP_t, where %ChgP_t = 100 × Δln(P_t) and P_t is the real price of orange juice). The regressor is the number of freezing degree days during that month (FDD_t). This regression is


FIGURE 15.1  Orange Juice Prices and Florida Weather, 1950-2000

(a) Price Index of Frozen Concentrated Orange Juice
(b) Percent Change in the Price of Frozen Concentrated Orange Juice
(c) Monthly Freezing Degree Days in Orlando, Florida

There have been large month-to-month changes in the price of frozen concentrated orange juice. Many of the large movements coincide with freezing weather in Orlando, home of the orange groves.

estimated using monthly data from January 1950 to December 2000 (as are the later regressions in this chapter) for a total of T = 612 observations:

    %ChgP_t = -0.40 + 0.47 FDD_t                                        (15.1)
              (0.22)  (0.13)

The standard errors reported in this section are not the usual OLS standard errors, but rather are heteroskedasticity- and autocorrelation-consistent (HAC) standard errors that are appropriate when the error term and regressors are autocorrelated. HAC standard errors are discussed in Section 15.4, and for now they are used without further explanation.


According to this regression, an additional freezing degree day during a month increases the price of orange juice concentrate over that month by 0.47%. In a month with four freezing degree days, such as November 1950, the price of orange juice concentrate is estimated to have increased by 1.88% (4 × 0.47% = 1.88%), relative to a month with no days below freezing.
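The freezing degree day arithmetic is easy to reproduce. In the sketch below, the daily minimum temperatures are hypothetical placeholders except for the two sub-freezing readings (31°F and 29°F) that the text reports for November 1950:

```python
# Hypothetical daily minimum temperatures (deg F) for a 30-day month; all values
# are invented except the two sub-freezing days described in the text (31 and 29).
daily_mins = [45] * 24 + [31, 46, 44, 43, 29, 42]

# Freezing degree days: degrees below 32 deg F, summed over the days in the month
fdd = sum(max(0, 32 - t) for t in daily_mins)
print(fdd)  # (32 - 31) + (32 - 29) = 4

# Estimated price increase relative to a freeze-free month, using the slope in (15.1)
price_effect = 0.47 * fdd
print(price_effect)  # 1.88 (percent)
```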
Because the regression in Equation (15.1) includes only a contemporaneous measure of the weather, it does not capture any lingering effects of the cold snap on the orange juice price over the coming months. To capture these, we need to consider the effect on prices of both contemporaneous and lagged values of FDD, which in turn can be done by augmenting the regression in Equation (15.1) with, for example, lagged values of FDD over the previous six months:

    %ChgP_t = -0.65 + 0.47 FDD_t + 0.14 FDD_{t-1} + 0.06 FDD_{t-2}
              (0.23)  (0.14)       (0.08)           (0.06)

            + 0.07 FDD_{t-3} + 0.03 FDD_{t-4} + 0.05 FDD_{t-5} + 0.05 FDD_{t-6}    (15.2)
              (0.05)           (0.03)           (0.03)           (0.04)

Equation (15.2) is a distributed lag regression. The coefficient on FDD_t in Equation (15.2) estimates the percentage increase in prices over the course of the month in which the freeze occurs; an additional freezing degree day is estimated to increase prices that month by 0.47%. The coefficient on the first lag of FDD_t, FDD_{t-1}, estimates the percentage increase in prices arising from a freezing degree day in the preceding month, the coefficient on the second lag estimates the effect of a freezing degree day two months ago, and so forth. Equivalently, the coefficient on the first lag of FDD estimates the effect of a unit increase in FDD one month after the freeze occurs. Thus the estimated coefficients in Equation (15.2) are estimates of the effect of a unit increase in FDD_t on current and future values of %ChgP; that is, they are estimates of the dynamic effect of FDD_t on %ChgP_t. For example, the four freezing degree days in November 1950 are estimated to have increased orange juice prices by 1.88% during November 1950, by an additional 0.56% (= 4 × 0.14) in December 1950, by an additional 0.24% (= 4 × 0.06) in January 1951, and so forth.
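The full dynamic effect path of the November 1950 freeze can be traced out directly from the point estimates in Equation (15.2); this is simple arithmetic on the coefficients quoted above, not a re-estimation:

```python
# Point estimates on FDD_t, FDD_{t-1}, ..., FDD_{t-6} from Equation (15.2)
betas = [0.47, 0.14, 0.06, 0.07, 0.03, 0.05, 0.05]
fdd = 4  # freezing degree days in November 1950

# Estimated effect on %ChgP in the freeze month and in each of the next six months
path = [round(fdd * b, 2) for b in betas]
print(path)  # [1.88, 0.56, 0.24, 0.28, 0.12, 0.2, 0.2]
```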

15.2 Dynamic Causal Effects

Before learning more about the tools for estimating dynamic causal effects, we should spend a moment thinking about what, precisely, is meant by a dynamic causal effect. Having a clear idea about what a dynamic causal effect is leads to a clearer understanding of the conditions under which it can be estimated.


Causal Effects and Time Series Data


Section 1.2 defined a causal effect as the outcome of an ideal randomized controlled experiment: When a horticulturalist randomly applies fertilizer to some tomato plots but not others and then measures the yield, the expected difference in yield between the fertilized and unfertilized plots is the causal effect on tomato yield of the fertilizer. This concept of an experiment, however, is one in which there are multiple subjects (multiple tomato plots or multiple people), so the data are either cross-sectional (the tomato yield at the end of the harvest) or panel data (individual incomes before and after an experimental job training program). By having multiple subjects, it is possible to have both treatment and control groups and thereby to estimate the causal effect of the treatment.

In time series applications, this definition of causal effects in terms of an ideal randomized controlled experiment needs to be modified. To be concrete, consider an important problem of macroeconomics: estimating the effect of an unanticipated change in the short-term interest rate on the current and future economic activity in a given country, as measured by GDP. Taken literally, the randomized controlled experiment of Section 1.2 would entail randomly assigning different economies to treatment and control groups. The central bank in the treatment group would apply the treatment of a random interest rate change, while the central bank in the control group would apply no such random change; for both groups, economic activity (for example, GDP) would be measured over the next few years. But what if we are interested in estimating this effect for a specific country, say the United States? Then this experiment would entail having different "clones" of the United States as subjects and assigning some clone economies to the treatment group and some to the control group. Obviously, this "parallel universes" experiment is infeasible.

Instead, in time series data it is useful to think of a randomized controlled experiment consisting of the same subject (e.g., the U.S. economy) being given different treatments (randomly chosen changes in interest rates) at different points in time (the 1970s, the 1980s, and so forth). In this framework, the single subject at different times plays the role of both treatment and control group: Sometimes the Fed changes the interest rate, while at other times it does not. Because data are collected over time, it is possible to estimate the dynamic causal effect, that is, the time path of the effect of the treatment on the outcome of interest. For example, a surprise increase in the short-term interest rate of two percentage points, sustained for one quarter, might initially have a negligible effect on output; after two quarters GDP growth might slow, with the greatest slowdown after one and one-half years; then over the next two years, GDP growth might return to normal. This time path of causal effects is the dynamic causal effect on GDP growth of a surprise change in the interest rate.

As a second example, consider the causal effect on orange juice price changes of a freezing degree day. It is possible to imagine a variety of hypothetical experiments, each yielding a different causal effect. One experiment would be to change the weather in the Florida orange groves, holding constant weather elsewhere, for example, holding constant weather in the Texas grapefruit groves and in other citrus fruit regions. This experiment would measure a partial effect, holding other weather constant. A second experiment might change the weather in all the regions, where the "treatment" is application of overall weather patterns. If weather is correlated across regions for competing crops, then these two dynamic causal effects differ. In this chapter, we consider the causal effect in the latter experiment, that is, the causal effect of applying general weather patterns. This corresponds to measuring the dynamic effect on prices of a change in Florida weather, not holding constant weather in other agricultural regions.

Dynamic effects and the distributed lag model.  Because dynamic effects necessarily occur over time, the econometric model used to estimate dynamic causal effects needs to incorporate lags. To do so, Y_t can be expressed as a distributed lag of current and r past values of X_t:

    Y_t = β_0 + β_1 X_t + β_2 X_{t-1} + β_3 X_{t-2} + ... + β_{r+1} X_{t-r} + u_t,    (15.3)

where u_t is an error term that includes measurement error in Y_t and the effect of omitted determinants of Y_t. The model in Equation (15.3) is called the distributed lag model relating X_t, and r of its lags, to Y_t.

As an illustration of Equation (15.3), consider a modified version of the tomato/fertilizer experiment: Because fertilizer applied today might remain in the ground in future years, the horticulturalist wants to determine the effect on tomato yield over time of applying fertilizer. Accordingly, she designs a three-year experiment and randomly divides her plots into four groups: The first is fertilized in only the first year; the second is fertilized in only the second year; the third is fertilized in only the third year; and the fourth, the control group, is never fertilized. Tomatoes are grown annually in each plot, and the third-year harvest is weighed. The three treatment groups are denoted by the binary variables X_{t-2}, X_{t-1}, and X_t, where t represents the third year (the year in which the harvest is weighed), X_{t-2} = 1 if the plot is in the first group (fertilized two years earlier), X_{t-1} = 1 if the plot was fertilized one year earlier, and X_t = 1 if the plot was fertilized in the final year. In the context of Equation (15.3) (which applies to a single plot), the effect of being fertilized in the final year is β_1, the effect of being fertilized one year earlier is β_2, and the effect of being fertilized two years earlier is β_3. If the effect of fertilizer is greatest in the year it is applied, then β_1 would be larger than β_2 and β_3.


More generally, the coefficient on the contemporaneous value of X_t, β_1, is the contemporaneous or immediate effect of a unit change in X_t on Y_t. The coefficient on X_{t-1}, β_2, is the effect on Y_t of a unit change in X_{t-1} or, equivalently, the effect on Y_{t+1} of a unit change in X_t; that is, β_2 is the effect of a unit change in X on Y one period later. In general, the coefficient on X_{t-h} is the effect of a unit change in X on Y after h periods. The dynamic causal effect is the effect of a change in X_t on Y_t, Y_{t+1}, Y_{t+2}, and so forth; that is, it is the sequence of causal effects on current and future values of Y. Thus, in the context of the distributed lag model in Equation (15.3), the dynamic causal effect is the sequence of coefficients β_1, β_2, ..., β_{r+1}.
Implications for empirical time series analysis.  This formulation of dynamic causal effects in time series data as the expected outcome of an experiment in which different treatment levels are repeatedly applied to the same subject has two implications for empirical attempts to measure the dynamic causal effect with observational time series data. The first implication is that the dynamic causal effect should not change over the sample on which we have data. This in turn is implied by the data being jointly stationary (Key Concept 14.5). As discussed in Section 14.7, the hypothesis that a population regression function is stable over time can be tested using the QLR test for a break, and it is possible to estimate the dynamic causal effect in different subsamples. The second implication is that X must be uncorrelated with the error term, and it is to this implication that we now turn.

Two Types of Exogeneity

Section 12.1 defined an "exogenous" variable as a variable that is uncorrelated with the regression error term and an "endogenous" variable as a variable that is correlated with the error term. This terminology traces to models with multiple equations, in which an "endogenous" variable is determined within the model while an "exogenous" variable is determined outside the model. Loosely speaking, if we are to estimate dynamic causal effects using the distributed lag model in Equation (15.3), the regressors (the X's) must be uncorrelated with the error term. Thus X must be exogenous. Because we are working with time series data, however, we need to refine the definitions of exogeneity. In fact, there are two different concepts of exogeneity that we use here.

The first concept of exogeneity is that the error term has a conditional mean of zero given current and all past values of X_t, that is, that E(u_t | X_t, X_{t-1}, X_{t-2}, ...) = 0. This modifies the standard conditional mean assumption for multiple regression with cross-sectional data (Assumption 1 in Key Concept 6.4), which


requires only that u_t has a conditional mean of zero given the included regressors, that is, that E(u_t | X_t, X_{t-1}, ..., X_{t-r}) = 0. Including all lagged values of X_t in the conditional expectation implies that all the more distant causal effects, that is, all the causal effects beyond lag r, are zero. Thus, under this assumption, the r distributed lag coefficients in Equation (15.3) constitute all of the nonzero dynamic causal effects. We can refer to this assumption, that E(u_t | X_t, X_{t-1}, ...) = 0, as past and present exogeneity, but because of the similarity of this definition and the definition of exogeneity in Chapter 12, we just use the term exogeneity.

The second concept of exogeneity is that the error term has mean zero, given all past, present, and future values of X_t, that is, that E(u_t | ..., X_{t+2}, X_{t+1}, X_t, X_{t-1}, X_{t-2}, ...) = 0. This is called strict exogeneity; for clarity, we also call it past, present, and future exogeneity. The reason for introducing the concept of strict exogeneity is that, when X is strictly exogenous, there are more efficient estimators of dynamic causal effects than the OLS estimators of the coefficients of the distributed lag regression in Equation (15.3).
The difference between exogeneity (past and present) and strict exogeneity (past, present, and future) is that strict exogeneity includes future values of X in the conditional expectation. Thus strict exogeneity implies exogeneity, but not the reverse. One way to understand the difference between the two concepts is to consider the implications of these definitions for correlations between X and u. If X is (past and present) exogenous, then u_t is uncorrelated with current and past values of X_t. If X is strictly exogenous, then in addition u_t is uncorrelated with future values of X_t. For example, if a change in Y_t causes future values of X_t to change, then X_t is not strictly exogenous even though it might be (past and present) exogenous.

As an illustration, consider the hypothetical multiyear tomato/fertilizer experiment described following Equation (15.3). Because the fertilizer is randomly applied in the hypothetical experiment, it is exogenous. Because tomato yield today does not depend on the amount of fertilizer applied in the future, the fertilizer time series is also strictly exogenous.

As a second illustration, consider the orange juice price example, in which Y_t is the monthly percentage change in orange juice prices and X_t is the number of freezing degree days in that month. From the perspective of orange juice markets, we can think of the weather, the number of freezing degree days, as if it were randomly assigned, in the sense that the weather is outside human control. If the effect of FDD is linear and if it has no effect on prices after r months, then it follows that the weather is exogenous. But is the weather strictly exogenous? If the conditional mean of u_t given future FDD is nonzero, then FDD is not strictly exogenous. Answering this question requires thinking carefully about what, precisely, is contained in u_t. In particular, if OJ market participants use

KEY CONCEPT 15.1
THE DISTRIBUTED LAG MODEL AND EXOGENEITY

In the distributed lag model

    Y_t = β_0 + β_1 X_t + β_2 X_{t-1} + ... + β_{r+1} X_{t-r} + u_t,    (15.4)

there are two different types of exogeneity, that is, two different exogeneity conditions:

Past and present exogeneity (exogeneity):

    E(u_t | X_t, X_{t-1}, X_{t-2}, ...) = 0;    (15.5)

Past, present, and future exogeneity (strict exogeneity):

    E(u_t | ..., X_{t+2}, X_{t+1}, X_t, X_{t-1}, X_{t-2}, ...) = 0.    (15.6)

If X is strictly exogenous, it is exogenous, but exogeneity does not imply strict exogeneity.

forecasts of FDD when they decide how much they will buy or sell at a given price, then OJ prices, and thus the error term u_t, would incorporate information about future FDD that would make it a useful predictor of FDD. This means that u_t will be correlated with future values of FDD_t. According to this logic, because u_t includes forecasts of future Florida weather, FDD would be (past and present) exogenous but not strictly exogenous. The difference between this and the tomato/fertilizer example is that while tomato plants are unaffected by future fertilization, OJ market participants are influenced by forecasts of future Florida weather. We return to the question of whether FDD is strictly exogenous when we analyze the orange juice price data in more detail in Section 15.6.

The two definitions of exogeneity are summarized in Key Concept 15.1.
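The distinction can be illustrated with simulated data (the numbers below are hypothetical, not the orange juice data). The regressor reacts to last period's error, mimicking market participants who respond to past shocks: u_t is uncorrelated with current and past X, consistent with (past and present) exogeneity, but u_t predicts next period's X, so strict exogeneity fails:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 50_000
u = rng.normal(size=T)            # regression error
e = rng.normal(size=T)            # shock to X, independent of u
x = np.empty(T)
x[0] = e[0]
x[1:] = 0.5 * u[:-1] + e[1:]      # X responds to *last* period's error (feedback)

contemp = np.corrcoef(x, u)[0, 1]         # near 0: u_t uncorrelated with current X_t
lead = np.corrcoef(x[1:], u[:-1])[0, 1]   # clearly positive: u_t predicts X_{t+1}
print(round(contemp, 2), round(lead, 2))
```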

15.3 Estimation of Dynamic Causal Effects with Exogenous Regressors

If X is exogenous, then its dynamic causal effect on Y can be estimated by OLS estimation of the distributed lag regression in Equation (15.4). This section summarizes the conditions under which these OLS estimators lead to valid statistical inferences and introduces dynamic multipliers and cumulative dynamic multipliers.


The Distributed Lag Model Assumptions


The four assumptions of the distributed lag regression model are similar to the four assumptions for the cross-sectional multiple regression model (Key Concept 6.4), modified for time series data.

The first assumption is that X is exogenous, which extends the zero conditional mean assumption for cross-sectional data to include all lagged values of X. As discussed in Section 15.2, this assumption implies that the r distributed lag coefficients in Equation (15.3) constitute all of the nonzero dynamic causal effects. In this sense, the population regression function summarizes the entire dynamic effect on Y of a change in X.

The second assumption has two parts: Part (a) requires that the variables have a stationary distribution, and part (b) requires that they become independently distributed when the amount of time separating them becomes large. This assumption is the same as the corresponding assumption for the ADL model (the second assumption in Key Concept 14.6), and the discussion of this assumption in Section 14.4 applies here as well.

The third assumption is that the variables have more than eight nonzero, finite moments. This is stronger than the assumption of four finite moments that is used elsewhere in this book. As discussed in Section 15.4, this stronger assumption is used in the mathematics behind the HAC variance estimator.

The fourth assumption, which is the same as in the cross-sectional multiple regression model, is that there is no perfect multicollinearity.

The distributed lag regression model and assumptions are summarized in Key Concept 15.2.

Extension to additional X's.  The distributed lag model extends directly to multiple X's: The additional X's and their lags are simply included as regressors in the distributed lag regression, and the assumptions in Key Concept 15.2 are modified to include these additional regressors. Although the extension to multiple X's is conceptually straightforward, it complicates the notation, obscuring the main ideas of estimation and inference in the distributed lag model. For this reason, the case of multiple X's is not treated explicitly in this chapter but is left as a straightforward extension of the distributed lag model with a single X.

Autocorrelated u_t, Standard Errors, and Inference

In the distributed lag regression model, the error term u_t can be autocorrelated; that is, u_t can be correlated with its lagged values. This autocorrelation arises


KEY CONCEPT 15.2
THE DISTRIBUTED LAG MODEL ASSUMPTIONS

The distributed lag model is given in Key Concept 15.1 [Equation (15.4)], where:

1. X is exogenous, that is, E(u_t | X_t, X_{t-1}, X_{t-2}, ...) = 0;
2. (a) The random variables Y_t and X_t have a stationary distribution, and (b) (Y_t, X_t) and (Y_{t-j}, X_{t-j}) become independent as j gets large;
3. Y_t and X_t have more than eight nonzero, finite moments; and
4. There is no perfect multicollinearity.
because, in time series data, the omitted factors included in u_t can themselves be serially correlated. For example, suppose that the demand for orange juice also depends on income, so that one factor that influences the price of orange juice is income, specifically, the aggregate income of potential orange juice consumers. Then aggregate income is an omitted variable in the distributed lag regression of orange juice price changes against freezing degree days. Aggregate income, however, is serially correlated: Income tends to fall in recessions and rise in expansions. Thus income is serially correlated, and, because it is part of the error term, u_t will be serially correlated. This example is typical: Because omitted determinants of Y are themselves serially correlated, in general u_t in the distributed lag model will be serially correlated.

The autocorrelation of u_t does not affect the consistency of OLS, nor does it introduce bias. If, however, the errors are autocorrelated, then in general the usual OLS standard errors are inconsistent and a different formula must be used. Thus serial correlation of the errors is analogous to heteroskedasticity: The homoskedasticity-only standard errors are "wrong" when the errors are in fact heteroskedastic, in the sense that using homoskedasticity-only standard errors results in misleading statistical inferences when the errors are heteroskedastic. Similarly, when the errors are serially correlated, standard errors predicated upon i.i.d. errors are "wrong" in the sense that they result in misleading statistical inferences. The solution to this problem is to use heteroskedasticity- and autocorrelation-consistent (HAC) standard errors, the topic of Section 15.4.

Dynamic Multipliers and Cumulative Dynamic Multipliers

Another name for the dynamic causal effect is the dynamic multiplier. The cumulative dynamic multipliers are the cumulative causal effects, up to a given lag;


thu:. t h~ c.;umulati\'C dynamiC multipliers mc.;a!>lll ~the cumul.llt\ e dkct ~.m }'of"
~h.mg~. in \'.

Dynamic multipliers.

Th~

I fr..:ct

~>I

a un11

~.h 111gc

in \ l.lll } afh:r h pcnods.


\\ hllh b {31, , 10 Equatton {15.-1), 1:> ct~llcJ th~. h P'- riuJ d) namic multiplier. Thm.
Ihe. J) rl.tnliL multiplier., rdaung X h, S' ,trc: the ~.:odf1t:iLnh ~m \', .mc.l it!> l.tgs in
E\.JU.tli{ln (15.-l). For example.J3~ i-. th1. ~.,n~,. pcrinJ J~n unk multiplicr.J3, j, the
two pLfilJ J) namic muhiphcr. and so h,rth In tins terminology, the 7r..:ro-penod
(or con!l:mporaneous) d~namic muhiplit:r. or impact ellcct. '' J3 1. the effect on Y
\I .r~.h til~~ 10 ,\ in the ~arne ~riod.
BcL.Ill''- the dynam1c multtpliers a.rc: c~lmt.ttc:J b) thl' OL5 r~;gre~ston coetltll\, nh thcrr <otandard erron. .trc the HAC 'tanJ.trJ ctt uh ,,[ tL~. OLS r~:.grc;,..,iiJn
c1 K fficicnl'>.

Cumulative dynamic multipliers. The h-period cumulative dynamic multiplier is the cumulative effect of a unit change in X on Y over the next h periods. Thus, the cumulative dynamic multipliers are the cumulative sum of the dynamic multipliers. In terms of the coefficients of the distributed lag regression in Equation (15.4), the zero-period cumulative multiplier is β_1, the one-period cumulative multiplier is β_1 + β_2, and the h-period cumulative dynamic multiplier is β_1 + β_2 + ⋯ + β_{h+1}. The sum of all the individual dynamic multipliers, β_1 + β_2 + ⋯ + β_{r+1}, is the cumulative long-run effect on Y of a change in X and is called the long-run cumulative dynamic multiplier.

For example, consider the regression in Equation (15.2). The immediate effect of an additional freezing degree day is that the price of orange juice concentrate rises by 0.47%. The cumulative effect of a price change over the next month is the sum of the impact effect and the dynamic effect one month ahead; thus the cumulative effect on prices is the initial increase of 0.47% plus the subsequent smaller increase of 0.14%, for a total of 0.61%. Similarly, the cumulative dynamic multiplier over two months is 0.47% + 0.14% + 0.06% = 0.67%.

The cumulative dynamic multipliers can be estimated directly using a modification of the distributed lag regression in Equation (15.4). This modified regression is

Y_t = δ_0 + δ_1 ΔX_t + δ_2 ΔX_{t-1} + ⋯ + δ_r ΔX_{t-r+1} + δ_{r+1} X_{t-r} + u_t.   (15.7)

The coefficients in Equation (15.7), δ_1, δ_2, …, δ_{r+1}, are in fact the cumulative dynamic multipliers. This can be shown by a bit of algebra (Exercise 15.5), which demonstrates that the population regressions in Equations (15.7) and (15.4) are equivalent, where δ_0 = β_0, δ_1 = β_1, δ_2 = β_1 + β_2, δ_3 = β_1 + β_2 + β_3, and so forth. The coefficient on X_{t-r}, δ_{r+1}, is the long-run cumulative dynamic multiplier; that is, δ_{r+1} = β_1 + β_2 + β_3 + ⋯ + β_{r+1}. Moreover, the OLS estimators of the coefficients in Equation (15.7) are the same as the corresponding cumulative sums of the OLS estimators in Equation (15.4); for example, δ̂_2 = β̂_1 + β̂_2. The main benefit of estimating the cumulative dynamic multipliers using the specification in Equation (15.7) is that, because the OLS estimators of the regression coefficients are estimators of the cumulative dynamic multipliers, the HAC standard errors of the coefficients in Equation (15.7) are the HAC standard errors of the cumulative dynamic multipliers.
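The equivalence between Equations (15.4) and (15.7) is easy to check numerically. The sketch below (an illustration added here, using simulated data rather than any series from the text) fits both specifications by OLS with r = 2 lags and compares the two sets of estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
X = rng.standard_normal(T)
u = rng.standard_normal(T)

# distributed lag model (15.4) with r = 2: Y_t = 0.5 + 1.0 X_t + 0.6 X_{t-1} + 0.2 X_{t-2} + u_t
Y = 0.5 + 1.0 * X + u
Y[1:] += 0.6 * X[:-1]
Y[2:] += 0.2 * X[:-2]

# drop the first two observations so that all lags are available
y, x0, x1, x2 = Y[2:], X[2:], X[1:-1], X[:-2]
ones = np.ones_like(y)

# Equation (15.4): regress Y_t on X_t, X_{t-1}, X_{t-2}
beta = np.linalg.lstsq(np.column_stack([ones, x0, x1, x2]), y, rcond=None)[0]

# Equation (15.7): regress Y_t on dX_t, dX_{t-1}, X_{t-2}
delta = np.linalg.lstsq(np.column_stack([ones, x0 - x1, x1 - x2, x2]), y, rcond=None)[0]

print(np.cumsum(beta[1:]))  # cumulative dynamic multipliers built from (15.4)
print(delta[1:])            # the same numbers, estimated directly via (15.7)
```

The two printed vectors agree to floating-point precision because the regressors of Equation (15.7) are an invertible linear transformation of those of Equation (15.4); the standard errors on the δ̂'s therefore serve directly as standard errors for the cumulative multipliers.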

15.4 Heteroskedasticity- and Autocorrelation-Consistent Standard Errors


If the error term u_t is autocorrelated, then OLS is consistent, but in general the usual OLS standard errors for cross-sectional data are not. This means that conventional statistical inferences (hypothesis tests and confidence intervals) based on the usual OLS standard errors will, in general, be misleading. For example, confidence intervals constructed as the OLS estimator ± 1.96 conventional standard errors need not contain the true value in 95% of repeated samples, even if the sample size is large. This section begins with a derivation of the correct formula for the variance of the OLS estimator with autocorrelated errors, then turns to heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.

This section focuses on HAC standard errors in time series data. HAC standard errors for panel data were introduced in Section 10.5 and Appendix 10.2. This section is self-contained, and Chapter 10 is not a prerequisite.

Distribution of the OLS Estimator with Autocorrelated Errors

To keep things simple, consider the OLS estimator β̂_1 in the distributed lag regression model with no lags, that is, the linear regression model with a single regressor X_t:

Y_t = β_0 + β_1 X_t + u_t,   (15.8)

where the assumptions of Key Concept 15.2 are satisfied. This section shows that the variance of β̂_1 can be written as the product of two terms: the expression for var(β̂_1), applicable if u_t is not serially correlated, times a correction factor that arises from the autocorrelation in u_t or, more precisely, the autocorrelation of (X_t - μ_X)u_t.


As shown in Appendix 4.3, the formula for the OLS estimator β̂_1 in Key Concept 4.2 can be rewritten as

β̂_1 = β_1 + [ (1/T) Σ_{t=1}^T (X_t - X̄)u_t ] / [ (1/T) Σ_{t=1}^T (X_t - X̄)² ],   (15.9)

where Equation (15.9) is Equation (4.30) with a change of notation so that i and n are replaced by t and T. Because X̄ converges in probability to μ_X and (1/T) Σ_{t=1}^T (X_t - X̄)² converges in probability to σ²_X, in large samples β̂_1 - β_1 is approximately given by

β̂_1 - β_1 ≅ [ (1/T) Σ_{t=1}^T (X_t - μ_X)u_t ] / σ²_X = v̄ / σ²_X,   (15.10)

where v_t = (X_t - μ_X)u_t and v̄ = (1/T) Σ_{t=1}^T v_t. Thus

var(β̂_1) ≅ var(v̄) / (σ²_X)².   (15.11)

If v_t is i.i.d., as assumed for cross-sectional data in Key Concept 4.3, then var(v̄) = var(v_t)/T and the formula for the variance of β̂_1 from Key Concept 4.4 applies. If, however, u_t and X_t are not independently distributed over time, then in general v_t will be serially correlated, so var(v̄) ≠ var(v_t)/T and Key Concept 4.4 does not apply. Instead, if v_t is serially correlated, the variance of v̄ is given by

var(v̄) = var[(v_1 + v_2 + ⋯ + v_T)/T]
       = [var(v_1) + cov(v_1, v_2) + ⋯ + cov(v_1, v_T) + cov(v_2, v_1) + var(v_2) + ⋯ + var(v_T)]/T²
       = [T var(v_t) + 2(T - 1)cov(v_t, v_{t-1}) + 2(T - 2)cov(v_t, v_{t-2}) + ⋯ + 2 cov(v_t, v_{t-T+1})]/T²
       = (σ²_v / T) f_T,   (15.12)

where

f_T = 1 + 2 Σ_{j=1}^{T-1} ((T - j)/T) ρ_j,   (15.13)

where ρ_j = corr(v_t, v_{t-j}). In large samples, f_T tends to the limit f_T → f_∞ = 1 + 2 Σ_{j=1}^∞ ρ_j.
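To see Equations (15.12) and (15.13) at work, the following Monte Carlo sketch (an illustration added here, with parameter values chosen arbitrarily) simulates many AR(1) series v_t with autocorrelations ρ_j = 0.5^j and compares the simulated variance of v̄ with (σ²_v/T)f_T:

```python
import numpy as np

rng = np.random.default_rng(1)
T, reps, phi = 100, 20000, 0.5

# v_t = phi*v_{t-1} + e_t with a stationary start, so rho_j = phi**j and var(v_t) = 1/(1 - phi**2)
v = np.zeros((reps, T))
v[:, 0] = rng.standard_normal(reps) / np.sqrt(1 - phi**2)
for t in range(1, T):
    v[:, t] = phi * v[:, t - 1] + rng.standard_normal(reps)
vbar = v.mean(axis=1)

# f_T from Equation (15.13), using the known autocorrelations rho_j = phi**j
j = np.arange(1, T)
f_T = 1 + 2 * np.sum((T - j) / T * phi**j)

sigma2_v = 1 / (1 - phi**2)
mc_var = vbar.var()              # Monte Carlo variance of the sample mean
theory = sigma2_v / T * f_T      # (sigma_v^2 / T) f_T from Equation (15.12)
print(mc_var, theory)
```

With phi = 0.5, f_T is close to its large-sample limit f_∞ = 1 + 2(0.5 + 0.25 + ⋯) = 3, so this amount of serial correlation roughly triples the variance of the sample mean relative to the i.i.d. case.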


Combining the expressions in Equation (15.10) for v̄ and Equation (15.12) for var(v̄) gives the formula for the variance of β̂_1 when v_t is autocorrelated:

var(β̂_1) = [ (1/T) σ²_v / (σ²_X)² ] f_T,   (15.14)

where f_T is given in Equation (15.13).

Equation (15.14) expresses the variance of β̂_1 as the product of two terms. The first, in square brackets, is the formula for the variance of β̂_1 given in Key Concept 4.4, which applies in the absence of serial correlation. The second is the factor f_T, which adjusts this formula for serial correlation. Because of this additional factor f_T in Equation (15.14), the usual OLS standard error computed using Equation (5.4) is incorrect if the errors are serially correlated: If v_t = (X_t - μ_X)u_t is serially correlated, the estimator of the variance is off by the factor f_T.

HAC Standard Errors

If the factor f_T, defined in Equation (15.13), were known, then the variance of β̂_1 could be estimated by multiplying the usual cross-sectional estimator of the variance by this factor. This factor, however, depends on the unknown autocorrelations of v_t, so it must be estimated. The estimator of the variance of β̂_1 that incorporates this adjustment is consistent whether or not there is heteroskedasticity and whether or not v_t is autocorrelated. Accordingly, this estimator is called the heteroskedasticity- and autocorrelation-consistent (HAC) estimator of the variance of β̂_1, and the square root of the HAC variance estimator is the HAC standard error of β̂_1.

The HAC variance formula. The heteroskedasticity- and autocorrelation-consistent estimator of the variance of β̂_1 is

σ̃²_{β̂_1} = σ̂²_{β̂_1} f̂_T,   (15.15)

where σ̂²_{β̂_1} is the estimator of the variance of β̂_1 in the absence of serial correlation, given in Equation (5.4), and where f̂_T is an estimator of the factor f_T in Equation (15.13).
The task of constructing a consistent estimator f̂_T is challenging. To see why, consider two extremes. At one extreme, given the formula in Equation (15.13), it might seem natural to replace the population autocorrelations ρ_j with the sample autocorrelations ρ̃_j [defined in Equation (14.6)], yielding the estimator 1 + 2 Σ_{j=1}^{T-1} ((T - j)/T) ρ̃_j. But this estimator contains so many estimated autocorrelations that it is inconsistent: Intuitively, because each of the estimated autocorrelations contains estimation error, by estimating so many autocorrelations the estimation error in this estimator of f_T remains large even in large samples. At the other extreme, one could imagine using only a few sample autocorrelations, for example, only the first sample autocorrelation, and ignoring all the higher autocorrelations. Although this estimator eliminates the problem of estimating too many autocorrelations, it has a different problem: It is inconsistent because it ignores the additional autocorrelations that appear in Equation (15.13). In short, using too many sample autocorrelations makes the estimator have a large variance, but using too few autocorrelations ignores the autocorrelations at higher lags, so in either of these extreme cases the estimator is inconsistent.

Estimators of f_T used in practice strike a balance between these two extreme cases by choosing the number of autocorrelations to include in a way that depends on the sample size T. If the sample size is small, only a few autocorrelations are used, but if the sample size is large, more autocorrelations are included (but still far fewer than T). Specifically, let f̂_T be given by

f̂_T = 1 + 2 Σ_{j=1}^{m-1} ((m - j)/m) ρ̃_j,   (15.16)

where ρ̃_j = Σ_{t=j+1}^T v̂_t v̂_{t-j} / Σ_{t=1}^T v̂²_t, where v̂_t = (X_t - X̄)û_t (as in the definition of σ̂²_{β̂_1}). The parameter m in Equation (15.16) is called the truncation parameter of the HAC estimator because the sum of autocorrelations is shortened, or truncated, to include only m - 1 autocorrelations instead of the T - 1 autocorrelations appearing in the population formula in Equation (15.13).

For f̂_T to be consistent, m must be chosen so that it is large in large samples, although still much less than T. One guideline for choosing m in practice is to use the formula

m = 0.75 T^{1/3},   (15.17)

rounded to an integer. This formula, which is based on the assumption that there is a moderate amount of autocorrelation in v_t, gives a benchmark rule for determining m as a function of the number of observations in the regression.¹

The value of the truncation parameter m resulting from Equation (15.17) can be modified using your knowledge of the series at hand. On the one hand, if there is a great deal of serial correlation in v_t, then you could increase m beyond the value from Equation (15.17). On the other hand, if v_t has little serial correlation, you could decrease m. Because of the ambiguity associated with the choice of m, it is good practice to try one or two alternative values of m for at least one specification to make sure your results are not sensitive to m.
¹Equation (15.17) gives the "best" choice of m if v_t and X_t are first-order autoregressive processes with first autocorrelation coefficients 0.5, where "best" means the estimator that minimizes E[(f̂_T - f_T)²]. Equation (15.17) is based on a more general formula derived by Andrews (1991, Equation (5.3)).

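Equations (15.15) to (15.17) can be assembled in a few lines. The sketch below (added here as an illustration, using simulated data with arbitrary parameter values) computes v̂_t, the sample autocorrelations ρ̃_j, f̂_T, and the resulting HAC standard error for the slope:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
# simulated AR(1) regressor and AR(1) errors, so v_t = (X_t - Xbar)*u_t is serially correlated
X = np.zeros(T); u = np.zeros(T)
for t in range(1, T):
    X[t] = 0.7 * X[t - 1] + rng.standard_normal()
    u[t] = 0.7 * u[t - 1] + rng.standard_normal()
Y = 1.0 + 2.0 * X + u

# OLS slope, intercept, and residuals
Xc = X - X.mean()
beta1 = (Xc @ Y) / (Xc @ Xc)
beta0 = Y.mean() - beta1 * X.mean()
uhat = Y - beta0 - beta1 * X
vhat = Xc * uhat

# conventional (no serial correlation) variance of beta1, as in Equation (5.4)
var_conv = (vhat @ vhat / (T - 2)) / T / (Xc @ Xc / T) ** 2

# truncation parameter from Equation (15.17), then fhat_T from Equation (15.16)
m = int(round(0.75 * T ** (1 / 3)))
rho = np.array([(vhat[j:] @ vhat[:-j]) / (vhat @ vhat) for j in range(1, m)])
fhat = 1 + 2 * np.sum((m - np.arange(1, m)) / m * rho)

var_hac = var_conv * fhat          # Equation (15.15)
se_hac = np.sqrt(var_hac)
print(m, fhat, se_hac)
```

For T = 200, Equation (15.17) gives m = 4, so only three sample autocorrelations enter f̂_T; with persistent X_t and u_t, f̂_T exceeds 1 and the HAC standard error is larger than the conventional one.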

The HAC estimator in Equation (15.15), with f̂_T given in Equation (15.16), is called the Newey-West variance estimator, after the econometricians Whitney Newey and Kenneth West, who proposed it. They showed that, when used along with a rule like that in Equation (15.17), under general assumptions this estimator is a consistent estimator of the variance of β̂_1 (Newey and West, 1987). Their proofs (and those in Andrews, 1991) assume that v_t has more than four moments, which in turn is implied by X_t and u_t having more than eight moments, and this is the reason that the third assumption in Key Concept 15.2 is that X_t and u_t have more than eight moments.

Other HAC estimators. The Newey-West variance estimator is not the only HAC estimator. For example, the weights (m - j)/m in Equation (15.16) can be replaced by different weights. If different weights are used, then the rule for choosing the truncation parameter in Equation (15.17) no longer applies and a different rule, developed for those weights, should be used instead. Discussion of HAC estimators using other weights goes beyond the scope of this book. For more information on this topic, see Hayashi (2000, Section 6.6).

Extension to multiple regression. All the issues discussed in this section generalize to the distributed lag regression model in Key Concept 15.1 with multiple lags and, more generally, to the multiple regression model with serially correlated errors. In particular, if the error term is serially correlated, then the usual OLS standard errors are an unreliable basis for inference and HAC standard errors should be used instead. If the HAC variance estimator used is the Newey-West estimator [the HAC variance estimator based on the weights (m - j)/m], then the truncation parameter m can be chosen according to the rule in Equation (15.17), whether there is a single regressor or multiple regressors. The formula for HAC standard errors in multiple regression is incorporated into modern regression software designed for use with time series data. Because this formula involves matrix algebra, we omit it here and instead refer the reader to Hayashi (2000, Section 6.6) for the mathematical details.

HAC standard errors are summarized in Key Concept 15.3.

KEY CONCEPT 15.3

HAC STANDARD ERRORS

The problem: The error term u_t in the distributed lag regression model in Key Concept 15.1 can be serially correlated. If so, the OLS coefficient estimators are consistent but in general the usual OLS standard errors are not, resulting in misleading hypothesis tests and confidence intervals.

The solution: Standard errors should be computed using a heteroskedasticity- and autocorrelation-consistent (HAC) estimator of the variance. The HAC estimator involves estimates of m - 1 autocovariances as well as the variance; in the case of a single regressor, the relevant formulas are given in Equations (15.15) and (15.16).

In practice, using HAC standard errors entails choosing the truncation parameter m. To do so, use the formula in Equation (15.17) as a benchmark, then increase or decrease m depending on whether your regressors and errors have high or low serial correlation.

15.5 Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors

When X_t is strictly exogenous, two alternative estimators of dynamic causal effects are available. The first such estimator involves estimating an autoregressive distributed lag (ADL) model instead of a distributed lag model and calculating the dynamic multipliers from the estimated ADL coefficients. This method can entail estimating fewer coefficients than OLS estimation of the distributed lag model, thus potentially reducing estimation error. The second method is to estimate the coefficients of the distributed lag model using generalized least squares (GLS) instead of OLS. Although the same number of coefficients in the distributed lag model are estimated by GLS as by OLS, the GLS estimator has a smaller variance. To keep the exposition simple, these two estimation methods are initially laid out and discussed in the context of a distributed lag model with a single lag and AR(1) errors. The potential advantages of these two estimators are greatest, however, when many lags appear in the distributed lag model, so these estimators are then extended to the general distributed lag model with higher-order autoregressive errors.

The Distributed Lag Model with AR(1) Errors

Suppose that the causal effect on Y of a change in X lasts for only two periods; that is, it has an initial impact effect β_1 and an effect in the next period of β_2, but no effect thereafter. Then the appropriate distributed lag regression model is the distributed lag model with only current and past values of X, that is, with X_t and X_{t-1}:

Y_t = β_0 + β_1 X_t + β_2 X_{t-1} + u_t.   (15.18)


As discus!>ed in Section 15.~. in general Lhc error term 111 in Equation (15.1 X)
is serial!~ corrclatcJ. One cunscqucnt:c ot this !>erial correlation is that. if the Jistnbutcd lag coefficknb arc C:>lllllatcd by OLS, then mfcrencc based on the usu:al
O LS standard errors ca n bl! misleading. for thi::; reason, Sections 15.3 and 15 4
emphasized the use of I lAC stanJarc.l error::. when {3 1 and {32 in Equtltton ( 15. 1~ 1
are estimated by OLS.
T.n lbi.s section. we take o diffcrcnl appro<lch toward the .serial correlation in
u,. Thi-: approach. whtch is pO!oo!>tble if X, b ::.uictly ~xogenous. involve<; adopti 1~
an auton::grc~~tve model for the o;cnal condation 111111 then using thts AR m<,<kl
to derive ')ome estimators that can be more efficien t than the O LS estimator tn
the Jistributcd lag mnJL'I.
Specifically. -:uppusc thatu, folloM lhe A R( I) model

U1

= c/> 1111

( 15.

I -r IL1

where l/11 b the autoregrco;:.i' c paramcll: r, li, is seri.tlly uncorrdated, and no i nt~;r
cept is ncl!ded becau..,c E(tt,) 0 Equatinns (1 5.1X) and (1 5.19) imply that the-

distributed lag model \\ ith a seria lly correlatcJ e rror can be re\\ rill en a~ an
autoregrcs!>ivc distrihut\!d lag modd wtth a serially uncorrclatcd error T{) Ul 'o.
lag each :.tdc of E quatton (l5.1H) and subtract c/> 1 times tb i lag from each "'de

Y, - <P t Y,

1=

(f3u .... f3,.Y,

+ f32Xt

= f3t, + f3tXt + f3~X,_ ,

..L..

u,) - c!>t (/3o ~ f3tX,_, + f3zX,_ ~

- <f>Jf3o- <f>1{3JX,_ t- d>1{32Xr-2 -

v.hcre the second equality uses li, =


(15.20), we ha,c th.tt

111 -

cl>tllt-t

Collect ing

term~

ii,.

+ u -t)
(15 .2)

in Equation

where
(I ~

221

"here {3,..{3. and 13 .ue the codltcknt'> m Equation (15.1h) and <b IS the ct utocorrdation codfictcnt in Equatton ( 15.1 ~) .
Equation ( 15.21) ~ .111 DL m odd that include:. a contcmporanc\lU!"> ... aluc
1
of X anJ two of il::. Ia go; \\'t.> "til rclcr to (15.21) a~ the ADL representation f t:
Jio-.tributed lttg model \\ith autnrcgn:ssivc error.; given in Equation1\ ( 15 IX) 111 J
(1519).


The terms in Equation (15.20) can be reorganized differently to obtain an expression that is equivalent to Equations (15.21) and (15.22). Let Ỹ_t = Y_t - φ_1 Y_{t-1} be the quasi-difference of Y_t ("quasi" because it is not the first difference, the difference between Y_t and Y_{t-1}; rather, it is the difference between Y_t and φ_1 Y_{t-1}). Similarly, let X̃_t = X_t - φ_1 X_{t-1} be the quasi-difference of X_t. Then Equation (15.20) can be written

Ỹ_t = α_0 + β_1 X̃_t + β_2 X̃_{t-1} + ũ_t.   (15.23)

We will refer to Equation (15.23) as the quasi-difference representation of the distributed lag model with autoregressive errors given in Equations (15.18) and (15.19).

The ADL model [Equation (15.21), with the parameter restrictions in Equation (15.22)] and the quasi-difference model in Equation (15.23) are equivalent. In both models, the error term, ũ_t, is serially uncorrelated. The two representations, however, suggest different estimation strategies. But before discussing those strategies, we turn to the assumptions under which they yield consistent estimators of the dynamic multipliers, β_1 and β_2.

The conditional mean zero assumption in the ADL(2,1) and quasi-differenced models. Because Equations (15.21) [with the restrictions in Equation (15.22)] and (15.23) are equivalent, the conditions for their estimation are the same, so for convenience we consider Equation (15.23).

The quasi-difference model in Equation (15.23) is a distributed lag model involving the quasi-differenced variables with a serially uncorrelated error. Accordingly, the conditions for OLS estimation of the coefficients in Equation (15.23) are the least squares assumptions for the distributed lag model in Key Concept 15.2, expressed in terms of ũ_t and X̃_t. The critical assumption here is the first assumption, which, applied to Equation (15.23), is that X̃_t is exogenous; that is,

E(ũ_t | X̃_t, X̃_{t-1}, …) = 0,   (15.24)

where letting the conditional expectation depend on distant lags of X̃_t ensures that no additional lags of X̃_t, other than those appearing in Equation (15.23), enter the population regression function.

Because X̃_t = X_t - φ_1 X_{t-1}, so X_t = X̃_t + φ_1 X_{t-1}, conditioning on X̃_t and all of its lags is equivalent to conditioning on X_t and all of its lags. Thus, the conditional expectation condition in Equation (15.24) is equivalent to the condition that E(ũ_t | X_t, X_{t-1}, …) = 0. Furthermore, because ũ_t = u_t - φ_1 u_{t-1}, this condition in turn implies


E(ũ_t | X_t, X_{t-1}, …) = E(u_t - φ_1 u_{t-1} | X_t, X_{t-1}, …)
                        = E(u_t | X_t, X_{t-1}, …) - φ_1 E(u_{t-1} | X_t, X_{t-1}, …).   (15.25)

For the equality in Equation (15.25) to hold for general values of φ_1, it must be the case that both E(u_t | X_t, X_{t-1}, …) = 0 and E(u_{t-1} | X_t, X_{t-1}, …) = 0. By shifting the time subscripts, the condition that E(u_{t-1} | X_t, X_{t-1}, …) = 0 can be rewritten as

E(u_t | X_{t+1}, X_t, X_{t-1}, …) = 0,   (15.26)

which (by the law of iterated expectations) implies that E(u_t | X_t, X_{t-1}, …) = 0. In summary, having the zero conditional mean assumption in Equation (15.24) hold for general values of φ_1 is equivalent to having the condition in Equation (15.26) hold.

The condition in Equation (15.26) is implied by X_t being strictly exogenous, but it is not implied by X_t being (past and present) exogenous. Thus, the least squares assumptions for estimation of the distributed lag model in Equation (15.23) hold if X_t is strictly exogenous, but it is not enough that X_t be (past and present) exogenous.

Because the ADL representation [Equations (15.21) and (15.22)] is equivalent to the quasi-differenced representation [Equation (15.23)], the conditional mean assumption needed to estimate the coefficients of the quasi-differenced representation [that E(u_t | X_{t+1}, X_t, X_{t-1}, …) = 0] is also the conditional mean assumption for consistent estimation of the coefficients of the ADL representation.

We now turn to the two estimation strategies suggested by these two representations: estimation of the ADL coefficients and estimation of the coefficients of the quasi-differenced model.

OLS Estimation of the ADL Model

The first strategy is to use OLS to estimate the coefficients in the ADL model in Equation (15.21). As the derivation leading to Equation (15.21) shows, including the lag of Y and the extra lag of X as regressors makes the error term serially uncorrelated (under the assumption that the error follows a first-order autoregression). Thus the usual OLS standard errors can be used; that is, HAC standard errors are not needed when the ADL model coefficients in Equation (15.21) are estimated by OLS.

The estimated ADL coefficients are not themselves estimates of the dynamic multipliers, but the dynamic multipliers can be computed from the ADL coefficients. A general way to compute the dynamic multipliers is to express the estimated regression function as a function of current and past values of X, that is, to eliminate Y_t from the estimated regression function. To do so, repeatedly substitute expressions for lagged values of Y_t into the estimated regression function. Specifically, consider the estimated regression function

Ŷ_t = φ̂_1 Y_{t-1} + δ̂_0 X_t + δ̂_1 X_{t-1} + δ̂_2 X_{t-2},   (15.27)

where the estimated intercept has been omitted because it does not enter any expression for the dynamic multipliers. Lagging both sides of Equation (15.27) yields Ŷ_{t-1} = φ̂_1 Y_{t-2} + δ̂_0 X_{t-1} + δ̂_1 X_{t-2} + δ̂_2 X_{t-3}, so replacing Y_{t-1} in Equation (15.27) by Ŷ_{t-1} and collecting terms yields

Ŷ_t = φ̂_1² Y_{t-2} + δ̂_0 X_t + (δ̂_1 + φ̂_1 δ̂_0) X_{t-1} + (δ̂_2 + φ̂_1 δ̂_1) X_{t-2} + φ̂_1 δ̂_2 X_{t-3}.   (15.28)

Repeating this process by repeatedly substituting expressions for Y_{t-2}, Y_{t-3}, and so forth yields

Ŷ_t = δ̂_0 X_t + (δ̂_1 + φ̂_1 δ̂_0) X_{t-1} + (δ̂_2 + φ̂_1 δ̂_1 + φ̂_1² δ̂_0) X_{t-2} + φ̂_1 (δ̂_2 + φ̂_1 δ̂_1 + φ̂_1² δ̂_0) X_{t-3} + φ̂_1² (δ̂_2 + φ̂_1 δ̂_1 + φ̂_1² δ̂_0) X_{t-4} + ⋯.   (15.29)

The coefficients in Equation (15.29) are the estimators of the dynamic multipliers, computed from the OLS estimators of the coefficients in the ADL model in Equation (15.21). If the restrictions on the coefficients in Equation (15.22) were to hold exactly for the estimated coefficients, then the dynamic multipliers beyond the second (that is, the coefficients on X_{t-2}, X_{t-3}, and so forth) would all be zero.² However, under this estimation strategy those restrictions will not hold exactly, so the estimated multipliers beyond the second in Equation (15.29) will generally be nonzero.
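The back-substitution leading to Equation (15.29) is equivalent to the recursion b_h = δ̂_h + φ̂_1 b_{h-1}, with δ̂_h = 0 for h > 2. A short sketch (the function name and numerical values are illustrative only, not from the text):

```python
import numpy as np

def adl_dynamic_multipliers(phi1, deltas, h_max):
    """Dynamic multipliers implied by the ADL model
    Y_t = phi1*Y_{t-1} + delta_0*X_t + ... + delta_q*X_{t-q} + error,
    computed by the back-substitution of Equation (15.29):
    b_h = delta_h + phi1 * b_{h-1}, with delta_h = 0 for h > q."""
    b, prev = [], 0.0
    for h in range(h_max + 1):
        d = deltas[h] if h < len(deltas) else 0.0
        prev = d + phi1 * prev
        b.append(prev)
    return np.array(b)

# Example: coefficients constructed to satisfy the restrictions in (15.22)
# with beta1 = 1.0, beta2 = 0.5, phi1 = 0.4:
beta1, beta2, phi1 = 1.0, 0.5, 0.4
deltas = [beta1, beta2 - phi1 * beta1, -phi1 * beta2]
mults = adl_dynamic_multipliers(phi1, deltas, 5)
print(mults)
```

Because the δ's in this constructed example satisfy the restrictions in Equation (15.22) exactly, the multipliers are β_1, β_2, and then exactly zero; with unrestricted OLS estimates of the ADL model they would generally be small but nonzero beyond the second, as the text notes.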

GLS Estimation

The second strategy for estimating the dynamic multipliers when X_t is strictly exogenous is to use generalized least squares (GLS), which entails estimating Equation (15.23). To describe the GLS estimator, we initially assume that φ_1 is known. Because in practice φ_1 is unknown, this estimator is infeasible, so it is called the infeasible GLS estimator. The infeasible GLS estimator, however, can be modified using an estimator of φ_1, which yields a feasible version of the GLS estimator.

²Substitute the equalities in Equation (15.22) to show that, if those equalities hold, then δ_2 + φ_1 δ_1 + φ_1² δ_0 = 0.

Infeasible GLS. Suppose that φ_1 were known; then the quasi-differenced variables X̃_t and Ỹ_t could be computed directly. As discussed in the context of Equations (15.24) and (15.26), if X_t is strictly exogenous, then E(ũ_t | X̃_t, X̃_{t-1}, …) = 0. Thus, if X_t is strictly exogenous and if φ_1 is known, the coefficients α_0, β_1, and β_2 in Equation (15.23) can be estimated by the OLS regression of Ỹ_t on X̃_t and X̃_{t-1} (including an intercept). The resulting estimators of β_1 and β_2 (that is, the OLS estimators of the slope coefficients in Equation (15.23) when φ_1 is known) are the infeasible GLS estimators. This estimator is infeasible because φ_1 is unknown, so X̃_t and Ỹ_t cannot be computed and thus these OLS estimators cannot actually be computed.

Feasible GLS. The feasible GLS estimator modifies the infeasible GLS estimator by using a preliminary estimator of φ_1, denoted φ̂_1, to compute the estimated quasi-differences. Specifically, the feasible GLS estimators of β_1 and β_2 are the OLS estimators of β_1 and β_2 in Equation (15.23), computed by regressing the estimated quasi-difference of Y_t on the estimated quasi-differences of X_t and X_{t-1} (with an intercept), where the estimated quasi-differences are computed with φ̂_1 replacing φ_1, that is, Y_t - φ̂_1 Y_{t-1}, X_t - φ̂_1 X_{t-1}, and X_{t-1} - φ̂_1 X_{t-2}.

The preliminary estimator φ̂_1 can be computed by first estimating the distributed lag regression in Equation (15.18) by OLS, then using OLS to estimate φ_1 in Equation (15.19) with the OLS residuals û_t replacing the unobserved regression errors u_t. This version of the GLS estimator is called the Cochrane-Orcutt (1949) estimator.

An extension of the Cochrane-Orcutt method is to continue this process iteratively: Use the GLS estimator of β_1 and β_2 to compute revised estimators of u_t; use these new residuals to re-estimate φ_1; use this revised estimator of φ_1 to compute revised estimated quasi-differences; use these revised estimated quasi-differences to re-estimate β_1 and β_2; and continue this process until the estimators of β_1 and β_2 converge. This is referred to as the iterated Cochrane-Orcutt estimator.
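The feasible GLS procedure just described can be sketched compactly in numpy (simulated data; the function and all numerical values are ours, for illustration):

```python
import numpy as np

def ols(Z, y):
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def iterated_cochrane_orcutt(Y, X, n_iter=10):
    """Feasible GLS for Y_t = b0 + b1 X_t + b2 X_{t-1} + u_t with AR(1) errors,
    i.e. Equations (15.18), (15.19), and (15.23); a sketch of the iterated
    Cochrane-Orcutt algorithm described in the text."""
    y, x0, x1 = Y[1:], X[1:], X[:-1]
    Z = np.column_stack([np.ones(len(y)), x0, x1])
    b = ols(Z, y)            # step 1: OLS on the distributed lag model (15.18)
    phi = 0.0
    for _ in range(n_iter):
        u = y - Z @ b        # residuals stand in for the unobserved u_t
        phi = float(ols(u[:-1, None], u[1:])[0])  # (15.19): u_t on u_{t-1}, no intercept
        # OLS on the estimated quasi-differences, Equation (15.23)
        yt = y[1:] - phi * y[:-1]
        Zt = np.column_stack([np.ones(len(yt)),
                              x0[1:] - phi * x0[:-1],
                              x1[1:] - phi * x1[:-1]])
        a = ols(Zt, yt)
        b = np.array([a[0] / (1 - phi), a[1], a[2]])  # intercept: a0 = b0 (1 - phi)
    return b, phi

# simulated check (all parameter values are made up for illustration)
rng = np.random.default_rng(4)
T = 5000
X = np.zeros(T); u = np.zeros(T)
for t in range(1, T):
    X[t] = 0.5 * X[t - 1] + rng.standard_normal()
    u[t] = 0.6 * u[t - 1] + rng.standard_normal()
Y = 2.0 + 1.0 * X + u
Y[1:] += 0.5 * X[:-1]
b, phi = iterated_cochrane_orcutt(Y, X)
print(b, phi)
```

Each pass re-estimates φ_1 from the current residuals and then re-runs OLS on the estimated quasi-differences; iterating to convergence gives the iterated Cochrane-Orcutt estimator.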
A nonlinear least squares interpretation of the GLS estimator. An equivalent interpretation of the GLS estimator is that it estimates the ADL model in Equation (15.21), imposing the parameter restrictions in Equation (15.22). These restrictions are nonlinear functions of the original parameters β_0, β_1, β_2, and φ_1, so this estimation cannot be performed using OLS. Instead, the parameters can be estimated by nonlinear least squares (NLLS). As discussed in Appendix 8.1, NLLS minimizes the sum of squared mistakes made by the estimated regression function, recognizing that the regression function is a nonlinear function of the parameters being estimated. In general, NLLS estimation can require sophisticated algorithms for minimizing nonlinear functions of unknown parameters. In the special case at hand, however, those sophisticated algorithms are not needed; rather, the NLLS estimator can be computed using the algorithm described previously for the iterated Cochrane-Orcutt estimator. Thus, the iterated Cochrane-Orcutt GLS estimator is in fact the NLLS estimator of the ADL coefficients, subject to the nonlinear constraints in Equation (15.22).

Efficiency of GLS. The virtue of the GLS estimator is that when X_t is strictly exogenous and the transformed errors ũ_t are homoskedastic, it is efficient among linear estimators, at least in large samples. To see this, first consider the infeasible GLS estimator. If ũ_t is homoskedastic, if φ_1 is known (so that X̃_t and Ỹ_t can be treated as if they are observed), and if X_t is strictly exogenous, then the Gauss-Markov theorem implies that the OLS estimator of α_0, β_1, and β_2 in Equation (15.23) is efficient among all linear conditionally unbiased estimators; that is, the OLS estimator of the coefficients in Equation (15.23) is the best linear unbiased estimator, or BLUE (Section 5.5). Because the OLS estimator of Equation (15.23) is the infeasible GLS estimator, this means that the infeasible GLS estimator is BLUE. The feasible GLS estimator is similar to the infeasible GLS estimator, except that φ_1 is estimated. Because the estimator of φ_1 is consistent and its variance is inversely proportional to T, the feasible and infeasible GLS estimators have the same variances in large samples. In this sense, if X_t is strictly exogenous, then the feasible GLS estimator is BLUE in large samples. In particular, if X_t is strictly exogenous, then GLS is more efficient than the OLS estimator of the distributed lag coefficients discussed in Section 15.3.

The Cochrane-Orcutt and iterated Cochrane-Orcutt estimators presented here are special cases of GLS estimation. In general, GLS estimation involves transforming the regression model so that the errors are homoskedastic and serially uncorrelated, then estimating the coefficients of the transformed regression model by OLS. In general, the GLS estimator is consistent and BLUE in large samples if X is strictly exogenous, but it is not consistent if X is only (past and present) exogenous. The mathematics of GLS involve matrix algebra, so they are postponed to Section 18.6.

The Distributed Lag Model


with Additional Lags and AR(p) Errors
TI1e foregoing discussion of the distributed lag model in Equations (15.18) and
(lS.J 9), which has a si ngle lag of X, and an AR(l) error term, carries over to the
general di::.tribuled lag model with multiple lags and an AR(p) error term.

616   CHAPTER 15   Estimation of Dynamic Causal Effects

The general distributed lag model with autoregressive errors. The general distributed lag model with r lags and an AR(p) error term is

    Y_t = β_0 + β_1 X_t + β_2 X_{t−1} + … + β_{r+1} X_{t−r} + u_t,    (15.30)

    u_t = φ_1 u_{t−1} + φ_2 u_{t−2} + … + φ_p u_{t−p} + ũ_t,    (15.31)

where β_1, …, β_{r+1} are the dynamic multipliers and φ_1, …, φ_p are the autoregressive coefficients of the error term. Under the AR(p) model for the errors, ũ_t is serially uncorrelated.

Algebra of the sort that led to the ADL model in Equation (15.21) shows that Equations (15.30) and (15.31) imply that Y_t can be written in ADL form:

    Y_t = α_0 + φ_1 Y_{t−1} + … + φ_p Y_{t−p} + δ_0 X_t + δ_1 X_{t−1} + … + δ_q X_{t−q} + ũ_t,    (15.32)

where q = r + p and δ_0, …, δ_q are functions of the β's and φ's in Equations (15.30) and (15.31). Equivalently, the model of Equations (15.30) and (15.31) can be written in quasi-difference form as

    Ỹ_t = α_0 + β_1 X̃_t + β_2 X̃_{t−1} + … + β_{r+1} X̃_{t−r} + ũ_t,    (15.33)

where Ỹ_t = Y_t − φ_1 Y_{t−1} − … − φ_p Y_{t−p} and X̃_t = X_t − φ_1 X_{t−1} − … − φ_p X_{t−p}.
Conditions for estimation of the ADL coefficients. The foregoing discussion of the conditions for consistent estimation of the ADL coefficients in the AR(1) case extends to the general model with AR(p) errors. The conditional mean zero assumption for Equation (15.33) is that

    E(ũ_t | X̃_t, X̃_{t−1}, …) = 0.    (15.34)

Because ũ_t = u_t − φ_1 u_{t−1} − φ_2 u_{t−2} − … − φ_p u_{t−p} and X̃_t = X_t − φ_1 X_{t−1} − … − φ_p X_{t−p}, this condition is equivalent to

    E(u_t | X_t, X_{t−1}, …) − φ_1 E(u_{t−1} | X_t, X_{t−1}, …) − … − φ_p E(u_{t−p} | X_t, X_{t−1}, …) = 0.    (15.35)

For Equation (15.35) to hold for general values of φ_1, …, φ_p, it must be the case that each of the conditional expectations in Equation (15.35) is zero; equivalently, it must be the case that

    E(u_t | X_{t+p}, X_{t+p−1}, X_{t+p−2}, …) = 0.    (15.36)

This condition is not implied by X being (past and present) exogenous, but it is implied by X being strictly exogenous. In fact, in the limit when p is infinite (so that the error term in the distributed lag model follows an infinite-order autoregression), the condition in Equation (15.36) becomes the condition in Key Concept 15.1 for strict exogeneity.

15.5   Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors   617

KEY CONCEPT 15.4
ESTIMATION OF DYNAMIC MULTIPLIERS UNDER STRICT EXOGENEITY

The general distributed lag model with r lags and an AR(p) error term is

    Y_t = β_0 + β_1 X_t + β_2 X_{t−1} + … + β_{r+1} X_{t−r} + u_t,    (15.37)

    u_t = φ_1 u_{t−1} + φ_2 u_{t−2} + … + φ_p u_{t−p} + ũ_t.    (15.38)

If X_t is strictly exogenous, then the dynamic multipliers β_1, …, β_{r+1} can be estimated by first using OLS to estimate the coefficients of the ADL model

    Y_t = α_0 + φ_1 Y_{t−1} + … + φ_p Y_{t−p} + δ_0 X_t + δ_1 X_{t−1} + … + δ_q X_{t−q} + ũ_t,    (15.39)

where q = r + p, then computing the dynamic multipliers using regression software. Alternatively, the dynamic multipliers can be estimated by estimating the distributed lag coefficients in Equation (15.37) by GLS.
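The algebra relating Equations (15.37) and (15.38) to the ADL model in Equation (15.39) amounts to multiplying the lag polynomial of the distributed lag coefficients by the quasi-differencing polynomial 1 − φ_1L − … − φ_pL^p. A Python sketch (illustrative only; the function name is invented) makes the mapping concrete and confirms that q = r + p:

```python
def adl_from_distributed_lag(beta, phi):
    """Compute the ADL coefficients delta_0..delta_q implied by Equations (15.37)-(15.38).

    beta = [beta_1, ..., beta_{r+1}]: impact effect and r lagged dynamic multipliers.
    phi  = [phi_1, ..., phi_p]: AR(p) coefficients of the error term.
    Quasi-differencing gives delta(L) = (1 - phi_1 L - ... - phi_p L^p) * beta(L),
    so q = r + p, as stated below Equation (15.39).
    """
    r, p = len(beta) - 1, len(phi)
    q = r + p
    delta = []
    for k in range(q + 1):
        b = beta[k] if k <= r else 0.0
        # subtract phi_j * beta_{k-j} for each AR lag j that lands inside beta(L)
        d = b - sum(phi[j] * beta[k - 1 - j]
                    for j in range(p) if 0 <= k - 1 - j <= r)
        delta.append(d)
    return delta

# Single lag (r = 1) with AR(1) errors: beta_1 = 1.0, beta_2 = 0.2, phi_1 = 0.5
print(adl_from_distributed_lag([1.0, 0.2], [0.5]))  # [1.0, -0.3, -0.1]
```

The example reproduces the AR(1) algebra of the single-lag case: δ_0 = β_1, δ_1 = β_2 − φ_1β_1, δ_2 = −φ_1β_2.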

Estimation of the ADL model by OLS. As in the distributed lag model with a single lag and an AR(1) error term, the dynamic multipliers can be estimated from the OLS estimators of the ADL coefficients in Equation (15.32). The general formulas are similar to, but more complicated than, those in Equation (15.19), and are best expressed using lag operator notation; these formulas are given in Appendix 15.2. In practice, modern regression software designed for time series regression analysis does these computations for you.

Estimation by GLS. Alternatively, the dynamic multipliers can be estimated by (feasible) GLS. This entails OLS estimation of the coefficients of the quasi-differenced specification in Equation (15.33), using estimated quasi-differences. The estimated quasi-differences can be computed using preliminary estimators of the autoregressive coefficients φ_1, …, φ_p, as in the AR(1) case. The GLS estimator is asymptotically BLUE, in the sense discussed earlier for the AR(1) case.

Estimation of dynamic multipliers under strict exogeneity is summarized in Key Concept 15.4.
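A minimal Python sketch of the computation such software performs (illustrative, not the text's own formulas from Appendix 15.2): the dynamic multipliers solve the recursion β_k = δ_k + φ_1β_{k−1} + … + φ_pβ_{k−p}, obtained by inverting the autoregressive lag polynomial, so each multiplier feeds the earlier ones back through the estimated φ's.

```python
def dynamic_multipliers(delta, phi, n):
    """First n dynamic multipliers implied by ADL coefficients (Equation (15.32)).

    Inverts the AR polynomial: beta_k = delta_k + sum_j phi_j * beta_{k-j},
    where delta = [delta_0, ..., delta_q] and phi = [phi_1, ..., phi_p].
    """
    beta = []
    for k in range(n):
        d = delta[k] if k < len(delta) else 0.0
        # feed earlier multipliers back through the AR coefficients
        b = d + sum(phi[j] * beta[k - 1 - j]
                    for j in range(len(phi)) if k - 1 - j >= 0)
        beta.append(b)
    return beta

# With delta(L) = 1 and phi_1 = 0.5 the multipliers decay geometrically:
print(dynamic_multipliers([1.0], [0.5], 4))  # [1.0, 0.5, 0.25, 0.125]
```

Note that even a short estimated ADL model implies an infinitely long sequence of multipliers, which is the parsimony point made in the OLS-versus-GLS comparison.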



Which to use: OLS or GLS? The two estimation options, OLS estimation of the ADL coefficients and GLS estimation of the distributed lag coefficients, have both advantages and disadvantages.

The advantage of the ADL approach is that it can reduce the number of parameters needed for estimating the dynamic multipliers, compared to OLS estimation of the distributed lag model. For example, the estimated ADL model in Equation (15.27) led to the infinitely long estimated distributed lag representation in Equation (15.29). To the extent that the distributed lag model with only r lags is really an approximation to a longer-lagged distributed lag model, the ADL model can provide a simple way to estimate those many longer lags using only a few unknown parameters. Thus, in practice it might be possible to estimate the ADL model in Equation (15.39) with values of p and q much smaller than the value of r needed for OLS estimation of the distributed lag coefficients in Equation (15.37). In other words, the ADL specification can provide a compact, or parsimonious, summary of a long and complex distributed lag (see Appendix 15.2 for additional discussion).

The advantage of the GLS estimator is that, for a given lag length r in the distributed lag model, the GLS estimator of the distributed lag coefficients is more efficient than the OLS estimator, at least in large samples. In practice, then, the advantage of using the ADL approach arises because the ADL specification can permit estimating fewer parameters than are estimated by GLS.

15.6 Orange Juice Prices and Cold Weather

This section uses the tools of time series regression to squeeze additional insights from our data on Florida temperatures and orange juice prices. First, how long-lasting is the effect of a freeze on the price? Second, has this dynamic effect been stable, or has it changed over the 51 years spanned by the data and, if so, how?

We begin this analysis by estimating the dynamic causal effects using the method of Section 15.3, that is, by OLS estimation of the coefficients of a distributed lag regression of the percentage change in prices (%ChgP_t) on the number of freezing degree days in that month (FDD_t) and its lagged values. For the distributed lag estimator to be consistent, FDD must be (past and present) exogenous. As discussed in Section 15.2, this assumption is reasonable here: Humans cannot influence the weather, so treating the weather as if it were randomly assigned experimentally is appropriate. Because FDD is exogenous, we can estimate the dynamic causal effects by OLS estimation of the coefficients in the distributed lag model of Equation (15.4) in Key Concept 15.1.


As discussed in Sections 15.3 and 15.4, the error term can be serially correlated in distributed lag regressions, so it is important to use HAC standard errors, which adjust for this serial correlation. For the initial results, the truncation parameter for the Newey-West standard errors (m in the notation of Section 15.4) was chosen using the rule in Equation (15.17). Because there are 612 monthly observations, according to that rule m = 0.75T^{1/3} = 0.75 × 612^{1/3} = 6.37, but because m must be an integer this was rounded up to m = 7. The sensitivity of the standard errors to this choice of truncation parameter is investigated below.

The results of OLS estimation of the distributed lag regression of %ChgP_t on FDD_t, FDD_{t−1}, …, FDD_{t−18} are summarized in column (1) of Table 15.1. The coefficients of this regression (only some of which are reported in the table) are estimates of the dynamic causal effect on orange juice price changes (in percent) for the first 18 months following a unit increase in the number of freezing degree days in a month. For example, a single freezing degree day is estimated to increase prices by 0.50% over the month in which the freezing degree day occurs. The subsequent effect on price in later months of a freezing degree day is less: After one month, the estimated effect is to increase the price by a further 0.17%, and after two months the estimated effect is to increase the price by an additional 0.07%. The R̄² from this regression is 0.12, indicating that much of the monthly variation in orange juice prices is not explained by current and past values of FDD.
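The truncation-parameter arithmetic can be reproduced in a few lines of Python (the function name is invented; the rounding-up convention follows the text, which rounds 6.37 up to 7):

```python
import math

def nw_truncation(T):
    """Newey-West truncation parameter: m = 0.75 * T**(1/3), rounded up to an integer.

    With T = 612 monthly observations, 0.75 * 612**(1/3) = 6.37, so m = 7.
    """
    return math.ceil(0.75 * T ** (1 / 3))

print(nw_truncation(612))  # 7
```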
Plots of dynamic multipliers can convey information more effectively than tables such as Table 15.1. The dynamic multipliers from column (1) of Table 15.1 are plotted in Figure 15.2a along with their 95% confidence intervals, computed as the estimated coefficient ±1.96 HAC standard errors. After the initial sharp price rise, subsequent price rises are less, although prices are estimated to rise slightly in each of the first six months after the freeze. As can be seen from Figure 15.2a, for months other than the first the dynamic multipliers are not statistically significantly different from zero at the 5% significance level, although they are estimated to be positive through the seventh month.

Column (2) of Table 15.1 contains the cumulative dynamic multipliers for this specification, that is, the cumulative sum of the dynamic multipliers reported in column (1). These cumulative dynamic multipliers are plotted in Figure 15.2b along with their 95% confidence intervals. After one month, the cumulative effect of the freezing degree day is to increase prices by 0.67%; after two months the price is estimated to have risen by 0.74%, and after six months the price is estimated to have risen by 0.91%. As can be seen in Figure 15.2b, these cumulative multipliers increase through the seventh month, because the individual dynamic multipliers are positive for the first seven months. In the eighth month, the dynamic


TABLE 15.1  The Dynamic Effect of a Freezing Degree Day (FDD) on the Price of Orange Juice: Selected Estimated Dynamic Multipliers and Cumulative Dynamic Multipliers

Lag number                    (1) Dynamic      (2) Cumulative    (3) Cumulative    (4) Cumulative
                              Multipliers      Multipliers       Multipliers       Multipliers
0                             0.50 (0.14)      0.50 (0.14)       0.50 (0.14)       0.50 (0.15)
1                             0.17 (0.09)      0.67 (0.14)       0.67 (0.13)       0.70 (0.15)
2                             0.07 (0.06)      0.74 (0.17)       0.74 (0.16)       0.76 (0.18)
3                             0.07 (0.04)      0.81 (0.18)       0.81 (0.18)       0.84 (0.19)
4                             0.03 (0.03)      0.84 (0.19)       0.84 (0.19)       0.87 (0.20)
5                             0.03 (0.03)      0.87 (0.19)       0.87 (0.19)       0.89 (0.20)
6                             0.03 (0.05)      0.90 (0.20)       0.90 (0.21)       0.91 (0.21)
12                            −0.14 (0.08)     0.54 (0.27)       0.54 (0.26)       0.54 (0.28)
18                            0.00 (0.02)      0.37 (0.30)       0.37 (0.31)       0.37 (0.30)
Monthly indicators?           No               No                No                Yes (F = 1.01, p = 0.43)
HAC standard error
truncation parameter (m)      7                7                 14                7

All regressions were estimated by OLS using monthly data (described in Appendix 15.1) from January 1950 to December 2000, for a total of T = 612 monthly observations. The dependent variable is the monthly percentage change in the price of orange juice (%ChgP_t). Regression (1) is the distributed lag regression with the monthly number of freezing degree days and 18 of its lagged values, that is, FDD_t, FDD_{t−1}, …, FDD_{t−18}, and the reported coefficients are the OLS estimates of the dynamic multipliers. The cumulative multipliers are the cumulative sum of estimated dynamic multipliers. All regressions include an intercept, which is not reported. Newey-West HAC standard errors, computed using the truncation number given in the final row, are reported in parentheses.

multiplier is negative, so the price of orange juice begins to fall slowly from its peak. After 18 months, the cumulative increase in prices is only 0.37%; that is, the long-run cumulative dynamic multiplier is only 0.37%. This long-run cumulative dynamic multiplier is not statistically significantly different from zero at the 10% significance level (t = 0.37/0.30 = 1.23).
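The cumulative multipliers in column (2) are running sums of the dynamic multipliers in column (1), and the quoted t-statistic is the ratio of the long-run estimate to its HAC standard error. A Python sketch of both calculations, using only the three dynamic multipliers reported in the text (the remaining lags are omitted here):

```python
def cumulative_multipliers(dynamic):
    """Running sums of the dynamic multipliers, as in column (2) of Table 15.1."""
    out, total = [], 0.0
    for b in dynamic:
        total += b
        out.append(round(total, 2))  # table reports two decimal places
    return out

# First three dynamic multipliers reported in the text: 0.50, 0.17, 0.07
print(cumulative_multipliers([0.50, 0.17, 0.07]))  # [0.5, 0.67, 0.74]

# Long-run cumulative multiplier and its t-statistic (estimate / HAC SE)
t_stat = 0.37 / 0.30
print(round(t_stat, 2))  # 1.23
```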


FIGURE 15.2  The Dynamic Effect of a Freezing Degree Day (FDD) on the Price of Orange Juice

[Panel (a): Estimated dynamic multipliers and 95% confidence interval. Panel (b): Estimated cumulative dynamic multipliers and 95% confidence interval. Both panels plot the multiplier against the lag in months (0 to 20).]

The estimated dynamic multipliers show that a freeze leads to an immediate increase in prices. Future price rises are much smaller than the initial impact. The cumulative multiplier shows that freezes have a persistent effect on the level of orange juice prices, with prices peaking seven months after the freeze.



Sensitivity analysis. As in any empirical analysis, it is important to check whether these results are sensitive to changes in the details of the empirical analysis. We therefore examine three aspects of this analysis: sensitivity to the computation of the HAC standard errors; an alternative specification that investigates potential omitted variable bias; and an analysis of the stability over time of the estimated multipliers.

First, we investigate whether the standard errors reported in the second column of Table 15.1 are sensitive to different choices of the HAC truncation parameter m. In column (3), results are reported for m = 14, twice the value used in column (2). The regression specification is the same as in column (2), so the estimated coefficients and dynamic multipliers are identical; only the standard errors differ but, as it happens, not by much. We conclude that the results are insensitive to changes in the HAC truncation parameter.

Second, we investigate a possible source of omitted variable bias. Freezes in Florida are not randomly assigned throughout the year, but rather occur in the winter (of course). If demand for orange juice is seasonal (is demand for orange juice greater in the winter than the summer?), then the seasonal patterns in orange juice demand could be correlated with FDD, resulting in omitted variable bias. The quantity of oranges sold for juice is endogenous: Prices and quantities are simultaneously determined by the forces of supply and demand. Thus, as discussed in Section 9.2, including quantity would lead to simultaneity bias. Nevertheless, the seasonal component of demand can be captured by including seasonal variables as regressors. The specification in column (4) of Table 15.1 therefore includes 11 monthly binary variables, one indicating whether the month is January, one indicating February, and so forth (as usual, one binary variable must be omitted to prevent perfect multicollinearity with the intercept). These monthly indicator variables are not jointly statistically significant at the 10% level (p = 0.43), and the estimated cumulative dynamic multipliers are essentially the same as for the specifications excluding the monthly indicators. In summary, seasonal fluctuations in demand are not an important source of omitted variable bias.
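The kind of sensitivity check just described can be scripted directly. The Python sketch below is illustrative (the function name and toy data are invented, not the chapter's orange juice data): it computes the Bartlett-kernel Newey-West standard error for the slope of a simple regression, so the truncation parameter m can be varied and the resulting standard errors compared, as in columns (2) and (3) of Table 15.1.

```python
def hac_slope_se(y, x, m):
    """Newey-West HAC standard error for the OLS slope of y on x (with intercept).

    Uses Bartlett-kernel weights w_j = 1 - j/m; m is the truncation parameter.
    """
    T = len(y)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    # score for the slope moment condition: g_t = (x_t - xbar) * residual_t
    g = [(x[t] - xbar) * (y[t] - ybar - beta * (x[t] - xbar)) for t in range(T)]
    v = sum(gt * gt for gt in g)  # j = 0 (heteroskedasticity-only) term
    for j in range(1, m):
        w = 1 - j / m  # Bartlett weight
        v += 2 * w * sum(g[t] * g[t - j] for t in range(j, T))
    return (v / sxx ** 2) ** 0.5

# Toy series with a persistent, mean-zero disturbance (illustration only)
x = [float(t) for t in range(24)]
e = [1.0, 1.0, -1.0, -1.0] * 6
y = [xi + ei for xi, ei in zip(x, e)]
for m in (1, 7, 14):
    print(m, hac_slope_se(y, x, m))
```

If doubling m leaves the standard errors nearly unchanged, as in Table 15.1, the inference is insensitive to the truncation choice.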

Have the dynamic multipliers been stable over time?³ To assess the stability of the dynamic multipliers, we need to check whether the distributed lag coefficients have been stable over time. Because we do not have a specific break date in mind, we test for instability in the regression coefficients using the Quandt likelihood ratio (QLR) statistic (Key Concept 14.9). The QLR statistic (with 15% trimming and the HAC variance estimator), computed for the

³The discussion of stability in this subsection draws on material from Section 14.7 and can be skipped if that material has not been covered.

15 .6

Orange Juice Price$ and Cold Weather

623

FIGURE 15.3  Estimated Cumulative Dynamic Multipliers from Different Sample Periods

[The figure plots the estimated cumulative dynamic multipliers against the lag in months (0 to 20) for three subsamples: 1950-1966, 1967-1983, and 1984-2000.]

The dynamic effect on orange juice prices of freezes changed significantly over the second half of the twentieth century. A freeze had a larger impact on prices during 1950-1966 than later, and the effect of a freeze was less persistent during 1984-2000 than earlier.

regression of column (1) with all coefficients interacted, has a value of 21.19, with q = 20 degrees of freedom (the coefficients on FDD_t, its 18 lags, and the intercept). The 1% critical value in Table 14.6 is 2.43, so the QLR statistic rejects at the 1% significance level. These QLR regressions have 40 regressors, a large number; recomputing them for six lags only (so there are 16 regressors and q = 8) also results in rejection at the 1% level. Thus, the hypothesis that the dynamic multipliers are stable is rejected at the 1% significance level.

One way to see how the dynamic multipliers have changed over time is to compute them for different parts of the sample. Figure 15.3 plots the estimated cumulative dynamic multipliers for the first third (1950-1966), middle third (1967-1983), and final third (1984-2000) of the sample, computed by running separate regressions on each subsample. These estimates show an interesting and noticeable pattern. In the 1950s and early 1960s, a freezing degree day had a large and persistent effect on the price. The magnitude of the effect on price of a freezing degree day diminished in the 1970s, although it remained highly persistent. In the late 1980s and 1990s, the short-run effect of a freezing degree day was the same as in the 1970s, but it became much less persistent and was essentially eliminated after a year. These estimates suggest that the dynamic causal effect on

orange juice prices of a Florida freeze became smaller and less persistent over the second half of the twentieth century.
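The subsample comparison behind Figure 15.3 can be mimicked for any estimator: split the sample into thirds and re-estimate on each part. A toy Python sketch (invented data and helper functions; the text's own exercise re-estimates the full distributed lag regression on each subsample):

```python
def ols_slope(y, x):
    """OLS slope of y on x (with intercept)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

def subsample_slopes(y, x, parts=3):
    """Re-estimate the slope on consecutive thirds of the sample, as in Figure 15.3."""
    n = len(y)
    cut = n // parts
    return [ols_slope(y[i * cut:(i + 1) * cut], x[i * cut:(i + 1) * cut])
            for i in range(parts)]

# Toy data in which the impact effect shrinks across the sample:
x = [1.0, 2.0, 3.0, 4.0] * 3
y = [2.0 * v for v in x[:4]] + [v for v in x[4:8]] + [0.5 * v for v in x[8:]]
print(subsample_slopes(y, x))  # [2.0, 1.0, 0.5]
```

A pattern of shrinking subsample estimates like this one is the coded analogue of the declining multipliers in Figure 15.3.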

ADL and GLS estimates. As discussed in Section 15.5, if the error term in the distributed lag regression is serially correlated and FDD is strictly exogenous, it is possible to estimate the dynamic multipliers more efficiently than by OLS estimation of the distributed lag coefficients. Before using either the GLS estimator or the estimator based on the ADL model, however, we need to consider whether FDD is in fact strictly exogenous. True, humans cannot affect the daily weather, but does that mean that the weather is strictly exogenous? Does the error term u_t in the distributed lag regression have conditional mean zero, given past, present, and future values of FDD?

The error term in the population counterpart of the distributed lag regression in column (1) of Table 15.1 is the discrepancy between the price and its population prediction based on the past 18 months of weather. This discrepancy might arise for many reasons, one of which is that traders use forecasts of the weather in Orlando. For example, if an especially cold winter is forecasted, then traders would incorporate this into the price, so the price would be above its predicted value based on the population regression; that is, the error term would be positive. If this forecast is accurate, then in fact future weather would turn out to be cold. Thus future freezing degree days would be positive (X_{t+1} > 0) when the current price is unusually high (u_t > 0), so that corr(X_{t+1}, u_t) is positive. Said more simply, although orange juice traders cannot influence the weather, they can, and do, predict it (see the box). Consequently, the error term in the price/weather regression is correlated with future weather. In other words, FDD is exogenous, but, if this reasoning is true, it is not strictly exogenous, and the GLS and ADL estimators will not be consistent estimators of the dynamic multipliers. These estimators therefore are not used in this application.

15.7 Is Exogeneity Plausible? Some Examples

As in regression with cross-sectional data, the interpretation of the coefficients in a distributed lag regression as causal dynamic effects hinges on the assumption that X is exogenous. If X_t or its lagged values are correlated with u_t, then the conditional mean of u_t will depend on X_t or its lags, in which case X is not (past and present) exogenous. Regressors can be correlated with the error term for several reasons, but with economic time series data a particularly important concern is that there could be simultaneous causality, which (as discussed in Sections 9.2 and 12.1) results in endogenous regressors. In Section 15.6, we discussed the assumptions of exogeneity and strict exogeneity of freezing degree days in detail.


NEWS FLASH: Commodity Traders Send Shivers Through Disney World

Although the weather at Disney World in Orlando, Florida, is usually pleasant, now and then a cold spell can settle in. If you are visiting Disney World on a winter evening, should you bring a warm coat? Some people might check the weather forecast on TV, but those in the know can do better: They can check that day's closing price on the New York orange juice futures market!

The financial economist Richard Roll undertook a detailed study of the relationship between orange juice prices and the weather. Roll (1984) examined the effect on prices of cold weather in Orlando, but he also studied the "effect" of changes in the price of an orange juice futures contract (a contract to buy frozen orange juice concentrate at a specified date in the future) on the weather. Roll used daily data from 1975 to 1981 on the prices of OJ futures contracts traded at the New York Cotton Exchange and on daytime and overnight temperatures in Orlando. He found that a rise in the price of the futures contract during the trading day in New York predicted cold weather, in particular a freezing spell, in Orlando over the following night. In fact, the market was so effective in predicting cold weather in Florida that a price rise during the trading day actually predicted forecast errors in the official U.S. government weather forecasts for that night.

Roll's study is also interesting for what he did not find: Although his detailed weather data explained some of the variation in daily OJ futures prices, most of the daily movements in OJ prices remained unexplained. He therefore suggested that the OJ futures market exhibits "excess volatility," that is, more volatility than can be attributed to movements in fundamentals. Understanding why (and if) there is excess volatility in financial markets is now an important area of research in financial economics.

Roll's findings also illustrate the difference between forecasting and estimating dynamic causal effects. Price changes on the OJ futures market are a useful predictor of cold weather, but that does not mean that commodity traders are so powerful that they can cause the temperature to fall. Visitors to Disney World might shiver after an OJ futures contract price rise, but they are not shivering because of the price rise, unless, of course, they went short in the OJ futures market.

In this section, we examine the assumption of exogeneity in four other economic applications.

U.S. Income and Australian Exports

The United States is an important source of demand for Australian exports. Precisely how sensitive Australian exports are to fluctuations in U.S. aggregate income could be investigated by regressing Australian exports to the United States against a measure of U.S. income. Strictly speaking, because the world economy is integrated, there is simultaneous causality in this relationship: A decline in Australian exports reduces Australian income, which reduces demand for imports from the United States, which reduces U.S. income. As a practical matter, however, this effect is very small because the Australian economy is much smaller than the U.S. economy. Thus, U.S. income plausibly can be treated as exogenous in this regression.

In contrast, in a regression of European Union exports to the United States against U.S. income, the argument for treating U.S. income as exogenous is less convincing because demand by residents of the European Union for American exports constitutes a substantial fraction of the total demand for U.S. exports. Thus a decline in U.S. demand for EU exports would decrease EU income, which in turn would decrease demand for U.S. exports and thus decrease U.S. income. Because of these linkages through international trade, EU exports to the United States and U.S. income are simultaneously determined, so in this regression U.S. income arguably is not exogenous. This example illustrates a more general point that whether a variable is exogenous depends on the context: U.S. income is plausibly exogenous in a regression explaining Australian exports, but not in a regression explaining EU exports.

Oil Prices and Inflation

Ever since the oil price increases of the 1970s, macroeconomists have been interested in estimating the dynamic effect of an increase in the international price of crude oil on the U.S. rate of inflation. Because oil prices are set in world markets in large part by foreign oil-producing countries, initially one might think that oil prices are exogenous. But oil prices are not like the weather: Members of OPEC set oil production levels strategically, taking many factors, including the state of the world economy, into account. To the extent that oil prices (or quantities) are set based on an assessment of current and future world economic conditions, including inflation in the United States, oil prices are endogenous.

Monetary Policy and Inflation

The central bankers in charge of monetary policy need to know the effect on inflation of monetary policy. Because the main tool of monetary policy is the short-term interest rate (the "short rate"), this means they need to know the dynamic causal effect on inflation of a change in the short rate. Although the short rate is determined by the central bank, it is not set by the central bankers at random (as it would be in an ideal randomized experiment) but rather is set endogenously: The central bank determines the short rate based on an assessment of the current and future state of the economy, especially including the current and future rates of inflation. The rate of inflation in turn depends on the interest rate (higher interest rates reduce aggregate demand), but the interest


rate depends on the rate of inflation, its past value, and its (expected) future value. Thus the short rate is endogenous, and the causal dynamic effect of a change in the short rate on future inflation cannot be consistently estimated by an OLS regression of the rate of inflation on current and past interest rates.

The Phillips Curve

The Phillips curve investigated in Chapter 14 is a regression of the change in the rate of inflation against lagged changes in inflation and lags of the unemployment rate. Because lags of the unemployment rate happened in the past, one might initially think that there cannot be feedback from current rates of inflation to past values of the unemployment rate, so that past values of the unemployment rate can be treated as exogenous. But past values of the unemployment rate were not randomly assigned in an experiment; instead, the past unemployment rate was simultaneously determined with past values of inflation. Because inflation and the unemployment rate are simultaneously determined, the other factors that determine inflation contained in u_t are correlated with past values of the unemployment rate; that is, the unemployment rate is not exogenous. It follows that the unemployment rate is not strictly exogenous, so the dynamic multipliers computed using an empirical Phillips curve [for example, the ADL model in Equation (14.17)] are not consistent estimates of the dynamic causal effect on inflation of a change in the unemployment rate.

15.8 Conclusion

Time series data provide the opportunity to estimate the time path of the effect on Y of a change in X, that is, the dynamic causal effect on Y of a change in X. To estimate dynamic causal effects using a distributed lag regression, however, X must be exogenous, as it would be if it were set randomly in an ideal randomized experiment. If X is not just exogenous but is strictly exogenous, then the dynamic causal effects can be estimated using an autoregressive distributed lag model or by GLS.

In some applications, such as estimating the dynamic causal effect on the price of orange juice of freezing weather in Florida, a convincing case can be made that the regressor (freezing degree days) is exogenous; thus the dynamic causal effect can be estimated by OLS estimation of the distributed lag coefficients. Even in this application, however, economic theory suggests that the weather is not strictly exogenous, so the ADL or GLS methods are inappropriate. Moreover, in many relations of interest to econometricians, there is simultaneous causality, so the regressors in these specifications are not


exogenous, strictly or otherwise. Ascertaining whether the regressor is exogenous (or strictly exogenous) ultimately requires combining economic theory, institutional knowledge, and careful judgment.

Summary

1. Dynamic causal effects in time series are defined in the context of a randomized experiment, where the same subject (entity) receives different randomly assigned treatments at different times. The coefficients in a distributed lag regression of Y on X and its lags can be interpreted as the dynamic causal effects when the time path of X is determined randomly and independently of other factors that influence Y.
2. The variable X is (past and present) exogenous if the conditional mean of the error u_t in the distributed lag regression of Y on current and past values of X does not depend on current and past values of X. If in addition the conditional mean of u_t does not depend on future values of X, then X is strictly exogenous.
3. If X is exogenous, then the OLS estimators of the coefficients in a distributed lag regression of Y on current and past values of X are consistent estimators of the dynamic causal effects. In general, the error u_t in this regression is serially correlated, so conventional standard errors are misleading and HAC standard errors must be used instead.
4. If X is strictly exogenous, then the dynamic multipliers can be estimated by OLS estimation of an ADL model or by GLS.
5. Exogeneity is a strong assumption that often fails to hold in economic time series data because of simultaneous causality, and the assumption of strict exogeneity is even stronger.
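The HAC correction in point 3 can be sketched directly. Below is a minimal Newey–West estimator, applied here to the long-run variance of a serially correlated series rather than to regression coefficients; the function, the AR(1) series, and all parameter values are illustrative assumptions, while the Bartlett weights 1 − j/(m + 1) and the rule of thumb m = 0.75T^(1/3) follow this chapter:

```python
import numpy as np

def newey_west_variance(v, m):
    """Newey-West long-run variance of the series v: the lag-0 autocovariance
    plus Bartlett-weighted autocovariances up to the truncation parameter m."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    T = len(v)
    total = v @ v / T                       # lag-0 autocovariance
    for j in range(1, m + 1):
        gamma_j = v[j:] @ v[:-j] / T        # lag-j autocovariance
        total += 2.0 * (1.0 - j / (m + 1.0)) * gamma_j   # Bartlett kernel weight
    return total

# AR(1) series: positive serial correlation makes the naive variance too small
rng = np.random.default_rng(0)
u = np.zeros(400)
for t in range(1, 400):
    u[t] = 0.7 * u[t - 1] + rng.normal()

m = int(round(0.75 * len(u) ** (1 / 3)))    # rule of thumb m = 0.75 * T^(1/3)
print(newey_west_variance(u, m), np.var(u))
```

For a positively autocorrelated series such as this one, the HAC estimate exceeds the naive variance, which is exactly why conventional standard errors are misleading in point 3.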

Key Terms

dynamic causal effect (591)
distributed lag model (597)
exogeneity (599)
strict exogeneity (599)
impact effect (601)
long-run cumulative dynamic multiplier (603)
dynamic multiplier (605)
cumulative dynamic multiplier (605)
heteroskedasticity- and autocorrelation-consistent (HAC) standard errors (606)
truncation parameter (607)
Newey–West variance estimator (608)
generalized least squares (GLS) (609)
quasi-difference (611)
infeasible GLS estimator (614)
feasible GLS estimator (614)

Review the Concepts

15.1 In the 1970s a common practice was to estimate a distributed lag model relating changes in nominal gross domestic product (Y) to current and past changes in the money supply (X). Under what assumptions will this regression estimate the causal effects of money on nominal GDP? Are these assumptions likely to be satisfied in a modern economy like the United States?

15.2 Suppose that X is strictly exogenous. A researcher estimates an ADL(1,1) model, calculates the regression residual, and finds the residual to be highly serially correlated. Should the researcher estimate a new ADL model with additional lags or simply use HAC standard errors for the ADL(1,1) estimated coefficients?

15.3 Suppose that a distributed lag regression is estimated, where the dependent variable is ΔY_t instead of Y_t. Explain how you would compute the dynamic multipliers of X_t on Y_t.

15.4 Suppose that you added FDD_{t+1} as an additional regressor in Equation (15.2). If FDD is strictly exogenous, would you expect the coefficient on FDD_{t+1} to be zero or nonzero? Would your answer change if FDD is exogenous but not strictly exogenous?

Exercises

15.1 Increases in oil prices have been blamed for several recessions in developed countries. To quantify the effect of oil prices on real economic activity, researchers have run regressions like those discussed in this chapter. Let GDP_t denote the value of quarterly gross domestic product in the United States and let Y_t = 100ln(GDP_t/GDP_{t−1}) be the quarterly percentage change in GDP. James Hamilton, an econometrician and macroeconomist, has suggested that oil prices adversely affect the economy only when they jump above their values in the recent past. Specifically, let O_t equal the greater of zero or the percentage point difference between oil prices at date t and their maximum value during the past year. A distributed lag regression relating Y_t and O_t, estimated over 1955:I–2000:IV, is
Ŷ_t = 1.0 − 0.055O_t − 0.026O_{t−1} + 0.008O_{t−2} − 0.031O_{t−3} − 0.109O_{t−4}
     (0.1)  (0.054)    (0.025)       (0.115)       (0.048)       (0.035)

      − 0.025O_{t−5} + 0.019O_{t−6} − 0.058O_{t−7} + 0.067O_{t−8}.
       (0.039)        (0.042)        (0.042)        (0.042)



a. Suppose that oil prices jump 25% above their previous peak value and stay at this new higher level (so that O_t = 25 and O_{t+1} = O_{t+2} = · · · = 0). What is the predicted effect on output growth for each quarter over the next two years?
b. Construct a 95% confidence interval for your answers in (a).
c. What is the predicted cumulative change in GDP growth over eight quarters?
d. The HAC F-statistic testing whether the coefficients on O_t and its lags are zero is 3.49. Are the coefficients significantly different from zero?
15.2 Macroeconomists have also noticed that interest rates change following oil price jumps. Let R_t denote the interest rate on three-month Treasury bills (in percentage points at an annual rate). The distributed lag regression relating the change in R_t (ΔR_t) to O_t, estimated over 1955:I–2000:IV, is

ΔR̂_t = 0.07 + 0.062O_t + 0.048O_{t−1} − 0.014O_{t−2} + 0.086O_{t−3} − 0.000O_{t−4}
       (0.06) (0.045)    (0.034)       (0.028)       (0.169)       (0.058)

        + 0.023O_{t−5} − 0.010O_{t−6} − 0.100O_{t−7} + 0.014O_{t−8}.
         (0.065)        (0.047)        (0.038)        (0.025)

a. Suppose that oil prices jump 25% above their previous peak value and stay at this new higher level (so that O_t = 25 and O_{t+1} = O_{t+2} = · · · = 0). What is the predicted change in interest rates for each quarter over the next two years?
b. Construct 95% confidence intervals for your answers to (a).
c. What is the effect of this change in oil prices on the level of interest rates in period t + 8? How is your answer related to the cumulative multiplier?
d. The HAC F-statistic testing whether the coefficients on O_t and its lags are zero is 4.25. Are the coefficients significantly different from zero?
15.3 Consider two different randomized experiments. In experiment A, oil prices are set randomly and the central bank reacts according to its usual policy rules in response to economic conditions, including changes in the oil price. In experiment B, oil prices are set randomly and the central bank holds interest rates constant and, in particular, does not respond to the oil price changes. In both experiments, GDP growth is observed. Now suppose that oil prices are exogenous in the regression in Exercise 15.1. To which experiment, A or B, does the dynamic causal effect estimated in Exercise 15.1 correspond?
15.4 Suppose that oil prices are strictly exogenous. Discuss how you could improve upon the estimates of the dynamic multipliers in Exercise 15.1.

15.5 Derive Equation (15.7) from Equation (15.4) and show that δ0 = β0, δ1 = β1, δ2 = β1 + β2, δ3 = β1 + β2 + β3 (etc.). (Hint: Note that X_t = ΔX_t + ΔX_{t−1} + · · · + ΔX_{t−r+1} + X_{t−r}.)

15.6 Consider the regression model Y_t = β0 + β1X_t + u_t, where u_t follows the stationary AR(1) model u_t = φ1u_{t−1} + ū_t, with ū_t i.i.d. with mean 0 and variance σ²_ū and |φ1| < 1; the regressor X_t follows the stationary AR(1) model X_t = γ1X_{t−1} + e_t, with e_t i.i.d. with mean 0 and variance σ²_e and |γ1| < 1; and e_t is independent of ū_j for all t and j.

a. Show that var(u_t) = σ²_ū/(1 − φ1²) and var(X_t) = σ²_e/(1 − γ1²).
b. Show that cov(u_t, u_{t−j}) = φ1^j var(u_t) and cov(X_t, X_{t−j}) = γ1^j var(X_t).
c. Show that corr(u_t, u_{t−j}) = φ1^j and corr(X_t, X_{t−j}) = γ1^j.
d. Consider the terms σ²_v and f_T in Equation (15.14).
   i. Show that σ²_v = σ²_X σ²_u, where σ²_X is the variance of X and σ²_u is the variance of u.
   ii. Derive an expression for f_∞.
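The result in part (a) can be checked by simulation; a minimal sketch under arbitrary parameter choices (φ1 = 0.5 and σ²_ū = 1 are assumptions made here for illustration):

```python
import numpy as np

# Simulation check of part (a): for u_t = phi1*u_{t-1} + ubar_t,
# var(u_t) = sigma^2_ubar / (1 - phi1^2).
phi1 = 0.5                     # |phi1| < 1 keeps u_t stationary
sigma2_ubar = 1.0              # variance of the i.i.d. innovation ubar_t
rng = np.random.default_rng(1)

T = 100_000
u = np.zeros(T)
for t in range(1, T):
    u[t] = phi1 * u[t - 1] + rng.normal(scale=sigma2_ubar ** 0.5)

theoretical_var = sigma2_ubar / (1 - phi1 ** 2)
print(round(theoretical_var, 3))   # 1.333
print(round(u.var(), 3))
```

With a long simulated series the sample variance lands close to the theoretical value σ²_ū/(1 − φ1²).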


15.7 Consider the regression model Y_t = β0 + β1X_t + u_t, where u_t follows the stationary AR(1) model u_t = φ1u_{t−1} + ū_t, with ū_t i.i.d. with mean 0 and variance σ²_ū and |φ1| < 1.

a. Suppose that X_t is independent of ū_j for all t and j. Is X_t exogenous (past and present)? Is X_t strictly exogenous (past, present, and future)?
b. Suppose that X_t = ū_{t+1}. Is X_t exogenous? Is X_t strictly exogenous?
15.8 Consider the model in Exercise 15.7 with X_t = ū_{t+1}.

a. Is the OLS estimator of β1 consistent? Explain.
b. Explain why the GLS estimator of β1 is not consistent.
c. Show that the infeasible GLS estimator β̂1^GLS →p β1 − φ1/(1 + φ1²). [Hint: Use the omitted variable formula (6.1) applied to the quasi-differenced regression Equation (15.23).]


15.9 Consider the "constant-term-only" regression model Y_t = β0 + u_t, where u_t follows the stationary AR(1) model u_t = φ1u_{t−1} + ū_t, with ū_t i.i.d. with mean 0 and variance σ²_ū and |φ1| < 1.

a. Show that the OLS estimator is β̂0 = T⁻¹ Σ_{t=1}^{T} Y_t.
b. Show that the (infeasible) GLS estimator is β̂0^GLS = (1 − φ1)⁻¹(T − 1)⁻¹ Σ_{t=2}^{T} (Y_t − φ1Y_{t−1}). [Hint: The GLS estimator of β0 is (1 − φ1)⁻¹ times the OLS estimator of α0 in Equation (15.25). Why?]
c. Show that β̂0^GLS can be written as β̂0^GLS = (T − 1)⁻¹ Σ_{t=2}^{T−1} Y_t + (1 − φ1)⁻¹(T − 1)⁻¹(Y_T − φ1Y_1). [Hint: Rearrange the formula in (b).]
d. Derive the difference β̂0 − β̂0^GLS and discuss why it is likely to be small when T is large.

15.10 Consider the ADL model Y_t = 3.1 + 0.4Y_{t−1} + 2.0X_t + 0.8X_{t−1} + ū_t, where X_t is strictly exogenous.

a. Derive the impact effect of X on Y.
b. Derive the first five dynamic multipliers.
c. Derive the first five cumulative multipliers.
d. Derive the long-run cumulative dynamic multiplier.
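For an ADL(1,1) model such as the one in Exercise 15.10, the dynamic multipliers follow a simple recursion: the impact effect is the coefficient on X_t, the next multiplier is the lagged-Y coefficient times the impact effect plus the coefficient on X_{t−1}, and each later multiplier is the lagged-Y coefficient times the previous one. A sketch (the helper function is illustrative, not from the text; the coefficients are those printed above, reading the X_{t−1} coefficient as +0.8):

```python
def adl11_multipliers(phi1, d0, d1, n):
    """Dynamic multipliers of X on Y for Y_t = b0 + phi1*Y_{t-1} + d0*X_t + d1*X_{t-1} + u_t."""
    mults = [d0]                        # impact effect
    if n > 1:
        mults.append(phi1 * d0 + d1)    # one-period-ahead multiplier
    while len(mults) < n:
        mults.append(phi1 * mults[-1])  # geometric decay thereafter
    return mults

# coefficients from Exercise 15.10: phi1 = 0.4, d0 = 2.0, d1 = 0.8
m = adl11_multipliers(0.4, 2.0, 0.8, 5)
print([round(v, 4) for v in m])         # [2.0, 1.6, 0.64, 0.256, 0.1024]

long_run = (2.0 + 0.8) / (1 - 0.4)      # long-run cumulative multiplier, (d0 + d1)/(1 - phi1)
print(round(long_run, 3))               # 4.667
```

The long-run cumulative multiplier equals the sum of all the dynamic multipliers, which for a stationary ADL(1,1) is (d0 + d1)/(1 − φ1).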

Empirical Exercises

15.1 In this exercise you will estimate the effect of oil prices on macroeconomic activity, using monthly data on the Index of Industrial Production (IP) and the monthly measure of O_t described in Exercise 15.1. The data can be found on the textbook Web site, www.aw-bc.com/stock_watson, in the file USMacro_Monthly.

a. Compute the monthly growth rate in IP, expressed in percentage points: ip_growth_t = 100 × ln(IP_t/IP_{t−1}). What are the mean and standard deviation of ip_growth over the 1952:1–2004:12 sample period?
b. Plot the values of O_t. Why are some values of O_t equal to zero? Why aren't some values of O_t negative?
c. Estimate a distributed lag model of ip_growth_t onto current and 18 lagged values of O_t. What value of the HAC standard truncation parameter m did you choose? Why?

d. Taken as a group, are the coefficients on O_t statistically significantly different from zero?
e. Construct graphs like those in Figure 15.2, showing the estimated dynamic multipliers, cumulative multipliers, and 95% confidence intervals. Comment on the real-world size of the multipliers.
f. Suppose that high demand in the United States (evidenced by large values of ip_growth) leads to increases in oil prices. Is O_t exogenous? Are the estimated multipliers shown in the graphs in (e) reliable? Explain.
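Parts (a) and (c) of Empirical Exercise 15.1 can be sketched as follows; the IP levels below are invented stand-ins for the USMacro_Monthly series, and the truncation rule m = 0.75T^(1/3), rounded, is the rule of thumb from this chapter (the 1952:1–2004:12 monthly sample has T = 636 observations):

```python
import numpy as np

def growth_rate(ip):
    """Part (a): monthly growth in percentage points, 100 * ln(IP_t / IP_{t-1})."""
    ip = np.asarray(ip, dtype=float)
    return 100.0 * np.diff(np.log(ip))

def hac_truncation(T):
    """Part (c): rule-of-thumb HAC truncation parameter m = 0.75 * T^(1/3), rounded."""
    return int(round(0.75 * T ** (1 / 3)))

# invented IP index levels standing in for the USMacro_Monthly series
ip = [100.0, 101.0, 100.5, 102.0]
g = growth_rate(ip)
print(round(g.mean(), 3), round(g.std(), 3))

# the 1952:1-2004:12 monthly sample has T = 636 observations
print(hac_truncation(636))   # 6
```

The same two helpers would be applied to the actual IP series once the data file is loaded.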

15.2 In the data file USMacro_Monthly, you will find data on two aggregate price series for the United States: the Consumer Price Index (CPI) and the Personal Consumption Expenditures Deflator (PCED). These series are alternative measures of consumer prices in the United States. The CPI prices a basket of goods whose composition is updated every 5–10 years. The PCED uses chain-weighting to price a basket of goods whose composition changes from month to month. Economists have argued that the CPI will overstate inflation because it does not take into account the substitution that occurs when relative prices change. If this substitution bias is important, then average CPI inflation should be systematically higher than PCED inflation. Let π_t^CPI = 1200 × ln[CPI(t)/CPI(t − 1)] and π_t^PCED = 1200 × ln[PCED(t)/PCED(t − 1)], and let Y_t = π_t^CPI − π_t^PCED, so that π_t^CPI is the monthly rate of price inflation (measured in percentage points at an annual rate) based on the CPI, π_t^PCED is the monthly rate of price inflation from the PCED, and Y_t is the difference. Using data from 1959:1 through 2004:12, carry out the following exercises.

a. Compute the sample means of π_t^CPI and π_t^PCED. Are these point estimates consistent with the presence of economically significant substitution bias in the CPI?
b. Compute the sample mean of Y_t. Explain why it is numerically equal to the difference in the means computed in (a).
c. Show that the population mean of Y is equal to the difference of the population means of the two inflation rates.
d. Consider the "constant-term-only" regression: Y_t = β0 + u_t. Show that β0 = E(Y_t). Do you think that u_t is serially correlated? Explain.
e. Construct a 95% confidence interval for β0. What value of the HAC standard truncation parameter m did you choose? Why?
f. Is there statistically significant evidence that the mean inflation rate for the CPI is greater than the rate for the PCED?
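The inflation-rate transformation in this exercise can be sketched directly; the CPI and PCED levels below are invented stand-ins for the USMacro_Monthly series:

```python
import numpy as np

def annualized_inflation(price_index):
    """Monthly inflation in percentage points at an annual rate: 1200 * ln(P_t / P_{t-1})."""
    p = np.asarray(price_index, dtype=float)
    return 1200.0 * np.diff(np.log(p))

# invented CPI and PCED levels standing in for the USMacro_Monthly series
cpi = [180.0, 180.5, 181.2]
pced = [105.0, 105.2, 105.5]

pi_cpi = annualized_inflation(cpi)
pi_pced = annualized_inflation(pced)
y = pi_cpi - pi_pced                 # Y_t, the difference of the two inflation rates

# part (b): the sample mean of Y_t equals the difference of the two sample means
print(np.isclose(y.mean(), pi_cpi.mean() - pi_pced.mean()))   # True
```

The equality printed at the end is the point of part (b): averaging is linear, so the mean of a difference is the difference of the means.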


APPENDIX 15.1

The Orange Juice Data Set

The orange juice price data are the frozen orange juice component of the processed foods and feeds group of the Producer Price Index (PPI), collected by the U.S. Bureau of Labor Statistics (BLS series wpu02420301). The orange juice price series was divided by the overall PPI for finished goods to adjust for general price inflation. The freezing degree days series was constructed from daily minimum temperatures recorded at Orlando-area airports, obtained from the National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce. The FDD series was constructed so that its timing and the timing of the orange juice price data were approximately aligned. Specifically, the frozen orange juice price data are collected by surveying a sample of producers in the middle of every month, although the exact date varies from month to month. Accordingly, the FDD series was constructed to be the number of freezing degree days from the 11th of one month to the 10th of the next month; that is, FDD is the maximum of zero and 32 minus the minimum daily temperature, summed over all days from the 11th to the 10th. Thus, %ChgP_t for February is the percentage change in real orange juice prices from mid-January to mid-February, and FDD_t for February is the number of freezing degree days from January 11 to February 10.
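The FDD construction described above can be sketched directly; the daily minimum temperatures below are invented, and each day contributes max(0, 32 − Tmin):

```python
def freezing_degree_days(daily_min_temps):
    """FDD over a window: sum over days of max(0, 32 - Tmin), Tmin in degrees F."""
    return sum(max(0.0, 32.0 - t) for t in daily_min_temps)

# invented daily minimums from the 11th of one month to the 10th of the next
temps = [45, 38, 31, 28, 35, 30]
print(freezing_degree_days(temps))   # (32-31) + (32-28) + (32-30) = 7.0
```

Days with minimum temperatures above freezing contribute nothing, so a warm window produces an FDD of zero.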

APPENDIX 15.2

The ADL Model and Generalized Least Squares in Lag Operator Notation

This appendix presents the distributed lag model in lag operator notation, derives the ADL and quasi-differenced representations of the distributed lag model, and discusses the conditions under which the ADL model can have fewer parameters than the original distributed lag model.

The Distributed Lag, ADL, and Quasi-Differenced Models, in Lag Operator Notation

As defined in Appendix 14.3, the lag operator, L, has the property that LX_t = X_{t−1}, and the distributed lag β1X_t + β2X_{t−1} + · · · + β_{r+1}X_{t−r} can be expressed as β(L)X_t, where β(L) = Σ_{j=0}^{r} β_{j+1}L^j, with L⁰ = 1. Thus the distributed lag model in Key Concept 15.1 [Equation (15.4)] can be written in lag operator notation as

Y_t = β0 + β(L)X_t + u_t.  (15.40)

In addition, if the error term u_t follows an AR(p), then it can be written as

φ(L)u_t = ū_t,  (15.41)

where φ(L) = Σ_{j=0}^{p} φ_j L^j, with φ0 = 1 and ū_t serially uncorrelated [note that φ1, . . . , φp as defined here are the negatives of φ1, . . . , φp in the notation of Equation (15.31)].

To derive the ADL model, premultiply each side of Equation (15.40) by φ(L), so that

φ(L)Y_t = φ(L)[β0 + β(L)X_t + u_t] = α0 + δ(L)X_t + ū_t,  (15.42)

where

α0 = φ(1)β0 and δ(L) = φ(L)β(L), where φ(1) = Σ_{j=0}^{p} φ_j.  (15.43)

To derive the quasi-differenced model, note that φ(L)β(L)X_t = β(L)φ(L)X_t = β(L)X̃_t, where X̃_t = φ(L)X_t. Thus, rearranging Equation (15.42) yields

Ỹ_t = α0 + β(L)X̃_t + ū_t,  (15.44)

where Ỹ_t is the quasi-difference of Y_t; that is, Ỹ_t = φ(L)Y_t.
The ADL and GLS Estimators

The OLS estimator of the ADL coefficients is obtained by OLS estimation of Equation (15.42). The original distributed lag coefficients are β(L), which, in terms of the estimated coefficients, is β̂(L) = δ̂(L)/φ̂(L); that is, the coefficients in β̂(L) satisfy the restrictions implied by φ(L)β(L) = δ(L). Thus, the estimator of the dynamic multipliers based on the OLS estimators of the coefficients of the ADL model, δ̂(L) and φ̂(L), is

β̂^ADL(L) = δ̂(L)/φ̂(L).  (15.45)

The expressions for the coefficients in Equation (15.29) in the text are obtained as a special case of Equation (15.45) when r = 1 and p = 1.

The feasible GLS estimator is computed by obtaining a preliminary estimator of φ(L), computing estimated quasi-differences, estimating β(L) in Equation (15.44) using these estimated quasi-differences, and (if desired) iterating until convergence. The iterated GLS estimator is the nonlinear least squares estimator computed by NLLS estimation of the ADL model in Equation (15.42), subject to the nonlinear restrictions on the parameters contained in Equation (15.43).

As discussed in the text surrounding Equation (15.36), it is not enough for X to be (past and present) exogenous for either of these estimation methods to be valid: exogeneity alone does not ensure that Equation (15.36) holds. If, however, X is strictly exogenous, then Equation (15.36) does hold and, assuming that Assumptions 2 through 4 of Key Concept 14.6 hold, these estimators are consistent and asymptotically normal. Moreover, the usual (cross-sectional heteroskedasticity-robust) OLS standard errors provide a valid basis for statistical inference.
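The ratio δ̂(L)/φ̂(L) in Equation (15.45) can be expanded numerically by long division of the lag polynomials. A sketch (the function is illustrative; the example uses an ADL(1,1) with φ(L) = 1 − 0.5L and δ(L) = 1.0 + 0.5L, chosen arbitrarily, so the implied multipliers decay geometrically):

```python
def expand_lag_ratio(delta, phi, n):
    """First n coefficients of beta(L) = delta(L) / phi(L).
    delta and phi are coefficient lists in increasing powers of L, with phi[0] == 1."""
    beta = []
    for j in range(n):
        b = delta[j] if j < len(delta) else 0.0
        # delta_j = sum_k phi_k * beta_{j-k}, so solve for beta_j
        for k in range(1, min(j, len(phi) - 1) + 1):
            b -= phi[k] * beta[j - k]
        beta.append(b / phi[0])
    return beta

# example: phi(L) = 1 - 0.5L, delta(L) = 1.0 + 0.5L
multipliers = expand_lag_ratio([1.0, 0.5], [1.0, -0.5], 5)
print(multipliers)   # [1.0, 1.0, 0.5, 0.25, 0.125]
```

Each coefficient of the expansion is one dynamic multiplier, which is how the few ADL parameters imply a whole sequence of multipliers.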

Parameter reduction and the ADL model. Suppose that the distributed lag polynomial β(L) can be written as a ratio of lag polynomials, θ1(L)/θ2(L), where θ1(L) and θ2(L) are both lag polynomials of a low degree. Then φ(L)β(L) in Equation (15.42) is φ(L)β(L) = φ(L)θ1(L)/θ2(L) = [φ(L)/θ2(L)]θ1(L). If it so happens that φ(L) = θ2(L), then δ(L) = φ(L)β(L) = θ1(L). If the degree of θ1(L) is low, then q, the number of lags of X in the ADL model, can be much less than r. Thus, under these assumptions, estimation of the ADL model entails estimating potentially many fewer parameters than the original distributed lag model. It is in this sense that the ADL model can achieve a more parsimonious parameterization (that is, use fewer unknown parameters) than the distributed lag model.

As developed here, the assumption that φ(L) and θ2(L) happen to be the same seems like a coincidence that would not occur in an application. Even so, it shows how the ADL model is able to describe a rich variety of shapes of dynamic multipliers with only a few coefficients.

ADL or GLS? Bias vs. variance. A good way to think about whether to estimate the dynamic multipliers by first estimating an ADL model and then computing the dynamic multipliers from the ADL coefficients or, alternatively, by estimating the distributed lag model directly using GLS is to view the decision in terms of a tradeoff between bias and variance. Estimating the dynamic multipliers using an approximate ADL model introduces bias; however, because there are few coefficients, the variance of the estimator of the dynamic multipliers can be small. In contrast, estimating a long distributed lag model using GLS produces less bias in the multipliers; however, because there are so many coefficients, their variance can be large. If the ADL approximation to the dynamic multipliers is a good one, then the bias of the implied dynamic multipliers will be small, so the ADL approach will have a smaller variance than the GLS approach with only a small increase in the bias. For this reason, unrestricted estimation of an ADL model with small numbers of lags of Y and X is an attractive way to approximate a long distributed lag when X is strictly exogenous.

PART FIVE

The Econometric Theory of Regression Analysis

CHAPTER 17 The Theory of Linear Regression with One Regressor
CHAPTER 18 The Theory of Multiple Regression

CHAPTER 17

The Theory of Linear Regression with One Regressor

Why should an applied econometrician bother learning any econometric theory? There are several reasons. Learning econometric theory turns your statistical software from a "black box" into a flexible toolkit from which you are able to select the right tool for the job at hand. Understanding econometric theory helps you appreciate why these tools work and what assumptions are required for each tool to work properly. Perhaps most importantly, knowing econometric theory helps you recognize when a tool will not work well in an application and when you should look for a different econometric approach.

This chapter provides an introduction to the econometric theory of linear regression with a single regressor. This introduction is intended to supplement, not replace, the material in Chapters 4 and 5, which should be read first. This chapter extends Chapters 4 and 5 in two ways.

First, it provides a mathematical treatment of the sampling distribution of the OLS estimator and t-statistic, both in large samples under the three least squares assumptions of Key Concept 4.3 and in finite samples under the two additional extended least squares assumptions of homoskedasticity and normal errors. These five assumptions are laid out in Section 17.1. Sections 17.2 and 17.3, augmented by Appendix 17.2, develop mathematically the large-sample normal distributions of the OLS estimator and t-statistic under the first three assumptions (the least squares assumptions of Key Concept 4.3). Section 17.4 derives the exact distributions of the OLS estimator and t-statistic under the two additional assumptions of homoskedasticity and normally distributed errors.



Scc(.lnd,lhi~

chapter cxtcnus

Cha pt er~ 4 and 5 b)

prm idmg .1n

method tor hJn<.llmg hetcroskedasticity. The approach ol

.th~rn,llth'

Chapt~.:r-s 4

and;:. ic; to

use hc teroskt:da~llctty -robus t standard errors 10 en'\urc that 'tali-.tk:.tl inrcremc


i:. ' 1liJ even if I he crro~ are hetcrol>kedasttc..llw- mcthuJ come'>" ith 3 co.t.
ho\\c \Cr. H t h~.: errors are beteroskedal>UC, then in

th~.:JI)

n mt- .. '- 1 ~tent

cstimatut than OLS is available. This estimator, called "dghtc<.l

prc'c nted in Sect ion L7 5. Weighted least squares requues a

it;,,,, '4uarc'. ts

~real

deal of .,nor

kn u" h:dge about the prec ise nature of the heterosk~Jc~~ttc uy- th .tt " about the
conditional varinnct! of u given X. When such knowledge is a\.tlluhlc: Wctghtcd

lea$1 l>quares 1m proves upon OLS. In most applications. however, such


knowledge is unavailable; 10 those cases. using OLS with heterol>kcllasticit}'
robust standard e rrors is the preferred method.

17.1 The Extended Least Squares Assumptions and the OLS Estimator

This section introduces a set of assumptions that extend and strengthen the three least squares assumptions of Chapter 4. These stronger assumptions are used in subsequent sections to derive stronger theoretical results about the OLS estimator than are possible under the weaker (but more realistic) assumptions of Chapter 4.

The Extended Least Squares Assumptions

Extended least squares assumptions #1, #2, and #3. The first three extended least squares assumptions are the three assumptions given in Key Concept 4.3: that the conditional mean of u_i, given X_i, is zero; that (X_i, Y_i), i = 1, . . . , n, are i.i.d. draws from their joint distribution; and that X_i and u_i have four moments. Under these three assumptions, the OLS estimator is unbiased, is consistent, and has an asymptotically normal sampling distribution. If these three assumptions hold, then the methods for inference introduced in Chapter 4 (hypothesis testing using the t-statistic and construction of 95% confidence intervals as ±1.96 standard errors) are justified when the sample size is large. To develop a theory of efficient estimation using OLS or to characterize the exact sampling distribution of the OLS estimator, however, requires stronger assumptions.

Extended least squares assumption #4. The fourth extended least squares assumption is that u_i is homoskedastic; that is, var(u_i|X_i) = σ²_u, where σ²_u is a constant. As seen in Section 5.5, if this additional assumption holds, then the OLS estimator is efficient among all linear estimators that are unbiased, conditional on X_1, . . . , X_n.

Extended least squares assumption #5. The fifth extended least squares assumption is that the conditional distribution of u_i, given X_i, is normal.

Under least squares assumptions #1 and #2 and the extended least squares assumptions #4 and #5, u_i is i.i.d. N(0, σ²_u), and u_i and X_i are independently distributed. To see this, note that the fifth extended least squares assumption states that the conditional distribution of u_i|X_i is N(0, var(u_i|X_i)). By the fourth least squares assumption, however, var(u_i|X_i) = σ²_u, so the conditional distribution of u_i|X_i is N(0, σ²_u). Because this conditional distribution does not depend on X_i, u_i and X_i are independently distributed. By the second least squares assumption, u_i is distributed independently of u_j for all j ≠ i. It follows that, under the extended least squares assumptions #1, #2, #4, and #5, u_i and X_i are independently distributed and u_i is i.i.d. N(0, σ²_u).

It is shown in Section 17.4 that, if all five extended least squares assumptions hold, the OLS estimator has an exact normal sampling distribution and the homoskedasticity-only t-statistic has an exact Student t distribution.

The fourth and fifth extended least squares assumptions are much more restrictive than the first three. Although it might be reasonable to assume that the first three assumptions hold in an application, the final two assumptions are less realistic. Even though these final two assumptions might not hold in practice, they are of theoretical interest because, if one or both of them hold, then the OLS estimator has additional properties beyond those discussed in Chapters 4 and 5. Thus, we can enhance our understanding of the OLS estimator, and more generally of the theory of estimation in the linear regression model, by exploring estimation under these stronger assumptions.

The five extended least squares assumptions for the single-regressor model are summarized in Key Concept 17.1.

KEY CONCEPT 17.1

THE EXTENDED LEAST SQUARES ASSUMPTIONS FOR REGRESSION WITH A SINGLE REGRESSOR

The linear regression model with a single regressor is

Y_i = β0 + β1X_i + u_i, i = 1, . . . , n.  (17.1)

The extended least squares assumptions are
1. E(u_i|X_i) = 0 (conditional mean zero);
2. (X_i, Y_i), i = 1, . . . , n, are independent and identically distributed (i.i.d.) draws from their joint distribution;
3. (X_i, u_i) have nonzero finite fourth moments;
4. var(u_i|X_i) = σ²_u (homoskedasticity); and
5. The conditional distribution of u_i given X_i is normal (normal errors).

The OLS Estimator

For easy reference, we restate the OLS estimators of β0 and β1 here:

β̂1 = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^{n} (X_i − X̄)²  (17.2)

β̂0 = Ȳ − β̂1X̄.  (17.3)

Equations (17.2) and (17.3) are derived in Appendix 4.2.
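Equations (17.2) and (17.3) can be computed directly; a sketch in numpy (the data are invented to lie exactly on a line, so the estimates recover the intercept and slope exactly):

```python
import numpy as np

def ols_single_regressor(x, y):
    """OLS estimates following Equations (17.2) and (17.3)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    beta1 = dx @ (y - y.mean()) / (dx @ dx)   # Equation (17.2)
    beta0 = y.mean() - beta1 * x.mean()       # Equation (17.3)
    return beta0, beta1

# invented data lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
b0, b1 = ols_single_regressor(x, y)
print(b0, b1)   # 1.0 2.0
```

Because the invented data contain no error term, the fitted intercept and slope reproduce the true coefficients exactly.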

17.2 Fundamentals of Asymptotic Distribution Theory

Asymptotic distribution theory is the theory of the distribution of statistics (estimators, test statistics, and confidence intervals) when the sample size is large. Formally, this theory involves characterizing the behavior of the sampling distribution of a statistic along a sequence of ever-larger samples. The theory is asymptotic in the sense that it characterizes the behavior of the statistic in the limit as n → ∞.

Even though sample sizes are, of course, never infinite, asymptotic distribution theory plays a central role in econometrics and statistics for two reasons. First, if the number of observations used in an empirical application is large, then the asymptotic limit can provide a high-quality approximation to the finite-sample distribution. Second, asymptotic sampling distributions typically are much simpler, and thus easier to use in practice, than exact finite-sample distributions. Taken together, these two reasons mean that reliable and straightforward methods for statistical inference (tests using t-statistics and 95% confidence intervals calculated as ±1.96 standard errors) can be based on approximate sampling distributions derived from asymptotic theory.

The two cornerstones of asymptotic distribution theory are the law of large numbers and the central limit theorem, both introduced in Section 2.6. We begin this section by continuing the discussion of the law of large numbers and the central limit theorem, including a proof of the law of large numbers. We then introduce two more tools, Slutsky's theorem and the continuous mapping theorem, that extend the usefulness of the law of large numbers and the central limit theorem. As an illustration, these tools are then used to prove that the t-statistic based on Ȳ, testing the hypothesis E(Y) = μ_{Y,0}, has a standard normal distribution under the null hypothesis.

Convergence in Probability and the Law of Large Numbers

The concepts of convergence in probability and the law of large numbers were introduced in Section 2.6. Here we provide a precise mathematical definition of convergence in probability, followed by a statement and proof of the law of large numbers.

Consistency and convergence in probability. Let S_1, S_2, . . . , S_n, . . . be a sequence of random variables. For example, S_n could be the sample average Ȳ of a sample of n observations of the random variable Y. The sequence of random variables {S_n} is said to converge in probability to a limit, μ (that is, S_n →p μ), if the probability that S_n is within ±δ of μ tends to 1 as n → ∞, as long as the constant δ is positive. That is,

S_n →p μ if and only if Pr(|S_n − μ| ≥ δ) → 0  (17.4)

as n → ∞ for every δ > 0. If S_n →p μ, then S_n is said to be a consistent estimator of μ.

The law of large numbers. The law of large numbers says that, under certain conditions on Y_1, . . . , Y_n, the sample average Ȳ converges in probability to the population mean. Probability theorists have developed many versions of the law of large numbers, corresponding to various conditions on Y_1, . . . , Y_n. The version of the law of large numbers used in this book is that Y_1, . . . , Y_n are i.i.d. draws from


a distribution with finite variance. This law of large numbers (also stated in Key Concept 2.6) is

if Y_1, . . . , Y_n are i.i.d., E(Y_i) = μ_Y, and var(Y_i) < ∞, then Ȳ →p μ_Y.  (17.5)

The idea of the law of large numbers can be seen in Figure 2.8: As the sample size increases, the sampling distribution of Ȳ concentrates around the population mean, μ_Y. One feature of the sampling distribution is that the variance of Ȳ decreases as the sample size increases; another feature is that the probability that Ȳ falls outside ±δ of μ_Y vanishes as n increases. These two features of the sampling distribution are in fact linked, and the proof of the law of large numbers exploits this link.
Proof of the law of large numbers.  The link between the variance of $\bar{Y}$ and the probability that $\bar{Y}$ is within $\pm\delta$ of $\mu_Y$ is provided by Chebyshev's inequality, which is stated and proven in Appendix 17.2 [see Equation (17.42)]. Written in terms of $\bar{Y}$, Chebyshev's inequality is

$$\Pr(|\bar{Y} - \mu_Y| \ge \delta) \le \frac{\mathrm{var}(\bar{Y})}{\delta^2} \tag{17.6}$$

for any positive constant $\delta$. Because $Y_1, \dots, Y_n$ are i.i.d. with variance $\sigma_Y^2$, $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n$; thus, for any $\delta > 0$, $\mathrm{var}(\bar{Y})/\delta^2 = \sigma_Y^2/(\delta^2 n) \to 0$. It follows from Equation (17.6) that $\Pr(|\bar{Y} - \mu_Y| \ge \delta) \to 0$ for every $\delta > 0$, proving the law of large numbers.
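The proof can be checked numerically. The following sketch (Python with NumPy; not part of the original text, with all parameter values chosen only for illustration) compares the empirical probability that $\bar{Y}$ misses $\mu_Y$ by at least $\delta$ with the Chebyshev bound $\sigma_Y^2/(\delta^2 n)$ from Equation (17.6):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_Y, sigma2_Y, delta = 2.0, 4.0, 0.5
reps = 10_000

results = {}
for n in (10, 100, 1000):
    # reps independent samples of size n; one sample average per row
    Ybar = rng.normal(mu_Y, np.sqrt(sigma2_Y), size=(reps, n)).mean(axis=1)
    emp = np.mean(np.abs(Ybar - mu_Y) >= delta)   # empirical Pr(|Ybar - mu_Y| >= delta)
    bound = sigma2_Y / (delta**2 * n)             # Chebyshev bound var(Ybar)/delta^2
    results[n] = (emp, bound)
    print(n, emp, bound)
```

Both the empirical probability and the bound shrink toward zero as $n$ grows, and the empirical probability stays below the bound, as the inequality requires.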

Some examples.  Consistency is a fundamental concept in asymptotic distribution theory, so we present some examples of consistent and inconsistent estimators of the population mean, $\mu_Y$. Suppose that $Y_i$, $i = 1, \dots, n$, are i.i.d. with variance $\sigma_Y^2$ that is positive and finite. Consider the following three estimators of $\mu_Y$: (1) $m_a = Y_1$; (2) $m_b = \left(\frac{1-a}{1-a^n}\right)\sum_{i=1}^n a^{i-1}Y_i$, where $0 < a < 1$; and (3) $m_c = \bar{Y} + 1/n$. Are these estimators consistent?

The first estimator, $m_a$, is just the first observation, so $E(m_a) = E(Y_1) = \mu_Y$ and $m_a$ is unbiased. However, $m_a$ is not consistent: $\Pr(|m_a - \mu_Y| \ge \delta) = \Pr(|Y_1 - \mu_Y| \ge \delta)$, which must be positive for sufficiently small $\delta$ (because $\sigma_Y^2 > 0$), so $\Pr(|m_a - \mu_Y| \ge \delta)$ does not tend to zero as $n \to \infty$, so $m_a$ is not consistent. This inconsistency should not be surprising: Because $m_a$ uses the information in only one observation, its distribution cannot concentrate around $\mu_Y$ as the sample size increases.

17.2  Fundamentals of Asymptotic Distribution Theory

The second estimator, $m_b$, is unbiased but is not consistent. It is unbiased because

$$E(m_b) = \left(\frac{1-a}{1-a^n}\right)\sum_{i=1}^n a^{i-1}E(Y_i) = \left(\frac{1-a}{1-a^n}\right)\left(\frac{1-a^n}{1-a}\right)\mu_Y = \mu_Y,$$

since $\sum_{i=1}^n a^{i-1} = (1-a^n)/(1-a)$. The variance of $m_b$ is

$$\mathrm{var}(m_b) = \left(\frac{1-a}{1-a^n}\right)^2 \sum_{i=1}^n a^{2(i-1)}\sigma_Y^2 = \sigma_Y^2\,\frac{(1-a)^2(1-a^{2n})}{(1-a^n)^2(1-a^2)},$$

which has the limit $\mathrm{var}(m_b) \to \sigma_Y^2(1-a)/(1+a)$ as $n \to \infty$. Thus the variance of this estimator does not tend to zero, the distribution does not concentrate around $\mu_Y$, and the estimator, although unbiased, is not consistent. This is perhaps surprising, because all the observations enter this estimator. But most of the observations receive very small weight (the weight of the $i$th observation is proportional to $a^{i-1}$, which approaches zero as $i$ becomes large), and for this reason there is an insufficient amount of cancellation of sampling errors for the estimator to be consistent.
The th ird estimator, me. is biased but consistent. lt~ bia is I I n: (mJ =
E(Y + 1 I 11) = f.J-y + 1I n. But the bias tends to zero as the samrlc ~v~ increases
and m, is consistent: Pr(l me - P-> I ~ S) = Pr((Y + 1/ n - p., I > S) - Pr(l (Y JJ- y)+ l /n i~ 5).Now i(Y - p.y)+l/nlsi Y - p ) tl/n,so tf ( l' - /-l )) '-lnl
~ 8, it must be the case that Y - 1-L yl + I 11 ~/>;thus. Pr(l (l'- J.l.)) + 1111~ 8)
s Pr(IY- P-r l + l in~ S). But Pr(l Y - J.L>
I 11 ~ c5) = Pr(ll' - ,.,, _ lil I n) s u~ l[n(S - 11 n) 2J ---+ 0. where rhc ftnlllnllqualny follow~ 1rom Chebychevs inequality ~Equation ( 17.6), with 5 rerlaceu h) li - 1 In for 11 Ifill It rollows tha t m e is consistent. This example i ll u~tt ate., the g~;ncral point that a n
e'>timator can be biased in tinite samples bUl. if that h1as vanic;hec; ac; the sample
si1.~ gcb large, the estimator can still be coosi~teot ( L>.~;rcic;e 17 I0).
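A small simulation (an illustrative Python sketch, not from the text) makes the contrast among the three estimators concrete: the spread of $m_a$ and $m_b$ around $\mu_Y$ does not shrink as $n$ grows, while the spread of $m_c$ does:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_Y, sigma_Y, a = 5.0, 2.0, 0.8
reps = 2_000

def estimators(n):
    Y = rng.normal(mu_Y, sigma_Y, size=(reps, n))
    w = a ** np.arange(n)                    # weights proportional to a^(i-1)
    m_a = Y[:, 0]                            # first observation only
    m_b = (1 - a) / (1 - a**n) * (Y @ w)     # geometrically weighted average
    m_c = Y.mean(axis=1) + 1.0 / n           # sample mean plus bias 1/n
    return m_a, m_b, m_c

for n in (50, 5000):
    m_a, m_b, m_c = estimators(n)
    # Standard deviations around mu_Y: only m_c's shrinks with n
    print(n, m_a.std(), m_b.std(), m_c.std())
```

The spread of $m_a$ stays near $\sigma_Y = 2$ and that of $m_b$ near $\sigma_Y\sqrt{(1-a)/(1+a)} \approx 0.67$ at both sample sizes, while the spread of $m_c$ collapses toward zero.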

The Central Limit Theorem and Convergence in Distribution

If the distributions of a sequence of random variables converge to a limit as $n \to \infty$, then the sequence of random variables is said to converge in distribution. The central limit theorem says that, under general conditions, the standardized sample average converges in distribution to a normal random variable.


Convergence in distribution.  Let $F_1, F_2, \dots, F_n, \dots$ be a sequence of cumulative distribution functions corresponding to a sequence of random variables $S_1, S_2, \dots, S_n, \dots$. For example, $S_n$ might be the standardized sample average, $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$. Then the sequence of random variables $S_n$ is said to converge in distribution to $S$ (denoted $S_n \xrightarrow{d} S$) if the distribution functions $\{F_n\}$ converge to $F$, the distribution of $S$. That is,

$$S_n \xrightarrow{d} S \ \text{ if and only if } \ \lim_{n\to\infty} F_n(t) = F(t), \tag{17.7}$$

where the limit holds at all points $t$ at which the limiting distribution $F$ is continuous. The distribution $F$ is called the asymptotic distribution of $S_n$.

It is useful to contrast the concepts of convergence in probability ($\xrightarrow{p}$) and convergence in distribution ($\xrightarrow{d}$). If $S_n \xrightarrow{p} \mu$, then $S_n$ becomes close to $\mu$ with high probability as $n$ increases. In contrast, if $S_n \xrightarrow{d} S$, then the distribution of $S_n$ becomes close to the distribution of $S$ as $n$ increases.
The central limit theorem.  We now restate the central limit theorem using the concept of convergence in distribution. The central limit theorem in Key Concept 2.7 states that, if $Y_1, \dots, Y_n$ are i.i.d. and $0 < \sigma_Y^2 < \infty$, then the asymptotic distribution of $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$ is $N(0, 1)$. Because $\sigma_{\bar{Y}} = \sigma_Y/\sqrt{n}$, $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}} = \sqrt{n}(\bar{Y} - \mu_Y)/\sigma_Y$. Thus the central limit theorem can be restated as $\sqrt{n}(\bar{Y} - \mu_Y) \xrightarrow{d} \sigma_Y Z$, where $Z$ is a standard normal random variable. This means that the distribution of $\sqrt{n}(\bar{Y} - \mu_Y)$ converges to $N(0, \sigma_Y^2)$ as $n \to \infty$. Conventional shorthand for this limit is

$$\sqrt{n}(\bar{Y} - \mu_Y) \xrightarrow{d} N(0, \sigma_Y^2). \tag{17.8}$$

That is, if $Y_1, \dots, Y_n$ are i.i.d. and $0 < \sigma_Y^2 < \infty$, then the distribution of $\sqrt{n}(\bar{Y} - \mu_Y)$ converges to a normal distribution with mean zero and variance $\sigma_Y^2$.
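As a numerical illustration (assuming Python with NumPy; the exponential distribution and the sample sizes are arbitrary choices, not from the text), the scaled and centered sample mean of skewed i.i.d. draws behaves like a $N(0, \sigma_Y^2)$ random variable:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 1_000, 10_000
mu_Y, sigma_Y = 1.0, 1.0            # exponential(1) has mean 1 and variance 1

Y = rng.exponential(mu_Y, size=(reps, n))        # skewed, non-normal draws
S = np.sqrt(n) * (Y.mean(axis=1) - mu_Y)         # sqrt(n)(Ybar - mu_Y)

# S should behave like N(0, sigma_Y^2): check its mean, variance, and one quantile
print(S.mean(), S.var(), np.mean(S <= 1.96))
```

Despite the strong skewness of the underlying draws, the simulated mean is near 0, the variance near $\sigma_Y^2 = 1$, and $\Pr(S \le 1.96)$ near the normal value 0.975.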

Extensions to time series data.  The law of large numbers and central limit theorem stated in Section 2.6 apply to i.i.d. observations. As discussed in Chapter 14, the i.i.d. assumption is inappropriate for time series data, and these theorems need to be extended before they can be applied to time series observations. These extensions are technical in nature, in the sense that the conclusion is the same (versions of the law of large numbers and the central limit theorem apply to time series data), but the conditions under which they apply are different. This is discussed briefly in Section 16.4, but a mathematical treatment of asymptotic distribution theory for time series variables is beyond the scope of this book, and interested readers are referred to Hayashi (2000).


Slutsky's Theorem and the Continuous Mapping Theorem

Slutsky's theorem combines consistency and convergence in distribution. Suppose that $a_n \xrightarrow{p} a$, where $a$ is a constant, and $S_n \xrightarrow{d} S$. Then

$$a_n + S_n \xrightarrow{d} a + S, \quad a_n S_n \xrightarrow{d} aS, \quad \text{and, if } a \ne 0, \; S_n/a_n \xrightarrow{d} S/a. \tag{17.9}$$

These three results are together called Slutsky's theorem.
The continuous mapping theorem concerns the asymptotic properties of a continuous function, $g$, of a sequence of random variables, $S_n$. The theorem has two parts. The first is that if $S_n$ converges in probability to the constant $a$, then $g(S_n)$ converges in probability to $g(a)$; the second is that if $S_n$ converges in distribution to $S$, then $g(S_n)$ converges in distribution to $g(S)$. That is, if $g$ is a continuous function, then

(i) if $S_n \xrightarrow{p} a$, then $g(S_n) \xrightarrow{p} g(a)$; and
(ii) if $S_n \xrightarrow{d} S$, then $g(S_n) \xrightarrow{d} g(S)$.  (17.10)

As an example of (i), if $s_Y^2 \xrightarrow{p} \sigma_Y^2$, then $s_Y = \sqrt{s_Y^2} \xrightarrow{p} \sigma_Y$. As an example of (ii), suppose that $S_n \xrightarrow{d} Z$, where $Z$ is a standard normal random variable, and let $g(S_n) = S_n^2$. Because $g$ is continuous, the continuous mapping theorem applies and $g(S_n) \xrightarrow{d} g(Z)$; that is, $S_n^2 \xrightarrow{d} Z^2$. In other words, the distribution of $S_n^2$ converges to the distribution of a squared standard normal random variable, which in turn has a $\chi_1^2$ distribution; that is, $S_n^2 \xrightarrow{d} \chi_1^2$.
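A quick check of result (ii) (an illustrative Python sketch, not part of the text): squaring a standardized sample mean yields draws whose distribution is close to $\chi_1^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 10_000
# Standardized sample mean of i.i.d. Uniform(0,1) draws (mean 1/2, variance 1/12)
Y = rng.uniform(0, 1, size=(reps, n))
S = np.sqrt(n) * (Y.mean(axis=1) - 0.5) / np.sqrt(1 / 12)

# By the continuous mapping theorem, S^2 is approximately chi-squared(1):
# Pr(S^2 <= 3.841) should be close to 0.95 (3.841 = 1.96^2)
print(np.mean(S**2 <= 3.841))
```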

Application to the t-Statistic Based on the Sample Mean

We now use the central limit theorem, the law of large numbers, and Slutsky's theorem to prove that, under the null hypothesis, the t-statistic based on $\bar{Y}$ has a standard normal distribution when $Y_1, \dots, Y_n$ are i.i.d. and $0 < E(Y_i^4) < \infty$.

The t-statistic for testing the null hypothesis that $E(Y_i) = \mu_0$, based on the sample average $\bar{Y}$, is given in Equations (3.8) and (3.11) and can be written

$$t = \frac{\bar{Y} - \mu_0}{s_Y/\sqrt{n}} = \frac{\sqrt{n}(\bar{Y} - \mu_0)/\sigma_Y}{s_Y/\sigma_Y}, \tag{17.11}$$

where the second equality uses the trick of dividing both the numerator and the denominator by $\sigma_Y$.

Because $Y_1, \dots, Y_n$ have two moments (which is implied by their having four moments; see Exercise 17.5) and because $Y_1, \dots, Y_n$ are i.i.d., the numerator after the final equality in Equation (17.11) obeys the central limit theorem: Under the null hypothesis, $\sqrt{n}(\bar{Y} - \mu_0)/\sigma_Y \xrightarrow{d} N(0, 1)$. In addition, $s_Y^2 \xrightarrow{p} \sigma_Y^2$ (this is proven in Appendix 3.3), so $s_Y/\sigma_Y \xrightarrow{p} 1$ and the ratio in the second term in Equation (17.11) tends to 1 (Exercise 17.4). Thus the expression after the final equality in Equation (17.11) has the form of the final expression in Equation (17.9), where, in the notation of Equation (17.9), $S_n = \sqrt{n}(\bar{Y} - \mu_0)/\sigma_Y \xrightarrow{d} N(0, 1)$ and $a_n = s_Y/\sigma_Y \xrightarrow{p} 1$. It follows by applying Slutsky's theorem that $t \xrightarrow{d} N(0, 1)$.
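The following sketch (Python; all distributional and parameter choices are illustrative assumptions, not from the text) verifies the conclusion by simulation: even with skewed errors, the t-statistic based on $\bar{Y}$ rejects a true null at close to the nominal 5% rate in large samples:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, mu_0 = 1_000, 5_000, 1.0
# Skewed, non-normal i.i.d. data whose mean equals the null value mu_0
Y = rng.exponential(mu_0, size=(reps, n))
t = (Y.mean(axis=1) - mu_0) / (Y.std(axis=1, ddof=1) / np.sqrt(n))

# Under the null, t is approximately N(0, 1): a two-sided 5% test
# rejects close to 5% of the time
rate = np.mean(np.abs(t) > 1.96)
print(rate)
```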

17.3 Asymptotic Distribution of the OLS Estimator and t-Statistic

Recall from Chapter 4 that, under the assumptions of Key Concept 4.3 (the first three assumptions of Key Concept 17.1), the OLS estimator $\hat{\beta}_1$ is consistent and $\sqrt{n}(\hat{\beta}_1 - \beta_1)$ has an asymptotic normal distribution. Moreover, the t-statistic testing the null hypothesis $\beta_1 = \beta_{1,0}$ has an asymptotic standard normal distribution under the null hypothesis. This section summarizes these results and provides additional details of their proofs.

Consistency and Asymptotic Normality of the OLS Estimators

The large-sample distribution of $\hat{\beta}_1$, originally stated in Key Concept 4.4, is

$$\sqrt{n}(\hat{\beta}_1 - \beta_1) \xrightarrow{d} N\!\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right), \tag{17.12}$$

where $v_i = (X_i - \mu_X)u_i$. The proof of this result was sketched in Appendix 4.3, but that proof omitted some details and involved an approximation that was not formally shown. The missing steps in that proof are left as Exercise 17.3.

An implication of Equation (17.12) is that $\hat{\beta}_1$ is consistent (Exercise 17.4).

Consistency of Heteroskedasticity-Robust Standard Errors

Under the first three least squares assumptions, the heteroskedasticity-robust standard error for $\hat{\beta}_1$ forms the basis for valid statistical inferences. Specifically,

$$\hat{\sigma}^2_{\hat{\beta}_1}\big/\sigma^2_{\hat{\beta}_1} \xrightarrow{p} 1, \tag{17.13}$$

where $\hat{\sigma}_{\hat{\beta}_1}$ is the heteroskedasticity-robust standard error defined in Equation (5.4); that is,

$$\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n} \times \frac{\frac{1}{n-2}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2}{\left[\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\right]^2}. \tag{17.14}$$

To show the result in Equation (17.13), first use the definitions of $\sigma^2_{\hat{\beta}_1}$ and $\hat{\sigma}^2_{\hat{\beta}_1}$ to rewrite the ratio in Equation (17.13) as

$$\frac{\hat{\sigma}^2_{\hat{\beta}_1}}{\sigma^2_{\hat{\beta}_1}} = \left(\frac{n}{n-2}\right)\left[\frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2}{\mathrm{var}(v_i)}\right]\left[\frac{\mathrm{var}(X_i)}{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2}\right]^2. \tag{17.15}$$

We want to show that each of the three terms on the right-hand side of Equation (17.15) converges in probability to 1. Clearly the first term converges to 1, and by the consistency of the sample variance (Appendix 3.3) the final term converges in probability to 1. Thus all that remains is to show that the second term converges in probability to 1, that is, that $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2 \xrightarrow{p} \mathrm{var}(v_i)$.

The proof that $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2 \xrightarrow{p} \mathrm{var}(v_i)$ proceeds in two steps. The first shows that $\frac{1}{n}\sum_{i=1}^n v_i^2 \xrightarrow{p} \mathrm{var}(v_i)$; the second shows that $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2 - \frac{1}{n}\sum_{i=1}^n v_i^2 \xrightarrow{p} 0$.

To simplify the moment conditions, suppose that $X_i$ and $u_i$ have eight moments [that is, $E(X_i^8) < \infty$ and $E(u_i^8) < \infty$], a stronger assumption than the four moments required by the third least squares assumption. To show the first step, we must show that $\frac{1}{n}\sum_{i=1}^n v_i^2$ obeys the law of large numbers in Equation (17.5). To do so, $v_i^2$ must be i.i.d. (which it is, by the second least squares assumption) and $\mathrm{var}(v_i^2)$ must be finite. To show that $\mathrm{var}(v_i^2) < \infty$, apply the Cauchy-Schwarz inequality (Appendix 17.2): $\mathrm{var}(v_i^2) \le E(v_i^4) = E[(X_i - \mu_X)^4 u_i^4] \le \{E[(X_i - \mu_X)^8]\}^{1/2}\{E(u_i^8)\}^{1/2}$. Thus, if $X_i$ and $u_i$ have eight moments, then $v_i^2$ has a finite variance and thus satisfies the law of large numbers in Equation (17.5).

The second step is to prove that, because $v_i = (X_i - \mu_X)u_i$,

$$\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2 - \frac{1}{n}\sum_{i=1}^n v_i^2 \xrightarrow{p} 0. \tag{17.16}$$

Showing this result entails setting $\hat{u}_i = u_i - (\hat{\beta}_0 - \beta_0) - (\hat{\beta}_1 - \beta_1)X_i$, expanding the term in Equation (17.16), repeatedly applying the Cauchy-Schwarz inequality, and using the consistency of $\hat{\beta}_0$ and $\hat{\beta}_1$. The details of the algebra are left as Exercise 17.9.


The preceding argument supposes that $X_i$ and $u_i$ have eight moments. This is not necessary, however, and the result $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2\hat{u}_i^2 \xrightarrow{p} \mathrm{var}(v_i)$ can be proven under the weaker assumption that $X_i$ and $u_i$ have four moments, as stated in the third least squares assumption. That proof, however, is beyond the scope of this textbook; see Hayashi (2000, Section 2.5) for details.

Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic

We now show that, under the null hypothesis, the heteroskedasticity-robust OLS t-statistic testing the hypothesis $\beta_1 = \beta_{1,0}$ has an asymptotic standard normal distribution if least squares assumptions #1, #2, and #3 hold.

The t-statistic constructed using the heteroskedasticity-robust standard error $SE(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}$ [defined in Equation (17.14)] is

$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{\hat{\sigma}_{\hat{\beta}_1}} = \left[\frac{\hat{\beta}_1 - \beta_{1,0}}{\sigma_{\hat{\beta}_1}}\right] \times \frac{\sigma_{\hat{\beta}_1}}{\hat{\sigma}_{\hat{\beta}_1}}. \tag{17.17}$$

It follows from Equation (17.12) that the term in brackets after the second equality in Equation (17.17) converges in distribution to a standard normal random variable. In addition, because the heteroskedasticity-robust standard error is consistent [Equation (17.13)], $\sigma_{\hat{\beta}_1}/\hat{\sigma}_{\hat{\beta}_1} \xrightarrow{p} 1$ (Exercise 17.4). It follows from Slutsky's theorem that $t \xrightarrow{d} N(0, 1)$.
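This result, too, can be checked by simulation. The sketch below (Python; the heteroskedasticity pattern and all parameter values are arbitrary choices for illustration) computes the robust t-statistic, with the variance estimator constructed as in Equation (17.14), in repeated samples and records the rejection rate of a true null:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta0, beta1 = 500, 5_000, 1.0, 2.0
rej = 0
for _ in range(reps):
    X = rng.uniform(0, 2, n)
    u = rng.normal(0, 1 + X)                 # heteroskedastic: error sd grows with X
    Y = beta0 + beta1 * X + u
    Xd = X - X.mean()
    b1 = (Xd @ Y) / (Xd @ Xd)                # OLS slope
    uhat = Y - Y.mean() - b1 * Xd            # OLS residuals
    # Heteroskedasticity-robust variance, following Equation (17.14)
    se2 = (n / (n - 2)) * (Xd**2 @ uhat**2) / (Xd @ Xd) ** 2
    rej += abs((b1 - beta1) / np.sqrt(se2)) > 1.96
rate = rej / reps
print(rate)
```

Despite the strongly heteroskedastic errors, the rejection rate is close to the nominal 5%.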

17.4 Exact Sampling Distributions When the Errors Are Normally Distributed

In small samples, the distribution of the OLS estimator and t-statistic depends on the distribution of the regression error and typically is complicated. As discussed in Section 5.6, however, if the regression errors are homoskedastic and normally distributed, then these distributions are simple. Specifically, if all five extended least squares assumptions in Key Concept 17.1 hold, then the OLS estimator has a normal sampling distribution, conditional on $X_1, \dots, X_n$. Moreover, the t-statistic has a Student t distribution. We present these results here for $\hat{\beta}_1$.

Distribution of $\hat{\beta}_1$ with Normal Errors

If the errors are i.i.d. normally distributed and independent of the regressors, then the distribution of $\hat{\beta}_1$, conditional on $X_1, \dots, X_n$, is $N(\beta_1, \sigma^2_{\hat{\beta}_1|X})$, where

$$\sigma^2_{\hat{\beta}_1|X} = \frac{\sigma_u^2}{\sum_{i=1}^n (X_i - \bar{X})^2}. \tag{17.18}$$

The derivation of the normal distribution $N(\beta_1, \sigma^2_{\hat{\beta}_1|X})$, conditional on $X_1, \dots, X_n$, entails (i) establishing that the distribution is normal, (ii) showing that $E(\hat{\beta}_1|X_1, \dots, X_n) = \beta_1$, and (iii) verifying Equation (17.18).

To show (i), note that, conditional on $X_1, \dots, X_n$, $\hat{\beta}_1 - \beta_1$ is a weighted average of $u_1, \dots, u_n$:

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})u_i}{\sum_{i=1}^n (X_i - \bar{X})^2}. \tag{17.19}$$

[This equation was derived in Appendix 4.3, Equation (4.30), and is restated here for convenience.] By extended least squares assumptions #1, #2, #4, and #5, $u_i$ is i.i.d. $N(0, \sigma_u^2)$, and $u_i$ and $X_i$ are independently distributed. Because weighted averages of normally distributed variables are themselves normally distributed, it follows that $\hat{\beta}_1$ is normally distributed, conditional on $X_1, \dots, X_n$.

To show (ii), take conditional expectations of both sides of Equation (17.19): $E[(\hat{\beta}_1 - \beta_1)|X_1, \dots, X_n] = E\big[\sum_{i=1}^n (X_i - \bar{X})u_i\big/\sum_{i=1}^n (X_i - \bar{X})^2\,\big|\,X_1, \dots, X_n\big] = \sum_{i=1}^n (X_i - \bar{X})E(u_i|X_1, \dots, X_n)\big/\sum_{i=1}^n (X_i - \bar{X})^2 = 0$, where the final equality follows because $E(u_i|X_1, \dots, X_n) = E(u_i|X_i) = 0$. Thus $\hat{\beta}_1$ is conditionally unbiased; that is,

$$E(\hat{\beta}_1|X_1, \dots, X_n) = \beta_1. \tag{17.20}$$

To show (iii), use the fact that the errors are independently distributed, conditional on $X_1, \dots, X_n$, to calculate the conditional variance of $\hat{\beta}_1$ using Equation (17.19):

$$\mathrm{var}(\hat{\beta}_1|X_1, \dots, X_n) = \frac{\sum_{i=1}^n (X_i - \bar{X})^2\,\mathrm{var}(u_i|X_1, \dots, X_n)}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2\sigma_u^2}{\left[\sum_{i=1}^n (X_i - \bar{X})^2\right]^2}. \tag{17.21}$$

Canceling the term in the numerator in the final expression in Equation (17.21) yields the formula for the conditional variance in Equation (17.18).

Distribution of the Homoskedasticity-Only t-Statistic

The homoskedasticity-only t-statistic testing the null hypothesis $\beta_1 = \beta_{1,0}$ is

$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{\widetilde{SE}(\hat{\beta}_1)}, \tag{17.22}$$

where $\widetilde{SE}(\hat{\beta}_1)$ is computed using the homoskedasticity-only standard error of $\hat{\beta}_1$. Substituting the formula for $\widetilde{SE}(\hat{\beta}_1)$ (given in Appendix 5.1) into Equation (17.22) and rearranging yields

$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{s_{\hat{u}}^2 \big/ \sum_{i=1}^n (X_i - \bar{X})^2}} = \frac{(\hat{\beta}_1 - \beta_{1,0})/\sigma_{\hat{\beta}_1|X}}{\sqrt{W/(n-2)}}, \tag{17.23}$$

where $s_{\hat{u}}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2$ and $W = \sum_{i=1}^n \hat{u}_i^2/\sigma_u^2$. Under the null hypothesis, $\hat{\beta}_1$ has a $N(\beta_1, \sigma^2_{\hat{\beta}_1|X})$ distribution conditional on $X_1, \dots, X_n$, so the distribution of the numerator in the final expression in Equation (17.23) is $N(0, 1)$. It is shown in Section 18.4 that $W$ has a chi-squared distribution with $n - 2$ degrees of freedom and, moreover, that $W$ is distributed independently of the standardized OLS estimator in the numerator of Equation (17.23). It follows from the definition of the Student t distribution (Appendix 17.1) that, under the five extended least squares assumptions, the homoskedasticity-only t-statistic has a Student t distribution with $n - 2$ degrees of freedom.

Where does the degrees-of-freedom adjustment fit in?  The degrees-of-freedom adjustment in $s_{\hat{u}}^2$ ensures that $s_{\hat{u}}^2$ is an unbiased estimator of $\sigma_u^2$ and that the t-statistic has a Student t distribution when the errors are normally distributed. Because $W = \sum_{i=1}^n \hat{u}_i^2/\sigma_u^2$ is a chi-squared random variable with $n - 2$ degrees of freedom, its mean is $E(W) = n - 2$. Thus $E[W/(n-2)] = (n-2)/(n-2) = 1$. Rearranging the definition of $W$, we have that $E\left(\frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2\right) = \sigma_u^2$; that is, the degrees-of-freedom correction makes $s_{\hat{u}}^2$ an unbiased estimator of $\sigma_u^2$. Also, by dividing by $n - 2$ rather than $n$, the term in the denominator of the final expression of Equation (17.23) matches the definition of a random variable with a Student t distribution given in Appendix 17.1. That is why, using the degrees-of-freedom adjustment to calculate the standard error, the t-statistic has the Student t distribution when the errors are normally distributed.
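As an illustration of this exact result (a Python sketch, not from the text; the design and parameter values are arbitrary), with homoskedastic normal errors and a small sample the simulated t-statistics match the $t_{n-2}$ critical value rather than the normal one:

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta0, beta1 = 10, 10_000, 1.0, 2.0
X = rng.uniform(0, 1, n)                 # regressors held fixed across replications
Xd = X - X.mean()
tstats = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, 1, n)              # homoskedastic normal errors
    Y = beta0 + beta1 * X + u
    b1 = (Xd @ Y) / (Xd @ Xd)
    uhat = Y - Y.mean() - b1 * Xd
    s2 = (uhat @ uhat) / (n - 2)         # degrees-of-freedom-adjusted variance
    tstats[r] = (b1 - beta1) / np.sqrt(s2 / (Xd @ Xd))

# With n = 10 the t-statistic is Student t with 8 df: the 2.306 critical value
# gives a 5% two-sided rejection rate, while the normal value 1.96 over-rejects
r1 = np.mean(np.abs(tstats) > 2.306)
r2 = np.mean(np.abs(tstats) > 1.96)
print(r1, r2)
```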

17.5 Weighted Least Squares

Under the first four extended least squares assumptions, the OLS estimator is efficient among the class of linear (in $Y_1, \dots, Y_n$), conditionally (on $X_1, \dots, X_n$) unbiased estimators; that is, the OLS estimator is BLUE. This result is the Gauss-Markov theorem, which was discussed in Section 5.5 and proven in Appendix 5.2. The Gauss-Markov theorem provides a theoretical justification for using the OLS estimator. A major limitation of the Gauss-Markov theorem is that it requires homoskedastic errors. If, as is often encountered in practice, the errors are heteroskedastic, the Gauss-Markov theorem does not hold and the OLS estimator is not BLUE.

This section presents a modification of the OLS estimator, called weighted least squares (WLS), which is more efficient than OLS when the errors are heteroskedastic.

WLS requires knowing quite a bit about the conditional variance function, $\mathrm{var}(u_i|X_i)$. We consider two cases. In the first case, $\mathrm{var}(u_i|X_i)$ is known up to a factor of proportionality, and WLS is BLUE. In the second case, the functional form of $\mathrm{var}(u_i|X_i)$ is known, but this functional form has some unknown parameters that must be estimated. Under some additional conditions, the asymptotic distribution of WLS in the second case is the same as if the parameters of the conditional variance function were in fact known, and in this sense the WLS estimator is asymptotically BLUE. The section concludes with a discussion of the practical advantages and disadvantages of handling heteroskedasticity using WLS or, alternatively, heteroskedasticity-robust standard errors.

WLS with Known Heteroskedasticity

Suppose that the conditional variance $\mathrm{var}(u_i|X_i)$ is known up to a factor of proportionality; that is,

$$\mathrm{var}(u_i|X_i) = \lambda h(X_i), \tag{17.24}$$

where $\lambda$ is a constant and $h$ is a known function. In this case, the WLS estimator is the estimator obtained by first dividing the dependent variable and regressor by the square root of $h$, then regressing this modified dependent variable on the modified regressor using OLS. Specifically, divide both sides of the single-variable regression model by $\sqrt{h(X_i)}$ to obtain

$$\tilde{Y}_i = \beta_0\tilde{X}_{0i} + \beta_1\tilde{X}_{1i} + \tilde{u}_i, \tag{17.25}$$

where $\tilde{Y}_i = Y_i/\sqrt{h(X_i)}$, $\tilde{X}_{0i} = 1/\sqrt{h(X_i)}$, $\tilde{X}_{1i} = X_i/\sqrt{h(X_i)}$, and $\tilde{u}_i = u_i/\sqrt{h(X_i)}$.

The WLS estimator is the OLS estimator of $\beta_1$ in Equation (17.25); that is, it is the estimator obtained by the OLS regression of $\tilde{Y}_i$ on $\tilde{X}_{0i}$ and $\tilde{X}_{1i}$, where the coefficient on $\tilde{X}_{0i}$ takes the place of the intercept in the unweighted regression.

Under the first three least squares assumptions in Key Concept 17.1 plus the known-heteroskedasticity assumption in Equation (17.24), WLS is BLUE. The reason that the WLS estimator is BLUE is that weighting the variables has made the error term $\tilde{u}_i$ in the weighted regression homoskedastic. That is,

$$\mathrm{var}(\tilde{u}_i|X_i) = \mathrm{var}\!\left(\frac{u_i}{\sqrt{h(X_i)}}\,\bigg|\,X_i\right) = \frac{\mathrm{var}(u_i|X_i)}{h(X_i)} = \frac{\lambda h(X_i)}{h(X_i)} = \lambda, \tag{17.26}$$

so the conditional variance of $\tilde{u}_i$ is constant. Thus the first four least squares assumptions apply to Equation (17.25). Strictly speaking, the Gauss-Markov theorem was proven in Appendix 5.2 for Equation (17.1), which includes the intercept $\beta_0$, so it does not apply to Equation (17.25), in which the intercept is replaced by $\beta_0\tilde{X}_{0i}$. However, the extension of the Gauss-Markov theorem for multiple regression (Section 18.5) does apply to estimation of $\beta_1$ in the weighted population regression, Equation (17.25). Accordingly, the OLS estimator of $\beta_1$ in Equation (17.25), that is, the WLS estimator of $\beta_1$, is BLUE.

In practice, the function $h$ typically is unknown, so neither the weighted variables in Equation (17.25) nor the WLS estimator can be computed. For this reason, the WLS estimator described here is sometimes called the infeasible WLS estimator. To implement WLS in practice, the function $h$ must be estimated, the topic to which we now turn.
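To make the weighting concrete, the following sketch (Python; the choice $h(X) = X^2$, the value of $\lambda$, and all other parameters are illustrative assumptions, not from the text) implements infeasible WLS with a known $h$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta0, beta1 = 2_000, 1.0, 2.0
X = rng.uniform(1, 3, n)
h = X**2                                     # assumed known: var(u|X) = lambda * X^2
u = rng.normal(0, np.sqrt(0.5 * h))          # lambda = 0.5
Y = beta0 + beta1 * X + u

# Divide Y, the constant, and X by sqrt(h(X)), then run OLS on the weighted data;
# the coefficient on the weighted constant plays the role of the intercept
w = 1 / np.sqrt(h)
Zt = np.column_stack([w, X * w])             # [X0-tilde, X1-tilde]
beta_wls = np.linalg.lstsq(Zt, Y * w, rcond=None)[0]
print(beta_wls)
```

The estimates land close to the true values $(\beta_0, \beta_1) = (1, 2)$, and the weighted errors are homoskedastic by construction, as in Equation (17.26).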

WLS with Heteroskedasticity of Known Functional Form

If the heteroskedasticity has a known functional form, then the heteroskedasticity function $h$ can be estimated and the WLS estimator can be calculated using this estimated function.

Example #1: The variance of u is quadratic in X.  Suppose that the conditional variance is known to be the quadratic function

$$\mathrm{var}(u_i|X_i) = \theta_0 + \theta_1 X_i^2, \tag{17.27}$$

where $\theta_0$ and $\theta_1$ are unknown parameters, $\theta_0 > 0$, and $\theta_1 \ge 0$.

Because $\theta_0$ and $\theta_1$ are unknown, it is not possible to construct the weighted variables $\tilde{Y}_i$, $\tilde{X}_{0i}$, and $\tilde{X}_{1i}$. It is, however, possible to estimate $\theta_0$ and $\theta_1$ and to use these estimates to compute estimates of $\mathrm{var}(u_i|X_i)$. Let $\hat{\theta}_0$ and $\hat{\theta}_1$ be estimators of $\theta_0$ and $\theta_1$, and let $\widehat{\mathrm{var}}(u_i|X_i) = \hat{\theta}_0 + \hat{\theta}_1 X_i^2$. Define the weighted regression variables $\hat{Y}_i = Y_i/\sqrt{\widehat{\mathrm{var}}(u_i|X_i)}$, $\hat{X}_{0i} = 1/\sqrt{\widehat{\mathrm{var}}(u_i|X_i)}$, and $\hat{X}_{1i} = X_i/\sqrt{\widehat{\mathrm{var}}(u_i|X_i)}$. The WLS estimator is the OLS estimator of the coefficients in the regression of $\hat{Y}_i$ on $\hat{X}_{0i}$ and $\hat{X}_{1i}$ (where $\beta_0\hat{X}_{0i}$ takes the place of the intercept $\beta_0$).

Implementation of this estimator requires estimating the conditional variance function, that is, estimating $\theta_0$ and $\theta_1$ in Equation (17.27). One way to estimate $\theta_0$ and $\theta_1$ consistently is to regress $\hat{u}_i^2$ on $X_i^2$ using OLS, where $\hat{u}_i^2$ is the square of the $i$th OLS residual.

Suppose that the conditional variance has the form in Equation (17.27) and that $\hat{\theta}_0$ and $\hat{\theta}_1$ are consistent estimators of $\theta_0$ and $\theta_1$. Under assumptions #1 through #3 of Key Concept 17.1, plus additional moment conditions that arise because $\theta_0$ and $\theta_1$ are estimated, the asymptotic distribution of the WLS estimator is the same as if $\theta_0$ and $\theta_1$ were known. Thus the WLS estimator with $\theta_0$ and $\theta_1$ estimated has the same asymptotic distribution as the infeasible WLS estimator and is in this sense asymptotically BLUE.

Because this method of WLS can be implemented by estimating unknown parameters of the conditional variance function, this method is sometimes called feasible WLS or estimated WLS.

Example #2: The variance depends on a third variable.  WLS also can be used when the conditional variance depends on a third variable, $W_i$, which does not appear in the regression function. Specifically, suppose that data are collected on three variables, $Y_i$, $X_i$, and $W_i$, $i = 1, \dots, n$; the population regression function depends on $X_i$ but not $W_i$; and the conditional variance depends on $W_i$ but not $X_i$. That is, the population regression function is $E(Y_i|X_i, W_i) = \beta_0 + \beta_1 X_i$, and the conditional variance is $\mathrm{var}(u_i|X_i, W_i) = \lambda h(W_i)$, where $\lambda$ is a constant and $h$ is a function that must be estimated.

For example, suppose that a researcher is interested in modeling the relationship between the unemployment rate in a state ($Y_i^*$) and a state economic policy variable ($X_i$). The measured unemployment rate ($Y_i$), however, is a survey-based estimate of the true unemployment rate ($Y_i^*$); thus $Y_i$ measures $Y_i^*$ with error, where the source of the error is random survey error. So $Y_i = Y_i^* + v_i$, where $v_i$ is the measurement error arising from the survey. In this example it is plausible that the survey sample size, $W_i$, is not itself a determinant of the true state unemployment rate; thus the population regression function does not depend on $W_i$, that is, $E(Y_i|X_i, W_i) = \beta_0 + \beta_1 X_i$. We therefore have the two equations

$$Y_i^* = \beta_0 + \beta_1 X_i + u_i, \text{ and} \tag{17.28}$$

$$Y_i = Y_i^* + v_i, \tag{17.29}$$

where Equation (17.28) models the relationship between the state economic policy variable and the true state unemployment rate, and Equation (17.29) represents the relationship between the measured unemployment rate $Y_i$ and the true unemployment rate $Y_i^*$.
The model in Equations (17.28) and (17.29) can lead to a population regression in which the conditional variance of the error depends on $W_i$ but not on $X_i$. The error term $u_i$ in Equation (17.28) represents other factors omitted from this regression, while the error term $v_i$ in Equation (17.29) represents measurement error arising from the unemployment rate survey. If $u_i$ is homoskedastic, then $\mathrm{var}(u_i|X_i, W_i) = \sigma_u^2$ is constant. The survey error variance, however, depends inversely on the survey sample size $W_i$; that is, $\mathrm{var}(v_i|X_i, W_i) = a/W_i$, where $a$ is a constant. Because $v_i$ is random survey error, it is safely assumed to be uncorrelated with $u_i$, so $\mathrm{var}(u_i + v_i|X_i, W_i) = \sigma_u^2 + a/W_i$. Thus, substituting Equation (17.28) into Equation (17.29) leads to the regression model with heteroskedasticity

$$Y_i = \beta_0 + \beta_1 X_i + \bar{u}_i, \tag{17.30}$$

$$\mathrm{var}(\bar{u}_i|X_i, W_i) = \theta_0 + \theta_1(1/W_i), \tag{17.31}$$

where $\bar{u}_i = u_i + v_i$, $\theta_0 = \sigma_u^2$, $\theta_1 = a$, and $E(\bar{u}_i|X_i, W_i) = 0$.

If $\theta_0$ and $\theta_1$ were known, then the conditional variance function in Equation (17.31) could be used to estimate $\beta_0$ and $\beta_1$ by WLS. In this example, $\theta_0$ and $\theta_1$ are unknown, but they can be estimated by regressing the squared OLS residual [from OLS estimation of Equation (17.30)] on $1/W_i$. Then the estimated conditional variance function can be used to construct the weights in feasible WLS.

It should be stressed that it is critical that $E(\bar{u}_i|X_i, W_i) = 0$; if not, the weighted errors will have nonzero conditional mean and WLS will be inconsistent. Said differently, if $W_i$ is in fact a determinant of $Y_i$, then Equation (17.30) should be a multiple regression equation that includes both $X_i$ and $W_i$.

General method of feasible WLS.  In general, feasible WLS proceeds in five steps:

1. Regress $Y_i$ on $X_i$ by OLS, and obtain the OLS residuals $\hat{u}_i$, $i = 1, \dots, n$.

2. Estimate a model of the conditional variance function $\mathrm{var}(u_i|X_i)$. For example, if the conditional variance function has the form in Equation (17.27), this entails regressing $\hat{u}_i^2$ on $X_i^2$. In general, this step entails estimating a function for the conditional variance, $\mathrm{var}(u_i|X_i)$.

3. Use the estimated function to compute predicted values of the conditional variance function, $\widehat{\mathrm{var}}(u_i|X_i)$.

4. Weight the dependent variable and regressor (including the intercept) by the inverse of the square root of the estimated conditional variance function.

5. Estimate the coefficients of the weighted regression by OLS; the resulting estimators are the WLS estimators.

Regression software packages typically include optional weighted least squares commands that automate the fourth and fifth of these steps.
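The five steps can be sketched in code as follows (Python; the quadratic variance function of Equation (17.27) and all parameter values are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta0, beta1 = 2_000, 1.0, 2.0
theta0, theta1 = 1.0, 0.5                    # true variance: theta0 + theta1 * X^2
X = rng.uniform(0, 3, n)
Y = beta0 + beta1 * X + rng.normal(0, np.sqrt(theta0 + theta1 * X**2))
Z = np.column_stack([np.ones(n), X])

# Step 1: OLS and residuals
b_ols = np.linalg.lstsq(Z, Y, rcond=None)[0]
uhat = Y - Z @ b_ols

# Step 2: estimate the variance model by regressing uhat^2 on X^2
Q = np.column_stack([np.ones(n), X**2])
theta_hat = np.linalg.lstsq(Q, uhat**2, rcond=None)[0]

# Step 3: predicted conditional variances (kept strictly positive)
var_hat = np.clip(Q @ theta_hat, 1e-6, None)

# Steps 4 and 5: weight every variable by 1/sqrt(var_hat), then rerun OLS
w = 1 / np.sqrt(var_hat)
b_wls = np.linalg.lstsq(Z * w[:, None], Y * w, rcond=None)[0]
print(b_ols, b_wls)
```

Both estimators are close to $(\beta_0, \beta_1) = (1, 2)$ here; the gain from feasible WLS is a smaller sampling variance, not a different target.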

Heteroskedasticity-Robust
Standard Errors or WLS?
There are two ways to handle hctcroskedasticity: estim ating f3n and {3 1 by WLS, or
estimating {30 and {3 1 by 0 LS and using heteroskedasticity-robust standard errors.
Deciding which approach to use in practice requires weighing the advantages and
disadvantages of each.
The advantage ofWLS is that il i~:~ more efficient than the OLS estimator of the
coefficients on the original regressors, at least asymptotically. The disadvantage of WLS is that it requires knowing the conditional variance function and estimating its parameters. If the conditional variance function has the quadratic form in Equation (17.27), this is easily done. In practice, however, the functional form of the conditional variance function is rarely known. Moreover, if the functional form is incorrect, then the standard errors computed by WLS regression routines are invalid in the sense that they lead to incorrect statistical inferences (tests have the wrong size).

The advantage of using heteroskedasticity-robust standard errors is that they produce asymptotically valid inferences even if you do not know the form of the conditional variance function. An additional advantage is that heteroskedasticity-robust standard errors are readily computed as an option in modern regression packages, so that no additional effort is needed to safeguard against this threat. The disadvantage of heteroskedasticity-robust standard errors is that the OLS estimator will have a larger variance than the WLS estimator (based on the true conditional variance function), at least asymptotically.

CHAPTER 17  The Theory of Linear Regression with One Regressor

In practice, the functional form of var(u_i|X_i) is rarely if ever known, which poses a problem for using WLS in real-world applications. This problem is difficult enough with a single regressor, but in applications with multiple regressors it is even more difficult to know the functional form of the conditional variance. For this reason, practical use of WLS confronts imposing challenges. In contrast, in modern statistical packages it is simple to use heteroskedasticity-robust standard errors, and the resulting inferences are reliable under very general conditions; in particular, heteroskedasticity-robust standard errors can be used without needing to specify a functional form for the conditional variance. For these reasons, it is our opinion that, despite the theoretical appeal of WLS, heteroskedasticity-robust standard errors provide a better way to handle potential heteroskedasticity in most applications.
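The tradeoff just described can be illustrated numerically. The sketch below uses simulated data in which the conditional variance function happens to be known (the quadratic form, the parameter values, and the sample design are all assumptions made for the illustration, not part of the text's application): it compares OLS with heteroskedasticity-robust (HC1) standard errors against infeasible WLS computed with the true variance function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 10, size=n)
cond_var = 1.0 + 0.5 * x**2              # var(u|X): assumed quadratic form, known here
u = rng.normal(0.0, np.sqrt(cond_var))
y = 2.0 + 3.0 * x + u                    # true beta0 = 2, beta1 = 3 (assumptions)

X = np.column_stack([np.ones(n), x])

# OLS with heteroskedasticity-robust (HC1) "sandwich" standard errors
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * (resid**2)[:, None])
V_robust = (n / (n - 2)) * bread @ meat @ bread
se_robust = np.sqrt(np.diag(V_robust))

# Infeasible WLS: weight every variable by 1/sqrt of the true conditional variance,
# then run OLS on the weighted data (the weighted errors are homoskedastic)
w = 1.0 / np.sqrt(cond_var)
Xw, yw = X * w[:, None], y * w
beta_wls, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
resid_w = yw - Xw @ beta_wls
s2 = resid_w @ resid_w / (n - 2)
se_wls = np.sqrt(np.diag(s2 * np.linalg.inv(Xw.T @ Xw)))
```

Both slope estimates are close to the true value, but the WLS standard error is smaller than the robust OLS standard error, reflecting the asymptotic efficiency of WLS when (and only when) the variance function is correctly specified.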

Summary

1. The asymptotic normality of the OLS estimator, combined with the consistency of heteroskedasticity-robust standard errors, implies that, if the first three least squares assumptions in Key Concept 17.1 hold, then the heteroskedasticity-robust t-statistic has an asymptotic standard normal distribution under the null hypothesis.

2. If the regression errors are i.i.d. and normally distributed, conditional on the regressors, then β̂_1 has an exact normal sampling distribution, conditional on the regressors. In addition, the homoskedasticity-only t-statistic has an exact Student t(n−2) sampling distribution under the null hypothesis.

3. The weighted least squares (WLS) estimator is OLS applied to a weighted regression, where all variables are weighted by the square root of the inverse of the conditional variance, var(u_i|X_i), or its estimate. Although the WLS estimator is asymptotically more efficient than OLS, to implement WLS you must know the functional form of the conditional variance function, which usually is a tall order.

Key Terms

convergence in probability (681)
consistent estimator (681)
convergence in distribution (684)
asymptotic distribution (684)
Slutsky's theorem (685)
continuous mapping theorem (685)
weighted least squares (WLS) (691)
WLS estimator (692)
infeasible WLS (692)
feasible WLS (693)
normal p.d.f. (701)
bivariate normal p.d.f. (701)

Review the Concepts

17.1 Suppose that assumption 4 in Key Concept 17.1 is true, but you construct a 95% confidence interval for β_1 using the heteroskedasticity-robust standard error in a large sample. Would this confidence interval be valid asymptotically in the sense that it contained the true value of β_1 in 95% of all repeated samples for large n? Suppose instead that assumption 4 in Key Concept 17.1 is false, but you construct a 95% confidence interval for β_1 using the homoskedasticity-only standard error formula in a large sample. Would this confidence interval be valid asymptotically?

17.2 Suppose that A_n is a random variable that converges in probability to 3. Suppose that B_n is a random variable that converges in distribution to a standard normal. What is the asymptotic distribution of A_nB_n? Use this asymptotic distribution to compute an approximate value of Pr(A_nB_n < 2).

17.3 Suppose that Y and X are related by the regression Y = 1.0 + 2.0X + u. A researcher has observations on Y and X, where 0 ≤ X ≤ 20 and the conditional variance is var(u_i|X_i = x) = 1 for 0 ≤ x ≤ 10 and var(u_i|X_i = x) = 16 for 10 < x ≤ 20. Draw a hypothetical scatterplot of the observations (X_i, Y_i), i = 1, ..., n. Does WLS put more weight on observations with x ≤ 10 or x > 10? Why?

17.4 Instead of using WLS, the researcher in the previous problem decides to compute the OLS estimator using only the observations for which x ≤ 10, then using only the observations for which x > 10, and then average the two OLS estimators. Is this more efficient than WLS?

Exercises

17.1 Consider the regression model without an intercept term, Y_i = β_1X_i + u_i (so the true value of the intercept, β_0, is zero).

a. Derive the least squares estimator of β_1 for the restricted regression model Y_i = β_1X_i + u_i. This is called the restricted least squares estimator (β̂_1^RLS) of β_1 because it is estimated under a restriction, which in this case is β_0 = 0.

b. Derive the asymptotic distribution of β̂_1^RLS under assumptions 1–3 of Key Concept 17.1.

c. Show that β̂_1^RLS is linear [Equation (5.24)] and, under assumptions 1 and 2 of Key Concept 17.1, conditionally unbiased [Equation (5.25)].

d. Derive the conditional variance of β̂_1^RLS under the Gauss-Markov conditions (assumptions 1–4 of Key Concept 17.1).

e. Compare the conditional variance of β̂_1^RLS in (d) to the conditional variance of the OLS estimator β̂_1 (from the regression including an intercept) under the Gauss-Markov conditions. Which estimator is more efficient? Use the formulas for the variances to explain why.

f. Derive the exact sampling distribution of β̂_1^RLS under assumptions 1–5 of Key Concept 17.1.

g. Now consider the estimator β̃_1 = Σ_i Y_i / Σ_i X_i. Derive an expression for var(β̃_1|X_1, ..., X_n) − var(β̂_1^RLS|X_1, ..., X_n) under the Gauss-Markov conditions, and use this expression to show that var(β̃_1|X_1, ..., X_n) ≥ var(β̂_1^RLS|X_1, ..., X_n).

17.2 Suppose that (X_i, Y_i) are i.i.d. with finite fourth moments. Prove that the sample covariance is a consistent estimator of the population covariance, that is, s_XY →p σ_XY, where s_XY is defined in Equation (3.24). (Hint: Use the strategy of Appendix 3.3 and the Cauchy-Schwarz inequality.)

17.3 This exercise fills in the details of the derivation of the asymptotic distribution of β̂_1 given in Appendix 4.3.

a. Use Equation (17.19) to derive the expression

√n(β̂_1 − β_1) = [(1/√n)Σ_i v_i] / [(1/n)Σ_i (X_i − X̄)²] − [√n(X̄ − μ_X)ū] / [(1/n)Σ_i (X_i − X̄)²],

where v_i = (X_i − μ_X)u_i.

b. Use the central limit theorem, the law of large numbers, and Slutsky's theorem to show that the final term in the equation converges in probability to zero.

c. Use the Cauchy-Schwarz inequality and the third least squares assumption in Key Concept 17.1 to prove that var(v_i) < ∞. Does the term (1/√n)Σ_i v_i satisfy the central limit theorem?

d. Apply the central limit theorem and Slutsky's theorem to obtain the result in Equation (17.12).

17.4 Show the following results:

a. Show that √n(β̂_1 − β_1) →d N(0, a²), where a² is a constant, implies that β̂_1 is consistent. (Hint: Use Slutsky's theorem.)

b. Show that s²_û/σ²_u →p 1 implies that s_û/σ_u →p 1.

17.5 Suppose that W is a random variable with E(W⁴) < ∞. Show that E(W²) < ∞.

17.6 Show that if β̂_1 is conditionally unbiased, then it is unbiased; that is, show that if E(β̂_1|X_1, ..., X_n) = β_1, then E(β̂_1) = β_1.

17.7 Suppose that X and u are continuous random variables and (u_i, X_i), i = 1, ..., n, are i.i.d.

a. Show that the joint probability density function (p.d.f.) of (u_i, u_j, X_i, X_j) can be written as f(u_i, X_i)f(u_j, X_j) for i ≠ j, where f(u_i, X_i) is the joint p.d.f. of u_i and X_i.

b. Show that E(u_iu_j | X_i, X_j) = E(u_i|X_i)E(u_j|X_j) for i ≠ j.

c. Show that E(u_i | X_1, ..., X_n) = E(u_i | X_i).

d. Show that E(u_iu_j | X_1, X_2, ..., X_n) = E(u_i|X_i)E(u_j|X_j) for i ≠ j.

17.8 Consider the regression model in Key Concept 17.1 and suppose that assumptions 1, 2, 3, and 5 hold. Suppose that assumption 4 is replaced by the assumption that var(u_i|X_i) = θ_0 + θ_1|X_i|, where |X_i| is the absolute value of X_i, θ_0 > 0, and θ_1 ≥ 0.

a. Is the OLS estimator of β_1 BLUE?

b. Suppose that θ_0 and θ_1 are known. What is the BLUE estimator of β_1?

c. Derive the exact sampling distribution of the OLS estimator, β̂_1, conditional on X_1, ..., X_n.

d. Derive the exact sampling distribution of the WLS estimator (treating θ_0 and θ_1 as known) of β_1, conditional on X_1, ..., X_n.

17.9 Prove Equation (17.16) under assumptions 1 and 2 of Key Concept 17.1 plus the assumption that X_i and u_i have eight moments.

17.10 Let θ̂ be an estimator of the parameter θ, where θ̂ might be biased. Show that if E[(θ̂ − θ)²] → 0 as n → ∞ (that is, the mean squared error of θ̂ tends to zero), then θ̂ →p θ. [Hint: Use Equation (17.43) with W = θ̂ − θ.]

700

The Theory of Linear Regr~sion wtth One Regressor

CHAPTER 1 7

i The Normal and Related Distributions and


17.1 Moments of Continuous Random Variables

APPENDIX

Tht" apsxodt' c.lchnes and disclb~ the nonnal .~nd rdated d ... nibutmn'- lbc de' ..""""
of the cha-squared. F. and Student t da:.tribulions. gah:n tn Sccuon 2.1, are r~.:,tntcd ~.: r... 1v
convenient rd~.:rc.:nce. We beg~n by pr..-senting ddanttton' >I probabaltttC5 :1nJ mom~.-rt:.
invohing continuous random ,ana hies.

Probabilities and Moments of Continuous Random Variables

As discussed in Section 2.1, if Y is a continuous random variable, then its probability is summarized by its probability density function (p.d.f.). The probability that Y falls between two values is the area under its p.d.f. between those two values. As in the discrete case, the expected value of Y is its probability-weighted average value, where now the weights are given by the p.d.f. Because Y is continuous, however, the mathematical expressions for its probabilities and expected values involve integrals rather than the summations that are appropriate for discrete random variables.

Let f_Y denote the probability density function of Y. Because probabilities cannot be negative, f_Y(y) ≥ 0 for all y. The probability that Y falls between a and b (where a < b) is

Pr(a ≤ Y ≤ b) = ∫_a^b f_Y(y)dy.

Because Y must take on some value on the real line, Pr(−∞ ≤ Y ≤ ∞) = 1, which implies that

∫_−∞^∞ f_Y(y)dy = 1.

Expected values and moments of continuous random variables, like those of discrete random variables, are probability-weighted averages of their values, except that summations [for example, the summation in Equation (2.3)] are replaced by integrals. Accordingly, the expected value of Y is

E(Y) = μ_Y = ∫ y f_Y(y)dy,     (17.33)

where the range of integration is the set of values for which f_Y is nonzero. The variance is the expected value of (Y − μ_Y)², and the rth moment of a random variable is the expected value of Y^r. Thus

var(Y) = E(Y − μ_Y)² = ∫ (y − μ_Y)² f_Y(y)dy, and     (17.34)

E(Y^r) = ∫ y^r f_Y(y)dy.     (17.35)

The Normal Distribution

The normal distribution for a single variable. The probability density function of a normally distributed random variable (the normal p.d.f.) is

f_Y(y) = [1/(σ√(2π))] exp[−(y − μ)²/(2σ²)],     (17.36)

where exp(x) is the exponential function of x. The factor 1/(σ√(2π)) in Equation (17.36) ensures that Pr(−∞ ≤ Y ≤ ∞) = ∫_−∞^∞ f_Y(y)dy = 1.

The mean of the normal distribution is μ, and its variance is σ². The normal distribution is symmetric, so all odd central moments of order three and greater are zero. The fourth central moment is 3σ⁴. In general, if Y is distributed N(μ, σ²), then its even central moments are given by

E(Y − μ)^k = σ^k (k − 1)(k − 3)⋯1 (k even).     (17.37)

When μ = 0 and σ² = 1, the normal distribution is called the standard normal distribution. The standard normal p.d.f. is denoted by φ, and the standard normal c.d.f. is denoted by Φ. Thus the standard normal density is φ(y) = (1/√(2π))exp(−y²/2), and Φ(y) = ∫_−∞^y φ(s)ds.

The bivariate normal distribution. The bivariate normal p.d.f. for the two random variables X and Y is

g_X,Y(x, y) = [1/(2πσ_Xσ_Y√(1 − ρ²_XY))] × exp{ −[((x − μ_X)/σ_X)² − 2ρ_XY((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)²] / [2(1 − ρ²_XY)] },     (17.38)

where ρ_XY is the correlation between X and Y.

When X and Y are uncorrelated (ρ_XY = 0), g_X,Y(x, y) = f_X(x)f_Y(y), where f is the normal density given in Equation (17.36). This proves that if X and Y are jointly normally distributed and are uncorrelated, then they are independently distributed. This is a special feature of the normal distribution that is typically not true for other distributions.

The multivariate normal distribution extends the bivariate normal distribution to handle more than two random variables. This distribution is most conveniently stated using matrices and is presented in Appendix 18.1.

The conditional normal distribution. Suppose that X and Y are jointly normally distributed. Then the conditional distribution of Y given X is N(μ_Y|X, σ²_Y|X), with mean μ_Y|X = μ_Y + (σ_XY/σ²_X)(X − μ_X) and variance σ²_Y|X = σ²_Y(1 − ρ²_XY). The mean of this conditional distribution, conditional on X = x, is a linear function of x, and the variance does not depend on x.
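These conditional-distribution formulas can be checked by simulation: draw X from its marginal distribution and then Y from N(μ_Y|X, σ²_Y|X); the resulting pairs should recover the stated correlation and the stated marginal mean and variance of Y. (The parameter values below are arbitrary choices made for the check, not values from the text.)

```python
import numpy as np

rng = np.random.default_rng(4)
mu_x, mu_y, sd_x, sd_y, rho = 1.0, 2.0, 1.5, 0.5, 0.8   # illustrative assumptions
n = 500_000

# Draw X from its marginal, then Y | X from its conditional normal distribution
x = rng.normal(mu_x, sd_x, n)
cond_mean = mu_y + rho * (sd_y / sd_x) * (x - mu_x)      # sigma_XY/sigma_X^2 = rho*sd_y/sd_x
cond_sd = sd_y * np.sqrt(1.0 - rho**2)                   # constant: does not depend on x
y = rng.normal(cond_mean, cond_sd)
```

The sample correlation of (x, y) is close to ρ, and the sample mean and standard deviation of y are close to μ_Y and σ_Y, as the joint-normal factorization implies.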

Related Distributions

The chi-squared distribution. Let Z_1, Z_2, ..., Z_n be n i.i.d. standard normal random variables. The random variable

W = Z_1² + Z_2² + ⋯ + Z_n²     (17.39)

has a chi-squared distribution with n degrees of freedom. This distribution is denoted χ²_n. Because E(Z_i²) = 1 and E(Z_i⁴) = 3, E(W) = n and var(W) = 2n.

The Student t distribution. Let Z have a standard normal distribution, let W have a χ²_m distribution, and let Z and W be independently distributed. Then the random variable

t = Z/√(W/m)     (17.40)

has a Student t distribution with m degrees of freedom, denoted t_m. The t_∞ distribution is the standard normal distribution.

The F distribution. Let W_1 and W_2 be independent random variables with chi-squared distributions with respective degrees of freedom n_1 and n_2. Then the random variable

F = (W_1/n_1)/(W_2/n_2)     (17.41)

has an F distribution with (n_1, n_2) degrees of freedom. This distribution is denoted F_(n_1,n_2). The F distribution depends on the numerator degrees of freedom n_1 and the denominator degrees of freedom n_2. As the number of degrees of freedom in the denominator gets large, the F_(n_1,n_2) distribution is well approximated by a χ²_(n_1) distribution, divided by n_1. In the limit, the F_(n_1,∞) distribution is the same as the χ²_(n_1) distribution, divided by n_1; that is, it is the same as the χ²_(n_1)/n_1 distribution.
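The moment results and limiting relationships above can be verified with a short Monte Carlo built directly from the definitions (the degrees of freedom and replication counts below are arbitrary choices for the check):

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 200_000

# Chi-squared with n = 3: W = Z1^2 + Z2^2 + Z3^2, so E(W) = 3 and var(W) = 6
W = (rng.standard_normal((reps, 3)) ** 2).sum(axis=1)

# Student t with m = 5 degrees of freedom: t = Z / sqrt(W_m / m)
m = 5
Z = rng.standard_normal(reps)
Wm = (rng.standard_normal((reps, m)) ** 2).sum(axis=1)
t = Z / np.sqrt(Wm / m)                  # var(t_m) = m/(m-2) = 5/3

# F with (n1, n2) = (4, 200): ratio of independent chi-squareds over their df;
# for large n2 the denominator W2/n2 is close to 1, so F behaves like chi2_{n1}/n1
n1, n2 = 4, 200
W1 = (rng.standard_normal((reps // 10, n1)) ** 2).sum(axis=1)
W2 = (rng.standard_normal((reps // 10, n2)) ** 2).sum(axis=1)
F = (W1 / n1) / (W2 / n2)
```

The simulated mean and variance of W are close to 3 and 6, the variance of the t_5 draws is close to 5/3, and the mean of the F_(4,200) draws is close to n_2/(n_2 − 2) ≈ 1, consistent with the large-denominator approximation.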

APPENDIX 17.2  Two Inequalities

This appendix states and proves Chebyshev's inequality and the Cauchy-Schwarz inequality.

Chebyshev's Inequality

Chebyshev's inequality uses the variance of the random variable Y to bound the probability that Y is farther than ±δ from its mean, where δ is a positive constant:

Pr(|Y − μ_Y| ≥ δ) ≤ var(Y)/δ²  (Chebyshev's inequality).     (17.42)

To prove Equation (17.42), let W = Y − μ_Y, let f be the p.d.f. of W, and let δ be any positive number. Now

E(W²) = ∫_−∞^∞ w²f(w)dw
      = ∫_−∞^(−δ) w²f(w)dw + ∫_−δ^δ w²f(w)dw + ∫_δ^∞ w²f(w)dw
      ≥ ∫_−∞^(−δ) w²f(w)dw + ∫_δ^∞ w²f(w)dw
      ≥ δ²[∫_−∞^(−δ) f(w)dw + ∫_δ^∞ f(w)dw]
      = δ² Pr(|W| ≥ δ),     (17.43)

where the first equality is the definition of E(W²), the second equality holds because the ranges of integration divide up the real line, the first inequality holds because the term that is dropped is nonnegative, the second inequality holds because w² ≥ δ² over the ranges of integration, and the final equality holds by the definition of Pr(|W| ≥ δ). Substituting W = Y − μ_Y into the final expression, noting that E(W²) = E[(Y − μ_Y)²] = var(Y), and rearranging yields the inequality in Equation (17.42). If Y is discrete, this proof applies with summations replacing integrals.
The Cauchy-Schwarz Inequality


lllc.: <auch~ \ch\\otr/ in~qualil) 1 an cxtco..,run ,,r the~<'' rela11110 tllcLJu.tlny, II'.\,
tllc.'nrJl(lr:th. tl<ltlhfll mc.m-; The Cauchy-Sch\\ar7 ith!lJUalil\ is

l . tu

(I 7 44)

fh~

dt'-

rruol ,,, Lqu.tllm ( 17.4-t) IS similar I\) lhe prool of the C<Jrrdauun in.:quahl~

l'- h.\ wher" h ~ a con,t.mt. llu:n r(I\'Z)

1 Let 1\

f.(Y')

111

.-\ppen-

2/JL(..\'}'t-

h'l.(,\ ') 'u\\ kr h-= -F(.\ }')!(..\':). 'o thut (after srmphlic,nion) lhc. 1:'\prc"inn

bcc.,m ...,

nu

I-= F(Y')

he the c.a,c thai


1:1ldn~

IF( \'}')]2 s

the o;qu:~rc I'<Rll

IL'(X}')j1'E(.\ ) Bct.'tiU'C /1\.\' ')

O(,inc~ W' ... O).il

mu,[

E(.\ :!)F.(Y~). and lh~ C:mchv-S.:hw.111 incquahl\ folh''" lw
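Both inequalities are easy to sanity-check numerically. Chebyshev's bound holds (usually with lots of slack) for any distribution with a finite variance, and the Cauchy-Schwarz inequality holds exactly for the empirical moments of any sample, since a sample defines a valid joint distribution. The distributions below are arbitrary choices for the check:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.exponential(scale=2.0, size=1_000_000)   # a skewed choice: mean 2, variance 4
mu, var = y.mean(), y.var()

# Chebyshev: Pr(|Y - mu| >= delta) <= var(Y)/delta^2 for every delta > 0
cheb_ok = all(np.mean(np.abs(y - mu) >= d) <= var / d**2 for d in (1.0, 2.0, 4.0))

# Cauchy-Schwarz: |E(XY)| <= sqrt(E(X^2) E(Y^2)), checked on sample moments
x = rng.standard_normal(1_000_000) + 0.5 * y
cs_ok = abs(np.mean(x * y)) <= np.sqrt(np.mean(x**2) * np.mean(y**2))
```

For δ = 2, for example, the empirical tail probability is far below the bound var/δ² = 1, illustrating that Chebyshev's inequality is valid but often loose.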

CHAPTER 16

Additional Topics in Time Series Regression

This chapter takes up some further topics in time series regression, starting with forecasting. Chapter 14 considered forecasting a single variable. In practice, however, you might want to forecast two or more variables, such as the rate of inflation and the growth rate of GDP. Section 16.1 introduces a model for forecasting multiple variables, vector autoregressions (VARs), in which lagged values of two or more variables are used to forecast future values of those variables. Chapter 14 also focused on making forecasts one period (e.g., one quarter) into the future, but making forecasts two, three, or more periods into the future is important as well. Methods for making multiperiod forecasts are discussed in Section 16.2.

Sections 16.3 and 16.4 return to the topic of Section 14.6, stochastic trends. Section 16.3 introduces additional models of stochastic trends and an alternative test for a unit autoregressive root. Section 16.4 introduces the concept of cointegration, which arises when two variables share a common stochastic trend, that is, when each variable contains a stochastic trend, but a weighted difference of the two variables does not.

In some time series data, especially financial data, the variance changes over time: Sometimes the series exhibits high volatility, while at other times the volatility is low, so that the data exhibit clusters of volatility. Section 16.5 discusses volatility clustering and introduces models in which the variance of the forecast error changes over time, that is, models in which the forecast error is conditionally heteroskedastic. Models of conditional heteroskedasticity have several applications. One application is computing forecast intervals, where the width of the interval changes over time to reflect periods of high or low uncertainty. Another application is to forecasting the uncertainty of returns on an asset, such as a stock, which in turn can be useful in assessing the risk of owning that asset.

16.1 Vector Autoregressions

Chapter 14 focused on forecasting the rate of inflation, but in reality forecasters are in the business of forecasting other key macroeconomic variables as well, such as the rate of unemployment, the growth rate of GDP, and interest rates. One approach is to develop a separate forecasting model for each variable, using the methods of Section 14.4. Another approach is to develop a single model that can forecast all the variables, which can help to make the forecasts mutually consistent. One way to forecast several variables with a single model is to use a vector autoregression (VAR). A VAR extends the univariate autoregression to multiple time series variables, that is, it extends the univariate autoregression to a "vector" of time series variables.

The VAR Model

A vector autoregression (VAR) with two time series variables, Y and X, consists of two equations: In one, the dependent variable is Y_t; in the other, the dependent variable is X_t. The regressors in both equations are lagged values of both variables. More generally, a VAR with k time series variables consists of k equations, one for each of the variables, where the regressors in all equations are lagged values of all the variables. The coefficients of the VAR are estimated by estimating each of the equations by OLS. VARs are summarized in Key Concept 16.1.

Inference in VARs. Under the VAR assumptions, the OLS estimators are consistent and have a joint normal distribution in large samples. Accordingly, statistical inference proceeds in the usual manner; for example, 95% confidence intervals for coefficients can be constructed as the estimated coefficient ±1.96 standard errors.

One new aspect of hypothesis testing arises in VARs because a VAR with k variables is a collection, or "system," of k equations. Thus, it is possible to test joint hypotheses that involve restrictions across multiple equations.

KEY CONCEPT 16.1: VECTOR AUTOREGRESSIONS

A vector autoregression (VAR) is a set of k time series regressions, in which the regressors are lagged values of all k series. A VAR extends the univariate autoregression to a list, or "vector," of time series variables. When the number of lags in each of the equations is the same and is equal to p, the system of equations is called a VAR(p).

In the case of two time series variables, Y_t and X_t, the VAR(p) consists of the two equations

Y_t = β_10 + β_11 Y_t−1 + ⋯ + β_1p Y_t−p + γ_11 X_t−1 + ⋯ + γ_1p X_t−p + u_1t,     (16.1)

X_t = β_20 + β_21 Y_t−1 + ⋯ + β_2p Y_t−p + γ_21 X_t−1 + ⋯ + γ_2p X_t−p + u_2t,     (16.2)

where the β's and the γ's are unknown coefficients and u_1t and u_2t are error terms.

The VAR assumptions are the time series regression assumptions of Key Concept 14.6, applied to each equation. The coefficients of a VAR are estimated by estimating each equation by OLS.

For example, in the two-variable VAR(p) in Equations (16.1) and (16.2), you could ask whether the correct lag length is p or p − 1; that is, you could ask whether the coefficients on Y_t−p and X_t−p are zero in these two equations. The null hypothesis that these coefficients are zero is

H_0: β_1p = 0, γ_1p = 0, β_2p = 0, and γ_2p = 0.     (16.3)

The alternative hypothesis is that at least one of these four coefficients is nonzero. Thus the null hypothesis involves coefficients from both of the equations, two from each equation.

Because the estimated coefficients have a jointly normal distribution in large samples, it is possible to test restrictions on these coefficients by computing an F-statistic. The precise formula for this statistic is complicated because the notation must handle multiple equations, so we omit it. In practice, most modern software packages have automated procedures for testing hypotheses on coefficients in systems of multiple equations.
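Because each VAR equation has the same regressors, equation-by-equation OLS can be written compactly as a single multi-response least squares problem. The sketch below simulates a two-variable VAR(1) with an assumed coefficient matrix (all numbers are illustrative assumptions, not estimates from the text) and recovers its coefficients by OLS:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 5000
A = np.array([[0.5, 0.2],      # assumed true VAR(1) coefficient matrix (stable)
              [0.1, 0.6]])
data = np.zeros((T, 2))
for t in range(1, T):
    data[t] = A @ data[t - 1] + rng.standard_normal(2)

# OLS, one equation at a time: regress each variable on a constant
# and one lag of BOTH variables. lstsq with a 2-column Y does exactly
# this, since the regressor matrix is shared across equations.
Y = data[1:]                                       # left-hand sides (T-1 x 2)
X = np.column_stack([np.ones(T - 1), data[:-1]])   # intercept + Y_{t-1}, X_{t-1}
coefs = np.linalg.lstsq(X, Y, rcond=None)[0]       # column j holds equation j's coefficients
```

The estimated lag coefficients (rows 1–2 of `coefs`) are close to the assumed matrix A, as the consistency of equation-by-equation OLS implies.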

How many variables should be included in a VAR? The number of coefficients in each equation of a VAR is proportional to the number of variables in the VAR. For example, a VAR with five variables and four lags will have 21 coefficients (four lags each of five variables, plus the intercept) in each of the five equations, for a total of 105 coefficients! Estimating all these coefficients increases the amount of estimation error entering a forecast, which can result in a deterioration of the accuracy of the forecast.

The practical implication is that one needs to keep the number of variables in a VAR small and, especially, to make sure that the variables are plausibly related to each other so that they will be useful for forecasting each other. For example, we know from a combination of empirical evidence (such as that discussed in Chapter 14) and economic theory that the inflation rate, the unemployment rate, and the short-term interest rate are related to each other, suggesting that these variables could help to forecast each other in a VAR. Including an unrelated variable in a VAR, however, introduces estimation error without adding predictive content, thereby reducing forecast accuracy.
Determining lag lengths in VARs.¹ Lag lengths in a VAR can be determined using either F-tests or information criteria.

The information criterion for a system of equations extends the single-equation information criterion in Section 14.5. To define this information criterion we need to adopt matrix notation. Let Σ_u be the k × k covariance matrix of the VAR errors, and let Σ̂_u be the estimate of the covariance matrix, where the (i, j) element of Σ̂_u is (1/T)Σ_t û_it û_jt, where û_it is the OLS residual from the ith equation and û_jt is the OLS residual from the jth equation. The BIC for the VAR is

BIC(p) = ln[det(Σ̂_u)] + k(kp + 1)(ln T)/T,     (16.4)

where det(Σ̂_u) is the determinant of the matrix Σ̂_u. The AIC is computed using Equation (16.4), modified by replacing the term "ln T" by "2".

The expression for the BIC for the k equations in the VAR in Equation (16.4) extends the expression for a single equation given in Section 14.5. When there is a single equation, the first term simplifies to ln(SSR(p)/T). The second term in Equation (16.4) is the penalty for adding additional regressors; k(kp + 1) is the total number of regression coefficients in the VAR (there are k equations, each of which has an intercept and p lags of each of the k time series variables).

Lag length estimation in a VAR using the BIC proceeds analogously to the single-equation case: Among a set of candidate values of p, the estimated lag length p̂ is the value of p that minimizes BIC(p).

¹This section uses matrices and may be skipped for less mathematical treatments.
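Equation (16.4) is straightforward to compute once the VAR residuals are in hand. The helper below is a sketch (the simulated data, seed, and candidate lag lengths are arbitrary assumptions): it fits a VAR(p) by OLS, forms the residual covariance matrix, and returns BIC(p); minimizing over candidate p then selects the lag length.

```python
import numpy as np

def var_bic(data, p):
    """BIC(p) = ln[det(Sigma_hat)] + k(kp + 1) ln(T)/T for a VAR(p) fit by OLS."""
    n_obs, k = data.shape
    Y = data[p:]
    X = np.column_stack([np.ones(n_obs - p)] +
                        [data[p - j:n_obs - j] for j in range(1, p + 1)])
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    T = len(Y)
    sigma_hat = resid.T @ resid / T            # k x k residual covariance matrix
    return np.log(np.linalg.det(sigma_hat)) + k * (k * p + 1) * np.log(T) / T

# Illustration on simulated VAR(1) data: BIC should pick p = 1
rng = np.random.default_rng(5)
A = np.array([[0.5, 0.2], [0.1, 0.6]])         # assumed data-generating matrix
y = np.zeros((2000, 2))
for t in range(1, 2000):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)
best_p = min(range(1, 5), key=lambda p: var_bic(y, p))
```

Note that, as written, each candidate p uses a slightly different estimation sample (T − p observations); in careful applications the candidate models are compared over a common sample.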

Using VARs for causal analysis. The discussion so far has focused on using VARs for forecasting. Another use of VAR models is for analyzing causal relationships among economic time series variables; indeed, it was for this purpose that VARs were first introduced to economics by the econometrician and macroeconomist Christopher Sims (1980). The use of VARs for causal inference is known as structural VAR modeling, "structural" because in this application VARs are used to model the underlying structure of the economy. Structural VAR analysis uses the techniques introduced in this section in the context of forecasting, plus some additional tools. The biggest conceptual difference between using VARs for forecasting and using them for structural modeling, however, is that structural modeling requires very specific assumptions, derived from economic theory and institutional knowledge, of what is exogenous and what is not. The discussion of structural VARs is best undertaken in the context of estimation of systems of simultaneous equations, which goes beyond the scope of this book. For an introduction to using VARs for forecasting and policy analysis, see Stock and Watson (2001). For additional mathematical detail on structural VAR modeling, see Hamilton (1994) or Watson (1994).

A VAR Model of the Rates of Inflation and Unemployment

As an illustration, consider a two-variable VAR for the inflation rate, Inf_t, and the rate of unemployment, Unemp_t. As in Chapter 14, we treat the rate of inflation as having a stochastic trend, so that it is appropriate to transform it by computing its first difference, ΔInf_t.

The VAR for ΔInf_t and Unemp_t consists of two equations: one in which ΔInf_t is the dependent variable and one in which Unemp_t is the dependent variable. The regressors in both equations are lagged values of ΔInf_t and Unemp_t. Because of the apparent break in the Phillips curve in the early 1980s found in Section 14.7 using the QLR test, the VAR is estimated using data from 1982:I to 2004:IV.

The first equation of the VAR is the inflation equation:

ΔInf̂_t = 1.47 − 0.64ΔInf_t−1 − 0.64ΔInf_t−2 − 0.11ΔInf_t−3 − 0.13ΔInf_t−4
         (0.55)  (0.12)        (0.10)        (0.11)        (0.09)
       − 3.49Unemp_t−1 + 2.80Unemp_t−2 + 2.44Unemp_t−3 − 2.03Unemp_t−4.     (16.5)
         (0.55)          (0.58)          (0.94)          (1.07)

The adjusted R² is R̄² = 0.44.

The second equation of the VAR is the unemployment equation, in which the regressors are the same as in the inflation equation but the dependent variable is the unemployment rate:

Unemp̂_t = 0.22 − 0.005ΔInf_t−1 + 0.013ΔInf_t−2 − 0.007ΔInf_t−3 + 0.003ΔInf_t−4
          (0.11)  (0.017)        (0.018)        (0.013)        (0.014)
        + 1.52Unemp_t−1 − 0.29Unemp_t−2 − 0.43Unemp_t−3 + 0.16Unemp_t−4.     (16.6)
          (0.11)          (0.18)          (0.21)          (0.11)

The adjusted R² is R̄² = 0.982.


Lquauon:. (16.5 ) and (16.6), taken together. are a VAR (.t) mod~.. I uf thr ch.mgl
in the rate of in nation. !J.In_f,. and Lhe unemployment rate, Um!IIIJ1 1
fhcsl.! VA R l.!quatinns can be used to perform Granger causallt} tc~b. I he f".
stati,.ttc tesung the null hypothesis that the cocll"icJ~nts on Ummp,_1 Untmp, ..
U111'll'f'r 3 und Ull~lllfJ 1 4 are lero in the inflation equation [l::.qu:.J iiun ( lh5)j i-.
l 1.04. which h.l'> a p-value lcsll than 0.001. Thus. lht. null hypothesi., ~ flll.'t:ll'U . M~
we c~ n conclude that the unemrloyrnenr rate is a u:.dul predn;tor t'l ch.mg..:!l 111
inllatto n. g1ven lags in inflation (lhat is. the un\: mplo~mt.n t rate G rangcr-cau'~.:'
changes sn mfl a tto n). The F-sta tt tic tesung the h~ pothes1-. that t h~o cudflcii.'nt-, llll
thl.! four lags o f 6./n/, ar~ zero in the unemployment equatiOn j.Cquat1on ( 16.6)J i<:
0. It} "'lt~.h h., .t p-valut! of 0.96. Thus the change sn the inflation rat~o JHC' nul
G rang~o r-cau':>\; th..: unt!mploymenl rate at Lht! 10% sigmficancc kwl.
Forecast$ o f the r th.:" of in Galion and unemplo} m~o nt o ne JX nod .thc::.sd :..~r
obtained C\actl't as dl..;cu.;.sed in Section 14 4. Th\,. forec t"t ol lh~ ~.oh.mgc: l'f mil<~
uon Irom 200-UV to 2005: 1. based on Equation ( 16.5). ts !:.in(_ 1 _ ~ 1, = - Ill
perC4:nt.lge point. A simrlar calculation using Equation ( lfl6) gt'l!' 1 furu ''' I
t h~.; unt.mploymcnt rate in 2005:1 ba~e d on data through 20ll4.1Y l l l
Lm mpzc.,,.fl,2t)II IV = 5.4~.,. very close l O jts actual value. f..- llt'IJ!f'l!JJ 1 =- '\ 3.
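A Granger causality F-statistic of the kind just reported can be computed from a pair of restricted and unrestricted OLS regressions. The sketch below uses the homoskedasticity-only F formula for simplicity (the estimates reported in the text use heteroskedasticity-robust statistics) and simulated data, constructed so that x helps predict y but not the reverse; all data-generating numbers are illustrative assumptions.

```python
import numpy as np

def granger_f(y, x, p):
    """F-statistic for H0: all p lags of x can be dropped from a regression of
    y on a constant, p lags of y, and p lags of x (homoskedasticity-only form)."""
    T = len(y)
    lags = lambda s: np.column_stack([s[p - j:T - j] for j in range(1, p + 1)])
    Y = y[p:]
    X_u = np.column_stack([np.ones(T - p), lags(y), lags(x)])   # unrestricted
    X_r = np.column_stack([np.ones(T - p), lags(y)])            # restricted
    ssr = lambda X: np.sum((Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]) ** 2)
    ssr_u, ssr_r = ssr(X_u), ssr(X_r)
    return ((ssr_r - ssr_u) / p) / (ssr_u / (len(Y) - X_u.shape[1]))

rng = np.random.default_rng(6)
T = 500
x = rng.standard_normal(T)                  # x is unforecastable white noise
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

f_xy = granger_f(y, x, p=4)   # large: lagged x has predictive content for y
f_yx = granger_f(x, y, p=4)   # near 1: lagged y has none for x
```

As in the text's example, rejecting in one direction and not the other is evidence that one variable Granger-causes the other but not vice versa.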

16.2 Multiperiod Forecasts

The discussion of forecasting so far has focused on making forecasts one period in advance. Often, however, forecasters are called upon to make forecasts further into the future. This section describes two methods for making multiperiod forecasts. The usual method is to construct "iterated" forecasts, in which a one-period-ahead model is iterated forward one period at a time, in a way that is made precise in this section. The second method is to make "direct" forecasts by using a regression in which the dependent variable is the multiperiod variable that one wants to forecast. For reasons discussed at the end of this section, in most applications the iterated method is recommended over the direct method.
Iterated Multiperiod Forecasts

The essential idea of an iterated forecast is that a forecasting model is used to make a forecast one period ahead, for period T + 1, using data through period T. The model then is used to make a forecast for date T + 2 given the data through date T, where the forecasted value for date T + 1 is treated as data for the purpose of making the forecast for period T + 2. Thus the one-period-ahead forecast (which is also referred to as a one-step-ahead forecast) is used as an intermediate step to make the two-period-ahead forecast. This process repeats, or iterates, until the forecast is made for the desired forecast horizon h.

The iterated AR forecast method: AR(1). An iterated AR(1) forecast uses an AR(1) for the one-period-ahead model. For example, consider the first-order autoregression for ΔInf_t [Equation (14.7)]:

ΔInf̂_t = 0.02 − 0.24ΔInf_t−1.     (16.7)
         (0.13)  (0.10)

The first step in computing the two-quarter-ahead forecast of ΔInf_2005:II based on Equation (16.7) using data through 2004:IV is to compute the one-quarter-ahead forecast of ΔInf_2005:I based on data through 2004:IV: ΔInf̂_2005:I|2004:IV = 0.02 − 0.24ΔInf_2004:IV = 0.02 − 0.24 × 1.9 = −0.4. The second step is to substitute this forecast into Equation (16.7), so that ΔInf̂_2005:II|2004:IV = 0.02 − 0.24ΔInf̂_2005:I|2004:IV = 0.02 − 0.24 × (−0.4) = 0.1. Thus, based on information through the fourth quarter of 2004, this forecast states that the rate of inflation will increase by 0.1 percentage point between the first and second quarters of 2005.

The iterated AR forecast method: AR(p). The iterated AR(1) strategy is
extended to an AR(p) by replacing Y_{T+1} with its forecast, Ŷ_{T+1|T}, and then treating that
forecast as data for the AR(p) forecast of Y_{T+2}. For example, consider the iterated
two-period-ahead forecast of inflation based on the AR(4) model from Section
14.3 [Equation (14.13)]:

644   CHAPTER 16   Additional Topics in Time Series Regression

    ΔInf̂_t = 0.02 - 0.26ΔInf_{t-1} - 0.32ΔInf_{t-2} + 0.16ΔInf_{t-3} - 0.03ΔInf_{t-4}   (16.8)
             (0.12)  (0.09)           (0.08)           (0.08)           (0.09)

The forecast of ΔInf_{2005:I} based on data through 2004:IV using this AR(4), computed in Section 14.3, is ΔInf̂_{2005:I|2004:IV} = 0.4. Thus the two-quarter-ahead iterated
forecast based on the AR(4) is ΔInf̂_{2005:II|2004:IV} = 0.02 - 0.26ΔInf̂_{2005:I|2004:IV} - 0.32ΔInf_{2004:IV} + 0.16ΔInf_{2004:III} - 0.03ΔInf_{2004:II} = 0.02 - 0.26 × 0.4 - 0.32 × 1.9 + 0.16 × (-2.4) - 0.03 × 1.6 = -1.1. According to this
iterated AR(4) forecast, based on data through the fourth quarter of 2004, the rate
of inflation is predicted to fall by 1.1 percentage points between the first and second quarters of 2005.
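The same two-step AR(4) arithmetic can be checked numerically. The coefficients and data values below are as reconstructed from Equation (14.13) and the surrounding text, so treat them as illustrative:

```python
import numpy as np

# AR(4) coefficients as read from Equation (14.13): b0 + b1*y_{t-1} + ... + b4*y_{t-4}.
b0 = 0.02
b = np.array([-0.26, -0.32, 0.16, -0.03])

# The one-step forecast of dInf_2005:I (0.4, from Section 14.3) is treated as data;
# the remaining lags are the observed dInf for 2004:IV, 2004:III, and 2004:II.
lags = np.array([0.4, 1.9, -2.4, 1.6])   # newest first
two_step = b0 + b @ lags
print(round(two_step, 1))                # -1.1
```

The dot product reproduces the substitution done by hand in the text: the first lag slot holds a forecast, the remaining slots hold data.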

Iterated multivariate forecasts using an iterated VAR. Iterated multivariate forecasts can be computed using a VAR in much the same way as iterated
univariate forecasts are computed using an autoregression. The main new feature
of an iterated multivariate forecast is that the two-step-ahead (period T + 2) forecast of one variable depends on the forecasts of all variables in the VAR in period
T + 1. For example, to compute the forecast of the change of inflation from period
T + 1 to period T + 2 using a VAR with the variables ΔInf and Unemp, one must
forecast both ΔInf_{T+1} and Unemp_{T+1}, using data through period T, as an intermediate step in forecasting ΔInf_{T+2}. More generally, to compute multiperiod iterated
VAR forecasts h periods ahead, it is necessary to compute forecasts of all variables
for all intervening periods between T and T + h.
""''"\..X tmplc wt \\Ill compute the iterated 'vA R torccast of ~It I b:t,cd
on c.Ltta thrc.1U\h 2lJO.l:l\. UStn\! lht. VAR(.l) for a/J f, ctnJ Un 11 f n S~l.l on Hl.l
(F.qu lluD!> ( lfl "i) and (to.6)]. The llr<'t Mep ts to cc mruh. the on~.. -tl.nh.r-<ohc. d
f,)rc:ca't' :l"/;;j'21 uq 20-u 1\ and G;;i;np . ; _..(1.1tv twrn that 'vAR. The for~\. st
i/,1f,, .. 1 >-~I\ ons..:<.l on Equation (16.5) '"as cnmputcd 1n <;~.c..tron II~ an<.J ''
n I pcn:cntagc pmnt !Equatiml ( 14.1f{)j A sirnil.tr cnh.:ulation using J:quati1111
( 16.6) shm.., that Ummp:.~~lltn.JI\ S -!%. In the -sccund <.tcp.thc~~ furcca,t-s tfl.'
~liO~IItutcd tnlu l::.qualtOUl> ( 16.5) ..tn<.l ( L6.6) to rrouucc the tWO quarter ahead
forecast. ~luf:x'~ utlfU 1v:

-..,

~In} .:aJ':II ~~~J.I rv

1.<17 - 0.6-l ~ IIIJ '"~~ 1111 c)J 1\' - O.b..!O.lnJ,,m IV - 0. 13~/nf,~~H ur


- 0 13..H11f2m.: 11 - J ~9 u;;;mp~ .,, 1').'"-"' I\ 2.1:1.11Lnt111f1 ,_ IV
+ ::!.4-H.:nemp:wo 111 - ~.03L'Ilt'lllflzu,.:;u
-.. 1.47 - 0.6:1 X ( - () ) - 0.~ -x 1.9 - tl.l~ X ( - ::! ") 0 P ' 0 .6
- 3 49 X 5 -t + ~ '-:0 >- 5.~ ~ 2.44 " .;.4 - ::! .115 1<. 5.6 -= -1.1
t I,,,9 J
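The mechanics of iterating a VAR can be sketched generically. The coefficient matrices below are hypothetical placeholders (they are not the estimates of Equations (16.5) and (16.6)); the point is only that each step feeds the forecasts of all variables back in as data:

```python
import numpy as np

def iterate_var(history, intercept, coef_mats, horizon):
    """Iterate a VAR(p): history is a list of k-vectors (newest last);
    coef_mats is [A1, ..., Ap], each a k x k matrix."""
    hist = [np.asarray(v, dtype=float) for v in history]
    for _ in range(horizon):
        yhat = np.asarray(intercept, dtype=float).copy()
        for lag, A in enumerate(coef_mats, start=1):
            yhat = yhat + A @ hist[-lag]   # A_l multiplies the l-th most recent vector
        hist.append(yhat)                  # forecasts of ALL variables become "data"
    return hist[len(history):]

# Hypothetical two-variable VAR(1), for illustration only:
c = np.array([0.1, 0.2])
A1 = np.array([[0.5, 0.1],
               [0.0, 0.8]])
fcasts = iterate_var([[1.0, 2.0]], c, [A1], horizon=2)
print(fcasts[1])   # the two-period-ahead forecast of both variables
```

Because the one-step forecasts of every variable are appended to the history, the two-step forecast of each variable automatically depends on the one-step forecasts of all the others, exactly as described above.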

16.2   Multiperiod Forecasts   645

KEY CONCEPT 16.2: ITERATED MULTIPERIOD FORECASTS

The iterated multiperiod AR forecast is computed in steps: First compute the one-period-ahead forecast; then use that to compute the two-period-ahead forecast,
and so forth. The two- and three-period-ahead iterated forecasts based on an
AR(p) are

    Ŷ_{T+2|T} = β̂_0 + β̂_1Ŷ_{T+1|T} + β̂_2Y_T + ... + β̂_pY_{T-p+2}   (16.10)
    Ŷ_{T+3|T} = β̂_0 + β̂_1Ŷ_{T+2|T} + β̂_2Ŷ_{T+1|T} + β̂_3Y_T + ... + β̂_pY_{T-p+3},   (16.11)

where the β̂'s are the OLS estimates of the AR(p) coefficients. Continuing this
process ("iterating") produces forecasts further into the future.
The iterated multiperiod VAR forecast is also computed in steps: First compute the one-period-ahead forecast of all the variables in the VAR, then use those
forecasts to compute the two-period-ahead forecasts, and continue this process
iteratively to the desired forecast horizon. The two-period-ahead iterated forecast of Y_{T+2}, based on the two-variable VAR(p) in Key Concept 16.1, is

    Ŷ_{T+2|T} = β̂_10 + β̂_11Ŷ_{T+1|T} + β̂_12Y_T + ... + β̂_1pY_{T-p+2}
        + γ̂_11X̂_{T+1|T} + γ̂_12X_T + ... + γ̂_1pX_{T-p+2},   (16.12)

where the coefficients in Equation (16.12) are the OLS estimates of the VAR coefficients. Iterating produces forecasts further into the future.

Thus the iterated VAR(4) forecast, based on data through the fourth quarter of
2004, is that inflation will decline by 1.1 percentage points between the first and
second quarters of 2005.
Iterated multiperiod forecasts are summarized in Key Concept 16.2.

Direct Multiperiod Forecasts

Direct multiperiod forecasts are computed without iterating by using a single
regression in which the dependent variable is the multiperiod-ahead variable to
be forecasted and the regressors are the predictor variables. Forecasts computed
this way are called direct forecasts because the regression coefficients can be used
directly to make the multiperiod forecast.


The direct multiperiod forecasting method. Suppose you want to make a
forecast of Y_{T+2} using data through time T. The direct multiperiod method takes
the ADL model as its starting point, but lags the predictor variables by an additional time period. For example, if two lags of the predictors are used, then the
dependent variable is Y_t and the regressors are Y_{t-2}, Y_{t-3}, X_{t-2}, and X_{t-3}. The coefficients from this regression can be used directly to compute the forecast of Y_{T+2}
using data on Y_T, Y_{T-1}, X_T, and X_{T-1}, without the need for any iteration. More
generally, in a direct h-period-ahead forecasting regression, all predictors are
lagged h periods to produce the h-period-ahead forecast.
For example, the forecast of ΔInf_t two quarters ahead using four lags each of
ΔInf_{t-2} and Unemp_{t-2} is computed by first estimating the regression:

    ΔInf̂_t = -0.15 - 0.25ΔInf_{t-2} + 0.16ΔInf_{t-3} - 0.15ΔInf_{t-4} - 0.10ΔInf_{t-5}
             (0.53)  (0.13)           (0.13)           (0.14)           (0.07)
        - 0.17Unemp_{t-2} + 1.82Unemp_{t-3} - 3.53Unemp_{t-4} + 1.89Unemp_{t-5}.   (16.13)
          (0.70)            (1.63)            (2.00)            (0.91)

The two-quarter-ahead forecast of the change of inflation from 2005:I to 2005:II
is computed by substituting the values of ΔInf_{2004:IV}, ..., ΔInf_{2004:I} and Unemp_{2004:IV},
..., Unemp_{2004:I} into Equation (16.13); this yields

    ΔInf̂_{2005:II|2004:IV} = -0.15 - 0.25ΔInf_{2004:IV} + 0.16ΔInf_{2004:III} - 0.15ΔInf_{2004:II}
        - 0.10ΔInf_{2004:I} - 0.17Unemp_{2004:IV} + 1.82Unemp_{2004:III}
        - 3.53Unemp_{2004:II} + 1.89Unemp_{2004:I} = -1.38.   (16.14)

The three-quarter-ahead direct forecast of ΔInf is computed by lagging all
the regressors in Equation (16.13) by one additional quarter, estimating that
regression, and then computing the forecast. The h-quarter-ahead direct forecast
of ΔInf is computed by using ΔInf_t as the dependent variable and the regressors ΔInf_{t-h} and Unemp_{t-h}, plus additional lags of ΔInf_{t-h} and Unemp_{t-h} as
desired.
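A minimal sketch of a direct h-step regression, on simulated data rather than the inflation series: the regressors are lagged h periods, and the fitted coefficients are applied directly to the latest observations, with no iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
y = np.zeros(T)
for t in range(1, T):                 # simulated AR(1) series, purely illustrative
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()

h, p = 2, 2                           # forecast horizon and number of lags
rows = list(range(h + p - 1, T))      # t for which y[t - h - p + 1] exists
X = np.column_stack(
    [np.ones(len(rows))] + [[y[t - h - j] for t in rows] for j in range(p)]
)
Y = np.array([y[t] for t in rows])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]   # OLS of Y_t on Y_{t-2}, Y_{t-3}

# The direct forecast of Y_{T+h} applies the coefficients to [1, Y_T, Y_{T-1}].
x_last = np.array([1.0, y[-1], y[-2]])
forecast = float(x_last @ beta)
print(forecast)
```

For a true AR(1) with coefficient 0.5, the population coefficient on the first (h = 2) lag in this regression is 0.5² = 0.25, which the estimate should approximate.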

Standard errors in direct multiperiod regressions. Because the dependent
variable in a multiperiod regression occurs two or more periods into the future,
the error term in a multiperiod regression is serially correlated. To see this,
consider the two-period-ahead forecast of inflation, and suppose that a surprise


KEY CONCEPT 16.3: DIRECT MULTIPERIOD FORECASTS

The direct multiperiod forecast h periods into the future, based on p lags each of
Y_t and an additional predictor X_t, is computed by first estimating the regression

    Y_t = δ_0 + δ_1Y_{t-h} + ... + δ_pY_{t-p-h+1} + δ_{p+1}X_{t-h} + ... + δ_{2p}X_{t-p-h+1} + u_t,   (16.15)

then using the estimated coefficients directly to make the forecast of Y_{T+h} using
data through period T.

jump in oil prices occurs in the next quarter. Today's two-period-ahead forecast of
inflation will be too low because it does not incorporate this unexpected event.
Because the oil price rise was also unknown in the previous quarter, the two-period-ahead forecast made last quarter will also be too low. Thus the surprise oil
price jump next quarter means that both last quarter's and this quarter's two-period-ahead forecasts are too low. Because of such intervening events, the error
term in a multiperiod regression is serially correlated.
As discussed in Section 15.4, if the error term is serially correlated, the usual
OLS standard errors are incorrect or, more precisely, they are not a reliable basis
for inference. Therefore heteroskedasticity- and autocorrelation-consistent (HAC)
standard errors must be used with direct multiperiod regressions. The standard
errors reported in Equation (16.13) for direct multiperiod regressions therefore
are Newey-West HAC standard errors, where the truncation parameter m is set
according to Equation (15.17); for these data (for which T = 92), Equation (15.17)
yields m = 3. For longer forecast horizons, the amount of overlap, and thus the
degree of serial correlation in the error, increases: In general, the first h - 1 autocorrelation coefficients of the errors in an h-period-ahead regression are nonzero.
Thus larger values of m than indicated by Equation (15.17) are appropriate for
multiperiod regressions with long forecast horizons.
Direct multiperiod forecasts are summarized in Key Concept 16.3.
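The Newey-West HAC calculation can be sketched by hand. This is an illustrative implementation, assuming Bartlett weights 1 - j/m for lags j = 1, ..., m - 1 and the truncation rule m = 0.75 T^(1/3), rounded to an integer, which is the rule attributed above to Equation (15.17); for T = 92 it gives m = 3.

```python
import numpy as np

def newey_west_se(X, y, m):
    """OLS coefficients with Newey-West HAC standard errors.
    X must include a constant column; m is the truncation parameter."""
    T, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ beta
    Xu = X * u[:, None]                    # score contributions x_t * u_t
    S = Xu.T @ Xu / T                      # lag-0 term
    for j in range(1, m):                  # lags 1, ..., m-1 with Bartlett weights
        w = 1 - j / m
        G = Xu[j:].T @ Xu[:-j] / T
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X / T)
    V = XtX_inv @ S @ XtX_inv / T          # sandwich HAC covariance of beta-hat
    return beta, np.sqrt(np.diag(V))

T = 92
m = int(round(0.75 * T ** (1 / 3)))
print(m)   # 3
```

With serially correlated errors, the HAC standard errors can differ substantially from the usual OLS ones, which is why they are required for direct multiperiod regressions.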

Which Method Should You Use?

In most applications, the iterated method is the recommended procedure for multiperiod forecasting, for two reasons. First, from a theoretical perspective, if the
underlying one-period-ahead model (the AR or VAR that is used to compute the
iterated forecast) is specified correctly, then the coefficients are estimated more
efficiently if they are estimated by a one-period-ahead regression (and then iterated) than by a multiperiod-ahead regression. Second, from a practical perspective, forecasters are usually interested in forecasts not just at a single horizon but
at multiple horizons. Because they are produced using the same model, iterated
forecasts tend to have time paths that are less erratic across horizons than do direct
forecasts. Because a different model is used at every horizon for direct forecasts,
sampling error in the estimated coefficients can add random fluctuations to the
time paths of a sequence of direct multiperiod forecasts.
Under some circumstances, however, direct forecasts are preferable to iterated forecasts. One such circumstance is when you have reason to believe that the
one-period-ahead model (the AR or VAR) is not specified correctly. For example,
you might believe that the equation for the variable you are trying to forecast in
a VAR is specified correctly, but that one or more of the other equations in the
VAR is specified incorrectly, perhaps because of neglected nonlinear terms. If the
one-step-ahead model is specified incorrectly, then in general the iterated multiperiod forecast will be biased, and the MSFE of the iterated forecast can exceed
the MSFE of the direct forecast, even though the direct forecast has a larger variance. A second circumstance in which a direct forecast might be desirable arises
in multivariate forecasting models with many predictors, in which case a VAR
specified in terms of all the variables could be unreliable because it would have
very many estimated coefficients.

16.3 Orders of Integration
and the DF-GLS Unit Root Test


This section extends the treatment of stochastic trends in Section 14.6 by addressing two further topics. First, the trends of some time series are not well described
by the random walk model, so we introduce an extension of that model and discuss its implications for regression modeling of such series. Second, we continue
the discussion of testing for a unit root in time series data and, among other things,
introduce a second test for a unit root, the DF-GLS test.

Other Models of Trends
and Orders of Integration

Recall that the random walk model for a trend, introduced in Section 14.6, specifies that the trend at date t equals the trend at date t - 1, plus a random error term.
If Y_t follows a random walk with drift β_0, then

    Y_t = β_0 + Y_{t-1} + u_t,   (16.16)

where u_t is serially uncorrelated. Also recall from Section 14.6 that, if a series has
a random walk trend, then it has an autoregressive root that equals 1.

Although the random walk model of a trend describes the long-run movements of many economic time series, some economic time series have trends that
are smoother, that is, vary less from one period to the next, than is implied by
Equation (16.16). A different model is needed to describe the trends of such series.
One model of a smooth trend makes the first difference of the trend follow a
random walk; that is,

    ΔY_t = β_0 + ΔY_{t-1} + u_t,   (16.17)

where u_t is serially uncorrelated. Thus, if Y_t follows Equation (16.17), ΔY_t follows
a random walk, so ΔY_t - ΔY_{t-1} is stationary. The difference of the first differences,
ΔY_t - ΔY_{t-1}, is called the second difference of Y_t and is denoted Δ²Y_t = ΔY_t - ΔY_{t-1}. In this terminology, if Y_t follows Equation (16.17), then its second difference is stationary. If a series has a trend of the form in Equation (16.17), then the
first difference of the series has an autoregressive root that equals 1.

"Orders of integration" terminology. Some additional terminology is useful for distinguishing between these two models of trends. A series that has a random walk trend is said to be integrated of order one, or I(1). A series that has a
trend of the form in Equation (16.17) is said to be integrated of order two, or I(2).
A series that does not have a stochastic trend and is stationary is said to be integrated of order zero, or I(0).
The order of integration in the I(1) and I(2) terminology is the number of
times that the series needs to be differenced for it to be stationary: If Y_t is I(1),
then the first difference of Y_t, ΔY_t, is stationary, and if Y_t is I(2), then the second
difference of Y_t, Δ²Y_t, is stationary. If Y_t is I(0), then Y_t is stationary.
Orders of integration are summarized in Key Concept 16.4.
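In code, first and second differences are just repeated differencing; for example, with NumPy:

```python
import numpy as np

y = np.array([1.0, 3.0, 6.0, 10.0, 15.0])
dy  = np.diff(y)        # first differences:  Y_t - Y_{t-1}
d2y = np.diff(y, n=2)   # second differences: (Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})
print(dy)               # [2. 3. 4. 5.]
print(d2y)              # [1. 1. 1.]
```

Each differencing pass shortens the series by one observation, and `np.diff(y, n=2)` is identical to differencing the first differences once more.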

How to test whether a series is I(2) or I(1). If Y_t is I(2), then ΔY_t is I(1), so
that ΔY_t has an autoregressive root that equals 1. If, however, Y_t is I(1), then ΔY_t
is stationary. Thus the null hypothesis that Y_t is I(2) can be tested against the alternative hypothesis that Y_t is I(1) by testing whether ΔY_t has a unit autoregressive
root. If the hypothesis that ΔY_t has a unit autoregressive root is rejected, then the
hypothesis that Y_t is I(2) is rejected in favor of the alternative that Y_t is I(1).
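This test amounts to running a Dickey-Fuller style regression on the first difference of the series. A bare-bones sketch, which computes only the t-statistic (the nonstandard Dickey-Fuller critical values, not the normal ones, still apply to it):

```python
import numpy as np

def df_tstat(y):
    """t-statistic on y_{t-1} in the regression of dy_t on a constant and y_{t-1}."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    ylag = y[:-1]
    X = np.column_stack([np.ones(len(ylag)), ylag])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    u = dy - X @ beta
    s2 = u @ u / (len(dy) - 2)
    V = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(V[1, 1])

rng = np.random.default_rng(0)
y_i2 = np.cumsum(np.cumsum(rng.standard_normal(500)))   # an I(2) series
t_on_diff = df_tstat(np.diff(y_i2))   # test for a unit root in the FIRST DIFFERENCE
print(t_on_diff)
```

For a genuinely I(2) series, the first difference is I(1), so this t-statistic will usually not be very negative and the unit root null for the difference is not rejected; for a stationary series it is strongly negative.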

Examples of I(2) and I(1) series: The price level and the rate of inflation.
In Chapter 14, we concluded that the rate of inflation in the United States plausibly has a random walk stochastic trend, that is, that the rate of inflation is I(1). If
inflation is I(1), then its stochastic trend is removed by first differencing, so ΔInf_t


KEY CONCEPT 16.4: ORDERS OF INTEGRATION, DIFFERENCING, AND STATIONARITY

If Y_t is integrated of order one, that is, if Y_t is I(1), then Y_t has a unit autoregressive root and its first difference, ΔY_t, is stationary.
If Y_t is integrated of order two, that is, if Y_t is I(2), then ΔY_t has a unit autoregressive root and its second difference, Δ²Y_t, is stationary.
If Y_t is integrated of order d, that is, if Y_t is I(d), then Y_t must be differenced
d times to eliminate its stochastic trend; that is, Δ^d Y_t is stationary.

is stationary. Recall from Section 14.2 [Equation (14.2)] that quarterly inflation at
an annual rate is the first difference of the logarithm of the price level, times 400;
that is, Inf_t = 400Δp_t, where p_t = ln(CPI_t). Thus, treating the rate of inflation as I(1)
is equivalent to treating Δp_t as I(1), but this in turn is equivalent to treating p_t as
I(2). Thus, we have all along been treating the logarithm of the price level as I(2),
even though we have not used that terminology.
The logarithm of the price level, p_t, and the rate of inflation are plotted in Figure 16.1. The long-run trend of the logarithm of the price level (Figure 16.1a) varies
more smoothly than the long-run trend in the rate of inflation (Figure 16.1b). The
smoothly varying trend in the logarithm of the price level is typical of I(2) series.

The DF-GLS Test for a Unit Root

This section continues the discussion of Section 14.6 regarding testing for a unit
autoregressive root. We first describe another test for a unit autoregressive root,
the so-called DF-GLS test. Next, in an optional mathematical section, we discuss
why unit root test statistics do not have normal distributions, even in large samples.

The DF-GLS test. The ADF test was the first test developed for testing the null
hypothesis of a unit root and is the most commonly used test in practice. Other tests
subsequently have been proposed, however, many of which have higher power
(Key Concept 3.5) than the ADF test. A test with higher power than the ADF test
is more likely to reject the null hypothesis of a unit root against the stationary alternative when the alternative is true; thus, a more powerful test is better able to distinguish between a unit AR root and a root that is large but less than 1.
This section discusses one such test, the DF-GLS test developed by Elliott,
Rothenberg, and Stock (1996). The test is introduced for the case that, under the


FIGURE 16.1   The Logarithm of the Price Level and the Inflation Rate in the United States, 1960-2004
[(a) United States: logarithm of the CPI; (b) United States: CPI inflation, percent per annum; both plotted against time, 1960-2004.]
The trend in the logarithm of prices (Figure 16.1a) is much smoother than the trend in inflation (Figure 16.1b).



null h) pothc-.b. Y1 ha:- a random walk trend. possibly with drift. tancl under the
altcm a t h c Y1 is stationary around a linear 1ime trend.
The DF-GLS test t::. computed in two s te ps. In the flfST step, th1.. tnl~,;rccpt "'' 1
trend a rc e timated by generalized least squares (GLS; ::.ce Section 15.5). The (,J "i
estimation is performed by computing three new varia bles, V1 X 1 and "' , '"h
V 1 = Y1 and V1 = Y, - u*Y,_ 1 t = 2, ... , T. X 11 = 1 and X 11 = 1 n t ,
.,
T. a nd x21 c: 1 and x2J = ( - a (t - 1), where a is compute d U\ing the formul

a * = 1 - 13.5/ T. The n V, is regressed against X 1, and X 21: that i-i, OL<; is us~..d to
estimate the coe ft ici~nts ot the population regression equation
(16.1~)

using the observatio ns t = 1, ... , T. where e, is the error te rm. Note tha t t h~:rt: t
no inte rcept in the regressio n in Equation ( L6. t 8). 1l1e OLS estimators B11 and 51
a re then used to compute a " detrended" version of Y,. Yf1 = Y1 - (S0 + B1r).
In the second s te p, the Dickey-Fuller test is use d to test for a unit a utorc~r~..s
sive root in Y~ . whe re the Dickey-Fuller regressio n does not include an imen:~o.pt

Yf

Yf_

1
1
or a time tre nd.That 1s, ~
is regressed a gains1
1and L\Y;_ 1, .... L\Y; I'' wh1..1
t he number o f lags p is determined , as usual. either by expert kno wlcdg~o. nr b)
us ing a daUJ-based me thod such as the A I C o r BIC as discussed in Section 14 c;
lf the alternative hypothesis is that Y, is sta tio nary with a mea n that miglt be
nonzero but without a time trend, then the preceding ste ps are modified. Sp1..olt
cally, a li' tS comput ed using the formula a = 1 - 1/ T. X 2, is omitted from h~.:

regresston tn Equa tion ( 16.18), and the series Y~ is computed as Y1::: Y,- 11
The GLS regressto n in the Cirst step of the DF-GLS rest makes this test more
complicated than the conventional ADF test. but it is atso what im proves its .tl'l
ity to d iscriminate between the null hypothesis of a unit autoregrcs ive root ,, d
the a lterna ttve that Y, is s tationary. This improvement can be !>ubstantbl I r

example. suppose that Y, is in fact a stationary AR(l) with autoregre sive co Ill
cicnt {3 1 = 0.95. that the re are T = 200 obse rvations, and that the unit root tl "
a re computed wllhout a time trend [ lhat is. t is excluded from the D ickey-r t.lkr
regressio n. and X 2, h omitred from Equation (16.18)}. Then the probability th 11
the ADF test correctly rejects the null hypothesis at the 5% significance lev.:! ~~
approximately 31% compared to 75 % for the DF-GLS tes t.
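The first (GLS detrending) step for the detrended case can be sketched directly from the formulas above, with a* = 1 - 13.5/T; the second (Dickey-Fuller) step is omitted here:

```python
import numpy as np

def dfgls_detrend(y):
    """GLS-detrend a series as in the first step of the DF-GLS test
    (detrended case, a* = 1 - 13.5/T)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    a = 1 - 13.5 / T
    t = np.arange(1, T + 1, dtype=float)
    # Quasi-differenced variables; the first observation is kept in levels.
    V  = np.r_[y[0], y[1:] - a * y[:-1]]
    X1 = np.r_[1.0, np.full(T - 1, 1 - a)]
    X2 = np.r_[1.0, t[1:] - a * t[:-1]]
    d0, d1 = np.linalg.lstsq(np.column_stack([X1, X2]), V, rcond=None)[0]
    return y - (d0 + d1 * t)   # the detrended series Y^d

rng = np.random.default_rng(0)
yd = dfgls_detrend(np.cumsum(rng.standard_normal(200)) + 0.1 * np.arange(200))
print(yd[:3])
```

Note the regression of V on X1 and X2 has no separate intercept, matching Equation (16.18); the Dickey-Fuller regression would then be run on `yd` without intercept or trend.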

Critical values for the DF-GLS test. Because the coefficients on the deterministic terms are estimated differently in the ADF and DF-GLS tests, the tests have
different critical values. The critical values for the DF-GLS test are given in Table
16.1. If the DF-GLS test statistic (the t-statistic on Y_{t-1}^d in the regression in the second step) is less than the critical value, then the null hypothesis that Y_t has a unit

TABLE 16.1   Critical Values of the DF-GLS Test

Deterministic Regressors [Regressors in Equation (16.18)]     10%      5%      1%
Intercept only (X_{1t} only)                                 -1.62    -1.95   -2.58
Intercept and time trend (X_{1t} and X_{2t})                 -2.57    -2.89   -3.48

Source: Fuller (1976) and Elliott, Rothenberg, and Stock (1996, Table 1).

root is rejected. Like the critical values for the Dickey-Fuller test, the appropriate
critical value depends on which version of the test is used, that is, on whether or
not a time trend is included [whether or not X_{2t} is included in Equation (16.18)].

Application to inflation. The DF-GLS statistic, computed for the rate of CPI
inflation, Inf_t, over the period 1962:I to 2004:IV with an intercept but no time
trend, is -2.06 when three lags of ΔY_t^d are included in the Dickey-Fuller regression in the second stage. This value is less than the 5% critical value in Table 16.1,
-1.95, so using the DF-GLS test with three lags leads to rejecting the null hypothesis of a unit root at the 5% significance level. The choice of three lags was based
on the AIC (out of a maximum of six lags).
Because the DF-GLS test is better able to discriminate between the unit root
null hypothesis and the stationary alternative, one interpretation of this finding is
that inflation is in fact stationary, but the Dickey-Fuller test implemented in Section 14.6 failed to detect this (at the 5% level). This conclusion, however, should
be tempered by noting that whether the DF-GLS test rejects the null hypothesis
is, in this application, sensitive to the choice of lag length. If the test is based on
two lags, which is the number of lags chosen by BIC, it rejects the null hypothesis
at the 10% but not the 5% level. The result is also sensitive to the choice of sample; if the statistic is instead computed over the period 1963:I to 2004:IV (that is,
dropping just the first year), the test rejects the null hypothesis at the 10% level
but not at the 5% level using AIC lag lengths. The overall picture therefore is
rather ambiguous [as it is based on the ADF test, as discussed following Equation
(14.34)] and requires the forecaster to make an informed judgment about whether
it is better to model inflation as I(1) or stationary.

Why Do Unit Root Tests
Have Non-normal Distributions?

In Section 14.6, it was stressed that the large-sample normal distribution upon
which regression analysis relies so heavily does not apply if the regressors are


nonstationary. Under the null hypothesis that the regression contains a unit root,
the regressor Y_{t-1} in the Dickey-Fuller regression (and the regressor Y_{t-1}^d in the
modified Dickey-Fuller regression in the second step of the DF-GLS test) is
nonstationary. The non-normal distribution of the unit root test statistics is a consequence of this nonstationarity.
To gain some mathematical insight into this non-normality, consider the simplest possible Dickey-Fuller regression, in which ΔY_t is regressed against the single regressor Y_{t-1} and the intercept is excluded. In the notation of Key Concept
14.5, the OLS estimator in this regression is δ̂ = ∑_{t=1}^T Y_{t-1}ΔY_t / ∑_{t=1}^T Y²_{t-1}, so that

    Tδ̂ = [(1/T)∑_{t=1}^T Y_{t-1}ΔY_t] / [(1/T²)∑_{t=1}^T Y²_{t-1}].   (16.19)

Consider the numerator in Equation (16.19). Under the additional assumption that Y_0 = 0, a bit of algebra (Exercise 16.5) shows that

    (1/T)∑_{t=1}^T Y_{t-1}ΔY_t = (1/2)(Y_T²/T) - (1/2)(1/T)∑_{t=1}^T (ΔY_t)².   (16.20)

Under the null hypothesis, ΔY_t = u_t, which is serially uncorrelated and has a
finite variance, so (1/T)∑_{t=1}^T (ΔY_t)² converges in probability to σ_u². Under the assumption that Y_0 = 0, the term Y_T/√T in Equation (16.20) can be written Y_T/√T = (1/√T)∑_{t=1}^T ΔY_t = (1/√T)∑_{t=1}^T u_t, which in turn obeys
the central limit theorem; that is, Y_T/√T converges in distribution to N(0, σ_u²). Thus (Y_T/√T)²
converges in distribution to σ_u²Z², where Z is a standard normal random variable.
Recall, however, that the square of a standard normal distribution has a chi-squared distribution with one degree of freedom. It therefore follows from Equation (16.20) that, under the null hypothesis, the numerator in Equation (16.19) has
the limiting distribution

    (1/T)∑_{t=1}^T Y_{t-1}ΔY_t  →d  (1/2)σ_u²(χ²_1 - 1).   (16.21)

The large-sample distribution in Equation (16.21) is different than the usual
large-sample normal distribution when the regressor is stationary. Instead, the
numerator of the OLS estimator of the coefficient on Y_{t-1} in this Dickey-Fuller
regression has a distribution that is proportional to a chi-squared distribution with
one degree of freedom, minus 1.


This discussion has considered only the numerator of δ̂. The denominator
also behaves unusually under the null hypothesis: Because Y_t follows a random
walk under the null hypothesis, (1/T²)∑_{t=1}^T Y²_{t-1} does not converge in probability to a
constant. Instead, the denominator in Equation (16.19) is a random variable, even
in large samples: Under the null hypothesis, (1/T²)∑_{t=1}^T Y²_{t-1} converges in distribution
jointly with the numerator. The unusual distributions of the numerator and denominator in Equation (16.19) are the source of the nonstandard distribution of the
Dickey-Fuller test statistic and the reason that the ADF statistic has its own special table of critical values.

16.4 Cointegration

Sometimes two or more series have the same stochastic trend in common. In this
special case, referred to as cointegration, regression analysis can reveal long-run
relationships among time series variables, but some new methods are needed.

Cointegration and Error Correction

Two or more time series with stochastic trends can move together so closely over
the long run that they appear to have the same trend component; that is, they
appear to have a common trend. For example, two interest rates on U.S. government debt are plotted in Figure 16.2. One of the rates is the interest rate on 90-day
U.S. Treasury bills, at an annual rate (R90_t); the other is the interest rate on a one-year U.S. Treasury bond (R1yr_t); these interest rates are discussed in Appendix
16.1. The interest rates exhibit the same long-run tendencies or trends: Both were
low in the 1960s, both rose through the 1970s to peaks in the early 1980s, then both
fell through the 1990s. Moreover, the difference between the two series, R1yr_t - R90_t, which is called the "spread" between the two interest rates and is also plotted in Figure 16.2, does not appear to have a trend. That is, subtracting the 90-day
interest rate from the one-year interest rate appears to eliminate the trends in both
of the individual rates. Said differently, although the two interest rates differ, they
appear to share a common stochastic trend. Because the trend in each individual
series is eliminated by subtracting one series from the other, the two series must
have the same trend; that is, they must have a common stochastic trend.
Two or more series that have a common stochastic trend are said to be cointegrated. The formal definition of cointegration (due to the econometrician Clive
Granger; see the box on Clive Granger and Robert Engle) is given in Key
Concept 16.5. In this section, we introduce a test for whether cointegration is


FIGURE 16.2   One-Year Interest Rate, Three-Month Interest Rate, and Interest Rate Spread
[Time series plot, percent per annum.]
One-year and three-month interest rates share a common stochastic trend. The spread, or the difference, between the
two rates does not exhibit a trend. These two interest rates appear to be cointegrated.

present, discuss estimation of the coefficients of regressions relating cointegrated
variables, and illustrate the use of the cointegrating relationship for forecasting.
The discussion initially focuses on the case that there are only two variables, X_t
and Y_t.

Vector error correction model. Until now, we have eliminated the stochastic
trend in an I(1) variable Y_t by computing its first difference, ΔY_t; the problems created by stochastic trends were then avoided by using ΔY_t instead of Y_t in time
series regressions. If X_t and Y_t are cointegrated, however, another way to eliminate the trend is to compute Y_t - θX_t, where θ is chosen to eliminate the common
trend from the difference. Because the term Y_t - θX_t is stationary, it too can be
used in regression analysis.
In fact, if X_t and Y_t are cointegrated, the first differences of X_t and Y_t can be
modeled using a VAR, augmented by including Y_{t-1} - θX_{t-1} as an additional
regressor:

    ΔY_t = β_{10} + β_{11}ΔY_{t-1} + ... + β_{1p}ΔY_{t-p}
        + γ_{11}ΔX_{t-1} + ... + γ_{1p}ΔX_{t-p} + α_1(Y_{t-1} - θX_{t-1}) + u_{1t}
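The idea that Y_t - θX_t removes a common stochastic trend is easy to see in a small simulation with a known, hypothetical cointegrating coefficient θ = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
trend = np.cumsum(rng.standard_normal(T))    # common stochastic trend (random walk)
x = trend + rng.standard_normal(T)           # X_t: trend plus stationary noise
theta = 2.0
y = theta * trend + rng.standard_normal(T)   # Y_t shares the trend, scaled by theta

ect = y - theta * x                          # error-correction term: the trend cancels
print(np.std(y), np.std(ect))
```

Both `x` and `y` wander like random walks, but `ect` reduces to the stationary combination of the two noise terms, which is exactly the Y_{t-1} - θX_{t-1} regressor used in the error correction model above. In practice, of course, θ must be known or estimated.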


Robert Engle and Clive Granger, Nobel Laureates

In 2003, two econometricians, Robert F. Engle and Clive W. J. Granger, won the Nobel Prize in economics for fundamental theoretical research in time
series econometrics that they did in the late 1970s
and early 1980s.
Granger's work focused on how to handle stochastic
trends in economic time series data. From earlier
work by himself and others, he knew that two unrelated
series with stochastic trends could, by the usual statistical
measures of t-statistics and regression R², falsely
appear to be meaningfully related; this is the "spurious regression" problem. In the 1970s, the standard
practice was to use differences of time series data to
avoid the risk of a spurious regression. For this reason, Granger was skeptical of some recent work by
some British econometricians (Davidson, Hendry,
Srba, and Yeo, 1978), who claimed that the lagged
difference between log consumption and log income
(lnC_{t-1} - lnY_{t-1}) was a valuable predictor of the
growth rate of consumption (ΔlnC_t). Because lnC_t
and lnY_t individually have a unit root, the conventional wisdom was that they should be included in
first differences, because including them in levels
would produce a version of a spurious regression.
Granger set out to prove mathematically that the
British authors had made a mistake, but instead proved
that their specification was correct: There is a well-defined mathematical representation, the vector
error correction model, for time series that are individually I(1) but for which a linear combination is
I(0). He termed this situation "cointegration." In subsequent work with his colleague at the University of
California at San Diego, Robert Engle, Granger proposed several tests for cointegration, most notably
the Engle-Granger ADF test described in Section
16.4. The methods of cointegration analysis are now
a staple in modern macroeconometrics.
Around the same time, Robert Engle was pondering the striking increase in the volatility of U.S.
inflation during the late 1970s (see Figure 16.1b). If
the volatility of inflation had increased, he reasoned,
then prediction intervals for inflation forecasts should be
wider than the models of the day would indicate,
because those models held the variance of inflation
constant. But how, precisely, can you forecast the time-varying variance (which
you do not observe) of an error term (which you also
do not observe)?
Engle's answer was to develop the autoregressive
conditional heteroskedasticity (ARCH) model,
described in Section 16.5. The ARCH model and its
extensions, developed mainly by Engle and his students, proved especially useful for modeling the
volatility of asset returns, and the resulting volatility
forecasts can be used to price financial derivatives
and to assess changes over time in the risk of holding financial assets. Today, measures and forecasts of
volatility are a core component of financial econometrics, and the ARCH model and its descendants
are the workhorse tools for modeling volatility.

658   CHAPTER 16   Additional Topics in Time Series Regression

KEY CONCEPT 16.5  COINTEGRATION

Suppose X_t and Y_t are integrated of order one. If, for some coefficient θ, Y_t - θX_t is integrated of order zero, then X_t and Y_t are said to be cointegrated. The coefficient θ is called the cointegrating coefficient.

If X_t and Y_t are cointegrated, then they have the same, or common, stochastic trend. Computing the difference Y_t - θX_t eliminates this common stochastic trend.
ΔX_t = β20 + β21ΔY_{t-1} + ... + β2pΔY_{t-p} + γ21ΔX_{t-1} + ... + γ2pΔX_{t-p} + α2(Y_{t-1} - θX_{t-1}) + u2t.   (16.23)

The term Y_t - θX_t is called the error correction term. The combined model in Equations (16.22) and (16.23) is called a vector error correction model (VECM). In a VECM, past values of Y_t - θX_t help to predict future values of ΔY_t and/or ΔX_t.
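When θ is known, each equation of a VECM is an ordinary linear regression, so it can be estimated by OLS with the lagged error correction term as an extra regressor. Below is a minimal sketch of this idea, assuming NumPy and using simulated data (θ = 1, one lag of first differences, and a made-up error-correction coefficient of -0.5); none of the variable names come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000

# Simulate a cointegrated pair: X is a random walk, Y error-corrects toward X.
X = np.cumsum(rng.normal(size=T))
Y = np.empty(T)
Y[0] = X[0]
for t in range(1, T):
    # True error-correction coefficient alpha1 = -0.5, theta = 1 (illustrative)
    Y[t] = Y[t - 1] - 0.5 * (Y[t - 1] - X[t - 1]) + rng.normal()

dY, dX = np.diff(Y), np.diff(X)
theta = 1.0
z = Y - theta * X  # error correction term

# Regressors for Equation (16.22) with p = 1:
# regress dY_t on [1, dY_{t-1}, dX_{t-1}, z_{t-1}]
lhs = dY[1:]
rhs = np.column_stack([np.ones(T - 2), dY[:-1], dX[:-1], z[1:-1]])
beta, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
print(beta)  # last coefficient estimates alpha1 (about -0.5 here)
```

The same regression run without the z_{t-1} column would discard exactly the information that makes the VECM forecast better than a plain VAR in first differences.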

How Can You Tell Whether
Two Variables Are Cointegrated?

There are three ways to decide whether two variables can plausibly be modeled as cointegrated: use expert knowledge and economic theory, graph the series and see whether they appear to have a common stochastic trend, and perform statistical tests for cointegration. All three methods should be used in practice.

First, you must use your expert knowledge of these variables to decide whether cointegration is in fact plausible. For example, the two interest rates in Figure 16.2 are linked together by the so-called expectations theory of the term structure of interest rates. According to this theory, the interest rate on January 1 on the one-year Treasury bond is the average of the interest rate on a 90-day Treasury bill for the first quarter of the year and the expected interest rates on future 90-day Treasury bills issued in the second, third, and fourth quarters of the year; if not, then investors could expect to make money by holding either the one-year Treasury note or a sequence of four 90-day Treasury bills, and they would bid up prices until the expected returns are equalized. If the 90-day interest rate has a random walk stochastic trend, this theory implies that this stochastic trend is inherited by the one-year interest rate and that the difference between the two rates, that is, the spread, is stationary. Thus the expectations theory of the term structure implies that, if the interest rates are I(1), then they will be cointegrated with a cointegrating coefficient of θ = 1 (Exercise 16.2).
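The logic of a common stochastic trend can be seen directly in a simulation: give two invented series the same random walk component, and each series is I(1) while their difference is I(0). A minimal sketch, assuming NumPy (the names short_rate and long_rate are purely illustrative, not the data used in this chapter):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000

trend = np.cumsum(rng.normal(size=T))      # shared stochastic trend
short_rate = trend + rng.normal(size=T)    # I(1): trend plus stationary noise
long_rate = trend + rng.normal(size=T)     # I(1), same trend (theta = 1)
spread = long_rate - short_rate            # I(0): the common trend cancels

# The levels wander (their sample variance grows with the span of the data),
# while the spread stays near its mean.
print(np.var(short_rate))
print(np.var(spread))  # stays near Var of the two noise terms, about 2
```

Plotting the three series reproduces the visual pattern described for Figure 16.2: two wandering levels and a spread that looks stationary.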


Second, visual inspection of the series helps to identify cases in which cointegration is plausible. For example, the graph of the two interest rates in Figure 16.2 shows that each of the series appears to be I(1) but that the spread appears to be I(0), so the two series appear to be cointegrated.

Third, the unit root testing procedures introduced so far can be extended to test for cointegration. The insight on which these tests are based is that if Y_t and X_t are cointegrated with cointegrating coefficient θ, then Y_t - θX_t is stationary; otherwise, Y_t - θX_t is nonstationary [it is I(1)]. The hypothesis that Y_t and X_t are not cointegrated [that is, that Y_t - θX_t is I(1)] therefore can be tested by testing the null hypothesis that Y_t - θX_t has a unit root; if this hypothesis is rejected, then Y_t and X_t can be modeled as cointegrated. The details of this test depend on whether the cointegrating coefficient θ is known.

Testing for cointegration when θ is known.  In some cases expert knowledge or economic theory suggests values of θ. When θ is known, the Dickey-Fuller and DF-GLS unit root tests can be used to test for cointegration by first constructing the series z_t = Y_t - θX_t, then testing the null hypothesis that z_t has a unit autoregressive root.

Testing for cointegration when θ is unknown.  If the cointegrating coefficient θ is unknown, then it must be estimated prior to testing for a unit root in the error correction term. This preliminary step makes it necessary to use different critical values for the subsequent unit root test.
Specifically, in the first step the cointegrating coefficient θ is estimated by OLS estimation of the regression

Y_t = α + θX_t + z_t.   (16.24)

In the second step, a Dickey-Fuller t-test (with an intercept but no time trend) is used to test for a unit root in the residual from this regression, ẑ_t. This two-step procedure is called the Engle-Granger Augmented Dickey-Fuller test for cointegration, or EG-ADF test (Engle and Granger, 1987).

Critical values of the EG-ADF statistic are given in Table 16.2. The critical values in the first row apply when there is a single regressor in Equation (16.24), so that there are two cointegrated variables (X_t and Y_t). The subsequent rows apply to the case of multiple cointegrated variables, which is discussed at the end of this section.

[Footnote: The critical values in Table 16.2 are taken from Fuller (1976) and Phillips and Ouliaris (1990). Following a suggestion by Hansen (1992), the critical values in Table 16.2 are chosen so that they apply whether or not X_t and Y_t have drift components.]
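Both steps of the test are ordinary least squares regressions, so the procedure can be sketched compactly. The code below, assuming only NumPy, runs the first-stage cointegrating regression and then a Dickey-Fuller t-test on its residuals; for brevity it omits the augmentation lags (so the second step is the simple DF version rather than the ADF), and the -3.41 comparison value is the 5% entry in the first row of Table 16.2.

```python
import numpy as np

def eg_df_test(y, x):
    """Two-step Engle-Granger test: OLS of y on x, then a DF t-test
    (intercept, no augmentation lags) on the residual z-hat."""
    T = len(y)
    # Step 1: cointegrating regression y_t = alpha + theta * x_t + z_t
    X1 = np.column_stack([np.ones(T), x])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    z = y - X1 @ coef
    # Step 2: DF regression  dz_t = mu + rho * z_{t-1} + error
    dz, zlag = np.diff(z), z[:-1]
    X2 = np.column_stack([np.ones(T - 1), zlag])
    b, *_ = np.linalg.lstsq(X2, dz, rcond=None)
    resid = dz - X2 @ b
    s2 = resid @ resid / (T - 1 - 2)               # error variance estimate
    se_rho = np.sqrt(s2 * np.linalg.inv(X2.T @ X2)[1, 1])
    return coef[1], b[1] / se_rho                  # (theta-hat, DF t-statistic)

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=500))
y = 0.5 + 2.0 * x + rng.normal(size=500)   # cointegrated by construction, theta = 2
theta_hat, t_stat = eg_df_test(y, x)
# Reject "no cointegration" when t_stat < -3.41 (5% critical value, Table 16.2)
print(theta_hat, t_stat, t_stat < -3.41)
```

With nonstationary residuals (no cointegration), the same statistic would typically fall well above -3.41, and the null would not be rejected.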

TABLE 16.2  Critical Values for the Engle-Granger ADF Statistic

Number of X's in Equation (16.24)     10%      5%      1%
1                                    -3.12    -3.41   -3.96
2                                    -3.52    -3.80   -4.36
3                                    -3.84    -4.16   -4.73
4                                    -4.20    -4.49   -5.07

Estimation of Cointegrating Coefficients

If X_t and Y_t are cointegrated, then the OLS estimator of the coefficient in the cointegrating regression in Equation (16.24) is consistent. However, in general the OLS estimator has a non-normal distribution, and inferences based on its t-statistics can be misleading whether or not those t-statistics are computed using HAC standard errors. Because of these drawbacks of the OLS estimator of θ, econometricians have developed a number of other estimators of the cointegrating coefficient.

One such estimator of θ that is simple to use in practice is the dynamic OLS (DOLS) estimator (Stock and Watson, 1993). The DOLS estimator is based on a modified version of Equation (16.24) that includes past, present, and future values of the change in X_t:

Y_t = β0 + θX_t + Σ_{j=-p}^{p} δ_j ΔX_{t-j} + u_t.   (16.25)

Thus, in Equation (16.25), the regressors are X_t, ΔX_{t+p}, ..., ΔX_{t-p}. The DOLS estimator of θ is the OLS estimator of θ in the regression of Equation (16.25).

If X_t and Y_t are cointegrated, then the DOLS estimator is efficient in large samples. Moreover, statistical inferences about θ and the δ's in Equation (16.25) based on HAC standard errors are valid. For example, the t-statistic constructed using the DOLS estimator with HAC standard errors has a standard normal distribution in large samples.

One way to interpret Equation (16.25) is to recall from Section 15.3 that cumulative dynamic multipliers can be computed by modifying the distributed lag regression of Y_t on X_t and its lags. Specifically, in Equation (15.7), the cumulative dynamic multipliers were computed by regressing Y_t on ΔX_t, lags of ΔX_t, and X_{t-q}; the coefficient on X_{t-q} in that specification is the long-run cumulative dynamic multiplier. Similarly, if X_t were strictly exogenous, then in Equation (16.25), the coefficient on X_t, θ, would be the long-run cumulative multiplier, that is, the


long-run effect on Y of a change in X. If X_t is not strictly exogenous, then the coefficient on X_t does not have this interpretation. Nevertheless, because X_t and Y_t have a common stochastic trend if they are cointegrated, the DOLS estimator is consistent even if X_t is endogenous.

The DOLS estimator is not the only efficient estimator of the cointegrating coefficient. The first such estimator was developed by Søren Johansen (Johansen, 1988). For a discussion of Johansen's method and of other ways to estimate the cointegrating coefficient, see Hamilton (1994, Chapter 20).

Even if economic theory does not suggest a specific value of the cointegrating coefficient, it is important to check whether the estimated cointegrating relationship makes sense in practice. Because cointegration tests can be misleading (they can improperly reject the null hypothesis of no cointegration more frequently than they should, and frequently they improperly fail to reject the null), it is especially important to rely on economic theory, institutional knowledge, and common sense when estimating and using cointegrating relationships.
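Since DOLS is just OLS applied to Equation (16.25), a compact implementation is possible. The sketch below assumes NumPy, uses p = 2 leads and lags, and tries itself out on simulated data with a true θ of 2; the HAC standard errors needed for valid inference are omitted to keep the example short.

```python
import numpy as np

def dols(y, x, p=2):
    """DOLS estimate of theta: OLS of y_t on x_t and
    dx_{t+p}, ..., dx_{t-p} as in Equation (16.25)."""
    dx = np.diff(x)                 # dx[i] = x[i+1] - x[i]
    T = len(y)
    rows = range(p + 1, T - p)      # keep only t with all leads/lags available
    regs = []
    for t in rows:
        # regressor row: [1, x_t, dx_{t+p}, ..., dx_{t-p}]
        regs.append([1.0, x[t]] + [dx[t + j - 1] for j in range(p, -p - 1, -1)])
    X = np.array(regs)
    b, *_ = np.linalg.lstsq(X, y[p + 1:T - p], rcond=None)
    return b[1]                     # coefficient on x_t is theta-hat

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=1000))      # I(1) regressor
y = 1.0 + 2.0 * x + rng.normal(size=1000) # cointegrated, theta = 2
print(dols(y, x))  # close to the true theta = 2
```

In practice the point estimate would be paired with a HAC standard error (e.g., Newey-West) before any t-statistic is reported, as the text emphasizes.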

Extension to Multiple Cointegrated Variables

The concepts, tests, and estimators discussed here extend to more than two variables. For example, if there are three variables, Y_t, X_1t, and X_2t, each of which is I(1), then they are cointegrated with cointegrating coefficients θ1 and θ2 if Y_t - θ1X_1t - θ2X_2t is stationary. When there are three or more variables, there can be multiple cointegrating relationships. For example, consider modeling the relationship among three interest rates: the three-month rate (R90_t), the one-year rate (R1yr_t), and the five-year rate (R5yr_t). If they are I(1), then the expectations theory of the term structure of interest rates suggests that they will all be cointegrated. One cointegrating relationship suggested by the theory is R1yr_t - R90_t, and a second relationship is R5yr_t - R90_t. (The relationship R5yr_t - R1yr_t is also a cointegrating relationship, but it contains no additional information beyond that in the other relationships because it is perfectly multicollinear with the other two cointegrating relationships.)

The EG-ADF procedure for testing for a single cointegrating relationship among multiple variables is the same as for the case of two variables, except that the regression in Equation (16.24) is modified so that both X_1t and X_2t are regressors; the critical values for the EG-ADF test are given in Table 16.2, where the appropriate row depends on the number of regressors in the first-stage OLS cointegrating regression. The DOLS estimator of a single cointegrating relationship among multiple X's involves including the level of each X along with leads and lags of the first difference of each X. Tests for multiple cointegrating relationships can



be performed using the syMcm methods, such as Johansen., (1988) ml'l hod, ami the

DOLS C!)timator can be extended to multiple cointegrat.ing rclation.,hip' b.. c"ti


mating multiple equations, one for each cointegrating relauonshtp. For :~tlc.htional
dtscussion of cointegration methods for multiple variables. !)Ce Hamtlton ( 1994).
A cautionary note. If two or more variables arc cointegratt.:d then lhe error
correction term can help to foreC<l~t these vanables and possibl~ othc:r r~.' tted
variables. However. cointegra tlon requires the va r iabl~ to hav~. the "arne <.to
chasuc trends. 'fiends io economic variables typicaUy arise lrom complex intcracuons of d Lsparate forces, and closely related series can have dt(ferent trenJs for
subtle reasons. If variables that are not cointegrated are incorrect!~ mod~led u~ tng
a VECM. the n tbe e rror correction term wiU be /(1): this introduces a trend imo
ll1e fo recast that can result in poor out-of-sample forecast perlormance. Thus forecasting using a VECM must be based on a combination of compe lling theoretic.il
arguments in favor of cointegration and careful empirical ana lysis.

Application to Interest Rates

As discussed earlier, the expectations theory of the term structure of interest rates implies that, if two interest rates of different maturities are I(1), then they will be cointegrated with a cointegrating coefficient of θ = 1; that is, the spread between the two rates will be stationary. Inspection of Figure 16.2 provides qualitative support for the hypothesis that the one-year and three-month interest rates are cointegrated. We first use unit root and cointegration test statistics to provide more formal evidence on this hypothesis, then estimate a vector error correction model for these two interest rates.

Unit root and cointegration tests.  Various unit root and cointegration test statistics for these two series are reported in Table 16.3. The unit root test statistics in the first two rows examine the hypothesis that the two interest rates, the three-month rate (R90) and the one-year rate (R1yr), individually have a unit root. Two of the four statistics in the first two rows fail to reject this hypothesis at the 10% level, and three of the four fail to reject at the 5% level. The exception is the ADF statistic evaluated for the 90-day Treasury bill rate (-2.96), which rejects the unit root hypothesis at the 5% level. The ADF and DF-GLS statistics lead to different conclusions for this variable (the ADF test rejects the unit root hypothesis at the 5% level while the DF-GLS test does not), which means that we must exercise some judgment in deciding whether these variables are plausibly modeled as I(1). Taken together, these results suggest that the interest rates are plausibly modeled as I(1).

TABLE 16.3  Unit Root and Cointegration Test Statistics for Two Interest Rates

Series                 ADF Statistic     DF-GLS Statistic
R90                    -2.96*            -1.88
R1yr                   -2.22             -1.37
R1yr - R90             -6.31**           -5.59**
R1yr - 1.046R90        -6.97**           n.a.

R90 is the interest rate on 90-day U.S. Treasury bills, at an annual rate, and R1yr is the interest rate on one-year U.S. Treasury bonds. Regressions were estimated using quarterly data over the period 1962:I-1999:IV. The number of lags in the unit root test statistic regressions was chosen by AIC (six lags maximum). Unit root test statistics are significant at the *5% or **1% significance level.

The unit root statistics for the spread, R1yr_t - R90_t, test the further hypothesis that these variables are not cointegrated against the alternative that they are. The null hypothesis that the spread contains a unit root is rejected at the 1% level using both unit root tests. Thus we reject the hypothesis that the series are not cointegrated against the alternative that they are, with a cointegrating coefficient θ = 1. Taken together, the evidence in the first three rows of Table 16.3 suggests that these variables plausibly can be modeled as cointegrated with θ = 1.

Because in this application economic theory suggests a value for θ (the expectations theory of the term structure suggests that θ = 1) and because the error correction term is I(0) when this value is imposed (the spread is stationary), in principle it is not necessary to use the EG-ADF test, in which θ is estimated. Nevertheless, we compute the test as an illustration. The first step in the EG-ADF test is to estimate θ by the OLS regression of one variable on the other; the result is

R̂1yr_t = 0.361 + 1.046R90_t,  R̄² = 0.973.   (16.26)

The second step is to compute the ADF statistic for the residual from this regression, ẑ_t. The result, given in the final row of Table 16.3, is less than the 1% critical value of -3.96 in Table 16.2, so the null hypothesis that ẑ_t has a unit autoregressive root is rejected. This statistic also points toward treating the two interest rates as cointegrated. Note that no standard errors are presented in Equation (16.26) because, as previously discussed, the OLS estimator of the cointegrating coefficient has a non-normal distribution and its t-statistic is not normally distributed, so presenting standard errors (HAC or otherwise) would be misleading.

A vector error correction model of the two interest rates.  If Y_t and X_t are cointegrated, then forecasts of ΔY_t and ΔX_t can be improved by augmenting a



~ 1~

und :lX, hy the J.uH!.ed \aluc: olthc error corn:ctiun tcrm .that i hv
~{lmputing lur.:casto; u... ing thc VEC\t in rquatiuns (lfl.22) .tnJ ti6.2 3J. II t) IS
J...no,,n then the unknov.n C<lefficicnts of the; \! [C\1 can he e'tinwtcJ by OLs.
mcludm\!,- _ - }'1 1 - H \
ac; an aJdllilln,tl regressor. II H jc; unkno"n . then the
\ [.(. \l CJil be ''timatcJ U'>JOg: 1 aS 3 f' \!fi..:~"O \\ here : 1 = l'1 - 0.'(,. \\here fi jc;
an t::'lllll3LUJ tl 11.
Tn the .tpplication to the: two imcre"t r.ll~~ rhcol) suggc~t~ that tJ = I anJ the:
unit ro111 h:'h .... uppMt modeling the''' o inh:rest ratcs as Ctlintcgt ntc<.l "ith tl.'l.ttn
tegr ttim coellicient of I We lhcrdore c;pcctr~ the VEC\T u'ing th thctlr ti~o nll~
Sll\!!!t:"t J \ Jue Ol () = 1. th.tt 1:-.. b\ aJJing the Jagged V'lh: I f tl c; spr ctU.

vAR n

Rlw1

; -

/N01

Jutc:rl.'nCI.'li.th~

tu

\AR

tcsulung \

~ N1il,

10 ~Rhr,

anJ ~f?\JO .. SpecHic;J With

1\\0 l.t~

of fir~t

LC~11s

Δ̂R90_t = 0.14 - 0.24ΔR90_{t-1} + 0.44ΔR90_{t-2} + 0.01ΔR1yr_{t-1} + 0.15ΔR1yr_{t-2} - 0.18(R1yr_{t-1} - R90_{t-1})   (16.27)
         (0.17)  (0.32)           (0.34)           (0.39)            (0.27)            (0.27)

Δ̂R1yr_t = 0.36 - 0.14ΔR90_{t-1} - 0.11ΔR90_{t-2} - 0.11ΔR1yr_{t-1} - 0.13ΔR1yr_{t-2} - 0.72(R1yr_{t-1} - R90_{t-1})   (16.28)
          (0.16)  (0.30)           (0.29)           (0.25)            (0.35)            (0.33)

In the first equation, none of the coefficients is individually significant at the 5% level, and the coefficients on the lagged first differences of the interest rates are not jointly significant at the 5% level. In the second equation, the coefficients on the lagged first differences are not jointly significant, but the coefficient on the lagged spread (the error correction term), which is estimated to be -0.72, has a t-statistic of -2.17, so it is statistically significant at the 5% level. Although lagged values of the first differences of the interest rates are not useful for predicting future interest rates, the lagged spread does help to predict the change in the one-year Treasury bond rate: When the one-year rate exceeds the 90-day rate, the one-year rate is forecasted to fall in the future.

16.5 Volatility Clustering and Autoregressive Conditional Heteroskedasticity

The phenomenon that some times are tranquil while others are not, that is, that volatility comes in clusters, shows up in many economic time series. This section

FIGURE 16.3  Daily Percentage Changes in the NYSE Index, 1990-2005

[Figure: daily percentage price changes, 1990-2006; vertical axis in percent.]

Daily NYSE percentage price changes exhibit volatility clustering, in which there are some periods of high volatility, such as in the late 1990s, and other periods of relative tranquility, such as in the mid-1990s.

presents a pair of models for quantifying volatility clustering or, as it is also known, conditional heteroskedasticity.

Volatility Clustering

The volatility of many financial and macroeconomic variables changes over time. For example, daily percentage changes in the New York Stock Exchange (NYSE) stock price index, shown in Figure 16.3, exhibit periods of high volatility, such as in 1990 and 2003, and other periods of low volatility, such as in 1995. A series with some periods of low volatility and some periods of high volatility is said to exhibit volatility clustering. Because the volatility appears in clusters, the variance of the daily percentage price change in the NYSE index can be forecasted, even though the daily price change itself is very difficult to forecast.

Forecasting the variance of a series is of interest for several reasons. First, the variance of an asset price is a measure of the risk of owning that asset: The larger the variance of daily stock price changes, the more a stock market participant stands to gain, or to lose, on a typical day. An investor who is worried about risk would be less tolerant of participating in the stock market during a period of high rather than low volatility.


Second, the value of some financial derivatives, such as options, depends on the variance of the underlying asset. An options trader wants the best available forecasts of future volatility to help him or her know the price at which to buy or sell options.

Third, forecasting variances makes it possible to have accurate forecast intervals. Suppose you are forecasting the rate of inflation. If the variance of the forecast error is constant, then an approximate forecast confidence interval can be constructed along the lines discussed in Section 14.4, that is, as the forecast plus or minus a multiple of the SER. If, however, the variance of the forecast error changes over time, then the width of the forecast interval should change over time: At periods when inflation is subject to particularly large disturbances or shocks, the interval should be wide; during periods of relative tranquility, the interval should be tighter.

Volatility clustering can be thought of as clustering of the variance of the error term over time: If the regression error has a small variance in one period, its variance tends to be small in the next period, too. In other words, volatility clustering implies that the error exhibits time-varying heteroskedasticity.
Autoregressive Conditional Heteroskedasticity

Two models of volatility clustering are the autoregressive conditional heteroskedasticity (ARCH) model and its extension, the generalized ARCH (GARCH) model.

ARCH.  Consider the ADL(1,1) regression

Y_t = β0 + β1Y_{t-1} + γ1X_{t-1} + u_t.   (16.29)

In the ARCH model, which was developed by the econometrician Robert Engle (Engle, 1982; see the box on Clive Granger and Robert Engle), the error u_t is modeled as being normally distributed with mean zero and variance σ²_t, where σ²_t depends on past squared values of u_t. Specifically, the ARCH model of order p, denoted ARCH(p), is

σ²_t = α0 + α1u²_{t-1} + α2u²_{t-2} + ... + αpu²_{t-p},   (16.30)

where α0, α1, ..., αp are unknown coefficients. If these coefficients are positive, then if recent squared errors are large, the ARCH model predicts that the current squared error will be large in magnitude, in the sense that its variance, σ²_t, is large.

Although it is described here for the ADL(1,1) model in Equation (16.29), the ARCH model can be applied to the error variance of any time series regression


model with an error that has a conditional mean of zero, including higher-order ADL models, autoregressions, and time series regressions with multiple predictors.
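The mechanics of Equation (16.30) are easy to see in a simulation: in an ARCH(1) process a large shock raises next period's variance, so large errors cluster, even though the errors themselves are serially uncorrelated. A sketch with invented coefficients (α0 = 0.2, α1 = 0.5), assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20000
a0, a1 = 0.2, 0.5          # illustrative ARCH(1) coefficients, not estimates

u = np.zeros(T)
for t in range(1, T):
    sigma2 = a0 + a1 * u[t - 1] ** 2   # Equation (16.30) with p = 1
    u[t] = np.sqrt(sigma2) * rng.normal()

# u itself is serially uncorrelated, but u^2 is positively autocorrelated:
# that persistence in the squared errors is volatility clustering.
u2 = u ** 2
corr_u = np.corrcoef(u[1:], u[:-1])[0, 1]
corr_u2 = np.corrcoef(u2[1:], u2[:-1])[0, 1]
print(corr_u, corr_u2)  # corr_u near 0, corr_u2 clearly positive
```

Plotting u over time reproduces, in miniature, the bursty pattern of Figure 16.3.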

GARCH.  The generalized ARCH (GARCH) model, developed by the econometrician Tim Bollerslev (1986), extends the ARCH model to let σ²_t depend on its own lags as well as lags of the squared error. The GARCH(p,q) model is

σ²_t = α0 + α1u²_{t-1} + ... + αpu²_{t-p} + φ1σ²_{t-1} + ... + φqσ²_{t-q},   (16.31)

where α0, α1, ..., αp, φ1, ..., φq are unknown coefficients.

The ARCH model is analogous to a distributed lag model, and the GARCH model is analogous to an ADL model. As discussed in Appendix 15.2, the ADL model (when appropriate) can provide a more parsimonious model of dynamic multipliers than the distributed lag model. Similarly, by incorporating lags of σ²_t, the GARCH model can capture slowly changing variances with fewer parameters than the ARCH model.

An important application of ARCH and GARCH models is to measuring and forecasting the time-varying volatility of returns on financial assets, particularly assets observed at high sampling frequencies, such as the daily stock returns in Figure 16.3. In such applications, the return itself is often modeled as unpredictable, so the regression in Equation (16.29) only includes the intercept.
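Given coefficient values, the GARCH(1,1) conditional variance is generated by a one-line recursion, and the process fluctuates around the unconditional variance α0/(1 - α1 - φ1). A sketch assuming NumPy, with made-up coefficients rather than estimates:

```python
import numpy as np

def garch11_variance(u, a0, a1, phi1):
    """Conditional variance path from Equation (16.31) with p = q = 1."""
    sigma2 = np.empty(len(u))
    sigma2[0] = a0 / (1 - a1 - phi1)   # start at the unconditional variance
    for t in range(1, len(u)):
        sigma2[t] = a0 + a1 * u[t - 1] ** 2 + phi1 * sigma2[t - 1]
    return sigma2

# Illustrative coefficients: a persistent volatility process (a1 + phi1 = 0.95).
a0, a1, phi1 = 0.1, 0.05, 0.90
rng = np.random.default_rng(5)
u = rng.normal(size=1000) * 1.4        # stand-in residuals, variance about 2

sigma2 = garch11_variance(u, a0, a1, phi1)
long_run = a0 / (1 - a1 - phi1)
print(long_run, sigma2[-5:])  # the path fluctuates around the long-run variance
```

Multi-step variance forecasts from this recursion decay geometrically toward the long-run value at rate a1 + phi1, which is why a sum of coefficients near 1 means volatility shocks die out slowly.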

Estimation and inference.  ARCH and GARCH models are estimated by the method of maximum likelihood (Appendix 11.2). The estimators of the ARCH and GARCH coefficients are normally distributed in large samples, so in large samples t-statistics have standard normal distributions, and confidence intervals for a coefficient can be constructed as its maximum likelihood estimate ± 1.96 standard errors.
Application to Stock Price Volatility

A GARCH(1,1) model of the NYSE daily percentage stock price changes, estimated using data on all trading days from January 2, 1990, through November 11, 2005, is

R̂_t = 0.049,   (16.32)
      (0.012)

σ̂²_t = 0.0079 + 0.072u²_{t-1} + 0.919σ̂²_{t-1}.   (16.33)
       (0.0014) (0.011)         (0.006)


FIGURE 16.4  Daily Percentage Changes in the NYSE Index and GARCH(1,1) Bands

[Figure: daily percentage price changes with ±σ̂_t bands, 1990-2006; vertical axis in percent per annum.]

The GARCH(1,1) bands, which are ±σ̂_t computed using Equation (16.33), are narrow when the conditional variance is small and wide when it is large. The conditional volatility of stock price changes varies considerably over the 1990-2005 period.

No lagged predictors appear in Equation (16.32) because daily NYSE price changes are essentially unpredictable.

The two coefficients in the GARCH model (the coefficients on u²_{t-1} and σ²_{t-1}) are both individually statistically significant at the 5% significance level. One measure of the persistence of movements in the variance is the sum of the coefficients on u²_{t-1} and σ²_{t-1} in the GARCH model (Exercise 16.9). This sum (0.991) is large, indicating that changes in the conditional variance are persistent. Said differently, the estimated GARCH model implies that periods of high volatility in NYSE prices will be long-lasting. This implication is consistent with the long periods of volatility clustering seen in Figure 16.3.

The estimated conditional variance at date t, σ̂²_t, can be computed using the residuals from Equation (16.32) and the coefficients in Equation (16.33). Figure 16.4 plots bands of plus or minus one conditional standard deviation (that is, ±σ̂_t) based on the GARCH(1,1) model, along with deviations of the percentage price change series from its mean. The conditional standard deviation bands quantify the time-varying volatility of the daily price changes. During the mid-1990s, the conditional standard deviation bands are tight, indicating lower levels of risk for investors holding the NYSE index. In contrast, around the turn of the century, these conditional standard deviation bands are wide, indicating a period of greater daily stock price volatility.

16.6 Conclusion

This part of the book has covered some of the most frequently used tools and concepts of time series regression. Many other tools for analyzing economic time series have been developed for specific applications. If you are interested in learning more about economic forecasting, see the introductory textbooks by Enders (1995) and Diebold (2000). For an advanced treatment of econometrics with time series data, see Hamilton (1994).

Summary

1. Vector autoregressions model a "vector" of k time series variables as each depending on its own lags and the lags of the k − 1 other series. The forecasts of each of the time series produced by a VAR are mutually consistent, in the sense that they are based on the same information.

2. Forecasts two or more periods ahead can be computed either by iterating forward a one-step-ahead model (an AR or a VAR) or by estimating a multiperiod-ahead regression.

3. Two series that share a common stochastic trend are cointegrated; that is, Y_t and X_t are cointegrated if Y_t and X_t are I(1) but Y_t − θX_t is I(0). If Y_t and X_t are cointegrated, the error correction term Y_t − θX_t can help to predict ΔY_t and/or ΔX_t. A vector error correction model is a VAR model of ΔY_t and ΔX_t, augmented to include the lagged error correction term.

4. Volatility clustering, in which the variance of a series is high in some periods and low in others, is common in economic time series, especially financial time series.

5. The ARCH model of volatility clustering expresses the conditional variance of the regression error as a function of recent squared regression errors. The GARCH model augments the ARCH model to include lagged conditional variances as well. Estimated ARCH and GARCH models produce forecast intervals with widths that depend on the volatility of the most recent regression residuals.
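Summary point 2 can be sketched with an AR(1): iterating the one-step-ahead model forward h times reproduces the closed-form h-period-ahead forecast. The coefficient values below are arbitrary illustrations, and `iterated_ar1_forecast` is a hypothetical helper name.

```python
def iterated_ar1_forecast(y_t, beta0, beta1, h):
    # Apply the one-step-ahead model h times: f <- beta0 + beta1 * f
    f = y_t
    for _ in range(h):
        f = beta0 + beta1 * f
    return f

beta0, beta1, y_t = 1.0, 0.7, 5.0
two_step = iterated_ar1_forecast(y_t, beta0, beta1, h=2)

# Closed form: Y_{t+h|t} = mu + beta1**h * (y_t - mu), with mu = beta0 / (1 - beta1)
mu = beta0 / (1 - beta1)
closed_form = mu + beta1 ** 2 * (y_t - mu)
print(two_step, closed_form)   # the two agree
```

A direct multiperiod forecast would instead regress Y_t on Y_{t−h} and use that fitted equation once; the two approaches coincide only if the one-step model is correctly specified.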

CHAPTER 16   Additional Topics in Time Series Regression

Key Terms

vector autoregression (VAR) (638)
iterated multiperiod AR forecast (645)
iterated multiperiod VAR forecast (645)
direct multiperiod forecast (647)
second difference (649)
I(0), I(1), and I(2) (649)
order of integration (649)
integrated of order d (I(d)) (649)
DF-GLS test (650)
common trend (655)
cointegration (655)
cointegrating coefficient (657)
error correction term (657)
vector error correction model (657)
EG-ADF test (659)
DOLS estimator (660)
volatility clustering (663)
autoregressive conditional heteroskedasticity (ARCH) (664)
generalized ARCH (GARCH) (665)

Review the Concepts

16.1 A macroeconomist wants to construct forecasts for the following macroeconomic variables: GDP, consumption, investment, government purchases, exports, imports, short-term interest rates, long-term interest rates, and the rate of inflation. He has quarterly time series for each of these variables from 1970-2001. Should he estimate a VAR for these variables and use it for forecasting? Why or why not? Can you suggest an alternative approach?

16.2 Suppose that Y_t follows a stationary AR(1) model with β0 = 0 and β1 = 0.7. What is your forecast of Y_t+h (that is, what is Y_t+h|t) for h = 2? What is Y_t+h|t for h = 30? Does this forecast for h = 30 seem reasonable to you?

16.3 A version of the permanent income theory of consumption implies that the logarithm of real GDP (Y) and the logarithm of real consumption (C) are cointegrated with a cointegrating coefficient equal to 1. Explain how you would investigate this implication by (a) plotting the data and (b) using a statistical test.

16.4 Consider the ARCH model σ²_t = 1.0 + 0.6u²_t−1. Explain why this will lead to volatility clustering. (Hint: What happens when u_t−1 is unusually large?)

16.5 The DF-GLS test for a unit root has higher power than the Dickey-Fuller test. Why should you use a more powerful test?


Exercises

16.1 Suppose that Y_t follows the stationary AR(1) model Y_t = β0 + β1Y_t−1 + u_t.

a. Show that the h-period ahead forecast of Y_t is given by Y_t+h|t = μ_Y + β1^h (Y_t − μ_Y), where μ_Y = β0/(1 − β1).

b. Suppose that X_t is related to Y_t by X_t = Σ_{i=0}^∞ δ^i Y_t+i|t, where |δ| < 1. Show that X_t = μ_Y/(1 − δ) + (Y_t − μ_Y)/(1 − δβ1).

16.2 One version of the expectations theory of the term structure of interest rates holds that a long-term rate equals the average of the expected values of short-term interest rates into the future, plus a term premium that is I(0). Specifically, let Rk_t denote a k-period interest rate, let R1_t denote a one-period interest rate, and let e_t denote an I(0) term premium. Then Rk_t = (1/k) Σ_{i=0}^{k−1} R1_t+i|t + e_t, where R1_t+i|t is the forecast made at date t of the value of R1 at date t + i. Suppose that R1_t follows a random walk, so that R1_t = R1_t−1 + u_t.

a. Show that Rk_t = R1_t + e_t.

b. Show that Rk_t and R1_t are cointegrated. What is the cointegrating coefficient?

c. Now suppose that ΔR1_t = 0.5ΔR1_t−1 + u_t. How does your answer to (b) change?

d. Now suppose that R1_t = 0.5R1_t−1 + u_t. How does your answer to (b) change?

16.3 Suppose that u_t follows the ARCH process σ²_t = 1.0 + 0.5u²_t−1.

a. Let σ²_u = var(u_t) be the unconditional variance of u_t. Show that var(u_t) = 2.

b. Suppose that the distribution of u_t conditional on lagged values of u_t is N(0, σ²_t). If u_t−1 = 0.2, what is Pr(−3 ≤ u_t ≤ 3)? If u_t−1 = 2.1, what is Pr(−3 ≤ u_t ≤ 3)?

16.4 Suppose that Y_t follows the AR(p) model Y_t = β0 + β1Y_t−1 + ··· + βpY_t−p + u_t, where E(u_t | Y_t−1, Y_t−2, ...) = 0. Let Y_t+h|t = E(Y_t+h | Y_t, Y_t−1, ...). Show that Y_t+h|t = β0 + β1Y_t+h−1|t + ··· + βpY_t+h−p|t for h > p.

16.5 Verify Equation (16.20). [Hint: Use Σ_{t=1}^T Y²_t = Σ_{t=1}^T (Y_t−1 + ΔY_t)² to show that Σ_{t=1}^T Y²_t = Σ_{t=1}^T Y²_t−1 + 2 Σ_{t=1}^T Y_t−1 ΔY_t + Σ_{t=1}^T (ΔY_t)², and solve for Σ_{t=1}^T Y_t−1 ΔY_t.]


16.6 A regression of Y_t onto current, past, and future values of X_t yields

Y_t = 3.0 − 1.7X_t+1 − 0.8X_t − 0.2X_t−1 + u_t.

a. Rearrange the regression so that it has the form shown in Equation (16.25). What are the values of θ and the δ's?

b. i. Suppose that X_t is I(1) and u_t is I(1). Are Y and X cointegrated?

ii. Suppose that X_t is I(0) and u_t is I(1). Are Y and X cointegrated?

iii. Suppose that X_t is I(1) and u_t is I(0). Are Y and X cointegrated?

16.7 Suppose that ΔY_t = u_t, where u_t is i.i.d. N(0, 1), and consider the regression Y_t = βX_t + error, where X_t = ΔY_t+1 and error is the regression error. Show that β̂ →d (χ²_1 − 1)/2. [Hint: Analyze the numerator of β̂ using analysis like that in Equation (16.21). Analyze the denominator using the law of large numbers.]
16.8 Consider the following two-variable VAR model with one lag and no intercept:

Y_t = β11 Y_t−1 + γ11 X_t−1 + u1_t
X_t = β21 Y_t−1 + γ21 X_t−1 + u2_t.

a. Show that the iterated two-period-ahead forecast for Y can be written as Y_t|t−2 = δ1 Y_t−2 + δ2 X_t−2, and derive values for δ1 and δ2 in terms of the coefficients in the VAR.

b. In light of your answer to (a), do iterated multiperiod forecasts differ from direct multiperiod forecasts? Explain.


16.9

a. Suppose that E(u_t | u_t−1, u_t−2, ...) = 0, that var(u_t | u_t−1, u_t−2, ...) follows the ARCH(1) model σ²_t = α0 + α1u²_t−1, and that the process for u_t is stationary. Show that var(u_t) = α0/(1 − α1). [Hint: Use the law of iterated expectations, E(u²_t) = E[E(u²_t | u_t−1)].]

b. Extend the result in (a) to the ARCH(p) model.

c. Show that Σ_{i=1}^p α_i < 1 for a stationary ARCH(p) model.

d. Extend the result in (a) to the GARCH(1,1) model.

e. Show that α1 + φ1 < 1 for a stationary GARCH(1,1) model.
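The result in Exercise 16.9(a), var(u_t) = α0/(1 − α1) for a stationary ARCH(1), can be checked numerically with a short Monte Carlo. This is only a sketch: the sample size and seed are arbitrary, and it illustrates the claim rather than proves it.

```python
import numpy as np

# Simulate ARCH(1): sigma2_t = a0 + a1 * u_{t-1}^2,  u_t = sqrt(sigma2_t) * e_t, e_t ~ N(0,1)
rng = np.random.default_rng(1)
a0, a1, T = 1.0, 0.5, 200_000
u = np.zeros(T)
for t in range(1, T):
    sigma2_t = a0 + a1 * u[t - 1] ** 2
    u[t] = np.sqrt(sigma2_t) * rng.normal()

print(u.var())   # close to a0 / (1 - a1) = 2.0 for a long simulation
```

With a0 = 1.0 and a1 = 0.5 the implied unconditional variance is 2, matching Exercise 16.3(a).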


16.10 Consider the cointegrated model Y_t = θX_t + v1_t and X_t = X_t−1 + v2_t, where v1_t and v2_t are mean zero serially uncorrelated random variables with E(v1_t v2_j) = 0 for all t and j. Derive the vector error correction model [Equations (16.22) and (16.23)] for X and Y.

Empirical Exercises

These exercises are based on data series in the data files USMacro_Quarterly and USMacro_Monthly described in the Empirical Exercises in Chapters 14 and 15. Let Y_t = ln(GDP_t), let R_t denote the three-month Treasury bill rate, and let π_t^CPI and π_t^PCED denote the inflation rates computed from the CPI and the PCE deflator, respectively.
E16.1 Using quarterly data from 1955:1 through 2004:4, estimate a VAR(4) (a VAR with four lags) for ΔY_t and ΔR_t.

a. Does ΔR Granger-cause ΔY? Does ΔY Granger-cause ΔR?

b. Should the VAR include more than four lags?

E16.2 In this exercise you will compute pseudo out-of-sample two-quarter-ahead forecasts for ΔY through the end of the sample. (That is, you will compute forecasts such as ΔY_1990:1|1989:3, and so forth.)

a. Construct iterated two-quarter-ahead pseudo out-of-sample forecasts using an AR(1) model.

b. Construct iterated two-quarter-ahead pseudo out-of-sample forecasts using the VAR(4) model for ΔY and ΔR.

c. Construct two-quarter-ahead pseudo out-of-sample forecasts using the naive forecast ΔY_t+2|t = (ΔY_t + ΔY_t−1 + ΔY_t−2 + ΔY_t−3)/4.

d. Which model has the smallest root mean squared forecast error?

E16.3 Use the DF-GLS test to test for a unit autoregressive root in Y_t. As an alternative, suppose that Y_t is stationary around a deterministic trend. Compare the results to the results obtained in the corresponding Empirical Exercise in Chapter 14.

E16.4 In an Empirical Exercise in Chapter 15 you studied the behavior of π_t^CPI − π_t^PCED over the sample period 1959:1 through 2004:12. That analysis was predicated on the assumption that π_t^CPI − π_t^PCED is I(0).

a. Test for a unit root in the autoregression for π_t^CPI − π_t^PCED. Carry out the test using the ADF test that includes a constant and 12 lags of the first difference of π_t^CPI − π_t^PCED. Also carry out the test using the DF-GLS procedure.

b. Test for a unit root in the autoregression for π_t^CPI and in the autoregression for π_t^PCED. As in (a), use both the ADF and DF-GLS tests, including a constant and 12 lagged first differences.

c. What do the results from (a) and (b) say about cointegration between these two inflation rates? What is the value of the cointegrating coefficient (θ) implied by your answers to (a) and (b)?

d. Suppose you did not know that the cointegrating coefficient was θ = 1. How would you test for cointegration? Carry out the test. How would you estimate θ? Estimate the value of θ using the DOLS regression of π_t^CPI onto π_t^PCED and six leads and lags of Δπ_t^PCED. Is the estimated value of θ close to 1?

E16.5

a. Using data on ΔY_t (the growth rate in GDP) from 1955:1 to 2004:4, estimate an AR(1) model with GARCH(1,1) errors.

b. Plot the residuals from the AR(1) model along with ±σ̂_t bands as in Figure 16.4.

c. Some macroeconomists have claimed that there was a sharp drop in the variability of ΔY_t around 1983, which they call the "Great Moderation." Is this Great Moderation evident in the plot that you formed in (b)?

APPENDIX

16.1 U.S. Financial Data Used in Chapter 16

The interest rates on three-month U.S. Treasury bills and on one-year U.S. Treasury bonds are the monthly averages of their daily rates, converted to an annual basis, as reported by the U.S. Federal Reserve Bank. The quarterly data used in this chapter are the monthly average interest rates for the final month in the quarter.

CHAPTER 18

The Theory of Multiple Regression

This chapter provides an introduction to the theory of multiple regression analysis. The chapter has four objectives. The first is to present the multiple regression model in matrix form, which leads to compact formulas for the OLS estimator and test statistics. The second objective is to characterize the sampling distribution of the OLS estimator, both in large samples (using asymptotic theory) and in small samples (if the errors are homoskedastic and normally distributed). The third objective is to study the theory of efficient estimation of the coefficients of the multiple regression model and to describe generalized least squares (GLS), a method for estimating the regression coefficients efficiently when the errors are heteroskedastic and/or correlated across observations. The fourth objective is to provide a concise treatment of the asymptotic distribution theory of instrumental variables (IV) regression in the linear model, including an introduction to generalized method of moments (GMM) estimation in the linear IV regression model with heteroskedastic errors.

The chapter begins by laying out the multiple regression model and the OLS estimator in matrix form in Section 18.1. This section also presents the extended least squares assumptions for the multiple regression model. The first four of these assumptions are the same as the least squares assumptions of Key Concept 6.4 and underlie the asymptotic distributions used to justify the procedures described in Chapters 6 and 7. The remaining two extended least squares assumptions are stronger and permit us to explore in more detail the theoretical properties of the OLS estimator in the multiple regression model.

The next three sections examine the sampling distribution of the OLS estimator and test statistics. Section 18.2 presents the asymptotic distributions of the OLS estimator and t-statistic under the least squares assumptions of Key Concept 6.4. Section 18.3 unifies and generalizes the tests of hypotheses involving multiple coefficients presented in Sections 7.2 and 7.3 and provides the asymptotic distribution of the resulting F-statistic. In Section 18.4, we examine the exact sampling distributions of the OLS estimator and test statistics in the special case that the errors are homoskedastic and normally distributed. Although the assumption of homoskedastic normal errors is implausible in most econometric applications, the exact sampling distributions are of theoretical interest, and p-values computed using these distributions often appear in the output of regression software.
The next two sections turn to the theory of efficient estimation of the coefficients of the multiple regression model. Section 18.5 generalizes the Gauss-Markov theorem to multiple regression. Section 18.6 develops the method of generalized least squares (GLS).

The final section takes up IV estimation in the general IV regression model when the instruments are valid and strong. This section derives the asymptotic distribution of the TSLS estimator when the errors are heteroskedastic and provides expressions for the standard error of the TSLS estimator. The TSLS estimator is one of many possible GMM estimators, and this section provides an introduction to GMM estimation in the linear IV regression model. It is shown that the TSLS estimator is the efficient GMM estimator if the errors are homoskedastic.

Mathematical prerequisite. The treatment of the linear model in this chapter uses matrix notation and the basic tools of linear algebra and assumes that the reader has taken an introductory course in linear algebra. Appendix 18.1 reviews vectors, matrices, and the matrix operations used in this chapter. In addition, multivariate calculus is used in Section 18.1 to derive the OLS estimator.


18.1 The Linear Multiple Regression Model and OLS Estimator in Matrix Form

The linear multiple regression model and the OLS estimator can each be represented compactly using matrix notation.

The Multiple Regression Model in Matrix Notation

The multiple regression model is

Y_i = β_0 + β_1X_1i + β_2X_2i + ··· + β_kX_ki + u_i, i = 1, ..., n.   (18.1)

To write the multiple regression model in matrix form, define the following vectors and matrices:

Y = (Y_1, ..., Y_n)', U = (u_1, ..., u_n)', X = (X_1, ..., X_n)' (so that the ith row of X is X_i' = (1, X_1i, ..., X_ki)), and β = (β_0, β_1, ..., β_k)',   (18.2)

so that Y is n × 1, X is n × (k + 1), U is n × 1, and β is (k + 1) × 1. Throughout we denote matrices and vectors by bold type. In this notation,

Y is the n × 1 dimensional vector of n observations on the dependent variable.

X is the n × (k + 1) dimensional matrix of n observations on the k + 1 regressors (including the "constant" regressor for the intercept).

The (k + 1) × 1 dimensional column vector X_i is the ith observation on the k + 1 regressors; that is, X_i' = (1, X_1i, ..., X_ki), where X_i' denotes the transpose of X_i.

U is the n × 1 dimensional vector of the n error terms.

β is the (k + 1) × 1 dimensional vector of the k + 1 unknown regression coefficients.

The multiple regression model in Equation (18.1) for the ith observation, written using the vectors β and X_i, is

Y_i = X_i'β + u_i, i = 1, ..., n.   (18.3)


KEY CONCEPT 18.1

THE EXTENDED LEAST SQUARES ASSUMPTIONS IN THE MULTIPLE REGRESSION MODEL

The linear regression model with multiple regressors is

Y_i = X_i'β + u_i, i = 1, ..., n.   (18.4)

The extended least squares assumptions are

1. E(u_i | X_i) = 0 (u_i has conditional mean zero);
2. (X_i, Y_i), i = 1, ..., n are independently and identically distributed (i.i.d.) draws from their joint distribution;
3. X_i and u_i have nonzero finite fourth moments;
4. X has full column rank (there is no perfect multicollinearity);
5. var(u_i | X_i) = σ²_u (homoskedasticity); and
6. the conditional distribution of u_i given X_i is normal (normal errors).

In Equation (18.3), the first regressor is the "constant" regressor that always equals 1, and its coefficient is the intercept. Thus the intercept does not appear separately in Equation (18.3); rather, it is the first element of the coefficient vector β.

Stacking all n observations in Equation (18.3) yields the multiple regression model in matrix form:

Y = Xβ + U.   (18.5)

The Extended Least Squares Assumptions

The extended least squares assumptions for the multiple regression model are the four least squares assumptions for the multiple regression model in Key Concept 6.4, plus the two additional assumptions of homoskedasticity and normally distributed errors. The assumption of homoskedasticity is used when we study the efficiency of the OLS estimator, and the assumption of normality is used when we study the exact sampling distribution of the OLS estimator and test statistics. The extended least squares assumptions are summarized in Key Concept 18.1.

Except for notational differences, the first three assumptions in Key Concept 18.1 are identical to the first three assumptions in Key Concept 6.4.

The fourth assumption in Key Concepts 6.4 and 18.1 might appear different, but in fact they are the same: They are simply different ways of saying that there


cannot be perfect multicollinearity. Recall that perfect multicollinearity arises when one regressor can be written as a perfect linear combination of the others. In the matrix notation of Equation (18.2), perfect multicollinearity means that one column of X is a perfect linear combination of the other columns of X, but if this is true, then X does not have full column rank. Thus, saying that X has rank k + 1, that is, rank equal to the number of columns of X, is just another way to say that the regressors are not perfectly multicollinear.

The fifth least squares assumption in Key Concept 18.1 is that the error term is conditionally homoskedastic, and the sixth assumption is that the conditional distribution of u_i given X_i is normal. These two assumptions are the same as the final two assumptions in Key Concept 17.1, except that they are now stated for multiple regressors.

Implications for the mean vector and covariance matrix of U. The least squares assumptions in Key Concept 18.1 imply simple expressions for the mean vector and covariance matrix of the conditional distribution of U given the matrix of regressors X. (The mean vector and covariance matrix of a vector of random variables are defined in Appendix 18.2.) Specifically, the first and second assumptions in Key Concept 18.1 imply that E(u_i | X) = E(u_i | X_i) = 0 and that cov(u_i, u_j | X) = E(u_iu_j | X) = E(u_iu_j | X_i, X_j) = E(u_i | X_i)E(u_j | X_j) = 0 for i ≠ j (Exercise 18.2). Combining these results, we have that

under assumptions 1 and 2, E(U | X) = 0_n, and   (18.6)

under assumptions 1, 2, and 5, E(UU' | X) = σ²_u I_n,   (18.7)

where 0_n is the n-dimensional vector of zeros and I_n is the n × n identity matrix. Similarly, the first, second, fifth, and sixth assumptions in Key Concept 18.1 imply that the conditional distribution of the n-dimensional random vector U, conditional on X, is the multivariate normal distribution (defined in Appendix 18.2). That is,

under assumptions 1, 2, 5, and 6, the conditional distribution of U given X is N(0_n, σ²_u I_n).   (18.8)

The OLS Estimator

The OLS estimator minimizes the sum of squared prediction mistakes, Σ_{i=1}^n (Y_i − b_0 − b_1X_1i − ··· − b_kX_ki)² [Equation (6.8)]. The formula for the OLS


estimator is obtained by taking the derivative of the sum of squared prediction mistakes with respect to each element of the coefficient vector, setting these derivatives to zero, and solving for the estimator β̂.

The derivative of the sum of squared prediction mistakes with respect to the jth regression coefficient, b_j, is

∂/∂b_j Σ_{i=1}^n (Y_i − b_0 − b_1X_1i − ··· − b_kX_ki)² = −2 Σ_{i=1}^n X_ji(Y_i − b_0 − b_1X_1i − ··· − b_kX_ki)   (18.9)

for j = 0, ..., k, where, for j = 0, X_0i = 1 for all i. The derivative on the right-hand side of Equation (18.9) is the jth element of the (k + 1)-dimensional vector −2X'(Y − Xb), where b is the (k + 1)-dimensional vector consisting of b_0, ..., b_k. There are k + 1 such derivatives, each corresponding to an element of b. Combined, these yield the system of k + 1 equations that, when set to zero, constitute the first order conditions for the OLS estimator β̂. That is, β̂ solves the system of k + 1 equations

X'(Y − Xβ̂) = 0,   (18.10)

or, equivalently, X'Y = X'Xβ̂.

Solving the system of equations (18.10) yields the OLS estimator β̂ in matrix form:

β̂ = (X'X)⁻¹X'Y,   (18.11)

where (X'X)⁻¹ is the inverse of the matrix X'X.

The role of "no perfect multicollinearity." The fourth least squares assumption in Key Concept 18.1 states that X has full column rank. In turn, this implies that the matrix X'X has full rank, that is, that X'X is nonsingular. Because X'X is nonsingular, it is invertible. Thus the assumption that there is no perfect multicollinearity ensures that (X'X)⁻¹ exists, so that Equation (18.10) has a unique solution and the formula in Equation (18.11) for the OLS estimator can actually be computed. Said differently, if X does not have full column rank, there is not a unique solution to Equation (18.10), and X'X is singular. Therefore, (X'X)⁻¹ cannot be computed and, thus, β̂ cannot be computed from Equation (18.11).
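Equations (18.10) and (18.11) and the role of full column rank can be illustrated numerically. This is a sketch with simulated data; in practice one solves the normal equations rather than forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # n x (k+1), first column constant
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(size=n)

# beta_hat = (X'X)^{-1} X'Y, computed by solving X'X b = X'Y  [Equations (18.10)-(18.11)]
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)

# With perfect multicollinearity, X'X is singular and the formula breaks down:
X_bad = np.column_stack([X, X[:, 1] + X[:, 2]])              # new column = linear combination
print(np.linalg.matrix_rank(X_bad.T @ X_bad))                # rank 3 < 4 columns
```

Because X_bad has a column that is an exact linear combination of two others, X_bad'X_bad is singular and Equation (18.10) has no unique solution.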

KEY CONCEPT 18.2

THE MULTIVARIATE CENTRAL LIMIT THEOREM

Suppose that W_1, ..., W_n are i.i.d. m-dimensional random variables with mean vector E(W_i) = μ_W and covariance matrix E[(W_i − μ_W)(W_i − μ_W)'] = Σ_W, where Σ_W is positive definite and finite. Let W̄ = (1/n) Σ_{i=1}^n W_i. Then √n(W̄ − μ_W) →d N(0_m, Σ_W).

18.2 Asymptotic Distribution of the OLS Estimator and t-Statistic

If the sample size is large and the first four assumptions of Key Concept 18.1 are satisfied, then the OLS estimator has an asymptotic joint normal distribution, the heteroskedasticity-robust estimator of the covariance matrix is consistent, and the heteroskedasticity-robust OLS t-statistic has an asymptotic standard normal distribution. These results make use of the multivariate normal distribution (Appendix 18.2) and a multivariate extension of the central limit theorem.

The Multivariate Central Limit Theorem

The central limit theorem of Key Concept 2.7 applies to a one-dimensional random variable. To derive the joint asymptotic distribution of the elements of β̂, we need a multivariate central limit theorem that applies to vector-valued random variables.

The multivariate central limit theorem extends the univariate central limit theorem to averages of observations on a vector-valued random variable, W, where W is m-dimensional. The difference between the central limit theorems for a scalar as opposed to a vector-valued random variable is the conditions on the variances. In the scalar case in Key Concept 2.7, the requirement is that the variance is both nonzero and finite. In the vector case, the requirement is that the covariance matrix is both positive definite and finite. If the vector-valued random variable W has a finite positive definite covariance matrix, then 0 < var(c'W) < ∞ for all nonzero m-dimensional vectors c (Exercise 18.3).

The multivariate central limit theorem that we will use is stated in Key Concept 18.2.

Asymptotic Normality of β̂

In large samples, the OLS estimator has the multivariate normal asymptotic distribution

√n(β̂ − β) →d N(0_{k+1}, Σ_{√n(β̂−β)}), where Σ_{√n(β̂−β)} = Q_X⁻¹ Σ_V Q_X⁻¹,   (18.12)

where Q_X is the (k + 1) × (k + 1)-dimensional matrix of second moments of the regressors, that is, Q_X = E(X_iX_i'), and Σ_V is the (k + 1) × (k + 1)-dimensional covariance matrix of V_i = X_iu_i, that is, Σ_V = E(V_iV_i'). Note that the second least squares assumption in Key Concept 18.1 implies that V_i, i = 1, ..., n are i.i.d.

Written in terms of β̂ rather than √n(β̂ − β), the normal approximation in Equation (18.12) is

β̂ is, in large samples, distributed N(β, Σ_β̂), where Σ_β̂ = Σ_{√n(β̂−β)}/n = Q_X⁻¹ Σ_V Q_X⁻¹/n.   (18.13)

The covariance matrix Σ_β̂ in Equation (18.13) is the covariance matrix of the approximate normal distribution of β̂, whereas Σ_{√n(β̂−β)} in Equation (18.12) is the covariance matrix of the asymptotic normal distribution of √n(β̂ − β). These two covariance matrices differ by a factor of n, depending on whether the OLS estimator is scaled by √n.

Derivation of Equation (18.12). To derive Equation (18.12), first use Equations (18.4) and (18.11) to write β̂ = (X'X)⁻¹X'Y = (X'X)⁻¹X'(Xβ + U), so that

β̂ = β + (X'X)⁻¹X'U.   (18.14)

Thus β̂ − β = (X'X)⁻¹X'U, so

√n(β̂ − β) = (X'X/n)⁻¹(X'U/√n).   (18.15)

The derivation of Equation (18.12) involves arguing first that the "denominator" matrix in Equation (18.15), X'X/n, is consistent for Q_X, and second that the "numerator" matrix, X'U/√n, obeys the multivariate central limit theorem in Key Concept 18.2. The details are given in Appendix 18.3.
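The algebraic identity in Equation (18.14) holds exactly in any sample, not just asymptotically, which is easy to confirm numerically (the data below are simulated and the values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([0.5, 1.0, -1.0])
U = rng.normal(size=n)
Y = X @ beta + U

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
# Equation (18.14): beta_hat = beta + (X'X)^{-1} X'U
identity = beta + np.linalg.solve(X.T @ X, X.T @ U)
print(np.max(np.abs(beta_hat - identity)))    # zero up to floating-point error
```

The asymptotics then concern only the two factors in Equation (18.15): X'X/n converging to Q_X and X'U/√n obeying the multivariate CLT.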

Heteroskedasticity-Robust Standard Errors

The heteroskedasticity-robust estimator of Σ_{√n(β̂−β)} is obtained by replacing the population moments in its definition [Equation (18.12)] by sample moments. Accordingly, the heteroskedasticity-robust estimator of the covariance matrix of √n(β̂ − β) is

Σ̂_{√n(β̂−β)} = (X'X/n)⁻¹ Σ̂_V (X'X/n)⁻¹, where Σ̂_V = [1/(n − k − 1)] Σ_{i=1}^n X_iX_i'û_i².   (18.16)

The estimator Σ̂_V incorporates the same degrees-of-freedom adjustment that is in the SER for the multiple regression model (Section 6.4) to adjust for potential downward bias because of estimation of k + 1 regression coefficients. The proof that Σ̂_{√n(β̂−β)} →p Σ_{√n(β̂−β)} is conceptually similar to the proof, presented in Section 17.3, of the consistency of heteroskedasticity-robust standard errors for the single-regressor model.

Heteroskedasticity-robust standard errors. The heteroskedasticity-robust estimator of the covariance matrix of β̂, Σ_β̂, is

Σ̂_β̂ = Σ̂_{√n(β̂−β)}/n.   (18.17)

The heteroskedasticity-robust standard error for the jth regression coefficient is the square root of the jth diagonal element of Σ̂_β̂. That is, the heteroskedasticity-robust standard error of the jth coefficient is

SE(β̂_j) = √[(Σ̂_β̂)_jj].   (18.18)
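Equations (18.16) through (18.18) translate directly into code. The sketch below uses simulated heteroskedastic data; `robust_se` is a hypothetical helper name, and dividing by n − k − 1 implements the degrees-of-freedom adjustment in Σ̂_V.

```python
import numpy as np

def robust_se(X, Y):
    # Heteroskedasticity-robust standard errors per Equations (18.16)-(18.18)
    n, kp1 = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat
    Q_inv = np.linalg.inv(X.T @ X / n)
    Sigma_V = (X * (u_hat ** 2)[:, None]).T @ X / (n - kp1)   # (1/(n-k-1)) sum X_i X_i' u_i^2
    Sigma_beta = Q_inv @ Sigma_V @ Q_inv / n                  # Equation (18.17)
    return beta_hat, np.sqrt(np.diag(Sigma_beta))             # Equation (18.18)

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + np.abs(X[:, 1]) * rng.normal(size=n)   # heteroskedastic errors
beta_hat, se = robust_se(X, Y)
print(beta_hat, se)
```

This matches the common "HC1" robust covariance up to the degrees-of-freedom convention used here.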

Confidence Intervals for Predicted Effects

Section 8.1 describes two methods for computing the standard error of predicted effects that involve changes in two or more regressors. There are compact matrix expressions for these standard errors and thus for confidence intervals for predicted effects.

Consider a change in the value of the regressors for the ith observation from some initial value, say X_i0, to some new value, X_i0 + d, so that the change in X_i is ΔX_i = d, where d is a (k + 1)-dimensional vector. This change in X can involve multiple regressors (that is, multiple elements of X_i). For example, if two of the regressors are the value of an independent variable and its square, then d is the difference between the subsequent and initial values of these two variables.

The expected effect of this change in X_i is d'β, and the estimator of this effect is d'β̂. Because linear combinations of normally distributed random variables are themselves normally distributed, √n(d'β̂ − d'β) = d'√n(β̂ − β) →d N(0, d'Σ_{√n(β̂−β)}d). Thus the standard error of this predicted effect is (d'Σ̂_β̂d)^{1/2}, and a 95% confidence interval for this predicted effect is

d'β̂ ± 1.96 √(d'Σ̂_β̂d).   (18.19)
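The matrix expression for the standard error of a predicted effect can be sketched as follows, for the quadratic example in the text: changing the regressor from 1 to 2, so d stacks the changes in x and x². The data, coefficients, and seed are simulated illustrations, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 4, size=n)
X = np.column_stack([np.ones(n), x, x ** 2])
Y = X @ np.array([1.0, 0.5, -0.1]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
XX_inv = np.linalg.inv(X.T @ X)
Sigma_beta = XX_inv @ ((X.T * u_hat ** 2) @ X) @ XX_inv * n / (n - 3)  # robust, as in (18.17)

d = np.array([0.0, 2.0 - 1.0, 2.0 ** 2 - 1.0 ** 2])   # change x from 1 to 2: d = (0, 1, 3)
effect = d @ beta_hat                                  # estimated effect d'beta_hat
se = np.sqrt(d @ Sigma_beta @ d)                       # SE = sqrt(d' Sigma_hat_beta d)
ci = (effect - 1.96 * se, effect + 1.96 * se)          # 95% interval, Equation (18.19)
print(effect, se, ci)
```

With the coefficients used in the simulation, the population effect of the change is 0.5(2 − 1) − 0.1(4 − 1) = 0.2, which the interval should cover in most samples.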


Asymptotic Distribution of the t-Statistic

The t-statistic testing the null hypothesis that β_j = β_j,0, constructed using the heteroskedasticity-robust standard error in Equation (18.18), is given in Key Concept 7.1. The argument that this t-statistic has an asymptotic standard normal distribution parallels the argument given in Section 17.3 for the single-regressor model.

18.3 Tests of Joint Hypotheses

Section 7.2 considers tests of joint hypotheses that involve multiple restrictions, where each restriction involves a single coefficient, and Section 7.3 considers tests of a single restriction involving two or more coefficients. The matrix setup of Section 18.1 permits a unified representation of these two types of hypotheses as linear restrictions on the coefficient vector, where each restriction can involve multiple coefficients. Under the first four least squares assumptions in Key Concept 18.1, the heteroskedasticity-robust OLS F-statistic testing these hypotheses has an F asymptotic distribution under the null hypothesis.

Joint Hypotheses in Matrix Notation

Consider a joint hypothesis that is linear in the coefficients and imposes q restrictions, where q ≤ k + 1. Each of these q restrictions can involve one or more of the regression coefficients. This joint null hypothesis can be written in matrix notation as

Rβ = r,   (18.20)

where R is a q × (k + 1) nonrandom matrix with full row rank and r is a nonrandom q × 1 vector. The number of rows of R is q, which is the number of restrictions being imposed under the null hypothesis.

The null hypothesis in Equation (18.20) subsumes all the null hypotheses considered in Sections 7.2 and 7.3. For example, a joint hypothesis of the type considered in Section 7.2 is that β_1 = 0, β_2 = 0, ..., β_q = 0. To write this joint hypothesis in the form of Equation (18.20), set R = [0_{q×1} I_q 0_{q×(k−q)}] and r = 0_q. The formulation in Equation (18.20) also captures the restrictions of Section 7.3 involving multiple regression coefficients. For example, if k = 2, then the hypothesis that β_1 + β_2 = 1 can be written in the form of Equation (18.20) by setting R = [0 1 1], r = 1, and q = 1.
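The two examples just given can be written out concretely for k = 2, where β = (β_0, β_1, β_2)':

```python
import numpy as np

# Hypothesis (i): beta_1 = 0 and beta_2 = 0  (q = 2), as in Section 7.2
R1 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
r1 = np.zeros(2)

# Hypothesis (ii): beta_1 + beta_2 = 1  (q = 1), as in Section 7.3
R2 = np.array([[0.0, 1.0, 1.0]])
r2 = np.array([1.0])

beta_null = np.array([3.0, 0.4, 0.6])      # satisfies (ii) but violates (i)
print(R1 @ beta_null, R2 @ beta_null)      # compare to r1 and r2
```

Each row of R picks out one restriction, and full row rank guarantees that no restriction is redundant.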


Asymptotic Distribution of the F-Statistic


The heteroskedasticity-robust F-statistic testing the joint hypothesis in Equation (18.20) is

    F = (Rβ̂ − r)′[RΣ̂_β̂R′]⁻¹(Rβ̂ − r)/q.    (18.21)

If the first four assumptions in Key Concept 18.1 hold, then under the null hypothesis

    F →d F_{q,∞}.    (18.22)

This result follows by combining the asymptotic normality of √n(Rβ̂ − r) with the consistency of the heteroskedasticity-robust estimator Σ̂_β̂ of the covariance matrix. Specifically, first note that Equation (18.12) and Equation (18.74) in Appendix 18.2 imply that, under the null hypothesis, √n(Rβ̂ − r) = √nR(β̂ − β) →d N(0, RΣ_{√n(β̂−β)}R′). It follows from Equation (18.77) that, under the null hypothesis, [√nR(β̂ − β)]′[RΣ_{√n(β̂−β)}R′]⁻¹[√nR(β̂ − β)] →d χ²_q. Because Σ̂_β̂ is consistent for Σ_{√n(β̂−β)}/n, it follows from Slutsky's theorem that F →d χ²_q/q, which is in turn distributed F_{q,∞}.
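The computation in Equation (18.21) is straightforward to carry out numerically. The following is a minimal NumPy sketch; the simulated data, the variable names, and the simple robust covariance estimator (with no degrees-of-freedom correction) are our illustrative choices, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) regressor matrix
beta = np.array([1.0, 0.0, 0.0])                             # null beta1 = beta2 = 0 is true
u = rng.normal(size=n) * (1 + 0.5 * np.abs(X[:, 1]))         # heteroskedastic errors
Y = X @ beta + u

# OLS
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
u_hat = Y - X @ beta_hat

# Heteroskedasticity-robust estimator of the covariance matrix of beta_hat
meat = X.T @ (X * u_hat[:, None] ** 2)        # sum of u_i^2 X_i X_i'
Sigma_hat = XtX_inv @ meat @ XtX_inv

# Joint null R beta = r with q = 2 restrictions: beta1 = 0 and beta2 = 0
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)
q = R.shape[0]

diff = R @ beta_hat - r
F = diff @ np.linalg.inv(R @ Sigma_hat @ R.T) @ diff / q     # Equation (18.21)
print(F)
```

Under the null, F should be moderate in size; large values of F are evidence against Rβ = r.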

Confidence Sets for Multiple Coefficients


As discussed in Section 7.4, an asymptotically valid confidence set for two or more elements of β can be constructed as the set of values that, when taken as the null hypothesis, are not rejected by the F-statistic. In principle, this set could be computed by repeatedly evaluating the F-statistic for many values of β, but, as is the case with a confidence interval for a single coefficient, it is simpler to manipulate the formula for the test statistic to obtain an explicit formula for the confidence set.

Here is the procedure for constructing a confidence set for two or more of the elements of β. Let δ denote the q-dimensional vector consisting of the coefficients for which we wish to construct a confidence set. For example, if we are constructing a confidence set for the regression coefficients β₁ and β₂, then q = 2 and δ = (β₁ β₂)′. In general we can write δ = Rβ, where the matrix R consists of zeros and ones [as discussed following Equation (18.20)]. The F-statistic testing the hypothesis that δ = δ₀ is F = (δ̂ − δ₀)′[RΣ̂_β̂R′]⁻¹(δ̂ − δ₀)/q, where δ̂ = Rβ̂. A 95% confidence set for δ is the set of values δ₀ that are not rejected by the F-statistic. That is, when δ = Rβ, a 95% confidence set for δ is

    {δ₀: (δ̂ − δ₀)′[RΣ̂_β̂R′]⁻¹(δ̂ − δ₀)/q ≤ c},    (18.23)

where c is the 95th percentile (the 5% critical value) of the F_{q,∞} distribution.

The set in Equation (18.23) consists of all the points contained inside the ellipse determined when the inequality in Equation (18.23) is an equality (this is an ellipsoid when q > 2). Thus the confidence set for δ can be computed by solving Equation (18.23) for the boundary ellipse.
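In practice, membership in the set of Equation (18.23) is a one-line test: a candidate δ₀ is inside the 95% confidence ellipse exactly when its F-statistic is at most the F_{q,∞} critical value, which equals the 95th percentile of the χ²_q distribution divided by q. A NumPy sketch with simulated data (names and data are illustrative; for q = 2 the χ² quantile has a closed form, which we use to avoid a stats library):

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -1.0])
Y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
u_hat = Y - X @ beta_hat
Sigma_hat = XtX_inv @ (X.T @ (X * u_hat[:, None] ** 2)) @ XtX_inv  # robust covariance

# Confidence set for delta = (beta1, beta2)': delta = R beta, R of zeros and ones
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = 2
delta_hat = R @ beta_hat
V = R @ Sigma_hat @ R.T

# 5% critical value of F_{q,inf} = (95th percentile of chi-squared_q)/q;
# for q = 2 the chi-squared 95th percentile is -2 ln(0.05) (about 5.99)
c = -2.0 * math.log(0.05) / q

def in_confidence_set(delta_0):
    """True if delta_0 is not rejected, i.e., lies inside the ellipse of Eq. (18.23)."""
    d = delta_hat - delta_0
    return (d @ np.linalg.inv(V) @ d) / q <= c

print(in_confidence_set(np.array([2.0, -1.0])))
```

The center δ̂ is always inside the set (its F-statistic is zero), while points far from δ̂ are rejected.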

18.4 Distribution of Regression Statistics with Normal Errors


The distributions presented in Sections 18.2 and 18.3, which were justified by appealing to the law of large numbers and the central limit theorem, apply when the sample size is large. If, however, the errors are homoskedastic and normally distributed, conditional on X, then the OLS estimator has a multivariate normal distribution in finite samples, conditional on X. In addition, the finite-sample distribution of the square of the standard error of the regression is proportional to the chi-squared distribution with n − k − 1 degrees of freedom, the homoskedasticity-only t-statistic has a Student t distribution with n − k − 1 degrees of freedom, and the homoskedasticity-only F-statistic has an F_{q,n−k−1} distribution. The arguments in this section employ some specialized matrix formulas for OLS regression statistics, which are presented first.

Matrix Representations of OLS Regression Statistics

The OLS predicted values, residuals, and sum of squared residuals have compact matrix representations. These representations make use of two matrices, P_X and M_X.

The matrices P_X and M_X.  The algebra of OLS in the multivariate model relies on the two symmetric n × n matrices P_X and M_X:

    P_X = X(X′X)⁻¹X′,    (18.24)
    M_X = I_n − P_X.    (18.25)

A matrix C is idempotent if C is square and CC = C (see Appendix 18.1). Because P_XP_X = P_X and M_XM_X = M_X (Exercise 18.5) and because P_X and M_X are symmetric, P_X and M_X are symmetric idempotent matrices.


The matrices P_X and M_X have some additional useful properties, which follow directly from the definitions in Equations (18.24) and (18.25):

    P_XX = X and M_XX = 0_{n×(k+1)};
    rank(P_X) = k + 1 and rank(M_X) = n − k − 1,    (18.26)

where rank(P_X) is the rank of P_X.
The matrices P_X and M_X can be used to decompose an n-dimensional vector Z into two parts: a part that is spanned by the columns of X and a part orthogonal to the columns of X. In other words, P_XZ is the projection of Z onto the space spanned by the columns of X, M_XZ is the part of Z orthogonal to the columns of X, and Z = P_XZ + M_XZ.

OLS predicted values and residuals.  The matrices P_X and M_X provide some simple expressions for OLS predicted values and residuals. The OLS predicted values, Ŷ = Xβ̂, and the OLS residuals, Û = Y − Ŷ, can be expressed as follows (Exercise 18.5):

    Ŷ = P_XY and    (18.27)
    Û = M_XY.    (18.28)

The expressions in Equations (18.27) and (18.28) provide a simple proof that the OLS residuals and predicted values are orthogonal, that is, that Equation (4.37) holds: Ŷ′Û = Y′P_X′M_XY = 0, where the second equality follows from P_X′M_X = 0_{n×n}, which in turn follows from M_XX = 0_{n×(k+1)} in Equation (18.26).

The standard error of the regression.  The SER, defined in Section 4.3, is s_û, where

    s²_û = (1/(n − k − 1)) Σᵢ₌₁ⁿ ûᵢ² = (1/(n − k − 1)) Û′Û = (1/(n − k − 1)) U′M_XU,    (18.29)

where the final equality follows because Û = M_XY = M_X(Xβ + U) = M_XU, so Û′Û = (M_XU)′(M_XU) = U′M_XM_XU = U′M_XU (because M_X is symmetric and idempotent).
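The identities in Equations (18.24) through (18.29) are easy to verify numerically. A small NumPy check on simulated data (the data and names are our own illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T     # Equation (18.24)
M = np.eye(n) - P                        # Equation (18.25)

# Symmetric idempotent matrices, and the properties in Equation (18.26)
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)
assert np.linalg.matrix_rank(P) == k + 1
assert np.linalg.matrix_rank(M) == n - k - 1
assert np.allclose(M @ X, 0)

# Predicted values and residuals, Equations (18.27)-(18.28)
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ beta_hat
u_hat = Y - Y_hat
assert np.allclose(Y_hat, P @ Y)
assert np.allclose(u_hat, M @ Y)
assert np.isclose(Y_hat @ u_hat, 0)      # orthogonality, Equation (4.37)

# Squared SER of Equation (18.29), three equivalent forms
s2 = u_hat @ u_hat / (n - k - 1)
assert np.isclose(s2, (Y @ M @ Y) / (n - k - 1))
print(s2)
```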
Distribution of β̂ with Normal Errors


Because β̂ = β + (X′X)⁻¹X′U [Equation (18.14)] and because the distribution of U conditional on X is, by assumption, N(0_n, σ²_u I_n) [Equation (18.8)], the conditional distribution of β̂ given X is multivariate normal with mean β. The covariance matrix of β̂, conditional on X, is Σ_{β̂|X} = E[(β̂ − β)(β̂ − β)′|X] = E[(X′X)⁻¹X′UU′X(X′X)⁻¹|X] = (X′X)⁻¹X′(σ²_u I_n)X(X′X)⁻¹ = σ²_u(X′X)⁻¹. Accordingly, under all six assumptions in Key Concept 18.1, the finite-sample conditional distribution of β̂ given X is

    β̂ | X ~ N(β, Σ_{β̂|X}), where Σ_{β̂|X} = σ²_u(X′X)⁻¹.    (18.30)

Distribution of s²_û

If all six assumptions in Key Concept 18.1 hold, then s²_û has an exact sampling distribution that is proportional to a chi-squared distribution with n − k − 1 degrees of freedom:

    s²_û ~ [σ²_u / (n − k − 1)] × χ²_{n−k−1}.    (18.31)

The proof of Equation (18.31) starts with Equation (18.29). Because U is normally distributed conditional on X and because M_X is a symmetric idempotent matrix, the quadratic form U′M_XU/σ²_u has an exact chi-squared distribution with degrees of freedom equal to the rank of M_X [Equation (18.78) in Appendix 18.2]. From Equation (18.26), the rank of M_X is n − k − 1. Thus U′M_XU/σ²_u has an exact χ²_{n−k−1} distribution, from which Equation (18.31) follows.

The degrees-of-freedom adjustment ensures that s²_û is unbiased. The expectation of a random variable with a χ²_{n−k−1} distribution is n − k − 1; thus E(U′M_XU) = (n − k − 1)σ²_u, so E(s²_û) = σ²_u.
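The result in Equation (18.31) can be checked by simulation: holding X (and hence M_X) fixed and redrawing normal errors, the quadratic form U′M_XU/σ²_u should behave like a χ²_{n−k−1} random variable, with mean n − k − 1. A small Monte Carlo sketch (our own illustration; the sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 20, 2, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design: condition on X
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T             # annihilator matrix M_X

draws = 20000
q_form = np.empty(draws)
for i in range(draws):
    U = sigma * rng.normal(size=n)        # homoskedastic normal errors
    q_form[i] = U @ M @ U / sigma**2      # should be chi-squared with n-k-1 df

print(q_form.mean())   # Equation (18.31) implies the population mean is n - k - 1 = 17
```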

Homoskedasticity-Only Standard Errors

The homoskedasticity-only estimator Σ̃_β̂ of the covariance matrix of β̂, conditional on X, is obtained by substituting the sample variance s²_û for the population variance σ²_u in the expression for Σ_{β̂|X} in Equation (18.30). Accordingly,

    Σ̃_β̂ = s²_û(X′X)⁻¹ (homoskedasticity-only).    (18.32)

The estimator of the variance of the normal conditional distribution of β̂_j, given X, is the (j, j) element of Σ̃_β̂. Thus the homoskedasticity-only standard error of β̂_j is the square root of the jth diagonal element of Σ̃_β̂. That is, the homoskedasticity-only standard error of β̂_j is

    SE(β̂_j) = √[(Σ̃_β̂)_jj].    (18.33)


Distribution of the t-Statistic

Let t be the t-statistic testing the hypothesis β_j = β_{j,0}, constructed using the homoskedasticity-only standard error; that is, let

    t = (β̂_j − β_{j,0}) / √[(Σ̃_β̂)_jj].    (18.34)

Under all six of the extended least squares assumptions in Key Concept 18.1, the exact sampling distribution of t is the Student t distribution with n − k − 1 degrees of freedom; that is,

    t ~ t_{n−k−1}.    (18.35)

The proof of Equation (18.35) is given in Appendix 18.4.
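In the single-regressor case, the matrix formulas in Equations (18.32) through (18.34) reduce to the familiar scalar formulas of Chapter 5. A quick NumPy cross-check (simulated data; the names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # simple regression, so k = 1
Y = 0.5 + 2.0 * x + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
u_hat = Y - X @ beta_hat
s2 = u_hat @ u_hat / (n - 1 - 1)              # Equation (18.29) with k = 1

Sigma_tilde = s2 * XtX_inv                    # Equation (18.32), homoskedasticity-only
se_beta1 = np.sqrt(Sigma_tilde[1, 1])         # Equation (18.33) for the slope

# Cross-check against the familiar single-regressor formula:
# SE(beta1_hat) = s_u / sqrt(sum of squared deviations of x)
se_scalar = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))

t = (beta_hat[1] - 2.0) / se_beta1            # Equation (18.34), testing beta1 = 2
print(se_beta1, t)
```

The (2, 2) element of (X′X)⁻¹ equals 1/Σ(xᵢ − x̄)², so the two standard-error computations agree exactly.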

Distribution of the F-Statistic

If all six least squares assumptions in Key Concept 18.1 hold, then the F-statistic testing the hypothesis in Equation (18.20), constructed using the homoskedasticity-only estimator of the covariance matrix, has an exact F_{q,n−k−1} distribution under the null hypothesis.

The homoskedasticity-only F-statistic.  The homoskedasticity-only F-statistic is similar to the heteroskedasticity-robust F-statistic in Equation (18.21), except that the homoskedasticity-only estimator Σ̃_β̂ is used instead of the heteroskedasticity-robust estimator Σ̂_β̂. Substituting Σ̃_β̂ = s²_û(X′X)⁻¹ into the expression for the F-statistic in Equation (18.21) yields the homoskedasticity-only F-statistic testing the null hypothesis in Equation (18.20):

    F̃ = (Rβ̂ − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − r) / (qs²_û).    (18.36)

If all six assumptions in Key Concept 18.1 hold, then under the null hypothesis

    F̃ ~ F_{q,n−k−1}.    (18.37)

The proof of Equation (18.37) is given in Appendix 18.4.


The F-statistic in Equation (18.36) is called the Wald version of the F-statistic (named after the statistician Abraham Wald). Although the formula for the homoskedasticity-only F-statistic given in Equation (7.13) appears quite different from the formula for the Wald statistic in Equation (18.36), the homoskedasticity-only F-statistic and the Wald F-statistic are two versions of the same statistic; that is, the two expressions are equivalent, a result shown in Exercise 18.13.

18.5 Efficiency of the OLS Estimator with Homoskedastic Errors

Under the Gauss-Markov conditions for multiple regression, the OLS estimator of β is efficient among all linear conditionally unbiased estimators; that is, the OLS estimator is BLUE.

The Gauss-Markov Conditions for Multiple Regression

The Gauss-Markov conditions for multiple regression are

    (i) E(U|X) = 0_n,
    (ii) E(UU′|X) = σ²_u I_n, and    (18.38)
    (iii) X has full column rank.

The Gauss-Markov conditions for multiple regression in turn are implied by the first five assumptions in Key Concept 18.1 [see Equations (18.6) and (18.7)]. The conditions in Equation (18.38) generalize the Gauss-Markov conditions for the single-regressor model to multiple regression. [By using matrix notation, the second and third Gauss-Markov conditions in Equation (5.31) are collected into the single condition (ii) in Equation (18.38).]

Linear Conditionally Unbiased Estimators

We start by describing the class of linear conditionally unbiased estimators and by showing that OLS is in that class.

The class of linear conditionally unbiased estimators.  An estimator of β is said to be linear if it is a linear function of Y₁, . . . , Y_n. Accordingly, the estimator β̃ is linear in Y if it can be written in the form

    β̃ = A′Y,    (18.39)



where A is an n × (k + 1) dimensional matrix of weights that may depend on X and on nonrandom constants, but not on Y.

An estimator is conditionally unbiased if the mean of its conditional sampling distribution, given X, is β. That is, β̃ is conditionally unbiased if E(β̃|X) = β.

The OLS estimator is linear and conditionally unbiased.  Comparison of Equations (18.11) and (18.39) shows that the OLS estimator is linear in Y; specifically, β̂ = Â′Y, where Â = X(X′X)⁻¹. To show that β̂ is conditionally unbiased, recall from Equation (18.14) that β̂ = β + (X′X)⁻¹X′U. Taking the conditional expectation of both sides of this expression yields E(β̂|X) = β + E[(X′X)⁻¹X′U|X] = β + (X′X)⁻¹X′E(U|X) = β, where the final equality follows because E(U|X) = 0 by the first Gauss-Markov condition.
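The linearity claim is easy to verify numerically: with Â = X(X′X)⁻¹, the product Â′Y reproduces the OLS estimate obtained by a least squares solver. A small check (illustrative data and names):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = rng.normal(size=n)

# OLS written as a linear function of Y: beta_hat = A'Y with A = X (X'X)^{-1}
A = X @ np.linalg.inv(X.T @ X)
beta_linear = A.T @ Y

# Reference computation of the OLS coefficients
beta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(beta_linear)
```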

The Gauss-Markov Theorem for Multiple Regression

The Gauss-Markov theorem for multiple regression provides conditions under which the OLS estimator is efficient among the class of linear conditionally unbiased estimators. A subtle point arises, however, because β̂ is a vector and its "variance" is a covariance matrix. When the variance of an estimator is a matrix, just what does it mean to say that one estimator has a smaller variance than another?

The Gauss-Markov theorem handles this problem by comparing the variance of a candidate estimator of a linear combination of the elements of β to the variance of the corresponding linear combination of β̂. Specifically, let c be a k + 1 dimensional vector, and consider the problem of estimating the linear combination c′β using the candidate estimator c′β̃ (where β̃ is a linear conditionally unbiased estimator) on the one hand and c′β̂ on the other. Because c′β̃ and c′β̂ are both scalars and are both linear conditionally unbiased estimators of c′β, it now makes sense to compare their variances.

The Gauss-Markov theorem for multiple regression says that the OLS estimator of c′β is efficient; that is, the OLS estimator c′β̂ has the smallest conditional variance of all linear conditionally unbiased estimators c′β̃. Remarkably, this is true no matter what the linear combination is. It is in this sense that the OLS estimator is BLUE in multiple regression.

The Gauss-Markov theorem is stated in Key Concept 18.3 and proven in Appendix 18.5.


KEY CONCEPT 18.3
GAUSS-MARKOV THEOREM FOR MULTIPLE REGRESSION

Suppose that the Gauss-Markov conditions for multiple regression in Equation (18.38) hold. Then the OLS estimator β̂ is BLUE. That is, let β̃ be a linear conditionally unbiased estimator of β, and let c be a nonrandom k + 1 dimensional vector. Then var(c′β̂|X) ≤ var(c′β̃|X) for every nonzero vector c, where the inequality holds with equality for all c only if β̃ = β̂.

18.6 Generalized Least Squares¹


The assumption of i.i.d. sampling fits many applications. For example, suppose that Yᵢ and Xᵢ correspond to information about individuals, such as their earnings, education, and personal characteristics, where the individuals are selected from a population by simple random sampling. In this case, because of the simple random sampling scheme, (Xᵢ, Yᵢ) are necessarily i.i.d. Because (Xᵢ, Yᵢ) and (Xⱼ, Yⱼ) are independently distributed for i ≠ j, uᵢ and uⱼ are independently distributed for i ≠ j. This in turn implies that uᵢ and uⱼ are uncorrelated for i ≠ j. In the context of the Gauss-Markov assumptions, the assumption that E(UU′|X) is diagonal therefore is appropriate if the data are collected in a way that makes the observations independently distributed.

Some sampling schemes encountered in econometrics do not, however, result in independent observations and instead can lead to error terms uᵢ that are correlated from one observation to the next. The leading example is when the data are sampled over time for the same entity, that is, when the data are time series data. As discussed in Section 15.3, in regressions involving time series data, many omitted factors are correlated from one period to the next, and this can result in regression error terms (which represent those omitted factors) that are correlated from one period of observation to the next. In other words, the error term in one period will not, in general, be distributed independently of the error term in the next period. Instead, the error term in one period could be correlated with the error term in the next period.

¹The GLS estimator was introduced in Section 15.5 in the context of distributed lag time series regression. The presentation here is a self-contained mathematical treatment of GLS that can be read independently of Section 15.5, but reading that section first will help to make these ideas more concrete.


The presence of correlated error terms creates two problems for inference based on OLS. First, neither the heteroskedasticity-robust nor the homoskedasticity-only standard errors produced by OLS provide a valid basis for inference. The solution to this problem is to use standard errors that are robust to both heteroskedasticity and correlation of the error terms across observations. This topic, heteroskedasticity- and autocorrelation-consistent (HAC) covariance matrix estimation, is the subject of Section 15.4, and we do not pursue it further here.

Second, if the error term is correlated across observations, so that E(UU′|X) is not diagonal, the second Gauss-Markov condition in Equation (18.38) does not hold, and OLS is not BLUE. In this section we study an estimator, generalized least squares (GLS), that is BLUE (at least asymptotically) when the conditional covariance matrix of the errors is no longer proportional to the identity matrix. A special case of GLS is weighted least squares, discussed in Section 17.5, in which the conditional covariance matrix is diagonal and the ith diagonal element is a function of Xᵢ. Like WLS, GLS transforms the regression model so that the errors of the transformed model satisfy the Gauss-Markov conditions. The GLS estimator is the OLS estimator of the coefficients in the transformed model.

The GLS Assumptions

There are four assumptions under which GLS is valid. The first GLS assumption is that uᵢ has a mean of zero, conditional on X₁, . . . , X_n; that is,

    E(U|X) = 0_n.    (18.40)

This assumption is implied by the first two least squares assumptions in Key Concept 18.1; that is, if E(uᵢ|Xᵢ) = 0 and (Xᵢ, Yᵢ), i = 1, . . . , n, are i.i.d., then E(U|X) = 0_n. In GLS, however, we will not want to maintain the i.i.d. assumption; after all, one purpose of GLS is to handle errors that are correlated across observations. We discuss the significance of the assumption in Equation (18.40) after introducing the GLS estimator.

The second GLS assumption is that the conditional covariance matrix of U given X is some function of X:

    E(UU′|X) = Ω(X),    (18.41)

where Ω(X) is an n × n positive definite matrix-valued function of X.

There are two main applications of GLS that are covered by this assumption. The first is independent sampling with heteroskedastic errors, in which case Ω(X)

KEY CONCEPT 18.4
THE GLS ASSUMPTIONS

In the linear regression model Y = Xβ + U, the GLS assumptions are

1. E(U|X) = 0_n;

2. E(UU′|X) = Ω(X), where Ω(X) is an n × n positive definite matrix that can depend on X;

3. Xᵢ and uᵢ satisfy suitable moment conditions;

4. X has full column rank (there is no perfect multicollinearity).

is a diagonal matrix with diagonal element λh(Xᵢ), where λ is a constant and h is a function. In this case, discussed in Section 17.5, GLS is WLS.

The second application is to homoskedastic errors that are serially correlated. In practice, in this case a model is developed for the serial correlation. For example, one model is that the error term is correlated with only its neighbor, so that corr(uᵢ, uᵢ₋₁) = ρ ≠ 0 but corr(uᵢ, uⱼ) = 0 if |i − j| ≥ 2. In this case, Ω(X) has σ²_u as its diagonal element, ρσ²_u in the first off-diagonal, and zeros elsewhere. Thus Ω(X) does not depend on X: Ωᵢᵢ = σ²_u, Ωᵢⱼ = ρσ²_u for |i − j| = 1, and Ωᵢⱼ = 0 for |i − j| > 1. Other models for serial correlation, including the first-order autoregressive model, are discussed further in the context of GLS in Section 15.5 (also see Exercise 18.8).

One assumption that has appeared on all previous lists of least squares assumptions for cross-sectional data is that Xᵢ and uᵢ have nonzero, finite fourth moments. In the case of GLS, the specific moment assumptions needed to prove asymptotic results depend on the nature of the function Ω(X). The particular moment assumptions also depend on whether one is considering the GLS estimator or the GLS t- or F-statistics, and the moment requirements also depend on whether Ω(X) is known or has estimated parameters. Because the assumptions are case- and model-specific, we do not present specific moment assumptions here, and the discussion of the large-sample properties of GLS assumes that such moment conditions apply for the relevant case at hand. For completeness, as the third GLS assumption, Xᵢ and uᵢ are simply assumed to satisfy suitable moment conditions.

The fourth GLS assumption is that X has full column rank; that is, the regressors are not perfectly multicollinear.

The GLS assumptions are summarized in Key Concept 18.4.



We consider GLS estimation in two cases. In the first case, Ω is known. In the second case, the functional form of Ω(X) is known up to some parameters that can be estimated. To simplify notation, we refer to the function Ω(X) as the matrix Ω, so the dependence of Ω on X is implicit.

GLS When Ω Is Known


When Ω is known, the GLS estimator uses Ω to transform the regression model to one with errors that satisfy the Gauss-Markov conditions. Specifically, let F be a matrix square root of Ω⁻¹; that is, let F be a matrix that satisfies F′F = Ω⁻¹ (see Appendix 18.1). A property of F is that FΩF′ = I_n. Now premultiply both sides of Equation (18.4) by F to obtain

    Ỹ = X̃β + Ũ, where Ỹ = FY, X̃ = FX, and Ũ = FU.    (18.42)

The key insight of GLS is that, under the four GLS assumptions, the Gauss-Markov assumptions hold for the transformed regression in Equation (18.42). That is, by transforming all the variables by the inverse of the matrix square root of Ω, the regression errors in the transformed regression have a conditional mean of zero and a covariance matrix that equals the identity matrix. To show this mathematically, first note that E(Ũ|X) = E(FU|X) = FE(U|X) = 0_n by the first GLS assumption [Equation (18.40)]. In addition, E(ŨŨ′|X) = E[(FU)(FU)′|X] = FE(UU′|X)F′ = FΩF′ = I_n, where the second equality follows because (FU)′ = U′F′ and the final equality follows from the definition of F. It follows that the transformed regression model in Equation (18.42) satisfies the Gauss-Markov conditions in Key Concept 18.3.

The GLS estimator β̂^GLS is the OLS estimator of β in Equation (18.42); that is, β̂^GLS = (X̃′X̃)⁻¹(X̃′Ỹ). Because the transformed regression model satisfies the Gauss-Markov conditions, the GLS estimator is the best conditionally unbiased estimator that is linear in Ỹ. But because Ỹ = FY and F is (here) assumed to be known, and because F is invertible (because Ω is positive definite), the class of estimators that are linear in Ỹ is the same as the class of estimators that are linear in Y. Thus the OLS estimator of β in Equation (18.42) is also the best conditionally unbiased estimator among estimators that are linear in Y. In other words, under the GLS assumptions, the GLS estimator is BLUE.

The GLS estimator can be expressed directly in terms of Ω, so that in principle there is no need to compute the square root matrix F. Because X̃ = FX and Ỹ = FY, β̂^GLS = (X′F′FX)⁻¹(X′F′FY). But F′F = Ω⁻¹, so

    β̂^GLS = (X′Ω⁻¹X)⁻¹(X′Ω⁻¹Y).    (18.43)
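The GLS construction is straightforward to sketch in NumPy. Here F is taken to be the inverse of the Cholesky factor of Ω, which is one valid matrix square root of Ω⁻¹ (if Ω = LL′, then F = L⁻¹ satisfies F′F = Ω⁻¹); the banded Ω and the simulated data are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# A known banded Omega: homoskedastic errors, neighbors correlated (rho off-diagonal)
sigma2, rho = 1.0, 0.4
Omega = sigma2 * (np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1)))

# Draw U with covariance Omega and generate Y
L = np.linalg.cholesky(Omega)
Y = X @ beta + L @ rng.normal(size=n)

# F is a matrix square root of Omega^{-1}: F'F = Omega^{-1} and F Omega F' = I_n
F = np.linalg.inv(L)
assert np.allclose(F.T @ F, np.linalg.inv(Omega))
assert np.allclose(F @ Omega @ F.T, np.eye(n))

# GLS as OLS on the transformed model of Equation (18.42) ...
Xt, Yt = F @ X, F @ Y
beta_transformed = np.linalg.lstsq(Xt, Yt, rcond=None)[0]

# ... equals the direct formula of Equation (18.43)
Oinv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ Y)
print(beta_gls)
```

Using the Cholesky factor is a design convenience only; any F with F′F = Ω⁻¹ yields the same estimator, as Equation (18.43) shows.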

In practice, Ω is typically unknown, so the GLS estimator in Equation (18.43) typically cannot be computed and thus is sometimes called the infeasible GLS estimator. If, however, Ω has a known functional form but the parameters of that function are unknown, then Ω can be estimated and a feasible version of the GLS estimator can be computed.

GLS When Ω Contains Unknown Parameters

If Ω is a known function of some parameters that in turn can be estimated, then these estimated parameters can be used to calculate an estimator of the covariance matrix Ω. For example, consider the time series application discussed following Equation (18.41), in which Ω(X) does not depend on X, Ωᵢᵢ = σ²_u, Ωᵢⱼ = ρσ²_u for |i − j| = 1, and Ωᵢⱼ = 0 for |i − j| > 1. Then Ω has two unknown parameters, σ²_u and ρ. These parameters can be estimated using the residuals from a preliminary OLS regression; specifically, σ²_u can be estimated by s²_û, and ρ can be estimated by the sample correlation between all neighboring pairs of OLS residuals. These estimated parameters can in turn be used to compute an estimator of Ω, Ω̂.

In general, suppose that you have an estimator Ω̂ of Ω. Then the GLS estimator based on Ω̂ is

    β̃^GLS = (X′Ω̂⁻¹X)⁻¹(X′Ω̂⁻¹Y).    (18.44)

The GLS estimator in Equation (18.44) is sometimes called the feasible GLS estimator because it can be computed if the covariance matrix contains some unknown parameters that can be estimated.
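The feasible GLS procedure just described can be sketched for the banded Ω of the example. The data are simulated and the particular estimators of σ²_u and ρ (the SER and the first sample autocorrelation of the OLS residuals) are one reasonable implementation of the text's prescription, not the only one:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, -0.5])

# True error covariance: banded, as in the text's example
sigma2, rho = 2.0, 0.3
Omega = sigma2 * (np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1)))
Y = X @ beta + np.linalg.cholesky(Omega) @ rng.normal(size=n)

# Step 1: preliminary OLS and its residuals
beta_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
u = Y - X @ beta_ols

# Step 2: estimate sigma2_u by s^2 and rho by the correlation of neighboring residuals
k = X.shape[1] - 1
s2 = u @ u / (n - k - 1)
rho_hat = (u[1:] @ u[:-1]) / (u @ u)

# Step 3: feasible GLS, Equation (18.44), using the estimated Omega-hat
Omega_hat = s2 * (np.eye(n) + rho_hat * (np.eye(n, k=1) + np.eye(n, k=-1)))
Oinv = np.linalg.inv(Omega_hat)
beta_fgls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ Y)
print(beta_fgls)
```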

The Zero Conditional Mean Assumption and GLS
For the OLS estimator to be consistent, the first least squares assumption must hold; that is, E(uᵢ|Xᵢ) must be zero. In contrast, the first GLS assumption is that E(uᵢ|X₁, . . . , X_n) = 0. In other words, the first OLS assumption is that the error for the ith observation has a conditional mean of zero, given the values of the regressors for that observation, whereas the first GLS assumption is that uᵢ has a conditional mean of zero, given the values of the regressors for all observations. As discussed following Equation (18.40), the assumptions that E(uᵢ|Xᵢ) = 0 and that sampling is i.i.d. together imply that E(uᵢ|X₁, . . . , X_n) = 0. Thus, when sampling is i.i.d. so that GLS is WLS, the first GLS assumption is implied by the first least squares assumption in Key Concept 18.1.


When sampling is not i.i.d., however, the first GLS assumption is not implied by the assumption that E(uᵢ|Xᵢ) = 0; that is, the first GLS assumption is stronger. Although the distinction between these two conditions might seem slight, it can be very important in applications to time series data. This distinction is discussed in Section 15.5 in the context of whether the regressor is "past and present" exogenous or "strictly" exogenous; the assumption that E(uᵢ|X₁, . . . , X_n) = 0 corresponds to strict exogeneity. Here, we discuss this distinction at a more general level using matrix notation. To do so, we focus on the case that uᵢ is homoskedastic, Ω is known, and Ω has nonzero off-diagonal elements.

The role of the first GLS assumption.  To see the source of the difference between these assumptions, it is useful to contrast the consistency arguments for GLS and OLS.

We first sketch the argument for the consistency of the GLS estimator in Equation (18.43). Substituting Equation (18.4) into Equation (18.43), we have β̂^GLS = β + (X′Ω⁻¹X/n)⁻¹(X′Ω⁻¹U/n). Under the first GLS assumption, E(X′Ω⁻¹U) = E[X′Ω⁻¹E(U|X)] = 0_{k+1}. If in addition the variance of X′Ω⁻¹U/n tends to zero and X′Ω⁻¹X/n converges in probability to some invertible matrix Q, then β̂^GLS is consistent for β. Critically, when Ω has off-diagonal elements, the term X′Ω⁻¹U = Σᵢ Σⱼ Xᵢ(Ω⁻¹)ᵢⱼuⱼ involves products of Xᵢ and uⱼ for different i and j, where (Ω⁻¹)ᵢⱼ denotes the (i, j) element of Ω⁻¹. Thus for X′Ω⁻¹U to have a mean of zero, it is not enough that E(uᵢ|Xᵢ) = 0; rather, E(uⱼ|Xᵢ) must equal zero for all i, j pairs corresponding to nonzero values of (Ω⁻¹)ᵢⱼ. Depending on the covariance structure of the errors, only some or all of the elements of (Ω⁻¹)ᵢⱼ might be nonzero. For example, if uᵢ follows a first-order autoregression (as discussed in Section 15.5), the only nonzero elements of (Ω⁻¹)ᵢⱼ are those for which |i − j| ≤ 1. In general, however, all the elements of Ω⁻¹ can be nonzero, so in general for X′Ω⁻¹U/n to converge in probability to 0_{k+1} (and thus for β̂^GLS to be consistent) we need that E(U|X) = 0_n; that is, the first GLS assumption must hold.

In contrast, recall the argument that the OLS estimator is consistent. Rewrite Equation (18.14) as β̂ = β + (X′X/n)⁻¹(n⁻¹Σᵢ₌₁ⁿ Xᵢuᵢ). If E(uᵢ|Xᵢ) = 0, then the term n⁻¹Σᵢ₌₁ⁿ Xᵢuᵢ has mean zero and, if this term has a variance that tends to zero, it converges in probability to zero. If in addition X′X/n converges in probability to an invertible matrix Q, then β̂ is consistent for β.

Is the first GLS assumption restrictive?  The first GLS assumption requires that the errors for the ith observation be uncorrelated with the regressors for all other observations. This assumption is dubious in some time series applications. This issue is discussed in Section 15.6 in the context of an empirical example, the relationship between the change in the price of a contract for future delivery of frozen orange concentrate and the weather in Florida. As explained there, the error term in the regression of price changes on the weather is plausibly uncorrelated with current and past values of the weather, so the first OLS assumption holds. However, this error term is plausibly correlated with future values of the weather, so the first GLS assumption does not hold.

This example illustrates a general phenomenon in economic time series data that arises when the value of a variable today is set in part based on expectations of the future: Those future expectations typically imply that the error term today depends on a forecast of the regressor tomorrow, which in turn is correlated with the actual value of the regressor tomorrow. For this reason, the first GLS assumption is in fact much stronger than the first OLS assumption. Accordingly, in some applications with economic time series data the GLS estimator is not consistent even though the OLS estimator is.

18.7 Instrumental Variables and Generalized Method of Moments Estimation
This section provides an introduction to the theory of instrumental variables (IV) estimation and the asymptotic distribution of IV estimators. It is assumed throughout that the IV regression assumptions in Key Concepts 12.3 and 12.4 hold and, moreover, that the instruments are strong. These assumptions apply to cross-sectional data with i.i.d. observations. Under certain conditions the results derived in this section are applicable to time series data as well, and the extension to time series data is briefly discussed at the end of this section. All asymptotic results in this section are developed under the assumption of strong instruments.

This section begins by presenting the IV regression model, the two stage least squares (TSLS) estimator, and its asymptotic distribution in the general case of heteroskedasticity, all in matrix form. It is next shown that, in the special case of homoskedasticity, the TSLS estimator is asymptotically efficient among the class of IV estimators in which the instruments are linear combinations of the exogenous variables. Moreover, the J-statistic has an asymptotic chi-squared distribution in which the degrees of freedom equals the number of overidentifying restrictions. This section concludes with a discussion of efficient IV estimation and the test of overidentifying restrictions when the errors are heteroskedastic, a situation in which the efficient IV estimator is known as the efficient generalized method of moments (GMM) estimator.

728 CHAPTER 18 The Theory of Multiple Regression

The IV Estimator in Matrix Form

In this section, we let X denote the n × (k + r + 1) matrix of the regressors in the equation of interest, so X contains the included endogenous regressors (the X's in Key Concept 12.1) and the included exogenous regressors (the W's in Key Concept 12.1). That is, in the notation of Key Concept 12.1, the ith row of X is X_i' = (1 X_1i X_2i ... X_ki W_1i ... W_ri). Also, let Z denote the n × (m + r + 1) matrix of all the exogenous regressors, both those included in the equation of interest (the W's) and those excluded from the equation of interest (the instruments). That is, in the notation of Key Concept 12.1, the ith row of Z is Z_i' = (1 Z_1i Z_2i ... Z_mi W_1i ... W_ri).

With this notation, the IV regression model of Key Concept 12.1, written in matrix form, is

Y = Xβ + U,    (18.45)

where U is the n × 1 vector of errors in the equation of interest, with ith element u_i.

The matrix Z consists of all the exogenous regressors, so under the IV regression assumptions in Key Concept 12.4,

E(Z_i u_i) = 0  (instrument exogeneity).    (18.46)

Because there are k included endogenous regressors, the first stage regression consists of k equations.

The TSLS estimator. The TSLS estimator is the instrumental variables estimator in which the instruments are the predicted values of X based on OLS estimation of the first stage regression. Let X̂ denote this matrix of predicted values, so that the ith row of X̂ is X̂_i' = (1 X̂_1i ... X̂_ki W_1i ... W_ri), where X̂_1i is the predicted value from the regression of X_1i on Z, and so forth. Because the W's are contained in Z, the predicted value from a regression of W_1i on Z is just W_1i, and so forth, so X̂ = P_Z X, where P_Z = Z(Z'Z)⁻¹Z' [see Equation (18.27)]. Accordingly, the TSLS estimator is

β̂^TSLS = (X̂'X̂)⁻¹ X̂'Y.    (18.47)

Because X̂ = P_Z X and P_Z is idempotent, X̂'X̂ = X'P_Z'P_Z X = X'P_Z X and X̂'Y = X'P_Z Y, so the TSLS estimator can be rewritten as

β̂^TSLS = (X'P_Z X)⁻¹ X'P_Z Y.    (18.48)
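The two expressions for the TSLS estimator can be checked numerically. The sketch below is illustrative only: the simulated design, coefficient values, and variable names are assumptions introduced here, not taken from the text. It builds P_Z = Z(Z'Z)⁻¹Z' and verifies that the estimators in Equations (18.47) and (18.48) coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Illustrative design: one endogenous regressor X1, one included
# exogenous regressor W, two excluded instruments Z1 and Z2.
Z1, Z2, W = rng.normal(size=(3, n))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)        # error correlated with X1 through v
X1 = Z1 + 0.5 * Z2 + W + v              # first-stage relation
Y = 1.0 + 2.0 * X1 - 1.0 * W + u        # equation of interest

X = np.column_stack([np.ones(n), X1, W])      # included regressors
Z = np.column_stack([np.ones(n), Z1, Z2, W])  # all exogenous regressors

# P_Z = Z (Z'Z)^{-1} Z' and the predicted values Xhat = P_Z X
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
Xhat = PZ @ X

# Equation (18.47): regress Y on the predicted values Xhat
beta_a = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ Y)
# Equation (18.48): the same estimator written with P_Z directly
beta_b = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)

print(beta_a)  # estimates of (beta_0, beta_1, beta_2), near (1, 2, -1)
```

Because the W's (and the constant) are columns of Z, they are reproduced exactly by the projection P_Z, as the text notes.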


Asymptotic Distribution of the TSLS Estimator

Substituting Equation (18.45) into Equation (18.48), rearranging, and multiplying by √n yields the expression for the centered and scaled TSLS estimator:

√n(β̂^TSLS − β) = (X'P_Z X/n)⁻¹(X'P_Z U/√n)
             = [(X'Z/n)(Z'Z/n)⁻¹(Z'X/n)]⁻¹(X'Z/n)(Z'Z/n)⁻¹(Z'U/√n),    (18.49)

where the second equality uses the definition of P_Z. Under the IV regression assumptions, X'Z/n →p Q_XZ and Z'Z/n →p Q_ZZ, where Q_XZ = E(X_i Z_i') and Q_ZZ = E(Z_i Z_i'). In addition, under the IV regression assumptions Z_i u_i is i.i.d. with mean zero [Equation (18.46)] and a nonzero finite variance, so its sum, divided by √n, satisfies the conditions of the central limit theorem and

Z'U/√n = (1/√n) Σ_{i=1}^n Z_i u_i →d Ψ ~ N(0, H), where H = E(Z_i Z_i' u_i²),    (18.50)

where Ψ is (m + r + 1) × 1.

Application of Equation (18.50) and of the limits X'Z/n →p Q_XZ and Z'Z/n →p Q_ZZ to Equation (18.49) yields the result that, under the IV regression assumptions, the TSLS estimator is asymptotically normally distributed:

√n(β̂^TSLS − β) →d N(0, Σ^TSLS),    (18.51)

where

Σ^TSLS = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ Q_XZ Q_ZZ⁻¹ H Q_ZZ⁻¹ Q_ZX (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹,    (18.52)

where H is defined in Equation (18.50).

Standard errors for TSLS. The formula in Equation (18.52) is daunting. Nevertheless, it provides a way to estimate Σ^TSLS by substituting sample moments for the population moments. The resulting variance estimator is

Σ̂^TSLS = (Q̂_XZ Q̂_ZZ⁻¹ Q̂_ZX)⁻¹ Q̂_XZ Q̂_ZZ⁻¹ Ĥ Q̂_ZZ⁻¹ Q̂_ZX (Q̂_XZ Q̂_ZZ⁻¹ Q̂_ZX)⁻¹,    (18.53)

where Q̂_XZ = X'Z/n, Q̂_ZZ = Z'Z/n, Q̂_ZX = Z'X/n, and

Ĥ = (1/n) Σ_{i=1}^n Z_i Z_i' û_i², where Û = Y − Xβ̂^TSLS,    (18.54)


where Û is the vector of TSLS residuals and û_i is the ith element of that vector (the TSLS residual for the ith observation).
The TSLS standard errors are the square roots of the diagonal elements of Σ̂^TSLS/n.
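The variance estimator in Equations (18.53) and (18.54) can be sketched in a few lines. The simulation below is an illustrative assumption, not an example from the text; it computes TSLS, the sample moment matrices, Ĥ, and the resulting heteroskedasticity-robust standard errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Illustrative simulated design: one endogenous regressor, two instruments.
Z1, Z2 = rng.normal(size=(2, n))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
X1 = Z1 + 0.5 * Z2 + v
Y = 1.0 + 2.0 * X1 + u

X = np.column_stack([np.ones(n), X1])
Z = np.column_stack([np.ones(n), Z1, Z2])

PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)
uhat = Y - X @ beta                         # TSLS residuals

# Sample moments: Qxz = X'Z/n, Qzz = Z'Z/n, and H-hat from Equation (18.54)
Qxz = X.T @ Z / n
Qzz = Z.T @ Z / n
H = (Z * uhat[:, None] ** 2).T @ Z / n      # (1/n) sum_i Z_i Z_i' uhat_i^2

A = Qxz @ np.linalg.solve(Qzz, Qxz.T)       # Qxz Qzz^{-1} Qzx
B = Qxz @ np.linalg.solve(Qzz, H) @ np.linalg.solve(Qzz, Qxz.T)
Sigma_tsls = np.linalg.solve(A, B) @ np.linalg.inv(A)   # Equation (18.53)

se = np.sqrt(np.diag(Sigma_tsls) / n)       # TSLS standard errors
```

The standard errors are the square roots of the diagonal of Σ̂^TSLS/n, matching the sentence above.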

Properties of TSLS When the Errors Are Homoskedastic

If the errors are homoskedastic, then the TSLS estimator is asymptotically efficient among the class of IV estimators in which the instruments are linear combinations of the columns of Z. This result is the IV counterpart of the Gauss-Markov theorem, and it constitutes an important justification for using TSLS.

The TSLS distribution under homoskedasticity. If the errors are homoskedastic, that is, if E(u_i²|Z_i) = σ_u², then H = E(Z_i Z_i' u_i²) = E[E(Z_i Z_i' u_i²|Z_i)] = E[Z_i Z_i' E(u_i²|Z_i)] = Q_ZZ σ_u². In this case, the variance of the asymptotic distribution of the TSLS estimator in Equation (18.52) simplifies to

Σ^TSLS = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ σ_u²  (homoskedasticity only).    (18.55)

The homoskedasticity-only estimator of the TSLS variance matrix is

Σ̂^TSLS = (Q̂_XZ Q̂_ZZ⁻¹ Q̂_ZX)⁻¹ σ̂_u², where σ̂_u² = Û'Û/(n − k − r − 1)  (homoskedasticity only),    (18.56)

and the homoskedasticity-only TSLS standard errors are the square roots of the diagonal elements of Σ̂^TSLS/n.
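The simpler homoskedasticity-only formula in Equation (18.56) can be sketched as follows. The simulated design (with errors that really are homoskedastic) is an illustrative assumption, not from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
k, r = 1, 0                                  # one endogenous regressor, no W's

# Illustrative simulated design with homoskedastic errors.
Z1, Z2 = rng.normal(size=(2, n))
v = rng.normal(size=n)
u = 0.6 * v + 0.8 * rng.normal(size=n)       # Var(u) = 1, independent of Z
X1 = Z1 + Z2 + v
Y = 1.0 + 2.0 * X1 + u

X = np.column_stack([np.ones(n), X1])
Z = np.column_stack([np.ones(n), Z1, Z2])

PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)
uhat = Y - X @ beta

# Equation (18.56): residual variance with a degrees-of-freedom correction
sigma2_u = uhat @ uhat / (n - k - r - 1)

Qxz = X.T @ Z / n
Qzz = Z.T @ Z / n
Sigma_homo = np.linalg.inv(Qxz @ np.linalg.solve(Qzz, Qxz.T)) * sigma2_u
se_homo = np.sqrt(np.diag(Sigma_homo) / n)   # homoskedasticity-only SEs
```

Under this design, σ̂_u² should be close to the true error variance of 1, and the homoskedasticity-only and robust standard errors are asymptotically equivalent.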

The class of IV estimators that use linear combinations of Z. The class of IV estimators that use linear combinations of Z as instruments can be generated in two equivalent ways.

The first way treats the problem of estimation as one of minimizing a quadratic objective function, just as the OLS estimator is derived by minimizing the sum of squared residuals. Under the assumption of instrument exogeneity, the errors U = Y − Xβ are uncorrelated with the exogenous regressors; that is, at the true value of β, Equation (18.46) implies that

E[(Y − Xβ)'Z] = 0.    (18.57)

Equation (18.57) constitutes a system of m + r + 1 equations involving the k + r + 1 unknown elements of β. In population, these equations are redundant, in the sense that all are satisfied at the true value of β. When these population moments are replaced by their sample moments, the system of equations (Y − Xb)'Z = 0 can be solved for b when there is exact identification. This value of b is the IV estimator of β. However, when there is overidentification (m > k), the system of equations typically cannot all be satisfied by the same value of b because of sampling variation (there are more equations than unknowns), and in general this system does not have a solution.

One approach to the problem of estimating β when there is overidentification is to trade off the desire to satisfy each equation by minimizing a quadratic form involving all the equations. Specifically, let A be an (m + r + 1) × (m + r + 1) symmetric positive semidefinite weight matrix, and let β̂_A^IV denote the estimator that minimizes

min_b (Y − Xb)'ZAZ'(Y − Xb).    (18.58)

The solution to this minimization problem is found by taking the derivative of the objective function with respect to b, setting the result equal to zero, and rearranging. Doing so yields β̂_A^IV, the IV estimator based on the weight matrix A:

β̂_A^IV = (X'ZAZ'X)⁻¹ X'ZAZ'Y.    (18.59)

Comparison of Equations (18.59) and (18.48) shows that TSLS is the IV estimator with A = (Z'Z)⁻¹. That is, TSLS is the solution of the minimization problem in Equation (18.58) with A = (Z'Z)⁻¹.

The calculations leading to Equations (18.51) and (18.52), applied to β̂_A^IV, show that

√n(β̂_A^IV − β) →d N(0, Σ_A^IV), where Σ_A^IV = (Q_XZ A Q_ZX)⁻¹ Q_XZ A H A Q_ZX (Q_XZ A Q_ZX)⁻¹.    (18.60)

The second way to generate the class of IV estimators that use linear combinations of Z is to consider IV estimators in which the instruments are ZB, where B is an (m + r + 1) × (k + r + 1) matrix with full column rank. Then the system of (k + r + 1) equations, (Y − Xb)'ZB = 0, can be solved uniquely for the (k + r + 1) unknown elements of b. Solving these equations for b yields β̂_B^IV = (B'Z'X)⁻¹(B'Z'Y), and substitution of B = AZ'X into this expression yields Equation (18.59). Thus the two approaches to defining IV estimators that are linear combinations of the instruments yield the same family of IV estimators. It is conventional to work with the first approach, in which the IV estimator solves the quadratic minimization problem in Equation (18.58), and that is the approach taken here.
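The weight-matrix representation in Equation (18.59) can be checked numerically. In this illustrative sketch (the simulated data and parameter values are assumptions, not from the text), setting A = (Z'Z)⁻¹ reproduces TSLS exactly, while a different weight matrix (here the identity) produces a different, but still consistent, member of the class.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000

# Illustrative overidentified design: one endogenous regressor, two instruments.
Z1, Z2 = rng.normal(size=(2, n))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
X1 = Z1 + 0.5 * Z2 + v
Y = 2.0 * X1 + u                              # equation of interest (no intercept)

X = X1[:, None]
Z = np.column_stack([Z1, Z2])

def beta_iv(A):
    """IV estimator for weight matrix A, Equation (18.59)."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ A @ XZ.T, XZ @ A @ Z.T @ Y)

# A = (Z'Z)^{-1} reproduces TSLS, Equation (18.48)
A_tsls = np.linalg.inv(Z.T @ Z)
PZ = Z @ A_tsls @ Z.T
beta_tsls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)

# Any other positive definite A gives a different member of the class.
beta_ident = beta_iv(np.eye(2))
```

Both estimators are consistent for the true coefficient of 2.0; they differ only in their sampling variability.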


Asymptotic efficiency of TSLS under homoskedasticity. If the errors are homoskedastic, then H = Q_ZZ σ_u² and the expression for Σ_A^IV in Equation (18.60) becomes

Σ_A^IV = (Q_XZ A Q_ZX)⁻¹ Q_XZ A Q_ZZ A Q_ZX (Q_XZ A Q_ZX)⁻¹ σ_u².    (18.61)

To show that TSLS is asymptotically efficient among the class of estimators that are linear combinations of Z when the errors are homoskedastic, we need to show that, under homoskedasticity,

c'Σ^TSLS c ≤ c'Σ_A^IV c    (18.62)

for all positive semidefinite matrices A and all (k + r + 1) × 1 vectors c, where Σ^TSLS = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ σ_u² [Equation (18.55)]. Equation (18.62), which is proven in Appendix 18.6, is the same efficiency criterion as is used in the multivariate Gauss-Markov theorem in Key Concept 18.3. Consequently, TSLS is the efficient IV estimator under homoskedasticity, among the class of estimators in which the instruments are linear combinations of Z.

The J-statistic under homoskedasticity. The J-statistic (Key Concept 12.6) tests the null hypothesis that all the overidentifying restrictions hold, against the alternative that some or all of them do not hold.

The idea of the J-statistic is that, if the overidentifying restrictions hold, u_i will be uncorrelated with the instruments, so a regression of U on Z has population regression coefficients that all equal zero. In practice, U is not observed, but it can be estimated by the TSLS residuals Û, so a regression of Û on Z should yield statistically insignificant coefficients. Accordingly, the TSLS J-statistic is the homoskedasticity-only F-statistic testing the hypothesis that the coefficients on Z are all zero, in the regression of Û on Z, multiplied by (m + r + 1) so that the F-statistic is in its asymptotic chi-squared form.

An explicit formula for the J-statistic can be obtained using Equation (7.13) for the homoskedasticity-only F-statistic. The unrestricted regression is the regression of Û on the m + r + 1 regressors Z, and the restricted regression has no regressors. Thus, in the notation of Equation (7.13), SSR_unrestricted = Û'M_Z Û and SSR_restricted = Û'Û, so SSR_restricted − SSR_unrestricted = Û'Û − Û'M_Z Û = Û'P_Z Û, and the J-statistic is

J = Û'P_Z Û / [Û'M_Z Û/(n − m − r − 1)].    (18.63)


The method for computing the J-statistic described in Key Concept 12.6 entails testing only the hypothesis that the coefficients on the excluded instruments are zero. Although these two methods have different computational steps, they produce identical J-statistics (Exercise 18.14).

It is shown in Appendix 18.6 that, under the null hypothesis that E(u_i Z_i) = 0,

J →d χ²_{m−k}.    (18.64)
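A sketch of the J-statistic computation in Equation (18.63) is given below. The simulated design is an illustrative assumption, not from the text; both instruments are valid, so the statistic should typically be small relative to χ²₁ critical values.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
m, r = 2, 0                                   # two instruments, no included W's

# Illustrative design in which both instruments are valid, so the
# overidentifying restriction should usually not be rejected.
Z1, Z2 = rng.normal(size=(2, n))
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)
X1 = Z1 + 0.5 * Z2 + v
Y = 1.0 + 2.0 * X1 + u

X = np.column_stack([np.ones(n), X1])
Z = np.column_stack([np.ones(n), Z1, Z2])

PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
MZ = np.eye(n) - PZ
beta = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)
uhat = Y - X @ beta                           # TSLS residuals

# Equation (18.63): J = uhat'P_Z uhat / [uhat'M_Z uhat / (n - m - r - 1)]
J = (uhat @ PZ @ uhat) / (uhat @ MZ @ uhat / (n - m - r - 1))
# Under H0, J is asymptotically chi-squared with m - k = 1 degree of freedom
```

Comparing J with the 5% critical value of the χ²₁ distribution (3.84) implements the overidentifying restrictions test.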

Generalized Method of Moments Estimation in Linear Models

If the errors are heteroskedastic, then the TSLS estimator is no longer efficient among the class of IV estimators that use linear combinations of Z as instruments. The efficient estimator in this case is known as the efficient generalized method of moments (GMM) estimator. In addition, if the errors are heteroskedastic, then the J-statistic as defined in Equation (18.63) no longer has a chi-squared distribution. However, an alternative formulation of the J-statistic, constructed using the efficient GMM estimator, does have a chi-squared distribution with m − k degrees of freedom.

These results parallel the results for the estimation of the usual regression model with exogenous regressors and heteroskedastic errors: If the errors are heteroskedastic, then the OLS estimator is not efficient among estimators that are linear in Y (the Gauss-Markov conditions are not satisfied), and the homoskedasticity-only F-statistic no longer has an F distribution, even in large samples. In the regression model with exogenous regressors and heteroskedasticity, the efficient estimator is weighted least squares; in the IV regression model with heteroskedasticity, the efficient estimator uses a different weighting matrix than TSLS, and the resulting estimator is the efficient GMM estimator.

GMM estimation. Generalized method of moments (GMM) estimation is a general method for the estimation of the parameters of linear or nonlinear models, in which the parameters are chosen to provide the best fit to multiple equations, each of which sets a sample moment to zero. These equations, which in the context of GMM are called moment conditions, typically cannot all be satisfied simultaneously. The GMM estimator trades off the desire to satisfy each of the equations by minimizing a quadratic objective function.

In the linear IV regression model with exogenous variables Z, the class of GMM estimators consists of all the estimators that are solutions to the quadratic minimization problem in Equation (18.58). Thus the class of GMM estimators based on the full set of instruments Z with different weight matrices A is the same as the class of IV estimators in which the instruments are linear combinations of Z. In the linear IV regression model, GMM is just another name for the class of estimators we have been studying, that is, estimators that solve Equation (18.58).

The asymptotically efficient GMM estimator. Among the class of GMM estimators, the efficient GMM estimator is the GMM estimator with the smallest asymptotic variance matrix [where the smallest variance matrix is defined as in Equation (18.62)]. Thus the result in Equation (18.62) can be restated as saying that TSLS is the efficient GMM estimator in the linear IV regression model when the errors are homoskedastic.

To motivate the expression for the efficient GMM estimator when the errors are heteroskedastic, recall that when the errors are homoskedastic, H [the variance matrix of Z_i u_i; see Equation (18.50)] equals Q_ZZ σ_u², and the asymptotically efficient weight matrix is obtained by setting A = (Z'Z)⁻¹, which yields the TSLS estimator. In large samples, using the weight matrix A = (Z'Z)⁻¹ is equivalent to using A = (Q_ZZ σ_u²)⁻¹ = H⁻¹. This interpretation of the TSLS estimator suggests that, by analogy, the efficient IV estimator under heteroskedasticity can be obtained by setting A = H⁻¹ and solving

min_b (Y − Xb)'Z H⁻¹ Z'(Y − Xb).    (18.65)

This analogy is correct: The solution to the minimization problem in Equation (18.65) is the efficient GMM estimator. Let β̂^Eff.GMM denote the solution to the minimization problem in Equation (18.65). By Equation (18.59), this estimator is

β̂^Eff.GMM = (X'Z H⁻¹ Z'X)⁻¹ X'Z H⁻¹ Z'Y.    (18.66)

The asymptotic distribution of β̂^Eff.GMM is obtained by substituting A = H⁻¹ into Equation (18.60) and simplifying; thus

√n(β̂^Eff.GMM − β) →d N(0, Σ^Eff.GMM), where Σ^Eff.GMM = (Q_XZ H⁻¹ Q_ZX)⁻¹.    (18.67)

The result that β̂^Eff.GMM is the efficient GMM estimator is proven by showing that c'Σ^Eff.GMM c ≤ c'Σ_A^IV c for all vectors c, where Σ_A^IV is given in Equation (18.60). The proof of this result is given in Appendix 18.6.


Feasible efficient GMM estimation. The GMM estimator defined in Equation (18.66) is not a feasible estimator because it depends on the unknown variance matrix H. However, a feasible efficient GMM estimator can be computed by substituting a consistent estimator of H into the minimization problem of Equation (18.65) or, equivalently, by substituting a consistent estimator of H into the formula for β̂^Eff.GMM in Equation (18.66).

The efficient GMM estimator can be computed in two steps. In the first step, estimate β using any consistent estimator. Use this estimator of β to compute the residuals from the equation of interest, and then use these residuals to compute an estimator of H. In the second step, use this estimator of H to estimate the optimal weight matrix H⁻¹ and to compute the efficient GMM estimator. To be concrete, in the linear IV regression model, it is natural to use the TSLS estimator in the first step and to use the TSLS residuals to estimate H. If TSLS is used in the first step, then the feasible efficient GMM estimator computed in the second step is

β̃^Eff.GMM = (X'Z Ĥ⁻¹ Z'X)⁻¹ X'Z Ĥ⁻¹ Z'Y,    (18.68)

where Ĥ is given in Equation (18.54).

Because Ĥ →p H, √n(β̃^Eff.GMM − β̂^Eff.GMM) →p 0 (Exercise 18.12), and

√n(β̃^Eff.GMM − β) →d N(0, Σ^Eff.GMM),    (18.69)

where Σ^Eff.GMM = (Q_XZ H⁻¹ Q_ZX)⁻¹ [Equation (18.67)]. That is, the feasible two-step estimator β̃^Eff.GMM in Equation (18.68) is, asymptotically, the efficient GMM estimator.
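The two-step procedure can be sketched as follows. The simulation, with heteroskedastic errors so that efficient GMM genuinely differs from TSLS, is an illustrative assumption, not an example from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

# Illustrative design with heteroskedastic errors.
Z1, Z2 = rng.normal(size=(2, n))
v = rng.normal(size=n)
u = (0.5 * v + rng.normal(size=n)) * np.sqrt(0.2 + Z1 ** 2)  # heteroskedastic
X1 = Z1 + 0.5 * Z2 + v
Y = 1.0 + 2.0 * X1 + u

X = np.column_stack([np.ones(n), X1])
Z = np.column_stack([np.ones(n), Z1, Z2])

# Step 1: TSLS, then residuals, then H-hat as in Equation (18.54)
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_tsls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)
uhat = Y - X @ beta_tsls
H = (Z * uhat[:, None] ** 2).T @ Z / n

# Step 2: Equation (18.68), with estimated weight matrix H-hat^{-1}
Hinv = np.linalg.inv(H)
XZ = X.T @ Z
beta_gmm = np.linalg.solve(XZ @ Hinv @ XZ.T, XZ @ Hinv @ Z.T @ Y)
```

Both estimators are consistent; the second step reweights the moment conditions so that, asymptotically, the feasible estimator attains the efficiency bound in Equation (18.67).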

The heteroskedasticity-robust J-statistic. The heteroskedasticity-robust J-statistic, also known as the GMM J-statistic, is the counterpart of the TSLS-based J-statistic, computed using the efficient GMM estimator and weight function. That is, the GMM J-statistic is given by

J^GMM = Û^Eff.GMM' Z Ĥ⁻¹ Z' Û^Eff.GMM / n,    (18.70)

where Û^Eff.GMM = Y − Xβ̃^Eff.GMM is the vector of residuals from the equation of interest, estimated by (feasible) efficient GMM, and Ĥ⁻¹ is the weight matrix used to compute β̃^Eff.GMM.

Under the null hypothesis E(Z_i u_i) = 0, J^GMM →d χ²_{m−k} (see Appendix 18.6).
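A sketch of the GMM J-statistic in Equation (18.70), computed on illustrative simulated data (an assumption introduced here, not from the text) with valid instruments and heteroskedastic errors:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000

# Illustrative design: valid instruments, heteroskedastic errors.
Z1, Z2 = rng.normal(size=(2, n))
v = rng.normal(size=n)
u = (0.5 * v + rng.normal(size=n)) * np.sqrt(0.2 + Z1 ** 2)
X1 = Z1 + 0.5 * Z2 + v
Y = 1.0 + 2.0 * X1 + u

X = np.column_stack([np.ones(n), X1])
Z = np.column_stack([np.ones(n), Z1, Z2])

# Two-step efficient GMM, with TSLS in the first step
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
b1 = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ Y)
e1 = Y - X @ b1
H = (Z * e1[:, None] ** 2).T @ Z / n
Hinv = np.linalg.inv(H)
XZ = X.T @ Z
b_gmm = np.linalg.solve(XZ @ Hinv @ XZ.T, XZ @ Hinv @ Z.T @ Y)

# Equation (18.70): GMM J-statistic, asymptotically chi-squared(m - k) under H0
e_gmm = Y - X @ b_gmm
J_gmm = (e_gmm @ Z @ Hinv @ Z.T @ e_gmm) / n
```

Because the instruments are valid in this design, J_gmm should typically fall below the χ²₁ critical values, in contrast with a design in which an instrument is correlated with the error.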


GMM with time series data. The results in this section were derived under the IV regression assumptions for cross-sectional data. In many applications, however, these results extend to time series applications of IV regression and GMM. Although a formal mathematical treatment of GMM with time series data is beyond the scope of this book (for such a treatment, see Hayashi, 2000, Chapter 6), we nevertheless will summarize the key ideas of GMM estimation with time series data. This summary assumes familiarity with the material in Chapters 14 and 15. For this discussion, it is assumed that the variables are stationary.

It is useful to distinguish between two types of applications: applications in which the error term u_t is serially correlated and applications in which u_t is serially uncorrelated. If the error term u_t is serially correlated, then the asymptotic distribution of the GMM estimator continues to be normal, but the formula for H in Equation (18.50) is no longer correct. Instead, the correct expression for H depends on the autocovariances of Z_t u_t and is analogous to the formula given in Equation (15.14) for the variance of the OLS estimator when the error term is serially correlated. The efficient GMM estimator is still constructed using a consistent estimator of H; however, that consistent estimator must be computed using the HAC methods discussed in Chapter 15.

If the error term u_t is not serially correlated, then HAC estimation of H is unnecessary and the formulas presented in this section all extend to time series GMM applications. In modern applications to finance and macroeconometrics, it is common to encounter models in which the error term represents an unexpected or unforecastable disturbance, in which case the model implies that u_t is serially uncorrelated. For example, consider a model with a single included endogenous variable and no included exogenous variables, so that the equation of interest is Y_t = β_0 + β_1 X_t + u_t. Suppose an economic theory implies that u_t is unpredictable given past information. Then the theory implies the moment condition

E(u_t | Y_{t−1}, X_{t−1}, Z_{t−1}, Y_{t−2}, X_{t−2}, Z_{t−2}, ...) = 0,    (18.71)

where Z_{t−1} is the lagged value of some other variable. The moment condition in Equation (18.71) implies that all the lagged variables Y_{t−1}, X_{t−1}, Z_{t−1}, Y_{t−2}, X_{t−2}, Z_{t−2}, ... are candidates for being valid instruments (they satisfy the exogeneity condition). Moreover, because u_{t−1} = Y_{t−1} − β_0 − β_1 X_{t−1}, the moment condition in Equation (18.71) is equivalent to E(u_t | u_{t−1}, X_{t−1}, Z_{t−1}, u_{t−2}, X_{t−2}, Z_{t−2}, ...) = 0. Because u_t is serially uncorrelated, HAC estimation of H is unnecessary. The theory of GMM presented in this section, including efficient GMM estimation and the GMM J-statistic, therefore applies directly to time series applications with moment conditions of the form in Equation (18.71), under the hypothesis that the moment condition in Equation (18.71) is, in fact, correct.
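The use of lagged variables as instruments can be sketched numerically. In this illustrative simulation (an assumption introduced here, not from the text), u_t is i.i.d., X_t is endogenous because it responds to u_t contemporaneously, and TSLS with X_{t−1} and Y_{t−1} as instruments recovers the coefficient that OLS misses.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 2000

# Illustrative simulation: u_t i.i.d. (serially uncorrelated), X_t depends
# on u_t contemporaneously, so X_t is endogenous while its lags are exogenous.
u = rng.normal(size=T)
e = rng.normal(size=T)
X = np.zeros(T)
for t in range(1, T):
    X[t] = 0.5 * X[t - 1] + 0.5 * u[t] + e[t]
Y = 1.0 + 2.0 * X + u

# Equation of interest for t = 2, ..., T, with lagged instruments
Xmat = np.column_stack([np.ones(T - 1), X[1:]])
Z = np.column_stack([np.ones(T - 1), X[:-1], Y[:-1]])

# TSLS with the lagged instruments; no HAC correction is needed here
# because u_t is serially uncorrelated
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ Xmat)
beta = np.linalg.solve(Xhat.T @ Xmat, Xhat.T @ Y[1:])

# OLS for comparison: inconsistent because cov(X_t, u_t) != 0
beta_ols = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ Y[1:])
```

In this design OLS is biased upward (cov(X_t, u_t) > 0), while the lagged-instrument TSLS estimate centers on the true slope of 2.0.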


Summary

1. The linear multiple regression model in matrix form is Y = Xβ + U, where Y is the n × 1 vector of observations on the dependent variable, X is the n × (k + 1) matrix of n observations on the k + 1 regressors (including a constant), β is the k + 1 vector of unknown parameters, and U is the n × 1 vector of error terms.
2. The OLS estimator is β̂ = (X'X)⁻¹X'Y. Under the first four least squares assumptions in Key Concept 18.1, β̂ is consistent and asymptotically normally distributed. If in addition the errors are homoskedastic, then the conditional variance of β̂ is var(β̂|X) = σ_u²(X'X)⁻¹.
3. General linear restrictions on β can be written as the q equations Rβ = r, and this formulation can be used to test joint hypotheses involving multiple coefficients or to construct confidence sets for elements of β.
4. When the regression errors are i.i.d. and normally distributed, conditional on X, β̂ has an exact normal distribution and the homoskedasticity-only t- and F-statistics, respectively, have exact t_{n−k−1} and F_{q,n−k−1} distributions.
5. The Gauss-Markov theorem says that, if the errors are homoskedastic and conditionally uncorrelated across observations and if E(u_i|X) = 0, the OLS estimator is efficient among linear conditionally unbiased estimators (OLS is BLUE).
6. If the error covariance matrix Ω is not proportional to the identity matrix, and if Ω is known or can be estimated, then the GLS estimator is asymptotically more efficient than OLS. However, GLS requires that, in general, u_i be uncorrelated with all observations on the regressors, not just with X_i as is required by OLS, an assumption that must be evaluated carefully in applications.
7. The TSLS estimator is a member of the class of GMM estimators of the linear model. In GMM, the coefficients are estimated by making the sample covariance between the regression error and the exogenous variables as small as possible; specifically, by solving min_b [(Y − Xb)'Z] A [Z'(Y − Xb)], where A is a weight matrix. The asymptotically efficient GMM estimator sets A = [E(Z_i Z_i' u_i²)]⁻¹. When the errors are homoskedastic, the asymptotically efficient GMM estimator in the linear IV regression model is TSLS.

Key Terms

Gauss-Markov conditions for multiple regression (719)
Gauss-Markov theorem for multiple regression (720)
generalized least squares (GLS) (722)
infeasible GLS (725)
feasible GLS (725)
generalized method of moments (GMM) (733)
efficient GMM (734)
heteroskedasticity-robust J-statistic (735)
GMM J-statistic (735)
mean vector (747)
covariance matrix (747)


Review the Concepts

18.1 A researcher studying the relationship between earnings and gender for a group of workers specifies the regression model Y_i = β_0 + X_1i β_1 + X_2i β_2 + u_i, where X_1i is a binary variable that equals 1 if the ith person is a female and X_2i is a binary variable that equals 1 if the ith person is a male. Write the model in the matrix form of Equation (18.2) for a hypothetical set of n = 5 observations. Show that the columns of X are linearly dependent, so that X does not have full rank. Explain how you would respecify the model to eliminate the perfect multicollinearity.

18.2 You are analyzing a linear regression model with 500 observations and one regressor. Explain how you would construct a confidence interval for β_1 if:
a. Assumptions 1-4 in Key Concept 18.1 are true, but you think assumption 5 or 6 might not be true.
b. Assumptions 1-5 are true, but you think assumption 6 might not be true (give two ways to construct the confidence interval).
c. Assumptions 1-6 are true.

18.3 Suppose that assumptions 1-5 in Key Concept 18.1 are true, but that assumption 6 is not. Does the result in Equation (18.31) hold? Explain.

18.4 Can you compute the BLUE estimator of β if Equation (18.41) holds and you do not know Ω? What if you know Ω?

18.5 Construct an example of a regression model that satisfies the assumption E(u_i|X_i) = 0, but for which E(U|X) ≠ 0_n.

Exercises

18.1 Consider the population regression of test scores against income and the square of income in Equation (8.1).
a. Write the regression in Equation (8.1) in the matrix form of Equation (18.5). Define Y, X, U, and β.
b. Explain how to test the null hypothesis that the relationship between test scores and income is linear against the alternative that it is quadratic. Write the null hypothesis in the form of Equation (18.20). What are R, r, and q?

18.2 Suppose a sample of n = 20 households has the sample means and sample covariances below for a dependent variable and two regressors:


Sample Means:    Y = 6.39,  X1 = 7.24,  X2 = 4.00

Sample Covariances:
          Y       X1      X2
  Y      0.26    0.22    0.32
  X1             0.80    0.28
  X2                     2.40

a. Calculate the OLS estimates of β_0, β_1, and β_2. Calculate s_û. Calculate the R² of the regression.
b. Suppose that all six assumptions in Key Concept 18.1 hold. Test the hypothesis that β_1 = 0 at the 5% significance level.
18.3 Let W be an m × 1 vector with covariance matrix Σ_W, where Σ_W is finite and positive definite. Let c be a nonrandom m × 1 vector, and let Q = c'W.
a. Show that var(Q) = c'Σ_W c.
b. Suppose that c ≠ 0_m. Show that 0 < var(Q) < ∞.

18.4 Consider the regression model from Chapter 4, Y_i = β_0 + β_1 X_i + u_i, and assume that the assumptions in Key Concept 4.3 hold.
a. Write the model in the matrix form given in Equations (18.2) and (18.4).
b. Show that assumptions 1-4 in Key Concept 18.1 are satisfied.
c. Use the general formula for β̂ in Equation (18.11) to derive the expressions for β̂_0 and β̂_1 given in Key Concept 4.2.
d. Show that the (1,1) element of Σ_β̂ in Equation (18.13) is equal to the expression for σ²_β̂0 given in Key Concept 4.4.

18.5 Let P_X and M_X be as defined in Equations (18.24) and (18.25).
a. Prove that P_X M_X = 0_{n×n} and that P_X and M_X are idempotent.
b. Derive Equations (18.27) and (18.28).

18.6 Consider the regression model in matrix form, Y = Xβ + Wγ + U, where X is an n × k_1 matrix of regressors and W is an n × k_2 matrix of regressors. Then the OLS estimator of β can be expressed as

β̂ = (X'M_W X)⁻¹ X'M_W Y.

Now let β̂^BV be the "binary variable" fixed effects estimator computed by estimating Equation (10.11) by OLS, and let β̂^DM be the "de-meaning" fixed effects estimator computed by estimating Equation (10.14) by OLS, in which the entity-specific sample means have been subtracted from X and Y. Use the expression for β̂ given above to prove that β̂^BV = β̂^DM. [Hint: Write Equation (10.11) using a full set of fixed effects, D1_i, D2_i, ..., Dn_i, and no constant term. Include all of the fixed effects in W. Write out the matrix M_W X.]

18.7 Consider the regression model Y_i = β_1 X_i + β_2 W_i + u_i, where for simplicity the intercept is omitted and all variables are assumed to have a mean of zero. Suppose X_i is distributed independently of (W_i, u_i), but W_i and u_i might be correlated, and let β̂_1 and β̂_2 be the OLS estimators for this model. Show that:
a. Whether or not W_i and u_i are correlated, β̂_1 →p β_1.
b. If W_i and u_i are correlated, β̂_2 is inconsistent.
c. Let β̃_1 be the OLS estimator from the regression of Y on X (the restricted regression that excludes W). Provide conditions under which β̃_1 has a smaller asymptotic variance than β̂_1, allowing for the possibility that W_i and u_i are correlated.
18.8 Consider the regression model Y_i = β_0 + β_1 X_i + u_i, where u_1 = ũ_1 and u_i = ũ_i + 0.5ũ_{i−1} for i = 2, 3, ..., n. Suppose that the ũ_i are i.i.d. with mean 0 and unit variance and are distributed independently of X_j for all i and j.
a. Derive an expression for E(UU') = Ω.
b. Explain how to estimate the model by GLS without explicitly inverting the matrix Ω. [Hint: Transform the model so that the regression errors are ũ_1, ũ_2, ..., ũ_n.]

18.9 This exercise shows that the OLS estimator of a subset of the regression coefficients is consistent under the conditional mean independence assumption stated in Appendix 13.3. Consider the multiple regression model in matrix form Y = Xβ + Wγ + U, where X and W are, respectively, n × k_1 and n × k_2 matrices of regressors. Let X_i' and W_i' denote the ith rows of X and W [as in Equation (18.3)]. Assume that (i) E(u_i|X_i, W_i) = W_i'δ, where δ is a k_2 × 1 vector of unknown parameters; (ii) (X_i, W_i, Y_i) are i.i.d.; (iii) (X_i, W_i, u_i) have four finite, nonzero moments; and (iv) there is no perfect multicollinearity. These are assumptions 1-4 of Key Concept 18.1, with the conditional mean independence assumption (i) replacing the usual conditional mean zero assumption.
a. Use the expression for β̂ given in Exercise 18.6 to write β̂ − β = (n⁻¹X'M_W X)⁻¹(n⁻¹X'M_W U).
b. Show that n⁻¹X'M_W X →p Σ_XX − Σ_XW Σ_WW⁻¹ Σ_WX, where Σ_XX = E(X_i X_i'), Σ_XW = E(X_i W_i'), and so forth. [The matrix A_n →p A if A_{n,ij} →p A_{ij} for all i, j, where A_{n,ij} and A_{ij} are the (i, j) elements of A_n and A.]
c. Show that assumptions (i) and (ii) imply that E(U|X, W) = Wδ.
d. Use (c) and the law of iterated expectations to show that n⁻¹X'M_W U →p 0_{k_1×1}.
e. Use (a)-(d) to conclude that, under conditions (i)-(iv), β̂ →p β.


18.10 Let C be a symmetric idempotent matrix.
a. Show that the eigenvalues of C are either 0 or 1. [Hint: Note that Cq = γq implies 0 = CCq − Cq = γCq − γq = γ²q − γq, and solve for γ.]
b. Show that trace(C) = rank(C).
c. Let d be an n × 1 vector. Show that d'Cd ≥ 0.

18.11 Suppose that C is an n × n symmetric idempotent matrix with rank r, and let V ~ N(0, I_n).
a. Show that C = AA', where A is n × r with A'A = I_r. [Hint: C is positive semidefinite and can be written as QΛQ', as explained in Appendix 18.1.]
b. Show that A'V ~ N(0, I_r).
c. Show that V'CV ~ χ²_r.

18.12
a. Show that β̂^Eff.GMM is the efficient GMM estimator; that is, show that β̂^Eff.GMM in Equation (18.66) is the solution to Equation (18.65).
b. Show that √n(β̃^Eff.GMM − β̂^Eff.GMM) →p 0.
c. Show that J^GMM →d χ²_{m−k}.

18.13 Consider the problem of minimizing the sum of squared residuals subject to the constraint that Rb = r, where R is q × (k + 1) with rank q. Let β̃ be the value of b that solves the constrained minimization problem.
a. Show that the Lagrangian for the minimization problem is L(b, γ) = (Y − Xb)'(Y − Xb) + γ'(Rb − r), where γ is a q × 1 vector of Lagrange multipliers.



b. Show that β̃ = β̂ − (X'X)⁻¹R'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r).
c. Show that (Y − Xβ̃)'(Y − Xβ̃) − (Y − Xβ̂)'(Y − Xβ̂) = (Rβ̂ − r)'[R(X'X)⁻¹R']⁻¹(Rβ̂ − r).
d. Show that F in Equation (18.36) is equivalent to the homoskedasticity-only F-statistic in Equation (7.13).

18.14 Consider the regression model Y = Xβ + U. Partition X as [X₁ X₂] and β as [β₁′ β₂′]′, where X₁ has k₁ columns and X₂ has k₂ columns. Suppose that X₂′Y = 0_{k₂×1}. Let R = [I_{k₁} 0_{k₁×k₂}].

a. Show that β̂′(X′X)β̂ = (Rβ̂)′[R(X′X)⁻¹R′]⁻¹(Rβ̂).

b. Consider the regression described in Equation (12.17). Let W = [1 W₁ W₂ ... W_r], where 1 is an n × 1 vector of ones, W₁ is the n × 1 vector with i-th element W_{1i}, and so forth. Let Û^TSLS denote the vector of two-stage least squares residuals.

i. Show that W′Û^TSLS = 0.

ii. Show that the method for computing the J-statistic described in Key Concept 12.6 (using a homoskedasticity-only F-statistic) and the formula in Equation (18.63) produce the same value for the J-statistic. [Hint: Use the results in (a), (b.i), and Exercise 18.13.]

18.15 (Consistency of clustered standard errors.) Consider the panel data model Y_it = βX_it + α_i + u_it, where all variables are scalars. Assume that assumptions 1, 2, and 4 in Key Concept 10.3 hold, and strengthen assumption 3 so that X_it and u_it have eight nonzero finite moments. Suppose, however, that the error is conditionally serially correlated, so that assumption 5 does not hold. Let M = I_T − T⁻¹ιι′, where ι is a T × 1 vector of 1's. Also let Y_i = (Y_i1 Y_i2 ... Y_iT)′, X_i = (X_i1 X_i2 ... X_iT)′, u_i = (u_i1 u_i2 ... u_iT)′, Ỹ_i = MY_i, X̃_i = MX_i, and ũ_i = Mu_i. For the asymptotic calculations in this problem, suppose that T is fixed and n → ∞.

a. Show that the fixed effects estimator of β from Section 10.3 can be written as β̂ = (Σ_{i=1}^n X̃_i′X̃_i)⁻¹ Σ_{i=1}^n X̃_i′Ỹ_i.

b. Show that β̂ − β = (Σ_{i=1}^n X̃_i′X̃_i)⁻¹ Σ_{i=1}^n X̃_i′u_i. (Hint: M is idempotent.)

c. Let Q_X̃ = E[(1/T)X̃_i′X̃_i] and Q̂_X̃ = (1/(nT)) Σ_{i=1}^n Σ_{t=1}^T X̃²_it. Show that Q̂_X̃ →p Q_X̃.

d. Let η_i = X̃_i′u_i/√T and σ²_η = var(η_i). Show that (1/√n) Σ_{i=1}^n η_i →d N(0, σ²_η).

e. Use your answers to (b)–(d) to prove Equation (10.25); that is, show that √(nT)(β̂ − β) →d N(0, σ²_η/Q²_X̃).


f. Let σ̃²_clustered be the infeasible clustered variance estimator, computed using the true errors instead of the residuals, so that σ̃²_clustered = (1/(nT)) Σ_{i=1}^n (X̃_i′u_i)². Show that σ̃²_clustered →p σ²_η.

g. Let û_i = Ỹ_i − X̃_iβ̂ and σ̂²_clustered = (1/(nT)) Σ_{i=1}^n (X̃_i′û_i)² [this is Equation (10.29) in matrix form]. Show that σ̂²_clustered →p σ²_η. [Hint: Use an argument like that used to show Equation (17.16) to show that σ̂²_clustered − σ̃²_clustered →p 0, and then use your answer to (f).]

APPENDIX

18.1 Summary of Matrix Algebra

This appendix summarizes vectors, matrices, and the elements of matrix algebra used in Chapter 18. The purpose of this appendix is to review some concepts and definitions from a course in linear algebra, not to replace such a course.

Definitions of Vectors and Matrices

A vector is a collection of n numbers or elements, collected either in a column (a column vector) or in a row (a row vector). The n-dimensional column vector b and the n-dimensional row vector c are

        | b1 |
        | b2 |
    b = | .  |   and   c = [c1  c2  ...  cn],
        | .  |
        | bn |

where b1 is the first element of b and, in general, b_i is the i-th element of b. Throughout, a boldface symbol denotes a vector or matrix.

A matrix is a collection, or array, of numbers or elements in which the elements are laid out in columns and rows. The dimension of a matrix is n × m, where n is the number of rows and m is the number of columns. The n × m matrix A is

        | a11  a12  ...  a1m |
    A = | a21  a22  ...  a2m |
        | ...  ...  ...  ... |
        | an1  an2  ...  anm |

where a_ij is the (i, j) element of A; that is, a_ij is the element that appears in its i-th row and j-th column. An n × m matrix consists of n row vectors or, alternatively, of m column vectors. To distinguish one-dimensional numbers from vectors and matrices, a one-dimensional number is called a scalar.
Types of Matrices
Types of Matrices

Square, symmetric, and diagonal matrices. A matrix is said to be square if the number of rows equals the number of columns. A square matrix is said to be symmetric if its (i, j) element equals its (j, i) element. A diagonal matrix is a square matrix in which the off-diagonal elements equal zero; that is, if the square matrix A is diagonal, then a_ij = 0 for i ≠ j.

Special matrices. An important matrix is the identity matrix, I_n, which is an n × n diagonal matrix with ones on the diagonal. The null matrix, 0_{n×m}, is the n × m matrix with all elements equal to zero.

The transpose. The transpose of a matrix switches the rows and the columns; that is, the transpose of a matrix turns the n × m matrix A into the m × n matrix, denoted by A′, where the (i, j) element of A becomes the (j, i) element of A′. Said differently, the transpose of the matrix A turns the rows of A into the columns of A′. If a_ij is the (i, j) element of A, then A′ (the transpose of A) is

         | a11  a21  ...  an1 |
    A′ = | a12  a22  ...  an2 |
         | ...  ...  ...  ... |
         | a1m  a2m  ...  anm |

The transpose of a vector is a special case of the transpose of a matrix. Thus the transpose of a vector turns a column vector into a row vector; that is, if b is an n × 1 column vector, then its transpose is the 1 × n row vector b′. The transpose of a row vector is a column vector.

Elements of Matrix Algebra: Addition and Multiplication

Matrix addition. Two matrices A and B that have the same dimensions (for example, are both n × m) can be added together. The sum of two matrices is the sum of their elements; that is, if C = A + B, then c_ij = a_ij + b_ij. A special case of matrix addition is vector addition: If a and b are both n × 1 column vectors, then their sum c = a + b is the element-wise sum; that is, c_i = a_i + b_i.

Vector and matrix multiplication. Let a and b be two n × 1 column vectors. Then the product of the transpose of a (which is itself a row vector) with b is a′b = Σ_{i=1}^n a_i b_i. Applying this definition with b = a yields a′a = Σ_{i=1}^n a²_i.

Similarly, the matrices A and B can be multiplied together if they are conformable, that is, if the number of columns of A equals the number of rows of B. Specifically, suppose that A has dimension n × m and B has dimension m × r. Then the product of A and B is an n × r matrix C; that is, C = AB, where the (i, j) element of C is Σ_{k=1}^m a_ik b_kj. Said differently, the (i, j) element of AB is the product of multiplying the row vector that is the i-th row of A with the column vector that is the j-th column of B.

The product of a scalar d with the matrix A has the (i, j) element d·a_ij; that is, each element of A is multiplied by the scalar d.

Some useful properties of matrix addition and multiplication. Let A and B be matrices. Then:

a. A + B = B + A;
b. (A + B) + C = A + (B + C);
c. (A + B)′ = A′ + B′;
d. If A is n × m, then A·I_m = A and I_n·A = A;
e. A(BC) = (AB)C;
f. (A + B)C = AC + BC; and
g. (AB)′ = B′A′.

In general, matrix multiplication does not commute; that is, in general, AB ≠ BA, although there are some special cases in which matrix multiplication commutes; for example, if A and B are both n × n diagonal matrices, then AB = BA.
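The properties above are easy to confirm numerically. The following sketch (an illustration added here, not from the text; the matrices and their shapes are arbitrary examples) checks several of them with numpy:

```python
# Numerical illustration of matrix-algebra properties (a)-(g) and of
# non-commutativity; all matrices here are arbitrary random examples.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 4))
D = rng.standard_normal((4, 4))

assert np.allclose(A + B, B + A)                # (a) A + B = B + A
assert np.allclose((A + B).T, A.T + B.T)        # (c) (A + B)' = A' + B'
assert np.allclose(A @ np.eye(2), A)            # (d) A I_m = A
assert np.allclose(A @ (C @ D), (A @ C) @ D)    # (e) A(BC) = (AB)C
assert np.allclose((A + B) @ C, A @ C + B @ C)  # (f) (A + B)C = AC + BC
assert np.allclose((A @ C).T, C.T @ A.T)        # (g) (AB)' = B'A'

# Matrix multiplication does not commute in general...
P = rng.standard_normal((2, 2))
Q = rng.standard_normal((2, 2))
assert not np.allclose(P @ Q, Q @ P)
# ...but same-size diagonal matrices do commute:
P_d, Q_d = np.diag([1.0, 2.0]), np.diag([3.0, 4.0])
assert np.allclose(P_d @ Q_d, Q_d @ P_d)
```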

Matrix Inverse, Matrix Square Roots, and Related Topics

The matrix inverse. Let A be a square matrix. Assuming it exists, the inverse of the matrix A is defined as the matrix A⁻¹ for which A⁻¹A = I_n. If in fact the inverse matrix A⁻¹ exists, then A is said to be invertible or nonsingular. If both A and B are invertible, then (AB)⁻¹ = B⁻¹A⁻¹.

Positive definite and positive semidefinite matrices. Let V be an n × n square matrix. Then V is positive definite if c′Vc > 0 for all nonzero n × 1 vectors c. Similarly, V is positive semidefinite if c′Vc ≥ 0 for all nonzero n × 1 vectors c. If V is positive definite, then it is invertible.



l.ineur independence.

Then x I vector. ct 1 and

11~

nt..: linear!) independen t 11 there do

nul c.:\bt nonzcw scalars c, and ~: 2 <~uch that' 1u 1 + ',u

\fore.:

0,

~en~;rull).th ... ,c;:~ ot

1.. vectors. a . a . . ... aA are linear!> independent ilthcrc U\l not cxtq non:tcro '.:alar~ . ~
L

.... c1 :ouch th:n c a c-:a1


Th~

run/,; nf a matrix.

Tb~

+ c a, - 0.,,.. 1

rank o Lthe n x m matm. A 1::. the.: numbtr ot I ~~~~rh mJc-

th~ r.lnl. \ll .-\ ~lJU~b the


numbet 01 column' oi A. then A I.S ~td to h,l\~; tull column (or ro,,) r.mk ll the 11 x: m
m urix -\ h.t-. full column rank. then there: doc~ nul cxht .t nonz~m m X I \eCtor c ~uch

JXnJcnt columns ot A . The rank of A o<t denoted r.mk(A ). II

i~ 11 X n with rank( A)
" thc:n \
m matrix.\ has full column rank. then A 'A i-. nun.,inl!ulat.

that A c,. 0., . H A


" Y

the matrix square root.

Let V be an

11 X 11

'lquarc

i~

n1msing.ulat . Lf the-

~ymmetric po~il i vc ddinit~

matrix.

1111: matrix square root of Vis defineJ to be an n ,... n matrix F ~uch th:H F ' F = V. TI1e
mu l rix ~quore 1011 t of a positive defini te matrix will a lway~ .;xrst. hut it is not UnllJU<.:. The
11Hill"ix St) ll<lfC root has the property lh<lt F V" IF' =1, In auulliu n. thL
ol n posllf\'C definite matrix is invcrubk. so that F' - I VP 1 =- T,.
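The non-uniqueness of the matrix square root can be seen concretely: both the eigendecomposition and the Cholesky factorization yield an F with F′F = V, but they yield different matrices. The following sketch (an illustration added here, not from the text; the matrix V is an arbitrary example) uses numpy:

```python
# Two different matrix square roots of the same positive definite V,
# in the convention F'F = V used in this appendix.
import numpy as np

V = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite (example)

# Square root from the eigendecomposition V = Q Lambda Q':
lam, Q = np.linalg.eigh(V)
F1 = np.diag(np.sqrt(lam)) @ Q.T    # F1'F1 = Q Lambda Q' = V

# Square root from the Cholesky factorization V = L L':
F2 = np.linalg.cholesky(V).T        # F2'F2 = L L' = V

assert np.allclose(F1.T @ F1, V)
assert np.allclose(F2.T @ F2, V)
assert not np.allclose(F1, F2)      # the matrix square root is not unique
# F V^{-1} F' = I, as stated above:
assert np.allclose(F1 @ np.linalg.inv(V) @ F1.T, np.eye(2))
```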

Eigenvalues and eigenvectors. Let A be an n × n matrix. If the n × 1 vector q and the scalar λ satisfy Aq = λq, where q′q = 1, then λ is an eigenvalue of A, and q is the eigenvector of A associated with that eigenvalue. An n × n matrix has n eigenvalues, which need not take on distinct values, and n eigenvectors.

If V is an n × n symmetric positive definite matrix, then all the eigenvalues of V are positive real numbers, and all the eigenvectors of V are real. Also, V can be written in terms of its eigenvalues and eigenvectors as V = QΛQ′, where Λ is a diagonal n × n matrix with diagonal elements that equal the eigenvalues of V, and Q is an n × n matrix consisting of the eigenvectors of V, arranged so that the i-th column of Q is the eigenvector corresponding to the eigenvalue that is the i-th diagonal element of Λ. The eigenvectors are orthonormal, so that Q′Q = I_n.

Idempotent matrices. A matrix C is idempotent if C is square and CC = C. If C is an n × n idempotent matrix that is also symmetric, then C is positive semidefinite, and C has r eigenvalues that equal 1 and n − r eigenvalues that equal 0, where r = rank(C) (Exercise 18.10).
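A leading example of a symmetric idempotent matrix is the OLS projection matrix C = X(X′X)⁻¹X′. The following check (an illustration added here, not from the text; X is an arbitrary random example) verifies the facts just stated:

```python
# The projection matrix C = X (X'X)^{-1} X' is symmetric idempotent;
# its eigenvalues are 0 or 1, and trace(C) = rank(C) = # columns of X.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))     # full column rank with probability 1
C = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(C, C.T)          # symmetric
assert np.allclose(C @ C, C)        # idempotent: CC = C
eig = np.linalg.eigvalsh(C)         # eigenvalues, ascending
assert np.allclose(eig, [0, 0, 0, 0, 1, 1], atol=1e-8)
assert np.isclose(np.trace(C), 2)   # trace(C) = rank(C) = 2
```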

APPENDIX

18.2 Multivariate Distributions

This appendix collects various definitions and facts about distributions of vectors of random variables. We start by defining the mean and covariance matrix of the m-dimensional random variable V. Next we present the multivariate normal distribution and then summarize some facts about the distributions of linear and quadratic functions of jointly normally distributed random variables.

The Mean Vector and Covariance Matrix

The first and second moments of an m × 1 vector of random variables, V = (V₁ V₂ ... V_m)′, are summarized by its mean vector and covariance matrix.

Because V is a vector, the vector of its means, that is, its mean vector, is E(V) = μ_V. The i-th element of the mean vector is the mean of the i-th element of V.

The covariance matrix of V is the matrix consisting of the variances var(V_i), i = 1, ..., m, along the diagonal and the (i, j) off-diagonal elements cov(V_i, V_j). In matrix form, the covariance matrix Σ_V is

    Σ_V = E[(V − μ_V)(V − μ_V)′]

          | var(V1)        ...  cov(V1, Vm) |
        = | ...            ...  ...         |      (18.72)
          | cov(Vm, V1)    ...  var(Vm)     |
The Multivariate Normal Distribution

The m × 1 vector random variable V has a multivariate normal distribution with mean vector μ_V and covariance matrix Σ_V if it has the joint probability density function

    f(V) = [(2π)^m det(Σ_V)]^{-1/2} exp[−(1/2)(V − μ_V)′Σ_V⁻¹(V − μ_V)],      (18.73)

where det(Σ_V) is the determinant of the matrix Σ_V. The multivariate normal distribution is denoted N(μ_V, Σ_V).

An important fact about the multivariate normal distribution is that if two jointly normally distributed random variables are uncorrelated (equivalently, have a block-diagonal covariance matrix), then they are independently distributed. That is, let V₁ and V₂ be jointly normally distributed random variables with respective dimensions m₁ × 1 and m₂ × 1. Then if cov(V₁, V₂) = E[(V₁ − μ_{V₁})(V₂ − μ_{V₂})′] = 0_{m₁×m₂}, V₁ and V₂ are independent.

If the V_i are i.i.d. N(0, σ²_V), then Σ_V = σ²_V I_m, and the multivariate normal distribution simplifies to the product of m univariate normal densities.

Distributions of Linear Combinations
and Quadratic Forms of Normal Random Variables

Linear combinations of multivariate normal random variables are themselves normally distributed, and certain quadratic forms of multivariate normal random variables have a chi-squared distribution. Let V be an m × 1 random variable distributed N(μ_V, Σ_V), let A and B be nonrandom a × m and b × m matrices, and let d be a nonrandom a × 1 vector. Then

    d + AV is distributed N(d + Aμ_V, AΣ_V A′);                               (18.74)

    cov(AV, BV) = AΣ_V B′;                                                    (18.75)

    if AΣ_V B′ = 0_{a×b}, then AV and BV are independently distributed;       (18.76)

    (V − μ_V)′Σ_V⁻¹(V − μ_V) is distributed χ²_m.                             (18.77)

Let U be an m-dimensional multivariate standard normal random variable with distribution N(0, I_m). If C is symmetric and idempotent, then

    U′CU has a χ²_r distribution, where r = rank(C).                          (18.78)

Equation (18.78) is proven as Exercise 18.11.
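Equation (18.78) can also be checked by simulation. The following Monte Carlo sketch (an illustration added here, not from the text; the dimensions and the choice of C are arbitrary examples) builds a symmetric idempotent C of rank 2 and verifies that U′CU has the mean and variance of a χ²₂ random variable:

```python
# Monte Carlo check of Equation (18.78): for U ~ N(0, I_m) and a symmetric
# idempotent C with rank r, U'CU ~ chi-squared(r), so E = r and var = 2r.
import numpy as np

rng = np.random.default_rng(0)
m, n_draws = 5, 200_000
X = rng.standard_normal((m, 2))
C = X @ np.linalg.inv(X.T @ X) @ X.T      # symmetric idempotent, rank r = 2

U = rng.standard_normal((n_draws, m))     # draws of U ~ N(0, I_m)
q = np.einsum("ij,jk,ik->i", U, C, U)     # quadratic forms U'CU, one per draw

assert abs(q.mean() - 2) < 0.05           # E[chi2_2] = 2
assert abs(q.var() - 4) < 0.2             # var[chi2_2] = 4
```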

APPENDIX

18.3 Derivation of the Asymptotic Distribution of β̂

This appendix provides the derivation of the asymptotic normal distribution of √n(β̂ − β) given in Equation (18.12). An implication of this result is that β̂ →p β.

First consider the "denominator" matrix X′X/n = (1/n)Σ_{i=1}^n X_i X_i′ in Equation (18.15). The (j, l) element of this matrix is (1/n)Σ_{i=1}^n X_ji X_li. By the second assumption in Key Concept 18.1, X_i is i.i.d., so X_ji X_li is i.i.d. By the third assumption in Key Concept 18.1, each element of X_i has four moments, so, by the Cauchy–Schwarz inequality (Appendix 17.2), X_ji X_li has two moments. Because X_ji X_li is i.i.d. with two moments, it obeys the law of large numbers, so (1/n)Σ_{i=1}^n X_ji X_li →p E(X_ji X_li). This is true for all the elements of X′X/n, so X′X/n →p E(X_i X_i′) = Q_X.

Next consider the "numerator" matrix in Equation (18.15), X′U/√n = (1/√n)Σ_{i=1}^n V_i, where V_i = X_i u_i. By the first assumption in Key Concept 18.1 and the law of iterated expectations, E(V_i) = E[X_i E(u_i | X_i)] = 0. By the second least squares assumption, V_i is i.i.d. Let c be a finite (k + 1)-dimensional vector. By the Cauchy–Schwarz inequality, E[(c′V_i)²] = E[(c′X_i u_i)²] ≤ √(E[(c′X_i)⁴] E(u_i⁴)), which is finite by the third least squares assumption. This is true for every such vector c, so E(V_i V_i′) = Σ_V is finite and, we assume, positive definite. Thus the multivariate central limit theorem of Key Concept 18.2 applies to (1/√n)Σ_{i=1}^n V_i = X′U/√n; that is,

    X′U/√n →d N(0, Σ_V).                                                      (18.79)

The result in Equation (18.12) follows from Equations (18.15) and (18.79), the consistency of X′X/n, the fourth least squares assumption (which ensures that (X′X)⁻¹ exists), and Slutsky's theorem.
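The conclusion of this derivation can be illustrated by simulation. The sketch below (added here as an illustration, not from the text; the single-regressor design, sample size, and error distribution are arbitrary choices) verifies that √n(β̂ − β) behaves like a draw from its asymptotic normal distribution, whose variance equals E(X²u²)/[E(X²)]² = 1 in this design:

```python
# Monte Carlo illustration: sqrt(n)(beta_hat - beta) is approximately normal
# with the asymptotic variance Sigma_v / Q_x^2 (= 1 in this simple design).
import numpy as np

rng = np.random.default_rng(0)
n, n_reps, beta = 500, 2000, 2.0
stats = np.empty(n_reps)
for r in range(n_reps):
    x = rng.standard_normal(n)
    u = rng.standard_normal(n)            # homoskedastic errors for simplicity
    y = beta * x + u
    beta_hat = (x @ y) / (x @ x)          # OLS without intercept
    stats[r] = np.sqrt(n) * (beta_hat - beta)

assert abs(stats.mean()) < 0.1            # centered at zero
assert abs(stats.std() - 1.0) < 0.1       # std close to asymptotic value 1
```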

APPENDIX

18.4 Derivations of Exact Distributions of OLS Test Statistics with Normal Errors

This appendix presents the proofs of the distributions, under the null hypothesis, of the homoskedasticity-only t-statistic in Equation (18.35) and the homoskedasticity-only F-statistic in Equation (18.37), assuming that all six assumptions in Key Concept 18.1 hold.

Proof of Equation (18.35)

If (i) Z has a standard normal distribution, (ii) W has a χ²_m distribution, and (iii) Z and W are independently distributed, then the random variable Z/√(W/m) has the Student t distribution with m degrees of freedom (Appendix 17.1). To put t̃ in this form, notice that s²_β̂j = s²_û [(X′X)⁻¹]_jj, while the exact conditional variance of β̂_j is σ²_β̂j = σ²_u [(X′X)⁻¹]_jj. Then rewrite Equation (18.34) as

    t̃ = [(β̂_j − β_j,0)/σ_β̂j] / √(s²_û/σ²_u),                                 (18.80)

where W = (n − k − 1)(s²_û/σ²_u), Z = (β̂_j − β_j,0)/σ_β̂j, and m = n − k − 1. With these definitions, t̃ = Z/√(W/m). Thus, to prove the result in Equation (18.35), we must show (i)–(iii) for these definitions of Z, W, and m.

i. An implication of Equation (18.30) is that, under the null hypothesis, Z = (β̂_j − β_j,0)/σ_β̂j has an exact standard normal distribution, which shows (i).

ii. From Equation (18.31), W is distributed χ²_{n−k−1}, which shows (ii).

iii. To show (iii), it must be shown that β̂_j and s²_û are independently distributed. From Equations (18.14) and (18.29), β̂ − β = (X′X)⁻¹X′U and s²_û = U′M_X U/(n − k − 1). Thus β̂ − β and s²_û are independent if (X′X)⁻¹X′U and M_X U are independent. Both (X′X)⁻¹X′U and M_X U are linear combinations of U, which has an N(0, σ²_u I_n) distribution, conditional on X. But because M_X X(X′X)⁻¹ = 0_{n×(k+1)} [Equation (18.26)], it follows that (X′X)⁻¹X′U and M_X U are independently distributed [Equation (18.76)]. Consequently, under all six assumptions in Key Concept 18.1,

    β̂ and s²_û are independently distributed,                                (18.81)

which shows (iii) and thus proves Equation (18.35).

Proof of Equation (18.37)

The F_{q,n−k−1} distribution is the distribution of (W₁/q)/[W₂/(n − k − 1)], where (i) W₁ is distributed χ²_q; (ii) W₂ is distributed χ²_{n−k−1}; and (iii) W₁ and W₂ are independently distributed (Appendix 17.1). To express F̃ in this form, let W₁ = (Rβ̂ − r)′[R(X′X)⁻¹R′σ²_u]⁻¹(Rβ̂ − r) and W₂ = (n − k − 1)s²_û/σ²_u. Substitution of these definitions into Equation (18.36) shows that F̃ = (W₁/q)/[W₂/(n − k − 1)]. Thus, by the definition of the F distribution, F̃ has an F_{q,n−k−1} distribution if (i)–(iii) hold with m₁ = q and m₂ = n − k − 1.

i. Under the null hypothesis, Rβ̂ − r = R(β̂ − β). Because β̂ has the conditional normal distribution in Equation (18.30) and because R is a nonrandom matrix, R(β̂ − β) is distributed N(0, R(X′X)⁻¹R′σ²_u), conditional on X. Thus, by Equation (18.77) in Appendix 18.2, (Rβ̂ − r)′[R(X′X)⁻¹R′σ²_u]⁻¹(Rβ̂ − r) is distributed χ²_q, proving (i).

ii. Requirement (ii) is shown in Equation (18.31).

iii. It has already been shown that β̂ and s²_û are independently distributed [Equation (18.81)]. It follows that Rβ̂ − r and s²_û are independently distributed, which in turn implies that W₁ and W₂ are independently distributed, proving (iii) and completing the proof.

APPENDIX

18.5 Proof of the Gauss–Markov Theorem for Multiple Regression

This appendix proves the Gauss–Markov theorem (Key Concept 18.3) for the multiple regression model. Let β̃ be a linear conditionally unbiased estimator of β, so that β̃ = A′Y and E(β̃ | X) = β, where A is an n × (k + 1) matrix that can depend on X and nonrandom constants. We show that var(c′β̂ | X) ≤ var(c′β̃ | X) for all (k + 1)-dimensional vectors c, where the inequality holds with equality only if β̃ = β̂.

Because β̃ is linear, it can be written as β̃ = A′Y = A′(Xβ + U) = (A′X)β + A′U. By the first Gauss–Markov condition, E(U | X) = 0_n, so E(β̃ | X) = (A′X)β; but because β̃ is conditionally unbiased, E(β̃ | X) = β = (A′X)β, which implies that A′X = I_{k+1}. Thus β̃ = β + A′U, so

    var(β̃ | X) = var(A′U | X) = E(A′UU′A | X) = A′E(UU′ | X)A = σ²_u A′A,    (18.82)

where the third equality follows because A can depend on X but not U, and the final equality follows from the second Gauss–Markov condition. That is, if β̃ is linear and unbiased, then under the Gauss–Markov conditions var(β̃ | X) = σ²_u A′A.

The results in Equation (18.82) also apply to β̂, with A = Â = X(X′X)⁻¹, where (X′X)⁻¹ exists by the third Gauss–Markov condition. Thus var(β̂ | X) = σ²_u Â′Â = σ²_u (X′X)⁻¹.

Now let D = A − Â, so that D is the difference between the weight matrices A and Â. Because A′X = I_{k+1}, it follows that Â′A = (X′X)⁻¹X′A = (X′X)⁻¹, so Â′D = Â′(A − Â) = Â′A − Â′Â = (X′X)⁻¹ − (X′X)⁻¹ = 0_{(k+1)×(k+1)}.

Substituting A = Â + D into the formula for the conditional variance in Equation (18.82) yields

    var(β̃ | X) = σ²_u(Â + D)′(Â + D)
                = σ²_u[Â′Â + Â′D + D′Â + D′D]
                = σ²_u(X′X)⁻¹ + σ²_u D′D,                                     (18.83)

where the final equality uses the facts that Â′Â = (X′X)⁻¹ and that Â′D = 0, so D′Â = 0 as well.

Because var(β̂ | X) = σ²_u(X′X)⁻¹, Equations (18.82) and (18.83) imply that var(β̃ | X) − var(β̂ | X) = σ²_u D′D. The difference between the variances of the two estimators of the linear combination c′β thus is



var(c' p

\') -

var(c' iJ X) - ": c' D ' D c ~ 0.

{IS.S4)

The mcquality 1n Equat1on ( 18.~) holds for all hnc.1r oomhm.ttllllh c' 1J and the
inc:ttuahty hold\ With cquahty for aU nonzero con I} tf D 0, v(l 'II: that 1). tl. \
\ \)f.
equivalent I~. ~ :: P.Thu~ c' iJ has the smallest vanuncc of all hncn cnra.htton.tll) u 1hHbcd
e ~tima tors of c' {J . that Ill. the OLS estimator is BLUE.

APPENDIX

18.6 Proof of Selected Results for IV and GMM Estimation

The Efficiency of TSLS Under Homoskedasticity [Proof of Equation (18.62)]

When the errors u_i are homoskedastic, the difference between Σ^IV [Equation (18.61)] and Σ^TSLS is given by

    Σ^IV − Σ^TSLS
      = σ²_u (Q_XZ A Q_ZX)⁻¹ Q_XZ A Q_ZZ A Q_ZX (Q_XZ A Q_ZX)⁻¹ − σ²_u (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹
      = σ²_u (Q_XZ A Q_ZX)⁻¹ Q_XZ A [Q_ZZ − Q_ZX (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ Q_XZ] A Q_ZX (Q_XZ A Q_ZX)⁻¹,   (18.85)

where the second term in brackets in the second equality follows from (Q_XZ A Q_ZX)⁻¹ Q_XZ A Q_ZX = I.

Let F be the matrix square root of Q_ZZ, so that Q_ZZ = F′F and Q_ZZ⁻¹ = F⁻¹F′⁻¹ [the latter equality follows from noting that (F′F)⁻¹ = F⁻¹F′⁻¹]. Then the final expression in Equation (18.85) can be rewritten to yield

    Σ^IV − Σ^TSLS = σ²_u (Q_XZ A Q_ZX)⁻¹ Q_XZ A F′[I − F′⁻¹ Q_ZX (Q_XZ F⁻¹F′⁻¹ Q_ZX)⁻¹ Q_XZ F⁻¹] F A Q_ZX (Q_XZ A Q_ZX)⁻¹,   (18.86)

where the second expression in brackets uses FF⁻¹ = I. Thus

    c′(Σ^IV − Σ^TSLS)c = σ²_u d′[I − D(D′D)⁻¹D′]d,                            (18.87)

where d = F A Q_ZX (Q_XZ A Q_ZX)⁻¹ c and D = F′⁻¹ Q_ZX. Now I − D(D′D)⁻¹D′ is a symmetric idempotent matrix (Exercise 18.5). As a result, I − D(D′D)⁻¹D′ has eigenvalues that are either 0 or 1 (Exercise 18.10). Thus d′[I − D(D′D)⁻¹D′]d ≥ 0, proving that TSLS is efficient under homoskedasticity.

Asymptotic Distribution
of the J-Statistic Under Homoskedasticity

The J-statistic is defined in Equation (18.63). First note that

    Û^TSLS = Y − X(X′P_Z X)⁻¹X′P_Z Y
           = (Xβ + U) − X(X′P_Z X)⁻¹X′P_Z (Xβ + U)
           = U − X(X′P_Z X)⁻¹X′P_Z U
           = [I − X(X′P_Z X)⁻¹X′P_Z]U.                                        (18.88)

Thus

    Û^TSLS′ P_Z Û^TSLS = U′[I − P_Z X(X′P_Z X)⁻¹X′]P_Z[I − X(X′P_Z X)⁻¹X′P_Z]U
                       = U′[P_Z − P_Z X(X′P_Z X)⁻¹X′P_Z]U,                    (18.89)

where the second equality follows by simplifying the preceding expression. Because Z′Z is symmetric and positive definite, it can be written in terms of its matrix square root, Z′Z = (Z′Z)^{1/2}′(Z′Z)^{1/2}, and this matrix square root is invertible, so (Z′Z)⁻¹ = (Z′Z)^{-1/2}(Z′Z)^{-1/2}′, where (Z′Z)^{-1/2} = [(Z′Z)^{1/2}]⁻¹. Thus P_Z can be written as P_Z = Z(Z′Z)⁻¹Z′ = BB′, where B = Z(Z′Z)^{-1/2}. Substituting this expression for P_Z into the final expression in Equation (18.89) yields

    Û^TSLS′ P_Z Û^TSLS = U′[BB′ − BB′X(X′BB′X)⁻¹X′BB′]U
                       = U′B[I − B′X(X′BB′X)⁻¹X′B]B′U
                       = U′B M_{B′X} B′U,                                     (18.90)

where M_{B′X} = I − B′X(X′BB′X)⁻¹X′B is a symmetric idempotent matrix.

The asymptotic null distribution of Û^TSLS′ P_Z Û^TSLS is found by computing the limits in probability and in distribution of the various terms in the final expression in Equation (18.90) under the null hypothesis. Under the null hypothesis that E(Z_i u_i) = 0, Z′U/√n has mean zero and the central limit theorem applies, so Z′U/√n →d N(0, Q_ZZ σ²_u). In addition, Z′Z/n →p Q_ZZ and X′Z/n →p Q_XZ. Thus B′U = (Z′Z)^{-1/2}′Z′U = [(Z′Z)/n]^{-1/2}′(Z′U/√n) →d σ_u z, where z is distributed N(0, I). In addition, B′X = [(Z′Z)/n]^{-1/2}′(Z′X/n) →p Q_ZZ^{-1/2}′Q_ZX. Thus

    M_{B′X} →p M_{Q_ZZ^{-1/2}′Q_ZX} = I − Q_ZZ^{-1/2}′Q_ZX (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ Q_XZ Q_ZZ^{-1/2}.   (18.91)

Under the null hypothesis, the TSLS estimator is consistent, and the coefficients in the regression of Û^TSLS on Z converge in probability to zero [an implication of Equation (18.91)], so the denominator in the definition of the J-statistic is a consistent estimator of σ²_u:

    Û^TSLS′ M_Z Û^TSLS/(n − m − r − 1) →p σ²_u.                               (18.92)

From the definition of the J-statistic and Equations (18.91) and (18.92), it follows that

    J = Û^TSLS′ P_Z Û^TSLS / [Û^TSLS′ M_Z Û^TSLS/(n − m − r − 1)] →d z′M_{Q_ZZ^{-1/2}′Q_ZX} z.   (18.93)

Because z is a standard normal random vector and M_{Q_ZZ^{-1/2}′Q_ZX} is a symmetric idempotent matrix, J is distributed as a chi-squared random variable with degrees of freedom that equal the rank of M_{Q_ZZ^{-1/2}′Q_ZX} [Equation (18.78)]. Because Q_ZZ^{-1/2}′Q_ZX has full column rank, the rank of M_{Q_ZZ^{-1/2}′Q_ZX} equals the number of instruments less the number of instrumented regressors, m − k. Thus J →d χ²_{m−k}.

The Efficiency of the Efficient GMM Estimator

The infeasible efficient GMM estimator, β̂^Eff.GMM, is defined in Equation (18.66). The proof that β̂^Eff.GMM is efficient entails showing that c′(Σ^GMM − Σ^Eff.GMM)c ≥ 0 for all vectors c. The proof closely parallels the proof of the efficiency of the TSLS estimator in the first section of this appendix, with the sole modification that the variance matrix of Z_i u_i replaces σ²_u Q_ZZ in Equation (18.85) and subsequently.

Distribution of the GMM J-Statistic

The GMM J-statistic J^GMM is given in Equation (18.70). The proof that, under the null hypothesis, J^GMM →d χ²_{m−k} closely parallels the corresponding proof for the TSLS J-statistic under homoskedasticity.

Appendix

TABLE 1  The Cumulative Standard Normal Distribution Function, Φ(z) = Pr(Z ≤ z)

                        Second Decimal Value of z
 z      0       1       2       3       4       5       6       7       8       9
-2.9   0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
-2.8   0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
-2.7   0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
-2.6   0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
-2.5   0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
-2.4   0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
-2.3   0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
-2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
-2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
-2.0   0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
-1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
-1.8   0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
-1.7   0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367
-1.6   0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455
-1.5   0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
-1.4   0.0808  0.0793  0.0778  0.0764  0.0749  0.0735  0.0721  0.0708  0.0694  0.0681
-1.3   0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
-1.2   0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985
-1.1   0.1357  0.1335  0.1314  0.1292  0.1271  0.1251  0.1230  0.1210  0.1190  0.1170
-1.0   0.1587  0.1562  0.1539  0.1515  0.1492  0.1469  0.1446  0.1423  0.1401  0.1379
-0.9   0.1841  0.1814  0.1788  0.1762  0.1736  0.1711  0.1685  0.1660  0.1635  0.1611

Continued on next page


TABLE 1 (continued)

                        Second Decimal Value of z
 z      0       1       2       3       4       5       6       7       8       9
-0.8   0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867
-0.7   0.2420  0.2389  0.2358  0.2327  0.2296  0.2266  0.2236  0.2206  0.2177  0.2148
-0.6   0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451
-0.5   0.3085  0.3050  0.3015  0.2981  0.2946  0.2912  0.2877  0.2843  0.2810  0.2776
-0.4   0.3446  0.3409  0.3372  0.3336  0.3300  0.3264  0.3228  0.3192  0.3156  0.3121
-0.3   0.3821  0.3783  0.3745  0.3707  0.3669  0.3632  0.3594  0.3557  0.3520  0.3483
-0.2   0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
-0.1   0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
-0.0   0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
 0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
 0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
 0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
 0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
 0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
 0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
 0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
 0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
 0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
 0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
 1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
 1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
 1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
 1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
 1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
 1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
 1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
 1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
 1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
 1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
 2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
 2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
 2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
 2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
 2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
 2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
 2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
 2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
 2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
 2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986

This table can be used to calculate Pr(Z ≤ z), where Z is a standard normal variable. For example, when z = 1.17, this probability is 0.8790, which is the table entry for the row labeled 1.1 and the column labeled 7.
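The table entries can also be reproduced in closed form from the error function, since Φ(z) = [1 + erf(z/√2)]/2. The following sketch (a cross-check added here, not part of the text) uses only the Python standard library:

```python
# Compute the cumulative standard normal distribution function Phi(z)
# via the identity Phi(z) = (1 + erf(z / sqrt(2))) / 2.
import math

def phi(z: float) -> float:
    """Cumulative standard normal distribution function, Pr(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The example above: Phi(1.17) = 0.8790 to four decimal places.
assert abs(phi(1.17) - 0.8790) < 5e-5
# A table entry from the previous page: Phi(-1.96) = 0.0250.
assert abs(phi(-1.96) - 0.0250) < 5e-5
```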

APPENDIX

TABLE 2  Critical Values for Two-Sided and One-Sided Tests Using the Student t Distribution

[Table entries (degrees of freedom by significance level: 20%, 10%, 5%, 2%, and 1% two-sided; 10%, 5%, 2.5%, 1%, and 0.5% one-sided) omitted.]

Values are shown for the critical values for two-sided (≠) and one-sided (>) alternative hypotheses. The critical value for the one-sided (<) test is the negative of the one-sided (>) critical value shown in the table. For example, 2.13 is the critical value for a two-sided test with a significance level of 5%, using the Student t distribution with 15 degrees of freedom.
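The infinite-degrees-of-freedom row of this table coincides with standard normal quantiles, because the Student t distribution approaches the standard normal as the degrees of freedom grow. A Python sketch of that limiting row using only the standard library (quantiles for finite degrees of freedom would need an external package such as SciPy and are not computed here):

```python
from statistics import NormalDist

def normal_critical_value(sig_level, two_sided=True):
    """Critical value from the standard normal distribution, i.e. the
    infinite-degrees-of-freedom row of the Student t table."""
    tail = sig_level / 2 if two_sided else sig_level
    return NormalDist().inv_cdf(1 - tail)

print(round(normal_critical_value(0.05), 2))                   # 1.96 (5% two-sided)
print(round(normal_critical_value(0.05, two_sided=False), 2))  # 1.64 (5% one-sided)
print(round(normal_critical_value(0.01), 2))                   # 2.58 (1% two-sided)
```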

TABLE 3  Critical Values for the χ² Distribution

[Table entries (degrees of freedom 1 through 30 by significance level: 10%, 5%, 1%) omitted.]

This table contains the 90th, 95th, and 99th percentiles of the χ² distribution. These serve as critical values for tests with significance levels of 10%, 5%, and 1%.
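For even degrees of freedom the χ² CDF has a closed form; with 2 degrees of freedom it is 1 − exp(−x/2), so the df = 2 row of the table can be computed directly. A sketch in Python:

```python
from math import log

def chi2_crit_df2(sig_level):
    """Critical value of the chi-squared distribution with 2 degrees of
    freedom. Its CDF is 1 - exp(-x/2), so the value x solving
    Pr(chi2_2 > x) = sig_level is x = -2 * ln(sig_level)."""
    return -2.0 * log(sig_level)

# Reproduces the df = 2 row: 4.61 (10%), 5.99 (5%), 9.21 (1%).
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(chi2_crit_df2(alpha), 2))
```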

TABLE 4  Critical Values for the F_{m,∞} Distribution

[Table entries (numerator degrees of freedom m = 1 through 30 by significance level: 10%, 5%, 1%) omitted.]

This table contains the 90th, 95th, and 99th percentiles of the F_{m,∞} distribution. These serve as critical values for tests with significance levels of 10%, 5%, and 1%.
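An F_{m,∞}-distributed variable is a χ² variable with m degrees of freedom divided by m, so each entry of this table is the corresponding χ² critical value divided by m. For m = 2 the χ² critical value has the closed form −2 ln(α), which makes the F critical value −ln(α); a Python check against the m = 2 column:

```python
from math import log

def f_crit_2_inf(sig_level):
    """Critical value of the F(2, infinity) distribution. Since
    F(2, inf) = chi2_2 / 2 and the chi2_2 critical value is
    -2 * ln(sig_level), the F critical value is -ln(sig_level)."""
    return -log(sig_level)

# Reproduces the m = 2 entries: 2.30 (10%), 3.00 (5%), 4.61 (1%).
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(f_crit_2_inf(alpha), 2))
```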

TABLE 5A  Critical Values for the F_{n1,n2} Distribution-10% Significance Level

[Table entries (numerator degrees of freedom n1 by denominator degrees of freedom n2) omitted.]

This table contains the 90th percentile of the F_{n1,n2} distribution, which serves as the critical value for a test with a 10% significance level.

TABLE 5B  Critical Values for the F_{n1,n2} Distribution-5% Significance Level

[Table entries (numerator degrees of freedom n1 by denominator degrees of freedom n2) omitted.]

This table contains the 95th percentile of the F_{n1,n2} distribution, which serves as the critical value for a test with a 5% significance level.

TABLE 5C  Critical Values for the F_{n1,n2} Distribution-1% Significance Level

[Table entries (numerator degrees of freedom n1 by denominator degrees of freedom n2) omitted.]

This table contains the 99th percentile of the F_{n1,n2} distribution, which serves as the critical value for a test with a 1% significance level.


Answers to the "Review the Concepts" Questions

Chapter 1

1.1 The experiment that you design should have one or more "treatment" groups and a control group; for example, one "treatment" could be studying for four hours, and the control would be not studying (no treatment). Students would be randomly assigned to the treatment and control groups, and the causal effect of hours of study on midterm performance would be estimated by comparing the average midterm grades for each of the treatment groups to that of the control group. The largest impediment is to ensure that the students in the different treatment groups spend the correct number of hours studying. How can you make sure that the students in the control group do not study at all, since that might jeopardize their grade? How can you make sure that all students in the treatment group actually study for four hours?
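The comparison described in the answer, the average midterm grade of a treatment group versus the control group, can be sketched as a simulation. The outcome model below (a baseline of 70 points, 3 points per hour studied, plus noise) is invented for illustration and is not taken from the text:

```python
import random

random.seed(0)

# Hypothetical experiment: 50 students assigned to "study 4 hours"
# (treatment) and 50 to "no studying" (control). Scores are simulated.
def simulated_score(hours):
    # Invented outcome model: baseline 70, plus 3 points per hour studied.
    return 70 + 3 * hours + random.gauss(0, 5)

treated_scores = [simulated_score(hours=4) for _ in range(50)]
control_scores = [simulated_score(hours=0) for _ in range(50)]

# Estimated causal effect: difference in average midterm grades
# between the treatment group and the control group.
effect = (sum(treated_scores) / len(treated_scores)
          - sum(control_scores) / len(control_scores))
print(round(effect, 1))  # should be near the true simulated effect of 12 points
```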

1.2 This experiment needs the same ingredients as the experiment in the previous question: treatment and control groups, random assignment, and a procedure for analyzing the resulting experimental data. Here there are two treatments: not wearing a seatbelt (the control group) and wearing a seatbelt (the treated group). These treatments should be applied over a specified period of time, such as the next year. The effect of seatbelt use on traffic fatalities could be estimated as the difference between fatality rates in the control and treatment groups. One impediment to this study is ensuring that participants follow the treatment (do or do not wear a seatbelt). More importantly, this study raises serious ethical concerns because it instructs participants to engage in known unsafe behavior (not wearing a seatbelt).
1.3 a. You will need to specify the treatment(s) and the randomization method, as in Questions 1.1 and 1.2.
b. One such cross-sectional data set would consist of a number of different firms, with the observations collected at the same point in time; for example, the data set might contain data on training levels and average labor productivity for 100 different firms during 2015. Chapter 4 introduces linear regression as a way to estimate causal effects using cross-sectional data.
c. The time series data set would consist of observations for a single firm collected at different points in time. For example, the data set might contain data on training levels and average labor productivity for the firm for each year between 1960 and 2015. Chapter 15 discusses how linear regression can be used to estimate causal effects using time series data.
d. Panel data would consist of observations from different firms, each observed at different points in time. For example, the data might consist of training levels and average labor productivity for 100 different firms, with data on each firm in 1985, 1995, and 2005. Chapter 10 discusses how linear regression can be used to estimate causal effects using panel data.
Chapter 2
2.1 These outcomes are random because they are not known with certainty until they actually occur. You do not know with certainty the gender of the next person you will meet, the time that it will take to commute to school, and so forth.
2.2 If X and Y are independent, then Pr(Y ≤ y | X = x) = Pr(Y ≤ y) for all values of y and x. That is, independence means that the conditional and marginal distributions of Y are identical, so that learning the value of X does not change the probability distribution of Y: knowing the value of X says nothing about the probability that Y will take on different values.
2.3 Although there is no apparent causal link between rainfall and the number of children born, rainfall could tell you something about the number of children born. Knowing the amount of rainfall tells you something about the season, and births are seasonal. Thus, knowing rainfall tells you something about the month, which tells you something about the number of children born, so rainfall and the number of children born are not independently distributed.
767

768

A NSWERS

2.4 The average weight of four randomly selected students is unlikely to be exactly 145 lbs. Different groups of four students will have different sample average weights, sometimes greater than 145 lbs. and sometimes less. Because the four students were selected at random, their sample average weight is also random.
2.5 All of the distributions will have a normal shape and will be centered at 1, the mean of Y. However, they will have different "spreads" because they have different variances. The variance of Ȳ is 4/n, so the variance shrinks as n gets larger. In your plots, the spread of the normal when n = 2 should be wider than when n = 10, which should be wider than when n = 100. As n gets very large, the variance approaches zero, and the normal distribution collapses around the mean of Y. That is, the distribution of Ȳ becomes highly concentrated around μ_Y as n grows large (the probability that Ȳ is close to μ_Y tends to 1), which is just what the law of large numbers says.
2.6 The normal approximation does not look good when n = 5, but looks good for n = 25 and n = 100. Thus Pr(Ȳ ≤ 0.1) is approximately equal to the value computed by the normal approximation when n is 25 or 100, but is not well approximated by the normal distribution when n = 5.
2.7 The probability distribution looks like Figure 2.3b, but with more mass concentrated in the tails. Because the distribution is symmetric around μ_Y = 0, Pr(Y > c) = Pr(Y < −c), and because there is substantial mass in the tails of the distribution, Pr(Y > c) remains significantly greater than zero even for large values of c.

Chapter 3
3.1 The population mean is the average in the population. The sample average Ȳ is the average of a sample drawn from the population.
3.2 An estimator is a procedure for computing an educated guess of the value of a population parameter, such as the population mean. An estimate is the number that the estimator produces in a given sample. Ȳ is an example of an estimator: it gives a procedure (add up all of the values of Y in the sample and divide by n) for computing an educated guess of the value of the population mean. If a sample of size n = 4 produces values of Y of 100, 104, 123, and 96, then the estimate computed using the estimator Ȳ is 105.75.
3.3 In all cases the mean of Ȳ is 10. The variance of Ȳ is var(Y)/n, which yields var(Ȳ) = 1.6 when n = 10, var(Ȳ) = 0.16 when n = 100, and var(Ȳ) = 0.016 when n = 1000. Since var(Ȳ) converges to zero as n increases, then, with probability approaching 1, Ȳ will be close to 10 as n increases. This is what the law of large numbers says.
3.4 The central limit theorem plays a key role when hypotheses are tested using the sample mean. Since the sample mean is approximately normally distributed when the sample size is large, critical values for hypothesis tests and p-values for test statistics can be computed using the normal distribution. Normal critical values are also used in the construction of confidence intervals.
3.5 These are described in Section 3.2.
3.6 A confidence interval contains all values of the parameter (for example, the mean) that cannot be rejected when used as a null hypothesis. Thus, it summarizes the results from a very large number of hypothesis tests.
3.7 The treatment (or causal) effect is the difference between the mean outcomes of treatment and control groups when individuals in the population are randomly assigned to the two groups. The differences-in-means estimator is the difference between the mean outcomes of treatment and control groups for a randomly selected sample of individuals in the population, who are then randomly assigned to the two groups.
3.8 The plot for (a) is upward sloping, and the points lie exactly on a line. The plot for (b) is downward sloping, and the points lie exactly on a line. The plot for (c) should show a positive relation, and the points should be close to, but not exactly on, an upward-sloping line. The plot for (d) shows a generally negative relation between the variables, and the points are scattered around a downward-sloping line. The plot for (e) has no apparent linear relation between the variables.

Chapter 4
4.1 β1 is the value of the slope in the population regression. This value is unknown. β̂1 (an estimator) gives a formula for estimating the unknown value of β1 from a sample. Similarly, u_i is the value of the regression error for the i-th observation; u_i is the difference between Y_i and the population regression line β0 + β1X_i. Because the values of β0 and β1 are unknown, the value of u_i is unknown. In contrast, û_i is the difference between Y_i and β̂0 + β̂1X_i; thus û_i is an estimator of u_i. Finally, E(Y|X) = β0 + β1X is unknown because the values of β0 and β1 are unknown; an estimator for this is the OLS predicted value β̂0 + β̂1X.
4.2 There are many examples. Here is one for each assumption. If the value of X is assigned in a randomized controlled experiment, then (1) is satisfied. For the class size regression, if X = class size is correlated with other factors that affect test scores, then u and X are correlated and (1) is violated. If entities (for example, workers or schools) are randomly selected from the population, then (2) is satisfied. For the class size regression, if only rural schools are included in the sample while the population of interest is all schools, then (2) is violated. If large outliers are unlikely, then (3) is satisfied. For the class size regression, if some test scores are misreported as 100,000 (out of a possible 1000), then large outliers are possible and (3) is violated.
4.3 The value of the R² indicates how dispersed the points are around the estimated regression line. When R² = 0.9, the scatter of points should be very close to the regression line. When R² = 0.5, the points should be more dispersed about the line. The R² does not indicate whether the line has a positive or a negative slope.
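The link between error dispersion and R² can be illustrated with a quick simulation (a sketch: the data-generating line 2 + 0.5X and the two error spreads are invented for illustration, and R² is computed as ESS/TSS):

```python
import random

def ols_r2(xs, ys):
    """Fit Y = b0 + b1*X by OLS and return (b0, b1, R^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    ess = sum((b0 + b1 * x - ybar) ** 2 for x in xs)  # explained sum of squares
    tss = sum((y - ybar) ** 2 for y in ys)            # total sum of squares
    return b0, b1, ess / tss

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
# Small error variance -> points tight around the line -> R^2 near 1
ys_tight = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]
# Large error variance -> points dispersed -> much lower R^2
ys_loose = [2.0 + 0.5 * x + rng.gauss(0, 3.0) for x in xs]
print(round(ols_r2(xs, ys_tight)[2], 2), round(ols_r2(xs, ys_loose)[2], 2))
```

The same slope underlies both samples; only the scatter around the line, and hence R², differs.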

Chapter 5

5.1 The p-value for a two-sided test of H0: μ_Y = 0 using an i.i.d. set of observations Y_i, i = 1, …, n can be constructed in three steps: (1) compute the sample mean Ȳ and the standard error SE(Ȳ); (2) compute the t-statistic for this sample, t = Ȳ/SE(Ȳ); (3) using the standard normal table, compute the p-value = Pr(|Z| > |t^act|) = 2Φ(−|t^act|). A similar three-step procedure is used to construct the p-value for a two-sided test of H0: β1 = 0: (1) compute the OLS estimate of the regression slope, β̂1, and its standard error SE(β̂1); (2) compute the t-statistic for this sample, t = β̂1/SE(β̂1); (3) using the standard normal table, compute the p-value = Pr(|Z| > |t^act|) = 2Φ(−|t^act|).
5.2 The wage gender gap for 1992 can be estimated using the regression in Equation (5.19) and the data summarized in the 1992 row of Table 3.1. The dependent variable is the hourly earnings of the i-th person in the sample. The independent variable is a binary variable that equals 1 if the person is a male and equals 0 if the person is a female. The wage gender gap in the population is the population coefficient β1 in the regression, which can be estimated using β̂1. The wage gender gap for the other years can be estimated in a similar fashion.
5.3 Homoskedasticity means that the variance of u is unrelated to the value of X. Heteroskedasticity means that the variance of u is related to the value of X. If the value of X is chosen using a randomized controlled experiment, then u is homoskedastic. In a regression of a worker's earnings (Y) on years of education (X), u would be heteroskedastic if the variance of earnings is higher for college graduates than for non-college graduates; Figure 5.2 suggests that this is indeed the case.


Chapter 6
6.1 It is likely that β̂1 will be biased because of omitted variables. Schools in more affluent districts are likely to spend more on all educational inputs and thus would have smaller class sizes, more books in the library, and more computers. These other inputs may lead to higher average test scores. Thus, β̂1 will be biased upward because the number of computers per student is positively correlated with omitted variables that have a positive effect on average test scores.
6.2 If X1 increases by 3 units and X2 is unchanged, then Y is expected to change by 3β1 units; if X2 decreases by 5 units and X1 is unchanged, then Y is expected to change by −5β2 units. If X1 increases by 3 units and X2 decreases by 5 units, then Y is expected to change by 3β1 − 5β2 units.
6.3 The regression cannot determine the effect of a change in one of the regressors holding the other regressors constant, because if the value of one of the perfectly multicollinear regressors is held constant, then so is the value of the other. That is, there is no independent variation in one multicollinear regressor. Two examples of perfectly multicollinear regressors are (1) a person's weight measured in pounds and the same person's weight measured in kilograms, and (2) the fraction of students who are male and the constant term, when the data come from all-male schools.
6.4 If X1 and X2 are highly correlated, most of the variation in X1 coincides with the variation in X2. Thus there is little variation in X1 holding X2 constant that can be used to estimate the partial effect of X1 on Y.

Chapter 7

7.1 The null hypothesis that β1 = 0 can be tested using the t-statistic for β̂1, as described in Key Concept 7.1. Similarly, the null hypothesis that β2 = 0 can be tested using the t-statistic for β̂2. The null hypothesis that β1 = 0 and β2 = 0 can be tested using the F-statistic from Section 7.2. The F-statistic is necessary to test a joint hypothesis because the test will be based on both β̂1 and β̂2, and this means that the testing procedure must use properties of their joint distribution.
7.2 Here is one example. Using data from several years of her econometrics class, a professor regresses students' scores on the final exam (Y) on their scores from the midterm exam (X). This regression will have a high R², because people who do well on the midterm tend to do well on the final. However, this regression produces a biased estimate of the causal effect of midterm scores on the final. Students who do well on the midterm tend to be students who attend class regularly, study hard, and have an aptitude for the subject. These variables are correlated with the midterm score but are determinants of the final exam score, so omitting them leads to omitted variable bias.

Chapter 8
8.1 The regression function will look like the quadratic regression in Figure 8.3 or the logarithmic function in Figure 8.4. The first of these is specified as the regression of Y onto X and X², and the second is the regression of Y onto ln(X). There are many economic relations with this shape. For example, this shape might represent the decreasing marginal productivity of labor in a production function.
8.2 Taking logarithms of both sides of the equation yields ln(Q) = β0 + β1 ln(K) + β2 ln(L) + β3 ln(M) + u, where β0 = ln(A). The production function parameters can be estimated by regressing the logarithm of production on the logarithms of capital, labor, and raw materials.
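The estimation strategy just described can be sketched on simulated data (a sketch: the "true" values A = 5, β1 = 0.3, β2 = 0.5, β3 = 0.2 and the sample design are invented, and OLS is computed here from the normal equations X'Xb = X'y):

```python
import math, random

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xrows, y):
    """Multiple-regression OLS via the normal equations X'X b = X'y."""
    k = len(xrows[0])
    xtx = [[sum(r[i] * r[j] for r in xrows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(xrows, y)) for i in range(k)]
    return solve(xtx, xty)

# Simulated Cobb-Douglas data: Q = A * K^0.3 * L^0.5 * M^0.2 * e^u
rng = random.Random(2)
rows, lnq = [], []
for _ in range(500):
    cap, lab, mat = (rng.uniform(1, 100) for _ in range(3))
    u = rng.gauss(0, 0.1)
    rows.append([1.0, math.log(cap), math.log(lab), math.log(mat)])
    lnq.append(math.log(5.0) + 0.3 * math.log(cap)
               + 0.5 * math.log(lab) + 0.2 * math.log(mat) + u)
b = ols(rows, lnq)
print([round(v, 2) for v in b])  # roughly [ln(5), 0.3, 0.5, 0.2]
```

Regressing ln(Q) on the logs of the inputs recovers the elasticities, and the intercept estimates ln(A).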
8.3 A 2% increase in GDP means that ln(GDP) increases by 0.02. The implied change in ln(m) is 1.0 × 0.02 = 0.02, which corresponds to a 2% increase in m. With R measured in percentage points, the increase in R is from 4.0 to 5.0, or 1.0 percentage point. This leads to a change in ln(m) of −0.02 × 1.0 = −0.02, which corresponds to a 2% fall in m.
8.4 You want to compare the fit of your linear regression to the fit of a nonlinear regression. Your answer will depend on the nonlinear regression that you choose for the comparison. You might test your linear regression against a quadratic regression by adding X² to the linear regression. If the coefficient on X² is significantly different from zero, then you can reject the null hypothesis that the relationship is linear in favor of the alternative that it is quadratic.
8.5 Augmenting the equation in Question 8.2 with an interaction term yields ln(Q) = β0 + β1 ln(K) + β2 ln(L) + β3 ln(M) + β4[ln(K) × ln(L)] + u. The partial effect of ln(L) on ln(Q) is now β2 + β4 ln(K).

Chapter 9
9.1 See Key Concept 9.1 and the paragraph that immediately follows the Key Concept box.

9.2 Including an additional variable that belongs in the regression will eliminate or reduce omitted variable bias. However, including an additional variable that does not belong in the regression will in general reduce the precision (increase the variance) of the estimator of the other coefficients.
9.3 It is important to distinguish between measurement error in Y and measurement error in X. If Y is measured with error, then the measurement error becomes part of the regression error term, u. If the assumptions of Key Concept 4.3 continue to hold, this will not affect the internal validity of OLS regression, although by making the variance of the regression error term larger, it increases the variance of the OLS estimator. If X is measured with error, however, this can result in correlation between the regressor and the regression error, leading to inconsistency of the OLS estimator. As suggested by Equation (9.2), this inconsistency becomes more severe the larger is the measurement error [that is, the larger is σ²_w in Equation (9.2)].
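The attenuation caused by classical measurement error in X can be seen in a short simulation (a sketch with made-up parameters: true slope 2 and var(X) = var(w) = 1, so the OLS slope converges to 2 × 1/(1 + 1) = 1):

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(3)
n = 50000
x_true = [rng.gauss(0, 1) for _ in range(n)]      # var(X) = 1
y = [2.0 * x + rng.gauss(0, 1) for x in x_true]   # true slope = 2
x_noisy = [x + rng.gauss(0, 1) for x in x_true]   # classical measurement error, var(w) = 1

b_clean = slope(x_true, y)
b_noisy = slope(x_noisy, y)
print(round(b_clean, 2), round(b_noisy, 2))  # b_noisy is pulled toward 2 * 1/(1 + 1) = 1
```

The noisier the measurement of X relative to its true variation, the stronger the bias toward zero.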
9.4 Schools with higher-achieving students could be more likely to volunteer to take the test, so that the schools volunteering to take the test are not representative of the population of schools, and sample selection bias will result. For example, if all schools with a low student-teacher ratio take the test, but only the best-performing schools with a high student-teacher ratio do, the estimated class size effect will be biased.
9.5 Cities with high crime rates may decide that they need more police protection and spend more on police, but if police do their job, then more police spending reduces crime. Thus, there are causal links from crime rates to police spending and from police spending to crime rates, leading to simultaneous causality bias.
9.6 If the regression has homoskedastic errors, then the homoskedastic and heteroskedastic standard errors generally are similar, because both are consistent. However, if the errors are heteroskedastic, then the homoskedastic standard errors are inconsistent, while the heteroskedastic standard errors are consistent. Thus, different values for the two standard errors constitute evidence of heteroskedasticity, and this suggests that the heteroskedastic standard errors should be used.

Chapter 10
10.1 Panel data (also called longitudinal data) refers to data for n different entities observed at T different time periods. One of the subscripts, i, identifies the entity, and the other subscript, t, identifies the time period.
10.2 A person's ability or motivation might affect both education and earnings. More able individuals tend to complete more years of schooling, and, for a given level of education, they tend to have higher earnings. The same is true for highly motivated people. The state of the macroeconomy is a time-specific variable that affects both earnings and education. During recessions, unemployment is high, earnings are low, and enrollment in colleges increases. Person-specific and time-specific fixed effects can be included in the regression to control for person-specific and time-specific variables.
10.3 When person-specific fixed effects are included
in a regression, they capture all features of the individual that do not vary over the sample period. Since gender does not vary over the sample period, its effect on earnings cannot be determined separately from the person-specific fixed effect. Similarly, time fixed effects capture all features of the time period that do not vary across individuals. The national unemployment rate is the same for all individuals in the sample at a given point in time, and thus its effect on earnings cannot be determined separately from the time-specific fixed effect.

Chapter 11
11.1 Because Y is binary, its predicted value is the probability that Y = 1. A probability must be between 0 and 1, so the value of 1.3 is nonsensical.
11.2 The results in column (1) are for the linear probability model. The coefficients in a linear probability model show the effect of a unit change in X on the probability that Y = 1. The results in columns (2) and (3) are for the logit and probit models. These coefficients are difficult to interpret. To compute the effect of a change in X on the probability that Y = 1 for the logit and probit models, use the procedures outlined in Key Concept 11.2.
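That procedure — compute the predicted probability at the initial and new values of X and take the difference — can be sketched as follows (the coefficients and the change in X are hypothetical, chosen only for illustration):

```python
from math import erf, exp, sqrt

def probit_prob(b0, b1, x):
    """Predicted probability Phi(b0 + b1*x) for a probit model."""
    z = b0 + b1 * x
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logit_prob(b0, b1, x):
    """Predicted probability 1/(1 + e^-(b0 + b1*x)) for a logit model."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * x)))

# Hypothetical coefficients; effect of changing X from 0.4 to 0.5
b0, b1 = -2.0, 3.0
effect_probit = probit_prob(b0, b1, 0.5) - probit_prob(b0, b1, 0.4)
effect_logit = logit_prob(b0, b1, 0.5) - logit_prob(b0, b1, 0.4)
print(round(effect_probit, 3), round(effect_logit, 3))
```

Unlike the linear probability model, the implied effect depends on the starting value of X, which is why the coefficients themselves are hard to interpret directly.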
11.3 She should use a logit or probit. These models are preferred to the linear probability model because they constrain the regression's predicted values to be between 0 and 1. Usually, probit and logit regressions give similar results, and she should use the method that is easier to implement with her software.
11.4 OLS cannot be used because the regression function is not a linear function of the regression coefficients (the coefficients appear inside the nonlinear functions Φ or F). The maximum likelihood estimator is efficient and can handle regression functions that are nonlinear in the parameters.

Chapter 12
12.1 An increase in the regression error, u, shifts out the demand curve, leading to an increase in both price and quantity. Thus ln(P^cigarettes) is positively correlated with the regression error. Because of this positive correlation, the OLS estimator of β1 is inconsistent and is likely to be larger than the true value of β1.
12.2 The number of trees per capita in the state is exogenous because it is plausibly uncorrelated with the error in the demand function. However, it probably is also uncorrelated with ln(P^cigarettes), so it is not relevant. A valid instrument must be exogenous and relevant, so the number of trees per capita in the state is not a valid instrument.
12.3 The number of lawyers is arguably correlated with the incarceration rate, so it is relevant (although this should be checked using the methods in Section 12.3). However, states with higher than expected crime rates (with positive regression errors) are likely to have more lawyers (criminals must be defended and prosecuted), so the number of lawyers will be positively correlated with the regression error. This means that the number of lawyers is not exogenous. A valid instrument must be exogenous and relevant, so the number of lawyers is not a valid instrument.
12.4 If the difference in distance is a valid instrument, then it must be correlated with X, which in this case is a binary variable indicating whether the patient received cardiac catheterization. Instrument relevance can be checked using the procedure outlined in Section 12.3. Checking instrument exogeneity is more difficult. If there are more instruments than endogenous regressors, then joint exogeneity of the instruments can be tested using the J-test outlined in Key Concept 12.6. However, if the number of instruments is equal to the number of endogenous regressors, then it is impossible to test for exogeneity statistically. In the McClellan, McNeil, and Newhouse (1994) study there is one endogenous regressor (treatment) and one instrument (difference in distance), so the J-test cannot be used. Expert judgment is required to assess the exogeneity.

Chapter 13
13.1 It would be better to assign the treatment level randomly to each parcel. The research plan outlined in the problem may be flawed because the different groups of parcels might differ systematically. For example, the first 25 parcels of land might have poorer drainage than the other parcels, and this would lead to lower crop yields. The treatment assignment outlined in the problem would place these 25 parcels in the control group, thereby overestimating the effect of the fertilizer on crop yields. This problem is avoided with random assignment of treatments.
13.2 The treatment effect could be estimated as the difference in average cholesterol levels for the treated group and the untreated (control) group. Data on the weight, age, and gender of each patient could be used to improve the estimate using the differences estimator with additional regressors shown in Equation (13.2). This regression may produce a more accurate estimate because it controls for these additional factors that may affect cholesterol. If you had data on the cholesterol levels of each patient before he or she entered the experiment, then the differences-in-differences estimator could be used. This estimator controls for individual-specific determinants of cholesterol levels that are constant over the sample period, such as the person's genetic predisposition to high cholesterol.
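The differences-in-differences idea can be sketched numerically (all cholesterol readings below are invented for illustration; the estimator is the change for the treated minus the change for the controls):

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Differences-in-differences: treated group's change minus control group's change."""
    mean = lambda v: sum(v) / len(v)
    return ((mean(treat_after) - mean(treat_before))
            - (mean(control_after) - mean(control_before)))

# Hypothetical cholesterol levels (mg/dL), before and after the experiment
treated_pre = [240, 260, 250]
treated_post = [212, 234, 220]
control_pre = [245, 255, 250]
control_post = [240, 252, 246]
print(diff_in_diff(treated_pre, treated_post, control_pre, control_post))  # -> -24.0
```

Subtracting each group's own baseline removes individual-specific, time-constant determinants, which is exactly what the answer above describes.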
13.3 If the students who were transferred to small classes differed systematically from the other students, then internal validity is compromised. For example, if the transferred students tended to have higher incomes and more learning opportunities outside of school, then they would tend to perform better on standardized tests. The experiment would incorrectly attribute this performance to the smaller class size. Information on original random assignment could be used as an instrument in a regression like Equation (13.6) to restore internal validity. The original random assignment is a valid instrument because it is exogenous (uncorrelated with the regression error) and is relevant (correlated with the actual assignment).
13.4 The Hawthorne effect is unlikely to be a problem in the fertilizer example, unless (for example) workers tended the different parcels of land more or less intensively depending on the treatment. Patients in the cholesterol study might be more diligent taking their medication than patients not in an experiment. Making the cholesterol experiment double-blind, so that neither the doctor nor the patient knows whether the patient is receiving the treatment or the placebo, would reduce experimental effects. Experimental effects may be important in experiments like STAR if the teachers felt that the experiment provided them with an opportunity to prove that smaller class sizes are best.
13.5 The earthquake introduced randomness in class sizes: it made it appear as if class size assignment was random. The discussion in Section 13.4 describes how instrumental variables regression can use the induced changes in class sizes to estimate the effect of class size on test scores.

Chapter 14
14.1 It does not appear stationary. The most striking characteristic of the series is that it has an upward trend. That is, observations at the end of the sample are systematically larger than observations at the beginning. This suggests that the mean of the series is not constant, which would imply that it is not stationary. The first difference of the series may look stationary, because first differencing eliminates the large trend. However, the level of the first difference series is the slope of the plot in Figure 14.2c. Looking carefully at the figure, the slope is steeper in 1960-1979 than in 1979-2004. Thus there may have been a change in the mean of the first difference series. If there was a change in the population mean of the first difference series, then it too is nonstationary.
14.2 One way to do this is to construct pseudo out-of-sample forecasts for the random walk model and the financial analyst's model. If the analyst's model is better, then it should have a lower RMSFE in the pseudo out-of-sample period. Even if the analyst's model outperformed the random walk in the pseudo out-of-sample period, you might still be wary of his claim. If he had access to the pseudo out-of-sample data, then his model may have been constructed to fit these data very well, so it still could produce poor true out-of-sample forecasts. Thus a better test of the analyst's claim is to use his model and the random walk to forecast future stock returns and compare true out-of-sample performance.
14.3 Yes. The usual 95% confidence interval is β̂1 ± 1.96 SE(β̂1), which in this case produces the interval 0.91 to 0.99. This interval does not contain 1.0. However, this method for constructing a confidence interval is based on the central limit theorem and the large-sample normal distribution of β̂1. When β1 = 1.0, the normal approximation is not appropriate, and this method for computing the confidence interval is not valid. Instead, we need to use the general method for constructing a confidence interval outlined in Sections 3.3 and 5.2: to find out whether 1.0 is in the 95% confidence interval, we need to test the null hypothesis β1 = 1.0 at the 5% level. If we do not reject this null, then 1.0 is in the confidence interval. The value of the t-statistic for this null is −2.45. From Table 14.5, the 5% critical value is −2.86, so the null hypothesis is not rejected. Thus β1 = 1.0 is in the 95% confidence interval.
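The arithmetic of this test can be sketched in a few lines (a sketch assuming the numbers above: the standard error is backed out from the interval half-width of 0.04):

```python
def t_statistic(estimate, se, null_value):
    """t-statistic for testing H0: coefficient = null_value."""
    return (estimate - null_value) / se

# The interval 0.91 to 0.99 implies an estimate of 0.95 and SE = 0.04/1.96
se = 0.04 / 1.96
t = t_statistic(0.95, se, 1.0)
df_critical_5pct = -2.86  # large-sample 5% Dickey-Fuller critical value (intercept only)
print(round(t, 2), t > df_critical_5pct)  # -2.45 True: the unit-root null is not rejected
```

The key point is that near a unit root the normal critical value of −1.96 is the wrong benchmark; the Dickey-Fuller critical value of −2.86 must be used instead.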
14.4 You would add a binary variable D_t that equals 0 for dates prior to 1992:I and equals 1 for dates 1992:I and beyond. If the coefficient on D_t is significantly different from zero in the regression (as judged by its t-statistic), then this would be evidence of an intercept break in 1992:I. If the date of the break is unknown, then you would need to carry out the test for many possible break dates using the QLR procedure summarized in Key Concept 14.9.

Chapter 15
15.1 As discussed in Key Concept 15.1, causal effects can be estimated by a distributed lag model when the regressors are exogenous. In this context, exogeneity means that current and lagged values of the money supply are uncorrelated with the regression error. This assumption is unlikely to be satisfied. For example, aggregate supply disturbances (oil price shocks, changes to productivity) have important effects on GDP. The Federal Reserve and the banking system also respond to these factors, thus changing the money supply. Thus the money supply is endogenous, and it is correlated with the regression error (which includes these omitted variables). Because the money supply is not exogenous, the distributed lag regression model cannot be used to estimate the dynamic causal effect of money on GDP.

15.2 The serially correlated error term could arise from including too few lags in the ADL model. Adding more lags will eliminate the serial correlation in the error term and produce a consistent estimator.
l$.3 Cumul.lltng the dvnanuc multiplier~ for !1 Y,
y1dJ-. lhc dynamic mulllph.:r... tor Y, Smd differently,
lh..: d~n.1m1..: mult1phcr-. torY, arc th.: cumubt1ve
multtpher" lmm th~.: ~} re!!re ... o;no.
15.4 The regression function that includes FDD_{t+1} can be written as E(Y_t | FDD_{t+1}, FDD_t, FDD_{t−1}, …) = β0 + β1FDD_{t+1} + β2FDD_t + β3FDD_{t−1} + β4FDD_{t−2} + … + E(u_t | FDD_{t+1}, FDD_t, FDD_{t−1}, …). When FDD is strictly exogenous, then E(u_t | FDD_{t+1}, FDD_t, FDD_{t−1}, …) = 0, so that FDD_{t+1} does not enter the regression. When FDD is exogenous, but not strictly exogenous, then it may be the case that E(u_t | FDD_{t+1}, FDD_t, FDD_{t−1}, …) ≠ 0, so that FDD_{t+1} will enter the regression.

Chapter 16

16.1 The macroeconomist wants to construct forecasts for nine variables. If four lags of each variable are used in a VAR, then each VAR equation will include 37 regression coefficients (the constant term and four coefficients for each of the nine variables). The sample period includes 128 quarterly observations. When 37 coefficients are estimated using 128 observations, the estimated coefficients are likely to be imprecise, leading to inaccurate forecasts. One alternative is to use a univariate autoregression for each variable. The advantage of this approach is that relatively few parameters need to be estimated, so that the coefficients will be precisely estimated by OLS. The disadvantage is that the forecasts are constructed using only lags of the variable being forecast, and lags of the other variables might contain additional useful forecasting information. A compromise is to use a set of time series regressions with additional predictors. For example, a GDP forecasting regression might be specified using lags of GDP, consumption, and long-term interest rates, but excluding the other variables. The short-term interest rate forecasting regression might be specified using lags of short-term rates, long-term rates, GDP, and inflation. The idea is to include the most important predictors in each of the regression equations, but leave out the variables that are not very important.

16.2 The forecast of Y_{T+1} is Y_{T+1|T} = 0.7 × 3.5 = 2.45. The forecast of Y_{T+30} is Y_{T+30|T} = 0.7³⁰ × 3.5 ≈ 0.0001. The result is reasonable. Because the process is only moderately serially correlated (β1 = 0.7), Y_{T+30} is only weakly related to Y_T. This means that the forecast of Y_{T+30} should be very close to μ_Y, the mean of Y. Since the process is stationary and β0 = 0, μ_Y = 0. Thus, as expected, Y_{T+30|T} is very close to zero.

16.3 If Y and C are cointegrated, then the error correction term Y − C is stationary. A plot of the series Y − C should appear stationary. Cointegration can be tested by carrying out a Dickey-Fuller or DF-GLS unit root test for the series Y − C. This is an example of a test for cointegration with a known cointegrating coefficient.

16.4 When u_{t−1} is unusually large, then σ²_t will be large. Since σ²_t is the conditional variance of u_t, then u_t is likely to be large. This will lead to a large value of σ²_{t+1}, and so forth.

16.5 A more powerful test is more likely to reject the null hypothesis when the null hypothesis is false. This improves your ability to distinguish between a unit AR root and a root less than 1.

Chapter 17

17.1 If assumption 4 in Key Concept 17.1 is true, in large samples a 95% confidence interval constructed using the heteroskedasticity-robust standard error will contain the true value of β1 with a probability of 95%. If assumption 4 in Key Concept 17.1 is false, the homoskedasticity-only variance estimator is inconsistent. Thus, in general, in large samples a 95% confidence interval constructed using the homoskedasticity-only standard error will not contain the true value of β1 with a probability of 95% if the errors are heteroskedastic, so the confidence interval will not be valid asymptotically.

17.2 From Slutsky's theorem, A_nB_n has an asymptotic N(0, 9) distribution. Thus Pr(A_nB_n ≤ 2) is approximately equal to Pr[Z ≤ (2/3)], where Z is a standard normal random variable. Evaluating this probability yields Pr[Z ≤ (2/3)] = 0.748.

17.3 For values of X_i ≤ 10, the points should be very close to the regression line because the variance of u_i is small. When X_i > 10, the points should be much farther from the regression line because the variance of u_i is large. Since the points with X_i ≤ 10 are much closer to the regression line, WLS gives them more weight.

17.4 The Gauss-Markov theorem implies that the averaged estimator cannot be better than the OLS estimator. To see this, note that the averaged estimator is a linear function of Y_1, …, Y_n (the OLS estimators are linear functions of the Y's, and so is their average) and is unbiased (the
774

ANSWE~S

OLS e<.umators are unhla,cd

as~~

their uvc:ragc ).ll1e

Gaw, \tarl..o' thcor~m 1mphc' the WU> 1., the hc~t


linear condiiiOIMII} unh1 sed c::'um.Jt!'lr Thus. the
ave rag~c.J ..:sum~tor cannot tx h..: tier th 1n WLS.
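The cointegration test described in answer 16.3 is easy to carry out directly. Below is a minimal sketch in Python, not from the text: the simulated series, sample size, and seed are illustrative assumptions, and the cointegrating coefficient is taken as known and equal to 1. The 5% critical value of −2.86 is the standard Dickey-Fuller critical value for a regression with an intercept.

```python
import random
import math

# Simulate a pair of cointegrated series with known cointegrating
# coefficient 1: C is a random walk, Y = C + stationary noise.
# (The data-generating process here is an illustrative assumption.)
random.seed(0)
n = 200
C = [0.0]
for _ in range(n - 1):
    C.append(C[-1] + random.gauss(0, 1))
Y = [c + random.gauss(0, 1) for c in C]

# Error-correction term with the known coefficient imposed.
z = [y - c for y, c in zip(Y, C)]

# Dickey-Fuller regression: Delta z_t = alpha + delta * z_{t-1} + error.
dz = [z[t] - z[t - 1] for t in range(1, n)]
zlag = z[:-1]
m = len(dz)
xbar = sum(zlag) / m
ybar = sum(dz) / m
sxx = sum((x - xbar) ** 2 for x in zlag)
delta = sum((x - xbar) * (y - ybar) for x, y in zip(zlag, dz)) / sxx
alpha = ybar - delta * xbar
resid = [y - alpha - delta * x for x, y in zip(zlag, dz)]
s2 = sum(e ** 2 for e in resid) / (m - 2)
se_delta = math.sqrt(s2 / sxx)
t_stat = delta / se_delta

# Compare with the 5% Dickey-Fuller critical value (intercept only).
print(f"DF t-statistic: {t_stat:.2f}; reject unit root: {t_stat < -2.86}")
```

Because z is stationary noise by construction, the t-statistic falls far below −2.86 and the unit-root null is rejected, consistent with cointegration. Note that the comparison uses the Dickey-Fuller critical values, not standard normal ones.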

Chapter 18

18.1 Each entry of the first column of X is 1. The entries in the second and third columns are zeros and ones. The first column of the matrix X is the sum of the second and third columns; thus the columns are linearly dependent, and X does not have full column rank. The regression can be respecified by eliminating either X_1i or X_2i.
18.2 a. Estimate the regression coefficients by OLS and compute heteroskedasticity-robust standard errors. Construct the confidence interval as β̂1 ± 1.96 SE(β̂1).
b. Estimate the regression coefficients by OLS and compute heteroskedasticity-robust standard errors. Construct the confidence interval as β̂1 ± 1.96 SE(β̂1). Alternatively, compute the homoskedasticity-only standard error SE(β̂1) and form the confidence interval as β̂1 ± 1.96 SE(β̂1).
c. The confidence intervals could be constructed as in (b). These use the large-sample normal approximation. Under assumptions 1-6, the exact distribution can be used to form the confidence interval β̂1 ± t* SE(β̂1), where t* is the 97.5th percentile of the t distribution with n − k − 1 degrees of freedom. Here n = 500 and k = 1. An extended version of Appendix Table 2 shows that t* ≈ 1.96.
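The heteroskedasticity-robust interval in part (a) of answer 18.2 can be computed by hand. The sketch below is not from the text: it simulates a single-regressor model with heteroskedastic errors (the data-generating process and the HC1 degrees-of-freedom correction are illustrative choices) and forms the robust 95% confidence interval.

```python
import random
import math

# Simulated single-regressor data with heteroskedastic errors:
# sd(u|X) grows with X, so robust standard errors are appropriate.
# (True intercept 1.0 and slope 2.0 are illustrative assumptions.)
random.seed(1)
n = 500
x = [random.uniform(1, 5) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

# OLS slope and intercept.
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Heteroskedasticity-robust (HC1) variance of the slope estimator:
# n/(n-2) * sum((x_i - xbar)^2 * uhat_i^2) / (sum((x_i - xbar)^2))^2
hc1 = (n / (n - 2)) * sum(((xi - xbar) * ui) ** 2
                          for xi, ui in zip(x, u)) / sxx ** 2
se_robust = math.sqrt(hc1)

# 95% confidence interval: b1 +/- 1.96 * SE(b1).
lo, hi = b1 - 1.96 * se_robust, b1 + 1.96 * se_robust
print(f"slope = {b1:.3f}, robust SE = {se_robust:.3f}, "
      f"CI = [{lo:.3f}, {hi:.3f}]")
```

The same interval computed with the homoskedasticity-only formula would be too narrow or too wide here, since var(u|X) is not constant.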
18.3 No, this result requires normally distributed errors.

18.4 The BLUE estimator is the GLS estimator. You must know Ω to compute the exact GLS estimator. However, if Ω is a known function of some parameters that in turn can be consistently estimated, then estimators for these parameters can be used to construct an estimator of the covariance matrix Ω. This estimator can then be used to construct a feasible version of the GLS estimator. This estimator is approximately equal to the BLUE estimator when the sample size is large.
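The two-step logic in answer 18.4 can be illustrated with a feasible WLS estimator for a single regressor. This sketch is not from the text; the variance model var(u|X) = θ0 + θ1·X² and all numerical choices are illustrative assumptions.

```python
import random
import math

def ols_fit(x, y):
    """OLS for a single regressor; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

# Simulated data with var(u|X) = 0.25 + X^2: a known functional form
# with unknown parameters, the setting in which feasible GLS applies.
random.seed(2)
n = 500
x = [random.uniform(0, 3) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, math.sqrt(0.25 + xi ** 2))
     for xi in x]

# Step 1: OLS residuals.
b0, b1 = ols_fit(x, y)
u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]

# Step 2: estimate the variance function by regressing uhat^2 on X^2.
t0, t1 = ols_fit([xi ** 2 for xi in x], u2)
var_hat = [max(1e-6, t0 + t1 * xi ** 2) for xi in x]  # guard negatives

# Step 3: weighted least squares with weights 1 / estimated variance.
w = [1.0 / v for v in var_hat]
sw = sum(w)
swx = sum(wi * xi for wi, xi in zip(w, x))
swy = sum(wi * yi for wi, yi in zip(w, y))
swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
b1_fgls = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
b0_fgls = (swy - b1_fgls * swx) / sw
print(f"OLS slope = {b1:.3f}, feasible-WLS slope = {b1_fgls:.3f}")
```

In large samples the estimated variance function is close to the true one, so the feasible WLS estimator is approximately as efficient as the infeasible GLS estimator that uses the true Ω.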
18.5 There are many examples. Here is one. Suppose that X_i = Y_{i−1} and u_i is i.i.d. with mean 0 and variance σ²_u. (That is, the regression model is an AR(1) model from Chapter 14.) In this case X_i depends on u_j for j < i but does not depend on u_j for j ≥ i. This implies E(u_i | X_i) = 0. However, E(u_i | X_{i+1}) ≠ 0, and this implies E(U | X) ≠ 0_n.

Glossary

Acceptance region: The set of values of a test statistic for which the null hypothesis is accepted (is not rejected).

Adjusted R² (R̄²): A modified version of R² that does not necessarily increase when a new regressor is added to the regression.

ADL(p,q): See autoregressive distributed lag model.

AIC: See information criterion.

Akaike information criterion: See information criterion.

Alternative hypothesis: The hypothesis that is assumed to be true if the null hypothesis is false. The alternative hypothesis is often denoted H1.

AR(p): See autoregression.

ARCH: See autoregressive conditional heteroskedasticity.

Asymptotic distribution: The approximate sampling distribution of a random variable computed using a large sample. For example, the asymptotic distribution of the sample average is normal.

Asymptotic normal distribution: A normal distribution that approximates the sampling distribution of a statistic computed using a large sample.

Attrition: The loss of subjects from a study after assignment to the treatment or control group.

Augmented Dickey-Fuller (ADF) test: A regression-based test for a unit root in an AR(p) model.

Autocorrelation: The correlation between a time series variable and its lagged value. The jth autocorrelation of Y is the correlation between Y_t and Y_{t−j}.

Autocovariance: The covariance between a time series variable and its lagged value. The jth autocovariance of Y is the covariance between Y_t and Y_{t−j}.

Autoregression: A linear regression model that relates a time series variable to its past (that is, lagged) values. An autoregression with p lagged values as regressors is denoted AR(p).

Autoregressive conditional heteroskedasticity (ARCH): A time series model of conditional heteroskedasticity.

Autoregressive distributed lag model: A linear regression model in which the time series variable Y_t is expressed as a function of lags of Y_t and of another variable, X_t. The model is denoted ADL(p,q), where p denotes the number of lags of Y_t and q denotes the number of lags of X_t.

Average causal effect: The population average of the individual causal effects in a heterogeneous population. Also called the average treatment effect.

Balanced panel: A panel data set with no missing observations; that is, one in which the variables are observed for each entity and each time period.

Base specification: A baseline or benchmark regression specification that includes a set of regressors chosen using a combination of expert judgment, economic theory, and knowledge of how the data were collected.

Bayes information criterion: See information criterion.

Bernoulli distribution: The probability distribution of a Bernoulli random variable.

Bernoulli random variable: A random variable that takes on two values, 0 and 1.

Best linear unbiased estimator: An estimator that has the smallest variance of any estimator that is a linear function of the sample values Y and is unbiased. Under the Gauss-Markov conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients, conditional on the values of the regressors.

Bias: The expected value of the difference between an estimator and the parameter that it is estimating. If μ̂_Y is an estimator of μ_Y, then the bias of μ̂_Y is E(μ̂_Y) − μ_Y.

BIC: See information criterion.

Binary variable: A variable that is either 0 or 1. A binary variable is used to indicate a binary outcome. For example, X is a binary (or indicator, or dummy) variable for a person's gender if X = 1 if the person is female and X = 0 if the person is male.


Bivariate normal distribution: A generalization of the normal distribution to describe the joint distribution of two random variables.

BLUE: See best linear unbiased estimator.

Break date: The date of a discrete change in population time series regression coefficient(s).

Causal effect: The expected effect of a given intervention or treatment as measured in an ideal randomized controlled experiment.

Central limit theorem: A result in mathematical statistics that says that, under general conditions, the sampling distribution of the standardized sample average is well approximated by a standard normal distribution when the sample size is large.

Chi-squared distribution: The distribution of the sum of m squared independent standard normal random variables. The parameter m is called the degrees of freedom of the chi-squared distribution.

Chow test: A test for a break in a time series regression at a known break date.

Coefficient of determination: See R².

Cointegration: When two or more time series variables share a common stochastic trend.

Common trend: A trend shared by two or more time series.

Conditional distribution: The probability distribution of one random variable given that another random variable takes on a particular value.

Conditional expectation: The expected value of one random variable given that another random variable takes on a particular value.

Conditional heteroskedasticity: The variance, usually of an error term, depends on other variables.

Conditional mean: The mean of a conditional distribution; see conditional expectation.

Conditional mean independence: The conditional expectation of the regression error u_i, given the regressors, depends on some but not all of the regressors.

Conditional variance: The variance of a conditional distribution.

Confidence interval (or confidence set): An interval (or set) that contains the true value of a population parameter with a prespecified probability when computed over repeated samples.

Confidence level: The prespecified probability that a confidence interval (or set) contains the true value of the parameter.

Consistency: The property that an estimator is consistent. See consistent estimator.

Consistent estimator: An estimator that converges in probability to the parameter that it is estimating.

Constant regressor: The regressor associated with the regression intercept; this regressor is always equal to 1.

Constant term: The regression intercept.

Continuous random variable: A random variable that can take on a continuum of values.

Control group: The group that does not receive the treatment or intervention in an experiment.

Control variable: Another term for regressor; more specifically, a regressor that controls for one of the factors that determine the dependent variable.

Convergence in distribution: When a sequence of distributions converges to a limit; a precise definition is given in Section 17.2.

Convergence in probability: When a sequence of random variables converges to a specific value, for example, when the sample average becomes close to the population mean as the sample size increases; see Key Concept 2.6 and Section 17.2.

Correlation: A unit-free measure of the extent to which two random variables move, or vary, together. The correlation (or correlation coefficient) between X and Y is σ_XY/(σ_X σ_Y) and is denoted corr(X, Y).

Correlation coefficient: See correlation.

Covariance: A measure of the extent to which two random variables move together. The covariance between X and Y is the expected value E[(X − μ_X)(Y − μ_Y)] and is denoted by cov(X, Y) or by σ_XY.

Covariance matrix: A matrix composed of the variances and covariances of a vector of random variables.

Critical value: The value of a test statistic for which the test just rejects the null hypothesis at the given significance level.

Cross-sectional data: Data collected for multiple entities at a single time period.

Cubic regression model: A nonlinear regression function that includes X, X², and X³ as regressors.

Cumulative distribution function (c.d.f.): See cumulative probability distribution.

Cumulative dynamic multiplier: The cumulative effect of a unit change in the time series variable X on Y. The h-period cumulative dynamic multiplier is the effect of a unit change in X_t on Y_t + Y_{t+1} + ... + Y_{t+h}.

Cumulative probability distribution: A function showing the probability that a random variable is less than or equal to a given number.


Dependent variable: The variable to be explained in a regression or other statistical model; the variable appearing on the left-hand side in a regression.

Deterministic trend: A persistent long-term movement of a variable over time that can be represented as a nonrandom function of time.

Dickey-Fuller test: A method for testing for a unit root in a first-order autoregression (AR(1)).

Difference estimator: An estimator of the causal effect constructed as the difference in the sample average outcomes between the treatment and control groups.

Differences-in-differences estimator: The average change in Y for those in the treatment group minus the average change in Y for those in the control group.

Discrete random variable: A random variable that takes on discrete values.

Distributed lag model: A regression model in which the regressors are current and lagged values of X.

Dummy variable: See binary variable.

Dummy variable trap: A problem caused by including a full set of binary variables in a regression together with a constant regressor (intercept), leading to perfect multicollinearity.

Dynamic causal effect: The causal effect of one variable on current and future values of another variable.

Dynamic multiplier: The h-period dynamic multiplier is the effect of a unit change in the time series variable X_t on Y_{t+h}.

Endogenous variable: A variable that is correlated with the error term.

Error term: The difference between Y and the population regression function, denoted by u in this textbook.

Errors-in-variables bias: The bias in an estimator of a regression coefficient that arises from measurement errors in the regressors.

Estimate: The numerical value of an estimator computed from data in a specific sample.

Estimator: A function of a sample of data to be drawn randomly from a population. An estimator is a procedure for using sample data to compute an educated guess of the value of a population parameter, such as the population mean.

Exact distribution: The exact probability distribution of a random variable.

Exact identification: When the number of instrumental variables equals the number of endogenous regressors.

Exogenous variable: A variable that is uncorrelated with the regression error term.

Expected value: The long-run average value of a random variable over many repeated trials or occurrences. It is the probability-weighted average of all possible values that the random variable can take on. The expected value of Y is denoted E(Y) and is also called the expectation of Y.

Experimental data: Data obtained from an experiment designed to evaluate a treatment or policy or to investigate a causal effect.

Experimental effect: When experimental subjects change their behavior because they are part of an experiment.

Explained sum of squares (ESS): The sum of squared deviations of the predicted values of Y_i, Ŷ_i, from their average; see Equation (4.14).

Explanatory variable: See regressor.

External validity: Inferences and conclusions from a statistical study are externally valid if they can be generalized from the population and setting studied to other populations and settings.

F-statistic: A statistic used to test a joint hypothesis concerning more than one of the regression coefficients.

F_{m,n} distribution: The distribution of a ratio of independent random variables, where the numerator is a chi-squared random variable with m degrees of freedom divided by m, and the denominator is a chi-squared random variable with n degrees of freedom divided by n.

F_{m,∞} distribution: The distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided by m.

Feasible GLS: A version of the generalized least squares (GLS) estimator that uses an estimator of the conditional variance of the regression errors and of the covariance between the regression errors at different observations.

Feasible WLS: A version of the weighted least squares (WLS) estimator that uses an estimator of the conditional variance of the regression errors.

First difference: The first difference of a time series variable Y_t is Y_t − Y_{t−1}, denoted ΔY_t.

First-stage regression: The regression of an included endogenous variable on the included exogenous variables, if any, and the instrumental variable(s) in two stage least squares.

Fitted values: See predicted values.

Fixed effects: Binary variables indicating the entity or time period in a panel data regression.

Fixed effects regression model: A panel data regression that includes entity fixed effects.

Forecast error: The difference between the value of the variable that actually occurs and its forecasted value.

Forecast interval: An interval that contains the future value of a time series variable with a prespecified probability.

Functional form misspecification: When the form of the estimated regression function does not match the form of the population regression function; for example, when a linear specification is used but the true population regression function is quadratic.

GARCH: See generalized autoregressive conditional heteroskedasticity.

Gauss-Markov theorem: Mathematical result stating that, under certain conditions, the OLS estimator is the best linear unbiased estimator of the regression coefficients conditional on the values of the regressors.

Generalized autoregressive conditional heteroskedasticity: A time series model for conditional heteroskedasticity.

Generalized least squares (GLS): A generalization of OLS that is appropriate when the regression errors have a known form of heteroskedasticity (in which case GLS is also referred to as weighted least squares, WLS) or a known form of serial correlation.

Generalized method of moments: A method for estimating parameters by fitting sample moments to population moments that are functions of the unknown parameters. Instrumental variables estimators are an important special case.

GMM: See generalized method of moments.

Granger causality test: A procedure for testing whether current and lagged values of one time series help predict future values of another time series.

HAC standard errors: See heteroskedasticity- and autocorrelation-consistent (HAC) standard errors.

Hawthorne effect: See experimental effect.

Heteroskedasticity: The situation in which the variance of the regression error term u_i, conditional on the regressors, is not constant.

Heteroskedasticity- and autocorrelation-consistent (HAC) standard errors: Standard errors for OLS estimators that are consistent whether or not the regression errors are heteroskedastic and autocorrelated.

Heteroskedasticity-robust standard error: Standard errors for the OLS estimator that are appropriate whether the error term is homoskedastic or heteroskedastic.

Heteroskedasticity-robust t-statistic: A t-statistic constructed using a heteroskedasticity-robust standard error.

Homoskedasticity: The variance of the error term u_i, conditional on the regressors, is constant.

Homoskedasticity-only F-statistic: A form of the F-statistic that is valid only when the regression errors are homoskedastic.

Homoskedasticity-only standard error: Standard errors for the OLS estimator that are appropriate only when the error term is homoskedastic.

Hypothesis test: A procedure for using sample evidence to help determine if a specific hypothesis about a population is true or false.

i.i.d.: Independently and identically distributed.

I(0), I(1), and I(2): See order of integration.

Identically distributed: When two or more random variables have the same distribution.

Impact effect: The contemporaneous, or immediate, effect of a unit change in the time series variable X_t on Y_t.

Imperfect multicollinearity: The condition in which two or more regressors are highly correlated.

Included endogenous variables: Regressors that are correlated with the error term (usually in the context of instrumental variables regression).

Included exogenous variables: Regressors that are uncorrelated with the error term (usually in the context of instrumental variables regression).

Independence: When knowing the value of one random variable provides no information about the value of another random variable. Two random variables are independent if their joint distribution is the product of their marginal distributions.

Indicator variable: See binary variable.

Information criterion: A statistic used to estimate the number of lagged variables to include in an autoregression or a distributed lag model. Leading examples are the Akaike information criterion (AIC) and the Bayes information criterion (BIC).

Instrument: See instrumental variable.

Instrumental variable: A variable that is correlated with an endogenous regressor (instrument relevance) and is uncorrelated with the regression error (instrument exogeneity).

Instrumental variables (IV) regression: A way to obtain a consistent estimator of the unknown coefficients of the population regression function when the regressor, X, is correlated with the error term, u.

Interaction term: A regressor that is formed as the product of two other regressors, such as X_{1i} × X_{2i}.


Intercept: The value of β0 in the linear regression model.

Internal validity: When inferences about causal effects in a statistical study are valid for the population being studied.

J-statistic: A statistic for testing overidentifying restrictions in instrumental variables regression.

Joint hypothesis: A hypothesis consisting of two or more individual hypotheses, that is, involving more than one restriction on the parameters of a model.

Joint probability distribution: The probability distribution determining the probabilities of outcomes involving two or more random variables.

Kurtosis: A measure of how much mass is contained in the tails of a probability distribution.

Lag: The value of a time series variable in a previous time period. The jth lag of Y_t is Y_{t−j}.

Law of iterated expectations: A result in probability theory that says that the expected value of Y is the expected value of its conditional expectation given X; that is, E(Y) = E[E(Y|X)].

Law of large numbers: According to this result from probability theory, under general conditions the sample average will be close to the population mean with very high probability when the sample size is large.

Least squares assumptions: The assumptions for the linear regression model listed in Key Concept 4.3 (single variable regression) and Key Concept 6.4 (multiple regression model).

Least squares estimator: An estimator formed by minimizing the sum of squared residuals.

Limited dependent variable: A dependent variable that can take on only a limited set of values. For example, the variable might be a 0-1 binary variable or arise from one of the models described in Appendix 11.3.

Linear-log model: A nonlinear regression function in which the dependent variable is Y and the independent variable is ln(X).

Linear probability model: A regression model in which Y is a binary variable.

Linear regression function: A regression function with a constant slope.

Local average treatment effect: A weighted average treatment effect estimated, for example, by TSLS.

Log-linear model: A nonlinear regression function in which the dependent variable is ln(Y) and the independent variable is X.

Log-log model: A nonlinear regression function in which the dependent variable is ln(Y) and the independent variable is ln(X).

Logarithm: A mathematical function defined for a positive argument; its slope is always positive but tends to zero. The natural logarithm is the inverse of the exponential function; that is, X = ln(e^X).

Logit regression: A nonlinear regression model for a binary dependent variable in which the population regression function is modeled using the cumulative logistic distribution function.

Long-run cumulative dynamic multiplier: The cumulative long-run effect on the time series variable Y of a change in X.

Longitudinal data: See panel data.

Marginal probability distribution: Another name for the probability distribution of a random variable Y, which distinguishes the distribution of Y alone (the marginal distribution) from the joint distribution of Y and another random variable.

Maximum likelihood estimator (MLE): An estimator of unknown parameters that is obtained by maximizing the likelihood function; see Appendix 11.2.

Mean: The expected value of a random variable. The mean of Y is denoted μ_Y.

Moments of a distribution: The expected value of a random variable raised to different powers. The rth moment of the random variable Y is E(Y^r).

Multicollinearity: See perfect multicollinearity and imperfect multicollinearity.

Multiple regression model: An extension of the single variable regression model that allows Y to depend on k regressors.

Natural experiment: See quasi-experiment.

Natural logarithm: See logarithm.

95% confidence set: A confidence set with a 95% confidence level; see confidence interval.

Nonlinear least squares: The analog of OLS that applies when the regression function is a nonlinear function of the unknown parameters.

Nonlinear least squares estimator: The estimator obtained by minimizing the sum of squared residuals when the regression function is nonlinear in the parameters.

Nonlinear regression function: A regression function with a slope that is not constant.

Nonstationary: When the joint distribution of a time series variable and its lags changes over time.

Normal distribution: A commonly used bell-shaped distribution of a continuous random variable.

Null hypothesis: The hypothesis being tested in a hypothesis test, often denoted by H0.

Observation number: The unique identifier assigned to each entity in a data set.


Observational data: Data based on observing, or measuring, actual behavior outside an experimental setting.

OLS estimator: See ordinary least squares estimator.

OLS regression line: The regression line with population coefficients replaced by the OLS estimators.

OLS residual: The difference between Y_i and the OLS regression line, denoted by û_i in this textbook.

Omitted variables bias: The bias in an estimator that arises because a variable that is a determinant of Y and is correlated with a regressor has been omitted from the regression.

One-sided alternative hypothesis: The parameter of interest is on one side of the value given by the null hypothesis.

Order of integration: The number of times that a time series variable must be differenced to make it stationary. A time series variable that is integrated of order p must be differenced p times and is denoted I(p).

Ordinary least squares estimator: The estimator of the regression intercept and slope(s) that minimizes the sum of squared residuals.

Outlier: An exceptionally large or small value of a random variable.

Overidentification: When the number of instrumental variables exceeds the number of included endogenous regressors.

p-value: The probability of drawing a statistic at least as adverse to the null hypothesis as the one actually computed, assuming the null hypothesis is correct. Also called the marginal significance probability, the p-value is the smallest significance level at which the null hypothesis can be rejected.

Panel data: Data for multiple entities where each entity is observed in two or more time periods.

Parameter: A constant that determines a characteristic of a probability distribution or population regression function.

Partial compliance: Occurs when some participants fail to follow the treatment protocol in a randomized experiment.

Partial effect: The effect on Y of changing one of the regressors, holding the other regressors constant.

Perfect multicollinearity: Occurs when one of the regressors is an exact linear function of the other regressors.

Polynomial regression model: A nonlinear regression function that includes X, X², ..., and X^r as regressors, where r is an integer.

Population: The group of entities, such as people, companies, or school districts, being studied.

Population coefficients: See population intercept and slope.

Population intercept and slope: The true, or population, values of β0 (the intercept) and β1 (the slope) in a single variable regression. In a multiple regression, there are multiple slope coefficients (β1, β2, ..., βk), one for each regressor.

Population multiple regression model: The multiple regression model in Key Concept 6.2.

Population regression line: In a single variable regression, the population regression line is β0 + β1X_i, and in a multiple regression it is β0 + β1X_{1i} + β2X_{2i} + ... + βkX_{ki}.

Power: The probability that a test correctly rejects the null hypothesis when the alternative is true.

Predicted value: The value of Y_i that is predicted by the OLS regression line, denoted by Ŷ_i in this textbook.

Price elasticity: The percentage change in the quantity demanded resulting from a 1% increase in price.

Probability: The proportion of the time that an outcome (or event) will occur in the long run.

Probability density function (p.d.f.): For a continuous random variable, the area under the probability density function between any two points is the probability that the random variable falls between those two points.

Probability distribution: For a discrete random variable, a list of all values that the random variable can take on and the probability associated with each of these values.

Probit regression: A nonlinear regression model for a binary dependent variable in which the population regression function is modeled using the cumulative standard normal distribution function.

Program evaluation: The field of study concerned with estimating the effect of a program, policy, or some other intervention or treatment.

Pseudo out-of-sample forecast: A forecast computed over part of the sample, using a procedure that is as if these sample data have not yet been realized.

Quadratic regression model: A nonlinear regression function that includes X and X² as regressors.

Quasi-experiment: A circumstance in which randomness is introduced by variations in individual circumstances that make it appear as if the treatment is randomly assigned.

GLOSSARY

R²: In a regression, the fraction of the sample variance of the dependent variable that is explained by the regressors.
R̄²: See adjusted R².


Random walk: A time series process in which the value of the variable equals its value in the previous period, plus an unpredictable error term.
Random walk with drift: A generalization of the random walk in which the change in the variable has a nonzero mean but is otherwise unpredictable.
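A short simulation sketch of a random walk with drift (the drift and variance values are arbitrary illustrations): the period-to-period changes average out to the drift, but each individual change is unpredictable.

```python
import random

random.seed(0)

def random_walk(n, drift=0.0, sd=1.0, y0=0.0):
    # Y_t = drift + Y_{t-1} + e_t, where e_t is an unpredictable shock
    y = [y0]
    for _ in range(n):
        y.append(drift + y[-1] + random.gauss(0, sd))
    return y

path = random_walk(10_000, drift=0.5)
changes = [b - a for a, b in zip(path, path[1:])]
mean_change = sum(changes) / len(changes)
print(round(mean_change, 2))  # close to the drift of 0.5
```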
Randomized controlled experiment: An experiment in which participants are randomly assigned to a control group, which receives no treatment, or to a treatment group, which receives a treatment.
Regressand: See dependent variable.
Regression specification: A description of a regression that includes the set of regressors and any nonlinear transformations that have been applied.
Regressor: A variable appearing on the right-hand side of a regression; an independent variable in a regression.

Rejection region: The set of values of a test statistic for which the test rejects the null hypothesis.
Repeated cross-sectional data: A collection of cross-sectional data sets, where each cross-sectional data set corresponds to a different time period.
Restricted regression: A regression in which the coefficients are restricted to satisfy some condition. For example, when computing the homoskedasticity-only F-statistic, this is the regression with coefficients restricted to satisfy the null hypothesis.
Root mean squared forecast error: The square root of the mean of the squared forecast error.
Sample correlation: An estimator of the correlation between two random variables.
Sample covariance: An estimator of the covariance between two random variables.
Sample selection bias: The bias in an estimator that arises when a selection process influences the availability of data and that process is related to the dependent variable. This induces correlation between one or more regressors and the regression error.
Sample standard deviation: An estimator of the standard deviation of a random variable.
Sample variance: An estimator of the variance of a random variable.
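A small illustration of these two estimators, checked against Python's statistics module (the data values are arbitrary):

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
ybar = sum(data) / n

# The sample variance divides by n - 1, which makes it an unbiased
# estimator of the population variance under i.i.d. sampling.
s2 = sum((y - ybar) ** 2 for y in data) / (n - 1)
s = s2 ** 0.5  # sample standard deviation

print(s2, statistics.variance(data))  # the two computations agree
```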
Sampling distribution: The distribution of a statistic over all possible samples; the distribution arising from repeatedly evaluating the statistic using a series of randomly drawn samples from the same population.
Scatterplot: A plot of n observations on Xi and Yi, in which each observation is represented by the point (Xi, Yi).

Serial correlation: See autocorrelation.
Serially uncorrelated: A time series variable with all autocorrelations equal to zero.
Significance level: The prespecified rejection probability of a statistical hypothesis test when the null hypothesis is true.

Simple random sampling: When entities are chosen randomly from a population using a method that ensures that each entity is equally likely to be chosen.
Simultaneous causality bias: When, in addition to the causal link of interest from X to Y, there is a causal link from Y to X. Simultaneous causality makes X correlated with the error term in the population regression of interest.
Simultaneous equations bias: See simultaneous causality bias.
Size of a test: The probability that a test incorrectly rejects the null hypothesis when the null hypothesis is true.
Skewness: A measure of the asymmetry of a probability distribution.
Standard deviation: The square root of the variance. The standard deviation of the random variable Y, denoted σY, has the units of Y and is a measure of the spread of the distribution of Y around its mean.
Standard error of an estimator: An estimator of the standard deviation of the estimator.
Standard error of the regression (SER): An estimator of the standard deviation of the regression error u.

Standard normal distribution: The normal distribution with mean equal to 0 and variance equal to 1, denoted N(0, 1).
Standardizing a random variable: An operation accomplished by subtracting the mean and dividing by the standard deviation, which produces a random variable with a mean of 0 and a standard deviation of 1. The standardized value of Y is (Y − μY)/σY.
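A short sketch of standardization, using the sample mean and sample standard deviation in place of the population μY and σY (the data are arbitrary):

```python
import statistics

data = [1.0, 2.0, 3.0, 4.0, 5.0]
mu = statistics.mean(data)      # sample mean stands in for mu_Y
sigma = statistics.stdev(data)  # sample std. dev. stands in for sigma_Y

# Standardize: subtract the mean, divide by the standard deviation
z = [(y - mu) / sigma for y in data]

print(statistics.mean(z), statistics.stdev(z))  # mean 0, std. dev. 1
```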

Stationarity: When the joint distribution of a time series variable and its lagged values does not change over time.


Statistically insignificant: The null hypothesis (typically, that a regression coefficient is zero) cannot be rejected at a given significance level.
Statistically significant: The null hypothesis (typically, that a regression coefficient is zero) is rejected at a given significance level.
Stochastic trend: A persistent but random long-term movement of a variable over time.
Strict exogeneity: The requirement that the regression error has a mean of zero conditional on current, future, and past values of the regressor in a distributed lag model.
Student t distribution: The Student t distribution with m degrees of freedom is the distribution of the ratio of a standard normal random variable, divided by the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. As m gets large, the Student t distribution converges to the standard normal distribution.
Sum of squared residuals (SSR): The sum of the squared OLS residuals.
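These sums of squares are easiest to see in a tiny worked example. The sketch below runs OLS on a small made-up dataset and computes the predicted values, SSR, total sum of squares, and R² as defined in the glossary (the data are illustrative, not from the textbook):

```python
# Tiny made-up dataset (illustrative only)
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 5.0, 8.0]
n = len(X)

xbar = sum(X) / n
ybar = sum(Y) / n

# OLS slope and intercept for a single-regressor model
beta1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / \
        sum((x - xbar) ** 2 for x in X)
beta0 = ybar - beta1 * xbar

yhat = [beta0 + beta1 * x for x in X]                # predicted values
ssr = sum((y - yh) ** 2 for y, yh in zip(Y, yhat))  # sum of squared residuals
tss = sum((y - ybar) ** 2 for y in Y)               # total sum of squares
r2 = 1 - ssr / tss                                  # R-squared

print(beta1, beta0, round(ssr, 10), round(r2, 4))
```

Here the slope is 1.9, the intercept 0, SSR = 0.70, TSS = 18.75, and R² ≈ 0.9627, so nearly all of the sample variance of Y is explained by the regressor.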

t distribution: See Student t distribution.
t-ratio: See t-statistic.
t-statistic: A statistic used for hypothesis testing. See Key Concept 5.1.
Test for a difference in means: A procedure for testing whether two populations have the same mean.
Time effects: Binary variables indicating the time period in a panel data regression.
Time and entity fixed effects regression model: A panel data regression that includes both entity fixed effects and time fixed effects.
Time fixed effects: See time effects.
Time series data: Data for the same entity for multiple time periods.
Total sum of squares (TSS): The sum of squared deviations of Yi from its average, Ȳ.
Treatment effect: The causal effect in an experiment or a quasi-experiment. See causal effect.
Treatment group: The group that receives the treatment or intervention in an experiment.
TSLS: See two stage least squares.

Two-sided alternative hypothesis: When, under the alternative hypothesis, the parameter of interest is not equal to the value given by the null hypothesis.
Two stage least squares: An instrumental variables estimator, described in Key Concept 12.2.
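A minimal simulation sketch of two stage least squares. The data-generating process and all numbers here are made up for illustration; with a single instrument, the second-stage slope reduces to the instrumental variables ratio cov(Z, Y)/cov(Z, X):

```python
import random

random.seed(1)

def slope(x, y):
    # OLS slope from a regression of y on x (with an intercept)
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    den = sum((a - xbar) ** 2 for a in x)
    return num / den

n = 20_000
Z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # regression error
# X is endogenous: it depends on u, so OLS of Y on X is inconsistent
X = [z + 0.5 * e + random.gauss(0, 1) for z, e in zip(Z, u)]
Y = [2.0 * x + e for x, e in zip(X, u)]      # true slope is 2.0

ols = slope(X, Y)  # biased away from 2.0 by the X-u correlation

# Stage 1: regress X on Z and form fitted values
b1 = slope(Z, X)
a1 = sum(X) / n - b1 * sum(Z) / n
xhat = [a1 + b1 * z for z in Z]
# Stage 2: regress Y on the fitted values
tsls = slope(xhat, Y)
print(round(ols, 2), round(tsls, 2))  # TSLS is close to 2.0; OLS is not
```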

Type I error: In hypothesis testing, the error made when the null hypothesis is true but is rejected.
Type II error: In hypothesis testing, the error made when the null hypothesis is false but is not rejected.

Unbalanced panel: A panel data set in which some data are missing.
Unbiased estimator: An estimator with bias that is equal to zero.
Uncorrelated: Two random variables are uncorrelated if their correlation is zero.
Underidentification: When the number of instrumental variables is less than the number of endogenous regressors.
Unit root: Refers to an autoregression with a largest root equal to 1.
Unrestricted regression: When computing the homoskedasticity-only F-statistic, this is the regression that applies under the alternative hypothesis, so the coefficients are not restricted to satisfy the null hypothesis.
VAR: See vector autoregression.
Variance: The expected value of the squared difference between a random variable and its mean; the variance of Y is denoted σ²Y.
Vector autoregression: A model of k time series variables consisting of k equations, one for each variable, in which the regressors in all equations are lagged values of all the variables.
Volatility clustering: When a time series variable exhibits some clustered periods of high variance and other clustered periods of low variance.
Weak instruments: Instrumental variables that have a low correlation with the endogenous regressor(s).
Weighted least squares (WLS): An alternative to OLS that can be used when the regression error is heteroskedastic and the form of the heteroskedasticity is known or can be estimated.
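A minimal WLS sketch under an assumed, hypothetical skedastic function var(u | X) = (0.5X)²: each observation is weighted by the inverse of its error standard deviation and the weighted normal equations are solved directly. All numbers are illustrative.

```python
import random

random.seed(2)

n = 5_000
X = [random.uniform(1.0, 3.0) for _ in range(n)]
# Heteroskedastic error: sd(u | X) = 0.5 * X, a form assumed known here
Y = [1.0 + 2.0 * x + random.gauss(0, 0.5 * x) for x in X]

# WLS weight: one over the error standard deviation of each observation
w = [1.0 / (0.5 * x) for x in X]

# Minimize sum of w_i^2 * (Y_i - b0 - b1*X_i)^2: solve the 2x2
# weighted normal equations by Cramer's rule.
s11 = sum(wi * wi for wi in w)
s12 = sum(wi * wi * x for wi, x in zip(w, X))
s22 = sum((wi * x) ** 2 for wi, x in zip(w, X))
t1 = sum(wi * wi * y for wi, y in zip(w, Y))
t2 = sum(wi * wi * x * y for wi, x, y in zip(w, X, Y))
det = s11 * s22 - s12 * s12
b0 = (t1 * s22 - t2 * s12) / det
b1 = (s11 * t2 - s12 * t1) / det
print(b0, b1)  # both should be close to the true values 1 and 2
```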

Index

Page numbers followed by italicized f and t refer to figures and tables, respectively.

of t.i.d. sampling. 721


rcl\1011. 7'-J. Set 11l1 Stati.,tkal
hyp.tl\c$J\ I ''

Acu:pt<~n~c

ADF 'WINk ~t'< :\UI!Illcllltcl D1d.e~~


Fuller 'i.llhllc
\ dJLI$Icd R ,:!IKI-111' ''l~:!:W
Jdtntlltlll ul. 2111
ruf:~lb O) U~tnJ:. :!:\"i

A DL m\Xk:l

~t< :\utorc~,:r~"""~

dt~tnt>ntuii.IS m ldr:l
\Ee,:tnJ .:arnrnr \rt' ('ullc:~;c:: gra.W..tcs

Air 'in \katkc mturmaunn <.Tit..: non


AI.; at I.." mforntJIIun cntr.:n<n (AIC)

552

~5.\:'tI.~.M.:!-653

... ate ss:;

IN\

Alcvhol

\Rs

7Y.
A~yruptouc

distribullcn. J9

dctiruuon of.():).!
)I lstatisuc. 705. 71J

Cll I\ cstimatol"o. ~Q.J. 7:!'


?17
nf OI...S estimator. ~. 7<1+-705.

and trarti<.' Jo:ath. Stc

Trlllln: Jcath~ .,nJ ak.>holt:t-,1.'>


,\ltcmath.: h' pot he'" sl, 2:11-:no
dcfuuuon ot 7.
lliiCSid.:Otl.l:\1.1. 1~1... I 'iS
of >tauon.Uit). "ih.l
two-.tLI<:d.IID, 11<1 I ~0
ddinn inn
7l
A ile mauve spcctfK.IIlun'. 717,243
\m~riCIIII t.cmwmi(' Rnt'"' }~Q

,.r.

AnRmt. Jo~hu.t. 4.!2. Nl---1'17. 501-502


1\liJtriM J...rueger ln\trumtnts 442
Aprro:wndlt nunn.11 d1,tnhutiun M. 151.
711
/\ppw\lrn.olton. ).lr~r.:,umpk normal. .)e<'
L.tr~;c,,m,plc nmma l
appw~tm<tll<~n

fCd'Unlng. ll7 'l! X


AHCH \u -\utuu:srL ''"' '' ndtuonal
hctc:'"l.;cJ3 II~ '
A R\1.\. ~ .\uh~-gr .. ,,, rn '~
a~crdgc: (,\R\1,\) modd
AR Mudd ,'\, r .\ut.r.:gr.:,.,on
-'''umplton'
cnnd11u10al m,.m .\t< t.ondii!On;~l
mean
1.-r d!!>tnt>utcd 13S nKidd \',.
Drnributr.:d lag mud
c~tcndc.J lco~l sqUlln:, s...,. El.l~'llJcd
lcll'l MJUUic
fur li~c:d dfc:d' r~"f(r~''IOil Sec fixed

,, pfltlrl

diCCI\ r~~ln'i<lll

GttU,'> 'villtk<l\, 7' 1

( tl

s. . .

dtmcd W2
~'llnt.ltlll8 cocfficicnt6. bl:! .013.
h) '>.Ill/\
~'1un.1tinn

of. tlflh-(11:> ol7. o24


Au!Qrcjlrc:''l''<.! models (ARl. 526-527.
~~

~ St't' Ocn~rHh/Cll

kasl !>QUSICS

((ll..S) C"lllllll(ll'll

~ 4'lt

'ihJ

'"umplU\0\ U'>i:d Ill. 545


Jcllnlli<IO ol "J~
L>1~lc) Fullt;r ''"' u 561J-'f)l
tiN order. "-5~
pwpc:rllc' ot IJrcca~t anJ c:rrur term
5'\<l .<..J(l

\\ith Untl r<ICII,557


o\utuc:grr.!'''vc root Str L 011

O! JSlaUSUC .U5

fl.l!l

ta.\~

ot tnHrnment exvgt>ndt\ 7~
tor IV rcgre>sion Sn Vori.lhks
regression 1TV rr.:~rc""'lll)
lor lca>l squares n:~:,'Te\\IC>n.
L..:aM
squares
tur 11mc senes T"'h'Tc::,,ltln ~..~ Ttmo.;
series
\.'~mptollcall~ dficu:nl G\1\1 u.umator.

oiUlllfl'jU'O:~~J\c

A~cr .. ~c .:a~u31

~\'c:r~ge

ht: llrn<.:lll c.IICCh, ~115-<;()0

711.~71~

the..~ bb(J-()86
ol fSLS estllThlllll 711", 7~-730

A\C:IIii!C: tre~tm~nl

01 tstaUStic. fiS6....AA..., ~114--"~(15,

A~~r ..~J. 0dn. 511

71ll-713
Asymptotic normality, 7HJ.-71l. Set rtlw
Cr.:ntra l lmu th<:orcm.Asymptotlc
di.,mbution
Ali\ntplotic mlTlllilll~ i.lt\tnhut.:d. 54
Altrtlion ~73,501
Augmc:ntc::d D id.ey-Fulkr ( A Of) SI311SII.:.
'\OI-563.M:?~3 nt>3t
Augmented Dtckey-Fullcr (AD[) tc-s. vs.
DF-GL<; te~t 1>5H
r\ utm:orrd.:.t1on 366-31>7. 532-533
.:cdficic:nt,532
dc:l'initwn oUM. S32
of c:rrur term. :nu
'ttmple~ .;;:n. 533t

COOl

dkc1 \"S.IOC3l

cttcct Se, A\c:ragc

lU'U.tl diO:Cl.

B
Balanc.:i.l pnnd, ~50. 163
Bank >I Lngl.1nJ. 5~9-551
Ba~.. 'JICCittc.llinn. 237 :243 ::!4.! .'\17-318.
Bay(~

31>1J-370.J02
mtorm.lllun LTilt:nun (BlC).
~5~-~~ .. ~5~1.561 577.

6~~-M.
~"AI( .55;\

tor VA!{., t>-lll

Beer !J~ l~ t 1S2/ ~5~35n. JS5J


3<7 Wl-J71

1(~. 363.

Bdorc: ,ond .1ftcr" ~fl.'


unal\'t'. J~
o\ut<)(.:~)\anan.-.-e. 53~
rompanson 153-JSh
Ault>rcgr~'lon
ddl~IIIVO of. 1'\.;
l:Udli<.'ic::nl. hia>c:J (Oiol. drd /Cfll ''S
\. h\.:d c:Hc~l' c:>llmallon.359
Bch.lvtor:tl c:o.>nnmi~ 911
dc::fmtlllln of.535
dr.:tc::nmning order at. 551-~q
LJcr&'trnm, Tnc::ndnr.:.191l
fiN unkr.S.36
He rnuulli, J>t~uh. :!U
Ia!! lc:ngth. 53l>-53\l
Ucrnuulh dttnhUIIOn. 51. 7:.-7t>
{>rder ol See Autorc::grc::'' n. La~ lc,;ngth Rc:moulh rond<m van:tt>lc.1tl-~l.
,\ulltn:gr;:s,.,,e cond1hona
39\
bet.erosk.e.I~IICJt~ ( ARC:H l C'iSl:i.
l.k rmill<>n <I. 20
M.l-1\tiQ
C:\f'L"<:l.:d \ aluc of. 2.J.f
dc:ftn~tum of. on5
\. mplc; 11\C:IHg( of
dt~tnbwed lag tAD I l m..,Jd . .).1(}-.549.
tll'trlhulllm of '\landardtzcd.53f
5751.666
,.unpt.: dl~tribuuon' or, 52{
assumption' u~ed m. 54.5
~tnnllurd dcviall<>n nf. 25
dcruut.ion ol. '40. 54'
V;tl Hllll'l' Uf 15
7S3

784

INDEX

13<!.'1 linear unhia!>~d c~umators (BLlJ E}.


00- 70.167-t69.nt5
dcfinttion of. ()I}
and cMimation I>~ GLS. 617-618.722.
n.~

and OLS estimator in multiple


regression, 71 9-72 1
and we: 1gb red least sqLUJre'. 691 -693
"Beta" uf 'toc:k.122. 325
Bias
ot coefficients. 317
and con'i'tenC}. (J8-.69
crrors-m-vanablcs. So1e Errors-iovuriuhlell biru.
of cstimatnr, 68.319 .nu
and nonrandom samphng. 7~71
of OLS "''timator Src OL'> e'timator
ornitled 1 unable. S!!l! Omi ll~d vanablc
b1a.<
sampk ~b:tioo Set: ample! :,election
biJ..~

simultancou.~

casualty. s~e Simuhaocous

ca,unlt;
stmu ltancou:\ equanons. s,.:
SunuiLnneous casually
BlC Sl!e Ba)t" informauon tntenoo
Binary dependent vanablc!\. 319. .384-389.

3W

measures or fi t for models w1th. 399-lOO


and popu lation regression rum:tiou 4(}7
Binary random variables. See BemouU1
random vanablc
Bmary ro:gr.:~~or. 148, 158. 192, 19S. 358
Binary trcatm.,nt,S7
Bi nary vamblc($), 160.207, 277-2RO. 291 ,
2\13.453 . .181. 4~7 . 502
as dependent van able. Set Bmary
d.:pendent v~nablc
uummy variable trap. Sec Dummy
vanable trap
<!otity. St:e Entity binary varia bit:
in t'L~cd cffecL' regression model. '!-57
mdepcndcnl. 27f.. 27l!
interaction h.! tween Cl,ntmuou" variabk

and. 2, 0-284
interaction' between. 277-279
ioteracuon specificauon 2iil
limned dependent ,-anable. t!xampk of.
4())\

linear mulupl<! regressiOn model


applit:d to. 3ti6
and Linear probabilit) modd. 38-1-38~
r<:!!fCSSIOn mode1.162- 163. 165
a.x reg.rcs~r 15,
spedtication. 359
Bm lnate normal distribution. 41 .43.
132- 133.205.217
~ mac!.. Mond<ty," 42
BLL' E. Su Best linear unhia>ed estilll3to~
l3oller,Jcv.1im , 067
Bonferroni method. '227
l:lou11d. John. 44.2

B reak date. 568-569.570/


k.uown. 506. Se<' also Chow test
unkoo" n. 51\7-569
Breaks. 56."'>- 577
Jefimuvn oL 565
probl~m~ caused by. 56.5
3\'0idlng, 576- 577
source~ uf. 565
testin!IIOr. 51\6--570
Bretton Woods ' ystem at hxed e xchange
rate>. 534. 53-tf. 565- M
Bur.:au of L~hor StaustiCS. 71

c
Capllal asset pncmg model (CAP\1).121.
324-J25
CA Pl\1. Sc:' Capi tal asset pnc1ng. model
Card, David, 285 . .t96, 4\18 . .502
Cardiac ca tht"teri7.ation 5\udy. 453-456.
4%-197.501 507
Cauchy-Schwa n m.:quahty. 6l:r1
Causal effccts..202
average. 503
chal lenges of ohserv<~tional data. I 0-11
ch;~ngc: in mdt:pendeot van able, 312
d.:ftwHOll Ol 9. ss. 471. S%
d''llamk. Sl!e Dvnam ic causal effcch
"~II mat l Oll 0( ~9. 66. S5,3l3- 314,-m.
4tl4. 4S15, 497, 499.506-507
di (ferencc-of-mt:<m:.. usin.2
expcnmemal data.
for ditfcrcnt grou~ 502
and forecastmg, 9-10
and ideahzcd experiments.ll-10
and ideal randomited cootrolled
e xpcnment, 469-471
rt:gre~s1on cstirnawr~ ot. -177-485
5ta tistical mference-. about. 313
and lime scncs data. 5%-59~
or ln:atmcnt. 470
treatment \'S. ehgihility effect$. 476
unobserved \'ariltllon in. 502
variables. See C'.a~Ual rc:lation~h1p~
among vanable~
c d.L See Cumulative probability
distribution
Census. ~ec U.S. Cen~us
C'.entral timlllht:ort:m 18. 49.51-54 7'3.
78. 84.
l32- t33. 2os. 429.
434.546.549.bS4.715,729
and asymptotic distribution. 681
and convergence m dtstribuuon,
6: 1-684
Jelimuon of.l\84
C hebychev's tncqualh). 6112-{..SJ
Ch1-square<J disrribulion . 4:>-44. 89. 170.
445.654. 69()-Q'} I. 715. 717.733
Ctu-squarc:d statJSIIC. 399
Cno"; Grcgof). 566
Chow tc:~t. 566-569
modlflt:d. Sec Quundtlikelihood ratio
( 0 LR) ~tatistic

85--S8

ss.

Cigar~tte

taxes and ~moldng. .t31.!-l32.


437-439,445-'150
demand elasticity. 49-l
cxtemallues of smo). , ng. 4-16
p:llld dat<t on coosumrcion. 447. -14&
Class SIZe. See Student-teacher rat1o
Clustered stanuard err on.. 31\7. 370
Cochraoc-Orcull e~timator, ti 14-615
Coeffic1ent(s)
coimc:graung. SI!P Comte!(rating
COCiliCJCnt
of determination. Sl!e /(~
cstimuti11g for lin~ar rt:grcssion. 116
mterpretmg. 279
lin~ar regression. defimrion of. l l 4
multiple. :!25-227. 232- 235
conf1deocc sets Int. 23-1-2.35.
7 14-7 15
t~'ung ~inglc restrictions mvolvmg.
:!JJ-234
reg:re~1on. See Regr~ion cocfficJen iS
stahilitv,tcst for. 569
Cointt:gr~tec.l variables. 656-659
muhipk ,66l-662
Com tegrating cocft'ic1en t, 65 7
estimation oL 660-J;ti I
Coinlt:gr:. tion. 655-M-1
defimtion of. 637, 1\SS li57
testing for. 662-{)63
umt r<JQI testmg procedures. 659
Cold weather and orange juice prices. 591.

727
anal~ s1' of. 592. 594[. fJ l R--624

data dc~nip tion. 592-595


CoUege graduates. cammg~ uf.l7-lll,
ns..A6, 72-73. 77.R~-83. 166/

addiuon<tl education. 284-285,322.442


addltll)nal work cxpcritnce. 281
age anu. 87. 92-YJ. 95. 128.271.284
gend.,u\nd,J5. 361. 371.71.84-86. Ll6.
162. 1~1 65.267,277-27~. 284 .

2R6t
Common intercept. 358
Common m.:ao. 49
Common trend,655
Cond!tulnal dtstribuhon, 30-34. 371. 126,
IM-162. 20~

and

e~h:nded

kast l><juares assump!IOM.

(1.'-i()

Conditional expectation. 32. R5. 15\i. l\13.


245. 255. 260.278
Cond111onal heteroskcdasucll). 637
Condmonall~ uobi85Cd c~timators.l67
Coodiuonal mean.l27, 131. 135. 2~
assumpuon. 128. 435
correlation and,J6.18\I
independence. 47~1 490
'lero. assumption. 47R-179. 616.680.
hll-612. 707.725-727
Condllional probability diStribution. 127
and population regre~~ion lin.:.127f
Conditi1>nal variancc,33. 161 163. 169

INDEX

formuiJ for, "-''>-ii'IU


lunlll<>!l,1:1'1l,ll'l~')<.

Confrdt:n~:c dlipsc ..:!J:\


CCinfu.t~ncc rnrcrval "-'

10

N<

tl::!.IJ~I7.\.

con~tru~:lln!).hJ.Il:i.li'i

n1

I-~J;,I5:'i.

IM 22U(>(I
con~ragc

probiabahly ot Sec Cove::r~ye


prnbabrhr'

ddanrlll>ll <>f. M.Rl I 'it.


' ' ltlf<'lOSI mrervul, '\41)
inh!rlllll ~alrdll) ..'i< lnh:rnal

validity
in muluplc reg.re"icm, 22U-245
on~ stdcd. !>3

fiJI' f"'PUillllon mean SJ Kl 1:'6


for pn:dattc:d ctfc:ca' -rp 7t3
lor r.:grc\'O!On ct>c.'lflcrcn r,l55-157
fnr \!Ugh: cocffrt: c:nl. 223
for ,Jop.:. 1'\S
for true drUcn:n<;< 10 m.:.ms.160
1\\0o\tdcd.Sl-.'(\ S'\ I.S7.22J
Cnnhd.:ncc: lc' cl
d dinrlloo IJ!.~I 156
Confad~ncx. 'C:I
dehnall>n u(. SI
fur multtplc ~ocHku:nl;.. .!.34-235.
714-715
95''<. .~, 95% .:unfadl!llCC SCI

1511,1>1
fun<.:llun, 111 probu rcj!.rc,sltln. ;\'XI..3\lo
Current J>opulaunn Surw~ (CPSI.l!6. lb5.
271. ~!W-~~). 32ll. 5~

ddimtion of. 6!\1


\ooper Moruca.l90

sample. 95
Covcrag~

and Ia" nt large nu mbl!r;. 49-:if)


ot Ul "c'um... tof\ I':>. IC(i, 167
ot sample ctwilnancc. ')5
ot ~~mplc hlfiJncc. 7n.l29
Con~r~tcn! c'llmaror. f>X l
<:onstant fnrc:cil)>l, 573
Con.~tanl regrc"or. 19.5. "08. 707

.335
teml. 2'7tJ. ~'91, 29.'
\umulaU\C dbtributi<>n

ch1-squarcd. 229
F.~9

ContiniiiU' mappang thcurcm. 681


dehnllu>n ur 1>1\5
~n .I '>h.Jt'l.. ~ th, r m.li.C:~
Contanou u' r>tndctm \1lfL.iblc,
Jch01111m of,IY
t:\pcch:tl value ot. 24

prh.thiht' dasmbullun uf,21


group.s5.127.19(1,Jilb.47()-4'1(\,

c~nlrol

479.~1-lK.'\,-1..,'2),-t\'7,-NI-l'l~

4'17-5(-.J, ;i1)tl

( tln'C.:TI(<:nl:C
iu tlhlltbuuun. 6S.3 I>X.I

repc<tlcd. ~97-4\1\1
ddinitton of. 498
Cubac rcgn!~IOO. 275-'!.76. 294/

D
;md 1ypoc:~ nl HJ...IJ
Dcgrt:.:' o r lrccdon, , 43-J4 ~'\\1.91. 125.
1711.44'\
n.Jju,tmcnt.200.1'19("k>91 717
Dt~ta. ,.,urc~.:;.

75,201
dc:fHlllion ,1f. 125
Dtrn.md closucarv, .J2J-.121>
fur ~,I! tr~nc~ -130-4'\2 .n7-t3':1
Dc:n'-Jh 'tuncllon ~~ l'r."dbahl:y dcn<it~
fun,uon
Lkrc:nd.:nt 'Jra3bk 'iu Rct!r~~sand
l>eh:rrnm"t'~ tr~nd. ~)~.~~~~. ~ol 5o2
ddinatan ol. 555
C<lrft:{IIOO .

11m.:'~'~-

l>F-GL\ tc,t, for unit


nJ~ aSS.

66.''

rt~<>l,

' ' -\OF tC>I 050


and clmtc~rataoo. o;iJ.Iic\2-oo3
ddinatae>n 'I. c,_c;o
Dicke' Da1 1d. -iN I
Dr<!..~ I Fuller ~1:111\li.: ~~I 563t

uugmcnt.:d r_ADFJ y, Au!!mcntcd


Dlckcyluller -.ralt,IIC
d.:finallon ol. ~60
P ad .. c:l Fuller tc:.t.. ;\flJ...5n2. W 3. 517.
t.S~

b5J

h5:tt

fm wintcftdtion 1\'\9
Darr~:r.:ncc olm.:an1-o
anatiV>I\. 15~. !9.\

c\lit;l,llll>ll of c.lu,ul effects. 85-BS


Daffcrcnccs csllmalor JS\
fun~taon. 266.277/. ::!90. :!94 ?.95/: 331
"lth nJJtlle>n.<l n:!(rc:,o;(lr<.~77-l-SO.JR.I
D2/
model. 265.275
-liNt
pecaficauon. 275-27() . .291. 21}3, 2115 B2.
tldtnllrllll of .J71

Comtant lenn. Sr,lnlcr,,pt


l"oru.umcr Pncc: Indo:\ tl'l'l).I~.M.~~I.
~~~~. ~21Jf,;)31 5.'11.651(

1-)-.:!'11. 47~711

pml'tabatity. ol conftdence
mt.:rvailB
CPL. <:u Cun,umcr Price lnJt:~
C~
\urrent Populat1ton Slll'\-ey
Crune r~lc:,J51
Critica l ' :alue. 71J..W, &~> Sr1 ul.1u
lallstiCal bypothc:'i' l t!'ll
Cr~.,..~llonal data. JSO. J55. ! 99. 5<.16
dcfinllmn of.l l
IV r"grt:J>~ioo assumptaun> for. 7J6

s..,

tor intc:rnJI 'alld11y. !:>ctlmem~l


vnlldity

".n. 24c;.256.

:!411 2::~.25~/.
21!3
and olin lion , 501
codficienl. 95. U 4
and l:tmdllaonal mean' :'>6 I:!8
populattnn. See Populution ~'Orrclauon
regre~son. and t:rror t.:nn Su
P\lpulatioo rcgrt:\~1<111
sampl.:. ~..t' Sample dUtocurrd..t tlon
and ~mplc covanancc. '4--'.lll
serial ~t' Senal corrd.tllun
Co\anan'c: 3-l-35. 94. 122. I ~X
matn>.. 7~. 71(}-71~. 71J, 717. 7lU
72~725

and bla!.. C:tr Baa' and consistency


Jduuuon 1f 'ill. t.-;
or C'>llmiltnr 1:17--Ni. 122. 20!\

Contnll"d c\pt:nmcnt. randomaz~d


~ :mtlunta1.tJ con II olleJ

J)-36.1l~ ~~~190.21..'i

.:!27-221'.235-23~.

cakulatang csumator uf. 725


hetert.... l..edanicil\ nhtN <tanJ..rd
c:TTOf' for. Ill

Ccmst.;h:n~

C\f'O.:I'IDlClll

Cumulative ,lnnd&rJ nnnnal Ji,tribuwo.

\orrclataon

compa11n11 444

\ontrol \anahiC'io.1'H

ddiruuoo or. 6.~

pr>l>abalit). ~~~' .~.,. ul>o


(.'on"~ten~

2:!'-124 1fi<.!

78 S

.~t'l'

funct1on. J08 Sl't.' al11 Normal


cumu!ati,e dr-.tnbuuon tuncllon.
L~"llc cumulatl\\: d&Sinbuuon
functiun
Cumulam.: J~nanm: muluptiers. 0011.
600-60J.6Jq,h2tll 621 6'!...'[

dctimh<n ni. 602


11-pen,Jt!. 11113
long-run, {1tJ3
normal d!'tnl>utaon. 2?.2,.NI)
Cumul.tli\ C probsbrlitv tli~trabuuoo
(cumulati\c dt~tribuuoo fun~uun 1.

I\. drilL rcn~e~o-in-c.hffcr ..nccs co;umaiC.lr.


.jl(1

error t~rm. -18..'>


rea-.t>n l<>r INn!:(. -I'll
Dtflcrcnccsm-differ..:nco:' cstunator. 41\9,
.j!otl)-li'J,JlCj.JIJI( 507-.SUt>

wuh adJuaonal re~o!,"<'l'.


.t\WlW
l'OIIS~Il"ll<."1/

of 479-4'iiJ

dCIIUIIIlln 1>f.~hl
error tcrm.Jl\5
\:$110\Jtan ~u~l JI~.:t .f<:;(I,J%

cxrc:naun uf. ro muluplc 11me penod~


J W:<4

Cumulau~e

reil">n !ur u~rng.~l-11'<.1


IL\IOj! TI:JX'31Cd CfO'o,'><'CllOOaJ data,

19 :!I. 2.3
tn log.11 anJ probit re!!J~'ion-.. ~9
Cumuhtll\c standard lngi\11~ drstnbuunn
funcuon .394. 31)7

,.~ ~inglc

11)!!--IC.JCJ
O~rc.:l

J1ffcrc:ncc C~llmlllt'r JSJ


lurcca,t, IWJ

' ' ih.:rat~d tnrc:c.ht. ~'!


mulllp<'ttoJ, M:i-647

786
Di

INDE X

Cl< lt

chuio: data, miklcl lor ll113lvnn!l-

Studentr

)"'

t., rpounJ c."tcrungc rate. Se lJ.S

o,

Autoregr<'"'"'c diSlnt>utc:J l.1g


mudd
~R(I ) crron.rlO'rf.l2

dnllar Bnush puund ~xch '"""

rt
DOLS. .s. <h"''d!DlC ordtn:tn I st '<lU><h.'

(oOlS 1 esiJm:ltor

~&Uh J\Rip) c:rro~. <>l5 61s

anumpuom.. 6111
autocorrclaton f

rv

rm.

EIJI4il12
"'ith

Studcnrrdttnbudun

f>l\'td~;nJ Vtc:kl "71

OtSCTete random 'llnabl;:, Ill


Distnbute.J Ia~ model .\.-t' ,,/11>

with

'~<-<

Olf.SIDII~h.: ., .X

DUiotCgn:J.~ive c:rrur~ 61,S

n16

Drunk dm 1ng fa"' Set" Iralltc ratuht)'

rate'
n umm)' vnn.. hlc:. .'iu BUlQJ} '"" hlc:
t>urnm~ vo~rablc trap. 20:<. 357

and c:'CO};COCtt~ . (l(YI


v.nh multtr.., L"" hi <-{it
D"moutK'Il

')mntt'IC

l))lllilnliC (liU..al c:ffectJ;.CSIUIIiilton <II,

Sill -.:>R
tn '6

h(JI( ..Mf!
DvrMmic muhipliers. 600. (>(J2-fltl4. f. 19.

f.!l,trltluuon
.kt ('ondllionnl

COI'IlJ'IIIInS. 6 1 2~13

ron1 mporaneom. Su impact effc.-ct


.f ftnuion of 602
lnoauon of. under~tnct cxogeneH}.

.Jt,tribUIIOO

..r c:.Jmin!!" 165


ol "'""- 161
e~act Su 1:~act dl'trih uuon

Ol

hum~k~Uru~-onl~ 1-~tilllSIIC.

hllf'k'UI
JdcnttUI,..:.o
JOint \. Jotnt d tnh tion
ttltnth IWrntlll o;.ampltn!( Su Jointly
normi\l li11J'hng Ul\tnl>uuon
JOint rrul:tablluy. .!II
kurhr.i, 26
l.!r!(t:\umple. Stt l :.rge'ltmpfc
d1-..mhuuor
nurg.m~l. 3:1

measure-.. o1

"~~r

617

nnl~

F)Ultt,.IC: vs.
I c:h:ro<kc:J ..,tKtl) ruhu' t /~tattt.c. 71 illl

ul homc,.,L.cda<Uctt'

~>-1.'

IC:"'h )I
,7
muh" l!rl.tle n.,rrr II. lit't \lulman3tC

!'><

nrmaf dl~lrihuttun
n.m-nurmJI.untt wultcw-. 1\.'i:\ f>55
nom1uf \',.1 l'oonnaf Jl,tnhullon
ot 01 S e~l m:llflr 14s
nl O LS t.-'1 ~tall\tlc<. f:'Y
prohJhlht} ~tr Prohahtht~ ..ll'tribution
''' '"l!rc:-.~un \taii~IICS v.tth n.,rmd
Cfi'I'T'\. 71S-II'l

01 S c~llmJtJon nf. 61~1 ~


lip ""d.flt:n
ldh ht'. I 6:'2-{i2..l
7c ropcr GJ. Su Impact Cllt,t
D) n. ru .. o n..n least squo~.u. (DOI.S)
C$111"'131~~2

\. n~~:nted ll!l:kCI/ fufkr


lH-' Af.)I J h:,l . n~'<. &oOI ,,,,
d. flnillnn e>l. f.~y
[IIIII~ hmary \.lfldtlles, ~7'
Cntlly dem~uned ( >LS
o~fgunthm. 35\1
EnlttVf1xed c:lfc:~l"- 357 111.' 'I'-'

[nttiY'po.:ull, onerap.c J~Q


Cnuty.
mtcrccpt' 1~
Erro C< rrc:.:t, n
lind ccuntcgr tie>n. 1\SS-t~<j'
term M7
error currc:clinn mood ~e,. Vnh>l C."rhlr

rf,

currc:clton modd

E.rrol"'ln ~r 11~< '\. _r-s "7,1'\9


1-t,, 111l-31 .!CJ..t
dd '"" l>f 31!1
solutton' 1 .'21

ddmun.. "r.~

Erwr "urn uf '<jUUCS. .)u Sum nl

:~quarcd

rc:,du~l'

E
F:mnmp.~

.Su College graduJ tes


r:~"'"'lll'~' )'"lumal'. demand lor. 2AA
t'lnn<>rntc 1101<!' ~nes. Su Time 'ene1
f'Juamon poh;.: J
F.uro. C.\u-.JI Se~ Causal effect
F.llu:ten'-v
Jc1101tlon of.~.

1~

Error tc'nn.ll4- llh


assumption t~r tlxcd cltcL' ro.:JlrcSSIOII

3M
ilUII..:11rr fJI d.tn dt>lrtbuted
1,\,

7'~ "73_~

1 '
III.IUrc:J 121
nntl mufttpfc rcj!rC'sswn. 721
anJ ~nul mncln11on, 592.hD,hf'l
\l.mt.lunl d~' 13lion ol Set !lt.mJ.ml

c rrnr nf rc!-',ltl'.'on
und trcotmc:nr level.~l
crnr t}tlt f unJ II, 7V S,, 'r, ' \t ~~ llatl

IL')mploucafh Su Asymptoucall~

cmrurcd. 735

,l,.,"nc-.... .:?b
> \t.tn.J.srdll.:d J\cr..~ ~I
vi t..nd.Mdved 5allljlle &\crap.c,5V

1kl111111un nl. ~~

l (r

1>1

IC$t ..'irr

h~polhc:-.1,

F<;,~

elltCI<'DI GM~1.,.,umator

"'mrhnR- ~. '>Amphn diSlnhurun

~ mvdc-o,

tllll -60:!
co !~tKm ol aero ~rvauoas. n1.

corrc:latoun ol problem' ft>r inh:-rcntt,

oJ '\11 nato.-.....67
11 (,Ls c,umator.St>< Genc.-rahtc:d lcust
~quare~ (GL$) csumulur
nt CJL.!> ,sttmaior See Ordmary lc:a~t
... quare' t0LS) csumnt<r
ami vannnc.,. Srr Vanance
r: ftt"cnt G/11\f "~timator. 72?

-..mr~< ~~~t'r ~"" 4~

laiK>nJII'\ \n- 'I&IIOOAI) <lisJ.rtbullnn

w~

J..-rm,ttun t>t.-11.1 , 59!1


mgcndltf I\' model 4H...t'l.t
Engel, Rul>c:n. b.~). 65~. M6

Cnmy re~rt:''IOO mood. 'n2 "1M

{2Clt

condtuon~l

J~I,J'i'\-1}0

cndogc:nuus \otrl.lbk H3,J3t.l hl, 14

[ogtl-Gr

~ tnrccasung..625
hy (,J.S,hl.'--615.627
wnh 'trictfy exogenous reg."'hm'.

dl\trtbuhon

Remnulh Su B~moulli Ji,lnhutton


hiv Jri,tle normal .\,., fiJYilrliiiC n.:trmol
1.hMrihuuon

\DLm,>del 612 ~13. 621

wtlh CXOI!<.'nt>U~ regressors.~

\')" tulle

du'4u:tred.St't' ( hl'<junred

r:stlm:ued. mtci'J'retmg,4~"

~llmattns, 211'9.1SO

t.l\ n.1m1c ctt.:cts und. <.97


611;

dchnttton u(,2h7
vi demand ~t' t>crll3Dd cllt\ll<:tl~

l)mm
nde~penmcru< 4"-l
c.f )UJ'PI~ . "tre Supph d;utlt'll}
Ehtuhilu~ cffccu "- trcntmcnt ,4"(,
I )ow Jonu lnd~lrial Avera!!" 42
Drlf r n hmt w u, w1th. ~... R.andum v.lliL. l!nJ'!lc:nOO) tMirumc-nr i4Q
mc>del
l!ndngc:nouHcgrcs-..~4:U,4h, 141 W'.

ddinnton ol5'11.597
c:hrn.lte c~><:lttct<!nl\ ul, bl/1.613 ol5

I:Jdcrllub<:r Whttc 1
J mJno. \ t'
HttCh'-l.:edJ,( l)
W.t
~tandard err< r-.
r.ta,uaty 21:!. 2>\11, 2!/1

tc:'t

'',.I 'Pllunc:.J ... urn of <4jllllru

E'um.tlc.<kllnlltOn ol ni
f

C..\ltrnttc:J "-Ctj1.btcd k:1-1 ~JI!;IT<.~ r;o


FC-.whlc: weighted JC:ll)l JqUitlt)

ngel-Gr~n!lc:r

Augmc~d Dicke~fuUcr
,\01") tc~l

(EG-

bumauun ~ ulm Formttllg

.\RUt mo<kl.667

INDEX
of

cau~>ll

ettecl.

s,.,.

ddmtlton ur. 5'.f.!


anti ilistrthutcd Jijgmodel hiiCI
pI I and p~c:'o.Cllt \ , p "I .mJ prc...:nl

Cuu~11 cil.:.:t~

c,f c.ul ~raun1 C{>elhcacnt. .'oro


{. ontcgr tang .:~fn.:~o:nt
ddmmnn nf. tlh
of dvn< mac c.~u.<:.ll cftctt~ Set' lh nanm.
... u~dl cb
GAR< II modci.M7
<I I\ rc"f~'S:>Ion coeflid~ms. Su Two

ICll\t "QUUrt
regre"nn m<l<ld codf1e~<!nt\.
:; ( Coc:fll.:lt!nh
in lo!!ll modo:!. \<lr>--JO
''I populallon rne;m.Srr Popu1ataon
mc.m
tn pmbu mudd W('>-1011
v i rcerc,~1<'" hnc. 20:!
bum.n~..... l~l.li\.~169, 1 1J7.2n1 'n2
SIJI!C

otlmC'~r

nlcau.<al d(e<:t t 11
c~>ndlllnnall~ unha,ell 11>7
..:un'"tc;lll 76 ~n CucN\It!nt c~tmator

uefmt.nn "' r.7


<lc:,anshlc: .:hunt.:tcril~~ uf.li 7
t.liffercn.:e,. sf( n.rrcrc:nct!, estimator
hffc:rc:n.:c-.-m-Joff.,~.:c:' St<,
Dlffc:n:nc.:,m-Jifferencc'
ellm~tor

Wt
gc:neruhteJ 1.:."1 'lJllllres. See
Coeneruh,cd lea.t ...quare'
lc:."t J..._nluJc: U. 'iali """ ~l't: u ~I
uh-..1lute lle\;atmn' ~'limn tor
hnear tnru.lllmtMih unhased.
I anc:ar c~>nJlttl>n.tlly unhaa~ed
ellw:.~nt

s,.,,

"'llmatm

m.. \imum hkehhnoJ. .'it,. \la..\imum


lilo.ehhu.. >J "'timaht (~IL[ )
Nc:" e' We;.t .\<' Newey West' anancc
""'llrrulhlr
nonlinc.ar lc<~.~t 'quarcs. Sec Nonhne.ar
lc:a'l ~u.trc:s (NLLS) cst.Jnator
mdin.Jr~ le;bt "!UatO:\ Sa Ordtna~
1... M ~uar.:s (OLS) csum:uors
pr<lp.:rt~' of. t>7-f..X
of te~rc>MOII cu.:tflci..: IllS. 317 ::; I II
of 'tJ111.1.1!d de~ 13Uon. '3mplmp

Jt,lnbUIHllJ, ).lq
TS LS. Ste 1\vo Mogc t.;,,~l squnr''

pJI. pre,cnl and luturc. Su l'nst.


J'r~nt. anJ llltUCo.: U01!0C:1f:
~tnct. Scf! Slnct cxo~cncu~
I\pt.:) ol 51iH 6lX.I
Exogenous tiUtrumcnt oilS, J..IJ-41. +JIJ
-153 4.5~5(,

450
IC\llltg Ol ol-15
ExO!lCliOIIS \'<.irtablco;..-1~~3(). 4.'8 . .1-1,5
dchou,on ol-l~. """ <>26
ExO!!CilOUS \',tnatlOO, -151-!52

Expc.:tatconls1. ~J
C'ondJttOn:ll \a Condition I
exp.:ctauons
iterated. Ia~ ol \rt Law of iterated

Se.. UnbtasC'dn.:ss
European Central Bank, 7
Exact distribution. See Exact sampling distribution
Exactly identified coefficients, 432–433
Exact sampling distribution
  of errors, normally distributed, 685–689
  of OLS estimator and t-statistic, 705
Exogeneity, 433
  assumption of, 625, 678
Exogenous regressors, 434, 436, 443
Expectations theory of the term structure, 657, 662–663
Expected value
  of continuous random variable, 23
  of random variable, 23–25
Experimental data, 10. See also Panel data
Experimental effects, 488, 501
Experimental estimates, 502–507
Experiments
  double-blind. See Double-blind experiments
  estimation of causal effects in, 477–485
  ideal randomized controlled, 9
    problems with, 9
  natural. See Quasi-experiments
  and quasi-experiments, 468–508
  randomized controlled. See Randomized controlled experiments
Explained sum of squares (ESS), 124
  definition of, 123
Exponential function, 267–268, 278–279
Extended least squares assumptions, 678–680
  for multiple regression model, 704, 707–708
External validity, 312–313, 330–336, 338, 406, 494
  definition of, 312–313, 332
  and regression used for forecasting, 327–329
  threats to, 313–316, 365, 475–476, 500, 502, 507
Feasible weighted least squares, 692–695
Federal Reserve Bank of Boston, 5, 385
Federal Reserve Board, 7, 334, 555
Finite kurtosis. See Kurtosis
Finite-sample distribution. See Exact sampling distribution
First differences, 530–531
First lag (first lagged value), 528, 530–531
First-stage F-statistic, 441–442, 446
First-stage regression, 435
Fitted value. See Predicted value
Fixed effects regression, 356–361
  assumptions, 360, 364–366
  definition of, 358
  model, 358–359, 372, 464
  with multiple regressors, 358
  relating traffic deaths to alcohol taxes, 360–361
  using binary variables, 357
    definition of, 357

F
Fan chart, 550
F distribution, 85
FDD. See Freezing degree days
Feasible efficient GMM estimator, 456
Feasible GLS estimators, 614, 725
Fixed effects, time. See Time fixed effects
Fixed effects regression, standard errors for, 364, 366–367
Florida. See Cold weather and orange juice prices
Forecast(s)
  direct. See Direct forecast
  errors, 541
    components of, 548
    definition of, 536
    root mean square, 548
  intervals, 548–549
    vs. confidence interval, 549
    definition of, 548
    using RMSFE to construct, 548
  iterated. See Iterated forecast
  momentum. See Momentum forecast
  vs. predicted value, 537
  uncertainty, 548–549
Forecasting
  and causality, 9–10
  of economic time series, 525
  vs. estimating dynamic causal effects, 10
  inflation. See Rate of inflation
  internal and external validity, 327–329
  models, comparing, 571–572
  for multiple variables. See Vector autoregressions
  multiperiod. See Multiperiod forecasting
  Phillips curve, 576
  pseudo out-of-sample. See Pseudo out-of-sample forecasting
  rate of inflation. See Rate of inflation
  using regression models, 527
  vector autoregressions for, 638–642
Fraction correctly predicted, 390

l-rec7rngdcgrec IJil)':l (1-1>1>1.5'1.1 595.597, C.cncrnh1<'0 autnrc:grcSSJvc condlllt>nal


'ilokili(J.IJ02.111N !>~ll,(\27,()201.
hetcro>lcdru.u~,,Y tv ARCH)
h21{,h21
Jetlnnl(ll1 of. ~93
r-,hiii,IIC, :!27-.235
-rrrn.1d1 w atnnn:~r.:,,iun, 551
~rrroa~h ot tu length ,d~~:uon.
'~'-"'54

-1\)IOJ'IIIIit: Ul\ltibUIIOII 1. 71 I
tltw. .S711
o;ompulln~4.c;~

I'll'7-(llJY, 668}
dchmtaun uf. 665
Gcncrnh.t~J

lcu't 'quarcs (G L.<;)

C~II111Ut1nn, 592. b()\1, 61 ~IS, 62-i.

652
nh,illlilj!CS v.;. di~thllntag..:s. 6\S
''"umruons. 122-724
rule nl first. 72~727
:t')mphnically BU 'E.I\17~11~

dcllntllon nf,127
~~~~~ rlhUUt>n or. 71!)... 71 Y

con'"t ~nC)

hctcro~kccJ,,~tlclty

Jcftnllinn of, 704

robust, S,t
l lcteru... ~cdn... llclt> rohu't f .

<tllthllc

hum''kcda,ticity-unly. 'ir.llomo,kecJa,ticllvc1J11) f .,tuu ... tic


u, -:ruu r<!~r('"''n . .219
v.uh 4 r.....,trl.:tJt>n'. 2's
ruk ot thumb. 2.11
lc\llll~ cxdu.,lon 11f qruup' ut 'uriahlc'-.

404/
lull~:olumn rank 723
Pull Ji,dO'iure ~ 17-~ Ill

fulkr, Wl!\11.:.560
Fun~llon

fug,tnthml.: .'irtl nl!,lfllhllll.: IUilCIIIJil


4u.tdrnuc. .'i(; Qu.1Jr .1t1c tun..:uun
run~11011al

IMm ~.17 ' 70


unknown p.mtmct..: rs. lH7
ffil"pcCI!iClltiOn nf, '2~. J.W
c~ttnullln!l-

uellnttJCin nf WJ

'oluuoru. w. '111
rn rt>~r-:~sion St I me11r rcgr.:'"""
noni.Jncar rcl<(rCI>OtOn tun.tton"

arp.umt!nt,. compared wnh

01.~. 726
cffc~ncv

ot. 1Jl5

kal~ihle ~..!' Fcasibk Cit S C'timntors

tnf.:t,hlc Ste Lnfeas1hlc: (',t ')


c'1im.uors
.1nd mulllpk r.:grt!$>~On. "'U~ . ~~ 727
nnnhnc.ar h:ast square' mlcrprc:tauon
ur. fi t-l-4> 15

,, 01" hili
~lid 1cn cunduonal menn ~ ...wmpuon.
72~-727

G.:n.:mh1cu method of moment' (GM M)


c:'llmatvr -'SU
J\) mptntil., lll\ cffich.:nt s,.,
A'\mptotically cft~<.1c:nt G'-IM

e. . umatr
clha:J.:Ol s~r Ftficicnt (o\1\i 1:\limaiOr
dd 10111nn of 733
lcU\lhl~ clrict<lnt Se~ Feu ...ihh: duct.:nt
( iMM

c~timator

10 hn~Jr

l'ovanancc matrix ~timauon i.!'l


.::sttmuh>f'-. (Ill(,, ~I(J
~t andard t!rrnr<. 36b--J67 ">71 '<J.
5U4, 6112. fi04....6(1R. 619. M7. btl( I
" aria nee formula. (i(')6-6(JS

covanuncc mntnx. 229


dclinll<'n uf. I(~J
haodllng with heterosk<!da~ttcttvrohu'c
standard c:rrors ''S. "'t!'l!hted k ~~
MjUaro:, 691. b95-69n of koo" n funcllonal tonn. "cghted
IC3SI "'-!Uufl:" \\lth. 692--NI~
Newey-Wc\1.
Newev-WI!~t vari11ncc

s,,.

cstmntor
regres<ion 101'lt.ld with. 694

time 'rtr)'lllt(. Oli~


truocauon p~mme1er. Set Trun.:alt<n
rruamet~r

volatilit~

.:lu,tenng. Sec Volat1hl)

clustcnng
wctghtcd k;c.t "-fLWT~ 691 6Y2
Ho:tcro~kcd11Sl1Cll~ -robust

F-stallstu:.

22X 232 714


Jdimuon ot ::3D
Hc:to:ro~kcd3SitO:ll\rohu.<t 1-st.ansu.:

(G \L~I 1\Wii,tic), " 35


slaod;o.!d ennf'-

Hdem,kcd<~StiCII\ -rohm.,

1 ~. 1~. ~ 171.:29. 32b-32~

.117- 130. 711-71~


aJ,antagcs and dt~<h-antages oi F-n5
.:tm~l\tency ot. O.'i6--f>.~
conclation 01 error tenn across
oN:I"\<ltioos.. 338
ddimhofl olltl4
Hetland. l.o1s. !YO
Home \lonpg" Dl<,cfO'ure .\o (li.\lDA ).
'lS~ :,.<.;7 39' ~-f-..40S. -10.31.
Sc '""' .\fortgage leodmg and

I\ rcgreSSJOO moue!. 7().l-705,


7:?7- 71h
wtth um. <en~ data. r.>6
toLS. .'>!! (" nc:Tlllvcd u:ast "'u.rc'
G
< UmolliVO
(or\ Rt II Scc Ci.:ncr<~lli<.J autlrl'!!f<.'-\1<!
toM\1 "'umatur Su Gcn~rwvcd metholl
nlct"
condnonal hcter"k,Ju~ll.:ll\
of mm~nb (GM\.1) "'lim.uor
Homo,J..c:d,,...ucir~. 1-'R. II\CH69.171 IIJ5.
li.IU<,Mnrko\ tha:trcm, l4lS, l11l 1!!7-II>X. GMM 1,1.111'11.: See Hct.::rt~l..edn'liCifY
WI,130 -231
c.lefimuoo ollbO
015
ruhti\1 J\131lSUC
(oran!(c:r n.ve. fi!'5.6.5; . liM
:i"UmpiiOII$. "'21
t:rTIIf\. ~I'll-.. 20'1
~onJni n.<. 111S. .,2:!. 1::!4. 7':;
Gmng.r i::Ju,:thl) srau,.tic. S47
effl.;,en~ O! OlS e:.tnnmor With.
Jelmllluo i\1. I~'
torangcr uqlm lest.5-H. tl4'!
71~"~1
LirO"s d.,..,.,,.,, "rodUCf (CiDPl. <,~1)
l\1 countcrp.ar t ' 7.10
mJ cTte ied k lt-1 "JU.'ire-; assump11on.-..
hmtall(ln' . 1011
IN hJ' Jn . .) t Japan GOP loll
1!.~.7117
fur multtpk retll'l:''i''" ;ns . ....,0- 21
Gm\\th rat<-'-. "21--5.1
J-,t.tt t,tic under. "~::- 33
etm1.h110n~ (ur 71 II
mathematcaltmphl'auno\ of.l()J !f>-.1
udnnllon nr i211
H
Humn,lccJ;I\1101) -~'nt~ F-stall\Uc.
11.-\C. .~u Hch:rf;l'CkcJa_,ucll\ nnd
muluvanatc. St \luluv.matc (i.IU'' '
22~~32 . 444 715.711J.733
\I,UI.:O' thC:(Ir<:m
11ut ..:orrebtion..:on'"t~nl IliA( 1
:uhilllt.J~IZ' aod <lrsaJ,illltag"" ll! 232
30d "CI h!Cd klhl "!U3rC~ I,._, I ~l
,t.md.trJ err(l~
Jrlinit~on <>f. '!31
Hornc,...J..o:J..,IIO:II)-onl~ OLS
'"" I! P !!,. (',>tle~c gradU3to.;
Han11rd limvo:Nt~.425
H!v.thomc ertc:ct . .P~7" )[II
l ,tali,IIC ~ '
~
llf.S <I
G~ncr.tl flcctrK" (ompam 1. I
Hom<~!>keJ...,II..:itv-only stand.mJ <'rrnr;.
Hcclan.1n, JJmo:' I . 407
lkncr.ll "<tutht>nurn cllec:t, .tt>
Hcter()j!o:ncou' f"''pulat1on<. c~f>Cnm.:ntal
l11~1~ . 106.~~- 32o-J2X.717
(icncr~ll\ rc~'TC>'It'll modd
formula l(lf. Ill.~
.tnd qu~'l-i.'\J'enm<nt~l csum.th:'
.J31-439,..t-lll
m ~ll'-5117
Hnm<>,l.cd.micity-t>nly TSI.S ~tanuard
(\llj!C11Cif\ and 10\lfUIIlCill rdo.:\IIOI:e 10 , llctcrmkcc.Ja,h~11).ltlll lno. lh.'i, 1 1)~ Z:!\,
~rrnf'-. 13(1

,r

Hl-115
1 ....1 S c'IID101IIII, I\\I) ~~~&C> II' 4.l~

,~1. 1102

.mJ. UIJO.rrcl.llllll u111"'1Cflt 111\C)

Hom,,,t..cJ.t~lli:ll\ -unl\

1\lalls!tc. 170-171
Jl\tnh1111<H1 <1. t>'lC,._:,a,l

789

IND EX
Homoskedasticity-only variance estimator, 170
Homoskedastic normal regression assumptions, 170
Hypothesis
  alternative. See Alternative hypothesis
  null. See Null hypothesis
Hypothesis tests, 77–83
  difference between two means, 83–84
  and inconsistent standard errors, 326
  joint. See Joint hypotheses
  and multiple regression, 220–245
  one-sided. See One-sided alternative hypothesis
  for population mean, 77
  for population regression, 203
  with prespecified significance level, 78–79
  two-sided, 153
i.i.d. See Independently and identically distributed
Ideal randomized controlled experiments, 8–9
  as benchmark, 468
  and causal effects, 496
  problems with, 9
Idealized experiments and causal effects, 8–10
Idempotent matrix, 716
Identity matrix, 722
Impact effect, 603
Impact multiplier. See Impact effect
Imperfect multicollinearity, 205–206, 209–210, 225, 546. See also Perfect multicollinearity
  definition of, 209
Included exogenous variables, 432
Independence assumption, 111
Independently and identically distributed (i.i.d.), 44, 66–68, 128–129, 577, 655–656
  assumption for fixed effects regression, 364
  assumption in multiple regression model, 707
  assumptions of sampling, 44
Independently distributed, 36
Independent variable(s), 115
Indicator variable. See Binary variable
Infeasible GLS estimator, 614–615, 725
  definition of, 614
Infeasible weighted least squares, 694
Inflation
  forecasting, 6–7
  and monetary policy, 6
  and oil prices, 646
  overall price, 6
  setting interest rates, 7
  and unemployment. See also Phillips curve
Information criteria, 554. See also Akaike information criterion (AIC), Bayes information criterion (BIC)
  calculating, 553
  for lag length selection, 550–553
Instability. See Breaks
Instrument(s)
  conditions for validity, 441–443
  exogeneity, 422, 431, 436, 438–439, 501
  relevance, 422, 436, 439–441, 501
  weak. See Weak instruments
Instrumental variables (IV) analysis, 421
Instrumental variables (IV) regression, 315, 326–327, 421–458, 501
  definition of, 421
  in quasi-experiments, 497
Integration, orders of. See Orders of integration
Interactions, 277–289
  model, 278, 281
  term, 279
Intercept, 191–200
  common. See Common intercept
  entity-specific. See Entity-specific intercepts
  hypothesis testing for, 355–356
Internal validity, 336–339, 406
  definition of, 312–313, 472
  threats to, 313–314, 316–331, 372–375, 417, 476, 489, 508
Intervals. See Confidence intervals, Forecast intervals
Iterated forecasts, 641–645
IV. See Instrumental variables
Japan, GDP in, 533, 534f, 545–546, 550
Johansen, Søren, 660
Joint asymptotic distribution, 710
Joint distribution, 29–40, 132, 204, 206
Joint hypotheses, 224–228, 276, 283, 292t, 713, 715
  definition of, 226
  using maximum likelihood estimators, 394
Jointly normal random variables, 205
Joint normal distribution, 213
Joint null hypothesis, 226–229, 231, 265
Joint probability distribution, 29–30, 34, 43
Joint sampling distribution, 205–208, 710
J-statistic, 444–445, 456
  asymptotic chi-squared distribution of, 727
  definition of, 444
  rejection, 440
jth autocorrelation coefficient, 532
jth autocovariance, 532

jth lag (jth lagged value), 528
jth serial correlation coefficient. See jth autocorrelation coefficient
Just identified coefficient. See Exactly identified coefficient

K
Kentucky, 497
Klein, Joseph, 425
Krueger, Alan, 442, 492, 497–498
Kurtosis, 26–28
  finite, 129–130, 135, 203

L
Lag(s), 528–531
  distributed. See Distributed lag, Distributed lag model
  length selection, 549–554
  multiplier notation, 617
  operator, 543
Landon, Alf M., 71
Large-sample approximation, 129
  chi-squared, 229
  normal, 74, 92, 223, 440
Large-sample distribution, 131, 133, 222. See also Asymptotic distribution
Law of averages. See Law of large numbers
Law of iterated expectations, 32–33, 36, 612
Law of large numbers, 49–50, 76, 129, 221, 546, 715
  and asymptotic theory, 680–682, 684, 687
  and convergence in probability, 681–686
  proof of, 682–683
Least absolute deviations estimator, 169
Least squares assumptions, 126–132, 135, 148, 160, 163, 167, 209–210, 221, 236
  for cross-sectional data, 364–365
  definition of, 126
  failure of, 314, 328
  in multiple regression, 202–205, 709
  omitted variable bias and, 188–190
  violation of, 316, 327
Least squares estimator, 118–119. See also Generalized least squares, Nonlinear least squares, Ordinary least squares, Two stage least squares, Weighted least squares
  defined, 70
  use of, 130–131
Left-hand variable. See Dependent variable
Leptokurtic, 28
Likelihood function, 400
  definition of, 398
Limited dependent variables, 385, 407. See also Binary dependent variables
Linear conditionally unbiased estimators, 167–168, 719–720
Linear function of random variables, means and variances of, 25–26
Linear-log regression, 274
  function, 270, 270f, 332f
  model, 269–271, 274
  specification, 269, 270, 331
Linear probability model, 386f, 408
  and binary dependent variables, 384–389
  comparing to logit and probit, 396
  definition of, 387
  modeling mortgage denial probability, 402
  shortcomings of, 389
Linear regression, 111–135, 169, 257, 274, 294f
  and forecasting, 577
  model, 111–123, 126, 135, 188, 197, 205, 207, 264, 285, 707
  with multiple regressors, 186–210
  OLS function, 257f, 259f, 260, 277f
  with one regressor, 111–135, 148, 677–696
  population function, 254
Linear time trend, 561–562
Local average treatment effects
  vs. average treatment effect, 506
  definition of, 505
Logarithm function, 268, 268f
Logarithmic regression, 269–276
Logarithmic specification, 275–276, 331–332
Logarithmic transformations, 286
Logarithms, 267–275, 290–291, 292t
  definition of, 267
  differences of, approximation, 531
  natural. See Natural logarithm
  and percentages, 267–268, 271, 274–275, 284
  in regression, 273
Logistic cumulative distribution function, 394–395, 408
Logistic regression. See Logit regression
Logit regression, 384, 394–400
  coefficients, estimating, 397
  comparing to linear probability and probit models, 396
  definition of, 389
  estimation in, 396–400
  inference in, 396–400
  model, 394–395, 395f, 408
  modeling mortgage denial probability, 402
  population, 394
Log-linear regression, 273
  definition of, 271
  function, 272f
  model, 274
  specification, 271, 273
Log-log regression, 273
  definition of, 271
  function, 272f, 273
  model, 271–272, 274
  specification, 272–273, 289
Longitudinal data. See Panel data
Long-run average value. See Expected value

M
Macroeconomic forecasting, 9
Madrian, Brigitte, 90
Manning, Willard G., 446
Marginal probability distribution, 33–34, 43, 132
Mariel boatlift, 496
Market microstructure, theory of, 574
Massachusetts school districts, class size and test scores. See Student–teacher ratio (STR) and test scores
Matrix (matrices)
  algebra, 206, 615
  covariance. See Covariance matrix
  idempotent. See Idempotent matrix
  identity. See Identity matrix
  joint hypotheses in, 713
  notation, 705
  representations of OLS regression statistics, 715–716
  square root, 724
  symmetric. See Symmetric idempotent matrix
Maximum likelihood estimator (MLE), 384, 393–394
  for ARCH and GARCH coefficients, 667
  definition of, 398
  for probit and logit coefficients, 397
  statistical inference, 399, 408
  used to test joint hypotheses, 394
McFadden, Daniel L., 407
Mean(s)
  common. See Common mean
  conditional. See Conditional mean
  of distribution, 23
  of distribution of earnings, 165
  population. See Population mean
  of sample average, 45
  of sampling distribution, 119
  testing for equality of, 170
Mean squared forecast error (MSFE), 548
  square root of, 548
Mean vector, 708, 710
Measurement error. See Errors-in-variables bias
Measures of fit, 399–400
Michigan, 497
Minimum wage, and demand for low-skilled workers, 498
Minnesota, 365–366
Minorities, and mortgage application denial. See Mortgage lending and race
\Ill'

~.

\fa,imum likc:hhoo..l
<:-t mator

M<>dd 'f'<cillc:ottun "'6-2J\I


Momcnll-1. P'l I.H.l~!l
C<'lnJititlO!J.. 7H

nl tlJ~cnhuuon . 27
md cxtc:nd.:d lcllst \qunre' assumptions
(,Sit. 707
go:n~rdlw:d method of ,.... uruauon,.t5(t
rtll.ll\
Momentum ltrc:o: ~>l~ 54 1
Monthh cxce" ro:tum, 541
1\lortg.agelo:nJmg .tnd race, J 5.38.'-3R5.
Jli8-.~Y '92 W3 1()(1,41)2.

40J-J116, -l(lh

omntcd vuriuhlc hta' 1!W.405-106


otber lac tor\. ~II'J. JU5
'umm;m r.: .. ult -1tllt 41Gr
tc:,tmg null hyputho'\1 405
\lo-.tdic:r. Frc:Jc:nck 425
"M,,7..Jn F/le.;t ~ I'ICI
~1St- C. S., Mc:.. n -.quar J tore.:<lst c:rroT
\luluo:ulhn ... ant) .:!Cit>-, 10. !.~~also Po:rtect

79 1

0~1'\UIIOn\

N
' turul o:xpo:nmcnK .\rt' Qu~o
expc:nmc:nh
1'-.tluralllll!.aritlun. '.:!b7. ~711
JctJruti~n of. 21\8
N olur<'. IQ(l

Nc:wt:\. Whilnev, tiOI>


"'t:wc:;.west HAC standard ciTill"o,
647

Nc:\\c}\\'est standanl c:rwr' 61'',1>20r


l't:\\ey-We't VaJI3JJC<' t:'llm,llor. N'lS
~t:" Jersey. 4'1l:i
~c"' York Stock E~chnng.c: ('IIYSF) St
l i.S. stock markt:t
NLL"i. See 1'-'onlincar Ieos! squurt:' (Nl.LS)
6llmator
\l()hel Pnze. 407. 6~
1'-.t>n-e,penrnerual data 10
1'\onlinc:M function 21.>0
ddimllon oL ~~
vf ~ingle mdependent , .Jrwt>le.

'-17
5 'P
01 S \ OrJman lt:.!.'l ''~u~r.:
Omitted vanohl.: btas.l!>b-l<J3.l'l5.
lllX IW. 2 W. :!JC>-21'.1. 24+-245,
:s.J.l<HI-:!91. '1-1. 3111-118. n 9.
~ ,n r.amplc

..Ut tl(

~ompk

J5~

:Jdd c~\tng .J'\3


l'' dtvtding Jat" mto group"
tJI-t<n
by mdudmg t>miueJ vanablt: in

mu ltiple rc:gre"lnn. '12l


'''"'ling. 15X. ~~ 171
C.lli\C~ ~t. ,lJ7-3JCJ. 349, 3~6.167. '71,
JX'I.J91 1\22
do ltnllron o(, IR7. ;19
cltnunaung the: UTc:e of '\~ lli:.'-363,

47U

torrnuh 1\'lr. l'O.'.I-191


.md mtcrnal \ ahdii).:Ht>-l~{l
and I\ rc:cre--~iun.J~
2~267
anJ muluple regr"'"',''" 172.~113.
" ''"hnear leas-t squares (NLl ~) e-tim.ttnr,
muh1cnlhncant~ lm~rlect
2Jh 2..'7
15(1, 397
mulucolhncarlly
and m'nltncar rc:grc:'-'lnn :2.511
compu tmg. 614-hlS
~nluuono,tu. '.:!20. ~- iP-.\IX. '91. 421
Mulurcnvc.l forc:.;a,unr M2 Qllj
deflmlltln of.397 1\14
rc:commc:ndc.'d pro..:.:duro.; tor 6474
<rdinary lr-\t ~u ..rc; csttmJte>r. thllred Onc ~dcd :lltc:rnilttvo: h~l'<lth<''' MJ-..'(1
kO nc u a trmr - tl:!>ting procc:dun:. 22t>-227
Mult1plc: rc:grt:"l"" 209-210. - <H-T->6
prop.?Ttie<.. J</7
aJv;~ntag~ nf 1'19
Oran c jutcc pnce-.. Sri' Cold \\c:athcr and
l'oronlinc... r regresston tunwon., ~5-1-296
analy'II,IJ.-111 1-J IW !02.205.2~0.
orang.: JUice: pn.:.,...
Set als<> Logll rcgrcs:.Jon. Pml'lit
239 ::!)i-1 Jn~
Ord~ r.. ol 1nh:~nuion. MS-ti55
regressiOn
thrc:dl\ 111 internal valtdlty o(.
t.:'r11llllolnJ;y. M'l
interpretal!on of CiLS est tmar or.
310-~27
Urdu1ury lcn't ~4UJuc:' (OL~) c~tunators.
614-615
(II'I- 123.7M-7CW
U\'c"tn,\t "uJi.:~ hascd lln, ':\12-339
model 258,261-~63. 285
of ADL mudc:l.612-fll3 617
und .;,mfido:ncc mt.:nals.. 2:?()-245
>~(.1ficattons. 21l l
8\)'IIIJ'I<l!IO: di,trihutum uf nSI\-6.'ilS.
"tth cro"-.cctiOilal data 360
Nonrandom regressor. 129
Jc:finotleln ur. 11(1)
704 705.710-713
Nonrandom samplm~. 71)
With .IUtoetrrdat<'<l error-.. 6(~606
(Mu"M.uko theorem to1. 705.
"lonr~prcsen tauv.: prol\[nm or pvhn
7211-7.:! 1
ot .tH:rar.c: l'llU'-'11 dft.-ct. <;fl)
.175- Hn
nnd h)puthct' lc\t\. '?(~ ?.t'i
ht.t~~d nnJ mcnn'l'lt:nt. 319-323.
Nnnreprcscntativc 'ample. 47!\
J28-J29, ol'i.!, .!73, ~()ll-:'illl
tmt:rucuons 111 2l<J
Nonst andard dlslrtbuuon ol A l>F 'l<llt\trc.
lc:ll~t ,quare~ ossurnpt11Jns 1n. 20::'>-205
ot h1nnrv VHrf,,hJe 'pt:c"llicauon, 35<J
563
C'OCIIICICIII,t:\tlmated, 15:!.196
hnc.2110
Non~tationarity
m.:ru.ure~ o f Itt in. 'Ill 2112
comp.o.:t t.mmul.h for. 70J
breaks. See Breaks
mnud
con'l't~n,, u(. 602. 6.'\b-~~. ~n 727
..oluuons to. 541\
uchntllon uf. 11Q 197
o:~h:ndcd least oqtuarc "''umptivn>
toh for. .5-16
ft>r . 71i-l, ll7
dcrl\1ltll>n vf, -0"
trcn<.b. Su Trend~
m matn~ rorm. 71
nl t.l"tnhutec.llag re!!rCSSIOn ,.. (I
~rm..l o~ppro:timauon. I ~J-1 \.1
muuo:ltng nonhnc:rmttn ~64
dt\t 1t>ut~om uf, in muluplc r~~r,-~,lon .
'mnal dJstnbuuon 3<1-!.l
211:'-:~llh
OL"i 'ire <JrJtnal') le11'1 'qu;m:-;
a.\~'Tllptotic See As:oompwtil: tlhtnhution
eflidcn~'\ llf. 719-7:!1
l'Sttmator
of .:rror;. exact samphng d1,1nbutton.
c: 'l~t 'tmphnt~. dtStrihuuon ot. 7(15
tnlllh:d \'llnablc h1a' mJ , :!.'\0-.:!.'7
b!S._'\.-.691
.md c~h:nd.:d least ~uarc>.. n/S-fl.'lft
Mu1uphc""
multtvariatc. .S .,. \1uhivanatc normal
\ Col'\ hiS
o.~mul. ll"c d\r;;tmtc 'i,.... \umulauh.
<listrit>ution
dyll n"~ multtplu:l"o
of I tlro cnCtlU\ cau"-11 ctfc:'t'
>3mpting. 150. 167
.)II} 5().1
Jmam"' ~. 0\llamc multip1icrsf(tat>ll 1\ull h\poth""i' -:!-7J
.md hntnO'okcJ.t,tidl\, 11\.l IWI.
\lultllun .. te .:cntral hmllthcorcm.
'\YSE 'tock pn~ tntk\ ~" l ); 'h>d.
710 711
n9:>--h'lfl
market
nnd mcon,sstcm standard errnr-. 326
Multlvaritlle Ciau'' 1\IJIIo.u thcM~m. 732
,1, lim .tr o:omlltiunnlly unhiascJ
Mulli\uriiltt nurm.ol dlstrthution,4(1-4 l.
0
.n. 2tt5. 7o~ 11u
Ot>sa"allonal tlala, I 0, 127
Murph\,l'tl<lJ~. 'i<IJ
Ol"<:rvatiun numh.:r. ll

792

I NDEX

Ordinary least squares (OLS) estimators (cont'd)
  in multiple regression, 196–199, 220–223, 316–326
  in nonlinear regression, 258
  nonlinear least squares estimator, shared properties, 397
  of population mean, 122
  and predicted values, 119
  reasons to use, 121–123
  restricted regression, 231
  sampling distribution of, 131–134, 204
  standard errors, 156, 326–327. See also Heteroskedasticity-robust standard errors, Homoskedasticity-only standard errors
  test statistics, 129
  theoretical foundations of, 167–169
  theoretical properties of, 704
  under VAR assumptions, 638
  variance of, 211
Ordinary least squares (OLS) predicted values, 119, 121, 197–198, 716
  vs. forecast, 537
  sample average, 122
Ordinary least squares (OLS) regression, 122, 130–131, 152, 181, 265, 271, 283, 291
  biased, 323
  with fixed effects, 353
  function, linear, 257f, 259f
  function, quadratic, 259f
  line, 119–121, 123, 130, 165, 197–199, 224, 257
  statistics, 203, 715–716
Ordinary least squares (OLS) residuals, 119, 125, 197–198, 201, 234, 716
  vs. forecast error, 537
Outcomes, 18
Outlier, 27, 50, 129–131, 135, 169, 203–204
  assumption for fixed effects regression, 365
Overidentified coefficients, 432–433
Overidentifying restrictions, 443–445, 448, 456. See also J-statistic
  definition of, 444
  degree of overidentification, 444

P
Panel data, 13–14, 350–371, 430, 452, 596
  "before and after" comparisons. See "Before and after" comparisons
  definition of, 13, 350, 371
  estimating causal effects, 430
  regression, 349–372
    analysis, 469
    assumptions, 370
    estimating, 367
  structure of, 350, 352
  on subjects of quasi-experiments, 517
  with two time periods, 353–356
  unbalanced. See Unbalanced panel
Parameters. See Coefficients
Partial compliance
  and estimation, 494–495
  with treatment protocol, 473
Partial effect, 190
Past, present, and future exogeneity, 599–600. See also Strict exogeneity
  definition of, 599
Past and present exogeneity, 599–600, 608, 618. See also Exogeneity
  definition of, 599
  and time series data, 616
Payment-to-income ratio, 385, 387–393, 395f, 400, 405
p.d.f. See Probability density function
Pennsylvania, 498
Perfect linear function, 204, 207
Perfect multicollinearity, 203–210, 661. See also Imperfect multicollinearity
  assumption for fixed effects regression, 365
  assumption of no, 709
  definition of, 203–204, 708
  full column rank and, 723
  preventing, 362
Phillips curve, 528, 572, 574–575, 627, 641
  break test. See Break test
  short-run, 540
  stability of, 569–570
Placebo, 90, 468, 474
Polynomial(s)
  function, 275
  model, 265, 267
  regression, 265–267, 275–277
  specifications, 275
  for specifying nonlinear regression functions, 265–267
Pooled standard error formula, 91–92, 170
Pooled t-statistic, 91, 170
Pooled variance
  estimator, 91
  formula, 164
Population, 46
  correlation, 557
  covariance, 95
  covariance and correlation, 94. See also Sample covariance, Sample correlation coefficient
  differences in population studied and population of interest, 314–315
  distribution, 74–75, 83, 89, 91, 93, 111–112
  heterogeneous, experimental and quasi-experimental estimates in, 502–507
  of interest, 45, 312
Price elasticity of demand, 3
Probability, 17–54
  definition of, 21
  density function, 23
  distribution
    asymptotic. See Asymptotic distribution
    Bernoulli, 21
    of continuous random variable, 21–23
    definition of, 19
    normal, 39
    and random variables, 18–23
    standard normal, 39
  limit, 321
  of outcome, 18
  review of, 17–54
Probit regression, 384, 389–394
  coefficients
    estimation of, 393–394, 397
    interpretation of, 390
  comparing to linear probability and logit models, 396
  definition of, 389
  estimating with regression software, 393
  estimation in, 396–400
  inference in, 396–400
  model, 393, 395f, 408
Program evaluation, 469
  definition of, 469
Project STAR. See Tennessee class size experiment
Pseudo out-of-sample forecasting, 570–571
  definition of, 571
  vs. true out-of-sample forecasting, 571
  uses of, 571–572
Pseudo R², 400
pth order autoregressive model. See AR(p) model
p-value, 72–81, 149–151, 155, 224, 259, 267, 276, 283, 289t, 291, 292t, 293, 331, 334t
  calculating, 74–76, 77, 153f
  computing, using
    approximate normal distribution, 191
    standard normal distribution, 171
    Student t distribution, 92
    t-statistic, 212
  definition of, 73
  testing exclusion of groups of variables, 404t

Q
Quadratic form, 713
Quadratic function, 255–259
  definition of, 255
Quadratic population regression model, 266
Quadratic regression, 265, 267
  function, 258–261, 270
  model, 258–260, 262, 265
  specification, 259, 270, 275
Quandt likelihood ratio (QLR) statistic, 567–570, 568t, 572, 575–576
Quasi-difference, 611–612
Quasi-experiments, 455–456, 494–499
  advantage of, 508
  analyzing, 497–499
  attrition in, 501
  definition of, 8, 494
  effects in heterogeneous populations, 502–507

R
R², 123–126, 132, 200–201, 230. See also Adjusted R²
  definition of, 123, 200
R² adjusted for degrees of freedom. See Adjusted R²
Racial discrimination, in mortgage lending. See Mortgage lending and race
Randomization, testing for, 485
Randomized controlled experiments, 127, 190, 312, 472, 494–495, 507–508
  definition of, 10
  examples of, 8, 190
  panel data, 350
Randomly assigned treatment, 90, 470
Randomly sampled data, 205
Randomness, 12
Random sampling, 45–48, 65, 73, 132, 135, 152, 234, 322
  importance of, 70–71
  simple. See Simple random sampling
  variation, 79, 149, 494
Random variable(s), 18–54, 84, 94, 131–132, 170, 186, 227
  conditional distributions, 30–34
  conditional mean of, 123
  discrete vs. continuous, 19
  expected values, 23–25
  joint and marginal distributions, 29–34
  linear functions of, mean and variance of, 25–26
  and probability distributions, 18–23
  standard normal, 153f
  uncorrelated, 36
Random walk, 556–558, 563, 646
  definition of, 556
  with drift, 556
Rate of inflation, 538–542, 555, 645, 655
  autoregressive model of, 552t
  changes in, 540–543
  forecasting, 525, 528, 529f, 550
  predicted, 537–539
  VAR model of, 641–642
R-bar squared. See Adjusted R²
Reduced form equation, 449
Regressand, 135
  definition of, 115
Regression
  analysis
    econometric theory of, 677–696, 704–736
    of economic time series data, 525–589, 591–635, 637–674
    and experiments, 468–508
    problems with, 540, 592, 607
  with binary dependent variable, 383–415
  with binary variables, 157–160, 275
  breaks in. See Breaks
  coefficients, 118, 125–126, 151, 169, 217, 233–234, 241–243, 269
    estimated, 235
    interpreted as modeling probability, 386
    interpreted as predicted probability, 388
    nonlinear, 264–268
  discrete changes in. See Breaks
  equation, reporting format, 152
  error, 124–126. See also Error term
  estimators, 169, 477–485
  fixed effects. See Fixed effects regression
  line, 116, 118, 124, 134, 166, 200, 282–283
    estimated, 152, 234
    ordinary least squares (OLS). See Ordinary least squares regression
    population. See Population regression line
  linear. See Linear regression
  logit. See Logit regression
  model, logarithmic. See Logarithmic regression model
  model, with heteroskedasticity, 694
  models, for forecasting, 527
  multiple. See Multiple regression
  nonlinear. See Nonlinear regression functions
  OLS. See Ordinary least squares regression
  with panel data, 349–372
  polynomial. See Polynomial regression
  probit. See Probit regression
  quadratic. See Quadratic regression
  restricted. See Restricted regression
  with single regressor, hypothesis tests and confidence intervals, 148–173
  software
    computing OLS fixed effects, 359
    for estimating probit models, 393
  standard error of, 124–125
  statistics, distribution of, 715–719
  sum of squares. See Explained sum of squares
  unrestricted. See Unrestricted regression
Regressor(s), 115
  binary, 162
  confidence intervals for, 153–157
  definition of, 115
  interactions between binary variables and, 280–284
  linear regression with multiple. See Linear regression with multiple regressors
  linear regression with one. See Linear regression with one regressor
  multiple, 230
  nonrandom. See Nonrandom regressor
  single. See Single regressor model
Rejection region, 70, 81. See also Statistical hypothesis test

Residual(s), 124, 165
  ordinary least squares (OLS). See Ordinary least squares regression
  sum of squares. See Sum of squared residuals
Restricted regression, 230-231
Restrictions, 228-234, 262-263
  definition of
Retirement savings, stimulating, 90
Reverse causal effect, 325-326
Reverse causality, 323
Right-hand variable. See Regressor(s)
RMSFE. See Root mean squared forecast error
Roll, Richard, 625
Roosevelt, Franklin D., 71
Root mean squared forecast error (RMSFE), 575
  constructing forecast intervals, 548
  definition of, 537
  estimating, 571
  sources of error, 537
R-squared. See R²
Sample autocorrelation, 557
Sample average, 124, 159-160, 179, 293
  consistency of, 76
  and null hypothesis, 73
  probability of obtaining, 78
  sampling distribution of, 46-48
  standardized, 55f
Sample correlation coefficient, 92, 95
  definition of, 94
Sample covariance, 92
  and correlation, 94-96
  definition of, 94
Sample means, 65
Sample selection bias, 322-324, 337, 501
  tools for handling
Sample space, 19
Sample standard deviation, 95, 125
  definition of, 75
Sample variance, 95
  consistency of. See Consistency
  definition of, 75
Sampling, random. See Random sampling
Sampling distribution, 45-52
  desirable characteristics of, 67
  large-sample approximations to, 48-52
  mean of, 64
  in multiple regression
  of OLS estimators, 148, 161, 187, 208
  of sample average, 46-48
  of TSLS estimator, 428-430
Sampling error, 262
Sampling method, 322
Scatterplots, 92-93
  of change in inflation vs. unemployment rate
  of earnings vs. age, 93f
  of equilibrium price and quantity, 427f
  of mortgage denial and payment-to-income ratio, 386f
  of test scores: vs. district income; vs. student-teacher ratio, 104f; vs. three student characteristics
  of traffic deaths and alcohol taxes, 352f
Schwarz information criterion (SIC). See Bayes information criterion (BIC)
Second difference
Second-stage regression, 435
Sequential hypothesis testing, 266
SER. See Standard error of regression
Serial correlation, 327. See also Autocorrelation
  definition of, 366
  of error term: in AR(1) model; in distributed lag model, 592
Shea, Dennis, 90
SIC (Schwarz information criterion). See Bayes information criterion (BIC)
Significance level, 79-81, 152, 157. See also Statistical hypothesis test
Significance probability. See p-value
Simple random sampling, 46, 65, 71, 103, 203
Sims, Christopher, 641
Simultaneous causality, 326, 330f, 443
  bias, 451-452
  definition of, 323
Simultaneous equations bias
Single regressor model, 194-195, 197, 199, 202, 213
Size of test, 79. See also Statistical hypothesis test
Skewness, 23-25
Slope, 235f, 295f
Slutsky's theorem, 714
  and continuous mapping theorem
  definition of, 685
Smoking, cigarette taxes. See Cigarette taxes
Spurious regression, 559
  definition of, 558-559
Square root matrix
SSR. See Sum of squared residuals
Standard deviation, 125, 165
  definition of, 24
  of error term. See Standard error of regression
  of regression error. See Standard error of regression
  of regression residuals, 125
  of sample average
  of sampling distribution, 149-150
  and variance, 24-26
Standard error(s)
  and autocorrelation and inference, 601-603
  clustered. See Clustered standard errors
  compact matrix expressions for, 718
  definition of, 75-76
  in direct multiperiod regressions, 646-647
  of estimated effects, 262-263
  heteroskedasticity- and autocorrelation-consistent. See Heteroskedasticity- and autocorrelation-consistent (HAC) standard errors
  heteroskedasticity-robust. See Heteroskedasticity-robust standard errors
  using homoskedasticity-only formula, 339
  inconsistent, reasons for, 326
  of OLS estimators, 148, 152, 160
  for panel data regression, 350
  pooled. See Pooled standard error
  of predicted effect, 712
  of regression (SER), 125-126, 152, 240, 243, 716
    definition of, 124, 200
    formula for, 200
Standardized sample average, 57
Standard normal cumulative distribution function, 39, 75
Standard normal distribution, 38-41, 150, 151-153
Standard normal random variable, 151
Stanford Achievement Test, 11
Stationarity, 560-562. See also Nonstationarity
  definition of, 546-547
  testing hypotheses of, 563
Statistics, test, 60, 65
Stochastic trend, 555-562, 641
  and autoregressive models
  avoiding, 560
  and cointegration, 657-658
  problems caused by, 557-560
  random walk model. See Random walk model
Stock market. See U.S. stock market
STR. See Student-teacher ratio (STR) and test scores
Structural change. See Breaks
Structural instability. See Quandt likelihood ratio (QLR) statistic
Structural VAR modeling, 643
Student t distribution, 54, 91-92, 170-171, 718
  and time series data, 726
Student-teacher ratio (STR) and test scores, 110, 117, 120, 123-126, 160, 238, 243-244, 263-264, 291, 336-339
  analysis of, 239-244
  California data set, 101, 121, 125, 139, 164-167, 171-172, 239-244, 252f, 255f, 257, 292f
    data description
  California and Massachusetts comparison
  Massachusetts data, 332f, 334f
  Tennessee data, 87, 488f, 491
Sum of squared prediction mistakes, 117-118
Sum of squared residuals (SSR)
Summary statistics, 200
Supply elasticity, 421-425
Survivorship bias, 324
Symmetric idempotent matrix, 715

T
t distribution. See Student t distribution
Tennessee class size experiment. See Student-teacher ratio (STR) and test scores
Testing hypotheses. See Hypothesis testing
Test scores, class size and. See Student-teacher ratio (STR) and test scores
  forecasting
  transforming by standardizing
Test statistic, 77
Theorem, central limit. See Central limit theorem
  multivariate. See Multivariate central limit theorem
Theorem, continuous mapping. See Continuous mapping theorem
Theorem, Gauss-Markov. See Gauss-Markov theorem
Theorem, Slutsky's. See Slutsky's theorem
Time effects, fixed. See Time fixed effects
Time fixed effects, 361, 364
  control for unobservables
  definition of
  regression with, 361
Time series, economic, 534-535
Time series analysis, empirical applications for
Time series data, 11-12, 129, 131, 528-529
  autocorrelation. See Autocorrelation
  and causal effects, 591f
  definition of, 525
  forecasting future events, 527
  generalized method of moments (GMM) estimator, 726
  HAC standard errors in, 603-605
  introduction to, 525
  problems, 545
Total sum of squares (TSS), 123-124
  definition of, 123
Traffic deaths and alcohol taxes, 349
  data description, 350-353
Traffic fatality rates, 351
t-ratio. See t-statistic
Treatment, random receipt of, 489
Treatment effects. See Causal effects
  local average. See Local average treatment effect
Treatment group, 85, 87, 127, 190, 478-479, 491-492, 500
Treatment level, 497
  actually assigned
  and error term, 501
Treatment protocol, failure to follow, 492, 500
Treatment variable
Trend(s), 554
  definition of
  deterministic. See Deterministic trends
  linear. See Linear time trend
  testing for unit root in. See DF-GLS test
TSS. See Total sum of squares
t-statistic, 151-156
  absolute value of, 78
  asymptotic distribution of, 713
  based on sample mean
  for comparing two means, 87
  definition of, 150
  difference-of-means, 91
  distribution of, 718
  large-sample distribution of, 77-78
  non-normal distribution of
  for null hypothesis, 83
  pooled. See Pooled t-statistic
  in regression, 169-171
  when regression includes a time trend, 569
  and sample size, 88
  and Student t distribution, 89-91
  two-sided alternative hypothesis, 81
t-test. See t-statistic
Two stage least squares (TSLS) estimator, 425, 436-441, 445f, 455-456
  distribution of, 727, 730
    asymptotic, 729

Two stage least squares estimator (TSLS) (cont'd)
  asymptotic efficiency of, under homoskedasticity, 731
  calculating standard errors, 437
  centered
  definition of, 423
  distribution under homoskedasticity
  as efficient GMM estimator
  first stage of regression, 431, 504
  formula for, 429
  inference, 437
  IV regression assumptions, 435-437
  in matrix form, 727
  population first-stage regression of, 433
  properties when errors are homoskedastic, 731
  sampling distribution of, 429-430, 435-437
  stages of, 433
  standard errors for, 730
Type I and II errors, 79. See also Statistical hypothesis test

U
Unbalanced panel, 351
Unbiasedness, 135, 209
  definition of, 67
  of estimator, 67
  of OLS estimator, 123, 161, 167
Underidentified coefficients
Unit autoregressive root (unit root), 558. See also Stochastic trend
  definition of, 557
  tests for, 560-564
University of California
University of Chicago, 407
Unrestricted regression, 231
U.S. Census, 65, 442
U.S. Current Population Survey (CPS), 71
U.S. dollar/British pound exchange rate, 533-534, 534f
U.S. Federal Funds Rate, 555
U.S. income and Australian exports, 625-626
U.S. military, 496
U.S. public education system
U.S. stock market, 541
  mutual funds, 324
  NYSE stock price index, 535
U.S. unemployment rate, 531
  and exogeneity
  inflation and, 531
  VAR model of, 641

V
Valid instrument, 439, 443, 453-455
  checking, 439-445
  conditions for, 423, 440
  sources of, 453-455
Validity
  external. See External validity
  internal. See Internal validity
VAR. See Vector autoregressions
Variable(s), 422-423, 452, 454
  binary. See Binary variable
  causal relationships among
  cointegrated. See Cointegrated variables
  control. See Control variable
  dependent. See Dependent variable
  dummy. See Binary variable
  endogenous. See Endogenous variable
  entity binary. See Entity binary variable
  entity-demeaned, 380
  exogenous. See Exogenous variable
  independent. See Regressor(s)
  instrumental. See Instrumental variable
  interactions between two, 285-286
  omitted. See Omitted variable bias
  random. See Random variable(s)
  standardized
  treatment. See Treatment variable
  unobserved, 354, 358, 371
  valid instrument. See Valid instrument
Variance
  of Bernoulli random variable, 25
  conditional. See Conditional variance
  of discrete random variable, 25
  and efficiency
  Newey-West. See Newey-West variance estimator
  pooled. See Pooled variance
  of population distribution
  of sample average, 47
  sample, consistency of
  and standard deviation, 24-28
VECM. See Vector error correction model
Vector(s), 708
  autoregressions (VAR), 637, 664
    definition of, 637-638
    model, 638-642
  error correction model (VECM), 662-664
Vietnam War. See U.S. military
Volatility clustering, 637
  definition of, 646
  illustrated, 534f

W
Wald, Abraham
Wald statistic, 719-720
Wallace, David, 125
Weak dependence, 546
Weak instruments, 439, 455
  checking for, 441
  indicator of, 439-440
  solutions for, 441-443
Weather. See Cold weather and orange juice prices
Weighted average, 170
Weighted least squares (WLS), 691-696
  advantages and disadvantages of, 698
  definition of, 691
  estimated. See Feasible weighted least squares
  feasible. See Feasible weighted least squares
  handling heteroskedasticity, 691
  heteroskedasticity of unknown functional form, 692-694
  with known heteroskedasticity
West, Kenneth, 605
White standard errors. See Heteroskedasticity-robust standard errors
WLS. See Weighted least squares
Wright, Philip G., 423-426
Wright, Sewall, 425