Introduction to Econometrics
James H. Stock
HARVARD UNIVERSITY
Mark W. Watson
PRINCETON UNIVERSITY
Brief Contents

PART ONE
CHAPTER 1  Economic Questions and Data
CHAPTER 2  Review of Probability
CHAPTER 3  Review of Statistics

PART TWO
CHAPTER 4  Linear Regression with One Regressor
CHAPTER 5  Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
CHAPTER 6  Linear Regression with Multiple Regressors
CHAPTER 7  Hypothesis Tests and Confidence Intervals in Multiple Regression
CHAPTER 8  Nonlinear Regression Functions
CHAPTER 9  Assessing Studies Based on Multiple Regression

PART THREE
CHAPTER 10  Regression with Panel Data
CHAPTER 11  Regression with a Binary Dependent Variable
CHAPTER 12  Instrumental Variables Regression
CHAPTER 13  Experiments and Quasi-Experiments

PART FOUR
CHAPTER 14  Introduction to Time Series Regression and Forecasting
CHAPTER 15  Estimation of Dynamic Causal Effects
CHAPTER 16  Additional Topics in Time Series Regression

PART FIVE
CHAPTER 17  The Theory of Linear Regression with One Regressor
CHAPTER 18  The Theory of Multiple Regression
Contents

Preface

PART ONE
CHAPTER 1  Economic Questions and Data
CHAPTER 2  Review of Probability
    Independence
    Covariance and Correlation
    The Mean and Variance of Sums of Random Variables
CHAPTER 3  Review of Statistics

PART TWO
CHAPTER 4  Linear Regression with One Regressor
    Measures of Fit
        The R²
        The Standard Error of the Regression
        Application to the Test Score Data
CHAPTER 5  Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
CHAPTER 6  Linear Regression with Multiple Regressors
    Multicollinearity
    APPENDIX 6.1  Derivation of Equation (6.1)
CHAPTER 7  Hypothesis Tests and Confidence Intervals in Multiple Regression
CHAPTER 8  Nonlinear Regression Functions
    Polynomials
    Logarithms
    Polynomial and Logarithmic Models of Test Scores and District Income
    APPENDIX 8.1  Regression Functions That Are Nonlinear in the Parameters
CHAPTER 9  Assessing Studies Based on Multiple Regression

PART THREE
CHAPTER 10  Regression with Panel Data
    APPENDIX 10.1  The State Traffic Fatality Data Set
CHAPTER 11  Regression with a Binary Dependent Variable
    APPENDIX 11.1  The Boston HMDA Data Set
CHAPTER 12  Instrumental Variables Regression
    APPENDIX 12.1  The Cigarette Consumption Panel Data Set
CHAPTER 13  Experiments and Quasi-Experiments
    Quasi-Experiments
        Examples
        Econometric Methods for Analyzing Quasi-Experiments

PART FOUR
CHAPTER 14  Introduction to Time Series Regression and Forecasting
    Autoregressions
    Nonstationarity I: Trends
        What Is a Trend?
    APPENDIX 14.4  ARMA Models
CHAPTER 15  Estimation of Dynamic Causal Effects
    APPENDIX 15.1  The Orange Juice Data Set
CHAPTER 16  Additional Topics in Time Series Regression
    Vector Autoregressions
    Multiperiod Forecasts
    Cointegration
    APPENDIX 16.1  U.S. Financial Data Used in Chapter 16

PART FIVE
CHAPTER 17  The Theory of Linear Regression with One Regressor
    APPENDIX 17.2  Two Inequalities
CHAPTER 18  The Theory of Multiple Regression
    Homoskedasticity-Only Standard Errors
    Distribution of the t-Statistic
    Distribution of the F-Statistic
    APPENDIX 18.2  Multivariate Distributions

Appendix
References
Answers to "Review the Concepts" Questions
Glossary
Index
Key Concepts

PART ONE
    Efficiency of Ȳ: Ȳ Is BLUE

PART TWO
    6.3  The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model
    7.1  Testing the Hypothesis βj = βj,0 Against the Alternative βj ≠ βj,0
    7.2  Confidence Intervals for a Single Coefficient in Multiple Regression
    7.3  Omitted Variable Bias in Multiple Regression
    Errors-in-Variables Bias

PART THREE
    11.3  Logit Regression

PART FOUR
    14.3  Autoregressions
    14.5  Stationarity

PART FIVE
    17.1  The Extended Least Squares Assumptions for Regression with a Single Regressor
    18.1  The Extended Least Squares Assumptions in the Multiple Regression Model
    18.2  The Multivariate Central Limit Theorem
Preface
Econometrics can be a fun course for both teacher and student. The real world of economics, business, and government is a complicated and messy place, full of competing ideas and questions that demand answers. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can you make money in the stock market by buying when prices are historically low, relative to earnings, or should you just sit tight, as the random walk theory of stock prices suggests? Can we improve elementary education by reducing class size, or should we simply have our children listen to Mozart for ten minutes a day? Econometrics helps us sort out sound ideas from crazy ones and find quantitative answers to important quantitative questions. Econometrics opens a window on our complicated world that lets us see the relationships on which people, businesses, and governments base their decisions.

This textbook is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. This simple principle represents a significant departure from the older generation of econometrics books, in which theoretical models and assumptions do not match the applications. It is no wonder that some students question the relevance of econometrics after they spend much of their time learning assumptions that they subsequently realize are unrealistic, so that they must then learn "solutions" to "problems" that arise when the applications do not match the assumptions. We believe that it is far better to motivate the need for tools with a concrete application, and then to provide a few simple assumptions that match the application. Because the theory is immediately relevant to the applications, this approach can make econometrics come alive.

The second edition benefits from the many constructive suggestions of teachers who used the first edition, while maintaining the philosophy that applications should drive the theory, not the other way around. The single greatest change in the second edition is a reorganization and expansion of the material on core regression analysis: Part II, which covers regression with cross-sectional data, has been expanded from four chapters to six. We have added new empirical examples (as boxes) drawn from economics and finance; some new optional sections on classical regression theory; and many new exercises, both paper-and-pencil and computer-based empirical exercises using data sets newly placed on the textbook Web site. A more detailed description of changes to the second edition can be found on page xxxii.
Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrument, exogeneity and relevance, are given equal billing. We follow that presentation with an extended discussion of where instruments come from, and with tests of overidentifying restrictions and diagnostics for weak instruments, and we explain what to do if these diagnostics suggest problems.

Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables, simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data.

Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information criterion, that work well in practice. This chapter also features a practically oriented treatment of stochastic trends (unit roots), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo out-of-sample forecasting, all in the context of developing stable and reliable time series forecasting models.

Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal inferences, and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard errors.
Large-sample approach. Because data sets are large, from the outset we use large-sample normal approximations to sampling distributions for hypothesis testing and confidence intervals. Our experience is that it takes less time to teach the rudiments of large-sample approximations than to teach the Student t and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence intervals carries directly through multiple regression analysis, logit and probit, instrumental variables estimation, and time series methods.

Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data; it extends readily to panel and time series data; and because of our large-sample approach, it poses no additional conceptual or mathematical difficulties.
Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity is present or not. In this book, we move beyond treating heteroskedasticity as an exception or a "problem" to be "solved"; instead, we allow for heteroskedasticity from the outset and simply use heteroskedasticity-robust standard errors. We present homoskedasticity as a special case that provides a theoretical motivation for OLS.
More empirical examples. The second edition retains the empirical examples from the first edition, and adds a significant number of new ones. These additional examples include estimation of the returns to education; inference about the gender gap in earnings; the difficulty of forecasting the stock market; and modeling the volatility clustering in stock returns. The data sets for these empirical examples are posted on the course Web site. The second edition also includes more general-interest boxes, for example, how sample selection bias ("survivorship bias") can produce misleading conclusions about whether actively managed mutual funds actually beat the market.
Expanded theoretical material. The philosophy of this and the previous edition is that the modeling assumptions should be motivated by empirical applications. For this reason, our three basic least squares assumptions that underpin regression with a single regressor include neither normality nor homoskedasticity, both of which are arguably the exception in econometric applications. This leads directly to large-sample inference using heteroskedasticity-robust standard errors. Our experience is that students do not find this difficult; in fact, what they find difficult is the traditional approach of introducing the homoskedasticity and normality assumptions, learning how to use t- and F-tables, then being told that what they just learned is not reliable in applications because of the failure of these assumptions and that these "problems" must be "fixed." But not all instructors share this view, and some find it useful to introduce the homoskedastic normal regression model. Moreover, even if homoskedasticity is the exception instead of the rule, assuming homoskedasticity permits discussing the Gauss-Markov theorem, a key motivation for using ordinary least squares (OLS).

For these reasons, the treatment of the core regression material has been significantly expanded in the second edition, and now includes sections on the theoretical motivation for OLS (the Gauss-Markov theorem), small-sample inference in the homoskedastic normal model, and multicollinearity and the dummy variable trap. To accommodate these new sections, the new empirical examples, the new general-interest boxes, and the many new exercises, the core regression chapters have been expanded from two to four: the linear regression model with a single regressor and OLS (Chapter 4); inference in regression with a single regressor (Chapter 5); the multiple regression model and OLS (Chapter 6); and inference in the multiple regression model (Chapter 7). This expanded and reorganized treatment of the core regression material constitutes the single greatest change in the second edition.
The second edition also includes some additional topics requested by some instructors. One such addition is specification and estimation of models that are nonlinear in the parameters (Appendix 8.1). Another is how to compute standard errors in panel data regression when the error term is serially correlated for a given entity (clustered standard errors; Section 10.5 and Appendix 10.2). A third addition is an introduction to current best practices for detecting and handling weak instruments (Appendix 12.5), and a fourth addition is a treatment, in a new final section of the last chapter (Section 18.7), of efficient estimation in the heteroskedastic linear IV regression model using generalized method of moments.
Additional student exercises. The second edition contains many new exercises, both "paper and pencil" and empirical exercises that involve the use of databases, supplied on the course Web site, and regression software. The data section of the course Web site has been significantly enhanced by the addition of numerous databases.
Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that the parameters can be estimated by OLS). In Chapter 9, students step back and learn how to identify the strengths and limitations of regression studies, seeing in the process how to apply the concepts of internal and external validity.
Part III
Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental variables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural, experiments, topics often referred to as "program evaluation."
Part IV
Part IV takes up regression with time series data. Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions, such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.
Part V
Part V is an introduction to econometric theory. This part is more than an appendix that fills in mathematical details omitted from the text. Rather, it is a self-contained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, although it does demand a higher level of mathematical sophistication than the rest of the text.
TABLE I  Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V

Prerequisite parts or chapters: Part I (Chapters 1-3); Part II (Chapters 4-7 and 9); Part III (Sections 10.1-10.2 and 12.1-12.2); Part IV (Sections 14.1-14.4 and 14.5-14.8); Part V (Chapters 15 and 17). Special-topic chapters covered: 10, 11, 12.1-12.2, 12.3-12.6, 13, 14, 15, 16, 17, and 18.

This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1-14.4.

a. Chapters 10-16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.

b. Chapters 14-16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.
Chapter 18 presents and studies the multiple regression model, instrumental variables regression, and generalized method of moments estimation of the linear model, all in matrix form.
Sample Courses
This book accommodates several different course structures.
Pedagogical Features
The textbook has a variety of pedagogical features aimed at helping students to understand, to retain, and to apply the essential ideas. Chapter introductions provide a real-world grounding and motivation, as well as a brief road map highlighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General interest boxes provide interesting excursions into related topics and highlight real-world studies that use the methods or concepts being discussed in the text. A numbered Summary concluding each chapter serves as a helpful framework for reviewing the main points of coverage. The questions in the Review the Concepts section check students' understanding of the core content, Exercises give more intensive practice working with the concepts and techniques introduced in the chapter, and Empirical Exercises allow the student to apply what they have learned to answer real-world empirical questions. At the end of the textbook, the References section lists sources for further reading, the Appendix provides statistical tables, and a Glossary conveniently defines all the key terms in the book.
tables, and Key Concepts. The Solutions Manual includes solutions to all the end-of-chapter exercises, while the Test Bank, offered in Test Generator Software (TestGen with QuizMaster), provides a rich supply of easily edited test problems and questions of various types to meet specific course needs. These resources are available for download from the Instructor's Resource Center at www.aw-bc.com/irc. If instructors prefer their supplements on a CD-ROM, our Instructor's Resource Disk, available for Windows and Macintosh, contains the PowerPoint Lecture Notes, the Test Bank, and the Solutions Manual.
In addition, a Companion Web site, found at www.aw-bc.com/stock_watson, provides a wide range of additional resources for students and faculty. These include data sets for all the text examples, replication files for empirical results reported in the text, data sets for the end-of-chapter Empirical Exercises, EViews and STATA tutorials for students, and an Excel add-in for OLS regression.
Acknowledgments
A great many people contributed to the fir t eiliti n of this book. Our biggest debt
of gratit ude arc to our colleagues at Har vard and Prin ceton who u ed early drafts
of this book jn their classroom . A t H arvard 's Ken nedy School of Government.
S uzanne Cooper provided invaluable sugge Lio ns and detailed comm ents on multi p le draft . As a co-teacher with one of th a uthors ( tock), she also helped to vet
much of the m a teria l in this b ok while it was being develop d for a requi red
cour e for master's student at the Kenn dy Sch ool. We are also indebted to two
o ther Kennedy School colleagu s, A I ert Abadie and Su e D yn arski. for the ir
patient xptanation of quasi-experiments and the field of program evaluation an d
for th eir detailed comments on early draft of th e text. A t Prine ton . Eli Tamer
taught from an early draft and also provided he lpful om ments on the penultimate
draft of the b ok.
We also owe much t man y o f o ur friends a nd co lleagues in econometri s who
spe nt time talking with us about the substanc of Lhis book and wh collectively
m ade so ma ny h lpful suggestions. B ru cHan en (U niversi ty ofWiscon in . Madison) and Bo Honore (Princeton) provided helpful feedback o n very t.:arly outlines
and preliminary versions of the core material in Part II. Joshua A n grist (MJT) and
Guido l mbens (U niver ity of California, Berk ley) provided th o ughtful suggestions about our treatment of materials on program valuat ion. Our pre ntation
of the material on time erie h a be nefited f rom discussions with Yacine itSahalia (Princdon), Graham Elliott (University of California, San Diego). Andrew
Harvey (Cambridge University), and Chri topher ims (Prince! n ). Finally. many
people mad helpful sugge tions on parL of the manuscript clo c to their area of
expertise: Don Andrew (Yale), John Bound ( mversity of Mich1gan). 'r~gor}
PREFACE
xxxix
Chow (Princeton). Thomas Downes (Tufts). David D rukker (Stata, Corp.), Jean
Bald win G rossman (Princeton), EricH n u hek (the Hoove r Institution), James
He kman (Univ rsity of Chicago), Han Hong (Princeton , Caro li ne Hoxby (Harvard) , A lan Krueger (Princeton), Steven Levitt (University of Chicago), Richard
Light Harvard), D a id eumark (Michigan State U niversity), Jo eph Newhouse
(H r ard) , Pierre Perron (Bo ton University), Ke nneth Warn r ( U niv r ity of
Mi higan), and Richard Z ckhauser (Harvard).
Many people were very generous in providing us with data. The California test score data were constructed with the assistance of Les Axelrod of the Standards and Assessments Division, California Department of Education. We are grateful to Charlie DePascale, Student Assessment Services, Massachusetts Department of Education, for his help with aspects of the Massachusetts test score data set. Christopher Ruhm (University of North Carolina, Greensboro) graciously provided us with his data set on drunk driving laws and traffic fatalities. The research department at the Federal Reserve Bank of Boston deserves thanks for putting together their data on racial discrimination in mortgage lending; we particularly thank Geoffrey Tootell for providing us with the updated version of the data set we use in Chapter 9, and Lynn Browne for explaining its policy context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales, which we analyze in Chapter 10, and Alan Krueger (Princeton) for his help with the Tennessee STAR data that we analyze in Chapter 11.
We are also grateful for the many constructive, detailed, and thoughtful comments we received from those who reviewed various drafts for Addison-Wesley.

We thank several people for carefully checking the page proofs for errors. Kerry Griffin and Yair Listokin read the entire manuscript, and Andrew Fraker, Ori Heffetz, Amber Henry, Hong Li, Alessandro Tarozzi, and Matt Watson worked through several chapters.
Michael Abbott, Queen's University, Canada
Richard J. Agnello, University of Delaware
Clopper Almon, University of Maryland
Joshua Angrist, Massachusetts Institute of Technology
Swarnjit S. Arora, University of Wisconsin, Milwaukee
Christopher F. Baum, Boston College
Above all, we are indebted to our families for their endurance throughout this project. Writing this book took a long time; for them, the project must have seemed endless. They, more than anyone, bore the burden of this commitment, and for their help and support we are deeply grateful.
CHAPTER 1
Economic Questions and Data

Ask a half dozen econometricians what econometrics is, and you could get a half dozen different answers. One might tell you that econometrics is the science of testing economic theories. A second might tell you that econometrics is the set of tools used for forecasting future values of economic variables such as stock prices. Another might tell you that it is the science and art of using historical data to make quantitative policy recommendations.
1.1 Economic Questions We Examine
consistent with the idea that smaller classes produce better test scores, it might simply reflect many other advantages that students in districts with small classes have over their counterparts in districts with large classes. For example, districts with small class sizes tend to have wealthier residents than districts with large classes, so students in small-class districts could have more opportunities for learning outside the classroom. It could be these extra learning opportunities that lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other factors, such as the economic background of the students.
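The logic of holding other factors constant can be sketched numerically. The following simulation is not from the book; all numbers are invented for illustration. District income raises test scores and also lowers class size, so a regression of scores on class size alone attributes some of income's effect to class size, while a multiple regression that also includes income recovers the coefficient used to generate the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Invented districts: wealthier districts have smaller classes AND higher scores,
# so income is an omitted variable that confounds the class-size effect.
income = rng.normal(15, 7, n)                         # district income ($ thousands)
class_size = 25 - 0.5 * income + rng.normal(0, 2, n)  # students per teacher
score = 700 - 2.0 * class_size + 1.5 * income + rng.normal(0, 10, n)

def ols(y, *regressors):
    """OLS with an intercept; returns the coefficient vector (intercept first)."""
    X = np.column_stack([np.ones_like(y)] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_short = ols(score, class_size)          # income omitted: biased
beta_long = ols(score, class_size, income)   # income included: close to the truth

print(f"class-size coefficient, income omitted:  {beta_short[1]:.2f}")
print(f"class-size coefficient, income included: {beta_long[1]:.2f}  (true value: -2.0)")
```

Because income and class size are negatively correlated here, the short regression makes small classes look even better than they are; the multiple regression estimate is close to the -2.0 built into the simulated data.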
smoking and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand cigarette smoke, are borne by other members of society. Because these costs are borne by people other than the smoker, there is a role for government intervention in reducing cigarette consumption. One of the most flexible tools for cutting consumption is to increase taxes on cigarettes.

Basic economics says that if cigarette prices go up, consumption will go down. But by how much? If the sales price goes up by 1%, by what percentage will the quantity of cigarettes sold decrease? The percentage change in the quantity demanded resulting from a 1% increase in price is the price elasticity of demand. If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then we need to know the price elasticity to calculate the price increase necessary to achieve this reduction in consumption. But what is the price elasticity of demand for cigarettes?
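The arithmetic behind that calculation is simple once an elasticity is in hand. The sketch below uses an assumed elasticity of -0.5 purely for illustration; estimating the actual elasticity from data is the econometric problem the text goes on to discuss.

```python
def required_price_increase(target_reduction_pct, elasticity):
    """Percentage price increase needed for a target percentage drop in
    quantity demanded, from the definition of the price elasticity of demand:
        % change in quantity = elasticity * % change in price.
    """
    if elasticity >= 0:
        raise ValueError("a demand elasticity should be negative")
    return -target_reduction_pct / elasticity

# Assumed, purely illustrative elasticity: a 1% price rise cuts consumption by 0.5%.
print(required_price_increase(20.0, -0.5))  # -> 40.0: a 40% price rise for a 20% cut
```

The more elastic demand is, the smaller the tax-driven price increase needed to hit a given reduction target.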
Although economic theory provides us with the concepts that help us answer this question, it does not tell us the numerical value of the price elasticity of demand. To learn the elasticity we must examine empirical evidence about the behavior of smokers and potential smokers; in other words, we need to analyze data on cigarette consumption and prices.

The data we examine are cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus low cigarette prices, have high smoking rates, and states with high prices have low smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers in the state then local politicians might try to keep cigarette taxes low to satisfy their smoking constituents. In Chapter 12 we study methods for handling this "simultaneous causality" and use those methods to estimate the price elasticity of cigarette demand.
1.2 Causal Effects and Idealized Experiments
fertilizer) and a treatment group that receives the treatment (100 g/m² of fertilizer). It is randomized in the sense that the treatment is assigned randomly. This random assignment eliminates the possibility of a systematic relationship between, for example, how sunny the plot is and whether it receives fertilizer, so that the only systematic difference between the treatment and control groups is the treatment. If this experiment is properly implemented on a large enough scale, then it will yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m² of fertilizer).

In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.
It is possible to imagine an ideal randomized controlled experiment to answer each of the first three questions in Section 1.1. For example, to study class size one can imagine randomly assigning "treatments" of different class sizes to different groups of students. If the experiment is designed and executed so that the only systematic difference between the groups of students is their class size, then in theory this experiment would estimate the effect on test scores of reducing class size, holding all else constant.

The concept of an ideal randomized controlled experiment is useful because it gives a definition of a causal effect. In practice, however, it is not possible to perform ideal experiments. In fact, experiments are rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment does, however, provide a theoretical benchmark for an econometric analysis of causal effects using actual data.
CHAPTER 1
1.3 Data: Sources and Types
The Tennessee class size experiment cost millions of dollars and required the ongoing cooperation of many administrators, parents, and teachers over several years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances experiments are not only expensive and difficult to administer but also unethical. (Would it be ethical to offer randomly selected teenagers inexpensive cigarettes to see how many they buy?) Because of these financial, practical, and ethical problems, experiments in economics are rare. Instead, most economic data are obtained by observing real-world behavior.

Data obtained by observing actual behavior outside an experimental setting are called observational data. Observational data are collected using surveys, such as a telephone survey of consumers, and administrative records, such as historical records on mortgage applications maintained by lending institutions.

Observational data pose major challenges to econometric attempts to estimate causal effects, and the tools of econometrics are designed to tackle these challenges. In the real world, levels of "treatment" (the amount of fertilizer in the tomato example, the student-teacher ratio in the class size example) are not assigned at random, so it is difficult to sort out the effect of the "treatment" from other relevant factors. Much of econometrics, and much of this book, is devoted to methods for
meeting the challenges encountered when real-world data are used to estimate causal effects.
Whether the data are experimental or observational, data sets come in three main types: cross-sectional data, time series data, and panel data. In this book you will encounter all three types.
Cross-Sectional Data
Data on different entities (workers, consumers, firms, governmental units, and so forth) for a single time period are called cross-sectional data. For example, the data on test scores in California school districts are cross sectional. Those data are for 420 entities (school districts) for a single time period (1998). In general, the number of entities on which we have observations is denoted by n; so, for example, in the California data set n = 420.
The California test score data set contains measurements of several different variables for each district. Some of these data are tabulated in Table 1.1. Each row lists data for a different district. For example, the average test score for the first district ("district #1") is 690.8; this is the average of the math and science test scores for all fifth graders in that district in 1998 on a standardized test (the Stanford Achievement Test). The average student-teacher ratio in that district is 17.89; that is, the number of students in district #1 divided by the number of classroom teachers in district #1 is 17.89. Average expenditure per pupil in district #1 is $6,385. The percentage of students in that district still learning English, that is, the percentage of students for whom English is a second language and who are not yet proficient in English, is 0%.
The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, all the variables listed vary considerably.
With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms, or other economic entities during a single time period.
TABLE 1.1  Selected Observations on Test Scores and Other Variables for California School Districts in 1998

Observation        District Average Test   Student-Teacher   Expenditure      Percentage of Students
(District) Number  Score (Fifth Grade)     Ratio             per Pupil ($)    Learning English
1                  690.8                   17.89             $6,385           0.0%
2                  661.2                   21.52             5,099            4.6
3                  643.6                   18.70             5,502            30.0
4                  647.7                   17.36             7,102            0.0
5                  640.8                   18.67             5,236            13.9
...
418                645.0                   21.89             4,403            24.3
419                672.2                   20.20             4,776            3.0
420                655.8                   19.04             5,993            5.0
observations on two variables (the rates of inflation and unemployment) for a single entity (the United States) for 183 time periods. Each time period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The observations in this data set begin in the second quarter of 1959, which is denoted 1959:II, and end in the fourth quarter of 2004 (2004:IV). The number of observations (that is, time periods) in a time series data set is denoted by T. Because there are 183 quarters from 1959:II to 2004:IV, this data set contains T = 183 observations.
Some observations in this data set are listed in Table 1.2. The data in each row correspond to a different time period (year and quarter). In the second quarter of 1959, for example, the rate of price inflation was 0.7% per year at an annual rate. In other words, if inflation had continued for 12 months at its rate during the second quarter of 1959, the overall price level (as measured by the Consumer Price Index, CPI) would have increased by 0.7%. In the second quarter of 1959, the rate of unemployment was 5.1%; that is, 5.1% of the labor force reported that they did not have a job but were looking for work. In the third quarter of 1959, the rate of CPI inflation was 2.1%, and the rate of unemployment was 5.3%.
TABLE 1.2  Selected Observations on the Rates of Consumer Price Index Inflation and Unemployment in the United States: Quarterly Data, 1959-2004

Observation   Date             CPI Inflation Rate               Unemployment
Number        (Year:Quarter)   (% per year at an annual rate)   Rate (%)
1             1959:II          0.7%                             5.1%
2             1959:III         2.1                              5.3
3             1959:IV          0.4                              5.6
4             1960:I           2.4                              5.2
...
181           2004:II          4.3                              5.6
182           2004:III         1.6                              5.4
183           2004:IV          3.5                              5.4
By tracking a single entity over time, time series data can be used to study the evolution of variables over time and to forecast future values of those variables.
Panel Data
Panel data, also called longitudinal data, are data for multiple entities in which each entity is observed at two or more time periods. Our data on cigarette consumption and prices are an example of a panel data set, and selected variables and observations in that data set are listed in Table 1.3. The number of entities in a panel data set is denoted by n, and the number of time periods is denoted by T. In the cigarette data set, we have observations on n = 48 continental U.S. states (entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total of n × T = 48 × 11 = 528 observations.

Some data from the cigarette consumption data set are listed in Table 1.3. The first block of 48 observations lists the data for each state in 1985, organized alphabetically from Alabama to Wyoming. The next block of 48 observations lists the data for 1986, and so forth, through 1995. For example, in 1985, cigarette sales in Arkansas were 128.5 packs per capita (the total number of packs of cigarettes sold
TABLE 1.3  Selected Observations on Cigarette Sales, Prices, and Taxes, by State and Year for U.S. States, 1985-1995

Observation                         Cigarette Sales      Average Price per Pack   Total Taxes (cigarette
Number        State          Year   (packs per capita)   (including taxes)        excise tax + sales tax)
1             Alabama        1985   116.5                $1.022                   $0.333
2             Arkansas       1985   128.5                1.015                    0.370
3             Arizona        1985   104.5                1.086                    0.362
...
47            West Virginia  1985   112.8                1.089                    0.382
48            Wyoming        1985   129.4                0.935                    0.240
49            Alabama        1986   117.2                1.080                    0.334
...
96            Wyoming        1986   127.8                1.007                    0.240
97            Alabama        1987   115.8                1.135                    0.335
...
528           Wyoming        1995   112.2                                         0.360
in Arkansas in 1985 divided by the total population of Arkansas in 1985 equals 128.5). The average price of a pack of cigarettes in Arkansas in 1985, including tax, was $1.015, of which 37¢ went to federal, state, and local taxes.

Panel data can be used to learn about economic relationships from the experiences of the many different entities in the data set and from the evolution over time of the variables for each entity.

The definitions of cross-sectional data, time series data, and panel data are summarized in Key Concept 1.1.
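The dimensions of a balanced panel can be sketched in a few lines of Python. The three states below are a small illustrative slice; the cigarette data set described above has n = 48 states and T = 11 years, for n × T = 528 observations.

```python
# A balanced panel has one observation per entity per time period.
states = ["Alabama", "Arkansas", "Arizona"]   # 3 of the 48 entities
years = list(range(1985, 1996))               # 1985, ..., 1995 (T = 11)

# One (entity, time period) record per observation.
panel = [(state, year) for state in states for year in years]

n, T = len(states), len(years)
print(n, T, len(panel))   # 3 11 33
print(48 * 11)            # 528, the full data set's observation count
```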
Summary
1. Many decisions in business and economics require quantitative estimates of how a change in one variable affects another variable.

2. Conceptually, the way to estimate a causal effect is in an ideal randomized controlled experiment, but performing such experiments in economic applications is usually unethical, impractical, or too expensive.

3. Econometrics provides tools for estimating causal effects using either observational (nonexperimental) data or data from real-world, imperfect experiments.

4. Cross-sectional data are gathered by observing multiple entities at a single point in time; time series data are gathered by observing a single entity at multiple points in time; and panel data are gathered by observing multiple entities, each of which is observed at multiple points in time.
Key Terms
randomized controlled experiment (8)
control group (8)
treatment group (9)
causal effect (9)
experimental data (10)
observational data (10)
1.2
1.3
You are asked to study the relationship between hours spent on employee training (measured in hours per worker per week) in a manufacturing plant and the productivity of its workers (output per worker per hour). Describe:

a. an ideal randomized controlled experiment to measure this causal effect;

b. an observational cross-sectional data set with which you could study this effect;

c. an observational time series data set for studying this effect; and
CHAPTER 2
Review of Probability
This chapter reviews the core ideas of the theory of probability that are needed to understand regression analysis and econometrics. If your knowledge of probability is stale, you should refresh it by reading this chapter. If you feel confident with the material, you still should skim the chapter and the terms and concepts at the end to make sure you are familiar with the ideas and notation.
Most aspects of the world around us have an element of randomness. The theory of probability provides mathematical ways of quantifying that randomness. Section 2.1 reviews probability distributions for a single random variable, and Section 2.2 covers the mathematical expectation, mean, and variance of a single random variable. Most of the interesting problems in economics involve more than one variable, and Section 2.3 introduces the basic elements of probability theory for two random variables. Section 2.4 discusses three special probability distributions that play a central role in statistics and econometrics.

Suppose you survey ten recent college graduates selected at random, record (or "observe") their earnings, and compute the average earnings using these ten data points (or "observations"). Because you chose the sample at random, you
could have chosen ten different graduates by pure random chance; had you done so, you would have observed ten different earnings and you would have computed a different sample average. Because its value differs from one randomly chosen sample to the next, the sample average is itself a random variable.
2.1 Random Variables and Probability Distributions
Probabilities, the Sample Space, and Random Variables
Probabilities and outcomes. The gender of the next new person you meet, your grade on an exam, and the number of times your computer will crash while you are writing a term paper all have an element of chance or randomness. In each of these examples, there is something not yet known that is eventually revealed.

The mutually exclusive potential results of a random process are called the outcomes. For example, your computer might never crash, it might crash once, it might crash twice, and so on. Only one of these outcomes will actually occur (the outcomes are mutually exclusive), and the outcomes need not be equally likely.

The probability of an outcome is the proportion of the time that the outcome occurs in the long run. If the probability of your computer not crashing while you are writing a term paper is 80%, then over the course of writing many term papers you will complete 80% without a crash.
Random variables.
Probability Distribution of a Discrete Random Variable

Probability distribution.
Probabilities of events.
TABLE 2.1  Probability of Your Computer Crashing M Times

Outcome (number of crashes)           0      1      2      3      4
Probability distribution              0.80   0.10   0.06   0.03   0.01
Cumulative probability distribution   0.80   0.90   0.96   0.99   1.00
variable M. For example, the probability of at most one crash, Pr(M ≤ 1), is 90%, which is the sum of the probabilities of no crashes (80%) and of one crash (10%).

A cumulative probability distribution is also referred to as a cumulative distribution function, a c.d.f., or a cumulative distribution.
FIGURE 2.1  Probability Distribution of the Number of Computer Crashes (bar chart of the probability distribution in Table 2.1; horizontal axis: number of crashes; vertical axis: probability)
For example, let G be the gender of the next new person you meet, where G = 0 indicates that the person is male and G = 1 indicates that she is female. The outcomes of G and their probabilities thus are

G = 1 with probability p,
G = 0 with probability 1 − p,   (2.1)

where p is the probability of the next new person you meet being a woman. The probability distribution in Equation (2.1) is the Bernoulli distribution.
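A short simulation makes the "long-run proportion" interpretation of p concrete. In this sketch the value p = 0.3 and the seed are arbitrary choices for illustration.

```python
import random

# Bernoulli draws as in Equation (2.1): G = 1 with probability p, else 0.
random.seed(0)
p = 0.3
draws = [1 if random.random() < p else 0 for _ in range(100_000)]

# Over many draws, the fraction of 1s settles near p.
print(sum(draws) / len(draws))   # close to 0.3
```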
Probability Distribution of a Continuous Random Variable

Cumulative probability distribution. The cumulative probability distribution for a continuous variable is defined just as it is for a discrete random variable. That is, the cumulative probability distribution of a continuous random variable is the probability that the random variable is less than or equal to a particular value.
For example, consider a student who drives from home to school. This student's commuting time can take on a continuum of values and, because it depends on random factors such as the weather and traffic conditions, it is natural to treat it as a continuous random variable. Figure 2.2a plots a hypothetical cumulative distribution of commuting times. For example, the probability that the commute takes less than 15 minutes is 20% and the probability that it takes less than 20 minutes is 78%.
FIGURE 2.2  Cumulative Probability Distribution and Probability Density Functions of Commuting Time (horizontal axis: commuting time in minutes)

Figure 2.2a shows the cumulative probability distribution (or c.d.f.) of commuting times. The probability that a commuting time is less than 15 minutes is 0.20 (or 20%), and the probability that it is less than 20 minutes is 0.78 (78%). Figure 2.2b shows the probability density function (or p.d.f.) of commuting times. Probabilities are given by areas under the p.d.f. The probability that a commuting time is between 15 and 20 minutes is 0.58 (58%) and is given by the area under the curve between 15 and 20 minutes.
Equivalently, this probability is the difference between the probability that the commute is less than 20 minutes (78%) and the probability that it is less than 15 minutes (20%). Thus the probability density function and the cumulative probability distribution show the same information in different formats.
2.2 Expected Values, Mean, and Variance
E(M) = 0 × 0.80 + 1 × 0.10 + 2 × 0.06 + 3 × 0.03 + 4 × 0.01 = 0.35.   (2.2)
That is, the expected number of computer crashes while writing a term paper is 0.35. Of course, the actual number of crashes must always be an integer; it makes no sense to say that the computer crashed 0.35 times while writing a particular term paper! Rather, the calculation in Equation (2.2) means that the average number of crashes over many such term papers is 0.35.

The formula for the expected value of a discrete random variable Y that can take on k different values is given as Key Concept 2.1.
KEY CONCEPT 2.1  Expected Value and the Mean
E(Y) = y_1 p_1 + y_2 p_2 + ⋯ + y_k p_k = Σ_{i=1}^{k} y_i p_i,   (2.3)

where the notation "Σ_{i=1}^{k} y_i p_i" means "the sum of y_i p_i for i running from 1 to k." The expected value of Y is also called the mean of Y or the expectation of Y and is denoted μ_Y.
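Equation (2.3) translates directly into code. A minimal sketch using the crash distribution of Table 2.1 reproduces E(M) = 0.35 from Equation (2.2):

```python
# E(Y) = sum of y_i * p_i (Equation (2.3)), applied to Table 2.1.
outcomes = [0, 1, 2, 3, 4]
probs = [0.80, 0.10, 0.06, 0.03, 0.01]

expected_value = sum(y * p for y, p in zip(outcomes, probs))
print(expected_value)   # 0.35, up to floating-point rounding
```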
Expected value of a Bernoulli random variable. An important special case of the general formula in Key Concept 2.1 is the mean of a Bernoulli random variable. Let G be the Bernoulli random variable with the probability distribution in Equation (2.1). The expected value of G is

E(G) = 1 × p + 0 × (1 − p) = p.   (2.4)

Thus the expected value of a Bernoulli random variable is p, the probability that it takes on the value "1."
KEY CONCEPT 2.2  Variance and Standard Deviation

The variance of the discrete random variable Y, denoted σ_Y², is

var(Y) = σ_Y² = E[(Y − μ_Y)²] = Σ_{i=1}^{k} (y_i − μ_Y)² p_i.   (2.5)
The standard deviation of Y is σ_Y, the square root of the variance. The units of the standard deviation are the same as the units of Y.
For example, the variance of the number of computer crashes M is the probability-weighted average of the squared difference between M and its mean, 0.35:

var(M) = (0 − 0.35)² × 0.80 + (1 − 0.35)² × 0.10 + (2 − 0.35)² × 0.06 + (3 − 0.35)² × 0.03 + (4 − 0.35)² × 0.01 = 0.6475.   (2.6)

The standard deviation of M is the square root of the variance, so σ_M = √0.6475 ≅ 0.80.
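The calculation in Equation (2.6) is the same probability-weighted sum as before, applied to squared deviations; a minimal sketch:

```python
# Variance of M (Equation (2.6)) and its square root, the standard deviation.
outcomes = [0, 1, 2, 3, 4]
probs = [0.80, 0.10, 0.06, 0.03, 0.01]

mean = sum(y * p for y, p in zip(outcomes, probs))               # 0.35
var = sum((y - mean) ** 2 * p for y, p in zip(outcomes, probs))  # 0.6475
std = var ** 0.5
print(round(var, 4), round(std, 2))   # 0.6475 0.8
```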
The mean of the Bernoulli random variable G with probability distribution in Equation (2.1) is μ_G = p [Equation (2.4)], so its variance is

var(G) = σ_G² = (0 − p)² × (1 − p) + (1 − p)² × p = p(1 − p).   (2.7)

The standard deviation is σ_G = √(p(1 − p)).
Mean and Variance of a Linear Function of a Random Variable

This section discusses random variables (say, X and Y) that are related by a linear function. For example, consider an income tax scheme under which a worker is taxed at a rate of 20% on his or her earnings and then given a (tax-free) grant of $2,000. Under this tax scheme, after-tax earnings Y are related to pre-tax earnings X by the equation

Y = 2000 + 0.8X.   (2.8)

That is, after-tax earnings Y is 80% of pre-tax earnings X, plus $2,000.
Suppose an individual's pre-tax earnings next year are a random variable with mean μ_X and variance σ_X². Because pre-tax earnings are random, so are after-tax earnings. What are the mean and standard deviation of her after-tax earnings under this tax? After taxes, her earnings are 80% of the original pre-tax earnings, plus $2,000. Thus the expected value of her after-tax earnings is

E(Y) = μ_Y = 2000 + 0.8μ_X.   (2.9)

Because Y − μ_Y = 0.8(X − μ_X), the variance of after-tax earnings is var(Y) = 0.8²σ_X², so the standard deviation is

σ_Y = 0.8σ_X.   (2.10)

That is, the standard deviation of the distribution of her after-tax earnings is 80% of the standard deviation of the distribution of pre-tax earnings.
This analysis can be generalized so that Y depends on X with an intercept a (instead of $2,000) and a slope b (instead of 0.8), so that

Y = a + bX.   (2.11)

Then the mean and variance of Y are

μ_Y = a + bμ_X   (2.12)

and

σ_Y² = b²σ_X².   (2.13)
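Equations (2.12) and (2.13) can be checked by simulation. In this sketch, the tax parameters a = 2000 and b = 0.8 come from the example above, but the pre-tax earnings distribution (normal with a mean of $40,000 and a standard deviation of $5,000) is a hypothetical choice made only for illustration.

```python
import random

random.seed(1)
a, b = 2000, 0.8                 # grant and (1 - tax rate) from the example
mu_x, sigma_x = 40_000, 5_000    # hypothetical pre-tax earnings distribution

xs = [random.gauss(mu_x, sigma_x) for _ in range(200_000)]
ys = [a + b * x for x in xs]     # after-tax earnings, Y = a + bX

mean_y = sum(ys) / len(ys)
std_y = (sum((y - mean_y) ** 2 for y in ys) / len(ys)) ** 0.5

print(round(mean_y))   # close to a + b * mu_x = 34000
print(round(std_y))    # close to b * sigma_x = 4000
```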
Other Measures of the Shape of a Distribution

The mean and standard deviation measure two important features of a distribution: its center (the mean) and its spread (the standard deviation). This section discusses measures of two other features of a distribution: the skewness, which measures the lack of symmetry of a distribution, and the kurtosis, which measures how thick, or "heavy," are its tails. The mean, variance, skewness, and kurtosis are all based on what are called the moments of a distribution.
Skewness.
Figure 2.3 plots four distributions, two which are symmetric and two which are not. Visually, the distribution in Figure 2.3d appears to deviate more from symmetry than does the distribution in Figure 2.3c. The skewness of a distribution provides a mathematical way to describe how much a distribution deviates from symmetry.

The skewness of the distribution of a random variable Y is

Skewness = E[(Y − μ_Y)³] / σ_Y³.   (2.14)
Kurtosis. The kurtosis of a distribution is a measure of how much mass is in its tails. The kurtosis of the distribution of Y is

Kurtosis = E[(Y − μ_Y)⁴] / σ_Y⁴.   (2.15)

If a distribution has a large amount of mass in its tails, then some extreme departures of Y from its mean are likely, and these very large values will lead to large values, on average (in expectation), of (Y − μ_Y)⁴. Thus, for a distribution with a large amount of mass in its tails, the kurtosis will be large. Because (Y − μ_Y)⁴ cannot be negative, the kurtosis cannot be negative.
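Both formulas reduce to probability-weighted sums for a discrete distribution. A sketch applied to the crash distribution of Table 2.1 (the numeric results here are computed from that table, not taken from the text):

```python
# Skewness (Equation (2.14)) and kurtosis (Equation (2.15)) from moments.
outcomes = [0, 1, 2, 3, 4]
probs = [0.80, 0.10, 0.06, 0.03, 0.01]

mean = sum(y * p for y, p in zip(outcomes, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(outcomes, probs))
skew = sum((y - mean) ** 3 * p for y, p in zip(outcomes, probs)) / var ** 1.5
kurt = sum((y - mean) ** 4 * p for y, p in zip(outcomes, probs)) / var ** 2

# The crash distribution is right-skewed (> 0) and heavy-tailed (> 3).
print(round(skew, 2), round(kurt, 2))   # 2.51 8.89
```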
FIGURE 2.3  Four Distributions with Different Skewness and Kurtosis

(a) skewness = 0, kurtosis = 3; (b) skewness = 0, kurtosis = 20; (c) skewness = −0.1, kurtosis = 5; (d) skewness = 0.6, kurtosis = 5. All four distributions have a mean of 0 and a variance of 1. The distributions with skewness of zero (a and b) are symmetric; the distributions with nonzero skewness (c and d) are not symmetric.
The kurtosis of a normally distributed random variable is 3, so a random variable with kurtosis exceeding 3 has more mass in its tails than a normal random variable. A distribution with kurtosis exceeding 3 is called leptokurtic or, more simply, heavy-tailed. Like skewness, the kurtosis is unit free: changing the units of Y does not change its kurtosis.

Below each of the four distributions in Figure 2.3 is its kurtosis. The distributions in Figures 2.3b-d are heavy-tailed.
Moments. The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, E(Y²), is called the second moment of Y. In general, the expected value of Y^r is called the rth moment of the random variable Y; that is, the rth moment of Y is E(Y^r). The skewness is a function of the first, second, and third moments of Y, and the kurtosis is a function of the first through fourth moments of Y.
2.3 Two Random Variables

TABLE 2.2  Joint Distribution of Weather Conditions and Commuting Times
                         Rain (X = 0)   No Rain (X = 1)   Total
Long commute (Y = 0)     0.15           0.07              0.22
Short commute (Y = 1)    0.15           0.63              0.78
Total                    0.30           0.70              1.00
The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value. If X can take on l different values x_1, ..., x_l, then

Pr(Y = y) = Σ_{i=1}^{l} Pr(X = x_i, Y = y).   (2.16)
For example, in Table 2.2, the probability of a long rainy commute is 15% and the probability of a long commute with no rain is 7%, so the probability of a long commute (rainy or not) is 22%. The marginal distribution of commuting times is given in the final column of Table 2.2. Similarly, the marginal probability that it will rain is 30%, as shown in the final row of Table 2.2.
Conditional Distributions

Conditional distribution. The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. The conditional probability that Y takes on the value y when X takes on the value x is written Pr(Y = y | X = x).

For example, what is the probability of a long commute (Y = 0) if you know it is raining (X = 0)? From Table 2.2, the joint probability of a rainy short commute is 15% and the joint probability of a rainy long commute is 15%, so if it is raining a long commute and a short commute are equally likely. Thus the probability of a long commute (Y = 0), conditional on it being rainy (X = 0), is 50%, or Pr(Y = 0 | X = 0) = 0.50. Equivalently, the marginal probability of rain is 30%; that is, over many commutes it rains 30% of the time. Of this 30% of commutes, 50% have a long commute.
r,.ABLE 2.3
Joint and Conditional Distributions of Computer Crashes (M) and Computer Age (A)
A. Joint Distribution

                       M = 0   M = 1   M = 2   M = 3   M = 4   Total
Old computer (A = 0)   0.35    0.065   0.05    0.025   0.01    0.50
New computer (A = 1)   0.45    0.035   0.01    0.005   0.00    0.50
Total                  0.80    0.10    0.06    0.03    0.01    1.00

B. Conditional Distributions of M given A

                 M = 0   M = 1   M = 2   M = 3   M = 4   Total
Pr(M | A = 0)    0.70    0.13    0.10    0.05    0.02    1.00
Pr(M | A = 1)    0.90    0.07    0.02    0.01    0.00    1.00
In general, the conditional distribution of Y given X = x is

Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x).   (2.17)
For example, the conditional probability of a long commute given that it is rainy is Pr(Y = 0 | X = 0) = Pr(X = 0, Y = 0)/Pr(X = 0) = 0.15/0.30 = 0.50.

As a second example, consider a modification of the crashing computer example. Suppose you use a computer in the library to type your term paper and the librarian randomly assigns you a computer from those available, half of which are new and half of which are old. Because you are randomly assigned to a computer, the age of the computer you use, A (= 1 if the computer is new, = 0 if it is old), is a random variable. Suppose the joint distribution of the random variables M and A is given in Part A of Table 2.3. Then the conditional distribution of computer crashes, given the age of the computer, is given in Part B of the table. For example, the joint probability M = 0 and A = 0 is 0.35; because half the computers are old, the conditional probability of no crashes, given that you are using an old computer, is Pr(M = 0 | A = 0) = Pr(M = 0, A = 0)/Pr(A = 0) = 0.35/0.50 = 0.70, or 70%. In contrast, the conditional probability of no crashes given that you are assigned a new computer is 90%. According to the conditional distributions in Part B of Table 2.3, the newer computers are less likely to crash than the old ones; for example, the probability of three crashes is 5% with an old computer but 1% with a new computer.
Conditional expectation. The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation is computed using the conditional distribution of Y given X:

E(Y | X = x) = Σ_{i=1}^{k} y_i Pr(Y = y_i | X = x).   (2.18)

The mean of Y is the weighted average of the conditional expectation of Y given X, weighted by the probability distribution of X:

E(Y) = Σ_{i=1}^{l} E(Y | X = x_i) Pr(X = x_i).   (2.19)
Equation (2.19) follows from Equations (2.18) and (2.17) (see Exercise 2.19). Stated differently, the expectation of Y is the expectation of the conditional expectation of Y given X,

E(Y) = E[E(Y | X)],   (2.20)

where the inner expectation is computed using the conditional distribution of Y given X and the outer expectation is computed using the marginal distribution of X.
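The law of iterated expectations can be verified directly on the joint distribution of Table 2.2 (X = 1 for no rain, Y = 1 for a short commute); a minimal sketch:

```python
# Joint distribution from Table 2.2: keys are (x, y), values are probabilities.
joint = {(0, 0): 0.15, (0, 1): 0.15,    # rain: long, short commute
         (1, 0): 0.07, (1, 1): 0.63}    # no rain: long, short commute

# Marginal distribution of X and conditional means E(Y | X = x).
pr_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
e_y_x = {x: sum(y * p for (xx, y), p in joint.items() if xx == x) / pr_x[x]
         for x in (0, 1)}

e_y_direct = sum(y * p for (_, y), p in joint.items())    # E(Y)
e_y_iterated = sum(e_y_x[x] * pr_x[x] for x in (0, 1))    # E[E(Y | X)]
print(round(e_y_direct, 2), round(e_y_iterated, 2))       # 0.78 0.78
```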
Conditional variance. The variance of Y conditional on X is the variance of the conditional distribution of Y given X:

var(Y | X = x) = Σ_{i=1}^{k} [y_i − E(Y | X = x)]² Pr(Y = y_i | X = x).   (2.21)

For example, the conditional variance of the number of crashes given that the computer is old is var(M | A = 0) = (0 − 0.56)² × 0.70 + (1 − 0.56)² × 0.13 + (2 − 0.56)² × 0.10 + (3 − 0.56)² × 0.05 + (4 − 0.56)² × 0.02 ≅ 0.99. The standard deviation of the conditional distribution of M given that A = 0 is thus √0.99 = 0.99. The conditional variance of M given that A = 1 is the variance of the distribution in the second row of Panel B of Table 2.3, which is 0.22, so the standard deviation of M for new computers is √0.22 = 0.47. For the conditional distributions in Table 2.3, the expected number of crashes for new computers (0.14) is less than that for old computers (0.56), and the spread of the distribution of the number of crashes, as measured by the conditional standard deviation, is smaller for new computers (0.47) than for old (0.99).
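The conditional means and standard deviations quoted above can be reproduced from Part B of Table 2.3; a minimal sketch:

```python
# Pr(M = m | A = a) for m = 0, ..., 4, from Part B of Table 2.3.
cond = {0: [0.70, 0.13, 0.10, 0.05, 0.02],   # old computers
        1: [0.90, 0.07, 0.02, 0.01, 0.00]}   # new computers

for a, probs in cond.items():
    mean = sum(m * p for m, p in enumerate(probs))               # E(M | A = a)
    var = sum((m - mean) ** 2 * p for m, p in enumerate(probs))  # Eq. (2.21)
    print(a, round(mean, 2), round(var, 2), round(var ** 0.5, 2))
# prints: 0 0.56 0.99 0.99
#         1 0.14 0.22 0.47
```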
Independence

Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. Specifically, X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y. That is, X and Y are independently distributed if, for all values of x and y,

Pr(Y = y | X = x) = Pr(Y = y)   (independence of X and Y).   (2.22)

Substituting Equation (2.22) into Equation (2.17) gives an alternative expression for independent random variables in terms of their joint distribution. If X and Y are independent, then

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y).   (2.23)

That is, the joint distribution of two independent random variables is the product of their marginal distributions.
Covariance and Correlation

Covariance. One measure of the extent to which two random variables move together is their covariance. The covariance between X and Y is the expected value E[(X − μ_X)(Y − μ_Y)], where μ_X is the mean of X and μ_Y is the mean of Y. The covariance is denoted cov(X, Y) or σ_XY. If X can take on l values and Y can take on k values, then

cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)] = Σ_{i=1}^{k} Σ_{j=1}^{l} (x_j − μ_X)(y_i − μ_Y) Pr(X = x_j, Y = y_i).   (2.24)
To interpret this formula, suppose that when X is greater than its mean (so that X − μ_X is positive), then Y tends to be greater than its mean (so that Y − μ_Y is positive), and when X is less than its mean (so that X − μ_X < 0), then Y tends to be less than its mean (so that Y − μ_Y < 0). In both cases, the product (X − μ_X) × (Y − μ_Y) tends to be positive, so the covariance is positive. In contrast, if X and Y tend to move in opposite directions (so that X is large when Y is small, and vice versa), then the covariance is negative. Finally, if X and Y are independent, then the covariance is zero (see Exercise 2.19).
Correlation. Because the covariance is a product of deviations of X and Y from their means, its units are, awkwardly, the units of X multiplied by the units of Y. This "units" problem is solved by the correlation, which is the covariance between X and Y divided by their standard deviations:

corr(X, Y) = cov(X, Y) / √(var(X) var(Y)) = σ_XY / (σ_X σ_Y).   (2.25)

Because the units of the numerator in Equation (2.25) are the same as those of the denominator, the units cancel and the correlation is unitless. The random variables X and Y are said to be uncorrelated if corr(X, Y) = 0.

The correlation always is between −1 and 1; that is, as proven in Appendix 2.1,

−1 ≤ corr(X, Y) ≤ 1.   (2.26)

Correlation and conditional mean. If the conditional mean of Y does not depend on X, then Y and X are uncorrelated. That is,

if E(Y | X) = μ_Y, then cov(Y, X) = 0 and corr(Y, X) = 0.   (2.27)
We now show this result. First suppose that Y and X have mean zero, so that cov(Y, X) = E[(Y − μ_Y)(X − μ_X)] = E(YX). By the law of iterated expectations [Equation (2.20)], E(YX) = E[E(Y | X)X] = 0 because E(Y | X) = 0, so cov(Y, X) = 0. Equation (2.27) follows by substituting cov(Y, X) = 0 into the definition of correlation in Equation (2.25). If Y and X do not have mean zero, first subtract off their means, then the preceding proof applies.

It is not necessarily true, however, that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X. Said differently, it is possible for the conditional mean of Y to be a function of X but for Y and X nonetheless to be uncorrelated. An example is given in Exercise 2.23.
The Mean and Variance of Sums of Random Variables

The mean of the sum of two random variables, X and Y, is the sum of their means:

E(X + Y) = E(X) + E(Y) = μ_X + μ_Y.   (2.28)
Some parents tell their children that they will be able to get a better-paying job if they get a college degree than if they skip higher education. Are they right? The conditional distributions of average hourly earnings of U.S. full-time workers in 2004, given gender and education, are shown in Figure 2.4 and summarized in Table 2.4. For both men and women, mean earnings are higher for those with a college degree (Table 2.4, first numeric column).
TABLE 2.4  Summaries of the Conditional Distribution of Average Hourly Earnings of U.S. Full-Time Workers in 2004, Given Education Level and Gender

Rows: (a) women with a high school diploma; (b) women with a college degree; (c) men with a high school diploma; (d) men with a four-year college degree. Columns: mean, standard deviation, and the 25th, 50th (median), 75th, and 90th percentiles of the earnings distribution (in dollars).
FIGURE 2.4  Conditional Distribution of Average Hourly Earnings of U.S. Full-Time Workers in 2004, Given Education Level and Gender

Four probability densities (horizontal axis: dollars): (a) women with a high school diploma; (b) women with a college degree; (c) men with a high school diploma; (d) men with a four-year college degree.
KEY CONCEPT 2.3  Means, Variances, and Covariances of Sums of Random Variables
Let X, Y, and V be random variables, let μ_X and σ_X² be the mean and variance of X, let σ_XY be the covariance between X and Y (and so forth for the other variables), and let a, b, and c be constants. Then

E(a + bX + cY) = a + bμ_X + cμ_Y,   (2.29)

var(a + bY) = b²σ_Y²,   (2.30)

var(aX + bY) = a²σ_X² + 2abσ_XY + b²σ_Y²,   (2.31)

E(Y²) = σ_Y² + μ_Y²,   (2.32)

cov(a + bX + cV, Y) = bσ_XY + cσ_VY,   (2.33)

E(XY) = σ_XY + μ_X μ_Y.   (2.34)
The variance of the sum of X and Y is the sum of their variances plus twice their covariance:

var(X + Y) = var(X) + var(Y) + 2cov(X, Y) = σ_X² + σ_Y² + 2σ_XY.   (2.36)

If X and Y are independent, then the covariance is zero and the variance of their sum is the sum of their variances:

var(X + Y) = var(X) + var(Y) = σ_X² + σ_Y²   (if X and Y are independent).   (2.37)
Useful expressions for means, variances, and covariances involving weighted sums of random variables are collected in Key Concept 2.3. The results in Key Concept 2.3 are derived in Appendix 2.1.
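Equation (2.36) can be checked numerically on the joint distribution of Table 2.2: both sides evaluate to the same number.

```python
joint = {(0, 0): 0.15, (0, 1): 0.15,
         (1, 0): 0.07, (1, 1): 0.63}

mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Left side: var(X + Y) computed directly from the joint distribution.
var_sum = sum((x + y - mu_x - mu_y) ** 2 * p for (x, y), p in joint.items())

print(round(var_sum, 4), round(var_x + var_y + 2 * cov, 4))   # 0.5496 0.5496
```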
2.4 The Normal, Chi-Squared, Student t, and F Distributions

FIGURE 2.5  The Normal Probability Density (the bell-shaped normal density with mean μ; 95% of the probability lies between μ − 1.96σ and μ + 1.96σ)

KEY CONCEPT 2.4  Computing Probabilities Involving Normal Random Variables
Suppose Y is normally distributed with mean μ and variance σ²; in other words, Y is distributed N(μ, σ²). Then Y is standardized by subtracting its mean and dividing by its standard deviation, that is, by computing Z = (Y − μ)/σ.

Let c1 and c2 denote two numbers with c1 < c2, and let d1 = (c1 − μ)/σ and d2 = (c2 − μ)/σ. Then

Pr(c1 ≤ Y ≤ c2) = Pr(d1 ≤ Z ≤ d2) = Φ(d2) − Φ(d1).   (2.40)
For example, suppose Y is distributed N(1, 4); then Z = (Y − 1)/2 is distributed N(0, 1), and

Pr(Y ≤ 2) = Pr(Z ≤ ½(2 − 1)) = Pr(Z ≤ 0.5) = Φ(0.5) = 0.691.   (2.41)
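The standardization recipe of Key Concept 2.4 is easy to sketch with the standard library, since Φ can be built from the error function as Φ(z) = (1 + erf(z/√2))/2:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return (1 + erf(z / sqrt(2))) / 2

# Equation (2.41): Y ~ N(1, 4), so sigma = 2 and
# Pr(Y <= 2) = Phi((2 - 1) / 2) = Phi(0.5).
mu, sigma = 1, 2
print(round(phi((2 - mu) / sigma), 3))   # 0.691
```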
FIGURE 2.6  Calculating the Probability That Y ≤ 2 When Y Is Distributed N(1, 4)

(a) The N(1, 4) density: Pr(Y ≤ 2) is the area to the left of 2. (b) The standard normal N(0, 1) density: the same probability is the area to the left of 0.5.
the distribution is called the multivariate normal distribution or, if only two variables are being considered, the bivariate normal distribution. The formula for the bivariate normal p.d.f. is given in Appendix 17.1, and the formula for the general multivariate normal p.d.f. is given in Appendix 18.1.

The multivariate normal distribution has three important properties. If X and Y have a bivariate normal distribution with covariance σ_XY, and if a and b are two constants, then aX + bY has the normal distribution

aX + bY is distributed N(aμ_X + bμ_Y, a²σ_X² + b²σ_Y² + 2abσ_XY).   (2.42)
FIGURE 2.7  Daily Percentage Changes in the Dow Jones Industrial Average in the 1980s

(Vertical axis: percent change. The plot shows daily percentage price changes from January 1, 1980, to October 16, 1987.)
The universe is believed to have existed for 15 billion years, or about 5 × 10^17 seconds, so the probability of choosing a particular second at random from all the seconds since the beginning of time is 2 × 10^−18. There are approximately 10^43 molecules of gas in the first kilometer above the earth's surface. The probability of choosing one at random is 10^−43.
The name of the chi-squared distribution derives from the Greek letter used to denote it: a chi-squared distribution with m degrees of freedom is denoted χ²_m.

Selected percentiles of the χ²_m distribution are given in Appendix Table 3. For example, Appendix Table 3 shows that the 95th percentile of the χ²_3 distribution is 7.81, so Pr(Z1² + Z2² + Z3² ≤ 7.81) = 0.95.
The F Distribution

The F distribution with m and n degrees of freedom, denoted F_{m,n}, is defined to be the distribution of the ratio of a chi-squared random variable with degrees of freedom m, divided by m, to an independently distributed chi-squared random variable with degrees of freedom n, divided by n. To state this mathematically, let W be a chi-squared random variable with m degrees of freedom and let V be a chi-squared random variable with n degrees of freedom, where W and V are independently distributed. Then (W/m)/(V/n) has an F_{m,n} distribution, that is, an F distribution with numerator degrees of freedom m and denominator degrees of freedom n.

In statistics and econometrics, an important special case of the F distribution arises when the denominator degrees of freedom is large enough that the F_{m,n} distribution can be approximated by the F_{m,∞} distribution. In this limiting case, the denominator random variable V/n is the mean of infinitely many squared standard normal random variables, and that mean is 1 because the mean of a squared standard normal random variable is 1. Thus the F_{m,∞} distribution is the distribution of a chi-squared random variable with m degrees of freedom, divided by m: W/m is distributed F_{m,∞}. For example, from Appendix Table 4, the 95th percentile of the F_{3,∞} distribution is 2.60, which is the same as the 95th percentile of the χ²₃ distribution, 7.81 (from Appendix Table 3), divided by the degrees of freedom, which is 3 (7.81/3 = 2.60).

The 90th, 95th, and 99th percentiles of the F_{m,n} distribution are given in Appendix Table 5 for selected values of m and n. For example, the 95th percentile of the F_{3,30} distribution is 2.92, and the 95th percentile of the F_{3,90} distribution is 2.71. As the denominator degrees of freedom n increases, the 95th percentile of the F_{3,n} distribution tends to the F_{3,∞} limit of 2.60.
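The percentile relationships described above can be checked by simulation, using the definitions directly: a chi-squared draw is a sum of squared standard normals, and an F draw is a ratio of scaled chi-squared draws. A sketch (the values 7.81, 2.92, and 2.60 are the table values quoted in the text):

```python
import random

random.seed(0)

def chi2_draw(m):
    """One draw from a chi-squared distribution with m degrees of freedom:
    the sum of m squared independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))

def percentile(draws, q):
    draws = sorted(draws)
    return draws[int(q * len(draws))]

N = 100_000

# 95th percentile of chi-squared(3): the printed tables give 7.81.
chi2_95 = percentile([chi2_draw(3) for _ in range(N)], 0.95)

# F(3, 30) is (W/3)/(V/30); the tables give a 95th percentile of 2.92.
f_3_30_95 = percentile([(chi2_draw(3) / 3) / (chi2_draw(30) / 30) for _ in range(N)], 0.95)

# In the limit of infinite denominator degrees of freedom, V/n -> 1,
# so F(3, infinity) is just chi-squared(3) divided by 3, with 95th percentile 2.60.
f_3_inf_95 = percentile([chi2_draw(3) / 3 for _ in range(N)], 0.95)

print(round(chi2_95, 2), round(f_3_30_95, 2), round(f_3_inf_95, 2))
```

The finite-denominator F distribution has a fatter right tail, which is why its 95th percentile (2.92) exceeds the limiting value (2.60).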
2.5 Random Sampling and the Distribution of the Sample Average

Random Sampling

Simple random sampling.
The situation described in the previous paragraph is an example of the simplest sampling scheme used in statistics, called simple random sampling, in which n objects are selected at random from a population (the population of commuting days) and each member of the population (each day) is equally likely to be included in the sample.

The n observations in the sample are denoted Y₁, . . . , Yₙ, where Y₁ is the first observation, Y₂ is the second observation, and so forth. In the commuting example, Y₁ is the commuting time on the first of her n randomly selected days and Yᵢ is the commuting time on the iᵗʰ of her randomly selected days.

Because the members of the population included in the sample are selected at random, the values of the observations Y₁, . . . , Yₙ are themselves random. If different members of the population are chosen, their values of Y will differ. Thus the act of random sampling means that Y₁, . . . , Yₙ can be treated as random variables. Before they are sampled, Y₁, . . . , Yₙ can take on many possible values; after they are sampled, a specific value is recorded for each observation.
i.i.d. draws. Because Y₁, . . . , Yₙ are randomly drawn from the same population, the marginal distribution of Yᵢ is the same for each i = 1, . . . , n; this marginal distribution is the distribution of Y in the population being sampled. When Yᵢ has the same marginal distribution for i = 1, . . . , n, then Y₁, . . . , Yₙ are said to be identically distributed.

Under simple random sampling, knowing the value of Y₁ provides no information about Y₂, so the conditional distribution of Y₂ given Y₁ is the same as the marginal distribution of Y₂. In other words, under simple random sampling, Y₁ is distributed independently of Y₂, . . . , Yₙ.

When Y₁, . . . , Yₙ are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed, or i.i.d. Simple random sampling and i.i.d. draws are summarized in Key Concept 2.5.
The Sampling Distribution of the Sample Average

The sample average Ȳ of the n observations Y₁, . . . , Yₙ is

Ȳ = (1/n)(Y₁ + Y₂ + · · · + Yₙ) = (1/n) Σ_{i=1}^n Yᵢ.   (2.43)

An essential concept is that the act of drawing a random sample has the effect of making the sample average Ȳ a random variable. Because the sample was drawn at random, the value of each Yᵢ is random. Because Y₁, . . . , Yₙ are random, their average is random. Had a different sample been drawn, then the observations and
their sample average would have been different: The value of Ȳ differs from one randomly drawn sample to the next. Because Y₁, . . . , Yₙ are i.i.d., the mean of the sampling distribution of Ȳ is

E(Ȳ) = (1/n) Σ_{i=1}^n E(Yᵢ) = μ_Y.   (2.44)
Similarly, because the observations are independently distributed, the variance of Ȳ is

var(Ȳ) = var( (1/n) Σ_{i=1}^n Yᵢ ) = σ²_Y / n.   (2.45)

The standard deviation of Ȳ is the square root of the variance, σ_Y/√n.

In summary, the mean, the variance, and the standard deviation of Ȳ are

E(Ȳ) = μ_Y,   (2.46)

var(Ȳ) = σ²_Ȳ = σ²_Y / n, and   (2.47)

std.dev(Ȳ) = σ_Ȳ = σ_Y / √n.   (2.48)

These results hold whatever the distribution of Yᵢ; that is, the distribution of Yᵢ does not need to take on a specific form, such as the normal distribution, for Equations (2.46) through (2.48) to hold.
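Equations (2.46) through (2.48) are easy to verify by simulation. A minimal sketch; the N(10, 4) population and the sample size n = 25 are hypothetical choices made for illustration:

```python
import random
import statistics

random.seed(1)

MU_Y, SIGMA_Y, N = 10.0, 2.0, 25   # hypothetical N(10, 4) population, n = 25

def sample_average():
    """One draw of Y-bar: the average of n i.i.d. N(mu, sigma^2) observations."""
    return statistics.fmean(random.gauss(MU_Y, SIGMA_Y) for _ in range(N))

# Many repeated samples trace out the sampling distribution of Y-bar.
ybars = [sample_average() for _ in range(20_000)]

mean_ybar = statistics.fmean(ybars)  # Equation (2.46): E(Y-bar) = mu_Y = 10
sd_ybar = statistics.stdev(ybars)    # Equation (2.48): sigma_Y / sqrt(n) = 2/5 = 0.4
print(round(mean_ybar, 2), round(sd_ybar, 2))  # roughly 10.0 and 0.4
```

Repeating the experiment with a non-normal population leaves the mean and standard deviation formulas unchanged, exactly as the text notes.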
2.6 Large-Sample Approximations to Sampling Distributions
Sampling distributions play a central role in the development of statistical and econometric procedures, so it is important to know, in a mathematical sense, what the sampling distribution of Ȳ is. There are two approaches to characterizing sampling distributions: an "exact" approach and an "approximate" approach.

The "exact" approach entails deriving a formula for the sampling distribution that holds exactly for any value of n. The sampling distribution that describes the distribution of Ȳ for any n is called the exact distribution or finite-sample distribution of Ȳ. For example, if Y is normally distributed and Y₁, . . . , Yₙ are i.i.d., then (as discussed in Section 2.5) the exact distribution of Ȳ is normal with mean μ_Y and variance σ²_Y/n. Unfortunately, if the distribution of Y is not normal, then in general the exact sampling distribution of Ȳ is very complicated and depends on the distribution of Y.
The "approximate" approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution, "asymptotic" because the approximations become exact in the limit that n → ∞. As we see in this section, these approximations can be very accurate even if the sample size is only n = 30 observations. Because sample sizes used in practice in econometrics typically number in the hundreds or thousands, these asymptotic distributions can be counted on to provide very good approximations to the exact sampling distribution.

This section presents the two key tools used to approximate sampling distributions when the sample size is large: the law of large numbers and the central limit theorem. The law of large numbers says that, when the sample size is large, Ȳ will be close to μ_Y with very high probability. The central limit theorem says that, when the sample size is large, the sampling distribution of the standardized sample average, (Ȳ − μ_Y)/σ_Ȳ, is approximately normal.

Although exact sampling distributions are complicated and depend on the distribution of Y, the asymptotic distributions are simple. Moreover, remarkably, the asymptotic normal distribution of (Ȳ − μ_Y)/σ_Ȳ does not depend on the distribution of Y. This normal approximate distribution provides enormous simplifications and underlies the theory of regression used throughout this book.
KEY CONCEPT 2.6  Convergence in Probability, Consistency, and the Law of Large Numbers

The sample average Ȳ converges in probability to μ_Y (or, equivalently, Ȳ is consistent for μ_Y) if the probability that Ȳ is in the range μ_Y − c to μ_Y + c becomes arbitrarily close to one as n increases, for any constant c > 0. This is written Ȳ →p μ_Y.

The law of large numbers says that if Yᵢ, i = 1, . . . , n, are independently and identically distributed with E(Yᵢ) = μ_Y, and if large outliers are unlikely (technically, if var(Yᵢ) = σ²_Y < ∞), then Ȳ →p μ_Y.
where (from Table 2.2) the probability that Yᵢ = 1 is 0.78. Because the expectation of a Bernoulli random variable is its success probability, E(Yᵢ) = p = 0.78. The sample average Ȳ is the fraction of days in her sample in which her commute was short.
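The law of large numbers can be watched in action by simulating the commuting example (Bernoulli draws with success probability 0.78, as in the text):

```python
import random

random.seed(2)

P = 0.78  # Pr(Y_i = 1): the probability of a short commute, as in the text

def ybar(n):
    """Sample average of n i.i.d. Bernoulli(P) draws."""
    return sum(random.random() < P for _ in range(n)) / n

# Key Concept 2.6: as n grows, Y-bar concentrates around mu_Y = 0.78.
for n in (2, 25, 100, 10_000):
    print(n, round(ybar(n), 3))
```

With n = 2 the average can only be 0, 0.5, or 1; by n = 10,000 it is reliably within a percentage point or so of 0.78, mirroring the tightening panels of Figure 2.8.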
2 .6
FIGURE 2.8
P robabilit)'
Pn>h.tbiliry
., . r
., i-
J1
Jl ~ 078
0 78
l l~
lll
r ,,
,,
~s
o),II._,.-,-,.....,L-.--.--,,..........,,...., ..,a.,_.,-,.~,---:"'1
I)
II
U 2S
II
(I
11.7.1
I.IJI.I
(b) , - 5
.:::
Prnhahilil}
Probabiliry
t'!'i
II
15
Il l
(11~1
Ill
I -.I
"I
1 I (I
...,....,.....r-r..,..-,..-.,.-,................-.~~-~
1111
!lilt
II
lfl()
Th dr~trobulton$ ore thP ~mpling distnbutions of Y, the ~mple overage ol n mdependont Bernoulli random
Pr( Y
1J 0.78 (the probability of o short commuto is 78' ) Tho variance of the som
1
dr$tributton of Y decreases as n gels forger, so the sampling distnbuhon becomes more ttghtly concen
- d around rls moon 1 0.78 o~ the somple size n increases
vonobla~ With p
S1
2.6
FIGURE 2.9
Probability
P robability
0.7
11,5
02
II I
n n ~~~,-~~~~r---~,---~
-\ I
- lll
-1 11
IJ II
I U
2 11
-J
3 .11
- :!.0
(I
St t~odardi:z.ed
value of
sample average
II (t
l.tl
2.11
J.ti
Standardized val ue of
sample ;~veragf
(a) "=:!
(b) n= 5
Probabiliry
P robability
lU:!
J.:5
Standt~rdi:ud
St~mple
( c ) u <=
-1 .11
:?:>
val ue of
average
Standardized vnlue of
sample average
(d)
II=
100
The sampl1ng diilribvtion oS Y in Figure 2.8 h plotted here after standardizing Y. This centers the distribvltons in Figure 2.8 and mogni~es the sc.ole on the horizontal axis by o foetor of Vii. When the sample size
s Iorge. the sampling distributions ore increosingly well approximated by the normal distribution (the solid
ltne), os predicted by the central limit theorem The normal distribution is Koled so thot the height of the
distributions is approximately the some in all figures.
53
FIGURE 2.10  Distribution of the Standardized Sample Average of n Draws from a Skewed Distribution. Panels: (a) n = 1 (the population distribution); (b) n = 5; (c) n = 25; (d) n = 100.

The figures show the sampling distribution of the standardized sample average of n draws from the skewed (asymmetric) population distribution shown in Figure 2.10a. When n is small (n = 5), the sampling distribution, like the population distribution, is skewed. But when n is large (n = 100), the sampling distribution is well approximated by a standard normal distribution (the solid line), as predicted by the central limit theorem. The normal distribution is scaled so that the height of the distributions is approximately the same in all figures.
By n = 100, however, the normal approximation is quite good. In fact, for n ≥ 100 the normal approximation to the distribution of Ȳ typically is very good for a wide variety of population distributions.

The central limit theorem is a remarkable result. While the "small n" distributions of Ȳ in parts b and c of Figures 2.9 and 2.10 are complicated and quite different from each other, the "large n" distributions in Figures 2.9d and 2.10d are simple and, amazingly, have a similar shape. Because the distribution of Ȳ approaches the normal as n grows large, Ȳ is said to be asymptotically normally distributed.
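The central limit theorem's claim can be replicated numerically. In this sketch the skewed population is taken to be exponential with rate 1 (a stand-in assumption; Figure 2.10a uses a different skewed distribution), and the empirical probability that the standardized average falls below 1.96 is compared with the normal value Φ(1.96) ≈ 0.975:

```python
import math
import random
import statistics

random.seed(3)

MU_Y = SIGMA_Y = 1.0  # exponential(1) population: mean 1, standard deviation 1

def standardized_ybar(n):
    """(Y-bar - mu_Y) / (sigma_Y / sqrt(n)) for one sample of size n."""
    ybar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    return (ybar - MU_Y) / (SIGMA_Y / math.sqrt(n))

results = {}
for n in (5, 100):
    draws = [standardized_ybar(n) for _ in range(20_000)]
    results[n] = sum(d <= 1.96 for d in draws) / len(draws)
    print(n, round(results[n], 3))
```

For small n the empirical fraction deviates noticeably from 0.975 because the population is skewed; by n = 100 it is close, just as the figure's caption describes.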
Summary

1. The probabilities with which a random variable takes on different values are summarized by the cumulative distribution function, the probability distribution function (for discrete random variables), and the probability density function (for continuous random variables).
first standardize the variable, then use the standard normal cumulative distribution tabulated in Appendix Table 1.

5. Simple random sampling produces n random observations Y₁, . . . , Yₙ that are independently and identically distributed (i.i.d.).

6. The sample average, Ȳ, varies from one randomly chosen sample to the next and thus is a random variable with a sampling distribution. If Y₁, . . . , Yₙ are i.i.d., then:
   • the sampling distribution of Ȳ has mean μ_Y and variance σ²_Ȳ = σ²_Y/n;
Key Terms

outcomes (18)
probability (18)
sample space (19)
event (19)
discrete random variable (19)
continuous random variable (19)
probability distribution (19)
cumulative probability distribution (19)
cumulative distribution function (c.d.f.) (20)
Review of Concepts
2.1 Examples of random variables used in this chapter included: (a) the gender of the next person you meet, (b) the number of times a computer crashes, (c) the time it takes to commute to school, (d) whether the computer you are assigned in the library is new or old, and (e) whether it is raining or not. Explain why each can be thought of as random.
2.2
Suppose that the random variables X and Y are independent and you know their distributions. Explain why knowing the value of X tells you nothing about the value of Y.
2.3
2.4 An econometrics class has 80 students, and the mean student weight is 145 lbs. A random sample of 4 students is selected from the class and their average weight is calculated. Will the average weight of the students in the sample equal 145 lbs.? Why or why not? Use this example to explain why the sample average, Ȳ, is a random variable.
2.5
Suppose that Y₁, . . . , Yₙ are i.i.d. random variables with a N(1, 4) distribution. Sketch the probability density of Ȳ when n = 2. Repeat this for n = 10 and n = 100. In words, describe how the densities differ. What is the relationship between your answer and the law of large numbers?
2.6
Suppose that Y₁, . . . , Yₙ are i.i.d. random variables with the probability distribution given in Figure 2.10a. You want to calculate Pr(Ȳ ≤ 0.1). Would it be reasonable to use the normal approximation if n = 5? What about n = 25 or n = 100? Explain.
2.7
Exercises
2.1
Let Y denote the number of "heads" that occur when two coins are tossed.
a. Derive the probability distribution of Y.
b. Derive the cumulative probability distribution of Y.
c. Derive the mean and variance of Y.

2.2 Use the probability distribution given in Table 2.2 to compute (a) E(Y) and E(X); (b) σ²_X and σ²_Y; and (c) σ_XY and corr(X, Y).

2.3 Using the random variables X and Y from Table 2.2, consider two new random variables W = 3 + 6X and V = 20 − 7Y.

2.4 Suppose X is a Bernoulli random variable with Pr(X = 1) = p.
a. Show that E(X³) = p.
b. Show that E(X^k) = p for k > 0.
c. Suppose that p = 0.3. Compute the mean, variance, skewness, and kurtosis of X. (Hint: You might find it helpful to use the formulas given in Exercise 2.21.)
2.5
2.6
The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population.

                              Unemployed (Y = 0)   Employed (Y = 1)   Total
Non-college grads (X = 0)           0.045               0.709         0.754
College grads (X = 1)               0.005               0.241         0.246
Total                               0.050               0.950         1.000
a. Compute E(Y).
b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by 1 − E(Y).
c. Calculate E(Y | X = 1) and E(Y | X = 0).
d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.
e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?
f. Are educational achievement and employment status independent? Explain.
2.7
u mean nf $40.000 per year and a standard deviation of S12.1KXJ. Fe: male.: ~.:am
ings ha q~ a mean of $45.000 per year and u stnndurJ dc\l<ltlOn ol $1 K.OOO.
The corrcl:lllon between male and r~.malc earnings fm ,, c.ouple b 0.80. Let
C denote the com hi ned earnings for a randomly ~e l ected couple.
a. Wh&t is the mean of C?
b. What i'> the covariance between male and fe male ~..mungs?
c. W hat is the !>Landard dev1ation o( C?
2.8

2.9 X and Y are discrete random variables with the following joint distribution:

                          Value of Y
                 14      22      30      40      65
Value    1      0.02    0.05    0.10    0.03    0.01
of X     5      0.17    0.15    0.05    0.02    0.01
         8      0.02    0.03    0.15    0.10    0.09

That is, Pr(X = 1, Y = 14) = 0.02, and so forth.

c. Calculate the covariance and correlation between X and Y.
b. If Y is distributed t₉₀, find Pr(−1.99 ≤ Y ≤ 1.99).
c. If Y is distributed N(0, 1), find Pr(−1.99 ≤ Y ≤ 1.99).
d. Why are the answers to (b) and (c) approximately the same?
e. If Y is distributed F₇,₃₁, find Pr(Y > 4.12).
X is a Bernoulli random variable with Pr(X = 1) = 0.99; Y is distributed N(0, 1); W is distributed N(0, 100); and X, Y, and W are independently distributed. Let S = XY + (1 − X)W.
c. Show that E(Y⁴) = 3 and E(W⁴) = 3 × 100². (Hint: Use the fact that the kurtosis is 3 for a normal distribution.)
d. Derive E(S), E(S²), E(S³), and E(S⁴). (Hint: Use the law of iterated expectations, conditioning on X = 0 and X = 1.)
2.14 In a population, μ_Y = 100 and σ²_Y = 43. Use the central limit theorem to answer the following questions:
a. In a random sample of size n = 100, find Pr(Ȳ < 101).

2.15 Suppose Yᵢ, i = 1, 2, . . . , n, are i.i.d. random variables, each distributed N(10, 4).
a. Compute Pr(9.6 ≤ Ȳ ≤ 10.4) when (i) n = 20, (ii) n = 100, and (iii) n = 1,000.
b. Suppose c is a positive number. Show that Pr(10 − c ≤ Ȳ ≤ 10 + c) becomes close to 1.0 as n grows large.
2.16 Y is distributed N(5, 100) and you want to calculate Pr(Y < 3.6). Unfortunately, you do not have your textbook and do not have access to a normal probability table like Appendix Table 1. However, you do have your computer and a computer program that can generate i.i.d. draws from the N(5, 100) distribution. Explain how you can use your computer to compute an accurate approximation for Pr(Y < 3.6).
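The computational idea this exercise has in mind is Monte Carlo simulation: generate many i.i.d. draws and use the fraction falling below 3.6 as the estimate. One possible sketch, not a unique answer (note that the second argument of random.gauss is the standard deviation, 10, not the variance, 100):

```python
import random

random.seed(4)

def mc_prob(n_draws):
    """Estimate Pr(Y < 3.6) for Y ~ N(5, 100) by Monte Carlo."""
    hits = sum(random.gauss(5.0, 10.0) < 3.6 for _ in range(n_draws))
    return hits / n_draws

print(round(mc_prob(1_000_000), 3))
```

By the law of large numbers the estimate converges to the true value, Φ((3.6 − 5)/10) = Φ(−0.14) ≈ 0.444.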
2.17 Yᵢ, i = 1, . . . , n, are i.i.d. Bernoulli random variables with p = 0.4. Let Ȳ denote the sample mean.
2.18 In any year, the weather can inflict storm damage to a home. From year to year, the damage is random. Let Y denote the dollar value of damage in any given year. Suppose that in 95% of the years Y = $0, but in 5% of the years Y = $20,000.
a. What is the mean and standard deviation of the damage in any year?
b. Consider an "insurance pool" of 100 people whose homes are sufficiently dispersed so that, in any year, the damage to different homes can be viewed as independently distributed random variables. Let Ȳ denote the average damage to these 100 homes in a year. (i) What is the expected value of the average damage Ȳ? (ii) What is the probability that Ȳ exceeds $2,000?
2.19 Consider two random variables X and Y. Suppose that Y takes on k values y₁, . . . , y_k and that X takes on l values x₁, . . . , x_l.
a. Show that Pr(Y = y_j) = Σ_{i=1}^l Pr(Y = y_j | X = x_i) Pr(X = x_i). [Hint: Use the definition of Pr(Y = y_j | X = x_i).]
b. Use your answer to (a) to verify Equation (2.19).
c. Suppose that X and Y are independent. Show that σ_XY = 0 and corr(X, Y) = 0.
2.20 Consider three random variables X, Y, and Z. Suppose that Y takes on k values y₁, . . . , y_k, that X takes on l values x₁, . . . , x_l, and that Z takes on m values z₁, . . . , z_m. The joint probability distribution of X, Y, Z is Pr(X = x, Y = y, Z = z), and the conditional probability distribution of Y given X and Z is

Pr(Y = y | X = x, Z = z) = Pr(Y = y, X = x, Z = z) / Pr(X = x, Z = z).

a. Explain how the marginal probability that Y = y can be calculated from the joint probability distribution. [Hint: This is a generalization of Equation (2.16).]
2.21 X is a random variable with moments E(X), E(X²), E(X³), and so forth.
a. Show that E(X − μ)³ = E(X³) − 3[E(X²)][E(X)] + 2[E(X)]³.
b. Show that E(X − μ)⁴ = E(X⁴) − 4[E(X)][E(X³)] + 6[E(X)]²[E(X²)] − 3[E(X)]⁴.
2.22 Suppose you have some money to invest, for simplicity $1, and you are planning to put a fraction w into a stock market mutual fund and the rest, 1 − w, into a bond mutual fund. Suppose that $1 invested in a stock fund yields R_s after one year and that $1 invested in a bond fund yields R_b. Suppose that R_s is random with mean 0.08 (8%) and standard deviation 0.07, and that R_b is random with mean 0.05 (5%) and standard deviation 0.04. The correlation between R_s and R_b is 0.25. If you place a fraction w of your money in the stock fund and the rest, 1 − w, in the bond fund, then the return on your investment is R = wR_s + (1 − w)R_b.
a. Suppose that w = 0.5. Compute the mean and standard deviation of R.
d. (Harder) What is the value of w that minimizes the standard deviation of R? (You can show this using a graph, algebra, or calculus.)
2.23 This exercise provides an example of a pair of random variables X and Y for which the conditional mean of Y given X depends on X but corr(X, Y) = 0. Let X and Z be two independently distributed standard normal random variables, and let Y = X² + Z.
a. Show that E(Y | X) = X².
b. Show that μ_Y = 1.
c. Show that E(XY) = 0. (Hint: Use the fact that the odd moments of a standard normal random variable are all zero.)
d. Show that cov(X, Y) = 0 and thus corr(X, Y) = 0.
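A simulation makes the point of Exercise 2.23 concrete before proving it analytically: the conditional mean of Y plainly moves with X, yet the correlation is approximately zero. This is illustrative only; the exercise itself asks for analytic arguments:

```python
import random
import statistics

random.seed(6)

xs, ys = [], []
for _ in range(100_000):
    x = random.gauss(0.0, 1.0)
    z = random.gauss(0.0, 1.0)
    xs.append(x)
    ys.append(x * x + z)          # Y = X^2 + Z

# Sample correlation, computed from the chapter's definitions.
mx, my = statistics.fmean(xs), statistics.fmean(ys)
cov = statistics.fmean((x - mx) * (y - my) for x, y in zip(xs, ys))
corr = cov / (statistics.pstdev(xs) * statistics.pstdev(ys))
print(round(corr, 3))             # near 0

# Yet the conditional mean E(Y | X) = X^2 clearly depends on X:
far = statistics.fmean(y for x, y in zip(xs, ys) if abs(x) > 1.5)
near = statistics.fmean(y for x, y in zip(xs, ys) if abs(x) < 0.5)
print(round(far, 2), round(near, 2))
```

The sample mean of Y also lands near 1, consistent with part (b).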
2.24 Suppose Yᵢ is distributed i.i.d. N(0, σ²) for i = 1, 2, . . . , n.
a. Show that E(Yᵢ²/σ²) = 1.
b. Show that W = (1/σ²) Σ_{i=1}^n Yᵢ² is distributed χ²ₙ.
c. Show that E(W) = n.
d. Show that V = Y₁ / √( Σ_{i=2}^n Yᵢ² / (n − 1) ) is distributed t₍ₙ₋₁₎.
APPENDIX 2.1  Derivation of Results in Key Concept 2.3

To derive the expression for the variance of aX + bY in Key Concept 2.3, write

var(aX + bY) = E{[(aX + bY) − (aμ_X + bμ_Y)]²}
             = E{[a(X − μ_X) + b(Y − μ_Y)]²}
             = E[a²(X − μ_X)²] + 2E[ab(X − μ_X)(Y − μ_Y)] + E[b²(Y − μ_Y)²]
             = a²σ²_X + 2abσ_XY + b²σ²_Y,   (2.49)
where the second equality follows by collecting terms, the third equality follows by expanding the quadratic, and the fourth equality follows by the definitions of the variance and the covariance.

To derive the expression for cov(a + bX + cV, Y) in Key Concept 2.3, use the definition of the covariance to write

cov(a + bX + cV, Y) = E{[a + bX + cV − E(a + bX + cV)][Y − μ_Y]}
                    = E{[b(X − μ_X) + c(V − μ_V)][Y − μ_Y]}
                    = E[b(X − μ_X)(Y − μ_Y)] + E[c(V − μ_V)(Y − μ_Y)]
                    = bσ_XY + cσ_VY.   (2.50)

To derive the expression for E(XY), write

E(XY) = E{[(X − μ_X) + μ_X][(Y − μ_Y) + μ_Y]}
      = E[(X − μ_X)(Y − μ_Y)] + μ_X E(Y − μ_Y) + μ_Y E(X − μ_X) + μ_Xμ_Y
      = σ_XY + μ_Xμ_Y,

because E(X − μ_X) = 0 and E(Y − μ_Y) = 0.
We now prove the correlation inequality in Equation (2.35); that is, |corr(X, Y)| ≤ 1. Let a = −σ_XY/σ²_X and b = 1. Applying Equation (2.49), we have

var(aX + Y) = a²σ²_X + σ²_Y + 2aσ_XY
            = (−σ_XY/σ²_X)²σ²_X + σ²_Y + 2(−σ_XY/σ²_X)σ_XY
            = σ²_Y − σ²_XY/σ²_X.   (2.51)

Because var(aX + Y) is a variance, it cannot be negative, so from Equation (2.51) it follows that σ²_Y − σ²_XY/σ²_X ≥ 0. Rearranging this inequality yields

σ²_XY ≤ σ²_X σ²_Y   (covariance inequality).   (2.52)

The covariance inequality implies that σ²_XY/(σ²_X σ²_Y) ≤ 1 or, equivalently, |σ_XY/(σ_X σ_Y)| ≤ 1, which (using the definition of the correlation) proves the correlation inequality, |corr(X, Y)| ≤ 1.
CHAPTER 3

Review of Statistics

Statistics is the science of using data to learn about the world around us. Statistical tools help us answer questions about unknown characteristics of distributions in populations of interest. What is the mean of the distribution of earnings of recent college graduates?

The only comprehensive survey of the U.S. population is the decennial census. The 2000 U.S. Census cost $10 billion, and the process of designing the census forms, managing and conducting the surveys, and compiling and analyzing the data takes ten years. Despite this extraordinary commitment, many members of the population slip through the cracks and are not surveyed. Thus a different, more practical approach is needed.

The key insight of statistics is that one can learn about a population distribution by selecting a random sample from that population. Rather than survey the entire U.S. population, we might survey, say, 1000 members of the population, selected at random. Using statistical methods, we can use this sample to reach tentative conclusions, to draw statistical inferences, about characteristics of the full population.
Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals. Hypothesis testing entails formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true. Confidence intervals use a set of data to estimate an interval or range for an unknown population characteristic. Sections 3.1, 3.2, and 3.3 review estimation, hypothesis testing, and confidence intervals in the context of statistical inference about an unknown population mean.

Most of the interesting questions in economics involve relationships between two or more variables.

3.1 Estimation of the Population Mean
Unbiasedness.
Let μ̃_Y be another estimator of μ_Y, and suppose that both μ̂_Y and μ̃_Y are unbiased. Then μ̂_Y is said to be more efficient than μ̃_Y if var(μ̂_Y) < var(μ̃_Y).
Consistency. Another desirable property of an estimator μ̂_Y is that, when the sample size is large, the uncertainty about the value of μ_Y arising from random variations in the sample is very small. Stated more precisely, a desirable property of μ̂_Y is that the probability that it is within a small interval of the true value μ_Y approaches one as the sample size increases; that is, μ̂_Y is consistent for μ_Y (Key Concept 2.6).
Properties of Ȳ

How does Ȳ fare as an estimator of μ_Y when judged by the three criteria of bias, consistency, and efficiency?

Bias and consistency. The sampling distribution of Ȳ has already been examined in Sections 2.5 and 2.6. As shown in Section 2.5, E(Ȳ) = μ_Y, so Ȳ is an unbiased estimator of μ_Y. Similarly, the law of large numbers (Key Concept 2.6) states that Ȳ →p μ_Y; that is, Ȳ is consistent.
Efficiency. What can be said about the efficiency of Ȳ? Because efficiency entails a comparison of estimators, we need to specify the estimator or estimators to which Ȳ is to be compared.

We start by comparing the efficiency of Ȳ to the estimator Y₁. Because Y₁, . . . , Yₙ are i.i.d., the mean of the sampling distribution of Y₁ is E(Y₁) = μ_Y; thus Y₁ is an unbiased estimator of μ_Y. Its variance is var(Y₁) = σ²_Y. From Section 2.5, the variance of Ȳ is σ²_Y/n. Thus, for n ≥ 2, the variance of Ȳ is less than the variance of Y₁; that is, Ȳ is a more efficient estimator than Y₁, so, according to the criterion of efficiency, Ȳ should be used instead of Y₁. The estimator Y₁ might strike you as an obviously poor estimator (why would you go to the trouble of collecting a sample of n observations only to throw away all but the first?), and the concept of efficiency provides a formal way to show that Ȳ is a more desirable estimator than Y₁.
What about a less obviously poor estimator? Consider the weighted average in which the observations are alternately weighted by ½ and 3/2:

Ỹ = (1/n)( ½Y₁ + (3/2)Y₂ + ½Y₃ + (3/2)Y₄ + · · · + ½Y₍ₙ₋₁₎ + (3/2)Yₙ ),   (3.1)

where the number of observations n is assumed to be even for convenience.
Ȳ is the least squares estimator of μ_Y. The sample average Ȳ provides the best fit to the data in the sense that the average squared differences between the observations and Ȳ are the smallest of all possible estimators.
KEY CONCEPT 3.3  EFFICIENCY OF Ȳ: Ȳ IS BLUE

Let μ̂_Y be an estimator of μ_Y that is a weighted average of Y₁, . . . , Yₙ, that is, μ̂_Y = (1/n) Σ_{i=1}^n aᵢYᵢ, where a₁, . . . , aₙ are nonrandom constants. If μ̂_Y is unbiased, then var(Ȳ) < var(μ̂_Y) unless μ̂_Y = Ȳ. Thus Ȳ is the Best Linear Unbiased Estimator (BLUE); that is, Ȳ is the most efficient estimator of μ_Y among all unbiased estimators that are weighted averages of Y₁, . . . , Yₙ.
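Key Concept 3.3 can be checked by simulating the three estimators discussed in this section: Ȳ, the first observation Y₁, and the alternately weighted average Ỹ of Equation (3.1). A sketch with a hypothetical N(0, 1) population and n = 100:

```python
import random
import statistics

random.seed(5)

N = 100  # sample size (even, so the 1/2 and 3/2 weights pair up)
WEIGHTS = [0.5 if i % 2 == 0 else 1.5 for i in range(N)]

ybar_draws, y1_draws, ytilde_draws = [], [], []
for _ in range(20_000):
    ys = [random.gauss(0.0, 1.0) for _ in range(N)]
    ybar_draws.append(statistics.fmean(ys))
    y1_draws.append(ys[0])
    ytilde_draws.append(sum(w * y for w, y in zip(WEIGHTS, ys)) / N)

# All three are unbiased, but the variances differ:
# var(Y1) = sigma^2, var(Ytilde) = 1.25 sigma^2/n, var(Ybar) = sigma^2/n.
v_ybar = statistics.variance(ybar_draws)
v_ytilde = statistics.variance(ytilde_draws)
v_y1 = statistics.variance(y1_draws)
print(round(v_ybar, 4), round(v_ytilde, 4), round(v_y1, 2))
```

The simulated variances line up as var(Ȳ) < var(Ỹ) < var(Y₁): the weighted average is unbiased and consistent but less efficient than Ȳ, exactly as the BLUE result requires.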
Σ_{i=1}^n (Yᵢ − m)²,   (3.2)

which is a measure of the total squared gap or distance between the estimator m and the sample points. Because m is an estimator of E(Y), you can think of it as a prediction of the value of Yᵢ, so that the gap Yᵢ − m can be thought of as a prediction mistake. The sum of squared gaps in expression (3.2) can be thought of as the sum of squared prediction mistakes.

The estimator m that minimizes the sum of squared gaps Yᵢ − m in expression (3.2) is called the least squares estimator. One can imagine using trial and error to solve the least squares problem: Try many values of m until you are satisfied that you have the value that makes expression (3.2) as small as possible. Alternatively, as is done in Appendix 3.2, you can use algebra or calculus to show that choosing m = Ȳ minimizes the sum of squared gaps in expression (3.2), so that Ȳ is the least squares estimator of μ_Y.
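The trial-and-error idea in the last paragraph can be carried out literally: evaluate expression (3.2) over a fine grid of candidate values of m and compare the minimizer with the sample average. A sketch with a small made-up data set:

```python
import statistics

ys = [3.1, 4.7, 2.2, 5.0, 3.8, 4.1]  # hypothetical observations

def sum_sq_gaps(m):
    """Expression (3.2): the total squared gap between m and the sample points."""
    return sum((y - m) ** 2 for y in ys)

# Trial and error over a fine grid, exactly as the text describes.
grid = [i / 1000 for i in range(8001)]   # candidate values of m in [0, 8]
m_star = min(grid, key=sum_sq_gaps)

print(m_star, round(statistics.fmean(ys), 3))  # both are 3.817 (to grid precision)
```

The grid minimizer coincides with Ȳ up to the grid spacing, which is what the calculus argument in Appendix 3.2 establishes exactly.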
"Landon Wins!"

Shortly before the 1936 Presidential election, the Gazette published a poll, based on automobile and telephone records, predicting that Alf Landon would defeat Franklin Roosevelt; the prediction proved famously wrong. In 1936 many households did not have cars or telephones, and those that did tended to be richer, and were also more likely to be Republican. Because the survey did not sample randomly from the population but instead undersampled Democrats, the estimator was biased and the Gazette made an embarrassing mistake.

Do you think surveys conducted over the Internet might have a similar problem with bias?
3.2 Hypothesis Tests Concerning the Population Mean

Many hypotheses about the world around us can be phrased as yes/no questions. Do the mean hourly earnings of recent U.S. college graduates equal $20/hour? Are mean earnings the same for male and female college graduates? Both these questions embody specific hypotheses about the population distribution of earnings. The statistical challenge is to answer these questions based on a sample of evidence. This section describes testing hypotheses concerning the population mean (Does the population mean of hourly earnings equal $20?). Hypothesis tests involving two populations are taken up in Section 3.4.
The null hypothesis is that the population mean, E(Y), takes on a specific value, denoted μ_Y,0:

H₀: E(Y) = μ_Y,0.   (3.3)

For example, the conjecture that, on average in the population, college graduates earn $20/hour constitutes a null hypothesis about the population distribution of hourly earnings. Stated mathematically, if Y is the hourly earning of a randomly selected recent college graduate, then the null hypothesis is that E(Y) = 20; that is, μ_Y,0 = 20 in Equation (3.3).

The alternative hypothesis specifies what is true if the null hypothesis is not. The most general alternative hypothesis is that E(Y) ≠ μ_Y,0, which is called a two-sided alternative hypothesis:

H₁: E(Y) ≠ μ_Y,0   (two-sided alternative).   (3.4)

One-sided alternatives are also possible, and these are discussed later in this section.
The p-value is the probability of drawing a sample average at least as different from μ_Y,0, under the null hypothesis, as the value actually computed in your data set:

p-value = Pr_H₀[ |Ȳ − μ_Y,0| > |Ȳ^act − μ_Y,0| ],   (3.5)

where Ȳ^act is the value of the sample average actually computed in the data set at hand and Pr_H₀ denotes the probability computed under the null hypothesis. That is, the p-value is the area in the tails of the distribution of Ȳ under the null hypothesis beyond |Ȳ^act − μ_Y,0|. If the p-value is large, then the observed value Ȳ^act is consistent with the null hypothesis, but if the p-value is small, it is not.
To compute the p-value, it is necessary to know the sampling distribution of Ȳ under the null hypothesis. As discussed in Section 2.6, when the sample size is small this distribution is complicated. However, according to the central limit theorem, when the sample size is large the sampling distribution of Ȳ is well approximated by a normal distribution. Under the null hypothesis the mean of this normal distribution is μ_Y,0, so under the null hypothesis Ȳ is distributed N(μ_Y,0, σ²_Ȳ), where σ²_Ȳ = σ²_Y/n.
FIGURE 3.1  Calculating a p-value

Under the large-sample normal approximation, the p-value of Equation (3.5) is

p-value = 2Φ( −|Ȳ^act − μ_Y,0| / σ_Ȳ ),   (3.6)
where Φ is the standard normal cumulative distribution function. That is, the p-value is the area in the tails of a standard normal distribution outside ±|Ȳ^act − μ_Y,0|/σ_Ȳ.

The formula in Equation (3.6) depends on σ_Ȳ, which in turn depends on σ_Y. Because in general σ_Y is unknown, σ_Ȳ must be estimated before the p-value can be computed. We now turn to the problem of estimating σ_Y.
The sample variance, s²_Y, is

s²_Y = (1/(n − 1)) Σ_{i=1}^n (Yᵢ − Ȳ)².   (3.7)
The sample standard deviation s_Y is the square root of the sample variance.

The formula for the sample variance is much like the formula for the population variance. The population variance, E(Y − μ_Y)², is the average value of (Y − μ_Y)² in the population distribution. Similarly, the sample variance is the sample average of (Yᵢ − μ_Y)², i = 1, . . . , n, with two modifications: First, μ_Y is replaced by Ȳ, and second, the average uses the divisor n − 1 instead of n.

The reason for the first modification is that μ_Y is unknown and thus must be estimated; the natural estimator of μ_Y is Ȳ. The reason for the second modification, dividing by n − 1 instead of by n, is that estimating μ_Y by Ȳ introduces a small downward bias in (Yᵢ − Ȳ)². Specifically, as shown in Exercise 3.18, E[(Yᵢ − Ȳ)²] = [(n − 1)/n]σ²_Y. Thus E[Σ_{i=1}^n (Yᵢ − Ȳ)²] = (n − 1)σ²_Y. Dividing by n − 1 in Equation (3.7) instead of n corrects for this small downward bias, and as a result s²_Y is unbiased.

Dividing by n − 1 in Equation (3.7) instead of n is called a degrees-of-freedom correction: Estimating the mean uses up some of the information, that is, uses up one "degree of freedom," in the data, so that only n − 1 degrees of freedom remain.
KEY CONCEPT 3.4  The Standard Error of Ȳ

The standard error of Ȳ is an estimator of the standard deviation of Ȳ. The standard error of Ȳ is denoted SE(Ȳ) or σ̂_Ȳ. When Y₁, . . . , Yₙ are i.i.d.,

SE(Ȳ) = σ̂_Ȳ = s_Y/√n.   (3.8)
When σ_Y is unknown and Y₁, . . . , Yₙ are i.i.d., the p-value is calculated using the formula

p-value = 2Φ( −|Ȳ^act − μ_Y,0| / SE(Ȳ) ).   (3.10)
The t-Statistic

The standardized sample average (Ȳ − μ_Y,0)/SE(Ȳ) plays a central role in testing statistical hypotheses and has a special name, the t-statistic or t-ratio:

t = (Ȳ − μ_Y,0) / SE(Ȳ).   (3.11)

The t-statistic actually computed from the data is

t^act = (Ȳ^act − μ_Y,0) / SE(Ȳ).   (3.13)
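Equations (3.7), (3.8), and (3.13) chain together into a complete large-sample test. A sketch with hypothetical hourly-earnings data, testing H₀: E(Y) = 20 against the two-sided alternative:

```python
import math
import statistics

def phi(z):
    """Standard normal c.d.f. Φ(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def t_test(ys, mu_0):
    """Return (t-statistic, p-value) for H0: E(Y) = mu_0, two-sided alternative."""
    n = len(ys)
    se = statistics.stdev(ys) / math.sqrt(n)  # SE(Ybar) = s_Y / sqrt(n), Equation (3.8)
    t = (statistics.fmean(ys) - mu_0) / se    # Equation (3.13)
    p_value = 2.0 * phi(-abs(t))              # area in both tails of N(0, 1)
    return t, p_value

# Hypothetical sample of hourly earnings; H0: mean earnings are $20/hour.
sample = [22.1, 18.5, 25.0, 19.4, 21.2, 23.8, 17.9, 24.5, 20.7, 22.9]
t, p = t_test(sample, 20.0)
print(round(t, 2), round(p, 3))  # 2.03 0.042
```

With a sample this small the normal approximation is rough (the text's large-n caveat applies); the sketch only illustrates the arithmetic of the test.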
Hypothesis Testing
with a Prespecified Significance Level
Whe n you unde rtake u statistical hypothesis test. you cun make two types of mis
takes: You can incorrectly reject tbe null hypothesis whe n it i!) true, o r you can fail
to reject the null hypothesis when it is false. Hypo thesis tests can be performed
without computing the p-value if you are willing lo specify in ad vance the pcobability you arc willing to tolerate of making the first kind of mistake- tha i is. of
incorrectly re'cctin the null h the. is when it is true. If ym1choose a prespecific<.l pro a 1 tty of rejecting the null hypotheSIS when it IS true (fo r e xample,5%).
the n you wiU reject the null hypothesis i ( and only if the p -value is less than 0.05.
This approach gives preferential treatment to the null hypothesis, hut in many
practical situations this preferential treatme nt is appropria te.
Suppose it has be ~ n
decided that the hypothesis will be rejected if the p-va!ye '5'' S !IWO 5%. Because
er the lails of tbe normal distribution outside =: 1.96 i" 5%, this g1vl!s
a simple rule:
(3.15)
1bat is, reject if the absolute value of the 1-statislic computt:d from the sample i!'.
greater than 1.96. H 11 is large e nough, then under the null hypothesis the
t-statistic bas a N(O. 1) distribution. Thus, the probability of erroneously rejectmg
the null hypoth esis (rejecting the ouU hypothesis whe n it is in fact rrue) is 5%.
This framework for testing statistical hypotheses has some specialized terminology, summarized in Key Concept 3.5. The significance level of the test in Equation (3.15) is 5%, the critical value of this two-sided test is 1.96, and the rejection region is the values of the t-statistic outside ±1.96. If the test rejects at the 5% significance level, the population mean μY is said to be statistically significantly different from μY,0 at the 5% significance level.

Testing hypotheses using a prespecified significance level does not require computing p-values. In the previous example of testing the hypothesis that the mean earnings of recent college graduates are $20, the t-statistic was 2.06. This exceeds 1.96, so the hypothesis is rejected at the 5% level. Although performing the test with a 5% significance level is easy, reporting only whether the null
the test. The critical value of the test statistic is the value of the statistic for which the test just rejects the null hypothesis at the given significance level. The set of values of the test statistic for which the test rejects the null hypothesis is the rejection region, and the values of the test statistic for which it does not reject the null hypothesis is the acceptance region. The probability that the test actually incorrectly rejects the null hypothesis when it is true is the size of the test, and the probability that the test correctly rejects the null hypothesis when the alternative is true is the power of the test.
In many cases, statisticians and econometricians use a 5% significance level. If you were to test many statistical hypotheses at the 5% level, you would incorrectly reject the null on average once in 20 cases. Sometimes a more conservative significance level might be in order. For example, legal cases sometimes involve statistical evidence, and the null hypothesis could be that the defendant is not guilty; then one would want to be quite sure that a rejection of the null (conclusion of guilt) is not just a result of random sample variation. In some legal settings the significance level used is 1% or even 0.1%, to avoid this sort of mistake. Similarly, if a government agency is considering permitting the sale of a new drug, a very conservative standard might be in order so that consumers can be sure that the drugs available in the market actually work.
Being conservative, in the sense of using a very low significance level, has a cost: The smaller the significance level, the larger the critical value, and the more difficult it becomes to reject the null when the null is false. In fact, the most conservative thing to do is never to reject the null hypothesis; but if that is your view, then you never need to look at any statistical evidence, for you will never change your mind! The lower the significance level, the lower the power of the test. Many economic and policy applications can call for less conservatism than a legal case, so a 5% significance level is often considered to be a reasonable compromise.
Key Concept 3.6 summarizes hypothesis tests for the population mean against
the two-sided alternative.
One-Sided Alternatives

In some circumstances, the alternative hypothesis might be that the mean exceeds μY,0. For example, one hopes that education helps in the labor market, so the relevant alternative to the null hypothesis that earnings are the same for college graduates and nongraduates is not just that their earnings differ, but rather that graduates earn more than nongraduates. The relevant alternative is one-sided:

H1: E(Y) > μY,0. (3.16)
calculated t-statistic. That is, the p-value, based on the N(0, 1) approximation to the distribution of the t-statistic, is

p-value = Pr(Z > t^act) = 1 − Φ(t^act). (3.17)

The N(0, 1) critical value for a one-sided test with a 5% significance level is 1.645. The rejection region for this test is all values of the t-statistic exceeding 1.645.
The one-sided hypothesis in Equation (3.16) concerns values of μY exceeding μY,0. If instead the alternative hypothesis is that E(Y) < μY,0, then the discussion of the previous paragraph applies except that the signs are switched; for example, the 5% rejection region consists of values of the t-statistic less than −1.645.
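A minimal sketch of the one-sided calculation in Equation (3.17), reusing the t-statistic of 2.06 from the chapter's earnings example:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# One-sided test of H0: mu_Y = mu_{Y,0} against H1: mu_Y > mu_{Y,0}.
t_act = 2.06                      # value from the earnings example
p_one_sided = 1.0 - phi(t_act)    # Equation (3.17): Pr(Z > t_act)
print(round(p_one_sided, 3))      # about 0.02
print(t_act > 1.645)              # True: reject at the 5% significance level
```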
3.3 Confidence Intervals for the Population Mean
Because of random sampling error, it is impossible to learn the exact value of the population mean of Y using only the information in a sample. However, it is possible to use data from a random sample to construct a set of values that contains the true population mean μY with a certain prespecified probability. Such a set is called a confidence set, and the prespecified probability that μY is contained in this set is called the confidence level. The confidence set for μY turns out to be all the possible values of the mean between a lower and an upper limit, so that the confidence set is an interval, called a confidence interval.
Here is one way to construct a 95% confidence set for the population mean. Begin by picking some arbitrary value for the mean; call it μY,0. Test the null hypothesis that μY = μY,0 against the alternative that μY ≠ μY,0 by computing the t-statistic; if its absolute value is less than 1.96, this hypothesized value μY,0 is not rejected at the 5% level, so write down this nonrejected value μY,0. Now pick another arbitrary value of μY,0 and test it; if you cannot reject it, write this value down on your list. Do this again and again; indeed, keep doing this for all possible values of the population mean. Continuing this process yields the set of all values of the population mean that cannot be rejected at the 5% level by a two-sided hypothesis test.
This list is useful because it summarizes the set of hypotheses you can and cannot reject (at the 5% level) based on your data: If someone walks up to you with a specific number in mind, you can tell him whether his hypothesis is rejected or not simply by looking up his number on your handy list. A bit of clever reasoning shows that this set of values has a remarkable property: The probability that it contains the true value of the population mean is 95%.
90% confidence interval for μY = {Ȳ ± 1.64SE(Ȳ)};
99% confidence interval for μY = {Ȳ ± 2.58SE(Ȳ)}.
The clever reasoning goes like this. Suppose the true value of μY is 21.5 (although we do not know this). Then Ȳ has a normal distribution centered on 21.5, and the t-statistic testing the null hypothesis μY = 21.5 has a N(0, 1) distribution. Thus, if n is large, the probability of rejecting the null hypothesis μY = 21.5 at the 5% level is 5%. But because you tested all possible values of the population mean in constructing your set, in particular you tested the true value, μY = 21.5. In 95% of all samples, you will correctly accept 21.5; this means that in 95% of all samples, your list will contain the true value of μY. Thus, the values on your list constitute a 95% confidence set for μY.
This method of constructing a confidence set is impractical, for it requires you to test all possible values of μY as null hypotheses. Fortunately, there is a much easier approach. According to the formula for the t-statistic in Equation (3.13), a trial value of μY,0 is rejected at the 5% level if it is more than 1.96 standard errors away from Ȳ. Thus the set of values of μY that are not rejected at the 5% level consists of those values within ±1.96SE(Ȳ) of Ȳ. That is, a 95% confidence interval for μY is Ȳ − 1.96SE(Ȳ) ≤ μY ≤ Ȳ + 1.96SE(Ȳ). Key Concept 3.7 summarizes this approach.
As an example, consider the problem of constructing a 95% confidence interval for the mean hourly earnings of recent college graduates using a hypothetical random sample of 200 recent college graduates where Ȳ = $22.64 and SE(Ȳ) = 1.28. The 95% confidence interval for mean hourly earnings is 22.64 ± 1.96 × 1.28 = 22.64 ± 2.51 = [$20.13, $25.15].
This discussion so far has focused on two-sided confidence intervals. One could instead construct a one-sided confidence interval as the set of values of μY that cannot be rejected by a one-sided hypothesis test. Although one-sided confidence intervals have applications in some branches of statistics, they are uncommon in applied econometric analysis.
Coverage probabilities. The coverage probability of a confidence interval for the population mean is the probability, computed over all possible random samples, that it contains the true population mean.
3.4 Comparing Means from Different Populations

Do recent male and female college graduates earn the same amount on average? This question involves comparing the means of two different population distributions. This section summarizes how to test hypotheses and how to construct confidence intervals for the difference in the means from two different populations.
Hypothesis Tests for the Difference Between Two Means

Let μw be the mean hourly earning in the population of women recently graduated from college and let μm be the population mean for recently graduated men. Consider the null hypothesis that earnings for these two populations differ by a certain amount, say d0. Then the null hypothesis and the two-sided alternative hypothesis are

H0: μm − μw = d0 vs. H1: μm − μw ≠ d0. (3.18)

The null hypothesis that men and women in these populations have the same earnings corresponds to H0 in Equation (3.18) with d0 = 0.

Because these population means are unknown, they must be estimated from samples of men and women. Suppose we have samples of nm men and nw women drawn at random from their populations. Let the sample average annual earnings be Ȳm for men and Ȳw for women. Then an estimator of μm − μw is Ȳm − Ȳw.

To test the null hypothesis that μm − μw = d0 using Ȳm − Ȳw, we need to know the distribution of Ȳm − Ȳw. Recall that Ȳm is, according to the central limit theorem, approximately distributed N(μm, σ²m/nm), where σ²m is the population variance of earnings for men. Similarly, Ȳw is approximately distributed N(μw, σ²w/nw),
where σ²w is the population variance of earnings for women. Also, recall from Section 2.4 that a weighted average of two normal random variables is itself normally distributed. Because Ȳm and Ȳw are constructed from different randomly selected samples, they are independent random variables. Thus, Ȳm − Ȳw is distributed N[μm − μw, (σ²m/nm) + (σ²w/nw)].
If σ²m and σ²w are known, then this approximate normal distribution can be used to compute p-values for the test of the null hypothesis that μm − μw = d0. In practice, however, these population variances are typically unknown, so they must be estimated. As before, they can be estimated using the sample variances, s²m and s²w, where s²m is defined as in Equation (3.7), except that the statistic is computed only for the men in the sample, and s²w is defined similarly for the women. Thus the standard error of Ȳm − Ȳw is

SE(Ȳm − Ȳw) = √(s²m/nm + s²w/nw). (3.19)
The t-statistic for testing the null hypothesis is constructed analogously to the t-statistic for testing a hypothesis about a single population mean, by subtracting the null hypothesized value of μm − μw from the estimator Ȳm − Ȳw and dividing the result by the standard error of Ȳm − Ȳw:

t = (Ȳm − Ȳw − d0)/SE(Ȳm − Ȳw). (3.20)

If both nm and nw are large, then this t-statistic has a standard normal distribution.
Because the t-statistic in Equation (3.20) has a standard normal distribution under the null hypothesis when nm and nw are large, the p-value of the two-sided test is computed exactly as it was in the case of a single population; that is, the p-value is computed using Equation (3.14).

To conduct a test with a prespecified significance level, simply calculate the t-statistic in Equation (3.20) and compare it to the appropriate critical value. For example, the null hypothesis is rejected at the 5% significance level if the absolute value of the t-statistic exceeds 1.96.
If the alternative is one-sided rather than two-sided (that is, if the alternative is that μm − μw > d0), then the test is modified as outlined in Section 3.2. The p-value is computed using Equation (3.17), and a test with a 5% significance level rejects when t > 1.645.
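These formulas can be sketched in Python. The inputs below are rounded 2004 sample statistics of the kind reported in Table 3.1, so small discrepancies with the table's printed numbers are expected:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Rounded 2004 sample statistics (as in Table 3.1):
ym, sm, nm = 21.99, 10.39, 1901   # men: mean, std. dev., sample size
yw, sw, nw = 18.47, 8.16, 1739    # women

se = sqrt(sm**2 / nm + sw**2 / nw)       # Equation (3.19)
t = (ym - yw - 0.0) / se                 # Equation (3.20) with d0 = 0
p_value = 2.0 * phi(-abs(t))             # two-sided p-value
ci = (ym - yw - 1.96 * se, ym - yw + 1.96 * se)  # 95% CI for d = mu_m - mu_w

print(round(se, 2))                      # 0.31
print(round(t, 1))                       # about 11.4: reject equal means
print(round(ci[0], 2), round(ci[1], 2))  # roughly 2.92 4.12
```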
Confidence Intervals for the Difference Between Two Population Means

The method for constructing confidence intervals for the population mean developed in Section 3.3 extends to constructing a confidence interval for the difference between the means, d = μm − μw. Because the hypothesized value d0 is rejected at the 5% level
if |t| > 1.96, d0 will be in the confidence set if |t| ≤ 1.96. But |t| ≤ 1.96 means that the estimated difference, Ȳm − Ȳw, is less than 1.96 standard errors away from d0. Thus, the 95% two-sided confidence interval for d consists of those values of d within ±1.96 standard errors of Ȳm − Ȳw:

95% confidence interval for d = μm − μw is (Ȳm − Ȳw) ± 1.96SE(Ȳm − Ȳw). (3.21)
With these formulas in hand, the box "The Gender Gap of Earnings of College Graduates in the U.S." contains an empirical investigation of gender differences in earnings of U.S. college graduates.
3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data

Recall from Section 1.2 that a randomized controlled experiment randomly selects subjects (individuals or, more generally, entities) from a population of interest, then randomly assigns them either to a treatment group, which receives the experimental treatment, or to a control group, which does not receive the treatment. The difference between the sample means of the treatment and control groups is an estimator of the causal effect of the treatment.
TABLE 3.1 (continued)

               Men                        Women                      Difference, Men vs. Women
Year     Ȳm       sm       nm       Ȳw       sw       nw       Ȳm − Ȳw   SE(Ȳm − Ȳw)   95% Confidence Interval for d
1992     20.33    8.70     1592     17.60    6.90     1370     2.73      0.29          2.15–3.30
1996     19.52    8.48     1377     16.72    7.03     1235     2.80      0.30          2.22–3.39
2000     21.77    10.00    1300     18.21    8.20     1182     3.56      0.37          2.83–4.29
2004     21.99    10.39    1901     18.47    8.16     1739     3.52      0.31          2.91–4.13

These estimates are computed using data on all full-time workers aged 25–34 surveyed in the Current Population Survey conducted in March of the next year (for example, the data for 2004 were collected in March 2005). Each difference is significantly different from zero at the 1% significance level.
This empirical analysis documents that the "gender gap" in hourly earnings is large and has been fairly stable (or perhaps increased slightly) over the recent past. The analysis does not, however, tell us why this gap exists. Does it arise from gender discrimination in the labor market? Does it reflect differences in skills, experience, or education between men and women? These questions are taken up in Part II.
unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment. The box "A Novel Way to Boost Retirement Savings" provides an example of such a quasi-experiment that yielded some surprising conclusions.
3.6 Using the t-Statistic When the Sample Size Is Small
approximation to the distribution of the t-statistic. If, however, the population distribution is itself normally distributed, then the exact distribution (that is, the finite-sample distribution; see Section 2.6) of the t-statistic testing the mean of a single population is the Student t distribution with n − 1 degrees of freedom, and critical values can be taken from the Student t distribution.

The t-statistic testing the mean of a single population is

t = (Ȳ − μY,0)/√(s²Y/n). (3.22)
distributed, then the t-statistic in Equation (3.22) has a Student t distribution with n − 1 degrees of freedom.

To verify this result, recall from Section 2.4 that the Student t distribution with n − 1 degrees of freedom is defined to be the distribution of Z/√(W/(n − 1)), where Z is a random variable with a standard normal distribution, W is a random variable with a chi-squared distribution with n − 1 degrees of freedom, and Z and W are independently distributed. When Y1, …, Yn are i.i.d. and the population distribution of Y is N(μY, σ²Y), the t-statistic can be written as such a ratio. Specifically, let Z = (Ȳ − μY,0)/√(σ²Y/n) and let W = (n − 1)s²Y/σ²Y; then some algebra¹ shows that the t-statistic in Equation (3.22) can be written as t = Z/√(W/(n − 1)). Recall from Section 2.4 that if Y1, …, Yn are i.i.d. and the population distribution of Y is N(μY, σ²Y), then the sampling distribution of Ȳ is exactly N(μY, σ²Y/n) for all n; thus, if the null hypothesis μY = μY,0 is correct, then Z = (Ȳ − μY,0)/√(σ²Y/n) has a standard normal distribution for all n. In addition, W = (n − 1)s²Y/σ²Y has a χ²_{n−1} distribution for all n, and Z and W are independently distributed. It follows that, if the population distribution of Y is normal, then under the null hypothesis the t-statistic given in Equation (3.22) has an exact Student t distribution with n − 1 degrees of freedom.
If the population distribution is normally distributed, then critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals. As an example, consider a hypothetical problem in which t^act = 2.15 and n = 20, so that the degrees of freedom are n − 1 = 19. From Appendix Table 2, the 5% two-sided critical value for the t19 distribution is 2.09. Because the t-statistic is larger in absolute value than the critical value (2.15 > 2.09), the null hypothesis would be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for μY, constructed using the t19 distribution, would be Ȳ ± 2.09SE(Ȳ). This confidence interval is somewhat wider than the confidence interval constructed using the standard normal critical value of 1.96.
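The exact-distribution result can be illustrated by simulation. The sketch below (a Monte Carlo illustration, not from the text; the seed and replication count are arbitrary) draws normal samples with n = 20 and compares rejection rates under the t19 critical value 2.09 and the normal critical value 1.96:

```python
import random
from math import sqrt

# When Y is normal and n = 20, the t-statistic of Equation (3.22) has a
# Student t distribution with 19 degrees of freedom, so the 5% two-sided
# test should use the critical value 2.09 rather than 1.96.
random.seed(1)
n, reps = 20, 40000
rej_t_cv = rej_normal_cv = 0
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]  # null: mu_Y = 0 is true
    ybar = sum(y) / n
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    t = ybar / sqrt(s2 / n)        # Equation (3.22) with mu_{Y,0} = 0
    rej_t_cv += abs(t) > 2.09
    rej_normal_cv += abs(t) > 1.96

print(round(rej_t_cv / reps, 2))   # close to 0.05
print(rej_normal_cv > rej_t_cv)    # True: 1.96 rejects too often here
```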
The t-statistic testing differences of means. The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.
¹The desired expression is obtained by multiplying and dividing by √σ²Y and collecting terms:
t = (Ȳ − μY,0)/√(s²Y/n) = [(Ȳ − μY,0)/√(σ²Y/n)] ÷ √(s²Y/σ²Y) = Z/√{[(n − 1)s²Y/σ²Y]/(n − 1)} = Z/√(W/(n − 1)).
Economists have increasingly observed that behavior is not always in accord with conventional economic models. As a consequence, there has been an upsurge in interest in unconventional ways to influence financial decisions, such as how much people save for retirement.
A modified version of the differences-of-means t-statistic, based on a different standard error formula, the "pooled" standard error formula, has an exact Student t distribution when Y is normally distributed; however, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations (Exercise 3.21). Adopt the notation of Equation (3.19), so that the two groups are denoted as m and w. The pooled variance estimator is

s²pooled = [1/(nm + nw − 2)] [ Σ_{i=1}^{nm} (Yi − Ȳm)² + Σ_{i=1}^{nw} (Yi − Ȳw)² ], (3.23)

where the first summation is over the observations in group m and the second summation is over the observations in group w. The pooled standard error of the difference in means is SEpooled(Ȳm − Ȳw) = spooled × √(1/nm + 1/nw), and the pooled t-statistic is computed using Equation (3.20), where the standard error is the pooled standard error, SEpooled(Ȳm − Ȳw).

If the population distribution of Y in group m is N(μm, σ²m), if the population distribution of Y in group w is N(μw, σ²w), and if the two group variances are the same (that is, σ²m = σ²w), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with nm + nw − 2 degrees of freedom.

The drawback of using the pooled variance estimator s²pooled is that it applies only if the two population variances are the same (assuming nm ≠ nw). If the population variances are different, the pooled variance estimator is biased and inconsistent. If the population variances are different but the pooled variance formula is used, the null distribution of the pooled t-statistic is not a Student t distribution, even if the data are normally distributed; in fact, it does not even have a standard normal distribution in large samples. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.
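The equal-group-size case (the subject of Exercise 3.21) can be checked numerically; the sketch below uses made-up normal samples of equal size:

```python
import random
from math import sqrt

# With equal group sizes n_m = n_w, the pooled standard error built from
# Equation (3.23) coincides with the unpooled standard error of Equation
# (3.19); the data here are made up solely to illustrate that identity.
random.seed(2)
n = 50
ym = [random.gauss(10.0, 2.0) for _ in range(n)]
yw = [random.gauss(9.0, 3.0) for _ in range(n)]

def ssq(v):
    """Sum of squared deviations from the sample mean."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v)

se_unpooled = sqrt(ssq(ym) / (n - 1) / n + ssq(yw) / (n - 1) / n)  # Eq. (3.19)

s2_pooled = (ssq(ym) + ssq(yw)) / (n + n - 2)                      # Eq. (3.23)
se_pooled = sqrt(s2_pooled) * sqrt(1.0 / n + 1.0 / n)

print(abs(se_pooled - se_unpooled) < 1e-9)  # True
```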
3.7 Scatterplots, the Sample Covariance, and the Sample Correlation
FIGURE 3.2  Scatterplot of Average Hourly Earnings vs. Age

Each point in the plot represents the age and average earnings of one of the 200 workers in the sample. The colored dot corresponds to a 40-year-old worker who earns $31.25 per hour. The data are for technicians in the information industry from the March 2005 CPS.
Scatterplots

A scatterplot is a plot of n observations on Xi and Yi, in which each observation is represented by the point (Xi, Yi). For example, Figure 3.2 is a scatterplot of age (X) and hourly earnings (Y) for a sample of 200 workers in the information industry from the March 2005 CPS. Each dot in Figure 3.2 corresponds to an (X, Y) pair for one of the observations. For example, one of the workers in this sample is 40 years old and earns $31.25 per hour; this worker's age and earnings are indicated by the colored dot in Figure 3.2. The scatterplot shows a positive relationship between age and earnings in this sample: Older workers tend to earn more than younger workers. This relationship is not exact, however, and earnings could not be predicted perfectly using only a worker's age.
Sample Covariance and Correlation

The sample covariance is

sXY = [1/(n − 1)] Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ). (3.24)

Like the sample variance, the average in Equation (3.24) is computed by dividing by n − 1 instead of n; here, too, this difference stems from using X̄ and Ȳ to estimate the respective population means. When n is large, it makes little difference whether division is by n or n − 1.
The sample correlation coefficient, or sample correlation, is denoted rXY and is the ratio of the sample covariance to the sample standard deviations:

rXY = sXY/(sX sY). (3.25)
The sample correlation measures the strength of the linear association between X and Y in a sample of n observations. Like the population correlation, the sample correlation is unitless and lies between −1 and 1; that is, |rXY| ≤ 1.

The sample correlation equals 1 if Xi = Yi for all i and, more generally, the correlation is ±1 if the scatterplot is a straight line. If the line slopes upward, then there is a positive relationship between X and Y and the correlation is 1. If the line slopes down, then there is a negative relationship and the correlation is −1. The closer the scatterplot is to a straight line, the closer is the correlation to ±1. A high correlation coefficient does not necessarily mean that the line has a steep slope; rather, it means that the points in the scatterplot fall very close to a straight line.
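Equation (3.24) and the sample correlation rXY = sXY/(sX sY) translate directly into code; the short sketch below uses a small made-up data set (the numbers are illustrative only, not from the CPS sample):

```python
from math import sqrt

# Sample covariance (Equation (3.24)) and the sample correlation r_XY.
x = [25, 30, 35, 40, 45, 50]                # made-up ages
y = [12.0, 14.5, 16.0, 31.25, 28.0, 30.5]   # made-up hourly earnings

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
s_x = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
s_y = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
r_xy = s_xy / (s_x * s_y)

print(-1.0 <= r_xy <= 1.0)  # True: the correlation is bounded
print(r_xy > 0.5)           # True: upward-sloping scatter
```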
FIGURE 3.3  Scatterplots for Four Hypothetical Data Sets

The scatterplots in Figures 3.3a and 3.3b show strong linear relationships between X and Y. In Figure 3.3c, X is independent of Y and the two variables are uncorrelated. In Figure 3.3d, the two variables also are uncorrelated even though they are related nonlinearly.
Summary

1. The sample average, Ȳ, is an estimator of the population mean, μY. When Y1, …, Yn are i.i.d.,
a. the sampling distribution of Ȳ has mean μY and variance σ²_Ȳ = σ²Y/n;
b. Ȳ is unbiased;
Key Terms
estimator (68)
estimate (68)
power (79)
causal effect (85)
significance level (78)
3.1 Explain the difference between the sample average Ȳ and the population mean.

3.2 Explain the difference between an estimator and an estimate. Provide an example of each.

3.3

3.4 What role does the central limit theorem play in statistical hypothesis testing? In the construction of confidence intervals?
3.5
3.6
Why does a confidence interval contain more information than the result of
a single hypothesis test?
3.7
3.8 Sketch a hypothetical scatterplot for a sample of size 10 for two random variables with a population correlation of (a) 1.0; (b) −1.0; (c) 0.9; (d) −0.5; (e) 0.0.
Exercises
3.1 In a population, μY = 100 and σ²Y = 43. Use the central limit theorem to answer the following questions:
3.2 Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y1, …, Yn be i.i.d. draws from this distribution. Let p̂ denote the fraction of successes (1s) in this sample.
a. Show that p̂ = Ȳ.
b. Show that p̂ is an unbiased estimator of p.
c. Show that var(p̂) = p(1 − p)/n.
3.3
In a survey of 400 likely voters, 215 responded that they would vote for the incumbent and 185 responded that they would vote for the challenger. Let p denote the fraction of all likely voters who preferred the incumbent at the time of the survey, and let p̂ be the fraction of survey respondents who preferred the incumbent.
What is the p-value for the test of H0: p = 0.5 vs. H1: p ≠ 0.5?
3.4 A survey of 1055 registered voters is conducted, and the voters are asked to choose between candidate A and candidate B. Let p denote the fraction of voters in the population who prefer candidate A, and let p̂ denote the fraction of voters in the sample who prefer candidate A.
a. You are interested in the competing hypotheses H0: p = 0.5 vs. H1: p ≠ 0.5. Suppose that you decide to reject H0 if |p̂ − 0.5| > 0.02.
i. What is the size of this test?
ii. Compute the power of this test if p = 0.53.
b. In the survey, p̂ = 0.54.
i. Test H0: p = 0.5 vs. H1: p ≠ 0.5 using a 5% significance level.
c. Suppose that the survey is carried out 20 times, using independently selected voters in each survey. For each of these 20 surveys, a 95% confidence interval for p is constructed.
i. What is the probability that the true value of p is contained in all 20 of these confidence intervals?
ii. How many of these confidence intervals do you expect to contain the true value of p?
3.7
3.8
A nc\\ 'c:r:.ion ,.,f the SAf test s g.tv~.n to IUOO r.mJoml) ,cJcc:tcd high '-Chool
nior-.. The :-.ample: mean te:>t :.core j<; lllll .tnd the.. -.ampk tanJ:u d de:'"
111m is 23. Con-;truct :t 95.., confidt. nee rntcc\ al fnr the pnpulati<'n mean IC'>l
''-Ole. fort e.h sch('lol seDJ(ln,..
,l
3.9 Suppose that a lightbulb manufacturing plant produces bulbs with a mean life of 2000 hours and a standard deviation of 200 hours. An inventor claims to have developed an improved process that produces bulbs with a longer mean life and the same standard deviation. The plant manager randomly selects 100 bulbs produced by the process. She says that she will believe the inventor's claim if the sample mean life of the bulbs is greater than 2100 hours; otherwise, she will conclude that the new process is no better than the old process. Let μ denote the mean of the new process. Consider the null and alternative hypotheses H0: μ = 2000 vs. H1: μ > 2000.
a. What is the size of the plant manager's testing procedure?
b. Suppose the new process is in fact better and has a mean bulb life of 2150 hours. What is the power of the plant manager's testing procedure?
c. What testing procedure should the plant manager use if she wants the size of her test to be 5%?
3.10 Suppose a new standardized test is given to 100 randomly selected third-grade students in New Jersey. The sample average score Ȳ on the test is 58 points, and the sample standard deviation, sY, is 8 points.
a. The authors plan to administer the test to all third-grade students in New Jersey. Construct a 95% confidence interval for the mean score of all New Jersey third graders.
b. Suppose the same test is given to 200 randomly selected third graders from Iowa, producing a sample average of 62 points and sample standard deviation of 11 points. Construct a 90% confidence interval for the difference in mean scores between Iowa and New Jersey.
c. Can you conclude with a high degree of confidence that the population means for Iowa and New Jersey students are different? (What is the standard error of the difference in the two sample means? What is the p-value of the test of no difference in means versus some difference?)
3.11 Consider the estimator Ỹ defined in Equation (3.1). Show that (a) E(Ỹ) = μY and (b) var(Ỹ) = 1.25σ²Y/n.

3.12 Salaries of men and women working at a firm are summarized in the table below:

           Average Salary (Ȳ)    Standard Deviation (sY)    n
Men        $3100                 $200                       100
Women
a. What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that the wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; third, compute the p-value associated with the t-statistic; and finally, use the p-value to answer the question.)
b. Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.
3.13 Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield Ȳ = 646.2 and standard deviation sY = 19.5.
a. Construct a 95% confidence interval for the mean test score in the population.
b. When the districts were divided into districts with small classes (< 20 students per teacher) and large classes (≥ 20 students per teacher), the following results were found:

Class Size    Average Score (Ȳ)    Standard Deviation (sY)
Small         657.4                19.4
Large         650.0                17.9
3.15 The CNN/USA Today/Gallup poll conducted on September 3–5, 2004, surveyed 755 likely voters; 405 reported a preference for President George W. Bush, and 350 reported a preference for Senator John Kerry. The CNN/USA Today/Gallup poll conducted on October 1–3, 2004, surveyed 756 likely voters; 378 reported a preference for Bush, and 378 reported a preference for Kerry.
b. Is there statistically significant evidence that Florida students perform differently than other students in the United States?
c. Another 503 students are selected at random from Florida. They are given a three-hour preparation course before the test is administered. Their average test score is 1019 with a standard deviation of 95.
i. Construct a 95% confidence interval for the change in average test score associated with the prep course.
ii. Is there statistically significant evidence that the prep course helped?
d. The original 453 students are given the prep course and then asked to take the test a second time. The average change in their test scores is 9 points, and the standard deviation of the change is 60 points.
i. Construct a 95% confidence interval for the change in average test scores.
ii. Is there statistically significant evidence that students will perform better on their second attempt after taking the prep course?
a. Construct a 95% confidence interval for the change in men's average hourly earnings between 1992 and 2004.
b. Construct a 95% confidence interval for the change in women's average hourly earnings between 1992 and 2004.
c. Construct a 95% confidence interval for the change in the gender gap in average hourly earnings between 1992 and 2004. (Hint: Ȳm,1992 − Ȳw,1992 is independent of Ȳm,2004 − Ȳw,2004.)
3.18 This exercise shows that the sample variance is an unbiased estimator of the
population variance when Y1, …, Yn are i.i.d. with mean μY and variance σ²Y.

a. Use Equation (2.31) to show that E[(Yi − Ȳ)²] = var(Yi) − 2cov(Yi, Ȳ) + var(Ȳ).
CHAPTER 3  Review of Statistics
3.21 Show that the pooled standard error [SEpooled(Ȳm − Ȳw)] given following
Equation (3.23) equals the usual standard error for the difference in means
in Equation (3.19) when the two group sizes are the same (nm = nw).
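Exercise 3.21 can be checked numerically. The sketch below implements both standard-error formulas for hypothetical sample standard deviations and group sizes, and shows that they coincide when the group sizes are equal but generally differ otherwise.

```python
import math

def se_usual(s_m, n_m, s_w, n_w):
    # Standard error for the difference in means, Equation (3.19).
    return math.sqrt(s_m**2 / n_m + s_w**2 / n_w)

def se_pooled(s_m, n_m, s_w, n_w):
    # Pooled standard error, Equation (3.23).
    s2_pooled = ((n_m - 1) * s_m**2 + (n_w - 1) * s_w**2) / (n_m + n_w - 2)
    return math.sqrt(s2_pooled * (1 / n_m + 1 / n_w))

# Equal group sizes: the two formulas coincide (the claim of Exercise 3.21).
print(se_usual(4.0, 100, 3.0, 100), se_pooled(4.0, 100, 3.0, 100))
# Unequal group sizes: they generally differ.
print(se_usual(4.0, 100, 3.0, 50), se_pooled(4.0, 100, 3.0, 50))
```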
Empirical Exercise

E3.1 On the text Web site www.aw-bc.com/stock_watson you will find a data file
CPS92_04 that contains an extended version of the dataset used in Table 3.1
of the text for the years 1992 and 2004. It contains data on full-time, full-year
workers, age 25-34, with a high school diploma or B.A./B.S. as their highest
degree. A detailed description is given in CPS92_04_Description, available
on the Web site. Use these data to answer the following questions.

a. Compute the sample mean for average hourly earnings (AHE) in 1992
and in 2004. Construct a 95% confidence interval for the population
means of AHE in 1992 and 2004 and the change between 1992 and
2004.

b. In 2004, the value of the Consumer Price Index (CPI) was 188.9. In
1992, the value of the CPI was 140.3. Repeat (a) but use AHE measured in real 2004 dollars ($2004); that is, adjust the 1992 data for the
price inflation that occurred between 1992 and 2004.

c. If you were interested in the change in workers' purchasing power
from 1992 to 2004, would you use the results from (a) or from (b)?
Explain.
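The price adjustment that part (b) asks for is a rescaling by the ratio of the two CPI values. A minimal sketch; the CPI values are from the exercise, while the $11.00 earnings figure is a hypothetical example.

```python
def to_2004_dollars(ahe_1992, cpi_1992=140.3, cpi_2004=188.9):
    """Convert 1992 nominal hourly earnings into real $2004 using the CPI ratio."""
    return ahe_1992 * (cpi_2004 / cpi_1992)

# Hypothetical 1992 average hourly earnings of $11.00:
print(round(to_2004_dollars(11.00), 2))
```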
d. Use the 2004 data to construct a 95% confidence interval for the
mean of AHE for high school graduates. Construct a 95% confidence
interval for the mean of AHE for workers with a college degree. Construct a 95% confidence interval for the difference between the two
means.

g. Table 3.1 presents information on the gender gap for college graduates. Prepare a similar table for high school graduates using the 1992
and 2004 data. Are there any notable differences between the results
for high school and college graduates?
APPENDIX

3.1  The U.S. Current Population Survey

Each month, the Bureau of Labor Statistics conducts the "Current Population Survey" (CPS), which provides data on labor force characteristics of
the population, including the level of employment, unemployment, and earnings. More than
50,000 U.S. households are surveyed each month. The sample is chosen by randomly selecting addresses from a database of addresses from the most recent decennial census, augmented with data on new housing units constructed after the last census. The exact random
sampling scheme is rather complicated (first, small geographical areas are randomly
selected, then housing units within these areas are randomly selected); details can be found
in the Handbook of Labor Statistics and on the Bureau of Labor Statistics Web site
(www.bls.gov).

The survey conducted each March is more detailed than in other months and asks
questions about earnings during the previous year. The statistics in Table 3.1 were computed
using the March surveys. The CPS earnings data are for full-time workers, defined to be
somebody employed more than 35 hours per week for at least 48 weeks in the previous
year.
APPENDIX

3.2  Two Proofs That Ȳ Is the Least Squares Estimator of E(Y)

This appendix provides two proofs, one using calculus and one not, that Ȳ minimizes the sum of squared prediction mistakes in Equation (3.2), that is, that Ȳ is the least squares estimator of E(Y).

Calculus Proof

To minimize the sum of squared prediction mistakes, take its derivative and set it to zero:

   d/dm Σ (Yi − m)² = −2 Σ (Yi − m) = −2n(Ȳ − m) = 0.

Solving for the unique value of m that sets this derivative to zero shows that Σ (Yi − m)² is minimized when m = Ȳ.
Non-calculus Proof

The strategy is to show that the difference between the least squares estimator and Ȳ must
be zero, from which it follows that Ȳ is the least squares estimator. Let d = Ȳ − m, so that
m = Ȳ − d. Then (Yi − m)² = (Yi − [Ȳ − d])² = ([Yi − Ȳ] + d)² = (Yi − Ȳ)² + 2d(Yi − Ȳ) + d². Thus, the sum of squared prediction mistakes [Equation (3.2)] is

   Σ (Yi − m)² = Σ (Yi − Ȳ)² + 2d Σ (Yi − Ȳ) + nd² = Σ (Yi − Ȳ)² + nd²,   (3.28)

where the second equality uses the fact that Σ (Yi − Ȳ) = 0. Because both terms in the
final line of Equation (3.28) are nonnegative and because the first term does not depend
on d, Σ (Yi − m)² is minimized by choosing d to make the second term, nd², as small as
possible. This is done by setting d = 0, that is, by setting m = Ȳ, so that Ȳ is the least squares
estimator of E(Y).
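Both proofs can be checked numerically: for any small sample, no candidate predictor m beats m = Ȳ, and the decomposition in Equation (3.28) holds exactly. A sketch with hypothetical data:

```python
def sum_sq_mistakes(y, m):
    # Sum of squared prediction mistakes, Equation (3.2).
    return sum((yi - m) ** 2 for yi in y)

y = [2.0, 3.0, 7.0, 8.0]          # small hypothetical sample
ybar = sum(y) / len(y)            # 5.0
# Try a grid of candidate predictors m = ybar + step; none beats the mean.
candidates = [ybar + step / 10 for step in range(-30, 31)]
best = min(candidates, key=lambda m: sum_sq_mistakes(y, m))
print(best == ybar)               # the minimizer on the grid is the sample mean
```

Setting m = Ȳ − d with d = 0.5 adds exactly nd² = 4 × 0.25 = 1 to the minimized sum, as Equation (3.28) predicts.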
APPENDIX

3.3  A Proof That the Sample Variance Is Consistent
This appendix uses the law of large numbers to prove that the sample variance s²Y is a consistent estimator of the population variance σ²Y, as stated in Equation (3.9), when Y1, …, Yn are i.i.d. and E(Yi⁴) < ∞.

First, add and subtract μY to write (Yi − Ȳ)² = [(Yi − μY) − (Ȳ − μY)]² = (Yi − μY)²
− 2(Yi − μY)(Ȳ − μY) + (Ȳ − μY)². Substituting this expression for (Yi − Ȳ)² into the definition of s²Y [Equation (3.7)], we have that

   s²Y = [1/(n − 1)] Σ (Yi − Ȳ)²
       = [1/(n − 1)] Σ (Yi − μY)² − [2/(n − 1)] Σ (Yi − μY)(Ȳ − μY) + [n/(n − 1)](Ȳ − μY)²
       = [n/(n − 1)] { (1/n) Σ (Yi − μY)² } − [n/(n − 1)](Ȳ − μY)²,   (3.29)

where the final equality follows from the definition of Ȳ [which implies that Σ (Yi − μY)
= n(Ȳ − μY)] and by collecting terms.

The law of large numbers can now be applied to the two terms in the final line of Equation (3.29). Define Wi = (Yi − μY)². Now E(Wi) = σ²Y (by the definition of the variance).
Because the random variables Y1, …, Yn are i.i.d., the random variables W1, …, Wn are
i.i.d. In addition, E(Wi²) = E[(Yi − μY)⁴] < ∞ because, by assumption, E(Yi⁴) < ∞. Thus
W1, …, Wn are i.i.d. and var(Wi) < ∞, so W̄ satisfies the conditions for the law of large numbers in Key Concept 2.6, and W̄ = (1/n) Σ Wi
converges in probability to σ²Y. Because n/(n − 1) → 1, the first term in Equation (3.29) converges in probability to σ²Y. Because Ȳ converges in probability to μY, (Ȳ − μY)² converges in probability to 0, so the second term
converges in probability to zero. Combining these results yields s²Y →p σ²Y.
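The consistency result can be illustrated by simulation: draw i.i.d. normal data with known variance σ² = 4 and watch the sample variance settle near 4 as n grows. A sketch; the normal distribution and the seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_variance_path(sigma=2.0, seed=1):
    """Draw i.i.d. N(0, sigma^2) observations and report the sample
    variance s_Y^2 for a small n and a large n; the large-n value
    should be close to sigma^2 = 4."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, sigma) for _ in range(50_000)]
    return {n: statistics.variance(draws[:n]) for n in (100, 50_000)}

path = sample_variance_path()
print(path)
```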
PART TWO

Fundamentals of Regression Analysis

CHAPTER 4

Linear Regression with One Regressor
A state implements tough new penalties on drunk drivers: What is the effect
on highway fatalities? A school district cuts the size of its elementary
school classes: What is the effect on its students' standardized test scores? You
successfully complete one more year of college classes: What is the effect on your
future earnings? All three of these questions are about the unknown effect of changing one
variable, X (X being penalties for drunk driving, class size, or years of schooling),
on another variable, Y (Y being highway deaths, student test scores, or earnings).

This chapter introduces the linear regression model relating one variable, X,
to another, Y, and shows how to estimate its slope and intercept using a
sample of data on X and Y. For instance, using data on class sizes and test scores
from different school districts, we show how to estimate the expected effect on
test scores of reducing class sizes by, say, one student per class. The slope and the
intercept can be estimated by a method called ordinary least squares (OLS).
   βClassSize = (change in TestScore) / (change in ClassSize) = ΔTestScore / ΔClassSize,   (4.1)

where the Greek letter Δ (delta) stands for "change in." That is, βClassSize is the
change in the test score that results from changing the class size, divided by the
change in the class size.

If you were lucky enough to know βClassSize, you would be able to tell the superintendent that decreasing class size by one student would change districtwide test
scores by βClassSize. You could also answer the superintendent's actual question,
which concerned changing class size by two students per class. To do so, rearrange
Equation (4.1) so that

   ΔTestScore = βClassSize × ΔClassSize.   (4.2)
Equation (4.1) is the definition of the slope of a straight line relating test
scores and class size. This straight line can be written

   TestScore = β0 + βClassSize × ClassSize,   (4.3)

where β0 is the intercept of this straight line and, as before, βClassSize is the slope.
According to Equation (4.3), if you knew β0 and βClassSize, not only would you be
able to determine the change in test scores at a district associated with a change in
class size, but you also would be able to predict the average test score itself for a
given class size.

When you propose Equation (4.3) to the superintendent, she tells you that
something is wrong with this formulation. She points out that class size is just one
of many facets of elementary education, and that two districts with the same class
sizes will have different test scores for many reasons. It is therefore useful to write the relationship as

   TestScore = β0 + βClassSize × ClassSize + other factors.   (4.4)

Thus, the test score for the district is written in terms of one component, β0 +
βClassSize × ClassSize, that represents the average effect of class size on scores in
the population of school districts, and a second component that represents all other
factors.

Although this discussion has focused on test scores and class size, the idea
expressed in Equation (4.4) is much more general, so it is useful to introduce more
general notation.
   Yi = β0 + β1Xi + ui,   (4.5)

for each district (that is, i = 1, …, n), where β0 is the intercept of this line and β1
is the slope. [The general notation "β1" is used for the slope in Equation (4.5)
instead of "βClassSize" because this equation is written in terms of a general variable Xi.]

Equation (4.5) is the linear regression model with a single regressor, in which
Y is the dependent variable and X is the independent variable or the regressor.

The first part of Equation (4.5), β0 + β1Xi, is the population regression line or
the population regression function. This is the relationship that holds between Y
and X on average over the population. Thus, if you knew the value of X, according to this population regression line you would predict that the value of the
dependent variable, Y, is β0 + β1X.

The intercept β0 and the slope β1 are the coefficients of the population regression line, also known as the parameters of the population regression line. The slope
β1 is the change in Y associated with a unit change in X. The linear regression
model and its terminology are summarized in Key Concept 4.1.
KEY CONCEPT 4.1
TERMINOLOGY FOR THE LINEAR REGRESSION
MODEL WITH A SINGLE REGRESSOR

The linear regression model is

   Yi = β0 + β1Xi + ui,

where the subscript i runs over observations, i = 1, …, n; Yi is the dependent
variable; Xi is the independent variable or the regressor; β0 + β1X is the population regression line or population regression function; β0 is the intercept of the
population regression line; β1 is the slope of the population regression line; and
ui is the error term.
Figure 4.1 summarizes the linear regression model with a single regressor for
seven hypothetical observations on test scores (Y) and class size (X). The population regression line is the straight line β0 + β1X. The population regression line
slopes down (β1 < 0), which means that districts with lower student-teacher ratios
(smaller classes) tend to have higher test scores. The intercept β0 has a mathematical meaning as the value of the Y axis intersected by the population regression line, but, as mentioned earlier, it has no real-world meaning in this example.

Because of the other factors that determine test performance, the hypothetical observations in Figure 4.1 do not fall exactly on the population regression line.
For example, the value of Y for district #1, Y1, is above the population regression
line. This means that test scores in district #1 were better than predicted by the
population regression line, so the error term for that district, u1, is positive. In contrast, Y2 is below the population regression line, so test scores for that district were
worse than predicted, and u2 < 0.

Now return to your problem as advisor to the superintendent: What is the
expected effect on test scores of reducing the student-teacher ratio by two students per teacher? The answer is easy: The expected change is (−2) × βClassSize.
But what is the value of βClassSize?
FIGURE 4.1  [Scatterplot of hypothetical data: test score (Y) versus student-teacher ratio (X), with the population regression line drawn through the points. Only the axis labels survived extraction.]
4.2  Estimating the Coefficients
of the Linear Regression Model

TABLE 4.1  Summary of the Distribution of Student-Teacher Ratios and Fifth-Grade
Test Scores for 420 K-8 Districts in California in 1999

                         Average   Standard     Percentile
                                   Deviation    10%     25%     40%     50%      60%     75%     90%
                                                                        (median)
Student-teacher ratio    19.6      1.9          17.3    18.6    19.3    19.7     20.1    20.9    21.9
Test score               654.2     19.1         630.4   640.0   649.1   654.5    659.4   666.7   679.1
Because βClassSize is a feature of the population, it is possible to learn about the population slope βClassSize using a
sample of data.

The data we analyze here consist of test scores and class sizes in 1999 in
420 California school districts that serve kindergarten through eighth grade. The
test score is the districtwide average of reading and math scores for fifth graders.
Class size can be measured in various ways. The measure used here is one of the
broadest, which is the number of students in the district divided by the number of
teachers, that is, the districtwide student-teacher ratio. These data are described
in more detail in Appendix 4.1.

Table 4.1 summarizes the distributions of test scores and class sizes for this
sample. The average student-teacher ratio is 19.6 students per teacher and the
standard deviation is 1.9 students per teacher. The 10th percentile of the distribution of the student-teacher ratio is 17.3 (that is, only 10% of districts have student-teacher ratios below 17.3), while the district at the 90th percentile has a
student-teacher ratio of 21.9.

A scatterplot of these 420 observations on test scores and the student-teacher
ratio is shown in Figure 4.2. The sample correlation is −0.23, indicating a weak
negative relationship between the two variables. Although larger classes in this
sample tend to have lower test scores, there are other determinants of test scores
that keep the observations from falling perfectly along a straight line.

Despite this low correlation, if one could somehow draw a straight line
through these data, then the slope of this line would be an estimate of βClassSize
based on these data. One way to draw the line would be to take out a pencil and
a ruler and to "eyeball" the best line you could. While this method is easy, it is very
unscientific, and different people will create different estimated lines.

How, then, should you choose among the many possible lines? By far the most
common way is to choose the one that produces the "least squares" fit to these
data, that is, to use the ordinary least squares (OLS) estimator.
FIGURE 4.2  Scatterplot of Test Score vs. Student-Teacher Ratio (California School District Data)
Data from 420 California school districts. There is a weak negative relationship between the
student-teacher ratio and test scores; the sample correlation is −0.23. [Scatterplot residue removed.]
   Σ (Yi − b0 − b1Xi)².   (4.6)

The sum of the squared mistakes for the linear regression model in expression (4.6) is the extension of the sum of the squared mistakes for the problem of
estimating the mean in expression (3.2). In fact, if there is no regressor, then b1
does not enter expression (4.6) and the two problems are identical except for the
different notation [m in expression (3.2), b0 in expression (4.6)]. Just as there is a
unique estimator, Ȳ, that minimizes expression (3.2), so is there a unique pair
of estimators of β0 and β1 that minimize expression (4.6).
The OLS estimators of the slope β1 and the intercept β0 are

   β̂1 = [ Σ (Xi − X̄)(Yi − Ȳ) ] / [ Σ (Xi − X̄)² ] = sXY / s²X,   (4.7)

   β̂0 = Ȳ − β̂1X̄.   (4.8)

The OLS predicted values Ŷi and residuals ûi are

   Ŷi = β̂0 + β̂1Xi,  i = 1, …, n,   (4.9)

   ûi = Yi − Ŷi,  i = 1, …, n.   (4.10)
The estimated intercept (β̂0), slope (β̂1), and residual (ûi) are computed from a
sample of n observations of Xi and Yi, i = 1, …, n. These are estimates of the
unknown true population intercept (β0), slope (β1), and error term (ui).
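Equations (4.7) through (4.10) translate directly into code. The sketch below computes β̂1 and β̂0 from a sample; the data are hypothetical and chosen to lie exactly on a line, so OLS recovers the line's coefficients exactly.

```python
def ols(x, y):
    """OLS slope and intercept from Equations (4.7) and (4.8):
    beta1 = sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2,
    beta0 = Ybar - beta1 * Xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

# Hypothetical data lying exactly on the line y = 700 - 2x:
x = [15.0, 18.0, 20.0, 22.0, 25.0]
y = [700.0 - 2.0 * xi for xi in x]
b0, b1 = ols(x, y)
print(b0, b1)   # recovers 700.0 and -2.0, up to floating point
```

The predicted values and residuals of Equations (4.9) and (4.10) then follow as `b0 + b1 * xi` and `yi - (b0 + b1 * xi)`.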
   TestScore^ = 698.9 − 2.28 × STR,   (4.11)

where TestScore^ is the average test score in the district and STR is the
student-teacher ratio. The symbol "^" over TestScore in Equation (4.11) indicates
that this is the predicted value based on the OLS regression line. Figure 4.3 plots
this OLS regression line superimposed over the scatterplot of the data previously
shown in Figure 4.2.

The slope of −2.28 means that an increase in the student-teacher ratio by one
student per class is, on average, associated with a decline in districtwide test scores
by 2.28 points on the test. A decrease in the student-teacher ratio by 2 students
per class is, on average, associated with an increase in test scores of 4.56 points
[= −2 × (−2.28)]. The negative slope indicates that more students per teacher
(larger classes) is associated with poorer performance on the test.

It is now possible to predict the districtwide test score given a value of the
student-teacher ratio. For example, for a district with 20 students per teacher, the
FIGURE 4.3  The Estimated Regression Line for the California Data
The estimated regression line shows a negative relationship between test scores and the
student-teacher ratio. If class sizes fall by 1 student, the estimated regression line predicts
that test scores will increase by 2.28 points. [Scatterplot residue removed.]
predicted test score is 698.9 − 2.28 × 20 = 653.3. Of course, this prediction will
not be exactly right because of the other factors that determine a district's performance. But the regression line does give a prediction (the OLS prediction) of
what test scores would be for that district, based on its student-teacher ratio,
absent those other factors.
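The prediction just described is one line of arithmetic. A sketch using the estimated coefficients from Equation (4.11):

```python
def predict_test_score(str_ratio, b0=698.9, b1=-2.28):
    """Predicted districtwide test score from the estimated OLS line
    in Equation (4.11): TestScore-hat = 698.9 - 2.28 x STR."""
    return b0 + b1 * str_ratio

print(round(predict_test_score(20), 1))   # the 653.3 computed in the text
# Predicted effect of cutting the student-teacher ratio from 19.7 to 17.7:
print(round(predict_test_score(17.7) - predict_test_score(19.7), 2))
```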
Is this estimate of the slope large or small? To answer this, we return to the
superintendent's problem. Recall that she is contemplating hiring enough teachers to reduce the student-teacher ratio by 2. Suppose her district is at the median
of the California districts. From Table 4.1, the median student-teacher ratio is 19.7
and the median test score is 654.5. A reduction of 2 students per class, from 19.7
to 17.7, would move her student-teacher ratio from the 50th percentile to very near
the 10th percentile. This is a big change, and she would need to hire many new
teachers. How would it affect test scores?

According to Equation (4.11), cutting the student-teacher ratio by 2 is predicted to increase test scores by approximately 4.6 points; if her district's test scores
are at the median, 654.5, they are predicted to increase to 659.1. Is this improvement large or small? According to Table 4.1, this improvement would move her
district from the median to just short of the 60th percentile. Thus a decrease in class
size that would place her district close to the 10% with the smallest classes would
move her test scores from the 50th to the 60th percentile. According to these estimates, at least, cutting the student-teacher ratio by a large amount (2 students per
teacher) would help and might be worth doing depending on her budgetary situation, but it would not be a panacea.

What if the superintendent were contemplating a far more radical change,
such as reducing the student-teacher ratio from 20 students per teacher to 5?
Unfortunately, the estimates in Equation (4.11) would not be very useful to her.
This regression was estimated using the data in Figure 4.2, and as the figure shows,
the smallest student-teacher ratio in these data is 14. These data contain no information on how districts with extremely small classes perform, so these data alone
are not a reliable basis for predicting the effect of a radical move to such an
extremely low student-teacher ratio.
Why Use the OLS Estimator?
There are both practical and theoretical reasons to use the OLS estimators β̂0 and
β̂1. Because OLS is the dominant method used in practice, it has become the common language for regression analysis throughout economics, finance (see the box),
and the social sciences more generally. Presenting results using OLS (or its variants discussed later in this book) means that you are "speaking the same language"
THE "BETA" OF A STOCK

A fundamental idea of modern finance is that an investor needs a financial incentive to take a risk. Said differently, the expected return on a risky investment, R, must exceed the return on a safe, or risk-free, investment, Rf. Thus the expected excess return, R − Rf, on a risky investment, like owning stock in a company, should be positive.

According to the capital asset pricing model (CAPM), the expected excess return on an asset is proportional to the expected excess return on a portfolio of all available assets (the "market portfolio"):

   R − Rf = β(Rm − Rf),   (4.12)

where Rm is the expected return on the market portfolio and β is the coefficient in the population regression of R − Rf on Rm − Rf. In practice, β is estimated by OLS regression of the actual excess return on the stock against the actual excess return on a broad market index. [The table of estimated betas in this box is garbled in the source; the recoverable entries include Best Buy (electronic equipment retailer), Amazon (online retailer), and Microsoft (software), all with estimated betas above 1.]
as other economists and statisticians. The OLS formulas are built into virtually all
spreadsheet and statistical software packages, making OLS easy to use.

The OLS estimators also have desirable theoretical properties. These are analogous to the desirable properties, studied in Section 3.1, of Ȳ as an estimator of
the population mean. Under the assumptions introduced in Section 4.4, the OLS
estimator is unbiased and consistent. The OLS estimator is also efficient among a
certain class of unbiased estimators; however, this efficiency result holds under
some additional special conditions, and further discussion of this result is deferred
until Section 5.5.
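The "beta" regression described in the box can be sketched by simulation: generate excess returns that obey the CAPM relation in Equation (4.12), with a hypothetical true β of 1.5, and recover it by OLS. All return parameters below are made up for illustration.

```python
import random

def ols_slope(x, y):
    # OLS slope: sample covariance of (x, y) over sample variance of x.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

# Simulate 240 months of excess returns, R - Rf = beta*(Rm - Rf) + u,
# with a hypothetical true beta of 1.5, then recover beta by OLS.
rng = random.Random(0)
market = [rng.gauss(0.01, 0.04) for _ in range(240)]
stock = [1.5 * m + rng.gauss(0.0, 0.03) for m in market]
beta_hat = ols_slope(market, stock)
print(round(beta_hat, 2))
```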
4.3  Measures of Fit

Write Yi as the sum of the predicted value, Ŷi, plus the residual, ûi:

   Yi = Ŷi + ûi.   (4.13)

In this notation, the R² is the ratio of the sample variance of Ŷi to the sample variance of Yi.

Mathematically, the R² can be written as the ratio of the explained sum of
squares to the total sum of squares. The explained sum of squares (ESS) is the sum
of squared deviations of the predicted values of Yi, Ŷi, from their average, and the
total sum of squares (TSS) is the sum of squared deviations of Yi from its average:

   ESS = Σ (Ŷi − Ȳ)²,   (4.14)

   TSS = Σ (Yi − Ȳ)².   (4.15)
The R² is the ratio of the explained sum of squares to the total sum of squares:

   R² = ESS / TSS.   (4.16)

Alternatively, the R² can be written in terms of the fraction of the variance of Yi
not explained by Xi. The sum of squared residuals (SSR) is the sum of the squared
OLS residuals:

   SSR = Σ ûi².   (4.17)
It is shown in Appendix 4.3 that TSS = ESS + SSR. Thus the R² also can be
expressed as 1 minus the ratio of the sum of squared residuals to the total sum of
squares:

   R² = 1 − SSR / TSS.   (4.18)

Finally, the R² of the regression of Y on the single regressor X is the square of the
correlation coefficient between Y and X.

The R² ranges between 0 and 1. If β̂1 = 0, then Xi explains none of the variation of Yi, and the predicted value of Yi based on the regression is just the sample
average of Yi. In this case, the explained sum of squares is zero and the sum of
squared residuals equals the total sum of squares; thus the R² is zero. In contrast,
if Xi explains all of the variation of Yi, then Yi = Ŷi for all i and every residual is
zero (that is, ûi = 0), so that ESS = TSS and R² = 1. In general, the R² does not
take on the extreme values of 0 or 1 but falls somewhere in between. An R² near
1 indicates that the regressor is good at predicting Yi, while an R² near 0 indicates
that the regressor is not very good at predicting Yi.
The standard error of the regression (SER) is an estimator of the standard deviation of the regression error ui:

   SER = sû,  where  s²û = [1/(n − 2)] Σ ûi² = SSR / (n − 2),   (4.19)

where the formula for s²û uses the fact (proven in Appendix 4.3) that the sample
average of the OLS residuals is zero.

The formula for the SER in Equation (4.19) is similar to the formula for the
sample standard deviation of Y given in Equation (3.7) in Section 3.2, except that
Yi − Ȳ in Equation (3.7) is replaced by ûi, and the divisor in Equation (3.7) is
n − 1, whereas here it is n − 2. The reason for using the divisor n − 2 here (instead
of n) is the same as the reason for using the divisor n − 1 in Equation (3.7): It
corrects for a slight downward bias introduced because two regression coefficients
were estimated. This is called a "degrees of freedom" correction: because two coefficients were estimated (β0 and β1), two "degrees of freedom" of the data were
lost, so the divisor in this factor is n − 2. (The mathematics behind this is discussed
in Section 5.6.) When n is large, the difference between dividing by n, by n − 1, or
by n − 2 is negligible.
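The fit measures of this section, ESS, TSS, SSR, R², and the SER, can be computed together, and the sketch below also verifies the identity TSS = ESS + SSR for an OLS fit. The five districts are hypothetical.

```python
def ols_fit_stats(x, y):
    """OLS fit plus the fit measures of Equations (4.14)-(4.19)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
         sum((a - xbar) ** 2 for a in x)
    b0 = ybar - b1 * xbar
    yhat = [b0 + b1 * a for a in x]
    ess = sum((yh - ybar) ** 2 for yh in yhat)         # explained sum of squares
    tss = sum((b - ybar) ** 2 for b in y)              # total sum of squares
    ssr = sum((b - yh) ** 2 for b, yh in zip(y, yhat)) # sum of squared residuals
    r2 = ess / tss                    # Equation (4.16); also 1 - SSR/TSS
    ser = (ssr / (n - 2)) ** 0.5      # standard error of the regression
    return ess, tss, ssr, r2, ser

x = [15.0, 17.0, 19.0, 21.0, 23.0]
y = [668.0, 662.0, 659.0, 651.0, 648.0]   # hypothetical district data
ess, tss, ssr, r2, ser = ols_fit_stats(x, y)
print(round(r2, 3), round(ser, 2))
```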
the student body across districts, differences in school quality unrelated to the student-teacher ratio, or luck on the test. The low R² and high SER do not tell us what
these factors are, but they do indicate that the student-teacher ratio alone explains
only a small part of the variation in test scores in these data.
4.4  The Least Squares Assumptions
FIGURE 4.4  The Conditional Probability Distributions and the Population Regression Line
The figure shows the conditional probability distributions of test scores for districts with class sizes of 15,
20, and 25 students. The mean of the conditional distribution of test scores, given the
student-teacher ratio, E(Y|X), is the population regression line β0 + β1X. At a given value
of X, Y is distributed around the regression line and the error, u = Y − (β0 + β1X), has a
conditional mean of zero for all values of X.
Recall that if the
conditional mean of one random variable given another is zero, then the two random variables have zero covariance and thus are uncorrelated [Equation (2.27)].
Thus, the conditional mean assumption E(ui|Xi) = 0 implies that Xi and ui are
uncorrelated, or corr(Xi, ui) = 0. Because correlation is a measure of linear association, this implication does not go the other way; even if Xi and ui are uncorrelated, the conditional mean of ui given Xi might be nonzero. However, if Xi and ui
are correlated, then it must be the case that E(ui|Xi) is nonzero. It is therefore
often convenient to discuss the conditional mean assumption in terms of possible
correlation between Xi and ui. If Xi and ui are correlated, then the conditional
mean assumption is violated.
developed for i.i.d. regressors are also true if the regressors are nonrandom. The
case of a nonrandom regressor is, however, quite special. For example, modern
experimental protocols would have the horticulturalist assign the level of X to the
different plots using a computerized random number generator, thereby circumventing any possible bias by the horticulturalist (she might use her favorite weeding method for the tomatoes in the sunniest plot). When this modern experimental
protocol is used, the level of X is random and (Xi, Yi) are i.i.d.

Another example of non-i.i.d. sampling is when observations refer to the same
unit of observation over time. For example, we might have data on inventory levels (Y) at a firm and the interest rate at which the firm can borrow (X), where
these data are collected over time from a specific firm; for example, they might be
recorded four times a year (quarterly) for 30 years. This is an example of time
series data, and a key feature of time series data is that observations falling close
to each other in time are not independent but rather tend to be correlated with
each other: if interest rates are low now, they are likely to be low next quarter. This
pattern of correlation violates the "independence" part of the i.i.d. assumption.
Time series data introduce a set of complications that are best handled after developing the basic tools of regression analysis.
Assumption #3: Large Outliers Are Unlikely
The third least squares assumption is that large outliers, that is, observations with
values of Xi and/or Yi far outside the usual range of the data, are unlikely. Large
outliers can make OLS regression results misleading. This potential sensitivity of
OLS to extreme outliers is illustrated in Figure 4.5 using hypothetical data.

In this book, the assumption that large outliers are unlikely is made mathematically precise by assuming that X and Y have nonzero finite fourth moments:
0 < E(Xi⁴) < ∞ and 0 < E(Yi⁴) < ∞. Another way to state this assumption is
that X and Y have finite kurtosis.

The assumption of finite kurtosis is used in the mathematics that justify the
large-sample approximations to the distributions of the OLS test statistics. We
encountered this assumption in Chapter 3 when discussing the consistency of the
sample variance. Specifically, Equation (3.9) states that the sample variance s²Y is
a consistent estimator of the population variance σ²Y (s²Y →p σ²Y). If Y1, …, Yn
are i.i.d. and the fourth moment of Yi is finite, then the law of large numbers in
Key Concept 2.6 applies to the average (1/n) Σ (Yi − μY)², a key step in the proof
in Appendix 3.3 showing that s²Y is consistent.

One source of large outliers is data entry errors, such as a typographical error
or incorrectly using different units for different observations: Imagine collecting
FIGURE 4.5  The Sensitivity of OLS to Large Outliers
In this hypothetical data set, one outlier makes the OLS regression line estimated with the
outlier show a strong positive relationship between X and Y, but the OLS regression line
estimated without the outlier shows no relationship.
data on the height of students in meters, but inadvertently recording one student's
height in centimeters instead. One way to find outliers is to plot your data. If you
decide that an outlier is due to a data entry error, then you can either correct the
error or, if that is impossible, drop the observation from your data set.

Data entry errors aside, the assumption of finite kurtosis is a plausible one
in many applications with economic data. Class size is capped by the physical
capacity of a classroom; the best you can do on a standardized test is to get all the
questions right and the worst you can do is to get all the questions wrong. Because
class size and test scores have a finite range, they necessarily have finite kurtosis.
More generally, commonly used distributions such as the normal distribution have
four moments. Still, as a mathematical matter, some distributions have infinite
fourth moments, and this assumption rules out those distributions. If this assumption holds, then it is unlikely that statistical inferences using OLS will be dominated
by a few observations.
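The sensitivity to outliers illustrated in Figure 4.5 is easy to reproduce: a single mis-entered observation can manufacture a strong estimated slope where none exists. A sketch with hypothetical data:

```python
def ols_slope(x, y):
    # OLS slope: sample covariance of (x, y) over sample variance of x.
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
           sum((a - xbar) ** 2 for a in x)

# Hypothetical data with essentially no X-Y relationship...
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [10.0, 10.5, 9.5, 10.0, 10.5, 9.5, 10.0, 10.5]
# ...plus one data-entry outlier (say, a value recorded in the wrong units):
x_out, y_out = x + [9.0], y + [40.0]

print(round(ols_slope(x, y), 2), round(ols_slope(x_out, y_out), 2))
```

Dropping the single bad observation restores a slope close to zero, which is why plotting the data is a useful first diagnostic.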
The three least squares assumptions for the linear regression model are summarized in Key Concept 4.3, and we return to them repeatedly throughout this textbook.
KEY CONCEPT 4.3
THE LEAST SQUARES ASSUMPTIONS

   Yi = β0 + β1Xi + ui,  i = 1, …, n, where

1. The error term ui has conditional mean zero given Xi: E(ui|Xi) = 0;

2. (Xi, Yi), i = 1, …, n, are independent and identically distributed (i.i.d.) draws
   from their joint distribution; and

3. Large outliers are unlikely: Xi and Yi have nonzero finite fourth moments.
4.5  The Sampling Distribution of the OLS Estimators
different possible random samples. This section presents these sampling distributions. In small samples, these distributions are complicated, but in large samples
they are approximately normal because of the central limit theorem.

Although the sampling distributions of β̂0 and β̂1 can be complicated, their means are E(β̂0) = β0 and E(β̂1) = β1;
that is, β̂0 and β̂1 are unbiased estimators of β0 and β1. The proof that β̂1 is unbiased
is given in Appendix 4.3, and the proof that β̂0 is unbiased is left as Exercise 4.7.

If the sample is sufficiently large, by the central limit theorem the sampling
distribution of β̂0 and β̂1 is well approximated by the bivariate normal distribution
(Section 2.4). This implies that the marginal distributions of β̂0 and β̂1 are normal
in large samples.
KEY CONCEPT 4.4
LARGE-SAMPLE DISTRIBUTIONS OF β̂0 AND β̂1

If the least squares assumptions in Key Concept 4.3 hold, then in large samples
β̂0 and β̂1 have a jointly normal sampling distribution. The large-sample normal
distribution of β̂1 is N(β1, σ²β̂1), where the variance of this distribution, σ²β̂1, is

   σ²β̂1 = (1/n) · var[(Xi − μX)ui] / [var(Xi)]².   (4.21)

The large-sample normal distribution of β̂0 is N(β0, σ²β̂0), where

   σ²β̂0 = (1/n) · var(Hi ui) / [E(Hi²)]²,  where Hi = 1 − [μX / E(Xi²)] Xi.   (4.22)
This argument invokes the central limit theorem. Technically, the central limit theorem concerns the distribution of averages (like Ȳ). If you examine the numerator in Equation (4.7) for β̂1, you will see that it, too, is a type of average: not a simple average, like Ȳ, but an average of the product (Yᵢ − Ȳ)(Xᵢ − X̄). As discussed further in Appendix 4.3, the central limit theorem applies to this average, so that, like the simpler average Ȳ, it is normally distributed in large samples.

The normal approximation to the distribution of the OLS estimators in large samples is summarized in Key Concept 4.4. (Appendix 4.3 summarizes the derivation of these formulas.) A relevant question in practice is how large n must be for these approximations to be reliable. In Section 2.6 we suggested that n = 100 is sufficiently large for the sampling distribution of Ȳ to be well approximated by a normal distribution, and sometimes smaller n suffices. This criterion carries over to the more complicated averages appearing in regression analysis. In virtually all modern econometric applications, n > 100, so we will treat the normal approximations to the distributions of the OLS estimators as reliable unless there are good reasons to think otherwise.
The results in Key Concept 4.4 imply that the OLS estimators are consistent; that is, when the sample size is large, β̂0 and β̂1 will be close to the true population coefficients β0 and β1 with high probability. This is because the variances σ²_β̂0 and σ²_β̂1 of the estimators decrease to zero as n increases (n appears in the denominator of the formulas for the variances), so the distribution of the OLS estimators will be tightly concentrated around their means, β0 and β1, when n is large.
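These unbiasedness and consistency results can be illustrated by simulation. The sketch below is not from the text; the population line, the distributions of X and u, and the sample sizes are assumptions chosen purely for illustration. Across many random samples, the OLS slope estimates center on the true β1, and their spread shrinks as n grows.

```python
import numpy as np

# Illustrative simulation (assumed setup, not from the text): draw many
# random samples from a fixed population regression line, estimate the OLS
# slope in each, and inspect its sampling distribution.
rng = np.random.default_rng(0)
beta0, beta1 = 5.0, 2.0   # assumed "true" population coefficients

def ols_slope(x, y):
    """OLS slope: sum (Xi - Xbar)(Yi - Ybar) / sum (Xi - Xbar)^2."""
    x_dev = x - x.mean()
    return x_dev @ (y - y.mean()) / (x_dev @ x_dev)

def simulate(n, reps=2000):
    """Return `reps` OLS slope estimates, each from a fresh sample of size n."""
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(10.0, 2.0, size=n)
        u = rng.normal(0.0, 1.0, size=n)   # E(u | X) = 0 by construction
        slopes[r] = ols_slope(x, beta0 + beta1 * x + u)
    return slopes

small, large = simulate(25), simulate(400)
print(small.mean(), large.mean())   # both close to the true slope (unbiasedness)
print(small.std(), large.std())     # spread shrinks as n grows (consistency)
```

With these assumed distributions, the variance formula (4.21) predicts that the slope's standard deviation falls by a factor of about 4 when n rises from 25 to 400, which the simulated spreads roughly confirm.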
CHAPTER 4
4.6 Conclusion
This chapter has focused on the use of ordinary least squares to estimate the intercept and slope of a population regression line using a sample of n observations on a dependent variable, Y, and a single regressor, X. There are many ways to draw a straight line through a scatterplot, but doing so using OLS has several virtues. If the least squares assumptions hold, then the OLS estimators of the slope and intercept are unbiased, are consistent, and have a sampling distribution with a variance that is inversely proportional to the sample size n. Moreover, if n is large, then the sampling distribution of the OLS estimator is normal.

These important properties of the sampling distribution of the OLS estimator hold under the three least squares assumptions.

The first assumption is that the error term in the linear regression model has a conditional mean of zero, given the regressor X. This assumption implies that the OLS estimator is unbiased.

The second assumption is that (Xᵢ, Yᵢ) are i.i.d., as is the case if the data are collected by simple random sampling. This assumption yields the formula, presented in Key Concept 4.4, for the variance of the sampling distribution of the OLS estimator.

The third assumption is that large outliers are unlikely. Stated more formally, X and Y have finite fourth moments (finite kurtosis). The reason for this assumption is that OLS can be unreliable if there are large outliers.

The results in this chapter describe the sampling distribution of the OLS estimator. By themselves, however, these results are not sufficient to test a hypothesis about the value of β1 or to construct a confidence interval for β1. Doing so requires an estimator of the standard deviation of the sampling distribution, that is, the standard error of the OLS estimator. This step, moving from the sampling distribution of β̂1 to its standard error, hypothesis tests, and confidence intervals, is taken in the next chapter.
Summary
1. The population regression line, β0 + β1X, is the mean of Y as a function of the value of X. The slope, β1, is the expected change in Y associated with a 1-unit change in X. The intercept, β0, determines the level (or height) of the regression line. Key Concept 4.1 summarizes the terminology of the population linear regression model.
Key Terms
linear regression model with a single regressor (114)
dependent variable (114)
independent variable (114)
regressor (114)
population regression line (114)
population regression function (114)
population intercept and slope (114)
population coefficients (114)
parameters (114)
error term (114)
4.1 Explain the difference between β̂1 and β1; between the residual ûᵢ and the regression error uᵢ; and between the OLS predicted value Ŷᵢ and E(Y | Xᵢ).

4.2 For each least squares assumption, provide an example in which the assumption is valid, and then provide an example in which the assumption fails.

4.3 Sketch a hypothetical scatterplot of data for an estimated regression with R² = 0.9; sketch a hypothetical scatterplot of data for a regression with R² = 0.5.
Exercises
4.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression.
4.2 Consider the following estimated regression, with Weight measured in pounds and Height in inches:

    Weight = −99.41 + 3.94 × Height, R² = 0.81, SER = 10.2.
b. A man has a late growth spurt and grows 1.5 inches over the course of a year. What is the regression's prediction for the increase in this man's weight?

c. Suppose that instead of measuring weight and height in pounds and inches, they are measured in centimeters and kilograms. What are the regression estimates from this new regression?
4.3 A regression of average weekly earnings (AWE, in dollars) on age (in years) yields

    AWE = 696.7 + 9.6 × Age, R² = 0.023, SER = 624.1.
b. The standard error of the regression (SER) is 624.1. What are the units of measurement for the SER (dollars? years? or is the SER unit-free)?

c. The regression R² is 0.023. What are the units of measurement for the R² (dollars? years? or is the R² unit-free)?

d. What is the regression's predicted earnings for a 25-year-old worker? A 45-year-old worker?

e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?

f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)

g. The average age in this sample is 41.6 years. What is the average value of AWE in the sample? (Hint: Review Key Concept 4.2.)
4.4
regression error.)
a. Explain what the term uᵢ represents. Why will different students have different values of uᵢ?

    … = 49 + 0.24 × …
Show that the first least squares assumption, E(uᵢ | Xᵢ) = 0, implies that

    E(Yᵢ | Xᵢ) = β0 + β1Xᵢ.
4.7 Show that β̂0 is an unbiased estimator of β0. (Hint: Use the fact that β̂1 is unbiased, which is shown in Appendix 4.3.)
4.8 Suppose that all of the least squares assumptions hold except that the first assumption is replaced with E(uᵢ | Xᵢ) = 2. Which parts of Key Concept 4.4 continue to hold? Which change? Why? (Is β̂1 normally distributed in large samples with mean and variance given in Key Concept 4.4?)
4.9
0?
4.10 Suppose that Yᵢ = β0 + β1Xᵢ + uᵢ, where (Xᵢ, uᵢ) are i.i.d. and Xᵢ is a Bernoulli random variable with Pr(X = 1) = 0.20. When X = 1, uᵢ is N(0, 4); when X = 0, uᵢ is N(0, 1).

a. Show that the regression assumptions in Key Concept 4.3 are satisfied.

b. Derive an expression for the large-sample variance of β̂1.
a. Show that the R² from the regression of Y on X is the squared value of the sample correlation between X and Y. That is, show that R² = r²XY.

b. Show that the R² from the regression of Y on X is the same as the R² from the regression of X on Y.
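Both claims in this exercise can be checked numerically before proving them. Below is a sketch with made-up data (the simulated sample is an assumption chosen for illustration):

```python
import numpy as np

# Numerical check on simulated data: the R^2 of the regression of Y on X
# equals the squared sample correlation r_XY^2, and it is unchanged when
# the roles of X and Y are swapped.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 + 0.8 * x + rng.normal(size=200)

def r_squared(x, y):
    """R^2 = ESS/TSS for the OLS regression of y on x."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    fitted = b0 + b1 * x
    return ((fitted - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()

r_xy = np.corrcoef(x, y)[0, 1]
print(r_squared(x, y), r_xy ** 2)   # equal: R^2 = r_XY^2
print(r_squared(y, x))              # same value with X and Y swapped
```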
Empirical Exercises
E4.1
On lhe text Web site (www.aw-bc.com/stock_ watsou). you will find a uJil
fi le TeachingRatings lha t contai ns data on course eva lua tions. cour~t:
characteristics, and professor characteristics for 463 courses a t the U niwr
sity of Texas at Au~tin. 1 A d e tailed description is given in TeachingRa lings_D escription , a lso available o n tile Web site. One of tbe charactcrhll~~
is a n index of the professor's ''beauty'' as r a ted by a panel of six JUdge!> I
this exercise you will mvestigate bow course evaluatio ns :~ re rdated to t
professor's beau ty.
¹These data were used in the paper with Amy Parker, "Beauty in the Classroom: Instructors' Pulchritude and Putative Pedagogical Productivity," Economics of Education Review, August 2005, 24(4), pp. 369–376.
d. Comment on the size of the regression's slope. Is the estimated effect of Beauty on Course_Eval large or small? Explain what you mean by "large" and "small."
²These data were provided by Professor Cecilia Rouse of Princeton University and were used in her paper "Democratization or Diversion? The Effect of Community Colleges on Educational Attainment," Journal of Business and Economic Statistics, April 1995, 12(2), pp. 217–224.
b. Bob's high school was 20 miles from the nearest college. Predict Bob's years of completed education using the estimated regression. How would the prediction change if Bob lived 10 miles from the nearest college?

c. Does distance to college explain a large fraction of the variance in educational attainment across individuals? Explain.
e. Where is Malta? Why is the Malta trade share so large? Should Malta be included or excluded from the analysis?
APPENDIX 4.1
The California Test Score Data Set
The California test score data set contains data on test performance, school characteristics, and student demographic backgrounds. The data used here are from all 420 K–6 and K–8 districts in California with data available for 1998 and 1999. Test scores are the average of the reading and math scores on the Stanford 9 Achievement Test, a standardized test administered to fifth-grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as "full-time equivalents"), number of computers per classroom, and expenditures per student. The student–teacher ratio used here is the number of students in the district divided by the number of full-time equivalent teachers. Demographic variables for the students also are averaged across the district. These include the percentage of students who are in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced-price lunch, and the percentage of students who are English learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).
APPENDIX 4.2
Derivation of the OLS Estimators
To minimize the sum of squared prediction mistakes Σᵢ(Yᵢ − b0 − b1Xᵢ)², first take the partial derivatives with respect to b0 and b1:

    ∂/∂b0 Σᵢ (Yᵢ − b0 − b1Xᵢ)² = −2 Σᵢ (Yᵢ − b0 − b1Xᵢ)  (4.23)

and

    ∂/∂b1 Σᵢ (Yᵢ − b0 − b1Xᵢ)² = −2 Σᵢ (Yᵢ − b0 − b1Xᵢ)Xᵢ.  (4.24)

The OLS estimators, β̂0 and β̂1, are the values of b0 and b1 that minimize Σᵢ(Yᵢ − b0 − b1Xᵢ)² or, equivalently, the values of b0 and b1 for which the derivatives in Equations (4.23) and (4.24) equal zero.
Solving this pair of equations for b0 and b1 yields

    β̂1 = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ (Xᵢ − X̄)² = [(1/n) Σᵢ XᵢYᵢ − X̄Ȳ] / [(1/n) Σᵢ Xᵢ² − (X̄)²]  (4.27)

and

    β̂0 = Ȳ − β̂1X̄.  (4.28)

Equations (4.27) and (4.28) are the formulas for β̂1 and β̂0 given in Key Concept 4.2; the formula β̂1 = s_XY / s²_X is obtained by dividing the numerator and denominator in Equation (4.27) by n − 1.
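As a numerical sanity check on the derivation (a sketch with simulated data; the sample is an assumption for illustration), the closed-form expressions for β̂1 and β̂0 should agree with any other least-squares routine, since both solve the same first-order conditions:

```python
import numpy as np

# Verify the closed-form OLS formulas against a library least-squares fit.
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, size=50)
y = 3.0 - 0.5 * x + rng.normal(size=50)

x_dev, y_dev = x - x.mean(), y - y.mean()
beta1_hat = (x_dev * y_dev).sum() / (x_dev ** 2).sum()   # Equation (4.27)
beta0_hat = y.mean() - beta1_hat * x.mean()              # Ybar - beta1_hat * Xbar

slope, intercept = np.polyfit(x, y, 1)                   # independent check
print(beta1_hat, slope)       # equal
print(beta0_hat, intercept)   # equal
```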
APPENDIX 4.3
Sampling Distribution of the OLS Estimator
Because Yᵢ = β0 + β1Xᵢ + uᵢ and Ȳ = β0 + β1X̄ + ū,

    Yᵢ − Ȳ = β1(Xᵢ − X̄) + (uᵢ − ū).  (4.29)

A little algebra applied to the expression in Equation (4.29) yields Σᵢ(Xᵢ − X̄)(Yᵢ − Ȳ) = β1 Σᵢ(Xᵢ − X̄)² + Σᵢ(Xᵢ − X̄)uᵢ. Substituting this expression in turn into the formula for β̂1 in Equation (4.27) yields

    β̂1 = β1 + [(1/n) Σᵢ (Xᵢ − X̄)uᵢ] / [(1/n) Σᵢ (Xᵢ − X̄)²].  (4.30)
Proof That β̂1 Is Unbiased
The expectation of β̂1 is obtained by taking the expectation of both sides of Equation (4.30). Thus,

    E(β̂1) = β1 + E{ [(1/n) Σᵢ (Xᵢ − X̄)E(uᵢ | X1, …, Xn)] / [(1/n) Σᵢ (Xᵢ − X̄)²] } = β1,  (4.31)

where the first equality follows from Equation (4.30) and the law of iterated expectations (Section 2.3). By the second least squares assumption, uᵢ is distributed independently of X for all observations other than i, so E(uᵢ | X1, …, Xn) = E(uᵢ | Xᵢ). By the first least squares assumption, however, E(uᵢ | Xᵢ) = 0. It follows that the conditional expectation in large brackets in Equation (4.31) is zero, so that E(β̂1 − β1 | X1, …, Xn) = 0. Equivalently, E(β̂1 | X1, …, Xn) = β1; that is, β̂1 is conditionally unbiased given X1, …, Xn. By the law of iterated expectations, E(β̂1 − β1) = E[E(β̂1 − β1 | X1, …, Xn)] = 0, so that E(β̂1) = β1; that is, β̂1 is unbiased.

The large-sample normal approximation to the limiting distribution of β̂1 (Key Concept 4.4) is obtained by considering the behavior of the final term in Equation (4.30).
In large samples, the variance of this limiting distribution is (1/n) · var(vᵢ)/[var(Xᵢ)]², where vᵢ = (Xᵢ − μX)uᵢ, which is the expression in Equation (4.21).
Some Additional Algebraic Facts About OLS

The OLS residuals and predicted values satisfy

    (1/n) Σᵢ ûᵢ = 0,  (4.32)

    (1/n) Σᵢ Ŷᵢ = Ȳ,  (4.33)

    (1/n) Σᵢ ûᵢXᵢ = 0 and s_ûX = 0, and  (4.34)

    TSS = SSR + ESS.  (4.35)
Equations (4.32) through (4.35) say that the sample average of the OLS residuals is zero; the sample average of the OLS predicted values equals Ȳ; the sample covariance s_ûX between the OLS residuals and the regressors is zero; and the total sum of squares is the sum of the sum of squared residuals and the explained sum of squares (the ESS, TSS, and SSR are defined in Equations (4.14), (4.15), and (4.17)).
To verify Equation (4.32), note that the definition of β̂0 lets us write the OLS residuals as ûᵢ = Yᵢ − β̂0 − β̂1Xᵢ = (Yᵢ − Ȳ) − β̂1(Xᵢ − X̄); thus

    Σᵢ ûᵢ = Σᵢ (Yᵢ − Ȳ) − β̂1 Σᵢ (Xᵢ − X̄).

But the definitions of Ȳ and X̄ imply that Σᵢ(Yᵢ − Ȳ) = 0 and Σᵢ(Xᵢ − X̄) = 0, so Σᵢ ûᵢ = 0.

To verify Equation (4.33), note that Yᵢ = Ŷᵢ + ûᵢ, so Σᵢ Yᵢ = Σᵢ Ŷᵢ + Σᵢ ûᵢ = Σᵢ Ŷᵢ, where the second equality is a consequence of Equation (4.32).

To verify Equation (4.34), note that Σᵢ ûᵢ = 0 implies Σᵢ ûᵢXᵢ = Σᵢ ûᵢ(Xᵢ − X̄), so

    Σᵢ ûᵢXᵢ = Σᵢ [(Yᵢ − Ȳ) − β̂1(Xᵢ − X̄)](Xᵢ − X̄) = Σᵢ (Yᵢ − Ȳ)(Xᵢ − X̄) − β̂1 Σᵢ (Xᵢ − X̄)² = 0,  (4.36)

where the final equality in Equation (4.36) is obtained using the formula for β̂1 in Equation (4.27). This result, combined with the preceding results, implies that s_ûX = 0.

Equation (4.35) follows from the previous results and some algebra:

    TSS = Σᵢ (Yᵢ − Ȳ)² = Σᵢ (Yᵢ − Ŷᵢ + Ŷᵢ − Ȳ)²
        = Σᵢ (Yᵢ − Ŷᵢ)² + Σᵢ (Ŷᵢ − Ȳ)² + 2 Σᵢ (Yᵢ − Ŷᵢ)(Ŷᵢ − Ȳ)
        = SSR + ESS + 2 Σᵢ ûᵢ(Ŷᵢ − Ȳ) = SSR + ESS,  (4.37)

where the final equality follows from Σᵢ ûᵢ(Ŷᵢ − Ȳ) = Σᵢ ûᵢ(β̂0 + β̂1Xᵢ) − Ȳ Σᵢ ûᵢ = 0 by the previous results.
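Because Equations (4.32) through (4.35) are algebraic identities, they hold exactly (up to floating-point error) in any sample. A quick numerical sketch with simulated data:

```python
import numpy as np

# Check the algebraic facts (4.32)-(4.35) on an arbitrary simulated sample.
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 2.0 + 1.0 * x + rng.normal(size=100)

x_dev = x - x.mean()
b1 = x_dev @ (y - y.mean()) / (x_dev @ x_dev)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

tss = ((y - y.mean()) ** 2).sum()
ess = ((fitted - y.mean()) ** 2).sum()
ssr = (resid ** 2).sum()

print(resid.mean())              # (4.32): residuals average to zero
print(fitted.mean(), y.mean())   # (4.33): average prediction equals Ybar
print(resid @ x)                 # (4.34): residuals uncorrelated with X
print(tss, ssr + ess)            # (4.35): TSS = SSR + ESS
```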
CHAPTER 5
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
The OLS estimator of the coefficient β1 differs from one sample to the next; that is, β̂1 has a sampling distribution. In this chapter, we show how knowledge of this sampling distribution can be used to quantify the sampling uncertainty. The starting point is the standard error of the OLS estimator, which measures the spread of the sampling distribution of β̂1.

Section 5.1 provides an expression for this standard error (and for the standard error of the OLS estimator of the intercept), then shows how to use β̂1 and its standard error to test hypotheses. Section 5.2 explains how to construct confidence intervals for β1, and Section 5.3 takes up the special case of a binary regressor.

Sections 5.1–5.3 assume that the three least squares assumptions of Chapter 4 hold. If, in addition, some stronger conditions hold, then some stronger results can be derived. One of these stronger conditions is that the errors are homoskedastic, a concept introduced in Section 5.4. Section 5.5 presents the Gauss–Markov theorem, which states that, under certain conditions, OLS is efficient (has the smallest variance) among a certain class of estimators.
5.1 Testing Hypotheses About One of the Regression Coefficients
The general approach to testing hypotheses about these coefficients is the same as the approach to testing hypotheses about the population mean, so we begin with a brief review.

Testing hypotheses about the population mean. Recall from Section 3.2 that the null hypothesis that the mean of Y is a specific value μY,0 can be written as H0: E(Y) = μY,0, and the two-sided alternative is H1: E(Y) ≠ μY,0.

The test of the null hypothesis H0 against the two-sided alternative proceeds as in the three steps summarized in Key Concept 3.6. The first is to compute the standard error of Ȳ, SE(Ȳ), which is an estimator of the standard deviation of the sampling distribution of Ȳ. The second step is to compute the t-statistic, which has the general form given in Key Concept 5.1; applied here, the t-statistic is t = (Ȳ − μY,0)/SE(Ȳ).

The third step is to compute the p-value, which is the smallest significance level at which the null hypothesis could be rejected, based on the test statistic actually observed; equivalently, the p-value is the probability of obtaining a statistic, by random sampling variation, at least as different from the null hypothesis value as is the statistic actually observed, assuming that the null hypothesis is correct
(Key Concept 3.5). Because the t-statistic has a standard normal distribution in large samples under the null hypothesis, the p-value for a two-sided hypothesis test is 2Φ(−|t^act|), where t^act is the value of the t-statistic actually computed and Φ is the cumulative standard normal distribution tabulated in Appendix Table 1. Alternatively, the third step can be replaced by simply comparing the t-statistic to the critical value appropriate for the test with the desired significance level. For example, a two-sided test with a 5% significance level would reject the null hypothesis if |t^act| > 1.96. In this case, the population mean is said to be statistically significantly different from the hypothesized value at the 5% significance level.

Testing hypotheses about the slope β1. At a theoretical level, the critical feature justifying the foregoing testing procedure for the population mean is that, in large samples, the sampling distribution of Ȳ is approximately normal. Because β̂1 also has a normal sampling distribution in large samples, hypotheses about the true value of the slope β1 can be tested using the same general approach.
The null and alternative hypotheses need to be stated precisely before they can be tested. The angry taxpayer's hypothesis is that βClassSize = 0. More generally, under the null hypothesis the true population slope β1 takes on some specific value, β1,0. Under the two-sided alternative, β1 does not equal β1,0. That is, the null hypothesis and the two-sided alternative hypothesis are

    H0: β1 = β1,0 vs. H1: β1 ≠ β1,0  (two-sided alternative).  (5.2)
To test the null hypothesis H0, we follow the same three steps as for the population mean.

The first step is to compute the standard error of β̂1, SE(β̂1). The standard error of β̂1 is an estimator of σ_β̂1, the standard deviation of the sampling distribution of β̂1. Specifically,

    SE(β̂1) = √σ̂²_β̂1,  (5.3)
where

    σ̂²_β̂1 = (1/n) · { [1/(n − 2)] Σᵢ (Xᵢ − X̄)² ûᵢ² } / { [(1/n) Σᵢ (Xᵢ − X̄)²]² }.  (5.4)

The second step is to compute the t-statistic,

    t = (β̂1 − β1,0) / SE(β̂1).  (5.5)

The third step is to compute the p-value, the probability of observing a value of β̂1 at least as different from β1,0 as the estimate actually computed (β̂1^act), assuming that the null hypothesis is correct:
    p-value = Pr_H0[ |β̂1 − β1,0| > |β̂1^act − β1,0| ]
            = Pr_H0[ |(β̂1 − β1,0)/SE(β̂1)| > |(β̂1^act − β1,0)/SE(β̂1)| ] = Pr_H0(|t| > |t^act|),  (5.6)
where Pr_H0 denotes the probability computed under the null hypothesis, the second equality follows by dividing by SE(β̂1), and t^act is the value of the t-statistic actually computed. Because β̂1 is approximately normally distributed in large samples, under the null hypothesis the t-statistic is approximately distributed as a standard normal random variable, so in large samples,
    p-value = Pr(|Z| > |t^act|) = 2Φ(−|t^act|).  (5.7)
A small value of the p-value, say less than 5%, provides evidence against the null hypothesis in the sense that the chance of obtaining a value of β̂1 by pure random variation from one sample to the next is less than 5% if, in fact, the null hypothesis is correct. If so, the null hypothesis is rejected at the 5% significance level.

Alternatively, the hypothesis can be tested at the 5% significance level simply by comparing the value of the t-statistic to ±1.96, the critical value for a two-sided test, and rejecting the null hypothesis at the 5% level if |t^act| > 1.96.

These steps are summarized in Key Concept 5.2.
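The three steps are mechanical enough to sketch in a few lines of code. The numbers below (slope −2.28, standard error 0.52) are the test-score estimates reported later in Equation (5.8); the helper names are ours, not the text's:

```python
from math import erf, sqrt

# Two-sided test of H0: beta1 = beta1_0 using the large-sample normal
# approximation; phi() is the standard normal CDF built from math.erf.
def phi(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def two_sided_test(beta1_hat, se_beta1, beta1_null=0.0):
    t_act = (beta1_hat - beta1_null) / se_beta1   # the t-statistic
    p_value = 2.0 * phi(-abs(t_act))              # Equation (5.7)
    return t_act, p_value

t_act, p = two_sided_test(-2.28, 0.52)
print(round(t_act, 2), p)   # t is about -4.38; p is far below 0.05, so reject H0
```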
    TestScore = 698.9 − 2.28 × STR, R² = 0.051, SER = 18.6.  (5.8)
                (10.4)  (0.52)
Equation (5.8) also reports the regression R² and the standard error of the regression (SER) following the estimated regression line. Thus Equation (5.8) provides the estimated regression line, estimates of the sampling uncertainty of the slope and the intercept (the standard errors), and two measures of the fit of this regression line (the R² and the SER). This is a common format for reporting a single regression equation, and it will be used throughout the rest of this book.

Suppose you wish to test the null hypothesis that the slope β1 is zero in the population counterpart of Equation (5.8) at the 5% significance level. To do so, construct the t-statistic and compare it to 1.96, the 5% (two-sided) critical value taken from the standard normal distribution. The t-statistic is constructed by substituting the hypothesized value of β1 under the null hypothesis (zero), the estimated slope, and its standard error from Equation (5.8) into the general formula for the t-statistic; the result is t^act = (−2.28 − 0)/0.52 = −4.38. This t-statistic
FIGURE 5.1 The p-value of a two-sided test when t^act = −4.38 [figure omitted]
exceeds (in absolute value) the 5% two-sided critical value of 1.96, so the null hypothesis is rejected in favor of the two-sided alternative at the 5% significance level.

Alternatively, we can compute the p-value associated with t^act = −4.38. This probability is the area in the tails of the standard normal distribution, as shown in Figure 5.1. This probability is extremely small, approximately 0.00001, or 0.001%. That is, if the null hypothesis βClassSize = 0 is true, the probability of obtaining a value of β̂1 as far from the null as the value we actually obtained is extremely small, less than 0.001%. Because this event is so unlikely, it is reasonable to conclude that the null hypothesis is false.
    H0: β1 = β1,0 vs. H1: β1 < β1,0  (one-sided alternative),  (5.9)

where β1,0 is the value of β1 under the null (0 in the student–teacher ratio example) and the alternative is that β1 is less than β1,0. If the alternative is that β1 is greater than β1,0, the inequality in Equation (5.9) is reversed.
Because the null hypothesis is the same for a one- and a two-sided hypothesis test, the construction of the t-statistic is the same. The only difference between a one- and two-sided hypothesis test is how you interpret the t-statistic. For the one-sided alternative in Equation (5.9), the null hypothesis is rejected against the one-sided alternative for large negative, but not large positive, values of the t-statistic: Instead of rejecting if |t^act| > 1.96, the hypothesis is rejected at the 5% significance level if t^act < −1.645.
The p-value for a one-sided test is obtained from the cumulative standard normal distribution as

    p-value = Pr(Z < t^act) = Φ(t^act).  (5.10)

If the alternative hypothesis is that β1 is greater than β1,0, the inequalities in Equations (5.9) and (5.10) are reversed, so the p-value is the right-tail probability, Pr(Z > t^act).
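A small sketch of the one-sided calculation in Equation (5.10), reusing the standard normal CDF; for a left-tail alternative with a negative t-statistic, the one-sided p-value is exactly half the two-sided one:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

t_act = -4.38                          # test-score example
p_one_sided = phi(t_act)               # Equation (5.10): left-tail probability
p_two_sided = 2.0 * phi(-abs(t_act))   # Equation (5.7), for comparison
print(p_one_sided, p_two_sided)        # the one-sided p-value is half as large
```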
Application to test scores. The t-statistic testing the hypothesis that there is no effect of class size on test scores [so β1,0 = 0 in Equation (5.9)] is t^act = −4.38. This is less than −2.33 (the critical value for a one-sided test with a 1% significance level), so the null hypothesis is rejected against the one-sided alternative at the 1% level. In fact, the p-value is less than 0.0006%. Based on these data, you can reject the angry taxpayer's assertion that the negative estimate of the slope arose purely because of random sampling variation at the 1% significance level.
This discussion has focused on testing hypotheses about the slope, β1. Occasionally, however, the hypothesis concerns the intercept, β0. The null hypothesis concerning the intercept and the two-sided alternative are
    H0: β0 = β0,0 vs. H1: β0 ≠ β0,0  (two-sided alternative).  (5.11)
The general approach to testing this null hypothesis consists of the three steps in Key Concept 5.2, applied to β0 (the formula for the standard error of β̂0 is given in Appendix 5.1). If the alternative is one-sided, this approach is modified as was discussed in the previous subsection for hypotheses about the slope.
Hypothesis tests are useful if you have a specific null hypothesis in mind (as did our angry taxpayer). Being able to accept or to reject this null hypothesis based on the statistical evidence provides a powerful tool for coping with the uncertainty inherent in using a sample to learn about the population. Yet, there are many times that no single hypothesis about a regression coefficient is dominant, and instead one would like to know a range of values of the coefficient that are consistent with the data. This calls for constructing a confidence interval.
5.2 Confidence Intervals for a Regression Coefficient
Because any statistical estimate of the slope β1 necessarily has sampling uncertainty, we cannot determine the true value of β1 exactly from a sample of data. It is possible, however, to use the OLS estimator and its standard error to construct a confidence interval for the slope β1.
A 95% two-sided confidence interval for β1 is an interval that contains the true value of β1 with a 95% probability; that is, it contains the true value of β1 in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of β1 that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, it is constructed as
    95% confidence interval for β1 = [β̂1 − 1.96SE(β̂1), β̂1 + 1.96SE(β̂1)].  (5.12)
In the test score example, β̂1 = −2.28 and SE(β̂1) = 0.52, so the 95% confidence interval is −2.28 ± 1.96 × 0.52 = (−3.30, −1.26). This interval does not contain zero, so (as we knew already from Section 5.1) the hypothesis β1 = 0 can be rejected at the 5% significance level.
Confidence intervals for predicted effects of changing X. The 95% confidence interval for β1 can be used to construct a 95% confidence interval for the predicted effect of a general change in X.
Consider changing X by a given amount, Δx. The predicted change in Y associated with this change in X is β1Δx. The population slope β1 is unknown, but because we can construct a confidence interval for β1, we can construct a confidence interval for the predicted effect β1Δx. Because one end of a 95% confidence interval for β1 is β̂1 − 1.96SE(β̂1), the predicted effect of the change Δx using this estimate of β1 is [β̂1 − 1.96SE(β̂1)] × Δx. The other end of the confidence interval is β̂1 + 1.96SE(β̂1), and the predicted effect of the change using that estimate is [β̂1 + 1.96SE(β̂1)] × Δx. Thus a 95% confidence interval for the effect of changing X by the amount Δx can be constructed as
    95% confidence interval for β1Δx = [β̂1Δx − 1.96SE(β̂1) × Δx, β̂1Δx + 1.96SE(β̂1) × Δx].  (5.13)
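The interval formulas in Equations (5.12) and (5.13) can be sketched directly. The slope and standard error below are the test-score estimates from Equation (5.8); the change Δx = 2 is an assumption chosen for illustration:

```python
# 95% confidence intervals for the slope and for the effect of a change in X.
def conf_int_slope(beta1_hat, se_beta1, z=1.96):
    """Equation (5.12)."""
    return (beta1_hat - z * se_beta1, beta1_hat + z * se_beta1)

def conf_int_effect(beta1_hat, se_beta1, delta_x, z=1.96):
    """Equation (5.13): scale both ends of the slope interval by delta_x."""
    lo, hi = conf_int_slope(beta1_hat, se_beta1, z)
    return tuple(sorted((lo * delta_x, hi * delta_x)))

ci_slope = conf_int_slope(-2.28, 0.52)
ci_effect = conf_int_effect(-2.28, 0.52, delta_x=2.0)
print(ci_slope)    # about (-3.30, -1.26); excludes 0, so beta1 = 0 is rejected
print(ci_effect)
```

Sorting the endpoints keeps the interval ordered even when Δx is negative.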
5.3 Regression When X Is a Binary Variable

    Dᵢ = 1 if the student–teacher ratio in the iᵗʰ district < 20, and Dᵢ = 0 if the student–teacher ratio in the iᵗʰ district ≥ 20.  (5.14)

The population regression model with Dᵢ as the regressor is

    Yᵢ = β0 + β1Dᵢ + uᵢ.  (5.15)
This is the same as the regression model with the continuous regressor Xᵢ, except that now the regressor is the binary variable Dᵢ. Because Dᵢ is not continuous, it is not useful to think of β1 as a slope; indeed, because Dᵢ can take on only two values, there is no "line," so it makes no sense to talk about a slope. Thus we will not refer to β1 as the slope in Equation (5.15); instead we will simply refer to β1 as the coefficient multiplying Dᵢ in this regression or, more compactly, the coefficient on Dᵢ.
If β1 in Equation (5.15) is not a slope, then what is it? The best way to interpret β0 and β1 in a regression with a binary regressor is to consider, one at a time, the two possible cases, Dᵢ = 0 and Dᵢ = 1. If the student–teacher ratio is high, then Dᵢ = 0 and Equation (5.15) becomes

    Yᵢ = β0 + uᵢ  (Dᵢ = 0).  (5.16)
If the student–teacher ratio is low, then Dᵢ = 1 and Equation (5.15) becomes

    Yᵢ = β0 + β1 + uᵢ  (Dᵢ = 1).  (5.17)

Thus β1 is the difference between the population mean of Yᵢ when Dᵢ = 1 and the population mean of Yᵢ when Dᵢ = 0; in other words, β1 is the difference between these two means. If the two population means are the same, then β1 in Equation (5.15) is zero. Thus, the null hypothesis that the two population means are the same can be tested against the alternative hypothesis that they differ by testing the null hypothesis β1 = 0 against the alternative β1 ≠ 0. This hypothesis can be tested using the procedure outlined in Section 5.1.
Specifically, the null hypothesis can be rejected at the 5% level against the two-sided alternative when the OLS t-statistic t = β̂1/SE(β̂1) exceeds 1.96 in absolute value. Similarly, a 95% confidence interval for β1, constructed as β̂1 ± 1.96SE(β̂1) as described in Section 5.2, provides a 95% confidence interval for the difference between the two population means.
For the test score data, the estimated regression is

    TestScore = 650.0 + 7.4D, R² = 0.037, SER = 18.7,  (5.18)
                (1.3)   (1.8)
where the standard errors of the OLS estimates of the coefficients β0 and β1 are given in parentheses below the OLS estimates. Thus the average test score for the subsample with student–teacher ratios greater than or equal to 20 (that is, for which D = 0) is 650.0, and the average test score for the subsample with student–teacher ratios less than 20 (so D = 1) is 650.0 + 7.4 = 657.4. The difference between the sample average test scores for the two groups is 7.4. This is the OLS estimate of β1, the coefficient on the student–teacher ratio binary variable D.

Is the difference in the population mean test scores in the two groups statistically significantly different from zero at the 5% level? To find out, construct the t-statistic on β1: t = 7.4/1.8 = 4.1. This exceeds 1.96 in absolute value, so the hypothesis that the population mean test scores in districts with high and low student–teacher ratios is the same can be rejected at the 5% significance level.

The OLS estimator and its standard error can be used to construct a 95% confidence interval for the true difference in means. This is 7.4 ± 1.96 × 1.8 = (3.9, 10.9). This confidence interval excludes β1 = 0, so that (as we know from the previous paragraph) the hypothesis β1 = 0 can be rejected at the 5% significance level.
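The equivalence between the OLS coefficient on a binary regressor and a difference in group means is easy to verify by simulation. The data below are made up (group means loosely mimicking the values in Equation (5.18)); the identity itself holds exactly in any sample:

```python
import numpy as np

# OLS on a binary regressor D reproduces the two group means exactly:
# intercept = mean of the D = 0 group, coefficient = difference in means.
rng = np.random.default_rng(5)
d = rng.integers(0, 2, size=500).astype(float)
y = 650.0 + 7.4 * d + rng.normal(0.0, 19.0, size=500)

d_dev = d - d.mean()
b1 = d_dev @ (y - y.mean()) / (d_dev @ d_dev)   # OLS coefficient on D
b0 = y.mean() - b1 * d.mean()                   # OLS intercept

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(b1, diff_in_means)      # identical
print(b0, y[d == 0].mean())   # identical
```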
5.4 What Are Heteroskedasticity and Homoskedasticity?

Definitions of heteroskedasticity and homoskedasticity. The error term uᵢ is homoskedastic if the variance of the conditional distribution of uᵢ given Xᵢ is constant for i = 1, …, n and in particular does not depend on Xᵢ. Otherwise, the error term is heteroskedastic.
Recall that the error term uᵢ has a conditional mean of zero given Xᵢ (the first least squares assumption). If, furthermore, the variance of this conditional distribution does not depend on Xᵢ, then the errors are said to be homoskedastic. This section discusses homoskedasticity, its theoretical implications, the simplified formulas for the standard errors of the OLS estimators that arise if the errors are homoskedastic, and the risks you run if you use these simplified formulas in practice.
FIGURE 5.2 An Example of Heteroskedasticity [figure omitted]

Like Figure 4.4, this shows the conditional distribution of test scores for three different class sizes. Unlike Figure 4.4, these distributions become more spread out (have a larger variance) for larger class sizes. Because the variance of the distribution of u given X, var(u | X), depends on X, u is heteroskedastic.
Example. Consider the regression of earnings on the binary variable MALEᵢ:

    Earningsᵢ = β0 + β1MALEᵢ + uᵢ  (5.19)
for i = 1, …, n. Because the regressor is binary, β1 is the difference in the population means of the two groups, in this case, the difference in mean earnings between men and women who graduated from college.

The definition of homoskedasticity states that the variance of uᵢ does not depend on the regressor. Here the regressor is MALEᵢ, so at issue is whether the variance of the error term depends on MALEᵢ. In other words, is the variance of the error term the same for men and for women? If so, the error is homoskedastic; if not, it is heteroskedastic.

Deciding whether the variance of uᵢ depends on MALEᵢ requires thinking hard about what the error term actually is. In this regard, it is useful to write Equation (5.19) as two separate equations, one for men and one for women:
    Earningsᵢ = β0 + uᵢ  (women) and  (5.20)

    Earningsᵢ = β0 + β1 + uᵢ  (men).  (5.21)
Thus, for women, uᵢ is the deviation of the iᵗʰ woman's earnings from the population mean earnings for women (β0), and for men, uᵢ is the deviation of the iᵗʰ man's earnings from the population mean earnings for men (β0 + β1). It follows that the statement "the variance of uᵢ does not depend on MALEᵢ" is equivalent to the statement "the variance of earnings is the same for men as it is for women."
Because the least squares assumptions in Key Concept 4.3 place no restrictions on the conditional variance, they apply to both the general case of heteroskedasticity and the special case of homoskedasticity. Therefore, the OLS estimators remain unbiased and consistent even if the errors are heteroskedastic. In addition, the OLS estimators have sampling distributions that are normal in large samples even if the errors are heteroskedastic. Whether the errors are homoskedastic or heteroskedastic, the OLS estimator is unbiased, consistent, and asymptotically normal.
Homoskedasticity-only variance formula. If the error term is homoskedastic, then the formulas for the variances of β̂0 and β̂1 in Key Concept 4.4 simplify. Consequently, if the errors are homoskedastic, then there is a specialized formula that can be used for the standard errors of β̂0 and β̂1. The homoskedasticity-only standard error of β̂1, derived in Appendix 5.1, is SE(β̂1) = √(σ̃²_β̂1), where σ̃²_β̂1 is the homoskedasticity-only estimator of the variance of β̂1:

σ̃²_β̂1 = s²_û / Σᵢ₌₁ⁿ (Xᵢ − X̄)²   (homoskedasticity-only),   (5.22)

where s²_û is given in Equation (4.19). The homoskedasticity-only formula for the standard error of β̂0 is given in Appendix 5.1. In the special case that X is a binary variable, the estimator of the variance of β̂1 under homoskedasticity (that is, the
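As a sketch of how the two standard-error formulas can differ, the code below simulates data whose error spread grows with X (all numbers are invented) and computes both the homoskedasticity-only standard error of Equation (5.22) and the heteroskedasticity-robust standard error based on the formula in Key Concept 4.4 referenced in the text. Under heteroskedasticity the two generally disagree, which is why the robust formula is the safe default.

```python
# Sketch under assumed, made-up parameters: compare the homoskedasticity-only
# SE of beta1_hat [Eq. (5.22)] with the heteroskedasticity-robust SE.
import random

random.seed(1)
n = 200
x = [random.gauss(10.0, 2.0) for _ in range(n)]
# heteroskedastic errors: the error standard deviation grows with x
y = [2.0 + 0.5 * xi + random.gauss(0.0, 0.3 * xi) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]          # OLS residuals

# homoskedasticity-only variance of b1: s_u^2 / sum (x_i - xbar)^2
s2_u = sum(ui ** 2 for ui in u) / (n - 2)
se_homo = (s2_u / sxx) ** 0.5

# heteroskedasticity-robust variance of b1 (sample analog of Key Concept 4.4)
num = sum((xi - xbar) ** 2 * ui ** 2 for xi, ui in zip(x, u)) / (n - 2)
se_robust = ((num / ((sxx / n) ** 2)) / n) ** 0.5

print(se_homo, se_robust)   # generally different when errors are heteroskedastic
```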
CHAPTER 5  Linear Regression with One Regressor
FIGURE 5.3  (Figure not legible in this scan; the horizontal axis is years of education.)
when the sample size is large. In addition, under certain conditions the OLS estimator is more efficient than some other candidate estimators. Specifically, if the least squares assumptions hold and if the errors are homoskedastic, then the OLS estimator has the smallest variance of all conditionally unbiased estimators that are linear functions of Y₁, …, Yₙ. This section explains and discusses this result, which is a consequence of the Gauss–Markov theorem. The section concludes with a discussion of alternative estimators that are more efficient than OLS when the conditions of the Gauss–Markov theorem do not hold.
An estimator of β1 that is a linear function of Y₁, …, Yₙ can be written as

β̃1 = Σᵢ₌₁ⁿ aᵢYᵢ   (β̃1 is linear),   (5.24)

where the weights a₁, …, aₙ can depend on X₁, …, Xₙ but not on Y₁, …, Yₙ. The estimator β̃1 is conditionally unbiased if the mean of its conditional sampling distribution, given X₁, …, Xₙ, is β1. That is, the estimator β̃1 is conditionally unbiased if

E(β̃1 | X₁, …, Xₙ) = β1   (β̃1 is conditionally unbiased).   (5.25)
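That the OLS estimator is itself linear in the sense of Equation (5.24) can be verified numerically: its weights are âᵢ = (Xᵢ − X̄)/Σⱼ(Xⱼ − X̄)², a formula derived in Appendix 5.2 [Equation (5.32)]. The sketch below (invented data) checks this, along with the two restrictions Σᵢaᵢ = 0 and ΣᵢaᵢXᵢ = 1 that any linear conditionally unbiased estimator's weights must satisfy [Equation (5.35)].

```python
# Numerical check on made-up data: the OLS slope is a weighted sum of the Y's
# with weights that depend only on the X's, and those weights satisfy the
# unbiasedness restrictions sum(a_i) = 0 and sum(a_i * x_i) = 1.
x = [1.0, 2.0, 4.0, 7.0, 9.0]
y = [2.1, 2.9, 5.2, 8.3, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

b1_ols = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx

a = [(xi - xbar) / sxx for xi in x]                 # weights: functions of X only
b1_linear = sum(ai * yi for ai, yi in zip(a, y))    # same estimate, weighted form

print(abs(b1_ols - b1_linear) < 1e-12)                           # identical
print(abs(sum(a)) < 1e-12)                                       # sum a_i = 0
print(abs(sum(ai * xi for ai, xi in zip(a, x)) - 1.0) < 1e-12)   # sum a_i x_i = 1
```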
If the three least squares assumptions in Key Concept 4.3 hold and if the errors are homoskedastic, then the OLS estimator β̂1 is the Best (most efficient) Linear conditionally Unbiased Estimator; that is, it is BLUE.
The Gauss–Markov theorem provides a theoretical justification for using OLS. However, the theorem has two important limitations. First, its conditions might not hold in practice. In particular, if the error term is heteroskedastic, as it often is in economic applications, then the OLS estimator is no longer BLUE. As discussed in Section 5.4, the presence of heteroskedasticity does not pose a threat to inference based on heteroskedasticity-robust standard errors, but it does mean that OLS is no longer the efficient linear conditionally unbiased estimator. An alternative to OLS when there is heteroskedasticity of a known form, called the weighted least squares estimator, is discussed below.

The second limitation of the Gauss–Markov theorem is that even if the conditions of the theorem hold, there are other candidate estimators that are not linear and conditionally unbiased; under some conditions, these other estimators are more efficient than OLS.
The weighted least squares estimator. If the errors are heteroskedastic, then OLS is no longer BLUE. If the nature of the heteroskedasticity is known, specifically, if the conditional variance of uᵢ given Xᵢ is known up to a constant factor of proportionality, then it is possible to construct an estimator that has a smaller variance than the OLS estimator. This method, called weighted least squares (WLS), weights the iᵗʰ observation by the inverse of the square root of the conditional variance of uᵢ given Xᵢ, so that the errors in the weighted regression are homoskedastic.
The least absolute deviations (LAD) estimators of β0 and β1 are the values of b0 and b1 that minimize Σᵢ₌₁ⁿ |Yᵢ − b0 − b1Xᵢ|. In practice, this estimator is less sensitive to large outliers in u than is OLS.

In many economic data sets, severe outliers in u are rare, so use of the LAD estimator, or of other estimators with reduced sensitivity to outliers, is uncommon in applications. Thus the treatment of linear regression throughout the remainder of this text focuses exclusively on least squares methods.
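A minimal sketch of why LAD is less sensitive to outliers: in the special case of an intercept-only regression, least squares picks the sample mean (it minimizes squared deviations) while LAD picks the sample median (it minimizes absolute deviations). The numbers below are invented; a single severe outlier moves the mean a great deal but barely moves the median.

```python
# Intercept-only case: OLS estimate = mean, LAD estimate = median.
y = [9.8, 10.1, 10.0, 9.9, 10.2]
y_outlier = y + [100.0]                     # one severe outlier

def mean(v):
    return sum(v) / len(v)

def median(v):
    s = sorted(v)
    m = len(s) // 2
    return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

print(mean(y), median(y))                   # both near 10 without the outlier
print(mean(y_outlier), median(y_outlier))   # mean jumps; median barely moves
```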
estimator is normally distributed and the homoskedasticity-only t-statistic has a Student t distribution. These five assumptions (the three least squares assumptions, that the errors are homoskedastic, and that the errors are normally distributed) are collectively called the homoskedastic normal regression assumptions.
the Student t distribution (Appendix Table 2) instead of the standard normal distribution. Because the difference between the Student t distribution and the normal distribution is negligible if n is moderate or large, this distinction is relevant only if the sample size is small.

In econometric applications, there is rarely a reason to believe that the errors are homoskedastic and normally distributed. Because sample sizes typically are large, however, inference can proceed as described in Sections 5.1 and 5.2, that is, by first computing heteroskedasticity-robust standard errors and then using the standard normal distribution to compute p-values, hypothesis tests, and confidence intervals.
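The inference recipe of Sections 5.1 and 5.2 can be written as a small function: given an estimate and its (heteroskedasticity-robust) standard error, compute the t-statistic, the two-sided p-value from the standard normal distribution, and a 95% confidence interval. For illustration it is applied to the student–teacher ratio coefficient reported in the text (β̂1 = −2.28, robust SE = 0.52).

```python
# Large-sample inference from an estimate and its standard error.
import math

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inference(beta_hat, se, beta_null=0.0):
    t = (beta_hat - beta_null) / se
    p = 2.0 * (1.0 - normal_cdf(abs(t)))          # two-sided p-value
    ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)   # 95% CI
    return t, p, ci

t, p, ci = inference(beta_hat=-2.28, se=0.52)
print(t)    # about -4.38
print(p)    # far below 0.05: reject H0: beta1 = 0 at the 5% level
print(ci)   # about (-3.30, -1.26)
```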
5.7 Conclusion
Return for a moment to the problem that started Chapter 4: the superintendent who is considering hiring additional teachers to cut the student–teacher ratio. What have we learned that she might find useful?

Our regression analysis, based on the 420 observations for 1998 in the California test score data set, showed that there was a negative relationship between the student–teacher ratio and test scores: Districts with smaller classes have higher test scores. The coefficient is moderately large, in a practical sense: Districts with 2 fewer students per teacher have, on average, test scores that are 4.6 points higher. This corresponds to moving a district at the 50th percentile of the distribution of test scores to approximately the 60th percentile.

The coefficient on the student–teacher ratio is statistically significantly different from 0 at the 5% significance level. The population coefficient might be 0, and we might simply have estimated our negative coefficient by random sampling variation. However, the probability of doing so (and of obtaining a t-statistic on β1 as large as we did) purely by random variation over potential samples is
exceedingly small, approximately 0.001%. A 95% confidence interval for β1 is −3.30 ≤ β1 ≤ −1.26.
There is, in fact, reason to worry that it might not. Hiring more teachers, after all, costs money, so wealthier school districts can better afford smaller classes. But students at wealthier schools also have other advantages over their poorer neighbors, including better facilities, newer books, and better-paid teachers. Moreover, students at wealthier schools tend themselves to come from more affluent families, and thus have other advantages not directly associated with their school. For example, California has a large immigrant community; these immigrants tend to be poorer than the overall population and, in many cases, their children are not native English speakers. It thus might be that our negative estimated relationship between test scores and the student–teacher ratio is a consequence of large classes being found in conjunction with many other factors that are, in fact, the real cause of the lower test scores.

These other factors, or "omitted variables," could mean that the OLS analysis done so far has little value to the superintendent. Indeed, it could be misleading: Changing the student–teacher ratio alone would not change these other factors that determine a child's performance at school. To address this problem, we need a method that will allow us to isolate the effect on test scores of changing the student–teacher ratio, holding these other factors constant. That method is multiple regression analysis, the topic of Chapters 6 and 7.
Summary
1. Hypothesis testing for regression coefficients is analogous to hypothesis testing for the population mean: Use the t-statistic to calculate the p-values and either accept or reject the null hypothesis. Like a confidence interval for the population mean, a 95% confidence interval for a regression coefficient is computed as the estimator ± 1.96 standard errors.

2. When X is binary, the regression model can be used to estimate and test hypotheses about the difference between the population means of the "X = 0" group and the "X = 1" group.
tic, that is, var(uᵢ | Xᵢ = x) is constant. Homoskedasticity-only standard errors do not produce valid statistical inferences when the errors are heteroskedastic, but heteroskedasticity-robust standard errors do.

4. If the three least squares assumptions hold and if the regression errors are homoskedastic, then, as a result of the Gauss–Markov theorem, the OLS estimator is BLUE.

5. If the three least squares assumptions hold, if the regression errors are homoskedastic, and if the regression errors are normally distributed, then the OLS t-statistic computed using homoskedasticity-only standard errors has a Student t distribution when the null hypothesis is true. The difference between the Student t distribution and the normal distribution is negligible if the sample size is moderate or large.
Key Terms
heteroskedasticity and homoskedasticity (164)
homoskedasticity-only standard errors (166)
heteroskedasticity-robust standard error (166)
best linear unbiased estimator (BLUE) (168)
Gauss–Markov theorem (168)
weighted least squares (169)
homoskedastic normal regression assumptions (170)
Gauss–Markov conditions (182)
Review the Concepts

5.1 Outline the procedures for computing the p-value of a two-sided test of H0: μY = 0 using an i.i.d. set of observations Yᵢ, i = 1, …, n. Outline the procedures for computing the p-value of a two-sided test of H0: β1 = 0 in a regression model using an i.i.d. set of observations (Yᵢ, Xᵢ), i = 1, …, n.
5.2 Explain how you could use a regression model to estimate the wage gender gap using the data on earnings of men and women. What are the dependent and independent variables?
5.3
Exercises

5.1 Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression.

a. Construct a 95% confidence interval for β1, the regression slope coefficient.
b. Calculate the p-value for the two-sided test of the null hypothesis H0: β1 = 0. Do you reject the null hypothesis at the 5% level? At the 1% level?

c. Calculate the p-value for the two-sided test of the null hypothesis H0: β1 = −5.6. Without doing any additional calculations, determine whether −5.6 is contained in the 95% confidence interval for β1.
5.2 Suppose that a researcher, using wage data on 250 randomly selected male workers and 280 female workers, estimates the OLS regression

Ŵage = 12.52 + 2.12 Male,  R² = 0.06, SER = 4.2,
        (0.23)   (0.36)
where Wage is measured in $/hour and Male is a binary variable that is equal to 1 if the person is a male and 0 if the person is a female. Define the wage gender gap as the difference in mean earnings between men and women.

a. What is the estimated gender gap?

b. Is the estimated gender gap significantly different from zero? (Compute the p-value for testing the null hypothesis that there is no gender gap.)

c. Construct a 95% confidence interval for the gender gap.

d. In the sample, what is the mean wage of women? Of men?

e. Another researcher uses these same data, but regresses Wages on Female, a variable that is equal to 1 if the person is female and 0 if the person is male. What are the regression estimates calculated from this regression?
b. Is the estimated effect of class size on test scores statistically significant? Carry out a test at the 5% level.

c. Construct a 99% confidence interval for the effect of SmallClass on test scores.
5.6
5.7
Suppose that (Yᵢ, Xᵢ) satisfy the assumptions in Key Concept 4.3. A random sample of size n = 250 is drawn and yields
5.8
Suppose t hat (Y;.Xt) satisfy the assumptions in Key Concept 4.3 and, in addition, u; is N(O. aD and is independent of X,. A sample of size n = 30 yields
Y=
~-
(10.2) (7 .4)
where the n umbers in p arentheses are the ho moske das tic-o nly standa rd
errors for the r egression coefficients.
5.9 Consider the regression model Yᵢ = βXᵢ + uᵢ, where uᵢ and Xᵢ satisfy the assumptions in Key Concept 4.3. Let β̄ denote an estimator of β that is constructed as β̄ = Ȳ/X̄, where Ȳ and X̄ are the sample means of Yᵢ and Xᵢ, respectively.
5.10 Let Xᵢ denote a binary variable and consider the regression Yᵢ = β0 + β1Xᵢ + uᵢ. Let Ȳ0 denote the sample mean for observations with X = 0 and Ȳ1 denote the sample mean for observations with X = 1. Show that β̂0 = Ȳ0, β̂0 + β̂1 = Ȳ1, and β̂1 = Ȳ1 − Ȳ0.
5.11 A random sample of workers contains nm = 120 men and nw = 131 women. The sample average of men's weekly earnings, Ȳm = (1/nm) Σᵢ₌₁ⁿᵐ Ym,i, is $523.10, and the sample standard deviation is sm = $68.10. The corresponding values for women are Ȳw = $485.10 and sw = $51.10. Let Women denote an indicator variable that is equal to 1 for women and 0 for men, and suppose that all 251 observations are used in the regression Yᵢ = β0 + β1 Womenᵢ + uᵢ. Find the OLS estimates of β0 and β1 and their corresponding standard errors.
5.13 Suppose that (Yᵢ, Xᵢ) satisfy the assumptions in Key Concept 4.3 and, in addition, uᵢ is N(0, σ²ᵤ) and is independent of Xᵢ.

a. Is β̂1 conditionally unbiased?

b. Is β̂1 the best linear conditionally unbiased estimator of β1?

c. How would your answers to (a) and (b) change if you assumed only that (Yᵢ, Xᵢ) satisfied the assumptions in Key Concept 4.3 and var(uᵢ | Xᵢ = x) is constant?

d. How would your answers to (a) and (b) change if you assumed only that (Yᵢ, Xᵢ) satisfied the assumptions in Key Concept 4.3?
Empirical Exercises
E5.1 Using the data set CPS04 described in Empirical Exercise 4.1, run a regression of average hourly earnings (AHE) on Age and carry out the following exercises.
e. Is the effect of age on earnings different for high school graduates than for college graduates? Explain. (Hint: See Exercise 5.15.)
E5.2
E5.3 Using the data set CollegeDistance described in Empirical Exercise 4.3, run a regression of years of completed education (ED) on distance to the nearest college (Dist) and carry out the following exercises.
d. Run the regression using data only on males and repeat (b).

e.
APPENDIX 5.1  Formulas for OLS Standard Errors
"L:
:r
(X,-
e~ttmator; ~
th""'- o
hetcroskeda~ticlly-robu~t 't :J
h)
181
r1!
for
Homoskedasticity-Only Variances
Under homoskedasticity, the conditional variance of uᵢ given Xᵢ is a constant: var(uᵢ | Xᵢ) = σ²ᵤ. If the errors are homoskedastic, the formulas in Key Concept 4.4 simplify to

σ²_β̂1 = σ²ᵤ / (n σ²X)   and   (5.27)

σ²_β̂0 = E(Xᵢ²) σ²ᵤ / (n σ²X).   (5.28)
To derive Equation (5.27), write the numerator in Equation (4.21) as var[(Xᵢ − μX)uᵢ] = E{[(Xᵢ − μX)uᵢ]²} − {E[(Xᵢ − μX)uᵢ]}² = E{[(Xᵢ − μX)uᵢ]²} = E[(Xᵢ − μX)² uᵢ²] = E[(Xᵢ − μX)² var(uᵢ | Xᵢ)], where the second equality follows because E[(Xᵢ − μX)uᵢ] = 0 (by the first least squares assumption) and where the final equality follows from the law of iterated expectations (Section 2.3). If uᵢ is homoskedastic, then var(uᵢ | Xᵢ) = σ²ᵤ, so E[(Xᵢ − μX)² var(uᵢ | Xᵢ)] = σ²ᵤ E[(Xᵢ − μX)²] = σ²ᵤ σ²X. The result in Equation (5.27) follows by substituting this expression into the numerator of Equation (4.21) and simplifying. A similar calculation yields Equation (5.28).
,robust stan
,,
}2_(X, -
(bomoskedasticity-only) and
(5.29)
(homoskedasticity-only),
(5.30)
X) 1
;- 1
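The population formula (5.27) can be checked by simulation: draw many samples with homoskedastic errors, estimate the slope in each, and compare the Monte Carlo variance of the estimates with σ²ᵤ/(nσ²X). All parameter values below are made up for the sketch.

```python
# Monte Carlo sketch of Equation (5.27) under assumed, invented parameters.
import random

random.seed(42)
n, sigma_u, sigma_x = 100, 2.0, 3.0
beta0, beta1 = 1.0, 0.5

draws = []
for _ in range(4000):
    x = [random.gauss(0, sigma_x) for _ in range(n)]
    y = [beta0 + beta1 * xi + random.gauss(0, sigma_u) for xi in x]  # homoskedastic
    xbar = sum(x) / n
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    draws.append(b1)

m = sum(draws) / len(draws)
mc_var = sum((d - m) ** 2 for d in draws) / len(draws)
theory = sigma_u ** 2 / (n * sigma_x ** 2)      # Equation (5.27)
print(mc_var, theory)                            # the two should be close
```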
APPENDIX 5.2  The Gauss–Markov Conditions and a Proof of the Gauss–Markov Theorem
The Gauss–Markov conditions are

(i)   E(uᵢ | X₁, …, Xₙ) = 0,
(ii)  var(uᵢ | X₁, …, Xₙ) = σ²ᵤ,  0 < σ²ᵤ < ∞, and   (5.31)
(iii) E(uᵢuⱼ | X₁, …, Xₙ) = 0,  i ≠ j,

where the conditions hold for i, j = 1, …, n. The three conditions, respectively, state that the errors have mean zero, are homoskedastic, and are uncorrelated, conditional on X ≡ (X₁, …, Xₙ).
The Gauss–Markov conditions are implied by the three least squares assumptions (Key Concept 4.3), plus the additional assumption that the errors are homoskedastic. Because the observations are i.i.d. (Assumption 2), E(uᵢ | X₁, …, Xₙ) = E(uᵢ | Xᵢ), and by Assumption 1, E(uᵢ | Xᵢ) = 0; thus condition (i) holds. Similarly, by Assumption 2, var(uᵢ | X₁, …, Xₙ) = var(uᵢ | Xᵢ), and because the errors are assumed to be homoskedastic, var(uᵢ | Xᵢ) = σ²ᵤ, which is constant. Assumption 3 (nonzero finite fourth moments) ensures that 0 < σ²ᵤ < ∞, so condition (ii) holds. To show that condition (iii) is implied by the least squares assumptions, note that E(uᵢuⱼ | X₁, …, Xₙ) = E(uᵢuⱼ | Xᵢ, Xⱼ) because (Xᵢ, Yᵢ) are i.i.d. by Assumption 2. Assumption 2 also implies that E(uᵢuⱼ | Xᵢ, Xⱼ) = E(uᵢ | Xᵢ) E(uⱼ | Xⱼ) for i ≠ j. Because E(uᵢ | Xᵢ) = 0 for all i, it follows that E(uᵢuⱼ | X₁, …, Xₙ) = 0 for all i ≠ j, so condition (iii) holds.
Thus, the least squares assumptions in Key Concept 4.3, plus homoskedasticity of the errors, imply the Gauss–Markov conditions in Equation (5.31).
The OLS estimator β̂1 is a linear conditionally unbiased estimator. Because Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) = Σᵢ₌₁ⁿ (Xᵢ − X̄)Yᵢ, the OLS estimator can be written as

β̂1 = Σᵢ₌₁ⁿ âᵢYᵢ,  where  âᵢ = (Xᵢ − X̄) / Σⱼ₌₁ⁿ (Xⱼ − X̄)².   (5.32)

The weights â₁, …, âₙ depend on X₁, …, Xₙ but not on Y₁, …, Yₙ, so the OLS estimator is linear in the sense of Equation (5.24). The result that β̂1 is conditionally unbiased was previously shown in Appendix 4.3.

Now consider any estimator β̃1 that is linear, so that it can be written as

β̃1 = Σᵢ₌₁ⁿ aᵢYᵢ.   (5.33)

Substituting Yᵢ = β0 + β1Xᵢ + uᵢ into Equation (5.33) yields

β̃1 = β0 Σᵢ₌₁ⁿ aᵢ + β1 Σᵢ₌₁ⁿ aᵢXᵢ + Σᵢ₌₁ⁿ aᵢuᵢ.   (5.34)

By the first Gauss–Markov condition, E(Σᵢaᵢuᵢ | X₁, …, Xₙ) = Σᵢ aᵢ E(uᵢ | X₁, …, Xₙ) = 0; thus, taking conditional expectations of both sides of Equation (5.34) yields E(β̃1 | X₁, …, Xₙ) = β0 Σᵢaᵢ + β1 ΣᵢaᵢXᵢ. Because this conditional expectation must equal β1 for all values of β0 and β1, it must be the case that, for β̃1 to be conditionally unbiased,

Σᵢ₌₁ⁿ aᵢ = 0  and  Σᵢ₌₁ⁿ aᵢXᵢ = 1.   (5.35)
Under the Gauss–Markov conditions, the conditional variance of β̃1 has a simple form. Substituting Equation (5.35) into Equation (5.34) yields β̃1 − β1 = Σᵢaᵢuᵢ, so

var(β̃1 | X₁, …, Xₙ) = var(Σᵢ₌₁ⁿ aᵢuᵢ | X₁, …, Xₙ) = σ²ᵤ Σᵢ₌₁ⁿ aᵢ²,   (5.36)

where the second equality uses the second and third Gauss–Markov conditions (the errors are homoskedastic and uncorrelated, conditional on the Xs). This expression also applies to the OLS estimator, with weights aᵢ = âᵢ.

We now show that the two restrictions in Equation (5.35) and the expression for the conditional variance in Equation (5.36) imply that the conditional variance of β̃1 exceeds the conditional variance of β̂1 unless β̃1 = β̂1. Let aᵢ = âᵢ + dᵢ, so Σᵢaᵢ² = Σᵢ(âᵢ + dᵢ)² = Σᵢâᵢ² + 2Σᵢâᵢdᵢ + Σᵢdᵢ². Using the definition of âᵢ, we have

Σᵢ₌₁ⁿ âᵢdᵢ = Σᵢ(Xᵢ − X̄)dᵢ / Σⱼ(Xⱼ − X̄)²
          = [ΣᵢdᵢXᵢ − X̄ Σᵢdᵢ] / Σⱼ(Xⱼ − X̄)²
          = [(ΣᵢaᵢXᵢ − ΣᵢâᵢXᵢ) − X̄(Σᵢaᵢ − Σᵢâᵢ)] / Σⱼ(Xⱼ − X̄)² = 0,

where the final equality follows from Equation (5.35), which holds for both the weights aᵢ and the OLS weights âᵢ. Thus σ²ᵤΣᵢaᵢ² = σ²ᵤΣᵢâᵢ² + σ²ᵤΣᵢdᵢ² = var(β̂1 | X₁, …, Xₙ) + σ²ᵤΣᵢdᵢ²; substituting this result into Equation (5.36) yields

var(β̃1 | X₁, …, Xₙ) − var(β̂1 | X₁, …, Xₙ) = σ²ᵤ Σᵢ₌₁ⁿ dᵢ².   (5.37)

Thus β̃1 has a greater conditional variance than β̂1 if dᵢ is nonzero for any i = 1, …, n. But if dᵢ = 0 for all i, then aᵢ = âᵢ and β̃1 = β̂1, which proves that OLS is BLUE.
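A Monte Carlo sketch of this result: hold the regressors fixed, draw many samples with homoskedastic errors, and compare the OLS slope with another linear conditionally unbiased estimator, here a "group means" slope whose weights also satisfy the restrictions of Equation (5.35). All numbers below are invented; the Gauss–Markov theorem says the OLS estimator should show the smaller variance across replications.

```python
# Monte Carlo comparison of OLS with an alternative linear unbiased estimator.
import random

random.seed(0)
x = [float(i) for i in range(1, 21)]          # fixed regressors across replications
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
beta0, beta1 = 1.0, 2.0

lo, hi = x[: n // 2], x[n // 2:]              # lower- and upper-half X groups
delta = sum(hi) / len(hi) - sum(lo) / len(lo)

ols_draws, grp_draws = [], []
for _ in range(2000):
    y = [beta0 + beta1 * xi + random.gauss(0, 3) for xi in x]   # homoskedastic u
    ols_draws.append(sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx)
    y_lo, y_hi = y[: n // 2], y[n // 2:]
    grp_draws.append((sum(y_hi) / len(y_hi) - sum(y_lo) / len(y_lo)) / delta)

def var(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

print(var(ols_draws) < var(grp_draws))   # Gauss-Markov: OLS variance is smaller
```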
The Gauss–Markov Theorem When X Is Nonrandom
With a minor change in interpretation, the Gauss–Markov theorem also applies to nonrandom regressors; that is, it applies to regressors that do not change their values over repeated samples. Specifically, if the second least squares assumption is replaced by the assumption that X₁, …, Xₙ are nonrandom (fixed over repeated samples) and u₁, …, uₙ are i.i.d., then the foregoing statement and proof of the Gauss–Markov theorem apply directly, except that all of the "conditional on X₁, …, Xₙ" statements are unnecessary because X₁, …, Xₙ take on the same values from one sample to the next.
The Sample Average Is the Efficient Linear Estimator of E(Y)

An implication of the Gauss–Markov theorem is that the sample average, Ȳ, is the most efficient linear estimator of E(Yᵢ) when Y₁, …, Yₙ are i.i.d. To see this, consider the case of regression without an "X", so that the only regressor is the constant regressor X₀ᵢ = 1. Then the OLS estimator is β̂0 = Ȳ. It follows that, under the Gauss–Markov assumptions, Ȳ is BLUE. Note that the Gauss–Markov requirement that the errors be homoskedastic is irrelevant in this case because there is no regressor, so it follows that Ȳ is BLUE if Y₁, …, Yₙ are i.i.d. This result was stated previously in Key Concept 3.3.
CHAPTER 6  Linear Regression with Multiple Regressors

Chapter 5 ended on a worried note. Although school districts with lower student–teacher ratios tend to have higher test scores in the California data set, perhaps students from districts with small classes have other advantages that help them perform well on standardized tests. Could this have produced misleading results and, if so, what can be done?
Omitted factors, such as student characteristics, can in fact make the ordinary least squares (OLS) estimator of the effect of class size on test scores misleading or, more precisely, biased. This chapter explains this "omitted variable bias" and introduces multiple regression, a method that can eliminate omitted variable bias. The key idea of multiple regression is that, if we have data on these omitted variables, then we can include them as additional regressors and thereby estimate the effect of one regressor while holding the other regressors constant. The coefficients of the multiple regression model can be estimated by OLS; the OLS estimators depend on data from a random sample, and in large samples the sampling distributions of the OLS estimators are approximately normal.
6.1 Omitted Variable Bias
By focusing only on the student–teacher ratio, the empirical analysis in Chapters 4 and 5 ignored some potentially important determinants of test scores by collecting their influences in the regression error term. These omitted factors include
school characteristics, such as teacher quality and computer usage, and student characteristics, such as family background. We begin by considering an omitted student characteristic that is particularly relevant to California because of its large immigrant population: the prevalence in the school district of students who are still learning English.

By ignoring the percentage of English learners in the district, the OLS estimator of the slope in the regression of test scores on the student–teacher ratio could be biased; that is, the mean of the sampling distribution of the OLS estimator might not equal the true effect on test scores of a unit change in the student–teacher ratio. Here is the reasoning. Students who are still learning English might perform worse on standardized tests than native English speakers. If districts with large classes also have many students still learning English, then the OLS regression of test scores on the student–teacher ratio could erroneously find a correlation and produce a large estimated coefficient, when in fact the true causal effect of cutting class sizes on test scores is small, even zero. Accordingly, based on the analysis of Chapters 4 and 5, the superintendent might hire enough new teachers to reduce the student–teacher ratio by two, but her hoped-for improvement in test scores will fail to materialize if the true coefficient is small or zero.

A look at the California data lends credence to this concern. The correlation between the student–teacher ratio and the percentage of English learners (students who are not native English speakers and who have not yet mastered English) in the district is 0.19. This small but positive correlation suggests that districts with more English learners tend to have a higher student–teacher ratio (larger classes). If the student–teacher ratio were unrelated to the percentage of English learners, then it would be safe to ignore English proficiency in the regression of test scores against the student–teacher ratio. But because the student–teacher ratio and the percentage of English learners are correlated, it is possible that the OLS coefficient in the regression of test scores on the student–teacher ratio reflects that influence.
Omitted variable bias and the first least squares assumption. Omitted variable bias means that the first least squares assumption, that E(uᵢ | Xᵢ) = 0, as listed in Key Concept 4.3, is incorrect. To see why, recall that the error term uᵢ in the linear regression model with a single regressor represents all factors, other than Xᵢ, that are determinants of Yᵢ. If one of these other factors is correlated with Xᵢ,
KEY CONCEPT 6.1  Omitted Variable Bias in Regression with a Single Regressor

Omitted variable bias is the bias in the OLS estimator that arises when the regressor, X, is correlated with an omitted variable. For omitted variable bias to occur, two conditions must be true:

1. X is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent variable, Y.
this means that the error term (which contains this factor) is correlated with Xᵢ. In other words, if an omitted variable is a determinant of Yᵢ, then it is in the error term, and if it is correlated with Xᵢ, then the error term is correlated with Xᵢ. Because uᵢ and Xᵢ are correlated, the conditional mean of uᵢ given Xᵢ is nonzero. This correlation therefore violates the first least squares assumption, and the consequence is serious: The OLS estimator is biased. This bias does not vanish even in very large samples, and the OLS estimator is inconsistent.
β̂1 →ᵖ β1 + ρXu (σu / σX).   (6.1)

That is, as the sample size increases, β̂1 is close to β1 + ρXu(σu/σX) with increasingly high probability.

The formula in Equation (6.1) summarizes several of the ideas discussed above about omitted variable bias:

1. Omitted variable bias is a problem whether the sample size is large or small. Because β̂1 does not converge in probability to the true value β1, β̂1 is inconsistent; that is, β̂1 is not a consistent estimator of β1 when there is omitted variable bias. The term ρXu(σu/σX) in Equation (6.1) is the bias in β̂1 that persists even in large samples.
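Equation (6.1) can be illustrated by simulation: build a regressor that is correlated with the error term and watch the OLS slope settle at β1 + ρXu(σu/σX) rather than at β1. The parameter values below are invented for the sketch.

```python
# Monte Carlo sketch of omitted variable bias, Equation (6.1).
import math
import random

random.seed(7)
n = 50000
beta1 = 1.0
rho = 0.5                                   # corr(X, u)

x, u = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x.append(z1)                                           # sigma_X = 1
    u.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)      # sigma_u = 1, corr = rho
y = [beta1 * xi + ui for xi, ui in zip(x, u)]

xbar = sum(x) / n
ybar = sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)

print(b1)   # close to beta1 + rho * (1/1) = 1.5, not to beta1 = 1.0
```

Even with 50,000 observations the estimate stays near 1.5: a large sample does not cure omitted variable bias.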
The "Mozart Effect": Omitted Variable Bias?

What is the evidence for the "Mozart effect"? A review of dozens of studies found that students who take optional music or arts courses in high school do in fact have higher English and math test scores than those who don't.¹ A closer look at these studies, however, suggests that the real reason for the better test performance has little to do with those courses: Students who take optional music or arts courses tend to differ in other ways from students who do not, and those other factors, rather than the courses themselves, could explain the higher test scores.
2. Whether this bias is large or small in practice depends on the correlation ρXu between the regressor and the error term. The larger |ρXu| is, the larger the bias.

3. The direction of the bias in β̂1 depends on whether X and u are positively or negatively correlated. For example, we speculated that the percentage of students learning English has a negative effect on district test scores (students still learning English have lower test scores).
TABLE 6.1  Differences in Test Scores for California School Districts with Low and High Student–Teacher Ratios, by the Percentage of English Learners in the District

                                 Student–Teacher Ratio < 20    Student–Teacher Ratio ≥ 20
                                 Average Test Score      n     Average Test Score      n     Difference   t-statistic
All districts                          657.4            238          650.0           182        7.4          4.04
Percentage of English learners
  < 1.9%                               664.5             76          665.4            27       −0.9         −0.30
  1.9–8.8%                             665.2             64          661.8            44        3.3          1.13
  8.8–23.0%                            654.9             54          649.7            50        5.2          1.72
  > 23.0%                              636.7             44          634.8            61        1.9          0.68
are divided into eight groups. First, the districts are broken into four categories that correspond to the quartiles of the distribution of the percentage of English learners across districts. Second, within each of these four categories, districts are further broken down into two groups, depending on whether the student–teacher ratio is small (STR < 20) or large (STR ≥ 20).

The first row in Table 6.1 reports the overall difference in average test scores between districts with low and high student–teacher ratios, that is, the difference in test scores between these two groups without breaking them down further into the quartiles of English learners. (Recall that this difference was previously reported in regression form in Equation (5.18) as the OLS estimate of the coefficient on Dᵢ in the regression of TestScore on Dᵢ, where Dᵢ is a binary regressor that equals 1 if STRᵢ < 20 and equals 0 otherwise.) Over the full sample of 420 districts, the average test score is 7.4 points higher in districts with a low student–teacher ratio than a high one; the t-statistic is 4.04, so the null hypothesis that the mean test score is the same in the two groups is rejected at the 1% significance level.

The final four rows in Table 6.1 report the difference in test scores between districts with low and high student–teacher ratios, broken down by the quartile of the percentage of English learners. This evidence presents a different picture. Of the districts with the fewest English learners (< 1.9%), the average test score for those 76 with low student–teacher ratios is 664.5 and the average for the 27 with high student–teacher ratios is 665.4. Thus, for the districts with the fewest English learners, test scores were on average 0.9 points lower in the districts with low student–teacher ratios! In the second quartile, districts with low student–teacher ratios had test scores that averaged 3.3 points higher than those with high student–teacher ratios; this gap was 5.2 points for the third quartile and only 1.9 points for the quartile of districts with the most English learners. Once we hold the percentage of English learners constant, the difference in performance between districts with high and low student–teacher ratios is perhaps half (or less) of the overall estimate of 7.4 points.

At first this finding might seem puzzling. How can the overall effect on test scores be twice the effect on test scores within any quartile? The answer is that the districts with the most English learners tend to have both the highest student–teacher ratios and the lowest test scores. The difference in the average test score between districts in the lowest and highest quartile of the percentage of English learners is large, approximately 30 points. The districts with few English learners tend to have lower student–teacher ratios: 74% (76 of 103) of the districts in the first quartile of English learners have small classes (STR < 20), while only 42% (44 of 105) of the districts in the quartile with the most English learners have small classes. So, the districts with the most English learners have both lower test scores and higher student–teacher ratios than the other districts.
This analysis reinforces the superintendent's worry that omitted variable bias is present in the regression of test scores against the student–teacher ratio. By looking within quartiles of the percentage of English learners, the test score differences in the second part of Table 6.1 improve upon the simple difference-of-means analysis in the first line of Table 6.1. Still, this analysis does not yet provide the superintendent with a useful estimate of the effect on test scores of changing class size, holding constant the fraction of English learners. Such an estimate can be provided, however, using the method of multiple regression.
6.2 The Multiple Regression Model

The multiple regression model extends the single variable regression model of Chapters 4 and 5 to include additional variables as regressors. This model permits estimating the effect on Yᵢ of changing one variable (X1ᵢ) while holding the other regressors (X2ᵢ, X3ᵢ, and so forth) constant. In the class size problem, the multiple regression model provides a way to isolate the effect on test scores (Yᵢ) of the student–teacher ratio (X1ᵢ) while holding constant the percentage of students in the district who are English learners (X2ᵢ).
Suppose for the moment that there are only two independent variables, X1ᵢ and X2ᵢ. In the linear multiple regression model, the average relationship between these two independent variables and the dependent variable, Y, is given by the linear function

E(Yᵢ | X1ᵢ = x1, X2ᵢ = x2) = β0 + β1x1 + β2x2,   (6.2)

where E(Yᵢ | X1ᵢ = x1, X2ᵢ = x2) is the conditional expectation of Yᵢ given that X1ᵢ = x1 and X2ᵢ = x2.
194
CHAPTER 6
The coefficient β1 is the expected change in Y resulting from a one-unit change in X1, holding X2 constant; that is,

β1 = ΔY/ΔX1, holding X2 constant. (6.4)

The coefficient β2 is interpreted similarly. The population regression line is the relationship between Y and the regressors that holds on average in the population. Just as in the case of regression with a single regressor, however, this relationship does not hold exactly for each observation; in addition to the regressors, Yi depends on omitted factors, which are incorporated into the model through an
"error" term ui. This error term is the deviation of a particular observation (test scores in the ith district in our example) from the average population relationship. Accordingly, we have

Yi = β0 + β1X1i + β2X2i + ui, i = 1, ..., n, (6.5)

where the subscript i indicates the ith of the n observations (districts) in the sample.

Equation (6.5) is the population multiple regression model when there are two regressors, X1i and X2i.

In regression with binary regressors it can be useful to treat β0 as the coefficient on a regressor that always equals 1; think of β0 as the coefficient on X0i, where X0i = 1 for i = 1, ..., n. Accordingly, the population multiple regression model in Equation (6.5) can alternatively be written as
Yi = β0X0i + β1X1i + β2X2i + ui, where X0i = 1, i = 1, ..., n. (6.6)

The variable X0i is sometimes called the constant regressor because it takes on the same value (the value 1) for all observations. Similarly, the intercept, β0, is sometimes called the constant term in the regression.

The discussion so far has considered adding a single omitted variable, X2. In practice, however, there might be multiple factors omitted from the single-regressor model. For example, ignoring the students' economic background might result in omitted variable bias, just as ignoring the fraction of English learners did. This reasoning leads us to consider a model with three regressors or, more generally, a model that includes k regressors. The multiple regression model with k regressors, X1i, X2i, ..., Xki, is summarized as Key Concept 6.2.
The definitions of homoskedasticity and heteroskedasticity in the multiple regression model extend their definitions in the single-regressor model: the error term ui is homoskedastic if the variance of the conditional distribution of ui given X1i, ..., Xki is constant for i = 1, ..., n and thus does not depend on the values of the regressors; otherwise, the error term is heteroskedastic.
KEY CONCEPT 6.2: The Multiple Regression Model

The multiple regression model is

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui, i = 1, ..., n, (6.7)

where

- Yi is the ith observation on the dependent variable; X1i, X2i, ..., Xki are the ith observations on each of the k regressors; and ui is the error term.
- The population regression line is the relationship that holds between Y and the Xs on average in the population: E(Y | X1i = x1, X2i = x2, ..., Xki = xk) = β0 + β1x1 + β2x2 + ... + βkxk.
- β1 is the slope coefficient on X1, β2 is the coefficient on X2, and so on. The coefficient β1 is the expected change in Yi resulting from a one-unit change in X1i, holding constant X2i, ..., Xki. The coefficients on the other Xs are interpreted similarly.
- The intercept β0 is the expected value of Y when all the Xs equal 0. The intercept can be thought of as the coefficient on a regressor, X0i, that equals 1 for all i.
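The model in Key Concept 6.2 can be illustrated with a short simulation. The sketch below (Python with NumPy, not part of the text; the coefficient values and distributions are invented for illustration) generates data from the model with k = 2 and verifies that raising X1 by one unit while holding X2 and u fixed changes Y by exactly β1, the partial effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical population coefficients, chosen only for illustration.
beta0, beta1, beta2 = 686.0, -1.10, -0.65

X1 = rng.uniform(14, 26, size=n)   # e.g., student-teacher ratio
X2 = rng.uniform(0, 80, size=n)    # e.g., percent English learners
u = rng.normal(0, 10, size=n)      # error term

# The multiple regression model with two regressors.
Y = beta0 + beta1 * X1 + beta2 * X2 + u

# Partial effect: raise X1 by one unit, hold X2 and u fixed.
Y_shift = beta0 + beta1 * (X1 + 1) + beta2 * X2 + u
print((Y_shift - Y).mean())  # exactly beta1 = -1.10 (up to rounding)
```

The point of the sketch is the interpretation of β1: the change in Y from a unit change in X1 with everything else in the model held constant.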
of practical help to the superintendent, however, we need to provide her with estimates of the unknown population coefficients β0, ..., βk of the population regression model, calculated using a sample of data. Fortunately, these coefficients can be estimated using ordinary least squares.

6.3 The OLS Estimator in Multiple Regression

This section describes how the coefficients of the multiple regression model can be estimated using OLS. The key idea is the same as for the single-regressor model: choose the estimates b0, b1, ..., bk of β0, β1, ..., βk to minimize the sum of squared prediction mistakes over all n observations,
Σi=1,…,n (Yi − b0 − b1X1i − … − bkXki)². (6.8)

The sum of the squared mistakes for the linear regression model in expression (6.8) is the extension of the sum of the squared mistakes given in Equation (4.6) for the linear regression model with a single regressor. The estimators of the coefficients that minimize expression (6.8) are the ordinary least squares (OLS) estimators, and the OLS residuals are ûi = Yi − Ŷi.
The OLS estimators could be computed by trial and error, repeatedly trying different values of b0, ..., bk until you are satisfied that you have minimized the total sum of squares in expression (6.8). It is far easier, however, to use explicit formulas for the OLS estimators that are derived using calculus. The formulas for the OLS estimators in the multiple regression model are similar to those in Key Concept 4.2 for the single-regressor model. These formulas are incorporated into modern statistical software. In the multiple regression model, the formulas are best expressed and discussed using matrix notation, so their presentation is deferred to Section 18.1.
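As a sketch of what those deferred matrix formulas compute, the snippet below (illustrative, with made-up data; not from the text) builds the design matrix with the constant regressor X0 = 1 in the first column and solves the least squares problem numerically. `numpy.linalg.lstsq` minimizes the sum of squared mistakes in expression (6.8); the closed form it implements in the full-rank case is (X'X)⁻¹X'Y:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data from a known population model (coefficients are made up).
X1 = rng.normal(20, 2, size=n)
X2 = rng.normal(15, 10, size=n)
u = rng.normal(0, 5, size=n)
Y = 100.0 + 2.0 * X1 - 0.5 * X2 + u

# Design matrix: constant regressor X0 = 1, then the k = 2 regressors.
X = np.column_stack([np.ones(n), X1, X2])

# OLS estimates of (beta0, beta1, beta2), minimizing sum of squared mistakes.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)  # close to the population values [100.0, 2.0, -0.5]
```

A numerically stable solver is used here instead of forming (X'X)⁻¹ explicitly, which is also what statistical software typically does.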
KEY CONCEPT 6.3: The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model

The OLS estimators β̂0, β̂1, ..., β̂k are the values of b0, b1, ..., bk that minimize the sum of squared prediction mistakes Σi=1,…,n (Yi − b0 − b1X1i − … − bkXki)². The OLS predicted values Ŷi and residuals ûi are

Ŷi = β̂0 + β̂1X1i + … + β̂kXki, i = 1, ..., n, and (6.9)
ûi = Yi − Ŷi, i = 1, ..., n. (6.10)

The OLS estimators β̂0, β̂1, ..., β̂k and residual ûi are computed from a sample of n observations of (X1i, ..., Xki, Yi), i = 1, ..., n. These are estimators of the unknown true population coefficients β0, β1, ..., βk and error term ui.
In Section 4.2, we used OLS to estimate the relationship between test scores and the student-teacher ratio using our 420 observations on California school districts; the estimated OLS regression line was

TestScore = 698.9 − 2.28 × STR. (6.11)
Our concern has been that this relationship is misleading because the student-teacher ratio might be picking up the effect of having many English learners in districts with large classes. That is, it is possible that the OLS estimator is subject to omitted variable bias.

We are now in a position to address this concern by using OLS to estimate a multiple regression in which the dependent variable is the test score (Yi) and there are two regressors: the student-teacher ratio (X1i) and the percentage of English
learners in the school district (X2i) for our 420 districts (i = 1, ..., 420). The estimated OLS regression line for this multiple regression is

TestScore = 686.0 − 1.10 × STR − 0.65 × PctEL, (6.12)
where PctEL is the percentage of students in the district who are English learners. The OLS estimate of the intercept (β̂0) is 686.0, the OLS estimate of the coefficient on the student-teacher ratio (β̂1) is −1.10, and the OLS estimate of the coefficient on the percentage English learners (β̂2) is −0.65.

The estimated effect on test scores of a change in the student-teacher ratio in the multiple regression is approximately half as large as when the student-teacher ratio is the only regressor: in the single-regressor equation [Equation (6.11)], a unit decrease in the STR is estimated to increase test scores by 2.28 points, but in the multiple regression equation [Equation (6.12)], it is estimated to increase test scores by only 1.10 points. This difference occurs because the coefficient on STR in the multiple regression is the effect of a change in STR, holding constant (or controlling for) PctEL, whereas in the single-regressor regression, PctEL is not held constant.
These two estimates can be reconciled by concluding that there is omitted variable bias in the estimate in the single-regressor model in Equation (6.11). In Section 6.1, we saw that districts with a high percentage of English learners tend to have not only low test scores but also a high student-teacher ratio. If the fraction of English learners is omitted from the regression, reducing the student-teacher ratio is estimated to have a larger effect on test scores, but this estimate reflects both the effect of a change in the student-teacher ratio and the omitted effect of having fewer English learners in the district.
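This reconciliation can be mimicked in a simulation. The sketch below (Python with NumPy; all numbers are invented, not the California data) builds an omitted variable X2 that is correlated with X1 and directly affects Y, then compares the "short" regression that omits X2 with the "long" regression that includes it. The short-regression slope absorbs the omitted effect, exactly the pattern described above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Illustrative population: X2 is positively correlated with X1
# (slope of X2 on X1 is 5.0) and directly lowers Y.
X1 = rng.normal(20, 2, size=n)
X2 = 5.0 * (X1 - 20) + rng.normal(0, 10, size=n)
Y = 700.0 - 1.0 * X1 - 0.5 * X2 + rng.normal(0, 5, size=n)

X_short = np.column_stack([np.ones(n), X1])      # omits X2
X_long = np.column_stack([np.ones(n), X1, X2])   # includes X2

b_short, *_ = np.linalg.lstsq(X_short, Y, rcond=None)
b_long, *_ = np.linalg.lstsq(X_long, Y, rcond=None)

# Short regression picks up the omitted effect of X2 through X1:
# slope is about beta1 + beta2 * 5.0 = -1.0 + (-0.5)(5.0) = -3.5.
print(b_short[1])  # roughly -3.5 (biased)
print(b_long[1])   # roughly -1.0 (the partial effect)
```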
We have reached the same conclusion that there is omitted variable bias in the relationship between test scores and the student-teacher ratio by two different paths: the tabular approach of dividing the data into groups (Section 6.1) and the multiple regression approach [Equation (6.12)]. Of these two methods, multiple regression has two important advantages. First, it provides a quantitative estimate of the effect of a unit decrease in the student-teacher ratio, which is what the superintendent needs to make her decision. Second, it readily extends to more than two regressors, so that multiple regression can be used to control for measurable factors other than just the percentage of English learners.

The rest of this chapter is devoted to understanding and to using OLS in the multiple regression model. Much of what you learned about the OLS estimator with a single regressor carries over to multiple regression with few or no modifications, so we will focus on that which is new with multiple regression. We begin by discussing measures of fit for the multiple regression model.
6.4 Measures of Fit in Multiple Regression

Three commonly used summary statistics in multiple regression are the standard error of the regression, the regression R², and the adjusted R². All three measure how well the OLS estimate of the multiple regression line describes, or "fits," the data.

The Standard Error of the Regression (SER)

The standard error of the regression (SER) estimates the standard deviation of the error term ui. Thus, the SER is a measure of the spread of the distribution of Y around the regression line. In multiple regression, the SER is
SER = sû, where sû² = [1/(n − k − 1)] Σi=1,…,n ûi² = SSR/(n − k − 1), (6.13)

where SSR is the sum of squared residuals, SSR = Σi=1,…,n ûi².
The regression R² is the fraction of the sample variance of Yi explained by (or predicted by) the regressors. Equivalently, the R² is 1 minus the fraction of the variance of Yi not explained by the regressors. The mathematical definition of the R² is the same as for regression with a single regressor:
R² = ESS/TSS = 1 − SSR/TSS, (6.14)

where the explained sum of squares is ESS = Σi=1,…,n (Ŷi − Ȳ)² and the total sum of squares is TSS = Σi=1,…,n (Yi − Ȳ)².
In multiple regression, the R² increases whenever a regressor is added, unless the estimated coefficient on the added regressor is exactly zero. To see this, think about starting with one regressor and then adding a second. When you use OLS to estimate the model with both regressors, OLS finds the values of the coefficients that minimize the sum of squared residuals. If OLS happens to choose the coefficient on the new regressor to be exactly zero, then the SSR will be the same whether or not the second variable is included in the regression. But if OLS chooses any value other than zero, then it must be that this value reduced the SSR relative to the regression that excludes this regressor. In practice, it is extremely unusual for an estimated coefficient to be exactly zero, so in general the SSR will decrease, and the R² will increase, when a new regressor is added.

Because the R² increases whenever a regressor is added, an increase in the R² does not mean that adding a variable actually improves the fit of the model. The adjusted R², or R̄², is a modified version of the R² that does not necessarily increase when a new regressor is added:

R̄² = 1 − [(n − 1)/(n − k − 1)] × SSR/TSS = 1 − sû²/sY². (6.15)
The difference between this formula and the second definition of the R² in Equation (6.14) is that the ratio of the sum of squared residuals to the total sum of squares is multiplied by the factor (n − 1)/(n − k − 1). As the second expression in Equation (6.15) shows, this means that the adjusted R² is 1 minus the ratio of the sample variance of the OLS residuals [with the degrees-of-freedom correction in Equation (6.13)] to the sample variance of Y.
There are three useful things to know about the R̄². First, (n − 1)/(n − k − 1) is always greater than 1, so the R̄² is always less than the R². Second, adding a regressor has two opposite effects on the R̄²: the SSR falls, which increases the R̄², but the factor (n − 1)/(n − k − 1) increases, which decreases the R̄²; whether the R̄² increases or decreases depends on which effect is stronger. Third, the R̄² can be negative. This happens when the regressors, taken together, reduce the sum of squared residuals by such a small amount that this reduction fails to offset the factor (n − 1)/(n − k − 1).
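Equations (6.13) through (6.15) are straightforward to compute directly from the OLS residuals. A minimal sketch (toy simulated data, not the test score data set):

```python
import numpy as np

def fit_measures(Y, X):
    """SER, R^2, and adjusted R^2, following Equations (6.13)-(6.15).
    X excludes the constant; k is the number of regressors."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])      # add constant regressor
    b, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    u_hat = Y - Xc @ b                         # OLS residuals
    ssr = np.sum(u_hat**2)
    tss = np.sum((Y - Y.mean())**2)
    ser = np.sqrt(ssr / (n - k - 1))                     # Eq. (6.13)
    r2 = 1 - ssr / tss                                   # Eq. (6.14)
    r2_adj = 1 - (n - 1) / (n - k - 1) * ssr / tss       # Eq. (6.15)
    return ser, r2, r2_adj

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
Y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=200)
print(fit_measures(Y, X))
```

Because the degrees-of-freedom factor exceeds 1, the adjusted R² returned here is always slightly below the R², as the discussion above notes.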
Application to Test Scores. Equation (6.12) reports the estimated regression line for the multiple regression relating test scores to the student-teacher ratio and the percentage of English learners. For that regression, R² = 0.426, R̄² = 0.424, and SER = 14.5.
Comparing these measures of fit with those for the regression in which PctEL is excluded [Equation (6.11)] shows that including PctEL in the regression increased the R² from 0.051 to 0.426. When the only regressor is STR, only a small fraction of the variation in TestScore is explained; however, when PctEL is added to the regression, more than two-fifths (42.6%) of the variation in test scores is explained. In this sense, including the percentage of English learners substantially improves the fit of the regression. Because n is large and only two regressors appear in Equation (6.12), the difference between R² and adjusted R² is very small (R² = 0.426 versus R̄² = 0.424).

The SER for the regression excluding PctEL is 18.6; this value falls to 14.5 when PctEL is included as a second regressor. The units of the SER are points on the standardized test. The reduction in the SER tells us that predictions about standardized test scores are substantially more precise if they are made using the regression with both STR and PctEL than if they are made using the regression with only STR as a regressor.
The R² is useful because it quantifies the extent to which the regressors account for, or explain, the variation in the dependent variable. Nevertheless, heavy reliance on the R² (or R̄²) can be a trap. In applications, "maximize the R²" is rarely the answer to any economically or statistically meaningful question. Instead, the decision about whether to include a variable in a multiple regression should be based on whether including that variable allows you better to estimate the causal effect of interest. We return to the issue of how to decide which variables to include, and which to exclude, in Chapter 7. First, however, we need to develop methods for quantifying the sampling uncertainty of the OLS estimators. The starting point for doing so is extending the least squares assumptions of Chapter 4 to the case of multiple regressors.

6.5 The Least Squares Assumptions in Multiple Regression

There are four least squares assumptions in the multiple regression model. The first three are those of the single-regressor model
(Key Concept 4.3), extended to allow for multiple regressors, and these are discussed only briefly. The fourth assumption is new and is discussed in more detail.
Assumption #1: The Conditional Distribution of ui Given X1i, X2i, ..., Xki Has a Mean of Zero

The first assumption is that the conditional distribution of ui given X1i, ..., Xki has a mean of zero. This assumption extends the first least squares assumption with a single regressor to multiple regressors. It means that sometimes Yi is above the population regression line and sometimes Yi is below the population regression line, but on average over the population Yi falls on the population regression line. Therefore, for any value of the regressors, the expected value of ui is zero. As is the case for regression with a single regressor, this is the key assumption that makes the OLS estimators unbiased. We return to omitted variable bias in multiple regression in Chapter 7.
Assumption #2: (X1i, X2i, ..., Xki, Yi), i = 1, ..., n Are i.i.d.

The second assumption is that (X1i, ..., Xki, Yi), i = 1, ..., n are independently and identically distributed (i.i.d.) random variables. This assumption holds automatically if the data are collected by simple random sampling. The comments on this assumption appearing in Section 4.3 for a single regressor also apply to multiple regressors.
Assumption #3: Large Outliers Are Unlikely

The third least squares assumption is that large outliers, that is, observations with values far outside the usual range of the data, are unlikely. This assumption serves as a reminder that, as in the single-regressor case, the OLS estimators can be sensitive to large outliers.

The assumption that large outliers are unlikely is made mathematically precise by assuming that X1i, ..., Xki and Yi have nonzero finite fourth moments: 0 < E(X1i⁴) < ∞, ..., 0 < E(Xki⁴) < ∞ and 0 < E(Yi⁴) < ∞. Another way to state this assumption is that the dependent variable and regressors have finite kurtosis. This assumption is used to derive the properties of OLS regression statistics in large samples.
KEY CONCEPT 6.4: The Least Squares Assumptions in the Multiple Regression Model

Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui, i = 1, ..., n, where

1. ui has conditional mean zero given X1i, X2i, ..., Xki; that is, E(ui | X1i, X2i, ..., Xki) = 0.
2. (X1i, X2i, ..., Xki, Yi), i = 1, ..., n, are independently and identically distributed (i.i.d.) draws from their joint distribution.
3. Large outliers are unlikely: X1i, ..., Xki and Yi have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
Assumption #4: No Perfect Multicollinearity

The fourth assumption is new to the multiple regression model. It rules out an inconvenient situation, called perfect multicollinearity, in which it is impossible to compute the OLS estimator. The regressors are said to be perfectly multicollinear (or to exhibit perfect multicollinearity) if one of the regressors is a perfect linear function of the other regressors. The fourth least squares assumption is that the regressors are not perfectly multicollinear.
Why does perfect multicollinearity make it impossible to compute the OLS estimator? Suppose you want to estimate the coefficient on STR in a regression of TestScorei on STRi and PctELi, except that you make a typographical error and accidentally type in STRi a second time instead of PctELi; that is, you regress TestScorei on STRi and STRi. This is a case of perfect multicollinearity because one of the regressors (the first occurrence of STR) is a perfect linear function of another regressor (the second occurrence of STR). Depending on how your software package handles perfect multicollinearity, if you try to estimate this regression the software will do one of three things: (1) It will drop one of the occurrences of STR; (2) it will refuse to calculate the OLS estimates and give an error message; or (3) it will crash the computer. The mathematical reason for this failure is that perfect multicollinearity produces division by zero in the OLS formulas.

At an intuitive level, perfect multicollinearity is a problem because you are asking the regression to answer an illogical question. In multiple regression, the coefficient on one of the regressors is the effect of a change in that regressor, holding the other regressors constant. In the hypothetical regression of TestScore on STR and STR, the coefficient on the first occurrence of STR is the effect on test scores of a change in STR, holding constant STR. This makes no sense, and OLS cannot estimate this nonsensical partial effect.
The solution to perfect multicollinearity in this hypothetical regression is simply to correct the typo and to replace one of the occurrences of STR with the variable you originally wanted to include. This example is typical: When perfect multicollinearity occurs, it often reflects a logical mistake in choosing the regressors or some previously unrecognized feature of the data set. In general, the solution to perfect multicollinearity is to modify the regressors to eliminate the problem.

Additional examples of perfect multicollinearity are given in Section 6.7, which also defines and discusses imperfect multicollinearity.

The least squares assumptions for the multiple regression model are summarized in Key Concept 6.4.
6.6 The Distribution of the OLS Estimators in Multiple Regression

Because the data differ from one sample to the next, different samples produce different values of the OLS estimators. This variation across possible samples gives rise to the uncertainty associated with the OLS estimators of the population regression coefficients, β0, β1, ..., βk. Just as in the case of regression with a single regressor, this variation is summarized in the sampling distribution of the OLS estimators.
Recall from Section 4.4 that, under the least squares assumptions, the OLS estimators (β̂0 and β̂1) are unbiased and consistent estimators of the unknown coefficients (β0 and β1) in the linear regression model with a single regressor. In addition, in large samples, the sampling distribution of β̂0 and β̂1 is well approximated by a bivariate normal distribution.

These results carry over to multiple regression analysis. That is, under the least squares assumptions of Key Concept 6.4, the OLS estimators β̂0, β̂1, ..., β̂k are
unbiased and consistent estimators of β0, β1, ..., βk in the linear multiple regression model. In large samples, the joint sampling distribution of β̂0, β̂1, ..., β̂k is well approximated by a multivariate normal distribution, which is the extension of the bivariate normal distribution to the general case of two or more jointly normal random variables (Section 2.4).

Although the algebra is more complicated when there are multiple regressors, the central limit theorem applies to the OLS estimators in the multiple regression model for the same reason that it applies to Ȳ and to the OLS estimators when there is a single regressor: The OLS estimators β̂0, β̂1, ..., β̂k are averages of the randomly sampled data, and if the sample size is sufficiently large the sampling distribution of those averages becomes normal.
KEY CONCEPT 6.5: Large-Sample Distribution of β̂0, β̂1, ..., β̂k

If the least squares assumptions (Key Concept 6.4) hold, then in large samples the OLS estimators β̂0, β̂1, ..., β̂k are jointly normally distributed, and each β̂j is distributed N(βj, σ²β̂j), j = 0, ..., k.
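The unbiasedness and approximate normality in Key Concept 6.5 can be checked with a small Monte Carlo experiment (a sketch with made-up population values, not from the text): repeatedly draw samples from a known model, estimate β1 by OLS each time, and inspect the resulting collection of estimates:

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 100, 2000
beta1 = 2.0
estimates = np.empty(reps)

for r in range(reps):
    # Draw a fresh sample from the same population each repetition.
    X1 = rng.normal(0, 1, size=n)
    X2 = rng.normal(0, 1, size=n)
    Y = 1.0 + beta1 * X1 - 1.0 * X2 + rng.normal(0, 1, size=n)
    X = np.column_stack([np.ones(n), X1, X2])
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    estimates[r] = b[1]

# Unbiasedness: the Monte Carlo mean of beta1_hat is close to beta1.
print(estimates.mean())  # roughly 2.0

# Approximate normality: about 95% of estimates lie within
# 1.96 standard deviations of the mean.
inside = np.mean(np.abs(estimates - estimates.mean())
                 <= 1.96 * estimates.std())
print(inside)  # roughly 0.95
```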
6.7 Multicollinearity
As discussed in Section 6.5, perfect multicollinearity arises when one of the regressors is a perfect linear combination of the other regressors. This section provides some examples of perfect multicollinearity and discusses how perfect multicollinearity can arise, and can be avoided, in regressions with multiple binary regressors. Imperfect multicollinearity arises when one of the regressors is very highly correlated, but not perfectly correlated, with the other regressors. Unlike perfect multicollinearity, imperfect multicollinearity does not prevent estimation of the regression, nor does it imply a logical problem with the choice of regressors. However, it does mean that one or more regression coefficients could be estimated imprecisely.
Examples of Perfect Multicollinearity

We begin by discussing three examples of perfect multicollinearity.

Example #1: Fraction of English learners. Let FracELi be the fraction of
English learners in the ith district, which varies between 0 and 1. If the variable FracELi were included as a third regressor in addition to STRi and PctELi, the regressors would be perfectly multicollinear. The reason is that PctEL is the percentage of English learners, so that PctELi = 100 × FracELi for every district. Thus one of the regressors (PctELi) can be written as a perfect linear function of another regressor (FracELi).

Because of this perfect multicollinearity, it is impossible to compute the OLS estimates of the regression of TestScorei on STRi, PctELi, and FracELi. At an intuitive level, OLS fails because you are asking, What is the effect of a unit change in the percentage of English learners, holding constant the fraction of English learners? Because the percentage of English learners and the fraction of English learners move together in a perfect linear relationship, this question makes no sense and OLS cannot answer it.
Example #2: "Not very small" classes. Let NVSi be a binary variable that equals 1 if the student-teacher ratio in the ith district is "not very small"; specifically, NVSi equals 1 if STRi ≥ 12 and equals 0 otherwise, and consider the regression of TestScorei on STRi, PctELi, and NVSi. This regression also
exhibits perfect multicollinearity, but for a more subtle reason than the regression in the previous example. There are in fact no districts in our data set with STRi < 12; as you can see in the scatterplot in Figure 4.2, the smallest value of STR is 14. Thus NVSi = 1 for all observations. Now recall that the linear regression model with an intercept can equivalently be thought of as including a regressor, X0i, that equals 1 for all i, as is shown in Equation (6.6). Thus we can write NVSi = 1 × X0i for all the observations in our data set; that is, NVSi can be written as a perfect linear combination of the regressors; specifically, it equals X0i.

This illustrates two important points about perfect multicollinearity. First, when the regression includes an intercept, then one of the regressors that can be implicated in perfect multicollinearity is the constant regressor X0i. Second, perfect multicollinearity is a statement about the data set you have on hand. While it is possible to imagine a school district with fewer than 12 students per teacher, there are no such districts in our data set so we cannot analyze them in our regression.
Example #3: Percentage of English speakers. Let PctESi be the percentage of "English speakers" in the ith district, defined to be the percentage of students who are not English learners, and consider including PctESi as a regressor along with STRi and PctELi. Again the regressors will be perfectly multicollinear. Like the previous example, the perfect linear relationship among the regressors involves the constant regressor X0i: For every district, PctESi = 100 × X0i − PctELi.
The dummy variable trap. Another possible source of perfect multicollinearity arises when multiple binary, or dummy, variables are used as regressors. For example, suppose you have partitioned the school districts into three categories: rural, suburban, and urban. Each district falls into one (and only one) category. Let these binary variables be Rurali, which equals 1 for a rural district and equals 0 otherwise; Suburbani; and Urbani. If you include all three binary variables in the regression along with a constant, the regressors will be perfectly multicollinear: Because each district belongs to one and only one category, Rurali + Suburbani + Urbani = 1 = X0i, where X0i denotes the constant regressor introduced in Equation (6.6). Thus, to estimate the regression, you must exclude one of these four variables, either one of the binary indicators or the constant term. By convention, the constant term is retained, in which case one of the binary indicators is excluded. For example, if Rurali were excluded, then the coefficient on Suburbani would be the average difference between test scores in suburban and rural districts, holding constant the other variables in the regression.

In general, if there are G binary variables, if each observation falls into one and only one category, if there is an intercept in the regression, and if all G binary variables are included as regressors, then the regression will fail because of perfect multicollinearity. This situation is called the dummy variable trap. The usual way to avoid the dummy variable trap is to exclude one of the binary variables from the multiple regression, so only G − 1 of the G binary variables are included as regressors. In this case, the coefficients on the included binary variables represent the incremental effect of being in that category, relative to the base case of the omitted category, holding constant the other regressors. Alternatively, all G binary regressors can be included if the intercept is omitted from the regression.
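The dummy variable trap is easy to reproduce numerically. In this sketch (Python with NumPy, invented data), including all three region dummies plus the constant yields a rank-deficient design matrix; dropping the base category restores full rank, and each remaining dummy's coefficient estimates the gap relative to the omitted category:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
region = rng.integers(0, 3, size=n)  # 0 = rural, 1 = suburban, 2 = urban
rural = (region == 0).astype(float)
suburban = (region == 1).astype(float)
urban = (region == 2).astype(float)

# Made-up outcome: suburban districts score 5 points above rural,
# urban districts 10 points above rural.
Y = 650 + 5 * suburban + 10 * urban + rng.normal(0, 3, size=n)

# All G = 3 dummies plus the constant: Rural + Suburban + Urban = 1 = X0,
# so the design matrix is rank deficient (the dummy variable trap).
X_trap = np.column_stack([np.ones(n), rural, suburban, urban])
print(np.linalg.matrix_rank(X_trap))  # 3, not 4

# Standard fix: drop one dummy (here Rural, the base category).
X_ok = np.column_stack([np.ones(n), suburban, urban])
b, *_ = np.linalg.lstsq(X_ok, Y, rcond=None)
print(b)  # close to [650, 5, 10]: each slope is the gap relative to rural
```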
Solutions to perfect multicollinearity. When perfect multicollinearity arises, it usually reflects a mistake in the choice of regressors or in the construction of the data set. In general, the solution is to modify the set of regressors to eliminate the perfect linear relationship, for example by excluding one of the redundant regressors.
Imperfect Multicollinearity
Despite its similar name, imperfect multicollinearity is conceptually quite different from perfect multicollinearity. Imperfect multicollinearity means that two or more of the regressors are highly correlated, in the sense that there is a linear function of the regressors that is highly correlated with another regressor. Imperfect multicollinearity does not pose any problems for the theory of the OLS estimators; indeed, a purpose of OLS is to sort out the independent influences of the various regressors when these regressors are potentially correlated.

If the regressors are imperfectly multicollinear, then the coefficients on at least one individual regressor will be imprecisely estimated. For example, consider the regression of TestScore on STR and PctEL. Suppose we were to add a third regressor, the percentage of the district's residents who are first-generation immigrants. First-generation immigrants often speak English as a second language, so the variables PctEL and percentage immigrants will be highly correlated: Districts with many recent immigrants will tend to have many students who are still learning English. Because these two variables are highly correlated, it would be difficult to use these data to estimate the partial effect on test scores of an increase in PctEL, holding constant the percentage immigrants. In other words, the data set provides little information about what happens to test scores when the percentage of English learners is low but the fraction of immigrants is high, or vice versa. If the least squares assumptions hold, then the OLS estimator of the coefficient on PctEL in this regression will be unbiased; however, it will have a larger variance than if the regressors PctEL and percentage immigrants were uncorrelated.

The effect of imperfect multicollinearity on the variance of the OLS estimators can be seen mathematically by inspecting Equation (6.17) in Appendix 6.2, which is the variance of β̂1 in a multiple regression with two regressors (X1 and X2) for the special case of a homoskedastic error. In this case, the variance of β̂1 is inversely proportional to 1 − ρ²X1,X2, where ρX1,X2 is the correlation between X1 and X2. The larger the correlation between the two regressors, the closer this term is to zero and the larger is the variance of β̂1. More generally, when multiple regressors are imperfectly multicollinear, the coefficients on one or more of these regressors will be imprecisely estimated; that is, they will have a large sampling variance.

Perfect multicollinearity is a problem that often signals the presence of a logical error. In contrast, imperfect multicollinearity is not necessarily an error, but rather just a feature of OLS, your data, and the question you are trying to answer. If the variables in your regression are the ones you meant to include (the ones you chose to address the potential for omitted variable bias), then imperfect multicollinearity implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand.
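The inverse relationship between Var(β̂1) and 1 − ρ²X1,X2 can be illustrated by Monte Carlo (a sketch with a homoskedastic error and made-up values, not from the text): the sampling standard deviation of β̂1 scales roughly as 1/√(1 − ρ²), so it grows as the two regressors become more correlated:

```python
import numpy as np

def slope_sd(rho, n=100, reps=1000, seed=7):
    """Monte Carlo sampling sd of beta1_hat when corr(X1, X2) = rho,
    with a homoskedastic error (illustrative setup)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    out = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        Y = 1.0 + 1.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
        Xc = np.column_stack([np.ones(n), X])
        b, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
        out[r] = b[1]
    return out.std()

# Var(beta1_hat) is inversely proportional to 1 - rho^2: higher
# correlation between the regressors means a less precise estimate.
print(slope_sd(0.0))  # roughly 0.10
print(slope_sd(0.9))  # roughly 0.23, about 0.10 / sqrt(1 - 0.81)
```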
6.8 Conclusion
Regression with a single regressor is vulnerable to omitted variable bias: If an omitted variable is a determinant of the dependent variable and is correlated with the regressor, then the OLS estimator of the slope coefficient will be biased and will reflect both the effect of the regressor and the effect of the omitted variable. Multiple regression makes it possible to mitigate omitted variable bias by including the omitted variable in the regression. The coefficient on a regressor, X1, in multiple regression is the partial effect of a change in X1, holding constant the other included regressors. In the test score example, including the percentage of English learners as a regressor made it possible to estimate the effect on test scores of a change in the student-teacher ratio, holding constant the percentage of English learners. Doing so reduced by half the estimated effect on test scores of a change in the student-teacher ratio.

The statistical theory of multiple regression builds on the statistical theory of regression with a single regressor. The least squares assumptions for multiple regression are extensions of the three least squares assumptions for regression with a single regressor, plus a fourth assumption ruling out perfect multicollinearity. Because the regression coefficients are estimated using a single sample, the OLS estimators have a joint sampling distribution and, therefore, have sampling uncertainty. This sampling uncertainty must be quantified as part of an empirical study, and the ways to do so in the multiple regression model are the topic of the next chapter.
Summary
1. Omitted variable bias occurs when an omitted variable (1) is correlated with an included regressor and (2) is a determinant of Y.
2. The multiple regression model is a linear regression model that includes multiple regressors, X1, X2, ..., Xk. Associated with each regressor is a regression coefficient, β1, β2, ..., βk. The coefficient β1 is the expected change in Y associated with a one-unit change in X1, holding the other regressors constant. The other regression coefficients have an analogous interpretation.
3. The coefficients in multiple regression can be estimated by OLS. When the four least squares assumptions in Key Concept 6.4 are satisfied, the OLS estimators are unbiased, consistent, and normally distributed in large samples.
4. Perfect multicollinearity, which occurs when one regressor is an exact linear function of the other regressors, usually arises from a mistake in choosing which regressors to include in a multiple regression. Solving perfect multicollinearity requires changing the set of regressors.
Key Terms
omitted variable bias (187)
multiple regression model (193)
coefficient on X2i (193)
control variable (193)
holding X2 constant (194)
(195)
homoskedastic (195)
heteroskedastic (195)
computers per student? Why or why not? If you think β̂1 is biased, is it biased up or down? Why?
6.2
6.3
6.4 Explain why it is difficult to estimate precisely the partial effect of X1, holding X2 constant, if X1 and X2 are highly correlated.
Exercises
The first four exercises refer to the table of estimated regressions on page 213, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

AHE = average hourly earnings (in 1998 dollars)
College = binary variable (1 if college, 0 if high school)
Female = binary variable (1 if female, 0 if male)
Age = age (in years)
Northeast = binary variable (1 if Region = Northeast, 0 otherwise)
Midwest = binary variable (1 if Region = Midwest, 0 otherwise)
South = binary variable (1 if Region = South, 0 otherwise)
West = binary variable (1 if Region = West, 0 otherwise)
6.1
6.2
6.3
Dependent variable: average hourly earnings (AHE).

Regressor          (1)       (2)       (3)
College (X1)       5.46      5.48      5.44
Female (X2)       -2.64     -2.62     -2.62
Age (X3)                     0.29      0.29
Northeast (X4)                         0.69
Midwest (X5)                           0.60
South (X6)                            -0.27
Intercept         12.69      4.40      3.75

Summary Statistics
SER                6.27      6.22      6.21
R̄2                0.176     0.190     0.194
n                  4000      4000      4000
c. Juanita is a 28-year-old female college graduate from the South. Jennifer is a 28-year-old female college graduate from the Midwest. Calculate the expected difference in earnings between Juanita and Jennifer.
6.5 Data were collected from a random sample of 220 home sales from a community in 2003. Let Price denote the selling price (in $1000), BDR denote the number of bedrooms, Bath denote the number of bathrooms, Hsize denote the size of the house (in square feet), Lsize denote the lot size (in square feet), Age denote the age of the house (in years), and Poor denote a binary variable that is equal to 1 if the condition of the house is reported as "poor." An estimated regression yields

+ 0.090Age − 48.8Poor,
214
CHAPTER 6
6.6 A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county's crime rate on the (per capita) size of the county's police force.

a. Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?

b. Use your answer to (a) and the expression for omitted variable bias given in Equation (6.1) to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. (That is, do you think that β̂1 > β1 or β̂1 < β1?)
6.7 Critique each of the following proposed research plans. Your critique should explain any problems with the proposed research and describe how the research plan might be improved. Include a discussion of any additional data that need to be collected and the appropriate statistical techniques for analyzing the data.

a. A researcher is interested in determining whether a large aerospace firm is guilty of gender bias in setting wages. To determine potential bias, the researcher collects salary and gender information for all of the firm's engineers. The researcher then plans to conduct a "difference in means" test to determine whether the average salary for women is significantly less than the average salary for men.

b. A researcher is interested in determining whether time spent in prison has a permanent effect on a person's wage rate. He collects data on a random sample of people who have been out of prison for at least fifteen years. He collects similar data on a random sample of people who have never served time in prison. The data set includes information on each person's current wage, education, age, ethnicity, gender, tenure . . .
6.8 A recent study found that the death rate for people who sleep six to seven hours per night is lower than the death rate for people who sleep eight or more hours, and higher than the death rate for people who sleep five or fewer hours. The 1.1 million observations used for this study came from a random survey of Americans aged 30 to 102. Each survey respondent was tracked for four years. The death rate for people sleeping seven hours was calculated as the ratio of the number of deaths over the span of the study among people sleeping seven hours to the total number of survey respondents who slept seven hours. This calculation was then repeated for people sleeping six hours, and so on. Based on this summary, would you recommend that Americans who sleep nine hours per night consider reducing their sleep to six or seven hours if they want to prolong their lives? Why or why not? Explain.
6.9 (Yi, X1i, X2i) satisfy the assumptions in Key Concept 6.4. You are interested in β1, the causal effect of X1 on Y. Suppose that X1 and X2 are uncorrelated. You estimate β1 by regressing Y onto X1 (so that X2 is not included in the regression). Does this estimator suffer from omitted variable bias? Explain.
6.10 (Yi, X1i, X2i) satisfy the assumptions in Key Concept 6.4; in addition, var(ui | X1i, X2i) = 4 and var(X1i) = 6. A random sample of size n = 400 is drawn from the population.

a. Assume that X1 and X2 are uncorrelated. Compute the variance of β̂1.

c. Comment on the following statement: "When X1 and X2 are correlated . . ."

6.11 . . . for i = 1, . . . , n. (Notice that there is no constant term in the regression.) Following analysis like that used in Appendix 4.2:
Empirical Exercises

E6.1 Using the data set CollegeDistance described in Empirical Exercise 4.3, carry out the following exercises.

a. Run a regression of years of completed education (ED) on distance to the nearest college (Dist). What is the estimated slope?

b. Run a regression of ED on Dist, but include some additional regressors to control for characteristics of the student, the student's family, and the local labor market. In particular, include as additional regressors Bytest, Female, Black, Hispanic, Incomehi, Ownhome, DadColl, Cue80, and Stwmfg80. What is the estimated effect of Dist on ED?
c. Is the estimated effect of Dist on ED in the regression in (b) substantively different from the regression in (a)? Based on this, does the regression in (a) seem to suffer from important omitted variable bias?

d. Compare the fit of the regressions in (a) and (b) using the regression standard errors, R2 and R̄2. Why are the R2 and R̄2 so similar in regression (b)?

e. The value of the coefficient on DadColl is positive. What does this coefficient measure?

f. Explain why Cue80 and Stwmfg80 appear in the regression. Are the signs of their estimated coefficients (+ or −) what you would have believed? Interpret the magnitudes of these coefficients.

g. Bob is a black male. His high school was 20 miles from the nearest college. His base-year composite test score (Bytest) was 58. His family income in 1980 was $26,000, and his family owned a home. His mother attended college, but his father did not. The unemployment rate in his county was 7.5%, and the state average manufacturing hourly wage was $9.75. Predict Bob's years of completed schooling using the regression in (b).

h. Jim has the same characteristics as Bob except that his high school was 40 miles from the nearest college. Predict Jim's years of completed schooling using the regression in (b).
E6.3 Using the data set Growth described in Empirical Exercise 4.4, but excluding the data for Malta, carry out the following exercises.

a. Construct a table that shows the sample mean, standard deviation, and minimum and maximum values for the series Growth, TradeShare, YearsSchool, Oil, Rev_Coups, Assassinations, and RGDP60. Include the appropriate units for all entries.
e. Why is Oil omitted from the regression? What would happen if it were included?
APPENDIX 6.1

Derivation of Equation (6.1)

This appendix presents a derivation of the formula for omitted variable bias in Equation (6.1). Equation (4.30) in Appendix 4.3 states

β̂1 = β1 + [(1/n) Σi (Xi − X̄)ui] / [(1/n) Σi (Xi − X̄)²].   (6.16)

Under the last two assumptions in Key Concept 6.1, (1/n) Σi (Xi − X̄)² →p var(Xi) = σ²X and (1/n) Σi (Xi − X̄)ui →p cov(Xi, ui) = ρXu σX σu. Therefore β̂1 →p β1 + ρXu σX σu / σ²X = β1 + ρXu (σu/σX), which is Equation (6.1).
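The probability limit in Equation (6.1) can be checked by simulation: draw X and an error u with correlation ρXu, regress Y on X by OLS, and compare the slope estimate with β1 + ρXu(σu/σX). The sketch below uses purely hypothetical parameter values (β1 = 2, ρXu = 0.5, σu = 2, σX = 1), so the probability limit is 3.0.

```python
import random

# Hypothetical parameter values chosen for illustration; sigma_X = 1.
random.seed(0)
n = 100_000
beta1, rho_xu, sigma_u = 2.0, 0.5, 2.0

x, y = [], []
for _ in range(n):
    xi = random.gauss(0.0, 1.0)
    # Error term built so that corr(X, u) = rho_xu and sd(u) = sigma_u.
    ui = sigma_u * (rho_xu * xi + (1.0 - rho_xu**2) ** 0.5 * random.gauss(0.0, 1.0))
    x.append(xi)
    y.append(beta1 * xi + ui)

xbar = sum(x) / n
ybar = sum(y) / n
beta1_hat = (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
             / sum((a - xbar) ** 2 for a in x))
print(round(beta1_hat, 2))  # close to beta1 + rho_xu * sigma_u / sigma_X = 3.0
```

With a large n, the OLS estimate lands near 3.0 rather than the true causal coefficient 2.0, which is exactly the bias term ρXu(σu/σX) = 1.0.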
APPENDIX 6.2

Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors

Because the errors are homoskedastic, the conditional variance of ui can be written as var(ui | X1i, X2i) = σ²u. When there are two regressors, X1i and X2i, and the error term is homoskedastic, in large samples the sampling distribution of β̂1 is N(β1, σ²β̂1), where the variance of this distribution, σ²β̂1, is

σ²β̂1 = (1/n) [1 / (1 − ρ²X1,X2)] (σ²u / σ²X1),   (6.17)

where ρX1,X2 is the population correlation between the two regressors X1 and X2, and σ²X1 is the population variance of X1.

The variance σ²β̂1 of the sampling distribution of β̂1 depends on the squared correlation between the regressors. If X1 and X2 are highly correlated, either positively or negatively, then ρ²X1,X2 is close to 1, and thus the term 1 − ρ²X1,X2 in the denominator of Equation (6.17) is small and the variance of β̂1 is larger than it would be if ρX1,X2 were close to 0.

Another feature of the joint large-sample distribution of the OLS estimators is that β̂1 and β̂2 are in general correlated. When the errors are homoskedastic, the correlation between the OLS estimators β̂1 and β̂2 is the negative of the correlation between the two regressors:

corr(β̂1, β̂2) = −ρX1,X2.   (6.18)
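Equation (6.17) can be illustrated with a Monte Carlo check under assumed parameter values (σu = σX1 = 1, n = 200, ρX1,X2 = 0.8, all hypothetical): each draw simulates the two-regressor model and computes β̂1 from the 2×2 normal equations (the regressors are mean zero, so no intercept is needed), and the sampling variance across draws should be close to the formula.

```python
import random

def var_formula(n, rho, sigma_u=1.0, sigma_x1=1.0):
    # Equation (6.17): large-sample variance of beta1_hat with two regressors.
    return (1.0 / n) * (1.0 / (1.0 - rho**2)) * (sigma_u**2 / sigma_x1**2)

def beta1_hat(n, rho):
    # One draw of the OLS estimator in Y = X1 + X2 + u with corr(X1, X2) = rho.
    x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rho * a + (1.0 - rho**2) ** 0.5 * random.gauss(0.0, 1.0) for a in x1]
    y = [a + b + random.gauss(0.0, 1.0) for a, b in zip(x1, x2)]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    # Solve the 2x2 normal equations for the coefficient on X1.
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 * s12)

random.seed(2)
n, rho, reps = 200, 0.8, 2000
draws = [beta1_hat(n, rho) for _ in range(reps)]
mean_draw = sum(draws) / reps
emp_var = sum((d - mean_draw) ** 2 for d in draws) / (reps - 1)
print(round(var_formula(n, rho), 4), round(emp_var, 4))  # the two should be close
```

Rerunning with rho = 0.0 shows the variance shrinking by the factor 1 − ρ² = 0.36, the multicollinearity effect described above.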
CHAPTER 7

. . . regressors, thereby controlling for the effects of those additional regressors. The coefficients of the multiple regression model can be estimated by OLS. Like all estimators, the OLS estimator has sampling uncertainty because its value differs from one sample to the next.

This chapter presents methods for quantifying the sampling uncertainty of the OLS estimator through the use of standard errors, statistical hypothesis tests, and confidence intervals. One new possibility that arises in multiple regression is a hypothesis that simultaneously involves two or more regression coefficients. The general approach to testing such "joint" hypotheses involves a new test statistic, the F-statistic.

Section 7.1 extends the methods for statistical inference in regression with a single regressor to multiple regression. Sections 7.2 and 7.3 show how to test hypotheses that involve two or more regression coefficients. Section 7.4 extends the notion of confidence intervals for a single coefficient to confidence sets for multiple coefficients. Deciding which variables to include in a regression is an important practical issue, so Section 7.5 discusses ways to approach this problem. In Section 7.6, we apply multiple regression analysis to obtain improved estimates of the effect on test scores of a reduction in the student-teacher ratio using the California test score data set.
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient

Recall that, in the case of a single regressor, it was possible to estimate the variance of the OLS estimator by substituting sample averages for expectations, which led to the estimator σ̂²β̂1 given in Equation (5.4). Under the least squares assumptions, the law of large numbers implies that these sample averages converge to . . .

. . . estimated by its standard error, SE(β̂j). The formula for this standard error is most easily stated using matrices (see Section 18.2). The important point is that, as far as standard errors are concerned, there is nothing conceptually different between the single- and multiple-regressor cases. The key ideas, the large-sample normality of the estimators and the ability to estimate consistently the standard deviation of their sampling distribution, are the same whether one has one, two, or 12 regressors.
. . . learners in the district. This corresponds to hypothesizing that the true coefficient β1 on the student-teacher ratio is zero in the population regression of test scores on STR and PctEL. More generally, we might want to test the hypothesis that the true coefficient βj on the jth regressor takes on some specific value, βj,0. The null value βj,0 comes either from economic theory or, as in the student-teacher ratio example, from the decision-making context of the application. If the alternative hypothesis is two-sided, then the two hypotheses can be written mathematically as

H0: βj = βj,0 vs. H1: βj ≠ βj,0.   (7.1)
To test the null hypothesis, compute the t-statistic

t = (β̂j − βj,0) / SE(β̂j)   (7.2)

and the p-value,

p-value = 2Φ(−|t^act|),
where t^act is the value of the t-statistic actually computed. Reject the hypothesis at the 5% significance level if the p-value is less than 0.05 or, equivalently, if |t^act| > 1.96. The standard error and (typically) the t-statistic and p-value testing βj = 0 are computed automatically by regression software.

For example, if the first regressor is STR, then the null hypothesis that changing the student-teacher ratio has no effect on test scores corresponds to the null hypothesis that β1 = 0 (so β1,0 = 0). Our task is to test the null hypothesis H0 against the alternative H1 using a sample of data.

Key Concept 5.2 gives a procedure for testing this null hypothesis when there is a single regressor. The first step in this procedure is to calculate the standard error of the coefficient. The second step is to calculate the t-statistic using the general formula in Key Concept 5.1. The third step is to compute the p-value of the test using the cumulative normal distribution in Appendix Table 1 or, alternatively, to compare the t-statistic to the critical value corresponding to the desired significance level of the test. The theoretical underpinning of this procedure is that the OLS estimator has a large-sample normal distribution which, under the null hypothesis, has as its mean the hypothesized true value, and that the variance of this distribution can be estimated consistently.

This underpinning is present in multiple regression as well. As stated in Key Concept 6.5, the sampling distribution of β̂j is approximately normal. Under the null hypothesis the mean of this distribution is βj,0. The variance of this distribution can be estimated consistently. Therefore we can simply follow the same procedure as in the single-regressor case to test the null hypothesis in Equation (7.1).

The procedure for testing a hypothesis on a single coefficient in multiple . . .
. . . contains the true value of βj with 95% probability; that is, it contains the true value of βj in 95% of all possible randomly drawn samples. Equivalently, it is the set of values of βj that cannot be rejected by a 5% two-sided hypothesis test. When the sample size is large, the 95% confidence interval is

95% confidence interval for βj = [β̂j − 1.96 SE(β̂j), β̂j + 1.96 SE(β̂j)].   (7.3)

A 90% confidence interval is obtained by replacing 1.96 in Equation (7.3) with 1.645.
m odel is a lso the same as m the si n gl~-regressor model. Th is method is sum marized as Key Concept 7.2.
The method for cond ucting a h ypothesis ti;!Sl in Key Co ncept 7. 1 a nd the
method for constructing a conftdt!nce interval in Key Concept 7.2 rely on lhe largesample normal approximation to the distribution of U1e OLS estimator f3,. Accordingly. it sho uld be ke pt in mind that these methods for quantifying the sampling
uncertainty are only guaranteed to work in large samples.
The OLS regression line, with standard errors in parentheses, is

TestScore^ = 686.0 − 1.10 × STR − 0.650 × PctEL.   (7.5)
             (8.7)   (0.43)       (0.031)
To test the hypothesis that the true coefficient on STR is 0, we first need to compute the t-statistic in Equation (7.2). Because the null hypothesis says that the true value of this coefficient is zero, the t-statistic is t = (−1.10 − 0)/0.43 = −2.54. The associated p-value is 2Φ(−2.54) = 1.1%; that is, the smallest significance level at which we can reject the null hypothesis is 1.1%. Because the p-value is less than 5%, the null hypothesis can be rejected at the 5% significance level (but not quite at the 1% significance level).

A 95% confidence interval for the population coefficient on STR is −1.10 ± 1.96 × 0.43 = (−1.95, −0.26); that is, we can be 95% confident that the true value of the coefficient is between −1.95 and −0.26. Interpreted in the context of the superintendent's interest in decreasing the student-teacher ratio by 2, the 95% confidence interval for the effect on test scores of this reduction is (−1.95 × 2, −0.26 × 2) = (−3.90, −0.52).
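The arithmetic in this paragraph can be reproduced in a few lines of code. The sketch below uses only the rounded coefficient (−1.10) and standard error (0.43) printed above, so the computed t-statistic (about −2.56) differs in the second decimal from the chapter's −2.54, which is based on unrounded values.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

beta_hat, se, beta_null = -1.10, 0.43, 0.0   # rounded values from Equation (7.5)
t = (beta_hat - beta_null) / se
p_value = 2 * norm_cdf(-abs(t))              # two-sided p-value
ci_95 = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)

print(round(t, 2))        # -2.56 with the rounded inputs
print(round(p_value, 3))  # about 0.011, i.e., 1.1%
print(tuple(round(c, 2) for c in ci_95))
```

With the rounded inputs the interval is (−1.94, −0.26); the chapter's (−1.95, −0.26) again reflects the unrounded coefficient and standard error.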
Your analysis of the multiple regression in Equation (7.5) has persuaded the superintendent that, based on the evidence so far, reducing class size will help test scores in her district. Now, however, she moves on to a more nuanced question. If she is to hire more teachers, she can pay for those teachers either through cuts elsewhere in the budget (no new computers, reduced maintenance, and so on), or by asking for an increase in her budget, which taxpayers do not favor. What, she asks, is the effect on test scores of reducing the student-teacher ratio, holding expenditures per pupil (and the percentage of English learners) constant?

This question can be addressed by estimating a regression of test scores on the student-teacher ratio, total spending per pupil, and the percentage of English learners. The OLS regression line is

TestScore^ = 649.6 − 0.29 × STR + 3.87 × Expn − 0.656 × PctEL,   (7.6)
             (15.5)  (0.48)       (1.59)        (0.032)

where Expn is total annual expenditures per pupil in the district in thousands of dollars.
The result is striking. Holding expenditures per pupil and the percentage of English learners constant, changing the student-teacher ratio is estimated to have a very small effect on test scores: The estimated coefficient on STR is −1.10 in Equation (7.5) but, after adding Expn as a regressor in Equation (7.6), it is only −0.29. Moreover, the t-statistic for testing that the true value of the coefficient is zero is now t = (−0.29 − 0)/0.48 = −0.60, so the hypothesis that the population value of this coefficient is indeed zero cannot be rejected even at the 10% significance level (|−0.60| < 1.645). Thus Equation (7.6) provides no evidence that hiring more teachers improves test scores if overall expenditures per pupil are held constant.

One interpretation of the regression in Equation (7.6) is that, in these California data, school administrators allocate their budgets efficiently. Suppose, counterfactually, that the coefficient on STR in Equation (7.6) were negative and large. If so, school districts could raise their test scores simply by decreasing funding for other purposes (textbooks, technology, sports, and so on) and transferring those funds to hire more teachers, thereby reducing class sizes while holding expenditures constant. However, the small and statistically insignificant coefficient on STR in Equation (7.6) indicates that this transfer would have little effect on test scores. Put differently, districts are already allocating their funds efficiently.

Note that the standard error on STR increased when Expn was added, from 0.43 in Equation (7.5) to 0.48 in Equation (7.6). This illustrates the general point, introduced in Section 6.7 in the context of imperfect multicollinearity, that correlation between regressors (the correlation between STR and Expn is −0.62) can make the OLS estimators less precise.

What about our angry taxpayer? He asserts that the population values of both the coefficient on the student-teacher ratio (β1) and the coefficient on spending per pupil (β2) are zero; that is, he hypothesizes that both β1 = 0 and β2 = 0. Although it might seem that we can reject this hypothesis because the t-statistic . . .

7.2 Testing Hypotheses on Two or More Coefficients
The hypothesis that both the coefficient on the student-teacher ratio (β1) and the coefficient on expenditures per pupil (β2) are zero is an example of a joint hypothesis on the coefficients in the multiple regression model. In this case, the null hypothesis restricts the value of two of the coefficients, so as a matter of terminology we can say that the null hypothesis in Equation (7.7) imposes two restrictions on the multiple regression model: β1 = 0 and β2 = 0.

In general, a joint hypothesis is a hypothesis that imposes two or more restrictions on the regression coefficients. We consider joint null and alternative hypotheses of the form

H0: βj = βj,0, βm = βm,0, . . . , for a total of q restrictions, vs.
H1: one or more of the q restrictions under H0 does not hold,   (7.8)

where βj, βm, . . . refer to different regression coefficients, and βj,0, βm,0, . . . refer to the values of these coefficients under the null hypothesis. The null hypothesis in Equation (7.7) is an example of Equation (7.8). Another example is that, in a regression with k = 6 regressors, the null hypothesis is that the coefficients on the 2nd, 4th, and 5th regressors are zero; that is, β2 = 0, β4 = 0, and β5 = 0, so that there are q = 3 restrictions. In general, under the null hypothesis H0 there are q such restrictions.

If any one (or more than one) of the equalities under the null hypothesis H0 in Equation (7.8) is false, then the joint null hypothesis itself is false. Thus, the alternative hypothesis is that at least one of the equalities in the null hypothesis H0 does not hold.

Why can't I just test the individual coefficients one at a time? Although it seems it should be possible to test a joint hypothesis by using the usual t-statistics to test the restrictions one at a time, the following calculation shows that this approach is unreliable. Specifically, suppose that you are interested in testing the joint null hypothesis in Equation (7.6) that β1 = 0 and β2 = 0. Let t1 be the t-statistic for testing the null hypothesis that β1 = 0, and let t2 be the t-statistic for testing the null hypothesis that β2 = 0. What happens when you use the "one at a time" testing procedure: Reject the joint null hypothesis if either t1 or t2 exceeds 1.96 in absolute value?
Because this question involves the two random variables t1 and t2, answering it requires characterizing the joint sampling distribution of t1 and t2. As mentioned in Section 6.6, in large samples β̂1 and β̂2 have a joint normal distribution, so under the joint null hypothesis the t-statistics t1 and t2 have a bivariate normal distribution where each t-statistic has mean equal to 0 and variance equal to 1.

First consider the special case in which the t-statistics are uncorrelated and thus are independent. What is the size of the "one at a time" testing procedure; that is, what is the probability that you will reject the null hypothesis when it is true? More than 5%! In this special case we can calculate the rejection probability of this method exactly. The null is not rejected only if both |t1| ≤ 1.96 and |t2| ≤ 1.96. Because the t-statistics are independent, Pr(|t1| ≤ 1.96 and |t2| ≤ 1.96) = Pr(|t1| ≤ 1.96) × Pr(|t2| ≤ 1.96) = 0.95² = 90.25%, so the probability of rejecting the null hypothesis when it is true is 1 − 0.95² = 9.75%.

If the regressors are correlated, the situation is even more complicated. The size of the "one at a time" procedure depends on the value of the correlation between the regressors. Because the "one at a time" testing approach has the wrong size, that is, its rejection rate under the null hypothesis does not equal the desired significance level, a new approach is needed.

One approach is to modify the "one at a time" method so that it uses different critical values that ensure that its size equals its significance level. This method, called the Bonferroni method, is described in Appendix 7.1. The advantage of the Bonferroni method is that it applies very generally. Its disadvantage is that it can have low power; it frequently fails to reject the null hypothesis when in fact the alternative hypothesis is true.

Fortunately, there is another approach to testing joint hypotheses that is more powerful, especially when the regressors are highly correlated. That approach is based on the F-statistic.
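The 9.75% rejection rate of the "one at a time" procedure is easy to confirm by simulation. The sketch below draws the two t-statistics as independent standard normals, as they are in large samples under the null in the uncorrelated case.

```python
import random

random.seed(42)
n_sims = 200_000
rejections = 0
for _ in range(n_sims):
    t1 = random.gauss(0.0, 1.0)  # t-statistic for H0: beta1 = 0, large-sample N(0,1)
    t2 = random.gauss(0.0, 1.0)  # independent t-statistic for H0: beta2 = 0
    if abs(t1) > 1.96 or abs(t2) > 1.96:
        rejections += 1

size = rejections / n_sims
print(round(size, 3))  # close to 1 - 0.95**2 = 0.0975, not the intended 0.05
```

Nearly twice the nominal 5% of samples reject a true joint null, which is exactly the size distortion described above.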
The F-Statistic

The F-statistic is used to test joint hypotheses about regression coefficients. The formulas for the F-statistic are integrated into modern regression software. We first discuss the case of two restrictions, then turn to the general case of q restrictions.

The F-statistic with q = 2 restrictions. When the joint null hypothesis has the two restrictions that β1 = 0 and β2 = 0, the F-statistic combines the two t-statistics t1 and t2 using the formula
F = (1/2) × (t1² + t2² − 2ρ̂t1,t2 t1 t2) / (1 − ρ̂²t1,t2),   (7.9)

where ρ̂t1,t2 is an estimator of the correlation between the two t-statistics.
To understand the F-statistic in Equation (7.9), first suppose that we know that the t-statistics are uncorrelated so we can drop the terms involving ρ̂t1,t2. If so, Equation (7.9) simplifies and F = ½(t1² + t2²); that is, the F-statistic is the average of the squared t-statistics. Under the null hypothesis, t1 and t2 are independent standard normal random variables (because the t-statistics are uncorrelated by assumption), so under the null hypothesis F has an F2,∞ distribution (Section 2.4). Under the alternative hypothesis that either β1 is nonzero or β2 is nonzero (or both), then either t1² or t2² (or both) will be large, leading the test to reject the null hypothesis.

In general the t-statistics are correlated, and the formula for the F-statistic in Equation (7.9) adjusts for this correlation. This adjustment is made so that, under the null hypothesis, the F-statistic has an F2,∞ distribution in large samples whether or not the t-statistics are correlated.

The F-statistic with q restrictions. The formula for the heteroskedasticity-robust F-statistic testing the q restrictions of the joint null hypothesis in Equation (7.8) is given in Section 18.3. This formula is incorporated into regression software, making the F-statistic easy to compute in practice.

Under the null hypothesis, the F-statistic has a sampling distribution that, in large samples, is given by the Fq,∞ distribution. That is, in large samples, under the null hypothesis

the F-statistic is distributed Fq,∞.   (7.10)

Thus the critical values for the F-statistic can be obtained from the tables of the Fq,∞ distribution in Appendix Table 4 for the appropriate value of q and the desired significance level.
Computing the p-value using the F-statistic. The p-value of the F-statistic can be computed using the large-sample Fq,∞ approximation to its distribution. Let F^act denote the value of the F-statistic actually computed. Because the F-statistic has a large-sample Fq,∞ distribution under the null hypothesis, the p-value is

p-value = Pr[Fq,∞ > F^act].   (7.11)

The p-value in Equation (7.11) can be evaluated using a table of the Fq,∞ distribution (or, alternatively, a table of the χ²q distribution, because a χ²q-distributed random variable is q times an Fq,∞-distributed random variable). Alternatively, the p-value can be evaluated using a computer, because formulas for the cumulative chi-squared and F distributions have been incorporated into most modern statistical software.
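Equations (7.9) and (7.11) can be combined in a few lines. For q = 2 there is even a closed form: since 2F is distributed χ²₂ in large samples and Pr(χ²₂ > x) = exp(−x/2), the p-value is exp(−F). The t-statistics and their correlation below are hypothetical illustrations, not values from the chapter.

```python
from math import exp

def f_two_restrictions(t1, t2, rho_t):
    """Equation (7.9): F-statistic combining two t-statistics whose
    estimated correlation is rho_t."""
    return 0.5 * (t1**2 + t2**2 - 2.0 * rho_t * t1 * t2) / (1.0 - rho_t**2)

# hypothetical values for illustration
t1, t2, rho_t = 1.5, 2.0, 0.2
F = f_two_restrictions(t1, t2, rho_t)
p_value = exp(-F)   # Pr(F_{2,inf} > F) = Pr(chi2_2 > 2F) = exp(-F)
print(round(F, 2), round(p_value, 3))
```

Setting rho_t = 0 reduces the formula to the average of the squared t-statistics, ½(t1² + t2²), as described above.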
The "overall" regression F-statistic. The "overall" regression F-statistic tests the joint hypothesis that all the slope coefficients are zero. That is, the null and alternative hypotheses are

H0: β1 = 0, β2 = 0, . . . , βk = 0 vs. H1: βj ≠ 0, at least one j, j = 1, . . . , k.   (7.12)

Under this null hypothesis, none of the regressors explains any of the variation in Yi, although the intercept (which under the null hypothesis is the mean of Yi) can be nonzero. The null hypothesis in Equation (7.12) is a special case of the general null hypothesis in Equation (7.8), and the overall regression F-statistic is the F-statistic computed for the null hypothesis in Equation (7.12). In large samples, the overall regression F-statistic has an Fk,∞ distribution when the null hypothesis is true.
The F-statistic when q = 1. When q = 1, the F-statistic tests a single restriction. Then the joint null hypothesis reduces to the null hypothesis on a single regression coefficient, and the F-statistic is the square of the t-statistic.
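For a single restriction, the squared-t relation can be checked directly; using, for illustration, the t-statistic on STR reported earlier for Equation (7.5):

```python
t = -2.54          # t-statistic on STR from Equation (7.5)
F = t**2           # with q = 1, the F-statistic is the square of the t-statistic
print(round(F, 2)) # 6.45
```

Both statistics lead to the same inference: the 5% critical value for F1,∞ is 1.96² = 3.84, the square of the familiar t critical value.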
. . . hypothesis, in large samples this statistic has an F2,∞ distribution. The 5% critical value of the F2,∞ distribution is 3.00 (Appendix Table 4), and the 1% critical value is 4.61. The value of the F-statistic computed from the data, 5.43, exceeds 4.61, so the null hypothesis is rejected at the 1% level. It is very unlikely that we would have drawn a sample that produced an F-statistic as large as 5.43 if the null hypothesis really were true (the p-value is 0.005). Based on the evidence in Equation (7.6) as summarized in this F-statistic, we can reject the taxpayer's hypothesis that neither the student-teacher ratio nor expenditures per pupil has an effect on test scores (holding constant the percentage of English learners).
F = [(SSR_restricted − SSR_unrestricted)/q] / [SSR_unrestricted/(n − k_unrestricted − 1)],   (7.13)

where SSR_restricted is the sum of squared residuals from the restricted regression, SSR_unrestricted is the sum of squared residuals from the unrestricted regression, q is the number of restrictions under the null hypothesis, and k_unrestricted is the number of regressors in the unrestricted regression. An alternative equivalent formula for the homoskedasticity-only F-statistic is based on the R² of the two regressions:

F = [(R²_unrestricted − R²_restricted)/q] / [(1 − R²_unrestricted)/(n − k_unrestricted − 1)].   (7.14)
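Equation (7.14) is simple enough to compute by hand or in a few lines of code; the R² values, sample size, and number of regressors below are hypothetical placeholders, not the chapter's numbers.

```python
def f_homoskedasticity_only(r2_unres, r2_res, q, n, k_unres):
    """Equation (7.14): rule-of-thumb F-statistic from the R^2 of the
    restricted and unrestricted regressions (valid only under homoskedasticity)."""
    return ((r2_unres - r2_res) / q) / ((1.0 - r2_unres) / (n - k_unres - 1))

# hypothetical example: q = 2 restrictions, n = 420 observations,
# k = 3 regressors in the unrestricted regression
F = f_homoskedasticity_only(r2_unres=0.44, r2_res=0.415, q=2, n=420, k_unres=3)
print(round(F, 2))  # 9.29
```

The numerator rewards the improvement in fit from relaxing the restrictions; the denominator scales it by the unexplained variation per degree of freedom.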
If the errors are homoskedastic, then the difference between the homoskedasticity-only F-statistic computed using Equation (7.13) or (7.14) and the heteroskedasticity-robust F-statistic vanishes as the sample size n increases. Thus, if the errors are homoskedastic, the sampling distribution of the rule-of-thumb F-statistic under the null hypothesis is, in large samples, Fq,∞.

These rule-of-thumb formulas are easy to compute and have an intuitive interpretation in terms of how well the unrestricted and restricted regressions fit the data. Unfortunately, they are valid only if the errors are homoskedastic. Because . . .

If the errors are homoskedastic and are i.i.d. normally distributed, then the homoskedasticity-only F-statistic defined in Equations (7.13) and (7.14) has an Fq,n−k_unrestricted−1 distribution under the null hypothesis. Critical values for this distribution, which . . .

Application to Test Scores and the Student-Teacher Ratio. To test the null hypothesis . . .
TestScore^ = 664.7 − 0.671 × PctEL,  R² = 0.4149.   (7.15)
             (1.0)    (0.032)
7.3 Testing Single Restrictions Involving Multiple Coefficients

Consider the null hypothesis that two of the coefficients are equal:

H0: β1 = β2 vs. H1: β1 ≠ β2.   (7.16)

Approach #1: Test the restriction directly. Some statistical packages have a specialized command designed to test restrictions like Equation (7.16), and the result is an F-statistic that, because q = 1, has an F1,∞ distribution under the null hypothesis. (Note that 1.96² = 3.84.)

Approach #2: Transform the regression. If your statistical package cannot test the restriction directly, the hypothesis in Equation (7.16) can be tested using a trick in which the original regression equation is rewritten to turn the restriction in Equation (7.16) into a restriction on a single regression coefficient. To be concrete, suppose there are only two regressors, X1i and X2i, in the regression, so the population regression has the form

Yi = β0 + β1X1i + β2X2i + ui.   (7.17)
Here is the trick: By subtracting and adding β2X1i, we have that β1X1i + β2X2i = β1X1i − β2X1i + β2X1i + β2X2i = (β1 − β2)X1i + β2(X1i + X2i) = γ1X1i + β2Wi, where γ1 = β1 − β2 and Wi = X1i + X2i. Thus, the population regression in Equation (7.17) can be rewritten as

Yi = β0 + γ1X1i + β2Wi + ui.   (7.18)

Because the coefficient γ1 in this equation is γ1 = β1 − β2, under the null hypothesis in Equation (7.16), γ1 = 0, while under the alternative, γ1 ≠ 0. Thus, by turning Equation (7.17) into Equation (7.18), we have turned a restriction on two regression coefficients into a restriction on a single regression coefficient.

Because the restriction now involves the single coefficient γ1, the null hypothesis in Equation (7.16) can be tested using the t-statistic method of Section 7.1. In practice, this is done by first constructing the new regressor Wi as the sum of the two original regressors, then estimating the regression of Yi on X1i and Wi. A 95% confidence interval for the difference in the coefficients β1 − β2 can be calculated as γ̂1 ± 1.96 SE(γ̂1).

This method can be extended to other restrictions on regression equations using the same trick (see Exercise 7.9).
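The transformation can be sketched in code. Below, data are simulated from Equation (7.17) with β1 = β2 (so the null in Equation (7.16) holds), the new regressor W = X1 + X2 is constructed, and Y is regressed on X1 and W; the estimate of γ1 = β1 − β2 should then be close to zero. The OLS helper and all parameter values are illustrative assumptions, not from the text.

```python
import random

def ols(y, xs):
    """OLS via the normal equations (X'X)b = X'y, solved by Gaussian
    elimination with partial pivoting; an intercept column is prepended."""
    n = len(y)
    X = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for j in range(p, k):
                A[r][j] -= f * A[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(A[p][j] * b[j] for j in range(p + 1, k))) / A[p][p]
    return b

random.seed(1)
n = 5000
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [random.gauss(0.0, 1.0) for _ in range(n)]
b0, b1, b2 = 1.0, 0.7, 0.7   # true coefficients satisfy H0: beta1 = beta2
y = [b0 + b1 * x1[i] + b2 * x2[i] + random.gauss(0.0, 1.0) for i in range(n)]

w = [x1[i] + x2[i] for i in range(n)]   # W = X1 + X2
coef = ols(y, [x1, w])                  # regress Y on X1 and W
gamma1 = coef[1]                        # estimate of gamma1 = beta1 - beta2
print(round(gamma1, 2))                 # close to 0 under H0
```

Rerunning with b1 ≠ b2 makes the estimate of γ1 move away from zero by roughly the difference b1 − b2, which is what the t-test on γ1 detects.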
The two methods (Approaches #1 and #2) are equivalent, in the sense that the F-statistic from the first method equals the square of the t-statistic from the second method.
Extension to q
> 1.
7.4
7 .5
~ FIGURE 7. 1
235
95% Confidence Set for Coefficients on STR and Expn from Equation (7.6}
The confidence ellipse is a fat sausage with the long part of the sausage oriented
in the lower-left/upper-right direction. The reason for this orientation is that the
estimated correlation between β̂1 and β̂2 is positive, which in turn arises because
the correlation between the regressors STR and Expn is negative (schools that
spend more per pupil tend to have fewer students per teacher).
7.5 Model Specification for Multiple Regression
The job of determining which variables to include in multiple regression (that is,
the problem of choosing a regression specification) can be quite challenging, and
no single rule applies in all situations. But do not despair, because some useful
guidelines are available. The starting point for choosing a regression specification
is thinking through the possible sources of omitted variable bias. It is important to
rely on your expert knowledge of the empirical problem and to focus on obtaining an unbiased estimate of the causal effect of interest; do not rely solely on purely
statistical measures of fit such as the R² or R̄².
KEY CONCEPT 7.3

Omitted Variable Bias in Multiple Regression

Omitted variable bias is the bias in the OLS estimator that arises when one or
more included regressors are correlated with an omitted variable. For omitted variable bias to arise, two things must be true:

1. At least one of the included regressors must be correlated with the omitted
variable.

2. The omitted variable must be a determinant of the dependent variable, Y.
Expert judgment and economic theory are rarely decisive, however, and often the
variables suggested by economic theory are not the ones on which you have data.
Therefore the next step is to develop a list of candidate alternative specifications,
that is, alternative sets of regressors. If the estimates of the coefficients of interest
are numerically similar across the alternative specifications, then this provides evidence that the estimates from your base specification are reliable. If, on the other
hand, the estimates of the coefficients of interest change substantially across specifications, this often provides evidence that the original specification had omitted
variable bias. We elaborate on this approach to model specification in Section 9.2
after studying some tools for specifying regressions.
Interpreting the R² and the Adjusted R² in Practice
An R² or an R̄² near 1 means that the regressors are good at predicting the values
of the dependent variable in the sample, and an R² or an R̄² near 0 means they are
not. This makes these statistics useful summaries of the predictive ability of the
regression. However, it is easy to read more into them than they deserve.

There are four potential pitfalls to guard against when using the R² or R̄²:
1. An increase in the R² or R̄² does not necessarily mean that an added variable is statistically significant. The R² increases whenever you add a regressor, whether or not it is statistically significant. The R̄² does not always
increase, but if it does, this does not necessarily mean that the coefficient on
that added regressor is statistically significant. To ascertain whether an added
variable is statistically significant, you need to perform a hypothesis test using
the t-statistic.
KEY CONCEPT 7.4

R² and R̄²: What They Tell You and What They Don't
The R² and R̄² tell you whether the regressors are good at predicting, or "explaining," the values of the dependent variable in the sample of data on hand. If the R²
(or R̄²) is nearly 1, then the regressors produce good predictions of the dependent
variable in that sample, in the sense that the variance of the OLS residual is small
compared to the variance of the dependent variable. If the R² (or R̄²) is nearly 0,
the opposite is true.

The R² and R̄² do NOT tell you whether:

1. An included variable is statistically significant;
2. The regressors are a true cause of the movements in the dependent variable;
3. There is omitted variable bias; or
4. You have chosen the most appropriate set of regressors.
2. A high R² or R̄² does not mean that the regressors are a true cause of the
dependent variable. Imagine regressing test scores against parking lot area
per pupil. Parking lot area is correlated with the student-teacher ratio, with
whether the school is in a suburb or a city, and possibly with district income, all things that are correlated with test scores. Thus the regression of test scores
on parking lot area per pupil could have a high R² and R̄², but the relationship
is not causal (try telling the superintendent that the way to increase test scores
is to increase parking space!).
3. A high R² or R̄² does not mean there is no omitted variable bias. Recall the
discussion of Section 6.1, which concerned omitted variable bias in the regression of test scores on the student-teacher ratio. The R² of the regression never
came up because it played no logical role in this discussion. Omitted variable
bias can occur in regressions with a low R², a moderate R², or a high R². Conversely, a low R² does not imply that there necessarily is omitted variable bias.
4. A high R² or R̄² does not necessarily mean you have the most appropriate
set of regressors, nor does a low R² or R̄² necessarily mean you have an inappropriate set of regressors. The question of what constitutes the right set of
regressors in multiple regression is difficult, and we return to it throughout this
textbook. Decisions about the regressors must weigh issues of omitted variable bias, data availability, data quality, and, most importantly, economic
theory and the nature of the substantive questions being addressed. None of
these questions can be answered simply by having a high (or low) regression
R² or R̄².

These points are summarized in Key Concept 7.4.
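The first pitfall, that the R² can only rise when a regressor is added regardless of its significance, is easy to verify numerically. A minimal sketch with simulated data (the variable names and coefficients are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)
junk = rng.normal(size=n)  # a regressor unrelated to y by construction

def r2_and_adjusted(Xmat, y):
    """R^2 and adjusted R^2 for an OLS fit (Xmat includes the intercept column)."""
    coef = np.linalg.lstsq(Xmat, y, rcond=None)[0]
    ssr = ((y - Xmat @ coef) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    n_obs, k = Xmat.shape
    return 1 - ssr / tss, 1 - (n_obs - 1) / (n_obs - k) * ssr / tss

ones = np.ones(n)
r2_small, adj_small = r2_and_adjusted(np.column_stack([ones, x]), y)
r2_big, adj_big = r2_and_adjusted(np.column_stack([ones, x, junk]), y)

# R^2 cannot fall when a regressor is added, even a pure-noise one;
# a higher R^2 therefore says nothing about the new regressor's significance.
assert r2_big >= r2_small
```

Because the adjusted R̄² penalizes the extra degree of freedom, it may rise or fall when the junk regressor is added; only the unadjusted R² is guaranteed not to fall.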
7.6 Analysis of the Test Score Data Set
omitting them from the regression will result in omitted variable bias. If data are
available on these omitted variables, the solution to this problem is to include them
as additional regressors in the multiple regression. When we do this, the coefficient
on the student-teacher ratio is the effect of a change in the student-teacher ratio,
holding constant these other factors.
Here we consider three variables that control for background characteristics
of the students that could affect test scores. One of these control variables is the
one we have used previously, the fraction of students who are still learning English. The two other variables are new and control for the economic background of
the students. There is no perfect measure of economic background in the data set,
so instead we use two imperfect indicators of low income in the district. The first
new variable is the percentage of students who are eligible for receiving a subsidized or free lunch at school. Students are eligible for this program if their family
income is less than a certain threshold (approximately 150% of the poverty line).
The second new variable is the percentage of students in the district whose families qualify for a California income assistance program. Families are eligible for
this income assistance program depending in part on their family income, but the
threshold is lower (stricter) than the threshold for the subsidized lunch program.
These two variables thus measure the fraction of economically disadvantaged children in the district; although they are related, they are not perfectly correlated
(their correlation coefficient is 0.74). Although theory suggests that economic
What scale should we use for the regressors? A practical question that
arises in regression analysis is what scale you should use for the regressors. In Figure 7.2, the units of the variables are percent, so the maximum possible range of
the data is 0 to 100. Alternatively, we could have defined these variables to be a
decimal fraction rather than a percent; for example, PctEL could be replaced by
the fraction of English learners, FracEL (= PctEL/100), which would range
between 0 and 1 instead of between 0 and 100. More generally, in regression analysis some decision usually needs to be made about the scale of both the dependent
and independent variables. How, then, should you choose the scale, or units, of the
variables?

The general answer to the question of choosing the scale of the variables is to
make the regression results easy to read and to interpret. In the test score application, the natural unit for the dependent variable is the score of the test itself. In
the regression of TestScore on STR and PctEL reported in Equation (7.5), the
coefficient on PctEL is -0.650. If instead the regressor had been FracEL, the
regression would have had an identical R² and SER; however, the coefficient on
FracEL would have been -65.0. In the specification with PctEL, the coefficient
is the predicted change in test scores for a one-percentage-point increase in English learners, holding STR constant; in the specification with FracEL, the coefficient is the predicted change in test scores for an increase by 1 in the fraction of
English learners (that is, a 100-percentage-point increase), holding STR constant. Although these two specifications are mathematically equivalent, for the
purposes of interpretation the one with PctEL seems, to us, more natural.

Another consideration when deciding on a scale is to choose the units of the
regressors so that the resulting regression coefficients are easy to read. For example, if a regressor is measured in dollars and has a coefficient of 0.00000356, it is
easier to read if the regressor is converted to millions of dollars, in which case the coefficient becomes 3.56.
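The purely mechanical effect of rescaling a regressor can be verified in a few lines. The following sketch uses simulated data mimicking the PctEL/FracEL example (the simulated coefficients are invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 420
pct_el = rng.uniform(0, 100, size=n)                       # percent, 0-100 (simulated)
score = 700 - 0.65 * pct_el + rng.normal(scale=10, size=n)

def fit(x, y):
    """Slope, R^2, and SER from a regression of y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    r2 = 1 - resid @ resid / ((y - y.mean()) ** 2).sum()
    ser = np.sqrt(resid @ resid / (len(y) - 2))
    return coef[1], r2, ser

slope_pct, r2_pct, ser_pct = fit(pct_el, score)           # regressor in percent
slope_frac, r2_frac, ser_frac = fit(pct_el / 100, score)  # same regressor as a fraction

# Dividing the regressor by 100 multiplies its coefficient by 100;
# the R^2 and SER are unchanged, exactly as described in the text.
assert np.isclose(slope_frac, 100 * slope_pct)
assert np.isclose(r2_pct, r2_frac) and np.isclose(ser_pct, ser_frac)
```

The identity is exact in any sample, which is why the choice of scale is purely a matter of readability.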
FIGURE 7.2  Scatterplots of Test Scores vs. Three Student Characteristics

(a) Percentage of English language learners  (b) Percentage qualifying for subsidized lunch  (c) Percentage qualifying for income assistance

The scatterplots show a negative relationship between test scores and (a) the percentage of English learners (correlation = -0.64), (b) the percentage of students qualifying for a subsidized lunch (correlation = -0.87), and (c) the percentage qualifying for income assistance (correlation = -0.63).
TABLE 7.1  Results of Regressions of Test Scores on the Student-Teacher Ratio and Student Characteristic Control Variables Using California Elementary School Districts

Dependent variable: average test score in the district.

Regressor                                        (1)        (2)        (3)        (4)        (5)
Student-teacher ratio (X1)                     -2.28**    -1.10**    -1.00**    -1.31**    -1.01**
                                                (0.52)     (0.43)     (0.27)     (0.34)     (0.27)
Percent English learners (X2)                             -0.650**   -0.122**   -0.488**   -0.130**
                                                           (0.031)    (0.033)    (0.030)    (0.036)
Percent eligible for subsidized lunch (X3)                           -0.547**              -0.529**
                                                                      (0.024)               (0.038)
Percent qualifying for income assistance (X4)                                   -0.790**    0.048
                                                                                 (0.068)    (0.059)
Intercept                                      698.9**    686.0**    700.2**    698.0**    700.4**
                                                (10.4)     (8.7)      (5.6)      (6.9)      (5.5)

Summary Statistics
SER                                             18.58      14.46      9.08       11.65      9.08
R̄²                                              0.049      0.424      0.773      0.626      0.773
n                                               420        420        420        420        420

These regressions were estimated using the data on K-8 school districts in California, described in Appendix 4.1. Standard errors
are given in parentheses under coefficients. The individual coefficient is statistically significant at the *5% level or **1% significance level using a two-sided test.
the same dependent variable, test score. The entries in the first five rows are the
estimated regression coefficients, with their standard errors below them in parentheses. The asterisks indicate whether the t-statistic, testing the hypothesis that
the relevant coefficient is zero, is significant at the 5% level (one asterisk) or the
1% level (two asterisks). The final three rows contain summary statistics for the
regression (the standard error of the regression, SER, and the adjusted R², R̄²) and
the sample size (which is the same for all of the regressions, 420 observations).

All the information that we have presented so far in equation format appears
as a column of this table. For example, consider the regression of the test score
against the student-teacher ratio, with no control variables. In equation form, this
regression is

TestScore = 698.9 - 2.28 × STR,  R̄² = 0.049, SER = 18.58, n = 420.   (7.10)
            (10.4)  (0.52)
All this information appears in column (1) of Table 7.1. The estimated coefficient
on the student-teacher ratio (-2.28) appears in the first row of numerical entries,
and its standard error (0.52) appears in parentheses just below the estimated coefficient. The intercept (698.9) and its standard error (10.4) are given in the row
labeled "Intercept." (Sometimes you will see this row labeled "constant" because,
as discussed in Section 6.2, the intercept can be viewed as the coefficient on a regressor that is always equal to 1.) Similarly, the R̄² (0.049), the SER (18.58), and the sample size n (420) appear in the final rows. The blank entries in the rows of the other
regressors indicate that those regressors are not included in this regression.

Although the table does not report t-statistics, these can be computed from
the information provided; for example, the t-statistic testing the hypothesis that
the coefficient on the student-teacher ratio in column (1) is zero is -2.28/0.52 =
-4.38. This hypothesis is rejected at the 1% level, which is indicated by the double asterisk next to the estimated coefficient in the table.
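Recovering t-statistics from a published table is just the ratio of the coefficient to its standard error. A short sketch, using the column (1) entries quoted above:

```python
# t-statistic from table entries: t = coefficient / standard error.
# Values are from column (1) of Table 7.1.
coef, se = -2.28, 0.52
t = coef / se
print(round(t, 2))  # -4.38, matching the text

# Large-sample two-sided critical values: 1.96 at the 5% level, 2.58 at the 1% level.
reject_1pct = abs(t) > 2.58
```

Since |t| = 4.38 exceeds 2.58, the null that the coefficient is zero is rejected at the 1% level, which is what the double asterisk in the table indicates.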
Regressions that include the control variables measuring student characteristics are reported in columns (2)-(5). Column (2), which reports the regression of
test scores on the student-teacher ratio and on the percentage of English learners, was previously stated as Equation (7.5).

Column (3) presents the base specification, in which the regressors are the student-teacher ratio and two control variables, the percentage of English learners
and the percentage of students eligible for a free lunch.

Columns (4) and (5) present alternative specifications that examine the effect
of changes in the way the economic background of the students is measured. In
column (4), the percentage of students on income assistance is included as a regressor, and in column (5) both of the economic background variables are included.
1. Adding the control variables cuts the estimated effect of the student-teacher ratio on test scores approximately in half. This estimated effect is not
very sensitive to which specific control variables are included in the regression. In all cases the coefficient on the student-teacher ratio remains statistically significant at the 5% level. In the four specifications with control
variables, regressions (2)-(5), reducing the student-teacher ratio by one student per teacher is estimated to increase average test scores by approximately
one point, holding constant student characteristics.

2. The student characteristic variables are very useful predictors of test scores.
The student-teacher ratio alone explains only a small fraction of the variation
in test scores: The R̄² in column (1) is 0.049. The R̄² jumps, however, when the
student characteristic variables are added. For example, the R̄² in the base
specification, regression (3), is 0.773.
3. The control variables are not always individually statistically significant. In
specification (5), the hypothesis that the coefficient on the percentage qualifying for income assistance is zero is not rejected at the 5% level (the t-statistic is 0.82). Because adding this control variable to the base specification (3)
has a negligible effect on the estimated coefficient for the student-teacher
ratio and its standard error, and because the coefficient on this control variable is not significant in specification (5), this additional control variable is
redundant, at least for the purposes of this analysis.
7.7 Conclusion

Chapter 6 began with a concern: In the regression of test scores against the student-teacher ratio, omitted student characteristics that influence test scores might
be correlated with the student-teacher ratio in the district, and if so the student-teacher ratio in the district would pick up the effect on test scores of these
omitted student characteristics. Thus, the OLS estimator would have omitted variable bias. To mitigate this potential omitted variable bias, we augmented the regression by including variables that control for various student characteristics (the
percentage of English learners and two measures of student economic background). Doing so cuts the estimated effect of a unit change in the student-teacher
ratio in half, although it remains possible to reject the null hypothesis that the
population effect on test scores, holding these control variables constant, is zero
at the 5% significance level. Because they eliminate omitted variable bias arising
from these student characteristics, these multiple regression estimates, hypothesis
tests, and confidence intervals are much more useful for advising the superintendent than the single-regressor estimates of Chapters 4 and 5.
The analysis in this and the preceding chapter has presumed that the population regression function is linear in the regressors, that is, that the conditional
expectation of Yi given the regressors is a straight line. There is, however, no particular reason to think this is so. In fact, the effect of reducing the student-teacher
ratio might be quite different in districts with large classes than in districts that
already have small classes. If so, the population regression line is not linear in the
X's but rather is a nonlinear function of the X's. To extend our analysis to regression functions that are nonlinear in the X's, however, we need the tools developed in the next chapter.
Summary

1. Hypothesis tests and confidence intervals for a single regression coefficient are
carried out using essentially the same procedures that were used in the one-variable linear regression model of Chapter 5. For example, a 95% confidence interval for β1 is given by β̂1 ± 1.96 SE(β̂1).
Key Terms

restrictions (226)
joint hypothesis (226)
F-statistic (227)
restricted regression (230)
unrestricted regression (230)
homoskedasticity-only F-statistic (231)
Exercises

The first six exercises refer to the table of estimated regressions on page 247,
computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor's
degree. The workers' ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

7.1 Add "*" (5%) and "**" (1%) to the table to indicate the statistical significance of the coefficients.

7.2

7.3

7.4
Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor          (1)        (2)        (3)
College (X1)       5.46       5.48       5.44
                   (0.21)     (0.21)     (0.21)
Female (X2)       -2.64      -2.62      -2.62
                   (0.20)     (0.20)     (0.20)
Age (X3)                      0.29       0.29
                              (0.04)     (0.04)
Northeast (X4)                           0.69
                                         (0.30)
Midwest (X5)                             0.60
                                         (0.28)
South (X6)                              -0.27
                                         (0.26)
Intercept         12.69       4.40       3.75
                   (0.14)     (1.05)     (1.06)

Summary Statistics
SER                6.27       6.22       6.21
R̄²                 0.176      0.190      0.194
n                  4000       4000       4000
b. Juanita is a 28-year-old female college graduate from the South. Molly

7.5 The regression shown in column (2) was estimated again, this time using data
from 1992 (4000 observations selected at random from the March 1993 CPS,
converted into 1998 dollars using the consumer price index). The results are
AHE
(0.18)
0.21
(0.03)
Comparing this regression to the regression for 1998 shown in column (2),
was there a statistically significant change in the coefficient on College?

7.6 Evaluate the following statement: "In all of the regressions, the coefficient
on Female is negative, large, and statistically significant. This provides strong
statistical evidence of gender discrimination in the U.S. labor market."
7.7 Question 6.5 reported the following regression (where standard errors have
been added):

Price
(2.61)  (8.94)  (0.011)  (0.00048)  (10.5)

d. Lot size is measured in square feet. Do you think that another scale
might be more appropriate? Why or why not?

e. The F-statistic for omitting BDR and Age from the regression is F = 0.08. Are the coefficients on BDR and Age statistically different from
zero at the 10% level?
7.8

7.9 Consider the regression model Yi = β0 + β1X1i + β2X2i + ui. Use "Approach
#2" from Section 7.3 to transform the regression so that you can use a t-statistic to test

a. β1 = β2;

b. β1 + aβ2 = 0, where a is a constant;

c. β1 + β2 = 1. (Hint: You must redefine the dependent variable in the
regression.)
7.10 Equations (7.13) and (7.14) show two formulas for the homoskedasticity-only F-statistic. Show that the two formulas are equivalent.
Empirical Exercises

E7.1 Use the data set CPS04 described in Empirical Exercise 4.1 to answer the
following questions.

a. Run a regression of average hourly earnings (AHE) on age (Age).

c. Are the results from the regression in (b) substantively different from
the results in (a) regarding the effects of Age on AHE? Does the
regression in (a) seem to suffer from omitted variable bias?

d. Bob is a 26-year-old male worker with a high school diploma. Predict
Bob's earnings using the estimated regression in (b). Alexis is a 30-year-old female worker with a college degree. Predict Alexis's earnings
using the regression.

e. Compare the fit of the regression in (a) and (b) using the regression
standard errors, R² and R̄². Why are the R² and R̄² so similar in regression (b)?
Beauty on Course_Eval?
E7.3

a. ...educational attainment would increase by approximately 0.15 year if distance to the nearest college is decreased by 20 miles. Run a regression
of years of completed education (ED) on distance to the nearest college (Dist). Is the advocacy group's claim consistent with the estimated
regression? Explain.

b. Other factors also affect how much college a person completes. Does
controlling for these other factors change the estimated effect of distance on college years completed? To answer this question, construct a
table like Table 7.1. Include a simple specification [constructed in (a)],
a base specification (that includes a set of important control variables), and several modifications of the base specification. Discuss how
the estimated effect of Dist on ED changes across the specifications.

c. It has been argued that, controlling for other factors, blacks and Hispanics complete more college than whites. Is this result consistent with
the regressions that you constructed in part (b)?
E7.4

APPENDIX 7.1

The Bonferroni Test of a Joint Hypothesis
The method of Section 7.2 is the preferred way to test joint hypotheses in multiple regression. However, if the author of a study presents regression results but did not test a joint
restriction in which you are interested, and you do not have the original data, then you will
not be able to compute the F-statistic of Section 7.2. This appendix describes a way to test
joint hypotheses that can be used when you only have a table of regression results.
This method is an application of a very general testing approach based on Bonferroni's
inequality.
The Bonferroni test is a test of a joint hypothesis based on the t-statistics for the individual hypotheses; that is, the Bonferroni test is the one-at-a-time t-statistic test of Section
7.2 done properly. The Bonferroni test of the joint null hypothesis β1 = β1,0 and β2 = β2,0
based on the critical value c > 0 uses the following rule:

Accept if |t1| ≤ c and if |t2| ≤ c; otherwise, reject,   (7.20)

where t1 and t2 are the t-statistics that test the restrictions on β1 and β2, respectively.

The trick is to choose the critical value c in such a way that the probability that the one-at-a-time test rejects when the null hypothesis is true is no more than the desired significance level, say 5%. This is done by using Bonferroni's inequality to choose the critical value
c to allow both for the fact that two restrictions are being tested and for any possible correlation between t1 and t2.
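The choice of c can be sketched directly: pick c so that q × Pr(|Z| > c) equals the desired significance level, where q is the number of restrictions. A minimal sketch using only the Python standard library (the function names are ours):

```python
from statistics import NormalDist

def bonferroni_critical_value(alpha, q):
    """Critical value c solving q * Pr(|Z| > c) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / (2 * q))

def bonferroni_rejects(t_stats, alpha):
    """One-at-a-time rule of Equation (7.20): reject if any |t| exceeds c."""
    c = bonferroni_critical_value(alpha, len(t_stats))
    return any(abs(t) > c for t in t_stats)

# q = 2 at the 5% level gives c of roughly 2.241, the value quoted in the text.
print(round(bonferroni_critical_value(0.05, 2), 3))
```

Because the rule ignores any correlation between the t-statistics, the resulting test is conservative: its true rejection rate under the null is at most, not exactly, the stated level.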
Bonferroni's Inequality

Bonferroni's inequality is a basic result of probability theory. Let A and B be events, let
A ∩ B be the event "both A and B" (the intersection of A and B), and let A ∪ B be the event
"A or B or both" (the union of A and B). Then Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
Because Pr(A ∩ B) ≥ 0, it follows that Pr(A ∪ B) ≤ Pr(A) + Pr(B). Because A^c ∩ B^c
(where A^c and B^c are the complements of A and B) is the complement of A ∪ B, this
implies that Pr(A^c ∩ B^c) = 1 − Pr(A ∪ B) ≥ 1 − [Pr(A) + Pr(B)], which is Bonferroni's
inequality.

Now let A be the event that |t1| > c and B be the event that |t2| > c. Then the inequality Pr(A ∪ B) ≤ Pr(A) + Pr(B) yields

Pr(|t1| > c or |t2| > c, or both) ≤ Pr(|t1| > c) + Pr(|t2| > c).   (7.21)
Bonferroni Tests

Because the event "|t1| > c or |t2| > c, or both" is the rejection region of the one-at-a-time
test, Equation (7.21) provides a way to choose the critical value c so that the "one-at-a-time"
t-statistic test has the desired significance level in large samples. Under the null hypothesis in
large samples, Pr(|t1| > c) = Pr(|t2| > c) = Pr(|Z| > c). Thus Equation (7.21) implies that,
in large samples, the probability that the one-at-a-time test rejects under the null is

Pr(one-at-a-time test rejects under the null) ≤ 2 Pr(|Z| > c).   (7.22)

The inequality in Equation (7.22) provides a way to choose the critical value c so that the
probability of the rejection under the null hypothesis equals the desired significance level.
The Bonferroni approach can be extended to more than two coefficients; if there are q
restrictions under the null, the factor of 2 on the right-hand side in Equation (7.22) is
replaced by q.

Table 7.3 presents critical values c for various significance levels and q = 2, 3, and 4.
For example, suppose the desired significance level is 5% and q = 2. According to Table 7.3,
the critical value c is 2.241. This critical value cuts off the upper
1.25% of the standard normal distribution, so Pr(|Z| > 2.241) = 2.5%. Thus
Equation (7.22) tells us that, in large samples, the one-at-a-time test in Equation (7.20) will
reject at most 5% of the time under the null hypothesis.

The critical values in Table 7.3 are larger than the critical values for testing a single
restriction. For example, with q = 2, the one-at-a-time test rejects if at least one t-statistic
exceeds 2.241 in absolute value. This critical value is greater than 1.96 because it properly
corrects for the fact that, by looking at two t-statistics, you get a second chance to reject the
joint null hypothesis, as discussed in Section 7.2.

If the individual t-statistics are based on heteroskedasticity-robust standard errors, then
the Bonferroni test is valid whether or not there is heteroskedasticity, but if the t-statistics
are based on homoskedasticity-only standard errors, the Bonferroni test is valid only under
homoskedasticity.
TABLE 7.3  Bonferroni Critical Values c for the One-at-a-Time t-Statistic Test of a Joint Hypothesis

                               Significance Level
Number of Restrictions (q)     10%       5%        1%
2                              1.960     2.241     2.807
3                              2.128     2.394     2.935
4                              2.241     2.498     3.023
The t-statistics testing the joint null hypothesis that the true coefficients on the student-teacher ratio and
expenditures per pupil in Equation (7.6) are zero are, respectively, t1 = -0.60 and t2 = 2.43.
Although |t1| < 2.241, because |t2| > 2.241, we can reject the joint null hypothesis at the
5% significance level using the Bonferroni test. However, both t1 and t2 are less than 2.807
in absolute value, so we cannot reject the joint null hypothesis at the 1% significance level
using the Bonferroni test. In contrast, using the F-statistic in Section 7.2, we were able to
reject this hypothesis at the 1% significance level.
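Plugging the two t-statistics quoted above into the rule of Equation (7.20) gives the same conclusion; a short check using only the standard library:

```python
from statistics import NormalDist

t1, t2 = -0.60, 2.43  # t-statistics for Equation (7.6), as quoted in the text

def c_value(alpha, q=2):
    # Bonferroni critical value: q * Pr(|Z| > c) = alpha
    return NormalDist().inv_cdf(1 - alpha / (2 * q))

reject_5pct = max(abs(t1), abs(t2)) > c_value(0.05)  # 2.43 exceeds roughly 2.241
reject_1pct = max(abs(t1), abs(t2)) > c_value(0.01)  # 2.43 is below roughly 2.807
```

So the joint null is rejected at the 5% level but not at the 1% level, matching the text's application of Table 7.3.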
CHAPTER 8

Nonlinear Regression Functions

The methods in the first group are useful when the effect on Y of a change in X1
depends on the value of X1 itself. For example, reducing class sizes by one
student per teacher might have a greater effect if class sizes are already
manageably small than if they are so large that the teacher can do little more
than keep the class under control. If so, the test score (Y) is a nonlinear function
of the student-teacher ratio (X1), where this function is steeper when X1 is
small. An example of a nonlinear regression function with this feature is shown
in Figure 8.1. Whereas the linear population regression function in Figure 8.1a
has a constant slope, the nonlinear population regression function in Figure 8.1b
has a steeper slope when X1 is small than when it is large. This first group of
methods is presented in Section 8.2.

The methods in the second group are useful when the effect on Y of a
change in X1 depends on the value of another independent variable, say X2. For
example, students still learning English might especially benefit from having
more one-on-one attention; if so, the effect on test scores of the
student-teacher ratio will be greater in districts with many students still learning
English than in districts with few English learners. In this example, the effect on
FIGURE 8.1  Population Regression Functions with Different Slopes

(a) Constant slope  (b) Slope depends on the value of X1  (c) Slope depends on the value of X2

In Figure 8.1a, the population regression function has a constant slope. In Figure 8.1b, the slope of the population
regression function depends on the value of X1. In Figure 8.1c, the slope of the population regression function
depends on the value of X2.
In the models of Sections 8.2 and 8.3, the population regression function is a
nonlinear function of the independent variables; that is, the conditional expectation
E(Yi | X1i, ..., Xki) is a nonlinear function of one or more of the X's. Although these models
are nonlinear in the X's, these models are linear functions of the unknown
coefficients (or parameters) of the population regression model and thus are
versions of the multiple regression model of Chapters 6 and 7. Therefore, the
unknown parameters of these nonlinear regression functions can be estimated and
tested using OLS and the methods of Chapters 6 and 7.
Sections 8.1 and 8.2 introduce nonlinear regression functions in the context
of regression with a single independent variable, and Section 8.3 extends this to
two independent variables. To keep things simple, additional control variables
are omitted in the empirical examples of Sections 8.1-8.3. In practice, however,
it is important to analyze nonlinear regression functions in models that control
for omitted variable bias by including control variables as well. In Section 8.4,
we combine nonlinear regression functions and additional control variables
when we take a close look at possible nonlinearities in the relationship between
test scores and the student-teacher ratio, holding student characteristics
constant. In some applications, the regression function is a nonlinear function of
the X's and of the parameters. If so, the parameters cannot be estimated by
OLS, but they can be estimated using nonlinear least squares. Appendix 8.1
provides examples of such functions and describes the nonlinear least squares
estimator.
8.1 A General Strategy for Modeling Nonlinear Regression Functions
FIGURE 8.2 Scatterplot of Test Score vs. District Income with a Linear OLS Regression Function

There is a positive correlation between test scores and district income (correlation = 0.71), but the linear OLS regression line does not adequately describe the relationship between these variables. [Axes: test score vs. district income (thousands of dollars).]
assistance) to measure the fraction of students in the district coming from poor families. A different, broader measure of economic background is the average annual per capita income in the school district ("district income"). The California data set includes district income measured in thousands of 1998 dollars. The sample contains a wide range of income levels: For the 420 districts in our sample, the median district income is 13.7 (that is, $13,700 per person), and it ranges from 5.3 ($5300 per person) to 55.3 ($55,300 per person).
Figure 8.2 shows a scatterplot of fifth-grade test scores against district income for the California data set, along with the OLS regression line relating these two variables. Test scores and average income are strongly positively correlated, with a correlation coefficient of 0.71; students from affluent districts do better on the tests than students from poor districts. But this scatterplot has a peculiarity: Most of the points are below the OLS line when income is very low (under $10,000) or very high (over $40,000), but are above the line when income is between $15,000 and $30,000. There seems to be some curvature in the relationship between test scores and income that is not captured by the linear regression.
In short, it seems that the relationship between district income and test scores is not a straight line. Rather, it is nonlinear. A nonlinear function is a function with a slope that is not constant: The function f(X) is linear if the slope of f(X) is the same for all values of X, but if the slope depends on the value of X, then f(X) is nonlinear.
If a straight line is not an adequate description of the relationship between district income and test scores, what is? Imagine drawing a curve that fits the points in Figure 8.2. This curve would be steep for low values of district income but would flatten out as district income gets higher. One way to approximate such a curve mathematically is to model the relationship as a quadratic function. That is, we could model test scores as a function of income and the square of income.
A quadratic population regression model relating test scores and income is written mathematically as
TestScore_i = β0 + β1 Income_i + β2 Income_i² + u_i,  (8.1)
where β0, β1, and β2 are coefficients, Income_i is the income in the ith district, Income_i² is the square of income in the ith district, and u_i is an error term that, as usual, represents all the other factors that determine test scores. Equation (8.1) is called the quadratic regression model because the population regression function, E(TestScore_i | Income_i) = β0 + β1 Income_i + β2 Income_i², is a quadratic function of the independent variable, Income.
If you knew the population coefficients β0, β1, and β2 in Equation (8.1), you could predict the test score of a district based on its average income. But these population coefficients are unknown and therefore must be estimated using a sample of data.
At first, it might seem difficult to find the coefficients of the quadratic function that best fits the data in Figure 8.2. If you compare Equation (8.1) with the multiple regression model in Key Concept 6.2, however, you will see that Equation (8.1) is in fact a version of the multiple regression model with two regressors: The first regressor is Income, and the second regressor is Income². Thus, after defining the regressors as Income and Income², the nonlinear model in Equation (8.1) is simply a multiple regression model with two regressors!
Because the quadratic regression model is a variant of multiple regression, its unknown population coefficients can be estimated and tested using the OLS methods described in Chapters 6 and 7. Estimating the coefficients of Equation (8.1) using OLS for the 420 observations in Figure 8.2 yields
TestScore^ = 607.3 + 3.85 Income − 0.0423 Income²,   R² = 0.554,   (8.2)
             (2.9)   (0.27)       (0.0048)
where (as usual) standard errors of the estimated coefficients are given in parentheses. The estimated regression function (8.2) is plotted in Figure 8.3,
FIGURE 8.3 Scatterplot of Test Score vs. District Income with Linear and Quadratic Regression Functions

The quadratic OLS regression function fits the data better than the linear OLS regression function. [Axes: test score vs. district income (thousands of dollars).]
superimposed over the scatterplot of the data. The quadratic function captures the curvature in the scatterplot: It is steep for low values of district income but flattens out when district income is high. In short, the quadratic regression function seems to fit the data better than the linear one.
We can go one step beyond this visual comparison and formally test the hypothesis that the relationship between income and test scores is linear, against the alternative that it is nonlinear. If the relationship is linear, then the regression function is correctly specified as Equation (8.1), except that the regressor Income² is absent; that is, if the relationship is linear, then Equation (8.1) holds with β2 = 0. Thus, we can test the null hypothesis that the population regression function is linear against the alternative that it is quadratic by testing the null hypothesis that β2 = 0 against the alternative that β2 ≠ 0.
The null hypothesis that β2 = 0 can be tested by constructing the t-statistic for this hypothesis. This t-statistic is t = (β̂2 − 0)/SE(β̂2), which from Equation (8.2) is t = −0.0423/0.0048 = −8.81. In absolute value this exceeds the 5% critical value of the two-sided test (1.96), so the null hypothesis that the regression function is linear is rejected at the 5% significance level.
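This estimation-and-testing step is easy to sketch in code. The block below is an illustrative simulation, not the California data set: the data-generating coefficients are borrowed from Equation (8.2), and the sample size, income range, and noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated concave data, loosely mimicking the test-score/income example
# (hypothetical numbers, not the actual California data set).
n = 420
income = rng.uniform(5, 55, n)
y = 607.3 + 3.85 * income - 0.0423 * income**2 + rng.normal(0, 9, n)

# Quadratic regression: regress y on a constant, income, and income^2.
X = np.column_stack([np.ones(n), income, income**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Homoskedasticity-only standard errors: s^2 * (X'X)^{-1}.
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# The t-statistic on the squared term tests H0: beta2 = 0 (linearity).
t_quad = beta[2] / se[2]
print(beta, t_quad)
```

Because the curvature in the simulated data is strong, the t-statistic on the squared term comes out far below −1.96, so the linear null is rejected, mirroring the calculation in the text.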
Y_i = f(X_1i, X_2i, …, X_ki) + u_i,  (8.3)
where f(X_1i, X_2i, …, X_ki) is the population nonlinear regression function, a possibly nonlinear function of the independent variables X_1i, X_2i, …, X_ki, and u_i is the error term. For example, in the quadratic regression model in Equation (8.1), only one independent variable is present, so X_1i is Income_i and the population regression function is f(Income_i) = β0 + β1 Income_i + β2 Income_i².
Because the population regression function is the conditional expectation of Y_i given X_1i, X_2i, …, X_ki, in Equation (8.3) we allow for the possibility that this conditional expectation is a nonlinear function of X_1i, X_2i, …, X_ki; that is, E(Y_i | X_1i, X_2i, …, X_ki) = f(X_1i, X_2i, …, X_ki), where f can be a nonlinear function. If the population regression function is linear, then f(X_1i, …, X_ki) = β0 + β1 X_1i + ⋯ + βk X_ki, and Equation (8.3) becomes the linear regression model in Key Concept 6.2. However, Equation (8.3) allows for nonlinear regression functions as well.
The effect on Y of a change in X1. As discussed in Section 6.2, the effect on Y of a change in X1, ΔX1, holding X2, …, Xk constant, is the difference in the
¹The term "nonlinear regression" applies to two conceptually different families of models. In the first family, the population regression function is a nonlinear function of the X's but is a linear function of the unknown parameters (the β's). In the second family, the population regression function is a nonlinear function of the unknown parameters and may or may not be a nonlinear function of the X's. The models in the body of this chapter are all in the first family; Appendix 8.1 takes up models from the second family.
KEY CONCEPT 8.1
THE EXPECTED EFFECT ON Y OF A CHANGE IN X1 IN THE NONLINEAR REGRESSION MODEL (8.3)

The expected change in Y, ΔY, associated with the change in X1, ΔX1, holding X2, …, Xk constant, is the difference between the value of the population regression function before and after changing X1:

ΔY = f(X1 + ΔX1, X2, …, Xk) − f(X1, X2, …, Xk).  (8.4)

The estimator of this unknown population difference is the difference between the predicted values for these two cases. Let f̂ be the estimated regression function; then the predicted change in Y is

ΔŶ = f̂(X1 + ΔX1, X2, …, Xk) − f̂(X1, X2, …, Xk).  (8.5)
expected value of Y when the independent variables take on the values X1 + ΔX1, X2, …, Xk and the expected value of Y when the independent variables take on the values X1, X2, …, Xk. The difference between these two expected values, say ΔY, is what happens to Y on average in the population when X1 changes by an amount ΔX1, holding constant the other variables X2, …, Xk. In the nonlinear regression model of Equation (8.3), this effect on Y is given by Equation (8.4).
Because the regression function f is unknown, the population effect on Y of a
change in X1 is also unknown. To estimate the population effect, first estimate the population regression function. At a general level, denote this estimated function by f̂; an example of such an estimated function is the estimated quadratic regression function in Equation (8.2). The estimated effect on Y (denoted ΔŶ) of the change in X1 is the difference between the predicted value of Y when the independent variables take on the values X1 + ΔX1, X2, …, Xk and the predicted value of Y when they take on the values X1, X2, …, Xk.
The method for calculating the expected effect on Y of a change in X1 is summarized in Key Concept 8.1.
Application to test scores and income. What is the predicted change in test scores associated with a change in district income of $1000, based on the estimated quadratic regression function in Equation (8.2)? Because that regression function is quadratic, this effect depends on the initial district income. We therefore
consider two cases: an increase in district income from 10 to 11 (i.e., from $10,000 per capita to $11,000) and an increase in district income from 40 to 41.
To compute ΔŶ associated with the change in income from 10 to 11, we can apply the general formula in Equation (8.5) to the quadratic regression model. Doing so yields

ΔŶ = (β̂0 + β̂1 × 11 + β̂2 × 11²) − (β̂0 + β̂1 × 10 + β̂2 × 10²),  (8.6)

where β̂0, β̂1, and β̂2 are the OLS estimators. Using the estimates in Equation (8.2), the predicted test score at Income = 11 is 607.3 + 3.85 × 11 − 0.0423 × 11² = 644.53, and the predicted test score at Income = 10 is 607.3 + 3.85 × 10 − 0.0423 × 10² = 641.57. The predicted difference is therefore 644.53 − 641.57 = 2.96 points. By the same calculation, the predicted change associated with an increase in district income from 40 to 41 is 3.85 − 0.0423 × (41² − 40²) = 0.42 points: The quadratic regression function flattens out at high levels of income.

Standard errors of estimated effects. Collecting terms in Equation (8.6), the β̂0 terms cancel and ΔŶ = β̂1 × (11 − 10) + β̂2 × (11² − 10²), that is,

ΔŶ = β̂1 + 21β̂2.  (8.7)

Thus, if we can compute the standard error of β̂1 + 21β̂2, then we have computed the standard error of ΔŶ. There are two methods for doing this using standard regression software, which correspond to the two approaches in Section 7.3 for testing a single restriction on multiple coefficients.
The first method is to use "approach #1" of Section 7.3, which is to compute the F-statistic testing the hypothesis that β1 + 21β2 = 0. The standard error of ΔŶ is then given by

SE(ΔŶ) = ΔŶ / √F.  (8.8)
When applied to the quadratic regression in Equation (8.2), the F-statistic testing the hypothesis that β1 + 21β2 = 0 is F = 299.94. Because ΔŶ = 2.96, applying Equation (8.8) gives SE(ΔŶ) = 2.96/√299.94 = 0.17. Thus a 95% confidence interval for the change in the expected value of Y is 2.96 ± 1.96 × 0.17, or (2.63, 3.29).
The second method is to use "approach #2" of Section 7.3, which entails transforming the regressors so that, in the transformed regression, one of the coefficients is β1 + 21β2. Doing this transformation is left as an exercise (Exercise 8.9).
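Equivalently, because ΔŶ = β̂1 + 21β̂2 is a linear combination a′β̂ of the coefficients, its standard error can be read off the estimated covariance matrix of the OLS estimates, which gives the same answer as the F-statistic route in Equation (8.8). A sketch on simulated data (hypothetical, not the California data set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated quadratic data with coefficients borrowed from Equation (8.2);
# sample size and noise level are hypothetical.
n = 420
income = rng.uniform(5, 55, n)
y = 607.3 + 3.85 * income - 0.0423 * income**2 + rng.normal(0, 9, n)

X = np.column_stack([np.ones(n), income, income**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - 3)
V = s2 * np.linalg.inv(X.T @ X)          # covariance matrix of the estimates

# Effect of raising income from 10 to 11: delta_y = beta1 + 21*beta2,
# since 11 - 10 = 1 and 11^2 - 10^2 = 21 [Equation (8.7)].
a = np.array([0.0, 1.0, 21.0])
delta_y = a @ beta
se_delta = np.sqrt(a @ V @ a)            # SE(beta1 + 21*beta2)
ci = (delta_y - 1.96 * se_delta, delta_y + 1.96 * se_delta)
print(delta_y, se_delta, ci)
```

The quadratic form a′Va is exactly the variance of the linear combination, so no auxiliary F-test or regressor transformation is needed when the covariance matrix is available.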
In the multiple regression model of Chapters 6 and 7, the regression coefficients had a natural interpretation. For example, β1 is the expected change in Y associated with a change in X1, holding the other regressors constant. But, as we have seen, this is not generally the case in a nonlinear model. That is, it is not very helpful to think of β1 in Equation (8.1) as being the effect of changing the district's income, holding the square of the district's income constant. In nonlinear models, the regression function is best interpreted by graphing it and by calculating the predicted effect on Y of changing one or more of the independent variables.
²Equation (8.8) is derived by noting that the F-statistic is the square of the t-statistic testing this hypothesis, that is, F = [(β̂1 + 21β̂2)/SE(β̂1 + 21β̂2)]² = [ΔŶ/SE(ΔŶ)]², and solving for SE(ΔŶ).
Use economic theory and what you know about the application to suggest a possible nonlinear relationship. Before you even look at the data, ask yourself whether the slope of the regression function relating Y and X might reasonably depend on the value of X or on another independent variable. Why might such nonlinear dependence exist? What nonlinear shapes does this suggest? For example, thinking about classroom dynamics with 11-year-olds suggests that cutting class size from 18 students to 17 could have a greater effect than cutting it from 30 to 29.
8.2 Nonlinear Functions of a Single Independent Variable
1lli'> section pro' ide two methods for modeling a nonlinear rc.:grc~;sion fum lll'11
To keep thing-. ample, we develop thec;e methods lm a nonltnear rc.:grc''''111
function that involves only one independent variable, X. As we see in Section 8.5, however, these models can be modified to include multiple independent variables.
The first method discussed in this section is polynomial regression, an extension of the quadratic regression used in the last section to model the relationship between test scores and income. The second method uses logarithms of X and/or Y. Although these methods are presented separately, they can be used in combination.
Polynomials
One way to specify a nonlinear regression function is to use a polynomial in X. In general, let r denote the highest power of X that is included in the regression. The polynomial regression model of degree r is

Y_i = β0 + β1 X_i + β2 X_i² + ⋯ + βr X_i^r + u_i.  (8.9)
When r = 2, Equation (8.9) is the quadratic regression model discussed in Section 8.1. When r = 3, so that the highest power of X included is X³, Equation (8.9) is called the cubic regression model.
The polynomial regression model is similar to the multiple regression model of Chapter 6, except that in Chapter 6 the regressors were distinct independent variables, whereas here the regressors are powers of the same independent variable, X; that is, the regressors are X, X², X³, and so on. Thus the techniques for estimation and inference developed for multiple regression can be applied here. In particular, the unknown coefficients β0, β1, …, βr in Equation (8.9) can be estimated by OLS regression of Y_i against X_i, X_i², …, X_i^r.
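This estimation step can be sketched in a few lines: build the regressor matrix whose columns are 1, X, X², …, X^r and run OLS on it. The data-generating coefficients below are hypothetical, chosen only to check that the fit recovers them.

```python
import numpy as np

def fit_polynomial_ols(x, y, r):
    """OLS estimates of beta_0, ..., beta_r in y = beta0 + beta1*x + ... + beta_r*x^r + u."""
    X = np.column_stack([x**p for p in range(r + 1)])  # columns 1, x, x^2, ..., x^r
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Check on data generated from a known cubic (hypothetical coefficients).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = 2.0 + 1.5 * x - 0.4 * x**2 + 0.03 * x**3 + rng.normal(0, 0.5, 500)
beta = fit_polynomial_ols(x, y, r=3)
print(beta)
```

Because the powers of X are treated as ordinary regressors, any multiple-regression routine works here; for high degrees or wide X ranges, orthogonal polynomials are numerically safer, but for the small r recommended below raw powers are fine.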
Testing the null hypothesis that the population regression function is linear. If the population regression function is linear, then the quadratic and higher-degree terms do not enter Equation (8.9). Accordingly, the null hypothesis (H0) that the regression is linear and the alternative (H1) that it is a polynomial of degree r correspond to

H0: β2 = 0, β3 = 0, …, βr = 0  vs.  H1: at least one βj ≠ 0, j = 2, …, r.  (8.10)

The null hypothesis that the population regression function is linear can be tested against the alternative that it is a polynomial of degree r by testing H0 against H1 in Equation (8.10). Because H0 is a joint null hypothesis with q = r − 1 restrictions on the coefficients of the population polynomial regression model, it can be tested using the F-statistic as described in Section 7.2.
This recipe has one missing ingredient: the initial degree r of the polynomial. In many applications involving economic data, the nonlinear functions are smooth; that is, they do not have sharp jumps or "spikes." If so, then it is appropriate to choose a small maximum order for the polynomial, such as 2, 3, or 4; that is, begin with r = 2, 3, or 4 in step 1.
Application to district income and test scores. The estimated cubic regression function relating district income to test scores is

TestScore^ = 600.1 + 5.02 Income − 0.096 Income² + 0.00069 Income³,   R² = 0.555,   (8.11)
             (5.1)   (0.71)       (0.029)         (0.00035)

where the standard errors are given in parentheses.
The t-statistic on the coefficient on Income³ is 0.00069/0.00035 = 1.97, so the null hypothesis that the regression function is a quadratic is rejected against the alternative that it is a cubic at the 5% level. Moreover, the F-statistic testing the joint null hypothesis that the coefficients on Income² and Income³ are both zero is 37.7, with a p-value less than 0.01%, so the null hypothesis that the regression function is linear is rejected against the alternative that it is either a quadratic or a cubic.
Logarithms
Another way to specify a nonlinear regression function is to use the natural logarithm of Y and/or X. Logarithms convert changes in variables into percentage changes, and many relationships are naturally expressed in terms of percentages. Here are some examples:
The box in Chapter 3, "The Gender Gap in Earnings of College Graduates in the United States," examined the wage gap between male and female college graduates. In that discussion, the wage gap was measured in terms of dollars. However, it is easier to compare wage gaps across professions and over time when they are expressed in percentage terms.
In Section 8.1, we found that district income and test scores were nonlinearly related. Would this relationship be linear using percentage changes? That is, might it be that a change in district income of 1%, rather than $1000, is associated with a change in test scores that is approximately constant for different values of income?
FIGURE 8.4 The Logarithm Function, y = ln(x)

The logarithm function is defined only for x > 0; it is steep at first and then flattens out, with slope 1/x. [Axes: y vs. x.]
also '>vritten as exp(x). The natural logarithm is the inverse o f the exponent tal function; tha t is, the na tural logarithm is the f unction for which x = ln(er) o r. equtvalenrly, x = ln[exp(x)].TI1e base of the n atura l logarithm is e. Alrhough there arc
logarithms in o ther bases.. such as base 10, in this book we conside r only logarithms
in base e, tha t is. the natural logarithm. so when we use the term "logarithm .. we
always mean " natural loga rithm.''
The logarithm functio n, y -= ln(x ), is graphed in Figure 8.4. Note that !be logarithm function is defined only for positive values of x. The JogarHhm function has
a slope that is steep at first, then fl a ttens out (although the function contmu..:~ to
increase). The slo pe of the logarithm function ln(x) is 1/x.
The logarithm function has the following useful properties:

ln(1/x) = −ln(x);  (8.12)
ln(ax) = ln(a) + ln(x);  (8.13)
ln(x/a) = ln(x) − ln(a); and  (8.14)
ln(xᵃ) = a ln(x).  (8.15)
Logarithms and percentages. The link between the logarithm and percentages relies on a key fact: When Δx is small, the difference between the logarithm of x + Δx and the logarithm of x is approximately Δx/x, the percentage change in x divided by 100. That is,
ln(x + Δx) − ln(x) ≅ Δx/x  (when Δx/x is small).  (8.16)
where "≅" means "approximately equal to." The derivation of this approximation relies on calculus, but it is readily demonstrated by trying out some values of x and Δx. For example, when x = 100 and Δx = 1, then Δx/x = 1/100 = 0.01 (or 1%), while ln(x + Δx) − ln(x) = ln(101) − ln(100) = 0.00995 (or 0.995%). Thus Δx/x (which is 0.01) is very close to ln(x + Δx) − ln(x) (which is 0.00995). When Δx = 5, Δx/x = 5/100 = 0.05, while ln(x + Δx) − ln(x) = ln(105) − ln(100) = 0.04879.
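The numbers in this example are easy to reproduce:

```python
import math

# ln(x + dx) - ln(x) is approximately dx/x when dx/x is small [Equation (8.16)].
x = 100.0
for dx in (1.0, 5.0):
    exact = math.log(x + dx) - math.log(x)   # ln(x + dx) - ln(x)
    approx = dx / x                          # the percentage change / 100
    print(dx, round(exact, 5), approx)
```

The gap between the exact log difference and Δx/x grows with Δx/x, which is why the approximation is stated only for small percentage changes.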
The three logarithmic regression models. There are three different cases in which logarithms might be used: when X is transformed by taking its logarithm but Y is not; when Y is transformed to its logarithm but X is not; and when both Y and X are transformed to their logarithms. The interpretation of the regression coefficients is different in each case. We discuss these three cases in turn.
Case I: X is in logarithms, Y is not. In this case, the regression model is

Y_i = β0 + β1 ln(X_i) + u_i,  i = 1, …, n.  (8.17)

Because Y is not in logarithms but X is, this is sometimes referred to as a linear-log model.
In the linear-log model, a 1% change in X is associated with a change in Y of 0.01β1. To see this, consider the difference between the population regression function at values of X that differ by ΔX: This is [β0 + β1 ln(X + ΔX)] − [β0 + β1 ln(X)] = β1[ln(X + ΔX) − ln(X)] ≅ β1(ΔX/X), where the final step uses the approximation in Equation (8.16). If X changes by 1%, then ΔX/X = 0.01; thus, in this model a 1% change in X is associated with a change in Y of 0.01β1.
The only difference between the regression model in Equation (8.17) and the regression model of Chapter 4 with a single regressor is that the right-hand variable is now the logarithm of X rather than X itself. To estimate the coefficients β0 and β1 in Equation (8.17), first compute a new variable, ln(X); this is readily done using a spreadsheet or statistical software. Then β0 and β1 can be estimated by the OLS regression of Y_i on ln(X_i), hypotheses about β1 can be tested using the t-statistic, and a 95% confidence interval for β1 can be constructed as β̂1 ± 1.96 SE(β̂1).
As an example, return to the relationship between district income and test scores. Instead of the quadratic specification, we could use the linear-log specification in Equation (8.17). Estimating this regression by OLS yields
270
CHAPTER 8
FIGURE 8.5
Test score
-.10
II()()
'---~----l..---'----'---..J.._-__;
II
lO
.iO
(3.8)
40
3tl
r,r r
District incurne
(th ousands of dollo~r;)
= 0.561.
U' 1 l
( l 40)
For example, what is the predicted difference in test scores for districts with average incomes of $10,000 versus $11,000? The estimated value of ΔY is the difference between the predicted values: ΔŶ = [557.8 + 36.42 ln(11)] − [557.8 + 36.42 ln(10)] = 36.42 × [ln(11) − ln(10)] = 3.47. Similarly, the predicted difference between a district with average income of $41,000 and a district with average income of $40,000 is 36.42 × [ln(41) − ln(40)] = 0.90. Thus, like the quadratic specification, this regression predicts that a $1000 increase in income has a larger effect on test scores in poor districts than it does in affluent districts.
The estimated linear-log regression function in Equation (8.18) is plotted in Figure 8.5. Because the regressor in Equation (8.18) is the natural logarithm of income rather than income itself, the estimated regression function is not a straight line. Like the quadratic regression function in Figure 8.3, it is initially steep but then flattens out for higher levels of income.
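The predicted differences computed above follow directly from the estimated coefficients in Equation (8.18):

```python
import math

# Estimated linear-log function TestScore = 557.8 + 36.42*ln(Income) [Eq. (8.18)],
# with Income measured in thousands of dollars.
b0, b1 = 557.8, 36.42

def predicted(income):
    return b0 + b1 * math.log(income)

# $10,000 -> $11,000 versus $40,000 -> $41,000:
d_poor = predicted(11) - predicted(10)   # = 36.42 * [ln(11) - ln(10)]
d_rich = predicted(41) - predicted(40)   # = 36.42 * [ln(41) - ln(40)]
print(round(d_poor, 2), round(d_rich, 2))
```

The intercept cancels in both differences, so the predicted effect of a $1000 increase depends only on the slope and on the ratio of the two income levels, which is why the effect shrinks as income rises.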
Case II: Y is in logarithms, X is not. In this case, the regression model is

ln(Y_i) = β0 + β1 X_i + u_i.  (8.19)

Because Y is in logarithms but X is not, this is referred to as the log-linear model. In the log-linear model, a one-unit change in X (ΔX = 1) is associated with a 100 × β1 % change in Y. As an illustration, consider a log-linear regression of earnings against the worker's age, ln(Earnings_i) = β0 + β1 Age_i + u_i; the estimated coefficient on Age in Equation (8.20) is 0.0086.
According to this regression, earnings are predicted to increase by 0.86% [(100 × 0.0086)%] for each additional year of age.
Case III: Both X and Y are in logarithms. In this case, the regression model is

ln(Y_i) = β0 + β1 ln(X_i) + u_i.  (8.21)

Because both Y and X are in logarithms, this is referred to as the log-log model. In the log-log model, a 1% change in X is associated with a β1% change in Y.
FIGURE 8 .6
ln(T~st
score)
f. HI~
''"~
..
6.1'i
6.X
To see this, apply the approximation in Equation (8.16) to both sides of Equation (8.21); the result is ΔY/Y ≅ β1(ΔX/X), or

β1 = (ΔY/Y) / (ΔX/X) = [100 × (ΔY/Y)] / [100 × (ΔX/X)] = (percentage change in Y) / (percentage change in X).  (8.22)

Thus, in the log-log specification β1 is the ratio of the percentage change in Y associated with the percentage change in X. If the percentage change in X is 1% (that is, if ΔX = 0.01X), then β1 is the percentage change in Y associated with a 1% change in X. That is, β1 is the elasticity of Y with respect to X.
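The elasticity interpretation is easy to verify numerically. In a log-log model with hypothetical coefficients, raising X by 1% raises the predicted Y by approximately β1 percent:

```python
import math

# Hypothetical log-log model ln(y) = b0 + b1*ln(x); b1 is the elasticity.
b0, b1 = 1.0, 0.8

def y_hat(x):
    return math.exp(b0 + b1 * math.log(x))

x = 50.0
pct_change_y = 100 * (y_hat(1.01 * x) - y_hat(x)) / y_hat(x)
print(round(pct_change_y, 3))   # close to b1 = 0.8 percent
```

The match is approximate because Equation (8.16) is an approximation for small percentage changes; for a 1% change in X the discrepancy is tiny.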
As an illustration, return to the relationship between income and test scores. When this relationship is specified in log-log form, the unknown coefficients are estimated by a regression of the logarithm of test scores against the logarithm of income. The resulting estimated equation is

ln(TestScore)^ = 6.336 + 0.0554 ln(Income),   R² = 0.557.   (8.23)
8. 2
! ;~
, :'
::o'
',J
r:~~v: C.ONg~11
~~
~--------------------------------------------~----~------~~~~w:
C11se
273
8.2
lnterpreJotion of {S1
A 1% change in X is ~ssociated
with a change in Yof 0.01/31
....__
_J
f)~
incom~
pf dollars)
,..,...
01
~1ange in Y.
~o see this.
A l% change in X is associ<H~d
with a {3 1 % change in Y. SQ /31 is lh~
elasticity of Y with respect to X.
According to this estimated regression function, a 1% increase in income is estimated to correspond to a 0.0554% increase in test scores.
The estimated log-log regression function in Equation (8.23) is plotted in Figure 8.6. Because Y is in logarithms, the vertical axis in Figure 8.6 is the logarithm of the test score, and the scatterplot is the logarithm of test scores versus district income. For comparison purposes, Figure 8.6 also shows the estimated regression function for a log-linear specification, which is
ln(TestScore)^ = 6.439 + 0.00284 Income,   R² = 0.497.   (8.24)
                 (0.003) (0.00018)
Which of the logarithmic regression models best fits the data? As we saw in the discussion of Equations (8.23) and (8.24), the R² can be used to compare the log-linear and log-log models; as it happened, the log-log model had the higher R². Similarly, the R² can be used to compare the linear-log regression in Equation (8.18) and the linear regression of Y against X. In the test score and income regression, the linear-log regression has an R² of 0.561 while the linear regression has an R² of 0.508, so the linear-log model fits the data better.
How can we compare the linear-log model and the log-log model? Unfortunately, the R² cannot be used to compare these two regressions because their dependent variables are different [one is Y_i, the other is ln(Y_i)]. Recall that the R² measures the fraction of the variance of the dependent variable explained by the regressors. Because the dependent variables in the log-log and linear-log models are different, it does not make sense to compare their R²'s.
Because of this problem, the best thing to do in a particular application is to decide, using economic theory and either your or other experts' knowledge of the problem, whether it makes sense to specify Y in logarithms. For example, labor economists typically model earnings using logarithms because wage comparisons, contract wage increases, and so forth are often most naturally discussed in percentage terms. In modeling test scores, it seems (to us, anyway) natural to discuss test results in terms of points on the test rather than percentage increases in the test scores, so we focus on models in which the dependent variable is the test score rather than its logarithm.
exponential function of both sides of Equation (8.19); the result is

Y_i = e^(β0 + β1 X_i + u_i) = e^(β0 + β1 X_i) e^(u_i),  (8.25)

so the conditional expectation of Y_i involves the factor E(e^(u_i) | X_i), which in general differs from 1. Thus the predicted value of Y is not obtained simply
by taking the exponential function of β̂0 + β̂1 X_i, that is, by setting Ŷ = e^(β̂0 + β̂1 X_i): This predicted value is biased because of the missing factor E(e^u).
One solution to this problem is to estimate the factor E(e^u) and to use this estimate when computing the predicted value of Y, but this gets complicated and we do not pursue it further.
Another solution, which is the approach used in this book, is to compute predicted values of the logarithm of Y but not to transform them to their original units. In practice, this is often acceptable because when the dependent variable is specified as a logarithm, it is often most natural just to use the logarithmic specification (and the associated percentage interpretations) throughout the analysis.
Polynomial and Logarithmic Models of Test Scores and District Income
In practice, economic theory or expert judgment might suggest a functional form to use, but in the end the true form of the population regression function is unknown. In practice, fitting a nonlinear function therefore entails deciding which method or combination of methods works best. As an illustration, we compare logarithmic and polynomial models of the relationship between district income and test scores.
Polynomial specifications. We considered two polynomial specifications, quadratic [Equation (8.2)] and cubic [Equation (8.11)].

Logarithmic specifications. The estimated cubic regression function in the logarithm of income is
TestScore^ = 486.1 + ⋯ + 3.06 [ln(Income)]³,   R² = 0.560.   (8.26)
             (79.4)       (3.74)
The t-statistic on the coefficient on the cubic term is 0.818, so the null hypothesis that the true coefficient is zero is not rejected at the 10% level. The F-statistic testing the joint hypothesis that the true coefficients on the quadratic and cubic terms are both zero is 0.44, with a p-value of 0.64, so this joint null hypothesis is not rejected at the 10% level. Thus the cubic logarithmic model in Equation (8.26) does not provide a statistically significant improvement over the model in Equation (8.18), which is linear in the logarithm of income.
Figure 8.7 plots the estimated regression functions from the cubic specification in Equation (8.11) and the linear-log specification in Equation (8.18). The two estimated regression functions are quite similar. One statistical tool for comparing these specifications is the R². The R² of the logarithmic regression is 0.561 and for the cubic regression it is 0.555. Because the logarithmic specification has a slight edge in terms of the R², and because this specification does not need higher-order polynomials in the logarithm of income to fit these data, we adopt the logarithmic specification in Equation (8.18).
FIGURE 8.7 The Linear-Log and Cubic Regression Functions

The estimated cubic regression function [Equation (8.11)] and the estimated linear-log regression function [Equation (8.18)] are nearly identical in this sample. [Axes: test score vs. district income (thousands of dollars).]
8.3 Interactions Between Independent Variables
In the introduction to this chapter we wondered whether reducing the student-teacher ratio might have a bigger effect on test scores in districts where many students are still learning English than in those with few still learning English. This could arise, for example, if students who are still learning English benefit differentially from one-on-one or small-group instruction. If so, the presence of many English learners in a district would interact with the student-teacher ratio in such a way that the effect on test scores of a change in the student-teacher ratio would depend on the fraction of English learners.
This section explains how to incorporate such interactions between two independent variables into the multiple regression model. The possible interaction between the student-teacher ratio and the fraction of English learners is an example of the more general situation in which the effect on Y of a change in one independent variable depends on the value of another independent variable. We consider three cases: when both independent variables are binary, when one is binary and the other is continuous, and when both are continuous.
Interactions Between Two Binary Variables

Consider the population regression of log earnings [Y_i = ln(Earnings_i)] against two binary variables: the worker's gender (D1i, which equals 1 if the ith worker is female) and whether the worker has a college degree (D2i, which equals 1 if the ith worker graduated from college):

Y_i = β0 + β1 D1i + β2 D2i + u_i.  (8.27)
In this regression model, β1 is the effect on log earnings of being female, holding schooling constant, and β2 is the effect of having a college degree, holding gender constant.
The specification in Equation (8.27) has an important limitation: The effect of having a college degree in this specification, holding constant gender, is the same for men and women. There is, however, no reason that this must be so. Phrased mathematically, the effect of D2i on Y_i, holding D1i constant, could depend on the value of D1i. In other words, there could be an interaction between gender and having a college degree so that the value in the job market of a degree is different for men and women.
Although the specification in Equation (8.27) does not allow for this interaction between gender and acquiring a college degree, it is easy to modify the specification so that it does by introducing another regressor, the product of the two binary variables, D1i × D2i. The resulting regression is

Y_i = β0 + β1 D1i + β2 D2i + β3 (D1i × D2i) + u_i.  (8.28)
The new regressor, the product D1i × D2i, is called an interaction term or an interacted regressor, and the population regression model in Equation (8.28) is called a binary variable interaction regression model.
The interaction term in Equation (8.28) allows the population effect on log earnings (Yi) of having a college degree (changing D2i from D2i = 0 to D2i = 1) to depend on gender (D1i). To show this mathematically, calculate the population effect of a change in D2i using the general method laid out in Key Concept 8.1. The first step is to compute the conditional expectation of Yi for D2i = 0, given a value of D1i; this is E(Yi | D1i = d1, D2i = 0) = β0 + β1 × d1 + β2 × 0 + β3 × (d1 × 0) = β0 + β1d1. The next step is to compute the conditional expectation of Yi after the change, that is, for D2i = 1, given the same value of D1i; this is E(Yi | D1i = d1, D2i = 1) = β0 + β1 × d1 + β2 × 1 + β3 × (d1 × 1) = β0 + β1d1 + β2 + β3d1. The effect of this change is the difference of expected values [that is, the difference in Equation (8.4)], which is
E(Yi | D1i = d1, D2i = 1) − E(Yi | D1i = d1, D2i = 0) = β2 + β3d1.   (8.29)

Thus, in the binary variable interaction regression in Equation (8.28), the effect of acquiring a college degree depends on the person's gender: It is β2 when d1 = 0 and β2 + β3 when d1 = 1, so the coefficient β3 on the interaction term is the difference between these two effects.

KEY CONCEPT 8.3
A Method for Interpreting Coefficients in Regressions with Binary Variables
First compute the expected values of Y for each possible case described by the set of binary variables. Next compare these expected values. Each coefficient can then be expressed either as an expected value or as the difference between two or more expected values.

Application to the student-teacher ratio and the percentage of English learners. The binary variable interaction regression of test scores against a binary variable for a high student-teacher ratio (HiSTR) and a binary variable for a high percentage of English learners (HiEL), estimated by OLS, is

TestScore = 664.1 − 18.2HiEL − 1.9HiSTR − 3.5(HiSTR × HiEL),  adjusted R² = 0.290.   (8.30)
The predicted effect of moving from a district with a low student-teacher ratio to one with a high student-teacher ratio, holding constant whether the percentage of English learners is high or low, is given by Equation (8.29), with estimated coefficients replacing the population coefficients. According to the estimates in Equation (8.30), this effect thus is −1.9 − 3.5HiEL. That is, if the fraction of English learners is low (HiEL = 0), then the effect on test scores of moving from HiSTR = 0 to HiSTR = 1 is for test scores to decline by 1.9 points. If the fraction of English learners is high, test scores are estimated to decline by 1.9 + 3.5 = 5.4 points.
The estimated regression in Equation (8.30) also can be used to estimate the mean test scores for each of the four possible combinations of the binary variables. This is done using the procedure in Key Concept 8.3. Accordingly, the sample average test score for districts with low student-teacher ratios (HiSTRi = 0) and low fractions of English learners (HiELi = 0) is 664.1. For districts with HiSTRi = 1 (high student-teacher ratios) and HiELi = 0 (low fractions of English learners), the sample average is 662.2 (= 664.1 − 1.9). When HiSTRi = 0 and HiELi = 1, the sample average is 645.9 (= 664.1 − 18.2), and when HiSTRi = 1 and HiELi = 1, the sample average is 640.5 (= 664.1 − 18.2 − 1.9 − 3.5).
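The arithmetic behind these four group averages follows directly from the estimated coefficients in Equation (8.30). As a quick sketch (in Python, a tool the text itself does not use), one can evaluate the fitted regression at each combination of the binary variables:

```python
# Evaluate the fitted binary interaction regression in Equation (8.30):
#   TestScore = 664.1 - 18.2*HiEL - 1.9*HiSTR - 3.5*(HiSTR*HiEL)
b0, b_hiel, b_histr, b_inter = 664.1, -18.2, -1.9, -3.5

def predicted_score(histr, hiel):
    """Predicted test score for a district with the given binary indicators."""
    return b0 + b_hiel * hiel + b_histr * histr + b_inter * histr * hiel

for histr in (0, 1):
    for hiel in (0, 1):
        print(f"HiSTR={histr}, HiEL={hiel}: {predicted_score(histr, hiel):.1f}")
```

The four printed values reproduce the sample averages reported above (664.1, 645.9, 662.2, and 640.5).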
FIGURE 8.8
Interactions of binary variables and continuous variables can produce three different population regression functions: (a) β0 + β1X + β2D allows for different intercepts but has the same slope; (b) β0 + β1X + β2D + β3(X × D) allows for different intercepts and different slopes; and (c) β0 + β1X + β2(X × D) has the same intercept but allows for different slopes.
Interactions Between
a Continuous and a Binary Variable
Next consider the population regression of log earnings [Yi = ln(Earningsi)] against one continuous variable, the individual's years of work experience (Xi), and one binary variable, whether the worker has a college degree (Di, where Di = 1 if the ith person is a college graduate). As shown in Figure 8.8, the population regression line relating Yi and the continuous variable Xi can depend on the binary variable Di in three different ways.
In Figure 8.8a, the two regression lines differ only in their intercept. The corresponding population regression model is

Yi = β0 + β1Xi + β2Di + ui.   (8.31)
This is the familiar multiple regression model with a population regression function that is linear in Xi and Di. When Di = 0, the population regression function is β0 + β1Xi, whereas when Di = 1 it is (β0 + β2) + β1Xi. Thus β2 is the difference between the intercepts of the two regression lines, as shown in Figure 8.8a. Stated in terms of the earnings example, β1 is the effect on log earnings of an additional year of work experience, holding college degree status constant, and β2 is the effect of a college degree, holding years of experience constant.
A second possibility, shown in Figure 8.8b, is that the two lines have different intercepts and different slopes. The corresponding interacted regression model is

Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui,   (8.32)
where Xi × Di is a new variable, the product of Xi and Di. To interpret the coefficients of this regression, apply the procedure in Key Concept 8.3. Doing so shows that, if Di = 0, the population regression function is β0 + β1Xi, whereas if Di = 1, the population regression function is (β0 + β2) + (β1 + β3)Xi. Thus this specification allows for two different population regression functions relating Yi and Xi, depending on the value of Di, as is shown in Figure 8.8b. The difference between the two intercepts is β2, and the difference between the two slopes is β3. In the earnings example, β1 is the effect of an additional year of work experience for nongraduates (Di = 0) and β1 + β3 is this effect for graduates, so β3 is the difference in the effect of an additional year of work experience for college graduates versus nongraduates.
A third possibility, shown in Figure 8.8c, is that the two lines have different slopes but the same intercept. The interacted regression model for this case is

Yi = β0 + β1Xi + β2(Xi × Di) + ui.   (8.33)
The coefficients of this specification also can be interpreted using Key Concept 8.3. In terms of the earnings example, this specification allows for different effects of experience on log earnings between college graduates and nongraduates, but requires that expected log earnings be the same for both groups when they have no prior experience. Said differently, this specification corresponds to the population mean entry-level wage being the same for college graduates and
KEY CONCEPT 8.4
Interactions Between Binary and Continuous Variables
Through the use of the interaction term Xi × Di, the population regression line relating Yi and the continuous variable Xi can have a slope that depends on the binary variable Di. There are three possibilities:
1. Different intercept, same slope: Yi = β0 + β1Xi + β2Di + ui;
2. Different intercept and different slope: Yi = β0 + β1Xi + β2Di + β3(Xi × Di) + ui;
3. Same intercept, different slope: Yi = β0 + β1Xi + β2(Xi × Di) + ui.
nongraduates. This does not make much sense in this application, and in practice this specification is used less frequently than Equation (8.32), which allows for different intercepts and slopes.
All three specifications, Equations (8.31), (8.32), and (8.33), are versions of the multiple regression model of Chapter 6 and, once the new variable Xi × Di is created, the coefficients of all three can be estimated by OLS.
The three regression models with a binary and a continuous independent variable are summarized in Key Concept 8.4.
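To make the estimation step concrete, the sketch below (Python with NumPy, on simulated data; the coefficient values used to generate the data are illustrative assumptions, not estimates from the text) shows that once the interaction regressor X × D is created, each of the three specifications in Key Concept 8.4 is an ordinary multiple regression estimable by OLS:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 40, n)      # e.g., years of work experience
d = rng.integers(0, 2, n)      # e.g., college degree indicator
# Simulated log earnings with different intercepts AND slopes by degree status:
y = 2.0 + 0.05 * x + 0.4 * d + 0.03 * x * d + rng.normal(0, 0.1, n)

def ols(*cols):
    """OLS of y on an intercept plus the given regressors."""
    X = np.column_stack((np.ones(n),) + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_intercepts = ols(x, d)          # Eq. (8.31): different intercepts, same slope
b_both       = ols(x, d, x * d)   # Eq. (8.32): different intercepts and slopes
b_slopes     = ols(x, x * d)      # Eq. (8.33): same intercept, different slopes
print(b_both.round(2))  # estimates should be close to the simulated coefficients
```

The only data-construction step specific to the interaction models is forming the product `x * d`; everything else is the OLS machinery of Chapter 6.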
Application to the student-teacher ratio and the percentage of English learners. The interacted regression of test scores against the student-teacher ratio and a binary variable for a high percentage of English learners, estimated by OLS, is

TestScore = 682.2 − 0.97STR + 5.6HiEL − 1.28(STR × HiEL),  adjusted R² = 0.305,   (8.34)
            (11.9)  (0.59)    (19.5)    (0.97)

where the standard errors are given in parentheses beneath the coefficients, and
the binary variable HiELi equals 1 if the percentage of students still learning English in the district is greater than 10% and equals 0 otherwise.
For districts with a low fraction of English learners (HiELi = 0), the estimated regression line is 682.2 − 0.97STRi. For districts with a high fraction of English learners (HiELi = 1), the estimated regression line is 682.2 + 5.6 − 0.97STRi − 1.28STRi = 687.8 − 2.25STRi. According to these estimates, reducing the student-teacher ratio by 1 is predicted to increase test scores by 0.97 point in districts with low fractions of English learners but by 2.25 points in districts with high fractions of English learners. The difference between these two effects, 1.28 points, is the coefficient on the interaction term in Equation (8.34).
The OLS regression in Equation (8.34) can be used to test several hypotheses about the population regression line. First, the hypothesis that the two lines are in fact the same can be tested by computing the F-statistic testing the joint hypothesis that the coefficient on HiELi and the coefficient on the interaction term STRi
× HiELi are both zero. This F-statistic is 89.9, which is significant at the 1% level.
Second, the hypothesis that the two lines have the same slope can be tested by testing whether the coefficient on the interaction term is zero. The t-statistic, −1.28/0.97 = −1.32, is less than 1.645 in absolute value, so the null hypothesis that the two lines have the same slope cannot be rejected using a two-sided test at the 10% significance level.
Third, the hypothesis that the lines have the same intercept can be tested by testing whether the population coefficient on HiEL is zero. The t-statistic is t = 5.6/19.5 = 0.29, so the hypothesis that the lines have the same intercept cannot be rejected at the 5% level.
These three tests produce seemingly contradictory results: The joint test using the F-statistic rejects the joint hypothesis that the slope and the intercept are the same, but the tests of the individual hypotheses using the t-statistic fail to reject it. The reason for this is that the regressors, HiEL and STR × HiEL, are highly correlated. This results in large standard errors on the individual coefficients. Even though it is impossible to tell which of the coefficients is nonzero, there is strong evidence against the hypothesis that both are zero.
Finally, the hypothesis that the student-teacher ratio does not enter this specification can be tested by computing the F-statistic for the joint hypothesis that the coefficients on STR and on the interaction term are both zero. This F-statistic is 5.64, which has a p-value of 0.004. Thus, the coefficients on the student-teacher ratio are statistically significant at the 1% significance level.
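The mechanics of such a joint test can be sketched as follows (Python/NumPy, simulated data). For simplicity this computes the homoskedasticity-only F-statistic from restricted and unrestricted sums of squared residuals; the statistics reported in the text are heteroskedasticity-robust, so the numbers here are only illustrative of the logic:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 420
str_ = rng.uniform(14, 26, n)          # student-teacher ratio
hiel = rng.integers(0, 2, n)           # high-English-learners indicator
# Simulated scores in which HiEL and the interaction truly matter:
y = 682.0 - 1.0 * str_ - 5.0 * hiel - 1.0 * str_ * hiel + rng.normal(0, 10, n)

def ssr(X):
    """Sum of squared residuals and parameter count from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid, X.shape[1]

ssr_u, k_u = ssr(np.column_stack([np.ones(n), str_, hiel, str_ * hiel]))
ssr_r, _ = ssr(np.column_stack([np.ones(n), str_]))  # HiEL terms excluded
q = 2  # number of restrictions (coefficients on HiEL and STR x HiEL)
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_u))
print(f"F-statistic = {F:.1f}")
```

Because the simulated data impose nonzero HiEL effects, the restricted fit is much worse and the F-statistic is large, mirroring the rejection reported in the text.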
TABLE 8.1  The Return to Education and the Gender Gap: Regression Results for the United States
Dependent variable: logarithm of earnings.
Regressors, in columns (1) through (4): Years of education; Female; Female × Years of education; Potential experience; Potential experience²; Midwest; South; West; and an intercept. The table reports the OLS estimates, their standard errors, and the adjusted R² for each regression.
The data are from the March 2005 Current Population Survey. Female is an indicator variable that equals 1 for women and 0 for men. Midwest, South, and West are indicator variables denoting the region of the United States in which the worker lives: For example, Midwest equals 1 if the worker lives in the Midwest and equals 0 otherwise (the omitted region is Northeast). Standard errors are reported in parentheses below the estimated coefficients. Individual coefficients are statistically significant at the *5% or **1% significance level.
Interactions Between Two Continuous Variables

Now suppose that both independent variables (X1i and X2i) are continuous. An example is when Yi is log earnings of the ith worker, X1i is his or her years of work experience, and X2i is the number of years he or she went to school. If there is an interaction between experience and education, the effect on log earnings of an additional year of experience could depend on the number of years of education. This interaction can be modeled by augmenting the regression with an interaction term that is the product of X1i and X2i:

Yi = β0 + β1X1i + β2X2i + β3(X1i × X2i) + ui.   (8.35)
The interaction term allows the effect of a unit change in X1 to depend on X2. To see this, apply the general method for computing effects in nonlinear regression models in Key Concept 8.1. The difference in Equation (8.4), computed for the interacted regression function in Equation (8.35), is ΔY = (β1 + β3X2)ΔX1 [Exercise 8.10(a)]. Thus the effect on Y of a change in X1, holding X2 constant, is

ΔY/ΔX1 = β1 + β3X2,   (8.36)

which depends on X2. For example, in the earnings example, if β3 is positive, then the effect on log earnings of an additional year of experience is greater, by the amount β3, for each additional year of education the worker has.
A similar calculation shows that the effect on Y of a change ΔX2 in X2, holding X1 constant, is ΔY/ΔX2 = β2 + β3X1.
Putting these two effects together shows that the coefficient β3 on the interaction term is the effect of a unit increase in X1 and X2, above and beyond the sum of the effects of a unit increase in X1 alone and a unit increase in X2 alone. That is, if X1 changes by ΔX1 and X2 changes by ΔX2, then the expected change in Y is ΔY = (β1 + β3X2)ΔX1 + (β2 + β3X1)ΔX2 + β3ΔX1ΔX2 [Exercise 8.10(c)]. The first term is the effect from changing X1 holding X2 constant; the second term is the effect from changing X2 holding X1 constant; and the final term, β3ΔX1ΔX2, is the extra effect from changing both X1 and X2.
Interactions between two variables are summarized in Key Concept 8.5.
When interactions are combined with logarithmic transformations, they can be used to estimate price elasticities when the price elasticity depends on the
characteristics of the good (see the box "The Demand for Economics Journals" for an example).
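The decomposition of ΔY described above is an exact algebraic identity for the interacted regression function, which a short computation confirms (the coefficient values below are made up purely for illustration):

```python
# For the interacted regression function Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2),
# the change in Y from moving (X1, X2) to (X1+dX1, X2+dX2) equals
# (b1 + b3*X2)*dX1 + (b2 + b3*X1)*dX2 + b3*dX1*dX2 exactly.
b0, b1, b2, b3 = 1.0, 0.04, 0.08, 0.002  # illustrative values only

def f(x1, x2):
    """The interacted regression function evaluated at (x1, x2)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

x1, x2, dx1, dx2 = 10.0, 12.0, 1.0, 2.0
delta_y = f(x1 + dx1, x2 + dx2) - f(x1, x2)
decomposed = (b1 + b3 * x2) * dx1 + (b2 + b3 * x1) * dx2 + b3 * dx1 * dx2
print(delta_y, decomposed)  # identical up to floating-point rounding
```

Dropping the cross term b3 ΔX1 ΔX2 from the decomposition would break the identity, which is exactly the "extra effect from changing both" described in the text.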
Application to the student-teacher ratio and the percentage of English learners. The previous examples considered interactions between the student-teacher ratio and a binary variable indicating whether the percentage of English learners is large or small. A different way to study this interaction is to examine the interaction between the student-teacher ratio and the continuous variable, the percentage of English learners (PctEL). The estimated interaction regression is
TestScore = 686.3 − 1.12STR − 0.67PctEL + 0.0012(STR × PctEL),  adjusted R² = 0.422.   (8.37)
When the percentage of English learners is at the median (PctEL = 8.85), the slope of the line relating test scores and the student-teacher ratio is estimated to be −1.11 (= −1.12 + 0.0012 × 8.85). When the percentage of English learners is at the 75th percentile (PctEL = 23.0), this line is estimated to be flatter, with a slope of −1.09 (= −1.12 + 0.0012 × 23.0). That is, for a district with 8.85% English learners, the estimated effect of a one-unit reduction in the student-teacher ratio is to increase test scores by 1.11 points, but for a district with 23.0% English learners, reducing the student-teacher ratio by one unit is predicted to increase test scores by only 1.09 points. The difference between these estimated effects is not statistically significant, however: The t-statistic testing whether the coefficient on the interaction term is zero is t = 0.0012/0.019 = 0.06, which is not significant at the 10% level.
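The two slopes quoted in this paragraph come from evaluating the interaction specification in Equation (8.37) at different values of PctEL:

```python
# Slope of the estimated test score / STR relation implied by Equation (8.37):
#   slope(PctEL) = -1.12 + 0.0012 * PctEL
b_str, b_inter = -1.12, 0.0012

def slope(pct_el):
    """Estimated effect on test scores of a one-unit increase in STR."""
    return b_str + b_inter * pct_el

for pct_el in (8.85, 23.0):
    print(f"PctEL = {pct_el:5.2f}: slope = {slope(pct_el):.2f}")
```

At the median (8.85) this prints −1.11 and at the 75th percentile (23.0) it prints −1.09, matching the values in the text.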
How elastic is the demand by libraries for economics journals? To find out, we analyzed the relationship between the number of subscriptions to economics journals at U.S. libraries and the journals' prices.
FIGURE 8.9
(a) Subscriptions and price per citation; the remaining panels plot ln(Subscriptions) against ln(Price per citation).
TABLE 8.2  Estimates of the Demand for Economics Journals
Dependent variable: logarithm of subscriptions at U.S. libraries in the year 2000; 180 observations.
Regressors, in columns (1) through (4): ln(Price per citation); [ln(Price per citation)]²; [ln(Price per citation)]³; ln(Age); ln(Age) × ln(Price per citation); ln(Characters ÷ 1,000,000); and an intercept. The table reports the OLS estimates, standard errors, F-statistics, the SER, and the adjusted R² for each regression.
Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level. The F-statistic tests the hypothesis that the coefficients on [ln(Price per citation)]² and [ln(Price per citation)]³ are both zero.
1. Demand is less elastic for older than for newer journals.
2. The evidence supports a linear, rather than a cubic, function of log price.
3. Demand is greater for journals with more characters, holding price and age constant.
So what is the elasticity of demand for economics journals? It depends on the age of the journal.
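The age-dependent elasticity comes from interacting ln(Age) with ln(Price per citation): in a specification of the form ln(Q) = β0 + β1 ln(P) + β2 ln(Age) + β3[ln(Age) × ln(P)] + u, the price elasticity is β1 + β3 ln(Age). The sketch below uses hypothetical coefficient values, not the estimates reported in Table 8.2:

```python
import math

# Hypothetical coefficients for illustration only (NOT the Table 8.2 estimates):
b1 = -0.9   # coefficient on ln(Price per citation)
b3 = 0.15   # coefficient on ln(Age) x ln(Price per citation)

def price_elasticity(age_years):
    """Price elasticity of demand implied by the interacted log-log model."""
    return b1 + b3 * math.log(age_years)

print(f"age  2: {price_elasticity(2):.2f}")   # newer journal: more elastic
print(f"age 80: {price_elasticity(80):.2f}")  # older journal: less elastic
```

With a positive interaction coefficient, the elasticity moves toward zero as the journal ages, which is the pattern the box describes.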
8.4  Nonlinear Effects on Test Scores of the Student-Teacher Ratio
Columns (1) through (7) of Table 8.3 each report separate regressions. The entries in the table are the coefficients, standard errors, certain F-statistics and their p-values, and summary statistics, as indicated by the description in each row.
The first column of regression results, labeled regression (1) in the table, is regression (3) in Table 7.1, repeated here for convenience. This regression does not control for income, so the first thing we do is check whether the results change substantially when log income is included as an additional economic control variable. The results are given in regression (2) in Table 8.3. The log of income is statistically significant at the 1% level and the coefficient on the student-teacher ratio becomes somewhat closer to zero, falling from −1.00 to −0.73, although it remains statistically significant at the 1% level. The change in the coefficient on STR is large enough between regressions (1) and (2) to warrant including the logarithm of income in the remaining regressions as a deterrent to omitted variable bias.
Regression (3) in Table 8.3 is the interacted regression in Equation (8.34) with the binary variable for a high or low percentage of English learners, but with no economic control variables. When the economic control variables (percentage eligible for subsidized lunch and log income) are added [regression (4) in the table], the coefficients change, but in neither case is the coefficient on the interaction term significant at the 5% level. Based on the evidence in regression (4), the hypothesis that the effect of STR is the same for districts with low and high percentages of English learners cannot be rejected at the 5% level (the t-statistic is t = −0.58/0.50 = −1.16).
Regression (5) examines whether the effect of changing the student-teacher ratio depends on the value of the student-teacher ratio by including a cubic specification in STR in addition to the other control variables in regression (4) [the interaction term, HiEL × STR, was dropped because it was not significant in
TABLE 8.3  Nonlinear Regression Models of Test Scores
Dependent variable: average test score in district.
Regressors, in columns (1) through (7), include: the student-teacher ratio (STR); STR²; STR³; % English learners; HiEL (binary, equal to 1 if the percentage of English learners exceeds 10%); HiEL × STR; HiEL × STR²; HiEL × STR³; % eligible for subsidized lunch; average district income (logarithm); and an intercept. The table also reports F-statistics and their p-values for joint hypotheses involving the STR and interaction terms, along with the SER and adjusted R² for each regression.
These seven regressions were estimated using the data on K-8 school districts in California, described in Appendix 4.1. Standard errors are given in parentheses under coefficients, and p-values are given in parentheses under F-statistics. Individual coefficients are statistically significant at the *5% or **1% significance level.
regression (4) at the 10% level]. The estimates in regression (5) are consistent with the student-teacher ratio having a nonlinear effect. The null hypothesis that the relationship is linear is rejected at the 1% significance level against the alternative that it is cubic (the F-statistic testing the hypothesis that the true coefficients on STR² and STR³ are zero is 6.17, with a p-value of <0.001).
Regression (6) further examines whether the effect of the student-teacher ratio depends not just on the value of the student-teacher ratio but also on the fraction of English learners, by asking whether the regression functions relating test scores and STR are different for low and high percentages of English learners. To do so, we test the restriction that the coefficients on the three interaction terms are zero. The resulting F-statistic is 2.69, which has a p-value of 0.046 and thus is significant at the 5% but not the 1% significance level. This provides some evidence that the regression functions are different for districts with high and low percentages of English learners; however, comparing regressions (6) and (4) makes it clear that these differences are associated with the quadratic and cubic terms.
Regression (7) is a modification of regression (5), in which the continuous variable PctEL is used instead of the binary variable HiEL to control for the percentage of English learners in the district. The coefficients on the other regressors do not change substantially when this modification is made, indicating that the results in regression (5) are not sensitive to what measure of the percentage of English learners is used in the regression.
The nonlinear specifications are most easily interpreted graphically. Figure 8.10 graphs the estimated regression functions relating test scores and the student-teacher ratio for the linear specification (2) and the cubic specifications (5) and (7), along with a scatterplot of the data. These estimated regression functions show the predicted value of test scores as a function of the student-teacher ratio, holding fixed other values of the independent variables in the regression. The estimated regression functions are all close to each other, although the cubic regressions flatten out for large values of the student-teacher ratio.
For each curve, the predicted value was computed by setting each independent variable, other than STR, to its sample average value and computing the predicted value by multiplying these fixed values of the independent variables by the respective estimated coefficients from Table 8.3. This was done for various values of STR, and the graph of the resulting adjusted predicted values is the estimated regression line relating test scores and the STR, holding the other variables constant at their sample averages.
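The adjustment procedure described in that footnote can be sketched as follows (Python/NumPy, with simulated data and a simple quadratic specification standing in for the richer columns of Table 8.3): hold every regressor except STR at its sample average, vary STR over a grid, and trace out the implied regression line:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 420
str_ = rng.uniform(14, 26, n)
lunch = rng.uniform(0, 100, n)           # stand-in control variable
# Simulated scores with a quadratic STR effect plus a control:
y = 700 - 3.0 * str_ + 0.05 * str_**2 - 0.5 * lunch + rng.normal(0, 5, n)

# Fit the quadratic specification by OLS.
X = np.column_stack([np.ones(n), str_, str_**2, lunch])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adjusted predicted values: vary STR, hold the control at its sample mean.
str_grid = np.linspace(14, 26, 25)
X_grid = np.column_stack([
    np.ones_like(str_grid),
    str_grid,
    str_grid**2,
    np.full_like(str_grid, lunch.mean()),  # control held at its sample mean
])
adjusted_pred = X_grid @ beta
print(adjusted_pred[:3].round(1))
```

Plotting `adjusted_pred` against `str_grid` would give a curve like those in Figure 8.10, with the nonlinearity coming entirely from the STR terms.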
FIGURE 8.10  Three Regression Functions Relating Test Scores and Student-Teacher Ratio
The cubic regressions from columns (5) and (7) of Table 8.3 are nearly identical. They indicate a small amount of nonlinearity in the relation between test scores and student-teacher ratio. The figure plots test scores (vertical axis) against the student-teacher ratio (horizontal axis, from 12 to 26).
FIGURE 8.11  Regression Functions for Districts with High and Low Percentages of English Learners
The figure plots test scores against the student-teacher ratio (from 12 to 26), showing separate estimated regression functions for districts with HiEL = 0 and HiEL = 1.
Summary of Findings
These results let us answer the three questions raised at the start of this section.
First, after controlling for economic background, whether there are many or few English learners in the district does not have a substantial influence on the effect on test scores of a change in the student-teacher ratio. In the linear specifications, there is no statistically significant evidence of such a difference. The cubic specification in regression (6) provides statistically significant evidence (at the 5% level) that the regression functions are different for districts with high and low percentages of English learners; as shown in Figure 8.11, however, the estimated regression functions have similar slopes in the range of student-teacher ratios containing most of our data.
Second, after controlling for economic background, there is evidence of a nonlinear effect on test scores of the student-teacher ratio. This effect is statistically significant at the 1% level (the coefficients on STR² and STR³ are always significant at the 1% level).
Third, we now can return to the superintendent's problem that opened Chapter 4. She wants to know the effect on test scores of reducing the student-teacher ratio by two students per teacher. In the linear specification (2), this effect does not depend on the student-teacher ratio itself, and the estimated effect of this reduction is to improve test scores by 1.46 (= −0.73 × −2) points.
8.5
Conclusion
This chapter presented several ways to model nonlinear regression functions. Because these models are variants of the multiple regression model, the unknown coefficients can be estimated by OLS, and hypotheses about their values can be tested using t- and F-statistics as described in Chapter 7. In these models, the expected effect on Y of a change in one of the independent variables, X1, holding the other independent variables X2, . . . , Xk constant, in general depends on the values of X1, X2, . . . , Xk.
There are many different models in this chapter, and you could not be blamed for being a bit bewildered about which to use in a given application. How should you analyze possible nonlinearities in practice? Section 8.1 laid out a general approach for such an analysis, but this approach requires you to make decisions and exercise judgment along the way. It would be convenient if there were a single recipe you could follow that would always work in every application, but in practice data analysis is rarely that simple.
The single most important step in specifying nonlinear regression functions is to "use your head." Before you look at the data, can you think of a reason, based on economic theory or expert judgment, why the slope of the population regression function might depend on the value of that, or another, independent variable? If so, what sort of dependence might you expect? And, most importantly, which nonlinearities (if any) could have major implications for the substantive issues addressed by your study? Answering these questions carefully will focus your analysis. In the test score application, for example, such reasoning led us to investigate whether hiring more teachers might have a greater effect in districts with a large percentage of students still learning English, perhaps because those students would differentially benefit from more personal attention. By making the question precise, we were able to find a precise answer: After controlling for the students' economic background, we found little evidence of such an interaction.
Summary
1. In a nonlinear regression, the slope of the population regression function depends on the value of one or more of the independent variables.
2. The effect on Y of a change in the independent variable(s) can be computed by evaluating the regression function at the values of the independent variables before and after the change; in a nonlinear model, this effect can depend on the values of the regressors.

Key Terms
quadratic regression model (256)
nonlinear regression function (260)
8.2  A production function relates output, Q, to the factors of production: capital (K), labor (L), and raw materials (M), and an error term u, using the equation Q = λK^(β1)L^(β2)M^(β3)e^u, where λ, β1, β2, and β3 are production parameters. Suppose you have data on production and the factors of production from a random sample of firms with the same Cobb-Douglas production function. How would you use regression analysis to estimate the production parameters?
8.3  A standard "money demand" function used by macroeconomists has the form ln(m) = β0 + β1 ln(GDP) + β2R, where m is the quantity of (real) money, GDP is the value of (real) gross domestic product, and R is the value of the nominal interest rate measured in percent per year. Suppose that β1 = 1.0 and β2 = −0.02. What will happen to the value of m if GDP increases by 2%? What will happen to m if the interest rate increases from 4% to 5%?
8.4  You have estimated a linear regression model relating Y to X. Your professor says, "I think that the relationship between Y and X is nonlinear." Explain how you would test the adequacy of your linear regression.
8.5  Suppose that in problem 8.2 you thought that the value of β2 was not constant, but rather increased when K increased. How could you use an interaction term to capture this effect?
Exercises
8.1  Sales in a company are $196 million in 2001 and increase to $198 million in 2002.
a. Compute the percentage increase in sales using the usual formula 100 × (Sales2002 − Sales2001)/Sales2001. Compare this value to the approximation 100 × [ln(Sales2002) − ln(Sales2001)].
b. Repeat part (a), assuming Sales2002 = 500.
c. How good is the approximation when the change is small? Does the quality of the approximation deteriorate as the percentage change increases?
8.2  Suppose that a researcher collects data on houses that have sold in a particular neighborhood over the past year and obtains the regression results in the table shown below.
a. Using the results in column (1), what is the expected change in price of building a 500-square-foot addition to a house? Construct a 95% confidence interval for the percentage change in price.
Regression results, columns (1) through (5), with standard errors in parentheses. Regressors include Size, ln(Size), Bedrooms, Pool, View, Pool × View, Condition, and an intercept; summary statistics reported are the SER and the adjusted R².
Variable definitions: Price = sale price ($); Size = house size (in square feet); Bedrooms = number of bedrooms; Pool = binary variable (1 if house has a swimming pool, 0 otherwise); View = binary variable (1 if house has a nice view, 0 otherwise); Condition = binary variable (1 if realtor reports house is in excellent condition, 0 otherwise).
b. Comparing columns (1) and (2), is it better to use Size or ln(Size) to explain house prices?
c. Using column (2), what is the estimated effect of Pool on price? (Make sure you get the units right.) Construct a 95% confidence interval for this effect.
d. The regression in column (3) adds the number of bedrooms to the regression. How large is the estimated effect of an additional bedroom? Is the effect statistically significant? Why do you think the estimated effect is so small? (Hint: Which other variables are being held constant?)
8.3
STRmoderate = 1 if 20 ≤ STR ≤ . . . , and = 0 otherwise;
STRlarge = 1 if . . . , and = 0 otherwise.
8.4
f. How would you change the regression if you suspected that the effect of experience on earnings was different for men than for women?
8.5
a. The box reaches three conclusions. Looking at the results in the table, what is the basis for each of these conclusions?
b. Using the results in regression (4), the box reports that the elasticity of demand for an 80-year-old journal is −0.28.
ii. The box reports that the standard error for the estimated elasticity is 0.06. How would you calculate this standard error? (Hint: See the discussion "Standard errors of estimated effects" below Key Concept 8.1.)
c. Suppose that the variable Characters had been divided by 1,000 instead of 1,000,000. How would the results in column (4) change?
8.6
i. … form of nonlinearity.
ii. How would you test whether the researcher's conjecture was better than the linear specification in column (7) of Table 8.3?
Read the box "The Demand for Economics Journals" in Section 8.3.
ln(Earnings) = …
iii. Does this regression suggest that female top executives earn less than top male executives? Explain.
b. Two new variables, the market value of the firm (a measure of firm size, in millions of dollars) and stock return (a measure of firm performance, in percentage points), are added to the regression:
ln(Earnings) = …
8.8
X is a continuous variable that takes on values between 5 and 100. D is a binary variable. Sketch the following regression functions (with values of X between 5 and 100 on the horizontal axis and values of Y on the vertical axis):
a. Ŷ = 2.0 + …
b. Ŷ = 2.0 + …
c. Ŷ = … ln(X) …
d. …
e. …
8.9
explain how you would use "Approach #2" of Section 7.3 to calculate the confidence interval discussed below Equation (8.8). [Hint: This requires estimating a new regression using a different definition of the regressors and the dependent variable. See Exercise 7.9.]
Empirical Exercises
E8.1
on Age, Female, and Bachelor. If Age increases from 25 to 26, how are earnings expected to change? If Age increases from 33 to 34, how are earnings expected to change?
c. Run a regression of the logarithm of average hourly earnings, ln(AHE), on ln(Age), Female, and Bachelor. If Age increases from 25 to 26, how …
k. Is the effect of Age on earnings different for high school graduates than college graduates? Specify and estimate a regression that you can use to answer this question.
l. After running all of these regressions (and any others that you want to carry out), …
Course_Eval is different for men and women. Is the male-female difference in the effect of Beauty statistically significant?
d. Professor Smith is a man. He has cosmetic surgery that increases his beauty index from one standard deviation below the average to one standard deviation above the average. What is his value of Beauty before the surgery? After the surgery? Using the regression in (c), construct a 95% confidence interval for the increase in his course evaluation.
e. … is a woman.
Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.
a. Run a regression of ED on Dist, Female, Bytest, Tuition, Black, Hispanic, …

… in (a)? Explain.
g. Mary, Jane, Alexis, and Bonnie have the same values of Dist, Bytest, Tuition, Female, Black, Hispanic, Fincome, Ownhome, Cue80, and Stwmfg80. Neither of Mary's parents attended college. Jane's father attended college, but her mother did not. Alexis's mother attended college, but her father did not. Both of Bonnie's parents attended college. Using the regressions from (a):
i. What does the regression predict for the difference between Jane's and Mary's years of education?
Using the data set Growth described in Empirical Exercise 4.4, excluding the data for Malta, run the following five regressions: Growth on (1) TradeShare and YearsSchool; (2) TradeShare and ln(YearsSchool); (3) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60); (4) TradeShare, ln(YearsSchool), Rev_Coups, Assassinations, ln(RGDP60), and TradeShare × ln(YearsSchool); and (5) TradeShare, TradeShare², TradeShare³, ln(YearsSchool), Rev_Coups, Assassinations, and ln(RGDP60).
a. Construct a scatterplot of Growth on YearsSchool. Does the relationship look linear or nonlinear? Explain. Use the plot to explain why regression (2) fits better than regression (1).
APPENDIX 8.1
Regression Functions That Are Nonlinear in the Parameters
The nonlinear regression functions considered in Sections 8.2 and 8.3 are nonlinear functions of the X's but are linear functions of the unknown parameters. Because they are linear in the unknown parameters, those parameters can be estimated by OLS after defining new regressors that are nonlinear transformations of the original X's. This family of nonlinear regression functions is both rich and convenient to use. In some applications, however, economic reasoning leads to regression functions that are not linear in the parameters.
Although such regression functions cannot be estimated by OLS, they can be estimated using nonlinear least squares.

Logistic curve. Suppose you are studying, for example, the adoption of database management software in different industries. The dependent variable is the fraction of firms in the industry that have adopted the software, a number between 0 and 1.
Because a linear regression model could produce predicted values less than 0 or greater than 1, it makes sense to use instead a function that produces predicted values between 0 and 1. The logistic regression model with a single X is

Y_i = 1 / [1 + e^(−(β0 + β1X_i))] + u_i.  (8.38)

The logistic function with a single X is graphed in Figure 8.12. As can be seen in the graph, the logistic function has an elongated "S" shape: For small values of X, the value of the function is nearly 0 and the slope is flat; the curve is steeper for moderate values of X; and for large values of X, the function approaches 1 and the slope is flat again.
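To make the shape of Equation (8.38) concrete, here is a minimal sketch (the parameter values are illustrative, not estimates from the text) that evaluates the logistic function at small, moderate, and large values of X:

```python
import math

def logistic(x, b0, b1):
    # Logistic regression function of Equation (8.38): 1 / (1 + e^-(b0 + b1*x))
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Illustrative parameter values (not estimates from the text).
b0, b1 = -4.0, 0.08

for x in [0, 50, 100]:
    print(x, round(logistic(x, b0, b1), 3))
```

At X = 50 the exponent is zero, so the function equals exactly 0.5; at the endpoints it is close to its asymptotes of 0 and 1, matching the flat, steep, flat shape described above.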
Negative exponential growth. The functions used in Section 8.2 to model the relation between test scores and income have some deficiencies. For example, the polynomial models can produce a negative slope for some values of income, which is implausible. The logarithmic specification has a positive slope for all values of income; however, as income increases, the predicted values increase without bound, so for some incomes the predicted value of test scores exceeds the highest possible score.

The negative exponential growth model remedies these deficiencies: It has a positive slope for all values of income, the slope is steep for low values of income and decreases as income rises, and the predicted values approach an asymptote as income increases to infinity. The negative exponential growth regression model is

Y_i = β0[1 − e^(−β1(X_i − β2))] + u_i.  (8.39)
General functions that are nonlinear in the parameters. The logistic and negative exponential growth regression models are special cases of the general nonlinear regression model

Y_i = f(X_1i, …, X_ki; β0, β1, …, βm) + u_i,  (8.40)

in which f is a function that depends on the regressors and on the parameters β0, β1, …, βm. In the models of Sections 8.2 and 8.3, the X's entered this function nonlinearly, but the parameters entered linearly.
Figure 8.12: Part (a) plots the logistic function of Equation (8.38), which has predicted values that lie between 0 and 1. Part (b) plots the negative exponential growth function of Equation (8.39), which has a slope that is always positive and decreases as X increases, and an asymptote at β0 as X tends to infinity.
If the parameters are known, then predicted effects may be computed using the method described in Section 8.1. In applications, however, the parameters are unknown and must be estimated from the data. Parameters that enter nonlinearly cannot be estimated by OLS.

Recall the discussion in Section 6.3 of the OLS estimator of the coefficients of the linear multiple regression model: The OLS estimator minimizes the sum of squared prediction mistakes. The same approach can be used when the regression function is nonlinear in the coefficients, in which case the method is called nonlinear least squares. For a set of trial parameter values b0, b1, …, bm, construct the sum of squared prediction mistakes

Σ (from i = 1 to n) [Y_i − f(X_1i, …, X_ki; b0, b1, …, bm)]².  (8.41)
The nonlinear least squares estimators of β0, β1, …, βm are the values of b0, b1, …, bm that minimize the sum of squared prediction mistakes in Equation (8.41).
In linear regression, a relatively simple formula expresses the OLS estimator as a function of the data. Unfortunately, no such general formula exists for nonlinear least squares, so the nonlinear least squares estimator must be found numerically using a computer. Regression software incorporates algorithms for solving the nonlinear least squares minimization problem, which simplifies the task of computing the nonlinear least squares estimator in practice.
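The numerical search described above can be sketched with SciPy's `curve_fit`, which iterates from starting values to find the parameters minimizing the sum of squared prediction mistakes in Equation (8.41). The data below are simulated for illustration; they are not the California data, and the parameter values are invented:

```python
import numpy as np
from scipy.optimize import curve_fit

def neg_exp_growth(x, b0, b1, b2):
    # Negative exponential growth regression function, Equation (8.39).
    return b0 * (1.0 - np.exp(-b1 * (x - b2)))

# Simulated data (invented parameters; not the California test score data).
rng = np.random.default_rng(0)
x = rng.uniform(5.0, 55.0, 500)
y = neg_exp_growth(x, 700.0, 0.05, -30.0) + rng.normal(0.0, 10.0, 500)

# Nonlinear least squares: starting from the trial values p0, the algorithm
# searches numerically for the (b0, b1, b2) that minimize the sum of
# squared prediction mistakes, Equation (8.41).
estimates, cov = curve_fit(neg_exp_growth, x, y, p0=[600.0, 0.1, 0.0])
print(estimates)  # should land near the true values (700, 0.05, -30)
```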
Under general conditions on the function f and the X's, the nonlinear least squares estimator shares two key properties with the OLS estimator in the linear regression model: It is consistent, and it is normally distributed in large samples. In regression software that supports nonlinear least squares estimation, the output typically reports standard errors for the estimated parameters. As a consequence, inference concerning the parameters can proceed as usual; in particular, t-statistics can be constructed using the general approach in Key Concept 5.1, and a 95% confidence interval can be constructed as the estimated coefficient, plus or minus 1.96 standard errors. Just as in linear regression, the error term in the nonlinear regression model can be heteroskedastic, so heteroskedasticity-robust standard errors should be used.
The negative exponential growth model in Equation (8.39) has the desirable features of a slope that is always positive [if β1 in Equation (8.39) is positive] and an asymptote of β0 as income increases to infinity. The result of estimating β0, β1, and β2 in Equation (8.39) using the California test score data yields β̂0 = 703.2 (heteroskedasticity-robust standard error = 4.44), β̂1 = 0.0552 (SE = 0.0068), and β̂2 = −34.0 (SE = 4.48). Thus the estimated nonlinear regression function (with standard errors reported below the parameter estimates) is

TestScore = 703.2 [1 − e^(−0.0552(Income + 34.0))].  (8.42)
           (4.44)        (0.0068)          (4.48)
This estimated regression function is plotted in Figure 8.13, along with the logarithmic regression function and a scatterplot of the data. The two specifications are, in this case, quite similar. One difference is that the negative exponential growth curve flattens out at the highest levels of income, consistent with having an asymptote.
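A quick sketch using only the point estimates reported in Equation (8.42) shows the predicted values flattening toward the asymptote β̂0 = 703.2 (assuming, as in the chapter's earlier examples, that district income is measured in thousands of dollars):

```python
import math

def predicted_test_score(income):
    # Estimated regression function, Equation (8.42):
    # 703.2 * [1 - e^(-0.0552 * (Income + 34.0))]
    return 703.2 * (1.0 - math.exp(-0.0552 * (income + 34.0)))

for income in [10, 20, 40, 60]:
    print(income, round(predicted_test_score(income), 1))
```

The predicted score rises quickly at low incomes (about 641 at income 10) but flattens at high incomes (about 699 at income 60), approaching but never exceeding 703.2.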
Figure 8.13: The negative exponential growth regression function [Equation (8.42)] and the linear-log regression function [Equation (8.18)] both capture the nonlinear relation between test scores and district income. One difference between the two functions is that the negative exponential growth model has an asymptote as income increases to infinity, but the linear-log regression function does not.
Assessing Studies Based on Multiple Regression

In this chapter, we step back and ask, What makes a study that uses multiple regression reliable or unreliable? We focus on statistical studies that have the objective of estimating the causal effect of a change in some independent variable, such as class size, on a dependent variable, such as test scores. For such studies, when will multiple regression provide a useful estimate of the causal effect, and, just as importantly, when will it fail to do so?
… external validity, and discuss how to identify those threats in practice. The discussion in Sections 9.1 and 9.2 focuses on studies whose objective is the estimation of causal effects; Section 9.3 discusses a different use of regression models, forecasting, and provides an introduction to the threats to the validity of forecasts made using regression models.
9.1
Internal and External Validity

A statistical analysis is internally valid if the statistical inferences about causal effects are valid for the population being studied. The analysis is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.
314
CHAPTER 9
… the rejection rate of the test under the null hypothesis should equal its desired significance level, and confidence intervals should have the desired confidence level. For example, if a confidence interval is constructed as β̂ ± 1.96SE(β̂), this confidence interval should contain the true population causal effect with probability 95% over repeated samples.

In regression analysis, causal effects are estimated using the estimated regression function, and hypothesis tests are performed using the estimated regression coefficients and their standard errors. Accordingly, in a study based on OLS regression, the requirements for internal validity are that the OLS estimator is unbiased and consistent, and that standard errors are computed in a way that makes confidence intervals have the desired confidence level. There are various reasons why this might not happen, and these reasons constitute threats to internal validity. These threats lead to failures of one or more of the least squares assumptions in Key Concept 6.4. For example, one threat that we have discussed at length is omitted variable bias; it leads to correlation between one or more regressors and the error term, which violates the first least squares assumption. If data on the omitted variable are available, then this threat can be avoided by including that variable as an additional regressor.

Section 9.2 provides a detailed discussion of the various threats to internal validity in multiple regression analysis and suggests how to mitigate them.
Differences in populations. Differences between the population studied and the population of interest can pose a threat to external validity. For example, laboratory studies of the toxic effects of chemicals typically use animal populations like mice (the population studied), but the results are used to write health and safety regulations for human populations (the population of interest). Whether mice and men differ sufficiently to threaten the external validity of such studies is a matter of debate.

More generally, the true causal effect might not be the same in the population studied and the population of interest. This could be because the population was chosen in a way that makes it different from the population of interest, because of differences in characteristics of the populations, because of geographical differences, or because the study is out of date.
Differences in settings. Even if the population being studied and the population of interest are identical, it might not be possible to generalize the study results if the settings differ. For example, a study of the effect on college binge drinking of an antidrinking advertising campaign might not generalize to another identical group of college students if the legal penalties for drinking at the two colleges differ. In this case, the legal setting in which the study was conducted differs from the legal setting to which its results are applied.

More generally, examples of differences in settings include differences in the institutional environment (public universities versus religious universities), differences in laws (differences in legal penalties), or differences in the physical environment (tailgate-party binge drinking in southern California versus Fairbanks, Alaska).
How to assess the external validity of a study. External validity must be judged using specific knowledge of the populations and settings studied and those of interest. Important differences between the two will cast doubt on the external validity of the study.

Sometimes there are two or more studies on different but related populations. If so, the external validity of both studies can be checked by comparing their results. For example, in Section 9.4 we analyze test score and class size data for elementary school districts in Massachusetts and compare the Massachusetts and California results. In general, similar findings in two or more studies bolster claims to
external validity, while differences in their findings that are not readily explained cast doubt on their external validity.

Because threats to external validity stem from a lack of comparability of populations and settings, these threats are best minimized at the early stages of a study, before the data are collected. Study design is beyond the scope of this textbook, and the interested reader is referred to Shadish, Cook, and Campbell (2002).
9.2
Threats to Internal Validity of Multiple Regression Analysis
… omitted variable bias.
The third step is to augment your base specification with the additional questionable variables identified in the second step and to test the hypotheses that their coefficients are zero. If the coefficients on the additional variables are statistically significant, or if the estimated coefficients of interest change appreciably when the additional variables are included, then they should remain in the specification and you should modify your base specification. If not, then these variables can be excluded from the regression.

The fourth step is to present an accurate summary of your results in tabular form. This provides "full disclosure" to a potential skeptic, who can then draw his or her own conclusions. Tables 7.1 and 8.3 are examples of this strategy. For example, in Table 8.3, we could have presented only the regression in column (7), because that regression summarizes the relevant effects and nonlinearities in the other regressions in that table. Presenting the other regressions, however, permits the skeptical reader to draw his or her own conclusions.

These steps are summarized in Key Concept 9.2.
Misspecification of the Functional Form of the Regression Function

If the true population regression function is nonlinear but the estimated regression is linear, then this functional form misspecification makes the OLS estimator biased. This bias is a type of omitted variable bias, in which the omitted variables are the terms that reflect the missing nonlinear aspects of the regression function. For example, if the population regression function is a quadratic polynomial, then a regression that omits the square of the independent variable would suffer from omitted variable bias. Bias arising from functional form misspecification is summarized in Key Concept 9.3.
Solutions to functional form misspecification. When the dependent variable is continuous (like test scores), this problem of potential nonlinearity can be solved using the methods of Chapter 8. If, however, the dependent variable is discrete or binary (for example, Y_i equals 1 if the ith person attended college and equals 0 otherwise), things are more complicated. Regression with a discrete dependent variable is discussed in Chapter 11.
Errors-in-Variables

Suppose that in our regression of test scores against the student-teacher ratio we had inadvertently mixed up our data, so that we ended up regressing test scores for fifth graders on the student-teacher ratio for tenth graders in that district. Although the student-teacher ratio for elementary school students and tenth graders might be correlated, they are not the same, so this mix-up would lead to bias in the estimated coefficient. This is an example of errors-in-variables bias.
Writing the population regression Y_i = β0 + β1X_i + u_i in terms of the measured variable X̃_i gives

Y_i = β0 + β1X̃_i + v_i,  (9.1)

where v_i = β1(X_i − X̃_i) + u_i. Thus the population regression equation written in terms of X̃_i has an error term that contains the difference between X̃_i and X_i. If this difference is correlated with the measured value X̃_i, then the regressor X̃_i will be correlated with the error term, and β̂1 will be biased and inconsistent.

The precise size and direction of the bias in β̂1 depend on the correlation between X̃_i and (X_i − X̃_i). This correlation depends, in turn, on the specific nature of the measurement error.
As an example, suppose that the survey respondent provides her best guess or recollection of the actual value of the independent variable X_i. A convenient way to represent this mathematically is to suppose that the measured value of X_i equals the actual, unmeasured value plus a purely random component, w_i. Accordingly, the measured value of the variable, denoted by X̃_i, is X̃_i = X_i + w_i. Because the error is purely random, we might suppose that w_i has mean zero and variance σ_w² and is uncorrelated with X_i and the regression error u_i. Under this assumption, a bit of algebra² shows that β̂1 has the probability limit

β̂1 →p [σ_X² / (σ_X² + σ_w²)] β1.  (9.2)
Key Concept 9.4: Errors-in-Variables Bias

Errors-in-variables bias in the OLS estimator arises when an independent variable is measured imprecisely. This bias depends on the nature of the measurement error and persists even if the sample size is large. If the measured variable equals the actual value plus a mean-zero, independently distributed measurement error term, then the OLS estimator in a regression with a single right-hand variable is biased toward zero, and its probability limit is given in Equation (9.2).
That is, if the measurement imprecision has the effect of simply adding a random element to the actual value of the independent variable, then β̂1 is inconsistent. Because the ratio σ_X²/(σ_X² + σ_w²) is less than 1, β̂1 will be biased toward 0, even in large samples. In the extreme case that the measurement error is so large that essentially no information about X_i remains, the ratio of the variances in the final expression in Equation (9.2) is 0, and β̂1 converges in probability to 0. In the other extreme, when there is no measurement error, σ_w² = 0, so β̂1 →p β1.
Although the result in Equation (9.2) is specific to this particular type of measurement error, it illustrates the more general proposition that if the independent variable is measured imprecisely, then the OLS estimator is biased, even in large samples. Errors-in-variables bias is summarized in Key Concept 9.4.
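The attenuation result in Equation (9.2) is easy to check by simulation; the sketch below uses invented numbers with σ_X² = σ_w², so the OLS slope should converge to half the true β1:

```python
import numpy as np

# Simulation of errors-in-variables (attenuation) bias; all numbers invented.
rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 2.0, 1.0
var_x, var_w = 1.0, 1.0  # var(X) and var(w), chosen so the plim is beta1/2

X = rng.normal(0.0, np.sqrt(var_x), n)            # true regressor
u = rng.normal(0.0, 1.0, n)                       # regression error
Y = beta0 + beta1 * X + u
X_tilde = X + rng.normal(0.0, np.sqrt(var_w), n)  # measured with random error w

# OLS slope from regressing Y on the mismeasured regressor X_tilde.
slope = np.cov(X_tilde, Y, ddof=1)[0, 1] / np.var(X_tilde, ddof=1)

# Equation (9.2): plim(slope) = beta1 * var_x / (var_x + var_w) = 0.5, not 1.0.
print(slope)
```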
Solutions to errors-in-variables bias. The best way to solve the errors-in-variables problem is to get an accurate measure of X. If this is impossible, however, econometric methods can be used to mitigate errors-in-variables bias.

One such method is instrumental variables regression. It relies on having another variable (the "instrumental" variable) that is correlated with the actual value X_i but is uncorrelated with the measurement error. This method is studied in Chapter 12.

A second method is to develop a mathematical model of the measurement error and, if possible, to use the resulting formulas to adjust the estimates. For example, if a researcher believes that the measured variable is in fact the sum of the actual value and a random measurement error term, and if she knows or can estimate the ratio σ_w²/σ_X², then she can use Equation (9.2) to compute an estimator of β1 that corrects for the downward bias. Because this approach requires specialized knowledge about the nature of the measurement error, the details typically are specific to a given data set and its measurement problems, and we shall not pursue this approach further in this textbook.
Sample Selection
Sample selection bias occurs when the availability of the data is influenced by a selection process that is related to the value of the dependent variable. This selection process can introduce correlation between the error term and the regressors, which leads to bias in the OLS estimator.
Sample selection that is unrelated to the value of the dependent variable does not introduce bias. For example, if data are collected from a population by simple random sampling, the sampling method (being drawn at random from the population) has nothing to do with the value of the dependent variable. Such sampling does not introduce bias.
Bias can be introduced when the method of sampling is related to the value of the dependent variable. An example of sample selection bias in polling was given in a box in Chapter 3. In that example, the sample selection method (randomly selected phone numbers of automobile owners) was related to the dependent variable (who the individual supported for president in 1936), because in 1936 car owners with phones were more likely to be Republicans.
An example of sample selection in economics arises in using a regression of wages on education to estimate the effect on wages of an additional year of education. Only individuals who have a job have wages, by definition. The factors (observable and unobservable) that determine whether someone has a job (education, experience, where one lives, ability, luck, and so forth) are similar to the factors that determine how much that person earns when employed. Thus, the fact that someone has a job suggests that, all else equal, the error term in the wage equation for that person is positive. Said differently, whether someone has a job is in part determined by the omitted variables in the error term in the wage regression. Thus, the simple fact that someone has a job, and thus appears in the data set, provides information that the error term in the regression is positive, at least on average, and could be correlated with the regressors. This too can lead to bias in the OLS estimator.
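The wage example can be mimicked with a small simulation (the functional form, coefficients, and selection rule below are all invented for illustration): because the same unobservable u both raises wages and makes employment more likely, OLS on the employed subsample no longer recovers the true coefficient on education.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

educ = rng.uniform(8.0, 18.0, n)        # years of education
u = rng.normal(0.0, 1.0, n)             # wage-equation error (ability, luck, ...)
log_wage = 0.5 + 0.10 * educ + u        # true return to education: 0.10

# Selection: a wage is observed only if education plus the SAME unobservable u
# is high enough, so selection is related to the dependent variable through u.
employed = (0.2 * educ + u) > 2.0

def ols_slope(x, y):
    # Slope of the OLS regression of y on x (with a constant).
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

print(ols_slope(educ, log_wage))                      # close to 0.10 (full sample)
print(ols_slope(educ[employed], log_wage[employed]))  # well below 0.10 (selected)
```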
Sample selection bias is summarized in Key Concept 9.5. The box "Do Stock Mutual Funds Outperform the Market?" provides an example of sample selection bias in financial economics.
Key Concept 9.5: Sample Selection Bias

Sample selection bias arises when a selection process influences the availability of data and that process is related to the dependent variable. Sample selection induces correlation between one or more regressors and the error term, leading to bias and inconsistency of the OLS estimator.
… the evidence indicates that actively managed stock mutual funds do not consistently outperform the market.
Simultaneous Causality

So far, we have assumed that causality runs from the regressors to the dependent variable (X causes Y). But what if causality also runs from the dependent variable to one or more regressors (Y causes X)? If so, causality runs backward as well as forward; that is, there is simultaneous causality. If there is simultaneous causality, an OLS regression picks up both effects, so the OLS estimator is biased and inconsistent.
For example, our study of test scores focused on the effect on test scores of reducing the student-teacher ratio, so that causality is presumed to run from the student-teacher ratio to test scores. Suppose, however, that a government initiative subsidized hiring teachers in school districts with poor test scores. If so, causality would run in both directions: For the usual educational reasons, low student-teacher ratios would arguably lead to high test scores, but because of the government program, low test scores would lead to low student-teacher ratios.
Simultaneous causality leads to correlation between the regressor and the error term. In the test score example, suppose there is an omitted factor that leads to poor test scores; because of the government program, this factor that produces low scores in turn results in a low student-teacher ratio. Thus, a negative error term in the population regression of test scores on the student-teacher ratio reduces test scores, but because of the government program it also leads to a decrease in the student-teacher ratio. In other words, the student-teacher ratio is positively correlated with the error term in the population regression. This in turn leads to simultaneous causality bias and inconsistency of the OLS estimator.
This correlation between the error term and the regressor can be made precise mathematically by introducing an additional equation that describes the reverse causal link. For convenience, consider just the two variables X and Y and ignore other possible regressors. Accordingly, there are two equations, one in which X causes Y and one in which Y causes X:

Y_i = β0 + β1X_i + u_i;  (9.3)
X_i = γ0 + γ1Y_i + v_i.  (9.4)

Equation (9.3) is the familiar one in which β1 is the effect on Y of a change in X, where u represents other factors. Equation (9.4) represents the reverse causal effect of Y on X. In the test score problem, Equation (9.3) represents the educational effect of class size on test scores, while Equation (9.4) represents the reverse causal effect of test scores on class size induced by the government program.
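Solving Equations (9.3) and (9.4) simultaneously implies cov(X_i, u_i) = γ1σ_u²/(1 − γ1β1), so X is correlated with u whenever γ1 ≠ 0. A simulation sketch with invented parameter values:

```python
import numpy as np

# Simultaneous causality: Equations (9.3) and (9.4) with invented parameters.
rng = np.random.default_rng(2)
n = 200_000
beta0, beta1 = 1.0, 0.5    # (9.3): Y = beta0 + beta1*X + u
gamma0, gamma1 = 0.0, 0.4  # (9.4): X = gamma0 + gamma1*Y + v

u = rng.normal(0.0, 1.0, n)
v = rng.normal(0.0, 1.0, n)

# Reduced form: substitute (9.3) into (9.4) and solve for X; then generate Y.
X = (gamma0 + gamma1 * (beta0 + u) + v) / (1.0 - gamma1 * beta1)
Y = beta0 + beta1 * X + u

# X is correlated with u: cov(X, u) = gamma1*var(u)/(1 - gamma1*beta1) = 0.5.
print(np.cov(X, u, ddof=1)[0, 1])

# Hence the OLS slope of Y on X converges to beta1 + cov(X, u)/var(X),
# which exceeds the true causal effect beta1 = 0.5.
print(np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1))
```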
There are two ways to mitigate simultaneous causality bias. One is to use instrumental variables regression, the topic of Chapter 12. The second is to design and to implement a randomized controlled experiment in which the reverse causality channel is nullified, and such experiments are discussed in Chapter 13.
Sources of Inconsistency
of OLS Standard Errors
Inconsistent standard errors pose a different threat to internal validity. Even if the OLS estimator is consistent and the sample is large, inconsistent standard errors will produce hypothesis tests with size that differs from the desired significance level and "95%" confidence intervals that fail to include the true value in 95% of repeated samples.
To show this mathematically, note that Equation (9.4) implies that cov(X_i, u_i) = cov(γ0 + γ1Y_i + v_i, u_i) = γ1cov(Y_i, u_i) + cov(v_i, u_i). Assuming that cov(v_i, u_i) = 0, Equation (9.3) in turn implies that cov(Y_i, u_i) = cov(β0 + β1X_i + u_i, u_i) = β1cov(X_i, u_i) + σ_u². Thus cov(X_i, u_i) = γ1β1cov(X_i, u_i) + γ1σ_u², and solving for cov(X_i, u_i) yields the result cov(X_i, u_i) = γ1σ_u²/(1 − γ1β1).
There are two main reasons for inconsistent standard errors: improperly handled heteroskedasticity and correlation of the error term across observations.

Heteroskedasticity. As discussed in Section 5.4, for historical reasons many regression software programs report homoskedasticity-only standard errors. If, however, the regression error is heteroskedastic, those standard errors are not a reliable basis for hypothesis tests and confidence intervals. The solution to this problem is to use heteroskedasticity-robust standard errors and to construct F-statistics using a heteroskedasticity-robust variance estimator. Heteroskedasticity-robust standard errors are provided as an option in modern software packages.
Correlation of the error term across observations. In some settings, the population regression error can be correlated across observations. This will not happen if the data are obtained by sampling at random from the population, because the randomness of the sampling process ensures that the errors are independently distributed from one observation to the next. Sometimes, however, sampling is only partially random. The most common circumstance is when the data are repeated observations on the same entity over time, for example, the same school district for different years. If the omitted variables that constitute the regression error are persistent (like district demographics), then this induces "serial" correlation in the regression error over time. Serial correlation in the error term can arise in panel data (data on multiple districts for multiple years) and in time series data (data on a single district for multiple years).
Another situation in which the error term can be correlated across observations is when sampling is based on a geographical unit. If there are omitted variables that reflect geographic influences, these omitted variables could result in correlation of the regression errors for adjacent observations.
Correlation of the regression error across observations does not make the OLS estimator biased or inconsistent, but it does violate the second least squares assumption in Key Concept 6.4. The consequence is that the OLS standard errors, both homoskedasticity-only and heteroskedasticity-robust, are incorrect in the sense that they do not produce confidence intervals with the desired confidence level.
In many cases, this problem can be fixed by using an alternative formula for standard errors. We provide such a formula for computing standard errors that are robust to both heteroskedasticity and serial correlation in Chapter 10 (regression with panel data) and in Chapter 15 (regression with time series data).
Key Concept 9.7 summarizes the threats to internal validity of a multiple regression study.
KEY CONCEPT 9.7
Threats to the Internal Validity of a Multiple Regression Study

There are five primary threats to the internal validity of a multiple regression study:

1. Omitted variables
2. Functional form misspecification
3. Errors-in-variables (measurement error in the regressors)
4. Sample selection
5. Simultaneous causality

Each of these, if present, results in failure of the first least squares assumption, E(ui|X1i, ..., Xki) ≠ 0, which in turn means that the OLS estimator is biased and inconsistent.
Incorrect calculation of the standard errors also poses a threat to internal validity. Homoskedasticity-only standard errors are invalid if heteroskedasticity is present. If the variables are not independent across observations, as can arise in panel and time series data, then a further adjustment to the standard error formula is needed to obtain valid standard errors.

Applying this list of threats to a multiple regression study provides a systematic way to assess the internal validity of that study.
9.3
Internal and External Validity When the Regression
Is Used for Forecasting

Using Regression Models for Forecasting

Chapter 4 began by considering the problem of a school superintendent who wants to know how much test scores would increase if she reduced class sizes in her
school district; that is, the superintendent wants to know the causal effect on test scores of a change in class size. Accordingly, Chapters 4 through 8 focused on using regression analysis to estimate causal effects using observational data.

Now consider a different problem. A parent moving to a metropolitan area plans to choose where to live based in part on the quality of the local schools. The parent would like to know how different school districts perform on standardized tests. Suppose, however, that test score data are not available (perhaps they are confidential), but data on class sizes are. In this situation, the parent must guess at how well the different districts perform on standardized tests based on a limited amount of information. That is, the parent's problem is to forecast average test scores in a given district based on information related to test scores, in particular, class size.
How can the parent make this forecast? Recall the regression of test scores on the student–teacher ratio (STR) from Chapter 4:

    TestScore = 698.9 − 2.28 × STR.    (9.5)
We concluded that this regression is not useful for the superintendent: The OLS estimator of the slope is biased because of omitted variables such as the composition of the student body and students' other learning opportunities outside school.

Nevertheless, Equation (9.5) could be useful to the parent trying to choose a home. To be sure, class size is not the only determinant of test performance, but from the parent's perspective what matters is whether it is a reliable predictor of test performance. The parent interested in forecasting test scores does not care whether the coefficient in Equation (9.5) estimates the causal effect on test scores of class size. Rather, the parent simply wants the regression to explain much of the variation in test scores across districts and to be stable, that is, to apply to the districts to which the parent is considering moving. Although omitted variable bias renders Equation (9.5) useless for answering the causal question, it still can be useful for forecasting purposes.

More generally, regression models can produce reliable forecasts, even if their coefficients have no causal interpretation. This recognition underlies much of the use of regression models for forecasting.
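The parent's forecast is purely mechanical: plug a district's student–teacher ratio into Equation (9.5). A minimal sketch (the district STR values below are hypothetical):

```python
def forecast_test_score(str_ratio: float) -> float:
    """Forecast of average test score from Equation (9.5): 698.9 - 2.28 x STR."""
    return 698.9 - 2.28 * str_ratio

# Hypothetical districts the parent might compare (STR values are invented).
for str_ratio in (15.0, 20.0, 25.0):
    print(str_ratio, forecast_test_score(str_ratio))
```

Nothing in this calculation requires the slope to be a causal effect; it only requires that the estimated relationship be stable across the districts being compared.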
9.4
Example: Test Scores and Class Size

External Validity
Whether the California analysis can be generalized, that is, whether it is externally valid, depends on the population and setting to which the generalization is made. Here, we consider whether the results can be generalized to performance on other standardized tests in other elementary public school districts in the United States.

Section 9.1 noted that having more than one study on the same topic provides an opportunity to assess the external validity of both studies by comparing their results. In the case of test scores and class size, other comparable data sets are, in fact, available. In this section, we examine a different data set, based on standardized test results for fourth graders in 220 public school districts in Massachusetts in 1998. Both the Massachusetts and California tests are broad measures of student knowledge and academic skills, although the details differ. Similarly, the organization of classroom instruction is broadly similar at the elementary school level in the two states (as it is in most U.S. elementary school districts), although aspects of elementary school funding and curriculum differ. Thus, finding similar results about the effect of the student–teacher ratio on test performance in the California and Massachusetts data would be evidence of external validity of the findings in California. Conversely, finding different results in the two states would raise questions about the internal or external validity of at least one of the studies.
TABLE 9.1  Summary Statistics for California and Massachusetts Test Score Data Sets

                                    California               Massachusetts
                                Average   Standard       Average   Standard
                                          Deviation                Deviation
Test scores                      654.1      19.1          709.8      15.1
Student–teacher ratio             19.6       1.9           17.3       2.3
% English learners                15.8%     18.3%           1.1%      2.9%
% receiving lunch subsidy         44.7%     27.1%          15.3%     15.1%
Average district income        $15,317    $7,226        $18,747    $5,808
Number of observations             420                      220
Year                              1999                     1998
Like the California data, the Massachusetts data are at the school district level. The definitions of the variables in the Massachusetts data set are the same as those in the California data set, or nearly so. More information on the Massachusetts data set, including definitions of the variables, is given in Appendix 9.1.
Table 9.1 presents summary statistics for the California and Massachusetts samples. The average test score is higher in Massachusetts, but the test is different, so a direct comparison of scores is not appropriate. The average student–teacher ratio is higher in California (19.6 versus 17.3). Average district income is 20% higher in Massachusetts, but the standard deviation of income is greater in California; that is, there is a greater spread in average district incomes in California than in Massachusetts. The average percentage of students still learning English and the average percentage of students receiving subsidized lunches are both much higher in the California than in the Massachusetts districts.
FIGURE 9.1  Test Scores vs. Income for Massachusetts Data

The estimated linear regression function does not capture the nonlinear relation between income and test scores in the Massachusetts data. The estimated linear-log and cubic regression functions are similar for district incomes between $13,000 and $30,000, the region containing most of the observations.

[Scatterplot of test scores against district income (thousands of dollars), with linear, linear-log, and cubic fitted regression functions.]
slightl y higher R2 than the log<ritb mic speci fic<Hion (0.486 versus 0.455).
Com par ing Figu res 8.7 a nd 9.1 shows tha t 1h~ gene ra l pa ll ern of nonlinearity
found in the C.alifornia income and test score d:J ta 1~ abo prescm in the Massac husettl> dl\ta . The precise functional fo rms tha t bec;l describe this nonlinearity
diffe r, however, with the cubic specification fiumg best in Massachusetts but the
li near-log ~pecificai ion fining best in California.
The remaining columns report the results of including additional variables that control for student characteristics and of introducing nonlinearities into the estimated regression function. Controlling for the percentage of English learners, the percentage of students eligible for a free lunch, and average district income reduces the estimated coefficient on the student–teacher ratio by 60%, from −1.72 in regression (1) to −0.69 in regression (2) and −0.64 in regression (3).
TABLE 9.2  Multiple Regression Estimates of the Student–Teacher Ratio and Test Scores: Data from Massachusetts

Regressor                       (1)        (2)        (3)        (4)        (5)        (6)

Student–teacher ratio (STR)   −1.72**    −0.69*     −0.64*     12.4      −1.02**    −0.67*
                              (0.50)     (0.27)     (0.27)    (14.0)     (0.37)     (0.27)
STR²                                                          −0.680
                                                              (0.737)
STR³                                                           0.011
                                                              (0.013)
% English learners                       −0.411     −0.437    −0.434
                                         (0.306)    (0.303)   (0.300)
HiEL (binary,
  % English learners > median)                                           −12.6
                                                                         (9.8)
HiEL × STR                                                                0.80
                                                                         (0.56)
% eligible for free lunch                −0.521**   −0.582**  −0.587**   −0.709**   −0.653**
                                         (0.077)    (0.097)   (0.104)    (0.091)    (0.072)
District income (logarithm)              16.53**
                                         (3.15)
District income                                     −3.07     −3.38      −3.87      −3.22
                                                    (2.35)    (2.49)     (2.49)     (2.31)
District income²                                     0.164     0.174      0.184*     0.165
                                                    (0.085)   (0.089)    (0.090)    (0.085)
District income³                                    −0.0022*  −0.0023*   −0.0023*   −0.0022*
                                                    (0.0010)  (0.0010)   (0.0010)   (0.0010)
Intercept                     739.6**    682.4**    744.0**   665.5**    759.9**    747.4**
                              (8.6)      (11.5)     (21.3)    (81.3)     (23.2)     (20.3)
Comparing the R²'s of regressions (2) and (3) indicates that the cubic specification (3) provides a better model of the relationship between test scores and income than does the logarithmic specification (2), even holding constant the student–teacher ratio. There is no statistically significant evidence of a nonlinear relationship between test scores and the student–teacher ratio: The F-statistic in regression (4) testing whether the population coefficients on STR² and STR³ are zero has a p-value of 0.641. Similarly, there is no evidence that a reduction in the student–teacher ratio has a different effect in districts with many English learners
TABLE 9.2 (continued)

F-Statistics and p-Values Testing Exclusion of Groups of Variables

                               (1)      (2)      (3)       (4)        (5)        (6)

All STR variables
  and interactions = 0                                    2.86       4.01
                                                         (0.038)    (0.020)
STR², STR³ = 0                                            0.45
                                                         (0.641)
Income², Income³ = 0                             7.74     7.75       5.85       6.55
                                               (<0.001) (<0.001)    (0.003)    (0.002)
HiEL, HiEL × STR = 0                                                 1.58
                                                                    (0.208)

SER                           14.64     8.69     8.61     8.63       8.62       8.64
R²                             0.063    0.670    0.676    0.675      0.675      0.674

These regressions were estimated using the data on Massachusetts elementary school districts described in Appendix 9.1. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. Individual coefficients are statistically significant at the *5% level or **1% level.
than with few [the t-statistic on HiEL × STR in regression (5) is 0.80/0.56 = 1.43]. Finally, regression (6) shows that the estimated coefficient on the student–teacher ratio does not change substantially when the percentage of English learners [which is insignificant in regression (3)] is excluded. In short, the results in regression (3) are not sensitive to the changes in functional form and specification considered in regressions (4)-(6) in Table 9.2. Therefore we adopt regression (3) as our base estimate of the effect on test scores of a change in the student–teacher ratio based on the Massachusetts data.
2. The hypothesis that the true coefficient on the student–teacher ratio is zero was rejected at the 1% significance level, even after adding variables that control for student background and district economic characteristics.
3. The effect of cutting the student–teacher ratio did not depend in an important way on the percentage of English learners in the district.
4. There is some evidence that the relationship between test scores and the student–teacher ratio is nonlinear.
Do we find the same things in Massachusetts? For findings (1), (2), and (3), the answer is yes. Including the additional control variables reduces the coefficient on the student–teacher ratio from −1.72 [Table 9.2, regression (1)] to −0.69 [Table 9.2, regression (2)], a reduction of 60%. The coefficients on the student–teacher ratio remain significant after adding the control variables. Those coefficients are only significant at the 5% level in the Massachusetts data, whereas they are significant at the 1% level in the California data. However, there are nearly twice as many observations in the California data, so it is not surprising that the California estimates are more precise. As in the California data, there is no statistically significant evidence in the Massachusetts data of an interaction between the student–teacher ratio and the binary variable indicating a large percentage of English learners in the district.

Finding (4), however, does not hold up in the Massachusetts data: The hypothesis that the relationship between the student–teacher ratio and test scores is linear cannot be rejected at the 5% significance level when tested against a cubic specification.
Because the two standardized tests are different, the coefficients themselves cannot be compared directly: One point on the Massachusetts test is not the same as one point on the California test. If, however, the test scores are put into the same units, then the estimated class size effects can be compared. One way to do this is to transform the test scores by standardizing them: Subtract the sample average and divide by the standard deviation so that they have a mean of 0 and a variance of 1. The slope coefficients in the regression with the transformed test score equal the slope coefficients in the original regression, divided by the standard deviation of the test. Thus the coefficient on the student–teacher ratio, divided by the standard deviation of test scores, can be compared across the two data sets.
This comparison is undertaken in Table 9.3. The first column reports the OLS estimates of the coefficient on the student–teacher ratio in a regression with the percentage of English learners, the percentage of students eligible for a free lunch, and the average district income included as control variables. The second column reports the standard deviation of the test scores across districts. The final two columns report the estimated effect on test scores of reducing the student–teacher ratio by two students per teacher (our superintendent's proposal), first in the units of the test and second in standard deviation units. For the linear specification, the OLS coefficient estimate using California data is −0.73, so cutting the student–teacher ratio by two is estimated to increase district test scores by −0.73 × (−2) = 1.46 points. Because the standard deviation of test scores is 19.1 points, this corresponds to 1.46/19.1 = 0.076 standard deviations of the distribution of test scores across districts. The standard error of this estimate is 0.26 × 2/19.1 =
TABLE 9.3  Student–Teacher Ratios and Test Scores: Comparing the Estimates from California and Massachusetts

                                                        Estimated Effect of Reducing the
                                                        Student–Teacher Ratio by Two
                            OLS         Standard Deviation
                            Estimate    of Test Scores       In Points       In Standard
                            of β_STR    Across Districts     on the Test     Deviation Units
California
  Linear                    −0.73         19.1               1.46            0.076
                            (0.26)                           (0.52)          (0.027)
  Cubic: reduce STR
    from 20 to 18                         19.1               2.93            0.153
                                                             (0.70)          (0.037)
  Cubic: reduce STR
    from 22 to 20                         19.1               1.90            0.099
                                                             (0.69)          (0.036)
Massachusetts
  Linear                    −0.64         15.1               1.28            0.085
                            (0.27)                           (0.54)          (0.036)

Standard errors are given in parentheses.
0.027. The estimated effects for the nonlinear models and their standard errors were computed using the method described in Section 8.1.

Based on the linear model using California data, a reduction of two students per teacher is estimated to increase test scores by 0.076 standard deviation unit, with a standard error of 0.027. The nonlinear models for California data suggest a somewhat larger effect, with the specific effect depending on the initial student–teacher ratio. Based on the Massachusetts data, this estimated effect is 0.085 standard deviation unit, with a standard error of 0.036.

These estimates are essentially the same. Cutting the student–teacher ratio is predicted to raise test scores, but the predicted improvement is small. In the California data, for example, the difference in test scores between the median district and a district at the 75th percentile is 12.2 test score points (Table 4.1), or 0.64 (= 12.2/19.1) standard deviations. The estimated effect from the linear model is just over one-tenth this size; in other words, according to this estimate, cutting the student–teacher ratio by two would move a district only one-tenth of the way from the median to the 75th percentile of the distribution of test scores across districts. Reducing the student–teacher ratio by two is a large change for a district, but the estimated benefits shown in Table 9.3, while nonzero, are small.
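The entries in Table 9.3 follow from simple rescaling of the regression output. A sketch reproducing the California linear-specification numbers (the coefficient −0.73, its standard error 0.26, and the score standard deviation 19.1 are taken from the text; the code just redoes the arithmetic):

```python
# Rescale the California linear estimate into test points and SD units.
coef, se = -0.73, 0.26                   # OLS slope and its standard error
sd_scores = 19.1                         # SD of test scores across districts
delta_str = -2.0                         # cut the student-teacher ratio by two

effect_points = coef * delta_str                  # points on the test
effect_sd = effect_points / sd_scores             # in standard deviation units
se_effect_sd = se * abs(delta_str) / sd_scores    # SE of the rescaled effect

print(round(effect_points, 2), round(effect_sd, 3), round(se_effect_sd, 3))
```

Because the rescaling is linear, the standard error scales by the same factor as the estimate, which is why no new regression is needed.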
Internal Validity
The similarity of the results for California and Massachusetts does not ensure their internal validity. Section 9.2 listed five possible threats to internal validity that could induce bias in the estimated effect on test scores of class size. We consider these threats in turn.
Omitted variables. Possible omitted variables remain, such as other school and student characteristics, and their omission might cause omitted variable bias. For example, if the student–teacher ratio is correlated with teacher quality (perhaps because better teachers are attracted to schools with smaller student–teacher ratios), and if teacher quality affects test scores, then omission of teacher quality could bias the coefficient on the student–teacher ratio. Similarly, districts with a low student–teacher ratio might also offer many extracurricular learning opportunities. Also, districts with a low student–teacher ratio might attract families that are more committed to enhancing their children's learning at home. Such omitted factors could lead to omitted variable bias.
One way to eliminate omitted variable bias, at least in theory, is to conduct an experiment. For example, students could be randomly assigned to different size classes, and their subsequent performance on standardized tests could be compared. Such a study was in fact conducted in Tennessee, and we examine it in Chapter 13.
Functional form.
Errors in variables. The average student–teacher ratio in the district is a broad and potentially inaccurate measure of class size. For example, because students move in and out of districts, the student–teacher ratio might not accurately represent the actual class sizes experienced by the students taking the test, which in turn could lead to the estimated class size effect being biased toward zero. Another variable with potential measurement error is average district income. Those data were taken from the 1990 census, while the other data pertain to 1998 (Massachusetts) or 1999 (California). If the economic composition of the district changed substantially over the 1990s, this would be an imprecise measure of the actual average district income.
Selection. The California and the Massachusetts data cover all the public elementary school districts in the state that satisfy minimum size restrictions, so there is no reason to believe that sample selection is a problem here.
Simultaneous causality. Simultaneous causality would arise if the performance on standardized tests affected the student–teacher ratio. This could happen, for example, if there is a bureaucratic or political mechanism for increasing the funding of poorly performing schools or districts, which in turn resulted in hiring more teachers. In Massachusetts, no such mechanism for equalization of school financing was in place during the time of these tests. In California, a series of court cases led to some equalization of funding, but this redistribution of funds was not based on student achievement. Thus, in neither Massachusetts nor California does simultaneous causality appear to be a problem.
The similarity between the Massachusetts and California results suggests that these studies are externally valid, in the sense that the main findings can be generalized to performance on standardized tests at other elementary school districts in the United States.
9.5
Conclusion
The concepts of internal and external validity provide a framework for assessing what has been learned from an econometric study.
A study based on multiple regression is internally valid if the estimated coefficients are unbiased and consistent, and if standard errors are consistent. Threats to the internal validity of such a study include omitted variables, misspecification of functional form (nonlinearities), imprecise measurement of the independent variables (errors in variables), sample selection, and simultaneous causality.

¹If you are interested in learning more about the relationship between class size and test scores, see the reviews by Ehrenberg, Brewer, Gamoran, and Willms (2001a, 2001b).
Summary
1. Statistical studies are evaluated by asking whether the analysis is internally and externally valid. A study is internally valid if the statistical inferences about causal effects are valid for the population being studied. A study is externally valid if its inferences and conclusions can be generalized from the population and setting studied to other populations and settings.

2. In regression estimation of causal effects, there are two types of threats to internal validity. First, OLS estimators will be inconsistent if the regressors and error terms are correlated. Second, confidence intervals and hypothesis tests are not valid when the standard errors are incorrect.

3. Regressors and error terms may be correlated when there are omitted variables, an incorrect functional form is used, one or more of the regressors is measured with error, the sample is chosen nonrandomly from the population, or there is simultaneous causality between the regressors and dependent variable.
340
CH APTE R 9
4. Standard errors are incorrect when the errors are heteroskedastic and the computer software uses the homoskedasticity-only standard errors, or when the error term is correlated across different observations.

5. When regression models are used solely for forecasting, it is not necessary for the regression coefficients to be unbiased estimates of causal effects. It is critical, however, that the regression model be externally valid for the forecasting application at hand.
Key Terms
internal validity (313)
external validity (313)
population studied (313)
population of interest (313)
setting (315)
Exercises
9.1 Suppose that you have just read a careful statistical study of the effect of advertising on the demand for cigarettes. Using data from New York during the 1970s, it concluded that advertising on buses and subways was more effective than print advertising. Use the concept of external validity to determine if these results are likely to apply to Boston in the 1970s; Los Angeles in the 1970s; New York in 2006.

9.2 Consider the one-variable regression model Yi = β0 + β1Xi + ui, and suppose that it satisfies the assumptions in Key Concept 4.3. Suppose that Yi is measured with error, so that the data are Ỹi = Yi + wi, where wi is the measurement error, which is i.i.d. and independent of Yi and Xi. Consider the population regression Ỹi = β0 + β1Xi + vi, where vi is the regression error using the mismeasured dependent variable, Ỹi.

a. Show that vi = ui + wi.

b. Show that the regression Ỹi = β0 + β1Xi + vi satisfies the assumptions in Key Concept 4.3. (Assume that wi is independent of Yj and Xj for all values of i and j and has a finite fourth moment.)

c. Are the OLS estimators consistent?

d. Can confidence intervals be constructed in the usual way?
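A simulation can illustrate what this exercise asks you to show: classical measurement error in the dependent variable leaves OLS consistent, because it only inflates the variance of the regression error. This sketch is an illustration, not part of the exercise's formal answer; all parameter values are invented:

```python
import numpy as np

# Classical measurement error in Y: y_obs = y_true + w, with w i.i.d. and
# independent of X and Y. The OLS slope should still converge to beta1.
rng = np.random.default_rng(5)
n = 200_000
beta0, beta1 = 2.0, 1.5

x = rng.normal(size=n)
u = rng.normal(size=n)
w = rng.normal(size=n)              # i.i.d. measurement error
y_obs = beta0 + beta1 * x + u + w   # the regression error is v = u + w

slope = np.cov(x, y_obs)[0, 1] / np.var(x, ddof=1)
print(slope)  # close to beta1: OLS remains consistent
```

Contrast this with measurement error in X, which (as Section 9.2 discusses) does bias the OLS estimator toward zero.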
9.3
9.5
Q and r t..lepcnd
Q.
i. Use your answers to (b) and (c) to derive values of the regression coefficients. [Hint: Use Equations (4.7) and (4.8).]

ii. A researcher uses the slope of this regression as an estimate of the slope of the demand function (β1). Is the estimated slope too large or too small? (Hint: Use the fact that demand curves slope down and supply curves slope up.)
9.6 Suppose n = 100 i.i.d. observations for (Yi, Xi) yield the following regression results:

    Ŷ = __ + __ X,  SER = __,  R² = __.
        (12.2)
9.7 Are the following statements true or false? Explain your answer.
a. "An ordinary least squares regression of Y onto X will be internally inconsistent if X is correlated with the error term."

b. "Each of the five primary threats to internal validity implies that X is correlated with the error term."
9.8 Would the regression in Equation (9.5) be useful for predicting test scores in a school district in Massachusetts? Why or why not?

9.9 Consider the linear regression of TestScore on Income shown in Figure 8.2 and the nonlinear regression in Equation (8.18). Would either of these regressions provide a reliable estimate of the effect of income on test scores? Would either of these regressions provide a reliable method for forecasting test scores? Explain.

9.10 Read the box "The Returns to Education and the Gender Gap" in Section 8.3. Discuss the internal and external validity of the estimated effect of education on earnings.

9.11 Read the box "The Demand for Economics Journals" in Section 8.3. Discuss the internal and external validity of the estimated effect of price per citation on subscriptions.
Empirical Exercises
E9.1
Use the data set CPS04 described in Empirical Exercise 4.1 to answer the following questions.

a. Discuss the internal validity of the regressions that you used to answer Empirical Exercise 8.1(1). Include a discussion of possible omitted variable bias, misspecification of the functional form of the regression, errors-in-variables, sample selection, simultaneous causality, and inconsistency of the OLS standard errors.
E9.2
E9.3 Use the data set CollegeDistance described in Empirical Exercise 4.3 to answer the following questions.

a. Discuss the internal validity of the regressions that you used to answer Empirical Exercise 8.3(i). Include a discussion of possible omitted variable bias, misspecification of the functional form of the regression, errors-in-variables, sample selection, simultaneous causality, and inconsistency of the OLS standard errors.

b. The data set CollegeDistance excluded students from western states; data for these students are included in the data set CollegeDistanceWest. Use these data to investigate the (geographic) external validity of the conclusions that you reached in Empirical Exercise 8.3(i).
APPENDIX
9.1
The Massachusetts
Elementary School Testing Data
The Massachusetts data are districtwide averages for public elementary school districts in 1998. The test score is taken from the Massachusetts Comprehensive Assessment System (MCAS) test administered to all fourth graders in Massachusetts public schools in the spring of 1998. The test is sponsored by the Massachusetts Department of Education and is mandatory for all public schools. The data analyzed here are the overall total score, which is the sum of the scores on the English, math, and science portions of the test.

Data on the student–teacher ratio, the percentage of students receiving a subsidized lunch, and the percentage of students still learning English are averages for each elementary school district for the 1997–1998 school year and were obtained from the Massachusetts Department of Education. Data on average district income were obtained from the 1990 U.S. Census.
PART THREE
Further Topics in
Regression Analysis
CHAPTER 10  Regression with Panel Data
CHAPTER 11  Regression with a Binary Dependent Variable
CHAPTER 12  Instrumental Variables Regression
CHAPTER 13  Experiments and Quasi-Experiments
10
Regression with Panel Data
the variables. however. they cannot be included 111 the rcgresston anllthc OLS
esl imato~ of
,~
~~!
This chapter de cribes a method for controlhng for some types of omitted
variables without actually observing them. This method requtres a specific type
of data. called panel data. in which each observational unit, or eo lit). is observed
, (u).ts
at two or more Lime periods. B~ stud) mg dwnge' in the Jcr.end~:nt ,,111abk o'er
6'~MI/~I
cirn~:. il j,
(Jif!;Y/w~
p.y ~
;v~-
the ~ffcc:;t' of alcohol taxes and drunk dn' in~ Ia\\ ~ un trarfic f:H:1Iitic' We
])~~~ ad=dr;s this question using data o n traffic fatalitic~. <llcohol taxc~. drunk driving
t/Z...
".f.ws. and related vadahles for the s contiguou U.S. >1-'C' fnr each or the >even
~~'tar!> from
variables that differ (rom one state to the next, such as prevailing cultural
:auitulh.:' t \\an! dnnktng and d.n' ing bu.t do not
~ e~
e~ ~
lt also
~u(/.~ dn\ mg Jat<J 'ct Pixcd effects rcgrc..,ion. the m.:~m toul for rcgresston nnalysb of
p v.Vii &M.l.. panel data. I!> an CXtcn!>iOD of mUJttplc! r~grc... 10n that CXplllltS punc.:l data 10
contrnllur vanahks that d1ffer acn.>s., cntit te'> hut arc constant m~:r umc Fixed
ctte<:.t!> rcgrcssion is inrroduced in Sccuons 10.2 and 10.3, firs t for the ca~c of
only two time periods, theo for multiple time
period~ Jn
Section 10.4, these methods are extended to incorporate so-called time fixed effects, which control for unobserved variables that are constant across entities but change over time. Section 10.5 discusses the panel data regression assumptions and standard errors for panel data regression. In Section 10.6, we use these methods to study the effect of alcohol taxes and drunk driving laws on traffic deaths.

10.1
Panel Data

KEY CONCEPT 10.1
Notation for Panel Data

Panel data consist of observations on the same n entities at two or more time periods T. If the data set contains observations on the variables X and Y, then the data are denoted

    (Xit, Yit), i = 1, ..., n and t = 1, ..., T,    (10.1)
Recall from Section 1.3 that panel data (also called longitudinal data) refers to data for n different entities observed at T different time periods. The state traffic fatality data studied in this chapter are panel data. Those data are for n = 48 entities (states), where each entity is observed in T = 7 time periods (each of the years 1982, ..., 1988), for a total of 7 × 48 = 336 observations.

When describing cross-sectional data it was useful to use a subscript to denote the entity; for example, Yi referred to the variable Y for the ith entity. When describing panel data, we need some additional notation to keep track of both the entity and the time period. This is done by using two subscripts rather than one: The first, i, refers to the entity, and the second, t, refers to the time period of the observation. Thus Yit denotes the variable Y observed for the ith of n entities in the tth of T periods. This notation is summarized in Key Concept 10.1.
Some additional terminology associated with panel data describes whether some observations are missing. A balanced panel has all its observations; that is, the variables are observed for each entity and each time period. A panel that has some missing data for at least one time period for at least one entity is called an unbalanced panel. The traffic fatality data set has data for all 48 U.S. states for all seven years, so it is balanced. If, however, some data were missing (for example, if we did not have data on fatalities for some states in 1983), then the data set would be unbalanced. The methods presented in this chapter are described for a balanced panel; however, all these methods can be used with an unbalanced panel, although precisely how to do so in practice depends on the regression software being used.
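The distinction between balanced and unbalanced panels is easy to check mechanically once the data are indexed by the (entity, time) pairs of Key Concept 10.1. A minimal sketch in Python (the state abbreviations, years, and values below are invented for illustration and are not taken from the fatality data set):

```python
from itertools import product

states = ["AL", "AZ", "AR"]          # illustrative entity labels
years = [1982, 1983, 1984]           # illustrative time periods

# A balanced panel has one observation for every (entity, time) pair.
balanced = {(s, t): 1.5 for s, t in product(states, years)}   # dummy Y_it values
unbalanced = dict(balanced)
del unbalanced[("AR", 1983)]         # simulate a missing observation

def is_balanced(panel, entities, periods):
    """True if every entity is observed in every time period."""
    return all((i, t) in panel for i, t in product(entities, periods))

print(is_balanced(balanced, states, years))    # True
print(is_balanced(unbalanced, states, years))  # False
```

The same check underlies the software caveat in the text: routines that assume a balanced panel (for example, those that demean by entity and by time period) must be adjusted when some (i, t) cells are empty.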
Example: Traffic Deaths and Alcohol Taxes
There are approximately 40,000 highway traffic fatalities each year in the United States. Approximately one-third of fatal crashes involve a driver who was drinking, and this fraction rises during peak drinking periods. One study (Levitt and Porter, 2001) estimates that as many as 25% of drivers on the road between 1 A.M. and 3 A.M. have been drinking, and that a driver who is legally drunk is at least 13 times as likely to cause a fatal crash as a driver who has not been drinking.

In this chapter, we study how effective various government policies designed to discourage drunk driving actually are in reducing traffic deaths. The panel data set contains variables related to traffic fatalities and alcohol, including the number of traffic fatalities in each state in each year, the type of drunk driving laws in each state in each year, and the tax on beer in each state. The measure of traffic deaths we use is the fatality rate, which is the number of annual traffic deaths per 10,000 people in the population in the state. The measure of alcohol taxes we use is the real tax on a case of beer, which is the beer tax, put into 1988 dollars by adjusting for inflation. The data are described in more detail in Appendix 10.1.
Figure 10.1a is a scatterplot of the data for 1982 on two of these variables, the fatality rate and the real tax on a case of beer. A point in the scatterplot represents the fatality rate in 1982 and the real beer tax in 1982 for a given state. The OLS regression line obtained by regressing the fatality rate on the real beer tax is

FatalityRate^ = 2.01 + 0.15 BeerTax   (1982 data).   (10.2)
               (0.15)  (0.13)

The coefficient on the real beer tax is positive, but it is not statistically significant at the 10% level.
FIGURE 10.1 The Traffic Fatality Rate and the Tax on Beer
Panel (a) is a scatterplot of traffic fatality rates and the real tax on a case of beer (in 1988 dollars) for 48 states in 1982; panel (b) is the same scatterplot for 1988. In both panels, the vertical axis is the fatality rate (fatalities per 10,000) and the horizontal axis is the beer tax (dollars per case, $1988), with the OLS regression line plotted through the points.
10.2 Panel Data with Two Time Periods: "Before and After" Comparisons
Because we have data for more than one year, we can reexamine this relationship for another year. This is done in Figure 10.1b, which is the same scatterplot as before, except that it uses the data for 1988. The OLS regression line through these data is

FatalityRate^ = 1.86 + 0.44 BeerTax   (1988 data).   (10.3)
               (0.11)  (0.13)

In contrast to the regression using the 1982 data, the coefficient on the real beer tax is statistically significant at the 1% level (the t-statistic is 3.43). Curiously, the
estimated coefficients for 1982 and for 1988 are positive: Taken literally, higher real beer taxes are associated with more, not fewer, traffic fatalities.

Should we conclude that an increase in the tax on beer leads to more traffic deaths? Not necessarily, because these regressions could have substantial omitted variable bias. Many factors affect the fatality rate, including the quality of the automobiles driven in the state, whether the state highways are in good repair, whether most driving is rural or urban, the density of cars on the road, and whether it is socially acceptable to drink and drive. Any of these factors may be correlated with alcohol taxes, and if they are, they will lead to omitted variable bias. One approach to these potential sources of omitted variable bias would be to collect data on all these variables and add them to the annual cross-sectional regressions in Equations (10.2) and (10.3). Unfortunately, some of these variables, such as the cultural acceptance of drinking and driving, might be very hard or even impossible to measure.
If these factors remain constant over time in a given state, however, then another route is available. Because we have panel data, we can in effect hold these factors constant, even though we cannot measure them. To do so, we use OLS regression with fixed effects.
When data for each state are obtained for T = 2 time periods, it is possible to compare values of the dependent variable in the second period to values in the first period.
By focusing on changes in the dependent variable, this "before and after" comparison in effect holds constant the unobserved factors that differ from one state to the next but do not change over time within the state.

Let Z_i be a variable that determines the fatality rate in the ith state but does not change over time (so the t subscript is omitted). For example, Z_i might be the local cultural attitude toward drinking and driving, which changes slowly and thus could be considered to be constant between 1982 and 1988. Accordingly, the population linear regression relating Z_i and the real beer tax to the fatality rate is
FatalityRate_it = β_0 + β_1 BeerTax_it + β_2 Z_i + u_it,   (10.4)

where u_it is the error term, i = 1, ..., n indexes the state, and t = 1, ..., T indexes the year.
Because Z_i does not change over time, in the regression model in Equation (10.4) it will not produce any change in the fatality rate between 1982 and 1988. Thus, in this regression model, the influence of Z_i can be eliminated by analyzing the change in the fatality rate between the two periods. To see this mathematically, consider Equation (10.4) for each of the two years, 1982 and 1988:
FatalityRate_i,1982 = β_0 + β_1 BeerTax_i,1982 + β_2 Z_i + u_i,1982,   (10.5)

FatalityRate_i,1988 = β_0 + β_1 BeerTax_i,1988 + β_2 Z_i + u_i,1988.   (10.6)

Subtracting Equation (10.5) from Equation (10.6) eliminates the effect of Z_i:

FatalityRate_i,1988 − FatalityRate_i,1982 = β_1 (BeerTax_i,1988 − BeerTax_i,1982) + u_i,1988 − u_i,1982.   (10.7)
This specification has an intuitive interpretation. Cultural attitudes toward drinking and driving affect the level of drunk driving and thus the traffic fatality rate in a state. If, however, they did not change between 1982 and 1988, then they did not produce any change in fatalities in the state. Rather, any changes in traffic fatalities over time must have arisen from other sources. In Equation (10.7), these other sources are changes in the tax on beer and changes in the error term (which captures changes in other factors that determine traffic deaths).

Specifying the regression in changes, as in Equation (10.7), eliminates the effect of the unobserved variables Z_i that are constant over time. In other words, analyzing changes in Y and X has the effect of controlling for variables that are constant over time, thereby eliminating this source of omitted variable bias.
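The claim that differencing removes a time-constant omitted variable can be checked with a small simulation. In the sketch below (all coefficients and distributions are hypothetical, chosen only to mimic the story in the text), the unobserved Z_i raises both the beer tax and fatalities, so the levels regression for a single year gives a misleading positive slope, while the regression in changes recovers a value near the true β_1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 48                                  # states
beta1 = -1.0                            # assumed true effect of the beer tax
Z = rng.normal(0.0, 1.0, n)             # unobserved attitude, constant over time
X82 = 0.5 + 0.4 * Z + rng.uniform(0, 0.5, n)   # beer tax correlated with Z
X88 = X82 + rng.uniform(-0.2, 0.3, n)          # taxes drift a little by 1988
Y82 = 2.0 + beta1 * X82 + 1.0 * Z + rng.normal(0, 0.1, n)
Y88 = 2.0 + beta1 * X88 + 1.0 * Z + rng.normal(0, 0.1, n)

def ols_slope(x, y):
    # slope from the OLS regression of y on x with an intercept
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

print(ols_slope(X82, Y82))              # biased upward: Z loads positively on both
print(ols_slope(X88 - X82, Y88 - Y82))  # close to the true beta1
```

The biased levels estimate mirrors the "curiously positive" cross-sectional coefficients in Equations (10.2) and (10.3); differencing plays the role of Equation (10.7).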
FIGURE 10.2 Changes in Fatality Rates and Beer Taxes, 1982-1988
This is a scatterplot of the change in the fatality rate against the change in the real beer tax between 1982 and 1988 for the 48 states, along with the OLS regression line estimated using these changes.
Figure 10.2 presents a scatterplot of the change in the fatality rate between 1982 and 1988 against the change in the real beer tax between 1982 and 1988 for the 48 states in our data set. A point in Figure 10.2 represents the change in the fatality rate and the change in the real beer tax between 1982 and 1988 for a given state. The OLS regression line, estimated using these data and plotted in the figure, is

FatalityRate^_1988 − FatalityRate^_1982 = −0.072 − 1.04 (BeerTax_1988 − BeerTax_1982).   (10.8)
                                          (0.065)  (0.36)
According to this estimate, an increase in the real beer tax by $1 per case reduces the fatality rate by 1.04 deaths per 10,000 people; because the average fatality rate in these data is approximately 2 per 10,000, the estimate suggests that traffic fatalities can be cut in half merely by increasing the real tax on beer by $1 per case.

By examining changes in the fatality rate over time, the regression in Equation (10.8) controls for fixed factors such as cultural attitudes toward drinking and driving. But there are many factors that influence traffic safety, and if they change over time and are correlated with the real beer tax, then their omission will produce omitted variable bias. In Section 10.6, we undertake a more careful analysis that controls for several such factors, so for now it is best to refrain from drawing any substantive conclusions about the effect of real beer taxes on traffic fatalities.
This "before and after" analysis works when the data are observed in two different years. Our data set, however, contains observations for seven different years, and it seems foolish to discard those potentially useful additional data. But the "before and after" method does not apply directly when T > 2. To analyze all the observations in our panel data set, we use the method of fixed effects regression.

10.3 Fixed Effects Regression

Fixed effects regression is a method for controlling for omitted variables in panel data when the omitted variables vary across entities (states) but do not change over time. The fixed effects regression model has n different intercepts, one for each entity. These intercepts can be represented by a set of binary (or indicator) variables. These binary variables absorb the influences of all omitted variables that differ from one entity to the next but are constant over time.
To develop the fixed effects regression model, consider the regression

Y_it = β_0 + β_1 X_it + β_2 Z_i + u_it,   (10.9)

where Z_i is an unobserved variable that varies from one state to the next but does not change over time (for example, Z_i represents cultural attitudes toward drinking and driving). We wish to estimate β_1, the effect on Y of X, holding constant the unobserved state characteristics Z. Because Z_i varies from one state to the next but is constant over time, the population regression model in Equation (10.9) can be interpreted as having n intercepts, one for each state. Specifically, let α_i = β_0 + β_2 Z_i. Then Equation (10.9) becomes

Y_it = β_1 X_it + α_i + u_it.   (10.10)
Equa tion ( 10.10) is the fixed effects regressio n model. in which a 1 a 11 arc
treated as unkn own intercepts to he estinulted. one for each state. The interpret<ilion of ct; as a state-specific intercept in Equation ( 10.10) comes from considering
the population regression lin~ for the ilb staLe: Lhi populatioH regression line is
./, ,,
o<..c.,
.,,# ': a;+ {3 X,r 1l1e slope coeffici ent of the population regressio n lin!!. {3
1
1.
is the same
~~~~ for aU statt!s. but tbe inrercept of the population re gr sion line varies from one 1
).!1Vf
V ~
panel
YJJ = /3 1-"11
''
. re:spec
The stale-specific intercepcs in the fi xed effects regression model a lso can be
expressed using binary variables to denote the individual states. Section 8.3 considered the case in which the observations belong to one of two gro ups and the
popula tio n regression line has rhe same slope for both groups but different intercepts (see Figure S.8a). That population regression line was expressedma thematicall ' usin a single binaiy variable indicating one of the groups (case #1 in Key
Concept 8.4). If we had only two states in our data sel, that bmary van a
::.
sion mode l would apply here. Because we h ave more than t wo states. however, we
need additional binary variables to capture all the state-specific inte rcepts in Equation (10.10) .
To develop the fixed effects regression model using binary variables, let D1_i be a binary variable that equals 1 when i = 1 and equals 0 otherwise; let D2_i equal 1 when i = 2 and equal 0 otherwise; and so on. We cannot include all n binary variables plus a common intercept, for if we do the regressors will be perfectly multicollinear (this is the "dummy variable trap" of Section 6.7), so we arbitrarily omit the binary variable D1_i for the first group. Accordingly, the fixed effects regression model in Equation (10.10) can be written equivalently as

Y_it = β_0 + β_1 X_it + γ_2 D2_i + γ_3 D3_i + ... + γ_n Dn_i + u_it,   (10.11)

where γ_2, γ_3, ..., γ_n are unknown coefficients.
To derive the relationship between the coefficients in Equation (10.11) and the intercepts in Equation (10.10), compare the population regression lines for each state in the two equations. In Equation (10.11), the population regression equation for the first state is β_0 + β_1 X_it, so α_1 = β_0. For the second and remaining states, it is β_0 + γ_i + β_1 X_it, so α_i = β_0 + γ_i for i ≥ 2.

Thus there are two equivalent ways to write the fixed effects regression model, Equations (10.10) and (10.11). In Equation (10.10), it is written in terms of n state-specific intercepts. In Equation (10.11), the fixed effects regression model has a common intercept and n − 1 binary regressors. In both formulations, the slope coefficient on X is the same from one state to the next. The state-specific intercepts in Equation (10.10) and the binary regressors in Equation (10.11) have the same source: the unobserved variable Z_i that varies across states but not over time.
The fixed effects regression model is summarized in Key Concept 10.2.

Estimation and Inference

In principle, the binary variable specification of the fixed effects regression model [Equation (10.11)] can be estimated by OLS. This regression, however, has k + n regressors, so in practice it can be tedious or, in some software packages, impossible to implement if the number of entities is large. Econometric software therefore has special routines for OLS estimation of fixed effects regression models. These special routines are equivalent to using OLS on the full binary variable regression, but are faster because they employ some mathematical simplifications that arise in the algebra of fixed effects regression.
In typical applications, the software computes the OLS fixed effects estimator in two steps. In the first step, the entity-specific average is subtracted from each variable. In the second step, the regression is estimated using "entity-demeaned" variables. Specifically, consider the case of a single regressor in the version of the fixed effects model in Equation (10.10), and take the average of both sides of Equation (10.10); then Ȳ_i = β_1 X̄_i + α_i + ū_i, where Ȳ_i = (1/T) Σ_{t=1}^{T} Y_it, and X̄_i and ū_i are defined similarly. Thus Equation (10.10) implies that Y_it − Ȳ_i = β_1 (X_it − X̄_i) + (u_it − ū_i). Let Ỹ_it = Y_it − Ȳ_i, X̃_it = X_it − X̄_i, and ũ_it = u_it − ū_i; accordingly,

Ỹ_it = β_1 X̃_it + ũ_it.   (10.14)

KEY CONCEPT 10.2: The Fixed Effects Regression Model
The fixed effects regression model is

Y_it = β_1 X1_it + β_2 X2_it + ... + β_k Xk_it + α_i + u_it,   (10.12)

where i = 1, ..., n and t = 1, ..., T; X1_it is the value of the first regressor for entity i in time period t, X2_it is the value of the second regressor, and so forth; and α_1, ..., α_n are entity-specific intercepts. Equivalently, the fixed effects regression model can be written in terms of a common intercept, the X's, and n − 1 binary variables representing all but one entity:

Y_it = β_0 + β_1 X1_it + ... + β_k Xk_it + γ_2 D2_i + ... + γ_n Dn_i + u_it,   (10.13)

where D2_i = 1 if i = 2 and D2_i = 0 otherwise, and so forth.
The slope coefficient β_1 can then be estimated by the OLS regression of Ỹ_it on X̃_it. In fact, this estimator is identical to the OLS estimator of β_1 obtained by estimation of the fixed effects model in Equation (10.11) using n − 1 binary variables. Moreover, although Equation (10.11) with its binary variables looks quite different than the "before and after" regression model in Equation (10.7), in the special case that T = 2 the OLS estimator of β_1 from the binary variable specification and that from the "before and after" specification are identical if the intercept is excluded from the "before and after" specification. Thus, when T = 2, there are three ways to estimate β_1 by OLS: the "before and after" specification in Equation (10.7) (without an intercept), the binary variable specification in Equation (10.11), and the "entity-demeaned" specification in Equation (10.14). These three methods are equivalent; that is, they produce identical OLS estimates.
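The equivalence of the three estimators can be verified numerically. The sketch below simulates a balanced panel with T = 2 (the fixed effects, slope, and noise level are invented for illustration) and computes the slope three ways; all three produce the same number up to floating-point rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 48, 2                          # T = 2 so all three methods apply
alpha = rng.normal(2.0, 0.5, n)       # state fixed effects (hypothetical values)
beta1 = -0.66                         # assumed true slope for the simulation
X = rng.uniform(0.1, 2.0, (n, T))
Y = alpha[:, None] + beta1 * X + rng.normal(0, 0.1, (n, T))

# Method 1: "entity-demeaned" OLS, Equation (10.14) -- no intercept
Xt = (X - X.mean(axis=1, keepdims=True)).ravel()
Yt = (Y - Y.mean(axis=1, keepdims=True)).ravel()
b_demeaned = (Xt @ Yt) / (Xt @ Xt)

# Method 2: OLS with a full set of n state dummies and no common intercept
# (the same fit as Equation (10.11)'s intercept plus n - 1 dummies)
D = np.kron(np.eye(n), np.ones((T, 1)))        # nT x n matrix of state dummies
Z = np.column_stack([X.ravel(), D])
b_dummies = np.linalg.lstsq(Z, Y.ravel(), rcond=None)[0][0]

# Method 3: "before and after" differences without an intercept, Equation (10.7)
dX, dY = X[:, 1] - X[:, 0], Y[:, 1] - Y[:, 0]
b_diff = (dX @ dY) / (dX @ dX)

print(b_demeaned, b_dummies, b_diff)   # all three coincide
```

The first two estimators agree for any T; the match with the differenced estimator is special to T = 2, as the text notes.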
Application to traffic deaths. The OLS fixed effects regression line, estimated using all seven years of data (336 observations) and including state fixed effects, is

FatalityRate^ = −0.66 BeerTax + StateFixedEffects.   (10.15)
                (0.29)

In contrast to the positive coefficients in the year-by-year cross-sectional regressions, the estimated coefficient on the real beer tax is now negative.
Including state fixed effects in the fatality rate regression lets us avoid omitted variables bias arising from omitted factors, such as cultural attitudes toward drinking and driving, that vary across states but are constant over time within a state. Still, a skeptic might suspect that there are other factors that could lead to omitted variables bias. For example, over this period cars were getting safer and occupants were increasingly wearing seat belts; if the real tax on beer rose on average during the mid-1980s, then it could be picking up the effect of overall automobile safety improvements. If, however, safety improvements evolved over time but were the same for all states, then we can eliminate their influence by including time fixed effects.
10.4 Regression with Time Fixed Effects
Just as fixed effects for each entity can control for variables that are constant over time but differ across entities, so can time fixed effects control for variables that are constant across entities but evolve over time.

Because safety improvements in new cars are introduced nationally, they serve to reduce traffic fatalities in all states. So it is plausible to think of automobile safety as an omitted variable that changes over time but has the same value for all states. The population regression in Equation (10.9) can be modified to include the effect of automobile safety, which we will denote by S_t:

Y_it = β_0 + β_1 X_it + β_2 Z_i + β_3 S_t + u_it,   (10.16)

where S_t is unobserved and where the single "t" subscript emphasizes that safety changes over time but is constant across states. Because β_3 S_t represents variables that determine Y_it, if S_t is correlated with X_it, then omitting S_t from the regression leads to omitted variable bias.
For the moment, suppose that the variables Z_i are not present, so that the term β_2 Z_i can be dropped from Equation (10.16), although the term β_3 S_t remains. Our objective is to estimate β_1, controlling for S_t.

Although S_t is unobserved, its influence can be eliminated because it varies over time but not across states, just as it is possible to eliminate the effect of Z_i, which varies across states but not over time. In the entity fixed effects model, the presence of Z_i leads to the fixed effects regression model in Equation (10.10), in
which each state has its own intercept (or fixed effect). Similarly, because S_t varies over time but not over states, the presence of S_t leads to a regression model in which each time period has its own intercept.

The time fixed effects regression model with a single X regressor is

Y_it = β_1 X_it + λ_t + u_it.   (10.17)

This model has a different intercept, λ_t, for each time period. The intercept λ_t in Equation (10.17) can be thought of as the "effect" on Y of year t (or, more generally, time period t), so the terms λ_1, ..., λ_T are known as time fixed effects. The variation in the time fixed effects comes from omitted variables that, like S_t in Equation (10.16), vary over time but not across entities.

Just as the entity fixed effects regression model can be represented using n − 1 binary indicators, so, too, can the time fixed effects regression model be represented using T − 1 binary indicators:

Y_it = β_0 + β_1 X_it + δ_2 B2_t + ... + δ_T BT_t + u_it,   (10.18)

where δ_2, ..., δ_T are unknown coefficients and where B2_t = 1 if t = 2 and B2_t = 0 otherwise, and so forth.
In the traffic fatalities application, the time fixed effects specification allows us to avoid omitted variable bias arising from unobserved variables, such as nationally introduced safety improvements, that are the same across states but change from year to year. When there are both entity and time fixed effects, the regression model is

Y_it = β_1 X_it + α_i + λ_t + u_it,   (10.19)

where α_i is the entity fixed effect and λ_t is the time fixed effect. This model can equivalently be represented using n − 1 entity binary indicators and T − 1 time binary indicators, along with an intercept:

Y_it = β_0 + β_1 X_it + γ_2 D2_i + ... + γ_n Dn_i + δ_2 B2_t + ... + δ_T BT_t + u_it,   (10.20)

where β_0, γ_2, ..., γ_n, δ_2, ..., δ_T are unknown coefficients.
Estimation. The time fixed effects model and the entity and time fixed effects model are both variants of the multiple regression model. Thus their coefficients can be estimated by OLS by including the additional time binary variables. Alternatively, in a balanced panel the coefficients on the X's can be computed by first deviating Y and the X's from their entity and time-period means and then estimating the multiple regression equation of the deviated Y on the deviated X's. This algorithm, which is commonly implemented in regression software, eliminates the need to construct the full set of binary indicators that appear in Equation (10.20). An equivalent approach is to deviate Y, the X's, and the time indicators from their state (but not time) means and to estimate the coefficients by multiple regression of the deviated Y on the deviated X's and the deviated time indicators. Finally, if T = 2, the entity and time fixed effects regression can be estimated using the "before and after" approach of Section 10.2, including the intercept in the regression. Thus the "before and after" regression reported in Equation (10.8), in which the change in FatalityRate from 1982 to 1988 is regressed on the change in BeerTax from 1982 to 1988 including an intercept, provides the same estimate of the slope coefficient as the OLS regression of FatalityRate on BeerTax, including entity and time fixed effects, estimated using data for the two years 1982 and 1988.
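The entity-and-time demeaning algorithm described above can be sketched directly. In the simulation below (all parameter values are hypothetical), deviating Y and X from their entity and time-period means in a balanced panel and running OLS on the deviated data reproduces, to rounding error, the slope from the full dummy-variable regression of Equation (10.20):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 48, 7
alpha = rng.normal(2.0, 0.5, n)        # entity fixed effects (hypothetical)
lam = rng.normal(0.0, 0.3, T)          # time fixed effects (hypothetical)
beta1 = -0.64                          # assumed true slope for the simulation
X = rng.uniform(0.1, 2.0, (n, T))
Y = alpha[:, None] + lam[None, :] + beta1 * X + rng.normal(0, 0.1, (n, T))

def demean2(A):
    """Deviate from entity and time-period means (balanced panel only)."""
    return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

Xt, Yt = demean2(X).ravel(), demean2(Y).ravel()
b_demeaned = (Xt @ Yt) / (Xt @ Xt)

# Same slope from the dummy-variable regression of Equation (10.20):
# an intercept, n - 1 entity dummies, and T - 1 time dummies
ent = np.repeat(np.arange(n), T)
per = np.tile(np.arange(T), n)
D_ent = (ent[:, None] == np.arange(1, n)).astype(float)
D_per = (per[:, None] == np.arange(1, T)).astype(float)
Z = np.column_stack([np.ones(n * T), X.ravel(), D_ent, D_per])
b_dummies = np.linalg.lstsq(Z, Y.ravel(), rcond=None)[0][1]

print(b_demeaned, b_dummies)   # identical up to rounding
```

The add-back of the grand mean in `demean2` is what makes the two-way deviation exact; with an unbalanced panel this shortcut no longer coincides with the dummy-variable regression, which is the software caveat noted in the text.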
Application to traffic deaths. Adding time effects to the state fixed effects regression gives the OLS estimate of the regression line:

FatalityRate^ = −0.64 BeerTax + StateFixedEffects + TimeFixedEffects.   (10.21)
                (0.25)

This specification includes the beer tax, 47 state binary variables (state fixed effects), 6 year binary variables (time fixed effects), and an intercept, so this regression actually has 1 + 47 + 6 + 1 = 55 right-hand variables! The coefficients on the time and state binary variables and the intercept are not reported because they are not of primary interest.

Including the time effects has little impact on the estimated relationship between the real beer tax and the fatality rate [compare Equations (10.15) and (10.21)], and the coefficient on the real beer tax remains significant at the 5% level (t = −0.64/0.25 = −2.56).
This estimated relationship between the real beer tax and traffic fatalities is immune to omitted variable bias from variables that are constant either over time or across states. However, many important determinants of traffic deaths do not fall into this category, so this specification could still be subject to omitted variable bias. Section 10.6 therefore undertakes a more complete empirical examination of the effect of the beer tax and of laws aimed directly at eliminating drunk driving, controlling for a variety of factors. Before turning to that study, we first discuss the assumptions underlying panel data regression and the construction of standard errors for fixed effects estimators.
10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression
KEY CONCEPT 10.3: The Fixed Effects Regression Assumptions
In the fixed effects regression model Y_it = β_1 X_it + α_i + u_it, i = 1, ..., n, t = 1, ..., T:

1. u_it has conditional mean zero: E(u_it | X_i1, ..., X_iT, α_i) = 0.
2. (X_i1, ..., X_iT, u_i1, ..., u_iT), i = 1, ..., n, are i.i.d. draws from their joint distribution.
3. Large outliers are unlikely: (X_it, u_it) have nonzero finite fourth moments.
4. There is no perfect multicollinearity.
5. The errors for a given entity are uncorrelated over time, conditional on the regressors: cov(u_it, u_is | X_i1, ..., X_iT, α_i) = 0 for t ≠ s.

The first two assumptions extend the corresponding least squares assumptions for cross-sectional data to panel data; in particular, the second assumption holds if entities are selected by simple random sampling from the population.
The third and fourth assumptions for fixed effects regression are analogous to the third and fourth least squares assumptions for cross-sectional data in Key Concept 6.4.
The fifth assumption is that the errors u_it in the fixed effects regression model are uncorrelated over time, conditional on the regressors. This assumption is new and does not arise in cross-sectional data, which do not have a time dimension. One way to understand this assumption is to recall that u_it consists of time-varying factors that are determinants of Y_it but are not included as regressors. In the traffic fatalities application, one such factor is the weather: A particularly snowy winter in Minnesota, that is, a winter with more snow than average for Minnesota (the average is already captured by the "Minnesota" fixed effect in the regression), could result
in unusually treacherous driving and unusually many fatal accidents. If the amount of snow in Minnesota in one year is uncorrelated with the amount of snow in the next year, then this omitted variable (snowfall) is uncorrelated from one year to the next. Stated more generally, if u_it consists of random factors (such as snowfall) that are uncorrelated from one year to the next, conditional on the regressors (the beer tax) and the state (Minnesota) fixed effect, then u_it is uncorrelated from year to year, conditional on the regressors, and the fifth assumption holds.
The fifth assumption might not hold in some applications, however. For example, if unusually snowy winters in Minnesota tend to follow in succession, then that omitted factor would be correlated from one year to the next. Or a downturn in the local economy might produce layoffs and diminish commuting traffic, thus reducing traffic fatalities for two or more years as the workers look for new jobs and commuting patterns slowly adjust. Similarly, a major road improvement project might reduce traffic accidents not only in the year of completion but also in future years. Omitted factors like these, which persist over multiple years, will produce correlation in the error term over time.
If u_it is correlated with u_is for different values of s and t, that is, if u_it is correlated over time for a given entity, then u_it is said to be autocorrelated (correlated with itself, at different dates) or serially correlated. Thus, Assumption #5 can be stated as requiring a lack of autocorrelation of u_it, conditional on the X's and the entity fixed effects. If u_it is autocorrelated, then Assumption #5 fails. Autocorrelation is an essential and pervasive feature of time series data and is discussed in detail in Part IV.
Standard Errors for Fixed Effects Regression
If Assumption #5 in Key Concept 10.3 holds, then the errors u_it are uncorrelated over time, conditional on the regressors. In this case, if T is moderate or large, then the usual (heteroskedasticity-robust) standard errors are valid.

If the errors are autocorrelated, then the usual standard error formula is not valid. One way to see this is to draw an analogy to heteroskedasticity. In a regression with cross-sectional data, if the errors are heteroskedastic, then (as discussed in Section 5.4) the homoskedasticity-only standard errors are not valid because they were derived under the false assumption of homoskedasticity. Similarly, if the errors in panel data are autocorrelated, then the usual standard errors will not be valid because they were derived under the false assumption that they are not autocorrelated. Appendix 10.2 provides a mathematical explanation of why the usual standard errors are not valid if the regression errors are autocorrelated.
Standard errors that are valid if u_it is potentially heteroskedastic and potentially correlated over time within an entity are referred to as heteroskedasticity- and autocorrelation-consistent (HAC) standard errors. The standard errors used in this chapter are one type of HAC standard errors, clustered standard errors. The term "clustered" arises because these standard errors allow the errors to be correlated within a cluster, or grouping, but assume that errors in different clusters are uncorrelated. In the panel data context, the cluster consists of the observations for a given entity, so clustered standard errors allow for correlation of the errors over time within each entity.
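The clustered variance formula can be sketched directly: instead of summing squared per-observation scores (the heteroskedasticity-robust formula), one first sums the scores X̃ û within each entity and then squares, so that within-state correlation of the errors is preserved. A minimal single-regressor sketch with simulated AR(1) errors (all numbers are invented, and the finite-sample degrees-of-freedom adjustments applied by practical software are omitted):

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 48, 7
X = rng.uniform(0.1, 2.0, (n, T))
# AR(1) errors within each state -> autocorrelation, violating Assumption #5
u = np.zeros((n, T))
u[:, 0] = rng.normal(0, 0.2, n)
for t in range(1, T):
    u[:, t] = 0.8 * u[:, t - 1] + rng.normal(0, 0.2, n)
Y = 1.0 + 0.5 * X + u                       # true slope 0.5 (hypothetical)

# Fixed effects estimate via entity demeaning
Xt = X - X.mean(axis=1, keepdims=True)
Yt = Y - Y.mean(axis=1, keepdims=True)
b = (Xt.ravel() @ Yt.ravel()) / (Xt.ravel() @ Xt.ravel())
res = Yt - b * Xt                           # demeaned residuals, kept as n x T

q = (Xt.ravel() ** 2).sum()
# Heteroskedasticity-robust variance: sum of squared per-observation scores
v_hr = ((Xt * res).ravel() ** 2).sum() / q**2
# Clustered variance: square the per-state sums of scores, then add up
v_cl = ((Xt * res).sum(axis=1) ** 2).sum() / q**2

print(b, np.sqrt(v_hr), np.sqrt(v_cl))      # the two SEs differ when errors are serially correlated
```

Only the grouping of terms differs between the two formulas; the clustered version keeps the cross-period products within a state, which is exactly where the autocorrelation lives.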
10.6 Drunk Driving Laws and Traffic Deaths

Alcohol taxes are only one way to discourage drinking and driving. States differ in their punishments for drunk driving, and a state that cracks down on drunk driving could do so across the board by toughening laws as well as raising taxes. If so, omitting these laws could produce omitted variable bias in the OLS estimator of the effect of real beer taxes on traffic fatalities, even in regressions with state and time fixed effects. In addition, because vehicle use depends in part on whether drivers have jobs and because tax changes can reflect economic conditions (a state budget deficit can lead to tax hikes), omitting state economic conditions also could result in omitted variable bias.

In this section, we extend the preceding analysis to study the effect on traffic fatalities of drunk driving laws (including beer taxes), holding economic conditions constant. This is done by estimating panel data regressions that include regressors representing other drunk driving laws and state economic conditions.

The results are summarized in Table 10.1. The format of the table is the same as that of the tables of regression results in Chapters 7, 8, and 9: Each column reports a different regression, and each row reports a coefficient estimate and standard error, an F-statistic and p-value, or other information about the regression.
Column (1) in Table 10.1 presents results for the OLS regression of the fatality rate on the real beer tax without state and time fixed effects. As in the cross-sectional regressions of Section 10.1, the coefficient on the real beer tax is positive. Column (2) [reported previously as Equation (10.15)], which includes state fixed effects, suggests that the positive coefficient in regression (1) is the result of omitted variable bias (the coefficient on the real beer tax is −0.66). The regression R̄² jumps from 0.091 to 0.889 when fixed effects are included; evidently, the state fixed effects account for a large amount of the variation in the data.

Little changes when time effects are added, as reported in column (3) [reported previously as Equation (10.21)].
TABLE 10.1 Regression Analysis of the Effect of Drunk Driving Laws on Traffic Deaths

Dependent variable: traffic fatality rate (deaths per 10,000). Each column reports a different OLS regression. The rows report the coefficients (standard errors in parentheses) on the real beer tax, the minimum legal drinking age variables (drinking age 18, 19, and 20), mandatory jail or community service, the unemployment rate, and real income per capita, followed by indicators for whether state effects, time effects, and clustered standard errors are used; F-statistics with p-values (including the test that the time effects are zero); and the regression R̄².

These regressions were estimated using panel data for 48 U.S. states from 1982 to 1988 (336 observations); the data are described in Appendix 10.1. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. Individual coefficients are statistically significant at the 5% or 1% significance level as indicated.
The next three regressions in Table 10.1 include additional potential determinants of fatality rates, along with time and state effects. The base specification, reported in column (4), includes two sets of legal variables related to drunk driving plus variables that control for the amount of driving and overall state economic conditions. There are four interesting findings from these regressions.

1. Including the additional variables reduces the estimated coefficient on the real beer tax. One way to evaluate the magnitude of the coefficient is to imagine a state with an average real beer tax doubling its tax; because the average real beer tax in these data is approximately $0.50/case, this entails increasing the tax by $0.50/case. According to the estimate in column (4), the effect of a $0.50 increase (in 1988 dollars) in the beer tax is a decrease in the expected fatality rate by 0.45 × 0.50 = 0.23 death per 10,000. This estimated effect is large: Because the average fatality rate is 2 per 10,000, a reduction of 0.23 corresponds to decreasing the fatality rate to 1.77 per 10,000. This said, the estimate is quite imprecise: Because the standard error on this coefficient is 0.22, the 95% confidence interval for this effect is −0.45 × 0.50 ± 1.96 × 0.22 × 0.50 = (−0.44, −0.01). This wide 95% confidence interval includes values of the true effect that are very nearly zero.
2. The minimum legal drinking age is estimated to have very little effect on traffic fatalities. The joint hypothesis that the coefficients on the minimum legal drinking age variables are zero cannot be rejected at the 10% significance level: The F-statistic testing the joint hypothesis that the three coefficients are zero is 0.48, with a p-value of 0.696. Moreover, the estimates are small in magnitude; for example, states with a minimum legal drinking age of 18 are estimated to have a fatality rate higher by 0.028 death per 10,000 than states with a minimum legal drinking age of 21, holding the other factors in the regression constant.
3.
4. The economic variables have considerable explanatory power for traffic fatalities. High unemployment rates are associated with fewer fatalities: An increase in the unemployment rate by one percentage point is estimated to reduce traffic fatalities by 0.063 death per 10,000. Similarly, high values of real per capita income are associated with high fatalities: The coefficient is 1.81, so a 1% increase in real per capita income is associated with an increase in traffic fatalities of 0.0181 death per 10,000 (see Case I in Key Concept 8.2 for the interpretation of this coefficient). According to these estimates, good economic conditions are associated with higher fatalities, perhaps because of increased traffic density when the unemployment rate is low or greater alcohol consumption when income is high. The two economic variables are jointly significant at the 0.1% significance level (the F-statistic is 38.29).
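The arithmetic behind point 1 is easy to check numerically. A minimal sketch (the coefficient −0.45 and standard error 0.22 are the column (4) estimates quoted above; everything else follows from the usual ±1.96 confidence-interval formula):

```python
# Effect on the fatality rate of a $0.50/case increase in the real beer tax,
# using the column (4) estimates from Table 10.1.
beta_hat = -0.45   # coefficient on the real beer tax (deaths per 10,000 per $1/case)
se = 0.22          # standard error of that coefficient
dx = 0.50          # tax increase of $0.50/case (in 1988 dollars)

effect = beta_hat * dx                 # predicted change in the fatality rate
half_width = 1.96 * se * dx            # 95% CI half-width for the effect
ci = (effect - half_width, effect + half_width)

print(round(effect, 2))                # -0.23 death per 10,000
print(tuple(round(v, 2) for v in ci))  # (-0.44, -0.01)
```

The printed interval reproduces the one reported in the text, making concrete why an effect that looks large can still be statistically imprecise.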
Columns (5) and (6) of Table 10.1 report regressions that check the sensitivity of these conclusions to changes in the base specification. The regression in column (5) drops the variables that control for economic conditions. The result is a small increase in the estimated effect of the real beer tax, but no appreciable change in the other coefficients. The sensitivity of the estimated beer tax coefficient to including the economic variables, combined with the statistical significance of the coefficients on those variables, indicates that the economic variables should remain in the base specification. The regression in column (6) examines the sensitivity of the results to using a different functional form for the drinking age (replacing the three indicator variables with the drinking age itself) and combining the two binary punishment variables. The results of regression (4) are not sensitive to these changes.

The final column in Table 10.1 is the regression of column (4), but with clustered standard errors that allow for autocorrelation of the error term within an entity, as discussed in Section 10.5 and Appendix 10.2. The estimated coefficients in columns (4) and (7) are the same; the only difference is the standard errors. The clustered standard errors in column (7) are larger than the standard errors in column (4). Consequently, the conclusion from regression (4) that the coefficients on
the drunk driving laws and legal drinking ages are not statistically significant also obtains using the HAC standard errors in column (7). The F-statistics in column (7) are smaller than those in column (4), but there are no qualitative differences in the two sets of F-statistics and p-values. One substantive difference between columns (4) and (7) arises because the HAC standard error on the beer tax coefficient is larger than the standard error in column (4). Consequently, the 95% confidence interval for the effect on fatalities of a change in the beer tax using the HAC standard error, (−1.09, 0.19), is wider than the interval from column (4), (−0.89, −0.01), and the interval computed using the HAC standard error includes zero.
The strength of this analysis is that including state and time fixed effects mitigates the threat of omitted variable bias arising from unobserved variables that are constant either over time or across states. As always, however, it is important to think about possible threats to validity. One potential source of omitted variable bias is that the measure of alcohol taxes used here, the real tax on a case of beer, could move with other alcohol taxes; this suggests interpreting the results as pertaining more broadly than just to beer. A subtler possibility is that hikes in the real beer tax could be associated with public education campaigns, perhaps in response to political pressure; if so, changes in the real beer tax could pick up the effect of broader anti-drunk-driving campaigns.

According to these estimates, neither mandatory punishments nor increases in the minimum legal drinking age have an important effect on traffic fatalities. In contrast, there is some evidence that increasing alcohol taxes, as measured by the real tax on beer, does reduce traffic deaths, although the magnitude of this effect is uncertain.
10.7 Conclusion

This chapter showed how multiple observations over time on the same entity can be used to control for unobserved omitted variables that differ across entities but are constant over time. The key insight is that if the unobserved variable does not change over time, then any changes in the dependent variable must be due to influences other than these fixed characteristics. If cultural attitudes toward drinking and driving do not change appreciably over seven years within a state, then explanations for changes in the traffic fatality rate over those seven years must lie elsewhere.
To exploit this insight, you need data in which the same entity is observed at two or more time periods; that is, you need panel data. With panel data, the multiple regression model of Part II can be extended to include a full set of entity binary variables; this is the fixed effects regression model, which can be estimated by OLS. A twist on the fixed effects regression model is to include time fixed effects, which control for unobserved variables that change over time but are constant across entities. Both entity and time fixed effects can be included in the regression to control for variables that vary across entities but are constant over time and for variables that vary over time but are constant across entities.

Despite these virtues, entity and time fixed effects regression cannot control for omitted variables that vary both across entities and over time. And obviously, panel data methods require panel data, which often are not available. Thus there remains a need for a method that can eliminate the influence of unobserved omitted variables when panel data methods cannot do the job. A powerful and general method for doing so is instrumental variables regression, the topic of Chapter 12.
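The fixed effects mechanics summarized above can be illustrated with a short simulation. This is a sketch with made-up numbers, not the chapter's fatality data; it shows that OLS on entity-demeaned variables and OLS with a full set of entity binary variables give the same estimate of the slope:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, beta1 = 6, 8, -0.45            # made-up panel dimensions and slope

alpha = rng.normal(size=n)           # unobserved entity fixed effects
X = rng.normal(size=(n, T))
Y = beta1 * X + alpha[:, None] + 0.1 * rng.normal(size=(n, T))

# Entity demeaning: subtract each entity's time average from Y and X.
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)

# OLS of demeaned Y on demeaned X.
beta_fe = (Xd * Yd).sum() / (Xd ** 2).sum()

# Equivalent: OLS of Y on X plus one binary variable per entity.
D = np.kron(np.eye(n), np.ones((T, 1)))          # entity binary variables
Z = np.column_stack([X.reshape(-1, 1), D])
coef = np.linalg.lstsq(Z, Y.reshape(-1), rcond=None)[0]

print(np.isclose(beta_fe, coef[0]))  # True: the two estimators agree
```

The agreement of the two computations is the algebraic content of the "binary variable" formulation of the fixed effects regression model.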
Summary

[The chapter summary is not legible in this scan.]
Key Terms

panel data (350)
balanced panel (350)
unbalanced panel (351)
fixed effects regression model (357)
entity fixed effects (357)
time fixed effects regression model (362)
time fixed effects (362)
entity and time fixed effects regression model (362)
autocorrelated (366)
serially correlated (366)
heteroskedasticity- and autocorrelation-consistent (HAC) standard errors (366)
clustered standard errors (367)
Review the Concepts

10.1 Why is it necessary to use two subscripts, i and t, to describe panel data? What does i refer to? What does t refer to?

10.2 A researcher is using a panel data set on n = 1000 workers over T = 10 years (from 1996 to 2005) that contains the workers' earnings, gender, education, and age. The researcher is interested in the effect of education on earnings. Give some examples of unobserved person-specific variables that are correlated with both education and earnings. Can you think of examples of time-specific variables that might be correlated with education and earnings? How would you control for these person-specific and time-specific effects in a panel data regression?

10.3 Can the regression that you suggested in response to Question 10.2 be used to estimate the effect of gender on an individual's earnings? Can that regression be used to estimate the effect of the national unemployment rate on an individual's earnings? Explain.
Exercises

10.1 This exercise refers to the drunk driving panel data regression summarized in Table 10.1.

a. New Jersey has a population of 8.1 million people. Suppose that New Jersey increased the tax on a case of beer by $1 (in 1988 dollars). Use the results in column (4) to predict the number of lives that would be saved over the next year. Construct a 95% confidence interval for your
10.2 Consider the binary variable version of the fixed effects model in Equation (10.11), except with an additional regressor, D1ᵢ; that is, let

Y_it = β₀ + β₁X_it + γ₁D1ᵢ + γ₂D2ᵢ + ⋯ + γ_nDnᵢ + u_it.

a. Suppose that n = 3. Show that the binary regressors and the "constant" regressor are perfectly multicollinear; that is, express one of the variables D1ᵢ, D2ᵢ, D3ᵢ, and X₀,it as a perfect linear function of the others, where X₀,it = 1 for all i, t.

b. Show the result in (a) for general n.

c. What will happen if you try to estimate the coefficients of the regression by OLS?
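The multicollinearity in this exercise is the dummy variable trap from Chapter 6, and it can be seen numerically: stacking a constant column next to one binary variable per entity produces a rank-deficient regressor matrix. A small illustration with made-up dimensions (n = 3 entities, T = 2 periods):

```python
import numpy as np

n, T = 3, 2
const = np.ones((n * T, 1))                     # X0,it = 1 for all i, t
D = np.kron(np.eye(n), np.ones((T, 1)))         # D1_i, D2_i, D3_i stacked over t

M = np.column_stack([const, D])                 # 4 columns, but...
print(np.linalg.matrix_rank(M))                 # 3: perfect multicollinearity

# The constant is an exact linear combination of the entity dummies:
print(np.allclose(const[:, 0], D.sum(axis=1)))  # True: X0 = D1 + D2 + D3
```

This is why software either drops the intercept or drops one binary variable when estimating the fixed effects model by OLS.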
10.3 Section 9.2 gave a list of five potential threats to the internal validity of a regression study. Apply this list to the empirical analysis in Section 10.6 and
10.4

10.5
10.6 Suppose that the fixed effects regression assumptions from Section 10.5 are satisfied. Show that cov(v_it, v_is) = 0 for t ≠ s in Equation (10.28).
10.7 A researcher believes that traffic fatalities increase when roads are icy and thus that states with more snow will have more fatalities than other states. Comment on the following methods designed to estimate the effect of snow on fatalities:

a. The researcher collects data on the average snowfall for each state and adds this regressor (AverageSnowᵢ) to the regressions given in Table 10.1.

b. The researcher collects data on the snowfall in each state for each year in the sample (Snow_it) and adds this regressor to the regressions.
10.8 Consider observations (Y_it, X_it) from the linear panel data model

Y_it = X_it β₁ + α_i + λ_i t + u_it,  t = 1, …, T, i = 1, …, n,
10.9
10.10 a. In the fixed effects regression model, are the fixed entity effects, α_i, consistently estimated as n → ∞ with T fixed? (Hint: Analyze the model with no X's: Y_it = α_i + u_it.)
10.11 In a study of the effect on earnings of education using panel data on annual earnings for a large number of workers, a researcher regresses earnings in a given year on age, education, union status, and the worker's earnings in the previous year, using fixed effects regression. Will this regression give reliable estimates of the effects of the regressors (age, education, union status, and previous year's earnings) on earnings? Explain. (Hint: Check the fixed effects regression assumptions in Section 10.5.)
Empirical Exercises

E10.1 Some U.S. states have enacted laws that allow citizens to carry concealed weapons. These laws are known as "shall-issue" laws because they instruct local authorities to issue a concealed weapons permit to all applicants who are citizens, are mentally competent, and have not been convicted of a felony (some states have some additional restrictions). Proponents argue that, if more people carry concealed weapons, crime will decline because criminals are deterred from attacking other people. Opponents argue that crime will increase because of accidental or spontaneous use of the weapon. In this exercise, you will analyze the effect of concealed weapons laws on violent crimes. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Guns that contains a balanced panel of data from the 50 U.S. states, plus the District of Columbia, for the years 1977–1999.¹ A detailed description is given in Guns_Description, available on the Web site.

a. Estimate (1) a regression of ln(vio) against shall and (2) a regression of ln(vio) against shall, incarc_rate, density, avginc, pop, pb1064, pw1064, and pm1029.

i. Interpret the coefficient on shall in regression (2). Is this estimate large or small in a real-world sense?

iii. Suggest a variable that varies across states but plausibly varies little, or not at all, over time, and that could cause omitted variable bias in regression (2).

b. Do the results change when you add fixed state effects? If so, which set of regression results is more credible, and why?

¹These data were provided by Professor John Donohue of Stanford University and were used in his paper with Ian Ayres, "Shooting Down the 'More Guns Less Crime' Hypothesis," Stanford Law Review, 2003.
c. Do the results change when you add fixed time effects? If so, which set of regression results is more credible, and why?

d. Repeat the analysis using ln(rob) and ln(mur) in place of ln(vio).

e. In your view, what are the most important remaining threats to the internal validity of this regression analysis?

f. Based on your analysis, what conclusions would you draw about the effects of concealed-weapons laws on these crime rates?

E10.2
Traffic crashes are the leading cause of death for Americans between the ages of 5 and 32. Through various spending policies, the federal government has encouraged states to institute mandatory seat belt laws to reduce the number of fatalities and serious injuries. In this exercise you will investigate how effective these laws are in increasing seat belt use and reducing fatalities. On the textbook Web site www.aw-bc.com/stock_watson you will find a data file Seatbelts that contains a panel of data from the 50 U.S. states, plus the District of Columbia, for the years 1983–1997.² A detailed description is given in Seatbelts_Description, available on the Web site.
a. Estimate the effect of seat belt use on fatalities by regressing FatalityRate on sb_useage, speed65, speed70, ba08, drinkage21, ln(income), and age. Does the estimated regression suggest that increased seat belt use reduces fatalities?
b. Do the results change when you add state fixed effects? Provide an intuitive explanation for why the results changed.

c. Do the results change when you add time fixed effects plus state fixed effects?

d. Which regression specification is most reliable? Explain why.

e. Using the results in (c), discuss the size of the coefficient on sb_useage. Is it large? Small? How many lives would be saved if seat belt use increased from 52% to 90%?
f. There are two ways that mandatory seat belt laws are enforced: "Primary" enforcement means that a police officer can stop a car and ticket the driver if the officer observes an occupant not wearing a seat belt; "secondary" enforcement means that a police officer can write a ticket if an occupant is not wearing a seat belt but must have another

²These data were provided by Professor Liran Einav of Stanford University and were used in his paper with Alma Cohen, "The Effects of Mandatory Seat Belt Laws on Driving Behavior and Traffic Fatalities," The Review of Economics and Statistics, 2003.
reason to stop the car. In the data set, primary is a binary variable for primary enforcement and secondary is a binary variable for secondary enforcement. Run a regression of sb_useage on primary, secondary, speed65, speed70, ba08, drinkage21, ln(income), and age, including fixed state and time effects in the regression. Does primary enforcement lead to more seat belt use? What about secondary enforcement?
APPENDIX 10.1 The State Traffic Fatality Data Set

The data are for the contiguous 48 U.S. states, annually for 1982 through 1988. The traffic fatality rate is the number of traffic deaths in a given state in a given year, per 10,000 people living in that state in that year. Traffic fatality data were obtained from the U.S. Department of Transportation Fatal Accident Reporting System. The beer tax is the tax on a case of beer, which is a measure of state alcohol taxes more generally. The drinking age variables in Table 10.1 are binary variables indicating whether the legal drinking age is 18, 19, or 20. The two binary punishment variables in Table 10.1 describe the state's minimum sentencing requirements for an initial drunk driving conviction: "Mandatory jail?" equals 1 if the state requires jail time and equals 0 otherwise, and "Mandatory community service?" equals 1 if the state requires community service and equals 0 otherwise. Data on the total vehicle miles traveled annually by state were obtained from the Department of Transportation. Personal income was obtained from the U.S. Bureau of Economic Analysis, and the unemployment rate was obtained from the U.S. Bureau of Labor Statistics.
APPENDIX 10.2 Standard Errors for Fixed Effects Regression with Serially Correlated Errors
The fixed effects estimator of β₁ is the OLS estimator obtained using the entity-demeaned regression of Equation (10.14), in which Ỹ_it is regressed on X̃_it, where Ỹ_it = Y_it − Ȳ_i and X̃_it = X_it − X̄_i, with Ȳ_i = (1/T) Σ_{t=1}^T Y_it and X̄_i = (1/T) Σ_{t=1}^T X_it. The formula for the OLS estimator is obtained by replacing X_i − X̄ by X̃_it and Y_i − Ȳ by Ỹ_it in Equation (4.7) and by replacing the single summation in Equation (4.7) by two summations, one over entities (i = 1, …, n) and one over time periods (t = 1, …, T):

β̂₁ = [ Σ_{i=1}^n Σ_{t=1}^T X̃_it Ỹ_it ] / [ Σ_{i=1}^n Σ_{t=1}^T X̃²_it ].   (10.22)

The derivation of the sampling distribution of β̂₁ parallels the derivation in Appendix 4.3 of the sampling distribution of the OLS estimator with cross-sectional data. First, substitute Ỹ_it = β₁X̃_it + ũ_it [Equation (10.14)] into the numerator of Equation (10.22), then rearrange the result to obtain

β̂₁ − β₁ = [ Σ_{i=1}^n Σ_{t=1}^T X̃_it ũ_it ] / [ Σ_{i=1}^n Σ_{t=1}^T X̃²_it ].   (10.23)
Next, divide the denominator of the right side of Equation (10.23) by nT, divide the numerator by √(nT), multiply the left side by √(nT), and note that η_i = (1/√T) Σ_{t=1}^T X̃_it ũ_it. Then

√(nT)(β̂₁ − β₁) = [ (1/√n) Σ_{i=1}^n η_i ] / Q̂_X̃,   (10.24)

where ṽ_it = X̃_it ũ_it (so that η_i = (1/√T) Σ_{t=1}^T ṽ_it) and Q̂_X̃ = (1/nT) Σ_{i=1}^n Σ_{t=1}^T X̃²_it. The scaling factor in Equation (10.24), nT, is the total number of observations.
Also, by the central limit theorem, (1/√n) Σ_{i=1}^n η_i is distributed N(0, σ²_η) for n large, where σ²_η is the variance of η_i. It follows from Equation (10.24) that, under Assumptions 1 through 4, β̂₁ is approximately distributed N(β₁, σ²_β̂₁),   (10.25)

where

σ²_β̂₁ = var(β̂₁) = (1/nT) σ²_η / Q²_X̃.   (10.26)

Under Assumption 5 of Key Concept 10.3, the expression for σ²_η in Equation (10.26) simplifies. Recall that, for two random variables U and V,

var(U + V) = var(U) + var(V) + 2cov(U, V).   (10.27)

The variance of the sum defining η_i therefore can be written as the sum of the variances, plus covariances:

var[ (1/√T) Σ_{t=1}^T ṽ_it ] = (1/T)[ Σ_{t=1}^T var(ṽ_it) + 2 Σ_{t=1}^{T−1} Σ_{s=t+1}^T cov(ṽ_it, ṽ_is) ].   (10.28)
Under Assumption 5, the errors are uncorrelated across time periods, given the X's, so all the covariances in Equation (10.28) are zero (Exercise 10.6). But if u_it is autocorrelated, then the covariances in Equation (10.28) are, in general, nonzero. The usual heteroskedasticity-robust variance estimator sets these covariances to zero, so if u_it is autocorrelated, the usual heteroskedasticity-robust variance estimator does not consistently estimate σ²_β̂₁.

In contrast, the so-called clustered variance estimator is valid even if u_it is conditionally autocorrelated:

σ̂²_η,clustered = (1/n) Σ_{i=1}^n [ (1/√T) Σ_{t=1}^T v̂_it ]²,   (10.29)

where v̂_it = X̃_it û_it, and û_it is the residual from the OLS fixed effects regression. (Some software implements the clustered variance formula with a degrees-of-freedom adjustment.) The clustered panel data standard errors are given by

SE(β̂₁) = √[ (1/nT) σ̂²_η,clustered / Q̂²_X̃ ].   (10.30)

The clustered variance estimator σ̂²_η,clustered is a consistent estimator of σ²_η as n → ∞, even if u_it is autocorrelated (Exercise 18.15); that is, the variance estimator is heteroskedasticity- and autocorrelation-consistent. This variance estimator is called the clustered variance estimator because the errors are grouped into clusters of observations, where the errors can be correlated within the cluster (here, for different time periods but the same entity) but are assumed to be uncorrelated across clusters.

In some cases, u_it might be correlated across entities. For example, in a study of earnings, suppose that the sampling scheme selects families by simple random sampling, then tracks all siblings within a family. Because the omitted factors that enter the error term could have common elements for siblings, it is not reasonable to assume that the errors are independent for siblings (even though they are independent for individuals from different families). In the siblings example, families are natural clusters, or groupings, of observations, where u_it is correlated within the cluster but not across clusters. The derivation of clustered variances leading to Equation (10.29) can be modified to allow for clusters across entities.
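Equations (10.29) and (10.30) translate directly into code. The following sketch uses simulated panel data with AR(1) (serially correlated) errors; all numbers are made up, and beta_fe is the entity-demeaned OLS estimator of Equation (10.22):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, beta1 = 200, 7, 0.5

# Simulated panel with entity effects and AR(1) errors (made-up parameters).
alpha = rng.normal(size=n)
X = rng.normal(size=(n, T))
u = np.zeros((n, T))
for t in range(T):
    u[:, t] = (0.7 * u[:, t - 1] if t > 0 else 0) + rng.normal(size=n)
Y = beta1 * X + alpha[:, None] + u

# Entity-demeaned fixed effects estimator, Equation (10.22).
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (Xd * Yd).sum() / (Xd ** 2).sum()

# Clustered variance, Equations (10.29) and (10.30).
V = Xd * (Yd - beta_fe * Xd)                 # v-hat_it = X-tilde_it * u-hat_it
eta = V.sum(axis=1) / np.sqrt(T)             # per-entity scaled sums
sigma2_eta = (eta ** 2).mean()               # Equation (10.29)
Q = (Xd ** 2).mean()                         # Q-hat for the demeaned regressor
se_clustered = np.sqrt(sigma2_eta / Q ** 2 / (n * T))   # Equation (10.30)

print(abs(beta_fe - beta1) < 4 * se_clustered)   # estimate within 4 SEs of truth
```

Because eta sums the residual products within each entity before squaring, the within-entity covariances in Equation (10.28) are retained rather than set to zero, which is exactly what makes the estimator robust to serial correlation.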
CHAPTER 11

Regression with a Binary Dependent Variable
Two people, identical but for their race, walk into a bank and apply for a mortgage, a large loan so that each can buy an identical house. Does the bank treat them the same way? By law they must receive identical treatment, but whether they actually do is a question of great concern to bank regulators.

Loans are made and denied for many legitimate reasons. If, for example, the proposed loan payments take up most or all of the applicant's income, then a loan officer might justifiably deny the loan. Also, even loan officers are human and they can make honest mistakes, so the denial of a single minority applicant does not prove anything about discrimination. Many studies of discrimination thus look for statistical evidence of discrimination in the mortgage market. A start is to compare the fraction of minority and white applicants who were denied a mortgage. In the data examined in this chapter, gathered from mortgage applications in 1990 in the Boston, Massachusetts, area, the denial rate for black applicants was substantially higher than the denial rate for white applicants. But this comparison does not answer the question that opened this chapter, because the black and white applicants were not necessarily "identical but for their race." Instead, we need a method for comparing rates of denial, holding
other applicant characteristics constant. This can be done using multiple regression methods with binary dependent variables. Section 11.1 goes over this "linear probability model." But the binary nature of the dependent variable also motivates nonlinear regression methods, called "probit" and "logit" regression, which are discussed in Section 11.2. Section 11.3, which is optional, discusses the method used to estimate the coefficients of these nonlinear models, maximum likelihood estimation. In Section 11.4, we apply these methods to the Boston mortgage application data set to see whether there is evidence of racial bias in mortgage lending.

The binary dependent variable considered in this chapter is an example of a dependent variable with a limited range; other models for limited dependent variables are discussed in Appendix 11.3.
11.1 Binary Dependent Variables
What determines whether a student goes to college? What determines whether a teenager takes up smoking? What determines whether a country receives foreign aid? What determines whether a job applicant is successful? In all these examples, the outcome of interest is binary: The student does or does not go to college, the teenager does or does not take up smoking, a country does or does not receive foreign aid, the applicant does or does not get a job.
This section discusses what distinguishes regression with a binary dependent variable from regression with a continuous dependent variable, then turns to the simplest model to use with binary dependent variables, the linear probability model.
The application examined in this chapter is whether race is a factor in denying a mortgage application; the binary dependent variable is whether a mortgage application is denied. The data are a subset of a larger data set compiled by researchers at the Federal Reserve Bank of Boston under the Home Mortgage Disclosure Act (HMDA) and relate to mortgage applications filed in the Boston, Massachusetts, area in 1990. The loan officer must forecast whether the
applicant will make his or her loan payments. One important piece of information is the size of the required loan payments relative to the applicant's income. As anyone who has borrowed money knows, it is much easier to make payments that are 10% of your income than 50%! We therefore begin by looking at the relationship between two variables: the binary dependent variable deny, which equals 1 if the mortgage application was denied and equals 0 if it was accepted, and the continuous variable P/I ratio, which is the ratio of the applicant's anticipated total monthly loan payments to his or her monthly income.
Figure 11.1 presents a scatterplot of deny versus P/I ratio for 127 of the 2380 observations in the data set. (The scatterplot is easier to read using this subset of the data.) This scatterplot looks different than the scatterplots of Part II because the variable deny is binary. Still, it seems to show a relationship between deny and P/I ratio: Few applicants with a payment-to-income ratio less than 0.3 have their application denied, but most applicants with a payment-to-income ratio exceeding 0.4 are denied.

This positive relationship between P/I ratio and deny (the higher the P/I ratio, the greater the fraction of denials) is summarized in Figure 11.1 by the OLS regression line estimated using these 127 observations. As usual, this line plots the predicted value of deny as a function of the regressor, the payment-to-income ratio.
[Figure 11.1: Scatterplot of deny versus P/I ratio with the OLS regression line; the horizontal axis (P/I ratio) runs from 0.2 to 0.8. Only the axis labels are legible in this scan.]
For example, when P/I ratio = 0.3, the predicted value of deny is 0.20. But what, precisely, does it mean for the predicted value of the binary variable deny to be 0.20?

The key to answering this question, and more generally to understanding regression with a binary dependent variable, is to interpret the regression as modeling the probability that the dependent variable equals 1. Thus, the predicted value of 0.20 is interpreted as meaning that, when P/I ratio is 0.3, the probability of denial is estimated to be 20%. Said differently, if there were many applications with P/I ratio = 0.3, then 20% of them would be denied.
This interpretation follows from two facts. First, from Part II, the population regression function is the expected value of Y given the regressors, E(Y|X₁, …, X_k). Second, from Section 2.2, if Y is a 0–1 binary variable, then its expected value (or mean) is the probability that Y = 1; that is, E(Y) = Pr(Y = 1). In the regression context, the expected value is conditional on the value of the regressors, so the probability is conditional on X. Thus, for a binary variable, E(Y|X₁, …, X_k) = Pr(Y = 1|X₁, …, X_k). In short, for a binary variable, the predicted value from the population regression is the probability that Y = 1, given X.

The linear multiple regression model applied to a binary dependent variable is called the linear probability model: "linear" because it is a straight line, and "probability model" because it models the probability that the dependent variable equals 1 (in our example, the probability of loan denial).
Application to the Boston HMDA data. The OLS regression of the binary dependent variable, deny, against the payment-to-income ratio, P/I ratio, estimated using all 2380 observations in our data set is

deny^ = −0.080 + 0.604 P/I ratio.   (11.1)
        (0.032)   (0.098)

The estimated coefficient on
P/I ratio is positive, and the population coefficient is statistically significantly different from zero at the 1% level (the t-statistic is 6.13). Applicants with higher debt payments as a fraction of income are more likely to have their application denied. This coefficient can be used to compute the predicted change in the probability of denial, given a change in the regressor. For example, according to Equation (11.1), if P/I ratio increases by 0.1, then the probability of denial increases by 0.604 × 0.1 ≅ 0.060, that is, by 6.0 percentage points.
The linear probability model is the multiple regression model,

Y_i = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + β_kX_kᵢ + u_i,   (11.2)

applied to the case in which Y_i is binary, so that

Pr(Y = 1|X₁, X₂, …, X_k) = β₀ + β₁X₁ + β₂X₂ + ⋯ + β_kX_k.

The regression coefficient β₁ is the change in the probability that Y = 1 associated with a unit change in X₁, holding constant the other regressors, and so forth for β₂, …, β_k. The regression coefficients can be estimated by OLS, and the usual (heteroskedasticity-robust) OLS standard errors can be used for confidence intervals and hypothesis tests.
The estimated linear probability model in Equation (11.1) can be used to compute predicted denial probabilities as a function of the P/I ratio. For example, if projected debt payments are 30% of an applicant's income, then the P/I ratio is 0.3 and the predicted value from Equation (11.1) is −0.080 + 0.604 × 0.3 = 0.101. That is, according to this linear probability model, an applicant whose projected debt payments are 30% of income has a probability of 10.1% that his or her application will be denied. (This is different than the probability of 20% based on the regression line in Figure 11.1, because that line was estimated using only 127 of the 2380 observations used to estimate Equation (11.1).)
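The predictions just described follow mechanically from the two estimated coefficients in Equation (11.1). A quick check (the intercept −0.080 and slope 0.604 are the estimates reported above):

```python
# Linear probability model of Equation (11.1): deny-hat = -0.080 + 0.604 * (P/I ratio)
def predicted_denial_prob(pi_ratio):
    return -0.080 + 0.604 * pi_ratio

# Predicted denial probability at P/I ratio = 0.3.
print(f"{predicted_denial_prob(0.3):.3f}")   # 0.101, i.e., a 10.1% chance of denial

# A 0.1 increase in the P/I ratio raises the predicted probability by slope * 0.1.
print(f"{predicted_denial_prob(0.4) - predicted_denial_prob(0.3):.3f}")  # 0.060
```

Note that because the model is linear, the 0.060 change is the same no matter where the 0.1 increase starts; this constancy is relaxed by the probit and logit models introduced below.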
What is the effect of race on the probability of denial, holding constant the P/I ratio? To keep things simple, we focus on differences between black and white applicants. To estimate the effect of race, holding constant the P/I ratio, we augment Equation (11.1) with a binary regressor that equals 1 if the applicant is black and equals 0 if the applicant is white. The estimated linear probability model is

deny^ = −0.091 + 0.559 P/I ratio + 0.177 black.   (11.3)
        (0.029)   (0.089)          (0.025)

The coefficient on black indicates that a black applicant has a 17.7% higher probability of having a mortgage application denied than a white applicant, holding constant their payment-to-income ratio. This coefficient is significant at the 1% level (the t-statistic is 7.11).
Taken literally, this estimate suggests that there might be racial bias in mortgage decisions, but such a conclusion would be premature. Although the payment-to-income ratio plays a role in the loan officer's decision, so do many other factors, such as the applicant's earning potential and the individual's credit history. If any of these variables differ, on average, between black and white applicants, then omitting them biases the estimated race effect in Equation (11.3).
Shortcomings of the linear probability model. The linearity that makes the linear probability model easy to use is also its major flaw. Look again at Figure 11.1: The estimated line representing the predicted probabilities drops below 0 for very low values of the P/I ratio and exceeds 1 for high values! But this is nonsense: A probability cannot be less than 0 or greater than 1. This nonsensical feature is an inevitable consequence of the linear regression. To address this problem, we introduce new nonlinear models specifically designed for binary dependent variables, the probit and logit regression models.
11.2 Probit and Logit Regression¹

Probit Regression

The probit regression model with a single regressor X is

Pr(Y = 1|X) = Φ(β₀ + β₁X),   (11.4)

where Φ is the cumulative standard normal distribution function.

¹Pronounced prō-bit and lō-jit.
FIGURE 11.2 Probit Model of the Probability of Denial, Given the P/I Ratio

The probit model uses the cumulative normal distribution function to model the probability of denial given the payment-to-income ratio or, more generally, to model Pr(Y = 1|X). Unlike the linear probability model, the probit conditional probabilities are always between 0 and 1. [The figure plots deny against P/I ratio (horizontal axis from 0.0 to 0.8) and shows the estimated probit model as an S-shaped curve.]
The estimated probit regression function has a stretched "S" shape: It is nearly 0 and flat for small values of P/I ratio; it turns and increases for intermediate values; and it flattens out again and is nearly 1 for large values. For small values of the payment-to-income ratio, the probability of denial is small. For example, for P/I ratio = 0.2, the estimated probability of denial based on the estimated probit function in Figure 11.2 is Pr(deny = 1|P/I ratio = 0.2) = 2.1%. When the P/I ratio is 0.3, the estimated probability of denial is 16.1%. When the P/I ratio is 0.4, the probability of denial increases sharply to 51.9%, and when the P/I ratio is 0.6, the denial probability is 98%. According to this estimated probit model, for applicants with high payment-to-income ratios, the probability of denial is nearly 1.
For example, suppose β0 = -1.6, β1 = 2, and β2 = 0.5. If X1 = 0.4 and X2 = 1, then the z-value is z = -1.6 + 2 × 0.4 + 0.5 × 1 = -0.3. So the probability that Y = 1 given X1 = 0.4 and X2 = 1 is Pr(Y = 1|X1 = 0.4, X2 = 1) = Φ(-0.3) = 38%.
Effect of a change in X.
KEY CONCEPT 11.2  The Probit Model, Predicted Probabilities, and Estimated Effects

The population probit model with multiple regressors is

Pr(Y = 1|X1, X2, ..., Xk) = Φ(β0 + β1X1 + β2X2 + ... + βkXk),    (11.6)

where the dependent variable Y is binary, Φ is the cumulative standard normal distribution function, and X1, X2, etc., are regressors. The probit coefficients β0, β1, ..., βk do not have simple interpretations. The model is best interpreted by computing predicted probabilities and the effect of a change in a regressor.

The probit regression model, predicted probabilities, and estimated effects are summarized in Key Concept 11.2.
As an illustration, we fit a probit model to the 2380 observations in our data set on mortgage denial (deny) and the payment-to-income ratio (P/I ratio):

Pr(deny = 1|P/I ratio) = Φ(-2.19 + 2.97 P/I ratio).    (11.7)
                           (0.16)  (0.47)

The estimated coefficients of -2.19 and 2.97 are difficult to interpret because they affect the probability of denial via the z-value. Indeed, the only thing that can be readily concluded from the estimated probit regression in Equation (11.7) is that the P/I ratio is positively related to the probability of denial (the coefficient on the P/I ratio is positive) and that this relationship is statistically significant (t = 2.97/0.47 = 6.32).
What is the change in the predicted probability that an application will be denied when the payment-to-income ratio increases from 0.3 to 0.4? To answer this question, we follow the procedure in Key Concept 8.1: Compute the probability of denial for P/I ratio = 0.3, then for P/I ratio = 0.4, and then compute the difference. The probability of denial when P/I ratio = 0.3 is Φ(-2.19 + 2.97 × 0.3) = Φ(-1.30) = 0.097. The probability of denial when P/I ratio = 0.4 is Φ(-2.19 + 2.97 × 0.4) = Φ(-1.00) = 0.159. The estimated change in the probability of denial is 0.159 - 0.097 = 0.062. That is, an increase in the payment-to-income ratio from 0.3 to 0.4 is associated with an increase in the probability of denial of 6.2 percentage points, from 9.7% to 15.9%.

Because the probit regression function is nonlinear, the effect of a change in X depends on the starting value of X. For example, if P/I ratio = 0.5, then the estimated denial probability based on Equation (11.7) is Φ(-2.19 + 2.97 × 0.5) = Φ(-0.71) = 0.239. Thus the change in the predicted probability when the P/I ratio increases from 0.4 to 0.5 is 0.239 - 0.159, or 8.0 percentage points, larger than the increase of 6.2 percentage points when the P/I ratio increases from 0.3 to 0.4.
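These differences in predicted probabilities can be reproduced from the coefficients in Equation (11.7). The sketch below recomputes both changes; the results differ from the text only in the third decimal place, because the text rounds the z-values before evaluating Φ:

```python
from math import erf, sqrt

def norm_cdf(z):
    # Cumulative standard normal distribution function Phi.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_deny(pi_ratio, b0=-2.19, b1=2.97):
    # Estimated probit probability of denial from Equation (11.7).
    return norm_cdf(b0 + b1 * pi_ratio)

# Effect of raising the P/I ratio from 0.3 to 0.4, and from 0.4 to 0.5.
change_03_04 = prob_deny(0.4) - prob_deny(0.3)   # about 6.2 percentage points
change_04_05 = prob_deny(0.5) - prob_deny(0.4)   # about 8.0 percentage points

print(round(100 * change_03_04, 1))
print(round(100 * change_04_05, 1))
```

The second change is larger than the first, illustrating how the effect of a given change in X depends on the starting value of X in a nonlinear model.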
What is the effect of race on the probability of mortgage denial, holding constant the payment-to-income ratio? To estimate this effect, we estimate a probit regression with both P/I ratio and black as regressors:

Pr(deny = 1|P/I ratio, black) = Φ(-2.26 + 2.74 P/I ratio + 0.71 black).    (11.8)
                                  (0.16)  (0.44)           (0.083)

Again, the values of the coefficients are difficult to interpret, but the sign and statistical significance are not. The coefficient on black is positive, indicating that an African American applicant has a higher probability of denial than a white applicant, holding constant their payment-to-income ratio. This coefficient is statistically significant at the 1% level (the t-statistic on black is 8.55). For a white applicant with P/I ratio = 0.3, the predicted denial probability is 7.5%, while for a black applicant with P/I ratio = 0.3 it is 23.3%; the difference in denial probabilities between these two hypothetical applicants is 15.8 percentage points.
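The predicted probabilities for the two hypothetical applicants can be verified from the coefficients in Equation (11.8); this is a sketch for checking the arithmetic, not part of the original analysis:

```python
from math import erf, sqrt

def norm_cdf(z):
    # Cumulative standard normal distribution function Phi.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def prob_deny(pi_ratio, black, b0=-2.26, b1=2.74, b2=0.71):
    # Estimated probit probability of denial from Equation (11.8).
    return norm_cdf(b0 + b1 * pi_ratio + b2 * black)

p_white = prob_deny(0.3, black=0)   # about 0.075, i.e. 7.5%
p_black = prob_deny(0.3, black=1)   # about 0.233, i.e. 23.3%
gap = p_black - p_white             # about 15.8 percentage points

print(round(p_white, 3), round(p_black, 3), round(100 * gap, 1))
```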
Estimation of the probit coefficients.  The probit coefficients reported here were estimated using the method of maximum likelihood, which produces efficient (minimum variance) estimators in a wide variety of applications, including regression with a binary dependent variable. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way.

Regression software for estimating probit models typically uses maximum likelihood estimation, so this is a simple method to apply in practice. Standard errors produced by such software can be used in the same way as the standard errors of regression coefficients; for example, a 95% confidence interval for the
KEY CONCEPT 11.3  Logit Regression

The population logit model of the binary dependent variable Y with multiple regressors is

Pr(Y = 1|X1, X2, ..., Xk) = F(β0 + β1X1 + β2X2 + ... + βkXk)
                          = 1 / [1 + e^-(β0 + β1X1 + β2X2 + ... + βkXk)].    (11.9)

Logit regression is similar to probit regression, except that the cumulative distribution function is different.
true probit coefficient can be constructed as the estimated coefficient ±1.96 standard errors. Similarly, F-statistics computed using maximum likelihood estimators can be used to test joint hypotheses. Maximum likelihood estimation is discussed further in Section 11.3, with additional details given in Appendix 11.2.
Logit Regression

The logit regression model.  The logit regression model is similar to the probit regression model, except that the cumulative standard normal distribution function Φ in Equation (11.6) is replaced by the cumulative standard logistic distribution function, which we denote by F. Logit regression is summarized in Key Concept 11.3. The logistic cumulative distribution function has a specific functional form, defined in terms of the exponential function, which is given as the final expression in Equation (11.9).

As with probit, the logit coefficients are best interpreted by computing predicted probabilities and differences in predicted probabilities.

The coefficients of the logit model can be estimated by maximum likelihood. The maximum likelihood estimator is consistent and normally distributed in large samples, so that t-statistics and confidence intervals for the coefficients can be constructed in the usual way.

The logit and probit regression functions are similar. This is illustrated in Figure 11.3, which graphs the probit and logit regression functions for the dependent variable deny and the single regressor P/I ratio, estimated by maximum likelihood
FIGURE 11.3  Probit and Logit Models of the Probability of Denial, Given the P/I Ratio

These logit and probit models produce nearly identical estimates of the probability that a mortgage application will be denied, given the payment-to-income ratio.

[Figure: the estimated probit and logit regression functions plotted against the P/I ratio (horizontal axis, 0 to 0.8), with deny on the vertical axis.]
using the same 2380 observations as in Figures 11.1 and 11.2. The differences between the two functions are small.

Historically, the main motivation for logit regression was that the logistic cumulative distribution function could be computed faster than the normal cumulative distribution function. With the advent of more efficient computers, this distinction is no longer important.
The estimated logit regression with P/I ratio and black as regressors is

Pr(deny = 1|P/I ratio, black) = F(-4.13 + 5.37 P/I ratio + 1.27 black).    (11.10)
                                  (0.35)  (0.96)           (0.15)

The coefficient on black is positive and statistically significant at the 1% level (the t-statistic is 8.47). The predicted denial probability of a white applicant with P/I ratio = 0.3 is 1/[1 + e^-(-4.13 + 5.37 × 0.3 + 1.27 × 0)] = 1/[1 + e^2.52] = 0.074, or 7.4%. The predicted denial probability of an African American applicant with P/I ratio = 0.3 is 1/[1 + e^1.25] = 0.222, or 22.2%, so the difference between the two probabilities is 14.8 percentage points.
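The logistic arithmetic can be checked the same way, using the coefficients from Equation (11.10) (a sketch; tiny discrepancies from the text come from its rounding of the exponent):

```python
from math import exp

def logistic_cdf(z):
    # Cumulative standard logistic distribution function F(z) = 1 / (1 + e^(-z)).
    return 1.0 / (1.0 + exp(-z))

def prob_deny(pi_ratio, black, b0=-4.13, b1=5.37, b2=1.27):
    # Estimated logit probability of denial from Equation (11.10).
    return logistic_cdf(b0 + b1 * pi_ratio + b2 * black)

p_white = prob_deny(0.3, black=0)   # about 7.4%
p_black = prob_deny(0.3, black=1)   # about 22.2%
print(round(100 * (p_black - p_white), 1))   # difference, about 14.8 points
```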
11.3 Estimation and Inference in the Logit and Probit Models
appear inside the cumulative standard logistic distribution function F. Because the population regression function is a nonlinear function of the coefficients β0, β1, ..., βk, those coefficients cannot be estimated by OLS.

This section provides an introduction to the standard method for estimation of probit and logit coefficients, maximum likelihood; additional mathematical details are given in Appendix 11.2. Because it is built into modern statistical software, maximum likelihood estimation of the probit coefficients is easy in practice. The theory of maximum likelihood estimation, however, is more complicated than the theory of least squares. We therefore first discuss another estimation method, nonlinear least squares, before turning to maximum likelihood.
The nonlinear least squares estimator of the probit coefficients minimizes the sum of squared prediction mistakes,

min over b0, ..., bk of Σi [Yi - Φ(b0 + b1X1i + ... + bkXki)]².    (11.11)
The nonlinear least squares estimator shares two key properties with the OLS estimator in linear regression: It is consistent (the probability that it is close to the true value approaches 1 as the sample size gets large), and it is normally distributed in large samples. There are, however, estimators that have a smaller variance than the nonlinear least squares estimator; that is, the nonlinear least squares estimator is inefficient. For this reason, the nonlinear least squares estimator of the probit coefficients is rarely used in practice, and instead the parameters are estimated by maximum likelihood.
The likelihood function is the joint probability distribution of the data, treated as a function of the unknown coefficients. The maximum likelihood estimator (MLE) of the unknown coefficients consists of the values of the coefficients that maximize the likelihood function. As an illustration, consider two i.i.d. observations, Y1 and Y2, on a binary random variable, where the only unknown parameter to estimate is the probability p that Y = 1, which is also the mean of Y.
To obtain the maximum likelihood estimator, we need an expression for the likelihood function, which in turn requires an expression for the joint probability distribution of the data. The joint probability distribution of the two observations Y1 and Y2 is Pr(Y1 = y1, Y2 = y2). Because Y1 and Y2 are independently distributed, the joint distribution is the product of the individual distributions [Equation (2.23)], so Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1)Pr(Y2 = y2). The Bernoulli distribution can be summarized in the formula Pr(Y = y) = p^y(1 - p)^(1-y): When y = 1, Pr(Y = 1) = p^1(1 - p)^0 = p, and when y = 0, Pr(Y = 0) = p^0(1 - p)^1 = 1 - p. Thus the joint probability distribution of Y1 and Y2 is Pr(Y1 = y1, Y2 = y2) = [p^y1(1 - p)^(1-y1)] × [p^y2(1 - p)^(1-y2)].
The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. For n = 2 i.i.d. observations on Bernoulli random variables, the likelihood function is

f(p; Y1, Y2) = p^(Y1+Y2)(1 - p)^(2-Y1-Y2).    (11.12)

The maximum likelihood estimator of p is the value of p that maximizes the likelihood function in Equation (11.12). As with all maximization or minimization problems, this can be done by trial and error; that is, you can try different values of p and compute the likelihood f(p; Y1, Y2) until you are satisfied that you have maximized this function. In this example, however, maximizing the likelihood function using calculus produces a simple formula for the MLE: The MLE is p̂ = ½(Y1 + Y2). In other words, the MLE of p is just the sample average! In fact, for general n, the MLE p̂ of the Bernoulli probability p is the sample average; that is, p̂ = Y̅ (as is shown in Appendix 11.2). In this example, the MLE is the usual estimator of p, the fraction of times Yi = 1 in the sample.
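The trial-and-error maximization described above can be mimicked with a simple grid search; the sketch below uses made-up data and confirms that the maximizer is the sample average:

```python
# Grid-search maximization of the Bernoulli likelihood f(p) = p^S * (1 - p)^(n - S),
# where S is the number of successes, illustrating that the MLE is the sample mean.
y = [1, 0, 1, 1]          # hypothetical i.i.d. Bernoulli observations
s, n = sum(y), len(y)

def likelihood(p):
    return p**s * (1 - p)**(n - s)

grid = [i / 1000 for i in range(1, 1000)]   # candidate values of p on (0, 1)
p_hat = max(grid, key=likelihood)           # trial-and-error maximizer

print(p_hat, s / n)   # both 0.75: the MLE equals the sample average
```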
This example is similar to the problem of estimating the unknown coefficients of the probit and logit regression models. In those models, the success probability p is not constant but rather depends on X; that is, it is the success probability conditional on X, which is given in Equation (11.6) for the probit model and Equation (11.9) for the logit model. Thus the probit and logit likelihood functions are similar to the likelihood function in Equation (11.12), except that the success probability varies from one observation to the next (because it depends on Xi). Expressions for the probit and logit likelihood functions are given in Appendix 11.2.
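To see how the success probability varies across observations, the probit log-likelihood can be written out directly; the data below are invented for illustration, and the exact likelihood expressions are those of Appendix 11.2 in log form:

```python
from math import erf, log, sqrt

def norm_cdf(z):
    # Cumulative standard normal distribution function Phi.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probit_loglik(b0, b1, x, y):
    # Probit log-likelihood: sum of y*log(Phi(z)) + (1 - y)*log(1 - Phi(z)),
    # where the z-value (and hence the success probability) varies with x.
    total = 0.0
    for xi, yi in zip(x, y):
        p = norm_cdf(b0 + b1 * xi)
        total += yi * log(p) + (1 - yi) * log(1 - p)
    return total

# Hypothetical data in which denial tends to occur at high values of x.
x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
y = [0, 0, 0, 1, 0, 1]

# A slope of the right sign fits the data better than a slope of the wrong sign.
print(probit_loglik(-2.0, 4.0, x, y) > probit_loglik(-2.0, -4.0, x, y))   # True
```

The MLE is the pair (b0, b1) that makes this function as large as possible; statistical software searches for that maximum numerically.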
Like the nonlinear least squares estimator, the MLE is consistent and normally distributed in large samples. Because regression software commonly computes the MLE of the probit coefficients, this estimator is easy to use in practice. All the estimated probit and logit coefficients reported in this chapter are MLEs.
Statistical inference based on the MLE.  Because the MLE is normally distributed in large samples, statistical inference about the probit and logit coefficients based on the MLE proceeds in the same way as inference about the linear regression function coefficients based on the OLS estimator. That is, hypothesis tests are performed using the t-statistic, and 95% confidence intervals are formed as ±1.96 standard errors. Tests of joint hypotheses on multiple coefficients use the F-statistic, in a way similar to that discussed in Chapter 7 for the linear regression model. All of this is completely analogous to statistical inference in the linear regression model.
An important practical point is that some statistical software reports tests of joint hypotheses using the F-statistic, while other software uses the chi-squared statistic. The chi-squared statistic is q × F, where q is the number of restrictions being tested. Because the F-statistic is, under the null hypothesis, distributed as χ²q/q in large samples, q × F is distributed as χ²q in large samples. Because the two approaches differ only in whether they divide by q, they produce identical inferences, but you need to know which approach is implemented in your software so that you use the correct critical values.
Measures of Fit

In Section 11.1, it was mentioned that the R² is a poor measure of fit for the linear probability model. This is also true for probit and logit regression. Two measures of fit for models with binary dependent variables are the "fraction correctly predicted" and the "pseudo-R²." The fraction correctly predicted uses the following rule: If Yi = 1 and the predicted probability exceeds 50%, or if Yi = 0 and the predicted probability is less than 50%, then Yi is said to be correctly predicted.
11.4 Application to the Boston HMDA Data

TABLE 11.1  Variables Included in Regression Models of Mortgage Decisions

[Table garbled in scanning. It defines each variable and reports its sample average, including the financial variables (P/I ratio, housing expense-to-income ratio, loan-to-value ratio, consumer credit score, mortgage credit score, public bad credit record), additional applicant characteristics (denied mortgage insurance, self-employed, single, high school diploma, industry unemployment rate), condominium, black (sample average 0.142), and deny (sample average 0.120).]
A loan officer legitimately might worry about his or her ability or desire to make mortgage payments in the future. The three variables measure different types of credit histories, which the loan officer might weigh differently. The first concerns consumer credit, such as credit card debt; the second, previous mortgage payment history; and the third measures credit problems so severe that they appeared in a public legal record, such as filing for bankruptcy.
Table 11.1 also lists some other variables relevant to the loan officer's decision. Sometimes the applicant must apply for private mortgage insurance.³ The loan officer knows whether that application was denied, and that denial would weigh negatively with the loan officer. The next three variables, which concern the employment status, marital status, and educational attainment of the applicant, relate to the prospective ability of the applicant to repay. In the event of foreclosure, characteristics of the property are relevant as well, and the next variable indicates whether the property is a condominium. The final two variables in Table 11.1 are whether the applicant is black or white and whether the application was denied or accepted. In these data, 14.2% of applicants are black and 12.0% of applications are denied.

Table 11.2 presents regression results based on these variables. The base specifications, reported in columns (1)-(3), include the financial variables in Table 11.1 plus the variables indicating whether private mortgage insurance was denied and whether the applicant is self-employed. Loan officers commonly use thresholds, or cutoff values, for the loan-to-value ratio, so the base specification for that variable uses binary variables for whether the loan-to-value ratio is high (≥ 0.95), medium (between 0.8 and 0.95), or low (< 0.8; this case is omitted to avoid perfect multicollinearity). The regressors in the first three columns are similar to those in the base specification considered by the Federal Reserve Bank of Boston researchers in their original analysis of these data.⁴ The regressions in columns (1)-(3) differ only in how the denial probability is modeled, using a linear probability model, a logit model, and a probit model, respectively.
Because the regression in column (1) is a linear probability model, its coefficients are estimated changes in predicted probabilities arising from a unit change in the independent variable. Accordingly, an increase in the P/I ratio of 0.1 is estimated to increase the probability of denial by 4.5 percentage points (the coefficient on P/I ratio in column (1) is 0.449, and 0.449 × 0.1 ≅ 0.045). Similarly, having a high loan-to-value ratio increases the probability of denial: A loan-to-value ratio exceeding 95% is associated with an 18.9 percentage point increase (the

³Mortgage insurance is an insurance policy under which the insurance company makes the monthly payment to the bank if the borrower defaults. [The remainder of this footnote is garbled in scanning.]
TABLE 11.2  Mortgage Denial Regressions Using the Boston HMDA Data

Dependent variable: deny = 1 if mortgage application denied, = 0 if accepted; 2380 observations.

[Table garbled in scanning. Columns (1)-(6) report, respectively, linear probability, logit, and probit estimates (columns (3)-(6) are probit). Regressors include black, P/I ratio, housing expense-to-income ratio, medium and high loan-to-value ratio indicators, consumer credit score, mortgage credit score, public bad credit record, denied mortgage insurance, self-employed, and, in columns (4)-(6), additional applicant characteristics, a condominium indicator, and interactions of black with the P/I and housing expense-to-income ratios. Among the legible entries, the column (1) coefficient on black is 0.084 and the coefficient on P/I ratio is 0.449.]
[Table 11.2 continued. The remaining rows report the additional applicant characteristics in columns (4)-(6) (single, high school diploma, industry unemployment rate), F-statistics with p-values in parentheses, and a final row giving the difference in predicted denial probabilities between black and white applicants; legible values in that row include 6.0%, 7.1%, and 6.5%.]
These regressions were estimated using the n = 2380 observations in the Boston HMDA data set described in Appendix 11.1. The linear probability model was estimated by OLS, and the probit and logit regressions were estimated by maximum likelihood. Standard errors are given in parentheses under the coefficients, and p-values are given in parentheses under the F-statistics. The change in predicted probability in the final row was computed for a hypothetical applicant whose values of the regressors, other than race, equal the sample mean. Individual coefficients are statistically significant at the *5% or **1% level.
coefficient is 0.189) in the denial probability, relative to the omitted case of a loan-to-value ratio less than 80%, holding the other variables in column (1) constant. Applicants with a poor credit rating also have a more difficult time getting a loan, all else being constant, although interestingly the coefficient on consumer credit is statistically significant but the coefficient on mortgage credit is not. Applicants with a public record of credit problems, such as filing for bankruptcy, have much greater difficulty obtaining a loan: All else equal, a public bad credit record is estimated to increase the probability of denial by 0.197, or 19.7 percentage points. Being denied private mortgage insurance is estimated to be virtually decisive: The estimated coefficient of 0.702 means that being denied mortgage insurance increases your chance of being denied a mortgage by 70.2 percentage points, all else equal. Of the nine variables (other than race) in the regression, the coefficients on all but two are statistically significant at the 5% level, which is consistent with loan officers' considering many factors when they make their decisions.
The coefficient on black in regression (1) is 0.084, indicating that the difference in denial probabilities for black and white applicants is 8.4 percentage points, holding constant the other variables in the regression. This is statistically significant at the 1% significance level (t = 3.65).

The logit and probit estimates reported in columns (2) and (3) yield similar conclusions. In the logit and probit regressions, eight of the nine coefficients on variables other than race are individually statistically significantly different from
zero at the 5% level, and the coefficient on black is statistically significant at the
1% level. As discussed in Section 11.2, because these models are nonlinear, specific values of all the regressors must be chosen to compute the difference in predicted probabilities for white and black applicants. A conventional way to make this choice is to consider an average applicant who has the sample average values of all the regressors other than race. The final row in Table 11.2 reports this estimated difference in probabilities, evaluated for this average applicant. The estimated racial differentials are similar to each other: 8.4 percentage points for the linear probability model [column (1)], 6.0 percentage points for the logit model [column (2)], and 7.1 percentage points for the probit model [column (3)]. These estimated race effects and the coefficients on black are less than in the regressions of the previous sections, in which the only regressors were P/I ratio and black, indicating that those earlier estimates had omitted variable bias.
The regressions in columns (4)-(6) investigate the sensitivity of the results in column (3) to changes in the regression specification. Column (4) modifies column (3) by including additional applicant characteristics. These characteristics help to predict whether the loan is denied; for example, having at least a high school diploma reduces the probability of denial (the estimate is negative, and the coefficient is statistically significant at the 1% level). However, controlling for these personal characteristics does not change the estimated coefficient on black or the estimated difference in denial probabilities (6.6%) in an important way.

Column (5) breaks out the six consumer credit categories and four mortgage credit categories to test the null hypothesis that these two variables enter linearly; this regression also adds a variable indicating whether the property is a condominium. The null hypothesis that the credit rating variables enter the expression for the z-value linearly is not rejected, nor is the condominium indicator significant, at the 5% level. Most importantly, the estimated racial difference in denial probabilities (6.3%) is essentially the same as in columns (3) and (4).

Column (6) examines whether there are interactions. Are different standards applied to evaluating the payment-to-income and housing expense-to-income ratios for black versus white applicants? The answer appears to be no: The interaction terms are not jointly statistically significant at the 5% level. However, race continues to have a significant effect, because the race indicator and the interaction terms are jointly statistically significant at the 1% level. Again, the estimated racial difference in denial probabilities (6.5%) is essentially the same as in the other probit regressions.

In all six specifications, the effect of race on the denial probability, holding other applicant characteristics constant, is statistically significant at the 1% level. The estimated difference in denial probabilities between
data, alternative nonlinear functional forms, additional interactions, and so forth. The original data were subjected to a careful audit, some errors were found, and
the results reported here (and in the final published Boston Fed study) are based on the "cleaned" data set. Estimation of other specifications, using different functional forms and/or additional regressors, also produces estimates of racial differentials comparable to those in Table 11.2. A potentially more difficult issue of internal validity is whether there is relevant nonracial financial information obtained during loan interviews, but not recorded on the loan application itself, that is correlated with race; if so, there still might be omitted variable bias in the Table 11.2 regressions. Finally, some have questioned external validity: Even if there was racial discrimination in Boston in 1990, it is wrong to implicate lenders elsewhere.⁵
⁵These changes include changes in the way that fair lending examinations are done by federal bank regulators, changes in inquiries made by the U.S. Department of Justice, and enhanced education programs for banks and other home loan origination companies.

⁶If you are interested in further reading on this topic, a good place to start is the symposium on racial discrimination and economics in the Spring 1998 issue of the Journal of Economic Perspectives. The article in that symposium by Helen Ladd (1998) surveys the evidence and debate on racial discrimination in mortgage lending. A more detailed treatment is given in Goering and Wienk (1996).
[The box "James J. Heckman and Daniel L. McFadden, Nobel Laureates," describing their advances in the econometrics of sample selection and discrete choice, which were recognized with the 2000 Nobel Prize in economics, is garbled in scanning.]
11.5 Summary

When the dependent variable Y is binary, the population regression function is the probability that Y = 1, conditional on the regressors. Estimation of this population regression function entails finding a functional form that does justice to its probability interpretation, estimating the unknown parameters of that function, and interpreting the results. The resulting predicted values are predicted probabilities, and the estimated effect of a change in a regressor X is the estimated change in the probability that Y = 1 arising from the change in X.
A natural way to model the probability that Y = 1 given the regressors is to use a cumulative distribution function, where the argument of the c.d.f. depends on the regressors. Probit regression uses a normal c.d.f. as the regression function, and logit regression uses a logistic c.d.f. Because these models are nonlinear functions of the unknown parameters, those parameters are more complicated to estimate than linear regression coefficients. The standard estimation method is maximum likelihood. In practice, statistical inference using the maximum likelihood estimates proceeds the same way as it does in linear multiple regression; for example, 95% confidence intervals for a coefficient are constructed as the estimated coefficient ±1.96 standard errors.
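For example, the 95% confidence interval for the coefficient on the P/I ratio in Equation (11.7) follows this recipe (a sketch of the arithmetic only):

```python
# 95% confidence interval for a probit coefficient:
# estimate +/- 1.96 standard errors.
# Values from Equation (11.7): coefficient 2.97 on P/I ratio, standard error 0.47.
beta_hat, se = 2.97, 0.47
ci = (beta_hat - 1.96 * se, beta_hat + 1.96 * se)
print(tuple(round(v, 3) for v in ci))   # roughly (2.049, 3.891)
```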
Despite its intrinsic nonlinearity, sometimes the population regression function can be adequately approximated by a linear probability model, that is, by the straight line produced by linear multiple regression. The linear probability model, probit regression, and logit regression all give similar "bottom line" answers when they are applied to the Boston HMDA data: All three methods estimate substantial differences in mortgage denial rates for otherwise similar black and white applicants.

Binary dependent variables are the most common example of limited dependent variables, which are dependent variables with a limited range. The final quarter of the twentieth century saw important advances in econometric methods for analyzing other limited dependent variables (see the Nobel Laureates box). Some of these methods are reviewed in Appendix 11.3.
Summary

1. When Y is a binary variable, the linear multiple regression model is called the linear probability model. The population regression line shows the probability that Y = 1 given the values of the regressors, X1, X2, ..., Xk.

2. Probit and logit regression models are nonlinear regression models used when Y is a binary variable. Unlike the linear probability model, probit and logit regression ensure that the predicted probability that Y = 1 is between 0 and 1 for all values of X.

3. Probit regression uses the standard normal cumulative distribution function; logit regression uses the logistic cumulative distribution function. Logit and probit coefficients are estimated by maximum likelihood.

4. The values of coefficients in probit and logit regressions are not easy to interpret. Changes in the probability that Y = 1 associated with changes in one or more of the X's can be calculated using the general procedure for nonlinear models outlined in Key Concept 8.1.

5. Hypothesis tests on coefficients in the linear probability, logit, and probit models are performed using the usual t- and F-statistics.
Key Terms

limited dependent variable (384)
linear probability model (387)
probit (389)
logit (389)
logistic regression (389)
Exercises
Exercises 11.1 through 11.5 are based on the following scenario: Four hundred driver's license applicants were randomly selected and asked whether they passed their driving test (Pass_i = 1) or failed their test (Pass_i = 0); data were also collected on their gender (Male_i = 1 if male and = 0 if female) and their years of driving experience (Experience_i, in years). The following table summarizes several estimated models.
[Table garbled in scanning. Its columns report probit, logit, and linear probability model estimates of the probability of passing the test; the regressors include Experience, Male, and Male × Experience, plus a constant, with standard errors in parentheses. Among the legible entries, the Experience coefficient in the first probit column is 0.031 (0.009).]
t e~ t tlepencl
on Experience'!
E\plain.
b. ~l.ltthew ha<; 10 years of driving experience. WhattS the prubctb1htv
that he ~t tl pa , Lhe test ?
c. Christopher .sa new dm er (zero years of expt:nence) What IS the
d. Thl
aMpl~
nnt"1
11.2 a. Answer (a)-(c) from Exercise 11.1 using the results in column (2).
b. Sketch the predicted probabilities from the probit and logit in columns (1) and (2) for values of Experience between 0 and 60. Are the probit and logit models similar?
11.3 a. Answer (a)-(c) from Exercise 11.1 using the results in column (3).
b. Sketch the predicted probabilities from the models in columns (1) and (3) for values of Experience between 0 and 60. Are they similar? Why or why not?
11.4 Using the results in columns (4)-(6):
a. Compute the estimated probability of passing the test for men and for women.
b. Are the models in (4)-(6) different? Why or why not?
11.5 a. [The text of part (a) is not recoverable from this extraction.]
b. Jane is a woman with 2 years of driving experience. What is the probability that she will pass the test?
c. Does the effect of experience on test performance depend on gender? Explain.
11.6 Use the estimated probit model in Equation (11.8) to answer the following questions:
a. [The text of part (a) is not recoverable from this extraction.]
b. Suppose that the applicant reduced this ratio to 0.30. What effect would this have on his probability of being denied a mortgage?
11.7 Repeat Exercise 11.6 using the logit model in Equation (11.10). Are the logit and probit results similar? Explain.
11.8 Consider the linear probability model Y_i = β_0 + β_1X_i + u_i, where Pr(Y_i = 1 | X_i) = β_0 + β_1X_i.
a. Show that E(u_i | X_i) = 0.
b. Show that var(u_i | X_i) = (β_0 + β_1X_i)[1 − (β_0 + β_1X_i)].
c. Is u_i heteroskedastic? Explain.
11.9 Use the estimated linear probability model shown in column (1) of Table 11.2 to answer the following:
a. Two applicants, one white and one black, apply for a mortgage. They have the same values for all the regressors other than race. How much more likely is the black applicant to be denied a mortgage?
A random variable Y has the probability distribution Pr(Y = 1) = p, Pr(Y = 2) = q, and Pr(Y = 3) = 1 − p − q. A random sample of size n is drawn from this distribution, and the random variables are denoted Y_1, Y_2, ..., Y_n. [The remaining parts of this exercise, and the earlier parts of the next, are not recoverable from this extraction.]
c. A study of consumers' choices for Coke, Pepsi, or generic cola?
d. A study of the number of cellular phones owned by a family?
Empirical Exercises
E11.1 It has been conjectured that workplace smoking bans induce smokers to quit by reducing their opportunities to smoke. In this assignment you will estimate the effect of workplace smoking bans on smoking, using data on a sample of 10,000 U.S. indoor workers from 1991-1993, available on the textbook Web site www.aw-bc.com/stock_watson in the file Smoking. The data set contains information on whether individuals were or were not subject to a workplace smoking ban, whether the individuals smoked, and other individual characteristics.
d. Test the hypothesis that the coefficient on smkban is zero in the population version of the regression in (c) against the alternative that it is nonzero, at the 5% significance level.
e. Test the hypothesis that the probability of smoking does not depend on the level of education in the regression in (c). Does the probability of smoking depend on the level of education?
E11.2 [Parts (a) and (b) of this exercise are not recoverable from this extraction.]
c. Test the hypothesis that the probability of smoking does not depend on the level of education in this probit model. Compare your results with those from Empirical Exercise E11.1.
d. Using the probit regression from (a), and assuming that Mr. A is not subject to a workplace smoking ban, calculate the probability that Mr. A smokes. Carry out the calculation again assuming that he is subject to a workplace smoking ban. What is the effect of the smoking ban on the probability of smoking?
e. [The text of part (e) is not recoverable from this extraction.]
f. Repeat (d) and (e) using the linear probability model from Empirical Exercise E11.1(c).
E11.3 In this exercise you will study health insurance, health status, and employment, using a random sample of more than 8000 workers in the United States. The data are available on the textbook Web site www.aw-bc.com/stock_watson in the file Insurance. A detailed description is given in Insurance_Description, available on the Web site.
a. Are the self-employed less likely to have health insurance than wage earners? If so, is the difference large in a real-world sense? Is the difference statistically significant?
b. The self-employed might systematically differ from wage earners in their age, education, and so forth. After you control for these other factors, are the self-employed less likely to have health insurance?
c. How does health insurance status vary with age? Are older workers more likely to have health insurance? Less likely?
d. Is the effect of self-employment on insurance status different for older workers than it is for younger workers?
e. It has been argued that the self-employed are less likely to be insured but, despite this, they are just as healthy as wage earners. Is this right? Does the argument hold up for young workers? For older workers? Are there potential two-way causality problems that might undermine the internal validity of this kind of statistical analysis?
These data were provided by Professor Harvey Rosen and were used in his paper with Craig Perry, "The Self-Employed Are Less Likely Than Wage-Earners to Have Health Insurance, So What?" in Douglas Holtz-Eakin and Harvey S. Rosen, eds., Public Policy and the Economics of Entrepreneurship (MIT Press, 2004).
APPENDIX 11.1
The Boston HMDA Data Set
The Boston HMDA data set was collected by researchers at the Federal Reserve Bank of Boston. The data combine information from mortgage applications and a follow-up survey of the banks and other lending institutions that received these mortgage applications. The data pertain to mortgage applications made in 1990 in the greater Boston metropolitan area. The full data set has 2925 observations, consisting of all mortgage applications by blacks and Hispanics plus a random sample of mortgage applications by whites.

To narrow the scope of the analysis in this chapter, we use a subset of the data for single-family residences only (thereby excluding data on multifamily homes) and for black and white applicants only (thereby excluding data on applicants from other groups). This leaves 2380 observations. Definitions of the variables used in this chapter are given in Table 11.1.
These data were graciously provided to us by the Research Department of the Federal Reserve Bank of Boston. More information about this data set, along with the conclusions reached by the Federal Reserve Bank of Boston researchers, is available in the article by Alicia H. Munnell, Geoffrey M. B. Tootell, Lynn E. Browne, and James McEneaney, "Mortgage Lending in Boston: Interpreting HMDA Data," American Economic Review, 1996.
APPENDIX 11.2
Maximum Likelihood Estimation
This appendix discusses maximum likelihood estimation of the binary response models of this chapter. We first derive the MLE of the success probability p for n i.i.d. observations of a Bernoulli random variable, then turn to the probit and logit models and discuss the pseudo-R². We conclude with a discussion of standard errors for predicted probabilities. This appendix uses calculus at two points.
Because Y_1, ..., Y_n are i.i.d. Bernoulli random variables with success probability p, their joint probability distribution is

Pr(Y_1 = y_1, ..., Y_n = y_n) = p^{y_1}(1 − p)^{1−y_1} × ⋯ × p^{y_n}(1 − p)^{1−y_n} = p^S(1 − p)^{n−S},   (11.13)

where S = y_1 + ⋯ + y_n.
The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. With S = Y_1 + ⋯ + Y_n, the likelihood function is

f(p; Y_1, ..., Y_n) = p^S(1 − p)^{n−S}.   (11.14)

The MLE of p is the value of p that maximizes the likelihood in Equation (11.14). The likelihood function can be maximized using calculus. It is convenient to maximize not the likelihood but rather its logarithm (because the logarithm is a monotonically increasing function, maximizing the likelihood or its logarithm gives the same estimator). The log likelihood is S ln(p) + (n − S) ln(1 − p), and the derivative of the log likelihood with respect to p is

(d/dp) ln f(p; Y_1, ..., Y_n) = S/p − (n − S)/(1 − p).   (11.15)

Setting the derivative in Equation (11.15) to zero and solving for p yields the MLE, p̂ = S/n = Ȳ.
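The calculus above can be checked numerically. This minimal sketch, using a small hypothetical data set, maximizes the Bernoulli log likelihood by a crude grid search and recovers the analytic answer p̂ = S/n = Ȳ.

```python
import math

# Hypothetical Bernoulli data: S successes out of n draws
y = [1, 0, 0, 1, 1, 1, 0, 1, 0, 1]
n, S = len(y), sum(y)

def log_likelihood(p):
    # S*ln(p) + (n - S)*ln(1 - p), the Bernoulli log likelihood
    return S * math.log(p) + (n - S) * math.log(1.0 - p)

# Crude grid search over (0, 1); calculus gives the same answer, p_hat = S/n
grid = [i / 10000 for i in range(1, 10000)]
p_mle = max(grid, key=log_likelihood)
```

Because the log likelihood is strictly concave in p, the grid maximizer coincides with the calculus solution S/n = 0.6 for these data.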
The probability that Y_i = y_i conditional on the regressors is p_i^{y_i}(1 − p_i)^{1−y_i}, so, because the observations are i.i.d., the joint probability distribution of Y_1, ..., Y_n conditional on the X's is

Pr(Y_1 = y_1, ..., Y_n = y_n | X_{1i}, ..., X_{ki}, i = 1, ..., n) = p_1^{y_1}(1 − p_1)^{1−y_1} × ⋯ × p_n^{y_n}(1 − p_n)^{1−y_n}.   (11.16)

The likelihood function is the joint probability distribution, treated as a function of the unknown coefficients. It is conventional to consider the logarithm of the likelihood. Accordingly, the log likelihood function is
ln f(β_0, ..., β_k; Y_1, ..., Y_n | X's) = Σ_{i=1}^{n} y_i ln(p_i) + Σ_{i=1}^{n} (1 − y_i) ln(1 − p_i),   (11.17)

where this expression incorporates the probit formula for the conditional probability,

p_i = Φ(β_0 + β_1X_{1i} + ⋯ + β_kX_{ki}).

The MLE for the probit model maximizes the likelihood function or, equivalently, the logarithm of the likelihood function given in Equation (11.17). Because there is no simple formula for the MLE, the probit likelihood function must be maximized using a numerical algorithm.
The likelihood for the logit model is derived in the same way as the likelihood for the probit model. The only difference is that the conditional success probability p_i for the logit model is given by Equation (11.9). Accordingly, the log likelihood of the logit model is given by Equation (11.17) with

p_i = 1 / [1 + e^{−(β_0 + β_1X_{1i} + ⋯ + β_kX_{ki})}].

As with the probit model, there is no simple formula for the MLE of the logit coefficients, so the log likelihood must be maximized numerically.
Pseudo-R²
The pseudo-R² compares the value of the likelihood of the estimated model to the value of the likelihood when none of the X's are included as regressors. Specifically, the pseudo-R² for the probit model is

pseudo-R² = 1 − [ln(f̂_probit^max) / ln(f̂_Bernoulli^max)],   (11.18)

where f̂_probit^max is the value of the maximized probit likelihood (which includes the X's) and f̂_Bernoulli^max is the value of the maximized Bernoulli likelihood (the probit model excluding all the X's).
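To make the numerical maximization concrete, here is a minimal sketch, using simulated data rather than any application in this chapter, that maximizes the probit log likelihood of Equation (11.17) with scipy's general-purpose optimizer and then computes the pseudo-R² of Equation (11.18).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
beta_true = np.array([-0.5, 1.0])  # hypothetical coefficients for the simulation
y = (rng.normal(size=n) < beta_true[0] + beta_true[1] * x).astype(float)

def neg_loglik(beta):
    z = beta[0] + beta[1] * x
    # Probit log likelihood: sum of y*ln Phi(z) + (1 - y)*ln Phi(-z)
    return -(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z)).sum()

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
ll_probit = -res.fun

# Bernoulli log likelihood with no X's: maximized at p_hat = ybar
p0 = y.mean()
ll_bern = n * (p0 * np.log(p0) + (1 - p0) * np.log(1 - p0))

pseudo_r2 = 1 - ll_probit / ll_bern
```

Using norm.logcdf rather than np.log(norm.cdf(...)) avoids numerical underflow when the index is far in the tails, which matters in practice for this maximization.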
Standard Errors for Predicted Probabilities
The predicted probability at a given value x is p̂(x) = Φ(β̂_0 + β̂_1x), where β̂_0 and β̂_1 are the MLEs of the two probit coefficients. Because this predicted probability is a nonlinear function of the MLEs, its sampling variance is computed using a linear approximation:

p̂(x) ≅ Φ(β_0 + β_1x) + a_0(β̂_0 − β_0) + a_1(β̂_1 − β_1),   (11.19)

where a_0 = ∂Φ(β_0 + β_1x)/∂β_0 and a_1 = ∂Φ(β_0 + β_1x)/∂β_1. The variance of p̂(x) now can be calculated using the approximation in Equation (11.19) and the expression for the variance of the sum of two random variables in Equation (2.31):

var[p̂(x)] ≅ a_0² var(β̂_0) + a_1² var(β̂_1) + 2a_0a_1 cov(β̂_0, β̂_1).   (11.20)

Using Equation (11.20), the standard error of p̂(x) can be calculated using estimates of the variances and covariance of the MLEs.
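As a numerical illustration of Equation (11.20), the sketch below plugs hypothetical coefficient estimates and a hypothetical estimated variance-covariance matrix for (β̂_0, β̂_1) into the delta-method formula; none of the numbers come from this chapter's applications.

```python
import math

def phi(z):
    # Standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Hypothetical probit MLEs and their estimated variance-covariance entries
b0_hat, b1_hat = -2.0, 0.1
v00, v11, v01 = 0.04, 0.0009, -0.005  # var(b0), var(b1), cov(b0, b1)

x = 10.0
z = b0_hat + b1_hat * x
a0 = phi(z)        # derivative of Phi(b0 + b1*x) with respect to b0
a1 = x * phi(z)    # derivative of Phi(b0 + b1*x) with respect to b1

# Equation (11.20): delta-method variance of the predicted probability
var_p = a0**2 * v00 + a1**2 * v11 + 2 * a0 * a1 * v01
se_p = math.sqrt(var_p)
```

The derivatives a_0 and a_1 are evaluated at the estimated coefficients, mirroring how the population quantities in Equation (11.19) are replaced by estimates in practice.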
APPENDIX 11.3
Other Limited Dependent Variable Models
This appendix surveys some models for limited dependent variables, other than binary variables, found in econometric applications. In most cases the OLS estimators of the parameters of limited dependent variable models are inconsistent, and estimation is routinely done using maximum likelihood. There are several advanced references available to the reader interested in further details.
Censored and Truncated Regression Models
Buyers of cars spend positive amounts, but nonbuyers spend $0. Thus the distribution of car expenditures is a combination of a discrete distribution (at zero) and a continuous distribution.

The Nobel laureate James Tobin developed a useful model for a dependent variable with a partly continuous and partly discrete distribution (Tobin, 1958). Tobin suggested modeling the i-th individual in the sample as having a desired level of spending, Y*_i, that is related to the regressors by a linear regression model. That is,

Y*_i = β_0 + β_1X_i + u_i,  i = 1, ..., n.   (11.21)

If Y*_i (what the consumer wants to spend) exceeds some cutoff, such as the minimum price of a car, then the consumer buys the car and spends Y_i = Y*_i, which is observed. However, if Y*_i falls below the cutoff, no purchase is made and observed spending is zero.

When Equation (11.21) is estimated using observed expenditures Y_i in place of Y*_i, the OLS estimator is inconsistent. Tobin solved this problem by deriving the likelihood function using the additional assumption that u_i has a normal distribution, and the resulting MLE has been used by applied econometricians to analyze many problems in economics. In Tobin's honor, Equation (11.21), combined with the assumption of normal errors, is called the tobit regression model. The tobit model is an example of a censored regression model, so called because the dependent variable has been censored above or below a certain cutoff.

This would be the case if the data were obtained by simple random sampling of the adult population. If, however, the data are collected from sales tax records, then the data would include only buyers: There would be no data at all for nonbuyers. Data in which observations are unavailable above or below a threshold (data for buyers only) are called truncated data. The truncated regression model is a regression model applied to data in which observations are simply unavailable above or below a threshold.

Sample Selection Models
In a sample selection model, the selection mechanism (an individual is in the sample by virtue of buying a car) is related to the value of the dependent variable (the price of the car), as discussed in the box in Section 11.4. One approach to estimating sample selection models is to develop two equations, one for Y*_i and one for whether Y*_i is observed. The parameters of the model can then be estimated by maximum likelihood.
Count Data
Count data arise when the dependent variable is a counting number, for example, the number of restaurant meals eaten by a consumer in a week. When these numbers are large, the variable can be treated as approximately continuous, but when they are small, the continuous approximation is a poor one. The linear regression model, estimated by OLS, can be used for count data, even if the number of counts is small. Predicted values from the regression are interpreted as the expected value of the dependent variable, conditional on the regressors. So, when the dependent variable is the number of restaurant meals eaten, a predicted value of 1.7 means an average of 1.7 restaurant meals per week. As in the binary response model, however, OLS does not take advantage of the special structure of count data and can yield nonsense predictions, for example, -0.2 restaurant meals per week. Just as probit and logit eliminate nonsense predictions when the dependent variable is binary, special models do so for count data. The two most widely used models are the Poisson and negative binomial regression models.
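A minimal sketch of Poisson regression on simulated count data (all parameter values hypothetical): the conditional mean is modeled as exp(β_0 + β_1X), which keeps every predicted count positive, and the coefficients are estimated by maximizing the Poisson log likelihood numerically.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
beta_true = np.array([0.5, 0.3])  # hypothetical coefficients for the simulation
y = rng.poisson(np.exp(beta_true[0] + beta_true[1] * x))

def neg_loglik(beta):
    lam = np.exp(beta[0] + beta[1] * x)
    # Poisson log likelihood, dropping ln(y!) which does not depend on beta
    return -(y * np.log(lam) - lam).sum()

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
b0_hat, b1_hat = res.x

# Predicted counts exp(b0 + b1*x) are strictly positive: no "nonsense" predictions
pred = np.exp(b0_hat + b1_hat * x)
```

Unlike an OLS fit to the same data, the exponential mean function can never produce a negative predicted count, which is the point of using a count-data model.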
Ordered Responses
Ordered response data arise when mutually exclusive qualitative categories have a natural ordering, such as obtaining a high school degree, obtaining some college education (but not graduating), or graduating from college. Like count data, ordered response data have a natural ordering, but unlike count data they do not have natural numerical values. Because the outcomes lack natural numerical values, ordered data are often analyzed using the ordered probit model, in which the probabilities of each outcome (e.g., college education), conditional on the independent variables (such as parents' income), are modeled using the cumulative normal distribution.
Discrete Choice Data
Discrete choice data arise when the outcomes are unordered qualitative alternatives. One example in economics is the mode of transport chosen by a commuter: She might take the subway, ride the bus, drive, or make her way under her own power (walk, bicycle). If we were to analyze these choices, the dependent variable would have multiple possible outcomes, but those outcomes cannot be ordered in any natural way. Instead, the outcomes are a choice among distinct qualitative alternatives. The econometric task is to model the probability of choosing the various options, given various regressors such as individual characteristics (how far the commuter's house is from the subway station) and the characteristics of each option (the price of the subway). As discussed in the box in Section 11.4, models for the analysis of discrete choice data can be developed from principles of utility maximization. Individual choice probabilities can be expressed in probit or logit form, and those models are called multinomial probit and multinomial logit regression models.
CHAPTER 12
Instrumental Variables Regression
Chapter 9 discussed several problems, including omitted variables, errors-in-variables, and simultaneous causality, that make the error term correlated with the regressor. Omitted variable bias can be addressed directly by including the omitted variable in a multiple regression, but this is only feasible if you have data on the omitted variable. Instrumental variables (IV) regression offers a different route: Think of the variation in X as having two parts, one part that is correlated with u (this is the part that causes the problems), and a second part that is uncorrelated with u. If you had information that allowed you to isolate the second part, then you could focus on those variations in X that are uncorrelated with u and disregard the variations in X that bias the OLS estimates. This is, in fact, what IV regression does. The information about the movements in X that are uncorrelated with u is gleaned from one or more additional variables, called instrumental variables or simply instruments. Instrumental variables regression uses these additional variables as tools or "instruments" to isolate the movements in X that are uncorrelated with u.
The first two sections of this chapter describe the mechanics and assumptions of instrumental variables regression, which are then applied to estimating the demand for cigarettes. Finally, Section 12.5 turns to the difficult question of where valid instruments come from in the first place.

12.1 The IV Estimator with a Single Regressor and a Single Instrument
We begin with the regression model

Y_i = β_0 + β_1X_i + u_i,   (12.1)

where, as usual, u_i is the error term representing omitted factors that determine Y_i. If X_i and u_i are correlated, the OLS estimator is inconsistent. Instrumental variables estimation uses an additional, "instrumental" variable Z to isolate that part of X that is uncorrelated with u_i.
Instrumental variables regression has some specialized terminology to distinguish variables that are correlated with the population error term u from ones that are not. Variables correlated with the error term are called endogenous variables, while variables uncorrelated with the error term are called exogenous variables. The historical source of these terms traces to models with multiple equations, in which an "endogenous" variable is determined within the model while an "exogenous" variable is determined outside the model. For example, Section 9.2 considered the possibility that, if low test scores produced decreases in the student-teacher ratio because of political intervention and increased funding, then causality would run both from the student-teacher ratio to test scores and from test scores to the student-teacher ratio. This was represented mathematically as a system of two simultaneous equations [Equations (9.3) and (9.4)], one for each causal connection. As discussed in Section 9.2, because both test scores and the student-teacher ratio are determined within the model, both are correlated with the population error term u; that is, in this example, both variables are endogenous. In contrast, an exogenous variable, which is determined outside the model, is uncorrelated with u.
The two conditions for a valid instrument Z_i are:
1. Instrument relevance: corr(Z_i, X_i) ≠ 0.
2. Instrument exogeneity: corr(Z_i, u_i) = 0.
If an instrument is relevant, then variation in the instrument is related to variation in X_i. If in addition the instrument is exogenous, then that part of the variation of X_i captured by the instrumental variable is exogenous. Thus, an instrument that is relevant and exogenous can capture movements in X_i that are exogenous. This exogenous variation can in turn be used to estimate the population coefficient β_1.
The two conditions for a valid instrument are vital for instrumental variables regression, and we return to them (and their extension to multiple regressors and multiple instruments) repeatedly throughout this chapter.
problematic component that may be correlated with the regression error and another problem-free component that is uncorrelated with the error. The second stage uses the problem-free component to estimate β_1.

The first stage begins with a population regression linking X and Z:

X_i = π_0 + π_1Z_i + v_i,   (12.2)

where π_0 is the intercept, π_1 is the slope, and v_i is the error term. This regression provides the needed decomposition of X_i. One component is π_0 + π_1Z_i, the part of X_i that can be predicted by Z_i. Because Z_i is exogenous, this component of X_i is uncorrelated with u_i, the error term in Equation (12.1). The other component of X_i is v_i, which is the problematic component of X_i that is correlated with u_i.

The idea behind TSLS is to use the problem-free component of X_i, π_0 + π_1Z_i, and to disregard v_i. The only complication is that the values of π_0 and π_1 are unknown, so π_0 + π_1Z_i cannot be calculated. Accordingly, the first stage of TSLS applies OLS to Equation (12.2) and uses the predicted value from the OLS regression, X̂_i = π̂_0 + π̂_1Z_i, where π̂_0 and π̂_1 are the OLS estimates.

The second stage of TSLS is easy: Regress Y_i on X̂_i using OLS. The resulting estimators from the second-stage regression are the TSLS estimators, β̂_0^TSLS and β̂_1^TSLS.
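The two stages can be sketched with simulated data (all parameter values hypothetical). The first stage regresses X on Z by OLS and forms the fitted values X̂; the second stage regresses Y on X̂. With one regressor and one instrument, the result coincides with the ratio of sample covariances used later in the chapter.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)            # instrument: relevant and exogenous
v = rng.normal(size=n)
u = rng.normal(size=n) + 0.8 * v  # error correlated with X through v
x = 1.0 + 0.7 * z + v             # first-stage population relation
y = 2.0 + 0.5 * x + u             # true beta1 = 0.5 (hypothetical)

# Stage 1: OLS of X on Z, then form fitted values X-hat
pi1 = np.cov(z, x, ddof=1)[0, 1] / np.var(z, ddof=1)
pi0 = x.mean() - pi1 * z.mean()
x_hat = pi0 + pi1 * z

# Stage 2: OLS of Y on X-hat gives the TSLS slope
b1_tsls = np.cov(x_hat, y, ddof=1)[0, 1] / np.var(x_hat, ddof=1)

# Equivalent one-line formula: sample cov(Z, Y) / sample cov(Z, X)
b1_ratio = np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1]
```

The two computations agree up to floating-point error because, with a single instrument, the second-stage slope is algebraically identical to the covariance ratio.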
The solution to this problem may have been developed collaboratively with his son, Sewall Wright (see the box). Philip Wright was concerned with an important economic problem of his day: how to set an import tariff (a tax on imported goods) on animal and vegetable oils and fats, such as butter and soy oil. In the 1920s, import tariffs were a major source of tax revenue for the United States. The key to understanding the economic effect of a tariff was having quantitative estimates of the demand and supply curves of the goods. Recall that the supply elasticity is the percentage change in the quantity supplied arising from a 1% increase in the price, and the demand elasticity is the percentage change in the quantity demanded arising from a 1% increase in the price. Philip Wright needed estimates of these elasticities of supply and demand.
[Box: The instrumental variables analysis appeared in Appendix B of Philip Wright's 1928 book, The Tariff on Animal and Vegetable Oils. When the appendix is compared statistically to texts known to have been written independently by Philip and by Sewall, the results are clear: Philip was the author. The remainder of the box is not recoverable from this extraction.]

In the notation of this chapter, Wright's demand equation is

ln(Q_i^butter) = β_0 + β_1 ln(P_i^butter) + u_i,   (12.3)

where Q_i^butter is the quantity of butter consumed and P_i^butter is its
price. and u, represents other factors that affect demand. such as income and consumer tastes. In Equation (12.3), a 1% increase in the price of butter yields a {3 1
percent change in demand, so {31 is the demand elao;ticity.
Philip Wright had data on total annual butter consumption and its a' erage
annual price in the United States for 1912 to 1922. Ir would have been ~v to uc;e
these data to estimate the demand elasticity by applying OLS to Equat1on ( 12.3).
but he had a key insight: Because of the interactions between surP.I) :tnd demand
the regressor, in(~'"'') was likely to be correlated" itb the error term
To see this, look at Figure 12.1a, which shows the market demand and supply curves for butter for three different years. The demand and supply curves for the first period are denoted D1 and S1, and the first period's equilibrium price and quantity are determined by their intersection. In year 2, demand increases from D1 to D2 (say, because of an increase in income) and supply decreases from S1 to S2 (because of an increase in the cost of producing butter); the equilibrium price and quantity are determined by the intersection of the new supply and demand curves. In year 3, the factors affecting demand and supply change again: Demand increases again to D3, supply increases to S3, and a new equilibrium quantity and price are determined. Figure 12.1b shows the equilibrium quantity and price pairs for these three periods and for eight subsequent years, where in each year the supply and demand curves are subject to shifts associated with factors other than price that affect market supply and demand. This scatterplot is like the one that Wright would have seen when he plotted his data. As he reasoned, fitting a line to these points by OLS will estimate neither a demand curve nor a supply curve, because the points have been determined by changes in both demand and supply.
Wright realized that a way to get around this problem was to find some third variable that shifted supply but did not shift demand. Figure 12.1c shows what happens when such a variable shifts the supply curve but demand remains stable: Now all of the equilibrium price and quantity pairs lie on a stable demand curve, and the slope of the demand curve is easily estimated. In the instrumental variable formulation of Wright's problem, this third variable, the instrumental variable, is correlated with price (it shifts the supply curve, which leads to a change in price) but is uncorrelated with u (the demand curve remains stable). Wright considered several potential instrumental variables; one was the weather. For example, below-average rainfall in a dairy region could impair grazing and thus reduce butter production at a given price (it would shift the supply curve to the left and increase the equilibrium price), so dairy-region rainfall satisfies the condition for instrument relevance. But dairy-region rainfall should not have a direct influence on the demand for butter, so the correlation between dairy-region rainfall and u_i should be zero.
[FIGURE 12.1 (caption partially recoverable): (a) Price and quantity are determined by the intersection of the supply and demand curves; equilibria for three periods are shown. (b) Equilibrium price and quantity for 11 periods, in which both the supply and demand curves shift. (c) When the supply curve shifts from S1 to S2 to S3 but the demand curve remains at D1, the equilibrium prices and quantities trace out the demand curve.]
Despite controlling for student and district characteristics, the estimates of the effect on test scores of class size reported in Part II still might have omitted variables bias resulting from unmeasured variables such as learning opportunities outside school or the quality of the teachers. If data on these variables are unavailable, the omitted variables bias cannot be addressed by including the variables in the multiple regressions.

Instrumental variables regression provides an alternative approach to this problem. Consider the following hypothetical example: Some California schools are forced to close for repairs because of a summer earthquake. Districts closest to the epicenter are most severely affected. A district with some closed schools needs to "double up" its students, temporarily increasing class size. This means that distance from the epicenter satisfies the condition for instrument relevance, because it is correlated with class size. But if distance to the epicenter is unrelated to any of the other factors affecting student performance (such as whether the students are still learning English), then it will be exogenous, because it is uncorrelated with the error term. Thus the instrumental variable, distance to the epicenter, could be used to circumvent omitted variables bias and to estimate the effect of class size on test scores.
The Sampling Distribution of the TSLS Estimator
The exact distribution of the TSLS estimator in small samples is complicated. However, like the OLS estimator, its distribution in large samples is simple: The TSLS estimator is consistent and is normally distributed.

Formula for the TSLS estimator. Although the two stages of TSLS make the estimator seem complicated, when there is a single X and a single instrument Z, as we assume in this section, there is a simple formula for the TSLS estimator. Let s_ZY be the sample covariance between Z and Y and let s_ZX be the sample covariance between Z and X. As shown in Appendix 12.2, the TSLS estimator with a single instrument is

β̂_1^TSLS = s_ZY / s_ZX.   (12.4)

That is, the TSLS estimator of β_1 is the ratio of the sample covariance between Z and Y to the sample covariance between Z and X.
Sampling distribution of β̂_1^TSLS when the sample size is large. The formula in Equation (12.4) can be used to show that β̂_1^TSLS is consistent and, in large samples, normally distributed. The argument is summarized here, with mathematical details given in Appendix 12.3.

The argument that β̂_1^TSLS is consistent combines the assumptions that Z_i is relevant and exogenous with the consistency of sample covariances for population covariances. To begin, note that because Y_i = β_0 + β_1X_i + u_i in Equation (12.1),

cov(Z_i, Y_i) = cov[Z_i, (β_0 + β_1X_i + u_i)] = β_1cov(Z_i, X_i) + cov(Z_i, u_i),   (12.5)

where the second equality follows from the properties of covariances [Equation (2.33)]. By the instrument exogeneity assumption, cov(Z_i, u_i) = 0, so

β_1 = cov(Z_i, Y_i) / cov(Z_i, X_i).   (12.6)

That is, the population coefficient β_1 is the ratio of the population covariance between Z and Y to the population covariance between Z and X.

As discussed in Section 3.7, the sample covariance is a consistent estimator of the population covariance; that is, s_ZY is a consistent estimator of cov(Z_i, Y_i) and s_ZX is a consistent estimator of cov(Z_i, X_i). It follows from Equations (12.4) and (12.6) that the TSLS estimator is consistent:
β̂_1^TSLS = s_ZY / s_ZX → cov(Z_i, Y_i) / cov(Z_i, X_i) = β_1,   (12.7)

where the convergence is in probability. The formula in Equation (12.4) also can be used to show that the sampling distribution of β̂_1^TSLS is normal in large samples. The reason is the same as for every other least squares estimator we have considered: The TSLS estimator is an average of random variables, and when the sample size is large the central limit theorem tells us that averages of random variables are normally distributed. Specifically, the numerator of the expression for β̂_1^TSLS in Equation (12.4) is s_ZY = [1/(n − 1)] Σ_{i=1}^{n} (Z_i − Z̄)(Y_i − Ȳ), an average of (Z_i − Z̄)(Y_i − Ȳ). A bit of algebra, sketched out in Appendix 12.3, shows that because of this averaging the central limit theorem implies that, in large samples, β̂_1^TSLS has a sampling distribution that is approximately N(β_1, σ²_{β̂_1^TSLS}), where

σ²_{β̂_1^TSLS} = (1/n) var[(Z_i − μ_Z)u_i] / [cov(Z_i, X_i)]².   (12.8)
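The consistency argument can be illustrated by simulation (a sketch with hypothetical parameter values): when X is correlated with u, the OLS slope stays away from β_1 even in large samples, while the covariance-ratio TSLS estimator of Equation (12.4) centers on β_1.

```python
import numpy as np

rng = np.random.default_rng(3)
beta1 = 0.5                  # true coefficient (hypothetical)
n, reps = 2000, 200
ols, tsls = [], []
for _ in range(reps):
    z = rng.normal(size=n)                # exogenous, relevant instrument
    v = rng.normal(size=n)
    u = rng.normal(size=n) + 0.8 * v      # endogeneity: corr(X, u) != 0
    x = 0.7 * z + v
    y = beta1 * x + u
    ols.append(np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))
    tsls.append(np.cov(z, y, ddof=1)[0, 1] / np.cov(z, x, ddof=1)[0, 1])

ols_bias = np.mean(ols) - beta1    # roughly cov(X, u)/var(X) = 0.8/1.49 here
tsls_bias = np.mean(tsls) - beta1  # close to zero
```

In this design the OLS slope converges to β_1 plus cov(X, u)/var(X), a bias that does not shrink with n, whereas the TSLS estimator is centered on β_1 with sampling spread governed by Equation (12.8).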
…commodities, such as cigarettes, figure more prominently in public policy debates. One tool in the quest for reducing illnesses and deaths from smoking, and the costs, or externalities, imposed by those illnesses on the rest of society, is to tax cigarettes so heavily that current smokers cut back and potential new smokers are discouraged from taking up the habit. But precisely how big a tax hike is needed to make a dent in cigarette consumption? For example, what would the after-tax sales price of cigarettes need to be to achieve a 20% reduction in cigarette consumption?

The answer to this question depends on the elasticity of demand for cigarettes. If the elasticity is −1, then the 20% target in consumption can be achieved by a 20% increase in price. If the elasticity is −0.5, then the price must rise 40% to decrease consumption by 20%. Of course, we do not know what the demand elasticity of cigarettes is in the abstract: We must estimate it from data on prices and sales. But, as with butter, because of the interactions between supply and demand, the elasticity of demand for cigarettes cannot be estimated consistently by an OLS regression of log quantity on log price.

We therefore use TSLS to estimate the elasticity of demand for cigarettes using annual data for the 48 continental U.S. states for 1985-1995 (the data are described in Appendix 12.1). For now, all the results are for the cross section of states in 1995; results using data for earlier years (panel data) are presented in Section 12.4.

The instrumental variable, SalesTax_i, is the portion of the tax on cigarettes arising from the general sales tax, measured in dollars per pack (in real dollars, deflated by the Consumer Price Index). Cigarette consumption, Q_i^cigarettes, is the number of packs of cigarettes sold per capita in the state, and the price, P_i^cigarettes, is the average real price per pack of cigarettes including all taxes.
Before using TSLS it is essential to ask whether the two conditions for instrument validity hold. We return to this topic in detail in Section 12.3, where we provide some statistical tools that help in this assessment. Even with those statistical tools, judgment plays an important role, so it is useful to think about whether the sales tax on cigarettes plausibly satisfies the two conditions.
First consider instrument relevance. Because a high sales tax increases the total sales price of cigarettes, the sales tax per pack plausibly satisfies the condition for instrument relevance.
Next consider instrument exogeneity. For the sales tax to be exogenous, it must be uncorrelated with the error in the demand equation; that is, the sales tax must affect the demand for cigarettes only indirectly through the price. This seems plausible: General sales tax rates vary from state to state, but they do so mainly because different states choose different mixes of sales, income, property, and other taxes to finance public undertakings. These choices about public finance are driven by political considerations, not by factors related to the demand for cigarettes. We discuss the credibility of this assumption more in Section 12.4, but for now we keep it as a working hypothesis.
In modern statistical software, the first stage of TSLS is estimated automatically, so you do not need to run this regression yourself to compute the TSLS estimator. Just this once, however, we present the first-stage regression explicitly: using data for the 48 states in 1995, it is

    ln(P̂i^cigarettes) = 4.63 + 0.031 SalesTaxi.    (12.9)
                        (0.03)  (0.005)
The R² of this regression is 47%, so the variation in the sales tax on cigarettes explains 47% of the variance of cigarette prices across states.

The second stage of TSLS regresses ln(Qi^cigarettes) on the fitted values from the first stage. The resulting estimated regression function is

    ln(Q̂i^cigarettes) = 9.72 − 1.08 ln(P̂i^cigarettes).    (12.10)

This estimated regression function is written using the regressor in the second stage, the predicted value ln(P̂i^cigarettes). It is conventional and less cumbersome simply to report the estimated regression function with ln(Pi^cigarettes) rather than ln(P̂i^cigarettes). Reported in this notation, the TSLS estimates and heteroskedasticity-robust standard errors are

    ln(Q̂i^cigarettes) = 9.72 − 1.08 ln(Pi^cigarettes).    (12.11)
                        (1.53)  (0.32)
The TSLS estimate suggests that the demand for cigarettes is surprisingly elastic, in light of their addictive nature: An increase in the price of 1% reduces consumption by 1.08%. But, recalling our discussion of instrument exogeneity, perhaps this estimate should not yet be taken too seriously. Even though the elasticity was estimated using an instrumental variable, there might still be omitted variables that are correlated with the sales tax per pack. A leading candidate is income: States with higher incomes might depend relatively less on a sales tax and more on an income tax to finance state government. Moreover, the demand for cigarettes presumably depends on income. Thus we would like to reestimate our demand equation including income as an additional regressor. To do so, however, we must first extend the IV regression model to include additional regressors.
The general IV regression model has four types of variables: the dependent variable, Y; problematic endogenous regressors, like the price of cigarettes, which are potentially correlated with the error term and which we will label X; additional regressors that are not correlated with the error term, called included exogenous variables, which we will label W; and instrumental variables, Z. In general, there
can be multiple endogenous regressors (X's), multiple included exogenous regressors (W's), and multiple instrumental variables (Z's). The general IV regression model and its terminology are summarized in Key Concept 12.1.

KEY CONCEPT 12.1
THE GENERAL IV REGRESSION MODEL

The general IV regression model is

    Yi = β0 + β1·X1i + ... + βk·Xki + βk+1·W1i + ... + βk+r·Wri + ui,  i = 1, ..., n,    (12.12)

where
- Yi is the dependent variable;
- β0, β1, ..., βk+r are unknown regression coefficients;
- X1i, ..., Xki are k endogenous regressors, which are potentially correlated with ui;
- W1i, ..., Wri are r included exogenous regressors, which are uncorrelated with ui;
- ui is the regression error term; and
- Z1i, ..., Zmi are m instrumental variables.

The coefficients are overidentified if there are more instruments than endogenous regressors (m > k); they are underidentified if m < k; and they are exactly identified if m = k. Estimation of the IV regression model requires exact identification or overidentification.
When there is a single endogenous regressor, X, and some additional included exogenous variables, the equation of interest is

    Yi = β0 + β1·Xi + β2·W1i + ... + β1+r·Wri + ui,    (12.13)

where, as before, Xi might be correlated with the error term, but W1i, ..., Wri are not.
The population first-stage regression of TSLS relates X to the exogenous variables, that is, the W's and the instruments (Z's):

    Xi = π0 + π1·Z1i + ... + πm·Zmi + πm+1·W1i + ... + πm+r·Wri + vi,    (12.14)

where π0, ..., πm+r are unknown regression coefficients and vi is an error term.
In the first stage of TSLS, the coefficients of Equation (12.14) are estimated by OLS, and the predicted values from this regression are computed; in the second stage, Yi is regressed by OLS on those predicted values and the included exogenous variables. The resulting estimator is summarized in Key Concept 12.2.

KEY CONCEPT 12.2
TWO STAGE LEAST SQUARES

The TSLS estimator in the general IV regression model in Equation (12.12) with multiple instrumental variables is computed in two stages:

1. First-stage regression(s): Regress X1i on the instrumental variables (Z1i, ..., Zmi), the included exogenous variables (W1i, ..., Wri), and a constant, using OLS. Compute the predicted values X̂1i; repeat this for all the endogenous regressors, obtaining X̂1i, ..., X̂ki.

2. Second-stage regression: Regress Yi on the predicted values of the endogenous variables (X̂1i, ..., X̂ki) and the included exogenous variables (W1i, ..., Wri) using OLS. The TSLS estimators β̂0^TSLS, ..., β̂k+r^TSLS are the estimators from the second-stage regression.

In practice, the two stages are done automatically within TSLS estimation commands in modern econometric software.

Instrument Relevance and Exogeneity in the General IV Model

The conditions of instrument relevance and exogeneity need to be modified for the general IV regression model.

When there is one included endogenous variable but multiple instruments, the condition for instrument relevance is that at least one Z is useful for predicting X, given W. When there are multiple included endogenous variables, this condition is more complicated because we must rule out perfect multicollinearity in the second-stage population regression. Intuitively, when there are multiple included endogenous variables, the instruments must provide enough information about the exogenous movements in these variables to sort out their separate effects on Y.
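The two stages of Key Concept 12.2 translate directly into two OLS regressions. The sketch below carries them out with numpy on simulated data (one endogenous X, one included exogenous W, two instruments; all numeric values are illustrative assumptions). In practice you would use a packaged TSLS command, not least because of the standard-error issue discussed later in this section.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
b0, b1, b2 = 2.0, -1.0, 0.5          # assumed coefficients on 1, X, and W

w = rng.normal(size=n)               # included exogenous variable
z1 = rng.normal(size=n)              # two instruments
z2 = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + 0.3 * w + 0.7 * u + rng.normal(size=n)  # endogenous
y = b0 + b1 * x + b2 * w + u

ones = np.ones(n)

# Stage 1: regress X on a constant, the instruments, and the W's; keep fitted values.
S1 = np.column_stack([ones, z1, z2, w])
x_hat = S1 @ np.linalg.lstsq(S1, x, rcond=None)[0]

# Stage 2: regress Y on a constant, the fitted X, and the W's.
S2 = np.column_stack([ones, x_hat, w])
beta_tsls = np.linalg.lstsq(S2, y, rcond=None)[0]

print(beta_tsls)                     # close to (b0, b1, b2)
```

Because the instruments move X independently of u, the second-stage coefficient on the fitted values recovers the assumed b1 even though X itself is correlated with the error.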
The general statement of the instrument exogeneity condition is that each instrument must be uncorrelated with the error term ui. The general conditions for valid instruments are given in Key Concept 12.3.

The IV regression assumptions. The IV regression assumptions are modifications of the least squares assumptions for the multiple regression model in Key Concept 6.4.

The first IV regression assumption modifies the conditional mean assumption in Key Concept 6.4 to apply to the included exogenous variables only. Just like the second least squares assumption for the multiple regression model, the second IV
KEY CONCEPT 12.3
THE TWO CONDITIONS FOR VALID INSTRUMENTS

1. Instrument Relevance
In general, let X̂1i be the predicted value of X1i from the population regression of X1i on the instruments (Z's) and the included exogenous regressors (W's), and let "1" denote the constant regressor that takes on the value 1 for all observations. Then (X̂1i, ..., X̂ki, W1i, ..., Wri, 1) are not perfectly multicollinear.

If there is only one X, then for the previous condition to hold, at least one Z must enter the population regression of X on the Z's and the W's.

2. Instrument Exogeneity
The instruments are uncorrelated with the error term; that is, corr(Z1i, ui) = 0, ..., corr(Zmi, ui) = 0.
regression assumption is that the draws are i.i.d., as they are if the data are collected by simple random sampling. Similarly, the third IV regression assumption is that large outliers are unlikely.

The fourth IV regression assumption is that the two conditions for instrument validity in Key Concept 12.3 hold. The instrument relevance condition in Key Concept 12.3 subsumes the fourth least squares assumption in Key Concept 6.4 (no perfect multicollinearity) by assuming that the regressors in the second-stage regression are not perfectly multicollinear.
KEY CONCEPT 12.4
THE IV REGRESSION ASSUMPTIONS

The variables and errors in the IV regression model in Key Concept 12.1 satisfy:

1. E(ui | W1i, ..., Wri) = 0;
2. (X1i, ..., Xki, W1i, ..., Wri, Z1i, ..., Zmi, Yi) are i.i.d. draws from their joint distribution;
3. Large outliers are unlikely: The X's, W's, Z's, and Y have nonzero finite fourth moments; and
4. The two conditions for a valid instrument in Key Concept 12.3 hold.
Because the sampling distribution of the TSLS estimator is normal in large samples, the general procedures for statistical inference (hypothesis tests and confidence intervals) in regression models extend to TSLS regression. For example, 95% confidence intervals are constructed as the TSLS estimator ± 1.96 standard errors. Similarly, joint hypotheses about the population values of the coefficients can be tested using the F-statistic, as described in Section 7.2.
Calculation of TSLS standard errors. There are two points to bear in mind about TSLS standard errors. First, the standard errors reported by OLS estimation of the second-stage regression are incorrect because they do not recognize that it is the second stage of a two-stage process. Specifically, the second-stage OLS standard errors fail to adjust for the fact that the second-stage regression uses the predicted values of the included endogenous variables. Formulas for standard errors that make the necessary adjustment are incorporated into (and automatically used by) TSLS regression commands in econometric software. Therefore this issue is not a concern in practice if you use a specialized TSLS regression command.

Second, as always, the errors ui might be heteroskedastic. It is therefore important to use heteroskedasticity-robust versions of the standard errors, for precisely the same reason as it is important to use heteroskedasticity-robust standard errors for the OLS estimators of the multiple regression model.
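The first point can be made concrete numerically. The sketch below (simulated data with illustrative, assumed coefficient values) computes the homoskedasticity-only standard error of the second-stage slope two ways: once from the naive second-stage residuals, which are built with the predicted values X̂, and once from residuals built with the actual X, as proper TSLS formulas require.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + 0.8 * u + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + u                        # assumed coefficients

ones = np.ones(n)

# The two stages of TSLS.
S1 = np.column_stack([ones, z])
x_hat = S1 @ np.linalg.lstsq(S1, x, rcond=None)[0]
S2 = np.column_stack([ones, x_hat])
b = np.linalg.lstsq(S2, y, rcond=None)[0]
xtx_inv_11 = np.linalg.inv(S2.T @ S2)[1, 1]

# Naive second-stage residuals use x_hat ...
e_naive = y - S2 @ b
se_naive = np.sqrt(e_naive @ e_naive / (n - 2) * xtx_inv_11)

# ... but correct TSLS residuals use the actual X.
e_tsls = y - np.column_stack([ones, x]) @ b
se_tsls = np.sqrt(e_tsls @ e_tsls / (n - 2) * xtx_inv_11)

print(se_naive, se_tsls)   # the naive standard error is badly off here
```

In this particular design the naive standard error is several times too large; in other designs it can be too small. Either way, it is not the TSLS standard error, which is why packaged TSLS commands should be used.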
(the logarithm of the real price per pack) and a single instrument (the real sales tax per pack). Income also affects demand, however, so it is part of the error term of the population regression. As discussed in Section 12.1, if the state sales tax is related to state income, then it is correlated with a variable in the error term of the cigarette demand equation, which violates the instrument exogeneity condition. If so, the IV estimator in Section 12.1 is inconsistent. That is, the IV regression suffers from a version of omitted variable bias. To solve this problem, we need to include income in the regression.
We therefore consider an alternative specification in which the logarithm of income is included in the demand equation. In the terminology of Key Concept 12.1, the dependent variable Y is the logarithm of consumption, ln(Qi^cigarettes); the endogenous regressor X is the logarithm of the real after-tax price, ln(Pi^cigarettes); the included exogenous variable W is the logarithm of the real per capita state income, ln(Inci); and the instrument Z is the real sales tax per pack, SalesTaxi. The TSLS estimates and (heteroskedasticity-robust) standard errors are
    ln(Q̂i^cigarettes) = 9.43 − 1.14 ln(Pi^cigarettes) + 0.21 ln(Inci).    (12.15)
                        (1.26)  (0.37)                  (0.31)

Equation (12.15) uses a single instrument, SalesTaxi. A second instrument is also available: CigTaxi, the portion of the tax on cigarettes arising from cigarette-specific taxes. Using both SalesTaxi and CigTaxi as instruments, so that the coefficient on the endogenous regressor is overidentified, the TSLS estimates are

    ln(Q̂i^cigarettes) = 9.89 − 1.28 ln(Pi^cigarettes) + 0.28 ln(Inci).    (12.16)
                        (0.96)  (0.25)                  (0.25)
Compare Equations (12.15) and (12.16): The standard error of the estimated price elasticity is smaller by one-third in Equation (12.16) [0.25 in Equation (12.16) versus 0.37 in Equation (12.15)]. The reason the standard error is smaller in Equation (12.16) is that this estimate uses more information than Equation (12.15): In Equation (12.15), only one instrument is used (the sales tax), but in Equation (12.16), two instruments are used (the sales tax and the cigarette-specific tax). Using two instruments explains more of the variation in cigarette prices than using just one, and this is reflected in smaller standard errors on the estimated demand elasticity.

Are these estimates credible? Ultimately, credibility depends on whether the set of instrumental variables (here, the two taxes) plausibly satisfies the two conditions for valid instruments. It is therefore vital that we assess whether these instruments are valid, and it is to this topic that we now turn.
The role of the instrument relevance condition in IV regression is subtle. One way to think of instrument relevance is that it plays a role akin to the sample size: The more relevant the instruments, that is, the more of the variation in X that is explained by the instruments, the more information is available for use in IV regression. A more relevant instrument produces a more accurate estimator, just as a larger sample size produces a more accurate estimator. Moreover, statistical inference using TSLS is predicated on the TSLS estimator having a normal sampling distribution, but according to the central limit theorem the normal distribution is a good approximation in large, but not necessarily small, samples. If having a more relevant instrument is like having a larger sample size, this suggests, correctly, that the more relevant is the instrument, the better is the normal approximation to the sampling distribution of the TSLS estimator and its t-statistic.

Instruments that explain little of the variation in X are called weak instruments. In the cigarette example, the distance of the state from cigarette
Why weak instruments are a problem. If the instruments are weak, then the normal distribution provides a poor approximation to the sampling distribution of the TSLS estimator, even if the sample size is large. Thus there is no theoretical justification for the usual methods for performing statistical inference, even in large samples. In fact, if instruments are weak, then the TSLS estimator can be badly biased in the direction of the OLS estimator. In addition, 95% confidence intervals constructed as the TSLS estimator ± 1.96 standard errors can contain the true value of the coefficient far less than 95% of the time. In short, if instruments are weak, TSLS is no longer reliable.
To see that there is a problem with the large-sample normal approximation to the sampling distribution of the TSLS estimator, consider the special case, introduced in Section 12.1, of a single included endogenous variable, a single instrument, and no included exogenous regressors. If the instrument is valid, then β̂1^TSLS is consistent because the sample covariances sZY and sZX are consistent; that is, β̂1^TSLS = sZY/sZX converges in probability to cov(Zi, Yi)/cov(Zi, Xi) = β1 [Equation (12.7)]. But now suppose that the instrument is not just weak but irrelevant, so that cov(Zi, Xi) = 0. Then sZX converges in probability to cov(Zi, Xi) = 0, so, taken literally, the denominator on the right-hand side of the limit cov(Zi, Yi)/cov(Zi, Xi) is zero! Clearly, the argument that β̂1^TSLS is consistent breaks down when the instrument relevance condition fails. As shown in Appendix 12.4, this breakdown results in the TSLS estimator having a nonnormal sampling distribution, even if the sample size is very large. In fact, when the instrument is irrelevant, the large-sample distribution of β̂1^TSLS is not that of a normal random variable, but rather the distribution of a ratio of two normal random variables!
While this circumstance of totally irrelevant instruments might not be encountered in practice, it raises a question: How relevant must the instruments be for the normal distribution to provide a good approximation in practice? The answer to this question in the general IV model is complicated. Fortunately, however, there is a simple rule of thumb available for the most common situation in practice, the case of a single endogenous regressor.
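A small Monte Carlo sketch illustrates the breakdown described above (the data-generating design and all numbers are illustrative assumptions): with a completely irrelevant instrument, the TSLS estimates are wildly dispersed and center near the inconsistent OLS probability limit rather than the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta1 = 100, 5_000, 0.0     # true coefficient is zero by construction

est = np.empty(reps)
for r in range(reps):
    z = rng.normal(size=n)                    # irrelevant: unrelated to X
    u = rng.normal(size=n)
    x = 0.9 * u + rng.normal(size=n)          # endogenous, with cov(Z, X) = 0
    y = beta1 * x + u
    est[r] = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# The draws are wildly dispersed (a ratio of normals has no finite variance),
# and the median sits near the inconsistent OLS probability limit,
# cov(X, Y)/var(X) = 0.9/1.81, roughly 0.50, instead of the true beta1 = 0.
print(np.median(est), np.percentile(est, [1, 99]))
```

This matches the two symptoms named in the text: heavy, nonnormal tails (the ratio-of-normals distribution) and bias toward the OLS estimator.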
KEY CONCEPT 12.5
A RULE OF THUMB FOR CHECKING FOR WEAK INSTRUMENTS

The first-stage F-statistic is the F-statistic testing the hypothesis that the coefficients on the instruments Z1i, ..., Zmi equal zero in the first stage of two stage least squares. When there is a single endogenous regressor, a first-stage F-statistic less than 10 indicates that the instruments are weak, in which case the TSLS estimator is biased (even in large samples), and TSLS t-statistics and confidence intervals are unreliable.
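The rule of thumb is straightforward to apply: compute the homoskedasticity-only F-statistic for the hypothesis that all instrument coefficients in the first stage are zero, by comparing restricted and unrestricted sums of squared residuals. A minimal sketch on simulated data (illustrative design and values):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 500, 2                                 # m = number of instruments
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.4 * z1 + 0.2 * z2 + 0.6 * u + rng.normal(size=n)

ones = np.ones(n)

# Unrestricted first stage: X on a constant and both instruments.
S = np.column_stack([ones, z1, z2])
ssr_u = np.sum((x - S @ np.linalg.lstsq(S, x, rcond=None)[0]) ** 2)

# Restricted first stage: instruments excluded (constant only).
ssr_r = np.sum((x - x.mean()) ** 2)

# Homoskedasticity-only F-statistic for "both instrument coefficients are zero".
F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
print(F > 10)   # True here: by this rule of thumb the instruments are not weak
```

With included exogenous W's, the restricted regression would keep the W's and drop only the instruments; the comparison is otherwise the same.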
A Scary Regression

[The text of this box is largely illegible in this scan. It concerns regressing earnings against years of schooling using data on individuals, and the bias that arises if more able individuals are both more successful in school and in the labor market.]
methods for instrumental variables analysis are less sensitive to weak instruments than TSLS; some of these methods are discussed in Appendix 12.5.

Can you test statistically the assumption that the instruments are exogenous? Yes and no. On the one hand, it is not possible to test the hypothesis that the instruments are exogenous when the coefficients are exactly identified. On the other hand, if the coefficients are overidentified, it is possible to test the overidentifying restrictions, that is, to test the hypothesis that the "extra" instruments are exogenous under the maintained assumption that there are enough valid instruments to identify the coefficients of interest.
First consider the case that the coefficients are exactly identified, so you have as many instruments as endogenous regressors. Then it is impossible to develop a statistical test of the hypothesis that the instruments are in fact exogenous. That is, empirical evidence cannot be brought to bear on the question of whether these instruments satisfy the exogeneity restriction. In this case, the only way to assess whether the instruments are exogenous is to draw on expert opinion and your personal knowledge of the empirical problem at hand. For example, Philip Wright's knowledge of agricultural supply and demand led him to suggest that below-average rainfall would plausibly shift the supply curve for butter but would not directly shift the demand curve.

Assessing whether the instruments are exogenous necessarily requires making an expert judgment based on personal knowledge of the application. If, however, there are more instruments than endogenous regressors, then there is a statistical tool that can be helpful in this process: the so-called test of overidentifying restrictions.
The overidentifying restrictions test. Suppose you have a single endogenous regressor, two instruments, and no included exogenous variables. Then you could compute two different TSLS estimators: one using the first instrument, the other using the second. These two estimators will not be the same because of sampling variation, but if both instruments are exogenous, then they will tend to be close to each other. But what if these two instruments produce very different estimates? You might sensibly conclude that there is something wrong with one or the other of the instruments, or both. That is, it would be reasonable to conclude that one or the other, or both, of the instruments are not exogenous.

The test of overidentifying restrictions implicitly makes this comparison. We say implicitly, because the test is carried out without actually computing all the different possible IV estimates. Here is the idea. Exogeneity of the instruments means that they are uncorrelated with ui. This suggests that the instruments should be approximately uncorrelated with ûi^TSLS, where ûi^TSLS = Yi − (β̂0^TSLS + β̂1^TSLS X1i + ... + β̂k+r^TSLS Wri) is the residual from the estimated TSLS regression using all the instruments (approximately rather than exactly because of sampling variation). (Note that these residuals are constructed using the true X's rather than their first-stage predicted values.) Accordingly, if the instruments are in fact exogenous, then the coefficients on the instruments in a regression of ûi^TSLS on the instruments and the included exogenous variables should all be zero, and this hypothesis can be tested.

This method for computing the overidentifying restrictions test is summarized in Key Concept 12.6. The statistic is computed using the homoskedasticity-only F-statistic, and the test statistic is commonly called the J-statistic.

KEY CONCEPT 12.6
THE OVERIDENTIFYING RESTRICTIONS TEST (THE J-STATISTIC)

Let ûi^TSLS be the residuals from TSLS estimation of Equation (12.12). Use OLS to estimate the regression coefficients in

    ûi^TSLS = δ0 + δ1·Z1i + ... + δm·Zmi + δm+1·W1i + ... + δm+r·Wri + ei,

where ei is the regression error term. Let F denote the homoskedasticity-only F-statistic testing the hypothesis that δ1 = ... = δm = 0. The overidentifying restrictions test statistic is J = mF. Under the null hypothesis that all the instruments are exogenous, if ei is homoskedastic, then in large samples J is distributed χ²(m−k), where m − k is the "degree of overidentification," that is, the number of instruments minus the number of endogenous regressors.
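The recipe in Key Concept 12.6 can be sketched directly. The code below (simulated data in which both instruments are, by construction, exogenous; all values are illustrative assumptions) computes TSLS with two instruments, forms the residuals with the actual X, and builds J = mF:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 2_000, 2                          # two instruments, one endogenous regressor (k = 1)

z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z1 + 0.5 * z2 + 0.6 * u + rng.normal(size=n)
y = 1.0 - 1.0 * x + u                    # both instruments truly exogenous here

ones = np.ones(n)

# TSLS using both instruments.
S1 = np.column_stack([ones, z1, z2])
x_hat = S1 @ np.linalg.lstsq(S1, x, rcond=None)[0]
S2 = np.column_stack([ones, x_hat])
b = np.linalg.lstsq(S2, y, rcond=None)[0]

# TSLS residuals, built with the actual X (not x_hat).
u_hat = y - np.column_stack([ones, x]) @ b

# Regress the residuals on the instruments; homoskedasticity-only F for the
# hypothesis that both instrument coefficients are zero; then J = m * F.
ssr_u = np.sum((u_hat - S1 @ np.linalg.lstsq(S1, u_hat, rcond=None)[0]) ** 2)
ssr_r = np.sum((u_hat - u_hat.mean()) ** 2)
F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
J = m * F
print(J)   # under exogeneity, roughly a chi-squared draw with m - k = 1 d.f.
```

Since the null is true in this simulation, J will usually fall below the 5% critical value of 3.84; a large J would instead signal that at least one instrument is correlated with the error.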
12.4 Application to the Demand for Cigarettes*

Our attempt to estimate the elasticity of demand for cigarettes left off with the TSLS estimates summarized in Equation (12.16), in which income was an included exogenous variable and there were two instruments, the general sales tax and the cigarette-specific tax. We can now undertake a more careful evaluation of these instruments.
As in Section 12.1, it makes sense that the two instruments are relevant because taxes are a big part of the after-tax price of cigarettes, and shortly we will look at this empirically. First, however, we focus on the difficult question of whether the two tax variables are plausibly exogenous.

The first step in assessing whether an instrument is exogenous is to think through the arguments for why it may or may not be. This requires thinking about which factors account for the error term in the cigarette demand equation and whether these factors are plausibly related to the instruments.
Why do some states have higher per capita cigarette consumption than others? One reason might be variation in incomes across states, but state income is
*This section assumes knowledge of the material in Sections 10.1 and 10.2, which analyze panel data over two time periods.
included in Equation (12.16), so this is not part of the error term. Another reason is that there are historical factors influencing demand. For example, states that grow tobacco have higher rates of smoking than most other states. Could this factor be related to taxes? Quite possibly: If tobacco farming and cigarette production are important industries in a state, then these industries could exert influence to keep cigarette-specific taxes low. This suggests that an omitted factor in cigarette demand, whether the state grows tobacco and produces cigarettes, could be correlated with cigarette-specific taxes.
One solution to this possible correlation between the error term and the instrument would be to include information on the size of the tobacco and cigarette industry in the state; this is the approach we took when we included income as a regressor in the demand equation. But because we have panel data on cigarette consumption, a different approach is available that does not require this information. As discussed in Chapter 10, panel data make it possible to eliminate the influence of variables that vary across entities (states) but do not change over time, such as the climate and historical circumstances that lead to a large tobacco and cigarette industry in a state. Two methods for doing this were given in Chapter 10: constructing data on changes in the variables between two different time periods, and using fixed effects regression. To keep the analysis here as simple as possible, we adopt the former approach and perform regressions of the type described in Section 10.2, based on the changes in the variables between two different years.
The time span between the two different years influences how the estimated elasticities are to be interpreted. Because cigarettes are addictive, changes in price will take some time to alter behavior. At first, an increase in the price of cigarettes might have little effect on demand. Over time, however, the price increase might contribute to some smokers' desire to quit and, importantly, it could discourage nonsmokers from taking up the habit. Thus the response of demand to a price increase could be small in the short run but large in the long run. Said differently, for an addictive product like cigarettes, demand might be inelastic in the short run, that is, it might have a short-run elasticity near zero, but it might be more elastic in the long run.

In this analysis, we focus on estimating the long-run price elasticity. We do this by considering quantity and price changes that occur over ten-year periods. Specifically, in the regressions considered here, the ten-year change in log quantity, ln(Q_i,1995^cigarettes) − ln(Q_i,1985^cigarettes), is regressed against the ten-year change in log price, ln(P_i,1995^cigarettes) − ln(P_i,1985^cigarettes), and the ten-year change in log income, ln(Inc_i,1995) − ln(Inc_i,1985). Two instruments are used: the change in the sales tax over ten years, SalesTax_i,1995 − SalesTax_i,1985, and the change in the cigarette-specific tax over ten years, CigTax_i,1995 − CigTax_i,1985.
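Constructing these "long differences" is mechanical. A minimal sketch (the per-state values below are made-up illustrations, not the data set described in Appendix 12.1):

```python
import numpy as np

# Made-up per-state values for two years (illustrative only; the actual
# data set is described in Appendix 12.1).
q_1985 = np.array([120.0, 95.0, 110.0])   # packs sold per capita
q_1995 = np.array([100.0, 90.0, 95.0])
p_1985 = np.array([1.10, 1.25, 1.05])     # real price per pack
p_1995 = np.array([1.60, 1.70, 1.55])

# Ten-year changes in logs: the regressand and regressor of the
# differences specification described in Section 10.2.
d_log_q = np.log(q_1995) - np.log(q_1985)
d_log_p = np.log(p_1995) - np.log(p_1985)

print(np.round(d_log_q, 3), np.round(d_log_p, 3))
```

The same differencing is applied to log income and to the two tax instruments before running TSLS on the differenced data.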
The results are presented in Table 12.1. As usual, each column in the table presents the results of a different regression. All regressions have the same regressors, and all coefficients are estimated using TSLS; the only difference between the three regressions is the set of instruments used. In column (1), the only instrument is the sales tax; in column (2), the only instrument is the cigarette-specific tax; and in column (3), both taxes are used as instruments.

In IV regression, the reliability of the coefficient estimates hinges on the validity of the instruments, so the first things to look at in Table 12.1 are the diagnostic statistics assessing the validity of the instruments.
TABLE 12.1  Two Stage Least Squares Estimates of the Demand for Cigarettes Using Panel Data for 48 U.S. States

Dependent variable: ln(Q_i,1995^cigarettes) − ln(Q_i,1985^cigarettes)

Regressor                                   (1)           (2)            (3)
ln(P_i,1995^cig) − ln(P_i,1985^cig)        −0.94**       −1.34**        −1.20**
                                            (0.21)        (0.23)         (0.20)
ln(Inc_i,1995) − ln(Inc_i,1985)             0.53          0.43           0.46
                                            (0.34)        (0.30)         (0.31)
Intercept                                  −0.12         −0.02          −0.05
                                            (0.07)        (0.07)         (0.06)
Instrumental variable(s)                    Sales tax     Cigarette-     Both sales tax and
                                                          specific tax   cigarette-specific tax
First-stage F-statistic                     33.70         107.20         88.60
Overidentifying restrictions
J-test and p-value                                                       4.93
                                                                         (0.026)

These regressions were estimated using data for 48 U.S. states (48 observations on the ten-year differences). The data are described in Appendix 12.1. The J-test of overidentifying restrictions is described in Key Concept 12.6 (its p-value is given in parentheses), and the first-stage F-statistic is described in Key Concept 12.5. Individual coefficients are statistically significant at the *5% level or **1% significance level.
First, are the instruments relevant? The first-stage F-statistics in the three regressions are 33.7, 107.2, and 88.6, so in all three cases the first-stage F-statistics exceed 10. We conclude that the instruments are not weak, so we can rely on the standard methods for statistical inference (hypothesis tests, confidence intervals) using the estimated coefficients and standard errors.
Second, are the instruments exogenous? Because the regressions in columns (1) and (2) each have a single instrument and a single included endogenous regressor, the coefficients in those regressions are exactly identified. Thus we cannot deploy the J-test in either of those regressions. The regression in column (3), however, is overidentified because there are two instruments and a single included endogenous regressor, so there is one (m − k = 2 − 1 = 1) overidentifying restriction. The J-statistic is 4.93; this has a χ² distribution with one degree of freedom, so the 5% critical value is 3.84 (Appendix Table 3) and the null hypothesis that both the instruments are exogenous is rejected at the 5% significance level (this deduction also can be made directly from the p-value of 0.026, reported in the table).
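The p-value reported in Table 12.1 can be verified with the standard library alone, since for one degree of freedom the χ² tail probability P(χ²₁ > x) reduces to a complementary error function, erfc(√(x/2)):

```python
import math

J = 4.93   # J-statistic from column (3); one degree of overidentification

# For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(J / 2))
print(round(p_value, 3))   # 0.026, the p-value reported in Table 12.1
```

Applying the same formula to the 5% critical value 3.84 gives a tail probability of about 0.05, as it should.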
The reason the J-statistic rejects the null hypothesis that both instruments are exogenous is that the two instruments produce rather different estimated coefficients. When the only instrument is the sales tax [column (1)], the estimated price elasticity is −0.94, but when the only instrument is the cigarette-specific tax [column (2)], the estimated price elasticity is −1.34. Recall the basic idea of the J-statistic: If both instruments are exogenous, then the two TSLS estimators using the individual instruments are consistent and differ from each other only because of random sampling variation. If, however, one of the instruments is exogenous and one is not, then the estimator based on the endogenous instrument is inconsistent, which is detected by the J-statistic. In this application, the difference between the two estimated price elasticities is sufficiently large that it is unlikely to be the result of pure sampling variation, so the J-statistic rejects the null hypothesis that both the instruments are exogenous.
The J-statistic rejection means that the regression in column (3) is based on invalid instruments (the instrument exogeneity condition fails). What does this imply about the estimates in columns (1) and (2)? The J-statistic rejection says that at least one of the instruments is endogenous, so there are three logical possibilities: The sales tax is exogenous but the cigarette-specific tax is not, in which case the column (1) regression is reliable; the cigarette-specific tax is exogenous but the sales tax is not, so the column (2) regression is reliable; or neither tax is exogenous, so neither regression is reliable. The statistical evidence cannot tell us which possibility is correct, so we must use our judgment.
We think that the case for the exogeneity of the general sales tax is stronger than that for the cigarette-specific tax, because the political process can link changes in the cigarette-specific tax to changes in the cigarette market and smoking policy. For example, if smoking decreases in a state because it falls out of fashion, there will be fewer smokers and a weakened lobby against cigarette-specific tax increases, which in turn could lead to higher cigarette-specific taxes. Thus changes in tastes (which are part of u) could be correlated with changes in cigarette-specific taxes (the instrument). This suggests discounting the IV estimates that use the cigarette-specific tax as an instrument and adopting the price elasticity estimated using the general sales tax as an instrument, −0.94.
The estimate of −0.94 indicates that cigarette consumption is not very inelastic: An increase in price of 1% leads to a decrease in consumption of 0.94%. This may seem surprising for an addictive product like cigarettes. But remember that this elasticity is computed using changes over a ten-year period, so it is a long-run elasticity. This estimate suggests that increased taxes can make a substantial dent in cigarette consumption, at least in the long run.
If you are interested in learning more about the economics of smoking, see Chaloupka and Warner (2000) and Gruber (2001).
12.5 Three Examples
We now turn to three empirical applications of IV regression that provide examples of how different researchers used their expert knowledge of their empirical problem to find instrumental variables.

Does putting criminals in jail reduce crime? This is a question only an economist would ask. After all, a criminal cannot commit a crime outside jail while in prison, and the fact that some criminals are caught and jailed serves to deter others. But the magnitude of the combined effect (the change in the crime rate associated with a 1% increase in the prison population) is an empirical question.
One strategy for estimating this effect is to regress crime rates (crimes per 100,000 members of the general population) against incarceration rates (prisoners per 100,000), using annual data at a suitable level of jurisdiction (for example, U.S. states). This regression could include some control variables measuring economic conditions (crime increases when general economic conditions worsen), demographics (youths commit more crimes than the elderly), and so forth. There is, however, a serious potential for simultaneous causality bias that undermines such an analysis: If the crime rate goes up and the police do their job, there will be more prisoners. On the one hand, increased incarceration reduces the crime rate; on the other hand, an increased crime rate increases incarceration. As in the butter example in Figure 12.1, because of this simultaneous causality an OLS regression of the crime rate on the incarceration rate will estimate some complicated combination of these two effects. This problem cannot be solved by finding better control variables.
This simultaneous causality bias, however, can be eliminated by finding a suitable instrumental variable and using TSLS. The instrument must be correlated with the incarceration rate (it must be relevant), but it must also be uncorrelated with the error term in the crime rate equation of interest (it must be exogenous). That is, it must affect the incarceration rate but be unrelated to any of the unobserved factors that determine the crime rate.

Where does one find something that affects incarceration but has no direct effect on the crime rate? One place is exogenous variation in the capacity of existing prisons. Because it takes time to build a prison, short-term capacity restrictions can force states to release prisoners prematurely or otherwise reduce incarceration rates. Using this reasoning, Levitt (1996) suggested that lawsuits aimed at reducing prison overcrowding could be used as an instrument for the incarceration rate.
Does cutting class sizes increase test scores? As we saw in the empirical analysis of Part II, schools with small classes tend to be wealthier, and their students have access to enhanced learning opportunities both in and out of the classroom. In Part II, we used multiple regression to tackle the threat of omitted variables bias by controlling for various measures of student affluence, ability to speak English, and so forth. Still, a skeptic could wonder whether we did enough: If we left out something important, our estimates of the class size effect would still be biased.

This potential omitted variables bias could be addressed by including the right control variables, but if these data are unavailable (some, like outside learning opportunities, are hard to measure) then an alternative approach is to use IV regression. This regression requires an instrumental variable correlated with class size (relevance) but uncorrelated with the omitted determinants of test performance that make up the error term, such as parental interest in learning, learning opportunities outside the classroom, quality of the teachers and school facilities, and so forth (exogeneity).

Where does one look for an instrument that induces random, exogenous variation in class size, but is unrelated to the other determinants of test performance? Hoxby (2000) suggested biology. Because of random fluctuations in the timing of births, the size of the incoming kindergarten class varies from one year to the next. Although the actual number of children entering kindergarten might be endogenous (recent news about the school might influence whether parents send a child to a private school), she argued that the potential number of children entering kindergarten is driven by random fluctuations in the timing of births and thus is plausibly exogenous.
Does aggressive treatment of heart attacks prolong lives? New and intensive treatments for victims of heart attacks (technically, acute myocardial infarctions, or AMI) hold the potential for saving lives. Before a new medical procedure (in this example, cardiac catheterization³) is approved for general use, it goes through clinical trials, a series of randomized controlled experiments designed to measure its effects and side effects. But strong performance in a clinical trial is one thing; actual performance in the real world is another.

A natural starting point for estimating the real-world effect of cardiac catheterization is to compare patients who received the treatment to those who did not. This leads to regressing the length of survival of the patient against the binary treatment variable (whether the patient received cardiac catheterization) and other control variables that affect mortality (age, weight, other measured
³Cardiac catheterization is a procedure in which a catheter, or tube, is inserted into a blood vessel and guided all the way to the heart to obtain information about the heart and coronary arteries.
health conditions, and so forth). The population coefficient on the indicator variable is the increment to the patient's life expectancy provided by the treatment. Unfortunately, the OLS estimator is subject to bias: Cardiac catheterization does not "just happen" to a patient randomly; rather, it is performed because the doctor and patient decide that it might be effective. If their decision is based in part on unobserved factors relevant to health outcomes not in the data set, then the treatment decision will be correlated with the regression error term. If the healthiest patients are the ones who receive the treatment, the OLS estimator will be biased (treatment is correlated with an omitted variable), and the treatment will appear more effective than it really is.

This potential bias can be eliminated by IV regression using a valid instrumental variable. The instrument must be correlated with treatment (must be relevant) but must be uncorrelated with the omitted health factors that affect survival (must be exogenous).
Where does one look for something that affects treatment but not the health outcome, other than through its effect on treatment? McClellan, McNeil, and Newhouse (1994) suggested geography. Most hospitals in their data set did not specialize in cardiac catheterization, so many patients were closer to "regular" hospitals that did not offer this treatment than to cardiac catheterization hospitals. McClellan, McNeil, and Newhouse therefore used as an instrumental variable the difference between the distance from the AMI patient's home to the nearest cardiac catheterization hospital and the distance to the nearest hospital of any sort; this distance is zero if the nearest hospital is a cardiac catheterization hospital, and otherwise it is positive. If this relative distance affects the probability of receiving the treatment, then it is relevant. If it is distributed randomly across AMI victims, then it is exogenous.

Is relative distance to the nearest cardiac catheterization hospital a valid instrument? McClellan, McNeil, and Newhouse do not report first-stage F-statistics, but they do provide other empirical evidence that it is not weak. Is this distance measure exogenous? They make two arguments. First, they draw on their medical expertise and knowledge of the health care system to argue that distance to a hospital is plausibly uncorrelated with any of the unobservable variables that determine AMI outcomes. Second, they have data on some of the additional variables that affect AMI outcomes, such as the weight of the patient, and in their sample distance is uncorrelated with these observable determinants of survival; this, they argue, makes it more credible that distance is uncorrelated with the unobservable determinants in the error term as well.

Using 205,021 observations on Americans aged at least 64 who had an AMI in 1987, McClellan, McNeil, and Newhouse reached a striking conclusion: Their
TSLS estimates suggest that cardiac catheterization has a small, possibly zero effect on health outcomes; that is, cardiac catheterization does not substantially prolong life. In contrast, the OLS estimates suggest a large positive effect. They interpret this difference as evidence of bias in the OLS estimates.

McClellan, McNeil, and Newhouse's IV method has an interesting interpretation. The OLS analysis used actual treatment as the regressor, but because actual treatment is itself the outcome of a decision by patient and doctor, they argue that the actual treatment is correlated with the error term. Instead, TSLS uses predicted treatment, where the variation in predicted treatment arises because of variation in the instrumental variable: Patients closer to a cardiac catheterization hospital are more likely to receive this treatment.

This interpretation has two implications. First, the IV regression actually estimates the effect of the treatment not on a "typical" randomly selected patient, but rather on patients for whom distance is an important consideration in the treatment decision. The effect on those patients might differ from the effect on a typical patient, which provides one explanation of the greater estimated effectiveness of the treatment in clinical trials than in McClellan, McNeil, and Newhouse's IV study. Second, it suggests a general strategy for finding instruments in this type of setting: Find an instrument that affects the probability of treatment, but does so for reasons that are unrelated to the outcome except through their effect on the likelihood of treatment. Both these implications have applicability to experimental and "quasi-experimental" studies, the topic of Chapter 13.
12.6 Conclusion
From the humble start of estimating how much less butter people will buy if its price rises, IV methods have evolved into a general approach for estimating regressions when one or more variables are correlated with the error term. Instrumental variables regression uses the instruments to isolate variation in the endogenous regressors that is uncorrelated with the error in the regression of interest; this is the first stage of two stage least squares. This in turn permits estimation of the effect of interest in the second stage of two stage least squares.

Successful IV regression requires valid instruments, that is, instruments that are both relevant (not weak) and exogenous. If the instruments are weak, then the TSLS estimator can be biased, even in large samples, and statistical inferences based on TSLS t-statistics and confidence intervals can be misleading. Fortunately, when there is a single endogenous regressor it is possible to check for weak instruments simply by checking the first-stage F-statistic.
If the instruments are not exogenous, that is, if one or more instruments is correlated with the error term, then the TSLS estimator is inconsistent. If there are more instruments than endogenous regressors, then instrument exogeneity can be examined by using the J-statistic to test the overidentifying restrictions. However, the core assumption, that there are at least as many exogenous instruments as there are endogenous regressors, cannot be tested. It is therefore incumbent on both the empirical analyst and the critical reader to use their own understanding of the empirical application to evaluate whether this assumption is reasonable.
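The logic of the overidentification test can be illustrated in miniature. With two instruments and one endogenous regressor, each instrument alone delivers its own IV estimate, and if both instruments were valid the two estimates would agree up to sampling error; the J-test formalizes this comparison. The simulation below (all numbers hypothetical) builds one valid and one invalid instrument.

```python
import random
random.seed(1)
n = 2000
beta1 = 1.0                              # true causal coefficient (assumed)

def cov(a, b):
    """Sample covariance with (n - 1) divisor."""
    abar, bbar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - abar) * (bi - bbar)
               for ai, bi in zip(a, b)) / (len(a) - 1)

u  = [random.gauss(0, 1) for _ in range(n)]
z1 = [random.gauss(0, 1) for _ in range(n)]          # valid: unrelated to u
z2 = [random.gauss(0, 1) + 0.5 * ui for ui in u]     # invalid: correlated with u
x  = [z1i + z2i + random.gauss(0, 1) for z1i, z2i in zip(z1, z2)]
y  = [beta1 * xi + ui for xi, ui in zip(x, u)]

# IV estimate from each instrument separately: beta = s_ZY / s_ZX.
beta_iv_z1 = cov(z1, y) / cov(z1, x)     # close to the true beta1
beta_iv_z2 = cov(z2, y) / cov(z2, x)     # inconsistent: exogeneity fails
```

Because z2 violates exogeneity, its IV estimate converges to a value above the true coefficient, and the disagreement between the two estimates is exactly the kind of evidence the J-statistic turns into a formal test.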
The interpretation of IV regression as a way to exploit known exogenous variation in the endogenous regressor can be used to guide the search for potential instrumental variables in a particular application. This interpretation underlies much of the empirical analysis in the area that goes under the broad heading of program evaluation, in which experiments or quasi-experiments are used to estimate the effect of programs, policies, or other interventions on some outcome measure. A variety of additional issues arises in those applications, for example, the interpretation of IV results when, as in the cardiac catheterization example, different "patients" might have different responses to the same "treatment." These and other aspects of empirical program evaluation are taken up in Chapter 13.
Summary
1. Instrumental variables regression is a way to estimate regression coefficients when one or more regressors is correlated with the error term.

2. Endogenous variables are correlated with the error term in the equation of interest; exogenous variables are uncorrelated with this error term.

3. For an instrument to be valid, it must (1) be correlated with the included endogenous variable and (2) be exogenous.

4. IV regression requires at least as many instruments as included endogenous variables.

5. The TSLS estimator has two stages. First, the included endogenous variables are regressed against the included exogenous variables and the instruments. Second, the dependent variable is regressed against the included exogenous variables and the predicted values of the included endogenous variables from the first-stage regression(s).

6. Weak instruments (instruments that are nearly uncorrelated with the included endogenous variables) make the TSLS estimator biased and TSLS confidence intervals and hypothesis tests unreliable.

7. If an instrument is not exogenous, then the TSLS estimator is inconsistent.
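Summary point 5 can be sketched in a few lines of code. The sketch below is a minimal illustration, not the estimator you would use in practice (real TSLS software also corrects the second-stage standard errors); the data, coefficients, and instrument strength are all simulated assumptions.

```python
import random
random.seed(2)
n = 1000
beta0, beta1 = 2.0, 0.7        # true coefficients (assumed)

u = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
# x is endogenous: it depends on u as well as on the instrument z.
x = [0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

def ols(xr, yr):
    """Slope and intercept from OLS of y on a constant and x."""
    xbar, ybar = sum(xr) / len(xr), sum(yr) / len(yr)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(xr, yr)) \
            / sum((a - xbar) ** 2 for a in xr)
    return slope, ybar - slope * xbar

# Stage 1: regress the endogenous regressor on the instrument; save fitted values.
pi1, pi0 = ols(z, x)
x_hat = [pi0 + pi1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1.
beta1_tsls, beta0_tsls = ols(x_hat, y)

# For comparison, OLS on the actual x is biased upward in this design,
# because x is positively correlated with u.
beta1_ols, _ = ols(x, y)
```

In this simulation the two-stage estimate lands near the true coefficient of 0.7, while plain OLS is pushed upward by the correlation between x and u.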
Key Terms
instrumental variables (IV) regression (421)
instrumental variable (instrument) (421)
endogenous variable (423)
exogenous variable (423)
instrument relevance condition (423)
instrument exogeneity condition (423)
two stage least squares (423)
included exogenous variables (432)
Review the Concepts

12.3 In the study of cigarette demand in this chapter, suppose that we used as an instrument the number of trees per capita in the state. Is this instrument relevant? Is it exogenous? Is it a valid instrument?

12.4 In his study of the effect of incarceration on crime rates, suppose that Levitt had used the number of lawyers per capita as an instrument. Is this instrument relevant? Is it exogenous? Is it a valid instrument?

12.5 In their study of the effectiveness of cardiac catheterization, McClellan, McNeil, and Newhouse (1994) used as an instrument the difference in distance to cardiac catheterization and regular hospitals. How could you determine whether this instrument is relevant? How could you determine whether this instrument is exogenous?
Exercises
12.1
This question refers to the panel data regressions summarized in Table 12.1.

a. Suppose that the federal government is considering a new tax on cigarettes that is estimated to increase the retail price by $0.10 per pack. If the current price per pack is $2.00, use the regression in column (1) to predict the change in demand. Construct a 95% confidence interval for the change in demand.
b. Suppose that the United States enters a recession and income falls by 2%. Use the regression in column (1) to predict the change in demand.

c. Recessions typically last less than one year. Do you think that the regression in column (1) will provide a reliable answer to the question in (b)? Why or why not?

d. Suppose that the F-statistic in column (1) were 3.6 instead of 33.6. Would the results reported in column (1) be reliable? Explain.
12.2

a. Show that X_i is a valid instrument. That is, show that Key Concept 12.3 is satisfied with Z_i = X_i.

b. Show that the IV regression assumptions in Key Concept 12.4 are satisfied with this choice of Z_i.

c. Show that the IV estimator constructed using Z_i = X_i is identical to the OLS estimator.
12.3

a. Suppose that she uses the estimator from the second-stage regression of TSLS:

σ̂²_a = (1/(n − 2)) Σ_{i=1}^n (Y_i − β̂0^TSLS − β̂1^TSLS X̂_i)²,

where X̂_i is the fitted value from the first-stage regression. Is this estimator consistent? (For the purposes of this question suppose that the sample is very large and the TSLS estimators are essentially identical to β0 and β1.)

b. Is σ̂²_b = (1/(n − 2)) Σ_{i=1}^n (Y_i − β̂0^TSLS − β̂1^TSLS X_i)² consistent?
12.4 Consider TSLS estimation with a single included endogenous variable and a single instrument. Then the predicted value from the first-stage regression is X̂_i = π̂0 + π̂1 Z_i. Use the definition of the sample variance and covariance to show that s_X̂Y = π̂1 s_ZY and s²_X̂ = π̂1² s²_Z. Use this result to fill in the steps of the derivation in Appendix 12.2 of Equation (12.4).
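The two identities in Exercise 12.4 can also be checked numerically (this is not a substitute for the requested derivation). The data below are arbitrary simulated numbers; the identities hold exactly because X̂_i is a linear function of Z_i.

```python
import random
random.seed(3)
n = 50
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + random.gauss(0, 1) for zi in z]
y = [1.5 * xi + random.gauss(0, 1) for xi in x]

def cov(a, b):
    """Sample covariance with (n - 1) divisor; cov(a, a) is the variance."""
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / (n - 1)

pi1 = cov(z, x) / cov(z, z)              # first-stage OLS slope
pi0 = sum(x) / n - pi1 * sum(z) / n
x_hat = [pi0 + pi1 * zi for zi in z]     # first-stage fitted values

# s_XhatY = pi1 * s_ZY  and  s^2_Xhat = pi1^2 * s^2_Z:
lhs_cov, rhs_cov = cov(x_hat, y), pi1 * cov(z, y)
lhs_var, rhs_var = cov(x_hat, x_hat), pi1 ** 2 * cov(z, z)
```

Up to floating-point error, both pairs agree exactly, for any data set: adding the constant π̂0 does not change covariances, and scaling by π̂1 scales covariances linearly and variances quadratically.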
12.5

b. Z_i = W_i?

c. W_i = 1 for all i?

d. Z_i = X_i?
12.9 A researcher is interested in the effect of military service on earnings. He collects data from a random sample of 4000 workers aged 40 and runs the OLS regression Y_i = β0 + β1X_i + u_i, where Y_i is the worker's annual earnings and X_i is a binary variable that is equal to 1 if the person served in the military and is equal to 0 otherwise.
b. During the Vietnam War there was a draft, where priority for the draft was determined by a national lottery. (Birthdates were randomly selected and ordered 1 through 365. Those with birthdates ordered first were drafted before those with birthdates ordered second, and so forth.) Explain how the lottery might be used as an instrument to estimate the effect of military service on earnings. (For more about this issue, see Joshua D. Angrist, "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, June 1990.)
12.10 Consider the instrumental variable regression model Y_i = β0 + β1X_i + β2W_i + u_i, where Z_i is an instrument. Suppose that data on W_i are not available and the model is estimated omitting W_i from the regression.

a. Suppose that Z_i and W_i are uncorrelated. Is the IV estimator consistent?

b. Suppose that Z_i and W_i are correlated. Is the IV estimator consistent?
Empirical Exercises
E12.1 During the 1880s, a cartel known as the Joint Executive Committee (JEC) controlled the rail transport of grain from the Midwest to eastern cities in the United States. The cartel preceded the Sherman Antitrust Act of 1890, and it legally operated to increase the price of grain above what would have been the competitive price. From time to time, cheating by members of the cartel brought about a temporary collapse of the collusive price-setting agreement. In this exercise, you will use variations in supply associated with the cartel's collapses to estimate the elasticity of demand for rail transport of grain. On the textbook Web site www.aw-bc.com/stock_watson, you will find a data file JEC that contains weekly observations on the rail shipping price and other factors from 1880 to 1886.⁴ A detailed description of the data is contained in JEC_Description, available on the Web site.

Suppose that the demand curve for rail transport of grain is specified as ln(Q_i) = β0 + β1 ln(P_i) + β2 Ice_i + Σ_{j=1}^{12} β_{2+j} Seas_{j,i} + u_i, where Q_i is the total tonnage of grain shipped in week i, P_i is the price of shipping a ton of grain by rail, Ice_i is a binary variable that is equal to 1 if the Great Lakes are not navigable because of ice, and Seas_j is a binary variable that captures seasonal variation in demand. Ice is included because grain could also be transported by ship when the Great Lakes were navigable.
c. Consider using the variable cartel as an instrumental variable for ln(P). Use economic reasoning to argue whether cartel plausibly satisfies the two conditions for a valid instrument.
⁴These data were provided by Professor Robert Porter of Northwestern University and were used in his paper "A Study of Cartel Stability: The Joint Executive Committee, 1880-1886," The Bell Journal of Economics, 1983, 14(2): 301-314.
E12.2 How does fertility affect labor supply? That is, how much does a woman's labor supply fall when she has an additional child? In this exercise you will estimate this effect using data for married women from the 1980 U.S. Census.⁵ The data are available on the textbook Web site www.aw-bc.com/stock_watson in the file Fertility and described in the file Fertility_Description. The data set contains information on married women aged 21-35 with two or more children.

c. The data set contains the variable samesex, which is equal to 1 if the first two children are of the same sex (boy-boy or girl-girl) and equal to 0 otherwise. Are couples whose first two children are of the same sex more likely to have a third child? Is the effect large? Is it statistically significant?

d. Explain why samesex is a valid instrument for the instrumental variable regression of weeksworked on morekids.

e. Is samesex a weak instrument?

f. Estimate the regression of weeksworked on morekids using samesex as an instrument. How large is the fertility effect on labor supply?

g. Do the results change when you include the variables agem1, black, hispan, and othrace in the labor supply regression (treating these variables as exogenous)? Explain why or why not.
E12.3

b. Compute the F-statistic for the regression of X_i on Z_i. Is there evidence of a "weak instrument" problem?

c. Compute a 95% confidence interval for β1 using the Anderson-Rubin procedure. (To implement the procedure, assume that -5 ≤ β1 ≤ 5.)
APPENDIX 12.1 The Cigarette Consumption Panel Data Set

APPENDIX 12.2 Derivation of the Formula for the TSLS Estimator in Equation (12.4)
Because the second stage of TSLS is the OLS regression of Y_i on X̂_i, the formula for the TSLS estimator, expressed in terms of the predicted value X̂_i, is the same as the formula for the OLS estimator in Key Concept 4.2, with X̂_i replacing X_i. That is,

β̂1^TSLS = s_X̂Y / s²_X̂,

where s_X̂Y is the sample covariance between Y_i and X̂_i, s²_X̂ is the sample variance of X̂_i, and X̂_i is the predicted value of X_i from the first-stage regression.
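The formula above can be checked numerically: because X̂_i is a linear function of Z_i, the ratio s_X̂Y / s²_X̂ collapses algebraically to s_ZY / s_ZX. The sketch below verifies this on simulated (hypothetical) data.

```python
import random
random.seed(4)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + random.gauss(0, 1) for zi in z]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]

def cov(a, b):
    """Sample covariance with (n - 1) divisor; cov(a, a) is the variance."""
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / (n - 1)

pi1 = cov(z, x) / cov(z, z)                    # first-stage slope
pi0 = sum(x) / n - pi1 * sum(z) / n
x_hat = [pi0 + pi1 * zi for zi in z]           # first-stage fitted values

beta_tsls_fitted = cov(x_hat, y) / cov(x_hat, x_hat)   # s_XhatY / s^2_Xhat
beta_tsls_ratio  = cov(z, y) / cov(z, x)               # s_ZY / s_ZX
```

The two expressions agree up to floating-point error on any data set, which is exactly the algebra the exercise in this chapter asks you to fill in.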
APPENDIX 12.3 Large-Sample Distribution of the TSLS Estimator

This appendix considers the setup of Section 12.1, that is, with a single instrument, a single included endogenous variable, and no included exogenous variables. To start, we derive a formula for the TSLS estimator in terms of the errors that forms the basis for the remaining discussion, similar to the expression for the OLS estimator in Equation (4.30) in Appendix 4.3. From Equation (12.13), Y_i − Ȳ = β1(X_i − X̄) + (u_i − ū), so

s_ZY = (1/(n − 1)) Σ_{i=1}^n (Z_i − Z̄)[β1(X_i − X̄) + (u_i − ū)] = β1 s_ZX + (1/(n − 1)) Σ_{i=1}^n (Z_i − Z̄)u_i,   (12.18)

where s_ZX = (1/(n − 1)) Σ_{i=1}^n (Z_i − Z̄)(X_i − X̄), and where the final equality follows because (1/(n − 1)) Σ_{i=1}^n (Z_i − Z̄)ū = 0. Substituting the final expression in Equation (12.18) into the definition of β̂1^TSLS = s_ZY / s_ZX and multiplying the numerator and denominator by (n − 1)/n yields
β̂1^TSLS − β1 = [(1/n) Σ_{i=1}^n (Z_i − Z̄)u_i] / [(1/n) Σ_{i=1}^n (Z_i − Z̄)(X_i − X̄)].   (12.19)
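Equation (12.19) is an exact algebraic identity in any sample, which can be verified numerically. The check below uses the errors u_i directly, which is possible only because the data are simulated; it verifies the algebra, not a feasible estimator.

```python
import random
random.seed(6)
n = 300
beta0, beta1 = 1.0, 2.0                  # true coefficients (assumed)
u = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.6 * zi + 0.4 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

zbar = sum(z) / n
xbar = sum(x) / n
ybar = sum(y) / n

# Right-hand side of Equation (12.19): averages involving the errors u_i.
num = sum((zi - zbar) * ui for zi, ui in zip(z, u)) / n
den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x)) / n

# Left-hand side: the TSLS estimator minus the true beta1.
beta1_tsls = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) \
             / sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
gap = beta1_tsls - beta1
```

The two sides match to floating-point precision, confirming that the estimator's sampling error is governed entirely by the covariation between the instrument and the errors relative to the covariation between the instrument and the regressor.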
APPENDIX 12.4 Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid

First consider the case that the instrument is irrelevant, so that cov(Z_i, X_i) = 0. Then the argument in Appendix 12.3 entails division by zero. To avoid this problem, we need to take a closer look at the behavior of the term in the denominator of Equation (12.19) when the population covariance between Z and X is zero.
We start by rewriting Equation (12.19). Because of the consistency of the sample average, in large samples Z̄ is close to μ_Z. Thus the numerator of Equation (12.19) is close to q̄ = (1/n) Σ_{i=1}^n q_i, where q_i = (Z_i − μ_Z)u_i, and the denominator is close to r̄ = (1/n) Σ_{i=1}^n r_i, where r_i = (Z_i − μ_Z)(X_i − μ_X). Then Equation (12.19) implies that, in large samples,

β̂1^TSLS − β1 ≅ q̄/r̄ = (σ_q̄/σ_r̄)[(q̄/σ_q̄)/(r̄/σ_r̄)].   (12.20)

When the instrument is irrelevant, the random variables q_i and r_i have variances σ²_q = var[(Z_i − μ_Z)u_i] and σ²_r = var[(Z_i − μ_Z)(X_i − μ_X)] (these variances are finite by the third IV regression assumption), and both have a mean of zero (because the instruments are irrelevant). It follows that the central limit theorem applies to q̄ and r̄; specifically, q̄/σ_q̄ is approximately distributed N(0, 1), as is r̄/σ_r̄. Therefore, the final expression of Equation (12.20) implies that, in large samples, the distribution of β̂1^TSLS − β1 is the distribution of (σ_q̄/σ_r̄)(S/T), where S = q̄/σ_q̄ and T = r̄/σ_r̄; that is, it is the distribution of the ratio of two random variables, each of which has a standard normal distribution (these two standard normal random variables are correlated).
In other words, when the instrument is irrelevant, the central limit theorem applies to the denominator as well as the numerator of the TSLS estimator, so that in large samples the distribution of the TSLS estimator is the distribution of the ratio of two normal random variables. Because X_i and u_i are correlated, these normal random variables are correlated, and the large-sample distribution of the TSLS estimator when the instrument is irrelevant is complicated. In fact, the large-sample distribution of the TSLS estimator with irrelevant instruments is centered on the probability limit of the OLS estimator. Thus, when the instrument is irrelevant, TSLS does not eliminate the bias in OLS and, moreover, has a nonnormal distribution, even in large samples.

When the instrument is weak but not irrelevant, the distribution of the TSLS estimator continues to be nonnormal, so the general lesson here about the extreme case of an irrelevant instrument carries over to weak instruments.
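The nonnormality described above is easy to see in a small Monte Carlo sketch. In the design below (all numbers hypothetical), the instrument is independent of X, the true β1 is 0, and the OLS probability limit is 0.4 (since cov(X, u) = 0.5 and var(X) = 1.25); the TSLS draws center near 0.4, not 0, and exhibit extreme outliers.

```python
import random
random.seed(8)

def one_tsls_draw(n=100):
    """One TSLS estimate from a design with a completely irrelevant instrument."""
    u = [random.gauss(0, 1) for _ in range(n)]
    z = [random.gauss(0, 1) for _ in range(n)]          # irrelevant instrument
    x = [0.5 * ui + random.gauss(0, 1) for ui in u]     # endogenous regressor
    y = list(u)                                         # true beta1 = 0
    zbar = sum(z) / n
    num = sum((zi - zbar) * yi for zi, yi in zip(z, y))
    den = sum((zi - zbar) * xi for zi, xi in zip(z, x))
    return num / den

draws = sorted(one_tsls_draw() for _ in range(1001))
median_tsls = draws[500]                     # near 0.4, the OLS plim, not 0
biggest = max(abs(draws[0]), abs(draws[-1])) # heavy tails produce outliers
```

The simulated sampling distribution is centered near the OLS probability limit rather than the true coefficient, and the heavy tails illustrate why normal-based t-statistics and confidence intervals are unreliable in this case.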
APPENDIX 12.5 Instrumental Variables Analysis with Weak Instruments
This appendix discusses instrumental variables analysis when the instruments are potentially weak. The appendix focuses on the case of a single included endogenous regressor [Equations (12.13) and (12.14)].

First consider the bias of the TSLS estimator. Let β1^OLS denote the probability limit of the OLS estimator; if the regressor is endogenous, then β1^OLS ≠ β1, and β1^OLS − β1 is the asymptotic bias of OLS. It is possible to show that, when there are many instruments, the bias of the TSLS estimator is approximately β1^TSLS − β1 ≅ (β1^OLS − β1)/[E(F) − 1], where E(F) is the expectation of the first-stage F-statistic. If E(F) = 10, then the bias of TSLS, relative to the bias of OLS, is approximately 1/9, or just over 10%, which is small enough to be acceptable in many applications. Replacing E(F) with F yields the rule of thumb given in Key Concept 12.5.

The motivation in the previous paragraph involved an approximate formula for the bias of the TSLS estimator when there are many instruments. In most applications, however, the number of instruments is small. Stock and Yogo (2005) provide a formal test for weak instruments that avoids the approximation that the number of instruments is large. In the Stock-Yogo test, the null hypothesis is that the instruments are weak and the alternative hypothesis is that the instruments are strong, where strong instruments are defined to be instruments for which the bias of the TSLS estimator is at most 10% of the bias of the OLS estimator. At the 5% significance level, the rule of thumb of comparing F to 10 is a good approximation to the Stock-Yogo test.
The Anderson-Rubin test of the hypothesis β1 = β1,0 proceeds in two steps. In the first step, compute Y_i − β1,0 X_i. In the second step, regress Y_i − β1,0 X_i against the instruments; the Anderson-Rubin statistic is the F-statistic testing the hypothesis that the coefficients on the instruments are all zero. Under the null hypothesis that β1 = β1,0, if the instruments satisfy the exogeneity condition (condition 2 in Key Concept 12.3), then they will be uncorrelated with the error term in this regression, and the null hypothesis will be rejected in 5% of all samples.

As discussed in Sections 3.3 and 7.4, a confidence set can be constructed as the set of values of the parameters that are not rejected by a hypothesis test. Accordingly, the set of values of β1 that are not rejected by a 5% Anderson-Rubin test constitutes a 95% confidence set for β1. When the Anderson-Rubin F-statistic is computed using the homoskedasticity-only formula, the Anderson-Rubin confidence set can be constructed by solving a quadratic equation.

The logic behind the Anderson-Rubin statistic never assumes instrument relevance, and the Anderson-Rubin confidence set will have a coverage probability of 95% in large samples, whether the instruments are strong, weak, or even irrelevant. Anderson-Rubin confidence sets can have some peculiar properties; for example, they can be empty or disjoint. A drawback is that, when instruments are strong (so TSLS is valid) and the coefficient is overidentified, Anderson-Rubin intervals are inefficient in the sense that they are wider than intervals based on the TSLS estimator.
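A brute-force version of the Anderson-Rubin confidence set can be sketched by grid search rather than by solving the quadratic equation mentioned above. The data are simulated, the grid bounds are arbitrary, and 1.96 is the 5% two-sided normal critical value; with a single instrument the Anderson-Rubin F-statistic is just the squared t-statistic on the instrument.

```python
import random
import math

random.seed(5)
n = 400
u = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi + ui for xi, ui in zip(x, u)]      # true beta1 = 1 (assumed)

zbar = sum(z) / n
beta_tsls = (sum((zi - zbar) * yi for zi, yi in zip(z, y))
             / sum((zi - zbar) * xi for zi, xi in zip(z, x)))

def t_on_z(w):
    """t-statistic on z in the OLS regression of w on a constant and z."""
    wbar = sum(w) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    slope = sum((zi - zbar) * (wi - wbar) for zi, wi in zip(z, w)) / szz
    inter = wbar - slope * zbar
    ssr = sum((wi - inter - slope * zi) ** 2 for zi, wi in zip(z, w))
    return slope / math.sqrt(ssr / (n - 2) / szz)

# Keep every candidate beta1 whose Anderson-Rubin statistic is not rejected.
step = 0.01
grid = [b * step for b in range(-500, 501)]      # candidate beta1 in [-5, 5]
ar_set = [b for b in grid
          if abs(t_on_z([yi - b * xi for yi, xi in zip(y, x)])) < 1.96]
```

The accepted grid points form an interval around the TSLS estimate: at the TSLS estimate itself the sample covariance between z and the residual Y − bX is exactly zero, so that candidate value can never be rejected.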
Estimation of β1

If the instruments are irrelevant, it is not possible to obtain an unbiased estimator of β1, even in large samples. Nevertheless, when instruments are weak, some IV estimators tend to be more nearly centered on the true value of β1 than is TSLS. One such estimator is the limited information maximum likelihood (LIML) estimator. As its name implies, the LIML estimator is the maximum likelihood estimator of β1 in the system of Equations (12.13) and (12.14) (for a discussion of maximum likelihood estimation, see Appendix 11.2). The LIML estimator also is the value of β1,0 that minimizes the homoskedasticity-only Anderson-Rubin test statistic. Thus, if the Anderson-Rubin confidence set is not empty, it will contain the LIML estimator.

If the instruments are weak, the LIML estimator is more nearly centered on the true value of β1 than is TSLS in large samples. A drawback of the LIML estimator is that it can produce extreme outliers. Confidence intervals constructed around the LIML estimator using the LIML standard error are more reliable than intervals constructed around the TSLS estimator using the TSLS standard error, but are less reliable than Anderson-Rubin intervals when the instruments are weak.

The problems of estimation, testing, and confidence intervals in IV regression with weak instruments constitute an area of ongoing research. To learn more about this topic, visit the Web site for this book.
CHAPTER 13

Experiments and Quasi-Experiments
In many fields, such as psychology and medicine, causal effects are commonly estimated using randomized controlled experiments, and the methods and threats to validity of such experiments carry over to economics. In a quasi-experiment, because of the way events unfolded, the treatment of some individuals occurs "as if" it is random. For example, suppose that a law is passed in one state but not its neighboring state. Then, when the law passes, it is "as if" some people are randomly subjected to the law (the treatment group) while others are not (the control group). Thus passage of the law creates circumstances that resemble a randomized experiment, and many of the lessons learned by studying actual experiments can be applied (with some modifications) to quasi-experiments.

This chapter examines experiments and quasi-experiments in economics. The statistical tools used in this chapter are multiple regression analysis and methods from earlier chapters.
469
th~:
the typ.: l11 uata analy7ed anti the spcci.tl opponunilic' und challenge~ pos~d
!'he nwthods developed in this chapter arc often ust:d for program
C\ aluation.
dfe<:t of a
the dfect on
~arnings
IQtc!T\'Cilllllll
or treatment." \Vllat
j.;;
loans a' ailabk to middk-ciJs.., '>tuuent~? Thi ... chapter sJi..,cu~::.cs how 'ud1
programs or policies can he cvalumcJ uc:ing l!xpcnm~:nrs or qua~i-e\pcnmems.
We begin in Section 13.1 by elaborating on the discussion in Chapter 1 of an ideal randomized controlled experiment. Threats to the validity of experiments are discussed in Section 13.2. As discussed in Section 13.3, some of these threats can be addressed or evaluated using methods that constitute the core tools of this book, including the differences estimator and instrumental variables regression. Section 13.4 uses these methods to analyze a randomized controlled experiment in which elementary school students in the state of Tennessee were assigned to classes of different sizes. Later sections take up quasi-experiments. An important difference between actual experiments and quasi-experiments is that treatment effects can differ from one member of the population to the next, and the matter of estimating causal effects when the population is heterogeneous is taken up in Section 13.7.
In an ideal randomized controlled experiment, the causal effect of the treatment on Y can be estimated using the regression

Y_i = β0 + β1 X_i + u_i,    (13.1)

where X_i is the treatment level and, as usual, u_i contains all the additional determinants of the outcome Y_i. If the treatment is the same for all members of the
treatment group, then X_i is binary, where X_i = 1 indicates that the i-th individual received the treatment and X_i = 0 indicates that he or she did not receive the treatment. If the treatment level varies among those in the treatment group, then X_i is the level of treatment received. For example, X_i might be the dose of a drug or the number of weeks in a job training program, where X_i = 0 if the treatment is not received. If X_i is binary, then the OLS estimator of β1 in Equation (13.1) is the difference between the average outcomes in the treatment and control groups, that is, the differences estimator.
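Because X_i is binary here, the differences estimator can be computed either as a difference in sample means or as an OLS slope. The following sketch (simulated data; all numbers are illustrative, not from the chapter) checks that the two coincide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated randomized experiment: binary treatment X, outcome Y.
n = 1_000
X = rng.integers(0, 2, size=n)               # random assignment
Y = 10.0 + 4.0 * X + rng.normal(0, 2, n)     # true causal effect beta1 = 4

# Differences estimator: difference in group means ...
diff_means = Y[X == 1].mean() - Y[X == 0].mean()

# ... equals the OLS slope of Y on a constant and X.
beta = np.linalg.lstsq(np.column_stack([np.ones(n), X]), Y, rcond=None)[0]

assert np.isclose(diff_means, beta[1])
print(diff_means)    # close to the true effect, 4
```

The equivalence is exact, not approximate: with a single binary regressor, the OLS fitted values are just the two group means.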
Recall from Key Concept 9.1 that a statistical study is internally valid if its statistical inferences about causal effects are valid for the population being studied.

Failure to randomize.
In some cases, the experimenter knows whether the treatment was actually received (for example, whether the trainee attended class), and the treatment actually received is recorded as X_i. Because there is an element of choice in whether the subject receives the treatment, X_i (the treatment actually received) will be correlated with u_i (which includes motivation and innate ability) even if there is random assignment. In other words, with partial compliance the treatment and control groups no longer differ only randomly, which is a form of selection. Thus, failure to follow the treatment protocol leads to bias in the differences estimator.

In other cases, the experimenter might not know whether the treatment was actually received. For example, if a subject in a medical experiment is provided with the drug but, unbeknownst to the researchers, simply does not take it, then the recorded treatment ("received drug") is incorrect. Incorrect measurement of the treatment actually received also leads to bias in the differences estimator.
Attrition. Attrition refers to subjects dropping out of the study after being randomly assigned to the treatment or control group. Sometimes attrition occurs for reasons unrelated to the treatment program; for example, a participant in a job training experiment might need to leave town to care for a sick relative. But if the reason for attrition is related to the treatment itself, then the attrition results in bias in the OLS estimator of the causal effect. For example, suppose the most able trainees drop out of the job training experiment because they get out-of-town jobs acquired using their new job training skills, so that at the end of the experiment the remaining treatment group is no longer comparable to the control group.
Experimental effects. In experiments with human subjects, merely because the subjects are in an experiment, they can change their behavior. In some experiments a double-blind protocol guards against this: Neither the subjects nor those administering the treatment know who is in the treatment group or the control group. In a medical drug experiment, for example, sometimes the drug and the placebo can be made to look the same so that neither the medical professional dispensing the drug nor the patient knows whether the administered drug is the actual drug or the placebo. If the experiment is double-blind, then both the treatment and control groups should experience the same experimental effects, so different outcomes between the two groups can be attributed to the drug.

Double-blind experiments are clearly infeasible in real-world experiments in economics: Both the experimental subject and the instructor know whether the subject is attending the job training program. In a poorly designed experiment, this experimental effect could be substantial.
For example, teachers in an experimental program might work harder than usual if they run the risk of losing their jobs should the program perform poorly in the experiment. Deciding whether experimental results are biased because of experimental effects requires making judgments based on what the experiment is evaluating and on the details of how the experiment was conducted.
a small sample means that the causal effect is estimated imprecisely.

Suppose that volunteers are solicited to participate in an experimental program. Even if the volunteers are randomly assigned to treatment and control groups, these volunteers might be more motivated than the overall population and, for them, the treatment could have a greater effect. More generally, selecting the sample nonrandomly from the population of interest can compromise the ability to generalize the results from the population studied (such as volunteers) to the population of interest.
be funded at a lower level; either possibility could result in the full-scale program being less effective than the smaller experimental program. Another difference between an experimental program and an actual program is its duration: The experimental program lasts only for the length of the experiment, while the actual program under consideration might be available for longer periods of time.
so that these characteristics enter the regression explicitly. Denoting these pretreatment characteristics by W_1i, ..., W_ri leads to the multiple regression model

Y_i = β0 + β1 X_i + β2 W_1i + ... + β_(1+r) W_ri + u_i.    (13.2)

The OLS estimator of β1 in Equation (13.2) is the differences estimator with additional regressors.
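The differences estimator with additional regressors can be sketched the same way. In the simulation below (all values invented for illustration), the treatment is randomized and the pretreatment characteristic W helps predict the outcome, so both regressions estimate the same causal effect, but the controlled regression has a smaller error variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomized treatment X plus a pretreatment characteristic W
# (for example, prior education) that predicts the outcome.
n = 2_000
W = rng.normal(0, 1, n)
X = rng.integers(0, 2, size=n)                  # randomly assigned
Y = 5.0 + 3.0 * X + 2.0 * W + rng.normal(0, 1, n)   # true effect = 3

def ols(y, *cols):
    """OLS coefficients from regressing y on a constant and cols."""
    Z = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(Z, y, rcond=None)[0]

b_simple = ols(Y, X)       # differences estimator
b_ctrl = ols(Y, X, W)      # differences estimator with additional regressors

# Both slopes estimate the same causal effect (X is randomized), but the
# controlled regression is more precise because W soaks up part of the
# error variance.
print(b_simple[1], b_ctrl[1])    # both close to 3
```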
What is a control variable? Throughout this book, we have used the term control variable for a factor that, if omitted, would lead to omitted variable bias in the estimator of the coefficient of interest. In the empirical application to class size and the student-teacher ratio in Section 7.6, we included the percentage of students eligible for a subsidized lunch (LchPct). LchPct is correlated with factors, such as learning opportunities outside school, that enter the error term; indeed, it is because of this correlation that LchPct is such a useful control variable. The correlation between LchPct and the error term means that the coefficient on LchPct does not have a causal interpretation. What the conditional mean independence assumption says is that, given the control variables in the regression (including LchPct), the mean of the error term does not depend on the student-teacher ratio. Thus the coefficient on the student-teacher ratio can have a causal interpretation even though the coefficient on LchPct does not.

In the context of experimental data, there are two relevant cases in which the conditional mean-zero assumption fails but the conditional mean independence assumption holds.
has the same chance of being assigned to the treatment group, the mean of u_i conditional on X_i and the W's does not depend on X_i. In this case, X_i is assigned based on the randomization status summarized by the W's and, as is discussed further in the appendix, conditional mean independence holds and the differences estimator with additional regressors is consistent.
It is important that the W regressors in Equation (13.2) not be experimental outcomes. For example, suppose that Y_i is earnings after the job training program, W_i indicates getting a job after the program, and X_i indicates treatment. Including W_i would estimate the effect of the program holding constant future employment, which is not the effect of interest. Moreover, employment could be correlated both with the treatment (the program teaches skills useful for getting a job) and with the error term (more able trainees get a job). In this case, conditional mean independence would not hold. We therefore restrict attention to W variables in Equation (13.2) that measure pretreatment characteristics, which are not influenced by the experimental treatment.
1. Efficiency. The differences estimator with additional regressors can be more efficient (have a smaller variance) than the OLS estimator in the single-regressor model [Equation (13.1)]. The reason is that the additional regressors explain some of the variation of Y, so including them in Equation (13.2) reduces the variance of the error term.
2. Check for randomization. If the treatment is not randomly assigned, and in particular is assigned in a way that is related to the W's, then the differences estimator and the differences estimator with additional regressors will differ; comparing the two therefore provides a check for randomization.

In practice, the second and third of these reasons can be related. If the check for randomization in reason 2 indicates that the treatment was not randomly assigned, it might be possible to adjust for this nonrandom assignment by using the differences estimator with regression controls. Whether this is in fact possible, however, depends on the details of the nonrandom assignment. If the assignment probability depends only on the observable variables W, then Equation (13.2) adjusts for this nonrandom assignment, but if the assignment probability depends on unobserved variables as well, then the adjustment made by including the W regressors is incomplete.
in Y in the treatment group over the course of the experiment, minus the average change in Y in the control group over the same period. This differences-in-differences estimator can be computed using a regression, which can be augmented with additional regressors measuring subject characteristics.

Let Ȳ^(treatment,before) and Ȳ^(treatment,after) be the sample averages of Y for the treatment group before and after the experiment, and let Ȳ^(control,before) and Ȳ^(control,after) be the corresponding pretreatment and posttreatment averages for the control group. The average change in Y over the course of the experiment for those in the treatment group is Ȳ^(treatment,after) − Ȳ^(treatment,before), and the average change in Y over this period for those in the control group is Ȳ^(control,after) − Ȳ^(control,before). The differences-in-differences estimator is the average change in Y for those in the treatment group minus the average change in Y for those in the control group:

β̂1^(diffs-in-diffs) = (Ȳ^(treatment,after) − Ȳ^(treatment,before)) − (Ȳ^(control,after) − Ȳ^(control,before))
                    = ΔȲ^(treatment) − ΔȲ^(control),    (13.3)

where ΔȲ^(treatment) is the average change in Y for those in the treatment group and ΔȲ^(control) is the average change in Y for those in the control group. If the treatment is randomly assigned, then β̂1^(diffs-in-diffs) is an unbiased and consistent estimator of the causal effect.

Let ΔY_i be the change in the value of Y for the i-th individual over the course of the experiment. Then the differences-in-differences estimator is the OLS estimator of β1 in the regression

ΔY_i = β0 + β1 X_i + u_i.    (13.4)

The differences-in-differences estimator has two potential advantages over the single-difference estimator.

1. Efficiency. If the treatment is randomly received, then the differences-in-differences estimator can be more efficient than the differences estimator.
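A short simulation (invented numbers, chosen to mirror the example in Figure 13.1) shows Equations (13.3) and (13.4) producing the same answer, and why differencing helps when the groups start at different levels:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each subject is observed before and after treatment. Subjects have
# persistent individual levels a_i that differ across the treatment and
# control groups, so the simple posttreatment difference in means is
# biased but the differences-in-differences estimator is not.
n = 1_000
X = rng.integers(0, 2, size=n)                 # treatment indicator
a = 20.0 + 20.0 * X + rng.normal(0, 5, n)      # treated group starts higher
Y_before = a + rng.normal(0, 2, n)             # group means roughly 40 vs. 20
Y_after = a + 10.0 + 30.0 * X + rng.normal(0, 2, n)   # true effect = 30

dY = Y_after - Y_before

# Equation (13.3): difference of average changes ...
dd = dY[X == 1].mean() - dY[X == 0].mean()

# ... equals the OLS slope in dY_i = b0 + b1 X_i + u_i, Equation (13.4).
beta = np.linalg.lstsq(np.column_stack([np.ones(n), X]), dY, rcond=None)[0]
assert np.isclose(dd, beta[1])

# The naive posttreatment comparison is contaminated by the 20-point
# pretreatment gap (roughly 80 - 30 = 50 instead of 30).
naive = Y_after[X == 1].mean() - Y_after[X == 0].mean()
print(dd, naive)
```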
[Figure 13.1: The outcome Y plotted over time (t = 1, 2) for the treatment group (ΔȲ^treatment) and the control group (ΔȲ^control). The graphic itself is not recoverable from the scan; the plotted values are discussed in the surrounding text.]
Whether the differences-in-differences estimator is more efficient depends on whether these persistent individual-specific characteristics explain much of the variance in Y_i (Exercise 13.6).

2. Eliminating pretreatment differences in Y. If the initial level of Y is correlated with treatment status, then the differences estimator is biased but the differences-in-differences estimator is not. This is illustrated in Figure 13.1. In that figure, the sample average of Y for the treatment group is 40 before the experiment, whereas the pretreatment sample average of Y for the control group is 20. Over the course of the experiment, the sample average of Y increases in the control group to 30, whereas it increases to 80 for the treatment group. Thus, the mean difference of the posttreatment sample averages is 80 − 30 = 50. However, some of this difference arises because the treatment and control groups had different pretreatment means: The treatment group started out ahead of the control group. The differences-in-differences estimator measures the gains of the treatment group relative to the control group, which in this example is (80 − 40) − (30 − 20) = 30. More generally, by focusing on the change in Y over the course of the experiment, the differences-in-differences estimator removes the influence of pretreatment levels of Y that differ systematically between the groups.
W variable could be the prior education of the participant. These additional regressors can be incorporated using the multiple regression model

ΔY_i = β0 + β1 X_i + β2 W_1i + ... + β_(1+r) W_ri + u_i.    (13.5)
might be observed monthly for a year or more. In this case, the population regression models in Equations (13.4) and (13.5), which are based on the change in the outcome between a single pretreatment observation and a single posttreatment observation, are not applicable. Such data can instead be analyzed using the panel data regression methods of Chapter 10; the details are provided in the appendix.
Estimation of Causal Effects for Different Groups

The causal effect can differ from one subject to the next depending on individual characteristics. For example, the effect on cholesterol levels of a cholesterol-lowering drug could be greater for a patient with a high cholesterol level than for one whose cholesterol level is already low. Similarly, a job training program might be more effective for women than for men, and it might be more effective for motivated than for unmotivated subjects. More generally, the causal effect can depend on the value of one or more variables, which can either be observed (like gender) or unobserved (like motivation).
partial compliance, the assigned treatment level can serve as an instrumental variable for the actual treatment level.

Recall that a variable must satisfy the two conditions of instrument relevance and instrument exogeneity (Key Concept 12.4) to be a valid instrumental variable. As long as the protocol is partially followed, the actual treatment level (X_i) is partially determined by the assigned treatment level (Z_i), so the instrumental variable Z_i is relevant. If the assigned treatment level is determined randomly, that is, if the experiment has random assignment, and if the assignment has no effect on the outcome other than through its influence on whether treatment is received, then Z_i is exogenous. That is, random assignment of Z_i implies that E(u_i | Z_i) = 0, where u_i is the error term in the differences specification in Equation (13.1) or in the differences-in-differences specification in Equation (13.4), depending upon which estimator is being used. Thus, in an experiment with partial compliance and randomly assigned treatment, the original random assignment provides a valid instrument for the treatment actually received.
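With a single binary instrument, TSLS reduces to the ratio of the Z-group difference in mean outcomes to the Z-group difference in mean treatment rates (the Wald estimator). A sketch with simulated partial compliance; every parameter value below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Partial compliance: Z is the randomly assigned treatment, X the
# treatment actually received. Take-up depends on unobserved motivation
# m, which also raises the outcome, so OLS of Y on X is biased upward,
# while IV using Z remains consistent.
n = 50_000
Z = rng.integers(0, 2, size=n)
m = rng.normal(0, 1, n)                                  # unobserved motivation
X = ((0.2 + 0.6 * Z + 0.3 * m) > rng.uniform(0, 1, n)).astype(float)
Y = 1.0 + 2.0 * X + 1.5 * m + rng.normal(0, 1, n)        # true effect = 2

# Wald / TSLS estimator with one binary instrument:
iv = (Y[Z == 1].mean() - Y[Z == 0].mean()) / (X[Z == 1].mean() - X[Z == 0].mean())

# OLS of Y on the treatment actually received, for comparison:
ols_slope = np.linalg.lstsq(np.column_stack([np.ones(n), X]), Y, rcond=None)[0][1]

print(iv, ols_slope)    # iv near 2; ols_slope noticeably larger
```

Random assignment makes Z uncorrelated with m, which is exactly the exogeneity condition E(u_i | Z_i) = 0 used above.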
Testing for random receipt of treatment. It is possible to test for randomization by checking whether the treatment actually received is related to observable subject characteristics. If the treatment is randomly received, then X_i will be uncorrelated with the observable pretreatment characteristics W_1i, ..., W_ri. Thus, the hypothesis that treatment is randomly received can be tested by testing the hypothesis that the coefficients on W_1i, ..., W_ri are zero in a regression of X_i on W_1i, ..., W_ri. In the job training program example, regressing receipt of job training (X_i) on gender, race, and prior education (the W's) and computing the F-statistic testing whether the coefficients on the W's are zero provides a test of the hypothesis that the treatment was randomly received.¹
Testing for random assignment. If the treatment is randomly assigned, then the assignment Z_i will be uncorrelated with the observable individual characteristics. Thus, the hypothesis that treatment is randomly assigned can be tested by regressing Z_i on W_1i, ..., W_ri and testing the null hypothesis that the coefficients on the W's are zero.
¹In this example, X_i is binary, so, as discussed in Chapter 11, the regression of X_i on W_1i, ..., W_ri is a linear probability model and heteroskedasticity-robust standard errors are essential. Another way to test the hypothesis that E(X_i | W_1i, ..., W_ri) does not depend on W_1i, ..., W_ri when X_i is binary is to use a probit or logit model (discussed in Chapter 11).
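The regression-based check can be sketched as follows. The homoskedasticity-only F-statistic is used purely for brevity; as the footnote notes, robust standard errors or a probit/logit are preferable in practice when X is binary:

```python
import numpy as np

rng = np.random.default_rng(4)

# Test for random receipt: regress the treatment actually received, X,
# on pretreatment characteristics W and test whether the W coefficients
# are jointly zero. Here receipt really is random, so the F-statistic
# should be unremarkable.
n, r = 1_000, 3
W = rng.normal(0, 1, (n, r))
X = rng.integers(0, 2, size=n).astype(float)

def ssr(y, Z):
    """Sum of squared residuals from OLS of y on the columns of Z."""
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    e = y - Z @ b
    return e @ e

ones = np.ones((n, 1))
ssr_r = ssr(X, ones)                       # restricted: constant only
ssr_u = ssr(X, np.hstack([ones, W]))       # unrestricted: constant + W's

F = ((ssr_r - ssr_u) / r) / (ssr_u / (n - r - 1))
print(F)    # under random receipt, no reason for F to be large
```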
Experimental Design

The Tennessee class size reduction experiment, known as Project STAR (Student-Teacher Achievement Ratio), was a four-year experiment designed to evaluate the effect on learning of small class sizes. Funded by the Tennessee State Legislature, the experiment cost approximately $12 million over four years. The study compared three different class arrangements for kindergarten through third grade: a regular class size, with 22-25 students per class, a single teacher, and no aide; a small class size, with 13-17 students per class and a single teacher; and a regular-sized class with a teacher's aide.
Deviations from the experimental design. The experimental protocol specified that the students should not switch between class groups other than through the re-randomization at the beginning of first grade. However, approximately 10% of the students switched in subsequent years for reasons including incompatible children and behavioral problems. These switches represent a departure from the randomization scheme and, depending on the true nature of the switches, have the potential to introduce bias into the results. Switches made purely to avoid personality conflicts might be sufficiently unrelated to the experiment that they would not introduce bias. If, however, the switches arose because the parents most concerned with their children's education pressured the school into switching a child into a small class, then this failure to follow the experimental protocol could bias the results toward overstating the effectiveness of small classes. Another deviation from the experimental protocol was that the class sizes changed over time because students switched between classes and moved in and out of the school district.
The regression specification is

Y_i = β0 + β1 SmallClass_i + β2 RegAide_i + u_i,    (13.6)

where SmallClass_i = 1 if the i-th student is in a small class and = 0 otherwise, RegAide_i = 1 if the i-th student is in a regular class with an aide and = 0 otherwise, and Y_i is a test score. The effect on the test score of a small class, relative to a regular class, is β1, and the effect of a regular class with an aide, relative to a regular class, is β2. The differences estimates are obtained by estimating β1 and β2 in Equation (13.6) by OLS.

Table 13.1 presents the differences estimates of the effect on test scores of being in a small class or in a regular-sized class with an aide. The dependent variable Y_i in the regressions in Table 13.1 is the student's total score on the combined math and reading portions of the Stanford Achievement Test. According to the estimates in Table 13.1, for students in kindergarten, the effect of being in a small class is an increase of 13.9 points on the test, relative to being in a regular class; the estimated effect of being in a regular class with an aide is 0.31 point on the test.
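With mutually exclusive treatment dummies as in Equation (13.6), OLS reproduces the group-mean contrasts against the omitted (regular-class) category. A sketch with simulated scores, loosely scaled to resemble the kindergarten estimates (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Three-arm design as in Equation (13.6):
# arm 0 = regular class (baseline), 1 = small class, 2 = regular + aide.
n = 3_000
arm = rng.integers(0, 3, size=n)
small = (arm == 1).astype(float)
aide = (arm == 2).astype(float)
Y = 918.0 + 14.0 * small + 0.3 * aide + rng.normal(0, 70, n)

Z = np.column_stack([np.ones(n), small, aide])
b0, b1, b2 = np.linalg.lstsq(Z, Y, rcond=None)[0]

# With mutually exclusive dummies, the OLS coefficients are exactly the
# group-mean differences from the baseline arm:
assert np.isclose(b1, Y[arm == 1].mean() - Y[arm == 0].mean())
assert np.isclose(b2, Y[arm == 2].mean() - Y[arm == 0].mean())
print(b1, b2)
```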
TABLE 13.1  Project STAR: Differences Estimates of the Effect of Class Size on Test Scores

[The table reports, for each grade K-3, the estimated coefficients on the small-class and regular-class-with-aide indicators, with standard errors in parentheses, and the regression intercept. Most numeric entries are not recoverable from the scan; for kindergarten, the small-class estimate is 13.90 (2.45) and the regular-class-with-aide estimate is 0.31 (2.27).]

The regressions were estimated using the Project STAR Public Access Data Set described in Appendix 13.1. The dependent variable is the student's combined score on the math and reading portions of the Stanford Achievement Test. Standard errors are given in parentheses under the coefficients. Individual coefficients marked ** are statistically significant at the 1% significance level using a two-sided test.
For each grade, the null hypothesis that small classes provide no improvement is rejected at the 1% (two-sided) significance level. However, it is not possible to reject the null hypothesis that having an aide in a regular class provides no improvement, relative to not having an aide, except in first grade. The estimated magnitudes of the improvements from small classes are broadly similar in grades K, 2, and 3, although the estimate is larger for first grade.

The differences estimates in Table 13.1 suggest that reducing class size has an effect on test performance, but adding an aide to a regular-sized class has a much smaller effect, possibly zero. As discussed in Section 13.3, augmenting the regressions in Table 13.1 with additional regressors [the W regressors in Equation (13.2)] can provide more efficient estimates of the causal effects. Moreover, if the treatment received is not random because of failures to follow the treatment protocol, then the estimates of the experimental effects based on regressions with additional regressors could differ from the differences estimates reported in Table 13.1. For these two reasons, estimates of the experimental effects with additional regressors included in Equation (13.6) are reported for kindergarten in Table 13.2; the first column of Table 13.2 repeats the results of the first column (for kindergarten) from Table 13.1, and the remaining three columns include additional regressors that measure teacher, school, and student characteristics.

The main conclusion from Table 13.2 is that the multiple regression estimates of the causal effects of the two treatments (small class and regular-sized class with aide) in the final three columns of Table 13.2 are similar to the differences estimates reported in the first column. The fact that adding these observable regres-
TABLE 13.2  Project STAR: Differences Estimates with Additional Regressors for Kindergarten

[The table reports four regressions, (1)-(4): column (1) repeats the kindergarten differences estimates from Table 13.1, and the remaining columns add regressors for teacher's years of experience and student characteristics (boy, eligibility for free lunch, race), with school indicator variables included in columns (3) and (4). Most numeric entries are not recoverable from the scan; estimates discussed in the text include a small-class effect of 13.90 (2.45) in column (1) and teacher-experience coefficients of 1.47 in column (2) and 0.74 in column (3).]

The regressions were estimated using the Project STAR Public Access Data Set described in Appendix 13.1. The dependent variable is the combined test score on the math and reading portions of the Stanford Achievement Test. The number of observations differs across regressions because of missing data. Standard errors are given in parentheses under the coefficients. Individual coefficients are statistically significant at the 5% level or 1% significance level using a two-sided test.
it more plausible that the random assignment to the smaller classes also does not depend on unobserved variables. As expected, these additional regressors increase the R² of the regression, and the standard error of the estimated class size effect decreases from 2.45 in column (1) to 2.16 in column (4).

Because teachers were randomly assigned to class types within a school, the experiment also provides an opportunity to estimate the effect on test scores of teacher experience. Teachers were not, however, randomly assigned across participating schools, so the regressions in columns (3) and (4) include school indicator variables ("school effects"), that is, indicator variables denoting the school the student attended.
Because teachers are randomly assigned within a school, the conditional mean of u_i given the school does not depend on the treatment; in the terminology of Section 13.3, because of random assignment within a school, the conditional mean independence assumption holds, where the additional W regressors are the school effects. When school effects are included, the estimate of the effect of experience drops by about half, from 1.47 in column (2) to 0.74 in column (3). Even so, the estimate in column (3) remains statistically significant and moderately large: Ten years of experience corresponds to a predicted increase in test scores of 7.4 points.

It is tempting to interpret some of the other coefficients in Table 13.2. For example, kindergarten boys perform worse than girls on these standardized tests. But these individual student characteristics are not randomly assigned (the gender of the student taking the test is not randomly assigned!), so these additional regressors could be correlated with omitted variables. For example, if race or eligibility for a free lunch is correlated with reduced learning opportunities outside school (which is omitted from the Table 13.2 regressions), then their estimated coefficients would reflect these omitted influences. As discussed in Section 13.1, if the treatment is randomly assigned, then the estimator of its coefficient is consistent whether or not the other regressors are correlated with the error term, but if the additional regressors are correlated with the error term, then their coefficient estimators have omitted variable bias.
TABLE 13.3  Estimated Class Size Effects in Units of Standard Deviations of the Test Score Across Students

[The table reports, for each grade K-3, the small-class and regular-class-with-aide effects from Table 13.1 expressed in standard deviation units, along with the sample standard deviation of test scores (s_Y) for that grade. Many entries are not recoverable from the scan; entries discussed in the text include a small-class effect of 0.19 (0.03) in kindergarten, 0.23 in second grade, and 0.21 in third grade, and an aide effect of approximately 0.00 in kindergarten.]

The estimates and standard errors in the first two rows are the estimated effects in Table 13.1, divided by the sample standard deviation of the Stanford Achievement Test for that grade (the final row in this table), computed using data on the students in the experiment. Standard errors are given in parentheses under the coefficients.
Additional results. Econometricians, statisticians, and specialists in elementary education have studied various aspects of this experiment, and we briefly summarize some of those findings here. One of these findings is that the effect of a small class is concentrated in the earliest grades. This can be seen in Table 13.3: Except for the anomalous first-grade results, the test score gap between regular and small classes reported in Table 13.3 is essentially constant across grades (0.19 standard deviation unit in kindergarten, 0.23 in second grade, and 0.21 in third grade). Because the children initially assigned to a small class stayed in that small class, this means that staying in a small class did not result in additional gains; rather, the gains made upon initial assignment were retained in the higher grades, but the gap between the treatment and control groups did not increase. Another finding is that, as indicated in the second row of Table 13.3, this experiment shows little benefit of having an aide in a regular-sized classroom. One potential concern about interpreting the results of the experiment is the failure to follow the treatment protocol for some students (some students switched from the small classes). If initial placement in a kindergarten classroom is random and has no direct effect on test scores, then initial placement can be used as an instrumental variable for actual placement, which it partially, but not entirely, influences. This strategy was pursued by Krueger (1999), who used two stage least squares (TSLS) to estimate the effect on test scores of class size, using initial classroom placement as the instrumental variable; he found that the TSLS and OLS estimates were similar, leading him to conclude that deviations from the experimental protocol did not introduce substantial bias into the OLS estimates.
13.4
TABLE 13.4
Study
- 13.90 .
K)
(2.45)
l alifornin
- 0.73
(0.26)
M<:Sachuscus
-0.64
(JtrUd~
493
Estimated Effeds of Reducing the Student--Teacher Ratio by 7.5 Based on the STAR Oato
and the California and Massachusetts Observational Date
fJs
STAR
(0.27)
Change in
Standard Deviation
Studeni-Teocher
of Test Sco res
Rario
Acrou Students
95% Confidence
Effed
Interval
73.8
O.l'J
(O.Cl3)
(0.13.0.25)
38.0
IJ.l 4
(0.05)
(0.04. 0.24)
-;,6 f.7{;9.
0
1_$
0.12
(0.05)
(0.02. 0.22)
-7.5 ~
1~
-7.5
Esrimotod
(f .
i'h< c'uruated cocfficn:nt ;J for the STAR study IS lllkeo trom column ll) tf Table 11.2. 111~ estunatcd coct!Jcscms tor the Cahlor
ns;a . n.J \.i.,d.:huse!l' \tu..Sse~ .tre taken from the ftrst column of lablc 9_'l Th~ e,, tim:tt!!d err..c.t 1S the dfo.:c.t of bcing l.n a =all eLl~
.,.~~,,a rqular cl.t~ ((or STAR) or the dfcct of reducing the studcoHem;bcr 111110 by " 5 (for th< C1hfomia and ~lassachu.etts
n hcsl lbe <K~ ct>nfidcn-t inte""' for the r&!duction llllhc studcoHucher ratio i' ths <''-llmostuJ d fect I 96 standMJ errors.
::.r.mdard erron are ~iven tn pan:nthe~es under 6lunat~ cttccu. lbc cslsmau::d cffms arc 1>Uit,tk;.lll) \lb'llsf~t."'illltly d!llcrcnt !rom
~trn ~~ the S% lc:velvr
I% "!-'IIIIi an.:e levelusmg. a two-ssd.:d test.
[able 13.3. the c!Stimatc:d c:fCecl!> .sr~ in term' Uf Ihe <,t.mJard dc\iaUon of l~l 'l'Orc~ dCI'0''>
The ~tandanl u C:VltllJOn acr~ ~tudeots is greater than the 'hmdi!rJ dc:vusllon cro'' d~<;tncts.
Fns C.sltf\)mia, the <,l!fni.lard d<: \i ation ;.s ro's st udent~ i ~ 38, but the~~ mdo~roJ d~\i<~tion aero di.M ncb
j~ ( I) 1.
tft.vtfiC/1, in
llmft!lll
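The arithmetic behind Table 13.4 can be checked directly: each standardized effect is the estimated coefficient times the change in the student-teacher ratio, divided by the standard deviation of test scores across students, and each 95% interval is the effect plus or minus 1.96 standard errors. (The Massachusetts standard deviation is only partially legible in the scan; 39 is the value consistent with the reported effect of 0.12.)

```python
# Reproducing the Table 13.4 arithmetic from the values reported there.
star_effect = 13.90 / 73.8            # STAR: small vs. regular, in sd units
ca_effect = (-0.73) * (-7.5) / 38.0   # California: 7.5-student reduction
ma_effect = (-0.64) * (-7.5) / 39.0   # Massachusetts: 7.5-student reduction

# 95% confidence interval for the STAR effect, standard error 0.03:
lo, hi = star_effect - 1.96 * 0.03, star_effect + 1.96 * 0.03

print(round(star_effect, 2), round(ca_effect, 2), round(ma_effect, 2))
print(round(lo, 2), round(hi, 2))
```

Rounding to two decimals recovers the table's 0.19, 0.14, and 0.12, and the STAR interval (0.13, 0.25).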
The estimated effects from the California and Massachusetts observational studies are somewhat smaller than the STAR estimates. One reason that estimates from different studies differ, however, is random sampling variability, so it makes sense to compare confidence intervals for the estimated effects from the three studies. Based on the STAR data for kindergarten, the 95% confidence interval for the effect of being in a small class (reported in the final column of Table 13.4) is 0.13 to 0.25. The comparable 95% confidence interval based on the California observational data is 0.04 to 0.24, and for Massachusetts it is 0.02 to 0.22. Thus the 95% confidence intervals from the California and Massachusetts studies contain most of the 95% confidence interval from the STAR kindergarten data. Viewed in this way, the three studies give strikingly similar ranges of estimates.

There are many reasons why the experimental and observational estimates might differ. One reason is that, as discussed in Section 9.2, there are remaining threats to the internal validity of the observational studies. For example, because children move in and out of districts, the district student-teacher ratio might not reflect the student-teacher ratio actually experienced by the students, so the coefficient on the student-teacher ratio in the Massachusetts and California studies could be biased toward zero because of errors-in-variables bias. Other reasons concern external validity: The district average student-teacher ratio used in the observational studies is not the same thing as the actual number of children in the class, the STAR experimental variable; Project STAR was conducted in a southern state in the 1980s, potentially different than California and Massachusetts in 1998; and the grades being compared differ (K-3 in STAR, fourth grade in Massachusetts, fifth grade in California). In light of all these reasons to expect different estimates, the findings of the three studies are remarkably similar. The fact that the observational estimates are so similar to the Project STAR estimates suggests that the remaining threats to the internal validity of the observational estimates are minor.
13.5 Quasi-Experiments
True randomized controlled experiments can be expensive (the STAR experiment cost $12 million), and they often raise ethical concerns. In medicine, it would be unethical to determine the effect on longevity of smoking by randomly assigning subjects to a smoking treatment group and a nonsmoking control group. In economics, it would be unethical to estimate the demand elasticity for cigarettes among teenagers by selling subsidized cigarettes to randomly selected high school students. For cost, ethical, and practical reasons, true randomized controlled experiments are rare in economics.
Examples

We illustrate the two types of quasi-experiments by examples. The first example is a quasi-experiment in which the treatment is "as if" randomly determined. The second and third examples illustrate quasi-experiments in which the "as if" random variation influences, but does not entirely determine, the level of the treatment.
CHAPTER 13 Experiments and Quasi-Experiments
Does serving in the military improve your prospects in the labor market? The military provides training that future employers might find attractive. However, an OLS regression of individual civilian earnings against prior military service could produce a biased estimator of the effect on civilian earnings of military service, because military service is determined, at least in part, by individual choices and characteristics. For example, the military accepts only applicants who meet minimum physical requirements, and a lack of success in the private sector labor market might make an individual more likely to sign up for the military.
To circumvent this selection bias, Joshua Angrist (1990) used a quasi-experimental design in which he examined labor market histories of those who served in the U.S. military during the Vietnam War. During this period, whether a young man was drafted into the military was determined in part by a national lottery system based on birthdays: Men randomly assigned low lottery numbers were eligible to be drafted, while those with high numbers were not. Actual entry into the military was determined by complicated rules, including physical screening and certain exemptions, and some young men volunteered for service, so serving in the military was only partially influenced by whether you were draft-eligible. Thus being draft-eligible serves as an instrumental variable that partially determines military service but is randomly assigned. In this case, there was true random assignment of draft eligibility via the lottery, but because this randomization was not done as part of an experiment to evaluate the effect of military service, this is a quasi-experiment. Angrist concluded that the long-term effect of military service was to reduce earnings of white, but not nonwhite, veterans.
[...] instrumental variable for actual treatment by cardiac catheterization. This study is a quasi-experiment with a variable that partially determines the treatment. The treatment itself, cardiac catheterization, is determined by personal characteristics of the patient and by the decision of the patient and doctor; however, it is also influenced by whether a nearby hospital is capable of performing this procedure. If the location of the patient is "as if" randomly assigned and has no direct effect on health outcomes, other than through its effect on the probability of catheterization, then the relative distance to a catheterization hospital is a valid instrumental variable.
Other examples. The quasi-experiment research strategy has been applied in other areas as well. Garvey and Hanka (1999) used variation in U.S. state laws to examine the effect on corporate financial structure (for example, the use of debt by corporations) of anti-takeover laws. Meyer, Viscusi, and Durbin (1995) used large discrete changes in the generosity of unemployment insurance benefits in Kentucky and Michigan, which differentially affected workers with high but not low earnings, to estimate the effect on time out of work of a change in unemployment benefits. The surveys of Meyer (1995), Rosenzweig and Wolpin (2000), and Angrist and Krueger (2001) give other examples of quasi-experiments in the fields of economics and social policy.
Econometric Methods for Analyzing Quasi-Experiments
The econometric methods for analyzing quasi-experiments are for the most part the same as those laid out in Section 13.3 for analyzing true experiments. If the treatment level X is "as if" randomly determined, then the OLS estimator of the coefficient of X is an unbiased estimator of the causal effect. If the treatment level is only partially random but is influenced by a variable Z that is "as if" randomly assigned, then the causal effect can be estimated by instrumental variables regression using Z as an instrument.
Because quasi-experiments typically do not have true randomization, there can be systematic differences between the treatment and control groups. If so, it is important to include observable measures of pretreatment characteristics of the individual subjects in the regression (the W's in the regressions in Section 13.3). As discussed in Section 13.3, including W regressors that are results of the treatment in general results in an inconsistent estimator of the causal effect.

Data in quasi-experiments typically are collected for reasons other than the particular study, so panel data on the "subjects" of the quasi-experiment sometimes are unavailable (an exception is discussed in the box on the minimum wage). If so, one way to proceed is to use a series of cross sections collected over time, and to modify the methods of Section 13.3 for repeated cross-sectional data.
[Box on the minimum wage (only fragments survive): an OLS estimator in a regression of employment against wages has simultaneous causality bias; hypothetically, a randomized [...]; the estimates actually suggest that employment increased in New Jersey restaurants after its minimum wage went up, relative to Pennsylvania.]
A repeated cross-sectional data set is a collection of cross-sectional data sets, where each cross-sectional data set corresponds to a different time period. For example, the data set might contain observations on 400 individuals in the year 2004 and on 500 different individuals in 2005, for a total of 900 different individuals. One example of repeated cross-sectional data is political polling data, in which political preferences are measured by a series of surveys of randomly selected potential voters, where the surveys are taken at different dates and each survey has different respondents.
The premise of using repeated cross-sectional data is that, if the individuals (more generally, entities) are randomly drawn from the same population, then the
individuals in the earlier cross section can be used as surrogates for the individuals in the treatment and control groups in the later cross section. For example, suppose that, because of an increase of funds that had nothing to do with the local labor market, a job training program was expanded in southern but not northern California. Suppose you have survey data on two randomly selected cross sections of adult Californians, with one survey taken before the training program expanded and one after the expansion occurred. Then the "treatment group" would be southern Californians and the "control group" would be northern Californians. You do not have data on the southern Californians actually treated before the treatment (because you do not have panel data), but you do have data on southern Californians who are statistically similar to those who were treated. Thus you can use the cross-sectional data on southern Californians in the first period as a surrogate for the pretreatment observations on the treatment group, and the cross-sectional data on northern Californians as a surrogate for the pretreatment observations on the control group.
When there are two time periods, the regression model for repeated cross-sectional data is

Y_it = β0 + β1 X_it + β2 G_i + β3 D_t + u_it,    (13.7)
" here X., is the actual treatment of the ith mdividual (emil)) 1n the cross section
in period r (r = 1. 2). D, is the binar y indicator th at equals 0 in the fi rst perio<.l and
equals 1 in the second period. and G, is a binary vanable ind1cating whclhcr the
individual is in the treatment gro up (or in the surrogate treatment group. if the
ob:.ervation is in tbe pretreatment period). The ,w individual receives treatment if
he o r she is in rhe treaunenl group in the second period, so in Equation (13.7), X;1
- G, X D,, that LS, X 11 js tbe interaction hetween G; and D,.
If the quasi-experiment makes X_it "as if" randomly received, then the causal effect can be estimated by the OLS estimator of β1 in Equation (13.7). If there are more than two time periods, then Equation (13.7) is modified to contain T − 1 binary variables indicating the different time periods (see Appendix 13.2).
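The mechanics of Equation (13.7) can be sketched in a short simulation. Everything below (the sample sizes, the coefficient values, the normal error) is invented for illustration; the point is only that regressing Y on the interaction X_it = G_i × D_t, along with G_i and D_t, recovers β1 when the treatment is "as if" randomly received:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # individuals per cross section (different people in each period)

# Invented parameters: b1 = 2.0 plays the role of the causal effect beta1.
b0, b1, b2, b3 = 10.0, 2.0, 1.0, 0.5

blocks = []
for d in (0, 1):                       # D_t: 0 in the first period, 1 in the second
    G = rng.integers(0, 2, n)          # treatment-group indicator G_i
    X = G * d                          # treated only if in the group AND in period 2
    u = rng.normal(0.0, 1.0, n)
    Y = b0 + b1 * X + b2 * G + b3 * d + u
    blocks.append(np.column_stack([np.ones(n), X, G, np.full(n, d), Y]))

data = np.vstack(blocks)
regressors, outcome = data[:, :4], data[:, 4]
coef, *_ = np.linalg.lstsq(regressors, outcome, rcond=None)  # OLS of Y on (1, X, G, D)
print(round(coef[1], 2))  # estimate of beta1; close to 2.0
```

With more than two periods, the loop would run over all periods and the single D column would be replaced by T − 1 period dummies, as described above.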
If the quasi-experiment makes the treatment X_it only partially randomly received, then in general X_it will be correlated with u_it and the OLS estimator is biased and inconsistent. In this case, the source of randomness in the quasi-experiment takes the form of the instrumental variable Z_it that partially influences the treatment level and is "as if" randomly assigned. As usual, for Z_it to be a valid instrumental variable it must be relevant (that is, it must be related to the actual treatment X_it) and exogenous.
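The contrast between OLS and IV under partially random treatment can be illustrated numerically. The simulation below is a sketch with made-up numbers: treatment X responds both to an "as if" random binary Z (think of draft eligibility) and to the error u (self-selection), so OLS is biased while the IV estimator, the ratio of the sample covariance of Z and Y to the sample covariance of Z and X, is not:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
b1 = 2.0                                   # true causal effect (invented)

Z = rng.integers(0, 2, n).astype(float)    # "as if" randomly assigned instrument
u = rng.normal(0.0, 1.0, n)
# Treatment is only partially random: it responds to Z but also to u (self-selection).
X = (0.8 * Z + 0.5 * u + rng.normal(0.0, 1.0, n) > 0.5).astype(float)
Y = b1 * X + u

ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)   # biased: X is correlated with u
iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]   # IV estimator s_ZY / s_ZX
print(round(ols, 2), round(iv, 2))  # OLS overstates the effect; IV is close to 2.0
```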
13.6 Potential Problems with Quasi-Experiments
Like all empirical studies, quasi-experiments face threats to internal and external validity. A particularly important potential threat to internal validity is whether the "as if" randomization in fact can be treated reliably as true randomization.
Failure of randomization. Quasi-experiments rely on differences in individual circumstances (legal changes, sudden unrelated events, and so forth) to provide the "as if" randomization in the treatment level. If this "as if" randomization fails to produce a treatment level X (or an instrumental variable Z) that is random, then in general the OLS estimator is biased (or the instrumental variables estimator is not consistent).

As in a true experiment, one way to test for failure of randomization is to check for systematic differences between the treatment and control groups, for example by regressing X (or Z) on the individual characteristics (the W's) and testing the hypothesis that the coefficients on the W's are zero. If differences exist that are not readily explained by the nature of the quasi-experiment, then this is evidence that the quasi-experiment did not produce true randomization. Even if there is no relationship between X (or Z) and the W's, however, X (or Z) could be related to some of the unobserved factors in the error term u_it. Because these factors are unobserved, this cannot be tested, and the validity of the assumption of "as if" randomization must be evaluated using expert knowledge and judgment applied to the application at hand.
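The randomization check just described (regress X or Z on the W's and test that all the W coefficients are zero) can be sketched as follows. The data and the three characteristics are simulated, and the F-statistic uses the homoskedasticity-only formula; this is only an illustration, not a full diagnostic:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

def f_stat(z, w):
    """Homoskedasticity-only F-statistic for H0: all coefficients on w are zero
    in a regression of z on a constant and w."""
    x = np.column_stack([np.ones(len(z)), w])
    b = np.linalg.lstsq(x, z, rcond=None)[0]
    ssr_u = np.sum((z - x @ b) ** 2)          # unrestricted SSR
    ssr_r = np.sum((z - z.mean()) ** 2)       # restricted model: constant only
    q = w.shape[1]
    return ((ssr_r - ssr_u) / q) / (ssr_u / (len(z) - q - 1))

# Hypothetical pretreatment characteristics (the W's).
W = rng.normal(size=(n, 3))

# Case 1: genuine "as if" randomization -- Z is unrelated to the W's.
Z_good = rng.integers(0, 2, n).astype(float)
# Case 2: failed randomization -- Z depends on the first characteristic.
Z_bad = (W[:, 0] + rng.normal(0.0, 1.0, n) > 0).astype(float)

# For reference, the 5% critical value of F(3, infinity) is 2.60.
print(round(f_stat(Z_good, W), 2), round(f_stat(Z_bad, W), 2))
```

In the first case the F-statistic is typically below the critical value; in the second it is far above it, flagging the systematic difference.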
Failure to follow treatment protocol. In a true experiment, failure to follow the treatment protocol arises when members of the treatment group fail to receive treatment and/or members of the control group actually receive treatment; in consequence, the OLS estimator of the causal effect has selection bias. The counterpart to this in a quasi-experiment is when the "as if" randomization influences, but does not determine, the treatment level. In this case, the instrumental variables estimator based on the quasi-experimental influence Z can be consistent even though the OLS estimator is not.
Attrition. Attrition in a quasi-experiment is similar to attrition in a true experiment in the sense that if it arises because of personal choices or characteristics, then attrition can induce correlation between the treatment level and the error term. This results in sample selection bias, so the OLS estimator of the causal effect is biased and inconsistent.
13.7 Population Heterogeneity: Whose Causal Effect?
If the causal effect is the same for every member of the population, then in this sense the population is homogeneous and Equation (13.1), with its single causal effect β1, applies to all members of the population. In reality, however, the causal effect can vary from one member of the population to the next, and this heterogeneity can be represented by letting the intercept and slope vary across individuals:
13. 7
503
Y_i = β0i + β1i X_i + u_i.    (13.8)
For example, β1i might be zero for a resume-writing training program if the ith individual already knows how to write a resume. Because β1i varies from one individual to the next in the population and the individuals are selected from the population at random, we can think of β1i as a random variable that, just like u_i, reflects unobserved variation across individuals (for example, variation in preexisting resume-writing skills).
As discussed in Section 13.1, the causal effect in a given population is the expected effect from an experiment in which members of the population are selected at random. When the population is heterogeneous, this causal effect is in fact the average causal effect, also called the average treatment effect, which is the population mean of the individual causal effects. In terms of Equation (13.8), the average causal effect in the population is the population mean value of the causal effect, E(β1i); that is, the expected causal effect of a randomly selected member of the population.
What do the estimators of Section 13.3 estimate if there is population heterogeneity of the form in Equation (13.8)? We first consider the OLS estimator in the case that X_i is "as if" randomly determined; in this case, the OLS estimator is a consistent estimator of the average causal effect. This is generally not true for the IV estimator, however. Instead, if X_i is partially influenced by Z_i, then the IV estimator using the instrument Z_i estimates a weighted average of the causal effects, where those for whom the instrument is most influential receive the most weight.
OLS with Heterogeneous Causal Effects

We now show that if there is heterogeneity in the causal effect in the population and if X_i is randomly assigned, then the differences estimator is a consistent estimator of the average causal effect. The OLS estimator is β̂1 = s_XY / s²_X [Equation (4.7)]. If the observations are i.i.d., the sample covariance and variance are consistent estimators of the population covariance and variance, so β̂1 →p cov(Y_i, X_i)/var(X_i). If X_i is randomly assigned, then X_i is distributed independently of other individual characteristics, both observed and unobserved, and in particular is distributed independently of β0i and β1i. Accordingly, the OLS estimator β̂1 has the probability limit
β̂1 →p cov(Y_i, X_i)/var(X_i) = cov(β0i + β1i X_i + u_i, X_i)/var(X_i) = [cov(β0i, X_i) + cov(β1i X_i, X_i) + cov(u_i, X_i)]/var(X_i) = E(β1i)var(X_i)/var(X_i) = E(β1i),    (13.9)
where the third equality uses the facts about covariances in Key Concept 2.3 and cov(u_i, X_i) = 0, which is implied by E(u_i | X_i) = 0 [Equation (2.27)], and the final equality follows from β0i and β1i being distributed independently of X_i, which they are if X_i is randomly determined (Exercise 13.9). Thus, if X_i is randomly assigned, β̂1 is a consistent estimator of the average causal effect E(β1i).
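A small Monte Carlo experiment, with invented distributions for β0i and β1i, illustrates Equation (13.9): when X_i is randomly assigned, the OLS slope s_XY / s²_X settles down to the average causal effect E(β1i):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Heterogeneous coefficients (the distributions are invented for illustration):
beta0i = rng.normal(1.0, 1.0, n)
beta1i = rng.normal(2.0, 1.5, n)           # average causal effect E(beta1i) = 2.0
X = rng.integers(0, 2, n).astype(float)    # random assignment => X independent of beta0i, beta1i
u = rng.normal(0.0, 1.0, n)
Y = beta0i + beta1i * X + u                # Equation (13.8)

beta1_hat = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)  # OLS slope s_XY / s_X^2
print(round(beta1_hat, 2))  # close to E(beta1i) = 2.0
```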
IV Regression
with Heterogeneous Causal Effects
Suppose that treatment is only partially randomly determined, that Z_i is a valid instrumental variable (relevant and exogenous), and that there is heterogeneity in the effect on X_i of Z_i. Specifically, suppose that X_i is related to Z_i by the linear model

X_i = π0i + π1i Z_i + v_i,    (13.10)

where the coefficients π0i and π1i vary from one individual to the next. Equation (13.10) is the first-stage equation of TSLS [Equation (12.2)] with the modification that the effect on X_i of a change in Z_i is allowed to vary from one individual to the next.

The TSLS estimator is β̂1^TSLS = s_ZY / s_ZX [Equation (12.4)], the ratio of the sample covariance between Z and Y to the sample covariance between Z and X. If the observations are i.i.d., then these sample covariances are consistent estimators of the population covariances, so β̂1^TSLS →p cov(Z_i, Y_i)/cov(Z_i, X_i). Suppose that π0i, π1i, β0i, and β1i are distributed independently of u_i, v_i, and Z_i; that E(u_i | Z_i) = E(v_i | Z_i) = 0; and that E(π1i) ≠ 0 (instrument relevance). It is shown in Appendix 13.4 that, under these assumptions,
β̂1^TSLS →p cov(Z_i, Y_i)/cov(Z_i, X_i) = E(β1i π1i)/E(π1i).    (13.11)
That is, the TSLS estimator converges in probability to the ratio of the expected value of the product of β1i and π1i to the expected value of π1i.

The final ratio in Equation (13.11) is a weighted average of the individual causal effects β1i. The weights are π1i/E(π1i), which measure the relative degree to which the instrument influences whether the ith individual receives treatment. Thus, the TSLS estimator is a consistent estimator of a weighted average of the individual causal effects, where the individuals who receive the most weight are those for whom the instrument is most influential. The weighted average causal effect that is estimated by TSLS is called the local average treatment effect. The term "local" emphasizes that it is the weighted average that places the most weight on those individuals (more generally, entities) whose treatment probability is most influenced by the instrumental variable.
There are three special cases in which the local average treatment effect equals the average treatment effect:

1. The treatment effect is the same for all individuals. This corresponds to β1i = β1 for all i. Then the final expression in Equation (13.11) simplifies to E(β1 π1i)/E(π1i) = β1 E(π1i)/E(π1i) = β1.

2. The instrument affects each individual equally. This corresponds to π1i = π1 for all i. In this case, the final expression in Equation (13.11) simplifies to E(β1i π1)/E(π1) = E(β1i)π1/π1 = E(β1i).

3. The heterogeneity in the treatment effect and heterogeneity in the effect of the instrument are uncorrelated. This corresponds to β1i and π1i being random but cov(β1i, π1i) = 0. Because E(β1i π1i) = cov(β1i, π1i) + E(β1i)E(π1i) [Equation (2.34)], if cov(β1i, π1i) = 0 then E(β1i π1i) = E(β1i)E(π1i), and the final expression in Equation (13.11) simplifies to E(β1i)E(π1i)/E(π1i) = E(β1i).
[...] eligible for a job training program and are randomly assigned a priority number Z_i, which influences how likely they are to be admitted to the program. Half the workers know they will benefit from the program; for them, β1i = β > 0 and π1i = π > 0. The other half know that, for them, the program is ineffective, so they would not enroll even if admitted; for them, β1i = 0 and π1i = 0. The average treatment effect is E(β1i) = ½β. The local average treatment effect is E(β1i π1i)/E(π1i) = (½βπ)/(½π) = β. Thus the local average treatment effect is the causal effect for those workers who are likely to enroll in the program, and gives no weight to those who will not enroll under any circumstance. In contrast, the average treatment effect places equal weight on all individuals, regardless of whether they would enroll. Because individuals decide to enroll based in part on their knowledge of how effective the program will be for them, in this example the local average treatment effect exceeds the average treatment effect.
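This two-type example can be checked numerically. In the sketch below (all numbers invented, with β = 2 and π = 1), half the workers enroll whenever admitted and benefit from the program, while the other half never enroll; the IV estimator recovers the local average treatment effect β, not the average treatment effect ½β:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
beta = 2.0  # effect for the half who benefit (invented)

# Half the workers benefit and respond to the priority number Z; half do neither.
benefits = rng.integers(0, 2, n).astype(bool)
beta1i = np.where(benefits, beta, 0.0)
Z = rng.integers(0, 2, n).astype(float)   # randomly assigned admission offer
X = np.where(benefits, Z, 0.0)            # only those who benefit enroll when admitted
u = rng.normal(0.0, 1.0, n)
Y = beta1i * X + u

ate = beta1i.mean()                               # average treatment effect, about beta/2
late = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]    # IV estimates the LATE
print(round(ate, 2), round(late, 2))  # LATE is about beta = 2.0, twice the ATE
```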
Implications. This discussion has two implications. First, in the circumstances in which OLS would normally be consistent (that is, when E(u_i | X_i) = 0), the OLS estimator continues to be consistent in the presence of heterogeneous causal effects in the population; however, because there is no single causal effect, the OLS estimator is properly interpreted as a consistent estimator of the average causal effect in the population being studied.

Second, if an individual's decision to receive treatment depends on the effectiveness of the treatment for that individual, then the TSLS estimator is not in general a consistent estimator of the average causal effect. Instead, TSLS estimates a local average treatment effect, where the causal effects of the individuals who are most influenced by the instrument receive the greatest weight. This leads to the disconcerting situation in which two researchers, armed with different instrumental variables that are both valid in the sense that both are relevant and exogenous, would obtain different estimates of "the" causal effect, even in large samples. Although both estimators provide some insight into the distribution of the causal effects via their respective weighted averages of the form in Equation (13.11), neither estimator is in general a consistent estimator of the average causal effect.
[...] program evaluation estimators. These include the survey by Heckman, LaLonde, and Smith (1999) and James Heckman's lecture delivered when he received the Nobel Prize in Economics (Heckman 2001, Section 7). The latter reference and Angrist, Graddy, and Imbens (2000) provide detailed discussions of the random effects model (which treats β1i as varying across individuals) and provide more general versions of the result in Equation (13.11). The concept of the local average treatment effect was introduced by Angrist and Imbens (1994), who showed that in general it does not equal the average treatment effect.
13.8 Conclusion
In Chapter 1, we defined the causal effect in terms of the expected outcome of an ideal randomized controlled experiment. If a randomized controlled experiment is available or can be performed, it can provide compelling evidence on the causal effect under study, although even randomized controlled experiments are subject to potentially important threats to internal and external validity.

Despite their advantages, randomized controlled experiments in economics face severe hurdles, including ethical concerns and cost. The insights of experimental methods can, however, be applied to quasi-experiments, in which special circumstances make it seem "as if" randomization has occurred. In quasi-experiments, the causal effect can be estimated using a differences-in-differences estimator [...]
Summary
1. The causal effect is defined in terms of an ideal randomized controlled experiment, and the causal effect can be estimated by the difference in the average outcomes for the treatment and control groups. Actual experiments with human subjects deviate from an ideal experiment for various practical reasons, especially the failure of people to comply with the experimental protocol.

2. If the actual treatment level X_i is random, then the treatment effect can be estimated by regressing the outcome on the treatment, optionally using additional pretreatment characteristics as regressors to improve efficiency. If the assigned treatment Z_i is random but the actual treatment X_i is partly determined by individual choice, then the causal effect can be estimated by instrumental variables regression using Z_i as an instrument.

3. In a quasi-experiment, variations in laws or circumstances or accidents of nature are treated "as if" they induce random assignment to treatment and control groups.
Shadish, Cook, and Campbell (2002) provide a comprehensive treatment of experiments and quasi-experiments in the social sciences and in psychology. Examples of experiments in economics include negative income tax experiments (for example, see aspe.hhs.gov/…) and the RAND health insurance experiment (Newhouse 1993).
If the actual treatment is "as if" random, then the causal effect can be estimated by regression (possibly with additional pretreatment characteristics as regressors); if the assigned treatment is "as if" random, then the causal effect can be estimated by instrumental variables regression.
4. A key threat to the internal validity of a quasi-experimental study is whether the "as if" randomization actually results in exogeneity. Because of behavioral responses, just because an instrument is generated by "as if" randomization does not mean it is necessarily exogenous in the sense required for a valid instrumental variable.
5. When the treatment effect varies from one individual to the next, the OLS estimator is a consistent estimator of the average causal effect if the actual treatment is randomly assigned or "as if" randomly assigned. However, the instrumental variables estimator estimates a weighted average of the individual causal effects, in which the individuals for whom the instrument is most influential receive the most weight.
Key Terms
program evaluation (469)
causal effect (471)
treatment effect (471)
differences estimator ( 471)
partial compliance (473)
attrition (473)
Hawthorne effect (474)
differences estimator with additional regressors (477)
conditional mean independence (478)
differences-in-differences estimator
differences-in-differences estimator with additional regressors
quasi-experiment (495)
natural experiment (495)
repeated cross-sectional data (498)
average causal effect (503)
average treatment effect (503)
local average treatment effect (505)
Review the Concepts

13.1 A researcher studying the effects of a new fertilizer on crop yields plans to carry out an experiment in which different amounts of the fertilizer are applied to 100 different one-acre parcels of land. There will be four treatment levels. Treatment level one is no fertilizer; treatment level two is 50% of the manufacturer's recommended amount of fertilizer; treatment level three is 100%; and treatment level four is 150%. The researcher plans to apply treatment level one to the first 25 parcels of land, treatment level two
to the second 25 parcels, and so forth. Can you suggest a better way to assign treatment levels? Why is your proposal better than the researcher's method?
13.2 A clinical trial is carried out for a new cholesterol-lowering drug. The drug is given to 500 patients, and a placebo is given to another 500 patients, using random assignment of the patients. How would you estimate the treatment effect of the drug? Suppose that you had data on the weight, age, and gender of each patient. Could you use these data to improve your estimate? Explain. Suppose that you had data on the cholesterol levels of each patient before he or she entered the experiment. Could you use these data to improve your estimate? Explain.
13.3 Researchers studying the STAR data report anecdotal evidence that school principals were pressured by some parents to place their children in the small classes. Suppose that some principals succumbed to this pressure and transferred some children into the small classes. How would this compromise the internal validity of the study? Suppose that you had data on the original random assignment of each student before the principal's intervention. How could you use this information to restore the internal validity of the study?
13.4 Explain whether experimental effects (like the Hawthorne effect) might be important in each of the experiments in the previous three questions.
13.5 Section 12.1 gives a hypothetical example in which some schools were damaged by an earthquake. Explain why this is an example of a quasi-experiment. How could you use the induced changes in class size to estimate the effect of class size on test scores?
Exercises
13.1 Using the results in Table 13.1, calculate the following for each grade: an estimate of the small class treatment effect, relative to the regular class; its standard error; and its 95% confidence interval. (For this exercise, ignore the results for regular classes with aides.)

13.2 For the following calculations, use the results in column (4) of Table 13.2. Consider two classrooms, A and B, with identical values of the regressors in column (4) of Table 13.2, except that:
and classroom B is a regular class with a teacher with 10 years of experience. Construct a 95% confidence interval for the expected difference in average test scores. (Hint: In STAR, the teachers were randomly assigned to the different types of classrooms.)
[Table (partially recovered; two columns, apparently Treatment Group and Control Group): 1241 and 1201; 97.1; Number of men: 55 and 45; Number of women: 45 and 55.]
a. Midway through the year all the male athletes move into a fraternity and drop out of the study (their final grades are not observed).

b. Engineering students assigned to the control group put together a local area network so that they can share a private wireless Internet connection that they pay for jointly.

c. The art majors in the treatment group never learn how to access their Internet accounts.
Suppose that there are panel data for T = 2 time periods for a randomized controlled experiment, where the first observation (t = 1) is taken before the experiment and the second observation (t = 2) is for the post-treatment period. Suppose that the treatment is binary; that is, X_it = 1 if the ith individual is in the treatment group and t = 2, and X_it = 0 otherwise. Further suppose that the treatment effect can be modeled using the specification [...] X_it; why?)
c. Based on your answers to (a) and (b), when would you prefer the differences-in-differences estimator over the differences estimator, based purely on efficiency considerations?
13.7
Suppose you have panel data from an experiment with T = 2 periods (so t = 1, 2). Consider the panel data regression model with fixed individual and time effects and individual characteristics W_i that do not change over time, such as gender. Let the treatment be binary, so X_it = 1 for t = 2 for the individuals in the treatment group and X_it = 0 otherwise. Consider the population regression model
13.8
where G_i = 1 if the individual is in the treatment group and G_i = 0 if the individual is in the control group. Show that the OLS estimator of β1 is the differences-in-differences estimator in Equation (13.3). (Hint: See Section 8.3.)
13.9 Derive the final equality in Equation (13.9). (Hint: Use the definition of the covariance and the fact that, because the actual treatment X_i is random, β1i and X_i are independently distributed.)
13.10 Consider the regression model with heterogeneous regression coefficients

Y_i = β0i + β1i X_i + v_i,

where (v_i, X_i, β0i, β1i) are i.i.d. random variables with β0 = E(β0i) and β1 = E(β1i) [...] = 0. Show [...]
d. Suppose that outliers are rare, so that (u_i, X_i) have finite fourth moments. Is it appropriate to use OLS and the methods of Chapters 4 and 5 to estimate and carry out inference about the average values of β0i and β1i?

e. Suppose that β1i and X_i are positively correlated, so that observations with larger-than-average values of X_i tend to have larger-than-average values of β1i. Are the assumptions in Key Concept 4.3 satisfied? If not, which assumption(s) is (are) violated? Is it appropriate to use OLS and the methods of Chapters 4 and 5 to estimate and carry out inference about the average values of β0i and β1i?
13.11 In Chapter 12, state-level panel data were used to estimate the price elasticity of demand for cigarettes, using the state sales tax as an instrumental variable. Consider in particular regression (1) in Table 12.1. In this case, in your judgment does the local average treatment effect differ from the average treatment effect? Explain.
Empirical Exercises
El3.1
A prospective employer receives two resumes: a resume from a white job applicant and a similar resume from an African American applicant. Is the employer more likely to call back the white applicant to arrange an interview? Marianne Bertrand and Sendhil Mullainathan carried out a randomized controlled experiment to answer this question. Because race is not typically included on a resume, they differentiated resumes on the basis of "white-sounding names" (such as Emily Walsh or Gregory Baker) and "African American-sounding names" (such as Lakisha Washington or Jamal Jones). A large collection of fictitious resumes was created, and the presupposed "race" (based on the "sound" of the name) was randomly assigned to each resume. These resumes were sent to prospective employers to see which resumes generated a phone call; the resulting data are in the file Names (see the accompanying Description).
a. Ddinc- the '\:<~11-hack wlc" ac; the fra<:tion of rcc;umcs tht1t gcnl.f<lt~ 1
r hnne call frnm the prospect he employer. Wh<!l W<l" the call b.IL'k r ti C:
"111.-..., J;ua W<'TC: ri(\VIJ~J h\ l'rnle-.<.ur ~~ m unt: llo:rtr.IOd tho: L'nl'cr-.ot) nl ( I~ IJ: 111<1 \\
u....c.l m htr rro.:r \\llh llo<.:ndhtl ~ullatnath,ln Arc EIIllly and (Jfi:O!! \1nrc;: rmp!O)ahk )11.111 I ~. ,,It I
.1nd l.lnllll? /1. lu.ld F.\pcrinu:nl fill I 'I N>I M.ul.:t o~')qlminari<n.~
Ill E(t)/lllfllt(' Ht I I<>' !XII
"""'n
94( ll.
c. What is the difference in callback rates for high-quality versus low-quality resumes? What is the high-quality/low-quality difference for white applicants? For African American applicants? Is there a significant difference in this high-quality/low-quality difference for whites versus African Americans?

d. The authors of the study claim that race was assigned randomly to the resumes. Is there any evidence of nonrandom assignment?
E13.2 A consumer is given the chance to buy a baseball card for $1, but he declines the trade. If the consumer is now given the baseball card, will he be willing to sell it for $1? Standard consumer theory suggests yes, but behavioral economists have found that ownership tends to increase the value of goods to consumers. That is, the consumer may hold out for some amount more than $1 (for example, $1.20) when selling the card, even though he was willing to pay only some amount less than $1 (for example, $0.80) when buying it. Behavioral economists call this phenomenon the "endowment effect." John List investigated the endowment effect in a randomized experiment involving sports memorabilia traders at a sports-card show. Traders were randomly given one of two sports collectibles, say good A or good B, that had approximately equal market value. Those receiving good A were then given the option of trading good A for good B with the experimenter; those receiving good B were given the option of trading good B for good A with the experimenter. Data from the experiment and a detailed description can be found on the textbook Web site http://www.aw-bc.com/stock_watson in the files Sportscards and Sportscards_Description.
a. i. Suppose that, absent any endowment effect, all of the subjects prefer good A to good B. What fraction of the experiment's subjects would you expect to trade the good that they were given for the other good?

Good A was a ticket stub from the game in which Cal Ripken, Jr. set the record for consecutive games played, and good B was a souvenir from the game in which Nolan Ryan won his 300th game.
ii. Suppose that, absent any endowment effect, all of the subjects prefer good B to good A. What fraction of the experiment's subjects would you expect to trade the good that they were given for the other good?
iii. Suppose that, absent any endowment effect, X% of the subjects prefer good A to good B, and the other (100 - X)% prefer good B to good A. Show that you would expect 50% of the subjects to trade the good that they were given for the other good.

b. Using the sports-card data, what fraction of the subjects traded the good they were given? Is the fraction significantly different from 50%? What fraction of the subjects who received good A traded for good B? What fraction of the subjects who received good B traded for good A? Is there evidence of an endowment effect?

c. Some have argued that the endowment effect may be present but that it is likely to disappear as traders gain more trading experience. Half of the experimental subjects were dealers and the other half were nondealers. Dealers have more experience than nondealers. Repeat (b) for dealers and nondealers. Is there a significant difference in their behavior? Is the evidence consistent with the hypothesis that the endowment effect disappears as traders gain more experience?
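For part (b), whether the observed trading fraction differs from 50% can be checked with a standard large-sample test of a population proportion. The counts below are hypothetical placeholders, not the actual sports-card data:

```python
import math

def proportion_test(n_traded, n_total, p0=0.5):
    """Two-sided large-sample test of H0: p = p0 for a sample proportion."""
    p_hat = n_traded / n_total
    se = math.sqrt(p0 * (1 - p0) / n_total)   # standard error under the null
    z = (p_hat - p0) / se
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_hat, z, p_value

# hypothetical counts: 45 of 148 subjects traded the good they were given
p_hat, z, p_value = proportion_test(45, 148)
print(round(z, 2), round(p_value, 6))
```

A trading fraction well below 50% (a large negative z) would be evidence of an endowment effect.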
APPENDIX 13.1
The binary variable "Boy" in Table 13.2 indicates whether the student is a boy (= 1) or a girl (= 0); the binary variables "Black" and "Race other than black or white" indicate the student's race. The binary variable "Free lunch eligible" indicates whether the student is eligible for a free lunch during that school year. The teacher's years of experience are the total years of experience of the teacher whom the student had in the grade for which the test data apply. The data set also indicates which school the student attended in a given year, making it possible to construct binary school-specific indicator variables.
APPENDIX 13.2
Y_it = β0 + β1X_it + γ2D2_i + ... + γnDn_i + δ2B2_t + ... + δTBT_t + v_it,   (13.12)

where i = 1, ..., n denotes the individual, t = 1, ..., T denotes the time period of measurement, X_it = 1 if the ith individual has received the treatment by date t and = 0 otherwise, D2_i is a binary variable indicating the second individual (that is, D2_i = 1 for i = 2 and = 0 otherwise), B2_t is a binary variable indicating the second time period, the other binary variables are defined similarly, v_it is an error term, and β0, β1, γ2, ..., γn, δ2, ..., δT are unknown coefficients. Including binary variables indicating each individual controls for unobserved individual characteristics that affect Y. Including the binary variables indicating the time period controls for differences from one period to the next that affect the outcome regardless of whether the individual is in the treatment or control group, for example, an economic recession that occurs during the course of a job training program experiment.

When T = 2, the time and fixed effects regression model in Equation (13.12) simplifies to the differences-in-differences regression model in Equation (13.4). Methods for estimating β1 in Equation (13.12) are discussed in Section 10.4.
In Equation (13.5), the W regressors are determinants of the change in Y from one period to the next, not of its level. An individual's prior education, for example, is an observable factor that might influence the change in earnings whether or not he or she is in the job training program. Thus, to extend Equation (13.5) to multiple periods, the W regressors are interacted with the time effect binary variables. For convenience, suppose there is a single W regressor; then the multiperiod extension of Equation (13.5) is
Y_it = β0 + β1X_it + β2(B2_t × W_i) + ... + βT(BT_t × W_i) + γ2D2_i + ... + γnDn_i + δ2B2_t + ... + δTBT_t + v_it,   (13.13)

where the regressor B2_t × W_i is the interaction between the binary variable B2_t and W_i. When there are only two time periods, the population regression model with individual fixed effects, time effects, the W regressors, and the W's interacted with the single time binary variable B2_t is the same as the population regression model in Equation (13.5) (Exercise 13.7).
Panel data with multiple time periods also can be used to trace out causal effects over time, thereby asking, for example, whether the effect on income of a job training program persists or wears off over time. The methods for doing this are discussed in Chapter 15 in the context of estimating causal effects using time series data.
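The model in Equation (13.12) can be estimated by OLS after constructing the individual and time binary variables explicitly. The sketch below simulates a small panel (all parameter values, including the true treatment effect of 2, are my own illustrative assumptions) and checks that OLS recovers the treatment coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 200, 3
beta1 = 2.0                        # true treatment effect (assumed)

alpha = rng.normal(size=n)         # unobserved individual effects
delta = np.array([0.0, 0.5, 1.0])  # time effects
treated = rng.integers(0, 2, size=n)
X = np.zeros((n, T))
X[treated == 1, 1:] = 1.0          # treatment received from period 2 on

Y = alpha[:, None] + delta[None, :] + beta1 * X + rng.normal(size=(n, T))

# stack the panel and build the binary variables D2..Dn and B2..BT
y, x = Y.ravel(), X.ravel()
ind = np.repeat(np.arange(n), T)
per = np.tile(np.arange(T), n)
D = (ind[:, None] == np.arange(1, n)).astype(float)
B = (per[:, None] == np.arange(1, T)).astype(float)
Z = np.column_stack([np.ones(n * T), x, D, B])

coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(round(coef[1], 2))           # OLS estimate of beta1
```

In practice one would use a panel-data library rather than building the dummy matrix by hand, but the explicit construction mirrors Equation (13.12) term by term.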
APPENDIX 13.3
This appendix discusses the conditional mean independence assumption mentioned in Section 13.3 and its role in the estimation of a common treatment effect β1. This discussion focuses on the differences estimator with additional regressors [β1 in Equation (13.2)], but the ideas generalize to the differences-in-differences estimator with additional regressors.

The conditional mean independence assumption is that the conditional mean of the error term u_i in Equation (13.2) can depend on the control variables W_1i, ..., W_ri but not on the treatment X_i; that is,

E(u_i | X_i, W_1i, ..., W_ri) = E(u_i | W_1i, ..., W_ri).   (13.14)
Under conditional mean independence, the error term u_i can be correlated with the observed control variables (the W's); but, given the W's, the conditional distribution of u_i does not depend on X_i, so in particular the mean of that conditional distribution does not depend on X_i. For example, if treatment is assigned randomly, then it will not pick up the effect of prior education, whether education is an included regressor or an omitted part of the error term.

In the third case, the treatment X_i is assigned randomly, conditional on W_i. In this case, the mean of u_i does not depend on X_i because, given W_i, X_i is randomly assigned. If, conditional on W_i, u_i and X_i are independent, then the conditional distribution of u_i, given W_i, does not depend on X_i, so its conditional mean does not depend on X_i, even though it might depend on W_i. If W_i is a set of indicator variables, conditional mean independence means that X_i is randomly assigned within each group, or "block," defined by the indicator variables, but that the assignment probability can vary from one block to the next. Random assignment within blocks of individuals is sometimes called block randomization.
Under the conditional mean independence assumption, β1 is the treatment effect. To see this, take conditional expectations of both sides of Equation (13.2):

E(Y_i | X_i, W_1i, ..., W_ri) = β0 + β1X_i + β2W_1i + ... + β_{1+r}W_ri + E(u_i | X_i, W_1i, ..., W_ri)
                             = β0 + β1X_i + β2W_1i + ... + β_{1+r}W_ri + E(u_i | W_1i, ..., W_ri),   (13.15)

where the second equality follows from the conditional mean independence assumption [Equation (13.14)]. Evaluating the conditional expectation in Equation (13.15) at X_i = 1 (treatment group) and at X_i = 0 (control group) and subtracting yields

E(Y_i | X_i = 1, W_1i, ..., W_ri) - E(Y_i | X_i = 0, W_1i, ..., W_ri) = β1.   (13.16)
The left-hand side of Equation (13.16) is the causal effect defined by an experiment in which individuals with given W characteristics are randomly assigned to treatment and control groups, and the causal effect is the expected value of the difference in outcomes. Because this causal effect is the same for all values of the W's, it is the causal effect for a randomly selected member of the population.

When Equation (13.14) holds (along with the second through fourth least squares assumptions in Key Concept 6.4), the differences estimator with additional regressors is consistent. Intuitively, by including W as a regressor, the differences estimator controls for the fact that the treatment probability can depend on W. The mathematical argument that β̂1 is consistent under the conditional mean independence assumption involves matrix algebra and is left to Exercise 18.9.
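The role of the W regressor can be illustrated with a simulation of block randomization: the treatment probability differs across blocks defined by a binary W, and the error is correlated with W but, given W, not with X. All numbers below are my own illustrative assumptions. The regression that includes W estimates β1 well, while the raw difference in means does not:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta1 = 1.5                            # true treatment effect (assumed)

W = rng.integers(0, 2, n)              # block indicator
p_treat = np.where(W == 1, 0.8, 0.2)   # treatment probability varies by block
X = (rng.random(n) < p_treat).astype(float)

u = 2.0 * W + rng.normal(size=n)       # error depends on W, but not on X given W
Y = beta1 * X + u

# differences estimator with the additional regressor W
Z = np.column_stack([np.ones(n), X, W])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)

naive = Y[X == 1].mean() - Y[X == 0].mean()
print(round(coef[1], 2), round(naive, 2))  # regression near 1.5; naive biased up
```

The naive comparison is biased upward here because treated individuals are disproportionately drawn from the high-W block, whose outcomes are higher for reasons unrelated to treatment.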
APPENDIX 13.4
This appendix derives the probability limit of the TSLS estimator when the causal effect is heterogeneous, that is, the result stated in Equation (13.11). Specifically, suppose that Equations (13.9) and (13.10) hold with heterogeneous coefficients, that (β0i, β1i, π0i, π1i) are i.i.d. random variables distributed independently of (u_i, v_i, Z_i), with E(u_i | Z_i) = 0 and E(v_i | Z_i) = 0, and that E(π1i) ≠ 0.

The TSLS estimator with a single instrument is β̂1_TSLS = s_ZY/s_ZX, where s_ZY is the sample covariance between Z_i and Y_i and s_ZX is the sample covariance between Z_i and X_i. Because sample covariances are consistent for population covariances,

β̂1_TSLS = s_ZY/s_ZX →p σ_ZY/σ_ZX   (13.17)

(see Appendix 3.3 and Exercise 17.2). The task thus is to obtain expressions for σ_ZX and σ_ZY.

First consider σ_ZX. Substituting Equation (13.10) into the definition of σ_ZX yields

σ_ZX = cov(Z_i, π0i + π1iZ_i + v_i) = E(π1i) × σ²_Z,   (13.18)

where the second equality obtains because cov(Z_i, v_i) = 0 [which follows from the assumption E(v_i | Z_i) = 0; see Equation (2.27)], because E[(Z_i - μ_Z)π0i] = E{E[(Z_i - μ_Z)π0i | Z_i]} = E[(Z_i - μ_Z)E(π0i | Z_i)] = E(Z_i - μ_Z) × E(π0i) = 0 (this uses the law of iterated expectations and the assumption that π0i is independent of Z_i), and because E[π1iZ_i(Z_i - μ_Z)] = E{E[π1iZ_i(Z_i - μ_Z) | Z_i]} = E(π1i)E[Z_i(Z_i - μ_Z)] = σ²_Z E(π1i) (this uses the law of iterated expectations and the assumption that π1i is independent of Z_i).

Next consider σ_ZY. Substituting Equation (13.10) into Equation (13.9) yields

Y_i = β0i + β1i(π0i + π1iZ_i + v_i) + u_i,   (13.19)

so

σ_ZY = cov(Z_i, β0i) + cov(Z_i, β1iπ0i) + cov(Z_i, β1iπ1iZ_i) + cov(Z_i, β1iv_i) + cov(Z_i, u_i) = E(β1iπ1i) × σ²_Z,   (13.20)

where the final equality follows by reasoning like that used to derive Equation (13.18): Because (β0i, β1i, π0i, π1i) are distributed independently of (u_i, v_i, Z_i), all of the covariances are zero except cov(Z_i, β1iπ1iZ_i) = E(β1iπ1i) × σ²_Z.

Substituting Equations (13.18) and (13.20) into Equation (13.17) yields β̂1_TSLS →p E(β1iπ1i)σ²_Z/[E(π1i)σ²_Z] = E(β1iπ1i)/E(π1i), which is the result stated in Equation (13.11).
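This result can be checked numerically. In the sketch below (all distributions are my own illustrative assumptions), β1i and π1i are positively correlated, so the TSLS estimate converges to E(β1iπ1i)/E(π1i) rather than to the average causal effect E(β1i):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

b1 = 1.0 + 0.5 * rng.normal(size=n)   # heterogeneous causal effect, E(b1) = 1
pi1 = 0.5 + 0.4 * b1                  # first-stage effect, correlated with b1
Z = rng.normal(size=n)
u = rng.normal(size=n)
v = rng.normal(size=n)

X = 0.2 + pi1 * Z + v                 # first stage with heterogeneous pi1
Y = 0.1 + b1 * X + u                  # structural equation with heterogeneous b1

# TSLS with a single instrument: sample cov(Z, Y) / sample cov(Z, X)
beta_tsls = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]
late = np.mean(b1 * pi1) / np.mean(pi1)   # E(b1*pi1)/E(pi1)
print(round(beta_tsls, 2), round(late, 2))
```

Because individuals with larger β1i also respond more strongly to the instrument, the probability limit weights their effects more heavily and exceeds E(β1i) in this design.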
PART FOUR

Regression Analysis of Economic Time Series Data

CHAPTER 14
CHAPTER 15
CHAPTER 16

CHAPTER 14

Introduction to Time Series Regression and Forecasting
Time series data can be used to answer quantitative questions for which cross-sectional data are inadequate. One such question is, what is the causal effect on a variable of interest, Y, of a change in another variable, X, over time? In other words, what is the dynamic causal effect on Y of a change in X? For example, what is the effect on traffic fatalities of a law requiring passengers to wear seatbelts?
Another such question is, what is your best forecast of the value of some variable at a future date? For example, what is your best forecast of next month's rate of inflation, interest rates, or stock prices? Both of these questions, one about dynamic causal effects, the other about economic forecasting, can be answered using time series data. But time series data pose special challenges, and overcoming those challenges requires some new techniques.

Chapters 14-16 introduce techniques for the econometric analysis of time series data and apply these techniques to the problems of forecasting and estimating dynamic causal effects. Chapter 14 introduces the basic concepts and tools of regression with time series data and applies them to economic forecasting. In Chapter 15, the concepts and tools developed in Chapter 14 are applied to the estimation of dynamic causal effects using time series data.
Even though forecasting and the estimation of causal effects are quite different tasks, the tools of time series regression are useful for both.

Section 14.2 introduces some basic concepts of time series analysis and presents several examples of economic time series. Section 14.3 presents regression models in which the regressors are past values of the dependent variable; these "autoregressive" models use the history of inflation to forecast its future. Forecasts can often be improved by adding additional predictors, such as lagged values of other variables, and these so-called autoregressive distributed lag models are introduced in Section 14.4. Time series variables can fail to be stationary in various ways, but two are especially relevant for regression analysis of economic time series data: the series can have trends, and the population regression can have breaks. These departures from stationarity jeopardize forecasts and inferences based on time series regression. Fortunately, there are statistical procedures for detecting trends and breaks and, once detected, for adjusting the model; these procedures are presented in Sections 14.6 and 14.7.
14.1 Using Regression Models for Forecasting
The empirical application of Chapters 4-9 focused on estimating the causal effect on test scores of the student-teacher ratio. The simplest regression model in Chapter 4 related test scores to the student-teacher ratio (STR):

TestScore = 698.9 - 2.28 × STR.   (14.1)
As was discussed in Chapter 6, a school superintendent who is considering reducing class sizes would not find Equation (14.1) helpful: Because of omitted variable bias, its coefficient does not estimate the causal effect of the student-teacher ratio on test scores.
In contrast, as was discussed in Chapter 9, a parent who is considering moving to a school district might find Equation (14.1) more helpful. Even though the coefficient does not have a causal interpretation, the regression could help the parent forecast test scores in a district for which they are not publicly available. More generally, a regression model can be useful for forecasting even if none of its coefficients have causal interpretations. From the perspective of forecasting, what is important is that the model provides as accurate a forecast as possible. Although there is no such thing as a perfect forecast, regression models can nevertheless provide forecasts that are accurate and reliable.

The applications in this chapter differ from the test score/class size prediction problem because this chapter focuses on using time series data to forecast future events. For example, the prospective parent actually would be interested in test scores next year, after his or her child has enrolled in a school. Of course, those tests have not yet been given, so the parent must forecast the scores using currently available information. If test scores are available for past years, then a good starting point is to use data on current and past test scores to forecast future test scores.

This reasoning leads directly to the autoregressive models presented in Section 14.3, in which past values of a variable are used in a linear regression to forecast future values of the series. The next step, which is taken in Section 14.4, is to extend these models to include additional predictor variables such as data on class size. Like Equation (14.1), such a regression model can produce accurate and reliable forecasts even if its coefficients have no causal interpretation. In Chapter 15, we return to problems like that faced by the school superintendent and discuss the estimation of causal effects using time series variables.
14.2 Introduction to Time Series Data and Serial Correlation
FIGURE 14.1
[Two quarterly time series plots, 1960-2005, omitted: (a) U.S. inflation rate (percent); (b) U.S. unemployment rate (percent).]
Price inflation in the United States (Figure 14.1a) drifted upward from 1960 until 1980 and then fell sharply during the early 1980s. The unemployment rate in the United States (Figure 14.1b) rises during recessions (the shaded episodes) and falls during expansions.
KEY CONCEPT 14.1
Lags, First Differences, Logarithms, and Growth Rates

The first lag of a time series Y_t is Y_{t-1}; its jth lag is Y_{t-j}.

The change in the value of Y between period t - 1 and period t is Y_t - Y_{t-1}; this change is called the first difference of the variable Y_t. In time series data, Δ is used to represent the first difference, so that ΔY_t = Y_t - Y_{t-1}.

The first difference of the logarithm of Y_t is Δln(Y_t) = ln(Y_t) - ln(Y_{t-1}).

The percentage change of a time series Y_t between periods t - 1 and t is approximately 100Δln(Y_t), where the approximation is most accurate when the percentage change is small.
Economic time series are often analyzed after computing their logarithms or the changes in their logarithms. One reason for this is that many economic series, such as gross domestic product (GDP), exhibit growth that is approximately exponential; that is, over the long run the series tends to grow by a certain percentage per year on average; if so, the logarithm of the series grows approximately linearly. Another reason is that the standard deviation of many economic time series is approximately proportional to its level; that is, the standard deviation is well expressed as a percentage of the level of the series; if so, then the standard deviation of the logarithm of the series is approximately constant. In either case, it is useful to transform the series so that changes in the transformed series are proportional (or percentage) changes in the original series, and this is achieved by taking the logarithm of the series.¹

Lags, first differences, and growth rates are summarized in Key Concept 14.1.

Lags, changes, and percentage changes are illustrated using the U.S. inflation rate in Table 14.1. The first column shows the date, or period, where the first quarter of 2004 is denoted 2004:I, the second quarter of 2004 is denoted 2004:II, and
¹The change of the logarithm of a variable is approximately equal to the proportional change of that variable; that is, ln(X + a) - ln(X) ≅ a/X, where the approximation works best when a/X is small [see Equation (8.16) and the surrounding discussion]. Now replace X with Y_{t-1} and a with ΔY_t, and note that Y_t = Y_{t-1} + ΔY_t. This means that the proportional change in the series Y_t between periods t - 1 and t is approximately ln(Y_t) - ln(Y_{t-1}) = ln(Y_{t-1} + ΔY_t) - ln(Y_{t-1}) ≅ ΔY_t/Y_{t-1}. The expression ln(Y_t) - ln(Y_{t-1}) is the first difference of ln(Y_t), Δln(Y_t); thus Δln(Y_t) ≅ ΔY_t/Y_{t-1}. The percentage change is 100 times the fractional change, so the percentage change in the series Y_t is approximately 100Δln(Y_t).
TABLE 14.1  Inflation in the United States in 2004 and the First Quarter of 2005

Quarter    U.S. CPI    Rate of Inflation at an Annual Rate (Inf_t)    First Lag (Inf_{t-1})    Change in Inflation (ΔInf_t)
2004:I     186.57      3.8                                            0.9                      2.9
2004:II    188.60      4.4                                            3.8                      0.6
2004:III   189.37      1.6                                            4.4                      -2.8
2004:IV    191.03      3.5                                            1.6                      1.9
2005:I     192.17      2.4                                            3.5                      -1.1

The annualized rate of inflation is the percentage change in the CPI from the previous quarter to the current quarter, times four. The first lag of inflation is its value in the previous quarter, and the change in inflation is the current inflation rate minus its first lag. All entries are rounded to the nearest decimal.
so forth. The second column shows the value of the CPI in that quarter, and the third column shows the rate of inflation. For example, from the first to the second quarter of 2004, the index increased from 186.57 to 188.60, a percentage increase of 100 × (188.60 - 186.57)/186.57 ≅ 1.09%. This is the percentage increase from one quarter to the next. It is conventional to report rates of inflation (and other growth rates in macroeconomic time series) on an annual basis, which is the percentage increase in prices that would occur over a year if the series were to continue to increase at the same rate. Because there are four quarters a year, the annualized rate of inflation in 2004:II is 1.09 × 4 = 4.36, or 4.4% per year after rounding.
This percentage change can also be computed using the differences-of-logarithms approximation in Key Concept 14.1. The difference in the logarithm of the CPI from 2004:I to 2004:II is ln(188.60) - ln(186.57) = 0.0108, yielding the approximate quarterly percentage difference 100 × 0.0108 = 1.08%. On an annualized basis, this is 1.08 × 4 = 4.32, or 4.3% after rounding, essentially the same as obtained by direct computation. In general, the annualized rate of inflation computed using this approximation is

Inf_t = 400Δln(CPI_t) = 400[ln(CPI_t) - ln(CPI_{t-1})],   (14.2)
where CPI_t is the value of the Consumer Price Index at date t. The factor of 400 arises from converting fractional changes to percentages (multiplying by 100) and converting quarterly percentage changes to an equivalent annual rate (multiplying by 4).
The final two columns of Table 14.1 illustrate lags and changes. The first lag of inflation in 2004:II is 3.8%, the inflation rate in 2004:I. The change in the rate of inflation from 2004:I to 2004:II was 4.4% - 3.8% = 0.6%.
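The computations in Table 14.1 are easy to reproduce; for example, the annualized 2004:II inflation rate follows from Equation (14.2) applied to the CPI values for 2004:I and 2004:II:

```python
import math

def annualized_inflation(cpi_prev, cpi_curr):
    """Annualized quarterly inflation, Equation (14.2):
    400 * [ln(CPI_t) - ln(CPI_{t-1})]."""
    return 400 * (math.log(cpi_curr) - math.log(cpi_prev))

# CPI values for 2004:I and 2004:II from Table 14.1
rate = annualized_inflation(186.57, 188.60)
print(round(rate, 1))  # 4.3, matching the log-approximation in the text
```

The direct percentage calculation, 400 × (188.60 - 186.57)/186.57 ≅ 4.4, differs only by the small approximation error of the logarithm.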
KEY CONCEPT 14.2
Autocorrelation (Serial Correlation) and Autocovariance

The jth autocovariance of a series Y_t is the covariance between Y_t and its jth lag, Y_{t-j}, and the jth autocorrelation coefficient is the correlation between Y_t and Y_{t-j}. That is,

jth autocovariance = cov(Y_t, Y_{t-j})   (14.3)

jth autocorrelation = ρ_j = corr(Y_t, Y_{t-j}) = cov(Y_t, Y_{t-j}) / √[var(Y_t)var(Y_{t-j})].   (14.4)
Autocorrelation

In time series data, the value of Y in one period typically is correlated with its value in the next period. The correlation of a series with its own lagged values is called autocorrelation or serial correlation. The first autocorrelation (or autocorrelation coefficient) is the correlation between Y_t and Y_{t-1}, that is, the correlation between values of Y at two adjacent dates. The second autocorrelation is the correlation between Y_t and Y_{t-2}, and the jth autocorrelation is the correlation between Y_t and Y_{t-j}. Similarly, the jth autocovariance is the covariance between Y_t and Y_{t-j}. Autocorrelation and autocovariance are summarized in Key Concept 14.2.
The jth sample autocovariance and the jth sample autocorrelation, ρ̂_j, are

ĉov(Y_t, Y_{t-j}) = (1/T) Σ_{t=j+1}^{T} (Y_t - Ȳ_{j+1,T})(Y_{t-j} - Ȳ_{1,T-j})   (14.5)

and

ρ̂_j = ĉov(Y_t, Y_{t-j}) / v̂ar(Y_t),   (14.6)

where Ȳ_{j+1,T} denotes the sample average of Y_t computed over the observations t = j + 1, ..., T and where v̂ar(Y_t) is the sample variance of Y.²
²The summation in Equation (14.5) is divided by T, whereas in the usual formula for the sample covariance [see Equation (3.24)] the summation is divided by the number of observations in the summation, minus a degrees-of-freedom adjustment. The formula in Equation (14.5) is conventional for the purpose of computing the autocovariance. Equation (14.6) uses the assumption that var(Y_t) and var(Y_{t-j}) are the same, an implication of the assumption that Y_t is stationary, which is discussed in Section 14.4.
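Equations (14.5) and (14.6) translate directly into code: the autocovariance sum is divided by T, and the denominator is the full-sample variance. The trending example series below is my own illustration, not data from the text:

```python
import numpy as np

def sample_autocorrelation(y, j):
    """jth sample autocorrelation, following Equations (14.5) and (14.6)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar_lead = y[j:].mean()        # average of Y_t over t = j+1, ..., T
    ybar_lag = y[:T - j].mean()     # average of Y_{t-j} over the same t's
    autocov = ((y[j:] - ybar_lead) * (y[:T - j] - ybar_lag)).sum() / T
    return autocov / (((y - y.mean()) ** 2).sum() / T)

# a trending series is strongly positively autocorrelated
y = np.arange(1.0, 51.0)            # 1, 2, ..., 50
print(round(sample_autocorrelation(y, 1), 2))
```

Note that, per the footnote, this differs slightly from a textbook covariance formula: the divisor is T rather than the number of terms in the sum.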
TABLE 14.2  First Four Sample Autocorrelations of the U.S. Inflation Rate and Its Change

Lag    Autocorrelation of Inf_t    Autocorrelation of ΔInf_t
1      0.84                        -0.26
2      0.76                        -0.25
3      0.76                        0.29
4      0.67                        -0.06
The first four sample autocorrelations of the inflation rate and of the change in the inflation rate are listed in Table 14.2. These entries show that inflation is strongly positively autocorrelated: The first autocorrelation is 0.84. The sample autocorrelation declines as the lag increases, but it remains large even at a lag of four quarters. The change in inflation is negatively autocorrelated: An increase in the rate of inflation in one quarter tends to be associated with a decrease in the next quarter.

At first, it might seem contradictory that the level of inflation is strongly positively correlated but its change is negatively correlated. These two autocorrelations, however, measure different things. The strong positive autocorrelation in inflation reflects the long-term trends in inflation evident in Figure 14.1a: Inflation was low in the first quarter of 1965 and again in the second; it was high in the first quarter of 1981 and again in the second. In contrast, the negative autocorrelation of the change of inflation means that, on average, an increase in inflation in one quarter is associated with a decrease in inflation in the next.
FIGURE 14.2
[Four time series plots, omitted: (a) federal funds interest rate; (b) U.S. dollar/British pound exchange rate; (c) logarithm of GDP in Japan; (d) percentage change in daily values of the NYSE Composite Stock Index.]
The four time series have markedly different patterns. The federal funds rate (Figure 14.2a) has a pattern similar to price inflation. The exchange rate between the U.S. dollar and the British pound (Figure 14.2b) shows a discrete change after the 1972 collapse of the Bretton Woods system of fixed exchange rates. The logarithm of GDP in Japan (Figure 14.2c) shows relatively smooth growth, although the growth rate decreases in the 1970s and again in the 1990s. The daily percentage changes in the NYSE stock price index (Figure 14.2d) are essentially unpredictable, but its variance changes: This series shows volatility clustering.
The dollar/pound exchange rate (Figure 14.2b) is the price of a British pound in U.S. dollars. Before 1972, the developed economies ran a system of fixed exchange rates, called the "Bretton Woods" system, under which governments worked to keep exchange rates from fluctuating. In 1972, inflationary pressures led to the breakdown of this system; thereafter, the major currencies were allowed to "float"; that is, their values were determined by the supply and demand for currencies in the market for foreign exchange. Prior to 1972, the exchange rate was approximately constant, with the exception of a single devaluation in 1967, in which the official value of the pound, relative to the dollar, was decreased to $2.40.
Quarterly Japanese GDP (Figure 14.2c) is the total value of goods and services produced in Japan during a quarter. GDP is the broadest measure of total economic activity. The logarithm of the series is plotted in Figure 14.2c, and changes in this series can be interpreted as (decimal) growth rates. During the 1960s and early 1970s, Japanese GDP grew quickly, but this growth slowed in the late 1970s and 1980s. Growth slowed further during the 1990s, averaging only 1.2% per year from 1990 to 2004.

The NYSE Composite market index is a broad index of the share prices of all firms traded on the New York Stock Exchange. Figure 14.2d plots the daily percentage changes in this index for trading days from January 2, 1990, to November 11, 2005 (a total of 4003 observations). Unlike the other series in Figure 14.2, there is very little serial correlation in these daily percent changes: If there were, then you could predict them using past daily changes and make money by buying when you expect the market to rise and selling when you expect it to fall. Although the changes are essentially unpredictable, inspection of Figure 14.2d reveals patterns in their volatility. For example, the standard deviation of daily percentage changes was relatively large in 1990-1991 and 1998-2003, and relatively small in 1995 and 2005. This "volatility clustering" is found in many financial time series, and econometric models for modeling this special type of heteroskedasticity are taken up in Section 16.5.
14.3 Autoregressions
What will the rate of price inflation, the percentage increase in overall prices, be next year? Wall Street investors rely on forecasts of inflation when deciding how much to pay for bonds. Economists at central banks, like the U.S. Federal Reserve Bank, use inflation forecasts when they set monetary policy. Firms use inflation forecasts when they forecast sales of their products, and local governments use inflation forecasts when they develop their budgets for the upcoming year. In this section, we consider forecasts made using an autoregression, a regression model that relates a time series variable to its past values.
Estimated using data from 1962 through 2004, the first-order autoregression for the change in inflation is

ΔÎnf_t = 0.017 - 0.238 ΔInf_{t-1},   (14.7)
        (0.126)  (0.096)

where, as usual, standard errors are given in parentheses under the estimated coefficients, and ΔÎnf_t is the predicted value of ΔInf_t based on the estimated regression line. The model in Equation (14.7) is called a first-order autoregression: an autoregression because it is a regression of the series onto its own lag, ΔInf_{t-1}, and first order because only one lag is used as a regressor. The coefficient in Equation (14.7) is negative, so an increase in the inflation rate in one quarter is associated with a decline in the inflation rate in the next quarter.
More generally, a first-order autoregression, abbreviated AR(1), represents Y_t as a linear function of its first lag:

Y_t = β0 + β1Y_{t-1} + u_t.   (14.8)
Forecasts and forecast errors. Suppose you have historical data on Y and you want to forecast its future value. If Y_t follows the AR(1) model in Equation (14.8), and if β0 and β1 are known, then the forecast of Y_{T+1} based on Y_T is β0 + β1Y_T. In practice, β0 and β1 are unknown, so the forecast must be based on estimates of β0 and β1:

Ŷ_{T+1|T} = β̂0 + β̂1Y_T,   (14.9)

where β̂0 and β̂1 are estimated using historical data through time T. The subscript T+1|T indicates that Ŷ_{T+1|T} is the forecast of Y_{T+1} made using data through period T.

The forecast error is the mistake made by the forecast; this is the difference between the value of Y_{T+1} that actually occurred and its forecasted value based on Y_T:

Forecast error = Y_{T+1} - Ŷ_{T+1|T}.   (14.10)
A forecast is not the same as a predicted value, and a forecast error is not the same as a residual. Predicted values pertain to the observations in the sample used to estimate the regression. In contrast, the forecast is made for some date beyond the data set used to estimate the regression, so the data on the actual value of the forecasted dependent variable are not in the sample used to estimate the regression. Similarly, the OLS residual is the difference between the actual value of Y and its predicted value for observations in the sample, whereas the forecast error is the difference between the future value of Y, which is not contained in the estimation sample, and the forecast of that future value. Said differently, forecasts and forecast errors pertain to "out-of-sample" observations, whereas predicted values and residuals pertain to "in-sample" observations.
The root mean squared forecast error (RMSFE) is a measure of the size of the forecast error, that is, of the magnitude of a typical mistake made using a forecasting model. The RMSFE is

RMSFE = √E[(Y_{T+1} - Ŷ_{T+1|T})²].   (14.11)

The RMSFE has two sources of error: the error arising because future values of u_t are unknown and the error in estimating the coefficients. If the first source of error is much larger than the second, as it can be if the sample size is large, then the RMSFE is approximately √var(u_t), the standard deviation of the error u_t in the population autoregression [Equation (14.8)]. The standard deviation of u_t is in turn estimated by the standard error of the regression (SER; see Section 4.3). Thus, if uncertainty arising from estimating the regression coefficients is small enough to be ignored, the RMSFE can be estimated by the standard error of the regression. Estimation of the RMSFE including both sources of forecast error is taken up in Section 14.4.
Application to inflation. What is the forecast of inflation in the first quarter of 2005 (2005:I) that a forecaster would have made in 2004:IV, based on the estimated AR(1) model in Equation (14.7) (which was estimated using data through 2004:IV)? From Table 14.1, the inflation rate in 2004:IV was 3.5% (so Inf_{2004:IV} = 3.5), an increase of 1.9 percentage points from 2004:III (so ΔInf_{2004:IV} = 1.9). Plugging these values into Equation (14.7), the forecast of the change in inflation from 2004:IV to 2005:I is ΔInf-hat_{2005:I|2004:IV} = 0.017 - 0.238 × 1.9 ≅ -0.4 (rounded to the nearest tenth). The predicted rate of inflation is the past rate of inflation plus its predicted change:

Inf-hat_{2005:I|2004:IV} = Inf_{2004:IV} + ΔInf-hat_{2005:I|2004:IV} = 3.5 - 0.4 = 3.1.   (14.12)
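The arithmetic behind Equations (14.7) and (14.12) can be checked directly; this is a minimal sketch using only the coefficient values quoted in the text:

```python
# Equation (14.7): forecasted change in inflation = 0.017 - 0.238 x ΔInf_{2004:IV}.
d_inf = 1.9                               # ΔInf_{2004:IV}, from the text
forecast_change = 0.017 - 0.238 * d_inf   # forecast of the change, about -0.4
forecast_inf = 3.5 + forecast_change      # Inf_{2004:IV} plus its predicted change

print(round(forecast_change, 1), round(forecast_inf, 1))
```

Rounded to the nearest tenth, this reproduces the -0.4 change and the 3.1% inflation forecast in Equation (14.12).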
538
CH APTER 14
The pth Order Autoregressive Model
The AR(1) model uses Y_{t-1} to forecast Y_t, but doing so ignores potentially useful information in the more distant past. One way to incorporate this information is to include additional lags in the AR(1) model; this yields the pth order autoregressive, or AR(p), model.

The pth order autoregressive model [the AR(p) model] represents Y_t as a linear function of p of its lagged values; that is, in the AR(p) model, the regressors are Y_{t-1}, Y_{t-2}, ..., Y_{t-p}, plus an intercept. The number of lags, p, included in an AR(p) model is called the order, or lag length, of the autoregression.
For example, an AR(4) model of the change in inflation uses four lags of the change in inflation as regressors. Estimated by OLS over the period 1962-2004, the AR(4) model is

ΔInf-hat_t = 0.02 - 0.26ΔInf_{t-1} - 0.32ΔInf_{t-2} + 0.16ΔInf_{t-3} - 0.03ΔInf_{t-4}.   (14.13)
            (0.12)  (0.09)          (0.08)          (0.08)          (0.09)
The coefficients on the final three additional lags in Equation (14.13) are jointly significantly different from zero at the 5% significance level: The F-statistic is 6.91 (p-value < 0.001). This is reflected in an improvement in the R2 from 0.05 for the AR(1) model in Equation (14.7) to 0.18 for the AR(4). Similarly, the SER of the AR(4) model in Equation (14.13) is 1.52, an improvement over the SER of the AR(1) model, which is 1.65.

The AR(p) model is summarized in Key Concept 14.3.
Key Concept 14.3: AUTOREGRESSIONS

The pth order autoregressive model [the AR(p) model] represents Y_t as a linear function of p of its lagged values:

Y_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + ... + β_p Y_{t-p} + u_t,   (14.14)

where E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0. The number of lags p is called the order, or the lag length, of the autoregression.
Properties of the forecast and error term in the AR(p) model. The assumption that the conditional expectation of u_t is zero given past values of Y_t [that is, E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0] has two important implications.

The first implication is that the best forecast of Y_{T+1} based on its entire history depends on only the most recent p past values. Specifically, let Y_{T+1|T} = E(Y_{T+1} | Y_T, Y_{T-1}, ...) denote the conditional mean of Y_{T+1} given its entire history. Then Y_{T+1|T} has the smallest RMSFE of any forecast based on the history of Y (Exercise 14.5). If Y_t follows an AR(p), then the best forecast of Y_{T+1} based on Y_T, Y_{T-1}, ... is

Y_{T+1|T} = β_0 + β_1 Y_T + β_2 Y_{T-1} + ... + β_p Y_{T-p+1},   (14.15)

which follows from the AR(p) model in Equation (14.14) and the assumption that E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0. In practice, actual forecasts from an AR(p) use Equation (14.15) with estimated coefficients.

The second implication is that the errors u_t are serially uncorrelated, a result that follows from Equation (2.27) (Exercise 14.5).
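As a sketch of how Equations (14.14) and (14.15) are used in practice, the following estimates an AR(p) by OLS on simulated data and forms the one-step-ahead forecast from the estimated coefficients. The AR(2) coefficients (0.5 and -0.2) are made up purely for the illustration:

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients of y on the columns of X (X includes the intercept column)."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

random.seed(0)
p, T = 2, 2000
# Simulate an AR(2): Y_t = 0.5 Y_{t-1} - 0.2 Y_{t-2} + u_t (made-up coefficients).
y = [0.0, 0.0]
for _ in range(T):
    y.append(0.5 * y[-1] - 0.2 * y[-2] + random.gauss(0, 1))

# Regress Y_t on (1, Y_{t-1}, ..., Y_{t-p}), as in Equation (14.14).
X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
Y = [y[t] for t in range(p, len(y))]
b = ols(X, Y)

# One-step-ahead forecast, Equation (14.15), with estimated coefficients:
forecast = b[0] + sum(b[j] * y[-j] for j in range(1, p + 1))
```

With a large simulated sample, the OLS estimates land close to the coefficients used to generate the data, and `forecast` is the estimated conditional mean of the next observation.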
'" ~
return ' h r
il
Forecast~
h.N.'d c
an: !>ometime' Clllled tn< mwtum' fon:c:nsts Jlthe
value of " :.tr~ k ro~ .h ' nonth pcrlulp It h
momentum anC: \\111 lso nsc n xt m nth IC ...o then
returns"; N: autocorr.: b t.:d nd th~.; nutmcgrcs.o;t\1!
mc.Ud "'' p c' d"' usctul fore\:,tlil 'lou cnn mtple-
nJ..::-. that me
of the market
r vmi1111r d em 1/C\1 fltl/1.~
TABLE 14.3 Autoregressive Models of Monthly Excess Stock Returns, 1960:1-2002:12

Columns: (1) AR(1), (2) AR(2), (3) AR(4). Rows: coefficients on excess return_{t-1}, excess return_{t-2}, ..., and the intercept, followed by the F-statistic on all coefficients (p-value) and the adjusted R2.

Notes: Excess returns are measured in percent per month. The data are described in Appendix 14.1. All regressions are estimated over 1960:1-2002:12 (T = 516 observations), with earlier observations used for initial values of lagged variables. Entries in the regression rows are coefficients, with heteroskedasticity-robust standard errors in parentheses. The final two rows report the heteroskedasticity-robust F-statistic testing the hypothesis that all coefficients in the regression are zero, with its p-value in parentheses, and the adjusted R2.
14.4 Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model
[Box continued: on this view, momentum strategies are unlikely to beat the market because stock prices already embody all currently available information. The reasoning is simple: If market participants could reliably forecast returns, they would trade on those forecasts, and prices would adjust until the forecastable component disappeared. The excess return data are described in Appendix 14.1.]
in the previous year. For example, in 1982 the unemployment rate averaged 9.7%, and during the next year the rate of inflation fell by 2.9%. Overall, the correlation in Figure 14.3 is -0.36.

The scatterplot in Figure 14.3 suggests that past values of the unemployment rate might contain information about the future course of inflation that is not already contained in past changes of inflation. This conjecture is readily checked by augmenting the AR(4) model in Equation (14.13) to include the first lag of the unemployment rate:
ΔInf-hat_t = 1.42 - 0.43ΔInf_{t-1} - 0.39ΔInf_{t-2} + 0.09ΔInf_{t-3} - 0.08ΔInf_{t-4} - 0.21Unemp_{t-1}.   (14.16)
            (0.53)  (0.09)          (0.09)          (0.09)          (0.09)          (0.08)

Including the second through fourth lags of the unemployment rate as well yields

ΔInf-hat_t = 1.30 - 0.42ΔInf_{t-1} - 0.37ΔInf_{t-2} + 0.06ΔInf_{t-3} - 0.04ΔInf_{t-4}
            (0.44)  (0.08)          (0.09)          (0.08)          (0.08)
            - 2.64Unemp_{t-1} + 3.04Unemp_{t-2} - 0.38Unemp_{t-3} + 0.25Unemp_{t-4}.   (14.17)
              (0.46)            (0.86)            (0.89)            (0.45)
The F-statistic testing the joint significance of the second through fourth lags of the unemployment rate is 10.76 (p-value < 0.001), so they are jointly significant. The R2 of the regression in Equation (14.17) is 0.34, a solid improvement over 0.21 for Equation (14.16). The F-statistic on all the unemployment coefficients is 8.91 (p-value < 0.001), indicating that this model represents a statistically significant improvement over the AR(4) model of Section 14.3 [Equation (14.13)]. The standard error of the regression in Equation (14.17) is 1.36, a substantial improvement over the SER of 1.52 for the AR(4).
FIGURE 14.3 Scatterplot of the Change in Inflation in Year t + 1 Against the Unemployment Rate in Year t. In 1982 the U.S. unemployment rate was 9.7%, and the rate of inflation in 1983 fell by 2.9% (the large dot). In general, high values of the unemployment rate in year t tend to be followed by decreases in the rate of price inflation in the next year, t + 1, with a correlation of -0.36. (Axes: change in inflation between year t and year t + 1; unemployment rate in year t.)
Thus the forecasted change in inflation from 2004:IV to 2005:I, obtained by substituting the 2004 values of the change in inflation and the unemployment rate into Equation (14.17), is

ΔInf-hat_{2005:I|2004:IV} = 0.1,   (14.18)

so the forecast of inflation in 2005:I is 3.5 + 0.1 = 3.6%.
Y_t = β_0 + β_1Y_{t-1} + β_2Y_{t-2} + ... + β_pY_{t-p} + δ_1X_{t-1} + δ_2X_{t-2} + ... + δ_qX_{t-q} + u_t,   (14.19)

where β_0, β_1, ..., β_p, δ_1, ..., δ_q are unknown coefficients and u_t is the error term with E(u_t | Y_{t-1}, Y_{t-2}, ..., X_{t-1}, X_{t-2}, ...) = 0. The model in Equation (14.16) is an ADL(4,1) model, and the model in Equation (14.17) is an ADL(4,4) model.

The autoregressive distributed lag model is summarized in Key Concept 14.4. With all these regressors, the notation in Equation (14.19) is somewhat cumbersome, and alternative, optional notation based on the so-called lag operator is presented in Appendix 14.3.
The assumption that the errors in the ADL model have a conditional mean of zero given all past values of Y and X, that is, that E(u_t | Y_{t-1}, Y_{t-2}, ..., X_{t-1}, X_{t-2}, ...) = 0, implies that no additional lags of either Y or X belong in the ADL model. In other words, the lag lengths p and q are the true lag lengths, and the coefficients on additional lags are zero.

The ADL model contains lags of the dependent variable (the autoregressive component) and a distributed lag of a single additional predictor, X. In general, however, forecasts can be improved by using multiple predictors. But before turning to the general time series regression model with multiple predictors, we first introduce the concept of stationarity, which will be used in that discussion.
Stationarity
Regression analysis of time series data necessarily uses data from the past to quantify historical relationships. If the future is like the past, then these historical relationships can be used to forecast the future. But if the future differs fundamentally from the past, then those historical relationships might not be reliable guides to the future.

In the context of time series regression, the idea that historical relationships can be generalized to the future is formalized by the concept of stationarity. The precise definition of stationarity, given in Key Concept 14.5, is that the distribution of the time series variable does not change over time.
Key Concept 14.5: STATIONARITY

A time series Y_t is stationary if its probability distribution does not change over time, that is, if the joint distribution of (Y_{s+1}, Y_{s+2}, ..., Y_{s+T}) does not depend on s; otherwise, Y_t is said to be nonstationary. A pair of time series, X_t and Y_t, are said to be jointly stationary if the joint distribution of (X_{s+1}, Y_{s+1}, X_{s+2}, Y_{s+2}, ..., X_{s+T}, Y_{s+T}) does not depend on s. Stationarity requires the future to be like the past, at least in a probabilistic sense.
Key Concept 14.6: TIME SERIES REGRESSION WITH MULTIPLE PREDICTORS

The general time series regression model allows for k additional predictors, where q_1 lags of the first predictor are included, q_2 lags of the second predictor are included, and so forth:

Y_t = β_0 + β_1Y_{t-1} + β_2Y_{t-2} + ... + β_pY_{t-p}
      + δ_{11}X_{1,t-1} + δ_{12}X_{1,t-2} + ... + δ_{1q1}X_{1,t-q1}
      + ... + δ_{k1}X_{k,t-1} + δ_{k2}X_{k,t-2} + ... + δ_{kqk}X_{k,t-qk} + u_t.   (14.20)

The first two assumptions are that u_t has conditional mean zero given all the regressors and their additional lags, and that the data are jointly stationary and weakly dependent; the third assumption is, as for cross-sectional data, that all the variables have nonzero, finite fourth moments. Finally, the fourth assumption, which is also the same as for cross-sectional data, is that the regressors are not perfectly multicollinear.
Key Concept 14.7: GRANGER CAUSALITY TESTS

The Granger causality statistic is the F-statistic testing the hypothesis that the coefficients on all the values of one of the variables in Equation (14.20) (for example, the coefficients on X_{1,t-1}, X_{1,t-2}, ..., X_{1,t-q1}) are zero. This null hypothesis implies that these regressors have no predictive content for Y_t beyond that contained in the other regressors, and the test of this null hypothesis is called the Granger causality test.
Under the assumptions of Key Concept 14.6, inference on the regression coefficients using OLS proceeds in the same way as it usually does using cross-sectional data.

One useful application of the F-statistic in time series forecasting is to test whether the lags of one of the included regressors have useful predictive content, above and beyond the other regressors in the model. The claim that a variable has no predictive content corresponds to the null hypothesis that the coefficients on all lags of that variable are zero. The F-statistic testing this null hypothesis is called the Granger causality statistic, and the associated test is called a Granger causality test (Granger, 1969). This test is summarized in Key Concept 14.7.

Granger causality has little to do with causality in the sense that it is used elsewhere in this book. In Chapter 1, causality was defined in terms of an ideal randomized controlled experiment, in which different values of X are applied experimentally and we observe the subsequent effect on Y. In contrast, Granger causality means that if X Granger-causes Y, then X is a useful predictor of Y, given the other variables in the regression. While "Granger predictability" is a more accurate term than "Granger causality," the latter has become part of the jargon of econometrics.
As an example, consider the relationship between the change in the inflation rate and its past values and past values of the unemployment rate. Based on the OLS estimates in Equation (14.17), the F-statistic testing the null hypothesis that the coefficients on all four lags of the unemployment rate are zero is 8.91 (p-value < 0.001). In the jargon of Key Concept 14.7, we can conclude (at the 1% significance level) that the unemployment rate Granger-causes changes in the inflation rate. This does not necessarily mean that a change in the unemployment rate will cause, in the sense of Chapter 1, a subsequent change in the inflation rate. It does mean that the past values of the unemployment rate appear to contain information that is useful for forecasting changes in the inflation rate, beyond that contained in past values of the inflation rate.
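The Granger causality F-statistic of Key Concept 14.7 can be sketched on simulated data. The data-generating coefficients below are made up for the illustration; the F-statistic is computed from the restricted and unrestricted sums of squared residuals:

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ssr(X, y):
    """Sum of squared residuals from the OLS regression of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

random.seed(2)
T = 500
x = [random.gauss(0, 1) for _ in range(T + 1)]
y = [0.0]
for t in range(1, T + 1):
    # y depends on its own lag and on x_{t-1}, so x Granger-causes y by construction.
    y.append(0.3 * y[-1] + 0.8 * x[t - 1] + random.gauss(0, 1))

# Unrestricted model: y_t on (1, y_{t-1}, x_{t-1}); restricted model drops x_{t-1}.
Xu = [[1.0, y[t - 1], x[t - 1]] for t in range(1, T + 1)]
Xr = [[1.0, y[t - 1]] for t in range(1, T + 1)]
Y  = [y[t] for t in range(1, T + 1)]
ssr_u, ssr_r = ssr(Xu, Y), ssr(Xr, Y)

q, k_u = 1, 3   # one restriction tested; three unrestricted coefficients
F = ((ssr_r - ssr_u) / q) / (ssr_u / (T - k_u))
```

Because x genuinely helps predict y here, the F-statistic comes out large, leading to rejection of the null of no predictive content. (This homoskedasticity-only F-statistic is a simplification; the text's applications use heteroskedasticity-robust versions.)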
Forecast uncertainty. The forecast error consists of two components: uncertainty arising from estimation of the regression coefficients, and uncertainty associated with the future unknown value of u_t. For regressions with few coefficients and many observations, the uncertainty arising from future u_t can be much larger than the uncertainty associated with estimation of the parameters. In general, however, both sources of uncertainty are important, so we now develop an expression for the RMSFE that incorporates these two sources of uncertainty.

To keep the notation simple, consider forecasts of Y_{T+1} based on an ADL(1,1) model with a single predictor, that is, Y_t = β_0 + β_1Y_{t-1} + δ_1X_{t-1} + u_t, and suppose that u_t is homoskedastic. The forecast is Y-hat_{T+1|T} = β-hat_0 + β-hat_1 Y_T + δ-hat_1 X_T, and the forecast error is

Y_{T+1} - Y-hat_{T+1|T} = u_{T+1} - [(β-hat_0 - β_0) + (β-hat_1 - β_1)Y_T + (δ-hat_1 - δ_1)X_T].   (14.21)

Because u_{T+1} has conditional mean zero and is homoskedastic, u_{T+1} has variance σ_u^2 and is uncorrelated with the final expression in brackets in Equation (14.21). Thus the mean squared forecast error (MSFE) is

MSFE = E[(Y_{T+1} - Y-hat_{T+1|T})^2]
     = σ_u^2 + var[(β-hat_0 - β_0) + (β-hat_1 - β_1)Y_T + (δ-hat_1 - δ_1)X_T],   (14.22)

and the RMSFE is the square root of the MSFE.
14.5 Lag Length Selection Using Information Criteria
[Box: The River of Blood. The Bank of England regularly publishes forecasts of inflation. These forecasts combine output from econometric models with the judgment of the members of its Monetary Policy Committee, and they are presented as a set of forecast intervals designed to reflect what these experts consider to be the range of probable paths of inflation. The river of blood for May 2005 is shown in Figure 14.4 (in this figure the blood is blue, not red, so you will need to use your imagination). This chart shows that, as of May 2005, the bank's economists expected the rate of inflation to climb to approximately 2%, then to hold steady for the foreseeable future.]
One such information criterion is the Bayes information criterion (BIC), also called the Schwarz information criterion (SIC), which is

BIC(p) = ln[SSR(p)/T] + (p + 1) ln(T)/T,   (14.23)

where SSR(p) is the sum of squared residuals of the estimated AR(p). The BIC estimator of p, p-hat, is the value that minimizes BIC(p) among the possible choices p = 0, 1, ..., p_max, where p_max is the largest value of p considered.
TABLE 14.4 The Bayes Information Criterion (BIC) and R2 for Autoregressive Models of U.S. Inflation

Columns: p, SSR(p)/T, ln[SSR(p)/T], (p + 1)ln(T)/T, BIC(p), and R2, for p = 0, 1, ..., 6. (For example, for p = 1: SSR(1)/T = 2.737, ln[SSR(1)/T] = 1.007, (p + 1)ln(T)/T = 0.060, and BIC(1) = 1.067. The BIC is smallest at p = 2.)
This formula for the BIC might look a bit mysterious at first, but it has an intuitive appeal. Consider the first term in Equation (14.23). Because the regression coefficients are estimated by OLS, the sum of squared residuals necessarily decreases (or at least does not increase) when you add a lag. In contrast, the second term is the number of estimated regression coefficients (the number of lags, p, plus one for the intercept) times the factor (ln T)/T. This second term increases when you add a lag. The BIC trades off these two forces so that the number of lags that minimizes the BIC is a consistent estimator of the true lag length. The mathematics of this argument is given in Appendix 14.5.
As an example, consider estimating the AR order for an autoregression of the change in the inflation rate. The various steps in the calculation of the BIC are carried out in Table 14.4 for autoregressions of maximum order six (p_max = 6). For example, for the AR(1) model in Equation (14.7), SSR(1)/T = 2.737, so ln[SSR(1)/T] = 1.007. Because T = 172 (43 years, four quarters per year), ln(T)/T = 0.030 and (p + 1)ln(T)/T = 2 × 0.030 = 0.060. Thus BIC(1) = 1.007 + 0.060 = 1.067.
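The BIC(1) computation above can be verified in a few lines, using the values quoted in the text:

```python
import math

# Reproduce the BIC(1) calculation: SSR(1)/T = 2.737 and T = 172.
ssr_over_T, T, p = 2.737, 172, 1
penalty = (p + 1) * math.log(T) / T          # 2 x 0.030 = 0.060
bic1 = math.log(ssr_over_T) + penalty        # Equation (14.23)

print(round(penalty, 3), round(bic1, 3))
```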
The BIC is smallest when p = 2 in Table 14.4; thus the BIC estimate of the lag length is 2. As can be seen in Table 14.4, as the number of lags increases, the R2 increases and the SSR decreases. The increase in the R2 is large from one to two lags, smaller from two to three, and quite small from three to four. The BIC helps decide precisely how large the increase in the R2 must be to justify including the additional lag.
The AIC. The BIC is not the only information criterion; another is the Akaike information criterion (AIC):

AIC(p) = ln[SSR(p)/T] + (p + 1)(2/T).   (14.24)

The difference between the AIC and the BIC is that the term "ln T" in the BIC is replaced by "2" in the AIC, so the second term in the AIC is smaller. For example, for the 172 observations used to estimate the inflation autoregressions, ln T = ln(172) = 5.15, so the second term for the BIC is more than twice as large as the corresponding term in the AIC. Thus a smaller decrease in the SSR is needed in the AIC to justify including another lag. As a matter of theory, the second term in the AIC is not large enough to ensure that the correct lag length is chosen, even in large samples, so the AIC estimator of p is not consistent. As is discussed in Appendix 14.5, in large samples the AIC will overestimate p with nonzero probability.

Despite this theoretical blemish, the AIC is widely used in practice. If you are concerned that the BIC might yield a model with too few lags, the AIC provides a reasonable alternative.
A note on calculating information criteria. How well two estimated regressions fit the data is best assessed when they are estimated using the same data sets. Because the BIC and AIC are formal methods for making this comparison, the autoregressions under consideration should be estimated using the same observations. For example, in Table 14.4 all the regressions were estimated using data from 1962:I to 2004:IV, for a total of 172 observations. Because the autoregressions involve lags of the change of inflation, this means that earlier values of the change of inflation (values before 1962:I) were used as regressors for the preliminary observations. Said differently, the regressions examined in Table 14.4 each include observations on ΔInf_t, ΔInf_{t-1}, ..., ΔInf_{t-p} for t = 1962:I, ..., 2004:IV, corresponding to 172 observations on the dependent variable and regressors, so T = 172 in Equations (14.23) and (14.24).
Information criteria. The BIC and AIC extend directly to models with multiple predictors. For a model with K coefficients, the BIC is

BIC(K) = ln[SSR(K)/T] + K ln(T)/T,   (14.25)

where SSR(K) is the sum of squared residuals from the estimated model. The AIC is defined in the same way, but with 2 replacing ln T in Equation (14.25). For each candidate model the BIC (or AIC) can be evaluated, and the model with the lowest value of the BIC (or AIC) is the preferred model, based on the information criterion.

There are two important practical considerations when using an information criterion to estimate the lag lengths. First, as is the case for the autoregression, all the candidate models must be estimated over the same sample; in the notation of Equation (14.25), the number of observations used to estimate the model, T, must be the same for all models. Second, when there are multiple predictors, this approach is computationally demanding because it requires computing many different models (many combinations of the lag parameters). In practice, a convenient shortcut is to require all the regressors to have the same number of lags, that is, to require that p = q_1 = ... = q_k, so that only p_max + 1 models need to be compared (corresponding to p = 0, 1, ..., p_max).
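A sketch of lag-length selection by BIC over a common sample, using simulated AR(2) data (the generating coefficients are made up for the illustration):

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_ssr(X, y):
    """Sum of squared residuals from OLS of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

random.seed(3)
y = [0.0, 0.0]
for _ in range(2000):
    y.append(0.5 * y[-1] - 0.2 * y[-2] + random.gauss(0, 1))  # made-up AR(2)

p_max = 4
start = p_max                  # common sample: every candidate uses the same T
Y = [y[t] for t in range(start, len(y))]
T = len(Y)
bics = []
for p in range(p_max + 1):
    X = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in range(start, len(y))]
    s = ols_ssr(X, Y)
    bics.append(math.log(s / T) + (p + 1) * math.log(T) / T)  # Equation (14.23)
p_hat = min(range(p_max + 1), key=lambda p: bics[p])
```

Because the candidates share the same estimation sample, their BIC values are directly comparable; with data generated by an AR(2), the criterion at p = 2 beats the no-lag model by a wide margin.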
14.6 Nonstationarity I: Trends
In this and the next section, we examine two of the most important types of nonstationarity in economic time series data: trends and breaks. In each section, we first describe the nature of the nonstationarity, then discuss the consequences for time series regression if this type of nonstationarity is present but is ignored. We next present tests for nonstationarity and discuss remedies for, or solutions to, the problems caused by that particular type of nonstationarity. We begin by discussing trends.
What Is a Trend?

A trend is a persistent long-term movement of a variable over time; a time series variable fluctuates around its trend. Inspection of Figure 14.2 suggests that several of the economic time series plotted there have trends.
Deterministic and stochastic trends. There are two types of trends seen in time series data: deterministic and stochastic. A deterministic trend is a nonrandom function of time. For example, a deterministic trend might be linear in time; if inflation had a deterministic linear trend so that it increased by 0.1 percentage point per quarter, this trend could be written as 0.1t, where t is measured in quarters. In contrast, a stochastic trend is random and varies over time. For example, a stochastic trend in inflation might exhibit a prolonged period of increase followed by a prolonged period of decrease, as in Figure 14.1.

Like many econometricians, we think it is more appropriate to model economic time series as having stochastic rather than deterministic trends. Economics is complicated stuff. It is hard to reconcile the predictability implied by a deterministic trend with the complications and surprises faced year after year by workers, businesses, and governments. For example, although U.S. inflation rose through the 1970s, it was neither destined to rise forever nor destined to fall again. Rather, the slow rise of inflation is now understood to have occurred because of bad luck and monetary policy mistakes, and its taming was in large part a consequence of tough decisions made by the Board of Governors of the Federal Reserve. Similarly, the dollar/pound exchange rate trended down from 1972 to 1985 and subsequently drifted up, but these movements too were the consequence of complex
economic forces; because these forces change unpredictably, these trends are usefully thought of as having a large unpredictable, or random, component.
For these reasons, our treatment of trends in economic time series focuses on stochastic rather than deterministic trends, and when we refer to "trends" in time series data, we mean stochastic trends unless we explicitly say otherwise. This section presents the simplest model of a stochastic trend, the random walk model; other models of trends are discussed in Section 16.3.
The random walk model of a trend. The simplest model of a variable with a stochastic trend is the random walk. A time series Y_t is said to follow a random walk if the change in Y_t is i.i.d., that is, if

Y_t = Y_{t-1} + u_t,   (14.26)

where u_t is i.i.d. We will, however, use the term "random walk" more generally to refer to a time series that follows Equation (14.26), where u_t has conditional mean zero, that is, E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0.
The basic idea of a random walk is that the value of the series tomorrow is its value today, plus an unpredictable change: Because the path followed by Y_t consists of random "steps" u_t, that path is a "random walk." The conditional mean of Y_t based on data through time t - 1 is Y_{t-1}; that is, because E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0, E(Y_t | Y_{t-1}, Y_{t-2}, ...) = Y_{t-1}. In other words, if Y_t follows a random walk, then the best forecast of tomorrow's value is its value today.
Some series, such as the logarithm of Japanese GDP in Figure 14.2c, have an obvious upward tendency, in which case the best forecast of the series must include an adjustment for the tendency of the series to increase. This adjustment leads to an extension of the random walk model to include a tendency to move, or "drift," in one direction or the other. This extension is referred to as a random walk with drift:

Y_t = β_0 + Y_{t-1} + u_t,   (14.27)

where E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0 and β_0 is the "drift" in the random walk. If β_0 is positive, then Y_t increases on average. In the random walk with drift model, the best forecast of the series tomorrow is the value of the series today, plus the drift β_0.

The random walk model (with drift as appropriate) is simple yet versatile, and it is the primary model for trends used in this book.
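A minimal simulation of a random walk with drift, Equation (14.27), with a made-up drift of 0.1, illustrates that the best forecast of tomorrow's value is today's value plus the drift:

```python
import random

random.seed(1)
beta0, T = 0.1, 10_000          # made-up drift and sample size
y = [0.0]
for _ in range(T):
    y.append(beta0 + y[-1] + random.gauss(0, 1))   # Equation (14.27)

# Best forecast of tomorrow's value: today's value plus the drift.
forecast = y[-1] + beta0

# The average one-period change estimates the drift beta0.
avg_change = (y[-1] - y[0]) / T
```

Over a long simulated path the average step is close to the drift, while the step-to-step changes themselves remain unpredictable.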
'"
558
CHAPTER 14
low when its true value is 1; (2) t-statistics on regressors with stochastic trends can have a nonnormal distribution, even in large samples; and (3) an extreme example of the risks posed by stochastic trends is that two series that are independent will, with high probability, misleadingly appear to be related if they both have stochastic trends, a situation known as "spurious regression."

If a regressor has a stochastic trend, then its usual OLS t-statistic can have a nonnormal distribution under the null hypothesis, even in large samples. This nonnormal distribution means that conventional confidence intervals are not valid and hypothesis tests cannot be conducted as usual. In general, the distribution of this t-statistic is not readily tabulated, because the distribution depends on the relationship between the regressor in question and the other regressors. An important example of this problem arises in regressions that attempt to forecast stock returns using regressors that could have stochastic trends (see the box in Section 14.7, "Can You Beat the Market? Part II").
One important case in which it is possible to tabulate the distribution of the t-statistic when the regressor has a stochastic trend is in the context of an autoregression with a unit root; we return to this special case when we take up the problem of testing whether a time series contains a stochastic trend.
Spurious regression. Stochastic trends can lead two time series to appear related when they are not, a problem called "spurious regression." For example, U.S. inflation was steadily rising from the mid-1960s through the early 1980s, and at the same time Japanese GDP (plotted in logarithms in Figure 14.2c) was steadily rising. These two trends conspire to produce a regression that appears to be "significant" using conventional measures. Estimated by OLS using data from 1965 through 1981, this regression is
[Equations (14.28) and (14.29): the OLS regression of U.S. inflation on ln(Japanese GDP_t), estimated over 1965-1981, has a large positive slope that appears statistically significant, with R2 = 0.56 (Equation (14.28)); the same regression estimated over 1982-2004 has a negative but apparently statistically significant slope (Equation (14.29)).]
The regressions in Equations (14.28) and (14.29) could hardly be more different. Interpreted literally, Equation (14.28) indicates a strong positive relationship, while Equation (14.29) indicates a weak, but apparently statistically significant, negative relationship.

The source of these conflicting results is that both series have stochastic trends. These trends happened to align from 1965 through 1981 but did not align from 1982 through 2004. There is in fact no compelling economic or political reason to think that the trends in these two series are related. In short, these regressions are spurious.

The regressions in Equations (14.28) and (14.29) illustrate empirically the theoretical point that OLS can be misleading when the series contain stochastic trends (see Exercise 14.6 for a computer simulation that demonstrates this result). One special case in which certain regression-based methods are reliable is when the trend component of the two series is the same, that is, when the series contain a common stochastic trend; if so, the series are said to be cointegrated. Econometric methods for detecting and analyzing cointegrated economic time series are discussed in Section 16.4.
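The spurious-regression phenomenon, in the spirit of Exercise 14.6, can be demonstrated by simulation: independent random walks typically yield a far higher regression R2 than independent white-noise series do. This sketch uses made-up simulation settings:

```python
import random

def r_squared(x, y):
    """R2 from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

random.seed(6)
reps, T = 200, 100
rw, wn = [], []
for _ in range(reps):
    x, y = [0.0], [0.0]
    for _ in range(T):                     # two INDEPENDENT random walks
        x.append(x[-1] + random.gauss(0, 1))
        y.append(y[-1] + random.gauss(0, 1))
    rw.append(r_squared(x, y))
    u = [random.gauss(0, 1) for _ in range(T)]   # two independent white noises
    v = [random.gauss(0, 1) for _ in range(T)]
    wn.append(r_squared(u, v))

mean_rw = sum(rw) / reps
mean_wn = sum(wn) / reps
```

Even though the two random walks are unrelated by construction, their shared stochastic-trend behavior produces an average R2 many times larger than that of unrelated stationary series.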
Detecting Stochastic Trends: Testing for a Unit AR Root
Trends in time series data can be detected by informal and formal methods. The informal methods involve inspecting a time series plot of the data and computing the autocorrelation coefficients, as we did in Section 14.2. Because the first autocorrelation coefficient will be near 1 if the series has a stochastic trend, at least in large samples, a small first autocorrelation coefficient combined with a time series plot that has no apparent trend suggests that the series does not have a trend. If doubt remains, however, there are formal statistical procedures that can be used to test the hypothesis that there is a stochastic trend in the series against the alternative that there is no trend.

In this section, we use the Dickey-Fuller test (named after its inventors David Dickey and Wayne Fuller, 1979) to test for a stochastic trend. Although the Dickey-Fuller test is not the only test for a stochastic trend (another test is discussed in Section 16.3), it is the most commonly used test in practice and is one of the most reliable.
The Dickey-Fuller test in the AR(1) model. The starting point for the Dickey-Fuller test is the autoregressive model. As discussed earlier, the random walk in Equation (14.27) is a special case of the AR(1) model with β_1 = 1. If β_1 = 1, Y_t is nonstationary and contains a (stochastic) trend. Thus, within the AR(1) model, the hypothesis that Y_t has a trend can be tested by testing

H_0: β_1 = 1 vs. H_1: β_1 < 1 in Y_t = β_0 + β_1 Y_{t-1} + u_t.   (14.30)

If β_1 = 1, the AR(1) has an autoregressive root of 1, so the null hypothesis in Equation (14.30) is that the AR(1) has a unit root, and the alternative is that it is stationary.

This test is most easily implemented by estimating a modified version of Equation (14.30), obtained by subtracting Y_{t-1} from both sides. Let δ = β_1 - 1; then Equation (14.30) becomes

ΔY_t = β_0 + δY_{t-1} + u_t.   (14.31)
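The δ = β_1 - 1 reparameterization in Equation (14.31) can be checked on simulated data. With a made-up stationary AR(1) coefficient of 0.7, the Dickey-Fuller regression of ΔY_t on Y_{t-1} recovers δ close to -0.3:

```python
import random

random.seed(4)
T = 2000
beta0, beta1 = 0.5, 0.7        # made-up stationary AR(1): true delta = beta1 - 1 = -0.3
y = [0.0]
for _ in range(T):
    y.append(beta0 + beta1 * y[-1] + random.gauss(0, 1))

# Dickey-Fuller regression, Equation (14.31): regress ΔY_t on Y_{t-1} (with intercept).
dy   = [y[t] - y[t - 1] for t in range(1, len(y))]
ylag = [y[t - 1] for t in range(1, len(y))]
n = len(dy)
m_lag, m_dy = sum(ylag) / n, sum(dy) / n
delta_hat = (sum((a - m_lag) * (b - m_dy) for a, b in zip(ylag, dy))
             / sum((a - m_lag) ** 2 for a in ylag))
```

A clearly negative δ-hat is what the one-sided Dickey-Fuller test looks for; under a unit root (β_1 = 1), δ-hat would instead hover near zero, and judging its significance requires the nonstandard critical values discussed below.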
Regression software automatically prints out the t-statistic testing δ = 0. Note that the Dickey-Fuller test is one-sided, because the relevant alternative is that Y_t is stationary, so β_1 < 1 or, equivalently, δ < 0. The Dickey-Fuller statistic is computed using "nonrobust" standard errors, that is, the "homoskedasticity-only" standard errors presented in Appendix 5.1 [Equation (5.29) for the case of a single regressor] and in Section 18.4 for the multiple regression model.³
The Dickey-Fuller test in the AR(p) model. The Dickey-Fuller statistic presented in the context of Equation (14.31) applies only to an AR(1). As discussed in Section 14.3, for some series the AR(1) model does not capture all the serial correlation in Y_t, in which case a higher-order autoregression is more appropriate.

The extension of the Dickey-Fuller test to the AR(p) model is summarized in Key Concept 14.8. Under the null hypothesis, δ = 0 and ΔY_t is a stationary AR(p). Under the alternative hypothesis, δ < 0, so that Y_t is stationary. Because the regression used to compute this version of the Dickey-Fuller statistic is augmented by lags of ΔY_t, the resulting t-statistic is referred to as the augmented Dickey-Fuller (ADF) statistic.

In general the lag length p is unknown, but it can be estimated using an information criterion applied to regressions of the form in Equation (14.32) for various values of p. Studies of the ADF statistic suggest that it is better to have too many lags than too few, so it is recommended to use the AIC instead of the BIC to estimate p for the ADF statistic.⁴
Testing against the alternative of stationarity around a linear deterministic time trend. The discussion so far has considered the null hypothesis that the series has a unit root and the alternative hypothesis that it is stationary. This alternative hypothesis of stationarity is appropriate for series, like the rate of inflation, that do not exhibit long-term growth. But other economic time series, like Japanese GDP (Figure 14.2c), exhibit long-run growth, and for such series the alternative of stationarity without a trend is inappropriate. Instead, a commonly used alternative is that the series is stationary around a deterministic time trend, that is, a trend that is a deterministic function of time.

One specific formulation of this alternative hypothesis is that the time trend is linear, that is, the trend is a linear function of t; thus the null hypothesis is that the series has a unit root and the alternative is that it does not have a unit root but does have a deterministic time trend. The Dickey-Fuller regression must be
³Under the null hypothesis of a unit root, the usual "nonrobust" standard errors produce a t-statistic that is in fact robust to heteroskedasticity, a surprising and special result.
⁴See Stock (1994) and Haldrup and Jansson (2006) for reviews of simulation studies of the finite-sample properties of the Dickey-Fuller and other unit root test statistics.
562
CHAPTER 14
KEY CONCEPT 14.8
The Augmented Dickey-Fuller Test for a Unit Autoregressive Root

The augmented Dickey-Fuller (ADF) test for a unit autoregressive root tests the null hypothesis H₀: δ = 0 against the one-sided alternative H₁: δ < 0 in the regression

ΔY_t = β₀ + δY_{t-1} + γ₁ΔY_{t-1} + γ₂ΔY_{t-2} + ... + γ_pΔY_{t-p} + u_t.   (14.32)

Under the null hypothesis, Y_t has a stochastic trend; under the alternative hypothesis, Y_t is stationary. The ADF statistic is the OLS t-statistic testing δ = 0 in Equation (14.32).

If instead the alternative hypothesis is that Y_t is stationary around a deterministic linear time trend, then this trend, t (the observation number), must be added as an additional regressor, in which case the Dickey-Fuller regression becomes

ΔY_t = β₀ + αt + δY_{t-1} + γ₁ΔY_{t-1} + ... + γ_pΔY_{t-p} + u_t,   (14.33)

where α is an unknown coefficient and the ADF statistic is the OLS t-statistic testing δ = 0 in Equation (14.33).

The lag length p can be estimated using the BIC or AIC. The ADF statistic does not have a normal distribution, even in large samples. Critical values for the one-sided ADF test depend on whether the test is based on Equation (14.32) or (14.33) and are given in Table 14.5.
modified to test the null hypothesis of a unit root against the alternative that it is stationary around a linear time trend. As summarized in Equation (14.33) in Key Concept 14.8, this is accomplished by adding a time trend (the regressor X_t = t) to the regression.

A linear time trend is not the only way to specify a deterministic time trend; for example, the deterministic time trend could be quadratic, or it could be linear but have breaks (that is, be linear with slopes that differ in two parts of the sample). The use of alternatives like these with nonlinear deterministic trends should be motivated by economic theory. For a discussion of unit root tests against stationarity around nonlinear deterministic trends, see Maddala and Kim (1998, Chapter 13).
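The mechanics of the regressions in Equations (14.32) and (14.33) can be made concrete with a short sketch. This is illustrative code, not the textbook's own software: the function name `adf_stat` and the simulated series are my own choices. The function runs the ADF regression by OLS and returns the homoskedasticity-only t-statistic on Y_{t-1}, which (as noted in the footnote above) is the conventional choice for this test:

```python
import numpy as np

def adf_stat(y, p=1, trend=False):
    """OLS t-statistic on Y_{t-1} in the ADF regression of Eq. (14.32)/(14.33).

    Regresses dY_t on [1, (t,) Y_{t-1}, dY_{t-1}, ..., dY_{t-p}] and returns
    the homoskedasticity-only t-statistic testing delta = 0.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    rows = []
    for t in range(p, len(dy)):
        row = [1.0]
        if trend:
            row.append(float(t))            # linear deterministic time trend
        row.append(y[t])                    # Y_{t-1} for the observation dy[t]
        row.extend(dy[t - j] for j in range(1, p + 1))  # lagged differences
        rows.append(row)
    X = np.array(rows)
    yy = dy[p:]
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    resid = yy - X @ beta
    s2 = resid @ resid / (X.shape[0] - X.shape[1])  # homoskedasticity-only
    cov = s2 * np.linalg.inv(X.T @ X)
    k = 2 if trend else 1                   # position of the Y_{t-1} coefficient
    return beta[k] / np.sqrt(cov[k, k])

rng = np.random.default_rng(0)
e = rng.standard_normal(500)
random_walk = np.cumsum(e)                  # has a unit root
stationary = np.empty(500)
stationary[0] = 0.0
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]   # stationary AR(1)

print(adf_stat(random_walk, p=2))   # typically well above the -2.86 critical value
print(adf_stat(stationary, p=2))    # typically far below -2.86: reject a unit root
```

The key point of Table 14.5 still applies: the statistic returned here must be compared with the ADF critical values, not with normal critical values.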
14.6 Nonstationarity I: Trends

TABLE 14.5  Large-Sample Critical Values of the Augmented Dickey-Fuller Statistic

Deterministic Regressors       10%      5%       1%
Intercept only                -2.57    -2.86    -3.43
Intercept and time trend      -3.12    -3.41    -3.96
Because its distribution is nonstandard, the usual critical values from the normal distribution cannot be used when using the ADF statistic to test for a unit root; a special set of critical values, based on the distribution of the ADF statistic under the null hypothesis, must be used instead.

The critical values for the ADF test are given in Table 14.5. Because the alternative hypothesis of stationarity implies that δ < 0 in Equations (14.32) and (14.33), the ADF test is one-sided. For example, if the regression does not include a time trend, then the hypothesis of a unit root is rejected at the 5% significance level if the ADF statistic is less than -2.86. If a time trend is included in the regression, the critical value is instead -3.41.

The critical values in Table 14.5 are substantially larger (more negative) than the one-sided critical values of -1.28 (at the 10% level) and -1.645 (at the 5% level) from the standard normal distribution. The nonstandard distribution of the ADF statistic is an example of how OLS t-statistics for regressors with stochastic trends can have nonnormal distributions. Why the large-sample distribution of the ADF statistic is nonstandard is discussed further in Section 16.3.
Does U.S. inflation have a stochastic trend? The null hypothesis that inflation has a stochastic trend can be tested against the alternative that it is stationary by performing the ADF test for a unit autoregressive root. The ADF regression with four lags of ΔInf_t is

ΔÎnf_t = 0.52 - 0.11 Inf_{t-1} - 0.19 ΔInf_{t-1} - 0.26 ΔInf_{t-2} + 0.20 ΔInf_{t-3} - 0.03 ΔInf_{t-4}.   (14.34)
        (0.21)  (0.04)          (0.09)            (0.08)            (0.08)            (0.08)

The ADF t-statistic is the t-statistic testing the hypothesis that the coefficient on Inf_{t-1} is zero; this is t = -2.69. From Table 14.5, the 5% critical value is -2.86. Because the ADF statistic of -2.69 is less negative than -2.86, the test does not reject the null hypothesis at the 5% significance level. Based on the regression in Equation (14.34), we therefore cannot reject (at the 5% significance level) the null hypothesis that inflation has a unit autoregressive root, that is, that inflation contains a stochastic trend, against the alternative that it is stationary.
The ADF regression in Equation (14.34) includes four lags of ΔInf_t to compute the ADF statistic. When the number of lags is estimated using the AIC, where 0 ≤ p ≤ 5, the AIC estimator of the lag length is, however, three. When three lags are used (that is, when ΔInf_{t-1}, ΔInf_{t-2}, and ΔInf_{t-3} are included as regressors), the ADF statistic is -2.72, which is less negative than -2.86. Thus, when the number of lags in the ADF regression is chosen by AIC, the hypothesis that inflation contains a stochastic trend is not rejected at the 5% significance level.

These tests were performed at the 5% significance level. At the 10% significance level, however, the tests reject the null hypothesis of a unit root: the ADF statistics of -2.69 (four lags) and -2.72 (three lags) are more negative than the 10% critical value of -2.57. Thus the ADF statistics paint a rather ambiguous picture, and the forecaster must make an informed judgment about whether to model inflation as having a stochastic trend. Clearly, inflation in Figure 14.1 exhibits long-run swings, consistent with the stochastic trend model. In practice, many forecasters treat U.S. inflation as having a stochastic trend, and we follow that strategy here.
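The lag-length choice by information criterion described above can be sketched numerically. This is an illustrative sketch only (the function name `info_criteria` and the simulated AR(2) series are my own, not from the text). It applies the formulas of Section 14.5, IC(p) = ln(SSR(p)/T) + (p + 1)c_T/T, with c_T = ln T for the BIC and c_T = 2 for the AIC, fitting every AR(p) on a common sample so that the SSRs are comparable:

```python
import numpy as np

def info_criteria(y, pmax):
    """BIC(p) and AIC(p) for AR(p) models, p = 1..pmax, all estimated by OLS
    on the same T observations (those after lag pmax)."""
    y = np.asarray(y, dtype=float)
    T = len(y) - pmax                       # common estimation sample size
    target = y[pmax:]
    bic, aic = {}, {}
    for p in range(1, pmax + 1):
        X = np.column_stack([np.ones(T)] +
                            [y[pmax - j:len(y) - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        ssr = np.sum((target - X @ beta) ** 2)
        bic[p] = np.log(ssr / T) + (p + 1) * np.log(T) / T
        aic[p] = np.log(ssr / T) + (p + 1) * 2.0 / T
    return bic, aic

# Simulated AR(2) data (illustrative only).
rng = np.random.default_rng(1)
y = np.zeros(600)
for t in range(2, 600):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + rng.standard_normal()

bic, aic = info_criteria(y, pmax=6)
print(min(bic, key=bic.get), min(aic, key=aic.get))  # lag lengths chosen by BIC and AIC
```

Because the AIC penalizes extra lags less heavily than the BIC, the AIC-chosen lag length is never smaller than the BIC-chosen one, which is consistent with the recommendation above to prefer the AIC for the ADF regression.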
14.7 Nonstationarity II: Breaks

What Is a Break?

Breaks can arise either from a discrete change in the population regression coefficients at a distinct date or from a gradual evolution of the coefficients over a longer period of time.
Testing for a break at a known date. In some applications you might suspect that there is a break at a known date. For example, if you are studying international trade relationships using data from the 1970s, you might hypothesize that there is a break in the population regression function of interest in 1972, when the Bretton Woods system of fixed exchange rates was abandoned in favor of floating exchange rates.

If the date of the hypothesized break in the coefficients is known, then the null hypothesis of no break can be tested using a binary variable interaction regression of the type discussed in Chapter 8 (Key Concept 8.4). To keep things simple, consider an ADL(1,1) model, so there is an intercept, a single lag of Y_t, and a single lag of X_t. Let τ denote the hypothesized break date and let D_t(τ) be a binary variable that equals 0 before the break date and 1 after, so D_t(τ) = 0 if t ≤ τ and D_t(τ) = 1 if t > τ. Then the regression including the binary break indicator and the interaction terms is

Y_t = β₀ + β₁Y_{t-1} + δ₁X_{t-1} + γ₀D_t(τ) + γ₁[D_t(τ) × Y_{t-1}] + γ₂[D_t(τ) × X_{t-1}] + u_t.   (14.35)

Under the null hypothesis of no break, γ₀ = γ₁ = γ₂ = 0, and this hypothesis can be tested using the F-statistic; this is called a Chow test for a break at a known break date, named for Gregory Chow (1960).
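The break regression in Equation (14.35) can be sketched as follows. This is my own illustrative code, not the text's: it forms the SSR-based, homoskedasticity-only F-statistic for γ₀ = γ₁ = γ₂ = 0 (the text's applications instead use heteroskedasticity-robust F-statistics), and the simulated data are invented for the demonstration:

```python
import numpy as np

def chow_f(y, x, tau):
    """Homoskedasticity-only Chow F-statistic for a break at date tau in an
    ADL(1,1), testing gamma_0 = gamma_1 = gamma_2 = 0 in Eq. (14.35)."""
    T = len(y)
    ylag, xlag, yy = y[:-1], x[:-1], y[1:]
    d = (np.arange(1, T) > tau).astype(float)         # D_t(tau): 0 then 1
    Xr = np.column_stack([np.ones(T - 1), ylag, xlag])          # restricted
    Xu = np.column_stack([Xr, d, d * ylag, d * xlag])           # unrestricted
    ssr = lambda X: np.sum((yy - X @ np.linalg.lstsq(X, yy, rcond=None)[0]) ** 2)
    q, dof = 3, (T - 1) - Xu.shape[1]
    return ((ssr(Xr) - ssr(Xu)) / q) / (ssr(Xu) / dof)

# Simulated series whose intercept jumps at t = 100 (illustrative only).
rng = np.random.default_rng(2)
T = 200
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    shift = 5.0 if t > 100 else 0.0
    y[t] = 1.0 + shift + 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

print(chow_f(y, x, tau=100))   # large: strong evidence of a break at t = 100
```

Restricting the dummy and interaction terms to a subset of the regressors tests for a break in just those coefficients, the variant discussed below for the QLR test.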
Testing for a break at an unknown break date. Often the date of a possible break is unknown or known only within a range. Suppose, for example, you suspect that a break occurred sometime between two dates, τ₀ and τ₁. The Chow test can be modified to handle this by testing for breaks at all possible dates τ in between τ₀ and τ₁, then using the largest of the resulting F-statistics to test for a break at an unknown date. This modified Chow test is variously called the Quandt likelihood ratio (QLR) statistic (Quandt, 1960) (the term we shall use) or, more obscurely, the sup-Wald statistic.

Because the QLR statistic is the largest of many F-statistics, its distribution is not the same as that of an individual F-statistic. Instead, the critical values for the QLR statistic must be obtained from a special distribution. Like the F-statistic, this distribution depends on the number of restrictions being tested, q, that is, the number of coefficients (including the intercept) that are being allowed to break, or change, under the alternative hypothesis. The distribution of the QLR statistic also depends on τ₀/T and τ₁/T, that is, on the endpoints, τ₀ and τ₁, of the subsample over which the F-statistics are computed, expressed as a fraction of the total sample size.

For the large-sample approximation to the distribution of the QLR statistic to be a good one, the subsample endpoints, τ₀ and τ₁, cannot be too close to the beginning or the end of the sample. For this reason, in practice the QLR statistic is computed over a "trimmed" range, or subset, of the sample. A common choice is to use 15% trimming, that is, to set τ₀ = 0.15T and τ₁ = 0.85T (rounded to the nearest integer). With 15% trimming, the F-statistic is computed for break dates in the central 70% of the sample.

The critical values for the QLR statistic, computed with 15% trimming, are given in Table 14.6. Comparing these critical values with those of the F_{q,∞} distribution (Appendix Table 4) shows that the critical values for the QLR statistic are larger. This reflects the fact that the QLR statistic looks at the largest of many individual F-statistics. By examining F-statistics at many possible break dates, the QLR statistic has many opportunities to reject the null hypothesis, leading to QLR critical values that are larger than the individual F-statistic critical values.

Like the Chow test, the QLR test can be used to focus on the possibility that there are breaks in only some of the regression coefficients. This is done by first computing the Chow tests at different break dates using binary variable interactions only for the variables with the suspect coefficients, then computing the maximum of those Chow tests over the range τ₀ ≤ τ ≤ τ₁. The critical values for this version of the QLR test are also taken from Table 14.6, where the number of restrictions (q) is the number of restrictions tested by the constituent F-statistics.
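The QLR computation is a loop over candidate break dates. The sketch below is my own illustrative code (function name, simulated data, and the homoskedasticity-only F-statistic are my choices; the text's applications use heteroskedasticity-robust F-statistics). It returns both the QLR statistic and the date of the maximal F, which, as discussed below, estimates the break date:

```python
import numpy as np

def qlr(y, x, trim=0.15):
    """QLR statistic for an ADL(1,1): the largest Chow F-statistic over
    tau = 0.15T, ..., 0.85T, testing a break in all three coefficients.
    Returns (statistic, date of the maximal F)."""
    T = len(y)
    ylag, xlag, yy = y[:-1], x[:-1], y[1:]
    Xr = np.column_stack([np.ones(T - 1), ylag, xlag])    # no-break model
    ssr = lambda X: np.sum((yy - X @ np.linalg.lstsq(X, yy, rcond=None)[0]) ** 2)
    ssr_r = ssr(Xr)
    best_f, best_tau = -np.inf, None
    for tau in range(int(trim * T), int((1 - trim) * T) + 1):
        d = (np.arange(1, T) > tau).astype(float)
        Xu = np.column_stack([Xr, d, d * ylag, d * xlag])
        f = ((ssr_r - ssr(Xu)) / 3) / (ssr(Xu) / ((T - 1) - Xu.shape[1]))
        if f > best_f:
            best_f, best_tau = f, tau
    return best_f, best_tau

# Simulated series whose intercept jumps at t = 120 (illustrative only).
rng = np.random.default_rng(3)
T = 200
x = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = (4.0 if t > 120 else 1.0) + 0.4 * y[t - 1] + 0.5 * x[t - 1] \
           + rng.standard_normal()

stat, tau_hat = qlr(y, x)
print(stat, tau_hat)   # QLR statistic and estimated break date (true break: t = 120)
```

Because q = 3 coefficients are allowed to break here, `stat` should be compared with the q = 3 row of Table 14.6, not with F_{3,∞} critical values.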
TABLE 14.6  Critical Values of the QLR Statistic with 15% Trimming

Number of Restrictions (q)    10%      5%      1%
 1                            7.12     8.68    12.16
 2                            5.00     5.86     7.78
 3                            4.09     4.71     6.02
 4                            3.59     4.09     5.12
 5                            3.26     3.66     4.53
 6                            3.02     3.37     4.12
 7                            2.84     3.15     3.82
 8                            2.69     2.98     3.57
 9                            2.58     2.84     3.38
10                            2.48     2.71     3.23
11                            2.40     2.62     3.09
12                            2.33     2.54     2.97
13                            2.27     2.46     2.87
14                            2.21     2.40     2.78
15                            2.16     2.34     2.71
16                            2.12     2.29     2.64
17                            2.08     2.25     2.58
18                            2.05     2.20     2.53
19                            2.01     2.17     2.48
20                            1.99     2.13     2.43

These critical values apply when τ₀ = 0.15T and τ₁ = 0.85T (rounded to the nearest integer).
If there is a discrete break at a date within the range tested, then the QLR statistic will reject with high probability in large samples. Moreover, the date at which the constituent F-statistic is at its maximum, τ̂, is an estimate of the break date. This estimate is a good one in the sense that, under certain technical conditions, τ̂/T converges in probability to τ/T; that is, the fraction of the way through the sample at which the break occurs is estimated consistently.
KEY CONCEPT 14.9
The QLR Test for Coefficient Stability

Let F(τ) denote the F-statistic testing the hypothesis of a break in the regression coefficients at date τ; in the regression in Equation (14.35), for example, this is the F-statistic testing the null hypothesis that γ₀ = γ₁ = γ₂ = 0. The QLR (or sup-Wald) statistic is the largest of the F-statistics in the range τ₀ ≤ τ ≤ τ₁:

QLR = max[F(τ₀), F(τ₀ + 1), ..., F(τ₁)].   (14.36)

1. A common choice of trimming is 15%, that is, τ₀ = 0.15T and τ₁ = 0.85T; critical values for the QLR statistic with 15% trimming are given in Table 14.6.

2. The QLR test can detect a single discrete break, multiple discrete breaks, and/or slow evolution of the regression function.

3. If there is a distinct break in the regression function, the date at which the largest Chow statistic occurs is an estimator of the break date.

The QLR statistic also rejects the null hypothesis with high probability in large samples when there are multiple discrete breaks or when the break comes in the form of a slow evolution of the regression function. This means that the QLR statistic detects forms of instability other than a single discrete break. As a result, if the QLR statistic rejects the null hypothesis, it can mean that there is a single discrete break, that there are multiple discrete breaks, or that there is slow evolution of the regression function.

The QLR statistic is summarized in Key Concept 14.9.
Warning: You probably don't know the break date even if you think you do. Sometimes an expert might believe that he or she knows the date of a possible break, so that the Chow test can be used instead of the QLR test. But if this knowledge is based on the expert's knowledge of the series being analyzed, then in fact this date was estimated using the data, albeit in an informal way. Preliminary estimation of the break date means that the usual critical values cannot be
FIGURE 14.5  F-Statistics Testing for a Break at Different Dates (horizontal axis: break date, 1965-2000; vertical axis: F-statistic)

At a given break date, the F-statistic plotted here tests the null hypothesis of a break in at least one of the coefficients on Unemp_{t-1}, Unemp_{t-2}, Unemp_{t-3}, Unemp_{t-4}, or the intercept in Equation (14.17). For example, the F-statistic testing for a break in 1980:I is 2.85. The QLR statistic is the largest of these F-statistics, which is 5.16. This exceeds the 1% critical value of 4.53.
used for the Chow test for a break at that date. Thus it remains appropriate to use the QLR statistic in this circumstance.
Application: Has the Phillips curve been stable? The QLR test provides a way to check whether the Phillips curve has been stable from 1962 to 2004. Specifically, we focus on whether there have been changes in the coefficients on the lagged values of the unemployment rate and the intercept in the ADL(4,4) specification in Equation (14.17), containing four lags each of ΔInf_t and Unemp_t.

The Chow F-statistics testing the hypothesis that the intercept and the coefficients on Unemp_{t-1}, ..., Unemp_{t-4} in Equation (14.17) are constant, against the alternative that they break at a given date, are plotted in Figure 14.5 for breaks in the central 70% of the sample. For example, the F-statistic testing for a break in 1980:I is 2.85, the value plotted at that date in the figure. Each F-statistic tests five restrictions (no change in the intercept and in the four coefficients on lags of the unemployment rate). The largest of these F-statistics is 5.16, which occurs in 1981:IV; this is the QLR statistic. Comparing 5.16 to the critical values for q = 5 in Table 14.6 indicates that the hypothesis that these coefficients are stable is rejected at the 1% significance level (the critical value is 4.53). Thus there is evidence that at least one of these five coefficients changed over the sample.
KEY CONCEPT 14.10
Pseudo Out-of-Sample Forecasting

1. Choose a number of observations, P, for which you will generate pseudo out-of-sample forecasts; for example, P might be 10% or 15% of the sample size. Let s = T - P.

2. Estimate the forecasting regression using the shortened data set for t = 1, ..., s.

3. Compute the forecast for the first period beyond this shortened sample, s + 1; call this Ỹ_{s+1|s}.

4. Compute the forecast error, ũ_{s+1} = Y_{s+1} - Ỹ_{s+1|s}.

5. Repeat steps 2 through 4 for the remaining dates, re-estimating the regression each time the sample is lengthened by one observation. The pseudo out-of-sample forecast errors are ũ_{t+1}, t = s, ..., T - 1.
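The steps above can be sketched for a simple AR(1) forecasting model. This is an illustrative sketch; the function name, the 10% holdout, and the simulated series are my own choices, not from the text:

```python
import numpy as np

def pseudo_oos_errors(y, p_frac=0.1):
    """Pseudo out-of-sample forecast errors for an AR(1) with intercept:
    hold out the last P = p_frac*T observations, re-estimate through each
    date s, forecast the next date, and record the error."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    P = int(p_frac * T)
    errors = []
    for s in range(T - P, T):                    # forecast each held-out date
        ytrain = y[:s]                           # shortened sample, t = 1,...,s
        X = np.column_stack([np.ones(s - 1), ytrain[:-1]])
        b0, b1 = np.linalg.lstsq(X, ytrain[1:], rcond=None)[0]
        errors.append(y[s] - (b0 + b1 * y[s - 1]))   # pseudo forecast error
    return np.array(errors)

# Simulated stationary AR(1) (illustrative only).
rng = np.random.default_rng(4)
y = np.zeros(400)
for t in range(1, 400):
    y[t] = 2.0 + 0.6 * y[t - 1] + rng.standard_normal()

e = pseudo_oos_errors(y)
print(e.mean(), np.sqrt(np.mean(e ** 2)))   # mean forecast error and RMSFE
```

With a correctly specified, stable model, the RMSFE computed this way should be close to the standard error of the regression, which is exactly the comparison made in the Phillips curve application later in this section.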
Application: Did the Phillips curve change during the 1990s? Using the QLR statistic, we rejected the null hypothesis that the Phillips curve has been stable against the alternative of a break at the 1% significance level (see Figure 14.5). The maximal F-statistic occurred in 1981:IV, indicating that a break occurred in the early 1980s. This suggests that a forecaster using lagged unemployment to forecast inflation should use an estimation sample starting after the break in 1981:IV. Even so, a question remains: Does the Phillips curve provide a stable forecasting model subsequent to the 1981:IV break?

If the coefficients of the Phillips curve changed some time during the 1982:I-2004:IV period, then pseudo out-of-sample forecasts computed using data
[Box excerpt: poor performance of pseudo out-of-sample forecasts from predictive regressions has been taken to be an indication of the efficient market hypothesis (see, for example, Goyal and Welch, 2003); beating the market consistently requires good forecasts, plus lots of computing power and a staff of talented econometricians.]
[Table: Phillips curve (ADL) specifications estimated over different sample periods; entries are coefficients with heteroskedasticity-robust standard errors in parentheses, the F-statistic testing the hypothesis that all the coefficients on the regressors are zero (with its p-value in parentheses), and the adjusted R². The data are described in Appendix 14.1.]
starting in 1982:I should deteriorate. The pseudo out-of-sample forecasts of inflation for the period 1999:I-2004:IV, computed using the four-lag Phillips curve estimated with data starting in 1982:I, are plotted in Figure 14.6, along with the actual values of inflation. For example, the forecast of inflation for 1999:I was computed by regressing ΔInf_t on ΔInf_{t-1}, ..., ΔInf_{t-4}, Unemp_{t-1}, ..., Unemp_{t-4}, with an intercept, using the data through 1998:IV, then computing the forecast ΔĨnf_{1999:I|1998:IV} using these estimated coefficients and the data through 1998:IV. The inflation forecast for 1999:I is then Ĩnf_{1999:I|1998:IV} = Inf_{1998:IV} + ΔĨnf_{1999:I|1998:IV}. This entire procedure was repeated using data through 1999:I to compute the forecast Ĩnf_{1999:II|1999:I}. Doing this for all 24 quarters from 1999:I to 2004:IV creates 24 pseudo out-of-sample forecasts, which are plotted in Figure 14.6. The pseudo out-of-sample forecast errors are the differences between actual inflation and its pseudo out-of-sample forecast, that is, the differences between the two lines in Figure 14.6. For example, in 2000:IV the inflation rate fell by 0.8 percentage point, but the pseudo out-of-sample forecast of ΔInf_{2000:IV} was 0.3 percentage point, so the pseudo out-of-sample forecast error was ΔInf_{2000:IV} - ΔĨnf_{2000:IV|2000:III} = -0.8 - 0.3 = -1.1 percentage points. In other words, a forecaster using the ADL(4,4) model of the Phillips curve, estimated through 2000:III, would have forecasted that inflation would increase by 0.3 percentage point in 2000:IV, whereas in reality it fell by 0.8 percentage point.

How do the mean and standard deviation of the pseudo out-of-sample forecast errors compare with the in-sample fit of the model? The standard error of the regression of the four-lag Phillips curve, fit using data from 1982:I through 1998:IV, is 1.30, so based on the in-sample fit we would expect the out-of-sample forecast errors to have mean zero and root mean squared forecast error of 1.30. In fact, over the 1999:I-2004:IV pseudo out-of-sample forecast period, the t-statistic testing the hypothesis that the mean forecast error equals zero is 0.41; thus the hypothesis that the forecasts have mean-zero errors is not rejected. In addition, the RMSFE over the pseudo out-of-sample forecast period is 1.32, very close to the value of 1.30 for the standard error of the regression for the 1982:I-1998:IV period. Moreover, the plot of the forecasts and the forecast errors in Figure 14.6 shows no major outliers or unusual discrepancies.

According to the pseudo out-of-sample forecasting exercise, the performance of the Phillips curve forecasting model during the pseudo out-of-sample period of 1999:I-2004:IV was comparable to its performance during the in-sample period of 1982:I-1998:IV. Although the QLR test points to instability in the Phillips curve in the early 1980s, this pseudo out-of-sample analysis suggests that, after the early 1980s break, the Phillips curve forecasting model has been stable.
FIGURE 14.6  Pseudo Out-of-Sample Forecasts of Inflation (horizontal axis: year, 1994-2004; vertical axis: inflation rate)

The pseudo out-of-sample forecasts made using a four-lag Phillips curve of the form in Equation (14.17) generally track actual inflation and are consistent with a stable post-1982 Phillips curve forecasting model.
If the break is not distinct but rather arises from a slow, ongoing change in the parameters, the remedy is more difficult and goes beyond the scope of this book.⁶
14.8 Conclusion
In time series data, a variable generally is correlated from one observation, or date, to the next. A consequence of this correlation is that linear regression can be used to forecast future values of a time series based on its current and past values. The starting point for time series regression is an autoregression, in which the regressors are lagged values of the dependent variable. If additional predictors are available, then their lags can be added to the regression.

This chapter has considered several technical issues that arise when estimating and using regressions with time series data. One such issue is determining the number of lags to include in the regressions. As discussed in Section 14.5, if the number of lags is chosen to minimize the BIC, then the estimated lag length is consistent for the true lag length.

Another of these issues concerns whether the series being analyzed are stationary. If the series are stationary, then the usual methods of statistical inference (such as comparing t-statistics to normal critical values) can be used, and, because the population regression function is stable over time, regressions estimated using historical data can be used reliably for forecasting. If, however, the series are nonstationary, then things become more complicated, where the specific complication depends on the nature of the nonstationarity. For example, if the series is nonstationary because it has a stochastic trend, then the OLS estimator and t-statistic can have nonstandard (nonnormal) distributions, even in large samples, and forecast performance can be improved by specifying the regression in first differences. A test for detecting this type of nonstationarity, the augmented Dickey-Fuller test for a unit root, was introduced in Section 14.6. Alternatively, if the population regression function has a break, then neglecting this break results in estimating an average version of the population regression function that in turn can lead to biased and/or imprecise forecasts. Procedures for detecting a break in the population regression function were introduced in Section 14.7.

In this chapter, the methods of time series regression were applied to economic forecasting, and the coefficients in these forecasting models were not given a causal interpretation. You do not need a causal relationship to forecast, and ignoring causal interpretations liberates the quest for good forecasts. In some
⁶For an additional discussion of estimation and testing in the presence of discrete breaks, see Hansen (2001). For an advanced discussion of estimation and forecasting when there are slowly varying coefficients, see
applicuUoos, however. the task is not to develop a foreca-.ting model hut rather to
estimate causal relationships among time series variable lhat is_ to c..:stimatl;! the
dynomtc causal ef(ect on Y over time of a change in X. Umkr th~ n~t conditions.
the methods of l.his chapter, or closely related methods_ can b\! uo; d to estimate
d)rnamic causal effects, and that is the topic of the next chapta
Summary
1. Regression models used for forecasting need not have a causal interpretation.

2. A time series variable generally is correlated with one or more of its lagged values; that is, it is serially correlated.

3. An autoregression of order p is a linear multiple regression model in which the regressors are the first p lags of the dependent variable. The coefficients of an AR(p) can be estimated by OLS, and the estimated regression function can be used for forecasting. The lag order p can be estimated using an information criterion such as the BIC.

4. Adding other variables and their lags to an autoregression can improve forecasting performance. Under the least squares assumptions for time series regression (Key Concept 14.6), the OLS estimators have normal distributions in large samples, and statistical inference proceeds the same way as for cross-sectional data.

5. Forecast intervals are one way to quantify forecast uncertainty. If the errors are normally distributed, an approximate 68% forecast interval can be constructed as the forecast plus or minus an estimate of the root mean squared forecast error.

6. A series that contains a stochastic trend is nonstationary, violating the second least squares assumption in Key Concept 14.6. The OLS estimator and t-statistic for the coefficient of a regressor with a stochastic trend can have a nonstandard distribution, potentially leading to biased estimators, inefficient forecasts, and misleading inferences. The ADF statistic can be used to test for a stochastic trend. A random walk stochastic trend can be eliminated by using first differences of the series.

7. If the population regression function changes over time, then OLS estimates neglecting this instability are unreliable for statistical inference or forecasting. The QLR statistic can be used to test for a break and, if a discrete break is found, the regression function can be re-estimated in a way that allows for the break.

8. Pseudo out-of-sample forecasts can be used to assess model stability toward the end of the sample, to estimate the root mean squared forecast error, and to compare different forecasting models.
Key Terms
first lag (528)
jth lag (528)
first difference (530)
autocorrelation (532)
serial correlation (532)
autocorrelation coefficient (532)
autocovariance (532)
autoregression (535)
forecast error (536)
root mean squared forecast error (537)
AR(p) (538)
autoregressive distributed lag (ADL) model (543)
ADL(p,q) (543)
stationarity (544)
weak dependence (546)
Granger causality statistic (547)
Granger causality test (547)
Review the Concepts

14.1 Look at the plot of the logarithm of GDP for Japan in Figure 14.2c. Does this time series appear to be stationary? Explain. Suppose that you calculated the first difference of this series. Would it appear to be stationary? Explain.

14.2 Many financial economists believe that the random walk model is a good description of the logarithm of stock prices. It implies that the percentage changes in stock prices are unforecastable. A financial analyst claims to have a new model that makes better predictions than the random walk model. Explain how you would examine the analyst's claim that his model is superior.

14.3 A researcher estimates an AR(1) with an intercept and finds that the OLS estimate of β₁ is 0.95, with a standard error of 0.02. Does a 95% confidence interval include β₁ = 1? Explain.
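For Review Question 14.3, the interval arithmetic itself is mechanical, and a quick check is below; keep in mind, though, that when β₁ is close to 1 the normal approximation underlying this interval is itself suspect, which is the point of Section 14.6:

```python
# 95% confidence interval for beta_1 using the normal approximation:
# estimate plus or minus 1.96 standard errors.
beta_hat, se = 0.95, 0.02
lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
print(round(lo, 4), round(hi, 4))   # 0.9108 0.9892 -- the interval excludes 1
```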
14.4 Suppose that you suspected that the intercept in Equation (14.17) changed in 1992:I. How would you modify the equation to incorporate this change? How would you test for a change in the intercept? How would you test for a change in the intercept if you did not know the date of the change?
Exercises

14.1 Consider the AR(1) model Y_t = β₀ + β₁Y_{t-1} + u_t. Suppose that the process is stationary.

a. Show that E(Y_t) = E(Y_{t-1}). (Hint: Read Key Concept 14.5.)

b. Show that E(Y_t) = β₀ / (1 - β₁).
14.2 The index of industrial production (IP_t) is a monthly time series that measures the quantity of industrial commodities produced in a given month. This problem uses data on this index for the United States. All regressions are estimated over the sample period 1960:1 to 2000:12 (that is, January 1960 through December 2000). Let Y_t = 1200 × ln(IP_t / IP_{t-1}).

a. A forecaster estimates the following autoregression for Y_t (standard errors in parentheses):

Ŷ_t = 1.377 + 0.318 Y_{t-1} + 0.123 Y_{t-2} + ...
     (0.062)  (0.078)        (0.055)

Use this estimated autoregression to forecast the value of Y_t in January 2001, using the following values of IP for August 2000 through December 2000:
Date    2000:7     2000:8     2000:9     2000:10    2000:11    2000:12
IP      147.595    148.650    148.973    148.660    148.206    147.300
d. Worried about a potential break, she computes a QLR test (with 15% trimming) on the constant and AR coefficients of the autoregression. The resulting QLR statistic is 3.45. Is there evidence of a break? Explain.
c. Worried that she might have included too few or too many lags in the model, the forecaster estimates AR(p) models for p = 1, ..., 6 over the same sample period. The sum of squared residuals from each of these estimated models is shown in the table. Use the BIC to estimate the number of lags that should be included in the autoregression. Do the results differ if you use the AIC?
[Table: sum of squared residuals (SSR) for AR orders 1 through 6.]
14.3 Using the same data as in Exercise 14.2, a researcher tests for a stochastic trend in ln(IP_t), using a Dickey-Fuller regression of Δln(IP_t) on an intercept, a linear time trend t, and ln(IP_{t-1}), where the standard errors are computed using the homoskedasticity-only formula.

a. Use the ADF statistic to test for a stochastic trend (unit root) in ln(IP).
U.$. l rcasury
btll~
PrO\ c I he
ll>IIO\\ in!!-
Cfrl'l s.
a. l.d \\' he a random \'ariablc wHh mcnn Jl-u and \ari.tnce tr?1 anJ kt c:
"c n constant. Show that El (W - t) 'J trf, + (Jtn - 1.) 1
14.6 In this exercise you will conduct a Monte Carlo experiment that studies the phenomenon of spurious regression discussed in Section 14.6. In a Monte Carlo study, artificial data are generated using a computer; then these artificial data are used to calculate the statistics being studied. This makes it possible to compute the distribution of statistics for known models when mathematical expressions for those distributions are complicated (as they are here) or even unknown. In this exercise, you will generate data so that two series, Y_t and X_t, are independently distributed random walks. The specific steps are:

i. Use your computer to generate a sequence of T = 100 i.i.d. standard normal random variables. Call these variables e₁, e₂, ..., e₁₀₀. Set Y₁ = e₁ and Y_t = Y_{t-1} + e_t for t = 2, 3, ..., 100.

ii. Generate a second, independent sequence of T = 100 i.i.d. standard normal random variables, and use this sequence to construct a second random walk, X_t, in the same way.

iii. Regress Y_t onto a constant and X_t. Compute the OLS estimator, the regression R², and the (homoskedasticity-only) t-statistic testing the null hypothesis that β₁ (the coefficient on X_t) is zero.

Repeat steps i through iii many times, for several values of T, and compute the fraction of regressions in which the t-statistic rejects the null hypothesis β₁ = 0 at the 5% significance level. Does this fraction seem to approach some other limit as T gets larger? What is that limit?
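Steps i through iii can be wrapped in a loop to approximate the rejection frequency. This sketch is my own (the function name, number of simulations, and seed are arbitrary choices); it uses homoskedasticity-only standard errors, as in the exercise:

```python
import numpy as np

def spurious_rejection_rate(n_sims=200, T=100, seed=5):
    """Fraction of simulations in which the t-statistic on X_t rejects
    beta_1 = 0 at the 5% level, when Y_t and X_t are independent random
    walks (the spurious-regression experiment)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        y = np.cumsum(rng.standard_normal(T))    # random walk Y_t
        x = np.cumsum(rng.standard_normal(T))    # independent random walk X_t
        X = np.column_stack([np.ones(T), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (T - 2)             # homoskedasticity-only variance
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        if abs(beta[1] / se) > 1.96:
            rejections += 1
    return rejections / n_sims

print(spurious_rejection_rate())   # far above 0.05: the regression is spurious
```

Even though the two series are independent, the rejection rate is far larger than the nominal 5%, which is the phenomenon the exercise is designed to exhibit.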
14.7 Suppose that Y_t follows the stationary AR(1) model Y_t = 2.5 + 0.7Y_{t-1} + u_t, where u_t is i.i.d. with E(u_t) = 0 and var(u_t) = 9.

a. Compute the mean and variance of Y_t. (Hint: See Exercise 14.1.)

b. Compute the first two autocovariances of Y_t. (Hint: Read Appendix 14.2.)

c. Compute the first two autocorrelations of Y_t.

d. Suppose that Y_T = 102.3. Compute Y_{T+1|T} = E(Y_{T+1} | Y_T, Y_{T-1}, ...).
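One way to check calculations of this kind numerically is to simulate the AR(1) and compare sample moments with the stationary formulas mean = β₀/(1 - β₁) and var = var(u)/(1 - β₁²). This is my own illustrative check, not part of the exercise:

```python
import numpy as np

b0, b1, var_u = 2.5, 0.7, 9.0
mean_y = b0 / (1 - b1)               # analytic stationary mean
var_y = var_u / (1 - b1 ** 2)        # analytic stationary variance

rng = np.random.default_rng(6)
T = 200_000
u = np.sqrt(var_u) * rng.standard_normal(T)
y = np.empty(T)
y[0] = mean_y                        # start at the stationary mean
for t in range(1, T):
    y[t] = b0 + b1 * y[t - 1] + u[t]

print(mean_y, var_y)                     # analytic values
print(y.mean(), y.var())                 # simulated values, close to the above
print(np.corrcoef(y[1:], y[:-1])[0, 1])  # first autocorrelation, close to b1
```

For a stationary AR(1), the jth autocorrelation is β₁ʲ, so the same simulation can be used to check part (c) as well.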
14.8 Suppose that Y_t is the monthly value of the number of new home construction projects started in the United States. Because of the weather, Y_t has a pronounced seasonal pattern; for example, housing starts are low in January and high in June. Let μ_Jan denote the average value of housing starts in January, and let μ_Feb, μ_Mar, ..., μ_Dec denote the average values in the other months. Show that the values of μ_Jan, μ_Feb, ..., μ_Dec can be estimated from the OLS regression Y_t = β0 + β1Feb_t + β2Mar_t + ⋯ + β11Dec_t + u_t, where Feb_t is a binary variable equal to 1 if t is February, Mar_t is a binary variable equal to 1 if t is March, and so forth. Show that μ_Jan = β0, μ_Feb = β0 + β1, μ_Mar = β0 + β2, and so forth.
14.9 … Show that var(Y_t) = σ²_u(1 + b1² + b2² + ⋯ + bq²).
14.10 A researcher carries out a QLR test using 25% trimming, and there are q = 5 restrictions. Answer the following questions using the values in Table 14.6 (Critical Values of the QLR Statistic with 15% Trimming) and Appendix Table 4 (Critical Values of the F_{q,∞} Distribution).

a. The QLR F-statistic is 4.2. Should the researcher reject the null hypothesis at the 5% level?

b. The QLR F-statistic is 2.1. Should the researcher reject the null hypothesis at the 5% level?
c. The QLR F-statistic is 3.5. Should the researcher reject the null hypothesis at the 5% level?
14.11 Suppose that ΔY_t follows the AR(1) model ΔY_t = β0 + β1ΔY_{t-1} + u_t.

a. Show that Y_t follows an AR(2) model.

b. Derive the AR(2) coefficients for Y_t as a function of β0 and β1.
Empirical Exercises

E14.1 On the textbook Web site www.aw-bc.com/stock_watson you will find a data file USMacro_Quarterly that contains quarterly data on several macroeconomic series for the United States; the data are described in the file USMacro_Description. Compute Y_t = ln(GDP_t), the logarithm of real GDP, and ΔY_t, the quarterly growth rate of GDP. In Empirical Exercises 14.1 through 14.6, use the sample period 1955:1-2004:4 (where data before 1955 may be used as initial values for lags in regressions).

a. Estimate the mean of ΔY_t.

b. …

c. Estimate the standard deviation of ΔY_t. Express your answer in percentage points at an annual rate.

d. Estimate … at an annual rate.

E14.2
a. Estimate an AR(1) model for ΔY_t. What is the estimated AR(1) coefficient? Is the coefficient statistically significantly different from zero? Construct a 95% confidence interval for the population AR(1) coefficient.

b. Estimate an AR(2) model for ΔY_t. Is the AR(2) coefficient statistically significantly different from zero? Is this model preferred to the AR(1) model?
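The AR fitting asked for here is just OLS on lagged values. The sketch below uses simulated data in place of the GDP growth series; the helper fit_ar and the simulation are illustrative assumptions, not part of the exercise.

```python
# OLS estimation of AR(p) models, as needed for parts a and b.
import numpy as np

def fit_ar(y, p):
    """Regress y_t on a constant and y_{t-1}, ..., y_{t-p}; return coefficients."""
    Y = y[p:]
    X = np.column_stack([np.ones(len(Y))] + [y[p - j:-j] for j in range(1, p + 1)])
    coef = np.linalg.lstsq(X, Y, rcond=None)[0]
    return coef[0], coef[1:]  # intercept, lag coefficients

rng = np.random.default_rng(1)
T = 2000
y = np.zeros(T)
for t in range(1, T):  # stand-in series: an AR(1) with coefficient 0.5
    y[t] = 1.0 + 0.5 * y[t - 1] + rng.standard_normal()

b0_1, lags1 = fit_ar(y, 1)
b0_2, lags2 = fit_ar(y, 2)
print(lags1, lags2)  # AR(1) coefficient near 0.5; second AR(2) coefficient near 0
```

With the real data, the same two calls (and HAC or heteroskedasticity-robust standard errors) answer parts a and b.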
E14.3 … regressions … in the AR model for Y_t. As an alternative, suppose that Y_t is stationary around a deterministic trend.
E14.4 …

E14.5 a. Let R_t denote the interest rate for three-month Treasury bills. Estimate an ADL(1,4) model for ΔY_t, using lags of ΔR_t as additional predictors. Comparing the ADL(1,4) model to the AR(1) model, by how much has the R̄² changed?
E14.6

a. Test the coefficient on the constant term and the coefficients on the lagged values of ΔR_t for a break, using a QLR test. Is there evidence of a break?

b. Construct pseudo out-of-sample forecasts using the AR(1) model, beginning in 1984:4 and going through the end of the sample. (That is, compute ΔY_{t+1|t} …)

c. Construct pseudo out-of-sample forecasts using the ADL(1,4) model.

d. Construct pseudo out-of-sample forecasts using the naive model: …

e. Are any of the forecasts biased? Which model has the smallest root mean squared forecast error (RMSFE)? How large is the RMSFE (expressed in percentage points at an annual rate) for the best model?
E14.7 Read the boxes "Can You Beat the Market? Part I" and "Can You Beat the Market? Part II" in this chapter. Next, go to the course Web site, where you will find an extended version of the dataset described in the boxes; the data are in the file Stock_Returns_1931_2002 and are described in the file Stock_Returns_1931_2002_Description.

a. Repeat the calculations reported in Table 14.3, using regressions estimated over the 1932:1-2002:12 sample period.

b. Repeat the calculations reported in Table 14.7, using regressions estimated over the 1932:1-2002:12 sample period.

c. Is the variable ln(dividend yield) …
APPENDIX 14.1
the monthly average of daily rates as reported by the Federal Reserve, and the dollar-pound exchange rate data are the monthly average of daily rates; both are for the final month in the quarter. Japanese GDP data were obtained from the OECD. The daily percentage change in the NYSE Composite Index was computed as 100Δln(NYS_t), where NYS_t is the value of the index at the daily close of the New York Stock Exchange; because the stock exchange is not open on weekends and holidays, the time period of analysis is a business day. These and thousands of other economic time series are freely available on the Web sites maintained by various data-collecting agencies.

The regressions in Tables 14.3 and 14.7 use monthly financial data for the United States. Stock prices (P_t) are measured by the broad-based (NYSE and AMEX) value-weighted index of stock prices constructed by the Center for Research in Security Prices (CRSP). The monthly percent excess return is 100 × {ln[(P_t + Div_t)/P_{t-1}] − ln(TBill_t)}, where Div_t is the dividends paid on the stocks in the CRSP index and TBill_t is the gross return (1 plus the interest rate) on a 30-day Treasury bill during month t. The dividend-price ratio is constructed as the dividends over the past 12 months, divided by the price in the current month. We thank Motohiro Yogo for his help and for providing these data.
APPENDIX 14.2

Stationarity in the AR(1) Model

This appendix shows that, if |β1| < 1 and u_t is stationary, then Y_t is stationary. Recall from Key Concept 14.5 that the time series variable Y_t is stationary if the joint distribution of (Y_{s+1}, ..., Y_{s+T}) does not depend on s. To streamline the argument, we show this for T = 2 under the simplifying assumptions that β0 = 0 and {u_t} are i.i.d. N(0, σ²_u).

The first step is deriving an expression for Y_t in terms of the u_t's. Because β0 = 0, Equation (14.8) implies that Y_t = β1Y_{t-1} + u_t. Substituting Y_{t-1} = β1Y_{t-2} + u_{t-1} into this expression yields Y_t = β1(β1Y_{t-2} + u_{t-1}) + u_t = β1²Y_{t-2} + β1u_{t-1} + u_t. Continuing this substitution another step yields Y_t = β1³Y_{t-3} + β1²u_{t-2} + β1u_{t-1} + u_t, and continuing indefinitely yields

Y_t = u_t + β1u_{t-1} + β1²u_{t-2} + β1³u_{t-3} + ⋯ = Σ_{i=0}^∞ β1^i u_{t-i}.  (14.37)

Thus Y_t is a weighted average of current and past u_t's. Because the u_t's are normally distributed and because the weighted average of normal random variables is normal (Section 2.4), Y_{s+1} and Y_{s+2} have a bivariate normal distribution. Recall that the bivariate normal distribution is completely determined by the means of the two variables, their variances, and their covariance. Thus, to show that Y_t is stationary, we need to show that the means, variances, and covariance of (Y_{s+1}, Y_{s+2}) do not depend on s. An extension of the argument used below can be used to show that the distribution of (Y_{s+1}, ..., Y_{s+T}) does not depend on s for general T.

The means and variances of Y_{s+1} and Y_{s+2} can be computed using Equation (14.37), with the subscripts s + 1 or s + 2 replacing t. First, because E(u_t) = 0 for all t, E(Y_t) = Σ_{i=0}^∞ β1^i E(u_{t-i}) = 0, so the means of Y_{s+1} and Y_{s+2} are both zero and in particular do not depend on s. Second, var(Y_t) = var(Σ_{i=0}^∞ β1^i u_{t-i}) = Σ_{i=0}^∞ (β1^i)² var(u_{t-i}) = σ²_u Σ_{i=0}^∞ β1^{2i} = σ²_u/(1 − β1²), where the final equality follows from the fact that, if |a| < 1, Σ_{i=0}^∞ a^i = 1/(1 − a); thus var(Y_{s+1}) = var(Y_{s+2}) = σ²_u/(1 − β1²), which does not depend on s as long as |β1| < 1. Finally, because Y_{s+2} = β1Y_{s+1} + u_{s+2}, cov(Y_{s+1}, Y_{s+2}) = E(Y_{s+1}Y_{s+2}) = E[Y_{s+1}(β1Y_{s+1} + u_{s+2})] = β1var(Y_{s+1}) + cov(Y_{s+1}, u_{s+2}) = β1var(Y_{s+1}) = β1σ²_u/(1 − β1²). The covariance does not depend on s, so Y_{s+1} and Y_{s+2} have a joint probability distribution that does not depend on s; that is, their joint distribution is stationary. If |β1| ≥ 1, this calculation breaks down because the infinite sum in Equation (14.37) does not converge and the variance of Y_t is infinite. Thus Y_t is stationary if |β1| < 1 but not if |β1| ≥ 1.

The preceding argument was made under the assumptions that β0 = 0 and u_t is normally distributed. If β0 ≠ 0, the argument is similar except that the means of Y_{s+1} and Y_{s+2} are β0/(1 − β1) and Equation (14.37) must be modified for this nonzero mean. The assumption that u_t is i.i.d. normal can be replaced with the assumption that u_t is stationary with a finite variance because, by Equation (14.37), Y_t can still be expressed as a function of current and past u_t's, so the distribution of Y_t is stationary as long as the distribution of u_t is stationary and the infinite sum expression in Equation (14.37) is meaningful in the sense that it converges, which requires |β1| < 1.
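The variance result derived above, var(Y_t) = σ²_u/(1 − β1²), can be checked by simulation. This is a sketch with arbitrary parameter values, not part of the appendix.

```python
# Simulate a long AR(1) with beta0 = 0 and compare the sample variance and
# first autocovariance with the stationary values implied by Equation (14.37).
import numpy as np

beta1, sigma_u = 0.8, 1.0
rng = np.random.default_rng(0)

T, burn = 200_000, 1_000
y = np.zeros(T + burn)
for t in range(1, T + burn):
    y[t] = beta1 * y[t - 1] + rng.normal(0.0, sigma_u)
y = y[burn:]  # drop start-up observations so the series is effectively stationary

theoretical_var = sigma_u**2 / (1 - beta1**2)  # sigma_u^2/(1 - beta1^2)
theoretical_cov1 = beta1 * theoretical_var     # first autocovariance
print(np.var(y), theoretical_var)
```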
APPENDIX 14.3

Lag Operator Notation

The notation in this and the next two chapters is streamlined considerably by adopting what is known as lag operator notation. Let L denote the lag operator, which has the property that it transforms a variable into its lag. That is, the lag operator L has the property LY_t = Y_{t-1}. By applying the lag operator twice, one obtains the second lag: L²Y_t = L(LY_t) = LY_{t-1} = Y_{t-2}. More generally, by applying the lag operator j times, one obtains the j-th lag. In summary, the lag operator has the property that

LY_t = Y_{t-1},  L²Y_t = Y_{t-2},  and  L^j Y_t = Y_{t-j}.  (14.38)

The lag operator notation permits us to define the lag polynomial, which is a polynomial in the lag operator:

a(L) = a0 + a1L + a2L² + ⋯ + a_pL^p = Σ_{j=0}^p a_j L^j,  (14.39)

where a0, ..., a_p are the coefficients of the lag polynomial and L⁰ = 1. The degree of the lag polynomial a(L) in Equation (14.39) is p. Multiplying Y_t by a(L) yields

a(L)Y_t = (Σ_{j=0}^p a_j L^j)Y_t = Σ_{j=0}^p a_j L^j Y_t = a0Y_t + a1Y_{t-1} + ⋯ + a_pY_{t-p}.  (14.40)

Written in lag operator notation, the AR(p) model is

a(L)Y_t = β0 + u_t,  (14.41)

where a0 = 1 and a_j = −β_j for j = 1, ..., p.
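Equation (14.40) translates directly into code: applying a(L) to a series forms a weighted sum of the series and its lags. A minimal sketch (the function name is illustrative):

```python
# Apply a lag polynomial a(L) = a_0 + a_1 L + ... + a_p L^p to a series,
# returning a_0*y_t + a_1*y_{t-1} + ... + a_p*y_{t-p} for t = p, ..., T-1.
import numpy as np

def apply_lag_polynomial(a, y):
    p = len(a) - 1
    y = np.asarray(y, dtype=float)
    return sum(a[j] * y[p - j:len(y) - j] for j in range(p + 1))

y = np.array([1.0, 2.0, 3.0, 4.0])
# a(L) = 1 - L is the first-difference operator: (1 - L)Y_t = Y_t - Y_{t-1}.
print(apply_lag_polynomial([1.0, -1.0], y))  # [1. 1. 1.]
```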
APPENDIX 14.4

ARMA Models

The autoregressive-moving average (ARMA) model extends the autoregressive model by modeling u_t as serially correlated, specifically, as being a distributed lag (or "moving average") of another unobserved error term. In the lag operator notation of Appendix 14.3, let

…  (14.43)

… than by a pure AR model with only a few lags. As a practical matter, however, the estimation of ARMA models is more difficult than the estimation of AR models, and ARMA models are more difficult to extend to additional regressors than are AR models.
APPENDIX 14.5

Consistency of the BIC Lag Length Estimator

This appendix summarizes the argument that the BIC estimator of the lag length, p̂, in an autoregression is consistent; that is, Pr(p̂ = p) → 1, where p is the true lag length. It also shows that the AIC estimator is not consistent: The AIC can overestimate p even in large samples.

BIC

First consider the special case in which the BIC is used to choose among autoregressions with zero, one, or two lags, when the true lag length is one. It is shown below that (i) Pr(p̂ = 0) → 0 and (ii) Pr(p̂ = 2) → 0, from which it follows that Pr(p̂ = 1) → 1. The extension of this argument to the general case of searching over 0 ≤ p ≤ p_max entails showing that Pr(p̂ < p) → 0 and Pr(p̂ > p) → 0.

Proof of (i). The event p̂ = 0 occurs only if BIC(0) < BIC(1). Now BIC(0) − BIC(1) = [ln(SSR(0)/T) + (ln T)/T] − [ln(SSR(1)/T) + 2(ln T)/T] = ln(SSR(0)/T) − ln(SSR(1)/T) − (ln T)/T. Now SSR(0)/T converges in probability to var(Y_t), SSR(1)/T converges in probability to σ²_u, and (ln T)/T → 0; putting these pieces together, BIC(0) − BIC(1) converges in probability to ln(σ²_Y) − ln(σ²_u) > 0 (because var(Y_t) > σ²_u when the true AR(1) coefficient is nonzero), so that Pr(p̂ = 0) → 0.

Proof of (ii). The event p̂ = 2 occurs only if BIC(2) < BIC(1). Now T[BIC(2) − BIC(1)] = T{[ln(SSR(2)/T) + 3(ln T)/T] − [ln(SSR(1)/T) + 2(ln T)/T]} = T ln[SSR(2)/SSR(1)] + ln T = −T ln[1 + F/(T − 2)] + ln T, where F is the homoskedasticity-only F-statistic testing the hypothesis that the coefficient on the second lag is zero. Under the null hypothesis, F has an asymptotic chi-squared distribution with one degree of freedom, so −T ln[1 + F/(T − 2)] is bounded in probability; because ln T → ∞, T[BIC(2) − BIC(1)] → ∞, so Pr(BIC(2) < BIC(1)) → 0 and thus Pr(p̂ = 2) → 0.

AIC

In the special case of an AR(1) when zero, one, or two lags are considered, (i) applies to the AIC, with the modification that the term ln T is replaced by 2, so Pr(p̂ = 0) → 0: All the steps in the proof of (i) apply to the AIC. For (ii), however, T[AIC(2) − AIC(1)] = −T ln[1 + F/(T − 2)] + 2, which converges in distribution to a random variable that can be negative; as a consequence, Pr(p̂ = 2) converges to a positive number, so the AIC can overestimate the true lag length even in large samples.
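The BIC and AIC calculations in this appendix can be sketched numerically. The code below fits autoregressions of order 0 through 6 on a simulated AR(1) and picks the lag length minimizing each criterion; the penalty form BIC(p) = ln(SSR(p)/T) + (p + 1)(ln T)/T (and 2(p + 1)/T for the AIC) follows the chapter's definitions, while the helper names and the simulation are illustrative assumptions.

```python
import numpy as np

def ssr_ar(y, p, max_p):
    """SSR of an AR(p) fit by OLS on a common sample (so SSRs are comparable)."""
    Y = y[max_p:]
    T = len(Y)
    X = np.column_stack([np.ones(T)] + [y[max_p - j:-j] for j in range(1, p + 1)])
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return resid @ resid, T

def choose_lag(y, max_p, penalty):
    crits = []
    for p in range(max_p + 1):
        ssr, T = ssr_ar(y, p, max_p)
        crits.append(np.log(ssr / T) + (p + 1) * penalty(T) / T)
    return int(np.argmin(crits))

rng = np.random.default_rng(2)
T = 5000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()  # true lag length is 1

p_bic = choose_lag(y, 6, penalty=np.log)         # BIC penalty: ln(T) per coefficient
p_aic = choose_lag(y, 6, penalty=lambda T: 2.0)  # AIC penalty: 2 per coefficient
print(p_bic, p_aic)
```

Consistent with the argument above, the BIC essentially never picks zero lags here, and because the AIC's per-coefficient penalty does not grow with T, the AIC's chosen lag length is always at least as large as the BIC's.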
CHAPTER 15

Estimation of Dynamic Causal Effects
In the 1983 movie Trading Places, the character played by Eddie Murphy used inside information about Florida weather over the winter to make millions in the orange juice concentrate futures market. Real traders in orange juice futures in fact do pay close attention to the weather in Florida: Freezes in Florida kill Florida oranges, the source of almost all frozen orange juice concentrate made in the United States, so its supply falls and the price rises. But precisely how much does the price rise when the weather in Florida turns sour? Does the price rise all at once, or over several months; if so, for how long? These are questions that real-life traders in orange juice futures need to answer if they want to succeed.

This chapter takes up the problem of estimating the effect on Y, now and in the future, of a change in X, that is, the dynamic causal effect on Y of a change in X. What, for example, is the effect on the path of orange juice prices over time of a freezing spell in Florida? The starting point for modeling and estimating dynamic causal effects is the so-called distributed lag regression model, in which Y_t is expressed as a function of current and past values of X_t. Section 15.1 introduces the distributed lag model. Section 15.2 takes a closer look at what, precisely, is meant by a dynamic causal effect.
Section 15.6 does not require the material in Sections 14.5 through 14.8.

15.1 An Initial Taste of the Orange Juice Data
FIGURE 15.1 (a) Price of frozen concentrated orange juice; (b) Percent change in the price of frozen concentrated orange juice; (c) Monthly freezing degree days in Orlando. There have been large month-to-month changes in the price of frozen concentrated orange juice. Many of the large movements coincide with freezing weather in Orlando, home of the orange groves.
The OLS regression of the percentage price change (%ChgP_t) on the number of freezing degree days (FDD_t) is

%ChgP_t = −0.40 + 0.47FDD_t.  (15.1)
          (0.22)  (0.13)
The standard errors reported in this section are not the usual OLS standard errors but rather heteroskedasticity- and autocorrelation-consistent (HAC) standard errors that are appropriate when the error term and regressors are autocorrelated. HAC standard errors are discussed in Section 15.4; for now, they are used without further explanation.
According to this regression, an additional freezing degree day during a month increases the price of orange juice concentrate over that month by 0.47%. In a month with four freezing degree days, such as November 1950, the price of orange juice concentrate is estimated to have increased by 1.88% (4 × 0.47% = 1.88%), relative to a month with no days below freezing.

Because the regression in Equation (15.1) includes only a contemporaneous measure of the weather, it does not capture any lingering effects of the cold snap on the orange juice price over the coming months. To capture these we need to consider the effect on prices of both contemporaneous and lagged values of FDD, which in turn can be done by augmenting the regression in Equation (15.1) with, for example, lagged values of FDD over the previous six months:

%ChgP_t = ⋯ + 0.47FDD_t + 0.14FDD_{t-1} + 0.06FDD_{t-2} + ⋯.  (15.2)
Before learning more about the tools for estimating dynamic causal effects, we should spend a moment thinking about what, precisely, is meant by a dynamic causal effect. Having a clear idea about what a dynamic causal effect is leads to a clearer understanding of the conditions under which it can be estimated.
15.2 Dynamic Causal Effects
Section 1.2 defined a causal effect as the outcome of an ideal randomized controlled experiment: When a horticulturalist randomly applies fertilizer to some tomato plots but not others and then measures the yield, the expected difference in yield between the fertilized and unfertilized plots is the causal effect on tomato yield of the fertilizer. This concept of an experiment, however, is one in which there are multiple subjects (multiple tomato plots or multiple people), so the data are either cross-sectional (the tomato yield at the end of the harvest) or panel data (individual incomes before and after an experimental job training program). By having multiple subjects, it is possible to have both treatment and control groups and thereby to estimate the causal effect of the treatment.

In time series applications, this definition of causal effects in terms of an ideal randomized controlled experiment needs to be modified. To be concrete, consider an important problem of macroeconomics: estimating the effect of an unanticipated change in the short-term interest rate on the current and future economic activity in a given country, as measured by GDP. Taken literally, the randomized controlled experiment of Section 1.2 would entail randomly assigning different economies to treatment and control groups. The central banks in the treatment group would apply the treatment of a random interest rate change, while those in the control group would apply no such random change; for both groups, economic activity (for example, GDP) would be measured over the next few years. But what if we are interested in estimating this effect for a specific country, say the United States? Then this experiment would entail having different "clones" of the United States as subjects and assigning some clone economies to the treatment group and some to the control group. Obviously, this "parallel universes" experiment is infeasible.

Instead, in time series data it is useful to think of a randomized controlled experiment consisting of the same subject (e.g., the U.S. economy) being given different treatments (randomly chosen changes in interest rates) at different points in time (the 1970s, the 1980s, and so forth). In this framework, the single subject at different times plays the role of both treatment and control group: Sometimes the Fed changes the interest rate, while at other times it does not. Because data are collected over time, it is possible to estimate the dynamic causal effect, that is, the time path of the effect of the treatment on the outcome of interest. For example, a surprise increase in the short-term interest rate of two percentage points, sustained for one quarter, might initially have a negligible effect on output; after two quarters GDP growth might slow, with the greatest slowdown after one and one-half years; then over the next two years GDP growth might return to normal. This time path of causal effects is the dynamic causal effect on GDP growth of a surprise change in the interest rate.

As a second example, consider the causal effect on orange juice price changes of a freezing degree day. It is possible to imagine a variety of hypothetical experiments, each yielding a different causal effect. One experiment would be to change the weather in the Florida orange groves, holding constant weather elsewhere (for example, holding constant weather in the Texas grapefruit groves and in other citrus fruit regions). This experiment would measure a partial effect, holding other weather constant. A second experiment might change the weather in all the regions, where the "treatment" is application of overall weather patterns. If weather is correlated across regions for competing crops, then these two dynamic causal effects differ. In this chapter, we consider the causal effect in the latter experiment, that is, the causal effect of applying general weather patterns. This corresponds to measuring the dynamic effect on prices of a change in Florida weather, not holding constant weather in other agricultural regions.
Y_t = β0 + β1X_t + β2X_{t-1} + ⋯ + β_{r+1}X_{t-r} + u_t,  (15.3)

where u_t is the error term, which embodies the effects on Y_t of omitted determinants of Y_t. The model in Equation (15.3) is called the distributed lag model relating X_t, and r of its lags, to Y_t.

As an illustration of Equation (15.3), consider …
More generally, the coefficient on the contemporaneous value of X_t, β1, is the contemporaneous or immediate effect of a unit change in X_t on Y_t. The coefficient on X_{t-1}, β2, is the effect on Y_t of a unit change in X_{t-1} or, equivalently, the effect on Y_{t+1} of a unit change in X_t; that is, β2 is the effect of a unit change in X on Y one period later. In general, the coefficient on X_{t-h} is the effect of a unit change in X on Y after h periods. The dynamic causal effect is the effect of a change in X_t on Y_t, Y_{t+1}, Y_{t+2}, and so forth; that is, it is the sequence of causal effects on current and future values of Y. Thus, in the context of the distributed lag model in Equation (15.3), the dynamic causal effect is the sequence of coefficients β1, β2, ..., β_{r+1}.
THE DISTRIBUTED LAG MODEL AND EXOGENEITY

15.1

The distributed lag model is

Y_t = β0 + β1X_t + β2X_{t-1} + ⋯ + β_{r+1}X_{t-r} + u_t,  (15.4)

where …
there are two different types of exogeneity, that is, two different exogeneity conditions:

Past and present exogeneity (exogeneity):

E(u_t | X_t, X_{t-1}, X_{t-2}, ...) = 0;  (15.5)

Past, present, and future exogeneity (strict exogeneity):

E(u_t | ..., X_{t+1}, X_t, X_{t-1}, ...) = 0.  (15.6)

If X is strictly exogenous, it is exogenous, but exogeneity does not imply strict exogeneity. If orange juice market participants use forecasts of FDD when they decide how much they will buy or sell at a given price, then OJ prices, and thus the error term u_t, would incorporate information about future FDD that would make u_t a useful predictor of FDD. This means that u_t will be correlated with future values of FDD_t. According to this logic, because u_t includes forecasts of future Florida weather, FDD would be (past and present) exogenous but not strictly exogenous. The difference between this and the tomato fertilizer example is that, while tomato plants are unaffected by future fertilization, OJ market participants are influenced by forecasts of future Florida weather. We return to the question of whether FDD is strictly exogenous when we analyze the orange juice price data in more detail in Section 15.6.

The two definitions of exogeneity are summarized in Key Concept 15.1.
15.3 Estimation of Dynamic Causal Effects with Exogenous Regressors

Autocorrelated u_t, Standard Errors, and Inference

In the distributed lag regression model, the error term u_t can be autocorrelated; that is, u_t can be correlated with its lagged values. This autocorrelation arises
THE DISTRIBUTED LAG MODEL ASSUMPTIONS

15.2

The distributed lag model is given in Equation (15.4), where:

1. X is exogenous; that is, E(u_t | X_t, X_{t-1}, X_{t-2}, ...) = 0;

…
because the omitted factors included in u_t can themselves be serially correlated. For example, suppose that the demand for orange juice also depends on income, so that one factor that influences the price of orange juice is income, specifically, the aggregate income of potential orange juice consumers. Then aggregate income is an omitted variable in the distributed lag regression of orange juice price changes against freezing degree days. Aggregate income, however, is serially correlated: Income tends to fall in recessions and rise in expansions. Thus income is serially correlated and, because it is part of the error term, u_t will be serially correlated. This example is typical: Because omitted determinants of Y_t are themselves serially correlated, in general u_t in the distributed lag model will be serially correlated.

The autocorrelation of u_t does not affect the consistency of OLS, nor does it introduce bias. If, however, the errors are autocorrelated, then in general the usual OLS standard errors are inconsistent and a different formula must be used. Thus serial correlation of the errors is analogous to heteroskedasticity: The homoskedasticity-only standard errors are "wrong" when the errors are in fact heteroskedastic, in the sense that using homoskedasticity-only standard errors results in misleading statistical inferences when the errors are heteroskedastic. Similarly, when the errors are serially correlated, standard errors predicated upon i.i.d. errors are "wrong" in the sense that they result in misleading statistical inferences. The solution to this problem is to use heteroskedasticity- and autocorrelation-consistent (HAC) standard errors, the topic of Section 15.4.
Dynamic Multipliers and Cumulative Dynamic Multipliers

Another name for the dynamic causal effect is the dynamic multiplier. The cumulative dynamic multipliers are the cumulative causal effects, up to a given lag; thus the cumulative dynamic multipliers measure the cumulative effect on Y of a change in X.

Dynamic multipliers. The effect of a unit change in X on Y after h periods, which is β_{h+1} in Equation (15.4), is called the h-period dynamic multiplier.

Cumulative dynamic multipliers. The h-period cumulative dynamic multiplier is the cumulative effect of a unit change in X on Y over the next h periods. Thus, the cumulative dynamic multipliers are the cumulative sums of the dynamic multipliers. In terms of the coefficients of the distributed lag regression in Equation (15.4), the zero-period cumulative multiplier is β1, the one-period cumulative multiplier is β1 + β2, and the h-period cumulative dynamic multiplier is β1 + β2 + ⋯ + β_{h+1}. The sum of all the individual dynamic multipliers, β1 + β2 + ⋯ + β_{r+1}, is the cumulative long-run effect on Y of a change in X and is called the long-run cumulative dynamic multiplier.

For example, consider the regression in Equation (15.2). The immediate effect of an additional freezing degree day is that the price of orange juice concentrate rises by 0.47%. The cumulative effect over the next month is the sum of the impact effect and the dynamic effect one month ahead; thus the cumulative effect on prices is the initial increase of 0.47% plus the subsequent smaller increase of 0.14%, for a total of 0.61%. Similarly, the cumulative dynamic multiplier over two months is 0.47% + 0.14% + 0.06% = 0.67%.

The cumulative dynamic multipliers can be estimated directly using a modification of the distributed lag regression in Equation (15.4). This modified regression is

Y_t = δ0 + δ1ΔX_t + δ2ΔX_{t-1} + ⋯ + δ_rΔX_{t-r+1} + δ_{r+1}X_{t-r} + u_t.  (15.7)
The coefficients in Equation (15.7), the δ's, are the cumulative dynamic multipliers: δ1 = β1, δ2 = β1 + β2, δ3 = β1 + β2 + β3, and so forth, and the coefficient on X_{t-r}, δ_{r+1}, is the long-run cumulative dynamic multiplier β1 + β2 + ⋯ + β_{r+1}. Moreover, the OLS estimators of the coefficients in Equation (15.7) are the same as the corresponding cumulative sums of the OLS estimators in Equation (15.4). For example, δ̂2 = β̂1 + β̂2. The main benefit of estimating the cumulative dynamic multipliers using the specification in Equation (15.7) is that, because the OLS estimators of the regression coefficients are estimators of the cumulative dynamic multipliers, the HAC standard errors of the coefficients in Equation (15.7) are the HAC standard errors of the cumulative dynamic multipliers.
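The equivalence between the δ's in Equation (15.7) and the cumulative sums of the β's in Equation (15.4) is an exact algebraic reparameterization, and it can be verified numerically; the simulated data and coefficient values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 503
x_full = rng.standard_normal(T)
u = rng.standard_normal(T - 2)
x0, x1, x2 = x_full[2:], x_full[1:-1], x_full[:-2]  # X_t, X_{t-1}, X_{t-2}
y = 1.0 + 0.5 * x0 + 0.3 * x1 + 0.2 * x2 + u        # distributed lag model, r = 2

ones = np.ones(len(y))
# Equation (15.4): regress Y_t on X_t, X_{t-1}, X_{t-2}
beta = np.linalg.lstsq(np.column_stack([ones, x0, x1, x2]), y, rcond=None)[0]
# Equation (15.7): regress Y_t on Delta X_t, Delta X_{t-1}, and X_{t-2}
delta = np.linalg.lstsq(np.column_stack([ones, x0 - x1, x1 - x2, x2]), y,
                        rcond=None)[0]

print(delta[1:])            # estimated cumulative dynamic multipliers
print(np.cumsum(beta[1:]))  # identical, up to floating-point rounding
```

Because the two regressions span the same column space, the identity δ̂_j = β̂1 + ⋯ + β̂_j holds exactly in every sample, not just on average.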
15.4 Heteroskedasticity- and Autocorrelation-Consistent Standard Errors

Distribution of the OLS Estimator with Autocorrelated Errors

To keep things simple, consider the distributed lag regression model with no lags,

Y_t = β0 + β1X_t + u_t,  (15.8)

where the assumptions of Key Concept 15.2 are satisfied. This section shows that the variance of β̂1 can be written as the product of two terms: the expression for var(β̂1), applicable if u_t is not serially correlated, times a correction factor that arises from the autocorrelation in u_t or, more precisely, the autocorrelation of (X_t − μ_X)u_t.
As shown in Appendix 4.3, the formula for the OLS estimator β̂1 in Key Concept 4.2 can be rewritten as

β̂1 = β1 + [ (1/T) Σ_{t=1}^{T} (X_t − X̄)u_t ] / [ (1/T) Σ_{t=1}^{T} (X_t − X̄)² ],  (15.9)

where Equation (15.9) is Equation (4.30) with a change of notation so that i and n are replaced by t and T. Because X̄ converges in probability to μ_X and (1/T) Σ_{t=1}^{T} (X_t − X̄)² converges in probability to σ²_X, in large samples β̂1 − β1 is approximately given by

β̂1 − β1 ≅ [ (1/T) Σ_{t=1}^{T} (X_t − μ_X)u_t ] / σ²_X = v̄ / σ²_X,  (15.10)

where v_t = (X_t − μ_X)u_t and v̄ = (1/T) Σ_{t=1}^{T} v_t.

If v_t is i.i.d., as assumed for cross-sectional data in Key Concept 4.3, then var(v̄) = var(v_t)/T and the formula for the variance of β̂1 from Key Concept 4.4 applies. If, however, u_t and X_t are not independently distributed over time, then in general v_t will be serially correlated, so var(v̄) ≠ var(v_t)/T and Key Concept 4.4 does not apply. Instead, if v_t is serially correlated, the variance of v̄ is given by

var(v̄) = var[(v_1 + v_2 + ⋯ + v_T)/T]
       = [var(v_1) + cov(v_1, v_2) + ⋯ + cov(v_1, v_T) + cov(v_2, v_1) + var(v_2) + ⋯ + var(v_T)]/T²
       = [T var(v_t) + 2(T − 1)cov(v_t, v_{t-1}) + 2(T − 2)cov(v_t, v_{t-2}) + ⋯ + 2cov(v_t, v_{t-T+1})]/T²
       = (σ²_v / T) f_T,  (15.12)

where σ²_v = var(v_t) and

f_T = 1 + 2 Σ_{j=1}^{T−1} ((T − j)/T) ρ_j,  (15.13)

where ρ_j = corr(v_t, v_{t-j}). As the sample size T increases, f_T tends to the limit f_∞ = 1 + 2 Σ_{j=1}^{∞} ρ_j.
Combining the expressions in Equation (15.10) for β̂1 − β1 and Equation (15.12) for var(v̄) gives the formula for the variance of β̂1 when v_t is autocorrelated:

var(β̂1) = (1/T) [ σ²_v / (σ²_X)² ] f_T.  (15.14)

This is the usual variance expression, multiplied by the correction factor f_T; if v_t is not serially correlated, then f_T = 1 and Equation (15.14) reduces to the formula of Key Concept 4.4. The HAC estimator of the variance of β̂1 replaces the unknown f_T in Equation (15.14) with an estimator f̂_T based on the sample autocorrelations of v̂_t = (X_t − X̄)û_t:

σ̂²_{β̂1,HAC} = σ̂²_{β̂1} f̂_T.  (15.15)

One approach would be to estimate f_T using all T − 1 sample autocorrelations of v̂_t; however,
the estimation error in this estimator of f_T remains large even in large samples. At the other extreme, one could imagine using only a few sample autocorrelations, for example, only the first sample autocorrelation, and ignoring all the higher autocorrelations. Although this estimator eliminates the problem of estimating too many autocorrelations, it has a different problem: It is inconsistent because it ignores the additional autocorrelations that appear in Equation (15.13). In short, using too many sample autocorrelations makes the estimator have a large variance, but using too few autocorrelations ignores the autocorrelations at higher lags, so in either of these extreme cases the estimator is inconsistent.

Estimators of f_T used in practice strike a balance between these two extreme cases by choosing the number of autocorrelations to include in a way that depends on the sample size T. If the sample size is small, only a few autocorrelations are used, but if the sample size is large, more autocorrelations are included (but still far fewer than T). Specifically, let f̂_T be given by

f̂_T = 1 + 2 Σ_{j=1}^{m−1} ((m − j)/m) ρ̃_j,  (15.16)

where ρ̃_j = Σ_{t=j+1}^{T} v̂_t v̂_{t-j} / Σ_{t=1}^{T} v̂_t², where v̂_t = (X_t − X̄)û_t (as in the definition of σ̂²_{β̂1,HAC}). The parameter m in Equation (15.16) is called the truncation parameter of the HAC estimator because the sum of autocorrelations is shortened, or truncated, to include only m − 1 autocorrelations instead of the T − 1 autocorrelations appearing in the population formula in Equation (15.13).

For f̂_T to be consistent, m must be chosen so that it is large in large samples, although still much less than T. One guideline for choosing m in practice is to use the formula

m = 0.75T^{1/3},  (15.17)

rounded to an integer.¹ The HAC variance estimator based on f̂_T in Equation (15.16), with m given by Equation (15.17), is called the Newey-West variance estimator.

¹Equation (15.17) gives the "best" choice of m if u_t and X_t are first-order autoregressive processes with first autocorrelation coefficients 0.5, where "best" means the estimator that minimizes the asymptotic mean squared error of the HAC variance estimator. Equation (15.17) is based on a more general formula derived by Andrews (1991).
"; )
608
CHAPTER 1 S
ir
Other HAC estimators. The Newey-West variance estimator is not the only HAC estimator. For example, the weights (m − j)/m in Equation (15.16) can be replaced by different weights. If different weights are used, then the rule for choosing the truncation parameter in Equation (15.17) no longer applies, and a different rule, developed for those weights, should be used instead. Discussion of HAC estimators using other weights goes beyond the scope of this book. For more information on this topic, see Hayashi (2000, Section 6.6).

Extension to multiple regression. All the issues discussed in this section generalize to the distributed lag regression model in Key Concept 15.1 with multiple lags and, more generally, to the multiple regression model with serially correlated errors. In particular, if the error term is serially correlated, then the usual OLS standard errors are an unreliable basis for inference and HAC standard errors should be used instead. If the HAC variance estimator used is the Newey-West estimator [the HAC variance estimator based on the weights (m − j)/m], then the truncation parameter m can be chosen according to the rule in Equation (15.17) whether there is a single regressor or multiple regressors. The formula for HAC standard errors in multiple regression is incorporated into modern regression software designed for use with time series data. Because this formula involves matrix algebra, we omit it here and instead refer the reader to Hayashi (2000, Section 6.6) for the mathematical details.
HAC standard errors are summarized in Key Concept 15.3.
15.5 Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors

When X_t is strictly exogenous, two alternative estimators of the dynamic causal effects are available. The first is to estimate an autoregressive distributed lag (ADL) model and to calculate the dynamic multipliers from the estimated ADL coefficients. This method can entail estimating fewer coefficients than OLS estimation of the distributed lag model, thus potentially reducing estimation error. The second method is to estimate the coefficients of the distributed lag model using generalized least squares (GLS) instead of OLS. Although the same number of coefficients in the distributed lag model are estimated by GLS as by OLS, the GLS estimator has a smaller variance. To keep the exposition simple, these two estimation methods are initially laid out and discussed in the context of a distributed lag model with a single lag and AR(1) errors. The potential advantages of these two estimators are greatest, however, when many lags appear in the distributed lag model, so these estimators are then extended to the general distributed lag model with higher-order autoregressive errors.
The Distributed Lag Model with AR(1) Errors

Suppose that the causal effect on Y of a change in X is distributed over the current and next period, so that the distributed lag model has a single lag:

Y_t = β0 + β1X_t + β2X_{t−1} + u_t.  (15.18)
As discussed in Section 15.4, in general the error term u_t in Equation (15.18) is serially correlated. One consequence of this serial correlation is that, if the distributed lag coefficients are estimated by OLS, then inference based on the usual OLS standard errors can be misleading. For this reason, Sections 15.3 and 15.4 emphasized the use of HAC standard errors when β1 and β2 in Equation (15.18) are estimated by OLS.

In this section, we take a different approach toward the serial correlation in u_t. This approach, which is possible if X_t is strictly exogenous, involves adopting an autoregressive model for the serial correlation in u_t, then using this AR model to derive some estimators that can be more efficient than the OLS estimator in the distributed lag model.
Specifically, suppose that u_t follows the AR(1) model

u_t = φ1u_{t−1} + ũ_t,  (15.19)

where φ1 is the autoregressive parameter, ũ_t is serially uncorrelated, and no intercept is needed because E(u_t) = 0. Equations (15.18) and (15.19) imply that the distributed lag model with a serially correlated error can be rewritten as an autoregressive distributed lag model with a serially uncorrelated error. To do so,
lag each side of Equation (15.18) and subtract φ1 times this lag from each side:

Y_t − φ1Y_{t−1} = β0 + β1X_t + β2X_{t−1} + u_t − φ1(β0 + β1X_{t−1} + β2X_{t−2} + u_{t−1}).  (15.20)

Collecting terms in Equation (15.20) yields

Y_t = α0 + φ1Y_{t−1} + δ0X_t + δ1X_{t−1} + δ2X_{t−2} + ũ_t,  (15.21)

where

α0 = β0(1 − φ1), δ0 = β1, δ1 = β2 − φ1β1, and δ2 = −φ1β2,  (15.22)

where β0, β1, and β2 are the coefficients in Equation (15.18) and φ1 is the autocorrelation coefficient in Equation (15.19).

Equation (15.21) is an ADL model that includes a contemporaneous value of X_t and two of its lags. We will refer to Equation (15.21) as the ADL representation of the distributed lag model with autoregressive errors given in Equations (15.18) and (15.19).
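The algebra linking Equations (15.18)-(15.19) to (15.21)-(15.22) can be checked symbolically. A minimal sketch using Python's sympy library (our illustration, not part of the text):

```python
import sympy as sp

b0, b1, b2, phi1 = sp.symbols("beta0 beta1 beta2 phi1")
# Xt, Xl1, Xl2, ut, ul1 stand for X_t, X_{t-1}, X_{t-2}, u_t, u_{t-1}.
Xt, Xl1, Xl2, ut, ul1 = sp.symbols("Xt Xl1 Xl2 ut ul1")

# Right-hand side of Equation (15.20): the distributed lag model (15.18)
# minus phi1 times its own lag.
rhs = sp.expand((b0 + b1 * Xt + b2 * Xl1 + ut)
                - phi1 * (b0 + b1 * Xl1 + b2 * Xl2 + ul1))

# Read off the ADL coefficients of Equation (15.21); they match (15.22).
assert sp.simplify(rhs.coeff(Xt) - b1) == 0                  # delta0 = beta1
assert sp.simplify(rhs.coeff(Xl1) - (b2 - phi1 * b1)) == 0   # delta1
assert sp.simplify(rhs.coeff(Xl2) + phi1 * b2) == 0          # delta2 = -phi1*beta2
const = rhs.subs({Xt: 0, Xl1: 0, Xl2: 0, ut: 0, ul1: 0})
assert sp.simplify(const - b0 * (1 - phi1)) == 0             # alpha0 = beta0*(1 - phi1)
print("restrictions in Equation (15.22) confirmed")
```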
Another equivalent way to combine Equations (15.18) and (15.19) is to use quasi-differences. Let Ỹ_t = Y_t − φ1Y_{t−1} and X̃_t = X_t − φ1X_{t−1}. Then collecting terms in Equation (15.20) yields

Ỹ_t = α0 + β1X̃_t + β2X̃_{t−1} + ũ_t,  (15.23)

where α0 is as defined in Equation (15.22). We will refer to Equation (15.23) as the quasi-difference representation of the distributed lag model with autoregressive errors given in Equations (15.18) and (15.19).
The ADL model in Equation (15.21) [with the parameter restrictions in Equation (15.22)] and the quasi-difference model in Equation (15.23) are equivalent. In both models, the error term, ũ_t, is serially uncorrelated. The two representations, however, suggest different estimation strategies. Before discussing those strategies, we turn to the assumptions under which they yield consistent estimators of the dynamic multipliers, β1 and β2.
The conditional mean zero assumption in the ADL and quasi-differenced models. Because Equations (15.21) [with the restrictions in Equation (15.22)] and (15.23) are equivalent, the conditions for their estimation are the same, so for convenience we consider Equation (15.23).

The quasi-difference model in Equation (15.23) is a distributed lag model involving the quasi-differenced variables with a serially uncorrelated error. Accordingly, the conditions for OLS estimation of the coefficients in Equation (15.23) are the least squares assumptions for the distributed lag model in Key Concept 15.2, expressed in terms of ũ_t and X̃_t. The critical assumption here is the first assumption, which, applied to Equation (15.23), is that X̃_t is exogenous; that is,

E(ũ_t | X̃_t, X̃_{t−1}, …) = 0,  (15.24)

where letting the conditional expectation depend on distant lags of X̃_t ensures that no additional lags of X̃_t, other than those appearing in Equation (15.23), enter the population regression function.
Because X̃_t = X_t − φ1X_{t−1}, so X_t = X̃_t + φ1X_{t−1}, conditioning on X̃_t and all of its lags is equivalent to conditioning on X_t and all of its lags. Thus the conditional expectation condition in Equation (15.24) is equivalent to the condition that E(ũ_t | X_t, X_{t−1}, …) = 0. Furthermore, because ũ_t = u_t − φ1u_{t−1}, this condition in turn implies

E(ũ_t | X_t, X_{t−1}, …) = E(u_t | X_t, X_{t−1}, …) − φ1E(u_{t−1} | X_t, X_{t−1}, …) = 0.  (15.25)
For the equality in Equation (15.25) to hold for general values of φ1, it must be the case that both E(u_t | X_t, X_{t−1}, …) = 0 and E(u_{t−1} | X_t, X_{t−1}, …) = 0. By shifting the time subscripts, the condition that E(u_{t−1} | X_t, X_{t−1}, …) = 0 can be rewritten as

E(u_t | X_{t+1}, X_t, X_{t−1}, …) = 0,  (15.26)

which (by the law of iterated expectations) implies that E(u_t | X_t, X_{t−1}, …) = 0. In summary, having the zero conditional mean assumption in Equation (15.24) hold for general values of φ1 is equivalent to having the condition in Equation (15.26) hold.
The condition in Equation (15.26) is implied by X_t being strictly exogenous, but it is not implied by X_t being (past and present) exogenous. Thus, the least squares assumptions for estimation of the distributed lag model in Equation (15.23) hold if X_t is strictly exogenous, but it is not enough that X_t be (past and present) exogenous.
Because the ADL representation [Equations (15.21) and (15.22)] is equivalent to the quasi-differenced representation [Equation (15.23)], the conditional mean assumption needed to estimate the coefficients of the quasi-differenced representation [that E(u_t | X_{t+1}, X_t, X_{t−1}, …) = 0] is also the conditional mean assumption for consistent estimation of the coefficients of the ADL representation.

We now turn to the two estimation strategies suggested by these two representations: estimation of the ADL coefficients and estimation of the coefficients of the quasi-differenced model.
OLS Estimation of the ADL Model

The first strategy is to estimate the coefficients of the ADL model in Equation (15.21) by OLS and then to compute the dynamic multipliers implied by the estimated coefficients. The OLS estimated ADL model is

Ŷ_t = φ̂1Y_{t−1} + δ̂0X_t + δ̂1X_{t−1} + δ̂2X_{t−2},  (15.27)

where the estimated intercept has been omitted because it does not enter any expression for the dynamic multipliers. Lagging both sides of Equation (15.27) yields Ŷ_{t−1} = φ̂1Y_{t−2} + δ̂0X_{t−1} + δ̂1X_{t−2} + δ̂2X_{t−3}, so replacing Y_{t−1} in Equation (15.27) by Ŷ_{t−1} and collecting terms yields

Ŷ_t = φ̂1²Y_{t−2} + δ̂0X_t + (δ̂1 + φ̂1δ̂0)X_{t−1} + (δ̂2 + φ̂1δ̂1)X_{t−2} + φ̂1δ̂2X_{t−3}.  (15.28)

Repeating this process eliminates the lagged value of Y entirely and yields

Ŷ_t = δ̂0X_t + (δ̂1 + φ̂1δ̂0)X_{t−1} + (δ̂2 + φ̂1δ̂1 + φ̂1²δ̂0)X_{t−2} + (φ̂1δ̂2 + φ̂1²δ̂1 + φ̂1³δ̂0)X_{t−3} + ⋯.  (15.29)
The coefficients in Equation (15.29) are the estimators of the dynamic multipliers, computed from the OLS estimators of the coefficients in the ADL model in Equation (15.21). If the restrictions on the coefficients in Equation (15.22) were to hold exactly for the estimated coefficients, then the dynamic multipliers beyond the second (that is, the coefficients on X_{t−2}, X_{t−3}, and so forth) would all be zero.² However, under this estimation strategy those restrictions will not hold exactly, so the estimated multipliers beyond the second in Equation (15.29) will generally be nonzero.
GLS Estimation
The second strategy for estimating the dynamic multipliers when X_t is strictly exogenous is to use generalized least squares (GLS), which entails estimating Equation (15.23). To describe the GLS estimator, we initially assume that φ1 is known. Because in practice it is unknown, this estimator is infeasible, so it is called the infeasible GLS estimator. The infeasible GLS estimator, however, can be modified using an estimator of φ1, which yields a feasible version of the GLS estimator.

---
²Substitute the equalities in Equation (15.22): for example, the coefficient on X_{t−2} in Equation (15.29) is δ̂2 + φ̂1δ̂1 + φ̂1²δ̂0 = −φ1β2 + φ1(β2 − φ1β1) + φ1²β1 = 0.
Infeasible GLS. Suppose that φ1 were known; then the quasi-differenced variables X̃_t and Ỹ_t could be computed directly. As discussed in the context of Equations (15.24) and (15.26), if X_t is strictly exogenous, then E(ũ_t | X̃_t, X̃_{t−1}, …) = 0. Thus, if X_t is strictly exogenous and if φ1 is known, the coefficients α0, β1, and β2 in Equation (15.23) can be estimated by the OLS regression of Ỹ_t on X̃_t and X̃_{t−1} (including an intercept). The resulting estimators of β1 and β2 — that is, the OLS estimators of the slope coefficients in Equation (15.23) when φ1 is known — are the infeasible GLS estimators. This estimator is infeasible because φ1 is unknown, so X̃_t and Ỹ_t cannot be computed and thus these OLS estimators cannot actually be computed.
Feasible GLS. The feasible GLS estimator modifies the infeasible GLS estimator by using a preliminary estimator of φ1, φ̂1, to compute the estimated quasi-differences. Specifically, the feasible GLS estimators of β1 and β2 are the OLS estimators of β1 and β2 in Equation (15.23), computed by regressing Ỹ_t on X̃_t and X̃_{t−1} (with an intercept), where the estimated quasi-differences are X̃_t = X_t − φ̂1X_{t−1} and Ỹ_t = Y_t − φ̂1Y_{t−1}.

The preliminary estimator φ̂1 can be computed by first estimating the distributed lag regression in Equation (15.18) by OLS and then using OLS to estimate φ1 in Equation (15.19), with the OLS residuals û_t replacing the unobserved regression errors u_t. This version of the GLS estimator is called the Cochrane-Orcutt (1949) estimator.
An extension of the Cochrane-Orcutt method is to continue this process iteratively: Use the GLS estimators of β1 and β2 to compute revised estimators of the residuals; use these new residuals to re-estimate φ1; use this revised estimator of φ1 to compute revised estimated quasi-differences; use these revised estimated quasi-differences to re-estimate β1 and β2; and continue this process until the estimators of β1 and β2 converge. This is referred to as the iterated Cochrane-Orcutt estimator.
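The iteration just described can be sketched in a few lines of Python. Everything below — the function names, the simulated data, and the fixed number of iterations in place of a convergence test — is illustrative rather than from the text:

```python
import numpy as np

def ols(y, Z):
    """OLS coefficients of a regression of y on the columns of Z."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def iterated_cochrane_orcutt(Y, X, n_iter=20):
    """Iterated Cochrane-Orcutt (feasible GLS) estimator for the model in
    Equations (15.18) and (15.19): Y_t = b0 + b1*X_t + b2*X_{t-1} + u_t
    with AR(1) errors. A sketch, not the text's exact implementation."""
    T = len(Y)
    Z = np.column_stack([np.ones(T - 1), X[1:], X[:-1]])  # [1, X_t, X_{t-1}]
    b = ols(Y[1:], Z)              # step 1: OLS on the distributed lag model
    phi1 = 0.0
    for _ in range(n_iter):
        u = Y[1:] - Z @ b                       # residuals u_hat_t
        phi1 = ols(u[1:], u[:-1, None])[0]      # step 2: AR(1) fit, Eq. (15.19)
        # step 3: OLS on the estimated quasi-differences, Eq. (15.23)
        Yq = Y[2:] - phi1 * Y[1:-1]
        Xq0 = X[2:] - phi1 * X[1:-1]
        Xq1 = X[1:-1] - phi1 * X[:-2]
        a0, b1, b2 = ols(Yq, np.column_stack([np.ones(T - 2), Xq0, Xq1]))
        b = np.array([a0 / (1.0 - phi1), b1, b2])  # b0 = alpha0 / (1 - phi1)
    return b, phi1

# Simulated data consistent with the model (true b0=1, b1=2, b2=1, phi1=0.5).
rng = np.random.default_rng(1)
T = 2000
Xfull = rng.normal(size=T + 1)
X, Xlag = Xfull[1:], Xfull[:-1]
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()
Y = 1.0 + 2.0 * X + 1.0 * Xlag + u

b, phi1 = iterated_cochrane_orcutt(Y, X)
print(b, phi1)  # estimates should be close to [1, 2, 1] and 0.5
```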
A nonlinear least squares interpretation of the GLS estimator. An equivalent interpretation of the GLS estimator is that it estimates the ADL model in Equation (15.21), imposing the parameter restrictions in Equation (15.22). These restrictions are nonlinear functions of the original parameters β0, β1, β2, and φ1, so this estimation cannot be performed using OLS. Instead, the parameters can be estimated by nonlinear least squares (NLLS). As discussed in Appendix 8.1, NLLS minimizes the sum of squared mistakes made by the estimated regression function, recognizing that the regression function is a nonlinear function of the parameters being estimated. In general, NLLS estimation can require sophisticated algorithms for minimizing nonlinear functions of unknown parameters. In the special case at hand, however, those sophisticated algorithms are not needed; rather, the NLLS estimator can be computed using the algorithm described
previously for the iterated Cochrane-Orcutt estimator. Thus, the iterated Cochrane-Orcutt GLS estimator is in fact the NLLS estimator of the ADL coefficients, subject to the nonlinear constraints in Equation (15.22).
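Equivalently, one can minimize the sum of squared errors of the restricted model directly with a general-purpose solver. A sketch using scipy's least-squares routine (our code and simulated data, not the text's):

```python
import numpy as np
from scipy.optimize import least_squares

def restricted_residuals(theta, Y, X):
    """Errors u_tilde_t of the ADL model (15.21) with the nonlinear
    restrictions (15.22) imposed, written in the equivalent
    quasi-difference form of Equation (15.23)."""
    b0, b1, b2, phi1 = theta
    Yq = Y[2:] - phi1 * Y[1:-1]           # Y_t - phi1*Y_{t-1}
    Xq0 = X[2:] - phi1 * X[1:-1]          # X_t - phi1*X_{t-1}
    Xq1 = X[1:-1] - phi1 * X[:-2]         # X_{t-1} - phi1*X_{t-2}
    return Yq - (b0 * (1.0 - phi1) + b1 * Xq0 + b2 * Xq1)

# Simulated data (true b0=1, b1=2, b2=1, phi1=0.5).
rng = np.random.default_rng(1)
T = 2000
Xfull = rng.normal(size=T + 1)
X, Xlag = Xfull[1:], Xfull[:-1]
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()
Y = 1.0 + 2.0 * X + 1.0 * Xlag + u

nlls = least_squares(restricted_residuals, x0=np.zeros(4), args=(Y, X))
print(nlls.x)  # NLLS estimates of (b0, b1, b2, phi1)
```

Under these assumptions the minimizer coincides, in large samples, with the iterated Cochrane-Orcutt estimates.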
Efficiency of GLS. The virtue of the GLS estimator is that when X_t is strictly exogenous and the transformed errors ũ_t are homoskedastic, it is efficient among linear estimators, at least in large samples. To see this, first consider the infeasible
linear estimators, at least in large samples. To see this, fi rst consider the infeasible
GLS estimator. Jf 1 is homoskedastic. if 4> 1 is known (so that X, and Y, can be
treated as if they are observed). and if X, is strictly exogenous, then the GaussMarkov theorem implies that the OLS estimator of a 0 /3 1, aod /32 in Equation
(15.23) is efficient among all linear conditionally unbiased estimators; that is, the
OLS estimator of tJ1e coefficients in Equation (15.23) is the best linear unbiased
estimator. or BLUE (Section 5.5). Because the OLS estimator of Equation (15.23)
is the infeasible GLS estimator, this means that the infeasible GLS estimator i~
BLUE. The feasible GLS estimator is similar to the infeasible GLS estimator.
except that 4>1 is esti mated. Because lhe estimator of~ ~ is consistent and its variance is inversely proportional toT, the feasible and infeasible GLS estimators have
the sa me variances in large samples. In this sense, if X is strictly exogenous, then
the feasib le GLS eslimatOr is BLUE in large samples. In particular, if X is strictly
exogeoous, the n GLS is more efficient than the OLS estimator of the distributed
lag coefficients discussed in Section 15.3.
The Cochrane-Orcutt and iterated Cochrane-Orcutt estimators presented here are special cases of GLS estimation. In general, GLS estimation involves transforming the regression model so that the errors are homoskedastic and serially uncorrelated, then estimating the coefficients of the transformed regression model by OLS. In general, the GLS estimator is consistent and BLUE in large samples if X is strictly exogenous, but it is not consistent if X is only (past and present) exogenous. The mathematics of GLS involve matrix algebra, so they are postponed to Section 18.6.
The general distributed lag model with autoregressive errors. The general distributed lag model with r lags and an AR(p) error term is

Y_t = β0 + β1X_t + β2X_{t−1} + ⋯ + β_{r+1}X_{t−r} + u_t,  (15.30)

u_t = φ1u_{t−1} + φ2u_{t−2} + ⋯ + φ_pu_{t−p} + ũ_t,  (15.31)

where β1, …, β_{r+1} are the dynamic multipliers and φ1, …, φ_p are the autoregressive coefficients of the error term. Under the AR(p) model for the errors, the distributed lag model can be rewritten as the ADL model

Y_t = α0 + φ1Y_{t−1} + ⋯ + φ_pY_{t−p} + δ0X_t + δ1X_{t−1} + ⋯ + δ_qX_{t−q} + ũ_t,  (15.32)

where q = r + p and δ0, …, δ_q are functions of the β's and φ's in Equations (15.30) and (15.31). Equivalently, the model of Equations (15.30) and (15.31) can be written in quasi-difference form as

Ỹ_t = α0 + β1X̃_t + β2X̃_{t−1} + ⋯ + β_{r+1}X̃_{t−r} + ũ_t,  (15.33)

where Ỹ_t = Y_t − φ1Y_{t−1} − ⋯ − φ_pY_{t−p} and X̃_t = X_t − φ1X_{t−1} − ⋯ − φ_pX_{t−p}.

The conditional mean zero assumption for the quasi-differenced model is

E(ũ_t | X̃_t, X̃_{t−1}, …) = 0.  (15.34)

Because ũ_t = u_t − φ1u_{t−1} − ⋯ − φ_pu_{t−p}, and because conditioning on X̃_t and all of its lags is equivalent to conditioning on X_t and all of its lags, the condition in Equation (15.34) implies that

E(u_t | X_t, X_{t−1}, …) − φ1E(u_{t−1} | X_t, X_{t−1}, …) − ⋯ − φ_pE(u_{t−p} | X_t, X_{t−1}, …) = 0.  (15.35)

For Equation (15.35) to hold for general values of φ1, …, φ_p, it must be the case that each of the conditional expectations in Equation (15.35) is zero; equivalently, it must be the case that

E(u_t | X_{t+p}, X_{t+p−1}, …) = 0.  (15.36)

This condition is not implied by X_t being (past and present) exogenous, but it is implied by X_t being strictly exogenous.
KEY CONCEPT 15.4
ESTIMATION OF DYNAMIC MULTIPLIERS UNDER STRICT EXOGENEITY

The general distributed lag model with r lags and an AR(p) error term is

Y_t = β0 + β1X_t + β2X_{t−1} + ⋯ + β_{r+1}X_{t−r} + u_t,  (15.37)

u_t = φ1u_{t−1} + φ2u_{t−2} + ⋯ + φ_pu_{t−p} + ũ_t.  (15.38)

If X_t is strictly exogenous, then the dynamic multipliers can be estimated by first using OLS to estimate the coefficients of the ADL model

Y_t = α0 + φ1Y_{t−1} + ⋯ + φ_pY_{t−p} + δ0X_t + δ1X_{t−1} + ⋯ + δ_qX_{t−q} + ũ_t,  (15.39)

where q = r + p, and then computing the dynamic multipliers using regression software. Alternatively, the dynamic multipliers can be estimated by estimating the distributed lag coefficients in Equation (15.37) by GLS.
In fact, in the limit when p is infinite (so that the error term in the distributed lag model follows an infinite-order autoregression), the condition in Equation (15.36) becomes the condition in Key Concept 15.1 for strict exogeneity.
Estimation of the ADL model by OLS. As in the distributed lag model with a single lag and an AR(1) error term, the dynamic multipliers can be estimated from the OLS estimators of the ADL coefficients in Equation (15.32). The general formulas are similar to, but more complicated than, those in Equation (15.29) and are best expressed using lag operator notation; these formulas are given in Appendix 15.2. In practice, modern regression software designed for time series regression analysis does these computations for you.
Estimation by GLS. Alternatively, the coefficients of the distributed lag model in Equation (15.30) can be estimated by GLS, extending the feasible GLS method described above to AR(p) errors.
15.6 Orange Juice Prices and Cold Weather
As discussed in Sections 15.3 and 15.4, the error term can be serially correlated in distributed lag regressions, so it is important to use HAC standard errors, which adjust for this serial correlation. For the initial results, the truncation parameter for the Newey-West standard errors (m in the notation of Section 15.4) was chosen using the rule in Equation (15.17): Because there are 612 monthly observations, according to that rule m = 0.75T^{1/3} = 0.75 × 612^{1/3} = 6.37, but because m must be an integer this was rounded up to m = 7. The sensitivity of the standard errors to this choice of truncation parameter is investigated below.

The results of OLS estimation of the distributed lag regression of %ChgP_t on FDD_t, FDD_{t−1}, …, FDD_{t−18} are summarized in column (1) of Table 15.1. The coefficients of this regression (only some of which are reported in the table) are estimates of the dynamic causal effect on orange juice price changes (in percent) for the first 18 months following a unit increase in the number of freezing degree days in a month. For example, a single freezing degree day is estimated to increase prices by 0.50% over the month in which the freezing degree day occurs. The subsequent effect on price in later months of a freezing degree day is less: After one month the estimated effect is to increase the price by a further 0.17%, and after two months the estimated effect is to increase the price by an additional 0.07%. The R² of this regression is 0.12, indicating that much of the monthly variation in orange juice prices is not explained by current and past values of FDD_t.
Plots of dynamic multipliers can convey information more effectively than tables such as Table 15.1. The dynamic multipliers from column (1) of Table 15.1 are plotted in Figure 15.2a along with their 95% confidence intervals, computed as the estimated coefficient ±1.96 HAC standard errors. After the initial sharp price rise, subsequent price rises are less, although prices are estimated to rise slightly in each of the first six months after the freeze. As can be seen from Figure 15.2a, for months other than the first the dynamic multipliers are not statistically significantly different from zero at the 5% significance level, although they are estimated to be positive through the seventh month.

Column (2) of Table 15.1 contains the cumulative dynamic multipliers for this specification, that is, the cumulative sums of the dynamic multipliers reported in column (1). These dynamic multipliers are plotted in Figure 15.2b along with their 95% confidence intervals. After one month, the cumulative effect of the freezing degree day is to increase prices by 0.67%; after two months, the price is estimated to have risen by 0.74%; and after six months, the price is estimated to have risen by 0.91%. As can be seen in Figure 15.2b, these cumulative multipliers increase through the seventh month, because the individual dynamic multipliers are positive for the first seven months. In the eighth month, the dynamic
TABLE 15.1  The Dynamic Effect of a Freezing Degree Day (FDD) on the Price of Orange Juice:
Selected Estimated Dynamic Multipliers and Cumulative Dynamic Multipliers

Lag number            (1) Dynamic     (2) Cumulative   (3) Cumulative   (4) Cumulative
                      Multipliers     Multipliers      Multipliers      Multipliers
0                     0.50 (0.14)     0.50 (0.14)      0.50 (0.14)      0.50 (0.15)
1                     0.17 (0.09)     0.67 (0.14)      0.67 (0.13)      0.70 (0.15)
2                     0.07 (0.06)     0.74 (0.17)      0.74 (0.16)      0.76 (0.18)
3                     0.07 (0.04)     0.81 (0.18)      0.81 (0.18)      0.84 (0.19)
4                     0.02 (0.03)     0.84 (0.19)      0.84 (0.19)      0.87 (0.20)
5                     0.03 (0.03)     0.87 (0.19)      0.87 (0.19)      0.90 (0.20)
6                     0.03 (0.05)     0.90 (0.20)      0.90 (0.21)      0.91 (0.21)
12                    −0.14 (0.08)    0.54 (0.27)      0.54 (0.26)      0.54 (0.28)
18                    0.00 (0.02)     0.37 (0.30)      0.37 (0.31)      0.37 (0.30)
Monthly indicators?   No              No               No               Yes (F = 1.01, p = 0.43)
HAC truncation m      7               7                14               7

All regressions were estimated by OLS using monthly data (described in Appendix 15.1) from January 1950 to December 2000, for a total of T = 612 monthly observations. The dependent variable is the monthly percentage change in the price of orange juice (%ChgP_t). Regression (1) is the distributed lag regression with the monthly number of freezing degree days and its 18 lagged values, that is, FDD_t, FDD_{t−1}, …, FDD_{t−18}, and the reported coefficients are the OLS estimates of the dynamic multipliers. The cumulative multipliers are the cumulative sums of the estimated dynamic multipliers. All regressions include an intercept, which is not reported. Newey-West HAC standard errors, computed using the truncation parameter given in the final row, are reported in parentheses.
multiplier is negative, so the price of orange juice begins to fall slowly from its peak. After 18 months, the cumulative increase in prices is only 0.37%; that is, the long-run cumulative dynamic multiplier is only 0.37%. This long-run cumulative dynamic multiplier is not statistically significantly different from zero at the 10% significance level (t = 0.37/0.30 = 1.23).
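The arithmetic linking columns (1) and (2) of Table 15.1 is a running sum, and the t-statistic just quoted is a one-line computation. A quick check of the numbers discussed in the text:

```python
import numpy as np

# Dynamic multipliers for lags 0, 1, and 2 as reported in the text.
dynamic = np.array([0.50, 0.17, 0.07])
cumulative = np.cumsum(dynamic)   # cumulative multipliers: 0.50, 0.67, 0.74
print(cumulative)

# t-statistic for the 18-month (long-run) cumulative multiplier.
t_stat = 0.37 / 0.30
print(round(t_stat, 2))           # 1.23
```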
FIGURE 15.2  The Dynamic Effect of a Freezing Degree Day (FDD) on the Price of Orange Juice

(a) Estimated Dynamic Multipliers and 95% Confidence Interval
(b) Estimated Cumulative Dynamic Multipliers and 95% Confidence Interval

The estimated dynamic multipliers show that a freeze leads to an immediate increase in prices. Future price rises are much smaller than the initial impact. The cumulative multiplier shows that freezes have a persistent effect on the level of orange juice prices, with prices peaking seven months after the freeze.
Are these results sensitive to changes in the details of the empirical analysis? To find out, we examine three aspects of this analysis: sensitivity to the computation of the HAC standard errors; an alternative specification that investigates potential omitted variable bias; and an analysis of the stability over time of the estimated multipliers.
First, we investigate whether the standard errors reported in the second column of Table 15.1 are sensitive to different choices of the HAC truncation parameter m. In column (3), results are reported for m = 14, twice the value used in column (2). The regression specification is the same as in column (2), so the estimated coefficients and dynamic multipliers are identical; only the standard errors differ, but, as it happens, not by much. We conclude that the results are insensitive to changes in the HAC truncation parameter.
Second, we investigate a possible source of omitted variable bias. Freezes in Florida are not randomly assigned throughout the year but rather occur in the winter (of course). If demand for orange juice is seasonal (is demand for orange juice greater in the winter than in the summer?), then seasonal patterns in orange juice demand could be correlated with FDD_t, resulting in omitted variable bias. The quantity of oranges sold for juice is endogenous: Prices and quantities are simultaneously determined by the forces of supply and demand. Thus, as discussed in Section 9.2, including quantity would lead to simultaneity bias. Nevertheless, the seasonal component of demand can be captured by including seasonal variables as regressors. The specification in column (4) of Table 15.1 therefore includes 11 monthly binary variables, one indicating whether the month is January, one indicating February, and so forth (as usual, one binary variable must be omitted to prevent perfect multicollinearity with the intercept). These monthly indicator variables are not jointly statistically significant at the 10% level (p = 0.43), and the estimated cumulative dynamic multipliers are essentially the same as for the specifications excluding the monthly indicators. In summary, seasonal fluctuations in demand are not an important source of omitted variable bias.
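The mechanics of the column (4) specification — 11 monthly indicators plus a joint test that their coefficients are zero — can be sketched as follows. The variable names and the simulated placeholder series are ours, not the study's:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly data, January 1950 - December 2000 (T = 612).
rng = np.random.default_rng(2)
idx = pd.period_range("1950-01", "2000-12", freq="M")
df = pd.DataFrame({
    "pct_chg_p": rng.normal(size=len(idx)),   # stand-in for %ChgP_t
    "fdd": rng.poisson(1.0, size=len(idx)),   # stand-in for FDD_t
    "month": idx.month,
})

# C(month) expands into 11 binary indicators: one of the 12 categories is
# dropped automatically, avoiding perfect multicollinearity with the intercept.
fit = smf.ols("pct_chg_p ~ fdd + C(month)", data=df).fit(
    cov_type="HAC", cov_kwds={"maxlags": 7})

# Joint test that all 11 monthly-indicator coefficients are zero.
names = [n for n in fit.params.index if n.startswith("C(month)")]
R = np.zeros((len(names), len(fit.params)))
for i, n in enumerate(names):
    R[i, list(fit.params.index).index(n)] = 1.0
res = fit.f_test(R)
print(len(names), res.fvalue, res.pvalue)
```

With the real data, the analogous F-statistic is the 1.01 (p = 0.43) reported in Table 15.1.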
Third, to assess the stability of the dynamic multipliers, we need to check whether the distributed lag coefficients have been stable over time. Because we do not have a specific break date in mind, we test for instability in the regression coefficients using the Quandt likelihood ratio (QLR) statistic (Key Concept 14.9).³ The QLR statistic (with 15% trimming and a HAC variance estimator), computed for the
---
³The discussion of stability uses material from Section 14.7 and can be skipped if that material has not been covered.
FIGURE 15.3  Estimated Cumulative Dynamic Multipliers from Different Sample Periods

The estimated cumulative dynamic multipliers, computed for the subsamples 1950-1966, 1967-1983, and 1984-2000, show that the effect of a freezing degree day was smaller and less persistent in 1984-2000 than earlier.
regression of column (1), with all coefficients interacted, has a value of 21.19 with q = 20 degrees of freedom (the coefficients on FDD_t, its 18 lags, and the intercept). The 1% critical value in Table 14.6 is 2.43, so the QLR statistic rejects at the 1% significance level. These QLR regressions have 40 regressors, a large number; recomputing them for six lags only (so there are 16 regressors and q = 8) also results in rejection at the 1% level. Thus, the hypothesis that the dynamic multipliers are stable is rejected at the 1% significance level.
One way to see how the dynamic multipliers have changed over time is to compute them for different parts of the sample. Figure 15.3 plots the estimated cumulative dynamic multipliers for the first third (1950-1966), middle third (1967-1983), and final third (1984-2000) of the sample, computed by running separate regressions on each subsample. These estimates show an interesting and noticeable pattern. In the 1950s and early 1960s, a freezing degree day had a large and persistent effect on the price. The magnitude of the effect on price of a freezing degree day diminished in the 1970s, although it remained highly persistent. In the late 1980s and 1990s, the short-run effect of a freezing degree day was the same as in the 1970s, but it became much less persistent and was essentially eliminated after a year. These estimates suggest that the dynamic causal effect on orange juice prices of a freezing degree day became smaller and less persistent over time.
15.7 Is Exogeneity Plausible? Some Examples
inflation of monetary policy. Because the main tool of monetary policy is the short-term interest rate (the "short rate"), this means they need to know the dynamic causal effect on inflation of a change in the short rate. Although the short rate is determined by the central bank, it is not set by the central bankers at random (as it would be in an ideal randomized experiment) but rather is set endogenously: The central bank determines the short rate based on an assessment of the current and future state of the economy, especially including the current and future rates of inflation. The rate of inflation in turn depends on the interest rate (higher interest rates reduce aggregate demand), but the interest
rate depends on the rate of inflation, its past value, and its (expected) future value. Thus the short rate is endogenous, and the causal dynamic effect of a change in the short rate on future inflation cannot be consistently estimated by an OLS regression of the rate of inflation on current and past interest rates.
rate of inflation against lagged changes in inflation and lags of the unemployment rate. Because lags of the unemployment rate happened in the past, one might initially think that there cannot be feedback from current rates of inflation to past values of the unemployment rate, so that past values of the unemployment rate can be treated as exogenous. But past values of the unemployment rate were not randomly assigned in an experiment; instead, the past unemployment rate was simultaneously determined with past values of inflation. Because inflation and the unemployment rate are simultaneously determined, the other factors that determine inflation contained in u_t are correlated with past values of the unemployment rate; that is, the unemployment rate is not exogenous. It follows that the unemployment rate is not strictly exogenous, so the dynamic multipliers computed using an empirical Phillips curve [for example, the ADL model in Equation (14.17)] are not consistent estimates of the dynamic causal effect on inflation of a change in the unemployment rate.
15.8 Conclusion
Time series data provide the opportunity to estimate the time path of the effect on Y of a change in X, that is, the dynamic causal effect on Y of a change in X. To estimate dynamic causal effects using a distributed lag regression, however, X must be exogenous, as it would be if it were set randomly in an ideal randomized experiment. If X is not just exogenous but is strictly exogenous, then the dynamic causal effects can be estimated using an autoregressive distributed lag model or by GLS.

In some applications, such as estimating the dynamic causal effect on the price of orange juice of freezing weather in Florida, a convincing case can be made that the regressor (freezing degree days) is exogenous; thus the dynamic causal effect can be estimated by OLS estimation of the distributed lag coefficients. Even in this application, however, economic theory suggests that the weather is not strictly exogenous, so the ADL or GLS methods are inappropriate. Moreover, in many relations of interest to econometricians there is simultaneous causality, so the regressors in these specifications are not exogenous.
Summary

1. Dynamic causal effects in time series are defined in the context of a randomized experiment, where the same subject (entity) receives different randomly assigned treatments at different times. The coefficients in a distributed lag regression of Y on X and its lags can be interpreted as the dynamic causal effects when the time path of X is determined randomly and independently of other factors that influence Y.
2. The variable X is (past and present) exogenous if the conditional mean of the error u_t in the distributed lag regression of Y on current and past values of X does not depend on current and past values of X. If in addition the conditional mean of u_t does not depend on future values of X, then X is strictly exogenous.
3. If X is exogenous, then the OLS estimators of the coefficients in a distributed lag regression of Y on current and past values of X are consistent estimators of the dynamic causal effects. In general, the error u_t in this regression is serially correlated, so conventional standard errors are misleading and HAC standard errors must be used instead.
4. If X is strictly exogenous, then the dynamic multipliers can be estimated by OLS estimation of an ADL model or by GLS.
5. Exogeneity is a strong assumption that often fails to hold in economic time series data because of simultaneous causality, and the assumption of strict exogeneity is even stronger.
Key Terms

dynamic causal effect (591)
distributed lag model (597)
exogeneity (599)
strict exogeneity (599)
dynamic multiplier (605)
impact effect (605)
cumulative dynamic multiplier (605)
long-run cumulative dynamic multiplier (605)
Review the Concepts

15.1 In the 1970s a common practice was to estimate a distributed lag model
15.2
15.3
15.4 … exogenous, but not strictly exogenous?
Exercises
15.1 Increases in oil prices have been blamed for several recessions in developed countries. To quantify the effect of oil prices on real economic activity, researchers have run regressions like those discussed in this chapter. Let GDP_t denote the value of quarterly gross domestic product in the United States and let Y_t = 100 × ln(GDP_t/GDP_{t−1}) be the quarterly percentage change in GDP. James Hamilton, an econometrician and macroeconomist, has suggested that oil prices adversely affect the economy only when they jump above their values in the recent past. Specifically, let O_t equal the greater of zero or the percentage point difference between oil prices at date t and their maximum value during the past year. A distributed lag regression relating Y_t and O_t, estimated over 1955:I–2000:IV, is
Ŷ_t = 0.005 O_t + ⋯ + 0.067 O_{t−8},
     (0.025)          (0.042)
… use eight … quarter?
d. The HAC F-statistic testing whether the coefficients on O_t and its lags are zero is 3.49. Are the coefficients significantly different from zero?
15.2
Ŷ_t = ⋯ − 0.014 O_{t−2} + 0.086 O_{t−3} − 0.000 O_{t−4} + ⋯ + 0.014 O_{t−8},
with HAC standard errors 0.034, 0.028, 0.047, 0.038, 0.169, 0.058, and 0.025.
a. Suppose that oil prices jump 25% above their previous peak value and stay at this new higher level (so that O_t = 25 and O_{t+1} = O_{t+2} = ⋯ = 0). What is the predicted change in interest rates for each quarter over the next two years?

b. Construct 95% confidence intervals for your answers to (a).
15.3 … experiment. To which experiment does the estimated distributed lag regression in Exercise 15.1 correspond?
15.4
Suppose that oil prices are strictly exogenous. Discuss how you could improve upon the estimates of the dynamic multipliers in Exercise 15.1.
15.5
Derive Equation (15.7) from Equation (15.4) and show that δ_0 = β_0, δ_1 = β_1, δ_2 = β_1 + β_2, δ_3 = β_1 + β_2 + β_3 (etc.). (Hint: Note that X_t = ΔX_t + ΔX_{t−1} + ⋯ + ΔX_{t−r+1} + X_{t−r}.)
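The algebra requested in Exercise 15.5 can be verified numerically. The sketch below uses an illustrative model with two lags (the coefficient values are invented) and checks that the distributed lag form and the cumulative-multiplier form produce identical fitted values:

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([0.5, 1.2, -0.4, 0.3])   # beta_0 (intercept), beta_1..beta_3 on X_t, X_{t-1}, X_{t-2}
T = 50
X = rng.standard_normal(T)

# distributed lag form: Y_t = b0 + b1 X_t + b2 X_{t-1} + b3 X_{t-2}
t = np.arange(2, T)
Y_dl = beta[0] + beta[1] * X[t] + beta[2] * X[t - 1] + beta[3] * X[t - 2]

# cumulative-multiplier form: Y_t = d0 + d1 dX_t + d2 dX_{t-1} + d3 X_{t-2},
# with d0 = b0, d1 = b1, d2 = b1 + b2, d3 = b1 + b2 + b3
delta = np.array([beta[0], beta[1], beta[1] + beta[2], beta[1] + beta[2] + beta[3]])
dX = np.diff(X, prepend=np.nan)          # dX[i] = X[i] - X[i-1]
Y_cm = delta[0] + delta[1] * dX[t] + delta[2] * dX[t - 1] + delta[3] * X[t - 2]

assert np.allclose(Y_dl, Y_cm)           # the two parameterizations are identical
```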
15.6 Consider the regression model Y_t = β_0 + β_1X_t + u_t, where u_t follows the stationary AR(1) model u_t = φ_1u_{t−1} + ū_t, with ū_t i.i.d. with mean 0 and variance σ_ū² and |φ_1| < 1; the regressor X_t follows the stationary AR(1) model X_t = γ_1X_{t−1} + e_t, with e_t i.i.d. with mean 0 and variance σ_e² and |γ_1| < 1; and e_t is independent of ū_s for all t and s.

a. Show that var(u_t) = σ_ū²/(1 − φ_1²) and var(X_t) = σ_e²/(1 − γ_1²).

b. Show that cov(u_t, u_{t−j}) = φ_1^j var(u_t) and cov(X_t, X_{t−j}) = γ_1^j var(X_t).

c. Show that corr(u_t, u_{t−j}) = φ_1^j and corr(X_t, X_{t−j}) = γ_1^j.
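The formulas in parts (a) through (c) of Exercise 15.6 can be checked by simulating an AR(1) process; the parameter values here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
phi, sigma = 0.6, 1.0
T = 200_000
e = rng.normal(0, sigma, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = phi * u[t - 1] + e[t]         # stationary AR(1)
u = u[1000:]                             # drop burn-in

# part (a): var(u_t) = sigma^2 / (1 - phi^2)
var_ratio = u.var() / (sigma**2 / (1 - phi**2))

# part (c): corr(u_t, u_{t-j}) = phi^j
corr = [np.corrcoef(u[j:], u[:-j])[0, 1] for j in (1, 2, 3)]
```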
Consider the regression model Y_t = β_0 + β_1X_t + u_t, where u_t follows the stationary AR(1) model u_t = φ_1u_{t−1} + ū_t, with ū_t i.i.d. with mean 0 and variance σ_ū² and |φ_1| < 1.

a. …
15.9 …

b. Derive the (infeasible) GLS estimator of β. [Hint: The GLS estimator of β is (1 − φ_1)^{−1} times the OLS estimator of α_0 in the quasi-differenced regression. Why?]

c. Show that β̂^GLS can be written as (1 − φ_1)^{−1}(T − 1)^{−1} Σ_{t=2}^{T} (Y_t − φ_1Y_{t−1}).
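The quasi-differencing idea behind these GLS exercises can be sketched numerically. In the simulation below, X_t is i.i.d. and independent of the errors, so it is strictly exogenous and GLS using the true φ_1 is valid; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
T, phi1 = 5000, 0.7
X = rng.standard_normal(T)               # strictly exogenous by construction
ubar = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = phi1 * u[t - 1] + ubar[t]     # AR(1) error
Y = 2.0 + 1.0 * X + u                    # beta0 = 2, beta1 = 1

# quasi-difference with the true phi1: the transformed error ubar_t is white noise
Yq = Y[1:] - phi1 * Y[:-1]
Xq = X[1:] - phi1 * X[:-1]
Z = np.column_stack([np.full(T - 1, 1 - phi1), Xq])
b0_gls, b1_gls = np.linalg.lstsq(Z, Yq, rcond=None)[0]
```

Because the quasi-differenced regression has a serially uncorrelated error, OLS applied to it is the (infeasible) GLS estimator.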
… Consider the distributed lag model Y_t = 3.1 + 0.4X_t + … + u_t.

a. …

b. Derive the first cumulative dynamic multiplier.
Empirical Exercises
15.1
In this exercise you will estimate the effect of oil prices on macroeconomic activity using monthly data on the Index of Industrial Production (IP) and the monthly measure of O_t described in Exercise 15.1. The data can be found on the textbook Web site, www.aw-bc.com/stock_watson, in the file USMacro_Monthly.

a. …

b. Compute …

c. Estimate a distributed lag regression …
d. Taken … different from …

e. Construct graphs like those in Figure 15.2 showing the estimated dynamic multipliers, cumulative multipliers, and 95% confidence intervals. Comment on the real-world size of the multipliers.

f. Suppose that high demand in the United States (evidenced by large values of ip_growth) leads to increases in oil prices. Is O_t exogenous? Are the estimated multipliers shown in the graphs in (e) reliable? Explain.

15.2
In the data file USMacro_Monthly, you will find data on two aggregate price series for the United States: the Consumer Price Index (CPI) and the Personal Consumption Expenditures Deflator (PCED). These series are alternative measures of consumer prices in the United States. The CPI prices a basket of goods whose composition is updated every 5–10 years. The PCED uses chain-weighting to price a basket of goods whose composition changes from month to month. Economists have argued that the CPI will overstate inflation because it does not take into account the substitution that occurs when relative prices change. If this substitution bias is important, then average CPI inflation should be systematically higher than PCED inflation. Let π_t^CPI = 1200 × ln[CPI(t)/CPI(t − 1)], π_t^PCED = 1200 × ln[PCED(t)/PCED(t − 1)], and Y_t = π_t^CPI − π_t^PCED, so that π_t^CPI is the monthly rate of price inflation (measured in percentage points at an annual rate) based on the CPI, π_t^PCED is the monthly rate of price inflation from the PCED, and Y_t is the difference. Using data from 1959:1 through 2004:12, carry out the following exercises.

a. Compute the sample means of π_t^CPI and π_t^PCED. Are these point estimates consistent with the presence of economically significant substitution bias?
APPENDIX 15.1
The Orange Juice Data Set

… group of the Producer Price Index (PPI), collected by the U.S. Bureau of Labor Statistics (BLS series wpu02420301). The orange juice price series was divided by the overall PPI for finished goods to adjust for general price inflation. The freezing degree days series was constructed from daily minimum temperatures recorded at Orlando-area airports, obtained from the U.S. Department of Commerce. The FDD series was constructed so that its timing and the timing of the orange juice price data were approximately aligned. Specifically, the frozen orange juice price data are collected by surveying a sample of producers in the middle of every month, although the exact date varies from month to month. Accordingly, the FDD series was constructed to be the number of freezing degree days from the 11th of one month to the 10th of the next month; that is, FDD is the maximum of zero and 32 minus the minimum daily temperature, summed over all days from the 11th to the 10th. Thus %ChgP_t for February is the percentage change in real orange juice prices from mid-January to mid-February, and FDD_t for February is the number of freezing degree days from January 11 to February 10.
APPENDIX 15.2
The ADL Model and Generalized Least Squares in Lag Operator Notation

This appendix expresses the ADL model and the GLS transformation in lag operator notation, where L^j X_t = X_{t−j} and ΔX_t = X_t − X_{t−1}. In this notation the distributed lag model is

Y_t = β_0 + β(L)X_t + u_t,  (15.40)

where β(L) = Σ_{j=0}^{r} β_j L^j. Suppose that the error term follows the autoregression

φ(L)u_t = ū_t,  (15.41)

where φ(L) = Σ_{j=0}^{p} φ_j L^j, φ_0 = 1, and ū_t is serially uncorrelated (note that φ_1, …, φ_p as defined here are the negatives of φ_1, …, φ_p in the notation of Equation (15.31)).

To derive the ADL model, premultiply each side of Equation (15.40) by φ(L), so that

φ(L)Y_t = α_0 + δ(L)X_t + ū_t,  (15.42)

where

α_0 = φ(1)β_0 = (Σ_{j=0}^{p} φ_j)β_0 and δ(L) = φ(L)β(L).  (15.43)

The dynamic multipliers are the coefficients of

β(L) = δ(L)/φ(L),  (15.44)

implied by φ(L)β(L) = δ(L). Thus the estimator of the dynamic multipliers based on the OLS estimators of the coefficients of the ADL model, δ̂(L) and φ̂(L), is

β̂^ADL(L) = δ̂(L)/φ̂(L).  (15.45)

The expressions for the coefficients in Equation (15.29) in the text are obtained as a special case of Equation (15.45) when r = 1 and p = 1.
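Equation (15.45) can be illustrated numerically: the dynamic multipliers are the power-series coefficients of δ(L)/φ(L), which can be computed by synthetic division of lag polynomials. The coefficient values below are invented for the illustration.

```python
import numpy as np

def poly_divide(delta, phi, n_terms):
    """First n_terms power-series coefficients of delta(L)/phi(L), with phi[0] = 1."""
    beta = np.zeros(n_terms)
    d = np.concatenate([delta, np.zeros(n_terms)])
    for j in range(n_terms):
        beta[j] = d[j]
        for k in range(1, min(len(phi), n_terms - j)):
            d[j + k] -= beta[j] * phi[k]     # synthetic (long) division step
    return beta

phi = np.array([1.0, -0.5])                  # phi(L) = 1 - 0.5 L
beta_true = 0.8 * 0.5 ** np.arange(8)        # geometrically decaying multipliers
delta = np.convolve(phi, beta_true)          # delta(L) = phi(L) beta(L), as in (15.43)
beta_rec = poly_divide(delta, phi, 8)        # recover beta(L) = delta(L)/phi(L)
```

Note how a low-order ADL model (here one AR coefficient and one distributed lag coefficient) implies an infinite, geometrically decaying multiplier path.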
As stressed in the discussion in the text, estimation of the dynamic multipliers by the ADL model or by GLS requires strict exogeneity; it is not enough for X to be (past and present) exogenous to use either of these estimation methods.

Suppose now that the lag polynomial β(L) can be written as a ratio of lag polynomials, θ_1(L)/θ_2(L), where θ_1(L) and θ_2(L) are both lag polynomials of a low degree. Then δ(L) = φ(L)β(L) in Equation (15.43) is δ(L) = φ(L)θ_1(L)/θ_2(L) = [φ(L)/θ_2(L)]θ_1(L). If it so happens that φ(L) = θ_2(L), then δ(L) = θ_1(L), and the number of coefficients to be estimated in the ADL model can be much less than r. Thus, under these assumptions, estimation of the ADL model entails estimating potentially many fewer parameters than the original distributed lag model. It is in this sense that the ADL model can achieve more parsimonious parameterizations (that is, use fewer unknown parameters) than the distributed lag model.

As developed here, the assumption that φ(L) and θ_2(L) happen to be the same seems like a coincidence that would not occur in an application; nevertheless, it is how the ADL model is able to capture a wide range of shapes of dynamic multipliers with only a few coefficients.
PART FIVE
The Econometric Theory of Regression Analysis

CHAPTER 17 The Theory of Linear Regression with One Regressor
CHAPTER 18 The Theory of Multiple Regression

CHAPTER 17
The Theory of Linear Regression with One Regressor
There are several reasons to learn econometric theory. Studying econometric theory turns your statistical software from a "black box" into a flexible toolkit from which you are able to select the right tool for the job at hand. Understanding econometric theory also clarifies what assumptions lie behind these tools. Perhaps most importantly, knowing econometric theory helps you recognize when a tool will not work well in an application and when you should look for a different econometric approach.

This chapter provides an introduction to the econometric theory of linear regression with a single regressor. This introduction is intended to supplement, not replace, the material in Chapters 4 and 5. The chapter develops the large-sample normal distributions of the OLS estimator and t-statistic under the first three least squares assumptions, and Section 17.4 derives their exact sampling distributions under the additional assumptions of homoskedasticity and normally distributed errors.
The final section of this chapter takes up an alternative to OLS, weighted least squares, which extends the theory of Chapters 4 and 5. Using weighted least squares requires a great deal of prior knowledge about the precise nature of the heteroskedasticity, that is, about the conditional variance of u given X; when such knowledge is available, weighted least squares is more efficient than OLS.

17.1 The Extended Least Squares Assumptions and Their Implications

The first three extended least squares assumptions are the three least squares assumptions of Key Concept 4.3: that the conditional mean of u_i given X_i is zero; that (X_i, Y_i), i = 1, …, n, are i.i.d. draws from their joint distribution; and that X_i and u_i have four moments. Under these three assumptions, the OLS estimator is unbiased, is consistent, and has an asymptotically normal sampling distribution. If these three assumptions hold, then the methods for inference introduced in Chapter 4, hypothesis testing using the t-statistic and construction of 95% confidence intervals as ±1.96
standard errors, are justified when the sample size is large. To develop a theory of efficient estimation using OLS or to characterize the exact sampling distribution of the OLS estimator, however, requires stronger assumptions.
Extended least squares assumption #4. The fourth extended least squares assumption is that u_i is homoskedastic; that is, var(u_i|X_i) = σ_ū², where σ_ū² is a constant. As seen in Section 5.5, if this additional assumption holds, then the OLS estimator is efficient among all linear estimators that are unbiased, conditional on X_1, …, X_n.
Extended least squares assumption #5. The fifth extended least squares assumption is that the conditional distribution of u_i given X_i is normal.
Under least squares assumptions #1 and #2 and the extended least squares assumptions #4 and #5, u_i is i.i.d. N(0, σ_ū²), and u_i and X_i are independently distributed. To see this, note that the fifth extended least squares assumption states that the conditional distribution of u_i|X_i is N(0, var(u_i|X_i)). By the fourth least squares assumption, however, var(u_i|X_i) = σ_ū², so the conditional distribution of u_i|X_i is N(0, σ_ū²). Because this conditional distribution does not depend on X_i, u_i and X_i are independently distributed. By the second least squares assumption, u_i is distributed independently of u_j for all j ≠ i. It follows that, under the extended least squares assumptions #1, #2, #4, and #5, u_i and X_i are independently distributed and u_i is i.i.d. N(0, σ_ū²).

It is shown in Section 17.4 that, if all five extended least squares assumptions hold, the OLS estimator has an exact normal sampling distribution and the homoskedasticity-only t-statistic has an exact Student t distribution.
The fourth and fifth extended least squares assumptions are much more restrictive than the first three. Although it might be reasonable to assume that the first three assumptions hold in an application, the final two assumptions are less realistic. Even though these final two assumptions might not hold in practice, they are of theoretical interest because, if one or both of them hold, then the OLS estimator has additional properties beyond those discussed in Chapters 4 and 5. Thus we can enhance our understanding of the OLS estimator, and more generally of the theory of estimation in the linear regression model, by exploring estimation under these stronger assumptions.

The five extended least squares assumptions for the single-regressor model are summarized in Key Concept 17.1.
KEY CONCEPT 17.1
The Extended Least Squares Assumptions for Regression with a Single Regressor

The linear regression model with a single regressor is

Y_i = β_0 + β_1X_i + u_i, i = 1, …, n.  (17.1)

The extended least squares assumptions are

1. E(u_i|X_i) = 0 (conditional mean zero);
2. (X_i, Y_i), i = 1, …, n, are independent and identically distributed (i.i.d.) draws from their joint distribution;
3. (X_i, u_i) have nonzero finite fourth moments;
4. var(u_i|X_i) = σ_ū² (homoskedasticity); and
5. the conditional distribution of u_i given X_i is normal (normal errors).

The OLS estimators of the slope and intercept are

β̂_1 = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^{n} (X_i − X̄)²,  (17.2)

β̂_0 = Ȳ − β̂_1X̄.  (17.3)
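Equations (17.2) and (17.3) translate directly into code. The sketch below evaluates them on simulated data (illustrative parameter values) and checks the result against numpy's own least squares fit:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
X = rng.standard_normal(n)
Y = 3.0 + 1.5 * X + rng.standard_normal(n)   # true beta0 = 3, beta1 = 1.5

beta1_hat = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()  # Eq. (17.2)
beta0_hat = Y.mean() - beta1_hat * X.mean()                                        # Eq. (17.3)
```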
17.2 Fundamentals
of Asymptotic Distribution Theory
Asymptotic distribution theory is the theory of the distribution of statistics (estimators, test statistics, and confidence intervals) when the sample size is large. Formally, this theory involves characterizing the behavior of the sampling distribution of a statistic along a sequence of ever-larger samples. The theory is asymptotic in the sense that it characterizes the behavior of the statistic in the limit as n → ∞. Even though sample sizes are, of course, never infinite, asymptotic distribution theory plays a central role in econometrics and statistics for two reasons. First, if the number of observations used in an empirical application is large, then the asymptotic limit can provide a high-quality approximation to the finite-sample distribution of the statistic, so statistical inference (tests using t-statistics and 95% confidence intervals calculated as ±1.96 standard errors) can be based on approximate sampling distributions derived from asymptotic theory. Second, these asymptotic approximations are typically far simpler to derive and to use than exact finite-sample distributions.

The two main tools of asymptotic distribution theory are the law of large numbers and the central limit theorem, both introduced in Section 2.6. We begin this section by continuing the discussion of the law of large numbers and the central limit theorem, including a proof of the law of large numbers. We then introduce two more tools, Slutsky's theorem and the continuous mapping theorem, that extend the usefulness of the law of large numbers and the central limit theorem. As an illustration, these tools are then used to prove that the t-statistic based on Ȳ testing the hypothesis E(Y) = μ_{Y,0} has a standard normal distribution under the null hypothesis.
Convergence in Probability
and the Law of Large Numbers
The concepts of convergence in probability and the law of large numbers were introduced in Section 2.6. Here we provide a precise mathematical definition of convergence in probability, followed by a statement and proof of the law of large numbers.

The sequence of random variables S_n converges in probability to the limit μ (that is, S_n →p μ) if and only if

Pr(|S_n − μ| ≥ δ) → 0  (17.4)

as n → ∞ for every δ > 0. If S_n is an estimator of μ, then S_n →p μ says that S_n is a consistent estimator of μ. The law of large numbers says that, under certain conditions on Y_1, …, Y_n, the sample average Ȳ converges in probability to the population mean. Probability theorists have developed many versions of the law of large numbers, corresponding to various conditions on Y_1, …, Y_n. The version of the law of large numbers used in this book is that Y_1, …, Y_n are i.i.d. draws from
a distribution with finite variance. This law of large numbers (also stated in Key Concept 2.6) is

Ȳ →p μ_Y.  (17.5)

The idea of the law of large numbers can be seen in Figure 2.8: As the sample size increases, the sampling distribution of Ȳ concentrates around the population mean, μ_Y. One feature of the sampling distribution is that the variance of Ȳ decreases as the sample size increases; another feature is that the probability that Ȳ falls outside ±δ of μ_Y vanishes as n increases. These two features of the sampling distribution are in fact linked, and the proof of the law of large numbers exploits this link.
Proof of the law of large numbers. The link between the variance of Ȳ and the probability that Ȳ is within ±δ of μ_Y is provided by Chebychev's inequality, which is stated and proven in Appendix 17.2 [see Equation (17.42)]. Written in terms of Ȳ, Chebychev's inequality is

Pr(|Ȳ − μ_Y| ≥ δ) ≤ var(Ȳ)/δ²  (17.6)

for any positive constant δ. Because Y_1, …, Y_n are i.i.d. with variance σ_Y², var(Ȳ) = σ_Y²/n. Thus var(Ȳ)/δ² = σ_Y²/(δ²n) → 0, so it follows from Equation (17.6) that Pr(|Ȳ − μ_Y| ≥ δ) → 0 for every δ > 0, proving the law of large numbers.
Some examples. Consistency plays a central role in asymptotic distribution theory, so we present some examples of consistent and inconsistent estimators of the population mean, μ_Y. Suppose that Y_i, i = 1, …, n, are i.i.d. with variance σ_Y² that is positive and finite. Consider the following three estimators of μ_Y: (1) m_a = Y_1; (2) m_b = [(1 − a)/(1 − a^n)] Σ_{i=1}^{n} a^{i−1}Y_i, where 0 < a < 1; and (3) m_c = Ȳ + 1/n. Are these estimators consistent?

The first estimator, m_a, is just the first observation, so E(m_a) = E(Y_1) = μ_Y and m_a is unbiased. However, m_a is not consistent: Pr(|m_a − μ_Y| ≥ δ) = Pr(|Y_1 − μ_Y| ≥ δ), which must be positive for sufficiently small δ (because σ_Y² > 0), so Pr(|m_a − μ_Y| ≥ δ) does not tend to zero as n → ∞, and m_a is not consistent. This inconsistency should not be surprising: Because m_a uses the information in only one observation, its distribution cannot concentrate around μ_Y as the sample size increases.
The second estimator, m_b, is unbiased but not consistent. It is unbiased because

E(m_b) = [(1 − a)/(1 − a^n)] Σ_{i=1}^{n} a^{i−1}E(Y_i) = [(1 − a)/(1 − a^n)] × [(1 − a^n)/(1 − a)]μ_Y = μ_Y,

since Σ_{i=1}^{n} a^{i−1} = (1 − a^n)/(1 − a). The variance of m_b is

var(m_b) = [(1 − a)/(1 − a^n)]² Σ_{i=1}^{n} a^{2(i−1)}σ_Y² = σ_Y² × [(1 − a)/(1 − a^n)]² × (1 − a^{2n})/(1 − a²) = σ_Y² × (1 − a)(1 + a^n)/[(1 + a)(1 − a^n)],

which has the limit var(m_b) → σ_Y²(1 − a)/(1 + a) as n → ∞. Thus the variance of this estimator does not tend to zero, the distribution does not concentrate around μ_Y, and the estimator, although unbiased, is not consistent. This is perhaps surprising, because all the observations enter this estimator. But most of the observations receive very small weight (the weight of the i-th observation is proportional to a^{i−1}, which approaches zero as i becomes large), and for this reason there is an insufficient amount of cancellation of sampling errors for the estimator to be consistent.
The third estimator, m_c, is biased but consistent. Its bias is 1/n: E(m_c) = E(Ȳ + 1/n) = μ_Y + 1/n. But the bias tends to zero as the sample size increases, and m_c is consistent: Pr(|m_c − μ_Y| ≥ δ) = Pr(|Ȳ + 1/n − μ_Y| ≥ δ) = Pr(|(Ȳ − μ_Y) + 1/n| ≥ δ). Now |(Ȳ − μ_Y) + 1/n| ≤ |Ȳ − μ_Y| + 1/n, so if |(Ȳ − μ_Y) + 1/n| ≥ δ, it must be the case that |Ȳ − μ_Y| + 1/n ≥ δ; thus Pr(|(Ȳ − μ_Y) + 1/n| ≥ δ) ≤ Pr(|Ȳ − μ_Y| + 1/n ≥ δ). But Pr(|Ȳ − μ_Y| + 1/n ≥ δ) = Pr(|Ȳ − μ_Y| ≥ δ − 1/n) ≤ σ_Y²/[n(δ − 1/n)²] → 0, where the final inequality follows from Chebychev's inequality [Equation (17.6), with δ replaced by δ − 1/n, for n > 1/δ]. It follows that m_c is consistent. This example illustrates the general point that an estimator can be biased in finite samples but, if that bias vanishes as the sample size gets large, the estimator can still be consistent (Exercise 17.10).
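A small Monte Carlo (with illustrative values μ_Y = 5, σ_Y = 1, and a = 0.7) makes the contrast among the three estimators concrete: the sampling spread of m_c shrinks with n, while the spreads of m_a and m_b do not.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, a, reps = 5.0, 0.7, 2000

def estimates(n):
    Y = rng.normal(mu, 1.0, size=(reps, n))
    m_a = Y[:, 0]                                      # (1) first observation only
    w = ((1 - a) / (1 - a**n)) * a ** np.arange(n)     # (2) geometric weights
    m_b = Y @ w
    m_c = Y.mean(axis=1) + 1.0 / n                     # (3) biased but consistent
    return m_a, m_b, m_c

m_a, m_b, m_c = estimates(1000)
```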
Convergence in distribution. Let F_1, F_2, …, F_n, … be a sequence of cumulative distribution functions corresponding to a sequence of random variables S_1, S_2, …, S_n, …. For example, S_n might be the standardized sample average (Ȳ − μ_Y)/σ_Ȳ. Then the sequence of random variables S_n is said to converge in distribution to S (denoted S_n →d S) if the distribution functions {F_n} converge to F, the distribution of S. That is,

S_n →d S if and only if lim_{n→∞} F_n(t) = F(t),  (17.7)

where the limit holds at all points t at which the limiting distribution F is continuous. The distribution F is called the asymptotic distribution of S_n.

It is useful to contrast the concepts of convergence in probability (→p) and convergence in distribution (→d). If S_n →p μ, then S_n becomes close to μ with high probability as n increases. In contrast, if S_n →d S, then the distribution of S_n becomes close to the distribution of S as n increases.

The central limit theorem. We now restate the central limit theorem using the concept of convergence in distribution. The central limit theorem in Key Concept 2.7 states that if Y_1, …, Y_n are i.i.d. and 0 < σ_Y² < ∞, then the asymptotic distribution of (Ȳ − μ_Y)/σ_Ȳ is N(0, 1). Because σ_Ȳ = σ_Y/√n, (Ȳ − μ_Y)/σ_Ȳ = √n(Ȳ − μ_Y)/σ_Y. Thus the central limit theorem can be restated as √n(Ȳ − μ_Y)/σ_Y →d Z, where Z is a standard normal random variable. This means that the distribution of √n(Ȳ − μ_Y) converges to N(0, σ_Y²) as n → ∞. Conventional shorthand for this limit is

√n(Ȳ − μ_Y) →d N(0, σ_Y²).  (17.8)

That is, if Y_1, …, Y_n are i.i.d. and 0 < σ_Y² < ∞, then the distribution of √n(Ȳ − μ_Y) converges to a normal distribution with mean zero and variance σ_Y².
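The limit √n(Ȳ − μ_Y) →d N(0, σ_Y²) holds even when the Y_i themselves are far from normal. A quick simulation check with skewed exponential draws (all parameter choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 20_000
mu, sigma2 = 2.0, 4.0                       # exponential(scale=2): mean 2, variance 4
Y = rng.exponential(scale=2.0, size=(reps, n))
S = np.sqrt(n) * (Y.mean(axis=1) - mu)      # should be approximately N(0, 4)
```

Despite the strong skewness of each Y_i, the simulated S has mean near 0, variance near σ_Y² = 4, and roughly normal tail coverage.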
Extensions to time series data. The law of large numbers and central limit theorem stated in Section 2.6 apply to i.i.d. observations. As discussed in Chapter 14, the i.i.d. assumption is inappropriate for time series data, so these theorems need to be extended before they can be applied to time series observations. The extensions are technical in nature, in the sense that the conclusion is the same: versions of the law of large numbers and the central limit theorem apply to time series data, but the conditions under which they apply are different. This is discussed briefly in Section 16.4, but a mathematical treatment of asymptotic distribution theory for time series variables is beyond the scope of this book, and interested readers are referred to Hayashi (2000, Chapter 2).
Slutsky's Theorem
and the Continuous Mapping Theorem
Slutsky's theorem combines consistency and convergence in distribution. Suppose that a_n →p a, where a is a constant, and S_n →d S. Then

a_n + S_n →d a + S, a_nS_n →d aS, and, if a ≠ 0, S_n/a_n →d S/a.  (17.10)

A related result is the continuous mapping theorem: if g is a continuous function, then S_n →p a implies g(S_n) →p g(a), and S_n →d S implies g(S_n) →d g(S). For example, if S_n →d Z, where Z is a standard normal random variable, then S_n² →d Z², which has a chi-squared distribution with 1 degree of freedom.
Application to the
t-Statistic Based on the Sample Mean
We now use the central limit theorem, the law of large numbers, and Slutsky's theorem to prove that, under the null hypothesis, the t-statistic based on Ȳ has a standard normal distribution when Y_1, …, Y_n are i.i.d. and 0 < E(Y_i⁴) < ∞.

The t-statistic for testing the null hypothesis that E(Y_i) = μ_{Y,0}, based on the sample average Ȳ, is given in Equations (3.8) and (3.11) and can be written

t = (Ȳ − μ_{Y,0})/(s_Y/√n) = [√n(Ȳ − μ_{Y,0})/σ_Y]/(s_Y/σ_Y),  (17.11)

where the second equality uses the trick of dividing both the numerator and the denominator by σ_Y.
17.3 Asymptotic Distribution of the OLS Estimator and t-Statistic

Consistency and Asymptotic Normality of the OLS Estimators

Under the first three least squares assumptions, the OLS estimator has an asymptotically normal sampling distribution:

√n(β̂_1 − β_1) →d N(0, var(v_i)/[var(X_i)]²),  (17.12)

where v_i = (X_i − μ_X)u_i. The proof of this result was sketched in Appendix 4.3, but that proof omitted some details and involved an approximation that was not formally shown. The missing steps in that proof are left as Exercise 17.3.

An implication of Equation (17.12) is that β̂_1 is consistent (Exercise 17.4).
Consistency of Heteroskedasticity-Robust Standard Errors

Under the first three least squares assumptions, the heteroskedasticity-robust standard error for β̂_1 forms the basis for valid statistical inference. Specifically,

σ̂²_β̂₁/σ²_β̂₁ →p 1,  (17.13)

where σ̂²_β̂₁ is the heteroskedasticity-robust variance estimator defined in Equation (5.4); that is,

σ̂²_β̂₁ = (1/n) × [(1/(n − 2)) Σ_{i=1}^{n} (X_i − X̄)²û_i²] / [(1/n) Σ_{i=1}^{n} (X_i − X̄)²]².  (17.14)

To show the result in Equation (17.13), first use the definitions of σ̂²_β̂₁ and σ²_β̂₁ to rewrite the ratio in Equation (17.13) as

σ̂²_β̂₁/σ²_β̂₁ = [(1/(n − 2)) Σ_{i=1}^{n} (X_i − X̄)²û_i² / var(v_i)] × [var(X_i) / ((1/n) Σ_{i=1}^{n} (X_i − X̄)²)]².  (17.15)

Because (1/n)Σ_{i=1}^{n} (X_i − X̄)² →p var(X_i) by the law of large numbers, the second term converges in probability to 1, and the result follows if (1/(n − 2))Σ_{i=1}^{n} (X_i − X̄)²û_i² →p var(v_i). This is shown in two steps: first, that (1/n)Σ_{i=1}^{n} v_i² obeys the law of large numbers; second, that replacing v_i² by (X_i − X̄)²û_i² does not change the limit.
To simplify the argument, suppose that X_i and u_i have eight moments, that is, E(X_i⁸) < ∞ and E(u_i⁸) < ∞, a stronger assumption than the four moments required by the third least squares assumption. For the first step, we must show that (1/n)Σ_{i=1}^{n} v_i² obeys the law of large numbers in Equation (17.5). To do so, v_i² must be i.i.d. (which it is, by the second least squares assumption) and var(v_i²) must be finite. To show that var(v_i²) < ∞, apply the Cauchy-Schwarz inequality (Appendix 17.2): var(v_i²) ≤ E(v_i⁴) = E[(X_i − μ_X)⁴u_i⁴] ≤ {E[(X_i − μ_X)⁸]}^{1/2}{E(u_i⁸)}^{1/2}. Thus, if X_i and u_i have eight moments, then v_i² has a finite variance and thus satisfies the law of large numbers in Equation (17.5).

The second step is to prove that (1/(n − 2))Σ_{i=1}^{n} (X_i − X̄)²û_i² − (1/n)Σ_{i=1}^{n} v_i² →p 0. Because v_i = (X_i − μ_X)u_i, this second step is the same as showing that

(1/(n − 2))Σ_{i=1}^{n} (X_i − X̄)²û_i² − (1/n)Σ_{i=1}^{n} (X_i − μ_X)²u_i² →p 0.  (17.16)

Showing this result entails setting û_i = u_i − (β̂_0 − β_0) − (β̂_1 − β_1)X_i, expanding the first term in Equation (17.16), repeatedly applying the Cauchy-Schwarz inequality, and using the consistency of β̂_0 and β̂_1. The details of the algebra are left as Exercise 17.9.
The preceding argument supposes that X_i and u_i have eight moments. This is not necessary, however, and the result (1/(n − 2))Σ_{i=1}^{n} (X_i − X̄)²û_i² →p var(v_i) can be proven under the weaker assumption that X_i and u_i have four moments, as stated in the third least squares assumption. That proof, however, is beyond the scope of this textbook; see Hayashi (2000, Section 2.5) for details.
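The convergence in Equation (17.13) can be checked by simulation: computing the estimator exactly as written in Equation (17.14) on simulated heteroskedastic data yields a value close to the true asymptotic variance. The design below is illustrative, chosen so that var(v_i) = 2 and var(X_i) = 1 by construction.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
X = rng.standard_normal(n)
u = rng.standard_normal(n) * np.sqrt(0.5 + 0.5 * X**2)   # var(u|X) = 0.5 + 0.5 X^2
Y = 1.0 + 2.0 * X + u

Xd = X - X.mean()
beta1 = (Xd * (Y - Y.mean())).sum() / (Xd**2).sum()
beta0 = Y.mean() - beta1 * X.mean()
uhat = Y - beta0 - beta1 * X

# Equation (17.14), computed term by term
var_hat = (1 / n) * ((1 / (n - 2)) * (Xd**2 * uhat**2).sum()) / ((1 / n) * (Xd**2).sum()) ** 2

# population counterpart: (1/n) var(v) / [var(X)]^2, and here
# var(v) = E[X^2 (0.5 + 0.5 X^2)] = 0.5 + 0.5 * 3 = 2 for standard normal X
```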
The t-statistic testing β_1 = β_{1,0}, constructed using the heteroskedasticity-robust standard error, can be written

t = (β̂_1 − β_{1,0})/SE(β̂_1) = [(β̂_1 − β_{1,0})/σ_β̂₁] × (σ_β̂₁/SE(β̂_1)).  (17.17)

It follows from Equation (17.12) that the term in brackets after the second equality in Equation (17.17) converges in distribution to a standard normal random variable. In addition, because the heteroskedasticity-robust standard error is consistent, σ_β̂₁/SE(β̂_1) →p 1, so by Slutsky's theorem the t-statistic converges in distribution to a standard normal random variable.

17.4 Exact Sampling Distributions When the Errors Are Normally Distributed

Under all five extended least squares assumptions, the OLS estimator β̂_1 has an exact normal sampling distribution, conditional on X_1, …, X_n, with conditional variance

σ²_β̂₁|X₁,…,Xₙ = σ_ū²/Σ_{i=1}^{n} (X_i − X̄)².  (17.18)

Proving this result, conditional on X_1, …, X_n, entails (i) establishing that the distribution is normal, (ii) showing that E(β̂_1|X_1, …, X_n) = β_1, and (iii) verifying Equation (17.18).
To show (i), note that, conditional on X_1, …, X_n, β̂_1 − β_1 is a weighted average of u_1, …, u_n:

β̂_1 − β_1 = Σ_{i=1}^{n} (X_i − X̄)u_i / Σ_{i=1}^{n} (X_i − X̄)².  (17.19)

[This equation was derived in Appendix 4.3, Equation (4.30), and is restated here for convenience.] By extended least squares assumptions #1, #2, #4, and #5, u_i is i.i.d. N(0, σ_ū²), and u_i and X_i are independently distributed. Because weighted averages of normally distributed variables are themselves normally distributed, it follows that β̂_1 is normally distributed, conditional on X_1, …, X_n.

To show (ii), take conditional expectations of both sides of Equation (17.19): E[(β̂_1 − β_1)|X_1, …, X_n] = E[Σ_{i=1}^{n} (X_i − X̄)u_i / Σ_{i=1}^{n} (X_i − X̄)² | X_1, …, X_n] = Σ_{i=1}^{n} (X_i − X̄)E(u_i|X_1, …, X_n) / Σ_{i=1}^{n} (X_i − X̄)² = 0, where the final equality follows because E(u_i|X_1, X_2, …, X_n) = E(u_i|X_i) = 0. Thus β̂_1 is conditionally unbiased; that is,

E(β̂_1|X_1, …, X_n) = β_1.  (17.20)

To show (iii), use the fact that the errors are independently distributed, conditional on X_1, …, X_n, to calculate the conditional variance of β̂_1 using Equation (17.19):

var(β̂_1|X_1, …, X_n) = Σ_{i=1}^{n} (X_i − X̄)²σ_ū² / [Σ_{i=1}^{n} (X_i − X̄)²]².  (17.21)
Canceling the common term in the numerator and denominator of the final expression in Equation (17.21) yields the formula for the conditional variance in Equation (17.18).
Distribution of the Homoskedasticity-Only t-Statistic

The homoskedasticity-only t-statistic testing the null hypothesis β_1 = β_{1,0} is

t = (β̂_1 − β_{1,0})/SE~(β̂_1),  (17.22)

where SE~(β̂_1) is the homoskedasticity-only standard error of β̂_1. The t-statistic can be rewritten as

t = (β̂_1 − β_{1,0})/√(s_ū²/Σ_{i=1}^{n} (X_i − X̄)²) = [(β̂_1 − β_{1,0})/σ_β̂₁|X] / √(W/(n − 2)),  (17.23)

where s_ū² = (1/(n − 2))Σ_{i=1}^{n} û_i², W = Σ_{i=1}^{n} û_i²/σ_ū², and σ_β̂₁|X = √(σ_ū²/Σ_{i=1}^{n} (X_i − X̄)²).

Under the null hypothesis, β̂_1 has a N(β_{1,0}, σ²_β̂₁|X) distribution, conditional on X_1, …, X_n, so the distribution of the numerator in the final expression in Equation (17.23) is N(0, 1). It is shown in Section 18.4 that W has a chi-squared distribution with n − 2 degrees of freedom and, moreover, that W is distributed independently of the standardized OLS estimator in the numerator of Equation (17.23). It follows from the definition of the Student t distribution (Appendix 17.1) that, under the five extended least squares assumptions, the homoskedasticity-only t-statistic has a Student t distribution with n − 2 degrees of freedom.
Where does the degrees of freedom adjustment fit in? The degrees of freedom adjustment in s_ū² ensures that s_ū² is an unbiased estimator of σ_ū² and that the t-statistic has a Student t distribution when the errors are normally distributed. Because W = Σ_{i=1}^{n} û_i²/σ_ū² is a chi-squared random variable with n − 2 degrees of freedom, its mean is E(W) = n − 2. Thus E[W/(n − 2)] = (n − 2)/(n − 2) = 1. Rearranging the definition of W, we have that E[(1/(n − 2))Σ_{i=1}^{n} û_i²] = σ_ū²; that is, the degrees of freedom correction makes s_ū² an unbiased estimator of σ_ū². Also, by dividing by n − 2 rather than n, the term in the denominator of the final expression in Equation (17.23) matches the definition of a random variable with a Student t distribution given in Appendix 17.1. That is why, when the degrees of freedom adjustment is used to calculate the standard error, the t-statistic has the Student t distribution when the errors are normally distributed.
17.5 Weighted Least Squares

WLS with Known Heteroskedasticity

Suppose that the conditional variance of u_i given X_i is known up to a factor of proportionality; that is,

var(u_i|X_i) = λh(X_i),  (17.24)

where λ is a constant and h is a known function. In this case, the WLS estimator is the estimator obtained by first dividing the dependent variable and regressor by
the square root of h, then regressing this modified dependent variable on the modified regressor using OLS. Specifically, divide both sides of the single-variable regression model by √h(X_i) to obtain

Ỹ_i = β_0X̃_{0i} + β_1X̃_{1i} + ũ_i,  (17.25)

where Ỹ_i = Y_i/√h(X_i), X̃_{0i} = 1/√h(X_i), X̃_{1i} = X_i/√h(X_i), and ũ_i = u_i/√h(X_i).

The WLS estimator is the OLS estimator of β_1 in Equation (17.25); that is, it is the estimator obtained by the OLS regression of Ỹ_i on X̃_{0i} and X̃_{1i}. The error term in Equation (17.25) is homoskedastic because

var(ũ_i|X_i) = var(u_i|X_i)/h(X_i) = λh(X_i)/h(X_i) = λ,  (17.26)

so the OLS estimator applied to the transformed regression is efficient among linear conditionally unbiased estimators.
WLS with Heteroskedasticity of Known Functional Form
If the heteroskedasticity has a known functional form, then the heteroskedasticity function h can be estimated, and the WLS estimator can be calculated using this estimated function.

Example #1: The variance depends on X_i². Suppose that the conditional variance is

var(u_i|X_i) = θ_0 + θ_1X_i²,  (17.27)

where θ_0 and θ_1 are unknown parameters.
Example #2: The variance depends on a third variable. WLS also can be used when the conditional variance depends on a third variable, W_i, which does not appear in the regression function. Specifically, suppose that data are collected on three variables, Y_i, X_i, and W_i, i = 1, …, n; the population regression function depends on X_i but not W_i; and the conditional variance depends on W_i but not X_i. That is, the population regression function is E(Y_i|X_i, W_i) = β_0 + β_1X_i, and the conditional variance is var(u_i|X_i, W_i) = λh(W_i), where λ is a constant and h is a function that must be estimated.

For example, suppose that a researcher is interested in modeling the relationship between the unemployment rate in a state (Y_i*) and a state economic policy variable (X_i). The measured unemployment rate (Y_i), however, is a survey-based estimate of the true unemployment rate (Y_i*). Thus Y_i measures Y_i* with error, where the source of the error is random survey error, so Y_i = Y_i* + v_i, where v_i is the measurement error arising from the survey. In this example it is plausible that the survey sample size, W_i, is not itself a determinant of the true state unemployment rate. Thus the population regression function does not depend on W_i; that is, E(Y_i*|X_i, W_i) = β_0 + β_1X_i. We therefore have the two equations

Y_i* = β_0 + β_1X_i + u_i*,  (17.28)

Y_i = Y_i* + v_i,  (17.29)

where Equation (17.28) models the relationship between the state economic policy variable and the true state unemployment rate and Equation (17.29) represents the relationship between the measured unemployment rate Y_i and the true unemployment rate Y_i*.
The model in Equations (l7.2t}) and (17.29) can lead to a popu lation regression in which the conditional variance of the error depends on W, hut not on Xi.
The error term
in Equation (17.28) represents other fac tors omjlted from thi::.
regression, while the error te rm V; in Equation (17.29) represents measurement
error arising from the unemployment rate survey. I( u, is homoskedastic, then
var(tt! I X ,, W,) = cr~. is constant. The survey error variance. however, depends
inversely on the survey sample size W;, that is, var(v1 [X;.W,) == a/W1, where a b a
constant. Because v1is random ~ urvey error. it is safely assumed to be uncorrelated
with u?, so var(u~ + v,IXI' W,) = C.T~. + a / W,. Thus. substjlUting Equation ( 17.28)
into Equation (17.29) lea<ic; to the regression model wnh heteroskedastJcity
ut
Y,
(17.30)
(17.31)
2. Estimate a model of the conditional variance function var(u_i | X_i). For example, if the conditional variance function has the form in Equation (17.27), this entails regressing û_i^2 on X_i^2. In general, this step entails estimating a function for the conditional variance, var(u_i | X_i).

3. Use the estimated function to compute predicted values of the conditional variance function.

4. Weight the dependent variable and regressors (including the intercept) by the inverse of the square root of the estimated conditional variance function.

5. Estimate the coefficients of the weighted regression by OLS; the resulting estimator is the feasible WLS estimator.
Heteroskedasticity-Robust
Standard Errors or WLS?
There are two ways to handle heteroskedasticity: estimating β0 and β1 by WLS, or estimating β0 and β1 by OLS and using heteroskedasticity-robust standard errors. Deciding which approach to use in practice requires weighing the advantages and disadvantages of each.

The advantage of WLS is that it is more efficient than the OLS estimator of the coefficients in the original regression, at least asymptotically. The disadvantage of WLS is that it requires knowing the conditional variance function and estimating its parameters. If the conditional variance function has the quadratic form in Equation (17.27), this is easily done. In practice, however, the functional form of the conditional variance function is rarely known. Moreover, if the functional form is incorrect, then the standard errors computed by WLS regression routines are invalid in the sense that they lead to incorrect statistical inferences (tests have the wrong size).
The advantage of using heteroskedasticity-robust standard errors is that they produce asymptotically valid inferences even if you do not know the form of the conditional variance function. An additional advantage is that heteroskedasticity-robust standard errors are readily computed as an option in modern regression packages, so that no additional effort is needed to safeguard against this threat. The disadvantage of heteroskedasticity-robust standard errors is that the OLS estimator will have a larger variance than the WLS estimator (based on the true conditional variance function), at least asymptotically.

In practice, the functional form of var(u_i | X_i) is rarely if ever known, which poses a problem for using WLS in real-world applications. This problem is
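As a sketch of what a regression package computes when you request robust standard errors, the following example implements the sandwich (Eicker-Huber-White) variance estimator directly. The function name and the n/(n − k) small-sample correction (often labeled "HC1") are our choices, not notation from the text:

```python
import numpy as np

def hc_standard_errors(X, y, dof_correct=True):
    """OLS coefficients with heteroskedasticity-robust (sandwich) SEs.

    Variance estimator: (X'X)^-1 X' diag(u_hat^2) X (X'X)^-1, optionally
    scaled by n/(n - k) (the common 'HC1' small-sample correction)."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ b
    xtx_inv = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u**2)[:, None])
    V = xtx_inv @ meat @ xtx_inv
    if dof_correct:
        V *= n / (n - k)
    return b, np.sqrt(np.diag(V))

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + x, size=n)   # heteroskedastic error
X = np.column_stack([np.ones(n), x])
b, se = hc_standard_errors(X, y)
```

These standard errors are valid under heteroskedasticity of unknown form, which is exactly the advantage described above.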
Summary
1. The asymptotic normality of the OLS estimator, combined with the consistency of heteroskedasticity-robust standard errors, implies that, if the first three least squares assumptions in Key Concept 17.1 hold, then the heteroskedasticity-robust t-statistic has an asymptotic standard normal distribution under the null hypothesis.

2. If the regression errors are i.i.d. and normally distributed, conditional on the regressors, then β̂1 has an exact normal sampling distribution, conditional on the regressors. In addition, the homoskedasticity-only t-statistic has an exact Student t_(n-2) sampling distribution under the null hypothesis.

3. The weighted least squares (WLS) estimator is OLS applied to a weighted regression, where all variables are weighted by the square root of the inverse of the conditional variance, var(u_i | X_i), or its estimate. Although the WLS estimator is asymptotically more efficient than OLS, to implement WLS you must know the functional form of the conditional variance function, which usually is a tall order.
Key Terms
convergence in probability (681)
consistent estimator (681)
convergence in distribution (684)
asymptotic distribution (684)
Slutsky's theorem (685)
continuous mapping theorem (685)
Exercises

17.1
a. Derive the least squares estimator of β1 for the restricted regression model Y_i = β1 X_i + u_i. This is called the restricted least squares (RLS) estimator of β1 because it is estimated under a restriction, which in this case is β0 = 0.

b. ... under assumptions 1 and 2 of Key Concept 17.1 ... conditionally unbiased [Equation (5.25)].
f. ... under assumptions 1-5 ...

g. Now consider the estimator β̃1 = Σ_{i=1}^n Y_i / Σ_{i=1}^n X_i. Derive an expression for var(β̃1 | X_1, ..., X_n) − var(β̂1^RLS | X_1, ..., X_n) under the Gauss-Markov conditions, and use this expression to show that var(β̃1 | X_1, ..., X_n) ≥ var(β̂1^RLS | X_1, ..., X_n).
17.2 Suppose that (X_i, Y_i) are i.i.d. with finite fourth moments. Prove that the sample covariance is a consistent estimator of the population covariance, that is, s_XY →p σ_XY, where s_XY is defined in Equation (3.24). (Hint: Use the strategy of Appendix 3.3 and the Cauchy-Schwarz inequality.)
17.3

a. Show that

√n(β̂1 − β1) = [(1/√n) Σ_{i=1}^n v_i] / [(1/n) Σ_{i=1}^n (X_i − X̄)^2] − [(X̄ − μ_X)(1/√n) Σ_{i=1}^n u_i] / [(1/n) Σ_{i=1}^n (X_i − X̄)^2],

where v_i = (X_i − μ_X)u_i.

b. Use the central limit theorem, the law of large numbers, and Slutsky's theorem to show that the final term in the equation converges in probability to zero.

c. Use the Cauchy-Schwarz inequality and the third least squares assumption in Key Concept 17.1 to ... obtain the ...
17.4
... results: ...

Suppose that s_Y^2/σ_Y^2 →p 1. Show that this implies that s_Y/σ_Y →p 1.
17.6 ...

17.7 Suppose that X_i and u_i, i = 1, ..., n, are i.i.d. ...

a. Show that the joint probability density function (p.d.f.) of (u_i, u_j, X_i, X_j) can be written as f(u_i, X_i) f(u_j, X_j) for i ≠ j, where f(u_i, X_i) is the joint p.d.f. of u_i and X_i.

b. Show that E(u_i u_j | X_i, X_j) = E(u_i | X_i) E(u_j | X_j) for i ≠ j.

c. Derive the exact sampling distribution of the OLS estimator β̂1, conditional on X_1, ..., X_n.

d. ...

17.8 ...

17.9 ... Derive ...
17.10 Let θ̂ be an estimator of the parameter θ, where θ̂ might be biased. Show that if E[(θ̂ − θ)^2] → 0 as n → ∞ (that is, the mean squared error of θ̂ tends to zero), then θ̂ →p θ. [Hint: Use Chebychev's inequality, Equation (17.42), with W = θ̂ − θ.]
APPENDIX 17.1

The Normal and Related Distributions and Moments of Continuous Random Variables

This appendix defines and discusses the normal and related distributions. The definitions of the chi-squared, F, and Student t distributions, given in Section 2.4, are restated here for convenient reference. We begin by presenting definitions of probabilities and moments involving continuous random variables.
The probability that a continuous random variable Y falls between a and b is

Pr(a ≤ Y ≤ b) = ∫_a^b f_Y(y) dy,  (17.32)

where f_Y is the probability density function (p.d.f.) of Y. Because Y must take on some value on the real line, Pr(−∞ ≤ Y ≤ ∞) = 1, which implies that ∫_{−∞}^{∞} f_Y(y) dy = 1.
Expected values and moments of continuous random variables, like those of discrete random variables, are probability-weighted averages of their values, except that summations [for example, the summation in Equation (2.3)] are replaced by integrals. Accordingly, the expected value of Y is

E(Y) = μ_Y = ∫ y f_Y(y) dy,  (17.33)

where the range of integration is the set of values for which f_Y is nonzero. The variance is the expected value of (Y − μ_Y)^2, and the rth moment of a random variable is the expected value of Y^r. Thus

var(Y) = E(Y − μ_Y)^2 = ∫ (y − μ_Y)^2 f_Y(y) dy  and  (17.34)

E(Y^r) = ∫ y^r f_Y(y) dy.  (17.35)
The normal distribution. The probability density function of a normally distributed random variable (the normal p.d.f.) is

f_Y(y) = [1/(σ√(2π))] exp[−(y − μ)^2 / (2σ^2)],  (17.36)

where exp(x) is the exponential function of x. The factor 1/(σ√(2π)) in Equation (17.36) ensures that Pr(−∞ ≤ Y ≤ ∞) = ∫_{−∞}^{∞} f_Y(y) dy = 1.

The mean of the normal distribution is μ, and its variance is σ^2. The normal distribution is symmetric, so all odd central moments of order three and greater are zero. The fourth central moment is 3σ^4. In general, if Y is distributed N(μ, σ^2), then the even central moments are given by

E(Y − μ)^(2k) = σ^(2k) (2k)! / (2^k k!).  (17.37)

When μ = 0 and σ^2 = 1, the normal distribution is called the standard normal distribution. The standard normal p.d.f. is denoted by φ, and the standard normal c.d.f. is denoted by Φ. Thus the standard normal density is φ(y) = (1/√(2π)) exp(−y^2/2).
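The even-central-moment formula in Equation (17.37) can be checked numerically; this small sketch (ours, not the text's) evaluates it for k = 1 and k = 2:

```python
from math import factorial

def even_central_moment(k, sigma):
    """E(Y - mu)^(2k) for Y ~ N(mu, sigma^2): sigma^(2k) (2k)! / (2^k k!)."""
    return sigma**(2 * k) * factorial(2 * k) / (2**k * factorial(k))

# k = 1 recovers the variance sigma^2; k = 2 gives the fourth
# central moment 3*sigma^4 stated in the text.
print(even_central_moment(1, 2.0), even_central_moment(2, 2.0))  # 4.0 48.0
```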
The bivariate normal distribution. The bivariate normal p.d.f. for the two random variables X and Y is

f_{X,Y}(x, y) = [1 / (2π σ_X σ_Y √(1 − ρ_XY^2))] exp{ −[((x − μ_X)/σ_X)^2 − 2ρ_XY ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)^2] / [2(1 − ρ_XY^2)] },  (17.38)

where ρ_XY is the correlation between X and Y. Suppose that ...
Related Distributions

The chi-squared distribution. Let Z_1, ..., Z_n be n i.i.d. standard normal random variables. Then

W = Z_1^2 + Z_2^2 + ... + Z_n^2  (17.39)

has a chi-squared distribution with n degrees of freedom.

The Student t distribution. Let Z have a standard normal distribution, let W have a chi-squared distribution with m degrees of freedom, and let Z and W be independently distributed. Then the random variable

t = Z / √(W/m)  (17.40)

has a Student t distribution with m degrees of freedom.

The F distribution. Let W_1 and W_2 be independent random variables with chi-squared distributions with respective degrees of freedom n_1 and n_2. Then the random variable

F = (W_1/n_1) / (W_2/n_2)  (17.41)

has an F distribution with (n_1, n_2) degrees of freedom. The F distribution depends on the numerator degrees of freedom n_1 and the denominator degrees of freedom n_2. In the limit n_2 → ∞, the F_{n_1,∞} distribution is the same as the chi-squared_{n_1} distribution, divided by n_1; that is, it is the same as the chi-squared_{n_1}/n_1 distribution.
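The three constructions above can be illustrated by simulation. This is our sketch, not the text's: it builds chi-squared, Student t, and F variables from standard normals exactly as Equations (17.39)-(17.41) prescribe.

```python
import numpy as np

rng = np.random.default_rng(0)
reps, m, n2 = 200_000, 3, 10

Z = rng.standard_normal((reps, m))
W = (Z**2).sum(axis=1)                 # chi-squared with m d.f., Eq. (17.39)
t = rng.standard_normal(reps) / np.sqrt(W / m)          # Student t_m, Eq. (17.40)
W2 = (rng.standard_normal((reps, n2))**2).sum(axis=1)   # independent chi-squared_n2
F = (W / m) / (W2 / n2)                # F(m, n2), Eq. (17.41)
```

A chi-squared_m variable has mean m, a t_m variable (m > 1) has mean 0, and an F(m, n2) variable has mean n2/(n2 − 2); the simulated means come out close to 3, 0, and 1.25 here.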
APPENDIX 17.2

Two Inequalities

Chebychev's Inequality

Chebychev's inequality uses the variance of the random variable W to bound the probability that W is farther than ±δ from its mean:

Pr(|W − μ_W| ≥ δ) ≤ var(W)/δ^2  (Chebychev's inequality).  (17.42)
To prove Equation (17.42), let f denote the p.d.f. of W. Then

E(W^2) = ∫_{−∞}^{∞} w^2 f(w) dw
       = ∫_{−∞}^{−δ} w^2 f(w) dw + ∫_{−δ}^{δ} w^2 f(w) dw + ∫_{δ}^{∞} w^2 f(w) dw
       ≥ ∫_{−∞}^{−δ} w^2 f(w) dw + ∫_{δ}^{∞} w^2 f(w) dw
       ≥ δ^2 Pr(|W| ≥ δ),  (17.43)

where the first equality is the definition of E(W^2), the second equality holds because the ranges of integration divide up the real line, the first inequality holds because the term that is dropped is nonnegative, and the second inequality holds because w^2 ≥ δ^2 over the ranges of integration and ∫_{−∞}^{−δ} f(w) dw + ∫_{δ}^{∞} f(w) dw = Pr(|W| ≥ δ). Substituting W − μ_W for W in the final expression, noting that E[(W − μ_W)^2] = var(W), and rearranging yields the inequality given in Equation (17.42). If W is discrete, this proof applies with summations replacing integrals.
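Because Chebychev's inequality holds for any distribution, it also holds exactly for the empirical distribution of a sample (using population-style moments, ddof = 0). The following check is our illustration, not part of the text:

```python
import numpy as np

def chebychev_holds(w, delta):
    """Check Pr(|W - mu| >= delta) <= var(W)/delta^2 on the empirical
    distribution of the sample w (population-style moments, ddof=0)."""
    mu, var = w.mean(), w.var()
    frac = np.mean(np.abs(w - mu) >= delta)
    return frac <= var / delta**2

rng = np.random.default_rng(0)
w = rng.exponential(size=10_000)       # a skewed, decidedly non-normal sample
print(all(chebychev_holds(w, d) for d in (0.5, 1.0, 2.0, 3.0)))   # True
```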
Cauchy-Schwarz Inequality

The Cauchy-Schwarz inequality is

[E(XY)]^2 ≤ E(X^2) E(Y^2)  (Cauchy-Schwarz inequality).  (17.44)

The proof of Equation (17.44) is similar to the proof of the correlation inequality in Appendix 2.1. Let W = Y + bX, where b is a constant; then E(W^2) = E(Y^2) + 2bE(XY) + b^2 E(X^2) ≥ 0. Now let b = −E(XY)/E(X^2), so that (after simplification) the expression becomes E(Y^2) − [E(XY)]^2/E(X^2) ≥ 0. Multiplying through by E(X^2) and rearranging yields [E(XY)]^2 ≤ E(X^2)E(Y^2), which is Equation (17.44).
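Equation (17.44) likewise holds exactly for sample moments, since the same argument applies to sums. A quick numerical illustration (ours, not the text's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

lhs = np.mean(x * y) ** 2              # [E(XY)]^2, empirical
rhs = np.mean(x**2) * np.mean(y**2)    # E(X^2) E(Y^2), empirical
print(lhs <= rhs)                      # True, by Equation (17.44)
```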
CHAPTER
16
Additional Topics
in Time Series Regression
This chapter takes up some further topics in time series regression, starting with forecasting. Chapter 14 considered forecasting a single variable. In practice, however, you might want to forecast two or more variables, such as the rate of inflation and the growth rate of GDP. Section 16.1 introduces a model for forecasting multiple variables, vector autoregressions (VARs), in which lagged values of two or more variables are used to forecast future values of those variables. Chapter 14 also focused on making forecasts one period (e.g., one quarter) into the future, but making forecasts two, three, or more periods into the future is important as well. Methods for making multiperiod forecasts are discussed in Section 16.2.
Sections 16.3 and 16.4 return to the topic of Section 14.6, stochastic trends. Section 16.3 introduces additional models of stochastic trends and an alternative test for a unit autoregressive root. Section 16.4 introduces the concept of cointegration, which arises when two variables share a common stochastic trend, that is, when each variable contains a stochastic trend, but a weighted difference of the two variables does not.

In some time series data, especially financial data, the variance changes over time: Sometimes the series exhibits high volatility, while at other times the volatility is low, so that the data exhibit clusters of volatility. Section 16.5 discusses volatility clustering and introduces models in which the variance of the forecast error changes over time, that is, models in which the forecast error is conditionally heteroskedastic.
These models can be used to estimate the width of a forecast interval and to quantify the uncertainty of returns on an asset, such as a stock, which in turn can be useful in assessing the risk of owning that asset.
16.1 Vector Autoregressions

VECTOR AUTOREGRESSIONS
A vector autoregression (VAR) is a set of k time series regressions, in which the regressors are lagged values of all k series. A VAR extends the univariate autoregression to a list, or "vector," of time series variables. When the number of lags in each of the equations is the same and is equal to p, the system of equations is called a VAR(p).

In the case of two time series variables, Y_t and X_t, the VAR(p) consists of the two equations

Y_t = β_10 + β_11 Y_{t−1} + ... + β_1p Y_{t−p} + γ_11 X_{t−1} + ... + γ_1p X_{t−p} + u_1t,  (16.1)

X_t = β_20 + β_21 Y_{t−1} + ... + β_2p Y_{t−p} + γ_21 X_{t−1} + ... + γ_2p X_{t−p} + u_2t,  (16.2)

where the β's and the γ's are unknown coefficients and u_1t and u_2t are error terms. The VAR assumptions are the time series regression assumptions of Key Concept 14.6, applied to each equation. The coefficients of a VAR are estimated by estimating each equation by OLS.
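Equation-by-equation OLS estimation of a VAR(p) is straightforward to implement. The following sketch (our code, with made-up simulated data, not an example from the text) estimates a two-variable VAR(1), the p = 1 case of Equations (16.1) and (16.2):

```python
import numpy as np

def fit_var(data, p):
    """Estimate a VAR(p) equation by equation with OLS.

    data: T x k array. Returns (B, resid), where row i of B holds the
    ith equation's coefficients [intercept, lag 1 of all series, ...,
    lag p of all series]."""
    T, k = data.shape
    Z = np.column_stack([np.ones(T - p)] +
                        [data[p - j:T - j] for j in range(1, p + 1)])
    Y = data[p:]
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return B.T, Y - Z @ B

# Simulated two-variable VAR(1) with made-up stationary coefficients.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])
T = 2000
y = np.zeros((T, 2))
for s in range(1, T):
    y[s] = A @ y[s - 1] + rng.normal(scale=0.1, size=2)

B, resid = fit_var(y, p=1)             # B[:, 1:] should be close to A
```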
For example, in the two-variable VAR(p) in Equations (16.1) and (16.2), you could ask whether the correct lag length is p or p − 1; that is, you could test whether the coefficients on Y_{t−p} and X_{t−p} are zero in these two equations. The null hypothesis that these coefficients are zero is

H_0: β_1p = 0, γ_1p = 0, β_2p = 0, γ_2p = 0.  (16.3)

The alternative hypothesis is that at least one of these four coefficients is nonzero. Thus the null hypothesis involves coefficients from both of the equations, two from each equation.

Because the estimated coefficients have a jointly normal distribution in large samples, it is possible to test restrictions on these coefficients by computing an F-statistic. The precise formula for this statistic is complicated because the notation must handle multiple equations, so we omit it. In practice, most modern software packages have automated procedures for testing hypotheses on coefficients in systems of multiple equations.
How many variables should be included in a VAR? The number of coefficients in each equation of a VAR is proportional to the number of variables in the VAR. For example, a VAR with five variables and four lags will have 21 coefficients (four lags each of five variables, plus the intercept) in each of the five equations, for a total of 105 coefficients! Estimating all these coefficients increases the amount of estimation error entering a forecast, which can result in a deterioration of the accuracy of the forecast.

The practical implication is that one needs to keep the number of variables in a VAR small and, especially, to make sure that the variables are plausibly related to each other so that they will be useful for forecasting each other. For example, we know from a combination of empirical evidence (such as that discussed in Chapter 14) and economic theory that the inflation rate, the unemployment rate, and the short-term interest rate are related to each other, suggesting that these variables could help to forecast each other in a VAR. Including an unrelated variable in a VAR, however, introduces estimation error without adding predictive content, thereby reducing forecast accuracy.
Determining lag lengths in VARs.¹ Lag lengths in a VAR can be determined using either F-tests or information criteria.

The information criterion for a system of equations extends the single-equation information criterion in Section 14.5. To define this information criterion we need to adopt matrix notation. Let Σ_u be the k × k covariance matrix of the VAR errors, and let Σ̂_u be the estimate of that covariance matrix, where the (i, j) element of Σ̂_u is (1/T) Σ_{t=1}^T û_it û_jt, where û_it is the OLS residual from the ith equation and û_jt is the OLS residual from the jth equation. The BIC for the VAR is

BIC(p) = ln[det(Σ̂_u)] + k(kp + 1) (ln T)/T,  (16.4)

where det(Σ̂_u) is the determinant of the matrix Σ̂_u. The AIC is computed using Equation (16.4), modified by replacing the term "ln T" by "2".

The expression for the BIC for the k equations in the VAR in Equation (16.4) extends the expression for a single equation given in Section 14.5. When there is a single equation, the first term simplifies to ln[SSR(p)/T]. The second term in Equation (16.4) is the penalty for adding additional regressors; k(kp + 1) is the total number of regression coefficients in the VAR (there are k equations, each of which has an intercept and p lags of each of the k time series variables).

Lag length estimation in a VAR using the BIC proceeds analogously to the single-equation case: Among a set of candidate values of p, the estimated lag length p̂ is the value of p that minimizes BIC(p).

¹This section uses matrices and may be skipped for less mathematical treatments.
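Equation (16.4) translates directly into code. This is a minimal sketch (ours, not the text's); `resid` is assumed to be the T × k matrix of OLS residuals from the k VAR equations:

```python
import numpy as np

def var_bic(resid, k, p, T):
    """BIC(p) for a VAR, Equation (16.4):
    ln det(Sigma_hat) + k (k p + 1) ln(T)/T,
    with Sigma_hat = resid' resid / T the error covariance estimate."""
    sigma_hat = resid.T @ resid / T
    _, logdet = np.linalg.slogdet(sigma_hat)
    return logdet + k * (k * p + 1) * np.log(T) / T

rng = np.random.default_rng(0)
T, p = 100, 2
resid = rng.normal(size=(T, 1))        # pretend single-equation residuals
bic = var_bic(resid, k=1, p=p, T=T)
# For k = 1 this reduces to ln(SSR(p)/T) + (p + 1) ln(T)/T, as in Section 14.5.
```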
Using VARs for causal analysis. The discussion so far has focused on using VARs for forecasting. Another use of VAR models is for analyzing causal relationships among economic time series variables; indeed, it was for this purpose that VARs were first introduced to economics by the econometrician and macroeconomist Christopher Sims (1980). The use of VARs for causal inference is known as structural VAR modeling, "structural" because in this application VARs are used to model the underlying structure of the economy. Structural VAR analysis uses the techniques introduced in this section in the context of forecasting, plus some additional tools. The biggest conceptual difference between using VARs for forecasting and using them for structural modeling, however, is that structural modeling requires very specific assumptions, derived from economic theory and institutional knowledge, of what is exogenous and what is not. The discussion of structural VARs is best undertaken in the context of estimation of systems of simultaneous equations, which goes beyond the scope of this book. For an introduction to using VARs for forecasting and policy analysis, see Stock and Watson (2001). For additional mathematical detail on structural VAR modeling, see Hamilton (1994) or Watson (1994).
The first equation of the VAR is the inflation equation:

ΔInf̂_t = 1.47 − ...  (16.5)

The R̄² is 0.44.
The second equation of the VAR is the unemployment equation, in which the regressors are the same as in the inflation equation but the dependent variable is the unemployment rate:

Unemp̂_t = 0.22 − 0.005ΔInf_{t−1} + ... + 1.52Unemp_{t−1} − 0.29Unemp_{t−2} + ...  (16.6)
--
m a(h a nee. O lt~.;n. howc\er, forecasters are called upon to m.1kc lnn:~.:.t~h lu1th~1
into the futurl.'. 1l1h section describes lwo methods Lor maldny mull 1p~oruJd
16. 2
Multiperiod Foreco~ts
643
onc-p.;riod-nhead model i~ iterated forward one paiod at ' lime, in a way that ilmade precise in this section. The second mcthotl 1s to make ''direct .. forecasts
by using a regre ' ion in which the dependent vuriJblc li. the multipcriod variable
tha t one wants to foreca~L For reasons discussc:d .11 the t!nd of thi~ section. in most
applications the ite rated me thod is recomme nded O\'cr the dtrect method.
ΔInf̂_t = 0.02 − 0.24ΔInf_{t−1}.  (16.7)
ΔInf̂_t = 0.02 + 0.26ΔInf_{t−1} − ...
         (0.12) (0.09)

The forecast of ΔInf_{2005:I} based on data through 2004:IV using this AR(4), computed in Chapter 14, is ΔInf̂_{2005:I|2004:IV} = 0.4. Thus the two-quarter-ahead iterated forecast based on the AR(4) is ΔInf̂_{2005:II|2004:IV} = 0.02 + 0.26ΔInf̂_{2005:I|2004:IV} + ... = 0.02 + 0.26 × 0.4 + ... = −1.1. According to this iterated AR(4) forecast, based on data through the fourth quarter of 2004, the rate of inflation is predicted to fall by 1.1 percentage points between the first and second quarters of 2005.
Iterated multivariate forecasts using an iterated VAR. Iterated multivariate forecasts can be computed using a VAR in much the same way as iterated univariate forecasts are computed using an autoregression. The main new feature of an iterated multivariate forecast is that the two-period-ahead (period T + 2) forecast of one variable depends on the forecasts of all variables in the VAR in period T + 1. For example, to compute the forecast of the change of inflation from period T + 1 to period T + 2 using a VAR with the variables ΔInf and Unemp, one must forecast both ΔInf_{T+1} and Unemp_{T+1}, using data through period T, as an intermediate step in forecasting ΔInf_{T+2}. More generally, to compute multiperiod iterated VAR forecasts h periods ahead, it is necessary to compute forecasts of all variables for all intervening periods between T and T + h.

As an example, we will compute the iterated VAR forecast of ΔInf_{2005:II} based on data through 2004:IV, using the VAR(4) for ΔInf_t and Unemp_t [Equations (16.5) and (16.6)]. The first step is to compute the one-quarter-ahead forecasts ΔInf̂_{2005:I|2004:IV} and Unemp̂_{2005:I|2004:IV} from that VAR. The forecast ΔInf̂_{2005:I|2004:IV} based on Equation (16.5) was computed in Chapter 14 and is 0.1 percentage point [Equation (14.18)]. A similar calculation using Equation (16.6) shows that Unemp̂_{2005:I|2004:IV} = 5.4%. In the second step, these forecasts are substituted into Equations (16.5) and (16.6) to produce the two-quarter-ahead forecast ΔInf̂_{2005:II|2004:IV}.
The iterated multiperiod AR forecast is computed in steps: First compute the one-period-ahead forecast, then use it to compute the two-period-ahead forecast, and so on. The two-period-ahead iterated forecast based on an AR(p) is

Ŷ_{T+2|T} = β̂0 + β̂1 Ŷ_{T+1|T} + β̂2 Y_T + ... + β̂p Y_{T−p+2},  (16.11)

where the β̂'s are the OLS estimates of the AR(p) coefficients. Continuing this process ("iterating") produces forecasts further into the future.
The iterated multiperiod VAR forecast is also computed in steps: First compute the one-period-ahead forecast of all the variables in the VAR, then use those forecasts to compute the two-period-ahead forecasts, and continue this process iteratively to the desired forecast horizon. The two-period-ahead iterated forecast of Y_{T+2}, based on the two-variable VAR(p) in Key Concept 16.1, is

Ŷ_{T+2|T} = β̂_10 + β̂_11 Ŷ_{T+1|T} + β̂_12 Y_T + ... + β̂_1p Y_{T−p+2} + γ̂_11 X̂_{T+1|T} + γ̂_12 X_T + ... + γ̂_1p X_{T−p+2},  (16.12)

where the coefficients in Equation (16.12) are the OLS estimates of the VAR coefficients. Iterating produces forecasts further into the future.
Thus the iterated VAR(4) forecast, based on data through the fourth quarter of 2004, is that inflation will decline by 1.1 percentage points between the first and second quarters of 2005.

Iterated multiperiod forecasts are summarized in Key Concept 16.2.
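The iteration in Key Concept 16.2 can be sketched as follows; the function and the illustrative coefficient values are ours, not the text's:

```python
def iterate_ar_forecast(beta, history, h):
    """Iterated h-period-ahead forecasts from an AR(p), as in Eq. (16.11).

    beta = [b0, b1, ..., bp]; history holds at least the p most recent
    observations, most recent last. Each forecast is fed back in as data."""
    b0, blags = beta[0], list(beta[1:])
    p = len(blags)
    buf = list(history[-p:])
    out = []
    for _ in range(h):
        yhat = b0 + sum(blags[j] * buf[-1 - j] for j in range(p))
        out.append(yhat)
        buf.append(yhat)
    return out

# AR(1) sanity check: the two-period-ahead iterated forecast must equal
# b0 + b1*(b0 + b1*y_T). The numbers here are purely illustrative.
b0, b1, yT = 0.02, 0.26, 0.4
f = iterate_ar_forecast([b0, b1], [yT], h=2)
```

For a VAR, the same loop is run over a vector of series, feeding the period T + 1 forecasts of all variables into the period T + 2 computation, as in Equation (16.12).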
ΔInf̂_t = ... 0.18ΔInf_{t−4} ... 0.10ΔInf_{t−5} ... + 1.89Unemp_{t−5},  (16.13)
             (0.14)              (0.07)               (0.91)

with standard errors in parentheses. Using data through 2004:IV, the direct forecast computed from Equation (16.13) is

ΔInf̂ = ... + 1.89Unemp_{2004:I} = −1.38.  (16.14)
The three-quarter-ahead direct forecast of ΔInf_{t+3} is computed by lagging all the regressors in Equation (16.13) by one additional quarter, estimating that regression, and then computing the forecast. The h-quarter-ahead direct forecast of ΔInf is computed by using ΔInf_t as the dependent variable and the regressors ΔInf_{t−h} and Unemp_{t−h}, plus additional lags of ΔInf_{t−h} and Unemp_{t−h} as desired.
The direct multiperiod forecast h periods into the future is computed by regressing Y_t directly on its own lagged values and those of the other predictors, each lagged h or more periods:

Y_t = δ_0 + δ_1 Y_{t−h} + ... + δ_p Y_{t−h−p+1} + δ_{p+1} X_{t−h} + ... + δ_{2p} X_{t−h−p+1} + u_t.  (16.15)

The forecast is computed directly, using the estimated coefficients of Equation (16.15).
jump in oil prices occurs in the next quarter. Today's two-period-ahead forecast of inflation will be too low because it does not incorporate this unexpected event. Because the oil price rise was also unknown in the previous quarter, the two-period-ahead forecast made last quarter will also be too low. Thus the surprise oil price jump next quarter means that both last quarter's and this quarter's two-period-ahead forecasts are too low. Because of such intervening events, the error term in a multiperiod regression is serially correlated.

As discussed in Section 15.4, if the error term is serially correlated, the usual OLS standard errors are incorrect or, more precisely, they are not a reliable basis for inference. Therefore heteroskedasticity- and autocorrelation-consistent (HAC) standard errors must be used with direct multiperiod regressions. The standard errors reported in Equation (16.13) for direct multiperiod regressions therefore are Newey-West HAC standard errors, where the truncation parameter m is set according to Equation (15.17); for these data (for which T = 92), Equation (15.17) yields m = 3. For longer forecast horizons, the amount of overlap, and thus the degree of serial correlation in the error, increases: In general, the first h − 1 autocorrelation coefficients of the errors in an h-period-ahead regression are nonzero. Thus larger values of m than indicated by Equation (15.17) are appropriate for multiperiod regressions with long forecast horizons.

Direct multiperiod forecasts are summarized in Key Concept 16.3.
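A minimal implementation of the Newey-West HAC covariance estimator with Bartlett weights, of the kind used for the standard errors in Equation (16.13), might look like the following. The code and simulated data are our sketch, not the text's; in practice u would be the residuals from the direct multiperiod regression:

```python
import numpy as np

def newey_west_variance(X, u, m):
    """HAC (Newey-West) covariance matrix for OLS coefficients.

    Uses Bartlett weights w_j = 1 - j/(m+1):
    S = G_0 + sum_{j=1..m} w_j (G_j + G_j'), G_j = (1/T) sum_t g_t g_{t-j}',
    with g_t = X_t u_t. Returns (X'X/T)^-1 S (X'X/T)^-1 / T."""
    T, _ = X.shape
    g = X * u[:, None]
    S = g.T @ g / T
    for j in range(1, m + 1):
        w = 1.0 - j / (m + 1.0)
        Gj = g[j:].T @ g[:-j] / T
        S += w * (Gj + Gj.T)
    Q_inv = np.linalg.inv(X.T @ X / T)
    return Q_inv @ S @ Q_inv / T

rng = np.random.default_rng(0)
T = 300
x = rng.normal(size=T)
e = rng.normal(size=T)
u = e + np.r_[0.0, 0.8 * e[:-1]]       # MA(1) errors: serially correlated
X = np.column_stack([np.ones(T), x])
se_hac = np.sqrt(np.diag(newey_west_variance(X, u, m=3)))
```

With m = 0 the formula collapses to the heteroskedasticity-only (White) sandwich estimator, which is a convenient correctness check.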
differently if they are estimated by a one-period-ahead regression (and then iterated) than by a multiperiod-ahead regression. Second, from a practical perspective, forecasters are usually interested in forecasts not just at a single horizon but at multiple horizons. Because they are produced using the same model, iterated forecasts tend to have time paths that are less erratic across horizons than do direct forecasts. Because a different model is used at every horizon for direct forecasts, sampling error in the estimated coefficients can add random fluctuations to the time paths of a sequence of direct multiperiod forecasts.

Under some circumstances, however, direct forecasts are preferable to iterated forecasts. One such circumstance is when you have reason to believe that the one-period-ahead model (the AR or VAR) is not specified correctly. For example, you might believe that the equation for the variable you are trying to forecast in a VAR is specified correctly, but that one or more of the other equations in the VAR is specified incorrectly, perhaps because of neglected nonlinear terms. If the one-step-ahead model is specified incorrectly, then in general the iterated multiperiod forecast will be biased and the MSFE of the iterated forecast can exceed the MSFE of the direct forecast, even though the direct forecast has a larger variance. A second circumstance in which a direct forecast might be desirable arises in multivariate forecasting models with many predictors, in which case a VAR specified in terms of all the variables could be unreliable because it would have very many estimated coefficients.
16.3 Orders of Integration and the DF-GLS Unit Root Test

Recall the random walk model of a stochastic trend,

Y_t = Y_{t−1} + u_t,  (16.16)

where u_t is serially uncorrelated.
Although the random walk model of a trend describes the long-run movements of many economic time series, some economic time series have trends that are smoother, that is, vary less from one period to the next, than is implied by Equation (16.16). A different model is needed to describe the trends of such series. One model of a smooth trend makes the first difference of the trend follow a random walk; that is,

ΔY_t = ΔY_{t−1} + u_t,  (16.17)

where u_t is serially uncorrelated. Thus, if Y_t follows Equation (16.17), ΔY_t follows a random walk, so ΔY_t − ΔY_{t−1} is stationary. The difference of the first differences, ΔY_t − ΔY_{t−1}, is called the second difference of Y_t and is denoted Δ²Y_t = ΔY_t − ΔY_{t−1}. In this terminology, if Y_t follows Equation (16.17), then its second difference is stationary. If a series has a trend of the form in Equation (16.17), then the first difference of the series has an autoregressive root that equals 1.

"Orders of integration" terminology. Some additional terminology is useful for distinguishing between these two models of trends. A series that has a random walk trend is said to be integrated of order one, or I(1). A series that has a trend of the form in Equation (16.17) is said to be integrated of order two, or I(2). A series that does not have a stochastic trend and is stationary is said to be integrated of order zero, or I(0).

The order of integration in the I(1) and I(2) terminology is the number of times that the series needs to be differenced for it to be stationary: If Y_t is I(1), then the first difference of Y_t, ΔY_t, is stationary; if Y_t is I(2), then the second difference of Y_t, Δ²Y_t, is stationary; and if Y_t is I(0), then Y_t is stationary.

Orders of integration are summarized in Key Concept 16.4.
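The differencing relationships above are easy to see numerically: cumulating white noise once gives an I(1) series, cumulating twice gives an I(2) series, and second-differencing the I(2) series recovers the original stationary noise. (Our illustration, not the text's.)

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=500)               # stationary (white noise) innovations

y_i1 = np.cumsum(u)                    # I(1): a random walk
y_i2 = np.cumsum(y_i1)                 # I(2): its first difference is a random walk

d1 = np.diff(y_i2)                     # first difference of the I(2) series
d2 = np.diff(y_i2, n=2)                # second difference: stationary again
# Here d2 equals u[2:] element by element, and d1 reproduces the random walk.
```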
Examples of I(2) and I(1) series: The price level and the rate of inflation. In Chapter 14, we concluded that the rate of inflation in the United States plausibly has a random walk stochastic trend, that is, that the rate of inflation is I(1). If inflation is I(1), then its stochastic trend is removed by first differencing, so ΔInf_t is I(0).
16.4

If Y_t is integrated of order one, that is, if Y_t is I(1), then Y_t has a unit autoregressive root and its first difference, ΔY_t, is stationary.

If Y_t is integrated of order two, that is, if Y_t is I(2), then ΔY_t has a unit autoregressive root and its second difference, Δ²Y_t, is stationary.

If Y_t is integrated of order zero, that is, if Y_t is I(0), then Y_t is stationary.
The DF-GLS test. The ADF test was the first test developed for testing the null hypothesis of a unit root and is the most commonly used test in practice. Other tests subsequently have been proposed, however, many of which have higher power (Key Concept 3.5) than the ADF test. A test with higher power than the ADF test is more likely to reject the null hypothesis of a unit root against the stationary alternative when the alternative is true; thus, a more powerful test is better able to distinguish between a unit AR root and a root that is large but less than 1.

This section discusses one such test, the DF-GLS test developed by Elliott, Rothenberg, and Stock (1996). The test is introduced for the case that, under the alternative hypothesis, Y_t is stationary around a linear time trend.
FIGURE 16.1 The Logarithm of the Price Level and the Inflation Rate in the United States, 1960-2004

(a) Logarithm of the price level    (b) Inflation

The trend in the logarithm of prices (Figure 16.1a) is much smoother than the trend in inflation (Figure 16.1b).
a * = 1 - 13.5/ T. The n V, is regressed against X 1, and X 21: that i-i, OL<; is us~..d to
estimate the coe ft ici~nts ot the population regression equation
(16.1~)
using the observatio ns t = 1, ... , T. where e, is the error te rm. Note tha t t h~:rt: t
no inte rcept in the regressio n in Equation ( L6. t 8). 1l1e OLS estimators B11 and 51
a re then used to compute a " detrended" version of Y,. Yf1 = Y1 - (S0 + B1r).
In the second s te p, the Dickey-Fuller test is use d to test for a unit a utorc~r~..s
sive root in Y~ . whe re the Dickey-Fuller regressio n does not include an imen:~o.pt
Yf
Yf_
1
1
or a time tre nd.That 1s, ~
is regressed a gains1
1and L\Y;_ 1, .... L\Y; I'' wh1..1
t he number o f lags p is determined , as usual. either by expert kno wlcdg~o. nr b)
us ing a daUJ-based me thod such as the A I C o r BIC as discussed in Section 14 c;
If the alternative hypothesis is that Y_t is stationary with a mean that might be nonzero but without a time trend, then the preceding steps are modified. Specifically, a* is computed using the formula a* = 1 - 7/T, X_2t is omitted from the regression in Equation (16.18), and the series Y_t^d is computed as Y_t^d = Y_t - δ̂_0.
The GLS regression in the first step of the DF-GLS test makes this test more complicated than the conventional ADF test, but it is also what improves its ability to discriminate between the null hypothesis of a unit autoregressive root and the alternative that Y_t is stationary. This improvement can be substantial. For example, suppose that Y_t is in fact a stationary AR(1) with autoregressive coefficient β_1 = 0.95, that there are T = 200 observations, and that the unit root tests are computed without a time trend [that is, t is excluded from the Dickey-Fuller regression, and X_2t is omitted from Equation (16.18)]. Then the probability that the ADF test correctly rejects the null hypothesis at the 5% significance level is approximately 31%, compared with 75% for the DF-GLS test.
Critical values for the DF-GLS test. Because the coefficients on the deterministic terms are estimated differently in the ADF and DF-GLS tests, the tests have different critical values. The critical values for the DF-GLS test are given in Table 16.1. If the DF-GLS test statistic (the t-statistic on Y_{t-1}^d in the regression in the second step) is less than the critical value, then the null hypothesis that Y_t has a unit
TABLE 16.1

Deterministic Regressors     10%      5%       1%
Intercept only              -1.62    -1.95    -2.57
root is rejected. Like the critical values for the Dickey-Fuller test, the appropriate critical value depends on which version of the test is used, that is, on whether or not a time trend is included [whether or not X_2t is included in Equation (16.18)].
Application to inflation. The DF-GLS statistic, computed for the rate of CPI inflation, Inf_t, over the period 1962:I to 2004:IV with an intercept but no time trend, is -2.06 when three lags of ΔY_t^d are included in the Dickey-Fuller regression in the second stage. This value is less than the 5% critical value in Table 16.1, -1.95, so using the DF-GLS test with three lags leads to rejecting the null hypothesis of a unit root at the 5% significance level. The choice of three lags was based on the AIC (out of a maximum of six lags).
Because the DF-GLS test is better able to discriminate between the unit root null hypothesis and the stationary alternative, one interpretation of this finding is that inflation is in fact stationary, but the Dickey-Fuller test implemented in Section 14.6 failed to detect this (at the 5% level). This conclusion, however, should be tempered by noting that whether the DF-GLS test rejects the null hypothesis is, in this application, sensitive to the choice of lag length. If the test is based on two lags, which is the number of lags chosen by BIC, it rejects the null hypothesis at the 10% but not the 5% level. The result is also sensitive to the choice of sample; if the statistic is instead computed over the period 1963:I to 2004:IV (that is, dropping just the first year), the test rejects the null hypothesis at the 10% level but not at the 5% level using AIC lag lengths. The overall picture therefore is rather ambiguous [as it is based on the ADF test, as discussed following Equation (14.34)] and requires the forecaster to make an informed judgment about whether it is better to model inflation as I(1) or stationary.
nonstationary. Under the null hypothesis that the regression contains a unit root, the regressor Y_{t-1} in the Dickey-Fuller regression (and the regressor Y_{t-1}^d in the modified Dickey-Fuller regression in the second step of the DF-GLS test) is nonstationary. The non-normal distribution of the unit root test statistic is a consequence of this nonstationarity.
To gain some mathematical insight into this nonnormality, consider the simplest possible Dickey-Fuller regression, in which ΔY_t is regressed against the single regressor Y_{t-1} and the intercept is excluded. In the notation of Key Concept 14.8, the OLS estimator in this regression is δ̂ = Σ_{t=1}^T Y_{t-1}ΔY_t / Σ_{t=1}^T Y_{t-1}², so that

T δ̂ = [(1/T) Σ_{t=1}^T Y_{t-1} ΔY_t] / [(1/T²) Σ_{t=1}^T Y_{t-1}²].   (16.19)
Under the additional assumption that Y_0 = 0, the numerator in Equation (16.19) can be written

(1/T) Σ_{t=1}^T Y_{t-1} ΔY_t = (1/2)(Y_T²/T) - (1/2)(1/T) Σ_{t=1}^T (ΔY_t)².   (16.20)

Under the null hypothesis, ΔY_t = u_t, which is serially uncorrelated and has a finite variance, so (1/T) Σ_{t=1}^T (ΔY_t)² converges in probability to σ_u² and the second term in Equation (16.20) converges in probability to σ_u²/2. Under the assumption that Y_0 = 0, the first term in Equation (16.20) can be written in terms of Y_T/√T = (1/√T) Σ_{t=1}^T u_t, which obeys the central limit theorem: Y_T/√T converges in distribution to N(0, σ_u²). Thus (Y_T/√T)² converges in distribution to σ_u²Z², where Z is a standard normal random variable.

Recall, however, that the square of a standard normal random variable has a chi-squared distribution with one degree of freedom. It therefore follows from Equation (16.20) that, under the null hypothesis, the numerator in Equation (16.19) has the limiting distribution

(1/T) Σ_{t=1}^T Y_{t-1} ΔY_t  →d  (σ_u²/2)(Z² - 1).   (16.21)
The large-sample distribution in Equation (16.21) is different from the usual large-sample normal distribution when the regressor is stationary. Instead, the numerator of the OLS estimator of the coefficient on Y_{t-1} in this Dickey-Fuller regression has a distribution that is proportional to a chi-squared distribution with one degree of freedom, minus 1.
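The limiting result in Equation (16.21) can be checked by simulation: across many replications of a driftless random walk with σ_u = 1 and Y_0 = 0, the numerator should be centered at zero but negative roughly 68% of the time, matching the skewed distribution of (1/2)(Z² - 1). The simulation design below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_rep = 500, 5000
u = rng.standard_normal((n_rep, T))
Y = np.cumsum(u, axis=1)                         # random walks, one per row
Ylag = np.concatenate([np.zeros((n_rep, 1)), Y[:, :-1]], axis=1)   # Y_0 = 0
num = (Ylag * u).sum(axis=1) / T                 # (1/T) sum Y_{t-1} dY_t

# Under chi-squared(1), P(Z^2 < 1) is about 0.683, so the numerator should be
# negative about 68% of the time even though its mean is zero.
frac_neg = (num < 0).mean()
print(round(num.mean(), 3), round(frac_neg, 3))
```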
This discussion has considered only the numerator of δ̂. The denominator also behaves unusually under the null hypothesis: Because Y_t follows a random walk under the null hypothesis, (1/T) Σ_{t=1}^T Y_{t-1}² does not converge in probability to a constant. Instead, the denominator in Equation (16.19), (1/T²) Σ_{t=1}^T Y_{t-1}², is a random variable even in large samples: Under the null hypothesis, it converges in distribution jointly with the numerator. The unusual distributions of the numerator and denominator in Equation (16.19) are the source of the nonstandard distribution of the Dickey-Fuller test statistic and the reason that the ADF statistic has its own special table of critical values.
16.4 Cointegration
Sometimes two or more series have the same stochastic trend in common. In this special case, referred to as cointegration, regression analysis can reveal long-run relationships among time series variables, but some new methods are needed.
FIGURE 16.2  One-Year Interest Rate, Three-Month Interest Rate, and Interest Rate Spread

One-year and three-month interest rates share a common stochastic trend. The spread, or the difference between the two rates, does not exhibit a trend. These two interest rates appear to be cointegrated.
Vector error correction model. Until now, we have eliminated the stochastic trend in an I(1) variable Y_t by computing its first difference, ΔY_t; the problems created by stochastic trends were then avoided by using ΔY_t instead of Y_t in time series regressions. If X_t and Y_t are cointegrated, however, another way to eliminate the trend is to compute Y_t - θX_t, where θ is chosen to eliminate the common trend from the difference. Because the term Y_t - θX_t is stationary, it too can be used in regression analysis.
In fact, if X_t and Y_t are cointegrated, the first differences of X_t and Y_t can be modeled using a VAR, augmented by including Y_{t-1} - θX_{t-1} as an additional regressor:

ΔY_t = β_10 + β_11 ΔY_{t-1} + ... + β_1p ΔY_{t-p} + γ_11 ΔX_{t-1} + ... + γ_1p ΔX_{t-p} + α_1(Y_{t-1} - θX_{t-1}) + u_1t   (16.22)
KEY CONCEPT 16.5
COINTEGRATION

Suppose X_t and Y_t are integrated of order one. If, for some coefficient θ, Y_t - θX_t is integrated of order zero, then X_t and Y_t are said to be cointegrated. The coefficient θ is called the cointegrating coefficient.

If X_t and Y_t are cointegrated, then they have the same, or common, stochastic trend. Computing the difference Y_t - θX_t eliminates this common stochastic trend.
ΔX_t = β_20 + β_21 ΔY_{t-1} + ... + β_2p ΔY_{t-p} + γ_21 ΔX_{t-1} + ... + γ_2p ΔX_{t-p} + α_2(Y_{t-1} - θX_{t-1}) + u_2t   (16.23)
The term Y_t - θX_t is called the error correction term. The combined model in Equations (16.22) and (16.23) is called a vector error correction model (VECM). In a VECM, past values of Y_t - θX_t help to predict future values of ΔY_t and/or ΔX_t.
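A hypothetical numerical illustration of the VECM: simulate a cointegrated pair with θ = 1 (X a random walk and Y equal to X plus a stationary error) and estimate the ΔY equation of the form of Equation (16.22) by OLS with one lag. The loading α_1 on the error correction term should be strongly negative, pulling Y back toward X; all numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500
X = np.cumsum(rng.standard_normal(T))
Y = X + rng.standard_normal(T)                 # error correction term is stationary

dY, dX = np.diff(Y), np.diff(X)
z = (Y - X)[1:-1]                              # z_{t-1}, aligned with dY_t below
R = np.column_stack([np.ones_like(z), dY[:-1], dX[:-1], z])
coef, *_ = np.linalg.lstsq(R, dY[1:], rcond=None)
alpha1 = coef[3]                               # loading on the error correction term
print(round(alpha1, 2))
```

In this design the population value of α_1 is -1, since the whole deviation of Y from X is removed each period.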
Second, visual inspection of the series helps to identify cases in which cointegration is plausible. For example, the graph of the two interest rates in Figure 16.2 shows that each of the series appears to be I(1) but that the spread appears to be I(0), so that the two series appear to be cointegrated.
Third, the unit root testing procedures introduced so far can be extended to test for cointegration. The insight on which these tests are based is that if Y_t and X_t are cointegrated with cointegrating coefficient θ, then Y_t - θX_t is stationary; otherwise, Y_t - θX_t is nonstationary [is I(1)]. The hypothesis that Y_t and X_t are not cointegrated [that is, that Y_t - θX_t is I(1)] therefore can be tested by testing the null hypothesis that Y_t - θX_t has a unit root; if this hypothesis is rejected, then Y_t and X_t can be modeled as cointegrated. The details of this test depend on whether the cointegrating coefficient θ is known.
Testing for cointegration when θ is known. In some cases expert knowledge or economic theory suggests values of θ. When θ is known, the Dickey-Fuller and DF-GLS unit root tests can be used to test for cointegration by first constructing the series z_t = Y_t - θX_t, then testing the null hypothesis that z_t has a unit autoregressive root.
Testing for cointegration when θ is unknown. If the cointegrating coefficient θ is unknown, then it must be estimated prior to testing for a unit root in the error correction term. This preliminary step makes it necessary to use different critical values for the subsequent unit root test.
Specifically, in the first step the cointegrating coefficient θ is estimated by OLS estimation of the regression

Y_t = α + θX_t + z_t.   (16.24)

In the second step, a Dickey-Fuller t-test (with an intercept but no time trend) is used to test for a unit root in the residual from this regression, ẑ_t. This two-step procedure is called the Engle-Granger Augmented Dickey-Fuller test for cointegration, or EG-ADF test (Engle and Granger, 1987).
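The two EG-ADF steps can be sketched on simulated data; the data-generating process and the true cointegrating coefficient of 2.0 below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
X = np.cumsum(rng.standard_normal(T))
Y = 0.5 + 2.0 * X + rng.standard_normal(T)      # true cointegrating coefficient: 2.0

# Step 1: OLS of Y on X, as in Equation (16.24); save the residual z_hat.
A = np.column_stack([np.ones(T), X])
(alpha, theta), *_ = np.linalg.lstsq(A, Y, rcond=None)
z = Y - (alpha + theta * X)

# Step 2: Dickey-Fuller regression dz_t = mu + gamma * z_{t-1} + error; the
# t-statistic on gamma is compared with the EG-ADF critical values in Table 16.2.
dz = np.diff(z)
B = np.column_stack([np.ones(T - 1), z[:-1]])
g, *_ = np.linalg.lstsq(B, dz, rcond=None)
res = dz - B @ g
s2 = res @ res / (T - 1 - 2)
se = np.sqrt(s2 * np.linalg.inv(B.T @ B)[1, 1])
eg_adf_t = g[1] / se
print(round(theta, 2), round(eg_adf_t, 2))
```

Because the residual here is genuinely stationary, the t-statistic comes out far below the critical values and the no-cointegration null is rejected.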
Critical values of the EG-ADF statistic are given in Table 16.2. The critical values in the first row apply when there is a single regressor in Equation (16.24), so that there are two cointegrated variables (X_t and Y_t). The subsequent rows apply to the case of multiple cointegrated variables, which is discussed at the end of this section.

The critical values in Table 16.2 are taken from Fuller (1976) and Phillips and Ouliaris (1990). Following a suggestion by Hansen (1992), the critical values in Table 16.2 are chosen so that they apply whether or not X_t and Y_t have drift components.
TABLE 16.2

Number of X's in Equation (16.24)     10%       5%       1%
1                                    -3.12     -3.41    -3.96
4                                    -4.20     -4.49
The dynamic OLS (DOLS) estimator of θ is based on the modified regression

Y_t = β_0 + θX_t + Σ_{j=-p}^{p} δ_j ΔX_{t-j} + u_t.   (16.25)

Thus, in Equation (16.25), the regressors are X_t, ΔX_{t+p}, ..., ΔX_{t-p}. The DOLS estimator of θ is the OLS estimator of θ in the regression of Equation (16.25).
If X_t and Y_t are cointegrated, then the DOLS estimator is efficient in large samples. Moreover, statistical inferences about θ and the δ's in Equation (16.25) based on HAC standard errors are valid. For example, the t-statistic constructed using the DOLS estimator with HAC standard errors has a standard normal distribution in large samples.
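A sketch of the DOLS regression in Equation (16.25) with p = 2 leads and lags, on simulated data with true θ = 1; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 600
dX = rng.standard_normal(T)
X = np.cumsum(dX)                               # I(1) regressor; dX_t is its first difference
Y = 1.0 * X + rng.standard_normal(T)            # true theta = 1

p = 2
rows = range(p, T - p)                          # keep t with all leads/lags available
lead_lag = np.column_stack([[dX[t + j] for t in rows] for j in range(-p, p + 1)])
Z = np.column_stack([np.ones(len(lead_lag)), [X[t] for t in rows], lead_lag])
b, *_ = np.linalg.lstsq(Z, [Y[t] for t in rows], rcond=None)
theta_dols = b[1]                               # coefficient on the level of X
print(round(theta_dols, 3))
```

The coefficient on the level of X_t is the DOLS estimate of the cointegrating coefficient; with cointegration it converges to the true value very quickly as T grows.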
One way to interpret Equation (16.25) is to recall from Section 15.3 that cumulative dynamic multipliers can be computed by modifying the distributed lag regression of Y_t on X_t and its lags. Specifically, in Equation (15.7), the cumulative dynamic multipliers were computed by regressing Y_t on ΔX_t, lags of ΔX_t, and X_{t-r}; the coefficient on X_{t-r} in that specification is the long-run cumulative dynamic multiplier. Similarly, if X_t were strictly exogenous, then in Equation (16.25) the coefficient on X_t, θ, would be the long-run cumulative multiplier, that is, the
long-run effect of X on Y. If X_t is not strictly exogenous, then θ does not have this interpretation. Nevertheless, because X_t and Y_t have a common stochastic trend if they are cointegrated, the DOLS estimator is consistent even if X_t is endogenous.

The DOLS estimator is not the only efficient estimator of the cointegrating coefficient. The first such estimator was developed by Søren Johansen (Johansen, 1988). For a discussion of Johansen's method and of other ways to estimate the cointegrating coefficient, see Hamilton (1994).
Even if economic theory does not suggest a specific value of the cointegrating coefficient, it is important to check whether the estimated cointegrating relationship makes sense in practice. Because cointegration tests can be misleading (they can improperly reject the null hypothesis of no cointegration more frequently than they should, and frequently they improperly fail to reject the null), it is especially important to rely on economic theory, institutional knowledge, and common sense when estimating and using cointegrating relationships.
With more than two variables there can be multiple cointegrating relationships. For example, one cointegrating relationship suggested by the theory is R1yr_t - R90_t, and a second relationship is R5yr_t - R90_t. (The relationship R5yr_t - R1yr_t is also a cointegrating relationship, but it contains no additional information beyond that in the other relationships because it is perfectly multicollinear with the other two cointegrating relationships.)
The EG-ADF procedure for testing for a single cointegrating relationship among multiple variables is the same as for the case of two variables, except that the regression in Equation (16.24) is modified so that both X_1t and X_2t are regressors; the critical values for the EG-ADF test are given in Table 16.2, where the appropriate row depends on the number of regressors in the first-stage OLS cointegrating regression. The DOLS estimator of a single cointegrating relationship among multiple X's involves including the level of each X along with leads and lags of the first difference of each X. Tests for multiple cointegrating relationships can be performed using system methods, such as Johansen's (1988) method.
TABLE 16.3  Unit Root and Cointegration Test Statistics for Two Interest Rates

Series                 ADF Statistic    DF-GLS Statistic
R90                       -2.96             -1.88
R1yr                      -2.22             -1.37
R1yr - R90                -6.31*            -5.59*
R1yr - 1.046R90           -6.97*

R90 is the interest rate on 90-day U.S. Treasury bills, at an annual rate, and R1yr is the interest rate on one-year U.S. Treasury bonds. Regressions were estimated using quarterly data over the period 1962:I-1999:IV. The number of lags in the unit root test statistic regressions was chosen by AIC (six lags maximum). *The unit root test statistic is significant at the 1% significance level.
The unit root statistics for the spread, R1yr_t - R90_t, test the further hypothesis that these variables are not cointegrated against the alternative that they are. The null hypothesis that the spread contains a unit root is rejected at the 1% level using both unit root tests. Thus we reject the hypothesis that the series are not cointegrated against the alternative that they are, with a cointegrating coefficient θ = 1. Taken together, the evidence in the first three rows of Table 16.3 suggests that these variables plausibly can be modeled as cointegrated with θ = 1.
Because in this application economic theory suggests a value for θ (the expectations theory of the term structure suggests that θ = 1) and because the error correction term is I(0) when this value is imposed (the spread is stationary), in principle it is not necessary to use the EG-ADF test, in which θ is estimated. Nevertheless, we compute the test as an illustration. The first step in the EG-ADF test is to estimate θ by the OLS regression of one variable on the other; the result is
R1yr_t = 0.361 + 1.046 R90_t,   R̄² = 0.973.   (16.26)
The second step is to compute the ADF statistic for the residual from this regression, ẑ_t. The result, given in the final row of Table 16.3, is less than the 1% critical value of -3.96 in Table 16.2, so the null hypothesis that ẑ_t has a unit autoregressive root is rejected. This statistic also points toward treating the two interest rates as cointegrated. Note that no standard errors are presented in Equation (16.26) because, as previously discussed, the OLS estimator of the cointegrating coefficient has a non-normal distribution and its t-statistic is not normally distributed, so presenting standard errors (HAC or otherwise) would be misleading.
A vector error correction model of the two interest rates. If Y_t and X_t are cointegrated, then forecasts of ΔY_t and ΔX_t can be improved by augmenting a VAR of ΔY_t and ΔX_t with the lagged value of the error correction term, that is, by computing forecasts using the VECM in Equations (16.22) and (16.23). If θ is known, then the unknown coefficients of the VECM can be estimated by OLS, including z_{t-1} = Y_{t-1} - θX_{t-1} as an additional regressor. If θ is unknown, then the VECM can be estimated using ẑ_{t-1} as a regressor, where ẑ_t = Y_t - θ̂X_t and θ̂ is an estimator of θ.
In the application to the two interest rates, theory suggests that θ = 1, and the unit root tests support modeling the two interest rates as cointegrated with a cointegrating coefficient of 1. We therefore specify the VECM using the theoretically suggested value θ = 1, that is, by adding the lagged value of the spread, R1yr_{t-1} - R90_{t-1}, to a VAR in ΔR1yr_t and ΔR90_t with two lags of first differences. The resulting estimated VECM consists of an equation for ΔR90_t [Equation (16.27)] and an equation for ΔR1yr_t [Equation (16.28)], each containing an intercept, two lags of ΔR90_t and ΔR1yr_t, and the lagged spread, with standard errors reported in parentheses.
The coefficient on the lagged spread (the error correction term) in the equation for ΔR1yr_t, which is estimated to be -0.72, has a t-statistic of -2.17, so it is statistically significant at the 5% level. Although lagged values of the first difference of the interest rates are not useful for predicting future interest rates, the lagged spread does help to predict the change in the one-year Treasury bond rate: When the one-year rate exceeds the 90-day rate, the one-year rate is forecasted to fall in the future.
16.5 Volatility Clustering and Autoregressive Conditional Heteroskedasticity
FIGURE 16.3  Daily Percentage Changes in the NYSE Stock Price Index

Daily NYSE percentage price changes exhibit volatility clustering, in which there are some periods of high volatility, such as in the late 1990s, and other periods of relative tranquility, such as in the mid-1990s.
Volatility Clustering
The volatility of many financial and macroeconomic variables changes over time. For example, daily percentage changes in the New York Stock Exchange (NYSE) stock price index, shown in Figure 16.3, exhibit periods of high volatility, such as in 1990 and 2003, and other periods of low volatility, such as in 1993. A series with some periods of low volatility and some periods of high volatility is said to exhibit volatility clustering. Because the volatility appears in clusters, the variance of the daily percentage price change in the NYSE index can be forecasted, even though the daily price change itself is very difficult to forecast.
Forecasting the variance of a series is of interest for several reasons. First, the variance of an asset price is a measure of the risk of owning that asset: The larger the variance of daily stock price changes, the more a stock market participant stands to gain, or to lose, on a typical day. An investor who is worried about risk would be less tolerant of participating in the stock market during a period of high rather than low volatility.
Second, the value of some financial derivatives, such as options, depends on the variance of the underlying asset. An options trader wants the best available forecasts of future volatility to help him or her know the price at which to buy or sell options.

Third, forecasting variances makes it possible to have accurate forecast intervals. Suppose you are forecasting the rate of inflation. If the variance of the forecast error is constant, then an approximate forecast confidence interval can be constructed along the lines discussed in Section 14.4, that is, as the forecast plus or minus a multiple of the SER. If, however, the variance of the forecast error changes over time, then the width of the forecast interval should change over time: At periods when inflation is subject to particularly large disturbances or shocks, the interval should be wide; during periods of relative tranquility, the interval should be tighter.
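The point about interval widths can be illustrated with two made-up values of the forecast-error standard deviation; a 95% interval of the form forecast ± 1.96σ widens and narrows with σ.

```python
# Hypothetical point forecast of inflation (percent) and two invented
# forecast-error standard deviations, one tranquil and one turbulent.
forecast = 2.5
for label, sigma in [("tranquil period", 0.8), ("turbulent period", 2.0)]:
    lo, hi = forecast - 1.96 * sigma, forecast + 1.96 * sigma
    width = hi - lo
    print(f"{label}: [{lo:.2f}, {hi:.2f}]  width {width:.2f}")
```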
Volatility clustering can be thought of as clustering of the variance of the error term over time: If the regression error has a small variance in one period, its variance tends to be small in the next period, too. In other words, volatility clustering implies that the error exhibits time-varying heteroskedasticity.
ARCH. Consider the ADL(1,1) regression

Y_t = β_0 + β_1 Y_{t-1} + γ_1 X_{t-1} + u_t.   (16.29)

In the autoregressive conditional heteroskedasticity (ARCH) model of order p, denoted ARCH(p), the variance of u_t conditional on its past values, σ_t² = var(u_t | u_{t-1}, u_{t-2}, ...), is modeled as depending on recent squared errors:

σ_t² = α_0 + α_1 u_{t-1}² + α_2 u_{t-2}² + ... + α_p u_{t-p}²,   (16.30)

where α_0, α_1, ..., α_p are unknown coefficients. If these coefficients are positive, then if recent squared errors are large, the ARCH model predicts that the current squared error will be large in magnitude, in the sense that its variance, σ_t², is large.

Although it is described here for the ADL(1,1) model in Equation (16.29), the ARCH model can be applied to the error variance of any time series regression.
GARCH. The generalized ARCH (GARCH) model, developed by the econometrician Tim Bollerslev (1986), extends the ARCH model to let σ_t² depend on its own lags as well as lags of the squared error. The GARCH(p, q) model is

σ_t² = α_0 + α_1 u_{t-1}² + ... + α_p u_{t-p}² + φ_1 σ_{t-1}² + ... + φ_q σ_{t-q}²,   (16.31)
( 16 ll)
'tandarJ ~ ror::..
R, =
o.t).IQ
(0.1112)
ui = 0.()(}79
0.111211; I I 0.Q19tr; 1
(0 .0014) (0.1)()))
(0,006)
( [tl,1))
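The conditional-variance recursion implied by a GARCH(1,1) model can be sketched as follows. The coefficient values correspond to the GARCH(1,1) point estimates reported in this section, but they are used here only to parameterize a simulation; the returns series itself is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
a0, a1, phi1 = 0.0079, 0.072, 0.919             # illustrative GARCH(1,1) coefficients

# Simulate returns with GARCH(1,1) errors, computing sigma_t^2 recursively.
T = 1000
u = np.empty(T)
sig2 = np.empty(T)
sig2[0] = a0 / (1 - a1 - phi1)                  # unconditional variance as start value
u[0] = np.sqrt(sig2[0]) * rng.standard_normal()
for t in range(1, T):
    sig2[t] = a0 + a1 * u[t - 1] ** 2 + phi1 * sig2[t - 1]
    u[t] = np.sqrt(sig2[t]) * rng.standard_normal()

bands = np.sqrt(sig2)                           # +/- one conditional std dev, as in Figure 16.4

# Volatility clustering: squared residuals are autocorrelated even though the
# residuals themselves are (nearly) uncorrelated.
def acf1(x):
    x = x - x.mean()
    return (x[1:] * x[:-1]).sum() / (x * x).sum()

print(round(acf1(u), 2), round(acf1(u ** 2), 2))
```

The first-order autocorrelation of u is near zero while that of u² is positive, which is exactly the clustering pattern the bands in Figure 16.4 trace out.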
FIGURE 16.4  Daily Percentage Changes in the NYSE Index and GARCH(1,1) Bands

The GARCH(1,1) bands, which are ± σ̂_t computed using Equation (16.33), are narrow when the conditional variance is small and wide when it is large. The conditional volatility of stock price changes varies considerably over the 1990-2005 period.
Figure 16.4 plots bands of plus or minus one conditional standard deviation (that is, ± σ̂_t) based on the GARCH(1,1) model, along with deviations of the percentage price change series from its mean. The conditional standard deviation bands quantify the time-varying volatility of the daily price changes. During the mid-1990s, the conditional standard deviation bands are tight, indicating lower levels of risk for investors holding the NYSE index. In contrast, around the turn of the century, these conditional standard deviation bands are wide, indicating a period of greater daily stock price volatility.
16.6 Conclusion

This part of the book has covered some of the most frequently used tools and concepts of time series regression. Many other tools for analyzing economic time series have been developed for specific applications. If you are interested in learning more about economic forecasting, see the introductory textbooks by Enders (1995) and Diebold (2000). For an advanced treatment of econometrics with time series data, see Hamilton (1994).
Summary
1. Vector autoregressions model a "vector" of k time series variables, each as a function of its own lags and the lags of the k - 1 other series. The forecasts of each of the time series produced by a VAR are mutually consistent, in the sense that they are based on the same information.
2. Forecasts two or more periods ahead can be computed either by iterating forward a one-step-ahead model (an AR or a VAR) or by estimating a multiperiod-ahead regression.
3. Two series that share a common stochastic trend are cointegrated; that is, Y_t and X_t are cointegrated if Y_t and X_t are I(1) but Y_t - θX_t is I(0). If Y_t and X_t are cointegrated, the error correction term Y_t - θX_t can help to predict ΔY_t and/or ΔX_t. A vector error correction model is a VAR model of ΔY_t and ΔX_t, augmented to include the lagged error correction term.
4. Volatility clustering, when the variance of a series is high in some periods and low in others, is common in economic time series, especially financial time series.
5. The ARCH model of volatility clustering expresses the conditional variance of the regression error as a function of recent squared regression errors. The GARCH model augments the ARCH model to include lagged conditional variances as well. Estimated ARCH and GARCH models produce forecast intervals with widths that depend on the volatility of the most recent regression residuals.
Key Terms

vector autoregression (VAR)
I(d) (650)
DF-GLS test (650)
cointegration (655)
error correction term (657)
vector error correction model (657)
cointegrating coefficient (657)
EG-ADF test (659)
DOLS estimator (660)
volatility clustering (665)
autoregressive conditional heteroskedasticity (ARCH) (666)
generalized ARCH (GARCH) (665)
Exercises
16.1  Suppose that Y_t follows the stationary AR(1) model Y_t = β_0 + β_1Y_{t-1} + u_t. Show that the h-period-ahead forecast of Y_t is given by Y_{t+h|t} = μ_Y + β_1^h (Y_t - μ_Y), where μ_Y = β_0/(1 - β_1).
16.2  One version of the expectations theory of the term structure of interest rates holds that a long-term rate equals the average of the expected values of short-term interest rates into the future, plus a term premium that is I(0). Specifically, let Rk_t denote a k-period interest rate, let R1_t denote a one-period interest rate, and let e_t denote an I(0) term premium. Then Rk_t = (1/k) Σ_{i=0}^{k-1} R1_{t+i|t} + e_t, where R1_{t+i|t} is the forecast made at date t of the value of R1 at date t + i. Suppose that R1_t follows a random walk, so that R1_t = R1_{t-1} + u_t.

a. Show that Rk_t = R1_t + e_t.
b. Show that Rk_t and R1_t are cointegrated. What is the cointegrating coefficient?
c. Now suppose that ΔR1_t = 0.5ΔR1_{t-1} + u_t. How does your answer to (b) change?
16.3  Suppose that u_t follows the ARCH process σ_t² = var(u_t | u_{t-1}, u_{t-2}, ...) = 1.0 + 0.5u_{t-1}². Show that the unconditional variance of u_t is var(u_t) = 2. (Hint: Use the law of iterated expectations.)
16.4  Suppose that Y_t follows the AR(p) model Y_t = β_0 + β_1Y_{t-1} + ... + β_pY_{t-p} + u_t, where E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0. Let Y_{t+h|t} = E(Y_{t+h} | Y_t, Y_{t-1}, ...). Show that Y_{t+h|t} = β_0 + β_1Y_{t+h-1|t} + ... + β_pY_{t+h-p|t} for h > p.
16.5  Verify Equation (16.20). (Hint: Use the identity Y_t² = Y_{t-1}² + 2Y_{t-1}ΔY_t + (ΔY_t)², then sum over t = 1, ..., T and use the assumption Y_0 = 0.)
16.6  A regression of Y_t onto current, past, and future values of X_t yields ...
16.7  Suppose that ΔY_t = u_t, where u_t is i.i.d. N(0, 1), and consider the regression Y_t = βX_t + error, where X_t = ΔY_{t+1} and error is the regression error. Show that β̂ converges in distribution to (1/2)(χ_1² - 1). (Hint: Analyze the numerator of β̂ using analysis like that in Equation (16.21). Analyze the denominator using the law of large numbers.)
16.8
Conside r the following two-variable VAR model witb one 1.1g anu no inLI!rcept:
'
T " "
as Y,,, 2 =
10
term<; l
16.9
a. Suppose that E(u_t | u_{t-1}, u_{t-2}, ...) = 0, that var(u_t | u_{t-1}, u_{t-2}, ...) follows the ARCH(1) model σ_t² = α_0 + α_1u_{t-1}², and that the process for u_t is stationary. Show that var(u_t) = α_0/(1 - α_1). (Hint: Use the law of iterated expectations, E(u_t²) = E[E(u_t² | u_{t-1})].)
b. Extend the result in (a) to the ARCH(p) model σ_t² = α_0 + α_1u_{t-1}² + ... + α_pu_{t-p}², showing that var(u_t) = α_0/(1 - Σ_{j=1}^p α_j) provided Σ_{j=1}^p α_j < 1.
16.10  Consider the cointegrated model Y_t = θX_t + v_1t and X_t = X_{t-1} + v_2t, where v_1t and v_2t are mean zero serially uncorrelated random variables with E(v_1t v_2j) = 0 for all t and j. Derive the vector error correction model [Equations (16.22) and (16.23)] for X and Y.
Empirical Exercises
I ht!~t! exe.rcbe" arc bas...-d on dat,l l'eriel' 1n the ll~tt.t ltle!) LS) lacro_Quartcrl) dOd LSMacro_l\lonthl) dc!>c:rtbnlu. the Fmpirical F.x<.rcJ"c" m (n..~p
tcr' 1-l anJ 15. Let Y, - ln(Gf)l',). R, Jcnntc the thn:cm(lnth Treasuf) 1.1iil
l.ltL und r.i"~' 1 antl r.f'li:.Jenl>te the inflation rfllC~> fh,m th~.- CPI and PCE
deflator respecrivel}.
E16.1  ... an AR(4) ... a VAR ...

E16.2  ...

d. Which model has the smallest root mean squared forecast error?
E16.3
l 'st: the DI--GLS test tn te'>t ft'f .tun it autliTcgre"'"" mot f1lr )'1. A., an altern.tlive. suppll'l' that ) , '' '>lalionar) .1round a dctcrmini..,tic lrc:nJ. Compan.:
the result<; to the result-; oht.1incc.l in rmpiricnl Exercise lL'.
E16.4  In Empirical Exercise ... you studied the behavior of π_t^CPI - π_t^PCE over the sample period 1959:1 through 2004:12. That analysis was predicated on the assumption that π_t^CPI - π_t^PCE is I(0).

a. Test for a unit root in the autoregression for π_t^CPI - π_t^PCE. Carry out the test using both the ADF and DF-GLS tests, including a constant and 12 lags of the
first difference of π_t^CPI - π_t^PCE.
b. Test for a unit root in the autoregression for π_t^CPI and in the autoregression for π_t^PCE. As in (a), use both the ADF and DF-GLS tests, including a constant and 12 lagged first differences.
c. What do the results from (a) and (b) say about cointegration between these two inflation rates? What is the value of the cointegrating coefficient (θ) implied by your answers to (a) and (b)?
d. How would you test for cointegration? Carry out the test. How would you estimate θ? Estimate the value of θ using the DOLS regression of π_t^CPI onto π_t^PCE and six leads and lags of Δπ_t^PCE. Is the estimated value of θ close to 1?
E16.5
a. Using data on ΔY (the growth rate in GDP) from 1955:1 to 2004:4, estimate an AR(1) model with GARCH(1,1) errors.
b. Plot the residuals from the AR(1) model along with ± σ̂_t bands as in Figure 16.4.
c. Some macroeconomists have claimed that there was a sharp drop in the variability of ΔY around 1983, which they call the "Great Moderation." Is this Great Moderation evident in the plot that you formed in (b)?
APPENDIX

16.1
CHAPTER 18

The Theory of Multiple Regression
This chapter has four objectives. The first is to present the multiple regression model in matrix form, which leads to compact formulas for the OLS estimator and test statistics. The second objective is to characterize the sampling distribution of the OLS estimator, both in large samples (using asymptotic theory) and in small samples (if the errors are homoskedastic and normally distributed). The third objective is to study the theory of efficient estimation of the coefficients of the multiple regression model and to describe generalized least squares (GLS), a method for estimating the regression coefficients efficiently when the errors are heteroskedastic and/or correlated across observations. The fourth objective is to provide a concise treatment of the asymptotic distribution theory of instrumental variables (IV) regression in the linear model, including an introduction to generalized method of moments (GMM) estimation in the linear IV regression model with heteroskedastic errors.
The chapter begins by laying out the multiple regression model and the OLS estimator in matrix form in Section 18.1. This section also presents the extended least squares assumptions for the multiple regression model. The first four of these assumptions are the same as the least squares assumptions of Key Concept 6.4 and underlie the asymptotic distributions used to justify the procedures described in Chapters 6 and 7. The remaining two extended least squares assumptions are stronger and permit us to explore in more detail the theoretical properties of the OLS estimator in the multiple regression model.
11le next three sections examine the sampling Jistribution ot the OLS
estimator and test static;tics. Secuon 1~.2
704
pre~ent~
705
the OLS est1ma tor and r-statistic under the leal>t squares assumptions or Key
Concept 6.4. Section 18.3 unifies and generalizes the
tes~
of hypotheses
involving multiple coefficie nts presented in Sections 7.2 and 7.3. and provides
the asymptotic distribution of the resulting F-stati::.tic. ln Section 1H-4, we
examine tbe exact sampling di tribut1ons of the O LS estima tor and test sta tistics
in the special case that the e rrors are homoskedastic and normally distributed.
A hhough the assumption of homoskedastic normal errors is implausible in most
econometric applications, th ~ exact sampling distributions arc of theoretical
interest, and p -values compu ted using these distributiovs often appear in the
output of regression software.
The next two sections turn to the theory of efficient estimation of the coefficients of the multiple regression model. Section 18.5 generalizes the Gauss-Markov theorem to multiple regression. Section 18.6 develops the method of generalized least squares (GLS).

The final section takes up IV estimation in the general IV regression model when the instruments are valid and strong. This section derives the asymptotic distribution of the TSLS estimator when the errors are heteroskedastic and provides expressions for the standard error of the TSLS estimator.

Mathematical prerequisite. The treatment of the linear model in this chapter uses matrix notation and the basic tools of linear algebra and assumes that the reader has taken an introductory course in linear algebra. Appendix 18.1 reviews vectors, matrices, and the matrix operations used in this chapter. In addition, multivariate calculus is used at some points.
18.1 The Linear Multiple Regression Model and OLS Estimator in Matrix Form

This section expresses the multiple regression model in matrix form and gives compact matrix formulas for the OLS estimator. The multiple regression model with k regressors is

    Y_i = β_0 + β_1 X_1i + β_2 X_2i + ⋯ + β_k X_ki + u_i,  i = 1, …, n.  (18.1)

To write the multiple regression model in matrix form, define the following vectors and matrices:

    Y = (Y_1, Y_2, …, Y_n)',  U = (u_1, u_2, …, u_n)',
    X = (X_1, X_2, …, X_n)', where X_i = (1, X_1i, X_2i, …, X_ki)', and
    β = (β_0, β_1, …, β_k)',  (18.2)

so that Y and U are n × 1 vectors, X is the n × (k + 1) matrix whose ith row is X_i', containing the n observations on the k + 1 regressors (including the "constant" regressor for the intercept), and β is the (k + 1)-dimensional vector of the k + 1 regression coefficients. With this notation, the multiple regression model in Equation (18.1) for the ith observation is

    Y_i = X_i'β + u_i,  i = 1, …, n.  (18.3)
KEY CONCEPT 18.1
The Extended Least Squares Assumptions in the Multiple Regression Model

The linear multiple regression model is

    Y_i = X_i'β + u_i,  i = 1, …, n,  (18.4)

where the extended least squares assumptions are:
1. E(u_i | X_i) = 0 (u_i has conditional mean zero);
2. (X_i, Y_i), i = 1, …, n, are independently and identically distributed (i.i.d.) draws from their joint distribution;
3. X_i and u_i have nonzero finite fourth moments;
4. X has full column rank (there is no perfect multicollinearity);
5. var(u_i | X_i) = σ_u² (homoskedasticity); and
6. the conditional distribution of u_i given X_i is normal (normal errors).
In Equation (18.3), the first regressor is the "constant" regressor that always equals 1, and its coefficient is the intercept. Thus the intercept does not appear separately in Equation (18.3); rather, it is the first element of the coefficient vector β.

Stacking all n observations in Equation (18.3) yields the multiple regression model in matrix form:

    Y = Xβ + U.  (18.5)
Except for notational differences, the first three assumptions in Key Concept 18.1 are identical to the first three assumptions in Key Concept 6.4. The fourth assumption in Key Concepts 6.4 and 18.1 might appear different, but in fact the two are the same: They are simply different ways of saying that there cannot be perfect multicollinearity.
The least squares assumptions in Key Concept 18.1 imply simple expressions for the mean vector and covariance matrix of the conditional distribution of U given the matrix of regressors X. (The mean vector and covariance matrix of a vector of random variables are defined in Appendix 18.2.) Specifically, the first and second assumptions in Key Concept 18.1 imply that E(u_i | X) = E(u_i | X_i) = 0 and that cov(u_i, u_j | X) = E(u_i u_j | X) = E(u_i u_j | X_i, X_j) = E(u_i | X_i)E(u_j | X_j) = 0 for i ≠ j. The first, second, and fifth assumptions imply that E(u_i² | X) = E(u_i² | X_i) = σ_u². Collecting these results, we have that

    under assumptions 1 and 2, E(U | X) = 0_n, and  (18.6)
    under assumptions 1, 2, and 5, E(UU' | X) = σ_u² I_n,  (18.7)

where 0_n is the n-dimensional vector of zeros and I_n is the n × n identity matrix.

Similarly, the first, second, fifth, and sixth assumptions in Key Concept 18.1 imply that the conditional distribution of the n-dimensional random vector U, conditional on X, is the multivariate normal distribution (defined in Appendix 18.2). That is,

    under assumptions 1, 2, 5, and 6, the conditional distribution of U given X is N(0_n, σ_u² I_n).  (18.8)
The OLS estimator. The formula for the OLS estimator is obtained by taking the derivative of the sum of squared prediction mistakes with respect to each element of the coefficient vector, setting these derivatives to zero, and solving for the estimator β̂.

The derivative of the sum of squared prediction mistakes with respect to the jth regression coefficient, b_j, is

    (∂/∂b_j) Σ_{i=1}^n (Y_i − b_0 − b_1 X_1i − ⋯ − b_k X_ki)² = −2 Σ_{i=1}^n X_ji (Y_i − b_0 − b_1 X_1i − ⋯ − b_k X_ki)  (18.9)

for j = 0, 1, …, k, where X_0i = 1 for all i. The expression on the right-hand side of Equation (18.9) is the jth element of the (k + 1)-dimensional vector −2X'(Y − Xb), where b = (b_0, b_1, …, b_k)'. Setting all k + 1 derivatives to zero yields the system of equations

    X'(Y − Xβ̂) = 0_{k+1}.  (18.10)

Solving the system of equations (18.10) for β̂ yields the OLS estimator in matrix form:

    β̂ = (X'X)⁻¹X'Y,  (18.11)

where (X'X)⁻¹ is the inverse of the matrix X'X.
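As a quick numerical check of Equation (18.11), the sketch below computes β̂ = (X'X)⁻¹X'Y directly on a small simulated data set and compares it with a library least-squares solver. The data and coefficient values are invented for illustration, and Python with numpy is assumed rather than anything used in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
# X is n x (k+1): first column is the "constant" regressor
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
beta = np.array([1.0, 2.0, -1.0, 0.5])      # invented true coefficients
Y = X @ beta + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # (X'X)^{-1} X'Y, Equation (18.11)
beta_ls, *_ = np.linalg.lstsq(X, Y, rcond=None)  # library least squares, same answer
```

Solving the normal equations with `np.linalg.solve` avoids explicitly forming the inverse, which is the numerically preferred way to evaluate (X'X)⁻¹X'Y.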
18.2 Asymptotic Distribution of the OLS Estimator and t-Statistic

KEY CONCEPT 18.2
Asymptotic Normality of β̂

In large samples, the OLS estimator has the multivariate normal asymptotic distribution

    √n(β̂ − β) →d N(0_{k+1}, Σ_{√n(β̂−β)}),  (18.12)

where the covariance matrix of the asymptotic distribution is

    Σ_{√n(β̂−β)} = Q_X⁻¹ Σ_V Q_X⁻¹,  (18.13)

with Q_X = E(X_i X_i') and Σ_V = E(V_i V_i'), where V_i = X_i u_i.

Derivation of Equation (18.12). To derive Equation (18.12), first use Equations (18.5) and (18.11) to write β̂ = (X'X)⁻¹X'Y = (X'X)⁻¹X'(Xβ + U), so that

    β̂ = β + (X'X)⁻¹X'U.  (18.14)

Thus β̂ − β = (X'X)⁻¹X'U, so

    √n(β̂ − β) = (X'X/n)⁻¹(X'U/√n).  (18.15)

The derivation of Equation (18.12) involves arguing first that the "denominator" matrix in Equation (18.15), X'X/n, is consistent for Q_X and second that the "numerator" matrix, X'U/√n, obeys the multivariate central limit theorem. The details are given in Appendix 18.3.

Heteroskedasticity-robust standard errors. The covariance matrix in Equation (18.13) can be estimated by replacing the population moments with their sample counterparts. The heteroskedasticity-robust estimator of Σ_{√n(β̂−β)} is

    Σ̂_{√n(β̂−β)} = (X'X/n)⁻¹ Σ̂_V (X'X/n)⁻¹, where Σ̂_V = [1/(n − k − 1)] Σ_{i=1}^n X_i X_i' û_i²  (18.16)

and û_i is the ith OLS residual.
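The heteroskedasticity-robust estimator in Equation (18.16) can be sketched in a few lines of matrix code. The simulated heteroskedastic data below are invented for illustration (Python with numpy assumed).

```python
import numpy as np

def hc_covariance(X, Y):
    """Heteroskedasticity-robust estimator of the covariance of sqrt(n)*(beta_hat - beta)."""
    n, kp1 = X.shape                               # kp1 = k + 1
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat                       # OLS residuals
    QX_inv = np.linalg.inv(X.T @ X / n)            # (X'X/n)^{-1}
    # Sigma_V_hat = sum_i X_i X_i' u_i^2 / (n - k - 1)
    Sigma_V = (X * u_hat[:, None] ** 2).T @ X / (n - kp1)
    return QX_inv @ Sigma_V @ QX_inv               # Equation (18.16)

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
# error variance depends on the regressor, so the errors are heteroskedastic
Y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n) * (1 + np.abs(X[:, 1]))
Sigma_hat = hc_covariance(X, Y)
se = np.sqrt(np.diag(Sigma_hat) / n)               # robust standard errors of beta_hat
```

Dividing the diagonal of Σ̂ by n converts the covariance of √n(β̂ − β) into standard errors for β̂ itself.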
The degrees-of-freedom correction in Σ̂_V (division by n − k − 1 rather than n) parallels the adjustment in the SER for the multiple regression model (Section 6.4); it corrects the downward bias that arises because k + 1 regression coefficients were estimated. The proof that Σ̂_{√n(β̂−β)} →p Σ_{√n(β̂−β)} is conceptually similar to the proof, presented in Section 17.3, of the consistency of heteroskedasticity-robust standard errors for the single-regressor model.

The estimator of the covariance matrix of β̂, given the observed regressors X, is

    Σ̂_β̂ = Σ̂_{√n(β̂−β)}/n.  (18.17)

The heteroskedasticity-robust standard error of β̂_j is the square root of the jth diagonal element of Σ̂_β̂; that is,

    SE(β̂_j) = [(Σ̂_β̂)_jj]^(1/2).  (18.18)

Confidence intervals for predicted effects. Predicted effects that involve changes in two or more regressors have compact matrix expressions for their standard errors and thus for their confidence intervals.

Consider a change in the value of the regressors for the ith observation from X_i to X_i + d, so the change in X_i is ΔX_i = d, where d is a (k + 1)-dimensional vector. This change in X_i can involve multiple regressors (that is, multiple elements of X_i). For example, if two of the regressors are the value of an independent variable and its square, then d is the difference between the subsequent and initial values of these two variables. The predicted effect of the change d is d'β, which is estimated by d'β̂. Because β̂ has the asymptotic distribution in Equation (18.12), d'β̂ is asymptotically normally distributed, and the standard error of this predicted effect is (d'Σ̂_β̂ d)^(1/2). A 95% confidence interval for this predicted effect is

    d'β̂ ± 1.96 (d'Σ̂_β̂ d)^(1/2).  (18.19)
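Continuing the regressor-and-its-square example, the sketch below computes d'β̂, its robust standard error (d'Σ̂_β̂ d)^(1/2), and the 95% confidence interval for the predicted effect of moving x from 1 to 2. All data and coefficients are invented for illustration (Python with numpy assumed).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x, x ** 2])        # a regressor and its square
Y = X @ np.array([1.0, 0.5, -0.2]) + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
kp1 = X.shape[1]
QX_inv = np.linalg.inv(X.T @ X / n)
Sigma_V = (X * u_hat[:, None] ** 2).T @ X / (n - kp1)
Sigma_beta = QX_inv @ Sigma_V @ QX_inv / n          # robust estimate of var(beta_hat)

# change x from 1 to 2: both x and x^2 change, the constant does not
d = np.array([0.0, 2.0 - 1.0, 2.0 ** 2 - 1.0 ** 2])
effect = d @ beta_hat                               # predicted effect d'beta_hat
se = np.sqrt(d @ Sigma_beta @ d)                    # (d' Sigma_beta d)^{1/2}
ci = (effect - 1.96 * se, effect + 1.96 * se)       # Equation (18.19)
```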
18.3 Tests of Joint Hypotheses
Joint hypotheses on the regression coefficients can be written, using matrix notation, as

    Rβ = r,  (18.20)

where R is a q × (k + 1) nonrandom matrix with full row rank and r is a nonrandom q × 1 vector. The number of rows of R is q, which is the number of restrictions being imposed under the null hypothesis.

The null hypothesis in Equation (18.20) subsumes all the null hypotheses considered in Sections 7.2 and 7.3. For example, a joint hypothesis of the type considered in Section 7.2 is that β_1 = 0, β_2 = 0, …, β_q = 0. To write this joint hypothesis in the form of Equation (18.20), set R = [0_{q×1}  I_q  0_{q×(k−q)}] and r = 0_q.

The formulation in Equation (18.20) also captures the restrictions of Section 7.3 involving multiple regression coefficients. For example, if k = 2, then the hypothesis that β_1 + β_2 = 1 can be written in the form of Equation (18.20) by setting R = [0 1 1], r = 1, and q = 1.
The heteroskedasticity-robust F-statistic testing the null hypothesis in Equation (18.20) is

    F = (Rβ̂ − r)'[RΣ̂_β̂R']⁻¹(Rβ̂ − r)/q.  (18.21)

If the first four assumptions in Key Concept 18.1 hold, then under the null hypothesis √n(Rβ̂ − r) is asymptotically normally distributed with covariance matrix RΣ_{√n(β̂−β)}R', and the F-statistic in Equation (18.21) is asymptotically distributed F_{q,∞}.
Confidence sets for multiple coefficients. In principle, a confidence set could be computed by repeatedly evaluating the F-statistic for many values of β, but, as is the case with a confidence interval for a single coefficient, it is simpler to manipulate the formula for the test statistic to obtain an explicit formula for the confidence set.

Here is the procedure for constructing a confidence set for two or more of the elements of β. Let δ denote the q-dimensional vector consisting of the coefficients for which we wish to construct a confidence set. For example, if we are constructing a confidence set for the regression coefficients β_1 and β_2, then q = 2 and δ = (β_1, β_2)'. In general, we can write δ = Rβ, where the matrix R consists of zeros and ones [as discussed following Equation (18.20)]. The F-statistic testing the hypothesis that δ = δ_0 is F = (δ̂ − δ_0)'[RΣ̂_β̂R']⁻¹(δ̂ − δ_0)/q, where δ̂ = Rβ̂. A 95% confidence set for δ is the set of values δ_0 that are not rejected by the F-statistic; that is, it is the set of δ_0 for which this F-statistic is less than or equal to the 95th percentile of the F_{q,∞} distribution.
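The robust F-statistic in Equation (18.21) is a few lines of matrix code. The sketch below tests the (true) joint hypothesis β_1 = β_2 = 0 on invented simulated data (Python with numpy assumed).

```python
import numpy as np

def robust_F(X, Y, R, r):
    """Heteroskedasticity-robust F-statistic for H0: R beta = r, Equation (18.21)."""
    n, kp1 = X.shape
    q = R.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    u_hat = Y - X @ beta_hat
    QX_inv = np.linalg.inv(X.T @ X / n)
    Sigma_V = (X * u_hat[:, None] ** 2).T @ X / (n - kp1)
    Sigma_beta = QX_inv @ Sigma_V @ QX_inv / n      # robust var(beta_hat)
    diff = R @ beta_hat - r
    return diff @ np.linalg.solve(R @ Sigma_beta @ R.T, diff) / q

rng = np.random.default_rng(4)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
Y = X @ np.array([1.0, 0.0, 0.0]) + rng.standard_normal(n)  # H0 is true here
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])                     # H0: beta_1 = beta_2 = 0, q = 2
F = robust_F(X, Y, R, np.zeros(2))
```

Under the null, F is approximately F_{2,∞} distributed, so values far above the 95th percentile (about 3.0) would lead to rejection.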
18.4 Distribution of Regression Statistics with Normal Errors

Matrix Representations of OLS Regression Statistics
The OLS predicted values, residuals, and sum of squared residuals have compact matrix representations. These representations make use of two matrices, P_X and M_X.

The matrices P_X and M_X. The algebra of OLS in the multivariate model relies on the two symmetric n × n matrices P_X and M_X:

    P_X = X(X'X)⁻¹X',  (18.24)
    M_X = I_n − P_X.  (18.25)

A matrix C is idempotent if C is square and CC = C (see Appendix 18.1). Because P_X P_X = P_X and M_X M_X = M_X, and because P_X and M_X are symmetric, P_X and M_X are symmetric idempotent matrices.
The matrices P_X and M_X have some additional useful properties, which follow directly from the definitions in Equations (18.24) and (18.25):

    P_X X = X,  M_X X = 0_{n×(k+1)},  P_X M_X = 0_{n×n},
    rank(P_X) = k + 1, and rank(M_X) = n − k − 1.  (18.26)

OLS predicted values and residuals. The matrices P_X and M_X provide simple expressions for OLS predicted values and residuals. The OLS predicted values, Ŷ = Xβ̂, and the OLS residuals, Û = Y − Ŷ, can be expressed as follows (Exercise 18.5):

    Ŷ = P_X Y and  (18.27)
    Û = M_X Y.  (18.28)

The expressions in Equations (18.27) and (18.28) provide a simple proof that the OLS residuals and predicted values are orthogonal, that is, that Equation (4.37) holds: Ŷ'Û = Y'P_X'M_X Y = 0, where the second equality follows from P_X'M_X = 0_{n×n}, which in turn follows from M_X X = 0_{n×(k+1)} in Equation (18.26).
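The algebraic facts above (idempotency, annihilation of X, the rank of M_X, and the orthogonality of Ŷ and Û) can all be verified numerically; the sketch below does so on invented data (Python with numpy assumed).

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])  # n x (k+1)
Y = rng.standard_normal(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # P_X, Equation (18.24)
M = np.eye(n) - P                      # M_X, Equation (18.25)

Y_hat = P @ Y                          # OLS predicted values, Equation (18.27)
u_hat = M @ Y                          # OLS residuals, Equation (18.28)
```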
The standard error of the regression. The SER, defined in Section 4.3, is s_û, where

    s_û² = [1/(n − k − 1)] Σ_{i=1}^n û_i² = [1/(n − k − 1)] Û'Û = [1/(n − k − 1)] U'M_X U,  (18.29)

where the final equality follows because Û'Û = (M_X U)'(M_X U) = U'M_X M_X U = U'M_X U (because M_X is symmetric and idempotent).
Distribution of β̂. Because β̂ − β = (X'X)⁻¹X'U [Equation (18.14)] and because the conditional distribution of U given X is the multivariate normal N(0_n, σ_u²I_n) [Equation (18.8)], the conditional distribution of β̂ given X is multivariate normal with mean β and covariance matrix σ_u²(X'X)⁻¹. Accordingly, under all six assumptions in Key Concept 18.1, the finite-sample conditional distribution of β̂ given X is

    β̂ ~ N(β, σ_u²(X'X)⁻¹).  (18.30)
Distribution of s_û². If all six assumptions in Key Concept 18.1 hold, then s_û² has an exact sampling distribution that is proportional to a chi-squared distribution with n − k − 1 degrees of freedom:

    s_û² ~ [σ_u²/(n − k − 1)] × χ²_{n−k−1}.  (18.31)

The proof of Equation (18.31) starts with Equation (18.29). Because U is normally distributed conditional on X and because M_X is a symmetric idempotent matrix, the quadratic form U'M_X U/σ_u² has an exact chi-squared distribution with degrees of freedom equal to the rank of M_X [Equation (18.78) in Appendix 18.2]. From Equation (18.26), the rank of M_X is n − k − 1. Thus U'M_X U/σ_u² has an exact χ²_{n−k−1} distribution, from which Equation (18.31) follows.
The degrees-of-freedom adjustment ensures that s_û² is unbiased. The expectation of a random variable with a χ²_{n−k−1} distribution is n − k − 1; thus E(U'M_X U) = (n − k − 1)σ_u², so E(s_û²) = σ_u².

Homoskedasticity-only standard errors. The homoskedasticity-only estimator of the covariance matrix of β̂, conditional on X, is

    Σ̃_β̂ = s_û²(X'X)⁻¹ (homoskedasticity-only).  (18.32)

The homoskedasticity-only standard error of β̂_j is the square root of the jth diagonal element of Σ̃_β̂:

    SE(β̂_j) = [(Σ̃_β̂)_jj]^(1/2).  (18.33)
Distribution of the t-statistic. The t-statistic testing the null hypothesis β_j = β_{j,0}, constructed using the homoskedasticity-only standard error in Equation (18.33), is

    t = (β̂_j − β_{j,0})/SE(β̂_j).  (18.34)

If all six assumptions in Key Concept 18.1 hold, then under the null hypothesis this t-statistic has the Student t distribution with n − k − 1 degrees of freedom:

    t ~ t_{n−k−1}.  (18.35)

The proof of Equation (18.35) combines the preceding results: β̂_j is normally distributed [Equation (18.30)], s_û² is proportional to a χ²_{n−k−1} random variable [Equation (18.31)], and the two are independently distributed, which together yield the Student t distribution.
The homoskedasticity-only F-statistic. The homoskedasticity-only F-statistic is similar to the heteroskedasticity-robust F-statistic in Equation (18.21), except that the homoskedasticity-only estimator Σ̃_β̂ is used instead of the heteroskedasticity-robust estimator Σ̂_β̂. Substituting the expression Σ̃_β̂ = s_û²(X'X)⁻¹ into the expression for the F-statistic in Equation (18.21) yields the homoskedasticity-only F-statistic testing the null hypothesis in Equation (18.20):

    F = (Rβ̂ − r)'[R s_û²(X'X)⁻¹ R']⁻¹(Rβ̂ − r)/q.  (18.36)

If all six assumptions in Key Concept 18.1 hold, then under the null hypothesis

    F ~ F_{q,n−k−1}.  (18.37)
Although the sums-of-squared-residuals formula for the F-statistic in Chapter 7 looks quite different than the formula for the Wald statistic in Equation (18.36), the homoskedasticity-only F-statistic and the Wald F-statistic are two versions of the same statistic; that is, the two expressions are equivalent, a result shown in Exercise 18.13.
18.5 Efficiency of the OLS Estimator with Homoskedastic Errors

The Gauss-Markov Conditions for Multiple Regression
The Gauss-Markov conditions for multiple regression are

    (i) E(U|X) = 0_n, (ii) E(UU'|X) = σ_u²I_n, and (iii) X has full column rank.  (18.38)

The Gauss-Markov conditions for multiple regression in turn are implied by the first five assumptions in Key Concept 18.1 [see Equations (18.6) and (18.7)]. The conditions in Equation (18.38) generalize the Gauss-Markov conditions for a single-regressor model to multiple regression. [By using matrix notation, the second and third Gauss-Markov conditions for the single-regressor model (Appendix 5.2) are collected into the single condition (ii) in Equation (18.38).]

Linear conditionally unbiased estimators. The class of linear conditionally unbiased estimators consists of all estimators of β that are linear functions of Y and that are unbiased conditional on X. That is, if β̃ is a linear estimator, then it can be written as

    β̃ = A'Y (β̃ is linear),  (18.39)

where A is an n × (k + 1) matrix that can depend on X and on nonrandom constants.
Comparison of Equations (18.11) and (18.39) shows that the OLS estimator is linear in Y; specifically, β̂ = Â'Y, where Â = X(X'X)⁻¹. To show that β̂ is conditionally unbiased, recall from Equation (18.14) that β̂ = β + (X'X)⁻¹X'U. Taking the conditional expectation of both sides of this expression yields E(β̂|X) = β + E[(X'X)⁻¹X'U|X] = β + (X'X)⁻¹X'E(U|X) = β, where the final equality follows because E(U|X) = 0_n.

The Gauss-Markov theorem for multiple regression says that, under the Gauss-Markov conditions, the OLS estimator of any linear combination c'β is efficient; that is, the OLS estimator c'β̂ has the smallest conditional variance of all linear conditionally unbiased estimators c'β̃. Remarkably, this is true no matter what the linear combination is. It is in this sense that the OLS estimator is BLUE in multiple regression. The Gauss-Markov theorem is stated in Key Concept 18.3 and proven in Appendix 18.5.
KEY CONCEPT 18.3
The Gauss-Markov Theorem for Multiple Regression

Suppose that the Gauss-Markov conditions for multiple regression in Equation (18.38) hold. Then the OLS estimator β̂ is BLUE. That is, let β̃ be a linear conditionally unbiased estimator of β and let c be a (k + 1)-dimensional vector; then var(c'β̂|X) ≤ var(c'β̃|X) for every such β̃ and c.

18.6 Generalized Least Squares
The presence of correlated error terms creates two problems for inference based on OLS. First, neither the heteroskedasticity-robust nor the homoskedasticity-only standard errors produced by OLS provide a valid basis for inference. The solution to this problem is to use standard errors that are robust to both heteroskedasticity and correlation of the error terms across observations. This topic, heteroskedasticity- and autocorrelation-consistent (HAC) covariance matrix estimation, is the subject of Section 15.4, and we do not pursue it further here.

Second, if the error term is correlated across observations, then E(UU'|X) is not diagonal, the second Gauss-Markov condition in Equation (18.38) does not hold, and OLS is not BLUE. In this section we study an estimator, generalized least squares (GLS), that is BLUE (at least asymptotically) when the conditional covariance matrix of the errors is no longer proportional to the identity matrix. A special case of GLS is weighted least squares, discussed in Section 17.5, in which the conditional covariance matrix is diagonal and the ith diagonal element is a function of X_i. Like WLS, GLS transforms the regression model so that the errors of the transformed model satisfy the Gauss-Markov conditions. The GLS estimator is the OLS estimator of the coefficients in the transformed model.
The first GLS assumption is that the conditional mean of U given X is zero:

    E(U|X) = 0_n.  (18.40)

This assumption is implied by the first two least squares assumptions in Key Concept 18.1; that is, if E(u_i|X_i) = 0 and (X_i, Y_i), i = 1, …, n, are i.i.d., then E(U|X) = 0_n. In GLS, however, we will not want to maintain the i.i.d. assumption; after all, one purpose of GLS is to handle errors that are correlated across observations. We discuss the significance of the assumption in Equation (18.40) after introducing the GLS estimator.

The second GLS assumption is that the conditional covariance matrix of U given X is some function of X:

    E(UU'|X) = Ω(X).  (18.41)
KEY CONCEPT 18.4
The GLS Assumptions

In the generalized least squares regression model

    Y = Xβ + U,

the first two GLS assumptions are that E(U|X) = 0_n and that E(UU'|X) = Ω(X), where Ω(X) is an n × n positive definite matrix; the remaining two GLS assumptions are moment and rank conditions that parallel the corresponding extended least squares assumptions in Key Concept 18.1.
The GLS estimator. Let F denote the inverse of the matrix square root of Ω, so that FΩF' = I_n. Premultiplying both sides of Y = Xβ + U by F yields the transformed regression model

    Ỹ = X̃β + Ũ,  (18.42)

where Ỹ = FY, X̃ = FX, and Ũ = FU.

The key insight of GLS is that, under the four GLS assumptions, the Gauss-Markov assumptions hold for the transformed regression in Equation (18.42). That is, by transforming all the variables by the inverse of the matrix square root of Ω, the regression errors in the transformed regression have a conditional mean of zero and a covariance matrix that equals the identity matrix. To show this mathematically, first note that E(Ũ|X) = E(FU|X) = F E(U|X) = 0_n by the first GLS assumption [Equation (18.40)]. In addition, E(ŨŨ'|X) = E[(FU)(FU)'|X] = F E(UU'|X)F' = FΩF' = I_n, where the second equality follows because (FU)' = U'F' and the final equality follows from the definition of F. It follows that the transformed regression model in Equation (18.42) satisfies the Gauss-Markov conditions in Key Concept 18.3.

The GLS estimator β̂^GLS is the OLS estimator of β in Equation (18.42); that is, β̂^GLS = (X̃'X̃)⁻¹X̃'Ỹ. Because the transformed regression model satisfies the Gauss-Markov conditions, the GLS estimator is the best conditionally unbiased estimator that is linear in Ỹ. But because Ỹ = FY and F is (here) assumed to be known, and because F is invertible (because Ω is positive definite), the class of estimators that are linear in Ỹ is the same as the class of estimators that are linear in Y. Thus the OLS estimator of β in Equation (18.42) is also the best conditionally unbiased estimator among estimators that are linear in Y. In other words, under the GLS assumptions, the GLS estimator is BLUE.

The GLS estimator can be expressed directly in terms of Ω, so that in principle there is no need to compute the square root matrix F. Because X̃ = FX and Ỹ = FY, β̂^GLS = (X'F'FX)⁻¹(X'F'FY). But F'F = Ω⁻¹, so

    β̂^GLS = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y.  (18.43)
In practice, Ω is typically unknown, so the GLS estimator in Equation (18.43) generally cannot be computed; it is therefore sometimes called the infeasible GLS estimator. If, however, Ω has a known functional form but the parameters of that functional form are unknown, then Ω can be estimated and a feasible version of the GLS estimator can be computed.
GLS When Ω Contains Unknown Parameters
If Ω is a known function of some parameters that in turn can be estimated, then these estimated parameters can be used to calculate an estimator of the covariance matrix Ω. For example, consider the time series application discussed following Equation (18.41), in which Ω(X) does not depend on X, Ω_ii = σ_u², Ω_ij = ρσ_u² for |i − j| = 1, and Ω_ij = 0 for |i − j| > 1. Then Ω has two unknown parameters, σ_u² and ρ. These parameters can be estimated using the residuals from a preliminary OLS regression; specifically, σ_u² can be estimated by s_û², and ρ can be estimated by the sample correlation between all neighboring pairs of OLS residuals. These estimated parameters can in turn be used to compute an estimator of Ω, Ω̂.

In general, suppose that you have an estimator Ω̂ of Ω. Then the GLS estimator based on Ω̂ is

    β̂^GLS = (X'Ω̂⁻¹X)⁻¹X'Ω̂⁻¹Y.  (18.44)

The GLS estimator in Equation (18.44) is sometimes called the feasible GLS estimator because it can be computed if the covariance matrix contains some unknown parameters that can be estimated.
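The two-step feasible GLS procedure just described (preliminary OLS, estimate the two parameters of the tridiagonal Ω, then apply Equation (18.44)) can be sketched as follows; the data-generating process is invented for illustration (Python with numpy assumed).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
# errors where only adjacent observations are correlated, as in the example above
e = rng.standard_normal(n + 1)
U = (e[1:] + 0.5 * e[:-1]) / np.sqrt(1.25)      # unit variance, first-order correlation 0.4
Y = X @ np.array([1.0, 2.0]) + U

# Step 1: preliminary OLS, then estimate the two parameters of Omega
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)
u = Y - X @ b_ols
s2 = u @ u / (n - X.shape[1])                   # estimate of sigma_u^2
rho = (u[1:] @ u[:-1]) / (u @ u)                # correlation of neighboring residuals

# Step 2: build the tridiagonal Omega_hat and compute feasible GLS, Equation (18.44)
Omega = s2 * (np.eye(n) + rho * (np.eye(n, k=1) + np.eye(n, k=-1)))
b_gls = np.linalg.solve(X.T @ np.linalg.solve(Omega, X),
                        X.T @ np.linalg.solve(Omega, Y))
```

Using `np.linalg.solve(Omega, ...)` applies Ω̂⁻¹ without explicitly inverting the n × n matrix.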
When sampling is not i.i.d., however, the first GLS assumption is not implied by the assumption that E(u_i|X_i) = 0; that is, the first GLS assumption is stronger. Although the distinction between these two conditions might seem slight, it can be very important in applications to time series data. This distinction is discussed in Section 15.5 in the context of whether the regressor is "past and present" exogenous or "strictly" exogenous; the assumption that E(U|X) = 0_n corresponds to strict exogeneity. Here, we discuss this distinction at a more general level using matrix notation. To do so, we focus on the case that U is homoskedastic, Ω is known, and Ω has nonzero off-diagonal elements.
Is the first GLS assumption restrictive? The first GLS assumption requires that the errors for the ith observation be uncorrelated with the regressors for all other observations. This assumption is dubious in some time series applications. This issue is discussed in Section 15.6 in the context of an empirical example, the relationship between the change in the price of a contract for future delivery of frozen orange concentrate and the weather in Florida. As explained there, the error term in the regression of price changes on the weather is plausibly uncorrelated with current and past values of the weather, so the first OLS assumption holds. However, this error term is plausibly correlated with future values of the weather, so the first GLS assumption does not hold.
This example illustrates a general phenomenon in economic time series data that arises when the value of a variable today is set in part based on expectations of the future: Those future expectations typically imply that the error term today depends on a forecast of the regressor tomorrow, which in turn is correlated with the actual value of the regressor tomorrow. For this reason, the first GLS assumption is in fact much stronger than the first OLS assumption. Accordingly, in some applications with economic time series data the GLS estimator is not consistent even though the OLS estimator is.
18.7 Instrumental Variables and Generalized Method of Moments Estimation

In matrix form, let X denote the n × (k + r + 1) matrix of the regressors in the equation of interest, so X contains the included endogenous regressors (the X's in Key Concept 12.1) and the included exogenous regressors (the W's in Key Concept 12.1). That is, in the notation of Key Concept 12.1, the ith row of X is X_i' = (1, X_1i, X_2i, …, X_ki, W_1i, …, W_ri). Also, let Z denote the n × (m + r + 1) matrix of all the exogenous regressors, both those included in the equation of interest (the W's) and those excluded from the equation of interest (the instruments). That is, in the notation of Key Concept 12.1, the ith row of Z is Z_i' = (1, Z_1i, Z_2i, …, Z_mi, W_1i, …, W_ri).

With this notation, the IV regression model of Key Concept 12.1, written in matrix form, is

    Y = Xβ + U,  (18.45)

where U is the n × 1 vector of errors, with ith element u_i.
The matrix Z consists of all the exogenous regressors, so under the IV regression assumptions of Key Concept 12.4,

    E(u_i | Z_i) = 0.  (18.46)

Because there are k included endogenous regressors, the first-stage regression consists of k equations, one for each endogenous regressor.
The TSLS estimator. The TSLS estimator is the instrumental variables estimator in which the instruments are the predicted values of X based on OLS estimation of the first-stage regression. Let X̂ denote this matrix of predicted values, so that the ith row of X̂ is X̂_i' = (1, X̂_1i, …, X̂_ki, W_1i, …, W_ri), where X̂_ji is the predicted value from the regression of X_ji on Z. Because the W's are contained in Z, the predicted value from a regression of W_1i on Z is just W_1i, and so forth, so X̂ = P_Z X, where P_Z = Z(Z'Z)⁻¹Z' [see Equation (18.27)]. Accordingly, the TSLS estimator is

    β̂^TSLS = (X̂'X)⁻¹X̂'Y.  (18.48)
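The matrix formula in Equation (18.48) can be sketched directly; the simulated design below (one endogenous regressor, two instruments, errors correlated with the regressor so that OLS is biased) is invented for illustration (Python with numpy assumed).

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
z = rng.standard_normal((n, 2))                  # two excluded instruments
v = rng.standard_normal(n)
u = 0.8 * v + 0.6 * rng.standard_normal(n)       # error correlated with x: OLS is biased
x = z @ np.array([1.0, 0.5]) + v                 # first-stage relationship
Y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])             # regressors in the equation of interest
Z = np.column_stack([np.ones(n), z])             # all exogenous variables

b_ols = np.linalg.solve(X.T @ X, X.T @ Y)        # biased because of endogeneity

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # X_hat = P_Z X
beta_tsls = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)  # Equation (18.48)
```

Because X̂'X̂ = X̂'X (the residual of the first-stage projection is orthogonal to X̂), the same estimator results from regressing Y on X̂.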
Substituting Y = Xβ + U into Equation (18.48) and rearranging shows that

    √n(β̂^TSLS − β) = [(X'Z/n)(Z'Z/n)⁻¹(Z'X/n)]⁻¹(X'Z/n)(Z'Z/n)⁻¹(Z'U/√n).  (18.49)

Under the IV regression assumptions, the term Z'U/√n = (1/√n) Σ_{i=1}^n Z_i u_i obeys a multivariate central limit theorem with asymptotic covariance matrix

    H = E(Z_i Z_i' u_i²),  (18.50)

the variance matrix of Z_i u_i. Because the sample moment matrices are consistent (X'Z/n →p Q_XZ and Z'Z/n →p Q_ZZ, where Q_XZ = E(X_i Z_i') and Q_ZZ = E(Z_i Z_i')), it follows that

    √n(β̂^TSLS − β) →d N(0_{k+r+1}, Σ^TSLS),  (18.51)

where

    Σ^TSLS = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ Q_XZ Q_ZZ⁻¹ H Q_ZZ⁻¹ Q_ZX (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹,  (18.52)

with Q_ZX = Q_XZ' = E(Z_i X_i').
Standard errors for TSLS. The formula in Equation (18.52) is daunting. Nevertheless, it provides a way to estimate Σ^TSLS by substituting sample moments for the population moments. The resulting variance estimator is

    Σ̂^TSLS = (Q̂_XZ Q̂_ZZ⁻¹ Q̂_ZX)⁻¹ Q̂_XZ Q̂_ZZ⁻¹ Ĥ Q̂_ZZ⁻¹ Q̂_ZX (Q̂_XZ Q̂_ZZ⁻¹ Q̂_ZX)⁻¹,  (18.53)

where Q̂_XZ = X'Z/n, Q̂_ZZ = Z'Z/n, Q̂_ZX = Z'X/n, and

    Ĥ = (1/n) Σ_{i=1}^n Z_i Z_i' û_i²,  (18.54)

where Û = Y − Xβ̂^TSLS is the vector of TSLS residuals and û_i is the ith element of that vector (the TSLS residual for the ith observation).

The TSLS standard errors are the square roots of the diagonal elements of Σ̂^TSLS/n.
Properties of TSLS
When the Errors Are Homoskedastic

If the errors are homoskedastic, then the TSLS estimator is asymptotically efficient among the class of IV estimators in which the instruments are linear combinations of the rows of Z. This result is the IV counterpart to the Gauss-Markov theorem, and it constitutes an important justification for using TSLS.

If the errors are homoskedastic, then H = E(Z_i Z_i' u_i²) = E[Z_i Z_i' E(u_i²|Z_i)] = Q_ZZ σ_u². In this case, the variance of the asymptotic distribution of the TSLS estimator in Equation (18.52) simplifies to

    Σ^TSLS = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹ σ_u² (homoskedasticity only).  (18.55)
The homoskedasticity-only estimator of the asymptotic variance of the TSLS estimator is

    Σ̃^TSLS = (Q̂_XZ Q̂_ZZ⁻¹ Q̂_ZX)⁻¹ σ̂_u², where σ̂_u² = Û'Û/(n − k − r − 1) (homoskedasticity only),  (18.56)

and the homoskedasticity-only TSLS standard errors are the square roots of the diagonal elements of Σ̃^TSLS/n.
The exogeneity of the instruments implies the population moment conditions

    E[Z_i(Y_i − X_i'β)] = 0.  (18.57)

These equations are mutually consistent in the sense that all are satisfied at the true value of β. When these population moments are replaced by their sample moments, the system of equations (Y − Xb)'Z = 0 can be solved for b when there is exact identification (m = k). This value of b is the IV estimator of β. However, when there is overidentification (m > k), the system of equations typically cannot all be satisfied by the same value of b because of sampling variation (there are more equations than unknowns), and in general this system does not have a solution.
One approach to the problem of estimating β when there is overidentification is to trade off the desire to satisfy each equation by minimizing a quadratic form involving all the equations. Specifically, let A be an (m + r + 1) × (m + r + 1) symmetric positive semidefinite weight matrix, and let β̂_A^IV denote the estimator that minimizes

    min_b (Y − Xb)'ZAZ'(Y − Xb).  (18.58)

The solution to this minimization problem is found by taking the derivative of the objective function with respect to b, setting the result equal to zero, and rearranging. Doing so yields β̂_A^IV, the IV estimator based on the weight matrix A:

    β̂_A^IV = (X'ZAZ'X)⁻¹X'ZAZ'Y.  (18.59)
Comparison of Equations (18.59) and (18.48) shows that TSLS is the IV estimator with A = (Z'Z)⁻¹; that is, TSLS is the solution of the minimization problem in Equation (18.58) with A = (Z'Z)⁻¹.

The calculations leading to Equations (18.51) and (18.52), applied to β̂_A^IV, show that

    √n(β̂_A^IV − β) →d N(0_{k+r+1}, Σ_A^IV),  (18.60)

where Σ_A^IV has the sandwich form of Equation (18.52), with Q_ZZ⁻¹ replaced by the probability limit of A.
The second way to generate the class of IV estimators that use linear combinations of Z is to consider IV estimators in which the instruments are ZB, where B is an (m + r + 1) × (k + r + 1) matrix with full column rank. Then the system of (k + r + 1) equations, (Y − Xb)'ZB = 0, can be solved uniquely for the (k + r + 1) unknown elements of b. Solving these equations for b yields β̂_B^IV = (B'Z'X)⁻¹(B'Z'Y), and substitution of B = AZ'X into this expression yields Equation (18.59). Thus the two approaches to defining IV estimators that are linear combinations of the instruments yield the same family of IV estimators. It is conventional to work with the first approach, in which the IV estimator solves the quadratic minimization problem in Equation (18.58), and that is the approach taken here.
With A = (Z'Z)⁻¹, the minimization problem in Equation (18.58) becomes

    min_b (Y − Xb)'Z(Z'Z)⁻¹Z'(Y − Xb).  (18.61)
To show that TSLS is asymptotically efficient among the class of estimators that are linear combinations of Z when the errors are homoskedastic, we need to show that, under homoskedasticity,

    c'Σ^TSLS c ≤ c'Σ_A^IV c  (18.62)

for all positive semidefinite matrices A and all (k + r + 1) × 1 vectors c, where Σ^TSLS = (Q_XZ Q_ZZ⁻¹ Q_ZX)⁻¹σ_u² [Equation (18.55)]. This result follows from an argument paralleling the multivariate Gauss-Markov theorem in Key Concept 18.3. Consequently, TSLS is the efficient IV estimator under homoskedasticity, among the class of estimators in which the instruments are linear combinations of Z.
The J-statistic. The TSLS J-statistic tests the null hypothesis that all the overidentifying restrictions hold, against the alternative that some or all of them do not hold.

The idea of the J-statistic is that, if the overidentifying restrictions hold, u_i will be uncorrelated with the instruments, and thus a regression of U on Z will have population regression coefficients that all equal zero. In practice, U is not observed, but it can be estimated by the TSLS residuals Û, so a regression of Û on Z should yield statistically insignificant coefficients. Accordingly, the TSLS J-statistic is the homoskedasticity-only F-statistic testing the hypothesis that the coefficients on Z are all zero, in the regression of Û on Z, multiplied by (m + r + 1) so that the F-statistic is in its asymptotic chi-squared form.

An explicit formula for the J-statistic can be obtained using Equation (7.13) for the homoskedasticity-only F-statistic. The unrestricted regression is the regression of Û on the m + r + 1 regressors Z, and the restricted regression has no regressors. Thus, in the notation of Equation (7.13), SSR_unrestricted = Û'M_Z Û and SSR_restricted = Û'Û, so SSR_restricted − SSR_unrestricted = Û'Û − Û'M_Z Û = Û'P_Z Û, and the J-statistic is

    J = Û'P_Z Û / [Û'M_Z Û/(n − m − r − 1)].  (18.63)
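Equation (18.63) can be sketched on simulated overidentified data (m = 2 valid instruments, one endogenous regressor, so m − k = 1 overidentifying restriction; everything below is invented for illustration, Python with numpy assumed).

```python
import numpy as np

rng = np.random.default_rng(9)
n = 2000
z = rng.standard_normal((n, 2))                  # m = 2 instruments, both valid
v = rng.standard_normal(n)
u = 0.8 * v + 0.6 * rng.standard_normal(n)
x = z @ np.array([1.0, 1.0]) + v                 # k = 1 endogenous regressor
Y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])             # m + r + 1 = 3 columns (r = 0)

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_tsls = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)
u_hat = Y - X @ beta_tsls                        # TSLS residuals

Pu = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u_hat)   # P_Z u_hat
num = u_hat @ Pu                                 # u'P_Z u
den = (u_hat @ u_hat - num) / (n - Z.shape[1])   # u'M_Z u / (n - m - r - 1)
J = num / den                                    # Equation (18.63)
```

With valid instruments, J should be an unremarkable draw from (approximately) a χ²₁ distribution; a large value would cast doubt on the overidentifying restrictions.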
If the errors are homoskedastic, then under the null hypothesis that all the overidentifying restrictions hold,

    J →d χ²_{m−k}.  (18.64)
Generalized Method
of Moments Estimation in Linear Models
If the errors are heteroskedastic, then the TSLS estimator is no longer efficient among the class of IV estimators that use linear combinations of Z as instruments. The efficient estimator in this case is known as the efficient generalized method of moments (GMM) estimator. In addition, if the errors are heteroskedastic, then the J-statistic as defined in Equation (18.63) no longer has a chi-squared distribution. However, an alternative formulation of the J-statistic, constructed using the efficient GMM estimator, does have a chi-squared distribution with m − k degrees of freedom.

These results parallel the results for the estimation of the usual regression model with exogenous regressors and heteroskedastic errors: If the errors are heteroskedastic, then the OLS estimator is not efficient among estimators that are linear in Y (the Gauss-Markov conditions are not satisfied), and the homoskedasticity-only F-statistic no longer has an F distribution, even in large samples. In the regression model with exogenous regressors and heteroskedasticity, the efficient estimator is weighted least squares; in the IV regression model with heteroskedasticity, the efficient estimator uses a different weighting matrix than TSLS, and the resulting estimator is the efficient GMM estimator.
GMM estimation.
The class of GMM estimators based on the instruments Z with different weight matrices A is the same as the class of IV estimators in which the instruments are linear combinations of Z. In the linear IV regression model, GMM is just another name for the class of estimators we have been studying; that is, estimators that solve Equation (18.58). In particular, TSLS is the efficient GMM estimator in the linear regression model when the errors are homoskedastic.
To motivate the expression for the efficient GMM estimator when the errors are heteroskedastic, recall that when the errors are homoskedastic, H [the variance matrix of Z_i u_i; see Equation (18.50)] equals Q_ZZ σ_u², and the asymptotically efficient weight matrix is obtained by setting A = (Z'Z)^{-1}, which yields the TSLS estimator. In large samples, using the weight matrix A = (Z'Z)^{-1} is equivalent to using A = (Q_ZZ σ_u²)^{-1} = H^{-1}. This interpretation of the TSLS estimator suggests that, by analogy, the efficient IV estimator under heteroskedasticity can be obtained by setting A = H^{-1} and solving

min_b [(Y − Xb)'Z] H^{-1} [Z'(Y − Xb)].   (18.65)

The solution to this minimization problem is the efficient GMM estimator,

β̂^Eff.GMM = (X'Z H^{-1} Z'X)^{-1} X'Z H^{-1} Z'Y.   (18.66)

The asymptotic distribution of β̂^Eff.GMM is normal, with variance matrix

Σ^Eff.GMM = (Q_XZ H^{-1} Q_ZX)^{-1}.   (18.67)
The result that β̂^Eff.GMM is the efficient GMM estimator is proven by showing that c'Σ^Eff.GMM c ≤ c'Σ^β̂ c for all vectors c, where Σ^β̂ is given in Equation (18.60). The proof of this result is given in Appendix 18.6.
Feasible efficient GMM estimation. The GMM estimator defined in Equation (18.66) is not a feasible estimator because it depends on the unknown variance matrix H. However, a feasible efficient GMM estimator can be computed by substituting a consistent estimator of H into the minimization problem of Equation (18.65) or, equivalently, by substituting a consistent estimator of H into the formula for β̂^Eff.GMM in Equation (18.66).

The efficient GMM estimator can be computed in two steps. In the first step, estimate β using any consistent estimator. Use this estimator of β to compute the residuals from the equation of interest, and then use these residuals to compute an estimator of H. In the second step, use this estimator of H to estimate the optimal weight matrix H^{-1} and to compute the efficient GMM estimator. To be concrete, in the linear IV regression model, it is natural to use the TSLS estimator in the first step and to use the TSLS residuals to estimate H. If TSLS is used in the first step, then the feasible efficient GMM estimator computed in the second step is

β̂^Eff.GMM = (X'Z Ĥ^{-1} Z'X)^{-1} X'Z Ĥ^{-1} Z'Y,   (18.68)

where

Ĥ = (1/n) Σ_{i=1}^n Z_i Z_i' û_i²,   (18.69)

and û_i are the TSLS residuals. Then √n(β̂^Eff.GMM − β) →d N(0, Σ^Eff.GMM), where Σ^Eff.GMM = (Q_XZ H^{-1} Q_ZX)^{-1} [Equation (18.67)]. That is, the feasible two-step estimator β̂^Eff.GMM in Equation (18.68) is, asymptotically, the efficient GMM estimator.
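The two-step procedure above can be sketched in a few lines of NumPy. The function names and the simulated data below are illustrative; the formulas are those of Equations (18.66) and (18.68)-(18.69), with Ĥ built from first-step TSLS residuals.

```python
import numpy as np

def tsls(Y, X, Z):
    # First step: TSLS, beta = (X'P_Z X)^{-1} X'P_Z Y.
    PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(PzX.T @ X, PzX.T @ Y)

def efficient_gmm(Y, X, Z):
    # Second step: beta = (X'Z H^{-1} Z'X)^{-1} X'Z H^{-1} Z'Y,
    # with H_hat = (1/n) sum_i Z_i Z_i' u_i^2 from the TSLS residuals.
    n = len(Y)
    u = Y - X @ tsls(Y, X, Z)
    H = (Z * (u**2)[:, None]).T @ Z / n
    XZ = X.T @ Z
    lhs = XZ @ np.linalg.solve(H, XZ.T)      # X'Z H^{-1} Z'X
    rhs = XZ @ np.linalg.solve(H, Z.T @ Y)   # X'Z H^{-1} Z'Y
    return np.linalg.solve(lhs, rhs)
```

In the exactly identified case (Z and X with the same number of columns), the weight matrix cancels and the estimator reduces to (Z'X)^{-1}Z'Y, so the second step changes nothing; the choice of weight matrix matters only with overidentification.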
The GMM J-statistic. The heteroskedasticity-robust J-statistic is given by

J^GMM = (Û^Eff.GMM' Z) Ĥ^{-1} (Z' Û^Eff.GMM) / n,   (18.70)

where Û^Eff.GMM = Y − X β̂^Eff.GMM are the residuals from the equation of interest, estimated by (feasible) efficient GMM, and Ĥ^{-1} is the weight matrix used to compute β̂^Eff.GMM. Under the null hypothesis E(Z_i u_i) = 0, J^GMM →d χ²_{m−k} (see Appendix 18.7).
GMM with time series data. The results in this section were derived under the IV regression assumptions for cross-sectional data. In many applications, however, these results extend to time series applications of IV regression and GMM. Although a formal mathematical treatment of GMM with time series data is beyond the scope of this book (for such a treatment, see Hayashi, 2000, Chapter 6), we nevertheless will summarize the key ideas of GMM estimation with time series data. This summary assumes familiarity with the material in Chapters 14 and 15. For this discussion, it is assumed that the variables are stationary.

It is useful to distinguish between two types of applications: applications in which the error term u_t is serially correlated and applications in which u_t is serially uncorrelated. If the error term u_t is serially correlated, then the asymptotic distribution of the GMM estimator continues to be normal, but the formula for H in Equation (18.50) is no longer correct. Instead, the correct expression for H depends on the autocovariances of Z_t u_t and is analogous to the formula given in Equation (15.14) for the variance of the OLS estimator when the error term is serially correlated. The efficient GMM estimator is still constructed using a consistent estimator of H; however, that consistent estimator must be computed using the HAC methods discussed in Chapter 15.

If the error term u_t is not serially correlated, then HAC estimation of H is unnecessary and the formulas presented in this section all extend to time series GMM applications. In modern applications to finance and macroeconometrics, it is common to encounter models in which the error term represents an unexpected or unforecastable disturbance, in which case the model implies that u_t is serially uncorrelated. For example, consider a model with a single included endogenous variable and no included exogenous variables, so that the equation of interest is Y_t = β_0 + β_1 X_t + u_t. Suppose an economic theory implies that u_t is unpredictable given past information. Then the theory implies the moment condition

E(u_t | Y_{t−1}, X_{t−1}, Z_{t−1}, Y_{t−2}, X_{t−2}, Z_{t−2}, ...) = 0,   (18.71)

where Z_{t−1} is the lagged value of some other variable. The moment condition in Equation (18.71) implies that all the lagged variables Y_{t−1}, X_{t−1}, Z_{t−1}, Y_{t−2}, X_{t−2}, Z_{t−2}, ... are candidates for being valid instruments (they satisfy the exogeneity condition). Moreover, because u_{t−1} = Y_{t−1} − β_0 − β_1 X_{t−1}, the moment condition in Equation (18.71) is equivalent to E(u_t | u_{t−1}, X_{t−1}, Z_{t−1}, u_{t−2}, X_{t−2}, Z_{t−2}, ...) = 0. Because u_t is therefore serially uncorrelated, HAC estimation of H is unnecessary. The theory of GMM presented in this section, including efficient GMM estimation and the GMM J-statistic, therefore applies directly to time series applications with moment conditions of the form in Equation (18.71), under the hypothesis that the moment condition in Equation (18.71) is, in fact, correct.
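As a concrete illustration of using a lagged variable as an instrument, the sketch below estimates Y_t = β_0 + β_1 X_t + u_t by IV with Z_{t−1} as the instrument. The data-generating process is invented for the illustration; because the system is exactly identified, the IV estimator is (Z'X)^{-1}Z'Y, and because u_t is i.i.d. here, no weight matrix or HAC correction is needed.

```python
import numpy as np

def lagged_iv(y, x, z):
    # Instrument x_t with z_{t-1}; drop the first observation to align the lag.
    yt, xt, zl = y[1:], x[1:], z[:-1]
    X = np.column_stack([np.ones(len(yt)), xt])   # regressors [1, x_t]
    Z = np.column_stack([np.ones(len(yt)), zl])   # instruments [1, z_{t-1}]
    return np.linalg.solve(Z.T @ X, Z.T @ yt)     # exactly identified IV

rng = np.random.default_rng(2)
T = 20_000
z = rng.normal(size=T)
v = rng.normal(size=T)
x = np.r_[v[0], z[:-1] + v[1:]]        # x_t depends on z_{t-1} (relevance)
u = v + rng.normal(size=T)             # u_t correlated with v_t => x_t endogenous
y = 1.0 + 2.0 * x + u

b0, b1 = lagged_iv(y, x, z)
```

Here cov(z_{t−1}, u_t) = 0 by construction, so the lagged variable satisfies the exogeneity condition, while OLS is inconsistent because cov(x_t, u_t) > 0.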
Summary

1. The linear multiple regression model in matrix form is Y = Xβ + U, where Y is the n × 1 vector of observations on the dependent variable, X is the n × (k + 1) matrix of n observations on the k + 1 regressors (including a constant), β is the k + 1 vector of unknown parameters, and U is the n × 1 vector of error terms.

2. The OLS estimator is β̂ = (X'X)^{-1}X'Y. Under the first four least squares assumptions in Key Concept 18.1, β̂ is consistent and asymptotically normally distributed. If in addition the errors are homoskedastic, then the conditional variance of β̂ is var(β̂|X) = σ_u²(X'X)^{-1}.

3. General linear restrictions on β can be written as the q equations Rβ = r, and this formulation can be used to test joint hypotheses involving multiple coefficients or to construct confidence sets for elements of β.

4. When the regression errors are i.i.d. and normally distributed, conditional on X, β̂ has an exact normal distribution, and the homoskedasticity-only t- and F-statistics have exact t_{n−k−1} and F_{q,n−k−1} distributions, respectively.

5. The Gauss-Markov theorem says that, if the errors are homoskedastic and conditionally uncorrelated across observations and if E(u_i|X) = 0, the OLS estimator is efficient among linear conditionally unbiased estimators (OLS is BLUE).

6. If the error covariance matrix Ω is not proportional to the identity matrix, and if Ω is known or can be estimated, then the GLS estimator is asymptotically more efficient than OLS. However, GLS requires that, in general, u_i be uncorrelated with all observations on the regressors, not just with X_i as is required by OLS, an assumption that must be evaluated carefully in applications.

7. The TSLS estimator is a member of the class of GMM estimators of the linear model. In GMM, the coefficients are estimated by making the sample covariance between the regression error and the exogenous variables as small as possible; specifically, by solving min_b [(Y − Xb)'Z] A [Z'(Y − Xb)], where A is a weight matrix. The asymptotically efficient GMM estimator sets A = [E(Z_i Z_i' u_i²)]^{-1}. When the errors are homoskedastic, the asymptotically efficient GMM estimator in the linear IV regression model is TSLS.
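The matrix formulas in Summary points 1 and 2 map directly onto code. A minimal NumPy sketch on simulated data (all names and numbers are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)     # (X'X)^{-1} X'Y
u_hat = Y - X @ beta_hat
s2 = u_hat @ u_hat / (n - X.shape[1])            # s_u^2 with k + 1 = 3 regressors
var_hat = s2 * np.linalg.inv(X.T @ X)            # homoskedasticity-only var(beta_hat|X)
```

The homoskedasticity-only variance estimate s_û²(X'X)^{-1} is the sample analog of the conditional variance in point 2.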
Key Terms
Gauss-Markov conditions for multiple regression (719)
Gauss-Markov theorem for multiple regression (720)
generalized least squares (GLS) (722)
infeasible GLS (725)
feasible GLS (725)
Review the Concepts

18.1 A researcher studying the relationship between earnings and gender for a group of workers proposes the model Y_i = β_0 + X_{1i}β_1 + X_{2i}β_2 + u_i, where X_{1i} is a binary variable that equals 1 if the ith person is a female and X_{2i} is a binary variable that equals 1 if the ith person is a male. Write the model in the matrix form of Equation (18.2) for a hypothetical set of n = 5 observations. Show that the columns of X are linearly dependent, so that X does not have full rank. Explain how you would respecify the model to eliminate the perfect multicollinearity.

18.2 You are analyzing a linear regression model with 500 observations and one regressor. Explain how you would construct a confidence interval for β_1 if:

a. Assumptions 1-4 in Key Concept 18.1 are true, but you think assumption 5 or 6 might not be true.

b. Assumptions 1-5 are true, but you think assumption 6 might not be true.

c. Assumptions 1-6 are true.

18.3 Suppose that assumptions 1-5 in Key Concept 18.1 are true, but that assumption 6 is not. Does the result in Equation (18.31) hold? Explain.

18.4 Can you compute the BLUE estimator of β if Equation (18.41) holds and you do not know Ω? What if you know Ω?

18.5 Construct an example of a regression model that satisfies the assumption E(u_i|X_i) = 0 but for which E(U|X) ≠ 0_n.
Exercises

18.1 Consider the population regression of test scores against income and the square of income in Equation (8.1).

a. Write the regression in Equation (8.1) in the matrix form of Equation (18.5). Define Y, X, U, and β.

b. Explain how to test the null hypothesis that the relationship between test scores and income is linear against the alternative that it is quadratic. Write the null hypothesis in the form of Equation (18.20). What are R, r, and q?
18.2-18.6 [The statements of Exercises 18.2 through 18.6 could not be recovered from this copy. Exercise 18.2 refers to an accompanying table of sample covariances (legible entries: 0.26, 0.22, 0.32, 0.80, 0.28) and sample means (legible entries: 6.39, 7.24, 4.00, 2.40) for Y, X_1, and X_2.]
18.7 Consider the regression model Y_i = β_1X_i + β_2W_i + u_i, where for simplicity the intercept is omitted and all variables are assumed to have a mean of 0. Suppose X_i is distributed independently of (W_i, u_i), but W_i and u_i might be correlated, and let β̂_1 and β̂_2 be the OLS estimators for this model. Show that:

a. Whether or not W_i and u_i are correlated, β̂_1 →p β_1.

b. [The statement of part (b), concerning the probability limit of the OLS estimators when W_i and u_i are correlated, could not be fully recovered from this copy.]

c. Let β̃_1 be the OLS estimator from the regression of Y on X (the restricted regression that excludes W). Provide conditions under which β̂_1 has a smaller asymptotic variance than β̃_1, allowing for the possibility that W_i and u_i are correlated.

18.8 Consider the regression model Y = Xβ + U. [The specification of the error covariance structure in this exercise could not be recovered from this copy.]

a. Derive an expression for E(UU').

b. Explain how to estimate the model by GLS without explicitly inverting the matrix Ω. (Hint: Transform the model so that the regression errors are i.i.d.)
18.9 This exercise shows that the OLS estimator of a subset of the regression coefficients is consistent under the conditional mean independence assumption stated in Appendix 13.3. Consider the multiple regression model in matrix form Y = Xβ + Wγ + U, where X and W are, respectively, n × k_1 and n × k_2 matrices of regressors. Let X_i' and W_i' denote the ith rows of X and W [as in Equation (18.3)]. Assume that (i) E(u_i | X_i, W_i) = W_i'δ, where δ is a k_2 × 1 vector of unknown parameters; (ii) (X_i, W_i, u_i) are i.i.d.; (iii) (X_i, W_i, u_i) have four finite, nonzero moments; and (iv) there is no perfect multicollinearity. These are Assumptions 1-4 of Key Concept 18.1, with the conditional mean independence assumption (i) replacing the usual conditional mean zero assumption.

a.-b. [Parts (a) and (b), which develop the representation β̂ − β = (X'M_W X)^{-1}X'M_W U for the OLS estimator of β, where M_W = I_n − W(W'W)^{-1}W', could not be fully recovered from this copy.]

c. Show that assumptions (i) and (ii) imply that E(U | X, W) = Wδ.

d. Use (c) and the law of iterated expectations to show that n^{-1}X'M_W U →p 0, and conclude that β̂ →p β.

18.10 Let C be a symmetric idempotent matrix. Show that the eigenvalues of C are either 0 or 1. (Hint: If Cq = γq with q ≠ 0, then γq = Cq = CCq = γCq = γ²q, so γ = γ².)

18.11-18.12 [The full statements of these exercises could not be recovered from this copy; a surviving fragment asks you to show that β̂^Eff.GMM is the efficient GMM estimator.]

18.13 Consider the problem of minimizing the sum of squared residuals subject to the constraint that Rb = r, and let β̃ be the value of b that solves this constrained minimization problem.

a.-c. [Parts (a) through (c), which derive β̃ = β̂ − (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r) and express the difference between the restricted and unrestricted sums of squared residuals as (Rβ̂ − r)'[R(X'X)^{-1}R']^{-1}(Rβ̂ − r), could not be fully recovered from this copy.]

d. Show that F in Equation (18.36) is equivalent to the homoskedasticity-only F-statistic in Equation (7.13).

18.14 [The setup of this exercise could not be fully recovered from this copy; its final part follows.]

ii. Show that the method for computing the J-statistic described in Key Concept 12.6 (using a homoskedasticity-only F-statistic) and the formula in Equation (18.63) produce the same value for the J-statistic. [Hint: Use the results in (a), (b.i), and Exercise 18.13.]
18.15 (Consistency of clustered standard errors.) Consider the panel data model Y_it = βX_it + α_i + u_it, where all variables are scalars. Assume that Assumptions 1, 2, and 4 in Key Concept 10.3 hold, and strengthen Assumption 3 so that X_it and u_it have eight nonzero finite moments. Suppose, however, that the error is conditionally serially correlated, so that Assumption 5 does not hold. Let M = I_T − T^{-1}ιι', where ι is a T × 1 vector of 1's. Also let Y_i = (Y_i1 Y_i2 ... Y_iT)', X_i = (X_i1 X_i2 ... X_iT)', u_i = (u_i1 u_i2 ... u_iT)', Ỹ_i = MY_i, X̃_i = MX_i, and ũ_i = Mu_i. For the asymptotic calculations in this problem, suppose that T is fixed and n → ∞.

a. Show that the fixed effects estimator of β from Section 10.3 can be written as β̂ = (Σ_{i=1}^n X̃_i'X̃_i)^{-1} Σ_{i=1}^n X̃_i'Ỹ_i.

b. Show that β̂ − β = (Σ_{i=1}^n X̃_i'X̃_i)^{-1} Σ_{i=1}^n X̃_i'ũ_i. (Hint: M is idempotent.)

c. Let Q_X̃ = E(X̃_i'X̃_i). Show that (1/n)Σ_{i=1}^n X̃_i'X̃_i →p Q_X̃.

d. Let η_i = X̃_i'ũ_i/√T and σ_η² = var(η_i). [The remainder of part (d), which states the asymptotic normal distribution of √n(β̂ − β) in terms of σ_η² and Q_X̃, could not be fully recovered from this copy.]

e.-f. [These parts, which concern the infeasible clustered variance estimator σ̃²_{η,clustered} = (1/n)Σ_{i=1}^n (X̃_i'u_i/√T)², constructed using the true errors instead of the residuals, and show that σ̃²_{η,clustered} →p σ_η², could not be fully recovered from this copy.]

g. Let û_i = Ỹ_i − X̃_iβ̂ denote the fixed effects residuals. Use an argument like that used to show Equation (17.16) to show that σ̂²_{η,clustered} − σ̃²_{η,clustered} →p 0.
APPENDIX 18.1  Summary of Matrix Algebra

This appendix summarizes the matrix algebra used in Chapter 18. Its purpose is to review some concepts and definitions from a course in linear algebra. An n × m matrix A is a collection of numbers, or elements, arranged in n rows and m columns:

A = [ a_11  a_12  ⋯  a_1m ]
    [ a_21  a_22  ⋯  a_2m ]
    [  ⋮     ⋮          ⋮  ]
    [ a_n1  a_n2  ⋯  a_nm ],

where a_ij is the (i, j) element of A, that is, the element that appears in its ith row and jth column. An n × 1 matrix is a column vector, and a 1 × m matrix is a row vector.
Types of Matrices
,\ quart, \,l'mmuric, ttnd diagonal matric~.
un.tl d..:ments equ.tl/.ero, that IS. if the square matnx A io; dta~ona l . th~.:n a 0 fnr i
Sprcialmatric:t~.
An 1m~rtant matrix 1s the identity malrh<, 1,. v. h1ch is ,,n n X 11 dtaunna l mutrix wit h ones on the d iagonal 'lhc nuiJ matrix O, x, IS the 11 X m mutrtx wuh all
switchc~
is. the
tran,po~l.! of n ma trix turns the 11 X m matrix A tntO the m X n matrix, which i~ denntcu
hy A'. where the {i,J) ell.:mcnt of 1-\ bccom~.:::. the (j.i) element of A '; $:lid dtCfcn.: ntly. the
trunspose t)f the mnt11'c: A turns lhc rows of A into tbc columns of A '. ll n "the 11.j) dem nt ,l( .\ then A' (th~ tra ~pose of A) j..,
A =
a[1
[""
a,
fl:t
Dnt
a~
(lnl
au.,
n,...,J
The tran,po--c of a vector~.> a :.pcctal ca'< o l lhL tr:~n' ~. ''' ' m.Jirt>..Thu t. " tran,.
p<-..; ,f 1 \"~Ctl
vector, tht.:n
tiS
transp()l:(.
l11~.: t rnno;pos~
i~ th~.:
I > 11 rtm
b 15 an n >< I column
\'CCior
Matrix addition. Two matrices with the same dimensions can be added together, element by element; that is, if A and B are both n × m, then C = A + B is the n × m matrix with (i, j) element c_ij = a_ij + b_ij.
Vector and matrix multiplication. Let a and b be two n × 1 column vectors. Then the product of the transpose of a (which is itself a row vector) with b is a'b = Σ_{i=1}^n a_i b_i. Applying this definition with b = a yields a'a = Σ_{i=1}^n a_i².

Similarly, the matrices A and B can be multiplied together if they are conformable, that is, if the number of columns of A equals the number of rows of B. Specifically, suppose A has dimension n × m and B has dimension m × r. Then the product of A and B is an n × r matrix, C; that is, C = AB, where the (i, j) element of C is Σ_{k=1}^m a_ik b_kj. Said differently, the (i, j) element of AB is the product of multiplying the row vector that is the ith row of A with the column vector that is the jth column of B.

The product of a scalar d with the matrix A has the (i, j) element d a_ij; that is, each element of A is multiplied by the scalar d.

Some useful properties of matrix addition and multiplication. Let A, B, and C be conformable matrices. Then:

a. A + B = B + A;
b. (A + B) + C = A + (B + C);
c. (A + B)' = A' + B';
d. If A is n × m, then A I_m = A and I_n A = A;
e. A(BC) = (AB)C;
f. (A + B)C = AC + BC; and
g. (AB)' = B'A'.
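These properties are easy to verify numerically. The NumPy sketch below (random matrices, purely illustrative) checks conformability, the identity property (d), associativity (e), and the transpose rule (g).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

AB = A @ B                                      # conformable: (2x3)(3x4) -> 2x4
assert AB.shape == (2, 4)
assert np.allclose(np.eye(2) @ A, A)            # property d: I_n A = A
assert np.allclose(A @ (B @ C), (A @ B) @ C)    # property e: A(BC) = (AB)C
assert np.allclose(AB.T, B.T @ A.T)             # property g: (AB)' = B'A'
```

Note that matrix multiplication, unlike addition, is not commutative: AB and BA generally differ (and need not both be defined).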
Positive definite and positive semidefinite matrices. Let V be an n × n square matrix. Then V is positive definite if c'Vc > 0 for all nonzero n × 1 vectors c. Similarly, V is positive semidefinite if c'Vc ≥ 0 for all nonzero n × 1 vectors c.

Linearly independent vectors. The k vectors a_1, a_2, ..., a_k are linearly independent if there do not exist nonzero scalars c_1, ..., c_k such that c_1a_1 + c_2a_2 + ⋯ + c_ka_k = 0.

The rank and inverse of a matrix. The rank of a matrix is the number of its linearly independent columns; if all the columns of the n × m matrix A are linearly independent, A is said to have full column rank. If A is n × n with rank(A) = n, then A is nonsingular and its inverse A^{-1} exists, satisfying A^{-1}A = AA^{-1} = I_n. If the n × m matrix A has full column rank, then A'A is nonsingular.

The matrix square root. Let V be an n × n positive definite matrix. The matrix square root of V is defined to be an n × n matrix F such that F'F = V. The matrix square root of a positive definite matrix will always exist, but it is not unique. The matrix square root has the property that FV^{-1}F' = I_n. In addition, the matrix square root of a positive definite matrix is invertible, so that F'^{-1}VF^{-1} = I_n.
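One way to construct a matrix square root is from the eigendecomposition V = QΛQ': taking F = Λ^{1/2}Q' gives F'F = QΛQ' = V. The sketch below (function name illustrative) also checks the property FV^{-1}F' = I_n; the Cholesky factor provides a different square root, illustrating non-uniqueness.

```python
import numpy as np

def matrix_sqrt(V):
    # F = Lambda^{1/2} Q' from the eigendecomposition of positive definite V,
    # so that F'F = Q Lambda Q' = V.
    lam, Q = np.linalg.eigh(V)
    return np.sqrt(lam)[:, None] * Q.T   # scale row i of Q' by sqrt(lambda_i)

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))
V = A @ A.T + 3 * np.eye(3)              # symmetric positive definite
F = matrix_sqrt(V)
```

Verifying F'F = V and FV^{-1}F' = I confirms the two defining properties stated above.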
Eigenvalues and eigenvectors. Let V be an n × n matrix. If the n × 1 vector q and the scalar λ satisfy Vq = λq, then λ is an eigenvalue of V and q is an eigenvector corresponding to λ. A symmetric n × n matrix V has n real eigenvalues and can be written in terms of its eigenvalues and eigenvectors as V = QΛQ', where Λ is a diagonal n × n matrix with diagonal elements that equal the eigenvalues of V, and Q is an n × n matrix consisting of the eigenvectors of V, arranged so that the ith column of Q is the eigenvector corresponding to the eigenvalue that is the ith diagonal element of Λ. The eigenvectors are orthonormal, so that Q'Q = I_n. If V is positive definite, its eigenvalues are positive real numbers.
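The decomposition V = QΛQ' with orthonormal Q is exactly what `numpy.linalg.eigh` returns for a symmetric matrix. A quick check on a simulated positive definite V (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(4, 4))
V = A @ A.T + np.eye(4)              # symmetric positive definite

lam, Q = np.linalg.eigh(V)           # eigenvalues (ascending) and eigenvectors
assert np.allclose(Q @ np.diag(lam) @ Q.T, V)   # V = Q Lambda Q'
assert np.allclose(Q.T @ Q, np.eye(4))          # orthonormal: Q'Q = I_n
assert np.all(lam > 0)               # positive definite => positive eigenvalues
```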
Idempotent matrices. A matrix C is idempotent if C is square and CC = C. If C is an n × n idempotent matrix that is also symmetric, then C is positive semidefinite, and C has r eigenvalues that equal 1 and n − r eigenvalues that equal 0, where r = rank(C) (Exercise 18.10).
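A leading regression example is the annihilator matrix M_X = I_n − X(X'X)^{-1}X', which is symmetric and idempotent with rank n − (k + 1). The sketch below (simulated X, names illustrative) confirms the 0/1 eigenvalue pattern numerically.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k1 = 8, 3                          # k1 = k + 1 columns of X
X = rng.normal(size=(n, k1))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # M_X

assert np.allclose(M @ M, M)          # idempotent: CC = C
assert np.allclose(M, M.T)            # symmetric
lam = np.sort(np.linalg.eigvalsh(M))
assert np.allclose(lam[:k1], 0.0)     # k1 eigenvalues equal 0
assert np.allclose(lam[k1:], 1.0)     # n - k1 eigenvalues equal 1
```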
APPENDIX 18.2  Multivariate Distributions

This appendix collects various definitions and facts about distributions of vectors of random variables. We start by defining the mean and covariance matrix of the n-dimensional random variable V; next we define the multivariate normal distribution.

The mean of V is the n × 1 vector μ_V = E(V), with ith element E(V_i). The covariance matrix of V is the matrix consisting of the variances var(V_i), i = 1, ..., n, along the diagonal and the (i, j) off-diagonal elements cov(V_i, V_j). In matrix form, the covariance matrix Σ_V is

Σ_V = E[(V − μ_V)(V − μ_V)'].   (18.72)

The n-dimensional vector random variable V has a multivariate normal distribution with mean μ_V and covariance matrix Σ_V if it has the joint probability density function

f(V) = [1/√((2π)^n det(Σ_V))] exp[−(1/2)(V − μ_V)'Σ_V^{-1}(V − μ_V)].   (18.73)

An important fact about the multivariate normal distribution is that if two jointly normally distributed random variables are uncorrelated (equivalently, have a block-diagonal covariance matrix), then they are independently distributed: if V_1 and V_2 are jointly normally distributed and cov(V_1, V_2) = 0, then V_1 and V_2 are independent.
Linear functions of normal random variables and the chi-squared distribution. Let V be an m × 1 random variable distributed N(μ_V, Σ_V), let A and B be nonrandom a × m and b × m matrices, and let d be a nonrandom a × 1 vector. Then

d + AV is distributed N(d + Aμ_V, AΣ_V A');   (18.74)

cov(AV, BV) = AΣ_V B'; and   (18.75)

if AΣ_V B' = 0_{a×b}, then AV and BV are independently distributed.   (18.76)

If Σ_V is positive definite, then

(V − μ_V)'Σ_V^{-1}(V − μ_V) is distributed χ²_m.   (18.77)

Let U be an m-dimensional multivariate standard normal random variable with distribution N(0, I_m). If C is symmetric and idempotent, then

U'CU has a χ²_r distribution, where r = rank(C).   (18.78)
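The last fact can be checked by simulation: with C a rank-r symmetric idempotent matrix and U ~ N(0, I_m), the quadratic form U'CU should have the χ²_r mean r and variance 2r. A Monte Carlo sketch (all names and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
m, r = 6, 2
A = rng.normal(size=(m, r))
C = A @ np.linalg.solve(A.T @ A, A.T)     # projection: symmetric, idempotent, rank r

U = rng.normal(size=(200_000, m))          # draws of U ~ N(0, I_m)
q = np.einsum('ij,jk,ik->i', U, C, U)      # U'CU for each draw
```

With 200,000 draws, the sample mean and variance of q should sit close to r and 2r, the moments of the χ²_r distribution.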
APPENDIX 18.3  Derivation of the Asymptotic Distribution of β̂

This appendix provides the derivation of the asymptotic normal distribution of √n(β̂ − β) given in Equation (18.12). An implication of this result is that β̂ →p β.

First consider the "denominator" matrix X'X/n = (1/n)Σ_{i=1}^n X_iX_i' in Equation (18.15). By the second assumption in Key Concept 18.1, X_i is i.i.d., so X_iX_i' is i.i.d. By the third assumption in Key Concept 18.1, each element of X_i has four moments, so, by the Cauchy-Schwarz inequality (Appendix 17.2), each element of X_iX_i' has two moments. Because X_iX_i' is i.i.d. with two moments, X'X/n obeys the law of large numbers, so X'X/n →p E(X_iX_i'). This is true for all the elements of X'X/n, so X'X/n →p Q_X.

Next consider the "numerator" matrix in Equation (18.15), X'U/√n = (1/√n)Σ_{i=1}^n V_i, where V_i = X_iu_i. By the first assumption in Key Concept 18.1 and the law of iterated expectations, E(V_i) = E[X_i E(u_i|X_i)] = 0. By the Cauchy-Schwarz inequality, the second moments of the elements of V_i are finite by the third least squares assumption. This is true for every element, so E(V_iV_i') = Σ_V is finite and, we assume, positive definite. Thus the multivariate central limit theorem of Key Concept 18.2 applies, so

X'U/√n = (1/√n)Σ_{i=1}^n V_i →d N(0, Σ_V).   (18.79)

The result in Equation (18.12) follows from Equations (18.15) and (18.79), the consistency of X'X/n, the fourth least squares assumption (which ensures that (X'X)^{-1} exists), and Slutsky's theorem.
APPENDIX 18.4  Distributions of the t-Statistic and F-Statistic with Normal Errors

This appendix presents the proofs of the distributions, under the null hypothesis, of the homoskedasticity-only t-statistic in Equation (18.35) and the homoskedasticity-only F-statistic in Equation (18.37), assuming that all six assumptions in Key Concept 18.1 hold.

The t-statistic has a Student t_{n−k−1} distribution if it can be written in the form z/√(W/m), where z is a standard normal random variable, W has a χ²_m distribution with m = n − k − 1, and z and W are independently distributed. [The algebra expressing the t-statistic in this form, Equation (18.80), could not be fully recovered from this copy.]

Similarly, the F-statistic has an F_{q,n−k−1} distribution if it can be written as F = (W_1/q)/(W_2/(n − k − 1)), where (i) W_1 is distributed χ²_q, (ii) W_2 is distributed χ²_{n−k−1}, and (iii) W_1 and W_2 are independently distributed (Appendix 17.1). To express F in this form, let W_1 = (Rβ̂ − r)'[R(X'X)^{-1}R'σ_u²]^{-1}(Rβ̂ − r) and W_2 = (n − k − 1)s_û²/σ_u². Substitution of these definitions into Equation (18.37) shows that F = (W_1/q)/(W_2/(n − k − 1)).

i. Under the null hypothesis, Rβ̂ − r is normally distributed with mean 0 and, conditional on X, covariance matrix R(X'X)^{-1}R'σ_u². Thus, by Equation (18.77) in Appendix 18.2, (Rβ̂ − r)'[R(X'X)^{-1}R'σ_u²]^{-1}(Rβ̂ − r) is distributed χ²_q, proving (i).

ii. Requirement (ii) is shown in Equation (18.31).

iii. It has already been shown that β̂ and s_û² are independently distributed. It follows that Rβ̂ − r and s_û² are independently distributed, which in turn implies that W_1 and W_2 are independently distributed, proving (iii).
APPENDIX 18.5  Proof of the Gauss-Markov Theorem for Multiple Regression

This appendix proves the Gauss-Markov theorem for the multiple regression model. Let β̃ be a linear, conditionally unbiased estimator of β, so that β̃ = A'Y and E(β̃|X) = β, where A is an n × (k + 1) matrix that can depend on X and nonrandom constants. We show that var(c'β̂|X) ≤ var(c'β̃|X) for all (k + 1)-dimensional vectors c, where the inequality holds with equality only if β̃ = β̂.

Because β̃ is linear, it can be written as β̃ = A'Y = A'(Xβ + U) = (A'X)β + A'U. By the first Gauss-Markov condition, E(U|X) = 0, so E(β̃|X) = (A'X)β; but because β̃ is conditionally unbiased, E(β̃|X) = β = (A'X)β, which implies that A'X = I_{k+1}. Thus β̃ = β + A'U, so var(β̃|X) = var(A'U|X) = A'var(U|X)A = σ_u²A'A, where the third equality follows because A can depend on X but not U, and the final equality follows from the second Gauss-Markov condition. That is, if β̃ is linear and conditionally unbiased, then β̃ = β + A'U and var(β̃|X) = σ_u²A'A, where A'X = I_{k+1}.

Now let Â = X(X'X)^{-1}, so that the OLS estimator is β̂ = Â'Y, and let D = A − Â. Note that Â'X = (X'X)^{-1}X'X = I_{k+1}, so D'X = A'X − Â'X = 0. Also, Â'Â = (X'X)^{-1} and Â'D = (X'X)^{-1}X'D = 0. Substituting A = Â + D into the expression for the conditional variance yields

var(β̃|X) = σ_u²(Â + D)'(Â + D) = σ_u²[(X'X)^{-1} + D'D],   (18.84)

so var(c'β̃|X) − var(c'β̂|X) = σ_u²c'D'Dc ≥ 0. The inequality in Equation (18.84) holds for all linear combinations c'β̃, and it holds with equality for all nonzero c only if D = 0, that is, only if A = Â or, equivalently, β̃ = β̂. Thus c'β̂ has the smallest variance of all linear conditionally unbiased estimators of c'β; that is, the OLS estimator is BLUE.
APPENDIX 18.6  Proof That the Efficient GMM Estimator Is Efficient

This appendix proves the claim in Section 18.7 that β̂^Eff.GMM is the efficient GMM estimator; that is, that c'Σ^Eff.GMM c ≤ c'Σ^β̂ c for all vectors c, where Σ^β̂ = (Q_XZ A Q_ZX)^{-1} Q_XZ A H A Q_ZX (Q_XZ A Q_ZX)^{-1} is the asymptotic variance of the GMM estimator with weight matrix A [Equation (18.60)] and Σ^Eff.GMM = (Q_XZ H^{-1} Q_ZX)^{-1} [Equation (18.67)].

Write H in terms of its matrix square root, H = F'F, so that H^{-1} = F^{-1}F'^{-1}.   (18.85)

Then the difference between the two quadratic forms can be rewritten as

c'Σ^β̂ c − c'Σ^Eff.GMM c = d'd − d'D(D'D)^{-1}D'd = d'[I − D(D'D)^{-1}D']d,   (18.86)

where d = FAQ_ZX(Q_XZ A Q_ZX)^{-1}c and D = F'^{-1}Q_ZX. Now I − D(D'D)^{-1}D' is a symmetric idempotent matrix (Exercise 18.5); as a result, it is positive semidefinite, so d'[I − D(D'D)^{-1}D']d ≥ 0, proving that β̂^Eff.GMM is the efficient GMM estimator. When the errors are homoskedastic, H = Q_ZZσ_u², in which case the efficient GMM estimator is TSLS; thus TSLS is the efficient GMM estimator under homoskedasticity.
APPENDIX 18.7  Asymptotic Distribution of the J-Statistic Under Homoskedasticity

The J-statistic is defined in Equation (18.63). First note that

Û = Y − X(X'P_Z X)^{-1}X'P_Z Y = (Xβ + U) − X(X'P_Z X)^{-1}X'P_Z(Xβ + U) = [I − X(X'P_Z X)^{-1}X'P_Z]U,   (18.88)

where the final equality follows by simplifying the preceding expression. Because Z'Z is symmetric and positive definite, it can be written in terms of its matrix square root, Z'Z = (Z'Z)^{1/2}'(Z'Z)^{1/2}, and this matrix square root is invertible, so (Z'Z)^{-1} = (Z'Z)^{-1/2}(Z'Z)^{-1/2}', where (Z'Z)^{-1/2} = [(Z'Z)^{1/2}]^{-1}. Thus P_Z can be written as

P_Z = Z(Z'Z)^{-1}Z' = BB', where B = Z(Z'Z)^{-1/2}.   (18.89)

Substituting Equations (18.88) and (18.89) into Û'P_Z Û and collecting terms yields

Û'P_Z Û = U'[I − P_Z X(X'P_Z X)^{-1}X']BB'[I − X(X'P_Z X)^{-1}X'P_Z]U = U'B M_{B'X} B'U,   (18.90)

where M_{B'X} = I − B'X(X'BB'X)^{-1}X'B.

The asymptotic null distribution of Û'P_Z Û is found by computing the limits in probability and in distribution of the various terms in the final expression in Equation (18.90) under the null hypothesis. Under the null hypothesis, E(Z_iu_i) = 0, so Z'U/√n has mean zero and the central limit theorem applies: Z'U/√n →d N(0, Q_ZZσ_u²). In addition, Z'Z/n →p Q_ZZ and X'Z/n →p Q_XZ. Thus B'U = (Z'Z/n)^{-1/2}'(Z'U/√n) →d σ_u z, where z is distributed N(0, I_{m+r+1}), and B'X = (Z'Z/n)^{-1/2}'(Z'X/n) →p Q_ZZ^{-1/2}Q_ZX, so M_{B'X} →p M_{Q̃}, where Q̃ = Q_ZZ^{-1/2}Q_ZX. Thus

Û'P_Z Û →d σ_u² z'M_{Q̃}z.   (18.91)

Under the null hypothesis, the TSLS estimator is consistent and the coefficients in the regression of Û on Z converge in probability to zero, so the denominator in the definition of the J-statistic is a consistent estimator of σ_u²:

Û'M_Z Û/(n − m − r − 1) →p σ_u².   (18.92)

From the definition of the J-statistic and Equations (18.91) and (18.92), it follows that

J = (Û'P_Z Û)/[Û'M_Z Û/(n − m − r − 1)] →d z'M_{Q̃}z.   (18.93)

Because z is a standard normal random vector and M_{Q̃} is a symmetric idempotent matrix, J is asymptotically distributed as a chi-squared random variable with degrees of freedom that equal the rank of M_{Q̃} [Equation (18.78)]. Because Q̃ = Q_ZZ^{-1/2}Q_ZX has full column rank k + r + 1, the rank of M_{Q̃} is (m + r + 1) − (k + r + 1) = m − k. Thus J →d χ²_{m−k}; that is, under homoskedasticity, the J-statistic defined in Equation (18.63) has an asymptotic χ²_{m−k} distribution under the null hypothesis.
Appendix

TABLE 1  The Cumulative Standard Normal Distribution Function, Pr(Z ≤ z)

[The body of Table 1, which tabulates Pr(Z ≤ z) for z from −2.9 to 2.9 in increments of 0.01, did not survive this copy intact and is omitted here.]

This table can be used to calculate Pr(Z ≤ z), where Z is a standard normal variable. For example, when z = 1.17, this probability is 0.8790, which is the table entry for the row labeled 1.1 and the column labeled 7.
APPENDIX TABLE 2  Critical Values for Two-Sided and One-Sided Tests Using the Student t Distribution
Significance level: 20% (2-sided) / 10% (1-sided); 10% (2-sided) / 5% (1-sided); 5% (2-sided) / 2.5% (1-sided); 2% (2-sided) / 1% (1-sided); 1% (2-sided) / 0.5% (1-sided).
[Table body: critical values by degrees of freedom, 1 to 30, 60, 90, 120, and ∞.]
This table contains critical values for two-sided (≠) and one-sided (>) alternative hypotheses. The critical value for a one-sided (<) test is the negative of the one-sided (>) critical value shown in the table. For example, 2.13 is the critical value for a two-sided test with a significance level of 5%, using the Student t distribution with 15 degrees of freedom.
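The ∞ row of the t table (the large-sample normal critical values 1.64, 1.96, and 2.58) can be recovered by inverting the normal CDF numerically. A bisection sketch, assuming nothing beyond the standard library:

```python
import math

def std_normal_cdf(z):
    """Pr(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_critical(two_sided_level):
    """Critical value z* such that Pr(|Z| > z*) equals the two-sided level."""
    target = 1.0 - two_sided_level / 2.0
    lo, hi = 0.0, 10.0
    for _ in range(80):                 # bisection: halve the bracket 80 times
        mid = (lo + hi) / 2.0
        if std_normal_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for level in (0.10, 0.05, 0.01):
    print(level, round(z_critical(level), 2))   # 1.64, 1.96, 2.58
```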
APPENDIX TABLE 3  Percentiles of the Chi-Squared Distribution
Significance level: 10%, 5%, 1% (the 90th, 95th, and 99th percentiles).
[Table body: critical values by degrees of freedom, 1 to 30.]
This table contains the 90th, 95th, and 99th percentiles of the χ²_m distribution. These serve as critical values for tests with significance levels of 10%, 5%, and 1%.
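Because a χ²_m variable is a sum of m squared independent standard normals, the table's percentiles can be checked by simulation. A Monte Carlo sketch (the replication count and seed are arbitrary choices):

```python
import random

random.seed(0)

def chi2_percentile_mc(m, q, reps=100_000):
    """Monte Carlo estimate of the q-th quantile of the chi-squared(m) distribution."""
    draws = sorted(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))
                   for _ in range(reps))
    return draws[int(q * reps)]

# 95th percentile for m = 10; the table reports 18.31.
print(round(chi2_percentile_mc(10, 0.95), 2))
```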
APPENDIX TABLE 4  Critical Values for the F_{m,∞} Distribution
Significance level: 10%, 5%, 1%.
[Table body: critical values by numerator degrees of freedom m, 1 to 30.]
This table contains the 90th, 95th, and 99th percentiles of the F_{m,∞} distribution. These serve as critical values for tests with significance levels of 10%, 5%, and 1%.
APPENDIX TABLE 5A  Critical Values for the F_{n1,n2} Distribution, 10% Significance Level
[Table body: 90th percentiles of F_{n1,n2} by numerator degrees of freedom n1 = 1 to 10 and denominator degrees of freedom n2.]
This table contains the 90th percentile of the F_{n1,n2} distribution, which serves as the critical value for a test with a 10% significance level.
APPENDIX TABLE 5B  Critical Values for the F_{n1,n2} Distribution, 5% Significance Level
[Table body: 95th percentiles of F_{n1,n2} by numerator degrees of freedom n1 and denominator degrees of freedom n2.]
This table contains the 95th percentile of the F_{n1,n2} distribution, which serves as the critical value for a test with a 5% significance level.
APPENDIX TABLE 5C  Critical Values for the F_{n1,n2} Distribution, 1% Significance Level
[Table body: 99th percentiles of F_{n1,n2} by numerator degrees of freedom n1 and denominator degrees of freedom n2 = 1 to 30, 60, 90, 120, ∞.]
This table contains the 99th percentile of the F_{n1,n2} distribution, which serves as the critical value for a test with a 1% significance level.
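The tables are linked: the ∞ row of the t table is the standard normal, an F(1, ∞) critical value is the square of the corresponding two-sided normal critical value, and an F(m, ∞) critical value is the χ²_m critical value divided by m. A quick check using values quoted in the appendix notes above:

```python
# t(infinity) two-sided 5% critical value is 1.96; squaring gives F(1, infinity).
z_5pct = 1.96
print(round(z_5pct ** 2, 2))      # 3.84, the 5% entry for m = 1 in Table 4

# chi-squared(10) 95th percentile is 18.31; dividing by m = 10 gives F(10, infinity).
chi2_10 = 18.31
print(round(chi2_10 / 10, 2))     # 1.83
```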
Answers to "Review the Concepts" Questions

Chapter 1
1.1 The experiment that you design should have one or more "treatment" groups and a control group. For example, one "treatment" could be studying for four hours, and the control would be not studying (no treatment). Students would be randomly assigned to the treatment and control groups, and the causal effect of hours of study on midterm performance would be estimated by comparing the average midterm grades of each of the treatment groups to that of the control group. The largest impediment is to ensure that the students in the different treatment groups spend the correct number of hours studying. How can you make sure that the students in the control group do not study at all, since that might jeopardize their grade? How can you make sure that all students in the treatment group actually study for four hours?
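The design in answer 1.1 can be sketched as a simulation. The group sizes, the noise level, and the +8-point treatment effect are all invented for illustration:

```python
import random

random.seed(1)

n = 2000
students = list(range(n))
random.shuffle(students)                    # random assignment removes selection bias
treatment, control = students[: n // 2], students[n // 2 :]

def midterm_grade(treated):
    base = 60.0 + random.gauss(0.0, 10.0)   # student-specific noise
    return base + (8.0 if treated else 0.0) # assumed causal effect of studying

avg_treat = sum(midterm_grade(True) for _ in treatment) / len(treatment)
avg_control = sum(midterm_grade(False) for _ in control) / len(control)
print(round(avg_treat - avg_control, 1))    # close to the true effect of 8
```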
Chapter 2
2.1 These outcomes are random because they are not known with certainty until they actually occur.
Chapter 3
3.1 The population mean is the average in the population.
Chapter 4
4.1 β1 is the value of the slope in the population regression. This value is unknown. β̂1 (an estimator) gives a formula for estimating the unknown value of β1 from a sample. Similarly, ui is the value of the regression error for the ith observation; ui is the difference between Yi and the population regression line β0 + β1Xi. Because the values of β0 and β1 are unknown, the value of ui is unknown. In contrast, ûi is the difference between Yi and β̂0 + β̂1Xi; thus ûi is an estimator of ui. Finally, E(Y|X) = β0 + β1X is unknown because the values of β0 and β1 are unknown; an estimator of this is the OLS predicted value β̂0 + β̂1X.
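The distinction between β1 and β̂1, and between ui and ûi, can be seen numerically. A sketch with invented population values β0 = 3 and β1 = 2:

```python
import random

random.seed(2)

beta0, beta1 = 3.0, 2.0                    # population values, unknown in practice
n = 1000
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [beta0 + beta1 * xi + random.gauss(0.0, 1.0) for xi in x]   # the gauss term is u_i

# OLS estimators beta0-hat and beta1-hat
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

# u-hat_i, the residual, estimates the unobserved error u_i
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 2), round(b1, 2))          # close to the population values 3 and 2
```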
4.2 There are many examples; here is one for each assumption. If the value of X is assigned in a randomized controlled experiment, then assumption (1) is satisfied. For the class size regression, if X = class size is correlated with the regression error, then assumption (1) is violated.
Chapter 5
Chapter 6
6.1 It is likely that β̂1 will be biased because of omitted variables. Schools in more affluent districts are likely to spend more on all educational inputs and thus would have smaller class sizes, more books in the library, and more computers. These other inputs may lead to higher average test scores. Thus β̂1 will be biased upward, because the number of computers per student is positively correlated with omitted variables that have a positive effect on average test scores.
6.2 If X1 increases by 3 units and X2 is unchanged, then Y is expected to change by 3β1 units; if X2 decreases by 5 units and X1 is unchanged, then Y is expected to change by −5β2 units. If X1 increases by 3 units and X2 decreases by 5 units, then Y is expected to change by 3β1 − 5β2 units.
6.3 The regression cannot determine the effect of a change in one of the regressors, holding the other regressors constant, because if the value of one of the perfectly multicollinear regressors is held constant, then so is the value of the other; that is, there is no independent variation in one multicollinear regressor. Two examples of perfectly multicollinear regressors are (1) a person's weight measured in pounds and the same person's weight measured in kilograms, and (2) the fraction of students who are male and the constant term, when the data come from all-male schools.
6.4 If X1 and X2 are highly correlated, most of the variation in X1 coincides with the variation in X2. Thus there is little variation in X1, holding X2 constant, that can be used to estimate the partial effect of X1 on Y.
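The pounds/kilograms example of perfect multicollinearity in answer 6.3 can be verified directly: after demeaning, the 2×2 moment matrix of the two regressors is singular. A sketch with made-up weights:

```python
pounds = [130.0, 150.0, 172.0, 145.0, 161.0]      # hypothetical weights
kilos = [p * 0.45359237 for p in pounds]          # an exact linear function of pounds

def demean(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

p, k = demean(pounds), demean(kilos)
sxx = sum(a * a for a in p)
skk = sum(b * b for b in k)
sxk = sum(a * b for a, b in zip(p, k))
det = sxx * skk - sxk ** 2                        # determinant of the moment matrix
print(abs(det) < 1e-6)                            # True: no independent variation
```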
Chapter 7
Chapter 8
8.1 The regression function will look like the quadratic regression in Figure 8.3 or the logarithmic function in Figure 8.4. The first of these is specified as the regression of Y onto X and X², and the second is the regression of Y onto ln(X). There are many economic relations with this shape; for example, this shape might represent the decreasing marginal productivity of labor in a production function.
8.2 Taking logarithms of both sides of the equation yields ln(Q) = β0 + β1ln(K) + β2ln(L) + β3ln(M) + u, where β0 = ln(A). The production function parameters can be estimated by regressing the logarithm of production on the logarithms of capital, labor, and raw materials.
8.3 A 2% increase in GDP means that ln(GDP) increases by 0.02. The implied change in ln(m) is 1.0 × 0.02 = 0.02, which corresponds to a 2% increase in m. With R measured in percentage points, the increase in R is from 4.0 to 5.0, or 1.0 percentage point. This leads to a change in ln(m) of −0.02 × 1.0 = −0.02, which corresponds to a 2% fall in m.
8.4 You want to compare the fit of your linear regression to the fit of a nonlinear regression. Your answer will depend on the nonlinear regression that you choose for the comparison. You might test your linear regression against a quadratic regression by adding X² to the linear regression. If the coefficient on X² is significantly different from zero, then you can reject the null hypothesis that the relationship is linear in favor of the alternative that it is quadratic.
8.5 Augmenting the equation in Question 8.2 with an interaction term yields ln(Q) = β0 + β1ln(K) + β2ln(L) + β3ln(M) + β4[ln(K) × ln(L)] + u. The partial effect of ln(L) on ln(Q) is now β2 + β4ln(K).
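The log approximations behind answer 8.3 can be checked with exact arithmetic; the elasticity 1.0 and semi-elasticity −0.02 are the coefficients assumed in the question:

```python
import math

dln_gdp = math.log(1.02)               # a 2% rise in GDP raises ln(GDP) by about 0.02
dln_m = 1.0 * dln_gdp                  # elasticity of 1.0 with respect to GDP
print(round(100 * (math.exp(dln_m) - 1.0), 1))        # 2.0: m rises by about 2%

dln_m_rate = -0.02 * 1.0               # a 1-point rise in R, semi-elasticity -0.02
print(round(100 * (math.exp(dln_m_rate) - 1.0), 1))   # -2.0: m falls by about 2%
```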
Chapter 9
9.1 See Key Concept 9.1 and the paragraph that immediately follows it.
Chapter 10
10.1 Panel data (also called longitudinal data) refers to data for n different entities observed at T different time periods. One of the subscripts, i, identifies the entity, and the other subscript, t, identifies the time period.
10.2 A person's ability or motivation might affect both education and earnings. More able individuals tend to complete more years of schooling, and, for a given level of education, they tend to have higher earnings. The same is true for highly motivated people. The state of the macroeconomy is a time-specific variable that affects both earnings and education: during recessions, unemployment is high, earnings are low, and enrollment in colleges increases.
Chapter 11
11.1 Because Y is binary, its predicted value is the probability that Y = 1. A probability must be between 0 and 1, so the value of 1.3 is nonsensical.
11.2 The results in column (1) are for the linear probability model. The coefficients in a linear probability model show the effect of a unit change in X on the probability that Y = 1. The results in columns (2) and (3) are for the logit and probit models. These coefficients are difficult to interpret. To compute the effect of a change in X on the probability that Y = 1 for the logit and probit models, use the procedures outlined in Key Concept 11.2.
11.3 She should use a logit or probit model. These models are preferred to the linear probability model because they constrain the regression's predicted values to be between 0 and 1. Usually, probit and logit regressions give similar results, and she should use the method that is easier to implement with her software.
11.4 OLS cannot be used because the regression function is not a linear function of the regression coefficients (the coefficients appear inside the nonlinear functions Φ or F). The maximum likelihood estimator is efficient and can handle regression functions that are nonlinear in the parameters.
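Answer 11.2's point, that logit and probit coefficients translate into probability effects only through the link function, can be illustrated. The coefficients β0 = −2 and β1 = 0.5 are invented for the sketch:

```python
import math

def probit_prob(z):
    """Cumulative standard normal (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_prob(z):
    """Logistic function (the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))

b0, b1, x = -2.0, 0.5, 3.0                 # hypothetical coefficients and X value
for link in (probit_prob, logit_prob):
    # Effect of a one-unit change in X on Pr(Y = 1): a difference of probabilities,
    # not the coefficient b1 itself.
    effect = link(b0 + b1 * (x + 1)) - link(b0 + b1 * x)
    print(link.__name__, round(effect, 3))
```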
Chapter 12
12.1 An increase in the regression error, u, shifts out the demand curve, leading to an increase in both price and quantity. Thus ln(P^butter) is positively correlated with the regression error. Because of this positive correlation, the OLS estimator of β1 is inconsistent and is likely to be larger than the true value of β1.
12.2 The number of trees per capita in the state is exogenous because it is plausibly uncorrelated with the error in the demand function. However, it probably is also uncorrelated with ln(P^butter), so it is not relevant.
Chapter 13
13.1 It would be better to assign the treatment level randomly to each parcel. The research plan outlined in the problem may be flawed because the different groups of parcels might differ systematically. For example, the first 25 parcels of land might have poorer drainage than the other parcels, and this would lead to lower crop yields. The treatment assignment outlined in the problem would place these 25 parcels in the control group, thereby overestimating the effect of the fertilizer on crop yields. This problem is avoided with random assignment of treatments.
13.2 The treatment effect could be estimated as the difference in average cholesterol levels for the treated group and the untreated (control) group. Data on the weight, age, and gender of each patient could be used to improve the estimate using the differences estimator with additional regressors shown in Equation (13.2). This regression may produce a more accurate estimate because it controls for these additional factors that may affect cholesterol.
Chapter 14
14.1 It does not appear stationary. The most striking characteristic of the series is that it has an upward trend; that is, observations at the end of the sample are systematically larger than observations at the beginning. This suggests that the mean of the series is not constant, which would imply that it is not stationary. The first difference of the series may look stationary, because first differencing eliminates the large trend. However, the level of the first-difference series is the slope of the plot in Figure 14.2c. Looking carefully at the figure, the slope is steeper in 1960–1979 than in 1979–2004. Thus there may have been a break.
Chapter 15
15.1 As discussed in Key Concept 15.1, causal effects can be estimated by a distributed lag model when the regressors are exogenous. In this context, exogeneity means that current and lagged values of the money supply are uncorrelated with the regression error. This assumption is unlikely to be satisfied; for example, aggregate supply disturbances (oil price shocks, changes to productivity) have important effects on GDP.
Chapter 16
16.1 The macroeconomist wants to construct forecasts for nine variables. If four lags of each variable are used in a VAR, then each VAR equation will include 37 regression coefficients (the constant term and four coefficients for each of the nine variables). The sample period includes 128 quarterly observations. When 37 coefficients are estimated using 128 observations, the estimated coefficients are likely to be imprecise, leading to inaccurate forecasts. One alternative is to use a univariate autoregression for each variable. The advantage of this approach is that relatively few parameters need to be estimated, so that the coefficients will be precisely estimated by OLS. The disadvantage is that the forecasts are constructed using only lags of the variable being forecast, and lags of the other variables might contain additional useful forecasting information. A compromise is to use a set of time series regressions with additional predictors. For example, a GDP forecasting regression might be specified using lags of GDP, consumption, and long-term interest rates, but excluding the other variables. The short-term interest rate forecasting regression might be specified using lags of short-term rates, long-term rates, GDP, and inflation. The idea is to include the most important predictors in each of the regression equations, but leave out the variables that are not very important.
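The coefficient count in answer 16.1 is just arithmetic: each equation has one constant plus p lags of each of k variables.

```python
k_vars, p_lags = 9, 4
coeffs_per_equation = 1 + p_lags * k_vars
print(coeffs_per_equation)   # 37, estimated from only 128 quarterly observations
```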
Chapter 17
Chapter 18
18.1 Each entry of the first column of X is 1. The entries in the second and third columns are zeros and ones. The first column of the matrix X is the sum of the second and third columns; thus the columns are linearly dependent, and X does not have full column rank. The regression can be respecified by eliminating either X1 or X2.
18.2 … standard error SE(β̂1) and form the confidence interval as β̂1 ± 1.96 SE(β̂1). (c) The confidence intervals could be constructed as in (b); these use the large-sample normal approximation. Under assumptions 1–6, the exact distribution can be used to form the confidence interval β̂1 ± t*SE(β̂1), where t* is the 97.5th percentile of the t distribution with n − k − 1 degrees of freedom. Here n = 500 and k = 1; an extended version of Appendix Table 2 shows that this critical value is 1.9647.
18.3 No; this result requires normally distributed errors.
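The rank deficiency in answer 18.1 is easy to verify: with an intercept column and two exhaustive binary columns, the first column equals the sum of the other two. A sketch with a made-up 5-observation X:

```python
X = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
# Column 1 = column 2 + column 3 in every row, so the columns are linearly
# dependent and X does not have full column rank.
dependent = all(row[0] == row[1] + row[2] for row in X)
print(dependent)   # True; dropping either binary column restores full rank
```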
Glossary
Adjusted R² (R̄²): A modified version of R² that does not necessarily increase when a new regressor is added to the regression.
ADL(p, q): See autoregressive distributed lag model.
AIC: See information criterion.
Bayes information criterion (BIC): See information criterion.
Hctcro~l.edasticity-robw.t tstati~k A t
hctero~kt!du~IICttv.
Gan~~-MarkO\'
GeneraJired au torcgre~~ive conditional belero kedasticity: A ume sene... model for conditional hl'tcro'il..edasllctt}
Generalized le8S1 squares (GLS): A generahzatton
0 f OLS that i~ appropriate when the rl'gr~~ n
e rrors have a known form of heterosl...:dnc;ticllv (in
which case GLS ts al'o rdcrred to ac; wct~hted
least square. WLS) or 11 know n lormul s~..ri,ll
correlation.
Generalized me thod of mome nb: A met hod (or estimatmg. paramctc~ hy fi lltng sample moments to
population moments that are fu nctions ut t he
unknown param~. ter<\. Tn ... trumental vun.1hle' C)timator-, an: an tmpon.,nt speetal ..:a:.c.
G\1~1: Sec gencrult~~d ml!tlwd uf mnmt'lll\,
GrJnger causality te!it. A procedure fo r testi ng
\\hcther current and tagged valu~s of nne lime
seri~ hdp pred1ct futurl' \alues of JllCll her tune
o;enes..
HAC standard e.crors \~.:c lrt:tund.c.lu,uut\ und
awocorrelallotHXIII\1\IUif (I lAC) ~tundard rrrur:..
Han1horne efJcd: S<:c t>\pt mmnwl ejjr:ct
H eteroskeda.sticity: The ~itU<Hion 111 '' hich the vari,Jncc of the regression error term u,. conditional o n
the regressor!>. is not con~tant.
lielcro~kedastkih- a nd ~tutocorrelati oo-con~;i tent
(HAC) ~ian da;d erron: ~tandard ~..rrMc; f11r OLS
esti mators that are con~1sten t whcthcl M not the
regre:;:,ton error.. .m: hctcro ... l..cda,llc .1nd llUtocorrclated.
ll ctero~kedasticit ~-robu\1 ' llmdard error: S t;~ndt~rd
errors for lhc 6u; c\llmator th.ll art..' appr~o1prtatc
\\hether the .:rror t~rm I'> hnmo,l..cdu,uc ur hctc:n,.,kcdasuc
con.,.tructed ustng a
d.Jrt.l e rror.
h.:t~.:Tosl..cd.l'tJCll\
'tatLSllc
robust tnn
lest: A procedure for u:.in!l. 'ample e vtdeocc to help d etl'rmine i t .1 'pcctlic h) fl<Hhcst<;
abou t a population is true or fat e.
Population intercept and ~lope lbe true. or population, values of f3u (the mtcrccpt) anu {J, (the
slope) in a single variable rc.:grcs'>IOn In,, multiple
regression, there are muluplc ~lop.: coelllcicnts
(fi1 {32, {34) . one for each rcgrc-.sor
Population multiple regression model The multtplc
regression model in Key Concept fl 2.
Population regression line: In a single-variable regression, the population regression line is β0 + β1Xi, and in a multiple regression it is β0 + β1X1i + β2X2i + ... + βkXki.
Power: The probability that a test correctly rejects the null hypothesis when the alternative is true.
Predicted value: The value of Yi that is predicted by the OLS regression line, denoted by Ŷi in this textbook.
Price elasticity: The percentage change in the quantity demanded resulting from a 1% increase in price.
Probability: The proportion of the time that an outcome (or event) will occur in the long run.
Probability density function (p.d.f.): For a continuous random variable, the area under the probability density function between any two points is the probability that the random variable falls between those two points.
Probability distribution: For a discrete random variable, a list of all values that a random variable can take on and the probability associated with each of these values.
Probit regression: A nonlinear regression model for a binary dependent variable in which the population regression function is modeled using the cumulative standard normal distribution function.
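A minimal sketch of the probit functional form: the cumulative standard normal distribution function Φ can be computed from the error function in the Python standard library. The function names below (`phi`, `probit_prob`) are illustrative, not from the text or any library.

```python
# Sketch: probit population regression function Pr(Y = 1 | X) = Phi(b0 + b1*X),
# where Phi is the cumulative standard normal distribution function.
# Names (phi, probit_prob) are illustrative, not standard library API.
import math

def phi(z):
    """Cumulative standard normal distribution function, via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_prob(beta0, beta1, x):
    """Predicted probability that Y = 1 given X = x in a probit model."""
    return phi(beta0 + beta1 * x)
```

Because Φ is a cumulative distribution function, the predicted probability always lies between 0 and 1, which is the motivation for probit over the linear probability model.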
Program evaluation: The field of study concerned with estimating the effect of a program, policy, or some other intervention or treatment.
Pseudo out-of-sample forecast: A forecast computed over part of the sample, using a procedure that is as if these sample data have not yet been realized.
Quadratic regression model: A nonlinear regression function that includes X and X² as regressors.
Quasi-experiment: A circumstance in which randomness is introduced by variations in individual circumstances that make it appear as if the treatment is randomly assigned.
R²: In a regression, the fraction of the sample variance of the dependent variable that is explained by the regressors.
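This definition can be written directly in code. A hedged sketch (the helper name `r_squared` is illustrative): for OLS with an intercept, the explained fraction of the sample variance of Y equals 1 minus the ratio of the sum of squared residuals to the total sum of squares.

```python
# Sketch of the R^2 definition: the fraction of the sample variance of the
# dependent variable explained by the regression, computed as 1 - SSR/TSS.
import numpy as np

def r_squared(y, y_hat):
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
    ssr = np.sum((y - y_hat) ** 2)      # sum of squared residuals
    return 1.0 - ssr / tss
```

A perfect fit gives R² = 1, while predicting the sample mean for every observation gives R² = 0.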
Standard normal distribution: The normal distribution with mean equal to 0 and variance equal to 1, denoted N(0, 1).
Standardizing a random variable: An operation accomplished by subtracting the mean and dividing by the standard deviation, which produces a random variable with a mean of 0 and a standard deviation of 1; the standardized value of Y is (Y − μY)/σY.
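As a quick numerical check of this definition (a sketch; `standardize` is an illustrative name, not from the text), applying (Y − μY)/σY to a sample yields mean 0 and standard deviation 1:

```python
# Sketch of standardizing: subtract the mean, divide by the standard
# deviation. The result has mean 0 and standard deviation 1.
import numpy as np

def standardize(y):
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()

z = standardize([2.0, 4.0, 6.0, 8.0])
```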
Test for a difference in means: A procedure for testing whether two populations have the same mean.
Time effects: Binary variables indicating the time period in a panel data regression.
Time and entity fixed effects regression model: A panel data regression that includes both entity fixed effects and time fixed effects.
Time fixed effects: See time effects.
Time series data: Data for the same entity for multiple time periods.
Total sum of squares (TSS): The sum of squared deviations of Yi from its average, Ȳ.
Treatment effect: The causal effect in an experiment or a quasi-experiment. See causal effect.
Treatment group: The group that receives the treatment or intervention in an experiment.
TSLS: See two stage least squares.
Two stage least squares: An instrumental variables estimator, described in Key Concept 12.2.
Unbalanced panel: A panel data set in which some data are missing.
Unbiased estimator: An estimator with a bias that is equal to zero.
Uncorrelated: Two random variables are uncorrelated if their correlation is zero.
Underidentification: When the number of instrumental variables is less than the number of endogenous regressors.
Variance: The expected value of the squared difference between a random variable and its mean; the variance of Y is denoted σ²Y.
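For a discrete random variable this definition is a short computation (the helper name is illustrative, not from the text):

```python
# Sketch of var(Y) = E[(Y - mu_Y)^2] for a discrete random variable,
# given its possible values and their probabilities.
def variance(values, probs):
    mu = sum(v * p for v, p in zip(values, probs))          # mean mu_Y
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probs))
```

For example, a Bernoulli variable taking 0 and 1 with probability 0.5 each has mean 0.5 and variance 0.25.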
Vector autoregression: A model of k time series variables consisting of k equations, one for each variable, in which the regressors in all equations are lagged values of all the variables.
Volatility clustering: When a time series variable exhibits some clustered periods of high variance and other clustered periods of low variance.
Weak instruments: Instrumental variables that have a low correlation with the endogenous regressor(s).
Weighted least squares (WLS): See generalized least squares.
Index
A
Acceptance region, 79
ADL model. See Autoregressive distributed lag model
Age and earnings. See College graduates
Alcohol taxes. See Traffic deaths and alcohol taxes
AR(p) model. See Autoregressive model
Asymptotic distribution, 49
  definition of, 682
  of F-statistic, 705, 713
  of GLS estimator. See Generalized least squares (GLS) estimator
  of J-statistic, 735
  of tests of instrument exogeneity, 735
  for IV regression. See Instrumental variables regression (IV regression)
  for least squares regression. See Least squares
  for time series regression. See Time series
Asymptotic normality, 710-711. See also Central limit theorem; Asymptotic distribution
Asymptotically efficient GMM estimator. See Efficient GMM estimator
Asymptotically normally distributed, 54
Attrition, 473, 501
Augmented Dickey-Fuller (ADF) statistic, 561-563, 662-663
Augmented Dickey-Fuller (ADF) test, vs. DF-GLS test, 658
Autocorrelation, 366-367, 532-533
  coefficient, 532
  definition of, 532
  of error term
  sample, 533
Average causal effect, vs. local causal effect
Average treatment effect, 710-713
  of TSLS estimator, 728-730

B
Balanced panel, 350, 363
Bank of England, 549-551
Base specification, 237-243, 317-318, 369-370
Bayes information criterion (BIC), 552-554, 553t, 561, 577
  vs. AIC, 553
  simultaneous causality. See Simultaneous causality
  simultaneous equations. See Simultaneous causality
BIC. See Bayes information criterion
Binary dependent variables, 319, 384-389, 390
Binary variables
  and, 280-284
  interactions between, 277-279
  interaction specification, 281
  limited dependent variable, example of, 408
C
Capital asset pricing model (CAPM), 121, 324-325
CAPM. See Capital asset pricing model
Card, David, 285, 496, 498, 502
Cardiac catheterization study, 453-456, 496-497, 501, 507
Cauchy-Schwarz inequality, 687
Causal effects, 202
  average, 503
  challenges of observational data, 10-11
  change in independent variable, 312
  definition of, 9, 88, 471, 596
  dynamic. See Dynamic causal effects
  estimation of, 9, 66, 85, 313-314, 404, 485, 497, 499, 506-507
  difference-of-means, using experimental data
  for different groups, 502
  and forecasting, 9-10
  and idealized experiments, 9-10
  and ideal randomized controlled experiment, 469-471
  regression estimators of, 477-485
  statistical inference about, 313
  and time series data, 596-598
  of treatment, 470
  treatment vs. eligibility effects, 476
  unobserved variation in, 502
  variables. See Causal relationships among variables
c.d.f. See Cumulative probability distribution
Census. See U.S. Census
Central limit theorem, 18, 49, 51-54, 73, 78, 84, 132-133, 208, 429, 434, 546, 549, 684, 715, 729
  and asymptotic distribution, 681
  and convergence in distribution, 681-684
  definition of, 684
Chebychev's inequality, 682-683
Chi-squared distribution, 43-44, 89, 170, 445, 654, 690-691, 715, 717, 733
Chi-squared statistic, 399
Chow, Gregory, 566
Chow test, 566-569
  modified. See Quandt likelihood ratio (QLR) statistic
Cigarette taxes, 85-88, 286t, 727
  analysis of, 592, 594f, 618-624
Common intercept, 358
Common mean, 49
Common trend, 655
Conditional distribution, 30-34, 126, 161-162, 204, 371
Confidence intervals
  constructing
  coverage probability of
  in multiple regression, 220-245
  one-sided, 83
Constant term, 279, 291, 293
Control variables, 191, 223-224
Convergence in distribution, 683-684
Cross-sectional regression, repeated, 497-499
  definition of, 498
Cubic regression, 275-276, 294f
Cumulative distribution function
  chi-squared, 229
  in probit regression, 390-396
Current Population Survey (CPS), 86, 165, 271, 284-285, 320
  sample, 95
Correlation, 35-36, 128, 189-190, 215, 227-228, 235-238
  coefficient, 95, 134
  and conditional means, 36, 128
  population. See Population correlation
  sample, and sample covariance, 94-96
  serial. See Serial correlation
Covariance, 34-35, 94, 122, 128
  matrix, 708, 710-712, 713, 717, 720, 724-725
CPI. See Consumer Price Index
CPS. See Current Population Survey
Crime rates, 351
Critical value, 79-80. See also Statistical hypothesis test
Cross-sectional data, 350, 355, 399, 506
  definition of, 11
  IV regression assumptions for, 736
Cumulative distribution, 19-21, 23
  in logit and probit regressions, 389
Cumulative standard logistic distribution function, 394, 397

D
Data
  sources
  and types of, 10-13
Degrees of freedom, 43-44, 89, 91, 125, 170, 445
  adjustment, 200, 691, 717
  definition of, 125
Demand elasticity, 423-426
  for cigarettes, 430-432, 437-439
Density function. See Probability density function
Dependent variable. See Regressand
Deterministic trend, 555, 561-562
  definition of, 555
Dickey-Fuller test. See also Augmented Dickey-Fuller (ADF) statistic
  for cointegration, 659
Difference of means. See Differences-of-means analysis
Distributed lag model. See also Autoregressive distributed lag model
  assumptions, 610
  autocorrelation
DOLS. See Dynamic ordinary least squares (DOLS) estimator
Dow Jones Industrial Average, 42
Dummy variable. See Binary variable
Dummy variable trap, 208, 357
Dynamic causal effects. See also Dynamic multipliers
Dynamic multipliers, 600, 602-604, 619
Distribution
  conditional. See Conditional distribution
  comparing, 612-613
  joint. See Joint distribution
  jointly normal sampling. See Jointly normal sampling distribution
  joint probability, 30
  kurtosis, 26
  large-sample. See Large-sample distribution
  marginal, 33
  measures of
  multivariate normal. See Multivariate normal distribution
  non-normal, unit roots, 653-655
  normal. See Normal distribution
  of OLS estimator, 148
  of OLS t-statistics
  probability. See Probability distribution
  of regression statistics with normal errors, 715-719
Error correction model
  definition of
E
Earnings. See also College graduates
Error term, 113. See also ui
  autocorrelation of
  and multiple regression, 721
  and serial correlation, 592, 613, 615
  squared, 121
  standard deviation of. See Standard error of regression
  and treatment level
Error, type I and type II, 79. See also Statistical hypothesis test
Efficient GMM estimator, 727
Elasticity, 212, 280, 291
  definition of, 267
  of demand. See Demand elasticity
  estimating, 269, 280
  of supply. See Supply elasticity
Eligibility effects, vs. treatment, 476
Endogenous regressor, 423, 436, 441
  definition of, 597
Engle-Granger Augmented Dickey-Fuller (EG-ADF) test
Estimate, definition of, 67
Estimation
  by GLS, 613-615, 627
  with strictly exogenous regressors
Estimator(s), 67
  of GLS. See Generalized least squares (GLS) estimator
  of OLS. See Ordinary least squares (OLS) estimator
  and variance. See Variance
Estimation
  of causal effect. See Causal effects
  least squares. See Least squares
  of regression model coefficients. See Coefficients
  in logit model, 396-400
  of population mean. See Population mean
  in probit model, 396-400
  of regression line, 202
Estimator(s), 121, 135, 163, 169, 197, 201-202
  of causal effect
  conditionally unbiased, 167
  consistent, 76. See also Consistent estimator
  efficient. See Efficient estimator
  generalized least squares. See Generalized least squares
  least absolute deviations. See Least absolute deviations estimator
  linear conditionally unbiased. See Linear conditionally unbiased estimators
  sampling distribution of, 149
  TSLS. See Two stage least squares
Exogeneity, 413
  assumption of
  testing of, 445
Exogenous variables, 425-430, 438, 445
  definition of, 425
Exogenous variation, 451-452
Expectation(s), 23
  conditional. See Conditional expectations
  iterated, law of. See Law of iterated expectations
Expected value, 23
European Central Bank, 7
Fitted value. See Predicted value
Fixed effects regression, 356-361
  assumptions, 360, 365-366
  definition of, 356
  model, 358-359, 372
  using binary variables, 357
  definition of, 357
  with multiple regressors, 358
  relating traffic deaths to alcohol tax
F
Fan chart, 550
FDD. See Freezing degree days
F distribution, 45
Feasible efficient GMM estimator
Feasible GLS estimators, 614-615
Forecast
  intervals, 548-549
  vs. confidence interval, 549
  definition of, 544
  using RMSFE to construct, 548
  iterated. See Iterated forecast
  momentum. See Momentum forecast
  vs. predicted value, 537
  uncertainty, 548-549
Forecasting
Fraction correctly predicted, 390-391
F-statistic, 711
  asymptotic distribution of, 711
  computing
  definition of, 665
  heteroskedasticity-robust. See Heteroskedasticity-robust F-statistic
  homoskedasticity-only, 170-171
Full column rank, 723
Full disclosure, 317-318
Fuller, Wayne, 560
Functional form
  in regression. See Linear regression; Nonlinear regression functions

G
GLS estimator
  advantages vs. disadvantages
  assumptions, 722-724
  role of first, 726-727
  definition of, 727
  distribution of, 718-719
  consistency
GMM estimator, 726
  and zero conditional mean assumption, 726-727
  definition of, 733
  efficient. See Efficient GMM estimator
  feasible efficient. See Feasible efficient GMM estimator

H
Heteroskedasticity
  OLS efficiency and
  regression model with, 694
  volatility clustering
  weighted least squares, 691-692
Heteroskedasticity-robust F-statistic
Heteroskedasticity-robust standard errors
Homoskedasticity-only t-statistic, 170-171
Homoskedasticity-only variance estimator
Hypothesis
  alternative. See Alternative hypothesis
  null. See Null hypothesis
Hypothesis testing
  for difference between two means, 85-88
  and inconsistent standard errors, 326
  joint. See Joint hypothesis
  and multiple comparisons
  and multiple regression, 220-245
  one-sided
  with prespecified significance level, 78-80

I
Infeasible weighted least squares, 691
Inflation
  forecasting, 6-7
  and monetary policy, 6-7
  and oil prices
  overall price level, 6
  setting interest rates, 7
  and unemployment. See also Phillips curve
Information criteria, 554. See also Akaike information criterion (AIC); Bayes information criterion (BIC)
  calculating, 553
  for lag length selection, 550-553
Instability. See Breaks
Instrumental variables, 421-456
  definition of, 438
  exogeneity, 413, 431, 436, 438-439, 501
  valid vs. weak instrument
Instrumental variables regression (IV regression), 315, 326, 330, 421-456, 503
  assumptions for cross-sectional data, 736
  finding instruments, 456
  GMM estimation. See Generalized method of moments
  with heterogeneous causal effects, 501-507
  in matrix form, 732-736
  mechanics of, 421-432
Independence assumption
Independently distributed, 36
Independently and identically distributed (i.i.d.), 44-46, 66, 677
  in quasi-experiments
Interest rates, 533, 534f, 545-546, 550
  definition of, 535
Intercept, 112
  hypothesis testing for, 155-156
Internal validity, 313, 336-339, 406

J
Johansen, Soren
Joint asymptotic distribution, 710
Joint distribution, 29-40, 132, 204
  asymptotic distribution of, 710
  and extended least squares assumptions, 707
  and causal effect
Jointly normal random variables, 208
Jointly normal sampling distribution
Joint null hypothesis, 216-219, 231, 265
Joint probability distribution, 29-30, 34
Joint sampling distribution, 205-208
J-statistic, 438-439, 445, 456
  asymptotic chi-squared distribution of, 727
  definition of, 445
  rejection, 440
jth autocovariance, 532
K
Kentucky, 497
Klein, Joseph, 425
Krueger, Alan, 442, 492, 497-498
Kurtosis, 26-28
  finite, 129-130, 135, 203

L
Lag(s), 528-531
  distributed. See Distributed lag; Distributed lag model
  length selection, 549-554
  multiplier notation, 617
  operator, 543
Landon, Alf M., 71
Large-sample approximation, 129
  chi-squared, 229
Law of large numbers, 546, 715
  and asymptotic theory, 680-682, 684, 687
  and convergence in probability, 681-683
  proof of, 682-683
Least absolute deviations estimator, 169
Least squares assumptions, 126-132, 135, 148, 160, 163, 167, 209-210, 221, 236, 284
  for cross-sectional data, 364-365
  definition of, 126
  failure in, 314, 328
  in multiple regression, 202-205, 419
  omitted variable bias and, 188-190
  violation of, 316, 327
Least squares estimator, 118-119. See also Generalized least squares; Nonlinear least squares; Ordinary least squares; Two stage least squares; Weighted least squares
  defined, 70
Leptokurtic, 28
Likelihood function, 400
Limited dependent variables, 384, 408. See also Binary dependent variables
Linear probability model, 384-389
  comparing to logit and probit, 396
  definition of, 387
  modeling mortgage denial probability, 402
  shortcomings of, 389
Linear regression, 111-135, 169, 257, 274, 294f
  and forecasting, 527
  model, 111-123, 126, 135, 188, 197, 205, 207, 264, 285, 707
  with multiple regressors, 186-210
  OLS function, 257f, 259f, 260f, 277f
  with one regressor, 111-135, 148, 677-696
  population function, 254
Linear time trend, 561-562
Local average treatment effects
Logarithmic specification, 275-276, 331-332
Logarithmic transformations, 286
Logarithms, 267-275, 290-291, 292t
  definition of, 267
  differences-of approximation, 531
  natural. See Natural logarithm
  and percentages, 267-268, 271, 274-275
  in regression, 273
Logistic cumulative distribution function, 394-395, 408
Logistic regression. See Logit regression
Logit regression, 384, 394-400
  coefficients, estimating, 397
  comparing to linear probability and probit, 396
  definition of, 398
  estimation in, 396-400
  inference in, 396-400
  model, 394-395, 395f, 408
  modeling mortgage denial probability, 402
  population, 394
Log-linear regression, 273
  definition of, 271
  function, 272f
  model, 274
  specification, 271, 273
Log-log regression, 273
  model, 271-272, 274
M
Macroeconomic forecasting, 9
Madrian, Brigitte, 90
Manning, Willard G., 446
Marginal probability distribution, 33-34, 43, 132
Mariel boatlift, 496
Market microstructure, theory of, 574
Massachusetts school districts, class size and test scores. See Student-teacher ratio (STR) and test scores
Maximum likelihood estimator (MLE), 384, 393-394
  for ARCH and GARCH coefficients, 667
Mean(s)
  common. See Common mean
  conditional. See Conditional mean
  of distribution
  of distribution of earnings, 165
  population. See Population mean
Minnesota, 365-366
Minorities, and mortgage application denial. See Mortgage lending and race
MLE. See Maximum likelihood estimator
Moments
  of distribution, 27
  and extended least squares assumptions, 707
  generalized method of. See Generalized method of moments
Momentum forecast, 541
Monthly excess return, 541
Mortgage lending and race, 385, 388-389, 392-393, 400, 402, 403-406
N
Natural experiments. See Quasi-experiments
Natural logarithm, 267, 270
  definition of, 268

O
OLS. See Ordinary least squares
Omitted variable bias, 186-193, 195, 198-199, 230, 236-239, 244-245, 284, 290-291, 314, 316-318, 329
Ordinary least squares (OLS) estimator, 220-223, 316-326
  vs. nonlinear least squares estimator
  shared properties, 397
  in nonlinear regression, 280
  of population mean, 122
  and predicted values, 119
  reasons to use, 121-123
  restricted regression, 231
  sampling distribution of, 131-134, 204
  standard errors, 156, 326-327. See also Heteroskedasticity-robust standard errors; Homoskedasticity-only standard errors
  biased, 323
  with fixed effects, 363
  function, linear, 257f, 259f
  function, quadratic, 259f
  line, 119-121, 123, 130, 165, 197, 224, 257
  statistics, 203, 715-716
Ordinary least squares (OLS) residuals, 119, 125, 197-198, 201, 234, 716
  vs. forecast error, 537
Outcomes, 18

P
Panel data, 13-14, 350-371, 430, 452, 596
  before and after comparisons. See "Before and after" comparisons
  definition of, 13, 350, 371
  estimating causal effect, 430
  regression, 349-372, 364
  analysis, 469
  assumptions, 360
  estimating, 367
  structure of, 350-352
  on subjects of quasi-experiment
  with two time periods, 353-356
Project STAR experiment
Pseudo out-of-sample forecasting, 571-572
  definition of, 571
  vs. true out-of-sample forecasting, 571
  uses of, 571-572
Pseudo R², 400
pth order autoregressive model. See AR(p) model
p-value, 72-80, 149-151, 155, 224, 259, 267, 276, 283, 289t, 291, 292t, 293, 331, 334t
  calculating, 74-76, 77, 153f
  computing, using
    approximate normal distribution
    standard normal distribution, 171
    Student t distribution, 92
    t-statistic, 312
  definition of, 71
  testing exclusion of groups of variables, 226
Randomly assigned treatment, 90, 470
Randomly sampled data, 205
Randomness, 12
Random sampling, 45-46, 65, 73, 132, 135, 152, 234, 322
  importance of, 70-71
  simple. See Simple random sampling
  variation, 79, 149, 150, 494
Random variable(s), 18-54, 84, 94, 131-132, 170, 186, 227
  conditional distributions, 30-34
  conditional mean of, 123
  discrete vs. continuous, 19
  expected values, 25
  joint and marginal distribution, 29-34
  linear function of, mean and variance of
Q
Quadratic form, 713
Quadratic population regression model, 266
Quasi-experiments, 8, 492
  advantages of, 508
  analysis, 497-499
  assumptions for
  attrition in
  definition of, 8, 492
  effects in heterogeneous populations, 501-507
  and experiments, 468-508
R
R 231 "''1)..2.111. 'i1r n/.10 AOJU)Icd R~
I J, I u'lll 1 1S
R' adjtAic:d for dcl!rcc nf fr.:ctk>l" ~....
rc
,\JJU I<! U~
ddtntllon <ll. nu
l'x.Jmph: ut. s, I'll\
~1\3-IS
I C!!J~ 'IVO
C'ltrn:llcd 2;>\
cntcrprcac.J '" modelmg probabtltty.
'8tl
inlcrprtolCO a' prc:dt<.;to!d probahilih.
~M
nunhncar 1(>..).....~1\a
linc. l lli,IIX.I24.13-I, l66.2fMJ, 2R2- 2!>3
..,umatcd, I ~~. 234
urdmaf) lc;N ,.quarcs (O LS) See
Or<hnttf} lc:u't ~ uarcs
populauon. ~I''' Populnuon
rcgrc~sion hne
hn,;.cr !itt< Llne.u rc:grcs~ton
lul!ll Sa LO!ttl rc:gn'!.SIOO
mtKid. wtlh hctcm~kcdasttcit\. 694
till '<.lei lollartthmic. Su Lotzarlthmi
n.:l!.rcsston model
m11<.ld' im torc.u,ting 527
mulltple. Set Muhcple r.:gresston
11\>"''hncat. St. " nnlmcar rcp-e-<Ston
funcllo '
OLS. Su OrdmJf) lt:ast squares
rc~'t'CS~tl'"
unrc~tncted
s,., LlnJ'cstnctc:d
rcgr'""'tn
Rer~.....,orb). l'I~2JO.L'2-24fl
I> nan.lt>2
llcruurion of. II ~
s.-,
273,31-1
79 3
r(.!gf~'~f"
mulltpk.:!.;tJ
''"lUC. model
mt>dd
tcmc: '.:n~s ddl~ S, ,. Time ~encs
daltt
Rcje.uc>n regaon 711. Sl. 'it", ul o St8lhlielll
h) pt.)lhC'I\ tc:'l
~nd
Residual(s), 124, 165
  sum of squares. See Sum of squared residuals
  ordinary least squares (OLS). See Ordinary least squares residuals
  regression, 125
Restricted regression, 230-231
Restrictions, 228-234, 262-263
Retirement savings, stimulating, 90
Reverse causal effect, 325-326
Reverse causality, 323
Right-hand variable. See Regressor
River of Blood, 549-551
RMSFE. See Root mean squared forecast error
Roll, Richard, 625
Roosevelt, Franklin D., 71
Root mean squared forecast error (RMSFE)
  estimating, 571
  sources of error, 537
R-squared. See R²

S
Scatterplots
  of hourly earnings
  of test scores vs. district income, 258f
  vs. student-teacher ratio
  vs. three student characteristics
  of traffic deaths and alcohol taxes, 352f
  of traffic deaths and beer taxes, 351f
  of equilibrium price and quantity, 427f
  of estimated regression line: California
Sims, Christopher, 641
Simultaneous causality, 326, 330, 423
  bias, 325-326, 451-452
  definition of, 323
Simultaneous equations bias. See Simultaneous causality
Single regressor model, 194-195, 197, 199, 202
Size of test, 79. See also Statistical hypothesis test
Skewness, 25-28
Slope, 112, 235f, 258f, 289t, 295f
Spurious regression, 559
  definition of, 556-559
Square root matrix, 724
SSR. See Sum of squared residuals
Standard deviation, 117, 128, 165
  definition of, 24
  of error term. See Standard error of regression
  of regression error. See Standard error of regression
  of regression residuals, 125
  of sample average
  and sampling distribution, 79, 149-150
  and variance, 24-26
Standard error(s)
  and autocorrelation and inference, 601-605
  clustered. See Clustered standard errors
  compact matrix expressions for, 712
  definition of, 75-76
  in direct multiperiod regressions, 646-647
  of estimated effects, 262-263
  inconsistent
  formula for
Standard error of regression (SER), 124, 200
Stanford Achievement Test, 11
Stationarity
  definition of
  testing hypothesis of, 563
Stochastic trend, 555-561, 577, 639, 641, 662
  and autoregression, 560
  avoiding, 561
Structural change. See Breaks
Structural instability. See Breaks
Structural VAR modeling, 643
Student t distribution, 44, 91-92, 170-171, 715, 718
Student-teacher ratio (STR) and test scores, 110-117, 120, 123-126, 131-133, 160, 164-171, 196-199, 231, 238, 243-249, 256, 259-261, 263-264, 291, 299, 319, 336-339
  California data, 110-117, 121, 125, 139, 171-172, 164-167, 193, 239-241t, 252t, 257
  Massachusetts data, 326t, 334t
  Tennessee data, 486-494, 488f, 491
  in forecasting, 527
  transforming by standardizing
Theorem, central limit. See Central limit theorem
  multivariate. See Multivariate central limit theorem
Theorem, continuous mapping. See Continuous mapping theorem
Theorem, Gauss-Markov. See Gauss-Markov theorem

T
Time fixed effects
  definition of, 358
  regression with, 361-364
Time series data
  autocorrelation. See Autocorrelation
  generalized method of moments (GMM) estimator, 726
  HAC standard errors in, 603-608
  introduction to, 525-577
Total sum of squares (TSS), 123-124
Traffic deaths and alcohol taxes, 349-353
  data description, 350-351
Treatment variable, 470, 503
Trend(s), 554-555
  definition of, 555
  deterministic. See Deterministic trends
  linear time. See Linear time trend
TSS. See Total sum of squares
t-statistic, 439, 455-456
  comparison of means: California and Massachusetts
  definition of
  asymptotic, 729
U
Unbalanced panel, 351
Unbiasedness, 135, 209
  of estimator, 67
  definition of, 67
  of OLS estimator, 123, 163, 167
Unit autoregressive root (unit root), 650-655, 662-663. See also Stochastic trend
  definition of, 557
  tests for, 558-564
Underidentified coefficients, 625-626
U.S. Census, 65, 442
U.S. military, 496
U.S. public education system
U.S. stock market, 6, 10, 541
  mutual funds, 324

V
Valid instrument, 439, 443, 453-455, 492
  checking, 439-445
  conditions for, 423, 436
  sources of, 451-455
  variables, 422-423, 452, 454, 485, 508
Validity
  external. See External validity
Variable(s)
  exogenous. See Exogenous variable
  treatment. See Treatment variable
  unobserved, 358, 371
  valid instrument. See Valid instrument
Variance, 124, 133-135
  of conditional distribution
  of discrete random variable, 25
  and efficiency
  Newey-West. See Newey-West variance estimator
  pooled. See Pooled variance estimator(s)
  of population distribution
  of pseudo out-of-sample forecasting, 548
  sample, consistency of
University of California
University of Chicago, 407-408

W
Wald, Abraham, 71
Wald statistic, and F-statistic, 715-716
Weak dependence, 546
Weak instruments, 439, 455
  definition of, 441
  indicator, 439-440
Weather. See Cold weather and orange juice prices
Weighted average, 170
Weighted least squares (WLS), 691-696, 722
  advantages and disadvantages vs. GLS
  definition of, 691
  estimated. See Feasible weighted least squares
  handling heteroskedasticity, 691
  with heteroskedasticity of known functional form, 692-695
West, Kenneth, 605
White standard errors. See Heteroskedasticity-robust standard errors