
Study Guide for Econometrics (second semester)

Programa Universitat-Empresa

Universitat Autònoma de Barcelona

February 2008

Michael Creel and Montserrat Farell

Contents

Introduction
  Econometrics at the Facultat
  About this study guide
  Bibliography

Chapter 1. GRETL
  1.1. Introduction
  1.2. Getting Started
  1.3. Chapter Exercises

Chapter 2. Dummy Variables
  2.1. Introduction
  2.2. Motivation
  2.3. Definition, Basic Use, and Interpretation
  2.4. Additional Details
  2.5. Primer Projecte Docència Tutoritzada
  2.6. Chapter Exercises

Chapter 3. Collinearity
  3.1. Introduction
  3.2. Motivation: Data on Mortality and Related Factors
  3.3. Definition and Basic Concepts
  3.4. When does it occur?
  3.5. Consequences of Collinearity
  3.6. Detection of Collinearity
  3.7. Dealing with collinearity
  3.8. Segon Projecte de Docència Tutoritzada
  3.9. Chapter Exercises

Chapter 4. Heteroscedasticity
  4.1. Introduction
  4.2. Motivation
  4.3. Basic Concepts and Definitions
  4.4. Effects of Het. and Aut. on the OLS estimator
  4.5. The Generalized Least Squares (GLS) estimator
  4.6. Feasible GLS
  4.7. Heteroscedasticity
  4.8. Example
  4.9. Tercer Projecte de Docència Tutoritzada
  4.10. Chapter Exercises

Chapter 5. Autocorrelation
  5.1. Introduction
  5.2. Motivation
  5.3. Causes
  5.4. Effects on the OLS estimator
  5.5. Corrections
  5.6. Valid inferences with autocorrelation of unknown form
  5.7. Testing for autocorrelation
  5.8. Lagged dependent variables and autocorrelation: A Caution
  5.9. Quart Projecte de Docència Tutoritzada
  5.10. Chapter Exercises

Chapter 6. Data sets

Introduction

Econometrics at the Facultat


Econometrics (Econometria) is an annual (two-semester) course in the Facultat de Ciències Econòmiques i Empresarials at the UAB. It is a required course for the degree of Llicenciat in both Administració i Direcció d'Empreses (ADE) and Economia (ECO). In both ADE and ECO, Econometrics is normally taken in the third year of study.

Econometrics is an area of Economics that uses statistical and mathematical tools to analyze data on economic phenomena. Econometrics can be used to find a mathematical model that gives a good representation of an actual economy, to test theories about how an economy behaves, or to make predictions about how an economy will evolve. Estimation of models, testing of hypotheses, and making predictions are things that can be done using econometric methods.

Courses that are fundamental for successfully studying Econometrics are Matemàtiques per a Economistes I and II (first year of study) and Estadística I and II (second year of study). Ideally, students should have passed these courses before beginning Econometrics. If this is not possible, any student of Econometrics should immediately begin a serious review of the material covered in these courses. Basic matrix algebra, constrained and unconstrained minimization of functions, conditional and unconditional expectations of random variables, and hypothesis testing are the areas that should be reviewed.


Microeconomia I and Microeconomia II are courses that provide a theoretical background which is important for understanding why and how we use econometric tools. Macroeconomia I also provides a theoretical background for some of the examples of the second half of Econometrics.

About this study guide


This study guide covers the material taught in the second semester, in groups 13 and 14 (the groups of the PUE). The guide contains brief notes on all of the material, as well as examples that use GRETL. This guide does not substitute for reading a textbook; it accompanies a textbook. Nor does it substitute for attending class. The guide highlights essential concepts, provides examples, and gives exercises. However, class lectures contain details that are not reproduced in the guide. To learn these details, attending class is fundamental, as is careful reading of a textbook. The guide provides references to the book Econometría (cuarta edición) by D. Gujarati, mentioned below. In the second semester of Econometrics, we will cover material in Chapters 9, 10, 11 and 12 of Gujarati's book.

This guide has been checked to work properly using the Firefox web browser and Adobe Acrobat Reader. Both of these packages are freely available for the commonly used operating systems. You should configure Acrobat Reader to use Firefox to open links. This study guide and related materials (data sets, copies of software and manuals, etc.) are available at the Econometrics Study Guide web page.


Bibliography

There are many excellent textbooks for econometrics. Any of the following is appropriate. This study guide refers to Gujarati's book. You should definitely read the appropriate sections of at least one of these books.

(1) Novales, A., Econometría, McGraw-Hill.
(2) Gujarati, D., Econometría, McGraw-Hill.
(3) Johnston, J. and J. Dinardo, Métodos de Econometría, Vicens Vives.
(4) Kmenta, J., Elementos de Econometría, Vicens Vives.
(5) Maddala, G.S. (1996), Introducción a la econometría, second edition, Prentice Hall.
(6) Pindyck, R.S. and Rubinfeld, D.L. (2001), Econometría: modelos y pronósticos, fourth edition, McGraw-Hill.

CHAPTER 1

GRETL

1.1. Introduction

GRETL (http://gretl.sourceforge.net/) is a free computer package for doing econometrics. It is installed on the computers in Aules 21-22-23 as well as in the Social Sciences computer rooms. You can download a copy and install it on your own computer. It works with Windows, Macs, and Linux. It is available in a number of languages, including Spanish. The version for Windows, along with the manual and the data sets that accompany D. Gujarati's Econometría, are distributed with this study guide, and are also available here:

Gretl v. 1.7.1 for Windows
Data to accompany Gujarati's book

The examples in this study guide use GRETL, and to do the class assignments you will need to use GRETL. This chapter explains the basic steps of using GRETL.

Basic concepts and goals for learning:
(1) become familiar with the basic use of GRETL
(2) learn how to load ASCII and spreadsheet data
(3) learn how to select certain observations in a data set

Readings: GRETL manual in Spanish or in English. You don't have to read the whole manual, but looking through it would be a good idea.

Figure 1.2.1. GRETL's startup window

1.2. Getting Started

Once you start GRETL, you see the window in Figure 1.2.1. You need to load some data to use GRETL. Data comes in many forms: plain text files, spreadsheet files, binary files that use special formats, etc. GRETL can use most of these forms. We'll look at how to deal with two cases: plain ASCII text data, and Microsoft Excel spreadsheet data.

1.2.1. Loading ASCII text data. The Wisconsin longitudinal survey is a long-term study of people who graduated from high school in the state of Wisconsin (US) during the year 1957. The data have been collected repeatedly in subsequent years.


This data can be obtained over the Internet from the address given previously. In Figure 1.2.2 you can see that several variables have been selected for download.
Figure 1.2.2. Downloading data

In Figure 1.2.3 you see that one of the available formats is comma separated values (csv), which provides records (lines) containing variables that may be text or numbers, each separated by commas. Downloading that gives us the file wls.csv, the first few lines of which are:

iduser,ix010rec,sexrsp,gg021jjd,gwiiq_bm
1001,60,2,18000,109
1002,,1,,79
1003,,2,,111
1004,,1,,96
1005,,2,,83
1006,65,2,-2,99


Figure 1.2.3. Comma separated values

1007,70,1,-2,86
1008,71,1,-2,86
1009,67,2,16827,106
1010,72,1,17094,88
1011,67,2,7698,124
1012,,2,-2,124


The first line of the file gives the variable names, and the other lines are the individual records, one for each person. There are a total of 10317 records, for individual people. Some variables are missing for some people. In the data set, this is indicated by two commas in a row with no number in between. We need to know how to load this data into GRETL. This can be done as seen in Figure 1.2.4. Doing that, we now have the data in GRETL, as we see in Figure 1.2.5.


Figure 1.2.4. Loading a csv file

This data set has some problems that make it difficult to use. First, the variable names are strange and not intuitive. Second, many observations have missing values. You can change the name of a variable by right-clicking on it and selecting Edit attributes, then changing the name to whatever you like. See Figure 1.2.6. To see that many observations are missing values, right-click on a variable and choose Display values or Descriptive statistics. For example, the variable income (I renamed gg021jjd to income) shows what we see in Figure 1.2.7.


Figure 1.2.5. CSV data loaded

Figure 1.2.6. Changing a variable's name


Figure 1.2.7. Missing observations

To eliminate missing observations, we can select from the menu Sample -> Restrict, based on criterion, as in Figure 1.2.8. We need to enter a selection criterion. This data set is missing many observations on income and age. We can require that these variables be positive. This is illustrated in Figure 1.2.9. Once we do this, the new sample has 4934 observations, as we can see in Figure 1.2.10. Whenever you use these data, you should make sure that you have removed the observations with missing data.

1.2.2. Loading spreadsheet data. Data is often distributed as spreadsheet files. These are easy to load into GRETL using the File -> Open data -> Import option. Figure 1.2.11 shows how to do it. We need some spreadsheet data to try this.


Figure 1.2.8. Select sample, 1

Get the nerlove.xls data, and then import it as I have just explained. Once you do this you will see the dialog in Figure 1.2.12. Select no.


Figure 1.2.9. Selection criterion

Figure 1.2.12. Data dialog


Figure 1.2.10. Restricted sample

1.3. Chapter Exercises

(1) For the Wisconsin data set:
(a) change the name of the variable ix010rec to age
(b) change the name of gg021jjd to income
(c) change the name of gwiiq_bm to IQ
(d) select observations such that age and income are positive. You should have 4934 observations after doing so.
(e) save the restricted data, with the new variable names, as the data set wisconsin.gdt. Confirm that you can load this data into a new GRETL session.
(2) With your wisconsin.gdt data set:
(a) explore the GRETL menu options, the help features, and the manual, and print histograms (frequency plots) for the variables age, income and IQ.


Figure 1.2.11. Loading spreadsheet data

(b) print descriptive statistics for all variables.

CHAPTER 2

Dummy Variables

2.1. Introduction

Basic concepts and goals for learning. After studying the material, you should be able to answer the following questions:
(1) What is a dummy variable?
(2) How can dummy variables be used in regression models?
(3) What is the correct interpretation of a regression model that contains dummy variables?
(4) How can dummy variables be used in the cases of multiple categories, interaction terms, and seasonality?
(5) What is the equivalence between the different parameterizations that can be used when incorporating dummy variables?

Readings:
(1) Gujarati, Econometría (cuarta edición), Chapter 9: Modelos de regresión con variables dicótomas, pp. 285-320.

2.2. Motivation

Often, qualitative factors can have an important effect on the dependent variable we may be interested in. Consider the Wisconsin data set wisconsin.gdt. If we regress income on height, having selected the sample to include men only, we obtain the fitted line in Figure 2.2.1. Doing the same for the sample of women, we get Figure 2.2.2. Comparing the two plots, we can see that:



Figure 2.2.1. Income regressed on height, men

Figure 2.2.2. Income regressed on height, women


- the y-intercept is higher for men than for women
- the slope of the line is steeper for men than for women
- men are taller on average: for men, mean height is around 70 inches, while for women it's about 65 inches

There are a few questions we might ask:
- why does income appear to depend upon height? What economic explanations are possible?
- why do women appear to be earning less than men, other things equal?

Apart from these questions, it is clear that a qualitative feature (the sex of the individual) has an impact upon the individual's expected income. How can we incorporate such a qualitative characteristic into an econometric model? The need to use qualitative information in our models motivates the study of dummy variables.

2.3. Definition, Basic Use, and Interpretation

Dummy variable (definition): A dummy variable is a binary-valued variable that indicates whether or not some condition is true. It is customary to assign the value 1 if the condition is true, and 0 if the condition is false.

Dummy variable (example): for the Wisconsin data, the variable sexrsp takes the value 1 for men, and 2 for women. As such, sexrsp is not a dummy variable, since the values are not 0 or 1. We can define the condition "Is the person a woman?" This is equivalent to the condition "Is the value of sexrsp equal to 2?" This condition will be true for some observations, and false for others. With GRETL, we can define such a dummy variable, using the Variable -> Define new variable menu item, as in Figure 2.3.1.


Figure 2.3.1. Defining a dummy variable

Figure 2.3.2. Display values

To check that this worked properly, highlight both variables, right-click, and select Display values. This shows us what we see in Figure 2.3.2. Note that woman is now a variable like any other, that takes on the values 0 or 1.
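The same construction can be sketched in code form, outside GRETL (an illustration, not part of the original guide; the small sexrsp array is invented for the example):

import numpy as np

# sexrsp codes, as in the Wisconsin data: 1 = man, 2 = woman
sexrsp = np.array([1, 2, 2, 1, 2])

# the dummy is 1 where the condition "is the person a woman?" is true
woman = (sexrsp == 2).astype(int)
print(woman)  # [0 1 1 0 1]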

2.3.1. Basic use and interpretation. Dummy variables are used essentially like any other regressor. In class we will discuss the following models. Variables like $d_t$ and $d_{t2}$ are understood to be dummy variables, while variables like $x_t$ and $x_{t3}$ are ordinary continuous regressors. You should understand the interpretation of all of them:

$y_t = \beta_1 + \beta_2 d_t + \varepsilon_t$

$y_t = \beta_1 d_t + \beta_2 (1 - d_t) + \varepsilon_t$

$y_t = \beta_1 + \beta_2 d_t + \beta_3 x_t + \varepsilon_t$

Interaction terms: an interaction term is the product of two variables, so that the effect of one variable on the dependent variable depends on the value of the other. The following model has an interaction term. Note that $\partial E(y|x)/\partial x = \beta_3 + \beta_4 d_t$: the slope depends on the value of $d_t$.

$y_t = \beta_1 + \beta_2 d_t + \beta_3 x_t + \beta_4 d_t x_t + \varepsilon_t$

Multiple dummy variables: we can use more than one dummy variable in a model. We will study models of the form

$y_t = \beta_1 + \beta_2 d_{t1} + \beta_3 d_{t2} + \beta_4 x_t + \varepsilon_t$

$y_t = \beta_1 + \beta_2 d_{t1} + \beta_3 d_{t2} + \beta_4 d_{t1} d_{t2} + \beta_5 x_t + \varepsilon_t$

Incorrect usage: You should understand why the following models are not correct usages of dummy variables:

(1) overparameterization:

$y_t = \beta_1 + \beta_2 d_t + \beta_3 (1 - d_t) + \varepsilon_t$

(2) multiple values assigned to multiple categories. Suppose that we have a condition that defines 4 possible categories, and we create a variable $d = 1$ if the observation is in the first category, $d = 2$ if in the second, etc. (This is not, strictly speaking, a dummy variable, according to our definition.) Why is the following model not a good one?

$y_t = \beta_1 + \beta_2 d + \varepsilon_t$

What is the correct way to deal with this situation?
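One standard answer, sketched in Python (an illustration under invented category codes, not the guide's own solution): replace the single multi-valued variable with separate 0/1 dummies, omitting one category as the base.

import numpy as np

# hypothetical category codes 1..4 for six observations
d = np.array([1, 3, 2, 4, 1, 2])

# one 0/1 dummy per non-base category; category 1 is absorbed by
# the intercept, which avoids overparameterization
d2 = (d == 2).astype(float)
d3 = (d == 3).astype(float)
d4 = (d == 4).astype(float)
X = np.column_stack([np.ones(len(d)), d2, d3, d4])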

2.4. Additional Details

Seasonality and dummy variables. Dummy variables can be used to treat seasonal variations in data. We will use the Keeling-Whorf.gdt data to illustrate this. You should be able to use GRETL to reproduce the following results:

Model 1: OLS estimates using the 468 observations 1965:01-2003:12
Dependent variable: CO2

Variable   Coefficient   Std. Error     t-statistic   p-value
djan       316.864       0.210610       1504.5009     0.0000
dfeb       317.533       0.210789       1506.4046     0.0000
dmar       318.271       0.210967       1508.6276     0.0000
dapr       319.418       0.211147       1512.7780     0.0000
dmay       319.848       0.211327       1513.5233     0.0000
djun       319.187       0.211507       1509.1057     0.0000
djul       317.653       0.211688       1500.5705     0.0000
daug       315.539       0.211870       1489.3056     0.0000
dsep       313.690       0.212052       1479.3061     0.0000
doct       313.548       0.212235       1477.3572     0.0000
dnov       314.792       0.212419       1481.9367     0.0000
ddec       315.961       0.212603       1486.1530     0.0000
time       0.121327      0.000404332    300.0664      0.0000

Mean of dependent variable     345.310
S.D. of dependent variable     16.5472
Sum of squared residuals       634.978
Standard error of residuals    1.18134
Unadjusted R^2                 0.995034
Adjusted R^2                   0.994903
F(12, 455)                     7597.57
Durbin-Watson statistic        0.0634062

and the plot in Figure 2.4.1.

Figure 2.4.1. Keeling-Whorf CO2 data, fit using monthly dummies

Multiple parameterizations. To formulate a model that conditions on a given set of categorical information, there are multiple ways to use dummy variables. For example, the two models

$y_t = \alpha_1 d_t + \alpha_2 (1 - d_t) + \alpha_3 x_t + \alpha_4 d_t x_t + \varepsilon_t$

and

$y_t = \beta_1 + \beta_2 d_t + \beta_3 x_t d_t + \beta_4 x_t (1 - d_t) + \varepsilon_t$

are equivalent. You should know the 4 equations that relate the $\alpha_j$ parameters to the $\beta_j$ parameters, $j = 1, 2, 3, 4$. You should know how to interpret the parameters of both models.
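For reference, the mapping can be derived by evaluating both conditional expectations at $d_t = 0$ and at $d_t = 1$ and matching intercepts and slopes (a sketch using the $\alpha$/$\beta$ labels adopted above, which reconstruct the garbled symbols of the original):

% d_t = 0:  \alpha_2 + \alpha_3 x_t = \beta_1 + \beta_4 x_t
% d_t = 1:  \alpha_1 + (\alpha_3 + \alpha_4) x_t = (\beta_1 + \beta_2) + \beta_3 x_t
\alpha_1 = \beta_1 + \beta_2, \qquad
\alpha_2 = \beta_1, \qquad
\alpha_3 = \beta_4, \qquad
\alpha_4 = \beta_3 - \beta_4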

2.5. Primer Projecte Docència Tutoritzada

You may work in groups of up to 5 students. The evaluation will form part of the exercises grade. I recommend installing GRETL on a laptop with WiFi, in order to work comfortably. Before June 1 you must hand in a brief report (10 pages maximum) on the following:

2.5.1. Theoretical background. For a firm that takes input prices $w$ and the output level $q$ as given, the cost minimization problem is to choose the quantities of inputs $x$ to solve the problem

$\min_x w'x$

subject to the restriction $f(x) = q$. The solution is the vector of factor demands $x(w, q)$. The cost function is obtained by substituting the factor demands into the criterion function:

$C(w, q) = w'x(w, q).$

Monotonicity: increasing factor prices cannot decrease cost, so

$\frac{\partial C(w, q)}{\partial w} \geq 0$

Remember that these derivatives give the conditional factor demands (Shephard's Lemma).

Homogeneity: the cost function is homogeneous of degree 1 in input prices, $C(tw, q) = tC(w, q)$, where $t$ is a scalar constant. This is because the factor demands are homogeneous of degree zero in factor prices: they only depend upon relative prices.


Returns to scale: the returns to scale parameter $\gamma$ is defined as the inverse of the elasticity of cost with respect to output:

$\gamma = \left( \frac{\partial C(w, q)}{\partial q} \frac{q}{C(w, q)} \right)^{-1}$

Constant returns to scale is the case where increasing production $q$ implies that cost increases in the proportion 1:1. If this is the case, then $\gamma = 1$.

2.5.2. Cobb-Douglas functional form. The Cobb-Douglas functional form is linear in the logarithms of the regressors and the dependent variable. For a cost function, if there are $g$ factors, the Cobb-Douglas cost function has the form

$C = A q^{\beta_q} w_1^{\beta_1} \cdots w_g^{\beta_g} e^{\varepsilon}$

What is the elasticity of $C$ with respect to $w_j$?

$e^C_{w_j} = \frac{\partial C}{\partial w_j}\frac{w_j}{C} = \beta_j A q^{\beta_q} w_1^{\beta_1} \cdots w_j^{\beta_j - 1} \cdots w_g^{\beta_g} e^{\varepsilon}\,\frac{w_j}{A q^{\beta_q} w_1^{\beta_1} \cdots w_g^{\beta_g} e^{\varepsilon}} = \beta_j$

This is one of the reasons the Cobb-Douglas form is popular: the coefficients are easy to interpret, since they are the elasticities of the dependent variable with respect to the explanatory variables. Note that in this case, using Shephard's Lemma,

$e^C_{w_j} = \frac{\partial C}{\partial w_j}\frac{w_j}{C} = x_j(w, q)\frac{w_j}{C} \equiv s_j(w, q)$

the cost share of the $j$th input. So with a Cobb-Douglas cost function, $\beta_j = s_j(w, q)$: the cost shares are constants.

Note that after a logarithmic transformation we obtain

$\ln C = \alpha + \beta_q \ln q + \beta_1 \ln w_1 + \ldots + \beta_g \ln w_g + \varepsilon$

where $\alpha = \ln A$. So we see that the transformed model is linear in the logs of the data. One can verify that the property of HOD1 implies that

$\sum_{i=1}^{g} \beta_i = 1$

In other words, the cost shares add up to 1. The hypothesis that the technology exhibits CRTS implies that

$\gamma = \frac{1}{\beta_q} = 1$

so $\beta_q = 1$. Likewise, monotonicity implies that the coefficients $\beta_i \geq 0$, $i = 1, \ldots, g$.
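To make the homogeneity property concrete, here is a small numerical check (an illustrative sketch, not part of the original guide; the function and parameter values are invented, with the input-price coefficients chosen to sum to one):

import numpy as np

# A synthetic Cobb-Douglas cost function; HOD1 holds because the
# input-price coefficients satisfy 0.5 + 0.3 + 0.2 = 1
def cost(q, w, A=2.0, beta_q=0.8, betas=(0.5, 0.3, 0.2)):
    return A * q**beta_q * np.prod(w ** np.array(betas))

q = 10.0
w = np.array([1.0, 2.0, 3.0])
t = 1.7  # scale all input prices by the same factor
print(np.isclose(cost(q, t * w), t * cost(q, w)))  # True: C(tw,q) = tC(w,q)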

2.5.3. The Nerlove data and OLS. The file nerlove.xls contains data on 145 electric utility companies' cost of production, output and input prices. The data are for the U.S., and were collected by M. Nerlove. The observations are by row, and the columns are COMPANY, COST (C), OUTPUT (Q), PRICE OF LABOR (PL), PRICE OF FUEL (PF) and PRICE OF CAPITAL (PK). Note that the data are sorted by output level (the third column).

(1) Download the data nerlove.xls (it is an Excel file).
(2) Import the data into GRETL.
(3) Create logarithms of cost, output, labor, fuel, capital.
(4) Estimate by OLS the model

(2.5.1)  $\ln(cost) = \beta_1 + \beta_2 \ln(output) + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

(5) Comment on the results, in general, and specifically with respect to homogeneity of degree 1 and returns to scale.
(6) Create the dummy variables
(a) d1 = 1 if 101 <= firm <= 129, d1 = 0 otherwise
(b) d2 = 1 if 201 <= firm <= 229, d2 = 0 otherwise
(c) d3 = 1 if 301 <= firm <= 329, d3 = 0 otherwise
(d) d4 = 1 if 401 <= firm <= 429, d4 = 0 otherwise
(e) d5 = 1 if 501 <= firm <= 529, d5 = 0 otherwise
(7) Estimate the model

(2.5.2)  $\ln(cost) = \sum_{j=1}^{5} \alpha_j d_j + \sum_{j=1}^{5} \gamma_j [d_j \ln(output)] + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

(8) Comment on the results, emphasizing returns to scale. Present a graph showing returns to scale as a function of firm size. Interpret the graph.
(9) Test the restrictions $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5$ jointly with $\gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5$, and interpret the result.

2.6. Chapter Exercises


The professor of the practical session will give you a problem list. The problems 9.1, 9.2, 9.3, 9.5, 9.6, 9.13, 9.15 on pages 311-320 of Gujarati's book are recommended for study.

CHAPTER 3

Collinearity

3.1. Introduction

Basic concepts and goals for learning. After studying the material, you should learn the answers to the following questions:
(1) What is collinearity?
(2) What are the effects of collinearity on the OLS estimator: how does it affect estimation, hypothesis testing and prediction?
(3) How can the presence of collinearity be detected?
(4) What can be done to improve the situation if collinearity is a problem?

Readings: Gujarati, Econometría (cuarta edición), Chapter 10: Multicolinealidad: ¿Qué pasa si las regresoras están correlacionadas?, pp. 327-371.

3.2. Motivation: Data on Mortality and Related Factors

The data set mortalitat.gdt contains annual data from 1947-1980 on death rates in the U.S., along with data on factors like smoking and consumption of alcohol. The data description is: DATA4-7: Death rates in the U.S. due to coronary heart disease and their determinants. Data compiled by Jennifer Whisenand.

chd = death rate per 100,000 population (Range 321.2 - 375.4)
cal = per capita consumption of calcium per day in grams (Range 0.9 - 1.06)
unemp = percent of civilian labor force unemployed, in 1,000s of persons 16 years and older (Range 2.9 - 8.5)
cig = per capita consumption of cigarettes in pounds of tobacco by persons 18 years and older, approx. 339 cigarettes per pound of tobacco (Range 6.75 - 10.46)
edfat = per capita intake of edible fats and oils in pounds, includes lard, margarine and butter (Range 42 - 56.5)
meat = per capita intake of meat in pounds, includes beef, veal, pork, lamb and mutton (Range 138 - 194.8)
spirits = per capita consumption of distilled spirits in taxed gallons for individuals 18 and older (Range 1 - 2.9)
beer = per capita consumption of malted liquor in taxed gallons for individuals 18 and older (Range 15.04 - 34.9)
wine = per capita consumption of wine measured in taxed gallons for individuals 18 and older (Range 0.77 - 2.65)

Consider the following models, with their estimation results (standard errors in parentheses):

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \beta_4\,beer + \beta_5\,wine + \varepsilon$

$\widehat{chd}$ = 334.914 + 5.41216 cig + 36.8783 spirits - 5.10365 beer + 13.9764 wine
(standard errors: 58.939, 5.156, 7.373, 1.2513, 12.735)
$T = 34$, $R^2 = 0.5528$, $F(4, 29) = 11.2$, $\hat\sigma = 9.9945$

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \beta_4\,beer + \varepsilon$

$\widehat{chd}$ = 353.581 + 3.17560 cig + 38.3481 spirits - 4.28816 beer
(standard errors: 56.624, 4.7523, 7.275, 1.0102)
$T = 34$, $R^2 = 0.5498$, $F(3, 30) = 14.433$, $\hat\sigma = 10.028$

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \beta_5\,wine + \varepsilon$

$\widehat{chd}$ = 243.310 + 10.7535 cig + 22.8012 spirits - 16.8689 wine
(standard errors: 67.21, 6.1508, 8.0359, 12.638)
$T = 34$, $R^2 = 0.3198$, $F(3, 30) = 6.1709$, $\hat\sigma = 12.327$

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \varepsilon$

$\widehat{chd}$ = 181.219 + 16.5146 cig + 15.8672 spirits
(standard errors: 49.119, 4.4371, 6.2079)
$T = 34$, $R^2 = 0.3026$, $F(2, 31) = 8.1598$, $\hat\sigma = 12.481$


Note how the signs of the coefficients change depending on the model, and that the magnitude of the parameter estimates varies a lot too. The parameter estimates are highly sensitive to the particular model we estimate. Why? We'll see that the problem is that the data exhibit collinearity.

3.3. Definition and Basic Concepts

Collinearity (definition): Collinearity is the existence of linear relationships amongst the regressors. We can always write

$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_K x_K + v = 0$

where $x_i$ is the $i$th column of the regressor matrix $X$, and $v$ is an $n \times 1$ vector. In the case that there exists collinearity, the variation in $v$ is relatively small, so that there is an approximately exact linear relation between the regressors.

- "Relative" and "approximate" are imprecise terms, so the existence of collinearity is also an imprecise, relative concept.
- Many authors, including Gujarati, use the term "multicollinearity". Some, including myself, prefer to call the phenomenon "collinearity". Collinearity as used here means exactly what Gujarati and others refer to as multicollinearity.

Exact (or Perfect) Collinearity (definition): in the extreme, if there are exact linear relationships, we can write

$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_K x_K = 0$


In this case, $\rho(X) < K$, so $\rho(X'X) < K$, so $X'X$ is not invertible and the OLS estimator is not uniquely defined. The existence of exact linear relationships amongst the regressors is known as "perfect collinearity" or "exact collinearity". For example, if the model is

$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t$
$x_{2t} = \alpha_1 + \alpha_2 x_{3t}$

then we can write

$y_t = \beta_1 + \beta_2(\alpha_1 + \alpha_2 x_{3t}) + \beta_3 x_{3t} + \varepsilon_t = (\beta_1 + \beta_2\alpha_1) + (\beta_2\alpha_2 + \beta_3)x_{3t} + \varepsilon_t = \gamma_1 + \gamma_2 x_{3t} + \varepsilon_t$

The $\gamma$s can be consistently estimated, but since the $\gamma$s define two equations in three $\beta$s, the $\beta$s can't be consistently estimated (there are multiple values of $\beta$ that solve the first order conditions that define the OLS estimator). The $\beta$s are unidentified in the case of perfect collinearity.

3.4. When does it occur?

Perfect collinearity:
- Perfect collinearity is unusual, except in the case of an error in construction of the regressor matrix, such as including the same regressor twice.
- Another case where perfect collinearity may be encountered is with models with dummy variables, if one is not careful. Consider a model of the rental price $y_i$ of an apartment. This could depend on factors such as size, quality, etc., collected in $x_i$, as well as on the location of the apartment. Let $B_i = 1$ if the $i$th apartment is in Barcelona, $B_i = 0$ otherwise. Similarly, define $G_i$, $T_i$ and $L_i$ for Girona, Tarragona and Lleida. One could use a model such as

$y_i = \beta_1 + \beta_2 B_i + \beta_3 G_i + \beta_4 T_i + \beta_5 L_i + x_i'\gamma + \varepsilon_i$

In this model, $B_i + G_i + T_i + L_i = 1, \forall i$, so there is an exact relationship between these variables and the column of ones corresponding to the constant. One must either drop the constant, or one of the qualitative variables.

Collinearity (inexact): The more common case, if one doesn't make mistakes such as these, is the existence of inexact linear relationships, i.e., correlations between the regressors that are less than one in absolute value, but not zero. This is (unfortunately) quite common with economic data:

- economic data is non-experimental, so a researcher cannot control the values of the variables.
- common factors affect different variables at the same time, which tends to induce correlations. Variables tend to move together over time (for example, prices of apartments in Barcelona and in Valencia).

3.5. Consequences of Collinearity

The basic problem is that when two (or more) variables move together, it is difficult to determine their separate influences. This is reflected in imprecise estimates, i.e., estimates with high variances. With economic data, collinearity is commonly encountered, and is often a severe problem.


Figure 3.5.1. $s(\beta)$ when there is no collinearity

When there is collinearity, the minimizing point of the objective function that defines the OLS estimator ($s(\beta)$, the sum of squared errors) is relatively poorly defined. This is seen in Figures 3.5.1 and 3.5.2.

To see the effect of collinearity on variances, partition the regressor matrix as

$X = \begin{bmatrix} x & W \end{bmatrix}$

where $x$ is the first column of $X$ (note: we can interchange the columns of $X$ if we like, so there's no loss of generality in considering the first column). Now, the variance of $\hat\beta$, under the classical assumptions, is

$V(\hat\beta) = (X'X)^{-1}\sigma^2$


Figure 3.5.2. $s(\beta)$ when there is collinearity

Using the partition,

$X'X = \begin{bmatrix} x'x & x'W \\ W'x & W'W \end{bmatrix}$

and following a rule for partitioned inversion,

$(X'X)^{-1}_{1,1} = \left(x'x - x'W(W'W)^{-1}W'x\right)^{-1} = \left(x'\left(I_n - W(W'W)^{-1}W'\right)x\right)^{-1} = \left(ESS_{x|W}\right)^{-1}$

where by $ESS_{x|W}$ we mean the error sum of squares obtained from the regression

$x = W\lambda + v.$


Since $R^2 = 1 - ESS/TSS$, we have

$ESS = TSS(1 - R^2)$

so the variance of the coefficient corresponding to $x$ is

$V(\hat\beta_x) = \frac{\sigma^2}{TSS_x(1 - R^2_{x|W})}$

We see that three factors influence the variance of this coefficient. It will be high if
(1) $\sigma^2$ is large
(2) there is little variation in $x$ (draw a picture here)
(3) there is a strong linear relationship between $x$ and the other regressors, so that $W$ can explain the movement in $x$ well. In this case, $R^2_{x|W}$ will be close to 1. As $R^2_{x|W} \to 1$, $V(\hat\beta_x) \to \infty$.

The last of these cases is collinearity. Intuitively, when there are strong linear relations between the regressors, it is difficult to determine the separate influence of the regressors on the dependent variable. This can be seen by comparing the OLS objective function in the case of no correlation between regressors with the objective function with correlation between the regressors. See Figures 3.5.1 and 3.5.2.

Consequences (summary):
- the parameters associated with variables affected by collinearity have high variances.
- high variances lead to low power when testing hypotheses.
- high variances lead to low t-statistics, broad confidence intervals, etc.
- the results are sensitive to small changes in the sample.
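A small Monte Carlo sketch in Python can make the variance effect visible (an illustration, not part of the original guide; the data generating process is invented, with x2 built from x1 plus noise so that small noise means strong collinearity):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 1000

def beta2_hat(noise_sd):
    """One draw of the OLS estimate of the coefficient on x2
    in y = 1 + x1 + x2 + e, where x2 = x1 + noise."""
    x1 = rng.normal(size=n)
    x2 = x1 + noise_sd * rng.normal(size=n)
    y = 1.0 + x1 + x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

for noise_sd in (1.0, 0.1):  # weak vs. strong collinearity
    sd = np.std([beta2_hat(noise_sd) for _ in range(reps)])
    print(f"noise sd {noise_sd}: sd of estimated beta2 = {sd:.3f}")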

3.6. Detection of Collinearity

- The best way is simply to regress each explanatory variable in turn on the remaining regressors. If any of these auxiliary regressions has a high $R^2$, there is a problem of collinearity. Furthermore, this procedure identifies which parameters are affected. Sometimes we're only interested in certain parameters; collinearity isn't a problem if it doesn't affect what we're interested in estimating.
- An alternative is to examine the matrix of correlations between the regressors. High correlations are sufficient but not necessary for severe collinearity. There may be a near exact linear relationship between 3 variables without the existence of any near exact linear relationship between pairs of variables.
- Also indicative of collinearity is that the model fits well (high $R^2$), but none of the variables is significantly different from zero (i.e., their separate influences aren't well determined).

In summary, the artificial regressions are the best approach if one wants to be careful.

Example: using the mortalitat.gdt data discussed above (Section 3.2), we can use the artificial regression approach, regressing spirits on the other regressors (cig, wine, beer). The results are

$\widehat{spirits}$ = 1.01350 + 0.0670534 cig + 0.0794414 beer + 0.313745 wine
(standard errors: 1.4477, 0.12709, 0.02738, 0.3101)
$T = 34$, $R^2 = 0.8907$, $F(3, 30) = 90.669$, $\hat\sigma = 0.24749$


Note that the $R^2$ is very high: we have a serious problem of collinearity. This explains the instability of the parameter estimates we found earlier when we tried several models in Section 3.2.
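The artificial regression can also be sketched as a small Python helper (an illustration, not the guide's own code; aux_r2 is a hypothetical name, and X is assumed to be a 2-D numpy array of regressors without a constant column):

import numpy as np

def aux_r2(X, j):
    """R^2 of the artificial regression of column j of X on the
    remaining columns plus a constant. A value near 1 signals that
    the corresponding coefficient is affected by collinearity."""
    y = X[:, j]
    W = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    e = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)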

3.7. Dealing with collinearity

Collinearity is a problem of an uninformative sample. The first question is: is all the available information being used? Is more data available? Are there coefficient restrictions that have been neglected? Picture illustrating how a restriction can solve the problem of perfect collinearity.

There do exist specialized methods such as ridge regression, principal components analysis, etc., that can be used when there is a severe problem of collinearity, but these topics are advanced and are outside the scope of this course. These methods present problems of their own; they are not clear and obviously good solutions to the problem. In sum, collinearity is a fact of life in econometrics, and there is no clear solution to the problem. It is important to be aware of its effects and to know when it is present.

3.8. Segon Projecte de Docència Tutoritzada

(1) For the Nerlove model of the cost of electricity production

$\ln(cost) = \beta_1 + \beta_2 \ln(output) + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

which was explained in Section 2.5.3, use artificial regressions to check for the existence of collinearity.
(2) What is the reason for the lack of significance of the coefficient in the Nerlove model? Give an economic interpretation.
(3) Verify the existence of collinearity in the mortality models presented in Section 3.2. Download the data and run the relevant artificial regressions. Also present the correlation matrix of the regressors cig, spirits, wine, beer. Give an interpretation.

3.9. Chapter Exercises


The professor of the practical sessions will give you a list of problems. In addition, you might also consider exercises 10.5, 10.7, 10.9, 10.19, 10.30a, 10.30b from Gujarati, pp. 361-371.

CHAPTER 4

Heteroscedasticity

4.1. Introduction

Basic concepts and goals for learning. After studying the material, you should learn the answers to the following questions:
(1) What is heteroscedasticity?
(2) What are the properties of the OLS estimator when there is heteroscedasticity?
(3) What is the GLS estimator?
(4) What is the feasible GLS estimator?
(5) What are the properties of the (F)GLS estimator?
(6) How can the presence of heteroscedasticity be detected?
(7) How can we deal with heteroscedasticity if it is present?

Readings: Gujarati, Econometría (cuarta edición), Chapter 11: Heteroscedasticidad: ¿Qué pasa cuando la varianza del error no es constante?, pp. 372-424.

4.2. Motivation

One of the assumptions we've made up to now is that

$\varepsilon_t \sim IID(0, \sigma^2),$

or occasionally

$\varepsilon_t \sim IIN(0, \sigma^2).$
This model is quite unreasonable in many cases. Often, the variance of $\varepsilon_t$ will change depending on the values of the regressors, or there may be correlations between different $\varepsilon_t$, $\varepsilon_s$, $s \neq t$. For example, consider the Nerlove model of section 2.5.3. If we estimate the model in equation 5.9.1, a plot of the residuals versus log(output) is in Figure 4.2.1. Note that the variance of the error appears to be larger for small firms, and smaller for large firms. This seems to violate the classical assumption that $E(\varepsilon_t^2) = \sigma^2, \forall t$. If the variance is not constant, we have a problem of heteroscedasticity.

Note also in Figure 4.2.1 that there seems to be correlation in the residuals: when a residual is positive, the next one is too in most cases. When a residual is negative, the next one is more likely to be negative than positive. If this is the case, it's a violation of the classical assumption that $E(\varepsilon_t \varepsilon_s) = 0$, $t \neq s$: we have a problem of autocorrelation.

In this chapter and the next, we'll investigate the importance of these two problems, and how to deal with them.

Figure 4.2.1. Residuals of Nerlove model

4.3. Basic Concepts and Definitions

Now we'll investigate the consequences of nonidentically and/or dependently distributed errors. We'll assume fixed regressors for now, relaxing this admittedly unrealistic assumption later. The model is

$y = X\beta + \varepsilon, \quad E(\varepsilon) = 0, \quad V(\varepsilon) = \Sigma$

where $\Sigma$ is a general symmetric positive definite matrix.

- The case where $\Sigma$ is a diagonal matrix gives uncorrelated, nonidentically distributed errors. This is known as heteroscedasticity (HET).
- The case where $\Sigma$ has the same number on the main diagonal but nonzero elements off the main diagonal gives identically (assuming higher moments are also the same) dependently distributed errors. This is known as autocorrelation (AUT).

Heteroscedasticity (definition): Heteroscedasticity is the existence of errors that have different variances. More precisely, there exist $i$ and $j$ such that $V(\varepsilon_i) \neq V(\varepsilon_j)$.

Autocorrelation (definition): Autocorrelation is the existence of errors that are correlated with one another. More precisely, there exist distinct $i$ and $j$ such that $E(\varepsilon_i \varepsilon_j) \neq 0$.


- Note that the presence of HET implies that $\Sigma$ will have different elements on its main diagonal.
- If there is AUT, then at least some elements of $\Sigma$ off the main diagonal will be different from zero.
- When there is HET but not AUT, $\Sigma$ will be a diagonal matrix.
- It is possible to have both HET and AUT at the same time. In this case, $\Sigma$ can be a general symmetric positive definite matrix.

4.4. Effects of Het. and Aut. on the OLS estimator

The least squares estimator is

$\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon$

We have unbiasedness, as before. The variance of $\hat\beta$ is

(4.4.1)  $E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}X'\Sigma X(X'X)^{-1}$

Due to this, any test statistic that is based upon an estimator of $\sigma^2$ is invalid, since there isn't any $\sigma^2$: it doesn't exist as a feature of the true process that generates the data. In particular, the formulas for the $t$, $F$, and $\chi^2$ based tests given above do not lead to statistics with these distributions.

- $\hat\beta$ is still consistent, following exactly the same argument given before.


- If $\varepsilon$ is normally distributed, then $\hat\beta \sim N\left(\beta, (X'X)^{-1}X'\Sigma X(X'X)^{-1}\right)$. The problem is that $\Sigma$ is unknown in general, so this distribution won't be useful for testing hypotheses.
- Without normality, we still have

$\sqrt{n}(\hat\beta - \beta) = \sqrt{n}(X'X)^{-1}X'\varepsilon = \left(\frac{X'X}{n}\right)^{-1} n^{-1/2} X'\varepsilon$

Define the limiting variance of $n^{-1/2}X'\varepsilon$ as

$\lim_{n\to\infty} E\left(\frac{X'\varepsilon\varepsilon'X}{n}\right) = \Omega$

(supposing a CLT applies), so we obtain

$\sqrt{n}(\hat\beta - \beta) \stackrel{d}{\to} N\left(0, Q_X^{-1}\Omega Q_X^{-1}\right)$

Summary: OLS with heteroscedasticity and/or autocorrelation is:
- unbiased in the same circumstances in which the estimator is unbiased with i.i.d. errors
- of different variance than before, so the previous test statistics aren't valid
- consistent
- asymptotically normally distributed, but with a different limiting covariance matrix; the previous test statistics aren't valid in this case for this reason
- inefficient, as is shown below.


4.5. The Generalized Least Squares (GLS) estimator

Suppose $\Sigma$ were known. Then one could form the Cholesky decomposition

$P'P = \Sigma^{-1}$

Here, $P$ is an upper triangular matrix. We have

$P'P\Sigma = I_n$

so

$P'P\Sigma P' = P',$

which implies that

$P\Sigma P' = I_n$

Consider the model

$Py = PX\beta + P\varepsilon,$

or, making the obvious definitions,

$y^* = X^*\beta + \varepsilon^*.$

The variance of $\varepsilon^* = P\varepsilon$ is

$E(P\varepsilon\varepsilon'P') = P\Sigma P' = I_n$

Therefore, the model

$y^* = X^*\beta + \varepsilon^*, \quad E(\varepsilon^*) = 0, \quad V(\varepsilon^*) = I_n$

satisfies the classical assumptions. The GLS estimator is simply OLS applied to the transformed model:

$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'P'PX)^{-1}X'P'Py = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$

The GLS estimator is unbiased in the same circumstances under which the OLS estimator is unbiased. For example,

$E(\hat\beta_{GLS}) = E\left[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y\right] = E\left[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}(X\beta + \varepsilon)\right] = \beta.$

The variance of the estimator can be calculated using

$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X^{*\prime}X^*)^{-1}X^{*\prime}(X^*\beta + \varepsilon^*) = \beta + (X^{*\prime}X^*)^{-1}X^{*\prime}\varepsilon^*$

so

$E\left[(\hat\beta_{GLS} - \beta)(\hat\beta_{GLS} - \beta)'\right] = E\left[(X^{*\prime}X^*)^{-1}X^{*\prime}\varepsilon^*\varepsilon^{*\prime}X^*(X^{*\prime}X^*)^{-1}\right] = (X^{*\prime}X^*)^{-1} = (X'\Sigma^{-1}X)^{-1}$

Either of these last formulas can be used.

- All the previous results regarding the desirable properties of the least squares estimator hold, when dealing with the transformed model, since the transformed model satisfies the classical assumptions.
- Tests are valid, using the previous formulas, as long as we substitute $X^*$ in place of $X$. Furthermore, any test that involves $\sigma^2$ can set it to 1. This is preferable to re-deriving the appropriate formulas.
- The GLS estimator is more efficient than the OLS estimator. This is a consequence of the Gauss-Markov theorem, since the GLS estimator is based on a model that satisfies the classical assumptions but the OLS estimator is not. To see this directly, note that

$Var(\hat\beta) - Var(\hat\beta_{GLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} - (X'\Sigma^{-1}X)^{-1} = A\Sigma A'$

where $A = (X'X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$. This may not seem obvious, but it is true, as you can verify for yourself. Then noting that $A\Sigma A'$ is a quadratic form in a positive definite matrix, we conclude that $A\Sigma A'$ is positive semi-definite, and that GLS is efficient relative to OLS.
- As one can verify by calculating first order necessary conditions, the GLS estimator is the solution to the minimization problem

$\hat\beta_{GLS} = \arg\min_\beta\, (y - X\beta)'\Sigma^{-1}(y - X\beta)$

so the metric $\Sigma^{-1}$ is used to weight the residuals.
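The transform-then-OLS logic can be sketched in Python (an illustration under an invented data generating process, not part of the original guide; for a diagonal Sigma, the Cholesky factor of Sigma inverse is just a diagonal reweighting):

import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# heteroscedastic errors: the variance grows with the regressor
sig2 = np.exp(X[:, 1])                       # diagonal of Sigma
y = X @ beta + np.sqrt(sig2) * rng.normal(size=n)

# for diagonal Sigma, P = diag(1/sigma_t) satisfies P'P = Sigma^{-1},
# so GLS is OLS on data weighted by 1/sigma_t
P = np.diag(1.0 / np.sqrt(sig2))
b_gls = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_ols, b_gls)   # both near (1, 2); GLS uses the efficient weighting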

4.6. Feasible GLS

The problem is that $\Sigma$ usually isn't known, so this estimator isn't available.

- Consider the dimension of $\Sigma$: it's an $n \times n$ matrix with $(n^2 - n)/2 + n = (n^2 + n)/2$ unique elements.
- The number of parameters to estimate is larger than $n$, and increases faster than $n$. There's no way to devise an estimator that satisfies a law of large numbers without adding restrictions.
- The feasible GLS estimator is based upon making sufficient assumptions regarding the form of $\Sigma$ so that a consistent estimator can be devised.

Suppose that we parameterize $\Sigma$ as a function of $X$ and $\theta$, where $\theta$ may include $\beta$ as well as other parameters, so that

$\Sigma = \Sigma(X, \theta)$

where $\theta$ is of fixed dimension. Assuming that the parametrization is correct, so that in fact $\Sigma = \Sigma(X, \theta)$, and if we can consistently estimate $\theta$, then we can consistently estimate $\Sigma$ (as long as $\Sigma(X, \theta)$ is a continuous function of $\theta$). In this case,

$\hat\Sigma = \Sigma(X, \hat\theta) \stackrel{p}{\to} \Sigma(X, \theta)$

If we replace $\Sigma$ in the formulas for the GLS estimator with $\hat\Sigma$, we obtain the FGLS estimator.

The FGLS estimator shares the same asymptotic properties as GLS. These are:
(1) consistency
(2) asymptotic normality
(3) asymptotic efficiency if the errors are normally distributed (Cramér-Rao)
(4) test procedures are asymptotically valid.

In practice, the usual way to proceed is:
(1) Define a consistent estimator of $\theta$. This is a case-by-case proposition, depending on the parametrization $\Sigma(\theta)$. We'll see examples below.
(2) Form $\hat\Sigma = \Sigma(X, \hat\theta)$.
(3) Calculate the Cholesky factorization $\hat P = \mathrm{Chol}(\hat\Sigma^{-1})$.
(4) Transform the model using $\hat P y = \hat P X\beta + \hat P\varepsilon$.
(5) Estimate using OLS on the transformed model.

4.7. Heteroscedasticity

Heteroscedasticity is the case where

$E(\varepsilon\varepsilon') = \Sigma$

is a diagonal matrix, so that the errors are uncorrelated, but have different variances. Heteroscedasticity is usually thought of as associated with cross-sectional data, though there is absolutely no reason why time series data cannot also be heteroscedastic. Actually, the popular ARCH (autoregressive conditionally heteroscedastic) models that you may hear about in your finance classes explicitly assume that a time series is heteroscedastic.

Consider a supply function

$q_i = \beta_1 + \beta_p P_i + \beta_s S_i + \varepsilon_i$

where $P_i$ is price and $S_i$ is some measure of the size of the $i$th firm. One might suppose that unobservable factors (e.g., talent of managers, degree of coordination between production units, etc.) account for the error term $\varepsilon_i$. If there is more variability in these factors for large firms than for small firms, then $\varepsilon_i$ may have a higher variance when $S_i$ is high than when it is low.

Another example is individual demand:

$q_i = \beta_1 + \beta_p P_i + \beta_m M_i + \varepsilon_i$

where $P_i$ is price and $M_i$ is income. In this case, $\varepsilon_i$ can reflect variations in preferences. There are more possibilities for expression of preferences when one is rich, so it is possible that the variance of $\varepsilon_i$ could be higher when $M_i$ is high.

Add example of group means.

4.7.1. Detection. There exist many tests for the presence of heteroscedasticity. We'll discuss three methods.

4.7.1.1. Goldfeld-Quandt. The sample is divided into three parts, with $n_1$, $n_2$ and $n_3$ observations, where $n_1 + n_2 + n_3 = n$. The model is estimated using the first and third parts of the sample, separately, so that $\hat\beta^1$ and $\hat\beta^3$ will be independent. Then we have

$\frac{\hat\varepsilon^{1\prime}\hat\varepsilon^1}{\sigma^2} = \frac{\varepsilon^{1\prime}M^1\varepsilon^1}{\sigma^2} \stackrel{d}{\to} \chi^2(n_1 - K)$

and

$\frac{\hat\varepsilon^{3\prime}\hat\varepsilon^3}{\sigma^2} = \frac{\varepsilon^{3\prime}M^3\varepsilon^3}{\sigma^2} \stackrel{d}{\to} \chi^2(n_3 - K)$

so

$\frac{\hat\varepsilon^{1\prime}\hat\varepsilon^1/(n_1 - K)}{\hat\varepsilon^{3\prime}\hat\varepsilon^3/(n_3 - K)} \stackrel{d}{\to} F(n_1 - K, n_3 - K).$

The distributional result is exact if the errors are normally distributed. This test is a two-tailed test. Alternatively, and probably more conventionally, if one has prior ideas about the possible magnitudes of the variances of the observations, one could order the observations accordingly, from largest to smallest. In this case, one would use a conventional one-tailed F-test. Draw picture.

- Ordering the observations is an important step if the test is to have any power.
- The motive for dropping the middle observations is to increase the difference between the average variances in the subsamples, supposing that there exists heteroscedasticity. This can increase the power of the test. On the other hand, dropping too many observations will substantially increase the variance of the statistics $\hat\varepsilon^{1\prime}\hat\varepsilon^1$ and $\hat\varepsilon^{3\prime}\hat\varepsilon^3$. A rule of thumb, based on Monte Carlo experiments, is to drop around 25% of the observations.
- If one doesn't have any ideas about the form of the heteroscedasticity, the test will probably have low power since a sensible data ordering isn't available.
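A sketch of the mechanics in Python (illustrative, not the guide's own code; goldfeld_quandt is a hypothetical name, and y, X are assumed already sorted so that the suspected variance increases down the sample):

import numpy as np
from scipy import stats

def goldfeld_quandt(y, X, drop_frac=0.25):
    """Goldfeld-Quandt sketch: drop the middle block, fit OLS on
    each end, and compare residual variances with an F statistic.
    With the data sorted smallest-variance first, the larger
    variance goes in the numerator (one-tailed test)."""
    n, K = X.shape
    n1 = (n - int(drop_frac * n)) // 2
    ess = []
    for part in (slice(0, n1), slice(n - n1, n)):
        b = np.linalg.lstsq(X[part], y[part], rcond=None)[0]
        e = y[part] - X[part] @ b
        ess.append(e @ e)
    F = ess[1] / ess[0]  # equal subsample sizes, so the dfs cancel
    return F, 1.0 - stats.f.cdf(F, n1 - K, n1 - K)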

4.7.1.2. White's test. When one has little idea whether there exists heteroscedasticity, and no idea of its potential form, the White test is a possibility. The idea is that if there is homoscedasticity, then

$E(\varepsilon_t^2 | x_t) = \sigma^2, \forall t$

so that $x_t$ or functions of $x_t$ shouldn't help to explain $E(\varepsilon_t^2)$. The test works as follows:

(1) Since $\varepsilon_t$ isn't available, use the consistent estimator $\hat\varepsilon_t$ instead.
(2) Regress

$\hat\varepsilon_t^2 = \sigma^2 + z_t'\gamma + v_t$

where $z_t$ is a $P$-vector. $z_t$ may include some or all of the variables in $x_t$, as well as other variables. White's original suggestion was to use $x_t$, plus the set of all unique squares and cross products of variables in $x_t$.
(3) Test the hypothesis that $\gamma = 0$. The $qF$ statistic in this case is

$qF = \frac{(ESS_R - ESS_U)/P}{ESS_U/(n - P - 1)}$

Note that $ESS_R = TSS_U$, so dividing both numerator and denominator by this we get

$qF = (n - P - 1)\frac{R^2}{1 - R^2}$

Note that this is the $R^2$ of the artificial regression used to test for heteroscedasticity, not the $R^2$ of the original model.

An asymptotically equivalent statistic, under the null of no heteroscedasticity (so that $R^2$ should tend to zero), is

$nR^2 \stackrel{a}{\sim} \chi^2(P).$

This doesn't require normality of the errors, though it does assume that the fourth moment of $\varepsilon_t$ is constant, under the null. Question: why is this necessary?

- The White test has the disadvantage that it may not be very powerful unless the $z_t$ vector is chosen well, and this is hard to do without knowledge of the form of heteroscedasticity.
- It also has the problem that specification errors other than heteroscedasticity may lead to rejection.
- Note: the null hypothesis of this test may be interpreted as $\gamma = 0$ for the variance model $V(\varepsilon_t^2) = h(\alpha + z_t'\gamma)$, where $h(\cdot)$ is an arbitrary function of unknown form. The test is more general than it may appear from the regression that is used.

4.7.1.3. Plotting the residuals. A very simple method is to simply plot the residuals (or their squares). Draw pictures here. Like the Goldfeld-Quandt test, this will be more informative if the observations are ordered according to the suspected form of the heteroscedasticity.

4.7.2. Dealing with heteroscedasticity if it is present. Correcting for heteroscedasticity requires that a parametric form for $\Sigma(\theta)$ be supplied, and that a means for consistently estimating $\theta$ be determined. The estimation method will be specific to the form supplied for $\Sigma(\theta)$. We'll consider two examples, multiplicative HET and HET by groups. Before this, let's consider using OLS, even if we have HET. The advantage of this is that we don't need to specify the form of $\Sigma(\theta)$.

4.7.2.1. OLS with heteroscedasticity-consistent covariance matrix estimation. Eicker (1967) and White (1980) showed how to modify test statistics to account for heteroscedasticity of unknown form. The OLS estimator has asymptotic distribution

$\sqrt{n}(\hat\beta - \beta) \stackrel{d}{\to} N\left(0, Q_X^{-1}\Omega Q_X^{-1}\right)$


as we've already seen. Recall that we defined

$\Omega = \lim_{n\to\infty} E\left(\frac{X'\varepsilon\varepsilon'X}{n}\right)$

This matrix has dimension $K \times K$ and can be consistently estimated, even if we can't estimate $\Sigma$ consistently. The consistent estimator, under heteroscedasticity but no autocorrelation, is

$\hat\Omega = \frac{1}{n}\sum_{t=1}^{n} x_t x_t' \hat\varepsilon_t^2$

One can then modify the previous test statistics to obtain tests that are valid when there is heteroscedasticity of unknown form. For example, the Wald test for $H_0: R\beta - r = 0$ would be

$n\left(R\hat\beta - r\right)'\left(R\left(\frac{X'X}{n}\right)^{-1}\hat\Omega\left(\frac{X'X}{n}\right)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \stackrel{a}{\sim} \chi^2(q)$
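In finite-sample form this is the familiar "sandwich" covariance; a minimal Python sketch (illustrative, not the guide's own code; ols_hc0 is a hypothetical name):

import numpy as np

def ols_hc0(y, X):
    """OLS with the Eicker-White heteroscedasticity-consistent
    covariance (HC0): (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e ** 2)[:, None])
    V = bread @ meat @ bread
    return b, np.sqrt(np.diag(V))  # coefficients and robust std. errors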

4.7.2.2. Multiplicative heteroscedasticity. Suppose the model is

$y_t = x_t'\beta + \varepsilon_t$
$\sigma_t^2 = E(\varepsilon_t^2) = (z_t'\gamma)^\delta$

but the other classical assumptions hold. In this case

$\varepsilon_t^2 = (z_t'\gamma)^\delta + v_t$

and $v_t$ has mean zero. Nonlinear least squares could be used to estimate $\gamma$ and $\delta$ consistently, were $\varepsilon_t$ observable. The solution is to substitute the squared OLS residuals $\hat\varepsilon_t^2$ in place of $\varepsilon_t^2$, since the substitution is justified asymptotically by the Slutsky theorem. Once we have $\hat\gamma$ and $\hat\delta$, we can estimate $\sigma_t^2$ consistently using

$\hat\sigma_t^2 = (z_t'\hat\gamma)^{\hat\delta}.$


In the second step, we transform the model by dividing by the standard deviation:

$\frac{y_t}{\hat\sigma_t} = \frac{x_t'\beta}{\hat\sigma_t} + \frac{\varepsilon_t}{\hat\sigma_t}$

or

$y_t^* = x_t^{*\prime}\beta + \varepsilon_t^*.$

Asymptotically, this model satisfies the classical assumptions.

- This model is a bit complex in that NLS is required to estimate the model of the variance. A simpler version would be

$y_t = x_t'\beta + \varepsilon_t$
$\sigma_t^2 = E(\varepsilon_t^2) = \sigma^2 z_t^\delta$

where $z_t$ is a single variable. There are still two parameters to be estimated, and the model of the variance is still nonlinear in the parameters. However, the search method can be used in this case to reduce the estimation problem to repeated applications of OLS.

- First, we define an interval of reasonable values for $\delta$, e.g., $\delta \in [0, 3]$.
- Partition this interval into $M$ equally spaced values, e.g., $\{0, .1, .2, \ldots, 2.9, 3\}$.
- For each of these values $\delta_m$, calculate the variable $z_t^{\delta_m}$.
- The regression

$\hat\varepsilon_t^2 = \sigma^2 z_t^{\delta_m} + v_t$

is linear in the parameters, conditional on $\delta_m$, so one can estimate $\sigma^2$ by OLS.
- Save the pairs $(\sigma_m^2, \delta_m)$, and the corresponding $ESS_m$. Choose the pair with the minimum $ESS_m$ as the estimate.
- Next, divide the model by the estimated standard deviations.
- Can refine. Draw picture.
- Works well when the parameter to be searched over is low dimensional, as in this case.
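The grid search can be sketched in a few lines of Python (illustrative, not the guide's own code; search_delta is a hypothetical name, e2 the squared OLS residuals, z the variance-driving variable):

import numpy as np

def search_delta(e2, z, deltas=np.arange(0.0, 3.01, 0.1)):
    """Search-method sketch for sigma_t^2 = sigma^2 * z_t^delta.
    Conditional on delta, regressing e2 on z**delta is linear, so
    each step is a one-variable OLS through the origin; keep the
    (delta, sigma^2) pair with the smallest ESS."""
    best = None
    for d in deltas:
        w = z ** d
        s2 = (w @ e2) / (w @ w)          # OLS slope, no intercept
        ess = np.sum((e2 - s2 * w) ** 2)
        if best is None or ess < best[2]:
            best = (d, s2, ess)
    return best  # (delta_hat, sigma2_hat, min ESS)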

4.7.2.3. Groupwise heteroscedasticity. A common case is where we have repeated observations on each of a number of economic agents: e.g., 10 years of macroeconomic data on each of a set of countries or regions, or daily observations of transactions of 200 banks. This sort of data is a pooled cross-section time-series model. It may be reasonable to presume that the variance is constant over time within the cross-sectional units, but that it differs across them (e.g., firms or countries of different sizes...). The model is

$y_{it} = x_{it}'\beta + \varepsilon_{it}$
$E(\varepsilon_{it}^2) = \sigma_i^2, \forall t$

where $i = 1, 2, \ldots, G$ are the agents, and $t = 1, 2, \ldots, n$ are the observations on each agent.

- The other classical assumptions are presumed to hold. In this case, the variance $\sigma_i^2$ is specific to each agent, but constant over the observations for that agent.
- In this model, we assume that $E(\varepsilon_{it}\varepsilon_{is}) = 0$, $t \neq s$. This is a strong assumption that we'll relax later.

To correct for heteroscedasticity, just estimate each $\sigma_i^2$ using the natural estimator:

$\hat\sigma_i^2 = \frac{1}{n}\sum_{t=1}^{n}\hat\varepsilon_{it}^2$

- Note that we use $1/n$ here since it's possible that there are more than $n$ regressors, so $n - K$ could be negative. Asymptotically the difference is unimportant.
- With each of these, transform the model as usual:

$\frac{y_{it}}{\hat\sigma_i} = \frac{x_{it}'\beta}{\hat\sigma_i} + \frac{\varepsilon_{it}}{\hat\sigma_i}$

Do this for each cross-sectional group. This transformed model satisfies the classical assumptions, asymptotically.
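The full two-step procedure can be sketched in Python (an illustration, not the guide's own code; groupwise_fgls is a hypothetical name, and groups is assumed to be a 1-D array of group labels aligned with y):

import numpy as np

def groupwise_fgls(y, X, groups):
    """Groupwise-heteroscedasticity FGLS sketch: estimate sigma_i^2
    from the pooled OLS residuals within each group (1/n divisor,
    as in the text), then divide y and X by sigma_i and re-run OLS."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sig = np.empty_like(y)
    for g in np.unique(groups):
        m = groups == g
        sig[m] = np.sqrt(np.mean(e[m] ** 2))
    return np.linalg.lstsq(X / sig[:, None], y / sig, rcond=None)[0]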

4.8. Example

4.8.1. Example: the Nerlove model. Let's check the Nerlove data for evidence of heteroscedasticity. In what follows, we're going to use the model with the constant and output coefficient varying across 5 groups, but with the input price coefficients fixed (see Equation 2.5.2). If you plot the residuals of this model, you obtain Figure 4.8.1. We can see pretty clearly that the error variance is larger for small firms than for larger firms. As part of your next Docència Tutoritzada project, you will use the White and Goldfeld-Quandt tests to confirm that homoscedasticity is strongly rejected.

Figure 4.8.1. Residuals, Nerlove model, sorted by firm size

4.9. Tercer Projecte de Docència Tutoritzada

(1) Wisconsin data
(a) Download the Wisconsin data on height and income.
(b) Select the observations with complete information on height and income.
(c) Create a dummy variable indicating whether the person is a woman or a man.
(d) Create new variables "AD" and "IQD" that express height and IQ as deviations from their sample means.
(e) Estimate the model renda = b1 + b2*Dona + b3*AD + b4*(Dona*AD) + b5*IQD + e by OLS.
(f) Comment on the results.
(g) Check whether there is heteroscedasticity
(i) by plotting the residuals
(ii) with the Goldfeld-Quandt test
(iii) with the White test
(h) Estimate by OLS again, but with robust standard errors. Compare the results with the previous ones.
(i) Do a Generalized Least Squares estimation, supposing that there is heteroscedasticity by groups. There are two groups: men and women. Comment on the results.
(j) Do a Generalized Least Squares estimation, using the GRETL option "Corrección de heteroscedasticidad". Comment on the results.
(2) Nerlove data
(a) Re-estimate the model with dummy variables and interaction terms from the Primer Projecte de Docència Tutoritzada:

$\ln(cost) = \sum_{j=1}^{5} \alpha_j d_j + \sum_{j=1}^{5} \gamma_j [d_j \ln(output)] + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

(b) Test the null hypothesis "the errors are homoscedastic" with the White test.
(c) Make graphs of the residuals, and comment on whether heteroscedasticity is detected. You should obtain a graph similar to Figure 4.8.1.
(d) Do a Generalized Least Squares estimation, using the GRETL option "Corrección de heteroscedasticidad". Comment on the results.

4.10. Chapter Exercises


The professor of the practical sessions will give you a list of problems. In addition, you might also consider exercises 11.1, 11.2, 11.6, 11.15, 11.16 from Gujarati, pp. 413-421.

CHAPTER 5

Autocorrelation

5.1. Introduction

Basic concepts and goals for learning. After studying the material, you should learn the answers to the following questions:
(1) What is autocorrelation (AUT)?
(2) What are the properties of the OLS estimator when there is autocorrelation?
(3) How can the presence of autocorrelation be detected?
(4) How can we deal with autocorrelation if it is present?

Readings: Gujarati, Econometría (cuarta edición), Chapter 12: Autocorrelación: ¿qué sucede si los términos de error están correlacionados?, pp. 425-486.

5.2. Motivation

Autocorrelation, which is the serial correlation of the error term, so that $E(\varepsilon_t \varepsilon_s) \neq 0$ for $t \neq s$, is a problem that is usually associated with time series data, but it can also affect cross-sectional data. For example, a shock to oil prices will simultaneously affect all countries, so one could expect contemporaneous correlation of macroeconomic variables across countries. Seasonality is another common problem. Consider the Keeling-Whorf.gdt data. If we regress CO2 concentration on a time trend, we obtain the fitted line in Figure 5.2.1. The residuals from the same model are in Figure 5.2.2. In addition to a high frequency monthly pattern in the residuals, there is a long term low frequency wave. It is clear that the errors of this model are not independent over time. This is an example of autocorrelation.

Figure 5.2.1. Keeling-Whorf CO2 data, fit using time trend

Figure 5.2.2. Keeling-Whorf CO2 data, residuals using time trend


If you examine the residuals of the simple Nerlove model (equation 5.9.1), in Figure 4.8.1, you can also detect that there appears to be autocorrelation. In this chapter, we will explore the causes, effects and treatments for AUT.

5.3. Causes

Autocorrelation is the existence of correlation across the error terms:

$$E(\epsilon_t \epsilon_s) \neq 0, \quad t \neq s.$$

Why might this occur? Plausible explanations include:

(1) Lags in adjustment to shocks. In a model such as

$$y_t = x_t'\beta + \epsilon_t,$$

one could interpret $x_t'\beta$ as the equilibrium value and $\epsilon_t$ as a shock that moves the system away from equilibrium. Suppose $x_t$ is constant over a number of observations. If the time needed to return to equilibrium is long with respect to the observation frequency, one could expect $\epsilon_{t+1}$ to be positive, conditional on $\epsilon_t$ positive, which induces a correlation.

(2) Unobserved factors that are correlated over time. The error term is often assumed to correspond to unobservable factors. If these factors are correlated, there will be autocorrelation.

(3) Misspecification of the model. Suppose that the data generating process (DGP) is

$$y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + \epsilon_t$$

but we estimate

$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t$$


Figure 5.3.1. Autocorrelation induced by misspecication

The effects are illustrated in Figure 5.3.1. A similar problem might explain the residuals of the simple Nerlove model, in Figure 4.2.1.
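The figure can be mimicked with a small simulation: fit a straight line to data generated by a quadratic DGP, and the residuals inherit long runs of the same sign. A hansl sketch follows; the DGP coefficients are arbitrary choices for illustration.

    # autocorrelation induced by fitting a linear model to a quadratic DGP
    nulldata 100
    setobs 1 1 --time-series
    genr time
    series x = time / 100
    series y = 1 + 0.5*x + 2*x^2 + 0.05*normal()   # true DGP is quadratic
    ols y const x                                  # misspecified linear model
    series e = $uhat
    gnuplot e --time-series --with-lines           # residuals show long runs of one sign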

5.4. Effects on the OLS estimator

As with heteroscedasticity, the OLS estimator remains consistent as long as the regressors are exogenous (see Section 5.8 for an important exception), but the standard formula for its variance does not apply. The correct formula is given in equation 4.4.1. Next we discuss two GLS corrections.

5.5. Corrections

There are many types of autocorrelation. The way to correct for the problem depends on the exact type of autocorrelation that exists. We'll consider two examples. The first is the most commonly encountered case: autoregressive order 1 (AR(1)) errors.


5.5.1. AR(1). The model is

$$y_t = x_t'\beta + \epsilon_t$$
$$\epsilon_t = \rho \epsilon_{t-1} + u_t$$
$$u_t \sim iid(0, \sigma_u^2)$$
$$E(\epsilon_t u_s) = 0, \quad t < s$$

We assume that the model satisfies the other classical assumptions.

We need a stationarity assumption: $|\rho| < 1$. Otherwise the variance of $\epsilon_t$ explodes as $t$ increases, so standard asymptotics will not apply.

By recursive substitution we obtain

$$\epsilon_t = \rho \epsilon_{t-1} + u_t = \rho(\rho \epsilon_{t-2} + u_{t-1}) + u_t = \rho^2 \epsilon_{t-2} + \rho u_{t-1} + u_t = \rho^2(\rho \epsilon_{t-3} + u_{t-2}) + \rho u_{t-1} + u_t = \cdots$$

In the limit the lagged $\epsilon$ drops out, since $\rho^m \to 0$ as $m \to \infty$, so we obtain

$$\epsilon_t = \sum_{m=0}^{\infty} \rho^m u_{t-m}$$

With this, the variance of $\epsilon_t$ is found as

$$E(\epsilon_t^2) = \sigma_u^2 \sum_{m=0}^{\infty} \rho^{2m} = \frac{\sigma_u^2}{1 - \rho^2}$$


If we had directly assumed that $\epsilon_t$ is covariance stationary, we could obtain this using

$$V(\epsilon_t) = \rho^2 E(\epsilon_{t-1}^2) + 2\rho E(\epsilon_{t-1} u_t) + E(u_t^2) = \rho^2 V(\epsilon_t) + \sigma_u^2,$$

so

$$V(\epsilon_t) = \frac{\sigma_u^2}{1 - \rho^2}$$

0th

order autocovariance:

0 = V (t )

Note that the variance does not depend on

Likewise, the rst order autocovariance

is

Cov(t , t1 ) = s = E((t1 + ut ) t1 ) = = V (t )
2 u 1 2

Using the same method, we nd that for

s<t

Cov(t , ts ) = s =
The autocovariances don't depend on

2 s u 1 2
the process

t:

{t }

is

covariance sta-

tionary

The

correlation ( in general, for r.v.'s x and y ) is dened as


corr(x, y)

cov(x, y) se(x)se(y)

5.5. CORRECTIONS

73

but in this case, the two standard errors are the same, so the

s-order autocorrelation

is

s = s
All this means that the overall covariance matrix $\Sigma$ has the form

$$\Sigma = \underbrace{\frac{\sigma_u^2}{1 - \rho^2}}_{\text{this is the variance}} \underbrace{\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \cdots & \rho & 1 \end{pmatrix}}_{\text{this is the correlation matrix}}$$

So we have homoscedasticity, but elements off the main diagonal are not zero. All of this depends on only two parameters, $\rho$ and $\sigma_u^2$. If we can estimate these consistently, we can apply FGLS.

It turns out that it's easy to estimate these consistently. The steps are:

(1) Estimate the model $y_t = x_t'\beta + \epsilon_t$ by OLS.

(2) Take the residuals, and estimate the model

$$\hat{\epsilon}_t = \rho \hat{\epsilon}_{t-1} + \hat{u}_t$$

Since $\hat{\epsilon}_t \stackrel{p}{\to} \epsilon_t$, this regression is asymptotically equivalent to the regression

$$\epsilon_t = \rho \epsilon_{t-1} + u_t,$$

which satisfies the classical assumptions. Therefore, the estimator $\hat{\rho}$ obtained by applying OLS to $\hat{\epsilon}_t = \rho \hat{\epsilon}_{t-1} + \hat{u}_t$ is consistent: $\hat{\rho} \stackrel{p}{\to} \rho$. Also, since $\hat{u}_t \stackrel{p}{\to} u_t$,

$$\hat{\sigma}_u^2 = \frac{1}{n} \sum_{t=2}^{n} \hat{u}_t^2 \stackrel{p}{\to} \sigma_u^2$$

(3) With the consistent estimators $\hat{\sigma}_u^2$ and $\hat{\rho}$, form $\hat{\Sigma} = \Sigma(\hat{\sigma}_u^2, \hat{\rho})$ using the previous structure of $\Sigma$, and estimate by FGLS. Actually, one can omit the factor $\sigma_u^2/(1 - \rho^2)$, since it cancels out in the formula

$$\hat{\beta}_{FGLS} = \left(X'\hat{\Sigma}^{-1}X\right)^{-1}\left(X'\hat{\Sigma}^{-1}y\right).$$

An asymptotically equivalent approach is to simply estimate the transformed model

$$y_t - \hat{\rho} y_{t-1} = (x_t - \hat{\rho} x_{t-1})'\beta + \hat{u}_t$$

using $n-1$ observations (since $y_0$ and $x_0$ aren't available). This is the method of Cochrane and Orcutt. Dropping the first observation is asymptotically irrelevant, but it can be very important in small samples. One can recuperate the first observation (this is the Prais-Winsten method) by putting

$$y_1^* = y_1 \sqrt{1 - \hat{\rho}^2}$$
$$x_1^* = x_1 \sqrt{1 - \hat{\rho}^2}$$

Note that the variance of $y_1^*$ is $\sigma_u^2$ asymptotically, so we see that the transformed model will be homoscedastic (and nonautocorrelated, since the $u$'s are uncorrelated with the $y$'s in different time periods).
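In GRETL, both procedures are implemented by the ar1 command. A minimal sketch, assuming the Nerlove data with the log variables lcost, loutput, llabor, lfuel and lcapital already defined (the file and variable names are assumptions):

    open nerlove.gdt                                    # assumed file name
    setobs 1 1 --time-series                            # give the data a time-series structure
    ols lcost const loutput llabor lfuel lcapital       # OLS, for comparison
    ar1 lcost const loutput llabor lfuel lcapital       # iterated Cochrane-Orcutt
    ar1 lcost const loutput llabor lfuel lcapital --pwe # Prais-Winsten (keeps observation 1)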


5.5.2. MA(1). The linear regression model with moving average order 1 errors is

$$y_t = x_t'\beta + \epsilon_t$$
$$\epsilon_t = u_t + \psi u_{t-1}$$
$$u_t \sim iid(0, \sigma_u^2)$$
$$E(\epsilon_t u_s) = 0, \quad t < s$$

In this case,

$$V(\epsilon_t) = \gamma_0 = E\left[(u_t + \psi u_{t-1})^2\right] = \sigma_u^2 + \psi^2 \sigma_u^2 = \sigma_u^2(1 + \psi^2)$$

Similarly

$$\gamma_1 = E\left[(u_t + \psi u_{t-1})(u_{t-1} + \psi u_{t-2})\right] = \psi \sigma_u^2$$

and

$$\gamma_2 = E\left[(u_t + \psi u_{t-1})(u_{t-2} + \psi u_{t-3})\right] = 0$$


so in this case

$$\Sigma = \sigma_u^2 \begin{pmatrix} 1 + \psi^2 & \psi & 0 & \cdots & 0 \\ \psi & 1 + \psi^2 & \psi & & \vdots \\ 0 & \psi & \ddots & & 0 \\ \vdots & & & & \psi \\ 0 & \cdots & 0 & \psi & 1 + \psi^2 \end{pmatrix}$$

Note that the first order autocorrelation is

$$\rho_1 = \frac{\psi \sigma_u^2}{\sigma_u^2(1 + \psi^2)} = \frac{\gamma_1}{\gamma_0} = \frac{\psi}{1 + \psi^2}$$

This achieves a maximum at $\psi = 1$ and a minimum at $\psi = -1$, and the maximal and minimal autocorrelations are 1/2 and -1/2. Therefore, series that are more strongly autocorrelated can't be MA(1) processes. Again the covariance matrix has a simple structure that depends on only two parameters. The problem in this case is that one can't estimate $\psi$ using OLS on

$$\hat{\epsilon}_t = u_t + \psi u_{t-1}$$

because the $u_t$ are unobservable and they can't be estimated consistently. However, there is a simple way to estimate the parameters.

Since the model is homoscedastic, we can estimate

$$V(\epsilon_t) = \gamma_0 = \sigma_u^2(1 + \psi^2)$$

using the typical estimator:

$$\hat{\gamma}_0 = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t^2$$

By the Slutsky theorem, we can interpret this as defining an (unidentified) estimator of both $\sigma_u^2$ and $\psi$, e.g., use this as

$$\widehat{\sigma_u^2(1 + \psi^2)} = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t^2$$

However, this isn't sufficient to define consistent estimators of the parameters, since it's unidentified.

To solve this problem, estimate the covariance of $\epsilon_t$ and $\epsilon_{t-1}$ using

$$\widehat{Cov}(\epsilon_t, \epsilon_{t-1}) = \widehat{\psi \sigma_u^2} = \frac{1}{n} \sum_{t=2}^{n} \hat{\epsilon}_t \hat{\epsilon}_{t-1}$$

This is a consistent estimator, following a LLN (and given that the $\hat{\epsilon}_t$ are consistent for the $\epsilon_t$). As above, this can be interpreted as defining an unidentified estimator of the two parameters.

Now solve these two equations to obtain identified (and therefore consistent) estimators of both $\psi$ and $\sigma_u^2$. Define the consistent estimator $\hat{\Sigma} = \Sigma(\hat{\psi}, \hat{\sigma}_u^2)$ following the form we've seen above, and transform the model using the Cholesky decomposition. The transformed model satisfies the classical assumptions asymptotically.
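A small hansl sketch of this method-of-moments calculation. This is not a built-in GRETL routine: it assumes the OLS residuals are already stored in a series named e, and that the sample first-order autocorrelation is smaller than 1/2 in absolute value (the admissible range for an MA(1)).

    # method-of-moments estimates for MA(1) errors, from OLS residuals in series e
    scalar rho1 = corr(e, e(-1))                      # sample first-order autocorrelation
    # invert rho1 = psi/(1+psi^2), taking the root with |psi| < 1
    scalar psi = (1 - sqrt(1 - 4*rho1^2)) / (2*rho1)
    scalar g0 = var(e)                                # estimate of gamma_0 = sigma_u^2 (1+psi^2)
    scalar s2u = g0 / (1 + psi^2)                     # estimate of sigma_u^2
    printf "psi hat = %g, sigma_u^2 hat = %g\n", psi, s2u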

5.6. Valid inferences with autocorrelation of unknown form

In Section 4.7.2.1 we saw that it is possible to consistently estimate the correct covariance matrix of the OLS estimator when there is HET. It is also possible to do this when there is AUT, or both HET and AUT. The details are beyond the scope of this course. It is important to remember that a correction for autocorrelation will only give an efficient estimator and valid test statistics if the model of autocorrelation is correct. It may be hard to determine the correct model for the autocorrelation of the errors, so one may prefer to forgo the GLS correction and simply use OLS. If this is done, one needs to account for the existence of AUT when estimating the covariance matrix of the parameter estimates, in order to obtain correct test statistics. We will see examples in the Projecte de Docència Tutoritzada.
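GRETL makes this kind of heteroscedasticity and autocorrelation consistent (HAC, Newey-West) covariance estimator available through the --robust option: when the dataset has a time-series structure, --robust produces HAC standard errors rather than the heteroscedasticity-only variants. A minimal sketch, reusing the assumed Nerlove variable names:

    setobs 1 1 --time-series                                # HAC is used only for time-series data
    ols lcost const loutput llabor lfuel lcapital --robust  # OLS with HAC standard errors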

5.7. Testing for autocorrelation

Breusch-Godfrey test. This test uses an auxiliary regression, as does the White test for heteroscedasticity. The regression is

$$\hat{\epsilon}_t = x_t'\delta + \gamma_1 \hat{\epsilon}_{t-1} + \gamma_2 \hat{\epsilon}_{t-2} + \cdots + \gamma_P \hat{\epsilon}_{t-P} + v_t$$

and the test statistic is the $nR^2$ statistic, just as in the White test. There are $P$ restrictions, so the test statistic is asymptotically distributed as $\chi^2(P)$.

The intuition is that the lagged errors shouldn't contribute to explaining the current error if there is no autocorrelation. $x_t$ is included as a regressor to account for the fact that the $\hat{\epsilon}_t$ are not independent even if the $\epsilon_t$ are. This is a technicality that we won't go into here.

This test is valid even if the regressors are stochastic and contain lagged dependent variables.


The alternative is not that the model is an AR(P), following the argument above. The alternative is simply that some or all of the first $P$ autocorrelations are different from zero. This is compatible with many specific forms of autocorrelation.
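In GRETL the Breusch-Godfrey test is available through the modtest command after an estimation. A sketch, again with the assumed Nerlove variable names:

    ols lcost const loutput llabor lfuel lcapital   # estimate the model first
    modtest 1 --autocorr                            # Breusch-Godfrey test with P = 1 lag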

5.8. Lagged dependent variables and autocorrelation: A Caution

We've seen that the OLS estimator is consistent under autocorrelation, as long as $\text{plim}\, \frac{X'\epsilon}{n} = 0$. This will be the case when $E(X'\epsilon) = 0$, following a LLN. An important exception is the case where $X$ contains lagged $y$'s and the errors are autocorrelated. A simple example is the case of a single lag of the dependent variable with AR(1) errors. The model is

$$y_t = x_t'\beta + \gamma y_{t-1} + \epsilon_t$$
$$\epsilon_t = \rho \epsilon_{t-1} + u_t$$

Now we can write

$$E(y_{t-1}\epsilon_t) = E\left[(x_{t-1}'\beta + \gamma y_{t-2} + \epsilon_{t-1})(\rho \epsilon_{t-1} + u_t)\right] \neq 0$$

since one of the terms is $\rho E(\epsilon_{t-1}^2)$, which is clearly nonzero. In this case $E(X'\epsilon) \neq 0$, and therefore $\text{plim}\, \frac{X'\epsilon}{n} \neq 0$. Since

$$\text{plim}\, \hat{\beta} = \beta + \text{plim} \left(\frac{X'X}{n}\right)^{-1} \frac{X'\epsilon}{n} \neq \beta,$$

the OLS estimator is inconsistent in this case. One needs to estimate by instrumental variables (IV). This is a topic that is beyond the scope of this course. It is important to be aware of the possibility that the OLS estimator can be inconsistent, though.


5.9. Quart Projecte de Docncia Tutoritzada


Using the Nerlove data (you have already used these data, but the Excel file is here if needed):

(1) For the simple model

(5.9.1) $$\ln(cost) = \beta_1 + \beta_2 \ln(output) + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \epsilon$$

(a) Estimate the model by OLS.
(b) Use the Breusch-Godfrey test to check whether there is autocorrelation. Important: to be able to do this, you will have to give the data a time-series structure.
(c) Plot the residuals, and give an interpretation of whether or not an autocorrelation problem is visible.

(2) Repeat exercise 1, but using the model

$$\ln(cost) = \sum_j \alpha_j d_j + \sum_j \gamma_j \left[d_j \ln(output)\right] + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \epsilon$$

that was presented in Section 2.5.3.

(3) With the Keeling-Whorf.gdt data (see the sketch after this list):
(a) Estimate the model $CO2_t = \beta_1 + \beta_2 t + \epsilon_t$.
(b) Check whether there is autocorrelation using the Breusch-Godfrey test.
(c) Plot the residuals.
(d) Re-estimate the model using the Cochrane-Orcutt and Prais-Winsten methods, and plot the residuals.
(e) Comment on all the results.
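A sketch of the GRETL commands for part (3). The series name co2 is an assumption about the dataset's contents, and the choice of 12 lags for the test reflects the monthly frequency of the data.

    open "Keeling-Whorf.gdt"
    genr time                               # (a) linear time trend
    ols co2 const time
    modtest 12 --autocorr                   # (b) Breusch-Godfrey with P = 12 (monthly data)
    series e = $uhat
    gnuplot e --time-series --with-lines    # (c) residual plot
    ar1 co2 const time                      # (d) Cochrane-Orcutt
    ar1 co2 const time --pwe                #     Prais-Winsten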

5.10. Chapter Exercises


The professor of the practical sessions will give you a list of problems. In addition, you might consider exercises 12.1, 12.8, 12.9, 12.11, 12.14, 12.17, 12.22, 12.26, and 12.28 from Gujarati, pp. 472-486.

CHAPTER 6

Data sets
This chapter gives links to the data sets referred to in the Study Guide:
Wisconsin height-income data (comma separated values)
Wisconsin height-income data (Gretl data file)
Nerlove data (Excel spreadsheet file)
Nerlove data (Gretl data file)
Keeling-Whorf CO2 data (Gretl data file)
Cigarette-Alcohol Mortality data (Gretl data file)

