
Study Guide for Econometrics (second semester)

Programa Universitat-Empresa

Universitat Autònoma de Barcelona

February 2008

Michael Creel and Montserrat Farell

Contents

Introduction
  Econometrics at the Facultat
  About this study guide
  Bibliography

Chapter 1. GRETL
  1.1. Introduction
  1.2. Getting Started
  1.3. Chapter Exercises

Chapter 2. Dummy Variables
  2.1. Introduction
  2.2. Motivation
  2.3. Definition, Basic Use, and Interpretation
  2.4. Additional Details
  2.5. Primer Projecte Docència Tutoritzada
  2.6. Chapter Exercises

Chapter 3. Collinearity
  3.1. Introduction
  3.2. Motivation: Data on Mortality and Related Factors
  3.3. Definition and Basic Concepts
  3.4. When does it occur?
  3.5. Consequences of Collinearity
  3.6. Detection of Collinearity
  3.7. Dealing with collinearity
  3.8. Segon Projecte de Docència Tutoritzada
  3.9. Chapter Exercises

Chapter 4. Heteroscedasticity
  4.1. Introduction
  4.2. Motivation
  4.3. Basic Concepts and Definitions
  4.4. Effects of Het. and Aut. on the OLS estimator
  4.5. The Generalized Least Squares (GLS) estimator
  4.6. Feasible GLS
  4.7. Heteroscedasticity
  4.8. Example
  4.9. Tercer Projecte de Docència Tutoritzada
  4.10. Chapter Exercises

Chapter 5. Autocorrelation
  5.1. Introduction
  5.2. Motivation
  5.3. Causes
  5.4. Effects on the OLS estimator
  5.5. Corrections
  5.6. Valid inferences with autocorrelation of unknown form
  5.7. Testing for autocorrelation
  5.8. Lagged dependent variables and autocorrelation: A Caution
  5.9. Quart Projecte de Docència Tutoritzada
  5.10. Chapter Exercises

Chapter 6. Data sets

Introduction

Econometrics at the Facultat


Econometrics (Econometria) is an annual (two-semester) course in the Facultat de Ciències Econòmiques i Empresarials at the UAB. It is a required course for the degree of Llicenciat in both Administració i Direcció d'Empreses (ADE) and Economia (ECO). In both ADE and ECO, Econometrics is normally taken in the third year of study.

Econometrics is an area of Economics that uses statistical and mathematical tools to analyze data on economic phenomena. Econometrics can be used to find a mathematical model that gives a good representation of an actual economy, to test theories about how an economy behaves, or to make predictions about how an economy will evolve. Estimation of models, testing of hypotheses, and making predictions are things that can be done using econometric methods.

Courses that are fundamental for successfully studying Econometrics are Matemàtiques per a Economistes I and II (first year of study) and Estadística I and II (second year of study). Ideally, students should have passed these courses before beginning Econometrics. If this is not possible, any student of Econometrics should immediately begin a serious review of the material covered in these courses. Basic matrix algebra, constrained and unconstrained minimization of functions, conditional and unconditional expectations of random variables, and hypothesis testing are the areas that should be reviewed.


Microeconomia I and Microeconomia II are courses that provide a theoretical background which is important for understanding why and how we use econometric tools. Macroeconomia I also provides a theoretical background for some of the examples of the second half of Econometrics.

About this study guide


This study guide covers the material taught in the second semester, in groups 13 and 14 (the groups of the PUE). The guide contains brief notes on all of the material, as well as examples that use GRETL. This guide does not substitute for reading a textbook; it accompanies a textbook. Nor does it substitute for attending class. The guide highlights essential concepts, provides examples, and gives exercises. However, class lectures contain details that are not reproduced in the guide. To learn these details, attending class is fundamental, as is careful reading of a textbook. The guide provides references to the book Econometría (cuarta edición) by D. Gujarati, mentioned below. In the second semester of Econometrics, we will cover material in Chapters 9, 10, 11 and 12 of Gujarati's book.

This guide has been checked to work properly using the Firefox web browser and Adobe Acrobat Reader. Both of these packages are freely available for the commonly used operating systems. You should configure Acrobat Reader to use Firefox to open links. This study guide and related materials (data sets, copies of software and manuals, etc.) are available at the Econometrics Study Guide web page.


Bibliography

There are many excellent textbooks for econometrics. Any of the following is appropriate. This study guide refers to Gujarati's book. You should definitely read the appropriate sections of at least one of these books.

(1) Novales, A., Econometría, McGraw-Hill.
(2) Gujarati, D., Econometría, McGraw-Hill.
(3) Johnston, J. and J. Dinardo, Métodos de Econometría, Vicens Vives.
(4) Kmenta, J., Elementos de Econometría, Vicens Vives.
(5) Maddala, G.S. (1996), Introducción a la econometría, second edition, Prentice Hall.
(6) Pindyck, R.S. and Rubinfeld, D.L. (2001), Econometría: modelos y pronósticos, fourth edition, McGraw-Hill.

CHAPTER 1

GRETL

1.1. Introduction

GRETL (http://gretl.sourceforge.net/) is a free computer package for doing econometrics. It is installed on the computers in Aules 21-22-23 as well as in the Social Sciences computer rooms. You can download a copy and install it on your own computer. It works with Windows, Macs, and Linux. It is available in a number of languages, including Spanish. The version for Windows, along with the manual and the data sets that accompany D. Gujarati's Econometría, are distributed with this study guide, and are also available here:

Gretl v. 1.7.1 for Windows
Data to accompany Gujarati's book

The examples in this study guide use GRETL, and to do the class assignments you will need to use GRETL. This chapter explains the basic steps of using GRETL.

Basic concepts and goals for learning:
(1) become familiar with the basic use of GRETL
(2) learn how to load ASCII and spreadsheet data
(3) learn how to select certain observations in a data set

Readings: GRETL manual in Spanish or in English. You don't have to read the whole manual, but looking through it would be a good idea.

Figure 1.2.1. GRETL's startup window

1.2. Getting Started

Once you start GRETL, you see the window in Figure 1.2.1. You need to load some data to use GRETL. Data comes in many forms: plain text files, spreadsheet files, binary files that use special formats, etc. GRETL can use most of these forms. We'll look at how to deal with two cases: plain ASCII text data, and Microsoft Excel spreadsheet data.

1.2.1. Loading ASCII text data. The Wisconsin longitudinal survey is a long-term study of people who graduated from high school in the state of Wisconsin (US) during the year 1957. The data have been collected repeatedly in subsequent years.


This data can be obtained over the Internet from the address given previously. In Figure 1.2.2 you can see that several variables have been selected for download.
Figure 1.2.2. Downloading data

In Figure 1.2.3 you see that one of the available formats is comma separated values (csv), which provides records (lines) containing variables that may be text or numbers, each separated by commas. Downloading that gives us the file wls.csv, the first few lines of which are:

iduser,ix010rec,sexrsp,gg021jjd,gwiiq_bm
1001,60,2,18000,109
1002,,1,,79
1003,,2,,111
1004,,1,,96
1005,,2,,83
1006,65,2,-2,99


Figure 1.2.3. Comma separated values

1007,70,1,-2,86
1008,71,1,-2,86
1009,67,2,16827,106
1010,72,1,17094,88
1011,67,2,7698,124
1012,,2,-2,124


The first line of the file gives the variable names, and the other lines are the individual records, one for each person. There are a total of 10317 records, for individual people. Some variables are missing for some people. In the data set, this is indicated by two commas in a row with no number in between. We need to know how to load this data into GRETL. This can be done as seen in Figure 1.2.4. Doing that, we now have the data in GRETL, as we see in Figure 1.2.5.


Figure 1.2.4. Loading a csv file

This data set has some problems that make it difficult to use. First, the variable names are strange and not intuitive. Second, many observations have missing values. You can change the name of a variable by right-clicking on it and selecting Edit attributes, then changing the name to whatever you like. See Figure 1.2.6. To see that many observations are missing values, right-click on a variable and choose Display values or Descriptive statistics. For example, the variable income (I renamed gg021jjd to income) shows what we see in Figure 1.2.7.


Figure 1.2.5. CSV data loaded

Figure 1.2.6. Changing a variable's name


Figure 1.2.7. Missing observations

To eliminate missing observations, we can select from the menu Sample -> Restrict, based on criterion, as in Figure 1.2.8. We need to enter a selection criterion. This data set is missing many observations on income and age. We can require that these variables be positive. This is illustrated in Figure 1.2.9. Once we do this, the new sample has 4934 observations, as we can see in Figure 1.2.10. Whenever you use these data, you should make sure that you have removed the observations with missing data.

1.2.2. Loading spreadsheet data. Data is often distributed as spreadsheet files. These are easy to load into GRETL using the File -> Open data -> Import option. Figure 1.2.11 shows how to do it. We need some spreadsheet data to try this.


Figure 1.2.8. Select sample, 1

Get the nerlove.xls data, and then import it as I have just explained. Once you do this you will see the dialog in Figure 1.2.12. Select no.


Figure 1.2.9. Selection criterion

Figure 1.2.12. Data dialog


Figure 1.2.10. Restricted sample

1.3. Chapter Exercises

(1) For the Wisconsin data set:
(a) change the name of the variable ix010rec to age
(b) change the name of gg021jjd to income
(c) change the name of gwiiq_bm to IQ
(d) select observations such that age and income are positive. You should have 4934 observations after doing so.
(e) save the restricted data, with the new variable names, as the data set wisconsin.gdt. Confirm that you can load this data into a new GRETL session.
(2) With your wisconsin.gdt data set:
(a) explore the GRETL menu options, the help features, and the manual, and print histograms (frequency plots) for the variables age, income and IQ.


Figure 1.2.11. Loading spreadsheet data

(b) print descriptive statistics for all variables.

CHAPTER 2

Dummy Variables

2.1. Introduction

Basic concepts and goals for learning. After studying the material, you should be able to answer the following questions:
(1) What is a dummy variable?
(2) How can dummy variables be used in regression models?
(3) What is the correct interpretation of a regression model that contains dummy variables?
(4) How can dummy variables be used in the cases of multiple categories, interaction terms, and seasonality?
(5) What is the equivalence between the different parameterizations that can be used when incorporating dummy variables?

Readings:
(1) Gujarati, Econometría (cuarta edición), Chapter 9: Modelos de regresión con variables dicótomas, pp. 285-320.

2.2. Motivation

Often, qualitative factors can have an important effect on the dependent variable we may be interested in. Consider the Wisconsin data set wisconsin.gdt. If we regress income on height, having selected the sample to include men only, we obtain the fitted line in Figure 2.2.1. Doing the same for the sample of women, we get Figure 2.2.2. Comparing the two plots, we can see that:



Figure 2.2.1. Income regressed on height, men

Figure 2.2.2. Income regressed on height, women


- the y-intercept is higher for men than for women
- the slope of the line is steeper for men than for women
- men are taller on average: for men, mean height is around 70 inches, while for women it's about 65 inches

There are a few questions we might ask:
- why does income appear to depend upon height? What economic explanations are possible?
- why do women appear to be earning less than men, other things equal?

Apart from these questions, it is clear that a qualitative feature (the sex of the individual) has an impact upon the individual's expected income. How can we incorporate such a qualitative characteristic into an econometric model? The need to use qualitative information in our models motivates the study of dummy variables.

2.3. Definition, Basic Use, and Interpretation

Dummy variable (definition): A dummy variable is a binary-valued variable that indicates whether or not some condition is true. It is customary to assign the value 1 if the condition is true, and 0 if the condition is false.

Dummy variable (example): for the Wisconsin data, the variable sexrsp takes the value 1 for men, and 2 for women. As such, sexrsp is not a dummy variable, since the values are not 0 or 1. We can define the condition "Is the person a woman?" This is equivalent to the condition "Is the value of sexrsp equal to 2?" This condition will be true for some observations, and false for others. With GRETL, we can define such a dummy variable, using the Variable -> Define new variable menu item, as in Figure 2.3.1.


Figure 2.3.1. Defining a dummy variable

Figure 2.3.2. Display values

To check that this worked properly, highlight both variables, right-click, and select Display values. This shows us what we see in Figure 2.3.2. Note that woman is now a variable like any other, that takes on the values 0 or 1.
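The same construction can be sketched in code form, outside GRETL (an illustration, not part of the original guide; the small sexrsp array is invented for the example):

import numpy as np

# sexrsp codes, as in the Wisconsin data: 1 = man, 2 = woman
sexrsp = np.array([1, 2, 2, 1, 2])

# the dummy is 1 where the condition "is the person a woman?" is true
woman = (sexrsp == 2).astype(int)
print(woman)  # [0 1 1 0 1]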

2.3.1. Basic use and interpretation. Dummy variables are used essentially like any other regressor. In class we will discuss the following models. Variables like $d_t$ and $d_{t2}$ are understood to be dummy variables, while variables like $x_t$ and $x_{t3}$ are ordinary continuous regressors. You should understand the interpretation of all of them:

$y_t = \beta_1 + \beta_2 d_t + \varepsilon_t$

$y_t = \beta_1 d_t + \beta_2 (1 - d_t) + \varepsilon_t$

$y_t = \beta_1 + \beta_2 d_t + \beta_3 x_t + \varepsilon_t$

Interaction terms: an interaction term is the product of two variables, so that the effect of one variable on the dependent variable depends on the value of the other. The following model has an interaction term. Note that $\partial E(y|x)/\partial x = \beta_3 + \beta_4 d_t$: the slope depends on the value of $d_t$.

$y_t = \beta_1 + \beta_2 d_t + \beta_3 x_t + \beta_4 d_t x_t + \varepsilon_t$

Multiple dummy variables: we can use more than one dummy variable in a model. We will study models of the form

$y_t = \beta_1 + \beta_2 d_{t1} + \beta_3 d_{t2} + \beta_4 x_t + \varepsilon_t$

$y_t = \beta_1 + \beta_2 d_{t1} + \beta_3 d_{t2} + \beta_4 d_{t1} d_{t2} + \beta_5 x_t + \varepsilon_t$

Incorrect usage: You should understand why the following models are not correct usages of dummy variables:

(1) overparameterization:

$y_t = \beta_1 + \beta_2 d_t + \beta_3 (1 - d_t) + \varepsilon_t$

(2) multiple values assigned to multiple categories. Suppose that we have a condition that defines 4 possible categories, and we create a variable $d = 1$ if the observation is in the first category, $d = 2$ if in the second, etc. (This is not, strictly speaking, a dummy variable, according to our definition.) Why is the following model not a good one?

$y_t = \beta_1 + \beta_2 d + \varepsilon_t$

What is the correct way to deal with this situation?
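One standard answer, sketched in Python (an illustration under invented category codes, not the guide's own solution): replace the single multi-valued variable with separate 0/1 dummies, omitting one category as the base.

import numpy as np

# hypothetical category codes 1..4 for six observations
d = np.array([1, 3, 2, 4, 1, 2])

# one 0/1 dummy per non-base category; category 1 is absorbed by
# the intercept, which avoids overparameterization
d2 = (d == 2).astype(float)
d3 = (d == 3).astype(float)
d4 = (d == 4).astype(float)
X = np.column_stack([np.ones(len(d)), d2, d3, d4])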

2.4. Additional Details

Seasonality and dummy variables. Dummy variables can be used to treat seasonal variations in data. We will use the Keeling-Whorf.gdt data to illustrate this. You should be able to use GRETL to reproduce the following results:

Model 1: OLS estimates using the 468 observations 1965:01-2003:12
Dependent variable: CO2

Variable   Coefficient   Std. Error     t-statistic   p-value
djan       316.864       0.210610       1504.5009     0.0000
dfeb       317.533       0.210789       1506.4046     0.0000
dmar       318.271       0.210967       1508.6276     0.0000
dapr       319.418       0.211147       1512.7780     0.0000
dmay       319.848       0.211327       1513.5233     0.0000
djun       319.187       0.211507       1509.1057     0.0000
djul       317.653       0.211688       1500.5705     0.0000
daug       315.539       0.211870       1489.3056     0.0000
dsep       313.690       0.212052       1479.3061     0.0000
doct       313.548       0.212235       1477.3572     0.0000
dnov       314.792       0.212419       1481.9367     0.0000
ddec       315.961       0.212603       1486.1530     0.0000
time       0.121327      0.000404332    300.0664      0.0000

Mean of dependent variable     345.310
S.D. of dependent variable     16.5472
Sum of squared residuals       634.978
Standard error of residuals    1.18134
Unadjusted R^2                 0.995034
Adjusted R^2                   0.994903
F(12, 455)                     7597.57
Durbin-Watson statistic        0.0634062

and the plot in Figure 2.4.1.

Figure 2.4.1. Keeling-Whorf CO2 data, fit using monthly dummies

Multiple parameterizations. To formulate a model that conditions on a given set of categorical information, there are multiple ways to use dummy variables. For example, the two models

$y_t = \alpha_1 d_t + \alpha_2 (1 - d_t) + \alpha_3 x_t + \alpha_4 d_t x_t + \varepsilon_t$

and

$y_t = \beta_1 + \beta_2 d_t + \beta_3 x_t d_t + \beta_4 x_t (1 - d_t) + \varepsilon_t$

are equivalent. You should know the 4 equations that relate the $\alpha_j$ parameters to the $\beta_j$ parameters, $j = 1, 2, 3, 4$. You should know how to interpret the parameters of both models.
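For reference, the mapping can be derived by evaluating both conditional expectations at $d_t = 0$ and at $d_t = 1$ and matching intercepts and slopes (a sketch using the $\alpha$/$\beta$ labels adopted above, which reconstruct the garbled symbols of the original):

% d_t = 0:  \alpha_2 + \alpha_3 x_t = \beta_1 + \beta_4 x_t
% d_t = 1:  \alpha_1 + (\alpha_3 + \alpha_4) x_t = (\beta_1 + \beta_2) + \beta_3 x_t
\alpha_1 = \beta_1 + \beta_2, \qquad
\alpha_2 = \beta_1, \qquad
\alpha_3 = \beta_4, \qquad
\alpha_4 = \beta_3 - \beta_4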

2.5. Primer Projecte Docència Tutoritzada

You may work in groups of up to 5 students. The evaluation will form part of the exercises grade. I recommend installing GRETL on a laptop with WiFi, in order to work comfortably. Before June 1 you must hand in a brief report (10 pages maximum) on the following:

2.5.1. Theoretical background. For a firm that takes input prices $w$ and the output level $q$ as given, the cost minimization problem is to choose the quantities of inputs $x$ to solve the problem

$\min_x w'x$

subject to the restriction $f(x) = q$. The solution is the vector of factor demands $x(w, q)$. The cost function is obtained by substituting the factor demands into the criterion function:

$C(w, q) = w'x(w, q).$

Monotonicity: increasing factor prices cannot decrease cost, so

$\frac{\partial C(w, q)}{\partial w} \geq 0$

Remember that these derivatives give the conditional factor demands (Shephard's Lemma).

Homogeneity: the cost function is homogeneous of degree 1 in input prices, $C(tw, q) = tC(w, q)$, where $t$ is a scalar constant. This is because the factor demands are homogeneous of degree zero in factor prices: they only depend upon relative prices.


Returns to scale: the returns to scale parameter $\gamma$ is defined as the inverse of the elasticity of cost with respect to output:

$\gamma = \left( \frac{\partial C(w, q)}{\partial q} \frac{q}{C(w, q)} \right)^{-1}$

Constant returns to scale is the case where increasing production $q$ implies that cost increases in the proportion 1:1. If this is the case, then $\gamma = 1$.

2.5.2. Cobb-Douglas functional form. The Cobb-Douglas functional form is linear in the logarithms of the regressors and the dependent variable. For a cost function, if there are $g$ factors, the Cobb-Douglas cost function has the form

$C = A q^{\beta_q} w_1^{\beta_1} \cdots w_g^{\beta_g} e^{\varepsilon}$

What is the elasticity of $C$ with respect to $w_j$?

$e^C_{w_j} = \frac{\partial C}{\partial w_j}\frac{w_j}{C} = \beta_j A q^{\beta_q} w_1^{\beta_1} \cdots w_j^{\beta_j - 1} \cdots w_g^{\beta_g} e^{\varepsilon}\,\frac{w_j}{A q^{\beta_q} w_1^{\beta_1} \cdots w_g^{\beta_g} e^{\varepsilon}} = \beta_j$

This is one of the reasons the Cobb-Douglas form is popular: the coefficients are easy to interpret, since they are the elasticities of the dependent variable with respect to the explanatory variables. Note that in this case, using Shephard's Lemma,

$e^C_{w_j} = \frac{\partial C}{\partial w_j}\frac{w_j}{C} = x_j(w, q)\frac{w_j}{C} \equiv s_j(w, q)$

the cost share of the $j$th input. So with a Cobb-Douglas cost function, $\beta_j = s_j(w, q)$: the cost shares are constants.

Note that after a logarithmic transformation we obtain

$\ln C = \alpha + \beta_q \ln q + \beta_1 \ln w_1 + \ldots + \beta_g \ln w_g + \varepsilon$

where $\alpha = \ln A$. So we see that the transformed model is linear in the logs of the data. One can verify that the property of HOD1 implies that

$\sum_{i=1}^{g} \beta_i = 1$

In other words, the cost shares add up to 1. The hypothesis that the technology exhibits CRTS implies that

$\gamma = \frac{1}{\beta_q} = 1$

so $\beta_q = 1$. Likewise, monotonicity implies that the coefficients $\beta_i \geq 0$, $i = 1, \ldots, g$.
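To make the homogeneity property concrete, here is a small numerical check (an illustrative sketch, not part of the original guide; the function and parameter values are invented, with the input-price coefficients chosen to sum to one):

import numpy as np

# A synthetic Cobb-Douglas cost function; HOD1 holds because the
# input-price coefficients satisfy 0.5 + 0.3 + 0.2 = 1
def cost(q, w, A=2.0, beta_q=0.8, betas=(0.5, 0.3, 0.2)):
    return A * q**beta_q * np.prod(w ** np.array(betas))

q = 10.0
w = np.array([1.0, 2.0, 3.0])
t = 1.7  # scale all input prices by the same factor
print(np.isclose(cost(q, t * w), t * cost(q, w)))  # True: C(tw,q) = tC(w,q)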

2.5.3. The Nerlove data and OLS. The file nerlove.xls contains data on 145 electric utility companies' cost of production, output and input prices. The data are for the U.S., and were collected by M. Nerlove. The observations are by row, and the columns are COMPANY, COST (C), OUTPUT (Q), PRICE OF LABOR (PL), PRICE OF FUEL (PF) and PRICE OF CAPITAL (PK). Note that the data are sorted by output level (the third column).

(1) Download the data nerlove.xls (it is an Excel file).
(2) Import the data into GRETL.
(3) Create logarithms of cost, output, labor, fuel, capital.
(4) Estimate by OLS the model

(2.5.1)  $\ln(cost) = \beta_1 + \beta_2 \ln(output) + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

(5) Comment on the results, in general, and specifically with respect to homogeneity of degree 1 and returns to scale.
(6) Create the dummy variables
(a) d1 = 1 if 101 <= firm <= 129, d1 = 0 otherwise
(b) d2 = 1 if 201 <= firm <= 229, d2 = 0 otherwise
(c) d3 = 1 if 301 <= firm <= 329, d3 = 0 otherwise
(d) d4 = 1 if 401 <= firm <= 429, d4 = 0 otherwise
(e) d5 = 1 if 501 <= firm <= 529, d5 = 0 otherwise
(7) Estimate the model

(2.5.2)  $\ln(cost) = \sum_{j=1}^{5} \alpha_j d_j + \sum_{j=1}^{5} \gamma_j [d_j \ln(output)] + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

(8) Comment on the results, emphasizing returns to scale. Present a graph showing returns to scale as a function of firm size. Interpret the graph.
(9) Test the restrictions $\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5$ jointly with $\gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5$, and interpret the result.

2.6. Chapter Exercises


The professor of the practical session will give you a problem list. The problems 9.1, 9.2, 9.3, 9.5, 9.6, 9.13, 9.15 on pages 311-320 of Gujarati's book are recommended for study.

CHAPTER 3

Collinearity

3.1. Introduction

Basic concepts and goals for learning. After studying the material, you should learn the answers to the following questions:
(1) What is collinearity?
(2) What are the effects of collinearity on the OLS estimator: how does it affect estimation, hypothesis testing and prediction?
(3) How can the presence of collinearity be detected?
(4) What can be done to improve the situation if collinearity is a problem?

Readings: Gujarati, Econometría (cuarta edición), Chapter 10: Multicolinealidad: ¿Qué pasa si las regresoras están correlacionadas?, pp. 327-371.

3.2. Motivation: Data on Mortality and Related Factors

The data set mortalitat.gdt contains annual data from 1947-1980 on death rates in the U.S., along with data on factors like smoking and consumption of alcohol. The data description is: DATA4-7: Death rates in the U.S. due to coronary heart disease and their determinants. Data compiled by Jennifer Whisenand.

chd = death rate per 100,000 population (Range 321.2 - 375.4)
cal = per capita consumption of calcium per day in grams (Range 0.9 - 1.06)
unemp = percent of civilian labor force unemployed, in 1,000s of persons 16 years and older (Range 2.9 - 8.5)
cig = per capita consumption of cigarettes in pounds of tobacco by persons 18 years and older, approx. 339 cigarettes per pound of tobacco (Range 6.75 - 10.46)
edfat = per capita intake of edible fats and oils in pounds, includes lard, margarine and butter (Range 42 - 56.5)
meat = per capita intake of meat in pounds, includes beef, veal, pork, lamb and mutton (Range 138 - 194.8)
spirits = per capita consumption of distilled spirits in taxed gallons for individuals 18 and older (Range 1 - 2.9)
beer = per capita consumption of malted liquor in taxed gallons for individuals 18 and older (Range 15.04 - 34.9)
wine = per capita consumption of wine measured in taxed gallons for individuals 18 and older (Range 0.77 - 2.65)

Consider the following models, with their estimation results (standard errors in parentheses):

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \beta_4\,beer + \beta_5\,wine + \varepsilon$

$\widehat{chd}$ = 334.914 + 5.41216 cig + 36.8783 spirits - 5.10365 beer + 13.9764 wine
(standard errors: 58.939, 5.156, 7.373, 1.2513, 12.735)
$T = 34$, $R^2 = 0.5528$, $F(4, 29) = 11.2$, $\hat\sigma = 9.9945$

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \beta_4\,beer + \varepsilon$

$\widehat{chd}$ = 353.581 + 3.17560 cig + 38.3481 spirits - 4.28816 beer
(standard errors: 56.624, 4.7523, 7.275, 1.0102)
$T = 34$, $R^2 = 0.5498$, $F(3, 30) = 14.433$, $\hat\sigma = 10.028$

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \beta_5\,wine + \varepsilon$

$\widehat{chd}$ = 243.310 + 10.7535 cig + 22.8012 spirits - 16.8689 wine
(standard errors: 67.21, 6.1508, 8.0359, 12.638)
$T = 34$, $R^2 = 0.3198$, $F(3, 30) = 6.1709$, $\hat\sigma = 12.327$

$chd = \beta_1 + \beta_2\,cig + \beta_3\,spirits + \varepsilon$

$\widehat{chd}$ = 181.219 + 16.5146 cig + 15.8672 spirits
(standard errors: 49.119, 4.4371, 6.2079)
$T = 34$, $R^2 = 0.3026$, $F(2, 31) = 8.1598$, $\hat\sigma = 12.481$


Note how the signs of the coefficients change depending on the model, and that the magnitude of the parameter estimates varies a lot too. The parameter estimates are highly sensitive to the particular model we estimate. Why? We'll see that the problem is that the data exhibit collinearity.

3.3. Definition and Basic Concepts

Collinearity (definition): Collinearity is the existence of linear relationships amongst the regressors. We can always write

$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_K x_K + v = 0$

where $x_i$ is the $i$th column of the regressor matrix $X$, and $v$ is an $n \times 1$ vector. In the case that there exists collinearity, the variation in $v$ is relatively small, so that there is an approximately exact linear relation between the regressors.

- "Relative" and "approximate" are imprecise terms, so the existence of collinearity is also an imprecise, relative concept.
- Many authors, including Gujarati, use the term "multicollinearity". Some, including myself, prefer to call the phenomenon "collinearity". Collinearity as used here means exactly what Gujarati and others refer to as multicollinearity.

Exact (or Perfect) Collinearity (definition): in the extreme, if there are exact linear relationships, we can write

$\lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_K x_K = 0$


In this case, $\rho(X) < K$, so $\rho(X'X) < K$, so $X'X$ is not invertible and the OLS estimator is not uniquely defined. The existence of exact linear relationships amongst the regressors is known as "perfect collinearity" or "exact collinearity". For example, if the model is

$y_t = \beta_1 + \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t$
$x_{2t} = \alpha_1 + \alpha_2 x_{3t}$

then we can write

$y_t = \beta_1 + \beta_2(\alpha_1 + \alpha_2 x_{3t}) + \beta_3 x_{3t} + \varepsilon_t = (\beta_1 + \beta_2\alpha_1) + (\beta_2\alpha_2 + \beta_3)x_{3t} + \varepsilon_t = \gamma_1 + \gamma_2 x_{3t} + \varepsilon_t$

The $\gamma$s can be consistently estimated, but since the $\gamma$s define two equations in three $\beta$s, the $\beta$s can't be consistently estimated (there are multiple values of $\beta$ that solve the first order conditions that define the OLS estimator). The $\beta$s are unidentified in the case of perfect collinearity.

3.4. When does it occur?

Perfect collinearity:
- Perfect collinearity is unusual, except in the case of an error in construction of the regressor matrix, such as including the same regressor twice.
- Another case where perfect collinearity may be encountered is with models with dummy variables, if one is not careful. Consider a model of the rental price $y_i$ of an apartment. This could depend on factors such as size, quality, etc., collected in $x_i$, as well as on the location of the apartment. Let $B_i = 1$ if the $i$th apartment is in Barcelona, $B_i = 0$ otherwise. Similarly, define $G_i$, $T_i$ and $L_i$ for Girona, Tarragona and Lleida. One could use a model such as

$y_i = \beta_1 + \beta_2 B_i + \beta_3 G_i + \beta_4 T_i + \beta_5 L_i + x_i'\gamma + \varepsilon_i$

In this model, $B_i + G_i + T_i + L_i = 1, \forall i$, so there is an exact relationship between these variables and the column of ones corresponding to the constant. One must either drop the constant, or one of the qualitative variables.

Collinearity (inexact): The more common case, if one doesn't make mistakes such as these, is the existence of inexact linear relationships, i.e., correlations between the regressors that are less than one in absolute value, but not zero. This is (unfortunately) quite common with economic data:

- economic data is non-experimental, so a researcher cannot control the values of the variables.
- common factors affect different variables at the same time, which tends to induce correlations. Variables tend to move together over time (for example, prices of apartments in Barcelona and in Valencia).

3.5. Consequences of Collinearity

The basic problem is that when two (or more) variables move together, it is difficult to determine their separate influences. This is reflected in imprecise estimates, i.e., estimates with high variances. With economic data, collinearity is commonly encountered, and is often a severe problem.


Figure 3.5.1. $s(\beta)$ when there is no collinearity

When there is collinearity, the minimizing point of the objective function that defines the OLS estimator ($s(\beta)$, the sum of squared errors) is relatively poorly defined. This is seen in Figures 3.5.1 and 3.5.2.

To see the effect of collinearity on variances, partition the regressor matrix as

$X = \begin{bmatrix} x & W \end{bmatrix}$

where $x$ is the first column of $X$ (note: we can interchange the columns of $X$ if we like, so there's no loss of generality in considering the first column). Now, the variance of $\hat\beta$, under the classical assumptions, is

$V(\hat\beta) = (X'X)^{-1}\sigma^2$


Figure 3.5.2. $s(\beta)$ when there is collinearity

Using the partition,

$X'X = \begin{bmatrix} x'x & x'W \\ W'x & W'W \end{bmatrix}$

and following a rule for partitioned inversion,

$(X'X)^{-1}_{1,1} = \left(x'x - x'W(W'W)^{-1}W'x\right)^{-1} = \left(x'\left(I_n - W(W'W)^{-1}W'\right)x\right)^{-1} = \left(ESS_{x|W}\right)^{-1}$

where by $ESS_{x|W}$ we mean the error sum of squares obtained from the regression

$x = W\lambda + v.$


Since $R^2 = 1 - ESS/TSS$, we have

$ESS = TSS(1 - R^2)$

so the variance of the coefficient corresponding to $x$ is

$V(\hat\beta_x) = \frac{\sigma^2}{TSS_x(1 - R^2_{x|W})}$

We see that three factors influence the variance of this coefficient. It will be high if
(1) $\sigma^2$ is large
(2) there is little variation in $x$ (draw a picture here)
(3) there is a strong linear relationship between $x$ and the other regressors, so that $W$ can explain the movement in $x$ well. In this case, $R^2_{x|W}$ will be close to 1. As $R^2_{x|W} \to 1$, $V(\hat\beta_x) \to \infty$.

The last of these cases is collinearity. Intuitively, when there are strong linear relations between the regressors, it is difficult to determine the separate influence of the regressors on the dependent variable. This can be seen by comparing the OLS objective function in the case of no correlation between regressors with the objective function with correlation between the regressors. See Figures 3.5.1 and 3.5.2.

Consequences (summary):
- the parameters associated with variables affected by collinearity have high variances.
- high variances lead to low power when testing hypotheses.
- high variances lead to low t-statistics, broad confidence intervals, etc.
- the results are sensitive to small changes in the sample.
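A small Monte Carlo sketch in Python can make the variance effect visible (an illustration, not part of the original guide; the data generating process is invented, with x2 built from x1 plus noise so that small noise means strong collinearity):

import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 1000

def beta2_hat(noise_sd):
    """One draw of the OLS estimate of the coefficient on x2
    in y = 1 + x1 + x2 + e, where x2 = x1 + noise."""
    x1 = rng.normal(size=n)
    x2 = x1 + noise_sd * rng.normal(size=n)
    y = 1.0 + x1 + x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

for noise_sd in (1.0, 0.1):  # weak vs. strong collinearity
    sd = np.std([beta2_hat(noise_sd) for _ in range(reps)])
    print(f"noise sd {noise_sd}: sd of estimated beta2 = {sd:.3f}")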

3.6. Detection of Collinearity

- The best way is simply to regress each explanatory variable in turn on the remaining regressors. If any of these auxiliary regressions has a high $R^2$, there is a problem of collinearity. Furthermore, this procedure identifies which parameters are affected. Sometimes we're only interested in certain parameters; collinearity isn't a problem if it doesn't affect what we're interested in estimating.
- An alternative is to examine the matrix of correlations between the regressors. High correlations are sufficient but not necessary for severe collinearity. There may be a near exact linear relationship between 3 variables without the existence of any near exact linear relationship between pairs of variables.
- Also indicative of collinearity is that the model fits well (high $R^2$), but none of the variables is significantly different from zero (i.e., their separate influences aren't well determined).

In summary, the artificial regressions are the best approach if one wants to be careful.

Example: using the mortalitat.gdt data discussed above (Section 3.2), we can use the artificial regression approach, regressing spirits on the other regressors (cig, wine, beer). The results are

$\widehat{spirits}$ = 1.01350 + 0.0670534 cig + 0.0794414 beer + 0.313745 wine
(standard errors: 1.4477, 0.12709, 0.02738, 0.3101)
$T = 34$, $R^2 = 0.8907$, $F(3, 30) = 90.669$, $\hat\sigma = 0.24749$


Note that the $R^2$ is very high: we have a serious problem of collinearity. This explains the instability of the parameter estimates we found earlier when we tried several models in Section 3.2.
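The artificial regression can also be sketched as a small Python helper (an illustration, not the guide's own code; aux_r2 is a hypothetical name, and X is assumed to be a 2-D numpy array of regressors without a constant column):

import numpy as np

def aux_r2(X, j):
    """R^2 of the artificial regression of column j of X on the
    remaining columns plus a constant. A value near 1 signals that
    the corresponding coefficient is affected by collinearity."""
    y = X[:, j]
    W = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    e = y - W @ np.linalg.lstsq(W, y, rcond=None)[0]
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)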

3.7. Dealing with collinearity

Collinearity is a problem of an uninformative sample. The first question is: is all the available information being used? Is more data available? Are there coefficient restrictions that have been neglected? Picture illustrating how a restriction can solve the problem of perfect collinearity.

There do exist specialized methods such as ridge regression, principal components analysis, etc., that can be used when there is a severe problem of collinearity, but these topics are advanced and are outside the scope of this course. These methods present problems of their own; they are not clear and obviously good solutions to the problem. In sum, collinearity is a fact of life in econometrics, and there is no clear solution to the problem. It is important to be aware of its effects and to know when it is present.

3.8. Segon Projecte de Docència Tutoritzada

(1) For the Nerlove model of the cost of electricity production

$\ln(cost) = \beta_1 + \beta_2 \ln(output) + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

which was explained in Section 2.5.3, use artificial regressions to check for the existence of collinearity.
(2) What is the reason for the lack of significance of the coefficient in the Nerlove model? Give an economic interpretation.
(3) Verify the existence of collinearity in the mortality models presented in Section 3.2. Download the data and run the relevant artificial regressions. Also present the correlation matrix of the regressors cig, spirits, wine, beer. Give an interpretation.

3.9. Chapter Exercises


The professor of the practical sessions will give you a list of problems. In addition, you might also consider exercises 10.5, 10.7, 10.9, 10.19, 10.30a, 10.30b from Gujarati, pp. 361-371.

CHAPTER 4

Heteroscedasticity

4.1. Introduction

Basic concepts and goals for learning. After studying the material, you should learn the answers to the following questions:
(1) What is heteroscedasticity?
(2) What are the properties of the OLS estimator when there is heteroscedasticity?
(3) What is the GLS estimator?
(4) What is the feasible GLS estimator?
(5) What are the properties of the (F)GLS estimator?
(6) How can the presence of heteroscedasticity be detected?
(7) How can we deal with heteroscedasticity if it is present?

Readings: Gujarati, Econometría (cuarta edición), Chapter 11: Heteroscedasticidad: ¿Qué pasa cuando la varianza del error no es constante?, pp. 372-424.

4.2. Motivation

One of the assumptions we've made up to now is that

$\varepsilon_t \sim IID(0, \sigma^2),$

or occasionally

$\varepsilon_t \sim IIN(0, \sigma^2).$
This model is quite unreasonable in many cases. Often, the variance of $\varepsilon_t$ will change depending on the values of the regressors, or there may be correlations between different $\varepsilon_t$, $\varepsilon_s$, $s \neq t$. For example, consider the Nerlove model of section 2.5.3. If we estimate the model in equation 5.9.1, a plot of the residuals versus log(output) is in Figure 4.2.1. Note that the variance of the error appears to be larger for small firms, and smaller for large firms. This seems to violate the classical assumption that $E(\varepsilon_t^2) = \sigma^2, \forall t$. If the variance is not constant, we have a problem of heteroscedasticity.

Note also in Figure 4.2.1 that there seems to be correlation in the residuals: when a residual is positive, the next one is too in most cases. When a residual is negative, the next one is more likely to be negative than positive. If this is the case, it's a violation of the classical assumption that $E(\varepsilon_t \varepsilon_s) = 0$, $t \neq s$: we have a problem of autocorrelation.

In this chapter and the next, we'll investigate the importance of these two problems, and how to deal with them.

Figure 4.2.1. Residuals of Nerlove model

4.3. Basic Concepts and Definitions

Now we'll investigate the consequences of nonidentically and/or dependently distributed errors. We'll assume fixed regressors for now, relaxing this admittedly unrealistic assumption later. The model is

$y = X\beta + \varepsilon, \quad E(\varepsilon) = 0, \quad V(\varepsilon) = \Sigma$

where $\Sigma$ is a general symmetric positive definite matrix.

- The case where $\Sigma$ is a diagonal matrix gives uncorrelated, nonidentically distributed errors. This is known as heteroscedasticity (HET).
- The case where $\Sigma$ has the same number on the main diagonal but nonzero elements off the main diagonal gives identically (assuming higher moments are also the same) dependently distributed errors. This is known as autocorrelation (AUT).

Heteroscedasticity (definition): Heteroscedasticity is the existence of errors that have different variances. More precisely, there exist $i$ and $j$ such that $V(\varepsilon_i) \neq V(\varepsilon_j)$.

Autocorrelation (definition): Autocorrelation is the existence of errors that are correlated with one another. More precisely, there exist distinct $i$ and $j$ such that $E(\varepsilon_i \varepsilon_j) \neq 0$.


- Note that the presence of HET implies that $\Sigma$ will have different elements on its main diagonal.
- If there is AUT, then at least some elements of $\Sigma$ off the main diagonal will be different from zero.
- When there is HET but not AUT, $\Sigma$ will be a diagonal matrix.
- It is possible to have both HET and AUT at the same time. In this case, $\Sigma$ can be a general symmetric positive definite matrix.

4.4. Effects of Het. and Aut. on the OLS estimator

The least squares estimator is

$\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon$

We have unbiasedness, as before. The variance of $\hat\beta$ is

(4.4.1)  $E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\right] = (X'X)^{-1}X'\Sigma X(X'X)^{-1}$

Due to this, any test statistic that is based upon an estimator of $\sigma^2$ is invalid, since there isn't any $\sigma^2$: it doesn't exist as a feature of the true process that generates the data. In particular, the formulas for the $t$, $F$, and $\chi^2$ based tests given above do not lead to statistics with these distributions.

- $\hat\beta$ is still consistent, following exactly the same argument given before.


- If $\varepsilon$ is normally distributed, then $\hat\beta \sim N\left(\beta, (X'X)^{-1}X'\Sigma X(X'X)^{-1}\right)$. The problem is that $\Sigma$ is unknown in general, so this distribution won't be useful for testing hypotheses.
- Without normality, we still have

$\sqrt{n}(\hat\beta - \beta) = \sqrt{n}(X'X)^{-1}X'\varepsilon = \left(\frac{X'X}{n}\right)^{-1} n^{-1/2} X'\varepsilon$

Define the limiting variance of $n^{-1/2}X'\varepsilon$ as

$\lim_{n\to\infty} E\left(\frac{X'\varepsilon\varepsilon'X}{n}\right) = \Omega$

(supposing a CLT applies), so we obtain

$\sqrt{n}(\hat\beta - \beta) \stackrel{d}{\to} N\left(0, Q_X^{-1}\Omega Q_X^{-1}\right)$

Summary: OLS with heteroscedasticity and/or autocorrelation is:
- unbiased in the same circumstances in which the estimator is unbiased with i.i.d. errors
- of different variance than before, so the previous test statistics aren't valid
- consistent
- asymptotically normally distributed, but with a different limiting covariance matrix; the previous test statistics aren't valid in this case for this reason
- inefficient, as is shown below.


4.5. The Generalized Least Squares (GLS) estimator

Suppose $\Sigma$ were known. Then one could form the Cholesky decomposition

$P'P = \Sigma^{-1}$

Here, $P$ is an upper triangular matrix. We have

$P'P\Sigma = I_n$

so

$P'P\Sigma P' = P',$

which implies that

$P\Sigma P' = I_n$

Consider the model

$Py = PX\beta + P\varepsilon,$

or, making the obvious definitions,

$y^* = X^*\beta + \varepsilon^*.$

The variance of $\varepsilon^* = P\varepsilon$ is

$E(P\varepsilon\varepsilon'P') = P\Sigma P' = I_n$

Therefore, the model

$y^* = X^*\beta + \varepsilon^*, \quad E(\varepsilon^*) = 0, \quad V(\varepsilon^*) = I_n$

satisfies the classical assumptions. The GLS estimator is simply OLS applied to the transformed model:

$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'P'PX)^{-1}X'P'Py = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$

The GLS estimator is unbiased in the same circumstances under which the OLS estimator is unbiased. For example,

$E(\hat\beta_{GLS}) = E\left[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y\right] = E\left[(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}(X\beta + \varepsilon)\right] = \beta.$

The variance of the estimator can be calculated using

$\hat\beta_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X^{*\prime}X^*)^{-1}X^{*\prime}(X^*\beta + \varepsilon^*) = \beta + (X^{*\prime}X^*)^{-1}X^{*\prime}\varepsilon^*$

so

$E\left[(\hat\beta_{GLS} - \beta)(\hat\beta_{GLS} - \beta)'\right] = E\left[(X^{*\prime}X^*)^{-1}X^{*\prime}\varepsilon^*\varepsilon^{*\prime}X^*(X^{*\prime}X^*)^{-1}\right] = (X^{*\prime}X^*)^{-1} = (X'\Sigma^{-1}X)^{-1}$

Either of these last formulas can be used.

- All the previous results regarding the desirable properties of the least squares estimator hold, when dealing with the transformed model, since the transformed model satisfies the classical assumptions.
- Tests are valid, using the previous formulas, as long as we substitute $X^*$ in place of $X$. Furthermore, any test that involves $\sigma^2$ can set it to 1. This is preferable to re-deriving the appropriate formulas.
- The GLS estimator is more efficient than the OLS estimator. This is a consequence of the Gauss-Markov theorem, since the GLS estimator is based on a model that satisfies the classical assumptions but the OLS estimator is not. To see this directly, note that

$Var(\hat\beta) - Var(\hat\beta_{GLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} - (X'\Sigma^{-1}X)^{-1} = A\Sigma A'$

where $A = (X'X)^{-1}X' - (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$. This may not seem obvious, but it is true, as you can verify for yourself. Then noting that $A\Sigma A'$ is a quadratic form in a positive definite matrix, we conclude that $A\Sigma A'$ is positive semi-definite, and that GLS is efficient relative to OLS.
- As one can verify by calculating first order necessary conditions, the GLS estimator is the solution to the minimization problem

$\hat\beta_{GLS} = \arg\min_\beta\, (y - X\beta)'\Sigma^{-1}(y - X\beta)$

so the metric $\Sigma^{-1}$ is used to weight the residuals.
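The transform-then-OLS logic can be sketched in Python (an illustration under an invented data generating process, not part of the original guide; for a diagonal Sigma, the Cholesky factor of Sigma inverse is just a diagonal reweighting):

import numpy as np

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# heteroscedastic errors: the variance grows with the regressor
sig2 = np.exp(X[:, 1])                       # diagonal of Sigma
y = X @ beta + np.sqrt(sig2) * rng.normal(size=n)

# for diagonal Sigma, P = diag(1/sigma_t) satisfies P'P = Sigma^{-1},
# so GLS is OLS on data weighted by 1/sigma_t
P = np.diag(1.0 / np.sqrt(sig2))
b_gls = np.linalg.lstsq(P @ X, P @ y, rcond=None)[0]
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_ols, b_gls)   # both near (1, 2); GLS uses the efficient weighting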

4.6. Feasible GLS

The problem is that $\Sigma$ usually isn't known, so this estimator isn't available.

- Consider the dimension of $\Sigma$: it's an $n \times n$ matrix with $(n^2 - n)/2 + n = (n^2 + n)/2$ unique elements.
- The number of parameters to estimate is larger than $n$, and increases faster than $n$. There's no way to devise an estimator that satisfies a law of large numbers without adding restrictions.
- The feasible GLS estimator is based upon making sufficient assumptions regarding the form of $\Sigma$ so that a consistent estimator can be devised.

Suppose that we parameterize $\Sigma$ as a function of $X$ and $\theta$, where $\theta$ may include $\beta$ as well as other parameters, so that

$\Sigma = \Sigma(X, \theta)$

where $\theta$ is of fixed dimension. Assuming that the parametrization is correct, so that in fact $\Sigma = \Sigma(X, \theta)$, and if we can consistently estimate $\theta$, then we can consistently estimate $\Sigma$ (as long as $\Sigma(X, \theta)$ is a continuous function of $\theta$). In this case,

$\hat\Sigma = \Sigma(X, \hat\theta) \stackrel{p}{\to} \Sigma(X, \theta)$

If we replace $\Sigma$ in the formulas for the GLS estimator with $\hat\Sigma$, we obtain the FGLS estimator.

The FGLS estimator shares the same asymptotic properties as GLS. These are:
(1) consistency
(2) asymptotic normality
(3) asymptotic efficiency if the errors are normally distributed (Cramér-Rao)
(4) test procedures are asymptotically valid.

In practice, the usual way to proceed is:
(1) Define a consistent estimator of $\theta$. This is a case-by-case proposition, depending on the parametrization $\Sigma(\theta)$. We'll see examples below.
(2) Form $\hat\Sigma = \Sigma(X, \hat\theta)$.
(3) Calculate the Cholesky factorization $\hat P = \mathrm{Chol}(\hat\Sigma^{-1})$.
(4) Transform the model using $\hat P y = \hat P X\beta + \hat P\varepsilon$.
(5) Estimate using OLS on the transformed model.

4.7. Heteroscedasticity

Heteroscedasticity is the case where

$E(\varepsilon\varepsilon') = \Sigma$

is a diagonal matrix, so that the errors are uncorrelated, but have different variances. Heteroscedasticity is usually thought of as associated with cross-sectional data, though there is absolutely no reason why time series data cannot also be heteroscedastic. Actually, the popular ARCH (autoregressive conditionally heteroscedastic) models that you may hear about in your finance classes explicitly assume that a time series is heteroscedastic.

Consider a supply function

$q_i = \beta_1 + \beta_p P_i + \beta_s S_i + \varepsilon_i$

where $P_i$ is price and $S_i$ is some measure of the size of the $i$th firm. One might suppose that unobservable factors (e.g., talent of managers, degree of coordination between production units, etc.) account for the error term $\varepsilon_i$. If there is more variability in these factors for large firms than for small firms, then $\varepsilon_i$ may have a higher variance when $S_i$ is high than when it is low.

Another example is individual demand:

$q_i = \beta_1 + \beta_p P_i + \beta_m M_i + \varepsilon_i$

where $P_i$ is price and $M_i$ is income. In this case, $\varepsilon_i$ can reflect variations in preferences. There are more possibilities for expression of preferences when one is rich, so it is possible that the variance of $\varepsilon_i$ could be higher when $M_i$ is high.

Add example of group means.

4.7.1. Detection. There exist many tests for the presence of heteroscedasticity. We'll discuss three methods.

4.7.1.1. Goldfeld-Quandt. The sample is divided into three parts, with $n_1$, $n_2$ and $n_3$ observations, where $n_1 + n_2 + n_3 = n$. The model is estimated using the first and third parts of the sample, separately, so that $\hat\beta^1$ and $\hat\beta^3$ will be independent. Then we have

$\frac{\hat\varepsilon^{1\prime}\hat\varepsilon^1}{\sigma^2} = \frac{\varepsilon^{1\prime}M^1\varepsilon^1}{\sigma^2} \stackrel{d}{\to} \chi^2(n_1 - K)$

and

$\frac{\hat\varepsilon^{3\prime}\hat\varepsilon^3}{\sigma^2} = \frac{\varepsilon^{3\prime}M^3\varepsilon^3}{\sigma^2} \stackrel{d}{\to} \chi^2(n_3 - K)$

so

$\frac{\hat\varepsilon^{1\prime}\hat\varepsilon^1/(n_1 - K)}{\hat\varepsilon^{3\prime}\hat\varepsilon^3/(n_3 - K)} \stackrel{d}{\to} F(n_1 - K, n_3 - K).$

The distributional result is exact if the errors are normally distributed. This test is a two-tailed test. Alternatively, and probably more conventionally, if one has prior ideas about the possible magnitudes of the variances of the observations, one could order the observations accordingly, from largest to smallest. In this case, one would use a conventional one-tailed F-test. Draw picture.

- Ordering the observations is an important step if the test is to have any power.
- The motive for dropping the middle observations is to increase the difference between the average variances in the subsamples, supposing that there exists heteroscedasticity. This can increase the power of the test. On the other hand, dropping too many observations will substantially increase the variance of the statistics $\hat\varepsilon^{1\prime}\hat\varepsilon^1$ and $\hat\varepsilon^{3\prime}\hat\varepsilon^3$. A rule of thumb, based on Monte Carlo experiments, is to drop around 25% of the observations.
- If one doesn't have any ideas about the form of the heteroscedasticity, the test will probably have low power since a sensible data ordering isn't available.
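A sketch of the mechanics in Python (illustrative, not the guide's own code; goldfeld_quandt is a hypothetical name, and y, X are assumed already sorted so that the suspected variance increases down the sample):

import numpy as np
from scipy import stats

def goldfeld_quandt(y, X, drop_frac=0.25):
    """Goldfeld-Quandt sketch: drop the middle block, fit OLS on
    each end, and compare residual variances with an F statistic.
    With the data sorted smallest-variance first, the larger
    variance goes in the numerator (one-tailed test)."""
    n, K = X.shape
    n1 = (n - int(drop_frac * n)) // 2
    ess = []
    for part in (slice(0, n1), slice(n - n1, n)):
        b = np.linalg.lstsq(X[part], y[part], rcond=None)[0]
        e = y[part] - X[part] @ b
        ess.append(e @ e)
    F = ess[1] / ess[0]  # equal subsample sizes, so the dfs cancel
    return F, 1.0 - stats.f.cdf(F, n1 - K, n1 - K)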

4.7.1.2. White's test. When one has little idea whether there exists heteroscedasticity, and no idea of its potential form, the White test is a possibility. The idea is that if there is homoscedasticity, then

$E(\varepsilon_t^2 | x_t) = \sigma^2, \forall t$

so that $x_t$ or functions of $x_t$ shouldn't help to explain $E(\varepsilon_t^2)$. The test works as follows:

(1) Since $\varepsilon_t$ isn't available, use the consistent estimator $\hat\varepsilon_t$ instead.
(2) Regress

$\hat\varepsilon_t^2 = \sigma^2 + z_t'\gamma + v_t$

where $z_t$ is a $P$-vector. $z_t$ may include some or all of the variables in $x_t$, as well as other variables. White's original suggestion was to use $x_t$, plus the set of all unique squares and cross products of variables in $x_t$.
(3) Test the hypothesis that $\gamma = 0$. The $qF$ statistic in this case is

$qF = \frac{(ESS_R - ESS_U)/P}{ESS_U/(n - P - 1)}$

Note that $ESS_R = TSS_U$, so dividing both numerator and denominator by this we get

$qF = (n - P - 1)\frac{R^2}{1 - R^2}$

Note that this is the $R^2$ of the artificial regression used to test for heteroscedasticity, not the $R^2$ of the original model.

An asymptotically equivalent statistic, under the null of no heteroscedasticity (so that $R^2$ should tend to zero), is

$nR^2 \stackrel{a}{\sim} \chi^2(P).$

This doesn't require normality of the errors, though it does assume that the fourth moment of $\varepsilon_t$ is constant, under the null. Question: why is this necessary?

- The White test has the disadvantage that it may not be very powerful unless the $z_t$ vector is chosen well, and this is hard to do without knowledge of the form of heteroscedasticity.
- It also has the problem that specification errors other than heteroscedasticity may lead to rejection.
- Note: the null hypothesis of this test may be interpreted as $\gamma = 0$ for the variance model $V(\varepsilon_t^2) = h(\alpha + z_t'\gamma)$, where $h(\cdot)$ is an arbitrary function of unknown form. The test is more general than it may appear from the regression that is used.

4.7.1.3. Plotting the residuals. A very simple method is to simply plot the residuals (or their squares). Draw pictures here. Like the Goldfeld-Quandt test, this will be more informative if the observations are ordered according to the suspected form of the heteroscedasticity.

4.7.2. Dealing with heteroscedasticity if it is present. Correcting for heteroscedasticity requires that a parametric form for $\Sigma(\theta)$ be supplied, and that a means for consistently estimating $\theta$ be determined. The estimation method will be specific to the form supplied for $\Sigma(\theta)$. We'll consider two examples, multiplicative HET and HET by groups. Before this, let's consider using OLS, even if we have HET. The advantage of this is that we don't need to specify the form of $\Sigma(\theta)$.

4.7.2.1. OLS with heteroscedasticity-consistent covariance matrix estimation. Eicker (1967) and White (1980) showed how to modify test statistics to account for heteroscedasticity of unknown form. The OLS estimator has asymptotic distribution

$\sqrt{n}(\hat\beta - \beta) \stackrel{d}{\to} N\left(0, Q_X^{-1}\Omega Q_X^{-1}\right)$


as we've already seen. Recall that we defined

$\Omega = \lim_{n\to\infty} E\left(\frac{X'\varepsilon\varepsilon'X}{n}\right)$

This matrix has dimension $K \times K$ and can be consistently estimated, even if we can't estimate $\Sigma$ consistently. The consistent estimator, under heteroscedasticity but no autocorrelation, is

$\hat\Omega = \frac{1}{n}\sum_{t=1}^{n} x_t x_t' \hat\varepsilon_t^2$

One can then modify the previous test statistics to obtain tests that are valid when there is heteroscedasticity of unknown form. For example, the Wald test for $H_0: R\beta - r = 0$ would be

$n\left(R\hat\beta - r\right)'\left(R\left(\frac{X'X}{n}\right)^{-1}\hat\Omega\left(\frac{X'X}{n}\right)^{-1}R'\right)^{-1}\left(R\hat\beta - r\right) \stackrel{a}{\sim} \chi^2(q)$
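In finite-sample form this is the familiar "sandwich" covariance; a minimal Python sketch (illustrative, not the guide's own code; ols_hc0 is a hypothetical name):

import numpy as np

def ols_hc0(y, X):
    """OLS with the Eicker-White heteroscedasticity-consistent
    covariance (HC0): (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (e ** 2)[:, None])
    V = bread @ meat @ bread
    return b, np.sqrt(np.diag(V))  # coefficients and robust std. errors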

4.7.2.2. Multiplicative heteroscedasticity. Suppose the model is

$y_t = x_t'\beta + \varepsilon_t$
$\sigma_t^2 = E(\varepsilon_t^2) = (z_t'\gamma)^\delta$

but the other classical assumptions hold. In this case

$\varepsilon_t^2 = (z_t'\gamma)^\delta + v_t$

and $v_t$ has mean zero. Nonlinear least squares could be used to estimate $\gamma$ and $\delta$ consistently, were $\varepsilon_t$ observable. The solution is to substitute the squared OLS residuals $\hat\varepsilon_t^2$ in place of $\varepsilon_t^2$, since the substitution is justified asymptotically by the Slutsky theorem. Once we have $\hat\gamma$ and $\hat\delta$, we can estimate $\sigma_t^2$ consistently using

$\hat\sigma_t^2 = (z_t'\hat\gamma)^{\hat\delta}.$


In the second step, we transform the model by dividing by the standard deviation:

$\frac{y_t}{\hat\sigma_t} = \frac{x_t'\beta}{\hat\sigma_t} + \frac{\varepsilon_t}{\hat\sigma_t}$

or

$y_t^* = x_t^{*\prime}\beta + \varepsilon_t^*.$

Asymptotically, this model satisfies the classical assumptions.

- This model is a bit complex in that NLS is required to estimate the model of the variance. A simpler version would be

$y_t = x_t'\beta + \varepsilon_t$
$\sigma_t^2 = E(\varepsilon_t^2) = \sigma^2 z_t^\delta$

where $z_t$ is a single variable. There are still two parameters to be estimated, and the model of the variance is still nonlinear in the parameters. However, the search method can be used in this case to reduce the estimation problem to repeated applications of OLS.

- First, we define an interval of reasonable values for $\delta$, e.g., $\delta \in [0, 3]$.
- Partition this interval into $M$ equally spaced values, e.g., $\{0, .1, .2, \ldots, 2.9, 3\}$.
- For each of these values $\delta_m$, calculate the variable $z_t^{\delta_m}$.
- The regression

$\hat\varepsilon_t^2 = \sigma^2 z_t^{\delta_m} + v_t$

is linear in the parameters, conditional on $\delta_m$, so one can estimate $\sigma^2$ by OLS.
- Save the pairs $(\sigma_m^2, \delta_m)$, and the corresponding $ESS_m$. Choose the pair with the minimum $ESS_m$ as the estimate.
- Next, divide the model by the estimated standard deviations.
- Can refine. Draw picture.
- Works well when the parameter to be searched over is low dimensional, as in this case.
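The grid search can be sketched in a few lines of Python (illustrative, not the guide's own code; search_delta is a hypothetical name, e2 the squared OLS residuals, z the variance-driving variable):

import numpy as np

def search_delta(e2, z, deltas=np.arange(0.0, 3.01, 0.1)):
    """Search-method sketch for sigma_t^2 = sigma^2 * z_t^delta.
    Conditional on delta, regressing e2 on z**delta is linear, so
    each step is a one-variable OLS through the origin; keep the
    (delta, sigma^2) pair with the smallest ESS."""
    best = None
    for d in deltas:
        w = z ** d
        s2 = (w @ e2) / (w @ w)          # OLS slope, no intercept
        ess = np.sum((e2 - s2 * w) ** 2)
        if best is None or ess < best[2]:
            best = (d, s2, ess)
    return best  # (delta_hat, sigma2_hat, min ESS)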

4.7.2.3. Groupwise heteroscedasticity. A common case is where we have repeated observations on each of a number of economic agents: e.g., 10 years of macroeconomic data on each of a set of countries or regions, or daily observations of transactions of 200 banks. This sort of data is a pooled cross-section time-series model. It may be reasonable to presume that the variance is constant over time within the cross-sectional units, but that it differs across them (e.g., firms or countries of different sizes...). The model is

$y_{it} = x_{it}'\beta + \varepsilon_{it}$
$E(\varepsilon_{it}^2) = \sigma_i^2, \forall t$

where $i = 1, 2, \ldots, G$ are the agents, and $t = 1, 2, \ldots, n$ are the observations on each agent.

- The other classical assumptions are presumed to hold. In this case, the variance $\sigma_i^2$ is specific to each agent, but constant over the observations for that agent.
- In this model, we assume that $E(\varepsilon_{it}\varepsilon_{is}) = 0$, $t \neq s$. This is a strong assumption that we'll relax later.

To correct for heteroscedasticity, just estimate each $\sigma_i^2$ using the natural estimator:

$\hat\sigma_i^2 = \frac{1}{n}\sum_{t=1}^{n}\hat\varepsilon_{it}^2$

- Note that we use $1/n$ here since it's possible that there are more than $n$ regressors, so $n - K$ could be negative. Asymptotically the difference is unimportant.
- With each of these, transform the model as usual:

$\frac{y_{it}}{\hat\sigma_i} = \frac{x_{it}'\beta}{\hat\sigma_i} + \frac{\varepsilon_{it}}{\hat\sigma_i}$

Do this for each cross-sectional group. This transformed model satisfies the classical assumptions, asymptotically.
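The full two-step procedure can be sketched in Python (an illustration, not the guide's own code; groupwise_fgls is a hypothetical name, and groups is assumed to be a 1-D array of group labels aligned with y):

import numpy as np

def groupwise_fgls(y, X, groups):
    """Groupwise-heteroscedasticity FGLS sketch: estimate sigma_i^2
    from the pooled OLS residuals within each group (1/n divisor,
    as in the text), then divide y and X by sigma_i and re-run OLS."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sig = np.empty_like(y)
    for g in np.unique(groups):
        m = groups == g
        sig[m] = np.sqrt(np.mean(e[m] ** 2))
    return np.linalg.lstsq(X / sig[:, None], y / sig, rcond=None)[0]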

4.8. Example

4.8.1. Example: the Nerlove model. Let's check the Nerlove data for evidence of heteroscedasticity. In what follows, we're going to use the model with the constant and output coefficient varying across 5 groups, but with the input price coefficients fixed (see Equation 2.5.2). If you plot the residuals of this model, you obtain Figure 4.8.1. We can see pretty clearly that the error variance is larger for small firms than for larger firms. As part of your next Docència Tutoritzada project, you will use the White and Goldfeld-Quandt tests to confirm that homoscedasticity is strongly rejected.

Figure 4.8.1. Residuals, Nerlove model, sorted by firm size

4.9. Tercer Projecte de Docència Tutoritzada

(1) Wisconsin data
(a) Download the Wisconsin data on height and income.
(b) Select the observations with complete information on height and income.
(c) Create a dummy variable indicating whether the person is a woman or a man.
(d) Create new variables "AD" and "IQD" that express height and IQ as deviations from their sample means.
(e) Estimate the model renda = b1 + b2*Dona + b3*AD + b4*(Dona*AD) + b5*IQD + e by OLS.
(f) Comment on the results.
(g) Check whether there is heteroscedasticity
(i) by plotting the residuals
(ii) with the Goldfeld-Quandt test
(iii) with the White test
(h) Estimate by OLS again, but with robust standard errors. Compare the results with the previous ones.
(i) Do a Generalized Least Squares estimation, supposing that there is heteroscedasticity by groups. There are two groups: men and women. Comment on the results.
(j) Do a Generalized Least Squares estimation, using the GRETL option "Corrección de heteroscedasticidad". Comment on the results.
(2) Nerlove data
(a) Re-estimate the model with dummy variables and interaction terms from the Primer Projecte de Docència Tutoritzada:

$\ln(cost) = \sum_{j=1}^{5} \alpha_j d_j + \sum_{j=1}^{5} \gamma_j [d_j \ln(output)] + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \varepsilon$

(b) Test the null hypothesis "the errors are homoscedastic" with the White test.
(c) Make graphs of the residuals, and comment on whether heteroscedasticity is detected. You should obtain a graph similar to Figure 4.8.1.
(d) Do a Generalized Least Squares estimation, using the GRETL option "Corrección de heteroscedasticidad". Comment on the results.

4.10. Chapter Exercises


The professor of the practical sessions will give you a list of problems. In addition, you might also consider exercises 11.1, 11.2, 11.6, 11.15, 11.16 from Gujarati, pp. 413-421.

CHAPTER 5

Autocorrelation

5.1. Introduction

Basic concepts and goals for learning. After studying the material, you should learn the answers to the following questions:
(1) What is autocorrelation (AUT)?
(2) What are the properties of the OLS estimator when there is autocorrelation?
(3) How can the presence of autocorrelation be detected?
(4) How can we deal with autocorrelation if it is present?

Readings: Gujarati, Econometría (cuarta edición), Chapter 12: Autocorrelación: ¿qué sucede si los términos de error están correlacionados?, pp. 425-486.

5.2. Motivation

Autocorrelation, which is the serial correlation of the error term, so that $E(\varepsilon_t \varepsilon_s) \neq 0$ for $t \neq s$, is a problem that is usually associated with time series data, but it can also affect cross-sectional data. For example, a shock to oil prices will simultaneously affect all countries, so one could expect contemporaneous correlation of macroeconomic variables across countries. Seasonality is another common problem. Consider the Keeling-Whorf.gdt data. If we regress CO2 concentration on a time trend, we obtain the fitted line in Figure 5.2.1. The residuals from the same model are in Figure 5.2.2. In addition to a high frequency monthly pattern in the residuals, there is a long term low frequency wave. It is clear that the errors of this model are not independent over time. This is an example of autocorrelation.

Figure 5.2.1. Keeling-Whorf CO2 data, fit using time trend

Figure 5.2.2. Keeling-Whorf CO2 data, residuals using time trend


If you examine the residuals of the simple Nerlove model (equation 5.9.1), in Figure 4.8.1, you can also detect that there appears to be autocorrelation. In this chapter, we will explore the causes, effects and treatments for AUT.

5.3. Causes

Autocorrelation is the existence of correlation across the error terms:

$$E(\epsilon_t \epsilon_s) \neq 0, \quad t \neq s.$$

Why might this occur? Plausible explanations include:

(1) Lags in adjustment to shocks. In a model such as

$$y_t = x_t'\beta + \epsilon_t,$$

one could interpret $x_t'\beta$ as the equilibrium value and $\epsilon_t$ as a shock that moves the system away from equilibrium. Suppose $x_t$ is constant over a number of observations. If the time needed to return to equilibrium is long with respect to the observation frequency, one could expect $\epsilon_{t+1}$ to be positive, conditional on $\epsilon_t$ positive, which induces a correlation.

(2) Unobserved factors that are correlated over time. The error term is often assumed to correspond to unobservable factors. If these factors are correlated, there will be autocorrelation.

(3) Misspecification of the model. Suppose that the data generating process (DGP) is

$$y_t = \beta_0 + \beta_1 x_t + \beta_2 x_t^2 + \epsilon_t$$

but we estimate

$$y_t = \beta_0 + \beta_1 x_t + \epsilon_t$$


Figure 5.3.1. Autocorrelation induced by misspecication

The effects are illustrated in Figure 5.3.1. A similar problem might explain the residuals of the simple Nerlove model, in Figure 4.2.1.
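The figure can be mimicked with a small simulation: fit a straight line to data generated by a quadratic DGP, and the residuals inherit long runs of the same sign. A hansl sketch follows; the DGP coefficients are arbitrary choices for illustration.

    # autocorrelation induced by fitting a linear model to a quadratic DGP
    nulldata 100
    setobs 1 1 --time-series
    genr time
    series x = time / 100
    series y = 1 + 0.5*x + 2*x^2 + 0.05*normal()   # true DGP is quadratic
    ols y const x                                  # misspecified linear model
    series e = $uhat
    gnuplot e --time-series --with-lines           # residuals show long runs of one sign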

5.4. Effects on the OLS estimator

As with heteroscedasticity, the OLS estimator remains consistent as long as the regressors are exogenous (see Section 5.8 for an important exception), but the standard formula for its variance does not apply. The correct formula is given in equation 4.4.1. Next we discuss two GLS corrections.

5.5. Corrections

There are many types of autocorrelation. The way to correct for the problem depends on the exact type of autocorrelation that exists. We'll consider two examples. The first is the most commonly encountered case: autoregressive order 1 (AR(1)) errors.


5.5.1. AR(1). The model is

$$y_t = x_t'\beta + \epsilon_t$$
$$\epsilon_t = \rho \epsilon_{t-1} + u_t$$
$$u_t \sim iid(0, \sigma_u^2)$$
$$E(\epsilon_t u_s) = 0, \quad t < s$$

We assume that the model satisfies the other classical assumptions.

We need a stationarity assumption: $|\rho| < 1$. Otherwise the variance of $\epsilon_t$ explodes as $t$ increases, so standard asymptotics will not apply.

By recursive substitution we obtain

$$\epsilon_t = \rho \epsilon_{t-1} + u_t = \rho(\rho \epsilon_{t-2} + u_{t-1}) + u_t = \rho^2 \epsilon_{t-2} + \rho u_{t-1} + u_t = \rho^2(\rho \epsilon_{t-3} + u_{t-2}) + \rho u_{t-1} + u_t = \cdots$$

In the limit the lagged $\epsilon$ drops out, since $\rho^m \to 0$ as $m \to \infty$, so we obtain

$$\epsilon_t = \sum_{m=0}^{\infty} \rho^m u_{t-m}$$

With this, the variance of $\epsilon_t$ is found as

$$E(\epsilon_t^2) = \sigma_u^2 \sum_{m=0}^{\infty} \rho^{2m} = \frac{\sigma_u^2}{1 - \rho^2}$$


If we had directly assumed that $\epsilon_t$ is covariance stationary, we could obtain this using

$$V(\epsilon_t) = \rho^2 E(\epsilon_{t-1}^2) + 2\rho E(\epsilon_{t-1} u_t) + E(u_t^2) = \rho^2 V(\epsilon_t) + \sigma_u^2,$$

so

$$V(\epsilon_t) = \frac{\sigma_u^2}{1 - \rho^2}$$

0th

order autocovariance:

0 = V (t )

Note that the variance does not depend on

Likewise, the rst order autocovariance

is

Cov(t , t1 ) = s = E((t1 + ut ) t1 ) = = V (t )
2 u 1 2

Using the same method, we nd that for

s<t

Cov(t , ts ) = s =
The autocovariances don't depend on

2 s u 1 2
the process

t:

{t }

is

covariance sta-

tionary

The

correlation ( in general, for r.v.'s x and y ) is dened as


corr(x, y)

cov(x, y) se(x)se(y)

5.5. CORRECTIONS

73

but in this case, the two standard errors are the same, so the

s-order autocorrelation

is

s = s
All this means that the overall covariance matrix $\Sigma$ has the form

$$\Sigma = \underbrace{\frac{\sigma_u^2}{1 - \rho^2}}_{\text{this is the variance}} \underbrace{\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & & \vdots \\ \vdots & & & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \cdots & \rho & 1 \end{pmatrix}}_{\text{this is the correlation matrix}}$$

So we have homoscedasticity, but elements off the main diagonal are not zero. All of this depends on only two parameters, $\rho$ and $\sigma_u^2$. If we can estimate these consistently, we can apply FGLS.

It turns out that it's easy to estimate these consistently. The steps are:

(1) Estimate the model $y_t = x_t'\beta + \epsilon_t$ by OLS.

(2) Take the residuals, and estimate the model

$$\hat{\epsilon}_t = \rho \hat{\epsilon}_{t-1} + \hat{u}_t$$

Since $\hat{\epsilon}_t \stackrel{p}{\to} \epsilon_t$, this regression is asymptotically equivalent to the regression

$$\epsilon_t = \rho \epsilon_{t-1} + u_t,$$

which satisfies the classical assumptions. Therefore, the estimator $\hat{\rho}$ obtained by applying OLS to $\hat{\epsilon}_t = \rho \hat{\epsilon}_{t-1} + \hat{u}_t$ is consistent: $\hat{\rho} \stackrel{p}{\to} \rho$. Also, since $\hat{u}_t \stackrel{p}{\to} u_t$,

$$\hat{\sigma}_u^2 = \frac{1}{n} \sum_{t=2}^{n} \hat{u}_t^2 \stackrel{p}{\to} \sigma_u^2$$

(3) With the consistent estimators $\hat{\sigma}_u^2$ and $\hat{\rho}$, form $\hat{\Sigma} = \Sigma(\hat{\sigma}_u^2, \hat{\rho})$ using the previous structure of $\Sigma$, and estimate by FGLS. Actually, one can omit the factor $\sigma_u^2/(1 - \rho^2)$, since it cancels out in the formula

$$\hat{\beta}_{FGLS} = \left(X'\hat{\Sigma}^{-1}X\right)^{-1}\left(X'\hat{\Sigma}^{-1}y\right).$$

An asymptotically equivalent approach is to simply estimate the transformed model

$$y_t - \hat{\rho} y_{t-1} = (x_t - \hat{\rho} x_{t-1})'\beta + \hat{u}_t$$

using $n-1$ observations (since $y_0$ and $x_0$ aren't available). This is the method of Cochrane and Orcutt. Dropping the first observation is asymptotically irrelevant, but it can be very important in small samples. One can recuperate the first observation (this is the Prais-Winsten method) by putting

$$y_1^* = y_1 \sqrt{1 - \hat{\rho}^2}$$
$$x_1^* = x_1 \sqrt{1 - \hat{\rho}^2}$$

Note that the variance of $y_1^*$ is $\sigma_u^2$ asymptotically, so we see that the transformed model will be homoscedastic (and nonautocorrelated, since the $u$'s are uncorrelated with the $y$'s in different time periods).
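In GRETL, both procedures are implemented by the ar1 command. A minimal sketch, assuming the Nerlove data with the log variables lcost, loutput, llabor, lfuel and lcapital already defined (the file and variable names are assumptions):

    open nerlove.gdt                                    # assumed file name
    setobs 1 1 --time-series                            # give the data a time-series structure
    ols lcost const loutput llabor lfuel lcapital       # OLS, for comparison
    ar1 lcost const loutput llabor lfuel lcapital       # iterated Cochrane-Orcutt
    ar1 lcost const loutput llabor lfuel lcapital --pwe # Prais-Winsten (keeps observation 1)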


5.5.2. MA(1). The linear regression model with moving average order 1 errors is

$$y_t = x_t'\beta + \epsilon_t$$
$$\epsilon_t = u_t + \psi u_{t-1}$$
$$u_t \sim iid(0, \sigma_u^2)$$
$$E(\epsilon_t u_s) = 0, \quad t < s$$

In this case,

$$V(\epsilon_t) = \gamma_0 = E\left[(u_t + \psi u_{t-1})^2\right] = \sigma_u^2 + \psi^2 \sigma_u^2 = \sigma_u^2(1 + \psi^2)$$

Similarly

$$\gamma_1 = E\left[(u_t + \psi u_{t-1})(u_{t-1} + \psi u_{t-2})\right] = \psi \sigma_u^2$$

and

$$\gamma_2 = E\left[(u_t + \psi u_{t-1})(u_{t-2} + \psi u_{t-3})\right] = 0$$


so in this case

$$\Sigma = \sigma_u^2 \begin{pmatrix} 1 + \psi^2 & \psi & 0 & \cdots & 0 \\ \psi & 1 + \psi^2 & \psi & & \vdots \\ 0 & \psi & \ddots & & 0 \\ \vdots & & & & \psi \\ 0 & \cdots & 0 & \psi & 1 + \psi^2 \end{pmatrix}$$

Note that the first order autocorrelation is

$$\rho_1 = \frac{\psi \sigma_u^2}{\sigma_u^2(1 + \psi^2)} = \frac{\gamma_1}{\gamma_0} = \frac{\psi}{1 + \psi^2}$$

This achieves a maximum at $\psi = 1$ and a minimum at $\psi = -1$, and the maximal and minimal autocorrelations are 1/2 and -1/2. Therefore, series that are more strongly autocorrelated can't be MA(1) processes. Again the covariance matrix has a simple structure that depends on only two parameters. The problem in this case is that one can't estimate $\psi$ using OLS on

$$\hat{\epsilon}_t = u_t + \psi u_{t-1}$$

because the $u_t$ are unobservable and they can't be estimated consistently. However, there is a simple way to estimate the parameters.

Since the model is homoscedastic, we can estimate

$$V(\epsilon_t) = \gamma_0 = \sigma_u^2(1 + \psi^2)$$

using the typical estimator:

$$\hat{\gamma}_0 = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t^2$$

By the Slutsky theorem, we can interpret this as defining an (unidentified) estimator of both $\sigma_u^2$ and $\psi$, e.g., use this as

$$\widehat{\sigma_u^2(1 + \psi^2)} = \frac{1}{n} \sum_{t=1}^{n} \hat{\epsilon}_t^2$$

However, this isn't sufficient to define consistent estimators of the parameters, since it's unidentified.

To solve this problem, estimate the covariance of $\epsilon_t$ and $\epsilon_{t-1}$ using

$$\widehat{Cov}(\epsilon_t, \epsilon_{t-1}) = \widehat{\psi \sigma_u^2} = \frac{1}{n} \sum_{t=2}^{n} \hat{\epsilon}_t \hat{\epsilon}_{t-1}$$

This is a consistent estimator, following a LLN (and given that the $\hat{\epsilon}_t$ are consistent for the $\epsilon_t$). As above, this can be interpreted as defining an unidentified estimator of the two parameters.

Now solve these two equations to obtain identified (and therefore consistent) estimators of both $\psi$ and $\sigma_u^2$. Define the consistent estimator $\hat{\Sigma} = \Sigma(\hat{\psi}, \hat{\sigma}_u^2)$ following the form we've seen above, and transform the model using the Cholesky decomposition. The transformed model satisfies the classical assumptions asymptotically.
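A small hansl sketch of this method-of-moments calculation. This is not a built-in GRETL routine: it assumes the OLS residuals are already stored in a series named e, and that the sample first-order autocorrelation is smaller than 1/2 in absolute value (the admissible range for an MA(1)).

    # method-of-moments estimates for MA(1) errors, from OLS residuals in series e
    scalar rho1 = corr(e, e(-1))                      # sample first-order autocorrelation
    # invert rho1 = psi/(1+psi^2), taking the root with |psi| < 1
    scalar psi = (1 - sqrt(1 - 4*rho1^2)) / (2*rho1)
    scalar g0 = var(e)                                # estimate of gamma_0 = sigma_u^2 (1+psi^2)
    scalar s2u = g0 / (1 + psi^2)                     # estimate of sigma_u^2
    printf "psi hat = %g, sigma_u^2 hat = %g\n", psi, s2u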

5.6. Valid inferences with autocorrelation of unknown form

In Section 4.7.2.1 we saw that it is possible to consistently estimate the correct covariance matrix of the OLS estimator when there is HET. It is also possible to do this when there is AUT, or both HET and AUT. The details are beyond the scope of this course. It is important to remember that a correction for autocorrelation will only give an efficient estimator and valid test statistics if the model of autocorrelation is correct. It may be hard to determine the correct model for the autocorrelation of the errors, so one may prefer to forgo the GLS correction and simply use OLS. If this is done, one needs to account for the existence of AUT when estimating the covariance matrix of the parameter estimates, in order to obtain correct test statistics. We will see examples in the Projecte de Docència Tutoritzada.
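GRETL makes this kind of heteroscedasticity and autocorrelation consistent (HAC, Newey-West) covariance estimator available through the --robust option: when the dataset has a time-series structure, --robust produces HAC standard errors rather than the heteroscedasticity-only variants. A minimal sketch, reusing the assumed Nerlove variable names:

    setobs 1 1 --time-series                                # HAC is used only for time-series data
    ols lcost const loutput llabor lfuel lcapital --robust  # OLS with HAC standard errors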

5.7. Testing for autocorrelation

Breusch-Godfrey test. This test uses an auxiliary regression, as does the White test for heteroscedasticity. The regression is

$$\hat{\epsilon}_t = x_t'\delta + \gamma_1 \hat{\epsilon}_{t-1} + \gamma_2 \hat{\epsilon}_{t-2} + \cdots + \gamma_P \hat{\epsilon}_{t-P} + v_t$$

and the test statistic is the $nR^2$ statistic, just as in the White test. There are $P$ restrictions, so the test statistic is asymptotically distributed as $\chi^2(P)$.

The intuition is that the lagged errors shouldn't contribute to explaining the current error if there is no autocorrelation. $x_t$ is included as a regressor to account for the fact that the $\hat{\epsilon}_t$ are not independent even if the $\epsilon_t$ are. This is a technicality that we won't go into here.

This test is valid even if the regressors are stochastic and contain lagged dependent variables.


The alternative is not that the model is an AR(P), following the argument above. The alternative is simply that some or all of the first $P$ autocorrelations are different from zero. This is compatible with many specific forms of autocorrelation.
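In GRETL the Breusch-Godfrey test is available through the modtest command after an estimation. A sketch, again with the assumed Nerlove variable names:

    ols lcost const loutput llabor lfuel lcapital   # estimate the model first
    modtest 1 --autocorr                            # Breusch-Godfrey test with P = 1 lag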

5.8. Lagged dependent variables and autocorrelation: A Caution

We've seen that the OLS estimator is consistent under autocorrelation, as long as $\text{plim}\, \frac{X'\epsilon}{n} = 0$. This will be the case when $E(X'\epsilon) = 0$, following a LLN. An important exception is the case where $X$ contains lagged $y$'s and the errors are autocorrelated. A simple example is the case of a single lag of the dependent variable with AR(1) errors. The model is

$$y_t = x_t'\beta + \gamma y_{t-1} + \epsilon_t$$
$$\epsilon_t = \rho \epsilon_{t-1} + u_t$$

Now we can write

$$E(y_{t-1}\epsilon_t) = E\left[(x_{t-1}'\beta + \gamma y_{t-2} + \epsilon_{t-1})(\rho \epsilon_{t-1} + u_t)\right] \neq 0$$

since one of the terms is $\rho E(\epsilon_{t-1}^2)$, which is clearly nonzero. In this case $E(X'\epsilon) \neq 0$, and therefore $\text{plim}\, \frac{X'\epsilon}{n} \neq 0$. Since

$$\text{plim}\, \hat{\beta} = \beta + \text{plim} \left(\frac{X'X}{n}\right)^{-1} \frac{X'\epsilon}{n} \neq \beta,$$

the OLS estimator is inconsistent in this case. One needs to estimate by instrumental variables (IV). This is a topic that is beyond the scope of this course. It is important to be aware of the possibility that the OLS estimator can be inconsistent, though.


5.9. Quart Projecte de Docncia Tutoritzada


Using the Nerlove data (you have already used these data, but the Excel file is here if needed):

(1) For the simple model

(5.9.1) $$\ln(cost) = \beta_1 + \beta_2 \ln(output) + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \epsilon$$

(a) Estimate the model by OLS.
(b) Use the Breusch-Godfrey test to check whether there is autocorrelation. Important: to be able to do this, you will have to give the data a time-series structure.
(c) Plot the residuals, and give an interpretation of whether or not an autocorrelation problem is visible.

(2) Repeat exercise 1, but using the model

$$\ln(cost) = \sum_j \alpha_j d_j + \sum_j \gamma_j \left[d_j \ln(output)\right] + \beta_3 \ln(labor) + \beta_4 \ln(fuel) + \beta_5 \ln(capital) + \epsilon$$

that was presented in Section 2.5.3.

(3) With the Keeling-Whorf.gdt data (see the sketch after this list):
(a) Estimate the model $CO2_t = \beta_1 + \beta_2 t + \epsilon_t$.
(b) Check whether there is autocorrelation using the Breusch-Godfrey test.
(c) Plot the residuals.
(d) Re-estimate the model using the Cochrane-Orcutt and Prais-Winsten methods, and plot the residuals.
(e) Comment on all the results.
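A sketch of the GRETL commands for part (3). The series name co2 is an assumption about the dataset's contents, and the choice of 12 lags for the test reflects the monthly frequency of the data.

    open "Keeling-Whorf.gdt"
    genr time                               # (a) linear time trend
    ols co2 const time
    modtest 12 --autocorr                   # (b) Breusch-Godfrey with P = 12 (monthly data)
    series e = $uhat
    gnuplot e --time-series --with-lines    # (c) residual plot
    ar1 co2 const time                      # (d) Cochrane-Orcutt
    ar1 co2 const time --pwe                #     Prais-Winsten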

5.10. Chapter Exercises


The professor of the practical sessions will give you a list of problems. In addition, you might consider exercises 12.1, 12.8, 12.9, 12.11, 12.14, 12.17, 12.22, 12.26, and 12.28 from Gujarati, pp. 472-486.

CHAPTER 6

Data sets
This chapter gives links to the data sets referred to in the Study Guide:
Wisconsin height-income data (comma separated values)
Wisconsin height-income data (Gretl data file)
Nerlove data (Excel spreadsheet file)
Nerlove data (Gretl data file)
Keeling-Whorf CO2 data (Gretl data file)
Cigarette-Alcohol Mortality data (Gretl data file)

