Teaching Documents
Professional Documents
Cultural Documents
Programa Universitat-Empresa
February 2008
Contents

Introduction
    Econometrics at the Facultat
    About this study guide
    Bibliography
Chapter 1. GRETL
Chapter 2. Dummy Variables
    Introduction
    Motivation
    Definition, Basic Use, and Interpretation
    Additional Details
    Primer Projecte de Docència Tutoritzada
    Chapter Exercises
Chapter 3. Collinearity
    Introduction
    Motivation: Data on Mortality and Related Factors
    Definition and Basic Concepts
    When does it occur?
    Consequences of Collinearity
    Detection of Collinearity
    Dealing with collinearity
    Segon Projecte de Docència Tutoritzada
    Chapter Exercises
Chapter 4. Heteroscedasticity
    Introduction
    Motivation
    Basic Concepts and Definitions
    Effects of Het. and Aut. on the OLS estimator
    The Generalized Least Squares (GLS) estimator
    Feasible GLS
    Heteroscedasticity
    Example
    Tercer Projecte de Docència Tutoritzada
    Chapter Exercises
Chapter 5. Autocorrelation
    Introduction
    Motivation
    Causes
    Effects on the OLS estimator
    Corrections
    Valid inferences with autocorrelation of unknown form
    Testing for autocorrelation
    Lagged dependent variables and autocorrelation: A Caution
Chapter 6. Data sets
Introduction
predictions are things that can be done using econometric methods. Courses that are fundamental for successfully studying Econometrics are Matemàtiques per a Economistes I and Matemàtiques per a Economistes II (first year of study) and Estadística I and Estadística II (second year of study). Ideally, students should have passed these courses before beginning Econometrics. If this is not possible, any student of Econometrics should immediately begin a serious review of the material covered in these courses. Basic matrix algebra, constrained and unconstrained minimization of functions, conditional and unconditional expectations of random variables, and hypothesis testing are areas that should be reviewed.
Microeconomia I and Microeconomia II are courses that provide a theoretical background which is important to understand why and how we use econometric tools. Macroeconomia I also provides a theoretical background for some of the applications, as does careful reading of a textbook. The main textbook is Gujarati's Econometría, mentioned below. In the second semester of Econometrics, we will cover material in Chapters 9, 10, 11 and 12 of Gujarati's book.

This guide has been checked to work properly using the Firefox web browser and Adobe Acrobat Reader. Both of these packages are freely available. You should configure Acrobat Reader to use Firefox to open links. This study guide and related materials (data sets, copies of software and manuals, etc.) are available from the course web page.
Bibliography

There are many excellent textbooks for econometrics. Any of the following are appropriate. This study guide refers to Gujarati's book. You should definitely read the appropriate sections of at least one of these books.
(1) Novales, A., Econometría, McGraw-Hill
(2) Gujarati, D., Econometría, McGraw-Hill
(3) Johnston, J. and J. Dinardo, Métodos de Econometría, Vicens Vives
(4) Kmenta, J., Elementos de Econometría, Vicens Vives
(5) Maddala, G.S. (1996), Introducción a la econometría, 2nd ed., Prentice Hall
(6) Pindyck, R.S. and Rubinfeld, D.L. (2001), Econometría: modelos y pronósticos, 4th ed., McGraw-Hill
CHAPTER 1
GRETL
1.1. Introduction
GRETL (http://gretl.sourceforge.net/) is a free software package for doing econometrics. It is installed on the computers in Aules 21-22-23 as well as in the Social Sciences computer rooms. You can download a copy and install it on your own computer. It works with Windows, Macs, and Linux. It is available in a number of languages, including Spanish. The version for Windows, along with the manual and the data sets that accompany D. Gujarati's Econometría, are distributed with this study guide, and are also available online.
The examples in this study guide use GRETL, and to do the class assignments you will need to use GRETL. This chapter explains the basic steps of using GRETL.

Basic concepts and goals for learning:
(1) become familiar with the basic use of GRETL
(2) learn how to load ASCII and spreadsheet data
(3) learn how to select certain observations in a data set

Readings: GRETL manual in Spanish or in English. You don't have to read the whole manual, but looking through it would be a good idea.
The Wisconsin Longitudinal Study (WLS) is a long-term study of people who graduated from high school in the state of Wisconsin (US) during the year 1957. The data has been collected repeatedly in subsequent years.
This data can be obtained over the Internet from the address given previously. In Figure 1.2.2 you can see that several variables have been selected for download.

Figure 1.2.2. Downloading data

In Figure 1.2.3 you see that one of the available formats is comma separated values (csv), which provides records (lines) that have variables which may be text or numbers, each separated by commas. Downloading that gives us the file wls.csv.
1.2.5. This data set has some problems that make it difficult to use. First, the variable names are strange and not intuitive. Second, many observations have missing values. You can change names of variables by right-clicking on a variable and selecting Edit attributes; then change the name to whatever you like. See Figure 1.2.6. To see that many observations are missing values, right-click on a variable and choose Display values or Descriptive statistics. An example is the variable income.
To eliminate missing observations, we can select from the menu Sample -> Restrict, based on criterion, as in Figure 1.2.8. We need to enter a selection criterion. This data set is missing many observations on income and age. We can require that these variables be positive. This is illustrated in Figure 1.2.9. Once we do this, the new sample has 4934 observations, as we can see in Figure 1.2.10. Whenever you are using this data, you should make sure that you have removed the observations with missing data.
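The same sample restriction can be sketched outside of GRETL as well. The following Python fragment is a minimal illustration using a tiny made-up extract (the data values and the assumption that missing values are coded as negative numbers are mine, for illustration only; the real wls.csv is much larger):

```python
import pandas as pd

# Hypothetical extract in the spirit of wls.csv, where a negative code
# marks a missing value (an assumption for this illustration).
raw = pd.DataFrame({
    "income": [52000, -3, 61000, 47000, -3],
    "age":    [64, 64, -3, 65, 63],
})

# GRETL's "Sample -> Restrict, based on criterion" with the criterion
# "income > 0 && age > 0" corresponds to a boolean row filter:
sample = raw[(raw["income"] > 0) & (raw["age"] > 0)]
print(len(sample))  # 2 rows have valid income and age
```

With the real data, the same filter would leave the 4934 complete observations mentioned above.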
Much data is distributed in spreadsheet files. These are easy to load into GRETL using the File -> Open data -> Import option. Figure 1.2.11 shows how to do it. We need some spreadsheet data to try
this. Get the nerlove.xls data, and then import it as I have just explained. Once you do this you will see the dialog in Figure 1.2.12. Select no.
CHAPTER 2
Dummy Variables
2.1. Introduction
Basic concepts and goals for learning. After studying the material, you
should be able to answer the following questions:
(1) What is a dummy variable?
(2) How can dummy variables be used in regression models?
(3) What is the correct interpretation of a regression model that contains dummy variables?
(4) How can dummy variables be used in the cases of multiple categories, interaction terms, and seasonality?
(5) What is the equivalence between the different parameterizations that can be used when incorporating dummy variables?
Readings: Gujarati, Econometría.
2.2. Motivation
Often, qualitative factors can have an important effect on the dependent variable we may be interested in. Consider the Wisconsin data set wisconsin.gdt. If we regress income on height, having selected the sample to include men only, we obtain the fitted line in Figure 2.2.1. Doing the same for the sample of women, we get the corresponding line for women. Comparing the two regressions, we see that:
- the y-intercept is higher for men than for women
- the slope of the line is steeper for men than for women
- men are taller on average: for men, mean height is around 70 inches, while for women it's about 65 inches

This raises some interesting questions:
- why does income appear to depend upon height? What economic explanations are possible?
- why do women appear to be earning less than men, other things equal?

Apart from these questions, it is clear that a qualitative feature - the sex of the individual - has an impact upon the individual's expected income. The need to use qualitative information in our models motivates the study of dummy variables.
2.3. Definition, Basic Use, and Interpretation. A dummy variable is a variable that indicates whether or not some condition is true. It is customary to assign the value 1 if the condition is true, and 0 if the condition is false.

For example, the Wisconsin data contain the variable sexrsp, which takes the value 1 for men, and 2 for women. As such, sexrsp is not a dummy variable, since the values are not 0 or 1. We can define the condition "Is the person a woman?". This is equivalent to the condition "Is the value of sexrsp equal to 2?". This condition will be true for some observations, and false for others. With GRETL, we can define such a dummy variable, using the Variable -> Define new variable menu item, as in
Figure 2.3.1. To check that this worked properly, highlight both variables, right-click, and select Display values. This shows us what we see in Figure 2.3.2. Note that woman is now a variable like any other, that takes on the values 0 or 1.
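The same step can be sketched in Python, which may help make the logic explicit. This is a minimal illustration with made-up values of sexrsp (the coding 1 = man, 2 = woman follows the Wisconsin data):

```python
import pandas as pd

# Sketch of GRETL's "Variable -> Define new variable" step for
# woman = (sexrsp == 2), using a few illustrative observations.
df = pd.DataFrame({"sexrsp": [1, 2, 2, 1, 2]})

# The boolean condition "is sexrsp equal to 2?" becomes a 0/1 variable:
df["woman"] = (df["sexrsp"] == 2).astype(int)
print(df["woman"].tolist())  # [0, 1, 1, 0, 1]
```

As in GRETL, the new variable is an ordinary column taking only the values 0 and 1.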
Some basic models that use dummy variables, where d_t and d_{t2} are dummy variables and x_t and x_{t3} are ordinary continuous regressors, are the following. You should understand the interpretation of all of them.

y_t = β_1 + β_2 d_t + ε_t

y_t = β_1 d_t + β_2 (1 - d_t) + ε_t

y_t = β_1 + β_2 d_t + β_3 x_t + ε_t
Interaction terms: when an interaction term is included, the effect of one variable on the dependent variable depends on the value of the other. The following model has an interaction term. Note that the slope ∂E(y|x)/∂x = β_3 + β_4 d_t depends on the value of d_t:

y_t = β_1 + β_2 d_t + β_3 x_t + β_4 d_t x_t + ε_t

Multiple categories can be handled with several dummy variables:

y_t = β_1 + β_2 d_{t1} + β_3 d_{t2} + β_4 x_t + ε_t
Incorrect usage:
(1) overparameterization: in the model

y_t = β_1 + β_2 d_t + β_3 (1 - d_t) + ε_t

the regressors are perfectly collinear, since d_t + (1 - d_t) = 1, which is the regressor corresponding to the constant.
(2) multiple values assigned to multiple categories. Suppose that we have a condition that defines 4 possible categories, and we create a variable d = 1 if the observation is in the first category, d = 2 if it is in the second, and so on (this is not strictly speaking a dummy variable, according to our definition). Why is the following model not a good one?

y_t = β_1 + β_2 d + ε_t

What is the correct way to deal with this situation?
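One correct treatment of multiple categories can be sketched as follows. This Python fragment is a minimal illustration (the category coding 1..4 and all values are mine): it replaces the single multi-valued variable d with separate 0/1 dummies, dropping one category so that the dummies together with a constant are not perfectly collinear.

```python
import numpy as np

# Hypothetical categorical variable with 4 categories, coded 1..4.
d = np.array([1, 2, 3, 4, 2, 1, 3])

# Wrong: using d itself as a regressor forces the category effects to be
# equally spaced multiples of a single coefficient.
# Right: one 0/1 dummy per category; category 1 is the omitted base, so
# its effect is absorbed by the constant.
D = np.column_stack([(d == k).astype(float) for k in (2, 3, 4)])
print(D.shape)  # (7, 3)
```

Each coefficient on a column of D is then the difference in means relative to the base category, which is the interpretation used throughout this chapter.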
Seasonality provides another use of dummy variables: the Keeling-Whorf CO2 data can be modeled with a monthly dummy for each month plus a time trend. You should be able to use GRETL to reproduce the following results:

Model 1: OLS estimates using the 468 observations 1965:01-2003:12
Dependent variable: C02

Variable   Coefficient   Std. Error    t-statistic   p-value
djan       316.864       0.210610      1504.5009     0.0000
dfeb       317.533       0.210789      1506.4046     0.0000
dmar       318.271       0.210967      1508.6276     0.0000
dapr       319.418       0.211147      1512.7780     0.0000
dmay       319.848       0.211327      1513.5233     0.0000
djun       319.187       0.211507      1509.1057     0.0000
djul       317.653       0.211688      1500.5705     0.0000
daug       315.539       0.211870      1489.3056     0.0000
dsep       313.690       0.212052      1479.3061     0.0000
doct       313.548       0.212235      1477.3572     0.0000
dnov       314.792       0.212419      1481.9367     0.0000
ddec       315.961       0.212603      1486.1530     0.0000
time       0.121327      0.000404332   300.0664      0.0000

GRETL also reports the mean and S.D. of the dependent variable, the sum of squared residuals, the standard error of the residuals, the unadjusted and adjusted R², the F(12, 455) statistic and the Durbin-Watson statistic, and you should obtain the plot in Figure 2.4.1.
Multiple parameterizations. For a given set of categorical information, there are multiple ways to use dummy variables. For example, the models

y_t = α_1 d_t + α_2 (1 - d_t) + α_3 x_t + α_4 d_t x_t + ε_t

and

y_t = β_1 + β_2 d_t + β_3 x_t d_t + β_4 x_t (1 - d_t) + ε_t

are equivalent. You should know the 4 equations that relate the α_j parameters to the β_j parameters, j = 1, 2, 3, 4.
2.5. Primer Projecte de Docència Tutoritzada. This project counts toward the exercise grade. I recommend installing GRETL on a laptop with WiFi, so that you can work comfortably. Before June 1 you must hand in a brief report (10 pages maximum) on the following:
The theory of the competitive firm gives us the cost function as the value of the cost-minimization problem

min_x w'x subject to f(x) = q.

The solution is the vector of factor demands x(w, q). The cost function is obtained by evaluating cost at the solution:

C(w, q) = w'x(w, q),  with  ∂C(w, q)/∂w = x(w, q) ≥ 0.

Remember that these derivatives give the conditional factor demands (Shephard's Lemma).

Homogeneity: the cost function is homogeneous of degree 1 in input prices: C(tw, q) = tC(w, q), where t is a positive scalar. The factor demands are homogeneous of degree zero in factor prices; they only depend upon relative prices.
Returns to scale: the elasticity of cost with respect to output is

e_q = (∂C(w, q)/∂q) · (q/C(w, q)).

Constant returns to scale implies that cost and output increase in the proportion 1:1, so that e_q = 1.

The Cobb-Douglas functional form is linear in the logarithms of the regressors and the dependent variable. For a cost function, if there are g factors, the Cobb-Douglas cost function has the form

C = A q^{β_q} w_1^{β_1} ··· w_g^{β_g} e^ε

What is the elasticity of C with respect to w_j?

e^C_{w_j} = (∂C/∂w_j)(w_j/C)
          = β_j A q^{β_q} w_1^{β_1} ··· w_j^{β_j - 1} ··· w_g^{β_g} e^ε · w_j / (A q^{β_q} w_1^{β_1} ··· w_g^{β_g} e^ε)
          = β_j
This is one of the reasons the Cobb-Douglas form is popular: the coefficients are easy to interpret, since they are the elasticities of the dependent variable with respect to the explanatory variables. Note that in this case,

e^C_{w_j} = (∂C/∂w_j)(w_j/C) = x_j(w, q) · (w_j/C(w, q)) ≡ s_j(w, q),

the cost share of the j-th input. So with a Cobb-Douglas cost function, β_j = s_j(w, q): the cost shares are constants.

Note that after a logarithmic transformation we obtain

ln C = α + β_q ln q + β_1 ln w_1 + ... + β_g ln w_g + ε

where α = ln A. This model is linear in the parameters, and can be estimated by OLS given suitable data. One can verify that the property of HOD1 implies that

β_1 + ... + β_g = 1.

In other words, the cost shares add up to 1. The hypothesis that the technology exhibits CRTS implies that the elasticity of cost with respect to output equals 1, so β_q = 1. Likewise, monotonicity of the cost function in the input prices requires β_i ≥ 0, i = 1, ..., g.
The file nerlove.xls contains data on electric utility companies' cost of production, output and input prices. The data are for the U.S., and were collected by M. Nerlove. The observations are by row, and the columns are the firm identifier, cost, output, and the prices of labor, fuel and capital. The tasks:

(1) Download the data nerlove.xls (it is an Excel file).
(2) Import the data into GRETL.
(3) Create logarithms of cost, output, labor, fuel, capital.
(4) Estimate by OLS the Cobb-Douglas cost function

(2.5.1)    ln(cost) = α + β_q ln(output) + β_l ln(labor) + β_f ln(fuel) + β_k ln(capital) + ε

(5) Comment on the results, in general and specifically with respect to homogeneity of degree 1 and returns to scale.
(6) Create the dummy variables
(a) d1 = 1 if 101 <= firm <= 129, d1 = 0 otherwise
(b) d2 = 1 if 201 <= firm <= 229, d2 = 0 otherwise
(c) d3 = 1 if 301 <= firm <= 329, d3 = 0 otherwise
(d) d4 = 1 if 401 <= firm <= 429, d4 = 0 otherwise
(e) d5 = 1 if 501 <= firm <= 529, d5 = 0 otherwise
(7) Estimate the model with group-specific constants and output coefficients,

(2.5.2)    ln(cost) = Σ_{j=1}^{5} α_j d_j + Σ_{j=1}^{5} γ_j d_j ln(output) + β_l ln(labor) + β_f ln(fuel) + β_k ln(capital) + ε

(8) Make a graph representing returns to scale as a function of the size of the firm. Interpret the graph.
(9) Test the restrictions α_1 = α_2 = α_3 = α_4 = α_5 jointly with γ_1 = γ_2 = γ_3 = γ_4 = γ_5, and interpret the result.
CHAPTER 3
Collinearity
3.1. Introduction
Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is collinearity?
(2) What are the effects of collinearity on the OLS estimator: how does it affect estimation, hypothesis testing and prediction?
(3) How can the presence of collinearity be detected?
(4) What can be done to improve the situation if collinearity is a problem?
Readings: Gujarati, Econometría.
The data set mortalitat.gdt contains observations on mortality and factors that may be related to it. The variables are:

- chd = death rate per 100,000 population (Range 321.2 - 375.4)
- cal = Per capita consumption of calcium per day in grams (Range 0.9 - 1.06)
- unemp = Percent of civilian labor force unemployed in 1,000 of persons 16 years and older (Range 2.9 - 8.5)
- cig = Per capita consumption of cigarettes in pounds of tobacco by persons 18 years and older; approx. 339 cigarettes per pound of tobacco (Range 6.75 - 10.46)
- edfat = Per capita intake of edible fats and oil in pounds; includes lard, margarine and butter (Range 42 - 56.5)
- meat = Per capita intake of meat in pounds; includes beef, veal, pork, lamb and mutton (Range 138 - 194.8)
- spirits = Per capita consumption of distilled spirits in taxed gallons for individuals 18 and older (Range 1 - 2.9)
- beer = Per capita consumption of malted liquor in taxed gallons for individuals 18 and older (Range 15.04 - 34.9)
- wine = Per capita consumption of wine measured in taxed gallons for individuals 18 and older (Range 0.77 - 2.65)
If we estimate several models for chd using different subsets of these regressors, note how the signs of the coefficients change depending on the model, and that the magnitude of the parameter estimates varies a lot too. The parameter estimates are highly sensitive to the particular model we estimate. Why? We'll see that the problem is that the data exhibit collinearity.
3.3. Definition and Basic Concepts. Collinearity is the existence of an approximate linear relationship among the regressors:

λ_1 x_1 + λ_2 x_2 + ··· + λ_K x_K + v = 0

where x_i is the i-th column of the regressor matrix X, and v is an n × 1 vector of errors that is small in a relative sense. "Relative" and "approximate" are imprecise terms, so the existence of collinearity is also an imprecise, relative concept. A note on terminology: many authors, including Gujarati, use the term multicollinearity. Some, including myself, prefer to call the phenomenon collinearity. Collinearity as used here means exactly what Gujarati and others refer to as multicollinearity.

In the extreme case, we have an exact linear relationship:

λ_1 x_1 + λ_2 x_2 + ··· + λ_K x_K = 0
In this case, the rank ρ(X) < K, so ρ(X'X) < K, so X'X is not invertible and the OLS estimator is not uniquely defined. The existence of exact linear relationships amongst the regressors is known as perfect collinearity or exact collinearity. In this situation, certain linear combinations of the β's can be consistently estimated, but the individual β's cannot: there are multiple values that solve the first order conditions that define the OLS estimator. The β's are unidentified.

Perfect collinearity is unusual, except in the case of an error in construction of the regressor matrix, such as including the same regressor twice.
Another case where perfect collinearity may be encountered is with models with dummy variables, if one is not careful. Consider a model of the rental price (y_i) of apartments. Let B_i = 1 if the i-th apartment is in Barcelona, B_i = 0 otherwise, and define G_i, T_i and L_i in the same way for Girona, Tarragona and Lleida. One could use a model such as

y_i = β_1 + β_2 B_i + β_3 G_i + β_4 T_i + β_5 L_i + β_6 x_i + ε_i

In this model, B_i + G_i + T_i + L_i = 1 for every observation, so there is an exact linear relationship between these variables and the column of ones corresponding to the constant. One must either drop the constant, or one of the qualitative variables.
Collinearity (inexact): the more common case, if one doesn't make mistakes such as these, is the existence of inexact linear relationships, i.e., correlations between the regressors that are less than one in absolute value, but not zero. This is (unfortunately) quite common with economic data. Why does it occur?

- economic data is non-experimental, so a researcher cannot control the values of the variables.
- common factors affect different variables at the same time, which tends to induce correlations; variables tend to move together over time (for example, prices of apartments in Barcelona and in Valencia).
3.5. Consequences of Collinearity. When there is collinearity, the minimizing point of the objective function that defines the OLS estimator (s(β), the sum of squared errors) is relatively poorly defined. This is seen in Figures 3.5.1 and 3.5.2.

To see the effect of collinearity on variances, partition the regressor matrix as X = [x W], where x is the first column of X (we can arrange the columns of X however we like, so there's no loss of generality in considering the first column). Now, the variance of the OLS estimator, under the classical assumptions, is

V(β̂) = (X'X)^{-1} σ²

with

X'X = [ x'x   x'W
        W'x   W'W ]

and following a rule for partitioned inversion,

(X'X)^{-1}_{1,1} = (x'x - x'W(W'W)^{-1}W'x)^{-1}
                 = (x'(I_n - W(W'W)^{-1}W')x)^{-1}
                 = (ESS_{x|W})^{-1}

where by ESS_{x|W} we mean the error sum of squares obtained from the regression

x = Wλ + v.
Since R² = 1 - ESS/TSS, we have ESS = TSS(1 - R²), so the variance of the coefficient corresponding to x is

V(β̂_x) = σ² / (TSS_x (1 - R²_{x|W}))

We see three factors influence the variance of this coefficient. It will be high if
(1) σ² is large,
(2) there is little variation in x, so that TSS_x is small,
(3) the other regressors W explain x well. In this case, R²_{x|W} will be close to 1. As R²_{x|W} → 1, V(β̂_x) → ∞.
Consequences - summary:
- the parameters associated with variables affected by collinearity have high variances.
- high variances lead to low power when testing hypotheses.
- high variances lead to low t-statistics, broad confidence intervals, etc.
- the results are sensitive to small changes in the sample.
3.6. Detection of Collinearity. The best way is simply to regress each explanatory variable in turn on the remaining regressors. If any of these auxiliary regressions has a high R², there is a problem of collinearity. These artificial regressions also tell us which parameters are affected, which matters because collinearity isn't a problem if it doesn't affect what we're interested in estimating.

An alternative is to examine the matrix of correlations between the regressors. High correlations are sufficient but not necessary for severe collinearity: there may be a near exact linear relationship between 3 variables without the existence of any near exact linear relationship between pairs of variables.

A classic symptom of collinearity is a model with a good overall fit (high R²), but where none of the variables is significantly different from zero (i.e., their separate influences aren't well determined).

In summary, the artificial regressions are the best approach if one wants to be careful.
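The artificial-regression approach is easy to automate. The following Python sketch builds simulated data in which one regressor is nearly an exact linear combination of the others (all names and values are mine, for illustration) and computes the auxiliary R² for each column:

```python
import numpy as np

# Simulated regressors: x3 is almost a linear combination of x1 and x2.
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.6 * x1 + 0.4 * x2 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def aux_r2(X, j):
    """R^2 from regressing column j of X on a constant plus the rest."""
    y = X[:, j]
    W = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ coef
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2 = [aux_r2(X, j) for j in range(X.shape[1])]
assert r2[2] > 0.99  # x3 is almost perfectly explained by x1 and x2
```

A high auxiliary R² for some column signals exactly the situation analyzed above, where V(β̂_x) blows up as R²_{x|W} approaches 1.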
Example: using the mortalitat.gdt data, discussed above (Section 3.2), we can use the artificial regression approach, regressing spirits on the other regressors (cig, wine, beer). The R² of this auxiliary regression is high, which indicates collinearity and explains the instability of the parameters we found earlier when we tried several models in Section 3.2.
3.7. Dealing with collinearity. More sophisticated remedies exist, but these topics are advanced and are outside the scope of this course. These methods present problems of their own; they are not clear and obviously good solutions to the problem. In sum, collinearity is a fact of life in econometrics, and there is no clear solution to the problem. It is important to be aware of its effects and to know when it is present.
(3) Check for the existence of collinearity in the mortality models presented in Section 3.2. Download the data and run the pertinent artificial regressions. Also present the correlation matrix of the regressors cig, spirits, wine, beer. Give an interpretation.
CHAPTER 4
Heteroscedasticity
4.1. Introduction
Basic concepts and goals for learning. After studying the material, you
should learn the answers to the following questions:
(1) What is heteroscedasticity?
(2) What are the properties of the OLS estimator when there is heteroscedasticity?
(3) What is the GLS estimator?
(4) What is the feasible GLS estimator?
(5) What are the properties of the (F)GLS estimator?
(6) How can the presence of heteroscedasticity be detected?
(7) How can we deal with heteroscedasticity if it is present?
Readings: Gujarati, Econometría, "Heteroscedasticidad: ¿Qué pasa cuando la varianza del error no es constante?", pp. 372-424.
4.2. Motivation
One of the assumptions we've made up to now is that

ε_t ~ IID(0, σ²),

or occasionally ε_t ~ IIN(0, σ²). This assumption is quite unreasonable in many cases. Often, the variance of ε_t will change depending on the values of the regressors, or there may be correlations between different ε_t and ε_s, s ≠ t.

If we estimate the model in equation 5.9.1, a plot of the residuals versus log(output) is in Figure 4.2.1. Note that the variance of the error appears to be larger for small firms, and smaller for large firms. This seems to violate the classical assumption that

E(ε_t²) = σ², ∀t.
We appear to have a problem of heteroscedasticity.

Note also in Figure 4.2.1 that there seems to be correlation in the residuals: when a residual is positive, the next one is too in most cases. When a residual is negative, the next one is more likely to be negative than positive. If this is the case, it's a violation of the classical assumption that

E(ε_t ε_s) = 0, ∀t ≠ s,

and we have a problem of autocorrelation.

In this chapter and the next, we'll investigate the importance of these two problems, and how to deal with them.
4.3. Basic Concepts and Definitions. Consider the model

y = Xβ + ε,  E(ε) = 0,  V(ε) = Σ,

where Σ is a general symmetric positive definite matrix. The case where Σ is diagonal, with unequal elements on the diagonal, gives independently but not identically distributed errors; this is known as heteroscedasticity (HET). The case of nonzero elements off the main diagonal gives identically (assuming higher moments are also the same) but not independently distributed errors. This is known as autocorrelation (AUT).

Heteroscedasticity (definition): the errors have different variances. More precisely, there exist i and j such that V(ε_i) ≠ V(ε_j).

Autocorrelation (definition): the errors are correlated with one another. More precisely, there exist distinct i and j such that E(ε_i ε_j) ≠ 0.
4.4. Effects of HET and AUT on the OLS estimator. It is possible to have both HET and AUT at the same time. In this case, the OLS estimator is still

β̂ = (X'X)^{-1}X'y = β + (X'X)^{-1}X'ε

so we have unbiasedness, as before. The variance of β̂, however, is now

(4.4.1)    E[(β̂ - β)(β̂ - β)'] = (X'X)^{-1} X'ΣX (X'X)^{-1}

Due to this, any test statistic that is based upon an estimator of σ² is invalid, since there isn't any single σ² that characterizes the process that generates the data. In particular, the formulas for the t, F and χ² based tests given above do not lead to statistics with these distributions.
Asymptotically, define Q_X = lim (X'X)/n and Ω = lim E(X'ΣX)/n. Supposing a CLT applies to n^{-1/2}X'ε, we obtain

√n(β̂ - β) →d N(0, Q_X^{-1} Ω Q_X^{-1})

Summary: with HET and/or AUT, the OLS estimator
- is unbiased in the same circumstances in which the estimator is unbiased with i.i.d. errors;
- has a different variance than before, so the previous test statistics aren't valid;
- is consistent;
- is asymptotically normally distributed, but with a different limiting covariance matrix. Previous test statistics aren't valid in this case for this reason.
4.5. The Generalized Least Squares (GLS) estimator. Supposing that Σ is known, define P to be a matrix such that

P'P = Σ^{-1}.

Since Σ is positive definite, P is nonsingular, and it follows that

P Σ P' = P (P'P)^{-1} P' = I_n.

Consider the transformed model

P y = P X β + P ε,

or, making the obvious definitions,

y* = X*β + ε*.

The variance of ε* = Pε is

E(P ε ε' P') = P Σ P' = I_n
so the transformed model

y* = X*β + ε*,  E(ε*) = 0,  V(ε*) = I_n

satisfies the classical assumptions. The GLS estimator is simply OLS applied to the transformed model:

β̂_GLS = (X*'X*)^{-1}X*'y* = (X*'X*)^{-1}X*'(X*β + ε*) = β + (X*'X*)^{-1}X*'ε*

so β̂_GLS is unbiased. Its variance is

V(β̂_GLS) = E[(X*'X*)^{-1}X*'ε*ε*'X*(X*'X*)^{-1}]
          = (X*'X*)^{-1}X*'X*(X*'X*)^{-1}
          = (X*'X*)^{-1}
          = (X'Σ^{-1}X)^{-1}

All the previous results regarding the desirable properties of the least squares estimator hold, when dealing with the transformed model, since the transformed model satisfies the classical assumptions. Tests are valid, using the previous formulas, as long as we substitute X* in place of X. Furthermore, any test that involves σ² can set it to 1.

The GLS estimator is also more efficient than OLS. This is a consequence of the Gauss-Markov theorem, since the GLS estimator is based on a model that satisfies the classical assumptions but the OLS estimator is not. To see this directly, note that (the following needs to be completed)

V(β̂_OLS) - V(β̂_GLS) = A Σ A',  where  A = (X'X)^{-1}X' - (X'Σ^{-1}X)^{-1}X'Σ^{-1}.

This may not seem obvious, but it is true, as you can verify for yourself. Then noting that AΣA' is a quadratic form in a positive definite matrix, we conclude that AΣA' is positive semi-definite, and that GLS is efficient relative to OLS.
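A small simulation may help fix ideas. Assuming, for illustration, that Σ is known and diagonal with standard deviations σ_t, the matrix P is just diag(1/σ_t), and GLS is OLS on the data divided by σ_t. Everything in this Python sketch (the data generating process, sample size, parameter values) is mine:

```python
import numpy as np

# Simulated heteroscedastic model: y = 2 + 3x + e, sd(e_t) = x_t.
rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(1, 5, size=n)
sigma = x                                   # known std. deviations
y = 2.0 + 3.0 * x + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
P = 1.0 / sigma                             # the diagonal of P

# OLS on the original model, and OLS on the transformed model (= GLS):
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_gls, *_ = np.linalg.lstsq(X * P[:, None], y * P, rcond=None)

# Both estimators are unbiased, so both should be near (2, 3);
# GLS has the smaller variance.
assert np.allclose(beta_gls, [2.0, 3.0], atol=0.5)
assert np.allclose(beta_ols, [2.0, 3.0], atol=0.5)
```

Repeating the simulation many times and comparing the dispersion of beta_ols and beta_gls illustrates the efficiency result numerically.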
4.6. Feasible GLS. As one can verify by calculating the first order necessary conditions, the GLS estimator is the solution to the minimization problem

β̂_GLS = argmin_β (y - Xβ)'Σ^{-1}(y - Xβ)

so the metric Σ^{-1} is used to weight the residuals. The problem is that Σ is usually unknown: it's an n × n symmetric matrix with (n² - n)/2 + n = (n² + n)/2 unique elements, which is more than the number of observations n.

Suppose that we parameterize Σ as a function of X and a parameter vector θ, where θ may include β:

Σ = Σ(X, θ)

where θ is of fixed dimension. Assuming that the parameterization is correct, so that in fact Σ = Σ(X, θ₀), and if we can consistently estimate θ, then we can consistently estimate Σ (as long as Σ(X, θ) is a continuous function of θ). In this case,

Σ̂ = Σ(X, θ̂) →p Σ(X, θ₀)

If we replace Σ in the GLS formulas with Σ̂, we obtain the feasible GLS (FGLS) estimator. The steps are:

(1) Specify a parametric model Σ(X, θ) for the form of the heteroscedasticity or autocorrelation.
(2) Estimate θ consistently, obtaining θ̂.
(3) Calculate Σ̂ = Σ(X, θ̂) and the Cholesky factorization P̂ = Chol(Σ̂^{-1}).
(4) Transform the model using

P̂ y = P̂ X β + P̂ ε

(5) Estimate using OLS on the transformed model.
4.7. Heteroscedasticity

Heteroscedasticity is the case where

E(εε') = Σ

is a diagonal matrix, so that the errors are uncorrelated, but have different variances. Heteroscedasticity is usually thought of as associated with cross sectional data, though there is absolutely no reason why time series data cannot also be heteroscedastic.
In fact, the ARCH (autoregressive conditionally heteroscedastic) models that you may hear about in your finance classes explicitly assume that a time series is heteroscedastic.

Consider a supply function

q_i = β_1 + β_p P_i + β_s S_i + ε_i

where P_i is price and S_i is some measure of the size of the i-th firm. One might suppose that unobservable factors (e.g., talent of managers, degree of coordination between production units, etc.) account for the error term ε_i. If there is more variability in these factors for large firms than for small firms, then ε_i will have a higher variance when S_i is high than when it is low.

Similarly, consider a demand function

q_i = β_1 + β_p P_i + β_m M_i + ε_i

where P_i is price and M_i is income. There are more possibilities for expression of preferences when one is rich, so it is possible that the variance of ε_i is high when M_i is high.
4.7.1. Detection.
4.7.1.1. The Goldfeld-Quandt test. The sample is divided into three parts, with n_1, n_2 and n_3 observations, where n_1 + n_2 + n_3 = n. The model is estimated using the first and third parts of the sample, separately, so that β̂_1 and β̂_3 will be independent. Then we have

ε̂_1'ε̂_1/σ² = ε_1'M_1ε_1/σ² →d χ²(n_1 - K)

and

ε̂_3'ε̂_3/σ² = ε_3'M_3ε_3/σ² →d χ²(n_3 - K)

so, under the null hypothesis of homoscedasticity,

(ε̂_1'ε̂_1/(n_1 - K)) / (ε̂_3'ε̂_3/(n_3 - K)) →d F(n_1 - K, n_3 - K).

Draw picture.

Ordering the observations is an important step if the test is to have any power. The motive for dropping the middle observations is to increase the difference between the average variance in the subsamples, supposing that there exists heteroscedasticity. This can increase the power of the test. On the other hand, dropping too many observations will substantially increase the variance of the statistics ε̂_1'ε̂_1 and ε̂_3'ε̂_3. If one doesn't have any ideas about the form of the heteroscedasticity, the test will probably have low power since a sensible data ordering isn't available.
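The mechanics of the test can be sketched as follows. This Python fragment is an illustration with simulated data in which the error variance grows with the regressor (the data generating process, subsample sizes and seed are mine):

```python
import numpy as np
from scipy import stats

# Simulated model with error variance growing in x.
rng = np.random.default_rng(4)
n = 90
x = np.sort(rng.uniform(1, 10, size=n))     # ordered by suspected variance
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

def ssr(X, y):
    """Sum of squared residuals from an OLS fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

k = 2                                       # parameters per regression
n1 = n3 = 30                                # end subsamples; middle 30 dropped
lo, hi = slice(0, n1), slice(n - n3, n)
X = np.column_stack([np.ones(n), x])

GQ = (ssr(X[hi], y[hi]) / (n3 - k)) / (ssr(X[lo], y[lo]) / (n1 - k))
p = stats.f.sf(GQ, n3 - k, n1 - k)
assert p < 0.05  # homoscedasticity is clearly rejected here
```

Note how the ordering by x and the dropped middle block follow the recipe above: without them, the two subsample variances would be similar and the test would lose power.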
4.7.1.2. White's test. When one has little idea whether there exists heteroscedasticity, and no idea of its potential form, the White test is a possibility. The idea is that if there is homoscedasticity, then

E(ε_t²|x_t) = σ², ∀t,

so that x_t or functions of x_t shouldn't help to explain E(ε_t²). The test works as follows:

(1) Since ε_t isn't available, use the consistent estimator ε̂_t (the OLS residual) instead.
(2) Regress

ε̂_t² = σ² + z_t'γ + v_t

where z_t is a P-vector. z_t may include some or all of the variables in x_t, as well as other variables. White's original suggestion was to use x_t, plus the set of all unique squares and cross products of variables in x_t.
(3) Test the hypothesis that γ = 0. The qF statistic in this case is

qF = P · [(ESS_R - ESS_U)/P] / [ESS_U/(n - P - 1)]

Note that ESS_R = TSS_U, so dividing both numerator and denominator by this we get

qF = (n - P - 1) R²/(1 - R²)

Note that this is the R² of the artificial regression used to test for heteroscedasticity, not the R² of the original model. An asymptotically equivalent statistic, under the null of no heteroscedasticity (so that R² should tend to zero), is

nR² ~a χ²(P).

This doesn't require normality of the errors, though it does assume that the fourth moment of ε_t is constant, under the null. Question: why is this necessary?
The White test has the disadvantage that it may not be very powerful unless the z_t vector happens to pick up the form of the heteroscedasticity. It also has the problem that specification errors other than heteroscedasticity may lead to rejection.

Note: the null hypothesis of this test may be interpreted as γ = 0 for the variance model

V(ε_t²) = h(α + z_t'γ),

where h(·) is an arbitrary function of unknown form. The test is more general than it may appear from the regression that is used.
4.7.1.3. Plotting the residuals. A very simple method is to simply plot the residuals (or their squares). Like the Goldfeld-Quandt test, this will be more informative if the observations are ordered according to the suspected form of the heteroscedasticity.

4.7.2. Correction. Correcting for heteroscedasticity requires that a parametric form Σ(θ) be specified, and that a means of estimating θ consistently be determined. An exception is heteroscedasticity-consistent covariance estimation, the advantage of which is that we don't need to specify the form of the HET.

4.7.2.1. Heteroscedasticity-consistent covariance estimation. Eicker (1967) and White (1980) showed how to modify test statistics to account for heteroscedasticity of unknown form. The OLS estimator has asymptotic distribution
√n(β̂ - β) →d N(0, Q_X^{-1} Ω Q_X^{-1})

where Ω = lim E(X'ΣX)/n is a K × K matrix that we need to estimate consistently. White showed that a consistent estimator, valid under heteroscedasticity but no autocorrelation, is

Ω̂ = (1/n) Σ_{t=1}^{n} ε̂_t² x_t x_t'

One can then modify the previous test statistics to obtain tests that are valid when there is heteroscedasticity of unknown form. For example, the Wald test for H0: Rβ - r = 0 would be

n (Rβ̂ - r)' [ R (X'X/n)^{-1} Ω̂ (X'X/n)^{-1} R' ]^{-1} (Rβ̂ - r) ~a χ²(q)
4.7.2.2. Multiplicative heteroscedasticity. Suppose the model is

y_t = x_t'β + ε_t
σ_t² = E(ε_t²) = (z_t'γ)^δ

but the other classical assumptions hold. In this case ε_t² = (z_t'γ)^δ + v_t, where v_t has mean zero. Nonlinear least squares could be used to estimate γ and δ consistently, were ε_t observable; since it isn't, the OLS residuals ε̂_t are used in its place. The residuals have the same asymptotic properties for this purpose, so with γ̂ and δ̂ in hand we can estimate σ_t² consistently using

σ̂_t² = (z_t'γ̂)^δ̂ →p σ_t².

In the second step, we transform the model by dividing by the standard deviation:

y_t/σ̂_t = x_t'β/σ̂_t + ε_t/σ̂_t

or

y_t* = x_t*'β + ε_t*.

Asymptotically, this model satisfies the classical assumptions.
This model is a bit complex in that NLS is required to estimate the model of the variance. A simpler version would be

y_t = x_t'β + ε_t
σ_t² = E(ε_t²) = σ² z_t^δ

where z_t is a single positive variable. There are still two parameters in the variance model, and it is still nonlinear in the parameters. However, the search method can be used to avoid NLS. First, we define an interval of reasonable values for δ, e.g., δ ∈ [0, 3]. Partition this interval into M equally spaced values. For each value δ_m, calculate the variable z_t^{δ_m}. The regression

ε̂_t² = σ² z_t^{δ_m} + v_t

is linear in the parameters, conditional on δ_m, so it can be estimated by OLS. Save the error sum of squares ESS_m for each m, and choose the δ_m (and the corresponding σ̂²) that minimize ESS_m as the estimate.

Next, divide the model by the estimated standard deviations. One can refine the search around the chosen value. Draw picture. This works well when the parameter to be searched over is low dimensional, as in this case.
4.7.2.3. Groupwise heteroscedasticity. A common case is where we have repeated observations on each of a number of economic agents: e.g., 10 years of macroeconomic data on each of a set of countries or regions, or daily observations of transactions of 200 banks. This sort of data is a pooled cross-section time-series data set.

It may be reasonable to presume that the variance is constant over time within the cross-sectional units, but that it differs across them (e.g., firms or countries of different sizes...). The model is

y_it = x_it'β + ε_it
E(ε_it²) = σ_i², ∀t

where i = 1, 2, ..., G indexes the agents and t = 1, 2, ..., n indexes the observations on each agent. The other classical assumptions are presumed to hold, so in particular E(ε_it ε_is) = 0 for t ≠ s. In this case, the variance σ_i² is specific to each agent, but constant over the observations for that agent.

To correct for this form of heteroscedasticity, just estimate each σ_i² using the natural estimator

σ̂_i² = (1/n) Σ_{t=1}^{n} ε̂_it²

(Note the use of 1/n here rather than 1/(n - K): with few observations per group, n - K could be negative, and asymptotically the difference is unimportant.) With each group's data divided by its σ̂_i, the transformed model is

y_it/σ̂_i = x_it'β/σ̂_i + ε_it/σ̂_i

Do this for each cross-sectional group. This transformed model satisfies the classical assumptions, asymptotically.
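The two-step procedure can be sketched compactly. In this Python illustration there are 3 groups with different error variances; everything (group sizes, parameter values, seed) is mine, for illustration:

```python
import numpy as np

# Simulated groupwise-heteroscedastic data: same beta in every group,
# different error standard deviation per group.
rng = np.random.default_rng(6)
G, n = 3, 100
sigmas = np.array([0.5, 1.0, 3.0])           # true group std. deviations
x = rng.normal(size=(G, n))
y = 1.0 + 2.0 * x + sigmas[:, None] * rng.normal(size=(G, n))

# Step 1: pooled OLS, then estimate sigma_i^2 from each group's residuals.
Xp = np.column_stack([np.ones(G * n), x.ravel()])
b, *_ = np.linalg.lstsq(Xp, y.ravel(), rcond=None)
e = (y.ravel() - Xp @ b).reshape(G, n)
s2 = (e ** 2).mean(axis=1)                   # the 1/n estimator of sigma_i^2

# Step 2: divide each group's data by sigma_i_hat and run OLS again.
w = np.repeat(1.0 / np.sqrt(s2), n)          # per-observation weights
b_fgls, *_ = np.linalg.lstsq(Xp * w[:, None], y.ravel() * w, rcond=None)
assert np.allclose(b_fgls, [1.0, 2.0], atol=0.3)
```

The estimated s2 values should roughly reproduce the squares of the true standard deviations, and the second-step estimator is the feasible GLS estimator for this model.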
4.8. Example. The Nerlove cost function models give evidence of heteroscedasticity. In what follows, we're going to use the model with the constant and output coefficient varying across 5 groups, but with the input price coefficients fixed (see Equation 2.5.2). If you plot the residuals of this model, you obtain Figure 4.8.1. We can see pretty clearly that the error variance is larger for small firms than for larger firms. As part of your next Docència Tutoritzada project, you will use the White and Goldfeld-Quandt tests to confirm that homoscedasticity is strongly rejected.
(d) Create new variables AD and IQD that express height and IQ in deviations from their sample means.
(e) Estimate the model renda = b1 + b2*Dona + b3*AD + b4*(Dona*AD) + b5*IQD + e with the OLS estimator.
(f) Comment on the results.
(g) Check whether there is heteroscedasticity
(i) by plotting the residuals,
(ii) with the Goldfeld-Quandt test,
(iii) with White's test.
(h) Estimate again by OLS, but with robust standard errors. Compare the results with those from before.
(i) Estimate by Generalized Least Squares, supposing that there is groupwise heteroscedasticity; there are two groups, men and women. Comment on the results. (j) Estimate by Generalized Least Squares, using GRETL's "Corrección de heteroscedasticidad" option. Comment on the results. (2) Nerlove data (a) Re-estimate the model with dummy variables and interaction terms from the Primer Projecte de Docència Tutoritzada
  ln(cost) = Σ_{j=1}^{5} α_j d_j + Σ_{j=1}^{5} γ_j d_j ln(output) + Σ_k β_k ln(p_k) + ε
where the d_j are the group dummy variables and the p_k are the input prices.
(c) Plot the residuals, and comment on whether heteroscedasticity is detected. You should obtain a graph similar to Figure 4.8.1. (d) Estimate by Generalized Least Squares, using GRETL's "Corrección de heteroscedasticidad" option. Comment on the results.
CHAPTER 5
Autocorrelation
5.1. Introduction
Basic concepts and goals for learning. After studying the material, you should know the answers to the following questions: (1) What is autocorrelation (AUT)? (2) What are the properties of the OLS estimator when there is autocorrelation? (3) How can the presence of autocorrelation be detected? (4) How can we deal with autocorrelation if it is present?
Readings: Gujarati, Econometria.
5.2. Motivation
Autocorrelation is serial correlation of the error term:
  E(ε_t ε_s) ≠ 0 for t ≠ s
It is a problem that is usually associated with time series data, but it can also affect cross-sectional data. For example, a shock to oil prices will simultaneously affect all countries, so one could expect contemporaneous correlation of macroeconomic variables across countries. Seasonality is another common problem. Consider the Keeling-Whorf.gdt data. If we regress CO2 concentration on a time trend, we obtain the fitted line in Figure 5.2.1. The residuals from the same model are in Figure 5.2.2. In addition to a high frequency monthly pattern in the residuals, there
is a long term low frequency wave. It is clear that the errors of this model are not independent over time. This is an example of autocorrelation.
If you examine the residuals of the simple Nerlove model (equation 5.9.1), in Figure 4.8.1, you can also detect that there appears to be autocorrelation. In this chapter, we will explore the causes, effects and treatments for AUT.
5.3. Causes
Autocorrelation is the existence of correlation of the error term across observations:
  E(ε_t ε_s) ≠ 0, t ≠ s
Why might this occur? Plausible explanations include:
(1) Lags in adjustment to shocks. In a model such as
  y_t = x_t'β + ε_t
one could interpret ε_t as a shock that moves the system away from equilibrium. Suppose x_t is constant over a number of periods. If the time needed to return to equilibrium is long with respect to the observation frequency, one could expect ε_{t+1} to be positive, conditional on ε_t positive.
(2) Unobserved factors that are correlated over time. The error term is often assumed to correspond to unobservable factors. If these factors are correlated over time, there will be autocorrelation.
(3) Misspecification of the model. Suppose that the data generating process (DGP) is
  y_t = β_0 + β_1 x_t + β_2 x_t² + ε_t
but we estimate
  y_t = β_0 + β_1 x_t + ε_t
The effects are illustrated in Figure 5.3.1. A similar problem might explain the residuals of the simple Nerlove model, in Figure 4.2.1.
5.5. Corrections
There are many types of autocorrelation. The way to correct for the problem depends on the exact type of autocorrelation that exists. We'll consider two examples. The first is the most commonly encountered case: autoregressive order 1 (AR(1)) errors.
5.5.1. AR(1). The model is
  y_t = x_t'β + ε_t
  ε_t = ρ ε_{t−1} + u_t
  u_t ~ iid(0, σ_u²)
  E(ε_t u_s) = 0, t < s
We assume that the model satisfies the other classical assumptions, and that |ρ| < 1.
Since |ρ| < 1, ρ^m → 0 as m → ∞, so by repeated substitution we obtain
  ε_t = Σ_{m=0}^{∞} ρ^m u_{t−m}
With this, the variance of ε_t is found as
  E(ε_t²) = σ_u² Σ_{m=0}^{∞} ρ^{2m} = σ_u² / (1 − ρ²)
so
  V(ε_t) = σ_u² / (1 − ρ²)
The variance is the 0th-order autocovariance:
  γ_0 = V(ε_t)
The first-order autocovariance is
  Cov(ε_t, ε_{t−1}) = γ_1 = E[(ρ ε_{t−1} + u_t) ε_{t−1}] = ρ V(ε_{t−1}) = ρ σ_u² / (1 − ρ²)
and in general, for s < t,
  Cov(ε_t, ε_{t−s}) = γ_s = ρ^s σ_u² / (1 − ρ²)
The autocovariances don't depend on t: the process {ε_t} is covariance stationary.
The correlation of two random variables x and y is, in general,
  corr(x, y) = cov(x, y) / (se(x) se(y))
but in this case the two standard errors are the same, so the s-order autocorrelation ρ_s is
  ρ_s = ρ^s
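These results, V(ε_t) = σ_u²/(1 − ρ²) and ρ_s = ρ^s, are easy to check by simulation. The following is a minimal sketch assuming NumPy; the values ρ = 0.8 and σ_u = 1 are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma_u, T = 0.8, 1.0, 200_000

# simulate eps_t = rho*eps_{t-1} + u_t, starting from the stationary distribution
u = rng.normal(scale=sigma_u, size=T)
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1 - rho**2)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]

# sample variance vs. sigma_u^2/(1-rho^2), sample autocorrelations vs. rho^s
print(eps.var(), sigma_u**2 / (1 - rho**2))
for s in (1, 2, 3):
    print(s, np.corrcoef(eps[s:], eps[:-s])[0, 1], rho**s)
```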
All this means that the overall covariance matrix of the error vector is

  Σ = (σ_u² / (1 − ρ²)) ×

      ⎡ 1         ρ         ρ²    ···   ρ^(n−1) ⎤
      ⎢ ρ         1         ρ     ···   ρ^(n−2) ⎥
      ⎢ ρ²        ρ         1           ⋮       ⎥
      ⎢ ⋮                       ⋱       ρ       ⎥
      ⎣ ρ^(n−1)   ρ^(n−2)   ···   ρ     1       ⎦

so the whole matrix depends on only two parameters, ρ and σ_u².
If we can consistently estimate these two parameters, we can apply FGLS. It turns out that it's easy to do so. The steps are:
(1) Estimate the model
  y_t = x_t'β + ε_t
by OLS.
(2) Take the residuals ε̂_t and estimate the regression
  ε̂_t = ρ ε̂_{t−1} + û_t
Since ε̂_t →p ε_t, this regression is asymptotically equivalent to the regression
  ε_t = ρ ε_{t−1} + u_t
Therefore, the estimate ρ̂ obtained by applying OLS to
  ε̂_t = ρ ε̂_{t−1} + û_t
is consistent: ρ̂ →p ρ. Likewise, since û_t →p u_t, the estimator
  σ̂_u² = (1/n) Σ_{t=2}^{n} û_t² →p σ_u²
(3) With the consistent estimators ρ̂ and σ̂_u², form
  Σ̂ = Σ(ρ̂, σ̂_u²)
using the structure given above, with leading factor σ̂_u² / (1 − ρ̂²), and calculate the FGLS estimator
  β̂_FGLS = (X'Σ̂⁻¹X)⁻¹ (X'Σ̂⁻¹y)
An asymptotically equivalent approach is to apply OLS to the quasi-differenced model
  y_t − ρ̂ y_{t−1} = (x_t − ρ̂ x_{t−1})'β + u_t*, t = 2, ..., n
which uses n − 1 observations (since y_0 and x_0 aren't available). This is the method of Cochrane and Orcutt. Dropping the first observation is asymptotically irrelevant, but it can matter in small samples. One can recuperate the first observation by putting
  y*_1 = y_1 √(1 − ρ̂²)
  x*_1 = x_1 √(1 − ρ̂²)
This is the Prais-Winsten method. Note that the variance of y*_1 is σ_u², asymptotically, so we see that the transformed model will be homoscedastic (and nonautocorrelated, since the u's are uncorrelated with the y's, in different time periods).
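The whole procedure (OLS, then ρ̂ from a regression of residuals on lagged residuals, then quasi-differencing with the first observation rescaled) can be sketched as follows. This is a sketch with simulated data, assuming NumPy; the parameter values are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta, rho = 2000, np.array([2.0, 1.5]), 0.7

# simulate y_t = x_t'beta + eps_t with AR(1) errors
X = np.column_stack([np.ones(T), rng.normal(size=T)])
u = rng.normal(size=T)
eps = np.empty(T)
eps[0] = u[0] / np.sqrt(1 - rho**2)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]
y = X @ beta + eps

# step 1: OLS; step 2: estimate rho from e_t = rho*e_{t-1} + u_t
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b_ols
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# step 3: quasi-difference; Cochrane-Orcutt drops t = 1,
# Prais-Winsten keeps it, rescaled by sqrt(1 - rho_hat^2)
c = np.sqrt(1 - rho_hat**2)
ys = np.concatenate([[y[0] * c], y[1:] - rho_hat * y[:-1]])
Xs = np.vstack([X[0] * c, X[1:] - rho_hat * X[:-1]])
b_pw, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(rho_hat, b_pw)
```

The transformed regression is then an ordinary OLS problem, which is the point of the correction.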
5.5.2. MA(1). The model is
  y_t = x_t'β + ε_t
  ε_t = u_t + φ u_{t−1}
  u_t ~ iid(0, σ_u²)
  E(ε_t u_s) = 0, t < s
In this case,
  V(ε_t) = γ_0 = E[(u_t + φ u_{t−1})²] = σ_u² + φ² σ_u² = σ_u² (1 + φ²)
Similarly, the first-order autocovariance is
  γ_1 = E[(u_t + φ u_{t−1})(u_{t−1} + φ u_{t−2})] = φ σ_u²
and the higher-order autocovariances are zero: γ_s = 0, s > 1,
so in this case the covariance matrix has the banded form

  Σ = σ_u² ×

      ⎡ 1+φ²   φ      0     ···   0    ⎤
      ⎢ φ      1+φ²   φ           ⋮    ⎥
      ⎢ 0      φ      ⋱           0    ⎥
      ⎢ ⋮             ⋱           φ    ⎥
      ⎣ 0      ···    0     φ     1+φ² ⎦

The first-order autocorrelation is
  ρ_1 = γ_1 / γ_0 = φ σ_u² / (σ_u² (1 + φ²)) = φ / (1 + φ²)
This achieves a maximum at φ = 1 and a minimum at φ = −1, and the
maximal and minimal autocorrelations are 1/2 and −1/2. Therefore, series that are more strongly autocorrelated can't be MA(1) processes. Again, the covariance matrix has a simple structure that depends on only two parameters. The problem in this case is that one can't estimate φ using OLS on
  ε̂_t = u_t + φ u_{t−1}
because the u_t are unobservable. However, following a LLN, the sample variance of the residuals converges:
  (1/n) Σ_{t=1}^{n} ε̂_t² →p V(ε_t) = σ_u² (1 + φ²)
By the Slutsky theorem, we can interpret this as defining an (unidentified) estimator of both σ_u² and φ:
  σ̂_u² (1 + φ̂²) = (1/n) Σ_{t=1}^{n} ε̂_t²
However, this isn't sufficient to define consistent estimators of the two parameters, since it's unidentified: one equation cannot determine two unknowns.
To solve this, we can also estimate the covariance of ε_t and ε_{t−1}, using
  Ĉov(ε_t, ε_{t−1}) = (1/n) Σ_{t=2}^{n} ε̂_t ε̂_{t−1}
This is a consistent estimator, following a LLN (and given that the ε̂_t are consistent for the ε_t). As above, this can be interpreted as defining a second unidentified estimator:
  φ̂ σ̂_u² = (1/n) Σ_{t=2}^{n} ε̂_t ε̂_{t−1}
Now solve these two equations to obtain identified (and therefore consistent) estimators of both φ and σ_u².
Define the consistent estimator of the covariance matrix
  Σ̂ = Σ(φ̂, σ̂_u²)
following the form we've seen above, and transform the model using the Cholesky decomposition. The transformed model satisfies the classical assumptions asymptotically.
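The two moment equations can be solved explicitly: writing m0 = σ̂_u²(1 + φ̂²) and m1 = φ̂ σ̂_u², the ratio m0/m1 = (1 + φ̂²)/φ̂ gives a quadratic in φ̂ whose two roots are φ̂ and 1/φ̂, so the root with |φ̂| < 1 (the invertible one) is the one to keep. A sketch with simulated data, assuming NumPy; the value φ = 0.5 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(3)
phi, sigma_u, T = 0.5, 1.0, 100_000

# simulate MA(1) errors: eps_t = u_t + phi*u_{t-1}
u = rng.normal(scale=sigma_u, size=T + 1)
eps = u[1:] + phi * u[:-1]

# sample moments: m0 -> sigma_u^2*(1 + phi^2), m1 -> phi*sigma_u^2
m0 = np.mean(eps ** 2)
m1 = np.mean(eps[1:] * eps[:-1])

# m0/m1 = (1 + phi^2)/phi, so phi solves phi^2 - (m0/m1)*phi + 1 = 0;
# the roots are phi and 1/phi, so keep the invertible root with |phi| < 1
roots = np.roots([1.0, -m0 / m1, 1.0])
phi_hat = float(roots[np.abs(roots) < 1][0].real)
sig2_hat = m1 / phi_hat
print(phi_hat, sig2_hat)
```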
5.6. Valid inferences with autocorrelation of unknown form
There exist covariance matrix estimators that remain valid when there is AUT, or both HET and AUT. The details are beyond the scope of this course. It is important to remember that a correction for autocorrelation will only give an efficient estimator and valid test statistics if the model of autocorrelation is correct. It may be hard to determine which is the correct model for the autocorrelation of the errors, so one may prefer to forgo the GLS correction and simply use OLS. If this is done, one needs to account for the existence of AUT when estimating the covariance of the parameters, to obtain correct test statistics. We will see examples in the Projecte de Docència Tutoritzada.
5.7. Testing for autocorrelation
The Breusch-Godfrey test regresses the OLS residuals on the regressors and on P lags of the residuals:
  ε̂_t = x_t'γ + ψ_1 ε̂_{t−1} + ψ_2 ε̂_{t−2} + ··· + ψ_P ε̂_{t−P} + v_t
and the test statistic is nR² from this auxiliary regression, which is asymptotically distributed as χ²(P) under the null hypothesis of no autocorrelation. The intuition is that the lagged errors shouldn't contribute to explaining the current error if there is no autocorrelation.
Note that the x_t are not assumed to be nonstochastic: this test is valid even if the regressors are stochastic and contain lagged dependent variables.
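The test can be sketched directly. This is a sketch assuming NumPy; the data-generating values are invented for the example, and 3.84 is the 5% critical value of the χ²(1) distribution:

```python
import numpy as np

def breusch_godfrey(y, X, P):
    """n*R^2 from regressing OLS residuals on x_t and P lags of the residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    T = len(e)
    lags = np.column_stack([e[P - j:T - j] for j in range(1, P + 1)])
    Z = np.hstack([X[P:], lags])          # auxiliary regressors
    et = e[P:]
    g, *_ = np.linalg.lstsq(Z, et, rcond=None)
    v = et - Z @ g
    R2 = 1 - (v @ v) / ((et - et.mean()) @ (et - et.mean()))
    return (T - P) * R2

rng = np.random.default_rng(5)
T = 400
X = np.column_stack([np.ones(T), rng.normal(size=T)])
u = rng.normal(size=T)
eps = np.empty(T)
eps[0] = u[0]
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + u[t]

stat_ar = breusch_godfrey(X @ np.array([1.0, 2.0]) + eps, X, P=1)
stat_iid = breusch_godfrey(X @ np.array([1.0, 2.0]) + rng.normal(size=T), X, P=1)
print(stat_ar, stat_iid)  # compare with the chi^2(1) 5% critical value, 3.84
```

With AR(1) errors the statistic is far above the critical value; with iid errors it is typically small.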
The alternative is not that the model is an AR(P): following the argument above, the alternative is simply that some or all of the first P autocorrelations are different from zero. This is compatible with many specific forms of autocorrelation.
5.8. Lagged dependent variables and autocorrelation: A Caution
The OLS estimator is consistent when plim X'ε/n = 0. This will be the case when E(X'ε) = 0, following a LLN. An important exception is the case where X contains lagged dependent variables and the errors are autocorrelated. A simple example is the case of a single lag of the dependent variable with AR(1) errors. The model is
  y_t = x_t'β + γ y_{t−1} + ε_t
  ε_t = ρ ε_{t−1} + u_t
Now we can write
  E(y_{t−1} ε_t) = E[y_{t−1} (ρ ε_{t−1} + u_t)] = ρ E(y_{t−1} ε_{t−1})
which contains the term ρ E(ε²_{t−1}) ≠ 0, since y_{t−1} depends directly on ε_{t−1}. Since E(X'ε) ≠ 0, plim X'ε/n ≠ 0, and
  plim β̂ = β + plim (X'X/n)⁻¹ (X'ε/n) ≠ β
so the OLS estimator is inconsistent in this case. One needs to estimate by instrumental variables (IV). This is a topic that is beyond the scope of this course. It is important to be aware of the possibility that the OLS estimator can be inconsistent, though.
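A small simulation illustrates the inconsistency. This is a sketch assuming NumPy; for simplicity the x_t regressor is dropped, so the model is y_t = γ y_{t−1} + ε_t with γ = 0.5 and ρ = 0.5; in this special case the plim of the OLS estimate can be shown to be (γ + ρ)/(1 + γρ) = 0.8, and the bias does not vanish as T grows:

```python
import numpy as np

rng = np.random.default_rng(7)
gamma, rho = 0.5, 0.5

def ols_gamma(T):
    """OLS estimate of gamma in y_t = gamma*y_{t-1} + eps_t with AR(1) errors."""
    u = rng.normal(size=T)
    eps = np.empty(T)
    y = np.empty(T)
    eps[0] = u[0]
    y[0] = u[0]
    for t in range(1, T):
        eps[t] = rho * eps[t - 1] + u[t]
        y[t] = gamma * y[t - 1] + eps[t]
    # regress y_t on y_{t-1} (no other regressors, for simplicity)
    return (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])

# the estimate converges to (gamma + rho)/(1 + gamma*rho) = 0.8, not to 0.5
for T in (1_000, 100_000):
    print(T, ols_gamma(T))
```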
(5.9.1)
  ln(cost) = Σ_{j=1}^{5} α_j d_j + Σ_{j=1}^{5} γ_j d_j ln(output) + Σ_k β_k ln(p_k) + ε
where the d_j are the group dummy variables and the p_k are the input prices,
which was presented in Section 2.5.3. (3) With the Keeling-Whorf.gdt data: (a) estimate the model
  CO2_t = β_1 + β_2 t + ε_t
(b) check whether there is autocorrelation using the Breusch-Godfrey test. (c) Plot the residuals. (d) Re-estimate the model using the Cochrane-Orcutt and Prais-Winsten methods, and plot the residuals.
CHAPTER 6
Data sets
This chapter gives links to the data sets referred to in the Study Guide:
Wisconsin height-income data (comma separated values)
Wisconsin height-income data (Gretl data file)
Nerlove data (Excel spreadsheet file)
Nerlove data (Gretl data file)
Keeling-Whorf CO2 data (Gretl data file)
Cigarette-Alcohol Mortality data (Gretl data file)