
Econometrics

Professor Robert H. Patrick


Department of Finance and Economics
Rutgers Business School – Newark and New Brunswick, Rutgers University

1. Review of regression and hypothesis testing


2. Introduction to forecasting and the standard error of the forecast

This summary is intended to help you learn key aspects of econometrics. There is no need to memorize the formulas; spend your energy on understanding the concepts and how they are used in economic analysis.

Ordinary Least Squares (OLS) estimators for the simple (1 X variable) model
We use economic theory to specify a model, e.g.,
yt = β1 + β2 xt + et , t = 1,...,T , (1)

where yt is the variable to be predicted (or explained, the dependent variable),
xt is the independent (explanatory) variable used to predict yt ,
T is the number of data observations on each of yt and xt ,
β1 is the constant term,
β2 is the slope parameter, and
et is the error term (what is unexplained by the model).

Regardless of the number of x variables or the form of the model being estimated, y is decomposed into what we can explain and what we can't:

yt = β1 + β2 xt + et , t = 1,...,T ,

where yt is the total, β1 + β2 xt is the explained part, and et is the unexplained part.

Given the T observations of data, OLS regression is used to estimate the parameters β1 and β2 . β̂1 and β̂2 represent the estimated parameter values.1

1 For example, Rt^MSFT − RFt = β̂1 + β̂2 ( RMt − RFt ) = 0.0175 + 1.0138 ( RMt − RFt ), where β̂1 = 0.0175 and β̂2 = 1.0138 are from the MSFT CAPM.
Since the et , t = 1,...,T , are not observed, they are estimated by the residuals. The residual (predicted or forecast error) êt is

êt = yt − β̂1 − β̂2 xt = yt − ŷt , t = 1,...,T ,

where ŷt is the predicted (or forecast) value of yt , which is conditional on the value of xt , i.e.,

ŷt = E( yt | xt ) = β̂1 + β̂2 xt . (2)

OLS parameters (coefficients), β̂1 and β̂2 , are chosen to minimize the sum of squared errors,

SSE = ∑_{t=1}^{T} êt² = ∑_{t=1}^{T} ( yt − ŷt )² = ∑_{t=1}^{T} ( yt − β1 − β2 xt )² .

OLS parameter estimates for the simple model are then

β̂2 = [ T ∑_{t=1}^{T} xt yt − ∑_{t=1}^{T} xt ∑_{t=1}^{T} yt ] / [ T ∑_{t=1}^{T} xt² − ( ∑_{t=1}^{T} xt )² ]
   = ∑_{t=1}^{T} ( xt − x̄ )( yt − ȳ ) / ∑_{t=1}^{T} ( xt − x̄ )²
   = cov( xt , yt ) / var( xt )        (slope)

β̂1 = ȳ − β̂2 x̄        (intercept, constant term)
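These closed-form expressions map directly to code. Below is a minimal numpy sketch; the simulated data and the array names x and y are illustrative assumptions, not part of the notes:

```python
import numpy as np

# Hypothetical data for illustration: T observations on x and y
rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 0.02 + 1.0 * x + rng.normal(scale=0.5, size=T)

# OLS estimates for the simple model, using the closed-form formulas
x_bar, y_bar = x.mean(), y.mean()
beta2_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope
beta1_hat = y_bar - beta2_hat * x_bar                                     # intercept

print(beta1_hat, beta2_hat)
```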

OLS parameter variances, covariances, and standard errors for the simple model are

vâr( β̂1 ) = σ̂² ∑_{t=1}^{T} xt² / [ T ∑_{t=1}^{T} ( xt − x̄ )² ] ,     SE( β̂1 ) = [ vâr( β̂1 ) ]^{1/2} ,

vâr( β̂2 ) = σ̂² / ∑_{t=1}^{T} ( xt − x̄ )² ,     SE( β̂2 ) = [ vâr( β̂2 ) ]^{1/2} ,

côv( β̂1 , β̂2 ) = σ̂² ( −x̄ ) / ∑_{t=1}^{T} ( xt − x̄ )² ,

where σ̂ is the standard error of the regression.

Standard error of the regression:

σ̂ = [ ∑_{t=1}^{T} êt² / ( T − K ) ]^{1/2}

This is a measure of the variation in the residuals (predicted errors) of the model. K = the number of parameters estimated in the model, e.g., K = 2 in model (1), the β̂1 and β̂2 parameters.
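A sketch of how σ̂ and the parameter standard errors are computed from the residuals, using the same hypothetical data as the previous snippet (np.polyfit is used only as a shortcut for the closed-form estimates):

```python
import numpy as np

# Hypothetical data, as in the earlier sketch
rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 0.02 + 1.0 * x + rng.normal(scale=0.5, size=T)

beta2_hat, beta1_hat = np.polyfit(x, y, 1)        # slope, intercept
resid = y - (beta1_hat + beta2_hat * x)           # residuals ê_t

K = 2                                             # parameters estimated
sigma_hat = np.sqrt(np.sum(resid**2) / (T - K))   # standard error of the regression

Sxx = np.sum((x - x.mean())**2)
var_b1 = sigma_hat**2 * np.sum(x**2) / (T * Sxx)  # var(beta1_hat)
var_b2 = sigma_hat**2 / Sxx                       # var(beta2_hat)
se_b1, se_b2 = np.sqrt(var_b1), np.sqrt(var_b2)   # parameter standard errors
cov_b1b2 = -sigma_hat**2 * x.mean() / Sxx         # cov(beta1_hat, beta2_hat)
```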

Coefficient of determination R²

If there is a constant term in the regression model, then 0 ≤ R² ≤ 1 and R² can be interpreted as the proportion of the variation in y that is explained by the model.

yt = β1 + β2 xt + et , t = 1,...,T ,

where, as before, yt is the total, β1 + β2 xt is the explained part, and et is the unexplained part.

The regression equation separates yt into explained (systematic) and unexplained (random, noise, unsystematic) components.

So for each observation we have

yt − ȳ = total variation (the observed y less the sample mean of y),
ŷt − ȳ = explained component of the total,
yt − ŷt = êt , the unexplained component of the total.

Together we have yt − ȳ = ( ŷt − ȳ ) + êt .

Squaring and summing over the sample we have

∑_{t=1}^{T} ( yt − ȳ )² = ∑_{t=1}^{T} ( ŷt − ȳ )² + ∑_{t=1}^{T} êt² ,

so

TSS = XSS + ESS (note that MS Excel prints these values in the ANOVA table, the SS column), where

TSS ≡ total sum of squares,
XSS ≡ explained (regression) sum of squares,
ESS ≡ error (residual) sum of squares.

R² ≡ XSS / TSS = 1 − ESS / TSS
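The TSS = XSS + ESS decomposition and R² in code, again on the hypothetical data from the earlier snippets:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 0.02 + 1.0 * x + rng.normal(scale=0.5, size=T)

beta2_hat, beta1_hat = np.polyfit(x, y, 1)
y_hat = beta1_hat + beta2_hat * x

TSS = np.sum((y - y.mean())**2)      # total sum of squares
XSS = np.sum((y_hat - y.mean())**2)  # explained (regression) sum of squares
ESS = np.sum((y - y_hat)**2)         # error (residual) sum of squares

R2 = XSS / TSS                       # equivalently 1 - ESS/TSS
print(TSS, XSS + ESS, R2)
```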

It is only valid to use R² to compare regressions when

a. Y is exactly the same (i.e., no transformations of Y in the different regressions being compared),
b. the regressions have exactly the same number of X variables, and
c. the regressions being compared all include a constant or all exclude the constant.

If the goal is to obtain the highest R², which is the objective in many stepwise regressions, one can always obtain R² = 1 as long as a constant term is included in the regression. To obtain this upper bound with any Y and X data, just estimate

Yt = β1 + β2 Xt + β3 Xt² + ... + βT Xt^{T−1} ,

i.e., a polynomial in Xt with as many parameters as observations.

t-statistic

The t-statistic can be used to test the null hypothesis H0 : β = c. It is given by

t = ( β̂ − c ) / SE( β̂ ) ∼ t_{p, df} ,

where c is a constant (whatever value you want to compare β to), p is the significance level (usually 0.05), and the degrees of freedom for the model are df = T − K, with K = the number of parameters estimated in the model (so far K = 2 because we are estimating β̂1 and β̂2 ).

The t-statistic presented with regression results is to test the null hypothesis H 0 : β =0 .

The shape of the Student’s t distribution is determined by the degrees of freedom.

[Figure: Student's t distribution with right-tail probabilities.]
Alternatively, the same hypothesis can be tested using a confidence interval for β. For example, the 95% confidence interval for β is given by

β̂ ± t.05,T−K SE( β̂ ) , i.e., β̂ − t.05,T−K SE( β̂ ) ≤ β ≤ β̂ + t.05,T−K SE( β̂ ) .

If the null hypothesis is, e.g., H0 : β = 0, then reject the null hypothesis if 0 lies outside the calculated bounds (equivalently, do not reject if 0 is within them).
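A sketch of the t-test and the corresponding confidence interval using scipy's Student-t distribution; the estimate, standard error, and sample size below are hypothetical numbers chosen for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical estimate, standard error, and sample size
beta_hat, se_beta = 1.0138, 0.05
T, K, c = 100, 2, 0.0
df = T - K

t_stat = (beta_hat - c) / se_beta                  # test H0: beta = c
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))   # two-tailed p-value

t_crit = stats.t.ppf(0.975, df)                    # two-tailed 5% critical value
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)  # 95% CI

reject = not (ci[0] <= c <= ci[1])                 # reject H0 if c lies outside the CI
print(t_stat, p_value, ci, reject)
```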

OLS assumptions for the simple (single x variable) linear (in parameters) regression
model:

1. yt = β1 + β 2 xt + et , t = 1,..., T

2. E ( et ) = 0 if and only if E ( yt ) = β1 + β2 xt

3. var ( et ) = σ 2 < ∞

4. cov (et , es ) = 0 for all s, t = 1,..., T , s ≠ t

5. xt is not constant and cov (xt , et ) = 0 for all t = 1,..., T

If assumptions 1-5 hold, then the OLS estimator is the Best Linear Unbiased Estimator (BLUE) of the parameters β1 and β2 . Linear and unbiased apply to the parameter estimates; best refers to the efficiency of the estimates and hence of their standard errors.

BLUE implies that the OLS estimators have the smallest variance of all linear (in
parameters) and unbiased estimators (see the Gauss-Markov Theorem for the proof).

6. et ∼ N ( 0, σ² ), the error term is normally distributed with mean zero and variance sigma-squared.

Assumption 6 is needed for hypothesis testing and confidence intervals.

Implications of violating any of these assumptions on hypothesis tests and forecasting are
covered after we consider adding additional X variables to the model.

Multiple regression: adding more X variables to the model

Multiple regression involves more than one X variable in the model, e.g.,

yt = β1 + β2 x2t + ... + βK xKt + et , t = 1,...,T .

Regardless of the number of x variables or the form of the model being estimated, y is decomposed into what we can explain and what we can't, as above.

Why might we want to include more than 1 X variable?

1. Theoretical and empirical interest. There may be more than one variable that explains
Y, i.e., additional X variables should be in the model (such as in APT or multifactor asset
pricing models, demand and supply, etc.).

2. Statistical validity of estimated parameters. Excluding relevant X variables that are correlated with at least one X variable remaining in the model may lead to bias in the estimated parameters. Suppose x2t is excluded from the above model and cov( xit , x2t ) ≠ 0 for some retained xit . Then x2t is captured by the error term in the model, which implies cov( xit , et ) ≠ 0. This violates assumption 5, leading to bias in the estimated βi .

Multiple regression requires some modification of the OLS assumptions:


a. Assumption 1 becomes yt = β1 + β 2 x2t + ... + β K x Kt + et , t = 1,...,T (we’ve

expanded the number of X variables from 1 to K-1).

b. Assumption 2 becomes E ( et ) = 0 if and only if E ( yt ) = β1 + β2 x2t + ... + βK xKt .

c. Assumption 5 becomes cov ( xkt , et ) = 0 for all k = 2,..., K, t = 1,...,T , and there is no exact linear relationship between two or more of the X variables (i.e., no exact multicollinearity).
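In practice the multiple regression is estimated with a statistics library rather than by hand. A minimal sketch using statsmodels, with simulated data and variable names that are assumptions for illustration only:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: T observations on y and two explanatory variables
rng = np.random.default_rng(1)
T = 200
x2 = rng.normal(size=T)
x3 = rng.normal(size=T)
y = 0.5 + 1.2 * x2 - 0.8 * x3 + rng.normal(scale=0.7, size=T)

X = sm.add_constant(np.column_stack([x2, x3]))  # adds the constant term (beta_1)
results = sm.OLS(y, X).fit()

print(results.params)    # beta_1_hat, beta_2_hat, beta_3_hat
print(results.bse)       # parameter standard errors
print(results.rsquared)  # R^2
print(results.summary())
```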

Caveat with hypothesis tests of a single restriction (e.g., H0 : βk = 0) in multiple regression: with multiple X variables there is often correlation across these variables, called multicollinearity. If this correlation is strong enough, we may not be able to separate the individual effects of the collinear variables (although we can validly consider the collinear variables jointly with an F-test). Multicollinearity is not a violation of the above OLS assumptions, but it may bring into question the validity of individual parameter tests when the null hypothesis is not rejected. It has no impact on forecasting y, only on hypothesis testing, e.g., when individual parameter t-tests in a multiple regression fail to reject a zero null.

F-test

Allows testing of joint restrictions. The F-statistic is calculated as follows:

F = [ ( ESSr − ESSu ) / m ] / [ ESSu / ( T − K ) ] ∼ F_{m, T−K} ,

where ESSi , i = r, u, is the error sum of squares for the restricted and unrestricted model, respectively, and m is the number of restrictions specified in the null hypothesis. The statistic is distributed according to the F distribution with m degrees of freedom for the numerator and T − K degrees of freedom for the denominator.

Consider the unrestricted model yt = β1 + β2 x2t + ... + βK xKt + et , t = 1,...,T . As an example (we can test any subset of the parameters), suppose we are interested in testing the null hypothesis

H0 : β2 = ... = βK = 0

versus the alternative

H1 : β2 ,..., and/or βK ≠ 0.

The F statistic printed with the regression output tests this null. Apply the null hypothesis to the unrestricted model to obtain the restricted model yt = β1 + et , t = 1,...,T , and estimate it to obtain ESSr . The degrees of freedom for the numerator, the difference between the two models, is given by the difference in the number of parameters estimated in the unrestricted and restricted models, and is K − 1 for the example above. The degrees of freedom for the denominator are the same as the degrees of freedom for the unrestricted model, T − K.

If the calculated F exceeds the critical value F_{sig. level, m, T−K} , then reject the null hypothesis.

[Figure: F critical values for significance level 0.05 (i.e., right-tail area equal to 0.05).]

Note the degrees of freedom for the numerator, df1, and for the denominator, df2, used to determine the critical value. Changing the order of the degrees of freedom changes the distribution (and then the test is not valid, so don't do it).

The F-test can be used for testing linear (in parameters) restrictions on any subset of

parameters in the model. It will also be useful in testing the OLS assumptions above.
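A sketch of the joint F-test, computing the statistic from the restricted and unrestricted error sums of squares and comparing it with scipy's F critical value (the data and names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(1)
T = 200
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 0.5 + 1.2 * x2 - 0.8 * x3 + rng.normal(scale=0.7, size=T)

X_u = sm.add_constant(np.column_stack([x2, x3]))   # unrestricted model
X_r = np.ones((T, 1))                              # restricted model: y_t = beta_1 + e_t
res_u = sm.OLS(y, X_u).fit()
res_r = sm.OLS(y, X_r).fit()

K = X_u.shape[1]            # parameters in the unrestricted model
m = K - 1                   # number of restrictions (all slopes equal zero)
ESS_u, ESS_r = res_u.ssr, res_r.ssr

F = ((ESS_r - ESS_u) / m) / (ESS_u / (T - K))
F_crit = stats.f.ppf(0.95, m, T - K)               # 5% critical value
reject = F > F_crit                                # reject H0 if F exceeds the critical value

# For this particular null, statsmodels reports the same statistic:
print(F, res_u.fvalue, F_crit, reject)
```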

Violations of assumption(s)

1. Bias and inconsistency (assumptions 1, 2, and/or 5)


Ramsey RESET and (Durbin-Wu) Hausman tests will be used to test for violations of (1),
(2), and (5), which can cause biased parameter estimates. Note (2) is not an assumption
if a constant term is included in the model.

2. Efficiency (assumptions 3 and/or 4)

LM test for autocorrelation is used to test for violation of (4), which can cause inefficient
standard errors.

White test for heteroskedasticity can be used to test for violation of (3), which can cause
inefficient standard errors.
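For reference, statsmodels ships implementations of several of these diagnostics (it also provides a RESET test, not shown here). A minimal sketch on the hypothetical regression from the earlier snippet:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey, het_white

rng = np.random.default_rng(1)
T = 200
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 0.5 + 1.2 * x2 - 0.8 * x3 + rng.normal(scale=0.7, size=T)
X = sm.add_constant(np.column_stack([x2, x3]))
results = sm.OLS(y, X).fit()

# LM (Breusch-Godfrey) test for autocorrelation in the errors (assumption 4)
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(results, nlags=4)

# White test for heteroskedasticity (assumption 3)
w_stat, w_pval, wf_stat, wf_pval = het_white(results.resid, X)

print(lm_pval, w_pval)   # small p-values signal a violation
```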

Potential sources of error in hypothesis testing:

Type I errors (rejecting the null when it is true) and Type II errors (not rejecting the null when it is false) may result from violations of the OLS assumptions, i.e., model specification issues:

1. Violations of OLS assumptions 1, 2, and/or 5 tend to lead to biased parameter estimates.
2. Violations of assumptions 3 and/or 4 tend to lead to inefficient parameter standard errors.
3. Multicollinearity is not a violation of the above OLS assumptions, but it may bring into question the validity of individual (single restriction) parameter tests when the null hypothesis is not rejected.

The only way to reduce the probability of both Type I and Type II errors is to increase the sample size; otherwise, for any given sample size, there is always a tradeoff between them.

FORECASTING (prediction) with a single equation model

The predicted (or forecast) y for future time periods is given by

ŷT+τ = E( yT+τ | xT+τ ) = β̂1 + β̂2 xT+τ , τ = 1,..., N, (3)

where N is the number of time periods into the future that y is being forecast. For example, if τ = 1 then we are calculating the one period ahead forecast, ŷT+1 .

Forecasting is primarily useful for the following purposes:


1. Predict the future, e.g., ŷT +1 .
2. Determine the effect of changing policy variables, x , on y.
3. Validating models (models are always abstractions; how well do they predict?).

Types of forecasting

Unconditional forecast: explanatory variables are known with certainty (they are
observed).
• Ex post forecast is unconditional (values of explanatory variables are observed).
• Ex ante forecast may be unconditional if explanatory variables are lagged. If
there are no lagged explanatory variables then all ex ante forecasts are
conditional.

Conditional forecast: at least one explanatory variable is not known (i.e., we must
forecast at least one explanatory variable in order to forecast ŷ ). In this case the forecast
of ŷ is said to be conditional on the forecast of the x variables.

Example:

[Timeline: estimation period (e.g., Jan. 1990-Dec. 1998), ex post forecast period (e.g., Jan. 1999-Dec. 1999), ex ante forecast period thereafter.]

Time t = [ 0,…,t1 ] is the sample period used in estimating the model (e.g., Jan. 1990-Dec. 1998 in the timeline above).

t = ( t1 ,…,T ] is the ex post forecast period, during which we observe the values of the explanatory variables as well as y, which is necessary for simulation and evaluation of forecasts (e.g., Jan. 1999-Dec. 1999).

t = (T ,…] is the ex ante forecast period, which will require estimates of the explanatory variables, unless they are lagged.

For simulation and evaluation of forecasts, data on the forecast variables are necessary so that we can compare the forecast value of y to the actual value of y. Consider the timeline above: suppose monthly forecasts are made for 1999. Once at least part of 1999 has passed and actual values of y are observed, the forecasting model can be assessed for accuracy and, if necessary, revised.

Potential sources of error in forecasting:

1. Model specification: violations of OLS assumptions 1, 2, and/or 5 may lead to biased parameter estimates; violations of assumptions 3 and/or 4 may lead to inefficient parameter standard errors.
2. Error associated with the error term in the model (measured by the standard error of the regression).
3. Errors associated with the estimated parameters in the model (measured by the standard errors of the estimated regression parameters).
4. Some forecasts may be conditional on predicted values of the x variables, and there may be errors in these predicted x values.

Conditional on the appropriate model specification, the forecast standard error, used in calculating bounds around forecast point estimates, accounts for error sources 2 and 3. If forecasts are conditional, then the forecast model should be specified as a system that includes the prediction of the X variables upon which the forecast is conditional; this would then account for error source 4.

Forecast standard error

For the simple regression model, the standard error around the forecast ŷT+τ is given by

σ̂f = { σ̂² [ 1 + 1/T + ( x0 − x̄ )² / ∑_{t=1}^{T} ( xt − x̄ )² ] }^{1/2} , (4)

where x0 = xT+τ is the value of x at which the forecast is made. This expression is only for simple regression (1 X variable) models and includes error from the error term in the model (the standard error of the regression) and error associated with the parameter estimates (parameter standard errors).

EXPRESSION (4) IS NOT VALID IF THERE IS MORE THAN 1 X VARIABLE. (4) is a special case of a more complicated expression that includes terms for multiple X variables. To consider more than one X variable, the forecast standard error expression in (4) has to be expanded to include analogous terms for each X and covariance terms between all the β associated with the X variables in the model. The prediction or forecast standard error for the multiple regression model is

σ̂f = [ σ̂² ( 1 + 1/T ) + ∑_{i=1}^{K−1} ∑_{j=1}^{K−1} ( xi0 − x̄i )( xj0 − x̄j ) côv( β̂i , β̂j ) ]^{1/2} , (5)

where the sums run over the K − 1 slope coefficients and xi0 is the forecast-period value of the i-th X variable.

For conditional mean forecasts, drop the 1 from expression (5), so

σ̂f = [ σ̂² / T + ∑_{i=1}^{K−1} ∑_{j=1}^{K−1} ( xi0 − x̄i )( xj0 − x̄j ) côv( β̂i , β̂j ) ]^{1/2} . (6)
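A sketch of equation (4) for the simple regression case, together with approximate 95% forecast bounds; the data and the forecast-period value x0 are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 0.02 + 1.0 * x + rng.normal(scale=0.5, size=T)

beta2_hat, beta1_hat = np.polyfit(x, y, 1)
resid = y - (beta1_hat + beta2_hat * x)
K = 2
sigma_hat = np.sqrt(np.sum(resid**2) / (T - K))    # standard error of the regression

x0 = 1.5                                           # hypothetical forecast-period value of x
y0_hat = beta1_hat + beta2_hat * x0                # point forecast, as in (3)

Sxx = np.sum((x - x.mean())**2)
sigma_f = sigma_hat * np.sqrt(1 + 1/T + (x0 - x.mean())**2 / Sxx)   # equation (4)

t_crit = stats.t.ppf(0.975, T - K)
bounds = (y0_hat - t_crit * sigma_f, y0_hat + t_crit * sigma_f)     # 95% forecast bounds
print(y0_hat, sigma_f, bounds)
```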

Using dummy variables to compute the forecast and forecast standard error

For any number of X variables and forecast time periods, a relatively easy way to calculate the forecast and forecast standard error is to use the following dummy variable approach. Expand the data used to estimate the model by increasing the number of rows and columns of data by the number of periods being forecast. Each new column will contain all zeros except on the diagonal of the rows and columns that have been added; each element of this diagonal is set equal to −1. Each of the new rows below the original data will contain the following:

• 0 for the Y column;
• the X values for that forecast period (hence the forecast is conditional on the values of X).

For example, consider the model yt = β1 + β2 x1t + β3 x2t + β4 x3t + et , t = 1,...,T . To forecast two periods forward (i.e., time periods T+1 and T+2), it is necessary to have values for all xij , i = 1, 2, 3 and j = T + 1, T + 2. The forecast of y is conditional on these x values. Below is the data layout which will allow us to calculate the forecast values and the forecast standard error for each forecast period. The first T rows are the original data matrix used to estimate the model; the last two rows and the last two columns are added to obtain the forecasts, the forecast standard errors, and the probability bounds around the forecasts.

⎡ y1   x11      x21      x31       0     0 ⎤
⎢ y2   x12      x22      x32       0     0 ⎥
⎢ ⋮     ⋮        ⋮        ⋮        0     0 ⎥
⎢ yT   x1T      x2T      x3T       0     0 ⎥        (7)
⎢ 0    x1,T+1   x2,T+1   x3,T+1   −1     0 ⎥
⎣ 0    x1,T+2   x2,T+2   x3,T+2    0    −1 ⎦

See the DAL CAPM and other examples for more detail.
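A sketch of this dummy variable approach using statsmodels; the data and the assumed X values for the two forecast periods are illustrative. With the −1 dummies, each estimated dummy coefficient equals the corresponding forecast and its reported standard error equals the forecast standard error:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
T, h = 120, 2                                    # sample size and forecast periods
X = rng.normal(size=(T, 3))                      # x1, x2, x3 over the sample
y = 1.0 + X @ np.array([0.6, -0.4, 0.9]) + rng.normal(scale=0.5, size=T)

X_future = rng.normal(size=(h, 3))               # assumed x values for T+1, T+2

# Augmented data as in (7): y = 0 in the forecast rows, and a -1 dummy per forecast period
y_aug = np.concatenate([y, np.zeros(h)])
D = np.vstack([np.zeros((T, h)), -np.eye(h)])    # dummy columns
X_aug = sm.add_constant(np.column_stack([np.vstack([X, X_future]), D]))

res = sm.OLS(y_aug, X_aug).fit()

forecasts = res.params[-h:]                      # dummy coefficients = forecasts
forecast_se = res.bse[-h:]                       # their SEs = forecast standard errors
print(forecasts, forecast_se)
```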

Choosing among alternative forecasting models

The root mean square error (RMSE) is a measure of the deviation of the simulated
(forecast) variable from the actual (observed) value of the variable, and can be used to
choose between alternative forecasting models.

To calculate the RMSE, we need the predicted and actual values of y so that we can calculate the prediction error in the forecast, ŷt − yt . Using this information over the forecasting horizon we can calculate the root mean square error,

RMSE = [ ∑_{t=T+1}^{T+h} ( ŷt − yt )² / h ]^{1/2} ,

where h is the number of forecasting periods, ŷt is the forecast value of y at time t, and yt is the observed (actual) value of y at time t.
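A minimal sketch of the RMSE calculation; the forecast and actual values below are placeholders for illustration:

```python
import numpy as np

# Hypothetical forecasts and realized values over an h-period forecast horizon
y_actual = np.array([2.1, 1.8, 2.4, 2.0])
y_forecast = np.array([2.0, 1.9, 2.2, 2.3])

h = len(y_actual)
rmse = np.sqrt(np.sum((y_forecast - y_actual) ** 2) / h)
print(rmse)
```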

A smaller RMSE indicates a more precise forecast. In comparing alternative forecasting models, the model with the smallest RMSE provides the most precise forecasts.

For any specified forecasting model, the magnitude of this error can only be evaluated
relative to the mean of the variable being forecast.

“Can Economists Forecast Crashes?”, David Hendry (a video of a little over 3 minutes):
http://www.youtube.com/watch?v=yrpUO0k3kSM

Econometrics Beat: Dave Giles’ Blog (has data, among other things)
http://davegiles.blogspot.com

