James B. McDonald
Brigham Young University
5/2010
Several applications illustrating the importance of information about the relationship
between economic variables were presented in the introduction. This section provides some
essential building blocks used in estimating and analyzing "appropriate" functional relationships
between two variables. We first consider estimation problems associated with linear relationships.
The properties and distribution of the least squares estimators are considered. Diagnostic and test
statistics which are important in evaluating the adequacy of the specified model are then discussed.
A methodology for forecasting and the determination of confidence intervals associated with the
linear model is presented. Finally, some alternative (nonlinear) functional forms which can be
estimated using ordinary least squares techniques are presented.
A. INTRODUCTION
Yt = β1 + β2Xt + εt
where Yt is the observed Y, β1 + β2Xt is the population regression line, and εt is the error or
random disturbance.
The observations don't have to lie on the population regression line, but it is usually
assumed that the expected value or "average" value of Y corresponding to any given value of X
lies on the population regression line.
Yt = β̂1 + β̂2Xt + et
where β̂1 + β̂2Xt = Ŷt is the sample regression line (the estimated Y for a given X) and et is the
estimated random disturbance or "residual."
et (the residual) is the vertical distance from Yt to the sample regression line, so
et = Yt − β̂1 − β̂2Xt = Yt − Ŷt, whereas εt = Yt − β1 − β2Xt.
It is important to recognize that the residual (et) is an estimate of the equation error or
random disturbance (εt) and may have different properties.
[Figure: scatter of the observations (Xt, Yt) about the regression line]
(4) min Σd²t (squared perpendicular distances from the regression line) with respect to β̂1 and β̂2
Many techniques are available and each may have different properties. We will want
to use the best estimators. One of the most popular procedures is least squares.
Different β̂ 's (sample regression lines) are associated with different SSE. This can be
visualized as in the next figure. Least squares amounts to selecting the estimators with the
smallest SSE.
____________
*Since the SSE involves squaring the residuals, least squares estimators may be very sensitive to
"outlying" observations. This will be discussed in more detail later.
[Figure: SSE as a function of β̂1 and β̂2; least squares selects the pair minimizing SSE]
β̂2 = (Σt XtYt − nX̄Ȳ)/(Σt X²t − nX̄²) = Σ(Xt − X̄)(Yt − Ȳ)/Σ(Xt − X̄)² = Cov(X, Y)/Var(X)
Proof: In order to minimize the SSE with respect to β̂1 and β̂2, we differentiate SSE
with respect to β̂1 and β̂2, yielding:
∂SSE/∂β̂1 = 2 Σt (Yt − β̂1 − β̂2Xt)(−1) = −2 Σt et
∂SSE/∂β̂2 = 2 Σt (Yt − β̂1 − β̂2Xt)(−Xt)
= −2 Σt (YtXt − β̂1Xt − β̂2X²t)
= −2 Σt etXt.
We see that setting these derivatives equal to zero, ∂SSE/∂β̂1 = 0 and ∂SSE/∂β̂2 = 0, implies
Σ(t=1 to n) et = 0
Σ(t=1 to n) etXt = 0.
These two equations are often referred to as the normal equations. Note that the normal
equations imply that the sample mean of the residuals is equal to zero and that the sample
covariance between the residuals and X is zero, which were also the conditions used in
method of moments estimation.
The first normal equation implies
β̂1 = Ȳ − β̂2X̄,
which implies that the regression line goes through the point (X̄, Ȳ). The slope of the
sample regression line is obtained by substituting β̂1 = Ȳ − β̂2X̄ into the second normal
equation, ∂SSE/∂β̂2 = 0 or Σ etXt = 0, and solving for β̂2. This yields
β̂2 = (Σt XtYt − nX̄Ȳ)/(Σt X²t − nX̄²) = Cov(X, Y)/Var(X).
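As a check on these formulas, the derivation above can be sketched in a few lines of Python. This is a hedged illustration only: the data are made up, and the function names are not from the notes.

```python
# A sketch of the least squares formulas: b2 = Cov(X, Y)/Var(X),
# b1 = Ybar - b2*Xbar, plus a check of the two normal equations.
# The data below are made up purely for illustration.

def ols(x, y):
    """Return (b1_hat, b2_hat) from the covariance formulas above."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) \
        / sum((xt - xbar) ** 2 for xt in x)
    b1 = ybar - b2 * xbar          # first normal equation
    return b1, b2

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
b1, b2 = ols(x, y)
residuals = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]
# Normal equations: sum(et) = 0 and sum(et * Xt) = 0 (up to rounding error).
```

The residuals computed from the fitted line satisfy both normal equations, which is exactly the first-order-condition argument in the proof above.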
The properties of the β̂ 1 and β̂ 2 derived in the previous section will be very sensitive to
which of the following five assumptions are satisfied:
(A.3) Homoskedasticity:
Var(εt|Xt) = σ² for every t
[Figure: homoskedastic vs. heteroskedastic error distributions]
(A.4) No Autocorrelation:
Cov(εt, εs) = 0 for all t ≠ s
(This assumption can be relaxed, but the X’s need to be uncorrelated with
the errors in order for OLS estimators to be unbiased and consistent.)
A linear model satisfying (A.2)-(A.5) is referred to as the classical linear regression model. If
(A.1)-(A.5) are satisfied, then we have the classical normal linear regression model. We will
now summarize the properties of the least squares estimators in each of these two cases.
If Yt = β1 + β2Xt + εt
and (A.2)-(A.5) are satisfied, then the least squares estimators are:
unbiased
consistent: Var(β̂i) → 0 as n → ∞
the minimum variance of all linear unbiased estimators.
These estimators are referred to as BLUE--best linear unbiased estimators.
(A.2)-(A.5) are known as the Gauss-Markov Assumptions.
If Yt = β1 + β2Xt + εt
where (A.1)-(A.5) are satisfied, then the least squares estimators are:
unbiased
consistent
minimum variance of all unbiased estimators (not just linear estimators)
normally distributed. This result facilitates t and F tests which will be discussed in another section.
The least squares estimators will also be maximum likelihood estimators.
Since these desirable properties are conditional on the assumptions, it is important to test
for their validity. These tests will be outlined in another section of the notes.
We now attempt to give some intuitive motivation for the concept of maximum likelihood
estimation, then we show that least squares estimators are maximum likelihood estimators if
(A.1)-(A.5) are valid.
In this example, which of the two population regression lines is most likely* to
have generated the random sample?
*It might be useful to think about these “pdf’s” as “coming out” of the page in a
third dimension with the “points” being thought of as being normally distributed
around the population regression line.
Yt = β1 + β2Xt + εt;
hence, we can write Yt ~ N[β1 + β2Xt; σ²], which means that the density of Yt, given Xt, is
given by
f(Yt|Xt) = e^(−(Yt − β1 − β2Xt)²/2σ²) / √(2πσ²).
These results can be visually depicted as in the following figure:
The Likelihood Function for a random sample is defined by the product of the density
functions. Since each density function gives the likelihood or relative frequency of an
individual observation being realized, when we multiply these values, we obtain the
likelihood of observing the entire sample, given the current parameters:
L(Y; β1, β2, σ²) = f(Y1) ··· f(Yn)
= e^(−Σ(Yt − β1 − β2Xt)²/2σ²) / ((2π)^(n/2) (σ²)^(n/2))
ℓ(Y; β1, β2, σ²) = ln L(Y; β1, β2, σ²)
= Σt ln f(Yt)
= −Σt (Yt − β1 − β2Xt)²/2σ² − (n/2) ln(2π) − (n/2) ln σ²
= −SSE/2σ² − (n/2) ln(2π) − (n/2) ln σ².
Maximum Likelihood Estimators (MLE) are obtained by maximizing ℓ(Y; β1, β2, σ²)
over β1, β2, and σ². This maximization requires that we solve the following equations:
(1) ∂ℓ/∂β1 = −(1/2σ²) ∂SSE/∂β1 = 0
(2) ∂ℓ/∂β2 = −(1/2σ²) ∂SSE/∂β2 = 0
(3) ∂ℓ/∂σ² = (SSE/2)(σ̂²)^(−2) − (n/2)(1/σ̂²) = 0
[Figure: log-likelihood surface over β1 and β2, maximized at the MLE]
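The claim that the likelihood is largest at the least squares estimates can be illustrated with a short Python sketch. The data are hypothetical; the key point is that, for fixed σ², the log-likelihood ℓ = −SSE/2σ² − (n/2)ln(2π) − (n/2)ln σ² is decreasing in SSE, so minimizing SSE maximizes ℓ.

```python
import math

# Sketch: evaluate the log-likelihood above at the OLS estimates and at
# perturbed coefficients; the OLS point gives the larger value.
# Data and perturbations are made up for illustration.

def loglik(b1, b2, sig2, x, y):
    n = len(x)
    sse = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))
    return -sse / (2 * sig2) - (n / 2) * math.log(2 * math.pi) \
        - (n / 2) * math.log(sig2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
b1 = ybar - b2 * xbar
sig2_hat = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y)) / n  # MLE: SSE/n

best = loglik(b1, b2, sig2_hat, x, y)
# Perturbing either coefficient raises SSE and therefore lowers the log-likelihood.
```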
Results:
1. β̂1 and β̂2 (the MLE) are also the OLS estimators of β1 and β2 when (A.1)-(A.5) hold.
2. σ̂² = Σ e²t/n = Σ(Yt − β̂1 − β̂2Xt)²/n, and σ̂² is biased.
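The bias in result 2 can be illustrated by a small simulation, a hedged sketch only (the DGP values β1 = 4, β2 = 1.5, σ² = 4 and the Xt values are illustrative). The MLE σ̂² = SSE/n averages below σ², while s² = SSE/(n − 2) does not.

```python
import random

# Simulation sketch: the MLE sig2_hat = SSE/n is biased downward, while
# s^2 = SSE/(n - 2) is (approximately, in the simulation) unbiased.
random.seed(12345)
beta1, beta2, sigma2, n = 4.0, 1.5, 4.0, 20
x = list(range(1, n + 1))
xbar = sum(x) / n

mle_vals, s2_vals = [], []
for _ in range(5000):
    y = [beta1 + beta2 * xt + random.gauss(0, sigma2 ** 0.5) for xt in x]
    ybar = sum(y) / n
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) \
        / sum((xt - xbar) ** 2 for xt in x)
    b1 = ybar - b2 * xbar
    sse = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))
    mle_vals.append(sse / n)        # biased MLE
    s2_vals.append(sse / (n - 2))   # unbiased estimator

avg_mle = sum(mle_vals) / len(mle_vals)
avg_s2 = sum(s2_vals) / len(s2_vals)
# avg_s2 should be near sigma2 = 4; avg_mle near (n - 2)/n * sigma2 = 3.6.
```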
3. Important observation:
1. Distribution
In this section we give, without proof, the distribution of the least squares estimators if
(A.2)-(A.5) hold. We also consider factors affecting estimator precision and finally
present some simulation results to provide intuition for the distributional results. The
main results are then summarized. The proofs will be given in the next chapter using
matrix algebra.
β̂1 and β̂2 are linear functions of the Yt's, which are random variables; hence, β̂1 and β̂2 are
random variables.
Variance (Population)
σ²_β̂2 = σ²/Σ(Xt − X̄)² = σ²/(n Var(X))
σ²_β̂1 = σ²[1/n + X̄²/Σ(Xt − X̄)²] = σ²/n + X̄² σ²_β̂2
β̂1 and β̂ 2 are consistent because they are unbiased and their variances approach zero as
the sample size increases.
Furthermore, if (A.1) holds (εt ~ N(0, σ²)), then Yt ~ N[β1 + β2Xt; σ²], which implies the
β̂i's will be normally distributed since they will be linear combinations of normally
distributed variables.
These results can be summarized by stating that if (A.1)-(A.5) are valid, then
β̂i ~ N[βi; σ²_β̂i].
3. Interpretation of β̂i ~ N[βi; σ²_β̂i] using Monte Carlo Simulations
In this section we report the results of some Monte Carlo simulations which provide
additional intuition about the distribution of β̂ i . We first construct the model used to
generate the data and then generate the data. Parameter estimates are then obtained,
another sample is generated and the process is continued until we can consider the
histograms of the estimators. Most Monte Carlo studies are similar in structure.
Consider the simple model which is referred to as the data generating process (DGP)
Yt = β1 + β2Xt + εt
= 4 + 1.5Xt + εt
We then generate 20 random disturbances (ε) using a random number generator for
N(0, σ² = 4).
Yt = 4 + 1.5Xt + εt
Pretend that we don't know what β1, β2, σ² are. The only things we observe are the pairs
(Xt, Yt). This might be visualized as
We now estimate the unknown parameters (β1, β2, σ²) using the previously discussed
formulas. This could yield, for example:
(β̂1, β̂2, s²) = (3.618, 1.615, 2.499).
If 14 more samples were generated, we would have a total of 15 estimates of β1, β2, ζ2.
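A minimal Python version of this Monte Carlo experiment might look like the following. It is a sketch: the Xt values are illustrative, and many more replications than the fifteen described above are used so that the sample moments of the β̂2 draws can be compared with the population formulas.

```python
import random

# Monte Carlo sketch of the DGP Yt = 4 + 1.5*Xt + et, et ~ N(0, 4):
# repeatedly draw a sample, re-estimate by OLS, and examine the draws of b2.
random.seed(2010)
beta1, beta2, n = 4.0, 1.5, 20
x = [0.5 * t for t in range(1, n + 1)]   # fixed regressors (illustrative)
xbar = sum(x) / n

b2_draws = []
for _ in range(2000):
    y = [beta1 + beta2 * xt + random.gauss(0, 2.0) for xt in x]  # sd = sqrt(4)
    ybar = sum(y) / n
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) \
        / sum((xt - xbar) ** 2 for xt in x)
    b2_draws.append(b2)

mean_b2 = sum(b2_draws) / len(b2_draws)
var_b2 = sum((b - mean_b2) ** 2 for b in b2_draws) / (len(b2_draws) - 1)
# Population counterpart: Var(b2) = sigma^2 / sum((Xt - Xbar)^2).
pop_var = 4.0 / sum((xt - xbar) ** 2 for xt in x)
```

A histogram of `b2_draws` would be centered near β2 = 1.5 with spread governed by `pop_var`, which is the pattern described in the histogram discussion below.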
________________________________________________________________________
*D.W. denotes Durbin Watson statistic which can be used to test the validity of (A.4).
Questions:
(2) Compare the averages of s_β̂1 and s_β̂2 with their population counterparts obtained in (1).
(3) Evaluate the sample variance of the fifteen estimates of β̂ 1 and β̂ 2 and compare them
with their population counterparts.
(4) Use a chi-square test to determine whether the average of the s²'s is consistent with
σ² = 4. Hint: (n − 2)s²/σ² ~ χ²(n − 2); hence, summing over the fifteen samples,
Σ(n − 2)s²/σ² ~ χ²(15(18) = 270).
A histogram of the estimated β̂1's might yield a result similar to the following:
[Figure: histogram of the β̂1 estimates, centered near β1 = 4]
Note the relationship between the histogram and the normal density N(β1, σ²_β̂1).
In practice we only have one sample of X's and Y's; hence, we only have one
observation of β̂1, β̂2, and s_β̂i, and these distributional results must be interpreted
accordingly.
4. Review:
Model: Yt = β1 + β2Xt + εt
A.3 Var(εt) = σ² for every t
A.4 Cov(εt, εs) = 0 for t ≠ s
Parameter Estimator
β1: β̂1 = Ȳ − β̂2X̄
β2: β̂2 = Σ(Xt − X̄)(Yt − Ȳ)/Σ(Xt − X̄)² = (Σ XtYt − nX̄Ȳ)/(Σ X²t − nX̄²) = Cov(X, Y)/Var(X)
σ²: s² = Σ e²t/(n − 2) = Σ(Yt − β̂1 − β̂2Xt)²/(n − 2)
Distributions:
β̂1 ~ N[β1, σ²_β̂1 = σ²/n + X̄²σ²/Σ(Xt − X̄)²]
β̂2 ~ N[β2, σ²_β̂2 = σ²/Σ(Xt − X̄)²]
σ_β̂1β̂2 = −X̄ σ²_β̂2 = −X̄ Var(β̂2) = −X̄ σ²/Σ(Xt − X̄)², which will be proven later.
The σ²_β̂i are estimated by
s²_β̂1 = s²/n + X̄² s²/Σ(Xt − X̄)²
s²_β̂2 = s²/Σ(Xt − X̄)².
In this section we assume that (A.1)-(A.5) are valid and consider test statistics which can
be used to test whether the model has any explanatory power. Z and t statistics and R2 (the
coefficient of determination) are important tools in this analysis. An important hypothesis is
whether the exogenous variable X helps explain Y. Normally, we would hope to reject the
hypothesis H0: β2=0 (Yt=β1+εt). We also consider how to test more general hypotheses of the
form H0: βi=β i0 .
1. H0: βi = βi⁰, where σ²_β̂i is known
Z = (β̂i − βi⁰)/σ_β̂i ~ N(0, 1)
The test statistic measures the number of standard deviations that β̂ i differs from the
hypothesized value. Large values provide the basis for rejecting the null hypothesis. The
critical value is 1.96 for a two tailed test at the 5% level.
2. H0: βi = βi⁰, where σ²_β̂i is unknown
t = (β̂i − βi⁰)/s_β̂i ~ t(n − 2)
Note the structure of the t-statistic and the Z-statistic are the same, except the standard
error in the Z-statistic is replaced by an unbiased estimator. s_β̂i would, in some sense, get
closer to σ_β̂i as the sample size increases. We see this as we compare critical values for
the t- and Z-statistics.
Note that the critical values for a t-statistic are larger than for a standard normal, because
the t density has thicker tails.
We note, from the following, the close relationship between the t-statistic just discussed
and confidence intervals.
Pr(−tα/2 < (β̂i − βi⁰)/s_β̂i < tα/2) = 1 − α
Thus, the use of confidence intervals or "test statistics" are just two different ways of
looking at the same problem.
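A small Python sketch makes the test/confidence-interval equivalence concrete. The numbers are hypothetical (β̂2 = 1.615 with an assumed standard error); 2.101 is the standard tabled t.025(18) critical value for n = 20.

```python
# Sketch: the t statistic for H0: beta2 = 0 and the matching 95% C.I.
# All numerical inputs are assumed for illustration.

def t_stat(b_hat, b0, se):
    """t = (beta_hat - beta0) / s_beta_hat"""
    return (b_hat - b0) / se

b2_hat, se_b2 = 1.615, 0.12           # hypothetical estimate and standard error
t = t_stat(b2_hat, 0.0, se_b2)        # test H0: beta2 = 0
t_crit = 2.101                        # t_{.025}(18), standard table value

reject = abs(t) > t_crit
ci = (b2_hat - t_crit * se_b2, b2_hat + t_crit * se_b2)
# H0 is rejected exactly when the hypothesized value lies outside the C.I.,
# i.e., tests and confidence intervals are two views of the same problem.
```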
The coefficient of determination measures the fraction of the total sum of squares
"explained" by the model. The following figure will provide motivation and definition of
important terms.
[Figure: decomposition of Yt − Ȳ into et = Yt − Ŷt and Ŷt − Ȳ, where Ŷt = β̂1 + β̂2Xt]
Σ(Yt − Ȳ)² = Σ(Yt − Ŷt)² + Σ(Ŷt − Ȳ)² + cross products (cross products = 0 if least squares is used)
= Σ e²t + Σ(Ŷt − Ȳ)²
= SSE + SSR,
where SSE and SSR, respectively, denote the sum of squared errors and sum of squares
explained by the regression model:
total sum of squares = sum of squared errors + sum of squares "explained"
by regression model.
SST = SSE + SSR
R² = SSR/SST = 1 − SSE/SST
= 1 − Σ e²t/Σ(Yt − Ȳ)² = fraction of total sum of squares "explained" by the model.
Note that increasing the number of independent variables in the model will not change SST,
but will decrease the SSE as long as the estimated coefficient of the new variable(s) is not
equal to zero; hence, it will increase R². This is true even if the additional variables are not
statistically significant. This has provided the motivation for considering the adjusted R² (R̄²)
instead of R². The adjusted R̄² is defined by
R̄² = 1 − (Σ e²t/(n − K)) / (Σ(Yt − Ȳ)²/(n − 1))
where K = the number of β's (coefficients) in the model. R̄² will only increase with the
addition of a new variable if the associated t-statistic is greater than 1 in absolute value. This
result follows from the equation
R̄²_New = R̄²_Old + [(n − 1)/((n − K)(n − K − 1))] (SSE_New/SST) [((β̂_New_var − 0)/s_β̂_New_var)² − 1],
where the last term in the product is t² − 1 and K denotes the number of coefficients in the "old" regression model
where SS denotes the sum of squares and degrees of freedom, d.f., is the number of
independent terms in SS. The mean squared error (MSE) is the corresponding sum of squares
(SS) divided by the degrees of freedom.
Dividing the MSE for the model by the MSE for the error (s²) gives an F-statistic:
F = [SSR/(K − 1)] / [SSE/(n − K)]
= [(n − K)/(K − 1)] [R²/(1 − R²)] ~ F(K − 1, n − K)
The F-statistic can be used to test the hypothesis that all non-intercept (slope) coefficients
are equal to zero.
For the simple regression model (K = 2), the F statistic = (n − 2) R²/(1 − R²) ~ F(1, n − 2).
(In the output below, each coefficient row reports β̂i, s_β̂i, and t = β̂i/s_β̂i; the _cons row corresponds to β̂1.)
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1086487 .0143998 7.55 0.000 .0803451 .1369523
_cons | -.1851968 .1852259 -1.00 0.318 -.5492673 .1788736
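The relationships SST = SSE + SSR, R², and the F statistic can be sketched together in a few lines. The data are illustrative, and the model is the simple regression (K = 2).

```python
# Sketch: compute SST, SSE, SSR, R^2, and F = [(n-K)/(K-1)] * R^2/(1-R^2)
# for an illustrative simple regression (K = 2).

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.8, 4.1, 4.9, 6.3]
n, K = len(x), 2
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * a for a in x]

sst = sum((b - ybar) ** 2 for b in y)                      # total
sse = sum((b - yh) ** 2 for b, yh in zip(y, yhat))         # unexplained
ssr = sum((yh - ybar) ** 2 for yh in yhat)                 # explained
r2 = ssr / sst
f = (n - K) / (K - 1) * r2 / (1 - r2)
# The decomposition SST = SSE + SSR holds because the least squares
# cross products vanish.
```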
F. FORECASTS
If we have determined that our model has significant explanatory power, we may want to use it
to obtain predictions. We turn to constructing predictions or forecasts and confidence intervals
for the (1) regression line (or mean Y corresponding to a given X) and (2) individual value of
Y corresponding to an arbitrary value of X.
E(Ŷt) = β1 + β2Xt
Var(Ŷt) = σ²_β̂1 + X²t σ²_β̂2 + 2Xt σ_β̂1β̂2
= σ²/n + X̄² σ²_β̂2 + X²t σ²_β̂2 + 2Xt(−X̄ σ²_β̂2)
= σ²/n + (Xt − X̄)² σ²_β̂2
= σ²_Ŷt
Therefore,
Ŷt ~ N(β1 + β2Xt; σ²_Ŷt).
σ²_Ŷt can be estimated by s²_Ŷ = s²/n + (Xt − X̄)² s²_β̂2.
From these results we can construct
Ŷt ± tc s_Ŷt = β̂1 + β̂2Xt ± tc s_Ŷt
where tc = tα/2(n − 2).
The forecasting problem is more often concerned with finding confidence intervals for the
actual value of Yt corresponding to an arbitrary value of Xt, rather than for the "mean" or
expected value E(Yt|Xt). To do this we consider an analysis of the forecast error (FE):
FE = Yt − Ŷt
E(FE) = 0
σ²_FE = Var(FE|X) = Var(Yt) + Var(Ŷt) = σ² + σ²_Ŷ,
where the first term is due to the error term and the second is due to uncertainty about the
population regression line.
Note that s_Ŷ and s_FE are functions of (Xt − X̄)², i.e., the further Xt is from the mean value, the
larger s_Ŷ and s_FE. This can also be seen in the following figure.
Ŷt ± tc s_FE
where s_FE = √(s²_FE) = √(s² + s²_Ŷ) = √(s² + s²/n + (Xt − X̄)² s²_β̂2)
[Figure: C.I. for Yt (outer intervals) and C.I. for β1 + β2Xt (inner intervals) around the sample regression line]
The two curved lines closest to the sample regression line correspond to CI’s for the population
regression line and the two curved lines furthest from the sample regression line are the CI’s
for the actual value of Y corresponding to different values of X.
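These forecast formulas can be sketched in Python. The data and the forecast point X0 are illustrative; 3.182 is the standard tabled t.025(3) critical value for n − 2 = 3 degrees of freedom.

```python
# Sketch of the forecast formulas: s_yhat^2 = s^2/n + (X0 - Xbar)^2 * s_b2^2
# and s_FE^2 = s^2 + s_yhat^2. The C.I. for the actual Y is wider than the
# C.I. for the population regression line.

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 4.8, 7.2, 8.9, 11.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
s2 = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
s2_b2 = s2 / sxx

x0 = 12.0                                     # forecast point (illustrative)
yhat0 = b1 + b2 * x0
s2_yhat = s2 / n + (x0 - xbar) ** 2 * s2_b2   # uncertainty about the line
s2_fe = s2 + s2_yhat                          # forecast-error variance
t_crit = 3.182                                # t_{.025}(n - 2) with n = 5

ci_line = (yhat0 - t_crit * s2_yhat ** 0.5, yhat0 + t_crit * s2_yhat ** 0.5)
ci_y = (yhat0 - t_crit * s2_fe ** 0.5, yhat0 + t_crit * s2_fe ** 0.5)
# The interval for the actual Y (outer) contains the interval for the mean (inner).
```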
These calculations can be very tedious for even moderate sample sizes. Fortunately,
calculators and many computer programs make this part of econometrics relatively painless,
even exciting. Thus, we will be able to focus on understanding the statistical procedures, the
validity of the assumptions, and interpreting the statistical output. We will outline the
commands used in least squares estimation using the program Stata. Extensive manuals and
abbreviated guides describing additional procedures and options are available for Stata and
for other programs such as SAS, EVIEWS, Gretl, R, SHAZAM, and TSP. Gretl is quite
user friendly, and it is free.
Stata
The data files can be created with Microsoft Excel (saving the file as a csv file). Stata
will automatically read in any column headings the data have. With a file named
FUN388.CSV, we can easily perform least squares estimation of the relationship
II 29
Yt = β1 + β2Xt + εt
. insheet using "C:\FUN388.CSV”, clear This reads the data into STATA.
This can also be done by opening the
data editor and manually pasting the
data.
. predict error, resid (the variable “error” now contains the estimated
residuals)
. predict yhat, xb (creates yhat = β̂1 + β̂2Xt)
. predict sfe, stdf (creates sfe = s_FE)
. predict syhat, stdp (creates syhat = s_Ŷ)
You can then test for autocorrelation in your time series data with the commands
Sample Stata output corresponding to the Anscombe_A data set in problem 1.2 (#4)
. infile x y using "C:\anscombe_a.txt", clear
(11 observations read)
. list y x
+------------+
| y x |
|------------|
1. | 8.04 10 |
2. | 6.95 8 |
3. | 7.58 13 |
4. | 8.81 9 |
5. | 8.33 11 |
|------------|
6. | 9.96 14 |
7. | 7.24 6 |
8. | 4.26 4 |
9. | 10.84 12 |
10. | 4.82 7 |
|------------|
11. | 5.68 5 |
+------------+
. plot y x
10.84 +
| *
|
|
| *
|
|
| *
|
| *
y | *
| *
| *
| *
|
|
| *
|
|
| *
4.26 + *
+----------------------------------------------------------------+
4 x 14
. sum y x
. reg y x
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .5000909 .1179055 4.24 0.002 .2333701 .7668117
_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445
------------------------------------------------------------------------------
. whitetst
. estat hettest
chi2(1) = 0.41
Prob > chi2 = 0.5232
. estat ic
------------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+----------------------------------------------------------------
. | 11 -22.88101 -16.84069 2 37.68137 38.47717
------------------------------------------------------------------------------
*ll(model) corresponds to the optimized log-likelihood value of the specified model, whereas
ll(null) is obtained by estimating the model without any explanatory variables. Twice the difference
of the log-likelihood values is distributed as a chi square with df equal to the number of explanatory
variables.
H. FUNCTIONAL FORMS
In many applications the relationships between variables are not linear. A simple test for the
presence of nonlinear relationships is the Regression Specification Error Test (RESET–Ramsey,
1969). This test can be performed as follows:
H0: yt = β1 + β2Xt + εt (estimate a linear model)
Ha: yt = β1 + β2Xt + δ1ŷ²t + δ2ŷ³t + εt (the ŷ's denote OLS predicted values)
The F statistic for the hypothesis that both delta coefficients are simultaneously equal to zero is
approximately distributed as an F(2, N − K). Alternatively, nonlinear functions of x can be added to
the linear terms and the collective explanatory power of the nonlinear terms tested. Box-Cox
transformations provide another approach.
The linear regression model just considered is more general than might first appear.
Many nonlinear models can be transformed so that "linear techniques" can be used.
1. Transformable Models
a. Yt = A Xt^β εt
This model can be estimated using least squares by taking the logarithm of the model
to yield
ln Yt = ln A + β ln Xt + ln εt
= β1 + β2 ln Xt + ln εt
where β1 = ln A and β2 = β. Regressing ln(Yt) on ln(Xt) gives estimates for β1 and
β2; hence, Â = e^β̂1 and β̂ = β̂2.
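A quick sketch of this transformation: generating noiseless data from Yt = A·Xt^β and regressing ln Y on ln X recovers A and β exactly. The parameter values below are made up for illustration.

```python
import math

# Sketch of the constant-elasticity transformation: fit ln Y on ln X and
# recover A_hat = exp(b1) and beta_hat = b2. Noiseless data, so the
# recovery is exact up to rounding.
A, beta = 2.0, 0.75
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [A * xt ** beta for xt in x]

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
n = len(x)
lxbar, lybar = sum(lx) / n, sum(ly) / n
b2 = sum((a - lxbar) * (b - lybar) for a, b in zip(lx, ly)) \
    / sum((a - lxbar) ** 2 for a in lx)
b1 = lybar - b2 * lxbar

A_hat = math.exp(b1)    # A = e^{beta1}
beta_hat = b2           # beta = beta2
```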
b. Exponential Functions
(1) Yt = A·B^Xt·εt
[Figure: growth curves for B > 1, B = 1, and 0 < B < 1]
(2) Xt = A·B^Yt·εt
The slope and elasticity are given by dY/dX = 1/(X ln B) and ηY·X = 1/(Y ln B).
For model (2), regressing Yt on ln Xt yields β̂1 and β̂2; hence, B̂ = e^(1/β̂2) and Â = e^(−β̂1/β̂2).
c. Reciprocal Transformations
Yt = A + B/Xt + εt
[Figure: B > 0, B < 0]
dY/dX = −B/X²; ηY·X = −B/(XY).
d. Yt = e^(A − B/Xt + εt)
[Figure: curve rising to an asymptotic level]
dY/dX = BY/X²; ηY·X = B/X
Taking logarithms, ln Yt = A − B/Xt + εt, so regressing ln Yt on 1/Xt yields B̂ = −β̂2.
Application:
α = 0 market share
e. Polynomials
β3 = β4 = 0 β4 = 0 β4 ≠ 0
Cost Function:
Restrictions (Summary):
β 32 < 3β2β4
f. Production Functions
ln Yt = β1 + β2t + β3 ln Lt + β4 ln Kt + ln εt
Production Characteristics:
σ = %Δ(K/L) / %Δ(WL/WK) = 1 = elasticity of substitution
ln Yt = β1 + β2 ln Lt + β3 ln Kt
+ β4(ln Lt)² + β5(ln Kt)²
+ β6(ln Lt)(ln Kt)
Note that this model includes the Cobb Douglas as a special case (β4 = β5 = β6 = 0).
σ = 1/(1 − ρ),
M = returns to scale.
Cost functions can be estimated from estimated production functions.
Estimation: (?)
2. "Nontransformable" Models
(b) Examples:
SSE = Σ[ln Yt − ln(β1 + β2Xt)]² = Σ(ln εt)²
model(theta) applies the transform to both depvar and indepvars, but this
time, each side is transformed by a separate parameter.
I. PROBLEM SETS
Theory
1. Let kids denote the number of children ever born to a woman, and let educ denote the years of
education for the woman. A simple model relating fertility to years of education is
kids = β0 + β1educ + u
where u is the unobserved error.
a. All of the factors besides a woman’s education that affect fertility are lumped into the
error term, u. What kinds of factors are contained in u? Which of these are likely to be
correlated with level of education, which are not?
b. Will a simple regression analysis uncover the ceteris paribus effect of education on
fertility? Explain.
(Wooldridge 2.1)
2. Demonstrate that
β̂2 = Σ(Xt − X̄)(Yt − Ȳ)/Σ(Xt − X̄)² = Covariance(X, Y)/Variance(X)
is equivalent to
a. (Σ XtYt − nX̄Ȳ)/(Σ X²t − n(X̄)²)
b. Σ(Xt − X̄)Yt / Σ(Xt − X̄)²
(Hints: Expand the numerator and denominator and remember that Σ Xt = nX̄.)
3. Demonstrate that the sample regression line obtained from least squares with an estimated
intercept passes through (X̄, Ȳ). (Hint: Ŷ = β̂1 + β̂2X; substitute X = X̄ and simplify.)
(JM II-B)
4. Consider the model Yt = βXt + εt, where
A.2 E(εt) = 0 for every t
A.3 Var(εt) = σ² for every t
A.4 Cov(εt, εs) = 0 for all t, s (t ≠ s)
A.5 Xt nonstochastic.
a) Find the least squares estimator of β.
Hint: SSE = Σεt2 = Σ(Yt - βXt)2.
5. The data set in CEOSAL2.RAW contains information on chief executive officers for U.S.
corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten
is prior number of years as company CEO.
i) Find the average salary and average tenure in the sample.
ii) How many CEO’s are in their first year as CEO (that is, ceoten = 0)?
iii) Estimate the simple regression model
log(salary) = β0 + β1ceoten + ε
and report your results in the usual form*. What is the predicted percentage increase in
salary given one more year as CEO?
(Wooldridge C.2.2)
*The usual form is to write out the equation with the estimated betas and their standard errors
underneath in parentheses. For example, if I was estimating
Yt = α + βXt + εt
and estimated α to be .543 with a standard error of .001 and β to be 1.43 with a standard error of
1.01 then I would report my results in the “usual form” as follows:
Theory
Yt = β1 + β2Xt + εt.
1. BACKGROUND: The purpose of this problem is to show that, using OLS, the total sum of
squares can be partitioned into two parts as follows:
Σ(t=1 to n) (Yt − Ȳ)² = Σ(t=1 to n) (Yt − Ŷt + Ŷt − Ȳ)²
= Σ(Yt − Ŷt)² + 2Σ(Yt − Ŷt)(Ŷt − Ȳ) + Σ(Ŷt − Ȳ)²
where the terms Σ(Yt − Ȳ)², Σ(Yt − Ŷt)², and Σ(Ŷt − Ȳ)² are referred to as the total sum of
squares (SST), sum of squares error (SSE), and sum of squares "explained by the regression"
(SSR), respectively. This notation differs from that used by Wooldridge, but conforms with
notation used in a number of other econometrics texts. Show that the cross-product term
vanishes when least squares estimators are used. (Remember the first order conditions or normal
equations.)
(JM II-B)
Applied
2. For the population of firms in the chemical industry, let rd denote annual expenditures on
research and development, and let sales denote annual sales (both are in millions of
dollars).
a. Write down a model (not an estimated equation) that implies a constant elasticity
between rd and sales. Which parameter is the elasticity? (Hint: what functional
form should be used?)
b. Now estimate the model using the data in RDCHEM.RAW. Write out the estimated
equation in the usual form*. What is the estimated elasticity of rd with respect to
sales? Explain in words what this elasticity means.
(Wooldridge C 2.5)
*Report the estimated parameters, standard errors, and R².
Data Set          A            B            C            D
Variable        X     Y      X     Y      X     Y      X     Y
Obs. No. 1    10.0  8.04   10.0  9.14   10.0  7.46    8.0  6.58
         2     8.0  6.95    8.0  8.14    8.0  6.77    8.0  5.76
         3    13.0  7.58   13.0  8.74   13.0 12.74    8.0  7.71
         4     9.0  8.81    9.0  8.77    9.0  7.11    8.0  8.84
         5    11.0  8.33   11.0  9.26   11.0  7.81    8.0  8.47
         6    14.0  9.96   14.0  8.10   14.0  8.84    8.0  7.04
         7     6.0  7.24    6.0  6.13    6.0  6.08    8.0  5.25
         8     4.0  4.26    4.0  3.10    4.0  5.39   19.0 12.50
         9    12.0 10.84   12.0  9.13   12.0  8.15    8.0  5.56
        10     7.0  4.82    7.0  7.26    7.0  6.42    8.0  7.91
        11     5.0  5.68    5.0  4.74    5.0  5.73    8.0  6.89
b. Compare and explain the four sets of results. (Hint: plot the data.)
c. In each of the four cases obtain a prediction of the value of Yt corresponding to a value of X = 20.
Which of the forecasts would you feel most comfortable with? Explain.
d. Based upon these examples comment on the following widely held notions.
ii) "For any particular kind of statistical data there is just one set of calculations constituting a
correct statistical analysis."
iii) "Performing intricate calculations is rigorous, whereas actually looking at the data is cheating."
(JM II)
Reference: Anscombe, F. J., "Graphs in Statistical Analysis," The American Statistician, Vol. 27 (1973), pp. 17-21.
4. The following Stata printout corresponds to the first Anscombe data set.
b. Calculate the predicted value of Y and the variance of the forecast error
corresponding to x = 20.
(1) Ŷ
(2) s²_FE = s²_Ŷ + s²
(3) s²_Ŷ
Hint: Recall that s²_Ŷ = s²/n + (20 − X̄)² s²_β̂2 and s²_FE = s²_Ŷ + s².
c. Calculate 95% confidence intervals for the actual value of Y corresponding to X=20.
d. Calculate 95% confidence intervals for the population regression line corresponding
to X=20.
Yet another hint: the sample and population regression lines, respectively,
are defined by Ŷt = β̂1 + β̂2Xt and β1 + β2Xt, so use s_Ŷ for part (d) and s_FE for part (c).
Check your work: Recall that the confidence interval for the population regression
line is narrower than the confidence interval for the actual value of Y corresponding
to a given X.
. summ y x
. reg y x
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .5000909 .1179055 4.24 0.002 .2333701 .7668117
_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445
------------------------------------------------------------------------------
. set obs 12
obs was 11, now 12
. replace x=20 in 12
(1 real change made)
. predict yhat
(option xb assumed; fitted values)
+---------------------------------+
| x y yhat sfe |
|---------------------------------|
11. | 5 5.68 5.500546 1.375003 |
12. | 20 . 13.00191 1.830386 |
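As a check, the regression reported above can be reproduced from the 11 Anscombe "A" observations listed earlier in this section; the fitted value at x = 20 matches the `yhat` Stata computed for the appended observation.

```python
# Reproduce the Stata output for the Anscombe "A" data using the
# covariance formulas from this section.

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
b1 = ybar - b2 * xbar
yhat_20 = b1 + b2 * 20
# Matches Stata: b2 ≈ .5000909, b1 ≈ 3.000091, yhat at x = 20 ≈ 13.00191.
```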