James B. McDonald
Brigham Young University
5/2010
Several applications illustrating the importance of information about the relationship
between economic variables were presented in the introduction. This section provides some
essential building blocks used in estimating and analyzing "appropriate" functional relationships
between two variables. We first consider estimation problems associated with linear relationships.
The properties and distribution of the least squares estimators are considered. Diagnostic and test
statistics which are important in evaluating the adequacy of the specified model are then discussed.
A methodology for forecasting and the determination of confidence intervals associated with the
linear model is presented. Finally, some alternative (nonlinear) functional forms which can be
estimated using ordinary least squares techniques are presented.
A. INTRODUCTION
Yt = β1 + β2Xt + εt
where Yt is the observed Y, β1 + β2Xt is the population regression line, and εt is the error or
random disturbance.
The observations don't have to lie on the population regression line, but it is usually
assumed that the expected value or "average" value of Y corresponding to any given value of X
lies on the population regression line.
Yt = β̂1 + β̂2Xt + et
where β̂1 + β̂2Xt = Ŷt is the sample regression line (the estimated Y for a given X) and et is the
estimated random disturbance or "residual."
et (the residual) is the vertical distance from Yt to the sample regression line, so
et = Yt − β̂1 − β̂2Xt = Yt − Ŷt, whereas εt = Yt − β1 − β2Xt.
It is important to recognize that the residual (et) is an estimate of the equation error or
random disturbance (εt) and may have different properties.
[Figure: scatter of the observations (Xt, Yt) about the regression line]
(4) min Σd²t (squared perpendicular distances from the regression line) with respect to β̂1 and β̂2
Many techniques are available and each may have different properties. We will want
to use the best estimators. One of the most popular procedures is least squares.
Different β̂ 's (sample regression lines) are associated with different SSE. This can be
visualized as in the next figure. Least squares amounts to selecting the estimators with the
smallest SSE.
____________
*Since the SSE involves squaring the residuals, least squares estimators may be very sensitive to
"outlying" observations. This will be discussed in more detail later.
[Figure: SSE as a function of β̂1 and β̂2; least squares selects the pair minimizing SSE]
β̂2 = (Σt XtYt − nX̄Ȳ)/(Σt X²t − nX̄²) = Σ(Xt − X̄)(Yt − Ȳ)/Σ(Xt − X̄)² = Cov(X, Y)/Var(X)
Proof: In order to minimize the SSE with respect to β̂1 and β̂2, we differentiate SSE
with respect to β̂1 and β̂2, yielding:
∂SSE/∂β̂1 = 2 Σt (Yt − β̂1 − β̂2Xt)(−1) = −2 Σt et
∂SSE/∂β̂2 = 2 Σt (Yt − β̂1 − β̂2Xt)(−Xt)
= −2 Σt (YtXt − β̂1Xt − β̂2X²t)
= −2 Σt etXt.
We see that setting these derivatives equal to zero, ∂SSE/∂β̂1 = 0 and ∂SSE/∂β̂2 = 0, implies
Σ(t=1 to n) et = 0
Σ(t=1 to n) etXt = 0.
These two equations are often referred to as the normal equations. Note that the normal
equations imply that the sample mean of the residuals is equal to zero and that the sample
covariance between the residuals and X is zero, which were also the conditions used in
method of moments estimation.
The first normal equation implies
β̂1 = Ȳ − β̂2X̄,
which implies that the regression line goes through the point (X̄, Ȳ). The slope of the
sample regression line is obtained by substituting β̂1 = Ȳ − β̂2X̄ into the second normal
equation, ∂SSE/∂β̂2 = 0 or Σ etXt = 0, and solving for β̂2. This yields
β̂2 = (Σt XtYt − nX̄Ȳ)/(Σt X²t − nX̄²) = Cov(X, Y)/Var(X).
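As a check on these formulas, the derivation above can be sketched in a few lines of Python. This is a hedged illustration only: the data are made up, and the function names are not from the notes.

```python
# A sketch of the least squares formulas: b2 = Cov(X, Y)/Var(X),
# b1 = Ybar - b2*Xbar, plus a check of the two normal equations.
# The data below are made up purely for illustration.

def ols(x, y):
    """Return (b1_hat, b2_hat) from the covariance formulas above."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) \
        / sum((xt - xbar) ** 2 for xt in x)
    b1 = ybar - b2 * xbar          # first normal equation
    return b1, b2

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
b1, b2 = ols(x, y)
residuals = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]
# Normal equations: sum(et) = 0 and sum(et * Xt) = 0 (up to rounding error).
```

The residuals computed from the fitted line satisfy both normal equations, which is exactly the first-order-condition argument in the proof above.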
The properties of the β̂ 1 and β̂ 2 derived in the previous section will be very sensitive to
which of the following five assumptions are satisfied:
(A.3) Homoskedasticity:
Var(εt|Xt) = σ² for every t
[Figure: homoskedastic vs. heteroskedastic error distributions]
(A.4) No Autocorrelation:
Cov(εt, εs) = 0 for all t ≠ s
(This assumption can be relaxed, but the X’s need to be uncorrelated with
the errors in order for OLS estimators to be unbiased and consistent.)
A linear model satisfying (A.2)-(A.5) is referred to as the classical linear regression model. If
(A.1)-(A.5) are satisfied, then we have the classical normal linear regression model. We will
now summarize the properties of the least squares estimators in each of these two cases.
If Yt = β1 + β2Xt + εt
and (A.2)-(A.5) are satisfied, then the least squares estimators are:
unbiased
consistent: Var(β̂i) → 0 as n → ∞
the minimum variance of all linear unbiased estimators.
These estimators are referred to as BLUE--best linear unbiased estimators.
(A.2)-(A.5) are known as the Gauss-Markov Assumptions.
If Yt = β1 + β2Xt + εt
where (A.1)-(A.5) are satisfied, then the least squares estimators are:
unbiased
consistent
minimum variance of all unbiased estimators (not just linear estimators)
normally distributed. This result facilitates t and F tests which will be discussed in another section.
The least squares estimators will also be maximum likelihood estimators.
Since these desirable properties are conditional on the assumptions, it is important to test
for their validity. These tests will be outlined in another section of the notes.
We now attempt to give some intuitive motivation for the concept of maximum likelihood
estimation, then we show that least squares estimators are maximum likelihood estimators if
(A.1)-(A.5) are valid.
In this example, which of the two population regression lines is most likely* to
have generated the random sample?
*It might be useful to think about these “pdf’s” as “coming out” of the page in a
third dimension with the “points” being thought of as being normally distributed
around the population regression line.
Yt = β1 + β2Xt + εt;
hence, we can write Yt ~ N[β1 + β2Xt; σ²], which means that the density of Yt, given Xt, is
given by
f(Yt|Xt) = e^(−(Yt − β1 − β2Xt)²/2σ²) / √(2πσ²).
These results can be visually depicted as in the following figure:
The Likelihood Function for a random sample is defined by the product of the density
functions. Since each density function gives the likelihood or relative frequency of an
individual observation being realized, when we multiply these values, we obtain the
likelihood of observing the entire sample, given the current parameters:
L(Y; β1, β2, σ²) = f(Y1) ··· f(Yn)
= e^(−Σ(Yt − β1 − β2Xt)²/2σ²) / ((2π)^(n/2) (σ²)^(n/2))
ℓ(Y; β1, β2, σ²) = ln L(Y; β1, β2, σ²)
= Σt ln f(Yt)
= −Σt (Yt − β1 − β2Xt)²/2σ² − (n/2) ln(2π) − (n/2) ln σ²
= −SSE/2σ² − (n/2) ln(2π) − (n/2) ln σ².
Maximum Likelihood Estimators (MLE) are obtained by maximizing ℓ(Y; β1, β2, σ²)
over β1, β2, and σ². This maximization requires that we solve the following equations:
(1) ∂ℓ/∂β1 = −(1/2σ²) ∂SSE/∂β1 = 0
(2) ∂ℓ/∂β2 = −(1/2σ²) ∂SSE/∂β2 = 0
(3) ∂ℓ/∂σ² = (SSE/2)(σ̂²)^(−2) − (n/2)(1/σ̂²) = 0
[Figure: log-likelihood surface over β1 and β2, maximized at the MLE]
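The claim that the likelihood is largest at the least squares estimates can be illustrated with a short Python sketch. The data are hypothetical; the key point is that, for fixed σ², the log-likelihood ℓ = −SSE/2σ² − (n/2)ln(2π) − (n/2)ln σ² is decreasing in SSE, so minimizing SSE maximizes ℓ.

```python
import math

# Sketch: evaluate the log-likelihood above at the OLS estimates and at
# perturbed coefficients; the OLS point gives the larger value.
# Data and perturbations are made up for illustration.

def loglik(b1, b2, sig2, x, y):
    n = len(x)
    sse = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))
    return -sse / (2 * sig2) - (n / 2) * math.log(2 * math.pi) \
        - (n / 2) * math.log(sig2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
b1 = ybar - b2 * xbar
sig2_hat = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y)) / n  # MLE: SSE/n

best = loglik(b1, b2, sig2_hat, x, y)
# Perturbing either coefficient raises SSE and therefore lowers the log-likelihood.
```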
Results:
1. β̂1 and β̂2 (the MLE) are also the OLS estimators of β1 and β2 when (A.1)-(A.5) hold.
2. σ̂² = Σ e²t/n = Σ(Yt − β̂1 − β̂2Xt)²/n, and σ̂² is biased.
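The bias in result 2 can be illustrated by a small simulation, a hedged sketch only (the DGP values β1 = 4, β2 = 1.5, σ² = 4 and the Xt values are illustrative). The MLE σ̂² = SSE/n averages below σ², while s² = SSE/(n − 2) does not.

```python
import random

# Simulation sketch: the MLE sig2_hat = SSE/n is biased downward, while
# s^2 = SSE/(n - 2) is (approximately, in the simulation) unbiased.
random.seed(12345)
beta1, beta2, sigma2, n = 4.0, 1.5, 4.0, 20
x = list(range(1, n + 1))
xbar = sum(x) / n

mle_vals, s2_vals = [], []
for _ in range(5000):
    y = [beta1 + beta2 * xt + random.gauss(0, sigma2 ** 0.5) for xt in x]
    ybar = sum(y) / n
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) \
        / sum((xt - xbar) ** 2 for xt in x)
    b1 = ybar - b2 * xbar
    sse = sum((yt - b1 - b2 * xt) ** 2 for xt, yt in zip(x, y))
    mle_vals.append(sse / n)        # biased MLE
    s2_vals.append(sse / (n - 2))   # unbiased estimator

avg_mle = sum(mle_vals) / len(mle_vals)
avg_s2 = sum(s2_vals) / len(s2_vals)
# avg_s2 should be near sigma2 = 4; avg_mle near (n - 2)/n * sigma2 = 3.6.
```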
3. Important observation:
1. Distribution
In this section we give, without proof, the distribution of the least squares estimators if
(A.2)-(A.5) hold. We also consider factors affecting estimator precision and finally
present some simulation results to provide intuition for the distributional results. The
main results are then summarized. The proofs will be given in the next chapter using
matrix algebra.
β̂1 and β̂2 are linear functions of the Yt's, which are random variables; hence, β̂1 and β̂2 are
random variables.
Variance (Population)
σ²_β̂2 = σ²/Σ(Xt − X̄)² = σ²/(n Var(X))
σ²_β̂1 = σ²[1/n + X̄²/Σ(Xt − X̄)²] = σ²/n + X̄² σ²_β̂2
β̂1 and β̂ 2 are consistent because they are unbiased and their variances approach zero as
the sample size increases.
Furthermore, if (A.1) holds (εt ~ N(0, σ²)), then Yt ~ N[β1 + β2Xt; σ²], which implies the
β̂i's will be normally distributed since they will be linear combinations of normally
distributed variables.
These results can be summarized by stating that if (A.1)-(A.5) are valid, then
β̂i ~ N[βi; σ²_β̂i].
3. Interpretation of β̂i ~ N[βi; σ²_β̂i] using Monte Carlo Simulations
In this section we report the results of some Monte Carlo simulations which provide
additional intuition about the distribution of β̂ i . We first construct the model used to
generate the data and then generate the data. Parameter estimates are then obtained,
another sample is generated and the process is continued until we can consider the
histograms of the estimators. Most Monte Carlo studies are similar in structure.
Consider the simple model which is referred to as the data generating process (DGP)
Yt = β1 + β2Xt + εt
= 4 + 1.5Xt + εt
We then generate 20 random disturbances (ε) using a random number generator for
N(0, σ² = 4).
Yt = 4 + 1.5Xt + εt
Pretend that we don't know what β1, β2, σ² are. The only things we observe are the pairs
(Xt, Yt). This might be visualized as
We now estimate the unknown parameters (β1, β2, σ²) using the previously discussed
formulas. This could yield, for example:
(β̂1, β̂2, s²) = (3.618, 1.615, 2.499).
If 14 more samples were generated, we would have a total of 15 estimates of β1, β2, ζ2.
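A minimal Python version of this Monte Carlo experiment might look like the following. It is a sketch: the Xt values are illustrative, and many more replications than the fifteen described above are used so that the sample moments of the β̂2 draws can be compared with the population formulas.

```python
import random

# Monte Carlo sketch of the DGP Yt = 4 + 1.5*Xt + et, et ~ N(0, 4):
# repeatedly draw a sample, re-estimate by OLS, and examine the draws of b2.
random.seed(2010)
beta1, beta2, n = 4.0, 1.5, 20
x = [0.5 * t for t in range(1, n + 1)]   # fixed regressors (illustrative)
xbar = sum(x) / n

b2_draws = []
for _ in range(2000):
    y = [beta1 + beta2 * xt + random.gauss(0, 2.0) for xt in x]  # sd = sqrt(4)
    ybar = sum(y) / n
    b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) \
        / sum((xt - xbar) ** 2 for xt in x)
    b2_draws.append(b2)

mean_b2 = sum(b2_draws) / len(b2_draws)
var_b2 = sum((b - mean_b2) ** 2 for b in b2_draws) / (len(b2_draws) - 1)
# Population counterpart: Var(b2) = sigma^2 / sum((Xt - Xbar)^2).
pop_var = 4.0 / sum((xt - xbar) ** 2 for xt in x)
```

A histogram of `b2_draws` would be centered near β2 = 1.5 with spread governed by `pop_var`, which is the pattern described in the histogram discussion below.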
________________________________________________________________________
*D.W. denotes Durbin Watson statistic which can be used to test the validity of (A.4).
Questions:
(2) Compare the averages of s_β̂1 and s_β̂2 with their population counterparts obtained in (1).
(3) Evaluate the sample variance of the fifteen estimates of β̂ 1 and β̂ 2 and compare them
with their population counterparts.
(4) Use a chi-square test to determine whether the average of the s²'s is consistent with
σ² = 4. Hint: (n − 2)s²/σ² ~ χ²(n − 2); hence, summing over the fifteen samples,
Σ(n − 2)s²/σ² ~ χ²(15(18) = 270).
A histogram of the estimated β̂1's might yield a result similar to the following:
[Figure: histogram of the β̂1 estimates, centered near β1 = 4]
Note the relationship between the histogram and the normal density N(β1, σ²_β̂1).
In practice we only have one sample of X's and Y's; hence, we only have one
observation of β̂1, β̂2, and s_β̂i, and these distributional results must be interpreted
accordingly.
4. Review:
Model: Yt = β1 + β2Xt + εt
A.3 Var(εt) = σ² for every t
A.4 Cov(εt, εs) = 0 for t ≠ s
Parameter Estimator
β1: β̂1 = Ȳ − β̂2X̄
β2: β̂2 = Σ(Xt − X̄)(Yt − Ȳ)/Σ(Xt − X̄)² = (Σ XtYt − nX̄Ȳ)/(Σ X²t − nX̄²) = Cov(X, Y)/Var(X)
σ²: s² = Σ e²t/(n − 2) = Σ(Yt − β̂1 − β̂2Xt)²/(n − 2)
Distributions:
β̂1 ~ N[β1, σ²_β̂1 = σ²/n + X̄²σ²/Σ(Xt − X̄)²]
β̂2 ~ N[β2, σ²_β̂2 = σ²/Σ(Xt − X̄)²]
σ_β̂1β̂2 = −X̄ σ²_β̂2 = −X̄ Var(β̂2) = −X̄ σ²/Σ(Xt − X̄)², which will be proven later.
The σ²_β̂i are estimated by
s²_β̂1 = s²/n + X̄² s²/Σ(Xt − X̄)²
s²_β̂2 = s²/Σ(Xt − X̄)².
In this section we assume that (A.1)-(A.5) are valid and consider test statistics which can
be used to test whether the model has any explanatory power. Z and t statistics and R2 (the
coefficient of determination) are important tools in this analysis. An important hypothesis is
whether the exogenous variable X helps explain Y. Normally, we would hope to reject the
hypothesis H0: β2=0 (Yt=β1+εt). We also consider how to test more general hypotheses of the
form H0: βi=β i0 .
1. H0: βi = βi⁰, where σ²_β̂i is known
Z = (β̂i − βi⁰)/σ_β̂i ~ N(0, 1)
The test statistic measures the number of standard deviations that β̂ i differs from the
hypothesized value. Large values provide the basis for rejecting the null hypothesis. The
critical value is 1.96 for a two tailed test at the 5% level.
2. H0: βi = βi⁰, where σ²_β̂i is unknown
t = (β̂i − βi⁰)/s_β̂i ~ t(n − 2)
Note the structure of the t-statistic and the Z-statistic are the same, except the standard
error in the Z-statistic is replaced by an unbiased estimator. s_β̂i would, in some sense, get
closer to σ_β̂i as the sample size increases. We see this as we compare critical values for
the t- and Z-statistics.
Note that the critical values for a t-statistic are larger than for a standard normal, because
the t density has thicker tails.
We note, from the following, the close relationship between the t-statistic just discussed
and confidence intervals.
Pr(−tα/2 < (β̂i − βi⁰)/s_β̂i < tα/2) = 1 − α
Thus, the use of confidence intervals or "test statistics" are just two different ways of
looking at the same problem.
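A small Python sketch makes the test/confidence-interval equivalence concrete. The numbers are hypothetical (β̂2 = 1.615 with an assumed standard error); 2.101 is the standard tabled t.025(18) critical value for n = 20.

```python
# Sketch: the t statistic for H0: beta2 = 0 and the matching 95% C.I.
# All numerical inputs are assumed for illustration.

def t_stat(b_hat, b0, se):
    """t = (beta_hat - beta0) / s_beta_hat"""
    return (b_hat - b0) / se

b2_hat, se_b2 = 1.615, 0.12           # hypothetical estimate and standard error
t = t_stat(b2_hat, 0.0, se_b2)        # test H0: beta2 = 0
t_crit = 2.101                        # t_{.025}(18), standard table value

reject = abs(t) > t_crit
ci = (b2_hat - t_crit * se_b2, b2_hat + t_crit * se_b2)
# H0 is rejected exactly when the hypothesized value lies outside the C.I.,
# i.e., tests and confidence intervals are two views of the same problem.
```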
The coefficient of determination measures the fraction of the total sum of squares
"explained" by the model. The following figure will provide motivation and definition of
important terms.
[Figure: decomposition of Yt − Ȳ into et = Yt − Ŷt and Ŷt − Ȳ, where Ŷt = β̂1 + β̂2Xt]
Σ(Yt − Ȳ)² = Σ(Yt − Ŷt)² + Σ(Ŷt − Ȳ)² + cross products (cross products = 0 if least squares is used)
= Σ e²t + Σ(Ŷt − Ȳ)²
= SSE + SSR,
where SSE and SSR, respectively, denote the sum of squared errors and sum of squares
explained by the regression model:
total sum of squares = sum of squared errors + sum of squares "explained"
by regression model.
SST = SSE + SSR
R² = SSR/SST = 1 − SSE/SST
= 1 − Σ e²t/Σ(Yt − Ȳ)² = fraction of total sum of squares "explained" by the model.
Note that increasing the number of independent variables in the model will not change SST,
but will decrease the SSE as long as the estimated coefficient of the new variable(s) is not
equal to zero; hence, it will increase R². This is true even if the additional variables are not
statistically significant. This has provided the motivation for considering the adjusted R² (R̄²)
instead of R². The adjusted R̄² is defined by
R̄² = 1 − (Σ e²t/(n − K)) / (Σ(Yt − Ȳ)²/(n − 1))
where K = the number of β's (coefficients) in the model. R̄² will only increase with the
addition of a new variable if the associated t-statistic is greater than 1 in absolute value. This
result follows from the equation
R̄²_New = R̄²_Old + [(n − 1)/((n − K)(n − K − 1))] (SSE_New/SST) [((β̂_New_var − 0)/s_β̂_New_var)² − 1],
where the last term in the product is t² − 1 and K denotes the number of coefficients in the "old" regression model
where SS denotes the sum of squares and degrees of freedom, d.f., is the number of
independent terms in SS. The mean squared error (MSE) is the corresponding sum of squares
(SS) divided by the degrees of freedom.
Dividing the MSE for the model by the MSE for the error (s²) gives an F-statistic:
F = [SSR/(K − 1)] / [SSE/(n − K)]
= [(n − K)/(K − 1)] [R²/(1 − R²)] ~ F(K − 1, n − K)
The F-statistic can be used to test the hypothesis that all non-intercept (slope) coefficients
are equal to zero.
For the simple regression model (K = 2), the F statistic = (n − 2) R²/(1 − R²) ~ F(1, n − 2).
(In the output below, each coefficient row reports β̂i, s_β̂i, and t = β̂i/s_β̂i; the _cons row corresponds to β̂1.)
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .1086487 .0143998 7.55 0.000 .0803451 .1369523
_cons | -.1851968 .1852259 -1.00 0.318 -.5492673 .1788736
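The relationships SST = SSE + SSR, R², and the F statistic can be sketched together in a few lines. The data are illustrative, and the model is the simple regression (K = 2).

```python
# Sketch: compute SST, SSE, SSR, R^2, and F = [(n-K)/(K-1)] * R^2/(1-R^2)
# for an illustrative simple regression (K = 2).

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.8, 4.1, 4.9, 6.3]
n, K = len(x), 2
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * a for a in x]

sst = sum((b - ybar) ** 2 for b in y)                      # total
sse = sum((b - yh) ** 2 for b, yh in zip(y, yhat))         # unexplained
ssr = sum((yh - ybar) ** 2 for yh in yhat)                 # explained
r2 = ssr / sst
f = (n - K) / (K - 1) * r2 / (1 - r2)
# The decomposition SST = SSE + SSR holds because the least squares
# cross products vanish.
```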
F. FORECASTS
If we have determined that our model has significant explanatory power, we may want to use it
to obtain predictions. We turn to constructing predictions or forecasts and confidence intervals
for the (1) regression line (or mean Y corresponding to a given X) and (2) individual value of
Y corresponding to an arbitrary value of X.
E(Ŷt) = β1 + β2Xt
Var(Ŷt) = σ²_β̂1 + X²t σ²_β̂2 + 2Xt σ_β̂1β̂2
= σ²/n + X̄² σ²_β̂2 + X²t σ²_β̂2 + 2Xt(−X̄ σ²_β̂2)
= σ²/n + (Xt − X̄)² σ²_β̂2
= σ²_Ŷt
Therefore,
Ŷt ~ N(β1 + β2Xt; σ²_Ŷt).
σ²_Ŷt can be estimated by s²_Ŷ = s²/n + (Xt − X̄)² s²_β̂2.
From these results we can construct
Ŷt ± tc s_Ŷt = β̂1 + β̂2Xt ± tc s_Ŷt
where tc = tα/2(n − 2).
The forecasting problem is more often concerned with finding confidence intervals for the
actual value of Yt corresponding to an arbitrary value of Xt, rather than for the "mean" or
expected value E(Yt|Xt). To do this we consider an analysis of the forecast error (FE):
FE = Yt − Ŷt
E(FE) = 0
σ²_FE = Var(FE|X) = Var(Yt) + Var(Ŷt) = σ² + σ²_Ŷ,
where the first term is due to the error term and the second is due to uncertainty about the
population regression line.
Note that s_Ŷ and s_FE are functions of (Xt − X̄)², i.e., the further Xt is from the mean value, the
larger s_Ŷ and s_FE. This can also be seen in the following figure.
Ŷt ± tc s_FE
where s_FE = √(s²_FE) = √(s² + s²_Ŷ) = √(s² + s²/n + (Xt − X̄)² s²_β̂2)
[Figure: C.I. for Yt (outer intervals) and C.I. for β1 + β2Xt (inner intervals) around the sample regression line]
The two curved lines closest to the sample regression line correspond to CI’s for the population
regression line and the two curved lines furthest from the sample regression line are the CI’s
for the actual value of Y corresponding to different values of X.
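These forecast formulas can be sketched in Python. The data and the forecast point X0 are illustrative; 3.182 is the standard tabled t.025(3) critical value for n − 2 = 3 degrees of freedom.

```python
# Sketch of the forecast formulas: s_yhat^2 = s^2/n + (X0 - Xbar)^2 * s_b2^2
# and s_FE^2 = s^2 + s_yhat^2. The C.I. for the actual Y is wider than the
# C.I. for the population regression line.

x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 4.8, 7.2, 8.9, 11.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
s2 = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
s2_b2 = s2 / sxx

x0 = 12.0                                     # forecast point (illustrative)
yhat0 = b1 + b2 * x0
s2_yhat = s2 / n + (x0 - xbar) ** 2 * s2_b2   # uncertainty about the line
s2_fe = s2 + s2_yhat                          # forecast-error variance
t_crit = 3.182                                # t_{.025}(n - 2) with n = 5

ci_line = (yhat0 - t_crit * s2_yhat ** 0.5, yhat0 + t_crit * s2_yhat ** 0.5)
ci_y = (yhat0 - t_crit * s2_fe ** 0.5, yhat0 + t_crit * s2_fe ** 0.5)
# The interval for the actual Y (outer) contains the interval for the mean (inner).
```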
These calculations can be very tedious for even moderate sample sizes. Fortunately,
calculators and many computer programs make this part of econometrics relatively painless,
even exciting. Thus, we will be able to focus on understanding the statistical procedures, the
validity of the assumptions, and interpreting the statistical output. We will outline the
commands used in least squares estimation using the program Stata. Extensive manuals and
abbreviated guides describing additional procedures and options are available for Stata and
for other programs such as SAS, EVIEWS, Gretl, R, SHAZAM, and TSP. Gretl is quite
user friendly, and it is free.
Stata
The data files can be created with Microsoft Excel (saving the file as a csv file). Stata
will automatically read in any column headings the data have. With a file named
FUN388.CSV, we can easily perform least squares estimation of the relationship
II 29
Yt = β1 + β2Xt + εt
. insheet using "C:\FUN388.CSV”, clear This reads the data into STATA.
This can also be done by opening the
data editor and manually pasting the
data.
. predict error, resid (the variable “error” now contains the estimated
residuals)
. predict yhat, xb (creates yhat = β̂1 + β̂2Xt)
. predict sfe, stdf (creates sfe = s_FE)
. predict syhat, stdp (creates syhat = s_Ŷ)
You can then test for autocorrelation in your time series data with the commands
Sample Stata output corresponding to the Anscombe_A data set in problem 1.2 (#4)
. infile x y using "C:\anscombe_a.txt", clear
(11 observations read)
. list y x
+------------+
| y x |
|------------|
1. | 8.04 10 |
2. | 6.95 8 |
3. | 7.58 13 |
4. | 8.81 9 |
5. | 8.33 11 |
|------------|
6. | 9.96 14 |
7. | 7.24 6 |
8. | 4.26 4 |
9. | 10.84 12 |
10. | 4.82 7 |
|------------|
11. | 5.68 5 |
+------------+
. plot y x
10.84 +
| *
|
|
| *
|
|
| *
|
| *
y | *
| *
| *
| *
|
|
| *
|
|
| *
4.26 + *
+----------------------------------------------------------------+
4 x 14
. sum y x
. reg y x
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .5000909 .1179055 4.24 0.002 .2333701 .7668117
_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445
------------------------------------------------------------------------------
. whitetst
. estat hettest
chi2(1) = 0.41
Prob > chi2 = 0.5232
. estat ic
------------------------------------------------------------------------------
Model | Obs ll(null) ll(model) df AIC BIC
-------------+----------------------------------------------------------------
. | 11 -22.88101 -16.84069 2 37.68137 38.47717
------------------------------------------------------------------------------
*ll(model) corresponds to the optimized log-likelihood value of the specified model, whereas
ll(null) is obtained by estimating the model without any explanatory variables. Twice the difference
of the log-likelihood values is distributed as a chi square with df equal to the number of explanatory
variables.
H. FUNCTIONAL FORMS
In many applications the relationships between variables are not linear. A simple test for the
presence of nonlinear relationships is the Regression Specification Error Test (RESET–Ramsey,
1969). This test can be performed as follows:
H0: yt = β1 + β2Xt + εt (estimate a linear model)
Ha: yt = β1 + β2Xt + δ1ŷ²t + δ2ŷ³t + εt (the ŷ's denote OLS predicted values)
The F statistic for the hypothesis that both delta coefficients are simultaneously equal to zero is
approximately distributed as an F(2, N − K). Alternatively, nonlinear functions of x can be added to
the linear terms and the collective explanatory power of the nonlinear terms tested. Box-Cox
transformations provide another approach.
The linear regression model just considered is more general than might first appear.
Many nonlinear models can be transformed so that "linear techniques" can be used.
1. Transformable Models
a. Yt = A Xt^β εt
This model can be estimated using least squares by taking the logarithm of the model
to yield
ln Yt = ln A + β ln Xt + ln εt
= β1 + β2 ln Xt + ln εt
where β1 = ln A and β2 = β. Regressing ln(Yt) on ln(Xt) gives estimates for β1 and
β2; hence, Â = e^β̂1 and β̂ = β̂2.
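A quick sketch of this transformation: generating noiseless data from Yt = A·Xt^β and regressing ln Y on ln X recovers A and β exactly. The parameter values below are made up for illustration.

```python
import math

# Sketch of the constant-elasticity transformation: fit ln Y on ln X and
# recover A_hat = exp(b1) and beta_hat = b2. Noiseless data, so the
# recovery is exact up to rounding.
A, beta = 2.0, 0.75
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [A * xt ** beta for xt in x]

lx = [math.log(v) for v in x]
ly = [math.log(v) for v in y]
n = len(x)
lxbar, lybar = sum(lx) / n, sum(ly) / n
b2 = sum((a - lxbar) * (b - lybar) for a, b in zip(lx, ly)) \
    / sum((a - lxbar) ** 2 for a in lx)
b1 = lybar - b2 * lxbar

A_hat = math.exp(b1)    # A = e^{beta1}
beta_hat = b2           # beta = beta2
```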
b. Exponential Functions
(1) Yt = A·B^Xt·εt
[Figure: growth curves for B > 1, B = 1, and 0 < B < 1]
(2) Xt = A·B^Yt·εt
The slope and elasticity are given by dY/dX = 1/(X ln B) and ηY·X = 1/(Y ln B).
For model (2), regressing Yt on ln Xt yields β̂1 and β̂2; hence, B̂ = e^(1/β̂2) and Â = e^(−β̂1/β̂2).
c. Reciprocal Transformations
Yt = A + B/Xt + εt
[Figure: B > 0, B < 0]
dY/dX = −B/X²; ηY·X = −B/(XY).
d. Yt = e^(A − B/Xt + εt)
[Figure: curve rising to an asymptotic level]
dY/dX = BY/X²; ηY·X = B/X
Taking logarithms, ln Yt = A − B/Xt + εt, so regressing ln Yt on 1/Xt yields B̂ = −β̂2.
Application:
α = 0 market share
e. Polynomials
β3 = β4 = 0 β4 = 0 β4 ≠ 0
Cost Function:
Restrictions (Summary):
β 32 < 3β2β4
f. Production Functions
ln Yt = β1 + β2t + β3 ln Lt + β4 ln Kt + ln εt
Production Characteristics:
σ = %Δ(K/L) / %Δ(WL/WK) = 1 = elasticity of substitution
ln Yt = β1 + β2 ln Lt + β3 ln Kt
+ β4(ln Lt)² + β5(ln Kt)²
+ β6(ln Lt)(ln Kt)
Note that this model includes the Cobb Douglas as a special case (β4 = β5 = β6 = 0).
σ = 1/(1 − ρ),
M = returns to scale.
Cost functions can be estimated from estimated production functions.
Estimation: (?)
2. "Nontransformable" Models
(b) Examples:
SSE = Σ[ln Yt − ln(β1 + β2Xt)]² = Σ(ln εt)²
model(theta) applies the transform to both depvar and indepvars, but this
time, each side is transformed by a separate parameter.
I. PROBLEM SETS
Theory
1. Let kids denote the number of children ever born to a woman, and let educ denote the years of
education for the woman. A simple model relating fertility to years of education is
kids = β0 + β1educ + u
where u is the unobserved error.
a. All of the factors besides a woman’s education that affect fertility are lumped into the
error term, u. What kinds of factors are contained in u? Which of these are likely to be
correlated with level of education, which are not?
b. Will a simple regression analysis uncover the ceteris paribus effect of education on
fertility? Explain.
(Wooldridge 2.1)
2. Demonstrate that
β̂2 = Σ(Xt − X̄)(Yt − Ȳ)/Σ(Xt − X̄)² = Covariance(X, Y)/Variance(X)
is equivalent to
a. (Σ XtYt − nX̄Ȳ)/(Σ X²t − n(X̄)²)
b. Σ(Xt − X̄)Yt / Σ(Xt − X̄)²
(Hints: Expand the numerator and denominator and remember that Σ Xt = nX̄.)
3. Demonstrate that the sample regression line obtained from least squares with an estimated
intercept passes through (X̄, Ȳ). (Hint: Ŷ = β̂1 + β̂2X; substitute X = X̄ and simplify.)
(JM II-B)
4. Consider the model Yt = βXt + εt, where
A.2 E(εt) = 0 for every t
A.3 Var(εt) = σ² for every t
A.4 Cov(εt, εs) = 0 for all t, s (t ≠ s)
A.5 Xt nonstochastic.
a) Find the least squares estimator of β.
Hint: SSE = Σεt2 = Σ(Yt - βXt)2.
5. The data set in CEOSAL2.RAW contains information on chief executive officers for U.S.
corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten
is prior number of years as company CEO.
i) Find the average salary and average tenure in the sample.
ii) How many CEO’s are in their first year as CEO (that is, ceoten = 0)?
iii) Estimate the simple regression model
log(salary) = β0 + β1ceoten + ε
and report your results in the usual form*. What is the predicted percentage increase in
salary given one more year as CEO?
(Wooldridge C.2.2)
*The usual form is to write out the equation with the estimated betas and their standard errors
underneath in parentheses. For example, if I was estimating
Yt = α + βXt + εt
and estimated α to be .543 with a standard error of .001 and β to be 1.43 with a standard error of
1.01 then I would report my results in the “usual form” as follows:
Theory
Yt = β1 + β2Xt + εt.
1. BACKGROUND: The purpose of this problem is to show that, using OLS, the total sum of
squares can be partitioned into two parts as follows:
Σ(t=1 to n) (Yt − Ȳ)² = Σ(t=1 to n) (Yt − Ŷt + Ŷt − Ȳ)²
= Σ(Yt − Ŷt)² + 2Σ(Yt − Ŷt)(Ŷt − Ȳ) + Σ(Ŷt − Ȳ)²
where the terms Σ(Yt − Ȳ)², Σ(Yt − Ŷt)², and Σ(Ŷt − Ȳ)² are referred to as the total sum of
squares (SST), sum of squares error (SSE), and sum of squares "explained by the regression"
(SSR), respectively. This notation differs from that used by Wooldridge, but conforms with
notation used in a number of other econometrics texts. Show that the cross-product term
vanishes when least squares estimators are used. (Remember the first order conditions or normal
equations.)
(JM II-B)
Applied
2. For the population of firms in the chemical industry, let rd denote annual expenditures on
research and development, and let sales denote annual sales (both are in millions of
dollars).
a. Write down a model (not an estimated equation) that implies a constant elasticity
between rd and sales. Which parameter is the elasticity? (Hint: what functional
form should be used?)
b. Now estimate the model using the data in RDCHEM.RAW. Write out the estimated
equation in the usual form*. What is the estimated elasticity of rd with respect to
sales? Explain in words what this elasticity means.
(Wooldridge C 2.5)
*Report the estimated parameters, standard errors, and R².
Data Set          A            B            C            D
Variable        X     Y      X     Y      X     Y      X     Y
Obs. No. 1    10.0  8.04   10.0  9.14   10.0  7.46    8.0  6.58
         2     8.0  6.95    8.0  8.14    8.0  6.77    8.0  5.76
         3    13.0  7.58   13.0  8.74   13.0 12.74    8.0  7.71
         4     9.0  8.81    9.0  8.77    9.0  7.11    8.0  8.84
         5    11.0  8.33   11.0  9.26   11.0  7.81    8.0  8.47
         6    14.0  9.96   14.0  8.10   14.0  8.84    8.0  7.04
         7     6.0  7.24    6.0  6.13    6.0  6.08    8.0  5.25
         8     4.0  4.26    4.0  3.10    4.0  5.39   19.0 12.50
         9    12.0 10.84   12.0  9.13   12.0  8.15    8.0  5.56
        10     7.0  4.82    7.0  7.26    7.0  6.42    8.0  7.91
        11     5.0  5.68    5.0  4.74    5.0  5.73    8.0  6.89
b. Compare and explain the four sets of results. (Hint: plot the data.)
c. In each of the four cases obtain a prediction of the value of Yt corresponding to a value of X = 20.
Which of the forecasts would you feel most comfortable with? Explain.
d. Based upon these examples comment on the following widely held notions.
ii) "For any particular kind of statistical data there is just one set of calculations constituting a
correct statistical analysis."
iii) "Performing intricate calculations is rigorous, whereas actually looking at the data is cheating."
(JM II)
Reference: Anscombe, F. J., "Graphs in Statistical Analysis," The American Statistician, Vol. 27 (1973), pp. 17-21.
4. The following Stata printout corresponds to the first Anscombe data set.
b. Calculate the predicted value of Y and the variance of the forecast error
corresponding to x = 20.
(1) Ŷ
(2) s²_FE = s²_Ŷ + s²
(3) s²_Ŷ
Hint: Recall that s²_Ŷ = s²/n + (20 − X̄)² s²_β̂2 and s²_FE = s²_Ŷ + s².
c. Calculate 95% confidence intervals for the actual value of Y corresponding to X=20.
d. Calculate 95% confidence intervals for the population regression line corresponding
to X=20.
Yet another hint: the sample and population regression lines, respectively,
are defined by Ŷt = β̂1 + β̂2Xt and β1 + β2Xt, so use s_Ŷ for part (d) and s_FE for part (c).
Check your work: Recall that the confidence interval for the population regression
line is narrower than the confidence interval for the actual value of Y corresponding
to a given X.
. summ y x
. reg y x
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .5000909 .1179055 4.24 0.002 .2333701 .7668117
_cons | 3.000091 1.124747 2.67 0.026 .4557369 5.544445
------------------------------------------------------------------------------
. set obs 12
obs was 11, now 12
. replace x=20 in 12
(1 real change made)
. predict yhat
(option xb assumed; fitted values)
+---------------------------------+
| x y yhat sfe |
|---------------------------------|
11. | 5 5.68 5.500546 1.375003 |
12. | 20 . 13.00191 1.830386 |
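As a check, the regression reported above can be reproduced from the 11 Anscombe "A" observations listed earlier in this section; the fitted value at x = 20 matches the `yhat` Stata computed for the appended observation.

```python
# Reproduce the Stata output for the Anscombe "A" data using the
# covariance formulas from this section.

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b2 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
    / sum((a - xbar) ** 2 for a in x)
b1 = ybar - b2 * xbar
yhat_20 = b1 + b2 * 20
# Matches Stata: b2 ≈ .5000909, b1 ≈ 3.000091, yhat at x = 20 ≈ 13.00191.
```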