
Mahler’s Guide to

Regression
Sections 1-4:
1 Fitting a Straight Line with No Intercept
2 Fitting a Straight Line with an Intercept
3 Residuals
4 Dividing the Sum of Squares into Two Pieces

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-A

New England Actuarial Seminars Howard Mahler


POB 315 hmahler@mac.com
Sharon, MA, 02067
www.neas-seminars.com

Mahler’s Guide to Regression


Copyright 2006 by Howard C. Mahler.

While these study guides were written for the VEE-Applied Statistical Methods Exam given by
the Casualty Actuarial Society, they should be of value to anyone learning Regression.
They should also help those trying to refresh their memories about a particular idea.
No knowledge specific to actuarial work is assumed.

The material on the regression portion of the VEE-Applied Statistical Methods Exam is
covered.1 The material on time series, which is also on this exam, is not covered.2

Information in bold or sections whose title is in bold are more important for passing the exam.
Larger bold type indicates it is extremely important.

Starred sections, subsections, and questions should not be needed to directly answer exam
questions and should be skipped on first reading. They are provided to aid the reader’s overall
understanding of the subject, and to be useful in practical applications.

For those who have trouble getting through the material, concentrate on the
sections in bold.

Highly Recommended problems (about 1/6 of the total) are double underlined.
Recommended problems (about 1/6 of the total) are underlined.
Do at least the Highly Recommended problems your first time through.
It is important that you do problems when learning a subject and then some more
problems a few weeks later.

The points assigned to each problem are based on 100 points for a four hour exam.
1 point problems are shorter than typical exam questions.
2 and 3 point problems are similar in length to typical exam questions.
4 point problems are longer than typical exam questions.
Solutions to problems are given at the end.3

The following tables will be provided to the candidate with the exam:
Normal Distribution, Chi-square Distribution, t-Distribution, and F-Distribution.

1
Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.
Chapters 1, 3, 4, 5, 6 (excluding Appendix 6.1), and Sections 8.1, 8.2, 10.1.
Sections 8.1, 8.2, and 10.1, covered in my Sections 29-31 and 38, were added to the syllabus in 2005.
2
Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.
Chapters 15, 16 (excluding Appendix 16.1), 17 (excluding Appendix 17.1), and 18, cover time series.
3
Note that problems include both some written by me and some from past exams. The latter are copyright by the
Casualty Actuarial Society and Society of Actuaries, or the Institute of Actuaries and Faculty of Actuaries, and are
reproduced here solely to aid students in studying for exams. The solutions and comments are solely the
responsibility of the author. While some of the comments may seem critical of certain questions, this is intended
solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long
and hard to produce quality exams. In some cases I’ve rewritten these questions in order to match the notation in the
current Syllabus.

Section # Pages Section Name


A 1 5-17 Fitting a Straight Line with No Intercept
2 18-29 Fitting a Straight Line with an Intercept
3 30-32 Residuals
4 33-41 Dividing the Sum of Squares into Two Pieces

B 5 42-51 R-Squared
6 52-59 Corrected R-Squared
7 60-67 Normal Distribution
8 68-71 Assumptions of Linear Regression
9 72-84 Properties of Estimators

C 10 85-95 Variances and Covariances


11 96-102 t-Distribution
12 103-114 t-test
13 115-121 Confidence Intervals for Estimated Parameters

D 14 122-135 F Distribution
15 136-147 Testing the Slope, Two Variable Model
16 148-155 Hypothesis Testing
17 156-158 A Simulation Experiment *

E 18 159-167 Three Variable Regression Model


19 168-180 Matrix Form of Multiple Regression
20 181-193 Tests of Slopes, Multiple Regression
21 194-207 Additional Tests of Slopes

F 22 208-225 Additional Models


23 226-236 Dummy Variables
24 237-240 Piecewise Linear Regression

G 25 241-249 Weighted Regression


26 250-256 Heteroscedasticity
27 257-263 Tests for Heteroscedasticity
28 264-274 Correcting for Heteroscedasticity

H 29 275-283 Serial Correlation


30 284-295 Durbin-Watson Statistic
31 296-302 Correcting for Serial Correlation
32 303-306 Multicollinearity

Table of Contents Continued on the Next Page



Section # Pages Section Name


I 33 307-323 Forecasting
34 324-332 Testing Forecasts
35 333-342 Forecasting with Serial Correlation

J 36 343-349 Standardized Coefficients


37 350-354 Elasticity
38 355-360 Partial Correlation Coefficients
39 361-367 Regression Diagnostics *
40 368 Stepwise Regression *
41 369 Stochastic Explanatory Variables *
42 370-373 Generalized Least Squares *
43 374-385 Nonlinear Estimation

K 44 386-399 Generalized Linear Models*


45 400-415 Important Ideas and Formulas

L 416-447 Solutions to Problems, Sections 1-12

M 448-487 Solutions to Problems, Sections 13-21

N 488-529 Solutions to Problems, Sections 22-32

O 530-569 Solutions to Problems, Sections 33-44

The CAS/SOA did not release the 5/02, 5/03, and 5/04 exams.
Only the first VEE exam was released.

Sample Exam Q.40: statements C, D, and E are from chapter 7 of Pindyck & Rubinfeld, no
longer on the Syllabus. 5/00, Q.16 and 11/00 Q.35 can be answered using ideas specifically
discussed in chapter 7 of Pindyck & Rubinfeld, no longer on the syllabus. However, they can
also be answered from first principles.

Course 4 and VEE Exam Questions by Section of this Study Aid


Section Sample 5/00 11/00 5/01 11/01 11/02 11/03 11/04 VEE 8/05
1 16* 35* 29 4
2
3
4
5 29 5 30 9
6 31
7
8
9 35
10 40 35 13
11
12
13 5 5 38
14
15 1
16
17 *
18 35 13
19 36 3 2
20
21 12 35 9 21 21 27 20 19 7
22 5 20
23 24 5 9
24
25 7 *
26
27 11
28 31 21 28 23
29
30 30 24 14
31 12 33
32
33 25
34 6
35
36 37 13
37 27
38 5 12 11
39*
40*
41*
42*
43 10
44* 34

Section 1, Fitting a Straight Line with No Intercept

Assume we have the following heights of eight fathers and their adult sons (in inches):4
Father Son
53 56
54 58
57 61
58 60
61 63
62 62
63 65
66 64

Here is a graph of this data:


[Scatter plot of the eight (father, son) height pairs: father’s height (about 53 to 66 inches) on the horizontal axis, son’s height (about 56 to 65 inches) on the vertical axis.]

There appears to be a relationship between the height of the father, X, and the height of his
son, Y. A taller father seems to be more likely to have a taller son.

4
There are only 8 pairs of observations solely in order to keep things simple.

Straight Line with No Intercept:

Let us assume Y = βX. We want to determine the “best” value of β. The most common way to
do so is to minimize the sum of the squared differences between the height of each
son estimated by our equation βXi, and the actual height of that son Yi.5

Sum of Squared Errors = Σ(Yi - βXi)2.

Exercise: If β = 1.01, what is the sum of squared errors?


[Solution: Σ(Yi - βXi)2 = (56 - 53.53)2 + (58 - 54.54)2 + (61 - 57.57)2 + (60 - 58.58)2 +
(63 - 61.61)2 + (62 - 62.62)2 + (65 - 63.63)2 + (64 - 66.66)2 = 43.12.]

Here is a graph of the sum of squared errors, as a function of β:

[Graph of the sum of squared errors as a function of β, for β from about 1.01 to 1.06; the minimum occurs near β = 1.03.]

The smallest sum of squared errors corresponds to β ≅ 1.03. We refer to 1.03 as the least
squares estimate of the slope, β.

We would determine the least squares estimate of β algebraically, by setting equal to zero the
partial derivative with respect to β of the sum of squared errors.6
0 = ∂Σ(Yi - βXi)2/ ∂β = -2Σ(Yi - βXi)Xi. ⇒ 0 = ΣXiYi - ΣβXiXi. ⇒ βΣXi2 = ΣXiYi . ⇒
β = ΣXiYi /ΣXi2.

5
Minimizing squared differences is not the only criterion that could be used. For example, one could instead
minimize the absolute differences, which produces different results. See Figure 1.3 in Pindyck and Rubinfeld.
See also Section 12.4.2 in Loss Models.
6
We treat β as the only variable.

Exercise: Use the above equation in order to determine the least squares estimate of β.
[Solution: ΣXiYi = (53)(56) + ... + (66)(64) = 29063. ΣXi2 = 532 + ... + 662 = 28228.
estimate of β = ΣXiYi /ΣXi2 = 29063/28228 = 1.02958 ≅ 1.03.]

The estimated value of beta is usually written with a hat over it: β^. In this case β^ = 1.03.
Estimated values of other quantities are written in a similar manner.
We do not expect any model to exactly predict the height of a son from the height of his father;
therefore, we include an error term in the model. The model we have been using is usually
written Y = βX + ε, or Yi = βXi + εi, where εi is an error term.

In general, for the least squares fit to the linear model with no intercept, Y = βX + ε:
β^ = ΣXiYi / ΣXi2.
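For readers who like to check the arithmetic by computer, here is a minimal Python sketch (an illustration, not part of the original text) that applies β^ = ΣXiYi / ΣXi2 to the heights data:

# Least squares slope with no intercept: beta = sum(X*Y) / sum(X^2).
fathers = [53, 54, 57, 58, 61, 62, 63, 66]
sons = [56, 58, 61, 60, 63, 62, 65, 64]
sum_xy = sum(x * y for x, y in zip(fathers, sons))  # 29063
sum_xx = sum(x * x for x in fathers)                # 28228
beta_hat = sum_xy / sum_xx
print(beta_hat)  # 1.0296, i.e. about 1.03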

Here is a graph of the least squares line fit to the data on heights, with β^ = 1.03:
[Plot of the fitted line, son’s height = 1.03 × (father’s height), through the eight data points, with father’s height on the horizontal axis and son’s height on the vertical axis.]

Residuals:
The estimated height of the sons is written as Y^. Y^i = 1.03Xi. The difference between each
son’s height and his height estimated by the model is the error, referred to as the residual.
Residual = actual - estimated.
The residual for son i is written as ε^i.

ε^i ≡ Yi - Y^i.

Exercise: What are the residuals for the fitted model Y^i = 1.03Xi?
[Solution: ε^i = 56 - 54.59, 58 - 55.62, 61 - 58.71, 60 - 59.74, 63 - 62.83, 62 - 63.86,
65 - 64.89, 64 - 67.98 = 1.41, 2.38, 2.29, .26, .17, -1.86, .11, -3.98.
Comment: Note that these residuals do not sum to zero.7 ]

Here is a plot of these residuals:

[Plot of these residuals against father’s height; the residuals range from about -4 to +2.4.]

Exercise: For the fitted model Y^i = 1.03Xi, what is the sum of squared errors?

[Solution: Σε^i2 = 1.412 + 2.382 + 2.292 + .262 + .172 + (-1.86)2 + .112 + (-3.98)2 = 32.3.
Comment: This matches the result shown previously in a graph.]

The sum of squared errors is referred to as the Error Sum of Squares or ESS.8
ESS ≡ Σε^i2 = Σ(Yi - Y^i)2.
In this case, ESS = 32.3.
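As a check (again an illustration, not from the original text), the residuals and ESS can be computed with a few lines of Python:

# Residuals and ESS for the no-intercept fit, beta = 1.03.
fathers = [53, 54, 57, 58, 61, 62, 63, 66]
sons = [56, 58, 61, 60, 63, 62, 65, 64]
beta_hat = 1.03
residuals = [y - beta_hat * x for x, y in zip(fathers, sons)]
# [1.41, 2.38, 2.29, 0.26, 0.17, -1.86, 0.11, -3.98]
ess = sum(e * e for e in residuals)
print(ess)  # about 32.3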

7
As will be discussed later, when there is an intercept, the residuals sum to zero.
8
ESS is the sum of squared errors for a fitted model, as opposed to the sum of squared errors for any value of β.

Unbiased Estimator:

For the one variable linear regression model with no intercept, Yi = βXi + εi:

We assume E[εi] = 0; each error term has mean zero.

Then E[Yi] = E[βXi + εi] = βXi.

β^ = ΣXiYi / ΣXi2.

E[β^] = ΣXiE[Yi] / ΣXi2 = ΣXiβXi / ΣXi2 = βΣXi2 / ΣXi2 = β.

Thus, β^ is an unbiased estimator of the slope β.

Expected Value of Residuals:

ε^i = Yi - Y^i = Yi - β^Xi.

E[ε^i] = E[Yi] - E[β^]Xi = βXi - βXi = 0.

Thus the expected value of each residual is zero.

However, it is important to note that the observed residuals will usually be nonzero.
One is interested in the variance of each residual around its expected value.

Variances of Residuals:*9

Assume we have:
i Xi Var(εi)
1 1 1
2 3 5
3 8 10

Exercise: Fit the model Yi = βXi + εi.


[Solution: β^ = ΣXiYi / ΣXi2 = (Y1 + 3Y2 + 8Y3)/74.]

ε^1 = Y1 - X1β^ = Y1 - (1)(Y1 + 3Y2 + 8Y3)/74 = (73Y1 - 3Y2 - 8Y3)/74.

9
See 4, 11/03, Q.29.

If εi and εj are independent,10 then Yi and Yj are independent.

Var[ ε^1] = Var[(73Y1 - 3Y2 - 8Y3)/74] = (732Var[ε1] + 32Var[ε2] + 82Var[ε3])/742 =


(732(1) + 32(5) + 82(10))/742 = 1.098.

Note that since E[ ε^1] = 0, E[ ε^12] = Var[ ε^1] = 1.098.

Exercise: What is Var[ ε^2 ]?


[Solution: ε^2 = Y2 - X2β^ = Y2 - (3)(Y1 + 3Y2 + 8Y3)/74 = (65Y2 - 3Y1 - 24Y3)/74.

Var[ ε^2 ] = Var[(65Y2 - 3Y1 - 24Y3)/74] = (652Var[ε2] + 32Var[ε1] + 242Var[ε3])/742 =


(652(5) + 32(1) + 242(10))/742 = 4.911.]

Formula for the Variance of the Residuals:*

One can derive a general formula for Var[ ^εi ] as follows.

E[Yi2] = Var[Yi] + E[Yi]2 = Var[εi] + β2Xi2.

Yi and Yj are independent ⇒ E[YiYj] = Cov[Yi, Yj] + E[Yi]E[Yj] = 0 + βXiβXj = β2XiXj, i ≠ j.

β^ = ΣXiYi / ΣXi2.

E[β^2] = E[ΣΣXiYiXjYj]/{ΣXi2}2 = ΣXi2Var[εi]/{ΣXi2}2 + ΣΣβ2Xi2Xj2/{ΣXi2}2 =
ΣXi2Var[εi]/{ΣXi2}2 + β2.

E[Yjβ^] = E[YjΣXiYi]/ΣXi2 = XjVar[εj]/ΣXi2 + β2ΣXjXi2/ΣXi2 = XjVar[εj]/ΣXi2 + Xjβ2.

ε^i = Yi - β^Xi.

E[ε^i2] = E[Yi2] + Xi2E[β^2] - 2XiE[Yiβ^] =
Var[εi] + β2Xi2 + Xi2ΣXj2Var[εj]/{ΣXj2}2 + Xi2β2 - 2Xi2Var[εi]/ΣXj2 - 2Xi2β2.

Var[ ^εi ] = E[ ^εi 2] = Var[εi] + Xi2ΣXj2Var[εj]/{ΣXj2}2 - 2Xi2Var[εi]/ΣXj2.

10
In the absence of serial correlation, we assume that the error terms are independent. Serial correlation will be
discussed in a subsequent section.

Exercise: What is Var[ ε^3 ]?

[Solution: Var[ ε^3 ] = Var[ε3] + X32ΣXj2Var[εj]/{ΣXj2}2 - 2X32Var[ε3]/ΣXj2 =


10 + 82{(12)(1) + (32)(5) + (82)(10)}/742 - (2)(82)(10)/74 = .720.
Alternately, ε^3 = Y3 - X3β^ = Y3 - (8)(Y1 + 3Y2 + 8Y3)/74 = (10Y3 - 8Y1 - 24Y2)/74.

Var[ ε^3 ] = Var[(10Y3 - 8Y1 - 24Y2)/74] = (102Var[ε3] + 82Var[ε1] + 242Var[ε2])/742 =


(102(10) + 82(1) + 242(5))/742 = .720.]

If all of the Var[εi] are equal, Var[εi] = σ2, then:11

Var[ε^i] = Var[εi] + Xi2ΣXj2Var[εj]/{ΣXj2}2 - 2Xi2Var[εi]/ΣXj2 =
σ2 + Xi2ΣXj2σ2/{ΣXj2}2 - 2Xi2σ2/ΣXj2 = σ2(1 - Xi2/ΣXj2) = σ2(Σj≠i Xj2)/ΣXj2.

E[ε^i2] = Var[ε^i] = σ2(1 - Xi2/ΣXj2).

E[ESS] = E[Σε^i2] = ΣE[ε^i2] = Σi σ2(1 - Xi2/ΣXj2) = σ2(N - 1).

Thus ESS/(N - 1) is an unbiased estimator of σ2.
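The result E[ESS] = σ2(N - 1) can also be checked by simulation. Here is a rough Python sketch (an illustration only; the X values, β and σ below are arbitrary choices, not from the text):

import random
X = [1.0, 3.0, 8.0]
beta, sigma, trials = 2.0, 4.0, 100000
total_ess = 0.0
for _ in range(trials):
    Y = [beta * x + random.gauss(0.0, sigma) for x in X]
    b_hat = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
    total_ess += sum((y - b_hat * x) ** 2 for x, y in zip(X, Y))
print(total_ess / trials)  # close to sigma^2 * (N - 1) = 16 * 2 = 32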

Covariances of Residuals:*

E[ε^1ε^2] = E[(Y1 - β^X1)(Y2 - β^X2)] = E[Y1Y2] + X1X2E[β^2] - X2E[Y1β^] - X1E[Y2β^] =
β2X1X2 + X1X2ΣXi2Var[εi]/{ΣXi2}2 + X1X2β2 - X2X1Var[ε1]/ΣXi2 - X1X2Var[ε2]/ΣXi2 - 2X1X2β2 =
X1X2ΣXi2Var[εi]/{ΣXi2}2 - X1X2(Var[ε1] + Var[ε2] )/ΣXi2.

Cov[ ε^1 , ε^2 ] = E[ ε^1 ε^2 ] - E[ ε^1]E[ ε^2 ] = X1X2ΣXi2Var[εi]/{ΣXi2}2 - X1X2(Var[ε1] + Var[ε2])/ΣXi2.

In the example, Cov[ ε^1 , ε^2 ] = (1)(3)(686)/742 - (1)(3)(1 + 5)/74 = .1326.

Corr[ ε^1 , ε^2 ] = .1326/√((1.098)(4.911)) = .057.12

Cov[ ^εi , ε^j ] = XiXj{ΣXk2Var[εk]/ΣXk2 - Var[εi] - Var[εj]}/ΣXk2.

11
Homoscedasticity is the term used for the situation in which all of the error terms have the same variance.
Homoscedasticity and heteroscedasticity will be discussed in a subsequent section.
12
While ε1 and ε2 are independent, the same is not true of the observed residuals.

Exercise: What is Corr[ ε^1 , ε^3 ]?

[Solution: Cov[ ε^1 , ε^3 ] = (1)(8){686/74 - 1 - 10}/74 = -.1870.

Corr[ ε^1 , ε^3 ] = -.1870/√((1.098)(.720)) = -.210.]

Exercise: What is Corr[ ε^2 , ε^3 ]?


[Solution: Cov[ ε^2 , ε^3 ] = (3)(8){686/74 - 5 - 10}/74 = -1.8583.

Corr[ ε^2 , ε^3 ] = -1.8583/√((4.911)(.720)) = -.988.]

For this example, the variance-covariance matrix of the residuals is:

( 1.098    0.133   -0.187 )
(  0.133    4.911   -1.858 )
( -0.187   -1.858    0.720 )

If all of the Var[εi] are equal, Var[εi] = σ2, then:

Cov[ ^εi , ε^j ] = XiXj{ΣXk2σ2/ΣXk2 - σ2 - σ2}/ΣXk2 = -σ2XiXj/ΣXk2.

Corr[ε^i, ε^j] = -XiXj / √{(Σk≠i Xk2)(Σk≠j Xk2)}.

Simulation:

Assume we have Yi = 2Xi + εi, with εi independent and Normal with mean zero, and:
i Xi Var(εi)
1 1 1
2 3 5
3 8 10

We can simulate this situation as follows:


1. Simulate ε1, ε2, and ε3.
2. Calculate Yi = 2Xi + εi.
3. Fit a regression, β^ = ΣXiYi / ΣXi2.
4. Calculate Y^i = β^Xi.
5. Calculate ε^i = Yi - Y^i.

For example, let -1.272, -.620, and .574, be 3 independent random Standard Normals.
ε1 = -1.272√1 = -1.272. ε2 = -.620√5 = -1.386. ε3 = .574√10 = 1.815.
Y1 = 2X1 + ε1 = (2)(1) - 1.272 = .728. Y2 = (2)(3) - 1.386 = 4.614. Y3 = (2)(8) + 1.815 = 17.815.
β^ = ΣXiYi / ΣXi2 = 157.1/74 = 2.123.
Y^1 = (2.123)(1) = 2.123. Y^2 = (2.123)(3) = 6.369. Y^3 = (2.123)(8) = 16.984.

ε^1 = .728 - 2.123 = -1.395. ε^2 = 4.614 - 6.369 = -1.755. ε^3 = 17.815 - 16.984 = .831.

Exercise: Let 2.388, -.849, and -2.315, be 3 independent random Standard Normals.
Simulate the above situation and determine the residuals.
[Solution: ε1 = 2.388√1 = 2.388. ε2 = -.849√5 = -1.898. ε3 = -2.315√10 = -7.321.
Y1 = (2)(1) + 2.388 = 4.388. Y2 = (2)(3) - 1.898 = 4.102. Y3 = (2)(8) - 7.321 = 8.679.
β^ = ΣXiYi / ΣXi2 = 86.126/74 = 1.164.
Y^1 = (1.164)(1) = 1.164. Y^2 = (1.164)(3) = 3.492. Y^3 = (1.164)(8) = 9.312.

ε^1 = 4.388 - 1.164 = 3.224. ε^2 = 4.102 - 3.492 = .610. ε^3 = 8.679 - 9.312 = -.633.]
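The five-step procedure is easy to carry out by computer. Here is a minimal Python sketch of it (an illustration; the standard normal draws are generated rather than taken from a table):

import math, random
X = [1, 3, 8]
var_eps = [1, 5, 10]
beta_true = 2.0
z = [random.gauss(0.0, 1.0) for _ in X]                              # step 1: standard normals
eps = [zi * math.sqrt(v) for zi, v in zip(z, var_eps)]               # scale to Var(eps_i)
Y = [beta_true * x + e for x, e in zip(X, eps)]                      # step 2
beta_hat = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)  # step 3
Y_hat = [beta_hat * x for x in X]                                    # step 4
residuals = [y - yh for y, yh in zip(Y, Y_hat)]                      # step 5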

Notice that each time we perform this simulation we get a different set of Yis, a different fitted
slope, and a different set of residuals. If we ran this simulation 1000 times, we would get a set

of 1000 different values for ε^1. Var[ ε^1] measures the variance of ε^1 around its expected value
of zero.

If we ran this simulation 1000 times, we would get a set of 1000 different values for β^.
Var[β^] measures the variance of β^ around its expected value of β = 2.¹³

13
The variance of fitted regression parameters will be discussed subsequently.

Problems:

Use the following 4 observations for the next 3 questions:


X: 4 7 13 19
Y: 5 15 22 35

1.1 (1 point) Via least squares, fit to the above observations the following model Y = βX + ε.
What is the fitted value of β?
(A) 1.4 (B) 1.5 (C) 1.6 (D) 1.7 (E) 1.8

1.2 (2 points) For the model fit in the previous question, what is the Error Sum of Squares?
(A) 11 (B) 12 (C) 13 (D) 14 (E) 15

1.3 (2 points) For the model Y = 2X, what is the sum of squared errors?
(A) 30 (B) 35 (C) 40 (D) 45 (E) 50

1.4 (2 points) You are given:


(i) The model is Yi = βXi + εi, i = 1, 2, 3.
(ii) i Xi Var(εi)
1 1 1
2 5 2
3 10 4
(iii) The ordinary least squares residuals are ε^i = Yi - β^Xi, i = 1, 2, 3.

Determine E( ε^2 2 | X1, X2, X3).


(A) 1.7 (B) 1.8 (C) 1.9 (D) 2.0 (E) 2.1

1.5 (1 point) Via ordinary least squares, the model Y = βX + ε is fit to the following data:
X: 1 5 10 25
Y: 5 15 50 100
Determine β^.
(A) 3.9 (B) 4.0 (C) 4.1 (D) 4.2 (E) 4.3

1.6 (2 points) You are given the following data on the appraised values and sale prices of six
homes, in thousands of dollars:
Appraised Value: 170 213 68 66 96 137
Sale Price: 180 245 85 88 132 156
Fit a least squares line with no intercept.
What is the estimated sale price of a home appraised at 300?
(A) 340 (B) 342 (C) 344 (D) 346 (E) 348

1.7 (2 points) Fit a least squares line with no intercept to the following data:
X -2 -1 0 1 2 3 4 5
Y -12 -7 0 6 14 21 24 31
What is the slope of the fitted line?
(A) 6.1 (B) 6.2 (C) 6.3 (D) 6.4 (E) 6.5

1.8 (3 points) You are given the following information on the SAT scores for 10 students.
English: 630 700 540 610 580 670 710 630 580 760
Math: 570 710 570 580 610 640 660 640 670 720
Fit via least squares the model: Math Score = β(English Score).
What is the fitted value of β?
A. 0.98 B. 0.99 C. 1.00 D. 1.01 E. 1.02

1.9 (1 point) Given the following information:


ΣXi = -1015. ΣYi = -1410. ΣXi2 = 191,711. ΣYi2 = 123,526. ΣXiYi = 36,981. n = 20.
Determine the value of β fitted via least squares for the following model: Yi = βXi + ε.
A. Less than 0.18
B. At least 0.18, but less than 0.19
C. At least 0.19, but less than 0.20
D. At least 0.20, but less than 0.21
E. 0.21 or more

1.10 (2, 5/85, Q. 19) (1.5 points) For the data (x1, y1) = (1, 2) and (x2, y2) = (5, 3) and the
model E(Y) = βx, the least squares estimate of β is:
A. 1/4 B. 17/26 C. 17/13 D. 17/6 E. 4

* 1.11 (4, 5/00, Q.16) (2.5 points) You are given:


(i) x1 = -2 x2 = -1 x3 = 0 x4 = 1 x5 = 2
(ii) The true model for the data is y = 10x + 3x2 + ε.
(iii) The model fitted to the data is y = β*x + ε*.
Determine the expected value of the least-squares estimator of β*.
(A) 6 (B) 7 (C) 8 (D) 9 (E) 10

* 1.12 (4, 11/00, Q.35) (2.5 points)


You are analyzing a large set of observations from a population.
The true underlying model is: y = 0.1t - z + ε.
You fit a two-variable model to the observations, obtaining: y = 0.3t + ε*.
You are given: Σ t = 0. Σ t2 = 16. Σ z = 0. Σ z2 = 9.
Estimate the correlation coefficient between z and t.
(A) -0.7 (B) -0.6 (C) -0.5 (D) -0.4 (E) -0.3

1.13 (IOA 101, 9/03, Q.14) (12 points) Consider a linear regression model in which
responses Yi are uncorrelated and have expectations βXi and common variance
σ2 (i = 1,... ,n) ; i.e., Yi is modeled as a linear regression through the origin:
E(Yi | Xi) = βXi and V(Yi | Xi) = σ2 (i = 1,... ,n).
(i) (3.75 points) (a) Show that the least squares estimator of β is β^1 = ΣXiYi/ΣXi2.
(b) Derive the expectation and variance of β^1 under the model.
(ii) (3 points) An alternative to the least squares estimator in this case is:
β^2 = ΣYi/ΣXi = Y / X .
(a) Derive the expectation and variance of β^2 under the model.
(b) Show that the variance of the estimator β^2 is at least as large as that of the least squares
estimator β^1.
(iii) (5.25 points) Now consider an estimator β^3 of β which is a linear function of the responses;
i.e., an estimator which has the form β^3 = ΣaiYi, where a1,..., an are constants.
(a) Show that β^3 is unbiased for β if ΣaiXi = 1, and that the variance of β^3 is Σai2σ2.
(b) Show that the estimators β^1 and β^2 above may be expressed in the form β^3 = ΣaiYi and
hence verify that β^1 and β^2 satisfy the condition for unbiasedness in (iii)(a).
(c) It can be shown that, subject to the condition ΣaiXi = 1, the variance of β^3 is minimized by
setting ai = Xi/ΣXi2. Comment on this result.

1.14 (4, 11/03, Q.29) (2.5 points) You are given:


(i) The model is Yi = βXi + εi, i = 1, 2, 3.
(ii) i Xi Var(εi)
1 1 1
2 2 9
3 3 16
(iii) The ordinary least squares residuals are ε^i = Yi - β^Xi, i = 1, 2, 3.

Determine E( ε^12 | X1, X2, X3).


(A) 1.0 (B) 1.8 (C) 2.7 (D) 3.7 (E) 7.6

1.15 (VEE-Applied Statistics Exam, 8/05, Q.4) (2.5 points) You are given:
Yi = β + βXi + εi.
Determine the least-squares estimate of β.
(A) ΣYi / ΣXi
(B) ΣYi / Σ(1 + Xi)
(C) ΣXiYi / ΣXi2
(D) Σ(1 + Xi)Yi / Σ(1 + Xi)2
(E) Σ(Xi - X )(Yi - Y )/ Σ(Xi - X )2

Section 2, Fitting a Straight Line with an Intercept


In the previous section we fit a straight line with no intercept, to the heights of fathers and their
sons. In this section we will include an intercept in the model.

Let us assume Y = α + βX + ε, where X is the height of the father and Y is the height of his son.
This model, with one independent variable and one intercept, is called the
two-variable regression model. We want to determine the best values of α and β, those
that minimize the sum of the squared differences between the height of each son
estimated by our equation α + βXi, and the actual height of that son Yi. This is called the
ordinary least squares regression.14

Sum of Squared Errors = Σ(Yi - α − βXi)2.

We would determine the least squares estimates of α and β algebraically, by setting equal to
zero the partial derivatives with respect to α and β of the sum of squared errors.

0 = ∂Σ(Yi - α − βXi)2/ ∂α = -2Σ(Yi - α - βXi). ⇒ 0 = ΣYi - Σα - ΣβXi. ⇒


αN + βΣXi = ΣYi , where N is the number of observations.

0 = ∂Σ(Yi - α − βXi)2/ ∂β = -2Σ(Yi - α - βXi)Xi. ⇒ 0 = ΣXiYi - αΣXi - ΣβXiXi. ⇒


αΣXi + βΣXi2 = ΣXiYi.

Exercise: Use the above equations in order to determine the least squares estimates of
α and β for the fathers and sons example.
[Solution: ΣXiYi = (53)(56) + ... + (66)(64) = 29063.
ΣXi2 = 532 + ... + 662 = 28228.
N = number of observations = 8.
ΣXi = 53 + ... + 66 = 474.
ΣYi = 56 + ... + 64 = 489.
Therefore, 8α + 474β = 489 and 474α + 28228β = 29063.

Therefore, α^ = {(489)(28228) - (29063)(474)}/{(8)(28228) - 4742} = 27630/1148 = 24.07, and


β^ = {(29063)(8) - (489)(474)}/{(8)(28228) - 4742} = 718/1148 = .6254.]

Thus the result of this regression is: Y^i = 24.07 + .6254Xi.
For example, the fitted height of the first son is: Y^1 = 24.07 + .6254X1 = 24.07 + (.6254)(53) =
57.216. This of course differs somewhat from the actual height of the first son which is 56.

14
The term “regression” was introduced by Francis Galton in the 1880s, referring to his analysis of the heights of
adult children versus the heights of their parents.

Here is a graph of the least squares line with intercept (solid) and that without intercept
(dashed), each fit to the same data on heights:

[Plot of the eight data points with the fitted line with intercept, Y^ = 24.07 + .6254X (solid), and the fitted line with no intercept, Y^ = 1.03X (dashed); father’s height is on the horizontal axis and son’s height on the vertical axis.]

The line with intercept (solid) seems to fit better than that without intercept (dashed). However,
this will always be the case, since the line with no intercept is just a special case of that with

intercept, with α^ = 0. How to determine whether the line with intercept is a significantly better
fit will be discussed subsequently.

We obtained two equations in two unknowns:


αN + βΣXi = ΣYi , where N is the number of observations.
αΣXi + βΣXi2 = ΣXiYi .

The solution is:


α^ = {ΣYiΣXi2 - ΣXiΣXiYi} / {NΣXi2 - (ΣXi)2}, or α^ = Y − β^X.

β^ = {NΣXiYi - ΣXiΣYi} / {NΣXi2 - (ΣXi)2}.

These are the easiest formulas to use, when one is given the summary statistics such as ΣXiYi.
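As an illustration (not part of the original text), here is a short Python sketch that applies these summary-statistic formulas; with the heights data it reproduces α^ = 24.07 and β^ = .6254:

# Two-variable regression from summary statistics.
N, sum_x, sum_y = 8, 474, 489
sum_xx, sum_xy = 28228, 29063
beta_hat = (N * sum_xy - sum_x * sum_y) / (N * sum_xx - sum_x ** 2)  # 0.6254
alpha_hat = (sum_y - beta_hat * sum_x) / N                           # 24.07, i.e. Ybar - beta*Xbar
print(alpha_hat, beta_hat)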

Using the Functions of the Calculator:

Provided you are given the individual data rather than the summary statistics,
the allowed electronic calculators will fit a least squares straight line with an
intercept.

Father Son
53 56
54 58
57 61
58 60
61 63
62 62
63 65
66 64

Using the TI-30X-IIS, one would fit a straight line with intercept as follows:
2nd STAT
CLRDATA ENTER
2nd STAT
2-VAR ENTER (Use the arrow key if necessary to select 2-VAR rather than 1-VAR.)
DATA
X1 = 53
Y1 = 56
X2 = 54
Y2 = 58
etc.
X8 = 66
Y8 = 64 ENTER
STATVAR

Various outputs are displayed. Use the arrow keys to scroll through them.
n = 8. (number of pairs of data.)
X = 59.25 (sample mean of X.)
Sx = 4.5277 (square root of the sample variance of X, computed with n - 1 in the denominator.)
σx = 4.2353 (square root of the variance of X, computed with n in the denominator.)
Y = 61.125 (sample mean of Y.)
Sy = 3.0443 (square root of the sample variance of Y, computed with n - 1 in the denominator.)
σy = 2.8477 (square root of the variance of Y, computed with n in the denominator.)
Σ X = 474 ΣX2 = 28228 ΣY = 489 ΣY2 = 29955 ΣXY = 29063

a = 0.6254 slope
b = 24.07 intercept

r = 0.93019 (sample correlation coefficient between X and Y.)
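For those working on a computer rather than the allowed calculator, the same outputs can be reproduced with a short Python sketch (an illustration only; it assumes numpy is available):

import numpy as np
X = np.array([53, 54, 57, 58, 61, 62, 63, 66])
Y = np.array([56, 58, 61, 60, 63, 62, 65, 64])
print(X.mean(), Y.mean())              # 59.25, 61.125
print(X.std(ddof=1), Y.std(ddof=1))    # 4.5277, 3.0443 (sample standard deviations)
print(np.corrcoef(X, Y)[0, 1])         # 0.93019
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)                # 0.6254, 24.07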



Deviations Form:

While these are perfectly valid solutions, when given individual data some people find it easier
to work with the variables in deviations form.

Exercise: What is the mean height of the fathers?


[Solution: (53 + 54 + 57 + 58 + 61 + 62 + 63 + 66)/8 = 474/8 = 59.25.]

Exercise: What is the mean height of the sons?


[Solution: (56 + 58 + 61 + 60 + 63 + 62 + 65 + 64)/8 = 489/8 = 61.125.]

The mean of a variable is written as that variable with a bar over it. Mean of X is X .
Mean height of fathers = X = 59.25.
Mean height of sons = Y = 61.125.

To convert a variable to deviations form, one subtracts its mean.


A variable in deviations form is written with a small rather than capital letter.

xi = Xi - X .

Exercise: What are xi and yi?


[Solutions: xi = Xi - X = (53, 54, 57, 58, 61, 62, 63, 66) - 59.25 =
(-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75).
yi = Yi - Y = (56, 58, 61, 60, 63, 62, 65, 64) - 61.125 =
(-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).]

Σxi = ΣXi - N X = N X - N X = 0.

Verify that in this case both xi and yi sum to zero. In general, the sum of any variable in
deviations form is zero. Therefore, its mean is also zero.

Variables in deviations form always have a mean of zero.



Least Squares Regression in Deviations Form:

We have assumed the model, Yi = α + βXi + εi, i = 1, 2, ... N.


Then adding up the N equations and dividing by N we get:
Y = α + β X + Σεi/N.
We have no reason to believe the average error is positive or negative. Let’s assume it is zero.15
Then we would expect that: Y = α^ + β^X. ⇒ α^ = Y − β^X.16
One could verify that this is true in general for the solutions given previously for α^ and β^.
In any case, when we set the partial derivative of the squared error with respect to α equal to
zero we got: α^N + β^ΣXi = ΣYi. ⇒ Y = α^ + β^X. ⇒ α^ = Y − β^X.

Exercise: For the regression fit to heights, verify that α^ = Y − β^X.
[Solution: α^ = 24.07. β^ = .6254. X = 59.25. Y = 61.125.
24.07 = 61.125 - (.6254)(59.25).]

We can take the original model and convert it to deviations form:


Yi = α + βXi + εi = Y − βX + βXi + εi. ⇒ Yi - Y = β(Xi - X) + εi. ⇒
yi = βxi + εi.

In deviations we get the same equation, except with no intercept. Based on the previous
section, the least squares fit is: β^ = Σxiyi / Σxi2.

In deviations form, the least squares regression to the two-variable (linear)
regression model, Yi = α + βXi + εi, has solution:
β^ = Σxiyi / Σxi2
α^ = Y − β^X.

Exercise: Using deviations form, fit the least squares regression to the data on heights.
[Solution: xi = (-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75).
yi = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).

Σxi2 = 143.5. Σxiyi = 89.75. β^ = Σxiyi /Σxi2 = 89.75/143.5 = .625.


α^ = Y − β^X = 61.125 - (.625)(59.25) = 24.1.
Comment: This matches the result obtained previously.]

15
Assumptions behind least squares regression will be discussed subsequently.
16
This is a good way to remember this formula.

A Shortcut when using Deviations Form:*

Σxiyi = Σxi(Yi - Y ) = ΣxiYi - Y Σxi = ΣxiYi - Y 0 = ΣxiYi.

Therefore, β^ = ΣxiYi / Σxi2.

This can save some time on an exam, by avoiding having to calculate yi = Yi - Y .


One would still have to calculate Y, in order to calculate α^ = Y − β^X.

S Notation:*

Another notation that some people find useful is:

SXX = Σ(Xi - X )2 = ΣXi2 - (ΣXi)2/N.

SYY = Σ(Yi - Y )2 = ΣYi2 - (ΣYi)2/N.

SXY = Σ(Xi - X )(Yi - Y ) = ΣXiYi - ΣXiΣYi/N.

Exercise: For the regression of heights example, calculate SXX, SYY, and SXY.
[Solution: SXX = 143.5, SYY = 64.875, and SXY = 89.75.]

Then, β^ = SXY/SXX.

For the regression of heights example, β^ = 89.75/143.5 = .625.

As before, α^ = Y − β^X.

The sample variance of X is: SXX/(N-1).

The sample variance of Y is: SYY/(N-1).

The sample covariance of X and Y is: SXY/(N-1).

The sample correlation of X and Y is: SXY/√(SXXSYY).

For the regression of heights example, r = 89.75/√{(143.5)(64.875)} = 0.9302.
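For completeness, here is a brief Python sketch of the S-notation quantities for the heights data (illustrative only):

X = [53, 54, 57, 58, 61, 62, 63, 66]
Y = [56, 58, 61, 60, 63, 62, 65, 64]
N = len(X)
Sxx = sum(x * x for x in X) - sum(X) ** 2 / N                    # 143.5
Syy = sum(y * y for y in Y) - sum(Y) ** 2 / N                    # 64.875
Sxy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N     # 89.75
beta_hat = Sxy / Sxx                                             # 0.625
r = Sxy / (Sxx * Syy) ** 0.5                                     # 0.9302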



Relation of Fitted Slope to Covariances or Correlations:

The sample variance of X is: sX2 = Σ (Xi - X )2 / (N - 1) = Σxi2 / (N - 1).


The sample covariance of X and Y is:
Cov[X, Y] = Σ (Xi - X )(Yi - Y ) / (N - 1) = Σ xiyi / (N - 1).
Therefore, β^ = Σxiyi/Σxi2 = Cov[X, Y] / Var[X].

β^ = Cov[X, Y] / Var[X].

Exercise: The sample variance of X is 125. The sample covariance of X and Y is 167.
What is β^ in a two variable linear regression?
[Solution: β^ = Cov[X, Y] / Var[X] = 167/125 = 1.336.]

Note that the sample correlation coefficient is: r = Cov[X, Y]/(sXsY) =


{Σ(Xi - X)(Yi - Y)/(N - 1)} / √({Σ(Xi - X)2/(N - 1)}{Σ(Yi - Y)2/(N - 1)}) = Σxiyi / √(Σxi2Σyi2).

Therefore, β^ = Σxiyi / Σxi2 = r√(Σyi2/Σxi2) = rsY/sX.

β^ = r sY/sX.

For the heights example, r = .9302, sX = 4.528, and sY = 3.044, so β^ = (.9302)(3.044)/4.528 = .625.

Exercise: The sample correlation of X and Y is -.4. The sample standard deviation of X is 5.
The sample standard deviation of Y is 10. What is β^ in a two variable linear regression?
[Solution: β^ = rsY/sX = -.4(10/5) = -.8.]

Problems:

Use the following 4 observations for the next 2 questions:


X: 0 4 8 12
Y: 834 889 916 950

2.1 (2 points) Via least squares, fit to the above observations the following model
Y = α + βX + ε. What is the fitted value of β?
(A) 9.0 (B) 9.2 (C) 9.4 (D) 9.6 (E) 9.8

2.2 (1 point) Via least squares, fit to the above observations the following model
Y = α + βX + ε. What is the fitted value of α?
(A) 800 (B) 810 (C) 820 (D) 830 (E) 840

2.3 (1 point) The sample covariance of X and Y is -413. The sample variance of X is 512. What
is β^ in a two variable linear regression?
(A) -0.8 (B) -0.7 (C) -0.6 (D) -0.5 (E) -0.4

2.4 (3 points) You fit a two-variable linear regression to the following 5 observations:


X: 1 2 3 4 5
Y: 202 321 404 480 507
What is the predicted value of Y, when X = 7?
(A) 650 (B) 670 (C) 690 (D) 710 (E) 730

2.5 (1 point) The sample correlation of X and Y is 0.6. The sample variance of X is 36. The
sample variance of Y is 64. What is β^ in a two variable linear regression?
(A) 0.6 (B) 0.8 (C) 1.0 (D) 1.2 (E) 1.4

2.6 (2 points) Use the following 4 observations:


X: -1 1 3 5
Y: 3 4 7 6
Fit a least squares straight line with intercept and use it to estimate y for x = 6.
(A) 7.0 (B) 7.2 (C) 7.4 (D) 7.6 (E) 7.8

2.7 (3 points) Use the following information:


Year (t) Loss Ratio (Y)
1 82
2 78
3 80
4 73
5 77
You fit the following model: Y = α + βt + ε.
What is the estimated Loss Ratio for year 7?
(A) 71 (B) 72 (C) 73 (D) 74 (E) 75

2.8 (2 points) For each of five policy years an actuary has estimated the ultimate losses based
on the information available at the end of that policy year.
Policy Year Estimated Actual Ultimate
1991 45 43
1992 50 58
1993 55 63
1994 60 76
1995 65 78
Let Xt be the actuary’s estimate and Yt be the actual ultimate.
Fit the ordinary least squares model, Yt = α + βXt.

2.9 (2 points) You are given the following data on the number of exams and the salaries of
seven actuaries in the land of Elbonia:
Number of Exams: 2 3 3 4 2 4 3
Salaries: 50 63 56 66 60 82 71
Fit a least squares line with intercept.
What is the estimated salary of an actuary with 5 exams?
(A) 83 (B) 84 (C) 85 (D) 86 (E) 87

2.10 (3 points) You are given the following data for 10 taxi drivers. For each driver you are
given the number of moving traffic violations during three years and the sum of their basic limit
losses for Bodily Injury Liability Insurance (in $1000) during the following three years.
Violations: 0 0 0 0 1 1 1 2 3 5
Losses: 10 0 43 0 35 0 80 0 58 64
Fit a least squares line with intercept.
What are the estimated losses for a taxi driver with 4 moving violations?
(A) 49 (B) 51 (C) 53 (D) 55 (E) 57

2.11 (2 points) Given the following information:


ΣXi = 351. ΣYi = 15,227. ΣXi2 = 6201. ΣYi2 = 9,133,797. ΣXiYi = 204,296. n = 26.
Determine the least squares equation for the following model:
Yi = β0 + β1Xi + ε.
A. Y^i = 601.2 - 1.153Xi.
B. Y^i = 570.1 + 1.153Xi.
C. Y^i = 597.4 - 0.867Xi.
D. Y^i = 573.9 + 0.867Xi.
E. None of the above

2.12 (3 points) A linear regression, Y = α + βX, is fit to a set of observations (Xi, Yi), where X is
in feet and Y is in dollars. α^ = 37 and β^ = 2.4.
If instead X had been in meters, 1 meter = 3.28 feet, and Y had been in yen, 1 yen = 116
dollars, what would have been the fitted model?

2.13 (2 points) Use the following information:


Year (t) Claim Frequency (Y)
1 3.18%
2 3.12%
3 3.30%
4 3.39%
5 3.41%
You fit via least squares the following model: Y = α + βt.
What is the fitted claim frequency for year 7?
(A) 3.51% (B) 3.53% (C) 3.55% (D) 3.57% (E) 3.59%

2.14 (2 points) You are given the following information on six women each 35 years old and
five foot 4 inches tall:
Weight: 142 146 156 163 170 177
Household income($000): 52 49 48 47 43 42
Determine the least squares equation for the following model: Yi = β0 + β1Xi + ε.
What is the fitted value of β1?
A. -0.27 B. -0.25 C. -0.23 D. -0.21 E. -0.19

2.15 (3 points) A linear regression, Y = α + βX, is fit to a set of observations (Xi, Yi).
What would be the effect on the fitted regression if a constant c had been added to each Xi?
What would be the effect on the fitted regression if instead a constant c had been added to
each Yi?

2.16 (2 points) For each of 10 insureds, you are given the number of claims in year 1 and the
number of claims in year 2.
Insured: 1 2 3 4 5 6 7 8 9 10
Year 1: 0 0 0 0 0 1 1 1 1 1
Year 2: 0 0 0 1 1 0 0 1 1 1
Fit a least squares line with intercept, using the number of claims in year 1 as the independent
variable and the number of claims in year 2 as the dependent variable.
What is the estimated future claim frequency for an insured with one claim in the most recent
year?
(A) 45% (B) 50% (C) 55% (D) 60% (E) 65%

2.17 (2 points) Given the following information:


ΣXi = 153. ΣYi = 727. ΣXi2 = 2016. ΣYi2 = 17972. ΣXiYi = 4002. n = 62.
Fit the following model via least squares: Yi = α + βXi + ε.
For the fitted model, what value of Y corresponds to X = 20?
A. Less than 35
B. At least 35, but less than 36
C. At least 36, but less than 37
D. At least 37, but less than 38
E. At least 38

2.18 (3 points) For a set of 10,000 private passenger automobile insureds you are given their
claim counts in 2004 and 2005.
2004 Claim Count
2005 Claim Count 0 1 2 Total
0 8300 740 40 9080
1 750 100 8 858
2 50 10 2 62
Total 9100 850 50 10,000

Fit a least squares regression, Y = α + βX, where X is the claim count in 2004 and Y is the
claim count in 2005.
Joe had 2 claims in 2004.
Use this regression in order to estimate Joe’s expected claim frequency in 2005.
A. 0.14 B. 0.16 C. 0.18 D. 0.20 E. 0.22

2.19 (Course 120 Sample Exam #2, Q.1) (2 points) You fit the model Yi = α + βX i + ε i to
the following data:
i 1 2 3
Xi 1 3 4
Yi 2 Y2 5
You determine that α^ = 5/7. Calculate Y2.
(A) 0 (B) 1 (C) 2 (D) 3 (E) 4

2.20 (Course 120 Sample Exam #2, Q.7) (2 points) You are given the following
information about a simple linear regression fit to 10 observations:
ΣXi = 20. ΣYi = 100. Σ(Xi - X)2/9 = 4. Σ(Yi - Y)2/9 = 64. (All sums run from i = 1 to 10.)
You are also given that the simple correlation coefficient r = -0.98.
Determine the predicted value of Y when X = 5.
(A) -10 (B) -2 (C) 11 (D) 30 (E) 37

2.21 (IOA 101, 9/01, Q.4) (1.5 points) Let {(Xi , Yi); i = 1, … , n} denote a set of n pairs of
points, with X the sample mean of the Xs and Y the sample mean of the Ys.
Assuming the usual expressions for the estimated coefficients, verify that the
least squares fitted regression line of Y on X passes through the point ( X , Y ).

2.22 (IOA 101, 9/03, Q.6) (1.5 points)


Show that the slope of the regression line fitted by least squares to the three points:
(0, 0) , (1, y) , (2, 2)
is 1 for all values of y.

2.23 (CAS3, 5/05, Q.27) (2.5 points) Given the following information:
ΣXi = 144. ΣYi = 1,742. ΣXi2 = 2,300. ΣYi2 = 312,674. ΣXiYi = 26,696. n = 12.
Determine the least squares equation for the following model:
Yi = β0 + β1Xi + ε
A. Y^i = -0.73 + 12.16Xi
B. Y^i = -8.81 + 12.16Xi
C. Y^i = 283.87 + 10.13Xi
D. Y^i = 10.13 + 12.16Xi
E. Y^i = 23.66 + 10.13Xi

2.24 (CAS3, 5/06, Q.9) (2.5 points)


The following summary statistics are available with respect to a random sample of seven
observations of the price of gasoline, Y, versus the price of oil, X:
ΣXi = 315. ΣXi2 = 14,875. ΣYi = 12.8. ΣYi2 = 24.3. ΣXiYi = 599.5.
Use the available information and a linear regression model of the form Y = α + βX to calculate
the predicted cost of gasoline if the price of oil reaches $75.
A. Less than $2.85
B. At least $2.85, but less than $2.90
C. At least $2.90, but less than $2.95
D. At least $2.95, but less than $3.00
E. At least $3.00

Section 3, Residuals

Continuing the example from the previous section, the fitted height of a son is:
Y^i = 24.07 + .6254Xi, where Xi is the height of his father.

As discussed previously, the difference between each son’s height and his height estimated by
the model is the residual.
Residual = actual - estimated. ε^i ≡ Yi - Y^i.

Exercise: What are the residuals for the fitted model Y^i = 24.07 + .6254Xi?
[Solution: ^εi = 56 - 57.216, 58 - 57.842, 61 - 59.718, 60 - 60.343, 63 - 62.219, 62 - 62.845,
65 - 63.470, 64 - 65.346 = -1.216, .158, 1.282, -.343, .781, -.845, 1.530, -1.346.]

For the two variable linear regression model with an intercept:


ε^i = Yi - Y^i = Yi - α^ - β^Xi = Yi - (Y - β^X) - β^Xi = yi - β^xi.

Σε^i = Σ(yi - β^xi) = Σyi - β^Σxi = 0 - β^0 = 0.

For the linear regression model with an intercept, the sum of the residuals is
always zero.17

This provides a good check of your work.


For the current example, Σ ^εi = -1.216 + .158 + 1.282 - .343 + .781 - .845 + 1.530 - 1.346 =
0.001, zero subject to rounding.

Error Sum of Squares:

Exercise: For the fitted model Y^i = 24.07 + .6254Xi, what is the sum of squared errors?

[Solution: Σε^i2 = 1.2162 + .1582 + 1.2822 + .3432 + .7812 + .8452 + 1.5302 + 1.3462 = 8.741.]

The sum of squared errors = Error Sum of Squares = ESS ≡ Σε^i2 = Σ(Yi - Y^i)2.

In this case, ESS = 8.741.

The error sum of squares will be discussed further in the section on Analysis of Variance.
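As a check (an illustrative sketch, not from the original text), the residuals and ESS of the line with intercept can be computed as follows:

# Residuals and ESS for the fitted line with intercept.
X = [53, 54, 57, 58, 61, 62, 63, 66]
Y = [56, 58, 61, 60, 63, 62, 65, 64]
alpha_hat, beta_hat = 24.07, 0.6254
residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(X, Y)]
print(sum(residuals))                 # about 0, as it should be with an intercept
print(sum(e * e for e in residuals))  # ESS, about 8.74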

17
This is not necessarily true for a model with no intercept.

Other Properties of Residuals:*18

One can prove that the residuals are uncorrelated with X.

Corr[ ε^ , X] = Cov[ ε^ , X]/√(Var[ ε^ ]Var[X]). Cov[ ε^ , X] = E[ ε^ X] - E[ ε^ ]E[X]. Since the mean of the
residuals is always zero, the numerator of Corr[ε^ , X] is:
Cov[ ε^ , X] = E[ ε^ X] = Σ ^εi (Xi - X ) = Σ ^εi xi.

In the current example, Σ ^εi xi is: (-6.25)(-1.216) + (-5.25)(.158) + (-2.25)(1.282) +


(-1.25)(-.343) + (1.75)(.781) + (2.75)(-.845) + (3.75)(1.53) + (6.75)(-1.346) = .01, or zero subject
to rounding.

In general, ε^i = Yi - Y^i = Yi - α^ - β^Xi = Yi - (Y - β^X) - β^Xi = yi - β^xi.
Σε^ixi = Σ(yi - β^xi)xi = Σxiyi - β^Σxi2 = 0, since β^ = Σxiyi / Σxi2.

Therefore, Corr[ ε^ , X] = 0.

As will be seen when we discuss analysis of variance, the difference between the fitted Y and
the mean of Y, Y^i - Y, is also of interest. Y^i - Y = α^ + β^Xi - Y = Y - β^X + β^Xi - Y = β^xi.

Σε^i(Y^i - Y) = Σε^iβ^xi = β^Σε^ixi = β^(0) = 0.

Thus, Y^ - Y and ε^ are uncorrelated.

Exercise: In the current example, compute Σε^i(Y^i - Y).
[Solution: Y^i - Y = 57.22 - 61.125, 57.84 - 61.125, 59.72 - 61.125, 60.34 - 61.125,
62.22 - 61.125, 62.84 - 61.125, 63.47 - 61.125, 65.35 - 61.125 =
-3.91, -3.28, -1.41, -.78, 1.09, 1.72, 2.34, 4.22.
Σε^i(Y^i - Y) = (-3.91)(-1.216) + (-3.28)(.158) + (-1.41)(1.282) + (-.78)(-.343) + (1.09)(.781) +
(1.72)(-.845) + (2.34)(1.53) + (4.22)(-1.346) = -.006, or zero subject to rounding.]

18
See Appendix 3.2 of Pindyck and Rubinfeld.

Problems:

3.1 (1 point) A regression is fit to 5 observations. The first four residuals are: 12, -4, -9, and 6.
What is the error sum of squares?
A. 220 B. 240 C. 260 D. 280 E. 300

3.2 (3 points) A two-variable regression is fit to the following 4 observations.


t 1 2 3 4
Y 30 40 55 60
What is the error sum of squares?
(A) Less than 16
(B) At least 16, but less than 17
(C) At least 17, but less than 18
(D) At least 18, but less than 19
(E) At least 19

3.3 (2 points) A two-variable regression is fit to 5 observations.


The first four values of the independent variable X and the residuals are as follows:
i 1 2 3 4
Xi 7 12 15 21

ε^i 1.017 0.409 -0.557 -2.487
What is X5?
A. 29 B. 30 C. 31 D. 32 E. 33

3.4 (3 points) A two-variable regression is fit to 5 observations. The first 4 values of the
dependent variable Y and the corresponding fitted values Y^ are as follows:
i 1 2 3 4
Yi 13 25 36 40
Y^i 18.036 22.989 30.419 40.325
What is Y5?
A. 48 B. 49 C. 50 D. 51 E. 52

Section 4, Dividing the Sum of Squares into Two Pieces

As will be discussed, one can divide the Total Sum of Squares (TSS) into two pieces: the
Regression Sum of Squares (RSS) and Error Sum of Squares (ESS).19

Sample Variance:

Exercise: X1 and X2 are two independent, identically distributed variables, with mean µ and
variance σ2. X = (X1 + X2)/2. What is the expected value of: (X1 - X )2 + (X2 - X )2?
[Solution: (X1 - X)2 + (X2 - X)2 = (X1/2 - X2/2)2 + (X2/2 - X1/2)2 = 2(X1 - X2)2/4 =
X12/2 + X22/2 - X1X2. E[(X1 - X)2 + (X2 - X)2] = E[X12/2 + X22/2 - X1X2] =
(σ2 + µ2)/2 + (σ2 + µ2)/2 - µ2 = σ2.]

Thus {(X1 - X )2 + (X2 - X )2}/(2 - 1) = (X1 - X )2 + (X2 - X )2 is an unbiased estimator of σ2.


In general, with N independent, identically distributed variables Xi, Σ(Xi - X )2/(N - 1) is an
unbiased estimator of the variance.

Σ (X i - X) 2 /(N - 1) is called the sample variance of X.

The sample variance has in its numerator the sum of squared differences between each
element and the mean. The denominator of the sample variance is the number of
elements minus one.20 With this denominator, the sample variance is an unbiased
estimator of the underlying variance, when the underlying mean is unknown.21

Exercise: The heights of the eight sons were: 56, 58, 61, 60, 63, 62, 65, and 64.
What is the sample variance of heights of these sons?
[Solution: Y = 61.125. Sample Variance ≡ Σ(Yi - Y )2/(N - 1) =
{(56 - 61.125)2 + (58 - 61.125)2 + (61 - 61.125)2 + (60 - 61.125)2 +
(63 - 61.125)2 + (62 - 61.125)2 + (65 - 61.125)2 + (64 - 61.125)2} / (8 - 1) = 64.875/7 = 9.27.]
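A quick Python check of this sample variance (illustrative only):

Y = [56, 58, 61, 60, 63, 62, 65, 64]
y_bar = sum(Y) / len(Y)                                        # 61.125
sample_var = sum((y - y_bar) ** 2 for y in Y) / (len(Y) - 1)   # 64.875 / 7 = 9.27
print(sample_var)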

19
This is similar to the ideas behind Analysis of Variance (ANOVA).
See for example, Probability and Statistical Inference, by Hogg and Tanis.
Similar ideas also apply to Buhlmann Credibility. See “Credibility” by Mahler and Dean.
20
As will be discussed subsequently, the number of degrees of freedom associated with the sum of squares in the
numerator is N - 1.
21
The (non-sample) variance, Σ(Yi - Y )2/N = 2nd moment - square of the mean, is a biased estimator of the true
underlying variance.

Using the Functions of the Calculator to Compute Samples Means and Variances:

Using the TI-30X-IIS, one could work as follows with the sample of size eight:
56, 58, 61, 60, 63, 62, 65, and 64.

2nd STAT
CLRDATA ENTER
2nd STAT
1-VAR ENTER (Use the arrow key if necessary to select 1-VAR rather than 2-VAR.)
DATA
X1 = 56
Freq = 1
X2 = 58
Freq = 1
X3 = 61
Freq = 1
X4 = 60
Freq = 1
X5 = 63
Freq = 1
X6 = 62
Freq = 1
X7 = 65
Freq = 1
X8 = 64
Freq = 1 ENTER
STATVAR

Various outputs are displayed. Use the arrow keys to scroll through them.
n = 8. (number of data points.)
X = 61.125 (sample mean of X.)
Sx = 3.044316 (square root of the sample variance of X.)
σx = 2.847696 (square root of the variance of X, computed with n in the denominator.)
Σ X = 489 ΣX2 = 29955

S x2 = 3.0443162 = 9.268.

Total Sum of Squares:

The Total Sum of Squares or TSS is defined as the sum of squared differences between
Yi and Y .

TSS ≡ Σ (Y i - Y)2 = Σ yi2 = ΣYi2 - (ΣYi)2/N = SYY.

Note that while both TSS and ESS involve squared differences from the observations of the
dependent variable, Yi, in the case of the total sum of squares we subtract the mean, Y , while
in the case of the error sum of squares we subtract the estimated height, Y^i.

TSS is just the numerator of the sample variance of Y.

In this example, TSS = (56 - 61.125)2 + (58 - 61.125)2 + (61 - 61.125)2 + (60 - 61.125)2 +
(63 - 61.125)2 + (62 - 61.125)2 + (65 - 61.125)2 + (64 - 61.125)2 = 64.875.

The TSS quantifies the total variation in the observations of the dependent variable. In the
case of a series of experiments, TSS would measure the total variation in outcomes.

Error Sum of Squares:

Recall that the Error Sum of Squares or ESS is:22

ESS ≡ Σε^i2 = Σ(Yi - Y^i)2.

As computed previously, for this example, ESS = 8.741.

Since Σ ^εi = 0, ESS is the numerator of the variance of ^εi .

Other Ways to write ESS for the two-variable model:*

Since ε^i = Yi - Y^i = Yi - α^ - β^Xi = Yi - (Y - β^X) - β^Xi = yi - β^xi, and β^ = Σxiyi / Σxi2:

ESS = Σε^i2 = Σ(yi - β^xi)2 = Σyi2 + β^2Σxi2 - 2β^Σxiyi =
Σyi2 + (Σxiyi/Σxi2)2Σxi2 - 2(Σxiyi/Σxi2)Σxiyi = Σyi2 - (Σxiyi)2/Σxi2 = Σyi2 - β^Σxiyi.

Σε^iYi = Σε^i(α^ + β^Xi + ε^i) = α^Σε^i + β^Σε^iXi + Σε^i2 = α^0 + β^0 + Σε^i2 = Σε^i2 = ESS.

22
The Error Sum of Squares, ESS, is also sometimes called the residual sum of squares.

Regression Sum of Squares:

There is a third sum of squared differences that is of importance.

The Regression Sum of Squares or RSS23 is defined as the sum of squared differences
between the fitted values and the mean of Y.

RSS = Σ(Y^i - Y)2.

Exercise: For the fitted model of heights, Y^i = 24.07 + .6254Xi, what is the RSS?
[Solution: Y^i - Y = 57.216 - 61.125, 57.842 - 61.125, 59.718 - 61.125, 60.343 - 61.125,
62.219 - 61.125, 62.845 - 61.125, 63.470 - 61.125, 65.346 - 61.125 =
-3.909, -3.283, -1.407, -0.782, 1.094, 1.720, 2.345, 4.221.
RSS = 3.9092 + 3.2832 + 1.4072 + .7822 + 1.0942 + 1.7202 + 2.3452 + 4.2212 = 56.121.]

In this example, RSS = 56.121.

Σε^i = 0 ⇒ Σ(Yi - Y^i) = 0 ⇒ ΣYi = ΣY^i ⇒ the mean of Y^i is Y.

Exercise: For this example, verify that the mean of Y^i is 61.125 = Y.
[Solution: ΣY^i = 57.216 + 57.842 + 59.718 + 60.343 + 62.219 + 62.845 + 63.470 + 65.346 =
488.999. 488.999/8 = 61.125.]

Therefore, RSS = Σ(Y^i - Y)2 is the numerator of the variance of Y^i.

RSS = Σ(Y^i - Y)2 = Σ(Y^i - Y)(Yi - Y - ε^i) = Σ(Y^i - Y)(Yi - Y) - Σ(Y^i - Y)ε^i =
Σ(Y^i - Y)(Yi - Y), since Y^i - Y and ε^i are uncorrelated.

Therefore, RSS = Σ(Y^i - Y)(Yi - Y) = the numerator of the correlation of Y^ and Y.
For this example, one can verify that Σ(Y^i - Y)(Yi - Y) = 56.121 = RSS.

23
The RSS is also sometimes called the sum of squares associated with the model as opposed to the error.

Other Ways to write RSS for the two-variable model:*

ΣYi = ΣY^i ⇒ RSS = Σ(Y^i - Y)2 = ΣY^i2 - (ΣY^i)2/N.

Y^i - Y = α^ + β^Xi - Y = (Y - β^X) + β^Xi - Y = β^xi.
RSS = Σ(Y^i - Y)2 = β^2Σxi2 = (Σxiyi/Σxi2)2Σxi2 = (Σxiyi)2/Σxi2 = β^Σxiyi.

TSS = RSS + ESS:

For this example, TSS = 64.875, RSS = 56.121, and ESS = 8.741.
Note that RSS + ESS = 64.862, equal to TSS subject to rounding.

In general for a regression model with an intercept, the Total Sum of Squares is equal to the
Regression Sum of Squares plus the Error Sum of Squares.

TSS = RSS + ESS.


The total variation has been broken into two pieces: that explained by the
regression model, RSS, and that unexplained by the regression model, ESS.
This very important result holds for any linear regression model with an intercept, whether it is
the two-variable model such as in this example, or a multivariable regression model to be
discussed subsequently.
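A brief Python sketch verifying TSS = RSS + ESS for the heights regression (illustrative only):

X = [53, 54, 57, 58, 61, 62, 63, 66]
Y = [56, 58, 61, 60, 63, 62, 65, 64]
alpha_hat, beta_hat = 24.07, 0.6254
y_bar = sum(Y) / len(Y)
Y_hat = [alpha_hat + beta_hat * x for x in X]
TSS = sum((y - y_bar) ** 2 for y in Y)               # 64.875
RSS = sum((yh - y_bar) ** 2 for yh in Y_hat)         # about 56.13
ESS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # about 8.74
print(TSS, RSS + ESS)                                # equal, up to rounding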

Proof of TSS = RSS + ESS:*

Yi - Y = Yi - Y^i + (Y^i - Y) = ε^i + (Y^i - Y).
(Yi - Y)2 = ε^i2 + (Y^i - Y)2 + 2ε^i(Y^i - Y).
TSS = Σ(Yi - Y)2 = Σε^i2 + Σ(Y^i - Y)2 + 2Σε^i(Y^i - Y).

It has been shown previously that ε^i and Y^i - Y have a correlation of zero and Σε^i(Y^i - Y) = 0.
Thus the final term drops out and:

TSS = Σε^i2 + Σ(Y^i - Y)2 = ESS + RSS.

Note that the final term dropping out followed from a result that was proven for a regression
model with an intercept. Analysis of Variance is not generally applied to a model without an
intercept.

Alternately, for the two-variable model:


RSS + ESS = β^Σxiyi + Σyi2 - β^Σxiyi = Σyi2 = TSS.

Degrees of Freedom:

Each of these sums of squares has a number of “Degrees of Freedom” associated with it.
The number of degrees of freedom will be needed in order to perform t-tests and F-tests.

Exercise: There were four observations. In deviations form, y1 = -6, y2 = -3, and y3 = 2.
The value of y4 is unreadable because a coworker spilled coffee on the report.
What is TSS?
[Solution: In deviations form, the sum of yi is zero. Therefore, the missing y4 must be: 7.
TSS = Σyi2 = 62 + 32 + 22 + 72 = 98.]

In this exercise, we can compute TSS only knowing three out of the four yi. In that sense, TSS
only depends on 3 pieces of information. Therefore, we say TSS has 3 degrees of freedom.
Another way to look at the same thing, is that TSS has 4 squared terms, but there is one linear
constraint on the yi: Σyi = 0. This linear constraint results in a loss of one degree of freedom,
and therefore we have: 4 - 1 = 3 degrees of freedom.

In any case, in general, if we have N points, TSS has N -1 degrees of freedom.24

Now RSS = Σ(Y^i - Y)2 = β^2Σxi2. Treating the xi as known, we need only β^, one piece of
information depending on the yi, the outcomes of the experiment. Therefore, RSS has 1
degree of freedom, for the two-variable model.

Since TSS = RSS + ESS,


(number of d.f. for TSS) = (number of d.f. for RSS) + (number of d.f. for ESS).
Therefore, ESS has N - 2 degrees of freedom, for the two-variable model.
The number of degrees of freedom for ESS is the number of points minus the number of fitted
parameters (including the fitted intercept.)

Exercise: A linear regression is fit to two observations. What is ESS?


[Solution: A line passes though any two points, therefore the regression line perfectly fits the
data. Therefore, the residuals are zero, and ESS = 0.]

With 2 observations, ESS has 2 - 2 = 0 degrees of freedom, for the two-variable model.
This is consistent with the result of the above exercise. ESS is automatically zero in this case,
regardless of the particular observations. We need zero pieces of information in order to
determine ESS in this case.

24
TSS is the numerator of the sample variance, while its degrees of freedom, N - 1, is the denominator of the sample
variance.

Degrees of Freedom, Multivariable Regression:

When we subsequently discuss the multivariable regression model, the following more
general formulas will hold:

Source of Variation Sum of Squares Degrees of Freedom


Model RSS k-1
Error ESS N-k

Total TSS N-1


Where N is the number of points, and k is the number of variables including the
intercept (k = 2 for the two-variable model with one slope and an intercept.)

Note that TSS = RSS + ESS, while N - 1 = (k - 1) + (N - k).

Exercise: For the model fit to heights, Y^i = 24.07 + .6254Xi, what are the degrees of freedom?
[Solution: There are 8 points, N = 8. There are two variables, including the intercept, k = 2.
RSS has k - 1 = 2 - 1 = 1 degree of freedom. ESS has N - k = 8 - 2 = 6 degrees of freedom.
TSS has N - 1 = 8 - 1 = 7 degrees of freedom. Note 7 = 1 + 6.]

ANOVA Table:

When you run a regression program on a computer, it will usually print out an Analysis of
Variance (ANOVA) Table.25

For example, for the two-variable model (k = 2) fit to heights, with eight observations (N = 8),
the ANOVA Table might look like:26

Source of Variation Sum of Squares27 Degrees of Freedom Mean Square


Model 56.13 1 56.13
Error 8.74 6 1.46
Total 64.87 7 9.27

Note that: RSS + ESS = 56.13 + 8.74 = 64.87 = TSS. 1 + 6 = 7.


8.74/6 = 1.46. 64.87/7 = 9.27 = sample variance of Y.

This ANOVA table was for a two-variable regression model. For a multivariable regression
model, the ANOVA table would look similar, with of course the appropriate degrees of freedom.
25
Those who have not done so, will probably benefit from running such a program a few times. Most such programs
will print out many values related to items on the Syllabus, such as residuals, ESS, RSS, TSS, t-statistics,
F-Statistics, Durbin-Watson Statistics, variance-covariance matrices, etc.
26
Different computer programs may arrange things slightly differently. Also some additional information is probably
shown relating to items we have yet to discuss. This ANOVA table was produced by Mathematica.
27
The values for the sums of squares differ slightly from those shown previously, due to the lack of intermediate
rounding in the calculations underlying what is shown here.

Problems:

4 . 1 (1 point) For a two variable model (slope and intercept) fit to 25 points, what are the
degrees of freedom associated with the three sums of squares?

Use the following information for the next two questions:


For a multivariable regression, you have the following ANOVA Table, with certain items left
blank:
Source of Variation Sum of Squares Degrees of Freedom Mean Square
Model 1020 255
Error 7
Total 1230

4 . 2 (1 point) How many observations were there?


(A) 30 or less (B) 35 (C) 40 (D) 45 (E) 50 or more

4 . 3 (1 point) How many variables were there in the regression, including the intercept?
(A) 2 (B) 3 (C) 4 (D) 5 (E) 6 or more

4.4 (1 point) A regression model with 4 variables (3 slopes and one intercept) has been fit to
50 observations. What are the degrees of freedom associated with the Total Sum of Squares,
Regression Sum of Squares, and Error Sum of Squares?

4.5 (10 points) You are given the following 17 observations:


X 0 25 50 75 100 125 150 175 200 225 250 275
Y 4.90 7.41 6.19 5.57 5.17 6.89 7.05 7.11 6.19 8.28 4.84 8.29

X 300 325 350 375 395


Y 8.91 8.54 11.79 12.12 11.02
Fit a two-variable linear regression.
Graph the data and the fitted line.
Graph the residuals.
Put together the ANOVA Table, showing the sum of squares and the degrees of freedom.
(You may use a computer, but do not use a regression software package.
After completing your work, you may then check it using a regression software package.)

4.6 (1 point) A linear regression has been fit to 10 points, (Xi, Yi).
The fitted intercept is α̂. The fitted slope is β̂.
Σ(α̂ + β̂Xi - Ȳ)² = 49. The sample variance of Y is 8. Determine Σ(α̂ + β̂Xi - Yi)².
(A) 19 (B) 20 (C) 21 (D) 22 (E) 23

Use the following information for the next 5 questions:


A linear regression, X = α + βY, is fit to 20 observations, (Xi, Yi).
ΣXi = 42, ΣXi2 = 101, ΣYi = 76, ΣYi2 = 310, ΣXiYi = 167.

4.7 (2 points) Determine β̂.

4.8 (2 points) Determine α̂.

4.9 (2 points) Determine TSS, the total sum of squares.

4.10 (2 points) Determine RSS, the regression sum of squares.

4.11 (2 points) Determine ESS, the error sum of squares.

4.12 (165, 11/88, Q.2) (1.7 points) You are given the following table:
Xi E[Yi ] ei
0 2.0 1.0
1 3.5 1.5
2 5.0 -2.0
3 6.5 0.5
where:
(i) E[Yi ] is the sequence of true values to be estimated.
(ii) ei are particular realizations of the error random variables εi.
(iii) Yi are the corresponding particular observations.
(iv) Ŷi are obtained by linear regression of Yi on Xi, including an intercept.
Determine Σ(Yi - Ŷi)², summing over i = 0, 1, 2, 3.
(A) 4 (B) 6 (C) 8 (D) 10 (E) 12
Note: The original exam question has been rewritten.
Mahler’s Guide to
Regression
Sections 5-9:
5 R-Squared
6 Corrected R-Squared
7 Normal Distribution
8 Assumptions of Linear Regression
9 Properties of Estimators

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-B


Section 5, R-Squared
For the two-variable model fit to heights, the ANOVA Table was:

Source of Variation Sum of Squares Degrees of Freedom Mean Square


Model 56.13 1 56.13
Error 8.74 6 1.46
Total 64.87 7 9.27

RSS is the amount of variation explained by the model. Thus in this example,
RSS/TSS = 56.13/64.87 = 86.5% of the total variation has been explained by the regression
model.
R-Squared is the percentage of variation explained by the regression model. 28

R² ≡ RSS/TSS ≡ Σ(Ŷi - Ȳ)² / Σ(Yi - Ȳ)². In this example, R² = 56.13/64.87 = .865.

R² = RSS/TSS = 1 - ESS/TSS = 1 - Σε̂i²/Σyi².

These formulas apply to the multiple-variable case as well as the two variable
case.

0 ≤ R2 ≤ 1.29

R2 = 1, when the observed points fall exactly on the fitted line.


R² = 0, when the regression explains none of the variation in the dependent variable.

There are a number of additional ways to write R².


For example, since as shown previously, for the two-variable model, RSS = β̂²Σxi²:
R² = RSS/TSS = β̂²Σxi²/Σyi².30

Exercise: Verify the above formula in the case of the regression of heights.
[Solution: xi = (-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75).
yi = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).
Σxi² = 143.5. Σyi² = 64.875. Σxiyi = 89.75. β̂ = Σxiyi/Σxi² = 89.75/143.5 = .6254.
β̂²Σxi²/Σyi² = (.6254²)(143.5/64.875) = .865 = R².]
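This verification is also easy to carry out in code. The following is a minimal Python sketch (not part of the original text), using the deviations listed in the solution above; it also anticipates the correlation formula derived next.

# Minimal sketch, in plain Python, verifying R-Squared = beta^2 Σxi^2 / Σyi^2 for the heights data.
x = [-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75]          # deviations of fathers' heights
y = [-5.125, -3.125, -0.125, -1.125, 1.875, 0.875, 3.875, 2.875]  # deviations of sons' heights

Sxx = sum(xi ** 2 for xi in x)                 # 143.5
Syy = sum(yi ** 2 for yi in y)                 # 64.875 = TSS
Sxy = sum(xi * yi for xi, yi in zip(x, y))     # 89.75

beta = Sxy / Sxx                               # about 0.6254
r_squared = beta ** 2 * Sxx / Syy              # about 0.865
corr_squared = Sxy ** 2 / (Sxx * Syy)          # the same 0.865 = Corr[X, Y]^2
print(beta, r_squared, corr_squared)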

28
R-Squared is sometimes called the coefficient of determination.
In the case of multiple regression R-Squared is sometimes called the coefficient of multiple determination.
29
This restriction on the value of R-squared does not apply to a regression without an intercept.
30
See 4, 11/02, Q.30. This formula does not hold for the multiple-variable model.

Correlations:

Since for the two variable model, β̂ = Σxiyi/Σxi²:
R² = β̂²Σxi²/Σyi² = β̂Σxiyi/Σyi² = (Σxiyi)²/{Σxi²Σyi²} = Corr[X, Y]².

For the 2-variable model, R² is the square of the correlation between X and Y.

Exercise: What is the correlation of the heights of the fathers and sons?
[Solution: Corr[X, Y] = Σxiyi/√(Σxi²Σyi²) = 89.75/√((143.5)(64.875)) = .9302.]

For this regression of heights, Corr[X, Y]² = .9302² = .865 = R².

Corr[Y, Ŷ]² = R²:*

As was shown previously, RSS = Σ(Ŷi - Ȳ)(Yi - Ȳ) = the numerator of the correlation of Y and
Ŷ, and the mean of Ŷ is Ȳ.

Therefore, Corr[Y, Ŷ] = RSS/√(Σ(Yi - Ȳ)² Σ(Ŷi - Ȳ)²) = RSS/√(TSS RSS) = √(RSS/TSS).
Thus, Corr[Y, Ŷ]² = RSS/TSS = R².

R² is the square of the correlation between Y and Ŷ.31

Exercise: For the regression of the heights of the fathers and sons, what is the correlation
between Y and Ŷ?
[Solution: yi = Yi - Ȳ = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).
Σyi² = 64.875 = TSS. The mean of Ŷ is 61.125 = Ȳ.
Ŷi - Ȳ = (-3.909, -3.283, -1.407, -0.782, 1.094, 1.720, 2.345, 4.221).
Σ(Ŷi - Ȳ)² = 56.121 = RSS.
Σ(Yi - Ȳ)(Ŷi - Ȳ) = (-5.125)(-3.909) + (-3.125)(-3.283) + (-.125)(-1.407) + (-1.125)(-.782) +
(1.875)(1.094) + (.875)(1.720) + (3.875)(2.345) + (2.875)(4.221) = 56.127.
Corr[Y, Ŷ] = 56.127/√((64.875)(56.121)) = .9302.
Comment: For this regression of heights, Corr[Y, Ŷ]² = .9302² = .865 = R².]

31
This is what is meant by: R-Squared is the square of the multiple correlation coefficient.

Various Formulas for R²:

For either the two-variable or multiple-variable model:

R² ≡ RSS/TSS ≡ Σ(Ŷi - Ȳ)²/Σ(Yi - Ȳ)².

R² = the percentage of variation explained by the model.

R² = RSS/TSS = 1 - ESS/TSS = 1 - Σε̂i²/Σyi².

R² = 1 - ESS/TSS = 1 - (N - k)s²/TSS.32

R² = (k-1)Fk-1,N-k / {(k-1)Fk-1,N-k + N - k}.33

R² = 1 - (1 - R̄²)(N - k)/(N - 1).34

For the two-variable model only:

RSS = β̂²Σxi² = β̂Σxiyi.

R² = RSS/TSS = β̂²Σxi²/Σyi² = β̂Σxiyi/Σyi² = (Σxiyi)²/{Σxi²Σyi²} = Corr[X, Y]².
Therefore, R² = 0 ⇔ β̂ = 0. A small fitted slope ⇒ a small R².

S Notation:*

R² = Corr[X, Y]² = {SXY/√(SXX SYY)}² = SXY²/(SXX SYY).

⇒ RSS = R² TSS = {SXY²/(SXX SYY)} SYY = SXY²/SXX = {ΣXiYi - ΣXiΣYi/N}²/{ΣXi² - (ΣXi)²/N}.

For the regression of heights example, SXX = 143.5, SYY = 64.875, and SXY = 89.75.
RSS = SXY²/SXX = 89.75²/143.5 = 56.13, matching the previous result subject to rounding.
ESS = TSS - RSS = SYY - SXY²/SXX = 64.875 - 89.75²/143.5 = 8.74.
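The computational shortcut in the last formula is handy when only the raw sums are available. Here is a minimal Python sketch (not part of the original text); the raw heights are reconstructed from the deviations and means quoted in this guide.

# Minimal sketch, in plain Python, of the S-notation shortcut using only the raw sums.
X = [53, 54, 57, 58, 61, 62, 63, 66]    # fathers' heights
Y = [56, 58, 61, 60, 63, 62, 65, 64]    # sons' heights
N = len(X)

SXX = sum(x * x for x in X) - sum(X) ** 2 / N                   # 143.5
SYY = sum(y * y for y in Y) - sum(Y) ** 2 / N                   # 64.875
SXY = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / N    # 89.75

RSS = SXY ** 2 / SXX        # about 56.13
ESS = SYY - RSS             # about 8.74
R2 = RSS / SYY              # about 0.865
print(SXX, SYY, SXY, RSS, ESS, R2)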

32
Where k is the number of variables including the intercept. As discussed in subsequent sections,
s² ≡ ESS/(N-k) = estimated variance of the error terms. See pages 65 and 88 of Pindyck and Rubinfeld.
33
Where k is the number of variables including the intercept. As discussed in subsequent sections,
Fk-1,N-k = {RSS/(k-1)}/{ESS/(N-k)} = {TSS R²/(k-1)}/{TSS(1 - R²)/(N-k)} = {R²/(1 - R²)}{(N-k)/(k-1)},
follows an F Distribution with k-1 and N-k degrees of freedom.
See Equation 4.12 in Pindyck and Rubinfeld.
34
Where k is the number of variables including the intercept.
R̄², the corrected R², is discussed subsequently.

Interpretation of R2:

A large R2 indicates a good fit. A small R2 indicates a poor fit.

In the two variable model, R2 is the square of the correlation between X and Y. Thus an R2
close to 1, means the correlation between X and Y is close to ±1. However we need to
distinguish between a large correlation (either positive or negative) and causation.

For example, it seems reasonable to assume that the height of the father is indirectly
responsible in part for the height of his son. Yet we would get the same R2 for a regression
with the heights of the sons as the independent variable and the heights of the fathers as the
dependent variable.

Exercise: Fit a regression model Y = α + βX + ε, where X is the height of the son and Y is the
height of his father.
[Solution: Remembering, Xi is now the heights of the sons, Σxi² = 64.875. Σxiyi = 89.75.
β̂ = Σxiyi/Σxi² = 89.75/64.875 = 1.383. α̂ = Ȳ − β̂X̄ = 59.25 - (1.383)(61.125) = -25.29.]

Here is a graph of the new regression, Father’s Height = -25.29 + (1.383)(Son’s Height):

[Graph: the data and the fitted line, with Son's Height (56 to 66) on the horizontal axis and
Father's Height (54 to 66) on the vertical axis.]

R2 = .865 = Corr[sons, fathers]2, the same as for the previous regression of son’s heights as a
function of father’s height.

Here is a comparison of the two regression lines, with the previous regression,
Son’s Height = 24.07 + (.6254)(Father’s Height), as a solid line, and the new regression,
Father’s Height = -25.29 + (1.383)(Son’s Height) as a dotted line:

[Graph: both fitted lines plotted together, with Father's Height (54 to 66) on the horizontal axis
and Son's Height (56 to 66) on the vertical axis.]

While the two regressions are similar they are not identical. They would only be identical if R2
were equal to 1.

Limitations of R2:35

Since they have the same R2, there is no way to choose between these two regressions on this
basis. Based on other than statistical considerations, one might assume that the father’s height
helps to determine his son’s height, rather than vice versa. In general, a high R2 indicates a
high correlation, which may or may not be related to causality.

Quite often time series of costs will have high correlations, resulting in R2 ≥ .90 when one
regresses one on the other. For example, if one regressed average claim costs for automobile
insurance as a function of average claim costs for fire insurance, one would likely get a very
high R2. Both of these series tend to increase over time at a somewhat similar rate, thus they
will likely have a very high correlation. Neither causes the other, although they probably have
causes in common.
35
See “The Usefulness of the R2 Statistic”, by Ross Fonticella, CAS Forum Winter 1998, and “A Statistical Note on
Trend Factors: The Meaning of R-Squared,” by D. Lee Barclay, CAS Forum Fall 1991.

Similarly, you might also get a high R2 if you regressed average claim costs for automobile
insurance in New York as function of the cost of living in England. Yet the cost of living in
England would not be a very sensible choice of explanatory variable for automobile insurance
severities in New York. In contrast, a consumer price index of the costs of repairing an
automobile in the northeast United States might be a good variable to use to try to explain a
portion of the movement of automobile insurance claim costs in New York.

So sometimes a high R2 does not indicate a meaningful regression result. On the other hand,
sometimes a low R2 will result, even when the independent variable(s) do explain a useful
portion of the variation of the dependent variable.

First, the dependent variable may be subject to a large amount of random fluctuation, so it is
hard for any model to fit the observations. This could occur if one was trying to fit the observed
loss ratios for a small book of business. Second, it may be that the R2 would be higher if
additional causes of the dependent variable were added to the regression.

The value of R² may depend on how a model is stated.

Two models with essentially the same information may have different R².

Exercise: Fit a regression of the difference between the height of the father and his son, as a
function of the height of the father. Determine R2.
[Solution: Height of Son - Height of Father = 24.079 - .3746(Height of Father).
RSS = 20.13. ESS = 8.74. R² = 20.13/(20.13 + 8.74) = .697.]

This regression contains the exact same information as our original regression:
Height of Son = 24.079 + .6254(Height of Father). However, the original regression had an R2
of .865 rather than .697. The residuals and ESS are the same for both models, but the TSS is
smaller for the second model. Thus the R-Squared for two essentially identical models can be
significantly different. This highlights one of the problems of relying solely on R2 in order to
decide between models.

As one adds more and more relevant variables, one is usually able to get a regression that
seems to fit very well. As one adds more variables, R2 increases.

However, we can also overfit the data. For example, if there are only five observations, we can
exactly fit them with a model: Y = β1 + β2X + β3X² + β4X³ + β5X⁴ + ε.

The principle of parsimony states we do not wish to use more parameters than needed to get
the job done. While we would like a large R2, we would also like to use few parameters. These
are countervailing goals. Thus one should not directly compare the R2 for two models with
different numbers of parameters. As discussed in the next section, one can directly compare
the corrected R², R̄², for two models.

Pure Error:*36

When there are repeated observations with the same value of X, usually the corresponding Y
values differ. In this situation, no model can perfectly fit the data. Therefore, the maximum
possible R2 is less than one.

One can quantify the “pure error” that results from these differing Y values for the same value of
X. Then one can divide the Error Sum of Squares between “lack of fit” and “pure error”.
The model can not explain the “pure error”.

One could then compare the R2 for a model to the maximum possible R2 in this situation. For
example, if the maximum possible R2 were 0.60, then an R2 of 0.56 is reasonably high.

36
See Section 2.1 of Applied Regression Analysis by Draper and Smith, not on the syllabus.

Problems:

5 . 1 (1 point) A two-variable linear regression has its regression sum of squares equal to 124
and error sum of squares equal to 21. What is R2?
(A) .77 (B) .79 (C) .81 (D) .83 (E) .85

5.2 (3 points) You are given the following five observations:


X 1 2 3 4 5
Y 1 1 2 2 4
For a two-variable linear regression fit to this data, what is R2?
(A) 0.82 (B) 0.84 (C) 0.86 (D) 0.88 (E) 0.90

5.3 (1 point) A linear regression is fit, Y = α + βX.


For the same set of data, another linear regression is also fit, X = γ + δY.
Determine the value of β̂ δ̂.

5.4 (2 points) You fit a two-variable regression model:


β̂ = 17.25.
Σ(Xi - X̄)² = 37.0.
Σ(Yi - Ȳ)² = 20019.
Determine R2.
(A) 0.55 (B) 0.60 (C) 0.65 (D) 0.70 (E) 0.75

Use the following information for the next two questions:


X 0 1 2 5
Y 2 5 11 18

5.5 (3 points) Using the method of least squares, you fit the model Yi = α + βXi + εi.
Determine the value of R2 for this model.
A. 93% B. 94% C. 95% D. 96% E. 97%

5.6 (3 points) Using the method of least squares, you fit the model (Xi - Yi) = α + βXi + εi.
Determine the value of R2 for this model.
A. 93% B. 94% C. 95% D. 96% E. 97%

5.7 (2 points) You fit a two-variable regression model:


Σ(Xi - X̄)² = 1660.
Σ(Yi - Ȳ)² = 899.
Σ(Xi - X̄)(Yi - Ȳ) = 1022.
Determine R2.
(A) 0.55 (B) 0.60 (C) 0.65 (D) 0.70 (E) 0.75

5.8 (Course 120 Sample Exam #2, Q.3) (2 points) You fit a simple linear regression to
five pairs of observations. The residuals for the first four observations are 0.4, -0.3, 0.0,
-0.7, and the estimated variance of the dependent variable Y is Σ(Yi - Y )2/(N - 1) = 1.5.
Calculate R2.
(A) 0.82 (B) 0.84 (C) 0.86 (D) 0.88 (E) 0.90

5.9 (Course 4 Sample Exam, Q.29) (2.5 points) You wish to determine the relationship
between sales (Y) and the number of radio advertisements broadcast (X).
Data collected on four consecutive days is shown below.
Day Sales Number of Radio Advertisements
1 10 2
2 20 2
3 30 3
4 40 3
Using the method of least squares, you determine the estimated regression line:
Ŷ = -25 + 20X.
Determine the value of R2 for this model.

5.10 (IOA 101, 9/00, Q.6) (1.5 points) Suppose that the linear regression model
Y = α + βX + ε
is fitted to data {(Xi , Yi) : i = 1, 2, … , n}, where Y is the salary of a company manager and
X (years) is the number of years of relevant experience of that manager.
State the units of measurement (if any) of
(a) α̂, the estimate of α,
(b) β̂, the estimate of β,
(c) R2, the coefficient of determination of the fit.

5.11 (IOA 101, 9/02, Q.5) (2.75 points) Suppose that a line is fitted by least squares to a
set of data, {(Xi, Yi), i =1, 2, ..., n}, which has sample correlation coefficient r.
Let the fitted value at Xi be denoted Ŷi.
Show that the sample correlation coefficient of the data {(Yi, Ŷi), i = 1, 2, ..., n}, that is, of the
observed and fitted y values, is also equal to r.

5.12 (4, 11/02, Q.5) (2.5 points) You fit the following model to eight observations:
Y = α + βX + ε
You are given:
β̂ = 2.065
Σ(Xi - X̄)² = 42
Σ(Yi - Ȳ)² = 182
Determine R2.
(A) 0.48 (B) 0.62 (C) 0.83 (D) 0.91 (E) 0.98

5.13 (4, 11/02, Q.30) (2.5 points)


Which of the following is not an objection to the use of R2 to compare the validity of
regression results under alternative specifications of a multiple linear regression model?
(A) The F statistic used to test the null hypothesis that none of the explanatory variables
helps explain variation of Y about its mean is a function of R2 and degrees of freedom.
(B) Increasing the number of independent variables in the regression equation can never
lower R2 and is likely to raise it.
(C) When the model is constrained to have zero intercept, the ratio of regression sum of
squares to total sum of squares need not lie within the range [0,1].
(D) Subtracting the value of one of the independent variables from both sides of the
regression equation can change the value of R2 while leaving the residuals unaffected.
(E) Because R2 is interpreted assuming the model is correct, it provides no direct
procedure for comparing alternative specifications.

5.14 (VEE-Applied Statistics Exam, 8/05, Q.9) (2.5 points) The method of ordinary
least squares is used to fit the following two models to the same data set:
Model I: Yi = α1 + β1Xi + ε1i
Model II: (Xi - Yi) = α2 + β2Xi + ε2i
Which of (A), (B), (C), and (D) is false?
(A) α̂1 = -α̂2
(B) β̂1 + β̂2 = 1
(C) Σε̂1i² = Σε̂2i²
(D) R2 for Model I is equal to R2 for Model II.
(E) None of (A), (B), (C), and (D) is false.

Section 6, Corrected R²

The corrected R², R̄², adjusts for the number of variables used.37
Recall that R² = 1 - ESS/TSS = 1 - Σε̂i²/Σyi². The definition of R̄² is similar, except it uses
sample variances rather than sums of squared errors.

R̄² ≡ 1 - (sample variance of residuals)/(sample variance of Y) = 1 - {Σε̂i²/(N-k)}/{Σyi²/(N-1)} =
1 - (ESS/TSS)(N - 1)/(N - k) = 1 - (1 - R²)(N - 1)/(N - k).

1 - R̄² = (1 - R²)(N - 1)/(N - k), where N is the number of observations, and k is the
number of variables including the intercept.
Note that the correction factor of: (N - 1)/(N - k) =
(# degrees of freedom associated with TSS) / (# degrees of freedom associated with ESS).

For the two-variable model, 1 - R̄² = (1 - R²)(N - 1)/(N - 2).

Exercise: For the regression of the heights of the fathers and sons, what is R̄²?
[Solution: N = 8, k = 2 and R² = .865. 1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .865)(7/6) = .1575.
R̄² = .843.]

For k = 1, a one variable model, R̄² = R².

In general, R̄² ≤ R².

As k, the number of variables, increases, the correction due to the factor of (N - 1)/(N - k) also
increases. Thus as one adds more variables, R̄² may either increase or decrease. If R² is
small, and the correction factor is big due to using lots of variables relative to the number of
observations, then it may turn out that R̄² < 0.

Exercise: For a multiple regression with 5 slopes plus an intercept fit to 11 observations,
R² = 0.3. What is the corrected R²?
[Solution: N = 11. k = 6. 1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .3)(10/5) = 1.4. R̄² = -0.4.]

One can usefully compare the corrected R²'s of different regressions.

All other things being equal, we prefer the model with the largest R̄².

As will be discussed subsequently, one can perform F-Tests, in order to check the significance
of adding variables to a model.
37
The corrected R2 is also called the “adjusted R2” or “R2 adjusted for degrees of freedom.”

Exercise: The following seven regression models each with intercept and different numbers of
explanatory variables have been fit to the same set of 10 observations:
Model Variables in the Model ESS
I X2 5.58
II X3 7.54
III X4 6.51
IV X2, X3 5.21
V X2, X4 4.53
VI X3, X4 3.72
VII X2, X3, X4 3.31
The estimated variance of the dependent variable Y is 2.20.
Which model has the best corrected R², R̄²?
[Solution: TSS = (N - 1)(sample variance of Y) = (9)(2.20) = 19.8. R² = 1 - ESS/TSS.
1 - R̄² = (1 - R²)(N - 1)/(N - k), where k is the number of variables including the intercept.
Model k ESS TSS R2 corrected R2
I 2 5.58 19.8 0.718 0.683
II 2 7.54 19.8 0.619 0.572
III 2 6.51 19.8 0.671 0.630
IV 3 5.21 19.8 0.737 0.662
V 3 4.53 19.8 0.771 0.706
VI 3 3.72 19.8 0.812 0.758
VII 4 3.31 19.8 0.833 0.749
Model VI has the largest R̄².
Comment: For a fixed number of variables, the model with the smallest ESS has the best R̄².
For two variables, model I is best. For three variables, model VI is best. Thus in order to
answer this question, there is no reason to compute R̄² for models IV and V.]
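The whole comparison takes only a few lines of code. The following Python sketch (not part of the original text) reproduces the table in the solution above.

# Minimal sketch, in plain Python, reproducing the corrected R-Squared comparison above.
N = 10
TSS = (N - 1) * 2.20        # 19.8
models = {                   # label: (k = number of variables incl. intercept, ESS)
    "I": (2, 5.58), "II": (2, 7.54), "III": (2, 6.51),
    "IV": (3, 5.21), "V": (3, 4.53), "VI": (3, 3.72), "VII": (4, 3.31),
}
for label, (k, ESS) in models.items():
    r2 = 1 - ESS / TSS
    r2_corrected = 1 - (1 - r2) * (N - 1) / (N - k)
    print(label, round(r2, 3), round(r2_corrected, 3))
# Model VI has the largest corrected R-Squared, about 0.758.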

Principle of Parsimony:*

The principle of parsimony states that one should not make more assumptions than the
minimum needed.38 As applied here, one should not use more independent variables in the
model than get the job done.39 Adding additional independent variables usually increases R²,
although rarely R² remains the same.

One should not use an additional variable that results in a reduction in R̄². Many actuaries
would not use an additional independent variable unless there was a substantial improvement
in R̄². We will discuss subsequently how to test whether model coefficients are zero.

38
Also called Occam’s razor.
39
As applied to fitting size of loss distributions, one should not use more parameters than get the job done.

The Expected Value of R2:*

Since 1 ≥ R2 ≥ 0, the expected value of R2 is positive, even if ρ = 0 in the two variable model.

Since R̄² has been corrected for degrees of freedom, if ρ = 0 in the two variable model, or
more generally if in a multiple regression the dependent variable is independent of each of the
independent variables, then we expect that E[R̄²] = 0.40

R² = 1 - (1 - R̄²)(N - k)/(N - 1).
ρ = 0. ⇒ E[R̄²] = 0. ⇒ E[R²] = 1 - (1 - E[R̄²])(N - k)/(N - 1) = 1 - (N - k)/(N - 1) = (k - 1)/(N - 1).

Exercise: A linear regression is fit to 6 observations.


If X and Y are independent, what is the expected value of R2?
[Solution: E[R2] = (k - 1)/(N - 1) = (2 - 1)/(6 - 1) = 20%.]

Thus in spite of there being no relationship between X and Y in this case, E[R2] = 20%.
When one has few observations (N is small), or when one has fit many variables (k is large), the
expected value of R2 is relatively large.

Exercise: A multiple regression with 6 variables is fit to 6 observations.


The dependent variable is independent of each of the independent variables.
What is the expected value of R2?
[Solution: E[R2] = (k - 1)/(N - 1) = (6 - 1)/(6 - 1) = 1.
Comment: This model will fit perfectly, even if there is no actual relationship between any of the
independent variables and the dependent variable. R2 = 1.]

The Distribution of R2:*

R2 = RSS/TSS = RSS/(RSS + ESS) = (RSS/ESS)/{(RSS/ESS) + 1}.

Define the statistic, F = (RSS/ν1)/(ESS/ν2), where ν1 = k - 1, the number of degrees of freedom


of RSS, and ν2 = N - k, the number of degrees of freedom of ESS.41
Then R2 = ν1F/(ν1F + ν2). ⇒ F = (ν2/ν1)R2/(1 - R2).

For the Classical Normal Linear Regression Model, it can be shown that if all of the actual
slopes of the model are zero, in other words the dependent variable is independent of each of
the independent variables, then the F-statistic follows an F-Distribution with ν1 and ν2 degrees
of freedom. 42

40
This result will be demonstrated below. Recall that unlike R², R̄² can be either positive or negative.
41
As discussed subsequently, this F-statistic will be used to test hypotheses about the slopes of multiple regression
models.
42
The assumptions of the regression models are discussed in a subsequent section.

The distribution of F is in terms of an incomplete Beta Function: β[ν1/2, ν2/2; ν1x/(ν2 + ν1x)].43

Applying a change of variables, the distribution of R2 is:


β[ν1/2, ν2/2; ν1{(ν2/ν1)R2/(1 - R2)}/{ν2 + ν1(ν2/ν1)R2/(1 - R2)}]
= β[ν1/2, ν2/2; R2]

Thus if all of the actual slopes of the model are zero, R2 follows a Beta Distribution as per
Loss Models with a = ν1/2 = (k - 1)/2, b = ν2/2 = (N - k)/2, and θ = 1.44

For example, for a linear regression fit to 6 observations, if ρ = 0, R2 follows a Beta Distribution
with a = (2 - 1)/2 = 1/2, b = (6 - 2)/2 = 2, and θ = 1, with density graphed below:45

[Graph: the density of this Beta Distribution, plotted against R² from 0 to 1.]

A Beta Distribution as per Loss Models has mean: θa/(a + b).


Therefore, E[R2] = a/(a + b) = (ν1/2)/(ν1/2 + ν2/2) = ν1/(ν1 + ν2) = (k - 1)/(N - 1).

This matches the result discussed previously, for the case where the dependent variable is
independent of each of the independent variables.

Exercise: For a linear regression fit to 6 observations, if ρ = 0,


what is the expected value of R2?
[Solution: E[R2] = (k - 1)/(N - 1) = (2 - 1)/(6 - 1) = 0.20.]
43
The F-Distribution is discussed in a subsequent section.
44
See Section 5.3 of Applied Regression Analysis, by Draper and Smith.
45
This density is: .75(1 - x)/√x, 0 ≤ x ≤ 1.
HCMSA-F06-Reg-B, Mahler’s Guide to Regression, 7/11/06, Page 56

A Beta Distribution as per Loss Models has second moment: θ2a(a + 1)/{(a + b)(a + b + 1)}, and
variance: θ2ab / {(a + b)2 (a + b + 1)}.
Therefore, Var[R2] = ν1ν2/{(ν1 + ν2)2 (ν1 + ν2 + 2)/2} = 2(k - 1)(N - k)/{(N - 1)2(N + 1)}.

Exercise: For a linear regression fit to 6 observations, if ρ = 0, what is the variance of R2?
[Solution: Var[R2] = 2(k - 1)(N - k)/{(N - 1)2(N + 1)} = (2)(2 - 1)(6 - 2)/{(6 - 1)2(6 + 1)} = 0.0457.
Comment: The standard deviation is: 0.214.]
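These two moments can be checked by simulation. The sketch below (not part of the original text) assumes the numpy library is available; it draws independent Normal X and Y, so that ρ = 0, and averages R² over many fitted regressions.

# Simulation sketch (assumes numpy): with X and Y independent and N = 6, k = 2, the mean
# and variance of R-Squared over many regressions should be near 0.20 and 0.0457.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 6, 100000
r2 = np.empty(trials)
for t in range(trials):
    x = rng.normal(size=N)
    y = rng.normal(size=N)                       # independent of x, so rho = 0
    xd, yd = x - x.mean(), y - y.mean()
    beta = (xd @ yd) / (xd @ xd)
    r2[t] = beta ** 2 * (xd @ xd) / (yd @ yd)    # RSS / TSS
print(r2.mean(), r2.var())                       # close to 0.20 and 0.0457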

The Expected Value of Corrected R2:*

R̄² ≡ 1 - (1 - R²)(N - 1)/(N - k).
E[R̄²] = 1 - (1 - E[R²])(N - 1)/(N - k).

When the dependent variable is independent of each of the independent variables:

E[R̄²] = 1 - {1 - (k - 1)/(N - 1)}(N - 1)/(N - k) = 1 - {(N - k)/(N - 1)}(N - 1)/(N - k) = 0.

This demonstrates a key advantage of R̄² as compared to R². When the dependent variable
is independent of each of the independent variables, in other words when the model actually
explains none of the variation of the dependent variable, E[R̄²] = 0, while E[R²] > 0.

Problems:

Use the following information for the next two questions:


Source Sum of Squares Degrees of Freedom
Regression 335.2 3
Error 102.5 6
Total 437.7 9

6 . 1 (1 point) Determine R2.


(A) .77 (B) .79 (C) .81 (D) .83 (E) .85

6.2 (1 point) Determine R̄², the corrected R².
(A) .61 (B) .63 (C) .65 (D) .67 (E) .69

6.3 (2 points) Several regression models, each with intercept and different sets of explanatory
variables, have been fit to the same set of 15 observations in order to explain Workers
Compensation Insurance Claim Frequencies. The explanatory variables used were:
Log of Employment (E), Log of Unemployment Rate (U),
Log of Waiting Period for Benefits (W), and a Cost Containment Dummy Variable (C).
Model Variables in the Model Error Sum of Squares
I E, U 0.0131
II E, W 0.0123
III U, W 0.0144
IV E, U, C 0.0115
V E, W, C 0.0117
VI U, W, C 0.0118
VII E, U, W, C 0.0106
The estimated variance of the dependent variable, the log of the claim frequencies, is .0103.
Which model has the best R̄²?
A. II B. III C. IV D. VI E. VII

* 6.4 (3 points) A three variable linear regression, Y = β1 + β2X2 + β3X3 + ε, has been fit to 11
observations.
If β2 = β3 = 0, what is the distribution of R2? What is its density, mean, and variance?

6.5 (3 points) Two linear regressions, with slope β and intercept α, have been fit to two
different sets of 19 observations.
For both sets of observations, the sample variance of the values of the independent variables
is 37.
For both sets of observations, the sample variance of the values of the dependent variables is
13.
For the first set of observations Σ(Xi - X )(Yi - Y ) is greater than for the second set of
observations.
Which of the following statements is not true?
A. β̂ is greater for the first regression than it is for the second regression.
B. α̂ is greater for the first regression than it is for the second regression.
C. s² is smaller for the first regression than it is for the second regression.
D. R² is greater for the first regression than it is for the second regression.
E. R̄² is greater for the first regression than it is for the second regression.

6 . 6 (2 points) Which of the following is not equal to R2?


A. RSS/TSS.

B. Σ(Ŷi - Ȳ)²/Σ(Yi - Ȳ)².

C. R̄²(N - k)/(N - 1).

D. The percentage of variation explained by the regression model.

E. 1 - Σε̂i²/Σyi².

6.7 (2 points) A multiple regression model is fit to some data. Then an additional independent
variable is added to the model, and the regression is fit to the same data.
Which of the following statements is true?
A. R2 may decrease.
B. R̄² may decrease.
C. ESS may increase.
D. RSS may decrease.
E. None of A, B, C, and D are true.

6.8 (2 points) A linear regression model with slope and intercept is fit to 6 observations.
R̄² = 0.64. If the same form of model is fit to 10 new similar observations, what is the expected
value of R2?
A. 0.66 B. 0.67 C. 0.68 D. 0.69 E. 0.70

* 6.9 (3 points) A two variable linear regression, Y = α + βX + ε, has been fit to 3 observations.
If β = 0, what is the distribution of R2? What is its density, mean, and variance?

6.10 (Course 120 Sample Exam #1, Q.9) ( 2 points) You are given:
Source Sum of Squares Degrees of Freedom
Regression 1115.11 2
Error 138.89 5
Total 1254.00 7
Determine R̄², the corrected R².
(A) 0.84 (B) 0.89 (C) 0.93 (D) 0.97 (E) 1.00

6.11 (Course 120 Sample Exam #3, Q.4) (2 points) You fit the regression model
Yi = α + βXi + εi to 11 observations. You are given that R2 = 0.85.
Determine R̄², the corrected R².
(A) 0.77 (B) 0.79 (C) 0.80 (D) 0.83

6.12 (4, 5/00, Q. 31) (2.5 points) You fit the following model to 48 observations:
Y = β1 + β2X2 + β3X3 + β4X4 + ε
You are given:
Source of Variation Degrees of Freedom Sum of Squares
Regression 3 103,658
Error 44 69,204
Calculate R̄², the corrected R².
(A) 0.57 (B) 0.58 (C) 0.59 (D) 0.60 (E) 0.61

Section 7, Normal Distribution

The Normal Distribution is a bell-shaped symmetric distribution. Its two parameters are
its mean µ and its standard deviation σ. f(x) = exp[-(x-µ)²/(2σ²)] / (σ√(2π)), -∞ < x < ∞.
The sum of two independent Normal Distributions is also a Normal Distribution,
with the sum of the means and variances.
If X is normally distributed, then so is aX + b, but with mean aµ + b and standard
deviation |a|σ. If one standardizes a normally distributed variable by subtracting µ and dividing
by σ, then one obtains a unit normal with mean 0 and standard deviation of 1.

Normal Distribution
Support: ∞ > x > -∞          Parameters: ∞ > µ > -∞ (location parameter), σ > 0 (scale parameter)

D. f.: F(x) = Φ((x − µ)/σ)

P. d. f.: f(x) = φ((x − µ)/σ)/σ = (1/(σ√(2π))) exp(-[(x - µ)²]/[2σ²])

Mean = µ          Variance = σ²
Skewness = 0 (distribution is symmetric)          Kurtosis = 3
Mode = µ          Median = µ
Method of Moments: µ = µ1′, σ = √(µ2′ - µ1′²)
Percentile Matching: Set gi = Φ⁻¹(pi), then σ = (x1 - x2)/(g1 - g2), µ = x1 - σg1
Method of Maximum Likelihood: Same as Method of Moments

[Graph: the density of a Normal Distribution with µ = 10 and σ = 5.]

The density of the Unit Normal is denoted by φ(x) = exp[-x²/2] / √(2π), -∞ < x < ∞.

The corresponding distribution function is denoted by Φ(x).

The following table is similar to that attached to the exam and shows the values of the Unit
Normal Distribution Φ(x) with mean of 0 and variance of 1. In order to use this table one must
first standardize an approximately normal variable by subtracting its mean and dividing by its
standard deviation. For x < 0 one must make use of symmetry: Φ(x) = 1 - Φ(-x).

NORMAL DISTRIBUTION TABLE

The first table below gives values of the distribution function, Φ (x), of the
standard normal distribution for selected values of x. The integer part of x is
given in the top row, and the first decimal place of x is given in the left column.

x 0 1 2 3

0.0 0.5000 0.8413 0.9772 0.9987


0.1 0.5398 0.8643 0.9821 0.9990
0.2 0.5793 0.8849 0.9861 0.9993
0.3 0.6179 0.9032 0.9893 0.9995
0.4 0.6554 0.9192 0.9918 0.9997
0.5 0.6915 0.9332 0.9938 0.9998
0.6 0.7257 0.9452 0.9953 0.9998
0.7 0.7580 0.9554 0.9965 0.9999
0.8 0.7881 0.9641 0.9974 0.9999
0.9 0.8159 0.9713 0.9981 1.0000

This second table provides the x values that correspond to some selected values
of Φ (x).

Φ(x) x

0.800 0.842
0.850 1.036
0.900 1.282
0.950 1.645
0.975 1.960
0.990 2.326
0.995 2.576

Central Limit Theorem:

Let X1, X2, ..., Xn be a series of independent, identically distributed variables,
with finite mean and variance. Let X̄n = average of X1, X2, ..., Xn.
Then as n approaches infinity, X̄n approaches a Normal Distribution.

If each Xi has mean µ and variance σ², then X̄n has mean µ and variance σ²/n.
Therefore, (X̄n - µ)/(σ/√n) approaches a unit Normal as n approaches infinity.

For example, assume X follows a uniform distribution from 0 to 10.
This has mean of 5 and variance of 10²/12 = 8.333.

The average of 20 independent random draws from this uniform distribution, has a mean of 5
and a variance of 8.333/20 = .41665. The average of these 20 values is approximately
Normal, with this mean and variance.

Exercise: What is the probability that the average of these 20 random draws is less than 4?
[Solution: Prob[ X < 4] ≅ Φ((4 - 5)/√.41665) = Φ(-1.549) = 6.1%.]

The sum of 20 independent random draws from this uniform distribution, has a mean of (20)(5)
= 100 and a variance of (20)(8.333) = 166.67. The sum of these 20 values is approximately
Normal, with this mean and variance.

Exercise: What is the probability that the sum of these 20 random draws is less than 80?
[Solution: Prob[Σ Xi < 80] ≅ Φ((80 - 100)/√166.67) = Φ(-1.549) = 6.1%.
Comment: Prob[Σ Xi < 80] = Prob[ X < 4].]
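The normal approximation in these exercises can be checked directly. Below is a minimal Python sketch (not part of the original text) that computes Φ via the error function and compares the result with a crude simulation of 20 uniform draws.

# Minimal sketch, in plain Python, of the normal approximation used above, checked by simulation.
import math
import random

mu, var, n = 5.0, 100.0 / 12.0, 20     # uniform(0, 10): mean 5, variance 8.333

def phi(x):
    # standard Normal distribution function, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

approx = phi((4.0 - mu) / math.sqrt(var / n))     # about 0.061

random.seed(1)
trials = 100000
count = 0
for _ in range(trials):
    xbar = sum(random.uniform(0.0, 10.0) for _ in range(n)) / n
    if xbar < 4.0:
        count += 1
print(approx, count / trials)                     # both near 6.1%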

Kurtosis:*

The kurtosis is defined as the fourth central moment divided by the square of the variance. As
with the skewness, the kurtosis is a dimensionless quantity (both the numerator and
denominator are in dollars to the fourth power), which describes the shape of the distribution.
Since the fourth central moment is always non-negative, so is the kurtosis. Large kurtosis
corresponds to a heavier-tailed curve, and vice versa.

Exercise: Compute the 4th central moment of a Normal Distribution.
[Solution: ∫ from -∞ to ∞ of (x-µ)⁴ (1/(σ√(2π))) exp(-[(x-µ)²]/[2σ²]) dx
= (2/(σ√(2π))) ∫ from 0 to ∞ of σ⁴z² exp(-z/2) .5σz^(-.5) dz
= (σ⁴/√(2π)) ∫ from 0 to ∞ of z^1.5 exp(-z/2) dz = (σ⁴/√(2π)) Γ(2.5) (1/2)^(-2.5) = (σ⁴/√π)(1.5)Γ(1.5)(2²)
= (σ⁴/√π)(6)(.5√π) = 3σ⁴. Note we made a change of variables z = ((x-µ)/σ)² and got an integral
involving a complete Gamma function.]

Exercise: Compute the kurtosis of a Normal Distribution with parameters µ and σ.


[Solution: The kurtosis is defined as the fourth central moment divided by the square of the
variance = 3σ4/ (σ2)2 = 3.]

All Normal Distributions have a kurtosis of 3. Thus curves with a kurtosis more than 3 are
heavier-tailed than a Normal Distribution. Rather than kurtosis, some people use Excess,
which is just kurtosis - 3. Thus the Normal Distribution has an Excess of 0.
Curves with positive excess are heavier-tailed than the Normal Distribution.

Exercise: Let µ1′, µ2′, µ3′, and µ4′ be the first four moments (around the origin) of a distribution.
What is the 4th central moment of this distribution?
[Solution: The 4th central moment is: E[(X - µ1′)⁴] = E[X⁴ - 4µ1′X³ + 6µ1′²X² - 4µ1′³X + µ1′⁴] =
E[X⁴] - 4µ1′E[X³] + 6µ1′²E[X²] - 4µ1′³E[X] + µ1′⁴ = µ4′ - 4µ1′µ3′ + 6µ1′²µ2′ - 4µ1′³µ1′ + µ1′⁴ =
µ4′ - 4µ1′µ3′ + 6µ1′²µ2′ - 3µ1′⁴.]
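The formula can be checked numerically. The sketch below (not part of the original text) assumes numpy is available; it estimates the raw moments of a Normal sample, applies the formula above, and recovers a kurtosis close to 3.

# Sketch (assumes numpy) checking the raw-moment formula for the 4th central moment,
# and the kurtosis of a Normal Distribution.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=5.0, size=1_000_000)   # a large Normal sample

m1, m2, m3, m4 = (float(np.mean(x ** r)) for r in (1, 2, 3, 4))   # raw moments
central4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4      # the formula above
direct4 = float(np.mean((x - x.mean()) ** 4))                     # direct calculation
variance = m2 - m1 ** 2

print(central4, direct4)              # nearly equal
print(central4 / variance ** 2)       # kurtosis, close to 3 (exactly 3 for the Normal)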

Approximations to the Normal Distribution:*46

Φ(x) ≅ 1 - φ(x){.4361836t - .1201676t² + .9372980t³}, where t = 1/(1 + .33267x), for x ≥ 0.

46
See pages 103-104 of Simulation by Ross or 26.2.16 in Handbook of Mathematical Functions.

Testing the Normality of Residuals:

One could test whether the residuals are Normally distributed using the methods one can
apply to any data and distribution.47 One could use graphical techniques such as:
histogram, ogive, or p-p plots. One could use statistical tests such as the K-S Statistic.

However, there are special techniques one can apply for the Normal Distribution.
If the graph of the residuals does not look roughly symmetric, they are probably not Normally
distributed.48 If the residuals are Normally Distributed one also expects the skewness of the
residuals to be close to zero and the kurtosis to be close to 3, those of a Normal Distribution.

Jarque-Bera Statistic:*

One can combine these ideas into a statistical test using the Jarque-Bera statistic:49
JB = (N/6)(Skewness² + (Kurtosis - 3)²/4).

The JB Statistic has a Chi-Square Distribution with 2 degrees of freedom,50 which is an
Exponential Distribution with a mean of 2.51 If the residuals are Normal, we expect the JB
Statistic to be close to zero, since the observed skewness should be close to zero and the
observed kurtosis should be close to 3. If the JB Statistic is sufficiently large, then we reject the
hypothesis that the residuals are Normal.

Exercise: For a regression fit to 100 points, the residuals have a skewness of -0.6 and a
Kurtosis of 3.8. Compute the Jarque-Bera statistic and test the null hypothesis that the
residuals are Normally Distributed.
[Solution: JB = (100/6)(.6² + .8²/4) = 8.67.
The p-value of the test is the chance that the JB statistic would be this large or larger if the null
hypothesis were true. If H0 is true, JB follows an Exponential Distribution with mean 2 and
therefore, p-value = e^(-8.67/2) = 1.3%.
Using the Chi-Square table for 2 degrees of freedom, since 7.38 < 8.67 < 9.21, we reject H0 at
2.5% and do not reject at 1%.]
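The arithmetic of this test is easy to reproduce in code. Below is a minimal Python sketch (not part of the original text); the skewness and kurtosis are taken as given, as in the exercise, and the helper at the end shows how they would ordinarily be computed from the residuals.

# Minimal sketch of the Jarque-Bera calculation in the exercise above, in plain Python.
import math

def jarque_bera(n, skewness, kurtosis):
    # JB = (N/6)(Skewness^2 + (Kurtosis - 3)^2 / 4)
    return (n / 6.0) * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

jb = jarque_bera(100, -0.6, 3.8)       # about 8.67
p_value = math.exp(-jb / 2.0)          # chi-square with 2 d.f. is exponential with mean 2
print(jb, p_value)                     # about 8.67 and 0.013

# In practice the skewness and kurtosis would first be estimated from the residuals:
def sample_skewness_kurtosis(residuals):
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((r - m) ** 2 for r in residuals) / n
    m3 = sum((r - m) ** 3 for r in residuals) / n
    m4 = sum((r - m) ** 4 for r in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2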

47
See “Mahler’s Guide to Fitting Loss Distributions.”
48
The Normal Distribution is symmetric. Note that the mean of the residuals is always zero.
49
See page 47 of Econometric Models and Economic Forecasts.
50
The skewness of a sample from a Normal Distribution is asymptotically Normal with mean zero and variance of 6/n.
Therefore, (N/6)Skewness2 for large samples is approximately a Unit Normal squared.
The kurtosis of a sample from a Normal Distribution is asymptotically Normal with mean 3 and variance of 24/n.
Therefore, (N/24)(Kurtosis - 3)2 for large samples is approximately a Unit Normal squared. See Statistical Methods by
Snedecor and Cochran or Volume 1 of Kendall’s Advanced Theory of Statistics.
Also the correlation between the skewness and kurtosis from Normal Samples is asymptotically (27/n). Thus for large
samples, the two terms in the JB Statistic are approximately independent, and their sum is approximately the sum of
two independent unit Normals squared, which is a Chi-Square Distribution with 2 degrees of freedom. See Volume
1 of Kendall’s Advanced Theory of Statistics.
51
In general, a Chi-Square Distribution with ν degrees of freedom, is a Gamma Distribution with α = ν/2 and θ = 2.
See “Mahler’s Guide to Conjugate Priors.”

Problems:

7 .1 (1 point) For a Standard Normal Distribution, with mean 0 and standard deviation of 1,
which of the following is a symmetric 90% confidence interval?
A. [-1.282, 1.282] B. [-1.645, 1.645] C. [-1.960, 1.960]
D. [-2.326, 2.326] E. None of A, B, C, or D.

7 .2 (1 point) X has a Normal Distribution with µ = 7 and σ = 4.


What is the probability that 6 ≤ x ≤ 12?
A. 49% B. 50% C. 51% D. 52% E. 53%

7.3 (3 points) For which of the following situations are the residuals of the regression most
likely to be Normally Distributed?
Regression Second Moment Third Moment Fourth Moment
of the Residuals of the Residuals of the Residuals
A. 5 2 80
B. 5 1 50
C. 5 0 100
D. 5 -2 50
E. 5 -1 70

7.4 (1 point) β̂ = 13 and Var[β̂] = 9. Using the Normal Approximation, which of the following is
a symmetric 95% confidence interval for β?
A. [9.1, 16.9] B. [8.1, 17.9] C. [7.1, 18.9]
D. [6.1, 19.9] E. None of A, B, C, or D.

7.5 (2, 5/83, Q. 19) (1.5 points) Let X1, X2, and X3 be a random sample from a normal
distribution with mean µ ≠ 0 and variance σ2 = 1/24. What are the values of a and b,
respectively, in order for L = aX1 + 4X2 + bX3, to have a standard normal distribution?
A. a = -2, b = -2 B. a = -2, b = 2 C. a = -1, b= -3
D. a = 2, b = 2 E. Cannot be determined from the given information

7.6 (2, 5/88, Q. 29) (1.5 points) A symmetric 98% confidence interval is needed for µ, the
mean of a normal population whose variance is 10. What is the smallest sample size required
so that the length of the confidence interval will be no more than 3?
A. 5 B. 7 C. 25 D. 30 E. 242

7.7 (2, 5/90, Q. 3) (1.7 points) Let X1, X2, . . . , X36 and Y1, Y2, . . . , Y49, be independent
random samples from distributions with means µX = 30.4 and µY = 32.1 and with standard
deviations σX = 12 and σY = 14.
What is the approximate value of P[ X > Y ]?
A. 0.27 B. 0.34 C. 0.50 D. 0.66 E. 0.73

7.8 (2, 5/92, Q. 37) (1.7 points) A random sample X1, . . . , Xn is taken from a normal
distribution with mean µ and variance 12.
A symmetric 95% confidence interval is needed for µ. What is the smallest sample size for
which the length of the desired confidence interval is less than or equal to 5?
A. 3 B. 7 C. 8 D. 62 E. 89

7.9 (2, 2/96, Q.2) (1.7 points) Let X be a normal random variable with mean 0 and variance
a > 0. Calculate P[X2 < a].
A. 0.34 B. 0.42 C. 0.68 D. 0.84 E. 0.90

7.10 (2, 2/96, Q.45) (1.7 points) The weights of the animals in a population are normally
distributed with variance 144. A random sample of 16 of the animals Is taken. The mean
weight of the sample is 200 pounds. Calculate the lower bound of the symmetric 90%
confidence interval for the mean weight of the population.
A. 140.96 B. 194.12 C. 194.75 D. 195.08 E. 198.77

7.11 (1, 5/00, Q.6) (1.9 points) Two instruments are used to measure the height, h, of a
tower. The error made by the less accurate instrument is normally distributed with mean 0 and
standard deviation 0.0056h. The error made by the more accurate instrument is normally
distributed with mean 0 and standard deviation 0.0044h.
Assuming the two measurements are independent random variables, what is the
probability that their average value is within 0.005h of the height of the tower?
(A) 0.38 (B) 0.47 (C) 0.68 (D) 0.84 (E) 0.90

7.12 (1, 5/00, Q.9) (1.9 points) The total claim amount for a health insurance policy follows
a distribution with density function e-x/1000/1000, for x > 0.
The premium for the policy is set at 100 over the expected total claim amount.
If 100 policies are sold, what is the approximate probability that the insurance company will
have claims exceeding the premiums collected?
(A) 0.001 (B) 0.159 (C) 0.333 (D) 0.407 (E) 0.460

7.13 (1, 5/00, Q.19) (1.9 points) In an analysis of healthcare data, ages have been rounded
to the nearest multiple of 5 years. The difference between the true age and the rounded age is
assumed to be uniformly distributed on the interval from -2.5 years to 2.5 years. The
healthcare data are based on a random sample of 48 people.
What is the approximate probability that the mean of the rounded ages is within 0.25 years of
the mean of the true ages?
(A) 0.14 (B) 0.38 (C) 0.57 (D) 0.77 (E) 0.88

7.14 (1, 11/00, Q.19) (1.9 points) Claims filed under auto insurance policies follow a
normal distribution with mean 19,400 and standard deviation 5,000.
What is the probability that the average of 25 randomly selected claims exceeds 20,000?
(A) 0.01 (B) 0.15 (C) 0.27 (D) 0.33 (E) 0.45

7.15 (1, 5/01, Q.19) (1.9 points) A company manufactures a brand of light bulb with a
lifetime in months that is normally distributed with mean 3 and variance 1. A consumer buys a
number of these bulbs with the intention of replacing them successively as they burn out.
The light bulbs have independent lifetimes.
What is the smallest number of bulbs to be purchased so that the succession of light bulbs
produces light for at least 40 months with probability at least 0.9772?
(A) 14 (B) 16 (C) 20 (D) 40 (E) 55

7.16 (1, 5/03, Q.13) (2.5 points) A charity receives 2025 contributions. Contributions are
assumed to be independent and identically distributed with mean 3125 and standard deviation
250. Calculate the approximate 90th percentile for the distribution of the total contributions
received.
(A) 6,328,000 (B) 6,338,000 (C) 6,343,000 (D) 6,784,000 (E) 6,977,000

Section 8, Assumptions of Linear Regression52


There are a number of assumptions of the two-variable and multivariable regression models.

Linear Relationship:

In the two variable model we assume: Yi = α + βXi + εi,


where α is the intercept, β is the slope, and εi is the ith error term.

As will be discussed subsequently, a similar relationship holds for the multivariable model.
For example, for the three variable model: Yi = β1 + β2X2i + β3X3i + εi.
For the four variable model: Yi = β1 + β2X2i + β3X3i + β4X4i + εi.

As will be discussed subsequently, via change of variables one can obtain other relationships
between the independent variable(s) and the dependent variable.

Meaning of the Error Terms:

The error terms, εi, are random variables. They represent the variation in the dependent
variable caused by independent variables not included in the model and/or random
fluctuation.

For example, in the heights example, the height of the mother probably explains some of the
differences in the heights of sons, that is not explained solely by the heights of the fathers.
Nevertheless, a model that included the heights of both parents, would still have an error term,
since there are other factors that affect height, as well as a random element.

Many items important in insurance, such as the number of claims, aggregate dollars of loss,
etc., have a large purely random component.

Nonstochastic Independent Variables:

We assume the values of the independent variable(s) are observed known values.
In actuarial work, we usually have little or no control over what values of X have been
observed.

52
See Sections 3.1 and 4.1 of Pindyck and Rubinfeld.

Error Terms Have Mean of Zero:

We assume E[εi] = 0, for all i.

This implies that the linear regression estimator for Yi, α + βXi, is unbiased:
E[Yi] = E[α + βXi + εi] = E[α + βXi ] + E[εi] = E[α + βXi].53

Homoscedasticity:

We assume the error terms have constant variance; Var[εi] = σ² for all i. This is equivalent
to assuming the Yi have constant variance. This is called homoscedasticity. We will
subsequently discuss what to do when we have heteroscedasticity rather than
homoscedasticity.

Independent Errors:

We assume the error terms are independent.54 This is equivalent to assuming the Yi are
independent.

This implies that Cov[εi, εj] = 0 for i ≠ j. Since E[ε] = 0, this implies E[εiεj] = 0 for i ≠ j. We will
subsequently discuss what to do when the error terms are autocorrelated and thus not
independent.

Classical Linear Regression Model:

The above five assumptions are made in the Classical Linear Regression Model:
1. Yi = α + βXi + εi.
2. Xi are known fixed values.
3. E[εi] = 0 for all i.
4. Var[εi] = σ2 for all i.
5. εi and εj independent for i ≠ j.

53
For the model with an intercept, this would be equivalent to an assumption that all of the error terms have the same
expected value. If that expected value were not zero, we could just subtract that expected value from the intercept
and get a new model with E[εi*] = 0, for all i.
54
While the errors are independent, the residuals ^εi are not, since as discussed previously they are constrained to
sum to zero.

No Multicollinearity:

In the multivariate version, we add an assumption that no exact linear relationship exists
between two or more of the independent variables. In other words, we assume that the matrix
whose rows are X2, X3, ..., Xk has rank k -1, the number of independent variables other than
the constant. We will subsequently discuss multicollinearity or approximate multicollinearity.

Classical Normal Linear Regression Model:

If we add an assumption, we get the Classical Normal Linear Regression Model:


6. The error terms are Normally Distributed.

Thus Yi is Normally Distributed with mean α + βXi and variance σ².

It is this assumption of normality that will allow the very important use of t-tests and F-tests, to
be discussed subsequently.55
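To make the assumptions concrete, here is a minimal simulation sketch (not part of the original text; it assumes numpy is available, and the parameter values are purely illustrative) that generates data satisfying assumptions 1-6 and fits the line by least squares.

# Simulation sketch (assumes numpy) of data generated under assumptions 1-6 and then fit
# by least squares. The parameter values here are hypothetical, not from the text.
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma = 24.0, 0.63, 1.2                    # hypothetical true values
X = np.array([53., 54., 57., 58., 61., 62., 63., 66.])  # fixed, known X values
eps = rng.normal(0.0, sigma, size=X.size)               # independent Normal errors, mean 0, constant variance
Y = alpha + beta * X + eps                              # the assumed linear relationship

xd = X - X.mean()
beta_hat = xd @ (Y - Y.mean()) / (xd @ xd)
alpha_hat = Y.mean() - beta_hat * X.mean()
print(beta_hat, alpha_hat)    # the estimates vary around the true alpha and beta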

55
In practical applications it is only required that the errors be approximately normally distributed.

Problems:

8.1 (2 points) In the Classical Linear Regression Model, Yi = α + βXi + εi, which of the
statements following are false?
A. The Xs are nonstochastic.
B. E[εi] = 0.
C. E[εi εj] = δij, where δij = 0 if i ≠ j and 1 if i = j.
D. Var[εi] = Var[εj].
E. Statements A, B, C, and D are all true.

8.2 (3 points) In the Classical Linear Regression Model, determine the correlation between X
and Y.

Section 9, Properties of Estimators56


We are interested in what is expected to happen if we were to use an estimator again and
again to make a particular estimate. The errors that would result from the repeated
use of a procedure is what is referred to when we discuss the qualities of an
estimator. Various desirable properties of estimators will be discussed.

Estimates versus Estimators:

Using linear regression to estimate the salary of an actuary with 5 exams is an example of an
estimator.

If for a particular set of data, α̂ = 35.5, β̂ = 9.5, then 35.5 + (5)(9.5) = 83 is an example of an
estimate.

An estimator is a procedure used to estimate a quantity of interest.


An estimate is the result of using an estimator.

An estimator is a random variable or random function.


An estimate is a number or a function.

Point Estimators:

A point estimator provides a single value, or point estimate, as an estimate of a quantity of
interest.57 An example of a point estimator would be to take α̂ + 5β̂ for the parameters of a
fitted linear regression as the estimate of the value of Y when X is 5.

One wants point estimators: to be unbiased, to be consistent, to be efficient,


and to have a small mean squared error.

Bias:

The Bias of an estimator is the expected value of the estimator minus the true
value. An unbiased estimator has a Bias of zero.

The sample variance, Σ(Xi - X̄)²/(N-1), is a good example of an unbiased estimator.

56
While Section 2.3 of Pindyck and Rubinfeld is not on the syllabus, these ideas are.
See also Loss Models by Klugman, Panjer, and Willmot.
57
Point estimates differ from interval estimates. A point estimate of β might be 17.
An interval estimate of β might be: 17 ± 5, with 90% confidence.

Exercise: Demonstrate that the sample variance is an unbiased estimator for a random sample
of size 3.
[Solution: Let X1, X2, and X3, be three independent identically distributed variables.
X̄ = (X1 + X2 + X3)/3.
(X1 - X̄)² = (2X1/3 - X2/3 - X3/3)² = 4X1²/9 + X2²/9 + X3²/9 - 4X1X2/9 - 4X1X3/9 + 2X2X3/9.
E[(X1 - X̄)²] = E[4X1²/9 + X2²/9 + X3²/9 - 4X1X2/9 - 4X1X3/9 + 2X2X3/9] =
(4/9)E[X²] + (1/9)E[X²] + (1/9)E[X²] - (4/9)E[X]E[X] - (4/9)E[X]E[X] + (2/9)E[X]E[X] =
(2/3){E[X²] - E[X]²} = (2/3)Var[X].
E[(X1 - X̄)²] = E[(X2 - X̄)²] = E[(X3 - X̄)²] = (2/3)Var[X].
The expected value of the sample variance is:
E[{(X1 - X̄)² + (X2 - X̄)² + (X3 - X̄)²}/(3 - 1)] = {(2/3)Var[X] + (2/3)Var[X] + (2/3)Var[X]}/2 = Var[X].]

ΣXi = NX̄. ⇒ Σ(Xi - X̄) = 0. Thus if one knows any N - 1 of the Xi - X̄, then one knows the
remaining one. We lose a degree of freedom; we have N - 1 rather than N degrees of freedom.
Therefore, in order to obtain an unbiased estimator of the variance, we put N - 1 rather than N
in the denominator.
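A quick simulation makes the role of the N - 1 denominator visible. The sketch below (not part of the original text) assumes numpy is available.

# Simulation sketch (assumes numpy): dividing by N - 1 gives an unbiased estimate of the
# variance, while dividing by N gives expected value (N - 1)/N times the true variance.
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
n, trials = 3, 200000
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

print(samples.var(axis=1, ddof=1).mean())   # close to 4.0
print(samples.var(axis=1, ddof=0).mean())   # close to (2/3)(4.0) = 2.67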

Asymptotically Unbiased:

For an asymptotically unbiased estimator, as the number of data points N → ∞,
the bias approaches zero.58
In other words, as the sample size N approaches infinity, the expected value of the
estimator approaches the true value of the quantity being estimated.

Σ(Xi - X̄)²/N = {(N - 1)/N}(sample variance).

The sample variance is an unbiased estimator of the variance, and therefore the expected
value of the sample variance is Var[X].
Thus, Σ(Xi - X̄)²/N has an expected value of Var[X](N - 1)/N, which goes to Var[X] as N → ∞.
Therefore, Σ(Xi - X̄)²/N is an asymptotically unbiased estimator of the variance.

Consistency:

When based on a large number of observations, a consistent estimator, also


called weakly consistent, has a very small probability that it will differ by a large
amount from the true value.59
Let ψn be the estimator with a sample size of n and c be the true value; then ψn is a consistent
estimator if given any ε > 0:
lim (n→∞) Probability{|ψn - c| < ε} = 1.

58
See Definition 9.6 in Loss Models.
59
A consistent estimator may also be defined as one that converges stochastically to the true value.

Most estimators used by actuaries are consistent. For example the sample mean is a
consistent estimator of the underlying mean assuming the data are independent draws from a
single distribution with finite mean.60

Exercise: Use as an estimator of the mean, the first element of a data set of size N.
Is this estimator consistent?
[Solution: The estimate resulting from this estimator does not depend on the sample size N.
The probability of a large error is independent of N and therefore does not approach zero as N
approaches infinity. This estimator is not consistent.
Comment: This is a stupid estimator to use.]

Mean Squared Errors:

The mean square error (MSE) of an estimator is the expected value of the squared
difference between the estimate and the true value. The smaller the MSE, the better the
estimator, all else equal.

MSE(θ̂) = Var(θ̂) + [Bias(θ̂)]²

The mean squared error is equal to the variance plus the square of the bias.
Thus for an unbiased estimator, the mean square error is equal to the variance.

When there is a tradeoff between the bias and variance (efficiency) of an estimator, one may
look to minimize the mean squared error.
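The decomposition MSE = variance + bias² can be checked numerically. The following Python sketch (illustrative only; numpy assumed; the biased estimator is chosen arbitrarily) compares the simulated MSE of an estimator to its simulated variance plus squared bias.

# Sketch: verify numerically that MSE = variance + bias^2 for an estimator.
import numpy as np

rng = np.random.default_rng(seed=3)
theta = 10.0                                    # true value being estimated
samples = rng.normal(loc=theta, scale=3.0, size=(200000, 5))

# A deliberately biased estimator of theta: 0.9 times the sample mean.
estimates = 0.9 * samples.mean(axis=1)

bias = estimates.mean() - theta                 # about 0.9*10 - 10 = -1.0
variance = estimates.var()                      # about (0.81)(9/5) = 1.458
mse = np.mean((estimates - theta) ** 2)

print(mse, variance + bias ** 2)                # the two agree, up to simulation noise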

Variances:

Let ψn be an asymptotically unbiased estimator whose variance goes to zero as n, the number of
data points, goes to infinity. Then as n goes to infinity, since both the variance and the bias go to
zero, the Mean Squared Error, MSE(ψn) = Var(ψn) + [Bias(ψn)]², also goes to zero.

Let c be the true value; then for any ε > 0:

MSE(ψn) = ∫ (ψn - c)² f(ψn) dψn ≥ ε² Probability{|ψn - c| ≥ ε} ≥ 0.

Therefore, since MSE(ψn) goes to zero as n goes to infinity:

limit as n → ∞ of Probability{|ψn - c| ≥ ε} = 0.

Therefore, an asymptotically unbiased estimator, whose variance goes to zero as the number
of data points goes to infinity, is consistent (weakly consistent).

60
The Law of Large Numbers. See for example, An Introduction to Probability Theory and Its Applications by Feller,
or A First Course in Probability by Ross.
For example, Σ(xi - x̄)²/n is an asymptotically unbiased estimator of the variance. For a
distribution with finite fourth moment, the variance of this estimator goes to zero. Therefore, in
that case, Σ(xi - x̄)²/n is a consistent estimator of the variance.

The minimal variance for an unbiased estimator of a parameter (provided the support of the
density does not depend on the parameter) is the Rao-Cramer lower bound:

-1 / {n E[∂² ln f(x) / ∂θ²]} = 1 / {n E[(∂ ln f(x) / ∂θ)²]}.

Thus an unbiased estimator of a parameter has the smallest MSE among unbiased estimators
if and only if its variance attains the Rao-Cramer lower bound.

Efficiency:

For an unbiased estimator, the variance is equal to the mean squared error. Thus an unbiased
estimator has the smallest MSE among unbiased estimators if and only if it has the smallest
variance among unbiased estimators.

An unbiased estimator is efficient if for a given sample size it has the smallest
variance of any unbiased estimator.

The efficiency of an unbiased estimator is:


(the Rao-Cramer lower bound)/(Variance of that estimator).
Thus an unbiased estimator with a variance equal to the Rao-Cramer lower bound is 100%
efficient.

Exercise: The Rao-Cramer lower bound is 23.


What is the efficiency of an unbiased estimator with Mean Squared Error = 37?
[Solution: Efficiency = 23/37 = 62.2%.]

Maximum Likelihood estimators usually have an efficiency less than 100% for finite sample
sizes. However, the efficiency of Maximum Likelihood estimators approaches 100% as the
sample size increases towards infinity; they are asymptotically efficient.

Linear Regression Estimators:

Under the assumptions of the classical linear regression model61, the least squares
estimator of the slope is unbiased.
In other words, the expected value of the estimate of the slope is equal to the true slope:62
E[β̂] = β.

This holds for both the two-variable model and the multi-variable model, provided the assumed
model is the correct model.

The ordinary least squares estimator is consistent. It is consistent even with


heteroscedasticity and/or serial correlation of errors, to be discussed subsequently.

Under the assumptions of the classical linear regression model, the regression estimators of
the intercept and slope(s) are the most efficient linear unbiased estimators.63 This is the
Gauss-Markov Theorem. In other words, the ordinary least squares estimators are
the best linear unbiased estimators, BLUE; they have the smallest variance
among linear unbiased estimators.

If the assumptions of the classical linear regression model do not hold then the ordinary least
squares estimator is not necessarily efficient. Specifically with heteroscedasticity and/or
serial correlation of errors, to be discussed subsequently, the ordinary least squares
estimator is not efficient.64

When Relevant Variables Are Omitted from Linear Regression Models:*65

Let us assume that Y = β1 + β2X2 + β3X3 + ε, or in deviations form y = β2x2 + β3x3 + ε.


However, we inadvertently exclude the independent variable X3 from our model.

Then β̂2 = Σx2iyi/Σx2i² = Σx2i(β2x2i + β3x3i + εi)/Σx2i² = β2 + β3Σx2ix3i/Σx2i² + Σx2iεi/Σx2i² =
β2 + β3Cov[X2, X3]/Var[X2] + Σx2iεi/Σx2i².

E[β̂2] = β2 + β3Cov[X2, X3]/Var[X2].

Bias = E[β̂2] - β2 = β3Cov[X2, X3]/Var[X2] = β3Corr[X2, X3]√(Var[X3]/Var[X2]).

61
For unbiasedness, only the first three assumptions are needed: linearity, X not stochastic, and E[ε] = 0.
62
See equation 3.6 of Econometric Models and Economic Forecasts.
63
We do not require normal errors for the Gauss-Markov Theorem to hold.
64
See pages 147 and 159 of Pindyck and Rubinfeld.
65
See Section 7.3.1. of Pindyck and Rubinfeld, not on the Syllabus.

Thus, when a relevant variable is (unknowingly) omitted from the model, then the estimator of
the slope will be biased, unless the omitted variable is uncorrelated with the included
variables.66

Correlation of X2 and X3      Sign of β3      Bias of β̂2
Positive                      Positive        Upwards
Positive                      Negative        Downwards
Zero                          Either          None
Negative                      Positive        Downwards
Negative                      Negative        Upwards

We note that the above bias does not depend on the sample size.
Thus as N → ∞, the probability of large absolute error does not go to zero.
β̂2 is not a consistent estimator of β2.

Thus, an example of an inconsistent estimator is the regression estimator of the slope, when a
relevant variable is (unknowingly) omitted from the model.67

As discussed in a subsequent section, the omission of relevant variables from a linear
regression model of a time series often leads to positive serial correlation of errors.
In that case, the estimators of the slopes are no longer efficient.
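This omitted-variable bias is easy to see in a simulation. The following Python sketch (illustrative only; numpy assumed; the coefficients and the correlation between X2 and X3 are made up) fits y on x2 alone when the true model also involves a correlated x3, for two sample sizes; the bias does not shrink as N grows.

# Sketch: omitting a correlated relevant variable biases the fitted slope,
# and the bias does not shrink as the sample size N grows.
import numpy as np

rng = np.random.default_rng(seed=4)
beta2, beta3 = 2.0, 3.0

for N in (50, 5000):
    slopes = []
    for _ in range(2000):
        x2 = rng.normal(size=N)
        x3 = 0.5 * x2 + rng.normal(size=N)        # X3 is correlated with X2
        eps = rng.normal(size=N)
        y = beta2 * x2 + beta3 * x3 + eps         # true model, in deviations form
        x2d, yd = x2 - x2.mean(), y - y.mean()
        slopes.append((x2d @ yd) / (x2d @ x2d))   # regression of y on x2 alone
    # E[slope] = beta2 + beta3*Cov[X2,X3]/Var[X2] = 2 + (3)(0.5) = 3.5, for any N
    print(N, np.mean(slopes))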

When Irrelevant Variables Are Included in Linear Regression Models:*68

If one includes an extra variable or variables in a linear regression model that do not help explain
the dependent variable, then the least squares estimators of the coefficients are:
1. Still unbiased.
2. Still consistent.
3. No longer efficient.

Specifically, if irrelevant variables are included, the variance of the estimated slope of a
relevant variable will increase.

Specification Error:*69

Specification Error or Modeling Error occurs when one (inadvertently) uses the wrong model
for a real world situation.
specification error ⇔ (inadvertently) used inappropriate model or assumptions.

We have discussed two examples, either using an irrelevant variable or omitting relevant
variables. Another example of specification error is if one assumes a linear model, when the
real world relationship is not linear.
66
This bias is also present when more than one variable is omitted.
67
This problem does not occur if the omitted variable is uncorrelated with the included variables.
68
See Section 7.3.2. of Pindyck and Rubinfeld, not on the Syllabus.
69
See Section 7.3 of Pindyck and Rubinfeld, not on the Syllabus.

Other Types of Errors:*70

Even if one has specified the correct model there are other types of errors.

Due to random fluctuation in the data used to fit the model, which depends on the size of the
sample, there is parameter error. The fitted parameters vary around their actual values.

If we then use the fitted model for forecasting, due to random fluctuations the observed value
will vary around its mean. This is usually referred to as process error.

As will be discussed in a subsequent section, both parameter error and process error
contribute to forecasting error.

Variances of Estimated Parameters:

We will subsequently discuss how to estimate variances of estimated parameters such as β̂.
As will be discussed, variances such as Var[β̂] are used to get standard errors used in t-tests.
Therefore, it is important to be able to get good estimates of the variances of estimated
parameters.

Under the assumptions of the classical linear regression model, the usual estimators of these
variances are unbiased, consistent, and efficient. However, when heteroscedasticity is present,
this is no longer true.

Heteroscedasticity-consistent estimators (HCE), to be discussed subsequently, provide


unbiased, and consistent estimators of variances of estimated parameters, when
heteroscedasticity is present.

Heteroscedasticity-consistent estimators are not efficient. Efficient estimators of variances of


estimated parameters are obtained via weighted regression, to be discussed subsequently.

Heteroscedasticity-consistent estimators are consistent in the presence of heteroscedasticity of


unknown form. In order to apply weighted regression to correct for heteroscedasticity, one must
know or estimate how the variances vary across the observations.

70
Not on the syllabus. Different ways to categorize errors are used by various authors.

Loss Functions:*

Assume we observe values of: 12, 3, 38, 5, 8. If one wanted to estimate the next value, one
might take either the empirical mean of 13.2 or the empirical median of 8. Which of these two
estimates is “better” depends on which criterion one uses to decide the question. As we’ll see,
either of these estimates or some other estimate might be considered “best”.

Which estimator of a quantity of interest is the “best” depends on what criterion is used.
Commonly one wants to minimize the expected value of the “error”. Depending on the error or
loss function one wishes to minimize, one gets different estimators. Let xi be the observations of
the quantity we wish to estimate and let α be the estimate of that quantity of interest which
minimizes a given loss function.71

If the loss function is Squared Deviations, then the best estimator is the Mean.
∂{Σ(α − xi)²}/∂α = 2Σ(α − xi), which equals 0 when α = Σxi/n = mean.

The mean of the conditional distribution minimizes the expected squared error.

If the loss function is Absolute Deviations, then the best estimator is the Median.
∂{Σ|α − xi|}/∂α = Σ sgn(α − xi), where sgn(y) equals 1 when y > 0, equals -1 when y < 0,
and equals 0 when y = 0. This partial derivative of the absolute deviations has an expected
value equal to zero when there is an equal chance that α > xi or α < xi.
This occurs when α is the median.

The median of the conditional distribution minimizes the expected absolute error.
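One can verify these two facts numerically for the observed values above. The following Python sketch (illustrative only; numpy assumed) searches a grid of candidate estimates and confirms that the total squared deviation is smallest at the mean and the total absolute deviation is smallest at the median.

# Sketch: for the observed values, the mean minimizes the total squared
# deviation and the median minimizes the total absolute deviation.
import numpy as np

x = np.array([12.0, 3.0, 38.0, 5.0, 8.0])
grid = np.linspace(0.0, 40.0, 4001)              # candidate estimates

squared_loss = [np.sum((a - x) ** 2) for a in grid]
absolute_loss = [np.sum(np.abs(a - x)) for a in grid]

print(grid[np.argmin(squared_loss)], x.mean())       # both equal the mean, 13.2
print(grid[np.argmin(absolute_loss)], np.median(x))  # both equal the median, 8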

Note that the squared error loss function (dashed line) counts extreme errors more heavily
than does the absolute error function (solid line.)

[Graph of the squared error loss function (dashed) and the absolute error loss function (solid).]
71
Note that multiplying a loss function by a constant does not change where that function is minimized.
Thus one gets the same “best” estimator.

Problems:

9.1 (1 point) For a linear regression model, which of the following statements about estimators
of the slope parameters are true?
(A) Heteroscedasticity can produce biased estimators.
(B) Serial correlation can produce biased estimators.
(C) Heteroscedasticity can produce inconsistent estimators.
(D) Serial correlation can produce inconsistent estimators.
(E) None of A, B, C, or D.

9.2 (3 points) X has a 70% chance of being 0 and a 30% chance of being 10.
What is the variance of X?
For a sample of size two, compute the expected value of the sample variance by listing all
possible samples of size two.

9.3 (1 point) A variable is omitted from a linear regression model.


The omitted variable is correlated with the included variables.
Which of the following statements about estimators of the slope parameters are true?
1. They are unbiased.
2. They are consistent.
3. They are efficient.
A. None of 1, 2, or 3 B. 1, 2 C. 1, 3 D. 2, 3 E. None of A, B, C, or D

9.4 (3 points) You are given two estimators, α and β, of the same unknown quantity.
(i) Bias of α is -1. Variance of α is 5.
(ii) Bias of β is 2. Variance of β is 3.
(iii) The correlation of α and β is 0.6.
Estimator γ is a weighted average of the two estimators α and β, such that:
γ = wα + (1 - w)β.
Determine the value of w that minimizes the mean squared error of γ.
A. Less than 0.4
B. At least 0.4, but less than 0.5
C. At least 0.5, but less than 0.6
D. At least 0.6, but less than 0.7
E. 0.7 or more

9.5 (1 point) A variable that does not help explain the dependent variable is included in a
linear regression model.
Which of the following statements about estimators of the slope parameters are true?
1. They are unbiased.
2. They are consistent.
3. They are efficient.
A. None of 1, 2, or 3 B. 1, 2 C. 1, 3 D. 2, 3 E. None of A, B, C, or D

9.6 (2 points) You are given:


x 0 10 100
Pr[X = x] 0.6 0.3 0.1
For a sample of size n, X = ΣXi/n, and the population variance is estimated by Σ(Xi - X )2/n.
When n = 6, calculate the bias of this estimator of the variance.
A. Less than -100
B. At least -100, but less than -50
C. At least -50, but less than 50
D. At least 50, but less than 100
E. At least 100

* 9.7 (2 points) One is fitting a size of loss distribution to some data.


Ignoring any effects of inflation, give examples in this context of specification error (modeling
error), parameter error, and process error.

9.8 (3 points) Y = 10 + 3X2 + 2X3 + ε, where each εi has mean of zero.


Observations are made for the following pairs of X2 and X3: (0, 0), (1, 1), (2, 3).
A linear regression, Y = α + βX2 + ε, is fit to the observations, with X3 omitted from the model.
Determine the expected values of α̂ and β̂.

9.9 (2, 5/83, Q.33) (1.5 points) Let X1, X2, . . . , Xn be a random sample of size n ≥ 2 from a
Poisson distribution with mean λ. Consider the following three statistics as estimators of λ.
I. X̄ = ΣXi/n
II. Σ(Xi - X̄)²/(n - 1)
III. 2X1 - X2
Which of these statistics are unbiased?
A. I only B. II only C. III only D. I, II, and III
E. The correct answer is not given by A, B, C, or D

9.10 (4B, 5/92, Q.2) (1 point) Which of the following are true?
1. The expected value of an unbiased estimator of a parameter is equal to the true value of the
parameter.
2. If an estimator is asymptotically unbiased, the probability that an estimate based on n
observations differs from the true parameter by more than some fixed amount converges
to zero as n grows large.
3. A consistent estimator is one with a minimal variance.
A. 1 only B. 3 only C. 1 and 2 only D. 1, 2, and 3
E. The correct answer is not given by A, B, C, or D

9.11 (4B, 11/92, Q.8) (1 point) You are given the following information:
X is a random variable whose distribution function has parameter α = 2.00.
Based on n random observations of X you have determined:
• E[α1] = 2.05 where α1 is an estimator of α having variance = 1.025.
• E[α2] = 2.05 where α2 is an estimator of α having variance = 1.050.
• As n increases to ∞, P[|α1 - α| > ε] approaches 0 for any ε > 0.
Which of the following are true?
1. α1 is an unbiased estimator of α.
2. α2 has a smaller Mean Squared Error than α1.
3. α1 is a consistent estimator of α.
A. 1 only B. 2 only C. 3 only D. 1, 3 only E. 2, 3 only

9.12 (4B, 11/93, Q.13) (3 points) You are given the following:
• Two instruments are available for measuring a particular (non-zero) distance.
• X is the random variable representing the measurement using the first instrument
and Y is the random variable representing the measurement using the second instrument.
• X and Y are independent.
• E[X] = 0.8m; E[Y] = m; Var[X] = m2; and Var[Y] = 1.5m2 where m is the true distance.
Consider the class of estimators of m which are of the form Z = αX + βY.
Within this class of estimators of m, determine the value of α that makes Z an unbiased
estimator with smallest mean squared error.
A. Less than 0.45
B. At least 0.45, but less than 0.50
C. At least 0.50, but less than 0.55
D. At least 0.55, but less than 0.60
E. At least 0.60

9.13 (4B, 5/95, Q.27) (2 points) Two different estimators, ψ and φ, are available for
estimating the parameter, β, of a given loss distribution. To test their performance, you have
conducted 75 simulated trials of each estimator, using β = 2, with the following results:
Σψi = 165,  Σψi² = 375,  Σφi = 147,  Σφi² = 312,  where each sum is over the 75 trials.
Let MSE(ψ) = the mean squared error of estimator ψ.
Let MSE(φ) = the mean squared error of estimator φ.
In this simulation, what is MSE(ψ) / MSE(φ)?
A. Less than 0.50
B. At least 0.50, but less than 0.65
C. At least 0.65, but less than 0.80
D. At least 0.80, but less than 0.95
E. At least 0.95, but less than 1.00

9.14 (4B, 5/96, Q.12) (1 point)


Which of the following must be true of a consistent estimator?
1. It is unbiased.
2. For any small quantity ε, the probability that the absolute value of the deviation of the
estimator from the true parameter value is less than ε tends to 1 as the number of
observations tends to infinity.
3. It has minimal variance.
A. 1 B. 2 C. 3 D. 2, 3 E. 1, 2, 3

9.15 (4B, 11/96, Q.21) (2 points) You are given the following:
• The expectation of a given estimator is 0.50.
• The variance of this estimator is 1.00.
• The bias of this estimator is 0.50.
Determine the mean square error of this estimator.
A. 0.75 B. 1.00 C. 1.25 D. 1.50 E. 1.75

9.16 (4B, 11/97 Q.13) (2 points) You are given the following:
• The random variable X has the density function
f(x) = e^(-x), 0 < x < ∞.
• A loss function is given by |X - k|, where k is a constant.
Determine the value of k that will minimize the expected loss.
A. ln 0.5 B. 0 C. ln 2 D. 1 E. 2

9.17 (4, 11/01, Q.35) (2.5 points) You observe N independent observations from a process
whose true model is: Yi = α + βXi + εi.
You are given:
(i) Zi = Xi2, for i = 1, 2, ..., N.
(ii) b∗ = {Σ(Zi - Z )(Yi - Y )}/ {Σ(Zi - Z )(Xi - X )}.
Which of the following is true?
(A) b∗ is a nonlinear estimator of β.
(B) b∗ is a heteroscedasticity-consistent estimator (HCE) of β.
(C) b∗ is a linear biased estimator of β.
(D) b∗ is a linear unbiased estimator of β, but not the best linear unbiased estimator (BLUE)
of β.
(E) b∗ is the best linear unbiased estimator (BLUE) of β.

9.18 (CAS3, 5/05, Q.21) (2.5 points) An actuary obtains two independent, unbiased
estimates, Y1 and Y2, for a certain parameter. The variance of Y1 is four times that of Y2.
A new unbiased estimator of the form k1Y1 + k2Y2 is to be constructed.
What value of k1 minimizes the variance of the new estimate?
A. Less than 0.18
B. At least 0.18, but less than 0.23
C. At least 0.23, but less than 0.28
D. At least 0.28, but less than 0.33
E. 0.33 or more

9.19 (4, 5/05, Q.16) (2.9 points) For the random variable X, you are given:
(i) E[X] = θ, θ > 0.
(ii) Var(X) = θ2/25.
(iii) θ̂ = {k/(k + 1)}X̄, k > 0.
(iv) MSEθ̂(θ) = 2[biasθ̂(θ)]².
Determine k.
(A) 0.2 (B) 0.5 (C) 2 (D) 5 (E) 25

9.20 (CAS3, 5/06, Q.3) (2.5 points) Mrs. Actuarial Gardner has used a global positioning
system to lay out a perfect 20-meter by 20-meter gardening plot in her back yard.
Her husband, Mr. Actuarial Gardner, decides to estimate the area of the plot. He paces off a
single side of the plot and records his estimate of the length. He repeats this experiment an
additional 4 times along the same side. Each trial is independent and follows a Normal
distribution with mean 20 meters and a standard deviation of 2 meters. He then averages his
results and squares that number to estimate the total area of the plot.
Which of the following is a true statement regarding Mr. Gardner’s method of estimating the
area?
A. On average, it will underestimate the true area by at least 1 square meter.
B. On average, it will underestimate the true area by less than 1 square meter.
C. On average, it is an unbiased method.
D. On average, it will overestimate the true area by less than 1 square meter.
E. On average, it will overestimate the true area by at least 1 square meter.
Mahler’s Guide to
Regression
Sections 10-14:
10 Variances and Covariances
11 t-Distribution
12 t-test
13 Confidence Intervals for Estimated Parameters

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-C

New England Actuarial Seminars Howard Mahler


POB 315 hmahler@mac.com
Sharon, MA, 02067
www.neas-seminars.com

Section 10, Variances and Covariances


As with any model, it is useful to estimate the variances and covariances of the estimated
parameters of the linear regression model.

Estimating the Variance of the Regression:

One assumption of the Classical Linear Regression Model is that Var[εi] = σ² for all i.
In order to estimate the variances and covariances of the estimated parameters of the
regression model, the key step is to estimate σ², the variance of the regression.
An unbiased estimator of σ² is the residual variance, s²:

s² = Σε̂i²/(N - k) = ESS/(N - k) = estimated variance of the regression.

In order to estimate the variance of the regression, we divide the error sum of squares by its
number of degrees of freedom: N - k = number of data points minus number of parameters
estimated (including the intercept).72

For the two-variable model: s² = Σε̂i²/(N - 2) = ESS/(N - 2).

Exercise: Estimate the variance of the regression of heights discussed previously.


[Solution: As determined previously for this example, ESS = 8.741. There are 8 fathers and 8
sons. Therefore, s2 = ESS/(N - 2) = 8.741/(8 - 2) = 1.457.]

s is called the standard error of the regression.

72
This is analogous to the calculation of a sample variance, which has denominator N-1 rather than N.
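As a practical illustration, here is a minimal Python sketch of this calculation (an addition to the text, not from the textbook; numpy assumed; the data are made up): fit a two-variable regression, form the residuals, and divide the error sum of squares by N - k.

# Sketch: estimate the variance of the regression, s^2 = ESS/(N - k),
# for a two-variable model fit to some illustrative data.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])          # made-up data
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.7, 12.2])

x = X - X.mean()
y = Y - Y.mean()
beta_hat = (x @ y) / (x @ x)
alpha_hat = Y.mean() - beta_hat * X.mean()

residuals = Y - (alpha_hat + beta_hat * X)
ESS = np.sum(residuals ** 2)
N, k = len(X), 2
s2 = ESS / (N - k)                 # estimated variance of the regression
print(s2, np.sqrt(s2))             # s is the standard error of the regression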

An Example Showing that s2 is an Unbiased Estimator of σ2:73 *

Take a particular example of a two-variable regression.


Let X1 = 1, X2 = 2, and X3 = 3.
Let α = 4 and β = 5.74
Thus E[Y1] = 9, E[Y2] = 14, E[Y3] = 19.

Let each error term have a distribution of 50% chance of -1 and 50% chance of +1.75
σ2 = 1. As usual, the error terms are independent.

There are then 8 equally likely sets of values for Y: (8, 13, 18), (8, 13, 20), (8, 15, 18),
(8, 15, 20), (10, 13, 18), (10, 13, 20), (10, 15, 18), (10, 15, 20).

For each of these sets of values of Y one can fit a regression and calculate ESS.
In each case, in deviations form, x = (-1, 0, 1) and Σxi² = 2.
In deviations form, TSS = Σyi², RSS = (Σxiyi)²/Σxi², and ESS = Σyi² - (Σxiyi)²/Σxi².

Y              y                      Σyi²      Σxiyi     ESS
8, 13, 18 -5, 0, 5 50 10 0
8, 13, 20 -17/3, -2/3, 19/3 218/3 12 2/3
8, 15, 18 -17/3, 4/3, 13/3 158/3 10 8/3
8, 15, 20 -19/3, 2/3, 17/3 218/3 12 2/3
10, 13, 18 -11/3, -2/3, 13/3 98/3 8 2/3
10, 13, 20 -13/3, -4/3, 17/3 158/3 10 8/3
10, 15, 18 -13/3, 2/3, 11/3 98/3 8 2/3
10, 15, 20 -5, 0, 5 50 10 0

E[ESS] = (0 + 2/3 + 8/3 + 2/3 + 2/3 + 8/3 + 2/3 + 0)/8 = 1.


E[s2] = E[ESS/(N - 2)] = 1/(3 - 2) = 1 = σ2.

73
The fact that E[ESS] = (N - k)σ2 will be proved in a subsequent section.
74
Of course we do not usually know the actual slope and intercept.
75
While this has mean of zero, it is not Normally distributed.
The result does not depend on the distributional form of ε.

Straight Line with No Intercept:

In the case of fitting a straight line with no intercept, β̂ = ΣXiYi/ΣXi².

Var[β̂] = Var[ΣXiYi/ΣXi²] = ΣXi²Var[Yi]/{ΣXi²}² = ΣXi²σ²/{ΣXi²}² = σ²/{ΣXi²}.
Thus one estimates the variance of the slope as:
Var[β̂] = s²/{ΣXi²} = {ESS/(N - 1)}/{ΣXi²}.

Exercise: Estimate the variance of the slope in regression of heights with no intercept.
[Solution: As determined previously for this example with no intercept, ESS = 32.3.
s² = ESS/(N - k) = 32.3/(8 - 1) = 4.614. ΣXi² = 53² + ... + 66² = 28,228.
Var[β̂] = s²/{ΣXi²} = 4.614/28,228 = .000163. We had β̂ = 1.03.]

For a 95% confidence interval, the critical value on the t-table with 8 - 1 = 7 d.f. is 2.365.
β̂ = 1.03 ± 2.365√.000163 = 1.03 ± .03.
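Here is the corresponding sketch for the no-intercept case (again an illustration added here, with numpy assumed and made-up data); note that only one parameter is fitted, so the divisor is N - 1.

# Sketch: for a straight line through the origin, estimate the slope and
# the variance of the estimated slope, using illustrative data.
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])               # made-up data
Y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

beta_hat = (X @ Y) / (X @ X)
residuals = Y - beta_hat * X
s2 = np.sum(residuals ** 2) / (len(X) - 1)            # only one parameter is fitted
var_beta = s2 / np.sum(X ** 2)
print(beta_hat, var_beta)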

Two Variable Regression, Variance of the Estimated Slope:

For any finite sample of data, the estimated slope will vary around the true slope of the model
due to random fluctuations in values of the dependent variable.

Under the assumptions of the Classical Linear Regression Model76 discussed previously, the
variance of the least squares estimator of the slope of the two-variable regression model is:
{(Variance of errors)/(Variance of X)}/(number of data points):77

Var[β̂] = (σ²/Var[X])/N = σ²/Σxi².78

More random noise ⇔ larger σ2 ⇔ worse estimate of slope.

Independent Variable over wider domain ⇔ larger Var[X] ⇔ better estimate of slope.

More observations ⇔ larger N ⇔ better estimate of slope.

As is usual, the variance of the estimate goes down as the inverse of the number of
observations.

The more noise, as quantified by Var[ε] = σ2, the worse the estimate.79

The more spread apart the places at which the observations take place, as quantified by
Var[X], the better the estimate of the slope.80 If the independent variable were time, then we
expect to get a better estimate of the slope of the line if we observe at times 0 and 100, rather
than at times 49 and 51. Observations at times 0 and 100 would provide more useful
information about the slope than would observations at times 49 and 51.

Two Variable Regression, Variance of the Estimated Intercept:

There is a somewhat different formula for the variance of the estimated intercept:81

Var[α̂] = σ²ΣXi²/(NΣxi²) = (ΣXi²/N)(σ²/Σxi²) = E[X²]σ²/Σxi².

76
The assumption that the errors are Normally distributed is not used.
77
See equation 3.7 of Econometric Models and Economic Forecasts.
78
Var[X] = Σxi2 / N.
79
* Thus σ2 acts somewhat analogously to the Expected Value of the Process Variance in credibility.
80
* Thus Var[X] acts somewhat analogously to the Variance of the Hypothetical Means in credibility.
81
See equation 3.9 of Econometric Models and Economic Forecasts.

Covariances and Correlations:

The Covariance of two variables X and Y is defined by:


Cov[X,Y] = E[XY] - E[X]E[Y].
Since Cov[X,X] = E[X2] - E[X]E[X] = Var[X], the covariance is a generalization of the variance.

Covariances have the following useful properties:


Cov[X, aY] = aCov[X, Y].
Cov[X, Y] = Cov[Y, X].
Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z].

The Correlation of two random variables is defined in terms of their covariances:


Corr[X, Y] = Cov[X ,Y] / √ {Var[X]Var[Y]}.
The correlation is always between -1 and +1.
Corr[X, X] = 1. Corr[X, -X] = -1. Corr[X, Y] = Corr[Y, X].

Two Variable Regression, Covariance of the Estimated Slope and Intercept:

The covariance of the estimated slope and intercept is:82

Cov[α̂, β̂] = -σ²X̄/Σxi².

Estimating the Variances and Covariances of the Two Variable Regression: 8 3

We use s2, the estimated variance of the regression, in order to estimate the variances of the
estimated slope and intercept as well as their covariance.84

Var[α̂] = s²ΣXi²/(NΣxi²) = (ΣXi²/N)(s²/Σxi²) = E[X²]s²/Σxi².

Var[β̂] = s²/Σxi².

Cov[α̂, β̂] = -s²X̄/Σxi².

82
See equation 3.10 of Econometric Models and Economic Forecasts.
83
The multiple-variable case will be covered in a subsequent section.
84
See pages 63-64 of Pindyck and Rubinfeld.

Exercise: Estimate these variances and covariance for the regression of heights discussed
previously.
[Solution: As determined previously for this example, s² = 1.457.
N = 8. X̄ = 59.25. ΣXi² = 28,228. Σxi² = 143.5. Therefore,
Var[α̂] = s²ΣXi²/(NΣxi²) = (1.457)(28,228)/{(8)(143.5)} = 35.83.
Var[β̂] = s²/Σxi² = 1.457/143.5 = .01015.
Cov[α̂, β̂] = -s²X̄/Σxi² = -(1.457)(59.25)/143.5 = -.6016.]

Thus for this heights example, the variance-covariance matrix is:

( Var[α̂]        Cov[α̂, β̂] )     ( 35.83     -.6016  )
(                           )  =  (                   )
( Cov[α̂, β̂]    Var[β̂]     )     ( -.6016    .01015  )

In general for the two-variable model, the variance-covariance matrix is:

( Var[α̂]        Cov[α̂, β̂] )     ( s²ΣXi²/(NΣxi²)    -s²X̄/Σxi² )                ( ΣXi²/N    -X̄ )
(                           )  =  (                               )  = (s²/Σxi²) (               )
( Cov[α̂, β̂]    Var[β̂]     )     ( -s²X̄/Σxi²          s²/Σxi²  )                ( -X̄         1 )

The above formulas for Var[α̂], Var[β̂], and Cov[α̂, β̂] for the two-variable model are special
cases of those for multiple regression discussed subsequently. Some may find it easier to
memorize the general matrix form of the variance-covariance matrix, Var[β̂] = s²(X'X)⁻¹,
discussed subsequently for the multiple-variable case.
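The formulas above can be assembled into the variance-covariance matrix directly. The following Python sketch (an illustration added here; numpy assumed; made-up data) computes the matrix from the two-variable formulas and checks it against the matrix form s²(X'X)⁻¹ mentioned just above.

# Sketch: the variance-covariance matrix of (alpha-hat, beta-hat) for a
# two-variable regression, via the formulas above, using illustrative data.
import numpy as np

X = np.array([1.0, 4.0, 5.0, 8.0, 10.0])               # made-up data
Y = np.array([3.1, 5.8, 7.2, 9.9, 12.5])
N = len(X)

x = X - X.mean()
beta_hat = (x @ (Y - Y.mean())) / (x @ x)
alpha_hat = Y.mean() - beta_hat * X.mean()
s2 = np.sum((Y - alpha_hat - beta_hat * X) ** 2) / (N - 2)

var_alpha = s2 * np.sum(X ** 2) / (N * np.sum(x ** 2))
var_beta = s2 / np.sum(x ** 2)
cov_ab = -s2 * X.mean() / np.sum(x ** 2)
V = np.array([[var_alpha, cov_ab], [cov_ab, var_beta]])

# The same matrix as s^2 (X'X)^{-1}, with a column of ones for the intercept:
Xmat = np.column_stack([np.ones(N), X])
print(V)
print(s2 * np.linalg.inv(Xmat.T @ Xmat))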

S Notation:*

One can also put the variance of the estimated slope in the S.. notation discussed previously.
Var[β̂] = s²/Σxi² = {ESS/(N-2)}/SXX = (TSS - RSS)/{(N-2)SXX} = (SYY - SXY²/SXX)/{(N-2)SXX} =
{SYY/SXX - (SXY/SXX)²}/(N-2).

For the regression of heights example, SXX = 143.5, SYY = 64.875, and SXY = 89.75.
Var[β̂] = {64.875/143.5 - (89.75/143.5)²}/(8 - 2) = 0.01015.

Correlation of Estimated Slope and Intercept:

Corr[α̂, β̂] ≡ Cov[α̂, β̂]/√(Var[α̂]Var[β̂]) = -X̄/√(E[X²]).

Thus if the mean of X is positive, then the estimates of α and β are negatively
correlated. Therefore, if the mean of X is positive, then if one of the coefficients is
overestimated, the other is more likely to be underestimated, and vice-versa.

If instead, the mean of X is negative, then the estimates of α and β are positively correlated.
Therefore, if the mean of X is negative, then if one of the coefficients is overestimated, the other
is more likely to be overestimated.

Exercise: Estimate Corr[α̂, β̂] for the regression of heights discussed previously.
[Solution: Corr[α̂, β̂] = Cov[α̂, β̂]/√(Var[α̂]Var[β̂]) = -.6016/√((35.83)(.01015)) = -.9976.]

For the heights example, the slope and intercept are almost perfectly negatively correlated.

Distribution of the Fitted Coefficients:

β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²}, a linear combination of the Yi, independent Normals.
Therefore, β̂ is Normally Distributed, with mean β and variance σ²/Σxi².85

α̂ = {ΣYiΣXi² - ΣXiΣXiYi}/{NΣXi² - (ΣXi)²}, a linear combination of the Yi, independent Normals.
Therefore, α̂ is Normally Distributed, with mean α and variance E[X²]σ²/Σxi².86

α̂ and β̂ are jointly Bivariate Normally Distributed, with correlation -X̄/√(E[X²]).

Coefficient of Variation:*

The coefficient of variation can be written in terms of the mean and second moment:
E[X²]/E[X]² = 1 + CVX².87 Therefore, Corr[α̂, β̂] = -1/√(1 + CVX²). The smaller the coefficient of
variation of the independent variable X, the closer this correlation is to -1. The larger the CV of
X, the closer this correlation is to 0.

85
Recall that β̂ is an unbiased estimator of β. The variance has been given previously.
86
Recall that α̂ is an unbiased estimator of α. The variance has been given previously.
87
Coefficient of Variation ≡ Standard Deviation / Mean.

Exercise: The Coefficient of Variation of X is 3. What is Corr[α̂, β̂]?
[Solution: Corr[α̂, β̂] = -1/√(1 + CVX²) = -1/√10 = -.316.]

Exercise: For the heights example, what is the coefficient of variation of the fathers' heights?
[Solution: X̄ = 59.25. Σxi² = 143.5. Var[X] = 143.5/8 = 17.9375. CVX = (√17.9375)/59.25 = .0715.]

For the heights example, -1/√(1 + CVX²) = -.9975, matching the previously estimated
correlation of α̂ and β̂, subject to rounding.

Standard Errors of the Estimated Parameters:

The square root of the variance of an estimated parameter is its standard error.

sβ̂ = √Var[β̂] = standard error of the estimate of β = s/√Σxi².

sα̂ = √Var[α̂] = standard error of the estimate of α.

Exercise: For the heights regression, what are the standard errors of the slope and intercept?
[Solution: sβ̂ = √Var[β̂] = √.01015 = .101. sα̂ = √Var[α̂] = √35.83 = 5.99.]

The larger sβ̂, the more dispersed β̂ is around its mean.
The larger sα̂, the more dispersed α̂ is around its mean.
The larger s, the more dispersed the error terms, ε̂i.

The t-statistic for the slope, to be discussed subsequently, is:

t = β̂/sβ̂ = t-statistic for testing the hypothesis β = 0.

For the heights regression, β̂ = .6254, sβ̂ = .101, and t = .6254/.101 = 6.2.

For the heights example, a typical output from a regression program would look like this:88

Parameter      Estimate      SE            T-Stat      p-Value
1              24.0679       5.98553       4.02102     0.00695079
x              0.625436      0.100765      6.2069      0.000806761
88
This is from Mathematica. The p-values of the t-statistics will be discussed subsequently.

Problems:

10.1 (8 points) Fit a two-variable regression to the following data:


X 2 5 8 9
Y 10 6 11 13
Determine α̂, β̂, R², R̄², s², sα̂, sβ̂, Cov[α̂, β̂] and Corr[α̂, β̂].

Use the following information for the next 3 questions:


A two-variable regression has been fit to 100 data points.
X = 122. ΣXi2 = 5,158,000. ESS = 533,000.

10.2 (2 points) Calculate sβ̂, the standard error of β̂.


A. less than .02
B. at least .02 but less than .03
C. at least .03 but less than .04
D. at least .04 but less than .05
E. at least .05

10.3 (2 points) Calculate sα̂, the standard error of α̂.


A. less than 5
B. at least 5 but less than 6
C. at least 6 but less than 7
D. at least 7 but less than 8
E. at least 8
10.4 (2 points) Calculate Cov[α̂, β̂].
A. less than -.4
B. at least -.4 but less than -.3
C. at least -.3 but less than -.2
D. at least -.2 but less than -.1
E. at least -.1

10.5 (2 points) For a two-variable regression based on five observations, you are given:
Y = {0, 3, 6, 10, 8} and s2 = 2.554. Determine R2.
(A) 0.84 (B) 0.86 (C) 0.88 (D) 0.90 (E) 0.92

10.6 (2 points) A linear regression model, Yi = a + bXi + εi, has been fit to 15 observations.
Let yi = Yi - Y . Σ yi2 = 169. The estimated variance of the regression, s2 = 4.
Determine R̄².
(A) 0.65 (B) 0.66 (C) 0.67 (D) 0.68 (E) 0.69

10.7 (2 points) For a two-variable regression based on 30 observations, σ² = 255.
Determine the minimum possible value for the variance of the estimated intercept, Var[α̂].
(A) 8.0 (B) 8.5 (C) 9.0 (D) 9.5 (E) 10.0

10.8 (2 points) A linear regression model with 5 explanatory variables (4 independent
variables plus the intercept), has been fit to 15 observations. Σε̂i² = 1036.
If we had fit instead twice as many observations, which are similar to these observations
except due to random fluctuation, what is the expected value of Σε̂i²?
A. less than 2000
B. at least 2000, but less than 2200
C. at least 2200, but less than 2400
D. at least 2400, but less than 2600
E. at least 2600

Use the following information for the next 8 questions:


A linear regression, X = α + βY, is fit to 30 observations, (Xi, Yi).
ΣXi = 44, ΣXi2 = 81, ΣYi = 106, ΣYi2 = 410, ΣXiYi = 173.

10.9 (2 points) Determine β̂.

10.10 (2 points) Determine α^ .

10.11 (3 points) Determine R2.

10.12 (2 points) Determine s2.

10.13 (2 points) Determine sβ^ .

10.14 (2 points) Determine sα^ .

10.15 (2 points) Determine Cov[α̂, β̂].

10.16 (2 points) Determine Corr[α̂, β̂].

10.17 (2 points) For a two-variable regression based on five observations, you are given:
Ŷ = {16.1, 15.4, 13.3, 12.6, 10.5} and s² = 8.356. Determine R².
(A) 0.45 (B) 0.50 (C) 0.55 (D) 0.60 (E) 0.65

10.18 (1 point) Bob and Ray each fit a linear regression X = α + βY, where X = age of male,
and Y = weight. Bob’s data set consists of 60 boys equally split between ages 11, 12, and 13.
Ray’s data set consists of 60 boys equally split between ages 10, 11, 12, 13, and 14.
Which regression has a larger mean squared error in its estimate of the slope?

10.19 (4, 5/01, Q.40) (2.5 points)


For a two-variable regression based on seven observations, you are given:
(i) Σ(Xi - X̄)² = 2000.
(ii) Σεi² = 967.
Calculate sβ , the standard error of β.
(A) 0.26 (B) 0.28 (C) 0.31 (D) 0.33 (E) 0.35

10.20 (4, 11/04, Q.35) (2.5 points) Which of the following statements regarding the ordinary
least squares fit of the model Y = α + βX + ε is false?
(A) The lower the ratio of the standard error of the regression s to the mean of Y, the more
closely the data fit the regression line.
(B) The precision of the slope estimator decreases as the variation of the X’s increases.
(C) The residual variance s2 is an unbiased as well as consistent estimator of the error
variance σ2.
(D) If the mean of X is positive, then an overestimate of α is likely to be associated with
an underestimate of β.
(E) β̂ is an unbiased estimator of β.

10.21 (VEE-Applied Statistics Exam, 8/05, Q.13) (2.5 points)


You fit the model Yi = α + βXi + εi to twenty observations.
You are given:
Error sum of squares (ESS) = 2000
ΣXi = -300
ΣXi² = 6000
Determine Cov(α̂, β̂).
(A) 0.7 (B) 0.8 (C) 0.9 (D) 1.0 (E) 1.1

10.22 (2 points) In the previous question, determine Var(α̂) and Var(β̂).

Section 11, t-Distribution


As will be discussed subsequently, the t-Distribution, also called the Student’s t-distribution,
can be used to get confidence intervals for fitted regression coefficients and to test the
significance of fitted regression coefficients.

The t-distribution depends on one parameter, ν, the number of degrees of freedom.


The t-distribution is symmetric around 0, with support from -∞ to ∞.
For example, here is a graph of the density of a t-distribution for ν = 4.

[Graph of the density of the t-distribution with ν = 4.]

The density for ν = 4 is: f(x) = (3/8)(1 + x²/4)^(-2.5), -∞ < x < ∞.

Relation to the Normal Distribution:

The t-distribution is heavier-tailed than the Normal Distribution. For large absolute values of x,
the density of the t-distribution is larger than the Standard Normal Distribution.

Here is a graph of a t-distribution for 4 degrees of freedom, compared to that of a Standard


Normal Distribution (shown dashed). Note the way that the t-distribution is heavier-tailed than
the Normal Distribution; the Normal approaches the x-axis more quickly.
[Graph of the t-distribution density with ν = 4, with the Standard Normal density shown dashed.]

Here is a table of the densities of the t-distribution at 3:

ν            5         10        15        25        100       500
density      0.0173    0.0114    0.0091    0.0073    0.0051    0.0046
As ν increases, the t-distribution gets lighter-tailed. As ν → ∞, the density of the t-distribution at
3 approaches φ(3) = 0.0044, the density at 3 of the Standard Normal Distribution.
As ν → ∞ , the t-Distribution → the Standard Normal Distribution.

Using the t-table attached to the exam, for some values of ν, here are the values at which the
t-distribution is 95%, so that there is a total of 10% in both tails:

ν              1        2        5        10       25       120      ∞
95th %ile      6.314    2.920    2.015    1.812    1.708    1.658    1.645
The value shown in the table for ν = ∞, is for the Standard Normal Distribution.
Φ(1.645) = .95. For a Standard Normal Distribution, ±1.645 is a 90% confidence interval.

The survival function at 1.645 for the Standard Normal Distribution is: 1 - .95 = 5%.
The survival function for a t-distribution at 1.645 is larger than 5%, since it has a heavier tail.
For example, for ν = 10, from the t-table S(1.812) = 5%, while using a computer
S(1.645) = 6.55% > 5%. Here is a graph of the t-distribution for 10 degrees of freedom:

[Graph of the t-distribution density with ν = 10, with 5% in each tail beyond ±1.812.]

For the t-distribution, here is a graph of the survival function at 1.645 as a function of ν:

[Graph of S(1.645) as a function of ν, decreasing toward 5% as ν increases.]

As ν increases, S(1.645) → 5%; the tail gets lighter and approaches a Normal Distribution.
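These tail probabilities and percentiles are easy to reproduce. The following Python sketch (an illustration added here; the scipy package is assumed) prints S(1.645) and the 95th percentile for several values of ν; the percentiles match the values quoted above, and both columns approach the Standard Normal values as ν grows.

# Sketch: t-distribution tail probabilities and percentiles versus the Normal.
from scipy import stats

for nu in (1, 2, 5, 10, 25, 120):
    print(nu, stats.t.sf(1.645, df=nu), stats.t.ppf(0.95, df=nu))
print("normal", stats.norm.sf(1.645), stats.norm.ppf(0.95))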

Summary of Student’s t Distribution:*

Support: -∞ < x < ∞.     Parameters: ν = positive integer.

D.f.:  F(x) = β[ν/2, 1/2; ν/(ν+x²)]/2 for x ≤ 0.
       F(x) = 1 - β[ν/2, 1/2; ν/(ν+x²)]/2 for x ≥ 0.

For ν = 1: F(x) = .5 + ArcTan[x]/π, -∞ < x < ∞.
For ν = 2: F(x) = .5 + x/{2√(2 + x²)}, -∞ < x < ∞.

P.d.f.:  f(x) = 1/{(1 + x²/ν)^((ν+1)/2) β[ν/2, 1/2] ν^0.5},
where β[ν/2, 1/2] = Γ(1/2)Γ(ν/2)/Γ((ν+1)/2).
β[ν/2, 1/2] = (ν/2 - 1)! / {((ν - 1)/2)((ν - 3)/2)....(1/2)}, for ν even.
β[ν/2, 1/2] = π{(ν/2 - 1)(ν/2 - 2)....(1/2)} / ((ν - 1)/2)!, for ν odd.

For ν = 1: f(x) = (1/π){1/(1 + x²)}, -∞ < x < ∞.
For ν = 2: f(x) = 2^(-1.5)(1 + x²/2)^(-1.5) = (2 + x²)^(-1.5), -∞ < x < ∞.

Moments: E[X^n] = ν^(n/2)(n-1)(n-3)...(3)(1)/{(ν-2)(ν-4)...(ν-n)}, for n even, ν > n.
E[X^n] = 0, for n odd.

Mean = 0.   Variance89 = ν/(ν-2), ν > 2.
Skewness = 0 (symmetric).   Kurtosis90 = 3 + 6/(ν-4), ν > 4.
Mode = 0.   Median = 0.

If U is a Unit Normal variable and χ² follows an independent chi-square distribution with ν
degrees of freedom, then U/√(χ²/ν) follows a t-distribution with ν degrees of freedom.91

89
As ν goes to infinity, the variance approaches 1, that of a Standard Normal Distribution. When it exists, the
variance is greater than 1.
90
As ν goes to infinity, the kurtosis approaches 3, that of a Standard Normal Distribution. When it exists, the kurtosis
is greater than 3. The t-distribution is heavier-tailed than the Standard Normal Distribution.
91
A chi-square distribution with ν degrees of freedom is the sum of ν squares of independent Unit Normals. See
Sections 2.4.2 and 2.4.3 of Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.

Relation to the Beta Distribution:*

The t-Distribution can be written in terms of the Incomplete Beta Function.

For parameter ν, the density is:

f(x) = (Γ((ν+1)/2)/{Γ(ν/2)√(πν)})(1 + x²/ν)^(-(ν+1)/2) = (1/{√ν β(ν/2, 1/2)})(1 + x²/ν)^(-(ν+1)/2), -∞ < x < ∞.

The Distribution Function is:92


F(x) = β[ν/2, 1/2; ν/(ν+x²)]/2 for x ≤ 0, and
F(x) = 1 - β[ν/2, 1/2; ν/(ν+x²)]/2 for x ≥ 0.

Exercise: In terms of an Incomplete Beta Function, what is the t-Distribution with 12 degrees of
freedom, at -2.179?
[Solution: β[12/2, 1/2; 12/(12 + 2.179²)]/2 = β[6, .5; .7165]/2.]

It turns out that β[6 , .5, .7165] = .050. Therefore, for the t-Distribution with 12 degrees of
freedom, the distribution function at -2.179 is .050/2 = 2.5%. Similarly, at 2.179 it is 97.5%. (The
t-distribution is symmetric.) Thus there is 5% probability outside at ±2.179. That is why 2.179
appears in the table of the t-distribution for 12 degrees of freedom and a total of 5% probability
in the tails, (2.5% in each tail.)

92
See page 96 of Loss Models or Section 26.7 of the Handbook of Mathematical Functions by Abramowitz, et. al.

t-Table:

The rows of the t-table attached to the exam are the number of degrees of freedom.
For most exam questions, one determines the number of degrees of freedom, and then looks
at the appropriate row of the table, ignoring all of the other rows.

The values in each row are the sum of the area in both the righthand and
lefthand tails.
For example, for 5 degrees of freedom, there is a total of 10% below -2.015 and above +2.015.

[Graph of the t-distribution density with ν = 5, with 5% in the tail below -2.015 and 5% in the tail above +2.015.]

There is 5% area in the lefthand tail below -2.015. There is also 5% area in the righthand tail
above 2.015. In other words, for the t-Distribution with 5 degrees of freedom, the 5th percentile
is -2.015, and the 95th percentile is 2.015. There is 90% area between -2.015 and 2.015.

For 5 degrees of freedom, similarly there is a total area of 2% below -3.365 and above +3.365.

[Graph of the t-distribution density with ν = 5, with 1% in the tail below -3.365 and 1% in the tail above +3.365.]

Percentage Points of the t Distribution

[Diagram of the t density, with α/2 probability in each tail.]

Area in both tails (α)


ν 0.10 0.05 0.02 0.01
1 6.314 12.706 31.821 63.657
2 2.920 4.303 6.965 9.925
3 2.353 3.182 4.541 5.841
4 2.132 2.776 3.747 4.604
5 2.015 2.571 3.365 4.032

6 1.943 2.447 3.143 3.707


7 1.895 2.365 2.998 3.499
8 1.860 2.306 2.896 3.355
9 1.833 2.262 2.821 3.250
10 1.812 2.228 2.764 3.169

11 1.796 2.201 2.718 3.106


12 1.782 2.179 2.681 3.055
13 1.771 2.160 2.650 3.012
14 1.761 2.145 2.624 2.977
15 1.753 2.131 2.602 2.947

16 1.746 2.120 2.583 2.921


17 1.740 2.110 2.567 2.898
18 1.734 2.101 2.552 2.878
19 1.729 2.093 2.539 2.861
20 1.725 2.086 2.528 2.845

21 1.721 2.080 2.518 2.831


22 1.717 2.074 2.508 2.819
23 1.714 2.069 2.500 2.807
24 1.711 2.064 2.492 2.797
25 1.708 2.060 2.485 2.787

26 1.706 2.056 2.479 2.779


27 1.703 2.052 2.473 2.771
28 1.701 2.048 2.467 2.763
29 1.699 2.045 2.462 2.756
30 1.697 2.042 2.457 2.750

40 1.684 2.021 2.423 2.704


60 1.671 2.000 2.390 2.660
120 1.658 1.980 2.358 2.617
∞ 1.645 1.960 2.326 2.576

Problems:

11.1 (1 point) For a t-distribution with 16 degrees of freedom, what is the distribution function
at 2.583?
(A) .95 (B) .975 (C) .98 (D) .99 (E) .995

11.2 (1 point) For a t-distribution with 6 degrees of freedom, what is Prob[t < -3.5]?
(A) Less than 0.5%
(B) At least 0.5%, but less than 1%
(C) At least 1%, but less than 2.5%
(D) At least 2.5%, but less than 5%
(E) At least 5%

11.3 (1 point) For a t-distribution with 7 degrees of freedom, what is the distribution function at
-1.895?
(A) .01 (B) .02 (C) .05 (D) .10 (E) .20

11.4 (1 point) For a t-distribution with 27 degrees of freedom, what is Prob[|t| < 2]?
(A) Less than 90%
(B) At least 90%, but less than 95%
(C) At least 95%, but less than 98%
(D) At least 98%, but less than 99%
(E) At least 99%

11.5 (1 point) For a t-distribution with 7 degrees of freedom, what is the distribution function at
-1.895?
(A) .01 (B) .02 (C) .05 (D) .10 (E) .20

11.6 (2, 5/83, Q. 42) (1.5 points) Let X1, X2, X3, and X4 be independent random variables
having a normal distribution with mean 0 and variance 1.
The distribution of (X1 + X4)/√(X2² + X3²) is the same as that of aY where:
A. a = 1 and Y has a t-distribution with 1 degree of freedom
B. a = 1 and Y has a t-distribution with 2 degrees of freedom
C. a = 1/√2 and Y has a t-distribution with 2 degrees of freedom
D. a = √2 and Y has a t-distribution with 2 degrees of freedom
E. a = 2 and Y has a t-distribution with 2 degrees of freedom

11.7 (2, 5/92, Q.26) (1.7 points) Z1 and Z2 be independent and identically distributed
normal random variables with mean 0 and variance 1.
If W = Z1/√(Z2²), then what is the number w0 for which P[W < w0] is closest to .95?
A. 1.64 B. 2.92 C. 3.84 D. 5.99 E. 6.31

Section 12, t-test


The t-distribution can be used to provide confidence intervals for an estimated mean, and to
test whether two samples come from Normal Distributions with the same mean. These ideas
are preliminary to ideas covered on this exam, which will be discussed in subsequent sections.
Some of you may find it helpful to review these preliminary ideas, even though they should not
be directly tested on your exam.

Confidence Intervals for an Estimated Mean:

If one wants a confidence interval for an estimated mean and one knows the variance, then
one can use the Normal Distribution.

Exercise: Based on 20 observations from a variable with variance 49, the observed mean is
31. What is a 95% confidence interval for the mean?
[Solution: The mean of 20 observations has variance 49/20 = 2.45. For the Standard Normal,
Φ(1.960) = .975. Therefore, ±1.960 standard deviations would have 2.5% outside on either tail,
and 95% probability inside. Take 31 ± 1.960√2.45 = 31 ± 3.07 = [27.93 , 33.07].]

In general, with known variance, if we want a confidence interval of probability P, we take
X̄ ± y√(σ²/n), where Φ(y) = (1 + P)/2.

If one does not know the variance, the t-distribution is used instead.
The interval estimate is: the sample mean ± t0 √sample variance / √N,
where t0 is such that for the Student’s t distribution with N-1 degrees of freedom,
F(t0) - F(-t0) = desired confidence.

For example, for 10 points, in order to get a 95% confidence interval for the mean one would
look up the Student’s t for 9 degrees of freedom. The probability that the absolute value of t is
greater than 2.262 is 5%. Therefore for 10 data points, the sample mean ±2.262 standard
deviations / √10 is an approximate 95% confidence interval for the mean.

Exercise: Let 0, 2, 4, 5, 6, 6, 8, 9, 9, 12 be a random sample from a Normal Distribution with


unknown mean and variance. What is an approximate 95% confidence interval for the mean?
[Solution: The point estimate of the mean is: (0 + 2 + 4 + 5 + 6 + 6 + 8 + 9 + 9 + 12)/10 = 6.1.
The second moment is: (0 + 22 + 42 + 52 + 62 + 62 + 82 + 92 + 92 + 122)/10 = 48.7.
Thus the sample variance is: (10/9)(48.7 - 6.12) = 12.76.
The sample standard deviation is: √12.76 = 3.57 .
For 10 data points, we have 10 - 1 = 9 degrees of freedom.
Consulting the t-table, the critical value for 5% and 9 degrees of freedom is 2.262.
Therefore, the sample mean ±2.262 standard deviations /√10 is an approximate 95%
confidence interval for the mean.
An approximate 95% confidence interval for the mean is:
6.1 ± t0s/√n = 6.1 ± (2.262)(3.57)/√10 = 6.1 ± 2.55 = (3.55, 8.65). ]
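The confidence interval in this exercise can be reproduced as follows (a Python sketch added for illustration; numpy and scipy are assumed).

# Sketch: 95% confidence interval for the mean with unknown variance.
import numpy as np
from scipy import stats

data = np.array([0, 2, 4, 5, 6, 6, 8, 9, 9, 12], dtype=float)
n = len(data)
mean = data.mean()
s2 = data.var(ddof=1)                       # sample variance, about 12.76

t_crit = stats.t.ppf(0.975, df=n - 1)       # 2.262 for 9 degrees of freedom
half_width = t_crit * np.sqrt(s2 / n)
print(mean - half_width, mean + half_width) # about (3.55, 8.65)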

In general, if we want a confidence interval for the mean of probability 1 - α, we
take X̄ ± t√(S²/n), where S² is the sample variance and t is the critical value for
the t-distribution with n-1 degrees of freedom and α area in both tails.
This critical value is in the t-table attached to the exam.

Exercise: Based on 20 observations from a variable, the observed mean is 31 and the sample
variance is 49. What is a 95% confidence interval for the mean?
[Solution: The mean of 20 observations has variance: 49/20 = 2.45. For 20 observations we
have 20 - 1 = 19 degrees of freedom, the denominator of the sample variance. Consulting the
t-table, for 19 degrees of freedom and 5% total area in both tails, the critical value for the
t-distribution is 2.093. Take 31 ± 2.093√2.45 = 31 ± 3.28 = [27.72, 34.28].]

We note that when we have an unknown variance, the use of the t-distribution results in a
somewhat wider interval than the use of the Normal Distribution would have, if S2 = σ2.
In addition, in the case of an unknown variance, S2 is an estimate of this unknown variance.
In both cases, the confidence interval is approximate.

In the case with known variance, we do not always require that the variable being observed be
Normal, but rather that its average can be approximated by a Normal Distribution. Similarly, in
the case with unknown variance, in order to employ the t-distribution to get a confidence
interval, we do not always require that the variable being observed be Normal, but rather that
its average is approximately Normal.

Exercise: Based on 200 observations from a variable, the observed mean is 47 and the
sample variance is 112. What is a 95% confidence interval for the mean?
[Solution: The mean of 200 observations has variance: 112/200 = 0.56. For 200 observations
we have 200 - 1 = 199 degrees of freedom. Consulting the t-table, for 199 degrees of freedom
and 5% total area in both tails, the critical value for the t-distribution is 1.960.
Take 47 ± 1.960√0.56 = 47 ± 1.47 = [45.53, 48.47].
Comment: For large samples, the t-distribution is approximately Normal. Therefore, the critical
values in the t-table for ∞ degrees of freedom are those for the Normal Distribution.]

The t-Statistic, and Testing Hypotheses about the Mean:

For the null hypothesis H0: µ = µ0, the test statistic is the t-statistic:

(X̄ - µ0)/(S/√n).

If H0 is true, then the t-statistic follows a t-distribution with n-1 degrees of
freedom.

For example, let 1, 4, 6, 9 be a sample of size four from a Normal Distribution.

Then X̄ = 20/4 = 5. S² = {(1 - 5)² + (4 - 5)² + (6 - 5)² + (9 - 5)²}/3 = 11.333.

Exercise: Test the hypothesis that H0: µ = 9 versus H1: µ ≠ 9.

[Solution: t = (X̄ - µ0)/(S/√n) = 2(X̄ - 9)/S = (2)(5 - 9)/√11.333 = -2.376.
For 3 d.f., for a 2-sided test, the 10% critical value is 2.353 and the 5% critical value is 3.182.
2.353 < 2.376 < 3.182. ⇒ reject H0 at 10% and do not reject at 5%.]

Therefore, the t-statistic and the t-table can be used to test H0: µ = µ0, versus H1: µ ≠ µ0, using a
two-sided test.

Exercise: A sample of size 25 from a Normal Distribution, has a mean of 8 and sample
variance of 17. Test the hypothesis that H0: µ = 6 versus H1: µ ≠ 6.
[Solution: t = ( X - µ)/(S/√n) = (8 - 6)/√(17/25) = 2.425.
For 24 d.f., for a 2-sided test, the 5% critical value is 2.064 and the 2% critical value is 2.492.
2.064 < 2.425 < 2.492. ⇒ reject H0 at 5% and do not reject at 2%.]

If H1: µ ≠ µ0, reject H0: µ = µ0 at a significance level of α if |t| ≥ tα.
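For illustration, the two-sided test of H0: µ = 9 on the sample 1, 4, 6, 9 can be reproduced in Python (numpy and scipy assumed); the p-value falls between 5% and 10%, consistent with rejecting at 10% but not at 5%.

# Sketch: the two-sided one-sample t-test, done by hand and via scipy.
import numpy as np
from scipy import stats

data = np.array([1.0, 4.0, 6.0, 9.0])
n = len(data)
t_stat = (data.mean() - 9.0) / np.sqrt(data.var(ddof=1) / n)   # about -2.376
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)                # between 5% and 10%

print(t_stat, p_value)
print(stats.ttest_1samp(data, popmean=9.0))   # same statistic and p-value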



Relationship to Confidence Intervals:

Testing the hypothesis that µ takes on a particular value µ0, is equivalent to testing whether
that value µ0 is in the appropriate confidence interval for µ.

Assume one has a sample of size 29 from a Normal Distribution, with X̄ = -0.64, and
S²/29 = 0.18².
Then for 29 - 1 = 28 degrees of freedom, the critical values of the t-distribution are:
Area in both tails (α)
ν 0.10 0.05 0.02 0.01
28 1.701 2.048 2.467 2.763

Therefore, we can get the following confidence intervals for µ:


90% confidence interval: -.64 ± (1.701)(.18) = -.64 ± .31 = [-0.95, -0.33].
95% confidence interval: -.64 ± (2.048)(.18) = -.64 ± .37 = [-1.01, -0.27].
98% confidence interval: -.64 ± (2.467)(.18) = -.64 ± .44 = [-1.08, -0.20].
99% confidence interval: -.64 ± (2.763)(.18) = -.64 ± .49 = [-1.13, -0.15].

[Graph of the nested 90%, 95%, 98%, and 99% confidence intervals for µ.]

Zero is not in the 99% confidence interval for µ. Therefore, there is less than a 1 - 99% = 1%
probability that µ has a value at least as far (on either side) from X as 0. Therefore, if H0 is the
hypothesis that µ = 0, and H1 is µ ≠ 0, then we can reject H0 at 1%.

On the other hand, -.25 is in the 98% confidence interval but not in the 95% confidence
interval. Therefore, for the hypothesis that µ = -.25, we reject at 5% but do not reject at 2%.

In general, if µ0 is not within the P confidence interval for µ, then reject at significance level
1- P the hypothesis that µ = µ0 in favor of the alternative µ ≠ µ0.93
93
Confidence values are large such as 90%, 95%, or 99%, while significance levels are small such as 1%, 5%, or
10%.

One-Sided Tests:

If the alternative hypothesis is either H1: µ < µ0 or H1: µ > µ0, then one performs a one-sided
test.

Exercise: A sample of size 25 from a Normal Distribution, has a mean of 8 and sample
variance of 17. Test the hypothesis that H0: µ = 6 versus H1: µ > 6.
[Solution: t = ( X - µ)/(S/√n) = (8 - 6)/√(17/25) = 2.425.
For 24 d.f., for a 1-sided test, the 2.5% critical value is 2.064 and the 1% critical value is 2.492.
2.064 < 2.425 < 2.492. ⇒ reject H0 at 2.5% and do not reject at 1%.
Alternately, as discussed below, t = 2.425 is in the shaded tail with 2.5% probability.
⇒ Reject H0 at 2.5%.
[Two graphs of the t density with 24 d.f.: one shading the 2.5% tail to the right of 2.064, the other shading the 1% tail to the right of 2.492.]

t = 2.425 is not in the shaded tail with 1% probability. ⇒ Do not reject H0 at 1%.]

One needs to recall that the values at the top of the columns in the t-table are the sum of the
areas in both tails. In the above exercise, there is a total area of 5% below -2.064 and above
2.064. Performing the one-sided test, we are interested in the 2.5% in the righthand tail above
2.064.

When performing a one-sided test, one is interested in the area in one tail, and therefore you
should halve the values at the top of the columns in the t-table.
Put another way:
If H1: µ > µ0, reject H0: µ = µ0 at a significance level of α if t ≥ t2α.
If H1: µ < µ0, reject H0: µ = µ0 at a significance level of α if -t ≥ t2α.

Exercise: A sample of size 15 from a Normal Distribution, has a mean of 7 and sample
variance of 12. Test the hypothesis that H0: µ = 10 versus H1: µ < 10.
[Solution: t = ( X - µ)/(S/√n) = (7 - 10)/√(12/15) = -3.354.
For 14 degrees of freedom, for a 1-sided test, the 0.5% critical value is 2.977.
3.354 > 2.977. ⇒ Reject H0 at 0.5%.]

Testing Whether Two Samples from Normal Distributions Have the Same Mean:*94

Assume you have the loss ratios (losses divided by premiums) for two similar insurers writing
the same line of business in a state.

Loss Ratios (%)


Year Insurer A Insurer B
1 72.2 71.2
2 68.3 76.1
3 72.6 78.3
4 70.1 77.8
5 69.4 73.0

Assume that each set of five loss ratios is a sample from a Normal Distribution. Further assume
that the two Normal Distributions have the same (unknown) variance.95

We wish to test the hypothesis that the two insurers have the same expected loss ratio, in other
words that the two Normal Distributions have the same mean.

We test the hypothesis H0 that the mean of the difference in expected loss ratios is zero, versus
the alternate that it is not.

The five differences in loss ratio are: 1.0, -7.8, -5.7, -7.7, -3.6.96
The mean of the five differences is: -4.76.
The second moment of the five differences is: {1² + 7.8² + 5.7² + 7.7² + 3.6²}/5 = 33.316.

The sample variance of the five differences is: (5/4)(33.316 - 4.76²) = 13.323.
The estimated variance of the mean difference is: 13.323/5 = 2.6646.

The t-statistic of the test of H0 is:


sample mean / √(sample variance / n) = -4.76/√2.6646 = -2.916.

We perform a two-sided t-test.


The number of degrees of freedom = the sample size - 1 = the denominator of the sample variance = 4.
From the t-table:
ν 0.1 0.05 0.02 0.01
4 2.132 2.776 3.747 4.604

Since 2.776 < |-2.916| < 3.747, we reject H0 at 5% and do not reject H0 at 2%.
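
This paired comparison can be reproduced with a paired t-test routine; a minimal sketch (assuming scipy; the loss ratios are those in the table above):

    from scipy.stats import ttest_rel

    insurer_A = [72.2, 68.3, 72.6, 70.1, 69.4]
    insurer_B = [71.2, 76.1, 78.3, 77.8, 73.0]
    res = ttest_rel(insurer_A, insurer_B)    # two-sided paired t-test of H0: the mean difference is zero
    print(res.statistic, res.pvalue)         # t is about -2.916; the p-value is between 2% and 5%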

94
See for example, Statistical Methods, by Snedecor and Cochran.
95
These assumptions lead to the difference being Normally Distributed, which is required for the t-test to be
statistically valid.
96
The result of the t-test would be the same, regardless of in which order we took the differences.

Problems:

12.1 (2 points) One observes a sample of 10 values from a variable which has a variance of
80: 18, 24, 33, 34, 30, 35, 39, 12, 18, 30. Determine the upper end of the symmetric 95%
confidence interval for the mean of this variable.
(A) Less than 33
(B) At least 33, but less than 34
(C) At least 34, but less than 35
(D) At least 35, but less than 36
(E) At least 36

12.2 (3 points) One observes a sample of 10 values from a variable: 18, 24, 33, 34, 30, 35, 39,
12, 18, 30. Determine the upper end of the symmetric 95% confidence interval for the mean of
this variable.
(A) Less than 33
(B) At least 33, but less than 34
(C) At least 34, but less than 35
(D) At least 35, but less than 36
(E) At least 36

12.3 (2 points) Let 0, 3, 4, 4, 6, 9, 9, 13 be a random sample from a distribution with unknown


mean and variance. Which of the following is an approximate 90% confidence interval for the
mean of this distribution?
A. ( 3.93, 8.07 )
B. ( 4.06, 7.94 )
C. ( 3.28, 8.72 )
D. ( 3.41, 8.59 )
E. None of A, B, C, or D.

12.4 (2 points) A random sample of eleven observations yields the values:


8, 14, 18, 20, 21, 22, 26, 30, 42, 55, and 96. ΣXi/11 = 32. ΣXi2/11 = 1590.
Assume the sample is taken from a Normal Distribution, with unknown mean and variance.
Determine a 95% confidence interval for the mean.
A. (19.6, 44.4) B. (17.3, 46.7) C. (15.4, 48.6) D. (15.2, 48.8) E. (15.0, 49.0)

12.5 (2 points) A sample of size 20 from a Normal Distribution, has a sample mean of -6 and
sample variance of 46. Test the hypothesis that H0: µ = -2 versus H1: µ < -2.
Which of the following is true?
A. Reject H0 at 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

Use the following information for the next three questions:

An insurer is investigating whether to introduce a program to inspect all the homes it insures
for homeowners insurance, with hopes that by then inducing homeowners to repair damaged
roofs, eliminate fire hazards, etc., it will lead to a reduction in losses.
To test the program, in each of eight counties the insurer inspects at random half the homes it
insures and provides the homeowners with the appropriate advice. It then collects data on the
loss ratios the following year for both sets of homes in each county.
County
1 2 3 4 5 6 7 8
With new Program: 56 52 52 49 59 56 60 56
Without new Program: 49 57 64 53 64 68 68 66
Each loss ratio is losses divided by premiums shown as a percent.
For example, the displayed loss ratio of 56 means losses were 56% of premiums.

12.6 (3 points) Use the t-distribution to test the hypothesis that the two samples have the same
mean.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

12.7 (1 point) Let H0 be the hypothesis that the expected loss ratio with inspections is greater
than or equal to that without inspections. Let H1 be the hypothesis that the expected loss ratio
with inspections is less than that without inspections.
A. Reject H0 at 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

12.8 (1 point) Spreading the cost of this inspection program over several years, the cost
would be equivalent to 1.5% of premiums (for those homes to which it was applied).
Let H0 be the hypothesis that the expected loss ratio with inspections plus the cost of
inspections is greater than or equal to the expected loss ratio without inspections. Let H1 be
the hypothesis that the expected loss ratio with inspections plus the cost of inspections is less
than the expected loss ratio without inspections. Which of the following is true?
A. Reject H0 at 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

12.9 (2 points) A sample of size 15 from a Normal Distribution, has a sample mean of 60 and
sample variance of 33. Test the hypothesis that H0: µ = 55 versus H1: µ ≠ 55.
Which of the following is true?
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

12.10 (3 points) Five pairs of college graduates are selected who are otherwise similar except
for gender. Their starting salaries are as follows:
Pair 1 2 3 4 5
Male 43 65 54 35 86
Female 38 66 48 35 76
The starting salaries of males and females are each Normally Distributed.
H0: the mean starting salary of males is equal to that of females.
H1: the mean starting salary of males is greater than that of females.
A. Reject H0 at 1/2%.
B. Do not reject H0 at 1/2%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

12.11 (2, 5/83, Q. 47) (1.5 points) Let X1, X2, . . ., X11 be a random sample of size 11 from a
normal distribution with unknown mean µ and unknown variance σ2 > 0.
If Σxi = 132 and Σ(xi - x)2 = 99, then for what value of k is
(12 - k√.90, 12 + k√.90) a 90% confidence interval for µ?
A. 1.36 B. 1.37 C. 1.64 D. 1.80 E. 1.81

12.12 (2, 5/85, Q. 8) (1.5 points) Let x1, x2, x3, x4 be the values of a random sample from a
normal distribution with unknown mean µ and unknown variance σ2 > 0. The null hypothesis
H0: µ = 10 is to be tested against the alternative H1: µ ≠ 10 at a significance level (size) of .05
using the Student’s t-statistic. If the resulting sample mean is X = 15.84 and
s2 = Σ(Xi - X )2/3 = 16, then what are the critical t-value and the decision reached?
A. t = 2.13: reject H0
B. t = 2.35: do not reject H0
C. t = 2.78: reject H0
D. t = 3.18: do not reject H0
E. t = 3.18: reject H0

12.13 (2, 5/85, Q. 17) (1.5 points) Let X1,. . . . X9, be a random sample from a normal
distribution with unknown mean µ and unknown variance σ2 > 0.
Let X = ΣXi/9 and S² = Σ(Xi - X)²/8, with both sums running from i = 1 to 9.
What is P[( X - µ) < .62S]?
A. 0.050 B. 0.100 C. 0.500 D. 0.900 E. 0.950

12.14 (2, 5/85, Q. 41) (1.5 points) Let X1, . . . , X10 be the values of a random sample from a
normal distribution with unknown mean µ and unknown variance σ2 > 0.
Let x be the sample mean, and let s2 = (1/9)Σ(xi - x)2.
Which of the following is a 95% confidence interval for µ?
A. ( x - 2.26 s/√10, x + 2.26 s/√10)
B. ( x - 2.26 s/√9, x + 2.26 s/√9)
C. ( x - 2.23 s/√10, x + 2.23 s/√10)
D. ( x - 2.23 s/√9 , x + 2.23 s/√9)
E. ( x - 1.83 s/√10 , x + 1.83 s/√10)

12.15 (4, 5/87, Q.58) (2 points) Let 0, 3, 5, 5, 5, 6, 8, 8, 9, 11 be a random sample from a


distribution. Which of the following is an approximate 95% confidence interval for the mean?
A. (3.738, 8.262)
B. (3.855, 8.145)
C. (3.934, 8.066)
D. (4.040, 7.960)
E. Cannot be determined.

12.16 (2, 5/88, Q. 19) (1.5 points) Let X1, X2, ..., X9 be a random sample from a normal
distribution with mean µ and variance σ2 > 0. The null hypothesis H0: µ = 50 is tested against
the alternative H1: µ > 50 at a significance level (size) of .025.
If X = 52.53 and (1/8)Σ(Xi − X)² = (3.3)², with the sum running from i = 1 to 9, what is the value of the Student's t-statistic and its critical value?
A. 0.77; 2.26 B. 0.77; 2.31 C. 2.30; 1.96 D. 2.30; 2.26 E. 2.30; 2.31

12.17 (2, 5/88, Q. 39) (1.5 points) A random sample of size 3 from a normal distribution
yielded the values 12, 8 and 10. A 95% confidence interval for µ based on the standard
t-statistic is of the form (k, ∞). What is k?
A. 4.2 B. 5.0 C. 6.6 D. 7.3 E. 8.1

12.18 (2, 5/90, Q. 6) (1.7 points) Let (X1, Y1), (X2, Y2), (X3, Y3), be a random sample of
paired observations from distributions with means µX and µY respectively, and with positive
variances. The null hypothesis H0: µX = µY is to be tested against the alternative H1: µX ≠ µY,
using the Student’s t statistic based on the difference scores Xi - Yi.
If the significance level (size) of the test is .05, and the value of the test statistic is 4.10, what is
the critical value of this test and what is the decision reached?
A. 2.92, reject H0
B. 3.18, reject H0
C. 4.30, reject H0
D. 3.18, do not reject H0
E. 4.30, do not reject H0

12.19 (4, 5/90, Q.38) (2 points) The following observations:


2, 0, 4, 4, 6, 3, 1, 5, 6, 9
are taken from a normal distribution with mean µ and variance σ2.
Which of the following is an approximate 90% confidence interval for the mean µ?
A. 2.84 ≤ µ ≤ 5.16 B. 2.83 ≤ µ ≤ 5.17 C. 2.82 ≤ µ ≤ 5.18
D. 2.45 ≤ µ ≤ 5.55 E. 2.43 ≤ µ ≤ 5.57

12.20 (2, 5/92, Q.23) (1.7 points) In a random sample of 15 residents of Tampa, the time (in
minutes) spent commuting to work has a sample mean of 47.21 and an unbiased sample
variance of 135. If commute times are normally distributed, then what is the shortest 90%
confidence interval for the mean commute time?
A. (41.93, 52.49) B. (42.29, 52.13) C. (43.16, 51.26) D. (43.37, 51.05) E. (45.84, 48.57)

12.21 (2, 5/92, Q. 43) (1.7 points) Let X1, . . . , X9 and Y1, . . . , Y9 be random samples from
independent normal distributions with common mean µ and variances σX2 > 0 and σY2 > 0.
Let Zi = Yi - Xi for i = 1, ..., 9, Z = ΣZi/9 and SZ² = Σ(Zi - Z)²/8, with both sums running from i = 1 to 9.
What is the value of c such that P[ Z/SZ ≤ c] = .95?
A. .207 B. .547 C. .620 D. 1.640 E. 1.860

12.22 (2, 2/96, Q.6) (1.7 points) Let (X1, Y1), ..., (X8, Y8) be a random sample from a
bivariate normal distribution with means µx and µy and nonzero variances. The null hypothesis
H0: µx = µy is rejected in favor of the alternate hypothesis H1: µx ≠ µy if
√8 | X - Y| / √{(1/7)Σ{(Xi - Yi) - ( X - Y)}²} > k, with the sum running from i = 1 to 8.
Determine the value of k for which the significance level (size) of the test is 0.05.
A. 1.64 B. 1.90 C. 1.96 D. 2.31 E. 2.37

12.23 (4B, 11/96, Q.24) (3 points)


The random variable X has a lognormal distribution, with parameters µ and σ.
A random sample of four observations of X yields the values: 2, 8, 13, and 27.
Determine a 90% confidence interval for µ.
A. (0.867, 3.449)
B. (0.989, 3.328)
C. (1.040, 3.276)
D. (1.145, 3.171)
E. (1.256, 3.061)

12.24 (IOA 101, 9/00, Q.4) (2.25 points) Suppose that a random sample of nine
observations is taken from a normal distribution with mean µ = 0. Let X and S2 denote the
sample mean and variance respectively.
Determine the probability that the value of X exceeds that of S, i.e. determine P( X > S).

Section 13, Confidence Intervals for Estimated Parameters


One can use the t-distribution to get confidence intervals for estimated parameters.

Exercise: One has fit a two-variable regression to 30 observations.


α^ = 37.2. β^ = -0.64. sα^ = 0.53. sβ^ = 0.18.
Determine a 95% confidence interval for the intercept.
Determine a 95% confidence interval for the slope.
[Solution: For 30 - 2 = 28 degrees of freedom, for 1 - 95% = 5% area in both tails, the critical
value for the t-distribution is 2.048.
The 95% confidence interval for the intercept is:
α^ ± t sα^ = 37.2 ± (2.048)(.53) = 37.2 ± 1.1 = [36.1, 38.3].
The 95% confidence interval for the slope is:
β^ ± t sβ^ = -.64 ± (2.048)(.18) = -.64 ± .37 = [-1.01, -0.27]. ]

In general, in order to get a confidence interval for a regression parameter,


one uses the critical value for N - k degrees of freedom.97
In order to cover probability 1 - α,
take the t-distribution critical value corresponding to α area in both tails.
Then the confidence interval is: β^ ± t sβ^ .

Exercise: For the heights example with 8 observations, β^ = .6254 and sβ^ = .101.
Construct a 99% confidence interval for the slope.
[Solution: The critical value at 1% for the t-distribution with 6 degrees of freedom is 3.707.
Therefore a 99% confidence interval for the slope is:
.6254 ± (3.707)(.101) = .6254 ± .3744 = [.25, 1.00]. ]
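
A small helper function that reproduces such parameter confidence intervals, given an estimate, its standard error, and N - k degrees of freedom (a minimal sketch, assuming scipy):

    from scipy.stats import t

    def param_ci(estimate, std_error, df, conf=0.95):
        # two-sided interval: estimate +/- (critical value)(standard error),
        # with the critical value leaving (1 - conf)/2 probability in each tail
        crit = t.ppf(1 - (1 - conf)/2, df)
        return estimate - crit*std_error, estimate + crit*std_error

    print(param_ci(-0.64, 0.18, 28))           # slope above: about (-1.01, -0.27)
    print(param_ci(0.6254, 0.101, 6, 0.99))    # heights slope: about (0.25, 1.00)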

97
k = number of variables including the intercept. k = 2 for the two-variable model.

Problems:

13.1 (1 point) You fit the regression model Yi = α + βXi + εi to 25 observations.


β^ = 1.73. sβ^ = 0.24.
Determine the lower limit of the symmetric 90% confidence interval for the slope parameter.
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

13.2 (1 point) You fit the regression model Yi = α + βXi + εi to 12 observations.


sα^ = 126.
Determine the width of the symmetric 98% confidence interval for the intercept.
(A) 500 (B) 550 (C) 600 (D) 650 (E) 700

Use the following information for the next two questions:


The following model: Y = α + βX + ε, has been fit to 10 observations.
α^ = 6.93. β^ = 2.79. ΣXi2 = 385. ΣYi2 = 5893.
Σxi2 = Σ(Xi - X)2 = 82.5. Σyi2 = Σ(Yi - Y)2 = 920. Σε^i2 = Σ(Yi - Y^i)2 = 276.

13.3 (2 points) Determine the shortest 99% confidence interval for α.


(A) (-8.0, 21.9) (B) (-6.5, 20.4) (C) (-5.0, 18.9) (D) (-3.5, 17.4) (E) (-2.0, 15.9)

13.4 (2 points) Determine the shortest 99% confidence interval for β.


(A) (1.4, 4.2) (B) (1.2, 4.4) (C) (1.0, 4.6) (D) (.8, 4.8) (E) (.6, 5.0)

Use the following information for the next two questions:


You fit the following model to 12 observations: Y = α + βX + ε.
You are given: α^ = 9.88. β^ = 2.36.
Σ(Xi - X)2 = 1283. Σ(Yi - Y^i)2 = 272. ΣYi = 390.

13.5 (2 points) Determine the upper limit of the symmetric 90% confidence interval for α.
(A) 12.8 (B) 13.0 (C) 13.2 (D) 13.4 (E) 13.6

13.6 (2 points) Determine the upper limit of the symmetric 90% confidence interval for β.
(A) 2.6 (B) 2.8 (C) 3.0 (D) 3.2 (E) 3.4

13.7 (8 points) You are given the following 6 pairs of observations:


X: 10 25 50 100 250 500
Y: 60 40 50 30 10 0
Fit the regression model: Y = α + βX + ε.
Determine 98% confidence intervals for α and β.

Use the following information for the next two questions:


You fit a two-variable linear regression model to 14 pairs of observations. You are given:
The sample mean of the independent variable is 13.86.
The sum of squared deviations from the mean of the independent variable is 3096.
The sample mean of the dependent variable is 25.86.
The sum of squared deviations from the mean of the dependent variable is 6748.
The ordinary least-squares estimate of the slope parameter is 0.643.
The regression sum of squares (RSS) is 1279.

13.8 (2 points) Determine the lower limit of the symmetric 95% confidence interval for the
intercept parameter.
(A) -3 (B) -2 (C) -1 (D) 0 (E) 1

13.9 (2 points) Determine the lower limit of the symmetric 95% confidence interval for the
slope parameter.
(A) -0.6 (B) -0.4 (C) -0.2 (D) 0 (E) 0.2

13.10 (Course 120 Sample Exam #1, Q.3) ( 2 points)


You fit the regression model Yi = α + βXi + εi to 10 observations (Xi, Yi).
You determine: Σ(Yi - Y^i)2 = 2.79. Σ(Xi - X)2 = 180. Σ(Yi - Y)2 = 152.40. X = 6. Y = 7.78.
Determine the width of the shortest symmetric 95% confidence interval for α.
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

13.11 (Course 120 Sample Exam #3, Q.2) (2 points)


You fit a regression model Yi = α + βXi + εi to 12 observations.
You determine that the symmetric 95% confidence interval for β is (1.2, 3.8) and that
Σ(Xi - X )2 = 0.826.
Determine the residual variance, s2.
(A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5

13.12 (IOA 101, 4/00, Q.16) (9.75 points)


The table below contains measurements on the strengths of beams.
The width and height of each beam was fixed but the lengths varied.
Data are available on the length (cm) and strength (Newtons) of each beam.
Length, l x = log l Strength, p y = log p Fitted value Residual
7 1.946 11775 9.374 9.379 -0.005
7 1.946 11275 9.330 9.379 -0.049
9 2.197 8400 9.036 9.055 -0.019
9 2.197 8200 9.012 9.055 -0.043
12 2.485 6100 8.716 8.684 0.032
12 2.485 6050 8.708 8.684 0.024
14 2.639 5200 8.556 8.486 0.070
18 2.890 3750 8.230 8.162 0.068
18 2.890 3650 8.202 8.162 0.040
20 2.996 3275 8.094 8.026 0.068
20 2.996 3175 8.063 8.026 0.037
24 3.178 2200 7.696 7.791 -0.095
24 3.178 2125 7.662 7.791 -0.129
Σx = 34.023, Σx² = 91.3978, Σy = 110.679, Σxy = 286.6299
It is thought that P and L satisfy the law P = k/L where k is a constant, so
log P = log k - log L, i.e. Y = log k - X.
A graph of log P against log L is displayed below.

The simple linear regression model y = α + βx has been fitted to the data, and the fitted values
and residuals are recorded in the table above.

(i) (2.25 points) Use the data summaries above to calculate the least squares estimates
α^ of α and β^ of β. Show all work.
(ii) (5.25 points) Assuming the usual normal linear regression model
(a) estimate the error variance σ2,
(b) calculate a 95% confidence interval for β, and
(c) discuss briefly whether the data are consistent with the relationship P = k/L.
(iii) (2.25 points) Plot the residuals of the model against X and comment on the information
contained in the plot.

13.13 (4, 11/00, Q.5) (2.5 points) You are investigating the relationship between per capita
consumption of natural gas and the price of natural gas. You gathered data from 20 cities and
constructed the following model: Y = α + βX + ε, where Y is per capita consumption, X is the
price, and ε is a normal random error term.
You have determined: α^ = 138.561. β^ = -1.104. ΣXi2 = 90,048. ΣYi2 = 116,058.
Σxi2 = Σ(Xi - X)2 = 10,668. Σyi2 = Σ(Yi - Y)2 = 20,838. Σε^i2 = Σ(Yi - Y^i)2 = 7,832.
Determine the shortest 95% confidence interval for β.
(A) (-2.1, -0.1) (B) (-1.9, -0.3) (C) (-1.7, -0.5) (D) (-1.5, -0.7) (E) (-1.3, -0.9)

13.14 (2 points) In the previous question, determine the corrected R2, R̄2.
(A) 0.54 (B) 0.56 (C) 0.58 (D) 0.60 (E) 0.62

13.15 (4, 11/01, Q.5) (2.5 points) You fit the following model to eight observations:
Y = α + βX + ε.
You are given:
β^ = -35.69.
Σ(Xi - X)2 = 1.62.
Σ(Yi - Y^i)2 = 2394.
Determine the symmetric 90-percent confidence interval for β.
(A) (–74.1, 2.7) (B) (–66.2, –5.2) (C) (–63.2, –8.2) (D) (–61.5, –9.9) (E) (–61.0, –10.4)

13.16 (IOA 101, 4/02, Q.14) (13.5 points) The table below gives the numbers of deaths nx
in a year in groups of women aged x years. The exposures of the groups, denoted Ex, are also
given (the exposure is essentially the number of women alive for the year in question).
The values of the death rates yx, where yx = nx/Ex, and the log(death rates), denoted wx, are
also given.
age x number of deaths nx exposure Ex yx = nx/Ex wx = logyx
70 30 426 0.07042 -2.6532
71 38 471 0.08068 -2.5173
72 38 454 0.08370 -2.4805
73 53 482 0.10996 -2.2077
74 59 445 0.13258 -2.0205
75 61 423 0.14421 -1.9365
76 82 468 0.17521 -1.7417
77 96 430 0.22326 -1.4994
Σx = 588, Σx² = 43,260, Σw = -17.0568, Σw² = 37.5173, Σxw = -1246.7879
(i) (2.25 points) A scatter plot of yx against x is shown below.
[Scatter plot of yx against x, for ages x = 70 to 77, with yx ranging from about 0.07 to 0.22.]
Draw a scatter plot of wx against x and comment briefly on the two scatter plots and the
relationships displayed.
(ii) (9 points) (a) Calculate the least squares fit regression line in which wx is modeled as the
response and x as the explanatory variable.
(b) Draw the fitted line on your scatter plot of wx against x.
(c) Calculate a 95% confidence interval for the slope coefficient of the regression model of wx
on x, adopting the assumptions of the usual “normal regression model”.
(d) Calculate the fitted values for the number of deaths for the group aged 71 years and the
group aged 76 years.
(iii) (2.25 points) Explain briefly the relationship between the fitting procedure used in part (ii)
and a model which states that the number of deaths Nx is a random variable with mean Exbcx
for some constants b and c.

13.17 (IOA 101, 9/02, Q.13) (9.75 points) The table below gives the frequency of
coronary heart disease by age group. The table also gives the age group midpoint (x) and
y = ln[p/(1-p)], where p denotes the proportion in an age group with coronary heart disease.
Coronary Heart Disease
Age group x Yes No n y
20-29 25 1 9 10 -2.19722
30-34 32.5 2 13 15 -1.87180
35-39 37.5 3 9 12 -1.09861
40-44 42.5 5 10 15 -0.69315
45-49 47.5 6 7 13 -0.15415
50-54 52.5 5 3 8 0.51083
55-59 57.5 13 4 17 1.17865
60-69 65 8 2 10 1.38629
Σx = 360; Σx2 = 17437.5; Σy = -2.9392; Σy2 = 13.615; Σxy = -9.0429.
Consider the regression model y = α + βx.
(a) Draw a scatterplot of y against x, and comment on the appropriateness of the suggested
model.
(b) Calculate the least squares fitted regression line of y on x.
(c) Calculate a 99% confidence interval for the slope parameter.
(d) Discuss whether there are differences in the probability of having coronary heart disease for
the different age groups with reference to the confidence interval obtained in (ii)(c).
Comment: Only part ii of the original past exam question is shown here.

13.18 (4, 11/02, Q.38) (2.5 points) You fit a two-variable linear regression model to 20
pairs of observations. You are given:
(i) The sample mean of the independent variable is 100.
(ii) The sum of squared deviations from the mean of the independent variable is 2266.
(iii) The ordinary least-squares estimate of the intercept parameter is 68.73.
(iv) The error sum of squares (ESS) is 5348.
Determine the lower limit of the symmetric 95% confidence interval for the intercept parameter.
(A) -273 (B) -132 (C) -70 (D) -8 (E) -3

13.19 (2 points) In the previous question, what is the width of a 95% confidence interval for
the slope parameter?
(A) 1.5 (B) 1.6 (C) 1.7 (D) 1.8 (E) 1.9
Mahler’s Guide to
Regression
Sections 14-17:

14 F Distribution
15 Testing the Slope, Two Variable Model
16 Hypothesis Testing
17 A Simulation Experiment *


Section 14, F-Distribution

The F-distribution is used in many tests of hypotheses of regression models.


We will start with a review of the variance ratio test, which is preliminary to the ideas on
your exam.

F-Test or Variance Ratio Test:98

Assume you have 5 observations of a variable X: 5, 5, 7, 8, 10.


Then the sample mean is: (5 + 5 + 7 + 8 + 10)/5 = 7. The sample variance is:
((5 - 7)2 + (5 - 7)2 + (7 - 7)2 + (8 - 7)2 + (10 - 7)2)/(5 -1) = 4.5.

Exercise: Given three observations of the variable Y: 8, 12, 22, what are the mean and sample
variance?
[Solution: Mean is: (8 + 12 + 22)/3 = 14.
Sample Variance is: ((8 - 14)2 + (12 - 14)2 + (22 - 14)2)/(3 -1) = 52.]

If we assume X is Normally distributed and Y is Normally distributed, then we can apply an


F-Test, in order to test the hypothesis that X and Y have the same variance.
The test statistic is the ratio of the sample variances, with the larger one in the numerator:99
(Sample Variance of Y)/(Sample Variance of X) = 52/4.5 = 11.56.

Consulting the F-Table, the critical value for 5% for 2 and 4 degrees of freedom100 is 6.94.
Therefore, since 11.56 > 6.94, if the null hypothesis is true, there is less than a 5% chance of
seeing a variance ratio of 11.56 or higher. Doing a two-sided test, there is less than a 10%
chance of seeing a variance ratio this high, with either sample variance larger. Thus we reject
the null hypothesis at a 10% level (two-sided test.)

Consulting the F-Table, the critical value for 1% for 2 and 4 degrees of freedom101 is 18.00.
Therefore, since 11.56 < 18.00, if the null hypothesis is true, there is more than a 1% chance of
seeing a variance ratio of 11.56 or higher. Doing a two-sided test, there is more than a 2%
chance of seeing a variance ratio this high, with either sample variance larger. Thus we do not
reject the null hypothesis at a 2% level (two-sided test.)
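
A numerical sketch of this variance ratio test, using the two samples above (assuming numpy and scipy are available):

    import numpy as np
    from scipy.stats import f

    x = np.array([5, 5, 7, 8, 10])
    y = np.array([8, 12, 22])
    ratio = y.var(ddof=1) / x.var(ddof=1)              # larger sample variance on top: 52/4.5 = 11.56
    p_one_sided = f.sf(ratio, len(y) - 1, len(x) - 1)  # upper tail of F with 2 and 4 degrees of freedom
    print(ratio, p_one_sided, 2*p_one_sided)           # doubling gives the two-sided p-value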

While this is the original and most common use of the F-Distribution, this is not tested on your
exam! Rather, as will be discussed in later sections, you may be asked to apply the
F-Distribution to test various hypotheses about the slope coefficients of regressions.

The F-Test depends on assuming X and Y are each (approximately) Normal.

98
See for example Section 2.4.4 of Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.
99
One takes the ratio of the larger sample variance to that of the smaller sample variance.
One expects the ratio to be near one if the null hypothesis is true. H0: σY2 = σX2.
100
The sample variance of Y in the numerator has 2 degrees of freedom, the denominator of its sample variance.
The sample variance of X in the denominator has 4 degrees of freedom, the denominator of its sample variance.
101
The sample variance of Y in the numerator has 2 degrees of freedom, the denominator of its sample variance.
The sample variance of X in the denominator has 4 degrees of freedom, the denominator of its sample variance.

Relationship to the Chi-Square Distribution:*

In the case above, if X is Normal, then the numerator of the sample variance of X is a sum of
squared Normals, each with mean zero and the same variance. Thus if we divide by the
variance we get a Chi-Square distribution.102 Thus the numerator of this sample variance is σX2
times a Chi-Square distribution with 4 degrees of freedom. The sample variance of X is
therefore, (σX2 /4) times a Chi-Square distribution with 4 degrees of freedom.
Similarly, the sample variance of Y is therefore, (σY2 /2) times a Chi-Square distribution with 2
degrees of freedom.103

(Sample Variance of Y)/(Sample Variance of X) =


(σY2 /σX2)(Chi-Square 2 d.f./ 2)/(Chi-Square 4 d.f./ 4).
If σX2 = σY2, then this ratio follows an F-Distribution with 2 and 4 degrees of freedom.

In general, if χ12 follows a chi-square distribution with ν1 degrees of freedom and χ22 follows
an independent chi-square distribution with ν2 degrees of freedom104, then (χ12/ν1)/(χ22/ν2) follows an F-
distribution, with ν1 and ν2 degrees of freedom.
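
This relationship can be verified by simulation; a minimal sketch (assuming numpy):

    import numpy as np

    rng = np.random.default_rng(0)
    n_sims, nu1, nu2 = 100_000, 2, 4
    ratios = (rng.chisquare(nu1, n_sims)/nu1) / (rng.chisquare(nu2, n_sims)/nu2)
    # roughly 5% of the simulated ratios should exceed the tabled 5% critical value of 6.94
    print(np.mean(ratios > 6.94))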

F-Statistic:

In the example above, the numerator of the F-Statistic was the sample variance of Y, a sum of
squares divided by its number of degrees of freedom. Similarly, the denominator of the
F-Statistic was the sample variance of X, a sum of squares divided by its number of degrees of
freedom.

Generally an F-Statistic will involve in the numerator some sort of sum of


squares divided by its number of degrees of freedom. In the denominator will be
another sum of squares divided by its number of degrees of freedom.

When applied to regression models, the particular numerator and denominator of the
computed F-Statistic will depend on the particular hypothesis being tested. However, they will
all have this same basic form.

102
A sum of squares of Unit Normals is a Chi-Square Distribution, a special case of the Gamma Distribution.
We lose one degree of freedom because the sum of Xi - X is zero, and therefore, knowing the value of n-1 of these
terms, we know the value of the last one.
103
This is where the assumption that Y is Normal is used.
104
For example, in the variance ratio test, χ12 would be the estimated variance from a sample of ν1 drawn from a
Normal Distribution and χ22 would be the estimated variance from an independent sample of ν2 drawn from a
second Normal Distribution.

Graphs:

The F-Distribution has two parameters, ν1 and ν2, each integers.


Here is a graph of the F-Distribution for ν1 = 2 and ν2 = 4:105

[Graph of the density of the F-Distribution for ν1 = 2 and ν2 = 4, plotted from 0 to 10.]

The F-Distribution is a heavy-tailed distribution.106


The F-Distribution is skewed to the right; it has positive skewness.
Here is a graph of the F-Distribution for ν1 = 12 and ν2 = 9:107

[Graph of the density of the F-Distribution for ν1 = 12 and ν2 = 9, with the lefthand 95% of the probability indicated; the remaining 5% righthand tail lies beyond 3.07.]

It turns out that the distribution function of the F-Distribution with 12 and 9 degrees of freedom is .950 at 3.07; the survival
function there is 5%. That is why for the 5% significance level, for ν1 = 12 and ν2 = 9, 3.07 appears in
the F-Table attached to the exam. At 5.11 the survival function is 1%, and 5.11 appears in the
F-Table for the 1% significance level, for ν1 = 12 and ν2 = 9.
105
In terms of an Incomplete Beta Function, F(x) = β[2/2, 4/2; 2x/(4 + 2x)].
106
In the F-Statistic, if the numerator is unusually large and the denominator is unusually small, then their ratio can be
very big.
107
In terms of an Incomplete Beta Function, F(x) = β[12/2, 9/2; 12x/(9 + 12x)].

Summary of the F Distribution:* 108

Support: 0 < x < ∞ Parameters: ν1 = positive integer, ν2 = positive integer.

D. f. : F(x) = β[ν1/2, ν2/2; ν1x/(ν2 + ν1x)] = 1 - β[ν2/2, ν1/2; ν2/(ν2 + ν1x)].
Fν1,ν2(x) = 1 - Fν2,ν1(1/x).
P. d. f. : f(x) = ν1^(ν1/2) ν2^(ν2/2) x^(ν1/2 - 1) /{(ν2 + ν1x)^((ν1+ν2)/2) β[ν1/2, ν2/2]},
where β[ν1/2, ν2/2] = Γ(ν1/2)Γ(ν2/2)/Γ((ν1+ν2)/2).

Moments: E[X^n] = (ν2/ν1)^n Γ(ν1/2 + n)Γ(ν2/2 - n)/{Γ(ν1/2) Γ(ν2/2)}, ν2 > 2n.

Mean = ν2/(ν2 - 2), ν2 > 2. Variance = 2ν2²(ν1 + ν2 - 2)/{ν1(ν2 - 2)²(ν2 - 4)}, ν2 > 4.

Skewness = 2^1.5 (ν2 - 4)^.5 (2ν1 + ν2 - 2)/{ν1^.5 (ν2 - 6)(ν1 + ν2 - 2)^.5}, ν2 > 6.

Kurtosis = 3 + 12{(ν2 - 4)(ν2 - 2)² + ν1(ν1 + ν2 - 2)(5ν2 - 22)}/{ν1(ν2 - 6)(ν2 - 8)(ν1 + ν2 - 2)}, ν2 > 8.
Mode = (ν2/ν1)(ν1 - 2)/(ν2 + 2), ν1 > 2; Mode = 0 for ν1 ≤ 2.

The F-Distribution is a heavy tailed distribution on 0 to ∞, with a righthand tail somewhat


similar to a Pareto Distribution with α = ν2/2.

Exercise: What is the density of an F-Distribution with ν1 = 4 and ν2 = 6?


[Solution: β[ν1/2, ν2/2] = β[2, 3] = Γ(2)Γ(3)/Γ(5) = (1!)(2!)/(4!) = (1)(2)/24 = 1/12.
f(x) = 4² 6³ x /{(6 + 4x)⁵ β[2, 3]} = 1296x/(3 + 2x)⁵, x > 0.]

Relation to the t-distribution:

The F-Distribution for ν1 = 1 is related to the t-distribution.


Prob[F-Distribution with 1 and ν degrees of freedom > c2 ] =
Prob[absolute value of t-distribution with ν degrees of freedom > c].

For example Prob[F-Distribution with 1 and 4 degrees of freedom > 2.776² = 7.71] = 5% =
Prob[absolute value of t-distribution with 4 degrees of freedom > 2.776].
The critical values for 5% in the column of the F-table for ν1 = 1 are the squares of the critical
values for the two-sided t-test for 5%. Similarly, the critical values for 1% in the column of the F-
table for ν1 = 1 are the squares of the critical values for the two-sided t-test for 1%.
Prob[F-Distribution with 1 and 4 degrees of freedom > 4.604² = 21.20] = 1% =
Prob[absolute value of t-distribution with 4 degrees of freedom > 4.604].
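
A quick numerical check of this relationship (a minimal sketch, assuming scipy):

    from scipy.stats import f, t

    nu = 4
    t_crit = t.ppf(0.975, nu)      # two-sided 5% critical value of the t-distribution, 2.776
    f_crit = f.ppf(0.95, 1, nu)    # 5% critical value of the F-Distribution with 1 and 4 d.f., 7.71
    print(t_crit**2, f_crit)       # both are about 7.71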

108
Also called the F Ratio Distribution or the Variance-Ratio Distribution. The F comes from the last name of the
statistician R. A. Fisher, who devised the variance ratio test.

Relation to the Pareto distribution:*

The F-Distribution for ν1 = 2 is a Pareto Distribution with α = ν2/2 and θ = ν2/2.

For example, for ν1 = 2 and ν2 = 1, the F-Distribution has survival function:


S(x) = {(1/2)/(1/2 + x)}^(1/2) = 1/√(1+2x).

Exercise: For ν1 = 2 and ν2 = 1, determine the critical values for 5%, 2.5%, and 1%.
[Solution: .05 = 1/√(1+2x). ⇒ x = 199.5. .025 = 1/√(1+2x). ⇒ x = 799.5.
.01 = 1/√(1+2x). ⇒ x = 4999.5.
Comment: These match the critical values shown in the F-Table.]

Exercise: For ν1 = 2 and ν2 = 2, determine the critical values for 5%, 2.5%, and 1%.
[Solution: For a Pareto distribution with α = 1 and θ = 1, S(x) = 1/(1 + x).
.05 = 1/(1 + x). ⇒ x = 19. .025 = 1/(1+ x). ⇒ x = 39. .01 = 1/(1+ x). ⇒ x = 99.
Comment: These match the critical values shown in the F-Table.]

More generally, an F-Distribution with ν1 and ν2 degrees of freedom is a Generalized Pareto


Distribution, as per Loss Models, with τ = ν1/2, α = ν2/2, θ = ν2/ν1.
Therefore, the F-Distribution for ν2 = 2 is an Inverse Pareto Distribution with τ = ν1/2 and
θ = 2/ν1. S(x) = 1 - {ν1x/(ν1x + ν2)}^(ν1/2).

Exercise: For ν1 = 4 and ν2 = 2, determine the critical values for 5%, 2.5%, and 1%.
[Solution: For an Inverse Pareto distribution with τ = 2 and θ = .5, S(x) = 1 - (x/(x + .5))2.
.95 = (x/(x + .5))2. ⇒ x = 19.25. .975 = (x/(x + .5))2. ⇒ x = 39.25.
.99 = (x/(x + .5))2. ⇒ x = 99.25.
Comment: These match the critical values shown in the F-Table.]

Limits:*109

As ν1 → ∞, the survival function of the F-Distribution at y approaches the distribution function of


a Chi-Square Distribution with ν2 degrees of freedom at ν2/y.

For example for ν2 = 7, as ν1 → ∞, the critical value at 5% of the F-Distribution is 3.23;


for ν2 = 7, as ν1 → ∞, the survival function of the F-Distribution at 3.23 is 5%.
For a Chi-Square Distribution with 7 degrees of freedom, the distribution function at 7/3.23 =
2.17 is 5%, as shown in the Chi-Square Table.

109
See for example, Handbook of Mathematical Functions, edited by Abramowitz and Stegun.

As ν2 → ∞, the survival function of the F-Distribution at y approaches the survival function of a


Chi-Square Distribution with ν1 degrees of freedom at yν1.

For example, for a Chi-Square Distribution with 20 degrees of freedom, the survival function at
31.41 is 5%, as shown in the Chi-Square Table. Therefore, for ν1 = 20, as ν2 → ∞, the critical
value at 5% of the F-Distribution approaches 31.41/20 = 1.571.

Using a computer for large values of ν2, the 5% critical values for ν1 = 20 are:
ν2 10 20 100 1000 10,000 100,000
5% critical value: 2.77 2.12 1.676 1.581 1.572 1.571

F-Table:

In order to enter the Table of the F-Distribution, one needs to know ν 1 and ν 2 , where ν 1 is
number of degrees of freedom associated with the numerator and ν 2 is number
of degrees of freedom associated with the denominator.
The columns correspond to ν1 while the rows correspond to ν2.

ν1 ⇔ number of degrees of freedom associated with the numerator ⇔ columns of table.


ν2 ⇔ number of degrees of freedom associated with the denominator ⇔ rows of table.

Listed in the table are the critical values for 5% and 1%.110

For example, for ν1 = 5 and ν2 = 3, the critical value for 5% is 9.01. In other words, the
distribution function at 9.01, F5,3(9.01) = .95 = 1 - .05. For ν1 = 5 and ν2 = 3, the critical value for
1% is 28.24. In other words, the distribution function at 28.24, F5,3(28.24) = .99 = 1 - .01.

For ν1 = 5 and ν2 = 3, the entries in the F-table look as follows:


9.01
28.24

110
While the tables attached to your exam contain only these two critical values, another table might contain
additional critical values. Exact p-values can be calculated via computer.

Percentage Points of the F-Distribution

Italic type ⇔ critical value for 5%. Bold face type ⇔ critical value for 1%.
ν1
ν2 1 2 3 4 5 6 7 8 9 10 11 12

1 161 200 216 225 230 234 237 239 241 242 243 244
4052 4999 5403 5625 5764 5859 5928 5981 6022 6056 6082 6106

2 18.51 19.00 19.16 19.25 19.30 19.33 19.36 19.37 19.38 19.39 19.40 19.41
98.49 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 99.40 99.41 99.42

3 10.13 9.55 9.28 9.12 9.01 8.94 8.88 8.84 8.81 8.78 8.76 8.74
34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23 27.13 27.05

4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.93 5.91
21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.54 14.45 14.37

5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.78 4.74 4.70 4.68
16.26 13.27 12.06 11.39 10.97 10.67 10.45 10.29 10.15 10.05 9.96 9.89

6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00
13.74 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.79 7.72

7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.63 3.60 3.57
12.25 9.55 8.45 7.85 7.46 7.19 7.00 6.84 6.71 6.62 6.54 6.47

8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.34 3.31 3.28
11.26 8.65 7.59 7.10 6.63 6.37 6.19 6.03 5.91 5.82 5.74 5.67

9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.13 3.10 3.07
10.56 8.02 6.99 6.42 6.06 5.80 5.62 5.47 5.35 5.26 5.18 5.11

10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.97 2.94 2.91
10.04 7.56 6.55 5.99 5.64 5.39 5.21 5.06 4.95 4.85 4.78 4.71

11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.86 2.82 2.79
9.65 7.20 6.22 5.67 5.32 5.07 4.88 4.74 4.63 4.54 4.46 4.40

12 4.75 3.88 3.49 3.26 3.11 3.00 2.92 2.85 2.80 2.76 2.72 2.69
9.33 6.93 5.95 5.41 5.06 4.82 4.65 4.50 4.39 4.30 4.22 4.16

13 4.67 3.80 3.41 3.18 3.02 2.92 2.84 2.77 2.72 2.67 2.63 2.60
9.07 6.70 5.74 5.20 4.86 4.62 4.44 4.30 4.19 4.10 4.02 3.96

Note: This is only 1/4 of the table attached to the exam, which also contains critical values for
larger numbers of degrees of freedom than shown here.

Hypothesis Testing:

The null hypothesis, H0, is that the variances of the variables associated with the numerator
and denominator of an F-Test are equal. One can apply the F-Test as either a one or a two
sided test. If the alternate hypothesis, H1, is that the variables have different variances, then
one applies a two sided test. If instead the alternate hypothesis, H1, is that the variable or sum
of squares in the numerator has the larger variance, then one applies a one sided test. This is
generally the case for the tests applied to regression models, and therefore one
applies a one-sided F-test.

Assume for a particular one-sided hypothesis test111, the F-Statistic with ν1 = 5 and ν2 = 3 is:
15.4. The critical values shown in the F-Table for 5% and 1% are 9.01 and 28.24.
Then since 9.01 < 15.4 < 28.24, we can reject the hypothesis at 5% and do not reject at 1%.
The p-value is somewhere in between 1% and 5%.112
A table of critical values for ν1 = 5 and ν2 = 3:113
p-value = α: 5% 1%
critical value = c: 9.01 28.24

Then one rejects to the left and does not reject to the right.
If instead the F-statistic were 7.3, since 7.3 < 9.01, one would not reject the null hypothesis at
5%. If instead the F-statistic were 31.7, since 31.7 > 28.24, one would reject the null
hypothesis at 1%.

If the null hypothesis is true, then we are unlikely to observe a very large F-Statistic. However,
there is always some positive probability due to random fluctuation producing “unusual”
observations. For example, with ν1 = 5 and ν2 = 3, there is only a 1% chance of observing an
F-Statistic of 28.24 or more, if the null hypothesis is true.

Large F-Statistic ⇒ reject the null hypothesis.

Small F-Statistic ⇒ do not reject the null hypothesis.

With ν1 = 5 and ν2 = 3, if the F-Statistic is less than 9.01, do not reject at 5%. If the F-Statistic is
between 9.01 and 28.24, then reject at 5% and do not reject at 1%. If the F-Statistic is greater
than 28.24, then reject at 1%.
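
In practice, rather than bracketing the p-value between the tabled 5% and 1% critical values, one can compute it exactly; a minimal sketch (assuming scipy):

    from scipy.stats import f

    f_stat, nu1, nu2 = 15.4, 5, 3
    p_value = f.sf(f_stat, nu1, nu2)    # upper-tail area, about 0.024
    print(p_value)                      # reject H0 at 5%, do not reject at 1%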

111
There are a number of different applications of the F-test, as discussed in subsequent sections. However, once
one has computed the correct F-Statistic and determined the corresponding degrees of freedom ν1 and ν2, then a
one-sided test proceeds in the same manner.
112
Using a computer, the p-value corresponding to 15.4 is 2.4%.
113
Put in a similar in format to a row of the Chi-Square Table.

Additional Critical Values:*

If one wanted to, using a computer one could construct the following table of critical values for
ν1 = 5 and ν2 = 3:

p-value = α: 5% 2.5% 1% 0.5%


critical value = c: 9.01 14.88 28.24 45.39

Then one rejects to the left and does not reject to the right. The table attached to your exam
only contains the critical values for 5% and 1%, so one can not be as precise in conducting an
F-test, as one could be with the above table.

Problems:

14.1 (1 point) The F-Statistic is 30.03. ν1 = 7 and ν2 = 3.


What is the p-value of a one-sided F-test?

14.2 (1 point) F-Statistic = 2.72 for 10 degrees of freedom in the numerator and 14 degrees of
freedom in the denominator.
For a one-sided F-Test, at what level, if any, do you reject the null hypothesis?

14.3 (1 point) For ν1 = 6, the F Distribution is .95 at 3. What is ν2?

14.4 (1 point) Use the F-Table in order to determine the critical value for the 1% significance
level (two sided t-test) for the t-statistic with 10 degrees of freedom.

14.5 (1 point) What is the 99th percentile of the F-Distribution with 6 and 4 degrees of
freedom?

14.6 (1 point) What is the 5th percentile of the F-Distribution with 8 and 3 degrees of freedom?

14.7 (2 points) Let Z1, Z2, Z3, Z4, and Z5 be independent Standard Normal Distributions, each
with mean zero and standard deviation one.
Determine Prob[Z12 + Z22 ≥ 20.55(Z32 + Z42 + Z52)].

14.8 (1 point) What is the 95th percentile of the F-Distribution with 7 and 11 degrees of
freedom?

14.9 (3 points) You have a sample of size 4 from a Normal Distribution: 5, 7, 2, 3.


You have a sample of size 3 from another Normal Distribution: 4, 1, 15.
Test the null hypothesis H0: the two distributions have the same variance, versus the
alternative H1: the variance of the second distribution is greater than the variance of the first
distribution.

14.10 (2 points) Let X1, X2, . . . , X12 be a random sample obtained from a normal distribution
with unknown mean µX and unknown variance σX2 > 0.
Let Y1, Y2, . . . , Y15 be a random sample obtained independently from a normal distribution
with unknown mean µY and unknown variance σY2 > 0.
The statistic W = Σ(Xi - X )2/Σ(Yi - Y )2 is to be used to test the null hypothesis H0: σX2 = σY2
versus the alternative hypothesis H1: σX2 > σY2. If H0 is rejected when W > C, and the
significance level of the test is .01, then C must equal:
A. 2.4 B. 2.6 C. 2.8 D. 3.0 E. 3.2

14.11 (3 points) For a sample of size 25 from a Normal Distribution:


Σ Xi = 255 and Σ Xi2 = 2867.
For a sample of size 20 from a Normal Distribution: Σ Yi = 212 and Σ Yi2 = 2368.
Test the null hypothesis H0: the two distributions have the same variance, versus the
alternative H1: the variance of the first distribution is greater than the variance of the second
distribution. Which of the following is true?
A. The F statistic has 19 and 24 degrees of freedom and H0 is rejected at the .05 level, but not
rejected at the .01 level.
B. The F statistic has 24 and 19 degrees of freedom and H0 is rejected at the .05 level, but not
rejected at the .01 level.
C. The F statistic has 19 and 24 degrees of freedom and H0 is rejected at the .01 level.
D. The F statistic has 24 and 19 degrees of freedom and H0 is rejected at the .01 level.
E. None of A, B, C, or D

14.12 (2 points) W, X, and Y are three independent samples, each from Normal Distributions.
Each sample is of size 13. The sample variance of W is 598.
The sample variance of X is 1787. The sample variance of Y is 3560.
You test the hypothesis σW2 = σX2 versus σW2 < σX2.
You also test the hypothesis σX2 = σY2 versus σX2 < σY2.
Which of the following is true?
A. Do not reject σW2 = σX2 at 5%, and do not reject σX2 = σY2 at 5%.
B. Reject σW2 = σX2 at 5% but not at 1%, and do not reject σX2 = σY2 at 5%.
C. Reject σW2 = σX2 at 1%, and reject σX2 = σY2 at 5% but not at 1%.
D. Reject σW2 = σX2 at 1%, and reject σX2 = σY2 at 1%.
E. None of A, B, C, or D

14.13 (2 points) Let X1, X2, . . . , X100 be a random sample obtained from a normal distribution
with unknown mean µX and unknown variance σX2 > 0. The sample variance of X is 722.
Let Y1, Y2, . . . , Y200 be a random sample obtained independently from a normal distribution
with unknown mean µY and unknown variance σY2 > 0. The sample variance of Y is 1083.
Test the null hypothesis H0: σX2 = σY2 versus the alternative hypothesis H1: σY2 > σX2.
For the F-Distribution with ν1 and ν2 degrees of freedom, F(x) = β[ν1/2, ν2/2; ν1x / (ν2 + ν1x)].
Determine the p-value of this test.
A. β[50, 100; .429]
B. 1 - β[50, 100; .429]
C. β[100, 50; .750]
D. 1 - β[100, 50; .750]
E. None of A, B, C, or D

14.14 (2 points) Let X1, X2, . . . , X11 be a random sample obtained from a normal distribution
with unknown mean µX and unknown variance σX2 > 0. The sample variance of X is 189.
Let Y1, Y2, . . . , Y7 be a random sample obtained independently from a normal distribution with
unknown mean µY and unknown variance σY2 > 0. The sample variance of Y is 37.
Test the null hypothesis H0: σX2 = b σY2 versus the alternative hypothesis H1: σX2 > b σY2.
At the 5% significance level, what is the largest value of b, such that one rejects H0?
A. 1.25 B. 1.30 C. 1.35 D. 1.40 E. 1.45

14.15 (3 points) You have two independent samples from LogNormal Distributions.
The first sample is: 1000, 1500, 3000, 25,000, and 500,000.
The second sample is: 500, 1000, and 2000.
You test the hypothesis that the two LogNormal Distributions have the same σ parameter,
versus the alternate that they do not. Which of the following is true?
A. H0 is rejected at the 1% significance level.
B. H0 is rejected at the 2% significance level
C. H0 is rejected at the 5% significance level
D. H0 is rejected at the 10% significance level
E. H0 is not rejected at the 10% significance level.

14.16 (2, 5/83, Q. 29) (1.5 points) Let X1, X2, . . . , X10 be a random sample obtained from a
normal distribution with unknown mean µX and unknown variance σX2 > 0.
Let Y1, Y2, . . . , Y6 be a random sample obtained independently from a normal distribution with
known mean µY = 0 and unknown variance σY2 > 0.
The statistic W = Σ(Xi - X )2/ΣYi2 is to be used to test the null hypothesis H0: σX2 = σY2 versus
the alternative hypothesis H1: σX2 > σY2. If H0 is rejected when W > C, and the significance
level (size) of the test is .05, then C must equal:
A. 4.10 B. 6.09 C. 6.15 D. 8.28 E. 8.59

14.17 (2, 5/85, Q. 30) (1.5 points) Let X1, . . . , X6 and Y1, . . . ,Y8 be independent random
samples from a normal distribution with mean 0 and variance 1.
Let Z = (4/3) ΣXi² / ΣYi², where the sum in the numerator runs from i = 1 to 6 and the sum in the denominator runs from i = 1 to 8.
What is the 99th percentile of the distribution of Z?
A. 6.37 B. 7.46 C. 8.10 D. 16.81 E. 20.09

14.18 (2, 5/88, Q. 13) (1.5 points) Let X1, . . . , X4 and Y1, . . . , Y4 be independent random
samples from the same normal distribution with unknown mean and variance.
For what value of k does k( X - Y )2/{Σ(Xi - X )2 + Σ(Yi - Y )2} have an F-distribution?
A. 3 B. 6 C. 8 D. 12 E. 16

14.19 (2, 5/90, Q. 20) (1.7 points) Let X, Y, and Z be independent normally distributed
random variables with E(X) = 2, E(Y) = 1, E(Z) = 2, and common variance σ2 > 0.
Let W = c4(X - 2)2/{(Y - 1)2 + (Z - 2)2}.
For what value of c will W have an F-distribution with 1 and 2 degrees of freedom?
A. 0.25 B. 0.50 C. 1 D. 2 E. 4

14.20 (2, 5/90, Q. 34) (1.7 points) Let X1, X2,. . . . , X9, be a random sample from a normal
distribution with mean 0 and variance 4, and let Y1, Y2,. . . ,Y8, be an independent random
sample from a normal distribution with mean 0 and variance 9.
P[ ΣXi² / ΣYj² > c] = .010, where the sum in the numerator runs from i = 1 to 9 and the sum in the denominator runs from j = 1 to 8.
Determine the value of c.
A. 2.66 B. 2.96 C. 3.42 D. 5.91 E. 6.84
Note: This former exam question has been rewritten.

14.21 (2, 5/92, Q. 30) (1.7 points) Independent random samples of size 9 and 6 are taken
from two normal populations with variances σ12 > 0 and σ22 > 0, respectively. Let S12 and S22
be the unbiased sample variances. The null hypothesis H0: 2 σ12 = σ22 is to be tested against
the alternative H1: 2 σ12 > σ22 using the test statistic W = S12 / S22.
What is the critical value for a test of size .05?
A. 2.05 B. 2.41 C. 3.86 D. 4.82 E. 9.64

14.22 (2, 2/96, Q.14) (1.7 points) Let X1, . . . , X7 and Y1, . . . , Y14 be independent random
samples from normal distributions with common mean µ = 30 and common variance σ2 > 0.
The statistic W = 2( Y - 30)2/( X - 30)2 has an F distribution with c and d degrees of freedom.
Determine c and d.
A. c = 1, d = 1
B. c = 6, d =13
C. c = 7, d = 14
D. c = 13, d = 6
E. c = 14, d = 7

14.23 (IOA 101, 4/00, Q.4) (2.25 points) Consider the following three probability
statements concerning an F variable with 6 and 12 degrees of freedom.
(a) P(F6,12 > 0.250) = 0.95
(b) P(F6,12 < 4.82) = 0.99
(c) P(F6,12 < 0.130) = 0.01
State, with reasons, whether each of these statements is true.

14.24 (IOA 101, 9/01, Q.13) (12 points) Twenty overweight executives take part in an
experiment to compare the effectiveness of two exercise methods, A (isometric),
and B (isotonic).
They are allocated at random to the two methods, ten to isometric, ten to isotonic methods.
After several weeks, the reductions in abdomen measurements are recorded in
centimeters with the following results:
A (isometric method) 3.1 2.1 3.3 2.7 3.4 2.7 2.7 3.0 3.0 1.6
B (isotonic method) 4.5 4.1 2.7 2.2 4.7 2.2 3.6 3.0 3.3 3.4
(i) (6.75) (a) Plot the data for the two exercise methods on a single diagram.
Comment on whether the response values for each exercise method are well modeled
by normal random variables.
(b) Perform a test to investigate whether the assumption of equal variability for the responses
for the two exercise methods is reasonable.
(c) Perform a t-test to investigate whether these data support the claim that the isotonic method
is more effective than the other method.
(ii) (5.25) (a) Determine a two-sided 95% confidence interval for the difference in the means for
the two exercise methods.
(b) Assuming that the two sets of 10 measurements are taken from normal populations with the
same variance, determine a 95% confidence interval for the common standard
deviation, leaving equal probability in each tail.

Section 15, Testing the Slope, Two Variable Model

The t-statistic can be used to test the slope of a regression.

Testing Whether the Slope is Zero:

The most common null hypothesis is that β = 0.


H0: β = 0, with alternative hypothesis H1: β ≠ 0.

If β = 0, then we expect β^ to be close to zero. However, due to random fluctuations in the
values of Y, even if the actual slope is zero, the estimated slope will be somewhat different
than zero.

As before, sβ^ is the standard deviation of the estimate of β. If H0 is true, we expect β^/sβ^ to be
close to zero. Let the test statistic be t = β^/sβ^. If the actual slope is zero, it is unlikely that t
will have a large absolute value. Therefore, if the absolute value of t = β^/sβ^ is sufficiently large,
then we reject H0, and conclude that the slope is nonzero.

For the heights example, we had:

Parameter Table Estimate Standard Error T-Stat p-Value


1 24.0679 5.98553 4.02102 0.00695079
x 0.625436 0.100765 6.2069 0.000806761

t = β^/sβ^ = .625436/.100765 = 6.207 = t-statistic for testing the hypothesis β = 0.

This t-statistic follows a t-distribution with N - 2 = 8 - 2 = 6 degrees of freedom.114

We compare the t-statistic to the t-table for 6 degrees of freedom.

Area in both tails (α)


ν 0.1 0.05 0.02 0.01
6 1.943 2.447 3.143 3.707

Since 6.207 > 3.707 we reject H0 at the 1% significance level.

Exercise: If the t-statistic had instead been 3, what conclusion would have been drawn?
[Solution: Since 2.447 < 3 < 3.143, we reject H0 at 5% and do not reject at 2%.]
114
Since we have assumed Normal errors, β^ is Normally Distributed with mean β. For a random sample from a Normal
variable, the sample mean divided by the sample standard deviation follows a t-distribution, with number of degrees
of freedom equal to the sample size minus one. The analogous situation here is somewhat more complicated. We
lose two degrees of freedom, since we are using the data to estimate both the slope and intercept.

Exercise: If the t-statistic had instead been -2, what conclusion would have been drawn?
[Solution: Since 1.943 < 2 < 2.447, we reject H0 at 10% and do not reject at 5%.]

Since the alternative hypothesis is H1: β ≠ 0, the t-test is two sided. We reject H0 if the
t-statistic is unusually large or small. Compare the absolute value of the t-statistic to the critical
values in the t-table, for the appropriate number of degrees of freedom. Reject to the left and do
not reject to the right.

Most Common t-test for the 2-variable model:


1. H0 : β = 0. H 1 : β ≠ 0.
2. t = β^ / sβ^ .
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the
t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.
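
A sketch of this procedure applied to data, using scipy's linregress, which returns the fitted slope, its standard error, and the two-sided p-value of the test that the slope is zero (the data below are purely illustrative, not the heights example):

    from scipy.stats import linregress

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]
    res = linregress(x, y)
    t_stat = res.slope / res.stderr     # t-statistic for H0: beta = 0, with N - 2 = 6 degrees of freedom
    print(t_stat, res.pvalue)           # reject H0 if |t| exceeds the tabled critical value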

In general, the p-value or probability value of a statistical test is:


p-value = Prob[test statistic takes on a value equal to its calculated value or a value less in
agreement with H0 (in the direction of H1) | H0].

The p-value of this test is the sum of the area in both tails. From the t-table attached to the
exam, since 6.207 > 3.707 we can determine that the p-value is less than 1%. Using a
computer one can determine that the p-value is .00081. In other words, for a t-distribution with
6 degrees of freedom, 2S(6.207) = .00081.

Exercise: If the t-statistic had instead been 3, what would be the p-value of the test?
[Solution: Since 2.447 < 3 < 3.143, the p-value would be between 2% and 5%.]

Assuming β = 0, for this example the probability of seeing a |t| ≥ 3 is the sum of the areas in the
two tails of a t-distribution with 6 degrees of freedom, Prob[t ≤ -3] + Prob[t ≥ 3]:115

[Figure: density of the t-distribution with 6 degrees of freedom, with the two tails below -3 and above 3 shaded.]
115
Using a computer, the area in each tail is 1.20%. Therefore, the p-value is 2.40%.

In general, if the p-value is less than the chosen significance level, then we reject H0.
In the above exercise with t = 3, we would reject H0 at a significance level of 5%, but not reject
H0 at a significance level of 2%.

S Notation:*

One can also put the t-statistic variance in the S.. notation discussed previously.

As discussed previously, β^ = SXY/SXX, and Var[β^] = {SYY/SXX - (SXY/SXX)²}/(N-2).

Therefore t = β^/sβ^ = (SXY/SXX)√(N-2)/√{SYY/SXX - (SXY/SXX)²}
= SXY√(N-2)/√{SXX SYY - SXY²}.

For the regression of heights example, SXX = 143.5, SYY = 64.875, and SXY = 89.75.
t = 89.75√(8 - 2)/√{(143.5)(64.875) - 89.75²} = 6.2069.
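
As a check, plain arithmetic with the summary statistics quoted above reproduces the same t-statistic (a minimal sketch):

    from math import sqrt

    Sxx, Syy, Sxy, n = 143.5, 64.875, 89.75, 8
    t_stat = Sxy * sqrt(n - 2) / sqrt(Sxx*Syy - Sxy**2)
    print(t_stat)   # about 6.207, matching the t-statistic computed earlier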

More General Test:

One can also test the hypothesis that β takes on a certain nonzero value. For example in the
heights example, take H0 to be that β = 0.5. Then t = (β^ - .5)/sβ^ = (.6254 - .5)/.10077 = 1.244.
Since 1.244 < 1.943, we do not reject this hypothesis at the 10% level.

General t-test, 2-variable model:


1. H0: a particular regression parameter takes on a certain value b. H1: H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the
t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

Exercise: In the heights example, test the hypothesis that the intercept is 40.

[Solution: t = ( α^ - 40)/ sα^ = (24.07 - 40)/5.986 = -2.661.


Since 2.447 < 2.661 < 3.143, we reject at 5% and do not reject at 2%.]

As will be discussed in a subsequent section, one can apply the t-test to individual parameters
in the multiple-variable case in a similar manner,
with the number of degrees of freedom = N - k.

Relationship to Confidence Intervals:

Testing the hypothesis that β takes on a particular value b is equivalent to testing whether that value b is in the appropriate confidence interval for β.

Assume one has fit a two-variable regression to 30 observations, with β^ = -0.64 and sβ^ = 0.18.
Then for 30 - 2 = 28 degrees of freedom, the critical values of the t-distribution are:
Area in both tails (α)
ν 0.10 0.05 0.02 0.01
28 1.701 2.048 2.467 2.763

Therefore, we can get the following confidence intervals for the slope:
90% confidence interval: -.64 ± (1.701)(.18) = -.64 ± .31 = [-0.95, -0.33].
95% confidence interval: -.64 ± (2.048)(.18) = -.64 ± .37 = [-1.01, -0.27].
98% confidence interval: -.64 ± (2.467)(.18) = -.64 ± .44 = [-1.08, -0.20].
99% confidence interval: -.64 ± (2.763)(.18) = -.64 ± .49 = [-1.13, -0.15].

[Figure: the four confidence intervals (90%, 95%, 98%, 99%) plotted on a number line from -1.2 to 0; each higher-confidence interval contains the narrower ones.]

Zero is not in the 99% confidence interval for β. Therefore, there is less than a 1 - 99% = 1% probability that β^ would be at least as far (on either side) from β as zero is. Therefore, if H0 is the hypothesis that β = 0, then we can reject H0 at 1%.

On the other hand, -.25 is in the 98% confidence interval but not in the 95% confidence
interval. Therefore, for the hypothesis that β = -.25, we reject at 5% but do not reject at 2%.

In general, if b is not within the P confidence interval for β, then reject at significance level
1- P the hypothesis that β = b.
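A minimal Python sketch of these confidence interval calculations, assuming scipy is available; the point estimate and standard error are the values assumed above:

from scipy import stats

beta_hat, s_beta, df = -0.64, 0.18, 28

for level in [0.90, 0.95, 0.98, 0.99]:
    crit = stats.t.ppf(1 - (1 - level)/2, df)        # two-sided critical value
    lower = beta_hat - crit * s_beta
    upper = beta_hat + crit * s_beta
    print(level, round(lower, 2), round(upper, 2))   # reproduces the four intervals above, up to rounding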

Exercise: In the heights example, construct 95% and 98% confidence intervals for the
intercept.

[Solution: α^ = 24.07. sα^ = 5.986.


The critical value for 5% and 6 degrees of freedom is 2.447.
Therefore a 95% confidence interval for the intercept is:
24.07 ± (2.447)(5.986) = 24.07 ± 14.65 = [9.42, 38.72].
Similarly, a 98% confidence interval for the intercept is:
24.07 ± (3.143)(5.986) = 24.07 ± 18.81 = [5.26, 42.88]. ]

The value 40 is not in the 95% confidence interval; therefore, we reject at 5% the hypothesis
that the intercept is 40. On the other hand, the value 40 is in the 98% confidence interval;
therefore, we do not reject at 2% the hypothesis that the intercept is 40. This matches the result
obtained previously using the t-statistic; the two methods are equivalent.

F-Test:

As will be discussed extensively for the multiple-variable case, the F-Test can also be used to
test the slopes of a regression. However, applying the F-Test to a single slope is
equivalent to the t-test with t = √ F.

As discussed subsequently for the multiple-variable case, one form of the


F-Statistic = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)}, with q and N - k degrees of freedom.

In the two-variable model for testing the hypothesis β = 0: k = 2,


q = dimension of the restriction = 1,
R2R = restricted R2 = percent of variation explained by using just an intercept = 0,
R2UR = unrestricted R2 = R2 of the two variable model.

For the 2-variable model, F = (N - 2)R2/(1 - R2), with 1 and N - 2 degrees of freedom.

Exercise: You fit the following model to 12 observations: Y = α + βX + ε. R2 = 0.80.


Calculate the value of the F statistic used to test for a linear relationship.
[Solution: F = (N - 2)R2/(1 - R2) = (12 - 2)(.8)/(1 - .8) = 40.]

Consulting the F-Table for 1 and 12 - 2 = 10 degrees of freedom,


since 10.04 < 40, we reject at 1% the hypothesis that β = 0.

Alternately, t = √F = √40 = 6.325. Consulting the t-table for 10 degrees of freedom,


since 3.169 < 6.325, we reject at 1% the hypothesis that β = 0.116

116 Note that 3.169, the critical value for the t-test at 1%, equals √10.04, the critical value for the F-test at 1%.
See the section on the F-Distribution, for a discussion of its relationship to the t-distribution when ν1 = 1.

Exercise: At the 5% significance level, for 15 observations, for the 2-variable regression model,
determine for which values of R2 you would reject the hypothesis that β = 0.
[Solution: For 1 and 15 - 2 = 13 degrees of freedom, the critical value for 5% is 4.67.
F = 13R2/(1 - R2). Thus we reject if 13R2/(1 - R2) > 4.67. ⇔ R2 > .264.]

As discussed subsequently for the multiple-variable case, another equivalent form of the
F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)}, with q and N - k degrees of freedom.
In the two-variable model for testing the hypothesis β = 0: k = 2,
q = dimension of the restriction = 1,
ESSR = error sum of squares if using just an intercept = TSS,
ESSUR= unrestricted ESS = ESS of the two variable model.
Therefore, F-Statistic = {(TSS - ESS)/1} / {ESS/(N - 2)} = (N - 2)RSS/ESS.

For the 2-variable model, F = (N - 2)RSS/ESS, with 1 and N - 2 degrees of freedom.

Exercise: You fit the following model to 12 observations: Y = α + βX + ε.


You determine that ESS = 200 and RSS = 800.
Calculate the value of the F statistic used to test for a linear relationship.
[Solution: F = (N - 2)RSS/ESS = (12 - 2)(800)/(200) = 40.
Comment: R2 = RSS/TSS = 800/(800 + 200) = .80, matching a previous exercise.]
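A minimal Python sketch of the F-test computations in the two exercises above (assuming scipy is available), showing the two equivalent forms of the F-statistic and the equivalence t = √F:

import math
from scipy import stats

N = 12
R2 = 0.80
F_from_R2 = (N - 2) * R2 / (1 - R2)     # 40, from F = (N - 2)R2/(1 - R2)

RSS, ESS = 800.0, 200.0
F_from_SS = (N - 2) * RSS / ESS         # 40, from F = (N - 2)RSS/ESS

crit_F = stats.f.ppf(0.99, 1, N - 2)    # 1% critical value, about 10.04
crit_t = stats.t.ppf(0.995, N - 2)      # 1% two-sided t critical value, about 3.169
print(F_from_R2, F_from_SS, crit_F, math.sqrt(crit_F), crit_t)   # sqrt(10.04) = 3.169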

Problems:

15.1 (1 point) You fit the 2-variable linear regression model, Y = α + βX + ε, to 20 observations. β^ = 13. sβ^ = 7. Test the hypothesis H0: β = 0 versus H1: β ≠ 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

Use the following information for the next 3 questions:


You fit the following model to 15 observations: Y = α + βX + ε.
Σ(Xi - X )2 = 24.88.
Σ(Xi - X )(Yi - Y ) = 1942.1.
Σ(Yi - Y^ i )2 = 282,750.
Σ(Yi - Y )2 = 434,348.
15.2 (1 point) What is β^?
(A) 70 (B) 72 (C) 74 (D) 76 (E) 78

15.3 (2 points) Let H0 be the hypothesis that β = 0. Which of the following is true?
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

15.4 (1 point) Determine the upper end of a symmetric 95% confidence interval for β.
(A) 142 (B) 143 (C) 144 (D) 145 (E) 146

15.5 (2 points) You fit the 2-variable linear regression model, Y = α + βX + ε, to 27 observations.
The total sum of squares (TSS) is 44 and the regression sum of squares (RSS) is 9.
Test the hypothesis H0: β = 0 versus H1: β ≠ 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

Use the following information for the next 8 questions:


Year (t) Loss Ratio (Y)
1 82
2 78
3 80
4 73
5 77
You fit the following model: Y = α + βt + ε.

15.6 (3 points) What is the estimated Loss Ratio for year 7?


(A) 71 (B) 72 (C) 73 (D) 74 (E) 75

15.7 (2 points) What is the variance of α^ ?


(A) Less than 7.5
(B) At least 7.5, but less than 8.0
(C) At least 8.0, but less than 8.5
(D) At least 8.5, but less than 9.0
(E) At least 9.0

15.8 (2 points) Determine the absolute value of the t-statistic in order to test whether β = 0.
(A) Less than 1.5
(B) At least 1.5, but less than 2.0
(C) At least 2.0, but less than 2.5
(D) At least 2.5, but less than 3.0
(E) At least 3.0

15.9 (1 point) What is the p-value for the t-test of the hypothesis that β = 0?
(A) Less than 1%
(B) At least 1%, but less than 2%
(C) At least 2%, but less than 5%
(D) At least 5%, but less than 10%
(E) At least 10%
15.10 (2 points) What is the covariance of α^ and β^?
(A) -2.5 (B) -2.0 (C) -1.5 (D) -1.0 (E) -0.5

15.11 (1 point) What is the correlation of α^ and β^?
(A) -0.9 (B) -0.8 (C) -0.7 (D) -0.6 (E) -0.5

* 15.12 (2 points) Using the delta method, estimate the standard deviation of the forecast of
the expected value for year 7.
(A) 3.0 (B) 3.2 (C) 3.4 (D) 3.6 (E) 3.8

* 15.13 (2 points) As t → ∞, what is the limit of the coefficient of variation of the forecast of the
expected value for year t?
(A) -0.8 (B) -0.7 (C) -0.6 (D) -0.5 (E) -0.4

15.14 (1 point) You fit a linear regression, Y = α + βX + ε, with X in feet.
If instead X were put in inches, with 12 inches per foot, what would be the effect on β^ and the t-statistic to test whether β is zero?

15.15 (2 points) You fit the 2-variable linear regression model, Y = α + βX + ε, to 15 observations. R2 = .45. Test the hypothesis H0: β = 0 versus H1: β ≠ 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

15.16 (1 point) You are given the following confidence intervals for the intercept parameter of
a regression:
90% confidence interval: [110, 150].
95% confidence interval: [104, 156].
98% confidence interval: [96, 164].
99% confidence interval: [90, 170].
Let H0 be the hypothesis that the intercept parameter is 100.
Which of the following is true?
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

15.17 (1 point) You fit the 2-variable linear regression model, Y = α + βX + ε, to 300 observations. β^ = -2.74. sβ^ = 1.30. Test the hypothesis H0: β = 0 versus H1: β < 0.
A. Reject H0 at 1/2%.
B. Do not reject H0 at 1/2%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

15.18 (1 point) You fit the following model: Y = α + βX + ε.
β^ = -1.27. sβ^ = 0.57.
Calculate the value of the F statistic used to test for a linear relationship.
(A) 3.5 (B) 4.0 (C) 4.5 (D) 5.0 (E) 5.5

15.19 (1 point) You fit a linear regression, Y = α + βX + ε, with Y in euros.
If instead Y were put in dollars, with 1.3 dollars per euro, what would be the effect on β^ and the t-statistic to test whether β is zero?

15.20 (2 points) You fit the following model to 15 observations: Y = α + βX + ε.
You determine that R2 = 0.72.
Calculate the value of the F statistic used to test for a linear relationship.
(A) 29 (B) 31 (C) 33 (D) 35 (E) 37

15.21. You fit a linear regression to 25 observations via least squares: Y = α + βX + ε.
Let Y^i be the fitted values.
Σ(Xi - X )2 = 42.65.
Σ(Xi - X )(Yi - Y ) = 302.1.
Σ(Yi - Y^ i )2 = 7502.
Let H0 be the hypothesis that β = 0. Which of the following is true?
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

15.22 (Course 120 Sample Exam #1, Q.2) (2 points) You fit the simple linear regression
model to 47 observations and determine Y^ = 1.0 + 1.2X. The total sum of squares (TSS) is 54
and the regression sum of squares (RSS) is 7.
Determine the value of the t statistic for testing H0: β = 0 versus H1: β ≠ 0.
(A) 0.4 (B) 1.2 (C) 2.2 (D) 2.6 (E) 6.7

15.23 (Course 120 Sample Exam #3, Q.3) (2 points) You fit a simple linear regression
to seven observations. You determine: ESS = 218.680, and F = 2.088. Calculate R2.
(A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7

15.24 (4, 5/00, Q.1) (2.5 points) You fit the following model to 20 observations:
Y = α + βX + ε
You determine that R2 = 0.64.
Calculate the value of the F statistic used to test for a linear relationship.
(A) Less than 30
(B) At least 30, but less than 33
(C) At least 33, but less than 36
(D) At least 36, but less than 39
(E) At least 39

15.25 (IOA, 4/03, Q.12) (13.5 points) The following data give the invoiced amounts for
work carried out on 12 jobs performed by a plumber in private customers’ houses.
The durations of the jobs are also given.
duration x (hours) 1 1 2 3 4 4 5 6 7 8 9 10
amount y (£) 45 65 80 95 100 125 145 180 180 210 330 240
Σ xi = 60, Σ xi2 = 402, Σ yi = 1795, Σ yi2 = 343,725, Σ xiyi = 11,570.
The plumber claims to calculate his total charge for each job on the basis of a fixed charge for
showing up plus an hourly rate for the time spent working on the job.
(i) (3.75 points)
(a) Draw a scatterplot of the data on graph paper and comment briefly on your plot.
(b) The equation of the fitted regression line of y on x is y = 22.4 + 25.4x, and the coefficient of
determination is R2 = 87.8% (you are not asked to verify these results).
Draw the fitted line on your scatterplot.
(ii) (9.75 points)
(a) Calculate the fitted regression line of invoiced amount on duration of job using only the 11
pairs of values remaining after excluding the invoice for which x = 9 and y = 330.
(b) Calculate the coefficient of determination of the fit in (ii)(a) above.
(c) Add the second fitted line to your scatterplot, distinguishing it clearly from the first line you
added (in part (i)(b) above).
(d) Comment on the effect of omitting the invoice for which x = 9 and y = 330.
(e) Carry out a test to establish whether or not the slope in the model fitted in (ii)(a) above is
consistent with a rate of £25 per hour for work performed.

15.26 (IOA 101, 4/04, Q.14) (15.75 points) Forensic scientists use various methods for
determining the likely time of death from post-mortem examination of human bodies. A recently
suggested objective method uses the concentration of a compound (3-methoxytyramine or
3-MT) in a particular part of the brain. In a study of the relationship between post-mortem
interval and the concentration of 3-MT, samples of the appropriate part of the brain were taken
from coroners' cases for which the time of death had been determined from eye-witness
accounts. The intervals (x; in hours) and concentrations (y; in parts per million) for 18
individuals who were found to have died from organic heart disease are given in the following
table. For the last two individuals (numbered 17 and 18 in the table), there was no
eye-witness testimony directly available, and the time of death was established on the
basis of other evidence including knowledge of the individuals' activities.
Observation Interval Concentration
number (x) (y)
1 5.5 3.26
2 6.0 2.67
3 6.5 2.82
4 7.0 2.80
5 8.0 3.29
6 12.0 2.28
7 12.0 2.34
8 14.0 2.18
9 15.0 1.97
10 15.5 2.56
11 17.5 2.09
12 17.5 2.69
13 20.0 2.56
14 21.0 3.17
15 25.5 2.18
16 26.0 1.94
17 48.0 1.57
18 60.0 0.61
Σ x = 337, Σ x2 = 9854.5, Σ y = 42.98, Σ y2 = 109.7936, Σ xy = 672.8.
In this investigation you are required to explore the relationship between concentration
(regarded as the response/dependent variable) and interval (regarded as the
explanatory/independent variable).
(i) (3.75 points) Construct a scatterplot of the data. Comment on any interesting features of the
data and discuss briefly whether linear regression is appropriate to model the relationship
between concentration of 3-MT and the interval from death.
*(ii) (3.75 points) Calculate the correlation coefficient for the data, and use it to test the null
hypothesis that the population correlation coefficient is equal to zero.
(iii) (3.75 points) Calculate the equation of the least-squares fitted regression line, and use it to
estimate the concentrations of 3-MT:
(a) after 1 day and (b) after 2 days
Comment briefly on the reliability of these estimates.
(iv) (4.5 points) Calculate a 99% confidence interval for the slope of the regression line. Using
this confidence interval, test the hypothesis that the slope of the regression line is equal to
zero. Comment on your answer in relation to the answer given in part (ii) above.

Section 16, Hypothesis Testing117

You should know how to apply hypothesis testing to the coefficients of regression models,
using the t-distribution or F-Distribution.118
It is also a good idea to know some of the general terminology.

Testing a Slope, an Example:

The previously discussed application of the t-statistic to test a slope of a regression is an example of hypothesis testing. For the example involving the heights of fathers and sons, as discussed previously, t = β^/sβ^ = .625436/.100765 = 6.207.

The steps of hypothesis testing are:

1. Choose a level of significance. Example: level of significance = 1%.

2. Formulate the statistical model. Example: the Normal Linear Regression Model holds.

3. Specify the null hypothesis H0 and the alternative hypothesis H1. Example: H0: β = 0; H1: β ≠ 0.

4. Select a test statistic whose behavior is known. Example: the t-statistic computed above follows a t-distribution with 6 degrees of freedom.119

5. Find the appropriate critical region. Example: the critical region or rejection region is |t| ≥ 3.707.120

6. Compute the test statistic on the assumption that H0 is true. Example: the test statistic is t = 6.207.

7. Draw conclusions. If the test statistic lies in the critical region, then reject the null hypothesis. Example: the test statistic is in the critical region, since 6.207 ≥ 3.707, so reject H0 at 1%.

117
This material is on the syllabus of CAS Exam 3 and Joint Exam 4/C. See Section 2.5 of Pindyck and Rubinfeld.
See also Probability and Statistical Inference by Hogg and Tanis, Introduction to Mathematical Statistics by Hogg,
McKean and Craig, or Section 9.4 of Loss Models.
118
As will be discussed subsequently, in a multiple regression model, one can use the F-Distribution in order to test
the null hypothesis that all of the slopes are zero.
119
N - k = 8 - 2 = 6 degrees of freedom.
120
Consulting the t table for 6 d.f. and a total of 1% area in both tails.

Null Hypothesis:

In general, in hypothesis testing one tests the null hypothesis H0 versus an


alternative hypothesis H1. It is important which hypothesis is H0 and which is H1.121

In the example above, the null hypothesis was that β = 0. A large absolute value of the
t-statistic means it is unlikely H0 is true and therefore we would reject H0.122

Note that hypothesis tests are set up to disprove something, H0 , rather than prove
something. In the above example, the test is set up to disprove that β = 0.

For example, a dry sidewalk is evidence it did not rain. On the other hand a wet sidewalk might
be caused by rain or something else such as a sprinkler system. A wet sidewalk can not prove
that it rained, but a dry sidewalk is evidence that it did not rain.

Similarly, a large absolute value of the t-statistic is evidence that the data was not drawn from
the given distribution, and may lead one to reject the null hypothesis. On the other hand, small
absolute values of the t-statistic result in one not rejecting the null hypothesis; a small absolute
value of the t-statistic does not prove the null hypothesis is true. If, for example, β = 0.1, a small absolute value of the t-statistic may result due to random fluctuations, particularly if we have only a small number of observations.123

We do not reject H0 unless there is sufficient evidence to do so. This is similar to the legal
concept of innocent (not guilty) until proven guilty. A trial does not prove one innocent.

Technically, one should not use the term “accept H0”. Nevertheless, it is common for actuaries,
including perhaps some members of the exam committee, to use the terms “do not reject H0”
and “accept H0” synonymously. For many actuaries in common usage: do not reject ⇔ accept.

Test Statistic:

A hypothesis test needs a test statistic whose distribution is known. In the above example,
the test statistic was the t-statistic, and one consults the t-table. In other statistical tests, one
would use the F-Table, Normal Table, Chi-Square Table, etc.

Critical Values:

The critical values are the values used to decide whether to reject H0. For example, in the
above test, the critical value (for 1% and 6 degrees of freedom) was 3.707.
We reject H0 if |t| ≥ 3.707.

121
If the universe of possibility is divided in a manner that includes a boundary, the null hypothesis must include the
boundary.
122
There are many other hypothesis tests, such as the t-test of means from Normal Distributions, the Chi-Square
Goodness of Fit Test, the Likelihood Ratio Test, the Kolmogorov-Smirnov Test, etc.
123
See the next section for a simulation experiment, illustrating this point.

The critical value(s) form the boundary (other than ±∞) of the rejection or critical region.

critical region ⇔ if test statistic is in this region then we reject H0 .

Significance Level:

The significance level, α, of the test is a probability level selected prior to performing the
test. In the above example, 1% was selected. Using the t table attached to the exam, one can
perform tests at significance levels of 10%, 5%, 2%, and 1%. For example, a significance level
of 5% uses the column listed as a total of 5% area in both tails.

If Prob[test statistic will take on a value at least as unusual as the computed value | H0 is true] is less than the significance level chosen, then we reject H0. If not, we do not reject H0.

The result of any hypothesis test depends on the significance level chosen. Therefore, in
practical applications the choice of the significance level is usually important.

Exercise: A linear regression has been fit to 12 observations. We test H0: β = 0. t = 2.6.
What conclusions do you draw at different significance levels?
[Solution: There are 12 - 2 = 10 degrees of freedom. The critical values for 10%, 5%, 2%, and
1%, shown in the t table for 10 degrees of freedom are: 1.812, 2.228, 2.764, and 3.169.
Since 2.6 > 1.812, reject H0 at 10%. Since 2.6 > 2.228, reject H0 at 5%.
Since 2.6 < 2.764, do not reject H0 at 2%. Since 2.6 < 3.169, do not reject H0 at 1%.]

The results of this exercise would usually be reported as: reject H0 at 5%, do not reject at 2%.
Since we reject at 5%, we also automatically reject at 10%. Since we do not reject at 2%, we
also automatically do not reject at 1%.

Types of Errors:*

There are two important types of errors that can result when performing hypothesis testing:124

Type I Error Reject H0 when it is true.


Type II Error Do not reject H0 when it is false.

Exercise: A linear regression has been fit to 15 observations.
We are testing H0: β = 0, by computing the t-statistic, t = β^/sβ^.
We will reject H0 when |t| ≥ 2.650.
If we reject, what is the probability of making a Type I error?

124
We are assuming you set everything up correctly. These errors are due to the random fluctuations present in all
data sets and the incomplete knowledge of the underlying risk process which led one to perform a hypothesis test
in the first place.

[Solution: If H0 is true, then this t-statistic follows a t-distribution with 15 - 2 = 13 degrees of


freedom. Consulting the t-table, if H0 is true, then there is a 2% chance that |t| ≥ 2.650, due to
random fluctuation in the limited sample represented by the observed data.
In other words, the significance level of this test is 2%.
We reject when |t| ≥ 2.650, for example, if t = -2.9. Prob[|t| ≥ 2.9] < 2%.
The probability of making a Type I error is 2% or less.]

In general, rejecting H0 at a significance level of α, means the probability of a Type I error is at


most α.

p-value:

The p-value = Prob[the test statistic takes on a value equal to its calculated value, or one less in agreement with H0 | H0 is true].

If the p-value is less than the chosen significance level, then we reject H0 .

Exercise: A linear regression has been fit to 7 observations.
We are testing H0: β = 0, by computing the t-statistic, t = β^/sβ^.
If t = -3.6, what is the p-value?
[Solution: There are 7 - 2 = 5 degrees of freedom. Since the critical value for 2% is 3.365 and
the critical value for 1% is 4.032, and 3.365 < 3.6 < 4.032, the p-value is between 1% and 2%.
Reject H0 at 2%, do not reject at 1%.
Comment: Using a computer, the p-value is 1.55%.]

Power of a Test:*

The power of a test is the probability of rejecting the null hypothesis, when H1 is true.

Prob[Type II error] = 1 - Power of the test = probability of failing to reject H0 when it is false.
Thus, everything else equal, large power of a test is good.

Decision           | H0 True                  | H0 False
Reject H0          | Type I Error ⇔ p-value   | Correct ⇔ Power
Do not reject H0   | Correct ⇔ 1 - p-value    | Type II Error ⇔ 1 - Power

In general, there is a trade-off between Type I and Type II errors. Making the probability of one
type of error smaller, usually makes the probability of the other type of error larger.

The larger the data set, the easier it is to reject H0 when it is false.
The larger the data set, the more powerful a given test, all else being equal.

A Ratemaking Example:*

The probabilities of the these two types of errors are important in some work by actuaries.

For example, using the information from credit reports, an insurer can calculate a “credit score”
for an individual.125 Let us assume that Allen the actuary fit a regression model to data for the
insureds for his insurer and found that for a personal line of insurance, such as automobile or
homeowners insurance, a higher (better) credit score was associated with lower expected total
insurance claim payments.126 127 The actuary computes the p-value for a test of whether the
slope associated with the credit score is zero, versus negative.

How small should this p-value be, before the insurer uses credit scores for pricing?
There is no single right answer.

If Allen were the first actuary to do such a test, then he probably would want a small p-value,
such as for example 10%, before recommending the introduction of a new rating variable such
as credit scores. The slope would also have had to be large in absolute value, in other words
credit scores would have to have a large effect on expected insurance costs, before the insurer
would bother using credit scores as a rating variable.128

In some states, the use of credit scores to price insurance might be controversial. For the first
insurer to propose the use of credit scores for pricing, an insurance regulator in such a state
might require a very small p-value such as 1%, before approving the use of credit scores.129 If
the p-value were 10%, then even if a lower credit score is not associated with a higher
expected cost, after taking into account the other rating variables, there is a 10% chance that
one would see a statistic at least as large as that gotten by Allen.

If many other actuaries and many other insurers had gotten similar results, then the p-value for
Allen’s test may be irrelevant. More simply, from an underwriting standpoint, if all of your
competitors are using credit scores in pricing, then the p-value for Allen’s test is irrelevant.

125
Among the many items from credit reports that may be used to calculate a credit score for an individual are: late
payments, bad debts, and financial leverage. See “A View Inside the Black Box: A Review and Analysis of Personal
Lines Insurance Credit Scoring Models Filed in the State of Virginia,” by Cheng-sheng Peter Wu and John R.
Lucker, Winter 2004 CAS Forum.
126
Other variables were included in the model that also affect expected total claim payments.
127
A Generalized Linear Model, to be discussed in a subsequent section, might have been used instead of a linear
regression model.
128
There are many criteria for the use of a rating variable. See “Risk Classification,” by Robert J. Finger, Foundations of Casualty Actuarial Science.
129
Some insurance regulators are not swayed by facts, but some are. Some regulators would actually have their staffs
carefully review the results of an actuarial study and the result of that review would affect the regulator’s decision.

Another Ratemaking Example:*

Workers Compensation has different classes, which are charged different amounts for
insurance. The most important part of ratemaking is to estimate the expected pure premium, in
other words the expected dollars of loss per exposure insured.130

Let us assume, that the pure premium for Wire Goods Manufacturing indicated by the most
recent 5 years of data is 115% of the average for all Manufacturing classes.

Let H0: the expected pure premium for the Wire Goods Manufacturing class is the same as that
for the average of all Manufacturing classes.

If one charges the Wire Goods Manufacturing class more than the average rate for all
Manufacturing Classes, when the expected cost for this class is not higher than average, then
one is making a Type I error.

One might be able to perform some sort of a simulation experiment in order to estimate the
p-value.131 Let us assume this estimated p-value is 25%.

Most statisticians would not reject H0, when the p-value is as large as 25%. However, an
actuary is not just worried about the probability of a Type I error, in this case 25%, he is also
worried about the probability of a Type II error.

If one charges the Wire Goods Manufacturing class the average rate for all Manufacturing
classes, when the expected cost for this class is higher than average, then one is making a
Type II error.

Let H1: the expected pure premium for the Wire Goods Manufacturing class is 115% of the
average pure premium of all Manufacturing classes.

One might estimate the probability of a Type II error via a simulation experiment.
The probability of making a Type II error might also be 25%.

It seems that one will have a 25% probability of making an error, no matter which of the two choices one makes. However, rather than use either the average pure premium or 115% of the average pure premium for all Manufacturing classes, actuaries would use a value somewhere in between, via the use of Credibility.132

Also, the magnitude of the difference in indicated pure premium and therefore indicated price
is very important. An actuary would have been unconcerned by the practical implications of an
indicated pure premium only 1% higher than average, but would have been very concerned if
the indicated pure premium had been either twice or half of average.

130
Regression is not used to estimate these pure premiums.
131
An example of a simulation experiment is in a subsequent section.
132
See for example, “Credibility” by Howard C. Mahler and Curtis Gary Dean in Foundations of Casualty Actuarial
Science, and “Workers Compensation Classification Credibilities,” by Howard C. Mahler in the Fall 1999 CAS Forum.

Problems:

16.1 (1 point) Which of the following statements about hypothesis testing is false?
A. The p-value is the probability given H0 is true, that the test statistic takes on a value equal
to its calculated value or a value less in agreement with H0 (in the direction of H1).
B. When testing whether β = 0, if the t-statistic is 3, for 7 degrees of freedom the p-value is
less than 2%.
C. If the p-value is less than the chosen significance level, then we reject H0.
D. The p-value is the chance of a Type II error.
E. None of the above statements is false.

Use the following information for the next three questions:


One has fit a regression model with 2 variables (1 independent variable plus the intercept).
One is testing the hypothesis H0: β = 0, versus the alternative hypothesis H1: β ≠ 0.

16.2 (1 point) With 15 observations, what is the critical region for a test at a 5% significance
level?

16.3 (1 point) With 30 observations, what is the critical region for a test at a 5% significance
level?

16.4 (1 point) Compare the probability of a Type II error for the tests in the two previous
questions, all else being equal.

16.5 (3 points) Captain James T. Kirk, interstellar explorer, has discovered the new planet of
Slubovia. Captain Kirk believes that the heights of adult males of humanoid species are
Normally Distributed with an average of 175 centimeters.
Captain Kirk beams down to the planet, to visit the capital city and talk to the native Slubs.
The first 5 Slubs Kirk observes have heights of: 150, 153, 160, 171, and 176 centimeters.
ΣXi = 810, ΣXi2 = 131,726.
Perform a statistical test of H0: µ = 175 versus H1: µ ≠ 175.
Science officer Spock points out to Kirk a number of possible problems with this test.
Briefly discuss some of them.

16.6 (4, 5/87, Q.50) (1 point) Which of the following are true regarding hypothesis tests?
1. The test statistic has a probability of α of falling in the critical region when H0 is true,
where α is the level of significance.
2. One should reject the H0 when the test statistic falls outside of the critical region.
3. The fact that the test criteria is not significant proves that the null hypothesis is true.
A. 1 B. 2 C. 3 D. 1, 2 E. 1, 3
HCMSA-F06-Reg-D, Mahler’s Guide to Regression, 7/11/06, Page 155

16.7 (CAS3, 5/05, Q.24) (2.5 points)


Which of the following statements about hypothesis testing are true?
1. A Type I error occurs if H0 is rejected when it is true.
2. A Type II error occurs if H0 is rejected when it is true.
3. Type I errors are always worse than Type II errors.
A. 1 only B. 2 only C. 3 only D. 1 and 3 only E. 2 and 3 only

Section 17, A Simulation Experiment:*


By reviewing a simulation of the situation in which we test the slope of a linear regression,
some people will get a better understanding of the hypothesis testing concepts discussed
previously.

Assume the model, Y = 50 + βX + εi, where εi are independent Normals with µ = 0 and σ = 3.133

Take for example, X = (0, 1, 5, 10, 25, 100).

We will use the t-statistic to test H0: β = 0.

When the Null Hypothesis is True:

Take β = 0.
Simulate a random set of εi and corresponding Yi.134
Fit a linear regression with intercept, and record t = β^/sβ^.

For example, let the first simulated set of errors be:


(1.22272, -0.936812, -2.95579, 1.23242, 1.3919, 5.60513).
Then the corresponding set of Yi is:
(51.2227, 49.0632, 47.0442, 51.2324, 51.3919, 55.6051).

Exercise: What is the fitted regression, and t = β^/sβ^?
[Solution: α^ = 49.4691, β^ = 0.0620192, sβ^ = 0.0202704,
and t = 0.0620192/0.0202704 = 3.05959.
Comment: An unusually large value of t. The corresponding p-value is only 3.8%.]

We repeat this process 10,000 times, recording in each case the value of the t-statistic.135

Since the null hypothesis is true, the actual slope is zero, most of the values of t are of small
absolute value. However, there are some unusual values of t.136

The ten smallest values of t are: -17.6622, -13.2387, -11.9001, -11.6023, -10.9685, -10.6137,
-10.4323, -8.979, -8.9156, -8.03695.

The ten largest values of t are: 6.87149, 7.40265, 7.61979, 7.64125, 7.81643, 8.08546,
8.29908, 8.60589, 9.45159, 12.7397.
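A minimal Python sketch of this simulation experiment, assuming numpy is available (the variable names are mine, and the exact values differ from run to run):

import numpy as np

np.random.seed(1)                        # any seed; results vary by run
X = np.array([0., 1., 5., 10., 25., 100.])
beta, sigma, n_sims = 0.0, 3.0, 10000    # set beta = 0.1 for the second experiment below

t_stats = []
for _ in range(n_sims):
    Y = 50 + beta * X + np.random.normal(0, sigma, size=len(X))
    x, y = X - X.mean(), Y - Y.mean()              # deviation form
    b_hat = (x * y).sum() / (x * x).sum()          # fitted slope
    a_hat = Y.mean() - b_hat * X.mean()            # fitted intercept
    resid = Y - (a_hat + b_hat * X)
    s2 = (resid ** 2).sum() / (len(X) - 2)         # estimated variance of the regression
    s_b = np.sqrt(s2 / (x * x).sum())              # standard error of the slope
    t_stats.append(b_hat / s_b)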

133
The values of the intercept and variance of the regression were chosen solely for illustrative purposes.
134
One does not need to know the details of how to simulate a Normal Distribution. See Simulation by Ross.
135
Each simulation run has different simulated errors, different Yi, different fitted regression, and different t statistic.
136
When there is such an unusual value of t, we would reject H0 even though it is true, making a so-called Type I error.

Here is a histogram of the 10,000 values of the t-statistic:

[Figure: histogram of the 10,000 simulated values of the t-statistic, centered at zero, with essentially all of the mass between -4 and 4.]

Here is a comparison to the density function of the t-distribution with 6 - 2 = 4 degrees of freedom:

[Figure: the same histogram overlaid with the density of the t-distribution with 4 degrees of freedom.]

The simulated results seem to be a reasonable match to this t-distribution.



When the Null Hypothesis is Not True:

A similar set of 10,000 simulations was performed, except with β = 0.1.

Here is a histogram of the 10,000 values of the t-statistic:137

[Figure: histogram of the 10,000 simulated values of the t-statistic when β = 0.1; the distribution is shifted to the right and skewed, with most values between 0 and about 15.]

Since the actual slope is not zero, in other words the null hypothesis is false, many of the
values of t have a large absolute value.

However, note that there are still some values of t near zero. For example, there are 929 cases
out of 10,000 where |t| ≤ 1.5. Thus there are a significant number of simulated situations where
we would have failed to reject H0, even though it was false.138

In general, hypothesis tests are set up to disprove something. In this case, we reject H0 when, if H0 were true, there would be only a small probability of seeing a t-statistic as unusual as the one observed, or more unusual. Failing to reject H0 means that either H0 is true or there is insufficient evidence to demonstrate that H0 is false.

137
This distribution is not symmetric. When H0 is false, the t-statistic follows what is called a non-central t-distribution.
138
This is what is called a Type II Error.
The power of a statistical test is: 1 - Prob[Type II Error] = Prob[reject H0 | H1].
The power of this test would have been larger if there had been more than 6 observations, β had been larger than
0.1, or σ had been smaller than 3.
Mahler’s Guide to
Regression
Sections 18-21:
18 Three Variable Regression Model
19 Matrix Form of Multiple Regression
20 Tests of Slopes, Multiple Regression Model
21 Additional Tests of Slopes

Study Aid F06-Reg-E

Section 18, Three Variable Regression Model

Tristate Insurance writes private passenger automobile insurance in three states, New York,
New Jersey, and Connecticut, and has 10 agents. Let X2 = the percentages of business written
by each agent that is from New York. Let X3 = the percentages of business written by each
agent that is from New Jersey.139 Let Y = the loss ratio for each agent.

Agent X2 X3 Y
1 100% 0 75%
2 90% 10% 78%
3 70% 0% 71%
4 65% 10% 73%
5 50% 50% 79%
6 50% 35% 75%
7 40% 10% 65%
8 30% 70% 82%
9 15% 20% 72%
10 10% 10% 66%

Let’s fit via regression the linear model: Y = β1 + β2X2 + β3X3 + ε.

Since there are 3 independent variables including the intercept, the formulas are somewhat
more complicated than for the two-variable model.

As usual define the variables in deviation form: x2 = X2 - X 2 .

Agent X2 X3 Y x2 x3 y x2x3 x2y x3y


1 100 0 75 48 -21.5 1.4 -1032.0 67.2 -30.1
2 90 10 78 38 -11.5 4.4 -437.0 167.2 -50.6
3 70 0 71 18 -21.5 -2.6 -387.0 -46.8 55.9
4 65 10 73 13 -11.5 -0.6 -149.5 -7.8 6.9
5 50 50 79 - 2 28.5 5.4 -57.0 -10.8 153.9
6 50 35 75 - 2 13.5 1.4 -27.0 -2.8 18.9
7 40 10 65 -12 -11.5 -8.6 138.0 103.2 98.9
8 30 70 82 -22 48.5 8.4 -1067.0 -184.8 407.4
9 15 20 72 -37 -1.5 -1.6 55.5 59.2 2.4
10 10 10 66 -42 -11.5 -7.6 483.0 319.2 87.4
Sum -2480.0 463.0 751.0
Avg. 52.0 21.5 73.6

Σx2ix3i = -2480, Σx2iyi = 463, and Σx3iyi = 751. Similarly, Σx2i2 = 8010.0, Σx3i2 = 4802.5.

139
Connecticut represents the remaining percentage, 1 - (X2 + X3).

The fitted least squares coefficients are:140

β^2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}
= {(463)(4802.5) - (751)(-2480)}/{(8010)(4802.5) - (-2480)2} = 4086038/32317625 = .126.

β^3 = {Σx3iyi Σx2i2 - Σx2iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}
= {(751)(8010) - (463)(-2480)}/{(8010)(4802.5) - (-2480)2} = 7163750/32317625 = .222.

β^1 = Ȳ - β^2 X̄2 - β^3 X̄3 = 73.6 - (.126)(52.0) - (.222)(21.5) = 62.3.141

Y^ = 62.3 + .126X2 + .222X3.

Thus, the model would seem to indicate that the larger X2, portion of business in New York, the
higher the loss ratio, and the larger X3, portion of business in New Jersey, the higher the loss
ratio.
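A minimal Python sketch of this three-variable fit using the deviation-form formulas above, assuming numpy is available; the data are the ten agents' values from the table:

import numpy as np

X2 = np.array([100, 90, 70, 65, 50, 50, 40, 30, 15, 10.])
X3 = np.array([0, 10, 0, 10, 50, 35, 10, 70, 20, 10.])
Y  = np.array([75, 78, 71, 73, 79, 75, 65, 82, 72, 66.])

x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()
den = (x2*x2).sum() * (x3*x3).sum() - (x2*x3).sum()**2
b2 = ((x2*y).sum() * (x3*x3).sum() - (x3*y).sum() * (x2*x3).sum()) / den   # about .126
b3 = ((x3*y).sum() * (x2*x2).sum() - (x2*y).sum() * (x2*x3).sum()) / den   # about .222
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()                            # about 62.3
print(b1, b2, b3)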

The fitted regression is a plane, as shown in the following three dimensional graph:

[Figure: three-dimensional plot of the fitted plane, with the loss ratio (L.R.) on the vertical axis and the percentages of business in New York (N.Y.) and New Jersey (N.J.) on the horizontal axes.]

If an agent wrote all of its business in New York, then X2 = 100 and X3 = 0, and the predicted
loss ratio is β1 + 100β2 = 74.9.

140
See equations 4.3 to 4.5 of Econometric Models and Economic Forecasts. As discussed in the next section, it
also possible to perform this regression in matrix form.
141
Therefore, the fitted regression goes through the point at which the independent variables and the dependent
variable are equal to their means.

If an agent wrote all of its business in New Jersey, then X2 = 0 and X3 = 100, and the predicted loss ratio is β1 + 100β3 = 84.5.

If an agent wrote all of its business in Connecticut, then X2 = 0 and X3 = 0, and the predicted
loss ratio is β1 = 62.3.

For example, for the second agent the fitted loss ratio is:
62.3 + (.126)(90) + (.222)(10) = 75.86.

As in the two-variable case, these estimators of the slopes are unbiased.


In general, in multiple regression, the least squares estimators of the slopes are unbiased.142

R2:
Y^ = 74.90, 75.86, 71.11, 72.69, 79.66, 76.34, 69.53, 81.57, 68.59, 65.74.
Y = 75, 78, 71, 73, 79, 75, 65, 82, 72, 66.
Note that the mean of Y^ = 73.6 = Ȳ.

TSS = Σ(Yi - Ȳ)2 = 264.4.

RSS = Σ(Y^i - Ȳ)2 = 225.01.

R2 = RSS/TSS = 225.01/264.4 = 0.851.

1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .851)(10 - 1)/(10 - 3) = .192, so the adjusted R2 is R̄2 = .808.

R2 can also be computed as the square of the correlation between Y and Y^:
Corr[Y, Y^] = 0.9225. Corr[Y, Y^]2 = .851 = R2.
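A minimal Python sketch of the R2 and adjusted R2 computations above, assuming numpy is available and using the fitted and observed values just listed:

import numpy as np

Y_hat = np.array([74.90, 75.86, 71.11, 72.69, 79.66, 76.34, 69.53, 81.57, 68.59, 65.74])
Y = np.array([75, 78, 71, 73, 79, 75, 65, 82, 72, 66.])

TSS = ((Y - Y.mean())**2).sum()               # about 264.4
RSS = ((Y_hat - Y.mean())**2).sum()           # about 225
R2 = RSS / TSS                                # about 0.851
R2_bar = 1 - (1 - R2) * (10 - 1) / (10 - 3)   # adjusted R-squared, about 0.808
print(TSS, RSS, R2, R2_bar)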

Based on this model, the mix of business by state seems to explain some of the variation of
loss ratios between the agents.143 We have computed how much of the variation was explained
by this model.

In general one wants to determine whether the results of a regression are statistically
significant. As with the two-variable regression model, as will be discussed subsequently, one
can test the significance of the fitted coefficients using the t-test and F-Test. One preliminary
step is to compute the variances and covariances of the fitted parameters.
142
Provided the assumed form of the model is correct.
143
Since these loss ratios are from finite data sets, at least some of the variation is due to random fluctuation in the
aggregate loss process.

Variances and Covariances:144

For the three variable model, one can write the variances and covariances of the slopes in terms of the simple correlation of x2 and x3:
rX2X3 = Σx2ix3i/√(Σx2i2 Σx3i2) = -2480/√{(8010.0)(4802.5)} = -.3999.

As in the two-variable model, one can estimate the variance of the regression:
s2 = ESS/(N - k) = (TSS - RSS)/(10 - 3) = (264.4 - 225.01)/7 = 5.627.

Var[β^2] = s2/{(1 - rX2X32)Σx2i2} = (5.627)/{(1 - .39992)(8010)} = .000836.

Var[β^3] = s2/{(1 - rX2X32)Σx3i2} = (5.627)/{(1 - .39992)(4802.5)} = .001395.

Notice that as the correlation of the two independent variables increases in absolute value, the variance of the regression parameters increases. When X2 and X3 are independent, their correlation is zero, and their sample correlation is close to zero. When X2 and X3 are independent, the values of X2, X3, and Y provide more information than when X2 and X3 are highly correlated, and we get a better estimate of the coefficients. For rX2X3 close to zero, Var[β^2] and Var[β^3] are smaller than for rX2X3 close to ±1.

Agent X2 X3 X2^2 X3^2 X2 X3


1 100 0 10000 0 0
2 90 10 8100 100 900
3 70 0 4900 0 0
4 65 10 4225 100 650
5 50 50 2500 2500 2500
6 50 35 2500 1225 1750
7 40 10 1600 100 400
8 30 70 900 4900 2100
9 15 20 225 400 300
10 10 10 100 100 100
Sum 520 215 35050 9425 8700

Cov[β^2, β^3] = -rX2X3 s2 / {(1 - rX2X32)√(Σx2i2 Σx3i2)}
= -(-.3999)(5.627)/{(1 - .39992)√{(8010.0)(4802.5)}} = .000432.
144
See equations 4.6 to 4.8 of Econometric Models and Economic Forecasts. As discussed in the next section, it is also possible to use the matrix form to compute these elements of the variance-covariance matrix. Econometric Models and Economic Forecasts does not include the formulas for Var[β^1], Cov[β^1, β^2], and Cov[β^1, β^3]. These are derived in the next section from the matrix formula for the covariance matrix.

Corr[β^2, β^3] = Cov[β^2, β^3] / √(Var[β^2] Var[β^3]) = .000432/√{(.000836)(.001395)} = .400 = -rX2X3, subject to rounding.

Var[β^1] = s2{ΣX2i2 ΣX3i2 - (ΣX2iX3i)2}/{NΣx2i2Σx3i2(1 - rX2X32)}
= (5.627){(35050)(9425) - 87002}/{10(1 - .39992)(8010)(4802.5)} = 4.434.

Cov[β^1, β^2] = s2{ΣX3iΣX2iX3i - ΣX2iΣX3i2}/{NΣx2i2Σx3i2(1 - rX2X32)}
= (5.627){(215)(8700) - (520)(9425)}/{10(1 - .39992)(8010)(4802.5)} = -.05277.

Cov[β^1, β^3] = s2{ΣX2iΣX2iX3i - ΣX3iΣX2i2}/{NΣx2i2Σx3i2(1 - rX2X32)}
= (5.627){(520)(8700) - (215)(35050)}/{10(1 - .39992)(8010)(4802.5)} = -.05244.
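A minimal Python sketch of these variance and covariance calculations, using the summary sums for the agents example (assuming numpy is available; the variable names are mine):

import numpy as np

Sx2x2, Sx3x3, Sx2x3 = 8010.0, 4802.5, -2480.0
s2 = (264.4 - 225.01) / (10 - 3)                     # ESS/(N - k), about 5.627
r = Sx2x3 / np.sqrt(Sx2x2 * Sx3x3)                   # about -0.3999
var_b2 = s2 / ((1 - r**2) * Sx2x2)                   # about .000836
var_b3 = s2 / ((1 - r**2) * Sx3x3)                   # about .001395
cov_b2_b3 = -r * s2 / ((1 - r**2) * np.sqrt(Sx2x2 * Sx3x3))   # about .000432
print(var_b2, var_b3, cov_b2_b3)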

Problems:

Use the following information for the next three questions:


You fit the following model to four observations:
Yi = β1 + β2X2i + β3X3i + εi, i = 1, 2, 3, 4
You are given:
i X2i X3i
1 2 4
2 5 8
3 7 10
4 10 14

18.1 (3 points) The least squares estimator of β2 is expressed as β^2 = Σ wiYi, with the sum taken over i = 1 to 4.
Determine (w1, w2, w3, w4).
(A) (0.5, -2.5, 2.5, -0.5)
(B) (0.5, 2.5, -2.5, -0.5)
(C) (0.5, 2, -2, -0.5)
(D) (-0.5, 2, -2, 0.5)
(E) None of A, B, C, or D.

18.2 (2 points) If Y = (10, 8, 14, 20), determine β^3.
A. -10 B. -7 C. 0 D. 10 E. 16

18.3 (1 point) If Y = (10, 8, 14, 20), determine β^1.
A. -10 B. -7 C. 0 D. 10 E. 16

Use the following information for the next four questions:


You are given the multiple linear regression model Yi = β1 + β2X2i + β3X3i + εi.
Σ(X2i - X̄2)2 = 23,266. Σ(X3i - X̄3)2 = 250. Σ(X2i - X̄2)(X3i - X̄3) = -612. Σ(Yi - Y^i)2 = 516,727.
(Each sum is taken over the 32 observations, i = 1 to 32.)

18.4 (1 point) Determine Var[β^2].
A. 0.8 B. 1.0 C. 1.2 D. 1.4 E. 1.6

18.5 (1 point) Determine Var[β^3].
A. 65 B. 70 C. 75 D. 80 E. 85

18.6 (1 point) Determine Cov[β^2, β^3].
A. 0.5 B. 1.0 C. 1.5 D. 2.0 E. 2.5

18.7 (1 point) Determine the estimate of the standard deviation of the least-squares estimate
of the sum of β2 and β3.
A. 8.6 B. 8.7 C. 8.8 D. 8.9 E. 9.0

18.8 (165, 5/89, Q.15) (1.7 points) You observed the following four students who sat for an
examination:
Student Hours Studied At Home Hours Studied At Library Score on The Exam
I 0 0 0
II 0 100 30
III 100 0 40
IV 100 100 80
Expected scores are to be obtained by using the regression approach to fit a plane to the
observed scores.
Determine the number of hours of study at home required to have an expected score of 60,
if 75 hours are studied at the library.
(A) 75 (B) 81 (C) 86 (D) 93 (E) 100

18.9 (Course 120 Sample Exam #1, Q.7) (2 points) You are given the multiple linear regression model Yi = β2X2i + β3X3i + εi. The values of X2i and X3i have been scaled so that Σi Xji = 0 and Σi Xji2 = 1, for j = 2, 3.
You are also given:
(i) Var[β^2] is 4s2/3.
(ii) The regression of X2 on X3 has negative slope.
Determine the correlation coefficient between X2 and X3.

18.10 (Course 120 Sample Exam #2, Q.2) (2 points) You fit the model
Yi = β1 + β2X2i + β3X3i + εi to the following data:
Y X2 X3
1 -1 -1
2 1 -1
4 -1 1
3 1 1
Determine β^2.
(A) 0 (B) 1 (C) 2 (D) 3 (E) 4

18.11 (4, 5/00, Q.35) (2.5 points) You fit the following model to 30 observations:
Y = β1 + β2X2 + β3X3 + ε
(i) s2 = 10
(ii) rX2X3 = 0.5.
(iii) Σ(X2 - X̄2)2 = 4
(iv) Σ(X3 - X̄3)2 = 8
Determine the estimate of the standard deviation of the least-squares estimate of the difference
between β2 and β3.
(A) 1.7 (B) 2.2 (C) 2.7 (D) 3.2 (E) 3.7

18.12 (4, 11/01, Q.13) (2.5 points) You fit the following model to four observations:
Yi = β1 + β2X2i + β3X3i + εi, i = 1, 2, 3, 4
You are given:
i X2i X3i
1 –3 –1
2 –1 3
3 1 –3
4 3 1
The least squares estimator of β3 is expressed as β^3 = Σ wiYi, with the sum taken over i = 1 to 4.
Determine (w1, w2, w3, w4).
(A) (–0.15, –0.05, 0.05, 0.15)
(B) (–0.05, 0.15, –0.15, 0.05)
(C) (–0.05, 0.05, –0.15, 0.15)
(D) (–0.3, –0.1, 0.1, 0.3)
(E) (–0.1, 0.3, –0.3, 0.1)

18.13 (2 points) In the previous question, the least squares estimator of β2 is expressed as β^2 = Σ wiYi, with the sum taken over i = 1 to 4. Determine (w1, w2, w3, w4).

Section 19, Matrix Form of Multiple Regression145

One can also perform regression using matrix methods. This is particularly useful as the
number of variables increases. Data on loss ratios by agent was previously fit via regression.

Agent X2 X3 Y
1 100% 0 75%
2 90% 10% 78%
3 70% 0% 71%
4 65% 10% 73%
5 50% 50% 79%
6 50% 35% 75%
7 40% 10% 65%
8 30% 70% 82%
9 15% 20% 72%
10 10% 10% 66%

As an example, let’s use matrix methods to fit the same regression.

To these 10 observations, fit the model: Y = β1 + β2X2 + β3X3 + ε.

The first step is to list the so-called design matrix, in which the first column consists of
ones, corresponding to the constant term in the model, and the remainder of
each row is the values of the independent variables for an observation.
Thus for example, the second observation has X2 = 90 and X3 = 10, and therefore the second
row of the design matrix is (1, 90, 10).

The design matrix is called X. A column vector of the dependent variable, which in this case is the loss ratio by agent, is called Y.

(1 100 0) (75)
(1 90 10) (78)
(1 70 0) (71)
(1 65 10) (73)
X= (1 50 50) Y = (79)
(1 50 35) (75)
(1 40 10) (65)
(1 30 70) (82)
(1 15 20) (72)
(1 10 10) (66)

Note that the fourth row of X is: 1, X2,4 , X3,4. So the subscripts of the elements of the design
matrix do not follow the usual convention of row followed by column.

145
See Appendix 4.3 in Econometric Models and Economic Forecasts.

β is a column vector of the coefficients to be fit.


( β1 )
β = ( β2 )
( β3 )
In matrix form, the model equations are: Y = Xβ + ε.

For example the second row corresponds to the second observation.


Y2 = β1 + β2X2,2 + β3X3,2 + ε, or 78 = β1 + 90β2 + 10β3 + ε.

X’ is the transpose of the matrix X, with the rows and columns interchanged:
(1 1 1 1 1 1 1 1 1 1)
X’ = (100 90 70 65 50 50 40 30 15 10)
(0 10 0 10 50 35 10 70 20 10)

The next step is to multiply X’ times X:

(10 520 215)


X’X = (520 35050 8700)
(215 8700 9425)

For example, the third element of the second row is:


(100)(0) + (90)(10) + (70)(0) + (65)(10) + ( 50)(50) + (50)(35) + (40)(10) + (30)(70) + (15)(20) +
(10)(10 ) = 8700.

X’X is called the cross product matrix. The cross product matrix is always square and
symmetric. The 1,1 element is the number of observations. The other elements in the first row
and first column are sums of the independent variables. The remaining elements are of the
form Σ Xji Xli.146

We need to take the inverse of X’X.147

          (.787979      -.00937724    -.00931922  )
(X’X)-1 = (-.00937724    .000148603    .0000767383)
          (-.00931922    .0000767383   .000247852 )

For example, the second element of the first row is: -{(520)(9425) - (215)(8700)}/ 323176250 =
-.00937724, where 323,176,250 is the determinant of the cross product matrix.148

146
This is the dot product of the vectors Xj and Xl.
147
In general, the cross product matrix will have an inverse unless two or more of the independent variables are
linearly related. Such multicollinearity will be discussed subsequently. If the cross product matrix does not have an
inverse, then one can not perform linear regression using all of these independent variables. In that case, one
would need to drop one or more of the independent variables from the originally proposed model equation.
148
The determinant is: (10){(35050)(9425) - (8700)(8700)} - (520){(520)(9425) - (215)(8700)} +
(215){(520)(8700) - (215)(35050)} = 323,176,250.

Except in special cases with lots of zeros in the matrix, I do not believe you should need to take the inverse of a three by three matrix, or larger, on the exam!

We need to multiply the transpose of the design matrix times the column vector of dependent
variables.
( 736 )
X’Y = (38735)
(16575)
The element in the first row of X’Y is the sum of the loss ratios. The element in the second row
is Σ X2iYi, the sum of the product of the portion in New York and the loss ratio.

It turns out that the fitted regression coefficients are: (X’X)-1X’Y.

              (.787979      -.00937724    -.00931922  ) ( 736 )     (62.2596)
(X’X)-1 X’Y = (-.00937724    .000148603    .0000767383) (38735)  =  (.126434)
              (-.00931922    .0000767383   .000247852 ) (16575)     (.221667)

β^1 = 62.3, β^2 = .126, β^3 = .222, matching the result obtained previously.

In general, the fitted regression coefficients are:149


β^ = (X’X)-1 X’Y.
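A minimal Python/numpy sketch of this matrix computation for the agents example (the column of ones corresponds to the intercept; the variable names are mine):

import numpy as np

X = np.column_stack([np.ones(10),
                     [100, 90, 70, 65, 50, 50, 40, 30, 15, 10],
                     [0, 10, 0, 10, 50, 35, 10, 70, 20, 10]])   # design matrix
Y = np.array([75, 78, 71, 73, 79, 75, 65, 82, 72, 66.])

XtX = X.T @ X                                # cross product matrix
beta_hat = np.linalg.inv(XtX) @ X.T @ Y
print(beta_hat)                              # approximately [62.26, 0.1264, 0.2217]

resid = Y - X @ beta_hat
s2 = (resid**2).sum() / (10 - 3)             # about 5.63
print(s2 * np.linalg.inv(XtX))               # variance-covariance matrix of the coefficients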

Variance-Covariance Matrix:

As computed in the previous section the estimated variance of the regression is:
s2 = ESS/(N - k) = (TSS - RSS)/(10 - 3) = (264.4 - 225.01)/7 = 5.627.

The variance-covariance matrix of the estimated coefficients is:

Var[β^] = s2(X’X)-1 = (5.627) ×
(.787979      -.00937724    -.00931922  )
(-.00937724    .000148603    .0000767383)
(-.00931922    .0000767383   .000247852 )

          ( 4.434     -.05277    -.05244 )
Var[β^] = (-.05277     .000836    .000432)
          (-.05244     .000432    .001395)

Note that the above matrix, as with all variance-covariance matrices, is symmetric.

Var[β^1] = 4.434, Var[β^2] = .000836, Var[β^3] = .001395.
Cov[β^1, β^2] = -.05277, Cov[β^1, β^3] = -.05244, Cov[β^2, β^3] = .000432.
149
See Equation A4.12 in Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.

The matrix of correlations of β^ is:

           ( 1      -.867   -.667)
Corr[β^] = (-.867    1       .400)
           (-.667    .400    1   )

For example, Corr[β^1, β^2] = -.05277/√((4.434)(.000836)) = -.867.

Model with One Slope and No Intercept, Variance-Covariance Matrix:

For the one variable model (slope and no intercept), the design matrix has a single column
containing X. In other words, X is a column vector. X’X = ΣXi2. (X’X)-1 = 1/ΣXi2.
Var[β^] = s2(X’X)-1 = s2/ΣXi2, matching the result shown in a previous section.

Two Variable Model, Variance-Covariance Matrix:

For the two variable model (slope and intercept), the design matrix has 1s in the first column
and Xi in the second column:

(1 X1)
X= (1 X 2)
(1 X 3)
(... ... )

X’ = (1 1 1 .... )
(X1 X2 X3 .... )

X’X = ( N     ΣXi  )
       ( ΣXi   ΣXi2 )

(X’X)-1 = (  ΣXi2   -ΣXi ) / {N ΣXi2 - (ΣXi)2}.
          ( -ΣXi     N   )

Now, N ΣXi2 - (ΣXi)2 = N{ΣXi2 - N X̄2} = NΣ(Xi - X̄)2 = NΣxi2.

Therefore, the variance-covariance matrix of the fitted parameters is:

s2(X’X)-1 = s2 (  ΣXi2   -ΣXi ) / {NΣxi2}.
               ( -ΣXi     N   )

Therefore, Var[α^] = s2ΣXi2/(NΣxi2), Cov[α^, β^] = -s2ΣXi/(NΣxi2) = -s2X̄/Σxi2, and
Var[β^] = s2N/(NΣxi2) = s2/Σxi2. This matches the formulas discussed previously.

Three Variable Model, Variance-Covariance Matrix:*

For the three variable model (two slopes and an intercept), the design matrix has 1s in the first
column, X2 in the second column, and X3 in the third column:

(1 X 21 X31)
X= (1 X 22 X32)
(1 X 23 X33)
(... ... ... )

(1 1 1 .... )
X’ = (X21 X 22 X 23 .... )
(X31 X 32 X 33 .... )

X’X = ( N      ΣX2i       ΣX3i     )
      ( ΣX2i   ΣX2i2      ΣX2iX3i  )
      ( ΣX3i   ΣX2iX3i    ΣX3i2    )

(X’X)-1 = ( ΣX2i2 ΣX3i2 - (ΣX2iX3i)2    ΣX3iΣX2iX3i - ΣX2iΣX3i2    ΣX2iΣX2iX3i - ΣX3iΣX2i2 )
          ( ΣX3iΣX2iX3i - ΣX2iΣX3i2     NΣX3i2 - (ΣX3i)2           ΣX2iΣX3i - NΣX2iX3i     ) / D
          ( ΣX2iΣX2iX3i - ΣX3iΣX2i2     ΣX2iΣX3i - NΣX2iX3i        NΣX2i2 - (ΣX2i)2        )

Where D is the determinant of X’X.

D = N{ΣX2i2ΣX3i2 - (ΣX2iX3i)2} - ΣX2i{ΣX2iΣX3i2 - ΣX3iΣX2iX3i} + ΣX3i{ΣX2iΣX2iX3i - ΣX3iΣX2i2} =

NΣX2i2ΣX3i2 - (ΣX2i)2ΣX3i2 - (ΣX3i)2ΣX2i2 + (ΣX2i)2(ΣX3i)2/N
- {N(ΣX2iX3i)2 - 2ΣX2iΣX3iΣX2iX3i + (ΣX2i)2(ΣX3i)2/N} =
N{ΣX2i2 - (ΣX2i)2/N}{ΣX3i2 - (ΣX3i)2/N} - N{ΣX2iX3i - ΣX2iΣX3i/N}2 =
N{Σx2i2Σx3i2 - (Σx2ix3i)2} = NΣx2i2Σx3i2(1 - rX2X32).
2 3

Where rX2X3 = correlation of X2 and X3 = Σx2ix3i/√(Σx2i2Σx3i2).

The variance-covariance matrix of the fitted parameters is s2(X’X)-1.

Therefore, Var[β̂1] = s2{ΣX2i2ΣX3i2 - (ΣX2iX3i)2}/{NΣx2i2Σx3i2(1 - rX2X32)}.

Var[β̂2] = s2{NΣX3i2 - (ΣX3i)2}/{NΣx2i2Σx3i2(1 - rX2X32)} = s2NΣx3i2/{NΣx2i2Σx3i2(1 - rX2X32)}
= s2/{Σx2i2(1 - rX2X32)}.

Var[β̂3] = s2{NΣX2i2 - (ΣX2i)2}/{NΣx2i2Σx3i2(1 - rX2X32)} = s2/{Σx3i2(1 - rX2X32)}.

Cov[β̂1, β̂2] = s2{ΣX3iΣX2iX3i - ΣX2iΣX3i2}/{NΣx2i2Σx3i2(1 - rX2X32)}.

Cov[β̂1, β̂3] = s2{ΣX2iΣX2iX3i - ΣX3iΣX2i2}/{NΣx2i2Σx3i2(1 - rX2X32)}.

Cov[β̂2, β̂3] = -s2{NΣX2iX3i - ΣX2iΣX3i}/{NΣx2i2Σx3i2(1 - rX2X32)} =
-s2NΣx2ix3i/{NΣx2i2Σx3i2(1 - rX2X32)} = -s2rX2X3/{(1 - rX2X32)√(Σx2i2Σx3i2)}.

This matches the formulas discussed previously for the three variable model.
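
These closed-form expressions can be spot-checked numerically. The sketch below makes up a
small three variable data set (the numbers are purely illustrative, not from the text) and
compares Var[β̂2] taken from s2(X’X)-1 with the formula s2/{Σx2i2(1 - rX2X32)}.

    import numpy as np

    # Made-up illustrative data: 6 observations, intercept plus X2 and X3.
    X2 = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 9.0])
    X3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 8.0])
    Y  = np.array([3.0, 4.0, 8.0, 9.0, 14.0, 18.0])
    N, k = len(Y), 3

    X = np.column_stack([np.ones(N), X2, X3])   # design matrix
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ Y)
    resid = Y - X @ beta_hat
    s2 = resid @ resid / (N - k)                # s^2 = ESS/(N - k)

    # Var[beta2-hat] from the matrix form:
    var_b2_matrix = s2 * XtX_inv[1, 1]

    # Var[beta2-hat] from the closed form s2/{Σx2i2 (1 - r^2)}:
    x2 = X2 - X2.mean()
    x3 = X3 - X3.mean()
    r = (x2 @ x3) / np.sqrt((x2 @ x2) * (x3 @ x3))
    var_b2_formula = s2 / ((x2 @ x2) * (1 - r**2))

    print(var_b2_matrix, var_b2_formula)        # the two values should agree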

The Regression Passes Through the Point Where All Variables Take on Their Mean:*

As has been discussed previously, in the two variable regression model (one independent
variable plus intercept) the fitted line passes through the point ( X , Y ). A similar result holds for
multiple regressions.

^ ^
β = (X’X)-1X’Y. ⇒ (X’X) β = X’Y.

For concreteness, let us assume 3 independent variables. Then the design matrix is:

(1 X 21 X31 X41)
X= (1 X 22 X 32 X42)
(1 X 23 X 33 X43)
(... ... ... ... )

     ( 1    1    1    .... )
X’ = ( X21  X22  X23  .... )
     ( X31  X32  X33  .... )
     ( X41  X42  X43  .... )

           ( N     ΣX2i      ΣX3i      ΣX4i    )
Then X’X = ( ΣX2i  ΣX2i2     ΣX2iX3i   ΣX2iX4i )
           ( ΣX3i  ΣX2iX3i   ΣX3i2     ΣX3iX4i )
           ( ΣX4i  ΣX2iX4i   ΣX3iX4i   ΣX4i2   )

      ( ΣYi    )
X’Y = ( ΣYiX2i )
      ( ΣYiX3i )
      ( ΣYiX4i )

^
Thus the first component of the matrix equation (X’X) β = X’Y is:
^ ^ ^ ^
N β1 + ΣX2i β 2 + ΣX3i β 3 + ΣX4i β 4 = ΣYi.
^ ^ ^ ^
⇒ β1 + β2 X 2 + β3 X 3 + β4 X 4 = Y .

The fitted regression passes through the point at which all of the variables are equal to their
means. This nice property holds in general for multiple regressions.
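
A quick numerical illustration of this property, using made-up data (the numbers and names
below are mine, purely for illustration):

    import numpy as np

    # Made-up data: intercept plus two independent variables.
    X2 = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
    X3 = np.array([3.0, 1.0, 6.0, 2.0, 9.0])
    Y  = np.array([2.0, 3.0, 9.0, 7.0, 15.0])

    X = np.column_stack([np.ones(len(Y)), X2, X3])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # solves (X'X) beta = X'Y

    # The fitted equation evaluated at the means of the variables equals Y-bar.
    at_means = beta_hat[0] + beta_hat[1] * X2.mean() + beta_hat[2] * X3.mean()
    print(at_means, Y.mean())                      # the two numbers agree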

Division of the Variance into Two Pieces in Matrix Form:*

One can also write the various sums of squares in matrix form:

TSS = Y’Y - N Y 2.
^ ^ ^ ^
RSS = β‘X’X β - N Y 2 = β‘X’X(X’X)-1X’Y - N Y 2 = β‘X’Y - N Y 2 .
^ ^ ^
ESS = Y’Y - β‘X’X β = Y’Y - β‘X’Y.

For the agents loss ratio example:

TSS = Y’Y - N Y 2 = 54434 - 10(73.6)2 = 264.4.150

                                               ( 736 )
RSS = β̂’X’Y - N Y 2 = (62.2596 .126434 .221667)(38735) - 10(73.6)2 = 54394.6 - 54169.6 = 225.0.
                                               (16575)

ESS = Y’Y - β̂’X’Y = 54434 - 54394.6 = 39.4.

s2 = ESS/(N - k) = 39.4/(10 - 3) = 5.63.

Matching the previous results, subject to rounding.
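
The same arithmetic can be done in a few lines of Python, using the quantities quoted above
for the loss ratio example (Y’Y = 54434, N = 10, and the β̂ and X’Y from earlier in this
section). This is only a check of the numbers, not part of the syllabus reading.

    import numpy as np

    YtY = 54434.0
    N, k = 10, 3
    Ybar = 736.0 / N                      # = 73.6, from the first element of X'Y
    beta_hat = np.array([62.2596, 0.126434, 0.221667])
    XtY = np.array([736.0, 38735.0, 16575.0])

    TSS = YtY - N * Ybar**2               # total sum of squares
    RSS = beta_hat @ XtY - N * Ybar**2    # regression sum of squares
    ESS = YtY - beta_hat @ XtY            # error sum of squares
    s2 = ESS / (N - k)

    print(round(TSS, 1), round(RSS, 1), round(ESS, 1), round(s2, 2))
    # approximately 264.4, 225.0, 39.4, 5.63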

150
Note that ΣYi = 736, which can be computed directly, but is also the first element of X’Y. Y = 736/10 = 73.6.

Hat Matrix:*151

^ ^
Y = X β = X (X’X)-1X’Y = HY,
where H = X (X’X)-1X’, is called the hat matrix.

H’ = {X (X’X)-1X’}’ = (X’)’ {(X’X)-1}’ X’ = X {(X’X)’}-1X’ = X (X’X)-1X’ = H.152


Thus H is symmetric.

H2 = X (X’X)-1X’X (X’X)-1X’ = X (X’X)-1X’ = H.153

HX = X (X’X)-1X’X = X. ⇒ (I - H)X = 0.

^
ε^ = Y - Y = Y - HY = (I - H)Y.
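
A small sketch, on made-up data, of the hat matrix and the properties just derived (the data
and names are mine):

    import numpy as np

    # Made-up design matrix (intercept plus one independent variable) and response.
    X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 4.0, 5.0])])
    Y = np.array([2.0, 2.5, 4.0, 4.5, 6.5])

    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix

    print(np.allclose(H, H.T))                # H is symmetric
    print(np.allclose(H @ H, H))              # H is idempotent: H^2 = H
    print(np.allclose(H @ X, X))              # HX = X, so (I - H)X = 0

    Y_hat = H @ Y                             # fitted values
    resid = (np.eye(5) - H) @ Y               # residuals = (I - H)Y
    print(np.round(Y_hat, 3), np.round(resid, 3))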

Covariance Matrix of the Residuals:*

E[ ε^ ] = E[(I - H)Y] = (I - H)E[Y] = (I - H)E[Xβ + ε] = (I - H)E[Xβ] + (I - H)E[ε] = E[(I - H)X]β + (I - H)0 =


E[0]β + 0 = 0.154

Therefore, ε^ - E[ ε^ ] = (I - H)Y - 0 = (I - H)Y.


Using the fact that (I - H)X = 0, this can be rewritten as:
ε^ - E[ ε^ ] = (I - H)Y - (I - H)Xβ = (I - H)(Y - Xβ) = (I - H)ε.155

( ε^ - E[ ε^ ])( ε^ - E[ ε^ ])’ = (I - H)ε{(I - H)ε}’ = (I - H)εε’(I - H)’ = (I - H)εε’(I’ - H’) = (I - H)εε’(I - H).

Therefore, the covariance matrix of the residuals is:


V( ε^ ) = E[( ε^ - E[ ε^ ])( ε^ - E[ ε^ ])’] = E[(I - H)εε’(I - H)] = (I - H)E[εε’](I - H).

However, E[εi] = 0, the εi each have variance σ2, and are mutually independent, and therefore
E[εiεj] = σ2δij. ⇒ E[εε’] = σ2I.

⇒ V( ε^ ) = (I - H)σ2I(I - H) = (I - H)(I - H)σ2 = (I - 2H + H2)σ2 = (I - H)σ2.

Thus the covariance matrix of the residuals is (I - H)σ2.156


151
See Section 8.1 of Applied Regression Analysis by Draper and Smith.
152
The transpose of a matrix reverses the rows and columns.
The transpose of the product of matrices is the product of the transposes in the opposite order.
Transpose of X’X is X’X. Also, (M-1)’ = (M’)-1.
153
Thus H is idempotent.
154
This is a vector result. E[ ^εi ] = 0 for each i.
155
Recall that in matrix form the model is: Y = Xβ + ε.
156
This result used the assumptions of homoscedasticity and independent errors.

Exercise: For a linear regression through the origin, Yi = βXi + εi, determine the covariance
matrix of the residuals.
[Solution: The design matrix has only one column consisting of the Xi. X’X = ΣXk2.
H = X (X’X)-1X’ = X X’/ΣXk2. Hij = XiXj /ΣXk2.

(I - H)σ2 = (δij - XiXj /ΣXk2)σ2. Var[ ^εi ] = σ2(1 - Xi2/ΣXk2). Cov[ ^εi , ε^j ] = -σ2XiXj/ΣXk2.
Comment: This matches a result in a previous section. H is an N by N matrix.]

Expected Value of ESS:*157

As demonstrated previously, E[ ^εi ] = 0. Therefore, E[ ^εi 2] = Var[ ^εi ].

E[ESS] =E[Σ ^εi 2] = ΣE[ ^εi 2] = ΣVar[ ^εi ] = Tr[V[ ε^ ]].158

E[ESS] = Tr[V[ ε^ ]] = Tr[(I - H)σ2] = (Tr[I] - Tr[H])σ2.

I is the N by N identity matrix with trace N.

We will use the fact that, assuming the matrices A and B are compatible in size:
Tr[AB] = Σi (AB)ii = Σi,j aij bji = Σj (BA)jj = Tr[BA].

Tr[H] = Tr[X (X’X)-1X’] = Tr[X’X (X’X)-1] = Tr[Ik] = k.159

E[ESS] = (Tr[I] - Tr[H])σ2 = (N - k)σ2.

Therefore, s2 = ESS/(N - k) is an unbiased estimator of σ2.

Note that this result did not depend on the distributional form of the errors.
This result did depend on the errors each having mean of zero and variance of σ2, and that the
errors be mutually independent.
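
The unbiasedness of s2 can also be checked by simulation. In the sketch below the design
matrix, true coefficients, and σ2 are all made up for illustration; the average of the simulated
s2 values should come out close to σ2.

    import numpy as np

    rng = np.random.default_rng(1)
    N, k, sigma2 = 20, 3, 4.0
    X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
    beta = np.array([1.0, 2.0, -1.0])

    s2_values = []
    for _ in range(20000):
        eps = rng.normal(scale=np.sqrt(sigma2), size=N)   # independent errors, mean 0, variance sigma^2
        Y = X @ beta + eps
        beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
        resid = Y - X @ beta_hat
        s2_values.append(resid @ resid / (N - k))         # s^2 = ESS/(N - k)

    print(np.mean(s2_values))   # should be close to sigma2 = 4.0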

157
See Section 19.9 of Volume 2 of Kendall’s Advanced Theory of Statistics, by Stuart and Ord.
158
Where the trace of a matrix is the sum of the elements along its diagonal.
159
Using the above result with A = X (X’X)-1 and B = X’.
Note that the design matrix X is an N by k matrix, so that X’X, the cross product matrix is k by k.
Therefore, X’X(X’X)-1 is the k by k identity matrix.

Problems:

Use the following information for a regression model for the next 4 questions:
      ( 1133.200  )
X’Y = (12273.244  )
      (46085.1007 )
      ( 1738.0862 )

          (  .52648584       .0070505743   -0.00080896501   -.370750118  )
(X’X)-1 = (  .0070505743     0.0041832832  -0.00071877348   -0.014609706 )
          ( -0.00080896501  -0.00071877348  0.00071110259   -0.012779797 )
          ( -.370750118     -0.014609706   -0.012779797      0.69152462  )

Y’Y = 73990.3. N = 20.

19.1 (4 points) What are the fitted coefficients?

19.2 (2 points) What is the estimated variance of the regression?

19.3 (2 points) What is the covariance matrix of the fitted parameters?

19.4 (3 points) What are R2 and R̄2?

19.5 (2 points) You fit the multiple regression model Yi = β1 + β2X2i + β3X3i + εi to a set of 32
observations. You determine:
          ( 1.695     -0.00773    -0.0571   )
(X’X)-1 = (-0.00773    0.0000459   0.0001125)
          (-0.0571     0.0001125   0.00428  )
Total Sum of Squares (TSS) = 4,799,790.
Regression Sum of Squares (RSS) = 4,283,063.
^ ^
Determine the estimated standard error of: 100 β 2 + 10 β 3.
(A) 110 (B) 120 (C) 130 (D) 140 (E) 150

19.6 (1 point) Demonstrate that the matrix form of regression matches the equation for the
fitted slope of the regression model with no intercept.

19.7 (2 points) A multiple regression has been fit to 25 observations:


^
Y = 11 - 4X2 + 7X3 - 12X4.
Σ X2i = 148. Σ X3i = 201. Σ X4i = 82.
Determine Y .

19.8 (3 points) Demonstrate that the matrix form of regression matches the equations for the
fitted slope and intercept of the two variable regression model.
Use the following information for the next five questions:


You fit the following model to four observations:
Yi = β2X2i + β3X3i + εi, i = 1, 2, 3, 4
You are given:
i X2i X3i Yi
1 2 4 10
2 5 8 8
3 7 10 14
4 10 14 20

^
19.9 (2 points) Determine β 2 .
A. -2.0 B. -1.8 C. -1.6 D. -1.4 E. -1.2

^
19.10 (2 points) Determine β 3.
A. 2.3 B. 2.5 C. 2.7 D. 2.9 E. 3.1

^
19.11 (2 points) Determine Var[ β 2 ].
A. 6 B. 8 C. 10 D. 12 E. 14

^
19.12 (2 points) Determine Var[ β 3].
A. 1 B. 3 C. 5 D. 7 E. 9

^ ^
19.13 (2 points) Determine Cov[ β 2 , β 3].
A. -10 B. -8 C. -6 D. -4 E. -2

* 19.14 (4 points) For a linear regression, Yi = α + βXi + εi, determine the covariance matrix of
the residuals.

* 19.15 (4 points) In the previous question, if there are a total of 5 observations, with X1 = 1,
X2 = 2, X3 = 3, X4 = 4, and X5 = 5, determine the covariance matrix of the residuals.

19.16 (Course 120 Sample Exam #1, Q.4) (2 points) You fit the multiple regression
model Yi = β1 + β2X2i + β3X3i + εi to a set of data. You determine:
( 6.1333 -0.0733 -0.1933)
(X’X)-1 = (-0.0733 0.0087 -0.0020)
(-0.1933 -0.0020 0.0087)
s2 = 280.1167.
^ ^
Determine the estimated standard error of β 2 - β 3.
(A) 1.9 (B) 2.2 (C) 2.5 (D) 2.8 (E) 3.1

19.17 (Course 120 Sample Exam #3, Q.6) (2 points) You fit the multiple regression
model Yi = β1 + β2X2i + β3X3i + β4X4i + εi to 30 observations. You are given:
Y’Y = 7995.
          ( 2.8195   -0.0286   -0.0755   -0.0263)
(X’X)-1 = (-0.0286    0.0027    0.0010   -0.0014)
          (-0.0755    0.0010    0.0035   -0.0010)
          (-0.0263   -0.0014   -0.0010    0.0032)

      ( 261.5 )        ( 5.22)
X’Y = (4041.5 )    β̂ = ( 1.62)
      (6177.5 )        ( 0.21)
      (5707.0 )        (-0.45)
Determine the length of the symmetric 95% confidence interval for β3.
(A) 0.3 (B) 0.6 (C) 0.7 (D) 1.5 (E) 1.8

19.18 (4, 11/03, Q.36) (2.5 Points)


For the model Yi = β1 + β2X2i + β3X3i + β4X4i + εi, you are given:
(i) N = 15
(ii) (13.66 -0.33 2.05 -6.31)
(X’X)-1 = (-0.33 0.03 0.11 0.00)
( 2.05 0.11 2.14 -2.52)
(-6.31 0.00 -2.52 4.32)
(iii) ESS = 282.82.
^ ^
Calculate the standard error of β 3 - β 2 .
(A) 6.4 (B) 6.8 (C) 7.1 (D) 7.5 (E) 7.8

19.19 (4, 11/04, Q.3) (2.5 points) You are given:


(i) Y is the annual number of discharges from a hospital.
(ii) X is the number of beds in the hospital.
(iii) Dummy D is 1 if the hospital is private and 0 if the hospital is public.
(iv) The proposed model for the data is Y = β1 + β2X + β3D + ε.
(v) To correct for heteroscedasticity, the model Y/X = β1/X + β2 + β3D/X + ε/X is fitted to
^ ^ ^
N = 393 observations, yielding β 2 = 3.1, β1 = -2.8 and β 3 = 28.
(vi) For the fit in (v) above, the matrix of estimated variances and covariances
^ ^ ^
of β 2 , β1 and β 3 is:
(0.0035 -0.1480 0.0357)
(-0.1480 21.6520 -16.9185)
(0.0357 -16.9185 38.8423)
Determine the upper limit of the symmetric 95% confidence interval for the difference
between the mean annual number of discharges from private hospitals with 500 beds and the
mean annual number of discharges from public hospitals with 500 beds.
(A) 6 (B) 31 (C) 37 (D) 40 (E) 67

19.20 (2 points) In the previous question, determine the lower limit of the symmetric 99%
confidence interval for the difference between the mean annual number of discharges from
private hospitals with 300 beds and the mean annual number of discharges from public
hospitals with 400 beds.
(A) -311 (B) -309 (C) -307 (D) -305 (E) -303

19.21 (VEE-Applied Statistics Exam, 8/05, Q.2) (2.5 points) You are given:
(i) Y is the annual number of discharges from a hospital.
(ii) X is the number of beds in the hospital.
(iii) Dummy variable D is 1 if the hospital is private and 0 if the hospital is public.
(iv) The classical three-variable linear regression model β1 + β2X + β3D + ε is fitted to
N cases using ordinary least squares.
^ ^ ^
(v) The matrix of estimated variances and covariances of β1, β 2 , and β 3 is:
(1.89952 -0.00364 -0.82744)
(-0.00364 0.00001 -0.00041)
(-0.82744 -0.00041 2.79655)
^ ^
Determine the standard error of β1 + 600 β 2 .
(A) 1.06 (B) 1.13 (C) 1.38 (D) 1.90 (E) 2.35
HCMSA-F06-Reg-E, Mahler’s Guide to Regression, 7/11/06, Page 181

Section 20, Tests of Slopes, Multiple Regression Model


One can test hypotheses about the slopes of multiple regression models in a similar manner to
that discussed for the two-variable model. One can apply the t-test to individual parameters in
the multiple-variable case in the same manner. The number of degrees of freedom is N - k,
where k is the number of variables including the intercept. The t-test is a special case of the
F-Test, which can be used to test more than one slope simultaneously.

An Example of a Four Variable Regression Model:

In order to give a concrete example to discuss, assume the following 8 observations of three
independent variables, four variables when we include the intercept in the regression, and one
dependent variable.

X2 X3 X4 Y
-2 1 -4 6
1 -1 0 8
3 4 4 33
6 -4 8 14
11 0 12 40
15 8 16 118
17 -8 20 2
20 -6 24 61

To these 8 observations, fit the model:160 Y = β1 + β2X2 + β3X3 + β4X4 + ε.

(1 -2 1 -4) ( 6 )
(1 1 -1 0) ( 8 )
(1 3 4 4) ( 33 )
X= (1 6 -4 8) Y = ( 14 )
(1 11 0 12 ) ( 40 )
(1 15 8 16 ) (118)
(1 17 -8 20) ( 2 )
(1 20 -6 24) ( 61 )

(8 71 -6 80 ) (282 )
X’X = (71 1085 -151 1260) X’Y = (3643)
(-6 -151 198 -196) (636 )
(80 1260 -196 1472) (4092)

160
With the aid of a computer. Due to the number of observations and independent variables, calculating the fitted
coefficients would be much too time consuming to do on the exam. However, questions involving for example the
tests of slopes can be asked. In any case, this example can serve as a useful review of the concepts discussed
previously.

          ( 0.378621   -0.162062    0.005565    0.118885 )
(X’X)-1 = (-0.162062    0.276362   -0.022578   -0.230759 )
          ( 0.005565   -0.022578    0.007869    0.020071 )
          ( 0.118885   -0.230759    0.020071    0.194415 )

(6.3974)
^
β = (X’X)-1X’Y = (2.4626)
(6.4560)
(1.1839)

^ ^ ^ ^
β1 = 6.3974, β 2 = 2.4626, β 3 = 6.4560, β 4 = 1.1839.

For the fitted model, for example the first predicted value is:
^ ^ ^ ^ ^
Y1 = β1 + β 2 X2,1 + β 3X3,1 + β 4 X4,1 = 6.3974 + (2.4626)(-2) + (6.4560)(1) + (1.1839)(-4) = 3.19.

The residuals of this regression, ^εi , are:


X2     X3     X4     Yi      Ŷi      ε̂i = Yi - Ŷi      Ŷi - Y      Yi - Y
-2 1 -4 6 3.19 2.81 -32.06 -29.25
1 -1 0 8 2.40 5.60 -32.85 -27.25
3 4 4 33 44.34 -11.34 9.09 -2.25
6 -4 8 14 4.82 9.18 -30.43 -21.25
11 0 12 40 47.69 -7.69 12.44 4.75
15 8 16 118 113.93 4.07 78.68 82.75
17 -8 20 2 20.29 -18.29 -14.96 -33.25
20 -6 24 61 45.33 15.67 10.08 25.75

Y = (6 + 8 + 33 + 14 + 40 + 118 + 2 + 61)/8 = 35.25.

^
Error Sum of Squares = ESS = residual variation ≡ Σ ^εi 2 = Σ(Yi - Yi )2 =
2.812 + 5.602 + (-11.34)2 + 9.182 + (-7.69)2 + 4.072 + (-18.29)2 + 15.672 = 908.1.

^
Regression Sum of Squares = RSS = explained variation ≡ Σ( Yi - Y )2 =
(-32.06)2 + (-32.85)2 + 9.092 + (-30.43)2 + 12.442 + 78.682 + (-14.96)2 + 10.082 = 9786.

Total Sum of Squares = TSS ≡ Σ(Yi - Y )2 =


(-29.25)2 + (-27.25)2 + (-2.25)2 + (-21.25)2 + 4.752 + 82.752 + (-33.25)2 + 25.752 =10693.5.

ESS + RSS = 908 + 9786 = 10694 = TSS.

R2 ≡ RSS/TSS = explained variation / total variation = 9786/10694 = .915.



The estimated variance of the regression is:


s2 ≡ sample variance of ^εi = Σ ^εi 2/ (N - k) = ESS/(N - k) =
{2.812 + 5.602 + (-11.34)2 + 9.182 + (-7.69)2 + 4.072 + (-18.29)2 + 15.672}/(8 - 4) =
908.1/4 = 227.0.

Sample Variance of Y = Σ(Yi - Y )2/(N - 1) = TSS/(N-1) = 10693.5 /(8 - 1) = 1527.6.


R̄2 = Corrected R2 ≡ 1 - s2/Var[Y] = 1 - 227.0/1527.6 = .851.
Note that 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .915)(8 - 1)/(8 - 4) = (.085)(7/4) = .149.

The variance-covariance matrix of the estimated coefficients is:


                             ( 0.378621   -0.162062    0.005565    0.118885 )
Var[β̂] = s2(X’X)-1 = (227.0) (-0.162062    0.276362   -0.022578   -0.230759 )
                             ( 0.005565   -0.022578    0.007869    0.020071 )
                             ( 0.118885   -0.230759    0.020071    0.194415 )

         ( 85.96   -36.79    1.26    26.99 )
Var[β̂] = (-36.79    62.75   -5.13   -52.39 )
         (  1.26    -5.13    1.79     4.56 )
         ( 26.99   -52.39    4.56    44.14 )

Note that the above matrix, as with all variance-covariance matrices, is symmetric.

^ ^ ^ ^
Var[ β1] = 85.96, Var[ β 2 ] = 62.75, Var[ β 3] = 1.79, Var[ β 4 ] = 44.14.
^ ^ ^ ^ ^ ^
Cov[ β1, β 2 ] = -36.79, Cov[ β1, β 3] = 1.26, Cov[ β1, β 4 ] = 26.99,
^ ^ ^ ^ ^ ^
Cov[ β 2 , β 3] = -5.13, Cov[ β 2 , β 4 ] = -52.39, Cov[ β 3, β 4 ] = 4.56.

^ ^
For example, Corr[ β1, β 2 ] = -36.79/√{(85.96)(62.75)} = -.50.

The matrix of correlations of β̂ is:

          (  1      -.50     .10     .44  )
Corr[β̂] = ( -.50     1      -.48    -.995 )
          (  .10    -.48     1       .51  )
          (  .44    -.995    .51     1    )

The standard errors of the estimated regression coefficients are:


sβ̂1 = √Var[β̂1] = √85.96 = 9.27,   sβ̂2 = √Var[β̂2] = √62.75 = 7.92,
sβ̂3 = √Var[β̂3] = √1.79 = 1.34,   sβ̂4 = √Var[β̂4] = √44.14 = 6.64.
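
All of the numbers in this worked example can be reproduced from the 8 observations above
with a few lines of Python; the sketch below is only a check of the quoted results, using my
own variable names.

    import numpy as np

    X2 = np.array([-2, 1, 3, 6, 11, 15, 17, 20], dtype=float)
    X3 = np.array([ 1,-1, 4,-4,  0,  8, -8, -6], dtype=float)
    X4 = np.array([-4, 0, 4, 8, 12, 16, 20, 24], dtype=float)
    Y  = np.array([ 6, 8,33,14, 40,118,  2, 61], dtype=float)
    N, k = 8, 4

    X = np.column_stack([np.ones(N), X2, X3, X4])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ Y)              # approximately [6.40, 2.46, 6.46, 1.18]

    resid = Y - X @ beta_hat
    ESS = resid @ resid                         # approximately 908
    TSS = ((Y - Y.mean())**2).sum()             # approximately 10694
    RSS = TSS - ESS                             # approximately 9786
    R2 = RSS / TSS                              # approximately .915
    s2 = ESS / (N - k)                          # approximately 227

    var_beta = s2 * XtX_inv
    std_err = np.sqrt(np.diag(var_beta))        # approximately [9.27, 7.92, 1.34, 6.64]
    t_stats = beta_hat / std_err

    print(np.round(beta_hat, 4), round(R2, 3))
    print(np.round(std_err, 2), np.round(t_stats, 2))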

The More General the Model, the Smaller the ESS:

The above model with β2 fixed at zero is a special case of the above model, with X2 not
entering into the model. The fitted regression with β̂1 = 6.3974, β̂2 = 2.4626, β̂3 = 6.4560,
β̂4 = 1.1839, was determined so as to have the least sum of squared errors; in other words, for
the given observations, it has the smallest ESS over all possible values of β1, β2, β3, and β4.
A more restricted model with β2 fixed at zero has an ESS that cannot be larger than that of the
unrestricted model.

Minimizing over a larger set, we do at least as well as, and usually better than, minimizing
over a subset.161 For example, Susan is the youngest employee in the actuarial department at
the Regressive Insurance Company. We know that the youngest employee of the whole
Regressive Insurance Company is the same age or younger than Susan, because the
actuarial department is a subset of the Regressive Insurance Company.

So in general, adding additional variables to a linear regression model creates a more general
model, and will decrease the ESS (on rare occasions the ESS will stay the same).162 The
question is whether this improvement in ESS is significant. As for the two-variable model, this
can be determined by t-tests and F-tests, as will be discussed.

Testing Individual Coefficients:

As in the two-variable case, we can use the t-test in order to test the hypothesis that an
individual coefficient is zero. We have N - k = 8 - 4 = 4 degrees of freedom.

The t-statistics are:163


^
β1 / s ^ = 6.3974/9.27 = .690 ⇒ p-value = .528.
β1
^
β 2 / s ^ = 2.4626/7.92 = .311 ⇒ p-value = .771.
β2
^
β 3 / sβ^ = 6.4560/1.34 = 4.83 ⇒ p-value = .0085.
3

161
This is similar to an idea covered on Exam 4/C. The maximum likelihood Gamma has to have a likelihood at least as
large as the maximum likelihood Exponential fit to the same data, since the Exponential Distribution is a special case
of the Gamma Distribution with alpha = 1.
162
Since TSS = RSS + ESS, and the total sum of squares depends only on the data, not the model, as we add
additional variables to a regression model, the RSS will increase, (on rare occasions the RSS will stay the same.)
163
p-values obtained via computer, allowing more accuracy than using the t-table.

^
β 4 / sβ^ = 1.1839/6.64 = .178 ⇒ p-value = .867.
4

The critical value at 10% (two-sided test) for the t-distribution for 4 degrees of freedom is 2.132.
Since .690 < 2.132, we do not reject at 10% the hypothesis that β1 = 0.
Since .311 < 2.132, we do not reject at 10% the hypothesis that β2 = 0.
Since .178 < 2.132, we do not reject at 10% the hypothesis that β4 = 0.

The critical value at 1% (two-sided test) for the t-distribution for 4 degrees of freedom is 4.604.
Since 4.83 > 4.604, we reject at 1% the hypothesis that β3 = 0. Saying the same thing
somewhat differently, for 4 degrees of freedom and a 1% significance level, the critical region
is |t| > 4.604. Since 4.83 is in the critical region, we reject at 1% (two-sided test) the hypothesis
that β3 = 0.

As discussed previously, in general, if the test statistic is in the critical region or rejection
region, then we reject the null hypothesis. The critical region depends on the significance level
and the type of test: t-test, F-Test, etc.

Exercise: For a 5 variable regression model fit to 25 observations, what is the critical region or
rejection region for testing the hypothesis that β1 = 0, at a 10% level (2-sided test)?
[Solution: There are N - k = 25 - 5 = 20 degrees of freedom. For a 2-sided test, the critical value
for 10% in the t-table is: 1.725. The critical region is: |t| > 1.725.]

^
Exercise: For a 5 variable regression model fit to 25 observations, β 2 = 23.2 and sβ^ = 9.7.
2
Test the hypothesis that β2 = 0.
^
[Solution: t = β 2 / sβ^ = 23.2/9.7 = 2.4. There are N - k = 25 - 5 = 20 degrees of freedom.
2
For a 2-sided test, the critical value for 5% in the t-table is 2.086, and for 2% is 2.528.
Since 2.086 < 2.4 < 2.528, we reject the null hypothesis at 5% and do not reject it at 2%.]

One can also test whether an individual slope has a specific value b. The statistic is then
^
( β - b)/ sβ^ , which reduces to the previous case when b = 0. If the value to be tested is not zero,
then most commonly it will be 1.

^
Exercise: For a 3 variable regression model fit to 43 observations, β 2 = .67 and sβ^ = .12.
2
Test the hypothesis that β2 = 1.
^
[Solution: t = (β̂2 - 1)/sβ̂2 = (.67 - 1)/.12 = -2.75.
There are N - k = 43 - 3 = 40 degrees of freedom.
For a 2-sided test, the critical value for 1% (two-sided test) in the t-table is 2.704.
Since |-2.75| > 2.704, we reject the null hypothesis at 1%.
We conclude (at a 1% significance level) that β2 ≠ 1.]

Summary of the general t-test:


1. H0 : a particular regression parameter takes on certain value b.
H1 : H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - k.
5. Compare the absolute value of the t-statistic to the critical values in the
t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.
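
The recipe above is easy to automate. A minimal Python helper (the function name is mine)
reproducing the two exercises just shown:

    def t_statistic(estimate, b, std_error):
        """t statistic for testing H0: parameter = b."""
        return (estimate - b) / std_error

    # First exercise above: beta2-hat = 23.2, standard error 9.7, N - k = 20 degrees of freedom.
    print(round(t_statistic(23.2, 0.0, 9.7), 2))    # 2.39; compare |t| to the t-table with 20 d.f.

    # Second exercise: beta2-hat = .67, standard error .12, testing beta2 = 1, 40 degrees of freedom.
    print(round(t_statistic(0.67, 1.0, 0.12), 2))   # -2.75; |t| > 2.704, so reject at 1%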

Testing Whether All of the Slopes are Zero:

However, one may also be interested in whether two or more slope coefficients are all zero.
This involves using the F-Statistic.164 165

Let’s test the hypothesis that all of the slope coefficients are zero in the prior example of a four
variable regression: H0: β2 = β3 = β4 = 0.

The F-Statistic is: {RSS/(k-1)}/{ESS/(N - k)} = (9786/3)/(908.2/4) = 3262/227.0 = 14.37.

Note that the numerator and denominator are each sums of squares, divided by their degrees
of freedom. The denominator is the estimated variance of the regression,
s2 = ESS/(N - k) = 227.0.

For the F-Distribution with 3 and 4 degrees of freedom, the critical values at 5% and 1% are
6.59 and 16.69. Since 6.59 < 14.37 < 16.69, we reject the null hypothesis at 5% and do not
reject the null hypothesis at 1%.

The Analysis of Variance (ANOVA) Table is:


DF    Sum of Sq    Mean Sq    F Ratio    P-Value
Model 3 9786 3262 14.37 .013
Error 4 908 227
Total 7 10694

Note that the F-Statistic can also be calculated as:


F-Statistic = (R2/(1 - R2))(N - k)/(k - 1) = (.915/(1- .915))(4/3) = 14.4.

To test the hypothesis that all of the slope coefficients are zero compute the
F-Statistic = {RSS/(k-1)}/{ESS/(N - k)} = {R 2 /(1 - R2 )}(N - k)/(k - 1),
which if H0 is true follows an F-Distribution with ν 1 = k -1 and ν 2 = N - k.

Exercise: For a 5 variable regression model fit to 25 observations, RSS = 1382 and
TSS = 1945. Test the hypothesis that β2 = β3 = β4 = β5 = 0.
[Solution: F-Statistic = {RSS/(k-1)}/{ESS/(N - k)} = {1382/(5-1)}/{(1945 - 1382)/(25-5)} = 12.27.
DF    Sum of Sq    Mean Sq    F Ratio
Model 4 1382 345.5 12.27
Error 20 563 28.15
Total 24 1945
Alternately, R2 = RSS/TSS = 1382/1945 = .711. F = {R2/(1 - R2)}(N - k)/(k - 1) = 12.3.
There are k - 1 = 4 and N - k = 20 degrees of freedom.
The critical value at 5% is 2.87, and the critical value at 1% is 4.43.
4.43 < 12.3 ⇒ Reject the hypothesis at 1%.]
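
The F-Statistic in the exercise above can be computed mechanically; a small sketch, using the
exercise's numbers (the helper name is mine):

    def f_all_slopes(RSS, TSS, N, k):
        """F statistic for H0: all slope coefficients are zero."""
        ESS = TSS - RSS
        return (RSS / (k - 1)) / (ESS / (N - k))

    # Exercise above: RSS = 1382, TSS = 1945, N = 25, k = 5.
    print(round(f_all_slopes(1382, 1945, 25, 5), 2))   # 12.27, with 4 and 20 degrees of freedom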
164
The t-test is a special case of the F-Test.
165
R2 determines F and vice-versa. As discussed previously, if all the slopes are zero, R2 follows a Beta Distribution.
Therefore, if one had a table of incomplete beta functions, one could perform an equivalent statistical test using R2.

Distribution of Sums of Squares:*166

TSS = Σ(Yi - Y )2 = Σyi2. Assuming the εi are Normal and independent, then so are the Yi.
Assume each εi has variance σ2; we assume homoscedasticity.

If all of the slopes are zero, then each of Yi has the same mean, and TSS/σ2 = Σ{(Yi - Y )/σ}2,
looks like a sum of squared Unit Normals. The sum of the squares of N independent Unit
Normals is a Chi-Square Distribution with N degrees of freedom. However, the yi are not
independent since they sum to zero. Therefore, we lose one degree of freedom.167

If β2 = β3 = ... = βk = 0, TSS/σ2 follows a Chi-Square Distribution with N - 1 degrees of freedom.168

^
Similarly, if β2 = β3 = ... = βk = 0, RSS/σ2 = Σ( Yi - Y )2/σ2 follows a Chi-Square Distribution with
k - 1 degrees of freedom.169

^
ESS/σ2 = Σ ^εi 2/σ2 = Σ(Yi - Yi )2/σ2, follows a Chi-Square Distribution with N - k degrees of
freedom. Therefore, E[ESS/σ2] = N - k.170 Therefore, E[s2] = E[ESS/(N-k)] = σ2.
Therefore, s2 = ESS/(N-k), is an unbiased estimator of σ2.171

If β2 = β3 = ... = βk = 0, then RSS and ESS have independent Chi-Square Distributions, and
F = {RSS/(k - 1)}/{ESS/(N - k)} = {(RSS/σ2)/(k - 1)}/{(ESS/σ2)/(N - k)}, follows an F-Distribution,
with k - 1 and N - k degrees of freedom.

166
See Section 19.11 of Volume 2 of Kendall’s Advanced Theory of Statistics or Appendix 2 of Statistical Methods
of Forecasting by Abraham and Ledbolter.
167
If all the slopes are zero, then TSS is the numerator of the sample variance of Y, where the Yi are random samples
from the same Normal Distribution. An important statistical result is that in this case, S2(N - 1)/σ2 has a Chi-Square
Distribution with N-1 degrees of freedom; which is the same as saying TSS/σ2 has a Chi-Square Distribution with N-1
degrees of freedom. See Theorem 6.3-4 in Hogg and Tanis, or Theorem 3.6.1 in Hogg , McKean, and Craig.
168
If all the slopes are not zero, then TSS/σ2 follows a noncentral Chi-Square Distribution.
169
If all the slopes are not zero, then RSS/σ2 follows a noncentral Chi-Square Distribution.
170
The mean of a Chi-Square Distribution is equal to its number of degrees of freedom.
171
The fact that s2 is an unbiased estimator of σ2 was proven in a previous section, without using the assumption of
Normally Distributed Errors. See also Section 19.9 of Volume 2 of Kendall’s Advanced Theory of Statistics.

Problems:

20.1 (1 point) A 4-variable model has been fit to 30 points. The estimated first slope parameter
^
β1 = - 4.421, with standard error 2.203. Test the hypothesis that β1 = 0.

20.2 (1 point) A 6-variable model has been fit to 36 observations. The estimated third slope
^
parameter β 3 = 3.13, with standard error .816. Test the hypothesis that β3 = 1.

20.3 (2 points) A 3-variable model (including intercept) has been fit to 15 observations.


Regression Sum of Squares (RSS) = 5,018,232.
Total Sum of Squares (TSS) = 8,618,428.
Test the hypothesis that β2 = β3 = 0.

Use the following information for the next two questions:

For a linear regression model: Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + ε, fit to 23 observations,
R2 = .912.

20.4 (1 point) What is R̄2?
(A) .86 (B) .87 (C) .88 (D) .89 (E) .90

20.5 (1 point) What is the value of the F-statistic used to test the hypothesis
β2 = β3 = β4 = β5 = 0?
(A) 47 (B) 49 (C) 51 (D) 53 (E) 55

20.6 (2 points) The F-Statistic used to test the hypothesis that all of the slopes are zero is
calculated as which of the following?
A. The explained variation divided by the total variation
B. The explained variance divided by the total variance
C. The explained variation divided by the unexplained variation
D. The explained variance divided by the unexplained variance
E. None of A, B, C, or D is true.

* 20.7 (2 points) A multiple linear regression model with k variables has been fit to N
observations, N > k. You compute F = {RSS/(k-1)}/{ESS/(N - k)}.
Which of the following is not a necessary condition for F to follow an F-Distribution?
A. Homoscedasticity
B. Independent errors
C. All of the actual slopes are zero.
D. The error terms are Normally Distributed.
E. All of the above are necessary.

Use the following information from a three variable regression for the next 4 questions:
^ ^ ^
N = 15. β1 = -20.352, β 2 = 13.3504, β 3 = 243.714.
(426076 -2435 -36703)
^
Var[ β] = (-2435 58.85 41.99)
(-36703 41.99 4034)

20.8 (1 point) Test the hypothesis that β1 = 1500.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.9 (1 point) Test the hypothesis that β2 = 0.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.10 (1 point) Test the hypothesis that β3 = 0.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.11 (3 points) What is the upper limit of a symmetric 95% confidence interval for
β1 + 50β2 + 10β3?
A. 3425 B. 3450 C. 3475 D. 3500 E. 3525

20.12 (2 points) You fit the 2-variable linear regression model, Y = α + βX + ε,


^
to 16 observations. ΣXi2 = 1018. β = -1.9. α^ = 27.

The t-statistic for testing β = 0 is -2.70.


Test the hypothesis H0: α = 10 versus H1: α ≠ 10.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

Use the following information for the next 6 questions:


A multiple regression model has been fit to 50 observations.
Coefficient Fitted Value Standard Error
β0 88 49
β1 0.031 0.012
β2 -0.72 0.46
The error sum of squares is 63 and R2 = .84.

20.13 (1 point) Test the hypothesis that β0 = 0.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.14 (1 point) Test the hypothesis that β1 = 0.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.15 (1 point) Test the hypothesis that β2 = 0.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.16 (1 point) Determine the Standard Error of the regression.


A. 1.0 B. 1.2 C. 1.4 D. 1.6 E. 1.8

20.17 (2 points) Determine the adjusted R2 of the regression.

20.18 (2 points) Test the hypothesis that β0 = β1 = β2 = 0.



Use the following information for the next three questions:


One has fit a regression model with 5 variables (4 independent variables plus the intercept).
One is testing the hypothesis H0: β2 = β3 = β4 = β5 = 0, versus the alternative hypothesis that H0
is false.

20.19 (1 point) With 15 observations, what is the critical region for a test at a 5% significance
level?

20.20 (1 point) With 30 observations, what is the critical region for a test at a 5% significance
level?

20.21 (1 point) Compare the probability of a Type II error for the tests in the two previous
questions, all else being equal.

20.22 (Course 120 Sample Exam #2, Q.4) (2 points) You apply all possible regression
models to a set of five observations with three explanatory variables. You determine ESS, the
sum of squared errors (or residuals), for each of the models:

Model Variables in the Model ESS


I X2 5.85
II X3 8.45
III X4 6.15
IV X2, X3 5.12
V X2, X4 4.35
VI X3, X4 1.72
VII X2, X3, X4 0.07
You also determine that the estimated variance of the dependent variable Y is 2.2.
Calculate the value of the F statistic for testing the significance of adding the variable X4 to the
model Yi = β1 + β2X2i + εi.
(A) 0.3 (B) 0.7 (C) 1.0 (D) 1.4 (E) 1.7

20.23 (Course 120 Sample Exam #2, Q.5) (2 points) You perform a regression of Y on
^
X2 and X3. You determine: Yi = 20.0 - 1.5X2i - 2.0X3i.
Source Sum of Squares Degrees of Freedom Mean Sum of Squares F-Ratio
Regression 42 2 21 5.25
Error 12 3 4
Total 54 5

          ( 4/3   -1/4   -1/3 )
(X’X)-1 = (-1/4    1/16    0  )
          (-1/3     0     2/3 )
Calculate the value of t statistic for testing the null hypothesis H0: β3 = 1.
(A) -0.9 (B) -1.2 (C) -1.8 (D) -3.0 (E) -5.0

20.24 (Course 120 Sample Exam #3, Q.8) (2 points)


You fit the following model to 10 observations: Y = β1 + β2X2 + β3X3 + ε.
You are given: RSS = 61.3. TSS = 128.
You then fit the following new model, with the additional variable X4, to the same data:
Y = β1 + β2X2 + β3X3 + β4X4 + ε.
For this new model, you determine: RSS = 65.6, TSS = 128.
Calculate the value of the F statistic to test H0: β4 = 0.
(A) 0.01 (B) 0.41 (C) 1.76 (D) 4.30 (E) 10.40

Section 21, Additional Tests of Slopes


This section continues the discussion of tests of hypotheses about the slopes of multiple
regression models.

Testing Whether Subgroups of Slopes are Zero:

In a similar manner to testing whether all of the slopes are zero, one can also use the
F-Distribution to test whether groups of slope coefficients are equal to zero. For example, in the
4 variable regression model discussed in the previous section, take H0: β2 = β4 = 0.

Then run a regression for the new model: Y = β1 + β3X3 + ε.


^ ^
The result is β1 = 38.5349, β 3 = 4.3798.
The residuals of this regression, ^εi , are:
X3      Yi       Ŷi       ε̂i = Yi - Ŷi
 1       6      42.91       -36.91
-1       8      34.16       -26.16
 4      33      56.05       -23.05
-4      14      21.02        -7.02
 0      40      38.53         1.47
 8     118      73.57        44.43
-8       2       3.50        -1.50
-6      61      12.26        48.74

^
Error Sum of Squares = ESS = residual variation ≡ Σ ^εi 2 = Σ(Yi - Yi )2 = 6982.

ESSUR = Error Sum of Squares of the Unrestricted model = 908.


ESSR = Error Sum of Squares of the Restricted model = 6982.
q = dimension of the restriction =
variables for unrestricted model - variables for restricted model = 4 - 2 = 2.
N = number of observations = 8. k = variables for the unrestricted model = 4.
Then the F-statistic is:172
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(6982 - 908)/2} /{908/(8 - 4)} = 3037/227 = 13.37.
This is an F-Statistic with 2 and 4 degrees of freedom.

Using the table with ν1 = 2 and ν2 = 4, the critical values are 6.94 and 18.00 for 5% and 1%
respectively. Since 6.94 < 13.37 < 18.00, we reject at 5% and do not reject at 1% the null
hypothesis, that β2 = β4 = 0.

172
The numerator and denominator are each sums of squares, divided by their degrees of freedom.
See formula 5.20 in Pindyck and Rubinfeld.

In general when testing whether some set of slope coefficients are all zero, or whether some
linear relationship holds between two slope coefficients:173
F-Statistic = {(ESS R - ESSU R )/q} / {ESSU R /(N - k)}.

The ESS for the restricted model is greater than or equal to the ESS for the unrestricted model;
the F-Statistic is always non-negative.

Since TSS = RSS + ESS, and the total sum of squares depends only on the data, not the
model, we can rewrite the numerator of this F-Statistic as a difference of Regression Sums of
Squares, but in the opposite order: F-Statistic = {(RSSUR - RSSR)/q} / {ESSUR/(N - k)}.

The case of testing whether all the slope coefficients are zero is a special case of the above,
with q = k -1, ESSUR = ESS, RSSUR = RSS, and RSSR = 0,
so that F-Statistic = {RSS/(k-1)}/{ESS/(N - k)}.

Since ESS = (1 - R2)TSS, and the total sum of squares depends only on the data, not the
model, we can rewrite this F-Statistic in terms of R2.
F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} =
{((1 - R2R)TSS - (1 - R2UR)TSS)/q} / {(1 - R2UR)TSS/(N - k)}.

F-Statistic = {(R2 UR - R2 R )/q}/{(1 - R2 UR )/(N - k)}. 174

Exercise: Test the null hypothesis that β2 = β3 = 0.


[Solution: Run a regression for the new model: Y = β1 + β4X4 + ε.
^ ^
The result is β1 = 16.3214, β 4 = 1.8929.
The residuals of this regression, ^εi , are: -2.75, -8.32, 9.11, -17.46, 0.96, 71.39, 52.18, -0.75.
Error Sum of Squares = ESS = residual variation ≡ Σ ^εi 2 = 8286.
ESSUR = Error Sum of Squares of the Unrestricted model = 908.
Then the F-statistic is:
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(8286 - 908)/2} /{908/(8 - 4)} = 3689/227 = 16.25.
Using the table with ν1 = 2 and ν2 = 4, the critical values are 6.94 and 18.00 for 5% and 1%
respectively. Since 6.94 < 16.25 < 18.00, we reject at 5% and do not reject at 1% the null
hypothesis, that β2 = β3 = 0.]
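
A short sketch of this restricted-versus-unrestricted F test, using the numbers from this
section (the helper name is mine):

    def f_restricted(ESS_R, ESS_UR, q, N, k):
        """F statistic comparing a restricted model to the unrestricted model."""
        return ((ESS_R - ESS_UR) / q) / (ESS_UR / (N - k))

    # H0: beta2 = beta4 = 0 in the four variable example: ESS_R = 6982, ESS_UR = 908.
    print(round(f_restricted(6982, 908, 2, 8, 4), 2))   # 13.37, with 2 and 4 degrees of freedom

    # Exercise above, H0: beta2 = beta3 = 0: ESS_R = 8286.
    print(round(f_restricted(8286, 908, 2, 8, 4), 2))   # 16.25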

173
This is a specific example of what is called a Wald Test.
174
See formula 5.21 in Pindyck and Rubinfeld.

t-test as a special case of the F-Test:175

If one applies the F-Test to a single slope, it is equivalent to the previously discussed t-test.

For example, in the four-variable regression example, one could use the F-Test to test the
hypothesis that β3 = 0. There is one restriction, so that q = 1. N - k = 8 - 4 = 4.

F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = (4)(ESSR - ESSUR)/ESSUR =


(4)(ESSR/ESSUR - 1), with 1 and 4 degrees of freedom.
ESSUR = Error Sum of Squares of the Unrestricted model = 908.

Run a regression for the restricted model: Y = β1 + β2X2 + β4X4 + ε.


^ ^ ^
The result is β1 = 1.83186, β 2 = 20.9849, β 4 = -15.2823.
The residuals of this regression, ^εi , are: -14.99, -14.82, 29.34, 8.52, -9.284, 45.91, -50.93, 6.24.
Error Sum of Squares = ESSR ≡ Σ ^εi 2 = 6205.

F-Statistic = (4)(6205/908 - 1) = 23.33, with 1 and 4 degrees of freedom.

Consulting the F-Table, since 21.20 < 23.33, we reject the hypothesis at the 1% level.
This is the same conclusion as drawn previously based on the t-test.

Using a computer, one can determine that the p-value is .0085. This is the same p-value as
determined previously based on the t-statistic.

In fact the t-statistic was 4.83 = √23.33 = square root of the F-Statistic.

In general, applying the F-Test to a single slope is equivalent to the t-test with
t = √ F.

175
See the section on the F-Distribution, for a discussion of its relationship to the t-distribution when ν1 =1.

Linear Relationship Between Slope Coefficients:

In a similar manner, one can also use the F-distribution to test the hypothesis that the slope
coefficients satisfy a specific linear relationship.

For example one can test the hypothesis H0: β2 + β4 = 1.176


To obtain the restricted model, substitute β4 = 1 - β2 to yield the model:
Y - X4 = β1 + β2(X2 - X4) + β3X3 + ε.
X2 X3 X4 Y X2-X4 Y-X4
-2 1 -4 6 2 10
1 -1 0 8 1 8
3 4 4 33 -1 29
6 -4 8 14 -2 6
11 0 12 40 -1 28
15 8 16 118 -1 102
17 -8 20 2 -3 -18
20 -6 24 61 -4 37

^ ^ ^
Run a regression for the new model: β1 = 18.7371 , β 2 = -10.5707, β 3 = 7.1723.
The residuals of this regression, ^εi , are: 5.23, 7.01, -29.00, -5.19, -1.31, 15.31, -11.07, 19.01.
Error Sum of Squares = ESS = residual variation ≡ Σ ^εi 2 = 1665.
ESSUR = Error Sum of Squares of the Unrestricted model = 908.
The restriction is one dimensional; q = 1. Then the F-statistic is:
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(1665 - 908)/1} /{908/(8 - 4)} = 757/227 = 3.33.
Using the table with ν1 = 1 and ν2 = 4, the critical values are 7.71 and 21.20 for 5% and 1%
respectively. Since 3.33 < 7.71, we do not reject at 5% the null hypothesis, that β2 + β4 = 1.

As before, F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)}.

Exercise: For the four variable regression example, test the hypothesis β3 = β4.
[Solution: To obtain the restricted model, substitute β3 = β4 to yield the model:
Y = β1 + β2X2 + β3X3 + β3X4 + ε. ⇔ Y = β1 + β2X2 + β3(X3 + X4) + ε.
Fitting a regression to the restricted model:
^ ^ ^
β1 = 10.082, β 2 = -4.306, β 3 = 6.853. ESSR = 1080. ESSUR = 908.
F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(1080 - 908)/1}/{908/(8 - 4)} = .76.
For 1 and 4 degrees of freedom, the critical value for 5% is 7.71.
Since .76 < 7.71, we do not reject at 5% the null hypothesis, that β3 = β4.
Comment: With so few observations, it is difficult to reject hypotheses.]

176
See 4, 11/01, Q.21.

The Equality of the Coefficients of Two Regressions:177

We can also use the F-Statistic to test the equality of the coefficients of two similar regressions
fit to different data sets.

Previously we fit the model:178 Y = β1 + β2X2 + β3X3 + β4X4 + ε, to 8 observations:


X2 X3 X4 Y
-2 1 -4 6
1 -1 0 8
3 4 4 33
6 -4 8 14
11 0 12 40
15 8 16 118
17 -8 20 2
20 -6 24 61

^ ^ ^ ^
The result was β1 = 6.3974, β 2 = 2.4626, β 3 = 6.4560, β 4 = 1.1839, with ESS = 908.
Let’s assume this data was from geographical region A.
Let’s assume we have 6 similar observations from a neighboring geographical region B:

X2 X3 X4 Y
-5 2 3 18
7 -3 5 -5
13 5 11 98
19 -1 9 47
24 6 14 109
26 3 2 121

Exercise: With the aid of a computer, fit a linear regression to these 6 observations from Region
B.
^ ^ ^ ^
[Solution: β1 = 21.1185, β 2 = 2.7666, β 3 = 10.5036, β 4 = -2.2079, with ESS = 310.]

If we assume that the variances of the two models are equal179, we can pool the data into one
model.

Exercise: With the aid of a computer, fit a linear regression to the combined 14 observations
from both regions.
^ ^ ^ ^
[Solution: β1 = 7.4757, β 2 = 2.8453, β 3 = 6.7521, β 4 = .6758, with ESS = 2248.]

177
See Section 5.3.3 of Econometric Models and Economic Forecasts.
178
With the aid of a computer.
179
One can test for heteroscedasticity.

We note that ESSC = 2248 > 1218 = 908 + 310 = ESSA + ESSB. With separate models, and
therefore 4 extra parameters, we are able to reduce the ESS and get a better fit. However, we
can always get a model that appears to fit better by adding additional parameters!

Is it better to use the separate models for regions A and B, or use the combined model C for
both regions? One can do an F-Test in order to test H0: model C applies to both regions. ⇔
H0: the coefficients for regions A and B are the same.

This is just an application of ideas previously discussed. One takes C as the restricted model,
and separate A plus separate B as the unrestricted model.
There are 8 + 6 = 14 total observations ⇒ N = 14.
There are four restrictions (equality of the 4 coefficients in the two regions) ⇒ q = 4.
There are 8 coefficients being fit in the unrestricted model (4 in region A and 4 in region B) ⇒ k
= 8.
ESSR = ESSC. ESSUR = ESSA + ESSB.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(ESSC - ESSA - ESSB)/4}/{(ESSA + ESSB)/6} =
{(2248 - 908 - 310)/4}/{(908 + 310)/6} = 1.27.
This F-statistic has 4 and 6 degrees of freedom, with critical value at 5% of 4.53.
Since 1.27 < 4.53, we do not reject H0.180

In general, F = {(ESSR - ESSUR )/q}/{ESS UR /(N - k)} =


{(ESS C - ESSA - ESSB )/k}/{(ESS A + ESSB )/(N A + NB - 2k)},
with k and NA + NB - 2k degrees of freedom.181

For the above example, F = {(2248 - 908 - 310)/4}/{(908 + 310)/(8 + 6 - (2)(4)} = 1.27,
with 4 and 8 + 6 - (2)(4) = 6 degrees of freedom.

Exercise: The same two-variable linear regression model is fit to similar data from two different
insurers. For the model fit to 13 observations from the first insurer ESS = 792.
For the model fit to 17 observations from the second insurer ESS = 951.
When the same model is fit to all 30 observations, ESS = 2370.
Test the hypothesis that the model for the first insurer is identical to that for the second insurer.
[Solution: F = {(ESSC - ESS1 - ESS2)/k}/{(ESS1 + ESS2)/(N1 + N2 - 2k)} =
{(2370 - 792 - 951)/2}/{(792 + 951)/(13 + 17 - (2)(2)} = (627/2)/(1743/26) = 4.68, with 2 and 26
degrees of freedom. For 2 and 26 degrees of freedom, the 5% critical value is 3.37 and the 1%
critical value is 5.53. Since 3.37 < 4.68 < 5.53, we reject the hypothesis at the 5% level, but not
at the 1% level.]
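
Both of these calculations follow directly from the formula above; a minimal sketch (the
helper name is mine):

    def chow_f(ESS_C, ESS_A, ESS_B, N_A, N_B, k):
        """F statistic for testing equality of coefficients across two data sets."""
        num = (ESS_C - ESS_A - ESS_B) / k
        den = (ESS_A + ESS_B) / (N_A + N_B - 2 * k)
        return num / den

    # Regions A and B: ESS_C = 2248, ESS_A = 908, ESS_B = 310, 8 and 6 observations, k = 4.
    print(round(chow_f(2248, 908, 310, 8, 6, 4), 2))    # 1.27, with 4 and 6 degrees of freedom

    # Two insurers exercise: ESS_C = 2370, ESS_1 = 792, ESS_2 = 951, 13 and 17 observations, k = 2.
    print(round(chow_f(2370, 792, 951, 13, 17, 2), 2))  # 4.68, with 2 and 26 degrees of freedom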

180
With 4 independent variables including the constant, and so few observations, it is difficult to reject H0.
181
See Equation 5.25 in Econometric Models and Economic Forecasts.

Problems:

Use the following information for the next 6 questions:


Six models have been fit to 25 observations:
Model I: Y = β1 + β2X2 + β3X3 + β4X4 + ε
Model II: Y = β1 + β2(X2 + X3) + β4X4 + ε
Model III: Y = β1 + β4X4 + ε
Model IV: Y - X3 = β1 + β2(X2 - X3) + β4X4 + ε
Model V: Y = β1 + β2(X2 + X3 + X4) + ε
Model VI: Y - X4 = β1 + β2(X2 - X4) + β3X3 + ε
You are given:
Model Error Sum of Squares (ESS)
I 2721
II 3024
III 3763
IV 3406
V 3245
VI 3897

21.1 (2 points) Calculate the value of the F statistic used to test the hypothesis
H0: β2 = β3 = 0.
(A) 2 (B) 3 (C) 4 (D) 5 (E) 6

21.2 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.3 (2 points) Calculate the value of the F statistic used to test the hypothesis H0: β2 = β3.
(A) 2 (B) 3 (C) 4 (D) 5 (E) 6

21.4 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.5 (2 points) Calculate the value of the F statistic used to test the hypothesis
H0: β2 + β4 = 1.
(A) 8 (B) 9 (C) 10 (D) 11 (E) 12

21.6 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

Use the following information for the next two questions:


For each of five policy years an actuary has estimated the ultimate losses based on the
information available at the end of that policy year.
Policy Year Estimated Actual Ultimate
1991 45 43
1992 50 58
1993 55 63
1994 60 76
1995 65 78

21.7 (3 points) Let Xt be the actuary’s estimate and Yt be the actual ultimate.
Fit the ordinary least squares model, Yt = α + βXt + εt, and determine the ESS.

21.8 (3 points) The null hypothesis is H0: α = 0, β = 1.


Perform an F test of the null hypothesis.

21.9 (3 points) You fit the model Y = β1 + β2X2 + β3X3 + ε, separately to similar data from two
states.
For the regression to the 30 observations from the state of Southern Exposure, the error sum of
squares is: 2573.
For the regression to the 20 observations from the state of Northern Exposure, the error sum of
squares is: 2041.
For a similar regression to the 50 observations from both states combined, the error sum of
squares is: 5735.
Which of the following is the F-statistic for testing the hypothesis that the models fit separately
in the two states are the same?
(A) Less than 2
(B) At least 2, but less than 3
(C) At least 3, but less than 4
(D) At least 4, but less than 5
(E) At least 5

21.10 (2 points) For a regression model containing all the independent variables:
Source of Variation Degrees of Freedom Sum of Squares
Regression 5 72,195
Error 33 22,070
You exclude some of the independent variables, and for this revised regression model:
Source of Variation Degrees of Freedom Sum of Squares
Regression 3 63,021
Determine the F ratio to use to test the hypothesis that the coefficients for the excluded
variables are all equal to zero.
(A) 6 (B) 7 (C) 8 (D) 9 (E) 10

21.11 (3 points) Data Minor works at the Fitz Insurance Company (FIC).
He has access to a large data base of information on long haul trucking insureds for which FIC
writes commercial automobile insurance.
For each of 16 such large insureds, Data has their claim frequency over the most recent 3
years and 25 different characteristics that vary across these insureds.
Over the weekend, Data has his computer fit every possible 6 variable linear regression model
(5 independent variables and an intercept).
The computer ranks these regressions by R2; the largest R2 is 0.92.
On Monday, Data very proudly presents this regression model to his boss Ernest Checca.
What are some things Ernest should check or consider?

21.12 (Course 120 Sample Exam #1, Q.5 & Course 4 Sample Exam, Q.35)
(2.5 points) To determine the relationship of salary (Y) to years of experience (X2) for both men
(X3 = 1) and women (X3 = 0) you fit the model Yi = β1 + β2X2i + β3X3i + β4X2iX3i + εi
to a set of observations from a sample of 11 employees. For this model you are given:
Source of Variation Degrees of Freedom Sum of Squares
Regression 3 330.0117
Error 7 12.8156
You also fit the model
Yi = β1* + β2*X2i + εi*
to the observations. For this model you are given:
Source of Variation Degrees of Freedom Sum of Squares
Regression 1 315.0992
Error 9 27.7281
Determine the F ratio to use to test whether the linear relationship between salary and years of
experience is identical for men and women.
(A) 0.6 (B) 2.0 (C) 3.5 (D) 4.1 (E) 6.2

21.13 (Course 120 Sample Exam #1, Q.6 & Course 4 Sample Exam, Q. 12)
(2.5 points) To predict insurance sales using eight independent variables you fit two
regression models based on 27 observations.
The first model contains all eight independent variables. For this model you are given:
Source of Variation Degrees of Freedom Sum of Squares
Regression 8 115,175
Error 18 76,893
The second model contains only the first two independent variables. For this model you are
given:
Source of Variation Degrees of Freedom Sum of Squares
Regression 2 65,597
Error 24 126,471
Determine the F ratio to use to test the hypothesis that the coefficients for the third through the
eighth independent variables are all equal to zero.
(A) 5.8 (B) 4.5 (C) 2.6 (D) 1.9 (E) 1.6

21.14 (4, 5/00, Q.9) (2.5 points) The following models are fitted to 30 observations:
Model I: Y = β1 + β2X2 + ε
Model II: Y = β1 + β2X2 + β3X3 + β4X4 + ε
You are given:
(i) Σ(Y - Y )2 = 160
(ii) Σ(X2 - X2 )2 = 10
^
(iii) For Model I, β 2 = -2
(iv) For Model II, R2 = 0.70
Determine the value of the F statistic used to test that β3 and β4 are jointly equal to zero.
(A) Less than 15
(B) At least 15, but less than 18
(C) At least 18, but less than 21
(D) At least 21, but less than 24
(E) At least 24

21.15 (4, 11/00, Q.21) (2.5 points) You are given the following two regression models,
each based on a different population of data:
Model A: Yi = A1 + A2X2i + A3X3i + εi where i = 1,2 ,...,30.
Model B: Yj = B1 + B2X2j + B3X3j + εj where j = 1,2 ,...,50.
You assume that the variances of the two models are equal and pool the data into one model:
Model G: Yp = G1 + G2X2p + G3X3p + εp where p = 1,2 ,...,80.
You calculate Rmodel2 and the error sum of squares, denoted as ESSmodel, for all three models.
Which of the following is the F statistic for testing the hypothesis that Model A is identical to
Model B?
(A) F3,74 = {(ESSG - ESSA - ESSB)/3} / {(ESSA + ESSB)/74}

(B) F6,77 = {(ESSG - ESSA - ESSB)/6} / {(ESSA + ESSB)/77}

(C) F6,74 = {(ESSG - ESSA - ESSB)/6} / {(ESSA + ESSB)/74}

(D) F3,74 = {(RG2 - RA2 - RB2)/3} / {(RA2 + RB2)/74}

(E) F6,77 = {(RG2 - RA2 - RB2)/6} / {(RA2 + RB2)/77}

21.16 (4, 11/01, Q.21) (2.5 Points) Three models have been fit to 20 observations:
Model I: Y = β1 + β2X2 + β3X3 + ε
Model II: Y = β1 + β2(X2 + X3) + ε
Model III: Y - X3 = β1 + β2(X2 - X3) + ε
You are given:
Model ESS
I 484
II 925
III 982
Calculate the value of the F statistic used to test the hypothesis H0: β2 + β3 = 1.
(A) Less than 15
(B) At least 15, but less than 16
(C) At least 16, but less than 17
(D) At least 17, but less than 18
(E) At least 18

21.17 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.18 (2 Points) Using the information in question 4, 11/01, Q.21, calculate the value of the
F statistic used to test the hypothesis H0: β2 = β3.
(A) Less than 15
(B) At least 15, but less than 16
(C) At least 16, but less than 17
(D) At least 17, but less than 18
(E) At least 18

21.19 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.20 (4, 11/02, Q.27) (2.5 Points) For the multiple regression model
Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + ε, you are given:
(i) N = 3,120
(ii) TSS = 15,000
(iii) H0: β4 = β5 = β6 = 0
(iv) RUR2 = 0.38
(v) RSSR = 5,565
Determine the value of the F statistic for testing H0.
(A) Less than 10
(B) At least 10, but less than 12
(C) At least 12, but less than 14
(D) At least 14, but less than 16
(E) At least 16

21.21 (4, 11/03, Q.20) (2.5 Points) At the beginning of each of the past 5 years, an actuary
has forecast the annual claims for a group of insureds.
The table below shows the forecasts (X) and the actual claims (Y).
A two-variable linear regression model is used to analyze the data.
t Xt Yt
1 475 254
2 254 463
3 463 515
4 515 567
5 567 605
You are given:
(i) The null hypothesis is H0: α = 0, β = 1.
(ii) The unrestricted model fit yields ESS = 69,843.
Which of the following is true regarding the F test of the null hypothesis?
(A) The null hypothesis is not rejected at the 0.05 significance level.
(B) The null hypothesis is rejected at the 0.05 significance level, but not at the 0.01 level.
(C) The numerator has 3 degrees of freedom.
(D) The denominator has 2 degrees of freedom.
(E) The F statistic cannot be determined from the information given.

21.22 (4, 11/04, Q.19) (2.5 points)


You are given the following information about a linear regression model:
(i) The unit of measurement is a region, and the number of regions in the study is 37.
(ii) The dependent variable is a measure of workers’ compensation frequency, while the
three independent variables are a measure of employment, a measure of
unemployment rate and a dummy variable indicating the presence or absence of
vigorous cost-containment efforts.
(iii) The model is fitted separately to the group of 18 largest regions and to the group of
19 smallest regions (by population). The ESS resulting from the first fit is 4053, while
the ESS resulting from the second fit is 2087.
(iv) The model is fitted to all 37 regions, and the resulting ESS is 10,374.
The null hypothesis to be tested is that the pooling of the regions into one group is
appropriate. Which of the following is true?
(A) The F statistic has 4 numerator degrees of freedom and 29 denominator degrees of
freedom, and it is statistically significant at the 5% significance level.
(B) The F statistic has 4 numerator degrees of freedom and 29 denominator degrees of
freedom, and it is not statistically significant at the 5% significance level.
(C) The F statistic has 4 numerator degrees of freedom and 33 denominator degrees of
freedom, and it is not statistically significant at the 5% significance level.
(D) The F statistic has 8 numerator degrees of freedom and 33 denominator degrees of
freedom, and it is statistically significant at the 5% significance level.
(E) The F statistic has 8 numerator degrees of freedom and 33 denominator degrees of
freedom, and it is not statistically significant at the 5% significance level.

21.23 (VEE-Applied Statistics Exam, 8/05, Q.7) (2.5 points)


You use the model Y= α + βX+ ε to analyze the following data:
i Xi Yi
1 1 2.8
2 2 2.9
3 3 3.6
4 4 4.7
5 5 6.2
You are given:
(i) The null hypothesis is H0: α = β.
(ii) The unrestricted model fit yields α = 1.46 and β = 0.86.
(iii) The restricted model fit yields α = β = 0.993.
Which of the following is true regarding the F test of the null hypothesis?
(A) The null hypothesis is not rejected at the 5% significance level.
(B) The null hypothesis is rejected at the 5% significance level but not at the 1% level.
(C) The numerator has 2 degrees of freedom.
(D) The denominator has 2 degrees of freedom.
(E) The F statistic cannot be determined from the information given.
Mahler’s Guide to
Regression
Sections 22-24:
22 Additional Models
23 Dummy Variables
24 Piecewise Linear Regression

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-F


Section 22, Additional Models

One can create additional models via change of variables and other techniques.
Change of variables can be used to convert models into those that are linear in the
parameters.

For example, Y = 10 X1^3 X2^2 (error) is a multiplicative model.


ln(Y) = ln(10) + 3ln(X1) + 2ln(X2) + ln(error) is an equivalent model, linear in its parameters.
More generally, such a model would be written as: ln(Y) = α + βln(X1) + γln(X2) + ln(error),
and the linear regression techniques could be applied. If the errors of the original relationship
are LogNormal, then those of the transformed relationship are Normal.

Exponential Regression:

If one assumes that on average costs increase at $5 per year, and they are $100 at time 0,
then a reasonable model is: Yt = 100 + 5t + errort. If instead, we assume that costs increase on
average 5% per year, then a reasonable model is: Yt = (100)(1.05^t)(errort).

This second model can be rewritten: ln Yt = ln(100) + t ln(1.05) + ln(errort).


This is of the form: Zt = α + βt + εt, where Zt = ln Yt. By a change of variables we have
managed to transform the original model into a two-variable linear regression model.
If the errors of the original relationship are LogNormal, then those of the transformed
relationship are Normal.

This is usually referred to as an exponential regression, since the original model can be written
as Yt = exp[ln(100) + t ln(1.05) + ln(errort)].

Exponential regression: ln[Yi] = α + βXi + εi ⇔ Yi = exp[α + βXi + εi].

More generally, an exponential model with more independent variables would be:
Y = exp[β1 + β2X2 + ...+ βnXn + ln(error)] ⇔ lnY = β1 + β2X2 + ...+ βnXn + ln(error).

Exercise: Fit an exponential regression to the following data.


Year: 1 2 3 4 5
Average Claim Cost: 300 320 345 370 400
[Solution: Fit a linear regression to the natural log of the claim sizes:
ln(300) = 5.704, ln(320) = 5.768, ln(345) = 5.844, ln(370) = 5.914, ln(400) = 5.991.
X̄ = 3. x = -2, -1, 0, 1, 2. Σxi2 = 10.
Σxiyi = (-2)(5.704) + (-1)(5.768) + (0)(5.844) + (1)(5.914) + (2)(5.991) = 0.72.
β̂ = Σxiyi/Σxi2 = 0.72/10 = 0.072. α̂ = Ȳ - β̂X̄ = 5.844 - (0.072)(3) = 5.628.]
If Y is the average claim cost, then the fitted model is: ln(Y) = 5.628 + .072t.
Exponentiating both sides, and noting that e^5.628 = 278.1 and that e^0.072 = 1.075, this model is
equivalent to: Y = 278.1(1.075^t). This represents a constant annual inflation rate of 7.5%.

In general, exponential regression ⇔ constant percentage rate of inflation.182

Using this exponential regression, one could predict the average claim cost in the future.
For example, for year 6, the predicted average claim cost is: 278.1(1.075^6) = 429.
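
The same fit is easy to reproduce numerically. Here is a minimal Python sketch (assuming numpy is
available; the code is an illustration and not part of the original study material), which regresses the
natural log of the claim costs on time and then exponentiates the coefficients:

    import numpy as np

    t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    cost = np.array([300.0, 320.0, 345.0, 370.0, 400.0])

    # Fit ln(Y) = alpha + beta*t by ordinary least squares in deviations form.
    z = np.log(cost)
    x = t - t.mean()
    beta = np.sum(x * (z - z.mean())) / np.sum(x * x)   # about 0.072
    alpha = z.mean() - beta * t.mean()                  # about 5.628

    print(np.exp(alpha), np.exp(beta))        # about 278.1 and 1.075
    print(np.exp(alpha + beta * 6))           # year 6 prediction, about 429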

Fitting a Quadratic:

Assume we have the following heights of eight fathers and their adult sons (in inches):
Father Son
53 56
54 58
57 61
58 60
61 63
62 62
63 65
66 64

Here is a graph of the least squares line fit in a previous section, y = .6254x + 24.07:

[Graph: the eight (Father, Son) data points together with the fitted line, with Father's height (inches)
on the horizontal axis and Son's height on the vertical axis.]

182
While the expected percent changes are the same, due to the random element, the observed changes will vary.
One can also fit higher order polynomials by least squares. Here is an example of fitting a
second degree polynomial, y = β1 + β2x + β3x2, to this same data.

Let Vi = Xi2 and vi = Vi - E[X2].


In deviations form, the fitted least squares coefficients are:183
β̂2 = {Σxiyi Σvi2 - Σviyi Σxivi} / {Σxi2 Σvi2 - (Σxivi)2}.
β̂3 = {Σviyi Σxi2 - Σxiyi Σxivi} / {Σxi2 Σvi2 - (Σxivi)2}.
β̂1 = Ȳ - β̂2 X̄ - β̂3 E[X2].

E[X2] = {53^2 + 54^2 + 57^2 + 58^2 + 61^2 + 62^2 + 63^2 + 66^2}/8 = 3528.5.

vi = (53^2, 54^2, 57^2, 58^2, 61^2, 62^2, 63^2, 66^2) - 3528.5 =
(-719.5, -612.5, -279.5, -164.5, 192.5, 315.5, 440.5, 827.5).

xi yi xiyi vi xivi viyi xi^2 vi^2


-6.25 -5.125 32.031 -719.5 4496.9 3687.4 39.062 517680.2
-5.25 -3.125 16.406 -612.5 3215.6 1914.1 27.562 375156.2
-2.25 -0.125 0.281 -279.5 628.9 34.9 5.062 78120.2
-1.25 -1.125 1.406 -164.5 205.6 185.1 1.562 27060.2
1.75 1.875 3.281 192.5 336.9 360.9 3.062 37056.2
2.75 0.875 2.406 315.5 867.6 276.1 7.562 99540.2
3.75 3.875 14.531 440.5 1651.9 1706.9 14.062 194040.2
6.75 2.875 19.406 827.5 5585.6 2379.1 45.562 684756.2
Sum 0 0 89.750 0.0 16989.0 10544.5 143.500 2013410.0

β̂2 = {Σxiyi Σvi2 - Σviyi Σxivi} / {Σxi2 Σvi2 - (Σxivi)2} =
{(89.75)(2013410) - (10544.5)(16989)}/{(143.5)(2013410) - (16989)^2} =
1563037/298214 = 5.24133.

β̂3 = {Σviyi Σxi2 - Σxiyi Σxivi} / {Σxi2 Σvi2 - (Σxivi)2} =
{(10544.5)(143.5) - (89.75)(16989)}/{(143.5)(2013410) - (16989)^2} =
-11627/298214 = -.0389888.

X̄ = 59.25. Ȳ = 61.125.

β̂1 = Ȳ - β̂2 X̄ - β̂3 E[X2] = 61.125 - (5.24133)(59.25) - (-.0389888)(3528.5) = -111.852.

183
Below I will show how to fit a quadratic in matrix form and via the Normal Equations.
You can use whichever form you find most convenient.

Here is a graph of the fitted polynomial, y = -111.852 + 5.24133x - 0.0389888x2:

[Graph: the data points and the fitted quadratic, with Father's height (inches) on the horizontal axis
and Son's height on the vertical axis.]

One convenient way to fit least squares polynomials is to write the regression equations in
matrix form, as discussed in a previous section. Here is the same example in matrix form.

The design matrix, X, has as its columns the values of 1, Xi and Xi2.
The matrix Y, has a single column with Yi in it.

(1 53 2809) (56)
(1 54 2916) (58)
(1 57 3249) (61)
X= (1 58 3364) Y = (60)
(1 61 3721) (63)
(1 62 3844) (62)
(1 63 3969) (65)
(1 66 4356) (64)
Let X’ be the transpose of X, in other words with the rows and columns interchanged.
(8 474 28228)
X’X = (474 28228 1689498)
(28228 1689498 101615908)
(X'X)-1 = (5872.61 -199.014 1.67752)
          (-199.014 6.75156 -0.0569692)
          (1.67752 -0.0569692 0.000481198)
(489 )
X’Y = (29063 )
(1735981)
β̂ = (X'X)-1X'Y = (-111.852 )
                  (5.24133 )
                  (-0.0389888)

The fitted polynomial is: y = -111.852 + 5.24133x - 0.0389888x2, matching the previous result.

This matrix form can be applied in a similar manner to higher order polynomials.
Of course, many software programs can fit regressions.184
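
As a rough illustration (not part of the original text), the matrix computation above can be reproduced
with numpy, which is assumed to be available; np.linalg.lstsq solves the least squares problem directly
from the design matrix:

    import numpy as np

    father = np.array([53, 54, 57, 58, 61, 62, 63, 66], dtype=float)
    son = np.array([56, 58, 61, 60, 63, 62, 65, 64], dtype=float)

    # Design matrix with columns 1, X, X^2.
    X = np.column_stack([np.ones_like(father), father, father**2])
    beta, *_ = np.linalg.lstsq(X, son, rcond=None)
    print(beta)     # approximately [-111.852, 5.24133, -0.0389888]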

This matrix form is equivalent to a set of k linear equations in the k unknown parameters (here, three
equations in three unknowns), called the Normal Equations.
In general, the Normal Equations can be obtained by writing the expression for
the sum of squared errors, and setting equal to zero the partial derivative with
respect to each of the parameters.

Exercise: Obtain the Normal Equations for this example.


[Solution: Σ(Yi - Ŷi)2 = Σ(Yi - β1 - Xi β2 - Xi2 β3)2.
0 = ∂Σ(Yi - Ŷi)2/∂β1 = -2Σ(Yi - β1 - Xi β2 - Xi2 β3). ⇒ N β1 + ΣXi β2 + ΣXi2 β3 = ΣYi.
0 = ∂Σ(Yi - Ŷi)2/∂β2 = -2ΣXi(Yi - β1 - Xi β2 - Xi2 β3). ⇒ ΣXi β1 + ΣXi2 β2 + ΣXi3 β3 = ΣYiXi.
0 = ∂Σ(Yi - Ŷi)2/∂β3 = -2ΣXi2(Yi - β1 - Xi β2 - Xi2 β3). ⇒ ΣXi2 β1 + ΣXi3 β2 + ΣXi4 β3 = ΣYiXi2.
N = 8. ΣXi = 474. ΣXi2 = 28228. ΣXi3 = 1,689,498. ΣXi4 = 101,615,908.
ΣYi = 489. ΣXiYi = 29,063. ΣXi2Yi = 1,735,981.
Therefore, in this example, the Normal Equations are:
8β1 + 474β2 + 28228β3 = 489.
474β1 + 28228β2 + 1689498β3 = 29063.
28228β1 + 1689498β2 + 101615908β3 = 1735981.
Comment: These are the same equations as obtained in the matrix form.]

One can solve these three linear equations in three unknowns in the usual manner.

The first two equations imply: 1148β2 + 135912β3 = 718.


The second two equations imply: 4002068β2 + 474790848β3 = 2464630.
Therefore, 4002068(718 - 135912β3)/1148 + 474790848β3 = 2464630. ⇒ β3 = -.03898878.
⇒ β2 = 5.2413267. ⇒ β1 = -111.8516966.

The fitted polynomial is: y = -111.852 + 5.24133x - 0.0389888x2, matching the previous result.
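
A quick numerical check of the Normal Equations (an illustrative sketch, assuming numpy; any linear
equation solver would do) gives the same coefficients:

    import numpy as np

    # The three Normal Equations from the exercise above, written as A b = c.
    A = np.array([[8.0, 474.0, 28228.0],
                  [474.0, 28228.0, 1689498.0],
                  [28228.0, 1689498.0, 101615908.0]])
    c = np.array([489.0, 29063.0, 1735981.0])
    print(np.linalg.solve(A, c))   # approximately [-111.852, 5.24133, -0.0389888]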

184
For example, I used PolynomialFit in Mathematica in order to fit least squares polynomials.
Testing the Quadratic Fit:

Exercise: Determine the Error Sum of Squares for the linear and quadratic fits.
[Solution: the residuals are ε̂i = Yi - Ŷi, and ESS = Σ ε̂i2.
Linear Square of Quadratic Square of
Xi Yi Fitted Value Residual Residual Fitted Value Residual Residual
53 56 57.216 -1.216 1.479 56.419 -0.419 0.176
54 58 57.842 0.158 0.025 57.488 0.512 0.262
57 61 59.718 1.282 1.644 60.229 0.771 0.594
58 60 60.343 -0.343 0.118 60.987 -0.987 0.974
61 63 62.219 0.781 0.609 62.792 0.208 0.043
62 62 62.845 -0.845 0.714 63.238 -1.238 1.531
63 65 63.470 1.530 2.340 63.605 1.395 1.945
66 64 65.346 -1.346 1.813 64.241 -0.241 0.058
Sum 0.000 8.742 0.001 5.583
Note that the residuals for the quadratic fit would sum to zero except for rounding.]

It should be noted that the second degree least squares polynomial has a smaller sum of
squared errors at 5.583 than the least squares straight line at 8.742. Since a polynomial of
order m - 1 is a special case of a polynomial of order m, the best polynomial of order m does at
least as well and in most cases better than the best polynomial of order m - 1. However, the
principle of parsimony states one should only continue to increase the degree of the
polynomial as long as one gets a “significantly” better fit.

Exercise: Perform an F-Test of whether the quadratic regression is significantly better than the
linear regression.
[Solution: The restriction of going from the quadratic to the linear model is one dimensional.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(8.742 - 5.583)/1}/{5.583/(8 - 3)} = 2.829.
Perform a one sided F-Test at 1 and 5 d.f. 6.61 is the critical value at 5%.
Since 2.829 < 6.61 we do not reject the simpler linear model at the 5% level.
Comment: Using a computer, the p-value is: 15.3%.
This is equivalent to performing a t-test of the hypothesis that β3 = 0.]

Thus in this case, with only 8 data points, the more complicated quadratic model is not
significantly better than the simpler linear model.
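
For readers who want to reproduce the F-test numerically, here is a brief sketch (assuming scipy is
installed; it is not required by the study guide itself):

    from scipy import stats

    ess_linear, ess_quadratic = 8.742, 5.583
    q, N, k = 1, 8, 3                       # one restriction; 8 data points; 3 parameters
    F = ((ess_linear - ess_quadratic) / q) / (ess_quadratic / (N - k))
    print(F)                                # about 2.83
    print(stats.f.ppf(0.95, q, N - k))      # 5% critical value, about 6.61
    print(stats.f.sf(F, q, N - k))          # p-value, about 0.153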

As discussed previously for the linear fit of the heights example, we had:185
Parameter Table Estimate SE T-Stat p-Value
1 24.0679 5.98553 4.02102 0.00695079
x 0.625436 0.100765 6.2069 0.000806761

185
The p-values of the t-statistics were calculated on a computer.
Exercise: Test each of the coefficients of the fitted quadratic, in order to see if it is significantly
different from zero.
[Solution: s2 = ESS/(N - k) = 5.583/(8 - 3) = 1.1166.
Calculate the correlation of the two variables X and X2 = V:
rX2X3 = Σx2ix3i/√(Σx2i2 Σx3i2) = 16989/√{(143.5)(2013410)} = .999484.
Var[β̂2] = s2/{(1 - rX2X3^2)Σx2i2} = (1.1166)/{(1 - .999484^2)(143.5)} = 7.539.
sβ̂2 = √7.539 = 2.746. t = β̂2/sβ̂2 = 5.24133/2.746 = 1.909.
At N - k = 8 - 3 = 5 degrees of freedom, the 10% critical value is 2.015.
Thus we do not reject H0: β2 = 0 at the 10% level. (Using a computer, the p-value is 11.5%.)
Var[β̂3] = s2/{(1 - rX2X3^2)Σx3i2} = (1.1166)/{(1 - .999484^2)(2013410)} = .0005375.
sβ̂3 = √.0005375 = .02318. t = -0.0389888/.02318 = -1.682.
Since 1.682 < 2.015, we do not reject H0: β3 = 0 at the 10% level. (The p-value is 15.3%.)
X X^2 X^3 X^4
53 2809 148,877 7,890,481
54 2916 157,464 8,503,056
57 3249 185,193 10,556,001
58 3364 195,112 11,316,496
61 3721 226,981 13,845,841
62 3844 238,328 14,776,336
63 3969 250,047 15,752,961
66 4356 287,496 18,974,736
Sum 28,228 1,689,498 101,615,908
ΣX2i2 = ΣXi2 = 28,228. ΣX3i2 = Σ(Xi2)2 = ΣXi4 = 101,615,908.
ΣX2iX3i = ΣXi(Xi2) = ΣXi3 = 1,689,498.
Var[β̂1] = s2{ΣX2i2 ΣX3i2 - (ΣX2iX3i)2}/{N Σx2i2 Σx3i2 (1 - rX2X3^2)} =
(1.1166){(28,228)(101,615,908) - 1,689,498^2}/{8(1 - .999484^2)(143.5)(2013410)} = 6560.
sβ̂1 = √6560 = 80.99. t = -111.852/80.99 = -1.381.
Since 1.381 < 2.015, we do not reject H0: β1 = 0 at the 10% level. (The p-value is 22.6%.)]

Thus in this case, β̂3 is not significantly different from 0, and as determined before, the more
complicated quadratic model is not significantly better than the simpler linear model.186

Many would find it easier to use the matrix form discussed previously, in order to compute the
variance-covariance matrix, since one need not memorize individual formulas.187

186
Note that the F-test was equivalent to the t-test of the fitted β3. In both cases the p-value was 15.3%.
187
Note that even for the three variable model and only a few data points, carrying out all of the calculations oneself
takes too long for exam questions.

Exercise: For this quadratic fit, determine the variance-covariance matrix as s2(X’X)-1.
[Solution: From the previous calculation of the fitted coefficients using the matrix form:
(X'X)-1 = (5872.61 -199.014 1.67752)
          (-199.014 6.75156 -0.0569692)
          (1.67752 -0.0569692 0.000481198)
From the previous solution, s2 = ESS/(N - k) = 5.583/(8 - 3) = 1.1166.
s2(X'X)-1 = (6557.4 -222.22 1.8731)
            (-222.22 7.5388 -.063612)
            (1.8731 -.063612 .00053731)
Therefore, Var[β̂1] = 6557.4, Var[β̂2] = 7.5388, Var[β̂3] = .00053731,
Cov[β̂1, β̂2] = -222.22, Cov[β̂1, β̂3] = 1.8731, Cov[β̂2, β̂3] = -.063612.
Comment: The variances match those calculated previously, subject to rounding.]
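
The same variance-covariance matrix and the resulting t-statistics can be produced directly from the
design matrix; a minimal numpy sketch (an illustration, not the author's own code) follows:

    import numpy as np

    father = np.array([53, 54, 57, 58, 61, 62, 63, 66], dtype=float)
    son = np.array([56, 58, 61, 60, 63, 62, 65, 64], dtype=float)
    X = np.column_stack([np.ones_like(father), father, father**2])

    beta = np.linalg.solve(X.T @ X, X.T @ son)
    resid = son - X @ beta
    s2 = resid @ resid / (len(son) - X.shape[1])    # ESS/(N - k), about 1.117
    cov = s2 * np.linalg.inv(X.T @ X)               # variance-covariance matrix
    print(beta / np.sqrt(np.diag(cov)))             # t-statistics, about -1.38, 1.91, -1.68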

A List of Some Models:188

Linear model: Y = β1 + β2X2 + β3X3 + ε. Example: Y = 3 + 4X2 + 5X3 + 7X4 + ε.

Quadratic model: Y = β1 + β2X + β3X2 + ε. Example: Y = 1 + 2X + 8X2 + ε.

Polynomial model: Y = β1 + β2X + β3X2 + β4X3 + ε.


Example: Y = -4 + 3X + 6X2 + 2X3 + ε.

Log-Log model: ln Y = β1 + β2 lnX2 + β3 lnX3 + ε. Example: ln Y = 4 + 6 lnX2 + 7 lnX3 + ε.


Equivalent to: Y = γ1 X2^γ2 X3^γ3 ε*, where ε = ln(ε*).
Multiplicative model: Y = β1 X2^β2 X3^β3 + ε. Example: Y = 0.3 X2^2.2 X3^1.5 + ε.

Exponential model: Y = exp[β1 + β2X2 + β3X3 + ε]. Example: Y = exp[5 + .1X + ε]


Equivalent to: lnY = β1 + β2X2 + β3X3 + ε.

Reciprocal Model: Y = 1/{β1 + β2X2 + β3X3 + ε}. Example: Y = 1/{7 + 9X2 + 6X3 + ε}
Equivalent to: 1/Y = β1 + β2X2 + β3X3 + ε.

Semilog Model: Y = β1 + β2 lnX2 + β3lnX3 + ε. Example: Y = -10 + 3lnX + ε

Interaction Model: Y = β1 + β2X2 + β3X3 + β4X2X3 + ε.


Example: Y = 20 + 4X2 + 3X3 + 0.7X2X3 + ε.

188
See pages 118 to 119 of Pindyck and Rubinfeld
Bias:*

Assume we have lnY = β1 + β2X2 + ε, with εi Normal with mean zero and variance σ2, and the
other assumptions of the linear regression model.

Then Y = exp[β1 + β2X2 + ε] = exp[β1 + β2X2]e^ε. Thus the errors are multiplicative rather than
additive. e^ε is LogNormal with parameters µ = 0 and σ. E[e^ε] = exp[σ2/2] > 1.

ln Ŷ is an unbiased estimator of lnY ⇔ E[ln Ŷ] = E[lnY].
However, this does not imply that E[Ŷ] = E[Y]. Ŷ is not an unbiased estimator of Y.

Unbiasedness is not necessarily preserved under change of variables.


Problems:

22.1 (3 points) Fit the model Y = a b^t (error) to the following data.


Time: 1 2 3 4 5
Average Claim Cost: 117 132 136 149 151
What is the fitted value of b?
A. 1.065 B. 1.070 C. 1.075 D. 1.080 E. 1.085

22.2 (4 points) For homeowner’s insurance, you are given the following series of average
written premium per home insured at current rate level:
1991 1156.83
1992 1152.34
1993 1153.64
1994 1150.22
1995 1144.52
1996 1150.11
1997 1164.21
1998 1178.57
1999 1193.75
Fit an exponential regression. Use this fitted model to predict the average written premium per
home insured at current rate level for the year 2002.
A. 1170 B. 1180 C. 1190 D. 1200 E. 1210

22.3 (10 points) Fit the model Y = β1 + β2X2 + β3X3 + β4X2X3 + ε to 15 observations:
X2 X3 Y
29 12 2841
21 8 1876
62 10 2934
18 10 1552
40 11 3065
50 11 3670
65 5 2005
44 8 3215
17 8 1930
70 6 2010
20 9 3111
29 9 2882
15 5 1683
14 7 1817
33 12 4066
Test whether X2 and X3 interact.
(You may use a computer to help with the computations.
You may check your work using a regression package.)
Use the following information for the next 5 questions:
The model Y = β1 + β2X + β3X2 + ε is fit to the following data:
X: 11.7 25.3 90.2 213.0 10.2 17.6 32.6 81.3 141.5 285.7
Y: 15.3 9.3 6.5 6.0 15.7 10.0 8.6 6.4 5.6 6.0

22.4 (2 points) What is β̂1?
(A) Less than 15
(B) At least 15, but less than 16
(C) At least 16, but less than 17
(D) At least 17, but less than 18
(E) At least 18

22.5 (3 points) What is β̂2?
(A) Less than -2
(B) At least -2, but less than -1
(C) At least -1, but less than 0
(D) At least 0, but less than 1
(E) At least 1

22.6 (3 points) What is β̂3?
(A) Less than .0001
(B) At least .0001, but less than .0002
(C) At least .0002, but less than .0003
(D) At least .0003, but less than .0004
(E) At least .0004

22.7 (3 points) What is the estimated variance of this regression?


(A) Less than 3.0
(B) At least 3.0, but less than 3.5
(C) At least 3.5, but less than 4.0
(D) At least 4.0, but less than 4.5
(E) At least 4.5

22.8 (3 points) Test the hypothesis that β3 = 0.


A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

22.9 (2 points) A regression model Y = β1 + β2X2 + β3X3 + β4X2X3 + ε, where


Y is the auction price of an antique,
X2 is the age of an antique,
and X3 is the number of bidders on the antique,
has been fit, with: β̂1 = 300, β̂2 = 0.9, β̂3 = -90, β̂4 = 1.3.
What is the expected change in auction price for an increase of 2 in the number of bidders?

22.10 (4 points) You are given the following data on 6 planets.


For each planet you are given X, how far it is from the sun (in astronomical units), and Y, how
many days it takes to go around the sun once.
X: 0.39 0.72 1.00 1.52 5.20 9.54
Y: 88 225 365 687 4333 10,759
Fit via least squares the model Y = aX^b.
Estimate how many days it would take a planet at a distance of 19.2 to go around the sun.
(A) 23,000 (B) 25,000 (C) 27,000 (D) 29,000 (E) 31,000

22.11 (5 points) You are given the following 12 values of a consumer price index:
Third Quarter 2002 122.0
Fourth Quarter 2002 123.9
First Quarter 2003 124.9
Second Quarter 2003 127.0
Third Quarter 2003 131.5
Fourth Quarter 2003 132.6
First Quarter 2004 135.3
Second Quarter 2004 137.6
Third Quarter 2004 139.7
Fourth Quarter 2004 141.2
First Quarter 2005 143.1
Second Quarter 2005 148.0
Fit via least squares the model, ln(Y) = a + b t. Use the fitted model to predict the value of the
consumer price index in the Third Quarter of 2006.
A. 154 B. 156 C. 158 D. 160 E. 162

22.12 (2 points) You use the method of least squares to fit the model Yi = α + β√Xi to the
following data:
Xi 1 3 4 4 7
Yi 0 1 3 5 6
Determine the least squares estimate β̂.
(A) 3.1 (B) 3.3 (C) 3.5 (D) 3.7 (E) 3.9
22.13 (5 points) Use the following 4 observations:
X: -1 1 3 5
Y: 3 4 7 6
Fit a least squares quadratic, Y = β1 + β2X + β3X2, and use it to estimate y for x = 6.
(A) 6.0 (B) 6.2 (C) 6.4 (D) 6.6 (E) 6.8

22.14 (5 points) Polynomials are fit via least squares to the following mortality data:
Age Mortality per 1000 Age Mortality per 1000
27 3.89 67 40.74
32 2.45 72 59.55
37 2.49 77 86.02
42 3.81 82 145.42
47 6.34 87 172.15
52 10.49 92 230.80
57 15.94 97 271.60
62 26.91
The results are as follows:
Straight line: y = -150.442 + 3.58626 x.
2nd degree: y = 166.68 - 8.05695 x + 0.0938969 x2.
3rd degree: y = 2.70544 + 1.34511 x - 0.0695867 x2 + 0.000878944 x3.
4th degree: y = -229.788 + 19.3897 x - 0.559033 x2 + 0.00642612 x3 - 0.0000223677 x4.
5th degree: y = 509.078 - 52.7309 x + 2.10662 x2 - 0.0404085 x3 + 0.000370816 x4
- .00000126833 x5.
Fitted Polynomial Error Sum of Squares
Straight line 24005.9
2nd degree 1273.7
3rd degree 582.3
4th degree 433.7
5th degree 282.9
Based on a significance level of 5%, which polynomial should one use?
A. Straight line B. 2nd degree C. 3rd degree
D. 4th degree E. 5th degree

22.15 (4 points) In the previous question, compute and compare both R2 and R̄2 (the corrected R2) for
the various models, including a 6th and 7th degree polynomial.
The ESS for the sixth degree polynomial is 278.2.
The ESS for the seventh degree polynomial is 271.2.

22.16 (2, 5/83, Q. 9) (1.5 points) For the regression model Yi = b ln(xi) + εi, i = 1, 2, assume
that x1 = e and x2 = e2 and that ε1 and ε2 are independent random variables with mean zero
and unknown variance σ2.
What is the least squares estimator for b based on (x1, Y1) and (x2, Y2)?
A. (Y1 + 2Y2)/3 B. (Y1 + Y2)/3 C. (Y1 + 2Y2)/5
D. (exp[Y1] + exp[Y2])/3 E. (exp[Y1] + exp[2Y2])/3
22.17 (165, 11/86, Q.11) (1.8 points) You are given:
x x2 ln(ux) x·ln(ux)
38 1444 -6.23 -236.74
39 1521 -6.04 -235.56
40 1600 -6.00 -240.00
41 1681 -5.96 -244.36
42 1764 -5.77 -242.34
200 8010 -30.00 -1199.00
where the ux are the observed values of the force of mortality µx.
Based on a Gompertz form, S(x) = exp[-m(c^x - 1)], fitted by linear regression, estimate ln(µ41).
(A) -6.115 (B) -6.100 (C) -6.000 (D) -5.900 (E) -5.885

22.18 (165, 11/87, Q.8) (2.1 points) You are fitting the model µ[x]+r = B d^r C^(x+r)
to observed forces of mortality.
The values u[x]+r (x = 21, 22; r = 0, 1, 2, 3) are the logs of the observed forces of mortality.
Define:
SS = Σr=0 to 3 Σx=21 to 22 (u[x]+r - λ1 - λ2r - λ3x)2.
One of the normal equations to solve for the least-squares estimates of λ1, λ2 and λ3 is:
fλ1 + gλ2 + hλ3 = Σr=0 to 3 Σx=21 to 22 u[x]+r.
Determine f + g + h.
(A) 158 (B) 175 (C) 192 (D) 316 (E) 384

22.19 (2, 5/88, Q. 15) (1.5 points) Let (x1, y1), (x2, y2),. . . . , (xn, yn), be pairs of
observations. The curve y = θe^x is to be fitted to this data set.
What is the least squares estimate for θ?
A. Σi=1 to n yi / Σi=1 to n exp[xi]
B. Σi=1 to n yi exp[xi] / Σi=1 to n exp[2xi]
C. Σi=1 to n yi exp[xi] / Σi=1 to n exp[xi]
D. Σi=1 to n yi / Σi=1 to n exp[2xi]
E. Σi=1 to n (yi / exp[xi])

22.20 (165, 11/89, Q.11) (1.7 points) You wish to fit the Gompertz form, µx = Bc^x, using
linear regression.
You are given:
x ln(µx)
1 -3.10
2 -3.07
3 -3.06
Determine ln(c).
(A) 0.010 (B) 0.015 (C) 0.020 (D) 0.025 (E) 0.030

22.21 (165, 11/90, Q.19) (1.9 points) You are given the following initial estimates, u[x]+t,
and graduated values, v[x]+t, which were obtained by fitting the form A + Bx + Ct2 by least
squares:
Initial Estimates u[x]+t:
x \ t:  0      1       2
0:      u[0]   1.5     1.8
1:      1.5    1.8     2.3
2:      1.8    2.3     2.4
Graduated Values v[x]+t:
x \ t:  0      1       2
0:      1.4    v[0]+1  2.0
1:      v[1]   1.8     v[1]+2
2:      v[2]   v[2]+1  v[2]+2
Determine u[0].
(A) 0.6 (B) 1.4 (C) 1.5 (D) 2.3 (E) 2.4

22.22 (165, 5/91, Q.16) (1.9 points) A Gompertz form, S(x) = exp[-m(c^x - 1)], has been fitted
by linear regression. You are given:
x ln(ux) x·ln(ux)
1 - 8.37 - 8.37
2 - 8.29 - 16.58
3 - 8.21 - 24.63
4 - 8.12 - 32.48
5 - 8.01 - 40.05
Total -41.00 -122.11
where the ux are the observed values of the force of mortality µx.
Determine the estimate of In µ4.
(A) -8.15 (B) -8.13 (C) -8.11 (D) -8.09 (E) -8.07
22.23 (2, 5/92, Q.7) (1.7 points) Let (x1, y1), . . . , (xn, yn) be n pairs of observations.
The curve θ ln(x) is to be fitted to this data set. What is the least squares estimate for θ?
A. Σi=1 to n yi ln(xi) / Σi=1 to n 2 ln(xi)
B. Σi=1 to n yi ln(xi) / Σi=1 to n {ln(xi)}2
C. Σi=1 to n yi ln(xi) / Σi=1 to n ln(xi)
D. Σi=1 to n yi / Σi=1 to n {ln(xi)}2
E. Σi=1 to n yi / Σi=1 to n ln(xi)

22.24 (2, 2/96, Q.9) (1.7 points) You are given the model E(Yi) = θ(xi + xi2)
and the following data:
i xi yi
1 1 4
2 2 8
3 3 14
Calculate the least squares estimate of θ.
A. 13/92 B. 28/23 C. 13/10 D. 9/2 E. 56/5

22.25 (165, 5/96, Q.17) (1.9 points) You are given:


x ln ux
5 -3.9
10 -3.3
15 -2.8
where ux is the observed force of mortality.
The force of mortality is fit by linear regression to the form µx = kx^b.
Determine the estimate of ln µ15.
(A) - 2.88 (B) -2.84 (C) -2.80 (D) -2.76 (E) - 2.72

22.26 (165, 11/97, Q.17) (1.9 points) You are fitting the model µ[x]+r = B d^r C^(x+r) to
observed forces of mortality.
The values u[x]+r (x = 11, 12; r = 0, 1, 2, 3) are the logs of the observed forces of mortality.
Define:
SS = Σr=0 to 3 Σx=11 to 12 (u[x]+r - λ1 - λ2r - λ3x)2.
One of the normal equations to solve for the least-squares estimates of λ1, λ2 and λ3 is:
fλ1 + gλ2 + hλ3 = Σr=0 to 3 Σx=11 to 12 u[x]+r.
Determine f + g + h.
(A) 43 (B) 89 (C) 112 (D) 124 (E) 140

22.27 (Course 120 Sample Exam #1, Q.1) (2 points)


You fit the regression model Yi = βXi2 + εi to n observations (Xi, Yi).
Which of the following is the correct expression for the least squares estimate of β?
(A) ΣYiXi2 /ΣXi4
(B) ΣYi ΣXi2 /ΣXi4
(C) Y ΣXi2 /ΣXi4
(D) ΣYi /ΣXi2
(E) ΣYi Xi/ΣXi2

22.28 (Course 120 Sample Exam #3, Q.1) (2 points) You use the method of least
squares to fit the model Yi = α + βXi2 + εi to the following data:
Xi 0 0 1 2 2
Yi 2 4 8 16 20
Determine the least squares estimate β̂.
(A) 1.0 (B) 2.4 (C) 3.7 (D) 4.6 (E) 5.5
22.29 (CAS3, 11/05, Q.9) (2.5 points)
The following information is known about average claim sizes:
Year Average Claim Size
1 $1,020
2 1,120
3 1,130
4 1,210
5 1,280
Average claim sizes, Y, in year X are modeled by:
Y = α e^(βX)
Using linear regression to estimate α and β, calculate the predicted average claim size in year 6.
A. Less than $1,335
B. At least $1,335, but less than $1,340
C. At least $1,340, but less than $1,345
D. At least $1,345, but less than $1,350
E. At least $1,350

Section 23, Dummy Variables189

A dummy variable is one that is discrete rather than continuous.


Most commonly a dummy variable takes on only the values 0 or 1.

For example, Bergen Insurance has a special set of insurance agents, who are members of its
“Gold Circle” program. Bergen Insurance believes the loss ratios from business written through
these Gold Circle agents is better than that from its other agents. One can fit a regression using
a dummy variable to test this hypothesis.

Let Xi = 1 if an agent is a Gold Circle.


Let Xi = 0 if an agent is not Gold Circle.
Then X is an example of a dummy variable.
Let Yi = the loss ratio for agent i.
A simple model is: Yi = β1 + β2Xi + εi.

Then for the Gold Circle Agents, Xi = 1 and the expected loss ratio is β1 + β2.
For the other agents, Xi = 0 and the expected loss ratio is β1.
Thus β2 measures the difference in expected loss ratio between Gold Circle Agents and other
agents.

Exercise: The above regression model is fit to data for 22 Gold Circle Agents and 40 other
agents. The results are: β̂1 = 73.1, β̂2 = -7.2, sβ̂1 = 3.4, sβ̂2 = 2.9.
Test H0, the hypothesis that β2 = 0. The alternative is β2 ≠ 0, Gold Circle agents are different.
[Solution: t = β̂2/sβ̂2 = -7.2/2.9 = -2.483.
To test H0 one performs a two-sided t-test with 62 - 2 = 60 degrees of freedom.
Since 2.390 < 2.483 < 2.660, reject H0 at 2% and do not reject H0 at 1%.]

Gold Circle agents are better than others ⇔ their expected loss ratio is lower: β2 < 0.
Let H0 be the hypothesis that β2 ≥ 0. Test H0 versus the alternative that β2 < 0.
To test H0, one performs a one-sided t-test with 60 degrees of freedom, t = -2.483.
Since 2.390 < 2.483 < 2.660, reject H0 at 2%/2 = 1% and do not reject H0 at 1%/2 = 0.5%.
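
The p-values quoted for these t-tests can be checked with a short sketch (scipy is assumed to be
available; the estimate and standard error are the ones given in the exercise):

    from scipy import stats

    beta2_hat, se = -7.2, 2.9
    df = 62 - 2
    t = beta2_hat / se                       # about -2.48
    print(2 * stats.t.sf(abs(t), df))        # two-sided p-value, between 1% and 2%
    print(stats.t.sf(abs(t), df))            # one-sided p-value, between 0.5% and 1%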

A somewhat more complex model would take into account others things about each agent. For
example, we might take into account how many years the agent has been writing insurance for
Bergen Insurance.

Such a model could be: Yi = β1 + β2X2i + β3X3i + εi, where X2 is the Gold Circle Dummy,
X3 is the years with Bergen Insurance, and Y is the loss ratio.

189
See Section 5.2 and Appendix 5.1 of Pindyck and Rubinfeld.
Exercise: The above regression model is fit to data for 22 Gold Circle Agents and 40 other
agents. The results are: β̂1 = 80.6, β̂2 = -3.7, β̂3 = -.53, sβ̂1 = 2.6, sβ̂2 = 2.2, sβ̂3 = .17.
Let H0 be the hypothesis that β2 ≥ 0. Test H0 versus the alternative that β2 < 0.
[Solution: t = β̂2/sβ̂2 = -3.7/2.2 = -1.682.
To test H0 one performs a one-sided t-test with 62 - 3 = 59 degrees of freedom (use the tabled values for 60).
Since 1.671 < 1.682 < 2.000, reject H0 at 10%/2 = 5% and do not reject H0 at 5%/2 = 2.5%.]

One can use more than one dummy variable in a model. For example, let us assume you
divide the towns (and cities) in a state into three categories: Urban, Suburban, and Rural. You
wish to see whether your loss ratios are significantly different between the different types of
towns of the state.190
Let Yi = the loss ratio for town i.
Let X2i = 1 if the town is urban and 0 otherwise.
Let X3i = 1 if the town is suburban and 0 otherwise.

Then we have the following table:191

Type X2 X3
Urban 1 0
Suburban 0 1
Rural 0 0

Exercise: You fit the model Yi = β1 + β2X2i + β3X3i + εi, and β̂1 = 77.2, β̂2 = -3.7, β̂3 = 1.4.
Assuming the insurer does not alter its practices, what are the expected loss ratios for the three
categories?
[Solution: For Rural, X2 = 0 and X3 = 0, so the mean is β1 = 77.2.
For Urban, X2 = 1 and X3 = 0, so the mean is β1 + β2 = 77.2 - 3.7 = 73.5.
For Suburban, X2 = 0 and X3 = 1, so the mean is β1 + β3 = 77.2 + 1.4 = 78.6.]

One can do a joint test of the significance of all of the fitted coefficients for the dummy
variables. One uses the F-Test as described in a previous section.

For example, let us assume that the above regression is fit to 25 towns, with error sum of squares
of 412 and total sum of squares of 577. Then the F-Statistic to test whether
β2 = β3 = 0 is: {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {RSS/(k-1)}/{ESS/(N - k)} =
{(577 - 412)/2}/{412/(25 - 3)} = 4.41, with 2 and 22 degrees of freedom.
Since 3.44 < 4.41 < 5.72, we reject the hypothesis that β2 = β3 = 0 at the 5% level and do not
reject at the 1% significance level.
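
A short sketch of this joint F-test (scipy is assumed for the critical values; the sums of squares are
the ones given for the 25 towns):

    from scipy import stats

    tss, ess = 577.0, 412.0
    N, k = 25, 3
    q = k - 1                                  # the two dummy coefficients set to zero
    F = ((tss - ess) / q) / (ess / (N - k))
    print(F)                                   # about 4.41
    print(stats.f.ppf(0.95, q, N - k))         # about 3.44
    print(stats.f.ppf(0.99, q, N - k))         # about 5.72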
190
For example you might charge more in an urban area and expect more losses.
However, the expected loss ratio might be the same in an urban area as for a rural area.
191
There are equally good equivalent ways to use dummy variables to deal with this situation.
For example, one could instead have a dummy variable for rural and one for suburban.
Dummy variables can also be useful when there is a one time effect. For example, assume
claim frequencies have been changing at a constant rate. Then a good model would be:
Y = α + βt + ε.

For example, if α = .17 and β = .001, then the graph of frequency versus time is:

[Graph: expected frequency versus time, rising linearly from about 0.171 at t = 1 to about 0.184 at t = 14.]

Instead, assume that at time 5 there is a one time change in the level of claim frequency.192
Also assume that the expected rate of change in frequency is the same, both before and
after the change. Then a good model would be:
Y = α1 + α2D + βt + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if α1 = .17, α2 = -.02, and β = .001, then the graph of frequency versus time is:

[Graph: frequency versus time, rising with slope .001 but dropping by .02 at t = 5; about .174 just
before the break and about .155 just after, reaching about .164 at t = 14.]

192
This could be a change in a liability law, a change in the definition of a “claim” in the insurers data base, passage of
a seat belt law, etc.
If instead we assume an event that produced at t = 5 a change in the expected rate of change
in frequency, then a model would be:
Y = α1 + β1t + β2(t-5)D + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if α1 = .17, β1 = .001, and β2 = -.0005, then the graph of frequency versus time
is:193

[Graph: frequency versus time, rising with slope .001 up to t = 5 and with slope .0005 thereafter; the
graph is continuous, going from about .171 at t = 1 to about .1795 at t = 14.]

We can combine these two situations; if instead we assume an event that produced at t = 5
both a one time change in the level and a change in the expected rate of change in frequency,
then a good model would be:
Y = α1 + α2D + β1t + β2tD + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if α1 = .17, α2 = -.01, β1 = .001, and β2 = -.0005, then the graph of frequency
versus time is:194

[Graph: frequency versus time, rising with slope .001 before t = 5, then dropping and rising with slope
.0005 after t = 5; about .174 just before the break, .1625 just after, and about .167 at t = 14.]

193
Note that this graph is continuous. This is an example of piecewise linear regression, to be discussed in a
subsequent section.
194
This is an example of the “switching regression method”, discussed in Section 5.4.1 of Pindyck and Rubinfeld.
One can apply the same ideas to exponential regressions.

Exercise: For the dependent variable, Y, you calculate the average claim costs on closed
claims by year during 1990-99. You define the variable X as the year.
You also define a variable D as:
D = 0 for years 1996 and prior, and 1 for years 1997 and later.
What is the assumption behind each of the following models:195
(A) Y = α1^D β1^X ε  (B) Y = α1 α2^D β1^X ε  (C) Y = α1 β1^X β2^(XD) ε
(D) Y = α1 α2^D β1^X β2^(XD) ε  (E) Y = α1 α2^D X^β1 ε
[Solution: (A) ln Y = D ln(α1) + X ln(β1) + ε: constant expected percent rate of inflation over the
whole period, one time change in average cost between 1996 and 1997. The problem with this
model is that the fitted cost in 1990 is β1^1990, unlikely to be meaningful.
(B) ln Y = ln(α1) + D ln(α2) + X ln(β1) + ε: constant expected percent rate of inflation over the
whole period, one time change in average cost between 1996 and 1997.
(C) ln Y = ln(α1) + X ln(β1) + XD ln(β2) + ε: constant expected percent rate of inflation over the
period 1990 to 1996 and another (possibly) different constant expected percent rate of inflation
over the period 1997 to 1999.
(D) ln Y = ln(α1) + D ln(α2) + X ln(β1) + XD ln(β2) + ε: combines the assumptions of B and C.
(E) ln Y = ln(α1) + D ln(α2) + β1 ln(X) + ε: a one time change in average cost between 1996 and
1997, and otherwise expected costs are a power of the year (unlikely to be a useful model).]
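
To make model (B) concrete, here is a hypothetical sketch of how it could be fit by ordinary least
squares on the logs; the claim-cost figures below are invented purely for illustration and are not from
any exam question:

    import numpy as np

    year = np.arange(1990, 2000)
    cost = np.array([1000, 1050, 1100, 1160, 1210, 1270, 1340, 1130, 1190, 1250.0])
    D = (year >= 1997).astype(float)

    # ln Y = ln(a1) + D ln(a2) + X ln(b1); X is measured from 1990 so the
    # intercept corresponds to the 1990 level.
    X = np.column_stack([np.ones_like(D), D, year - 1990.0])
    coef, *_ = np.linalg.lstsq(X, np.log(cost), rcond=None)
    a1, a2, b1 = np.exp(coef)
    print(a1, a2, b1)   # 1990 level, one-time shift factor, annual inflation factor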

195
See 4, 5/01, Q.24.
Problems:

23.1 (2 points) A tort reform law was passed in a state to be effective on January 1, 1995. For
liability insurance, you believe the law immediately reduced claim severity as well as reducing
the future rate of inflation. However, you assume there was a constant rate of inflation before
the law was effective and another but lower constant rate after the law was effective. You use a
multiple regression model to test these assumptions.
For the dependent variable, Y, you calculate the average claim costs on liability claims in this
state by year during 1991-1998.
You define the variable X as the year, with 1 corresponding to 1991.
You also define a variable D as:
D = 0 for years 1994 and prior, and 1 for years 1995 and later.
Assume a lognormal error component.
Which of the following models would be used to test the assumptions?
(A) Y = α1^D β1^X ε
(B) Y = α1 α2^D β1^X ε
(C) Y = α1 β1^X β2^(XD) ε
(D) Y = α1 α2^D β1^X β2^(XD) ε
(E) None of the above.

Use the following information for the next two questions:


A model has been fit of private passenger automobile liability insurance costs for a certain
territory and class: Ŷi = 100 - 2Xi + 10D1i + 4D2i - XiD1i + 3D1iD2i, where
Xi = age of the car in years, (0 = new, 1 = one year old, etc.)
D1i = 1 if the car is an SUV, and 0 otherwise.
D2i = 1 if the car has a cell phone installed, and 0 otherwise.

23.2 (2 points) What is the expected difference in cost between a 7 year old SUV with a cell
phone and a 2 year old car that is not an SUV and that has no cell phone?
(A) -5 (B) 0 (C) 5 (D) 10 (E) 15

23.3 (2 points) For this class and territory you have the following information:
Not SUV and no cell phone: 3000 cars with an average age of 6.7.
Not SUV and with cell phone: 4000 cars with an average age of 6.1.
SUV and no cell phone: 1000 cars with an average age of 4.9.
SUV and with cell phone: 2000 cars with an average age of 4.4.
What is the overall average cost per car?
(A) 93.0 (B) 93.5 (C) 94.0 (D) 94.5 (E) 95.0
Use the following information for the next 7 questions:
The following data has been collected on 100 corporate executives:
X2 = years of experience
X3 = years of education
X4 = 1 if male and 0 if female
X5 = number of employees supervised
X6 = corporate assets
Y = ln(annual salary)
The following different models have been fit, with the corresponding values of R2:
Y = 9.86 + .0436X2 + .0309X3 + .117X4 + .000326X5 + .00239X6 - .000635X22
+ .000302X4X5; R2 = .9401. (All terms included in the regression.)
Y = 10.29 + .0281X3 + .122X4 + .000301X5 + .00155X6 - .000923X22 + .000269X4X5;
R2 = .8524. (No X2 term.)
Y = 10.34 + .0404X2 + .145X4 + .000302X5 + .00260X6 - .000510X22 + .000221X4X5;
R2 = .8685. (No X3 term.)
Y = 9.92 + .0438X2 + .0316X3 + .000121X5 + .00254X6 - .000652X22 + .000571X4X5;
R2 = .9336. (No X4 term.)
Y = 9.96 + .0432X2 + .0306X3 - 0.0101X4 + .00262X6 - .000635X22 + .000626X4X5;
R2 = .9289. (No X5 term.)
Y = 10.26 + .0403X2 + .0317X3 + .138X4 + .000379X5 - .000507X22 + .000266X4X5;
R2 = .9212. (No X6 term.)
Y = 10.02 + .0269X2 + .0298X3 + .123X4 + .000326X5 + .00204X6 + .000274X4X5;
R2 = .9264. (No X22 term.)
Y = 9.81 + .0433X2 + .0301X3 + .228X4 + .000543X5 + .00229X6 - .000605X22;
R2 = .9331. (No X4X5 term.)
Y = 10.43 + .0306X3 + .00383X4 + .0000552X5 + .00216X6 + .000648X4X5;
R2 = .3605. (No X2 and X22 terms.)
Y = 9.89 + .0410X2 + .0298X3 + .000394X5 + .00301X6 - .000494X22;
R2 = .7695. (No X4 and X4X5 terms.)
Y = 10.04 + .0385X2 + .0230X3 + .186X4 + .00297X6 - .000423X22;
R2 = .8234. (No X5 and X4X5 terms.)

23.4 (1 point) Determine the value of the F statistic used to test whether gender (male/female)
is significant in determining salary.

23.5 (1 point) Determine the value of the F statistic used to test whether years of education is
significant in determining salary.
23.6 (1 point) Determine the value of the F statistic used to test whether years of experience is
significant in determining salary.

23.7 (1 point) Determine the value of the F statistic used to test whether corporate assets is
significant in determining salary.

23.8 (1 point) Determine the value of the F statistic used to test whether number of employees
supervised is significant in determining salary.

23.9 (1 point) Determine the value of the F statistic used to test whether the interactive term
between gender and number of employees supervised is significant in determining salary.

23.10 (1 point) Determine the value of the F statistic used to test whether the term involving
the square of the years of experience is significant in determining salary.

23.11 (2 points) Some people believe that soda machines in schools leads to an increase in
obesity among the students.
You have access to information on 283 public high schools across the state, including the
average weight of the students in each school.
The null hypothesis is that the average weight is the same regardless of whether a school has
soda machines, while the alternative hypothesis is that the average weight is higher for
schools with soda machines.
Briefly discuss how you would set up a regression model to test this hypothesis.

23.12 (5 points) You are considering linear regression models of the annual amount spent on
clothing by single persons of ages 26 to 30 living in Manhattan.
Let X = annual pretax income of the individual.
Let D = 1 if the individual is female (and zero if male).
Y = annual amount spent on clothing by the individual.
List five common but different linear regression models using an intercept, X, and possibly D.
Briefly explain the assumptions behind each.

23.13 (2 points) Let Y be the size of loss.


Let X2 = 1 if the loss occurred in the Spring and 0 otherwise.
Let X3 = 1 if the loss occurred in the Summer and 0 otherwise.
Let X4 = 1 if the loss occurred in the Fall and 0 otherwise.
You fit the model Yi = β1 + β2X2i + β3X3i + β4X4i + εi to 1000 losses.
The error sum of squares (ESS) is 1,143,071.
The total sum of squares (TSS) is 1,155,820.
Determine the value of the F statistic used to test whether season is significant in determining
size of loss.
A. Less than 2.5
B. At least 2.5, but less than 3.0
C. At least 3.0, but less than 3.5
D. At least 3.5, but less than 4.0
E. 4.0 or more
23.14 (Course 120 Sample Exam #2, Q.9) (2 points) An insurance company uses a
model to predict Yi = daily phone sales for personnel in New York, Chicago, Portland.
The model is:
Ŷi = 12 + 3Xi + (2)(Xi - 5)D1i - (2)(Xi - 2)D2i + (Xi - 5)D3i,
where Xi = salesperson’s years of experience,
D1i = 1 if Xi ≥ 5, and 0 otherwise,
D2i = 1 if New York, and 0 otherwise,
D3i = 1 if Chicago, and 0 otherwise.
Calculate the predicted phone sales for a salesperson with 7 years of experience located in
Chicago.
(A) 32 (B) 33 (C) 35 (D) 37 (E) 39

23.15 (Course 120 Sample Exam #3, Q.7) (2 points) The following model is used to
estimate the amount of fire damage Y (in thousands):
Ŷi = 8 + 5X1i + 2(X1i - 4)X2i + 9X3i - 2X1iX3i
where X1i = the distance from the nearest fire station, in kilometers,
X2i = 1 if X1i ≥ 4 and 0 otherwise,
X3i = 1 if the city is A and 0 if the city is B.
For the fires that took place at least 4 kilometers from the fire station, determine the distance
from the nearest fire station for which the average fire damage for city A and city B is the same.
(A) 4.25 (B) 4.50 (C) 8.50 (D) 28.00 (E) 29.00

23.16 (4, 5/01, Q.5) (2.5 Points) A professor ran an experiment in three sections of a
psychology course to show that the more digits in a number, the more difficult it is to remember.
The following variables were used in a multiple regression:
X2 = number of digits in the number
X3 = 1 if student was in section 1, 0 otherwise
X4 = 1 if student was in section 2, 0 otherwise
Y = percentage of students correctly remembering the number
You are given:
(i) A total of 42 students participated in the study.
(ii) The regression equation Y = β1 + β2X2 + β3X22 + β4X3 + β5X4 + ε
was fit to the data and resulted in R2 = 0.940.
(iii) A second regression equation Y = γ1 + γ2X2 + γ3X22 + ε
was fit to the data and resulted in R2 = 0.915.
Determine the value of the F statistic used to test whether class section is a significant
variable.
(A) 5.4 (B) 7.3 (C) 7.7 (D) 7.9 (E) 8.3
23.17 (4, 5/01, Q.24) (2.5 points) Your claims manager has asserted that a procedural
change in the claims department implemented on January 1, 1997 immediately reduced claim
severity by 20 percent. You use a multiple regression model to test this assertion.
For the dependent variable, Y, you calculate the average claim costs on closed claims by year
during 1990-99. You define the variable X as the year.
You also define a variable D as:
D = 0 for years 1996 and prior, and 1 for years 1997 and later.
Assuming a lognormal error component and constant inflation over the entire period, which of
the following models would be used to test the assertion?
(A) Y = α1^D β1^X ε
(B) Y = α1 α2^D β1^X ε
(C) Y = α1 β1^X β2^(XD) ε
(D) Y = α1 α2^D β1^X β2^(XD) ε
(E) Y = α1 α2^D X^β1 ε

23.18 (4, 11/02, Q.20) (2.5 points) You study the impact of education and number of
children on the wages of working women using the following model:
Y = a + b1E + b2F + c1G + c2H + ε
where Y = ln(wages)
E = 1 if the woman has not completed high school, 0 if the woman has completed high school,
    -1 if the woman has post-secondary education.
F = 1 if the woman has completed high school, 0 if the woman has not completed high school,
    -1 if the woman has post-secondary education.
G = 1 if the woman has no children, 0 if the woman has 1 or 2 children,
    -1 if the woman has more than 2 children.
H = 1 if the woman has 1 or 2 children, 0 if the woman has no children,
    -1 if the woman has more than 2 children.
Determine the expected difference between ln(wages) of a working woman who has
post-secondary education and more than 2 children and ln(wages) of the average for all
working women.
(A) a - b1 - b2
(B) b1 + b2
(C) -b1 - b2
(D) a - b1 - b2 + c2
(E) -b1 - b2 - c1 - c2
23.19 (4, 11/03, Q.5) (2.5 points)
For the model Yi = α + βXi + εi, where i = 1, 2,...,10, you are given:
(i) Xi = 1, if the ith individual belongs to a specified group
0, otherwise
(ii) 40 percent of the individuals belong to the specified group.
(iii) The least squares estimate of β is β̂ = 4.
(iv) Σ(Yi - α̂ - β̂Xi)2 = 92.
Calculate the t statistic for testing H0: β = 0.
(A) 0.9 (B) 1.2 (C) 1.5 (D) 1.8 (E) 2.1

23.20 (4, 11/03, Q.9) (2.5 points) You are given:


(i) Ytij is the loss for the jth insured in the ith group in Year t.
(ii) Ȳti is the mean loss in the ith group in Year t.
(iii) Xij = 0, if the jth insured is in the first group (i = 1); 1, if the jth insured is in the second group (i = 2).
(iv) Y2ij = δ + φY1ij + θXij + εij, where i = 1, 2 and j = 1, 2,..., n.
(v) Ȳ21 = 30, Ȳ22 = 37, Ȳ11 = 40, Ȳ12 = 41.
(vi) φ̂ = 0.75.
Determine the least-squares estimate of θ.
(A) 5.25 (B) 5.50 (C) 5.75 (D) 6.00 (E) 6.25

Section 24, Piecewise Linear Regression196


Piecewise Linear Regression uses a model made up of a series of straight line
segments, with the entire model continuous.

For example, assume some event has produced at time = 5 a change in the expected rate of
change in claim frequency, then a model would be:
Y = β1 + β2t + β3(t - 5)D + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if β1 = .17, β2 = .001, and β3 = -.0005, then the graph of frequency versus time
is:

[Graph: frequency versus time, rising with slope .001 up to t = 5 and with slope .0005 thereafter; the
graph is continuous, from about .171 at t = 1 to about .1795 at t = 14.]

Note that the graph is continuous; the term β3(t - 5)D is zero at time = 5.
While the slope before t = 5 is .001, after t = 5 the slope is: .001 - .0005 = .0005.

In general, a piecewise linear regression with one structural break at time s could be written:
Y = β1 + β2t + β3(t - s)D + ε, where D is 0 for t < s and D is 1 for t ≥ s.

A piecewise linear regression with two structural breaks at times s1 and s2 could be written:
Y = β1 + β2t + β3(t - s1)D1 + β4(t - s2)D2 + ε, where D1 is 0 for t < s1 and D1 is 1 for t ≥ s1,
and D2 is 0 for t < s2 and D2 is 1 for t ≥ s2.
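
A brief sketch of fitting the one-break model by least squares (numpy is assumed; the break point and
the simulated data below are purely illustrative, not from the text):

    import numpy as np

    t = np.arange(1.0, 16.0)
    s = 5.0
    D = (t >= s).astype(float)
    # Simulated observations scattered around the piecewise line used in the example above.
    y = 0.17 + 0.001*t - 0.0005*(t - s)*D + np.random.normal(0, 0.0005, t.size)

    X = np.column_stack([np.ones_like(t), t, (t - s)*D])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta)   # estimates of (beta1, beta2, beta3); beta2 + beta3 is the slope after the break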

One can apply the same idea to exponential regressions, with ln(Y) piecewise linear.
For example if claim severity were 1000 at time 0, and increasing at 5% per year before
time = 4 and at 3% per year after time = 4, then an appropriate model would be:
Y = 1000(1.05^t)(1.03^(D(t-4)))ε, where D = 0 for t < 4 and D = 1 for t ≥ 4.

196
See Section 5.4 of Pindyck and Rubinfeld.
Checking Whether the Slope Changes:

If we have a two piece linear model with one structural break, then the slopes of the two line
segments are (usually) not equal. One question of interest is whether the two slopes are
significantly different.

For example, assume that 25 data points have been fit to the model:
Y = β1 + β2t + β3(t-10)D + ε, where D is 0 for t < 10 and D is 1 for t ≥ 10.
β̂1 = 50, β̂2 = 3, β̂3 = 2.

Then the slope before time 10 is β2 and after time 10 is β2 + β3.


The two slopes would be the same if β3 = 0. Thus to test whether the slopes on the two
segments are different, we apply a t-test to test the hypothesis β3 = 0.

Exercise: In the above example, if sβ̂3 = .7, test whether the two slopes are different.
[Solution: H0 is β3 = 0. t = β̂3/sβ̂3 = 2/.7 = 2.857, with 25 - 3 = 22 degrees of freedom.
Since 2.819 < 2.857, we reject H0 at 1%.
At the 1% level the two slopes are significantly different.]

Splines:*

Spline Functions are a generalization of piecewise linear models. One still requires continuity,
but no longer requires that each segment be a straight line.197 There are usually smoothness
requirements, such as equality of first derivatives, or first and second derivatives, at the points
of joining.

197
See for example, Section 15.3-15.6 of Loss Models.
Problems:

24.1 (2 points) There are 30 observations of average claim costs over time.
You have fit the regression model:
Y = β1(β2^t)(β3^(D(t-6)))ε, where D is 0 for t < 6 and D is 1 for t ≥ 6.
β̂1 = 450, β̂2 = 1.07, β̂3 = 1.02. sβ̂1 = 3.9, sβ̂2 = .016, sβ̂3 = .011.
At what level are the rates of inflation significantly different before and after time 6?
A. 10% B. 5% C. 2% D. 1% E. None of A, B, C, or D

Use the following 15 observations for the next 2 questions:


X: 1150 840 900 800 1070 1220 980 1300 520 670 1420 850 1000 910 1230
Y: 1.29 2.20 2.26 2.38 1.77 1.25 1.87 0.71 2.90 2.63 0.55 2.31 1.90 2.15 1.20

24.2 (7 points) Fit a piecewise linear regression model with a structural break at 1000.

24.3 (3 points) Test whether the effect of X on Y is significantly different before and after 1000.

Use the following information for the next two questions:


Let X be the age of driver and Y be the claim frequency for automobile insurance.
You are to fit a piecewise linear regression model to a large set of observations.
You assume the slope changes at age 27 and at age 60.
For your set of observations let:
a = ΣXi, for Xi < 27. b = ΣXi2, for Xi < 27. c = ΣYi, for Xi < 27. d = ΣXiYi, for Xi < 27.
e = ΣXi, for 27 ≤ Xi < 60. f = ΣXi2, for 27 ≤ Xi < 60.
g = ΣYi, for 27 ≤ Xi < 60. h = ΣXiYi, for 27 ≤ Xi < 60.
j = ΣXi, for 60 ≤ Xi. k = ΣXi2, for 60 ≤ Xi. l = ΣYi, for 60 ≤ Xi. m = ΣXiYi, for 60 ≤ Xi.
s = number of observations for Xi < 27.
t = number of observations for 27 ≤ Xi < 60.
u = number of observations for 60 ≤ Xi.

24.4 (2 points) Write out the form of the model.

24.5 (8 points) Derive the set of linear equations to be solved in order to fit the model, in terms
of the given observed quantities.
24.6 (Course 120 Sample Exam #1, Q.13) (2 points)
You are given the following model.

E(Ct) = β1 + β2Yt, for 0 < t ≤ t0
E(Ct) = (β1 - β3Yt0) + (β2 + β3)Yt, for t0 ≤ t ≤ t1
E(Ct) = (β1 - β3Yt0 - β4Yt1) + (β2 + β3 + β4)Yt, for t > t1
Which of the following is true?
(A) This model is not considered a spline function.
(B) This model is discontinuous, with two structural breaks.
(C) This model is continuous, with one structural break.
(D) Dummy variables are used to account for shifts in the intercept.
(E) This model is continuous, with two structural breaks.
Mahler’s Guide to
Regression
Sections 25-28:
25 Weighted Regression
26 Heteroscedasticity
27 Tests for Heteroscedasticity
28 Correcting for Heteroscedasticity

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-G


Section 25, Weighted Regression

In a weighted regression, we weight some of the observations more heavily


than others.

Sometimes weighted regressions are used when one has some reason to count certain
observations more heavily. For example one might weight more recent observations more
heavily. In an insurance study, one might weight more heavily observations from states,
insurers, classifications, employers, policies, etc., that are more similar to whatever one was
studying.

Weighted regressions come up in the study of credibility. The line formed by the Buhlmann
Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a
priori probability of each outcome acting as the weights.198

However, as will be discussed in a subsequent section, the chief use of weighted regressions
on this exam is to correct for the presence of heteroscedasticity.

Model with No Intercept:

Let us assume we have three observations:


X = 1 and Y = 2, X = 2 and Y = 6, X = 5 and Y = 11.
We can fit the model Yi = βXi + εi, by ordinary least squares regression.
β̂ = ΣXiYi / ΣXi2 = 69/30 = 2.3.

What if we wish to weight some of these three observations more heavily than others.
For example, assume we wish to weight the second observation twice as much as the first,
and the third observation three times as much as the first.

In other words, let us minimize the weighted squared error:
(Ŷ1 - 2)2 + 2(Ŷ2 - 6)2 + 3(Ŷ3 - 11)2 = (β - 2)2 + 2(2β - 6)2 + 3(5β - 11)2.
Setting the partial derivative of the weighted squared error with respect to β equal to zero:
0 = (2){(β - 2) + (2)(2)(2β - 6) + (5)(3)(5β - 11)}.
⇒ β̂ = {2 + (2)(2)(6) + (5)(3)(11)}/{1 + (2)(2)(2) + (5)(3)(5)} = 191/84 = 2.274.
In general, one can perform a weighted regression by minimizing the weighted
sum of squared errors: Σwi(Yi - Ŷi)2 = Σwi(Yi - βXi)2.
Setting the partial derivative with respect to β equal to zero:
0 = -2ΣwiXi(Yi - βXi). ⇒ β̂ = ΣwiXiYi / ΣwiXi2.

198
Buhlmann Credibility is covered on joint Exam 4/C.
For the model with no intercept, for a weighted regression with weights wi:
β̂ = ΣwiXiYi / ΣwiXi2.

Note that when all the weights are equal, the weighted regression reduces to the unweighted
case, β̂ = ΣXiYi / ΣXi2.

Exercise: Apply the above formulas to the previous example of a weighted regression.
[Solution: w = (1, 2, 3). X = (1, 2, 5). Y= (2, 6, 11).
ΣwiXiYi = (1)(1)(2) + (2)(2)(6) + (3)(5)(11) = 191.
ΣwiXi2 = (1)(1^2) + (2)(2^2) + (3)(5^2) = 84.
β̂ = ΣwiXiYi / ΣwiXi2 = 191/84 = 2.274.
Comment: One can take w = (1/6, 2/6, 3/6) if one prefers, without affecting the result.]
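
The no-intercept weighted fit is a one-line computation; a numpy sketch (illustrative only, using the
data and weights from this exercise):

    import numpy as np

    w = np.array([1.0, 2.0, 3.0])
    X = np.array([1.0, 2.0, 5.0])
    Y = np.array([2.0, 6.0, 11.0])
    beta = np.sum(w * X * Y) / np.sum(w * X * X)
    print(beta)   # 191/84, about 2.274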

Exercise: You have 6 observations:


{1, 2}, {2, 6}, {2, 6}, {5, 11}, {5, 11}, {5, 11}.
Fit the model Yi = βXi + εi, by ordinary least squares regression.
[Solution: ΣXi2 = 1^2 + (2)(2^2) + (3)(5^2) = 84.
ΣXiYi = (1)(2) + (2)(2)(6) + (3)(5)(11) = 191.
β̂ = ΣXiYi / ΣXi2 = 191/84 = 2.274.]

This is the same result as for the weighted regression. In general, when the weights are
integer, or proportional to integers, one can pretend one had different numbers of repeated
copies of the actual observations.

Two Variable Model:

Let’s apply these ideas to the two variable model: Yi = α + βXi + εi.
The weighted sum of squared errors is:
Σwi(Yi - Ŷi)2 = Σwi(Yi - α - βXi)2.
We minimize this sum of squared errors by setting the partial derivatives with respect to α and
β equal to zero.

0 = Σwi(Yi - α - βXi). ⇒ αΣwi = ΣwiYi - βΣwiXi.

⇒ α̂ = {ΣwiYi - β̂ΣwiXi}/Σwi, where we have not necessarily assumed Σwi = 1.


0 = ΣwiXi(Yi - α - βXi). ⇒ 0 = ΣwiXiYi - α̂ΣwiXi - β̂ΣwiXi2.

^
Substituting α^ into the second equation, we can solve for β:

^
β = {ΣwiXiYi - ΣwiXiΣwiYi/Σwi}/ {ΣwiXi2 - (ΣwiXi)2/Σwi}.
^
α^ = {ΣwiYi - β ΣwiXi}/Σwi.

One can always divide the weights by a constant so that Σwi = 1.


If we assume that Σwi = 1, then these equations become:
β̂ = {ΣwiXiYi - ΣwiXiΣwiYi} / {ΣwiXi2 - (ΣwiXi)2}.
α̂ = ΣwiYi - β̂ΣwiXi.

These weighted regression equations compare to the unweighted regression equations:
β̂ = {NΣXiYi - ΣXiΣYi} / {NΣXi2 - (ΣXi)2} = {ΣXiYi/N - (ΣXi/N)(ΣYi/N)} / {ΣXi2/N - (ΣXi/N)2}.
α̂ = Ȳ - β̂X̄. If all the weights are equal, wi = 1/N, the weighted regression reduces to the
unweighted regression, as it should.

Exercise: Fit a weighted regression (with slope and intercept) to the following data:
Xi Yi wi
3 10 2/11
7 6 3/11
20 4 6/11
[Solution: β̂ = {ΣwiXiYi - ΣwiXiΣwiYi}/{ΣwiXi2 - (ΣwiXi)2} =
{60.545 - (13.364)(5.6364)}/{233.18 - 13.364^2} = -.271.
α̂ = ΣwiYi - β̂ΣwiXi = 5.6364 - (-.271)(13.364) = 9.26.
The weighted regression is: Ŷi = 9.26 - .271Xi.]
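
A short numpy sketch reproducing this weighted fit (again only an illustration of the formulas above,
with the weights rescaled to sum to one):

    import numpy as np

    X = np.array([3.0, 7.0, 20.0])
    Y = np.array([10.0, 6.0, 4.0])
    w = np.array([2.0, 3.0, 6.0]) / 11.0

    beta = (np.sum(w*X*Y) - np.sum(w*X)*np.sum(w*Y)) / (np.sum(w*X*X) - np.sum(w*X)**2)
    alpha = np.sum(w*Y) - beta * np.sum(w*X)
    print(alpha, beta)                        # about 9.26 and -0.271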
Deviations Form:

Just as in the unweighted case, one can put weighted regressions into deviations form.
However, rather than subtracting the straight average, one subtracts the weighted average
from each variable. Assuming that Σwi = 1:

xi = Xi - ΣwiXi
yi = Yi - ΣwiYi

β^ = Σwixiyi / Σwixi2.
α^ = ΣwiYi - β^ΣwiXi.

If wi = 1/N, then these formulas for the weighted regression reduce to the formulas for an
unweighted regression.

For example, redoing the previous exercise in deviations form:


X = {3, 7, 20}
Y = {10, 6, 4}
w = {2/11, 3/11, 6/11}
ΣwiXi = 13.364.
xi = Xi - ΣwiXi = {-10.364, -6.364, 6.636}
ΣwiYi = 5.6364.
yi = Yi - ΣwiYi = {4.3636, .3636, -1.6364}
Σwixiyi = -14.78.
Σwixi2 = 54.60.
β^ = Σwixiyi/Σwixi2 = -14.78/54.60 = -.271.
α^ = ΣwiYi - β^ΣwiXi = 5.6364 - (-.271)(13.364) = 9.26.

The weighted regression is: Y^i = 9.26 - .271Xi, matching the previous result.
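A minimal sketch of the same calculation in deviations form (again purely illustrative):

    import numpy as np

    # Subtract the weighted means first, then fit.
    X = np.array([3.0, 7.0, 20.0]); Y = np.array([10.0, 6.0, 4.0]); w = np.array([2/11, 3/11, 6/11])
    x = X - np.sum(w * X)                                     # deviations from the weighted mean of X
    y = Y - np.sum(w * Y)                                     # deviations from the weighted mean of Y
    beta_hat = np.sum(w * x * y) / np.sum(w * x**2)           # about -0.271
    alpha_hat = np.sum(w * Y) - beta_hat * np.sum(w * X)      # about 9.26
    print(alpha_hat, beta_hat)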

Multiple Regression:*

Weighted regression is just a special case of Generalized Least Squares (GLS), to be


discussed in a subsequent section. Weighted least squares can be applied in the same
manner to the multiple regression model as to the two variable model, but it is usually easier
to handle this as a special case of Generalized Least Squares in matrix form.

The Relationship of Bayes Analysis and Buhlmann Credibility:*199

The line formed by the Buhlmann Credibility estimates is the weighted least squares line to
the Bayesian estimates, with the a priori probability of each outcome acting as the weights.
The slope of this weighted least squares line to the Bayesian Estimates is the Buhlmann
Credibility. Buhlmann Credibility is the Least Squares approximation to the Bayesian
Estimates.

For example, assume the following information:


Observation 1 2 3 4 5 6 7 8
A Priori Probability 0.2125 0.2125 0.2125 0.2125 0.0625 0.0625 0.0125 0.0125
Bayesian Estimate 2.853 2.853 2.853 2.853 3.7 3.7 4.5 4.5

The weights to be used are the a priori probabilities of each observation.

Put the variables in deviations form, by subtracting the weighted average from each variable:
xi = Xi - ΣwiXi, yi = Yi - ΣwiYi.
w = {0.2125, 0.2125, 0.2125, 0.2125, 0.0625, 0.0625, 0.0125, 0.0125}.
X = {1, 2, 3, 4, 5, 6, 7, 8}.
ΣwiXi = 3 = a priori mean.
x = X - ΣwiXi = {-2, -1, 0, 1, 2, 3, 4, 5}.
Y = {2.853, 2.853, 2.853, 2.853, 3.7, 3.7, 4.5, 4.5}.
ΣwiYi = 3.
y = Y - ΣwiYi = {-.147, -.147, -.147, -.147, .7 , .7, 1.5, 1.5}

Then if the least squares line is Y = α + βX,


β^ = Σwixiyi/Σwixi2 = .45/2.6 = .173.
α^ = ΣwiYi - β^ΣwiXi = 3 - (3)(.173) = (3)(.827) = 2.481.
Y^i = 2.481 + .173Xi, where Xi is the observation.

The slope of the line is the credibility assigned to one observation, Z = 17.3%.
The fitted weighted regression line is the estimates using Buhlmann Credibility:
Z(observation) + (1-Z)(a priori mean). This is true in general.
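The weighted least squares calculation behind this example can be sketched in Python as follows (illustrative only; the variable names are my own, not standard notation):

    import numpy as np

    # Weighted least-squares line through the Bayesian estimates, with the a priori
    # probabilities as weights; its slope is the Buhlmann credibility Z.
    w = np.array([0.2125]*4 + [0.0625]*2 + [0.0125]*2)
    X = np.arange(1.0, 9.0)                               # possible observations 1 through 8
    Y = np.array([2.853]*4 + [3.7]*2 + [4.5]*2)           # Bayesian estimates

    x = X - np.sum(w * X)
    y = Y - np.sum(w * Y)
    Z = np.sum(w * x * y) / np.sum(w * x**2)              # about 0.173
    alpha = np.sum(w * Y) - Z * np.sum(w * X)             # about 2.481
    print(Z, alpha)   # estimate = alpha + Z * observation = Z(observation) + (1-Z)(prior mean)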

199 Bayes Analysis and Buhlmann credibility are covered on joint Exam 4/C.
See for example “Credibility,” by Mahler and Dean.
This example is taken from “Mahler’s Guide to Buhlmann Credibility and Bayesian Analysis.”
Problems:

25.1 (2 points) Determine the slope of a weighted regression, with slope and intercept, fit to
the following data.
X Y Weight
0 2 1/4
3 4 1/2
10 6 1/4
A. 0.31 B. 0.33 C. 0.35 D. 0.37 E. 0.39

25.2 (3 points) You are given the following information:


X Y Weight
1 15 60%
4 30 30%
9 50 10%
Fit a weighted regression, with slope and intercept, to this data.
What is the fitted value of Y for X = 10?
(A) 55 (B) 56 (C) 57 (D) 58 (E) 59

25.3 (2 points) Determine the slope of a weighted regression, with no intercept, fit to the
following data.
X Y Weight
1 3 30%
5 8 40%
10 13 20%
20 32 10%
A. 1.46 B. 1.48 C. 1.50 D. 1.52 E. 1.54

25.4 (165, 5/89, Q.2) (1.7 points) You believe that the true probability of success in a
single play of a game is directly proportional to the age of the player.
You are given the following observed experience:
Player Age Number of Successes Number of Plays, wi
1 20 25 100
2 25 28 112
3 30 30 100
where:
(i) ui = the observed proportion of successes; and
(ii) vi = the graduated proportion of successes.
The fit measure, F = Σ wi(ui − vi)2 (sum over i = 1, 2, 3), is to be minimized subject to the prior
opinion concerning the true probability of success.
Determine v1.
(A) 0.210 (B) 0.213 (C) 0.218 (D) 0.250 (E) 0.265
* 25.5 (4, 5/90, Q.57) (3 points) Let X1 be the outcome of a single trial and let
E[X2 | X1] be the expected value of the outcome of a second trial as described in the table
below.
Outcome Initial Probability Bayesian Estimate
K of Outcome E[X2 | X1 = K ]
0 1/3 1
3 1/3 6
12 1/3 8
Which of the following represents the Buhlmann credibility estimates corresponding to the
Bayesian estimates (1, 6, 8)?
A. (3, 5, 10) B. (2, 4, 10) C. (2.5, 4.0, 8.5) D. (1.5, 3.375, 9.0) E. (1, 6, 8)

25.6 (165, 11/90, Q.1) (1.9 points) You are given the following exposures, nx, and
observed values, ux:
x nx ux
0 300 3
1 200 6
2 100 11
Revised estimates, vx, are to be determined such that:
(i) ∆ vx = a, x = 0, 1; and
(ii) the sum of the squared deviations, weighted by exposures, is minimized.
Determine v1.
(A) 6.3 (B) 6.6 (C) 6.7 (D) 10.4 (E) 10.7

25.7 (165, 11/90, Q.15) (1.9 points) You are using the least squares method, weighted by
exposures, to develop a rate of mortality, qx = a(x + 1/2).
You are given:
x Exposure Deaths
30 300 3
40 400 10
50 300 15
Determine a.
(A) 0.00069 (B) 0.00070 (C) 0.00072 (D) 0.00074 (E) 0.00075
* 25.8 (4B, 11/93, Q.24) (3 points) You are given the following:
• An experiment consists of three possible outcomes, R1 = 0, R2 = 2, and R3 = 14.
• The a priori probability distribution for the experiment's outcome is:
Outcome, Ri Probability, Pi
0 2/3
2 2/9
14 1/9
• For each possible outcome, Bayesian analysis was used to calculate predictive
estimates, Ei, for the second observation of the experiment.
The predictive estimates are:
Bayesian Analysis Predictive
Outcome, Ri Estimate Ei Given Outcome Ri
0 7/4
2 55/24
14 35/12
• The Buhlmann credibility factor after one experiment is 1/12.
Determine the values for the parameters a and b that minimize the expression:
Σ Pi(a + bRi - Ei)2 (sum over i = 1, 2, 3)
A. a = 1/12; b = 11/12 B. a = 1/12; b = 22/12 C. a = 11/12; b = 1/12
D. a = 22/12; b = 1/12 E. a = 11/12; b = 11/12

25.9 (165, 11/94, Q.3) (1.9 points) You are given the following exposures nx, observed
values ux and graduated values vx:
x nx ux vx
1 1 4 v1
2 1 6 v2
3 2 u3 10
Graduated values vx are determined such that:
(i) vx = ax + b; and
(ii) the sum of the squared deviations, weighted by exposures, is minimized.
Determine v2.
(A) 6.0 (B) 6.2 (C) 6.4 (D) 6.6 (E) 6.8
Note: The original exam question has been revised.
25.10 (165, 11/94, Q.15) (1.9 points) You are using the least squares method, weighted
by exposures, to fit the functional form qx = a(x + 1/3).
You are given:
x Exposures Deaths
10 30 1
20 90 5
30 80 7
Determine 1000a.
(A) 2.84 (B) 2.87 (C) 2.90 (D) 2.93 (E) 2.96

* 25.11 (4, 11/02, Q.7) (2.5 points)


You are given the following information about a credibility model:
First Unconditional Bayesian Estimate of
Observation Probability Second Observation
1 1/3 1.50
2 1/3 1.50
3 1/3 3.00
Determine the Bühlmann credibility estimate of the second observation, given that the first
observation is 1.
(A) 0.75 (B) 1.00 (C) 1.25 (D) 1.50 (E) 1.75

Section 26, Heteroscedasticity

One of the assumptions underlying ordinary least squares regression is that the error terms εi
are random variables with the same variance. Actually we assumed the εi were independent,
identically distributed normal variables, each with mean zero; however, we are for now
focusing on the assumption that they have the same variance.

We use the following terms:

Variances of ε i are all equal ⇔ Homoscedasticity.

Variances of ε i are not all equal ⇔ Heteroscedasticity. 200

An assumption underlying ordinary least squares regression is that the σi2 are all equal. The
null hypothesis is that there is homoscedasticity. The alternate hypothesis is that there is
heteroscedasticity.

An Example of Heteroscedasticity: 201

Assume that for each of 80 towns, we have data on their annual claim frequency over the last
4 years, and the exposures (car-years) over the same period of time. In each pair shown
below, the exposures are followed by the observed number of claims per 10,000 exposures:

{4092, 261}, {4401, 218}, {5164, 267}, {5687, 215}, {5847, 173}, {6003, 173}, {6196, 211},
{6219, 241}, {6524, 302}, {6698, 199}, {7244, 213}, {7924, 235}, {8473, 250}, {8546, 274},
{8923, 236}, {9107, 238}, {10113, 254}, {10341, 266}, {11740, 272}, {11919, 214},
{11972, 285}, {12020, 291}, {12387, 284}, {12653, 253}, {13560, 210}, {13893, 205},
{14403, 271}, {14906, 321}, {16178, 243}, {16280, 270}, {16611, 266}, {17653, 265},
{18421, 307}, {18506, 269}, {18575, 272}, {21036, 340}, {21125, 309}, {21972, 289},
{23576, 280}, {23800, 324}, {25599, 339}, {28942, 369}, {29773, 329}, {30778, 366},
{31897, 310}, {32738, 346}, {35708, 404}, {42345, 418}, {44382, 385}, {46051, 394},
{47386, 446}, {52763, 454}, {54881, 479}, {60044, 491}, {63511, 543}, {66620, 539},
{69062, 520}, {71807, 494}, {72231, 542}, {77854, 595}, {81597, 609}, {92432, 648},
{98188, 644}, {100133, 715}, {104217, 703}, {111460, 750}, {123870, 761}, {124017, 794},
{129975, 862}, {132996, 879}, {139876, 877}, {140738, 922}, {141963, 934}, {148211, 935},
{159978, 982}, {167914, 1033}, {180206, 1109}, {185566, 1143}, {194448, 1163},
{211189, 1250}.

For example, the first town had 4092 exposures and 107 claims, for a claim frequency of
107/4092 = .0261 or 261 claims per 10,000 exposures. The final town had 211,189
exposures and a claim frequency of 12.50%.
200 “Hetero” ⇔ differing, as in heterogeneous. “Homo” ⇔ similar, as in homogeneous. “Scedastic” derives
from a Greek word meaning to scatter.
201 This example is very loosely based on Private Passenger Automobile Insurance in Massachusetts.
The behavior of the actual data is more complicated.
See “The Construction of Automobile Rating Territories in Massachusetts,” by Robert Conger, PCAS 1987.
Here is a graph of this data:

[Graph of observed claim frequency per 10,000 exposures (vertical axis, about 200 to 1200) versus exposures (horizontal axis, up to about 200,000).]

It appears as if the larger towns tend to have higher frequencies.

Therefore, we fit via ordinary least squares the model Yi = α + βXi + εi, where
Xi is the number of exposures for town i, and Yi is the observed claim frequency per 10,000
exposures for town i.

The result is α^ = 194.4 and β^ = .00499. R2 = .9903. R̄2 = .9902. s2 = 764.
sα^ = 4.285. t-statistic for the intercept is: 194.4/4.285 = 45.4.
sβ^ = .00005594. t-statistic for the slope is: .00499/.00005594 = 89.2.
Source DF Sum of Squares Mean Square F-Statistic
Model 1 6,083,320 6,083,320 7960
Error 78 59,610 764
Total 79 6,142,930
Covariance Matrix = ( 18.36        -0.000166      )
                    ( -0.000166    .00000000313   )
Corr[α^, β^] = -.693.
Durbin-Watson Statistic is 1.719.202

Based on the t-statistics and the F-Statistic, the slope and intercept both appear to be
significantly different than zero. R̄2 is extremely high; the regression line accounts for most of
the variation in frequency between the towns.
202 As discussed subsequently, the Durbin-Watson statistic tests for serial correlation.
In this case the Durbin-Watson statistic of 1.719 is sufficiently close to 2, so as not to indicate serial correlation.
Here is a graph of this regression, Y^i = 194.4 + .00499Xi, versus the data:

[Graph of the fitted line Y^ = 194.4 + .00499X together with the observed frequencies, versus exposures.]

So far everything seems okay. However, here is a graph of the squared residuals:

[Graph of the squared residuals (vertical axis, 0 to about 3000) versus exposures (horizontal axis).]

The magnitude of the residuals appears to have some tendency to be larger on average for
smaller towns. This corresponds to our intuition, that the observed frequencies of smaller
towns would be more affected by random fluctuation. Thus we have a suspicion that the
variance of the errors εi is not constant. Rather, we suspect that σi2 increases as the number of
exposures decreases.
Simulating Heteroscedasticity:*

Assume for example we wish to simulate the model: Yi = 95 + .05Xi + εi.


We are given the values for a series of observations of X, and want to simulate a
corresponding series of values of Y.

It is assumed that εi is Normally Distributed with mean zero.

If the model is homoscedastic with Var[εi] = 900, then we simulate this model as follows:203
Yi = 95 + .05Xi + 30(Random Standard Normal Distribution),
where for each Yi we simulate a new independent random draw from a Normal Distribution
with mean zero and standard deviation 1.

In the case of the heteroscedastic model, instead of multiplying the random draws from a
Standard Normal by the same value of σ for each Yi, σi varies.
Yi = 95 + .05Xi + σi(Random Standard Normal Distribution).

If the model is heteroscedastic with for example Var[εi] = .36Xi and Stddev[εi] = .6√Xi, then we
simulate this model as follows:
Yi = 95 + .05Xi + .6√Xi (Random Standard Normal Distribution),
where for each Yi we simulate a new independent random draw from a Normal Distribution
with mean zero and standard deviation 1.

Exercise: Let 1.670, -0.518, 0.299, be three independent random draws from a Standard
Normal Distribution. X1 = 1711, X2 = 3124, and X3 = 4502. For the heteroscedastic example
with Var[εi] = .36Xi, simulate Y1, Y2, and Y3.
[Solution: Y1 = 95 + (.05)(1711) + (.6)(√1711)(1.670) = 222.00.
Y2 = 95 + (.05)(3124) + (.6)(√3124)(-0.518) = 233.83.
Y3 = 95 + (.05)(4502) + (.6)(√4502)(.299) = 332.14.]
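A minimal Python sketch of this simulation, using the three given standard normal draws (illustrative only; numpy is not required by the syllabus):

    import numpy as np

    # Simulate the heteroscedastic model Y = 95 + .05*X + eps, with Var[eps_i] = .36*X_i.
    X = np.array([1711.0, 3124.0, 4502.0])
    z = np.array([1.670, -0.518, 0.299])          # given standard normal draws
    eps = 0.6 * np.sqrt(X) * z                    # StdDev[eps_i] = .6*sqrt(X_i)
    Y = 95 + 0.05 * X + eps
    print(Y)                                      # about (222.00, 233.83, 332.14)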

203 Here one is not interested in the details of how the computer simulates a random draw from a Normal
Distribution. How to simulate a Normal Distribution is explained for example in Simulation by Ross.
Estimated Variances and Covariances:

The usual formula for the variance of the estimated slope for the two-variable model is:
Var[β^] = s2/Σxi2, where s2 = Σε^i2/(N - 2) = ESS/(N - 2). If there is homoscedasticity this
estimate is unbiased and consistent. However, if there is heteroscedasticity, then the
usual estimator of the variance of β^ is biased and inconsistent.

For the two variable model, when there is heteroscedasticity, Var[β^] = Σxi2σi2/(Σxi2)2,204
compared to Var[β^] = σ2/Σxi2 with homoscedasticity. Thus when heteroscedasticity is
present, the ordinary least squares estimators of these variances are biased, inconsistent,
and inefficient.

Derivation of Var[β^]:*

For the two variable model, β^ = ΣxiYi / Σxi2.
Var[β^] = Var[ΣxiYi / Σxi2] = Σxi2Var[Yi] / (Σxi2)2 = Σxi2σi2 /(Σxi2)2.205

If all the σi are equal, in other words if we have homoscedasticity, then this reduces to the
usual: Var[β^] = σ2/(Σxi2).

If instead of an unweighted regression, we perform a weighted regression with wi = 1/σi2,


then β^ = ΣwixiYi / Σwixi2 = (ΣxiYi/σi2) / (Σxi2/σi2).206
In this case of a weighted regression, Var[β^] = Var[(ΣxiYi/σi2) / (Σxi2/σi2)]
= (Σxi2Var[Yi]/σi4) / (Σxi2/σi2)2 = (Σxi2σi2/σi4) / (Σxi2/σi2)2 = (Σxi2/σi2) / (Σxi2/σi2)2 = 1 / (Σxi2/σi2).

204 See Equation 6.3 in Pindyck and Rubinfeld.
205 The xi are assumed to be known and therefore we can treat them as constants. We have used the fact that the
variance of a variable times a constant is the variance of the variable times the square of the constant.
206 As will be discussed, this is one way to correct for heteroscedasticity, if one can determine how the variances
vary.
Problems:

26.1 (1 point) Define homoscedasticity and heteroscedasticity.

26.2 (2 points) Regressions are fit to five different time series, each with 100 observations.
Of the following five graphs of the squares of the residuals, which of them most clearly
indicates the presence of heteroscedasticity?

[Five graphs, labeled A through E, of squared residuals plotted against time (1 to 100); the graphs are not reproduced here.]

26.3 (8 points) Let X = (0, 5, 10). Yi = 3 + 2Xi + εi.


ε1 has a 50% chance of being -1 and a 50% chance of being +1.
ε2 has a 50% chance of being -2 and a 50% chance of being +2.
ε3 has a 50% chance of being -4 and a 50% chance of being +4.
ε1, ε2, and ε3 are mutually independent.
List all possible observed sets of Y.
For each set determine α^, β^, and ESS.
Determine the average values over the sets.

Section 27, Tests for Heteroscedasticity207


There are a number of tests for heteroscedasticity in errors: the Goldfeld-Quandt Test, the
Breusch-Pagan Test, and the White Test. The latter two are very similar.

Goldfeld-Quandt Test for Heteroscedasticity:

One way to test for heteroscedasticity in errors is the Goldfeld-Quandt test.

First one needs to find some variable that one believes is related to σi2, the variance of εi.
In the town example, we believe such a variable to be the exposures.

Next, rank the observations according to that variable, from smallest to largest assumed σi2.
In this case, we would rank the towns from most to fewest exposures.

Run two separate regressions. One regression on the 32 largest towns, and then another
regression on the 32 smallest towns, omitting the middle 80/5 = 16 towns.208

For the 32 largest towns, the regression results were:


α^ = 180.6 and β^ = .00510. R2 = .9918. R̄2 = .9916. s2 = 513.
sα^ = 10.08. t-statistic for the intercept is 17.9. sβ^ = .00008464. t-statistic for the slope is 60.2.
Source DF Sum of Squares Mean Square F-Statistic
Model 1 1,860,890 1,860,890 3626
Error 30 15,395 513
Total 31 1,876,295

For the 32 smallest towns, the regression results were:


α^ = 209.8 and β^ = .00360. R2 = .160. R̄2 = .132. s2 = 1109.
sα^ = 16.32. t-statistic for the intercept is 12.9. sβ^ = .001505. t-statistic for the slope is 2.39.
Source DF Sum of Squares Mean Square F-Statistic
Model 1 6331 6331 5.71
Error 30 33,283 1109
Total 31 39,614

In order to test for homoscedasticity, we compare the ESS for the second regression to the
error sum of squares for the first regression. The test statistic is:

{(ESS for second regression)/(32 - 2)} / {(ESS for first regression)/(32 - 2)} = 33283/15395 = 2.16.
Assuming the εi are independent, identically distributed normal variables, each with mean
zero, this test statistic has an F-Distribution, with 30 and 30 degrees of freedom.209
207 See Section 6.1.2 of Pindyck and Rubinfeld.
208 Usually one omits some observations in the middle and then fits a regression to each half of the remaining data.
209 In each of the numerator and denominator, we had 32 observations and fit 2 coefficients.
For 30 and 30 degrees of freedom, the critical value at 5% is 1.84, and the critical value at 1%
is 2.30.210 Since 1.84 < 2.16 < 2.30, we reject at 5% the null hypothesis that the variances of
the errors are all equal, but do not reject it at 1%. At a 5% significance level, the
Goldfeld-Quandt Test indicates heteroscedasticity.

In general, the Goldfeld-Quandt Test proceeds as follows:

0. Test H0 that σi2, the variance of εi, is the same for all i.
1. Find a variable that seems to be related to σi2, by graphing the squared residuals, or other
techniques.
2. Order the observations in assumed increasing order of σi2, based on the relationship
from step 1.
3. Run a regression on the first (N - d)/2 observations, with assumed smaller σi2.
4. Run a regression on the last (N - d)/2 observations, with assumed larger σi2.
5. (ESS from step 4)/(ESS from step 3) has an F-Distribution,
with (N - d)/2 - k and (N - d)/2 - k degrees of freedom.
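The steps above can be sketched in Python roughly as follows; the function name and the use of scipy are my own illustrative choices, and the sketch assumes a two-variable model with the observations already ordered by assumed increasing variance.

    import numpy as np
    from scipy import stats

    def goldfeld_quandt(X, Y, d):
        """d = number of middle observations to omit; returns (F statistic, p-value)."""
        n = len(X)
        h = (n - d) // 2                      # size of each half

        def ess(x, y):                        # error sum of squares of an OLS fit
            slope, intercept, *_ = stats.linregress(x, y)
            resid = y - (intercept + slope * x)
            return np.sum(resid**2)

        ess_low  = ess(X[:h], Y[:h])          # half with assumed smaller variance
        ess_high = ess(X[-h:], Y[-h:])        # half with assumed larger variance
        F = ess_high / ess_low                # F with (h - 2, h - 2) degrees of freedom
        p = 1 - stats.f.cdf(F, h - 2, h - 2)
        return F, p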

210 Using a somewhat larger F-Table than attached to the exam.
Breusch-Pagan Test for Heteroscedasticity:

Another test for heteroscedasticity is the Breusch-Pagan test.

As with the Goldfeld-Quandt test, we first need to find a variable that seems to be related to
σi2. In the town example, we have already identified exposures.

We use the regression that was fit to the town data previously: Y^i = 194.4 + .00499Xi.
Take σ2 = ESS/N = 59610/80 = 745.211

Run a linear regression of ^εi 2/σ2 on exposures. The result is:


intercept = 1.291, and slope = -5.5 x 10^-6.
R2 = .0485. R̄2 = .0363. RSS = 7.34. ESS = 144.141. F = 3.97.

Assuming the εi are independent, identically distributed normal variables, each with mean
zero, RSS/2 has a Chi-Square Distribution, with 1 degree of freedom.212

RSS/ 2 = 3.67. For the Chi-Square, the critical value for 1 degree of freedom at 5% is 3.84.
Since 3.67 < 3.84, we do not reject the null hypothesis of homoscedasticity at 5%.213

In general, the Breusch-Pagan Test proceeds as follows:

0. Test H0 that σi2, the variance of εi, is the same for all i.
1. Find a variable(s) that seems to be related to σi2, by graphing the squared residuals, or
other techniques.
2. Run the assumed regression model. Note the residuals ^εi and let σ2 = ESS/N.
3. Run a regression of ^εi 2/σ2 from step 2 on the variable(s) from step 1.
4. RSS/2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal
to the number of variables from step 1, not counting an intercept.

One does not have to assume a linear relationship between ^εi 2/σ2 and the exposures.
For example, let us assume ^εi 2/σ2 = a + b/exposures + error term.
Running this regression results in: intercept = .608, and slope = 6628. RSS = 11.75.

RSS/ 2 = 5.88. Since 5.02 < 5.88 < 6.64, we reject the null hypothesis of homoscedasticity at
2.5% and do not reject the null hypothesis at 1%.
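A rough Python sketch of the Breusch-Pagan procedure for a single explanatory variable Z (illustrative only; the function name is mine, and RSS below means the regression sum of squares, as elsewhere in this guide):

    import numpy as np
    from scipy import stats

    def breusch_pagan(X, Y, Z):
        slope, intercept, *_ = stats.linregress(X, Y)     # step 2: fit the original model
        resid = Y - (intercept + slope * X)
        sigma2 = np.sum(resid**2) / len(Y)                # sigma^2 = ESS/N
        u = resid**2 / sigma2                             # step 3: regress resid^2/sigma^2 on Z
        b, a, *_ = stats.linregress(Z, u)
        fitted = a + b * Z
        rss = np.sum((fitted - np.mean(u))**2)            # regression sum of squares
        stat = rss / 2                                    # chi-square with 1 df (one variable Z)
        p = 1 - stats.chi2.cdf(stat, 1)
        return stat, p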

211 For calculating σ2 in this test, the denominator is N rather than N - k.
212 Assuming the variances are related to a set of p independent variables, then RSS/2 has a Chi-Square
Distribution with p degrees of freedom.
213 The critical value at 10% turns out to be 2.71. Since 3.67 > 2.71, we reject the null hypothesis at 10%.
White Test for Heteroscedasticity:

A third test for heteroscedasticity in errors is the White test, which is very similar to the
Breusch-Pagan test.

As with the Goldfeld-Quandt test and Breusch-Pagan test, we first need to find a variable that
seems to be related to σi2. In this example, we have already identified exposures.

We use the regression that was fit to the town data previously: Y^i = 194.4 + .00499Xi.

Run a linear regression of ^εi 2 on exposures. The result is:214


intercept = 961.9, and slope = -.00409. R2 = .0485.

Assuming the εi are independent, identically distributed normal variables, each with mean
zero, N R2 has a Chi-Square Distribution, with 1 degree of freedom.215

N R2 = (80)(.0485) = 3.88. For the Chi-Square, the critical value for 1 degree of freedom at
5% is 3.84. Since 3.88 > 3.84, we reject the null hypothesis of homoscedasticity at 5%.216

In general, the White Test proceeds as follows:

0. Test H0 that σi2, the variance of εi, is the same for all i.
1. Find a variable(s) that seems to be related to σi2, by graphing the squared residuals, or
other techniques.
2. Run the assumed regression model. Note the residuals.
3. Run a regression of ^εi 2 from step 2 on the variable(s) from step 1.
4. N R2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal
to the number of variables from step 1, not counting an intercept.

One does not have to assume a linear relationship between ^εi 2 and the exposures.
For example, let us assume ^εi 2 = a + b/exposures + error term.
Running this regression results in: intercept = 453, and slope = 4.94 million. R2 = .0776.

N R2 = (80)(.0776) = 6.21. Since 5.02 < 6.21 < 6.64, we reject the null hypothesis of
homoscedasticity at 2.5% and do not reject the null hypothesis at 1%.
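A corresponding sketch of the White test statistic N R2 for a single explanatory variable (again illustrative; the function name is mine):

    import numpy as np
    from scipy import stats

    def white_test(resid, Z):
        # Regress squared residuals on Z; for a simple regression, R^2 = r^2.
        slope, intercept, r, *_ = stats.linregress(Z, resid**2)
        stat = len(resid) * r**2              # N * R^2, chi-square with 1 df
        p = 1 - stats.chi2.cdf(stat, 1)
        return stat, p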

214 Note that since the only difference at this stage from the Breusch-Pagan test was not dividing each ε^i2 by
745 = σ2, the R2 values are the same.
215 Assuming the variances are related to a set of p independent variables, then NR2 has a Chi-Square
Distribution with p degrees of freedom.
216 The critical value at 2.5% is 5.02. Since 3.88 < 5.02, we do not reject the null hypothesis at 2.5%.
Problems:

27.1 (2 points) You are given 75 observations, which you have ordered based on some
variable which is believed to be related to the size of the variances of the errors.
A four variable linear regression was fit to the first 30 observations, with total sum of squares
of 1313 and R2 = .985.
A four variable linear regression was fit to the last 30 observations, with total sum of squares
of 1696 and R2 = .980.
What conclusion do you draw with respect to heteroscedasticity?

27.2 (1 point) Which of the following statements about tests for heteroscedasticity are false?
A. The null hypothesis is that there is heteroscedasticity.
B. In the White Test, the test statistic is the number of observations times R2 for a regression.
C. In the Breusch-Pagan Test, the test statistic is a Regression Sum of Squares divided by 2.
D. In the Goldfeld-Quandt Test, the test statistic involves a ratio of Error Sum of Squares.
E. None of A, B, C, or D is false.

Use the following 10 observations for the next 4 questions:


X 1 5 6 8 9 13 14 18 23 25
Y 3 5 9 11 13 16 16 23 30 27

27.3 (4 points) Fit a two variable regression model.


What is the fitted value of Y, corresponding to X = 23?
(A) Less than 27
(B) At least 27, but less than 28
(C) At least 28, but less than 29
(D) At least 29, but less than 30
(E) At least 31

27.4 (6 points) Apply the Breusch-Pagan Test for heteroscedasticity, assuming the variance
of the errors is related to X. At which level do you reject the null hypothesis?
(A) 5% (B) 2.5% (C) 1% (D) 0.5% (E) None of A, B, C, or D

27.5 (5 points) Apply the White Test for heteroscedasticity, assuming the variance of the
errors is related to X. At which level do you reject the null hypothesis?
(A) 5% (B) 2.5% (C) 1% (D) 0.5% (E) None of A, B, C, or D

27.6 (6 points) Calculate the F statistic used in the Goldfeld-Quandt test for
heteroscedasticity. Omit the middle 1/5 of observations.
(A) 1.0 (B) 1.5 (C) 2.0 (D) 2.5 (E) 3.0
Use the following information for the next 3 questions:
The following 20 observations have been fit to a regression with the result:
-44.0947 + 11.5339x - 0.0637749x2.
Xi Yi Residual
7.7 47 6.06484
7.9 43.2 0.157038
8.2 45.1 -1.0951
9.6 52.7 -8.0533
9.9 54.7 -9.14039
10 61.3 -3.56687
10.2 85 18.0840
11.2 68 -9.08513
11.5 103.8 23.6890
12 73.3 -11.8286
12.3 74.2 -13.9239
12.4 75.2 -13.9197
13.2 97.5 0.459264
13.5 70.7 -29.2901
13.8 144.3 41.3721
14.5 148 38.2617
15.2 86.8 -29.6861
16.3 158.1 31.1364
17.6 87.6 -51.5472
18.7 171.2 21.9121

27.7 (6 points) Perform the Goldfeld-Quandt test for heteroscedasticity.


Omit the middle 1/5 of observations.

27.8 (6 points) Perform the White test for heteroscedasticity.

27.9 (6 points) Perform the Breusch-Pagan test for heteroscedasticity.

27.10 (1 point) Match the tests for heteroscedasticity with the distribution for their test
statistic.
1. Goldfeld-Quandt Test f. F Distribution
2. Breusch-Pagan Test t. t-Distribution
3. White Test x. Chi-Square Distribution
A. 1f, 2t, 3x
B. 1f, 2x, 3t
C. 1t, 2x, 3f
D. 1x, 2f, 3t
E. None of A, B, C, or D
27.11 (Course 120 Sample Exam #1, Q.14) (2 points) You are given the following:
Group Xi Yi
5.0 1.0
1 5.0 2.0
5.0 2.0

10.0 3.0
2 10.0 3.2
10.0 3.5

15.0 4.0
3 15.0 4.2
15.0 4.6

20.0 4.6
4 20.0 5.0
20.0 5.8

You are to test for heteroscedasticity in errors between the first two groups and the second
two groups, assuming all groups are to be included in the calculation.
A linear regression was fit to the first two groups with result:
Y^i = 0.10 + .313Xi, with RSS = 3.68 and ESS = .79.
A linear regression was fit to the second two groups with result:
Y^i = 1.67 + .173Xi, with RSS = 1.13 and ESS = .93.
Calculate the F statistic used in the Goldfeld-Quandt test.
(A) 0.3 (B) 0.7 (C) 1.2 (D) 1.7 (E) 2.2

27.12 (VEE-Applied Statistics Exam, 8/05, Q.11) (2.5 points)


You fit the model Yi = α + βXi + εi to a data set with N observations.
You test the null hypothesis that the error terms are homoscedastic against the alternative
hypothesis that Var(εi) = σi2 = γ + δXi + ηXi2. Which of the following statements is false?
(A) A valid test is done by running the model ^εi 2 = γ + δXi + ηXi2 + νi on the residuals and
referring the resulting value of NR2 to a chi-square distribution with 2 degrees of freedom.
(B) A valid test is done by running the model ^εi 2 = γ + δXi + ηXi2 + νi on the residuals and
referring the resulting value of RSS/2 to a chi-square distribution with 2 degrees of freedom.
(C) A valid test is done by running the model ^εi 2/ σ^ 2 = γ + δXi + ηXi2 + νi on the residuals and
referring the resulting value of RSS/2 to a chi-square distribution with 2 degrees of freedom.
(D) If β^ is the ordinary least-squares estimator of β and the alternative hypothesis is true,
Var(β^) = Σxi2σi2 /(Σxi2)2.
(E) If γ = δ = 0, the procedure for testing homoscedasticity developed by Goldfeld and
Quandt can be applied.

Section 28, Correcting for Heteroscedasticity

Exercise: A random variable is divided by its standard deviation.


What is the variance of the new variable that results?
[Solution: Let Var[X] = σ2. Then Var[X/σ] = Var[X]/ σ2 = σ2/ σ2 = 1.]

Thus if we divide any variable by its standard deviation, we can get a new variable with
variance of 1.

Model with No Intercept:

We will first consider how we would correct for the presence of heteroscedasticity in the case
of a simple model with one variable and no intercept.

Suppose the variance of ε1 is 4 and the variance of ε2 is 36. Then if these were the first two
error terms of a regression, we transform to ε1/2 and ε2 /6, in order to get variables with
variance 1. In this manner we have transformed a situation with differing variances of the
error terms, into one where the errors have equal variances. Of course to preserve the model
we need to also divide X1 and Y1 by 2, and X2 and Y2 by 6.

Assume Yi = βXi + εi, with variance of ε1 = 4, variance of ε2 = 36, and variance of ε3 = 64.
Then we would revise the model to:
Y1/2 = βX1/2 + ε1/2, Y2/6 = βX2/6 + ε2/6, and Y3/8 = βX3/8 + ε3/8.
The revised model is equivalent to the original model; it has the same slope β.
However, the adjusted model is homoscedastic; the errors each have a variance of 1.

Exercise: For the above situation, if X = {10, 36, 104} and Y = {20, 78, 168}, what are the
estimates of β, prior to and after making the above adjustment.
[Solution: Prior to adjustment: ΣXi2 = 12212. ΣXiYi = 20480. β^ = 20480/12212 = 1.68.
After adjusting, X = {10/2, 36/6, 104/8} = {5, 6, 13} and Y = {20/2, 78/6, 168/8} = {10, 13, 21}.
ΣXi2 = 230. ΣXiYi = 401. β^ = 401/230 = 1.74.]

Prior to adjustment, the estimate of the slope is 1.68. However, there was heteroscedasticity.
As discussed previously, therefore while this estimate is unbiased and consistent, it is not
efficient. It does not have the smallest expected squared error among unbiased linear
estimators.

After adjustment, the estimate of the slope is 1.74. The adjustment has removed the
heteroscedasticity. The assumptions behind ordinary least squares hold for the adjusted
model, and therefore this estimate is unbiased, consistent, and efficient. It has the smallest
expected squared error among unbiased linear estimators. The estimate from the adjusted
model is better.
Note that in terms of the original X and Y, the estimate of the slope after the adjustment to
correct for heteroscedasticity is:

β^ = {(10/2)(20/2) + (36/6)(78/6) + (104/8)(168/8)} / {(10/2)2 + (36/6)2 + (104/8)2} =
Σ(Xi/σi)(Yi/σi) / Σ(Xi/σi)2 = ΣwiXiYi / ΣwiXi2, where wi = (1/σi2) / Σ(1/σi2).
In this case, the weights wi are: {1/4, 1/36, 1/64}/{1/4 + 1/36 + 1/64} = {144, 16, 9}/169 =
{.852, .095, .053}. It is as if we count the first observation more heavily and the last
observation less heavily. This is an example of a weighted regression, as discussed in a
previous section.

In a weighted regression we weight some of the observations more heavily than others.
Equivalently we pretend as if we have different numbers of repeated copies of the actual
observations.

Exercise: You have 169 observations. For 144 of them X = 10 and Y = 20, for 16 of them
X = 36 and Y = 78, and for the last 9 of them X = 104 and Y = 168.
Fit the model Yi = βXi + εi, by ordinary least squares regression.
[Solution: ΣXi2 = (144)(10²) + (16)(36²) + (9)(104²) = 132480.
ΣXiYi = (144)(10)(20) + (16)(36)(78) + (9)(104)(168) = 230976. β^ = 230976/132480 = 1.74.]

This is the same result as for the model adjusted to correct for the effects of
heteroscedasticity. The chief use of weighted regressions on this exam is to
correct for the presence of heteroscedasticity.

In order to adjust for heteroscedasticity, we use a weighted regression, in
which we weight each observation by wi, with wi proportional to 1/σi2, the
inverse of the variance of the error εi.

For the model with no intercept:

β^ = Σ(Xi/σi)(Yi/σi) / Σ(Xi/σi)2 = ΣwiXiYi / ΣwiXi2, where wi = (1/σi2) / Σ(1/σi2).

Sometimes one is given Var(εi) = σi2 as a function of an independent variable.

Exercise: You are given that Var(εi) = σ2Xi and the εi’s are uncorrelated.
You fit the regression model Yi = βXi + εi.
Determine the weighted least squares estimate of β.

[Solution: Adjust each variable by dividing by StdDev[εi], σ√Xi.


Yi / σ√Xi = βXi/ σ√Xi + εi/ σ√Xi. Yi /√Xi = β√Xi + εi /√Xi. The errors now have constant variance
and the least squares solution is: β^ = Σ(√Xi)(Yi/√Xi) / Σ(√Xi)2 = ΣYi / ΣXi.
Alternately, wi = (1/(σ2Xi)) / Σ(1/(σ2Xi)) = (1/Xi) / Σ(1/Xi).
β^ = ΣwiXiYi / ΣwiXi2 = {ΣYi/Σ(1/Xi)} / {ΣXi/Σ(1/Xi)} = ΣYi / ΣXi.]

σ2, the proportionality constant in the formula for Var(εi), had no effect on the estimate of the
slope. In this example, we only needed to know that Var(εi) was proportional to Xi. In general,
we only need to know the relationship between σi2 and Xi up to a proportionality constant. We
are only interested in the relative sizes of σi2.

In the above exercise, with the variance of εi proportional to Xi, the Yi associated with large Xi
had a larger variance. Therefore, it is harder to estimate the Yi associated with large Xi with
the same accuracy as those Yi associated with small Xi. We are less concerned with a given
size error in predicting Y when X is large, than the same size error when X is small.
Therefore, it makes sense to weight those squared errors associated with large X less heavily
in a sum of squared errors.

This is exactly what is done in the weighted regression as used to correct for
heteroscedasticity. In the weighted sum of squares, each term is weighted inversely to its
variance. By dividing by each variance, we have standardized the squared errors so they are
on a comparable scale and can be usefully added up.
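As a concrete illustration (not part of the syllabus), here is a short Python sketch of the weighted fit for the no-intercept example above, with Var[εi] = (4, 36, 64); numpy is used only for convenience.

    import numpy as np

    # Correcting for heteroscedasticity in the no-intercept model by weighting
    # each observation by 1/Var[eps_i].
    X = np.array([10.0, 36.0, 104.0])
    Y = np.array([20.0, 78.0, 168.0])
    var_eps = np.array([4.0, 36.0, 64.0])

    w = 1.0 / var_eps                                   # weights proportional to 1/sigma_i^2
    beta_wls = np.sum(w * X * Y) / np.sum(w * X**2)     # about 1.74
    beta_ols = np.sum(X * Y) / np.sum(X**2)             # about 1.68 (unweighted)
    print(beta_ols, beta_wls)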

Two Variable Model:

Let’s apply these ideas to the two variable model: Yi = α + βXi + εi. If we believe there is
heteroscedasticity, we adjust the original model: Yi/σi = α/σi + βXi/σi + εi/σi. This adjusted
model is homoscedastic.

The sum of squared errors for the adjusted model is:


Σ(Yi/σi - (α/σi + βXi/σi))2 = Σ(Yi - α - βXi)2/ σi2.
We minimize this sum of squared errors by setting the partial derivatives with respect to α and
β equal to zero.

0 = Σ(Yi - α - βXi)/ σi2. ⇒ αΣ(1/ σi2) = ΣYi/ σi2 - βΣXi/ σi2.

⇒ α^ = ΣwiYi - β^ ΣwiXi, where wi = (1/σi2)/ Σ(1/σj2).



0 = Σ(Yi - α - βXi)Xi/ σi2. ⇒ 0 = ΣXiYi/ σi2 - αΣXi/ σi2 - βΣXi2/ σi2. ⇒

0 = ΣwiXiYi - α^ΣwiXi - β^ΣwiXi2, where wi = (1/σi2)/ Σ(1/σj2).

Substituting α^ into the second equation, we can solve for β^:

β^ = {ΣwiXiYi - ΣwiXiΣwiYi} / {ΣwiXi2 - (ΣwiXi)2}, where wi = (1/σi2)/ Σ(1/σj2).
α^ = ΣwiYi - β^ΣwiXi.

These are the equations for weighted regression, when Σwi = 1, discussed in a previous
section.

Exercise: Fit a weighted regression to the following data:


Xi Yi Var[εi]
3 10 3
7 6 2
20 4 1
[Solution: wi = (1/σi2)/ Σ(1/ σi2) = {1/3, 1/2, 1}/(1/3 + 1/2 + 1) = {2/11, 3/11, 6/11}.
β^ = {ΣwiXiYi - ΣwiXiΣwiYi}/{ΣwiXi2 - (ΣwiXi)2} = {60.545 - (13.364)(5.6364)}/{233.18 - 13.364²}
= -.271. α^ = ΣwiYi - β^ΣwiXi = 5.6364 - (-.271)(13.364) = 9.26.
The weighted regression is: Y^i = 9.26 - .271Xi.]

Deviations Form:

As discussed in a previous section, one could instead use the equations in deviations form:

xi = Xi - ΣwiXi
yi = Yi - ΣwiYi

β^ = Σwixiyi / Σwixi2.
α^ = ΣwiYi - β^ΣwiXi.

In order to correct for heteroscedasticity, wi = (1/σi2) / Σ(1/σj2).
Adjusting the Town Example for Heteroscedasticity:

For the example of frequencies by town, there was heteroscedasticity, with the variance of the
errors increasing as the number of exposures decreases.
As one example, it was assumed that ^εi 2 = a + b/(exposures for town i) + error term.
Running this regression resulted in: intercept = 453, and slope = 4.94 million.
Based on this regression, let us assume that:
σi2 = 450 + 5 million/(exposures for town i).

The original model is: Yi = α + βXi + εi


The adjusted model is: Yi/σi = α/σi + βXi/σi + εi/σi.

The weights in the weighted regressions are: wi = (1/σi2)/ Σ(1/σj2).


The smallest town with 4092 exposures, has w1 = .004954, a medium sized town with 23,800
exposures has w40 = 0.012548, while the largest town with 211,189 exposures, has
w80 = .0174858. It is not unreasonable that one is giving more weight to the data from larger
towns.

β^ = {ΣwiXiYi - ΣwiXiΣwiYi} / {ΣwiXi2 - (ΣwiXi)2} =
{5.2434 x 10^7 - (67003)(528.33)} / {7.8922 x 10^9 - (67003)²} = .005006.

α^ = ΣwiYi - β^ΣwiXi = 528.33 - (.005006)(67003) = 192.9.

The results of the weighted regression are:


α^ = 192.9 and β^ = .00501. R2 = .9923. R̄2 = .9922. s2 = 1.022.
sα^ = 4.43. t-statistic for the intercept is 43.5. sβ^ = .0000499. t-statistic for the slope is 100.
Durbin-Watson Statistic: 1.72.
Covariance Matrix = ( 19.64        -0.000167      )
                    ( -0.000167    .00000000249   )

The weighted regression is: Y^i = 192.9 + .00501Xi. This is only slightly different from the
unweighted regression obtained previously, Y^i = 194.4 + .00499Xi.

Here is a comparison of the two models for three towns:


Town Exposures Unweighted Regression Weighted Regression
1 4092 214.8 213.4
40 23,800 313.2 312.1
80 211,189 1248.2 1251.0
For example, for the smallest town, the fitted claim frequencies per exposure (rather than per
10000 exposures) are: 2.148% and 2.134%. So while there are small differences, in this
case there is no practical difference between the weighted and unweighted regressions.
Here is a plot of this weighted regression line versus the data:217

[Plot of the weighted regression line and the observed claim frequencies per exposure (vertical axis, up to about 0.12) versus exposures (horizontal axis).]

217 Frequencies are shown as number of claims per exposure, rather than per 10,000 exposures.
Heteroscedasticity-Consistent Estimators:218

Heteroscedasticity-consistent estimators (HCE) provide unbiased and consistent
estimators of the variances of estimated parameters when heteroscedasticity is present.

Exercise: Fit the linear regression model Yi = α + βXi + εi to the following data:
Y 1 2 6 11
X 0 3 5 8
^
Estimate Var[ β].
[Solution: X̄ = (0 + 3 + 5 + 8)/4 = 4. x = (-4, -1, 1, 4).
Ȳ = (1 + 2 + 6 + 11)/4 = 5. y = (-4, -3, 1, 6).
β^ = Σxiyi / Σxi2 = 44/34 = 22/17. α^ = Ȳ - β^X̄ = 5 - (22/17)(4) = -3/17.
Y^i = α^ + β^Xi = (-3/17, 63/17, 107/17, 173/17).
ε^i = Yi - Y^i = (20/17, -29/17, -5/17, 14/17).
s2 = Σε^i2/(N-2) = 5.0588/2 = 2.529. Var[β^] = s2/Σxi2 = 2.529/34 = .0744.]

Heteroscedasticity-consistent estimators are based on the more general equation:219


Var[β^] = Σxi2σi2 /(Σxi2)2, using ε^i to estimate σi.
Var[β^] ≅ Σxi2 ε^i2 / (Σxi2)2.

Exercise: In the previous exercise, determine the heteroscedasticity-consistent estimator of


Var[β^].
[Solution: Var[β^] = Σxi2 ε^i2 /(Σxi2)2 =
{(-4)²(20/17)² + (-1)²(-29/17)² + (1)²(-5/17)² + (4)²(14/17)²} / {(-4)² + (-1)² + (1)² + (4)²}²
= 35.993/34² = .0311.
Comment: This differs from the previous estimate of Var[β^], which assumed homoscedasticity.
Unlike in the use of weighted regression, here there was no need to specify a form of Var[εi] as a
function of Xi.]
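A short Python sketch (illustrative only) reproducing both the usual and the heteroscedasticity-consistent variance estimates from these two exercises:

    import numpy as np

    X = np.array([0.0, 3.0, 5.0, 8.0])
    Y = np.array([1.0, 2.0, 6.0, 11.0])

    x = X - X.mean()
    beta_hat  = np.sum(x * Y) / np.sum(x**2)
    alpha_hat = Y.mean() - beta_hat * X.mean()
    resid = Y - (alpha_hat + beta_hat * X)

    var_usual = np.sum(resid**2) / (len(Y) - 2) / np.sum(x**2)       # about .0744
    var_hce   = np.sum(x**2 * resid**2) / np.sum(x**2)**2            # about .0311
    print(var_usual, var_hce)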

Heteroscedasticity-consistent estimators are not efficient. Efficient estimators of variances of


estimated parameters are obtained via weighted regression.

Heteroscedasticity-consistent estimators are consistent in the presence of heteroscedasticity


of unknown form. In order to apply weighted regression to correct for heteroscedasticity, one
must know or estimate how the variances vary across the observations.
218 See page 152 of Pindyck and Rubinfeld.
219 See Equation 6.3 in Pindyck and Rubinfeld.
Applying Heteroscedasticity-Consistent Estimators to the Town Example:*

The simplest and most commonly used HCE is in matrix form:220


Var[β^] ≅ (X’X)^-1 X’ Diag[ε^i2] X (X’X)^-1, where X is the design matrix.

In the two-variable case, this is equivalent to:

Var[α^] ≅ {Var[X]² Σε^i2 + X̄² Σxi2 ε^i2 - 2Var[X] X̄ Σxi ε^i2} / (Σxi2)2.
Var[β^] ≅ Σxi2 ε^i2 / (Σxi2)2.
Cov[α^, β^] ≅ {Var[X] Σxi ε^i2 - X̄ Σxi2 ε^i2} / (Σxi2)2.

In the town example, this heteroscedasticity-consistent estimator yields the following


covariance matrix:
(20.71 -.000164)
(-.000164 .00000000212)

This is fairly close to the covariance matrix from the weighted regression performed above:
(19.64 -0.000167)
(-0.000167 .00000000249)

As well as the covariance matrix from the unweighted regression:


(18.36 -0.000166)
(-0.000166 .00000000313)

An HCE recommended for samples of size less than 250 is in matrix form:221
Var[β^] ≅ (X’X)^-1 X’ Diag[ε^i2/(1 - hii)²] X (X’X)^-1,
where hii are the diagonal elements of the “hat matrix”, H = X(X’X)^-1X’.

220 Pindyck and Rubinfeld does not contain formulas for HCEs. See “A heteroscedastic-consistent covariance
matrix estimator and a direct test of heteroscedasticity,” by H. White, Econometrica, 48 (1980).
221 See “Some heteroscedasticity consistent covariance matrix estimators with improved finite sample properties,”
by J.G. MacKinnon and H. White, Journal of Econometrics, 29 (1985).
Problems:

Use the following information for the next two questions:


Yi = α + βXi + εi. Var(εi) = Xi/10.
i Xi Yi
1 10 10
2 40 40
3 160 100
4 250 125

28.1 (3 points) Determine the weighted least squares estimate of β.


(A) Less than 0.3
(B) At least 0.3, but less than 0.4
(C) At least 0.4, but less than 0.5
(D) At least 0.5, but less than 0.6
(E) At least 0.6

28.2 (2 points) Determine the weighted least squares estimate of α.


(A) Less than 5
(B) At least 5, but less than 6
(C) At least 6, but less than 7
(D) At least 7, but less than 8
(E) At least 8

Use the following information for the next two questions:


Ten independent loss ratios Y1, Y2, ..., Y10 are described by the model Yt = α + εt.
Y1 + Y2 + Y3 = 225. Y4 + Y5 + Y6 + Y7 = 290. Y8 + Y9 + Y10 = 205.

28.3 (1 point) Determine the ordinary least squares estimator of α.


(A) 70.5 (B) 71.0 (C) 71.5 (D) 72.0 (E) 72.5

28.4 (3 points) Var(εt) = 1/3, t = 1, 2, 3; Var(εt) = 1/5, t = 4, 5, 6, 7; Var(εt) = 1/8, t = 8, 9, 10.


Determine the weighted least squares estimator of α.
(A) 70.5 (B) 71.0 (C) 71.5 (D) 72.0 (E) 72.5

Use the following 15 observations for the next two questions:


X: 10 10 10 10 10 15 15 15 15 15 20 20 20 20 20
Y: 15 23 11 14 18 19 29 20 35 24 26 48 27 38 39

28.5 (3 points) Fit the ordinary least squares model, Yi = α + βXi + εi.

28.6 (6 points) Fit the weighted least squares model, Yi = α + βXi + εi, assuming Var(εi) is
proportional to Xi2.

28.7 (4 points) You fit the linear regression model Yi = α + βXi + εi to the following data:
Y 3 9 14
X 1 4 10
Determine the heteroscedasticity-consistent estimator of Var[β^].
(A) .013 (B) .014 (C) .015 (D) .016 (E) .017

28.8 (2 points) You fit the model Y = α + βX + ε.


The error variance is inversely proportional to X.
Which of the following models corrects for this form of heteroscedasticity?
(A) YX1/4 = αX1/4 + βX5/4 + ε∗
(B) YX1/4 = α + βX5/4 + ε∗
(C) YX1/2 = αX1/2 + βX3/2 + ε∗
(D) YX-1/4 = αX-1/4 + βX3/4 + ε∗
(E) YX-1/2 = αX-1/2 + βX1/2 + ε∗

28.9 (Course 120 Sample Exam #1, Q.10) (2 points)


You fit the regression model Yi = βXi + εi to the following data:
Y 3 8 15
X 1 4 9
You are given that Var(εi) = σ2Xi and the εi’s are uncorrelated.
Determine the weighted least squares estimate of β.
(A) 1.68 (B) 1.70 (C) 1.73 (D) 1.86 (E) 2.22

28.10 (Course 120 Sample Exam #3, Q.9) (2 points)


You fit the model Yi = βXi + εi to the following observations:
X 1 2 3 4 5
Y 8 12 24 36 55
Determine β^, the least squares estimate of β, when the error variance is proportional to X2.
(A) 8.4 (B) 9.0 (C) 9.5 (D) 9.8 (E) 10.1

28.11 (4, 11/00, Q.31) (2.5 points) You are given:


(i) yi = βxi + εi
Var(εi) = (xi/2)2
(ii) i xi yi
1 1 8
2 2 5
3 3 3
4 4 –4
Determine the weighted least squares estimate of β.
(A) 0.4 (B) 0.9 (C) 1.4 (D) 2.0 (E) 2.6
28.12 (4, 5/01, Q.21) (2.5 points) Twenty independent loss ratios Y1, Y2, ..., Y20 are
described by the model Yt = α + εt,
where: Var(εt) = 0.4, t = 1, 2, ...,8, and Var(εt) = 0.6, t = 9, 10, ...,20.
You are given:
Ȳ1 = (Y1 + Y2 +...+ Y8)/8
Ȳ2 = (Y9 + Y10 +...+ Y20)/12.
Determine the weighted least squares estimator of α in terms of Ȳ1 and Ȳ2.
(A) 0.3 Ȳ1 + 0.7 Ȳ2
(B) 0.4 Ȳ1 + 0.6 Ȳ2
(C) 0.5 Ȳ1 + 0.5 Ȳ2
(D) 0.6 Ȳ1 + 0.4 Ȳ2
(E) 0.7 Ȳ1 + 0.3 Ȳ2

28.13 (4, 11/01, Q.28) (2.5 points) You fit the model Y = α + βX + ε.
The error variance is proportional to X-1/2.
Which of the following models corrects for this form of heteroscedasticity?
(A) YX1/4 = αX1/4 + βX5/4 + ε∗
(B) YX1/4 = α + βX5/4 + ε∗
(C) YX1/2 = αX1/2 + βX3/2 + ε∗
(D) YX-1/4 = αX-1/4 + βX3/4 + ε∗
(E) YX-1/2 = αX-1/2 + βX1/2 + ε∗

28.14 (4, 11/04, Q.23) (2.5 points) The model Yi = βX i + εi is fitted to the following
observations:
X Y
1.0 1.0
4.5 9.0
7.0 20.1
You are given:
Var(εi) = σ2Xi.
Determine the weighted least-squares estimate of β.
(A) Less than 2.5
(B) At least 2.5, but less than 2.8
(C) At least 2.8, but less than 3.1
(D) At least 3.1, but less than 3.4
(E) At least 3.4
Mahler’s Guide to
Regression
Sections 29-32:
29 Serial Correlation
30 Durbin-Watson Statistic
31 Correcting for Serial Correlation
32 Multicollinearity

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-H

New England Actuarial Seminars Howard Mahler


POB 315 hmahler@mac.com
Sharon, MA, 02067
www.neas-seminars.com

Section 29, Serial Correlation

Assume one has the following time series, Y1, Y2, ... , Y20: 547, 628, 778, 759, 823, 772, 904,
971, 974, 916, 1043, 932, 959, 1077, 998, 1003, 1048, 1371, 1414, 1568.

Exercise: Fit a linear regression to this time series.


[Solution: X = 1, 2, ..., 20. X̄ = 10.5. Σxi2 = 665. Ȳ = 974.25. Σxiyi = 25289.5.
β^ = Σxiyi/Σxi2 = 25289.5/665 = 38.0. α^ = Ȳ - β^X̄ = 575.]

Exercise: What are the residuals, ε^t , for this regression?


[Solution: The fitted Y^t are: 613, 651, 689, 727, 765, 803, 841, 879, 917, 955, 993, 1031,
1069, 1107, 1145, 1183, 1221, 1259, 1297, 1335.
ε^t = Yt - Y^t = -66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110, -30, -147, -180, -173, 112,
117, 233.]

Here is a graph of these residuals:

[Graph of the residuals against time, t = 1 to 20.]

There seems to be some tendency for a positive residual to follow a positive residual, and a
negative residual to follow a negative residual.

This apparent correlation is also shown in a graph of ε^ t −1 versus ε^t :

There are more points in the lower left and upper right quadrants. Such a lower-left and
upper-right pattern, indicates positive serial correlation. On the other hand, a lower-right and
upper-left pattern would indicate negative serial correlation.

The visual impression of the above graph of ε^ t −1 versus ε^t can be quantified by calculating

the sample correlation between ε^ t −1 and ε^t , which is positive in this example.

The average of the residuals for 1 to 19 is: -12.00


The average of the residuals for 2 to 20 is: 3.74.

The sample correlation between ε^ t −1 and ε^t is:222


Σ(ε^t−1 + 12)(ε^t - 3.74) / √{Σ(ε^t−1 + 12)2 Σ(ε^t - 3.74)2} = 100382 / √{(168094)(220498)} = .52,
where each sum runs from t = 2 to 20.

The residuals appear to be positively correlated with their value one time period earlier.223
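For those who wish to check this, a minimal Python sketch of the lag-1 sample correlation of these residuals (illustrative only):

    import numpy as np

    resid = np.array([-66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110,
                      -30, -147, -180, -173, 112, 117, 233], dtype=float)
    lagged, current = resid[:-1], resid[1:]           # (eps_{t-1}, eps_t) pairs
    corr = np.corrcoef(lagged, current)[0, 1]         # about 0.52
    print(corr)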

222 See for example, equation 2.9 in Econometric Models and Economic Forecasts.
223 The data for this time series was in fact simulated from a model with positive serial correlation with ρ = .5.
First Order Serial Correlation:

One of the assumptions of the ordinary least squares regression was that the
errors were independent. If they were independent the correlation should be
zero. It appears that in this case this assumption is violated. Rather we seem to have positive
serial correlation.224

Let ρ = Corr[εt-1, εt]. If ρ > 0, then we have positive (first order) serial
correlation.225 If ρ < 0, then we have negative (first order) serial correlation. Positive serial
correlation is common in time series.

ρ > 0 ⇔ successive residuals tend to be alike.


ρ < 0 ⇔ successive residuals tend to be unalike.
ρ = 0 ⇔ successive residuals are approximately independent.226

Model of First Order Serial Correlation:

εt = ρεt-1 + νt,
where |ρ| ≤ 1, εt is Normal with mean zero and standard deviation σε, and
νt is Normal with mean zero and standard deviation σν, with νt independent of εt-1.227

σε2 = Var[εt] = Var[ρεt-1 + νt] = ρ2Var[εt-1] + Var[νt] = ρ2σε2 + σν2.228


⇒ Var[εt] = σε2 = σν2/(1 - ρ2).229

Cov[εt-1, εt] = E[εt-1εt] = E[εt-1(ρεt-1 + νt)] = E[ρεt-12 + εt-1νt] = ρE[εt-12] + E[εt-1νt]


= ρVar[εt-1] + ρE[εt-1]2 + E[εt-1]E[νt] = ρσε2.230
⇒ Corr[εt-1, εt] = Cov[εt-1, εt]/√(Var[εt-1]Var[εt]) = ρσε2/σε2 = ρ.
Thus ρ is indeed the correlation between successive errors.

Similarly, Corr[εt-2, εt] = ρ2, Corr[εt-3, εt] = ρ3, and Corr[εt-d, εt] = ρd.231

224
Due to random fluctuation in this relatively small sample, the calculated simple correlation is not necessarily a
good indication of the underlying correlation of errors.
225
Serial correlation is discussed more extensively with respect to time series.
226
Even if the underlying errors are independent, the residuals are not independent of each other, because they
sum to zero.
227
See Equation 6.12 in Pindyck and Rubinfeld.
228
Where we have assumed homoscedasticity. Var[εt-1] = Var[εt].
229
See Equation 6.13 in Pindyck and Rubinfeld.
230
See Equation 6.14 in Pindyck and Rubinfeld.
We have used the fact that νt and εt-1 are independent, and each have mean of zero.
231
See Equations 6.15 and 6.16 in Pindyck and Rubinfeld.
Effects of Positive Serial Correlation on Ordinary Regression Estimators:

1. Still unbiased.
2. Still consistent.
3. No longer efficient.232
4. Standard error of regression is biased downwards.
5. Overestimate precision of estimates of model coefficients.
6. Some tendency to reject H0: β = 0, when one should not.
7. R2 is biased upwards.

Exercise: For the regression done above what are R2 and R̄2?
[Solution: RSS ≡ Σ(Y^i - Ȳ)2 = 960,261. TSS = Σ(Yi - Ȳ)2 = 1,186,860.
R2 = RSS/TSS = .81. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (.19)(19)/18 = .20. R̄2 = .80.]

These estimates of the goodness of fit are biased upwards due to the positive serial
correlation. The fitted regression is likely to explain less than 80% of the actual underlying
variation.

Exercise: For the regression done above what are the t-statistics?
[Solution: s2 = Σ ^εi 2 /(N - 2) = 225119/18 = 12507.

Var[ α^ ] = s2ΣXi2 /(NΣxi2) = (12507)(2870)/((20)(665)) = 2699.

sα^ = √2699 = 51.95. In order to test the hypothesis that α = 0, t = α^ /sα^ = 575/ 51.95 = 11.1.
Var[β^] = s2/Σxi2 = 12507/665 = 18.81. sβ^ = √18.81 = 4.34.
In order to test the hypothesis that β = 0, t = β^/sβ^ = 38.0/4.34 = 8.8.]

Based on these very large t-statistics one would ordinarily conclude that each of the
coefficients is significantly different from zero. However, the standard error of the regression
is biased downwards, due to the positive serial correlation. Therefore, the absolute values of the
estimated t-statistics are likely to be too large. One can not trust any conclusions one might
draw from these calculated t-statistics.

The statistic commonly used to test for the presence of serial correlation is the Durbin-Watson
statistic. After the Durbin-Watson statistic is discussed, then methods of correcting for serial
correlation will be discussed.

232 No longer does ordinary least squares have the smallest variance of unbiased estimators. This property
depended on an assumption that the errors were independent and had equal variance.

Here are some examples of simulated series of ε^t , with time on the horizontal axis:
[Three simulated residual series are plotted, for ρ = 0.7, ρ = 0, and ρ = -0.7.]

and here are the corresponding plots of ε^ t −1 versus ε^t :

[The corresponding three scatter plots of ε^t−1 versus ε^t, for ρ = 0.7, ρ = 0.0, and ρ = -0.7.]
Simulating Serial Correlation:*

Assume for example we wish to simulate the model: Yi = 22 + 7Xi + εi.


We are given the values for a series of observations of X, and want to simulate a
corresponding series of values of Y.

It is assumed that εi is Normally Distributed with mean zero.

If the model has Var[εi] = 100, with no serial correlation, then we simulate this model as
follows:233
Yi = 22 + 7Xi + 10(Random Standard Normal Distribution),
where for each Yi we simulate a new independent random draw from a Normal Distribution
with mean zero and standard deviation 1.

Assume instead that the model has first order serial correlation with for example ρ = .6, and
as before Var[εi] = 100. Then we simulate this model as follows:
εi = .6εi-1 + 8(Random Standard Normal Distribution),
Yi = 22 + 7Xi + εi,
where for each εi we simulate a new independent random draw from a Normal Distribution
with mean zero and standard deviation 1.
In order to initialize, we let ε0 = 10(Random Standard Normal Distribution).

Exercise: Let -0.849, 0.931, 1.988, and -1.253, be four independent random draws from a
Standard Normal Distribution. X1 = 1.7, X2 = 3.1, and X3 = 4.5. Simulate Y1, Y2, and Y3.
[Solution: ε0 = 10(-.849) = -8.49. ε1 = (.6)(-8.49) + (8)(.931) = 2.35.
ε2 = (.6)(2.35) + (8)(1.988) = 17.31. ε3 = (.6)(17.31) + (8)(-1.253) = .36.
Y1 = 22 + (7)(1.7) + 2.35 = 36.25. Y2 = 22 + (7)(3.1) + 17.31 = 61.01.
Y3 = 22 + (7)(4.5) + .36 = 53.86.]

In general, εi = ρεi-1 + (Random Standard Normal Distribution)σ√(1 - ρ2), where σ2 is the


variance of ε.234 Initially we let ε0 = σ(Random Standard Normal Distribution).

Exercise: If εi = .6εi-1 + 8(Random Standard Normal Distribution), what is Var[εi]?


[Solution: Var[εi] = .62Var[εi-1] + 82(1) = .36Var[εi] + 64. ⇒ Var[εi] = 64/.64 = 100.]

Thus in the example above, we do in fact have Var[εi] = 100 as desired.
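A minimal Python sketch of this simulation recipe (illustrative only; the seed is arbitrary, so the simulated values will not match the exercise above):

    import numpy as np

    # Simulate Y = 22 + 7*X + eps with first order serial correlation,
    # rho = 0.6 and Var[eps] = 100, following the recipe above.
    rng = np.random.default_rng(0)            # seed chosen arbitrarily
    rho, sigma = 0.6, 10.0
    X = np.array([1.7, 3.1, 4.5])

    eps = np.empty(len(X))
    prev = sigma * rng.standard_normal()      # initialize eps_0
    for i in range(len(X)):
        # sigma*sqrt(1 - rho^2) = 8, as in the example above
        prev = rho * prev + sigma * np.sqrt(1 - rho**2) * rng.standard_normal()
        eps[i] = prev
    Y = 22 + 7 * X + eps
    print(Y)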

233
Here one is not interested in the details of how the computer simulates a random draw from a Normal
Distribution. How to simulate a Normal Distribution is explained for example in Simulation by Ross.
234
See Equation 6.12 in Econometric Models and Econometric Forecasts by Pindyck and Rubinfeld.
Problems:

29.1 (1 point) Which of the following is not an effect of positive serial correlation on regressions?
A. Ordinary least squares regression estimators are no longer efficient.
B. Ordinary least squares regression estimators are no longer consistent.
C. The standard error of the regression is biased downwards.
D. The estimate of the portion of variation explained by the regression is biased upwards.
E. There will be a tendency to reject the null hypothesis when in fact it should not be rejected.

Use the following information for the next two questions:

Y = 4 + 3t + εt, t = 1, 2, 3, 4, 5.
Prob[εt = 1] = 50% = Prob[εt = -1].

29.2 (3 points) If Corr[εt-1 , εt] = 1, determine the expected value of the Error Sum of Squares.

29.3 (3 points) If Corr[εt-1 , εt] = -1, determine the expected value of the Error Sum of Squares.

29.4 (1 point) According to Pindyck and Rubinfeld in Econometric Models and Economic
Forecasts, which of the following statements about serial correlation is false?
A. When there is first order serial correlation, the errors in one time period are correlated
directly with errors in the ensuing period.
B. Serial correlation may be positive or negative.
C. Positive serial correlation frequently occurs in time series studies.
D. A likely cause of negative serial correlation is the high degree of correlation over time that
is present in the cumulative effects of omitted variables.
E. When there is negative serial correlation, a negative error is likely to be followed by a
positive error in the ensuing period.

29.5 (1 point) There is first order serial correlation, εt = ρεt-1 + νt.


ρ = 0.7. Var[ν] = 40. Determine Var[ε].
A. 12 B. 20 C. 28 D. 78 E. 133.

29.6 (1 point) The regression model Yt = α + βXt + εt is fit to five different time series.

Which of the following graphs of ε^ t −1 versus ε^t indicates negative serial correlation?

[Five scatter plots of ε^ t −1 versus ε^t , labeled A, B, C, D, and E.]

Section 30, Durbin-Watson Statistic

Exercise: For the regression fit in the previous section, compute Σ( ε^t - ε^ t −1)2 / Σ ε^t 2.
[Solution: ε^t = Yt - Y^t = -66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110, -30, -147, -180,

-173, 112, 117, 233. Σ( ε^t - ε^ t −1)2 = 192,533. Σ ε^t 2 = 225,119. Σ( ε^t - ε^ t −1)2 / Σ ε^t 2 = .855.]

This is an example of the Durbin-Watson statistic, which is used to test for serial correlation.
If the value of the DW statistic is far from 2, such as .855, that indicates the likely presence of
serial correlation. The Durbin-Watson statistic is computed from the residuals as follows:235

DW = Σt=2,...,N ( ε^t - ε^ t −1)2 / Σt=1,...,N ε^t 2.
Exercise: Yt = 23, 25, 33, 39, 46, 44. Y^t = 22.6, 27.5, 32.5, 37.5, 42.5, 47.4.
Compute the Durbin-Watson statistic.
[Solution: ε^t = Yt - Y^t = .4, -2.5, .5, 1.5, 3.5, -3.4. Σ ε^t 2 = 32.7.

ε^t - ε^ t −1 = -2.9, 3.0, 1.0, 2.0, -6.9. Σ( ε^t - ε^ t −1)2 = 70.0.

DW = Σ( ε^t - ε^ t −1)2 / Σ ε^t 2 = 70.0/32.7 = 2.14.]
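The statistic can also be computed with a short function; here is a sketch in Python using numpy, applied to the residuals of the exercise just worked (so the output should be approximately 2.14):

import numpy as np

def durbin_watson(residuals):
    # DW = sum over t = 2..N of (e_t - e_{t-1})^2, divided by sum over t = 1..N of e_t^2.
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

y     = np.array([23, 25, 33, 39, 46, 44], dtype=float)
y_hat = np.array([22.6, 27.5, 32.5, 37.5, 42.5, 47.4])
print(durbin_watson(y - y_hat))   # approximately 2.14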

The DW Statistic has the following properties:


0 < DW < 4.
no serial correlation ⇔ DW near 2.
positive serial correlation ⇔ DW small.
negative serial correlation ⇔ DW large.
DW ≅ 2(1 - ρ ), where ρ is the correlation coefficient of adjacent errors.

One can use the Durbin-Watson statistic to test the null hypothesis that there is no (first order)
serial correlation ⇔ ρ = 0. One would need values from the appropriate statistical table,
which is not attached to the exam. The values depend on the number of observations, N, and
the number of explanatory variables excluding the constant term, k.236 237
235
See Equation 6.22 in Econometric Models and Economic Forecasts.
Note that the numerator has only N-1 terms, while the denominator has N terms.
236
Note that k in the DW table is the number of explanatory variables excluding the constant, while elsewhere in
Econometric Models and Economic Forecasts k is the number of explanatory variables including the constant.
237
As N increases, the range of DWs where we do not reject H0 gets narrower. As k increases, the range of DWs
where we do not reject H0 gets wider. As N increases, the range of DW that result in an indeterminate result
decreases. The distribution of the DW statistic is complicated. It not only depends on N, k, and the chosen
significance level, it also depends on X, the design matrix of independent variables. See Kendall’s Advanced
Theory of Statistics, Volume 2.
Using Table 5 at the back of Econometric Models and Economic Forecasts, at a 5%
significance level, for the two-variable regression model, one explanatory variable excluding
the constant (k = 1), fit to 20 observations (N =20): dl = 1.20 and du = 1.41. These values,
d-lower and d-upper are compared to the DW statistic and conclusions are drawn as follows:
H0 : no (first order) serial correlation ⇔ ρ = 0
0 < DW < 1.20 ⇒ reject H0, positive serial correlation.
1.20 < DW < 1.41 ⇒ indeterminate238
1.41 < DW < 4 - 1.41 = 2.59 ⇒ do not reject H0.
2.59 < DW < 4 - 1.20 = 2.80 ⇒ indeterminate239
2.80 < DW < 4 ⇒ reject H0, negative serial correlation.

Exercise: A multiple regression model with 4 explanatory variables excluding the constant
term has been fit to 50 observations. You test the hypothesis that there is no first order serial
correlation of the errors. How does your conclusion depend on the Durbin-Watson Statistic?
Hint: Looking in Table 5 in Econometric Models and Economic Forecasts, at a 5%
significance level, for k = 4 and N = 50: dl = 1.38 and du = 1.72.
[Solution: 0 < DW < 1.38 ⇒ reject H0, positive serial correlation.
1.38 < DW < 1.72 ⇒ no conclusion on H0 (but not negative serial correlation.)
1.72 < DW < 4 - 1.72 = 2.28 ⇒ do not reject H0.
2.28 < DW < 4 - 1.38 = 2.62 ⇒ no conclusion on H0 (but not positive serial correlation.)
2.62 < DW < 4 ⇒ reject H0, negative serial correlation.]

As the number of observations increases, the size of the indeterminate regions decreases.240
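The five-region decision rule can be written out mechanically; here is a sketch in Python, where dl and du must be supplied from a Durbin-Watson table for the given N, k, and significance level (the values 1.20 and 1.41 used below are the 5% values for N = 20 and k = 1 quoted above):

def dw_test(dw, dl, du):
    # H0: no first order serial correlation.
    if dw < dl:
        return "reject H0: positive serial correlation"
    if dw < du:
        return "indeterminate (but not negative serial correlation)"
    if dw <= 4 - du:
        return "do not reject H0"
    if dw <= 4 - dl:
        return "indeterminate (but not positive serial correlation)"
    return "reject H0: negative serial correlation"

print(dw_test(0.855, 1.20, 1.41))   # positive serial correlation, as found for the earlier regression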

A Simulation Experiment:*

I performed a simulation experiment on the model: Yt = 50 + 5t + εt, t = 1, 2, ... ,20.


I assumed the errors were independent and had variance 100.

I simulated Y1, Y2, ... Y20, fit a linear regression, and computed the Durbin-Watson Statistic.
Since the Ys were simulated with no serial correlation, we expect DW to be close to 2.
When I performed the experiment 10 times, I got the following values of the DW Statistic,
sorted from smallest to largest: 1.35, 1.75, 1.99, 2.03, 2.08, 2.21, 2.36, 2.36, 2.80, 2.85.
While most of the values are close to 2, some are not that close.

Using Table 5 at the back of Econometric Models and Economic Forecasts, at a 5%


significance level, for the two-variable regression model, one explanatory variable excluding
the constant (k = 1), fit to 20 observations (N =20): dl = 1.20 and du = 1.41.

238
No conclusion on H0 (but not negative serial correlation.)
239
No conclusion on H0 (but not positive serial correlation.)
240
Some people use the rule of thumb, that at least 50 observations of a time series are needed before the
Durbin-Watson test is likely to provide worthwhile conclusions.
Since 1.20 < 1.35 < 1.41, in this case the test would have been inconclusive241; there might be
positive serial correlation or no serial correlation, but there is not negative serial correlation.
Since 1.41 < 1.75 < 2.59, in this case we would not reject the null hypothesis that there is no
serial correlation. The value 2.80 lies right at the boundary 4 - 1.20 = 2.80 of the indeterminate
region 2.59 < DW < 2.80; in that region there might be negative serial correlation or no serial
correlation, but there is not positive serial correlation. For the largest value, since 2.85 > 2.80,
we would reject H0 and (incorrectly) conclude that there is negative serial correlation.

Here is a graph of DW for 100 such simulations, with horizontal lines at dl = 1.20, du = 1.41,
4 - du = 2.59, and 4 - dl = 2.80:

[Plot of the 100 simulated Durbin-Watson statistics.]

The 100 simulations were divided into:


2 cases with 0 < DW < 1.20 ⇒ reject H0, positive serial correlation.
3 cases with 1.20 < DW < 1.41 ⇒ no conclusion (however, not negative serial correlation.)
86 cases with 1.41 < DW < 2.59 ⇒ do not reject H0.
5 cases 2.59 < DW < 2.80 ⇒ no conclusion (however, not positive serial correlation.)
4 cases with 2.80 < DW < 4 ⇒ reject H0, negative serial correlation.

Since the table from which dl and du were taken was for a 5% significance level, we would
expect about 5 cases out of 100 where we reject H0 even though H0 is in fact true.242 For this
particular simulation, there are 6 cases where we reject H0 when we should not have.243
241
Assuming one did not know how the data had been simulated.
242
In other words, the probability of a Type I Error is 5%.
243
At a 1% significance level it turns out dl = .95, du = 1.15. See Kendall’s Advanced Theory of Statistics,
Volume 2, Appendix Table 11. For this same experiment, at the 1% level, there are no cases where one rejects
H0, 6 indeterminate cases, and 94 cases where one does not reject H0.

Next, I simulated a similar situation, but with positive serial correlation with ρ = 0.5.
As discussed previously, one can simulate this situation as follows:
εt = .5εt-1 + vt, where vt are independent Normally distributed with mean 0 and variance 75.
Var[εt] = Var[vt]/(1 - ρ2) = 75/(1 - .52) = 100.
Take ε1 to be Normal with mean 0 and variance 100.

When I performed the experiment 10 times, I got the following values of the DW Statistic,
sorted from smallest to largest: .80, .91, .93, 1.14, 1.30, 1.31, 1.34, 1.52, 1.62, 1.63.
Since ρ = .5, we expect the Durbin-Watson statistics to be close to 2(1 - ρ) = 1.
Using dl = 1.20, du = 1.41, we would have 4 cases (DW < 1.20) where we (correctly)
conclude there is positive serial correlation, 3 cases (1.20 < DW < 1.41) where we are not sure
whether there is positive or no serial correlation, and 3 cases (1.41 < DW < 2.59) where we
do not reject the null hypothesis that there is no serial correlation.

Here is a graph of DW for 100 such simulations, with horizontal lines at dl = 1.20, du = 1.41,
and 4 - du = 2.59:

[Plot of the 100 simulated Durbin-Watson statistics.]

There were many simulation runs which resulted in Durbin-Watson statistics in the
indeterminate region, 1.20 < DW < 1.41, in the region where we reject the null hypothesis in
favor of positive serial correlation, DW < 1.20, and in the region where we do not reject the
null hypothesis, 1.41 < DW < 2.59. With only 20 observations, it is often the case that the
Durbin-Watson test will lead to no conclusion.

I performed a similar experiment, but with negative serial correlation, ρ = - .5. (A positive
error is more likely to be followed by a negative error and vice-versa.)

Here is a graph of DW for 100 such simulations, with horizontal lines at du = 1.41,
4 - du = 2.59, and 4 - dl = 2.80:

[Plot of the 100 simulated Durbin-Watson statistics.]

In the experiments with negative serial correlation DW is more likely to be large, while in the
experiments with positive serial correlation DW is more likely to be small. However, as with
any statistic, for finite sample sizes DW is subject to random fluctuation. So with negative
serial correlation, we can get a calculated DW < 2.

Lagged Dependent Variables:244

The model Yt = α + βYt-1 + γXt + εt, contains a lagged dependent variable. The value of Y is
assumed to depend among other things on the value of Y during the previous period.245
Assuming we observe Xt and Yt-1, we could use the fitted model to forecast Yt.

While the Durbin-Watson test can be used when a lagged dependent variable is present in
the regression, the DW statistic will often be close to 2, even when there is serial correlation.
The usual tables of critical values for the DW Statistic, dl and du, are not valid when there is a
lagged variable. Using the DW statistic when there is a lagged dependent variable, one is
unlikely to find serial correlation when it exists.
244
See Section 6.2.3 of Pindyck and Rubinfeld.
245
For example, Y might be automobile insurance claim frequency and X might be the price of gasoline in real
dollars.
When there is a lagged dependent variable, the Durbin-Watson test is biased
towards not rejecting the null hypothesis of no serial correlation.246 Pindyck and
Rubinfeld present two alternatives to the Durbin-Watson test for use when there is a lagged
variable. The first is the Durbin h-test.

Durbin’s h-test:

Let us assume we have fit the model Yt = α + βYt-1 + γXt + εt to 200 observations and get a
Durbin-Watson Statistic of 1.74 and Var[ β^ ] = .0009. Then it turns out that
h = (1 - DW/2)√{N/(1 - N Var[ β^ ])} = (1 - 1.74/2)√{200/(1 - (200)(.0009))} = (.13)(15.62) = 2.03,
has a Standard Normal Distribution if H0 is true. Therefore, since Φ(1.960) = .975 and 2.03 >
1.960, we can reject the null hypothesis at a 5% significance level (two-sided test) or 2.5% (one-sided test).247

When there is a lagged variable with coefficient β, then the Durbin h-statistic:248
h = (1 - DW/2)√{N/(1 - N Var[ β^ ])}, where N is the number of observations and
DW is the Durbin-Watson Statistic, has a Standard Normal Distribution
if H0: no serial correlation, is true. This is not valid if N Var[ β^ ] ≥ 1.249

Exercise: One has fit the model Yt = α + βYt-1 + εt to 100 observations of a time series and the
Durbin-Watson Statistic is 1.68 and Var[ β^ ] = .0012. H0 is that there is no serial correlation
and H1 is that there is positive serial correlation.
Using Durbin’s h-test, what conclusion do you draw?
[Solution: h = (1 - DW/2)√(N/(1 - N Var[ β^ ])) = 1.706. Φ(1.645) = .95 and Φ(1.960) = .975.
1.645 < 1.706 < 1.960. We reject H0 at 5% and do not reject H0 at 2.5% (one-sided test.)]
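A small helper for the h-statistic; this is a sketch in Python, with scipy used only to translate h into a one-sided p-value:

import math
from scipy.stats import norm

def durbin_h(dw, n, var_beta_lag):
    # h = (1 - DW/2) * sqrt(N / (1 - N*Var[beta hat])), where beta is the coefficient
    # of the lagged dependent variable. Not valid when N*Var[beta hat] >= 1.
    if n * var_beta_lag >= 1:
        raise ValueError("Durbin's h-test is not applicable when N*Var[beta] >= 1")
    return (1 - dw / 2) * math.sqrt(n / (1 - n * var_beta_lag))

h = durbin_h(1.68, 100, 0.0012)      # the exercise above
print(h, 1 - norm.cdf(h))            # h is about 1.71; one-sided p-value about 0.044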

A Second Technique when one has a Lagged Dependent Variable:

The second technique for dealing with a lagged dependent variable involves fitting a
regression to the residuals. If the original model is Yt = α + βYt-1 + γXt + εt,

then we fit: ε^t = a + ρ ε^ t −1 + bYt-1 + gXt + errort. We then apply the usual t-test to ρ. If ρ is
significantly different from zero, then we reject the null hypothesis of no serial correlation.

246
Nevertheless, it is still often used in this situation.
247
A one sided test is performed if the alternative hypothesis is positive serial correlation. A two-sided test is
performed if the alternative hypothesis is serial correlation.
248
See Section 6.2.3 of Pindyck and Rubinfeld.
249
If N Var[ β^ ] ≥ 1, one would be taking the square root of a negative number.

Exercise: One has fit the model Yt = α + βYt-1 + εt to 125 observations of a time series.

Then using the residuals, one fits ε^t = a + ρ ε^ t −1 + bYt-1 + errort. ρ^ = -.23 and Var[ρ^ ] = .017.
What conclusion do you draw?
[Solution: t = - .23/√.017 = -1.76. We fit 124 residuals, with 3 parameters, for 121 degrees of
freedom. The critical values at 10% and 5% are about 1.66 and 1.98. 1.66 < 1.76 < 1.98.
We reject H0 at 10% (two-sided test) and do not reject H0 at 5% (two-sided test).
At 10%, we conclude there is serial correlation.]
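This residual regression is just another ordinary least squares fit; here is a sketch in Python with numpy. The inputs are hypothetical: resid holds the residuals of the original fit and y_lagged the corresponding values of Yt-1, aligned observation by observation.

import numpy as np

def residual_lag_test(resid, y_lagged):
    # Fit e_t = a + rho*e_{t-1} + b*Y_{t-1} + error, and return rho hat and its t-statistic.
    e = np.asarray(resid, dtype=float)
    ylag = np.asarray(y_lagged, dtype=float)
    X = np.column_stack([np.ones(len(e) - 1), e[:-1], ylag[1:]])   # drop the first observation
    target = e[1:]
    coef, _, _, _ = np.linalg.lstsq(X, target, rcond=None)
    s2 = np.sum((target - X @ coef) ** 2) / (len(target) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return coef[1], coef[1] / np.sqrt(cov[1, 1])

If the resulting t-statistic for rho is significant, reject the null hypothesis of no serial correlation, as in the exercise above.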

Some Intuition with respect to the Durbin-Watson Statistic:

Σt=2,...,N ( ε^t - ε^ t −1)2 = Σt=2,...,N ε^t 2 - 2Σt=2,...,N ε^t ε^ t −1 + Σt=1,...,N-1 ε^t 2
= 2Σt=1,...,N ε^t 2 - 2Σt=2,...,N ε^t ε^ t −1 - ε^12 - ε^N2.

If εt-1 and εt have a correlation of zero, then Cov[εt-1 , εt] = 0.


⇒ E[εt-1 εt] = E[εt-1] E[εt] = (0)(0) = 0.

If ρ = 0, and N is not small, then E[ ε^t ε^ t −1] ≅ 0, and the sum of cross products is small.250

DW = Σt=2,...,N ( ε^t - ε^ t −1)2 / Σt=1,...,N ε^t 2 ≅ 2Σt=1,...,N ε^t 2 / Σt=1,...,N ε^t 2 = 2.

If instead ρ > 0, then E[ ε^t ε^ t −1] > 0, and DW < 2.

If ρ = 1, then E[ ε^t ε^ t −1] ≅ E[ ε^t 2], and DW ≅ 0.

If ρ = -1, then E[ ε^t ε^ t −1] ≅ −E[ ε^t 2], and DW ≅ 4.

250
Since the residuals sum to zero, they are not independent.
Problems:

30.1 (1 point) The Durbin-Watson Statistic is 2.9. Assuming there is serial correlation of the
errors, estimate ρ, the correlation coefficient between successive errors.
(A) Less than -0.6
(B) At least -0.6, but less than -0.2
(C) At least -0.2, but less than 0.2
(D) At least 0.2, but less than 0.6
(E) At least 0.6

30.2 (1 point) A multiple regression model with 3 explanatory variables excluding the
constant term has been fit to 40 observations.
You test the hypothesis that there is no first order serial correlation of the errors.
At a 5% significance level, for k = 3 and N = 40: dl = 1.34 and du = 1.66.
The Durbin-Watson Statistic is 2.41.
Which of the following statements is true?
A. Do not reject the null hypothesis.
B. Reject the null hypothesis; negative serial correlation present.
C. Reject the null hypothesis; positive serial correlation present.
D. Result indeterminate.
E. None of A, B, C, or D.

30.3 (3 points) Use the following information:


Year (t) Loss Ratio (Y)
1 82
2 78
3 80
4 73
5 77
You have fit the following model: Y = 82.5 - 1.5t.
What is the Durbin-Watson Statistic?
(A) 3.0 (B) 3.2 (C) 3.4 (D) 3.6 (E) 3.8

30.4 (2 points) One has fit the model Yt = α + βYt-1 + γXt + εt to 50 observations of a time
series. The Durbin-Watson Statistic is 1.46. Var[ α^ ] = .17. Var[ β^ ] = .009. Var[ γ^ ] = .014.
The null hypothesis H0 is that there is no serial correlation, and the alternative hypothesis H1
is that there is positive serial correlation.
Using Durbin’s h-test, what conclusion do you draw?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005; reject H0 at 0.010.
C. Do not reject H0 at 0.010; reject H0 at 0.025.
D. Do not reject H0 at 0.025; reject H0 at 0.050.
E. Do not reject H0 at 0.050.
30.5 (1 point) According to Pindyck and Rubinfeld in Econometric Models And Economic
Forecasts, which of the following statements about the Durbin-Watson Statistic is false?
A. A small value of the Durbin-Watson Statistic is associated with positive serial correlation.
B. The Durbin-Watson Statistic should be used to test the regression model
Yt = α + βYt-1 + εt, for serial correlation.
C. A value of the Durbin-Watson Statistic near 2 is associated with no serial correlation.
D. The Durbin-Watson Statistic is less than 4.
E. The Durbin-Watson Statistic is greater than 0.

30.6 (2 points) One has fit the model Yt = α + βYt-1 + γXt + εt to 250 observations of a time
series. The Durbin-Watson Statistic is 1.78. Var[ α^ ] = .0814. Var[ β^ ] = .0023. Var[ γ^ ] = .0009.
H0 is that there is no serial correlation and H1 is that there is positive serial correlation.
Using Durbin’s h-test, what conclusion do you draw?
A. Do not reject H0 at 5%.
B. Do not reject H0 at 2.5%. Reject H0 at 5%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 0.5%. Reject H0 at 1%.
E. Reject H0 at 0.5%.

30.7 (1 point) One has fit the model Yi = α + βYi-1 + γXi + εi to 45 observations.

Then using the residuals of this fit, one fits ^εi = a + ρ ^εi−1 + bYi-1 + gXi + errori.
a^ = 172, Var[ a^ ] = 3207. ρ^ = .35, Var[ ρ^ ] = .031. b^ = .88, Var[ b^ ] = .30. g^ = -2.3, Var[ g^ ] = .82.
H0 is that there is no serial correlation and H1 is that there is serial correlation.
What conclusion do you draw?
A. Do not reject H0 at 10%.
B. Do not reject H0 at 5%. Reject H0 at 10%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 1%. Reject H0 at 2%.
E. Reject H0 at 1%.

For the next 3 questions, use the following 28 values for a time series at regular intervals:
2545, 2469, 2392, 2193, 1901, 1718, 1645, 1546, 1433, 1289, 1136, 1041, 1000, 984, 964,
955, 969, 949, 926, 880, 839, 822, 812, 802, 797, 782, 762, 759.
These 28 values sum to 35,310.

30.8 (4 points) Fit the regression model, Y = α + βt + ε.

30.9 (4 points) Compute the Durbin-Watson Statistic.

30.10 (1 point) Determine the approximate value of the sample autocorrelation coefficient
measuring the association between consecutive residuals.
30.11 (2 points) A linear regression has been fit to a time series with 30 observations.
ε^1 = -7. ε^30 = 11.

Σt=1,...,30 ε^t 2 = 2422. Σt=2,...,30 ε^t ε^ t −1 = 801.
Compute the Durbin-Watson Statistic.
(A) Less than 1.3
(B) At least 1.3, but less than 1.4
(C) At least 1.4, but less than 1.5
(D) At least 1.5, but less than 1.6
(E) At least 1.6

30.12 (7 points) You are given the following information by quarter on gas prices (inflation
adjusted dollars per gallon) and automobile insurance claims frequencies.
Gas Price: 1.96 2.21 2.13 2.30 2.57 2.42 2.46 2.68 2.69 2.72 2.75
Frequency: .0423 .0383 .0385 .0378 .0364 .0339 .0347 .0402 .0369 .0364 .0385

Gas Price: 2.90 2.94 3.01 2.98 2.86 2.91 3.04 3.16 3.29 3.38
Frequency: .0338 .0362 .0372 .0376 .0367 .0342 .0344 .0351 .0355 .0322

Gas Price: 3.40 3.45 3.52 3.51 3.67 3.62 3.68 3.74 3.88 3.92
Frequency: .0329 .0327 .0302 .0291 .0257 .0282 .0315 .0308 .0339 .0345
Using a regression program on a computer, fit the following model: Yi = α + βYi-1 + γXi + εi,
where X is gas prices and Y is claim frequency.
Use Durbin’s h test in order to test the null hypothesis H0 is that there is no serial correlation
versus the alternative hypothesis H1 is that there is positive serial correlation.

30.13 (5 points) Average premiums for homeowners insurance follow the model
Y = 400 + 20t + ε.
Due to an underwriting cycle, ε1 = 0, ε2 = 10, ε3 = 20, ε4 = 10, ε5 = 0, ε6 = -10, ε7 = -20,
ε8 = -10, ε9 = 0, ε10 = 10, ε11 = 20, ε12 = 10, ε13 = 0, ε14 = -10, ε15 = -20, ε16 = -10, ε17 = 0.
With the aid of a computer, for t = 1, 2, ..., 17, determine Yt, and fit a linear regression of Y as a
function of t. Determine the values of: R2, the adjusted R2, the t-statistics, the F-Statistic, and the
Durbin-Watson Statistic.
30.14 (Course 120 Sample Exam #1, Q.8) (2 points) You fit a simple linear regression
model to the following eight observations obtained on consecutive days:
Day Y X
1 11 2
2 20 2
3 30 3
4 39 3
5 51 4
6 59 4
7 70 5
8 80 5
Using the method of least squares, you determine the estimated regression line
Y^ = -25 + 20X.
Determine the value of the Durbin-Watson statistic.
(A) 3.5 (B) 3.6 (C) 3.7 (D) 3.8 (E) 3.9

30.15 (Course 120 Sample Exam #3, Q.10) ( 2 points) You have performed a simple
regression analysis and determined that the value of the Durbin-Watson test statistic is 0.8.
Determine the approximate value of the sample autocorrelation coefficient measuring the
association between consecutive residuals.
(A) 0.2 (B) 0.3 (C) 0.4 (D) 0.6 (E) 0.8

30.16 (Course 4 Sample Exam, Q.30) (2.5 points) You wish to determine the
relationship between sales (Y) and the number of radio advertisements broadcast (X). Data
collected on four consecutive days is shown below.
Day Sales Number of Radio Advertisements
1 10 2
2 20 2
3 30 3
4 40 3
Using the method of least squares, you determine the estimated regression line:
Y^ = -25 + 20X
Determine the value of the Durbin-Watson statistic for this model.
30.17 (4, 5/00, Q.24) (2.5 points) You are given the following linear regression results:
t Actual Fitted
1 77.0 77.6
2 69.9 70.6
3 73.2 70.9
4 72.7 72.7
5 66.1 67.1
Estimate the lag 1 serial correlation coefficient for the residuals, using the Durbin-Watson
statistic.
(A) Less than -0.2
(B) At least -0.2, but less than -0.1
(C) At least -0.1, but less than 0.0
(D) At least 0.0, but less than 0.1
(E) At least 0.1

30.18 (VEE-Applied Statistics Exam, 8/05, Q.14) (2.5 points) You are given:
(i) The model is St = α + βSt-1 + γAt + εt
where S denotes monthly sales and A denotes monthly advertising expenses.
(ii) A regression fit with T = 36 yields (standard errors in parentheses):
S^t = 7.5 + 0.5St-1 + 0.2At
(0.5) (0.1) (0.03)
s = 4.3, R2 = 0.94, DW = 1.2
Determine the outcome of Durbin's h test for the presence of serial correlation in the errors.
(A) The test does not reject the null hypothesis of no serial correlation at the 10%
significance level.
(B) The test rejects the null hypothesis of no serial correlation at the 10% significance level,
but not at the 5% significance level.
(C) The test rejects the null hypothesis of no serial correlation at the 5% significance level,
but not at the 2.5% significance level.
(D) The test rejects the null hypothesis of no serial correlation at the 2.5% significance level,
but not at the 1 % significance level.
(E) The test rejects the null hypothesis of no serial correlation at the 1% significance level.

Section 31, Correcting for Serial Correlation:


We will discuss two methods of correcting for serial correlation when it exists:
the Hildreth-Lu Procedure and the Cochrane-Orcutt Procedure.

Hildreth-Lu Procedure:

One method of correcting for serial correlation is the Hildreth-Lu procedure.

Recall that when the following time series, Y1, Y2, ... , Y20: 547, 628, 778, 759, 823, 772, 904,
971, 974, 916, 1043, 932, 959, 1077, 998, 1003, 1048, 1371, 1414, 1568, was fit by linear
regression, there was evidence of positive first order serial correlation. The Durbin-Watson
Statistic was .855. Since DW ≅ 2(1 - ρ), ρ ≅ 1 - DW/2 = 1 - .855/2 = .57.

Thus we assume ρ is approximately .57. We can try various values for ρ, such as: .50, .55, .60,
.65, and see which one “works best.”

Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1. As discussed below, we hope this transformation will
remove much of the serial correlation present in the original regression.

For example, for ρ = .55, X*2 = X2 - .55X1 = 2 - (.55)(1) = 1.45.


X∗ = 1.45, 1.90, 2.35, 2.80, 3.25, 3.70, 4.15, 4.60, 5.05, 5.50, 5.95, 6.40, 6.85, 7.30, 7.75, 8.20,
8.65, 9.10, 9.55.
Y*2 = Y2 - .55Y1 = 628 - (.55)(547) = 327.15.251
Y∗ = 327.15, 432.6, 331.1, 405.55, 319.35, 479.4, 473.8, 439.95, 380.3, 539.2, 358.35, 446.4,
549.55, 405.65, 454.1, 496.35, 794.6, 659.95, 790.3.

The transformed regression equation is: Y∗ = α(1 - ρ) + βX∗ = .45α + βX∗, where α and β are
the intercept and slope of the equation that relates to the original variables, Y = α + βX.

Exercise: Fit a linear regression to X∗ and Y∗. Determine ESS.


[Solution: .45 α^ = 255.655, and β^ = 40.4421. ε^t = 12.8537, 100.105, -19.5942, 36.6568,
-67.7421, 74.1089, 50.31, -1.73895, -79.5879, 61.1132, -137.936, -68.0847, 16.8663,
-145.233, -114.982, -90.9305, 189.121, 36.2716, 148.423. ESS = Σ ε^t 2 = 160,238.]

Thus α^ = 255.655 / .45 = 568.12 and β^ = 40.44.
In terms of the original variables: Y^t = 568.12 + 40.44t.

Proceeding in the same way for various values of ρ, the regressions of X∗ and Y∗ are:
251
There is no Y1* . After the transformation, there are only 19 rather than 20 values.

ρ intercept slope ESS


.50 286.479 39.7930 161,681
.55 255.655 40.4421 160,238
.60 224.669 41.2535 159,588
.65 193.452 42.2967 159,732

The smallest Error Sum of Squares corresponds to ρ = .60.252 Thus we could use the results
of this regression: intercept = (1 - .60) α^ = 224.669 and β^ = 41.2535. α^ = 224.669/.4 = 561.67.
In terms of the original variables: Y^t = 561.67 + 41.25t.

Note this differs somewhat from the linear regression fit to the original variables:
Y^t = 575 + 38.0t.

Exercise: Use both the original regression and the result of the Hildreth-Lu procedure in
order to forecast Y22.
[Solution: Using the original regression, Y22 = 575 + (38.0)(22) = 1411.
Using the Hildreth-Lu procedure, Y22 = 561.67 + (41.25)(22) = 1469.]

The Reason for the Transformed Variables and the Transformed Regression Equation:*

Assume Yt = α + βXt + εt, with Corr[εt-1, εt] = ρ, Corr[εt-2, εt] = ρ2, and Var[εt] = σ2.253
Then Yt - ρYt-1 = α + βXt + εt - ρ(α + βXt-1 + εt-1) = α(1 - ρ) + βX∗ + ε*t , where
X*t = Xt - ρXt-1, and ε*t = εt - ρεt-1.

Var[ ε*t ] = Var[εt - ρεt-1] = Var[εt] + ρ2Var[εt-1] - 2ρCov[εt, εt-1] = σ2 + ρ2σ2 - 2ρ2σ2 = (1 - ρ2)σ2.
Cov[ ε*t−1,ε*t ] = Cov[εt-1 - ρεt-2, εt - ρεt-1] =
Cov[εt-1, εt] + ρ2Cov[εt-2, εt-1] - ρCov[εt-1, εt-1] - ρCov[εt, εt-2] = ρσ2 + ρ2ρσ2 - ρσ2 - ρρ2σ2 = 0.
Thus ε*t have constant variance and no first order serial correlation.

If Y*t = Yt - ρYt-1 and X*t = Xt - ρXt-1, then the transformed equation is: Y∗ = α(1 - ρ) + βX∗.

Summary of the Hildreth-Lu procedure:254

0. Choose a grid of likely values for -1 ≤ ρ ≤ 1. For each such value of ρ:


1. Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1.
2. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
252
We might then create a new finer grid of values of ρ centered on .6, and try to refine our estimate of ρ.
253
This is the covariance structure we expect with homoscedasticity and first order serial correlation.
254
Shown for the case with a constant plus one independent variable. The procedure is the same for more than
one independent variable, except one must transform each independent variable in the same manner.
3. The best regression has the smallest Error Sum of Squares (ESS).
If desired, then refine the grid of values for ρ, and again perform steps 1, 2, and 3.
Translate the transformed equation back to the original variables: Yt = α + βXt.
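Here is a sketch of the Hildreth-Lu grid search in Python, using numpy for the regressions; applied to the time series above with the grid .50, .55, .60, .65 it should reproduce the table of ESS values given earlier (the function name is illustrative):

import numpy as np

def hildreth_lu(x, y, rhos):
    # For each rho, transform the variables, fit Y* = alpha*(1 - rho) + beta*X* by OLS,
    # and keep the rho with the smallest Error Sum of Squares.
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    for rho in rhos:
        xs, ys = x[1:] - rho * x[:-1], y[1:] - rho * y[:-1]
        X = np.column_stack([np.ones(len(xs)), xs])
        coef, _, _, _ = np.linalg.lstsq(X, ys, rcond=None)
        ess = np.sum((ys - X @ coef) ** 2)
        if best is None or ess < best[3]:
            best = (rho, coef[0] / (1 - rho), coef[1], ess)   # translate the intercept back
    return best   # (rho, alpha, beta, ESS)

y = [547, 628, 778, 759, 823, 772, 904, 971, 974, 916, 1043, 932, 959, 1077,
     998, 1003, 1048, 1371, 1414, 1568]
print(hildreth_lu(np.arange(1, 21), y, [0.50, 0.55, 0.60, 0.65]))
# best rho = .60, alpha about 561.7, beta about 41.25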

Cochrane-Orcutt Procedure:

Another method of correcting for serial correlation is the Cochrane-Orcutt procedure.

When the following time series, Y1, Y2, ... , Y20: 547, 628, 778, 759, 823, 772, 904, 971, 974,
916, 1043, 932, 959, 1077, 998, 1003, 1048, 1371, 1414, 1568, was fit by linear regression,
the residuals, ε^t were: -66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110, -30,
-147, -180, -173, 112, 117, 233. There was evidence of positive first order serial correlation.

Exercise: Fit a linear regression ε^t = ρ ε^ t −1 + error.

[Solution: Since this is a model with no intercept, ρ^ = Σ ε^ t −1 ε^t / Σ ε^ t −12 =


{(-66)(-23) + ... + (117)(233)}/{(-66)2 + ... + 1172} = 99530/170830 = .58.]

Thus our first estimate of the lag 1 serial correlation coefficient is .58.
Let X*t = Xt - .58 Xt-1, and Y*t = Yt - .58 Yt-1. As discussed previously, we hope this
transformation, the same one used in the Hildreth-Lu procedure, will remove much of the
serial correlation present in the original regression.

X*2 = X2 - .58X1 = 2 - (.58)(1) = 1.42.


X∗ = 1.42, 1.84, 2.26, 2.68, 3.10, 3.52, 3.94, 4.36, 4.78, 5.20, 5.62, 6.04, 6.46, 6.88, 7.30, 7.72,
8.14, 8.56, 8.98.
Y*2 = Y2 - .58 Y1 = 628 - (.58)(547) = 310.74.255
Y∗ = 310.74, 413.76, 307.76, 382.78, 294.66, 456.24, 446.68, 410.82, 351.08, 511.72,
327.06, 418.44, 520.78, 373.34, 424.16, 466.26, 763.16, 618.82, 747.88.

The transformed regression equation is: Y∗ = α(1 - ρ) + βX∗ = .42α + βX∗, where α and β are
the intercept and slope of the equation that relates to the original variables.

Exercise: Fit a linear regression to X∗ and Y∗.


[Solution: intercept = .42 α^ = 237.1, and β^ = 40.9.]

Thus α^ = 237.1 / .42 = 564.5 and β^ = 40.9.
In terms of the original variables: Y^t = 564.5 + 40.9t.

255
There is no Y1* . After the transformation, there are only 19 rather than 20 values.

Exercise: For this revised equation compute Y^t and ε^t .
[Solution: Y^t = 605.4, 646.3, 687.2, 728.1, 769.0, 809.9, 850.8, 891.7, 932.6, 973.5, 1014.4,
1055.3, 1096.2, 1137.1, 1178., 1218.9, 1259.8, 1300.7, 1341.6, 1382.5.
ε^t = -58.4, -18.3, 90.8, 30.9, 54.0, -37.9, 53.2, 79.3, 41.4, -57.5, 28.6, -123.3, -137.2, -60.1,
-180.0, -215.9, -211.8, 70.3, 72.4, 185.5.]

One could perform another iteration of this whole procedure.

Exercise: Fit a linear regression ε^t = ρ ε^ t −1 + error.

[Solution: ρ^ = Σ ε^ t −1 ε^t / Σ ε^ t −12 = 123969/203949 = .61.]

Thus our second estimate of the lag 1 serial correlation coefficient is .61, compared to our first
estimate of .58.

Let X*t = Xt - .61 Xt-1, and Y*t = Yt - .61 Yt-1.

X∗ = 1.39, 1.78, 2.17, 2.56, 2.95, 3.34, 3.73, 4.12, 4.51, 4.9, 5.29, 5.68, 6.07, 6.46, 6.85, 7.24,
7.63, 8.02, 8.41.
Y∗ = 294.33, 394.92, 284.42, 360.01, 269.97, 433.08, 419.56, 381.69, 321.86, 484.24,
295.77, 390.48, 492.01, 341.03, 394.22, 436.17, 731.72, 577.69, 705.46.

The transformed regression equation is: Y∗ = α(1 - ρ) + βX∗ = (1 - .61)α + βX∗, where α and β
are the intercept and slope of the equation that relates to original variables.

Exercise: Fit a linear regression to X∗ and Y∗.


[Solution: intercept = .39 α^ = 218.45 and β^ = 41.44.]

Thus α^ = 218.45 / .39 = 560.13 and β^ = 41.44.
In terms of the original variables: Y^t = 560.13 + 41.44t.

Exercise: For this revised equation compute Y^t and ε^t .
[Solution: Y^t = 601.57, 643.01, 684.45, 725.89, 767.33, 808.77, 850.21, 891.65, 933.09,
974.53, 1015.97, 1057.41, 1098.85, 1140.29, 1181.73, 1223.17, 1264.61, 1306.05, 1347.49,
1388.93.
ε^t = -54.57, -15.01, 93.55, 33.11, 55.67, -36.77, 53.79, 79.35, 40.91, -58.53,
27.03, -125.41, -139.85, -63.29, -183.73, -220.17, -216.61, 64.95, 66.51, 179.07.]
One could perform yet another iteration of this whole procedure.

Fitting a linear regression ε^t = ρ ε^ t −1 + error:

ρ^ = Σ ε^ t −1 ε^t / Σ ε^ t −12 = 127058/209607 = .61.

Since ρ seems to have converged, we can exit the procedure.256


The resulting model is: Y^t = 560.13 + 41.44t.

Summary of the Cochrane-Orcutt procedure:257

1. Fit a linear regression and get the resulting residuals ε^t .

2. Estimate the serial correlation coefficient: ρ = Σ ε^ t −1 ε^t / Σ ε^ t −12.


3. Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1.
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the

resulting residuals ε^t .

6. Estimate the serial correlation coefficient: ρ = Σ ε^ t −1 ε^t / Σ ε^ t −12.258


7. Unless the value of ρ seems to have converged or enough iterations have been
performed, return to step 3.259
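Here is a sketch of the iteration in Python with numpy (the tolerance and iteration limit follow the suggestion in the footnote below; the function name is illustrative):

import numpy as np

def cochrane_orcutt(x, y, tol=0.005, max_iter=20):
    # Step 1: fit OLS; steps 2-7: estimate rho from the residuals, transform, refit,
    # translate back to the original variables, and repeat until rho converges.
    x, y = np.asarray(x, float), np.asarray(y, float)

    def ols(xv, yv):
        X = np.column_stack([np.ones(len(xv)), xv])
        coef, _, _, _ = np.linalg.lstsq(X, yv, rcond=None)
        return coef[0], coef[1]

    a, b = ols(x, y)
    rho_prev = 0.0
    for _ in range(max_iter):
        e = y - (a + b * x)
        rho = np.sum(e[:-1] * e[1:]) / np.sum(e[:-1] ** 2)   # no-intercept fit of e_t on e_{t-1}
        xs, ys = x[1:] - rho * x[:-1], y[1:] - rho * y[:-1]
        a_star, b = ols(xs, ys)
        a = a_star / (1 - rho)                               # back to the original variables
        if abs(rho - rho_prev) < tol:
            break
        rho_prev = rho
    return a, b, rho

Applied to the time series above (with x = 1, ..., 20), this should converge to roughly alpha = 560, beta = 41.4, and rho = .61, as in the worked example.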

256
One could instead take ρ to a little more accuracy if desired, such as ρ = .606.
257
Shown for the case with a constant plus one independent variable. The procedure is the same for more than
one independent variable, except one must transform each independent variable in the same manner.

258
Note that since we are fitting a linear regression between ε^ t −1 and ε^t , the denominator only includes the sum
of squares of ε^ t −1, with the square of the final error missing. This is in contrast to equations 16.23 and 18.15 in
Pindyck and Rubinfeld, related to time series, which involve all of the squared differences.
Note that the denominator of the Durbin-Watson Statistic includes the sum of the squares of all the residuals.
259
Pindyck and Rubinfeld suggest one exit when the change in ρ is less than either .01 or .005, or when one has
performed 10 or 20 iterations.
Problems:

31.1 (2 points) You are given the following linear regression results:


t Actual Fitted
1 68.9 69.9
2 73.2 72.8
3 76.8 75.6
4 79.0 78.5
5 80.3 81.4
Determine the estimated lag 1 serial correlation coefficient after one iteration of the
Cochrane-Orcutt procedure.
(A) -.10 (B) -0.05 (C) 0 (D) .05 (E) .10

31.2 (1 point) For a time series with 100 observations, you wish to estimate the serial
correlation for the model Yt = α + βt + εt.
Y = (21.268, 15.557, 4.8898, 10.878, 14.122, 33.743, 10.395, 5.158, 17.910, 38.840, ...).
Using the Hildreth-Lu procedure, for ρ = .3, what is Y6* ?
(A) 20 (B) 25 (C) 30 (D) 35 (E) 40

31.3 (1 point) For a time series with 100 observations, you have transformed the variables
and fit a linear regression as in the Hildreth-Lu procedure.
ρ ESS
.3 9808.5
.4 9712.4
.5 9844.4
.6 10204.4
.7 10792.5
Determine the estimated lag 1 serial correlation coefficient.
(A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7
Use the following information for the next 4 questions:
One has 3 years of monthly values of the Consumer Price Index for Medical Care,
t = 1, 2, 3, ..., 36:
216.6, 217.9, 218.4, 218.9, 219.3, 219.8, 220.8, 221.6, 222.1, 222.9, 223.5, 223.8,
225.2, 226.2, 226.6, 227.0, 227.4, 227.8, 228.7, 229.2, 229.4, 230.1, 230.5, 230.6,
231.8, 232.7, 233.4, 233.8, 234.2, 234.4, 234.8, 235.2, 235.4, 235.8, 236.4, 237.1.
Use a computer to help you with the calculations.

31.4 (4 points) Fit an exponential regression to this data, ln(CPI) = α + βt.

31.5 (3 points) What is the Durbin-Watson Statistic for this regression?

31.6 (8 points) Apply the Cochrane-Orcutt procedure, in order to correct for serial correlation.

31.7 (8 points) Apply the Hildreth-Lu procedure, in order to correct for serial correlation.

31.8 (4, 11/00, Q.12) (2.5 points) You are given the following linear regression results:
t Actual Fitted
1 77.0 77.6
2 69.9 70.6
3 73.2 70.9
4 72.7 72.7
5 66.1 67.1
Determine the estimated lag 1 serial correlation coefficient after one iteration of the
Cochrane-Orcutt procedure.
(A) –0.3 (B) –0.2 (C) –0.1 (D) 0.0 (E) 0.1

31.9 (4, 5/01, Q.33) (2.5 points) An actuary uses annual premium income from the
previous year as the independent variable and loss ratio in the current year as the dependent
variable in a two-variable linear regression model. Using 20 years of data, the actuary
estimates the model slope coefficient with the ordinary least-squares estimator β and does
not take into account that the error terms in the model follow an AR(1) model with first-order
autocorrelation coefficient ρ > 0.
Which of the following statements is false?
(A) The estimator β is biased.
(B) The estimator β is consistent.
(C) The R2 probably gives an overly optimistic picture of the success of the regression.
(D) The estimator of the standard error of β is biased downward.
(E) Use of the Cochrane-Orcutt procedure would have produced a consistent estimator of
the model slope with variance probably smaller than the variance of β.

Section 32, Multicollinearity260


In the multiple regression model, we assumed no linear relationship among the independent
variables. Specifically we assumed the k by k cross product matrix X’X had rank k, so
that its matrix inverse exists.

If X’X has rank less than k, then we have what is called perfect multicollinearity. In the
simplest case of perfect multicollinearity, two independent variables are collinear, i.e.
proportional. In the case of multicollinearity, the equations to estimate the parameters will not
work; in this case the inverse of X’X does not exist. Less extreme, are situations where even
though the inverse of X’X exists, independent variables are highly correlated.

There is a high degree of multicollinearity when some of the independent


variables or combinations of independent variables are highly correlated.261
This creates problems.

A high degree of multicollinearity generally results in large variances of the estimated


parameters.262 It also generally results in high covariances between estimated parameters.
A high degree of multicollinearity, usually leads to unreliable estimates of the
regression parameters.

Signs of Possible Multicollinearity:263


1. If several parameters have high standard errors and dropping one or more variables from
the equation lowers the remaining standard errors.
2. High R2, (and significant F-Statistic,) with few significant t-statistics.
3. High simple correlations between pairs of variables may indicate multicollinearity.
4. A high “condition number.”264
5. Sign of fitted parameter opposite of what is expected for one or more parameters.
6. When one of the independent variables is excluded from the model, one still has R2 ≥ .95.

Adding an additional independent variable to a regression model does not improve its
performance if this variable is highly correlated with either one or more of the current
variables or a linear combination of the current variables.

As will be discussed subsequently, multiple regression is often used in order to estimate the
elasticity of the dependent variable with respect to each independent variable. If there is a
high degree of multicollinearity, then such estimates of elasticities are unreliable or even
meaningless.
260
See Section 4.4 of Pindyck and Rubinfeld.
261
This is often described as X’X being an ill-conditioned matrix; one can also say the data is
ill-conditioned. In this case, the determinant of X’X will be very small.
262
As discussed previously, in the three variable model, the variance of the fitted slopes increases when the two
independent variables are highly correlated.
263
See Section 4.4.3 of Pindyck and Rubinfeld.
See also Section 16.4 of Applied Regression Analysis by Draper and Smith.
264
Mentioned in Pindyck and Rubinfeld.
For an explanation, see for example Section 16.5 of Applied Regression Analysis by Draper and Smith.
For example, let us assume the dependent variable is the miles per gallon for an automobile.
Vehicle weight and horsepower of the engine, might each be useful independent variables
for a regression model. However, if these two variables are highly correlated, one should not
use them both in the model. Perhaps instead, weight and the ratio of weight to horsepower
could be used together in the model.

Variance Inflation Factors:*

If we regress the ith independent variable against all of the other independent variables,
then the Variance Inflation Factor is: VIFi = 1/(1 - Ri2) ≥ 1.265 If one or more of the VIFs is large,
that is an indication of multicollinearity.
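Here is a sketch of the VIF calculation in Python with numpy, where X is assumed to hold the independent variables only, one column per variable, without the constant term:

import numpy as np

def variance_inflation_factors(X):
    # VIF_i = 1/(1 - R_i^2), where R_i^2 comes from regressing the ith column of X
    # (with a constant) on the remaining columns.
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        yi = X[:, i]
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef, _, _, _ = np.linalg.lstsq(others, yi, rcond=None)
        r2 = 1 - np.sum((yi - others @ coef) ** 2) / np.sum((yi - yi.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    return vifs

Large values (a common rule of thumb is a VIF above roughly 5 or 10) suggest multicollinearity.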

Average Claim Costs:*

Insurance average claims costs are often modeled via consumer price indices (CPI).
However, different CPIs usually have a high correlation; when one CPI increases by a lot the
other CPI is more likely to increase by a lot. Therefore, if we used two or more CPIs in a
regression model, we are likely to run into the problem of multicollinearity.

For example, for homeowners insurance, average claim costs are related to both the overall
rate of inflation and the cost of construction (in order to repair or replace damaged homes.)
Rather than use two CPIs separately, one might instead use a 45%-55% weighting of the
Consumer Price Index for all items and a Construction Cost Index.266 The selected weights
would approximately reflect the mix of the costs of items for which the claims are paying.

Classifications for Ratemaking:*

In insurance ratemaking, insureds are divided into classifications.267 Insureds classified


similarly will be charged the same or similar rates.268

For example, in automobile insurance one might use the location, characteristics of the
driver, and characteristics of the vehicle, in order to classify an insured. Since these three
types of variables are not highly correlated, we get a more accurate rate, than if correlated
variables were used.

265
See Section 16.4 of Applied Regression Analysis by Draper and Smith.
The VIFs are the diagonal elements of the inverse of the correlation matrix.
266
See “Homeowners Insurance Pricing” by Mark Homan in the 1990 CAS Discussion Paper Program.
267
See for example “Risk Classification” by Robert Finger, in Foundations of Casualty Actuarial Science.
268
While linear regression is usually not used to determine classification rates, the problem of multicollinearity still
applies.
Problems:

32.1 (3 points) You are given the following observations of 10 young boys:
Age Height Weight Chest Depth Maximal Oxygen Uptake
(years) (cms.) (kgs.) (centimeters) (milliliters per kilogram)
X2 X3 X4 X5 Y
8.4 132.0 29.1 14.4 1.54
8.7 135.5 29.7 14.5 1.74
8.9 127.7 28.4 14.0 1.32
9.9 131.1 28.8 14.2 1.50
9.0 130.0 25.9 13.6 1.46
7.7 127.6 27.6 13.9 1.35
7.3 129.9 29.0 14.0 1.53
9.9 138.1 33.6 14.6 1.71
9.3 126.6 27.7 13.9 1.27
8.1 131.8 30.8 14.5 1.50
The correlation matrix between the independent variables is:
( 1 .327 .231 .166)
(.327 1 .790 .791)
(.231 .790 1 .881)
(.166 .791 .881 1)

The model Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + ε, was fit:


Parameter Estimate Standard Error T-Statistic P-Value
β1 -4.775 .8628 -5.534 0.3%
β2 -0.03521 .01539 -2.289 7.1%
β3 0.05164 .006215 8.308 0.04%
β4 -0.02342 .01343 -1.744 14.2%
β5 0.03449 .08524 0.4046 70.3%
R2 = .967. Adjusted R2 = .941. s2 = .001385. Durbin-Watson = 2.753.
D.F. Sum of Squares Mean Square F-Ratio P-Value
RSS 4 0.20604 0.0515 37 0.07%
ESS 5 0.00692 0.00138
TSS 9 0.21296
Var[ β^ ] =
(0.744     -0.00122   -0.00144    0.00910   -0.0572  )
(-0.00122   0.000237  -0.0000274 -0.0000178  0.000231)
(-0.00144  -0.0000274  0.0000386 -0.0000235 -0.000191)
(0.00910   -0.0000178 -0.0000235  0.000180  -0.000784)
(-0.0572    0.000231  -0.000191  -0.000784   0.00727 )

Are there signs of possible multicollinearity? If so what are they?


What additional steps, if any, might you take to test for multicollinearity?
32.2 (1 point) The Hoffa Insurance Company is studying the loss ratios of long haul trucking
firms for which it is writing Workers Compensation Insurance in the United States.
One fits a multiple linear regression model with:
Y = loss ratio for the most recent year,
X2 = written premium,
X3 = number of trucks,
X4 = log of number of years insured with Hoffa Insurance, and
dummy variables D1 to D49 indicating whether or not there is exposure in each state,
(excluding Hawaii and Alaska, but including the District of Columbia.)
List one specific reason why multicollinearity could be expected to cause a problem.

32.3 (3 points) Four different multiple regressions have each been fit to the same 20
observations. Each regression has the same dependent variable, but uses different
combinations of the independent variables X2, X3, X4, and X5.
In which case is there an indication of multicollinearity?
A. RSS = 773. ESS = 227. β^2 = 44. β^3 = 1.3. β^4 = -2.4. sβ^2 = 16. sβ^3 = 0.7. sβ^4 = 0.5.

B. RSS = 798. ESS = 202. β^2 = 40. β^3 = 1.5. β^5 = 0.35. sβ^2 = 18. sβ^3 = 0.6. sβ^5 = 0.16.

C. RSS = 769. ESS = 231. β^2 = 37. β^4 = -1.9. β^5 = 0.41. sβ^2 = 25. sβ^4 = 1.4. sβ^5 = 0.23.

D. RSS = 785. ESS = 215. β^3 = 1.7. β^4 = -2.8. β^5 = 0.45. sβ^3 = 0.7. sβ^4 = 1.1. sβ^5 = 0.24.

E. None of A, B, C, or D has an indication of multicollinearity.


Mahler’s Guide to
Regression
Sections 33-35:
33 Forecasting
34 Testing Forecasts
35 Forecasting with Serial Correlation

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-I


Section 33, Forecasting

One can use regression to forecast the value of the dependent variable for a value of the
independent variable, or in the case of a multiple regression a set of values of the
independent variables. Assuming we have selected the right model, there are two different
reasons why the forecasted value will not equal the observed value. First, the coefficients of
the model have been estimated from a finite sample of data. Second, the model assumes
there is a random element to the observation.

A Time Series Example:

Linear regression models are often fit to time series.

For example assume you have the following 5 years of loss ratios:269

Year (t) Loss Ratio (Y)


1 84
2 80
3 81
4 76
5 74

Exercise: Fit the two-variable linear regression model: Y = α + βt + ε, to the above data.
[Solution: x = -2, -1, 0, 1, 2. Y = 79. y = 5, 1, 2, -3, -5.
β^ = Σxiyi/Σxi2 = -24/10 = -2.4. α^ = Y - β^ X = 79 - (-2.4)(3) = 86.2.]

Exercise: What is the standard error of this regression?


[Solution: Y^t = 83.8, 81.4, 79, 76.6, 74.2. ε^t = Yt - Y^t = .2, -1.4, 2, -0.6, -0.2.

s2 = Σ ε^t 2 / (N - 2) = 6.4/3 = 2.133. s = 1.461.]

Exercise: What are the standard errors of the slope and intercept?

[Solution: Var[ α^ ] = s2ΣXi2 /(NΣxi2) = (2.133)(55)/((5)(10)) = 2.346. sα^ = √2.346 = 1.532.


Var[ β^ ] = s2 /Σxi2 = 2.133/10 = .2133. sβ^ = √.2133 = .462.]

Exercise: What is the covariance of the estimated slope and intercept?


[Solution: Cov[ α^ , β^ ] = -s2 X /Σxi2 = -(2.133)(3)/10 = -.640.]

Corr[ α^ , β^ ] = Cov[ α^ , β^ ]/√(Var[ α^ ]Var[ β^ ]) = -.640/√((2.346)(.2133)) = -.905.
269
Loss Ratio = Losses/Premium. The 80 shown here corresponds to a loss ratio of 80%.
These loss ratios have presumably been brought to ultimate and adjusted for any rate changes.
Predicting the Future:

Sometimes linear regression models fit to time series are used in order to forecast the
future.270

In the above example, the forecasted loss ratio for year 6 is: α^ + β^ 6 = 86.2 + (-2.4)(6) = 71.8.
This is a point estimate as opposed to an interval estimate, which can be obtained as follows.

Using the general properties of variances, the variance of this forecast is:271

Var[ α^ + β^ 6] = Var[ α^ ] + 62Var[ β^ ] + (2)(6)Cov[ α^ , β^ ] = 2.346 + (36)(.2133) + (12)(-.640) = 2.345.

Thus the standard error of the forecasted loss ratio for year 6 is: √2.345 = 1.531.

With 5 - 2 = 3 degrees of freedom, a 95% confidence interval from the t-table is:
±3.182 standard deviations.272
Thus an approximate 95% confidence interval for this forecast is:
71.8 ± (3.182)(1.531) = 71.8 ± 4.9.

Exercise: What would the forecasted loss ratio be for year 8?


What is the standard error of this forecast?
[Solution: estimate = 86.2 + (-2.4)(8) = 67.0.
Var[ α^ + β^ 8] = 2.346 + (82)(.2133) + (8)(2)(-.640) = 5.757. Standard error = √5.757 = 2.399.]

Thus an approximate 95% confidence interval for the forecasted loss ratio for year 8 is:
67.0 ± (3.182)(2.399) = 67.0 ± 7.6. This confidence interval is wider. In general as one tries
to forecast further out into the future, the confidence intervals are wider.

One can derive a formula for the variance of such a forecast. Assume one is forecasting at
the value X of the independent variable, then the estimate is α^ + β^ X, with variance:
Var[ α^ ] + X2Var[ β^ ] + 2XCov[ α^ , β^ ] = s2ΣXi2 /(NΣxi2) + X2 s2 /Σxi2 - 2Xs2 X /Σxi2
= s2{(Σxi2 + N X 2)/(NΣxi2) + (X2 - 2X X ) /Σxi2} = s2{1/N + (X2 - 2X X + X 2) /Σxi2} =
s2{1/N + x2/Σxi2}.

When we have observed N data points, with independent variable Xi, with mean X :
Variance of forecast at x = X - X is: s2 {1/N + x2 / Σ x i2 }. 273

270
Generally linear regression (or exponential regression) could be used to forecast for a short period of time
beyond the observed data. It would not be used to forecast the distant future.
271
Var[aX + bY] = a2Var[X] + b2Var[Y] + 2abCov[X,Y]. This is also a special case of the delta method.
272
A sum of 5% area in the two tails, with 3 d.f. For the 2-variable model, the degrees of freedom are N-2.
273
See for example, section 3.1 of Applied Regression Analysis, by Draper and Smith.
As X gets further from the mean of the values of the independent variable contained in the
observations, X , x = X - X gets larger, and the variance of the forecast increases.
Near X , the forecast is not affected very much by the value of β^ , while far from X , the forecast
is affected a great deal by the value of β^ . Therefore, when far from X , the uncertainty in the
estimate of the slope, leads to the large variance of the forecast.

Exercise: Use this alternate formula in order to estimate the variance of the forecasted loss
ratio for year 8.
[Solution: x = 8 - 3 = 5. N = 5. s2 = 2.133. Σxi2 = 10. s2{1/N + x2 /Σxi2} = 5.759.]

This matches the result obtained previously. For this example, the variance of a forecast at
time t is: s2{1/N + x2/Σxi2} = (2.133){1/5 + (t - 3)2/10}. Therefore, a 95% confidence interval for
this forecast is: 86.2 - 2.4t ± (3.182)(1.461)√{1/5 + (t - 3)2/10} =
86.2 - 2.4t ± (4.65)√{.2 + (t - 3)2/10}.
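The whole calculation can be reproduced with a few lines of Python (numpy and scipy assumed available); this sketch refits the loss ratio example and returns the point forecast and the half-width of the 95% confidence interval for the expected loss ratio:

import numpy as np
from scipy.stats import t as t_dist

t_obs = np.arange(1, 6)                                     # years 1 through 5
y_obs = np.array([84, 80, 81, 76, 74], dtype=float)         # observed loss ratios

X = np.column_stack([np.ones(5), t_obs])
coef, _, _, _ = np.linalg.lstsq(X, y_obs, rcond=None)       # alpha = 86.2, beta = -2.4
s2 = np.sum((y_obs - X @ coef) ** 2) / (len(y_obs) - 2)     # 2.133

def forecast_ci(t_new, level=0.95):
    # Var[forecast] = s2*(1/N + x^2 / sum of x_i^2), with x = t_new - mean of the t's.
    x = t_new - t_obs.mean()
    var_forecast = s2 * (1 / len(t_obs) + x ** 2 / np.sum((t_obs - t_obs.mean()) ** 2))
    crit = t_dist.ppf(1 - (1 - level) / 2, df=len(t_obs) - 2)    # 3.182 for 3 degrees of freedom
    return coef[0] + coef[1] * t_new, crit * np.sqrt(var_forecast)

print(forecast_ci(6))   # approximately (71.8, 4.9), matching the interval above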

Here is a graph of the observed loss ratios, the regression line (solid), and 95% confidence
intervals for the forecasted loss ratios (dashed):

[Graph: loss ratio versus time.]

We have predicted the expected value of the loss ratio for each time. However, we have yet
to consider the fact that for a given time the observed loss ratio will vary around its expected
value.
Random Nature of the Model:

Let us assume we knew that the correct model was Y = α + βt + ε, with α = 86, β = -2 and ε is
Normally Distributed with mean zero and variance 1.5.274 Note that these are not values
estimated from a finite data set, but are the assumed true values.

Then the expected loss ratio for t = 6 is: 86 - (2)(6) = 74. Therefore, the loss ratio for t = 6 is
Normally Distributed with mean 74 and variance 1.5. If one simulated this situation, the
observed loss ratio at time 6 would vary around 74. The mean squared difference between
74 and the observed loss ratio at time 6 is 1.5.

Thus even when there is no error due to estimating the coefficients of the regression model
from data, the forecast would not be equal to the observation; the mean squared forecast
error would be σ2, the variance of ε.

Forecast Errors:

Combining the two causes of forecast error, the variance of the forecast and the variance of
the model:
Mean Squared Forecast Error at x = X - X is: σ2{1 + 1/N + x2/Σxi2}.275

Taking into account both the error in the estimated coefficients and the random nature of the
model, and estimating as usual σ2 by s2:

Mean Squared Forecast Error at x = X - X is: sf 2 = s 2 {1 + 1/N + x2 / Σ x i2 }.

For the time series example, at time 6, the mean squared forecast error is:
2.133{1 + 1/5 + (6 - 3)2/10} = 4.479.

Note that this is the variance of the forecast at time 6, 2.345 as calculated previously, plus
s2 = 2.133, subject to rounding. In general, the mean squared forecast error is the sum of s2
and the variance of the forecast.

Exercise: For the time series example, what is the mean squared forecast error at time 7?
[Solution: x = 7 - 3 = 4. N = 5. s2 = 2.133. Σxi2 = 10. s2{1 + 1/N + x2 /Σxi2} = 5.972.]
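Continuing the earlier sketch (reusing t_obs and s2), the mean squared forecast error just adds s2 to the variance of the forecast:

def mean_squared_forecast_error(t_new):
    # s_f^2 = s2 * (1 + 1/N + x^2 / sum of x_i^2)
    x = t_new - t_obs.mean()
    return s2 * (1 + 1 / len(t_obs) + x ** 2 / np.sum((t_obs - t_obs.mean()) ** 2))

print(mean_squared_forecast_error(6))   # approximately 4.48
print(mean_squared_forecast_error(7))   # approximately 5.97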

If one estimated the regression from data at time 1, 2, ..., T, and one is forecasting at time T+1,
Mean Squared Forecast Error is: sf 2 = s 2 {1 + 1/T + ( X T+1 - X) 2 / Σ ( X i - X)2 }.276

The mean squared forecast error is smallest when we try to predict the value of
the dependent variable at the mean of the independent variable.
274
We assume that the εi are independent and they each have the same variance.
275
See equation 8.19 in Pindyck and Rubinfeld.
276
See equation 8.22 in Pindyck and Rubinfeld.
When predicting the future in this example, here is a graph of the mean squared forecast
error as a function of time:

[Graph: mean squared forecast error versus time.]

As we attempt to forecast further into the future, the mean squared forecast error increases.

Normalized Errors:

In the previous example, the forecasted loss ratio for year 6 was: 86.2 + (-2.4)(6) = 71.8, with
a mean squared forecast error of 4.479. If the loss ratio for year 6 turned out to be 75, then the
forecast error was 71.8 - 75 = -3.2. One can normalize this error by dividing by sf:
-3.2/√4.479 = -1.512.
Normalized Forecast Error = λ = ( Y^ T+1 - YT+1)/sf.277

The normalized forecast error follows a t-distribution with N - k degrees of freedom.


In this example, N = 5 and k = 2, so the t-distribution has 3 degrees of freedom.

Confidence Intervals:

We can use the mean squared forecast error to create confidence intervals for the future
observations.

For example, at time 6 the mean squared forecast error was calculated as 4.479.
The forecasted loss ratio for time 6 is: α^ + β^ 6 = 86.2 + (-2.4)(6) = 71.8.
With 3 degrees of freedom, a 95% confidence interval from the t-table is: ±3.182 standard
deviations. Thus an approximate 95% confidence interval for the observed loss ratio is:
71.8 ± (3.182)(√4.479) = 71.8 ± 6.7.
277
See Equation 8.20 in Pindyck and Rubinfeld.
Note this is wider than the 95% confidence interval for the expected loss ratio, calculated
previously as: 71.8 ± 4.9. The observed loss ratios will vary around the expected loss ratio,
and therefore their confidence interval is wider.

Exercise: For the time series example, determine a 95% confidence interval for the observed
loss ratio at time 9.
[Solution: The forecast is: 86.2 - (9)(2.4) = 64.6. x = 9 - 3 = 6. N = 5. s2 = 2.133. Σxi2 = 10.
Mean squared forecast error = s2{1 + 1/N + x2 /Σxi2} = 10.238.
64.6 ± (3.182)(√10.238) = 64.6 ± 10.2.]
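As an illustration (mine, not from the text), the same confidence interval can be computed in a few lines of Python, using the critical value 3.182 from the t-table:

from math import sqrt

# 95% confidence interval for the observed loss ratio at time 9.
alpha_hat, beta_hat = 86.2, -2.4
s2, n, x_bar, sum_x2 = 2.133, 5, 3.0, 10.0
t_crit = 3.182          # two-tailed 95% point of the t-distribution with 5 - 2 = 3 df

t = 9
forecast = alpha_hat + beta_hat * t                     # 64.6
sf2 = s2 * (1 + 1/n + (t - x_bar)**2 / sum_x2)          # 10.238
print(forecast, t_crit * sqrt(sf2))                     # 64.6 and about 10.2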

Here is a graph of the observed loss ratios, the regression line (solid), 95% confidence
intervals for the expected future loss ratios (shorter dashes), and 95% confidence intervals for
the observed future loss ratios (longer dashes):

[Graph: loss ratios (L.R.) versus time, for times 1 through 10.]

These same ideas apply to regressions that do not involve time, and to multiple regression.
Three Variable Example:*278

We have previously discussed an example with two independent variables and an intercept.

Agent X2 X3 Y
1 100% 0 75%
2 90% 10% 78%
3 70% 0% 71%
4 65% 10% 73%
5 50% 50% 79%
6 50% 35% 75%
7 40% 10% 65%
8 30% 70% 82%
9 15% 20% 72%
10 10% 10% 66%

X2 = the % of business written in New York.
X3 = the % of business written in New Jersey.
Y = the loss ratio for each agent.
The fitted regression was: Y^ = 62.3 + .126X2 + .222X3.

One can use this model to predict the loss ratio for an agent not included in the sample.
This is an example where we are not necessarily predicting the future.
This would be called a cross-section model.279

Exercise: What is the predicted loss ratio for an agent with 25% of his business written in
New York and 60% in New Jersey?
[Solution: 62.3 + (.126)(25) + (.222)(60) = 78.77.]

As discussed previously, for this fitted model, Y = β^1 + β^2 X2 + β^3 X3 + ε:
Var[β^1] = 4.434. Var[β^2] = .000836. Var[β^3] = .001395.
Cov[β^1, β^2] = -.05277. Cov[β^1, β^3] = -.05244. Cov[β^2, β^3] = .000432.

Exercise: What is the variance of the forecast in the previous exercise?


[Solution: Var[β^1 + 25β^2 + 60β^3] = Var[β^1] + 25²Var[β^2] + 60²Var[β^3] + (2)(25)Cov[β^1, β^2] +
(2)(60)Cov[β^1, β^3] + (2)(25)(60)Cov[β^2, β^3] = 4.434 + (625)(.000836) + (3600)(.001395) +
(50)(-.05277) + (120)(-.05244) + (3000)(.000432) = 2.343.]

The calculation of the variance of this forecast can also be done in matrix form.

278 See Appendix 8.1 of Pindyck and Rubinfeld, not on the syllabus.
279 Estimating classification relativities would be a good example of a cross-section model.
As discussed previously, for this example:
Var[β^] = s2(X’X)-1 = ( 4.434     -.05277   -.05244 )
                      (-.05277     .000836   .000432)
                      (-.05244     .000432   .001395)

Performing the matrix multiplication: (1, 25, 60) Var[β^] (1, 25, 60)T = 2.343, involves the same
arithmetic as performed in this exercise.
In general, the variance of the forecast at the vector Xf is: Xfs2(X’X)-1Xf’.

The t-distribution with 10 - 3 = 7 degrees of freedom has a critical value of 1.895 for a total of
10% area in both tails. Thus an approximate 90% confidence interval for this forecast is:
78.77 ± 1.895√2.343 = 78.8 ± 2.9.

However, as in the two variable case, there is inherent randomness in the model. Each
observed loss ratio includes ε, which is Normally Distributed with variance σ2. As discussed
previously, for this example, σ2 was estimated as: s2 = ESS/(N - k) = 39.389/(10 - 3) = 5.627.

Therefore, the mean squared error for this forecast is: 5.627 + 2.343 = 7.970.
In general, the mean squared forecast error is: s2(1 + Xf(X’X)-1Xf’).280
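As an illustration (mine, not from Pindyck and Rubinfeld), the matrix form of this calculation can be carried out with numpy; the array below is simply the covariance matrix s2(X’X)-1 given above:

import numpy as np

# Covariance matrix s^2 (X'X)^-1 of the fitted coefficients, from above.
var_beta = np.array([[ 4.434,   -0.05277,  -0.05244 ],
                     [-0.05277,  0.000836,  0.000432],
                     [-0.05244,  0.000432,  0.001395]])
s2 = 5.627                              # = ESS/(N - k) = 39.389/7
x_f = np.array([1.0, 25.0, 60.0])       # intercept, X2 = 25, X3 = 60

var_forecast = x_f @ var_beta @ x_f     # Xf s^2 (X'X)^-1 Xf', about 2.343
print(var_forecast, s2 + var_forecast)  # about 2.343 and 7.970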

In analogy to the two variable case, the mean squared forecast error is smallest when we try
to predict the value of the dependent variable at the means of all of the independent
variables.281

Exercise: For this example, what is the mean squared error of the forecast at the means of all
of the independent variables?
[Solution: The mean of X2 is 52.0. The mean of X3 is 21.5.
(1, 52, 21.5) Var[β^] (1, 52, 21.5)T = .562.
Adding s2, the mean squared error of the forecast is: .562 + 5.627 = 6.189.
Comment: 6.189 < 7.970.]

280 This formula reduces to the previously discussed formula in the case of the two variable model,
Mean Squared Forecast Error = s2{1 + 1/T + (XT+1 - X̄)2/Σ(Xi - X̄)2}.
281 See equation A8.17 in Pindyck and Rubinfeld, not on the syllabus.
Problems:

Use the following information for the next 4 questions:


You are given the following frequencies for Workers Compensation Medical Only claims
(per 10000 worker-weeks) for each of five years:
Year 1 2 3 4 5
Frequency 86,756 85,601 83,474 75,307 72,083

33.1 (2 points) Fit a linear regression, with intercept, to this data.


What is the forecasted frequency for year 8?
A. 61,000 B. 62,000 C. 63,000 D. 64,000 E. 65,000

33.2 (2 points) What is the estimated variance of this regression?


A. 5 million B. 6 million C. 7 million D. 8 million E. 9 million

33.3 (2 points) Determine a 95% confidence interval for the expected frequency in year 8.
What is the upper end of this interval?
A. 70,000 B. 72,000 C. 74,000 D. 76,000 E. 78,000

33.4 (2 points) Determine a 95% confidence interval for the observed frequency in year 8.
What is the lower end of this interval?
A. 43,000 B. 45,000 C. 47,000 D. 49,000 E. 51,000

33.5 (4 points) You are given the following information on 12 apple trees:
Size of Crop (hundreds of apples), X Percentage of Wormy Apples, Y
8 59
6 58
11 56
22 53
14 50
17 45
18 43
24 42
19 39
23 38
26 30
40 27
A linear regression with intercept has been fit to the above data,
Y^ = 64.247 - 1.013X.
Based on this regression, determine a 95% confidence interval for the percentage of wormy
fruit one would observe from a tree that produced 3500 apples (X = 35).
What is the lower end of this confidence interval?
A. 11 B. 12 C. 13 D. 14 E. 15
33.6 (3 points) You are given the following severities for each of five years:
Year 1 2 3 4 5
Severity 6117 6873 8148 8246 9112
A regression has been fit to the natural logs of these severities:
ln[severity] = 8.64516 + 0.0979168t. s2 = 0.00191151.
Determine the upper end of a 90% confidence interval for the expected severity in year 8.
A. 14,500 B. 14,700 C. 14,900 D. 15,100 E. 15,300

Use the following information for the next three questions:

For Actuarial Exam Q, based on the data for many students you have fit the following multiple
regression model: Y = 18.40 + 2.04X2 + 0.1120X3 + .09371X4.
where X2 = number of times Exam Q has been taken previously,
X3 = number of hours studied at work,
X4 = number of hours studied at home, and
Y = student’s score on Exam Q out of 100.
The covariance matrix of the fitted coefficients is:
(.304 -.0167 .00104 .00136 )
(-.0167 .00452 .000239 .000220 )
(.00104 .000239 .0000936 .0000628)
(.00136 .000220 .0000628 .0001017)
The estimated variance of the regression is 20.38.

33.7 (3 points) Determine a 95% confidence interval for the expected score of a student who
has taken Exam Q twice before, has studied 100 hours at work and 200 hours at home.
What is the upper end of that interval?
A. 54 B. 55 C. 56 D. 57 E. 58

33.8 (3 points) Determine a 90% confidence interval for the observed score of a student who
has not taken Exam Q before, has studied 150 hours at work and 50 hours at home.
What is the upper end of that interval?
A. 48 B. 49 C. 50 D. 51 E. 52

33.9 (4 points) Assume that a score of 54.2 or more is required to pass Exam Q.
What is the probability of passing for a student who has taken Exam Q once before, has
studied 50 hours at work and 250 hours at home?
A. 13% B. 15% C. 17% D. 19% E. 21%

33.10 (4 points) You are given the following mortality data for males at age 70, at four points
in time:
Year 1970 1980 1990 2000
100,000q70 3580 3302 3007 2694
Fit a linear regression with intercept to this data. Determine the lower end of a 99%
confidence interval for the expected value of 100,000q70 in 2005.
A. 2370 B. 2390 C. 2410 D. 2430 E. 2450
Use the following information for the next two questions:

For private passenger automobile bodily injury liability insurance, the size of loss is assumed
to be LogNormal.
Let X = weight (in 1000 of pounds) of the automobile driven by the insured,
and Y = size of loss.
You fit the regression model, lnY = α + β X + ε, to 4000 observations.
ΣXi = 13,283. ΣXi2 = 54,940.
The results are: α^ = 8.743. β^ = .136. ESS = 3285. RSS = 11,781.

33.11 (3 points) What is the lower end of a 90% confidence interval for the expected size of
loss for an automobile that weighs 5000 pounds?
A. 11,200 B. 11,400 C. 11,600 D. 11,800 E. 12,000

33.12 (2 points) What is the upper end of a 90% confidence interval for the observed size of
loss for an automobile that weighs 5000 pounds?
A. 35,000 B. 40,000 C. 45,000 D. 50,000 E. 55,000

Use the following information for the next two questions:


The finishing times for the Boston Marathon from 1980 to 2005 were:
2:12:11, 2:09:26, 2:08:52, 2:09:00, 2:10:34, 2:14:05, 2:07:51, 2:11:50, 2:08:43, 2:09:06,
2:08:19, 2:11:06, 2:08:14, 2:09:33, 2:07:15, 2:09:22, 2:09:15, 2:10:34, 2:07:34, 2:09:52,
2:09:47, 2:09:43, 2:09:02, 2:10:11, 2:10:37, 2:11:45.
This data was converted to number of seconds beyond two hours.
For example, 2:11:45, which is 2 hours, 11 minutes, and 45 seconds, corresponds to 705.
Then a linear regression was fit to this data, with the following results, where t = 0 for 1980:
Finish Time = 596.496 - 0.86735t.
Covariance Matrix of the estimated parameters:
(1301.31 -76.548)
(-76.548 6.1238)

33.13 (3 points)
Determine the lower end of a 99% confidence interval for the observed finish time in 2006.
A. 2:4:50 B. 2:5:00 C. 2:5:10 D. 2:5:20 E. 2:5:30

33.14 (1 point) The finishing time in 2006 turned out to be 2:07:14.


Determine the normalized forecast error.
A. Less than 0
B. At least 0, but less than 0.5
C. At least 0.5, but less than 1.0
D. At least 1.0, but less than 1.5
E. 1.5 or more
33.15 (4 points) You are given the following pure premiums for each of four years:
Year 1 2 3 4
Pure Premium 251 247 260 268
Fit a linear regression with intercept to this data.
Determine the upper end of a 95% confidence interval for the pure premium observed in
year 5.
A. 270 B. 280 C. 290 D. 300 E. 310

33.16 (5 points) For liability insurance you have the following information on the number of
claims reported for several Accident Years. (Accident Year 1997 consists of all claims on
accidents that occurred during 1997, regardless of when they are reported to the insurer.)
Accident Number of Claims Reported Number of Claims Reported
Year by December 31 of that year after December 31 of that year
1997 8037 3312
1998 7948 3090
1999 7792 2983
2000 8125 3211
2001 7936 3224
You fit the regression model Y = βX + ε, where X = Number of Claims Reported by December
31 of that year, and Y = Number of Claims Reported after December 31 of that year.
8341 claims have been observed for Accident Year 2004, as of December 31, 2004.
You determine an interval such that the probability that the total number of claims that will be
reported for Accident Year 2004 is outside this interval is 10%.
What is the lower end of that confidence interval?
A. 11,400 B. 11,450 C. 11,500 D. 11,550 E. 11,600

33.17 (2 points) A linear regression Y = α + βX, where X = age of male, and
Y = height (in inches), has been fit to a data set consisting of 1000 boys equally split between
ages 10, 11, 12, 13, and 14.
α^ = 30.6. β^ = 2.44. s2 = 9.2.
Briefly discuss the use of this model in order to predict the height of a male of a different age.
Use the following data on the amount property-casualty insurers paid for losses due to
catastrophes in the United States for the next two questions:
Year Losses ($billion)
1991 4.7
1992 23.0
1993 5.6
1994 17.0
1995 8.3
1996 7.4
1997 2.6
1998 10.1
1999 8.3
2000 4.6
2001 26.5
2002 5.9
2003 12.9
2004 27.3
A linear regression was fit to this data, with the following results:
Loss = -1057.7 + 0.535385 Year. Standard Error of the Regression = 8.42436.
Standard Error of the Intercept = 1115.67. Standard Error of the Slope = 0.55853.

33.18 (2 points) Determine a 90% confidence interval for the expected catastrophe losses in
2005.

33.19 (2 points) Determine a 90% confidence interval for the observed catastrophe losses in
2005.
33.20 (IOA 101, 9/00, Q.16) (12.75 points) At the end of the skiing season the tourist
board in a mountain region examines the records of ten ski resorts. For each one it obtains
the total number (y, thousands) of visitor-days during the season as a measure of the resort’s
popularity, and the ski-lift capacity (x, thousands), being the maximum number of skiers that
can be transported per hour. The resulting data are given in the following table:
Resort: A B C D E F G H I J
Lift capacity x: 1.9 3.3 1.2 4.2 1.5 2.2 1.0 5.6 1.9 3.8
Visitor-days y: 15.1 22.6 9.2 37.5 8.9 21.1 5.8 41.0 9.2 32.4
Σx = 26.6, Σx2 = 91.08, Σy = 202.8, Σy2 = 5603.12, Σxy = 707.58.
(i) (1.5 points) Draw a scatterplot of y against x and comment briefly on any relationship
between a resort’s popularity and its ski-lift capacity.
(ii) (2.25 points) Calculate the correlation coefficient between x and y and comment briefly in the
light of your comment in part (i).
(iii) (1.5 points) Calculate the fitted linear regression equation of y on x.
(iv) (4.5 points) (a) Calculate the “total sum of squares” together with its partition into the
“regression sum of squares” (RSS) and the “error sum of squares” (ESS).
(b) Use the values in part (iv)(a) above to calculate the coefficient of determination R2 and
comment briefly on its relationship with the correlation coefficient calculated in part (ii).
(c) Use the values in part (iv)(a) above to calculate an estimate of the error variance σ2 in the
usual linear regression model.
(v) (3 points) Suppose that a resort can increase its ski-lift capacity by 500 skiers per hour.
Estimate the increase in the number of visitor-days it can expect in a season, and
specify a standard error for this estimate.
33.21 (IOA 101, 4/01, Q.16) (12 points)
The table below gives data on the lean body mass (the weight without fat) and resting
metabolic rate for twelve women who were the subjects in a study of obesity.
The researchers suspected that metabolic rate is related to lean body mass.
Lean body mass (kg) Resting metabolic rate
x y
36.1 995
54.6 1425
48.5 1396
42.0 1418
50.6 1502
42.0 1256
40.3 1189
33.1 913
42.4 1124
34.5 1052
51.1 1347
41.2 1204
Σ x = 516.4 Σ x2 = 22,741.34
Σ y = 14821 Σ y2 = 18,695,125
Σ xy = 650,264.8
(i) (1.5 points) Draw a scatter plot of the resting metabolic rate against lean body mass and
comment briefly on any relationship.
(ii) (2.25 points) Calculate the least squares fit regression line in which resting metabolic rate
is modeled as the response and the lean body mass as the explanatory variable.
(iii) (3.75 points) Determine a 95% confidence interval for the slope coefficient of the model.
State any assumptions made.
(iv) (3 points) Use the fitted model to construct 95% confidence intervals for the mean resting
metabolic rate when:
(a) the lean body mass is 50kg
(b) the lean body mass is 75kg
(v) (1.5 points) Comment on the appropriateness of each of the confidence intervals given
in (iv).

33.22 (2 points) In the previous question, IOA 101, 4/01, Q.16, using a computer produce a
graph showing the fitted line and 95% confidence intervals for the mean resting metabolic
rate when the lean body mass is between 30 and 55 kilograms.

33.23 (2 points) For IOA 101, 4/01, Q.16, produce another graph showing the fitted line and
95% confidence intervals for the observed resting metabolic rate when the lean body mass is
between 30 and 55 kilograms.
33.24 (4, 5/01, Q.25) (2.5 points)
You have modeled eight loss ratios as Yt = α + βt + εt,
t = 1, 2, ..., 8, where Yt is the loss ratio for year t and εt is an error term.
You have determined:

α^ = 0.50 and β^ = 0.02.

Var[(α^, β^)T] = ( 0.00055   -0.00010)
                 (-0.00010    0.00002)

Estimate the standard deviation of the forecast for year 10, Y^10 = α^ + 10β^.
(A) Less than 0.01
(B) At least 0.01, but less than 0.02
(C) At least 0.02, but less than 0.03
(D) At least 0.03, but less than 0.04
(E) At least 0.04

33.25 (2 points) In the previous question, determine a 95% confidence interval for the
forecast.
33.26 (IOA 101, 9/01, Q.15) (12.75 points)
In a study into employee share ownership plans, data were obtained from ten large
insurance companies on the following two variables:
employee satisfaction with the plan (x);
employee commitment to the company (y).
For each company a random sample (of the same size) of employees completed
questionnaires in which satisfaction and commitment were recorded on a 1-10 scale, with 1
representing low satisfaction/commitment and 10 representing high satisfaction/commitment.
The resulting means provide each company’s employees’ satisfaction and commitment
score. These scores are given in the following table:
Co. A B C D E F G H I J
x 5.05 4.12 5.38 4.17 3.81 4.47 5.41 4.88 4.64 5.19
y 5.36 4.59 5.42 4.35 4.03 5.34 5.64 4.89 4.52 5.88
Σx = 47.12, Σx2 = 224.8554, Σy = 50.02, Σy2 = 253.5796, Σxy = 238.3676.
(i) (1.5 points) Draw a scatterplot of y against x and comment briefly on any relationship
between employee satisfaction and commitment.
(ii) (2.25 points) Calculate the fitted linear regression equation of y on x.
(iii) (1.5 points) Calculate the coefficient of determination R2 and relate its value to your
comment in part (i).
(iv) (2.25 points) Assuming the full normal model, calculate an estimate of the error variance
σ2 and obtain a 95% confidence interval for σ2.
(v) (2.25 points) Calculate a 95% confidence interval for the true underlying slope coefficient.
(vi) (3 points) For companies with an employees’ satisfaction score of 5.0, calculate an
estimate of the expected employees' commitment score together with 95% confidence
limits.

Section 34, Testing Forecasts282


One important method of testing models is to see how they would have performed if used in
the past.283 One can compare the predictions the model would have made at a given point in
time with the eventual outcomes. A better match between prediction and outcome would
indicate a model that is more likely to work well in the future.

Boston Marathon Example:

We are given the following finishing times for the Boston Marathon from 1980 to 1992:
2:12:11, 2:09:26, 2:08:52, 2:09:00, 2:10:34, 2:14:05, 2:07:51, 2:11:50, 2:08:43, 2:09:06,
2:08:19, 2:11:06, 2:08:14.

This data was converted to number of seconds beyond two hours.


For example, 2:12:11, which is 2 hours, 12 minutes, and 11 seconds, corresponds to 731.
X = (731, 566, 532, 540, 634, 845, 471, 710, 523, 546, 499, 666, 494).

Here is a graph of this data:

[Graph: finish times (seconds beyond two hours) versus year, 1980 to 1992.]

A linear regression was fit to this data, with the following results, where t = 1 for 1980:
Finish Time = 650.269 - 7.65385t.

sα^ = 66.445, with t-statistic = 9.787 and p-value 9 x 10-7.


sβ^ = 8.37128, with t-statistic = -0.914 and p-value .380.
R2 = 0.0706273. Adjusted R2 = -0.0138611. s2 = 12754.3.
RSS = 10662. ESS = 140297. F = {10662/(2-1)}/{140297/(13-2)} = .833, with p-value .380.
Durbin-Watson Statistic = 2.64.
282 See Section 8.1.2 in Pindyck and Rubinfeld.
283 This very important and useful idea applies to most models used by actuaries, not just regression models.
These statistics indicate among other things that the slope is not significantly different than
zero. However, none of this directly examines how well this regression model would have
predicted the future.

Root Mean Squared Error:

Here are the finishing times for the Boston Marathon for the next 13 years, 1993 to 2005,
not used in fitting the regression: 2:09:33, 2:07:15, 2:09:22, 2:09:15, 2:10:34, 2:07:34,
2:09:52, 2:09:47, 2:09:43, 2:09:02, 2:10:11, 2:10:37, 2:11:45.

This data was converted to number of seconds beyond two hours:


(573, 435, 562, 555, 634, 454, 592, 587, 583, 542, 611, 637, 705).
For example, 2:11:45, which is 2 hours, 11 minutes, and 45 seconds, corresponds to 705.

Here is a graph of this data and the regression model which was fit to the earlier data:

[Graph: finish times 1993 to 2005, together with the extension of the regression line fit to 1980-1992.]

The forecasts are: 650.269 - 7.65385t = (543.115, 535.461, 527.807, 520.154, 512.500,
504.846, 497.192, 489.538, 481.884, 474.230, 466.577, 458.923, 451.269).

The mean squared error is the mean of the squared differences between the observations
and the forecasted values:
{(573 - 543.115)2 + ... + (705 - 451.269)2}/13 = 13920.8.

The root mean squared error is: √13920.8 = 118.0 seconds.

Root Mean Squared Forecast Error = √{Σ(forecasti - observationi)2 / (# of forecasts)}.
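As a check (an illustration of mine, not part of the textbook), the root mean squared error for the marathon example can be reproduced in a few lines of Python:

from math import sqrt

# Observed finish times (seconds beyond two hours), 1993 to 2005.
observed = [573, 435, 562, 555, 634, 454, 592, 587, 583, 542, 611, 637, 705]
# Forecasts from the regression fit to 1980-1992: 650.269 - 7.65385 t,
# with t = 14, ..., 26 corresponding to 1993, ..., 2005.
forecasts = [650.269 - 7.65385 * t for t in range(14, 27)]

mse = sum((f - o)**2 for f, o in zip(forecasts, observed)) / len(observed)
print(mse, sqrt(mse))     # about 13920.8 and 118.0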
All else being equal, the smaller the mean squared error or the root mean squared error, the
better the performance of the model.

Evaluating the Qualities of an Estimator:

In this particular use of this estimator, the root mean squared forecast error was 118 seconds,
which seems rather large.284 This is just one use of this estimator. Any estimator can do a
poor job or good job on occasion. Rather, the errors that would result from the
repeated use of a procedure is what is referred to when we discuss the
qualities of an estimator.

In this example, one could check this estimator by performing similar regressions over
different periods of time. For example, one could see how well a regression applied to the
data from 1960 to 1972 would have predicted the results from 1973 to 1985. One could also
apply the estimator to other marathons. In an actuarial problem, one would apply the
estimator to similar situations, covering different periods of time, different states, etc.

Ex Post Forecasts:285

Using a regression fit to 1980 to 1992 in order to forecast 1993 to 2005 was an example of
what is called an ex post forecast. The values that were forecast were already known at the
time the forecast was made. Ex post forecasts are used to evaluate a forecasting model.

ex post forecast ⇔ values to be forecast known at time of forecast.

ex ante forecast ⇔ values to be forecast not known at the time of the forecast.

In contrast, an ex ante forecast is used to predict the future. An example of an ex ante forecast
would be if in 2005 we applied a regression to the finishing times from 1993 to 2005 in order
to predict the finishing time in 2006.

Conditional Forecasts:* 286

When predicting a future finishing time for the marathon, one knows the values of the year,
the independent variable to be used in the model. This is called an unconditional forecast.

When predicting the future in an ex ante forecast, sometimes one does not know all of the
values of all of the independent variables to be used in the model. For example, we might be
using the unemployment rate to predict the claim frequency for workers compensation
insurance. If the unemployment is lagged one year, then one would be using the
unemployment rate during 2004 to help predict the claim frequency during 2005.

284 A large portion of the error resulted from using the nonzero fitted slope, which was not significantly different from zero.
285 See page 203 of Pindyck and Rubinfeld.
286 See page 203 of Pindyck and Rubinfeld. Section 8.3 in Pindyck and Rubinfeld, not on the syllabus, discusses conditional forecasting.
In January 2005, we would know the unemployment rate during 2004, and therefore this
would be an unconditional forecast of the claim frequency for 2005. However, in January
2005, we would not know the unemployment rate during 2005. We could use a prediction of
the unemployment rate during 2005, in order to forecast the claim frequency during 2006.
This would be an example of a conditional forecast.

In a conditional forecast, not all of the values of the independent variable(s) are known
at the time of the forecast.287

Theil’s Inequality Coefficient:288

In the marathon example, the second moment of the observed finishing times from 1993 to
2005 was: (573² + 435² + 562² + 555² + 634² + 454² + 592² + 587² + 583² + 542² + 611² +
637² + 705²)/13 = 4,354,296/13 = 334,946.

The second moment of the predicted finishing times from 1993 to 2005 was:
(543.115² + 535.461² + 527.807² + 520.154² + 512.500² + 504.846² + 497.192² + 489.538²
+ 481.884² + 474.230² + 466.577² + 458.923² + 451.269²)/13 = 3,224,260/13 = 248,020.

The root mean squared error was 118.0 seconds.

For this example, Theil’s Inequality Coefficient is:


118.0/(√248,020 + √334,946) = .110.

In general, U = Theil’s Inequality Coefficient =
(RMS Error) / {√(2nd moment of forecasts) + √(2nd moment of observeds)}.

In general, 0 ≤ Theil’s Inequality Coefficient ≤ 1.


Theil’s Inequality Coefficient is a relative measure of the root mean squared forecast error.
The smaller Theil’s Inequality Coefficient, the better the forecasts.

Dividing the Mean Squared Error into Three Pieces:289

In the marathon example, for 1993 to 2005, the mean squared forecast error was 13920.8.

The mean forecast for 1993 to 2005 was: (543.115 + 535.461 + 527.807 + 520.154 +
512.500 + 504.846 + 497.192 + 489.538 + 481.884 + 474.230 + 466.577 + 458.923 +
451.269)/13 = 6463.5/13 = 497.192.

The mean observation for 1993 to 2005 was: (573 + 435 + 562 + 555 + 634 + 454 + 592 +
587 + 583 + 542 + 611 + 637 + 705)/13 = 7470/13 = 574.615.
287 This would be an additional source of forecast error, beyond that present when making an unconditional forecast.
288 See equation 8.25 in Pindyck and Rubinfeld.
289 See equation 8.26 in Pindyck and Rubinfeld.
As calculated previously, the second moment of the forecasts for 1993 to 2005 was 248,020.
Therefore, the variance of the forecasts for 1993 to 2005 is: 248,020 - 497.192² = 820.115.
The standard deviation of the forecasts for 1993 to 2005 is: √820.115 = 28.638.

As calculated previously, the second moment of the observations for 1993 to 2005 was
334,946. Therefore, the variance of the observations for 1993 to 2005 is:
334,946 - 574.615² = 4763.602.
The standard deviation of the observations for 1993 to 2005 is: √4763.602 = 69.019.

The numerator of the correlation of the forecasts and observations for 1993 to 2005 is:
{(543.115 - 497.192)(573 - 574.615) + ... + (451.269 - 497.192)(705 - 574.615)}/13 =
-15231.2/13 = -1171.63.

Correlation of the forecasts and observations for 1993 to 2005 is:


-1171.63/{(28.638)(69.019)} = -.5928.

{(mean forecast) - (mean observation)}² +
{(stddev of forecasts) - (stddev of observations)}² +
2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.) =
(497.192 - 574.615)² + (28.638 - 69.019)² + (2)(1.5928)(28.638)(69.019) =
5994.3 + 1630.6 + 6296.5 = 13,921.
This matches the mean square error computed previously, subject to rounding.

In general, Mean Squared Forecast Error = {(mean forecast) - (mean observation)}² +
{(stddev of forecasts) - (stddev of observations)}² +
2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.).

Demonstration of the Division of the Mean Squared Error into Three Pieces:*290

Let Oi be the m observed values being compared to the m forecasted values Fi.

{(mean forecast) - (mean observation)}² + {(stddev of forecasts) - (stddev of observations)}²
+ 2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.) =
(F̄ - Ō)² + (σF - σO)² + 2σFσO - 2Σ(Fi - F̄)(Oi - Ō)/m =
F̄² + Ō² - 2F̄Ō + σF² + σO² - 2ΣFiOi/m + 2F̄Ō + 2F̄Ō - 2F̄Ō =
F̄² + σF² + Ō² + σO² - 2ΣFiOi/m = ΣFi²/m + ΣOi²/m - 2ΣFiOi/m =
Σ(Fi - Oi)²/m = Mean Squared Forecast Error.

290 This is an application of a result discussed in a previous section, MSE = Variance + Bias².
The Proportions of Inequality:291

Using the three pieces of the mean squared error (MSE), one can divide the contributions to
Theil’s Inequality Coefficient, U, into three pieces.

Bias proportion of U = UM = Bias²/MSE = {(mean forecast) - (mean observation)}² / MSE.

Variance proportion of U = US =
{(stddev of forecasts) - (stddev of observations)}2 / MSE.

Covariance proportion of U = UC =
2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.) / MSE.

In the marathon example:


Bias proportion of U = UM = (497.192 - 574.615)2/13921 = 5994.3/13,921 = .431.
Variance proportion of U = US = (28.638 - 69.019)2/13921 = 1630.6/13,921 = .117.
Covariance proportion of U = UC = (2)(1.5928)(28.638)(69.019)/13921
= 6296.5/13,921 = .452.

Note that .431 + .117 + .452 = 1.000. Since as discussed previously, these numerators add
up to the Mean Squared Error, these three proportions of inequality sum to one.

UM + US + UC = 1.
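The whole decomposition can be verified numerically. The following Python sketch (my own illustration, not from the textbook) reproduces Theil’s Inequality Coefficient and its three proportions for the marathon example; note that the standard deviations divide by the number of forecasts, as in the calculation above:

from math import sqrt

observed  = [573, 435, 562, 555, 634, 454, 592, 587, 583, 542, 611, 637, 705]
forecasts = [650.269 - 7.65385 * t for t in range(14, 27)]
m = len(observed)

mse   = sum((f - o)**2 for f, o in zip(forecasts, observed)) / m
f_bar = sum(forecasts) / m
o_bar = sum(observed) / m
sd_f  = sqrt(sum((f - f_bar)**2 for f in forecasts) / m)
sd_o  = sqrt(sum((o - o_bar)**2 for o in observed) / m)
corr  = sum((f - f_bar)*(o - o_bar) for f, o in zip(forecasts, observed)) / (m * sd_f * sd_o)

U  = sqrt(mse) / (sqrt(sum(f*f for f in forecasts)/m) + sqrt(sum(o*o for o in observed)/m))
UM = (f_bar - o_bar)**2 / mse                  # bias proportion, about .431
US = (sd_f - sd_o)**2 / mse                    # variance proportion, about .117
UC = 2*(1 - corr)*sd_f*sd_o / mse              # covariance proportion, about .452
print(U, UM, US, UC, UM + US + UC)             # U about .110; proportions sum to 1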

The bias proportion measures the error due to forecasting on average either too high or too
low. In this case, 43.1% of the mean squared error is due to bias; this is large.
A value of the bias proportion of U, UM, greater than 10% or 20%, indicates a systematic bias,
and the forecasting model should (probably) be revised.

The variance proportion measures the error due to the forecasted series having a variance
that differs from the observed series. In this case, 11.7% of the mean squared error is due to
this difference in variances; this is not large. A large variance proportion would have
indicated that the forecasting model should (probably) be revised.

The covariance proportion measures the remaining error not due to the other two causes,
and is of less concern than the other two proportions.

The ideal distribution of inequality over the three proportions is:


UM + US = 0, and UC = 1.292 This is only an ideal, which is never achieved by any realistic
estimator. However, we would prefer UM and US to be as small as possible.

291 See equations 8.27 to 8.29 in Pindyck and Rubinfeld.
292 Bottom of page 211 in Pindyck and Rubinfeld.
Problems:

Use the following information for the next 6 questions:


You are given the following series of 15 years of average rates for Workers Compensation
Insurance in Massachusetts, t = 1, 2, ..., 15: 1.630, 1.553, 1.453, 1.340, 1.322, 1.360, 1.341,
1.359, 1.323, 1.302, 1.366, 1.342, 1.334, 1.307, 1.362.
For each year, the currently charged rates by class were averaged using that year’s
distribution of exposures by class.

34.1 (3 points) Fit a linear regression versus time to the first ten years of average rates.
What is the forecasted average rate for time 15?
A. 1.0 B. 1.1 C. 1.2 D. 1.3 E. 1.4

34.2 (2 points) What is the root mean squared error in using this regression to forecast the
average rates for times 11 through 15?
A. 0.06 B. 0.09 C. 0.12 D. 0.15 E. 0.18

34.3 (2 points) What is Theil’s Inequality Coefficient, U, in using this regression to forecast


the average rates for times 11 through 15?
A. 0.03 B. 0.05 C. 0.07 D. 0.09 E. 0.11

34.4 (2 points) What is UM, the bias proportion of U, in using this regression to forecast the
average rates for times 11 through 15?
A. 75% B. 80% C. 85% D. 90% E. 95%

34.5 (2 points) What is US, the variance proportion of U, in using this regression to forecast
the average rates for times 11 through 15?
A. 2% B. 4% C. 8% D. 10% E. 12%

34.6 (2 points) What is UC, the covariance proportion of U, in using this regression to
forecast the average rates for times 11 through 15?
A. 2% B. 4% C. 8% D. 10% E. 12%

Use the following forecasted and actual values for the next two questions:
Forecast Yts    Actual Yta
76 81
106 93
110 125
142 129

34.7 (2 points) Determine the value of the bias proportion of inequality.


(A) 1.0% (B) 1.5% (C) 2.0% (D) 2.5% (E) 3.0%

34.8 (3 points) Determine the value of the variance proportion of inequality.


(A) 6% (B) 7% (C) 8% (D) 9% (E) 10%
34.9 (1 point) According to Pindyck and Rubinfeld in Economic Models and Economic
Forecasts, which of the following statements is false?
A. Forecasting is a principal purpose for constructing regression models.
B. A forecast is a quantitative estimate about the likelihood of future events.
C. In an ex post forecast some of the values of the dependent variable are unknown.
D. An ex ante forecast may be conditional.
E. All of A, B, C, and D are true.

Use the following information for the next 5 questions:


100 forecasts, F1, F2, ..., F100, are being compared to 100 corresponding observations,
O1, O2, ..., O100.
Σ Fi = 872. Σ Fi2 = 11,330. Σ Oi = 981. Σ Oi2 = 15,856. Σ FiOi = 10,281.

34.10 (2 points) What is the root mean squared error of forecasting?


A. 7 B. 8 C. 9 D. 10 E. 11

34.11 (1 point) What is Theil’s Inequality Coefficient, U, for these forecasts?


A. 0.15 B. 0.20 C. 0.25 D. 0.30 E. 0.35

34.12 (1 point) What is UM, the bias proportion of U, for these forecasts?
A. 2% B. 3% C. 4% D. 5% E. 6%

34.13 (2 points) What is US, the variance proportion of U, for these forecasts?
A. 2% B. 3% C. 4% D. 5% E. 6%

34.14 (2 points) What is UC, the covariance proportion of U, for these forecasts?
A. 89% B. 91% C. 93% D. 95% E. 97%

34.15 (1 point) Let X = a consumer price index of the value of homes in a certain


metropolitan area.
Let Y = the average homeowners premiums on polices written by the Regressive Insurance
Company on homes in that metropolitan area.
Both X and Y are monthly series.
The values of each series are available within 20 days after the close of the month.
For example, the values of each series for June are available by July 20.
A regression model is fit, with Yi = α + βXi-2 + ε.
For forecasts made on July 23, 2006, which of the following statements are false?
A. A prediction of the average premium for June 2006 is an ex post forecast.
B. A prediction of the average premium for July 2006 is an ex ante forecast.
C. A prediction of the average premium for August 2006 is a unconditional forecast.
D. A prediction of the average premium for September 2006 is a conditional forecast.
E. All of A, B, C, and D are true.
Use the following information for the next 3 questions:
You are given the following series of 18 months of a Consumer Price Index, t = 1, 2, ...,18:
292.6, 293.7, 294.2, 294.6, 295.5, 296.3, 297.6, 298.4, 299.2, 299.9, 300.8, 302.1,
303.6, 306.0, 307.5, 308.3, 309.0, 310.0.

34.16 (3 points) Fit a linear regression versus time to the first 12 values.
What is the forecasted value for time 18?
A. 307 B. 308 C. 309 D. 310 E. 311

34.17 (2 points) What is the root mean squared error in using this regression to forecast the
values for times 13 through 18?
A. 1 B. 2 C. 3 D. 4 E. 5

34.18 (2 points) What is Theil’s Inequality Coefficient, U, in using this regression to forecast
the values for times 13 through 18?
A. 0.005 B. 0.010 C. 0.015 D. 0.020 E. 0.025

34.19 (VEE-Applied Statistics Exam, 8/05, Q.6) (2.5 points)


You are given the following forecasted and actual values:
Forecast Yts    Actual Yta
174 186
193 206
212 227
231 242
Determine the value of the bias proportion of inequality.
(A) 0.031 (B) 0.077 (C) 0.800 (D) 0.890 (E) 0.987

34.20 (3 points) In the previous question, determine the value of the covariance proportion
of inequality.
(A) 0.009 (B) 0.011 (C) 0.013 (D) 0.015 (E) 0.017

Section 35, Forecasting with Serial Correlation293

The following time series exhibits positive serial correlation, which affects how to use
regression to best predict the future.

Massachusetts State Average Weekly Wage (SAWW), An Example of a Time Series:294

Period Covered Average Weekly Wage Period Covered Average Weekly Wage
4/1/69 to 3/31/70 131.02 4/1/87 to 3/31/88 444.20
4/1/70 to 3/31/71 139.38 4/1/88 to 3/31/89 474.47
4/1/71 to 3/31/72 149.64 4/1/89 to 3/31/90 490.57
4/1/72 to 3/31/73 155.57 4/1/90 to 3/31/91 515.52
4/1/73 to 3/31/74 163.80 4/1/91 to 3/31/92 543.30
4/1/74 to 3/31/75 174.48 4/1/92 to 3/31/93 565.94
4/1/75 to 3/31/76 186.85 4/1/93 to 3/31/94 585.66
4/1/76 to 3/31/77 199.31 4/1/94 to 3/31/95 604.03
4/1/77 to 3/31/78 211.37 4/1/95 to 3/31/96 631.03
4/1/78 to 3/31/79 227.31 4/1/96 to 3/31/97 665.55
4/1/79 to 3/31/80 245.48 4/1/97 to 3/31/98 699.91
4/1/80 to 3/31/81 269.93 4/1/98 to 3/31/99 749.69
4/1/81 to 3/31/82 297.85 4/1/99 to 3/31/00 830.89
4/1/82 to 3/31/83 320.29 4/1/00 to 3/31/01 890.94
4/1/83 to 3/31/84 341.06 4/1/01 to 3/31/02 882.57
4/1/84 to 3/31/85 360.50 4/1/02 to 3/31/03 884.46
4/1/85 to 3/31/86 383.57 4/1/03 to 3/31/04 918.78
4/1/86 to 3/31/87 411.00 4/1/04 to 3/31/05 958.58

One can fit an exponential regression to the first 33 of the 36 element data elements, holding
out the last three values in order to compare to forecasts.
ln[SAWW] = Y = α + βt, where t = 1 for the first annual period, 4/1/69 to 3/31/70.
The result is:
α^ = 4.84291, with standard error of 0.0192764, and t-statistic of 251.235 (p-value = 0).
β^ = 0.0613192, with standard error of 0.000989292, and t-statistic of 61.9829 (p-value = 0).
R2 = 0.991996. Adjusted R2 = 0.991737. s2 = 0.00292827.
DF SumofSq MeanSq FRatio PValue
Regression 1 11.2501 11.2501 3841.88 0
Error 31 .0907763 .00292827
Total 32 11.3408
Durbin Watson Statistic = .160702.
293 See Section 8.2 of Pindyck and Rubinfeld.
294 Data from the Massachusetts Department of Unemployment Assistance (formerly the Division of Unemployment and Training.) These values, released each October 1, affect the Workers’ Compensation benefits paid in the state of Massachusetts.
Here is a graph of all of the data and the exponential regression fitted to the first 33 points,
State Average Weekly Wage = exp[4.84291 + 0.0613192t] = (126.84) (1.06324t):

[Graph: SAWW versus t, together with the fitted exponential curve.]

Residuals:

The residuals of this regression are graphed below:

[Graph: residuals of the exponential regression versus t.]
The graph of the residuals indicates that they are positively serially correlated.
The Durbin-Watson Statistic is .161, strongly indicating positive serial correlation.
DW ≅ 2(1- ρ). ⇒ ρ ≅ 1 - DW/2 = 1 - .161/2 = .92.
The residuals of this regression are: -0.0288821, -0.0283473, -0.0186381, -0.041094,
-0.0508628, -0.0490181, -0.0418413, -0.0386053, -0.0411758, -0.0297904, -0.0142089,
0.019419, 0.0565271, 0.0678447, 0.0693571, 0.0634714, 0.0641823, 0.0719342,
0.0882966, 0.0929009, 0.0649513, 0.0532401, 0.0444065, 0.0239136, -0.00315424,
-0.033589, -0.0511787, -0.0592376, -0.0702188, -0.06283, -0.0213116, -0.0128512,
-0.0836094.

One can estimate the serial correlation coefficient as follows:

ρ = (Σ ε^t-1 ε^t) / (Σ ε^t-1²) = 0.07957/0.08379 = .950.

As discussed previously, this would be the first step in the Cochrane-Orcutt procedure, one
way to correct for serial correlation.
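As an illustration (mine, not from the textbook), this estimate of ρ can be reproduced directly from the residuals listed above:

# rho = sum over t of e[t-1]*e[t], divided by sum over t of e[t-1]^2.
resid = [-0.0288821, -0.0283473, -0.0186381, -0.041094, -0.0508628,
         -0.0490181, -0.0418413, -0.0386053, -0.0411758, -0.0297904,
         -0.0142089, 0.019419, 0.0565271, 0.0678447, 0.0693571,
         0.0634714, 0.0641823, 0.0719342, 0.0882966, 0.0929009,
         0.0649513, 0.0532401, 0.0444065, 0.0239136, -0.00315424,
         -0.033589, -0.0511787, -0.0592376, -0.0702188, -0.06283,
         -0.0213116, -0.0128512, -0.0836094]

num = sum(resid[t-1] * resid[t] for t in range(1, len(resid)))
den = sum(e * e for e in resid[:-1])
print(num, den, num / den)    # about 0.0796, 0.0838, and rho of about .950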

Applying the Cochrane-Orcutt Procedure:295

Continuing the Cochrane-Orcutt procedure,


let X*t = Xt - ρXt-1 = Xt - .95Xt-1 = (1.05, 1.10, 1.15, ..., 2.6).
Y*t = Yt - ρYt-1 = Yt - .95Yt-1, where Y = ln[SAWW] = (4.87535, 4.9372, 5.00823, ... , 6.78284).

Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.


α^(1 - ρ) = 0.264477. ⇒ α^ = 0.264477/(1 - .95) = 5.28954.296 β^ = 0.0482171.

Translate this transformed equation back to the original variables: Yt = α + βXt, and get the resulting residuals ε^t.

Y^ = α^ + β^X = 5.28954 + 0.0482171X = (5.33776, 5.38597, 5.43419, ... , 6.8807).
ε^ = Y - Y^ = (-0.462407, -0.44877, -0.425959, ... , -0.0978662).

Estimate the serial correlation coefficient:

ρ = (Σ ε^t-1 ε^t) / (Σ ε^t-1²) = 2.13757/2.24902 = .950.

Since ρ seems to have converged, we will use the equation


ln[SAWW]: Y^ = 5.28954 + 0.0482171X, together with ρ = .950, in order to make forecasts.

295 A similar result could have been obtained using instead the Hildreth-Lu procedure.
296 When ρ is close to one, the estimated intercept is very sensitive to the estimated intercept of the translated equation. As will be seen, when ρ is close to one, the estimated intercept has only a small effect on the forecast.
Forecasting:

When there is serial correlation in time series, we forecast forwards one step at a time,
starting at the last available data point.

In this case the last data point used in the regression was a SAWW of 882.57, corresponding
to data from 4/1/01 to 3/31/02 or X = 33. We want to forecast the SAWW corresponding to the
next annual period 4/1/02 to 3/31/03 or X = 34.297

One possible estimate of ln[SAWW] is the last observed point plus the slope:
ln[882.57] + 0.0482171 = 6.7828 + 0.0482171 = 6.8310.

Another possible estimate is obtained by using the regression equation:


Y^ = 5.28954 + 0.0482171X = 5.28954 + (0.0482171)(34) = 6.9289.

We weight these two estimates, with weight ρ being given to the last observation and weight
1 - ρ being given to the regression estimate:298
(.95)(6.8310) + (.05)(6.9289) = 6.8359.

This is equivalent to using the following equation:299


Y^T+1 = ρYT + α^(1 - ρ) + β^(XT+1 - ρXT)
= (.95)(6.7828) + (5.28954)(1 - .95) + 0.0482171{34 - (.95)(33)} = 6.8359.
Corresponding to an SAWW of exp[6.8359] = 930.68.

Continuing, we can forecast the value for X = 35 using the value forecasted for X = 34:
(.95)(6.8359) + (5.28954)(1 - .95) + 0.0482171{35 - (.95)(34)} = 6.8887.
Corresponding to an SAWW of exp[6.8887] = 981.13.

In general, with serial correlation one successively forecasts ahead one time period at a time:
Forecast for time t+1 = ρ^(Forecast for time t) + α^(1 - ρ^) + β^(t+1 - ρ^t)
= ρ^(β^ + Forecast for time t) + (1 - ρ^){α^ + β^(t+1)},
where α^, β^, and ρ^ have been estimated using one of the procedures to correct for serial
correlation, the Cochrane-Orcutt procedure or the Hildreth-Lu procedure.

Exercise: Forecast the values for X = 36, and then 37, using the value forecasted for X = 35.
[Solution: (.95)(6.8887) + (5.28954)(1 - .95) + 0.0482171{36 - (.95)(35)} = 6.9413.
Corresponding to an SAWW of exp[6.9413] = 1034.11.
(.95)(6.9413) + (5.28954)(1 - .95) + 0.0482171{37 - (.95)(36)} = 6.9937.
Corresponding to an SAWW of exp[6.9937] = 1089.75.]
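As an illustration (mine, not from the textbook), the one-step-at-a-time forecasting can be scripted; the loop below reproduces the forecasts for times 34 through 37:

from math import exp

# Forecast(t+1) = rho*Forecast(t) + alpha*(1 - rho) + beta*(t+1 - rho*t),
# starting from the last observed value of ln[SAWW].
alpha, beta, rho = 5.28954, 0.0482171, 0.95
y, t = 6.7828, 33                      # ln(882.57), at time 33

for _ in range(4):                     # forecast times 34, 35, 36, 37
    y = rho * y + alpha * (1 - rho) + beta * (t + 1 - rho * t)
    t += 1
    print(t, round(y, 4), round(exp(y), 2))
# matches 930.68, 981.13, 1034.11, and 1089.75 above, subject to rounding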
297 Ignoring for now that we actually have this data point.
298 This is related to the ideas behind the use of credibility. See the sequential approach to credibility in “Mahler’s Guide to Buhlmann Credibility and Bayesian Analysis.”
299 See equation 8.34 in Pindyck and Rubinfeld.
Comparing Forecasted and Actual SAWW:

X    Actual SAWW    Forecast Corrected for Serial Correlation    Forecast from Original Exponential Regression

34 884.46 930.68 1036.37


35 918.78 981.13 1101.91
36 958.58 1034.11 1171.60
37 ??? 1089.75 1245.69

In this case, the forecasts are not very good. After decades of steady increase, the last few
values in the series did not follow that pattern. Changes in economic conditions can result in
changes in the movement of time series such as this one.300

Variance of Forecasts:*301

With serial correlation, one can use similar techniques to those discussed previously, in order
to get the variance of the forecasted expected values and the mean squared error of the
forecast. However, we need to work with the regression on the transformed variables, X* and
Y*.

Y*i+1 = Yi+1 - ρYi. ⇒ Yi+1 = Y*i+1 + ρYi.

Where α*^ and β^ are the estimated coefficients for the regression on the transformed
variables, X* and Y*, one can write the forecasting equations as:

Y^T+1 = ρYT + Y*^T+1 = ρYT + α*^ + β^X*T+1 = ρYT + α^(1 - ρ) + β^(XT+1 - ρXT).
Y^T+2 = ρY^T+1 + Y*^T+2 = ρ²YT + (1 + ρ)α*^ + β^(X*T+2 + ρX*T+1).
Y^T+3 = ρY^T+2 + Y*^T+3 = ρ³YT + (1 + ρ + ρ²)α*^ + β^(X*T+3 + ρX*T+2 + ρ²X*T+1).

When the Cochrane-Orcutt Procedure was applied to the SAWW example, the fitted
regression was Y* = 0.264477 + 0.0482171X*, with ρ = .95.
The last observed value of ln[SAWW], YT = 6.7828, corresponds to X = 33.

X*T+1 = 34 - (.95)(33) = 2.65. X*T+2 = 35 - (.95)(34) = 2.7. X*T+3 = 36 - (.95)(35) = 2.75.


Y^T+1 = ρYT + α*^ + β^X*T+1 = (.95)(6.7828) + .264477 + (0.0482171)(2.65) = 6.8359.

300 One should always bear in mind that when one uses regression models for forecasting, one is assuming that the pattern in the past will more or less continue into the future.
301 See pages 216 and 217 of Pindyck and Rubinfeld.

Y^T+2 = ρ²YT + (1 + ρ)α*^ + β^(X*T+2 + ρX*T+1) =
(.95²)(6.7828) + (1.95)(.264477) + (0.0482171){2.7 + (.95)(2.65)} = 6.8888.
Y^T+3 = ρ³YT + (1 + ρ + ρ²)α*^ + β^(X*T+3 + ρX*T+2 + ρ²X*T+1) =
(.95³)(6.7828) + (1 + .95 + .95²)(.264477) + (0.0482171){2.75 + (.95)(2.7) + (.95²)(2.65)} = 6.9414.
These match the forecasts gotten previously, subject to rounding.

Treating the serial correlation coefficient, ρ, as known, one can determine the variances of
the forecasted expected values:302
Var[Y^T+1] = Var[α*^] + X*T+1²Var[β^] + 2X*T+1Cov[α*^, β^].
Var[Y^T+2] = (1 + ρ)²Var[α*^] + (X*T+2 + ρX*T+1)²Var[β^] + 2(1 + ρ)(X*T+2 + ρX*T+1)Cov[α*^, β^].
Var[Y^T+3] = (1 + ρ + ρ²)²Var[α*^] + (X*T+3 + ρX*T+2 + ρ²X*T+1)²Var[β^]
+ 2(1 + ρ + ρ²)(X*T+3 + ρX*T+2 + ρ²X*T+1)Cov[α*^, β^].

For the regression on the transformed variables, the covariance matrix of α*^ and β^ is:
( 0.00022733    -0.000117075 )
(-0.000117075    0.0000641505)

Var[Y^T+1] = Var[α*^] + X*T+1²Var[β^] + 2X*T+1Cov[α*^, β^] =
0.00022733 + (2.65²)(0.0000641505) + (2)(2.65)(-0.000117075) = 0.00005733.

Var[Y^T+2] = (1 + ρ)²Var[α*^] + (X*T+2 + ρX*T+1)²Var[β^] + 2(1 + ρ)(X*T+2 + ρX*T+1)Cov[α*^, β^] =
(1.95²)(0.00022733) + {2.7 + (.95)(2.65)}²(0.0000641505) +
(2)(1.95){2.7 + (.95)(2.65)}(-0.000117075) = 0.00022848.

Var[Y^T+3] = (1 + .95 + .95²)²(0.00022733) + {2.75 + (.95)(2.7) + (.95²)(2.65)}²(0.0000641505) +
(2)(1 + .95 + .95²){2.75 + (.95)(2.7) + (.95²)(2.65)}(-0.000117075) = 0.00051241.

In order to get mean squared errors for the forecasts, we add s2 = .00043751, from the
regression on the transformed variables.
MSE Forecast forward one period: .00043751 + 0.00005733 = .00049484.
MSE Forecast forward two periods: .00043751 + 0.00022848 = .00066599.
MSE Forecast forward three periods: .00043751 + 0.00051241 = .00094992.
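As an illustration (mine, not from the textbook), the one-period-ahead piece of this calculation can be checked in a few lines of Python:

# Variance of the forecast one period ahead, treating rho as known:
# Var[Y_{T+1}] = Var[a*] + (X*_{T+1})^2 Var[b] + 2 X*_{T+1} Cov[a*, b].
var_a, var_b, cov_ab = 0.00022733, 0.0000641505, -0.000117075
s2  = 0.00043751                 # from the regression on the transformed variables
rho = 0.95

x_star = 34 - rho * 33           # X*_{T+1} = 2.65
var_forecast = var_a + x_star**2 * var_b + 2 * x_star * cov_ab
print(var_forecast, s2 + var_forecast)   # about 0.0000573 and 0.000495

The forecasts two and three periods ahead follow the same pattern, using the coefficients shown above.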

302 YT is the last observed value, which is a known constant, and does not contribute to the variance.
The regression on the transformed variables had 33 - 1 = 32 values, and 32 - 2 = 30 degrees
of freedom. For a 95% confidence interval, from the t-table one takes ±2.042 standard
deviations.

Thus 95% confidence intervals for the observed ln[SAWW] are:


Forecast forward 1 period: 6.8359 ± 2.042√.00049484 = 6.8359 ± .0454 = (6.7905, 6.8813).
Forecast forward 2 periods: 6.8888 ± 2.042√.00066599 = 6.8888 ± .0527 = (6.8361, 6.9415).
Forecast forward 3 periods: 6.9414 ± 2.042√.00094992 = 6.9414 ± .0629 = (6.8785, 7.0043).

Therefore, 95% confidence intervals for the observed SAWW are:


Forecast forward one period: (889.36, 973.89).303
Forecast forward two periods: (930.85, 1034.32).304
Forecast forward three periods: (971.17, 1101.36).

An Actuarial Procedure:*

When dealing with such economic time series, an actuary might have fit a regression solely
to estimate the annual percentage rate of increase, and then applied that annual increase to
the latest observed point.

Using the original regression, the rate of increase is 1.06324, or 6.324% per year.
Using the regression adjusted for serial correlation, the rate of increase is exp[0.0482171] =
1.04940, or 4.940% per year.
Applying these increases to the latest observed value of 882.57 (at X = 33):

X Actual SAWW Forecast at 4.940% Forecast at 6.324%


33 882.57
34 884.46 926.17 938.38
35 918.78 971.92 997.73
36 958.58 1019.93 1060.82
37 ??? 1070.32 1127.91

When ρ is very close to one, and we are predicting forwards only a few periods, this actuarial
procedure gives a result similar to that gotten previously.

Using this previous procedure, with ρ = .95, the forecasts for ln[SAWW] were:
(Forecast for time 34) = .95(Y for time 33) + .05α^ + 2.65β^.
(Forecast for time 35) = .95(Forecast for time 34) + .05α^ + 2.7β^
= .9025(Y for time 33) + .0975α^ + 5.2175β^.
(Forecast for time 36) = .95(Forecast for time 35) + .05α^ + 2.75β^
= .857375(Y for time 33) + .142625α^ + 7.706625β^.
303 Actual SAWW was 884.46, outside this 95% confidence interval.
304 Actual SAWW was 918.78, outside this 95% confidence interval. Given that the previous value was much less than its forecast, one would have expected this value to also be significantly less than its forecast.
Using the actuarial procedure, the forecasts of ln[SAWW] are:
(Forecast for time 34) = (Y for time 33) + β^.
(Forecast for time 35) = (Y for time 33) + 2β^.
(Forecast for time 36) = (Y for time 33) + 3β^.

The difference between the forecasts at time 36 (3 periods ahead) from this actuarial
procedure and the previous procedure is:
(Y for time 33) + 3β^ - {.857375(Y for time 33) + .142625α^ + 7.706625β^} =
.142625{(Y for time 33) - (α^ + 33β^)} = (1 - ρ³){(Y at time 33) - (Y^ at time 33)} =
(1 - ρ³)(regression residual at time 33).

The difference between the forecasts J periods ahead from this actuarial procedure and the
previous procedure is: (1 - ρJ)(regression residual at the latest observation).
Thus for ρ close to one and J small, the actuarial procedure and the correct procedure give
similar results, provided the regression residual at the last observation is not too large. If the
regression exactly matches the last observation, then the results of using the two procedures
are the same.
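As a quick numerical check (my own illustration, not from the textbook), the following Python sketch compares the two procedures for the SAWW example and confirms this relationship:

# Difference between the actuarial procedure and the procedure corrected for
# serial correlation, J periods ahead, versus (1 - rho^J)*(residual at time 33).
alpha, beta, rho = 5.28954, 0.0482171, 0.95
y33, t = 6.7828, 33
residual_33 = y33 - (alpha + beta * t)        # about -0.098

correct = y33
for j in range(1, 4):
    correct = rho * correct + alpha * (1 - rho) + beta * (t + j - rho * (t + j - 1))
    actuarial = y33 + j * beta                # latest value plus j times the slope
    print(j, actuarial - correct, (1 - rho**j) * residual_33)   # the two columns agree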
Problems:

Use the following information for the next 3 questions:


One has a time series of 3 years of monthly points, t = 1, 2, 3, ..., 36, and
Y = 88.3, 89.5, 91.1, ..., 135.7.
Using the Cochrane-Orcutt procedure, in order to adjust for serial correlation, the serial
correlation has been estimated at 0.80, and a regression has been fit: Y = 87.2 + 1.4 t.

35.1 (2 points) Forecast the value of Y for t = 37.


A. 137.5 B. 137.7 C. 137.9 D. 138.1 E. 138.2

35.2 (2 points) Forecast the value of Y for t = 38.


A. 138.8 B. 139.0 C. 139.2 D. 139.4 E. 139.6

35.3 (2 points) Forecast the value of Y for t = 39.


A. 140.2 B. 140.4 C. 140.6 D. 140.8 E. 141.0

Use the following information for the next 4 questions:


The population (in millions) of the United States was:
year 1900 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000
population 76.2 92.2 106.0 123.2 132.2 151.3 179.3 203.3 226.5 249.6 281.4

35.4 (4 points) Fit an exponential regression to this data, ln(population) = α + βt.


Use the model to predict the population (in millions) in 2030.
A. 405 B. 410 C. 415 D. 420 E. 425

35.5 (3 points) What is the Durbin-Watson Statistic for this regression?


A. 1 B. 5/4 C. 3/2 D. 7/4 E. 2

35.6 (9 points) Apply the Cochrane-Orcutt procedure, in order to correct for serial correlation.
Use the revised model to predict the population (in millions) in 2010, 2020, and 2030.
Use a computer to help you with the calculations.

35.7 (9 points) Apply the Hildreth-Lu procedure, in order to correct for serial correlation.
Use the revised model to predict the population (in millions) in 2010, 2020, and 2030.
Use a computer to help you with the calculations.
Use the following information for the next 4 questions:
One has 10 years of monthly values of the Consumer Price Index for All Urban Consumers,
t = 1, 2, 3, ..., 120. The value of the Price Index for t = 120 is 190.3.
Using the Hildreth-Lu procedure, in order to adjust for serial correlation, the serial correlation
has been estimated as 0.90, and a linear regression has been fit: Y* = 14.981 + .33590 t*,
where Y* = Yi+1 - .9Yi, and t* = i+1 - .9i.
For this regression, R2 = .884 and s2 = .1775.

35.8 (2 points) Forecast the value of Y (the price index) for t = 121.
(A) Less than 190.4
(B) At least 190.4, but less than 190.5
(C) At least 190.5, but less than 190.6
(D) At least 190.6, but less than 190.7
(E) At least 190.7

35.9 (2 points) Forecast the value of Y for t = 122.


(A) Less than 190.7
(B) At least 190.7, but less than 190.8
(C) At least 190.8, but less than 190.9
(D) At least 190.9, but less than 191.0
(E) At least 191.0

35.10 (2 points) Forecast the value of Y for t = 123.


(A) Less than 190.9
(B) At least 190.9, but less than 191.0
(C) At least 191.0, but less than 191.1
(D) At least 191.1, but less than 191.2
(E) At least 191.2

35.11 (3 points) Determine a 95% confidence interval for the value of the price index
observed at t = 121. Hint: the sum of i² for i = 1 to n is n(n + 1)(2n + 1)/6.
Mahler’s Guide to
Regression
Sections 36-43:
36 Standardized Coefficients
37 Elasticity
38 Partial Correlation Coefficients
39 * Regression Diagnostics
40 * Stepwise Regression
41 * Stochastic Explanatory Variables
42 * Generalized Least Squares
43 Nonlinear Estimation

VEE-Applied Statistical Methods Exam


prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-J

New England Actuarial Seminars Howard Mahler


POB 315 hmahler@mac.com
Sharon, MA, 02067
www.neas-seminars.com

Section 36, Standardized Coefficients305


In the two variable regression model, X and Y may be in different units. For example X may be
the rate of unemployment and Y might be the workers’ compensation insurance claim
frequency. In multiple regression some of the independent variables may be in different units.
For example, X1 might be in the mean annual income and X2 might be the unemployment rate.
Even two variables both in dollars might have significantly different means and variances, for
example the annual cost of Homeowners losses due to theft and the annual cost of
Homeowners losses due to hurricanes.

Therefore, often one standardizes the variables, each independent variable as well as the
dependent variable, prior to performing a regression. To standardize a variable, one
subtracts its mean and then divides each variable by its standard deviation.306
The standardized variables then each have a mean of 0 and a standard deviation of 1.

Since the standardized variables each have a mean of 0, the intercept vanishes from the
regression equation (the intercept is zero.)

Two Variable Model:

In the two variable regression model X∗ = (X - X̄)/sX, and Y∗ = (Y - Ȳ)/sY.


The regression model becomes: Y∗ = β∗X∗ + ε.
In standardized form, the regression goes through the origin; the intercept is zero.

The slope of the standardized regression is related to that of the original regression:
β^* = β^ sX/sY.
The slope is ∆y/∆x, so to standardize we divide the numerator by sY and divide the
denominator by sX; we multiply β^ by sX/sY.

Exercise: You fit a two variable regression Y = α + βt + ε to the data:


Year (t) Loss Ratio (Y)
1 82
2 78
3 80
4 73
5 77
With result β^ = Σxiyi/Σxi2 = -15/10 = -1.5, and α^ = Ȳ - β^X̄ = 78 - (-1.5)(3) = 82.5.
What is the standardized regression equation?
[Solution: sX2 = Σxi2/(5 -1) = 10/4 = 2.5. sY2 = Σyi2/(5 -1) = (16 + 0 + 4 + 25 + 1)/4 = 11.5.
β^* = β^sX/sY = -1.5(2.5/11.5)1/2 = -.70. The standardized regression is: Y∗ = -.70X∗.]
305 See Section 4.5.1 of Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.
306 This is the same way one standardizes a variable in order to use the Standard Normal Table.

Therefore, a year with time one standard deviation above the average time of 3, has a fitted
loss ratio 0.7 standard deviation less than average. For example, (6 - 3)/√2.5 = 1.90. Therefore,
the forecasted loss ratio for year 6 is: (1.90)(.70) = 1.33 standard deviation below the mean
loss ratio. This is: 78 - 1.33√11.5 = 73.5.

Note that this is the same result obtained using the original regression: 82.5 + (6)(-1.5) = 73.5.
Thus while a standardized regression allows one to better interpret or compare the impacts of
variables, the fitted values are the same.

Note that in this example, the (simple) correlation of X and Y is:
rXY = {Σxiyi/(N-1)}/√{(Σxi2/(N-1))(Σyi2/(N-1))} = Σxiyi/√{Σxi2 Σyi2} = -15/√{(10)(46)} = -.70 = β^*.

In general, for the two variable regression, the standardized slope is equal to the correlation of X and Y: β^* = rXY.
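As a quick numerical check (illustrative only, using Python with numpy; not part of the Syllabus reading), the following sketch standardizes the loss ratio data from the exercise above and verifies that the slope of the no-intercept regression of Y∗ on X∗ equals the sample correlation of X and Y.

```python
import numpy as np

t = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([82, 78, 80, 73, 77], dtype=float)

# Standardize each variable: subtract its mean, divide by its sample standard deviation.
t_star = (t - t.mean()) / t.std(ddof=1)
y_star = (y - y.mean()) / y.std(ddof=1)

# Slope of the no-intercept regression of Y* on X*.
beta_star = (t_star * y_star).sum() / (t_star * t_star).sum()

print(beta_star)                # about -0.70
print(np.corrcoef(t, y)[0, 1])  # the same value: the standardized slope equals rXY
```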

Multiple Regression:

In the case of the multiple regression model, in a similar manner the standardized slopes are:
β^*2 = β^2 sX2/sY = β^2 √(Σx2i2/Σyi2), β^*3 = β^3 sX3/sY = β^3 √(Σx3i2/Σyi2), etc.

One can compare the standardized coefficients directly. The larger the absolute value of
the standardized coefficient, the more important the corresponding variable is in
determining the value of Y.307

Three Variable Model:

In the case of three variables, as shown below, the standardized slopes can be written in terms of correlations:
β^*2 = (rYX2 - rYX3 rX2X3) / (1 - rX2X3²).
β^*3 = (rYX3 - rYX2 rX2X3) / (1 - rX2X3²).

If X2 and X3 are uncorrelated, then rX2X3 = 0, and β*2 = rYX2 and β*3 = rYX3.

β^2 = {Σx2iyi Σx3i² - Σx3iyi Σx2ix3i} / {Σx2i² Σx3i² - (Σx2ix3i)²}
= {Σx2iyi - Σx3iyi Σx2ix3i/Σx3i²} / {Σx2i² - (Σx2ix3i)²/Σx3i²}
= (rYX2 sX2 sY - rYX3 sY rX2X3 sX2) / {sX2² - rX2X3² sX2²} = sY(rYX2 - rYX3 rX2X3) / {sX2(1 - rX2X3²)}.

β^*2 = β^2 sX2/sY = (rYX2 - rYX3 rX2X3) / (1 - rX2X3²).

By symmetry, the similar result holds: β^*3 = (rYX3 - rYX2 rX2X3) / (1 - rX2X3²).

307
However, one must keep in mind that variables may interact in determining the value of Y.

t-statistics:

t = β^2/sβ^2. In standardized form, β^*2 = β^2 sX2/sY, and its standard error is sβ^2 sX2/sY.
Therefore, the t-statistics for the regression in standardized form are the same as for the
original regression.

The Idea Behind Standardizing:*

Let us assume that one of the independent variables is in feet. What if instead of using feet we
were to express it in terms of inches? Then all of the values of this independent variable would
be multiplied by 12, and its coefficient in the model would be divided by 12.308 However, in
terms of inches the standard deviation of this independent variable would be multiplied by 12.
The standardized coefficient of this variable would be unaffected.

The same result would hold for a change of scale in the dependent variable. If the dependent
variable were expressed in inches rather than feet, then the slopes would all be multiplied by
12. However, the sample standard deviation would also be multiplied by 12. Therefore, the
standardized coefficients would be unaffected.

In general, standardized coefficients are unaffected by changes in scale of the variables. The
standardized coefficients can be written in terms of correlations, dimensionless quantities
unaffected by changes of scale. Therefore, the standardized coefficients are dimensionless
quantities, unaffected by changes of scale.

Exercise: Assume the exchange rate is 0.8 dollars per euro.
Assume that when the independent variable X2 is expressed in euros, β^2 = 100 and β^*2 = 30.
If instead X2 is expressed in dollars, what are the values of β^2 and β^*2?
[Solution: β^2 is divided by 0.8. The new value of the coefficient is: 100/0.8 = 125.
sX2 is multiplied by 0.8. β^*2 = β^2 sX2/sY remains unchanged at 30.]

For example, assume X2 is in dollars and Y is in tons.
Then, β^2 is in units of tons per dollar, just as ∆Y/∆X2.
sX2 is in dollars, just as X2. sY is in tons, just as Y.
Therefore, β^*2 = β^2 sX2/sY is a pure number.

In general, standardized variables and standardized coefficients are unit-less numbers.

308
The coefficient is like ∆Y/∆X. If all the X values are multiplied by 12, then the slope is divided by 12.

Problems:

36.1 (3 points) Data on 20 observations have been fit:
Y^ = 1470.3 + 0.8145X2 + 0.8204X3 + 13.5287X4.
X̄2 = 9213, X̄3 = 35311.3, X̄4 = 1383.35, Ȳ = 56660.
sα^ = 5746. sβ^2 = .5122. sβ^3 = .2112. sβ^4 = .5857.
Var[X2] = 28.98 million. Var[X3] = 236.4 million. Var[X4] = .2169 million.
Var[Y] = 514.9 million.
Determine the equivalent model if each of the variables is standardized by subtracting its
mean and dividing by its standard deviation.

36.2 (3 points) The following linear regression has been fit:


Y^ = 177.703 - 0.715143X2 - 0.873252X3 + 31.2728X4 - 17.8078X5 + 9.98376X6.
The variance-covariance matrix of the variables in the regression is:
Y X2 X3 X4 X5 X6
Y 317.564 -11.2631 -6.42264 6.12579 -0.311231 1.33801
X2 -11.2631 3.90916 0.862264 -0.231626 -0.0822102 -0.193801
X3 -6.42264 0.862264 16.4044 0.307817 -0.0320755 -0.168104
X4 6.12579 -0.231626 0.307817 0.226415 0.0226415 -0.0449236
X5 -0.311231 -0.0822102 -0.0320755 0.0226415 0.0622642 0.000269542
X6 1.33801 -0.193801 -0.168104 -0.0449236 0.000269542 0.246631
Based on the standardized coefficients, which of the variables is the single most important
determinant of Y?
A. X2 B. X3 C. X4 D. X5 E. X6

36.3 (3 points) The correlation of X2 and X3 is -.2537.


The correlation of X2 and Y is .7296. The correlation of X3 and Y is .3952.
For the model Y = β1 + β2X2 + β3X3 + ε, determine the value of β*3 , the standardized coefficient
associated with X3.
(A) 0.56 (B) 0.58 (C) 0.60 (D) 0.62 (E) 0.64

36.4 (3 points) You are given the following 5 observations:


X: 1 2 3 4 5
Y: 202 321 404 480 507
Determine the linear regression model if each of the variables is standardized by subtracting
its mean and dividing by its standard deviation.

36.5 (2 points) The following linear regression has been fit:


Y^ = 6.3974 + 2.4642X2 + 6.4560X3 + 1.1839X4.
The variance-covariance matrix of the variables in the regression is:
Y X2 X3 X4
Y 1527.64 162.893 121.071 181.714
X2 162.893 64.9821 -13.9643 78.5714
X3 121.071 -13.9643 27.6429 -19.4286
X4 181.714 78.5714 -19.4286 96.0000
β^*j is the standardized regression coefficient associated with Xj.
Which of the following is correct?
(A) β^*2 > β^*3 > β^*4
(B) β^*2 > β^*4 > β^*3
(C) β^*3 > β^*4 > β^*2
(D) β^*4 > β^*2 > β^*3
(E) None of A, B, C, or D

36.6 (3 points) Given the following information:


ΣXi = 128. ΣYi = 672. ΣXi2 = 1853. ΣYi2 = 14,911. ΣXiYi = 4120. N = 40.
Determine β^* , the standardized regression coefficient.
A. Less than 0.60
B. At least 0.60, but less than 0.70
C. At least 0.70, but less than 0.80
D. At least 0.80, but less than 0.90
E. At least 0.90

36.7 (3 points) You are given the following observations:


X: 40 60 80 100 120 140 160
Y: 15.9 18.8 21.6 25.2 28.7 30.4 30.9
Determine the value of β∗, the standardized regression slope.
A. 0.965 B. 0.970 C. 0.975 D. 0.980 E. 0.985

36.8 (4, 11/00, Q.37) (2.5 points) Data on 28 home sales yield the fitted model:
Y^ = 43.9 + 0.238X2 - 0.000229X3 + 0.14718X4 - 6.68X5 - 0.269X6.
where
Y = sales price of home
X2 = taxes
X3 = size of lot
X4 = square feet of living space
X5 = number of rooms
X6 = age in years
You are given that the estimated variance-covariance matrix (lower-triangular portion) of the
variables in the regression is:
Y X2 X3 X4 X5 X6
Y 20,041.4
X2 36,909.0 80,964.2
X3 229,662.6 439,511.8 5,923,126.9
X4 71,479.2 129,032.9 907,497.1 300,121.4
X5 127.2 244.5 1,589.3 532.5 1.3
X6 –585.4 –1,420.5 –12,877.4 –1,343.4 0.2 190.9
β^*j is the standardized regression coefficient associated with Xj.
Which of the following is correct?
(A) β^*2 > β^*3 > β^*4
(B) β^*2 > β^*4 > β^*3
(C) β^*3 > β^*4 > β^*2
(D) β^*4 > β^*2 > β^*3
(E) β^*4 > β^*3 > β^*2

36.9 (4, 5/01, Q.13) (2.5 points) Applied to the model Yi = β1 + β2X2i + β3X3i + εi, the method
of least squares implies:
Σ(Yi - Ȳ)(X2i - X̄2) = β2Σ(X2i - X̄2)² + β3Σ(X2i - X̄2)(X3i - X̄3)
Σ(Yi - Ȳ)(X3i - X̄3) = β2Σ(X2i - X̄2)(X3i - X̄3) + β3Σ(X3i - X̄3)².
You are given:
(i) rYX2 = Σ(Yi - Ȳ)(X2i - X̄2) / √{Σ(Yi - Ȳ)² Σ(X2i - X̄2)²} = 0.4.
(ii) rYX3 = Σ(Yi - Ȳ)(X3i - X̄3) / √{Σ(Yi - Ȳ)² Σ(X3i - X̄3)²} = 0.9.
(iii) rX2X3 = Σ(X2i - X̄2)(X3i - X̄3) / √{Σ(X2i - X̄2)² Σ(X3i - X̄3)²} = 0.6.
Determine the value of β*2 , the standardized coefficient associated with X2.
(A) –0.7 (B) –0.2 (C) 0.3 (D) 0.8 (E) 1.0

36.10 (2 points) In the previous question, determine the value of β*3 , the standardized
coefficient associated with X3.
(A) –0.7 (B) –0.2 (C) 0.3 (D) 0.8 (E) 1.0

Section 37, Elasticity309


The elasticity measures the percent change in the dependent variable for a
given percent change in an independent variable. In regressions, the elasticity
is most commonly calculated at the mean value of each variable.

For example, in the heights regression example, we had


X = Height of father. X̄ = 59.250.
Y = Height of son. Ȳ = 61.125.
Fitted regression model: Y^ = 24.1 + .625X.

Near the mean, the percentage change in height of a father is approximately: ∆X/X̄.
Near the mean, the percentage change in height of a son is approximately: ∆Y/Ȳ.
For this model, ∆Y/∆X = β^ = .625.
(percent change in height of a son)/(percent change in height of a father) ≅ .625 X̄/Ȳ = (.625)(59.250/61.125) = .61.

Thus in this case, near the mean, a 1% change in the height of the father is expected to result
in about a 0.61% change in the height of the son.

For a linear regression model, the coefficient of elasticity related to the jth variable is:
Ej = β^j X̄j / Ȳ.

Exercise: In the three variable regression example involving the loss ratios of agents:
X̄2 = 52.0. X̄3 = 21.5. Ȳ = 73.6. Y^ = 62.3 + .126X2 + .222X3.
What are the elasticities?
[Solution: E2 = β^2 X̄2/Ȳ = (.126)(52.0)/73.6 = .089. E3 = β^3 X̄3/Ȳ = (.222)(21.5)/73.6 = .065.]
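Computationally, the elasticities at the means are just elementwise arithmetic. Below is a minimal Python sketch using the fitted slopes and means from the exercise above (numpy is used only for convenience; illustrative only).

```python
import numpy as np

# Fitted slopes and means from the agents' loss ratio example above.
beta = np.array([0.126, 0.222])    # coefficients of X2 and X3
xbar = np.array([52.0, 21.5])      # means of X2 and X3
ybar = 73.6                        # mean of Y

elasticities = beta * xbar / ybar  # E_j = beta_j * Xbar_j / Ybar
print(elasticities)                # approximately [0.089, 0.065]
```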

For example, assume X2 is in dollars and Y is in tons.
Then, β^2 is in units of tons per dollar, just as ∆Y/∆X2.
X̄2 is in dollars, just as X2. Ȳ is in tons, just as Y.
Therefore, E2 = β^2 X̄2/Ȳ is a pure number.

In general, elasticities are unit-less numbers.


Elasticities may be any real number, positive, negative, or zero.

Large absolute value of an elasticity ⇔


the dependent variable is responsive to changes in that independent variable.

309
See Section 4.5.2 of Pindyck and Rubinfeld.

Elasticities at Other Values than the Mean:*

While Pindyck and Rubinfeld concentrate on the elasticity at the mean of the variables, one
can also look at the elasticity at other values of the variables.

Ei = (∂Y/∂Xi)(Xi/Y).

For example, let Y = 10 + 3X2 - 5X3.
Then E2 = 3X2/(10 + 3X2 - 5X3), and E3 = -5X3/(10 + 3X2 - 5X3).

The values of these elasticities depend on the values of X2 and X3. For example, at X2 = 4
and X3 = 3, E2 = 12/(10 + 12 - 15) = 1.71, and E3 = -15/(10 + 12 - 15) = -2.14.
At X2 = 10 and X3 = -3, E2 = 30/(10 + 30 + 15) = 0.545, and E3 = 15/(10 + 30 + 15) = 0.273.
Thus the percentage change in the dependent variable due to a percentage change in an
independent variable, depends on where in the domain of the regression we look.

More Complicated Models:

More generally the elasticity of Y with respect to Xi is:


(% change in Y)/(%change in Xi) = (∆Y/Y)/(∆Xi/Xi) ≅ (∂ Y/∂ Xi)(Xi/Y).310

For example, assume a regression model has been fit:


Y = 7 + 3X2 + 4X3, with X̄2 = 10 and X̄3 = 5. ⇒ Ȳ = 7 + 30 + 20 = 57.
At the means, the elasticity of Y with respect to X2 is: (∂Y/∂X2)(X̄2/Ȳ) = 3(10/57) = 0.526.
For this linear model, one could instead calculate E2 = β^2 X̄2/Ȳ = (3)(10/57) = 0.526.
For a linear model, ∂Y / ∂Xi = βi, and therefore the more general definition of elasticity reduces
to that given previously for the linear model.

Exercise: At the means, what is the elasticity of Y with respect to X3?


[Solution: (∂Y / ∂X3)( X 3/ Y ) = 4(5/57) = .351.]

For a different data set, assume a model with an interactive term:


Y = 10 - 5X2 + 3X3 + 2X2X3, with X 2 = 8, X 3 = 11, and Y = 200.
∂Y / ∂X2 = -5 + 2X3. Thus, the elasticity of Y with respect to X2 depends on the level of X3.
One could either determine the elasticity Y with respect to X2 for a stated value of X3 or use the
average value of X3.
For X3 = 11, ∂Y / ∂X2 = 17, and the elasticity Y with respect to X2 is: (17)(8/200) = .680.

Exercise: What is the elasticity of Y with respect to X3, at the average level of X2?
[Solution: ∂Y / ∂X3 = 3 + 2X2 = 3 + (2)(8) = 19. (∂Y / ∂X3)( X 3/ Y ) = (19)(11/200) = 1.045.]

310
This the manner in which elasticities are usually calculated in economics.

For yet another data set, assume the model:


ln Y = 4 + 0.5lnX2 + 0.3lnX3.
Y = exp[ 4 + 0.5lnX2 + 0.3lnX3].
∂Y/∂X2 = Y(0.5/X2). Evaluating at the means,
the elasticity of Y with respect to X2 is: (∂Y/∂X2)(X̄2/Ȳ) = 0.5.

When a model is estimated in logarithms rather than in levels, the variable


coefficients can be interpreted as elasticities.

Exercise: At the means, what is the elasticity of Y with respect to X3?


[Solution: ∂Y/∂X3 = Y(0.3/X3). (∂Y/∂X3)(X̄3/Ȳ) = 0.3.]

elasticity = (∆Y/ Y )/(∆Xi/ X i) ≅ (∆ lnY)/(∆ lnXi) ≅ (∂ lnY / ∂ lnXi).


elasticity ≅ change in lnY per change in lnXi.

Standardized Coefficients versus Elasticities:*

Both standardized coefficients and elasticities are unit-less quantities that are useful to
measure which independent variable is most important in a multiple regression.

When there is a multiplicative relationship between the variables, in other words a linear model in terms of their logarithms, then it makes sense to use elasticities, since they measure the percentage change in the dependent variable for a percentage change in an independent variable. When a multiplicative relationship holds, the variable coefficients can be interpreted as elasticities.

If on the other hand, the relationship is additive/linear, then it makes somewhat more sense to use standardized coefficients, since they measure the change in the dependent variable (in standard deviations) for a change of one standard deviation in an independent variable.

Problems:

37.1 (2 points) For a four variable linear regression model:
X̄2 = 8.875. X̄3 = -0.750. X̄4 = 10.000. Ȳ = 35.250.
β^1 = 6.3974, β^2 = 2.4626, β^3 = 6.4560, β^4 = 1.1839.
sβ^1 = 9.27, sβ^2 = 7.92, sβ^3 = 1.34, sβ^4 = 6.64.
Rank the absolute values of the elasticities, at the means of each variable, from smallest to
largest.
A. |E2| < |E3| < |E4|
B. |E2| < |E4| < |E3|
C. |E3| < |E2| < |E4|
D. |E4| < |E3| < |E2|
E. None of A, B, C, or D.

37.2 (2 points) Data on 20 observations have been fit:
Y^ = 1470.3 + 0.8145X2 + 0.8204X3 + 13.5287X4.
X̄2 = 9213, X̄3 = 35311.3, X̄4 = 1383.35, Ȳ = 56660.
sα^ = 5746. sβ^2 = .5122. sβ^3 = .2112. sβ^4 = .5857.
Var[X2] = 28.98 million. Var[X3] = 236.4 million. Var[X4] = .2169 million.
Var[Y] = 514.9 million.
Determine the elasticities, at the means of each variable.

Use the following information for the next three questions:


The following model has been fit via least squares:
ln Yi = -4.30 - .002D2i + .336 ln(X3i) + .384 X4i + .067D5i - .143D6i + .081D7i + .134 ln(X8i),
where D2, D5, D6, and D7 are dummy variables.

37.3 (1 point) Estimate the elasticity of Y with respect to X3.


(A) .134 (B) .336 (C) .384 (D) .857 (E) Can not be determined

37.4 (1 point) Estimate the elasticity of Y with respect to X4.


(A) .134 (B) .336 (C) .384 (D) .857 (E) Can not be determined

37.5 (1 point) Estimate the elasticity of Y with respect to X8.


(A) .134 (B) .336 (C) .384 (D) .857 (E) Can not be determined

Use the following information for the next two questions:


A regression has been fit to five observations: Y^ = 700 + 60X2 - 3X3.
X2 takes on the values 1, 2, 3, 4, and 5.
X3 takes on the values 300, 500, 100, 400, and 200.
TSS = 20,000.

37.6 (3 points) Determine the standardized coefficients.


Briefly discuss the meaning of your results.

37.7 (3 points) Determine the elasticities at the means of each variable.


Briefly discuss the meaning of your results.

37.8 (4, 11/04, Q.27) (2.5 points) You are given the following model:
ln Yt = β1 + β2 ln X2t + β3 ln X3t + β4(ln X2t - ln X2t0 )Dt + β5(ln X3t - ln X3t0 )Dt + εt
where,
• t indexes the years 1979-93 and t0 is 1990.
• Y is a measure of workers’ compensation frequency.
• X2 is a measure of employment level.
• X3 is a measure of unemployment rate.
• Dt is 0 for t ≤ t0 and 1 for t > t0.
Fitting the model yields:
β^ = (4.00, 0.60, −0.10, −0.07, −0.01).

Estimate the elasticity of frequency with respect to employment level for 1992.
(A) –0.11 (B) 0.53 (C) 0.60 (D) 0.90 (E) 1.70

Section 38, Partial Correlation Coefficients

The partial correlation coefficient measures the effect of Xj on Y which is not


accounted for by the other variables. For the example involving agent’s loss ratios
versus percent of business in three states, let’s see how to calculate partial correlation
coefficients.
The model Y^ = 62.3 + .126X2 + .222X3, with R2 = 0.851, had been fit to the following data:

Agent X2 X3 Y
1 100 0 75
2 90 10 78
3 70 0 71
4 65 10 73
5 50 50 79
6 50 35 75
7 40 10 65
8 30 70 82
9 15 20 72
10 10 10 66

Exercise: What are the sample correlations of X2 and X3, X2 and Y, and X3 and Y?
[Solution: rX2X3 = Σx2x3/√{(Σx2²)(Σx3²)} = -.400.
rYX2 = Σx2y/√{(Σx2²)(Σy²)} = .318.
rYX3 = Σyx3/√{(Σy²)(Σx3²)} = .666.]

One can calculate the partial correlation coefficients in terms of these sample correlations.311

The partial correlation coefficient of Y and X2 controlling for X3 is:
rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX3²)(1 - rX2X3²)} = .855.

Similarly, the partial correlation coefficient of Y and X3 controlling for X2 is:
rYX3.X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX2²)(1 - rX2X3²)} = .913.

It turns out that one can also calculate the partial correlation coefficients in terms of R2 and the simple correlations.312
rYX2.X3² = (R2 - rYX3²)/(1 - rYX3²) = (.851 - .666²)/(1 - .666²) = .732. rYX2.X3 = ±.856.
rYX3.X2² = (R2 - rYX2²)/(1 - rYX2²) = (.851 - .318²)/(1 - .318²) = .834. rYX3.X2 = ±.913.
Matching the previous results, subject to determining the sign and subject to rounding.
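A small Python sketch of these two formulas, using the simple correlations computed above (a numerical check only, not part of the Syllabus reading):

```python
import numpy as np

r_y2, r_y3, r_23 = 0.318, 0.666, -0.400   # simple correlations from the agents' example

def partial_corr(r_ya, r_yb, r_ab):
    """Correlation of Y with X_a, controlling for X_b."""
    return (r_ya - r_yb * r_ab) / np.sqrt((1 - r_yb ** 2) * (1 - r_ab ** 2))

print(partial_corr(r_y2, r_y3, r_23))   # about 0.855 = rYX2.X3
print(partial_corr(r_y3, r_y2, r_23))   # about 0.913 = rYX3.X2
```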
311
See equations 4.16 and 4.17 in Pindyck and Rubinfeld.
312
See equation 4.18 in Pindyck and Rubinfeld.

R2 is the portion of the variation of Y explained by the model, involving both X2 and X3.
(R2 - rYX3²)/(1 - rYX3²) = .732 is the percentage of the variation of Y which is accounted for by the part of X2 that is uncorrelated with X3.

In this case, rYX2.X3² = 73.2%: 73.2% of the variation of Y is accounted for by the part of X2 that is uncorrelated with X3.

In general, the square of the partial correlation coefficient measures the


percentage of the variation of Y that is accounted for by the part of Xj that is
uncorrelated with the other variables.

Exercise: What percentage of the variation of Y is accounted for by the part of X3 that is
uncorrelated with X2?
[Solution: rYX3.X2² = 83.4%.]

Using Regressions to Estimate the Partial Correlation Coefficients:

Here is another way to estimate the partial correlation coefficients using regressions.
For example, here is a calculation of rYX2.X3.

First run the regression of Y on just X3: Y^ = 70.24 + .1564X3.

Next run the regression of X2 on X3: X^2 = 63.10 - .5164X3.

Eliminate the effect of X3 on Y and X2: Y* = Y - Y^ and X*2 = X2 - X^2.

Take the sample correlation of X*2 and Y*.
Corr[X*2, Y*] = {ΣX*2Y* - ΣX*2 ΣY*/N}/√{(ΣX*2X*2 - ΣX*2 ΣX*2/N)(ΣY*Y* - ΣY* ΣY*/N)}
= {850.81 - (.03)(-.03)/10}/√{(6729.3 - .03²/10)(146.96 - .03²/10)} = .856.

The partial correlation coefficient of Y and X2 controlling for X3 is rYX2.X3 = .856, matching the previous result.

X2 X3 Y Fitted Y Y* Fitted X2 X2* Y*X2* Y*^2 X2*^2


100 0 75 70.24 4.76 63.10 36.90 175.64 22.66 1361.61
90 10 78 71.80 6.20 57.94 32.06 198.67 38.39 1028.10
70 0 71 70.24 0.76 63.10 6.90 5.24 0.58 47.61
65 10 73 71.80 1.20 57.94 7.06 8.45 1.43 49.90
50 50 79 78.06 0.94 37.28 12.72 11.96 0.88 161.80
50 35 75 75.71 -0.71 45.03 4.97 -3.55 0.51 24.74
40 10 65 71.80 -6.80 57.94 -17.94 122.04 46.29 321.70
30 70 82 81.19 0.81 26.95 3.05 2.47 0.66 9.29
15 20 72 73.37 -1.37 52.77 -37.77 51.67 1.87 1426.72
10 10 66 71.80 -5.80 57.94 -47.94 278.22 33.69 2297.86
Sum -0.03 0.03 850.81 146.96 6729.3
Avg. -0.003 0.003

One can calculate rYX3.X2 in a similar manner:

1. Run the regression of Y on just X2.

2. Run the regression of X3 on X2.

3. Eliminate the effect of X2 on Y and X3: Y* = Y - Y^ and X*3 = X3 - X^3.

4. rYX3.X2 is the simple correlation of Y* and X*3.

Exercise: For the agent’s regression, use this technique to calculate rYX3.X2.
[Solution: Regression of Y on just X2: Y^ = 70.59 + .0578X2.
Y^ = 76.37, 75.80, 74.64, 74.35, 73.48, 73.48, 72.91, 72.33, 71.46, 71.17.
Y* = Y - Y^ = -1.37, 2.20, -3.64, -1.35, 5.52, 1.52, -7.91, 9.67, 0.54, -5.17.
Regression of X3 on just X2: X^3 = 37.60 - .3096X2.
X^3 = 6.64, 9.73, 15.93, 17.48, 22.12, 22.12, 25.22, 28.31, 32.96, 34.50.
X*3 = X3 - X^3 = -6.64, 0.27, -15.93, -7.48, 27.88, 12.88, -15.22, 41.69, -12.96, -24.50.
rYX3.X2 = Corr[Y*, X*3] = .913, matching the result obtained previously.]
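The residual-regression procedure is easy to automate. The following Python sketch (illustrative only) applies it to the agents' data; np.polyfit performs the two simple regressions, and the resulting correlations should reproduce the values of about 0.856 and 0.913 found above.

```python
import numpy as np

X2 = np.array([100, 90, 70, 65, 50, 50, 40, 30, 15, 10], dtype=float)
X3 = np.array([0, 10, 0, 10, 50, 35, 10, 70, 20, 10], dtype=float)
Y  = np.array([75, 78, 71, 73, 79, 75, 65, 82, 72, 66], dtype=float)

def residuals(v, w):
    """Residuals of the simple linear regression of v on w (with an intercept)."""
    coefs = np.polyfit(w, v, 1)          # [slope, intercept]
    return v - np.polyval(coefs, w)

# rYX2.X3: correlate the parts of Y and X2 not explained by X3.
print(np.corrcoef(residuals(Y, X3), residuals(X2, X3))[0, 1])   # about 0.856

# rYX3.X2: correlate the parts of Y and X3 not explained by X2.
print(np.corrcoef(residuals(Y, X2), residuals(X3, X2))[0, 1])   # about 0.913
```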

Multiple Regression with More Than Two Independent Variables:*

With three independent variables, one could calculate rYX2.X3X4 in a similar manner:

1. Run the regression of Y on just X3 and X4.

2. Run the regression of X2 on X3 and X4.

3. Eliminate the effect of X3 and X4 on Y and X2: Y* = Y - Y^ and X*2 = X2 - X^2.

4. rYX2.X3X4 is the simple correlation of Y* and X*2.

Relation to the Standardized Coefficients:*

For the three variable model, we had:
β*2 = (rYX2 - rYX3 rX2X3)/(1 - rX2X3²) and β*3 = (rYX3 - rYX2 rX2X3)/(1 - rX2X3²).

rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX3²)(1 - rX2X3²)}.
rYX3.X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX2²)(1 - rX2X3²)}.

The numerators of β*2 and rYX2.X3 are the same, as are those of β*3 and rYX3.X2.

β*2 = rYX2.X3 √{(1 - rYX3²)/(1 - rX2X3²)}.
β*3 = rYX3.X2 √{(1 - rYX2²)/(1 - rX2X3²)}.

Problems:

Use the following information for the next two questions:


• The correlation of X2 and X3 is -.2537.
• The correlation of X2 and Y is .7296.
• The correlation of X3 and Y is .3952.

38.1 (2 points) For the model Y = β1 + β2X2 + β3X3 + ε, determine rYX2.X3, the partial correlation of Y with X2.
(A) 0.85 (B) 0.88 (C) 0.90 (D) 0.93 (E) 0.95

38.2 (2 points) For the model Y = β1 + β2X2 + β3X3 + ε, determine rYX3.X2, the partial correlation of Y with X3.
(A) 0.85 (B) 0.88 (C) 0.90 (D) 0.93 (E) 0.95

Use the following information for the next two questions:


• For the model Y = β1 + β2X2 + β3X3 + ε, R2 = 0.90.
• The correlation of X2 and Y is 0.60.
• The correlation of X3 and Y is 0.40.

38.3 (2 points) Determine the absolute value of the partial correlation of Y with X2.
(A) 0.86 (B) 0.88 (C) 0.90 (D) 0.92 (E) 0.94

38.4 (2 points) Determine the absolute value of the partial correlation of Y with X3.
(A) 0.86 (B) 0.88 (C) 0.90 (D) 0.92 (E) 0.94

38.5 (6 points) For the multiple regression model Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + ε,
you are given:
Independent Fitted Partial Standardized
Variable Slope Elasticity Correlation Coefficient Coefficient
X2 0.00875 .649 .867 .911
X3 -1.927 -.337 -.471 -.395
X4 -3444 -.062 -.561 -.537
X5 2093 .271 .776 .390
Briefly interpret each of these values.

38.6 (Course 120 Sample Exam #3, Q.5) (2 points) You fit a multiple linear regression function relating Y with X2 and X3. The simple correlation coefficients are:
rX2X3 = -.878, rYX2 = .970, rYX3 = -0.938.
Calculate rYX3.X2, the partial correlation of Y with X3.
(A) -0.7 (B) -0.1 (C) 0.5 (D) 0.9 (E) 1.0

38.7 (2 points) In the previous question, calculate rYX2.X3, the partial correlation of Y with X2.
(A) .86 (B) .88 (C) .90 (D) .92 (E) .94

38.8 (Course 4 Sample Exam, Q.5) (2.5 points) For the multiple regression model
Yi = β1 + β2X2i + β3X3i + β4X4i + εi,
you are given:
Independent Partial Standardized Elasticity
Variable Correlation Coefficient Coefficient
X2 0.64 0.50 0.20
X3 -0.04 -0.01 -0.01
X4 0.70 0.40 0.60
Which of the following is implied by this model?
(A) 16% of the variance of Y not accounted for by X2 and X3 is accounted for by X4.
(B) An increase of 1 standard deviation in X2 will lead to an increase of 0.64 standard
deviations in Y.
(C) An increase of 1% in X2 will lead to an increase of 0.20% in Y.
(D) An increase of 1 unit in X3 will lead to a decrease of 0.04 units in Y.
(E) X4 is a more important determinant of Y than X2 is.

38.9 (4, 11/02, Q.12) (2.5 points) For the three variables Y, X2 and X3, you are given the following sample correlation coefficients:
rYX2 = 0.6
rYX3 = 0.5
rX2X3 = 0.4
Calculate rYX2.X3, the partial correlation coefficient between Y and X2.
(A) 0.50 (B) 0.55 (C) 0.58 (D) 0.64 (E) 0.73

38.10 (4, 11/04, Q.11) (2.5 points) For the model Y = β1 + β2X2 + β3X3 + ε, you are given:
(i) rYX2 = 0.4
(ii) rYX3.X2 = -0.4.
Determine R2.
(A) 0.03 (B) 0.16 (C) 0.29 (D) 0.71 (E) 0.84

Section 39, Regression Diagnostics*313


There are additional items one can look at in order to examine a fitted model. This section will
discuss three of these: studentized residuals, DFBETAS, and Cook’s D.

An Example of a Regression:

Take the set of 14 observations:


X: 1.3 2.0 2.7 3.3 3.7 4.0 4.7 5.0 5.3 5.7 6 6.3 6.7 7.0
Y: 2.3 2.8 2.2 3.8 1.7 2.8 3.2 1.8 3.5 3.4 3.2 3.0 5.9 3.9

A linear regression was fit to this data:


α^ = 1.6042. sα^ = 0.6952. tα = 2.307.
β^ = 0.3303. sβ^ = 0.1430. tβ = 2.310.
s2 = 0.838. R2 = 0.308. R̄2 = 0.250.
TSS = 14.5293. RSS = 4.4709. ESS = 10.0584.

Here is a graph of the residuals:

[Figure: plot of the residuals against X.]

We observe that some of the absolute values of the residuals are large, for example at X = 6.7
the residual is 2.083.

313
See Section 7.4 of Pindyck and Rubinfeld, not on the syllabus.
For a more complete discussion, see for example Chapter 8 of Applied Regression Analysis by Draper and Smith.

Studentized Residuals:314

One way to judge the size of the residuals is to “studentize” them.

Let s(i) be the standard error of the regression excluding observation i.


Excluding the 13th observation, (6.7, 5.9), the fitted regression is:
α^ = 2.031. sα^ = 0.5132. β^ = 0.1964. sβ^ = 0.1094. s2 = 0.4310.

Let H = X(X’X)-1X’.315
For the original regression, h13,13 = 0.1842.
Then as discussed in a previous section, the covariance matrix of the residuals is: (I - H)σ2.
Thus an estimate of the variance of ε^i is (1 - hii)s2.

Let the studentized residual be: ε^i* = ε^i / {s(i)√(1 - hii)}.316

Therefore, the studentized residual corresponding to the 13th observation is:


2.0827/√{(.4310)(1 - .1842)} = 3.512.

Here is a graph of the studentized residuals:317

[Figure: plot of the studentized residuals against X.]

314 Also called Externally Studentized Residuals, in order to distinguish them from Internally Studentized Residuals, also called Standardized Residuals. See Section 8.1 of Draper and Smith.
315
X is the design matrix. As discussed previously H is called the hat matrix. See Section 8.1 of Draper and Smith.
316
See equation 8.1.18 in Draper and Smith.
This seems to differ somewhat from Equation 7.16 in Pindyck and Rubinfeld.
317
Many regression software packages will calculate studentized residuals. I used Mathematica.

The studentized residuals have a t-distribution with n - k - 1 degrees of freedom.318

For this example, the studentized residuals have a t-distribution with 14 - 2 - 1 = 11 d. f.


The 5% critical value is 2.201, while the 1% critical value is 3.106.
Thus the studentized residual corresponding to the 13th observation of 3.512 is significantly
large. One could treat the 13th observation, (6.7, 5.9), as an “outlier”.

An outlier is an observation that is far from the fitted least squares line. One can
compare studentized residuals to the t-distribution in order to spot outliers.
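The following Python/numpy sketch (illustrative only, not part of the Syllabus reading) computes the studentized residuals for the 14-point example above by refitting the regression with each observation deleted in turn; it should reproduce the value of about 3.51 for the 13th observation.

```python
import numpy as np

x = np.array([1.3, 2.0, 2.7, 3.3, 3.7, 4.0, 4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7, 7.0])
y = np.array([2.3, 2.8, 2.2, 3.8, 1.7, 2.8, 3.2, 1.8, 3.5, 3.4, 3.2, 3.0, 5.9, 3.9])
X = np.column_stack([np.ones_like(x), x])      # design matrix (intercept and slope)
n, k = X.shape

beta = np.linalg.solve(X.T @ X, X.T @ y)       # full-sample fit
resid = y - X @ beta
H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix; H[12, 12] is about 0.184

studentized = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                   # delete observation i
    Xi, yi = X[keep], y[keep]
    bi = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)
    s2_i = ((yi - Xi @ bi) ** 2).sum() / (n - 1 - k)   # s(i)^2
    studentized[i] = resid[i] / np.sqrt(s2_i * (1 - H[i, i]))

print(studentized[12])   # about 3.51 for the 13th observation
```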

The presence of one or more outliers could be due to an inaccurate or inappropriate model.
These outliers could also just be unusual observations, which occur from time to time due to
random fluctuation. Importantly, these outliers could indicate data errors; i.e., the observations
could have been either incorrectly measured or recorded.319

If a large percentage of studentized residuals had a large absolute value compared to the
critical values of the t-distribution, that would call into question the assumption of Normally
distributed errors.

DFBETAS:320

Let β^(i) be the estimated slope of the regression excluding observation i.
Let β^ be the estimated slope from the original regression including all observations.

The numerator of DFBETASi is β^ - β^(i), the difference between the two estimates of the slope.

Let s(i) be the standard error of the regression excluding observation i.

Recall that in matrix form, the variance-covariance matrix of the estimated coefficients is:
Var[β^] = s2(X’X)-1.

The denominator of DFBETASi is an estimate of the standard error of β^(i):
s(i)√{(X’X)-12,2}, where X is the design matrix including all of the observations.

DFBETASi = {β^ - β^(i)}/(s(i)√{(X’X)-12,2}).321

318
Assuming the errors are Normally Distributed.
319
Data errors are quite common in actuarial work. An actuary should always be very concerned about the quality of
the data on which he is relying.
320
See Section 7.4.2 of Pindyck and Rubinfeld. Also called Best Fit Parameter Deltas.
There are similar DFFITS, which measure the influence of a single observation on its corresponding predicted value
of the dependent variable.
321
Differs somewhat from Equation 7.17 in Pindyck and Rubinfeld.

For the example, β^ = 0.3303.

(X’X)-1 =
(0.576676 -0.111043)
(-0.111043 0.0244051)

Excluding the 13th observation, β^(13) = 0.1964, with s(13)2 = 0.4310.
DFBETAS13 = (.3303 - .1964)/√{(.4310)(.0244051)} = 1.306.

DFBETAS are measuring in units of standard errors the change in the estimated slope when a
single observation is included rather than excluded from the regression.

Here is a graph of the DFBETAS:322

The DFBETAS corresponding to excluding the 13th observation, (6.7, 5.9), has a much larger
absolute value than the others. The 13th observation has a large effect on the estimated slope.

One rule of thumb is that one should be concerned when the absolute value of a DFBETAS is
larger than 2/√N. Since 1.306 > 2/√14 = 0.534, the 13th observation, (6.7, 5.9), has a large
effect on the fitted slope.

When the absolute value of a DFBETAS is larger than 2/√ N, then the
corresponding observation has a large effect on the estimated slope.
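A Python sketch of the DFBETAS calculation for the same 14-point example (the data are repeated so the block stands alone; illustrative only):

```python
import numpy as np

x = np.array([1.3, 2.0, 2.7, 3.3, 3.7, 4.0, 4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7, 7.0])
y = np.array([2.3, 2.8, 2.2, 3.8, 1.7, 2.8, 3.2, 1.8, 3.5, 3.4, 3.2, 3.0, 5.9, 3.9])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

XtX_inv = np.linalg.inv(X.T @ X)                 # (X'X)^-1; its (2,2) element is about 0.0244
slope_full = np.linalg.solve(X.T @ X, X.T @ y)[1]

dfbetas = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    Xi, yi = X[keep], y[keep]
    bi = np.linalg.solve(Xi.T @ Xi, Xi.T @ yi)   # slope excluding observation i
    s2_i = ((yi - Xi @ bi) ** 2).sum() / (n - 1 - k)
    dfbetas[i] = (slope_full - bi[1]) / np.sqrt(s2_i * XtX_inv[1, 1])

print(dfbetas[12])       # about 1.31 for the 13th observation
print(2 / np.sqrt(n))    # rule-of-thumb cutoff, about 0.53
```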

In a similar manner as was done here for the two variable regression, one can define
DFBETAS for each coefficient in a multiple regression:

DFBETASj(i) = {β^j - β^j(i)}/(s(i)√{(X’X)-1jj}).
322
Many regression software packages will calculate DFBETAS. I used Mathematica.

Cook’s D:323

Since in deviations form, β^ = ΣxiYi/Σxi2, those observations for which X is far from X̄ have more effect on the fitted slope. Least squares estimation is sensitive to observations far from the means of the independent variables.

Therefore, an observation is influential in fitting the model if its value of X is far from X̄ and it is also an outlier. One way to spot influential points is via Cook’s D.

Let β^(i) be the vector of fitted coefficients, excluding observation i.
Let Y^(i) = Xβ^(i), the fitted values, when observation i is excluded from the regression.

The numerator of Cook’s D is the squared distance between the fitted values with and without the ith observation: {Y^ - Y^(i)}’{Y^ - Y^(i)}.

The denominator of Cook’s D is a scaling factor: k s2.

Di = {Y^ - Y^(i)}’{Y^ - Y^(i)}/(k s2).324

Cook’s D is nonnegative.

For the example, for the original regression, the fitted values are:
Y^ = 1.60417 + 0.330323X = (2.03359, 2.26482, 2.49604, 2.69424, 2.82637, 2.92546, 3.15669, 3.25579, 3.35488, 3.48701, 3.58611, 3.6852, 3.81733, 3.91643).

With the 13th observation, (6.7, 5.9), excluded from the regression:325
Y^(13) = 2.03133 + 0.196363X = (2.2866, 2.42406, 2.56151, 2.67933, 2.75787, 2.81678, 2.95424, 3.01315, 3.07205, 3.1506, 3.20951, 3.26842, 3.34696, 3.40587).

{Y^ - Y^(13)}’{Y^ - Y^(13)} = (2.03359 - 2.2866)2 + ... + (3.91643 - 3.40587)2 = 1.20087.

k = 2 variables (slope plus intercept).
For the original regression, s2 = 0.838197.

D13 = {Y^ - Y^(13)}’{Y^ - Y^(13)}/(k s2) = 1.20087/{(2)(0.838197)} = 0.716.
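A corresponding Python sketch of Cook's D for this example (again the data are repeated so the block stands alone; illustrative only):

```python
import numpy as np

x = np.array([1.3, 2.0, 2.7, 3.3, 3.7, 4.0, 4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7, 7.0])
y = np.array([2.3, 2.8, 2.2, 3.8, 1.7, 2.8, 3.2, 1.8, 3.5, 3.4, 3.2, 3.0, 5.9, 3.9])
X = np.column_stack([np.ones_like(x), x])
n, k = X.shape

beta = np.linalg.solve(X.T @ X, X.T @ y)
fitted = X @ beta
s2 = ((y - fitted) ** 2).sum() / (n - k)       # about 0.838

cooks_d = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    bi = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    fitted_i = X @ bi                          # fitted values at all 14 X's, excluding obs. i
    cooks_d[i] = ((fitted - fitted_i) ** 2).sum() / (k * s2)

print(cooks_d[12])   # about 0.72 for the 13th observation
```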

323
See Section 8.3 of Applied Regression Analysis by Draper and Smith, not on the syllabus.
324
See Equation 8.3.1 in Applied Regression Analysis by Draper and Smith.
325
We include a fitted value for X = 6.7, even though this observation was not used to fit the parameters.

Here is a graph of the values of Cook’s D:326

[Figure: plot of Cook’s D against X.]

The value of Cook’s D for the 13th observation, (6.7, 5.9), is much larger than the others.
The 13th observation is very influential.

In general, a large value of Cook’s D relative to the other values, indicates an


influential observation.

Cook’s D may be written in other ways.

Di = {β^ - β^(i)}’X’X{β^ - β^(i)}/(k s2).327

Di = ε^i2 hii/{k s2 (1 - hii)2}.328

326
Many regression software packages will calculate Cook’s D. I used Mathematica.
327
See Equation 8.3.2 in Applied Regression Analysis by Draper and Smith.
328
See Equation 8.3.3 in Applied Regression Analysis by Draper and Smith.
hii is the diagonal element of the hat matrix.

Problems:

Use a regression software package and the following information for the next 6 questions:
One has the following 21 observations, (Xi, Yi):
(15, 95), (26, 71), (10, 83), (9, 91), (15, 102), (20, 87), (18, 93), (11, 100), (8, 104), (20, 94),
(7, 113), (9, 96), (10, 83), (11, 84), (11, 102), (10, 100), (12, 105), (42, 57), (17, 121), (11, 86),
(10, 100).

39.1 (2 points) Draw a scatterplot of this data.

39.2 (2 points) Fit a least squares regression: Y = α + βX + ε.


Add the fitted line to your scatterplot.

39.3 (2 points) Graph the residuals of this regression.

39.4 (2 points) Graph the studentized residuals of this regression.

39.5 (2 points) Graph the DFBETAS of this regression.

39.6 (2 points) Graph the values of Cook’s D of this regression.



Section 40, Stepwise Regression*329


In model building, it is important to choose which independent variables to include. On the
one hand, we do not want to exclude important explanatory variables. On the other hand, we
wish to keep the model as simple as possible.

Stepwise regression is a technique that can be employed when there are many possible
explanatory variables.

One chooses one variable to use first, usually the independent variable with the largest
absolute value of its correlation with the dependent variable. Then at each stage one adds the
independent variable with the largest absolute value of its partial correlation coefficient with
respect to all of the variables already included in the model.

At each stage, one could use the F-Test to decide whether a variable should be eliminated.
One would proceed to add (and possibly eliminate) variables step by step, until no more improvement in R2 is possible. Then the final model that results contains the set of independent variables that “work best”.

It should be noted that since one is choosing variables from a large set, specifically to get a
better fit, one can not then apply the t-test and F-test to the result of the stepwise regression
process in order to test hypotheses in the usual manner. However, if one then collects
additional similar data, one can fit to this additional data a model of the form determined to be
best by stepwise regression, and then apply the t-test and F-test in order to test hypotheses in
the usual manner.
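As an illustration only, here is a rough Python sketch of forward stepwise selection. It adds at each step the candidate variable that raises R2 the most, which is equivalent to choosing the largest absolute partial correlation; the elimination step via the F-Test described above is omitted, and the function names and the min_gain stopping rule are assumptions of this sketch, not anything from Pindyck and Rubinfeld.

```python
import numpy as np

def r_squared(cols, y):
    """R^2 of the OLS regression of y on the given list of columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def forward_stepwise(candidates, y, min_gain=0.01):
    """candidates: dict mapping a variable name to its column of observations.
    At each step, add the variable producing the largest increase in R^2,
    stopping when no remaining variable improves R^2 by at least min_gain."""
    chosen, r2 = [], 0.0
    remaining = dict(candidates)
    while remaining:
        name, best = max(((nm, r_squared([candidates[c] for c in chosen] + [col], y))
                          for nm, col in remaining.items()), key=lambda pair: pair[1])
        if best - r2 < min_gain:
            break
        chosen.append(name)
        r2 = best
        del remaining[name]
    return chosen, r2
```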

329
See page 101 of Pindyck and Rubinfeld. For a more complete explanation, see for example Section 15.2 of
Applied Regression Analysis by Draper and Smith.

Section 41, Stochastic Explanatory Variables*330


For example let us assume one regresses the number of claims for each private passenger
automobile insured one year versus the number of claims the subsequent year.331 Then both Xi,
the number of claims this year, and Yi, the number of claims next year, are stochastic variables;
i.e., random variables.

Examples of stochastic variables in insurance include: number of claims, frequency, severity,


aggregate loss, loss ratio, and pure premium.

In the classical linear regression model, it was assumed that the independent variable(s) were
deterministic rather than random. If we relax this assumption and allow for stochastic
independent variables, then provided we make some additional assumptions, many results
still hold.

Assume in addition that:


1. The distribution of each independent variable is independent of the true regression
parameters.
2. The distribution of each independent variable is independent of the errors of the model.

Then conditional on the given values of the independent variables, all the basic properties of
the least squares estimators continue to hold. For example, the estimates of Ordinary Least
Squares are conditionally unbiased. However, looked at unconditionally not all these
properties continue to hold.

Unconditionally, the Ordinary Least Squares estimator is no longer unbiased, although it is


asymptotically unbiased. Ordinary Least Squares is still consistent. Ordinary Least Squares is
asymptotically efficient (for very large sample sizes it has the smallest mean squared error.)
The least squares estimators are the maximum likelihood estimators of the regression
parameters.

Deterministic Independent Variables Stochastic Independent Variables

OLS unbiased OLS asymptotically unbiased

OLS consistent OLS consistent

OLS efficient OLS asymptotically efficient

Not Applicable OLS ⇔ maximum likelihood estimates

330
See Section 5.5 of Pindyck and Rubinfeld.
331
See for example, “A Graphical Illustration of Experience Rating Credibilities,” by Howard C. Mahler,
PCAS 1998.

Section 42, Generalized Least Squares*332


Generalized Least Squares is a generalization of ordinary least squares regression.
Weighted Least Squares, discussed previously, is a special case of Generalized Least
Squares.

Variance-Covariance Matrix of the Errors:

Assume we have four observations, then the variance-covariance matrix of the errors is:

(Var[ε1] Cov[ε1, ε2] Cov[ε1, ε3] Cov[ε1, ε4])


(Cov[ε2, ε1] Var[ε2] Cov[ε2, ε3] Cov[ε2, ε4])
(Cov[ε3, ε1] Cov[ε3, ε2] Var[ε3] Cov[ε3, ε4])
(Cov[ε4, ε1] Cov[ε4, ε2] Cov[ε4, ε3] Var[ε4])

With N observations, the variance-covariance matrix is an N by N symmetric matrix with entries


Cov[εi, εj].333 In certain cases discussed previously, this variance-covariance matrix of the errors
has a special form.

Classical Linear Regression Model:

In the classical linear regression model, among the assumptions are that:
Var[εi] = σ2 for all i (homoscedasticity), and εi and εj independent for i ≠ j.
Therefore, Cov[εi, εi] = σ2 and Cov[εi, εj] = 0 for i ≠ j. ⇒ The variance-covariance matrix of the errors is σ2I.

For example, if σ = 3, then for the classical linear regression model, with 4 observations, the
variance-covariance matrix of the errors is:

(9 0 0 0)
(0 9 0 0)
(0 0 9 0)
(0 0 0 9)

Heteroscedasticity:

As discussed previously, in the case of heteroscedasticity Var[εi] = σi2, however we still


assumed independent errors. Therefore, the variance-covariance matrix of the errors is still
diagonal, but the entries along the diagonal are not all equal.

332
See Appendix 6.1 of Pindyck and Rubinfeld, not on the Syllabus.
333
Note that the dimension of the variance-covariance matrix only depends on the number of observations, not the
number of independent variables.

As an example of heteroscedasticity with 4 observations, the variance-covariance matrix of the


errors could be:

(9 0 0 0)
(0 11 0 0)
(0 0 14 0)
(0 0 0 20)

Serial Correlation:

As discussed previously, when there is serial correlation, the errors are no longer
independent. In the case of first order serial correlation, Cov[εi, εj] = σ2ρ|i - j|, -1 < ρ < 1.

For four observations, the variance-covariance matrix would look like:

(1 ρ ρ2 ρ 3)
(ρ 1 ρ ρ2) σ2.
(ρ2 ρ 1 ρ)
(ρ3 ρ2 ρ 1)

General Case:

These are three important special cases of a general situation in which the
variance-covariance matrix of the errors, σ2Ω, can be any positive definite matrix.334 We
assume Ω is known, while σ2 is usually unknown. In other words, we are usually given the covariance matrix only up to a proportionality constant.

If X is the design matrix and Y is the vector of observations of the dependent variable, then the fitted parameters using Generalized Least Squares are:335
β~ = (X’Ω−1X)-1X’Ω−1Y.

The variance-covariance matrix of the fitted parameters is:336
Cov[β~] = σ2(X’Ω−1X)-1.

An unbiased estimator of σ2 is given by:337
(ε^’Ω−1ε^)/(N - k).
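These three formulas translate directly into a few lines of Python/numpy. The sketch below is only illustrative; the example data (the y values and the choice ρ = 0.6) are hypothetical.

```python
import numpy as np

def gls(X, y, Omega):
    """Generalized least squares, given Omega (the error covariance up to sigma^2)."""
    Oi = np.linalg.inv(Omega)
    XtOiX = X.T @ Oi @ X
    beta = np.linalg.solve(XtOiX, X.T @ Oi @ y)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ Oi @ resid / (n - k)       # unbiased estimate of sigma^2
    cov_beta = sigma2 * np.linalg.inv(XtOiX)    # covariance matrix of the fitted parameters
    return beta, sigma2, cov_beta

# Hypothetical example: 5 observations with first order serial correlation, rho = 0.6.
rho, n_obs = 0.6, 5
Omega = rho ** np.abs(np.subtract.outer(np.arange(n_obs), np.arange(n_obs)))
x = np.arange(1.0, n_obs + 1.0)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones(n_obs), x])
print(gls(X, y, Omega))
```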

334
A positive definite matrix is a symmetric, square matrix such that xTAx >0 for every x ≠ 0.
All variance-covariances matrices are positive definite.
335
See Equation A6.8 in Pindyck and Rubinfeld. In the classical regression model this reduces to the fitted
parameters being: (X’X)-1 X’Y, as discussed previously.
336
See Equation A6.9 in Pindyck and Rubinfeld. In the classical regression model this reduces to the
variance-covariance matrix of the fitted parameters being: σ2(X’X)-1, as discussed previously.
337
See Equation A6.10 in Pindyck and Rubinfeld. In the classical regression model this reduces to ( ε^ ‘ ε^ )/(N-k).

Unfortunately without assumptions as to the form of Ω, one can not estimate Ω solely from the
observations.

Analysis of Variance, GLS:*

For Generalized Least Squares, TSS = Y’Ω−1Y, RSS = Y’Ω−1X(X’Ω−1X)-1X’Ω−1Y, and


ESS = TSS - RSS = Y’Ω−1Y - Y’Ω−1X(X’Ω−1X)-1X’Ω−1Y.

OLS versus GLS:

If Ω = σ2I, then Generalized Least Squares (GLS) reduces to Ordinary Least Squares (OLS).
If Ω is not σ2I, then the use of Ordinary Least Squares rather than Generalized Least Squares
would result in unbiased but inefficient estimates; the Generalized Least Squares
estimates would have a smaller variance than the Ordinary Least Squares estimates. In
addition, Ordinary Least Squares would result in a biased estimate of the
variance-covariance matrix.

Transformation of Equations:*

Let Ω be proportional to the variance-covariance matrix. Then since the variance-covariance


matrix is positive definite, so is Ω. Therefore, there exists a matrix H, such that H’H = Ω-1.338 339
In the derivation of the equations for Generalized Least Squares, the original equation is
transformed by multiplying everything by this matrix H.

If one has heteroscedasticity, then if for example we have only three observations:

(σ12 0 0) (1/σ1 0 0)
Ω= (0 σ22 0) H= (0 1/σ2 0)
(0 0 σ32) (0 0 1/σ3)

If one has first order serial correlation, then if for example we have only three observations:

(1 ρ ρ 2) (√(1-ρ2) 0 0)
Ω= (ρ 1 ρ) H= (−ρ 1 0)/{√(1-ρ2)}
(ρ2 ρ 1) (0 −ρ 1)

338 H’ is the transpose of H; H’ has the rows and columns of H reversed.
339 The Choleski Square Root Method is one method of getting C, such that C C’ = Ω. Then if H = C-1, H’H = Ω-1. See for example Fundamentals of Numerical Analysis, by Stephen J. Kellison. The Choleski Square Root Method applies to any positive definite matrix.

Problems:

42.1 (2 points) If N = 5, σ2 = 100 and ρ = .6, with first order serial correlation, what is the
covariance matrix of the errors?

Use the following information for the next three questions:


(i) Yi = α + βXi + εi
Var(εi) = (Xi/2)2
(ii) i Xi Yi
1 1 8
2 2 5
3 3 3
4 4 –4

42.2 (2 points) What is the covariance matrix of the errors?

42.3 (3 points) Use the methods of generalized least squares to fit this model.

42.4 (2 points) What are the variances and covariances of the fitted parameters?

Use the following information for the next three questions:


(i) Yi = α + βXi + εi
(ii) i Xi Yi
1 1 3
2 4 8
3 9 15
(iii) The covariance matrix of the errors is σ2Ω, where σ2 is unknown and
(10 15 20)
Ω = (15 40 25)
(20 25 90)
and
(119 -34 -17)
Ω−1 = (-34 20 2) / 340.
(-17 2 7)

42.5 (4 points) Use the methods of generalized least squares to fit this model.

42.6 (3 points) Estimate σ2.

42.7 (3 points) What are the variances and covariances of the fitted parameters?

Section 43, Nonlinear Estimation340


We previously discussed linear models and models that can be transformed into linear models
by changing variables.

Exercise: Fit via least squares lnY = βX to the following three observations.
X: 1 2 4
Y: 2 3 5
[Solution: β^ = Σ Xi lnYi / Σ Xi2 = 9.32812/21 = 0.44420.]

However, there are other models that are inherently nonlinear. It is harder to estimate the
parameters for such nonlinear models, than for linear models.

Examples of nonlinear models:


Y = α0 + α1X1 β1 + α2X2 β2 + ε.
Y = α1 exp[X1β1] + α2 exp[X2β2] + ε.

One way to fit a nonlinear model is by minimizing the squared errors.341

Determining the fitted parameters for a nonlinear model is more difficult than for
the linear case.

Sum of Squared Errors:

For the model Y = eβX, the sum of squared errors is: Σ (Yi - exp[bXi])2.

Exercise: For the model Y = eβX, determine the sum of squared errors for β = 0.5 and the
following three observations.
X: 1 2 4
Y: 2 3 5
[Solution: (2 - e.5)2 + (3 - e1)2 + (5 - e2)2 = 5.910.]

340
See Section 10.1 in Pindyck and Rubinfeld.
341
Another technique is via maximum likelihood, which is covered on Exam 4/C.
See Section 10.2 in Pindyck and Rubinfeld, not on the syllabus of this exam.

Here is a graph of the sum of squared errors as a function of β:

[Figure: the sum of squared errors plotted for β between 0.35 and 0.5.]

Solving numerically, the smallest sum of squared errors corresponds to β = 0.411867.

Note this does not match β = 0.44420, the least squares fit for lnY = βX obtained previously.
One can convert Y = eβX to a linear form, lnY = βX, by taking logs of both sides. However,
minimizing the squared errors of this linear model is not equivalent to minimizing the squared
errors of the original nonlinear model.
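One way to reproduce this value numerically, using nothing beyond numpy, is a crude grid search over β (illustrative only; a real application would use a proper optimizer):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 3.0, 5.0])

def sse(b):
    """Sum of squared errors for the nonlinear model Y = exp(b X)."""
    return ((y - np.exp(b * x)) ** 2).sum()

grid = np.linspace(0.3, 0.5, 20001)            # step size of 0.00001
b_hat = grid[np.argmin([sse(b) for b in grid])]
print(b_hat)                       # about 0.4119, not the 0.4442 from the linearized fit
print(sse(0.44420), sse(b_hat))    # the linearized fit has a larger sum of squared errors
```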

Normal Equations:

Just as in the linear case, one can write down a series of equations to solve for the least
squares parameters, the Normal Equations, by setting the partial derivatives of the sum of
squared errors equal to zero. For this example, with the model Y = eβX, and the data
X: 1 2 4
Y: 2 3 5

Sum of Squared Errors = S = Σ(Yi - exp[βXi])2.

In order to minimize S, we set its derivative with respect to β equal to zero:342

0 = ∂S/∂β = -2Σ(Yi - exp[βXi])Xi exp[βXi] = -2{(2 - eβ)eβ + 2(3 - e2β)e2β + 4(5 - e4β)e4β}.

⇒ 4e7β - 18e3β - 5eβ - 2 = 0.

342
In general, one would set equal to zero the partial derivative with respect to each of the parameters.
There would be as many Normal Equations as there are parameters in the model.

Graphing the lefthand side of this Normal Equation, 4e7β - 18e3β - 5eβ - 2:

[Figure: the curve crosses zero near β = 0.41.]

Solving numerically, this Normal Equation is satisfied by β = 0.411867, matching the previous
result.

Iterative Linearization Method:

Another method of solution involves approximating the nonlinear model by a linear model.343

For example, let us continue to work with the nonlinear model: Y = eβX.

Let f(X) = ebX.


∂f/∂b = X ebX.

Then we have the linear approximation: Y - f(X) ≅ (β - b)∂f/∂b = (β - b) X ebX.

For three observations, we have the following 3 linear equations:


Yi - f(Xi) = (β - b)Xi exp[bXi]. ⇒
Yi - exp[bXi] + bXi exp[bXi] = βXi exp[bXi].

We can treat the lefthand side of the above equation as a constructed dependent variable, and
the portion of the righthand side multiplying β as a constructed independent variable.
In other words, we can view this as the linear equation without intercept: Vi = βUi.

If for example, we take b = 0.5, then the constructed dependent variable is:
Vi = Yi - exp[bXi] + bXi exp[bXi] = Yi + (.5Xi - 1)exp[.5Xi], while the constructed independent
variable is: Ui = Xi exp[bXi] = Xi exp[.5Xi].

343
This is similar to the idea behind the delta method, covered on Exam 4/C.

For example, with b = 0.5, for X = 4 and Y = 5, the constructed dependent variable is:
5 + (2 - 1)e2 = 12.3891.
For X = 4 and Y = 5, the constructed independent variable is: 4e2 = 29.5562.

X Y Constructed Independent Variable Constructed Dependent Variable


1 2 1.6487 1.1756
2 3 5.4366 3.0000
4 5 29.5562 12.3891

One can perform a linear regression with no intercept on the constructed variables.
β^ = {(1.6487)(1.1756) + (5.4366)(3) + (29.5562)(12.3891)}/(1.64872 + 5.43662 + 29.55622) = 384.42/905.83 = 0.424.
Now we can iterate, taking 0.424 as the new value of b.

Exercise: For b = 0.424, construct the dependent and independent variables.


Then perform a linear regression with no intercept on the constructed variables.
[Solution: For X = 4 and Y = 5, the constructed dependent variable is:
5 + ((4)(.424) - 1)exp[(4)(.424)] = 8.7947.
X Y Constructed Independent Variable Constructed Dependent Variable
1 2 1.5281 1.1198
2 3 4.6699 2.6451
4 5 21.8084 8.7947
β^ = {(1.5281)(1.1198) + (4.6699)(2.6451) + (21.8084)(8.7947)} / (1.52812 + 4.66992 + 21.80842) = 205.86/499.75 = .412.]

To three decimal places, we have converged to the previously determined value of β that
minimizes the squared errors.344
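For this one-parameter model the iteration is only a few lines of Python (illustrative only); starting from b = 0.5 it reproduces the sequence 0.424, 0.412, ... obtained above:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0])
y = np.array([2.0, 3.0, 5.0])

b = 0.5                                  # starting guess
for step in range(6):
    u = x * np.exp(b * x)                # constructed independent variable, df/db
    v = y - np.exp(b * x) + b * u        # constructed dependent variable
    b = (u * v).sum() / (u * u).sum()    # least squares with no intercept on (u, v)
    print(step + 1, b)                   # 0.424, 0.412, ..., converging to about 0.4119
```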

In general, one would have a nonlinear model with k independent variables and p parameters:
Y = f(X1, X2, ..., Xk; β1, β2, ..., βp), which we want to fit by least squares.345
Given initial guesses, β1,0, β2,0, ..., βp,0, we construct the dependent variable:
Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σ βi,0 (∂f/∂βi)0,
and the p independent variables: (∂f/∂βi)0.
Then we solve for least squares values of the coefficients βi.
These solutions are used as the next guesses, βi,1.
We iterate until there is convergence.346

344
If more accuracy was desired one could perform another iteration.
345
See page 268 of Pindyck and Rubinfeld.
346
In general, this iterative linearization method may or may not converge.
Convergence may depend on making a good choice of initial values for the parameters.

A More Complicated Example:*347

We are given the following 44 observations:

X 0 0 2 2 2 2 4 4 4 4 6
Y 49 49 48 47 48 47 46 46 45 43 45

X 6 6 8 8 8 10 10 12 12 12 14
Y 43 43 44 43 43 46 45 42 42 43 41

X 14 14 16 16 16 18 18 18 20 20 22
Y 41 40 42 40 40 41 40 41 41 40 40

X 22 22 24 24 26 28 28 30 30 32 34
Y 40 38 41 40 40 41 38 40 40 39 39

We wish to fit via least squares the nonlinear model: Y = α + (49 - α) e-βX + ε.

The sum of squared errors is: S = Σ(Yi - α - (49 - α) exp[-βXi])2.348

Given the data, S is a function of the parameters α and β.


We wish to find the values of α and β that minimize S.

Here is a graph of the sum of squared errors, as a function of α and β:

[Figure: surface plot of the sum of squared errors over α (about 30 to 50) and β (0 to 0.2).]

347
Based on Example 24.3 in Applied Regression Analysis, by Draper and Smith.
348
S is just the sum of the squares of the residuals, which in a linear regression context would be called ESS.

Many software packages will numerically minimize functions such as this one.349 Given a
reasonable starting place, the computer program will search for a minimum.350
In this case, the values: α = 39.0140, and β = 0.101633, produce a minimum of 50.0168.

Exercise: Use the least squares fit to predict the value of Y when X = 15.
[Solution: Y = 39.014 + (49 - 39.014) e-0.101633X = 39.014 + 9.986 exp[-(0.101633)(15)] =
41.188.]

Sum of Squared Errors = S = Σ(Yi - α - (49 - α) exp[-βXi])2. The Normal Equations are:
0 = ∂S/∂α = 2Σ(Yi - α - (49 - α) exp[-βXi])(-1 + exp[-βXi]).
0 = ∂S/∂β = 2Σ(Yi - α - (49 - α) exp[-βXi])(49 - α)Xi exp[-βXi].

These two equations become:


Σ(Yi - α - (49 - α) exp[-βXi])(exp[-βXi] - 1) = 0.
Σ(Yi - α - (49 - α) exp[-βXi])Xi exp[-βXi] = 0.
One can rewrite these two equations as:
α = -Σ(Yi - 49 exp[-βXi])(exp[-βXi] - 1) / Σ(exp[-βXi] - 1)2.
α = -Σ(Yi - 49 exp[-βXi])Xi exp[-βXi] / Σ Xi exp[-βXi] (exp[-βXi] - 1).

One could eliminate α and numerically solve for β.


For example, one could graph the difference of the righthand sides of these equations; where
this difference is zero is the fitted value of β, about 0.102.

0.2

beta
0.06 0.08 0.1 0.12 0.14

-0.2

-0.4

Then using either of the above equations, α is about 39.0.


349
I used the function FindMinimum in Mathematica.
350
One has to beware of finding a local rather than a global minimum.
The more parameters one is trying to fit, the harder it becomes to find a global minimum.

We can also use the iterative linearization method, in order to determine the values of α and β
that minimize the squared errors.

Let f(X) = a + (49 - a) e-bX.


∂f/∂a = 1 - e-bX.
∂f/∂b = - (49 - a) X e-bX.

Then we have Y - f(X) ≅ (α - a)∂f/∂a + (β - b)∂f/∂b = (α - a)(1 - e-bX) - (β - b) (49 - a) X e-bX.

Thus we have 44 linear equations:


Yi - f(Xi) = (α - a)(1 - exp[-bXi]) + (β - b) (a - 49) Xi exp[-bXi]. ⇔
Yi - f(Xi) + a(1 - exp[-bXi]) + b(a - 49) Xi exp[-bXi] = α(1 - exp[-bXi]) + β(a - 49) Xi exp[-bXi].

The lefthand side is the constructed dependent variable:


Yi - f(Xi) + a(1 - exp[-bXi]) + b(a - 49) Xi exp[-bXi]
=Yi - a - (49 - a) exp[-bXi] + a(1 - exp[-bXi]) + b(a - 49) Xi exp[-bXi]
= Yi - 49 exp[-bXi] + (a - 49) bXi exp[-bXi].

The righthand side is α times the first constructed independent variable: (1 - exp[-bXi]),
plus β times the second constructed independent variable: (a - 49) Xi exp[-bXi].

For example, given starting values (guesses) of a = 10 and b = 0.2, for X = 0 and Y = 49,
the value of the constructed dependent variable is: 49 - 49 + (10 - 49)(.2)(0) = 0.
The value of the first constructed independent variable is: 1 - 1 = 0.
The value of the second constructed independent variable is: (10 - 49)0 = 0.

Exercise: Given starting values (guesses) of a = 10 and b = 0.2, for X = 34 and Y = 39,
determine the value of the constructed variables.
[Solution: The value of the constructed dependent variable is:
39 - 49exp[-(.2)(34)] + (10 - 49)(.2)(34)exp[-(.2)(34)] = 38.650.
The value of the first constructed independent variable is: 1 - exp[-(.2)(34)] = .99889.
The value of the second constructed independent variable is:
(10 - 49)34exp[-(.2)(34)] = -1.4769.]

Let Z be the 44 by 2 matrix, with rows given by the values of the constructed independent
variables: 1 - exp[-bXi], (a - 49) Xi exp[-bXi].
(0 0)
Z= (... ...)
(0.99889 -1.4769)

Let the constructed dependent variable be the vector:


Vi = Yi - 49 exp[-bXi] + (a - 49) bXi exp[-bXi] = (0, ..., 38.650).

Then these equations in matrix form are: V = Z (α, β)′.

The solution to these matrix equations is:


(Z’Z)-1Z’V = (39.7201, .1729796).

Thus the new values of α and β are: α1 = 39.7201, and β1 = .1729796.

Continuing in this manner one gets a series of values of the parameters:


(39.5131, 0.0931486), (39.0293, 0.10189), (39.0142, 0.101638), (39.0140, 0.101633).
The process has converged to the same fitted values α = 39.0140, and β = 0.101633, obtained
previously.

In general, this iterative linearization method may or may not converge.
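
Here is a minimal Python sketch of mine of the iterative linearization method for this example, using the constructed variables described above (it assumes numpy and the X, Y arrays from the earlier sketch):

import numpy as np

a, b = 10.0, 0.2                                        # starting guesses
for _ in range(10):
    e = np.exp(-b * X)
    f = a + (49.0 - a) * e                              # current fitted values
    Z = np.column_stack([1.0 - e, (a - 49.0) * X * e])  # constructed independent variables
    V = Y - f + Z @ np.array([a, b])                    # constructed dependent variable
    a, b = np.linalg.lstsq(Z, V, rcond=None)[0]         # regress V on Z to update (alpha, beta)
print(a, b)                                             # converges to about (39.014, 0.1016)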

Damping Factors:*

It can sometimes help convergence to use a damping factor, 0 < d < 1, so that when one
updates at each iteration, one only adds d times the computed difference:

For example, in the previous example, the first computed difference was
(29.7201, -0.0270204). With a damping factor of d = .7, the new values of the parameters
would be: (10, .2) + (.7)(29.7201, -0.0270204) = (30.8041, 0.181086). Then we would iterate
as before, except at each stage only adding 0.7 times the computed difference to get the new
values of the parameters.
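
In code, a damped update is simply a partial step toward the newly computed parameters. For instance, inside the loop of the iterative linearization sketch above (again my own illustration), the update would become:

    d = 0.7                                              # damping factor, 0 < d < 1
    new_a, new_b = np.linalg.lstsq(Z, V, rcond=None)[0]
    a, b = a + d * (new_a - a), b + d * (new_b - b)      # only move d of the way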

Steepest Descent Method:*

As with the iterative linearization method, one would start with a set of initial values for the
parameters. As before, S = the sum of squared errors. Define the gradient vector of S as
(∂S/∂β1, ..., ∂S/∂βn). In the steepest descent method, at each stage one moves in the direction
of minus the gradient vector. One determines the distance to move in this direction that
produces the smallest value of S. One then iterates, using these new values of the
parameters.

In this example, the gradient vector of S is:


(∂S/∂α, ∂S/∂β) =
(2Σ(Yi - α - (49 - α) exp[-βXi])(-1 + exp[-βXi]), 2Σ(Yi - α - (49 - α) exp[-βXi])(49 - α)Xi exp[-βXi]).

For example, for initial values of α = 40 and β = 0.1, the gradient vector of S is:
(45.5366, -1437.91).

This gradient vector points in the direction of steepest ascent of S at α = 40 and β = 0.1.
Minus this gradient vector points in the direction of steepest descent; this direction is:
(-45.5366, 1437.91) / √(45.5366² + 1437.91²) = (-0.0316527, 0.999499).

Therefore, the new values of a and b will be: (40 , 0.1) + λ(-0.0316527, 0.999499), where λ is
chosen to minimize the value of S along that line. In this case, that value is λ = 0.02935, which
results in S = 55.9276 rather than S = 73.7519.

The new values of the parameters are:


α = 40 + (0.02935)(-0.0316527) = 39.9991, and β = 0.1 + (0.02935)(0.999499) = 0.129335.

The gradient vector at α = 39.9991 and β = 0.129335 is: (13.6431, 0.504235).


Therefore, the steepest descent is in the direction:
(-13.6431, -0.504235) / √(13.6431² + 0.504235²) = (-0.999318, -0.0369338).

It turns out that λ = 0.6699 minimizes S.


The new values of the parameters are:
α = 39.9991+ (0.6699)(-0.999318) = 39.3297,
and β = 0.129335 + (0.6699)(-0.0369338) = 0.104815.

We would continue in this manner, until we converged on the minimum.351
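
A minimal Python sketch of mine of the steepest descent method for this example (it assumes numpy and scipy, and the X, Y arrays from the earlier sketch; the 200 iterations and the bounds on λ are arbitrary choices):

import numpy as np
from scipy.optimize import minimize_scalar

def S(alpha, beta):
    # sum of squared errors for the model
    fitted = alpha + (49.0 - alpha) * np.exp(-beta * X)
    return np.sum((Y - fitted) ** 2)

def grad_S(alpha, beta):
    e = np.exp(-beta * X)
    resid = Y - alpha - (49.0 - alpha) * e
    return np.array([2.0 * np.sum(resid * (e - 1.0)),
                     2.0 * np.sum(resid * (49.0 - alpha) * X * e)])

p = np.array([40.0, 0.1])
for _ in range(200):
    g = grad_S(p[0], p[1])
    direction = -g / np.sqrt(np.sum(g ** 2))            # steepest descent direction
    lam = minimize_scalar(lambda t: S(*(p + t * direction)),
                          bounds=(0.0, 5.0), method='bounded').x
    p = p + lam * direction                             # step that minimizes S along the line
print(p)    # slowly approaches (39.014, 0.1016)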

351
The steepest descent method can take a large number of iterations to converge.
It can be very important to choose a good starting point.
See for example, Numerical Recipes, The Art of Scientific Computing, by Press, et. al.

R2:

One can compute R2 for a fitted nonlinear model in the same manner as one did for a fitted
linear model.

For this example, the fitted parameters were: α = 39.0140 and β = 0.101633
Ŷ = 39.0140 + 9.9860 exp[-0.101633 X].

X = (0, 0, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 12, 12, 12, 14, 14, 14, 16, 16, 16, 18, 18, 18,
20, 20, 22, 22, 22, 24, 24, 26, 28, 28, 30, 30, 32, 34).

Y = (49, 49, 48, 47, 48, 47, 46, 46, 45, 43, 45, 43, 43, 44, 43, 43, 46, 45, 42, 42, 43, 41, 41, 40,
42, 40, 40, 41, 40, 41, 41, 40, 40, 40, 38, 41, 40, 40, 41, 38, 40, 40, 39, 39).

Ŷ = (49, 49, 47.1632, 47.1632, 47.1632, 47.1632, 45.6642, 45.6642, 45.6642, 45.6642,
44.441, 44.441, 44.441, 43.4428, 43.4428, 43.4428, 42.6281, 42.6281, 41.9634, 41.9634,
41.9634, 41.4209, 41.4209, 41.4209, 40.9781, 40.9781, 40.9781, 40.6169, 40.6169, 40.6169,
40.322, 40.322, 40.0814, 40.0814, 40.0814, 39.8851, 39.8851, 39.7249, 39.5941, 39.5941,
39.4874, 39.4874, 39.4003, 39.3293).

Sum of Squared Differences is: (49 - 49)² + ... + (39 - 39.3293)² = 50.02.

Ȳ = 42.5. Σ(Yi - Ȳ)² = 395.

R2 = 1 - 50.02/395 = 0.873.
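
This calculation is easy to reproduce; a short sketch of mine (using the X and Y arrays from the first sketch in this section):

fitted = 39.0140 + 9.9860 * np.exp(-0.101633 * X)
ESS = np.sum((Y - fitted) ** 2)         # about 50.02
TSS = np.sum((Y - Y.mean()) ** 2)       # about 395
R2 = 1.0 - ESS / TSS                    # about 0.873
print(ESS, TSS, R2)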

Evaluating the Nonlinear Fit:*

Unlike the linear case, the residuals cannot be used to obtain an unbiased estimator of the
variance of a nonlinear model. However, one can approximate this variance by that of the
linear regression at the final stage of the Iterative Linearization Method.

One could not directly apply the t-test or F-Test to the fitted parameters of a nonlinear model.
However, one could examine the results of such tests for the linear regression at the final stage
of the Iterative Linearization Method.

Mean Squared Errors of Forecasts:*

One cannot use the formula for the mean squared error of forecasts that applied to the
linear model. However, one can use the standard errors of the parameters in the linear
regression at the final stage of the Iterative Linearization Method, together with simulation, in
order to estimate the mean squared error of forecasts for the nonlinear model.

Problems:

43.1 (2 points) Write down the Normal Equations for the model Y = a + bX^c.

Use the following data for the next 9 questions:


X 1 2 3 4 5 6 7 8 9 10 11
Y 920 228 98 51 36 25 19 14 11 9 8

43.2 (4 points) Convert the model Y = a/X^b to a linear model.
Fit this linear model to the above data via least squares linear regression.

43.3 (2 points) Write down the sum of squared errors function S for the nonlinear model
Y = a/X^b. Write down the Normal Equations for this model.

43.4 (3 points) Use a computer to fit via least squares the nonlinear model, Y = a/X^b.

43.5 (3 points) Determine R2 for the fitted model in the previous question.

43.6 (8 points) For the initial values a = 900 and b = 2, for the nonlinear model Y = a/X^b,
for the iterative linearization method, determine the values of the constructed variables, and
then determine the resulting values of the parameters from the first iteration.

43.7 (2 points) Write down the sum of squared errors function S for the nonlinear model
Y = a/(X + c)^b. Write down the Normal Equations for this model.

43.8 (4 points) Use a computer to fit via least squares the nonlinear model, Y = a/(X + c)^b.

43.9 (5 points) Verify that the Normal Equations are satisfied at the fitted parameters for the
model Y = a/(X + c)^b.

43.10 (3 points) Determine R2 for the fitted least squares model, Y = a/(X + c)^b.

Use the following information for the next 4 questions:


X 0 1 2
Y 1.1 0.7 0.3
The model is Y = 1/(α + X) + ε.
You use the iterative linearization method to obtain a nonlinear least-square estimate of α.
The initial value of α is α0 = 1.

43.11 (2 points) Determine the value of the constructed dependent variable in the first
iteration.

43.12 (1 point) Determine the value of the constructed independent variable in the first
iteration.

43.13 (1 point) Determine the estimate of α that results from the first iteration.

43.14 (4 points) Determine the estimate of α that results from the second iteration.

43.15 (VEE-Applied Statistics Exam, 8/05, Q.10) (2.5 points) You are given:
(i) The model is Y = exp[βX] + ε.
(ii) You use the iterative linearization method to obtain a nonlinear least-square estimate of β.
(iii) The initial value of β is β0 = 0.1.
Determine the value of the constructed dependent variable in the first iteration when
Y = 11.7 and X = 25.
(A) 12 (B) 18 (C) 24 (D) 30 (E) 36

43.16 (2 points) In the previous question, determine the value of the constructed independent
variable in the first iteration when Y = 11.7 and X = 25.
(A) 120 (B) 180 (C) 240 (D) 300 (E) 360
Mahler’s Guide to Regression
Sections 44-45:
44 *Generalized Linear Models
45 Important Ideas and Formulas
(Study Aid F06-Reg-K)

Section 44, Generalized Linear Models*352


For the generalized linear model:353
A parametric distribution has mean µ, and θ a vector of additional parameters.
µ and θ do not depend on each other.
z is the vector of covariates for an individual.
β is the vector of coefficients.
η(µ) and c(y) are functions.354
X is the random variable.355
F(x | θ, β) = F(x | µ, θ),
where µ is such that η(µ) = c(Σβizi).

Generalized Linear Models are fit by maximum likelihood.

Ordinary Linear Regression:

For ordinary linear regression, Y has a Normal Distribution with parameters µ = µ and θ = σ,
and both η and c are the identity function. µ = E[Y] = Σ βixi.

The ordinary linear regression model is a special case of the generalized linear
model.

In the case of Normally distributed errors, fitting by least squares is equivalent


to fitting via maximum likelihood.

Generalized Linear Models allow other forms of the error than Normal, do not require
homoscedasticity,356 and allow more forms of relationships between the covariates and the
dependent variable.

352
See Section 12.7.3 of Loss Models, not on the syllabus of this exam.
353
See Definition 12.70 in Loss Models.
This is more general than the usual definition: g(µ) = Σ βizi, where g is the link function.
354
η is called the link function. Common examples of link functions are listed below.
In the applications of the Generalized Linear Model you are likely to read about, c is the identity function.
355
We would assume a specific distribution form for X, for example Normal, Exponential, Poisson, Gamma, etc.
356
Homoscedasticity refers to the assumption that Var[εi] = σ2 for all i.

A One Variable Example of Linear Regression:

Assume we have the following set of three observations: (1, 1), (2, 2), (3, 9).
For example, for x = 3, we observe the value y = 9.
We assume Y = β0 + β1 X + ε.357
Then we could fit a linear regression by minimizing the sum of the squared errors.

The predicted values are Ŷ = (β0 + β1, β0 + 2β1, β0 + 3β1).
The errors are: 1 - (β0 + β1), 2 - (β0 + 2β1), 9 - (β0 + 3β1).
The sum of the squared errors is: (1 - β0 - β1)2 + (2 - β0 - 2β1)2 + (9 - β0 - 3β1)2.

We can minimize the sum of the squared errors by setting its partial derivatives equal to zero:
0 = 2(1 - β0 - β1) + 2(2 - β0 - 2β1) + 2(9 - β0 - 3β1). ⇒ 24 = 6β0 + 12β1. ⇒ 4 = β0 + 2β1.
0 = 2(1 - β0 - β1) + 4(2 - β0 - 2β1) + 6(9 - β0 - 3β1). ⇒ 64 = 12β0 + 28β1. ⇒ 16 = 3β0 + 7β1.

Solving these two equations in two unknowns: β0 = -4 and β1 = 4.


Thus the least squares line is: Ŷ = -4 + 4X.
The predicted values of Y for X = 1, 2, 3 are: Ŷ = (0, 4, 8).

Alternately, one can calculate the regression in deviations form.


X̄ = (1 + 2 + 3)/3 = 2. x = X - X̄ = (-1, 0, 1).
Ȳ = (1 + 2 + 9)/3 = 4. y = Y - Ȳ = (-3, -2, 5).
β̂1 = Σxiyi /Σxi2 = {(-1)(-3) + (0)(-2) + (1)(5)}/{(-1)² + (0)² + (1)²} = 8/2 = 4.
β̂0 = Ȳ − β̂1 X̄ = 4 - (4)(2) = -4.

357
β1 is the slope and β0 is the intercept.
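
A short numeric check of this example in Python (my own sketch, assuming numpy):

import numpy as np
X = np.array([1.0, 2.0, 3.0])
Y = np.array([1.0, 2.0, 9.0])
x = X - X.mean()                          # deviations form
y = Y - Y.mean()
beta1 = np.sum(x * y) / np.sum(x * x)     # 4.0
beta0 = Y.mean() - beta1 * X.mean()       # -4.0
print(beta0, beta1)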

A One Dimensional Example of Generalized Linear Models:358

Let us assume the same set of three observations: (1, 1), (2, 2), (3, 9).
We will now call the independent variable the covariate z, which takes on the values 1, 2, and
3 for the observations.359 While in a generalized linear model, one would call the dependent
variable X, I will continue to call it Y in order to avoid confusion with the previous example.

In a generalized linear model, Y will have some distributional form. The mean of the
distribution will vary with z. However, any other parameters will be constant.
I will take what Loss Models refers to as c as the identity function.
η(µ) = Σβizi. ⇔ µ = η−1(Σβizi ).

For now let us assume the identity link function, η(µ) = µ, so that µ = Σβizi = β0 + β1 z.360
Thus for now we are fitting a straight line.
For the first observation, z = 1, µ = β0 + β1, and y = 1.
For the second observation, z = 2, µ = β0 + 2β1, and y = 2.
For the third observation, z = 3, µ = β0 + 3β1, and y = 9.
We will do the fitting via maximum likelihood.
Which line we get, depends on the distributional form we assume for Y.

Assume that Y is Normal, with mean µ and standard deviation σ.


µ = β0 + β1 z, while σ is the same for all z.
For the Normal Distribution, f(y) = exp[-.5(y - µ)2/σ2]/{σ√(2π)}.
ln f(y) = -.5(y - µ)2/σ2 - .5ln(2π) - ln(σ).

The loglikelihood is the sum of the contributions from the three observations:
{-.5(1 - (β0 + β1))2/σ2 - .5ln(2π) - ln(σ)} + {-.5(2 - (β0 + 2β1))2/σ2 - .5ln(2π) - ln(σ)}
+ {-.5(9 - (β0 + 3β1))2/σ2 - .5ln(2π) - ln(σ)} =
(-.5/σ2){(1 - (β0 + β1))2 + (2 - (β0 + 2β1))2 + (9 - (β0 + 3β1))2} - 3ln(σ) - 1.5ln(2π).

To maximize the loglikelihood, we set its partial derivatives equal to zero.


Setting the partial derivative with respect to β0 equal to zero:
0 = (1/σ2){(1 - (β0 + β1)) + (2 - (β0 + 2β1)) + (9 - (β0 + 3β1))}. ⇒ 12 = 3β0 + 6β1. ⇒ 4 = β0 + 2β1.
Setting the partial derivative with respect to β1 equal to zero:
0 = (1/σ2){(1 - (β0 + β1)) + 2(2 - (β0 + 2β1)) + 3(9 - (β0 + 3β1))}.
⇒ 32 = 6β0 + 14β1. ⇒ 16 = 3β0 + 7β1.
358
See page 15 of “A Practitioners Guide to Generalized Linear Models,” by Duncan Anderson, Sholom Feldblum,
Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi, in the 2004 CAS Discussion Paper
Program.
359
It is not clear in this example whether z can take on values other than 1, 2 and 3. These may be the only possible
values, or they might be the three values for which we happen to have had an observation. In practical applications,
when z is discrete, we would expect to have many observations for each value of z.
360
I have treated z0 as the constant 1 and z1 as the covariate z.

Solving these two equations in two unknowns: β0 = -4 and β1 = 4.


µ = -4 + 4z.
This should be interpreted as follows. For a given value of z, Y is Normally Distributed with
mean = -4 + 4z and standard deviation independent of z. For example, for z = 3, the mean = 8.
Thus for z = 3, the expected value of Y is 8. However, due to random fluctuation, for z = 3 we
will observe values of Y varying around the expected value of 8. If we make a very large
number of observations of individuals with z = 3, then we expect to observe a Normal
Distribution of outcomes with mean 8 and standard deviation σ.361

The fitted line is: Ŷ = -4 + 4z.
This is the exact same result as obtained previously for linear regression.
For ordinary linear regression, Y has a Normal Distribution with parameters µ = µ and θ = σ,
and both η and c are the identity function. µ = E[Y] = Σ βixi. As stated earlier, the ordinary linear
regression model is a special case of the generalized linear model.

One could solve for σ, by setting the partial derivative of the loglikelihood with respect to σ
equal to zero:
0 = (1/σ3){(1 - (β0 + β1))2 + (2 - (β0 + 2β1))2 + (9 - (β0 + 3β1))2} - 3/σ.
⇒ 3σ2 = {(1 - (β0 + β1))2 + (2 - (β0 + 2β1))2 + (9 - (β0 + 3β1))2} = 12 + (-2)2 + 12 = 6.
⇒ σ2 = 2. ⇒ σ = √2.
In the linear regression version of this same example, one would estimate the variance of the
regression as: s2 = Σ ^εi 2 / (N - 2) = {(1 - 0)2 + (2 - 4)2 + (9 - 8)2}/(3 - 2) = 6.362 This is an unbiased
estimate of σ2, which is not equal to that from maximum likelihood which is biased.

In general, for N observations the estimate of σ2 from maximum likelihood will be: Σ ^εi 2 / N.
For large N this is very close to s2 = Σ ^εi 2 / (N - 2).
The maximum likelihood estimator is asymptotically unbiased.

Gamma Distribution:*

For this example, instead assume that Y is Gamma, with mean µ and shape parameter α.363
µ = β0 + β1 z, while α is the same for all z.
For the Gamma Distribution as per Loss Models, f(y) = θ^-α y^(α-1) e^(-y/θ) / Γ(α).
ln f(y) = (α-1)ln(y) - y/θ - αln(θ) - ln(Γ(α)) = (α-1)ln(y) - y/(µ/α) - αln(µ/α) - ln(Γ(α))
= (α-1)ln(y) - αy/µ - αln(µ) + αln(α) - ln(Γ(α)).
361
Note that we do not need to determine σ, in order to estimate the mean. How to fit σ will be discussed below.
362
See “Mahler’s Guide to Regression” or Economic Models and Economic Forecasts by Pindyck and Rubinfeld.
363
The mean is αθ, for the Gamma Distribution as per Loss Models.

The loglikelihood is the sum of the contributions from the three observations:
(α-1){ln(1) + ln(2) + ln(3)} - α{1/(β0 + β1) + 2/(β0 + 2β1) + 9/(β0 + 3β1)}
- α{ln(β0 + β1) + ln(β0 + 2β1) + ln(β0 + 3β1)} + 3αln(α) - 3 ln(Γ(α)).

To maximize the loglikelihood, we set its partial derivatives equal to zero.


Setting the partial derivative with respect to β0 equal to zero:
0 = α{1/(β0 + β1)2 + 2/(β0 + 2β1)2 + 9/(β0 + 3β1)2} - α{1/(β0 + β1) + 1/(β0 + 2β1) + 1/(β0 + 3β1)}.
⇒ 1/(β0 + β1)2 + 2/(β0 + 2β1)2 + 9/(β0 + 3β1)2 = 1/(β0 + β1) + 1/(β0 + 2β1) + 1/(β0 + 3β1).
Setting the partial derivative with respect to β1 equal to zero:
0 = α{1/(β0 + β1)2 + 4/(β0 + 2β1)2 + 27/(β0 + 3β1)2} - α{1/(β0 + β1) + 2/(β0 + 2β1) + 3/(β0 + 3β1)}.
⇒ 1/(β0 + β1)2 + 4/(β0 + 2β1)2 + 27/(β0 + 3β1)2 = 1/(β0 + β1) + 2/(β0 + 2β1) + 3/(β0 + 3β1).

Solving these two equations in two unknowns: β0 = -1.79927 and β1 = 2.74390.364


µ = -1.79927 + 2.74390z.
For z = 1, µ = .94463. For z = 2, µ = 3.68853. For z = 3, µ = 6.43243.
This differs from what was obtained when one assumed Y was Normal rather than Gamma!

Although it is not needed in order to estimate the means, one can solve for α by maximizing
the loglikelihood via computer. The fitted α = 7.00417.

This model should be interpreted as follows. For a given value of z, Y is Gamma Distributed
with mean = -1.79927 + 2.74390z, and α = 7.00417 independent of z. For example, for z = 3,
the mean = 6.43243 and α = 7.00417. This implies that for z = 3 the scale parameter of the
Gamma is θ = 6.43243/7.00417 = .91837, and the variance of the Gamma is:365
(7.00417)(.918372) = 5.9073.

Thus for z = 3, the expected value of Y is 6.43243. However, due to random fluctuation, for
z = 3 we will observe values of Y varying around the expected value of 6.43243. If we make a
very large number of observations of individuals with z = 3, then we expect to observe a
Gamma Distribution of outcomes with mean 6.43243 and variance 5.9073.

364
I used a computer to solve these two equations. When fitting Generalized Linear Models via maximum likelihood,
one almost always needs to use a computer. There are specialized commercial software packages which are
specifically designed to work with Generalized Linear Models. See http://www.statsci.org/glm/software.html.
365
For a Gamma Distribution as per Loss Models, the variance is αθ2.
Since the mean is αθ, the variance = mean2/α.

Poisson Distribution:

For this same example, instead assume that Y is Poisson, with mean µ.366
µ = β0 + β1 z.
For the Poisson Distribution as per Loss Models, f(y) = e−λ λy / y!.
ln f(y) = -λ + yln(λ) - ln(y!) = -µ + yln(µ) - ln(y!).

The loglikelihood is the sum of the contributions from the three observations:
-(β0 + β1) - (β0 + 2β1) - (β0 + 3β1) + ln(β0 + β1) + 2ln(β0 + 2β1) + 9ln(β0 + 3β1) - ln(1) - ln(2) -
ln(9!).

To maximize the loglikelihood, we set its partial derivatives equal to zero.


Setting the partial derivative with respect to β0 equal to zero:
0 = -3 + 1/(β0 + β1) + 2/(β0 + 2β1) + 9/(β0 + 3β1).
Setting the partial derivative with respect to β1 equal to zero:
0 = -6 + 1/(β0 + β1) + 4/(β0 + 2β1) + 27/(β0 + 3β1).

Solving these two equations in two unknowns: β0 = -12/5 = -2.4 and β1 = 16/5 = 3.2.367
µ = -2.4 + 3.2z.
For z = 1, µ = 0.8. For z = 2, µ = 4.0. For z = 3, µ = 7.2.
This differs from what was obtained when one assumed Y was Normal rather than Poisson!

This model should be interpreted as follows. For a given value of z, Y is Poisson Distributed
with mean = -2.4 + 3.2z. For example, for z = 3, the mean = 7.2. However, due to random
fluctuation, for z = 3 we will observe values of Y varying around the expected value of 7.2. If we
make a very large number of observations of individuals with z = 3, then we expect to observe
a Poisson Distribution of outcomes with mean 7.2.
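
One can verify this solution numerically; here is a sketch of mine that solves the two likelihood equations above with scipy (the starting values are arbitrary):

import numpy as np
from scipy.optimize import fsolve

z = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 9.0])

def score(params):
    # partial derivatives of the Poisson loglikelihood (identity link) set to zero
    b0, b1 = params
    mu = b0 + b1 * z
    return [np.sum(y / mu - 1.0), np.sum(z * (y / mu - 1.0))]

b0, b1 = fsolve(score, x0=[-1.0, 2.0])
print(b0, b1)      # about -2.4 and 3.2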

Here is a comparison of the results of the three fitted models:

z Observed Normal Poisson Gamma


1 1 0 0.8 0.945
2 2 4 4.0 3.689
3 9 8 7.2 6.432

The Poisson assumes that the variance increases with the mean, and therefore less weight is
given to the error related to the third observation. Therefore, the Poisson model is less affected
by the observation of 9, than is the Normal model. The Gamma assumes that the variance
increases with the square of the mean, and therefore even less weight is given to the error
related to the third observation. Therefore, the Gamma model is even less affected by the
observation of 9, than is the Poisson model.
366
In the case of a Poisson, there are no additional parameters beyond the mean.
367
I used a computer to solve these two equations. One can confirm that these values satisfy these equations.

Using a Different Link Function:*

In this example, let us maintain the assumption of a Poisson Distribution, but instead of the
identity link function let us use the log link function.

ln(µ) = Σβizi = β0 + β1z. ⇒ µ = exp[Σβizi] = exp[β0 + β1z].

f(y) = e−λ λy / y!.


ln f(y) = -λ + yln(λ) - ln(y!) = -µ + yln(µ) - ln(y!) = -exp[β0 + β1z] + y(β0 + β1z) - ln(y!).

The loglikelihood is the sum of the contributions from the three observations:
-exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + β0 + β1 + 2(β0 + 2β1) + 9(β0 + 3β1) - ln(1) - ln(2) -
ln(9!).

To maximize the loglikelihood, we set its partial derivatives equal to zero.


Setting the partial derivative with respect to β0 equal to zero:
0 = -exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + 12.
Setting the partial derivative with respect to β1 equal to zero:
0 = -exp[β0 + β1] - 2exp[β0 + 2β1] - 3exp[β0 + 3β1] + 32.

Thus we have two equations in two unknowns:


exp[β0 + β1]{1 + exp[β1] + exp[2β1]} = 12.
exp[β0 + β1]{1 + 2exp[β1] + 3exp[2β1]} = 32.

Dividing the second equation by the first equation:


{1 + 2exp[β1] + 3exp[2β1]}/{1 + exp[β1] + exp[2β1]} = 8/3.
⇒ exp[2β1] - 2exp[β1] - 5 = 0.

Letting v = exp[β1], this equation is: v2 - 2v - 5 = 0, with positive solution v = 1 + √6 = 3.4495.


exp[β1] = 3.4495. ⇒ β1 = 1.238.
⇒ exp[β0] = 12/{exp[β1] + exp[2β1] + exp[3β1]} = 12/{3.4495 + 3.44952 + 3.44953} = .2128.
⇒ β0 = -1.547.

µ = exp[β0 + β1z] = exp[β0] exp[β1]^z = (.2128)(3.4495^z).


For z = 1, µ = .734. For z = 2, µ = 2.532. For z = 3, µ = 8.735.
This differs from the result obtained previously when using the identity link function:

z Observed Poisson, Identity Link Poisson, Log Link Function


1 1 0.8 0.734
2 2 4.0 2.532
3 9 7.2 8.735
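
A brief numeric check of these log link calculations (a Python sketch of mine, assuming numpy):

import numpy as np
v = 1.0 + np.sqrt(6.0)                      # positive root of v^2 - 2v - 5 = 0
beta1 = np.log(v)                           # about 1.238
beta0 = np.log(12.0 / (v + v**2 + v**3))    # about -1.547
z = np.array([1.0, 2.0, 3.0])
mu = np.exp(beta0 + beta1 * z)              # about (0.734, 2.532, 8.735)
print(mu)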

A Two Dimensional Example of Generalized Linear Models:*368

Let us assume we have two types of drivers, male and female, and two territories, urban and
rural. Then there are a total of four combinations of gender and territory.

Let us assume that we have the following observed pure premiums:369

Urban Rural
Male 800 500
Female 400 200

Let us assume the following generalized linear model:


Gamma Distribution
Reciprocal link function370
z1 = 1 if male.
z2 = 1 if female.
z3 = 1 if urban and z3 = 0 if rural.

Then 1/µ = Σβizi = β1z1 + β2z2 + β3z3. ⇒ µ = 1/(β1z1 + β2z2 + β3z3).


Therefore, the modeled means are:

Urban Rural
Male 1/(β1 + β3) 1/β1
Female 1/(β2 + β3) 1/β2

For the Gamma Distribution as per Loss Models, f(y) = θ^-α y^(α-1) e^(-y/θ) / Γ(α).
ln f(y) = (α-1)ln(y) - y/θ - αln(θ) - ln(Γ(α)) = (α-1)ln(y) - y/(µ/α) - αln(µ/α) - ln(Γ(α))
= (α-1)ln(y) - αy/µ - αln(µ) + αln(α) - ln(Γ(α))
= (α-1)ln(y) - αy(β1z1 + β2z2 + β3z3) + αln(β1z1 + β2z2 + β3z3) + αln(α) - ln(Γ(α)).

The loglikelihood is the sum of the contributions from the four observations:
(α-1){ln(800) + ln(400) + ln(500) + ln(200)}
- α{800(β1 + β3) + 400(β2 + β3) + 500β1 + 200β2}
+ α{ln(β1 + β3) + ln(β2 + β3) + ln(β1) + ln(β2)}} + 4αln(α) - 4 ln(Γ(α)).

368
See pages 24 to 28 and Appendix F of “A Practitioners Guide to Generalized Linear Models,” by Duncan
Anderson, Sholom Feldblum, Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi, in the
2004 CAS Discussion Paper Program. Note that the same data I use is there assumed to be claim severities.
369
For simplicity assume that for each cell we have the same number of exposures. If the exposures varied by cell,
then modifications could be made to the generalized linear model in order to take into account the volume of data by
cell. All values are for illustrative purposes only.
370
One could instead use the log link function, and obtain somewhat different results.

To maximize the loglikelihood, we set its partial derivatives equal to zero.


Setting the partial derivative with respect to β1 equal to zero:
0 = -α(800 + 500) + α{1/(β1 + β3) + 1/β1}. ⇒ 1/(β1 + β3) + 1/β1 = 1300.
Setting the partial derivative with respect to β2 equal to zero:
0 = -α(400 + 200) + α{1/(β2 + β3) + 1/β2}. ⇒ 1/(β2 + β3) + 1/β2 = 600.
Setting the partial derivative with respect to β3 equal to zero:
0 = -α(800 + 400) + α{1/(β1 + β3) + 1/(β2 + β3)}. ⇒ 1/(β1 + β3) + 1/(β2 + β3) = 1200.

Solving these three equations in three unknowns:371


β1 = .00223804, β2 = .00394964, and β3 = -.00106601.
µ = 1/(.00223804z1 + .00394964z2 - .00106601z3).

For Male and Urban: z1 = 1, z2 = 0, z3 = 1, and µ = 1/(.00223804 - .00106601) = 853.22.


For Female and Urban: z1 = 0, z2 = 1, z3 = 1, and µ = 1/(.00394964 - .00106601) = 346.79.
For Male and Rural: z1 = 1, z2 = 0, z3 = 0, and µ = 1/.00223804 = 446.82.
For Female and Rural: z1 = 0, z2 = 1, z3 = 0, and µ = 1/.00394964 = 253.19.

The fitted pure premiums by cell are:372

Urban Rural Average


Male 853.22 446.82 650.02
Female 346.79 253.19 299.99
Average 600.00 350.00 475.00

This compares to the observed pure premiums by cell:

Urban Rural Average


Male 800 500 650
Female 400 200 300
Average 600 350 475

Notice how subject to rounding, the averages for male, female, urban, and rural are equal for
the fitted and observed. The overall experience of each class and territory has been
reproduced by the model.373
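
A sketch of mine that solves these three equations numerically with scipy (the starting values are rough guesses):

import numpy as np
from scipy.optimize import fsolve

def equations(b):
    b1, b2, b3 = b
    return [1.0/(b1 + b3) + 1.0/b1 - 1300.0,
            1.0/(b2 + b3) + 1.0/b2 - 600.0,
            1.0/(b1 + b3) + 1.0/(b2 + b3) - 1200.0]

b1, b2, b3 = fsolve(equations, x0=[0.002, 0.004, -0.001])
# b1, b2, b3 are roughly 0.0022380, 0.0039496, -0.0010660
print(1.0/(b1 + b3), 1.0/b1, 1.0/(b2 + b3), 1.0/b2)   # fitted pure premiums by cell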

371
I used a computer to solve these three equations.
There is no need to solve for α in order to calculate the fitted pure premiums by cell.
372
The averages were computed assuming the same number of exposures by cell.
373
This is an example of a relationship between generalized linear models and “Minimum Bias” Methods.
See “A Systematic Relationship Between Minimum Bias and Generalized Linear Models”, by Stephen J. Mildenhall,
PCAS 1999. The Gamma with a log link function, rather than the reciprocal link function, is equivalent to one of the
Minimum Bias Methods considered by Mildenhall.

Common Link Functions:

What Loss Models calls η is commonly called the link function. In the actuarial applications of
the Generalized Linear Model you are likely to read about, what Loss Models calls c is taken
as the identity function. η(µ) = βz. ⇔ µ = η−1(βz).

Common link functions to use include:

Identity      η(µ) = µ                η−1(y) = y                    µ = βz
Log           η(µ) = ln(µ)            η−1(y) = e^y                  µ = e^(βz)
Logit374      η(µ) = ln(µ/(1 - µ))    η−1(y) = e^y/(e^y + 1)        µ = e^(βz)/(e^(βz) + 1)
Reciprocal    η(µ) = 1/µ              η−1(y) = 1/y                  µ = 1/(βz)

It is common to pick the form of the variable X, to be a member of a linear exponential family.375
In that case, there are corresponding commonly chosen link functions.376 These are referred to
as the “canonical link functions”.
Distribution Form Canonical Link Function

Normal377 Identity

Poisson378 Log: ln(µ)

Gamma379 Reciprocal: 1/µ

Bernoulli380 Logit: ln(µ/(1 - µ))

Inverse Gaussian 1/µ2

374
Used in Example 12.72 in Loss Models. Useful when 0 < µ < 1, since then -∞ < ln(µ/(1 - µ)) < ∞.
375
See “Mahler’s Guide to Conjugate Priors” for a discussion of linear exponential families.
Exponential families can be used in Generalized Linear Models, including a “dispersion parameter”.
376
While these choices make it easier to fit the generalized linear model, they are not required.
377
For example, ordinary linear regression.
378
Could be used to model claim frequencies or claim counts.
379
Could be used to model claim severities. In that case, one could use the log link function, ln(µ).
380
Could be used to model probability of policy renewal.
The use of the logit link function and with the Bernoulli is the idea behind logistic regression.

Actuarial Applications:*381

Generalized Linear Models can be used for many purposes. Among the applications to
actuarial work are: determining classification relativities, loss reserving, and studying policy
renewal rates.

Further Reading:*382

“A Practitioners Guide to Generalized Linear Models,” by Duncan Anderson, Sholom


Feldblum, Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi,
in the 2004 CAS Discussion Paper Program.

“Something Old, Something New in Classification Ratemaking with a Novel Use of GLMs for
Credit Insurance”, by K. D. Holler, D. B. Sommer, and G. Trahair, CAS Forum, Winter 1999,
formerly on the Syllabus for CAS Part 9.

“Using Generalized Linear Models to Build Dynamic Pricing Systems”, by Karl P. Murphy,
Michael J. Brockman, and Peter K. Lee, CAS Forum, Winter 2000.

“A Systematic Relationship Between Minimum Bias and Generalized Linear Models”,


by Stephen J. Mildenhall, PCAS 1999.

381
See for example the section for Insurance Applications at http://www.statsci.org/glm/bibliog.html
382
For a list of books, see for example http://www.statsci.org/glm/books.html

Problems:

Use the following information for the next two questions:


X: 1 5 10 25
Y: 5 15 50 100
Y1, Y2, Y3, Y4 are independently Normally distributed with means µi = βXi, i = 1, 2,...,4, and
common variance σ2.

^
44.1 (2 points) Determine β via maximum likelihood.
(A) 3.9 (B) 4.0 (C) 4.1 (D) 4.2 (E) 4.3

^
44.2 (3 points) Estimate the standard deviation of β.
(A) 0.15 (B) 0.20 (C) 0.25 (D) 0.30 (E) 0.35

44.3 (1 point) Which of the following statements are true?

1. If the errors are Normally distributed, then the method of least squares produces the same fit
as the method of maximum likelihood.
2. Ordinary Linear Regression is a special case of Generalized Linear Models.
3. Weighted Least Squares Regression is a special case of Generalized Linear Models.
A. 1 B. 2 C. 3 D. 1, 2, 3 E. None of A, B, C, or D

44.4 (4 points) Assume a set of three observations:


For z = 1, we observe 4. For z = 2, we observe 7. For z = 3, we observe 8.
Fit to these observations a Generalized Linear Model with a Poisson Distribution and a log link
function. In other words, assume that each observation is a Poisson random variable, with
mean λ and ln(λ) = β0 + β1z.

Use the following information for the next five questions:


X 2 5 8 9
Y 10 6 11 13
Y1, Y2, Y3, Y4 are independently Normally distributed with means µi = β0 + β1Xi, i = 1, 2,...,4,
and common variance σ2.

^
44.5 (2 points) Determine β 1 via maximum likelihood.
(A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5

^
44.6 (2 points) Determine β 0 via maximum likelihood.
(A) 4 (B) 6 (C) 7 (D) 8 (E) 9

44.7 (2 points) Determine σ^ via maximum likelihood.


(A) 1.0 (B) 1.5 (C) 2.0 (D) 2.5 (E) 3.0

^
44.8 (3 points) Estimate the standard deviation of β 1.
(A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7

^
44.9 (3 points) Estimate the standard deviation of β 0.
(A) 2.5 (B) 3.0 (C) 3.5 (D) 4.0 (E) 4.5

44.10 (8 points) You have the following data on reported occurrences of a communicable
disease in two areas of the country at 2 month intervals:
Months Area A Area B
2 8 14
4 8 19
6 10 16
8 11 21
10 14 23
12 17 27
14 13 28
16 15 29
18 17 33
20 15 31
Let X1 = ln(months). Let X2 = 0 for Area A and 1 for Area B.
Assume the number of occurrences Yi are Poisson variables with means µi, and
ln(µi) = β0 + β1X1i + β2X2i.
Set up the equations to be solved in order to fit this model via maximum likelihood.

44.11 (5 points) You have the following data on the renewal of homeowners insurance
policies with the ABC Insurance Company:
Number of Year Insured Number of Policies Number of Policies Renewed
1 1000 900
2 900 820
3 800 740
4 700 660
5 600 580
Let X = number of years insured with ABC Insurance Company.
Assume the number of renewals is Binomial with expected rate of renewal p.
Further you assume that ln[p/(1-p)] = β0 + β1X.
Determine the equations to be solved in order to fit this model via maximum likelihood.

44.12 (4, 11/04, Q.34) (2.5 points) You are given:


(i) The ages and number of accidents for five insureds are as follows:
Insured X = Age Y = Number of Accidents
1 34 2
2 38 1
3 45 0
4 25 3
5 21 3
Total 163 9
(ii) Y1, Y2,..., Y5 are independently Poisson distributed with means µi = βXi, i = 1, 2,...,5.
^
Estimate the standard deviation of β.
(A) Less than 0.015
(B) At least 0.015, but less than 0.020
(C) At least 0.020, but less than 0.025
(D) At least 0.025, but less than 0.030
(E) At least 0.030

Section 45, Important Ideas and Formulas


Fitting a Straight Line with No Intercept (Section 1)

Least squares fit to the linear model with no intercept, Y = β X + ε :


^
β = Σ X i Y i /Σ X i 2 .

Fitting a Straight Line with an Intercept (Section 2)

Two-variable regression model ⇔ 1 independent variable and 1 intercept.


Yi = α + β Xi + εi

Ordinary least squares regression: minimize the sum of the squared differences
between the estimated and observed values of the dependent variable.

^ ^
estimated slope = β = {NΣ X iY i - Σ X iΣ Y i }/ {NΣ X i2 - (Σ X i)2 }. α^ = Y − β X.

To convert a variable to deviations form, one subtracts its mean.


A variable in deviations form is written with a small rather than capital letter.
xi = Xi - X . Variables in deviations always have a mean of zero.

In deviations form, the least squares regression to the two-variable (linear)


regression model, Y i = α + βX i + εi, has solution:
^ ^
β = Σ xiyi /Σ xi2 = ΣxiYi /Σxi2. α^ = Y − β X..

^
β = Cov[X, Y] / Var[X] = r sY/sX.

Provided you are given the individual data rather than the summary statistics, the allowed
electronic calculators will fit a least squares straight line with an intercept.

Residuals (Section 3)

^
Residual = actual - estimated. ⇔ ^εi ≡ Yi - Yi .

For the linear regression model with an intercept, the sum of the residuals is always zero.

The sum of squared errors is referred to the Error Sum of Squares or ESS.
^
ESS ≡ Σ ^εi 2 = Σ (Y i - Yi ) 2 .

^
Corr[ ε^ , X] = 0. Corr[ Y - Y , ε^ ] = 0.

Dividing the Sum of Squares into Two Pieces (Section 4)

Σ (X i - X) 2 /(N - 1) ⇔ the sample variance of X, an unbiased estimator of the


underlying variance, when the underlying mean is unknown.

Total Sum of Squares ≡ TSS ≡ Σ (Y i - Y)2 = Σ y i2 .


TSS = numerator of the sample variance of Y.

^
Regression Sum of Squares ≡ RSS ≡ Σ ( Yi - Y) 2 .

TSS = RSS + ESS.


The total variation has been broken into two pieces: that explained by the
regression model, RSS, and that unexplained by the regression model, ESS.

Source of Variation Sum of Squares Degrees of Freedom


Model RSS k-1
Error ESS N-k

Total TSS N-1


Where N is the number of points, and k is the number of variables including the
intercept (k = 2 for the two-variable model with one slope and an intercept.)

R-Squared (Section 5)

R 2 = RSS/TSS = 1 - ESS/TSS = 1 - Σ ^εi 2 /Σ y i2 .


R-Squared is the percentage of variation explained by the regression model.
0 ≤ R2 ≤ 1. (Not applicable to a model with no intercept.)

Large R2 ⇔ good fit, but not necessarily a good model.


As one adds more variables, R2 increases.

For the Two Variable Model:


^ ^
RSS = β 2 Σxi2= β Σxiyi = SXY2/SXX.
^
R2 = RSS/TSS = β 2 Σxi2 /Σyi2 = Corr[X, Y]2.

Corrected R-Squared (Section 6)

2
Corrected R 2 = R = 1 - (1 - R2 )(N - 1)/(N - k), where N is the number of observations,
2
and k is the number of variables including the intercept. R ≤ R 2 .
One can usefully compare the corrected R2’s of different regressions. Larger is better.

Normal Distribution (Section 7)

F(x) = Φ((x − µ)/σ).
f(x) = φ((x − µ)/σ) = (1/(σ√(2π))) exp(-[(x - µ)²]/[2σ²]).
Mean = µ. Variance = σ2. Skewness = 0 (distribution is symmetric).
The sum of two independent Normal Distributions is also a Normal Distribution.
If X is normally distributed, then so is aX + b.

Let X1, X2, ..., Xn be a series of independent, identically distributed variables, with finite mean
and variance. Let X n = average of X1, X2, ..., Xn.
Then as n approaches infinity, X n approaches a Normal Distribution.

Assumptions of Linear Regression (Section 8)

Five assumptions are made in the Classical Linear Regression Model:


1. Yi = α + βXi + εi.
2. Xi are known fixed values.
3. E[ε] = 0.
4. Var[εi] = σ2 for all i.
5. εi and εj independent for i ≠ j.
If we add an assumption, we get the Classical Normal Linear Regression Model:
6. The error terms are Normally Distributed.

Therefore, Yi is Normally Distributed with mean α + βXi and variance σ2.

In the multivariate version, we add an assumption that no exact linear relationship exists
between two or more of the independent variables.

Properties of Estimators (Section 8)

The Bias of an estimator is the expected value of the estimator minus the true
value. An unbiased estimator has a Bias of zero. For an asymptotically unbiased
estimator, as the number of data points, n → ∞, the bias approaches zero.

When based on a large number of observations, a consistent estimator, has a


very small probability that it will differ by a large amount from the true value.

The mean square error (MSE) of an estimator is the expected value of the squared
difference between the estimate and the true value.
The smaller the MSE, the better the estimator, all else equal.
The mean squared error is equal to the variance plus the square of the bias.
Thus for an unbiased estimator, the mean square error is equal to the variance.

An unbiased estimator is efficient if for a given sample size it has the smallest
variance of any unbiased estimator.

The least squares estimator of the slope is unbiased.


The ordinary least squares estimator is consistent, even with heteroscedasticity
and/or serial correlation of errors.

Gauss-Markov Theorem: the ordinary least squares estimator are the best linear
unbiased estimator, BLUE; they have the smallest variance among linear
unbiased estimators. With heteroscedasticity and/or serial correlation of errors, the
ordinary least squares estimator is not efficient.

Variances and Covariances (Section 10)

s 2 = Σ ^εi 2 / (N - k) = ESS/(N - k) = estimated variance of the regression.


s is called the standard error of the regression.

sα^ = √ Var[ α^ ] = standard error of the estimate of α.


^
sβ^ = √ Var[ β] = standard error of the estimate of β.

For the two-variable model:


^
Var[ α^ ] = s2 Σ X i2 /(NΣ x i2 ). Var[ β ] = s2 /Σ x i2 .
^ ^
Cov[ α^ , β] = -s2 X /Σxi2. Corr[ α^ , β] = - X /√(E[X2]).
If X > 0, then the estimates of α and β are negatively correlated.

^
β is Normally Distributed, with mean β and variance σ2/Σxi2.

α^ is Normally Distributed, with mean α and variance E[X2]σ2/Σxi2.


^
α^ and β are jointly Bivariate Normally Distributed, with correlation - X /√(E[X2]).

^
If we simulated the Y values many times, we would get a set of many different values for β;
^ ^
Var[ β] measures the variance of β around its expected value of β.

t-distribution (Section 11)

Support: -∞ < x < ∞. Parameters: ν = positive integer.
Mean = 0 Skewness = 0 (symmetric)
Mode = 0 Median = 0
As ν → ∞ , the t-Distribution → the Standard Normal Distribution.

t-test (Section 12)

For a confidence interval for the mean with probability 1 - α, take X ± t√(S2/n), where S2 is the
sample variance and t is the critical value for the t-distribution with n-1 degrees of freedom and
α area in both tails.

Confidence Intervals for Estimated Parameters (Section 13)

To get a confidence interval for a regression parameter with probability 1 - α ,


one uses the critical value for t-distribution with N-k degrees of freedom and α
^
area in both tails. The confidence interval is: β ± t sβ^ .

F-Distribution (Section 14)

Support: 0 < x < ∞ .


Parameters:
ν 1 ⇔ number of degrees of freedom associated with the numerator
⇔ columns of table.
ν 2 ⇔ number of degrees of freedom associated with the denominator
⇔ rows of table.

Generally an F-Statistic will involve in the numerator some sort of sum of


squares divided by its number of degrees of freedom. In the denominator will be
another sum of squares divided by its number of degrees of freedom.

For the tests applied to regression models, one applies a one-sided F-test.

Prob[F-Distribution with 1 and ν degrees of freedom > c2] =


Prob[absolute value of t-distribution with ν degrees of freedom > c].

Testing the Slope, Two Variable Model (Section 15)

Most Common t-test for the 2-variable model:


1. H0 : β = 0. H 1 : β ≠ 0.
^
2. t = β / sβ^ .
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the
t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

General t-test, 2-variable model:


1. H0: a particular regression parameter takes on certain value b. H1: H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the
t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

Applying the F-Test to a single slope is equivalent to the t-test with t = √ F.

For the 2-variable model, F = (N - 2)R2 /(1 - R2 ) = (N - 2)RSS/ESS,


with 1 and N - 2 degrees of freedom.

Hypothesis Testing (Section 16)

One tests the null hypothesis H0 versus an alternative hypothesis H1 .


Hypothesis tests are set up to disprove something, H0, rather than prove anything.

A hypothesis test needs a test statistic whose distribution is known.


critical values ⇔ the values used to decide whether to reject H0 ⇔
boundaries of the critical region other than ±∞.
critical region ⇔ if test statistic is in this region then we reject H0.

The significance level, α, of the test is a probability level selected prior to performing the test.
If given the value of the test statistic, the probability that H0 is true is less than the significance
level chosen, then we reject the H0. If not, we do not reject H0.

p-value = probability of rejecting H0 even though it is true


= probability of a Type I error
= Prob[test statistic takes on a value equal to its calculated value or
a value less in agreement with H0 (in the direction of H1 ) | H0 ].
If the p-value is less than the chosen significance level, then we reject H0 .
When applying hypothesis testing to test the fit of a model, the larger the
p-value the better the fit.

Type I Error ⇔ Reject H0 when it is true.


Type II Error ⇔ Do not reject H0 when it is false.

Rejecting H0 at a significance level of α ⇔ the probability of a Type I error is less than α.

Power of the test ≡ probability of rejecting H0 when it is false = 1 - Prob[Type II error].


The larger the data set, the more powerful a given test.

Three Variable Regression Model (Section 18)

Y = β1 + β2X2 + β3X3 + ε.
β̂2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}.
β̂3 = {Σx3iyi Σx2i2 - Σx2iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}.
β̂1 = Ȳ - β̂2 X̄2 - β̂3 X̄3.

s2 = ESS/(N - k) = ESS/(N - 3).

Var[β̂2] = s2/{(1 - rX2X3²)Σx2i2}.
Var[β̂3] = s2/{(1 - rX2X3²)Σx3i2}.

Cov[β̂2, β̂3] = - rX2X3 s2 / {(1 - rX2X3²)√(Σx2i2Σx3i2)}.
Corr[β̂2, β̂3] = Cov[β̂2, β̂3] / √(Var[β̂2]Var[β̂3]) = - rX2X3.

Var[β̂1] = s2{ΣX2i2 ΣX3i2 - (ΣX2iX3i)2}/{NΣx2i2Σx3i2(1 - rX2X3²)}.
Cov[β̂1, β̂2] = s2{ΣX3iΣX2iX3i - ΣX2iΣX3i2}/{NΣx2i2Σx3i2(1 - rX2X3²)}.
Cov[β̂1, β̂3] = s2{ΣX2iΣX2iX3i - ΣX3iΣX2i2}/{NΣx2i2Σx3i2(1 - rX2X3²)}.

Matrix Form of Linear Regression (Section 19)

X = the design matrix, is an N by k matrix in which the first column consists of ones,
corresponding to the constant term in the model,
and the remainder of each row is the values of the independent variables for an observation.

fitted parameters = β̂ = (X’X)-1X’Y.

variance-covariance matrix of fitted parameters = Var[β̂] = s2(X’X)-1.
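
As a small illustration (my own sketch, assuming numpy, and reusing the three observations (1, 1), (2, 2), (3, 9) from the generalized linear model example), these matrix formulas can be computed directly:

import numpy as np
# design matrix: a column of ones for the intercept plus the independent variable
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([1.0, 2.0, 9.0])
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # (X'X)^-1 X'Y; gives (-4, 4)
resid = Y - X @ beta_hat
s2 = np.sum(resid ** 2) / (len(Y) - X.shape[1])     # s^2 = ESS/(N - k)
cov_beta = s2 * np.linalg.inv(X.T @ X)              # s^2 (X'X)^-1
print(beta_hat, s2)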

Tests of Slopes, Multiple Regression Model, (Section 20)

Summary of the general t-test:


1. H0 : a particular regression parameter takes on certain value b.
H1 : H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - k.
5. Compare the absolute value of the t-statistic to the critical values in the
t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

To test the hypothesis that all of the slope coefficients are zero compute the
F-Statistic = {RSS/(k-1)}/{ESS/(N - k)} = {R 2 /(1 - R2 )}(N - k)/(k - 1),
which if H0 is true follows an F-Distribution with ν 1 = k -1 and ν 2 = N - k.

Additional Tests of Slopes (Section 21)

When testing whether some set of slope coefficients are all zero, or whether some linear
relationship holds between two slope coefficients:
F-Statistic = {(ESS R - ESSU R )/q} / {ESSU R /(N - k)}
= {(R2 U R - R2 R )/q}/{(1 - R2 U R )/(N - k)}.
where q is the dimension of the restriction,
R ⇔ restricted model, and UR ⇔ unrestricted model.

Applying the F-Test to a single slope is equivalent to the t-test with t = √ F.

To test the equality of the coefficients of 2 similar regressions fit to different data sets A and B,
F = {(ESSR - ESSU R )/q}/{ESS U R /(N - k)} =
{(ESS C - ESSA - ESSB )/k}/{(ESS A + ESSB )/(N A + NB - 2k)},
with k and NA + NB - 2k degrees of freedom.

Additional Models (Section 22)

By change of variables, for example taking logs, one can convert certain models into those that
are linear in their parameters. Exponential regression: ln[Yi] = α + βXi + εi.
exponential regression ⇔ constant percentage rate of inflation.

The Normal Equations, N linear equations in N unknowns, can be obtained by writing the
expression for the sum of squared errors, and setting equal to zero the partial derivative with
respect to each of the parameters.

Dummy Variables (Section 23)

A dummy variable is one that is discrete rather than continuous.


Most commonly a dummy variable takes on only the values 0 or 1.

Piecewise Linear Regression (Section 24)

Piecewise Linear Regression uses a model made up a series of straight line segments, with
the entire model continuous.

Weighted Regressions (Section 25)

In a weighted regression we weight some of the observations more heavily than


others.

One can perform a weighted regression by minimizing the weighted sum of


squared errors:

^
For the model with no intercept: β = ΣwiXiYi / ΣwiXi2.

One can put weighted regression into deviations form, by subtracting the weighted average
from each variable.
xi = Xi - ΣwiXi yi = Yi - ΣwiYi
^ ^
For the two variable model: β = Σwixiyi /Σwixi2. α^ = ΣwiYi - β ΣwiXi.

Heteroscedasticity (Section 26)

Variances of ε i are all equal ⇔ Homoscedasticity.


Variances of ε i are not all equal ⇔ Heteroscedasticity.

^
If there is heteroscedasticity, then the usual estimator of the variance of β is
biased and inconsistent.
^
For the two variable model, when there is heteroscedasticity, Var[ β] = Σxi2σi2 /(Σxi2)2.

Tests for Heteroscedasticity (Section 27)

The Goldfeld-Quandt Test proceeds as follows:


0. Test H0 that σi2, the variance of εi, is the same for all i.
1. Find a variable that seems to be related to σi2, by graphing the squared residuals, or other
techniques.
2. Order the observations in assumed increasing order of σi2, based on the relationship
from step 1.
3. Run a regression on the first (N - d)/2 observations, with assumed smaller σi2.
4. Run a regression on the last (N - d)/2 observations, with assumed larger σi2.
5. (ESS from step 4)/(ESS from step 3) has an F-Distribution,
with (N - d)/2 - k and (N - d)/2 - k degrees of freedom.

The Breusch-Pagan Test proceeds as follows:


0. Test H0 that σi2, the variance of εi, is the same for all i.
1. Find a variable(s) that seems to be related to σi2, by graphing the squared residuals, or
other techniques.
2. Run the assumed regression model. Note the residuals ^εi and let σ2 = ESS/N.
3. Run a regression of ^εi 2/σ2 from step 2 on the variable(s) from step 1.
4. RSS/2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal
to the number of variables from step 1, not counting an intercept.

The White Test proceeds as follows:


0. Test H0 that σi2, the variance of εi, is the same for all i.
1. Find a variable(s) that seems to be related to σi2, by graphing the squared residuals, or
other techniques.
2. Run the assumed regression model. Note the residuals.
3. Run a regression of ^εi 2 from step 2 on the variable(s) from step 1.
4. N R2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal
to the number of variables from step 1, not counting an intercept.

Correcting for Heteroscedasticity (Section 28)

In order to adjust for heteroscedasticity, we use a weighted regression, in which


σ i2 , the inverse of
we weight each observation by wi, with wi proportional to 1/σ
the variance of the error ε i.

For the model with no intercept:


^
β = Σ ( X i/σ σ i) 2 = Σ w iX iY i / Σ w iX i2 , where wi = (1/σ
σ i) / Σ ( X i/σ
σ i) ( Y i/σ σ i2 )/ Σ (1/σ
σ j2 ) .

Heteroscedasticity-consistent estimators (HCE), provide unbiased, and consistent estimators


of variances of estimated parameters, when heteroscedasticity is present.
^
Var[ β] ≅ Σxi2 ^εi 2 / ( Σxi2)2.

Serial Correlation (Section 29)

ρ = Corr[εε t-1 , ε t ]. If ρ > 0, then we have positive (first order) serial correlation.
If ρ < 0, then we have negative (first order) serial correlation.
ρ > 0 ⇔ successive residuals tend to be alike.
ρ < 0 ⇔ successive residuals tend to be unalike.

εt = ρεt-1 + νt. Var[εt] = σε2 = σν2/(1 - ρ2).

Effects of Positive Serial Correlation on Ordinary Regression Estimators:


1. Still unbiased.
2. Still consistent.
3. No longer efficient.
4. Standard error of regression is biased downwards.
5. Overestimate precision of estimates of model coefficients.
6. Some tendency to reject H0: β = 0, when one should not.
7. R2 is biased upwards.

Durbin-Watson Statistic (Section 30)

DW = Σ ( ε^t - ε^t−1) 2 / Σ ε^t 2 .


no serial correlation ⇔ DW near 2.
positive serial correlation ⇔ DW small.
negative serial correlation ⇔ DW large.
DW ≅ 2(1 - ρ).
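
A short sketch of mine of this computation (assuming numpy; the residuals shown are illustrative values only):

import numpy as np

def durbin_watson(residuals):
    # DW = sum of squared successive differences of the residuals,
    # divided by the sum of the squared residuals
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# strongly positively correlated residuals give a DW well below 2
print(durbin_watson([1.0, 0.8, 0.9, 1.1, 0.7, -0.2, -0.5, -0.9, -1.0, -0.8]))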

When there is a lagged dependent variable, the Durbin-Watson test is biased towards not
rejecting the null hypothesis of no serial correlation.

When there is a lagged dependent variable with coefficient β, then the Durbin h-statistic:
^
h = (1 - DW/2)√{N/(1 - N Var[ β])}, where N is the number of observations and DW is the Durbin-
Watson Statistic, has a Standard Normal Distribution if H0: no serial correlation, is true.
^
This is not valid if N Var[ β] ≥ 1.

A second technique for dealing with the situation with a lagged dependent variable, involves

fitting a regression to the residuals using lagged dependent variable ε^ t −1. We then apply the

usual t-test to ρ, the coefficient of ε^ t −1. If ρ is significantly different from zero, then we reject the
null hypothesis of no serial correlation.

Correcting for Serial Correlation (Section 31)

The Hildreth-Lu procedure:


0. Choose a grid of likely values for -1 ≤ ρ ≤ 1. For each such value of ρ:
1. Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1.
2. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
3. The best regression has the smallest Error Sum of Squares (ESS).
If desired, then refine the grid of values for ρ, and again perform steps 1, 2, and 3.
Translate the transformed equation back to the original variables: Yt = α + βXt.

The Cochrane-Orcutt procedure:


1. Fit a linear regression and get the resulting residuals ε^t .

2. Estimate the serial correlation coefficient: ρ = Σ ε^ t −1 ε^t / Σ ε^ t −12.


3. Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1.
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the

resulting residuals ε^t .

6. Estimate the serial correlation coefficient: ρ = Σ ε^ t −1 ε^t / Σ ε^ t −12.


7. Unless the value of ρ seems to have converged or enough iterations have been
performed, return to step 3.

Multicollinearity (Section 32)

In multiple regression, there is a high degree of multicollinearity when some of the


independent variables or combinations of independent variables are highly correlated.
A high degree of multicollinearity, usually leads to unreliable estimates of the regression
parameters.

Forecasting (Section 33)

Variance of the forecast of the expected value at x = X - X is: s2 {1/N + x2 / Σ x i2 }.

Mean Squared Forecast Error at x = X - X is: sf 2 = s 2 {1 + 1/N + x2 / Σ x i2 }


= s2 + Variance of forecast at x.

If one estimated the regression from data at time 1, 2, ..., T, and one is forecasting at time T+1,
Mean Squared Forecast Error is: s2{1 + 1/T + (XT+1 - X )2/Σ(Xi - X )2}.

The mean squared forecast error is smallest when we try to predict the value of the dependent
variable at the mean of the independent variable.

^
Normalized Forecast Error = λ = ( Y T+1 - YT+1)/sf.
One can use the t-distribution to get confidence intervals for forecasted expected values and
forecasted observed values.

Testing Forecasts (Section 34)

The errors that would result from the repeated use of a procedure is what is referred to when
we discuss the qualities of an estimator.

ex post forecast ⇔ values to be forecast known at time of forecast.


ex ante forecast ⇔ values to be forecast not known at the time of the forecast.

In a conditional forecast, not all of the values of the independent variable(s) are known at the
time of the forecast.

Root Mean Squared Forecast Error =


√ { Σ (forecast i - observationi) 2 /(# of forecasts)}.

U = Theil’s Inequality Coefficient =


(RMS Error) / {√(2nd moment of forecasts) + √(2nd moment of observeds)}.
0 ≤ U ≤ 1. The smaller Theil’s Inequality Coefficient, the better the forecasts.

Bias proportion of U = UM = {(mean forecast) - (mean observation)}2/ MSE.


Variance proportion of U = US = {(stddev of forecasts) - (stddev of observations)}2/ MSE.
Covariance proportion of U = UC =
2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.) / MSE.
UM + US + UC = 1.

Forecasting with Serial Correlation (Section 35)

With serial correlation one successively forecasts ahead one time period at a time:
Forecast for time t+1 = ρ̂(Forecast for time t) + α̂(1 - ρ̂) + β̂(t+1 - ρ̂t)
= ρ̂(β̂ + Forecast for time t) + (1 - ρ̂){α̂ + β̂(t+1)}.

Where α̂, β̂, and ρ̂ have been estimated using one of the procedures to correct for serial
correlation, the Cochrane-Orcutt procedure or Hildreth-Lu procedure.

Standardized Coefficients (Section 36)

Prior to performing the regression, standardize each variable by subtracting its mean and then
dividing each variable by its standard deviation.
For the two variable model: β^* = β^ sX/sY = rXY.

For the three variable model, the standardized slopes can be written in terms of correlations:
β^*2 = (rYX2 - rYX3 rX2X3)/(1 - rX2X3²), and β^*3 = (rYX3 - rYX2 rX2X3)/(1 - rX2X3²).
For the multiple regression model: β^*2 = β^2 sX2/sY, β^*3 = β^3 sX3/sY, etc.
The larger the absolute value of the standardized coefficient, the more important the
corresponding variable is in determining the value of Y.
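A brief Python/numpy sketch (illustrative only): z-scoring every variable and refitting reproduces β^*j = β^j sXj/sY from the unstandardized fit.

```python
import numpy as np

def standardized_slopes(X, y):
    """Standardized regression coefficients: fit the model after z-scoring every variable.
    X is an (N, k) array of independent variables without the intercept column."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    A = np.column_stack([np.ones(len(yz)), Xz])
    coef = np.linalg.lstsq(A, yz, rcond=None)[0]
    return coef[1:]     # equals beta_j * s_Xj / s_Y from the unstandardized regression
```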

Elasticity (Section 37)

The elasticity measures the percent change in the dependent variable for a given percent
change in an independent variable, near the mean value of each variable.
Ej = β^j X̄j / Ȳ. elasticity ≅ (∂Y/∂Xj)(X̄j/Ȳ).
|Ej| large ⇔ Y is responsive to changes in Xj.
For a model estimated in logarithms, the variable coefficients are the elasticities.

Partial Correlation Coefficients (Section 38)

In a multiple regression, the partial correlation coefficient measures the effect of Xj on Y which
is not accounted for by the other variables.
The square of the partial correlation coefficient measures the percentage of the variation of Y
that is accounted for by the part of Xj that is uncorrelated with the other variables.

For the three variable model:
rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX3²)(1 - rX2X3²)}.
rYX3.X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX2²)(1 - rX2X3²)}.
rYX2.X3² = (R² - rYX3²)/(1 - rYX3²). rYX3.X2² = (R² - rYX2²)/(1 - rYX2²).
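These formulas translate directly into code; a small Python/numpy sketch (illustrative names, not from the text):

```python
import numpy as np

def partial_corr_three_var(r_y2, r_y3, r_23):
    """Partial correlations for the three variable model Y = b1 + b2*X2 + b3*X3,
    computed from the simple correlations r_YX2, r_YX3, and r_X2X3."""
    r_y2_given3 = (r_y2 - r_y3 * r_23) / np.sqrt((1 - r_y3**2) * (1 - r_23**2))
    r_y3_given2 = (r_y3 - r_y2 * r_23) / np.sqrt((1 - r_y2**2) * (1 - r_23**2))
    return r_y2_given3, r_y3_given2
```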

Regression Diagnostics (Section 39) *

s(i) = the standard error of the regression excluding observation i. H = X(X’X)-1X’.


studentized residual = ε^i* = ε^i/{s(i)√(1 - hii)}.
An outlier is an observation that is far from the fitted least squares line.
One can compare studentized residuals to the t-distribution in order to spot outliers.

DFBETASi = {β^ - β^(i)}/(s(i)√{[(X'X)-1]2,2}).
When the absolute value of a DFBETAS is larger than 2/√N, then the corresponding
observation has a large effect on the estimated slope.

Cook's D = Di = {Y^ - Y^(i)}'{Y^ - Y^(i)}/(k s²).
A large value of Cook’s D relative to the other values, indicates an influential observation.

Stepwise Regression (Section 40) *

At each stage one adds the independent variable with the largest absolute value of its partial
correlation coefficient with respect to all of the variables already included in the model.
Proceed until no more improvement in R̄² is possible.

Stochastic Explanatory Variables (Section 41) *

The Ordinary Least Squares estimator is no longer (unconditionally) unbiased.

Generalized Least Squares (Section 42) *

The variance-covariance matrix of the errors is σ2Ω, with Ω known.


β~ = (X'Ω-1X)-1 X'Ω-1Y.
The variance-covariance matrix of the fitted parameters is: Cov[β~] = σ²(X'Ω-1X)-1.
An unbiased estimator of σ² is given by: (ε^' Ω-1 ε^)/(N - k).
If Ω is not σ2I, then the use of Ordinary Least Squares rather than Generalized Least Squares
would result in unbiased but inefficient estimates.

Nonlinear Estimation (Section 43)

One can fit a nonlinear model by minimizing the squared errors.

Determining the fitted parameters for a nonlinear model is more difficult than for the linear
case. Among the methods of determining the fitted parameters for a nonlinear model:
solve the Normal Equations, the Iterative Linearization Method, and the Steepest Descent
Method.

Iterative Linearization Method:


There is a nonlinear model with k independent variables and parameters:
Y = f(X1, X2, ..., Xk; β1, β2, ..., βp), which we want to fit by least squares.
Given initial guesses, β1,0, β2,0, ..., βp,0, we construct the dependent variable:
Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σ βi,0 (∂f/∂βi)0,
and the p independent variables: (∂f/∂βi)0.
Then we solve for least squares values of the coefficients βi.
These solutions are used as the next guesses, βi,1.
We iterate until there is convergence.
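A minimal Python/numpy sketch of the iteration just described (illustrative only; f and grad are user-supplied functions returning the fitted values and the matrix of partial derivatives ∂f/∂βi at the current guess):

```python
import numpy as np

def iterative_linearization(f, grad, X, y, beta0, n_iter=20):
    """Fit a nonlinear model Y = f(X; beta) by least squares via iterative linearization.
    grad(X, beta) must return the (N, p) matrix of partial derivatives df/dbeta_i."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(n_iter):
        G = grad(X, beta)                                  # the p constructed independent variables
        y_star = y - f(X, beta) + G @ beta                 # the constructed dependent variable
        beta = np.linalg.lstsq(G, y_star, rcond=None)[0]   # next guesses for the coefficients
    return beta
```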

In the nonlinear case, one can compute R2 as in the linear case.

In the nonlinear case, one can not directly use from the linear case:
s2 = ESS/(N - k), t-test, F-Test, and the formula for the mean squared forecast error.

Generalized Linear Models (Section 44) *

The ordinary linear regression model is a special case of the generalized linear
model.

In the case of Normally distributed errors, fitting by least squares is equivalent


to fitting via maximum likelihood.

F(x | θ , β) = F(x | µ , θ), where µ is such that η(µ) = c(Σ βizi). η is called the link function.

Given a distributional form for F, a link function, and data, one fits the coefficients β via
maximum likelihood.
Mahler’s Guide to
Regression
Solutions to Problems
Sections 1-12

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-L

New England Actuarial Seminars Howard Mahler


POB 315 hmahler@mac.com
Sharon, MA, 02067
www.neas-seminars.com

Solutions to Problems, Sections 1-12

1.1. E. β^ = ΣXiYi/ΣXi² = {(4)(5) + (7)(15) + (13)(22) + (19)(35)}/{4² + 7² + 13² + 19²} =
1076/595 = 1.808.
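For those who want to check such computations by machine, a short Python/numpy verification of this solution (illustrative only):

```python
import numpy as np

# Problem 1.1: slope of a fit with no intercept, beta^ = sum(X*Y)/sum(X^2).
X = np.array([4.0, 7.0, 13.0, 19.0])
Y = np.array([5.0, 15.0, 22.0, 35.0])
beta = np.sum(X * Y) / np.sum(X ** 2)
print(round(beta, 3))    # 1.808
```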

1.2. C. Y^i = 1.808Xi. ε^i = Yi - Y^i.
Xi:    4       7       13      19
Yi:    5       15      22      35
Y^i:   7.232   12.656  23.504  34.352
ε^i:  -2.232   2.344  -1.504   0.648
ESS = Σε^i² = 2.232² + 2.344² + 1.504² + 0.648² = 13.16.

1.3. B. Xi:    4    7    13   19
        Yi:    5    15   22   35
        2Xi:   8    14   26   38
        error: -3   1    -4   -3
sum of squared errors = 3² + 1² + 4² + 3² = 35.
Comment: Note that the sum of squared errors is larger than for the least squares fit.

1.4. C. A model with no intercept, therefore β^ = ΣXiYi / ΣXi² = (Y1 + 5Y2 + 10Y3)/126.
ε^2 = Y2 - β^X2 = Y2 - 5(Y1 + 5Y2 + 10Y3)/126 = (101Y2 - 5Y1 - 50Y3)/126.
E[ε^2] = 0. E[ε^2²] = Var[ε^2] = {101²Var[Y2] + 5²Var[Y1] + 50²Var[Y3]}/126² =
{10201Var[ε2] + 25Var[ε1] + 2500Var[ε3]}/15876 = {(10201)(2) + (25)(1) + (2500)(4)}/15876 =
30427/15876 = 1.92.
Comment: Similar to 4, 11/03, Q.29.

^
1.5. C. A model with no intercept, therefore β = ΣXiYi / ΣXi2 = 3080/751 = 4.10.

^
1.6. D. β = ΣXiYi / ΣXi2 = 128417/111234 = 1.1545. (300)(1.1545) = 346.3.

^
1.7. C. β = ΣXiYi / ΣXi2 = 379/60 = 6.32.

^
1.8. B. β = ΣXiYi/ΣXi2 = {(630)(570) + ... + (760)(720)}/(6302 + ... + 7602)
= 4,107,900/4,150,900 = .9896.

^
1.9. C. β = ΣXiYi/ΣXi2 = 36,981/191,711 = .193.

^
1.10. B. β = Σxiyi / Σxi2 = {(1)(2) + (5)(3)}/{12 + 52)} = 17/26.

1.11. E. β* = ΣXiYi/ΣXi2 = (-2Y1 - Y2 + Y4 + 2Y5)/10.


E[Y1] = E[10x + 3x2 + ε] = (10)(-2) + (3)(-2)2 + 0 = -8. E[Y2] = -7. E[Y4] = 13. E[Y5] = 32.
E[β*] = -.2E[Y1] - .1E[Y2] + .1E[Y4] + .2E[Y5] = 10.
Comment: Discussed in Section 7.3.1 of of Pindyck & Rubinfeld, no longer on the syllabus.
Thus even though this question can be answered from first principles, you are unlikely to be
asked it on your exam.

1.12. E. The fitted slope = .3 = Σtiyi /Σti2 = Σti(.1ti - zi + εi)/16 = .1 - Σtizi/16 + Σtiεi/16.
Since E[εi] = 0, and the errors are assumed to be uncorrelated with t, the expected value of the
last term is zero. Therefore, we estimate that:
.3 = .1 - Σtizi/16. ⇒ Σtizi = -3.2. Since Σt = 0, Cov[t , z] = Σtizi / n = -3.2/n.
Since Σt = 0, Var[t] = Σti2 / n = 16/n. Since Σz = 0, Var[z] = Σzi2 / n = 9/n.
Corr[t , z] = (-3.2/n) / √((16/n)(9/n)) = -.267.
Comment: Discussed in Section 7.3.1 of of Pindyck & Rubinfeld, no longer on the syllabus.
Thus even though this question can be answered from first principles, you are unlikely to be
asked it on your exam.

1.13. (i) (a) Squared Error is: (Yi - βXi)2.


Setting the derivative with respect to β equal to zero: -2Xi(Yi - βXi) = 0. ⇒ β = ΣXiYi/ΣXi2.
^
(b) E[ β1] = E[ΣXiYi/ΣXi2] = ΣE[XiYi/ΣXi2] =ΣE[Yi](Xi/ΣXi2) =ΣβXi(Xi/ΣXi2) = βΣXi2/ΣXi2 = β.
^
Var[ β1] = Var[ΣXiYi/ΣXi2] = ΣVar[XiYi/ΣXi2] = ΣVar[Yi](Xi/ΣXi2)2 = σ2ΣXi2/(ΣXi2)2 = σ2/ΣXi2.
^
(ii) (a) E[ β 2 ] = E[ΣYi/ΣXi] = ΣE[Yi/ΣXi] =ΣE[Yi]/ΣXi =ΣβXi/ΣXi = βΣXi/ΣXi = β.
^
Var[ β 2 ] = Var[ΣYi/ΣXi] = ΣVar[Yi/ΣXi] = ΣVar[Yi]/(ΣXi)2 = Σσ2/(ΣXi)2 = Nσ2/(ΣXi)2.
(b) Var[X] ≥ 0. ⇒ E[X2] ≥ E[X]2. ⇔ ΣXi2/N ≥ (ΣXi/N)2. ⇔ 1/ΣXi2 ≤ N/(ΣXi)2.
^ ^
⇒ Var[ β1] ≤ Var[ β2 ].
^
(iii) (a) E[ β 3] = E[ΣaiYi] = ΣE[aiYi] =ΣE[Yi]ai =ΣβXiai = βΣaiXi.
^
Unbiased if and only if E[ β 3] = β.⇔ ΣaiXi = 1.
^
Var[ β 3] = Var[ΣaiYi] = ΣVar[aiYi] = ΣVar[Yi]ai2 = Σσ2ai2 = σ2Σai2.
^
(b) For β1, ai = Xi/ΣXi2. ΣaiXi = ΣXi2/ΣXi2 = 1.
^
For β 2 , ai = 1/ΣXi. ΣaiXi = ΣXi/ΣXi = 1.
^
(c) Therefore, the least squares estimator, β1, has the smallest variance of any linear unbiased
estimator of β.
1.14. B. A model with no intercept, therefore for ordinary least squares:
^
β = ΣXiYi / ΣXi2 = (Y1 + 2Y2 + 3Y3)/14.
^
ε^1 = Y1 - βX1 = Y1 - (Y1 + 2Y2 + 3Y3)/14 = (13Y1 - 2Y2 - 3Y3)/14.

E[ ε^1] = 0. ⇒ E[ ε^12] = Var[ ε^1] = {132Var[Y1] + 22Var[Y2] + 32Var[Y3]}/142 =


{169Var[ε1] + 4Var[ε2] + 9Var[ε3]}/196 = {(169)(1) + (4)(9) + (9)(16)}/196 = 349/196 = 1.78.
Comment: Since it is not stated otherwise, it is assumed the εi are independent.
Since the Var(εi) are not equal, this is an example of heteroscedasticity; ordinary least squares
^
is not efficient. ε^2 = Y2 - βX2 = Y2 - (Y1 + 2Y2 + 3Y3)/7 = (5Y2 - Y1 - 3Y3)/7.

E[ ε^2 ] = 0. E[ ε^2 2] = Var[ ε^2 ] = {52Var[Y2] + Var[Y1] + 32Var[Y3]}/72 =


{25Var[ε2] + Var[ε1] + 9Var[ε3]}/49 = {(25)(9) + 1 + (9)(16)}/49 = 370/49 = 7.55.
^
ε^3 = Y3 - βX3 = Y3 - 3(Y1 + 2Y2 + 3Y3)/14 = (5Y3 - 3Y1 - 6Y2)/14.

E[ ε^3 ] = 0. E[ ε^3 2] = Var[ ε^3 ] = {52Var[Y3] + 32Var[Y1] + 62Var[Y3]}/142 =


{25Var[ε3] + 9Var[ε1] + 36Var[ε2]}/196 = {(25)(16) + (9)(1) + (36)(9)}/196 = 733/196 = 3.74.

1.15. D. Let Vi = 1 + Xi. Then Yi = βVi + εi, a line without an intercept.


Therefore, the least-squares estimate of β is: ΣViYi / ΣVi2 = Σ (1 + Xi)Yi / Σ (1 + Xi)2.
Alternately, the squared error is: Σ(Yi - β - βXi)2 =
ΣYi2 + Nβ2 + β2ΣXi2 - 2βΣYi + 2β2ΣXi - 2βΣXiYi.
Set the derivative with respect to β equal to zero:
0 = 2Nβ + 2βΣXi2 - 2ΣYi + 4βΣXi - 2ΣXiYi.
⇒ β = (ΣYi + ΣXiYi)/(N + ΣXi2 + 2ΣXi) = Σ (1 + Xi)Yi / Σ (1 + Xi)2.

2.1. C. and 2.2. E. X = 24/4 = 6. x = X - X = -6, -2, 2, 6. Y = 3589/4 = 897.


y = Y - Y = -63, -8, 19, 53. Σxiyi = 750. Σxi2 = 80.
^ ^
β = Σxiyi /Σxi2 = 750/80 = 9.375. α^ = Y - β X = 897 - (9.375)(6) = 841.

2.3. A. Sample covariance of X and Y = Σxiyi /(N-1) = -413.


^
sample variance of X = Σxi2/(N-1) = 512. β = Σxiyi /Σxi2 = Cov[X, Y]/Var[X] = -413/512 = -.807.

2.4. C. X = 3. x = X - X = -2, -1, 0, 1, 2. Y = 1914/5 = 383.


y = Y - Y = -181, -62, 21, 97, 124. Σxiyi = 769. Σxi2 = 10.
^ ^
β = Σxiyi /Σxi2 = 769/10 = 76.9. α^ = Y - β X = 383 - (76.9)(3) = 152.
^
The predicted value of Y, when X = 7 is: α^ + 7 β = 152 + (7)(76.9) = 691.

^
2.5. B. β = Σxiyi /Σxi2 = rsY/sX = (0.6)(8/6) = 0.8.

2.6. C. X = 2. x = X - X = -3, -1, 1, 3. Y = 5. y = Y - Y = -2, -1, 2, 1.


Σxiyi = 6. Σxi2 = 20.
^ ^
β = Σxiyi /Σxi2 = 12/20 = .6. α^ = Y - β X = 5 - (.6)(2) = 3.8.
^
The predicted value of Y, when X = 6 is: α^ + 6 β = 3.8 + (6)(.6) = 7.4.

2.7. B. X = (1 + 2 + 3 + 4 + 5)/5 = 3. xi = Xi - X .
Y = (82 + 78 + 80 + 73 + 77)/5 = 78. yi = Yi - Y .
X Y x y xy x2
1 82 -2 4 -8 4
2 78 -1 0 0 1
3 80 0 2 0 0
4 73 1 -5 -5 1
5 77 2 -1 -2 4
^ ^
β = Σxiyi/Σxi2 = -15/10 = -1.5. α^ = Y - β X = 78 - (-1.5)(3) = 82.5.
^
Forecast for year 7 is: α^ + β7 = 82.5 + (-1.5)(7) = 72.

2.8. X = 55. x = {-10, -5, 0, 5, 10}. Y = 63.6. y = {-20.6, -5.6, -.6, 12.4, 14.4}.
^ ^
Σxi2 = 250. Σxiyi = 440. β = 440/250 = 1.76. α
^ = Y - β X = -33.2.

2.9. A. X = 21/7 = 3. x = X - X = -1, 0, 0, 1, -1, 1, 0. Y = 448/7 = 64.


y = Y - Y = -14, -1, -8, 2, -4, 18, 7. Σxiyi = 38. Σxi2 = 4.
^ ^
β = Σxiyi /Σxi2 = 38/4 = 9.5. α^ = Y - β X = 64 - (9.5)(3) = 35.5.
Estimated salary of an actuary with 5 exams is: 35.5 + (5)(9.5) = 83.
Comment: The estimated salary of an actuary with 8 exams is: 35.5 + (8)(9.5) = 111.5.
However, one should be cautious about using forecasts for values significantly outside the
range of the data, such as 8 exams in this case.

2.10. D. X = 13/10 = 1.3. x = X - X = -1.3, -1.3, -1.3, -1.3, -.3, -.3, -.3, .7, 1.7, 3.7.
Y = 290/10 = 29. y = Y - Y = -19, -29, 14, -29, 6, -29, 51, -29, 29, 35.
Σxiyi = 232. Σxi2 = 24.1.
^ ^
β = Σxiyi /Σxi2 = 232/24.1 = 9.627. α^ = Y - β X = 29 - (9.627)(1.3) = 16.485.
Estimated losses for a taxi driver with 4 moving violations is: 16.485 + (4)(9.627) = 55.0.

^
2.11. C. β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} =
{(26)(204,296) - (351)(15,227)} /{(26)(6201) - 3512} = -32981/38025 = -.86735.
^
α^ = Y - β X = 15,227/26 - (-.86735)(351/26) = 597.363.
^
Yi = 597.4 - 0.867Xi.
Alternately, Σxi2 = Σ(Xi - X )2 = ΣXi2 - N X 2 = ΣXi2 - {ΣXi}2/N = 6201 - 3512/26 = 1462.5.
Σxiyi = Σ(Xi - X )(Yi - Y ) = ΣXiYi - N X Y = ΣXiYi - ΣXiΣYi/N =
204,296 - (351)(15,227)/26 = -1268.5.
^
β = Σxiyi / Σxi2 = -1268.5/1462.5 = -.86735.
^
α^ = Y - β X = 15,227/26 - (-.86735)(351/26) = 597.363.
Comment: Similar to CAS3, 5/05, Q.27.
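A short Python/numpy check of this solution from the given summary statistics (illustrative only):

```python
import numpy as np

# Problem 2.11: slope and intercept from summary statistics.
N, sum_x, sum_y, sum_x2, sum_xy = 26, 351.0, 15227.0, 6201.0, 204296.0
beta = (N * sum_xy - sum_x * sum_y) / (N * sum_x2 - sum_x ** 2)
alpha = sum_y / N - beta * sum_x / N
print(round(beta, 4), round(alpha, 1))   # -0.8674  597.4
```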

2.12. If each Xi is multiplied by 3.28 then so is X , and in deviations form so is each xi.
If each Yi is multiplied by 116 then so is Y , and in deviations form so is each yi.
^
Therefore, β = Σxiyi/Σxi2 is multiplied by (116)(3.28)/3.282 = 116/3.28 = 35.37.
The new fitted slope is: (35.37)(2.4) = 84.9.
^ ^
^ = Y - β X . Y is multiplied by 116, and β X is multiplied by (116/3.28)(3.28) = 116.
α
Thus the new α ^ is multiplied by 116.
The new fitted intercept is: (116)(37) = 4292.
New fitted model is: Y = 4292 + 84.9X.
^
Comment: In general, if X is multiplied by a constant, β is divided by that constant.
^
^ and β are each multiplied by that constant.
In general, if Y is multiplied by a constant, α

2.13. D. X = (1 + 2 + 3 + 4 + 5)/5 = 3. xi = Xi - X .
x = (-2, -1, 0, 1, 2). Σxi2 = 10.
ΣxiYi = (-2)(3.18) + (-1)(3.12) + (0)(3.30) + (1)(3.39) + (2)(3.41) = 0.73.
^
β = ΣxiYi/Σxi2 = 0.73/10 = 0.073.
Y = (3.18 + 3.12 + 3.30 + 3.39 + 3.41)/5 = 3.28.
^
α^ = Y - β X = 3.28 - (.073)(3) = 3.061.
^
Forecast for year 7 is: α^ + β7 = 3.061 + (.073)(7) = 3.572%.
Comment: I have used the shortcut, which avoids calculating yi = Yi - Y .

2.14. A. X = (142 + 146 + 156 + 163 + 170 + 177)/6 = 159.


x = (-17, -13, -3, 4, 11, 18). Σxi2 = 928. ΣxiYi = -248.
^
β 1 = ΣxiYi/ Σxi2 = - 248/928 = -.267.
^ ^
β 0 = Y - β 1 X = 46.83 - (-.267)(159) = 89.3.

2.15. If c is added to each Xi, then X is also increased by c, and in deviations form each xi is
^ ^ ^
the same. Therefore, β = Σxiyi/Σxi2 remains the same. α ^ = Y - β X , decreases by c β.

If c is added to each Yi, then Y is also increased by c, and in deviations form each yi is the
^ ^ = Y - β^ X , increases by c.
same. Therefore, β = Σxiyi/Σxi2 remains the same. α

2.16. D. X = 5/10 = .5. xi = Xi - X = (-.5, -.5, -.5 , -.5 , -.5 , .5 , .5, .5, .5, .5).
^ ^
β = ΣxiYi/Σxi2 = 0.5/2.5 = .2. Y = 5/10 = .5. α^ = Y - β X = .5 - (.2)(.5) = .4.
^
Estimated future claim frequency for an insured with 1 claim is: α^ + β1 = .4 + .2 = 0.6.
Comment: See “A Graphical Illustration of Experience Rating Credibilities,” by Howard C.
Mahler, PCAS 1998.

^
2.17. B. β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} =
{(62)(4002) - (153)(727)} /{(62)(2016) - 1532} = 1.3476.
^
α^ = Y - β X = 727/62 - (1.3476)(153/62) = 8.400.
^
Yi = 8.400 + 1.3476Xi.
8.400 + (1.3476)(20) = 35.35.

2.18. D.ΣXi = 850 + (2)(50) = 950. ΣXi2 = 850 + (22)(50) = 1050.


ΣYi = 858 + (2)(62) = 982. ΣXiYi = 100 + (8)(2) + (10)(2) + (2)(4) = 144.
^
β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} =
{(10000)(144) - (950)(982)} /{(10000)(1050) - 9502} = 0.0528.
^
α^ = Y - β X = 0.0982 - (0.0528)(0.0950) = 0.0932.
0.0932 + (2)(0.0528) = 0.1988.
Comment: Related to the ideas behind Buhlmann Credibility, covered on Exam 4/C. See “A
Graphical Illustration of Experience Rating Credibilities,” by Howard Mahler, PCAS 1998.

2.19. C. X = 8/3. x = X - X = -5/3, 1/3, 4/3.


Let Y2 = 3z. Then Y = z + 7/3 and y = Y - Y = -1/3 - z, 2z - 7/3, 8/3 - z.
^ ^
β = Σxiyi /Σxi2 = (30 + 9z)/ 42. α^ = Y - β X = z + 7/3 - (8/3)(30 + 9z)/ 42 = z3/7 + 3/7.

But we are given α^ = 5/7. Therefore, z3/7 + 3/7 = 5/7. ⇒ Y2 = 3z = 2.


Alternately, minimizing the squared error Σ(Yi - α − βXi)2
results in two equations in two unknowns:
αN + βΣXi = ΣYi αΣXi + βΣXi2 = ΣXiYi .

N = 3, ΣXi = 8, ΣXi2 = 26, ΣYi = 7 + Y2, ΣXiYi = 22 + 3Y2. α^ = 5/7.


Therefore, 15/7 + 8β = 7 + Y2, and 40/7 + 26β = 22 + 3Y2.
Multiplying the first equation by 13 and subtracting the second equation times 4:
35/7 = 3 + Y2. ⇒ Y2 = 2.
Comment: Given an output and asked to solve for the missing input.
^
You can check your work by taking Y2 = 2. And then getting α^ = 5/7 and β = 6/7.

2.20. B. Σ xi2 = (9)(4) = 36. Σ yi2 = (9)(64) = 576.


-0.98 = r = Σ xiyi / √ Σxi2Σyi2 ⇒ Σ xiyi = (-.98)√(36)(576) = -141.12.
^
β = Σ xiyi / Σ xi2 = -141.12/36 = -3.92.
^
α^ = Y - β X = 10 - (2)(-3.92) = 17.84.
^
For X = 5, Y = 17.84 - (5)(3.92) = -1.76.
^
Alternately, sX = √4 = 2. sY = √64 = 8. β = rsY/sX = (-.98)(8/2) = -3.92. Proceed as before.

2.21. α^ = Ȳ - β^X̄. ⇒ α^ + β^X̄ = Ȳ. ⇒ (X̄, Ȳ) is on the fitted line, Y^ = α^ + β^X.

2.22. In deviations form, Σxiyi = Σxi(Yi - Y ) = ΣxiYi - Y Σxi = ΣxiYi - Y 0 = ΣxiYi.


^
Therefore, β = ΣxiYi /Σxi2.
^
xi = (-1, 0, 1). Σxi2 = 2. ΣxiYi = (-1)(0) + (0)(y) + (1)(2) = 2. β = 2/2 = 1.
^
Alternately, β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} = {(3)(y + 4) - (3)(2 + y)}/{(3)(5) - 32} = 1.

^
2.23. E. β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} =
{(12)(26,696) - (144)(1,742)} /{(12)(2,300) - 1442} = 69504/6864 = 10.126.
^
α^ = Y - β X = 1742/12 - (10.126)(144/12) = 23.655.
^
Yi = 23.66 + 10.13Xi.
Alternately, Σxi2 = Σ(Xi - X )2 = ΣXi2 - N X 2 = ΣXi2 - {ΣXi}2/N = 2300 - 1442/12 = 572.
Σxiyi = Σ(Xi - X )(Yi - Y ) = ΣXiYi - N X Y = ΣXiYi - ΣXiΣYi/N = 26696 - (144)(1742)/12 = 5792.
^ ^
β = Σxiyi / Σxi2 = 5792/572 = 10.126. α^ = Y - β X = 1742/12 - (10.126)(144/12) = 23.655.

^
2.24. A. β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} =
{(7)(599.5) - (315)(12.8)} /{(7)(14,875) - 3152} = 164.5/4900 = 0.03357.
^
α^ = Y - β X = 12.8/7 - (0.03357)(315/7) = 0.3179.
^
Yi = 0.3179 + 0.03357Xi.
The predicted price of oil for X = 75 is: 0.3179 + (0.03357)(75) = $2.836.

3.1. E. The residuals always sum to zero. ⇒ The final residual must be: -(12 - 4 - 9 + 6) = -5.
ESS = Σ ^εi 2 = (122 + 42 + 92 + 62 + 52) = 302.

3.2. C. X = 2.5. x = -1.5, -.5, .5, 1.5. Y = 46.25. y = -16.25, -6.25, 8.75, 13.75.

Σxi2 = 5. Σxiyi = 52.5. β^ = Σxiyi /Σxi2 = 10.5. α^ = Y - β^ X = 20.


^ ^
Yi = 20 + 10.5Xi = 30.5, 41, 51.5, 62. ^εi = Yi - Yi = -.5, -1, 3.5, -2. ESS = Σ ^εi 2 = 17.5.

3.3. B. The residuals add to zero. ⇒ ε^5 = -{1.017 + .409 - .557 - 2.487} = 1.618.
ε^ and X have a correlation of zero. ⇒ E[ε^X] - E[ε^]E[X] = 0. But E[ε^] = 0. ⇒ E[ε^X] = 0.
0 = ΣXi ^εi = (7)(1.017) + (12)(.409) + (15)(-.557) + (21)(-2.487) + X5(1.618). ⇒ X5 = 30.

3.4. A. The first four residuals are: (13, 25, 36, 40) - (18.036, 22.989, 30.419, 40.325) =
-5.036, 2.011, 5.581, -.325.
The residuals add to zero. ⇒ ε^5 = -{-5.036 + 2.011 + 5.581 - .325} = -2.231.
^ ^
ε^ and Yi - Y have a correlation of zero. But E[ε^ ] = 0. ⇒ E[ ε^ ( Yi - Y )] = 0.
^ ^ ^
0 = Σ( Yi - Y ) ^εi = Σ Yi ^εi - Y Σ ^εi = Σ Yi ^εi =
^
(18.036)(-5.036) + (22.989)(2.011) + (30.419)(5.581) + (40.325)(-.325) + Y5 ( -2.231). ⇒
^ ^
Y5 = 50.231. Y5 = ε^5 + Y5 = -2.231 + 50.231 = 48.
4.1. TSS has N - 1 = 24 degrees of freedom.
RSS has k - 1 = 2 - 1 = 1 degree of freedom.
ESS has N - k = 25 - 2 = 23 degrees of freedom.

4.2. B. & 4.3. D. ESS = TSS - RSS = 1230 - 1020 = 210.


1020 / (# degrees of freedom for RSS) = 255. ⇒ # d.f. for RSS = 1020/255 = 4.
k - 1 = 4. ⇒ k = 5.
210 / (# degrees of freedom for ESS) = 7. ⇒ # d.f. for ESS = 210/7 = 30.
N - k = 30. ⇒ N = 35.

4.4. TSS has N - 1 = 49 degrees of freedom.


RSS has k - 1 = 4 - 1 = 3 degrees of freedom.
ESS has N - k = 50 - 4 = 46 degrees of freedom.
Comment: Note that: 3 + 46 = 49. This multivariable regression model would be written as:
^
Y = β1 + β2X2 + β3X3 + β4X4, where X2, X3 and X4 are the three independent variables.

4.5. β^ = Σxiyi/Σxi² = 3640.5/253024 = .0144.
α^ = Ȳ - β^X̄ = 7.663 - (.0144)(199.7) = 4.79.
X Y x y x^2 xy y^2
0 4.90 -199.7 -2.763 39882 -978.6 7.63
25 7.41 -174.7 -0.253 30522 -1294.6 0.06
50 6.19 -149.7 -1.473 22412 -926.7 2.17
75 5.57 -124.7 -2.093 15552 -694.6 4.38
100 5.17 -99.7 -2.493 9941 -515.5 6.21
125 6.89 -74.7 -0.773 5581 -514.7 0.60
150 7.05 -49.7 -0.613 2471 -350.4 0.38
175 7.11 -24.7 -0.553 610 -175.7 0.31
200 6.19 0.3 -1.473 0 1.8 2.17
225 8.28 25.3 0.617 640 209.4 0.38
250 4.84 50.3 -2.823 2529 243.4 7.97
275 8.29 75.3 0.627 5669 624.2 0.39
300 8.91 100.3 1.247 10059 893.6 1.56
325 8.54 125.3 0.877 15699 1070.0 0.77
350 11.79 150.3 4.127 22588 1772.0 17.03
375 12.12 175.3 4.457 30728 2124.6 19.87
395 11.02 195.3 3.357 38140 2152.1 11.27
Sum 3395 130.27 -0.0 0.000 253024 3640.5 83.15
Avg. 199.7 7.663

Graph of the data and the fitted line: [figure omitted: scatterplot of the Y values against X,
with the fitted line Y = 4.79 + .0144X, for X from 0 to about 400.]
For example, for X =100, the estimated Y is: 4.79 + (.0144)(100) = 6.23.
For X =100, the observed Y is 5.17.
Therefore, the corresponding residual is: 5.17 - 6.23 = -1.06.
Graph of the residuals: [figure omitted: the residuals plotted against X, for X from 0 to about 400.]

TSS = Σ yi2 = 83.15.


^
RSS = Σ ( Yi - Y )2 = (4.79 - 7.663)2 + (5.15 - 7.633)2 + ... + (10.478 - 7.633)2 = 52.38.
^
Alternately, RSS = β Σxiyi = (.0144)(3640.5) = 52.4.
^
ESS = Σ ^εi 2 = Σ (Yi - Yi )2 = (4.90 - 4.79)2 + (7.41 - 5.15)2 + ... + (11.02 - 10.478)2 = 30.77.
TSS has degrees of freedom: N - 1 = 17 - 1 = 16.
RSS has degrees of freedom: k - 1 = 2 - 1 = 1.
ESS has degrees of freedom: N - k = 17 - 2 = 15.
Source of Variation Sum of Squares Degrees of Freedom
Model (RSS) 52.38 1
Error (ESS) 30.77 15
Total (TSS) 83.15 16
Comment: Once one has TSS and either RSS or ESS, one can get the other one by
subtraction. For example, ESS = TSS - RSS = 83.15 - 52.38 = 30.77.

^ ^
4.6. E. RSS = Σ( Yi - Y )2 = Σ( α^ + βXi - Y )2 = 49.
TSS = Σ(Yi - Y )2 = (10 - 1)(Sample Variance of Y) = (9)(8) = 72.
^ ^
Σ( α^ + βXi - Yi )2 = Σ( Yi - Yi)2 = ESS = TSS - RSS = 72 - 49 = 23.

^
4.7. β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} = {(20)(167) - (42)(76)}/{(20)(101) - 422} = 0.578.

^
4.8. α^ = Y - β X = 76/20 - (0.578)(42/20) = 2.586.

4.9. TSS = Σ(Y i - Y )2 = ΣYi2 - (ΣYi)2/N = 310 - 762/20 = 21.2.

^ ^
4.10. RSS = β 2Σxi2 = β 2{ΣXi2 - (ΣXi)2/N} = (0.5782)(101 - 422/20) = 4.28.

4.11. ESS =TSS - RSS = 21.2 - 4.28 = 16.92.

4.12. B. X = 1.5. x = - 1.5, -.5, .5, 1.5.


Y = (2, 3.5, 5, 6.5) + (1, 1.5, -2, 0.5) = 3, 5, 3, 7. Y = 4.5. y = - 1.5, .5, -1.5, 2.5.
^ ^
β = Σxiyi/Σxi2 = 5/5 = 1. RSS = β Σxi2 = (5)(1) = 5.
^
TSS = Var[Y] =Σyi2 = 11. Σ (Yi - Yi )2 = ESS = TSS - RSS = 11 - 5 = 6.
^ ^
Alternately, α^ = Y - β X = 4.5 - (1)(1.5) = 3. Yi = 3 + Xi = (3, 4, 5, 6).
^
Σ (Yi - Yi )2 = (3 - 3)2 + (5 - 4)2 + (3 - 5)2 + (7 - 6)2 = 6.
Comment: The true values are on the line 1.5X + 2. The estimated slope of 1.0 is not equal to
the true slope of 1.5 due to the random error terms contained in the observations of Y.

5.1. E. R2 = RSS/TSS = RSS/(RSS + ESS) = 124/(124 + 21) = .855.



^ ^
5.2. A. β = Σ xiyi / Σ xi2 = 7/10 = 0.7. α^ = Y - β X = 2 - (0.7)(3) = -0.1.
X Y x y x^2 xy y^2
1 1 - 2 - 1 4 - 2 1
2 1 - 1 - 1 1 - 1 1
3 2 0 0 0 0 0
4 2 1 0 1 2 0
5 4 2 2 4 8 4
Sum 15 10 0 0 10 7 6
Avg. 3 2
^
TSS = Σ yi2 = 6. RSS = β Σxiyi = (.7)(7) = 4.9. R2 = RSS/TSS = 4.9/6 = .817.

5.3. β^ = r sY/sX. δ^ = r sX/sY. β^δ^ = r².
Comment: Therefore, the product of the two fitted slopes is between 0 and 1.
β^ and δ^ have the same sign. If β^ = δ^ = 1, then r = 1. If β^ = δ^ = -1, then r = -1.
For the 2-variable model, R² is the square of the sample correlation of X and Y,
and thus R² = β^δ^.
For the heights example, when one regresses X = heights of fathers and Y = heights of sons,
the fitted slope is 0.6254.
When instead one regresses X = heights of sons and Y = heights of fathers, the fitted slope is
1.383. (0.6254)(1.383) = 0.865 = 0.930²
= square of the correlation between the heights of the fathers and sons.

^
5.4. A. For the 2-variable model, R2 = β 2 Σxi2 /Σyi2 = 17.252 (37/20019) = .550.

5.5. D. X = 2. x = -2, -1, 0, 3. ΣxiYi = (-2)(2) + (-1)(5) + (0)(11) + (3)(18) = 45.


^
Σxi2 = 14. β = 45/14 = 3.214. Y = 9. α
^ = 9 - (3.214)(2) = 2.572.

X Y Fitted Model Error Squared Error


0 2 2.572 0.572 0.327
1 5 5.786 0.786 0.618
2 11 9.000 -2.000 4.000
5 18 18.642 0.642 0.412
Sum 5.357

ESS = 5.357. TSS = Σyi2 = {(2 - 9)2 + (5 - 9)2 + (11 - 9)2 + (18 - 9)2 = 150.
R 2 = 1 - 5.357/150 = 96.43%.
5.6. A. Let V = X - Y = -2, -4, -9, -13.
X = 2. x = -2, -1, 0, 3. ΣxiVi = (-2)(-2) + (-1)(-4) + (0)(-9) + (3)(-13) = -31.
^
Σxi2 = 14. β = -31/14 = -2.214. V = -7. α
^ = -7 - (-2.214)(2) = -2.572.

X V Fitted Error Squared


Model Error
0 - 2 -2.572 -0.572 0.327
1 - 4 -4.786 -0.786 0.618
2 - 9 -7.000 2.000 4.000
5 -13 -13.642 -0.642 0.412
Sum 5.357
ESS = 5.357. TSS = Σvi2 = {(-2 + 7)2 + (-4 + 7)2 + (-9 + 7)2 + (-13 + 7)2 = 74.
R 2 = 1 - 5.357/74 = 92.76%.
Comment: Restate the model in this question: Yi = -α + (1 - β)Xi - εi.
Therefore, the intercept in this question is minus the intercept in the previous question.
Also the slope in this question plus the slope in the previous question sum to one.
The errors in this question are minus those in the previous question; thus the ESS is the same
for the two models. However, the TSS are different. Therefore R2 for the two models are
different, even though the two models contain the same information.
See VEE-Applied Statistics Exam, 8/05, Q.9.

5.7. D. For the 2-variable model, R2 = (Σxiyi)2/ {Σxi2Σyi2} = 10222 /{(1660)(899)} = .700.

5.8. A. Since for a model with an intercept, the residuals sum to zero, the fifth residual must
be 0.6. ESS = Σ ^εi 2 = .42 + .32 + 02 + .72 + .62 = 1.1.
TSS = Σ(Yi - Y )2 = 1.5(N - 1) = (1.5)(5 - 1) = 6. R2 = 1 - ESS/ TSS = 1 - 1.1/6 = .817.

5.9. t   X   Y   Y^ = -25 + 20X   Y^ - Ȳ
     1   2   10        15          -10
     2   2   20        15          -10
     3   3   30        35           10
     4   3   40        35           10
Ȳ = 25. TSS = Σ(Yt - Ȳ)² = 15² + 5² + 5² + 15² = 500.
RSS = Σ(Y^t - Ȳ)² = 100 + 100 + 100 + 100 = 400. R² = RSS/TSS = 400/500 = .80.
Alternately, ESS = Σ(Yt - Y^t)² = (-5)² + 5² + (-5)² + 5² = 100. R² = 1 - ESS/TSS = .80.

5.10. (a) α is in the same units as Y, which is in dollars (or other monetary unit such as
pounds, yen, or euros.)
(b) β is in the same units as Y/X, which is in dollars per year.
(c) R2 is a dimensionless quantity, a pure number without units.
Comment: R is a correlation coefficient, a pure number without units.

5.11. Y^i = α^ + β^Xi = Ȳ + β^(Xi - X̄). ⇒ The mean of Y^i is Ȳ.
Cov[Yi, Y^i] = Σ(Yi - Ȳ)(Y^i - Ȳ) = Σ(Yi - Ȳ)β^(Xi - X̄) = β^Σ(Yi - Ȳ)(Xi - X̄) = β^Cov[Xi, Yi].
Var[Y^i] = Σ(Y^i - Ȳ)² = Σ{β^(Xi - X̄)}² = β^²Var[Xi].
Corr[Yi, Y^i] = β^Cov[Xi, Yi]/√(Var[Yi]β^²Var[Xi]) = Cov[Xi, Yi]/√(Var[Yi]Var[Xi]) = Corr[Xi, Yi] = r.

^ ^ ^
5.12. E. RSS = Σ( Yi - Y )2 = Σ{ β (Xi - X )}2 = β 2Σxi2.
^
R2 = RSS/TSS = β 2Σxi2/Σyi2 = (2.0652)(42)/182 = .984.
Comment: See page 73 of Pindyck and Rubinfeld.

5.13. A. Fk-1,N-k = {RSS/(k-1)}/{ESS/(N-k)} = {TSS R2/(k-1)}/{TSS(1 - R2)/(N-k)} =


{R2/(1 - R2)}{(N-k)/(k-1)}, Equation 4.12 in Pindyck and Rubinfeld, so statement A is true.
However, statement A is not an objection to the use of R2 to compare the validity of
regression results. Statement B is true, see page 89 of Pindyck and Rubinfeld. The corrected
R2 solves this problem with R2. This is related to the principle of parsimony; one does not want
to use more variables than are needed to get the job done. Statement C is true, see page 89
of Pindyck and Rubinfeld. Generally, R2 is not applied to models without an intercept.
Statement D is true, see page 92 of Pindyck and Rubinfeld. Thus R2 is to some extent an
artifact of how the model relationship is stated, rather than an inherent feature of the model.
Statement E is true, see page 89 of Pindyck and Rubinfeld.

5.14. D. Restate the second model: Yi = -α2 + (1 - β2)Xi - ε2i.


^ 1 and β^ 1 = 1 - β^ 2. Therefore, statements A and B are true.
^ 2 = -α
Therefore, α
Models I and II contain the same information; ε2i = -ε1i. Σ ε^ 1i2 = Σ ε^ 2i2. Statement C is true.
Thus the ESS is the same for the two models. However, the TSS are different.
For model one, TSS = sum of (Y - Y )2. For model two, TSS = sum of (X - Y - X + Y )2.
R2 = 1 - ESS/TSS. Therefore R2 for the two models are different. Statement D is false.

6.1. A. R2 = RSS/TSS = 335.2/437.7 = .766.

6.2. C. 1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .766)(9/6) = .351. R̄² = .649.
Alternately, R̄² = 1 - (ESS/(N - k))/(TSS/(N - 1)) = 1 - (102.5/6)/(437.7/9) = .649.

6.3. A. TSS = (N - 1)(sample variance of Y) = (14)(.0103) = .1442. R2 = 1 - ESS/TSS.


1 - R̄² = (1 - R²)(N - 1)/(N - k), where k is the number of variables including the intercept.
Model k ESS TSS R2 corrected R2
I 3 0.0131 0.1442 0.909 0.894
II 3 0.0123 0.1442 0.915 0.900
III 3 0.0144 0.1442 0.900 0.883
IV 4 0.0115 0.1442 0.920 0.898
V 4 0.0117 0.1442 0.919 0.897
VI 4 0.0118 0.1442 0.918 0.896
VII 5 0.0106 0.1442 0.926 0.897
Model II has the largest R̄².
Comment: Based very loosely on “Workers Compensation and Economic Cycles:
A Longitudinal Approach”, by Hartwig, Retterath, Restrepo, and Kahley, PCAS 1997.

6.4. R2 follows a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2 = (3 - 1)/2 = 1,
b = ν2/2 = (N - k)/2 = (11 - 3)/2 = 4, and θ = 1.
The density is: {(a + b - 1)! / ((a-1)! (b-1)!)} (x/θ)a-1 (1 - x/θ)b-1 /θ, 0 ≤ x ≤ θ,
which is: 4(1 - x)3, 0 ≤ x ≤ 1.
Mean is: θa/(a+b) = 1/(1 + 4) = 0.20.
Variance is: θ2ab / {(a + b)2 (a + b + 1)} = (1)(4)/{(1 + 4)2(1 + 4 + 1)} = 0.0267.
Comment: Here is a graph of this density: [figure omitted: the density 4(1 - x)³ plotted for 0 ≤ x ≤ 1.]
Since a ≤ 1, and b > 1, the mode is zero.

6.5. B. Σxi² = Σ(Xi - X̄)² = (sample variance of X)(18), is the same for both regressions.
TSS = Σyi² = Σ(Yi - Ȳ)² = (sample variance of Y)(18), is the same for both regressions.
Σxiyi = Σ(Xi - X̄)(Yi - Ȳ) is larger for the first regression.
⇒ β^ = Σxiyi/Σxi² is larger for the first regression than it is for the second regression.
R² = correlation² = (Σxiyi)²/(Σxi²Σyi²), is larger for the first regression than it is for the second.
⇒ R̄² = 1 - (1 - R²)(N - 1)/(N - k) = 1 - (1 - R²)18/17, is larger for the first regression.
ESS = (1 - R²)TSS. Since R² is bigger for the first and TSS is the same, ESS is smaller for the
first regression. ⇒ s² = ESS/(N-k) = ESS/17 is smaller for the first regression.
α^ could be either bigger, the same, or smaller for the first regression.
Since we are not given X and Y , there is no way to determine which of these is the case.

6.6. C. R² = 1 - (1 - R̄²)(N - k)/(N - 1).

6.7. B. The original model is a special case of a model that includes the additional variable,
but with the coefficient corresponding to that variable equal to zero. Therefore, when an
additional independent variable is added, and a least squares regression is fit, the match
between the model and the data is as good or better than it was. Therefore, the regression sum
^
of squares, RSS = Σ( Yi - Y )2, either is the same or increases. (RSS almost always increases.)
Since the data has remained the same, TSS = Σ(Yi - Y )2, remains the same.
Therefore, ESS = TSS - RSS, remains the same or decreases. (ESS almost always
decreases.) Therefore, R2 = RSS/TSS, either is the same or increases.
(R2 almost always increases.)
R̄² = 1 - (1 - R²)(N - 1)/(N - k). When we add a variable, N - k is one less, while R² usually
increases. The former reduces R̄², while the latter increases R̄². The net effect is that R̄² can
either increase, decrease, or stay the same.
Comment: For example, if one were modeling the annual frequency of claims for Workers
Compensation Insurance in California, and the independent variable added to the model was
the number of games won each year by the New York Yankees baseball team, one would
expect RSS to increase very little, resulting in a decline in R̄², since this additional variable
explains nothing significant about what is being modeled. The model including the irrelevant
baseball data is inferior to the model without it.

6.8. C. For the new fit, the expected value of R̄² is the same, 0.64.
0.64 = R̄² = 1 - (1 - R²)(N - 1)/(N - k) = 1 - (1 - R²)(9/8). ⇒ R² = 0.680.
Comment: Unlike R², R̄² has been adjusted for degrees of freedom, so its expected value
remains the same.

6.9. R2 follows a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2 = (2 - 1)/2 = 1/2,
b = ν2/2 = (N - k)/2 = (3 - 2)/2 = 1/2, and θ = 1.
The density is: {Γ[a + b] / (Γ[a] Γ[b])} (x/θ)a-1 (1 - x/θ)b-1 /θ, 0 ≤ x ≤ θ,
which is: 1/(π √{x(1 - x)}), 0 ≤ x ≤ 1.
Mean is: θa/(a+b) = .5/(.5 + .5) = 0.5.
Variance is: θ2ab / {(a + b)2 (a + b + 1)} = (1/2)(1/2)/{(1)2(2)} = 0.125.
Comment: The constant in front of the density, 1/π, follows from Γ[1/2] = √π, a fact for which you
are not responsible. With this constant, the density integrates to one over its support (0, 1).
Here is a graph of this density: [figure omitted: the density 1/(π√{x(1 - x)}) plotted for 0 < x < 1.]
Since a ≤ 1, and b ≤ 1, the density is bimodal, with modes at zero and one.

6.10. A. R2 = RSS/TSS = 1115.11/1254.00 = .889.


1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .889)(7/5) = .155. R̄² = .845.
Alternately, R̄² = 1 - (ESS/(N - k))/(TSS/(N - 1)) = 1 - (138.89/5)/(1254/7) = .845.

6.11. E. 1 - R̄² = (1 - R²)(N - 1)/(N - k) = (.15)(11 - 1)/(11 - 2) = .167. R̄² = .833.

6.12. A. R2 = RSS/TSS = RSS/(RSS + ESS) = 103658/(103658 + 69204) = .600.


1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .600)(48 - 1)/(48 - 4) = (.4)(47/44) = .427. R̄² = .573.
Alternately, s² = ESS/(N - k) = 69204/44 = 1572.8.
Sample Variance of Y = Σ(Yi - Ȳ)²/(N - 1) = TSS/(N - 1) = (103658 + 69204)/47 = 3677.9.
R̄² = 1 - s²/Var[Y] = 1 - 1572.8/3677.9 = .572.

7.1. B. Φ(1.645) = .95, so ±1.645 standard deviations has 5% on each tail; it covers 90%
probability.

7.2. A. Standardize each value by subtracting the mean and dividing by σ.


Prob[6 ≤ x ≤ 12] = Φ((12 - 7)/4) - Φ((6 - 7)/4) = Φ(1.25) - Φ(-.25) = .8941 - .4014 = 49.3%.

7.3. E. The sum of the residuals is always zero, so they have a first moment of zero.
Therefore, Variance = Second Central Moment = Second Moment.
Therefore, Third Central Moment = E[(X - X )3] = E[(X - 0)3] = 3rd moment.
Therefore, Fourth Central Moment = 4th moment.
Skewness = Third Central Moment/Variance1.5 = Third Moment/Second Moment1.5.
Kurtosis = Fourth Central Moment/Variance2 = Fourth Moment/Second Moment2.
2nd mom. 3rd mom. 4th mom. Skewness Kurtosis
A 5 2 80 0.18 3.20
B 5 1 50 0.09 2.00
C 5 0 100 0.00 4.00
D 5 - 2 50 -0.18 2.00
E 5 - 1 70 -0.09 2.80
For the Normal Distribution the skewness is 0 and the Kurtosis is 3.
Regression E seems to most closely fit these criteria.

7.4. C. Φ(1.960) = .975, so ±1.960 standard deviations has 2.5% on each tail; it covers 95%
^ ^
probability. β ± 1.960StdDev[ β] = 13 ± (1.960)(3) = [7.12, 18.88].

7.5. A. 0 = E[L] = E[aX1 + 4X2 + bX3] = aµ + 4µ + bµ. ⇒ a + b = -4.


1 = Var[L] = Var[aX1 + 4X2 + bX3] = a2/24 + 16/24 + b2/24. ⇒ a2 + b2 = 8.
⇒ a = -2, b = -2.

7.6. C. Φ(2.326) = .99. The variance of the mean is 10/n.


Therefore a 98% confidence interval is: ± (2.326)√(10/n).
Set the length equal to 3: 3 = (2)(2.326)√(10/n). ⇒ n = 24.04. ⇒ n = 25.

7.7. A. X is Normal with mean 30.4 and variance: 122/36 = 4.


Y is Normal with mean 32.1 and variance: 142/49 = 4.
X - Y is Normal with mean 30.4 - 32.1 = -1.7, and variance: 4 + 4 = 8.
P[ X > Y ] = P[ X - Y > 0] = 1 - Φ((0 - (-1.7))/√8) = 1 - Φ(.60) = 27.4%.

7.8. C. For a 95% confidence interval we want ± 1.960 standard deviations.


X has variance 12/n. So the length of the confidence interval is: (2)(1.960)√(12/n).
5 = (2)(1.960)√(12/n). ⇒ n = 7.4. Since n is integer, take n = 8.
Comment: For n = 7 the length is: (2)(1.960)√(12/7) = 5.13, too wide.

7.9. C. Prob[X2 < a] = Prob[X2 / a < 1] = Prob[|X|/√a < 1] = 1 - Prob[|X|/√a ≥ 1] =


1 - (Prob[X/√a ≤ -1] + Prob[X/√a ≥ 1]) = 1 - 2Prob[X/√a ≥ 1] = 1 - 2(1 - Φ(1)) = .6826.
Comment: X/√a has a Unit Normal Distribution.
X2/a has a Chi-Square Distribution with one degree of freedom.

7.10. D. Φ(1.645) = 95% = (1+90%)/2. X is Normal with variance: 144/16 = 9.


90% confidence interval = 200 ± (1.645)√9 = (195.065, 204.935).

7.11. D. Let X1 and X2 denote the measurement errors of the two instruments.
(X1 + X2)/2 is Normal with mean 0 and variance: {(0.0056h)2 + (0.0044h)2}/4 =
.00001268h2. The standard deviation is: .003561h.
Prob[-0.005h ≤ (X1 + X2)/2 ≤ 0.005h] = Φ(.005h/.003561h) - Φ(-.005h/.003561h) =
Φ(1.404) - Φ(-1.404) = 2Φ(1.404) - 1 = (2)(.9196) - 1 = 0.839.

7.12. B. The mean of this exponential is θ = 1000. Its variance is θ2 = 1 million.


The sum of losses from 100 polices is approximately Normal with mean 100,000 and variance
100 million. The premium from those 100 polices is (100)(1000 + 100) = 110,000.
Prob[claims > premium] ≅ 1 - Φ((100,000 - 110,000)/√100 million) = 1 - Φ(1) = .1587.

7.13. D. This uniform distribution has mean 0 and variance 52/12 = 25/12.
The mean for 48 people has mean 0 and variance: (25/12)/48 = .04340.
Prob[-.25 ≤ mean ≤ .25] = Φ(.25/√.04340) - Φ(-.25/√.04340) =
Φ(1.2) - Φ(-1.2) = 2Φ(1.2) - 1 = (2)(.8849) - 1 = 0.770.

7.14. C. The average has a mean of 19,400 and standard deviation of 5000/√25 = 1000.
Prob[average > 20,000] = 1 - Φ((20000 - 19400)/1000) = 1 - Φ(.6) = 1 - 0.7257 = 0.2743.

7.15. B. For n light bulbs, the total is Normal with mean 3n and variance n.
Prob[sum ≥ 40] = 1 - Φ((40 - 3n)/√n) = .9772. ⇒ (40 - 3n)/√n = -2.0.
⇒ 40 - 3n = -2√n. ⇒ 3n - 2√n - 40 = 0. √n = (2 ± √484)/6 = 4 or - 10/3. ⇒ n = 16.
7.16. C. The sum of the contributions has mean: (2025)(3125) = 6,328,125 and
standard deviation: (250)(√2025) = 11250. Φ(1.282) = .9.
90th percentile ≅ 6,328,125 + (1.282)(11250) = 6,342,548.

8.1. C. For i ≠ j, E[εi εj] = E[εi]E[εj] = (0)(0) = 0, since εi and εj are independent, each with mean
zero. However, E[εi εi] = Var[εi] + E[εi]2 = σ2, which is not necessarily equal to 1.
Statements A, B, and D are true, where D is the definition of homoscedasticity.

8.2. Cov[X, Y] = Cov[X, α + βX + ε] = Cov[X, α] + Cov[X, βX] + Cov[X, ε]


= 0 + βCov[X, X] + 0 = βVar[X].
Var[Y] = Var[α + βX + ε] = 0 + Var[βX] + Var[ε] = β2Var[X] + σ2.
Corr[X, Y] = βVar[X]/√{Var[X](β²Var[X] + σ²)} = β/√{β² + σ²/Var[X]}.
Comment: I have made use of the fact that X and ε are independent, and Var[ε] = σ2.
As σ → ∞, Corr[X, Y] → 0. As β → ∞, Corr[X, Y] → 1. As Var[X] → ∞, Corr[X, Y] → 1.
9.1. E. Heteroscedasticity does not produce biased or inconsistent estimators of the slope
parameters. It does produce inefficient estimators, unless corrected for. (See page 147 of
Pindyck and Rubinfeld.) Serial correlation does not produce biased or inconsistent estimators
of the slope parameters. It does produce inefficient estimators, unless corrected for. (See page
159 of Pindyck and Rubinfeld.)
Comment: Based on Course 4 Sample Exam, question 40, but I have excluded those ideas
which are no longer on the Syllabus.

9.2. E[X] = (.7)(0) + (.3)(10) = 3. E[X2] = (.7)(02) + (.3)(102) = 30. Var[X] = 30 - 32 = 21.
Sample Probability Mean Sample Variance
0, 0 (.7)(.7) = .49 0 {(0 - 0)2 + (0 - 0)2}/(2 - 1) = 0
0, 10 (.7)(.3) = .21 5 {(0 - 5)2 + (10 - 5)2}/(2 - 1) = 50
10, 0 (.3)(.7) = .21 5 {(10 - 5)2 + (0 - 5)2}/(2 - 1) = 50
10, 10 (.3)(.3) = .09 10 {(10 - 10)2 + (10 - 10)2}/(2 - 1) = 0
Expected value of the sample variance is: (.49)(0) + (.21)(50) + (.21)(50) + (.09)(0) = 21.
Comment: Illustrating the general fact, that the sample variance, with N - 1 in the denominator,
is an unbiased estimator of the variance of the distribution.

9.3. A. They are biased and inconsistent. They need not be efficient.

9.4. C. Cov[α, β] = Corr[α, β]√(Var[α]Var[β]) = (.6)√15 = 2.324.


Var[γ] = Var[wα + (1-w)β] = w2Var[α] + (1-w)2Var[β] + 2w(1-w)Cov[α, β] =
5w2 + 3(1-w)2 + 4.648w(1-w) = 3.352w2 - 1.352w + 3.
Bias[γ] = Bias[wα + (1-w)β] = wBias[α] + (1-w)Bias[β] = (-1)w + (2)(1 - w) = 2 - 3w.
MSE[γ] = Var[γ] + Bias[γ]2 = 3.352w2 - 1.352w + 3 + (2 - 3w)2 = 12.352w2 - 13.352w + 7.
Setting the derivative with respect to w equal to zero:
24.704w - 13.352 = 0. ⇒ w = 13.352/24.704 = .540.
Comment: Bias[α] = E[α] - true value. Bias[β] = E[β] - true value.
Bias[γ] = E[γ] - true value = wE[α] + (1-w)E[β] - true value = wBias[α] + (1-w)Bias[β].
Here is a graph of the Mean Squared Error as a function of w: [figure omitted: the MSE plotted
for 0 ≤ w ≤ 1; it is minimized at w = .540.]
9.5. B. They are unbiased and consistent. They are no longer efficient.

9.6. A. mean = (0)(.6) + (10)(.3) + (100)(.1) = 13.


2nd moment = (02)(.6) + (102)(.3) + (1002)(.1) = 1030.
variance = 1030 - 132 = 861.
Σ(Xi - X )2/6 = (5/6)(Σ(Xi - X )2 /5) = (5/6)(sample variance).
The sample variance is unbiased. ⇒ E[sample variance] = 861.
⇒ E[Σ(Xi - X )2/6] = (5/6)(861) = 717.5.
Bias = expected value of the estimator - true value = 717.5 - 861 = -143.5.
Comment: This estimator of the variance is biased downwards, since n is in the denominator
rather than n - 1.

* 9.7. An example of specification error would be if we fit a Weibull Distribution, while the
data was actually drawn from a Gamma Distribution. The fitted parameters will vary around the
actual parameters, an example of parameter error. If we use the fitted model to predict the size
of the next loss, there will be error due to the fact that the next loss is a random draw from a
size of loss distribution; this is an example of process risk.

9.8. Y = 10 + 3X2 + 2X3 + ε. ⇒ Y1 = 10 + ε1. Y2 = 15 + ε2. Y3 = 22 + ε3.


^
In deviations form, x = (-1, 0, 1), and β = ΣxiYi/Σxi2 = (-Y1 + Y3)/2 = 6 + (ε3 - ε1)/2.
^
^ = Y - β X = 15.667 + (ε1 + ε2 + ε3)/3 - {6 + (ε3 - ε1)/2}(1) = 9.667 + (5ε1 + 2ε2 - ε3)/6.
α
^ = 9.667 + E[(5ε1 + 2ε2 - ε3)/6] = 9.667.
E[ α]
^
E[ β] = 6 + E[(ε3 - ε1)/2] = 6.
Comment: Since a relevant variable has been omitted from the model, the least squares
estimators of the coefficients are biased.
Var[X2] = {(0 - 1)2 + (1 - 1)2 + (2 - 1)2}/3 = 2/3.
Cov[X2 , X3] = {(0)(0) + (1)(1) + (2)(3)}/3 - {(0 + 1 + 2)/3}{(0 + 1 + 3)/3} = 7/3 - (1)(4/3) = 1.
^
E[ β 2 ] = β2 + β3Cov[X2 , X3]/Var[X2] = 3 + (2)(1)/(2/3) = 6.

9.9. D. λ is both the mean and the variance of the Poisson Distribution.
X is an unbiased estimator of the mean and thus of λ.
Estimator II is the sample variance, an unbiased estimator of the variance and thus of λ.
E[2X1 - X2] = 2E[X1] - E[X2] = 2λ - λ = λ. Thus estimator III is unbiased.

9.10. A. 1. T. Definition of Unbiased. 2. False. For an asymptotically unbiased estimator, the


average error goes to zero as n goes to infinity. However, there can still be a large chance of a
large error, as long as the errors are of opposite sign. For example, a 50% chance of an error
of +1000 and a 50% chance of an error of -1000, gives an average error of 0. 3. False.
9.11. C. 1. False. An unbiased estimator would have its expected value equal to the true
value, but 2 ≠ 2.05. 2. False. MSE(α1) = (.052) + 1.025 < (.052) + 1.050 = MSE(α2). 3. True.

9.12. E. Z = αX + βY. E[Z] = αE[X] + βE[Y]. For Z unbiased, E[Z] = m.


Thus m = α(.8m) + βm. Thus β = 1 - .8α.
Since X and Y are independent, Var[Z] = α2Var[X] + β2Var[Y] = α2m2 + β21.5m2 =
m2(α2 + 1.5(1 - .8α)2) = m2(1.96α2 - 2.4α + 1.5). The mean squared error is the sum of the
variance and the square of the bias. Since the bias is zero we need to minimize the variance of
Z. ∂Var[Z] / ∂α = m2(3.92α - 2.4). Setting the partial derivative equal to zero:
m2(3.92α - 2.4) = 0. Therefore, α = 2.4/3.92 = .612.

9.13. B. E[ψ] = 165/75 = 2.2. E[ψ2] = 375/75 = 5. Var[ψ] = 5 - 2.22 = .16.


Mean Square error of ψ = Square of Bias plus Variance = (2.2 - 2)2 + .16 = .200.
E[φ] = 147/75 = 1.96. E[φ2] = 312/75 = 4.16. Var[φ] = 4.16 - 1.962 = .318.
Mean Square error of φ = Square of Bias plus Variance = (1.96 - 2)2 + .318 = .320.
MSE(ψ) / MSE(φ) = .200 /.320 = .625.

9.14. B. Statement 2 is the definition of a consistent estimator, and therefore must be true of
a consistent estimator. Neither statement 1 or 3 must be true of a consistent estimator.

9.15. C. Mean squared error = variance plus the square of the bias = 1.00 + .5² = 1.25.

9.16. C. This absolute value loss function is minimized by the median. The median of this
distribution is the value of x such that .5 = F(x) = 1- e-x. Thus x = ln 2 = .693.

9.17. D. b ∗ = {Σ (Zi - Z )(Yi - Y )}/ {Σ(Zi - Z )(Xi - X )} =


ΣYi(Zi - Z )/ {Σ(Zi - Z )(Xi - X )} - (1/N)ΣYiΣ(Zi - Z )/ {Σ(Zi - Z )(Xi - X )},
which is linear in Yi; b∗ is of the form ΣwiYi.
E[Yi] = α + βXi = Y − β X + βXi = Y + β(Xi - X ). E[Yi - Y ] = β(Xi - X ).
E[b∗] = {Σ(Zi - Z )E[Yi - Y ]}/ {Σ(Zi - Z )(Xi - X )} = {Σ(Zi - Z )β(Xi - X )}/ {Σ(Zi - Z )(Xi - X )} = β.
The ordinary least squares estimator, Σxiyi/Σxi2 is the best linear unbiased estimator, so b∗ can
not be the best, in other words with minimum mean squared error, of the linear unbiased
estimators. Thus b ∗ is a linear unbiased estimator of β , but not the best linear
unbiased estimator (BLUE) of β .
Heteroscedasticity-consistent estimators are concerned with estimating variances of estimated
regression parameters, which b∗ is not doing, so choice B is false.
Comment: If the true model were: Yi = α + βXi2 + εi, then b∗ would be the best linear unbiased
estimator (BLUE) of β. In deviations form, b∗ = Σzi yi / Σzi xi = Σyi {zi / Σzi xi}, which is linear in
yi; b∗ is of the form Σwiyi, with wi = zi / Σzi xi. E[b∗] = ΣE[yi] {zi / Σzi xi} = Σβxi {zi / Σzi xi} = β.

9.18. B. Let the parameter be θ. E[Y1] = θ. E[Y2] = θ. E[k1Y1 + k2Y2] = θ.


⇒ k1E[Y1] + k2E[Y2] = θ. ⇒ k1θ + k2θ = θ. ⇒ k1 + k2 = 1.
Var[k1Y1 + k2Y2] = k12Var[Y1] + k22Var[Y2] = (1 - k2)2 4Var[Y2] + k22 Var[Y2].
Set equal to zero the partial derivative with respect k2:
0 = -8(1 - k2) Var[Y2] + 2 k2 Var[Y2]. ⇒ k2 = 4/5. ⇒ k1 = 1/5 = 0.20.
Comment: Weight each estimator inversely proportional to its variance.

9.19. D. E[ θ^ ] = E[{k/(k+1)} X] = {k/(k+1)}E[X] = {k/(k+1)}θ. Bias = E[ θ^ ] - θ = θ/(k + 1).


Var[ θ^ ] = Var[{k/(k+1)} X] = {k/(k+1)}2Var[X] = {k/(k+1)}2θ2/25.
MSE = Var + Bias2. We are given that in this case MSE θ^(θ) = 2[bias θ^(θ)]2 . ⇒ Var[ θ^ ] = Bias2.
⇒ {k/(k+1)}2θ2/25 = {θ/(k + 1)}2. ⇒ k2 = 25. ⇒ k = 5.
9.20. D. There are a total of five observations.
The average of 5 independent, identically distributed variables has the same mean and 1/5 the
variance; it is Normal with mean 20 and variance 4/5 = 0.8.
The expected value of the estimator is E[ X 2], which is the second moment of a Normal with
µ = 20 and σ2 = 0.8, which is: 0.8 + 202 = 400.8.
The true value of the area is: 202 = 400.
The bias is: 400.8 - 400 = 0.8. It will overestimate on average by 0.8 square meters.
Comment: Even though we have an unbiased estimator of the length of a side, squaring it
does not give an unbiased estimator of the area, since E[X2] ≥ E[X]2.
Variance ≥ 0. ⇒ E[X2] ≥ E[X]2.

10.1. X̄ = 24/4 = 6. x = -4, -1, 2, 3. Ȳ = 40/4 = 10. y = 0, -4, 1, 3.
β^ = Σxiyi/Σxi² = 15/30 = 0.5. α^ = Ȳ - β^X̄ = 10 - (.5)(6) = 7.
Y^ = α^ + β^X = 8, 9.5, 11, 11.5. ε^ = Y - Y^ = 2, -3.5, 0, 1.5. ESS = Σε^i² = 18.5.
TSS = Σyi² = 26. Note that RSS = Σ(Y^ - Ȳ)² = 7.5 = TSS - ESS.
R² = 1 - ESS/TSS = 1 - 18.5/26 = .288.
1 - R̄² = (1 - R²)(N - 1)/(N - 2) = (.712)(3/2) = 1.067. ⇒ R̄² = -.067.
s² = ESS/(N - 2) = 18.5/(4 - 2) = 9.25.
Var[β^] = s²/Σxi² = 9.25/30 = .3083. sβ^ = √.3083 = .555.
Var[α^] = s²ΣXi²/(NΣxi²) = (9.25)(174)/((4)(30)) = 13.41. sα^ = √13.41 = 3.66.
Cov[α^, β^] = -s²X̄/Σxi² = -(9.25)(6)/30 = -1.85.
Corr[α^, β^] = Cov[α^, β^]/√(Var[α^]Var[β^]) = -1.85/√((13.41)(.3083)) = -.910.
10.2. C., 10.3. E., & 10.4. D.
s2 = ESS/(N - 2) = 533000/(100 - 2) = 5439.
Σxi2 = Σ(Xi - X )2 = ΣXi2 - N X 2 = 5,158,000 - (100)(1222) = 3,670,000.
^
Var[ β] = s2 /Σxi2 = 5439/3,670,000 = .001482. sβ^ = √ .001482 = .0385.

Var[ α^ ] = s2ΣXi2 /(NΣxi2) = (5439)(5,158,000)/((100)(3,670,000)) = 76.44.


^
sα^ = √ 76.44 = 8.74. Cov[ α^ , β] = -s2 X /Σxi2 = -(5439)(122)/3,670,000 = -.1808.

10.5. C. Y = (0 + 3 + 6 + 10 + 8)/5 = 5.4.


TSS = (0 - 5.4)2 + (3 - 5.4)2 + (6 - 5.4)2 + (10 - 5.4)2 + (8 - 5.4)2 = 63.2.
2.554 = s2 = ESS/(5-2). ⇒ ESS = 7.662. R2 = 1 - ESS/TSS = 1 - 7.662/63.2 = .879.

10.6. C. 4 = s2 = ESS/(N - 2) = ESS/13. ESS = (4)(13) = 52.


R2 = 1 - ESS/TSS = 1 - 52/169 = 0.692.
R̄² = 1 - (1 - R²)(N - 1)/(N - k) = 1 - (1 - 0.692)(15 - 1)/(15 - 2) = 0.669.
Alternately, 1 - R̄² = {ESS/(N - k)}/{TSS/(N - 1)} = s²/{Σyi²/(N - 1)} = 4/(169/14) = 56/169.
R̄² = 1 - 56/169 = 113/169 = 0.669.

^ = σ2ΣXi2/(NΣxi2) = σ2(N E[X2])/{N (N Var[X])} = (σ2/N)E[X2]/Var[X] =


10.7. B. Var[ α]
(σ2/N)(Var[X] + E[X]2)/Var[X] = (σ2/N)(1 + E[X]2/Var[X]) ≥ σ2/N.
Thus the minimum possible value of Var[ α] ^ is σ2/N = 255/30 = 8.5.
Comment: The values of the independent variable X are known rather than random.
However, over all possible of sets of X at which we could have had observations of Y, the
minimum variance of the estimated intercept occurs for those sets in which X = 0.

10.8. D. s2 = ESS/(N - k) = 1036/(15 - 5) = 103.6.


s2 is an unbiased estimator of σ2. ⇒ σ2 = E[s2] = E[Σ ^εi 2/(N - k)] = E[Σ ^εi 2]/(N - k).
E[Σ ^εi 2] = (N - k)σ2, N ≥ k. For the 30 observations, E[Σ ^εi 2] = (30 - 5)(103.6) = 2590.
Comment: E[Σ ^εi 2] is not proportional to N, so twice the data does not result in twice the
expected ESS. If for example, we had had only 5 observations, then the regression would
have fit perfectly and ESS = 0. With 10 observations, E[Σ ^εi 2] = (10 - 5)σ2 > 0.

^
10.9. β = {NΣXiYi - ΣXiΣYi }/ {NΣXi2 - (ΣXi)2} = {(30)(173) - (44)(106)}/{(30)(81) - 442} = 1.065.

^
10.10. α^ = Y - β X = 106/30 - (1.065)(44/30) = 1.971.

10.11. R2 = Corr[X, Y]2 = (Σxiyi)2/(Σxi2Σyi2)


= {ΣXiYi - ΣXiΣYi/N}2/({ΣXi2 - (ΣXi)2/N}{ΣYi2 - (ΣYi)2/N}) = SXY2/(SXXSYY).
SXX = 81 - 442/30 = 16.467. SYY = 410 - 1062/30 = 35.467. SXY = 173 - (44)(106)/30 = 17.533.
R2 = 17.5332/{(16.467)(35.467)} = 0.526.

10.12. TSS = Σ(Y i - Y )2 = ΣYi2 - (ΣYi)2/N = 410 - 1062/30 = 35.467.


ESS = (1 - R2)TSS = ( 1 - 0.526)(35.467) = 16.81.
s2 = ESS/(N - 2) = 16.81/28 = 0.600.

10.13. sβ^ = √(s2 /Σxi2) = √(0.600/{ΣXi2 - (ΣXi)2/N}) = √(0.600/(81 - 442/30)) = 0.191.


^
Alternately, Var[ β] = {SYY/SXX - (SXY/SXX)2}/(N-2). = {35.467/16.467 - (17.533/16.467)2}/28
= 0.03643. sβ^ = √ 0.03643 = 0.191.

10.14. sα^ = √{(ΣXi2/N)(s2/Σxi2)} = √(81/30)√(0.600/(81 - 442/30)) = 0.314.

^
10.15. Cov[ α^ , β] = - X s2 /Σxi2 = -(44/30)(0.600)/(81 - 442/30) = -0.0534.

10.16. Corr[α^, β^] = -X̄/√(E[X²]) = -(44/30)/√(81/30) = -0.893.

10.17. A. Y = (16.1 + 15.4 + 13.3 + 12.6 + 10.5)/5 = 13.6.


RSS = (16.1 - 13.6)2 + (15.4 - 13.6)2 + (13.3 - 13.6)2 + (12.6 - 13.6)2 + (10.5 - 13.6)2 = 20.2.
8.356 = s2 = ESS/(5-2). ⇒ ESS = 25.07. R2 = RSS/TSS = 20.2/(20.2 + 25.07) = .446.
^
Comment: RSS is the numerator of the variance of Y.
^
The mean of the predicted values, Y, is equal to the mean of the Ys, Y .

^ ^
10.18. Both estimates of β are unbiased. Therefore, MSE[ β] = Var[ β].
^
Var[ β] = σ2 /Σxi2 = σ2/(N Var[X]).
^
Since Bob’s variance of X is smaller than Ray’s, Bob’s Var[ β] is larger than Ray’s.
Comment: All else being equal, the more dispersed the values of the independent variable, the
better the estimate of the slope.
Assuming the model is a reasonable one for boys ages 10 to 14, it is probably reasonable to
use the model to estimate the average weight of a boy aged 15. It would not be reasonable to
use the model to estimate the average weight of a boy aged 2 or a man aged 25!
In general, one should be cautious about applying a model outside the range of data used to fit
that model.

10.19. C. Error Sum of Squares = ESS ≡ Σ ^εi 2 = 967.


estimated variance of the regression = s2 = ESS/(N - k) = 967/(7 - 2) = 193.4.
Σxi2 = Σ(Xi - X )2 = 2000. sβ^2 = s2 / Σxi2 = 193.4/2000 = .0967.
^
standard error of β = sβ^ = √ .0967 = .311.

^
10.20. B. Var[ β] = (σ2/Var[X])/N = σ2 / Σxi2.
Independent Variable over wider domain ⇔ larger Var[X] ⇔ better estimate of slope.
Therefore, Statement B is exactly wrong, as the variation of the X’s increases one gets a better
estimate of the slope.
s measures the dispersion of the error terms associated with the regression line.
Statement A is true, since s/ Y is like the estimated coefficient of variation of Y; the smaller this
is, the smaller the errors and the better fit.
Statement C is true; see page 65 in Pindyck and Rubinfeld.
^
Statement D is true since Cov[ α^ , β ] = -σ2 X /Σxi2; see equation 3.13 in Pindyck and Rubinfeld.
^
X > 0 ⇒ Cov[ α^ , β ] < 0.
⇒ an overestimate of α is likely to be associated with an underestimate of β.
Statement E is true; see equation 3.6 in Pindyck and Rubinfeld.

10.21. E. s2 = ESS/ (N - 2) = 2000/(20 - 2) = 111.11. X = -300/20 = -15.


Σxi² = Σ(Xi - X̄)² = ΣXi² - N X̄² = 6000 - (20)(15²) = 1500.
Cov[α^, β^] = -s²X̄/Σxi² = -(111.11)(-15)/(1500) = 1.111.
Alternately, as discussed in a subsequent section, where X is the design matrix:
X'X = ( N    ΣXi  ) = (  20   -300 )
      ( ΣXi  ΣXi² )   ( -300  6000 )

(X'X)-1 = ( 6000  300 ) / {(20)(6000) - (-300)(-300)} = ( .2   .01     )
          ( 300   20  )                                 ( .01  .000667 )

Covariance matrix is: s²(X'X)-1 = 111.11 ( .2   .01     ) = ( 22.222  1.111 )
                                         ( .01  .000667 )   ( 1.111   .0741 )
Cov[α^, β^] = 1.111.

10.22. Continuing the previous solution, Var(α^) is the upper lefthand element of the
covariance matrix, 22.222, while Var(β^) is the lower righthand element of the covariance
matrix, 0.0741.
Alternately, Var[α^] = s²ΣXi²/(NΣxi²) = (111.11)(6000)/{(20)(1500)} = 22.222.
Var[β^] = s²/Σxi² = 111.11/1500 = 0.0741.
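A short Python/numpy check of the covariance matrix in these two solutions (illustrative only):

```python
import numpy as np

# Problems 10.21-10.22: covariance matrix of the fitted parameters from summary statistics.
N, sum_x, sum_x2 = 20, -300.0, 6000.0
s2 = 2000.0 / (N - 2)                            # s^2 = ESS/(N - 2) = 111.11
XtX = np.array([[N, sum_x], [sum_x, sum_x2]])
cov = s2 * np.linalg.inv(XtX)
print(np.round(cov, 3))    # [[22.222  1.111], [ 1.111  0.074]]
```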
11.1. D. From the t-table, for 16 degrees of freedom at 2.583, there is 2% probability in the
sum of the tails. Therefore, the distribution function at 2.583 is 99%.

11.2. B. For 6 d.f. the 2% critical value is 3.143. ⇒ Prob[t < -3.5] < 2%/2 = 1%.
For 6 degrees of freedom the 1% critical value is 3.707. ⇒ Prob[t < -3.5] < 1%/2 = 0.5%.

11.3. C. From the t-table, for 7 degrees of freedom at 1.895, there is 10% probability in the
sum of the tails. Therefore, the distribution function at -1.895 is 5%.

11.4. B. For 27 degrees of freedom the 10% critical value is 1.703. ⇒ Prob[|t| < 2] > 90%.
For 27 degrees of freedom the 5% critical value is 2.052. ⇒ Prob[|t| < 2] < 95%.

11.5. C. From the t-table, for 7 degrees of freedom at 1.895, there is 10% probability in the
sum of the tails. Therefore, the distribution function at -1.895 is 5%.

11.6. B. X1 + X4 is Normal with mean 0 and variance 2. ⇒ (X1 + X4)/√2 is a Unit Normal.
X22 + X32 is Chi-Square with 2 degrees of freedom.
⇒ t = {(X1 + X4)/√2}/{√{(X22 + X32)/2}} = (X1 + X4)/√(X22 + X32)
has a t-distribution with 2 degrees of freedom.

11.7. E. W has a t-distribution with one degree of freedom. Prob[W < 6.31] = .95.
Comment: From the t-table, for one degree of freedom, Prob[|W| > 6.31] = 10%.
⇒ Prob[W > 6.31] = 5%. ⇒ Prob[W < 6.31] = 95%.

12.1. A. X = (18 + 24 + 33 + 34 + 30 + 35 + 39 + 12 + 18 + 30) / 10 = 27.3.


Variance of the estimated mean = 80/10 = 8.
Using the Normal Distribution, a 95% confidence interval is ± 1.960 standard deviations.
27.3 ± 1.960 √8 = 27.3 ± 5.54 = 21.8 to 32.8.

12.2. B. X = (18 + 24 + 33 + 34 + 30 + 35 + 39 + 12 + 18 + 30) / 10 = 27.3.


Second moment = (182 + 242 + 332 + 342 + 302 + 352 + 392 + 122 + 182 + 302) / 10 = 815.9.
Sample variance = (10/9)(815.9 - 27.32) = 78.46.
Variance of the estimated mean = 78.46/10 = 7.846.
For 10 - 1 = 9 degrees of freedom , the critical value for 5% sum of the area in both tails is
2.262. 27.3 ± 2.262 √7.846 = 27.3 ± 6.34 = 21.0 to 33.6.
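A quick numerical check of this interval, as a minimal Python sketch (assuming numpy and scipy are available; variable names are mine):

    import numpy as np
    from scipy import stats

    # Data from problem 12.2; the population variance is unknown, so use the t-distribution.
    x = np.array([18, 24, 33, 34, 30, 35, 39, 12, 18, 30], dtype=float)
    n = len(x)
    s2 = x.var(ddof=1)                   # sample variance (divide by n - 1), about 78.46
    t_crit = stats.t.ppf(0.975, n - 1)   # 2.262 for 9 degrees of freedom
    half_width = t_crit * np.sqrt(s2 / n)
    print(x.mean() - half_width, x.mean() + half_width)   # roughly (21.0, 33.6)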

12.3. E. The estimated mean is: (0 + 3 + 4 + 4 + 6 + 9 + 9 + 13)/8 = 48 / 8 = 6.


The sample variance S2 = (1/7)(62 + 32 + 22 + 22 + 02 + 32 + 32 + 72) = 120 /7 = 17.14.
The number of degrees of freedom is 8 - 1 = 7.
Since we want a 90% confidence interval, we allow a total of 10% in both tails, so t = 1.895.
The approximate interval is 6 ± tS/√n = 6 ± (1.895)(√17.14)/√8 = 6 ± 2.77 = (3.23, 8.77).

12.4. D. X = 32. Second moment = 1590.


x square of x
8 64
14 196
18 324
20 400
21 441
22 484
26 676
30 900
42 1764
55 3025
96 9216
Average 32.00 1590.00
Sample Variance = (11/10)(1590 - 322) = 622.6. The sample standard deviation S = 24.95.
We have 11-1 = 10 degrees of freedom, and for a 95% confidence interval we have t = 2.228.
Therefore, an approximate 95% confidence interval for the mean is:
32 ± tS/√ n = 32 ± (2.228)(24.95)/√ 11 = 32 ± 16.76 = (15.2, 48.8).

12.5. B. t = ( X - µ)/(S/√n) = (-6 - (-2))/√(46/20) = -2.638.


For 19 d.f., for a 1-sided test, the 1% critical value is 2.539 and the 0.5% critical value is 2.861.
2.539 < 2.638 < 2.861. ⇒ reject H0 at 1% and do not reject at 0.5%.
Comment: We perform a one-sided test since H1: µ < -2. Note that X < -2.
If X ≥ -2, then we do not reject in this case.

12.6. C. H0: the two means are the same, or that the mean of the difference is zero.
County New Old Difference Square
Program Program of Difference
1 56 49 7 49
2 52 57 - 5 25
3 52 64 -12 144
4 49 53 - 4 16
5 59 64 - 5 25
6 56 68 -12 144
7 60 68 - 8 64
8 56 66 -10 100
Average -6.125 70.875
The mean of the differences is: - 6.125.
The sample variance of the differences is: (8/7)(70.875 - 6.1252) = 38.125.
t = -6.125/√(38.125/8) = -2.806, with 7 degrees of freedom.
ν 0.1 0.05 0.02 0.01
7 1.895 2.365 2.998 3.499
Since 2.365 < 2.806 < 2.998, reject H0 at 5% and do not reject H0 at 2%.
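The paired t-test in 12.6 can be reproduced with a short Python sketch (assuming numpy and scipy are available):

    import numpy as np
    from scipy import stats

    # Paired data from problem 12.6: losses under the new and old programs in 8 counties.
    new = np.array([56, 52, 52, 49, 59, 56, 60, 56], dtype=float)
    old = np.array([49, 57, 64, 53, 64, 68, 68, 66], dtype=float)
    d = new - old                       # differences; mean is -6.125
    t = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    print(t)                            # about -2.81, with 7 degrees of freedom
    print(stats.ttest_rel(new, old))    # same statistic, with a two-sided p-value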
12.7. C. From the previous solution, t = -2.806.
We now do a one-sided rather than two-sided t-test. Since 2.365 < 2.806 < 2.998,
we reject H0 at 5%/2 = 2.5% and do not reject H0 at 2%/2 = 1%.
Comment: At a 2.5% significance level, we have shown that the new program reduces
expected losses. In other words, there is less than a 2.5% chance we would have observed the
improvements we did see if the new program did not reduce expected losses. While the values
in the t-table consider the area in both tails, for this test we only look at the area under the t-
distribution to the left of -2.806.

[Graph of the density of the t-distribution with 7 degrees of freedom.]
Using a computer, the area to the left of -2.806 is 1.31%, for a p-value of 1.31% for the
one-sided test.

12.8. D. From a previous solution, the mean of the differences is: - 6.125.
The sample variance of the differences is 38.125. The cost is 1.5.
t = (-6.125 + 1.5)/√(38.125/8) = -2.119, with 7 degrees of freedom.
ν 0.1 0.05 0.02 0.01
7 1.895 2.365 2.998 3.499
Doing a one-sided test, since 1.895 < 2.119 < 2.365,
we reject H0 at 10%/2 = 5% and do not reject H0 at 5%/2 = 2.5%.

12.9. A. t = ( X - µ)/(S/√n) = (60 - 55)/√(33/15) = 3.371.


For 14 d.f., for a 2-sided test, the 1% critical value is 2.977.
3.371 > 2.977. ⇒ reject H0 at 1%.
Comment: A 99% confidence interval for µ is: 60 ± (2.977)√(33/15) = 60 ± 4.416.
Since 55 is outside this confidence interval, we can reject H0 at 1%.
Using a computer, the p-value of the two-sided test is .46%.

12.10. E. Differences are: 5, -1, 6, 0, 10. Mean of differences is: 4.


Sample variance of differences is: {(5 - 4)2 + (-1 - 4)2 + (6 - 4)2 + (0 - 4)2 + (10 - 4)2}/4 = 20.5.
t = 4/√(20.5/5) = 1.975. Perform a one-sided t-test at 4 degrees of freedom.
1.975 < 2.132. Do not reject at 5%.

12.11. E. X = Σxi/n = 132/11 = 12. S2 = Σ(xi - x)2/(n-1) = 99/10 = 9.9.


For the t-distribution with 11 - 1 = 10 degrees of freedom, for 10% area in both tails, the critical
value is 1.812.
90% confidence interval: X ± t√(S2/n) = 12 ± (1.812)√(9.9/11) = 12 ± (1.812)√.90.
Thus k = 1.812.

12.12. D. t = ( X - µ)/√(S2/n) = (15.84 - 10)/√(16/4) = 2.92.


For 3 d.f. and 5% area in both tails, the critical value is 3.182.
Since 2.92 < 3.182, do not reject at 5%.

12.13. E. t = ( X - µ)/√(S2/n) = 3( X - µ)/S has a t-distribution with 8 degrees of freedom.


( X - µ) < .62S. ⇔ ( X - µ)/S < .62. ⇔ t < 1.86. Prob[t < 1.86] = .95.
Comment: For 8 d.f. at 1.86 there is 10% area in both tails, or 5% area in the righthand tail.

12.14. A. With a sample size of 10, we have a t-distribution with 10 - 1 = 9 d.f.


The critical value for 5% area in both tails is 2.262.
The variance of the mean is s2/10.
The standard deviation of the mean is s/√10.
A 95% confidence interval for µ: x ± 2.26 s/√10.

12.15. A. The point estimate of the mean is: (0 + 3 + 5 + 5 + 5 + 6 + 8 + 8 + 9 + 11)/10 = 6.


S2 = (62 + 32 + 12 + 12 + 12 + 02 + 22 + 22 + 32 + 52)/9 = 10. Thus the sample standard
deviation S = 3.162. We have 10 - 1 = 9 degrees of freedom, and for a 95% confidence interval
we want t = 2.262. An approximate 95% confidence interval for the mean is:
6 ± t S/√ n = 6 ± (2.262)(3.162)/√ 10 = (3.738, 8.262).

12.16. E. t = ( X - m)/(S/√n) = (52.53 - 50)/(3.3/√9) = 2.3.


The critical value at 2.5% for a one-sided test t-test with 8 d.f. is 2.306.
Comment: Since 2.3 < 2.306, we do not reject at 2.5%.

12.17. C. X = 10. S2 = {(12 - 10)2 + (8 - 10)2 + (10 - 10)2}/(3 - 1) = 4.


Standard Deviation of the mean is: √(4/3) = 1.155
For 2 d.f., from the t-table, S(t) = 95% for t = -2.920 standard deviations.
k = 10 - (2.920)(1.155) = 6.63.
Comment: 2.920 leaves 10% outside on both tails, and therefore 5% in the lefthand tail.

12.18. E. Xi - Yi is Normal with mean µX - µY and variance: σ2 = σX2 + σY2.


If H0 is true Xi - Yi is Normal with mean 0 and variance σ2.
Σ (Xi - Yi) is Normal with mean 0 and variance 3σ2.
Σ (Xi - Yi)/(σ√3) is a Unit Normal.
Σ {(Xi - Yi) - ( X - Y )}2/σ2 has Chi-Square Distribution with 3 - 1 = 2 degrees of freedom.
Thus {Σ (Xi - Yi)/(σ√3)}/√(Σ {(Xi - Yi) - ( X - Y )}2/σ2/2) = √1.5 Σ (Xi - Yi) /√(Σ {(Xi - Yi) - ( X - Y )}2)
has a t-statistic with 2 degrees of freedom.
For 2 degrees of freedom, for a two-sided test, the 5% critical value is 4.303.
Since 4.10 ≤ 4.303, we do not reject at the 5% significance level.
Comment: When doing such a paired test of means, the statistic has a t-distribution with n - 1
degrees of freedom, where n is the number of pairs.

12.19. D. There are 10 points, so we have 10 - 1 = 9 degrees of freedom when using the
Student’s t-distribution to get an interval estimate of the mean.
For a 90% confidence interval t = 1.833.
The point estimate of the mean is: X = (2 + 0 + 4 + 4 + 6 + 3 + 1 + 5 + 6 + 9)/10 = 4.
The sample variance is: S2 = {(2-4)2 + (0-4)2 + (4-4)2 + (4-4)2 + (6-4)2 + (3-4)2 + (1-4)2 +
(5-4)2 + (6-4)2 + (9-4)2}/9 = 7.111. The 90% confidence interval for the mean is:
X ± t S/√n = 4 ± (1.833)(√7.111)/√10 = 4 ± 1.55 = (2.45, 5.55).

12.20. A. The estimated variance of the mean is: 135/15 = 9.


Looking at the t-table, with 15 - 1 = 14 degrees of freedom, for a 90% confidence interval we
want ±1.761 standard deviations. 47.21 ± (1.761)(3) = (41.93, 52.49).
Comment: Since this is a variance estimated from the data, we use the t-distribution rather than
the Normal Distribution.

12.21. C. Zi is Normal with mean 0 and variance σX2 + σY2.


Z is Normal with mean 0 and variance (σX2 + σY2)/9.
3 Z /√(σX2 + σY2) has a unit Normal Distribution.
8SZ2/(σX2 + σY2) is Chi-Square with 9 - 1 = 8 degrees of freedom.
⇒ {3 Z /√(σX2 + σY2)}/√{SZ2/(σX2 + σY2)} = 3 Z /SZ has a t-distrib. with 8 degrees of freedom.
P[3 Z /SZ ≤ 1.860] = .95. ⇒ c = 1.860/3 = .620.
Comment: From the t-table, for 8 degrees of freedom, 10% probability on both tails has a
critical value of 1.860.

12.22. E. X - Y is Normal with mean µx - µy, (and variance σx2 + σy2 - 2ρσxσy.) Let Z = X - Y.
√8 ( X - Y )/√{(1/7)Σ{(Xi - Yi) - ( X - Y )}2} = √n Z /√{(1/(n-1))Σ(Zi - Z)2} = Z /{SZ/√n}.
This is the form of a t-distribution with n -1 = 8 -1 = 7 degrees of freedom.
Prob[|t| > 2.365] = 5%.
Comment: The absolute value results in a two-sided t-test.
For 7 degrees of freedom, for a two-sided 5% significance level, the critical value is 2.365.

12.23. A. If X follows a LogNormal Distribution with parameters µ and σ, then ln(X) follows a
Normal Distribution with parameters µ and σ. Thus we have that ln(2) = .693, ln(8) = 2.079,
ln(13) = 2.565 and ln(27) = 3.296 are 4 random draws from a Normal Distribution.
The point estimate of the mean is: (.693 + 2.079 + 2.565 + 3.296)/4 = 2.158.
S2 = {(.693 - 2.158)2 + (2.079 - 2.158)2 + (2.565 - 2.158)2 + (3.296 - 2.158)2}/3 = 1.204.
Thus the sample standard deviation S = 1.097.
We have 4-1 = 3 degrees of freedom, and for a 90% confidence interval we have t = 2.353.
Therefore, an approximate 90% confidence interval for the mean is:
2.158 ± t S/√n = 2.158 ± (2.353)(1.097)/√4 = 2.158 ± 1.291 = (.867, 3.449).

12.24. X is Normal with mean 0 and variance σ2/9.


Therefore, 3 X /σ is a Standard Normal.
8S2/σ2 is Chi-Square with 8 degrees of freedom.
Therefore, (3 X /σ)/√{S2/σ2} = 3 X /S has a t-distribution with 8 degrees of freedom.
Prob( X > S) = Prob(3 X /S > 3) = Prob[t with 8 d.f. > 3], which is between 1/2% and 1%.
Comment: ( X - µ0)/(S/√n) follows a t-distribution with n-1 d.f.
Using the t-table, for 8 degrees of freedom, Prob[t > 2.896] = 1% and Prob[t > 3.355] = 1/2%.
Using a computer, the exact probability that t > 3 is 0.85%.

Solutions to problems in the remaining sections appear in Study Guides M, etc.


Mahler’s Guide to
Regression
Solutions to Problems
Sections 13-21

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-M


Solutions to Problems, Sections 13-21

13.1. C. For 25 - 2 = 22 degrees of freedom, for 1 - 90% = 10% area in both tails,
the critical value for the t-distribution is 1.714.
The 90% confidence interval for the slope is:
β^ ± t sβ^ = 1.73 ± (1.714)(.24) = 1.73 ± .41 = [1.32, 2.14].

13.2. E. For 12 - 2 = 10 degrees of freedom, for 1 - 98% = 2% area in both tails,
the critical value for the t-distribution is 2.764. The 98% confidence interval is:
α^ ± t sα^ = α^ ± (2.764)(126) = α^ ± 348, of width (2)(348) = 696.

13.3. B. &. 13.4. E. s2 = Σε^i2/(N - 2) = 276/(10 - 2) = 34.5.
Var[α^] = s2(ΣXi2/N)/Σxi2 = (34.5)(385/10)/82.5 = 16.1. sα^ = √16.1 = 4.012.
Var[β^] = s2/Σxi2 = 34.5/82.5 = .418. sβ^ = √.418 = .647.
For N - 2 = 8 degrees of freedom, for a 1% area in both tails of the t-distribution, t = 3.355.
Therefore, for a 99% confidence interval for α, we want:
α^ ± 3.355 sα^ = 6.93 ± (3.355)(4.012) = 6.93 ± 13.46 = (-6.5, 20.4).
Therefore, for a 99% confidence interval for β, we want:
β^ ± 3.355 sβ^ = 2.79 ± (3.355)(.647) = 2.79 ± 2.17 = (.62, 4.96).
Comment: Similar to 4, 11/00, Q.5.
Comment: Similar to 4, 11/00, Q.5.

^
13.5. E. &. 13.6. A. ESS = Σ ^εi 2 = Σ(Yi - Yi )2 = 272.
s2 = ESS/(N - 2) = 272/(12 - 2) = 27.2.
^
α^ = Y - β X ⇒ 9.88 = 390/12 - 2.36 X ⇒ X = 9.585.
Var[X] = Σ(Xi - X )2 / N = 1283/12.
However, Var[X] = ΣXi2 / N - X 2 ⇒ ΣXi2 / N = 1283/12 + 9.5852 = 198.8.

Var[ α^ ] = s2(ΣXi2 / N)/Σxi2 = (27.2)(198.8)/1283 = 4.215. sα^ = √4.215 = 2.05.


^
Var[ β] = s2 /Σxi2 = 27.2/1283 = .0212. sβ^ = √.0212 = .146.
For N - 2 = 10 degrees of freedom, for a 10% area in both tails of the t-distribution, t = 1.812.
Therefore, for a 90% confidence interval for α, we want:
α^ ± 1.812 sα^ = 9.88 ± (1.812)(2.05) = 9.88 ± 3.71 = (6.2, 13.6).
Therefore, for a 90% confidence interval for β, we want:
^
β ± 1.812 sβ^ = 2.36 ± (1.812)(.146) = 2.36 ± .265 = (2.1, 2.6).
Comment: Similar to 4, 11/01, Q.5.

13.7. β^ = Σxiyi/Σxi2 = -20008/180021 = -0.111. α^ = Y - β^ X = 31.7 - (-0.111)(155.8) = 49.0.
X X^2 Y x y x^2 xy y^2
10 100 60 -145.8 28.3 21,267 -8,750 802.8
25 625 40 -130.8 8.3 17,117 -5,233 69.4
50 2,500 50 -105.8 18.3 11,201 -5,292 336.1
100 10,000 30 -55.8 -1.7 3,117 -1,675 2.8
250 62,500 10 94.2 -21.7 8,867 942 469.4
500 250,000 0 344.2 -31.7 118,451 0 1002.8
Sum 935 325,725 190 -0.0 -0.0 180,021 -20,008 2683.3
Avg. 155.8 31.7
^
TSS = Σ yi2 = 2683. RSS = β Σxiyi = (-.111)(-20008) = 2221.
ESS = TSS - RSS = 2683 - 2221 = 462. s2 = ESS/(N - 2) = 462/(6 - 2) = 115.

Var[ α^ ] = s2(ΣXi2 / N)/Σxi2 = (115)(325725/6)/180021 = 34.68. sα^ = √34.68 = 5.89.


^
Var[ β] = s2 /Σxi2 = 115/180021 = .000639. sβ^ = √.000639. = .0253.
For N - 2 = 4 degrees of freedom, for a 2% area in both tails of the t-distribution, t = 3.747.
Therefore, for a 98% confidence interval for α, we want:
α^ ± 3.747 sα^ = 49.0 ± (3.747)(5.89) = 49.0 ± 22.1 = (27, 71).
Therefore, for a 98% confidence interval for β, we want:
^
β ± 3.747 sβ^ = -0.111 ± (3.747)(.0253) = -0.111 ± .0948 = (-.206, -.016).
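The whole calculation in 13.7 can be checked with a short Python sketch (assuming numpy and scipy are available); the variable names are mine, not the notation of the text.

    import numpy as np
    from scipy import stats

    # Raw data from problem 13.7.
    X = np.array([10, 25, 50, 100, 250, 500], dtype=float)
    Y = np.array([60, 40, 50, 30, 10, 0], dtype=float)
    N = len(X)

    x = X - X.mean()                                       # deviations form
    beta = (x * (Y - Y.mean())).sum() / (x**2).sum()       # about -0.111
    alpha = Y.mean() - beta * X.mean()                     # about 49.0
    s2 = ((Y - alpha - beta * X)**2).sum() / (N - 2)       # about 115
    se_alpha = np.sqrt(s2 * (X**2).sum() / (N * (x**2).sum()))
    se_beta = np.sqrt(s2 / (x**2).sum())
    t = stats.t.ppf(0.99, N - 2)                           # 3.747, for a 98% confidence interval
    print(alpha - t * se_alpha, alpha + t * se_alpha)      # roughly (27, 71)
    print(beta - t * se_beta, beta + t * se_beta)          # roughly (-0.206, -0.016)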

13.8. D. &. 13.9. C. TSS = Σ yi2 = 6748. ESS = TSS - RSS = 6748 - 1279 = 5469.
s2 = ESS/(N - 2) = 5469/(14 - 2) = 455.75.
^
α^ = Y - β X = 25.86 - (.643)(13.86) = 16.95. Var[X] = Σ(Xi - X )2 / N = 3096/14.
However, Var[X] = ΣXi2 / N - X 2 ⇒ ΣXi2 / N = 3096/14 + 13.862 = 413.2.

Var[ α^ ] = s2(ΣXi2 / N)/Σxi2 = (455.75)(413.2)/3096 = 60.83. sα^ = √60.83 = 7.80.


^
Var[ β] = s2 /Σxi2 = 455.75/3096 = .1472. sβ^ = √.1472 = .384.
For N - 2 = 12 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.179.
Therefore, for a 95% confidence interval for α, we want:
α^ ± 2.179 sα^ = 16.95 ± (2.179)(7.80) = 16.95 ± 17.00 = (0, 34).
Therefore, for a 95% confidence interval for β, we want:
^
β ± 2.179 sβ^ = .643 ± (2.179)(.384) = .643 ± .837 = (-0.2, 1.5).
Comment: Similar to 4, 11/02, Q.38.
13.10. E. s2 = Σ(Yi - Y^i)2/(N - k) = 2.79/8 = .349.
Variance = 2nd moment - mean2. ⇒ Σxi2/N = ΣXi2/N - X2. ⇒ 180/10 = ΣXi2/10 - 62.
⇒ ΣXi2 = 180 + 360 = 540. Var[α^] = s2(ΣXi2/N)/Σxi2 = (.349)(540/10)/180 = .1047.
sα^ = √.1047 = .324. For 8 degrees of freedom and 95%, the t-statistic is 2.306.
Therefore we want: α^ ± (2.306)(.324) = α^ ± .747.
Thus the width of the interval is: (2)(.747) = 1.49.

13.11. C. For 12 - 2 = 10 degrees of freedom and 95% confidence, the t statistic is 2.228.
The confidence interval for β is 2.5 ± 1.3. Therefore, 1.3 = t sβ^ . ⇒ sβ^ = 1.3/2.228 = .583.
Now sβ^ 2 = s2/Σxi2. ⇒ s2 = sβ^ 2Σxi2 = (.5832)(.826) = .281.

13.12. (i) β^ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2} =
{(13)(286.6299) - (34.023)(110.679)}/{(13)(91.3978) - 34.0232} = -39.443/30.607 = -1.289.
α^ = Y - β^ X = (110.679/13) - (-1.289)(34.023/13) = 11.887.
(ii) (a) ESS = (-.005)2 + (-.049)2 + (-.019)2 + (-.043)2 + (.032)2 + (.024)2 + (.070)2 + (.068)2
+ (.037)2 + (-.095)2 + (-.129)2 = .049019.
s2 = ESS/(n-2) = .049019/11 = 0.004456.
(b) Var[X] = 91.3978/13 - (34.023/13)2 = 0.1811.
Var[β^] = s2/{nVar[X]} = 0.004456/{(13)(0.1811)} = .00189.
For a t-distribution with 11 d.f., for 5% area in both tails, the critical value is 2.201.
A 95% confidence interval for β is: -1.289 ± 2.201√.00189 = -1.289 ± .096 = -1.385 to -1.193.
(c) If the relationship: P = k/L. ⇔ lnP = lnk - lnL. ⇔ Y = lnk - X.
is correct, then the slope parameter of the regression line should be -1.
Since -1 is not in the 95% confidence interval for β, the data do not support the suggested
relationship; we can reject this hypothesis at the 5% significance level.
(iii) The residuals show that the line underestimates in the middle. A straight line doesn’t fit the
data very well.

[Plot of the residuals against ln L.]
Comment: One could back out α^ and β^ from the given fitted values.
Check: 11.887 + (-1.289)(1.946) = 9.379. 11.887 + (-1.289)(2.197) = 9.055.
Part (ii) (c) is equivalent to a t-test of the hypothesis β = -1.

13.13. D. s2 = Σ ^εi 2/(N - 2) = 7832/(20 - 2) = 435.11.


^
Var[ β] = s2 /Σxi2 = 435.11/10,668 = .04079. sβ^ = √.04079 = .202.
For N - 2 = 18 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.101.
^
Therefore, for a 95% confidence interval, we want β ± 2.101 sβ^ = -1.104 ± (2.101)(.202) =
-1.104 ± .424 = (-1.528, -.680).
Comment: We are given that TSS = Σyi2 = Σ(Yi - Y )2 = 20,838 and ESS = Σ ^εi 2 = 7,832.
Therefore, RSS = TSS - ESS = 20838 - 7832 = 13006.
^ ^ ^ ^
However, we also have that RSS = β Σxiyi = β( β Σxi2) = β 2Σxi2 = (-1.104)2(10668) = 13002,
matching subject to rounding.

13.14. D. R2 = 1 - ESS/TSS = 1 - 7832/20838 = 1 - .376 = .624.
1 - adjusted R2 = (1 - R2)(N - 1)/(N - k) = (.376)(20 - 1)/(20 - 2) = .397. Adjusted R2 = .603.
^
13.15. B. ESS ≡ Σ ^εi 2 = Σ(Yi - Yi )2 = 2394.
s2 = ESS/(N - k) = 2394/(8 - 2) = 399.
Σxi2 = Σ(Xi - X )2 = 1.62.
sβ^2 = s2 / Σxi2 = 399/1.62 = 246.3. sβ^ = √246.3 = 15.69.
For 8 -2 = 6 degrees of freedom, for 10% two-tailed test using the t-distribution, or a 90%
confidence interval, we want ±1.943 standard deviations.
^
β ± t.95 sβ^ = -35.69 ± (1.943)(15.69) = (-66.2, -5.2).

13.16. (i) A scatter plot of wx against x:


[Scatter plot of wx against x, for x from 70 to 77.]
The relationship of x and y is not linear,
while the relationship of x and w appears to be approximately linear.
(ii) β^ = {NΣXiWi - ΣXiΣWi}/{NΣXi2 - (ΣXi)2} =
{(8)(-1246.7879) - (588)(-17.0568)}/{(8)(43260) - 5882} = 0.16397.
α^ = W - β^ X = (-17.0568/8) - (0.16397)(588/8) = -14.184.
Fitted line is: wx = -14.184 + 0.16397x.
Alternately, SXX = ΣXi2 - (ΣXi)2/N = 43260 - 5882/8 = 42.
SXW = ΣXiWi - ΣXiΣWi /N = -1246.7879 - (588)(-17.0568)/8 = 6.8869
β^ = SXW/SXX = 6.8869/42 = 0.16397. Proceed as before.
(b) A scatter plot of wx against x, showing the regression line:
[Scatter plot of wx against x, with the fitted regression line superimposed.]
(c) RSS = SXW2/SXX = 6.88692/42 = 1.129271.
TSS = SWW = ΣWi2 - (ΣWi) 2/N = 37.5173 - 17.05682/8 = 1.150497.
ESS = TSS - RSS = 1.150497 - 1.129271 = 0.021226.
s2 = ESS/(N-2) = 0.021226/6 = 0.003538.
^
Var[ β] = s2 /SXX = 0.003538/42 = .000084229.
For the t-distribution with 6 degrees of freedom, in order to have a total of 5% area in both tails,
the critical value is 2.447.
Thus a 95% confidence interval for β is:
0.1640 ± 2.447√ .000084229 = 0.1640 ± .0225, or (0.141, 0.187).
(d) The fitted value of w for age 71 is: -14.184 + (0.16397)(71) = -2.542.
The fitted value of y for age 71 is: exp[-2.542] = 0.0787.
The fitted value of n for age 71 is: (0.0787)(471) = 37.1.
The fitted value of w for age 76 is: -14.184 + (0.16397)(76) = -1.722.
The fitted value of y for age 76 is: exp[-1.722] = 0.1787.
The fitted value of n for age 76 is: (0.1787)(468) = 83.6.
(iii) E[Nx] = Exbcx. ⇒ E[Yx] = E[Nx/Ex] = bcx. ⇒ ln(E[Yx]) = ln b + x ln c.
The above regression fitted the model: wx = lnyx = α + βx.
This model was based on an assumption that: E[lnYx] = α + βx.
Thus the two models are similar.
However, E[lnYx] ≠ ln(E[Yx]), so the two models are not the same.
Comment: If Nx is a random variable with mean Exbcx, then mortality approximately follows
Gompertz’s Law, which has a force of mortality of bcx. See Actuarial Mathematics, Section 3.7.
13.17. (a) A scatter plot of y against x:

[Scatter plot of y against x, for x from 30 to 60.]
While the relationship of x and y appears to be approximately linear for the middle groups, it
seems to deviate from linear for the lowest and highest age groups.
(b) SXX = ΣXi2 - (ΣXi)2/N = 17437.5 - 3602/8 = 1237.5.
SXY = ΣXiYi - ΣXiΣYi /N = -9.0429 - (360)(-2.9392)/8 = 123.22
β^ = SXY/SXX = 123.22/1237.5 = 0.09957.
α^ = Y - β^ X = (-2.9392/8) - (0.09957)(360/8) = -4.848.
Fitted line is: y = -4.848 + 0.09957x.
(c) RSS = SXY2/SXX = 123.222/1237.5 = 12.2692.
TSS = SYY = ΣYi2 - (ΣYi) 2/N = 13.615 - 2.93922/8 = 12.5351.
ESS = TSS - RSS = 12.5351 - 12.2692 = 0.2659.
s2 = ESS/(N-2) = 0.2659/6 = 0.04432.
^
Var[ β] = s2/SXX = 0.04432/1237.5 = .00003581.
For the t-distribution with 6 degrees of freedom, in order to have a total of 1% area in both tails,
the critical value is 3.707.
Thus a 99% confidence interval for β is:
0.09957 ± 3.707√ .00003581 = 0.09957 ± .0222, or (0.077, 0.122).
(d) If the probability of having coronary heart disease for the different age groups were the
same, then the slope would be zero. However, zero is not in the 99% confidence interval for β.
Therefore, at a significance level of 1%, we can reject the hypothesis that the probability of
having coronary heart disease for the different age groups is the same.
Comment: ln[p/(1-p)] is called the logit function, and is used in Generalized Linear Models.

13.18. D. s2 = Σ ^εi 2/(N - 2) = ESS/(N - 2) = 5348/(20 - 2) = 297.1.


Σxi2 = Σ(Xi - X )2 = ΣXi2 - N X 2. ΣXi2 = Σxi2 + N X 2 = 2266 + (20)(1002) = 202266.
Var[ α^ ] = s2ΣXi2 / (NΣxi2) = (297.1)(202266)/{(20)(2266)} = 1326. sα^ = √1326 = 36.4.
For N - 2 = 18 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.101.

Therefore, for a 95% confidence interval, we want: α^ ± 2.101 sα^ = 68.73 ± (2.101)(36.4) =
68.73 ± 76.5 = (-7.8, 145.2).

^
13.19. A. Var[ β] = s2 /Σxi2 = 297.1/2266 = .1311. sβ^ = √.1311= .362.
For N - 2 = 18 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.101.
Therefore, for a 95% confidence interval we want: ± 2.101 sβ^ .
The width of the confidence interval is: (2)(2.101)(.362) = 1.52.

14.1. The critical values for 5% and 1% are 8.88 and 27.67 respectively.
Since 30.03 > 27.67, the p-value is less than 1%.
Comment: Using a computer, the p-value is: 0.89%.

14.2. The critical values for 5% and 1% are 2.60 and 3.94 respectively.
Since 2.60 < 2.72 < 3.94, we reject at 5% and do not reject at 1%.

14.3. Finding the appropriate column of the F-Table where the critical value for 5% is 3,
ν2 = 12.

14.4. In the F-Table for ν1 = 1 and ν2 = 10, the 1% critical value is 10.04.
Therefore at the 1% significance level with 10 degrees of freedom the t-statistic has a critical
value of √10.04 = 3.169.
Comment: This is indeed the critical value shown in the t-table.

14.5. As shown in the F-Table, 15.21.

14.6. As shown in the F-Table, for 3 and 8 degrees of freedom, the 95th percentile is 4.07.
Fν1,ν2(x) = 1 - Fν2,ν1(1/x). Therefore 95% = F3,8(4.07) = 1 - F8,3(1/4.07). ⇒ F8,3(.246) = 5%.
The 5th percentile of the F-Distribution with 8 and 3 d.f. is 0.246.

14.7. Z12 + Z22 is Chi-Square with 2 degrees of freedom.


Z32 + Z42 + Z52 is Chi-Square with 3 degrees of freedom.
Therefore, {(Z12 + Z22)/2}/{(Z32 + Z42 + Z52)/3} has an F-Distribution with 2 and 3 d.f.
Prob[Z12 + Z22 ≥ 20.55(Z32 + Z42 + Z52)] = Prob[(Z12 + Z22)/(Z32 + Z42 + Z52) ≥ 20.55] =
Prob[{(Z12 + Z22)/2}/{(Z32 + Z42 + Z52)/3} ≥ 30.82] = Prob[F2,3 ≥ 30.82] = 1%.

14.8. As shown in the F-Table, 3.01.


14.9. F = second sample variance / first sample variance = 54.33/4.92 = 11.04.
observation Sample
Sample 1 2 3 4 Mean Variance
A 5 7 2 3 4.25 4.92
B 4 1 15 6.67 54.33
F has an F-Distribution with 2 and 3 degrees of freedom.
The 5% critical value is 9.55. The 1% critical value is 30.82.
Since 9.55 < 11.04 < 30.82, we reject H0 at 5% and do not reject at 1%.
Comment: The p-value is 4.1%.
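A minimal Python sketch of this F-test (assuming numpy and scipy are available), which also produces the p-value quoted in the comment:

    import numpy as np
    from scipy import stats

    # Data from problem 14.9: two small samples.
    a = np.array([5, 7, 2, 3], dtype=float)
    b = np.array([4, 1, 15], dtype=float)

    f = b.var(ddof=1) / a.var(ddof=1)          # 54.33 / 4.92 = about 11.04
    df_num, df_den = len(b) - 1, len(a) - 1    # 2 and 3 degrees of freedom
    p_value = stats.f.sf(f, df_num, df_den)    # upper-tail probability, about 4.1%
    print(f, p_value)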

14.10. D. Σ (X i - X )2 is (n-1) times the sample variance of X, and thus is σX2 times a
Chi-Square distribution with n - 1 = 11 degrees of freedom.
Σ(Yi2 - Y )2 is (m-1) times the sample variance of Y, and thus is σY2 times a Chi-Square
distribution with m - 1 = 14 degrees of freedom.
If H0 is true, ({Σ(Xi - X )2/σX2}/11) /({Σ(Yi2 - Y )2/σY2}/14) = 14W/11, has an F-Distribution with 11
and 14 degrees of freedom. The critical value for 1% is 3.86.
Prob[14W/11 > 3.86] = 1%. ⇒ Prob[W > 3.03] = 1%.

14.11. E. Mean of X is: 255/25 = 10.2. Second Moment of X is: 2867/25 = 114.68.
Variance of X is: 114.68 - 10.22 = 10.64. Sample Variance of X is: (25/24)(10.64) = 11.08.
Mean of Y is: 212/20 = 10.6. Second Moment of Y is: 2368/20 = 118.4.
Variance of Y is: 118.4 - 10.62 = 6.04. Sample Variance of Y is: (20/19)(6.04) = 6.36.
F = (Sample Variance of X)/(Sample Variance of Y) = 11.08/6.36 = 1.74.
For 24 and 19 degrees of freedom, the critical value for 5% is 2.11.
Since 1.74 < 2.11, we do not reject at 5%.
Comment: The “regular” variance has n in the denominator, while the sample variance has
n - 1 in the denominator. Thus one way to calculate the sample variance is as:
{n/(n-1)}(Second Moment - Square of the Mean).

14.12. B. To test the hypothesis σW2 = σX2 versus σW2 < σX2, F = 1787/598 = 2.99.
For 12 and 12 d.f. the 5% critical value is 2.69 and the 1% critical value is 4.16.
2.69 < 2.99 < 4.16. ⇒ Reject σW2 = σX2 at 5% but not at 1%.
To test the hypothesis σX2 = σY2 versus σX2 < σY2, F = 3560/1787 = 1.99.
1.99 < 2.69. ⇒ Do not reject σX2 = σY2 at 5%.

14.13. E. F = 1083/722 = 1.5.


Y is in the numerator, and has 200 - 1 = 199 degrees of freedom; ν1 = 199. ν2 = 100 - 1 = 99.
The survival function of the F-Distribution with 199 and 99 degrees of freedom at 1.5 is:
1 - β[199/2, 99/2; ((199)(1.5) / (99 + (199)(1.5))] = 1 - β [99.5, 49.5; .751]
Comment: Using a computer, the p-value is 1.22%.
F(x) = β[ν1/2, ν2/2; ν1x / (ν2 + ν1x)] = 1 - β[ν2/2, ν1/2; ν2/(ν2 + ν1x)].
Thus the p-value can also be written as: β[49.5, 99.5; .249].

14.14. A. The sample variance of X divided by σX2 has a Chi-Square Distribution with 10 d.f.
The sample variance of Y divided by σY2 has a Chi-Square Distribution with 6 d.f.
Therefore, (SX2/σX2) / (SY2/σY2) = (σY2/σX2)(SX2/SY2) has an F-Distribution with 10 and 6 d.f.
If H0 is true, then (σY2/σX2)(SX2/SY2) = (1/b)(SX2/SY2) has an F-Distribution with 10 and 6 d.f.
(1/b)(SX2/SY2) = (1/b)(189/37) = 5.108/b. For 10 and 6 d.f., the 5% critical value is 4.06.
Provided 5.108/b > 4.06 we reject H0. 1.26 > b. The largest b for which we reject is 1.25.

14.15. E. Taking logarithms, each sample has a Normal Distribution with parameters µ
and σ equal to those of the samples LogNormal Distribution.
Sample one: 6.908, 7.313, 8.006, 10.127, 13.122.
The mean is 9.095 and the sample variance is 6.607.
Sample two: 6.215, 6.908, 7.601.
The mean is 6.908 and the sample variance is .480.
F = 6.607/.480 = 13.76.
We perform a two-sided test, since the alternative is that the two variances are not equal.
The (one-sided) 5% critical value for 4 and 2 degrees of freedom is 19.25.
13.76 < 19.25 so we do not reject at 10%, for a 2-sided test.
Comment: This test does not have a lot of power with such small sample sizes.

14.16. C. Σ (X i - X )2 is (n-1) times the sample variance of X, and thus is σX2 times a
Chi-Square distribution with n - 1 = 9 degrees of freedom. Y has a known mean of zero,
therefore ΣYi2 is σY2 times a Chi-Square distribution with m = 6 degrees of freedom.
If H0 is true, ({Σ(Xi - X )2/σX2}/9) /({ΣYi2/σY2}/6) = 2W/3, has an F-Distribution with 9 and 6
degrees of freedom. The critical value for 5% is 4.10.
Prob[2W/3 > 4.10] = 5%. ⇒ Prob[W > 6.15] = 5%.

14.17. A. ΣXi2 has a Chi-Square Distribution with 6 degrees of freedom.


ΣYi2 has a Chi-Square Distribution with 8 degrees of freedom.
(ΣXi2/6)/(ΣYi2/8) = (4/3)ΣXi2/ΣYi2 = Z has an F-Distribution with 6 and 8 d. f.
Finding the 1% critical value in the F-Table, the 99th percentile of Z is 6.37.

14.18. D. X - Y is Normal with mean 0 and variance: σ2/4 + σ2/4 = σ2/2.


Therefore, ( X - Y )2/(σ2/2) is Chi-Square with 1 d.f.
Σ(Xi - X )2/σ2 is Chi-Square with 3 d.f.
Σ(Yi - Y )2/σ2 is Chi-Square with 3 d.f.
Therefore, {Σ(Xi - X )2 + Σ(Yi - Y )2}/σ2 is Chi-Square with 6 d.f.
Therefore, {(( X - Y )2/(σ2/2))/1}{({Σ(Xi - X )2 + Σ(Yi - Y )2}/σ2)/6} =
12( X - Y )2/{Σ(Xi - X )2 + Σ(Yi - Y )2} has an F-distribution with 1 and 6 degrees of freedom.
Comment: (n-1)S2/σ2 = Σ(Xi - X )2/σ2 is Chi-Square with n -1 = 4 - 1 = 3 d.f.

14.19. B. (X - 2)2/σ2 is the square of a Unit Normal, or a Chi-Square with 1 d.f.


(Y - 1)2/σ2 + (Z - 2)2/σ2 is a Chi-Square with two degrees of freedom.
⇒ {{(X - 2)2/σ2}/1}/{{(Y - 1)2/σ2 + (Z - 2)2/σ2}/2} = 2 (X - 2)2/{(Y - 1)2 + (Z - 2)2} is an
F-Distribution with 1 and 2 degrees of freedom, thus so would W if 4c = 2. ⇒ c = 1/2.

14.20. B. Xi/2 is a Unit Normal. ΣXi2/4 (summing i = 1 to 9) is Chi-Square with 9 d.f.
Yj/3 is a Unit Normal. ΣYj2/9 (summing j = 1 to 8) is Chi-Square with 8 d.f.
({ΣXi2/4}/9)/({ΣYj2/9}/8) = 2ΣXi2/ΣYj2 has an F-Distribution with 9 and 8 d.f.
P[2ΣXi2/ΣYj2 > 5.91] = 1%. ⇒ c = 5.91/2 = 2.96.

14.21. B. S12 divided by σ12 has a Chi-Square Distribution with 9 - 1 = 8 d.f.


S22 divided by σ22 has a Chi-Square Distribution with 6 - 1 = 5 d.f.
⇒ (S12/σ12) / (S22/σ22) = (σ22/σ12)W has an F-Distribution with 8 and 5 degrees of freedom.
If H0 is true then 2W has an F-Distribution with 8 and 5 d.f.
For a one sided F-Test of size 5%, consulting the table, the critical value is 4.82.
2W = 4.82. ⇒ W = 2.41.
Comment: We reject when W > 2.41; when σ12 > σ22/2, we expect S12 / S22 to be big.

14.22. A. X - 30 is Normal with mean 0 and variance σ2/7.


⇒ ( X - 30)/(σ/√7) is a Unit Normal.
⇒ 7( X - 30)2/σ2 is Chi-Square with one degree of freedom.
Y - 30 is Normal with mean 0 and variance σ2/14.
⇒ ( Y - 30)/(σ/√14) is a Unit Normal.
⇒ 14( Y - 30)2/σ2 is Chi-Square with one degree of freedom.
Therefore, ({14( Y - 30)2/σ2}/1)/({7( X - 30)2/σ2}/1) = 2( Y - 30)2/( X - 30)2 has an F distribution
with 1 and 1 degrees of freedom.

14.23. Consulting the F-Table for 12 and 6 degrees of freedom, S(4.00) = 5%.
Therefore, using the general fact that Fν1,ν2(x) = 1 - Fν2,ν1(1/x), for 6 and 12 degrees of freedom,
F(1/4.00) = F(.250) = 1 - 5% = 95%. Thus statement a is true.
Consulting the F-Table for 6 and 12 degrees of freedom, S(4.82) = 1%.
Thus statement b is true.
Consulting the F-Table for 12 and 6 degrees of freedom, S(7.72) = 1%.
Therefore, using the general fact that Fν1,ν2(x) = 1 - Fν2,ν1(1/x), for 6 and 12 degrees of freedom,
F(1/7.72) = F(.130) = 1%. Thus statement c is true.
14.24. (i) (a) [Dotplots for the Isotonic and Isometric samples.]
It seems plausible that the data in each plot could be from a Normal variable.


(b) sA2 = 0.3027. sB2 = 0.7734. H0: σA2 = σB2 versus H1: σA2 ≠ σB2.
F = .7734/.3027 = 2.555, with 9 and 9 degrees of freedom.
Perform a two-sided test. Since 2.555 < 3.18, do not reject H0 at 10%.
(c) sp2 = (9sA2 + 9sB2)/18 = .5381. A = 2.76 and B = 3.37.
t = (3.37 - 2.76)/√{.5381(1/10 + 1/10)} = 1.859 with 18 degrees of freedom.
H0: µB = µA versus H1: µB > µA .
Perform a one-sided test.
Since 1.734 < 1.859 < 2.101, reject H0 at 5% but not at 2.5%.
(ii) (a) The observed difference is 3.37 - 2.76 = 0.61.
For the t-distribution with 18 degrees of freedom for 5% area in both tails, 2.101.
0.61 ± 2.101√ {.5381(1/10 + 1/10)} = 0.61 ± 0.69 = (-0.08, 1.30).
(b) 18sp2/σ2 has a Chi-Square Distribution with 18 degrees of freedom.
Using the 2.5th and 97.5th percentiles, 95% = Prob[8.23 < 18sp2/σ2 < 31.53] =
Prob[8.23 < 9.6858/σ2 < 31.53] = Prob[31.53/9.6858 < σ2 < 8.23/9.6858] =
Prob[0.554 < σ < 1.085]. A 95% confidence interval for σ is (0.554, 1.085).
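The pooled two-sample t-test in part (i)(c) can be checked from the summary statistics with a short Python sketch (assuming numpy and scipy are available):

    import numpy as np
    from scipy import stats

    # Summary statistics from problem 14.24: two samples of 10 observations each.
    n_a, n_b = 10, 10
    mean_a, mean_b = 2.76, 3.37
    var_a, var_b = 0.3027, 0.7734

    # Pooled variance and the two-sample t-statistic, with n_a + n_b - 2 = 18 d.f.
    sp2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)   # about 0.5381
    t = (mean_b - mean_a) / np.sqrt(sp2 * (1 / n_a + 1 / n_b))         # about 1.86
    p_one_sided = stats.t.sf(t, n_a + n_b - 2)                          # between 2.5% and 5%
    print(t, p_one_sided)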

15.1. D. t-stat = β^/sβ^ = 13/7 = 1.857 with 20 - 2 = 18 degrees of freedom.
For 18 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are:
10% 5% 2% 1%
1.734 2.101 2.552 2.878
Since 1.734 < 1.857 < 2.101, we reject H0 at 10% and do not reject H0 at 5%.

15.2. E. β^ = Σxiyi/Σxi2 = Σ(Xi - X)(Yi - Y)/Σ(Xi - X)2 = 1942.1/24.88 = 78.06.

15.3. C. ESS ≡ Σ ^εi 2 = Σ(Yi - Y^ i )2 = 282,750. s2 = ESS/(N - k) = 282,750/(15 - 2) = 21,750.


Σxi2 = Σ(Xi - X )2 = 24.88. sβ^2 = s2 / Σxi2 = 21750/24.88 = 874.2. sβ^ = √874.2 = 29.57.
^
t = β / sβ^ = 78.06/29.57 = 2.640. For 15 - 2 = 13 degrees of freedom, for a two-tailed test using
the t-distribution, the critical values are:
10% 5% 2% 1%
1.771 2.160 2.650 3.012
Since 2.160 < 2.640 < 2.650, we reject H0 at 5% and do not reject H0 at 2%.
Comment: If one applies the F-Test, F = (RSS/(k-1))/(ESS/(N-k)) =
((434348 - 282750)/1)/(282750/13) = 6.970 = 2.6402 = t2.
For 1 and 13 degrees of freedom, since 4.67 < 6.970 < 9.07, we reject at 5% and do not reject
at 1%. Since the F-Table has fewer significance levels, we can not use it to determine whether
to reject or not at 2%. (The critical value at 2% is 2.6502 = 7.02, is not shown in the F-Table.
6.970 < 7.02, so we would not reject at 2%.)
Note that RSS = TSS - ESS = 434348 - 282,750 = 151,598 is also
RSS = (Σxiyi)2 /Σxi2 =1942.12/24.88 = 151,598.

15.4. A. For 15 - 2 = 13 degrees of freedom, for 5% two-tailed test using the t-distribution, or a
95% confidence interval, we want ±2.160 standard deviations.
^
β ± t.975 sβ^ = 78.06 ± (2.160)(29.57) = (14.2, 141.9).
Comment: Similar to 4, 11/01, Q.5.
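A short Python sketch (assuming numpy and scipy are available) reproducing 15.2 through 15.4 from the given sums; the variable names are mine.

    import numpy as np
    from scipy import stats

    # Summary statistics from problems 15.2-15.4.
    n = 15
    sum_xy = 1942.1           # sum of (Xi - Xbar)(Yi - Ybar)
    sum_x2 = 24.88            # sum of (Xi - Xbar)^2
    ess = 282750.0            # error sum of squares

    beta = sum_xy / sum_x2                    # about 78.06
    s2 = ess / (n - 2)                        # 21,750
    se_beta = np.sqrt(s2 / sum_x2)            # about 29.57
    print(beta / se_beta)                     # t-statistic, about 2.64, with 13 d.f.
    t_crit = stats.t.ppf(0.975, n - 2)        # 2.160 for a 95% confidence interval
    print(beta - t_crit * se_beta, beta + t_crit * se_beta)   # roughly (14.2, 141.9)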

15.5. B. ESS = TSS - RSS = 44 - 9 = 35.


F = (RSS/1)/(ESS/(N - k)) = 9/(35/(27 - 2)) = 6.429.
t -stat = √F = √6.429 = 2.536, with 27 - 2 = 25 degrees of freedom.
For 25 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are:
10% 5% 2% 1%
1.708 2.060 2.485 2.787
Since 2.485 < 2.536 < 2.787, reject H0 at 2% and do not reject H0 at 1%.

15.6. B. X = (1 + 2 + 3 + 4 + 5)/5 = 3. xi = Xi - X .
Y = (82 + 78 + 80 + 73 + 77)/5 = 78. yi = Yi - Y .
X Y x y xy x2
1 82 -2 4 -8 4
2 78 -1 0 0 1
3 80 0 2 0 0
4 73 1 -5 -5 1
5 77 2 -1 -2 4
β^ = Σxiyi/Σxi2 = -15/10 = -1.5.
α^ = Y - β^ X = 78 - (-1.5)(3) = 82.5.
Forecast for year 7 is: α^ + 7β^ = 82.5 + (-1.5)(7) = 72.

15.7. D. ΣXi2 = 1 + 4 + 9 + 16 + 25 = 55.


^ ^ε ^
Xi Yi Yi i = Yi - Yi
1 82 81 1
2 78 79.5 -1.5
3 80 78 2
4 73 76.5 -3.5
5 77 75 2
s2 = Σε^i2/(N - 2) = (1 + 2.25 + 4 + 12.25 + 4)/(5 - 2) = 23.5/3 = 7.833.

Var[ α^ ] = s2ΣXi2 /(NΣxi2) = (7.833)(55)/((5)(10)) = 8.616.

15.8. B. Var[β^] = s2/Σxi2 = 7.833/10 = .7833. sβ^ = √.7833 = .885.
t = β^/sβ^ = -1.5/.885 = -1.7. |t| = 1.7.

15.9. E. There are 5 -2 = 3 degrees of freedom. The critical value of the two sided t-test at
10% is 2.353. 1.7 < 2.353, so the p-value is greater than 10%
Comment: At a 10% significance level, we can not reject the hypothesis that β = 0.
Using a computer, the p-value is 18.9%.

^
15.10. A. Cov[ α^ , β] = -s2 X /Σxi2 = -(7.833)(3)/10 = -2.35.

^ ^ ^
15.11. A. Corr[ α^ , β] = Cov[ α^ , β]/ √Var[ α^ ]Var[ β] = -2.35/√ ((8.616)(.7833)) = -.905.
^
Alternately, E[X2] = 55/5 = 11. Corr[ α^ , β] = - X /√(E[X2] ) = -3/√11 = -.905.

15.12. E. Let h(α, β) = Y7 = α + 7β. ∂h/∂α = 1 and ∂h/∂β = 7. The gradient vector is: (1, 7).
Therefore, the variance of the forecast is:
(transpose of gradient vector)(Variance-Covariance matrix)(gradient vector) =
(1 7) ( 8.616  -2.35 ) (1) = (1 7) (-7.834) = 14.10.
      (-2.35    .7833) (7)         ( 3.133)
The standard deviation of the forecast is: √14.10 = 3.75.
Alternately, Var[Y^7] = Var[α^ + 7β^] = Var[α^] + Var[7β^] + 2Cov[α^, 7β^] =
Var[α^] + 49Var[β^] + 14Cov[α^, β^] = 8.616 + (49)(.7833) + (14)(-2.35) = 14.10.
√14.10 = 3.75.
Comment: See 4, 5/01, Q.25 and “Mahler’s Guide to Fitting Loss Distributions.”
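The quadratic form above is easy to check numerically; a minimal Python sketch (assuming numpy is available):

    import numpy as np

    # Variance of the forecast alpha-hat + 7*beta-hat, using the covariance matrix from 15.7-15.11.
    cov = np.array([[8.616, -2.35],
                    [-2.35, 0.7833]])   # Var/Cov of (alpha-hat, beta-hat)
    g = np.array([1.0, 7.0])            # gradient of alpha + 7*beta with respect to (alpha, beta)

    var_forecast = g @ cov @ g          # about 14.1
    print(var_forecast, np.sqrt(var_forecast))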

^ ^ ^
15.13. C. Yt = α^ + t β ≅ t β, for very large t.
^ ^ ^ ^
Var[ Yt ] = Var[ α^ + t β] = Var[ α^ ] + Var[t β] + 2Cov[ α^ , t β] =
^ ^ ^
Var[ α^ ] + t2Var[ β] + 2tCov[ α^ , β] ≅ t2Var[ β], for very large t.
^ ^ ^
Coefficient of variation of the forecast for year t ≅ √(t2Var[ β] ) / t β = sβ^ / β= .885/-1.5 = -.59.
^ ^
Comment: In general, as t → ∞, the limit of the coefficient of variation of Yt is sβ^ / β,
the inverse of the t-statistic used for testing the hypothesis β = 0.
Even if one would use this regression to predict loss ratios 2 years beyond the most recent
observation, an actuary would be unlikely to use it to extrapolate out 10, 100 or even 1000
years! The forecast for year 1000 is -1417.5, not a possible loss ratio! So while this has
mathematical validity it has no actuarial meaning.

^ ^
15.14. β is measuring ∆Y/∆X, so multiplying each of the X values by 12, divides β by 12.
When a variable is divided by a constant, so is its standard deviation.
^
Therefore, sβ^ is divided by 12. t = β/ sβ^ is unaffected.
Comment: In general, t-tests and F-tests are unaffected by changes in scale.

15.15. A. For the 2-variable model, F = (N - 2)R2/(1 - R2) = (15 - 2)(.45)/(1 - .45) = 10.636.
t -stat = √F = √10.636 = 3.261 with 15 - 2 = 13 degrees of freedom.
For 13 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are:
10% 5% 2% 1%
1.771 2.160 2.650 3.012
Since 3.012 < 3.261, reject H0 at 1%.
Alternately, the critical value at 1% for the F-Stat with 1 and 13 degrees of freedom is 9.07.
Since 9.07 < 10.636, reject H0 at 1%.

15.16. C. Since 100 is not in the 95% confidence interval we reject at 5%.
Since 100 is in the 98% confidence interval we do not reject at 2%.

15.17. C. Since we have 300 observations, and 300 - 2 = 298 degrees of freedom, the
critical values in the t-table are those for the Normal Distribution.
Since H1: β < 0, we perform a one-sided test.
t = -2.74/1.30 = -2.108. 1.960 < 2.108 < 2.326.
Reject H 0 at 2.5%, do not reject H0 at 1%.
Alternately, using the Normal Table, for a one-sided test,
p-value = Φ[-2.108] = 1 - .9824 = 1.76%. Reject H0 at 2.5%, do not reject H0 at 1%.

^
15.18. D. t = β / sβ^ = -1.27/.57 = -2.23. F = t2 = 4.97.

^ ^
15.19. β is measuring ∆Y/∆X, so multiplying each of the Y values by 1.3, multiplies β by 1.3.
When a variable is multiplied by a constant, so is its standard deviation.
^
Therefore, sβ^ is multiplied by 1.3. t = β/ sβ^ is unaffected.
Comment: In general, t-tests and F-tests are unaffected by changes in scale.

15.20. E. 1 - adjusted R2 = (1 - R2)(N-1)/(N-k) ⇒ 1 - .72 = (1 - R2)(14)/(13) ⇒ R2 = .74.
F = {RSS/(k-1)}/{ESS/(N-k)} = {TSS R2/(k-1)}/{TSS(1 - R2)/(N-k)} =
{R2/(1 - R2)}{(N-k)/(k-1)} = (.74/.26)(13/1) = 37.
Comment: Similar to 4, 5/00, Q.1.
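The last step is just arithmetic; as a tiny Python check:

    # F-statistic from R^2 for problem 15.20: N = 15, k = 2.
    n, k, r2 = 15, 2, 0.74
    print((r2 / (1 - r2)) * (n - k) / (k - 1))   # about 37, with 1 and 13 degrees of freedom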

^
15.21. B. β = Σxiyi/Σxi2 = Σ(Xi - X )(Yi - Y ) /Σ(Xi - X )2 = 302.1/42.65 = 7.083.

ESS ≡ Σ ^εi 2 = Σ(Yi - Y^ i )2 = 7502. s2 = ESS/(N - k) = 7502/(25 - 2) = 326.2.


sβ^2 = s2 / Σxi2 = 326.2/42.65 = 7.648. sβ^ = √7.648 = 2.766.
^
t = β / sβ^ = 7.083/2.766 = 2.561. For 25 - 2 = 23 degrees of freedom, for a two-tailed test using
the t-distribution, the critical values are:
10% 5% 2% 1%
1.714 2.069 2.500 2.807
Since 2.500 < 2.561 < 2.807, we reject H0 at 2% and do not reject H0 at 1%.

15.22. D. ESS = TSS - RSS = 54 - 7 = 47. F = (RSS/1)/(ESS/(N - k)) = 7/(47/(47 - 2)) = 6.70.
t -stat = √F = √6.70 = 2.59.
^ ^ ^ ^ ^ ^
Alternately, Y - Y = α^ + βX - Y = Y - β X + βX - Y = β(X - X ) = βx.
^ ^ ^
RSS = Σ( Y - Y )2 = Σ( βx)2 = β 2 Σx2. ⇒ 7 = (1.22)Σx2. ⇒ Σx2 = 4.861.
s2 = ESS/ (N - k) = (TSS - RSS)/(47 - 2) = 47/45 = 1.044.
^
Var[ β] = s2/Σx2 = 1.044/4.861 = .2148. sβ^ = √.2148 = .463.
^
t -stat = β / sβ^ = 1.2/.463 = 2.59.

15.23. A. 2.088 = F = (RSS/1)/(ESS/(7 - 2)) = RSS/(218.680/5). ⇒


RSS = (2.088)(43.736) = 91.32.
R2 = RSS/TSS = 91.32/(91.32 + 218.680) = .295.
Alternately, 2.088 = F = (N - 2)R2/(1 - R2) = (7 - 2)R2/(1 - R2), ⇒ R2 = .295.

15.24. B. F = {RSS/(k-1)}/{ESS/(N-k)} = {TSS R2/(k-1)}/{TSS(1 - R2)/(N-k)} =


{R2/(1 - R2)}{(N-k)/(k-1)} = (.64/.36)(18/1) = 32.
Comment: This F-Statistic has 1 and 18 degrees of freedom. Applying the F-Test in this case
would be the same as applying the t-test to the slope coefficient.
The t-statistic would be √32 = 5.66, with 18 degrees of freedom.
15.25. (i) All of the observations other than (9, 330), appear to fall approximately on a straight
line. The observation (9, 330) does not appear to lie on this straight line.
[Scatter plot of amount against duration.]
(9, 330) is an “outlier” from the general linear pattern.
(ii) Adding the fitted line, y = 22.4 + 25.4x, to the scatterplot:
[Scatter plot of amount against duration, with the fitted line y = 22.4 + 25.4x.]
(ii) (a) Excluding the observation (9, 330): Σ xi = 60 - 9 = 51, Σ xi2 = 402 - 81 = 321,
Σ yi = 1795 - 330 = 1465, Σ yi2 = 343,725 - 3302 = 234,825, Σ xiyi = 11,570 - (9)(330) = 8600.
β^ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2} =
{(11)(8600) - (51)(1465)}/{(11)(321) - 512} = 21.38.
α^ = Y - β^ X = 1465/11 - (21.38)(51/11) = 34.06.
The fitted line is: y = 34.06 + 21.38 x.
(b) R2 = β^2 Σ(Xi - X)2/Σ(Yi - Y)2 = 21.382{ΣXi2 - (ΣXi)2/N}/{ΣYi2 - (ΣYi)2/N}
= (457.1)(321 - 512/11)/(234825 - 14652/11) = 0.973.
(c) The line fit to the 11 observations other than (9, 330) is shown as dashed:
[Scatter plot of amount against duration, with the line fitted excluding (9, 330) shown as dashed.]
(d) R2 has increased from 87.8% to 97.3%.
Removing the outlier, has resulted in a much better fit for the remaining data.
The fitted slope changed from 25.4 to 21.38.
(e) H0: β = 25 versus H1: β ≠ 25.
For the second regression, ESS = (1 - R2)TSS = (.027)(234825 - 14652/11) = 1072.
s2 = 1072/(11 - 2) = 119.
^
Var[ β] = s2/Σ(Xi - X )2 = 119/(321 - 512/11) = 1.408.
t = (21.38 - 25)/√1.408 = -3.05.
For a two-sided test, with 9 degrees of freedom, the 2% critical value is 2.821 and the 1%
critical value is 3.250. 2.821 < 3.05 < 3.250. Reject H0 at 2% and not at 1%.
15.26. (i) The scatterplot indicates that a linear relationship, with a negative slope, seems
appropriate.
[Scatter plot of concentration against post-mortem interval.]
There are two points with a much higher post-mortem interval than the other observations.
Care should be taken, as these two points might have a large impact on the regression results.
(ii) SXX = Σx2 - (Σx)2/N = 9854.5 - 3372/18 = 3545.1111.
SYY = Σy2 - (Σy)2/N = 109.7936 - 42.982/18 = 7.1669111.
SXY = Σxy - (Σx)(Σy)/N = 672.8 - (337)(42.98)/18 = -131.88111.
r = SXY/√(SXXSYY) = -131.88111/√{(3545.1111)(7.1669111)} = -0.827.
Test H0: ρ = 0 versus H1: ρ ≠ 0.
t = r√{(N - 2)/(1 - r2)} = (-0.827)√{(18 - 2)/(1 - 0.8272)} = -5.89.
If H0 is true, t has a t-distribution with n - 2 = 16 degrees of freedom.
The critical value for 1% is 2.921. Since 5.89 > 2.921, reject H0 at 1%.
(iii) β^ = SXY/SXX = -131.88111/3545.1111 = -0.0372.
α^ = Y - β^ X = 42.98/18 - (-0.0372)(337/18) = 3.084.
For 1 day (x = 24 hours): 3.084 - (0.0372)(24) = 2.19.
For 2 days (x = 48 hours): 3.084 - (0.0372)(48) = 1.30.
Even though 48 hours is within the range of observed x-values, one should be cautious about
the forecast for 48 hours, since there are only 2 observations with x more than 26 hours.
(iv) ESS = TSS - RSS = SYY - SXY2/SXX = 7.1669111 - (-131.88111)2/3545.1111 = 2.2608.
s2 = ESS/(N - 2) = 2.2608/(18 - 2) = 0.14130.
^
Var[ β] = s2/SXX = 0.14130/3545.1111 = 0.00003986.
For 18 - 2 = 16 degrees of freedom, the 1% critical value for the t-distribution is 2.921.
99% confidence interval for β:
-0.0372 ± 2.921√0.00003986 = -0.0372 ± 0.0184 = (-0.0556, -0.0188).
zero is not in this 99% confidence interval for β. ⇒ Reject H0: β = 0 at 1%.
This is consistent with the previous rejection at 1% of the hypothesis that the correlation is 0.
Comment: As shown in a solution to a problem in a previous section, for the linear regression
model, Corr[X, Y] = β/√{β2 + σ2/Var[X]}.
Therefore, in a linear regression, β = 0 if and only if ρ = 0.
The test of ρ = 0 in part (ii) is equivalent to an F-Test of whether the slope is zero.
If β = 0, then F = (N - 2)R2/(1 - R2), has an F-Distribution with 1 and N - 2 degrees of freedom.
Therefore, if β = 0, t = √F, has a t-Distribution with N - 2 degrees of freedom.
This test of correlation is discussed for example in Section 8.8 in Probability and Statistical
Inference by Hogg and Tanis, or Section 9.7 in Introduction to Mathematical Statistics by Hogg,
McKean, and Craig.
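The correlation test in part (ii) can be verified with a short Python sketch (assuming numpy and scipy are available), working from the given sums:

    import numpy as np
    from scipy import stats

    # Test of H0: rho = 0 for problem 15.26, from the summary sums.
    n = 18
    sxx = 9854.5 - 337**2 / n
    syy = 109.7936 - 42.98**2 / n
    sxy = 672.8 - 337 * 42.98 / n

    r = sxy / np.sqrt(sxx * syy)                 # about -0.827
    t = r * np.sqrt((n - 2) / (1 - r**2))        # about -5.89, with 16 d.f.
    p_two_sided = 2 * stats.t.sf(abs(t), n - 2)
    print(r, t, p_two_sided)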

16.1. D. The p-value = Prob[Type I error] = Prob[rejecting H0 when it is true]. ⇒


Statement D is false.
For 7 degrees of freedom, the 2% critical value is 2.998. 3 > 2.998 ⇒ Statement B is true.

^
16.2. to 16.4. Compute t = β/ sβ^ . t has N - 2 degrees of freedom.
For 15 observations, t has 13 degrees of freedom and the 5% critical value is 2.160.
Critical region is when we reject H0, which is when |t| ≥ 2.160.
For 30 observations, t has 28 degrees of freedom and the 5% critical value is 2.048.
Critical region is when we reject H0, which is when |t| ≥ 2.048.
For a given significance level, the more data, the more powerful the test. With 30 observations
the probability of making a Type II error is less than with only 15 observations.
Comment: The probability of a Type I error, rejecting H0 when it is true, is the significance level
of the test.

16.5. X = 810/5 = 162. S2 = (5/4)(131726/5 - 1622) = 126.5.


t = (162 - 175)/√(126.5/5) = -2.585, with 4 degrees of freedom. Perform a 2-sided test.
Since 2.132 < 2.585 < 2.776, reject H0 at 10% and do not reject at 5%.
Kirk should have verified that these Slubs were fully grown adults (assuming Slubs do not
continue to grow their whole life, as do many reptiles.)
Kirk should have verified that these Slubs were males (assuming Slubs have genders, one of
which corresponds to the human concept of male.)
The test assumes a random sample of 5 out of the entire set of adult male Slubs.
However, these five Slubs may have been related, for example they might have been brothers.
They may have been the equivalent of a group of jockeys, who are shorter than the average
population.
The heights of Slubs in the capital might not be a representative sample of the heights over the
whole planet.
Finally, it would have been preferable to collect a larger (random) sample of data, improving
the power of such a test.
Comment: I have listed a number of possible problems with Kirk’s test. There may be others.
16.6. A. 1. True. This is the definition of the significance level.
2. False. One should not reject the null hypothesis (at the given level of significance) when the
test statistic falls outside of the critical region. 3. False. The fact that the test criteria is not
significant merely tells us that the data do not contradict the null hypothesis, rather than
proving that H0 is true.

16.7. A. Statement 1 is true.


A Type II error occurs if H0 is not rejected when it is false. ⇒ Statement 2 is false.
Depending on the situation being modeled, either type of error can be worse.
Comment: From a purely statistical point of view, one wants to avoid both types of errors, and
neither is inherently worse. However, for a given sample size, decreasing the probability of
one type of error increases the probability of the other type of error.

18.1. A. The means of X2 and X3 are 6 and 9. x2 = -4, -1, 1, 4. x3 = -5, -1, 1, 5.
Σx2i2 = (-4)2 + (-1)2 + 12 + 42 = 34. Σx3i2 = (-5)2 + (-1)2 + (1)2 + 52 = 52.
Σx2ix3i = (-4)(-5) + (-1)(-1) + (1)(1) + (4)(5) = 42.
^
β 2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}
= {Σx2iyi (52) - (Σx3iyi)(42)}/{(34)(52) - (42)2} = 13Σx2iyi - 10.5Σx3iyi.
^
We are given β 2 = Σ wiYi . Thus wi = 13x2i - 10.5x3i.
(w1, w2, w3, w4) = (13)(-4, -1, 1, 4) - (10.5)(-5, -1, 1, 5) = (0.5, -2.5, 2.5, -0.5).
Alternately, using the matrix formulas for multiple regression, β^ = (X’X)-1X’Y:
    (1  2  4)          (4   24   36 )             (22.75   16.5  -13.5)
X = (1  5  8)   X’X =  (24  178  258)   (X’X)-1 = (16.5    13    -10.5)
    (1  7 10)          (36  258  376)             (-13.5  -10.5    8.5)
    (1 10 14)
            ( 1.75  -2.75   3.25  -1.25)
(X’X)-1X’ = ( 0.5   -2.5    2.5   -0.5 )
            (-0.5    2     -2      0.5 )
β^2 = ΣwiYi, with wi the elements of the 2nd row of (X’X)-1X’: (0.5, -2.5, 2.5, -0.5).
Comment: Similar to 4, 11/01, Q.13. Note that Σ wi = 0. This is generally true for the slope
parameters, but not the intercept parameter. For the intercept parameter,
Σ wi = 1.75 - 2.75 + 3.25 - 1.25 = 1. The first column of the design matrix, X, is all ones.
Therefore, the first column of ((X’X)-1X’)X consists of the sums of the rows of ((X’X)-1X’), i.e. the
sums of the different sets of w’s. However ((X’X)-1X’)X = the identity matrix, whose first column
has a one followed by zeros. Therefore, the sum of the w’s for the intercept is 1, and the sum of
the w’s for a slope parameter is 0.

18.2. B. Y = 13. y = (-3, -5, 1, 7). Σx2iyi = 46. Σx3iyi = 56.


^
β 3 = {Σx3iyi Σx2i2 - Σx2iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2} =
{(56)(34) - (46)(42)}/{(34)(52) - 422} = -7.
Alternately, using the matrix formulas for multiple regression:
    (1  2  4)          (4   24   36 )             (22.75   16.5  -13.5)
X = (1  5  8)   X’X =  (24  178  258)   (X’X)-1 = (16.5    13    -10.5)
    (1  7 10)          (36  258  376)             (-13.5  -10.5    8.5)
    (1 10 14)
    (10)          (52 )                      (16)
Y = ( 8)   X’Y =  (358)   β^ = (X’X)-1X’Y =  (10)
    (14)          (524)                      (-7)
    (20)

18.3. E. From a previous solution, β^2 = .5Y1 - 2.5Y2 + 2.5Y3 - .5Y4 = 10.
β^1 = Y - β^2 X2 - β^3 X3 = 13 - (10)(6) - (-7)(9) = 16.
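Problems 18.1 to 18.3 can be checked by solving the normal equations directly; a minimal Python sketch (assuming numpy is available):

    import numpy as np

    # Data from problems 18.1-18.3: X2 = (2, 5, 7, 10), X3 = (4, 8, 10, 14), Y = (10, 8, 14, 20).
    X = np.array([[1, 2, 4],
                  [1, 5, 8],
                  [1, 7, 10],
                  [1, 10, 14]], dtype=float)
    Y = np.array([10, 8, 14, 20], dtype=float)

    beta = np.linalg.solve(X.T @ X, X.T @ Y)   # least squares via the normal equations
    print(beta)                                 # (16, 10, -7): intercept, then the two slopes

    # The weights applied to Y for each coefficient are the rows of (X'X)^-1 X'.
    W = np.linalg.inv(X.T @ X) @ X.T
    print(W[1])                                 # (0.5, -2.5, 2.5, -0.5), as in problem 18.1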

18.4. A., 18.5. C., 18.6. D., 18.7. E.
s2 = ESS/(N - k) = 516727/(32 - 3) = 17818.
rX2X3 = Σx2ix3i/√(Σx2i2Σx3i2) = -612/√{(23,266)(250)} = -.254.
Var[β^2] = s2/{(1 - rX2X32)Σx2i2} = (17818)/{(1 - .2542)(23,266)} = .819.
Var[β^3] = s2/{(1 - rX2X32)Σx3i2} = (17818)/{(1 - .2542)(250)} = 76.2.
Cov[β^2, β^3] = -rX2X3 s2/{(1 - rX2X32)√(Σx2i2Σx3i2)} =
-(-.254)(17818)/{(1 - .2542)√{(23,266)(250)}} = 2.01.
Var[β^2 + β^3] = Var[β^2] + Var[β^3] + 2Cov[β^2, β^3] = .819 + 76.2 + (2)(2.01) = 81.0.
StdDev[β^2 + β^3] = √81.0 = 9.0.
Comment: Similar to 4, 5/00, Q.35.

18.8. B. Y = β1 + β2X2 + β3X3.


Squared error: β12 + (β1 + 100β3 - 30)2 + (β1 + 100β2 - 40)2 + (β1 + 100β2 + 100β3 - 80)2.
Set the partial derivative with respect to β1 equal to zero:
0 = 2{β1 + β1 + 100β3 - 30 + β1 + 100β2 - 40 + β1 + 100β2 + 100β3 - 80}.
⇒ 2β1 + 100β2 + 100β3 = 75.
Set the partial derivative with respect to β2 equal to zero:
0 = 2{100(β1 + 100β2 - 40) + 100(β1 + 100β2 + 100β3 - 80)}.
⇒ 2β1 + 200β2 + 100β3 = 120.
Set the partial derivative with respect to β3 equal to zero:
0 = 2{100(β1 + 100β3 - 30) + 100(β1 + 100β2 + 100β3 - 80)}.
⇒ 2β1 + 100β2 + 200β3 = 110.
From the first two equations: 100β2 = 120 - 75. ⇒ β2 = .45.
From the last two equations: 100β2 - 100β3 = 120 - 110. ⇒ β3 = β2 - .1 = .35.
⇒ β1 = 60 - 100β2 - 50β3 = -2.5.
^
Y = -2.5 + .45(hours at home) + .35(hours at library).
60 = -2.5 + .45(hours at home) + (.35)(75). ⇒ hours at home = 80.56.

18.9. A. The variables X2i and X3i have means of zero, so they are in deviations form.
The usual 3 variable regression would give β^1 = Y - β^2 X2 - β^3 X3 = Y.
If we were to rewrite Y in deviations form we would get β^1 = 0 and the same β^2 and β^3;
thus the given model with no intercept has the same variance of its estimated slopes as would
the usual 3 variable regression model.
Var[β^2] = s2/{Σx2i2 (1 - rX2X32)}.
We are given Var[β^2] = 4s2/3 and Σx2i2 = 1 ⇒ 4/3 = 1/(1 - rX2X32). ⇒ rX2X3 = ±0.50.
Both X2 and X3 have standard deviations of 1, therefore, the regression of X2 on X3 has slope
rX2X3. We are given this slope is negative, so that rX2X3 < 0. rX2X3 = -0.5.
Comment: In general, for a regression of Y on X, the slope is: β^ = Σxiyi/Σxi2 = rXY sY/sX.

18.10. A. X2 = X3 = 0. Σx2i2 = 4. Σx3i2 = 4. Σx2ix3i = 0. Y = 2.5. Σx2iyi = 0. Σx3iyi = 4.
β^2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i}/{Σx2i2 Σx3i2 - (Σx2ix3i)2} = {(0)(4) - (4)(0)}/{(4)(4) - 02} = 0.
Alternately, using the matrix form of regression:
    (1 -1 -1)          (4  0  0)             (1/4   0    0 )
X = (1  1 -1)   X’X =  (0  4  0)   (X’X)-1 = ( 0   1/4   0 )
    (1 -1  1)          (0  0  4)             ( 0    0   1/4)
    (1  1  1)
                  (1/4   0    0 ) (10)   (2.5)
β^ = (X’X)-1X’Y = ( 0   1/4   0 ) ( 0) = ( 0 )
                  ( 0    0   1/4) ( 4)   ( 1 )
β^1 = 2.5, β^2 = 0, and β^3 = 1.

18.11. C. Var[β^2] = s2/{(1 - rX2X32)Σx2i2} = 10/{(1 - .52)(4)} = 10/3 = 3.33.
Var[β^3] = s2/{(1 - rX2X32)Σx3i2} = 10/{(1 - .52)(8)} = 10/6 = 1.67.
Cov[β^2, β^3] = -rX2X3 s2/{(1 - rX2X32)√(Σx2i2Σx3i2)} = -(.5)(10)/{(1 - .52)√((4)(8))} = -1.1785.
Var[β^2 - β^3] = Var[β^2] + Var[β^3] - 2Cov[β^2, β^3] = 3.33 + 1.67 - (2)(-1.1785) = 7.357.
StdDev[β^2 - β^3] = √7.357 = 2.71.
Alternately, Σx2ix3i = rX2X3√(Σx2i2Σx3i2) = (.5)√((4)(8)) = 2√2 = 2.828. If the variables were
rewritten in deviations form, the slope coefficients are the same as the original model, and the
intercept is zero. The design matrix would be:
    (x2,1   x3,1 )
x = (x2,2   x3,2 )
    ( ...    ... )
    (x2,30  x3,30)
x’x = (Σx2i2    Σx2ix3i)   (4      2.828)
      (Σx2ix3i  Σx3i2  ) = (2.828  8    )
Variance-Covariance Matrix = s2(x’x)-1 = (10) ( 8      -2.828) / 24 = ( 3.33   -1.178)
                                              (-2.828   4    )        (-1.178   1.67 )
Var[β^2 - β^3] = Var[β^2] + Var[β^3] - 2Cov[β^2, β^3] = 3.33 + 1.67 - (2)(-1.178) = 7.36.
StdDev[β^2 - β^3] = √7.36 = 2.71.
Comment: Note that Corr[β^2, β^3] = -.5 = -rX2X3.
18.12. B. The mean of X2 and X3 are each zero, so they are already in deviations form.
Σx2i2 = (-3)2 + (-1)2 + 12 + 32 = 20. Σx3i2 = (-1)2 + 32 + (-3)2 + 12 = 20.
Σx2ix3i = (-3)(-1) + (-1)(3) + (1)(-3) + (3)(1) = 0.
^
β 3 = {Σx3iyi Σx2i2 - Σx2iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}
= {Σx3iyi (20) - (Σx2iyi)(0)}/{(20)(20) - (0)2} = Σx3iyi /20.
^
We are given β 3 = Σ wiYi . Thus wi = x3i /20.
(w1 , w2 , w3 , w4 ) = (-1, 3, -3, 1)/20 = (–0.05, 0.15, –0.15, 0.05).
Alternately, using the matrix formulas for multiple regression, β^ = (X’X)-1X’Y:
    (1 -3 -1)          (4   0   0)             (1/4    0     0  )
X = (1 -1  3)   X’X =  (0  20   0)   (X’X)-1 = ( 0    1/20   0  )
    (1  1 -3)          (0   0  20)             ( 0     0    1/20)
    (1  3  1)
            ( 1/4    1/4    1/4   1/4 )
(X’X)-1X’ = (-3/20  -1/20   1/20  3/20)
            (-1/20   3/20  -3/20  1/20)
β^3 = ΣwiYi, with wi the elements of the third row of (X’X)-1X’: (–0.05, 0.15, –0.15, 0.05).
Comment: Arithmetic simplifies a little due to the terms which turn out to be zero in this case.

18.13. A. Continuing the previous solution:


^
β 2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}
= {Σx2iyi (20) - (Σx3iyi)(0)}/{(20)(20) - (0)2} = Σx2iyi /20.
^
We are given β 2 = Σ wiYi . Thus wi = x2i /20.
(w1 , w2 , w3 , w4 ) = (-3, -1, 1, 3)/20 = (–0.15, -0.05, 0.05, 0.15).
^
Alternately, using the matrix formulas for multiple regression, β = (X’X)-1X’Y,
^
β 2 = Σ wiY i, with wi the elements of the 2nd row of (X’X)-1X’: (–0.15, -0.05, 0.05, 0.15).

19.1. β^ = (X’X)-1X’Y = ( 1.47027 )
                        ( 0.81449 )
                        ( 0.820444)
                        ( 13.5286 )
This is a four variable model, 3 independent variables plus an intercept.
β^1 = 1.470. β^2 = 0.814. β^3 = 0.820. β^4 = 13.53.

19.2. ESS = Y’Y - β^’X’Y = 73990.3 - 72986.7 = 1003.6.
s2 = ESS/(N - k) = 1003.6/(20 - 4) = 62.72.
Comment: Similar to Course 120 Sample Exam #3, Q.6.
19.3. s2(X’X)-1 = ( 33.0     .442    -0.0507  -23.3  )
                  ( .442     0.262   -0.0451  -0.0916)
                  (-0.0507  -0.0451   0.0446  -0.802 )
                  (-23.3    -0.0916  -0.802    43.4  )

19.4. The sum of the Y’s is the first element of X’Y: 1133.2. Y = 1133.2/20 = 56.66.
TSS = Y’Y - N Y2 = 73990.3 - (20)(56.662) = 9783.2.
R2 = 1 - ESS/TSS = 1 - 1003.6/9783.2 = .897.
Adjusted R2 = 1 - (1 - R2)(N-1)/(N - k) = 1 - (1 - .897)(20 - 1)/(20 - 4) = .878.

19.5. D. s2 = ESS/(N - k) = (4799790 - 4283063)/(32 - 3) = 17818.
Var[β̂] = s2(X’X)-1. Therefore, Var[β̂2] = (17818)(0.0000459) = .818.
Var[β̂3] = (17818)(0.00428) = 76.3. Cov[β̂2, β̂3] = (17818)(0.0001125) = 2.005.
Var[100β̂2 + 10β̂3] = 10000 Var[β̂2] + 100 Var[β̂3] + 2000 Cov[β̂2, β̂3] =
(10000)(.818) + (100)(76.3) + (2000)(2.005) = 19820.
StdDev[100β̂2 + 10β̂3] = √19820 = 141.
Comment: Similar to Course 120 Sample Exam #1, Q.4.
Comment: Similar to Course 120 Sample Exam #1, Q.4

1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .895)(20 - 1)/(20 - 4) = .125. R̄2 = .875.

19.6. For the one variable model (slope and no intercept), the design matrix has a single
column containing X. In other words, X is a column vector.
X’X = ΣXi2. (X’X)-1 = 1/ΣXi2. X’Y = ΣXiYi. β̂ = (X’X)-1X’Y = ΣXiYi/ΣXi2.

19.7. Since the regression passes through the point where all the variables are equal to their
means, Ȳ = 11 - 4X̄2 + 7X̄3 - 12X̄4 = 11 - (4)(148/25) + (7)(201/25) - (12)(82/25) = 4.24.
19.8. For the two variable model (slope and intercept), the design matrix has 1s in the first
column and Xi in the second column:
    (1  X1)
X = (1  X2)          X’ = ( 1   1   1  ...)
    (1  X3)               (X1  X2  X3  ...)
    (... ..)
      (N    ΣXi )             ( ΣXi2  -ΣXi)
X’X = (ΣXi  ΣXi2)   (X’X)-1 = (-ΣXi    N  ) /{N ΣXi2 - (ΣXi)2}.
X’Y = (ΣYi, ΣXiYi), a column vector.
β̂ = (X’X)-1X’Y = (ΣYiΣXi2 - ΣXiΣXiYi, NΣXiYi - ΣXiΣYi)/{N ΣXi2 - (ΣXi)2}.
The first component of this vector gives the equation for the fitted intercept, and the second
component gives the equation for the fitted slope of the two-variable model:
α̂ = {ΣYiΣXi2 - ΣXiΣXiYi}/{NΣXi2 - (ΣXi)2}.
β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2}.
Comment: The equations in deviations form can be derived from these equations, which are not in
deviations form.

19.9. C. and 19.10. B. Model with no intercept. Put it in matrix form.
    ( 2   4)        (10)          (178  258)          (358)
X = ( 5   8)    Y = ( 8)    X’X = (258  376)    X’Y = (524)
    ( 7  10)        (14)
    (10  14)        (20)
          ( 376  -258)                               ( 376  -258)
(X’X)-1 = (-258   178) /{(178)(376) - (258)(258)} =  (-258   178) /364.
                 ( 376  -258)(358)        (-1.6044)
β̂ = (X’X)-1X’Y = (-258   178)(524) /364 = ( 2.4945).
β̂2 = -1.6044. β̂3 = 2.4945.
19.11. E., 19.12. D., and 19.13. A.
Ŷ = -1.6044X2 + 2.4945X3 = (6.7692, 11.934, 13.7142, 18.879).
ε̂ = Y - Ŷ = (10, 8, 14, 20) - (6.7692, 11.934, 13.7142, 18.879) = (3.2308, -3.934, .2858, 1.121).
ESS = Σε̂i2 = 3.23082 + (-3.934)2 + .28582 + 1.1212 = 27.253.
s2 = ESS/(N - k) = 27.253/(4 - 2) = 13.626.
                            ( 376  -258)        (14.075  -9.658)
Var[β̂] = s2(X’X)-1 = 13.626 (-258   178) /364 = (-9.658   6.663).
Var[β̂2] = 14.075. Var[β̂3] = 6.663. Cov[β̂2, β̂3] = -9.658.
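The no-intercept fit and its variance-covariance matrix in 19.9-19.13 can be verified with a short numpy sketch (my own, not part of the original solution; the data are those given in the problem):

```python
import numpy as np

# Data from the problem: two regressors, no intercept.
X = np.array([[2, 4], [5, 8], [7, 10], [10, 14]], dtype=float)
Y = np.array([10, 8, 14, 20], dtype=float)

beta = np.linalg.solve(X.T @ X, X.T @ Y)    # about [-1.6044, 2.4945]
resid = Y - X @ beta
ess = resid @ resid                         # about 27.25
s2 = ess / (len(Y) - X.shape[1])            # about 13.63
vcov = s2 * np.linalg.inv(X.T @ X)          # Var[b2]~14.08, Var[b3]~6.66, Cov~-9.66
print(beta, ess, s2)
print(vcov)
```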

19.14. The design matrix X has a first column of ones and a second column equal to the Xi.
       (N    ΣXi )
X’X =  (ΣXi  ΣXi2)
          ( ΣXi2  -ΣXi)                        ( ΣXi2  -ΣXi)             (E[X2]  -X̄)
(X’X)-1 = (-ΣXi    N  ) /{N ΣXi2 - (ΣXi)2} =   (-ΣXi    N  ) /{N Σxi2} = ( -X̄      1) /Σxi2.
             (E[X2] - X1X̄    X1 - X̄)
X (X’X)-1 =  (E[X2] - X2X̄    X2 - X̄) /Σxi2.
             (    ...          ... )
                  (E[X2] - 2X1X̄ + X12           E[X2] - X1X̄ - X2X̄ + X1X2    ...)
H = X (X’X)-1X’ = (E[X2] - X1X̄ - X2X̄ + X1X2     E[X2] - 2X2X̄ + X22          ...) /Σxi2.
                  (        ...                          ...                  ...)
Hij = (E[X2] - XiX̄ - XjX̄ + XiXj) / Σxi2.
(I - H)σ2 = {δij - (E[X2] - XiX̄ - XjX̄ + XiXj) / Σxi2}σ2.
Var[ε̂i] = σ2{1 - (E[X2] - 2XiX̄ + Xi2) / Σxi2}.
Cov[ε̂i, ε̂j] = -σ2(E[X2] - XiX̄ - XjX̄ + XiXj) / Σxi2.
Comment: The hat matrix H is an N by N matrix. (Here E[X2] = ΣXi2/N, and xi = Xi - X̄.)
1 1 1 1 1
19.15. Transpose of the design matrix, X’ =  .
1 2 3 4 5

 5 15   55 -15
X’X =   . (X’X)-1 =   / 10 .
 15 55   -15 5 

 8 −2 6 4 2 0 −2
   
 5 −1 4 3 2 1 0
X (X’X)-1 =  2 0  /10. H = X (X’X)-1X’ = 2 2 2 2 2  / 10 .
   
 −1 1  0 1 2 3 4
   
 −4 2   −2 0 2 4 6

 4 -4 -2 0 2 
 
 -4 7 -2 -1 0 
(I - H)σ2 = σ2  -2 -2 8 -2 -2 / 10 .
 
 0 -1 -2 7 -4
 
 2 0 -2 -4 4 
Comment: Var[ ε^ 1] = 0.4σ2. Var[ ε^ 2] = 0.7σ2. Var[ ε^ 3] = 0.8σ2. Var[ ε^ 4] = 0.7σ2. Var[ ε^ 5] = 0.4σ2.
E[ESS] = ΣE[ ε^ i2] = ΣVar[ ε^ i] = (.4 + .7 + .8 + .7 + .4)σ2 = 3σ2 = (5 - 2)σ2 = (N - k)σ2.
 1 -.76 -.35 0 .5 
 
 -.76 1 -.26 -.14 0 
The correlation matrix of the residuals is:  -.35 -.26 1 -.26 -.35 .
 
 0 -.14 -.26 1 -.76
 
 .5 0 -.35 -.76 1 
Note that Corr[ ε^ 1 , ε^ 5] = 0.5 > 0. For observations with X values near opposite extremes, the
corresponding residuals may be positively correlated.
It can be shown that if Xi = i, i = 1, 2, ... N, then
Var[ ε^ 1] = σ2(N - 1)(N - 2)/{N(N+1)} = Var[ε^ N],
Cov[ ε^ 1 , ε^ N] = σ22(N - 2)/{N(N+1)}, and Corr[ε^ 1 , ε^ N] = 2/(N - 1).
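A minimal numpy sketch (mine, not in the original) reproducing the hat matrix, the residual covariance matrix (I - H)σ2, and the residual correlations for X = 1, ..., 5:

```python
import numpy as np

X = np.column_stack([np.ones(5), np.arange(1, 6)])   # intercept and X = 1..5
H = X @ np.linalg.inv(X.T @ X) @ X.T                 # hat matrix, 5 x 5
M = np.eye(5) - H                                    # (I - H); Cov[residuals] = sigma^2 * M

var_resid = np.diag(M)                               # [0.4, 0.7, 0.8, 0.7, 0.4]
corr = M / np.sqrt(np.outer(var_resid, var_resid))   # residual correlation matrix
print(np.round(10 * M, 6))                           # matches the (I - H) matrix times 10 above
print(np.round(corr, 2))                             # e.g. Corr[e1, e5] = 0.5
```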

19.16. C. Var[β̂] = s2(X’X)-1. Var[β̂2] = s2(.0087). Var[β̂3] = s2(.0087).
Cov[β̂2, β̂3] = s2(-.0020).
Var[β̂2 - β̂3] = Var[β̂2] + Var[β̂3] - 2Cov[β̂2, β̂3] = .0214s2 = (.0214)(280.1167) = 5.994.
The estimated standard error of β̂2 - β̂3 is: √5.994 = 2.45.
Comment: Var[β̂2] is s2 times the 2,2 element of (X’X)-1.
Cov[β̂2, β̂3] is s2 times the 2,3 element of (X’X)-1.

19.17. E. Ŷ = Xβ̂ = X(X’X)-1X’Y.
Ŷ’Ŷ = {X(X’X)-1X’Y}’X(X’X)-1X’Y = Y’X(X’X)-1X’X(X’X)-1X’Y = Y’X(X’X)-1X’Y = Y’Xβ̂.
Ŷ’Y = {Xβ̂}’Y = β̂‘(X’Y) = (5.22, 1.62, 0.21, -0.45)(261.5, 4041.5, 6177.5, 5707.0) = 6641.4.
Y’Ŷ = (Ŷ’Y)’ = (β̂‘X’Y)’ = Y’Xβ̂ = Ŷ’Ŷ.
ESS = Σε̂i2 = (Y - Ŷ)’(Y - Ŷ) = Y’Y - Ŷ’Y - Y’Ŷ + Ŷ’Ŷ = Y’Y - β̂‘(X’Y) = 7995 - 6641.4 = 1353.6.
s2 = ESS/(N - k) = 1353.6/(30 - 4) = 52.1.
Var[β̂3] = s2(3,3 element of (X’X)-1) = (52.1)(.0035) = .182. sβ̂3 = √.182 = .427.
For the t-distribution, for 26 degrees of freedom, a 95% confidence interval is ±2.056 standard
deviations. This has width: (2)(2.056)sβ̂3 = (2)(2.056)(.427) = 1.76.
Comment: Difficult.

19.18. C. s2 = ESS/(N - k) = 282.82/(15 - 4) = 25.71. The Covariance Matrix is s2(X’X)-1.
Var[β̂2] = (25.71)(.03) = .7713. Var[β̂3] = (25.71)(2.14) = 55.02.
Cov[β̂2, β̂3] = (25.71)(.11) = 2.828.
Var[β̂3 - β̂2] = Var[β̂3] + Var[β̂2] - 2 Cov[β̂2, β̂3] = 55.02 + .7713 - (2)(2.828) = 50.14.
StdErr[β̂3 - β̂2] = √50.14 = 7.08.
Comment: Var[β̂3 - β̂4] = Var[β̂3] + Var[β̂4] - 2 Cov[β̂3, β̂4] = 55.02 + 111.07 - (2)(-64.79) =
295.67. StdErr[β̂3 - β̂4] = √295.67 = 17.2.

19.19. D. Y = β1 + 500β2 + β3 for a private hospital with 500 beds, while Y = β1 + 500β2 for a
public hospital with 500 beds. The difference is β3. β̂3 = 28. Var[β̂3] = 38.8423.
For 393 - 3 = 390 degrees of freedom, the t-distribution for a total of 5% probability in both tails
has a critical value of 1.960, the same as the Normal Distribution.
28 ± 1.960√38.8423 = 28 ± 12.2 = (15.8, 40.2).
Comment: The adjustment that was made for heteroscedasticity, similar to 4, 11/00, Q. 31,
affects the fitted parameters and the estimated covariance matrix. However, once we have the
fitted parameters and estimated covariance matrix, the fact that such an adjustment was made
can be ignored for purposes of answering the question that was asked.

19.20. E. Y = β1 + 300β2 + β3 for a private hospital with 300 beds, while Y = β1 + 400β2 for a
public hospital with 400 beds. The difference is β3 - 100β2. β̂2 = 3.1. β̂3 = 28.
β̂3 - 100β̂2 = -282.
Var[β̂3] = 38.8423. Var[β̂2] = .0035. Cov[β̂2, β̂3] = 0.0357.
Var[β̂3 - 100β̂2] = Var[β̂3] + 10000Var[β̂2] - 200Cov[β̂2, β̂3] = 66.7023.
For 393 - 3 = 390 degrees of freedom, the t-distribution for a total of 1% probability in both tails
has a critical value of 2.576, the same as the Normal Distribution.
-282 ± 2.576√66.7023 = -282 ± 21.0 = (-303.0, -261.0).
Comment: For some reason the exam question listed β2 before β1!
The same order was used when giving the variances and covariances.
Therefore, the first row and first column of the given matrix refer to β̂2.

19.21. A. Var[β̂1 + 600β̂2] = Var[β̂1] + 6002Var[β̂2] + (2)(600)Cov[β̂1, β̂2] =
1.89952 + (360,000)(.00001) + (1200)(-.00364) = 1.13152.
The standard error of β̂1 + 600β̂2 is: √1.13152 = 1.06373.

20.1. Degrees of Freedom = N - k = 30 - 4 = 26. t = -4.421/2.203 = -2.007.


Since 1.706 < 2.007 < 2.056, reject at 10% and do not reject at 5%.

20.2. Degrees of Freedom = N - k = 36 - 6 = 30. t = (3.13 - 1)/.816 = 2.610.


Since 2.457 < 2.610 < 2.750, reject at 2% and do not reject at 1%.

20.3. ESS = TSS - RSS = 3,600,196.


F = {RSS/(k-1)}/{ESS/(N - k)} = {5,018,232/(3-1)}/{3,600,196/(15 - 3)} = 8.363.
For 2 and 12 degrees of freedom, the critical values are: 3.88 at 5% and 6.93 at 1%.
Since 8.363 > 6.93, we reject the hypothesis at 1%.
Comment: Using a computer, the p-value is .53%.
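A short scipy sketch (my own, not from the guide) that reproduces 20.3's F-statistic, the tabled critical values, and the p-value quoted in the comment:

```python
import scipy.stats as st

# Inputs from 20.3: RSS, ESS = TSS - RSS, k = 3 (including the intercept), N = 15.
rss, ess, k, n = 5_018_232.0, 3_600_196.0, 3, 15
F = (rss / (k - 1)) / (ess / (n - k))      # about 8.36

df1, df2 = k - 1, n - k
crit_5 = st.f.ppf(0.95, df1, df2)          # about 3.89
crit_1 = st.f.ppf(0.99, df1, df2)          # about 6.93
p_value = st.f.sf(F, df1, df2)             # about 0.005
print(F, crit_5, crit_1, p_value)
```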

20.4. D. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .912)(23 - 1)/(23 - 5) = .108. R̄2 = .892.

20.5. A. F = {RSS/(k-1)}/{ESS/(N-k)} = {(N-k)/(k-1)}R2TSS/{(1-R2)TSS} =
{(N-k)/(k-1)}R2/(1-R2) = {(23 - 5)/(5 - 1)}(.912)/(1 - .912) = 46.64.
Comment: If for example TSS = 1000, then R2 ≡ RSS/TSS, so that RSS = 912.
ESS = TSS - RSS = 88. F = {RSS/(k-1)}/{ESS/(N-k)} = (912/4)/(88/18) = 228/4.889 = 46.64.

20.6. D. F = (explained variance)/(unexplained variance).


Comment: Statement A is R2.
20.7. E. The definition of the F-Distribution involves the ratio of two independent
Chi-Square Distributions. In the derivation of this result, in order to get Chi-Square
Distributions, the errors have to be independent Normal Distributions with mean zero and the
same variance. Then F follows an F-Distribution, provided H0: all the slopes are zero, is true.
Comment: If there were multicollinearity, then the effective value of k would be smaller than the
number of variables actually used.

20.8. C. sβ̂1 = √426076 = 652.7. t = (β̂1 - 1500)/sβ̂1 = (-20.352 - 1500)/652.7 = -2.329.
For 15 - 3 = 12 degrees of freedom, the critical values for 5% and 2% are 2.179 and 2.681.
Since 2.179 < 2.329 < 2.681, we reject at 5% and do not reject the hypothesis at 2%.

20.9. E. sβ̂2 = √58.85 = 7.671. t = β̂2/sβ̂2 = 13.3504/7.671 = 1.740.
For 15 - 3 = 12 degrees of freedom, the critical value for 10% is 1.782.
Since 1.740 < 1.782, we do not reject the hypothesis at 10%.

20.10. A. sβ̂3 = √4034 = 63.51. t = β̂3/sβ̂3 = 243.714/63.51 = 3.837.
For 15 - 3 = 12 degrees of freedom, the critical value for 1% is 3.055.
Since 3.837 > 3.055, we reject the hypothesis at 1%.

20.11. E. Var[β̂1 + 50β̂2 + 10β̂3] =
Var[β̂1] + 2500Var[β̂2] + 100Var[β̂3] + 100Cov[β̂1, β̂2] + 20Cov[β̂1, β̂3] + 1000Cov[β̂2, β̂3] =
426076 + (2500)(58.85) + (100)(4034) + (100)(-2435) + (20)(-36703) + (1000)(41.99) = 41031.
β̂1 + 50β̂2 + 10β̂3 = -20.352 + (50)(13.3504) + (10)(243.714) = 3084.
For the t-distribution with 15 - 3 = 12 degrees of freedom, the critical value for 5% is 2.179.
Thus a 95% confidence interval is: 3084 ± (2.179)√41031 = 3084 ± 441 = (2643, 3525).

20.12. A. For testing β = 0, t = β̂/sβ̂. ⇒ sβ̂ = -1.9/(-2.70) = 0.7037.
Var[α̂] = Var[β̂] ΣXi2/N = (0.70372)(1018/16) = 31.507.
For testing α = 10, t = (27 - 10)/√31.507 = 3.029.
For 16 - 2 = 14 degrees of freedom, for a 2-sided test 2.977 is the 1% critical value.
3.029 > 2.977. Reject H0 at 1%.

20.13. D. t = 88/49 = 1.796, for 50 - 3 = 47 degrees of freedom.


As shown in the t-table for 47 degrees of freedom, the 10% and 5% critical values are about
1.68 and 2.01, so we reject H0 at 10% and do not reject at 5%.

20.14. B. t = 0.031/0.012 = 2.583, for 50 - 3 = 47 degrees of freedom.


As shown in the t-table for 47 degrees of freedom, the 2% and 1% critical values are about
2.41 and 2.69, so we reject H0 at 2% and do not reject at 1%.
20.15. E. t = -0.72/0.46 = -1.565, for 50 - 3 = 47 degrees of freedom.
As shown in the t-table for 47 degrees of freedom, the 10% critical value is about 1.68.
1.565 < 1.68, so we do not reject H0 at 10%.

20.16. B. s2 = ESS/(N - 3) = 63/(50 - 3) = 1.340. s = 1.158.

20.17. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .84)(50 - 1)/(50 - 3) = 0.1668. R̄2 = 0.833.

20.18. R2 = 1 - ESS/TSS. ⇒ .84 = 1 - 63/TSS. ⇒ TSS = 393.75.


RSS = R2 TSS = (.84)(393.75) = 330.75.
F = {RSS/(k - 1)}/{ESS/(N - k)} = {330.75/(3 - 1)}/{63/(50 - 3)} = 123.375 with 2 and 47 degrees
of freedom. The 1% critical value is about 5.5. 123.375 > 5.5, so we reject H0 at 1%!
Alternately, F = {R2/(1 - R2)}(N - k)/(k - 1) = (.84/.16)(47/2) = 123.375. Proceed as before.

20.19. to 20.21. Compute F-Statistic = {RSS/(k-1)}/{ESS/(N - k)} = (RSS/4)/{ESS/(N - 5)}.


F has k - 1 = 4 and N - 5 degrees of freedom.
For 15 observations, F has 4 and 10 degrees of freedom and the 5% critical value is 3.48.
Critical region is when we reject H0, which is when F ≥ 3.48.
For 30 observations, F has 4 and 25 degrees of freedom and the 5% critical value is 2.76.
Critical region is when we reject H0, which is when F ≥ 2.76.
For a given significance level, the more data, the more powerful the test. With 30 observations
the probability of making a Type II error is less than with only 15 observations.
Comment: The probability of a Type I error, rejecting H0 when it is true, is the significance level
of the test.

20.22. B. ESSUR = ESSV = 4.35. ESSR = ESSI = 5.85. N = 5.


q = dimension of restriction = 1.
k = independent variables for the unrestricted model = 3.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(5.85 - 4.35)/1} /{4.35/(5 - 3)} = .69 is an
F-Statistic with 1 and 2 degrees of freedom.
Comment: Sample variance of Y = Σ(Yi - Y )2/(N - 1). ⇒ 2.2 = TSS / 4. ⇒ TSS = 8.8.
There is a lot of information given in this question that is not used.
We are comparing a model with an intercept, X2 and X4, to a model with an intercept and X2.
The unrestricted model has 3 variables including the intercept.
The restricted model has 2 variables including the intercept.
q is the difference between the number of variables in the unrestricted model and the restricted
model. Thus q = 3 - 2 = 1.
20.23. C. β̂3 = -2. s2 = ESS/(N - k) = 12/3 = 4.
The variance-covariance matrix of β̂ is s2(X’X)-1. ⇒ sβ̂32 = (2/3)s2 = 8/3. ⇒ sβ̂3 = 1.633.
The t statistic for testing the null hypothesis H0: β3 = 1 is: (β̂3 - 1)/sβ̂3 = (-2 - 1)/1.633 = -1.84.
Comment: For 3 degrees of freedom, since 1.638 < 1.84 < 2.353, we reject H0 at 20% and do
not reject H0 at 10%, for a two-tailed test.

20.24. B. The first model has the restriction β4 = 0. ESS = TSS - RSS.
Error sum of squares for the unrestricted (second) model = ESSUR = 128 - 65.6 = 62.4.
Error sum of squares for the restricted (first) model = ESSR = 128 - 61.3 = 66.7.
N = number of observations = 10. q = dimension of restriction = 1.
k = independent variables for the unrestricted model = 4.
F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(66.7 - 62.4)/1}/{62.4/(10 - 4)} = .41.
Comment: The TSS only depends on the data, so it is equal for the two models.

21.1. C. The unrestricted model is Model I with ESS = 2721.


To obtain the restricted model, set β2 = β3 = 0 to yield Model III with ESS = 3763. Then, ESSUR
= Error Sum of Squares of the Unrestricted model = 2721.
ESSR = Error Sum of Squares of the Restricted model = 3763.
q = dimension of the restriction = independent variables for unrestricted model - independent
variables restricted model = 4 - 2 = 2. N = number of observations = 25.
k = independent variables for the unrestricted model = 4.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(3763 - 2721)/2} /{2721/(25 - 4)} = 4.02 is an
F-Statistic with 2 and 21 degrees of freedom.

21.2. 4.02 is an F-Statistic with 2 and 21 degrees of freedom. Using the table with ν1 = 2 and
ν2 = 21, the critical values are 3.47 and 5.78 for 5% and 1% respectively.
Since 3.47 < 4.02 < 5.78, we reject at 5% and do not reject at 1% the null hypothesis.
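The restricted-versus-unrestricted F-statistic used throughout Section 21 can be wrapped in a small helper. Below is a sketch of my own (not from the text), applied to the numbers of 21.1 and 21.2:

```python
import scipy.stats as st

def f_restriction(ess_r, ess_ur, q, n, k):
    """F = {(ESS_R - ESS_UR)/q} / {ESS_UR/(N - k)}, with q and N - k degrees of freedom."""
    return ((ess_r - ess_ur) / q) / (ess_ur / (n - k))

# Problem 21.1: restricted ESS = 3763, unrestricted ESS = 2721, q = 2, N = 25, k = 4.
F = f_restriction(3763, 2721, q=2, n=25, k=4)     # about 4.02
print(F, st.f.ppf([0.95, 0.99], 2, 21))           # critical values about 3.47 and 5.78
```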

21.3. A. The unrestricted model is Model I with ESS = 2721.


To obtain the restricted model, substitute β3 = β2 to yield Model II with ESS = 3024.
Then, ESSUR = Error Sum of Squares of the Unrestricted model = 2721.
ESSR = Error Sum of Squares of the Restricted model = 3024.
q = dimension of the restriction = independent variables for unrestricted model - independent
variables restricted model = 4 - 3 = 1. N = number of observations = 25.
k = independent variables for the unrestricted model = 4.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(3024 - 2721)/1} /{2721/(25 - 4)} = 2.33 is an
F-Statistic with 1 and 21 degrees of freedom.

21.4. 2.33 is an F-Statistic with 1 and 21 degrees of freedom. Using the table with ν1 = 1 and
ν2 = 21, the critical values are 4.32 and 8.02 for 5% and 1% respectively.
Since 2.33 < 4.32, we do not reject at 5% the null hypothesis, that β2 = β3.
21.5. B. The unrestricted model is Model I with ESS = 2721.
To obtain the restricted model, substitute β4 = 1 - β2 to yield Model VI with ESS = 3897 . Then,
ESSUR = Error Sum of Squares of the Unrestricted model = 2721.
ESSR = Error Sum of Squares of the Restricted model = 3897.
q = dimension of the restriction = independent variables for unrestricted model - independent
variables restricted model = 4 - 3 = 1.
N = number of observations = 25.
k = independent variables for the unrestricted model = 4.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(3897 - 2721)/1} /{2721/(25 - 4)} = 9.08 is an
F-Statistic with 1 and 21 degrees of freedom.

21.6. 9.08 is an F-Statistic with 1 and 21 degrees of freedom. Using the table with ν1 = 1 and
ν2 = 21, the critical values are 4.32 and 8.02 for 5% and 1% respectively.
Since 9.08 > 8.02, we reject at 1% the null hypothesis that β2 + β4 = 1.

21.7. X̄ = 55. x = {-10, -5, 0, 5, 10}. Ȳ = 63.6. y = {-20.6, -5.6, -.6, 12.4, 14.4}.
Σxi2 = 250. Σxiyi = 440. β̂ = 440/250 = 1.76. α̂ = Ȳ - β̂X̄ = -33.2.
Ŷ = -33.2 + 1.76X = {46, 54.8, 63.6, 72.4, 81.2}.
ε̂ = Y - Ŷ = {-3, 3.2, -0.6, 3.6, -3.2}. ESS = Σε̂t2 = 42.8.

21.8. Restricting α = 0 and β = 1, Ŷt = Xt. ε̂t = Yt - Xt.
ESSrestricted = Σε̂t2 = (43 - 45)2 + (58 - 50)2 + (63 - 55)2 + (76 - 60)2 + (78 - 65)2 = 557.
The restriction is two dimensional, q = 2.
F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(557 - 42.8)/2}/{42.8/(5 - 2)} = 18.02.
The numerator has 2 degrees of freedom and the denominator has 5 - 2 = 3 degrees of
freedom. Since 9.55 < 18.02 < 30.82, we reject H0 at 5%, but not at 1%.
Comment: Similar to 4, 11/03, Q.20.
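A quick numpy check of 21.7 and 21.8 (my own sketch; X and Y are read off from the solution's deviations and residuals above):

```python
import numpy as np

X = np.array([45, 50, 55, 60, 65], dtype=float)
Y = np.array([43, 58, 63, 76, 78], dtype=float)

# Unrestricted two-variable fit (21.7).
x, y = X - X.mean(), Y - Y.mean()
beta = (x @ y) / (x @ x)                        # 1.76
alpha = Y.mean() - beta * X.mean()              # -33.2
ess_ur = ((Y - alpha - beta * X) ** 2).sum()    # 42.8

# Restricted model alpha = 0, beta = 1 (21.8), i.e. Y-hat = X.
ess_r = ((Y - X) ** 2).sum()                    # 557
F = ((ess_r - ess_ur) / 2) / (ess_ur / (5 - 2)) # about 18.0
print(beta, alpha, ess_ur, ess_r, F)
```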

21.9. C. F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} =


{(ESSC - ESS1 - ESS2)/k}/{(ESS1 + ESS2)/(N1 + N2 - 2k)} =
{(5735 - 2573 - 2041)/3}{(2573 + 2041)/(30 + 20 - (2)(3))} = 373.7/104.9 = 3.56
Comment: Similar to 4, 11/00, Q. 21. The F-statistic has 3 and 44 degrees of freedom.
For 3 and 40 degrees of freedom, the critical value at 5% is 2.84, and at 1% it is 4.31.
Since 2.84 < 3.56 < 4.31, we reject the hypothesis at 5% and do not reject at 1%.
21.10. B. N - k is the number of degrees of freedom associated with the Error Sum of
Squares for the unrestricted model, 33 for the first model.
q = dimension of the restriction = the difference in the number of degrees of freedom
associated with the regression sum of squares for the unrestricted and restricted models =
5 - 3 = 2. (2 independent variables must have been excluded.)
Error Sum of Squares for the unrestricted model = 22,070.
TSS = 72,195 + 22,070 = 94,265.
Error Sum of Squares for the restricted model = TSS - RSSR = 94,265 - 63,021 = 31,244.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(31,244 - 22,070)/2} /{22,070/33} = 4587/668.8
= 6.86 is an F-Statistic with 2 and 33 degrees of freedom.

21.11. One initial step would be to examine each independent variable to see whether it
seems reasonable that it could affect the claim frequency. If so, does the sign of the slope
make sense? Data should go back and test whether each of the coefficients separately are
significantly different from zero, using the t-test. Data should go back and test whether groups
of the coefficients separately are significantly different from zero using F-Tests.
While R2 = 0.92 is rather high, one needs to take into account the enormous number of
separate regressions that were run. From 25 characteristics, the number of distinct sets of size
5 is: (25)(24)(23)(22)(21)/5! = 53,130. Thus even if there were no relation between the
independent variables and claim frequency, it is not surprising that one of these regressions
would show a good match to the observations, and thus have a high R2.
One should be somewhat skeptical whenever a very large set of models has been examined,
and the one that fits best has been selected.
It would be very useful if the model could be tested on a similar data set that had not been
used in this selection process.
Comment: These are some reasonable things one could say. There are probably others.
R̄2 = 1 - (1 - R2)(N - 1)/(N - k) = 1 - (1 - .92)(16 - 1)/(16 - 6) = 0.88, still rather high.
Since all of the regressions used the same number of observations and the same number of
variables, the rankings by R2 are the same as the rankings by R̄2.
In order to test H0: β1 = β2 = β3 = β4 = β5 = 0, F = {R2/(1 - R2)}(N - k)/(k - 1) =
(.92/.08)(16 - 6)/(6 - 1) = 23, with 5 and 10 degrees of freedom.
The 1% critical value is 5.64.
Since 23 > 5.64, if β1 = β2 = β3 = β4 = β5 = 0, the probability of F ≥ 23 is less than 1%.
Using a computer, Prob[F ≥ 23 | H0] = 0.0000346612. 1/0.0000346612 = 28,851.
Thus we would expect to see F ≥ 23 by random chance, 1 in every 28,851 times.
Alternately, as discussed previously, if all of the actual slopes of the model are zero, R2 follows
a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2, b = ν2/2 = (N - k)/2, and θ = 1.
In this case, a = (6 - 1)/2 = 2.5, and b = (16 - 6)/2 = 5.
Thus, for a model that actually explains nothing about claim frequency, the probability that
R2 ≥ .92 is: 1 - β[2.5, 5; .92].
Using a computer, 1 - β[2.5, 5; .92] = 0.0000346612. 1/0.0000346612 = 28,851.
Thus we would expect to see R2 ≥ .92 by random chance, 1 in every 28,851 times.
Thus one is not shocked that one out of 53,130 regressions tested has R2 as big as .92.
(On the one hand, many regressions were very similar, sharing all but one independent
variable. So their matches to the observations were not independent of each other.
On the other hand, this statistical result assumed independent variables with absolutely no
explanatory value whatsoever, a somewhat extreme assumption for practical applications.)

21.12. D. Test the hypothesis H0: β3 = β4 = 0.


The first model is unrestricted (UR). The second model is restricted (R).
N = 11. k = 4. q = 3 - 1 = 2 = 4 - 2.
F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(27.7281 - 12.8156)/2}/(12.8156/7) = 4.07.
Comment: At 2 and 7 degrees of freedom, the critical value at 5% is 4.74.
Since 4.07 < 4.74 we do not reject H0 at 5%. One could figure out the sample has 11
employees, by adding 1 to the sum of the degrees of freedom; 11 = 1 + 3 + 7 = 1 + 1 + 9.

21.13. D. There are eight independent variables.


Thus including the intercept, k = 8 + 1 = 9. q = dimension of the restriction = 8 - 2 = 6.
N = 27. Error Sum of Squares for the restricted model = 126,471.
Error Sum of Squares for the unrestricted model = 76,893.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(126,471 - 76,893)/6} /{76,893/(27 - 9)} =
8263/4271.8 = 1.93 is an F-Statistic with 6 and 18 degrees of freedom.
Comment: N - k is the number of degrees of freedom associated with the Error Sum of
Squares for the unrestricted model, 18 for the first model. q is the difference in the number of
degrees of freedom associated with the regression sum of squares for the unrestricted and
restricted models. Note that the numerator of the F-Statistic is also: (RSSUR - RSSR)/q =
(115,175 - 65,597)/(8 - 2) = 8263.

21.14. C. F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(ESSI - ESSII)/2}/{ESSII/(30 - 4)} =
13(ESSI - ESSII)/ESSII.
We have TSS = Σ(Y - Ȳ)2 = 160. ESSII = (1 - R2II)TSS = (1 - .7)(160) = 48.
For Model I, β̂2 = Σx2iyi / Σx2i2 ⇒ -2 = Σx2iyi/10 ⇒ Σx2iyi = -20.
For Model I, ESSI = Σ(yi - β̂2x2i)2 = Σyi2 + β̂22 Σx2i2 - 2β̂2 Σx2iyi = 160 + (4)(10) - (2)(-2)(-20) =
120.
F = 13(ESSI - ESSII)/ESSII = (13)(120 - 48)/48 = 19.5.
Alternately, for Model I, which is a two variable model, we have:
R2I = β̂22Σx2i2 /Σyi2 = (4)(10)/160 = .25. ⇒ ESSI = (1 - R2I)TSS = (1 - .25)(160) = 120.
(See page 73 of Pindyck & Rubinfeld.) Proceed as above.
Alternately, F = {(R2UR - R2R)/q} / {(1 - R2UR)/(N - k)}. Proceed as above.
21.15. A. One takes G as the restricted model, and A plus B as the unrestricted model.
There are 30 + 50 = 80 total observations ⇒ N = 80.
There are three restrictions: A1 = B1, A2 = B2, A3 = B3 ⇒ q = 3.
There are 6 coefficients being fit in the unrestricted model ⇒ k = 6.
ESSR = ESSG. ESSUR = ESSA + ESSB.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} =
{(ESS G - ESSA - ESSB )/3}/{(ESS A + ESSB )/74}.
This F-statistic has 3 and 74 degrees of freedom.
Comment: See Section 5.3.3 of Pindyck and Rubinfeld. Note that the
F-Statistic = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(RA2 + RB2 - RG2)/3}/{(1 - RA2 - RB2)/74}.
When guessing, some people compare the choices and choose those features that show up
most often. In this case, three involve ESS, while only two involve R2, so one would choose
ESS. Two involve 3,74, two involve 6,77, and only one involves 6,74, so one would choose
either 3,74 or 6,77. Thus in this case, we would guess either A or B.

21.16. D. The unrestricted model is Model I with ESS = 484.


To obtain the restricted model, substitute β3 = 1 - β2 to yield Model III with ESS = 982. Then,
ESSUR = Error Sum of Squares of the Unrestricted model = 484.
ESSR = Error Sum of Squares of the Restricted model = 982.
q = dimension of the restriction = independent variables for unrestricted model - independent
variables restricted model = 3 - 2 = 1. N = number of observations = 20.
k = independent variables for the unrestricted model = 3.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(982 - 484)/1} /{484/(20 - 3)} = 17.49 is an
F-Statistic with 1 and 17 degrees of freedom.
Comment: See Equation 5.20 at page 129 of Econometric Models and Economic Forecasts.

21.17. 17.49 is an F-Statistic with 1 and 17 degrees of freedom. Using the table with ν1 = 1
and ν2 = 17, the critical values are 4.45 and 8.40 for 5% and 1% respectively.
Since 17.49 > 8.40, we reject at 1% the null hypothesis, that β2 + β3 = 1.
Comment: Using a computer, the p-value is 0.06%. So we really, really reject the H0.

21.18. B. The unrestricted model is Model I with ESS = 484.


To obtain the restricted model, substitute β3 = β2 to yield Model II with ESS = 925. Then,
ESSUR = Error Sum of Squares of the Unrestricted model = 484.
ESSR = Error Sum of Squares of the Restricted model = 925.
q = dimension of the restriction = independent variables for unrestricted model - independent
variables restricted model = 3 - 2 = 1. N = number of observations = 20.
k = independent variables for the unrestricted model = 3.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(925 - 484)/1} /{484/(20 - 3)} = 15.49 is an
F-Statistic with 1 and 17 degrees of freedom.
Comment: See Section 5.3.2 of Econometric Models and Economic Forecasts.

21.19. 15.49 is an F-Statistic with 1 and 17 degrees of freedom. Using the table with ν1 = 1
and ν2 = 17, the critical values are 4.45 and 8.40 for 5% and 1% respectively.
Since 15.49 > 8.40, we reject at 1% the null hypothesis, that β2 = β3.
Comment: Using a computer, the p-value is 0.11%.

21.20. D. ESSR = TSS - RSSR = 15,000 - 5,565 = 9,435.


ESSUR = (1 - RUR2)TSS = (1 - 0.38)(15,000) = 9,300.
F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(9435 - 9300)/3}/{9300/(3120 - 6)} = 15.07.
Alternately, RR2 = RSSR/TSS = 5565/15000 = 0.371.
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(0.38 - 0.371)/3}/{(1 - 0.38)/(3120 - 6)} = 15.07.
Comment: This F-Statistic has 3 and 3114 degrees of freedom.

21.21. A. Restricting α = 0 and β = 1, Ŷt = Xt. ε̂t = Yt - Ŷt = Yt - Xt.
ESSrestricted = Σε̂t2 = (254 - 475)2 + (463 - 254)2 + (515 - 463)2 + (567 - 515)2 + (605 - 567)2
= 99,374. The restriction is two dimensional, q = 2. F-Statistic =
{(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(99374 - 69843)/2}/{69843/(5 - 2)} = .634.
The numerator has 2 degrees of freedom and the denominator has 5 - 2 = 3 degrees of
freedom. Since .634 < 9.55, we do not reject H0 at the 5% significance level.
Comment: Y = α + βX + ε. If α = 0 and β = 1, then X = Y + ε, and the actuary’s forecast method
is a good one. The actuary’s forecast method is to use the current year as the prediction of the
next year.

21.22. A. F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} =


{(ESSC - ESSA - ESSB)/k}/{(ESSA + ESSB)/(NA + NB - 2k)},
with k and NA + NB - 2k degrees of freedom.
ESSC = 10,374. ESSA = 4053. ESSB = 2087. k = 4. NA = 18 and NB = 19.
F = {(10374 - 4053 - 2087)/4)}/{(4053 + 2087)/(18 + 19 - 8)} = 5.00, with 4 and 29 degrees of
freedom. From the table the critical value for 5% for 4 and 29 degrees of freedom is less than
2.74. 5.00 > 2.74 so it is statistically significant at the 5% significance level.
Comment: I have assumed that the model includes an intercept, so that k = 4. The question
should have made it clear whether the model had an intercept. If there were no intercept in the
model then k = 3, which does not produce any of the given choices.
The particular situation modeled here would require an intercept in order for the model to
make sense.
The critical value for 5% for 4 and 29 degrees of freedom is 2.70.
From the table the critical value for 1% for 4 and 29 degrees of freedom is less than 4.14.
5.00 > 4.14 so the F statistic is statistically significant at the 1% significance level.
The critical value for 1% for 4 and 29 degrees of freedom is 4.04.

21.23. A. For the unrestricted model, ESS = .482 + .282 + .442 + .202 + .442 = 0.736.
X     Y     Unrestricted Model   Error    Restricted Model   Error
1    2.8          2.32           -0.48         1.986         -0.814
2    2.9          3.18            0.28         2.979          0.079
3    3.6          4.04            0.44         3.972          0.372
4    4.7          4.90            0.20         4.965          0.265
5    6.2          5.76           -0.44         5.958         -0.242
For the restricted model, ESS = .8142 + .0792 + .3722 + .2652 + .2422 = 0.93601.
q = dimension of the restriction = 1.
N = number of data points = 5. k = number of independent variables including intercept = 2.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(.93601 - .736)/1}/{0.736/(5 - 2)} = 0.815.
This F-Statistic has 1 and 3 degrees of freedom.
The 5% critical value for 1 and 3 degrees of freedom is 10.13.
0.815 < 10.13, and therefore we do not reject H0 at 5%.
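A small sketch (mine, not the text's) that recomputes the F-statistic in 21.23 directly from the two residual columns in the table above:

```python
import numpy as np

resid_ur = np.array([-0.48, 0.28, 0.44, 0.20, -0.44])       # unrestricted residuals
resid_r  = np.array([-0.814, 0.079, 0.372, 0.265, -0.242])  # restricted residuals

ess_ur = (resid_ur ** 2).sum()    # 0.736
ess_r  = (resid_r ** 2).sum()     # about 0.936
q, n, k = 1, 5, 2
F = ((ess_r - ess_ur) / q) / (ess_ur / (n - k))   # about 0.82, below the 5% critical value 10.13
print(ess_ur, ess_r, F)
```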

Solutions to problems in the remaining sections appear in Study Guides N, etc.


Mahler’s Guide to Regression
Solutions to Problems, Sections 22-32
Copyright 2006 by Howard C. Mahler.          Study Aid F06-Reg-N

Solutions to Problems, Sections 22-32

22.1. A. Fit a linear regression to the natural log of the claim sizes:
ln(117) = 4.762, ln(132) = 4.883, ln(136) = 4.913, ln(149) = 5.004, ln(151) = 5.017.
X̄ = 3. x = -2, -1, 0, 1, 2. Σxi2 = 10.
Ȳ = (4.762 + 4.883 + 4.913 + 5.004 + 5.017)/5 = 4.916.
y = Y - Ȳ = -.154, -.033, -.003, .088, .101.
Σxiyi = (-2)(-.154) + (-1)(-.033) + (0)(-.003) + (1)(.088) + (2)(.101) = 0.631.
β̂ = Σxiyi /Σxi2 = .631/10 = .0631. b = exp[β̂] = e.0631 = 1.065.
Comment: α̂ = Ȳ - β̂X̄ = 4.916 - (.0631)(3) = 4.73. a = exp[α̂] = e4.73 = 113.
The fitted model is: Y = 113 (1.065t).
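A minimal numpy sketch (my own, not from the guide) of the log-linear fit in 22.1:

```python
import numpy as np

t = np.arange(1, 6, dtype=float)                 # years 1..5
claims = np.array([117, 132, 136, 149, 151], dtype=float)

# Fit ln(Y) = ln(a) + t ln(b) by least squares on the logs.
lnY = np.log(claims)
x = t - t.mean()
beta = (x @ (lnY - lnY.mean())) / (x @ x)        # ln(b), about 0.0631
alpha = lnY.mean() - beta * t.mean()             # ln(a), about 4.73
a, b = np.exp(alpha), np.exp(beta)               # a ~ 113, b ~ 1.065
print(a, b)
```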

22.2. C. Let t be time (in years) and Y = ln(average written premium at current rate level).
t̄ = 45/9 = 5. Ȳ = 63.50844/9 = 7.05649.
Σt2/N = 285/9 = 31.667. ΣtY/N = 317.75352/9 = 35.30595.
variance of t = Σt2/N - t̄2 = 31.667 - 52 = 6.667.
sample covariance of t and Y = ΣtY/N - t̄Ȳ = 35.30595 - (5)(7.05649) = .0235.
slope of the regression line = .0235/6.667 = .00352.
Year (t)   12-month avg. W.P. @ CRL   ln(avg. W.P. @ CRL)    t^2    t ln(avg. W.P.)
   1              1156.83                  7.05344             1        7.05344
   2              1152.34                  7.04955             4       14.09910
   3              1153.64                  7.05068             9       21.15203
   4              1150.22                  7.04771            16       28.19083
   5              1144.52                  7.04274            25       35.21370
   6              1150.11                  7.04761            36       42.28568
   7              1164.21                  7.05980            49       49.41859
   8              1178.57                  7.07206            64       56.57646
   9              1193.75                  7.08485            81       63.76369
 Sum 45          10444.19                 63.50844           285      317.75352
intercept of the regression line = Ȳ - β̂t̄ = 7.05649 - (.00352)(5) = 7.03889.
Therefore, the exponential regression is:
average written premium at current rate level = (e7.03889)e.00352t = 1140.1(1.00353t).
For year 2002 (t = 12), the estimate is: 1140.1(1.00353t) = 1189.3.
22.3. Using the matrix formulas for multiple regression:
The first three rows, out of fifteen rows in total, of the design matrix X are:
    (1  29  12  348)
    (1  21   8  168)
    (1  62  10  620)
       (  15      527      131      4549 )
X’X =  ( 527    23651     4549    195459 )
       ( 131     4549     1219     42241 )
       (4549   195459    42241   1759219 )
          ( 5.6118      -0.113031     -0.676391      0.0142883   )
(X’X)-1 = (-0.113031     0.00282223    0.0140079    -0.000357637 )
          (-0.676391     0.0140079     0.0866804    -0.00188865  )
          ( 0.0142883   -0.000357637  -0.00188865    0.0000487058)
       (   38657 )                       (1041.89 )
X’Y =  ( 1413683 )    β̂ = (X’X)-1X’Y =   (-13.2376)
       (  355153 )                       ( 103.306)
       (12885970 )                       ( 3.62096)
ESS = Y’Y - β̂‘X’Y = 108,242,671 - 104,911,589 = 3,331,081.
s2 = ESS/(N - k) = 3,331,081/(15 - 4) = 302,826.
Var[β̂] = s2(X’X)-1. Var[β̂4] = (302,826)(0.0000487058) = 14.749.
sβ̂4 = √14.749 = 3.84. t = β̂4/sβ̂4 = 3.621/3.84 = .943.
For 15 - 4 = 11 degrees of freedom, the critical value for 10% is 1.796.
Since .943 < 1.796, we do not reject at 10% the hypothesis that β 4 = 0.
Comment: The null hypothesis is that β4 = 0. The alternate hypothesis is that β4 ≠ 0, in
other words that X2 and X3 interact.

22.4. A., 22.5. C., 22.6. C. Using the matrix formulas for multiple regression:
The first two rows, out of ten rows in total, of the design matrix X are:
    (1  11.7  11.72)
    (1  25.3  25.32)
       (    10        909.1           164015       )
X’X =  (   909.1     164015           3.71471 x 107)
       (164015       3.71471 x 107    9.23334 x 109)
          ( 0.354364       -0.0060632        0.0000180984  )
(X’X)-1 = (-0.0060632       0.00017239      -5.85846 x 10-7)
          ( 0.0000180984   -5.85846 x 10-7   2.14376 x 10-9)
       (  89.4  )                      (  13.71  )
X’Y =  (5922.02 )    β̂ = (X’X)-1X’Y =  ( -.1018  )
       (991190  )                      ( .0002735)
β̂1 = 13.71, β̂2 = -.1018, β̂3 = .0002735.

22.7. E. ESS = Y’Y - β̂‘X’Y = 927.6 - 893.895 = 33.705.
s2 = ESS/(N - k) = 33.705/(10 - 3) = 4.815.
Alternately, Ŷ = (12.56, 11.31, 6.75, 4.43, 12.70, 12.01, 10.68, 7.24, 4.78, 6.94).
ε̂ = Y - Ŷ = (2.74, -2.01, -0.25, 1.57, 3.00, -2.01, -2.08, -0.84, 0.82, -0.94).
ESS = Σε̂2 = 33.703. s2 = ESS/(N - k) = 33.703/(10 - 3) = 4.815.

22.8. C. Var[β̂] = s2(X’X)-1. Var[β̂3] = (4.815)(2.14376 x 10-9) = 1.0322 x 10-8.
sβ̂3 = √(1.0322 x 10-8) = .0001016. t = β̂3/sβ̂3 = .0002735/.0001016 = 2.692.
For 10 - 3 = 7 degrees of freedom, the critical values for 5% and 2% are 2.365 and 2.998.
Since 2.365 < 2.692 < 2.998, reject at 5% and do not reject at 2%.

22.9. For a given age of antique, X2, an increase of 2 in the number of bidders increases the
auction price Y by: 2β̂3 + 2β̂4X2 = -180 + (2.6)(age).
So if for example age is 100, then the expected increase in price is 80.
If instead the age is 200, then the expected increase in price is 340.
Comment: A general feature of the interactive model is that the effect of a change in one
independent variable depends on the level of the other independent variable.

22.10. E. lnY = ln(a) + b ln(X).
Therefore, fit a least squares line to lnX and lnY, with intercept ln(a) and slope b.
V = ln X = -0.941609, -0.328504, 0, 0.41871, 1.64866, 2.25549.
V̄ = 0.508792. v = V - V̄ = -1.4504, -0.837296, -0.508792, -0.0900813, 1.13987, 1.7467.
W = ln Y = 4.47734, 5.4161, 5.8999, 6.53233, 8.37402, 9.2835.
W̄ = 6.66386. w = W - W̄ = -2.18653, -1.24776, -0.763966, -0.131529, 1.71015, 2.61963.
Σviwi = 11.14. Σvi2 = 7.42. b = Σviwi /Σvi2 = 11.14/7.42 = 1.50. ln a = W̄ - bV̄ = 5.90.
For X = 19.2, ln(Y) = 5.90 + (1.5)ln(19.2) = 10.33. Y = e10.33 = 30,638.
Comment: The data is for Mercury, Venus, Earth, Mars, Jupiter, and Saturn, the 6 planets
known at the time Kepler published his third law of motion, which states that b = 3/2.
We used the fitted curve to estimate the year of Uranus, which is actually 30,685 days.
(The difference is due to rounding.)

22.11. D. Let T = 1, 2, 3, ..., 12. T̄ = 6.5. t = T - T̄ = -5.5, -4.5, ..., 5.5.
Taking the logarithms of the consumer price indices:
W = lnY = 4.80402, 4.81947, 4.82751, 4.84419, 4.87901, 4.88734, 4.90749, 4.92435, 4.9395,
4.95018, 4.96284, 4.99721. W̄ = 4.89526.
w = W - W̄ = -0.0912387, -0.075785, -0.0677464, -0.0510727, -0.0162529, -0.00792269,
0.0122348, 0.0290912, 0.0442375, 0.0549176, 0.0675849, 0.101953.
Σwi ti = 2.45341. Σti2 = 143. β̂ = Σwi ti/Σti2 = .01716.
α̂ = W̄ - β̂T̄ = 4.89526 - (.01716)(6.5) = 4.784.
Second Quarter 2005 ⇔ T = 12. Third Quarter of 2006 ⇔ T = 17.
Fitted consumer price index is: exp[4.784 + (.01716)(17)] = e5.076 = 160.1.

22.12. E. Let Z = √X. Yi = α + β√Xi ⇔ Yi = α + βZi.
Z = 1, 1.732, 2, 2, 2.646.
Z̄ = 1.876. zi = -.876, -.144, .124, .124, .770.
β̂ = Σziyi / Σzi2 = ΣziYi / Σzi2 = 5.468/1.412 = 3.87.
22.13. A.
    (1  -1   1)        (3)
X = (1   1   1)    Y = (4)
    (1   3   9)        (7)
    (1   5  25)        (6)
       ( 4    8   36)             ( 2384   -192   -80)
X’X =  ( 8   36  152)   (X’X)-1 = ( -192   1536  -320) /5120
       (36  152  708)             (  -80   -320    80)
       ( 20)                       (20096)           ( 3.925)
X’Y =  ( 52)     β̂ = (X’X)-1X’Y =  ( 5632) /5120  =  ( 1.1  )
       (220)                       ( -640)           ( -.125)
The fitted quadratic polynomial is: y = 3.925 + 1.1x - .125x2.
The predicted value of Y when X = 6 is: 3.925 + (1.1)(6) - (.125)(62) = 3.925 + 6.6 - 4.5 = 6.025.
Alternately, the squared error is: Σ(Yi − β1 - β2Xi - β3Xi2)2.
Setting the partial derivatives with respect to β1, β2, and β3 equal to zero, we get the Normal
Equations, three equations in three unknowns:
Nβ1 + ΣXi β2 + ΣXi2 β3 = ΣYi.
ΣXiβ1 + ΣXiXi β2 + ΣXiXi2 β3 = ΣYiXi.
ΣXi2β1 + ΣXiXi2 β2 + ΣXi2Xi2 β3 = ΣYiXi2.
4β1 + 8β2 + 36β3 = 20. ⇒ β1 + 2β2 + 9β3 = 5.
8β1 + 36β2 + 152β3 = 52. ⇒ 2β1 + 9β2 + 38β3 = 13.
36β1 + 152β2 + 708β3 = 220. ⇒ 9β1 + 38β2 + 177β3 = 55.
The first two equations imply: 5β2 + 20β3 = 3.
The first and third equations imply: 20β2 + 96β3 = 10. ⇒ 10β2 + 48β3 = 5.
Therefore, 8β3 = -1. ⇒ β3 = -1/8. ⇒ β2 = 1.1. ⇒ β1 = 3.925.
The fitted quadratic polynomial is: y = 3.925 + 1.1x - .125x2.
The predicted value of Y when X = 6 is: 3.925 + (1.1)(6) - (.125)(62) = 3.925 + 6.6 - 4.5 = 6.025.
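The quadratic fit in 22.13 can be checked with a short numpy sketch (mine, not part of the solution):

```python
import numpy as np

X = np.array([-1, 1, 3, 5], dtype=float)
Y = np.array([3, 4, 7, 6], dtype=float)

# Design matrix with columns 1, x, x^2, then beta-hat = (X'X)^-1 X'Y.
D = np.column_stack([np.ones_like(X), X, X ** 2])
beta = np.linalg.solve(D.T @ D, D.T @ Y)         # [3.925, 1.1, -0.125]

# Predicted value at x = 6.
print(beta, beta @ np.array([1.0, 6.0, 36.0]))   # 6.025
```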
22.14. E. The restriction of going from the quadratic to the linear model is one dimensional.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(24005.9 - 1273.7)/1}/{(1273.7/(15 - 3)} = 214.2.
Perform a one sided F-Test at 1 and 12 d.f. 4.75 is the critical value at 5%.
Since 214.2 > 4.75 we reject the simpler linear model at the 5% level.
The restriction of going from the third degree to the quadratic model is one dimensional.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(1273.7 - 582.3)/1}/{(582.3/(15 - 4)} = 13.06.
Perform a one sided F-Test at 1 and 11 d.f. 4.84 is the critical value at 5%.
Since 13.06 > 4.84 we reject the simpler quadratic model at the 5% level.
The restriction of going from the fourth degree to the third degree model is one dimensional.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(582.3 - 433.7)/1}/{(433.7/(15 - 5)} = 3.43.
Perform a one sided F-Test at 1 and 10 d.f. 4.96 is the critical value at 5%.
Since 3.43 < 4.96 we do not reject the simpler third degree model at the 5% level.
Now compare the third degree model to the fifth degree model.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(582.3 - 282.9)/2}/{(282.9/(15 - 6)} = 4.76.
Perform a one sided F-Test at 2 and 9 d.f. 4.26 is the critical value at 5%.
Since 4.76 > 4.26 we reject the simpler third degree model in favor of the fifth degree model
at the 5% level. Use the fifth degree model.
Comment: Based on Table 15.1 and Figures 15.3 to 15.6 in Loss Models.
The ESS for the sixth degree polynomial is 278.2.
The ESS for the seventh degree polynomial is 271.2.
Neither is a significant improvement over the fifth degree polynomial.
The restriction of going from the fifth degree to the sixth degree model is one dimensional.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(282.9 - 278.2)/1}/{(278.2/(15 - 7)} = .135.
Perform a one sided F-Test at 1 and 8 d.f. 5.32 is the critical value at 5%.
Since .135 < 5.32 we do not reject the simpler fifth degree model at the 5% level.
The restriction of going from the fifth degree to the seventh degree model is two dimensional.
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(282.9 - 271.2)/2}/{(271.2/(15 - 8)} = .151.
Perform a one sided F-Test at 2 and 7 d.f. 4.74 is the critical value at 5%.
Since .151 < 4.74 we do not reject the simpler fifth degree model at the 5% level.
22.15. The mean mortality is 71.9067.
TSS = Σ(Yi - Ȳ)2 = (3.89 - 71.9067)2 + ... + (271.60 - 71.9067)2 = 114,035.
R2 = 1 - ESS/TSS = 1 - ESS/114035.
The sample variance of the mortalities = 114035/(15 - 1) = 8145.
R̄2 ≡ 1 - (sample variance of residuals)/(sample variance of Y) =
1 - {ESS/(N - k)}/(sample variance of Y) = 1 - {ESS/(14 - order)}/8145.
Order ESS R^2 Corrected R^2
1 24005.9 0.78949 0.77328
2 1273.7 0.98883 0.98697
3 582.3 0.99489 0.99350
4 433.7 0.99620 0.99468
5 282.9 0.99752 0.99614
6 278.2 0.99756 0.99573
7 271.2 0.99762 0.99524
The values of R2 increase as the order of the equation increases.
In general, R2 increases as we add more variables to the model.
The fifth degree polynomial has the best R̄2.

22.16. C. Fit a regression with no intercept, taking lnXi as the independent variable.
ln(X1) = ln(e) = 1. ln(X2) = ln(e2) = 2.
b = Σln(Xi)Yi /Σln(Xi)2 = (Y1 + 2Y2)/(1 + 22) = (Y1 + 2Y2)/5.

22.17. D. S(x) = exp[-m(cx - 1)]. f(x) = -S'(x) = m ln(c) cx exp[-m(cx - 1)].


µx = f(x) / S(x) = m ln(c) cx. ln(µx) = ln(c) x + ln(m) + ln(ln(c)).
Thus under Gompertz law, lnµx is linear in x.
Let Y = ln(µx). Fitting a linear regression:
β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2} = {(5)(-1199) - (200)(-30)}/{(5)(8010) - 2002} = 5/50 = 0.1.
α̂ = Ȳ - β̂X̄ = (-30/5) - (.1)(200/5) = -10.
Fitted value of ln(µx) at X = 41 is: -10 + (0.1)(41) = -5.90.
Comment: The fitted µ41 = e-5.9 = .00274.
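A small sketch of my own (not from the guide) reproducing the two-variable regression formulas used in 22.17, from the summary statistics given in the problem:

```python
# Summary statistics given in the problem: N, sum X, sum Y, sum X^2, sum XY,
# where Y = ln(mu_x).
n, sum_x, sum_y, sum_x2, sum_xy = 5, 200.0, -30.0, 8010.0, -1199.0

beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # ln(c) = 0.1
alpha = sum_y / n - beta * sum_x / n                              # intercept = -10
print(alpha + beta * 41)   # fitted ln(mu_41) = -5.90
```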

22.18. C. To minimize the SS, set the partial derivative with respect to λ1 equal to zero:
0 = -2 Σr=0..3 Σx=21..22 (u[x]+r - λ1 - λ2r - λ3x) = -2{ΣΣ u[x]+r - λ1 ΣΣ 1 - λ2 ΣΣ r - λ3 ΣΣ x}. ⇒
Σr=0..3 Σx=21..22 u[x]+r = λ1(4)(2) + λ2(2)(0 + 1 + 2 + 3) + λ3(4)(21 + 22) = 8λ1 + 12λ2 + 172λ3. ⇒
f = 8, g = 12, and h = 172. ⇒ f + g + h = 192.
Comment: The other two normal equations result from setting the partial derivatives of SS
with respect to λ2 and λ3 equal to zero:
0 = -2 Σr=0..3 Σx=21..22 r (u[x]+r - λ1 - λ2r - λ3x).
0 = -2 Σr=0..3 Σx=21..22 x (u[x]+r - λ1 - λ2r - λ3x).

22.19. B. One can take exp[xi] as the independent variable, and just use the equation for
the slope of the least squares line with no intercept:
θ̂ = Σi=1..n yi exp[xi] / Σi=1..n exp[xi]2 = Σ yi exp[xi] / Σ exp[2xi].

22.20. C. ln(µx) = ln(B) + x ln(c). Let Y = ln(µx), and fit a linear regression.
X̄ = 2. x = (-1, 0, 1). Ȳ = -3.07667. y = (-.02333, .00667, .01667).
ln(c) = β̂ = Σxiyi/Σxi2 = .04/2 = .02.
ln(B) = α̂ = Ȳ - β̂X̄ = -3.07667 - (.02)(2) = -3.11667.

22.21. C. A + B0 + C02 = v[0] = 1.4. ⇒ A = 1.4.


A + B1 + C12 = v[1]+1 = 1.8. ⇒ A + B + C = 1.8.
A + B0 + C22 = v[0]+2 = 2.0. ⇒ A + 4C = 2.0.
⇒ C = .15. ⇒ B = .25.
Now the sum of squared errors is: ΣΣ(v[x]+t - u[x]+t)2 = ΣΣ(A + Bx + Ct2 - u[x]+t)2.
Setting the partial derivative with respect to A equal to zero:
0 = ΣΣ(A + Bx + Ct2 - u[x]+t) = ΣΣ(v[x]+t - u[x]+t). ⇒ ΣΣv[x]+t = ΣΣu[x]+t.
ΣΣv[x]+t = 9A + 3(0 + 1 + 2)B + 3(0 + 1 + 4)C = 17.1.
⇒ 17.1 = ΣΣu[x]+t = u[0] + 1.5 + 1.8 + 1.5 + 1.8 + 2.3 + 1.8 + 2.3 + 2.4.
⇒ u[0] = 17.1 - 15.6 = 1.5.

22.22. C. S(x) = exp[-m(cx - 1)]. f(x) = -S'(x) = m ln(c) cx exp[-m(cx - 1)].


µx = f(x) / S(x) = m ln(c) cx. ln(µx) = ln(c) x + ln(m) + ln(ln(c)).
Thus under Gompertz law, lnµx is linear in x.
Let Y = ln(µx). Fitting a linear regression:
β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2} = {(5)(-122.11) - (15)(-41)}/{(5)(55) - 152} = 4.45/50 = 0.089.
α̂ = Ȳ - β̂X̄ = (-41/5) - (.089)(3) = -8.467.
Fitted value of ln(µx) at X = 4 is: -8.467 + (0.089)(4) = -8.111.
Comment: The fitted µ4 = e-8.111 = .000300.

22.23. B. θ is the slope of a least squares line through the origin, treating ln(x) as the
independent variable. θ = Σi=1..n yi ln(xi) / Σi=1..n {ln(xi)}2.

22.24. B.
i    xi    yi    θ(xi + xi2)
1     1     4        2θ
2     2     8        6θ
3     3    14       12θ
The squared error is: (4 - 2θ)2 + (8 - 6θ)2 + (14 - 12θ)2.
Set the derivative with respect to θ equal to zero:
0 = -2{2(4 - 2θ) + 6(8 - 6θ) + 12(14 - 12θ)}. ⇒ θ = 28/23.

22.25. B. ln µx = ln k + b ln(x) = c + b ln(x).


Sum of Squares = (c + b ln(5) + 3.9)2 + (c + b ln(10) + 3.3)2 + (c + b ln(15) + 2.8)2.
Set the partial derivative with respect to c equal to zero:
0 = 2{(c + b ln(5) + 3.9) + (c + b ln(10) + 3.3) + (c + b ln(15) + 2.8)}. ⇒ 3c + 6.620b + 10 = 0.
Set the partial derivative with respect to b equal to zero:
0 = 2{ln(5)(c + b ln(5) + 3.9) + ln(10)(c + b ln(10) + 3.3) + ln(15)(c + b ln(15) + 2.8)}.
⇒ 6.620c + 15.226b + 21.458 = 0.
⇒ b = {(3)(21.458) - (6.620)(10)}/{6.6202 - (3)(15.226)} = (-1.826)/(-1.8536) = .985.
⇒ c = -5.507.
The estimate of ln µ15 is: -5.507 + ln(15)(.985) = -2.84.
Comment: For a Weibull Distribution as per Loss Models, F(x) = 1 - exp(-(x/θ)τ),
f(x) = τ(x/θ)τ exp(-(x/θ)τ) /x, and the force of mortality is: τxτ−1/θτ.

22.26. C. To minimize the SS, set the partial derivative with respect to λ1 equal to zero:
0 = -2 Σr=0..3 Σx=11..12 (u[x]+r - λ1 - λ2r - λ3x) = -2{ΣΣ u[x]+r - λ1 ΣΣ 1 - λ2 ΣΣ r - λ3 ΣΣ x}. ⇒
Σr=0..3 Σx=11..12 u[x]+r = λ1(4)(2) + λ2(2)(0 + 1 + 2 + 3) + λ3(4)(11 + 12) = 8λ1 + 12λ2 + 92λ3. ⇒
f = 8, g = 12, and h = 92. ⇒ f + g + h = 112.

22.27. A. Let Zi = Xi2, then the model is Yi = βZi + εi. The least squares fit to this model with
no intercept is: β = ΣYiZi /ΣZi2 = Σ YiXi2 /Σ Xi4.
Alternately, one could minimize: Σ(Yi - βXi2)2.
Comment: There are two other Normal Equations, which are gotten by setting the partial
derivatives with respect to λ2 and λ3 equal to zero.

22.28. C. Let Z = X2. Yi = α + βXi2 + εi ⇔ Yi = α + βZi + εi.


Z̄ = 9/5. zi = -9/5, -9/5, -4/5, 11/5, 11/5. Ȳ = 10. yi = -8, -6, -2, 6, 10.
Σziyi = 310/5. Σzi2 = 420/25. β̂ = Σziyi / Σzi2 = (310/5)/(420/25) = 155/42 = 3.69.


Alternately, the sum of squared errors is: Σ(Yi - α - βXi2)2.
Setting the partial derivative with respect to α equal to zero: 0 = 2Σ(Yi - α - βXi2).
Therefore, ΣYi = Nα + βΣXi2. ⇒ 50 = 5α + 9β.
Setting the partial derivative with respect to β equal to zero: 0 = 2ΣXi2(Yi - α - βXi2).
Therefore, ΣYiXi2 = αΣXi2 + βΣXi4. ⇒ 152 = 9α + 33β.
Solving the two equations: β= 310/84 = 3.69, and α = 282/84.
Comment: Unless stated otherwise, we usually transform the variables to achieve a linear
form of the equation. In this situation, minimizing the squared errors in the original equation
gives the same result as applying a change variables, since the original model is linear in the
coefficients.
In the case of transforming an exponential relationship into a linear relationship by taking
logs of both sides, the result is not the same as minimizing the squared errors in the original
equation. In that situation, minimizing the squared errors in the original equation is an
example of nonlinear estimation, covered in subsequent section.

22.29. D. Y = α eβX. lnY = lnα + βX.


Fit a linear regression between year and the natural log of the claim sizes.
In deviations form, x = X - X̄ = -2, -1, 0, 1, 2.
β̂ = Σxi ln Yi/Σxi2 = {(-2)(ln1020) + (-1)(ln1120) + (0)(ln1130) + (1)(ln1210) + (2)(ln1280)}/10
= .05314. lnα̂ = fitted intercept = average of lnY - β̂X̄ =
(ln1020 + ln1120 + ln1130 + ln1210 + ln1280)/5 - (.05314)(3) = 7.0463 - .15942 = 6.8869.
α = e6.8869 = 979.36.
Predicted claim cost for year 6 is: 979.36 exp[(6)(.05314)] = $1347.14.
Comment: One can use your electronic calculator to fit a linear regression between year and
the natural log of the claim sizes.

23.1. D. For model A, lnY = Dlnα1 + Xlnβ1 + lnε.


The term Dlnα1 incorporates the one time effect via the use of the dummy variable.
The term Xlnβ1 incorporates constant inflation over the entire period.
(Y = α1 β1X is the basic form of constant inflation.)
If ε is lognormal, then lnε is Normal.
However, unlike model B, Model A does not have an intercept. (Thus under model A, the
average claim cost in year 0 is automatically 1.)
For model B, lnY = lnα1 + Dlnα2 + Xlnβ1 + lnε. This incorporates a one time effect, but does
not incorporate a change in the rate of inflation on 1/1/95.
For model C, lnY = lnα1 + Xlnβ1 + XDlnβ2 + lnε.
Model C would be used, if we had assumed the rate of inflation changed on 1/1/95, without a
one time effect.
For model D, lnY = lnα1 + Dlnα2 + Xlnβ1 + XDlnβ2 + lnε.
Model D is used, since we assumed the rate of inflation changed on 1/1/95 in addition to a
one time immediate change in claim costs.
Comment: For example if the fitted Model D were Y = 6702(.88D)(1.057X)(.981XD), (assuming
the regression statistics were significant), that would indicate a claim cost at time = 0 (1990)
of 6702, a one time reduction of 12% on 1/1/95, an annual inflation rate of 5.7% prior to
1/1/95, and an annual inflation rate of: (1.057)(.981) - 1 = 3.7% after 1/1/95.

23.2. B. For a 7 year old SUV with a cell phone the expected cost is:
100 - (2)(7) + (10)(1) + (4)(1) - (7)(1) + (3)(1)(1) = 96.
For a 2 year old car that is not an SUV and that has no cell phone the expected cost is:
100 - (2)(2) + (10)(0) + (4)(0) - (2)(0) + (3)(0)(0) = 96.
The difference in expected costs is zero.
Comment: Similar to Course 120 Sample Exam #2, Q.9.
For a new car that is not an SUV and has no cell phone, the expected cost is 100.
This is not intended as a realistic model of insurance costs.
23.3. A. For not SUV and no cell phone the expected cost is: 100 - 2 age.
Averaged over these vehicles: 100 - (2)(6.7) = 86.6.
For not SUV and with cell phone the expected cost is: 100 - 2 age + 4 = 104 - 2 age.
Averaged over these vehicles: 104 - (2)(6.1) = 91.8.
For SUV and no cell phone the expected cost is: 100 - 2 age + 10 - age = 110 - 3 age.
Averaged over these vehicles: 110 - (3)(4.9) = 95.3.
For SUV and with cell phone the expected cost is: 100 - 2 age + 10 + 4 - age + 3 =
117 - 3 age. Averaged over these vehicles: 117 - (3)(4.4) = 103.8.
Overall average = {(3000)(86.6)+ (4000)(91.8)+ (1000)(95.3)+ (2000)(103.8)}/10000 = 93.0.
Alternately, E[Xi] = average age = 5.82. E[D1i] = portion SUV = .3.
E[D2i] = portion with cell phones = .6. E[XiD1i] = average age of SUVs = 4.567.
E[D1i D2i] = portion that are SUVs with cell phones = .2.
Overall average = 100 - (2)(5.82) + (10)(.3) + (4)(.6) - (4.567)(.3) + (3)(.2) = 93.0.
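A tiny sketch (not from the guide) that evaluates the fitted dummy-variable model of 23.2-23.3 over the four vehicle groups given in the problem and averages the results:

```python
# Fitted model from 23.2: cost = 100 - 2*age + 10*suv + 4*phone - age*suv + 3*suv*phone.
def cost(age, suv, phone):
    return 100 - 2 * age + 10 * suv + 4 * phone - age * suv + 3 * suv * phone

# (count, mean age, SUV dummy, cell phone dummy) for the four groups in 23.3.
groups = [(3000, 6.7, 0, 0), (4000, 6.1, 0, 1), (1000, 4.9, 1, 0), (2000, 4.4, 1, 1)]

total = sum(n * cost(age, suv, phone) for n, age, suv, phone in groups)
print(total / sum(n for n, *_ in groups))   # overall average, about 93.0
```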

23.4. Comparing the model with no X4 and X4X5 terms to that with both:
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .7695)/2} /{(1 - .9401)/(100 - 8)} = 131.0.
This has 2 and 92 degrees of freedom.
One can also usefully, compare the model with no X4 and X4X5 terms to that with no X4X5 :
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9331 - .7695)/1} /{(1 - .9331)/(100 - 7)} = 227.
This has 1 and 93 degrees of freedom.

23.5. Comparing the model with no X3 term to that with X3:


F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .8685)/1} /{(1 - .9401)/(100 - 8)} = 110.0.
This has 1 and 92 degrees of freedom.
Comment: The t-statistic is √110.0 = 10.49.

23.6. Comparing the model with no X2 and X22 terms to that with both:
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .3605)/2} /{(1 - .9401)/(100 - 8)} = 445.
This has 2 and 92 degrees of freedom.
One can also usefully, compare the model with no X2 and X22 terms to that with no X22:
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9264 - .3605)/1} /{(1 - .9264)/(100 - 7)} = 715.
This has 1 and 93 degrees of freedom.

23.7. Comparing the model with no X6 term to that with X6:


F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .9212)/1} /{(1 - .9401)/(100 - 8)} = 29.0.
This has 1 and 92 degrees of freedom.
Comment: The t-statistic is √29.0 = 5.39.
23.8. Comparing the model with no X5 and X4X5 terms to that with both:
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .8234)/2} /{(1 - .9401)/(100 - 8)} = 89.6.
This has 2 and 92 degrees of freedom.
One can also usefully, compare the model with no X5 and X4X5 terms to that with no X4X5 :
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9331 - .8234)/1} /{(1 - .9331)/(100 - 7)} = 152.5.
This has 1 and 93 degrees of freedom.

23.9. Comparing the model with no X4X5 term to that with X4X5:
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .9331)/1} /{(1 - .9401)/(100 - 8)} = 10.75.
This has 1 and 92 degrees of freedom.
Comment: The t-statistic is √10.75 = 3.28.

23.10. Comparing the model with no X22 term to that with X22:
F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.9401 - .9264)/1} /{(1 - .9401)/(100 - 8)} = 21.04.
This has 1 and 92 degrees of freedom.
Comment: The t-statistic is -√21.04 = -4.59. (The fitted coefficient is negative, so t < 0.)

23.11. The model should include many independent variables, in addition to whether or not
the school has soda machines.
For example dummy variables might include: whether or not the school has candy machines,
whether or not the school has any machines besides soda and candy machines for example
those that allow students to buy juice or healthy snacks like apples, the average age of the
students, the percentage of students that are female, the percentage of students in various
ethnic groups, etc.
One would probably want one or more variables measuring the socioeconomic status of the
students, for example how many students qualify for the federal school lunch program.
Some measure of the academic performance of the school might be a useful variable.
The important point is that there are many other relevant variables than whether there are
soda machines in a school. Unless one accounts for many of them, the model will be
incomplete and the results of any test could be spurious.
Once one has a relatively good model, one could perform a one-sided t-test on the coefficient
of the dummy variable for soda machines, and see whether it is significantly different from
zero and positive.
Comment: In a practical application one would talk to people who know something about the
subject and/or read the literature, in order to obtain ideas for potentially useful variables.

23.12. 1. Y = α + βX + ε.
Assumes that gender has no effect.
2. Y = α + βX + γD + ε.
Assumes the same slope by gender, but the intercept for males is α and for females is α + γ.
3. Y = α + βX + γDX + ε.
Assumes the same intercept by gender, but the slope for males is β and for females is β + γ.
4. Y = α + βX + γD + δDX + ε.
Assumes the intercept for males is α and for females is α + γ, while the slope for males is β
and for females is β + δ. However, unlike the next case, we do assume the same variance of
the error terms for the two genders.
5. Y = α1 + β1X + ε1 for males, and Y = α2 + β2X + ε2 for females.
We assume two totally separate models, one for each gender.
The error terms for the two genders are assumed to have different variances.
We estimate two separate regressions.
Comment: There are possibly other models that one could list, but these are the five listed at
page 124-125 of Pindyck and Rubinfeld.
The exact notation used for the coefficients is not important.

23.13. D. The F-Statistic to test whether β2 = β3 = β4 = 0 is:


{(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {RSS/(k-1)}/{ESS/(N - k)} =
{(1155820 - 1143071)/3}/{1143071/(1000 - 4)} = 3.70, with 3 and 996 degrees of freedom.
Source        Sum of Squares   Degrees of Freedom   Mean Sum of Squares   F-Ratio
Regression            12749              3                 4249.67          3.70
Error               1143071            996                 1147.66
Total               1155820            999
Comment: It turns out that for 3 and 996 degrees of freedom the critical values for 5% and 1%
are 2.61 and 3.80. (These are not shown in the table attached to the exam, since ν2 is too
large.) Therefore, since 2.61 < 3.70 < 3.80, season is significant at the 5% level, but not at the
1% level.

23.14. E. For 7 years experience D1i = 1. For Chicago D2i = 0 and D3i = 1.
Ŷ = 12 + (3)(7) + (2)(7 - 5)(1) - (2)(7 - 2)(0) + (7 - 5)(1) = 12 + 21 + 4 + 0 + 2 = 39.

23.15. B. For fires at least 4 kilometers from the fire station X2i = 1.
For city A, the average fire damage is: 8 + 5X1i + 2(X1i - 4) + 9 - 2X1i = 9 + 5X1i.
For city B, the average fire damage is: 8 + 5X1i + 2(X1i - 4) = 7X1i.
Setting the damages equal: 9 + 5X1i = 7X1i ⇒ X1i = 4.5.
23.16. C. In order to test whether class section is a significant variable, we run a regression
on the restricted model with the coefficients of the (dummy) variables that determine class
section set equal to zero: β4 = β5 = 0.
ESSUR = Error Sum of Squares of the Unrestricted model = (1 - R2UR)TSS = .06TSS.
ESSR = Error Sum of Squares of the Restricted model = (1 - R2R)TSS = .085TSS.
q = dimension of the restriction = independent variables for unrestricted model -
independent variables restricted model = 5 - 3 = 2.
N = number of observations = 42.
k = independent variables for the unrestricted model = 5.
{(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(.085TSS - .06TSS)/2} /{.06TSS/(42 - 5)} = 7.71 is an
F-Statistic with 2 and 37 degrees of freedom.
Alternately, F = {(R2UR - R2R)/q}/{(1 - R2UR)/(N - k)} = {(.940 - .915)/2} /{(1 - .940)/(42 - 5)} =
7.71.
Comment: See formulas 5.20 and 5.21 in Pindyck and Rubinfeld.
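The R2 form of the F-Test used in solutions 23.9, 23.10, and 23.16 is easy to verify numerically.
Here is a minimal Python sketch (the function name and layout are my own, not from the textbook):

def f_from_r2(r2_ur, r2_r, q, n, k):
    # F = {(R2_UR - R2_R)/q} / {(1 - R2_UR)/(N - k)}, with q and N - k degrees of freedom
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k))

print(f_from_r2(0.940, 0.915, 2, 42, 5))     # about 7.71, with 2 and 37 d.f. (solution 23.16)
print(f_from_r2(0.9401, 0.9331, 1, 100, 8))  # about 10.75, with 1 and 92 d.f. (solution 23.9)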

23.17. B. For model A, lnY = Dlnα1 + Xlnβ1 + lnε.
The term Dlnα1 incorporates the one-time effect via the use of the dummy variable.
The term Xlnβ1 incorporates constant inflation over the entire period.
(Y = α1β1^X is the basic form of constant inflation.)
If ε is lognormal, then lnε is Normal.
However, unlike model B, Model A does not have an intercept. (Thus under model A, the
average claim cost in year 0 is automatically 1.)
For model B, lnY = lnα1 + Dlnα2 + Xlnβ1 + lnε. This is what we want.
For model C, lnY = lnα1 + Xlnβ1 + XDlnβ2 + lnε.
Model C would be used, if we had assumed the rate of inflation changed on 1/1/97.
For model D, lnY = lnα1 + Dlnα2 + Xlnβ1 + XDlnβ2 + lnε.
Model D would be used, if we had assumed the rate of inflation changed on 1/1/97 in
addition to a one time immediate change in claim costs.
For model E, Y = α1 α2^D X^β1 ε.
Model E would be used, if we had assumed Y increases as an unknown power of X, rather
than via a constant rate of inflation.
Comment: For years 1997 and later, Y is multiplied by α2.
If the claims manager’s assertion were correct, then α̂2 ≅ .8.
For example, if the fitted Model B were Y = 2345(.83^D)(1.043^X), (assuming the regression
statistics were significant), that would indicate a claim cost at time = 0 (presumably 1990) of
2345, a one-time reduction of 17% on 1/1/97, and an annual inflation rate of 4.3%.
23.18. E. If the woman has post-secondary education, then E = -1 and F = -1.
If the woman has more than 2 children, then G = -1 and H = -1.
Thus the average of ln(wages) over women with post-secondary education and more than 2
children is: a - b1 - b2 - c1 - c2. The average ln(wages) over all women is: a.
Therefore, the differential for women with post-secondary education and more than 2 children
is: -b1 - b2 - c1 - c2 .
Comment: See Appendix 5.1 of Pindyck and Rubinfeld. If the woman has not completed high
school, then E = 1 and F = 0. If the woman has completed high school (with no
post-secondary education), then E = 0 and F = 1. If the woman has post-secondary
education, then E = -1 and F = -1. So the combination of the values of E and F describes the
amount of education. Another way to accomplish the same goal would be to have two dummy
variables with 0 or 1; the first variable for whether high school was completed or not and the
second variable for whether there was (some) post-secondary education or not. Which
specification is preferable depends on the purpose of the model.
Note that I have followed the textbook and taken the overall average as “a”. The overall
average depends on the number of women in the study from each category. If there are equal
numbers in each category, then the overall average is “a”. If instead we assume in the study
for example, 100 women who have not completed high school, 300 women who have
completed high school but with no post-secondary education, and 200 women with post-
secondary education, then the average is not “a”. In this case, the expected value of the
variable E is: (100 - 200)/600 = -1/6 and the second term contributes -b1/6 to the average.
Similarly, the expected value of the variable F is: (300 - 200)/600 = 1/6 and the second term
contributes b2/6 to the average.

23.19. D. s2 = Σε̂i2/(N - k) = 92/(10 - 2) = 11.5.
6 of the 10 Xi are zero, and 4 are 1. X̄ = .4. 6 of the 10 xi are -.4, and 4 are .6.
Σxi2 = (6)(0 - .4)2 + (4)(1 - .4)2 = 2.4.
sβ̂2 = s2/Σxi2 = 11.5/2.4 = 4.792. sβ̂ = 2.189. t = β̂/sβ̂ = 4/2.189 = 1.827.
Comment: For N - k = 10 - 2 = 8 degrees of freedom, since 1.827 < 1.860 we do not reject H0
at 10%.

23.20. E. For the first group, the predicted value of Y21j is: δ + .75Y11j.
For the second group, the predicted value of Y22j is: δ + .75Y12j + θ.
The sum of the squared errors is:
Σj (Y21j - δ - .75Y11j)2 + Σj (Y22j - δ - .75Y12j - θ)2, where each sum runs over j = 1 to n.
Setting the partial derivative with respect to δ equal to zero:
0 = -2Σ(Y21j - δ - .75Y11j) - 2Σ(Y22j - δ - .75Y12j - θ). ⇒
0 = ΣY21j - Σδ - .75ΣY11j + ΣY22j - Σδ - .75ΣY12j - Σθ. ⇒
30n - nδ - .75(40n) + 37n - nδ - (.75)(41n) - nθ = 0. ⇒ 2δ + θ = 6.25.
Setting the partial derivative with respect to θ equal to zero:
0 = -2Σ(Y22j - δ - .75Y12j - θ). ⇒ 0 = ΣY22j - Σδ - .75ΣY12j - Σθ. ⇒
37n - nδ - (.75)(41n) - nθ = 0. ⇒ δ + θ = 6.25.
Subtracting twice the second equation from the first equation: θ = 6.25.
Comment: The estimated δ = 0. The average of Y21j is 30. ⇒ ΣY21j = 30n.

24.1. A. The rate of inflation before time 6 is β2 and after time 6 is β2β3.
The two rates would be the same if β3 = 1.
Thus we apply a t-test to test the hypothesis β3 = 1.
H0 is β3 = 1. t = (β̂3 - 1)/sβ̂3 = .02/.011 = 1.818, with 30 - 3 = 27 degrees of freedom.
Since 1.703 < 1.818 < 2.052, we reject H0 at 10% and do not reject at 5%.
At the 10% level the two rates of inflation are significantly different.

24.2. The desired form of the model is: Y = β1 + β2X + β3(X - 1000)D + ε,
where D = 1 for X > 1000 and 0 for X ≤ 1000.
The first three rows, out of fifteen rows in total, of the design matrix X are:
(1  1150  150)
(1   840    0)
(1   900    0)
X'X is the 3 x 3 matrix:
(   15      14860     1390 )
( 14860  15561000  1785100 )
(  1390   1785100   395100 )
(X'X)-1 is:
(  3.4589     -0.0039592        0.0057191     )
( -0.0039592   4.6652 x 10-6   -7.1490 x 10-6 )
(  0.0057191  -7.1490 x 10-6    0.000014711   )
X'Y = (27.37, 24791.1, 1312.4)'.
β̂ = (X'X)-1X'Y = (4.024, -.002090, -.001394)'.
Ŷ = 4.02 - .00209X - .00139(X - 1000)D, where D = 1 for X > 1000 and 0 for X ≤ 1000.
Comment: A graph of the data and the fitted model:
[Graph omitted: Y from about 0.5 to 3 against X from 600 to 1400, with the fitted slope changing at X = 1000.]

24.3. ESS = Y'Y - β̂'X'Y = 56.6105 - 56.5022 = .1083.
s2 = ESS/(N - k) = .1083/(15 - 3) = .00903.
Var[β̂] = s2(X'X)-1.
Variance of the coefficient of (X - 1000)D is: (.00903)(0.000014711) = .0000001328.
Standard Error of the coefficient of (X - 1000)D is: √.0000001328 = .000364.
t = -.00139/.000364 = -3.82.
For 12 degrees of freedom, the critical value for 1% is 3.055.
Since 3.82 > 3.055, we reject H0 at 1%.
At the 1% level, the effect of X on Y is significantly different before and after 1000.
Comment: Using a computer, the p-value is 0.24%.
24.4. & 24.5. The piecewise linear regression model would have the form:
Y = β1 + β2X + β3(X - 27)D1 + β4(X - 60)D2 + ε, where D1 is 0 for X < 27 and D1 is 1 for X ≥ 27,
and D2 is 0 for X < 60 and D2 is 1 for X ≥ 60.
Squared error: Σ{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}2.
Set the partial derivative with respect to β1 equal to zero:
0 = 2Σ{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣYi = Σβ1 + β2ΣXi + β3Σ(D1Xi - 27D1) + β4Σ(D2Xi - 60D2).
⇒ c + g + l = (s + t + u)β1 + (a + e + j)β2 + (e + j - 27t - 27u)β3 + (j - 60u)β4.
Set the partial derivative with respect to β2 equal to zero:
0 = 2ΣXi{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣXiYi = β1ΣXi + β2ΣXi2 + β3Σ(D1Xi2 - 27D1Xi) + β4Σ(D2Xi2 - 60D2Xi).
⇒ d + h + m = (a + e + j)β1 + (b + f + k)β2 + (f + k - 27e - 27j)β3 + (k - 60j)β4.
Set the partial derivative with respect to β3 equal to zero:
0 = 2Σ(Xi - 27)D1{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣXiYiD1 - 27ΣYiD1 = β1Σ(XiD1 - 27D1) + β2Σ(Xi2D1 - 27D1Xi)
+ β3Σ(D1Xi2 - 54D1Xi + 729D1) + β4Σ(D2Xi2 - 87D2Xi + 1620D2).
⇒ h + m - 27g - 27l = (e + j - 27t - 27u)β1 + (f + k - 27e - 27j)β2
+ (f + k - 54e - 54j + 729t + 729u)β3 + (k - 87j + 1620u)β4.
Set the partial derivative with respect to β4 equal to zero:
0 = 2Σ(Xi - 60)D2{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣXiYiD2 - 60ΣYiD2 = β1Σ(XiD2 - 60D2) + β2Σ(Xi2D2 - 60D2Xi)
+ β3Σ(D2Xi2 - 87D2Xi + 1620D2) + β4Σ(D2Xi2 - 120D2Xi + 3600D2).
⇒ m - 60l = (j - 60u)β1 + (k - 60j)β2 + (k - 87j + 1620u)β3 + (k - 120j + 3600u)β4.
Comment: The fitted model might look something like this, with three different slopes:
[Sketch omitted: frequency against age from 17 to 80, with slope changes at ages 27 and 60.]
24.6. E. This is a piecewise linear model, a special case of spline functions. Thus A is false.
The model has two structural breaks, one at t0 and one at t1. Thus C is false.
At the structural breaks, the slopes change. For example, before t0 the slope is β2 while after
t0 the slope is β2 + β3. The intercepts also change, however, the purpose of the dummy
variables is to account for shifts in the slope. Thus D is “false”.
At t0 the function is β1 + β2 Yt0 = (β1 - β3 Yt0 ) + (β2 + β3)Yt0 , thus it is continuous at t0.
At t1 the function is (β1 - β3 Yt0 ) + (β2 + β3)Yt1 = (β1 - β3 Yt0 - β4 Yt1) + (β2 + β3 + β4) Yt1, thus it is
continuous at t1. Thus B is false and E is true.
Comment: See pages 136 to 137 of Pindyck and Rubinfeld.

25.1. D. In this case one can just duplicate the point (3, 4) and perform an unweighted
regression. Thus the X values are: 0, 3, 3, 10 and the Y values are: 2, 4, 4, 6.
The slope is: {((1/N)ΣXiYi) - ((1/N)ΣXi) ((1/N)ΣYi) } / { ((1/N)ΣXi2) - ((1/N)ΣXi)2 } =
{(21) - (4)(4)} /{(29.5) - (4)2} = 5/13.5 = .370.
X Y XY X^2
0 2 0 0
3 4 12 9
3 4 12 9
10 6 60 100
Average 4 4 21 29.5
Alternately, in deviations form:
xi = Xi - ΣwiXi = (0, 3, 10) - {(1/4)(0) + (1/2)(3) + (1/4)(10)} = (-4, -1, 6).
yi = Yi - ΣwiYi = (2, 4, 6) - {(1/4)(2) + (1/2)(4) + (1/4)(6)} = (-2, 0, 2).
β̂ = Σwixiyi/Σwixi2 =
{(1/4)(-4)(-2) + (1/2)(-1)(0) + (1/4)(6)(2)}/{(1/4)(-4)2 + (1/2)(-1)2 + (1/4)(6)2} = 5/13.5 = .370.
α̂ = ΣwiYi - β̂ΣwiXi = 4 - (.370)(4) = 2.52.

25.2. B. X̄ = ΣwiXi = 2.7. x = X - X̄ = (-1.7, 1.3, 6.3).
Ȳ = ΣwiYi = 23. y = Y - Ȳ = (-8, 7, 27).
Σwixiyi = (.6)(-1.7)(-8) + (.3)(1.3)(7) + (.1)(6.3)(27) = 27.9.
Σwixi2 = (.6)(-1.7)2 + (.3)(1.3)2 + (.1)(6.3)2 = 6.21.
slope = Σwixiyi/Σwixi2 = 27.9/6.21 = 4.493.
Intercept = Ȳ - (slope)X̄ = 23 - (4.493)(2.7) = 10.87.
The fitted value of Y for X = 10 is: 10.87 + (10)( 4.493) = 55.80.
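A minimal Python sketch of the weighted regression in deviations form used in solutions 25.1 and 25.2;
the X and Y values below are recovered from the deviations shown above (X = 1, 4, 9 and Y = 15, 30, 50
are my reconstruction, not given explicitly):

import numpy as np

def wls_line(X, Y, w):
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                        # weights normalized to sum to one
    xbar, ybar = w @ X, w @ Y              # weighted means
    x, y = X - xbar, Y - ybar              # deviations form
    slope = (w * x * y).sum() / (w * x * x).sum()
    return ybar - slope * xbar, slope      # (intercept, slope)

X = np.array([1.0, 4.0, 9.0])
Y = np.array([15.0, 30.0, 50.0])
a, b = wls_line(X, Y, [0.6, 0.3, 0.1])
print(a, b, a + 10 * b)                    # about 10.87, 4.493, and 55.80, as above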

25.3. D. β̂ = ΣwiXiYi/ΣwiXi2 =
{(.3)(1)(3) + (.4)(5)(8) + (.2)(10)(13) + (.1)(20)(32)}/{(.3)(1)2 + (.4)(5)2 + (.2)(10)2 + (.1)(20)2} =
106.9/70.3 = 1.521.

25.4. A. Let X be the age. Then the model is: u = βX. vi will be the fitted values of u.
Weighted squared error = F = Σwi (ui - βXi)2.
Setting the partial derivative with respect to β equal to zero:
0 = -2 Σwi Xi(ui - βXi). ⇒ ΣwiuiXi = βΣwi Xi2. ⇒ β = ΣwiuiXi / Σwi Xi2.
β̂ = {(25)(20) + (28)(25) + (30)(30)}/{(100)(202) + (112)(252) + (100)(302)} =
2100/200000 = .0105. v1 = 20β̂ = (20)(.0105) = .210.
Comment: A weighted regression with no intercept. "Directly proportional to" ⇔ no intercept.

25.5. C. The Buhlmann Credibility is the slope of the least squares line fit to the Bayesian
Estimates. One needs to do a weighted regression with the weights equal to the a priori
probabilities; in this case since the a priori probabilities are the same one can perform an
unweighted regression.
The X values are: 0, 3, 12 and the Y values are: 1, 6, 8. The slope is:
{(1/N)ΣXiYi - ((1/N)ΣXi)((1/N)ΣYi)}/{(1/N)ΣXi2 - ((1/N)ΣXi)2} = {38 - (5)(5)}/{51 - (5)(5)} = 13/26 = 0.5.
X Y XY X^2
0 1 0 0
3 6 18 9
12 8 96 144
Average 5 5 38 51
Thus the Buhlmann Credibility is .50 and the new estimates are:
(observation)Z + (prior mean)(1 - Z) = (0, 3, 12)(.5) + (5)(1 - .5) = (0, 1.5, 6) + 2.5 =
(2.5, 4.0, 8.5).

25.6. B. ∆ vx = a. ⇒ vx is a linear function of x. v = α + βx.


The sum of the squared deviations, weighted by exposures is:
300(α - 3)2 + 200(α + β - 6)2 + 100(α + 2β - 11)2.
Set the partial derivative with respect to α equal to zero:
0 = 600(α - 3) + 400(α + β - 6) + 200(α + 2β - 11). ⇒ 3α + 2β = 16.
Set the partial derivative with respect to β equal to zero:
0 = 400(α + β - 6) + 400(α + 2β - 11). ⇒ 2α + 3β = 17.
Solving, α = 2.8 and β = 3.8.
v1 = α + β = 2.8 + 3.8 = 6.6.
Alternately, one can weight the first data point three times the third by pretending it appeared
three times, and weight the second data point twice the third by pretending it appeared twice.
X = (0, 0, 0, 1, 1, 2). X̄ = 2/3. x = (-2/3, -2/3, -2/3, 1/3, 1/3, 4/3).
Y = (3, 3, 3, 6, 6, 11). Ȳ = 16/3. y = (-7/3, -7/3, -7/3, 2/3, 2/3, 17/3).
β̂ = Σxiyi/Σxi2 = (114/9)/(30/9) = 3.8. α̂ = Ȳ - β̂X̄ = 16/3 - (3.8)(2/3) = 2.8.
v1 = α̂ + β̂ = 2.8 + 3.8 = 6.6.
25.7. D. Weighted Sum of Squared Errors is:
300(30.5a - 3/300)2 + 400(40.5a - 10/400)2 + 300(50.5a - 15/300)2.
Setting the derivative with respect to a equal to zero:
0 = 2{(30.5)(300)(30.5a - .01) + (40.5)(400)(40.5a - .025) + (50.5)(300)(50.5a - .05)}.
a = 12.54/17002.5 = .000738.

25.8. D. Buhlmann Credibility is the least squares linear approximation to the Bayesian
analysis result. The given expression is the squared error of a linear estimate. Thus the
values of a and b that minimize the given expression correspond to the Buhlmann credibility
estimate. In this case, the new estimate using Buhlmann Credibility =
(prior mean)(1-Z) + (observation) Z = 2(1 - 1/12) + 1/12(observation) = 22/12 + 1/12(obser.).
Therefore a = 22/12 and b = 1/12. Alternately, one can minimize the given expression.
One takes the partial derivatives with respect to a and b and sets them equal to zero.
Σ 2Pi (a + bRi - Ei) = 0, and Σ 2Pi Ri (a + bRi - Ei) = 0.
Therefore, (2/3)(a + b(0) - 7/4) + (2/9)(a + b(2) - 55/24) + (1/9)(a + b(14) - 35/12) = 0
⇒ a + 2 b = 2, and (2/9)(2)(a + b(2) - 55/24) + (1/9)(14)(a + b(14) - 35/12) = 0
18 a + 204 b = 300/6 =50 ⇒ 9 a + 102 b = 25. One can either solve these two
simultaneous linear equations by matrix methods or try the choices A through E.
Comment: Normally one would not be given the Buhlmann credibility factor as was the case
here, allowing the first method of solution, which does not use the information given on the
values of the Bayesian analysis estimates. Note that the Bayesian estimates balance to the a
priori mean of 2: (2/3)(7/4) + (2/9)(55/24) + (1/9)(35/12) = (126 + 55 + 35)/108 = 216/108 = 2.

25.9. E. v3 = 10 = 3a + b. ⇒ b = 10 - 3a.
The sum of the squared deviations, weighted by exposures is:
1(a + b - 4)2 + 1(2a + b - 6)2 + 2(u3 - 10)2 = (6 - 2a)2 + (4 - a)2 + 2(u3 - 10)2.
Set the partial derivative with respect to a equal to zero:
0 = -4(6 - 2a) - 2(4 - a). ⇒ a = 32/10 = 3.2.
v2 = 2a + b = 10 - a = 10 - 3.2 = 6.8.
Comment: b = 10 - 3a = 10 - (3)(3.2) = 0.4. Thus the fitted values are: (3.6, 6.8, 10).
The weighted average of the fitted values is: (3.6 + 6.8 + 20)/4 = 7.6.
This must be equal to the weighted average of the observed values:
7.6 = (4 + 6 + 2u3)/4. ⇒ u3 = 10.2.

25.10. A. q10 = 1/30. q20 = 5/90 = 1/18. q30 = 7/80.
q̂10 = 31a/3. q̂20 = 61a/3. q̂30 = 91a/3.
Weighted sum of squares is:
30(31a/3 - 1/30)2 + 90(61a/3 - 1/18)2 + 80(91a/3 - 7/80)2 =
(10/9){3(31a - 1/10)2 + 9(61a - 1/6)2 + 8(91a - 21/80)2}.
Setting the partial derivative with respect to a equal to zero:
0 = (20/9){93(31a - 1/10) + 549(61a - 1/6) + 728(91a - 21/80)}.
1000a = 1000(9.3 + 91.5 + 191.1)/(2883 + 33489 + 66248) = 291900/102620 = 2.844.
25.11. C. The line formed by the Buhlmann Credibility estimates is the weighted least
squares line to the Bayesian estimates, with the a priori probability of each outcome acting as
the weights. Since the a priori probabilities are equal we fit an unweighted regression.
X = 1, 2, 3. X̄ = 2. x = X - X̄ = -1, 0, 1. Y = 1.5, 1.5, 3. Ȳ = 2. y = Y - Ȳ = -.5, -.5, 1.
Σxiyi = 1.5. Σxi2 = 2. slope = Σxiyi/Σxi2 = 1.5/2 = .75 = Z.
Intercept = Ȳ - (slope)X̄ = 2 - (.75)(2) = .5.
Bühlmann credibility estimate of the second observation = .5 + .75(first observation).
Given that the first observation is 1, the Bühlmann credibility estimate is: (.5) + (.75)(1) =
1.25.
Comment: The Bühlmann credibility estimate given 2 is 2; the estimate given 3 is 2.75.
The Bayesian Estimates average to 2, the overall a priori mean. Bayesian estimates are in
balance. The Bühlmann Estimates are also in balance; they also average to 2.

26.1. Homoscedasticity is when the error terms of a regression have a constant variance.
Heteroscedasticity is when the error terms of a regression do not have a constant variance.

26.2. A. For the first graph, the squared residuals tend to be larger for later values,
indicating that the error terms of the regression do not have a constant variance.
Comment: Due to random fluctuation, it can be hard to pick out a pattern, even if one exists.

26.3. In deviations form, x = (-5, 0, 5), and β̂ = ΣxiYi/Σxi2 = (-5Y1 + 5Y3)/50 = .1(Y3 - Y1) =
.1{3 + (2)(10) + ε3 - 3 - ε1} = 2 + .1(ε3 - ε1).
α̂ = Ȳ - β̂X̄ = 3 + 2X̄ + (ε1 + ε2 + ε3)/3 - {2 + .1(ε3 - ε1)}X̄ = 3 + (ε1 + ε2 + ε3)/3 - .5(ε3 - ε1)
= 3 + (5ε1 + 2ε2 - ε3)/6.
ε1   ε2   ε3   Fitted beta   Fitted alpha   residual 1   residual 2   residual 3      ESS
 1    2    4       2.3          3.8333        -0.1667       0.3333      -0.1667      0.1667
 1    2   -4       1.5          5.1667         1.1667      -2.3333       1.1667      8.1667
 1   -2    4       2.3          2.5000        -1.5000       3.0000      -1.5000     13.5000
 1   -2   -4       1.5          3.8333        -0.1667       0.3333      -0.1667      0.1667
-1    2    4       2.5          2.1667         0.1667      -0.3333       0.1667      0.1667
-1    2   -4       1.7          3.5000         1.5000      -3.0000       1.5000     13.5000
-1   -2    4       2.5          0.8333        -1.1667       2.3333      -1.1667      8.1667
-1   -2   -4       1.7          2.1667         0.1667      -0.3333       0.1667      0.1667
Avg.               2            3             -0.0000       0.0000      -0.0000      5.5000
For example, for the first set, Y1 = 3 + (2)(0) + 1 = 4, Y2 = 3 + (2)(5) + 2 = 15,
and Y3 = 3 + (2)(10) + 4 = 27.
β̂ = (27 - 4)/10 = 2.3. α̂ = (4 + 15 + 27)/3 - (2.3)(5) = 3.8333.
Ŷ1 = 3.8333. ε̂1 = Ŷ1 - Y1 = 3.8333 - 4 = -0.1667.
Ŷ2 = 3.8333 + (2.3)(5) = 15.3333. ε̂2 = Ŷ2 - Y2 = 15.3333 - 15 = 0.3333.
Ŷ3 = 3.8333 + (2.3)(10) = 26.8333. ε̂3 = Ŷ3 - Y3 = 26.8333 - 27 = -0.1667.
ESS = (-0.1667)2 + (0.3333)2 + (-0.1667)2 = 0.1667.
Comment: Note that the estimates of alpha and beta are both unbiased.
Var[β̂] = {(2.3 - 2)2 + (1.5 - 2)2 + (2.3 - 2)2 + (1.5 - 2)2 + (2.5 - 2)2 + (1.7 - 2)2 + (2.5 - 2)2
+ (1.7 - 2)2}/8 = 0.17.
This matches the result of using the formula, Var[β̂] = Σxi2σi2/(Σxi2)2
= {(-5)2(12) + (0)2(22) + (5)2(42)}/{(-5)2 + (0)2 + (5)2}2 = 425/2500 = 0.17.
E[ESS/(N - 2)] = E[ESS] = 5.5. Since the variances of the εi are not equal, there is no single σ2, and
E[ESS/(N - 2)] cannot be equal to it, as would be the case under homoscedasticity.
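The enumeration in the table above can be reproduced with a short Python sketch (a numerical check,
not part of the original solution); it also confirms the direct formula for Var[β̂] under heteroscedasticity:

import numpy as np
from itertools import product

X = np.array([0.0, 5.0, 10.0])             # so x = (-5, 0, 5) in deviations form
x = X - X.mean()
betas, esss = [], []
for e1, e2, e3 in product([1, -1], [2, -2], [4, -4]):   # the 8 equally likely error sets
    Y = 3 + 2 * X + np.array([e1, e2, e3], dtype=float)
    b = (x * Y).sum() / (x * x).sum()      # fitted slope
    a = Y.mean() - b * X.mean()            # fitted intercept
    betas.append(b)
    esss.append(((Y - a - b * X) ** 2).sum())
print(np.mean(betas), np.var(betas))       # 2.0 and 0.17
print(np.mean(esss))                       # 5.5
sig2 = np.array([1.0, 4.0, 16.0])          # Var(eps_i) = 1, 4, 16
print((x * x * sig2).sum() / (x * x).sum() ** 2)        # 0.17, the direct formula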

27.1. We calculate the F statistic used in the Goldfeld-Quandt test for heteroscedasticity in
the errors. For the first regression, ESS = (1 - R2)TSS = 19.7.
For the second regression, ESS = (1 - R2)TSS = 33.9.
F = {(ESS for second regression)/(30 - 4)}/{(ESS for first regression)/(30 - 4)} = 33.9/19.7 = 1.72.
At 26 and 24 degrees of freedom, the critical value at 5% is 1.95. At 26 and 30 degrees of
freedom, the critical value at 5% is 1.90. Thus at 26 and 26 degrees of freedom, the critical
value at 5% is about 1.93. Since 1.72 < 1.93, we do not reject at 5% the null
hypothesis that the variance of the errors is constant.
27.2. A. The null hypothesis is that there is homoscedasticity, not heteroscedasticity.
Statements B, C, and D are true.

27.3. B. X̄ = 12.2, x = (-11.2, -7.2, -6.2, -4.2, -3.2, .8, 1.8, 5.8, 10.8, 12.8).
Ȳ = 15.3, y = (-12.3, -10.3, -6.3, -4.3, -2.3, .7, .7, 7.7, 14.7, 11.7). Σxiyi = 631.4. Σxi2 = 561.6.
β̂ = Σxiyi/Σxi2 = 631.4/561.6 = 1.124. α̂ = Ȳ - β̂X̄ = 1.584. α̂ + 23β̂ = 27.4.

^
27.4. E. For the fitted regression, Ŷ = 1.584 + 1.124X:
Ŷ = (2.708, 7.204, 8.328, 10.576, 11.7, 16.196, 17.32, 21.816, 27.436, 29.684).
ε̂ = Y - Ŷ = (0.292, -2.204, 0.672, 0.424, 1.300, -0.196, -1.320, 1.184, 2.564, -2.684).
ESS = Σε̂i2 = 24.225. Let σ2 = ESS/N = 24.225/10 = 2.4225.
Run a linear regression of ε̂i2/σ2 on X. X̄ = 12.2.
Z = ε̂i2/σ2 = (0.0352, 2.005, 0.1864, 0.0742, 0.6976, 0.0159, 0.7193, 0.5787, 2.714, 2.974).
Z̄ = 1.000.
z = (-0.9648, 1.005, -0.8136, -0.9258, -0.3024, -0.9841, -0.2807, -0.4213, 1.714, 1.974)
Σxizi = 53.50. Σxi2 = 561.6.
β̂ = Σxizi/Σxi2 = 53.50/561.6 = .0953. α̂ = Z̄ - β̂X̄ = -.163.
Ẑ = (-0.0677, 0.3135, 0.4088, 0.5994, 0.6947, 1.0759, 1.1712, 1.5524, 2.0289, 2.2195).
RSS = Σ(Ẑ - Z̄)2 = 5.10. RSS/2 = 2.55.
Compare RSS/2 to the Chi-Square Distribution with 1 degree of freedom. Since 2.55 < 3.84
we do not reject the null hypothesis, homoscedasticity, at 5%.

27.5. A. From the previous solution, for the fitted regression, Ŷ = 1.584 + 1.124X:
ε̂ = Y - Ŷ = (0.292, -2.204, 0.672, 0.424, 1.300, -0.196, -1.320, 1.184, 2.564, -2.684).
Run a linear regression of ε̂i2 on X. X̄ = 12.2.
Z = ε̂i2 = (0.0853, 4.858, 0.4516, 0.1798, 1.6900, 0.0384, 1.7424, 1.4019, 6.5741, 7.2039).
Z̄ = 2.4225.
z = (-2.3372, 2.4351, -1.9709, -2.2427, -0.7325, -2.3841, -0.6801, -1.0206, 4.1516, 4.7814)
Σxizi = 129.6. Σxi2 = 561.6. β̂ = Σxizi/Σxi2 = 129.6/561.6 = .231. α̂ = Z̄ - β̂X̄ = -.396.
Ẑ = (-0.165, 0.759, 0.990, 1.452, 1.683, 2.607, 2.838, 3.762, 4.917, 5.379).
RSS = Σ(Ẑ - Z̄)2 = 29.97. TSS = Σzi2 = 68.13. R2 = RSS/TSS = 0.440.
N R2 = (10)(.440) = 4.40, has a Chi-Square Distribution with 1 degree of freedom.
Since 3.84 < 4.40, we reject the null hypothesis, homoscedasticity, at 5%.
Since 4.40 < 5.02, we do not reject the null hypothesis, homoscedasticity, at 2.5%.
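Both tests can be checked with a short Python sketch. The data below are recovered from the deviations
shown in solution 27.3 (my reconstruction); the helper function ols is my own:

import numpy as np

X = np.array([1, 5, 6, 8, 9, 13, 14, 18, 23, 25], dtype=float)
Y = np.array([3, 5, 9, 11, 13, 16, 16, 23, 30, 27], dtype=float)

def ols(x, y):
    # simple linear regression with an intercept; returns (intercept, slope)
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - b * x.mean(), b

a, b = ols(X, Y)
resid = Y - (a + b * X)

# White test (solution 27.5): regress the squared residuals on X; N R^2 is Chi-Square with 1 d.f.
a2, b2 = ols(X, resid ** 2)
z = resid ** 2
R2 = (((a2 + b2 * X) - z.mean()) ** 2).sum() / ((z - z.mean()) ** 2).sum()
print(len(X) * R2)                         # about 4.40

# Breusch-Pagan test (solution 27.4): regress resid^2/(ESS/N) on X; RSS/2 is Chi-Square with 1 d.f.
sigma2 = (resid ** 2).sum() / len(X)
a3, b3 = ols(X, resid ** 2 / sigma2)
rss = (((a3 + b3 * X) - (resid ** 2 / sigma2).mean()) ** 2).sum()
print(rss / 2)                             # about 2.55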
27.6. E. Run two separate regressions. One on the first four observations and one on the
last four observations, omitting the middle 10/5 = 2 observations.
For the first four observations: X̄ = 5, x = (-4, 0, 1, 3). Ȳ = 7, y = (-4, -2, 2, 4). Σxiyi = 30.
Σxi2 = 26. β̂ = Σxiyi/Σxi2 = 30/26 = 1.154. α̂ = Ȳ - β̂X̄ = 1.230.
Ŷ = 1.230 + 1.154X = (2.384, 7.000, 8.154, 10.462).
ε̂i = Yi - Ŷi = (.616, -2, .846, .538). ESS = Σε̂i2 = 5.38.
For the last four observations: X̄ = 20, x = (-6, -2, 3, 5). Ȳ = 24, y = (-8, -1, 6, 3). Σxiyi = 83.
Σxi2 = 74. β̂ = Σxiyi/Σxi2 = 83/74 = 1.122. α̂ = Ȳ - β̂X̄ = 1.560.
Ŷ = 1.560 + 1.122X = (17.268, 21.756, 27.366, 29.610).
ε̂i = Yi - Ŷi = (-1.268, 1.244, 2.634, -2.610). ESS = Σε̂i2 = 16.91.
F = {(ESS for second regression)/(4 - 2)}/{(ESS for first regression)/(4 - 2)} = 16.91/5.38 = 3.14.
Comment: You would order the observations from smallest to largest X, if it had not already
been done for you. At 2 and 2 degrees of freedom, the critical value at 5% is 19.0, so we do
not reject at 5% the null hypothesis that the errors are constant. For practical applications of
this test, you would want more total observations.
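A minimal Python sketch of the Goldfeld-Quandt test in solution 27.6, with the data again recovered
from the deviations shown (my reconstruction):

import numpy as np

X = np.array([1, 5, 6, 8, 9, 13, 14, 18, 23, 25], dtype=float)
Y = np.array([3, 5, 9, 11, 13, 16, 16, 23, 30, 27], dtype=float)

def ess_of_fit(x, y):
    # error sum of squares of a simple linear regression of y on x
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    a = y.mean() - b * x.mean()
    return ((y - a - b * x) ** 2).sum()

order = np.argsort(X)                      # order the observations by X (already ordered here)
X, Y = X[order], Y[order]
ess_low = ess_of_fit(X[:4], Y[:4])         # first four observations
ess_high = ess_of_fit(X[-4:], Y[-4:])      # last four observations, middle two omitted
print(ess_high / ess_low)                  # about 3.14, an F-statistic with 2 and 2 d.f.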

27.7. The absolute values of the residuals seem to be increasing with X, and therefore we
keep the observations in that order. We fit a regression to the first 8 observations:
-35.7999 + 11.4083x - 0.1535489x2 with ESS = 567.536.
We fit a regression to the last 8 observations:
46.2803 + 2.09519x + 0.175957x2 with ESS = 8859.01.
F = 8859.01/567.536 = 15.61 with 8 - 3 = 5 d.f. and 5 d.f.
The 1% critical value for 5 and 5 d.f. is 10.97.
Since 15.61 > 10.97, we reject at 1% the null hypothesis, that there is homoscedasticity.

27.8. Run a linear regression of ^εi 2 on X. The result is:


intercept = -1374.31, and slope = 155.496. R2 = 0.465.
Since the regression of ^εi 2 on X has one independent variable (not counting the intercept),
N R2 has a Chi-Square Distribution, with 1 degree of freedom.
N R2 = (20)(0.465) = 9.30.
For the Chi-Square, the critical value for 1 degree of freedom at 1/2% is 7.88.
Since 9.30 > 7.88, we reject the null hypothesis of homoscedasticity at 1/2%.

27.9. For the regression that was fit to the data, ESS = 6.064842 + ... + 21.91212 = 10719.1.
Take σ2 = ESS/N = 10719.1/20 = 536.
Run a linear regression of ^εi 2/σ2 on X. The result is:
intercept = -2.56401, and slope = 0.290104. RSS = 15.555.
Since the regression of ^εi 2/σ2 on X has one independent variable (not counting the
intercept), RSS/2 has a Chi-Square Distribution, with 1 degree of freedom. RSS/2 = 7.78
For the Chi-Square, the critical value for 1 degree of freedom at 1% is 6.64 and the critical
value for 1 degree of freedom at 1/2% is 7.88. Since 6.64 < 7.78 < 7.88 , we reject the null
hypothesis of homoscedasticity at 1%, but not at 1/2%.

27.10. E. In the Goldfeld-Quandt Test, the test statistic follows an F Distribution.


In the Breusch-Pagan Test, the test statistic follows a Chi-Square Distribution.
In the White Test, the test statistic follows a Chi-Square Distribution.

27.11. C. F = {(ESS for 2nd regression)/(6 - 2)}/{(ESS for 1st regression)/(6 - 2)} =
.93/.79 = 1.18.

27.12. B. Statement A is the White Test; there are 2 d.f. since the formula for the alternative
hypothesis for Var(εi) has two independent variables, not counting the intercept.
Statement C is the Breusch-Pagan Test; there are 2 d.f. since the formula for the alternative
hypothesis for Var(εi) has two independent variables, not counting the intercept.

Statement B is false; as in Statement C, it should be ε̂i2/σ̂2.


Statement D is true. See Equation 6.3 in Pindyck and Rubinfeld.
If we assume Var(εi) = ηXi2, then we can order the observations in assumed increasing order
of Var(εi) and apply the Goldfeld-Quandt test. Statement E is true.

28.1. B. & 28.2. D. We calculate the weights: wi = (1/σi2)/Σ(1/σi2).
i      Xi      Yi      σi2      1/σi2      wi
1      10      10       1        1        400/541
2      40      40       4        1/4      100/541
3     160     100      16        1/16      25/541
4     250     125      25        1/25      16/541
β̂ = {ΣwiXiYi - ΣwiXiΣwiYi}/{ΣwiXi2 - (ΣwiXi)2} =
{2033.3 - (29.575)(23.105)}/{3401.1 - 29.5752} = .5343.
α̂ = ΣwiYi - β̂ΣwiXi = 23.105 - (.5343)(29.575) = 7.303.
Alternately, to correct for heteroscedasticity, we divide each variable by σi = √Var(εi) =
√(Xi/10). The modified variables are:
i σi Xi Yi Xi /σi Yi/σi
1 1 10 10 10 10
2 2 40 40 20 20
3 4 160 100 40 25
4 5 250 125 50 25
Minimize the sum of squared errors for the adjusted model: Yi/σi = α/σi + βXi /σi + εi /σi.
The sum of squared errors is:
(10 - α/1 - 10β)2 + (20 - α/2 - 20β)2 + (25 - α/4 - 40β)2 + (25 - α/5 - 50β)2.
To minimize the squared error, set its partial derivatives with respect to α and β equal to zero:
(10 - α/1 - 10β) + (20 - α/2 - 20β)/2 + (25 - α/4 - 40β)/4 + (25 - α/5 - 50β)/5 = 0, and
10(10 - α/1 - 10β) + 20(20 - α/2 - 20β) + 40(25 - α/4 - 40β) + 50(25 - α/5 - 50β) = 0.
31.25 = 1.3525α + 40β, and 2750 = 40α + 4600β. ⇒ α̂ = 7.303 and β̂ = .5343.
Comment: An unweighted regression would produce the fit: α̂ = 14.8 and β̂ = .47.
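A minimal Python sketch of this weighted least squares fit, using weights proportional to
1/Var(εi) = 10/Xi (the layout is my own, not from the textbook):

import numpy as np

X = np.array([10.0, 40.0, 160.0, 250.0])
Y = np.array([10.0, 40.0, 100.0, 125.0])
w = 1.0 / (X / 10.0)                       # weights proportional to 1/Var(eps_i)
w = w / w.sum()                            # normalized to sum to one
xbar, ybar = w @ X, w @ Y                  # weighted means
b = (w * (X - xbar) * (Y - ybar)).sum() / (w * (X - xbar) ** 2).sum()
a = ybar - b * xbar
print(a, b)                                # about 7.303 and 0.5343, as above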

28.3. D. α̂ = Ȳ = (225 + 290 + 205)/10 = 72.0.
28.4. B. Each Yt should be weighted inversely proportionally to its variance:

α^ = {3(Y1 + Y2 + Y3) + 5(Y4 + Y5 + Y6 + Y7) + 8(Y8 + Y9 + Y10)}/{(3)(3) + (5)(4) + (8)(3)} =


{(3)(225) + (5)(290) + (8)(205)}/53 = 3765/53 = 71.0.
Alternately, we need to minimize Σ (Yt - α)2/σt2, where σt2 = Var(εt).
Taking the partial derivative with respect to α and setting it equal to zero:

0 = -2Σ (Yt - α)/σt2. ⇒ α^ Σ1 /σt2 = Σ Yt /σt2 ⇒ α^ = Σ Yt /σt2 / Σ1 /σt2. Proceed as before.


Alternately, divide each Yt by σt = √Var(εt): Zt = Yt / σt. The new model is Zt = α/ σt + εt/ σt.
Unlike Y, Z has constant variances of the error terms; the model for Z is homoscedastic.
Performing ordinary least squares on a model with a slope but no intercept (treat 1/ σt as xt):

α^ = ΣZt/ σt / Σ1 /σt2 = Σ Yt /σt2 / Σ1 /σt2. Proceed as before.


Comment: Similar to 4, 5/01, Q.21.

28.5. X̄ = 15. x = X - X̄. Ȳ = 25.7333. y = Y - Ȳ. Σxiyi = 485. Σxi2 = 250.
β̂ = Σxiyi/Σxi2 = 485/250 = 1.94. α̂ = Ȳ - β̂X̄ = 25.7333 - (1.94)(15) = -3.367.

28.6. Take the weights proportional to 1/Var(εi); let wi = 1/Xi2/Σ1/Xi2.


1/Σ1/Xi2 = 11.803. wi = 11.803/Xi2. ΣwiXi = 11.803Σ1/Xi = (11.803)(1.08333) = 12.787.
ΣwiYi = 11.803ΣYi /Xi2 = (11.803)(1.81944) = 21.475.
xi = Xi - ΣwiXi = Xi - 12.787. yi = Yi - ΣwiYi = Yi - 21.475. Σwixiyi = 25.986. Σwixi2 = 13.544.
β̂ = Σwixiyi/Σwixi2 = 25.986/13.544 = 1.919.
α̂ = ΣwiYi - β̂ΣwiXi = 21.475 - (1.919)(12.787) = -3.06.
Comment: The weights should sum to one. One subtracts the weighted average in order to
convert to deviations form.

28.7. B. X̄ = (1 + 4 + 10)/3 = 5. x = (-4, -1, 5).
Ȳ = (3 + 9 + 14)/3 = 8.667. y = (-5.667, .333, 5.333).
β̂ = Σxiyi/Σxi2 = 49/42 = 7/6. α̂ = Ȳ - β̂X̄ = 26/3 - (7/6)(5) = 17/6.
Ŷi = (4, 7.5, 14.5). ε̂i = Yi - Ŷi = (-1, 1.5, -.5).
Var[β̂] = Σxi2ε̂i2/(Σxi2)2 = {(-4)2(-1)2 + (-1)2(1.5)2 + (5)2(-.5)2}/{(-4)2 + (-1)2 + (5)2}2
= 24.5/1764 = .0139.
28.8. C. We want the variance of the errors to be equal, rather than depend on X.
If we multiply by a value, the variance of the errors will be multiplied by that value squared.
If we multiply everything by X1/2, the variance of the errors in the new model will be proportional
to: (X1/2)2X-1 = 1. Multiplying the original model, Y = α + βX + ε, by X1/2 produces the new
model: YX1/2 = αX1/2 + βX3/2 + ε∗.
Comment: Similar to 4, 11/01, Q.28. To correct for heteroscedasticity, divide the model by
something proportional to the standard deviation of the error.
In this case, divide by √X-1 = X-1/2, which is equivalent to multiplying by X1/2.

28.9. D. Adjust each variable by dividing by something proportional to StdDev[εi], √Xi.


Yi /√Xi = βXi/√Xi + εi/√Xi = β√Xi + εi /√Xi. The errors now have constant variance and the least
^
squares solution is: β = Σ√Xi Yi /√Xi /Σ(√Xi)2 = ΣYi /ΣXi = 26/14 = 1.86.
Alternately, minimize: Σ{(Yi - βXi)/σ√Xi}2. 0 = -2βΣ√Xi/σ {(Yi - βXi)/σ√Xi} ⇒
^
0 = ΣYi - βXi ⇒ β = ΣYi /ΣXi = 26/14 = 1.86.

28.10. A. Adjust each variable by dividing by something proportional to StdDev[εi], Xi.


Yi /Xi = βXi/Xi + εi/Xi = β + εi /Xi. The errors now have constant variance and the least squares
^
solution is: β = ΣYi /Xi /Σ12 = {(8/1) + (12/2) + (24/3) + (36/4) + (55/5)}/5 = 8.4.
Alternately, minimize: Σ{(Yi - βXi)/σXi}2. 0 = -2β/σ Σ{(Yi - βXi)/σXi} ⇒
^
0 = ΣYi/Xi - Nβ ⇒ β = (1/N)ΣYi /Xi = 42/5 = 8.4.

28.11. E. To correct for heteroscedasticity, we divide each variable by σi = √Var(εi) = xi/2.
The modified variables are:
i      xi      σi      xi/σi      yi      yi/σi
1       1      .5         2        8       16
2       2       1         2        5        5
3       3     1.5         2        3        2
4       4       2         2       -4       -2
Since there is no intercept in the model, β̂ = Σ(xi/σi)(yi/σi)/Σ(xi/σi)2 = 42/16 = 2.625.
Comment: The variables are not in deviations form, since they do not sum to zero. An
intercept of zero is a very poor model for this data. In my opinion, this is a very poor question!

28.12. C. An unweighted average of the Yt would be (8Ȳ1 + 12Ȳ2)/(8 + 12).
However, each Yt should be weighted inversely proportionally to its variance:
α̂ = (8Ȳ1/.4 + 12Ȳ2/.6)/(8/.4 + 12/.6) = (20Ȳ1 + 20Ȳ2)/(20 + 20) = 0.5Ȳ1 + 0.5Ȳ2.
Alternately, we need to minimize Σ(Yt - α)2/σt2, where σt2 = Var(εt).
Taking the partial derivative with respect to α and setting it equal to zero:
0 = -2Σ(Yt - α)/σt2. ⇒ α̂Σ(1/σt2) = Σ(Yt/σt2) ⇒ α̂ = Σ(Yt/σt2)/Σ(1/σt2).
α̂ = {(Y1 + Y2 +...+ Y8)/.4 + (Y9 + Y10 +...+ Y20)/.6}/{8/.4 + 12/.6} =
(8Ȳ1/.4 + 12Ȳ2/.6)/(8/.4 + 12/.6) = (20Ȳ1 + 20Ȳ2)/(20 + 20) = 0.5Ȳ1 + 0.5Ȳ2.
Alternately, divide each Yt by σt = √Var(εt): Zt = Yt / σt.
The new model is Zt = α/ σt + εt/ σt.
Unlike Y, Z has constant variances of the error terms; the model for Z is homoscedastic.
Performing ordinary least squares on a model with a slope but no intercept (treat 1/ σt as xt):

α^ = ΣZt/ σt / Σ1 /σt2 = Σ Yt /σt2 / Σ1 /σt2. Proceed as before.


Comment: See pages 148-149 of Pindyck and Rubinfeld.

28.13. A. We want the variance of the errors to be equal, rather than depend on X.
If we multiply by a value, the variance of the errors will be multiplied by that value squared.
If we multiply everything by X1/4, the variance of the errors in the new model will be proportional
to: (X1/4)2X-1/2 = 1. Multiplying the original model, Y = α + βX + ε, by X1/4 produces the new
model: YX1/4 = αX1/4 + βX5/4 + ε∗.
Comment: To correct for heteroscedasticity, divide the model by something proportional to
the standard deviation of the error. In this case, divide by √X-1/2 = X-1/4, which is equivalent to
multiplying by X1/4.

28.14. A. wi = 1/Var(εi) = 1/(σ2Xi), which is proportional to 1/Xi.


β̂ = ΣwiXiYi/ΣwiXi2 = ΣYi/ΣXi = 30.1/12.5 = 2.408.

29.1. B. Ordinary least squares regression estimators remain consistent, even when there
is positive serial correlation.
29.2. In deviations form x = (-2, -1, 0, 1, 2).
β̂ = ΣxiYi/Σxi2 = (-2Y1 - Y2 + Y4 + 2Y5)/10 = 3 + (-2ε1 - ε2 + ε4 + 2ε5)/10.
If ε1 = 1, then ε2 = ε3 = ε4 = ε5 = 1.
Y = (8, 11, 14, 17, 20).
β̂ = 3. α̂ = 14 - (3)(3) = 5. All of the residuals are zero, and ESS = 0.
If instead, ε1 = -1, then ε2 = ε3 = ε4 = ε5 = -1.
Y = (6, 9, 12, 15, 18).
β̂ = 3. α̂ = 12 - (3)(3) = 3. All of the residuals are zero, and ESS = 0.
In either case ESS = 0.
Comment: ESS/(N - 2) = 0, which underestimates the actual variance of the regression which
is 1. With positive serial correlation, s2 = ESS/(N - 2) is biased downwards as an estimator of
the actual variance of the regression. While the errors are not Normally Distributed, this
feature still holds for this simplified example.

29.3. If ε1 = 1, then ε2 = -1, ε3 = 1, ε4 = -1, ε5 = 1. Then, Y = (8, 9, 14, 15, 20). Ȳ = 13.2.
β̂ = 3 + (-2 + 1 - 1 + 2)/10 = 3. α̂ = 13.2 - (3)(3) = 4.2.
Ŷ = (7.2, 10.2, 13.2, 16.2, 19.2).
ε̂ = Y - Ŷ = (0.8, -1.2, 0.8, -1.2, 0.8). ESS = 4.8.
If instead, ε1 = -1, then ε2 = 1, ε3 = -1, ε4 = 1, ε5 = -1, Y = (6, 11, 12, 17, 18),
and again it turns out that ESS = 4.8. In either case, ESS = 4.8.
Comment: ESS/(N - 2) = 4.8/3 = 1.6, which overestimates the actual variance of the
regression which is 1. With negative serial correlation, s2 = ESS/(N - 2) is biased upwards as
an estimator of the actual variance of the regression.

29.4. D. Statement D should say “positive serial correlation.”


Comment: See page 159 of Pindyck and Rubinfeld.

29.5. D. Var[εt] = Var[ν]/(1 - ρ2) = 40/(1 - .72) = 78.43.


Comment: See Equation 6.13 in Pindyck and Rubinfeld.

29.6. E. In graph E, negative ε̂t-1 are more likely to be associated with positive ε̂t, and
positive ε̂t-1 are more likely to be associated with negative ε̂t. The points tend to be in the
upper-left and lower-right quadrants, indicating negative serial correlation.
Comment: Graph A indicates positive serial correlation.

30.1. B. 2.9 = DW ≅ 2(1 - ρ). ρ ≅ 1 - 2.9/2 = -.45.

30.2. D. 4 - du = 4 - 1.66 = 2.34 < DW = 2.41 < 4 - dl = 4 - 1.34 = 2.66 ⇒


result indeterminate.
30.3. C.
Xi      Yi      Ŷi      ε̂i = Yi - Ŷi
1       82      81           1
2       78      79.5        -1.5
3       80      78           2
4       73      76.5        -3.5
5       77      75           2
Σε̂t2 = 1 + 2.25 + 4 + 12.25 + 4 = 23.5.
Σ(ε̂t - ε̂t-1)2 = 2.52 + 3.52 + 5.52 + 5.52 = 79.
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)2/Σε̂t2 = 79/23.5 = 3.36.
Comment: There are too few observations to draw any useful conclusions about possible
serial correlation.
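A one-line Python check of the Durbin-Watson statistic computed above (the residual values are those
in the table):

import numpy as np

resid = np.array([1.0, -1.5, 2.0, -3.5, 2.0])
dw = (np.diff(resid) ** 2).sum() / (resid ** 2).sum()
print(dw)                                  # 79/23.5 = 3.36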

30.4. B. h = (1 - DW/2)√(N/(1 - N Var[β̂])) = (1 - 1.46/2)√{50/(1 - (50)(0.009))} =
(.27)(9.5346) = 2.574. We perform a one-sided Normal Test.
Φ(2.326) = .990 and Φ(2.576) = .995.
2.326 < 2.574 < 2.576. We reject H0 at 1% and do not reject H0 at 0.5%.
Comment: We use Var[β̂] in computing h, since β is the coefficient of the lagged dependent
variable.
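Durbin's h statistic, used here and in solutions 30.6 and 30.18, is simple to compute; a small Python sketch:

import math

def durbin_h(dw, n, var_beta):
    # h = (1 - DW/2) sqrt(N / (1 - N Var[beta-hat])), compared to the standard Normal
    return (1.0 - dw / 2.0) * math.sqrt(n / (1.0 - n * var_beta))

print(durbin_h(1.46, 50, 0.009))           # about 2.574 (this solution)
print(durbin_h(1.20, 36, 0.01))            # 3.0 (solution 30.18)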

30.5. B. The Durbin-Watson Statistic can be used for models with lagged variables,
however, since it would be biased against rejection, Pindyck and Rubinfeld recommend other
tests.

30.6. E. h = (1 - DW/2)√(N/(1 - N Var[β̂])) = (1 - 1.78/2)√(250/(1 - (250)(.0023))) = 2.668.
Φ(2.576) = .995. 2.668 > 2.576. We reject H0 at 0.5%, a one-sided test.

30.7. B. One applies the t-test to test whether ρ is significantly different from zero.
t = .35/√.031 = 1.988. We fit 44 residuals, with 4 parameters, for 40 degrees of freedom.
The critical values at 10% and 5% are 1.684 and 2.021. 1.684 < 1.988 < 2.021.
We reject H0 at 10% (two-sided test) and do not reject H0 at 5% (two-sided test).
At 10%, we conclude there is serial correlation.

30.8. T̄ = 14.5. t = T - T̄. Ȳ = 35310/28 = 1261.07. y = Y - Ȳ. Σtiyi = -113,118. Σti2 = 1827.
β̂ = Σtiyi/Σti2 = -113118/1827 = -61.91. α̂ = Ȳ - β̂T̄ = 1261.07 - (-61.91)(14.5) = 2159.

30.9. Ŷ = 2159 - 61.91T = 2097, 2035, 1973, 1911, 1849, 1788, 1726, 1664, 1602, 1540,
1478, 1416, 1354, 1292, 1230, 1168, 1107, 1045, 983, 921, 859, 797, 735, 673, 611, 549,
487, 426.
ε̂ = Y - Ŷ = 448, 434, 419, 282, 52, -70, -81, -118, -169, -251, -342, -375, -354, -308, -266,
-213, -138, -96, -57, -41, -20, 25, 77, 129, 186, 233, 275, 333.
Σε̂t2 = 1,689,870. Σ(ε̂t - ε̂t-1)2 = 141,915.
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)2/Σε̂t2 = 141,915/1,689,870 = .084.

30.10. DW ≅ 2(1 - ρ). .084 ≅ 2(1 - ρ). ⇒ ρ ≅ .96.


Comment: A graph of the residuals, showing that they are highly serially correlated:
ρ ≅ 1 ⇒ ε̂t-1 ≅ ε̂t ⇒ Σ(ε̂t - ε̂t-1)2 ≅ 0 ⇒ DW = Σ(ε̂t - ε̂t-1)2/Σε̂t2 ≅ 0.
If instead ρ ≅ -1 ⇒ ε̂t-1 ≅ -ε̂t ⇒ Σ(ε̂t - ε̂t-1)2 ≅ Σ(2ε̂t)2 = 4Σε̂t2 ⇒
DW = Σ(ε̂t - ε̂t-1)2/Σε̂t2 ≅ 4.

30.11. A. Σt=2 to 30 (ε̂t - ε̂t-1)2 = Σt=2 to 30 ε̂t2 - 2Σt=2 to 30 ε̂tε̂t-1 + Σt=1 to 29 ε̂t2 =
(2422 - 7²) - (2)(801) + (2422 - 11²) = 2373 - 1602 + 2301 = 3072.
DW = Σt=2 to 30 (ε̂t - ε̂t-1)2 / Σt=1 to 30 ε̂t2 = 3072/2422 = 1.268.

30.12. The fitted parameters are: α = 0.0490683, β = -0.0117457, γ = -0.00460421.
The standard errors are respectively: 0.00265772, 0.00900476, 0.000861757.
The t-statistics and p-values are: 18.463 (0%), -5.343 (0%), -1.304 (20.3%).
RSS = .0001787. ESS = .0001500. R2 = .544. R̄2 = .510. F = 16.09 (0%).
Durbin-Watson Statistic is 1.258.
h = (1 - DW/2)√{N/(1 - N Var[β̂])} = (1 - 1.258/2)√{30/(1 - (30)(0.00900476²))} = 2.051.
Φ(1.960) = .975 and Φ(2.326) = .990. 1.960 < 2.051 < 2.326.
We reject H0 at 2.5% and do not reject H0 at 1% (one-sided test).

30.13. Y = (420, 450, 480, 490, 500, 510, 520, 550, 580, 610, 640, 650, 660, 670, 680, 710,
740). α̂ = 407.059. β̂ = 19.2157. s2 = 143.268. sα̂ = 6.0721. sβ̂ = 0.59258.
tα = 407.059/6.0721 = 67.04. tβ = 19.2157/0.59258 = 32.43.
RSS = 150,651. ESS = 2149. TSS = 152,800. R2 = 0.986. R̄2 = 0.985.
F = 1052 = 32.43². The Durbin-Watson Statistic is 0.749.
Comment: Unlike the linear regression model, the errors are not random in this example.
However, the errors are positively correlated, and therefore DW < 2.
For k = 1 (one explanatory variable excluding the constant) and N = 17, dl = 1.13.
Since 0.749 < 1.13, we reject H0: ρ = 0, at a 5% significance level.
One could add a random component to the errors, with E[ε1] = 0, E[ε2] = 10, etc.

30.14. A.
Day     Y      X      Ŷt = -25 + 20Xt      ε̂t = Yt - Ŷt      ε̂t - ε̂t-1
 1     11      2             15                 -4
 2     20      2             15                  5                 9
 3     30      3             35                 -5               -10
 4     39      3             35                  4                 9
 5     51      4             55                 -4                -8
 6     59      4             55                  4                 8
 7     70      5             75                 -5                -9
 8     80      5             75                  5                10
DW = Σ(ε̂t - ε̂t-1)2/Σε̂t2 = 571/164 = 3.48.

30.15. D. DW ≅ 2(1 - ρ). .8 ≅ 2(1 - ρ). ⇒ ρ ≅ .6.



30.16.
t      X      Y      Ŷ = -25 + 20X      ε̂t = Yt - Ŷt
1      2     10            15               -5
2      2     20            15                5
3      3     30            35               -5
4      3     40            35                5
DW = Σ(ε̂t - ε̂t-1)2/Σε̂t2 = (102 + (-10)2 + 102)/{(-5)2 + 52 + (-5)2 + 52} = 300/100 = 3.


Comment: The residuals appear to be negatively serially correlated.

30.17. D. The residuals, ε̂t, are: 77 - 77.6 = -.6, 69.9 - 70.6 = -.7, 73.2 - 70.9 = 2.3,
72.7 - 72.7 = 0, and 66.1 - 67.1 = -1.
Durbin-Watson statistic = Σ(ε̂t - ε̂t-1)2/Σε̂t2 =
((-.1)2 + 32 + (-2.3)2 + (-1)2)/((-.6)2 + (-.7)2 + 2.32 + 02 + (-1)2) = 15.3/7.14 = 2.143.
2.143 = DW ≅ 2(1 - ρ). ⇒ ρ ≅ -.071.
Comment: DW > 2 is some indication of negative serial correlation.

30.18. E. We are given that the Standard Error of β̂ = 0.1. ⇒ Var[β̂] = .12 = .01.
h = (1 - DW/2)√{N/(1 - N Var[β̂])} = (1 - 1.2/2)√{36/(1 - (36)(.01))} = 3.
h has a Standard Normal Distribution if there is no serial correlation.
If the alternative hypothesis is positive serial correlation, then we do a one-sided test:
1 - Φ(3) = 1 - .9987 = 0.13% < 1%.
Reject the null hypothesis of no serial correlation at the 1% significance level.
If the alternative hypothesis is serial correlation, then we do a two-sided test:
2(1 - Φ(3)) = 0.26% < 1%.
Reject the null hypothesis of no serial correlation at the 1% significance level.
Comment: See Example 6.7 in Pindyck and Rubinfeld.
The exam question really should have specified the alternative hypothesis.
One uses Durbin's h test when the model includes a lagged dependent variable on the right
side of the equation. We use Var[β̂] in computing h, since β is the coefficient of the lagged
dependent variable.

31.1. D.
t      Actual      Fitted      ε̂t
1       68.9        69.9       68.9 - 69.9 = -1.0
2       73.2        72.8       73.2 - 72.8 = 0.4
3       76.8        75.6       76.8 - 75.6 = 1.2
4       79.0        78.5       79.0 - 78.5 = 0.5
5       80.3        81.4       80.3 - 81.4 = -1.1
Fit a linear regression ε̂t = ρε̂t-1 + error.
Since this is a model with no intercept, ρ̂ = Σε̂t-1ε̂t/Σε̂t-12 =
{(-1)(.4) + (.4)(1.2) + (1.2)(.5) + (.5)(-1.1)}/{(-1.0)2 + .42 + 1.22 + .52} = .13/2.85 = .046.
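A short Python check of this no-intercept regression of the residuals on their lagged values:

import numpy as np

resid = np.array([-1.0, 0.4, 1.2, 0.5, -1.1])
rho = (resid[:-1] * resid[1:]).sum() / (resid[:-1] ** 2).sum()
print(rho)                                 # 0.13/2.85 = 0.046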

31.2. C. Y6* = Y6 - ρ Y5 = 33.743 - (.3)(14.122) = 29.5.

31.3. B. ρ = .4 has the smallest Error Sum of Squares.

31.4. X̄ = 18.5. x = (-17.5, -16.5, ..., 17.5). Y = ln(CPI) = (5.37805, 5.38404, ..., 5.46848).
β̂ = ΣxiYi/Σxi2 = 9.92578/3885 = .0025549.
α̂ = Ȳ - β̂X̄ = 195.405/36 - (.0025549)(18.5) = 5.38065.
Comment: Data for 1995 to 1997.

31.5. Ŷ = α̂ + β̂X = 5.38065 + .0025549X = (5.3832, 5.38576, ..., 5.47007, 5.47263).
ε̂ = Y - Ŷ = (-0.005167, -0.001738, ..., -0.004561, -0.004159).
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)2/Σε̂t2 = 0.00006859/0.0002635 = .260.


Comment: As is common for time series, positive serial correlation is indicated.

31.6. 1. Fit a regression and get the resulting residuals ε̂t.
From the previous solution, ε̂ = Y - Ŷ = (-0.005167, -0.001738, ..., -0.004561, -0.004159).
2. Estimate the serial correlation coefficient: ρ = Σε̂t-1ε̂t/Σε̂t-12 =
{(-0.005167)(-0.001738) + .... + (-0.00456)(-0.004159)}/{(-0.005167)2 + .... (-0.004561)2} =
0.0002072/0.0002462 = .842.
3. Let X*t = Xt - ρXt-1 = (1.158, 1.316, 1.474, ..., 6.530).
(There are now only 35 elements, rather than 36.)
Y*t = Yt - ρYt-1 = (0.855716, 0.85297, 0.853327, ..., 0.866510).
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
α̂(1 - ρ) = 0.851256. ⇒ α̂ = 0.851256/(1 - .842) = 5.38770. β̂ = 0.00227827.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the
resulting residuals ε̂t.
Ŷ = α̂ + β̂X = 5.38770 + 0.00227827X = (5.38998, 5.39226, 5.39453, ..., 5.46972).
ε̂ = Y - Ŷ = (-0.0119259, -0.0082203, -0.00820657, ..., -0.00123573).
6. Estimate the serial correlation coefficient:
ρ = Σε̂t-1ε̂t/Σε̂t-12 = 0.000585432/0.000691695 = .846.
7. Although one could do another iteration, the value of ρ seems to have converged
sufficiently for most purposes.
One can use the equation with ρ = .842, Ŷ = 5.38770 + 0.00227827X.
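The whole Cochrane-Orcutt iteration can be packaged in a few lines of Python. This is a minimal sketch
under the same setup as above (simple regression of Y on X); the CPI data themselves are not reproduced
here, and the function names are my own:

import numpy as np

def ols(x, y):
    # simple linear regression with an intercept; returns (intercept, slope)
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - b * x.mean(), b

def cochrane_orcutt(X, Y, iterations=5):
    a, b = ols(X, Y)                       # step 1: ordinary least squares fit
    rho = 0.0
    for _ in range(iterations):
        resid = Y - (a + b * X)            # residuals of the untransformed equation
        rho = (resid[:-1] * resid[1:]).sum() / (resid[:-1] ** 2).sum()   # step 2
        Xs, Ys = X[1:] - rho * X[:-1], Y[1:] - rho * Y[:-1]              # step 3
        a_star, b = ols(Xs, Ys)            # step 4: fit Y* = alpha(1 - rho) + beta X*
        a = a_star / (1.0 - rho)           # step 5: translate back to the original variables
    return a, b, rho

# With X = 1, 2, ..., 36 and Y = ln(CPI), this would reproduce roughly
# alpha = 5.388, beta = 0.00228, rho = 0.84, as found above.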

31.7. DW ≅ 2(1- ρ). ⇒ ρ ≅ 1 - DW/2 = 1 - .26/2 = .87.


Thus try ρ = .83, .85, .87, .89, .91.
1. Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1.
For example, for ρ = .87:
X*t = (1.13, 1.26, 1.39, 1.52, 1.65, 1.78, 1.91, 2.04, 2.17, 2.3, 2.43, 2.56, 2.69, 2.82, 2.95, 3.08,
3.21, 3.34, 3.47, 3.60, 3.73, 3.86, 3.99, 4.12, 4.25, 4.38, 4.51, 4.64, 4.77, 4.90, 5.03, 5.16,
5.29, 5.42, 5.55).
Y*t = (0.705131, 0.702217, 0.702509, 0.702346, 0.703035, 0.705593, 0.70526, 0.704367,
0.706002, 0.705562, 0.704565, 0.709634, 0.708639, 0.706551, 0.706778, 0.707004,
0.70723, 0.709644, 0.708397, 0.70737, 0.709658, 0.708744, 0.707666, 0.712479,
0.711839, 0.711471, 0.71057, 0.71079, 0.710156, 0.711119, 0.711338, 0.710707,
0.711665, 0.712729, 0.713475).
2. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
For example, for ρ = .87: α̂(1 - ρ) = 0.700669 and β̂ = 0.00221417.
3. The best regression has the smallest Error Sum of Squares (ESS).
ρ                 .83       .85       .87       .89       .91
ESS (x 10-10)   556198    554812    555378    557898     56237
The best of these is ρ = .85.
4. Refine the grid of values for ρ, and again perform steps 1, 2, and 3:
[Graph omitted: ESS against ρ for a finer grid from 0.846 to 0.860; the minimum ESS (about 0.0000555) occurs near ρ = 0.854.]

Take ρ = .854. For the transformed regression with ρ = .854:
α̂(1 - ρ) = 0.786716. ⇒ α̂ = 0.786716/(1 - .854) = 5.38847. β̂ = 0.00225381.
Translate the transformed equation back to the original variables:
Y = α + βX = 5.38847 + 0.00225381X.
Comment: The results of using the two different procedures are very similar.
How to forecast in the presence of serial correlation is discussed in a subsequent section.
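The Hildreth-Lu grid search used above can likewise be sketched in Python (again a sketch only; the data
are not reproduced here, and the function names are my own):

import numpy as np

def ols(x, y):
    b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
    return y.mean() - b * x.mean(), b

def hildreth_lu(X, Y, grid):
    best = None
    for rho in grid:
        Xs, Ys = X[1:] - rho * X[:-1], Y[1:] - rho * Y[:-1]   # quasi-differenced data
        a_star, b = ols(Xs, Ys)
        ess = ((Ys - a_star - b * Xs) ** 2).sum()
        if best is None or ess < best[0]:
            best = (ess, rho, a_star / (1.0 - rho), b)        # keep the smallest ESS
    return best                            # (ESS, rho, alpha, beta)

# e.g. hildreth_lu(X, Y, np.arange(0.83, 0.92, 0.02)), then repeat on a finer grid near the minimum.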

31.8. B.
t      Actual      Fitted      ε̂t
1       77.0        77.6       77.0 - 77.6 = -0.6
2       69.9        70.6       69.9 - 70.6 = -0.7
3       73.2        70.9       73.2 - 70.9 = 2.3
4       72.7        72.7       72.7 - 72.7 = 0
5       66.1        67.1       66.1 - 67.1 = -1.0
Fit a linear regression ε̂t = ρε̂t-1 + error.
Since this is a model with no intercept, ρ̂ = Σε̂t-1ε̂t/Σε̂t-12 =
{(-.6)(-.7) + (-.7)(2.3) + (2.3)(0) + (0)(-1)}/{(-.6)2 + (-.7)2 + 2.32 + 02} = -1.19/6.14 = -.19.
Thus our first estimate of the lag 1 serial correlation coefficient is -.19.
Comment: The fitted values given in the question are not based on t being the (only)
independent variable. When I fit a linear regression with t the independent variable, I got:
Yt = 77.48 - 1.9t, which does not match the fitted values given in the question.

31.9. A. At page 159 of Pindyck and Rubinfeld, “Serial correlation does not affect the
unbiasedness or consistency of the ordinary least squares estimators, but it does affect their
efficiency.” Therefore, A is false and B is true. For positive serial correlation, the R2
probably gives an overly optimistic picture of the success of the regression, see page 160 of
Pindyck and Rubinfeld. C is True. The lower efficiency of the ordinary least-squares
estimator is masked; the estimator of the standard error of β is biased downward. D is True.
See page 159 of Pindyck and Rubinfeld. The Cochrane-Orcutt procedure is designed to
correct for serial correlation, and therefore E is True. See page 163 of Pindyck and Rubinfeld.
Comment: The assumption of independent errors with constant variance was not necessary
for the ordinary least-squares estimator of the slope to be unbiased.
In Appendix 3.1, Result 1, E[β̂] = β, is derived, without any reference to the variance-
covariance matrix of the errors. (We do use the fact that the errors have mean of zero.)
32.1. Corr[Height, Weight] = .790. Corr[Height, Chest Depth] = .791.
Corr[Weight, Chest Depth] = .881. The high correlations between these independent
variables may indicate multicollinearity.
While the R2 is high at .967, and the F-Statistic has a very small p-value of 0.07%, 3 of 4
slopes have t-statistics that are not significant at the 5% level. This is a possible sign of
multicollinearity.
In order to test for multicollinearity, one might calculate the condition number as mentioned
by Pindyck and Rubinfeld.
In order to test for whether multicollinearity is causing problems, one could run regressions
with one or more variables dropped, and see the effect on the standard errors.
An older boy is expected to have a higher maximal oxygen intake, yet β̂2 < 0. A boy who
weighs more is expected to have larger lungs and to have a higher maximal oxygen intake,
yet β̂4 < 0. These unexpected signs of slopes are possible signs of multicollinearity.
Comment: The model Y = β1 + β2X2 + β3X3 + β4X4 + ε, was fit:
Parameter     Estimate      Standard Error      T-Statistic      P-Value
β1            -4.503           .5038              -8.939          0.01%
β2            -0.03631         .01405             -2.584          4.2%
β3             0.05254         .005380             9.766          0.006%
β4            -0.01970         .009078            -2.170          7.3%
R2 = .966. R̄2 = .950. s2 = .00119.
          D.F.     Sum of Squares     Mean Square     F-Ratio     P-Value
RSS        4           0.20581           0.0686          58        0.01%
ESS        5           0.00715           0.00119
TSS        9           0.21296
Dropping X5 from the model: R2 ≥ .95, the p-value for the F-Test is .01%, and all the
individual fitted parameters are now significant at the 10% level. This indicates that
multicollinearity was indeed causing problems with the original model.

32.2. X2 and X3 are highly correlated, since a firm with more trucks pays on average more
for insurance.
The written premium depends on the rate level charged in each state, so that X2 is probably
significantly correlated (positively or negatively) with many of the state dummy variables.
If in the past there has been expansion or contraction by Hoffa Insurance of business written
based on in which states a firm has large exposure, then one or more of the state dummy
variables could be significantly correlated (positive or negative) with the number of years
insured with Hoffa Insurance.
A trucking firm is probably more likely to have exposures in neighboring states than in states
distant from each other. Therefore, some pairs or groups of state dummy variables may be
significantly correlated.
Comment: These are four possible answers, there may be others. Your exam is extremely
unlikely to have open ended questions like this one.
32.3. C. For each regression, the t-statistic has 20 - 4 = 16 degrees of freedom, and the
10% significance critical value is 1.746. We would like the absolute value of all the t-statistics
to be at least 1.746.
For regression A, R2 = 773/(773 + 227) = .773.
The t-statistics are: 44/16 = 2.75, 1.3/0.7 = 1.857, -2.4/0.5 = -4.8.
For regression B, R2 = 798/1000 = .798.
The t-statistics are: 40/18 = 2.22, 1.5/0.6 = 2.5, 0.35/0.16 = 2.19.
For regression C, R2 = 769/1000 = .769.
The t-statistics are: 37/25 = 1.48, -1.9/1.4 = -1.36, 0.41/0.23 = 1.78.
High R2 with few significant t-statistics. An indication of multicollinearity.
For regression D, R2 = 785/1000 = .785.
The t-statistics are: 1.7/0.7 = 2.42, -2.8/1.1 = -2.55, 0.45/0.24 = 1.88.
Comment: F = {R2/(1 - R2)}(N - k)/(k - 1) = {R2/(1 - R2)}16/3, with 3 and 16 d.f.
For the smallest R2, .769, F = (.769/.231)(16/3) = 17.75.
Since the 1% critical value is 5.29, in each case the F-Statistic is significant.

Solutions to problems in the remaining sections appear in Study Guide O.


Mahler’s Guide to
Regression
Solutions to Problems
Sections 33-44

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-O

New England Actuarial Seminars Howard Mahler


POB 315 hmahler@mac.com
Sharon, MA, 02067
www.neas-seminars.com

Solutions to Problems, Sections 33-44

33.1. A. X̄ = 3. x = (-2, -1, 0, 1, 2). Σxi2 = 10.
ΣxiYi = -39640. β̂ = ΣxiYi/Σxi2 = -39640/10 = -3964.0.
Ȳ = 403221/5 = 80644.2. α̂ = 80644.2 - (-3964.0)(3) = 92,536.
Forecasted frequency for year 8 is: 92,536 + (-3964.0)(8) = 60,824.

33.2. A. Ŷ = (88572, 84608, 80644, 76680, 72716).
ε̂ = Y - Ŷ = (-1816, 993, 2830, -1373, -633).
ESS = Σ ^εi 2 = 14,578,623. s2 = ESS/(N-2) = 14578623/(5-2) = 4,859,541.

33.3. B. 8 - X̄ = 8 - 3 = 5. Variance of forecast of expected frequency in year 8 is:
s2{1/N + x2/Σxi2} = (4,859,541){1/5 + 5²/10} = 13,120,761.
For the t-distribution with 5 - 2 = 3 degrees of freedom, the critical value for 5% area in both
tails is 3.182.
95% confidence interval for the expected frequency in year 8 is:
60,824 ± 3.182√13120761 = 60,824 ± 11,526 = (49,298, 72,350).

33.4. C. Mean Squared Error of the forecast of the observed frequency in year 8 is:
s2{1 + 1/N + x2/Σxi2} = (4,859,541){1 + 1/5 + 5²/10} = 17,980,302.
95% confidence interval for the observed frequency in year 8 is:
60,824 ± 3.182√17980302 = 60,824 ± 13,493 = (47,331, 74,317).
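A minimal Python sketch of these forecast intervals (solutions 33.1 to 33.4); the observed frequencies Y
below are recovered by adding the residuals to the fitted values shown above (my reconstruction):

import numpy as np

X = np.arange(1.0, 6.0)
Y = np.array([86756.0, 85601.0, 83474.0, 75307.0, 72083.0])

def forecast_interval(X, Y, x0, t_crit, observed=False):
    xbar = X.mean()
    x = X - xbar
    b = (x * Y).sum() / (x * x).sum()
    a = Y.mean() - b * xbar
    s2 = ((Y - a - b * X) ** 2).sum() / (len(X) - 2)
    # variance of the forecast; with observed=True, add s^2 for a future observed value
    var = s2 * ((1.0 if observed else 0.0) + 1.0 / len(X) + (x0 - xbar) ** 2 / (x * x).sum())
    half = t_crit * np.sqrt(var)
    y0 = a + b * x0
    return y0 - half, y0 + half

print(forecast_interval(X, Y, 8.0, 3.182))                  # about (49298, 72350), solution 33.3
print(forecast_interval(X, Y, 8.0, 3.182, observed=True))   # about (47331, 74317), solution 33.4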

33.5. E. X̄ = 228/12 = 19. x = (-11, -13, ... , 21). Σxi2 = 924. Ŷ = 64.247 - 1.013X.
Ŷ = (56.14, 58.17, 53.10, 41.96, 50.06, 47.03, 46.01, 39.94, 45.00, 40.95, 37.91, 23.73).
ε̂ = Y - Ŷ = (2.86, -0.17, 2.90, 11.04, -0.06, -2.03, -3.01, 2.06, -6.00, -2.95, -7.91, 3.27).
ESS = Σ ^εi 2 = 273.84. s2 = ESS/(N-2) = 273.84/(12-2) = 27.384.
Mean Squared Error of forecast for X = 35:
s2{1 + 1/N + x2/Σxi2} = (27.384){1 + 1/12 + (35 - 19)2/924} = 37.253.
For the t-distribution with 12 - 2 = 10 degrees of freedom, the critical value for 5% area in both
tails is 2.228.
The forecast for X = 35 is: 64.247 - (1.013)(35) = 28.792.
95% confidence interval: 28.792 ± 2.228√37.253 = 28.8 ± 13.6 = (15.2, 42.4).
Comment: An example of a cross-section study, as opposed to a time series.
The regression was fit as follows:
Ȳ = 540/12 = 45. y = (14, 13, ..., -18). Σxiyi = -936.
β̂ = -936/924 = -1.013. α̂ = 45 - (-1.013)(19) = 64.247.
Here is a graph of the data, the fitted line and the 95% confidence interval (dashed):
[Graph omitted: percent wormy (0 to 80) against size (10 to 40), with the fitted line and dashed 95% confidence bands.]
Note that the confidence intervals get wider as we get further from X = 19.
33.6. B. The forecasted expected ln[severity] in year 8 is:
8.64516 + (0.0979168)(8) = 9.42849.
X̄ = 3. x = (-2, -1, 0, 1, 2). Σxi2 = 10. 8 - X̄ = 8 - 3 = 5.
Variance of forecast of expected ln[severity] in year 8 is:
s2{1/N + x2/Σxi2} = (0.00191151){1/5 + 5²/10} = .0051611.
For the t-distribution with 5 - 2 = 3 degrees of freedom, the critical value for 10% area in both
tails is 2.353.
90% confidence interval for expected ln[severity] in year 8 is:
9.42849 ± 2.353√.0051611 = 9.42849 ± .16904 = (9.25945, 9.59753).
Exponentiating, 90% confidence interval for expected severity in year 8 is: (10503, 14728).

33.7. E. (1, 2, 100, 200) (Covariance Matrix) (1, 2, 100, 200) = 8.795.
The predicted expected score is: 18.40 + (2.04)(2) + (0.1120)(100) + (.09371)(200) = 52.42.
95% confidence interval: 52.42 ± (1.960)√8.795 = 52.4 ± 5.8. Upper end is 58.2.

33.8. A. (1, 0, 150, 50) (Covariance Matrix) (1, 0, 150, 50) = 4.05.
Adding s2, the mean squared forecast error is: 20.38 + 4.05 = 24.43.
The predicted expected score is: 18.40 + (2.04)(0) + (0.1120)(150) + (.09371)(50) = 39.89.
90% confidence interval for the observed score: 39.89 ± (1.645)√24.43 = 39.9 ± 8.1.
Upper end is 48.0.

33.9. D. (1, 1, 50, 250) (Covariance Matrix) (1, 1, 50, 250) = 9.35.
Adding s2, the mean squared forecast error is: 20.38 + 9.35 = 29.73.
The predicted expected score is: 18.40 + (2.04)(1) + (0.1120)(50) + (.09371)(250) = 49.47.
Prob[Y ≥ 54.2] = 1 - Φ((54.2 - 49.47)/√29.73) = 1 - Φ(.867) = 19.3%.

33.10. D. X̄ = 1985. x = (-15, -5, 5, 15). Σxi2 = 500.
ΣxiYi = -14765. β̂ = ΣxiYi/Σxi2 = -14765/500 = -29.53.
Ȳ = 12583/4 = 3145.75. α̂ = 3145.75 - (-29.53)(1985) = 61,762.8.
Forecasted value for 2005 is: 61,762.8 + (-29.53)(2005) = 2555.
Ŷ = (3588.7, 3293.4, 2998.1, 2702.8).
ε̂ = Y - Ŷ = (-8.7, 8.6, 8.9, -8.8).
ESS = Σ ^εi 2 = 306.3. s2 = ESS/(N-2) = 306.3/(4-2) = 153.2.
Variance of the forecast of the expected value of 100,000q70 in 2005 is:
s2{1/N + x2/Σxi2} = (153.2){1/4 + (2005 - 1985)2/500} = 160.9.
For the t-distribution with 4 - 2 = 2 degrees of freedom, the critical value for 1% area in both
tails is 9.925.
99% confidence interval for the expected value of 100,000q70 in 2005 is:
2555 ± 9.925√160.9 = 2555 ± 126 = (2429, 2681).
Comment: The mortality data was made up.
33.11. E. Since Y is LogNormal, lnY is Normal.
For X = 5, estimate of lnY is: 8.743 + (5)(.136) = 9.423.
X = 13,283/4000 = 3.32075. Second moment of X is: 54,940/4000 = 13.735.
Variance of X is: 13.735 - 3.320752 = 2.7076. Σxi2 = (4000)(2.7076) = 10,830.
s2 = ESS/(N-2) = 3285/3998 = .8217.
5 - X = 1.67925.
Variance of forecast at x = X - X is:
s2{1/N + x2/Σxi2} = (.8217)(1/4000 + 1.6792/10830) = .000419.
90% confidence interval for lnY is: 9.423 ± (1.645)√.000419 = 9.423 ± .034.
90% confidence interval for Y is: e9.389 to e9.457 ⇔ 11,956 to 12,797.
Comment: Not intended as a realistic model of private passenger automobile losses.

33.12. E. Mean Squared Forecast Error at x = X - X is:


s2{1 + 1/N + x2/Σxi2} = (.8217)(1 + 1/4000 + 1.6792/10830) = .8221.
90% confidence interval for lnY is: 9.423 ± (1.645)√.8221 = 9.423 ± 1.492
90% confidence interval for Y is: e7.931 to e10.915 ⇔ 2782 to 54,995.
Comment: Due to the process variance of a LogNormal, the 90% confidence interval for the
observed size of loss is very wide.

33.13. A. The expected finish time (seconds exceeding 2 hours)


in 2006 (t = 2006 - 1980 = 26) is: 596.496 - (0.86735)(26) = 573.95.
X = (0, 1, ..., 25). X = 12.5. x = X - X = (-12.5, -11.5, ..., 12.5). Σxi2 = 1462.5.
Var[β^] = s2/Σxi2. ⇒ s2 = Var[β^]Σxi2 = (6.1238)(1462.5) = 8956.1.
Variance of Forecast for 2006 is: Var[α^ + 26β^] = Var[α^] + 262Var[β^] + (2)(26)Cov[α^, β^] =
1301.31 + (676)(6.1238) + (52)(-76.548) = 1460.50.
Mean Squared Forecast Error for 2006 is: 1460.50 + s2 = 1460.50 + 8956.1 = 10,417.
For N - 2 = 24 degrees of freedom, the critical value for the 1% area in both tails for the
t-distribution is 2.797.
A 99% confidence interval for the observed finish time (seconds exceeding 2 hours) in 2006:
573.95 ± 2.797√10417 = 574 ± 285 = (289, 859).
Translating back to hours and minutes: (2:4:49, 2:14:19).
Alternately, the Mean Squared Forecast Error for 2006 (X = 26) is:
s2{1 + 1/N + x2/Σxi2} = (8956.1)(1 + 1/26 + 13.52/1462.5) = 10,417. Proceed as before.
Comment: The slope is not significantly different from zero; for a t-test the p-value is 73%.

33.14. D. The forecast for 2006 is 573.95 with mean squared error 10,417.
The observed value for 2006, converting to seconds beyond two hours is: (60)(7) + 14 = 434.
normalized forecast error = (573.95 - 434)/√10,417 = 1.37.
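For 33.13 and 33.14, the same answer can also be reproduced directly from the coefficient covariance matrix; a short numpy sketch (my illustration; the matrix entries and s2 are those quoted in the solution):

    import numpy as np

    cov = np.array([[1301.31, -76.548],
                    [-76.548,   6.1238]])    # covariance matrix of (alpha^, beta^)
    s2 = 6.1238 * 1462.5                     # s^2 = Var[beta^] * Sum of x_i^2 = 8956.1

    x0 = np.array([1.0, 26.0])               # (1, t) for 2006
    forecast = 596.496 - 0.86735 * 26.0      # 573.95 seconds over two hours
    mse = x0 @ cov @ x0 + s2                 # add s^2 for an individual observation
    observed = 60*7 + 14                     # 2:07:14 is 434 seconds over two hours
    print(mse, (forecast - observed) / mse**0.5)   # about 10,417 and 1.37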

33.15. E. X = 2.5. x = (-1.5, -.5, .5, 1.5). Σxi2 = 5.


ΣxiYi = 32. β^ = ΣxiYi/Σxi2 = 32/5 = 6.4.
Y = 1026/4 = 256.5. α^ = 256.5 - (6.4)(2.5) = 240.5.
Forecasted value for year 5 is: 240.5 + (5)(6.4) = 272.5.
Y^ = (246.9, 253.3, 259.7, 266.1).
ε^ = Y - Y^ = (4.1, -6.3, 0.3, 1.9).
ESS = Σε^i2 = 60.2. s2 = ESS/(N-2) = 60.2/(4-2) = 30.1.
Mean Squared Error of the forecast of the pure premium observed in year 5 is:
s2{1 + 1/N + x2/Σxi2} = (30.1){1 + 1/4 + (5 - 2.5)2/5} = 75.25.
For the t-distribution with 4 - 2 = 2 degrees of freedom, the critical value for 5% area in both
tails is 4.303.
95% confidence interval for the pure premium observed in year 5 is:
272.5 ± 4.303√75.25 = 272.5 ± 37.3 = (235, 310).

33.16. D. β^ = ΣXiYi/ΣXi2 = 126096439/317475058 = .397185.
ESS = Σ(Yi - Y^i)2 = (3312 - 3192.2)2 + (3090 - 3156.8)2 + (2983 - 3094.9)2
+ (3211 - 3227.1)2 + (3224 - 3152.1)2 = 36765.
s2 = ESS/(5 - 1) = 36765/4 = 9191.
Var[β^] = s2/ΣXi2 = 9191/317475058 = .00002895.
For Accident Year 2004, the expected number of claims reported after December 31, 2004 is:
(.397185)(8431) = 3348.7.
Var[8431β^] = (84312)(.00002895) = 2058.
The mean squared error of the estimate compared to the observed is: 2058 + 9191 = 11249.
Expected number of claims for Accident Year 2004 is: 8431 + 3348.7 = 11779.7.
For the t-distribution with 5 - 1 = 4 degrees of freedom, 2.132 is the critical value for 10% area
in both tails.
90% confidence interval is: 11779.7 ± 2.132√11249 = 11780 ± 226 = (11554, 12006).
33.17. Assuming the model is a reasonable one for boys ages 10 to 14, it is probably
reasonable to use the model to estimate the average height of a boy aged 15:
30.6 + (2.44)(15) = 67.2 inches.
It would not be reasonable to use the model to estimate the average height of a boy aged 0:
30.6 + (2.44)(0) = 30.6 inches!
It would not be reasonable to use the model to estimate the average height of a man aged
60: 30.6 + (2.44)(60) = 177 inches = 14.75 feet!
In general, one should be cautious about applying a model outside the range of data used to
fit that model.
Comment: Var[β^] = s2/(N Var[X]) = (9.2)/{(1000)(2)} = 0.0046.
Var[α^] = E[X2]Var[β^] = (146)(.0046) = 0.6716.
Cov[α^, β^] = -X Var[β^] = -(12)(.0046) = -0.0552.
The variance of the forecast of the mean height of a boy aged 15 is:
Var[α^ + 15β^] = Var[α^] + 30Cov[α^, β^] + 225Var[β^] = 0.6716 + (30)(-0.0552) + (225)(0.0046)
= 0.0506.
Alternately, the variance of the forecast of the mean height of a boy aged 15 is:
(9.2)(1/1000 + (15 - 12)2/{(1000)(2)}) = 0.0506.
The mean squared error of the forecast of the height of a boy aged 15 is:
(9.2)(1 + 1/1000 + (15 - 12)2/{(1000)(2)}) = 9.25.
Thus almost all of the forecast error would be due to the variance of the distribution of heights of
boys aged 15, rather than an error in predicting the average height of a boy aged 15,
assuming the model is valid.
By the way, in this case, the error terms are likely to be not quite homoscedastic, with a
slightly larger variance of the heights of boys aged 14, than the variance of the heights of
boys aged 10.

33.18 & 33.19. The expected catastrophe losses in 2005 are:


-1057.7 + (0.535385)(2005) = 15.75.
s2 = 8.424362 = 70.9698. N = 14.
X = 1997.5. x = X - X = (-6.5, -5.5, ..., 6.5). Σxi2 = 227.5.
Variance of Forecast at x = 2005 - X = 7.5 is:
s2{1/N + x2/Σxi2} = (70.9698)(1/14 + 7.52/227.5) = 22.617.
For N - 2 = 12 degrees of freedom, the critical value for the 10% area in both tails for the
t-distribution is 1.782.
90% confidence interval for the expected catastrophe losses in 2005:
15.75 ± 1.782√ 22.617 = 15.75 ± 8.47 = (7.3, 24.2).
Mean Squared Forecast Error at x = 2005 - X = 7.5 is:
s2{1 + 1/N + x2/Σxi2} = (70.9698)(1 + 1/14 + 7.52/227.5) = 93.587.
90% confidence interval for the observed catastrophe losses in 2005:
15.75 ± 1.782√93.587 = 15.75 ± 17.24 = (0, 33).
Comment: Catastrophe losses are never negative. There are large errors in the estimated
coefficients of the model, and therefore the interval for the expected catastrophe losses is
very wide. Linear regression is not a practical tool for predicting catastrophe losses. Due to
the large random fluctuation in catastrophe losses year to year, the interval for the observed
catastrophe losses is extremely wide and useless.

33.20. (i) There appears to be a linear relationship with a positive slope.


[Scatterplot omitted: y from about 5 to 40 against x from 1 to 5, showing an increasing, roughly linear pattern.]

(ii) Var[X] = 91.08/10 - 2.662 = 2.0324. Var[Y] = 5603.12/10 - 20.282 = 149.0336.


Cov[X,Y] = 707.58/10 - (2.66)(20.28) = 16.8132.
Corr[X,Y] = 16.8132/√{(2.0324)(149.0336)} = 0.966.
A large positive correlation, agreeing with the previous comment.
(iii) β^ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2}
= {(10)(707.58) - (26.6)(202.8)}/{(10)(91.08) - 26.62} = 1681.32/203.24 = 8.2726.
α^ = Y - β^X = 20.28 - (8.2726)(2.66) = -1.725.
Fitted regression is: y = -1.73 + 8.273 x.
(iv) (a) TSS = Σ(Yi - Y )2 = ΣYi2 - (ΣYi)2/N = 5603.12 - 202.82/10 = 1490.336.
RSS = {ΣXiYi - ΣXiΣYi/N}2/{ΣXi2 - (ΣXi)2/N} = 168.1322/20.324 = 1390.886.
Alternately, RSS = TSS R2 = TSS Corr[X, Y]2 =(1490.336)(0.9662) = 1391.
ESS = TSS - RSS = 1490.336 - 1390.886 = 99.450.
(b) R2 = RSS/TSS = 1390.886/1490.336 = 0.933.
This is also the square of the correlation between x and y: 0.9662 = 0.933.
(c) s2 = ESS/(N - 2) = 99.450/8 = 12.43.
(v) Recalling that x, the ski-lift capacity, is given in thousands, a 500 increase in skiers per
hour is an increase in x of 0.5.
The expected increase in y is: 0.5β^ = (0.5)(8.2726) = 4.136, or 4136 visitor days.
Var[β^] = s2/(N Var[X]) = 12.43/20.324 = 0.6116.
Var[.5β^] = .52Var[β^] = (.25)(0.6116) = 0.1529.
StdDev[.5β^] = √0.1529 = 0.391, or a standard error of the estimate of 391 visitor days.
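All of the quantities in 33.20 come from five sums, so they are easy to verify; a small Python sketch (my illustration) using the sums given in the exercise:

    N = 10
    Sx, Sy = 26.6, 202.8                    # Sum X_i, Sum Y_i
    Sxx, Syy, Sxy = 91.08, 5603.12, 707.58  # Sum X_i^2, Sum Y_i^2, Sum X_i Y_i

    beta = (N*Sxy - Sx*Sy) / (N*Sxx - Sx**2)        # 8.2726
    alpha = Sy/N - beta*Sx/N                        # -1.725
    TSS = Syy - Sy**2/N                             # 1490.336
    RSS = (Sxy - Sx*Sy/N)**2 / (Sxx - Sx**2/N)      # 1390.9
    ESS = TSS - RSS                                 # 99.45
    s2 = ESS / (N - 2)                              # 12.43
    var_beta = s2 / (Sxx - Sx**2/N)                 # 0.6116
    print(beta, alpha, RSS/TSS, s2, var_beta)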
33.21. (i) There appears to be a linear relationship between x and y.

[Scatterplot omitted: y from about 1000 to 1500 against x from 35 to 55.]

(ii) β^ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2} =
{(12)(650,264.8) - (516.4)(14821)}/{(12)(22,741.34) - 516.42} = 149613.2/6227.12 = 24.026.
α^ = Y - β^X = 14821/12 - (24.026)(516.4/12) = 201.16.
The fitted line is: Y = 201.2 + 24.03 X.
(iii) TSS = Σ(Yi - Y )2 = ΣYi2 - (ΣYi)2/N = 18,695,125 - 148212/12 = 389955.
RSS = β^2 Σ(Xi - X)2 = β^2 {ΣXi2 - (ΣXi)2/N} = (24.0262){22,741.34 - 516.42/12} = 299550.
Alternately, RSS = Sxy2/Sxx = {ΣXiYi - ΣXiΣYi/N}2/{ΣXi2 - (ΣXi)2/N}
= {650,264.8 - (516.4)(14821)/12}2/{22,741.34 - 516.42/12} = 12467.82/518.93 = 299553.
ESS = TSS - RSS = 389955 - 299550 = 90405.
s2 = ESS/(N - 2) = 90405/10 = 9040.5.
Var[β^] = s2/{N Var[X]} = 9040.5/{ΣXi2 - (ΣXi)2/N} = 9040.5/{22,741.34 - 516.42/12} = 17.42.
sβ^ = √17.42 = 4.17.
For 10 degrees of freedom, using the t-table, the critical value for 5% area in both tails is
2.228.
95% confidence interval for β is: 24.03 ± (2.228)(4.17) = (14.7, 33.3).
We have assumed Normal errors with a constant variance.

(iv) Variance of forecast at x = X - X is: s2{1/N + x2/Σxi2}.


X = 516.4/12 = 43.033. Σxi2 = Σ(Xi - X )2 = ΣXi2 - (ΣXi)2/N = 22,741.34 - 516.42/12 = 518.93.
s2{1/N + x2/Σxi2} = (9040.5)(1/12 + x2/518.93).
(a) For X = 50, the forecast is: 201.2 + (24.03)(50) = 1402.7.
For X = 50, x = X - X = 50 - 43.033 = 6.967.
Variance is: (9040.5)(1/12 + 6.9672/518.93) = 1599.
95% confidence interval for the mean resting metabolic rate when X = 50 is:
1402.7 ± (2.228)(√ 1599) = 1402.7 ± 89.1 = (1314, 1492).
(b) For X = 75, the forecast is: 201.2 + (24.03)(75) = 2003.5.
For X = 75, x = X - X = 75 - 43.033 = 31.967.
Variance is: (9040.5)(1/12 + 31.9672/518.93) = 18556.
95% confidence interval for the mean resting metabolic rate when X = 75 is:
2003.5 ± (2.228)(√ 18556) = 2003.5 ± 303.5 = (1700, 2307).
(v) The confidence interval for X = 50 is okay.
However, in the case of X = 75 we are extrapolating significantly outside the values of X
contained in the data. Therefore, considerable caution is needed in using this confidence
interval.
Comment: Here is a graph of the fitted line and the data:
[Graph omitted: the fitted line and the data, y from about 1000 to 1500 against x from 35 to 55.]
33.22. The fitted line is: Y = 201.2 + 24.03 X.
The ends of the confidence intervals for the mean resting metabolic rate are:
201.2 + 24.03 X ± (2.228)√{(9040.5)(1/12 + (X - X )2/518.93)}.
[Graph omitted: metabolic rate (about 800 to 1600) against mass (30 to 55), showing the fitted line and the confidence intervals for the mean resting metabolic rate.]

Note how the confidence intervals are wider further from X = 516.4/12 = 43.03.

33.23. The ends of the confidence intervals for the observed resting metabolic rate are:
201.2 + 24.03 X ± (2.228)√{(9040.5)(1 + 1/12 + (X - X )2/518.93)}.
[Graph omitted: metabolic rate (about 800 to 1600) against mass (30 to 55), showing the fitted line and the confidence intervals for the observed resting metabolic rate.]

Note how the confidence intervals for the observed values are much wider than those for the
mean values. The observed values vary around the mean value.

33.24. C. Var[Y10] = Var[α + 10β] = Var[α] + Var[10β] + 2Cov[α , 10β] =


Var[α] + 100Var[β] + 20Cov[α , β] = .00055 + (100)(0.00002) + (20)(-0.0001) = .00055.
√.00055 = .0235.
Alternately, one can use the delta method.
h(α, β) = Y10 = α + 10β. ∂h/∂α = 1 and ∂h/ ∂β = 10.
Therefore, the variance of the forecast is:
(1 10) ( 0.00055  -0.00010) ( 1)
       (-0.00010   0.00002) (10)  =  .00055.
The standard deviation of the forecast is: √.00055 = .0235.
Comment: The delta method, covered on Exam 4/C, only measures the error in forecasting
the expected value, not the error due to random fluctuation of the future observation around
its expected value. The delta method measures the effect of the random fluctuations
contained in the observations used to fit the model, not the random fluctuations of future
observations. Note that equation 8.12 in Econometric Models and Economic Forecasts, by
Pindyck and Rubinfeld, includes an additional term of σ2, the variance of ε, the error term in
the model. This term takes into account the additional error due to the random fluctuation of
the loss ratio for year 10 around its expected value.

33.25. For 8 data points, we have 8 - 2 = 6 degrees of freedom.


For the t-distribution with 6 d.f., for 5% area in both tails, the critical value is 2.447.
The forecast is: .50 + (10)(.02) = .700.
An approximate 95% confidence interval is: .700 ± (2.447)(.0235) = .700 ± .058.
33.26. (i) There seems to be an increasing and linear relationship.
[Scatterplot omitted: y from about 4 to 5.75 against x from 4 to 5.25, showing an increasing, roughly linear pattern.]
(ii) β^ = {(10)(238.3676) - (50.02)(47.12)}/{(10)(224.8554) - 47.122} = 0.946.
α^ = Y - β^X = 5.002 - (.946)(4.712) = 0.544.
Fitted line is: y = 0.544 + 0.946x.
(iii) R2 = β^2 Sxx/Syy = (0.9462){224.8554 - 47.122/10}/{253.5796 - 50.022/10} = 0.748.
Alternately, TSS = 253.5796 - 50.022/10 = 3.37956.
RSS = Sxy2/Sxx = {238.3676 - (50.02)(47.12)/10}2/{224.8554 - 47.122/10} = 2.5290.
R2 = RSS/TSS = 2.5290/3.37956 = 0.748.
This high R2 agrees with the apparent linear relationship.
(iv) ESS = TSS - RSS = 3.37956 - 2.5290 = 0.8506.
s2 = ESS/(N-2) = 0.8506/8 = 0.1063.
ESS/σ2 follows a Chi-Square Distribution with 8 degrees of freedom.
There is 95% probability between 2.18 and 17.53 on this Chi-Square Distribution.
.8506/17.53 = .0485 ≤ σ2 < .8506/2.18 = .390.
A 95% confidence interval for σ 2: (.0485, .390).
(v) Var[β^] = s2/Sxx = 0.1063/{224.8554 - 47.122/10} = .0376.
For a t-distribution with 8 degrees of freedom, for a total of 5% area in both tails, the critical value is 2.306.
0.946 ± (2.306)√ .0376 = .946 ± .447 = (0.499, 1.393).
(vi) The forecasted value is: 0.544 + (5)(.946) = 5.274.
Var[forecast] = s2{1/N + (5- X )2/Sxx} = (.1063){1/10 + (5 - 4.712)2/2.82596} = .01375.
5.274 ± 2.306√.01375 = 5.274 ± .270 = (5.004, 5.544).

34.1. B. For the first ten years, X = 5.5. x = (-4.5, -3.5, ..., 4.5). Σxi2 = 82.5. ΣxiYi = -2.4955.
β^ = ΣxiYi/Σxi2 = -2.4955/82.5 = -0.0302485. Y = 1.3983.
α^ = 1.3983 - (-0.0302485)(5.5) = 1.5647.
Forecasted average rate for time 15 is: 1.5647 + (15)(-0.0302485) = 1.111.

34.2. E. Forecasted average rates for times 11 to 15 are:


1.23194, 1.20169, 1.17144, 1.14119, 1.11094.
Mean squared error is: {(1.23194 - 1.366)2 + (1.20169 - 1.342)2 + (1.17144 - 1.334)2 +
(1.14119 - 1.307)2 + (1.11094 - 1.362)2}/5 = .03092.
Root mean squared error = √.03092 = .176.

34.3. C. 2nd moment of forecasts is:


(1.231942 + 1.201692 + 1.171442 + 1.141192 + 1.110942)/5 = 1.37410.
2nd moment of observations is: (1.3662 + 1.3422 + 1.3342 + 1.3072 + 1.3622)/5 = 1.80195.
U = .176/(√1.37410 + √1.80195) = .070.

34.4. E. mean of forecasts is: (1.23194 + 1.20169 + 1.17144 + 1.14119 + 1.11094)/5 =


1.17144. mean of observations is: (1.366 + 1.342 + 1.334 + 1.307 + 1.362)/5 = 1.3422.
UM = (1.17144 - 1.3422)2/.03092 = 94.3%.

34.5. A. Standard deviation of forecasts = √(1.37410 - 1.171442) = .04276.


Standard deviation of observations = √(1.80195 - 1.34222) = .02119.
US = (.04276 - .02119)2/.03092 = 1.5%.

34.6. B. Correlation of the forecasts and the observations is:


({(1.23194 - 1.17144)(1.366 - 1.3422) + (1.20169 - 1.17144)(1.342 - 1.3422) +
(1.17144 - 1.17144)(1.334 - 1.3422) + (1.14119 - 1.17144)(1.307 - 1.3422) +
(1.11094 - 1.17144)(1.362 - 1.3422)} /5)/{(.04276)(.02119)} =
(0.00130075/5)/.0009061 = .287.
UC = (2)(1 - .287)(.04276)(.02119)/.03092 = 4.2%.
Comment: UM + US + UC = 94.3% + 1.5% + 4.2% = 100%.
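Questions 34.1-34.6 all work from the same two series, so the whole Theil decomposition can be checked at once; a numpy sketch (my illustration; forecasts and observations copied from the solutions above, and the standard deviations divide by N rather than N - 1):

    import numpy as np

    F = np.array([1.23194, 1.20169, 1.17144, 1.14119, 1.11094])   # forecasts
    A = np.array([1.366, 1.342, 1.334, 1.307, 1.362])             # observations

    mse = np.mean((F - A)**2)                                     # 0.03092
    U = mse**0.5 / (np.sqrt(np.mean(F**2)) + np.sqrt(np.mean(A**2)))   # about 0.070
    sF, sA = np.std(F), np.std(A)                                 # biased standard deviations
    r = np.mean((F - F.mean())*(A - A.mean())) / (sF*sA)          # about 0.287
    UM = (F.mean() - A.mean())**2 / mse                           # bias proportion, 94.3%
    US = (sF - sA)**2 / mse                                       # variance proportion, 1.5%
    UC = 2*(1 - r)*sF*sA / mse                                    # covariance proportion, 4.2%
    print(U, UM, US, UC, UM + US + UC)                            # the proportions sum to 1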

34.7. B. Mean forecast = (76 + 106 + 110 + 142)/4 = 108.5.


Mean actual = (81 + 93 + 125 + 129)/4 = 107. Bias = 108.5 - 107 = 1.5.
MSE = {(76 - 81)2 + (93 - 106)2 + (110 - 125)2 + (142 - 129)2}/4 = 588/4 = 147.
Bias proportion of inequality is: Bias2/ MSE = 1.52/147 = 0.0153.
Comment: Similar to VEE-Applied Statistics Exam, 8/05, Q.6.

34.8. A. 2nd moment of forecasted = (762 + 1062 + 1102 + 1422)/4 = 12,319.


variance of forecasted = 12319 - 108.52 = 546.75.
2nd moment of actual = (812 + 932 + 1252 + 1292)/4 = 11,869.
variance of actual = 11,869 - 1072 = 420.
Standard Deviation of forecasted = √546.75 = 23.383.
Standard Deviation of actual = √420 = 20.494.
Variance proportion of inequality is: (23.383 - 20.494)2/147 = 0.0568.
Comment: E[(forecast)(actual)] = {(76)(81) + (106)(93) + (110)(125) + (142)(129)}/4 =
12020.5. Cov[forecast, actual] = 12020.5 - (108.5)(107) = 411.
Corr[forecast, actual] = 411/{(23.383)(20.494)} = .85766.
Covariance proportion of inequality is: 2(1 - .85766)(23.383)(20.494)/147 = 0.9280.

34.9. C. In an ex post forecast, all of the values of the dependent variable are known at the
time of the forecast, in contrast to an ex ante forecast.
Comment: See pages 202-203 of Pindyck and Rubinfeld.

34.10. B. MSE = Σ(Fi - Oi)2 /100 = {ΣFi2 + ΣOi2 - 2ΣFiOi} /100 =


{11,330 + 15,856 - (2)(10,281)}/100 = 66.24. RMSE = √66.24 = 8.14.

34.11. E. Second Moment of forecasts = 11,330/100 = 113.30.


Second Moment of observations = 15,856/100 = 158.56.
U = 8.14/(√113.30 + √158.56) = .350.

34.12. A. F = 872/100 = 8.72. O = 981/100 = 9.81.


UM = (8.72 - 9.81)2/66.24 = 1.79%.

34.13. D. Standard deviation of forecasts = √(113.30 - 8.722) = 6.104.


Standard deviation of observations = √(158.56 - 9.812) = 7.895.
US = (6.104 - 7.895)2/66.24 = 4.84%.

34.14. C. Correlation of the forecasts and the observations is:


{Σ(Fi - F )(Oi - O )/100}/{σFσO} = {ΣFiOi/100 - F O )}/{(6.104)(7.895)} =
{10,281/100 - (8.72)(9.81)}/(48.191) = .3583.
UC = (2)(1 - .3583)(6.104)(7.895)/66.24 = 93.37%.
Comment: UM + US + UC = 1.79% + 4.84% + 93.37% = 1.

34.15. E. All of these statements are true. Since we have the average premium for June,
the forecast of the June premium is ex post. Since we do not yet have the average premium
for July, the forecast of the July premium is ex ante.
We know the CPI for June, so the forecast of August average premiums is unconditional. We
do not yet know the CPI for July; we would need to somehow predict it. Thus the forecast of
September average premium is conditional.

34.16. A. For the first 12 values, X = 6.5. x = (-5.5, -4.5, ..., 5.5). Σxi2 = 143.
ΣxiYi = 120.65. β^ = ΣxiYi/Σxi2 = 120.65/143 = 0.843706.
Y = 297.075. α^ = 297.075 - (0.843706)(6.5) = 291.591.
Forecasted value for time 18 is: 291.591+ (18)(0.843706) = 306.778.

34.17. C. Forecasted average rates for times 13 to 18 are:


302.559, 303.403, 304.247, 305.090, 305.934, 306.778.
Mean squared error is: {(302.559 - 303.6)2 + (303.403 - 306.0)2 + ( 304.247 - 307.5)2 +
(305.090 - 308.3)2 + (305.934 - 309.0)2 + (306.778 - 310.0)2}/6 = 8.083.
Root mean squared error = √8.083 = 2.84.
Comment: The CPI for Medical Care for 2003 and the first half of 2004.

34.18. A. 2nd moment of forecasts is:


(302.5592 + 303.4032 + 304.2472 + 305.0902 + 305.9342 + 306.7782)/6 = 92825.
2nd moment of observations is:
(303.62 + 306.02 + 307.52 + 308.32 + 309.02 + 310.02)/6 = 94499.
U = 2.84/(√92825 + √94499) = .0046.

34.19. E. Mean forecast = (174 + 193 + 212 + 231)/4 = 202.5.


Mean actual = (186 + 206 + 227 + 242)/4 = 215.25. Bias = 202.5 - 215.25 = -12.75.
MSE = {(174 - 186)2 + (193 - 206)2 + (212 - 227)2 + (231 - 242)2}/4 = 659/4 = 164.75.
Bias proportion of inequality is: Bias2/ MSE = 12.752/164.75 = 0.9867.

34.20. C. 2nd moment of forecasted = (1742 + 1932 + 2122 + 2312)/4 = 41,457.5.


variance of forecasted = 41457.5 - 202.52 = 451.25.
2nd moment of actual = (1862 + 2062 + 2272 + 2422)/4 = 46,781.25.
variance of actual = 46,781.25 - 215.252 = 448.6875.
E[(forecast)(actual)] = {(174)(186) + (193)(206) + (212)(227) + (231)(242)}/4 = 44,037.
Cov[forecast, actual] = 44,037 - (202.5)(215.25) = 448.875.
Corr[forecast, actual] = 448.875/√{(451.25)(448.6875)} = .99757.
Covariance proportion of inequality is: 2(1 - .99757)√{(451.25)(448.6875)}/164.75 = 0.0133.
Comment: Standard Deviation of forecasted = √451.25 = 21.2426.
Standard Deviation of actual = √448.6875 = 21.1822.
Variance proportion of inequality is: (21.2426 - 21.1822)2/164.75 = 0.000022.

35.1. A. (.8)(135.7) + (.2)(87.2) + (1.4){37 - (36)(.8)} = 137.48.


Alternately, (.8)(135.7 + 1.4) + (.2){87.2 + (1.4)(37)} = 137.48.

35.2. C. Using the prior solution, (.8)(137.48) + (.2)(87.2) + (1.4){38 - (37)(.8)} = 139.184.
Alternately, (.8)(137.48 + 1.4) + (.2){87.2 + (1.4)(38)} = 139.184.

35.3. D. Using the prior solution, (.8)(139.184) + (.2)(87.2) + (1.4){39 - (38)(.8)} = 140.827.
Alternately, (.8)(139.184 + 1.4) + (.2){87.2 + (1.4)(39)} = 140.827.

35.4. E. X = 1950. x= (-50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50).
Y = ln(pop.) = (4.333, 4.524, 4.663, 4.814, 4.884, 5.019, 5.189, 5.315, 5.423, 5.520, 5.640).
β^ = ΣxiYi/Σxi2 = 141.06/11000 = .01282. α^ = Y - β^X = 5.0295 - (.01282)(1950) = -19.9695.
The predicted population (in millions) in 2030 is: exp[-19.9695 + (.01282)(2030)] = 426.3.
Comment: An actuary might instead apply the predicted annual change of exp[.01282] to the
latest population in 2000 of 281.4, in order to get a predicted population (in millions) for 2030
of: 281.4 exp[(30)(.01282)] = 413.4.

35.5. B. Y^ = α^ + β^X = -19.9695 + .01282X =
(4.3885, 4.5167, 4.6449, 4.7731, 4.9013, 5.0295, 5.1577, 5.2859, 5.4141, 5.5423, 5.6705).
ε^ = Y - Y^ = (-0.0555, 0.0073, 0.0181, 0.0409, -0.0173, -0.0105, 0.0313, 0.0291, 0.0089,
-0.0223, -0.0305).
Durbin-Watson Statistic = Σ(ε^t - ε^t-1)2 / Σε^t2 = 0.01121/0.00888 = 1.26.


Comment: Since 1.26 < 2, this indicates the possible presence of positive serial correlation.
This is common for time series. Positive serial correlation is expected here, since the
population at a given point in time is made up to a large extent of the same people who were
in the population ten years earlier. Also the number of births and deaths during a decade
depends heavily on the population at the beginning of the decade. There are many factors
affecting population growth such as: the age of the population, the rate at which births occur,
the mortality rates, the rate of immigration, the rate of emigration, etc.

35.6. 1. Fit a linear regression and get the resulting residuals ε^t .
From the previous solution, ε^ = Y - Y^ = (-0.0555, 0.0073, 0.0181, 0.0409, -0.0173, -0.0105,
0.0313, 0.0291, 0.0089, -0.0223, -0.0305).
2. Estimate the serial correlation coefficient:

ρ = Σε^t-1 ε^t / Σε^t-12 =
{(-0.055)(0.0073) + (0.0073)(0.0181) + ... + (-0.0223)(-0.0305)}/{(-0.055)2 + ... + (-0.0223)2} =
0.0012642/0.0079465 = .159.
3. Let X*t = Xt - ρXt-1 = (1607.90, 1616.31, 1624.72, 1633.13, 1641.54, 1649.95, 1658.36,
1666.77, 1675.18, 1683.59).
Y*t = Yt - ρYt-1 = (3.83505, 3.94368, 4.07258, 4.11857, 4.24244, 4.39098, 4.48995, 4.57792,
4.65774, 4.76232).
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
α^(1 - ρ) = -16.0107. ⇒ α^ = -16.0107/(1 - .159) = -19.038. β^ = 0.01235.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the

resulting residuals ε^t .


Y^ = α^ + β^X = -19.038 + .01235X = (4.4270, 4.5505, 4.6740, 4.7975, 4.9210, 5.0445, 5.1680,
5.2915, 5.4150, 5.5385, 5.6620).
ε^ = Y - Y^ = (-0.0940, -0.0265, -0.0110, 0.0165, -0.0370, -0.0255, 0.0210, 0.0235, 0.0080,
-0.0185, -0.0220).
6. Estimate the serial correlation coefficient:

ρ = Σε^t-1 ε^t / Σε^t-12 = 0.003339/0.0133502 = .250.


3. Let X*t = Xt - ρXt-1 = Xt - .250Xt-1 = (1435.0, 1442.5, 1450.0, 1457.5, 1465.0, 1472.5,
1480.0, 1487.5, 1495.0, 1502.5).
Y*t = Yt - ρYt-1 = Yt - .250Yt-1 = (3.44075, 3.5320, 3.64825, 3.6805, 3.7980, 3.93425, 4.01775,
4.09425, 4.16425, 4.26000).
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
α^(1 - ρ) = -14.1561. ⇒ α^ = -14.1561/(1 - .250) = -18.8748. β^ = 0.01226.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the

resulting residuals ε^t .


Y^ = α^ + β^X = -18.8748 + 0.01226X = (4.4192, 4.5418, 4.6644, 4.7870, 4.9096, 5.0322,
5.1548, 5.2774, 5.4000, 5.5226, 5.6452).
ε^ = Y - Y^ = (-0.0862, -0.0178, -0.0014, 0.027, -0.0256, -0.0132, 0.0342, 0.0376, 0.0230,
-0.0026, -0.0052).
6. Estimate the serial correlation coefficient:

ρ = Σε^t-1 ε^t / Σε^t-12 = 0.0028212/0.012427 = .227.


3. Let X*t = Xt - ρXt-1 = Xt - .227Xt-1 = (1478.70, 1486.43, 1494.16, 1501.89, 1509.62,
1517.35, 1525.08, 1532.81, 1540.54, 1548.27).
Y*t = Yt - ρYt-1 = Yt - .227Yt-1 = (3.54041, 3.63605, 3.75550, 3.79122, 3.91033, 4.04969,
4.13710, 4.21650, 4.28898, 4.38696).
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
α^(1 - ρ) = -14.6249. ⇒ α^ = -14.6249/(1 - .227) = -18.9197. β^ = 0.012287.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the

resulting residuals ε^t .


Y^ = α^ + β^X = -18.9197 + 0.012287X = (4.4256, 4.54847, 4.67134, 4.79421, 4.91708,
5.03995, 5.16282, 5.28569, 5.40856, 5.53143, 5.6543).
ε^ = Y - Y^ = (-0.0926, -0.02447, -0.00834, 0.01979, -0.03308, -0.02095, 0.02618, 0.02931,
0.01444, -0.01143, -0.0143).
6. Estimate the serial correlation coefficient:

ρ = Σε^t-1 ε^t / Σε^t-12 = 0.00298383/0.0130516 = 0.2286.


7. The value of ρ seems to have converged sufficiently for our purposes.
Use the transformed equation translated back to the original variables for ρ = .227:
Y^ = α^ + β^X = -18.9197 + 0.012287X.
Y for 2000 is 5.640. Predicted Y for 2010 is:
(.227)(5.640) + (1 - .227)(-18.9197) + (0.012287){2010 - (.227)(2000)} = 5.774.
Predicted population (in millions) for 2010 is: exp[ 5.774] = 321.8.
Predicted Y for 2020 is:
(.227)(5.774) + (1 - .227)(-18.9197) + (0.012287){2020 - (.227)(2010)} = 5.899.
Predicted population (in millions) for 2020 is: exp[5.899] = 364.7.
Predicted Y for 2030 is:
(.227)(5.899) + (1 - .227)(-18.9197) + (0.012287){2030 - (.227)(2020)} = 6.023.
Predicted population (in millions) for 2030 is: exp[6.023] = 412.8.
Comment: This prediction for 2030 of 412.8 compares to that of 426.3 using the original
regression.
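The Cochrane-Orcutt iterations of 35.6 are tedious by hand but short in code. The sketch below is my own illustration, not the text's: it uses the same decennial data as 35.4 (X is the census year, Y is the natural log of the population in millions), and the helper name ols is arbitrary.

    import numpy as np

    X = np.arange(1900, 2001, 10, dtype=float)
    Y = np.array([4.333, 4.524, 4.663, 4.814, 4.884, 5.019,
                  5.189, 5.315, 5.423, 5.520, 5.640])

    def ols(x, y):
        b = np.sum((x - x.mean())*(y - y.mean())) / np.sum((x - x.mean())**2)
        return y.mean() - b*x.mean(), b

    a, b = ols(X, Y)                       # ordinary least squares starting values
    for _ in range(20):                    # Cochrane-Orcutt iterations
        e = Y - a - b*X                    # residuals from the current fit
        rho = np.sum(e[1:]*e[:-1]) / np.sum(e[:-1]**2)
        Xs, Ys = X[1:] - rho*X[:-1], Y[1:] - rho*Y[:-1]    # transformed variables
        a_star, b = ols(Xs, Ys)            # fit Y* = alpha(1 - rho) + beta X*
        a = a_star / (1 - rho)             # translate back to the original intercept
    print(rho, a, b)                       # roughly 0.23, -18.9, 0.0123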

35.7. DW ≅ 2(1- ρ). ⇒ ρ ≅ 1 - DW/2 = 1 - 1.26/2 = .37. Thus try ρ = .20, .30, .40, .50.
1. Let X*t = Xt - ρXt-1, and Y*t = Yt - ρYt-1. (There are now only 10 elements, rather than 11.)
For example, for ρ = .3: X*t = (1340, 1347, 1354, 1361, 1368, 1375, 1382, 1389, 1396, 1403).
Y*t = (3.2241, 3.3058, 3.4151, 3.4398, 3.5538, 3.6833, 3.7583, 3.8285, 3.8931, 3.9840).
2. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
For example, for ρ = .30: α^(1 - ρ) = -13.137 and β^ = 0.0122097.
3. The best regression has the smallest Error Sum of Squares (ESS).
ρ .20 .30 .40 .50
ESS 0.00400584 0.00403586 0.00421587 0.00454589
4. Refine the grid of values for ρ, and again perform steps 1, 2, and 3.
Take ρ = .18, .20, .22, .24.
ρ .18 .20 .22 .24
ESS 0.00401784 0.00400584 0.00399984 0.00399985
For ρ = .23, ESS = 0.00399909. Take ρ = .23. For the transformed regression with ρ = .23:
α^(1 - ρ) = -14.5637. ⇒ α^ = -14.5637/(1 - .23) = -18.9139. β^ = 0.0122841.
Translate the transformed equation back to the original variables:
Y = α + βX = -18.9139 + 0.0122841X.
Y for 2000 is 5.640. Predicted Y for 2010 is:
(.23)(5.640) + (1 - .23)(-18.9139) + (0.0122841){2010 - (.23)(2000)} = 5.774.
Predicted population (in millions) for 2010 is: exp[ 5.774] = 321.8.
Predicted Y for 2020 is:
(.23)(5.774) + (1 - .23)(-18.9139) + (0.0122841){2020 - (.23)(2010)} = 5.899.
Predicted population (in millions) for 2020 is: exp[5.899] = 364.7.
Predicted Y for 2030 is:
(.23)(5.899) + (1 - .23)(-18.9139) + (0.0122841){2030 - (.23)(2020)} = 6.023.
Predicted population (in millions) for 2030 is: exp[6.023] = 412.8.
Comment: In this case, the result of using Hildreth-Lu procedure is basically the same as that
from using the Cochrane-Orcutt procedure. A graph of the ESS as a function of ρ:
[Graph omitted: ESS (about 0.0040 to 0.0041) as a function of ρ (0.14 to 0.32), with its minimum near ρ = 0.23.]
35.8. D. From the regression equation on the transformed variables,
α^(1 - ρ^) = 14.981, and β^ = .33590.
Using the last observed value of 190.3, the forecast for t = 121 is:
(.9)(190.3) + 14.981 + (.33590){121 - (120)(.9)} = 190.62.
Comment: Based on the Consumer Price Index for All Urban Consumers for 1995 to 2004.
Used the forecasting equation Y^T+1 = ρ^YT + α^(1 - ρ^) + β^(XT+1 - ρ^XT).

35.9. D. Using the prior solution,


(.9)(190.62) + 14.981 + (.33590){122 - (121)(.9)} = 190.94.

35.10. E. Using the prior solution,


(.9)(190.94) + 14.981 + (.33590){123 - (122)(.9)} = 191.26.

35.11. Use the estimated variance of the regression of the transformed variables,
s2 = .1775. For this regression, N = 119.
t* = i + 1 - .9i = (1.1, 1.2, .. , 12.9). t* = 7. For the forecast, t* = 13.
Σ(t* - t*)2 = 5.92 + 5.82 + ... + .12 + 02 + .12 + ... + 5.82 + 5.92 = 2 Σi=1 to 59 (i/10)2 = .02 Σi=1 to 59 i2
= (.02)(59)(60)(119)/6 = 1404.2.
Mean Squared Error of the forecast is: .1775{1 + 1/119 + (13 - 7)2/1404.2} = .1835.
For the t-distribution with 119 - 2 = 117 degrees of freedom, for 5% area outside on both tails,
the critical value is 1.980, (using the value in the table for 120 degrees of freedom.)
95% confidence interval: 190.62 ± 1.980 √ .1835 = 190.62 ± .85 = (189.77, 191.47).
Comment: Beyond what you are likely to be asked on your exam. The observed values for
the first three months of 2005 were 190.7, 191.8, and 193.3 (preliminary).

36.1. The standardized coefficients are β^*j = β^ j sXj /sY.


j slope Var[Xj] StdDev[Xj] Var[Y] StdDev[Y] standardized slope
2 0.8145 28.98 5.383 514.9 22.69 0.1932
3 0.8204 236.4 15.375 514.9 22.69 0.5559
4 13.5287 0.2169 0.4657 514.9 22.69 0.2777
Since the standardized variables each have a mean of 0, the intercept vanishes from the
regression equation (the intercept is zero.)
The revised model is: Y^* = .1932X2* + 0.5559X3* + .2777X4*.
Where Y* = (Y - Y)/sY, X2* = (X2 - X2)/sX2, etc.
Comment: Similar to 4, 11/00, Q.37.
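Standardizing the coefficients, as in 36.1, is just a rescaling; a short numpy sketch (my illustration) using the slopes and variances from that solution:

    import numpy as np

    slopes = np.array([0.8145, 0.8204, 13.5287])   # beta^_2, beta^_3, beta^_4
    var_x = np.array([28.98, 236.4, 0.2169])       # Var[X_2], Var[X_3], Var[X_4]
    var_y = 514.9                                  # Var[Y]

    std_slopes = slopes * np.sqrt(var_x / var_y)   # beta*_j = beta^_j s_Xj / s_Y
    print(std_slopes)                              # about 0.193, 0.556, 0.278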
36.2. C. β^*j = β^j sXj/sY. β^*2 = β^2 sX2/sY = (-0.715143)√(3.90916/317.564) = -.079344.
β^*3 = (-0.873252)√(16.4044/317.564) = -.198474.
β^*4 = (31.2728)√(0.226415/317.564) = .835033.
β^*5 = (-17.8078)√(0.0622642/317.564) = -.249353.
β^*6 = (9.98376)√(0.246631/317.564) = .278229.
The variable with the largest absolute value of its standardized coefficient is the single most
important determinant of Y, which in this case is X4.

36.3. D. β*3 = (rYX3 - rYX2 rX2X3)/(1 - rX2X32) = {.3952 - (.7296)(-.2537)}/{1 - (-.2537)2} =
.5803/.9356 = .620.
Comment: Similar to 4, 5/01, Q.13.
β*2 = (rYX2 - rYX3 rX2X3)/(1 - rX2X32) = {.7296 - (.3952)(-.2537)}/{1 - (-.2537)2} = .887.

36.4. X = 3. x = X - X = -2, -1, 0, 1, 2. Y = 1914/5 = 383.


y = Y - Y = -181, -62, 21, 97, 124. Σxiyi = 769. Σxi2 = 10. Σyi2 = 61831.
For the two variable regression, the standardized slope is equal to correlation of X and Y.
β∗ = rXY = Σxiyi/√{Σxi2Σyi2} = 769/√{(10)(61831)} = .978.
Since the standardized variables each have a mean of 0, the intercept vanishes from the
regression equation (the intercept is zero.)
The fitted model in standardized form is: Y^* = .978X*.
Where Y* = (Y - Y )/sY, and X* = (X - X )/sX.

36.5. E. β^*j = β^j sXj/sY. β^*2 = β^2 sX2/sY = (2.4642)√(64.9821/1527.64) = .508.
β^*3 = (6.4560)√(27.6429/1527.64) = .868. β^*4 = (1.1839)√(96.0000/1527.64) = .297.
β^*3 > β^*2 > β^*4 .
Comment: Similar to 4, 11/00, Q.37.

36.6. D. β^ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)2} =
{(40)(4120) - (128)(672)}/{(40)(1853) - 1282} = 1.3646.
β^* = β^sX/sY = 1.3646√{(1853/40 - (128/40)2)/(14911/40 - (672/40)2)} = 0.86.
Alternately, β^* = r = {ΣXiYi - ΣXiΣYi/N}/√{(ΣXi2 - (ΣXi)2/N)}{(ΣYi2 - (ΣYi)2/N)}
= {4120 - (128)(672)/40}/√{(1853 - 1282/40)(14911 - 6722/40)}
= 1969.6/√{(1443.4)(3621.4)} = 0.86.
Comment: A one standard deviation increase in X results in a 0.86 standard deviation
increase in the fitted value of Y. In standardized form, the fitted regression is: Y* = 0.86X*.

36.7. E. X = 100. x = X - X = (-60, -40, -20, 0, 20, 40, 60). Σxi2 = 11,200.
Y = 24.5. y = Y - Y = (-8.6, -5.7, -2.9, 0.7, 4.2, 5.9, 6.4). Σyi2 = 208.76. Σxiyi = 1506.
β∗= rXY = Σxiyi/√{Σxi2Σyi2} = 1506/√{(11200)(208.76)} = .985.

36.8. D. β^*j = β^ j sXj /sY. β^*2 = .238√(80964.2/20041.4) = .478.


β^*3 = -.000229√(5923126.9/20041.4) = -.004. β^*4 = .14718√(300121.4/20041.4) = .570.
j slope Var[Xj] StdDev[Xj] Var[Y] StdDev[Y] standardized slope
2 0.238 80964.2 284.54 20041.4 141.6 0.478
3 -0.000229 5923126.9 2433.75 20041.4 141.6 -0.004
4 0.14718 300121.4 547.83 20041.4 141.6 0.570
5 -6.68 1.3 1.14 20041.4 141.6 -0.054
6 -0.269 190.9 13.82 20041.4 141.6 -0.026

β^*4 = .570 > β^*2 = .478 > β^*3 = -.004.


Comment: The variances are on the diagonal of the variance-covariance matrix.

36.9. B. Σyix2i = β2Σx2i2 + β3Σx2ix3i, and Σyix3i = β2Σx2ix3i + β3Σx3i2.


The solution for β2 is: β2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}.
sY = square root of sample variance of Y = √ {Σ(Yi - Y )2 /(N-1)}.
sX2 = √ {Σ(X2i - X2 )2 /(N-1)} = √{Σx2i2/(N-1)}. ⇒ Σx2i2 = (N - 1) sX2 2.
rYX2 = Σx2iyi/√{Σx2i2Σyi2} = {Σx2iyi/(N-1)}/(sY sX2) = 0.4. ⇒ Σx2iyi = .4(N - 1)sY sX2.
rYX3 = {Σx3iyi/(N-1)}/(sY sX3) = 0.9. ⇒ Σx3iyi = .9(N - 1)sY sX3.
rX2X3 = {Σx2ix3i/(N-1)}/(sX2 sX3) = 0.6. ⇒ Σx2ix3i = .6(N - 1)sX2 sX3.
Therefore, β2 = {.4(N - 1)sY sX2 (N - 1)sX32 - .9(N - 1)sY sX3 .6(N - 1)sX2 sX3} /
{(N - 1)sX22 (N - 1)sX32 - .62(N - 1)2 sX22 sX32} = (.4sY/sX2 - .54sY/sX2)/(1 - .36) = -.22sY/sX2.
Therefore, β*2 = β2 sX2/sY = -.22.
Alternately, Σyix2 = β2Σx2i2 + β3Σx2ix3i, and Σyix3i = β2Σx2ix3i + β3Σx3i2.
Divide the first equation by √{Σyi2Σx2i2} and get:
Σyix2/√{Σyi2Σx2i2} = β2√Σx2i2/√Σyi2 + β3Σx2ix3i/√{Σyi2Σx2i2} ⇒
rYX2 = β2 sX2/sY + β3 rX2X3 sX3/sY ⇒ rYX2 = β*2 + rX2X3 β*3 ⇒ .4 = β*2 + .6β*3.
Divide the second equation by √{Σyi2Σx3i2} and get:
Σyix3/√{Σyi2Σx3i2} = β2Σx2ix3i/√{Σyi2Σx3i2} + β3√Σx3i2/√Σyi2 ⇒
rYX3 = β2 rX2X3 sX2/sY + β3 sX3/sY ⇒ rYX3 = rX2X3 β*2 + β*3 ⇒ .9 = .6β*2 + β*3.
Thus we have two equations in two unknowns, with solution
β*2 = {.4 - (.9)(.6)}/(1 - .62) = -.22, and β*3 = {.9 - (.4)(.6)}/(1 - .62) = 1.03.
Comment: In general, β*2 = (rYX2 - rYX3 rX2X3)/(1 - rX2X32) = {.4 - (.9)(.6)}/(1 - .62) = -.22.

36.10. E. β*3 = (rYX3 - rYX2 rX2X3)/(1 - rX2X32) = {.9 - (.4)(.6)}/(1 - .62) = 1.03.

37.1. E. E2 = β^2 X2/Y = (2.4626)(8.875)/35.250 = .62.
E3 = β^3 X3/Y = (6.4560)(-.750)/35.250 = -.14.
E4 = β^4 X4/Y = (1.1839)(10.000)/35.250 = .34. |E3| < |E4| < |E2|.
Comment: Near their means, a one percent change in X2 is expected to produce a 0.62%
change in Y. Near their means, a one percent change in X3 is expected to produce a -0.14%
change in Y.
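The elasticities of 37.1 follow the same pattern; a brief Python sketch (my illustration) with the coefficients and means from that solution:

    import numpy as np

    betas = np.array([2.4626, 6.4560, 1.1839])     # beta^_2, beta^_3, beta^_4
    x_bar = np.array([8.875, -0.750, 10.000])      # means of X_2, X_3, X_4
    y_bar = 35.250                                 # mean of Y

    print(betas * x_bar / y_bar)                   # E_j = beta^_j Xbar_j / Ybar: 0.62, -0.14, 0.34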

37.2. E2 = β^2 X2/Y = (0.8145)(9213)/56660 = .132.
E3 = β^3 X3/Y = (0.8204)(35311.3)/56660 = .511.
E4 = β^4 X4/Y = (13.5287)(1383.35)/56660 = .330.

37.3. B., 37.4. E., 37.5. A.


Y = exp[-4.30 - .002D2i + .336 ln(X3i) + .384 X4i + .067D5i - .143D6i + .081D7i + .134 ln(X8i)].
∂Y / ∂X3 = Y(.336/X3). Taking the partial derivative at the means of X3 and Y,
Elasticity of Y with respect to X3 is: (∂Y / ∂X3) X 3 /Y = .336.
Alternately, because the model was estimated in logarithms rather than in levels, the variable
coefficients can be interpreted as elasticities. ln(X3) is multiplied by .336.
Elasticity of Y with respect to X4 is: (∂Y / ∂X4) X 4 /Y = .384 X 4, which can not be determined.
Elasticity of Y with respect to X8 is: (∂Y / ∂X8) X 8/ Y = .134.
Alternately, because the model was estimated in logarithms rather than in levels, the variable
coefficients can be interpreted as elasticities. ln(X8) is multiplied by .134.

37.6. & 37.7. SY = √{TSS/(N - 1)} = √{20000/(5 - 1)} = 70.71.


X2 = 3. sX2 = √[{(1 - 3)2 + (2 - 3)2 + (3 - 3)2 + (4 - 3)2 + (5 - 3)2}/(5 - 1)] = 1.581.
X 3 = 300.
sX3 = √{(300 - 300)2 + (500 - 300)2 + (100 - 300)2 + (400 - 300)2 + (200 - 300)2}/(5 - 1)} =
158.1.
β^*2 = β^2 sX2/sY = (60)(1.581)/70.71 = 1.34.
β^*3 = β^3 sX3/sY = (-3)(158.1)/70.71 = -6.71.
Since | β^*3 | > | β^*2 |, X3 is more important than X2 in determining Y.
A one standard deviation increase in X3 results in a 6.71 standard deviation decrease in the
fitted Y. A one standard deviation increase in X2 results in a 1.34 standard deviation
increase in the fitted Y.
Since the regression passes through the point where each of the variables is equal to its
mean, Y = 700 + (60)(3) - (3)(300) = -20.
E2 = β^2 X2/Y = (60)(3)/(-20) = -9.
E3 = β^3 X3/Y = (-3)(300)/(-20) = 45.
Since |E3| > |E2|, near their means Y is more sensitive to changes in X3 than X2.
Near their means, a 1% increase in X2 results in about a 9% decrease in Y.
Near their means, a 1% increase in X3 results in about a 45% increase in Y.

37.8. B. For 1992, Dt = 1.


Y = exp[β1 + β2ln X2t + β3 ln X3t + β4(ln X2t - ln X2t0 ) + β5(ln X3t - ln X3t0 )].
∂Y / ∂X2 = Y(β2/X2t + β4/X2t). Taking the partial derivative at the means of X2 and Y,
elasticity of Y with respect to X2 is: (∂Y / ∂X2) X 2/ Y = β2 + β4 = .60 - .07 = .53.
Alternately, because the model was estimated in logarithms rather than in levels, the variable
coefficients can be interpreted as elasticities.
For 1992, ln(X2) is multiplied by: β2 + β4 = .60 - .07 = .53.
Alternately, elasticity is the percent change in Y due to a 1% change in X. Let x be a given
value of X2t. Then the expected value of lnYt is s + 0.53lnx, where s represents the rest of the
equation and 0.53 is the estimated value of β2 + β4. With a 1% increase in x, the new value is
s + 0.53ln(1.01x), which is the old value plus 0.53ln(1.01) = 0.0052737. Exponentiating
indicates that the new Y value will be e0.0052737 = 1.0052876 times the old value. This is a
0.53% increase, and so the elasticity is 0.53.
Comment: A 1 percent increase in employment leads (approximately) to a .53 percent
increase in claim frequency. Loosely based on “Workers Compensation and Economic
Cycles: A Longitudinal Approach”, by Hartwig, Retterath, Restrepo, and Kahley, PCAS 1997.

38.1. D. rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX32)(1 - rX2X32)} =
{.7296 - (.3952)(-.2537)}/√{(1 - .39522)(1 - .25372)} = .8299/√{(.8438)(.9356)} = .934.

38.2. B. rYX3.X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX22)(1 - rX2X32)} =
{.3952 - (.7296)(-.2537)}/√{(1 - .72962)(1 - .25372)} = .5803/√{(.4677)(.9356)} = .877.
Comment: Similar to 4, 11/02, Q.12.
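The partial correlation formula used in 38.1 and 38.2 is symmetric in the two regressors, so one helper covers both; a short sketch (my illustration; the function name partial is my own):

    from math import sqrt

    r_y2, r_y3, r_23 = 0.7296, 0.3952, -0.2537     # simple correlations from 38.1-38.2

    def partial(r_ya, r_yb, r_ab):
        # correlation of Y and X_a, controlling for X_b
        return (r_ya - r_yb*r_ab) / sqrt((1 - r_yb**2)*(1 - r_ab**2))

    print(partial(r_y2, r_y3, r_23))    # r for Y and X2 controlling for X3, about 0.934
    print(partial(r_y3, r_y2, r_23))    # r for Y and X3 controlling for X2, about 0.877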

38.3. E. rYX2.X32 = (R2 - rYX32)/(1 - rYX32) = (.9 - .42)/(1 - .42) = .881. |rYX2.X3| = .939.

38.4. D. rYX3.X22 = (R2 - rYX22)/(1 - rYX22) = (.9 - .62)/(1 - .62) = .844. |rYX3.X2| = .919.

38.5. An increase of 1 in X2 will lead to an increase of .00875 in Y.


An increase of 1 in X3 will lead to a decrease of 1.927 in Y.
An increase of 1 in X4 will lead to a decrease of 3444 in Y.
An increase of 1 in X5 will lead to an increase of 2093 in Y.
An increase of 1% in X2 will lead to an increase of 0.649% in Y, near the mean value of each
variable. An increase of 1% in X3 will lead to a decrease of 0.337% in Y, near the mean
value of each variable. An increase of 1% in X4 will lead to a decrease of 0.062% in Y, near
the mean value of each variable. An increase of 1% in X5 will lead to an increase of 0.271%
in Y, near the mean value of each variable.
Since the absolute value of the elasticity associated with X2 is largest, Y is most responsive
to changes in X2.
Since the absolute value of the elasticity associated with X4 is smallest, Y is least responsive
to changes in X4.
.8672 = 75.2% of the variance of Y not accounted for by X3, X4 and X5 is accounted for by X2.
.4712 = 22.2% of the variance of Y not accounted for by X2, X4 and X5 is accounted for by X3.
.5612 = 31.5% of the variance of Y not accounted for by X2, X3 and X5 is accounted for by X4.
.7762 = 60.2% of the variance of Y not accounted for by X2, X3 and X4 is accounted for by X5.
An increase of 1 standard deviation in X2 will lead to an increase of 0.911 standard
deviations in Y. An increase of 1 standard deviation in X3 will lead to a decrease of 0.395
standard deviations in Y. An increase of 1 standard deviation in X4 will lead to a decrease of
0.537 standard deviations in Y. An increase of 1 standard deviation in X5 will lead to an
increase of 0.390 standard deviations in Y.
With respect to the normalized variables the fitted model is:
Y^* = .911X2* - .395X3* - .537X4* + .390X5*.
Since the absolute value of the standardized coefficient associated with X2 is largest, X2 is
the single most important determinant of Y.

38.6. A. rYX3.X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX22)(1 - rX2X32)} =
{-.938 - (.97)(-.878)}/√{(1 - .972)(1 - .8782)} = -.08634/√{(.0591)(.2291)} = -.742.

38.7. B. rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX32)(1 - rX2X32)} =
{.97 - (-.938)(-.878)}/√{(1 - .9382)(1 - .8782)} = .1464/√{(.1202)(.2291)} = .882.

38.8. C. The partial correlation coefficient measures the effect of Xj on Y which is not
accounted for by the other variables. Therefore, an increase of 1 unit in X3 will lead to a
decrease of 0.04 units in Y, that is not accounted for by X2 and X4. An increase of 1 unit in X3
will lead to a decrease of β^3 units in Y. D is false.
The square of the partial correlation coefficient measures the percent of variance of Y
accounted for by the portion of Xj that is uncorrelated with the other variables, that is not
accounted for by the other variables. Therefore, .72 = 49% of the variance of Y not accounted
for by X2 and X3 is accounted for by X4. A is false.
The standardized coefficients are: β^*j = β^j sXj/sY. Thus an increase of one standard deviation
in Xj will lead to a change of β^*j standard deviations in (the fitted) Y.
Therefore, an increase of 1 standard deviation in X2 will lead to an increase of 0.50 standard
deviations in Y. B is false.
The standardized coefficient of X2 has the larger absolute value, and therefore X2 is a more
important determinant of Y than X4. E is false.
The elasticities are Ej = β^j Xj/Y ≅ (∂Y/∂Xj) Xj/Y = (∂Y/Y)/(∂Xj/Xj).
Near the mean of the variables, a 1% change in Xj will lead to a change of about Ej% in Y.
Therefore, (near the mean of the variables) an increase of 1% in X2 will lead to an increase of
(about) 0.20% in Y. C is True.
Comment: Statement C could have been more carefully worded. See pages 98 to 101
of Pindyck and Rubinfeld.

38.9. A. The partial correlation coefficient of Y and X2 controlling for X3 is:


rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX32)(1 - rX2X32)} =
{.6 - (.5)(.4)}/√{(1 - .52)(1 - .42)} = .504.

38.10. C. rYX2.X32 = (R2 - rYX22)/(1 - rYX22).
(-0.4)2 = (R2 - .42)/(1 - .42). ⇒ R2 = .2944.
39.1. Scatterplot of the data:

[Scatterplot omitted: Y from about 60 to 120 against X from 10 to 40.]
Comment: Data taken from Table 8.1 in Applied Regression Analysis by Draper and Smith.

39.2. α^ = 109.874 and β^ = -1.12699.

[Graph omitted: the data together with the fitted line Y = 109.874 - 1.12699X.]
39.3. Residuals = (2.03099, -9.57213, -15.604, -8.73094, 9.03099, -0.334062, 3.41196,
2.52304, 3.14207, 6.66594, 11.0151, -3.73094, -15.604, -13.477, 4.52304, 1.39605,
8.65003,
-5.54031, 30.285, -11.477, 1.39605).
The residual for the 19th observation, (17, 121), has a large absolute value at 30.285.

[Plot omitted: residuals against X; the residual of about 30 for the 19th observation stands out.]

39.4. Studentized Residuals = (0.183968, -0.941583, -1.51081, -0.814263, 0.832863,


-0.0306318, 0.311247, 0.229716, 0.28991, 0.61766, 1.05085, -0.342831, -1.51081, -1.27978,
0.413153, 0.127393, 0.798281, -0.845111, 3.60698, -1.07648, 0.127393).
The studentized residual for the 19th observation, (17, 121), has a large absolute value at
3.607.

[Plot omitted: studentized residuals against X; the value of 3.607 for the 19th observation stands out.]
39.5. DFBETAS = (0.00328426, -0.334798, 0.192386, 0.127884, 0.0148685, -0.00502936,
0.0326571, -0.0225011, -0.054266, 0.101412, -0.228885, 0.0538434, 0.192386, 0.125357,
-0.0404692, -0.0162222, -0.054933, -1.11275, 0.273168, 0.105444, -0.0162222).
The DFBETAS for the 18th observation, (42, 57), of - 1.11 has a large absolute value.

[Plot omitted: DFBETAS against X; the value of about -1.11 for the 18th observation stands out.]
39.6. The values of Cook’s D = (0.000897406, 0.081498, 0.0716581, 0.025616, 0.0177437,
0.0000387763, 0.00313057, 0.00166821, 0.00383195, 0.0154395, 0.0548101, 0.00467762,
0.0716581, 0.0475978, 0.00536122, 0.000573585, 0.0178565, 0.678112, 0.223288,
0.0345189, 0.000573585).

[Plot omitted: Cook's D against X; the value of 0.678 for the 18th observation stands out.]
Cook’s D for the 18th observation, (42, 57), of 0.678 is very large compared to the others.
Observation 18 is very influential.
Cook’s D for the 19th observation, (17, 121), of 0.223 is somewhat large compared to the
others. Observation 19 is somewhat influential.
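The diagnostics in 39.3-39.6 come packaged with most regression software, but for a two-variable regression they can be computed directly. The sketch below is my own illustration (the function name is arbitrary, and packages differ slightly in their exact definitions); it uses the deleted-variance form of the studentized residual and the usual Cook's distance with two fitted parameters. Applied to the 21 data points of 39.1 (not reproduced here), such diagnostics flag observations 18 and 19, as discussed above.

    import numpy as np

    def simple_regression_diagnostics(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        b = np.sum((x - x.mean())*(y - y.mean())) / np.sum((x - x.mean())**2)
        a = y.mean() - b*x.mean()
        e = y - a - b*x                                            # residuals
        h = 1.0/n + (x - x.mean())**2 / np.sum((x - x.mean())**2)  # leverages
        s2 = np.sum(e**2) / (n - 2)
        s2_del = (np.sum(e**2) - e**2/(1 - h)) / (n - 3)           # s^2 with each point deleted
        t_stud = e / np.sqrt(s2_del*(1 - h))                       # studentized residuals
        cooks = e**2 * h / (2 * s2 * (1 - h)**2)                   # Cook's distance
        return h, t_stud, cooks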

42.1. For 5 observations, with first order serial correlation, the covariance matrix of the errors
is:
(1    ρ    ρ2   ρ3   ρ4)         (100    60     36     21.6   12.96)
(ρ    1    ρ    ρ2   ρ3)         (60     100    60     36     21.6 )
(ρ2   ρ    1    ρ    ρ2)  σ2  =  (36     60     100    60     36   )
(ρ3   ρ2   ρ    1    ρ )         (21.6   36     60     100    60   )
(ρ4   ρ3   ρ2   ρ    1 )         (12.96  21.6   36     60     100  )

42.2, 42.3, & 42.4. Var(ε) = (X/2)2 = .25, 1, 2.25, 4.


For heteroscedasticity, the covariance matrix σ2Ω of the errors is diagonal; take σ2 = 1:
(.25 0 0 0)
Ω= (0 1 0 0)
(0 0 2.25 0)
(0 0 0 4)
X = (1  1)
    (1  2)        X'Ω−1X = (5.694  8.333)        (X'Ω−1X)-1 = ( .7385  -.3846)
    (1  3)                 (8.333  16   )                      (-.3846   .2628)
    (1  4)

X'Ω−1Y = (37.333)        β~ = (X'Ω−1X)-1 X'Ω−1Y = (11.42)
         (42    )                                 (-3.32)

Cov[β~] = σ2(X'Ω−1X)-1 = ( .7385  -.3846)
                         (-.3846   .2628)

Var[α^] = .7385. Var[β^] = .2628. Cov[α^, β^] = -.3846.
Comment: Based on 4, 11/00, Q. 31. One could get the same answer using weighted
regression. In this case we are given the variances, rather than something proportional to the
variances. Thus we know σ2Ω. I have taken σ2 = 1. One could instead take σ2 = 1/4 and Ω to
be 4 times what I took. The answers would not be affected.
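For 42.2-42.4, everything except the data vector Y is given, and the coefficient covariance matrix does not depend on Y; a numpy sketch (my illustration) of the generalized least squares algebra. With the observed Y, not reproduced here, the same matrices give the estimates quoted in 42.3.

    import numpy as np

    X = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
    Omega = np.diag([0.25, 1.0, 2.25, 4.0])     # Var(eps_i) = (X_i/2)^2, taking sigma^2 = 1

    Oinv = np.linalg.inv(Omega)
    XtOX = X.T @ Oinv @ X                       # (5.694, 8.333; 8.333, 16)
    cov_beta = np.linalg.inv(XtOX)              # sigma^2 (X' Omega^-1 X)^-1
    print(cov_beta)                             # Var[a^] = .7385, Var[b^] = .2628, Cov = -.3846
    # The GLS estimate would then be cov_beta @ (X.T @ Oinv @ Y), giving (11.42, -3.32) here.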

42.5, 42.6, & 42.7.


X = (1  1)
    (1  4)        X'Ω−1X = ( .1412  -.1529)        (X'Ω−1X)-1 = (7.857   .7143)
    (1  9)                 (-.1529   1.682)                     ( .7143  .6593)

X'Ω−1Y = (-.03529)        β~ = (X'Ω−1X)-1 X'Ω−1Y = (1.429)
         ( 2.388 )                                 (1.549)

Y^ = 1.429 + 1.549 X = (2.978, 7.625, 15.370).
ε^ = Y - Y^ = (0.022, 0.375, -0.370).
An unbiased estimator of σ2 is given by:
(ε^' Ω−1 ε^)/(N - k) = 0.00879/(3 - 2) = 0.00879.

Cov[β~] = σ2(X'Ω−1X)-1 = (0.0691    0.00628)
                         (0.00628   0.00580)

Var[α^] = 0.0691. Var[β^] = 0.00580. Cov[α^, β^] = 0.00628.
Comment: Well beyond what you are likely to be asked on an exam!
With more than three observations, the matrices are larger, but the concept is the same.

43.1. Y = a + bXc. S = Σ(Yi - a - bXic)2.


0 = ∂ S/∂ a = -2Σ (Yi - a - bXic). ⇒ Σ (Yi - a - bXic) = 0.
0 = ∂S/∂b = -2Σ (Yi - a - bXic)Xic. ⇒ Σ (Yi - a - bXic)Xic = 0.
0 = ∂ S/∂ c = -2Σ (Yi - a - bXic)b lnXi Xic. ⇒ Σ (Yi - a - bXic) lnXi Xic = 0.

43.2. Y = a/Xb. lnY = ln(a) - b ln(X).


This is a linear model, V = α + βU, where V = lnY, U = lnX, α = ln(a), and β = -b.
U = {ln(1) + ln(2) + ... + ln(11)}/11 = ln(11!)/11 = 1.59112.
u = (-1.59112, -0.897972, -0.492507, -0.204825, 0.018319, 0.200641, 0.354791, 0.488323,
0.606106, 0.711466, 0.806776)
V = ln(Y) = (6.82437, 5.42935, 4.58497, 3.93183, 3.58352, 3.21888, 2.94444, 2.63906,
2.3979, 2.19722, 2.07944).
Σui2 = 5.55189. ΣuiVi = -11.0581. β^ = ΣuiVi/Σui2 = -1.9918.
V = 3.6210. α^ = 3.6210 - (-1.9918)(1.59112) = 6.7902.
a = exp[α^] = 889.08. b = -β^ = 1.9918.
Comment: Note these parameters do not minimize the sum of squared differences between
the fitted and the original data. The sum of these squared differences is 1006.5.

43.3. Y = a/Xb. S = Σ(Yi - a/Xib)2.


0 = ∂S/∂a = -2Σ (Yi - a/Xib)/Xib. ⇒ a = (Σ Y i/Xib)/(Σ 1/Xi2b).
0 = ∂ S/∂ b = 2Σ (Yi - a/(Xib)a ln(Xi)/Xib. ⇒ a = (Σ Y iln(Xi)/Xib )/(Σ ln(Xi)/Xi2b).

43.4. From the Normal Equations, we want b such that:


(ΣYi/Xib)/(Σ1/Xi2b) = (ΣYiln(Xi)/Xib)/(Σln(Xi)/Xi2b).
Solving numerically, using b = 1.9918 or b = 2 as an initial value, b = 2.026525.
Using either Normal Equation, then a = 920.203.

43.5. Y = a/Xib = 920.203/X2.026525.


X 1 2 3 4 5 6 7 8 9 10 11
Observed 920 228 98 51 36 25 19 14 11 9 8
Fitted 920.20 225.86 99.31 55.43 35.27 24.37 17.83 13.61 10.72 8.66 7.14
Sum of Squared Differences = (920 - 920.20)2 + ... + (8 - 7.14)2 = 29.4.
Y = 129. Σ(Y - Y )2 = 730,282.
R 2 = 1 - 29.4/730282 = .999960.

43.6. f(x) = a/Xb. ∂f/∂a = 1/Xb. ∂f/∂b = -a ln[X]/Xb.


For a = 900 and b = 2, the constructed dependent variable is:
V = Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σ βi,0 (∂f/∂βi)0 = Y - a/Xb + a(1/Xb) + b( -a ln[X]/Xb) =
Y - 1800 ln[X]/X2 = (920, -83.9162, -121.722, -104.958,
-79.8795, -64.588, -52.4824, -44.4843, -37.8272, -32.4465, -27.6712).
The first constructed independent variable is: 1/Xb = 1/X2 =
(1, 1/4, 1/9, 1/16, 1/25, 1/36, 1/49, 1/64, 1/81, 1/100, 1/121).
The second constructed independent variable is: -a ln[X]/Xb = -900ln[X]/X2 = (0, -155.958,
-109.861, -77.9791, -57.9398, -44.794, -35.7412, -29.2421, -24.4136, -20.7233, -17.8356).
Let Z be the matrix with columns equal to the constructed independent variables.
 1.0821 -61.4742  0.991614 0.00118798 
Z’Z =   . (Z’Z)-1 =  .
-61.4742 51312.8  0.00118798 0.0000209116
Z’V = (871.161, 47431.9).
The solution to the matrix equations is: (Z’Z)-1Z’V = (920.204, 2.0268).
Comment: The results of this first iteration are close to the least squares fit: a = 920.203 and
b = 2.026525. If more accuracy was needed, one could perform another iteration.
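The constructed-variable (Gauss-Newton) step of 43.6 is also easy to iterate in code; a numpy sketch (my illustration) using the data of 43.5 and the same starting values a = 900, b = 2:

    import numpy as np

    X = np.arange(1, 12, dtype=float)
    Y = np.array([920., 228., 98., 51., 36., 25., 19., 14., 11., 9., 8.])

    a, b = 900.0, 2.0
    for _ in range(5):                                        # a few Gauss-Newton iterations
        Z = np.column_stack([1/X**b, -a*np.log(X)/X**b])      # partials of a/X^b wrt a and b
        V = Y - a/X**b + Z @ np.array([a, b])                 # constructed dependent variable
        a, b = np.linalg.solve(Z.T @ Z, Z.T @ V)              # regress V on Z (no intercept)
    print(a, b)                                               # about 920.2 and 2.0265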

43.7. Y = a/(X + c)b. S = Σ(Yi - a/(Xi + c)b)2.


0 = ∂ S/∂ a = -2Σ (Yi - a/(Xi + c)b)/(Xi + c)b. ⇒ Σ (Yi - a/(Xi + c)b)/(Xi + c)b = 0.
0 = ∂S/∂b = 2Σ(Yi - a/(Xi + c)b)a ln(Xi + c)/(Xi + c)b.
⇒ Σ (Y i - a/(Xi + c)b )ln(X i + c)/(Xi + c)b = 0.
0 = ∂ S/∂ c = -2Σ (Yi - a/(Xi + c)b)a(-b)/(Xi + c)b+1. ⇒ Σ (Yi - a/(Xi + c)b )/(Xi + c)b+1 = 0.

43.8. Using a computer, the least squares fit is: a = 993.2, b = 2.0743, c = 0.037565.
Comment: One can use the values from the solution to a previous question, as the starting
values: a = 920.203, b = 2.026525, and c = 0. Example adapted from “Extrapolating,
Smoothing, and Interpolating Development Factors,” by Richard Sherman, PCAS 1984.

43.9. Σ(Yi - a/(Xi + c)b)/(Xi + c)b = Σ(Yi - 993.2/(Xi + 0.037565)2.0743)/(Xi + 0.037565)2.0743 =


(920 - 993.2/(1 + 0.037565)2.0743)/(1 + 0.037565)2.0743 + ... =
-0.0555746 + 0.249839 - 0.111144 - 0.217003 + 0.0451745 + 0.0278578 + 0.0288684 +
0.0110243 + 0.00701497 + 0.00580866 + 0.00810089 = -0.00003 ≅ 0.
Σ(Yi - a/(Xi + c)b)ln(Xi + c)/(Xi + c)b =
Σ(Yi - 993.2/(Xi + 0.037565)2.0743)ln(Xi + 0.037565)/(Xi + 0.037565)2.0743 =
-0.0020494 + 0.177824 - 0.123487 - 0.302859 + 0.0730437 + 0.0500883 + 0.0563298 +
0.0229761+ 0.0154427 + 0.0133967+ 0.0194527 = 0.00016 ≅ 0..
Σ(Yi - a/(Xi + c)b)/(Xi + c)b+1 = Σ(Yi - 993.2/(Xi + 0.037565)2.0743)/(Xi + 0.037565)3.0743 =
-0.0535625 + 0.122617 - 0.0365897 - 0.0537461 + 0.00896752 + 0.00461408 +
0.00410204 + 0.0013716 + 0.000776202 + 0.000578692 + 0.000733938 = -0.00014 ≅ 0.

43.10. Y = a/(Xi + c)b = 993.2/(X + 0.037565)2.0743.


X 1 2 3 4 5 6 7 8 9 10 11
Observed 920 228 98 51 36 25 19 14 11 9 8
Fitted 920.06 226.91 99.11 54.92 34.71 23.84 17.35 13.17 10.33 8.31 6.82
Sum of Squared Differences = (920 - 920.06)2 + ... + (8 - 6.82)2 = 26.4.
Y = 129. Σ(Y - Y )2 = 730,282.
R 2 = 1 - 26.4/730282 = .999964.
Comment: By introducing the additional parameter c, the sum of squared differences has
been reduced from 29.4 to 26.4.

43.11 to 43.14. f(X) = 1/(α + X). ∂f/∂α = −1/(α + X)2.


Constructed dependent variable:
V = Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σ βi,0 (∂f/∂βi)0 = Y - 1/(α + X) - α/(α + X)2 =
Y - 1/(1 + X) - 1/(1 + X)2 = (-0.9, -0.05, -.1444).
Constructed independent variable:
U = -1/(α + X)2 = -1/(1 + X)2 = (-1, -1/4, -1/9).
α^ = ΣUiVi/ΣUi2 = .92855/1.07485 = .864.
Constructed dependent variable for the second iteration: V = Y - 1/(α + X) - α/(α + X)2 =
Y - 1/(.864 + X) - .864/(.864 + X)2 = (-1.21481, -0.0851498, -0.154496).
Constructed independent variable for the second iteration:
U = -1/(α + X)2 = -1/(.864 + X)2 = (-1.33959, -0.287812, -0.121914).
α^ = ΣUiVi/ΣUi2 = 1.6707/1.89221 = .883.
Comment: By doing another iteration, one could determine that to three decimal places the
procedure has converged after two iterations.

43.15. D. ∂eβX/∂β = XeβX.


For Y = 11.7 and X = 25, the value of the constructed dependent variable for β0 = 0.1 is:
Y - f(X) + β0(∂f/∂β)0 = 11.7 - e(.1)(25) + (.1)(25)e(.1)(25) = 29.97.
Comment: There would be other pairs of X and Y observed, each of which would have a
corresponding value of the constructed dependent variable. Since there is one parameter in
the model, the summation in the final term of the formula for the constructed dependent
variable only has one term.

43.16. D. For Y = 11.7 and X = 25, the value of the constructed independent variable for
β0 = 0.1 is: (∂f/∂βi)0 = ∂eβX/∂β = XeβX = 25exp[(0.1)(25)] = 304.56.
Comment: There would be other pairs of X and Y observed, each of which would have a
corresponding value of the constructed independent variable. Since there is one parameter
in the model, there is only one constructed independent variable.

44.1. C. f(y) = exp[-(y - µ)2/(2σ2)]/{σ√(2π)}. ln f(Yi) = -(Yi - βXi)2/(2σ2) - ln(σ) - ln(2π)/2.


Loglikelihood is: -Σ(Yi - βXi)2/(2σ2) - n ln(σ) - n ln(2π)/2.
Set the partial derivative of the loglikelihood with respect to β equal to zero:
0 = ΣXi(Yi - βXi)/σ2. ⇒ ΣXiYi = βΣXi2. ⇒ β^ = ΣXiYi/ΣXi2 = 3080/751 = 4.10.
Comment: Matches the linear regression model with no intercept, β^ = ΣXiYi/ΣXi2.

44.2. B. Set the partial derivative of the loglikelihood with respect to σ equal to zero:
0 = Σ(Yi - βXi)2/σ3 - n/σ. ⇒ σ2 = Σ(Yi - βXi)2/n =
{(5 - (1)(4.1))2 + (15 - (5)(4.1))2 + (50 - (10)(4.1))2 + (100 - (25)(4.1))2}/4 = 29.58.
β^ = ΣXiYi/ΣXi2. Var[β^] = Var[ΣXiYi/ΣXi2] = ΣVar[XiYi/ΣXi2] = ΣXi2Var[Yi]/(ΣXi2)2 =
ΣXi2σ2/(ΣXi2)2 = σ2/ΣXi2 = 29.58/751 = .0394.
StdDev[β^] = √.0394 = .198.
Comment: In the linear regression version of this same example, one would estimate the
variance of the regression as: s2 = Σ ^εi 2 / (N - 1) = {(5 - (1)(4.1))2 + (15 - (5)(4.1))2 +
(50 - (10)(4.1))2 + (100 - (25)(4.1))2}/3 = 39.4. This is an unbiased estimate of σ2, which is not
equal to that from maximum likelihood which is biased.

44.3. E. Statement #1 is true. Ordinary Linear Regression is a special case of the


generalized linear model. Weighted Least Squares Regression, in which one can select
whatever weights one wants, is not a special case of the generalized linear model.

44.4. ln(λ) = β0 + β1z. ⇒ λ = exp[β0 + β1z].


For the Poisson Distribution: f(y) = e−λ λy / y!.
ln f(y) = -λ + yln(λ) - ln(y!) = -exp[β0 + β1z] + y(β0 + β1z) - ln(y!).
The loglikelihood is the sum of the contributions from the three observations:
-exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + 4(β0 + β1) + 7(β0 + 2β1) + 8(β0 + 3β1) - ln(4!) -
ln(7!) - ln(8!).
To maximize the loglikelihood, we set its partial derivatives equal to zero.
Setting the partial derivative with respect to β0 equal to zero:
0 = -exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + 19.
Setting the partial derivative with respect to β1 equal to zero:
0 = -exp[β0 + β1] - 2exp[β0 + 2β1] - 3exp[β0 + 3β1] + 42.
Thus we have two equations in two unknowns:
exp[β0 + β1]{1 + exp[β1] + exp[2β1]} = 19.
exp[β0 + β1]{1 + 2exp[β1] + 3exp[2β1]} = 42.
Dividing the second equation by the first equation:
{1 + 2exp[β1] + 3exp[2β1]}/{1 + exp[β1] + exp[2β1]} = 42/19.
⇒ 19 + 38exp[β1] + 57exp[2β1] = 42 + 42exp[β1] + 42exp[2β1].
⇒ 15exp[2β1] - 4exp[β1] - 23 = 0.
Letting v = exp[β1], this equation is: 15v2 - 4v - 23 = 0, with positive solution:
v = (4 + √1396)/30 = 1.3788.
exp[β1] = 1.3788. ⇒ β1 = .3212.
⇒ exp[β0] = 19/{exp[β1] + exp[2β1] + exp[3β1]} = 19/{1.3788 + 1.3788² + 1.3788³} = 3.2197.
⇒ β0 = 1.1693.
λ = exp[β0 + β1z] = exp[β0] exp[β1z] = (3.2197)exp[.3212z].
For z = 1, λ = 4.439. For z = 2, λ = 6.121. For z = 3, λ = 8.440.
Comment: Beyond what you are likely to be asked on your exam.
An ordinary linear regression fit to these same observations turns out to be: y = 2.333 + 2x,
with fitted values: 4.333, 6.333, and 8.333.
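The closed-form steps above are easy to reproduce in Python; the sketch below is my own illustration of 44.4 and simply retraces the algebra rather than using any generalized linear model library.

import math

z = [1, 2, 3]
y = [4, 7, 8]

# The score equations reduce to 15 v^2 - 4 v - 23 = 0, where v = exp[beta_1].
v = (4 + math.sqrt(1396)) / 30                     # positive root, about 1.3788
b1 = math.log(v)                                   # about .3212
b0 = math.log(sum(y) / (v + v**2 + v**3))          # about 1.169

lam = [math.exp(b0 + b1 * zi) for zi in z]         # fitted means, about 4.44, 6.12, 8.44
print(round(b0, 4), round(b1, 4), [round(l, 3) for l in lam])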

44.5. E., 44.6. C. f(y) = exp[-(y - µ)2/(2σ2)]/{σ√(2π)}.


ln f(Yi) = -(Yi - β0 - β1Xi)2/(2σ2) - ln(σ) - ln(2π)/2.
Loglikelihood is: -Σ(Yi - β0 - β1Xi)2/(2σ2) - n ln(σ) - n ln(2π)/2.
Set the partial derivative of the loglikelihood with respect to β0 equal to zero:
0 = Σ(Yi - β0 - β1Xi)/σ2. ⇒ ΣYi = nβ0 + β1ΣXi. ⇒ β0 = Y - β1 X .
Set the partial derivative of the loglikelihood with respect to β1 equal to zero:
0 = ΣXi(Yi - β0 - β1Xi)/σ2. ⇒ ΣXiYi = β0ΣXi + β1ΣXi2. ⇒ ΣXiYi = ( Y - β1 X )ΣXi + β1ΣXi2.

⇒ β^ 1 = {ΣXiYi - Y ΣXi}/{ΣXi2 - X ΣXi} = {255 - (10)(24)}/{174 - (6)(24)} = 15/30 = 0.5.


⇒ β^ 0 = Y - β^ 1 X = 10 - (0.5)(6) = 7.
Comment: Matches the linear regression model with an intercept. In deviations form:
X = 24/4 = 6. x = X - X = -4, -1, 2, 3. Y = 40/4 = 10. y = Y - Y = 0, -4, 1, 3.
β^ = Σxiyi/Σxi2 = 15/30 = 0.5. α^ = Y - β^ X = 10 - (.5)(6) = 7.

44.7. C. Set the partial derivative of the loglikelihood with respect to σ equal to zero:
0 = Σ(Yi - β0 - β1Xi)2/σ3 - n/σ. ⇒ σ2 = Σ(Yi - β0 - β1Xi)2/n = Σ(Yi - 7 - (.5)Xi)2/4 =
{(10 - 7 - (.5)(2))2 + (6 - 7 - (.5)(5))2 + (11 - 7 - (.5)(8))2 + (13 - 7 - (.5)(9))2}/4 = 18.5/4 = 4.625.
σ^ = √ 4.625 = 2.15.
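A brief Python check (again my own, not the author's) of 44.5 through 44.7, using the data that can be read off the residual calculation above: X = (2, 5, 8, 9), Y = (10, 6, 11, 13).

# Maximum likelihood for Y = beta_0 + beta_1 X + Normal error.
X = [2.0, 5.0, 8.0, 9.0]
Y = [10.0, 6.0, 11.0, 13.0]
n = len(X)
xbar = sum(X) / n
ybar = sum(Y) / n

b1 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum((x - xbar) ** 2 for x in X)
b0 = ybar - b1 * xbar
sigma2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(X, Y)) / n   # MLE divides by n, not n - 2
print(b0, b1, sigma2, round(sigma2 ** 0.5, 2))                   # 7.0  0.5  4.625  2.15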

44.8. B., 44.9. A. Let x = X - X = -4, -1, 2, 3, and y = Y - Y = 0, -4, 1, 3.


Then, ΣXiYi - Y ΣXi = ΣXjYj - ΣYjΣXi/n = ΣYj(Xj - X ) = ΣYjxj.
Also, ΣXi2 - X ΣXi = ΣXi(Xi - X ) = ΣXixi = Σ(Xi - X )xi + Σ X xi = Σxi2 + X Σxi = Σxi2 + X (0) = Σxi2.
β^ 1 = {ΣXiYi - Y ΣXi}/{ΣXi2 - X ΣXi} = ΣYixi /Σxi2.
Var[β^ 1] = Var[ΣYixi /Σxi2] = ΣVar[Yixi]/{Σxi2}2 = Σxi2Var[Yi]/{Σxi2}2 = σ2 Σxi2/{Σxi2}2 = σ2/Σxi2 =
4.625/30 = .1542. StdDev[β^ 1] = √.1542 = .393.
β^ 0 = Y - β^ 1 X = (Y1 + Y2 + Y3 + Y4)/4 - (ΣYixi /Σxi2)(6) =
(Y1 + Y2 + Y3 + Y4)/4 - (-4Y1 - Y2 + 2Y3 + 3Y4)(6/30) = 1.05Y1 + .45Y2 - .15Y3 - .35Y4.
Recalling that the Yi are independent and each have variance σ2:
Var[β^ 0] = σ2(1.05² + .45² + .15² + .35²) = 1.45σ2 = (1.45)(4.625) = 6.706.
StdDev[β^ 0] = √6.706 = 2.59.
Comment: One can show in general that Var[β^] = σ2/Σxi2 and Var[α^] = σ2 ΣXi2/(NΣxi2).
While the maximum likelihood results are similar, they do not match linear regression:
Y^ = α^ + β^X = 8, 9.5, 11, 11.5. ε^ = Y - Y^ = 2, -3.5, 0, 1.5. ESS = Σε^i2 = 18.5.
s2 = ESS/(N - 2) = 18.5/(4 - 2) = 9.25.
Var[β^] = s2/Σxi2 = 9.25/30 = .3083. sβ^ = √.3083 = .555.

Var[ α^ ] = s2ΣXi2 /(NΣxi2) = (9.25)(174)/((4)(30)) = 13.41. sα^ = √13.41 = 3.66.
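The variance comparison in 44.8 and 44.9 can also be checked directly; the following lines are my own sketch and use the same X data as above.

X = [2.0, 5.0, 8.0, 9.0]
n = len(X)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)      # sum of squared deviations, 30
SXX = sum(x * x for x in X)                # sum of squares, 174

sigma2 = 4.625                             # maximum likelihood estimate of sigma^2 from 44.7
s2 = 18.5 / (n - 2)                        # usual regression estimate, 9.25

print(round((sigma2 / Sxx) ** 0.5, 3))              # StdDev of the MLE slope, about .393
print(round((sigma2 * SXX / (n * Sxx)) ** 0.5, 2))  # StdDev of the MLE intercept, about 2.59
print(round((s2 / Sxx) ** 0.5, 3))                  # regression s_beta, about .555
print(round((s2 * SXX / (n * Sxx)) ** 0.5, 2))      # regression s_alpha, about 3.66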



44.10. For a Poisson, f(n) = e−λλn/n!.


ln f(n) = −λ + nlnλ - ln(n!) = -exp[β0 + β1X1i + β2X2i] + ni(β0 + β1X1i + β2X2i) - ln(ni!).
loglikelihood = -Σexp[β0 + β1X1i + β2X2i] + ΣYi(β0 + β1X1i + β2X2i) + constants.
Setting the partial derivatives of the loglikelihood with respect to β0, β1, and β2 equal to zero:
0 = -Σexp[β0 + β1X1i + β2X2i] + ΣYi.
0 = -ΣX1iexp[β0 + β1X1i + β2X2i] + ΣYiX1i.
0 = -ΣX2iexp[β0 + β1X1i + β2X2i] + ΣYiX2i.
ΣYi = 8 + 8 + 10 + .... + 33 + 31 = 369.
ΣYiX1i = 8ln(2) + 8ln(4) + 10ln(6) + .... + 33ln(18) + 31ln(20) = 872.856.
ΣYiX2i = 14 + 19 + .... + 33 + 31 = 241.
exp[β0 + β1X1i + β2X2i] = exp[β0]exp[β1X1i]exp[β2X2i] = exp[β0]exp[X1i]^β1 exp[β2X2i].
The first equation becomes:
exp[β0]{2^β1 + 4^β1 + ... + 20^β1 + 2^β1 exp[β2] + 4^β1 exp[β2] + ... + 20^β1 exp[β2]} = 369. ⇒
exp[β0](1 + exp[β2]){2^β1 + 4^β1 + 6^β1 + ... + 20^β1} = 369.
The second equation becomes:
exp[β0](1 + exp[β2]){ln(2)2^β1 + ln(4)4^β1 + ln(6)6^β1 + ... + ln(20)20^β1} = 872.856.
The third equation becomes:
exp[β0]exp[β2]{2^β1 + 4^β1 + 6^β1 + ... + 20^β1} = 241.
Comment: Well beyond what you should be asked on your exam!
A Poisson variable with a logarithmic link function.
Dividing the 1st and 3rd equations:
(1 + exp[β2])/exp[β2] = 369/241. ⇒ exp[-β2] = 128/241. ⇒ β2 = ln(241/128) = .6328.
Using a computer, the fitted parameters are: β0 = 1.684, β1 = .3784, β2 = .6328.
One can verify that these values satisfy the three equations.
Example taken from Applied Regression Analysis by Draper and Smith.
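Although fitting this model from scratch requires a numerical routine, it is easy to confirm (my own check, using only the summary statistics derived above) that the reported parameters approximately satisfy the three score equations:

import math

b0, b1, b2 = 1.684, 0.3784, 0.6328
doses = range(2, 21, 2)                         # 2, 4, ..., 20, so that X1 = ln(dose)

S = sum(c ** b1 for c in doses)                 # sum of dose^beta_1
SL = sum(math.log(c) * c ** b1 for c in doses)  # sum of ln(dose) times dose^beta_1

print(round(math.exp(b0) * (1 + math.exp(b2)) * S, 1))    # close to 369
print(round(math.exp(b0) * (1 + math.exp(b2)) * SL, 1))   # close to 872.856
print(round(math.exp(b0) * math.exp(b2) * S, 1))          # close to 241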

44.11. p/(1-p) = exp[β0 + β1X]. ⇒ 1/p - 1 = exp[-β0 - β1X].


⇒ p = 1/(1 + exp[-β0 - β1X]). ⇒ 1 - p = exp[-β0 - β1X]/(1 + exp[-β0 - β1X]) = 1/(1 + exp[β0 +
β1X]).
For a Binomial, f(n) = pn(1-p)m-n m!/{(n!)(m-n)!}.
ln f(n) = n lnp + (m-n)ln(1-p) + ln(m!) - ln(n!) - ln[(m-n)!] = n ln[p/(1-p)] + m ln(1-p) + constants =
n(β0 + β1X) - m ln[(1 + exp[β0 + β1X])] + constants.
loglikelihood = Σni(β0 + β1Xi) - Σmi ln[(1 + exp[β0 + β1Xi])] + constants.
Setting the partial derivatives of the loglikelihood with respect to β0 and β1 equal to zero:
0 = Σni - Σmi exp[β0 + β1Xi]/(1 + exp[β0 + β1Xi]).
0 = ΣniXi - Σmi Xi exp[β0 + β1Xi]/(1 + exp[β0 + β1Xi]).
Σni = 900 + 820 + 740 + 660 + 580 = 3700.
ΣniXi = (1)(900) + (2)(820) + (3)(740) + (4)(660) + (5)(580) = 10,300.
The first equation becomes:
3700 = 1000/(1 + exp[-β0 - β1]) + 900/(1 + exp[-β0 - 2β1]) + 800/(1 + exp[-β0 - 3β1])
+ 700/(1 + exp[-β0 - 4β1]) + 600/(1 + exp[-β0 - 5β1]).
The second equation becomes:
10300 = 1000/(1 + exp[-β0 - β1]) + 1800/(1 + exp[-β0 - 2β1]) + 2400/(1 + exp[-β0 - 3β1])
+ 2800/(1 + exp[-β0 - 4β1]) + 3000/(1 + exp[-β0 - 5β1]).
Comment: An example of a Logistic Regression; a Binomial with a logit link function.
Using a computer, the maximum likelihood fit is: β0 = 1.885 and β1 = .2455.
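As with the previous solution, one can verify (my own check, not from the textbook) that the reported logistic fit approximately satisfies both score equations:

import math

X = [1, 2, 3, 4, 5]
m = [1000, 900, 800, 700, 600]      # numbers of trials
n = [900, 820, 740, 660, 580]       # numbers of successes
b0, b1 = 1.885, 0.2455              # reported maximum likelihood fit

p = [1.0 / (1.0 + math.exp(-b0 - b1 * x)) for x in X]      # fitted probabilities

print(round(sum(mi * pi for mi, pi in zip(m, p)), 1))              # close to 3700
print(round(sum(mi * xi * pi for mi, xi, pi in zip(m, X, p)), 1))  # close to 10300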

44.12. B. For the Poisson, f(n) = e−λλn/n!. ln f(n) = -λ + n ln(λ) - ln(n!). The loglikelihood is:
-34β + 2 ln(34β) - ln(2!) - 38β + 1 ln(38β) - ln(1!) - 45β + 0 ln(45β) - ln(0!) - 25β + 3 ln(25β)
- ln(3!) - 21β + 3 ln(21β) - ln(3!) = -163β + 9 ln(β) + constants.
Setting the partial derivative of the loglikelihood with respect to β equal to zero:
0 = -163 + 9/β. ⇒ β^ = 9/163. More generally, β^ = ΣYi / ΣXi = ΣYi / 163.
Var[β^] = Var[ΣYi / 163] = Var[ΣYi]/163² = ΣVar[Yi]/163² = Σµi /163² = ΣβXi /163² = βΣXi /163²
= β(163)/163² = 9/163² = .000338. StdDev[β^] = √.000338 = .0184.
Alternately, Information ≅ - ∂2 loglikelihood / ∂β2 = 9/β2.
Var[β^] ≅ 1/Information = β²/9 = (9/163)²/9. StdDev[β^] = (9/163)/3 = .0184.
Comment: Generalized Linear Model, with a Poisson Distribution and an identity link
function. Since Yi is Poisson distributed, Var[Yi] = E[Yi] = µi.
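A final Python check (mine) of 44.12, with the data read off the loglikelihood above: X = (34, 38, 45, 25, 21) and Y = (2, 1, 0, 3, 3).

# Poisson with identity link: the mean of Y_i is beta X_i.
X = [34, 38, 45, 25, 21]
Y = [2, 1, 0, 3, 3]

beta = sum(Y) / sum(X)               # MLE, 9/163, about .0552
var_beta = sum(Y) / sum(X) ** 2      # Var = beta Sum(X) / Sum(X)^2 = 9/163^2
print(round(beta, 4), round(var_beta ** 0.5, 4))     # about .0552 and .0184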

While these solutions are believed to be correct, anyone can make a mistake. If you believe
you’ve found something that may be wrong, send any corrections or comments to:
Howard Mahler, Email: hmahler@mac.com
