
ST102 Outline solutions to Exercise 19 (2013-14)

1. We first estimate the slope:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
= \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = 0.0735,
\]

and then estimate the intercept to be \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.588\). Hence the fitted regression
line is \(\hat{y} = 2.588 + 0.0735x\), which may be interpreted as follows: a one cent increase in
the price of gasoline provides an increase in sales revenue of $73,500.
2. Using the same formulae as in Question 1 above, we obtain the fitted regression line:
\(\hat{y} = 19.454 + 0.183x\).
Plotting it together with the original data, we obtain

[Figure: scatter plot of Death rate against Percentage over 60 MPH, with the fitted regression line.]

The figure clearly indicates that there is a positive correlation between the death rate and
the volume of speeding (i.e. over 60 mph).
Note: Hand-drawn graphs are acceptable.
3. (a) Here is the plot, together with the fitted regression line obtained from part (b). The
residual plot for part (d) is also given. (Note: Hand-drawn graphs are acceptable.)

[Figure: panel (a) shows Cost against Year (after 1960) with the fitted regression line; panel (d) shows Residual against Year (after 1960).]

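(b) We obtain \(n = 7\), \(\sum_i x_i = 137\), \(\sum_i y_i = 945.4\), \(\sum_i x_i^2 = 3299\), \(\sum_i y_i^2 = 195715.2\) and
\(\sum_i x_i y_i = 24855.7\).

\[
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
= \frac{\sum_i x_i y_i - \sum_i x_i \sum_j y_j / n}{\sum_i x_i^2 - (\sum_i x_i)^2 / n} = 10.28,
\]
\[
\hat{\beta}_0 = \sum_i y_i / n - \hat{\beta}_1 \sum_i x_i / n = -66.22.
\]
Hence the fitted model is \(\hat{y} = -66.22 + 10.28x\).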
This indicates that social security cost increases by about $10.28 billion every year.
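The arithmetic in part (b) can be checked with a short script. This is only a sketch: the function name `ols_from_sums` and the variable names are illustrative choices, not part of the original solution, and the inputs are the summary statistics quoted in part (b).

```python
# Summary statistics quoted in part (b).
n = 7
sum_x = 137.0
sum_y = 945.4
sum_x2 = 3299.0
sum_xy = 24855.7


def ols_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (intercept, slope) of the least-squares line from summary sums.

    Hypothetical helper, written only to illustrate the formulae above.
    """
    s_xy = sum_xy - sum_x * sum_y / n   # equals sum of (x_i - xbar)(y_i - ybar)
    s_xx = sum_x2 - sum_x ** 2 / n      # equals sum of (x_i - xbar)^2
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


beta0_hat, beta1_hat = ols_from_sums(n, sum_x, sum_y, sum_x2, sum_xy)
print(f"slope     = {beta1_hat:.2f}")
print(f"intercept = {beta0_hat:.2f}")
```

Rounded to two decimals, the slope is 10.28 and the intercept is -66.22, so the fitted line has a negative intercept.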
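(c) We first have to estimate \(\sigma\) using
\[
\hat{\sigma} = \Big\{ \sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 / (n - 2) \Big\}^{1/2} = 23.22.
\]
Note that
\[
\sum_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 = \sum_i y_i^2 + n\hat{\beta}_0^2 + \hat{\beta}_1^2 \sum_i x_i^2
- 2\hat{\beta}_0 \sum_i y_i - 2\hat{\beta}_1 \sum_i x_i y_i + 2\hat{\beta}_0 \hat{\beta}_1 \sum_i x_i.
\]
The standard error of \(\hat{\beta}_1\) is then estimated:
\[
\mathrm{S.E.}(\hat{\beta}_1) = \hat{\sigma} \Big/ \Big\{ \sum_i (x_i - \bar{x})^2 \Big\}^{1/2} = 0.93.
\]
The test statistic for testing \(H_0 : \beta_1 = 0\) is \(T = \hat{\beta}_1 / \mathrm{S.E.}(\hat{\beta}_1)\). Under \(H_0\), \(T \sim t_{n-2} = t_5\).
Since \(t = 10.28/0.93 = 11.05 > 3.365 = t_{0.01,\,5}\), we reject the null hypothesis.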
There is strong evidence indicating that social security costs increase over time.
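The quantities in part (c) can be reproduced from the same summary statistics. The sketch below uses the expansion of the residual sum of squares noted in part (c); the variable names are illustrative, and the critical value 3.365 is simply copied from the solution rather than computed. Because the script carries unrounded estimates through, its t statistic is close to, but not exactly, the 11.05 obtained from the rounded ratio 10.28/0.93.

```python
import math

# Summary statistics quoted in part (b).
n = 7
sum_x, sum_y = 137.0, 945.4
sum_x2, sum_y2, sum_xy = 3299.0, 195715.2, 24855.7

# Least-squares estimates, as in part (b).
s_xx = sum_x2 - sum_x ** 2 / n
beta1_hat = (sum_xy - sum_x * sum_y / n) / s_xx
beta0_hat = sum_y / n - beta1_hat * sum_x / n

# Residual sum of squares via the six-term expansion noted in part (c).
rss = (sum_y2 + n * beta0_hat ** 2 + beta1_hat ** 2 * sum_x2
       - 2 * beta0_hat * sum_y - 2 * beta1_hat * sum_xy
       + 2 * beta0_hat * beta1_hat * sum_x)

sigma_hat = math.sqrt(rss / (n - 2))      # estimate of sigma
se_beta1 = sigma_hat / math.sqrt(s_xx)    # standard error of the slope
t_stat = beta1_hat / se_beta1             # test statistic for H0: beta1 = 0

T_CRIT = 3.365  # t_{0.01, 5}, value taken from the solution text
reject_h0 = t_stat > T_CRIT

print(f"sigma_hat = {sigma_hat:.2f}, S.E. = {se_beta1:.2f}, t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```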
(d) The residual plot shows a clear non-random pattern, indicating inadequacy of the
linear model. Looking at the original data plot in (a), we would think about
applying a log-transformation, i.e. let z = log(y), in order to accommodate the
non-linear relationship between y and x.

Additional note: Unfortunately, the non-linearity in this dataset cannot be removed
using a simple log-transformation. Fitting a linear regression of \(z = \log(y)\) on \(x\) results
in the estimated model \(\hat{z} = 2.434 + 0.106x\), with \(\hat{\sigma} = 0.173\). The plots below indicate
that there is still some pattern in the residuals.

[Figure: left panel shows log(cost) against Year (after 1960) with the fitted line; right panel shows Residual against Year (after 1960).]

4. The fitted linear regression model is \(\hat{y} = 2.32 + 0.64x\), with \(\hat{\sigma} = 2.05\). We plot \(y\) against
\(x\) together with the fitted line, and the residuals against \(x\). The residual plot indicates
an increasing variance with \(x\), which is not consistent with the assumption of a constant
variance for the error term.

[Figure: left panel shows y against x with the fitted line; right panel shows Residual against x.]

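5. We first note \(E(Y) = \beta_0 + \beta_1 E(X)\) and \(Y - E(Y) = (X - E(X))\beta_1 + \varepsilon\). Hence,

\[
\begin{aligned}
\mathrm{Cov}(X, Y) &= E\big((X - E(X))(Y - E(Y))\big) \\
&= E\big((X - E(X))(X - E(X))\beta_1\big) \\
&= \beta_1 \mathrm{Var}(X),
\end{aligned}
\]
where the term involving \(\varepsilon\) drops out of the second line provided \(\varepsilon\) is uncorrelated with \(X\).
Therefore, \(\beta_1 = \mathrm{Cov}(X, Y)/\mathrm{Var}(X)\). The second equality follows from the fact that
\(\mathrm{Corr}(X, Y) = \mathrm{Cov}(X, Y)/(\mathrm{Var}(X)\mathrm{Var}(Y))^{1/2}\).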
Additional note: The first equality resembles the estimator
\[
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},
\]
although in the standard regression model \(y = \beta_0 + \beta_1 x + \varepsilon\), \(x\) is assumed to be fixed
(to make the inference easier). Otherwise \(\hat{\beta}_0\) and \(\hat{\beta}_1\) are no longer linear estimators, for
example. The second equality reinforces the fact that \(\beta_1 > 0\) iff \(y\) and \(x\) are positively
correlated.
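A quick simulation can illustrate the identities in Question 5. Everything in this sketch is an assumption made for illustration: the parameter values `BETA0` and `BETA1`, the normal distributions for X and the error term, the sample size and the seed. The form `corr_xy * (var_y / var_x) ** 0.5` is obtained by combining the two facts stated above (the covariance formula for the slope and the definition of correlation).

```python
import random

random.seed(0)  # arbitrary seed, used only for reproducibility

BETA0, BETA1 = 1.0, 2.0   # hypothetical true parameters
N = 20000                 # hypothetical sample size

# Simulate Y = BETA0 + BETA1 * X + error, with the error independent of X.
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [BETA0 + BETA1 * x + random.gauss(0.0, 1.0) for x in xs]


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance with divisor n; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


var_x, var_y, cov_xy = cov(xs, xs), cov(ys, ys), cov(xs, ys)

slope_from_cov = cov_xy / var_x                     # Cov(X, Y) / Var(X)
corr_xy = cov_xy / (var_x * var_y) ** 0.5           # Corr(X, Y)
slope_from_corr = corr_xy * (var_y / var_x) ** 0.5  # same slope via Corr(X, Y)

print(slope_from_cov, slope_from_corr, corr_xy)
```

With a sample this large, both slope expressions should land close to the chosen `BETA1`, and they agree with each other up to floating-point error because the second is an algebraic rearrangement of the first. The sign of `corr_xy` matches the sign of the slope.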
