Instructions
Answer ALL 15 questions on the answer sheet provided. Start by writing your name
and UPI in the fields given.
All questions have a single correct answer and carry the same mark value.
If you give more than one answer to any question you will receive zero marks for that
question.
[Figure 1: plot of the data for Question 1; recoverable axis labels: W and Y.]
2. Figure 1 shows a plot of the data for Question 1. Which of the following statements is
not correct:
3. In a linear regression model, which of the following is the most important assumption?
[Figure 2: chicken weight against protein level (0, 1, 2) in four panels: High groundnut, High soybean, Low groundnut, Low soybean.]
4. Figure 2 shows a plot of the chicken weight data set discussed in lecture. The response
is the weight of 24 chickens, which is thought to depend on the type of diet (groundnut
or soybean), the amount of fish solubles (High or Low) and the protein level (0, 1, 2).
Which of the following statements is correct?
6. In a regression, a pair of explanatory variables, X1 and X2, have a correlation of 0.95,
and both have p-values considerably greater than 0.05. Which of the following is the
worst interpretation?
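The situation described in this question can be reproduced with simulated data. The following is a minimal sketch only: the variables x1, x2 and y are made up for illustration and are not exam data. It shows how to inspect the correlation, the individual t-tests, a variance inflation factor and a joint test of both predictors.

```r
# Simulated illustration only: x1, x2 and y are hypothetical, not exam data.
set.seed(330)
n  <- 30
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)        # x2 is almost a copy of x1
y  <- 1 + x1 + x2 + rnorm(n, sd = 3)

fit <- lm(y ~ x1 + x2)
cor(x1, x2)                          # strongly correlated predictors
summary(fit)$coefficients            # individual t-tests for x1 and x2

# Variance inflation factor for x1: 1 / (1 - R^2) from regressing x1 on x2
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)
vif_x1

# Joint test of both predictors against the intercept-only model
anova(lm(y ~ 1), fit)
```

Comparing the individual t-tests with the joint test is one way to see whether large p-values reflect shared information between the predictors rather than a lack of association with the response.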
7. The data for the following questions come from a study of the operation of a plant
oxidising ammonia into nitric acid. The measurements were taken over a period
of 21 days. The variables measured are
Examine the R output below and select the most correct statement on the basis
of this output only.
Call:
lm(formula = stack.loss ~ air.flow + water.temp + acid.conc)
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.61416    8.90213   0.406  0.68982
air.flow     0.07156    0.01349   5.307  5.8e-05 ***
water.temp   0.12953    0.03680   3.520  0.00263 **
acid.conc   -0.15212    0.15629  -0.973  0.34405
---
Residual standard error: 0.3243 on 17 degrees of freedom
Multiple R-squared: 0.9136, Adjusted R-squared: 0.8983
F-statistic: 59.9 on 3 and 17 DF, p-value: 3.016e-09
(a) If the variables acid.conc and water.temp are held constant, the amount of
ammonia escaping tends to be smaller with increased air.flow.
(b) The estimate of the error variance is 0.3243.
(c) The variable acid.conc should be kept in the model.
(d) If the variables acid.conc and air.flow are held constant, the amount of
ammonia escaping tends to be higher with increased water.temp.
(e) The higher acid.conc the more ammonia escapes.
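The quantities in the printed summary are related by simple arithmetic, which can be checked directly. The sketch below copies the numbers from the output above and recomputes the t values, the two-sided p-values on 17 degrees of freedom, and the error variance (the square of the residual standard error). The object names are chosen here for illustration.

```r
# Numbers copied from the printed summary above.
est <- c(air.flow = 0.07156, water.temp = 0.12953, acid.conc = -0.15212)
se  <- c(air.flow = 0.01349, water.temp = 0.03680, acid.conc = 0.15629)
df_resid  <- 17
sigma_hat <- 0.3243                  # residual standard error

sigma_hat^2                          # estimated error variance

t_val <- est / se                    # t value = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df = df_resid)
round(cbind(t_val, p_val), 4)
```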
[Figure 3: diagnostic plots for the model above. Recoverable panel titles: Residuals vs Fitted, Normal Q-Q. Recoverable axis labels: Residuals, Standardized residuals, Cook's distance.]
8. Figure 3 shows diagnostic plots for the model given above. What is most strongly
indicated? Hint: The threshold for hat leverage is 3(k + 1)/n.
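The leverage threshold in the hint can be computed and compared with the hat values of a fitted model. The sketch below assumes that R's built-in `stackloss` data set can stand in for the exam data. The exam's variables appear to be rescaled, so its coefficients differ from what this code produces.

```r
# Assumption: R's built-in `stackloss` data stands in for the exam data.
fit <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

n <- nrow(stackloss)                 # 21 observations
k <- 3                               # number of explanatory variables
threshold <- 3 * (k + 1) / n         # 3(k + 1)/n from the hint
threshold

h <- hatvalues(fit)                  # leverage of each observation
which(h > threshold)                 # observations above the threshold, if any
```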
[Figure 4: Box-Cox plot of the log-likelihood against the transformation power, with the 95% level marked.]
9. After seeing the diagnostic plots in the previous question, we check whether we should
transform the response. The resulting Box-Cox plot is shown in Figure 4. What should
we do?
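A Box-Cox profile like the one in Figure 4 can be computed with `boxcox()` from the MASS package. The sketch below again assumes the built-in `stackloss` data as a stand-in for the exam data, so the resulting interval need not match the figure. It extracts the power with the highest log-likelihood and an approximate 95% interval.

```r
library(MASS)  # boxcox() is provided by MASS

# Assumption: built-in `stackloss` data as a stand-in for the exam data.
fit <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]  # power with the highest log-likelihood
lambda_hat

# Approximate 95% interval: powers whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
inside <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
range(bc$x[inside])
```

If the interval contains 1, the plot gives no strong reason to transform the response. If it contains 0 but not 1, a log transformation is a common choice.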
10. After taking some corrective action, a new model was fitted. Some influence plots for
the new model are shown in Figure 5. Which of the following statements is not a
[Figure 5: influence plots for the new model, one panel each for dfb.1_, dfb.ar.f, dfb.wtr., dfb.acd., DFFITS, ABS(COV RATIO-1), Cook's D and Hats, each plotted against observation number (0 to 20).]
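The measures shown in Figure 5 are all available in base R. The sketch below assumes the built-in `stackloss` data as a stand-in and fits the untransformed model, so it does not reproduce the corrected model behind the figure. It only shows where each quantity comes from.

```r
# Assumption: built-in `stackloss` data as a stand-in; this is not the
# corrected model from the exam.
fit <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)

infl <- influence.measures(fit)
head(round(infl$infmat, 3))          # dfbetas, dffit, cov.r, cook.d, hat

# The individual measures are also available directly
cd      <- cooks.distance(fit)       # Cook's D
df_fits <- dffits(fit)               # DFFITS
cr      <- abs(covratio(fit) - 1)    # |covariance ratio - 1|
h       <- hatvalues(fit)            # hat values (leverage)

which.max(cd)                        # observation with the largest Cook's D
```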
[Figure 6: four scatter plots of y against x, panels (i) to (iv); labelled points include A, C and D.]
12. Figure 6 shows four scatter plots of a variable y vs. x for four different data sets. In
each case a regression of y on x was fitted. Which of the following is false?
14. The last two questions concern a data set where the response (the percent conversion
of n-heptane to acetylene) is related to a categorical variable ratio (the ratio between
n-heptane and acetylene, with levels low, medium, high) and a continuous variable
temperature (in °C). 16 measurements have been taken. We first fitted a parallel
lines model with the following results:
Call:
lm(formula = percent.conv ~ temp + ratio)
---
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  16.8723     2.6040   6.479 3.03e-05 ***
temp          0.3557     0.1767   2.013 0.067083 .
ratiomedium  12.7956     2.4727   5.175 0.000231 ***
ratiohigh    26.6930     2.4959  10.695 1.73e-07 ***
---
Residual standard error: 3.76 on 12 degrees of freedom
Multiple R-squared: 0.9201, Adjusted R-squared: 0.9002
F-statistic: 46.08 on 3 and 12 DF, p-value: 7.348e-07
(a) The fitted line for medium ratio is 13.8974 below the fitted line for high ratio.
(b) The fitted line for low ratio has intercept 16.8723.
(c) The slope is not significantly different from 0.
(d) The fitted slope for all ratios is the same.
(e) The fitted line for medium ratio has slope 29.6679.
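In a parallel lines model, each level of the factor has its own intercept and all levels share one slope. The sketch below copies the coefficients from the output above and builds the three fitted lines, with low as the baseline level. The object names are chosen here for illustration.

```r
# Coefficients copied from the parallel lines output above.
b <- c(intercept = 16.8723, temp = 0.3557,
       ratiomedium = 12.7956, ratiohigh = 26.6930)

# One intercept per ratio level; `low` is the baseline level
intercepts <- c(low    = b[["intercept"]],
                medium = b[["intercept"]] + b[["ratiomedium"]],
                high   = b[["intercept"]] + b[["ratiohigh"]])
intercepts

slope <- b[["temp"]]                 # shared by all three lines
slope

# Vertical gap between the high and medium lines
intercepts[["high"]] - intercepts[["medium"]]
```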
15. Finally, we fit the non-parallel lines model and compare both models.
> summary(acet.full)
Call:
lm(formula = percent.conv ~ temp * ratio)
---
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        7.7663     1.5847   4.901 0.000622 ***
temp               1.2484     0.1425   8.758 5.28e-06 ***
ratiomedium       19.0732     2.0083   9.497 2.55e-06 ***
ratiohigh         45.2763     2.1285  21.271 1.17e-09 ***
temp:ratiomedium  -0.6732     0.1670  -4.031 0.002397 **
temp:ratiohigh    -1.5948     0.1731  -9.215 3.34e-06 ***
---
Residual standard error: 1.26 on 10 degrees of freedom
Multiple R-squared: 0.9925, Adjusted R-squared: 0.9888
F-statistic: 265.4 on 5 and 10 DF, p-value: 2.721e-10
> anova(acet.par,acet.full)
Analysis of Variance Table
(a) There is strong evidence that the lines for the ratios are non-parallel.
(b) Under the parallel lines model, the estimated percentage of conversion for a
medium mixture at a temperature of 15 °C is 22.20718%.
(c) The parallel lines model has a residual sum of squares that is 153.73 higher than
that of the non-parallel lines model.
(d) Under the non-parallel lines model, the estimated percentage of conversion for a
low ratio mixture at a temperature of 10 °C is 20.25032%.
(e) The fitted line for a high ratio in the non-parallel lines model has slope 0.3464.
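Predictions, slopes and the model comparison can all be worked out from the two printed summaries. The sketch below copies the coefficients, residual standard errors and degrees of freedom from the outputs above. The helper functions `pred_par` and `pred_full` are hypothetical names introduced here. Because the residual standard errors are rounded, the residual sums of squares are approximate.

```r
# Coefficients copied from the two printed summaries.
par_b  <- c(16.8723, 0.3557, 12.7956, 26.6930)
full_b <- c(7.7663, 1.2484, 19.0732, 45.2763, -0.6732, -1.5948)

# Hypothetical helpers: fitted value for a temperature and ratio level
pred_par <- function(temp, ratio) {
  par_b[1] + par_b[2] * temp +
    switch(ratio, low = 0, medium = par_b[3], high = par_b[4])
}
pred_full <- function(temp, ratio) {
  shift <- switch(ratio, low = 0, medium = full_b[3], high = full_b[4])
  extra <- switch(ratio, low = 0, medium = full_b[5], high = full_b[6])
  full_b[1] + shift + (full_b[2] + extra) * temp
}

pred_par(15, "medium")               # parallel model, medium ratio, 15 degrees
pred_full(10, "low")                 # non-parallel model, low ratio, 10 degrees
full_b[2] + full_b[6]                # slope of the high-ratio line

# Residual sums of squares recovered as sigma-hat^2 times residual df
rss_par  <- 3.76^2 * 12
rss_full <- 1.26^2 * 10
rss_par - rss_full

# F test of the parallel model against the non-parallel model
f_stat <- ((rss_par - rss_full) / 2) / (rss_full / 10)
pf(f_stat, 2, 10, lower.tail = FALSE)
```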
STATS 330 / STATS 762
Name:
UPI:
a b c d e
1 O O O O O
2 O O O O O
3 O O O O O
4 O O O O O
5 O O O O O
6 O O O O O
7 O O O O O
8 O O O O O
9 O O O O O
10 O O O O O
11 O O O O O
12 O O O O O
13 O O O O O
14 O O O O O
15 O O O O O