
Department of Statistical Sciences
University of Cape Town
STA2020F Class Test 1
Time allowed: 1 hour 30 minutes
Marks: 45
Date: 15 March 2010

Instructions: Answer all questions. A formula sheet and the necessary tables are attached.

Question 1

We wish to analyse data from an experiment in which we compare five insecticide types (Temephos, Malathion, Fenitrothion, Fenthion, and Chlorpyrifos). Each insecticide is applied once in each of seven Caribbean locations (Anguilla, Antigua, Dominica, Guyana, Jamaica, St. Lucia, and Suriname). The variable measured is the dosage required to kill 50% of a batch of mosquito larvae. A partially completed ANOVA table is given:

Source        DF   SS        MS        F      P-value
Insecticide    4   39.284    9.82100   2.85   0.046
Location       6   56.735    9.45590   2.74
Error         24   82.676    3.44483
Total         34   178.695
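As a check on the table, each mean square is SS/DF, and each F ratio is that mean square divided by the error mean square. The minimal Python sketch below recomputes these from the DF and SS columns. The least significant difference at the end is an assumption about method: it uses Fisher's LSD with each insecticide mean taken over the 7 locations, and the critical value t(0.025, 24) = 2.064 is a standard table value supplied here, not part of the paper. The function names are illustrative.

```python
import math

# (DF, SS) for each source, copied from the Question 1 ANOVA table.
SOURCES = {"Insecticide": (4, 39.284), "Location": (6, 56.735)}
DF_ERROR, SS_ERROR = 24, 82.676


def anova_check(sources, df_error, ss_error):
    """Return {source: (MS, F)} with MS = SS/DF and F = MS/MS_error."""
    ms_error = ss_error / df_error
    return {name: (ss / df, (ss / df) / ms_error) for name, (df, ss) in sources.items()}


def fisher_lsd(t_crit, ms_error, n_per_mean):
    """Fisher's LSD for two means, each based on n_per_mean observations."""
    return t_crit * math.sqrt(2 * ms_error / n_per_mean)


results = anova_check(SOURCES, DF_ERROR, SS_ERROR)
for name, (ms, f_ratio) in results.items():
    print(f"{name:12s} MS = {ms:.4f}  F = {f_ratio:.2f}")

# The total SS should equal the sum of the component sums of squares.
total_ss = sum(ss for _, ss in SOURCES.values()) + SS_ERROR
print(f"Total SS = {total_ss:.3f}")

# Assumption: Fisher's LSD, t(0.025, 24) = 2.064 from t tables, 7 locations per mean.
lsd = fisher_lsd(2.064, SS_ERROR / DF_ERROR, 7)
print(f"LSD (5%) = {lsd:.3f}")
```

The Location mean square comes out as 9.4558 rather than the printed 9.45590; the difference is in the fifth significant figure and is presumably due to rounding.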

a) What null hypotheses are being tested in this experiment? (2)
b) Which are the treatments in the experiment? Give a reason for your answer. (2)
c) What is a P-value? (2)
d) Suggest one reason why the researchers decided to use blocking in their design. (1)

e) Was the decision to use blocking a good one? Explain. Your explanation should include a P-value (as far as the tables allow you to determine it). (3)
f) By how much would any pair of treatment means have to differ for us to declare them significantly different at the 5% level? (3)
[13]

Question 2

Suppose you want to determine whether the brand of laundry detergent used and the water temperature affect the amount of dirt removed. You buy two different brands of detergent (Super and Best) and choose three different temperature levels (cold, warm, and hot). Four items of dirty clothing are randomly assigned to each combination of detergent and temperature; each item is washed, and the amount of dirt removed is recorded:

Temperature   Super            Best
Cold          4, 5, 6, 5       6, 6, 4, 4
Warm          7, 9, 8, 12      13, 15, 12, 12
Hot           10, 12, 11, 9    12, 13, 10, 13

a) What type of experimental design is this? Give as much detail in your answer as possible. (2)
b) A portion of the ANOVA table produced is shown:

Source of Variation   DF   SS        MS        F value
deter                  1   20.167    20.167     9.8108
temp                   2   200.333   100.167   48.7297
deter.temp             2   16.333    8.167      3.9730
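The sums of squares in this partial table can be reproduced from the raw data with the totals formulas on the formula sheet. The Python sketch below does this. The formula sheet lists only the Factor A and Factor B rows, so the interaction SS (cell-totals SS minus SSA and SSB) and the error SS (total SS minus cell-totals SS) are computed by the standard subtraction, which is an assumption about the intended method. The sketch also returns the cell means, which are the points one would plot for part (c). Function names are illustrative.

```python
# Dirt removed, keyed by (detergent, temperature); 4 items per cell.
DATA = {
    ("Super", "Cold"): [4, 5, 6, 5],
    ("Best", "Cold"): [6, 6, 4, 4],
    ("Super", "Warm"): [7, 9, 8, 12],
    ("Best", "Warm"): [13, 15, 12, 12],
    ("Super", "Hot"): [10, 12, 11, 9],
    ("Best", "Hot"): [12, 13, 10, 13],
}


def level_ss(data, factor, cf):
    """SS for one factor: sum of (level total)^2 / (level count), minus the correction term."""
    totals, counts = {}, {}
    for key, obs in data.items():
        level = key[factor]
        totals[level] = totals.get(level, 0) + sum(obs)
        counts[level] = counts.get(level, 0) + len(obs)
    ss = sum(totals[lvl] ** 2 / counts[lvl] for lvl in totals) - cf
    return ss, len(totals) - 1


def factorial_anova(data):
    """Two-factor ANOVA as {source: (DF, SS, MS, F)}; F is None for Error."""
    obs = [x for cell in data.values() for x in cell]
    n = len(obs)
    cf = sum(obs) ** 2 / n              # (sum x)^2 / N
    ss_a, df_a = level_ss(data, 0, cf)  # detergent
    ss_b, df_b = level_ss(data, 1, cf)  # temperature
    ss_cells = sum(sum(c) ** 2 / len(c) for c in data.values()) - cf
    # Assumption: interaction and error by the usual subtraction.
    ss_ab, df_ab = ss_cells - ss_a - ss_b, df_a * df_b
    ss_error = sum(x * x for x in obs) - cf - ss_cells
    df_error = n - len(data)
    ms_error = ss_error / df_error
    table = {
        name: (df, ss, ss / df, (ss / df) / ms_error)
        for name, df, ss in [("deter", df_a, ss_a), ("temp", df_b, ss_b),
                             ("deter.temp", df_ab, ss_ab)]
    }
    table["Error"] = (df_error, ss_error, ms_error, None)
    return table


table = factorial_anova(DATA)
for source, (df, ss, ms, f) in table.items():
    f_text = "" if f is None else f"{f:.4f}"
    print(f"{source:11s} {df:3d} {ss:9.3f} {ms:9.3f} {f_text}")

# Cell means: plot against temperature, one line per detergent, for part (c).
cell_means = {key: sum(obs) / len(obs) for key, obs in DATA.items()}
print(cell_means)
```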

There are three sets of null and alternative hypotheses. Clearly state, test and write a conclusion for each, giving a p-value to the extent that the tables allow. (6)
c) Construct a graph to illustrate the presence (or absence) of interaction between the treatment factors. (4)
[12]

Question 3

Suppose that the management of a chain of package delivery stores would like to develop a model for predicting the weekly sales (in thousands of Rands) for individual stores based on the number of customers. A random sample of 20 stores was selected from among all the stores in the chain, and the weekly sales and number of customers were recorded. The data are presented below:

Customers        907    926    506    741    789    889    874    510    529    420
Sales (R'000s)   11.2   11.05  6.84   9.21   9.42   10.08  9.45   6.73   7.24   6.12

Customers        679    872    924    607    452    729    794    844    1010   621
Sales (R'000s)   7.63   9.43   9.46   7.64   6.92   8.95   9.33   10.23  11.77  7.41

A regression analysis was conducted in Excel and the following output was produced:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9549
R Square             0.9119
Adjusted R Square    0.9070
Standard Error       0.5015
Observations         20

ANOVA
              df   SS        MS        F          Significance F
Regression     1   46.8335   46.8335   186.2188   0.0000
Residual      18    4.5270    0.2515
Total         19   51.3605

              Coefficients   Standard Error   t Stat    P-value
Intercept     2.4230         0.4810            5.0379   0.0001
Customers     0.0087         0.0006           13.6462   *****
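The coefficients and summary statistics in this output can be reproduced from the data with the formula-sheet expressions for $SS_x$, $SS_y$, $SS_{xy}$, $b_1$, $b_0$ and $r_{xy}$. The Python sketch below does that and also computes the correlation test statistic $t = r\sqrt{(n-2)/(1-r^2)}$, which in simple linear regression equals the t Stat for the slope. The function name `fit_line` is illustrative.

```python
import math

# Weekly customers and sales (R'000s) for the 20 sampled stores.
CUSTOMERS = [907, 926, 506, 741, 789, 889, 874, 510, 529, 420,
             679, 872, 924, 607, 452, 729, 794, 844, 1010, 621]
SALES = [11.2, 11.05, 6.84, 9.21, 9.42, 10.08, 9.45, 6.73, 7.24, 6.12,
         7.63, 9.43, 9.46, 7.64, 6.92, 8.95, 9.33, 10.23, 11.77, 7.41]


def fit_line(x, y):
    """Least-squares line via the shortcut formulas on the formula sheet."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    ss_x = sum(a * a for a in x) - sx ** 2 / n
    ss_y = sum(b * b for b in y) - sy ** 2 / n
    b1 = ss_xy / ss_x
    b0 = sy / n - b1 * sx / n
    r = ss_xy / math.sqrt(ss_x * ss_y)
    sse = ss_y - b1 * ss_xy            # residual sum of squares
    s_e = math.sqrt(sse / (n - 2))     # standard error of estimate
    t_r = r * math.sqrt((n - 2) / (1 - r ** 2))  # correlation t statistic
    return {"b0": b0, "b1": b1, "r": r, "r2": r ** 2,
            "sse": sse, "s_e": s_e, "t": t_r}


fit = fit_line(CUSTOMERS, SALES)
for key, value in fit.items():
    print(f"{key:3s} = {value:.4f}")
```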

a) Write out the equation of the fitted model and give an interpretation of each of the parameter estimates. (4)
b) Use the equation to find the residual for the first observation. (2)
c) Give the null and alternative hypotheses associated with the p-value marked ***** above. (2)
d) Use tables to approximate the value of *****. (1)
e) What may be said about the fit of the model? Comment briefly. (3)
f) The output shows the value of Pearson's correlation coefficient. Interpret this value. (1)
[13]

Question 4

The following plot of residuals against fitted values was produced after a simple linear regression model had been fitted:

[Figure: Plot of Residuals against Predicted Values. Vertical axis: RESIDUAL (ticks from -20 to 20); horizontal axis: ESTIMATE (ticks from 40 to 80).]

Discuss the conclusions that may be drawn from this plot about the usual regression assumptions. If you feel that one or more of the assumptions may be invalid, what can be done about it? [7]

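Formulae

ANOVA 1 (RBD)

Source of variation   SS                            d.f.
Treatments            $\sum T_i^2/r - GT^2/n$       $t-1$
Blocks                $\sum B_j^2/t - GT^2/n$       $r-1$
Error                 by subtraction                $(r-1)(t-1)$
Total                 $\sum x^2 - GT^2/n$           $rt-1 = n-1$

where t = number of treatments, r = number of blocks, $T_i$ = treatment totals, $B_j$ = block totals, GT = grand total and n = rt.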
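The RBD formulas in the table above translate directly into code. The Python sketch below assumes the data are arranged as a list of rows, one row per treatment and one column per block; that layout and the function name `rbd_anova` are illustrative choices, not part of the formula sheet.

```python
def rbd_anova(rows):
    """Randomised block ANOVA sums of squares, one observation per cell.

    Assumed layout: rows = treatments, columns = blocks.
    Returns {source: (SS, d.f.)} using the formula-sheet expressions.
    """
    t = len(rows)        # number of treatments
    r = len(rows[0])     # number of blocks
    n = r * t
    gt = sum(sum(row) for row in rows)           # grand total GT
    cf = gt ** 2 / n                             # GT^2 / n
    ss_treat = sum(sum(row) ** 2 for row in rows) / r - cf
    block_totals = [sum(row[j] for row in rows) for j in range(r)]
    ss_block = sum(b ** 2 for b in block_totals) / t - cf
    ss_total = sum(x ** 2 for row in rows for x in row) - cf
    ss_error = ss_total - ss_treat - ss_block    # by subtraction
    return {
        "Treatments": (ss_treat, t - 1),
        "Blocks": (ss_block, r - 1),
        "Error": (ss_error, (r - 1) * (t - 1)),
        "Total": (ss_total, n - 1),
    }
```

For Question 1 the input would be 5 rows (insecticides) of 7 values (locations); the raw dosages are not reproduced in the paper.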

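ANOVA 2 (Factorial)

Source of variation   SS                                                       d.f.
Factor A              $\sum_i \frac{T_{a_i}^2}{n_i} - \frac{(\sum x)^2}{N} = SSA$   $a-1$
Factor B              $\sum_j \frac{T_{b_j}^2}{n_j} - \frac{(\sum x)^2}{N} = SSB$   $b-1$

where:
N = total number of observations;
a = number of levels of Factor A;
b = number of levels of Factor B;
$T_{a_i}$ = total for the ith level of Factor A;
$T_{b_j}$ = total for the jth level of Factor B;
$n_i$, $n_j$ = number of observations at the ith level of Factor A and at the jth level of Factor B.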

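Simple Linear Regression and Correlation

$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$

$SS_x = \sum (x_i - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$

$SS_y = \sum (y_i - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$

$r_{xy} = \frac{SS_{xy}}{\sqrt{SS_x SS_y}}$

$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{SS_{xy}}{SS_x}$

$b_0 = \bar{y} - b_1 \bar{x}$

Testing the correlation coefficient:

$t = r\sqrt{\frac{n-2}{1-r^2}}$, with df $= n-2$

Standard error of estimate:

$s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$

Prediction interval:

$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$

Confidence interval:

$\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$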
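The prediction and confidence intervals differ only in the leading 1 under the square root. A Python sketch of both, assuming the caller supplies the fitted coefficients, $s_\varepsilon$, $\bar{x}$, $SS_x = (n-1)s_x^2$ and the critical value $t_{\alpha/2,n-2}$ read from tables; the function name, the argument layout and the 800-customer store in the usage line are illustrative assumptions.

```python
import math


def regression_intervals(b0, b1, s_e, n, x_bar, ss_x, x_g, t_crit):
    """Return (y_hat, confidence interval, prediction interval) at x = x_g.

    ss_x is sum((x_i - x_bar)^2), which equals (n - 1) * s_x^2.
    t_crit is t_{alpha/2, n-2}, supplied by the caller from tables.
    """
    y_hat = b0 + b1 * x_g
    leverage = 1 / n + (x_g - x_bar) ** 2 / ss_x
    ci_half = t_crit * s_e * math.sqrt(leverage)        # for the mean of y at x_g
    pi_half = t_crit * s_e * math.sqrt(1 + leverage)    # for a single new y at x_g
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Illustration: Question 3 output values (b0, b1, s_e, n); x_bar = 731.15 and
# ss_x = 614602.55 computed from the Question 3 data; hypothetical store with
# 800 customers; t(0.025, 18) = 2.101 is a standard table value.
y_hat, ci, pi = regression_intervals(2.4230, 0.0087, 0.5015, 20,
                                     731.15, 614602.55, 800, 2.101)
print(f"y_hat = {y_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), "
      f"95% PI = ({pi[0]:.3f}, {pi[1]:.3f})")
```

The prediction interval is always the wider of the two, because it allows for the scatter of an individual store about the regression line as well as the uncertainty in the fitted line itself.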
