
SEAT NUMBER: ……...............….… ROOM: .……..….....…….

FAMILY NAME: .………….....……………………………………….
OTHER NAMES: …………….…...................……….........…..……..
STUDENT NUMBER: …………...................………......……………..

This question paper must be returned.
Candidates are not permitted to remove any part of it from the examination room.

FORMAL EXAMINATION PERIOD: SESSION 2, NOVEMBER 2018

Unit Code: STAT271

Unit Name: Statistics I


Duration of Exam (including reading time if applicable): Three (3) hours plus ten (10) minutes reading time
Total No. of Questions: Nine (9)
Total No. of Pages (including this cover sheet): Thirteen (13)

GENERAL INSTRUCTIONS TO STUDENTS:


• Students are required to follow directions given by the Final Examination Supervisor and must refrain from communicating in any way with another student once they have entered the final examination venue.
• Students may not write or mark the exam materials in any way during reading time.
• Students may only access authorised materials during this examination. A list of authorised material is available on this cover sheet.
• All watches must be removed and placed at the top of the exam desk and must remain there for the duration of the exam. All alarms, notifications and alerts must be switched off.
• Students are not permitted to leave the exam room during the first hour (excluding reading time) and during the last 15 minutes of the examination.
• If it is alleged you have breached these rules at any time during the examination, the matter may be reported to a University Discipline Committee for determination.

EXAMINATION INSTRUCTIONS:
• All questions are to be answered.
• Answers are to be written legibly in the spaces provided in the question paper. If extra space is needed, you may use
the back of the pages. Please indicate clearly (with an arrow) if an answer is continued elsewhere.
• Do not write past the vertical line on the right hand side of the pages.
• Do not write in the resources booklet; it will not be marked. This booklet contains additional information, data and
Minitab printout which will be needed to answer the exam questions. This booklet is to be collected at the end of the
examination.
• The questions are not of equal value. The mark of each question is shown in the brackets beside the question number,
and in the table below.
• Statistical Tables as used in the unit are provided separately. These are to be returned at the end of the examination.

AIDS AND MATERIALS PERMITTED/NOT PERMITTED:
Dictionaries: None
Calculators: Only non-programmable calculators without text retrieval capacity / alphabet on keyboard may be used.
Other: Closed book with specified material permitted: Two (2) A4 sheets of notes, written on one or both sides,
handwritten or typed. These are to be collected (NOT returned to students) at the end of the exam.

Question 1 2 3 4 5 6 7 8 9 Total

Worth 7 7 7 10 24 10 10 10 15 100

Mark
Question 1 [ 7 Marks ]

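(a) Consider the distribution dealt with in Assignment one and the first mid-session test
this year. We considered a random sample $X_1, \ldots, X_n$ from the following pdf:
\[
f(x; \theta) =
\begin{cases}
\frac{1}{2}(1 + \theta x) ; & -1 \le x \le 1 \\
0 ; & \text{o.w.}
\end{cases}
\qquad \text{where } -1 \le \theta \le 1
\]
The method of moments estimator of $\theta$ is $\hat{\theta}_{mom} = 3\bar{X}$, which gives an infeasible
solution unless $-\frac{1}{3} \le \bar{X} \le +\frac{1}{3}$. Consider an “adjusted mom estimator”, where the
estimated value is set to be:
\[
\hat{\theta}_{mom(adj)} =
\begin{cases}
-1 ; & \bar{X} < -\frac{1}{3} \\
3\bar{X} ; & -\frac{1}{3} \le \bar{X} \le +\frac{1}{3} \\
+1 ; & \bar{X} > +\frac{1}{3}
\end{cases}
\]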
Would you expect the variance of the adjusted method of moments estimator to
be less than, the same as, or more than the unadjusted method of moments
estimator? Briefly explain why.
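
One way to build intuition for part (a) is a small simulation. The sketch below is only illustrative: the values of theta, n and the number of replications are arbitrary choices that do not come from the exam, and the sampler inverts the cdf implied by the pdf above. It compares the sample variance of the unadjusted estimates with that of the same estimates truncated to [-1, 1].

```python
import math
import random
import statistics


def draw_x(theta, rng):
    """Draw one value from f(x; theta) = (1 + theta*x)/2 on [-1, 1] by inverting the cdf."""
    u = rng.random()
    if theta == 0:
        return 2 * u - 1  # the pdf reduces to Uniform(-1, 1)
    # Solve theta*x^2 + 2x + (2 - theta - 4u) = 0 for the root lying in [-1, 1].
    return (-1 + math.sqrt(1 - theta * (2 - theta - 4 * u))) / theta


def clip(value, low=-1.0, high=1.0):
    """Truncate a value to the feasible range for theta."""
    return max(low, min(high, value))


rng = random.Random(271)          # fixed seed so the run is repeatable
theta, n, reps = 0.8, 5, 5000     # illustrative choices, not taken from the exam

unadjusted = []
for _ in range(reps):
    xbar = statistics.fmean(draw_x(theta, rng) for _ in range(n))
    unadjusted.append(3 * xbar)   # method of moments estimate
adjusted = [clip(t) for t in unadjusted]

var_unadjusted = statistics.variance(unadjusted)
var_adjusted = statistics.variance(adjusted)
print(f"unadjusted variance: {var_unadjusted:.4f}")
print(f"adjusted variance:   {var_adjusted:.4f}")
```

Because truncation can only pull values closer together, the adjusted variance cannot exceed the unadjusted one in any run of this sketch.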

(b) A random sample of 30 observations is drawn from a normal distribution with
unknown variance. The sample standard deviation is s = 7.5. Calculate a 95%
symmetric confidence interval for the population standard deviation.
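
A minimal sketch of the arithmetic for part (b), assuming "symmetric" means equal tail areas of 2.5% and that the chi-squared quantiles for 29 degrees of freedom are read from standard tables. The two quantile values below are assumed table entries, not values given in the exam.

```python
import math

n = 30
s = 7.5

# Chi-squared quantiles for n - 1 = 29 degrees of freedom (assumed table values):
chi2_lower = 16.047   # lower 2.5% point
chi2_upper = 45.722   # upper 2.5% point

# (n - 1) * s^2 / sigma^2 follows a chi-squared distribution with n - 1 df.
sum_sq = (n - 1) * s ** 2
sigma_low = math.sqrt(sum_sq / chi2_upper)
sigma_high = math.sqrt(sum_sq / chi2_lower)

print(f"95% CI for sigma: ({sigma_low:.2f}, {sigma_high:.2f})")
```

The interval for the variance is computed first; taking square roots of its endpoints gives the interval for the standard deviation.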

Question 2 [ 7 Marks ]

(a) Apply a runs test for randomness to the following 15 observations:


25 14 29 20 21 19 12 30 24 27 33 13 23 31 22
The sample median is 23.
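
The counting step of the runs test in part (a) can be sketched as follows. It assumes the usual convention of dropping observations equal to the median; the comparison with tabulated critical values is left to the statistical tables.

```python
observations = [25, 14, 29, 20, 21, 19, 12, 30, 24, 27, 33, 13, 23, 31, 22]
median = 23

# Label each value as above (+) or below (-) the median.
# Values equal to the median are dropped (assumed convention).
signs = ["+" if value > median else "-" for value in observations if value != median]

n_above = signs.count("+")
n_below = signs.count("-")

# A new run starts at the first sign and at every change of sign.
runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)

# Expected number of runs under randomness.
expected_runs = 2 * n_above * n_below / (n_above + n_below) + 1

print("".join(signs))
print(n_above, n_below, runs, expected_runs)
```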

(b) For a random variable from a Poisson(9) distribution, obtain a “symmetric” interval
that has a minimum probability of 0.90 (but as close to 0.90 as possible) of
containing a random observation from the distribution. State the actual probability.
[Hint: refer to the tabulated cdf for a Poisson (9).]

x 0 1 2 3 4 5 6
P(X≤x) 0.0001 0.0012 0.0062 0.0212 0.0550 0.1157 0.2068

x 7 8 9 10 11 12 13
P(X≤x) 0.3239 0.4557 0.5874 0.7060 0.8030 0.8758 0.9261

x 14 15 16 17 18 19 20
P(X≤x) 0.9585 0.9780 0.9889 0.9947 0.9976 0.9989 0.9996
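
A sketch of the search in part (b), assuming "symmetric" means an interval of the form [9 - k, 9 + k] centred on the mean. The cdf values are copied from the table above; the smallest k that reaches the target probability is reported.

```python
# Tabulated cdf of Poisson(9), P(X <= x), copied from the table in the question.
cdf = {
    0: 0.0001, 1: 0.0012, 2: 0.0062, 3: 0.0212, 4: 0.0550, 5: 0.1157, 6: 0.2068,
    7: 0.3239, 8: 0.4557, 9: 0.5874, 10: 0.7060, 11: 0.8030, 12: 0.8758, 13: 0.9261,
    14: 0.9585, 15: 0.9780, 16: 0.9889, 17: 0.9947, 18: 0.9976, 19: 0.9989, 20: 0.9996,
}
mean = 9
target = 0.90


def interval_probability(half_width):
    """Return (low, high, P(low <= X <= high)) for the interval mean +/- half_width."""
    low, high = mean - half_width, mean + half_width
    below_low = cdf[low - 1] if low >= 1 else 0.0
    return low, high, cdf[high] - below_low


# Widen the interval one step at a time until the target is first reached.
for half_width in range(0, 10):
    low, high, prob = interval_probability(half_width)
    if prob >= target:
        break

print(f"[{low}, {high}] with probability {prob:.4f}")
```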

Question 3 [ 7 Marks ]

(a) Give brief answers to the following (one or two lines):

(i) If a straight line is subjectively a “good fit” to a set of data, what will this
imply in terms of correlation between the residuals and the x values?

(ii) Why do some non-parametric tests only have a finite number of distinct
significance levels (e.g. 6% or 2%, but not 4% etc.)?

(iii) In what context does one divide a sum of squares by (n-2) in order to get an
unbiased estimate of a population variance?

(b) The following sample of paired data are thought to have a monotonic relationship.
The data is sorted by the x values, and a scatterplot is also shown. Apply an appropriate
statistical test to determine if the relationship is monotonic.

[Figure: Scatterplot of y vs x]

x 1.7 2.3 3.9 4.4 5.0 6.1 7.5 8.2 9.3
y 7.6 1.8 7.2 6.9 6.4 12.7 16.5 18.3 17.6
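
One candidate for part (b) is Spearman's rank correlation; this is an assumption about the intended test, and the exam may have another rank-based procedure in mind. The sketch below computes the statistic from the data above, leaving the comparison with tabulated critical values to the reader.

```python
x = [1.7, 2.3, 3.9, 4.4, 5.0, 6.1, 7.5, 8.2, 9.3]
y = [7.6, 1.8, 7.2, 6.9, 6.4, 12.7, 16.5, 18.3, 17.6]


def ranks(values):
    """Rank from 1 (smallest) to n; assumes no ties, which holds for these data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


rank_x, rank_y = ranks(x), ranks(y)
n = len(x)

# Spearman's r_s from the squared rank differences (valid when there are no ties).
sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(rank_y)
print(sum_d_squared, r_s)
```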

Question 4 [ 10 marks]
The level of Vitamin D in the blood has an effect on general health and well being:
• levels below 60 are considered to be low in Vitamin D;
• levels of 61-100 are considered to be in the “healthy range”; and
• levels over 100 are high and so not recommended.
Medical consultants have randomly selected 40 staff from each of two different companies,
and categorised the 80 workers by level of Vitamin D as defined above. The counts are
presented in the table below. Of interest is whether the proportions of low, healthy and
high Vitamin D levels are the same or different for the two companies.
counts (obs) Low Healthy High Sum
Company A 6 18 16 40
Company B 11 22 7 40

(a) State the relevant statistical procedure as well as the null and alternative hypotheses
to address the question of interest.

(b) Give the values of the parameter estimates under H0.
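
For part (b), a minimal sketch, assuming H0 states that both companies share one common set of category proportions: the estimates are then the pooled column proportions, and the expected counts follow from each company's sample size.

```python
counts = {
    "Company A": [6, 18, 16],
    "Company B": [11, 22, 7],
}
categories = ["Low", "Healthy", "High"]

total = sum(sum(row) for row in counts.values())
column_totals = [sum(row[j] for row in counts.values()) for j in range(len(categories))]

# Pooled proportion for each category, estimated under H0.
pooled = [c / total for c in column_totals]

# Expected count = company sample size * pooled proportion.
expected = {name: [sum(row) * p for p in pooled] for name, row in counts.items()}

for category, p in zip(categories, pooled):
    print(f"{category}: {p:.4f}")
print(expected)
```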

(c) The observed value of the test statistic is 3.94. State the distribution the test statistic
follows if H0 is true, the 5% critical value, your decision and a brief conclusion.

(d) The proportions of Low, Healthy and High Vitamin D levels are 0.23, 0.65, 0.12,
nationwide. It is now desired to test if the two companies have proportions different
to that expected when compared to the national proportions.
(i) Would you expect the observed value of the test statistic to be less, the same as,
or more than that stated in part (c)?

(ii) State the distribution the test statistic follows if H0 is true, justifying any numeric
values you use.

Question 5 [ 24 marks]
Please refer to the additional information in the resources booklet.
For all parts if you are able, simply quote the relevant statistic from the Minitab output to
answer the question, with comments (and conclusions etc.) as required.

(a) Given that the aim of this experiment was to see if the total iron content is affected by
food type and/or cooking pot, what is the appropriate statistical analysis to perform on
the data?

(b) Two plots of the cell means are provided in the resources.

(i) Briefly state any patterns indicated by either of the plots of cell means.

(ii) It is often useful to put standard error bars on the plot of cell means. What
would the numeric ± value be here? Give your answer to 3 d.p.

(c) (i) What proportion of variability in FeTot is accounted for by the full interaction
model?

(ii) What proportion of variability in FeTot is accounted for by the inclusion of
interaction after the main effects in the additive model?

(d) Across the twelve combinations of “treatment” levels, the smallest standard deviation
was 0.185, and the largest was 0.641. Apply the simplest appropriate statistical
procedure to formally test equality of variances for all treatment combinations.
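
If the "simplest appropriate procedure" in part (d) is taken to be Hartley's F-max test (an assumption, since the exam does not name it), the statistic is the ratio of the largest to the smallest sample variance. The critical value depends on the number of groups and on the replicates per group; the latter is in the resources booklet and is not reproduced here.

```python
smallest_sd = 0.185
largest_sd = 0.641
number_of_groups = 12  # the twelve treatment combinations

# F-max statistic: largest sample variance divided by smallest sample variance.
f_max = (largest_sd / smallest_sd) ** 2

print(f"F-max = {f_max:.2f} across {number_of_groups} groups")
```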

(e) Formally test the hypothesis that there is no interaction between food and cooking pot
regarding FeTot (state hypotheses, observed test statistic, p-value and conclusion).

(f) Explain briefly (no more than 3 lines) why the p-values for testing the main effects of
food type and cooking pot are not directly interpretable.

(g) Specifically for legumes, obtain relevant pairwise comparisons to formally test if there
is any cooking pot effect on FeTot with an overall Type I error rate of at most 0.05.
If the tables do not have the exact value required, use the nearest. Present your results
as shown in lectures using lines to indicate non-significant differences. Briefly (at most
two lines) comment on your results.

(h) We wish to test when cooking in an aluminium pot if the average FeTot for meat is
greater than that for the other two food types combined. Obtain the relevant 95%
confidence interval for this contrast, and comment briefly on the result.

Question 6 [ 10 marks]
Please refer to the additional information in the resources booklet.

(a) What is an appropriate procedure in this case? Specify the null and alternative
hypotheses.

(b) Show that the value of the relevant test statistic is 7.39.

(c) Use the approximating distribution to obtain an approximate p-value. Are there any
problems with using this approximation here? State your decision at a 5%
significance level.

(d) Write your conclusion as an advisory statement to the thrifty student who obtained
the data.

Question 7 [ 10 marks]
Please refer to the additional information in the resources booklet.

(a) State a transform of X which would allow reference to chi-squared tables.

(b) We now wish to test the hypotheses H0: θ = 3 vs HA: θ = 5 with a Type I error rate
of 0.05. Obtain numerically the most powerful critical region in the specific case of a
sample of size ten (n=10) in terms of ∑mknS Uk l .

(c) Determine the power of the test using the tables provided. From this table it is not
possible to determine the power exactly, but you can produce an interval (of length
of no more than 0.1) within which the power lies.

Question 8 [ 10 marks]
A sample with 60 observations has supposedly come from the exponential distribution with
a mean of 12.

(a) State an appropriate statistical procedure to check this, including the hypotheses.

(b) If the distribution is split into four equally sized bins, show that the cut-offs for the
“bins” are 3.45, 8.32 and 16.64.
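
The cut-offs in part (b) can be checked numerically. The sketch assumes "equally sized" means equal probability (0.25 per bin), so the cut-offs are the quartiles of the exponential distribution, obtained by solving 1 - exp(-x/12) = p for x.

```python
import math

mean = 12

# Quantile of an exponential distribution with the given mean: x = -mean * ln(1 - p).
cutoffs = [-mean * math.log(1 - p) for p in (0.25, 0.50, 0.75)]

print([round(c, 2) for c in cutoffs])
```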

(c) Given that the observed counts in each bin are: 7, 18, 12 and 23 perform the relevant
test and report on your results.
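
A sketch of the calculation for part (c), assuming a chi-squared goodness-of-fit test with four equal-probability bins and no estimated parameters. The 5% critical value for 3 degrees of freedom is an assumed entry from standard tables.

```python
observed = [7, 18, 12, 23]
n = sum(observed)

# Equal-probability bins give the same expected count in every bin.
expected = [n / len(observed)] * len(observed)

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1   # no parameters estimated from the data

critical_value_5pct = 7.815              # chi-squared, 3 df (assumed table value)
reject = chi_squared > critical_value_5pct

print(f"chi-squared = {chi_squared:.2f} on {degrees_of_freedom} df, reject H0: {reject}")
```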

(d) Now we wish to test if the data are from the exponential distribution, but without a
pre-specified mean, using the same procedure.
(i) Would you expect the value of the test statistic to be higher or lower than from
part (c)? Briefly explain your reasoning.

(ii) What distribution would be referenced in this case? Specifically explain any
numeric values.

Question 9 [ 15 marks]
Please refer to the additional information in the resources booklet.
You may refer to the variables using their letter (M, A, H, W, C).

(a) Based on the correlation matrix, which variable do you expect to be most useful in
predicting oxygen uptake? How much of the variability in oxygen uptake would this
variable account for if fitted alone in the model?

(b) Based on the full regression with all four predictors, are any useful in the model?
State your hypotheses, quote relevant statistic(s), and write a brief conclusion.

(c) Based on the full regression with all four predictors, briefly explain why “chest”
would be the most obvious predictor variable to remove. What is its effect on the
predicted oxygen if all other variables have been taken into account (explain in REAL
terms)? Would you expect this to be the same or different from the effect when chest
is fitted alone (briefly explain why)?

(d) Briefly explain why the model with age, height and weight is reasonable to select as
the final model, even though not all coefficients have p-values < 0.05.

(e) Comment briefly (two lines) on the appropriateness of the model assumptions.

(f) Briefly explain why the Matrix “X” has a column of 1’s.

(g) We wish to use model (2) to make a prediction for maximum oxygen uptake for a
boy with: age = 8 years, height=130 cm and weight = 30 kg.

(i) Write out the vector ‘a’ which would enable the prediction and its standard
error to be evaluated.

(ii) Quote an interval that you expect to contain, with 95% probability, the true
average maximum oxygen uptake for all boys this age.

(iii) Would you be surprised if a boy with these measurements had a maximum
oxygen uptake of 1.53 millilitres per kg bodyweight? Briefly explain why or
why not.

----- END OF EXAMINATION -----


