pjg102
STAT101 Assignment #3
Question 1
As the samples provided in the ‘PULSE.xls’ file each contained more than 30 observations, the sampling distribution of the sample mean was assumed to be approximately normal in accordance with the Central Limit Theorem.
Microsoft Excel was used to find the mean pulse rate and the standard deviation for males and for females using the SUM and STDEV functions as follows (where n = 40 for each sample):
EXCEL FORMULAS:
MEAN: = SUM(number_1:number_n) / n
STANDARD DEVIATION: = STDEV(number_1:number_n)
MALES:
MEAN: 69.4
STANDARD DEVIATION: 11.30 (2dp)
FEMALES:
MEAN: 76.3
STANDARD DEVIATION: 12.50 (2dp)
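As a cross-check, the Excel formulas above can be mirrored with Python's standard library. This is a minimal sketch: the helper name `summarise` and the three-value list are hypothetical illustrations, not the PULSE.xls readings. `statistics.stdev` divides by n - 1, which matches Excel's STDEV (the sample standard deviation).

```python
import statistics

def summarise(values):
    """Return (mean, sample standard deviation) for a list of numbers.

    The mean mirrors Excel's =SUM(range)/n; statistics.stdev uses the
    n - 1 denominator, like Excel's STDEV.
    """
    n = len(values)
    mean = sum(values) / n
    sd = statistics.stdev(values)
    return mean, sd

# Hypothetical toy data, NOT the PULSE.xls pulse rates.
toy_pulses = [60, 70, 80]
mean, sd = summarise(toy_pulses)
print(f"mean = {mean:.1f}, sd = {sd:.2f}")  # mean = 70.0, sd = 10.00
```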
a. To construct a 95% confidence interval for the population mean pulse rate for
males using the sample provided, a critical t-value must first be found using
degrees of freedom (df = n – 1) and alpha value (α = 1 – (confidence / 100)). In this
case these are:
df = 40 – 1
= 39
α = 1 – (95 / 100)
= 1 – 0.95
= 0.05
Consulting the t-distribution table on page 276 of the STAT101 Course Reader (df = 39, with α split equally between the two tails, i.e. 0.025 in each tail), these values correspond to a critical t-value of 2.023.
The formula below can then be used to construct the interval (where x̄ is the sample mean, s is the sample standard deviation and n is the sample size):

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 69.4 \pm 2.023 \times \frac{11.30}{\sqrt{40}}$$
69.4 ± 3.61 therefore represents a 95% confidence interval for the population
mean pulse rate for males. The lower limit is 65.79 (2dp) and the upper limit is
73.01 (2dp).
Peter Greer 46046307
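Applying the same formula to the female sample (x̄ = 76.3, s = 12.50, n = 40, critical t-value 2.023) gives 76.3 ± 2.023 × 12.50/√40 = 76.3 ± 4.00, which therefore represents a 95% confidence interval for the population mean pulse rate for females. The lower limit is 72.30 (2dp) and the upper limit is 80.30 (2dp).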
                              Males        Females
Upper limit of the 95% CI     73.013077    80.297251
Lower limit of the 95% CI     65.786923    72.302749
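The interval arithmetic for males and females can be sketched in Python, assuming the critical value 2.023 is read from the t-table as in the text (the standard library has no t-quantile function). `mean_ci` is a hypothetical helper name.

```python
import math

def mean_ci(mean, sd, n, t_crit):
    """Two-sided t confidence interval for a population mean.

    t_crit must be supplied (e.g. 2.023 for df = 39 at 95% confidence,
    as read from the Course Reader table).
    """
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

for label, mean, sd in [("Males", 69.4, 11.30), ("Females", 76.3, 12.50)]:
    lower, upper = mean_ci(mean, sd, n=40, t_crit=2.023)
    print(f"{label}: ({lower:.2f}, {upper:.2f})")
# Males: (65.79, 73.01)
# Females: (72.30, 80.30)
```

Because s is entered rounded to 2dp, these limits agree with the table to 2dp rather than to every decimal place.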
Question 2
Before using the one-sample z-test for a proportion, we must verify that the sample size is large enough. To do this, we multiply the sample size (n = 415) first by the null-hypothesis population proportion (p0 = 0.79) and then by q (1 – p0 = 0.21). If both products are equal to or greater than 5, the test can validly be used. So:
n * p0 = 415 * 0.79
= 327.85
n * q = 415 * 0.21
= 87.15
These values are both well in excess of 5, so we may proceed with the test. The researcher’s theory is that the proportion of accounting firms that offer flexible working hours (p) is lower than the proportion of all companies that do so (p0 = 0.79). This is the alternative hypothesis (HA). The null hypothesis (H0) is that the proportion of accounting firms offering flexible working hours is the same as the proportion of all companies offering flexible working hours. These hypotheses are laid out below; the sample proportion used in the test is p̂ = 0.73.
H0: p = 0.79
HA: p < 0.79
Now that we have formed our hypotheses, we use the formula below to calculate our test statistic z:

$$
\begin{aligned}
z &= \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}
   = \frac{0.73 - 0.79}{\sqrt{0.79 \times (1 - 0.79)/415}}
   = \frac{-0.06}{\sqrt{0.79 \times 0.21/415}} \\
  &= \frac{-0.06}{\sqrt{0.1659/415}}
   = \frac{-0.06}{\sqrt{0.0004}}
   = \frac{-0.06}{0.020}
   = -3.00 \text{ (2dp)}
\end{aligned}
$$
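The test statistic can be reproduced with a short Python sketch. `one_prop_z` is a hypothetical helper; it folds in the n * p0 and n * q ≥ 5 large-sample check described above and uses the null standard error √(p0(1 - p0)/n).

```python
import math

def one_prop_z(p_hat, p0, n):
    """z statistic for H0: p = p0 using the null standard error sqrt(p0*q/n)."""
    q = 1 - p0
    # Large-sample condition from the text: n*p0 and n*q must both be >= 5.
    if n * p0 < 5 or n * q < 5:
        raise ValueError("sample too small for the normal approximation")
    se = math.sqrt(p0 * q / n)
    return (p_hat - p0) / se

z = one_prop_z(p_hat=0.73, p0=0.79, n=415)
print(f"z = {z:.2f}")  # z = -3.00
```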
Using Excel's NORMSINV function, the critical z-score for a one-tailed test at the 0.01 significance level was found to be -2.33 (2dp; NORMSINV(0.01) = -2.326). To sum this information up:
z0.01: -2.33
z: -3.00
The test statistic z therefore lies in the rejection region (-3.00 < -2.33). The null hypothesis (p = 0.79) is rejected in favour of the alternative hypothesis (p < 0.79). In the context of this problem, this means we can conclude, at the 0.01 level of significance, that the proportion of accounting firms offering flexible working hours is significantly lower than the 79% claimed for all companies.
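The critical value and a one-tailed P-value can be checked with `statistics.NormalDist` from Python's standard library, where `inv_cdf` plays the role of Excel's NORMSINV. The P-value is an extra quantity not reported in the text, shown here only for comparison with α = 0.01.

```python
from statistics import NormalDist

alpha = 0.01
z = -3.00  # test statistic from the calculation above

# Lower-tail critical value, the analogue of Excel's NORMSINV(0.01).
z_crit = NormalDist().inv_cdf(alpha)
# One-tailed (lower-tail) P-value for HA: p < p0.
p_value = NormalDist().cdf(z)

print(f"critical z = {z_crit:.3f}")  # critical z = -2.326
print(f"P-value = {p_value:.5f}")    # P-value = 0.00135
print("reject H0" if z < z_crit else "fail to reject H0")  # reject H0
```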
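c. To calculate the 90% confidence interval for the true proportion of accounting firms that offer flexible working hours, the formula below is used (for 90% confidence, z0.05 = 1.645):

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.73 \pm 1.645\sqrt{\frac{0.73 \times 0.27}{415}} = 0.73 \pm 1.645 \times 0.0218$$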
The confidence interval is 0.73 ± 0.036. The lower limit is 0.694 (3dp) and the
upper limit is 0.766 (3dp).
This interval suggests that we can estimate with 90% confidence that the true
proportion of accounting firms who offer flexible working hours is between 0.694
and 0.766 (69.4% and 76.6%).
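A minimal sketch of the 90% interval, assuming the usual normal-approximation (Wald) interval for a proportion. `prop_ci` is a hypothetical name, and the 1.645 multiplier is obtained from `NormalDist().inv_cdf(0.95)` instead of a table.

```python
import math
from statistics import NormalDist

def prop_ci(p_hat, n, confidence):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.645 for 90%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = prop_ci(0.73, 415, 0.90)
print(f"({lower:.3f}, {upper:.3f})")  # (0.694, 0.766)
```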
Question 3
a. A Type I error is committed by rejecting a true null hypothesis. In the context of the problem, a Type I error would mean that the manager, by chance, selects an extreme sample and, based on that sample, rejects the null hypothesis, concluding that the average time taken to supply customers with the basic order is greater than 80 seconds when in fact it is not. The business manager could then take unfair disciplinary action or change policy unnecessarily.
b. In the context of this problem, the Type II error would probably be considered more serious. A Type II error (failing to reject a false null hypothesis) would leave the manager satisfied when the average time actually exceeds 80 seconds. A Type I error would only result in increased efforts to improve efficiency, which is not usually a problem from the business manager’s point of view, whereas being wrongly satisfied with an under-performing business is likely to be expensive in the long run.
c. In order to test whether or not the manager has cause for concern regarding the
efficiency of his workers at the 0.05 level, a t-test is used to test the hypotheses
below:
H0: μ = 80
HA: μ > 80
A t-test must be used because the population standard deviation is unknown. To use this type of hypothesis test, either the population must be approximately normally distributed or the sample size must be at least 30, so that the sampling distribution of the mean is approximately normal by the Central Limit Theorem. In this case the sample size is stated as 36, so we can safely use the test.
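Here x̄ is the sample mean, μ the hypothesised population mean, s the sample standard deviation and n the sample size:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{89.0 - 80.0}{19.46/\sqrt{36}} = \frac{9.0}{19.46/6} = \frac{9.0}{3.24} = 2.77 \text{ (2dp)}$$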
Using Excel's TINV function, the critical t-value is found to be 1.69 (2dp) for a one-tailed t-test where α is equal to 0.05 and there are 35 (n – 1) degrees of freedom. (TINV returns two-tailed critical values, so the one-tailed value at α = 0.05 is TINV(0.10, 35).) Since 2.77 exceeds the critical value of 1.69, the test statistic falls in the rejection region, and the manager can therefore reject the null hypothesis and conclude that he has cause for concern regarding the efficiency of his staff.
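The same calculation and decision rule can be sketched in Python. `one_sample_t` is a hypothetical helper name, and the critical value 1.69 is taken from the text rather than computed.

```python
import math

def one_sample_t(x_bar, mu0, s, n):
    """t statistic for H0: mu = mu0 when the population SD is unknown."""
    return (x_bar - mu0) / (s / math.sqrt(n))

t = one_sample_t(x_bar=89.0, mu0=80.0, s=19.46, n=36)
t_crit = 1.69  # one-tailed, alpha = 0.05, df = 35 (value quoted in the text)
print(f"t = {t:.2f}")  # t = 2.77
# Upper-tailed test (HA: mu > 80), so reject when t exceeds t_crit.
print("reject H0" if t > t_crit else "fail to reject H0")  # reject H0
```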
The P-value can be obtained from Excel's TDIST function by entering the test statistic, the degrees of freedom and the number of tails (1 for this one-tailed test). The P-value of this test statistic is 0.0044 (2sf).
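Python's standard library has no Student t CDF, so the sketch below approximates the one-tailed value that TDIST(x, df, 1) returns by numerically integrating the t density with Simpson's rule. This is a swapped-in numerical method, not Excel's own algorithm, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Student t density, using log-gamma for numerical stability."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, via Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so the area above t is 0.5 minus the area on [0, t].
    return 0.5 - total * h / 3

t_stat = 9.0 / (19.46 / 6)  # unrounded test statistic from part c
p = t_upper_tail(t_stat, 35)
print(f"one-tailed P-value = {p:.4f}")
```

As a sanity check, `t_upper_tail(1.6896, 35)` should come out very close to 0.05, matching the tabulated one-tailed critical value for df = 35.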
d. The P-value gives the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed, rather than only showing whether the statistic lies beyond a critical value on the distribution curve. It does not require a significance level to be fixed in advance, and it can be compared directly with any α the statistician wishes.
Question 4
a. A two-sample t-test can be conducted provided that each sample is approximately normally distributed or has a size of at least 30, and that the two samples are independent. Both samples are larger than 30 (n > 30) and are independent, so a t-test can be carried out. As stipulated in the question, a significance level of α = 0.05 is used for this test.
Firstly, the mean and standard deviation of each sample must be calculated. This
was done in Excel using the same method as in Question 1.
MALES:
MEAN: 2.71 (2dp)
STANDARD DEVIATION: 0.37 (2dp)
FEMALES:
MEAN: 1.99 (2dp)
STANDARD DEVIATION: 0.38 (2dp)
H0: μW = μM
HA: μW ≠ μM
The null hypothesis states that the population mean for women is the same as that for men. If the test statistic (calculated using the formula provided in the Course Reader) lies beyond the critical t-value of 2.03 (calculated using Excel's TINV with α = 0.05 and df = 35, i.e. the smaller sample size minus 1), we reject the null hypothesis and can conclude that there appears to be a difference in the population means.
Using the formula we can proceed to calculate our test statistic as follows:
$$t = \frac{(\bar{x}_W - \bar{x}_M) - (\mu_W - \mu_M)}{\sqrt{s_W^2/n_W + s_M^2/n_M}}$$

Under the null hypothesis μW = μM, so μW - μM = 0. The other quantities are already available to us, and substituting them into the formula gives the following result:

$$t = \frac{0.72}{\sqrt{0.125}} = \frac{0.72}{0.35} = 2.035 \text{ (3dp)}$$
So, our test statistic has been calculated as 2.035, and we know from above that, where df = 35 and α = 0.05, our critical t-value is 2.030. Since 2.035 is greater than 2.030, the test statistic falls in the rejection region. We can therefore reject the hypothesis that μW = μM and conclude, at the 0.05 significance level, that there appears to be a difference between men and women in the population mean extent of approval for unsporting play.
b. Using the TDIST function, the two-tailed P-value calculated from our test statistic and degrees of freedom is 0.049 (3dp). This means that, if the null hypothesis (that the two population means are the same) were true, the probability of observing a test statistic at least as extreme as ours would be 0.049. It is not the probability that the null hypothesis is true or that our results are due to chance; it simply falls below α = 0.05, consistent with rejecting H0 in part (a).
c. The output of Excel's two-sample t-test assuming unequal variances is below:
                               Men            Women
Mean                           2.706666667    1.9875
Variance                       0.140074286    0.142501282
Observations                   36             40
Hypothesized Mean Difference   0
df                             73
t Stat                         8.330093485
P(T<=t) one-tail               1.68728E-12
t Critical one-tail            1.665996224
P(T<=t) two-tail               3.37456E-12
t Critical two-tail            1.992997097
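d. The critical value calculated by Excel is slightly different: Excel's two-tailed critical value is 1.993 (3dp), based on 73 degrees of freedom from the Welch-Satterthwaite approximation, whereas part (a) used the more conservative df = 35 (the smaller sample size minus 1), giving 2.030. The biggest difference, however, lies in the t-statistic: the value calculated in part (a) was only 2.035, as opposed to 8.33 (2dp) calculated by Excel. The discrepancy comes from the denominator used in part (a). The formula requires √(sW²/nW + sM²/nM) = √(0.1425/40 + 0.1401/36) ≈ √0.00745 ≈ 0.0863, not √0.125, and 0.7192/0.0863 ≈ 8.33, in agreement with Excel. Both calculations lead to rejecting H0 at the 0.05 level, but the evidence is far stronger than part (a) suggests (Excel's two-tailed P-value is 3.37 × 10⁻¹²).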
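Excel's unequal-variance figures can be reproduced from the summary statistics in its output table. This is a minimal sketch of the Welch t statistic and Welch-Satterthwaite degrees of freedom; `welch_t` is a hypothetical helper, and the inputs are copied from the Excel output in part (c).

```python
import math

def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch (unequal-variance) t statistic and Welch-Satterthwaite df."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Men first, then women, as in the Excel output.
t, df = welch_t(2.706666667, 0.140074286, 36, 1.9875, 0.142501282, 40)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 8.33, df = 73.3
```

Excel reports the degrees of freedom rounded to the whole number 73, which is why its critical values correspond to df = 73.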