
Peter Greer 46046307

pjg102

STAT101 Assignment #3

Question 1

As both samples in the ‘PULSE.xls’ file contain more than 30 observations, the sampling
distribution of each sample mean was assumed to be approximately normal, in accordance
with the Central Limit Theorem.

Microsoft Excel was used to find the mean pulse rate and the standard deviation for
males and females, using the SUM and STDEV functions as follows (where n = 40):

EXCEL FORMULAS:
MEAN: =SUM(number_1:number_n)/n
STANDARD DEVIATION: =STDEV(number_1:number_n)
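For readers working outside Excel, the two formulas above correspond to Python's standard `statistics` module. `SUM(...)/n` is the arithmetic mean. `STDEV` is the *sample* standard deviation with divisor n – 1, which is `statistics.stdev` rather than `statistics.pstdev`. This is a minimal sketch, and the short list below is purely illustrative: it is not the PULSE.xls data.

```python
import statistics

# Illustrative values only (NOT the PULSE.xls pulse rates).
values = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(values)

# Equivalent of Excel's =SUM(range)/n
mean = sum(values) / n

# Equivalent of Excel's =STDEV(range): sample SD, divisor n - 1
sample_sd = statistics.stdev(values)

# For contrast, the population SD (divisor n), i.e. Excel's STDEVP
population_sd = statistics.pstdev(values)

print(f"mean = {mean}, STDEV = {sample_sd:.3f}, STDEVP = {population_sd:.3f}")
```

The distinction matters because the confidence intervals below use the sample standard deviation.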

The values returned were as follows:

MALES:
MEAN: 69.4
STANDARD DEVIATION: 11.30 (2dp)

FEMALES:
MEAN: 76.3
STANDARD DEVIATION: 12.50 (2dp)

a. To construct a 95% confidence interval for the population mean pulse rate for
males from the sample provided, a critical t-value must first be found using the
degrees of freedom (df = n – 1) and the significance level (α = 1 – (confidence / 100)).
In this case these are:

df = 40 – 1
= 39

α = 1 – (95 / 100)
= 1 – 0.95
= 0.05

By consulting the t-distribution tables on page 276 of the STAT101 Course Reader,
these values (df = 39, α/2 = 0.025) can be found to correspond to a critical t-value of 2.023.

The formula below can then be used to construct the interval (where x̄ refers to the
sample mean, s to the sample standard deviation and n to the sample size):

$$95\%\ \text{CI} = \bar{x} \pm t_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

= 69.4 ± 2.023 * (11.30 / √40)
= 69.4 ± 2.023 * (11.30 / 6.32)
= 69.4 ± 2.023 * 1.79
= 69.4 ± 3.61 (2dp)
= (65.79, 73.01)

69.4 ± 3.61 therefore represents a 95% confidence interval for the population
mean pulse rate for males. The lower limit is 65.79 (2dp) and the upper limit is
73.01 (2dp).

b. To construct a 95% confidence interval for the population mean pulse rate for
females from the sample provided, the critical t-value from part (a) (2.023) was used
again, as both the degrees of freedom (df = 39) and α (0.05) remain unchanged.

$$95\%\ \text{CI} = \bar{x} \pm t_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

= 76.3 ± 2.023 * (12.50 / √40)
= 76.3 ± 2.023 * (12.50 / 6.32)
= 76.3 ± 2.023 * 1.98
= 76.3 ± 4.00 (2dp)
= (72.30, 80.30) (2dp)

76.3 ± 4.00 therefore represents a 95% confidence interval for the population
mean pulse rate for females. The lower limit is 72.30 (2dp) and the upper limit is
80.30 (2dp).
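The interval arithmetic in parts (a) and (b) can be checked with a short Python sketch. It is a minimal illustration that assumes the rounded standard deviations (11.30 and 12.50) and the tabulated critical value 2.023 quoted above. Python's standard library has no t-distribution, so the critical value is hard-coded rather than computed. The helper name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean ± t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Critical value t(39, 0.025) = 2.023, taken from the Course Reader t-table.
T_CRIT = 2.023
N = 40

male_ci = t_confidence_interval(69.4, 11.30, N, T_CRIT)
female_ci = t_confidence_interval(76.3, 12.50, N, T_CRIT)

print("Males:   (%.2f, %.2f)" % male_ci)
print("Females: (%.2f, %.2f)" % female_ci)
```

At 2dp these limits agree with the Excel-derived limits listed in part (c).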

c. The following is a print-out of the Excel descriptive statistics for each sample.
The limits at the bottom of each table were not calculated automatically by Excel.
They were added manually using basic arithmetic in the Excel formula bar (mean ±
confidence level):

Statistic                      Male Pulse    Female Pulse
Mean                           69.4          76.3
Standard Error                 1.7862724     1.9762046
Median                         66            74
Mode                           64            72
Standard Deviation             11.297379     12.498615
Sample Variance                127.63077     156.21538
Kurtosis                       -0.639518     4.525991
Skewness                       0.6800237     1.6838961
Range                          40            64
Minimum                        56            60
Maximum                        96            124
Sum                            2776          3052
Count                          40            40
Confidence Level (95.0%)       3.613077      3.9972511

Upper limit of the 95% CI      73.013077     80.297251
Lower limit of the 95% CI      65.786923     72.302749

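d. The 95% confidence intervals for the two population means overlap (males: 65.79
to 73.01; females: 72.30 to 80.30; the overlap runs from 72.30 to 73.01). We therefore
cannot conclude from the intervals alone that the two population means are different;
a formal two-sample test would be needed.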

Question 2

a. To test the researcher’s theory, we perform a one-tailed z-test for a population
proportion. The critical z-value is found from z-tables using the significance level α
(here given as 0.01). If the test statistic lies beyond that critical value, the null
hypothesis can be rejected.

To use this test validly, we must first verify that the sample size is large enough.
To do this, we multiply the sample size (n = 415) first by the null-hypothesis
population proportion (p0 = 0.79), and then by q (1 – p0 = 0.21). If both products are
greater than or equal to 5, we can validly use the test. So:

n * p0 = 415 * 0.79
= 327.85

n * q = 415 * (1 – 0.79)
= 415 * 0.21
= 87.15

These values are both well in excess of 5, so we may proceed with the test. The
researcher’s theory is that the proportion of accounting firms who offer flexible
working hours (p) is lower than the proportion of all companies who offer the same
(p0 = 0.79). This is the alternate hypothesis (HA). The null hypothesis (H0) is that the
proportion of accounting firms who offer flexible working hours is the same as the
proportion of all companies offering flexible working hours. These hypotheses are
laid out below, along with the calculation of sample proportion (p̂):

H0: p = 0.79
HA: p < 0.79

p̂ = number of successes / sample size
= 303 / 415
= 0.73 (2dp)

Now we have formed our hypotheses, we use the formula below to calculate our
test statistic z:

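$$
\begin{aligned}
z &= \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \\
  &= \frac{0.73 - 0.79}{\sqrt{0.79 \times (1 - 0.79)/415}} \\
  &= \frac{-0.06}{\sqrt{(0.79 \times 0.21)/415}} \\
  &= \frac{-0.06}{\sqrt{0.166/415}} \\
  &= \frac{-0.06}{\sqrt{0.0004}} \\
  &= \frac{-0.06}{0.020} \\
  &= -3.00 \text{ (2dp)}
\end{aligned}
$$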
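The z calculation can be reproduced in a few lines of Python using only the standard library. This is a minimal sketch that assumes the figures quoted above (n = 415, 303 successes, p0 = 0.79). It also repeats the large-sample check from the start of part (a) and shows the effect of rounding p̂ to 0.73 before substituting it.

```python
import math

n = 415          # sample size
successes = 303  # accounting firms offering flexible working hours
p0 = 0.79        # proportion claimed for all companies

# Large-sample check: n*p0 and n*(1 - p0) must both be at least 5
assert n * p0 >= 5 and n * (1 - p0) >= 5

p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0 uses p0

z_exact = (p_hat - p0) / se               # p-hat kept unrounded
z_rounded = (round(p_hat, 2) - p0) / se   # p-hat rounded to 0.73, as in the hand calculation

print(f"p_hat = {p_hat:.4f}")
print(f"z (p_hat unrounded)    = {z_exact:.3f}")
print(f"z (p_hat rounded 0.73) = {z_rounded:.3f}")
```

Keeping p̂ unrounded (0.7301) gives z ≈ -2.99, while rounding it to 0.73 first reproduces -3.00. Both values lie in the lower-tail rejection region at α = 0.01, so the conclusion does not depend on the rounding.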

Using Excel’s NORMSINV function, NORMSINV(0.01), the critical z-score for a one-tailed
test at the 0.01 significance level was found to be -2.33 (2dp). To sum up:

z0.01: -2.33
z: -3.00

The test statistic z therefore lies in the rejection region (-3.00 < -2.33). The null
hypothesis (p = 0.79) can be rejected in favour of the alternative hypothesis (p < 0.79).
In the context of this problem, we can conclude at the 0.01 level of significance that the
proportion of accounting firms offering flexible working hours is significantly lower
than the 79% claimed for all companies.

b. The P-value of this one-tailed test is P(Z ≤ -3.00) = 0.0013 (2sf). It can be obtained
with Excel’s NORMSDIST(-3.00).
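Excel’s NORMSINV and NORMSDIST correspond to the inverse CDF and the CDF of the standard normal distribution. Python exposes both as `statistics.NormalDist` (Python 3.8+). The sketch below computes the critical value and the one-tailed P-value for this test. It is a minimal example that assumes the hand-calculated z = -3.00.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

alpha = 0.01
z_crit = std_normal.inv_cdf(alpha)   # lower-tail critical value, like NORMSINV(0.01)

z = -3.00                            # test statistic from part (a)
p_value = std_normal.cdf(z)          # one-tailed (lower-tail) P-value, like NORMSDIST(-3.00)

print(f"critical z = {z_crit:.4f}")
print(f"P-value    = {p_value:.5f}")
print("reject H0" if z < z_crit else "fail to reject H0")
```

The exact critical value is about -2.326, which rounds to -2.33 at 2dp.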

c. To calculate the 90% confidence interval for the true proportion of accounting firms
that offer flexible working hours, the formula below is used. Here α = 0.10, so
zα/2 = z0.05 = 1.645:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

= 0.73 ± 1.645 * √((0.73 * 0.27) / 415)
= 0.73 ± 1.645 * √(0.197 / 415)
= 0.73 ± 1.645 * √0.000475
= 0.73 ± 1.645 * 0.0218
= 0.73 ± 0.036

The confidence interval is 0.73 ± 0.036. The lower limit is 0.694 (3dp) and the
upper limit is 0.766 (3dp).

This interval suggests that we can estimate with 90% confidence that the true
proportion of accounting firms who offer flexible working hours is between 0.694
and 0.766 (69.4% and 76.6%).
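The 90% interval can be sketched the same way. Unlike the test statistic, the interval’s standard error uses p̂ rather than p0. This minimal example assumes the rounded p̂ = 0.73 used above. It obtains zα/2 from `NormalDist.inv_cdf` instead of a table.

```python
import math
from statistics import NormalDist

n = 415
p_hat = 0.73        # sample proportion, rounded as in the hand calculation
confidence = 0.90

# z_{alpha/2}: upper critical value leaving alpha/2 = 0.05 in each tail (about 1.645)
z_half_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error for the interval is based on p_hat, not p0
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z_half_alpha * se
lower, upper = p_hat - margin, p_hat + margin

print(f"z = {z_half_alpha:.3f}, margin = {margin:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```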

Question 3

a. A Type I error is committed by rejecting a true null hypothesis. In the context of this
problem, a Type I error would occur if the manager, by chance, selected an unusually slow
sample and, based on that sample, rejected the null hypothesis. He would then conclude
that the average time taken to supply customers with the basic order is greater than 80
seconds when it is not. The manager could then take unfair disciplinary action or
change policy unnecessarily.

A Type II error is committed by failing to reject a false null hypothesis. In the context
of this problem, this would mean the manager concludes that the average time to supply
customers with a basic order is 80 seconds or less when in fact it is greater. This could
result in decreased efficiency, due to the manager’s unfounded satisfaction with the
performance of his workers or equipment.

b. In the context of this problem, the Type II error would probably be considered more
serious. A Type I error would lead to unnecessary efforts to improve efficiency, which
are not usually a problem from the business manager’s point of view. However, being
wrongly satisfied with an under-performing business is likely to be expensive in the
long run.

c. To test, at the 0.05 level, whether the manager has cause for concern regarding the
efficiency of his workers, a one-tailed t-test of the hypotheses below is used:

H0: μ = 80
HA: μ > 80

A t-test must be used because the population standard deviation is unknown. The test
requires either that the population is approximately normally distributed or that the
sample size is at least 30, in accordance with the Central Limit Theorem. Here the
sample size is 36, so the test can safely be used.

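Firstly, we calculate the test statistic using the formula:

$$
\begin{aligned}
t &= \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \\
  &= \frac{89.0 - 80.0}{19.46/\sqrt{36}} \\
  &= \frac{9.0}{19.46/6} \\
  &= \frac{9.0}{3.2433} \\
  &= 2.77 \text{ (2dp)}
\end{aligned}
$$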

Using Excel’s TINV function, the critical t-value is 1.69 (2dp) for a one-tailed t-test
with α = 0.05 and 35 (n – 1) degrees of freedom. Because TINV returns two-tailed critical
values, this is TINV(0.10, 35). The test statistic 2.77 exceeds the critical value of
1.69, so it lies in the rejection region. The manager can therefore reject the null
hypothesis and conclude that he has cause for concern regarding the efficiency of his
staff.

The P-value can be obtained from Excel’s TDIST function by entering the test
statistic, the degrees of freedom (35) and the number of tails (1). The P-value of this
test statistic is 0.0044 (2sf).
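Python’s standard library has no Student’s t-distribution. The sketch below therefore checks the test statistic and the one-tailed P-value by integrating the t density numerically with Simpson’s rule. It is a minimal illustration under that assumption; with SciPy installed, `scipy.stats.t.sf(t, df)` would give the same upper-tail probability directly. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0: 0.5 minus Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics from Question 3
x_bar, mu0, s, n = 89.0, 80.0, 19.46, 36
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_one_tailed = t_upper_tail(t_stat, n - 1)

print(f"t = {t_stat:.2f}, one-tailed P-value = {p_one_tailed:.4f}")
```

As a sanity check, the same function gives an upper-tail area of about 0.05 at t = 1.69 with 35 degrees of freedom, matching the critical value used above.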

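d. The P-value gives the probability, assuming the null hypothesis is true, of obtaining a
test statistic at least as extreme as the one observed. A comparison with a critical value
only tells us on which side of a fixed point on the distribution curve the statistic lies.
The P-value does not require a significance level to be chosen in advance, and it can be
readily compared with any α the statistician wishes.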

Question 4

a. A two-sample t-test can be conducted provided that the samples are approximately
normally distributed or each sample size is at least 30, and that the two samples are
independent. Both samples contain more than 30 observations (36 and 40), and they are
independent, so a t-test can be carried out. As stipulated in the question, α = 0.05
will be used.

Firstly, the mean and standard deviation of each sample must be calculated. This
was done in Excel using the same method as in Question 1.

MALES:
MEAN: 2.71 (2dp)
STANDARD DEVIATION: 0.37 (2dp)

FEMALES:
MEAN: 1.99 (2dp)
STANDARD DEVIATION: 0.38 (2dp)

In order to continue, we have to establish a null hypothesis (H0), an alternate
hypothesis (HA), and a test statistic. The hypotheses are laid out below:

H0: μW = μM
HA: μW ≠ μM

The null hypothesis states that the population mean for women is the same as that
for men. The test statistic is calculated using the formula provided in the Course
Reader. The critical t-value is ±2.03, found with Excel’s TINV where α = 0.05 and
df = 35 (the smaller sample size – 1). If the test statistic lies beyond ±2.03, we reject
the null hypothesis and can conclude that there appears to be a difference in the
population means.

Using the formula we can proceed to calculate our test statistic as follows:

$$t = \frac{(\bar{x}_W - \bar{x}_M) - (\mu_W - \mu_M)}{\sqrt{s_W^2/n_W + s_M^2/n_M}}$$

In this case we know that given our null hypothesis, μW = μM, and therefore that μW -
μM = 0. The other variables are already available to us, and when used in the
formula give us the following results:

$$
\begin{aligned}
t &= \frac{(1.99 - 2.71) - 0}{\sqrt{0.38^2/40 + 0.37^2/36}} \\
  &= \frac{-0.72}{\sqrt{0.00361 + 0.00380}} \\
  &= \frac{-0.72}{\sqrt{0.00741}} \\
  &= \frac{-0.72}{0.0861} \\
  &= -8.36 \text{ (2dp)}
\end{aligned}
$$

Our test statistic is therefore -8.36. We know from above that where df = 35 and
α = 0.05 the critical t-value is ±2.030. Since |-8.36| = 8.36 is far greater than 2.030,
the test statistic falls in the rejection region. We can therefore reject the hypothesis
that μW = μM. At the 0.05 significance level we can conclude that there appears to be
a difference between men and women in the population mean extent of approval for
unsporting play.

b. The P-value is found with the TDIST function, entering the absolute value of the
test statistic, 35 degrees of freedom and two tails. It is far smaller than 0.001. This
means that if the null hypothesis (that the two population means are the same) were
true, the probability of observing a test statistic at least as extreme as ours would be
less than 0.1%. This is very strong evidence of a difference in population means.

c. The t-Test: Two-Sample Assuming Unequal Variances output from Microsoft Excel is below:

                               Men            Women
Mean                           2.706666667    1.9875
Variance                       0.140074286    0.142501282
Observations                   36             40
Hypothesized Mean Difference   0
df                             73
t Stat                         8.330093485
P(T<=t) one-tail               1.68728E-12
t Critical one-tail            1.665996224
P(T<=t) two-tail               3.37456E-12
t Critical two-tail            1.992997097

d. The critical values differ slightly. Excel’s two-tailed critical value is 1.993 because
the unequal-variances test uses df = 73. Part (a) used the more conservative df = 35
(the smaller sample size – 1), giving 2.030. The t-statistics agree in magnitude. Excel
reports 8.33 (men minus women, hence positive), while part (a) gives -8.36 (women minus
men). The small difference comes from rounding the means and standard deviations to
2dp before substituting them into the formula.
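The t statistic and degrees of freedom in the Excel output can be reproduced from the means and variances it lists. This minimal sketch assumes that Excel’s unequal-variances test is the Welch test with Welch-Satterthwaite degrees of freedom, which is consistent with the df of 73 it reports. The standard error is built from the variances (squared standard deviations), not from the means.

```python
import math

# Summary statistics from the Excel output in part (c)
mean_m, var_m, n_m = 2.706666667, 0.140074286, 36   # men
mean_w, var_w, n_w = 1.9875, 0.142501282, 40        # women

# Squared standard errors of each sample mean
se_sq_m = var_m / n_m
se_sq_w = var_w / n_w

# Test statistic: (difference in sample means - hypothesised difference 0) / standard error
t_stat = (mean_m - mean_w) / math.sqrt(se_sq_m + se_sq_w)

# Welch-Satterthwaite degrees of freedom (the Excel output shows 73)
df = (se_sq_m + se_sq_w) ** 2 / (
    se_sq_m ** 2 / (n_m - 1) + se_sq_w ** 2 / (n_w - 1)
)

print(f"t = {t_stat:.4f}, Welch df = {df:.2f}")
```

The unrounded Welch df is about 73.3. This is why Excel’s critical values (1.993 two-tailed) are slightly smaller than those for df = 35.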
