Part B contains 8 case studies with short answer questions. Marks are
identified within each question.
Total Part B marks: 80
Page 1 of 38
PART A: Multiple Choice questions (20 marks). Contains 20 questions,
each worth 1 mark. Please attempt to answer ALL questions.
Circle the letter that corresponds to the most correct answer beside each of
the following questions.
A1. The process of using sample statistics to draw conclusions about population
parameters is called
a) inferential statistics.
b) experimentation.
c) primary sources.
d) descriptive statistics.
e) the scientific method.
A2. In analysing categorical data, the following graphical device is not appropriate
a) Pie chart.
b) Pareto diagram.
c) Stem and leaf display.
d) Bar chart.
e) they are all appropriate.
A4. A summary measure that is computed from only a sample of the population is
called
a) A parameter.
b) A census.
c) A statistic.
d) The scientific method.
e) All of the above.
A6. Which of the following statements about the median is false?
a) It is a measure of a "typical" value.
b) It is equal to the mean in a "bell-shaped" normal distribution.
c) It is the average of the two middle values when an even number of data
values are ranked.
d) It is more affected by extreme values than the mean.
e) It is equal to the mode in a "bell-shaped" normal distribution.
A7. What is the probability that a customer would have to wait longer than 3.5
minutes?
a) 0.5000
b) 0.2375
c) 0.7523
d) 0.3085
e) 0.6915
A8. If a sample of 100 customers (n=100) is taken, what is the probability that the
sample mean is less than 4.2 minutes?
a) 0.0228
b) 0.4207
c) 0.5793
d) 0.9772
e) 1.0000
A9. The same hamburger chain later surveys 1,000 customers to see if it should
add beetroot to its Super Jumbo burger. The hamburger chain has 50 stores
but, on the assumption that each store’s customers have similar tastes in
hamburgers, randomly chooses only 10 of the stores to participate in the
survey. Each store then samples 100 customers. What sampling method has
the hamburger chain used?
a) Cluster sampling
b) Non-probability sampling
c) Simple random sampling
d) Stratified sampling
e) Systematic sampling
A10. The 1,000 customers sampled were also asked to estimate how much they
spend per visit at this hamburger chain. From this sample data, it was found that
on average customers spend $10.42 per visit with a standard deviation of $6.24.
The correct 95% confidence interval for the mean of all customers spending
per visit to this hamburger chain is:
a) 8.11 to 12.43
b) 4.18 to 16.66
c) 10.03 to 10.81
d) 9.72 to 11.36
e) 9.91 to 10.93
A11. A group of researchers were attempting to determine whether female MBA
graduates have a similar mean starting salary as male MBA graduates. What
assumptions were necessary to conduct this hypothesis test?
a) Both populations of salaries (male and female) must have approximate
normal distributions.
b) The population variances are approximately equal.
c) The samples were randomly and independently selected.
d) All of the above.
e) (a) and (b) only.
A12. State the null and alternative hypothesis to determine if the average number
of hours played on the golf course is more than 12 hours per week.
a) H0: μ ≤ 12 hours and H1: μ > 12 hours
b) H0: μ ≥ 12 hours and H1: μ < 12 hours
c) H0: μ = 12 hours and H1: μ ≠ 12 hours
d) H0: x ≤ 12 hours and H1: x > 12 hours
e) H0: x ≥ 12 hours and H1: x < 12 hours
A13. To test the hypotheses the membership controller decided to construct a one-
tailed hypothesis test with a 5% significance level (i.e. α = 0.05). What is the
appropriate t-critical value for a sample of size 16?
a) 1.6450
b) 1.7459
c) 1.9600
d) 2.1315
e) 1.7531
A14. To test the hypotheses the membership controller decided to construct a one-
tailed hypothesis test with a 5% significance level (i.e. α = 0.05). What is the
appropriate t-statistic value?
a) 2.32
b) 2.06
c) 1.96
d) 1.80
e) 2.41
A15. Based on a 0.05 alpha level, the membership controller's decision would be:
a) Reject H0, p-value < α = 0.05 and +t-statistic > +t-critical.
b) Reject H0, p-value > α = 0.05 and +t-statistic < +t-critical.
c) Not reject H0, p-value < α = 0.05 and +t-statistic > +t-critical.
d) Not reject H0, p-value > α = 0.05 and +t-statistic < +t-critical.
e) Throw his hands in the air and go and practise his putting.
A16. A retail manager wants to predict sales of umbrellas (Y) (in $000s). The
manager uses a combination of the following variables: daily maximum
temperature (X1), number of customers in the shopping centre where the store
is located (X2), friendliness of staff (X3), and whether or not the manager is
walking the shop floor (X4). Which of these variables is most likely to be the
strongest predictor of umbrella sales for this retail store?
a) X1
b) X2
c) X3
d) X4
e) All of these variables would have little effect on sales.
A18. Not satisfied with just considering the effect of daily maximum temperature on
umbrella sales, the manager recorded the number of customers (CUSTOMERS)
visiting the store, irrespective of whether the customer purchased an umbrella.
The regression model is:
SALES (predicted) = 472 - 4*MAXTEMP +3.5*CUSTOMERS
The new coefficient of determination is 0.59. Which of the following statements
is not true?
a) ‘Umbrella sales’ is the dependent variable.
b) Assuming the maximum temperature is kept constant, the average effect
of each extra customer is to increase sales by $3.5.
c) Assuming the number of customers is kept constant, the effect of a one-
degree increase in maximum temperature is to increase sales by $4.
d) 59% of the variation in umbrella sales is explained by the combined
variation in maximum temperature and number of customers.
e) Umbrella sales are negatively correlated with daily maximum temperature
and positively correlated with the number of customers in the store.
A19. Use the model in A18 to predict SALES on a day when MAXTEMP = 19°C and
CUSTOMERS = 220.
a) 6671
b) 1166
c) 2352
d) 472
e) 2743
PART B: Case Study Questions (80 marks). Contains 8 case studies with
short answer questions. Please attempt to answer all questions.
The coaches of a local football team wanted to determine whether the players on their team
were older than those of other teams. They recorded the ages of the players on two of the
teams as follows:
Team A: 25, 25, 28, 30, 25, 25, 24, 25, 22, 21, 19,
20, 22, 33, 28, 25, 26, 30, 28, 22, 24, 21
Team B: 23, 23, 26, 27, 24, 20, 21, 23, 25, 21, 22,
29, 28, 29, 28, 27, 23, 23, 26, 32, 23, 20
A set of descriptive statistics was computed for both teams’ ages, along with adjacent five-
number summaries (below) and box plots (next page).
Statistic                   Team A     Team B
Mean                        24.909     24.682
Standard Error              0.756      0.694
Median                      25         23.5
Mode                        25         23
Standard Deviation          3.544      3.257
Sample Variance             12.563     10.608
Kurtosis                    -0.112     -0.496
Skewness                    XXXX       XXXX
Range                       14         12
Minimum                     19         20
Maximum                     33         32
Sum                         548        543
Count                       22         22
Confidence Level (95.0%)    1.571      1.444
Five-number Summary
                  Team A     Team B
Minimum           19         20
First Quartile    22         23
Median            25         23.5
Third Quartile    28         27
Maximum           33         32
[Figure: Box Plots for Age. Vertical axis: Age (15 to 35); horizontal axis: Team (Team A, Team B).]
1. Given one of the aims of this analysis is to make inferences about the
average age of all football players (there are 16 teams in the competition),
what type of sampling scheme has been used here? Briefly explain. (2
marks)
2. For which team, Team A or Team B, is age more variable? Refer to and/or
compute two different statistics to support your answer. (2 marks)
Case Study B (6 marks).
Please attempt to answer ALL questions.
Lucky Pizzas, where your food is presented with a smile and a poem, delivers its
wide assortment of pizzas in an average time of 24 minutes with a standard
deviation of 6 minutes. The delivery times are approximately normally
distributed.
1. The owner of Lucky Pizzas has promised that any household waiting more
than 33 minutes to receive its order will get the pizza(s) free. What is the
probability of receiving a free pizza? (1 mark)
2. If 200 orders were received in one night, how many would you expect to take
between 15 and 33 minutes to be delivered? Show all workings. (2 marks)
3. On a night in which 100 deliveries were made, the average waiting time
was 25 minutes. Calculate the probability the average delivery time exceeds
25 minutes. (2 marks)
4. Does your answer to (3) indicate that the average delivery time is now
significantly greater than 24 minutes? Briefly explain (use α=0.05). (1 mark)
i. Note: In effect, you are testing H0: µ ≤ 24 v H1: µ >24.
Case Study C (9 marks).
Please attempt to answer ALL questions.
The growing use of bicycles to commute to work has caused many cities to create
exclusive bicycle lanes. These lanes are usually created by disallowing parking on
streets that formerly allowed curb-side parking. Shop-owners on such streets
complain that the removal of parking will cause their businesses to suffer. To
examine this problem a mayor of a large city decided to launch an experiment on
one busy street that had one-hour parking meters. The meters were removed and a
bicycle lane was created. The mayor asked three businesses (a drycleaner, a
doughnut shop, and a convenience store) in one block to record daily sales for two
complete weeks (Sunday to Saturday) prior to the change and two complete weeks
after the change, the assumption being that the removal of parking bays would
result in fewer sales. The data are presented below:
1. State the null hypothesis for the doughnut shop. Write the null hypothesis in
words AND statistical notation. (2 marks)
2. State the alternative hypothesis for the doughnut shop. Write the alternative
hypothesis in words AND statistical notation. (2 marks)
Following is the EXCEL output.
3. Use the information from the EXCEL output to decide whether you should
reject the null hypothesis for each of the three shops. Show all workings. (Use
alpha=0.05). (3 marks)
4. What would you conclude from this study in relation to the original concern of
the shop owners? (2 marks)
Case Study D (9 marks).
Please attempt to answer ALL questions.
The owners of Big Bucks car yard are concerned with their declining sales over the
past few months. As a result they want to determine whether there is a difference in
the motivation of sales staff paid on an hourly rate plus commission or on
commission only (but at a higher percentage rate). Of 24 randomly selected newly
trained sales staff, 12 were paid at an hourly rate and 12 on a commission basis
only. The following data represent the sales in volume (in thousands of dollars)
achieved during the first month on the job.
147 330
224 472
118 195
209 489
126 386
372 462
197 509
260 312
372 227
451 325
447 518
328 476
To determine whether there is any difference in the sales volume of the two
motivational methods one of the owners of Big Bucks, who has a statistical
background, decided to analyse the data with a hypothesis test at a 5%
significance level (α). The Excel output is below.
1. State the null hypothesis for this situation. Write the null hypothesis in words
AND statistical notation (2 marks)
2. State the alternative hypothesis for this situation. Write the alternative
hypothesis in words AND statistical notation. (2 marks)
Hourly Commission
Mean 270.9166667 391.75
Variance 14407.90152 12557.47727
Observations 12 12
Pooled Variance 13482.68939
Hypothesized Mean Difference 0
Df 22
t Stat -2.549025127
P(T<=t) one-tail 0.009146094
t Critical one-tail 1.717144187
P(T<=t) two-tail 0.018292187
t Critical two-tail 2.073875294
3. Use this information to decide whether you should reject the null hypothesis
in this case. Show all workings. (Use alpha=0.05). (1 mark)
Case Study E (6 marks).
Please attempt to answer ALL questions.
Case Study F (16 marks).
Please attempt to answer ALL questions.
The chief statistician of Brokenleggen - also the mayor - wants to isolate the
seasonal effects of the town’s unemployment to assist in the prediction of future
unemployment levels. The data were plotted (see graph below) and the line of best
fit was computed.
[Figure: Unemployment in Brokenleggen 1996-2000. Vertical axis: Unemployed (0 to 300); horizontal axis: Quarter (0=March 1996), 0 to 18. Line of best fit: y = 1.2135x + 162.76, R² = 0.0156.]
Further data analysis produced the following quarterly seasonal indexes:
MARCH = 0.66
JUNE = 1.25
SEPTEMBER = 1.36
DECEMBER = 0.73
The quarterly seasonal indexes were used to seasonally adjust the data. The
regression equation is: 164.46 + 1.0163*(coded quarter) where March 1996 is
coded 0. The R² is 0.3982. The data are graphically displayed below.
[Figure: De-seasonalised Unemployment in Brokenleggen 1996-2000. Vertical axis: Unemployed (140 to 200); horizontal axis: Quarter (0=March 1996), 0 to 18. Line of best fit: y = 1.0163x + 164.46, R² = 0.3982.]
1. Comment briefly on the main reason why the line of best fit (trend line) has an
extraordinarily low coefficient of determination (R²) value. (2 marks)
4. Use the trend line to predict unemployment in September 1998 and compute the
residual. (3 marks)
6. Use the multiplicative model Y=TS (i.e. ignore cyclical (C) and irregular (I)
effects) to predict unemployment in December 2001. (3 marks)
Case Study G (18 marks).
Please attempt to answer ALL questions.
A DADM student decides to predict the selling price for apartments in an upmarket
suburb of Perth. The student randomly collected data for 31 apartments sold in the
last 6 months. The variables included the age of the apartment in years (AGE), the
proximity of the apartment to transport- whether or not it was close to public transport
(dummy variable 1=close, 0= not close) (TRANS), the location of the apartment
measured by the kilometres away from the centre of the city (LOCATION), the
number of bedrooms in the apartment (BEDRMS), the number of bathrooms in the
apartment (BATHRMS). The information is as follows:
Selling Price (000s) Age (yrs) Close to Transport Location No. Bedrooms No. Bathrooms
Y X1 X2 X3 X4 X5
640 14 0 13 1 1
570 20 0 16 1 1
890 4 1 9 5 2
670 14 0 12 2 1
750 9 0 9 3 2
680 12 0 13 2 2
460 20 0 21 1 1
770 7 1 8 4 2
650 14 0 11 2 1
620 21 0 12 1 1
890 4 1 7 5 2
560 15 0 20 1 1
670 13 0 9 2 1
680 15 0 11 2 1
640 13 0 14 2 1
760 12 1 7 3 1
850 5 1 5 5 2
790 8 1 7 4 1
880 3 1 6 5 1
920 1 1 3 6 3
740 10 0 7 2 1
570 16 0 17 1 1
680 13 0 12 3 1
670 14 0 10 2 1
930 2 1 4 6 2
560 18 0 17 1 1
520 14 0 18 1 1
510 21 0 20 1 1
710 11 0 7 2 1
740 10 0 6 3 2
670 16 0 10 2 1
                       Selling Price (000s)   Age (yrs)   Close to Transport   Location    No. Bedrooms   No. Bathrooms
Selling Price (000s)    1
Age (yrs)              -0.934674404            1
Close to Transport      0.805954172           -0.790302    1
Location               -0.910099748            0.804188   -0.629731296          1
No. Bedrooms            0.945762931           -0.933053    0.876556176         -0.799555    1
No. Bathrooms           0.668905203           -0.706306    0.547251139         -0.537283    0.724347265    1
1. Comment briefly on the correlation matrix. Specifically in relation to the
relationship between the dependent variable and each independent variable (i.e.
strength and direction). (2.5 marks)
Best Subsets Analysis
X1 X2 X3 X4
Regression Statistics
Multiple R 0.983339284
R Square 0.966956147
Adjusted R Square 0.961872477
Standard Error 24.45813435
Observations 31
ANOVA
df SS MS F Significance F
Regression 4 455130.6622 113782.6656 190.2082944 7.59928E-19
Residual 26 15553.20873 598.2003358
Total 30 470683.871
ANOVA
df SS MS F Significance F
Regression 5 455135.8691 91027.17382 146.3647461 1.12258E-17
Residual 25 15548.00186 621.9200746
Total 30 470683.871
Best Subsets Analysis
X1 X3 X4
Regression Statistics
Multiple R 0.983055532
R Square 0.966398178
Adjusted R Square 0.962664642
Standard Error 24.20272072
Observations 31
ANOVA
df SS MS F Significance F
Regression 3 454868.0353 151622.6784 258.842619 5.35047E-20
Residual 27 15815.83564 585.7716903
Total 30 470683.871
ANOVA
df SS MS F Significance F
Regression 4 454915.2495 113728.8124 187.5210927 9.08323E-19
Residual 26 15768.62143 606.4854397
Total 30 470683.871
2. On the basis of the results above, which Regression Model is a more appropriate
predictor of Selling Price and why (refer to output)? (2 marks)
3. Referring to the regression output in the model you have selected from
question 2 (at the 0.05 significance level) determine whether each explanatory
variable makes a significant contribution to the regression model. (2.5 marks)
5. State the regression equation that should be used to predict Selling Price (using
your selected model). (2 marks)
6. Using the equation in question 5 estimate the selling price that an apartment
owner can expect given that they have an apartment which is 16 years old, is
close to transport, 8 kilometres from the city, has 2 bedrooms and 1 bathroom.
Show all workings. (2 marks)
7. Using the equation in question 5 estimate the selling price that an apartment
owner can expect given that they have an apartment which is 24 years old, is
not close to transport, 18 kilometres from the city, has 3 bedrooms and 1
bathroom. Show all workings. (2 marks)
8. Interpret the value of the ‘age’ coefficient in your chosen model. (2 marks)
9. Given that the adjusted coefficient of multiple determination is not 100%, there is
still some of the variability in Selling Price left unexplained. List 2 additional
variables that have not previously been considered in this example that may help
to predict Selling Price of apartments. (1 mark)
Case Study H (10 marks).
Please attempt to answer ALL questions.
The managers of a newly-established holiday resort want to ensure that their guests
are receiving excellent service compared with their competitors. As they had spent
considerable time hiring the ‘right staff for the job’ they felt that their resort patrons
would be more satisfied, and therefore well above the industry average of 7.25 (out
of 10). To determine whether this is true 77 people who had recently stayed in their
resort, were randomly selected. The sample data showed the average rating as 7.37
with a standard deviation of .76. Work through the following questions to determine
whether this difference is significant.
Formulae Page
Standardizing values for:
Z = (x - μ) / σ
The sampling distribution
σ unknown: x̄ ± t [s/√n]
σ known: x̄ ± Z [σ/√n]
Hypothesis testing for the mean
Z = (x̄ - μ) / (σ/√n)
t = (x̄ - μ) / (s/√n)
NOTE: You may also need to refer to table E.2 the standardized normal distribution
and table E.3 the critical values of t from your textbook.
SOLUTIONS
Part A: Multiple Choice
1. a 2. c 3. a 4. c
5. b 6. d 7. e 8. d
Part B:
Case Study A: Ages of football players
1. This is an example of cluster sampling: the 16 football teams together
form the population and each team is a cluster. Teams A and B were randomly
selected (as clusters), and all of the players from these two teams formed
the sample.
2. Range (Team A v Team B): 14 > 12
Standard Deviation (Team A v Team B): 3.54 > 3.26
Player ages for Team A are more variable than for Team B, as shown by
the larger range and standard deviation.
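
The two statistics quoted above can be recomputed from the ages listed in the question. The following is a minimal sketch using Python's standard `statistics` module; the variable and function names are illustrative only and are not part of the original solution.

```python
import statistics

# Player ages as listed in the question
team_a = [25, 25, 28, 30, 25, 25, 24, 25, 22, 21, 19,
          20, 22, 33, 28, 25, 26, 30, 28, 22, 24, 21]
team_b = [23, 23, 26, 27, 24, 20, 21, 23, 25, 21, 22,
          29, 28, 29, 28, 27, 23, 23, 26, 32, 23, 20]

def spread(ages):
    """Return (range, sample standard deviation) for a list of ages."""
    return max(ages) - min(ages), statistics.stdev(ages)

range_a, sd_a = spread(team_a)
range_b, sd_b = spread(team_b)

print(f"Team A: range = {range_a}, s = {sd_a:.3f}")
print(f"Team B: range = {range_b}, s = {sd_b:.3f}")
```

Both measures come out larger for Team A, which is the basis of the conclusion above.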
Case Study B: Lucky Pizzas
1. Z = (x - μ)/σ = (33 - 24)/6 = 1.5
From the tables, p = 0.9332. As we want the probability for x > 33:
1 - 0.9332 = 0.0668
2. Z = (x - μ)/σ = (15 - 24)/6 = -1.5, so for x < 15, p = 0.0668
For x < 33, p = 0.9332
P(15 < x < 33) = 0.9332 - 0.0668 = 0.8664
Expected number of orders = 200 × 0.8664 ≈ 173
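
As a cross-check on the table look-ups in parts 1 and 2, the same probabilities can be obtained directly from the normal distribution. This is a sketch only, assuming Python 3.8+ so that `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# Delivery times: mean 24 minutes, standard deviation 6 minutes (from the case study)
delivery = NormalDist(mu=24, sigma=6)

p_free = 1 - delivery.cdf(33)                     # P(x > 33): the pizza is free
p_between = delivery.cdf(33) - delivery.cdf(15)   # P(15 < x < 33)
expected_orders = 200 * p_between                 # expected count out of 200 orders

print(f"P(x > 33)      = {p_free:.4f}")
print(f"P(15 < x < 33) = {p_between:.4f}")
print(f"Expected orders between 15 and 33 minutes: {expected_orders:.0f}")
```

Multiplying the probability by the 200 orders converts it into the expected count that part 2 asks for.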
3. (Please note that this is now a sampling distribution question, not a normal
distribution question.)
Z = (x̄ - μ)/(σ/√n) = (25 - 24)/(6/√100) = 1.667
From the tables, p = 0.9522. As we want the probability that the mean exceeds 25:
1 - 0.9522 = 0.0478
4. Since p=0.0478 < α=0.05 we reject H0. Thus there is enough evidence to
conclude that the average delivery time is now significantly greater than 24
minutes.
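
Parts 3 and 4 can be checked together: the standard error replaces σ, and the resulting upper-tail probability is compared with α. The sketch below assumes, as the solution does, that σ = 6 is treated as known; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 24, 6, 100   # hypothesised mean, population sd, sample size
x_bar = 25                  # observed average delivery time
alpha = 0.05

standard_error = sigma / sqrt(n)        # 6 / 10 = 0.6
z = (x_bar - mu) / standard_error
p_value = 1 - NormalDist().cdf(z)       # upper-tail probability

print(f"Z = {z:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The p-value sits only just below 0.05, so the decision is sensitive to rounding in the table look-up.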
Case Study C: Drycleaner, Doughnut Shop, Convenience Store
1. Doughnut shop
H0: μAFTER ≥ μBEFORE
The null hypothesis states that the average sales for the doughnut shop after
the removal of the parking bays will be greater than or equal to the average
sales before the removal of the parking bays.
2. Doughnut shop
H1: μAFTER < μBEFORE
The alternative hypothesis states that the average sales for the doughnut
shop after the removal of the parking bays will be less than the average sales
before the removal of the parking bays.
3. This question may be correctly answered using either the p-value method or
critical value method. I have included both below.
Drycleaner
Since p(one-tail)=0.178 > α=0.05 do not reject H0
Since tstat = -0.957 > -tcrit = -1.77 do not reject H0
Doughnut shop
Since p(one-tail)=0.003 < α=0.05 reject H0
Since tstat = -3.24 < -tcrit = -1.77 reject H0
Convenience store
Since p(one-tail)=0.0000028 < α=0.05 reject H0
Since tstat = -7.34 < -tcrit = -1.77 reject H0
Case Study D: Big Bucks car yard
3. This question may be correctly answered using either the p-value method or
critical value method. I have included both below.
Since p(two-tail)=0.018 < α=0.05 reject H0
Since tstat = -2.549 < -tcrit = -2.07 reject H0
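
The pooled-variance t statistic in the Excel output can be reproduced from the raw sales figures. The sketch below assumes the first column of the data table is the hourly-rate group and the second the commission-only group (the column means match the Excel output under that reading). The critical value is copied from the output rather than computed, and all names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# First-month sales ($000s) from the case study table
hourly = [147, 224, 118, 209, 126, 372, 197, 260, 372, 451, 447, 328]
commission = [330, 472, 195, 489, 386, 462, 509, 312, 227, 325, 518, 476]

n1, n2 = len(hourly), len(commission)
df = n1 + n2 - 2

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * variance(hourly) + (n2 - 1) * variance(commission)) / df
t_stat = (mean(hourly) - mean(commission)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

T_CRIT_TWO_TAIL = 2.0739   # copied from the Excel output (df = 22, alpha = 0.05)

print(f"pooled variance = {pooled_var:.2f}, t = {t_stat:.3f}")
print("Reject H0" if abs(t_stat) > T_CRIT_TWO_TAIL else "Do not reject H0")
```

Comparing the absolute value of t with the two-tail critical value is equivalent to the critical value method shown above.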
5. A type 1 error occurs when the null hypothesis is rejected when in actual fact
it is true and should not be rejected. In this case a type 1 error would have
occurred if the results led us to conclude that there is a difference in average
sales between staff paid on an hourly rate and staff paid on commission only,
when in actual fact this is not true and thus there is no real difference
between the two motivational methods.
Case Study E: Shampoo
1. n = 100, df = 100 - 1 = 99
x̄ = 199
s = 0.90 (s = sample standard deviation)
95% Confidence interval
Using t
t for df = 99, α = 0.05 is 1.9842
x̄ ± [t (s/√n)] = 199 ± [1.9842 (0.90/√100)]
= (198.82, 199.18)
Using Z
Z = 1.96
x̄ ± [Z (s/√n)] = 199 ± [1.96 (0.90/√100)]
= (198.82, 199.18)
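
The interval arithmetic can be verified with a few lines of code. This is a minimal sketch: the critical values are the ones quoted in the solution, not computed, and the function name is hypothetical.

```python
from math import sqrt

n, x_bar, s = 100, 199, 0.90   # sample size, sample mean (mls), sample sd (mls)
T_CRIT = 1.9842                # t for df = 99, as quoted in the solution
Z_CRIT = 1.96

def confidence_interval(critical_value):
    """Return (lower, upper) limits for x_bar ± critical_value * s / sqrt(n)."""
    margin = critical_value * s / sqrt(n)
    return x_bar - margin, x_bar + margin

for label, crit in (("t", T_CRIT), ("Z", Z_CRIT)):
    lower, upper = confidence_interval(crit)
    print(f"Using {label}: ({lower:.2f}, {upper:.2f})")
```

With n = 100 the t and Z critical values are close enough that both intervals agree to two decimal places.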
2. We are 95% confident that the average amount of shampoo for all shampoo
bottles filled is between 198.82mls and 199.18mls.
3. As 200mls does not fall within the confidence interval, this suggests that
the bottling plant is under-filling the shampoo bottles and thus needs to either
make adjustments to the system or change the advertised amount on the bottle to
199mls.
Case Study F: Brokenleggen
1. The line of best fit does not fit the data well, as reflected in the low R²
value, because the data are highly seasonal. This can be seen in the peaks and
troughs in the data within each year.
2. The June seasonal index is 1.25 meaning that unemployment in the June
quarter is 25% higher than the average quarterly level.
This de-seasonalised figure for September 1998 (177.21) is very similar to the
trend estimate of 174.6
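
The trend estimate quoted above is consistent with the de-seasonalised regression line evaluated at coded quarter 10 (September 1998, with March 1996 coded 0). The sketch below shows that calculation and, as an assumption about how the multiplicative model Y = TS would be applied, multiplies the same trend line by the quarter's seasonal index to produce a forecast. The function names are hypothetical.

```python
# Seasonal indexes and de-seasonalised trend line quoted in the case study
SEASONAL_INDEX = {"MARCH": 0.66, "JUNE": 1.25, "SEPTEMBER": 1.36, "DECEMBER": 0.73}
MONTHS = ["MARCH", "JUNE", "SEPTEMBER", "DECEMBER"]

def coded_quarter(month, year):
    """Quarter code with March 1996 = 0."""
    return (year - 1996) * 4 + MONTHS.index(month)

def trend(quarter):
    """De-seasonalised trend: 164.46 + 1.0163 * (coded quarter)."""
    return 164.46 + 1.0163 * quarter

def forecast(month, year):
    """Multiplicative model Y = T * S (cyclical and irregular effects ignored)."""
    return trend(coded_quarter(month, year)) * SEASONAL_INDEX[month]

q_sep98 = coded_quarter("SEPTEMBER", 1998)
print(f"September 1998: quarter {q_sep98}, trend = {trend(q_sep98):.1f}")

q_dec01 = coded_quarter("DECEMBER", 2001)
print(f"December 2001: quarter {q_dec01}, forecast = {forecast('DECEMBER', 2001):.1f}")
```

The forecast step assumes the trend fitted to the de-seasonalised data is the appropriate T; using the line fitted to the raw data would give a different figure.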
Case Study G: Selling Price for Apartments
1. Age and location are negatively correlated with selling price. Close to
transport, number of bedrooms and number of bathrooms are positively
correlated with selling price. All independent variables are strongly correlated
with selling price.
2. The first regression output table lists all possible models (containing all
possible combinations of the variables). It is evident that four of these models
should be considered useful. After considering the output for each of these
four models, Model 3 (X1, X3, X4) is a more appropriate predictor of selling
price as it has the highest adjusted r2, lowest standard error and all of the
independent variables make a significant contribution to the model.
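
The adjusted R² comparison can be reproduced from the R Square values in the two Regression Statistics tables, using the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1). This is a sketch for checking the output only; the function and dictionary names are illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

N_OBS = 31  # apartments in the sample

# (R Square, number of predictors) copied from the two Regression Statistics tables
models = {
    "X1 X2 X3 X4": (0.966956147, 4),
    "X1 X3 X4": (0.966398178, 3),
}

adjusted = {name: adjusted_r_squared(r2, N_OBS, k) for name, (r2, k) in models.items()}
for name, value in adjusted.items():
    print(f"{name}: adjusted R^2 = {value:.6f}")

best = max(adjusted, key=adjusted.get)
print("Highest adjusted R^2:", best)
```

Dropping X2 lowers R² slightly but raises adjusted R², because the penalty for the extra predictor outweighs the small gain in fit.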
3. Using Model 3 (each p-value compared with alpha = 0.05):
Age (p = 0.033): as p < alpha (0.05), reject H0 and conclude that age is
making a significant contribution to the regression model.
Location (p = 0.000000757): as p < alpha (0.05), reject H0 and conclude that
location is making a significant contribution to the regression model.
Number of bedrooms (p = 0.00027): as p < alpha (0.05), reject H0 and conclude
that number of bedrooms is making a significant contribution to the
regression model.
8. The age coefficient is -5.12. This can be interpreted as follows: as the age of an
apartment increases by one year, the expected selling price declines on
average by $5.12 (thousand), or $5,120, holding all other variables constant.
Case Study H: Customer Satisfaction at Holiday Resort
1. H0: μ ≤ 7.25
The null hypothesis states that the average customer satisfaction rating is less
than or equal to 7.25.
3. Zstat= 7.37-7.25
0.76/√77
=1.39
5. There is not enough evidence to conclude that the average customer
satisfaction rating is greater than 7.25.
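
The test statistic and decision can be checked as follows. This is a sketch under two assumptions not stated in the visible text: a significance level of α = 0.05, and the use of the Z distribution with the sample standard deviation (as the Zstat in the solution implies). The names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 7.25                    # industry average rating
x_bar, s, n = 7.37, 0.76, 77   # sample mean, sample sd, sample size
alpha = 0.05                   # assumed significance level

z_stat = (x_bar - mu_0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value, about 1.645
p_value = 1 - NormalDist().cdf(z_stat)

print(f"Z = {z_stat:.2f}, critical value = {z_crit:.3f}, p = {p_value:.3f}")
print("Reject H0" if z_stat > z_crit else "Do not reject H0")
```

Because the statistic falls below the critical value, the sample does not support the claim that the resort's rating exceeds the industry average.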