
The University of Western Australia

The Graduate School of Management

DATA ANALYSIS AND DECISION MAKING MGMT8504

Practice Substitution Test

Student Name______________________________ Student No.________________

Time: 2 hours plus 10 minutes reading time


Questions: This paper contains two parts:

Part A contains 20 multiple-choice questions, each worth 1 mark.


Total Part A marks: 20

Part B contains 8 case studies with short answer questions. Marks are
identified within each question.
Total Part B marks: 80

Total marks: 100


Answer: Please attempt to answer ALL questions. Write your answers in the
space provided in this booklet.
Pages: This booklet contains 38 pages including this page, statistical tables
and solutions (please note solutions will not be provided with the actual
test ☺).
Special Conditions: This is a closed-book test with only your calculators
(non-programmable), statistical tables and formulae page available for
assistance. Please note that if you pass this test (50% or better) you
will be able to substitute another elective unit in place of the Data
Analysis and Decision Making MGMT8504 unit.

GOOD LUCK EVERYONE!!!

Page 1 of 38
PART A: Multiple Choice questions (20 marks). Contains 20 questions,
each worth 1 mark. Please attempt to answer ALL questions.

Circle the letter that corresponds to the most correct answer beside each of
the following questions.

A1. The process of using sample statistics to draw conclusions about population
parameters is called
a) inferential statistics.
b) experimentation.
c) primary sources.
d) descriptive statistics.
e) the scientific method.

A2. In analysing categorical data, the following graphical device is not appropriate
a) Pie chart.
b) Pareto diagram.
c) Stem and leaf display.
d) Bar chart.
e) they are all appropriate.

A3. The number of Singaporeans travelling to work by car today is an example of


a) discrete numerical data.
b) categorical data.
c) continuous numerical data.
d) discrete categorical data.
e) continuous categorical data.

A4. A summary measure that is computed from only a sample of the population is
called
a) A parameter.
b) A census.
c) A statistic.
d) The scientific method.
e) All of the above.

A5. In a right-skewed distribution


a) The median, mean and mode are all equal.
b) The median and mode are both smaller than the mean.
c) The median and mode are both larger than the mean.
d) The distance between Q1 and the median and Q3 and the median is
equal.
e) None of the above.

A6. Which of the following statements about the median is false?
a) It is a measure of a "typical" value.
b) It is equal to the mean in a "bell-shaped" normal distribution.
c) It is the average of the two middle values when an even number of data
values are ranked.
d) It is more affected by extreme values than the mean.
e) It is equal to the mode in a "bell-shaped" normal distribution.

Questions A7 to A10 relate to the example below.


The average waiting time at the drive-through section of a popular hamburger chain
is 4 minutes with a standard deviation of 1 minute.

A7. What is the probability that a customer would have to wait longer than 3.5
minutes?
a) 0.5000
b) 0.2375
c) 0.7523
d) 0.3085
e) 0.6915

A8. If a sample of 100 customers (n=100) is taken, what is the probability that the
sample mean is less than 4.2 minutes?
a) 0.0228
b) 0.4207
c) 0.5793
d) 0.9772
e) 1.0000

A9. The same hamburger chain later surveys 1,000 customers to see if it should
add beetroot to its Super Jumbo burger. The hamburger chain has 50 stores
but, on the assumption that each store’s customers have similar tastes in
hamburgers, randomly chooses only 10 of the stores to participate in the
survey. Each store then samples 100 customers. What sampling method has
the hamburger chain used?
a) Cluster sampling
b) Non-probability sampling
c) Simple random sampling
d) Stratified sampling
e) Systematic sampling

A10. The 1,000 customers sampled were also asked to estimate how much they
spend per visit at this hamburger chain. From this sample data, it was found that
on average customers spend $10.42 per visit with a standard deviation of $6.24.
The correct 95% confidence interval for the mean of all customers spending
per visit to this hamburger chain is:
a) 8.11 to 12.43
b) 4.18 to 16.66
c) 10.03 to 10.81
d) 9.72 to 11.36
e) 9.91 to 10.93

A11. A group of researchers were attempting to determine whether female MBA
graduates have a similar mean starting salary as male MBA graduates. What
assumptions were necessary to conduct this hypothesis test?
a) Both populations of salaries (male and female) must have approximate
normal distributions.
b) The population variances are approximately equal.
c) The samples were randomly and independently selected.
d) All of the above.
e) (a) and (b) only.

Questions A12 to A15 relate to the example below.


The membership controller at the Royal Big Bucks Golf Club believes that each
member, on average, plays golf for more than 12 hours per week. To test
his theory, the controller took a random sample of 16 golfers and asked them
how many hours a week they played golf.
The data was as follows: 12, 15, 10, 22, 7, 16, 8, 18, 17, 14, 13, 14, 8, 24, 18, 16.
The mean number of hours the sample of golfers play for is 14.5 with a
standard deviation of 4.844.

A12. State the null and alternative hypothesis to determine if the average number
of hours played on the golf course is more than 12 hours per week.
a) H0: μ ≤ 12 hours and H1: μ > 12 hours
b) H0: μ ≥ 12 hours and H1: μ < 12 hours
c) H0: μ = 12 hours and H1: μ ≠ 12 hours
d) H0: x̄ ≤ 12 hours and H1: x̄ > 12 hours
e) H0: x̄ ≥ 12 hours and H1: x̄ < 12 hours

A13. To test the hypotheses the membership controller decided to construct a one-
tailed hypothesis test with a 5% significance level (i.e. α = 0.05). What is the
appropriate t-critical value for a sample of size 16?
a) 1.6450
b) 1.7459
c) 1.9600
d) 2.1315
e) 1.7531

A14. To test the hypotheses the membership controller decided to construct a one-
tailed hypothesis test with a 5% significance level (i.e. α = 0.05). What is the
appropriate t-statistic value?
a) 2.32
b) 2.06
c) 1.96
d) 1.80
e) 2.41

A15. Based on a 0.05 alpha level, the membership controller's decision would be:
a) Reject H0, p-value < α = 0.05 and +t-statistic > +t-critical.
b) Reject H0, p-value > α = 0.05 and +t-statistic < +t-critical.
c) Not reject H0, p-value < α = 0.05 and +t-statistic > +t-critical.
d) Not reject H0, p-value > α = 0.05 and +t-statistic < +t-critical.
e) Throw his hands in the air and go and practise his putting.

A16. A retail manager wants to predict sales of umbrellas (Y) (in $000s). The
manager uses a combination of the following variables: daily maximum
temperature (X1), number of customers in the shopping centre where the store
is located (X2), friendliness of staff (X3), and whether or not the manager is
walking the shop floor (X4). Which of these variables is most likely to be the
strongest predictor of umbrella sales for this retail store?
a) X1
b) X2
c) X3
d) X4
e) All of these variables would have little effect on sales.

Questions A17 to A18 relate to the example below.


The manager of this retail store decides to statistically analyse the relationship
between sales of umbrellas (SALES) and the daily maximum temperature
(MAXTEMP). The regression model is: SALES(predicted)=520 - 5*MAXTEMP.
The coefficient of correlation (R) for a random sample of 20 days between
maximum temperature and umbrella sales is -0.70.

A17. Which of the following statements is not true?


a) Daily maximum temperature is a good predictor of umbrella sales.
b) A positive relationship exists between maximum temperature and
umbrella sales.
c) The coefficient of determination (R²) is 0.49.
d) If the max. temperature is 22°C, umbrella sales are predicted to be $410.
e) If the max. temperature is 15°C, umbrella sales are predicted to be $445.

A18. Not satisfied with just considering the effect of daily maximum temperature on
umbrella sales, the manager recorded the number of customers (CUSTOMERS)
visiting the store, irrespective of whether the customer purchased an umbrella.
The regression model is:
SALES (predicted) = 472 - 4*MAXTEMP +3.5*CUSTOMERS
The new coefficient of determination is 0.59. Which of the following statements
is not true?
a) ‘Umbrella sales’ is the dependent variable.
b) Assuming the maximum temperature is kept constant, the average effect
of each extra customer is to increase sales by $3.5.
c) Assuming the number of customers is kept constant, the effect of a one-
degree increase in maximum temperature is to increase sales by $4.
d) 59% of the variation in umbrella sales is explained by the combined
variation in maximum temperature and number of customers.
e) Umbrella sales are negatively correlated with daily maximum temperature
and positively correlated with the number of customers in the store.

A19. Use the model in A18 to predict SALES on a day when MAXTEMP = 19°C and
CUSTOMERS = 220.
a) 6671
b) 1166
c) 2352
d) 472
e) 2743

A20. The annual multiplicative time-series model possesses which components?


a) Cyclical, Seasonal and Trend only
b) Trend and Seasonal only
c) Irregular, Seasonal, Cyclical and Trend only
d) Cyclical, Seasonal and Irregular only
e) Trend, Cyclical and Irregular only

PART B: Case Study Questions (80 marks). Contains 8 case studies with
short answer questions. Please attempt to answer all questions.

Case Study A (6 marks).


Please attempt to answer ALL questions.

The coaches of a local football team wanted to determine whether the players on their team
were older than those of other teams. They recorded the ages of the players on two of the
teams as follows:

Team A: 25, 25, 28, 30, 25, 25, 24, 25, 22, 21, 19,
20, 22, 33, 28, 25, 26, 30, 28, 22, 24, 21
Team B: 23, 23, 26, 27, 24, 20, 21, 23, 25, 21, 22,
29, 28, 29, 28, 27, 23, 23, 26, 32, 23, 20

A set of descriptive statistics was computed for both teams’ ages, along with adjacent five-
number summaries (below) and box plots (next page).

Descriptive Statistic        Team A     Team B
Mean                         24.909     24.682
Standard Error                0.756      0.694
Median                       25         23.5
Mode                         25         23
Standard Deviation            3.544      3.257
Sample Variance              12.563     10.608
Kurtosis                     -0.112     -0.496
Skewness                     XXXX       XXXX
Range                        14         12
Minimum                      19         20
Maximum                      33         32
Sum                         548        543
Count                        22         22
Confidence Level (95.0%)      1.571      1.444

Five-number Summary
                   Team A     Team B
Minimum            19         20
First Quartile     22         23
Median             25         23.5
Third Quartile     28         27
Maximum            33         32

[Figure: Box Plots for Age, one box plot per team (Team A, Team B). Vertical axis: Age (15 to 35); horizontal axis: Team.]

1. Given one of the aims of this analysis is to make inferences about the
average age of all football players (there are 16 teams in the competition),
what type of sampling scheme has been used here? Briefly explain. (2
marks)

2. For which team, Team A or Team B, is age more variable? Refer to and/or
compute two different statistics to support your answer. (2 marks)

3. By using either descriptive statistics or box plots, comment on the skewness
of both teams’ age distributions. (2 marks)

Case Study B (6 marks).
Please attempt to answer ALL questions.

Lucky Pizzas, where your food is presented with a smile and a poem, delivers its
wide assortment of pizzas in an average time of 24 minutes with a standard
deviation of 6 minutes. The delivery times are approximately normally
distributed.

1. The owner of Lucky Pizzas has promised that any household waiting more
than 33 minutes to receive its order will get the pizza(s) free. What is the
probability of receiving a free pizza? (1 mark)

2. If 200 orders were received in one night, how many would you expect to take
between 15 and 33 minutes to be delivered? Show all workings. (2 marks)

3. On a night in which 100 deliveries were made, the average waiting time
was 25 minutes. Calculate the probability the average delivery time exceeds
25 minutes. (2 marks)

4. Does your answer to (3) indicate that the average delivery time is now
significantly greater than 24 minutes? Briefly explain (use α=0.05). (1
mark)
Note: In effect, you are testing H0: µ ≤ 24 vs H1: µ > 24.

Case Study C (9 marks).
Please attempt to answer ALL questions.

The growing use of bicycles to commute to work has caused many cities to create
exclusive bicycle lanes. These lanes are usually created by disallowing parking on
streets that formerly allowed curb-side parking. Shop-owners on such streets
complain that the removal of parking will cause their businesses to suffer. To
examine this problem a mayor of a large city decided to launch an experiment on
one busy street that had one-hour parking meters. The meters were removed and a
bicycle lane was created. The mayor asked three businesses (a drycleaner, a
doughnut shop, and a convenience store) in one block to record daily sales for two
complete weeks (Sunday to Saturday) prior to the change and two complete weeks
after the change, the assumption being that the removal of parking bays would
result in fewer sales. The data are presented below:

Day Drycleaner Doughnut Shop Convenience Store


Sales Sales Sales Sales Sales Sales
Before After Before After Before After
Sunday 195 173 319 317 307 287
Monday 194 204 347 331 393 390
Tuesday 146 153 306 301 407 394
Wednesday 186 184 316 306 352 314
Thursday 178 168 324 318 337 308
Friday 146 145 339 340 445 419
Saturday 161 141 272 248 440 429
Sunday 190 185 285 284 357 320
Monday 162 157 312 284 389 354
Tuesday 154 154 346 325 410 398
Wednesday 153 163 266 268 314 270
Thursday 172 175 309 282 359 339
Friday 174 170 315 268 425 380
Saturday 141 145 258 262 310 272

1. State the null hypothesis for the doughnut shop. Write the null hypothesis in
words AND statistical notation. (2 marks)

2. State the alternative hypothesis for the doughnut shop. Write the alternative
hypothesis in words AND statistical notation. (2 marks)

Following is the EXCEL output.

t-Test: Paired Two Sample for Means-- Drycleaners

Sales After Sales Before


Mean 165.5 168
Variance 321.961538 351.3846154
Observations 14 14
Pearson Correlation 0.85899149
Hypothesized Mean Difference 0
df 13
t Stat -0.9571992
P(T<=t) one-tail 0.17796508
t Critical one-tail 1.7709317
P(T<=t) two-tail 0.35593015
t Critical two-tail 2.16036824

t-Test: Paired Two Sample for Means- Doughnut Shop

Sales After Sales Before


Mean 295.285714 308.1428571
Variance 812.065934 809.6703297
Observations 14 14
Pearson Correlation 0.86397796
Hypothesized Mean Difference 0
df 13
t Stat -3.2390095
P(T<=t) one-tail 0.00323176
t Critical one-tail 1.7709317
P(T<=t) two-tail 0.00646353
t Critical two-tail 2.16036824

t-Test: Paired Two Sample for Means- Convenience Store

Sales After Sales Before


Mean 348.142857 374.6428571
Variance 2941.82418 2270.401099
Observations 14 14
Pearson Correlation 0.97310849
Hypothesized Mean Difference 0
df 13
t Stat -7.3412498
P(T<=t) one-tail 2.8273E-06
t Critical one-tail 1.7709317
P(T<=t) two-tail 5.6547E-06
t Critical two-tail 2.16036824

3. Use the information from the EXCEL output to decide whether you should
reject the null hypothesis for each of the three shops. Show all workings. (Use
alpha=0.05). (3 marks)

4. What would you conclude from this study in relation to the original concern of
the shop owners? (2 marks)

Case Study D (9 marks).
Please attempt to answer ALL questions.

The owners of Big Bucks car yard are concerned with their declining sales over the
past few months. As a result they want to determine whether there is a difference in
the motivation of sales staff paid on an hourly rate plus commission or on
commission only (but at a higher percentage rate). Of 24 randomly selected newly
trained sales staff, 12 were paid at an hourly rate and 12 on a commission basis
only. The following data represent the sales in volume (in thousands of dollars)
achieved during the first month on the job.

Hourly Rate Commission

147 330

224 472

118 195

209 489

126 386

372 462

197 509

260 312

372 227

451 325

447 518

328 476

To determine whether there is any difference in the sales volume of the two
motivational methods one of the owners of Big Bucks, who has a statistical
background, decided to analyse the data with a hypothesis test at a 5%
significance level (α). The Excel output is below.

1. State the null hypothesis for this situation. Write the null hypothesis in words
AND statistical notation (2 marks)

2. State the alternative hypothesis for this situation. Write the alternative
hypothesis in words AND statistical notation. (2 marks)

Following is the EXCEL output.


t-Test: Two-Sample Assuming Equal Variances

Hourly Commission
Mean 270.9166667 391.75
Variance 14407.90152 12557.47727
Observations 12 12
Pooled Variance 13482.68939
Hypothesized Mean Difference 0
Df 22
t Stat -2.549025127
P(T<=t) one-tail 0.009146094
t Critical one-tail 1.717144187
P(T<=t) two-tail 0.018292187
t Critical two-tail 2.073875294

3. Use this information to decide whether you should reject the null hypothesis
in this case. Show all workings. (Use alpha=0.05). (1 mark)

4. Draw a conclusion with respect to the problem. (2 marks)

5. Describe what a Type 1 error would be for this situation. (2 marks).

Case Study E (6 marks).
Please attempt to answer ALL questions.

A cosmetic manufacturer is interested in estimating the actual amount of shampoo
that is placed in a 200 ml bottle to ensure that they are not over-filling or under-filling
the bottles. The manager of the bottling plant took a random sample of 100 bottles
and calculated the mean to be 199 ml with a standard deviation of 0.90 ml.

1. Set up a 95% confidence interval of the true population mean amount of
shampoo in each bottle. (3 marks)

2. Interpret this confidence interval in relation to the situation. (2 marks)

3. Does the bottling plant need adjustment (explain)? (1 mark)

Case Study F (16 marks).
Please attempt to answer ALL questions.

Brokenleggen is a medieval village in the Swiss Alps whose unemployment level is
primarily influenced by the influx of skiers, particularly during the snow-laden months
of October-March. The following data represent the number of unemployed
residents of Brokenleggen during the 20 economic quarters from 1996-2000.

Quarter/Year Unemployed Quarter/Year Unemployed


March 1996 105 September 1998 241
June 1996 214 December 1998 128
September 1996 226 March 1999 121
December 1996 126 June 1999 217
March 1997 112 September 1999 250
June 1997 232 December 1999 122
September 1997 224 March 2000 126
December 1997 131 June 2000 229
March 1998 106 September 2000 259
June 1998 208 December 2000 133

The chief statistician of Brokenleggen - also the mayor - wants to isolate the
seasonal effects of the town’s unemployment to assist in the prediction of future
unemployment levels. The data were plotted (see graph below) and the line of best
fit was computed.

[Figure: Unemployment in Brokenleggen 1996-2000. Vertical axis: Unemployed (0 to 300); horizontal axis: Quarter (0 = March 1996). Fitted trend line: y = 1.2135x + 162.76, R² = 0.0156.]

The regression equation shown above is Y = 162.76 + 1.2135X, where X =
quarter/year (March 1996 is X=0), and Y = predicted unemployment level. The
coefficient of determination (R²) = 0.0156.

Further data analysis produced the following quarterly seasonal indexes:
MARCH = 0.66
JUNE = 1.25
SEPTEMBER = 1.36
DECEMBER = 0.73

The quarterly seasonal indexes were used to seasonally adjust the data. The
regression equation is: 164.46 + 1.0163*(coded quarter) where March 1996 is
coded 0. The R² is 0.3982. The data are graphically displayed below.

[Figure: De-seasonalised Unemployment in Brokenleggen 1996-2000. Vertical axis: Unemployed (140 to 200); horizontal axis: Quarter (0 = March 1996). Fitted trend line: y = 1.0163x + 164.46, R² = 0.3982.]

1. Comment briefly on the main reason why the line of best fit (trend line) has an
extraordinarily low coefficient of determination (R²) value. (2 marks)

2. Interpret the June seasonal index value in relation to the seasonality of
unemployment in Brokenleggen. (2 marks)

3. Interpret the December seasonal index value in relation to the seasonality of
unemployment in Brokenleggen. (2 marks)

4. Use the trend line to predict unemployment in September 1998 and compute the
residual. (3 marks)

5. Calculate the de-seasonalised unemployment level in September 1998 and
compare it with the trend estimate. (3 marks)

6. Use the multiplicative model Y=TS (i.e. ignore cyclical (C) and irregular (I)
effects) to predict unemployment in December 2001. (3 marks)

7. Give an example of an irregular effect that might influence unemployment
(positively or negatively) in Brokenleggen. (1 mark)

Case Study G (18 marks).
Please attempt to answer ALL questions.

A DADM student decides to predict the selling price for apartments in an upmarket
suburb of Perth. The student randomly collected data for 31 apartments sold in the
last 6 months. The variables included the age of the apartment in years (AGE), the
proximity of the apartment to transport- whether or not it was close to public transport
(dummy variable 1=close, 0= not close) (TRANS), the location of the apartment
measured by the kilometres away from the centre of the city (LOCATION), the
number of bedrooms in the apartment (BEDRMS), the number of bathrooms in the
apartment (BATHRMS). The information is as follows:

Selling Price (000s) Age (yrs) Close to Transport Location No. Bedrooms No. Bathrooms
Y X1 X2 X3 X4 X5
640 14 0 13 1 1
570 20 0 16 1 1
890 4 1 9 5 2
670 14 0 12 2 1
750 9 0 9 3 2
680 12 0 13 2 2
460 20 0 21 1 1
770 7 1 8 4 2
650 14 0 11 2 1
620 21 0 12 1 1
890 4 1 7 5 2
560 15 0 20 1 1
670 13 0 9 2 1
680 15 0 11 2 1
640 13 0 14 2 1
760 12 1 7 3 1
850 5 1 5 5 2
790 8 1 7 4 1
880 3 1 6 5 1
920 1 1 3 6 3
740 10 0 7 2 1
570 16 0 17 1 1
680 13 0 12 3 1
670 14 0 10 2 1
930 2 1 4 6 2
560 18 0 17 1 1
520 14 0 18 1 1
510 21 0 20 1 1
710 11 0 7 2 1
740 10 0 6 3 2
670 16 0 10 2 1

Following is the correlation matrix.

Selling Price (000s) Age (yrs) Close to Transport Location No. Bedrooms No. Bathrooms
Selling Price (000s) 1
Age (yrs) -0.934674404 1
Close to Transport 0.805954172 -0.790302 1
Location -0.910099748 0.804188 -0.629731296 1
No. Bedrooms 0.945762931 -0.933053 0.876556176 -0.799555 1
No. Bathrooms 0.668905203 -0.706306 0.547251139 -0.537283 0.724347265 1

1. Comment briefly on the correlation matrix, specifically on the
relationship between the dependent variable and each independent variable (i.e.
strength and direction). (2.5 marks)

Following is the regression output.


Model Cp k R Square Adj. R Square Std. Error Consider This Model?
X1 68.65023 2 0.873616 0.869258181 45.29094 No
X1X2 61.52526 3 0.885673 0.877506964 43.83891 No
X1X2X3 9.019199 4 0.957693 0.952991844 27.15755 No
X1X2X3X4 4.008372 5 0.966956 0.961872477 24.45813 Yes
X1X2X3X4X5 6 6 0.966967 0.960360651 24.93833 Yes
X1X2X3X5 10.09606 5 0.958912 0.952591243 27.27303 No
X1X2X4 40.44016 4 0.916176 0.906861975 38.22676 No
X1X2X4X5 40.59247 5 0.918617 0.906096708 38.38349 No
X1X2X5 63.35223 4 0.885902 0.873224221 44.5987 No
X1X3 16.86854 3 0.944679 0.940727086 30.49525 No
X1X3X4 2.430656 4 0.966398 0.962664642 24.20272 Yes
X1X3X4X5 4.354739 5 0.966498 0.961344409 24.62693 Yes
X1X3X5 18.09873 4 0.945696 0.939661959 30.76803 No
X1X4 38.9175 3 0.915545 0.909512572 37.67889 No
X1X4X5 39.60948 4 0.917273 0.908081511 37.97567 No
X1X5 70.53488 3 0.873769 0.864752124 46.06481 No
X2 238.2197 2 0.649562 0.637478063 75.41733 No
X2X3 36.96801 3 0.918121 0.912272437 37.09984 No
X2X3X4 6.983359 4 0.960383 0.95598071 26.28001 No
X2X3X4X5 8.940519 5 0.960439 0.954352974 26.76149 No
X2X3X5 27.28539 4 0.933557 0.926174816 34.03348 No
X2X4 53.13213 3 0.896763 0.889389046 41.65845 No
X2X4X5 54.07219 4 0.898164 0.886848459 42.13415 No
X2X5 184.1331 3 0.72367 0.703932102 68.15531 No
X3 102.9606 2 0.828282 0.822360226 52.79273 No
X3X4 5.168036 3 0.960139 0.957291382 25.88581 No
X3X4X5 7.163328 4 0.960145 0.955716493 26.35876 No
X3X5 70.51747 3 0.873792 0.864776773 46.06061 No
X4 52.86948 2 0.894468 0.890828471 41.3865 No
X4X5 54.45391 3 0.895017 0.887517819 42.00934 No
X5 391.1949 2 0.447434 0.428380177 94.70168 No

Best Subsets Analysis
X1 X2 X3 X4
Regression Statistics
Multiple R 0.983339284
R Square 0.966956147
Adjusted R Square 0.961872477
Standard Error 24.45813435
Observations 31

ANOVA
df SS MS F Significance F
Regression 4 455130.6622 113782.6656 190.2082944 7.59928E-19
Residual 26 15553.20873 598.2003358
Total 30 470683.871

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 794.2417773 51.69782063 15.36315782 1.47042E-14 687.9753155 900.5082392
Age (yrs) -5.272000598 2.318117857 -2.274259086 0.031440073 -10.03696322 -0.507037978
Close to Transport 13.81432018 20.84890413 0.662592148 0.513423151 -29.04124411 56.66988446
Location -10.12641224 1.602014132 -6.321050504 1.08116E-06 -13.41940161 -6.83342287
No. Bedrooms 28.30433106 10.48393484 2.699781283 0.012036811 6.754280238 49.85438187

Best Subsets Analysis


X1 X2 X3 X4 X5
Regression Statistics
Multiple R 0.983344909
R Square 0.966967209
Adjusted R Square 0.960360651
Standard Error 24.93832542
Observations 31

ANOVA
df SS MS F Significance F
Regression 5 455135.8691 91027.17382 146.3647461 1.12258E-17
Residual 25 15548.00186 621.9200746
Total 30 470683.871

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 794.8884879 53.18454302 14.94585537 5.69121E-14 685.3529479 904.4240279
Age (yrs) -5.301123068 2.384962783 -2.222727795 0.035509088 -10.21304242 -0.389203711
Close to Transport 13.22664968 22.20726745 0.595600053 0.55679648 -32.51004175 58.96334111
Location -10.09513665 1.668846203 -6.04917136 2.54941E-06 -13.53218734 -6.658085969
No. Bedrooms 28.72440573 11.63392687 2.469020654 0.020732672 4.763901608 52.68490986
No. Bathrooms -1.187898206 12.98249998 -0.091499958 0.927824574 -27.9258387 25.55004229

Best Subsets Analysis
X1 X3 X4
Regression Statistics
Multiple R 0.983055532
R Square 0.966398178
Adjusted R Square 0.962664642
Standard Error 24.20272072
Observations 31

ANOVA
df SS MS F Significance F
Regression 3 454868.0353 151622.6784 258.842619 5.35047E-20
Residual 27 15815.83564 585.7716903
Total 30 470683.871

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 782.089883 47.83064578 16.35122985 1.56719E-15 683.9495701 880.2301959
Age (yrs) -5.118964735 2.282495659 -2.242705135 0.033326064 -9.802255844 -0.435673626
Location -9.899578419 1.548664942 -6.392330677 7.57586E-07 -13.07717428 -6.721982559
No. Bedrooms 32.83787476 7.860479085 4.177592028 0.000276237 16.7095147 48.96623482

Best Subsets Analysis


X1 X3 X4 X5
Regression Statistics
Multiple R 0.98310655
R Square 0.966498488
Adjusted R Square 0.961344409
Standard Error 24.6269251
Observations 31

ANOVA
df SS MS F Significance F
Regression 4 454915.2495 113728.8124 187.5210927 9.08323E-19
Residual 26 15768.62143 606.4854397
Total 30 470683.871

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 785.4442184 50.13183119 15.66757487 9.26233E-15 682.396696 888.4917407
Age (yrs) -5.221678627 2.351495903 -2.220577387 0.035302859 -10.05525085 -0.388106406
Location -9.837240136 1.591568653 -6.180845621 1.5468E-06 -13.1087585 -6.565721773
No. Bedrooms 33.49283752 8.335609579 4.018042976 0.000446012 16.35873541 50.62693962
No. Bathrooms -3.424202658 12.27250742 -0.279014104 0.782441354 -28.65071947 21.80231416

2. On the basis of the results above, which Regression Model is a more appropriate
predictor of Selling Price and why (refer to output)? (2 marks)

3. Referring to the regression output in the model you have selected from
question 2 (at the 0.05 significance level) determine whether each explanatory
variable makes a significant contribution to the regression model. (2.5 marks)

4. Interpret the adjusted coefficient of multiple determination in the model you
have selected from question 2. (2 marks)

5. State the regression equation that should be used to predict Selling Price (using
your selected model). (2 marks)

6. Using the equation in question 5 estimate the selling price that an apartment
owner can expect given that they have an apartment which is 16 years old, is
close to transport, 8 kilometres from the city, has 2 bedrooms and 1 bathroom.
Show all workings. (2 marks)

7. Using the equation in question 5 estimate the selling price that an apartment
owner can expect given that they have an apartment which is 24 years old, is
not close to transport, 18 kilometres from the city, has 3 bedrooms and 1
bathroom. Show all workings. (2 marks)

8. Interpret the value of the ‘age’ coefficient in your chosen model. (2 marks)

9. Given that the adjusted coefficient of multiple determination is not 100%, there is
still some of the variability in Selling Price left unexplained. List 2 additional
variables that have not previously been considered in this example that may help
to predict Selling Price of apartments. (1 mark)

Case Study H (10 marks).
Please attempt to answer ALL questions.

The managers of a newly-established holiday resort want to ensure that their guests
are receiving excellent service compared with their competitors. As they had spent
considerable time hiring the ‘right staff for the job’ they felt that their resort patrons
would be more satisfied, and therefore well above the industry average of 7.25 (out
of 10). To determine whether this is true 77 people who had recently stayed in their
resort, were randomly selected. The sample data showed the average rating as 7.37
with a standard deviation of .76. Work through the following questions to determine
whether this difference is significant.

1. State the null hypothesis in words AND statistical notation. (2 marks)

2. State the alternate hypothesis in words AND statistical notation. (2 marks)

3. Calculate the appropriate test statistic. (2 marks)

4. What is your decision in terms of this example? (2 marks)

5. What is your conclusion in relation to the original question? (2 marks)

Formulae Page
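Standardizing values for:

The entire population

$$Z = \frac{x - \mu}{\sigma}$$

The sampling distribution

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where
x = an individual value
σ = population standard deviation
s = sample standard deviation
x̄ = sample mean
μ = population mean
n = sample size

Calculating the confidence interval for:

σ unknown

$$\bar{x} \pm t \, \frac{s}{\sqrt{n}}$$

σ known

$$\bar{x} \pm Z \, \frac{\sigma}{\sqrt{n}}$$

Hypothesis testing for the mean

Z test of hypothesis for the mean (σ known)

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

t test of hypothesis for the mean (σ unknown)

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$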

NOTE: You may also need to refer to table E.2 the standardized normal distribution
and table E.3 the critical values of t from your textbook.
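
The formulae on this page can be sketched in Python as a quick self-check when revising. This is a minimal illustration rather than part of the test materials: the function names (`z_score`, `z_for_sample_mean`, `t_statistic`, `confidence_interval`) are hypothetical, the inputs in the printed examples are made-up numbers, and the critical value must still be read from the tables.

```python
from math import sqrt


def z_score(x, mu, sigma):
    """Standardize a single value x from a population with mean mu and sd sigma."""
    return (x - mu) / sigma


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Z statistic for a sample mean when sigma is known."""
    return (x_bar - mu) / (sigma / sqrt(n))


def t_statistic(x_bar, mu, s, n):
    """t statistic for a sample mean when sigma is unknown (s is the sample sd)."""
    return (x_bar - mu) / (s / sqrt(n))


def confidence_interval(x_bar, sd, n, critical_value):
    """Return (lower, upper) for x_bar +/- critical_value * sd / sqrt(n).

    Pass s with a t critical value (sigma unknown) or sigma with a Z
    critical value (sigma known); the critical value comes from the tables.
    """
    margin = critical_value * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Illustrative inputs only (hypothetical numbers, not taken from a question).
print(z_score(110, 100, 20))                 # 0.5
print(z_for_sample_mean(52, 50, 10, 25))     # 1.0
print(t_statistic(52, 50, 10, 25))           # 1.0
print(confidence_interval(50, 10, 25, 2.0))  # (46.0, 54.0)
```

The Z and t statistics share the same arithmetic; they differ only in which standard deviation is used and which table supplies the critical value.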

SOLUTIONS
Part A: Multiple Choice

1. a 2. c 3. a 4. c

5. b 6. d 7. e 8. d

9. a 10. c 11. d 12. a

13. e 14. b 15. a 16. a

17. b 18. c 19. b 20. e
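
The numerical answers above can be cross-checked with a short Python sketch. It relies on two assumptions not stated in the questions themselves: waiting times in A7 and A8 are treated as normally distributed, and A10 uses the large-sample critical value z = 1.96. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# A7: P(wait > 3.5) under an assumed normal model with mean 4 and sd 1.
a7 = 1 - NormalDist(mu=4, sigma=1).cdf(3.5)

# A8: the mean of n = 100 waits has standard error 1 / sqrt(100) = 0.1.
a8 = NormalDist(mu=4, sigma=1 / sqrt(100)).cdf(4.2)

# A10: 95% interval with the assumed large-sample critical value z = 1.96.
margin = 1.96 * 6.24 / sqrt(1000)
a10 = (10.42 - margin, 10.42 + margin)

# A14: t statistic from the golf summary figures (mean 14.5, s 4.844, n 16).
a14 = (14.5 - 12) / (4.844 / sqrt(16))

# A19: prediction from the two-variable regression model given in A18.
a19 = 472 - 4 * 19 + 3.5 * 220

print(round(a7, 4), round(a8, 4))
print(round(a10[0], 2), round(a10[1], 2))
print(round(a14, 2), a19)
```

The printed values agree with answers e, d, c, b and b for A7, A8, A10, A14 and A19 respectively.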

Part B:
Case Study A: Ages of football players
1. This is an example of cluster sampling, as the 16 football teams together
would be considered the population and each team a cluster. Thus, Teams A
and B were randomly selected (as clusters), and all of the players from these
two teams formed the sample.

2. Refer to table and comments below


Team A Team B

Range 14 > 12
Standard Deviation 3.54 > 3.26

Player ages for Team A are more variable than for Team B, as shown by
the larger range and standard deviation.
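
The two spread statistics can be recomputed from the raw ages given in the case study. The sketch below uses Python's standard `statistics` module; the helper name `spread` is hypothetical.

```python
from statistics import stdev

team_a = [25, 25, 28, 30, 25, 25, 24, 25, 22, 21, 19,
          20, 22, 33, 28, 25, 26, 30, 28, 22, 24, 21]
team_b = [23, 23, 26, 27, 24, 20, 21, 23, 25, 21, 22,
          29, 28, 29, 28, 27, 23, 23, 26, 32, 23, 20]


def spread(ages):
    """Return (range, sample standard deviation) for a list of ages."""
    return max(ages) - min(ages), stdev(ages)


for name, ages in (("Team A", team_a), ("Team B", team_b)):
    age_range, sd = spread(ages)
    print(f"{name}: range = {age_range}, standard deviation = {sd:.3f}")
```

`stdev` uses the n − 1 divisor, which matches the "Sample Variance" row of the descriptive statistics table.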

3. This question may be correctly answered using either of the following:


Descriptive statistics approach
Team A- since the mean (24.91) ≈ the median (25) = mode (25) the
distribution of age of players for team A is approximately symmetric (close to normal).
Team B- since the mean (24.68) > median (23.5) > mode (23) the distribution
of age of players for team B is considered to be right-skewed.
OR
Box-and-whisker plot approach
Team A- since the tails of the box-and-whisker plot are approximately the
same length, and the median cuts the box approximately in half (between Q1
and Q3) the distribution of age of players for team A is approximately normal.
Team B- since the left tail (from the minimum score to Q1) is shorter than the
right tail (from Q3 to the maximum score) and the median cuts the box closer
to Q1 than Q3 the distribution of age of players for team B is considered to be
right-skewed.

Case Study B: Lucky Pizzas

1. Z = (x - μ) / σ
Z = (33 - 24) / 6
Z = 1.5; from the tables, p = 0.9332. As we want the probability for x > 33:
1 - 0.9332 = 0.0668

2. Z = (x - μ) / σ

Z = (15 - 24) / 6

Z = -1.5, so for x < 15, p = 0.0668
For x < 33, p = 0.9332
P(15 < x < 33) = 0.9332 - 0.0668 = 0.8664

For 200 orders we need to multiply this probability by 200:

0.8664 x 200 = 173.28
So we would expect about 173 orders to take between 15 and 33 minutes.

3. (Please note that this is now a sampling distribution question, not a normal
distribution question.)

Z = (x̄ - μ) / (σ/√n)

Z = (25 - 24) / (6/√100)

Z = 1.667; from the tables, p = 0.9522. As we want the probability that x̄ > 25:
1 - 0.9522 = 0.0478

4. Since p=0.0478 < α=0.05 we reject H0. Thus there is enough evidence to
conclude that the average delivery time is now significantly greater than 24
minutes.
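
The table look-ups in this case study can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library. The variable names are illustrative only, and the inputs (μ = 24, σ = 6, n = 100, 200 orders) are taken from the working above.

```python
from math import sqrt
from statistics import NormalDist

# Delivery time distribution used in the working above (minutes).
mu, sigma = 24, 6
delivery = NormalDist(mu, sigma)

# Question 1: probability a single delivery takes more than 33 minutes.
p_over_33 = 1 - delivery.cdf(33)

# Question 2: probability a delivery takes between 15 and 33 minutes,
# and the expected number of such deliveries out of 200 orders.
p_between = delivery.cdf(33) - delivery.cdf(15)
expected_orders = 200 * p_between

# Question 3: sampling distribution of the mean for n = 100,
# whose standard error is sigma / sqrt(n).
n = 100
sample_mean = NormalDist(mu, sigma / sqrt(n))
p_mean_over_25 = 1 - sample_mean.cdf(25)

print(round(p_over_33, 4), round(p_between, 4),
      round(expected_orders, 2), round(p_mean_over_25, 4))
```

The results should agree with the table-based answers to about four decimal places; small differences can arise because the tables round Z to two decimal places.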

Case Study C: Drycleaner, Doughnut Shop, Convenience Store

1. Doughnut shop
H0: μAFTER ≥ μBEFORE
The null hypothesis states that the average sales for the doughnut shop after
the removal of the parking bays will be greater than or equal to the average
sales before the removal of the parking bays.

2. Doughnut shop
H1: μAFTER < μBEFORE
The alternative hypothesis states that the average sales for the doughnut
shop after the removal of the parking bays will be less than the average sales
before the removal of the parking bays.

3. This question may be correctly answered using either the p-value method or
critical value method. I have included both below.
Drycleaner
Since p(one-tail)=0.178 > α=0.05 do not reject H0
Since tstat = -0.957 > -tcrit = -1.77 do not reject H0

Doughnut shop
Since p(one-tail)=0.003 < α=0.05 reject H0
Since tstat = -3.24 < -tcrit = -1.77 reject H0

Convenience store
Since p(one-tail)=0.0000028 < α=0.05 reject H0
Since tstat = -7.34 < -tcrit = -1.77 reject H0

4. We can conclude the following:

There is enough evidence to conclude that the average sales for the
doughnut shop and convenience store after the removal of parking bays are
less than the average sales before the removal of the parking bays. However,
there is not enough evidence to conclude this for the drycleaner.
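
The two decision rules used in question 3 (p-value method and critical value method) can be sketched in a few lines of Python. This is an illustration only: the function name `lower_tail_decision` is hypothetical, and the p-values, t statistics and critical value are copied from the solution above rather than recomputed from the sales data.

```python
ALPHA = 0.05
T_CRIT = 1.77  # one-tail critical value of t quoted in the solution

# (one-tail p-value, t statistic) for each business, from the solution above.
results = {
    "Drycleaner": (0.178, -0.957),
    "Doughnut shop": (0.003, -3.24),
    "Convenience store": (0.0000028, -7.34),
}

def lower_tail_decision(p_one_tail, t_stat, alpha=ALPHA, t_crit=T_CRIT):
    """Apply both decision rules for a lower-tail test (H1: after < before)."""
    reject_by_p = p_one_tail < alpha   # p-value method
    reject_by_t = t_stat < -t_crit     # critical value method
    # The two methods should always lead to the same decision.
    assert reject_by_p == reject_by_t
    return "reject H0" if reject_by_p else "do not reject H0"

decisions = {shop: lower_tail_decision(p, t) for shop, (p, t) in results.items()}
for shop, decision in decisions.items():
    print(shop, "->", decision)
```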

Case Study D: Big Bucks car yard

1. H0: μHourly rate = μCommission only
The null hypothesis states that the average sales (in $000s) made by staff
paid on an hourly rate are equal to the average sales (in $000s) made by staff
paid on commission only.

2. H1: μHourly rate ≠ μCommission only
The alternate hypothesis states that the average sales (in $000s) made by
staff paid on an hourly rate are not equal to the average sales (in $000s)
made by staff paid on commission only.

3. This question may be correctly answered using either the p-value method or
critical value method. I have included both below.
Since p(two-tail) = 0.018 < α = 0.05, reject H0
Since |tstat| = 2.549 > tcrit = 2.07, reject H0

4. There is enough evidence to conclude that there is a difference between the
average sales made by staff paid on an hourly rate and staff paid on
commission only.

5. A type 1 error occurs when the null hypothesis is rejected when in actual fact
it is true and should not be rejected. In this case a type 1 error would have
occurred if the results led us to conclude that there is a difference in average
sales between staff paid on an hourly rate and staff paid on commission only,
when in actual fact this is not true and thus there is no real difference
between the two motivational methods.

Case Study E: Shampoo

1. n = 100, df = 100 - 1 = 99
x̄ = 199
s = 0.90 (s = sample standard deviation)
95% confidence interval

Using t
t for df = 99, α = 0.05 is 1.9842

x̄ ± [t (s/√n)]

199 ± [1.9842 (0.90/√100)]

= (198.82, 199.18)

Using Z
Z = 1.96

x̄ ± [Z (s/√n)]

199 ± [1.96 (0.90/√100)]

= (198.82, 199.18)

2. We are 95% confident that the average amount of shampoo for all shampoo
bottles filled is between 198.82mls and 199.18mls.

3. As 200mls does not fall within the confidence interval, this would suggest
that the bottling plant is under-filling the shampoo bottles and thus needs to
make adjustments to the system or change the advertised amount on the bottle
to 199mls.
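
The confidence interval arithmetic above can be reproduced with a short Python sketch. The helper name `confidence_interval` is hypothetical. The t critical value is the one quoted in the solution (the standard library has no t distribution), while the Z critical value is computed with `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

# Sample figures from the solution above.
n, x_bar, s = 100, 199.0, 0.90
std_error = s / sqrt(n)

t_crit = 1.9842                        # from the t table, df = 99 (quoted above)
z_crit = NormalDist().inv_cdf(0.975)   # two-tail 95% critical value, about 1.96

def confidence_interval(centre, critical_value, std_error):
    """Return (lower, upper) limits: centre plus or minus critical value times SE."""
    margin = critical_value * std_error
    return (centre - margin, centre + margin)

ci_t = confidence_interval(x_bar, t_crit, std_error)
ci_z = confidence_interval(x_bar, z_crit, std_error)

# Question 3: does the advertised 200 mls fall inside the interval?
advertised = 200
advertised_inside = ci_t[0] <= advertised <= ci_t[1]

print([round(v, 2) for v in ci_t], [round(v, 2) for v in ci_z], advertised_inside)
```

With n = 100 the t and Z critical values are so close that both intervals round to the same limits, which is why the solution accepts either approach.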

Case Study F: Brokenleggen

1. The line of best fit does not fit the data well, which is reflected in a low
r² value, as the data is highly seasonal. This can be seen in the peaks and
troughs in the data within one year.

2. The June seasonal index is 1.25 meaning that unemployment in the June
quarter is 25% higher than the average quarterly level.

3. The December seasonal index is 0.73 meaning that unemployment in the
December quarter is 27% lower than the average quarterly level.

4. September 1998 - coded quarter (X) = 10

Y = 164.46 + 1.0163X
Y = 164.46 + 1.0163*(10)
= 174.62
Predicted unemployment for September 1998 = 174.6

Residual = actual Y - predicted Y
= 241 - 174.6
= 66.4 (please note that this is a large residual)

5. De-seasonalised unemployment for September 1998:
TSCI / S = TCI (de-seasonalised data)
= 241/1.36 = 177.21

This de-seasonalised figure for September 1998 (177.21) is very similar to the
trend estimate of 174.6.

6. December 2001 - coded quarter (X) = 23

December seasonal index = 0.73
Y = TSCI (assume C = 1, I = 1)
Y = TS
Y = [164.46 + 1.0163X]*S
Y = [164.46 + 1.0163*(23)]*0.73
= 137.12

7. Examples of irregular effects that might influence unemployment in
Brokenleggen include: severe unexpected weather (e.g. blizzard, flooding,
extra-warm summer, extra-cold summer), a one-off promotion by competitors,
war, and unexpected travel problems which affect tourists visiting the town.
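
The trend, residual, de-seasonalising and forecasting steps in questions 4 to 6 can be sketched in Python as follows. The function and variable names are illustrative only. The trend equation, seasonal indices and actual value are those quoted in the solution above, and C and I are assumed to equal 1 for the forecast, as in question 6.

```python
# Trend line and seasonal indices quoted in the solution above.
INTERCEPT, SLOPE = 164.46, 1.0163
SEASONAL_INDEX = {"September": 1.36, "December": 0.73}

def trend(coded_quarter):
    """Trend estimate (T) for a coded quarter X."""
    return INTERCEPT + SLOPE * coded_quarter

# Question 4: trend estimate and residual for September 1998 (X = 10).
actual_sep_1998 = 241
trend_sep_1998 = trend(10)
residual = actual_sep_1998 - trend_sep_1998

# Question 5: de-seasonalised value, TSCI / S = TCI.
deseasonalised = actual_sep_1998 / SEASONAL_INDEX["September"]

# Question 6: forecast for December 2001 (X = 23), Y = T * S with C = I = 1.
forecast_dec_2001 = trend(23) * SEASONAL_INDEX["December"]

print(round(trend_sep_1998, 2), round(residual, 1),
      round(deseasonalised, 2), round(forecast_dec_2001, 2))
```

Please note that the residual here is computed from the unrounded trend estimate, so it can differ slightly in later decimal places from the hand calculation, which rounds the trend to 174.6 first.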

Case Study G: Selling Price for Apartments

1. Age and location are negatively correlated with selling price. Close to
transport, number of bedrooms and number of bathrooms are positively
correlated with selling price. All independent variables are strongly correlated
with selling price.

2. The first regression output table lists all possible models (containing all
possible combinations of the variables). It is evident that four of these models
should be considered useful. After considering the output for each of these
four models, Model 3 (X1, X3, X4) is the most appropriate predictor of selling
price as it has the highest adjusted r², the lowest standard error, and all of
its independent variables make a significant contribution to the model.

3. Using Model 3:

Independent variable   P value       Significant at alpha = 0.05?
Age                    0.033         As p < alpha (0.05), reject H0; conclude that age is
                                     making a significant contribution to the regression
                                     model.
Location               0.000000757   As p < alpha (0.05), reject H0; conclude that location
                                     is making a significant contribution to the regression
                                     model.
Number of bedrooms     0.00027       As p < alpha (0.05), reject H0; conclude that number of
                                     bedrooms is making a significant contribution to the
                                     regression model.

4. The (adjusted) coefficient of multiple determination for Model 3 is 0.96.
Thus, 96% of the variation in selling price can be explained by the combined
variation of age, location and number of bedrooms.

5. Selling Price = 782.09 - 5.12*(age) - 9.90*(location) + 32.8*(number of
bedrooms)

6. Selling Price = 782.09 - 5.12*(age) - 9.90*(location) + 32.8*(number of
bedrooms)
Selling Price = 782.09 - 5.12*(16) - 9.90*(8) + 32.8*(2)
Selling Price = $686.57 thousand (approximately $686,570).

7. It is not possible to estimate the selling price of an apartment that is 24
years old as this is outside the range of the data. Prediction using regression
analysis only allows for interpolation (prediction within the range of the data),
not extrapolation (prediction beyond the data range).

8. The age coefficient is -5.12. This can be interpreted as follows: as the age
of an apartment increases by one year, the expected selling price declines on
average by $5.12 (thousand), or $5120, holding constant all other variables.

9. Answers could include air-conditioning present, river/ocean views, close to
shops/schools, garage, size of floor space (m²), etc.
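
The Model 3 prediction in question 6 and the coefficient interpretation in question 8 can be checked with a small Python sketch. The function name `predicted_price` is hypothetical, and the coefficients are the rounded values shown in the regression equation above, so results are in $000s.

```python
def predicted_price(age, location, bedrooms):
    """Model 3 selling price in $000s, using the rounded coefficients above."""
    return 782.09 - 5.12 * age - 9.90 * location + 32.8 * bedrooms

# Question 6: age = 16, location = 8, number of bedrooms = 2.
price = predicted_price(age=16, location=8, bedrooms=2)

# Question 8: effect of one extra year of age, holding other variables constant.
age_effect = predicted_price(17, 8, 2) - predicted_price(16, 8, 2)

print(round(price, 2), round(age_effect, 2))
```

As noted in question 7, a function like this should only be evaluated for values inside the range of the original data.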

Case Study H: Customer Satisfaction at Holiday Resort

1. H0: μ ≤ 7.25
The null hypothesis states that the average customer satisfaction rating is less
than or equal to 7.25.

2. H1: μ > 7.25
The alternate hypothesis states that the average customer satisfaction rating
is greater than 7.25.

3. Zstat = (7.37 - 7.25) / (0.76/√77)
= 1.39

4. Since Zstat = 1.39 < Zcrit = 1.645, do not reject H0.

5. There is not enough evidence to conclude that the average customer
satisfaction rating is greater than 7.25.
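
The test statistic and decision above can be verified with a short Python sketch. The variable names are illustrative. The inputs are those used in the working (sample mean 7.37, hypothesised mean 7.25, standard deviation 0.76, n = 77), and the upper-tail p-value is an extra check not required by the question.

```python
from math import sqrt
from statistics import NormalDist

# Figures used in the working above.
x_bar, mu_0, s, n = 7.37, 7.25, 0.76, 77
alpha = 0.05

# Test statistic: Z = (x_bar - mu_0) / (s / sqrt(n)).
z_stat = (x_bar - mu_0) / (s / sqrt(n))

# Upper-tail critical value for alpha = 0.05 (about 1.645).
z_crit = NormalDist().inv_cdf(1 - alpha)

# Critical value method and p-value method for H1: mu > 7.25.
reject_h0 = z_stat > z_crit
p_value = 1 - NormalDist().cdf(z_stat)

print(round(z_stat, 2), round(z_crit, 3), reject_h0)
```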
