
Stats 250 W15 Exam 2 SOLUTIONS

1. Protecting the Environment Part 1 ~ Some people consider themselves “green” meaning they are supportive
when it comes to environmental issues. But how do people really act? American per capita use of energy is
roughly double that of Western Europeans. Would people be willing to pay more for gas to fund environmental
projects? A research team at Michigan State University selected a random sample of 1000 Michigan adults and
asked each if they would support an additional 2% tax on gasoline in order to fund various environmental
projects. For this sample, 450 of the 1000 adults reported a willingness to do so. Create a 90% confidence
interval to estimate the population proportion for all Michigan adults who would support such a gasoline tax.
[3] Note that a general (not conservative) interval was requested here and z* of 1.64 or 1.65 is ok.
\[ \hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\rightarrow\; 0.45 \pm 1.645\sqrt{\frac{(0.45)(1-0.45)}{1000}} \]
\[ 0.45 \pm 1.645(0.0157) \;\rightarrow\; 0.45 \pm 0.0259 \]
Final Answer: ____0.4241______ to ____0.4759______
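
As a quick check of the arithmetic above, here is a minimal Python sketch of the large-sample (non-conservative) interval. The function name `proportion_ci` is made up for illustration, and the multiplier comes from the standard normal quantile rather than a rounded table value, so it is about 1.645.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample z interval for a population proportion (illustrative helper)."""
    p_hat = successes / n
    # Two-sided multiplier z*, e.g. about 1.645 for 90% confidence
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(450, 1000, 0.90)
print(f"{low:.4f} to {high:.4f}")  # 0.4241 to 0.4759
```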

2. Protecting the Environment Part 2 ~ A researcher at the University of Minnesota wanted to conduct a similar
study as did Michigan State University to estimate the population proportion of Minnesota adults who would
support an additional 2% tax on gasoline to fund environmental projects.
a. The researcher would like to have a 95% confidence interval estimate with a width of (at most) 8%.
Determine the minimum sample size that would be required.
[3]
Note that a width of 0.08 is the same as 2m (with a margin of error of 0.04).
z* = 1.96, so
\[ n = \left(\frac{z^*}{2m}\right)^2 = \left(\frac{1.96}{0.08}\right)^2 = (24.5)^2 = 600.25 \]
and ALWAYS ROUND UP.
If use z* of 2, then n = 625 is the final answer.

Final answer: _____ 601 ________
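
The same calculation can be sketched in Python. This assumes the conservative margin of error z*/(2 sqrt(n)) used in the solution; the function name `conservative_sample_size` is hypothetical.

```python
from math import ceil

def conservative_sample_size(z_star, width):
    """Smallest n for which the conservative interval has total width <= width."""
    m = width / 2                     # margin of error is half the width
    n_exact = (z_star / (2 * m)) ** 2
    return ceil(n_exact)              # always round up

n_196 = conservative_sample_size(1.96, 0.08)
n_2 = conservative_sample_size(2, 0.08)
print(n_196, n_2)  # 601 625
```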


The resulting confidence interval from the Minnesota study was reported as (0.40, 0.48).
For part (b) and (c) indicate if the statement is correct or incorrect.

[2] b. We are 95% confident that the population proportion of all Minnesota adults who would support an
additional 2% tax on gasoline to fund environmental projects is in the interval 0.40 to 0.48.

Correct     Incorrect     (Answer: Correct)

[2] c. Based on the same survey results, the width for a 90% confidence interval for the population proportion all
Minnesota adults who would support an additional 2% tax on gasoline to fund environmental projects
would be more than 8%.

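Correct     Incorrect     (Answer: Incorrect; a 90% interval from the same data is narrower than the 95% interval, whose width is 8%.)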

[2] d. Consider now using these same Minnesota survey results to test, at a significance level of 0.05, the null
hypothesis H0: p = 0.50 against the majority hypothesis of Ha: p > 0.50.

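Then the resulting p-value will be (circle one)   greater than   /   less than   0.05,

so that H0 (circle one)   would be   /   would not be   rejected.

Answer: greater than; would not be. The interval (0.40, 0.48) lies entirely below 0.50, so the sample gives no support for p > 0.50.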

Note: if selected “LESS THAN” as answer at start of sentence, then must circle “WOULD BE”
to earn 1 of 2 points; if “LESS THAN” and “WOULD NOT BE” both points are lost as not consistent.

3. Poor Roads in Cleveland ~ A recent UM study reported that 25% of Michigan’s top elected or appointed state
government officials rated the roads in their area as poor. The Director of Transportation and Roadways in
Cleveland read this article and decides to repeat the study in Cleveland. She wishes to assess if the population
proportion of all Cleveland adults rating their roads as poor is less than 25%. She approves the survey design
which will take a random sample 300 Cleveland adults and use the results to test H0: p = 0.25 vs Ha: p < 0.25 at
the 10% significance level.
a. Given budgetary issues in Cleveland, the director recommends that the survey be piloted by initially taking
just a random sample of 20 Cleveland adults and analyzing the results. Then the administration could
re-evaluate whether it is reasonable to follow with a larger scale study. What distribution would be used
for this pilot analysis in order to determine a p-value? Be complete.
[2] Since the sample size is too small (20(0.25) = 5 which is less than 10), a small sample binomial test would
need to be conducted. The test statistic would be the count statistic X and the distribution to use to find
the p-value is the Bin(n=20, p=0.25).

Final answer: _____ Binomial (n = 20, p = 0.25) _________


b. Based on the p-value from the pilot survey, the decision was to reject H0. What type of mistake could have
been made? Circle one.
[1] Type I error     Type II error     Both Type I and Type II error     Neither as the decision was already made
Answer: Type I error. H0 was rejected, so the only possible mistake is rejecting a true H0.

c. Based on the pilot study, the decision was made to follow up with the larger scale study. Out of the
300 Cleveland adults that were contacted, 66 reported their roads are in “poor” shape. Report the test
statistic with its symbol, provide the corresponding p-value and a well-labelled sketch showing that p-value.
[6]
\[ z = \frac{0.22 - 0.25}{\sqrt{\frac{(0.25)(0.75)}{300}}} = \frac{-0.03}{0.025} = -1.2 \]
and the p-value will be the area to the LEFT of -1.2 under the N(0,1) model.

[Sketch: N(0,1) curve over the axis "Z (values)", with -1.2 and 0 marked; the shaded area to the left of -1.2 is the p-value.]

Test Statistic: __ z __ (symbol) = ____ -1.2 ____     p-value = ____ 0.1151 ____
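
The test statistic and lower-tail p-value for part (c) can be reproduced with a short Python sketch. The helper name `one_proportion_z_lower` is an illustrative assumption; it uses the null value p0 in the standard error, as the solution does.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_lower(successes, n, p0):
    """z statistic and lower-tail p-value for H0: p = p0 vs Ha: p < p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # null standard error
    p_value = NormalDist().cdf(z)                # area to the left of z
    return z, p_value

z, p_value = one_proportion_z_lower(66, 300, 0.25)
print(round(z, 2), round(p_value, 4))  # -1.2 0.1151
```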

d. At the 5% level of significance, provide the appropriate conclusion in context.


[2]
There is not sufficient evidence to conclude that the population proportion of all Cleveland adults
who rate their roads as poor is less than 25%.

4. Recruiting T cells ~ It has been suggested that cytotoxic T lymphocytes (T cells) participate in controlling tumor
growth and that they can be harnessed to use the body’s immune system to treat cancer. One study
investigated the use of a T cell-engaging antibody, blinatumomab, to recruit T cells to control tumor growth.
The data below are T cell counts (1000 per microliter) at baseline (beginning of the study) and after 20 days on
blinatumomab for six subjects in the study. The response variable is the difference (after 20 days minus
baseline) in T cell counts. Some corresponding SPSS output based on these data is also provided.
Subject                                    1      2      3      4      5      6
Baseline                                0.04   0.02   0.00   0.02   0.38   0.33
After 20 days                           0.28   0.47   1.30   0.25   1.22   0.44
Difference (after 20 days – baseline)   0.24   0.45   1.30   0.23   0.84   0.11

One-Sample Test (Test Value = 0)
                                                             95% Confidence Interval of the Difference
              t       df   Sig. (2-tailed)   Mean Difference   Lower    Upper
Difference    2.918   5    .033              .5383             .0641    1.0126
a. Briefly explain why this is a paired design.
[1] Each of the 6 subjects were measured twice OR we have two measurements on each subject.
b. The treatment will be considered successful if the population mean difference in T cell counts is higher after
20 days on blinatumomab over the baseline. For this situation, specify the null and alternative hypotheses.
[3]
H0: _____ µd = 0 _________ Ha: _____ µd > 0__________
c. Use the provided SPSS output to report the test statistic and the exact p-value that corresponds with your
hypotheses in part (b).
[2]
Test statistic: _____ 2.918 ________ p-value: ____ 0.033/2 = 0.0165 _______
d. Which of the following is the appropriate statistical decision and conclusion at the 5% significance level?
[2] Circle one.
• Reject H0; there is insufficient evidence to say the population mean difference in T cell counts
is higher after 20 days on blinatumomab over the baseline.
• Reject H0; there is sufficient evidence to say the population mean difference in T cell counts
is higher after 20 days on blinatumomab over the baseline.   (correct choice: the p-value is below 0.05)
• Fail to reject H0; there is insufficient evidence to say the population mean difference in T cell counts
is higher after 20 days on blinatumomab over the baseline.
• Fail to reject H0; there is sufficient evidence to say the population mean difference in T cell counts
is higher after 20 days on blinatumomab over the baseline.

e. The researcher remembers there are “assumptions” for any test, to ensure its integrity. She believes the
subjects are representative of the population and can be treated as a random sample; but since her sample
size was small, there is an additional “assumption.” Clearly state that assumption in context.

[2] The assumption: The population of differences in T cells (after 20 days less baseline) should be normal.
OR The differences in T cell counts (after less base) for the population of all subjects should follow
a normal model.

5. Stepping Toward Health ~ A new fitness program has been in place at a large company for a few months. As
part of the program, each employee was provided a fitness tracker to record the number of steps she takes
each day. The organizer of the program would like to estimate the mean number of steps taken daily for all
employees. A random sample of 35 employees was selected and the number of steps taken yesterday was
recorded resulting in a mean of 5765 steps and a standard deviation 800 steps.

a. A histogram of the 35 observations indicates that the model for the number of daily steps in the population
may not be normal. However, a confidence interval to estimate the population mean can still be
constructed because:

[2] Circle one.


• The sample size is large enough to assume that the model for the number of steps taken daily
will be approximately normal.
• The sample size is large enough to assume that the model for the sample mean number of steps
taken daily will be approximately normal.   (correct choice)

This is because of the CLT!


b. Calculate a 95% confidence interval to estimate the mean number of steps taken daily for all employees in
the program. Show all work, report the interval values out to 1 decimal place, and include your units.
[4]

\[ \bar{x} \pm t^*\,[\text{s.e.}(\bar{x})] \;\rightarrow\; 5765 \pm 2.04\left(\frac{800}{\sqrt{35}}\right) \]
\[ \rightarrow\; 5765 \pm 2.04(135.2247) \;\rightarrow\; 5765 \pm 275.8584 \]

Or 5489.1 to 6040.9

Final Answer: ____ 5489.1 steps ____ to ____ 6040.9 steps _____
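
A minimal Python sketch of this interval follows. It takes the multiplier t* = 2.04 from the solution as given (an assumption here: it is a table value, not computed by the code), and the function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(mean, sd, n, t_star):
    """One-sample t interval for a population mean, with t* supplied by the caller."""
    se = sd / sqrt(n)            # standard error of the sample mean
    margin = t_star * se
    return mean - margin, mean + margin

low, high = t_interval(5765, 800, 35, t_star=2.04)
print(f"{low:.1f} to {high:.1f} steps")  # 5489.1 to 6040.9 steps
```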

c. Consider the following incorrect statement regarding the 95% confidence level used.

Briefly edit the statement to make it correct so it can be used in the report summary.
Do not rewrite the sentence.
[2]

If we were to repeat this study many times, we would expect 95% of the resulting intervals
to contain the sample [replace with: population] mean number of steps taken daily.

6. Is the dress white and gold or blue and black?—On February 26, 2015 a picture of a cocktail dress originally
uploaded to the blog Tumblr swept the Internet and managed to divide the population over a simple question:
What color is the dress? Some viewers saw gold and white while others insisted the dress is blue and black. It
made people stop and ask, ‘What exactly is going on with this image?’ Everyone had theories about why people
might see the dress differently. A random sample of adults was collected and looked at how the black/blue
percentage changed with respect to sex, where vision differences are well-known.
“What color is the dress?”
Group          Blue/Black   White/Gold   Total
1 = Males         145          105        250
2 = Females       150          150        300
a. Let p1 represent the population proportion of all male adults who see the dress as blue/black and let
p2 represent the population proportion of all female adults who see the dress as blue/black. The study
conjectured that males are more likely to see the dress as blue/black than females. State the appropriate
hypotheses to be tested:
[2]
H0: ____ p1 = p2 (p1 – p2 = 0) _____ Ha: ____ p1 > p2 (p1 – p2 > 0) _____
b. In order to compute the test statistic we would need to make use of an overall estimate of the common
proportion. Provide that overall estimate (out to 4 decimal places) and include the appropriate symbol.
[2]
\[ \hat{p} = \frac{145 + 150}{250 + 300} = \frac{295}{550} = 0.5364 \]

Final answer: __ \hat{p} __ (symbol) = __ 0.5364 __
c. One of the conditions for the test to be valid involves having two independent random samples, which is
reasonable from the design of the study. Validate the remaining assumption.
[2]
We need that each sample would be EXPECTED (under the null hypothesis) to have at least 10 Blue/Black
responses and at least 10 White/Gold responses. Using the common estimate from part (b) we have:
250(0.5364) = 134.1 and 250(1 – 0.5364) = 250(0.4636) = 115.9
300(0.5364) = 160.92 and 300(1 – 0.5364) = 300(0.4636) = 139.08
All four expected counts are at least 10, so the condition is met.
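
The pooled estimate and the expected-count check can be sketched in Python as below. The names are illustrative, and the code uses the unrounded pooled estimate, so the expected counts differ slightly in the second decimal from the hand calculation with 0.5364.

```python
def pooled_proportion(x1, n1, x2, n2):
    """Overall estimate of the common proportion under H0: p1 = p2."""
    return (x1 + x2) / (n1 + n2)

p_pool = pooled_proportion(145, 250, 150, 300)

# Expected Blue/Black and White/Gold counts in each sample under H0
expected = [n * q for n in (250, 300) for q in (p_pool, 1 - p_pool)]
condition_met = all(count >= 10 for count in expected)

print(round(p_pool, 4))                  # 0.5364
print([round(c, 2) for c in expected])   # [134.09, 115.91, 160.91, 139.09]
print(condition_met)                     # True
```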

d. Based on the results, the observed test statistic value is 1.87 with a corresponding p-value of 0.0307.
Which of the following is a correct meaning of this p-value?
[2] Circle one.
• the probability that the null hypothesis is true is 0.0307.
• the probability of seeing a test statistic of 1.87 or more extreme is 0.0307.
• if there is no difference in the population rates for seeing the dress as blue/black,
we would see a test statistic of 1.87 or more extreme with probability 0.0307.   (correct choice)
• none of the above statements is a correct and complete meaning for this p-value.

e. This p-value of 0.0307 tells us the result is:


[2] Circle one.
• significant at both α = 0.05 and at α = 0.01.
• significant at α = 0.05 but not at α = 0.01.   (correct choice: 0.01 < 0.0307 < 0.05)
• significant at α = 0.01 but not at α = 0.05.
• not significant at either α = 0.05 nor at α = 0.01.
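
For reference, the reported test statistic and p-value in parts (d) and (e) can be approximately reproduced with the sketch below. The helper name is hypothetical; because it does not round z to 1.87 before looking up the tail area, the p-value comes out near 0.0305 rather than exactly 0.0307.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_upper(x1, n1, x2, n2):
    """Pooled two-proportion z statistic and upper-tail p-value (Ha: p1 > p2)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # null standard error
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)                     # area to the right of z
    return z, p_value

z, p_value = two_proportion_z_upper(145, 250, 150, 300)
print(round(z, 2))                 # 1.87
print(0.01 < p_value < 0.05)       # True: significant at 0.05 but not at 0.01
```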
7. Number of Office Hours – During the first Fall term of teaching a very large intro course, Dr. Z determined a
probability model for the number of office hours that his students attended in a given week. He then made
some changes as to how his office hours were publicized and posted for his students in the Winter term and
determined a probability model for the number of office hours that students attended in a given week for that
term. Below are these two models along with the corresponding means and standard deviations.

H0: Model for Number of Office Hours attended by Students Weekly during the Fall term
X = # Hours    0     1     2     3     4     5
Probability    0.1   0.4   0.2   0.1   0.1   0.1
Mean number of office hours attended weekly = 2 hours, with standard deviation = 1.5 hours

Ha: Model for Number of Office Hours attended by Students Weekly during the Winter term
X = # Hours    0     1     2     3     4     5
Probability    0     0.1   0.1   0.1   0.3   0.4
Mean number of office hours attended weekly = 3.8 hours, with standard deviation = 1.3 hours

a. In the Spring term, a former student of Dr. Z’s stops by his office to pick up his final exam. The student
cannot remember which term he took the class, so Dr. Z decides he will ask the student how many office
hours he attended weekly when in his class. If the student’s response is 4 hours or more, Dr. Z will decide
the student was in the Winter term class. So Dr. Z will be picking between the following hypotheses:
H0: The student was in the Fall term class versus Ha: The student was in the Winter term class
using the decision rule: Reject H0 if the number of office hours attended weekly was at least 4.

i. For this decision rule, find the level of significance, that is, compute α. Apply the DEFINITION of α.
[2]
α = P(Reject H0 when H0 is true) = P(X = 4 or X = 5 under the H0 model) = 0.1 + 0.1 = 0.2

Final answer = _______ 0.2 ________


ii. For this decision rule, compute the statistical power. Apply the DEFINITION of Power.
[2]
Power = P(Reject H0 when Ha is true) = P(X = 4 or X = 5 under the Ha model) = 0.3 + 0.4 = 0.7

Final answer = _______ 0.7 ________
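
Both definitions reduce to summing probabilities over the rejection region under the relevant model. A minimal Python sketch, with the two models entered as dictionaries and an illustrative helper name:

```python
# Probability models from the problem: hours attended -> probability
fall = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.1}    # H0 model
winter = {0: 0.0, 1: 0.1, 2: 0.1, 3: 0.1, 4: 0.3, 5: 0.4}  # Ha model

def rejection_probability(model, cutoff):
    """P(X >= cutoff) under the given model, i.e. the chance of rejecting H0."""
    return sum(prob for hours, prob in model.items() if hours >= cutoff)

alpha = rejection_probability(fall, 4)    # P(reject H0 | H0 true)
power = rejection_probability(winter, 4)  # P(reject H0 | Ha true)
print(round(alpha, 2), round(power, 2))   # 0.2 0.7
```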


b. Suppose a group of 100 students of Dr. Z’s Fall term class will be stopping by his office to pick up their final
exams. Dr. Z decides he will pass each exam back and ask how many office hours they attended weekly.
Assume that these students could be treated as a random sample and that they will all answer honestly.
Which of the following graphs best portrays the model for the mean number of office hours attended for
this sample? Clearly circle your selected graph.
By the CLT, the sample mean is approximately N(2, 1.5/sqrt(100)), that is, N(2, 0.15).
[2]
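
The CLT model can be checked from the Fall probability table with the sketch below. The helper name is illustrative; the code keeps the unrounded standard deviation (about 1.48), so the standard deviation of the sample mean is about 0.148 rather than the 0.15 obtained from the rounded value 1.5.

```python
from math import sqrt

fall = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.1, 4: 0.1, 5: 0.1}  # Fall term model

def mean_and_sd(model):
    """Mean and standard deviation of a discrete probability model."""
    mu = sum(x * p for x, p in model.items())
    variance = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, sqrt(variance)

mu, sigma = mean_and_sd(fall)
se_mean = sigma / sqrt(100)   # sd of the sample mean for n = 100
print(round(mu, 2), round(sigma, 2), round(se_mean, 2))  # 2.0 1.48 0.15
```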

8. Paired or Independent? – For each scenario below determine if it is a paired data scenario or based on
two independent samples.
[4] Circle one in each situation.
a. A study is conducted to compare sales of 8.4 fluid ounce cans of Red Bull Energy Drink at a convenience store
versus a grocery store. For each store, the number of cans sold each week will be recorded for the next 10 weeks.

Paired Data     Two Independent Samples
Answer: Paired Data. Counts will be paired by the week.


b. You interview 200 university students in their freshman year and again in their senior year and each time
ask the student to report the number of hours they spend exercising weekly.

Paired Data     Two Independent Samples
Answer: Paired Data. There are two measures on each student.

9. Truck Weight Limit ~ A warehouse has a fleet of small trucks to transport crates of their floor tile. Each truck
can carry a maximum load of 2000 pounds. Suppose that the weight of a standard crate of floor tile is normally
distributed with a mean weight of 480 pounds and a standard deviation of 20 pounds.
What is the probability that a random sample of 4 crates placed in a truck will exceed the maximum load? (Hint:
think about what exceeding the maximum load would imply about the sample mean weight for these 4 crates.)

[4] Using the hint to think about the sample mean … since X=crate weight has a N(480,20) model, the sample
mean (for a random sample of 4 crates) will also have a normal model N(480, 20/sqrt(4)), that is, N(480,10).

P(Total weight of the 4 crates exceeding 2000)
= P(Sample mean weight for the 4 crates exceeding 2000/4) = $P(\bar{X} > 500)$
So we have Z = (500 – 480)/10 = 2, and the area to the right of 2 under a standard normal is 0.0228
(or about 2.5% using the empirical rule).

Final Answer: _____ 0.0228 (or 0.025) ______


Common errors were to use a standard deviation of 20 instead of 10; others called their standardized quantity a ‘t’,
but σ is known here for the original population, so it will be z.
Some students tried to solve P(Total weight > 2000); the total weight will be normally distributed with an
average of 4(480) = 1920 lbs, but to find the standard deviation you have to work with the variance.
The variance of the total (or sum) will be the sum of the variances = (20)^2 + (20)^2 + (20)^2 + (20)^2 = 1600;
so the standard deviation of the total will be sqrt(1600) = 40 lbs.
So the z score for the total is Z = (2000 – 1920)/40 = 80/40 = 2 and the probability will be again 0.0228.
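
Both routes (sample mean and total weight) can be verified with a short Python sketch using the standard library's normal distribution. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, limit = 480, 20, 4, 2000   # crate mean, crate sd, crates, max load

# Route 1: sample mean of 4 crates is N(480, 20/sqrt(4)) = N(480, 10)
mean_model = NormalDist(mu, sigma / sqrt(n))
p_mean = 1 - mean_model.cdf(limit / n)   # P(sample mean > 500)

# Route 2: total of 4 crates is N(1920, sqrt(4 * 20^2)) = N(1920, 40)
total_model = NormalDist(n * mu, sigma * sqrt(n))
p_total = 1 - total_model.cdf(limit)     # P(total > 2000)

print(round(p_mean, 4), round(p_total, 4))  # 0.0228 0.0228
```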

10. Statistically Significant – A researcher will compare middle school students and high school students on 25 different
Yes/No questions. He will use a 10% significance level to carry out the 25 independent samples z-tests
(one for each question) to compare the two population proportions. If, for each test, the null hypothesis of
no difference in population proportions is actually true, how many decisions are expected to be correct?
[2]
25 different tests each with H0 true and conducted at the 10% level; so expect 10% of the decisions
to be wrong, so 90% of the 25 tests or 22.5 tests are expected to be correct.
Common errors: (1) 2.5 tests = how many tests expected to REJECT H0 (be statistically significant) which
would be an Incorrect decision, (2) rounded to whole value (but expect 22.5), (3) 90% (not how many).

Final Answer: _____ 22.5 tests _____
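
The expected counts follow from treating each test as an independent trial with a 10% chance of a false rejection. A tiny Python sketch with illustrative variable names:

```python
tests = 25
alpha = 0.10   # chance of rejecting a true H0 in any one test

expected_false_rejections = tests * alpha      # expected incorrect decisions
expected_correct = tests * (1 - alpha)         # expected correct decisions

print(round(expected_false_rejections, 1), round(expected_correct, 1))  # 2.5 22.5
```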

11. Name that Scenario – For each research problem below, determine if the appropriate method to address the
problem would be to make a confidence interval (CI) or to conduct a test of hypotheses (HT).

If a CI, then provide the notation for the corresponding parameter the CI is for.

If a HT, then clearly state the appropriate null and alternative hypotheses to be tested.

The last scenario has one additional question, so be sure to answer it too.

a. The Dean of a college wants to learn about the proportion of all students who have a summer internship before they
graduate. His research team takes a representative sample of 200 students who will be graduating this May and
finds that 85 of the 200 sampled students had a previous summer internship.
[2]

CI for ____ p _____ OR H0: _____________________ versus Ha: ______________________

b. A sociologist developed a test to measure attitudes about public transportation. She is interested in estimating
the difference between the average score for younger residents (under 25 years old) and the average score for
older residents (50 years or older).
[2]

CI for µ1 – µ2 or µyoung – µold OR H0: __________________ versus Ha: ____________________

c. The average age of customers at a local nightclub has been 25 years old. The owner has re-modeled the club
in hopes the new décor will attract an older crowd. She will take a random sample of recent customers and
record their ages to assess if there has been an increase in the average age.
[2]

CI for ____________ OR H0: _____ µ = 25_______ versus Ha: _____ µ > 25___________

[2] Clearly define the parameter of interest in context:

The parameter ___ µ ____ represents ___ the POPULATION AVERAGE age of all customers with new décor
__ or the MEAN age of ALL customers that visit the re-modeled club. ____________________________

Common error = population must be of all ages for all RECENT customers (so after remodel).

When you are all done with the exam, please leave your formula card and seat number at your place.
Bring your exam up front and sign in and then collect all your belongings.
Check eCoach and Canvas starting next Monday, March 30 for scores and solutions.
