
Econ 102A

Introduction to Statistical Methods for Social Scientists

Stanford University

Course Materials for Week 9 and Week 10

Professor Scott M. McKeon

Winter Quarter, 2019 - 20

© Scott M. McKeon
All Rights Reserved
Weeks 9 and 10

Goals:

1. Developing confidence intervals for proportion problems.

2. Becoming familiar with hypothesis testing (or ‘tests of significance’).

3. Learning the concept of p-values and how they relate to levels of significance.

4. Constructing confidence intervals for the difference in means between two populations.

5. Becoming familiar with hypothesis testing in a two-population setting.


Handout #76
Econ 102A Statistical Methods for Social Scientists Page 1 of 3

Weeks 9 and 10 Worksheet

1. National Motors has equipped the ZX-900 with a new disc brake system. We define the
stopping distance for a ZX-900 to be the distance (in feet) required to bring the automobile
to a complete stop from a speed of 35 miles per hour under normal driving conditions
(using this new brake system). Let μ be the mean stopping distance of all ZX-900s. One of
ZX-900’s major competitors has advertised a mean stopping distance of 70 feet. National
Motors would like to claim in a new television commercial that the ZX-900 achieves a
shorter mean stopping distance with this new brake system.

In order to test whether this new brake system does in fact lead to a shorter stopping
distance than that claimed by our competitor, we have formulated the hypotheses:

H0: μ ≥ 70
Ha: μ < 70

Notice that rejection of the null hypothesis implies that the new brake system does, in fact,
lead to a shorter stopping distance.

The standards and practices division of a major television network will permit National
Motors to run the commercial if the null hypothesis can be rejected at a 4% level of
significance.

Suppose National Motors selects a sample of 65 ZX-900s. The company records the
stopping distance of each of these automobiles and calculates the mean and standard
deviation of the sample as x̄ = 68.23 feet and s = 7.05 feet, respectively.

(a) Based on this sample, what conclusion do we reach regarding our null hypothesis?
Can National Motors run the commercial?

(b) Determine the p-value of the sample.

(c) Assume a standard deviation of 7.05 feet, but also assume we only have enough
resources to test the new brakes out on 25 ZX-900s. Determine the range of sample
means that would lead to rejecting the null hypothesis.

2. Reconsider the setup in Question 1 above. But, suppose the objective of the hypothesis test
is to detect any difference in our brakes versus those of our competitor (i.e., maybe ours are
better, but maybe they are worse).

This time, the sample consisted of 80 ZX-900s which had a mean stopping distance of
x̄ = 71.15 feet and a standard deviation of s = 7.86 feet. We will run the test using a 15%
level of significance.

(a) Based on this sample, should we conclude that there is a difference in mean stopping
distance between us versus our competitor?

(b) Determine the p-value of the sample.

3. (Based on Exercise 8.46 from textbook) Each participant in a survey on coffee preference
tastes two unmarked cups of coffee: one cup with Brand X instant coffee and one with
fresh-brewed coffee. After tasting them in random order, the participant then selects which
cup s/he prefers. Of the 60 subjects who participate in the study 32 prefer the fresh-brewed
cup and 28 prefer the instant coffee.

The makers of Brand X instant coffee want to advertise the phrase, “tastes almost as good
as the real thing” on the packaging which contains the instant coffee. Suppose Brand X
executives are able to make this claim if over 40% of the population selects Brand X instant
coffee in a blind taste test.

(a) Construct a hypothesis test on whether Brand X instant coffee executives can include
the phrase, “tastes almost as good as the real thing” on their package’s label. (Rejection
of the null would imply that the packaging can include the label.) Resolve the test at a
5% level of significance using the sample quoted above.

(b) Determine the lowest level of significance such that Brand X executives would be
allowed to include the phrase on their label.

4. (Based on Exercise 8.95 from textbook) An experiment examined the relationship between
tips and server behavior in a restaurant. In one scenario, the server repeated the customer’s
order word for word, while in the other scenario, the orders were not repeated. Tips were
received in 47 of the 60 trials when the server repeated the order (i.e., the ‘parrot effect’)
and in 31 of the 60 trials under the no repeat scenario. Compute a 95% confidence interval
for the difference in population proportions.

5. (Based on Exercise 7.83 from textbook) A recent study of food portion sizes looked at
beverage consumption of pre-teen children. One part of this study compared 20 children
who were 7 to 10 years old with 34 children who were 11 to 13 years old. Assume
beverage consumption for both groups of children is normally distributed. The younger
children consumed an average of 12.2 ounces of sweetened drinks per day, while the older
children averaged 14.5 ounces. The standard deviations were 5.7 ounces and 4.4 ounces
respectively.

(a) Construct a 95% confidence interval for the difference in means between these two
populations of age groups.

(b) Consider a researcher who wishes to claim that the older cohort of children consumes at
least 2 more ounces of sweetened beverage than does the younger cohort. Consider
testing this claim at α = 6%. Determine the lowest average we would need in the above
sample for the older cohort in order that the researcher be able to make the claim
(assume the younger cohort average of 12.2 ounces and the sample standard deviations
of 5.7 and 4.4 still apply.)
Handout #77
Econ 102A Statistical Methods for Social Scientists Page 1 of 7

Hypothesis Testing: t-statistics and p-values

• T-statistics

The idea behind hypothesis testing is that you wish to test whether or not some statistic of
interest assumes a particular value. The statistic of interest is commonly a population mean.
You propose the value of the population mean that you wish to test and encode it within the
null hypothesis. For example, your null hypothesis might assert, “I believe the population
mean of Statistic A is μ = 43” or you might say, “I believe the population mean of Statistic B is
28 and, for purposes of my test, it does not matter if this mean is lower (i.e., μ ≤ 28).” The
alternative hypothesis (in this course) is simply the opposite statement to the null hypothesis
(i.e., μ ≠ 43 or μ > 28 respectively).

We then venture out into the world and collect a sample of data and calculate the ensuing
sample mean, x̄, and sample standard deviation, s. From this we can further calculate the test
statistic (or t-statistic) of the sample, which considers how deviant the sample mean is from the
hypothesized mean we wish to test. This is found simply by standardizing the sample mean
under the assumption that the true population mean is as specified in the null hypothesis. That
is, you always assume the null hypothesis to be true when initially running the test. So, if the
null hypothesis asserts that the population mean is μ = μ0 then the t-statistic for the sample
mean is the standardized value:

$$
\text{t-statistic} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
$$

If the t-statistic is small in absolute value, this provides further credence that the null
hypothesis may well be true (since the sample mean is close to the proposed population mean).
However, if the t-statistic is large in absolute value, this could cast doubt on the credibility of
the null hypothesis (since the sample mean is far from the proposed population mean). In this
way, the t-statistic measures the compatibility between the null hypothesis and the new sample
data.
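As a minimal sketch, the standardization above can be written as a small Python function. The sample values used below (x̄ = 52, s = 6, n = 36, tested against μ0 = 50) are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import math


def t_statistic(sample_mean: float, mu0: float, s: float, n: int) -> float:
    """Standardize a sample mean under the null hypothesis mu = mu0.

    The sample standard deviation s stands in for the population standard
    deviation, so the standard deviation of the sample mean is s / sqrt(n).
    """
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Hypothetical sample: mean 52, standard deviation 6, size 36; H0 value mu0 = 50.
t = t_statistic(sample_mean=52, mu0=50, s=6, n=36)
print(t)  # 2.0, i.e., two standard deviations above mu0
```

A negative t-statistic simply means the sample mean fell below the hypothesized value.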

• P-values

First off, it is important to point out that the ‘p’ in p-value stands for ‘probability.’ Loosely
speaking, the p-value is the probability of getting a sample mean at least as deviant as that
which actually occurred. A simple example will help illustrate this point:

Suppose the height of a man is normally distributed with a mean of 6 feet and a standard
deviation of 4 inches. Suppose further that the next man who walks into your office is 6 ft 8
inches tall. What is that man’s p-value?

Well, what actually happened is that a 6 ft 8 inch man walked into your office. What is the
probability that something at least as deviant or ‘stranger’ would have happened? That is, what
is the chance that a person 6 ft 8 inches tall or taller would have walked into your office? That
probability is the p-value.

Visually, we have the following:

[Figure: a normal curve of men’s heights centered at 6 ft. The particular man who came to your office is marked at 6 ft 8 in. The shaded region to the right of that point represents a more deviant or ‘stranger’ observation; this shaded region (a probability) is the man’s p-value.]

To actually find the probability, you need to first compute the t-statistic (6 ft 8 in is 8 inches
above the 6 ft mean, and 8 ÷ 4 = 2.0) and then look this number up on a t-table. The t-table is what converts standard deviations
(i.e., t-statistics) into their corresponding probabilities. As mentioned in lecture, these tables
are already built into Excel through the TDIST command.
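The same two steps can be sketched in Python. Because the population standard deviation (4 inches) is given in this example, the sketch uses the standard normal distribution from Python’s `statistics.NormalDist` in place of a printed table or Excel’s TDIST; heights are converted to inches (6 ft = 72 in, 6 ft 8 in = 80 in).

```python
from statistics import NormalDist

mean_height = 72.0  # 6 ft, in inches
sd_height = 4.0     # population standard deviation, in inches
observed = 80.0     # 6 ft 8 in, in inches

# Step 1: standardize (how many standard deviations above the mean?).
z = (observed - mean_height) / sd_height

# Step 2: p-value = probability of a height at least this large (upper tail).
p_value = 1 - NormalDist().cdf(z)
print(z, round(p_value, 4))  # 2.0 0.0228
```

So roughly 2.3% of men are at least 6 ft 8 in tall under these assumptions.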

So, in the end, the t-statistic and p-value are closely connected. Looking at the above graph, the
label 6 ft 8 inches is converted to a t-statistic of 2.0. The above graph might be considered
the ‘real world’, with the variable measured in feet or inches, whereas in the
‘standardized’ world we have:

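[Figure: the same curve in standardized units, centered at 0.0, with the shaded region beginning at t = 2.0.]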

and the p-value is just the corresponding probability which lies above that t-statistic of t = 2.0
(i.e., the shaded region above).

Remember that a p-value is the probability that an outcome ‘at least as deviant’ as the one that
actually occurred would happen. And, since deviant outcomes lie away from the mean
(i.e., away from the center of the graph), p-values will always be areas in the tails of the
normal distribution because, after all, the tails of the distribution are where the really deviant
stuff is happening.

• Putting the Concepts together in a Hypothesis Test

Consider the average number of e-mails you receive in a given week. Take this as our statistic
of interest. Notice this is a ‘group average’ since you are considering the specific number of
weekly e-mails you have received over time and you then consider taking the average of all
these weekly numbers. From your experience, suppose you estimate that the average number
of e-mails you receive in a week is 150. We may want to test whether your estimate is too low.
So, we set up the null hypothesis to reflect not only that you believe the true average to be 150
but also that you do not care if this overstates the truth (since we want to only test whether 150
is too low). Therefore, you set up this ‘hypothesized’ average as follows:

H0: average weekly e-mails received ≤ 150


Ha: average weekly e-mails received > 150

Now the idea of hypothesis testing is the following: We are going to run out into the world and
take just one more sample of data. And, we are going to use this one more sample of data to
‘test’ whether our null hypothesis seems to be a good belief or a bad belief.

Now, in this example, ‘one more sample of data’ means recording the number of e-mails
received over a course of n weeks, and computing the mean (i.e., average) of these n numbers.

For the sake of argument, suppose the sample mean number of e-mails you obtain is 138. In
this case there is no need to proceed. Since 138 < 150, this new sample only substantiates your
null hypothesis. It lends further credence to the belief that the average number of e-mails you
receive in a week is indeed 150 or less. Story over.

But now suppose the sample mean number of e-mails you obtain is 174. In this case, our belief
is called into question. Is 174 really a huge number compared with 150? Is the number 174 so
much larger than 150 (i.e., so deviant from 150) that our belief of getting an average of 150 or
fewer e-mails a week is really a ‘bad belief’ and we should therefore change the 150 to
something higher?

This is the whole point of calculating a t-statistic and its associated p-value. If 174 ends up
being too deviant from 150, we then conclude that the hypothesis of 150 or fewer e-mails is a
‘bad belief’ and we should readjust our null hypothesis for next time.

To figure out if 174 is too deviant from 150:

(1) Calculate the t-statistic of 174. This means determining how many standard deviations 174
is away from 150. In addition to the sample mean, this calculation will also require
knowledge of the sample size and sample standard deviation (since, in the typical case, the
sample standard deviation serves as a proxy for the population standard deviation).

(2) Find the associated p-value for this t-statistic. That is, what is the chance of obtaining a
sample mean that is at least as deviant as 174 e-mails? In other words, a sample where
you average 174 e-mails, or 175 e-mails, or 176 e-mails, etc. This amounts to
finding the area in the tail corresponding to all numbers at 174 e-mails and beyond (again,
just like the ‘height of a man’ p-value example from above: find the probability of getting
something at or more deviant than what actually occurred).

(3) Compare the p-value to the level of significance. The level of significance is something
imposed by you (completely at your discretion) from the outset of the exercise. A common
level of significance to use is 5%. Notice that the level of significance is stated as a
probability. And, remembering that a p-value is also a probability (recall that the p in
p-value stands for ‘probability’), p-values can be compared directly to the level of
significance. You cannot, say, compare a level of significance to a t-statistic since one is a
probability and the other is measured in standard deviations (comparing apples to oranges).
This is the point of converting t-statistics to p-values: so your sample mean of 174 e-mails
is now in the form of a probability which can be directly compared to a level of
significance.

In general, you reject the null hypothesis if your p-value is less than the level of significance;
do not reject your null hypothesis if your p-value is above this level. Notice this makes
intuitive sense. If the p-value associated with 174 e-mails is, say, 1% this means that we only
have a 1% chance of obtaining a sample mean of 174 e-mails or more if, in actuality, the mean
number of weekly e-mails received is 150. So, obtaining a sample mean of 174 is very unlikely
if the true population mean is 150. Since 174 is so deviant, you should probably change your
hypothesis of 150 or fewer e-mails since 150 probably understates the true value (i.e., a p-value
less than the level of significance means reject your null hypothesis).

Alternatively, if the p-value associated with a sample mean of 174 e-mails is, say, 35% this
means that 174 e-mails really is not that far away from 150 e-mails, and it would not have been
too strange to observe a sample mean of 174 e-mails if you truly receive an average of
150 e-mails per week. So, you do not necessarily need to change your hypothesis since 174 is actually
pretty close to 150 (i.e., a p-value greater than the level of significance means do not reject your
null hypothesis).
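The decision rule above amounts to a single comparison between two probabilities. In this sketch the function name `decide` is hypothetical, the two p-values are the 1% and 35% scenarios just discussed for a sample mean of 174 e-mails, and a tie (a p-value exactly equal to the level of significance) is treated as ‘reject’.

```python
def decide(p_value: float, alpha: float) -> str:
    """Compare a p-value with the level of significance (both are probabilities)."""
    # A tie (p-value exactly equal to alpha) counts as "reject" in this sketch.
    return "reject H0" if p_value <= alpha else "do not reject H0"


alpha = 0.05  # level of significance, fixed before the sample is taken
for p in (0.01, 0.35):  # the two scenarios for a sample mean of 174 e-mails
    print(f"p-value {p:.0%}: {decide(p, alpha)}")
```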

• Visualizing Hypothesis Tests

Consider again the e-mail example where we are testing the hypothesis of getting an average of
150 e-mails (or less) each week. Remember, the idea of hypothesis testing is that you are now
going out into the world and you will collect just one more sample of data in order to test this
belief (i.e., to test your null hypothesis).

But, before you have even gone out to collect this sample, we set up the ‘when will we reject’
and ‘when will we not reject’ conditions. This is done through the level of significance, a
probability which you are free to choose to be anything you want. In particular, the level of
significance has absolutely nothing to do with the sample you are about to take (and, for that
matter, it has nothing to do with any of the samples you may have taken when forming your
hypothesis in the first place). It is analogous to the arbitrary nature of the confidence level
chosen in confidence intervals.

Suppose the level of significance is taken to be 5%. Then, the visual setup of our hypothesis
test is as follows:

[Figure: a curve centered at 150. The shaded region in the right tail is exactly 5% of the whole area under the curve, and its left edge is the boundary value for rejection. If the sample mean for the new sample falls in the shaded range, you will reject the null; if it falls to the left of the boundary, you will not reject the null.]

So, the level of significance indirectly sets the boundary line for ‘reject’ versus ‘not reject.’
The word ‘indirectly’ is used here because the boundary value is measured in actual number of
average e-mails whereas the level of significance is the probability of the shaded region.
However, one implies the other.
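To see how the level of significance pins down the boundary value, here is a sketch for the e-mail example. The sample standard deviation (s = 40) and sample size (n = 16) are hypothetical, and the multiplier comes from the standard normal distribution as a large-sample stand-in for the t-table; with n = 16 the t-table value (15 degrees of freedom, about 1.753) is somewhat larger, which would push the boundary slightly higher.

```python
import math
from statistics import NormalDist

mu0 = 150       # hypothesized mean weekly e-mails (edge of H0: mean <= 150)
alpha = 0.05    # level of significance
s, n = 40, 16   # hypothetical sample standard deviation and sample size

standard_error = s / math.sqrt(n)            # standard deviation of the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha)     # about 1.645 for a 5% right tail
boundary = mu0 + z_crit * standard_error     # boundary value for rejection
print(f"reject H0 if the sample mean exceeds {boundary:.1f} e-mails")

# Same decision via the p-value: a mean past the boundary has p-value < alpha.
sample_mean = 174
p_value = 1 - NormalDist().cdf((sample_mean - mu0) / standard_error)
print(sample_mean > boundary, p_value < alpha)
```

Under these assumptions the boundary is about 166.4 e-mails, and a sample mean of 174 lands beyond it, which matches its p-value falling below 5%.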

Now it is time to go out into the world, take the next sample and use it to test the null
hypothesis. As we assumed before, suppose the ultimate sample mean you compute from this
new sample is 174 e-mails.

Now, if the p-value ultimately ends up being 1%, then this is what the corresponding visual
must look like:

[Figure: the same curve centered at 150, with the shaded rejection region (exactly 5% of the whole area under the curve) in the right tail. In terms of the x-axis, 174 must have hit the graph inside the shaded region, because there is only 1% probability (the circled region) to the right of where it hit.]

If the p-value ends up being 1%, then 174 e-mails must be in the ‘reject region’ since the reject
region contains the most deviant 5% of outcomes. This is why you compare p-values to
levels of significance: since 1% < 5%, 174 e-mails must land in the rejection region.

But, if the p-value ultimately ends up being 35%, then this is what the corresponding visual
must look like:

[Figure: the same curve centered at 150, with the shaded rejection region (exactly 5% of the whole area under the curve) in the right tail. In terms of the x-axis, 174 must now have hit the graph between 150 and the rejection region, because there is 35% probability (the circled region under the curve) to the right of where it hit.]

If the p-value ends up being 35%, then 174 e-mails must be fairly close to the mean (but
certainly to the right of the mean since 174 > 150). Here, 174 lands in the ‘do not reject’ region
since the reject region only contains the 5% most deviant outcomes. Again, this is why you
compare p-values to levels of significance – since 35% > 5%, then 174 e-mails must land to the
left of the rejection area (i.e., it is in the ‘not reject’ range). A p-value of 35% is not strong
evidence that 150 is a bad (i.e., incorrect) guess of the population mean. In this case,
174 is just a little bit more than 150 (notice that the p-value of 150 itself would be 50%).
Handout #78
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

A Summary of Hypothesis Testing

Verbally: We may want to test whether the population mean of some statistic is equal to a
specific value. We quantify this assertion through null and alternative hypotheses.
We then test the null hypothesis with a new sample.

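Visually:
H0: μ ≤ μ0 vs. Ha: μ > μ0 (one-tailed test): on a curve centered at μ0, the rejection region is an area α in the right tail.
H0: μ ≥ μ0 vs. Ha: μ < μ0 (one-tailed test): the rejection region is an area α in the left tail.
H0: μ = μ0 vs. Ha: μ ≠ μ0 (two-tailed test): the rejection region has total area α, split equally between the two tails (α/2 in each).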

Mathematically: Take the new sample and calculate the sample mean and sample standard
deviation. This then allows for calculation of the t-statistic, which
determines how many standard deviations the sample mean is away from the
hypothesized mean. Based on this calculation, determine whether the null
hypothesis should be rejected or not.

We can also calculate the p-value for the sample. The p-value is the
probability of obtaining a sample mean which is as deviant or even more
deviant from o than the sample mean that actually happened.

Whether or not to reject a null hypothesis can be determined by comparing
the p-value to α. In general:

p-value ≤ α ⇒ reject the null hypothesis

p-value > α ⇒ do not reject the null hypothesis
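The three setups above differ only in which tail (or tails) the p-value is taken from. A minimal sketch, assuming a standardized test statistic and using the standard normal distribution as a large-sample approximation to the t-table; the function and argument names are illustrative.

```python
from statistics import NormalDist


def p_value(test_stat: float, alternative: str) -> float:
    """Tail probability for a standardized test statistic.

    alternative: "greater" for Ha: mu > mu0, "less" for Ha: mu < mu0,
    "two-sided" for Ha: mu != mu0 (normal approximation to the t-table).
    """
    phi = NormalDist().cdf
    if alternative == "greater":
        return 1 - phi(test_stat)
    if alternative == "less":
        return phi(test_stat)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(test_stat)))
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(2.0, alt), 4))
# greater 0.0228, less 0.9772, two-sided 0.0455
```

For the two-tailed test, the p-value doubles the one-tailed area because a deviation in either direction counts as ‘at least as deviant’.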


Handout #79
Econ 102A Statistical Methods for Social Scientists Page 1 of 1

Type I and Type II Errors in Hypothesis Testing

(This handout can be omitted, since two lectures were missed due to holidays.)
Handout #80
Econ 102A Statistical Methods for Social Scientists Page 1 of 3

Confidence Intervals for the Difference in Means between Two Populations

Thus far we have only considered confidence intervals for a single population. That is, we
considered taking a sample from the population of some statistic of interest and using the point
estimators of the sample to construct an estimate of the true population mean for the statistic.
However, another common use of confidence intervals is to detect the degree of difference in
means between two separate populations. For example, a drug company might believe it has
developed a new drug which lowers blood pressure. In an effort to lend credence to the
efficacy of the new drug, the company selects two groups: (1) a control group that has not been
administered the drug, and (2) a group of individuals that have been administered the drug. We
might then develop a confidence interval for the difference in average blood pressure between
these two populations. If the difference is significantly different than zero (and the average for
the latter group is lower than for the former group) this would then lend credence to the notion
that the drug does indeed lower blood pressure.

With regard to the underlying mathematics, we have the following:

• Non-proportion Problems:

The statistic of interest is the difference in population means between two separate populations.
That is, we are interested in developing a confidence interval for μ1 – μ2. Clearly, the point
estimate for this difference (i.e., our best single guess as to this difference) is given by x̄1 – x̄2,
the difference in sample means.

In order to develop the associated margin of error, we need to consider the variance of the
difference in sample means. That is, Var(x̄1 – x̄2).

By using the properties of variance (and the fact that the two samples are drawn independently, so there is no covariance term), we have:

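$$
\begin{aligned}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2) &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(-\bar{x}_2) \\
&= \operatorname{Var}(\bar{x}_1) + (-1)^2 \operatorname{Var}(\bar{x}_2) \\
&= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2) \\
&= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\end{aligned}
$$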

where n1 and n2 are the respective sample sizes drawn from each population. Thus the standard
deviation of the sampling distribution is given by:

$$
\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
$$

and the confidence interval would therefore be given by:

$$
(\bar{x}_1 - \bar{x}_2) \pm (\#\text{SD})\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.
$$

In the typical case where σ1² and σ2² are unknown and need to be estimated by s1² and s2²
respectively, the confidence interval is given by:

$$
(\bar{x}_1 - \bar{x}_2) \pm (\#\text{SD})\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
$$

where #SD can be approximated from the t-table. In particular recall that, in order to use the
t-table, we must specify the ‘degrees of freedom.’ The degrees of freedom in this construct is
given by the Satterthwaite approximation (see textbook Section 7.2):

$$
\text{degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}.
$$

Note that, typically, the degrees of freedom will not be integer-valued when using this
approximation.
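The pieces of this interval can be sketched in Python. The helper name `unequal_variance_parts` and the sample summaries are hypothetical (they are not taken from the exercises), and the #SD multiplier still has to be read from a t-table at the computed degrees of freedom; the value 2.08 used below is the approximate 95% t-table entry near 21 degrees of freedom.

```python
import math


def unequal_variance_parts(x1, s1, n1, x2, s2, n2):
    """Point estimate, standard error, and Satterthwaite degrees of freedom
    for mu1 - mu2 when the two population variances are not assumed equal."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    estimate = x1 - x2
    standard_error = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return estimate, standard_error, df


# Hypothetical sample summaries (not taken from the exercises).
est, se, df = unequal_variance_parts(x1=50, s1=8, n1=16, x2=45, s2=6, n2=9)
print(f"estimate = {est}, SE = {se:.3f}, df = {df:.2f}")

# #SD from a t-table: roughly 2.08 for 95% confidence near 21 degrees of freedom.
num_sd = 2.08
print(f"95% interval: ({est - num_sd * se:.2f}, {est + num_sd * se:.2f})")
```

As the text notes, the computed degrees of freedom (about 20.87 here) is not an integer.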

A variation on the above is when one assumes that the standard deviations of the two
populations are equal. In this case the confidence interval is:

$$
(\bar{x}_1 - \bar{x}_2) \pm (\#\text{SD})\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}
$$

which is equivalent to

$$
(\bar{x}_1 - \bar{x}_2) \pm (\#\text{SD})\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.
$$

Here, we pool the sample variances, s1² and s2², together to estimate σ² as:

$$
s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
$$

and the confidence interval is therefore given by:

$$
(\bar{x}_1 - \bar{x}_2) \pm (\#\text{SD})\sqrt{s_{\text{pooled}}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.
$$

In this construct, #SD is again taken from the t-table. Recall, in the one population setting,
degrees of freedom = n – 1. So, here, the sample from the first population renders n1 – 1
degrees of freedom and the sample from the second population renders n2 – 1 degrees of
freedom. Overall, then, we have a total of (n1 – 1) + (n2 – 1) = n1 + n2 – 2 degrees of freedom.
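A sketch of the pooled version, using hypothetical sample summaries and an illustrative helper name `pooled_interval_parts`. Note that the degrees of freedom here is the integer n1 + n2 – 2, unlike the Satterthwaite value.

```python
import math


def pooled_interval_parts(x1, s1, n1, x2, s2, n2):
    """Point estimate, pooled variance, standard error, and degrees of freedom
    for mu1 - mu2 when the population standard deviations are assumed equal."""
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return x1 - x2, s2_pooled, standard_error, n1 + n2 - 2


# Hypothetical sample summaries (not taken from the exercises).
est, s2p, se, df = pooled_interval_parts(x1=50, s1=8, n1=16, x2=45, s2=6, n2=9)
print(f"pooled variance = {s2p:.2f}, SE = {se:.3f}, df = {df}")
```

The interval is then the estimate plus or minus #SD times this standard error, with #SD read from the t-table at n1 + n2 – 2 degrees of freedom.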

• Proportion Problems:

As we have previously seen, when the statistic of interest within the population is a Bernoulli
random variable we have a proportion problem with mean p and standard deviation √(p(1 – p))
as opposed to the more generic μ and σ respectively.

Suppose now that we subject two populations to the same Bernoulli random variable.
Accordingly, let p̂1 be the proportion of ‘Yes’ responses from the sample taken among the first
population and let p̂2 be the proportion of ‘Yes’ responses from the sample taken among the
second population. The analogous confidence interval expression (as compared to the
non-proportion setup shown above) is:

$$
(\hat{p}_1 - \hat{p}_2) \pm (\#\text{SD})\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.
$$

As in the proportion problem setup for the one-population case, we will always take #SD from
the z-table as opposed to the t-table and thus avoid the laborious Satterthwaite approximation
formula for calculating degrees of freedom that we saw in the non-proportion setup above.
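A minimal sketch of this interval, with the #SD multiplier taken from the standard normal distribution (the z-table) as described above. The counts are hypothetical and the helper name `two_proportion_ci` is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_ci(yes1, n1, yes2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the z-table (normal) multiplier."""
    p1_hat, p2_hat = yes1 / n1, yes2 / n2
    standard_error = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    num_sd = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    diff = p1_hat - p2_hat
    return diff - num_sd * standard_error, diff + num_sd * standard_error


# Hypothetical counts: 30 "Yes" out of 50 versus 20 "Yes" out of 50.
low, high = two_proportion_ci(30, 50, 20, 50)
print(f"({low:.3f}, {high:.3f})")
```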
Handout #81
Econ 102A Statistical Methods for Social Scientists Page 1 of 4

Hypothesis Testing for the Difference in Means between Two Populations

In a previous handout we developed confidence intervals for the difference in means between
two populations of interest. This setting is also amenable to hypothesis testing. The example
on the previous handout considers a drug company that believes it has developed a new
drug which lowers blood pressure. In an effort to lend credence to the efficacy of the new drug,
the company selects two groups: (1) a control group that has not been administered the drug
(Population 1), and (2) a group of individuals that have been administered the drug (Population
2). How might hypothesis testing be used to lend credence to the notion that the drug does, in
fact, lower blood pressure? Well, we might consider the average blood pressure reading from
Population 1 against the average blood pressure reading from Population 2. If the drug has no
effect, we might expect μ1 – μ2 to be close to zero. If the drug has the beneficial effect of
lowering blood pressure we might expect μ1 – μ2 to be positive. In the unlikely event that the
drug actually has a detrimental effect on blood pressure we might expect μ1 – μ2 to be negative.

So, to test whether the drug has a beneficial effect, we might consider the null hypothesis
H0: μ1 – μ2 ≤ 0 along with the alternative hypothesis Ha: μ1 – μ2 > 0. We then proceed in the
exact same fashion as one-population hypothesis tests by sampling from both populations,
computing a t-statistic and then drawing an appropriate inference on whether or not to reject
the null hypothesis. Below, we concentrate attention on the computation of the t-statistic.

The underlying mathematics is very similar to that seen for confidence intervals. Specifically,
consider the following:

• Non-proportion Problems:

The statistic of interest is the difference in population means between two separate populations.
That is, we are interested in μ1 – μ2. Clearly, the point estimate for this difference (i.e., our best
single guess as to this difference) is given by x̄1 – x̄2, the difference in sample means.

In order to develop the t-statistic we need the appropriate standard deviation. That is, we need
to consider Var(x̄1 – x̄2).

Exactly as in the case of confidence intervals, by using the properties of variance (and the independence of the two samples), we have:

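$$
\begin{aligned}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2) &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(-\bar{x}_2) \\
&= \operatorname{Var}(\bar{x}_1) + (-1)^2 \operatorname{Var}(\bar{x}_2) \\
&= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2) \\
&= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\end{aligned}
$$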

where n1 and n2 are the respective sample sizes drawn from each population. Thus the standard
deviation of the sampling distribution is given by:

$$
\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
$$

in the case where the population standard deviations are both known, and the corresponding
z-statistic would therefore be given by:

$$
\text{z-statistic} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.
$$

Since the hypothesis test in two-population situations is usually looking to test for any difference
in the population means (i.e., the null hypothesis is often μ1 – μ2 ≤ 0, μ1 – μ2 = 0, or μ1 – μ2 ≥ 0),
we evaluate the statistic at the boundary value μ1 – μ2 = 0, and this z-statistic then reduces to:

$$
\text{z-statistic} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.
$$

In the typical case where σ1² and σ2² are unknown and need to be estimated by s1² and s2²
respectively, the analogous t-statistic would then be:

$$
\text{t-statistic} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.
$$

This t-statistic is then converted to a p-value which is then directly compared to the level of
significance for the hypothesis test. Specifically, the p-value is given by TDIST(t-statistic,
degrees of freedom, number of tails in test) where the degrees of freedom is given by the
Satterthwaite approximation discussed in the previous handout. Note that, for those of you
with older versions of Excel, the t-statistic entered in the TDIST function should be entered as
a positive number.
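A sketch of this t-statistic and its Satterthwaite degrees of freedom, with hypothetical sample summaries and an illustrative function name. The p-value would then come from TDIST as described above; outside Excel, SciPy’s `scipy.stats.t.sf(t, df)` returns the upper-tail probability of the t-distribution for a possibly non-integer df (SciPy is not used in the sketch, and results may differ slightly from TDIST, which truncates the degrees of freedom to an integer).

```python
import math


def unequal_variance_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """t-statistic for mu1 - mu2 against the hypothesized difference d0,
    plus the Satterthwaite degrees of freedom (unequal variances)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = ((x1 - x2) - d0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Hypothetical samples; H0: mu1 - mu2 <= 0 versus Ha: mu1 - mu2 > 0, so d0 = 0.
t, df = unequal_variance_t(x1=50, s1=8, n1=16, x2=45, s2=6, n2=9)
print(f"t = {t:.3f}, df = {df:.2f}")
```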

As in the confidence interval discussion, a variation on the above is when one assumes that the
standard deviations of the two populations are equal. In this case the z-statistic is:

$$
\text{z-statistic} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}}
$$

which is equivalent to:



$$
\text{z-statistic} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.
$$

As before, we pool the sample variances, s1² and s2², together to estimate σ² as:

$$
s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
$$

and the t-statistic is therefore given by:

$$
\text{t-statistic} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_{\text{pooled}}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
$$

with the degrees of freedom given as n1 + n2 – 2.
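The pooled t-statistic can be sketched the same way (hypothetical inputs, illustrative function name `pooled_t`). One check worth noting: when n1 = n2, the pooled standard error equals the unequal-variance one, so the two t-statistics coincide and only the degrees of freedom differ.

```python
import math


def pooled_t(x1, s1, n1, x2, s2, n2, d0=0.0):
    """t-statistic for mu1 - mu2 against d0, assuming equal population
    standard deviations; returns (t, degrees of freedom)."""
    s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = ((x1 - x2) - d0) / math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical sample summaries (not taken from the exercises).
t, df = pooled_t(x1=50, s1=8, n1=16, x2=45, s2=6, n2=9)
print(f"t = {t:.3f} with {df} degrees of freedom")
```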

• Proportion Problems:

As we have previously seen, when the statistic of interest within the population is a Bernoulli
random variable we have a proportion problem with mean p and standard deviation √(p(1 – p))
as opposed to the more generic μ and σ respectively.

Suppose now that we subject two populations to the same Bernoulli random variable.
Accordingly, let p̂1 be the proportion of ‘Yes’ responses from the sample taken among the first
population and let p̂2 be the proportion of ‘Yes’ responses from the sample taken among the
second population. The analogous z-statistic (as compared to the non-proportion setup shown
above) is:

$$
\text{z-statistic} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}.
$$

Now, since the hypothesis test in two-population situations is usually looking to test any
difference in the population proportions (i.e., the null hypothesis is often p1 – p2 = 0) this
implies that p1 = p2 and hence both populations have the same mean and standard deviation.
Thus, in such cases, the z-statistic reduces to:

$$
\text{z-statistic} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}}
$$

or, equivalently:

$$
\text{z-statistic} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
$$

where p is the proportion of ‘Yes’ responses from both populations combined and is estimated
as:

$$
\hat{p} = \frac{\text{Number of Yes responses from Population 1} + \text{Number of Yes responses from Population 2}}{n_1 + n_2}.
$$
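A minimal sketch of the pooled two-proportion z-statistic, with hypothetical counts (30 ‘Yes’ out of 50 versus 20 ‘Yes’ out of 50), an illustrative function name, and a two-tailed p-value from the standard normal distribution.

```python
import math
from statistics import NormalDist


def two_proportion_z(yes1, n1, yes2, n2):
    """z-statistic for H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat, p2_hat = yes1 / n1, yes2 / n2
    p_pooled = (yes1 + yes2) / (n1 + n2)  # both samples combined
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error


z = two_proportion_z(30, 50, 20, 50)  # hypothetical counts
two_sided_p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-sided p-value = {two_sided_p:.4f}")
```

Here the pooled proportion is 50/100 = 0.5, so the standard error is 0.1 and the difference of 0.2 sits two standard errors from zero.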
Handout #82
Econ 102A Statistical Methods for Social Scientists Page 1 of 5

Weeks 9 and 10 Practice Exercises

1. (Based on Exercises 8.20 and 8.21 from textbook). A survey of 7056 people who play both
the video games Guitar Hero and Rock Band reported that two-thirds of players who do not
currently play a musical instrument said that they are likely to begin playing a real musical
instrument within the next two years. The reports describing the survey do not give the
number of respondents (out of the 7056) who do not currently play a musical instrument.

(a) Suppose that half of the people surveyed do not currently play a musical
instrument. Develop a 99% confidence interval for the proportion of video game
players not currently playing a musical instrument who anticipate playing a real
musical instrument within the next two years.

(b) Suppose only 25% of the players surveyed do not currently play a musical instrument.
Develop a 99% confidence interval for the proportion of video game players not
currently playing a musical instrument who anticipate playing a real musical instrument
within the next two years.

(c) Again suppose that half of the people surveyed do not currently play a musical
instrument, but the survey reported that three-eighths (as opposed to two-thirds) of
players who do not currently play a musical instrument said that they are likely to begin
playing a real musical instrument within the next two years. Develop a 95% confidence
interval for the proportion of video game players not currently playing a musical
instrument who anticipate playing a real musical instrument within the next two years.

2. (Based on Exercise 7.32 from textbook). Pollution of water resources is a serious problem
that can require substantial efforts to improve. To determine the financial resources
required, an accurate assessment of the extent of the problem is needed. The index of
biotic integrity (IBI) is a measure of the water quality in streams. Below are IBI
measurements for a sample of streams in the Ozark Highland eco-region of Arkansas that
were collected as part of a study. Assume IBI measurements are normally distributed for
streams in this region.

47 61 39 59 72 76 85 89 74 89

33 32 46 80 80 53 78 43 88 84

Suppose streams in the United States have a national average IBI of 72.5. We
wish to analyze how streams in the Ozark Highland region compare to this national
average.

(a) Construct appropriate null and alternative hypotheses to test whether streams in the
Ozark Highland region are significantly below this national average.

(b) Using a level of significance of α = .05 along with the above sample, test your null
hypothesis in part (a).

(c) Determine the p-value of the above sample and use it to verify the conclusion you
found in part (b).

3. (Based on Exercise 7.45 from textbook). Various car models have a computerized system
which estimates the miles-per-gallon that the car achieves between gasoline fill-ups.
Suppose you own such an automobile but you also do this calculation by hand at each
gasoline fill-up to test whether the computer is giving an accurate reading. Assume that the
difference between the computer reading and your reading is normally distributed. You
have gleaned the following data after 15 fill-ups:

Fill-up   Difference in Readings   Fill-up   Difference in Readings
   1              5.0                 9              4.9
   2              6.5                10             -1.9
   3             -0.6                11              4.4
   4              1.7                12              0.1
   5             -2.3                13             -1.0
   6              4.5                14              1.1
   7              4.0                15              1.1
   8              2.2

(a) Construct appropriate null and alternative hypotheses to test whether the computer’s
reading is significantly different from your manual calculation.

(b) Using a level of significance of α = .02 along with the above sample, test your null
hypothesis in part (a).

(c) Determine the p-value of the above sample.

4. An appliance manufacturer stockpiles washers and dryers in a large warehouse for shipment
to retail stores. Some appliances get damaged in handling. The long-term goal has been to
keep the level of damaged machines below 2%. In a recent test, an inspector randomly
checked 60 washers and dryers, discovering that 4 of them had scratches or dents.

(a) Test the null hypothesis H0: p ≤ .02, where p represents the population proportion of
damaged washers and dryers, using a level of significance of α = .05.

(b) How might the i.i.d. assumption about the sample of 60 washers and dryers be questioned here?

5. (Based on Exercise 6.104 from textbook). You want to see if a redesign of the cover of a
mail-order catalog will increase sales. A very large number of customers will receive the
original catalog and a random sample of customers will receive the one with the new cover.
Based on past experience, you are willing to assume that sales per customer for the new catalog
will be normally distributed with a standard deviation of σ = 50 dollars. Further,
experience suggests that mean sales for the original catalog will be μ = 25 dollars, so you
construct the null hypothesis as H0: μ ≤ 25 when testing the new catalog. Your sample size
will be n = 900.

(a) Verbally state what a Type I error and a Type II error are for this exercise.

(b) Suppose you decide to reject the null hypothesis whenever x̄ ≥ 28. Determine the
probability of a Type I error.

(c) Determine the probability you will reject the null hypothesis when μ = 29 dollars.

6. (Based on Exercise 7.72 from textbook). The ‘misery is not miserly’ phenomenon refers to
a sad person’s spending judgment going haywire. Consider a study where people from the
population were randomly divided into two groups. Group 1 was shown a video about a
sad topic while Group 2 was shown a video about a neutral topic. Afterward, individuals in
both groups were asked to express their willingness to pay for a certain item. The objective
of the study was to observe whether the group shown the sad video would, on average, be
willing to pay more for the item than the neutral group.

Suppose Group 1 consisted of 58 individuals and their average willingness-to-pay was
x̄1 = $34.32 with a standard deviation of s1 = $14.85, and Group 2 consisted of 76
individuals having an average willingness-to-pay of x̄2 = $27.68 with a standard deviation
of s2 = $8.13.

(a) Develop a 95% confidence interval for the difference in means between these two
groups. Calculate the degrees of freedom by using the Satterthwaite approximation.

(b) Does your confidence interval support the assertion that sad people are willing to spend
more? Please explain your reasoning.

7. (Based on Exercise 8.66 from textbook). A Pew Internet Project Data Memo presented
data comparing adult gamers with teen gamers with respect to the devices on which they
play. The data are from two surveys. The adult survey had 868 gamers and the teen survey
had 1064 gamers. The memo reports that 469 adult gamers played on game consoles (e.g.,
Xbox, PlayStation and Wii) while 947 teen gamers played on game consoles. Determine
the 99% confidence interval for the difference in proportion of game console users between
adult gamers versus teen gamers.

8. A researcher is interested in comparing the number of hours of television watched each day
by six-year-old and ten-year-old children. Suppose the daily number of hours of television
watched by both groups is normally distributed. A random sample of children in
each age group produced the following data:

6-year-olds        10-year-olds
1.5   2.5          2.0   1.8
0.5   1.5          1.8   2.5
2.0   3.3          2.5   3.0
1.2   0.8          0.5   4.5
2.8   2.0          3.5
0.7   3.5          2.2
2.0   1.5          2.0
1.8   2.1          3.5

(a) Use the above sample data to test (at α = .05) the null hypothesis that the average
number of daily hours spent watching television by both age groups is equal.

(b) Determine the p-value for the above hypothesis test.

(c) Use the above sample data to test whether the average daily hours spent watching
television for ten-year-olds is at least 15 minutes more than for six-year-olds. Use a
level of significance of α = .10.

9. (Based on Exercise 8.75 from textbook). An association of Christmas tree growers in
Indiana sponsored a survey of Indiana households to help improve the marketing of
Christmas trees. Specifically, survey respondents who had a tree during the holiday season
were asked whether the tree was natural or artificial. Respondents were also asked if they
lived in an urban area or in a rural area. Of the 421 households displaying a Christmas tree,
160 lived in rural areas. Among the rural households 64 had natural trees; among the urban
households 89 had natural trees. The tree growers want to know if there is a difference in
preference for natural trees versus artificial trees between urban and rural households.

(a) Use the above sample data to test the null hypothesis that there is no difference in
Christmas tree preference between urban and rural areas in Indiana. Please use a level
of significance of α = .10.

(b) Explain how the sample is being used to estimate both the population mean and
population standard deviation in the above hypothesis test.

(c) Determine the p-value for the above hypothesis test.



10. Looking for more practice exercises on the material covered during the last few weeks?
This is a main focus of our (recommended) course textbook. The following table breaks
down which sections of the textbook have practice exercises in the various topics we have
recently covered. Should you attempt to solve any of the (literally hundreds of) textbook
exercises, your instructor is willing to look over your attempted solutions.

Section of Textbook   Concept(s) Covered

6.1    Non-proportion confidence intervals where population standard deviation is known
6.2    Non-proportion hypothesis tests where population standard deviation is known
6.4    Type I and Type II errors for hypothesis tests
7.1    Non-proportion, one-population confidence intervals where population standard deviation is unknown
       Non-proportion, one-population hypothesis tests where population standard deviation is unknown
7.2    Non-proportion, two-population confidence intervals
       Non-proportion, two-population hypothesis tests
8.1    Proportion problem, one-population confidence intervals
       Proportion problem, one-population hypothesis tests
8.2    Proportion problem, two-population confidence intervals
       Proportion problem, two-population hypothesis tests


Handout #83
Econ 102A Statistical Methods for Social Scientists Page 1 of 5

Weeks 9 and 10 Practice Exercises – Solutions

1. Guitar Hero and Rock Band Exercise

(a) The number of people who do not currently play an instrument is 7056 / 2 = 3528. Of
these, 2/3 (or 2352 people) claim that they anticipate playing a real instrument within
the next two years. The 99% confidence interval on the proportion is therefore:

$$.6667 \pm 2.576\sqrt{\frac{(.6667)(.3333)}{3528}} = .6667 \pm .0204 \;\Rightarrow\; [.6463 \text{ up to } .6871]$$

(b) The number of people who do not currently play an instrument is now 7056 / 4 = 1764.
Of these, 2/3 (or 1176 people) claim that they anticipate playing a real instrument
within the next two years. The 99% confidence interval on the proportion is therefore:

$$.6667 \pm 2.576\sqrt{\frac{(.6667)(.3333)}{1764}} = .6667 \pm .0289 \;\Rightarrow\; [.6378 \text{ up to } .6956]$$

(c) The number of people who do not currently play an instrument is again 7056 / 2 = 3528.
Now, 3/8 (or 1323 people) claim that they anticipate playing a real instrument within
the next two years. The 95% confidence interval on the proportion is therefore:

$$.375 \pm 1.96\sqrt{\frac{(.375)(.625)}{3528}} = .375 \pm .016 \;\Rightarrow\; [.359 \text{ up to } .391]$$
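A minimal sketch of the three intervals in this exercise, assuming Python 3.8+ (for `statistics.NormalDist`). `proportion_ci` is a hypothetical helper, not a textbook or Excel function. It uses the exact fractions 2/3 and 3/8 rather than .6667, so the last printed digit can differ from the hand calculation by one unit.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Large-sample confidence interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. about 2.576 for 99%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Parts (a), (b) and (c): (label, sample proportion, n, confidence level)
for label, p_hat, n, conf in [("a", 2 / 3, 7056 // 2, 0.99),
                              ("b", 2 / 3, 7056 // 4, 0.99),
                              ("c", 3 / 8, 7056 // 2, 0.95)]:
    lo, hi = proportion_ci(p_hat, n, conf)
    print(f"({label}) [{lo:.4f}, {hi:.4f}]")
```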

2. Water Pollution Exercise

(a) The hypotheses are given by H0: μ ≥ 72.5 and Ha: μ < 72.5.

(b) Upon entering the data in an Excel spreadsheet, we find x̄ = 65.4 and s = 19.82. The
corresponding t-statistic is:
$$t = \frac{65.4 - 72.5}{19.82/\sqrt{20}} = -1.60.$$

In terms of the number of standard deviations, the critical value for this lower-tailed test is
–TINV(.10, 19) = – 1.729 (Excel's TINV is two-tailed, so a probability of .10 leaves 5% in the
lower tail). Since – 1.60 > – 1.729, we do not have sufficient evidence at the 5%
level to claim that the Ozark Highland streams have an IBI index below the national
average.

(c) The p-value is given by TDIST(1.60, 19, 1) = 6.30%. Notice that 6.30% > 5% which
substantiates the conclusion found in part (b).
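As a cross-check on parts (b) and (c), the sketch below recomputes the t-statistic and one-tailed p-value without Excel. Because the Python standard library has no t-distribution, it assumes a small hand-written helper (`t_upper_tail`, hypothetical) that integrates the t density numerically with Simpson's rule; `scipy.stats.t.sf` would be the usual library alternative if SciPy is installed. The unrounded t-statistic is about –1.602, so the p-value comes out marginally below the 6.30% obtained from t = 1.60.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t), integrating the density from 0 to |t| with Simpson's rule."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    h = t / steps                      # steps is even, as Simpson's rule requires
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


ibi = [47, 61, 39, 59, 72, 76, 85, 89, 74, 89,
       33, 32, 46, 80, 80, 53, 78, 43, 88, 84]
n = len(ibi)
x_bar, s = mean(ibi), stdev(ibi)           # stdev uses the n - 1 divisor, like Excel's STDEV
t_stat = (x_bar - 72.5) / (s / sqrt(n))
p_value = 1 - t_upper_tail(t_stat, n - 1)  # lower-tailed: P(T <= t_stat)
print(f"t = {t_stat:.3f}, one-tailed p-value = {p_value:.4f}")
```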

3. Automobile Mileage Exercise

(a) The statistic of interest here is the difference between the computer’s reading and our
manual calculation. The computer’s reading could differ from our manual calculation by
either underestimating or overestimating miles-per-gallon. So, to test the hypothesis we
allow a rejection region in both tails of the distribution. That is, the hypotheses are given
by H0: Computer reading – Manual calculation = 0 and Ha: Computer reading – Manual
calculation ≠ 0. Or, in other words, H0: Difference in readings = 0 and Ha: Difference in
readings ≠ 0.

(b) Upon entering the data in an Excel spreadsheet, we find the average difference in the
readings as x̄ = 1.98 with standard deviation s = 2.784. The corresponding t-statistic is:
$$t = \frac{1.98 - 0}{2.784/\sqrt{15}} = 2.754.$$

In terms of the number of standard deviations, the critical value for the test is TINV(.02,
14) = 2.624. Since 2.754 > 2.624 we reject the null at the 2% level.

(c) The p-value is given by TDIST(2.754, 14, 2) = 1.55%. Notice that 1.55% < 2% which
substantiates the conclusion found in part (b).
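A short sketch of the paired-difference calculation in part (b), using Python's `statistics` module (whose `stdev` uses the n – 1 divisor, matching Excel's STDEV). The critical value 2.624 is simply copied from the solution rather than computed, since the standard library has no inverse t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Computer reading minus manual calculation at each of the 15 fill-ups
diffs = [5.0, 6.5, -0.6, 1.7, -2.3, 4.5, 4.0, 2.2,
         4.9, -1.9, 4.4, 0.1, -1.0, 1.1, 1.1]

d_bar = mean(diffs)
s_d = stdev(diffs)
t_stat = (d_bar - 0) / (s_d / sqrt(len(diffs)))   # H0: mean difference = 0

critical = 2.624   # TINV(.02, 14), copied from the solution (two-tailed, alpha = .02)
print(f"t = {t_stat:.3f}; reject H0: {abs(t_stat) > critical}")
```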

4. Appliance Exercise

(a) Here we have a ‘proportion problem’ hypothesis test. The proportion of damaged
washers and dryers in the sample is 4/60 = 6.67%. The z-statistic for the sample is:
$$z = \frac{.0667 - .02}{\sqrt{(.02)(.98)/60}} = 2.584.$$

In terms of the number of standard deviations, the critical value for the test is
NORMSINV(.95) = 1.645. Since 2.584 > 1.645 we reject the null at the 5% level.
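A minimal sketch of part (a) in Python 3.8+. The standard deviation in the denominator uses the hypothesized p = .02, not the sample proportion. Without rounding 4/60 to .0667 the statistic is about 2.58 rather than 2.584, which does not change the conclusion.

```python
from math import sqrt
from statistics import NormalDist

damaged, n = 4, 60        # 4 of the 60 inspected machines had scratches or dents
p0, alpha = 0.02, 0.05    # H0: p <= .02, tested at the 5% level

p_hat = damaged / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # SD uses the hypothesized p0, not p_hat
z_crit = NormalDist().inv_cdf(1 - alpha)     # upper-tail critical value, like NORMSINV(.95)
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {z > z_crit}")
```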

(b) We might wonder if damage among washers and dryers is independent. For instance,
perhaps a truck transports multiple washers and dryers to the warehouse in a single
hauling. Then, if the truck has an accident, hits a few potholes on the road, or has a
rocky journey due to inclement weather then all the appliances on the truck may show
damage. Even if only the single washer is transported along with the single dryer it is
matched with, there may not be independence between damage to these two separate
appliances for similar reasons.

5. Mail-Order Catalog Exercise

(a) A Type I error means that the new catalog does not increase mean sales but we
wrongly conclude that it does. A Type II error means that the new catalog does
increase mean sales but we wrongly conclude that it does not.

(b) The z-statistic which defines the rejection region is:
$$z = \frac{28 - 25}{50/\sqrt{900}} = 1.80.$$
Therefore, the probability in the rejection region is 1 – NORMSDIST(1.80) = 3.59%.
So, the probability of rejecting the null when it is indeed true = P(Type I error) =
3.59%.

(c) Here, we consider the chance of rejecting the null hypothesis when, in actuality,
μ = 29. From part (b), we will reject the null hypothesis whenever x̄ ≥ 28. The
corresponding z-statistic is given by:
$$z = \frac{28 - 29}{50/\sqrt{900}} = -0.60.$$
The probability of obtaining a sample mean of at least 28 when μ = 29 is then
1 – NORMSDIST(– 0.60) = .7257. Therefore, the power of the test at μ = 29 is .7257.
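Parts (b) and (c) both ask for P(x̄ ≥ 28) under a particular value of μ, so one small function covers both. This sketch assumes Python 3.8+ and uses `statistics.NormalDist` with standard deviation σ/√n = 50/30; `prob_reject` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 50, 900        # per-customer SD (dollars) and sample size
mu0, cutoff = 25, 28      # boundary of H0 and the rejection rule x_bar >= 28
se = sigma / sqrt(n)      # standard deviation of x_bar: 50 / 30


def prob_reject(mu):
    """P(x_bar >= cutoff) when the true mean sales per customer equal mu."""
    return 1 - NormalDist(mu, se).cdf(cutoff)


type_one = prob_reject(mu0)   # P(Type I error) at the boundary mu = 25
power_29 = prob_reject(29)    # power of the test when mu = 29
print(f"P(Type I error) = {type_one:.4f}, power at mu = 29: {power_29:.4f}")
```

Evaluating `prob_reject` over a range of μ values would trace out the full power curve for this rejection rule.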

6. Willingness to Pay Exercise

(a) We have a confidence interval on the difference between two populations. The point
estimate is given by x̄1 – x̄2 = $34.32 – $27.68 = $6.64. The standard error is given
by:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{(14.85)^2}{58} + \frac{(8.13)^2}{76}} = 2.161$$

In order to determine the degrees of freedom, we appeal to the Satterthwaite
approximation:
$$\text{degrees of freedom} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

which renders a value of 82.77 in this instance. Therefore, the number of standard
deviations for our confidence interval is TINV(.05, 82.77) = 1.989. Putting everything
together, the required confidence interval is:
6.64 ± 1.989 (2.161) = 6.64 ± 4.30 ⇒ [2.34 up to 10.94]
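The standard error and Satterthwaite degrees of freedom in part (a) can be checked with the sketch below (plain Python, hypothetical helper `welch_se_and_df`). The t multiplier 1.989 is copied from the solution; note that Excel's TINV truncates a non-integer df such as 82.77 to 82, which makes no visible difference at this precision.

```python
from math import sqrt


def welch_se_and_df(s1, n1, s2, n2):
    """Unpooled standard error and Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


se, df = welch_se_and_df(14.85, 58, 8.13, 76)   # Group 1 (sad), Group 2 (neutral)
diff = 34.32 - 27.68                            # point estimate x_bar1 - x_bar2
t_crit = 1.989                                  # TINV(.05, 82.77), copied from the solution
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"SE = {se:.3f}, df = {df:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```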

(b) Yes. The confidence interval above lies entirely above $0, so every plausible value for the
difference favors the group shown the sad video. Even with 99% confidence the lower bound
would still be $0.94 > $0, suggesting strong evidence that the sad group's mean
willingness-to-pay exceeds that of the neutral group.

7. Gamers Exercise

We have a confidence interval on the difference between two populations for a proportion
problem setting. The proportion from the adult gamers sample is 469/868 = 54% while the
proportion from the teen gamers sample is 947/1064 = 89%. The point estimate on the
difference in proportions is given by p̂teen – p̂adult = .89 – .54 = .35. The standard error is
given by:
$$\sqrt{\frac{\hat{p}_{teen}(1-\hat{p}_{teen})}{n_{teen}} + \frac{\hat{p}_{adult}(1-\hat{p}_{adult})}{n_{adult}}} = \sqrt{\frac{(.89)(.11)}{1064} + \frac{(.54)(.46)}{868}} = .0194$$

Since we use the z-statistic for proportion problems, we take #SD = 2.576. Putting
everything together, the required confidence interval is:

.35 ± 2.576 (.0194) = .35 ± .05 ⇒ [.30 up to .40]
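A sketch of the same 99% interval in Python 3.8+, computing the proportions from the raw counts (so they are not rounded to .89 and .54 first). The interval still rounds to [.30 up to .40].

```python
from math import sqrt
from statistics import NormalDist

teen_yes, teen_n = 947, 1064    # teen gamers who play on consoles
adult_yes, adult_n = 469, 868   # adult gamers who play on consoles

p_teen, p_adult = teen_yes / teen_n, adult_yes / adult_n
se = sqrt(p_teen * (1 - p_teen) / teen_n + p_adult * (1 - p_adult) / adult_n)
z = NormalDist().inv_cdf(0.995)  # two-sided 99% multiplier, about 2.576
diff = p_teen - p_adult
lower, upper = diff - z * se, diff + z * se
print(f"difference = {diff:.3f}, 99% CI = [{lower:.3f}, {upper:.3f}]")
```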

8. Television Viewing Exercise

(a) We have a hypothesis test on the difference between two populations. Let Population 1
= the 6 year olds and Population 2 = the 10 year olds. Then, plugging the sample data
into an Excel spreadsheet, we have n1 = 16, x̄1 = 1.86, s1 = 0.869, n2 = 12, x̄2 = 2.48
and s2 = 1.036. We wish to test the null hypothesis, H0: μ1 – μ2 = 0, at a 5% level of
significance.

The t-statistic is given by:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{-0.62 - 0}{\sqrt{\dfrac{(0.869)^2}{16} + \dfrac{(1.036)^2}{12}}} = -1.677.$$

In order to determine the degrees of freedom, we again appeal to the Satterthwaite
approximation (as in the previous exercise). In this instance, the approximation
calculates as df = 21.32. Then, in terms of the number of standard deviations, the
critical values for this two-tailed test are ±TINV(.05, 21.32) = ±2.08. Since – 1.677 > – 2.08,
we do not have sufficient evidence to reject the null hypothesis at the 5% level.

(b) The p-value is given by TDIST(1.677, 21.32, 2) = 10.84%. Notice that 10.84% > 5%
which substantiates the conclusion found in part (a).
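The sketch below redoes parts (a) and (b) from the raw data in the table (Python standard library only). Since the standard library has no t-distribution, it assumes a small hypothetical helper, `t_upper_tail`, that integrates the t density with Simpson's rule and accepts a non-integer df. Two caveats when comparing with the handout: using unrounded sample means gives t ≈ –1.70 rather than –1.677 (the handout rounds x̄1 – x̄2 to –0.62), and Excel's TDIST truncates df = 21.32 to 21. The two-sided p-value therefore lands at roughly 10.5% instead of 10.84%, with the same conclusion at the 5% level.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on [0, |t|] (steps must be even)."""
    if t < 0:
        return 1 - t_upper_tail(-t, df, steps)
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


six = [1.5, 2.5, 0.5, 1.5, 2.0, 3.3, 1.2, 0.8,
       2.8, 2.0, 0.7, 3.5, 2.0, 1.5, 1.8, 2.1]   # Population 1: six-year-olds
ten = [2.0, 1.8, 1.8, 2.5, 2.5, 3.0, 0.5, 4.5,
       3.5, 2.2, 2.0, 3.5]                         # Population 2: ten-year-olds

n1, n2 = len(six), len(ten)
v1, v2 = stdev(six) ** 2 / n1, stdev(ten) ** 2 / n2               # s^2 / n for each sample
se = sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Satterthwaite
t_stat = (mean(six) - mean(ten) - 0) / se                          # H0: mu1 - mu2 = 0
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)
print(f"t = {t_stat:.3f}, df = {df:.2f}, two-sided p = {p_two_sided:.4f}")
```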

(c) We wish to test the null hypothesis, H0: μ1 – μ2 ≥ – 0.25 and Ha: μ1 – μ2 < – 0.25, at a
10% level of significance.
The t-statistic is given by:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{-0.62 - (-0.25)}{\sqrt{\dfrac{(0.869)^2}{16} + \dfrac{(1.036)^2}{12}}} = -1.00.$$

The degrees of freedom are still given by df = 21.32. Then, in terms of the number of
standard deviations, the critical value for this lower-tailed test is –TINV(.20, 21.32) = – 1.323.
Since – 1.00 > – 1.323, we do not have sufficient evidence to reject the null hypothesis at the
10% level.
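A minimal sketch of part (c), reusing the rounded summary statistics from part (a) so that it reproduces the handout's numbers. The only change from a test of no difference is that the hypothesized value δ0 = –0.25 (that is, –15 minutes) is subtracted in the numerator; the critical value is copied from the solution.

```python
from math import sqrt

# Rounded summary statistics quoted in part (a)
x1, s1, n1 = 1.86, 0.869, 16    # six-year-olds (Population 1)
x2, s2, n2 = 2.48, 1.036, 12    # ten-year-olds (Population 2)
delta0 = -0.25                  # hypothesized mu1 - mu2 at the boundary of H0

se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t_stat = ((x1 - x2) - delta0) / se
critical = -1.323               # -TINV(.20, 21.32), copied from the solution
reject = t_stat < critical      # lower-tailed test: Ha is mu1 - mu2 < -0.25
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```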

9. Christmas Tree Exercise

(a) We have a two-tailed hypothesis test on the difference between two populations in a
proportion problem setting. Let Population 1 = people who live in rural areas and
Population 2 = people who live in urban areas. The sample proportion of natural tree
owners from the rural population is 64/160 = 40% while the sample proportion of
natural tree owners from the urban population is 89/261 = 34.1%. The point estimate
on the difference in proportions is given by p̂1 – p̂2 = .40 – .341 = .059.

Overall, the z-statistic is given by:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.059}{\sqrt{(.3634)(.6366)\left(\dfrac{1}{160} + \dfrac{1}{261}\right)}} = 1.22$$

Since we use the z-statistic for proportion problems, the number of standard deviations
for the critical value using α = 10% is NORMSINV(.95) = 1.645. Since 1.22 < 1.645
we do not have sufficient evidence to reject the null hypothesis at the 10% level.

(b) The null hypothesis is given by H0: p1 – p2 = 0 or, equivalently, H0: p1 = p2. This
implies that both populations have not only the same population mean but also the same
population standard deviation (since standard deviation in a proportion problem only
depends on the population mean, p). Upon aggregating the samples, the best estimate
for the (common) population mean is p̄ = (64 + 89) / 421 = .3634. Therefore, the estimate
for the population standard deviation is √(p̄(1 – p̄)) = √((.3634)(.6366)), which was used
in the denominator of the z-statistic calculated above.

(c) The p-value is given by 2*[1 – NORMSDIST(1.22)] = 22.25%. Notice that 22.25% >
10% which substantiates the conclusion found in part (a).
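Parts (a) and (c) can be reproduced with the pooled-proportion z-test sketched below (Python 3.8+, standard library only). Working from the raw counts gives z ≈ 1.222 and a two-sided p-value of roughly 22.2%, slightly different from 22.25% only because the handout rounds z to 1.22 before looking up the normal probability.

```python
from math import sqrt
from statistics import NormalDist

rural_yes, rural_n = 64, 160          # Population 1: rural households
urban_yes, urban_n = 89, 421 - 160    # Population 2: urban households (261)

p1, p2 = rural_yes / rural_n, urban_yes / urban_n
p_bar = (rural_yes + urban_yes) / (rural_n + urban_n)   # pooled estimate under H0: p1 = p2
se = sqrt(p_bar * (1 - p_bar) * (1 / rural_n + 1 / urban_n))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed
z_crit = NormalDist().inv_cdf(0.95)            # alpha = .10 split across two tails
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {abs(z) > z_crit}")
```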
