Unit 22: Sampling Distributions
Summary of Video
If we know an entire population, then we can compute population parameters such as the
population mean or standard deviation. However, we generally don’t have access to data
from the entire population and must base our information about a population on a sample.
From samples, we compute statistics such as sample means or sample standard deviations.
However, if we resample, chances are good that we won’t get the same results.
This video begins with a population of heights from students in a third grade class at Monica
Ros School. A graphic display of the population distribution of heights shows a roughly normal
shape with a mean µ = 53.4 inches and standard deviation σ = 1.8 inches (See Figure 22.1.).
Next, we draw random samples of size four from the class and record the heights. Figure 22.2
shows the results from five samples along with their sample means, which can be found in
Table 22.1. Notice that the sample means vary from sample to sample, except for Samples 3
and 4 where the sample means match even though the data values differ.
We can keep sampling until we’ve selected all samples of size four from this population of
20 students. If we plot the sample means of all possible samples of size four, we get what is
called the sampling distribution of the sample mean (See bottom graph in Figure 22.3.).
Sample   Sample Mean, x
1        53.00
2        52.25
3        52.75
4        52.75
5        53.25
Table 22.1. Sample means.
Now, compare the sampling distribution of x to the population distribution. Notice that
both distributions are approximately normal with mean 53.4 inches. However, the sampling
distribution of x is not as spread out as the population distribution.
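The full enumeration is small enough to carry out directly. The sketch below uses hypothetical heights (the actual classroom data are not reproduced in the text) to list all C(20, 4) = 4845 possible samples and confirm that the sample means center on the population mean but with less spread:

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical heights (inches) for a class of 20 third graders;
# the real classroom data are not reproduced in the text.
heights = [50, 51, 51, 52, 52, 52, 53, 53, 53, 53,
           54, 54, 54, 54, 55, 55, 55, 56, 56, 57]

pop_mean = mean(heights)
pop_sd = pstdev(heights)

# Every possible sample of size 4: C(20, 4) = 4845 samples.
sample_means = [mean(s) for s in combinations(heights, 4)]

print(len(sample_means))                                 # 4845
print(round(mean(sample_means), 2), round(pop_mean, 2))  # same center
print(round(pstdev(sample_means), 2), round(pop_sd, 2))  # narrower spread
```

The mean of all 4845 sample means equals the population mean exactly, while their standard deviation is well below the population standard deviation.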
Next, we put what we have learned about the sampling distribution of the sample mean to
use in the context of manufacturing circuit boards. Although the scene depicted in the video is
one that you don’t see much anymore in the United States, we can still explore how statistics
can be used to help control quality in manufacturing. A key part of the manufacturing process
of circuit boards is when the components on the board are connected together by passing
it through a bath of molten solder. After boards have passed through the soldering bath, an
inspector randomly selects boards for a quality check. A score of 100 is the standard, but there
is variation in the scores. The goal of the quality control process is to detect if this variation
starts drifting out of the acceptable range, which would suggest that there is a problem with
the soldering bath.
Based on historical data collected when the soldering process was in control, the quality
scores have a normal distribution with mean 100 and standard deviation 4. The inspector’s
random sampling of boards consists of samples of size five. Hence, the sampling distribution
of x is normal with a mean of 100 and standard deviation of 4/√5 ≈ 1.79. The inspector uses
this information to make an x control chart, a plot of the values of x against time. A normal
curve showing the sampling distribution of x has been added to the side of the control chart.
Recall from the 68-95-99.7% rule that we expect 99.7% of the scores to be within three
standard deviations of the mean. So, we have added control limits that are three standard
deviations (3 × 1.79 or 5.37 units) on either side of the mean (See Figure 22.4.). A point outside
either of the control limits is evidence that the process has become more variable, or that its
mean has shifted – in other words, that it’s gone out of control. As soon as an inspector sees a
point such as the one outside the upper control limit in Figure 22.4, it’s a signal to ask, what’s
gone wrong? (For more information on control charts, see Unit 23, Control Charts.)
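The control limits described here take only a line or two to compute; a quick sketch using the in-control values from the text (µ = 100, σ = 4, n = 5):

```python
from math import sqrt

mu, sigma, n = 100, 4, 5        # in-control mean, SD, and sample size

sigma_xbar = sigma / sqrt(n)    # SD of the sample mean: 4/sqrt(5) ≈ 1.79
lcl = mu - 3 * sigma_xbar       # lower control limit
ucl = mu + 3 * sigma_xbar       # upper control limit

print(round(lcl, 2), round(ucl, 2))   # 94.63 105.37
```

Any sample mean outside roughly 94.63 to 105.37 would signal that the soldering process may be out of control.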
So far we’ve been looking at population distributions that follow a roughly normal curve. Next,
we look at a distribution of lengths of calls coming into the Mayor’s 24 Hour Hotline call center
in Boston, Massachusetts. Most calls are relatively brief but a few last a very long time. The
shape of the call-length distribution is skewed to the right as shown in Figure 22.5.
To gain insight into the sampling distribution of the sample mean, x , for samples of size 10,
we randomly selected 40 samples of size 10 and made a histogram of the sample means.
We repeated this process for samples of size 20 and then again for samples of size 60. The
histograms of the sample means appear in Figure 22.6.
Now let’s compare our sampling distributions (Figure 22.6) with the population distribution
(Figure 22.5). Notice that the spread of all the sampling distributions is smaller than the spread
of the population distribution. Furthermore, as the sample size n increases, the spread of the
sampling distributions decreases and their shape becomes more symmetric. By the time
n = 60, the sampling distribution appears approximately normally distributed. What we have
uncovered here is one of the most powerful tools statisticians possess, called the Central Limit
Theorem. This states that, regardless of the shape of the population, the sampling distribution
of the sample mean will be approximately normal if the sample size is sufficiently large. It is
because of the Central Limit Theorem that statisticians can generalize from sample data to the
larger population. We will be seeing applications of the Central Limit Theorem in later units on
confidence intervals and significance tests.
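The resampling experiment is easy to reproduce. The sketch below simulates right-skewed call lengths as exponential with mean 180 seconds (an assumption; the actual hotline data are not reproduced here) and shows the spread of the sample means shrinking as n grows:

```python
import random
from statistics import mean, pstdev

random.seed(1)

def sample_mean(n):
    """Mean of n simulated call lengths (exponential, mean 180 s)."""
    return mean(random.expovariate(1 / 180) for _ in range(n))

# 40 samples for each sample size, as in the video.
for n in (10, 20, 60):
    means = [sample_mean(n) for _ in range(40)]
    print(n, round(mean(means)), round(pstdev(means)))
```

Each run will differ, but the pattern matches Figure 22.6: the centers stay near the population mean while the spread falls roughly in proportion to 1/√n.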
C. Know that the sampling distributions of some common statistics are approximately normally
distributed; in particular, the sample mean x of a simple random sample drawn from a normal
population has a normal distribution.
D. Know that the standard deviation of the sampling distribution of x depends on both the
standard deviation of the population from which the sample was drawn and the sample size n.
E. Grasp a key concept of statistical process control: Monitor the process rather than examine
all of the products; all processes have variation; we want to distinguish the natural variation of
the process from the added variation that shows that the process has been disturbed.
F. Make an x control chart. Use the 68-95-99.7% rule and the sampling distribution of x to
help identify if a process is out of control.
G. Be familiar with the Central Limit Theorem: the sample mean x of a large number of
observations has an approximately normal distribution even when the distribution of individual
observations is not normal.
If repeated random samples are chosen from the same population, the values of sample
statistics such as x will vary from sample to sample. This variation follows a regular pattern
in the long run; the sampling distribution is the distribution of values of the statistic in a very
large number of samples. For example, suppose we start with data from the population
distribution shown in Figure 22.7. This population is skewed to the right, and clearly not
normally distributed.
Now, we draw a random sample of size 50 from this population and compute two statistics,
the mean and the median, and get 20.7 and 19.8, respectively. Next we take another sample
of size 50 and compute the mean and median for that sample. We keep resampling until we
have a total of 1000 samples. Histograms of the 1000 means and 1000 medians from those
samples appear in Figures 22.8 and 22.9, respectively. In both cases, the sampling distribution
of the statistic appears approximately normally distributed. The sampling distribution of the
sample mean, x , is centered around 24 and the sampling distribution of the sample median at
around 22.
Figure 22.8. Distribution of the sample mean from 1000 samples of size 50.
Figure 22.9. Distribution of the sample median from 1000 samples of size 50.
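The simulation behind Figures 22.8 and 22.9 can be sketched in a few lines; the skewed population here is a simulated stand-in (exponential with mean 8) for the distribution in Figure 22.7:

```python
import random
from statistics import mean, median

random.seed(0)

# A right-skewed stand-in population for Figure 22.7.
population = [random.expovariate(1 / 8) for _ in range(10000)]

means, medians = [], []
for _ in range(1000):                     # 1000 samples of size 50
    sample = random.sample(population, 50)
    means.append(mean(sample))
    medians.append(median(sample))

# Both sampling distributions come out roughly normal; the mean's center
# sits above the median's because the population is skewed right.
print(round(mean(means), 1), round(mean(medians), 1))
```

Histograms of `means` and `medians` would reproduce the bell shapes of Figures 22.8 and 22.9.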
Although basic statistics such as the sample mean, sample median and sample standard
deviation all have sampling distributions, the remainder of this unit will focus on the sampling
distribution of the sample mean, x . If x is the mean of a simple random sample of size n from
a population with mean µ and standard deviation σ, then the mean and standard deviation of
the sampling distribution of x are:
µx = µ
σx = σ/√n
Control charts for the sample mean x provide an immediate application for the sampling
distribution of x . In the 1920s, Walter Shewhart of Bell Laboratories noticed that production
workers were readjusting their machines in response to every variation in the product. If the
diameter of a shaft, for example, was a bit small, the machine was adjusted to cut a larger
diameter. When the next shaft was a bit large, the machine was adjusted to cut smaller. Any
process has some variation, so this constant adjustment did nothing except increase variation.
Shewhart wanted to give workers a way to distinguish between the natural variation in the
process and the extraordinary variation that shows that the process has been disturbed and
hence, actually requires adjustment.
The result was the Shewhart x control chart. The basic idea is that the distribution of the
sample mean x is close to normal if either the sample size is large or individual measurements are
normally distributed. So, almost all the x -values lie within ±3 standard deviations of the mean.
The correct standard deviation here is the standard deviation of x , which is σ/√n (where σ is
the standard deviation of individual measurements). So, the control limits µ ± 3σ/√n contain
the range in which sample means can be expected to vary if the process remains stable. The
control limits distinguish natural variation from excessive variation.
If x is the mean of a simple random sample (SRS) of size n from a population having mean µ
and standard deviation σ, then the mean and standard deviation of x are:
µx = µ
σx = σ/√n
If a population has a normal distribution with mean µ and standard deviation σ, then the
sampling distribution of the sample mean, x , of n independent observations has a normal
distribution with mean µ and standard deviation σ/√n.
If the population is not normal but n is large (say n > 30), then the Central Limit Theorem tells
us that the sampling distribution of the sample mean, x , of n independent observations
has an approximately normal distribution with mean µ and standard deviation σ/√n.
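Both statements use the same two formulas, which can be collected in a small helper (the function name is our own choice):

```python
from math import sqrt

def xbar_distribution(mu, sigma, n):
    """Mean and standard deviation of the sampling distribution of x-bar
    for an SRS of size n from a population with mean mu and SD sigma."""
    return mu, sigma / sqrt(n)

# The solder-quality example from earlier in the unit:
m, s = xbar_distribution(100, 4, 5)
print(m, round(s, 2))   # 100 1.79
```

Whether the population is normal (any n) or non-normal (large n), these are the center and spread of the approximating normal distribution.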
2. Does statistical process control inspect all the items produced after they are finished?
3. The inspector samples five circuit boards at regular intervals and finds the mean solder
quality score x for these five boards. Do we expect x to be exactly 100 if the soldering
process is functioning properly?
4. If the quality of individual boards varies according to a normal distribution with mean µ = 100
and standard deviation σ = 4 , what will be the distribution of the sample averages, x ?
(Recall the sample size is n = 5.)
5. In general, is the mean of several observations more or less variable than single
observations from a population? Explain.
1. Your instructor has a container filled with numbered strips as shown in Table 22.2. Make a
histogram of this distribution. Describe its shape.
2. You will need 100 samples of size 9. Your instructor will provide instructions for gathering
these samples. After the data have been collected, you will need a copy of the table of results
before you can answer parts (a) and (b).
a. Find the sample mean for each of the samples. Record the sample means in the results
table. (Save your results table. You will need this table again for the activity in Unit 24,
Confidence Intervals.)
b. To get an idea of the characteristics of the sampling distribution for the sample mean, make
a histogram of the sample means. (Use the same scaling on the horizontal axis that you used
in question 1.) Compare the shape, center and spread of the sampling distribution to that of the
original distribution (question 1).
3. A population has a uniform distribution with density curve as shown in Figure 22.10.
Figure 22.10. Density curve of the uniform distribution on the interval from 0 to 1.
a. Your instructor will give you directions for using technology to generate 100 samples of size
9 from this distribution.
b. Once you have your 100 samples, find the sample means.
c. Make a histogram of the 100 sample means. Describe the shape of your histogram.
Compare the center of this sampling distribution with the center of the population distribution
from Figure 22.10.
b. Why do you think the laboratory reported a result based on the mean of three weighings?
2. The scores of students on the ACT college entrance examination in a recent year had the
normal distribution with mean µ = 18.6 and standard deviation σ = 5.9 .
a. What fraction of all individual students who take the test have scores 21 or higher?
b. Suppose we choose 55 students at random from all who took the test nationally. What is the
distribution of average scores, x , in a sample of size 55? In what fraction of such samples will
the average score be 21 or higher?
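A numerical check of parts (a) and (b) needs only the standard normal CDF, which can be built from the error function (this is a sketch of the calculation, not a provided answer key):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 18.6, 5.9

# (a) fraction of individual students scoring 21 or higher
print(round(1 - phi((21 - mu) / sigma), 4))

# (b) the sample mean of 55 students has SD sigma/sqrt(55),
#     so the same cutoff now sits about 3 SDs above the mean
print(round(1 - phi((21 - mu) / (sigma / sqrt(55))), 4))
```

The contrast is the point of the exercise: a score of 21 is unremarkable for one student but very unusual as an average of 55 students.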
3. The number of accidents per week at a hazardous intersection varies with mean 2.2 and
standard deviation 1.4. This number, x, takes only whole-number values, and so is certainly
not normally distributed.
a. Let x be the mean number of accidents per week at the intersection during a year (52
weeks). What is the approximate distribution of x according to the Central Limit Theorem?
b. What is the approximate probability that, on average, there are fewer than two accidents per
week over a year?
c. What is the approximate probability that there are fewer than 100 accidents at the
intersection in a year? (Hint: Restate this event in terms of x .)
4. A company produces a liquid that can vary in its pH levels unless the production process is
carefully controlled. Quality control technicians routinely monitor the pH of the liquid. When the
process is in control, the pH of the liquid varies according to a normal distribution with mean
µ = 6.0 and standard deviation σ = 0.9.
c. Make an x chart by plotting the sample means versus the sample number. Draw horizontal
reference lines at the mean and lower and upper control limits.
d. Do any of the sample means fall below the lower control limit or above the upper control
limit? This is one indication that a process is “out of control.”
e. Apart from sample means falling outside the lower and upper control limits, is there any
other reason why you might be suspicious that this process is either out of control or going out
of control? Explain.
a. If the process is in control, what percentage of the bottle caps would have diameters outside
the chemical manufacturer’s specification limits?
b. The manufacturer of the bottle caps has instituted a quality control program to prevent the
production of defective caps. As part of its quality control program, the manufacturer measures
the diameters of a random sample of n = 9 bottle caps each hour and calculates the sample
mean diameter. If the process is in control, what is the distribution of the sample mean x ? Be
sure to specify both the mean and standard deviation of x ’s distribution.
c. The cap manufacturer has a rule that the process will be stopped and inspected any time
the sample mean falls below 0.499 inch or above 0.501 inch. If the process is in control, find
the proportion of times it will be stopped for inspection.
2. A study of rush-hour traffic in San Francisco records the number of people in each car
entering a freeway at a suburban interchange. Suppose that this number, x, has mean 1.5
and standard deviation 0.75 in the population of all cars that enter at this interchange during
rush hours.
b. Traffic engineers estimate that the capacity of the interchange is 700 cars per hour.
According to the Central Limit Theorem, what is the approximate distribution of the mean
number of persons, x , per car in 700 randomly selected cars at this interchange?
c. What is the probability that 700 cars will carry more than 1075 people? (Hint: Restate the
problem in terms of the average number of people per car.)
a. Let x be the sample mean from 10 randomly selected calls. What is the mean and
standard deviation of x ? What, if anything, can you say about the shape of the distribution of
x ? Explain.
b. Let x be the sample mean from 100 randomly selected calls. What is the mean and
standard deviation of x ? What, if anything, can you say about the shape of the distribution of
x ? Explain.
c. In a random sample of 100 calls from the call center, what is the probability that the average
length of these calls will be over 2 minutes?
Summary of Video
Statistical inference is a powerful tool. Using relatively small amounts of sample data we can
figure out something about the larger population as a whole. Many businesses rely on this
principle to improve their products and services. Management theorist and statistician W.
Edwards Deming was among the first to champion the idea of statistical process management.
Initially, Deming found the most receptive audience to his management theories in Japan.
After World War II, Japanese industry was shattered. Rebuilding was a daunting challenge,
one that Japanese business leaders took on with great determination. In the decades after the
war, they transformed the phrase “Made in Japan” from a sign of inferior, cheaply-made goods
to a sign of quality respected the world over. Deming’s emphasis on long-term thinking and
continuous process improvement was vital in bringing about the so-called “Japanese Miracle.”
At first, Deming’s ideas were not as well received in America. Deming criticized American
managers for their lack of understanding of statistics. But as time went on – and competition
from Japan grew – companies in the U.S. began to embrace Deming’s ideas on statistical
process control. Now his principles of total quality management are an integral part of
American business, helping workers uncover problems and produce higher quality goods
and services.
In statistics, a process is a chain of steps that turns inputs into outputs. A process could be
anything from the way a factory turns raw iron into a finished bolt to the way you turn raw
ingredients into a hot dinner. Statisticians say a process that is running smoothly, with its
variables staying within an expected range, is in control. Deming was adamant that statistics
could help in understanding a manufacturing process and identifying its problems, or when
things were out of control or about to go out of control. He advocated the use of control charts
as a way to monitor whether a process is in or out of control. This technique is widely used to
this day as we’ll see in the video in a visit to Quest Diagnostics’ lab.
Quest performs medical tests for healthcare providers. So, for example, at Quest a patient’s
blood sample is the input of the process and the test result is the output. A courier picks
up specimens and transports them to the processing lab, where they are sorted by time of
arrival and urgency of test. Technicians verify each specimen and confirm the doctor’s orders.
Then the specimens are barcoded and are ready to be passed on for testing. Quest’s Seattle
Quest needed to know where the process stood at present: How close were they to hitting
the 2 a.m. target and how much did finish times vary? Keep in mind that all processes have
variation. Common cause variation is due to the day-to-day factors that influence the process.
In Quest’s case, it could be things like a printer running out of paper and needing to be refilled,
or a worker calling in sick. It is the normal variation in a system.
Processes are also susceptible to special cause variation – that’s when sudden, unpredictable
events throw a wrench into the process. Examples of special cause variation would be
blackouts that shut down the lab’s power, or a major crash on the highway that would keep the
samples from being delivered to the lab. Quest needed to figure out how their process was
running on a day-to-day basis when they were only up against common cause variation.
Quest used six months of finish-time data to set up control limits and then created a control
chart, which is a graphic way to keep track of variation in finish times. Figure 23.1 shows a
control chart for month 1. The center line is the target finish time. The control limits at 12:00
a.m. and 4:00 a.m. are set three standard deviations above and below the center line. The
data points are the finish times that Quest tracked over a one-month period.
Quest assumed that their nightly finish times are normally distributed. In Figure 23.2, we add a
graph of the normal distribution to the control chart. Remember, in a normal distribution 68% of
your data is within one standard deviation of the mean, 95% is within two standard deviations,
and 99.7% is within three standard deviations.
Using the control chart Quest was able to figure out when their process had been disturbed
and gone out of control, or was heading that way. One dead giveaway that the finish times
are out of control is if a point falls outside the control limits. That should only happen 0.3% of
the time if everything is running smoothly. Take a look at Figure 23.3, which highlights what
happened toward the end of the one-month cycle.
There are other indicators that something suspicious might be going on besides points falling
outside the control limits. For example, if too many points are on one side of the center line or
if a strong pattern emerges (hence, the variability is not random) – then it’s time to investigate.
Mapping finish times on the control chart helps monitor the process, and alerts techs right
away that something has been disturbed. Then they can track down and address the
cause immediately.
Another way the control chart helped Quest improve efficiency was by revealing some of
the causes of variation in the process, which the team could then address. Quest actually
C. Know how to construct a run chart and describe patterns/trends in data over time.
D. Know how to construct an x chart and describe the changes in sample means over time.
Content Overview
Consider the problem of quality control in the manufacturing process of turning ingots of
silicon into polished wafers used to make microchips. (See Figure 23.4.) Assume that the
manufacturer wants the polished wafers to have consistent thickness with a target thickness
of 0.5 millimeters. A sample of 50 polished wafers is selected as a batch is being produced.
Table 23.1 contains these data.
0.555 0.543 0.533 0.538 0.533 0.529 0.526 0.522 0.518 0.519
0.516 0.515 0.513 0.515 0.512 0.510 0.508 0.507 0.507 0.507
0.506 0.506 0.506 0.505 0.503 0.502 0.500 0.498 0.499 0.496
0.497 0.493 0.492 0.491 0.487 0.488 0.486 0.485 0.483 0.484
0.482 0.479 0.476 0.476 0.474 0.471 0.471 0.469 0.454 0.447
Table 23.1. Wafer thickness from sample of 50 polished wafers.
In order to gain a sense of the distribution of wafer thickness, a quality control technician
constructs the histogram shown in Figure 23.5.
Figure 23.5. Histogram of wafer thickness (mm) from the sample in Table 23.1.
The histogram indicates that the distribution of wafer thickness is approximately normal. The
sample mean is 0.50064, which is pretty close to the target value. Furthermore, the standard
deviation is 0.02227, which is relatively small compared to the mean. The analysis thus far
supports the conclusion that the process is in control.
The sample mean and standard deviation together with the histogram provide information on
the overall pattern of the sample data. However, there is more to quality control than simply
studying the overall pattern. Manufacturers also keep track of the run order, the order in which
the data are collected. For the data in Table 23.1, the run order may relate to which part of the
ingot – top, middle, or bottom – the wafers came from, or it may relate to the order in which
wafers were fed through the grinding and polishing machines. If a process is stable or in
control, the order in which data are collected, or the time in which they are processed, should
not affect the thickness of polished wafers. One way to check that the production processes of
polished wafers are in control is by creating a run chart.
A run chart is a scatterplot of the data versus the run order. To help visualize patterns over
time, the dots in the scatterplot are usually connected. Table 23.1 lists the data values in the
order they were collected, starting with the first row 0.555, 0.543, . . . , 0.519, followed by the
second row, third row, fourth row and ending with 0.447, the last entry in the fifth row. So, the
run order for 0.555 is 1, for 0.543 is 2, and so forth until you get to the run order for 0.447,
which is 50. Figure 23.6 shows the run chart for the wafer thickness data. A center line has
been drawn on the chart at the target thickness of 0.5 millimeters.
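Reading Table 23.1 row by row gives the run order directly; a short sketch pairing each thickness with its run order and confirming the downward drift:

```python
from statistics import mean

# Table 23.1, read row by row: run order 1 is 0.555, run order 50 is 0.447.
rows = [
    [0.555, 0.543, 0.533, 0.538, 0.533, 0.529, 0.526, 0.522, 0.518, 0.519],
    [0.516, 0.515, 0.513, 0.515, 0.512, 0.510, 0.508, 0.507, 0.507, 0.507],
    [0.506, 0.506, 0.506, 0.505, 0.503, 0.502, 0.500, 0.498, 0.499, 0.496],
    [0.497, 0.493, 0.492, 0.491, 0.487, 0.488, 0.486, 0.485, 0.483, 0.484],
    [0.482, 0.479, 0.476, 0.476, 0.474, 0.471, 0.471, 0.469, 0.454, 0.447],
]
thickness = [t for row in rows for t in row]
run_chart = list(enumerate(thickness, start=1))   # (run order, thickness)

# The drift is visible without a plot: the first ten wafers average
# noticeably thicker than the last ten.
print(round(mean(thickness[:10]), 4), round(mean(thickness[-10:]), 4))
```

Plotting `run_chart` with connected dots would reproduce Figure 23.6.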
Figure 23.6. Run chart for wafer thickness data from Table 23.1.
Even though the overall pattern of the data gave no indication that there were any problems
with the grinding and polishing processes, it is clear from the run chart in Figure 23.6 that the
thickness of polished wafers is decreasing over time. The process needs to be stopped so that
adjustments can be made to the grinding and polishing operations.
The run chart involved plotting individual data values over time (run order). Another approach
is to select samples from batches produced over regular time intervals. For example, a quality
control plan for the polished wafers might call for routine collection of a sample of n polished
wafers from batches produced each hour. The thickness of each wafer in the sample is
recorded and the mean thickness, x , is calculated. The information on mean thickness can be
used to determine if the process is out of control at a particular time and to track changes in
the process over time.
Suppose when the grinding and polishing processes are in control, the distribution of the
individual wafers can be described by a normal distribution with mean µ = 0.5 millimeters and
standard deviation σ = 0.02 millimeters (similar to the data pattern in Figure 23.5). From Unit
22, Sampling Distributions, we know that under this condition the hourly sample means, x ,
based on samples of size n are normally distributed with the following mean and standard
deviation:
µx = µ = 0.5 millimeters
σx = σ/√n = 0.02/√n millimeters
Each hour a technician collects samples of four polished wafers, measures their thickness,
records the values, and then calculates the sample mean. Suppose that the data in Table 23.2
come from samples collected over an eight-hour period.
With n = 4, the standard deviation of x is 0.02/√4 = 0.01 millimeters. From the 68-95-99.7%
rule, we expect:
68% of the x values to be within the interval 0.5 mm ± 0.01 mm, or between 0.49 mm and
0.51 mm.
95% of the x values to be within the interval 0.5 mm ± 2(0.01) mm, or between 0.48 mm and
0.52 mm.
99.7% of the x values to be within the interval 0.5 mm ± 3(0.01) mm, or between 0.47 mm and
0.53 mm.
Next, we make an x chart, which is a scatterplot of the sample means versus the sample
order. We draw a reference line at μ = 0.5 called the center line. We use the values from the
68-95-99.7% Rule to provide additional reference lines in our x chart. The lower and upper
endpoints on the 99.7% interval are called the lower control limit (LCL) and upper control limit
(UCL), respectively. Figure 23.7 shows the completed x chart.
Figure 23.7. x chart of sample mean thickness versus sample number.
The x chart in Figure 23.7 does not appear to indicate any problems that warrant stopping the
grinding or polishing processes to make adjustments. All of the points except one fall within
one σ/√n of the mean, in other words, fall between the reference lines corresponding to
0.49 and 0.51. However, as we add additional points, we will need some guidelines – a set of
decision rules – that tell us when the process is going out of control. The decision rules below
are based on a set of rules developed by the Western Electric Company. Although they are
widely used, they are not the only set of decision rules.
Decision Rules:
The following rules identify a process that is becoming unstable or is out of control. If any
of the rules apply, then the process should be stopped and adjusted (or the problem fixed)
before resuming production.
Rule 1: Any single data point falls below the LCL or above the UCL.
Rule 2: Two of three consecutive points fall beyond the 2σ/√n limit, on the same side of
the center line.
Rule 3: Four out of five consecutive points fall beyond the σ/√n limit, on the same side of
the center line.
Rule 4: A run of 9 consecutive points (in other words, nine consecutive points on the same
side of the center line).
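These four rules are mechanical enough to code directly. The sketch below scans a sequence of sample means and reports the first rule that fires (the function name and windowing details are our own choices):

```python
def first_rule_violated(means, mu, sd_xbar):
    """Scan sample means in order; return the number of the first
    decision rule that fires, or None if the process looks in control."""
    for i, x in enumerate(means):
        # Rule 1: a single point beyond the 3-sigma control limits.
        if abs(x - mu) > 3 * sd_xbar:
            return 1
        # Rule 2: two of the last three points beyond 2 sigma, same side.
        last3 = means[max(0, i - 2):i + 1]
        if (sum(v > mu + 2 * sd_xbar for v in last3) >= 2
                or sum(v < mu - 2 * sd_xbar for v in last3) >= 2):
            return 2
        # Rule 3: four of the last five points beyond 1 sigma, same side.
        last5 = means[max(0, i - 4):i + 1]
        if (sum(v > mu + sd_xbar for v in last5) >= 4
                or sum(v < mu - sd_xbar for v in last5) >= 4):
            return 3
        # Rule 4: nine consecutive points on the same side of the center line.
        last9 = means[max(0, i - 8):i + 1]
        if len(last9) == 9 and (all(v > mu for v in last9)
                                or all(v < mu for v in last9)):
            return 4
    return None

# Two consecutive means above the 2-sigma line (0.52) trigger Rule 2,
# echoing what happens with Samples 10 and 11 in the wafer example.
print(first_rule_violated([0.50, 0.505, 0.523, 0.524], mu=0.5, sd_xbar=0.01))  # 2
```

In practice such a check would run each time a new sample mean is added to the chart.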
Figure 23.8 shows the updated x chart that includes the means from the seven
additional samples.
Now, we apply the decision rules. This time, we find that Rule 2 applies. Data points
associated with Samples 10 and 11 fall above 0.52 (which, in this case, is above the 2σ/√n
limit). According to Rule 2 the process should be stopped after observing Sample 11’s
x -value.
The x chart monitors one statistic, the sample mean, over time. The x chart is only one
type of control chart. As mentioned earlier, the manufacturer is also interested in producing a
consistent product. So, instead of tracking the sample mean, the quality control plan could also
track the sample standard deviations, or the sample ranges over time. More generally, control
charts are scatterplots of sample statistics (or individual data values) versus sample order and
are commonly used tools in statistical process control.
When a process is running smoothly, with its variables staying within an acceptable range,
the process is in control. When the process becomes unstable or its variables are no longer
within an acceptable range, the process is out of control.
A run chart is a scatterplot of the data values versus the order in which these values are
collected. The chart displays process performance over time. Patterns and trends can be
spotted and then investigated.
Control charts are used to monitor the output of a process. The charts are designed to
signal when the process has been disturbed so that it is out of control. Control charts rely on
samples taken over regular intervals. Sample statistics (for example, mean, standard deviation,
range) are calculated for each sample. A control chart is a scatterplot of a sample statistic (the
quality characteristic) versus the sample number. Figure 23.9 shows a generic control chart.
Figure 23.9. A generic control chart: the quality characteristic plotted against sample number, with center line, UCL, and LCL.
The center line on a control chart is generally the target value or the mean of the quality
characteristic when the process is in control. The upper control limit (UCL) and lower
control limit (LCL) on a control chart are generally set ±3σ/√n from the center line.
Decision rules consist of a set of rules used to identify when a process is becoming unstable
or going out of control. Decision rules help quality control managers decide when to stop the
process in order to fix problems or make adjustments.
5. In Quest’s control chart, how did they determine where to set the upper and lower
control limits?
For this activity, you will play the role of a semiconductor quality control manager in charge of
monitoring the thickness of polished wafers. Open the Control Chart tool from the Interactive
Tools menu. You will be working with x charts. The activity questions follow the list of steps
below.
Step 1: Select a set of values for the mean µ and standard deviation σ. You have three
possible choices for each of these parameters.
For now, you will work through the construction of at least two control charts with your
selection. In the real world, these values would be determined from past data collected when
the process was known to be in control.
Step 2: Since you are in charge of the quality control plan, decide on the sample size n you
would like to use for monitoring the process. You have three choices: 5, 10, or 20.
Keep in mind the following: The more wafers you sample, the more time it will take, and the
more it will cost. On the other hand, with larger samples, results are more precise.
Step 3: Begin in Step-By-Step mode. In this mode, you will get feedback immediately after
each decision that you make. If you make a mistake, you will be told to start over and will need
to click the “Start Over” button. Once you feel confident about your decisions, you can change
to Continuous mode.
Step 4: Calculate the lower control limit to four decimals and enter its value in the box for LCL.
Calculate the upper control limit to four decimals and enter its value in the box for
UCL. Click the “Change Control Limits” button.
If your calculations are correct, control lines will appear in the x̄ chart. In Step-By-Step mode,
you will get feedback (see bottom of screen) if you have made a mistake. The feedback will
say: Recalculate control limit values. To correct the error, enter new values for LCL and UCL
and then click the “Change Control Limits” button.
Step 5: Click on the “Collect Sample Data” button. The data will appear in a column under
the heading Thickness (mm) near the top of your screen. To calculate x̄, click the
“Calculate Mean” button. The mean will appear underneath the column.
Step 7: Make a decision. Your possible decisions are: (1) Continue Process, which means that
you have decided the process is in control; or (2) Stop Process, which means that you
have decided to shut down the process for adjustments or inspection.
Step 8: Repeat steps 5 – 7 until one of the following three things happens:
(1) You decide to continue and get the following feedback: Process is not in control. It should
be stopped immediately. In this case, click the “Start Over” button at the top of the screen.
(2) You decide to continue and get the following feedback: Good decision. In this case,
continue constructing the control chart.
(3) After 25 samples, it will be time for routine maintenance even if the process is still in
control. At this time, you can proceed to the next question. Click the “Start Over” button to
do so.
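The monitoring loop in Steps 5 – 8 can also be simulated outside the tool. The sketch below, in Python, uses illustrative in-control settings (µ = 1.0, σ = 0.06, n = 4, taken from the wafer example later in this unit) and only the simplest decision rule: stop when x̄ falls outside the control limits.

```python
import math
import random

random.seed(1)                          # reproducible illustration
mu, sigma, n = 1.0, 0.06, 4             # illustrative in-control settings
se = sigma / math.sqrt(n)
lcl, ucl = mu - 3 * se, mu + 3 * se

for sample_num in range(1, 26):         # up to 25 samples, as in Step 8
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    if not lcl <= xbar <= ucl:          # simplest out-of-control signal
        print(f"Sample {sample_num}: x-bar = {xbar:.4f} -> Stop Process")
        break
else:
    print("25 samples in control -> time for routine maintenance")
```

Real decision rules also watch for runs and trends inside the limits; this sketch signals only on points beyond ±3σ/√n.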
1. Work through Steps 1 – 8 using the Control Chart tool. Complete one control chart
successfully. Make a sketch of your chart (or do a screen capture and paste the screen
capture into a Word document). If the process was stopped before 25 samples were selected,
state which of the decision rules applies.
2. Use the same settings as you did for question 1. Rework question 1.
After you have successfully completed two control charts in Step-by-Step mode, you are ready
to move on to question 3.
3. Change the settings for µ, σ, and n. Choose Continuous mode. Allow the process to
continue until you think it needs to be stopped. After clicking the “Stop Process” button, you
will receive feedback.
a. What settings did you choose? What were the values of the upper and lower control limits?
b. Make a sketch of your control chart or save a screen capture of your control chart into a
Word document.
d. If your feedback indicated that you made a correct choice to stop the process, state the
rule that made you decide it was time to stop the process. If your feedback indicated that you
should have stopped the process sooner, state the sample number for when you should have
stopped the process and the rule that applies.
4. Select new settings for µ and σ (it is up to you if you also want to change n).
Repeat question 3 and make another control chart.
Assume that these data are recorded in the order they were collected beginning with the first
row 99, . . . 100, followed by the second row 100, . . . 100.
a. Make a run chart for these data. Leave room on the horizontal axis to expand the run orders
out to 30. (You will be adding 15 more data points in part (c).) Draw a reference line for the
target resistance (100 ohms) and for the tolerance interval (these can serve as control limits).
b. Based on your run chart in (a), is there any evidence that the process is out of control?
Support your answer.
c. The quality control inspector continued collecting data on the resistors. Results from an
additional 15 data values, in the order values were collected, are recorded below:
Use these data to complete the run chart in (a) for run orders from 1 – 30.
d. Based on the completed run chart in (c) is there any evidence that the manufacturing
process is out of control? Support your answer.
b. The manufacturer of the bottle caps has instituted a quality control program to prevent the
production of defective caps. As part of its quality control program, the manufacturer measures
the diameters of a random sample of n = 9 bottle caps each hour and then calculates the
sample mean diameter. If the process is in control, what is the distribution of the sample mean
x̄? Be sure to specify both the mean and standard deviation of x̄’s distribution.
c. The cap manufacturer has a rule that the process will be stopped and inspected any time
the sample mean falls below 0.499 inch or above 0.501 inch. If the process is in control, find
the proportion of times it will be stopped during inspection periods.
3. For each of the x̄ charts in Figures 23.10 – 23.12, decide whether or not the process is in
control. If the process is out of control, state which decision rule applies. Justify your answer.
(Note that reference lines at one, two, and three σ/√n on either side of the mean have been
drawn on the control charts.)
a. Control Chart (Figure 23.10): sample mean versus sample number.
b. Control Chart (Figure 23.11): sample mean versus sample number.
c. Control Chart (Figure 23.12): sample mean versus sample number.
4. A company produces a liquid which can vary in its pH levels unless the production process
is carefully controlled. Quality control technicians routinely monitor the pH of the liquid. When
the process is in control, the pH of the liquid varies according to a normal distribution with
mean µ = 6.0 and standard deviation σ = 0.9 .
a. The quality control plan calls for collecting samples of size three from batches produced
each hour. Using n = 3, calculate the lower control limit (LCL) and upper control limit (UCL).
b. Samples collected over a 24-hour time period appear in Table 23.4. Compute the sample
means for each of the 24 samples and add the results to a copy of Table 23.4.
c. Make an x̄ chart. Add reference lines including lines for the lower and upper control limits.
d. Based on the control chart you drew for (c), decide whether or not the process is in control.
If not, state which of the decision rules applies.
Day Number 1 2 3 4 5 6 7 8 9 10
Number of Duplicates 2 1 0 2 12 14 17 15 25 20
Day Number 11 12 13 14 15 16 17 18 19 20
Number of Duplicates 24 27 22 24 26 20 22 5 2 0
Table 23.5. Duplicate e-mail messages per day.
b. Draw a run chart of the duplicate e-mail data. Add the mean number of duplicates as a
reference centerline.
c. Nine or more consecutive data points on the same side of a center line can signal a special
cause variation. Does the run chart from (b) signal a special cause variation?
2. A quality control inspector at a company that manufactures valve linings monitors the mass
of the linings. When the process is in control, the mean mass is µ = 240.0 grams and standard
deviation σ = 0.4 gram . The inspector randomly selects a valve liner from batches produced
each hour and records its mass. The mass (in grams) of 25 valve liners are displayed in
Table 23.6 on the next page.
a. Make a histogram for mass of valve liners from Table 23.6. For the first class interval,
use 239.0 grams to 239.2 grams. Based on the histogram is there any evidence that the
manufacturing process is not in control? Explain.
b. Make a run chart for the mass of valve liners. Add a reference center line at µ. Add lower
and upper control limits at µ ± 3σ .
c. Does the run chart show any changes in the distribution of valve-liner mass over time?
Explain.
3. One process in the production of integrated circuits involves chemical etching of a layer of
silicon dioxide until the metal beneath is reached. The company closely monitors the thickness
of the silicon dioxide layers because thicker layers require longer etching times. The target
thickness is 1 micrometer (µm) and has a standard deviation of 0.06 micrometers (based on
past data when the process was in control). The company uses samples of four wafers. An x̄
chart based on 40 consecutive samples appears in Figure 23.13 on the next page.
Figure 23.13. An x̄ chart of sample mean thickness versus sample number, with a center line at 1.0 µm and reference lines at one, two, and three σ/√n on either side of it (L1, L2, LCL below; U1, U2, UCL above).
a. Calculate the appropriate control limits (the values of the reference lines drawn in Figure
23.13). Round the values to two decimals.
b. Decide whether or not the process is in control. If not, explain which decision rule applies
and identify the sample number after which the process should be shut down for adjustments.
4. The company referred to in exercise 4 has two plant lines that produce the liquid. Data from
the second line appears in Table 23.7. When the process is in control, the pH of the liquid varies
according to a normal distribution with mean µ = 6.0 and standard deviation σ = 0.9 . The
quality control plan calls for collecting samples of size three from batches produced each hour.
Sample pH level
1 7.2 7.4 7.4
2 6.9 6.6 6.5
3 6.2 6.3 6.3
4 6.8 6.4 6.5
5 6.5 6.6 6.7
6 6.8 6.8 6.8
7 6.2 6.3 6.4
8 5.6 5.7 5.9
9 4.9 5.8 5.6
10 6.4 6.0 4.4
11 6.9 5.3 6.2
Continued on the next page...
b. Construct an x̄ chart for the pH samples from the second plant line. Include reference lines
marking the center line and one, two, and three σ/√n on either side of the center line.
c. Based on the control chart from (b), does the process appear to be in control? If not, which
decision rule applies and what appears to be the problem?
Summary of Video
This video is an introduction to inference, which means we use information from a sample to
infer something about a population. For example, we might use a sample statistic to estimate a
population parameter. Suppose we wanted to know a man’s mean blood pressure. A sample of
blood pressure readings is shown in Table 24.1.
        Su    M    T    W    Th   F    Sa
A.M.   130  120  140  125  130  130  140
P.M.   125  130  145  140  125  135  110
Table 24.1. Systolic blood pressure readings.
We could estimate his mean blood pressure using the sample mean from these readings,
x̄ = 130. But how trustworthy is our conclusion given that different samples could lead to
different results, some higher and others lower? Statisticians address this issue by calculating
confidence intervals. Rather than a single number like 130, we can compute a range of values
along with a confidence level for that range.
Next, the context switches from blood pressure to the length of life of batteries. Because
companies promise specific battery lifetimes and improved performance over a competitor,
they need proof before ads promoting their product go on the air. At Kodak’s Ultra
Technologies, technicians use rigorous testing and calculate confidence intervals to back up
their marketing claims. Here’s how the data are collected. Random samples of batteries are
pulled from the warehouse. The batteries are drained under controlled conditions and the time
it takes for them to run out of juice is recorded. From these data, Kodak has determined that
its population of AA batteries when used in a toy will last 7½ hours ± 20 minutes and that their
confidence in that range is 95%.
Now, we retrace Kodak’s steps to figure out how they came up with this interval. Before getting
started, we need to check that a few underlying assumptions are satisfied:
Selecting a random sample of batteries for the test takes care of the assumption of
independent observations. The second assumption is satisfied since the sample size of n = 40
is considered large. The last assumption is not reasonable in the real world, but for now, we’ll
assume that from past data we do know the population standard deviation, σ = 63.5 minutes.
The task is to calculate a confidence interval for μ, the mean life of Kodak’s AA batteries.
Our sample statistic x̄ is a point estimate for the parameter μ. If we include a margin of error
around our point estimate, we get an interval estimate of the form: x̄ ± margin of error.
From Unit 22, Sampling Distributions, we know that the sampling distribution of x̄ is normal,
with mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n. In this case, we are given σ = 63.5
minutes and we can compute σ_x̄ = 63.5/√40 minutes, or about 10 minutes. Think back to the
68-95-99.7% Rule. In any normal distribution, 95% of the observations lie within two standard
deviations of the mean. So, 95% of all possible samples result in battery-life data for which µ
is within plus or minus 20 minutes of that sample’s mean, x̄. In our example, x̄ = 450 minutes.
So, we can say with 95% confidence that μ lies within 20 minutes of x̄, giving us a confidence
interval from 430 minutes to 470 minutes. To say that we are 95% confident in our calculated
range of values means that we got the numbers using a method that gives correct results 95%
of the time over many, many examples.
What if Kodak were willing to settle for only 90% confidence? Or what if they insisted on 99%
confidence? We can get any confidence level that we want by turning to the standard normal
distribution and finding the z* critical value. Then just substitute the appropriate values into the
following formula:
x̄ ± z*(σ/√n)
Notice that the margin of error gets larger if we insist on higher confidence because z* will be
larger. On the other hand, the margin of error gets smaller if we take more observations so that
n is larger.
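The formula above is straightforward to compute. Here is a minimal sketch in Python applied to the battery example (x̄ = 450, σ = 63.5, n = 40); the function name is our own, and note that it uses the exact critical value z* ≈ 1.96 rather than the 2 from the 68-95-99.7% Rule, so the interval comes out slightly narrower than 430 to 470 minutes:

```python
from statistics import NormalDist
import math

def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval for mu when sigma is known: x-bar +/- z*(sigma / sqrt(n))."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # z* critical value
    moe = z * sigma / math.sqrt(n)             # margin of error
    return xbar - moe, xbar + moe

lo, hi = z_interval(450, 63.5, 40)             # battery-life example
print(round(lo, 1), round(hi, 1))              # 430.3 469.7
```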
B. Recognize that a useful estimate requires a measure of how accurate the estimate is.
C. Know that a confidence interval has two parts: an interval that gives the estimate and the
margin of error, and a confidence level that gives the likelihood that the method will produce
correct results in the long range.
D. Be able to assess whether the underlying assumptions for confidence intervals are
reasonably satisfied. Provided the underlying assumptions are satisfied, be able to calculate
a confidence interval for μ given the sample mean, sample size, and population standard
deviation.
E. Understand the tradeoff between confidence and margin of error in intervals based on the
same data.
F. Given a specific confidence level, recognize that increasing the size of the sample can give
a margin of error as small as desired.
The confidence level states the probability that the method will give a correct result. That is,
if you use 95% confidence intervals often, in the long run 95% of your intervals will contain the
true parameter value.
Suppose that a simple random sample of size n is drawn from a normally distributed
population having an unknown mean µ and known standard deviation σ. A level C (expressed
as a decimal) confidence interval for µ is
x̄ ± z*(σ/√n),
where z* is a cutoff point for the standard normal curve with area (1 – C)/2 to its right.
For example, if C = 0.95 (for a 95% confidence interval) then (1 – C)/2 = (1 – 0.95)/2 = 0.025.
In this case, z* turns out to be 1.96 as shown in Figure 24.1.
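The z* critical value for any confidence level C can be computed rather than looked up in a table. A minimal sketch in Python, using the inverse CDF of the standard normal distribution:

```python
from statistics import NormalDist

for C in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)   # area (1 - C)/2 to the right
    print(f"C = {C:.2f}: z* = {z_star:.3f}")
# C = 0.90: z* = 1.645
# C = 0.95: z* = 1.960
# C = 0.99: z* = 2.576
```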
Figure 24.1. Standard normal density curve with area 0.025 in each tail, cut off at z = -1.960 and z = 1.960.
If the sample size n is relatively small, we first need to check that the underlying assumption
of normality is reasonably satisfied before computing a confidence interval. One way to check
the assumption of normality is to make a normal quantile plot of the sample data. Alternatively,
a histogram or boxplot of the sample can reveal outliers or strong skewness.
The size of the margin of error controls the precision (width) of the confidence interval
estimate. Precision is increased as the margin of error shrinks. The margin of error of a
confidence interval decreases if any of the following occur: the confidence level C is
decreased (so that z* is smaller), the sample size n is increased, or the population standard
deviation σ is smaller.
In practice, the population standard deviation σ is not known and must be estimated
from the sample. If the sample size n is fairly large (say at least 30), then the value of the
sample standard deviation s should be close to σ. In that case, you can replace σ by s in
the confidence interval formula. (See Unit 26, Small Sample Inference for One Mean, for a
continued discussion of confidence intervals for µ when σ is unknown.)
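Since the margin of error is z*·σ/√n, solving for n gives the smallest sample size that achieves a target margin: n ≥ (z*σ/m)². A minimal sketch in Python (the function name is our own; the σ = 100, margin = 30 example mirrors the exam-score data in this unit's exercises):

```python
from statistics import NormalDist
import math

def sample_size(sigma, margin, conf=0.95):
    """Smallest n for which z* * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # z* critical value
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size(sigma=100, margin=30))       # 43
```

Raising the confidence level raises z* and therefore the required n.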
A confidence interval for a population parameter is an interval of plausible values for that
parameter. It is constructed so that the value of the parameter will be captured between the
endpoints of the interval with a chosen level of confidence. The confidence level is the
success rate of the method used to construct the confidence interval.
Many confidence intervals have the following form: point estimate ± margin of error. The
margin of error is the range of values above and below the point estimate.
A formula used to compute a confidence interval for µ when σ is known and either the
sample size n is large or the population distribution is normal is given by:
⎛ σ ⎞
x ±z*⎜ ,
⎝ n ⎟⎠
1. Why is a single blood pressure reading not sufficient if we want to estimate a person’s
average blood pressure?
In this activity, you will need the simulation data collected for question 2 in Unit 22’s activity.
Recall that samples of size 9 were drawn from an approximately normal distribution with
µ = 50 and standard deviation σ = 4. Assume for the moment that µ is unknown. You will be
using sample data to find confidence interval estimates for µ.
b. What is the margin of error for a 95% confidence interval for µ? (Round your answer to
two decimals.)
2. Your instructor has a container filled with numbered strips. Draw a sample of size 9.
a. Record the outcomes of your sample and calculate the sample mean, x̄.
c. In this case, the true value of µ is 50. Does your confidence interval contain the true
value of µ?
3. Your instructor should distribute a table of the results from 100 samples of size 9 generated
for Unit 22’s activity.
a. For each sample, calculate a 95% confidence interval estimate for μ and record the
endpoints of the interval.
b. Of the 100 samples collected, how many of the 95% confidence intervals contain the true
value of μ, which is 50? How many did you expect to contain 50? Is there a discrepancy
between the number you found and the number you expected to find? Explain how this
discrepancy could occur.
410 400 460 440 390 400 450 460 520 380
480 480 490 450 480 330 390 460 600 610
Assume that the standard deviation of scores for all juniors is σ = 100 .
a. Find the value of σ_x̄, the standard deviation of the sample mean in size-20 samples.
b. Check to see whether these data could be considered to come from a normally distributed
population. (The data need only be roughly normal – in other words, the data should have no
severe departures from normality.)
c. Let μ be the mean score that would be observed if every junior at Lincoln High took the
exam. Give a 95% confidence interval for μ. Show your calculations. How could you get a
smaller margin of error with the same confidence?
d. Give a 99% confidence interval for μ. Explain in plain language, to someone who knows no
statistics, why this interval is wider than your result in (c).
a. A random sample of 30 test results is given below. Use these results to determine a 95%
confidence interval for the mean MCAS math score, μ.
252 266 264 244 262 268 236 254 264 276
266 220 218 260 258 232 268 218 262 242
238 262 250 264 276 234 232 266 276 248
258 252 268 264 264 264 222 258 220 254
254 274 266 264 268 248 238 248 258 254
254 258 208 268 268 272 274 254 272 270
c. Compare the margin of errors for the confidence intervals in (a) and (b). Why would you
expect the margin of error based on 60 observations to be less than the margin of error based
on 30 observations?
d. Keeping the confidence level at 95%, how many observations would you need in order to
reduce the margin of error to under 3.0?
3. A city planner randomly selects 100 apartments in Boston, Massachusetts, to estimate the
mean living area per apartment. The sample yielded x̄ = 875 square feet with a standard
deviation s = 255 square feet.
a. Calculate a 95% confidence interval for μ, the mean living area per apartment. (Keep in
mind that since the sample size is large, s should be close to σ.)
b. Having found the interval in (a), can you say there is a 95% chance that the mean living
area is within the interval? Explain why or why not.
4. A random sample of 50 full-time, hourly wage workers between the ages of 20 and 40
was selected from participants in the 2012 March Supplement, which is part of the Current
Population Survey (a joint venture of the U.S. Bureau of Labor Statistics and Census Bureau).
The hourly rate (in dollars) of these workers is given below.
7.25 30.09 12.00 25.00 8.00 27.53 14.20 31.00 20.00 18.00
12.00 28.12 16.50 8.00 9.00 15.00 15.10 18.00 17.43 14.00
15.25 34.50 8.00 14.80 7.80 11.00 33.07 10.55 19.00 19.50
12.25 18.00 24.00 27.50 15.00 6.75 30.00 10.30 27.00 14.50
8.00 14.00 10.00 11.75 15.00 28.00 7.50 28.50 16.25 11.75
b. Calculate a 95% confidence interval for μ, the mean hourly wage of full-time, hourly wage
workers between the ages 20 and 40. Because the sample size is large, use s, the sample
standard deviation, in place of σ, the unknown population standard deviation.
c. A politician speaking around the time that the data for the 2012 March Supplement were
collected claimed that salaries were rising. He stated that the average hourly rate for full-time
workers between the ages of 20 and 40 was $20.00. Does your confidence interval from (b)
affirm or refute the politician’s claim? Explain.
d. After being confronted, the politician complained that we should have used a 99%
confidence interval to estimate the mean hourly wage. Compute a 99% confidence interval for
μ. Does the 99% confidence interval affirm his claim? Explain.
a. Because the sample is large, the observed sample standard deviation, 2.7, will be close
to the population standard deviation σ. Give a 95% confidence interval for the mean height of all
varsity basketball players, assuming that Julie’s observations are a random sample. Show your
calculations.
b. Do you think it is reasonable to take these 96 players as a random sample of all male varsity
basketball players? Why or why not?
2. A random sample of 36 skeletal remains from females was taken from data stored in the
Forensic Anthropology Data Bank (FDB) at the University of Tennessee. The femur lengths
(right leg) in millimeters are recorded below.
432 432 435 460 432 440 448 449 434 443
525 451 448 443 450 467 436 423 475 435
433 438 453 438 435 413 439 442 507 424
Since the sample size is large, we can use the sample standard deviation s in place of σ in
calculations of confidence intervals.
b. Before doing any calculations, think about 90%, 95%, and 99% confidence intervals for µ,
the mean femur bone length for women. Which of these intervals would be the widest? Which
would be the narrowest? Explain how you know without calculating the confidence intervals.
c. Calculate 90%, 95%, and 99% confidence intervals for µ, the mean femur bone length for
adult females. Do your results confirm your answer to (b)?
4054 3572 2636 3430 3118 3969 3628 3940 4536 4819
3883 3487 3827 3883 2749 3487 3855 4450 4309 3345
b. Determine a 95% confidence interval for the mean birth weights of babies born
in Massachusetts.
c. In the United States, we are more accustomed to reporting babies’ weights in ounces (or
even pounds and ounces) than grams. How would you modify the confidence interval to give a
confidence interval for the mean weight in ounces? Calculate that interval. (Use the following
conversion: 1 gram ≈ 0.03527 ounce.) Does your result seem reasonable?
4. How much can a single outlier affect a confidence interval? Suppose that the first
observation of 4054 grams in the random sample in question 3 had been 350 grams (the
weight of a baby that did not survive).
a. Make a boxplot of the modified data set to show that this low weight baby is an outlier.
b. Recalculate the 95% confidence interval based on the modified data. How much did the
outlier affect the confidence interval?
Final comment: Always look at your data before calculating confidence intervals.
Outliers can greatly affect your results.
Summary of Video
Sometimes, when you look at the outcome of a particular study, it can be hard to tell just
how noteworthy the results are. For example, if the severe injury and death rates due to car
crashes on one state’s roads have dropped from 4.7% down to 3.8% after enacting a seat
belt law, how would we know whether this result was due to the seat belt law or simply due to
chance variation?
To sort out whether results are due to chance or there is something else at work (such as
the enactment of the seat belt law), statisticians turn to a tool of inference called tests of
significance. Significance testing can be applied in a variety of situations. We next explore how
researchers used it to help solve a controversy in classic literature.
In 1985, scholar Gary Taylor made a surprising find while conducting research for a new
edition of the complete works of William Shakespeare. While going through a 17th century
anthology at the Bodleian Library at Oxford University, he came upon a sonnet he had never
seen before and it was attributed to William Shakespeare. Obviously, Taylor was excited about
his new find and wanted to include it in his new edition of The Complete Works.
This discovery caused quite a controversy – some scholars were thrilled by the discovery
but others didn’t think the poem was good enough to be one of Shakespeare’s. Statistics
to the rescue! A decade earlier, statistician Ron Thisted had done a statistical analysis of
Shakespeare’s vocabulary. Thisted’s program provided a detailed, numeric description of
Shakespeare’s vocabulary. For every work, Thisted could tell how many new words there
were that Shakespeare didn’t use anywhere else. Using this model, Thisted predicted that if
Shakespeare had written the poem in question, it would have 7 unique words in it. When they
ran the poem through the program, however, they found that there were 10 unique words. Did
this difference reflect random variation within Shakespeare’s writing? Or did it indicate that
Shakespeare was not the author? This is where significance testing (or tests of hypotheses)
can be helpful.
Thisted set up two opposing hypotheses: the null hypothesis, written as H0, that basically
means nothing unusual is happening; and the alternative hypothesis, the researchers’ point of
view.
The question was whether the discrepancy between the observed number of unique words,
10, and the predicted number of unique words, 7, was due to another author writing the poem
rather than to chance variation. Is that three-word difference a big difference? To answer
this question, Thisted assumed (based on his data) that the number of unique words in
Shakespeare’s poems had the approximately normal distribution with mean µ = 7 and standard
deviation σ = 2.6 shown in Figure 25.1.
The shaded area under the density curve in Figure 25.2 corresponds to the probability of a
number of unique words at least as extreme as 10 (in other words, a difference from 7 of 3 or
more words).
Using technology, we find that the shaded area is 2(0.1243) = 0.2486. Thus, Thisted
could expect to find a value at least as extreme as 10 unique words roughly 25% of the
time. Therefore, Thisted failed to find significant evidence against the null hypothesis that
Shakespeare wrote the poem. He could not reject H0. In the absence of literary or statistical
evidence against Shakespeare’s authorship, the poem was published in Taylor’s edition of The
Complete Works.
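Thisted's tail area can be reproduced directly from the normal model; a minimal sketch in Python, assuming the N(7, 2.6) distribution of unique-word counts described above:

```python
from statistics import NormalDist

words = NormalDist(mu=7, sigma=2.6)   # Thisted's model for unique words per poem
p = 2 * (1 - words.cdf(10))           # P(count at least as extreme as 10)
print(round(p, 2))                     # 0.25
```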
Since we want to work with sample means, let’s suppose researchers found a folio of five
new poems that were attributed to Shakespeare. Suppose that our sample mean from the five
poems in the folio is x̄ = 8.2. We want to know if, based on this evidence, we can conclude
that Shakespeare did not write these poems. We set up our null and alternative hypotheses:
H0 : µ = 7
Shakespeare wrote the poems.
Ha : µ ≠ 7
Someone else wrote the poems.
One thing to decide, when setting up a significance test, is whether to use a one-sided or
two-sided alternative hypothesis. In our Shakespeare example, we are using a two-sided
alternative hypothesis because a different author might consistently use either more or fewer
unique words than Shakespeare. But suppose we suspected the poem was written by a
particular author who was known to consistently use more unique words than Shakespeare?
We begin by assuming the null hypothesis is true. Then we find the probability of getting a
result at least as extreme as ours if the null hypothesis really is true. If these poems were
written by Shakespeare, then the distribution of x̄, the mean number of unique words per
poem in five poems, would have a normal distribution with the following mean and standard
deviation:
µ_x̄ = µ = 7
σ_x̄ = σ/√n = 2.6/√5 ≈ 1.163
Next, we need to find the probability that any sample of five of Shakespeare’s poems would
have an x̄ at least as far from 7 as what we observed from our sample, x̄ = 8.2. Figure 25.3
illustrates this probability. Notice that two areas are shaded because our alternative is
two-sided.
To calculate this probability from a standard normal table, we find the z-score for our observed
sample mean. This is called a z-test statistic:
z = (x̄ − µ0)/(σ/√n) = (8.2 − 7)/1.163 ≈ 1.03
So, the observed value of our test statistic z is 1.03, a little more than one standard deviation
away from the mean, 0, on the standard normal curve. The final step in our test of significance
is to find the probability of observing a value from a standard normal distribution that is at least
this extreme. This probability is called the p-value. To find this p-value, we use z = 1.03 and
look in the standard normal table (z-table). From Figure 25.4, we find that the area under the
standard normal curve to the left of 1.03 is 0.8485.
That means that 1 – 0.8485 or 0.1515 is the area in the right tail (the shaded region in
Figure 25.5). Since we chose a two-sided alternative, we double this value because we are
interested in the area under BOTH tails (the area to the right of 1.03 and the area to the left of
-1.03). Our final result gives a p-value of 0.303.
From the p-value, we know that there is a 30.3% chance that random variation would produce
a mean unique word count as far from 7 in either direction as 8.2. Since a 30.3% chance is a
pretty good chance, we have failed to disprove the null hypothesis. We have not found good
evidence against Shakespeare’s authorship of these new poems.
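The whole test can be carried out in a few lines. A minimal sketch in Python for the five-poem example (x̄ = 8.2, µ0 = 7, σ = 2.6, n = 5); using the exact CDF rather than a printed z-table gives a p-value that differs from 0.303 only in the third decimal:

```python
from statistics import NormalDist
import math

mu0, sigma, n, xbar = 7, 2.6, 5, 8.2
z = (xbar - mu0) / (sigma / math.sqrt(n))   # z-test statistic
p_value = 2 * (1 - NormalDist().cdf(z))     # two-sided p-value
print(round(z, 2), round(p_value, 2))       # 1.03 0.3
```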
This example helps illustrate the general rule about p-values: Small p-values give evidence
against the null hypothesis; large p-values fail to reject the null hypothesis. Since p-values
can range from the very small – close to zero – to the very large – close to one, researchers
need to decide when a p-value is small enough for them to reject the null hypothesis. One of
the most common levels is 0.05 or 5%. If something is statistically significant at the 5% level, it
means that the results produced a p-value less than 0.05. Another widely used level is 0.01 or
the 1% level.
B. Be able to formulate the null hypothesis and alternative hypothesis for tests about the mean
of a population. Understand that the alternative hypothesis is the researcher’s point of view.
C. Understand the concept of a p-value. Know that smaller p-values indicate stronger
evidence against the null hypothesis.
D. Be able to calculate p-values as areas under a normal curve in the setting of tests about the
mean of a normal population with known standard deviation.
The statement being tested in a test of significance is called the null hypothesis, written H0.
For example, H0 might state that a population parameter, such as the mean µ, takes a specific
value. Usually the null hypothesis is a statement of “no effect” or “no difference” or “status
quo.” The test of significance is designed to assess the strength of the evidence against the
null hypothesis and in favor of an alternative hypothesis Ha that represents the effect we hope
or suspect is true. (Ha is generally the researcher’s point of view.) The alternative hypothesis
might be that the parameter differs from its null value, in a specific direction (one-sided
alternative) or in either direction (two-sided alternative).
Suppose that we want to conduct a test about the mean of a population. More specifically,
suppose that we want to test that the mean has a specific value, which we’ll call µ0 , or that it
doesn’t have that value, or is smaller than that value, or larger than that value. We form two
opposing hypotheses – the null and alternative hypotheses – which we express symbolically
as follows (select one of the possible alternatives):
H0 : µ = µ0
Ha : µ ≠ µ0   or   Ha : µ > µ0   or   Ha : µ < µ0
To test the hypothesis H0 : µ = µ0 based on a random sample of size n from a population with
unknown mean µ and known standard deviation σ, we compute the sample mean x̄. Here’s a
recap of what we know about x̄:
• If H0 is true and the population is normal, then x̄ has the normal distribution with mean
µ0 and standard deviation σ/√n.
• Suppose instead that the population does not follow a normal distribution. If the
sample size n is large, we can apply the Central Limit Theorem and conclude that x̄ is
approximately normally distributed with mean µ0 and standard deviation σ/√n.
Now, we work through an example. Researchers studying the effects of smoking on sleep
believe that men who smoke need more sleep than what is average for men, which is 7.5
hours per night. Let μ be the mean number of hours of sleep for men who smoke. Assume that
the standard deviation is σ = 0.5 hours. The null and alternative hypotheses are:
H0 : µ = 7.5
Ha : µ > 7.5
Suppose a random sample of n = 50 male smokers gives a sample mean of x̄ = 7.7 hours.
The z-test statistic is:
z = (7.7 − 7.5)/(0.5/√50) ≈ 2.83
From the z-test statistic, we learn that the observed value of x̄ = 7.7 is 2.83 standard
deviations from the hypothesized mean from H0, µ = 7.5. If H0 is true, then z has the standard
normal distribution. Now, we are ready to evaluate the evidence against H0 – How likely would
it be to observe a value from the standard normal distribution that is at least as extreme as
2.83? The answer, around 0.2%, is illustrated in Figure 25.6. Around 0.2% is pretty unlikely.
So, in this case, we reject the null hypothesis and accept the alternative: Male smokers, on
average, need more sleep than men in general.
Figure 25.6. Standard normal density curve; the shaded area to the right of z = 2.83 is 0.002327.
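As a sketch of the same computation in Python (our own code; the numbers are taken from the sleep example), using the standard library’s NormalDist for the normal tail area:

```python
from math import sqrt
from statistics import NormalDist

# Hypothesized mean under H0, known sigma, and the sample results from the example.
mu0, sigma = 7.5, 0.5
n, xbar = 50, 7.7

# z-test statistic: how many standard errors xbar lies from mu0.
z = (xbar - mu0) / (sigma / sqrt(n))   # ≈ 2.83

# One-sided p-value for Ha: mu > 7.5 is the right-tail area.
p_value = 1 - NormalDist().cdf(z)      # ≈ 0.0023, about 0.2%

print(round(z, 2), round(p_value, 4))  # → 2.83 0.0023
```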
As we saw in the previous example, the distribution of the z-test statistic, under the assumption
that H0 is true, allows us to use the observed z-value to assess the evidence against H0. We
calculate the probability, assuming H0 is true, of observing a value from the standard normal
distribution as extreme or more extreme than the z-value we calculated – this probability is
called the p-value. Because there are three possible alternatives, there are three possibilities
for computing the p-value:
1. The p-value for a test of H0 against Ha : µ > µ0 is the probability of observing a value from
the standard normal distribution that is at least as large as the observed z-test statistic.
(See Figure 25.7 (1).)
2. The p-value for a test of H0 against Ha : µ < µ0 is the probability of observing a value from
the standard normal distribution that is at least as small as the observed z-test statistic.
(See Figure 25.7 (2).)
3. The p-value for a test of H0 against Ha : µ ≠ µ0 is the probability of observing a value from
the standard normal distribution that is at least as far from 0 (on either side of 0) as the
observed z-test statistic. (See Figure 25.7 (3).)
Figure 25.7. The p-value for each alternative: (1) the area to the right of the observed z;
(2) the area to the left of the observed z; (3) the area in both tails beyond the observed z
and its negative.
Sometimes we set a cutoff for the p-value, called the significance level. For example, if
the p-value is below 0.05 (p < 0.05), we say the results are significant at the 0.05 level, or the
5% level.
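The three cases can be collected into one small helper function. This is an illustrative Python sketch (the function name and the "alternative" labels are our own, not from the guide):

```python
from statistics import NormalDist

def p_value(z: float, alternative: str) -> float:
    """p-value for a z-test; `alternative` is 'greater', 'less', or 'two-sided'."""
    phi = NormalDist().cdf(z)
    if alternative == "greater":     # Ha: mu > mu0 -- right-tail area
        return 1 - phi
    if alternative == "less":        # Ha: mu < mu0 -- left-tail area
        return phi
    if alternative == "two-sided":   # Ha: mu != mu0 -- area in both tails
        return 2 * (1 - NormalDist().cdf(abs(z)))
    raise ValueError("unknown alternative")

# The two worked examples from this unit:
print(round(p_value(1.03, "two-sided"), 3))  # Shakespeare poem → 0.303
print(round(p_value(2.83, "greater"), 4))    # smokers' sleep → 0.0023
```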
The claim tested by a significance test is called the null hypothesis. Usually the null
hypothesis is a statement of “no effect” or “no change.” The claim that we are trying to
gather evidence for – the researcher’s point of view – is called the alternative hypothesis.
The alternative hypothesis is two-sided if it states that a parameter is different from the null
hypothesis value. The alternative hypothesis is one-sided if it states that either a parameter is
greater than or a parameter is less than the null hypothesis value.
A test statistic is a quantity computed from the sample data that measures the gap between
the null hypothesis and the sample data. A test statistic is used to make a decision between
the null and alternative hypotheses.
The p-value is the probability, computed under the assumption that the null hypothesis is
true, of observing a value of the test statistic at least as extreme as the one that was
actually observed.
The significance level of a test of hypotheses is the highest p-value for which we will reject
the null hypothesis.
A z-test statistic for testing H0 : µ = µ0, where µ is the population mean, is given by:
z = (x̄ − µ0)/(σ/√n)
The z-test is used in situations where the population standard deviation σ is known and either
the population has a normal distribution or the sample size n is large.
1. In the 1970s, statistician Ron Thisted did a statistical analysis of Shakespeare’s vocabulary.
Based on his analysis he created a computer program. What could his program tell you about
a Shakespearean poem?
2. In analyzing a poem to see whether or not it was authored by Shakespeare, Thisted set up
a null hypothesis and an alternative hypothesis. State those hypotheses in words.
3. What was the approximate distribution of the number of unique words per poem in
Shakespeare’s poems?
4. Thisted observed 10 unique words in the newly discovered poem. Was that sufficient
evidence to conclude that Shakespeare did not write the poem?
5. Which is better evidence against the null hypothesis, a large p-value or a small p-value?
Nabisco Chips Ahoy is a popular brand of chocolate chip cookie. In the 1980s, Nabisco ran
television ads claiming that their cookies had, on average, 16 chips per cookie. Since the
1980s many more brands of chocolate chip cookies have appeared on supermarket shelves,
which could have put pressure on Nabisco to improve its product perhaps by increasing the
amount of chips. On the other hand, the price of chocolate has increased, which could have
had the opposite effect. In this activity, you will test whether or not Nabisco could run the same
ad today.
1. Collect the data. Your instructor will provide directions and, after the data collection is
complete, distribute the data. (Save the data for use in Unit 27’s activity.)
2. Compute the mean and standard deviation of the number of chips per cookie.
b. Calculate the value of the z-test statistic. (Since the sample size is large, use s in
place of σ.)
4. Calculate a 95% confidence interval for µ. Does your confidence interval indicate that µ has
increased, decreased, or remained the same from its value in the 1980s?
a. Larry’s car averages 32 miles per gallon on the highway. He switches to a new motor oil that
is advertised as increasing gas mileage. After driving 3000 highway miles with the new oil, he
wants to determine if his gas mileage actually has increased.
b. A university gives credit in a French language course to students who pass a placement
test. The language department wants to know if students who get credit in this way differ in
their understanding of spoken French from students who actually take the French course.
Some faculty think the students who test out of the course are better, but others argue that
they are weaker in oral comprehension. Experience has shown that the mean score of
students in the course on a standard listening test is 24. The language department gives the
same listening test to a sample of 40 students who passed the placement test to see if their
performance is different.
c. Experiments on learning in animals sometimes measure how long it takes a mouse to find
its way through a maze. The mean time is 18 seconds for one particular maze. A student
thinks that a loud noise will cause the mice to complete the maze faster. She measures how
long each of 10 mice takes with a noise as stimulus.
2. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures the
motivation, attitude toward school, and study habits of students. Scores range from 0 to 200.
The mean score for U.S. college students is about 115, and the standard deviation is about
30. A teacher who suspects that older students have better attitudes toward school gives the
SSHA to 25 students who are at least 30 years of age. Their mean score is x̄ = 125.2.
Assume that σ = 30 for the population of older students, and that the students tested are a
random sample from the population of older college students. Carry out a significance test of
H0 : µ = 115
Ha : µ > 115
Report the value of the test statistic, the p-value of your test, and state your conclusion clearly.
a. In this case, the sample size n = 12 is relatively small. Check to see if it is reasonable to
assume these data come from an approximately normal population.
b. Do these observations provide good evidence that the average detector reading differs from
the true value of 105? Assume that you know that the standard deviation of readings for all
detectors of this type is σ = 9 .
4. The CDC publishes charts on Body Mass Index (BMI) percentiles for boys and girls of
different ages. Based on the chart for girls, the mean BMI for 6-year-old girls is listed as 15.2
kg/m2. The data from which the CDC charts were developed are old and there is concern
that the mean BMI for 6-year-old girls has increased. The BMIs of a random sample of 30
6-year-old girls are given below.
24.5 16.3 15.7 20.6 15.3 14.5 14.7 15.7 14.4 13.2
16.3 15.9 16.3 13.5 15.5 14.3 13.7 14.3 13.7 16.0
14.2 17.3 19.5 22.8 16.4 15.4 18.2 13.9 17.6 15.5
c. Since the sample size is relatively large, use s in place of σ and calculate the value of the
z-test statistic. Then calculate the p-value.
d. Based on your answer to (c), do the sample data provide sufficient evidence that the mean
BMI for 6-year-old girls has increased? Explain.
31 31 43 36 23 34 32 30 20 24
Assume that the standard deviation of the odor threshold for untrained noses is known to
be σ = 7 µg/l.
a. Is it reasonable to assume the data are from an approximately normal population? Explain.
b. The researcher believes that the mean odor threshold for beginning students is higher than
the published threshold, 25 µg/l, and decides to conduct a significance test. What are the null
and alternative hypotheses?
c. Carry out a significance test. Report the value of the test statistic, the p-value, and
your conclusion.
2. In 2010/2011 the national mean SAT Math score was 514. Faculty at a state university
had disagreements over their students’ mathematics preparation for college. Some felt that
their students had fallen below the national average, and others felt that their students had
made some advances. To help answer this question, math faculty took a random sample of
50 students who entered the university fall semester 2011. The SAT Math scores from those
students are given below.
580 540 520 490 430 570 520 540 440 610
430 390 470 550 390 500 550 440 550 660
560 550 450 560 680 630 400 450 500 460
460 530 590 380 660 570 520 530 500 680
450 590 660 420 370 550 450 510 480 500
c. Construct a 95% confidence interval for µ, the mean Math SAT for students entering this
university in fall 2011. (Refer to Unit 24, Confidence Intervals.) Does your confidence interval
indicate that the true mean SAT Math score for students entering the university in fall 2011 is
less than 514, could be 514, or is greater than 514? Explain.
3. The average length of calls coming into a municipal call center had been around 90
seconds. Lately, there has been some concern that more complicated calls are coming into
the center causing the mean length of the calls to increase. In order to test this assumption,
the city draws a random sample of 100 calls. The sample mean and standard deviation are
x̄ = 118.4 seconds and s = 186.5 seconds, respectively.
b. Do these data provide good evidence that the average call length has increased from 90
seconds? (Since the sample size is large, use s in place of σ.) Show the work needed to
support your answer. Conduct the significance test at the 0.05 level.
c. Suppose city planners are willing to run the test at the 0.10 level. (They will reject the null
hypothesis if the p-value is below 0.10.) Would this change the conclusion reached in (b)?
Explain.
4. Eating fish contaminated with mercury can cause serious health problems. Mercury
contamination from historic gold mining operations is fairly common in sediments of
rivers, lakes and reservoirs today. A study was conducted on Lake Natoma in California to
determine if the mercury concentration in fish in the lake exceeded guidelines for safe human
consumption. Suppose that you are an inspector for the Fish and Game Department and that
you are given the task of determining whether to prohibit fishing in Lake Natoma. You will
close the lake to fishing if it is determined that fish from the lake have unacceptably high
mercury content.
H0 : µ = 5 versus Ha : µ > 5
or
H0 : µ = 5 versus Ha : µ < 5
b. Would you prefer a significance level of 0.1 or 0.01 for your test? Explain your choice.
Summary of Video
The z-procedures for computing confidence intervals or hypothesis testing work in cases
where we know the population’s standard deviation. But that’s hardly ever the case in real
life. For times when we don’t know the population standard deviation but still want to figure
out confidence intervals and do significance tests, statisticians turn to t-inference procedures.
These t-procedures were invented in 1908 by William S. Gosset. Gosset was a chemist at
the Guinness Brewery in Ireland. Making ale requires constant sampling of everything from
barley to yeast to the beer itself. Gosset wanted to save time and money using small samples
and their standard deviations as an estimate of the unknown population standard deviation, σ.
Using the standard deviation s derived from only a few data values does not give a sufficiently
good estimate of the entire population’s standard deviation, so he could not simply proceed
with a z-procedure. In his efforts to find a way around this issue, Gosset created a new class of
distributions called the t-distributions.
The video now turns to a modern day brewery, Pretty Things Beer and Ale Project. Dann
Paquette and Martha Holley-Paquette, owners of the operation, take samples at various
stages in the brewing process. At one stage, they take a sample and measure its density. They
aim for a density reading of at least 19.5 degrees Plato. (Degrees Plato measure how much
more dense the liquid is than water.) In one batch of Baby Tree beer, they got a reading of 20.3
degrees Plato – a good sign that this batch will be great.
Let’s imagine that Pretty Things wants to see how closely their production of Baby Tree beer
is hitting their pre-fermentation density goal of 19.5 degrees Plato. Data collected from 10
batches are given below.
20.2 18.9 19.6 20.6 20.3 18.7 21.0 18.5 20.1 19.3
Since our sample size is small, its standard deviation could be quite far from the population
standard deviation of all Baby Tree beer ever brewed. So, z-procedures won’t work. Instead,
we call on the t-procedure that William Gosset invented.
Unit 26: Small Sample Inference for One Mean | Student Guide | Page 1
In Figure 26.1, we compare a t-distribution for a sample of size 3 to the standard normal curve.
Figure 26.1. A t-distribution for sample size 3 compared with the standard normal curve.
The two curves share certain features. They are both bell-shaped. But the t-density curve
is broader with a shorter peak, and its tails are higher. These fatter tails mean there is more
probability of getting results far from zero. That’s because the sample standard deviation
varies from sample to sample, particularly when the sample size is small, adding uncertainty.
Another difference is that although there is a single standard normal distribution, there is
a different t-distribution for every sample size. As the sample size increases, the sample
standard deviation s gets closer and closer to the population standard deviation σ, and the
t-density curve gets closer to the standard normal curve.
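One way to see the fatter tails concretely is to evaluate the t density formula directly. In this Python sketch (the density formula is the standard one; the code and the choice of x = 3 are ours), the t density in the tail is above the normal density and shrinks toward it as the degrees of freedom grow:

```python
from math import gamma, sqrt, pi, exp

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# At x = 3 (far out in the tail), the t density decreases toward the
# normal density as the degrees of freedom increase.
for df in (2, 9, 30, 100):
    print(df, round(t_pdf(3, df), 5))
print("normal", round(normal_pdf(3), 5))
```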
The different t-distributions are specified by something called degrees of freedom, which are
related to the sample size n: degrees of freedom = n – 1. For our beer data, the degrees of
freedom are 10 – 1 = 9. Next, we calculate a confidence interval for the population mean of
the density of all Baby Tree beer. For these calculations to work, our data need to be from a
normal distribution, which is a safe assumption in this case. Here’s our formula:
x̄ ± t*(s/√n)
19.72 ± t*(0.85/√10)
In order to determine t*, we choose whatever confidence level we like – we’ll go with 95%. We
calculate the value of t* from software as illustrated in Figure 26.2: t* = 2.262.
Figure 26.2. Determining t* for 95% confidence: the central area under the t-density curve is
0.95, leaving 0.025 in each tail beyond ±2.262.
19.72 ± (2.262)(0.85/√10) ≈ 19.72 ± 0.61
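If no t-table or statistics package is at hand, t* can be recovered numerically from the t density itself. The sketch below is our own code, not part of the guide; it uses the summary values quoted in the video (x̄ = 19.72, s = 0.85, n = 10), integrates the t density with the trapezoid rule, and bisects on the tail area to find t*:

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, steps=4000):
    """P(T > t): 0.5 minus the integral of the density from 0 to t (trapezoid rule)."""
    h = t / steps
    area = sum(0.5 * h * (t_pdf(i * h, df) + t_pdf((i + 1) * h, df))
               for i in range(steps))
    return 0.5 - area

def t_star(conf, df):
    """Critical value whose upper-tail area is (1 - conf)/2, found by bisection."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        # Tail area decreases as mid grows; move toward the target area.
        lo, hi = (mid, hi) if t_tail(mid, df) > target else (lo, mid)
    return (lo + hi) / 2

# Summary values quoted in the video for the Baby Tree density data.
xbar, s, n = 19.72, 0.85, 10
tstar = t_star(0.95, n - 1)         # ≈ 2.262 for df = 9
margin = tstar * s / sqrt(n)        # ≈ 0.61
print(round(tstar, 3), (round(xbar - margin, 2), round(xbar + margin, 2)))
# → 2.262 (19.11, 20.33)
```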
So, our confidence interval (19.11, 20.33) gives us a range of plausible values for µ, the mean
density of Baby Tree beer.
Take a moment to compare z*, the z-critical value for a 95% z-confidence interval with our
value for t*:
z* = 1.960
t* = 2.262
Here, the t-critical value of 2.262 gives us a wider confidence interval. That is the price we pay
for having a small sample and for not knowing σ.
Pretty Things’ goal for their Baby Tree beer is a density of 19.5 degrees Plato. Using the
confidence interval that we have calculated, we can say with 95% confidence that a 19.5
population mean is within our range of plausible values for µ.
Student Learning Objectives
A. Understand when to use t-procedures for a single sample and how they differ from the
z-procedures covered in Units 24 and 25.
C. Know how to check whether the underlying assumptions for a t-test or t-confidence interval
are reasonably satisfied.
E. Be able to test a population mean with a t-test. Be able to calculate the t-test statistic and to
determine the p-value as an area under a t-density curve.
Content Overview
In Units 24 and 25, we introduced z-procedures for (1) calculating confidence intervals for
a population mean and (2) conducting significance tests about a population mean. The
confidence interval formula and z-test statistic are as follows:
(1) x̄ ± z*(σ/√n)        (2) z = (x̄ − µ)/(σ/√n)
For both procedures we assumed that the population was normally distributed or the sample
size n was large, and that the population standard deviation σ was known. However, in real
life, σ is generally unknown.
Let’s start with an example. The weights (pounds) from a random sample of 16 4-year-old
children who took part in a study on childhood obesity appear below.
From these data, we can compute the sample mean and sample standard deviation:
x̄ = 36.47 lbs and s = 4.23 lbs. However, the population standard deviation σ is unknown.
Nevertheless, we would like to calculate a confidence interval for µ, the mean weight of
4-year-olds.
It seems reasonable to assume that weights of 4-year-olds are normally distributed and the
normal quantile plot in Figure 26.3 confirms this assertion.
Figure 26.3. Normal quantile plot of Weight Age 4 (pounds).
But we still have to deal with the fact that σ is unknown. We know that when the sample size n
is large, s will be close to σ. But in this case n = 16, which is not large enough. Hence, simply
replacing σ by s in the z-confidence interval formula would introduce too much additional
variability. To compensate for the additional variability, we also replace z*, a critical value from
a standard normal distribution, with t*, a critical value from a t-distribution. The result is the
formula for computing t-confidence intervals:
x̄ ± t*(s/√n)
The t-distributions have some features in common with the standard normal distribution.
Both have density curves that are bell shaped and centered at zero as can be seen in Figure
26.4. However, there is more area in the tails of t-distributions than there is for the standard
normal distribution. This difference is particularly noticeable when the degrees of freedom are
small (See Figure 26.4(a).).
(a) t-distribution with 5 degrees of freedom, compared with the standard normal distribution.
(b) t-distribution with 15 degrees of freedom, compared with the standard normal distribution.
Figure 26.4. Comparison: two t-distributions with standard normal distribution.
The degrees of freedom (df) associated with t*, the t-critical value in our confidence interval
formula, are related to the size n of the sample:
df = n – 1
So, for our sample of 16 observations, df = 15. The value of t* for a 95% confidence interval
can be determined from a t-table (Figure 26.5) or using statistical software (Figure 26.6). In
either case, t* = 2.131.
Figure 26.5. Using a t-table to determine t*. Figure 26.6. Using statistical software to
determine t*.
We now have everything that we need to calculate a 95% confidence interval for µ, the mean
weight of 4-year-olds. Here are the calculations:
x̄ ± t*(s/√n) = 36.47 ± (2.131)(4.23/√16) = 36.47 ± 2.25
Hence, we can say that µ is between 34.22 lbs and 38.72 lbs and that we have used a process
to calculate this interval that has a 95% track record of giving correct results.
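The interval arithmetic can be checked quickly in Python, taking the values given above (x̄ = 36.47, s = 4.23, n = 16, t* = 2.131) as inputs; this is an illustrative check, not part of the guide:

```python
from math import sqrt

# Values from the worked example: n = 16 children, df = 15, t* = 2.131.
xbar, s, n, tstar = 36.47, 4.23, 16, 2.131

# Margin of error for the 95% t-confidence interval.
margin = tstar * s / sqrt(n)   # ≈ 2.25

print((round(xbar - margin, 2), round(xbar + margin, 2)))  # → (34.22, 38.72)
```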
Next, we turn our attention to significance tests about a population mean µ for situations where
the sample size is relatively small (n < 30), the population has a normal distribution, but σ is
unknown. The t-test statistic results from replacing σ in the z-test statistic with s:
t = (x̄ − µ)/(s/√n)
Suppose that a height chart listed the average height of 4-year-olds as 39 inches. The heights
of the sample of 16 4-year-olds are given below.
We suspect that children’s heights have increased since the time the height chart was created
due in part to better nutrition. To test our supposition, we let µ represent the mean height of
4-year-olds. The null and alternative hypotheses are:
H0 : µ = 39
Ha : µ > 39
The sample mean and standard deviation for the height data are: x̄ = 40.163 inches and
s = 1.255 inches. Now we calculate the t-test statistic, replacing µ0 with its value from the null
hypothesis and substituting in the sample values for x and s:
t = (x̄ − µ0)/(s/√n) = (40.163 − 39)/(1.255/√16) ≈ 3.71
If the null hypothesis is true, then the t-test statistic will have a t-distribution with n – 1 = 16
– 1 or 15 degrees of freedom. All that is left is to determine the p-value, the probability of
observing a value at least as extreme as 3.71. Using statistical software we find p ≈ 0.001 as
illustrated in Figure 26.7. Hence, we reject the null hypothesis and conclude the mean height of
4-year-olds has increased since the time that the height chart was created.
Figure 26.7. Density curve for the t-distribution with df = 15; the area to the right of 3.71 is
0.001048.
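The t statistic and p-value for the height test can be reproduced numerically. This is our own Python sketch (not the software used in the guide); the one-sided tail area is obtained by integrating the standard t density with the trapezoid rule:

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, steps=4000):
    """P(T > t) by the trapezoid rule: 0.5 minus the area from 0 to t."""
    h = t / steps
    area = sum(0.5 * h * (t_pdf(i * h, df) + t_pdf((i + 1) * h, df))
               for i in range(steps))
    return 0.5 - area

# Height example: H0: mu = 39 vs Ha: mu > 39, with the sample summaries given.
mu0, xbar, s, n = 39, 40.163, 1.255, 16
t = (xbar - mu0) / (s / sqrt(n))   # ≈ 3.71
p = t_tail(t, n - 1)               # one-sided p-value, ≈ 0.001

print(round(t, 2), round(p, 5))
```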
The study involving the 16 children followed the children for two years. As part of the study,
children were weighed when they were four and again when they were six. Table 26.1 shows
the results, including the children’s weight gain over the two-year period (Difference).
Weight Age 4 (lbs) Weight Age 6 (lbs) Difference (lbs)
37.1 46.9 9.8
26.7 35.9 9.2
36.1 48.3 12.2
36.2 44.2 8.0
40.3 50.6 10.3
43.9 56.4 12.5
36.2 51.8 15.6
40.7 50.2 9.5
42.5 51.6 9.1
34.8 41.4 6.6
37.9 55.3 17.4
34.5 44.5 10.0
31.1 39.9 8.8
36.4 46.3 9.9
35.7 49.6 13.9
33.4 39.0 5.6
Table 26.1. Change in weight from age 4 to age 6.
In the past, children around this age would have been expected to gain 4.5 pounds per year
or 9 pounds over the two-year period. However, we suspect that the mean change in weight
has increased. To test this assumption, we perform what is called a matched-pairs t-test.
The parameter µ in a matched-pairs t-procedure is the mean difference in observations
(or responses) on each individual (or subject in a matched pair) – in our case, µ is the mean
difference between weight at age 4 and weight at age 6. We set up the null hypothesis (no
change from what was expected in the past) and alternative hypothesis (increase from what
was expected in the past):
H0 : µ = 9
Ha : µ > 9
Now, we calculate the one-sample t-test statistic using the differences. The sample mean and
sample standard deviation of the differences are: x̄D = 10.525 lbs and sD = 3.122 lbs. The
matched-pairs t-test statistic is computed as follows:
t = (x̄D − µ0)/(sD/√n) = (10.525 − 9)/(3.122/√16) ≈ 1.95
From Figure 26.8 we see that p ≈ 0.035. Since p < 0.05, we reject the null hypothesis and
accept the alternative that the mean weight gain in children from age 4 to age 6 is greater than
9 pounds.
Figure 26.8. Density curve for the t-distribution with df = 15; the p-value, the area to the right
of 1.95, is 0.03506.
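The matched-pairs summaries can be verified directly from the Difference column of Table 26.1, using only Python’s standard library (an illustrative check; the code is ours):

```python
from math import sqrt
from statistics import mean, stdev

# Weight gains (lbs) from the Difference column of Table 26.1.
diffs = [9.8, 9.2, 12.2, 8.0, 10.3, 12.5, 15.6, 9.5,
         9.1, 6.6, 17.4, 10.0, 8.8, 9.9, 13.9, 5.6]

n = len(diffs)
xbar_d = mean(diffs)    # 10.525
s_d = stdev(diffs)      # ≈ 3.122 (sample standard deviation)

# Matched-pairs t-test of H0: mu = 9 vs Ha: mu > 9.
t = (xbar_d - 9) / (s_d / sqrt(n))   # ≈ 1.95

print(round(xbar_d, 3), round(s_d, 3), round(t, 2))  # → 10.525 3.122 1.95
```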
The last step in this analysis is to use a matched-pairs t-procedure to calculate a 95%
confidence interval for µ, the mean weight gain from age four to age six. The matched-pairs
t-confidence interval is computed as follows:
x̄D ± t*(sD/√n)
10.525 ± (2.131)(3.122/√16) = 10.525 ± 1.663
Notice that the t-critical value, t*, depends only on the sample size and the confidence level
and not on whether we are calculating a one-sample or a matched pairs t-confidence
interval. Our confidence interval for µ is from 8.9 lbs to 12.2 lbs.
Look back at the results from our two matched-pairs t-procedures. We concluded from the
t-test that µ, the mean weight gain from age 4 to age 6, was greater than 9 pounds. However,
our confidence interval for µ was (8.9, 12.2), an interval that includes values that are below 9
pounds. Results from a two-sided confidence interval do not always match the results from a
significance test involving a one-sided alternative.
Key Terms
Density curves for t-distributions are bell-shaped and centered at zero, similar to the
standard normal density curve. Compared to the standard normal distribution, a t-distribution
has more area under its tails. The shape of a t-distribution, and how closely it resembles the
standard normal distribution, is controlled by a number called its degrees of freedom (df). A
t-distribution with df > 30 is very close to a standard normal distribution.
A t-confidence interval for the population mean µ is given by x̄ ± t*(s/√n),
where t* is a t-critical value associated with the confidence level and determined from a
t-distribution with df = n – 1.
A t-test statistic for testing H0 : µ = µ0 , where µ is the population mean, is given by:
t = (x̄ − µ0)/(s/√n)
The t-test is a modification of the z-test and is used in situations where the population standard
deviation σ is unknown and either the population has a normal distribution or the sample size n
is large. The p-value is determined from a t-distribution with df = n –1.
A matched-pairs t-confidence interval for µD, the population mean difference, is given by
the formula:
x̄D ± t*(sD/√n)
where t* is a t-critical value associated with the confidence level and determined from a
t-distribution with df = n – 1, and x̄D and sD are the mean and standard deviation of the
sample differences.
A matched-pairs t-test statistic for testing H0 : µD = µD0 , where µD is the population mean
difference, is given by
t = (x̄D − µD0)/(sD/√n)
where x̄D and sD are the mean and standard deviation of the sample differences.
The Video
Take out a piece of paper and be ready to write down answers to these questions as you
watch the video.
1. Why won’t the z-procedure work in most cases, particularly if the sample size is small?
3. Compare a normal density curve with a t-distribution for a sample size of 3. How are the
two distributions similar and how do they differ?
4. For a t-distribution, how are the degrees of freedom related to sample size?
Unit Activity:
Step-by-Step
Pedometers count the number of steps a person walks. If you want your pedometer to
calculate how far you have walked, you need to enter in your step length (distance from heel of
one foot to heel of other foot when walking). In this activity, you will collect data on step length
for males and females in the class. Assuming that students in your class are representative
of the student population, you will calculate confidence intervals for the mean step length of
males and females.
1. Discuss methods for getting reliable measurements for step length. After your group
discussion, the class must decide on the method that will be used to collect the step-length
data. Write a brief description of this method.
2. Collect the step-length data for males and females separately. After the data are collected,
the two data sets (male step lengths, female step lengths) should be distributed to the class.
In answering the remaining questions, assume that the class data are representative of the
general male and female student populations.
3. Check that the underlying assumption of normality is reasonably satisfied in both data sets.
4. a. Calculate the mean and standard deviation for the male step-length data.
b. Calculate the mean and standard deviation for the female step-length data.
5. a. Calculate a 95% confidence interval for the mean step length of males. What are the
degrees of freedom of the t-critical value?
b. Calculate a 95% confidence interval for the mean step lengths of females. What are the
degrees of freedom of the t-critical value?
6. Based on your confidence intervals in question 5, can you conclude that the mean step
length for males is greater than for females? Explain.
Exercises
1. A woman in a nursing home is on medication for high blood pressure. Her blood pressure is
taken daily. A sample of 20 blood pressure readings (mmHg) appears below.
150 148 136 120 142 144 130 150 130 142
140 130 148 142 138 130 120 166 130 152
b. Determine a 95% confidence interval for µ, her mean systolic blood pressure.
Show your calculations.
c. Is the underlying assumption that these data come from a normal distribution reasonably
satisfied? Explain.
2. A manufacturer of brass washers produces one type of washer that has a target mean
thickness of 0.019 inches. After the production process had continued for some time without
any adjustment, a random sample of 10 washers was selected and measured for thickness.
The data are given below.
a. Does the assumption that washer thickness is normally distributed seem reasonable given
these data? Explain.
b. Do you think the production process is still in control or do the data indicate that it is time to
make some adjustments? To answer this question, test the hypothesis that the mean thickness
equals 0.019 inches against the hypothesis that it does not. Report the value of the test
statistic, the p-value and your conclusion.
c. Calculate a 95% confidence interval for µ, the mean thickness of washers currently being
produced. Show your calculations. Does your interval indicate that the process needs to be
adjusted to increase washer thickness or decrease washer thickness? Explain.
3. Students in a statistics class measured their foot lengths and forearm lengths. The data are
given in Table 26.2. Assume this sample is representative of the student population.
a. Calculate a 95% confidence interval for µForearm , the mean forearm length of students. Then
calculate a 95% confidence interval for µFoot , the mean foot length of students.
b. Do your 95% confidence intervals support the hypothesis that the mean forearm length of
students differs from the mean foot length of students? Explain.
c. Calculate 90% confidence intervals for µForearm and µFoot . Compare the 95% confidence
intervals to the 90% confidence intervals. Which of the two were wider? Explain why that was
the case.
d. Answer part (b) based on the 90% confidence intervals calculated for (c).
4. A statistics professor was concerned that students were not as successful on the second
exam (which was on inference) as they were on the first exam (which was on descriptive
statistics). She took a random sample of 15 students enrolled in her introductory statistics
courses over the past three years. Their grades on these two exams appear in Table 26.3
(continued on next page).
Exam 1 Exam 2
67 59
74 82
85 96
71 62
78 69
96 83
63 52
91 94
93 82
81 67
84 66
64 66
88 84
89 75
82 90
Table 26.3. Statistics exam scores.
a. Compute the differences between Exam 2 and Exam 1. What is the sample mean for the
differences? What is the sample standard deviation for the differences?
b. Return to the professor’s concern that students were not as successful on the material for
Exam 2 as they were on the material for Exam 1. Let µD be the population mean difference
between the two exam scores. Set up null and alternative hypotheses to test the professor’s
supposition.
c. Conduct a significance test. Report the value of the test statistic and the p-value.
d. Do the results of your test of hypotheses support the professor's concern? Explain.
Review Questions
1. Given a simple random sample of size n, you want to compute a confidence interval for µ
of the form x ± t*(s/√n). Find the value of t* for each of the following confidence levels and
sample sizes.
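If software is available, t-critical values like these can be computed rather than read from a table. Here is a minimal sketch assuming SciPy is installed (the helper name and the example confidence level and sample size are ours, for illustration):

```python
from scipy import stats

def t_critical(confidence, n):
    """t* for a one-sample t-interval based on n - 1 degrees of freedom."""
    tail = (1 - confidence) / 2          # area in each tail
    return stats.t.ppf(1 - tail, df=n - 1)

# Example: t* for a 95% confidence interval from a sample of size 15 (df = 14)
print(round(t_critical(0.95, 15), 3))    # → 2.145
```

The same call with df = 80 reproduces the t* = 1.990 used later in Unit 27.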
2. Supermarket rotisserie chickens have become very popular with the American public. A
study was conducted to compare the nutrient composition of commercially-prepared rotisserie
chicken to that of roasted chicken, which is listed in the USDA National Nutrient Database for
Standard Reference (SR).
a. The Standard Reference (SR) listed the mean protein content of roasted chicken breast
as 31 grams. In the sample of 9 rotisserie chickens, x = 29.86 grams and s = 1.95 grams.
Conduct a t-test to see if the mean protein content in rotisserie chicken breasts differs from the
SR. Report the value of the test statistic, the p-value, and your conclusion.
b. The SR listed the mean cholesterol level in roasted chicken thighs as 95 milligrams. In a
sample of 9 rotisserie chicken thighs, x = 134 milligrams and s = 2.43 milligrams. Conduct
a t-test to see if the mean cholesterol level in rotisserie chicken thighs differs from the SR.
Report the value of the test statistic, the p-value, and your conclusion.
3. A researcher studying ocean literacy focused her efforts on the program Ocean Commotion
– a one-day ocean/wetlands literacy program that includes hands-on demonstrations about
marine environments and products. Prior to attending the program, a sample of 337 students
from 6 schools were given a pre-test to measure their attitudes toward the ocean and
wetlands. After the program, students were given a post-test. The test was graded on a scale
from 1 (lowest) to 5 (highest). The mean score on the pre-test was 4.06. The mean score on
the post-test was 4.13. The standard deviation of the pre-post differences was 0.40.
a. The researcher wanted to test whether attendance at Ocean Commotion had a significant
positive effect on students’ attitudes toward the ocean and wetlands. Let µD be the
mean difference in attitude after the program compared to before the program. Write the
researcher’s null and alternative hypotheses.
b. Calculate the value of the t-test statistic. What are the degrees of freedom associated with
the t-test statistic?
d. The researcher concluded that Ocean Commotion had a significant effect on students’
attitudes toward the ocean and wetlands. Given your answer to (c), do you agree with her
conclusion? Do you think that the difference is of practical importance? Explain.
First Second Third
Questionnaire Questionnaire Questionnaire
2.6 3.1 2.2
4.9 5.0 2.5
6.0 4.8 3.9
3.3 3.5 4.5
4.1 4.2 2.9
3.8 4.0 3.1
5.2 4.6 4.8
4.3 4.7 5.1
3.1 5.3 3.9
5.5 5.2 3.8
4.2 5.4 4.9
2.0 3.2 2.8
4.3 5.1 2.9
3.6 4.6 3.2
3.1 3.8 2.9
5.2 4.7 3.4
Table 26.4. Results from Oxford Happiness Questionnaire
a. Students thought that the workshop would have a positive effect on happiness, at least short
term. Use the data from the first and second questionnaires to test their hypothesis. State the
null and alternative hypotheses, calculate the value of the t-test statistic, determine the p-value
and give your conclusion.
b. Use the data from the first and third questionnaires to test whether there is any long-term
positive effect on students from this type of happiness workshop. State the null and alternative
hypotheses, the value of the test statistic, the p-value, and your conclusion.
c. Let µ(Third – First) be the population mean difference of Oxford Happiness Questionnaire
scores: taken before and six weeks after participation in a happiness workshop. To estimate
the long-term effect of the workshop on students’ happiness, calculate a 95% confidence
interval for µ(Third – First). Interpret what it tells you about students who participated in happiness
workshops before and after participating in the Oxford Happiness Questionnaire.
d. The sample for this study consisted of students from one psychology class. Do you think the
results are valid for all college students from this university? Explain.
Unit 27: Comparing
Two Means
Summary of Video
It’s an age old battle of the sexes. Are men or women worse drivers? Whatever your opinion
on this question, a statistician needs evidence in order to make a decision. One way to
analyze this question would be to see which gender, on average, gets more moving violations.
We could take a sample from all licensed drivers in one state, and then look at the number of
tickets each person received in one year. We could then calculate the mean number of tickets
received by members of each gender and compare the two numbers to see which group had
the worst driving record.
That’s what researchers did when they decided to investigate the difference in the amount
of calories necessary to power daily life in two groups of people with very different lifestyles.
Herman Pontzer is an anthropologist who is interested in how energy is used by primate
species, particularly human beings. Pontzer teamed up with other researchers to work with the
Hadza in Tanzania, a group of traditional hunter-gatherers who live in a way very similar to our
ancestors. Men hunt with bows and arrows and women forage for plant foods and dig for root
vegetables. The Hadza are a lot more active and cover a lot more ground than their Western
counterparts. Everyone had always assumed that this physically-demanding hunter-forager
lifestyle would require much more energy than the relatively inactive daily life of a Western
office worker. In fact, one suspected cause of the obesity epidemic in the West is our
more sedentary modern lifestyle. But the Hadza’s actual energy expenditure had never yet
been tested.
Was the assumption correct that the Hadza used more calories throughout their day? Pontzer
and his team already had data on how many calories typical Americans and Europeans
burned in their daily lives. Now they needed to measure how many calories it took to power
the daily lives of the Hadza.
The Hadza are typically smaller and lighter than their Western counterparts. That difference
required Pontzer and his colleagues to use sophisticated statistical techniques in their
analyses to control for the effects of body size, age, and sex. To keep things simple, so that
we can follow their comparison, we’ll look just at women with comparable body sizes from
the Hadza and Western groups. We want to use our sample to determine whether there is a
significant difference between the means of the Hadza and Western populations.
First, the scientists calculated the mean total energy expenditure (TEE), which was measured
in calories, for each group. The sample means, standard deviations, and sample sizes for
each group are as follows:
Group         Mean TEE (calories)   Std. Dev.   Sample Size
Hadza         1,877                 364         17
Westerners    1,975                 286         26
Is the difference between these sample means significant? Or, could the difference we see be
due simply to chance variation? We can set up a significance test to figure this out. Below are
the null and alternative hypotheses concerning the total energy expenditure.
H0: µ1 = µ2
Ha: µ1 ≠ µ2
To decide between these hypotheses, we use the two-sample t-test statistic:
t = [(x1 − x2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
Now we can substitute the numbers into the formula. We have the sample means, standard
deviations, and sample sizes. For the value of µ 1− µ2 , we use the value from the null
hypothesis, which states these two means are equal, and hence, µ 1− µ2 = 0 .
t = [(1,877 − 1,975) − 0] / √((364)²/17 + (286)²/26) ≈ −0.94
Like all of the z- or t-test statistics that we have encountered, this one tells us how far x1 − x2 is
from 0, the hypothesized difference in means, in standard units.
Software can figure out the degrees of freedom for the t-test statistic, or we can just go with a
very conservative approach that uses the smaller sample size minus one, which gives us 16
degrees of freedom. We can look up the corresponding p-value in a t-table or use technology;
either way, we get p = 0.3612. That means that assuming the null hypothesis is true, we have
a 36% chance of seeing a t-value as or more extreme than the one we calculated. A 36%
chance is pretty likely, so we have insufficient evidence to reject the null hypothesis. We
conclude that there is no significant difference between total energy expenditure of Hadza
women and Western women.
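Readers with access to Python can reproduce this calculation from the summary statistics alone. A sketch, assuming SciPy is installed (the summary values are the ones quoted in the video):

```python
from math import sqrt
from scipy import stats

# Summary statistics from the video
x1, s1, n1 = 1877, 364, 17   # Hadza women
x2, s2, n2 = 1975, 286, 26   # Western women

# Two-sample t-test statistic (null hypothesis: mu1 - mu2 = 0)
se = sqrt(s1**2 / n1 + s2**2 / n2)
t = (x1 - x2) / se                      # about -0.94

# Conservative degrees of freedom: smaller sample size minus one
df = min(n1, n2) - 1                    # 16
p = 2 * stats.t.sf(abs(t), df)          # two-sided p-value, about 0.36
print(round(t, 2), round(p, 4))
```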
This, in fact, is what the researchers concluded. After controlling for body size, age, and sex,
the scientists did not find any statistical difference when they compared the mean daily energy
expenditure of the Hadza and the Westerners. This result seemed counterintuitive, since they knew
the Hadza were much more active. The researchers suspect that the Hadza’s bodies are allocating
a smaller percentage of those daily calories to run-of-the-mill cellular function and more to physical
activity. Researchers think that it is a difference in energy allocation, not energy efficiency.
B. Know how to check whether the underlying assumptions for a two-sample t-procedure are
reasonably satisfied.
C. Be able to calculate a confidence interval for the difference of two population means.
D. Be able to test hypotheses about the difference between two population means. Be able to
calculate the t-test statistic and use technology to determine a p-value.
We begin with a question related to the activity in Unit 26: Are the step lengths of 10th-grade
male students longer, on average, than the step lengths of 10th-grade female students? In
this case, the comparison is between two populations, 10th-grade males and 10th-grade
females. Let µ1 and σ 1 be the mean and standard deviation, respectively, of step lengths
for the population of 10th-grade males. Let µ2 and σ 2 be the mean and standard deviation,
respectively, for the 10th-grade females. If there is no difference between the mean step
lengths of male students and female students, then µ1 − µ2 = 0 . However, if males, on average,
take longer steps than females, then µ1 − µ2 > 0 . We can state the null hypothesis and
alternative hypothesis as follows:
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 > 0
Suppose we randomly select two samples, one of size n1 from the male students and another
of size n2 from the female students. After collecting the data, we can calculate the sample
means, x1 from the males and x2 from the females, and sample standard deviations, s1 from
the males and s2 from the females. It seems reasonable to use the difference in sample
means, x1 − x2 , to estimate the difference in population means, µ1 − µ2 . If the two populations
are normally distributed or if the sample sizes are large, then x1 − x2 has a normal distribution
with the following mean and standard deviation:
µ(x1 − x2) = µ1 − µ2 and σ(x1 − x2) = √(σ1²/n1 + σ2²/n2)
At this point, if we knew the population standard deviations, we could standardize x1 − x2 and
use a z-procedure to test our hypotheses, since the resulting two-sample z-test statistic has
the standard normal distribution.
Unfortunately, σ 1 and σ 2 are unknown, so we will need to use the sample standard
deviations, s1 and s2 , as estimates. Substituting s1 and s2 in place of σ 1 and σ 2 gives us the
two-sample t-test statistic:
t = [(x1 − x2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
The two-sample t-test statistic has an approximate t-distribution. The degrees of freedom
(df) are a bit complicated to figure out. We can either use software or adopt a conservative
approach and set the degrees of freedom to be one less than the smaller of the two
sample sizes.
Now, we return to our hypotheses about step lengths of male and female students. Sample
data were collected and are summarized below:
Males:    n1 = 12, x1 = 64.08, s1 = 7.71
Females:  n2 = 15, x2 = 60.34, s2 = 7.74
Normal quantile plots of the male and female step-length data indicate that it is reasonable
to assume that step lengths are approximately normally distributed. Now we are ready to
compute the t-test statistic. Using the null hypothesis value of 0 for µ1 − µ2 and our sample
means and standard deviations, we get:
t = [(64.08 − 60.34) − 0] / √((7.71)²/12 + (7.74)²/15) ≈ 1.25
[Figure: t-distribution with the area to the right of t = 1.25 shaded; this tail area gives
p = 0.1186]
Since p > 0.05, there is insufficient evidence to reject the null hypothesis. We cannot conclude
that the mean step length of 10th-grade male students differs from the mean step length of
10th-grade female students.
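The same arithmetic applies to the step-length test. A sketch, assuming SciPy is installed and assuming the one-sided p-value comes from the conservative degrees of freedom (smaller sample size minus one):

```python
from math import sqrt
from scipy import stats

# Summary statistics from the step-length example
x1, s1, n1 = 64.08, 7.71, 12   # males
x2, s2, n2 = 60.34, 7.74, 15   # females

t = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)   # about 1.25
df = min(n1, n2) - 1                            # conservative: 11
p = stats.t.sf(t, df)                           # one-sided, Ha: mu1 - mu2 > 0
print(round(t, 2), round(p, 4))
```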
The next example involves a study that compares two teaching strategies for nursing students
– lecture notes combined with structured group discussions versus a traditional lecture format.
Two groups of students taking a medical-surgical nursing course were taught using each of
the two strategies. Exam scores were used to compare the effectiveness of the two teaching
strategies. Let µ1 be the mean exam score for students enrolled in the lecture notes/group
discussion version of the course; let µ 2 be the mean exam score for students enrolled in the
lecture only version of the course. We set up the following null and alternative hypotheses:
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 ≠ 0
Exam scores of two groups of students taught by each of these methods were collected with
the following results:
Lecture notes/group discussion:  n1 = 81, x1 = 80.60, s1 = 7.34
Traditional lecture:             n2 = 88, x2 = 77.68, s2 = 7.23
Since the sample sizes are large, we can conduct a t-test to decide between the null and
alternative hypotheses without first checking that the data come from normal distributions.
t = [(80.60 − 77.68) − 0] / √((7.34)²/81 + (7.23)²/88) ≈ 2.60
[Figure: t-distribution with df = 80; the two tail areas beyond ±2.60 are each 0.00555, giving
p ≈ 0.011]
Since p < 0.05, we reject the null hypothesis and accept the alternative hypothesis that
the mean exam scores for the two teaching methods differ. To estimate that difference, we
calculate a two-sample t-confidence interval for µ1 − µ2 using the following formula:
(x1 − x2) ± t* √(s1²/n1 + s2²/n2)
Adopting the conservative approach, we set df = 80 and determine a t-critical value for a
95% confidence level: t*= 1.990. Now we are ready to calculate a 95% confidence interval for
µ1 − µ2 :
(80.60 − 77.68) ± (1.990) √((7.34)²/81 + (7.23)²/88) ≈ 2.92 ± 2.23, or (0.69, 5.15).
Hence, the mean exam scores for the lecture notes/group discussion teaching strategy are
between 0.69 and 5.15 points higher than the mean exam scores for the traditional lecture
teaching strategy.
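This interval can be checked in a few lines of code. A sketch, assuming SciPy is installed (summary values as quoted in the example):

```python
from math import sqrt
from scipy import stats

x1, s1, n1 = 80.60, 7.34, 81   # lecture notes / group discussion
x2, s2, n2 = 77.68, 7.23, 88   # traditional lecture

df = min(n1, n2) - 1                     # conservative: 80
t_star = stats.t.ppf(0.975, df)          # about 1.990 for 95% confidence
se = sqrt(s1**2 / n1 + s2**2 / n2)
lo = (x1 - x2) - t_star * se
hi = (x1 - x2) + t_star * se
print(round(lo, 2), round(hi, 2))        # about (0.69, 5.15)
```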
The two-sample t-test statistic for testing the difference in population means is:
t = [(x1 − x2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
where the value for µ1 − µ2 is taken from the null hypothesis. There are two options for finding
the degrees of freedom (df) associated with t: (1) use technology or (2) use a conservative
approach and let df = smaller of n1 − 1 or n2 − 1 .
The corresponding two-sample t-confidence interval for µ1 − µ2 is:
(x1 − x2) ± t* √(s1²/n1 + s2²/n2)
The degrees of freedom for calculating t*, the t-critical value associated with the confidence
level, uses the approach outlined for the two-sample t-test statistic.
1. How might a statistician gather evidence to answer the following question: Are men or
women worse drivers?
2. What was different about the lifestyle of the Hadzas compared to typical Europeans
or Americans?
3. What was Pontzer’s original assumption about the daily energy expenditure (in calories)
consumed by the Hadza compared to the Westerners?
4. What type of test was used in the video to test this assumption for Hadza and Western
women of similar body size?
6. On what did Pontzer and his colleagues place the blame for rising societal levels of obesity?
Nabisco’s Chips Ahoy is a popular brand of chocolate chip cookies. Nabisco makes both a
regular and, for those who want to restrict their fat intake, a reduced fat version of chocolate
chip cookies. The question for this activity is to find out whether the mean number of chips per
cookie is the same for Chips Ahoy reduced fat chocolate chip cookies as it is for Chips Ahoy
regular chocolate chip cookies.
1. If needed, collect the data on the number of chips per cookie in regular and reduced fat
Chips Ahoy cookies. Your instructor will provide directions. (You may already have collected
the data as part of Unit 25’s activity.)
2. a. Do you think the mean number of chips per cookie is the same for both Chips Ahoy
regular and Chips Ahoy reduced fat chocolate chip cookies? If not, which type, regular or
reduced fat, do you think has, on average, more chips per cookie? Explain.
b. Set up null and alternative hypotheses for testing whether the mean number of chips per
cookie is the same for both the regular and the reduced fat version of Chips Ahoy chocolate
chip cookies. Be sure to define any symbols that you use in your hypotheses. (Did you choose
a one-sided or two-sided alternative?)
3. Report the sample size, mean and standard deviation for the regular chocolate chip cookie
data. Then do the same for the reduced fat chocolate chip cookie data.
4. Make comparative graphic displays of the chip count data for the regular and reduced fat
cookies. Based on your plots, do the chip counts for the two types of cookies appear to differ?
c. Is there a significant difference in the mean number of chips per cookie in regular and
reduced fat Chips Ahoy chocolate chip cookies? Explain.
6. Calculate a 95% confidence interval for the difference between the mean number of chips
per cookie in Chips Ahoy regular and Chips Ahoy reduced fat chocolate chip cookies.
The researchers hypothesized that pregnant employees would be rated differently when
compared with the control group.
a. Set up a null hypothesis and an alternative hypothesis to test whether the population mean
performance ratings differed for the two groups of female employees.
b. Calculate the t-test statistic and determine a p-value. State your conclusion.
c. Calculate a 95% t-confidence interval for the difference in mean performance appraisal
ratings for pregnant employees and non-pregnant female employees. On average, are the
performance ratings for pregnant women better or worse than for the non-pregnant female
employees? Explain.
2. Return to the study discussed in question 1. The same group of researchers also gathered
data on the pregnant group’s performance appraisal ratings during pregnancy and after
returning from pregnancy leave. Here is a summary of the data gathered:
a. Did the mean performance ratings for the pregnancy group differ significantly between the
During Pregnancy and After Pregnancy time periods? Which test is more appropriate to
answer this question, a one-sample t-test or a two-sample t-test? Explain.
b. Use the appropriate test to answer the question posed in (a). Report the value of the test
statistic, the p-value, and your conclusion.
3. A state university is concerned that female students are not as well prepared in mathematics
as their male counterparts. Random samples of 20 male students and 20 female students were
selected from the class of first-year students. Their SAT Math scores are given below.
530 450 550 470 450 500 480 510 470 450
600 540 530 470 420 490 440 540 500 480
670 440 410 510 410 600 530 490 600 530
570 550 640 530 550 460 660 570 670 490
a. Make graphic displays to compare the SAT Math scores of the female students and the
male students. Do your plots provide evidence that male students entering the university have
higher SAT Math scores than female students?
b. Is it reasonable to assume that the distributions of SAT Math scores for both populations,
first-year male students and first-year female students, are approximately normally distributed?
Support your answer.
c. Calculate the sample means and standard deviations for the female and male students'
SAT Math scores.
4. A group of 4-year-olds, who were part of the Infant Growth Study, participated in a
laboratory meal. Data collected during this meal can be used to answer the following research
question: Does the mean number of calories consumed by girls at a meal differ from the mean
number of calories consumed by boys? Below is a summary of the results:
b. Calculate the value of the two-sample t-test statistic. (Round to three decimals.)
c. Adopt a conservative approach and set the degrees of freedom to be one less than the
smaller of the two sample sizes. Calculate the p-value. Are the results significant at the 0.05
level? Explain.
d. For two-sample t-tests, the Content Overview of this unit suggested using a conservative
approach for determining the degrees of freedom associated with the test statistic: set df =
smaller of n1 − 1 or n2 − 1 , where n1 and n2 are the sample sizes of the two groups. However,
statistical software calculates degrees for freedom using the following formula:
df = (s1²/n1 + s2²/n2)² / [ (1/(n1 − 1))·(s1²/n1)² + (1/(n2 − 1))·(s2²/n2)² ]
In general, this formula does not result in an integer value. In that case, the degrees of
freedom are rounded down to the closest integer below the calculated value.
Use the formula given above to calculate the degrees of freedom. (Be sure to round down to
the closest integer.) Calculate the p-value based on the df you calculated from the formula.
Based on this p-value, are the results significant at the 0.05 level?
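The software formula above is straightforward to implement directly. A minimal sketch (the helper name is ours; the example values are the step-length summaries from the Content Overview):

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom, rounded down to an integer."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return int(df)   # round down to the closest integer

# Example: step-length samples (males n = 12, s = 7.71; females n = 15, s = 7.74)
print(welch_df(7.71, 12, 7.74, 15))   # → 23
```

Note that this df (23 here) is noticeably larger than the conservative choice of min(12, 15) − 1 = 11, which is why software p-values are often a bit smaller.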
2. A study was conducted to investigate the effect of different levels of air pollution on the
pulmonary functions of healthy, non-smoking, young men. Two geographical areas with
different levels of air pollution were selected – Area 1 had lower levels of pollutants than
Area 2. Samples of 60 men were selected from each area. The two groups of men had no
significant differences in age, height, weight, and BMI. Data on two measures of pulmonary
function for each group are provided below:
a. Why do you think the researchers tested to see if there were significant differences between
the age, height, weight, and BMI for the two samples?
b. Test whether there is a significant difference between the mean FVC for the participants
from Area 1 and Area 2. State the value of the test statistic, the p-value and your conclusion.
c. Test whether there is a significant difference between the mean RR for the two areas. State
the value of the test statistic, the p-value and your conclusion.
d. If you find a significant difference in (b) or (c) or both, construct a 95% confidence interval to
estimate the difference in means between the two areas.
480 540 620 590 530 620 580 530 530 560 510 560 560
550 520 480 560 510 500 540 490 430 610 620 510
480 560 400 580 480 460 430 430 490 610 540 500 540
400 530 640 350 470 600 610 530 580 430 510 520 380
a. Make comparative boxplots for the SAT Writing scores for female and male students.
Based on your boxplots, is it reasonable to assume that SAT Writing scores are approximately
normally distributed for each gender? Does one gender tend to have higher SAT Writing
scores than the other?
b. Summarize the data by reporting the sample sizes, sample means and standard deviations
for both groups.
c. Test to see if there is a significant difference in mean SAT Writing scores between female
and male first-year students attending this university. Report the value of the test statistic, the
p-value, and your conclusion.
d. Compute a 95% confidence interval for the difference in mean SAT Writing scores for
female and male students attending this university. Interpret your results.
4. Do 4-year-old boys eat, on average, more mouthfuls of food at a meal than 4-year-old girls? A
group of 4-year-olds, who were part of the Infant Growth Study, participated in a laboratory meal.
b. Determine the value of the two-sample t-test statistic and the p-value.
Report your conclusion.
Summary of Video
It is nearly impossible to collect data about an entire population. Take, for example, all the
salmon in one watershed. We can’t count the number of eggs laid by every single spawning
salmon. But we can count the eggs laid by a sample of some of these salmon. Then, using
statistical inference, we can use the mean number of eggs in our sample to draw conclusions
about the egg-laying population as a whole. As part of the inference procedure, we use
probability to indicate the reliability of our results.
We can also use statistical inference to estimate a population proportion. For instance,
suppose we wanted to know how many of the eggs laid by the salmon were fertilized. We
could investigate the fertilization rate in our sample to get a sample proportion or sample
percentage. Then we could use the sample proportion as an estimate of the unknown
population proportion. But how good of an estimate is it? This will be the topic of this video –
using information from samples to make inferences about population proportions.
Let’s turn our attention to a completely different context: the workplace. Employers think
about how to motivate their employees to do their best, most creative work. Psychologist
Teresa Amabile has studied creativity for years. One of Amabile’s discoveries from her
earlier research is that creativity fluctuates, even for a given individual, as a function of the
kind of work environment the individual is in. Building on that foundation, Amabile designed
a study around the question of worker motivation. She recruited 238 people with creative
jobs who were willing to keep track of their activities, emotions, and motivation levels every
workday. Their electronic diaries had two components. One consisted of participants rating
their motivation, emotions, and other subjective factors on a seven-point scale. The second
component was an open-ended question where participants were asked to describe one
event that stood out that day. It could be anything, as long as it was relevant to the work or the
project. After several years, Amabile had nearly 12,000 diary entries. These entries validated
her earlier findings that people were able to solve problems creatively and come up with new
ideas on days they felt most motivated and excited about their work. So, the next question to
ask was: What led to high levels of motivation?
Figure 28.1. Type of event recorded on workers' best and worst days.
Amabile and her coauthor decided to survey managers to see whether they were aware of
how important this feeling of progress was in motivating workers. She asked them to rate
five different items in order of how much they felt they affected workers’ motivation. If the
managers just randomly chose one of the five options to rank as most important, we would
expect 20% of them to pick progress. So, we let p be the proportion of all managers who
would pick progress as the most important of the five items for motivating workers. Now, we
can set up a test of hypothesis for the population proportion, p:
H0 : p = 0.20
Ha : p ≠ 0.20
As it turned out, only 35 out of 669 managers selected progress as the top motivational
factor. That gives a sample proportion of just 0.0523, or a mere 5.23%. This seems pretty low
compared to the 20% proportion from our null hypothesis. But is it low enough to reject the null
hypothesis? To find out, we can turn to a z-test statistic:
z = (p̂ − p0) / √(p0(1 − p0)/n)
where p̂ (pronounced p-hat) is our sample proportion, p0 is the null hypothesis proportion,
and n is the sample size. Substituting our sample proportion and sample size we get:
z = (0.0523 − 0.20) / √((0.20)(1 − 0.20)/669) ≈ −9.55
That is a pretty extreme z-test statistic. If you compare it to a standard normal distribution,
being 9.55 standard deviations from the mean is highly unlikely. As can be seen from Figure
28.2, the area under the curve that far out is not really visible! In fact, the p-value is 0.000.
So, we have our answer: reject the null hypothesis and accept the alternative. The population
proportion of all managers in the world who would select “Support for Making Progress” as the
most important motivator is not 20%.
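A sketch of the same one-proportion z-test in Python, assuming SciPy is installed (counts as reported in the video):

```python
from math import sqrt
from scipy import stats

count, n = 35, 669           # managers who picked progress
p0 = 0.20                    # null hypothesis proportion

p_hat = count / n                                   # about 0.0523
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)          # about -9.55
p_value = 2 * stats.norm.sf(abs(z))                 # two-sided; effectively 0
print(round(p_hat, 4), round(z, 2))
```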
Now that we have rejected the null hypothesis, let’s calculate a confidence interval for the
true population proportion. We know that the sample proportion of managers who selected
progress was 0.0523, but we don’t know how close that is to the true population proportion.
Just like in the confidence intervals for one mean, we can figure out a standard error to go with
our point estimate. Here’s the formula for the confidence interval:
p̂ ± z* √(p̂(1 − p̂)/n)
Next, we use our sample information to calculate the 95% confidence interval for the
population proportion, p:
0.0523 ± 1.96 √(0.0523(1 − 0.0523)/669) ≈ 0.0523 ± 0.0169, or (0.035, 0.069).
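The interval works out the same way in code; a sketch using only the standard library:

```python
from math import sqrt

count, n = 35, 669
p_hat = count / n
z_star = 1.96                                   # 95% confidence

se = sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - z_star * se, p_hat + z_star * se
print(round(lo, 3), round(hi, 3))               # about (0.035, 0.069)
```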
So, our estimate is that only between 3.5% and 6.9% of managers in the overall population
would rate progress as the number one motivational factor. A good question to ask is how
could managers be so unaware of what really counted to their employees? What managers
have said in response to that question is that it is just part of their employees’ jobs – they are
supposed to make progress. Managers don’t typically think of progress as something that they
need to worry about. But, according to Amabile, they actually do need to worry about it a lot.
What Amabile saw in the diaries was that there were often little hassles happening in the work
lives of most of the study participants that kept them from making as much progress as they
would like. These were things that managers could have cleared away for them, without a lot
of effort, if they had just been paying attention.
On some level the workers themselves might have recognized that their best days often went
hand-in-hand with progress events. But the managers basically had no clue. It is the kind of
finding that makes perfect sense once you know about it. Sometimes you just have to ask the
right questions and know how to analyze the data.
D. Understand that the z-inference procedures for proportions are based on approximations to
the normal distribution and that accuracy depends on having moderately large sample sizes.
In inference, we start by defining the population – for our question on home-use of computers,
the population will be all households in America. Of interest is the population proportion,
p, of households in which some member owns or uses a computer at home. Now, we don’t
have access to every household in America, but we can take a sample. In a random sample of
2,500 households, 2,036 answered yes to the following question:
At home, do you or any member of this household own or use a desktop, laptop, netbook,
or notebook computer?
From this information we can calculate the sample proportion, which we label as p̂ :
pˆ = 2036/2500 = 0.8144 , or 81.44%
But how good is this estimate for p? Remember, the sample proportion, p̂ , is a statistic. If we
take another sample of 2,500 households, we will most likely get a different estimate for p. So,
as a first step in developing inference procedures for population proportions, we need to know
something about the sampling distribution of the sample proportion, p̂ .
Suppose that a large population is divided by some characteristic into two categories,
successes and failures, and that p is the population proportion of successes. A simple
random sample of size n is drawn from the population, and p̂ is the sample proportion of
successes: pˆ = (number of successes in the sample)/n.
As a statistic, p̂ varies over repeated sampling. Its sampling distribution has the
following properties:
• Mean: µ p̂ = p
• Standard deviation: σ p̂ = √( p(1 − p)/n ) .
• Distribution: For large n, p̂ has an approximately normal distribution.
Since, in the case of home use and/or ownership of computers, the sample size is large,
2,500, the sampling distribution of p̂ is approximately normal (as pictured in Figure 28.3.)
Figure 28.3. The sampling distribution of p̂ : a normal curve centered at p with standard
deviation √( p(1 − p)/n ).
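These properties can be checked with a short simulation. This is only an illustrative sketch; it assumes a population proportion of p = 0.79 and samples of size n = 2,500, the values used in the example that follows:

```python
import random

random.seed(1)
p, n = 0.79, 2500   # illustrative population proportion and sample size

# Draw many samples and record the sample proportion from each
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(1000)]

mean = sum(p_hats) / len(p_hats)
sd = (sum((x - mean) ** 2 for x in p_hats) / len(p_hats)) ** 0.5
# mean is close to p = 0.79, and sd is close to sqrt(p(1 - p)/n) ≈ 0.0081
```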
Suppose that an online source claimed that 79% of American households had a member of
the household who owned or used a computer at home. We would like to test that claim. To
do so, we use the online source’s claim about the population to set up the null and alternative
hypotheses:
H0 : p = 0.79
Ha : p ≠ 0.79
Now, if the null hypothesis is true, then the distribution of p̂ from a sample with n = 2,500 will
have an approximately normal distribution with the following mean and standard deviation:
µ p̂ = 0.79
σ p̂ = √( (0.79)(1 − 0.79)/2500 ) ≈ 0.0081
Standardizing p̂ gives the test statistic:
z = ( pˆ − 0.79)/0.0081
If the null hypothesis is true, z will have a standard normal distribution. Now, go back to the
results of the survey, pˆ = 0.8144 , and express that value in standardized units:
z = (0.8144 − 0.79)/0.0081 ≈ 3.01
We calculate a p-value for the significance test by determining how likely it is to observe a
value from the standard normal distribution that is at least 3.01 standard deviations from the
mean. In this case,
we get a p-value of 2(0.001306) ≈ 0.003. Since this p-value < 0.05, we can reject the null
hypothesis and conclude that the population proportion is not 0.79, or 79%.
z = ( pˆ − p0 ) / √( p0 (1 − p0 )/n )
where p̂ is the sample proportion. When the null hypothesis is true and the sample
size is large, the z-test statistic will have an approximate standard normal distribution.
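The boxed procedure can be sketched as a small Python function (the function name is our own; NormalDist is in the standard library from Python 3.8):

```python
import math
from statistics import NormalDist

def one_prop_ztest(x, n, p0):
    """Two-sided z-test for H0: p = p0, valid for large samples."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)            # standard deviation under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail area
    return z, p_value

z, p_value = one_prop_ztest(2036, 2500, 0.79)
# z ≈ 3.0 (the text's 3.01 comes from rounding the standard deviation to 0.0081);
# p_value ≈ 0.003
```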
Now that we have rejected the null hypothesis that members of 79% of American households
own/use a computer at home, let’s calculate a confidence interval for the true population
proportion. The formula for a confidence interval for a population proportion follows the
same pattern that was used to calculate a confidence interval for a population mean:
Here’s the formula for calculating a confidence interval for a population proportion.
pˆ ± z* √( pˆ (1 − pˆ )/n )
where p̂ is the sample proportion and z* is the z-critical value (from a standard normal
distribution) associated with the confidence level.
Suppose we decide on a 95% confidence interval for p. Then we use z* = 1.96, just as we did
in Unit 24, Confidence Intervals. All that is left is to substitute our observed sample proportion,
pˆ = 0.8144 , into the formula:
0.8144 ± 1.96 √( 0.8144(1 − 0.8144)/2500 ) = 0.8144 ± 0.0152, or roughly (0.799, 0.830).
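The substitution can be sketched in Python (the helper name is our own; the data are the 2,036 yes answers out of 2,500 households):

```python
import math

def prop_ci(x, n, z_star=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = prop_ci(2036, 2500)
# roughly (0.799, 0.830): between about 79.9% and 83.0% of households
```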
So, now we are ready to use sample proportions to conduct significance tests and calculate
confidence intervals for population proportions.
Draw a sample of size n from this population. Then the sample proportion, p̂ , is calculated
as follows:
pˆ = (number of successes in the sample)/n
If the sample size n is relatively large, the sampling distribution of the sample proportion,
p̂ , is approximately normally distributed with mean µ p̂ = p and standard deviation
σ p̂ = √( p(1 − p)/n ).
In situations where the sample size n is large, a confidence interval for the population
proportion, p, can be calculated from the formula:
pˆ ± z* √( pˆ (1 − pˆ )/n )
where p̂ is the sample proportion and z* is the z-critical value (from a standard normal
distribution) associated with the confidence level.
2. In Teresa Amabile’s earlier study of workers in creative jobs, how did participants of the
study feel on the days when they were most able to solve problems creatively and come up
with new ideas?
4. Managers were given five items, including progress, and asked to select the one that they
felt most affected workers’ motivation. If managers randomly selected one of the five items,
what percentage of the managers would we expect to select progress?
5. What type of test statistic was used to test the null hypothesis H0 : p = 0.20 , where p is the
population proportion?
6. In the video, a 95% confidence interval was calculated for the true population proportion
of managers who would select progress as the most important motivational factor. After
converting to percentages, were the values in this confidence interval below 20%, around
20%, or above 20%?
In the activity for Unit 21, you completed Table 21.1 by simulating data for inheriting blue eyes
(genes bb) from brown-eyed parents who carried a recessive gene for blue eyes (genes Bb).
You will need those data for this activity. In this activity, the population consists of the children
of brown-eyed parents, each of whom carries a recessive gene for blue eyes. Here the true
population proportion is known, p = 0.25, which is generally not the case. Knowing the
population proportion allows us to see how well the statistics perform.
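If you would rather generate comparable data by computer than by hand, here is a minimal sketch, assuming each child independently has probability 0.25 of blue eyes (the function name is our own):

```python
import random

random.seed(7)

def sample_proportion(n_children=4, p=0.25):
    """Simulate one sample of children; return the proportion with blue eyes."""
    blue = sum(random.random() < p for _ in range(n_children))
    return blue / n_children

props = [sample_proportion() for _ in range(30)]   # 30 samples, as in Table 28.1
# with samples of four children, each entry is one of 0, 0.25, 0.5, 0.75, 1.0
```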
Table 28.1. Data on children's eye color.
2. a. For each sample of four children, calculate the sample proportion of blue-eyed children,
p̂ . Enter the sample proportions in the third column of Table 28.1.
b. Notice that your sample proportions vary from sample to sample (even though the
population proportion stayed the same). What was the smallest sample proportion? What was
the largest?
c. To get a sense of the shape of the sampling distribution of the sample proportion, make
a histogram of your values for p̂ (from column three). Use class intervals of width 0.25 for
your histogram. Does your histogram indicate that the sample proportions have a normal
distribution?
3. a. Complete the fourth column of Table 28.1 by entering a running total of the number of
children as samples are combined.
This list should contain the following numbers: 4, 8, 12, . . . , 120.
b. Complete the fifth column of Table 28.1 by entering a running total of the number of blue-
eyed children as samples are combined.
4. The confidence interval formula given in the Content Overview is for large sample sizes.
After combining the data from the first 10 samples, you now have a sample of 40 children.
a. Give a point estimate for the population proportion, p, of blue-eyed children based on the 40
children from Samples 1 – 10.
b. Calculate a 95% confidence interval for p based on this sample of 40 children.
5. After combining the data from the first 20 samples, you now have a sample of 80 children.
a. Give a point estimate for the population proportion, p, of blue-eyed children based on your
sample of 80 children.
b. Calculate a 95% confidence interval for p based on this sample of 80 children.
6. After combining the data from all 30 samples, you now have 120 children.
a. Give a point estimate for the population proportion, p, of blue-eyed children based on your
sample of 120 children.
b. Calculate a 95% confidence interval for p based on this sample of 120 children.
7. Compare the margins of error for the three confidence intervals that you computed in
questions 4 – 6. What happened to the margin of error as the sample size increased?
8. From questions 4 – 6, we know that sample size affects the margin of error. How large a
sample size n is needed to guarantee that the margin of error for a 95% confidence interval for
p is less than 0.05? Complete parts (a) – (c) to find out.
a. The margin of error, E, for a 95% confidence interval is calculated by the following formula:
E = 1.96 √( pˆ (1 − pˆ )/n )
Solve this formula for n.
b. If you solved for n correctly, you found that n is a multiple of pˆ (1 − pˆ ) , which varies for
different values of p̂ . Complete the second column of Table 28.2 by calculating the values of
pˆ (1 − pˆ ) for different values of p̂ (See next page).
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Table 28.2. Values of pˆ (1 − pˆ ).
c. To find the value of n that guarantees a margin of error < 0.05, substitute the largest value
you found for pˆ (1 − pˆ ) into your equation in (a). Report the value of n needed to guarantee that
the margin of error will be less than 0.05 (regardless of the value of p̂ ).
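The reasoning in question 8 can be checked with a short calculation. Solving E = 1.96 √( pˆ(1 − pˆ)/n ) for n gives n = (1.96/E)² pˆ(1 − pˆ), which is largest when pˆ = 0.5:

```python
import math

z_star, E = 1.96, 0.05
worst_case = 0.5 * (1 - 0.5)   # p-hat(1 - p-hat) is largest at p-hat = 0.5
n = math.ceil((z_star / E) ** 2 * worst_case)
# n = 385 guarantees a margin of error below 0.05 for any sample proportion
```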
9. To conclude this activity, we know that the population proportion of blue-eyed children born
to brown-eyed parents with a blue-eye recessive gene is p = 0.25. Which of your confidence
intervals from questions 4 – 6 gave correct results? (In other words, which of your confidence
intervals contained the true population proportion?)
Taking all things together, how would you say things are these days – would you say you’re
happy or not too happy? Of the responses, 2,098 students selected happy. (These data were
from a Monitoring the Future survey.)
a. Determine the sample proportion of students who responded they were happy.
b. Calculate a 95% confidence interval for the population proportion of 12th-grade students
who are happy.
c. Would a 90% confidence interval for the proportion of happy students be wider or narrower
than the one you calculated for (b)? Justify your answer.
2. Currently, mothers in North America are advised to put babies to sleep on their backs. This
recommendation has reduced the number of cases of sudden infant death syndrome (SIDS).
However, it is a likely cause of another problem – flat spots on babies’ heads. A study of 440
babies aged 7 – 12 weeks found that 46.6% had flat spots on their heads.
a. The headline of the online news article reporting this story read: Nearly half of babies have
flat spots, study finds. Conduct a test of hypotheses to test H0 : p = 0.5 against Ha : p ≠ 0.5
where p is the population proportion of North American babies aged 7 – 12 weeks who
have flat spots on their heads. Report the value of your test statistic, the p-value, and your
conclusion.
b. Calculate a 95% confidence interval for the proportion of babies in this age group that have
flat spots.
c. Suppose you decide to use your confidence interval from (b) to make a decision between
H0 : p = 0.50 and Ha : p ≠ 0.50 . Would your decision based on your confidence interval agree
with your decision based on the z-test statistic from (a)? Explain.
3. An online article claims that 90% of American households in which a computer is owned/
used have access to the Internet. However, an Internet provider questioned the claim. The
Internet provider felt that the percentage should be higher. A phone survey contacted 1,910
c. Calculate the z-test statistic, determine the p-value, and state your conclusion.
4. Return to question 3. Calculate a 95% confidence interval for the population proportion p.
Re-express your confidence interval as a percentage.
a. Determine the sample proportion of eighth-grade students who responded that they were
involved nearly daily in some sort of physical activity.
b. A physical education teacher claimed that over 50% of all eighth-grade students in America
actively participate in physical activity on a nearly daily basis. Set up a null hypothesis and an
alternative hypothesis to test this claim.
c. Conduct a significance test for the population proportion. Report the value of the test
statistic, the p-value, and your conclusion.
2. Polls taken a few days before the 2012 presidential election between Barack Obama and
Mitt Romney did not indicate a clear winner. An NBC/Wall Street Journal poll showed that 48%
of the sample intended to vote for Obama. The polling organization announced that they were
95% confident that the sample result was within ± 2.6 percentage points of the true percent of
all voters who favored Obama.
a. Explain in plain language to someone who knows no statistics what “95% confident” means
in this announcement.
b. The poll showed Obama leading Romney 48% to 47%. Yet NBC/Wall Street Journal
declared the election was too close to call. Explain why.
For each of questions (a) – (c), determine a point estimate for the proportion of graduates from
this college who would agree with the statement. Then calculate a 95% confidence interval for
the population proportion.
4. Rasmussen Reports conducted a national survey of 1,000 adults from June 19-20, 2013.
The poll found that 63% of Americans think that a government that is too powerful is a bigger
danger than one that is not powerful enough.
a. Use the information from the report to calculate a 95% confidence interval for the proportion
of Americans who would agree with the statement above. Restate your confidence interval in
terms of percentages. What is the margin of error for your confidence interval?
b. The report concluded with the following statement: The margin of error is ±3% with a 95%
level of confidence. Compare this statement with the margin of error you calculated in (a).
c. Was a sample size of 1,000 large enough to guarantee that the margin of error was less
than 3% even if the sample percentage had been as low as 50% or as high as 80%? Explain.
d. How large a sample size was needed to guarantee that the margin of error was below 3%
regardless of the sample proportion?
Summary of Video
In this video, we visit the Broad Institute in Cambridge, Massachusetts, where our host, Dr.
Pardis Sabeti, has a small research team investigating an ancient biological battle – the non-
stop evolutionary arms race between our bodies and the infectious microorganisms that try to
invade and inhabit them. The Broad Institute is home to new high tech tools such as the latest
generation of genome sequencers. They allow us to sequence out the letters that code the
genomes of both humans and our microbial enemies. In her research, Dr. Sabeti and her team
use the data that these machines provide to find clues that might lead to new ways to battle
some of our most dangerous diseases, diseases that we in the West rarely encounter.
One of the deadliest is Lassa fever, which, like the more notorious tropical disease Ebola,
is caused by a virus and kills its victims with hemorrhagic fever. Throughout West Africa,
thousands of people die of Lassa fever every year. But what is surprising is that many tens
of thousands more throughout the region are exposed to the virus without getting sick. This
suggests that these people have some sort of resistance to the virus. It is the source of this
resistance that Dr. Sabeti wants to discover.
Dr. Sabeti’s work on Lassa fever is still in its early stages, but one of the models for what she
hopes to uncover can be found in the research on another tropical disease, malaria, which
kills and sickens millions every year. With malaria we already know of one important source of
resistance to the disease. It’s a genetic mutation that is better known for the harm it does than
for the good – sickle cell anemia.
As we discovered in the module on binomial distributions (Unit 21), if a child inherits two copies
of the sickle cell mutation (SS) from his or her parents, the child will have sickle cell anemia. If
the child inherits only one copy of the gene, he or she is unaffected by the disease, but more
importantly the child is protected against malaria. (See Figure 29.1.) It is this protective effect that is
responsible for the sickle cell mutation becoming so prevalent and it is statistics that can reveal it.
To see how two-way tables can help reveal protective factors, Dr. Sabeti has borrowed some
data from Dr. Hans Ackerman. He and his colleagues looked at the genotypes of 315 children
with severe malaria. Since each child inherits one hemoglobin gene from each parent, they
examined 630 genes in total. The researchers wanted to quantify whether children who came
down with malaria were less likely to have inherited the protective sickle cell version of the
gene (HbS) rather than the normal version (HbA), as compared to the general population.
Table 29.1 shows the breakdown of HbA and HbS in two groups of children. The top row of the
table shows the genes they found in the children with severe malaria. The bottom row shows
the genes they found in a control group of newborn babies.
Intuitively, we would expect to find the protective version of the gene less frequently in the
children sick with malaria than in the control group. After all, if they were protected, they likely
wouldn’t have come down with the disease. Table 29.2 shows the conditional distribution of
HbA and HbS for each group of children.
Notice that HbS was inherited by the kids who caught malaria only 1.11% of the time compared
to 8.66% of the time by the control group. Is that difference larger than would be expected
just by chance? Is it statistically significant? We can conduct a test of hypotheses to find
out whether there is sufficient evidence that the status of two variables – Malaria/General
Population and HbS/HbA – are linked. Our null hypothesis is that there is no association
between contracting malaria and having the HbS sickle cell gene. The alternative hypothesis is
that there is an association between contracting malaria and having the protective HbS sickle
cell gene.
We can compute what the expected counts in our two-way table would be if there really is no
association between our variables as the null hypothesis states. Here's how to compute the
expected counts:
expected count = (row total)(column total)/grand total
Table 29.3 shows the results of adding the expected counts to our two-way table.
Table 29.3. Adding the expected counts.
Now we can see that if there were no relationship between having the gene and coming down
with the disease we would expect to find 37.9 HbS genes in the children with malaria. But in reality
there are only 7 HbS genes in that group. Is that difference between 7 and 37.9 enough to tell us
that there is an association between our two categorical variables? The next step in our analysis
is to use the chi-square test statistic, given below, to figure out if that difference is significant.
The chi-square test statistic is a measure of how far the observed counts in the table are from
the expected counts:
χ² = Σ (observed − expected)² / expected
Here are the calculations:
χ² = (623 − 592.1)²/592.1 + (7 − 37.9)²/37.9 + (1065 − 1095.9)²/1095.9 + (101 − 70.1)²/70.1
χ² ≈ 41.26
Using software, we find the p-value: p ≈ 0 . So, we have very strong evidence that there is an
association between our variables and we can reject our null hypothesis. This result, together
with the pattern of the data, gives support to the research hypothesis that the HbS sickle cell
variant of the hemoglobin gene does protect against malaria.
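The test in the video can be reproduced with a short Python sketch. The observed counts are reconstructed from the figures quoted above (7 of the 630 genes in the malaria group were HbS; 8.66% of the 1,166 control genes, about 101, were HbS), so treat them as approximate:

```python
def chi_square(table):
    """Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[623, 7],      # severe malaria group: HbA, HbS
            [1065, 101]]   # control newborns: HbA, HbS
chi_square(observed)       # ≈ 41.26, matching the video
```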
Each year, the study Monitoring the Future: A Continuing Study of American Youth (MTF)
surveys 12th -grade students on a wide range of topics related to behaviors, attitudes,
and values. It is a major source of information on smoking, drinking, and drug habits of
American youth.
Suppose we want to investigate whether the environment in which students grow up is linked
to the likelihood that they have consumed alcohol (more than just a few sips). We focus on
three growing-up environments – a farm, the country, or a small-to-medium size city. Since we
expect the growing-up environment may help us explain the likelihood of alcohol consumption,
environment is the explanatory variable, and alcohol consumption is the response variable. We
are interested in testing if there is an association between these two variables or if they are
independent.
The two-way table in Table 29.4 shows the results on these questions from the
2011 MTF survey.
Environment
Count                 A Farm   Country   Small/Medium City
Alcohol   No          144      342       1366
          Yes         305      800       3049
Table 29.4. Counts of alcohol consumption by growing-up environment.
We begin analyzing these data using techniques covered in Unit 13, Two-Way Tables.
Because we think that growing-up environment explains whether or not students might have
consumed alcohol, we calculate the conditional percentages for the variable alcohol for each
level of environment. In other words, we compute the column percentages, which appear in
Table 29.5.
Environment
Percent               A Farm   Country   Small/Medium City
Alcohol   No          32.07    29.95     30.94
          Yes         67.93    70.05     69.06
Total                 100.00   100.00    100.00
Table 29.5. Column percentages.
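The column percentages can be recomputed directly from the counts in Table 29.4 (a quick Python sketch; the dictionary layout is our own):

```python
# (No, Yes) counts for each growing-up environment, from Table 29.4
counts = {"A Farm": (144, 305),
          "Country": (342, 800),
          "Small/Medium City": (1366, 3049)}

# Divide each count by its column total to get the conditional distribution
col_pct = {env: (round(100 * no / (no + yes), 2), round(100 * yes / (no + yes), 2))
           for env, (no, yes) in counts.items()}
# e.g. col_pct["Country"] == (29.95, 70.05)
```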
Based on Table 29.5, it looks as if students who grew up in the country were the most likely
(70.05%) to have drunk alcoholic beverages and the students who grew up on a farm were
the least likely (67.93%). To see whether differences this small are statistically significant,
we test the null hypothesis that there is no association between growing-up environment
and alcohol consumption.
Remember, the data in Table 29.5 came from a sample of 12th-grade students. The meaning
of the null hypothesis is that in the population of all 12th-grade students in America there
is no difference among the distributions of alcohol consumption for the three growing-up
environments. To test H0 we compare the observed counts from Table 29.4 with the counts
that we would expect to see if the two variables were independent (no association). If it
turns out that the observed counts are far from the expected counts, then we would have
evidence against the null hypothesis. Here’s how to calculate the expected counts.
Assume that H0 is true and that there is no association between two variables in a
two-way table. Then the expected count in any cell of the table is computed as follows:
expected count = (row total)(column total)/grand total
where the grand total is the sum of the counts in all cells in the table.
Before calculating the expected counts, we add the row and column totals to our table of
counts (See Table 29.6.).
Environment
Count              Farm   Country   City    Total
Alcohol   No       144    342       1366    1852
          Yes      305    800       3049    4154
Total              449    1142      4415    6006
Table 29.6. Observed counts with row and column totals.
For example, the expected count for the No-Farm cell is:
expected count = (1852)(449)/6006 = 138.45
Table 29.7 shows the expected counts added to the table. For each cell, the expected counts
appear below the observed counts.
Environment
Count                       Farm    Country   City      Total
Alcohol   No    Observed    144     342       1366      1852
                Expected    138.5   352.1     1361.4
          Yes   Observed    305     800       3049      4154
                Expected    310.5   789.9     3053.6
Total                       449     1142      4415      6006
Table 29.7. Observed and expected counts.
χ² = Σ (observed − expected)² / expected
χ² = (144 − 138.5)²/138.5 + (342 − 352.1)²/352.1 + (1366 − 1361.4)²/1361.4
     + (305 − 310.5)²/310.5 + (800 − 789.9)²/789.9 + (3049 − 3053.6)²/3053.6
χ² ≈ 0.76
The p-value is the area under the chi-square density curve with df = (2 − 1)(3 − 1) = 2 that
lies to the right of 0.76, which is approximately 0.6839. Since this p-value is large, we cannot
reject the null hypothesis; the data do not provide evidence of an association between
growing-up environment and alcohol consumption.
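These calculations can be sketched in Python. For df = 2 the upper-tail chi-square probability has the closed form e^(−x/2), which is why no table lookup is needed here:

```python
import math

observed = [[144, 342, 1366],   # No alcohol: Farm, Country, City
            [305, 800, 3049]]   # Yes alcohol
row_t = [sum(r) for r in observed]
col_t = [sum(c) for c in zip(*observed)]
grand = sum(row_t)

stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_t[i] * col_t[j] / grand
        stat += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # df = 2
p_value = math.exp(-stat / 2)   # exact upper-tail formula when df = 2
# stat ≈ 0.77 (0.76 in the text, which uses rounded expected counts);
# p_value ≈ 0.68
```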
Next, we ask whether the same results would be true for 12th-grade students’ smoking habits.
In other words, are the smoking habits of 12th-grade students independent of their growing-up
environment? Table 29.8 gives results for these questions from the 2011 MTF survey. (More
students answered the question on smoking than did on drinking alcohol.)
This time we leave the work of calculating the expected cell counts to the statistical software
Minitab. Figure 29.3 shows the Minitab output.
Figure 29.3. Minitab chi-square analysis for smoking and growing-up environment.
Notice that the expected cell counts appear below the observed counts in the table. The value
of the test statistic is χ² ≈ 56.2. Since this is a 3×3 table, df = (3 − 1)(3 − 1) = 4. Minitab
reports the p-value as approximately 0. So, we conclude that the variables smoking and
growing-up environment are not independent – there is an association. The results from the
chi-square test do not tell us anything about the nature of the association, only that there is
one. To learn about the nature of that association, we look at the conditional distributions of
smoking for each of the growing-up environments. Table 29.9 shows the column percentages.
Unit 29: Inference for Two-Way Tables | Student Guide | Page 10
Environment
Percent Farm Country City
Never 53.3 53.6 60.3
Smoking Occasionally 28.3 29.3 28.5
Regularly, now or past 18.4 17.1 11.2
Total 100.0 100.0 100.0
What we notice from Table 29.9 is that a higher percentage of students who grew up in a city
never smoked (60.3%) compared to students who grew up on a farm (53.3%) or in the country
(53.6%). The percentages for students who occasionally smoked (but not regular smokers)
were about the same for all three growing-up environments. However, the percentage of
regular smokers (either now or in the past) was higher for students who grew up on a farm
(18.4%) or in the country (17.1%) compared to students who grew up in a city (11.2%).
The chi-square test, like the z-test for proportions, is an approximate method that becomes
more accurate as the cell counts get larger. If the expected cell counts get too low, the test
becomes untrustworthy. Here are some guidelines for when a chi-square test gives accurate
results: all expected cell counts should be at least 1, and no more than 20% of the expected
cell counts should be less than 5 (for a 2×2 table, all four expected counts should be 5 or
greater).
Statistical software will often give a warning if the guidelines have been violated. For example,
energy drinks – non-alcoholic beverages that usually contain high amounts of caffeine (e.g.,
Red Bull, Full Throttle, and Monster) – have caused concern in the medical community.
Suppose we wanted to know if the pattern of daily consumption of energy drinks was
associated with students’ growing-up environment.
The output from Minitab appears in Figure 29.4. Notice the software reports the value of the
chi-square test statistic, but this time it does not provide a p-value. Instead it prints a warning,
which we have highlighted.
Figure 29.4. Minitab chi-square analysis for energy drinks and growing-up environment.
In this case, we could combine some of the categories for energy drinks. For example, we
might combine categories Three, Four, and Five or more into a single category "Three or
more." You will get a chance to try this approach in the exercises.
Data for two-way tables can arise in different ways. In the case of the Monitoring the Future
data, a single sample of high school students was chosen to take part in the survey. Their
responses to two questions (two categorical variables) were organized into two-way tables.
That was not the case for the data discussed in the video. Those data came from two different
samples, a sample of children sick with malaria and a sample of newborns (control group),
which were then classified according to one categorical variable.
Chi-square statistic:
χ² = Σ (observed − expected)² / expected
Degrees of freedom for the chi-square test of independence: (r − 1)(c − 1), where r and c are
the number of rows and columns, respectively.
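When no χ² table or software is at hand, the p-value can be approximated by simulation, using the fact that a chi-square variable with df degrees of freedom is the sum of df squared standard normal variables (a sketch; the function name is our own):

```python
import random

def chi2_upper_tail(stat, df, reps=100_000):
    """Monte Carlo estimate of P(chi-square with df degrees of freedom >= stat)."""
    random.seed(0)
    hits = 0
    for _ in range(reps):
        x = sum(random.gauss(0, 1) ** 2 for _ in range(df))  # one chi-square draw
        if x >= stat:
            hits += 1
    return hits / reps

chi2_upper_tail(0.76, 2)   # close to the 0.6839 found for the alcohol table
```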
1. What type of research is the host of this series, Dr. Pardis Sabeti, involved in?
2. Dr. Sabeti’s work is modeled off of work done on malaria. What genetic mutation is an
important source of resistance to malaria?
3. What were the null and alternative hypotheses for testing whether the sickle cell gene
protects against malaria?
4. What is the rule for calculating the expected counts under the null hypothesis?
5. The p-value of the chi-square test statistic turned out to be approximately 0. What can you
conclude based on this p-value?
This activity is in three parts. In Part I, you will examine the reasoning behind the expected
count formula. In Part II, you will need to collect data on eye color and gender from a sample of
students. In Part III, there are different samples – different types of M&M candies. The candies
are classified on one variable, color. In all three cases, you will conduct chi-square analyses.
1. A survey given to 500 students asked: How would you describe your political preference?
There were three response choices: GOP (Republican), DEM (Democrat), and IND
(Independent). Keeping with the color theme of this activity, GOP is red (red states tend to
vote Republican), DEM is the blue, and to make the color scheme patriotic, we’ll let IND be
represented by the color white. In addition to collecting information on political preference, the
students indicated whether they were male or female. The results are given in Table 29.10.
Table 29.10. Distribution of political preference and gender.
We are interested in finding out whether there is an association between gender and political
preference. We begin attacking this problem as a problem in probability. For example, to
estimate the probability that a randomly selected student will be female and a Democrat (blue),
we use the observed proportion 107/500. We can also calculate marginal probabilities using
the row or column totals. For example, we estimate the probability that a student prefers the
Democratic Party to be 196/500 and the probability that a randomly selected student is female
as 246/500.
Using probability, we can examine what it would mean for the variables gender and political
preference to be independent (or to have no association). If gender and political preference are
independent, then we can use the Multiplication Rule.
a. Assuming gender and political preference are independent, use the Multiplication Rule to
calculate P(political preference = DEM and gender = female).
b. Use your probability in (a) to determine the number of students out of the 500 observed that
you would expect to fall into the category of being female and preferring the Democratic Party.
c. In a test of the null hypothesis H0 : no association between the variables, the formula for
calculating the expected count is
expected count = (row total)(column total)/grand total
For the cell corresponding to female and DEM, determine the expected count from the formula
above. Compare your result with your answer to (b).
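The agreement between (b) and (c) is no accident; a short Python check shows that n·P(DEM)·P(female) under independence equals (row total)(column total)/grand total (marginal counts taken from the text):

```python
n = 500
dem_total, female_total = 196, 246   # marginal totals from Table 29.10

# Multiplication Rule under independence
p_independent = (dem_total / n) * (female_total / n)
expected = n * p_independent
# expected ≈ 96.4, identical to dem_total * female_total / n
```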
d. Repeat (a) - (c) for the cell corresponding to DEM and Male.
2. a. Assuming that the null hypothesis in 1(c) is true, calculate the expected counts for each
cell in Table 29.10.
b. Calculate the value of the chi-square test statistic and the degrees of freedom. Then
determine the p-value.
One way to gather data that is appropriate for chi-square analysis is to select a single sample
and then to classify the subjects in that sample by two categorical variables.
You will need a sample of students (your class, combined classes, friends). The two variables
that you will use to classify the students in your sample are gender and eye color. The null
hypothesis is:
H0 : No association between gender and eye color.
or equivalently:
H0 : The variables gender and eye color are independent.
Eye Color
Count              Blue   Brown   Other   Total
Gender    Male
          Female
Total
c. Calculate the expected cell counts and enter them into your table.
d. Perform a chi-square test. Report the value of the test statistic, the p-value, and your
conclusion.
Another data structure that is appropriate for chi-square analysis is when samples are drawn
from different populations and classified on one categorical variable. In this case, we can
think of “which sample” as the second variable. Next, your samples will be from different types
of M&M candies. Given bags of at least two types of M&Ms, you will classify the M&Ms into
colors, taking care to record which type of M&Ms candies you are classifying.
H0 : No association between M&M type and color.
or equivalently:
H0 : The color distributions are the same for the different M&M types.
4. a. Collect the color distribution from bags of up to four types of M&Ms. Then enter your data
into a table similar to the one in Table 29.12. (Be sure to record the type.)
Color
  Blue
  Yellow
  Orange
  Red
  Brown
  Total
Table 29.12. Data on M&Ms type and color.
b. State the null and alternative hypotheses.
c. Calculate the expected cell counts and enter them into your table.
d. Perform a chi-square test. Report the value of the test statistic, the p-value, and your
conclusion.
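For samples classified by type and color like this, SciPy's chi2_contingency carries out the whole test in one call. The counts below are made-up illustrations, not real bag counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical color counts for two types of M&Ms
# (rows: Blue, Yellow, Orange, Red, Brown; columns: two types)
counts = np.array([[25, 18],
                   [20, 15],
                   [22, 19],
                   [15, 14],
                   [18, 24]])

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected counts
chi_sq, p_value, df, expected = chi2_contingency(counts)
print(chi_sq, df, p_value)
```

With five colors and two types, the degrees of freedom are (5 − 1)(2 − 1) = 4.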
1. One of the questions on the MTF survey asked the following: About how many (if any) energy
drinks do you drink PER DAY, on average? Figure 29.4 (see Page 12) shows Minitab results
from testing to see if there is an association between the number of energy drinks students
consumed each day and their growing-up environment. As noted in the Content Overview,
Minitab computed the value of the chi-square test statistic but did not compute a p-value.
a. Explain all ways in which this analysis failed to meet the guidelines for using a chi-square test.
b. In order to continue the investigation into an association between energy drink consumption
and growing-up environment, we decided to combine the last three categories (Three, Four,
and Five or more) into a single category Three+. Make a copy of Table 29.13. Use the data
from Figure 29.4 to fill in the observed values in the third row of the table. Then find the row
total and enter that into your table.
                                  Environment
Count                   Farm     Country    City      Total
Energy   None  Observed   57      144       598        799
Drinks         Expected   52.55   150.44    596.01
         One   Observed   11       44       160        215
               Expected   14.14    40.48    160.38
         Two   Observed    4       13        36         53
               Expected    3.49     9.98     39.54
        Three+ Observed
               Expected
         Total            73      209       828       1110
Table 29.13.
d. Calculate the value of the chi-square test statistic. How many degrees of freedom are
associated with this statistic?
                              Intelligence
Count       Below Average    Average    Above Average    Total
Gender
  Female         437           2243          4072
  Male           456           1643          4593
  Total
Table 29.14.
a. We would like to test whether there is a statistical difference between how males and
females rate their intelligence compared to their peers. In this context, which is the explanatory
variable and which is the response variable? Explain.
c. Make a copy of Table 29.14. Calculate the row totals and column totals and enter them into
your table. Then calculate the expected counts for each cell and enter the expected counts
into your table.
d. Calculate the chi-square test statistic. What are the degrees of freedom associated with the
chi-square test statistic?
3. We would expect that there is an association between how students rated their intelligence
and their academic success. Table 29.15 organizes students' responses rating their intelligence
compared to their peers and their average grade in high school.
                           Average Grade
Count                  A        B       C or Below
Intelligence
  Above              2886     4044        1387
  Average            1335     1881         585
  Below               305      416         164
Table 29.15.
c. Calculate the chi-square test statistic. State the degrees of freedom. Determine the p-value.
d. If the null hypothesis is true, how likely would it be to observe a value from the chi-square
distribution at least as large as the value of the test statistic that you calculated in (c)? Does
this provide strong evidence against the null hypothesis? Explain.
4. Another question on the MTF survey asked the following: On average over the school year,
how many hours per week do you work in a paid or unpaid job? The survey results, classified
into a two-way table, are shown in Figure 29.5. In addition, the Minitab output contains the
conditional distributions of hours worked per week for each gender (row percentages). And
finally, of particular interest is whether or not there is a statistical difference in work patterns
between male and female 12th-grade students. The expected counts, under the hypothesis
that there is no association between gender and work patterns, also appear in Figure 29.5.
(See key at bottom of output for the order of appearance.)
Figure 29.5. Minitab chi-square analysis for gender and weekly work hours.
a. State the appropriate null and alternative hypotheses for this situation.
Unit 29: Inference for Two-Way Tables | Student Guide | Page 22
b. Report the outcome of the chi-square test and state your conclusion.
c. A chi-square test tells you whether or not there is an association between the two variables
but it doesn’t tell you anything about the nature of that association. Based on the row
percentages, describe the nature of the association between gender and hours worked per
week or describe evidence for the lack of such an association.
a. Set up the hypotheses to test whether there is a relationship between eel species and
habitat use.
c. Calculate the chi-square test statistic. Show your calculations. Report the degrees of
freedom, and the p-value. At the 0.05 level of significance, is the habitat use independent of
the species of moray eel?
d. To examine the nature of any association between the two variables, habitat use and moray
eel species, calculate either row or column percentages, whichever is more appropriate to the
situation under study. Justify your choice of type of percentage. What do your percentages
reveal about moray eels?
2. A random sample of registered voters was asked about their educational background and
whether or not they voted in the November 2012 elections. Table 29.17 contains the results of
the survey.
b. Set up the hypotheses for testing whether educational attainment and voting in the 2012
presidential election are independent.
d. Calculate the chi-square test statistic, state the degrees of freedom, and determine the
p-value. Are the results significant?
e. Make a bar chart that displays how voting patterns are related to highest educational
attainment. (Your choice of which variable is the explanatory variable should be evident in
your display.) Label the bars with the corresponding percentages. Describe the nature of the
relationship between the two variables.
3. Some tired, stressed-out students have turned to 2-ounce energy drink shots such as
5-Hour Energy to give them the energy boost they feel they need to make it through the day
(or night). Compared to energy drinks that can run about 100 calories per 8-ounce serving,
energy shots are sugar free and are around 4 calories per shot.
Because of the low calorie count, would female students be apt to drink more energy shots on a
daily basis than male students? To find out, researchers asked a group of 12th-grade students
the following question: How many (if any) energy drink shots do you drink PER DAY, on average?
Table 29.18 gives the results from a survey given to a sample of 12th-grade students.
b. Based on your answer to (a) do the expected counts satisfy the guidelines for using a chi-
square test? Explain.
c. Combine some of the categories for the amount of energy shots consumed per day.
Compute the expected counts and check to see if the guidelines for using the chi-square test
are satisfied. If not, combine some additional categories until the guidelines are satisfied.
(There are different choices for how the categories can be combined.)
d. Perform a chi-square test on your data from (c). What is the value of the chi-square test
statistic? Report its p-value. What conclusions could the researchers draw from your results?
Summary of Video
In Unit 11, Fitting Lines to Data, we examined the relationship between winter snowpack and
spring runoff. Colorado resource managers made predictions about the seasonal water supply
using a least-squares regression line that was fit to a scatterplot of their measurement data,
which is shown in Figure 30.1.
But would we really see a linear relationship between snowpack and runoff if we had all the
possible data? Or might the pattern we see in the sample data’s scatterplot occur just by
chance? We would like to know whether the positive association we see between snowpack
and runoff in the sample is strong enough that we can conclude that the same relationship
holds for the whole population. Statisticians rely on inference to determine whether the
relationship observed between two variables in a sample is valid for some larger population.
Inference is a powerful tool. Powerful enough, in fact, to help bring an entire bird species
back from the brink of extinction. After World War II, the agrichemical industry began mass-
producing chemicals to control pests. Cities like San Antonio, Texas, sprayed whole sections of
the city with the insecticide DDT in their fight against the spread of poliomyelitis. Unfortunately,
the spraying had unintended consequences. In Great Britain, Derek Ratcliffe noticed in the
1950s that peregrine falcons were declining at nesting sites and were unable to hatch their
eggs. This decline in falcons was eventually
demonstrated to be a worldwide phenomenon. Researchers determined that the reason
peregrine falcons were not successfully hatching their eggs was due to eggshell thinning, a
very serious problem since the weaker shells were breaking before the baby birds were ready
to hatch. After looking at some of the causes for this eggshell thinning, scientists began to zero
in on a possible culprit: DDT and its breakdown product, DDE.
There were a couple of reasons why scientists believed that there was a relationship between
DDT or DDE and eggshell thinning. In studying the broken eggshells and eggs collected in the
field, scientists found very high residues of DDE that had not been seen in historic samples.
The falcons were ingesting DDT through their prey – birds they ate had small concentrations
of the chemical in their flesh. Over time the DDT built up in the peregrines’ own bodies and
started to affect the females’ ability to lay healthy eggs.
Even though scientists had a pretty strong hunch that DDT was the cause of peregrine falcon
eggshell thinning, they could not rely on their scientific instincts alone. So, researchers turned
to statistics as a way to validate their analyses. We can follow in the researchers’ footsteps by
taking a look at a data set comprised of 68 peregrine falcon eggs from Alaska and Northern
Canada. A scatterplot of the two variables we will be studying, eggshell thickness (response
variable) and the log-concentration of DDE (explanatory variable), appears in Figure 30.2. We
have added the least-squares regression line fit to these data. Remember it is described by an
equation of the form ŷ = a + bx .
The data in Figure 30.2 show a negative, linear relationship between the two variables. Using
the equation, we can predict eggshell thickness for any measurement of DDE. The slope
b and intercept a are statistics, meaning we calculated them from our sample data. But if
we repeated the study with a different sample of eggs, the statistics a and b would take on
somewhat different values. So, what we want to know now is whether there really is a negative
linear relationship between these variables for the entire population of all peregrine eggs,
beyond just the eggs that happen to be in our sample. Or might the pattern we see in the
sample data be due simply to chance variation?
Data of the entire peregrine egg population might look like the scatterplot in Figure 30.3.
Notice that for any given value of the explanatory variable, such as the value indicated by the
vertical line, many different eggshell thicknesses may be observed.
Figure 30.4. The population regression line fit to the population data.
Several conditions, which are discussed in the Content Overview, must be met in order to
move forward with regression inference. You can check out whether these conditions are
satisfied in Review Question 1. But for now, we assume that the conditions for inference are
met. The population regression model is written as follows:
µy = α + βx
where µy represents the true population mean of the response y for the given level of x, α
is the population y-intercept, and β is the population slope. Now let's look back at our least-
squares regression line, based on the sample of 68 bird eggs. The equation is
ŷ = 2.146 − 0.3191x
The sample intercept, a = 2.146, is an estimate for the population intercept α . And the sample
slope, b = -0.3191, is an estimate for the population slope β.
Of course, we've learned by now that other samples from the same population will give us
different data, resulting in different estimates of the parameters α and β. In repeated sampling,
the values of these statistics, a and b, form sampling distributions, which provide the basis for
statistical inference. In particular, we want to infer from the sampling distribution of our statistic
b whether the sample data provide sufficiently strong evidence that higher levels of DDE are
related to eggshell thinning in the population. To test H0 : β = 0, we compute the t-test statistic
t = (b − β0) / sb
where b is our sample estimate for the population slope, β0 is the null hypothesis value for
the population slope, and sb is the standard error of the estimate b, which we can get from
software. In this case, sb = 0.0255. Next, we calculate the value of our t-test statistic:
t = (−0.3191 − 0) / 0.0255 ≈ −12.5
If the null hypothesis is true, then t has a t-distribution with n – 2, or 66, degrees of freedom.
The value t = -12.5 is an extreme value and the corresponding p-value is essentially 0. Thus,
we have strong evidence to reject the null hypothesis. By rejecting the null hypothesis, we
can confirm what scientists already suspected – that there is a connection between peregrine
falcon eggshell thickness and the presence of DDE. More precisely, there is a statistically
significant, negative linear relationship between the log-concentration of DDE and the
thickness of peregrine eggshells.
Before researchers could present this finding to the public, however, they had to quantify the
relationship. That meant computing a confidence interval for the population slope. Here’s the
formula:
b ± t * sb
For a 95% confidence interval and df = 68 – 2 = 66, we find t* = 1.997. Now, we can compute
the confidence interval:
−0.3191 ± 0.0509
−0.3700 to −0.2682
Hence, based on our sample of 68 peregrine falcon eggs, we are 95% confident that a one-
unit increase in the log-concentration of DDE is associated with a true average decrease of
between 0.27 and 0.37 in Ratcliffe’s eggshell thickness index. Armed with this information,
scientists were able to make a strong argument against the use of DDT because of its
dangerous impact on peregrines and the environment as a whole. These results led to
a prolonged legal battle with people on both sides presenting evidence. Due to scientific
and statistical evidence, the United States and many Western European countries banned
DDT use. Since then, the peregrine falcon population has rebounded significantly. So, this
environmental detective story has a happy ending for the peregrine falcons.
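The arithmetic in this summary is easy to verify. A minimal sketch, assuming SciPy and using the rounded values b = −0.3191 and sb = 0.0255 reported above:

```python
from scipy.stats import t as t_dist

b, s_b, n = -0.3191, 0.0255, 68
df = n - 2                          # 66 degrees of freedom

# t-test statistic for H0: beta = 0
t_stat = (b - 0) / s_b              # about -12.5

# 95% confidence interval for the population slope
t_star = t_dist.ppf(0.975, df)      # about 1.997
lo, hi = b - t_star * s_b, b + t_star * s_b
print(t_stat, (lo, hi))
```

The interval reproduces the −0.37 to −0.27 range quoted above for the decrease in the eggshell thickness index.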
B. Know how to check whether the assumptions for the linear regression model are reasonably
satisfied.
C. Recall how to find the least-squares regression equation (Unit 11, Fitting Lines to Data).
D. Be able to calculate, or obtain from software, the standard error of the estimate, se, and the
standard error of the slope, sb.
To better understand the relationship between fish size and mercury concentration, the United
States Geological Survey (USGS) collected data on total fish length and mercury concentration
in fish tissue. (Total length is the length from the tip of the snout to the tip of the tail.) The data
from a sample of largemouth bass (of legal size to catch) collected in Lake Natoma, California,
appear in Table 30.1. (You may remember these data from Review Question 3 in Unit 11.)
Total Length Mercury Concentration Total Length Mercury Concentration
(mm) (µg/g wet wt.) (mm) (µg/g wet wt.)
341 0.515 490 0.807
353 0.268 315 0.320
387 0.450 360 0.332
375 0.516 385 0.584
389 0.342 390 0.580
395 0.495 410 0.722
407 0.604 425 0.550
415 0.695 480 0.923
425 0.577 448 0.653
446 0.692 460 0.755
Table 30.1. Fish total length and mercury concentration in fish tissue.
Since we believe that fish length explains mercury concentration, total length is the
explanatory variable and mercury concentration is the response variable. A scatterplot of
mercury concentration versus total length appears in Figure 30.5.
Figure 30.5. Scatterplot of mercury concentration (µg/g) versus total length (mm).
Since the pattern of the dots in the scatterplot indicates a positive, linear relationship between
the two variables, we fit a least-squares line to the data. However, these data are a sample of
20 largemouth bass from the population of all the largemouth bass that live in Lake Natoma.
While we can use the least-squares equation to make predictions about mercury concentration
for fish of a particular length, we need techniques from statistical inference to answer the
following questions about the population:
• Can we determine a confidence interval estimate for the population slope, the rate of
change of mercury concentration per one millimeter increase in fish total length?
• If we use the least-squares line to predict the mercury concentration for a fish of a
particular length, how reliable is our prediction?
Now, what if we could make a scatterplot of mercury concentration versus total length for all of
the largemouth bass (at or close to the legal catch length) in Lake Natoma? Figure 30.6 shows
how a scatterplot of the population might look and how a regression line fit to the population
data might look.
Figure 30.6. Scatterplot of the population of mercury concentrations (µg/g), with the population
regression line µy = α + βx.
Notice, for each fish length, x, there are many different values of mercury concentration, y.
For example, in Figure 30.6 a vertical line segment has been drawn at length x1 . That line
segment intersects with a whole distribution of mercury concentration values, y-values, on
the scatterplot. The mean of that distribution of y-values, µ y , is at the intersection of the
vertical line at x1 and the regression line. Now look at the vertical line at x2 . It too intersects
with an entire distribution of y-values, with mean at the intersection of the vertical line at
x2 and the regression line. So, the population regression line describes how the mean
mercury concentration values, µ y , are related to total length, x. In this case, the relationship
looks linear and so we express it as: µ y = α + β x . As mentioned earlier in this unit, several
conditions must be met in order to move forward with regression inference. Those conditions,
along with a description of the simple linear regression model, are presented below.
The simple linear regression model assumes that for each value of x the observed values
of the response variable, y, vary about a mean µ y that has a linear relationship with x:
µy = α + β x
Figure 30.7. At each of x1, x2, and x3, the responses vary normally about means α + βx1,
α + βx2, and α + βx3 on the regression line, with common standard deviation σ.
A first step in inference is to estimate the unknown parameters. We begin with estimates for
the slope and intercept of the population regression line. The estimated regression line
for the linear regression model is the least-squares line, ŷ = a + bx. From Figure 30.5, the
estimated regression line is:
ŷ = −0.7374 + 0.003227x
The y-intercept, a = −0.7374, is a point estimate for the population intercept, α, and the slope,
b = 0.003227, is a point estimate of the population slope, β.
Next, we develop an estimate for σ , which measures the variability of the response y about
the population regression line. Because the least-squares line estimates the population
regression line, the residuals estimate how much y varies about the population regression line:
residual = y − ŷ

se = √( Σ(y − ŷ)² / (n − 2) ) = √( SSE / (n − 2) )
The computation of se is tedious by hand. Regression outputs from statistical software will
compute the value for you. However, here’s how it is computed in our example of mercury
concentration and fish length. First, we'll compute the residual corresponding to data value
(341, 0.515) as a reminder of how that is done:
ŷ = −0.7374 + (0.003227)(341) ≈ 0.3630, so residual = y − ŷ = 0.515 − 0.3630 = 0.152
Next, we calculate the SSE, the sum of the squares of the residuals, which is 0.1545. Then:
se = √( SSE / (n − 2) ) = √( 0.1545 / (20 − 2) ) = √( 0.1545 / 18 ) ≈ 0.0926 μg/g
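As a check on these hand calculations, a short script (assuming NumPy) reproduces a, b, and se from the Table 30.1 data:

```python
import numpy as np

# Fish total length (mm) and mercury concentration (ug/g) from Table 30.1
length = np.array([341, 353, 387, 375, 389, 395, 407, 415, 425, 446,
                   490, 315, 360, 385, 390, 410, 425, 480, 448, 460])
mercury = np.array([0.515, 0.268, 0.450, 0.516, 0.342, 0.495, 0.604,
                    0.695, 0.577, 0.692, 0.807, 0.320, 0.332, 0.584,
                    0.580, 0.722, 0.550, 0.923, 0.653, 0.755])

# Least-squares slope and intercept (np.polyfit returns highest degree first)
b, a = np.polyfit(length, mercury, 1)

# Standard error of the estimate: sqrt(SSE / (n - 2))
residuals = mercury - (a + b * length)
sse = (residuals ** 2).sum()
s_e = np.sqrt(sse / (len(length) - 2))
print(a, b, s_e)
```

The output matches the values above: a ≈ −0.7374, b ≈ 0.003227, and se ≈ 0.0926 µg/g.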
We can use the equation of the least-squares line, ŷ = −0.7374 + 0.003227x, to make
predictions. However, those predictions are more reliable when the data points lie “close” to
the line. Keep in mind that se is one measure of the closeness of the data to the least-squares
line. If se = 0 , the data points fall exactly on the least-squares line. Moreover, when se is
positive, we can use it to place error bounds above and below the least-squares line. These
error bounds are lines parallel to the least-squares line that lie one or two se above and below
the least-squares line. We apply this idea to our mercury concentration and fish length data.
Figure 30.8. Adding lines ± se and ±2 se above and below the least-squares line.
Recall from Unit 8, Normal Calculations, that we expect roughly 68% of normal data to
lie within one standard deviation of the mean and roughly 95% to lie within two standard
deviations of the mean. Notice that all of our data fall within two se of the least-squares line.
So, for a particular fish length, say with total length = 400 mm, we expect roughly 95% of the
fish to have mercury concentrations between 0.3682 μg/g and 0.7386 μg/g.
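The band quoted above for a 400 mm fish follows directly from the fitted line and se; a sketch using the rounded estimates:

```python
# Predicted mercury concentration for a 400 mm fish, with +/- 2*s_e bounds
a, b, s_e = -0.7374, 0.003227, 0.0926
y_hat = a + b * 400                  # about 0.5534
lo, hi = y_hat - 2 * s_e, y_hat + 2 * s_e
print(round(lo, 4), round(hi, 4))    # 0.3682 0.7386
```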
The standard error of the estimate provides one way to select between competing models. For
example, suppose we had a second model relating mercury concentration to the explanatory
variable fish weight. Choose the model with the smaller value for se .
Now we turn to significance testing for the population slope. The null hypothesis is H0 : β = 0,
tested against the two-sided alternative Ha : β ≠ 0 or a one-sided alternative such as
Ha : β < 0 or Ha : β > 0.
A regression line with slope 0 is horizontal. That indicates that the mean of the response y
does not change as x changes – which, in turn, means that the linear regression equation is
of no value in predicting y. In the case of mercury concentration and total length, the estimate
of the population slope is very small, b = 0.003227. So, we might jump to the conclusion that
total length is not useful in predicting mercury concentration. But we’d better work through the
details of a significance test before jumping to such a conclusion.
t = (b − β0) / sb

where sb = se / √( Σ(x − x̄)² ), b is the least-squares estimate of the population slope, β,
and β0 is the null hypothesis value for β.
If the null hypothesis is true and the linear regression conditions are satisfied, then t has
a t-distribution with df = n – 2.
First, we compute the standard error of the slope:
sb = 0.093 / √39463.2 ≈ 0.000468
Then the value of the t-test statistic is:
t = (0.003227 − 0) / 0.000468 ≈ 6.9
Under the t-distribution with 20 − 2 = 18 degrees of freedom, the area to the right of t = 6.9
gives a p-value of about 9.4127 × 10⁻⁷.
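The same computation in code, assuming SciPy and using the rounded values se ≈ 0.093 and Σ(x − x̄)² = 39463.2 from above:

```python
import numpy as np
from scipy.stats import t as t_dist

b, s_e, sxx, n = 0.003227, 0.093, 39463.2, 20

# Standard error of the slope
s_b = s_e / np.sqrt(sxx)             # about 0.000468

# t statistic for H0: beta = 0, with n - 2 = 18 degrees of freedom
t_stat = (b - 0) / s_b               # about 6.9

# One-sided p-value: area to the right of t_stat
p_value = t_dist.sf(t_stat, n - 2)
print(t_stat, p_value)
```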
Next, we calculate a confidence interval estimate for the regression slope, β. Here are the
details for constructing a confidence interval.
b ± t * sb
where t* is a t-critical value associated with the confidence level and determined from
a t-distribution with df = n – 2; b is the least-squares estimate of the population slope,
and sb is the standard error of b.
b ± t * sb
0.003227 ± (2.101)(0.000468) ≈ 0.003227 ± 0.000983
or, rounded to four decimals, from 0.0022 to 0.0042.
Thus, for each increase of 1 millimeter in total length, we expect the mercury concentration to
increase between 0.0022 μg/g and 0.0042 μg/g. That may seem like a small increase, but, for
example, Florida has set the safe limit on mercury concentration to be below 0.5 μg/g.
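A sketch of the interval computation, assuming SciPy for the t-critical value:

```python
from scipy.stats import t as t_dist

b, s_b, n = 0.003227, 0.000468, 20
t_star = t_dist.ppf(0.975, n - 2)    # about 2.101 for df = 18

# 95% confidence interval for the population slope
margin = t_star * s_b                # about 0.000983
lo, hi = b - margin, b + margin
print(round(lo, 4), round(hi, 4))    # 0.0022 0.0042
```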
The results from inference are trustworthy provided the conditions for the simple linear
regression model are satisfied. We conclude this overview with a discussion of checking the
conditions – what should be done first before proceeding to inference. The conditions involve
the population regression line and deviations of responses, y-values, from this line. We don’t
know the population regression line, but we have the least-squares line as an estimate. We
also don’t know the deviations from the population regression line, but we have the residuals
as estimates. So, checking the assumptions can be done through examining the residuals.
Here is a rundown of the conditions that must be checked:
1. Linearity
Check the adequacy of the linear model (covered in Unit 11). Make a residual plot, a
scatterplot of the residuals versus the explanatory variable. If the pattern of the dots
appears random, with about half the dots above the horizontal axis and half below, then the
condition of linearity is satisfied.
2. Normality
The responses, y-values, vary normally about the regression line for each x. This does
not mean that the y-values are normally distributed because different y-values come from
different x-values. However, the deviations of the y-values about their mean (the regression
line) are normal and those deviations are estimated by the residuals. So, check that
the residuals are approximately normally distributed (covered in Unit 9). Make a normal
quantile plot. If the pattern of the dots appears fairly linear, then the condition of normality
is satisfied. If the plot indicates that the residuals are severely skewed or contain extreme
outliers, then this condition is not satisfied.
3. Independence
The responses, y-values, must be independent of each other. The best evidence of
independence is that the data are a random sample.
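The residuals needed for these checks are easy to compute. A minimal sketch with simulated data (assuming NumPy; the numbers are illustrative, not from the fish study):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(1, 5, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=x.size)

# Fit the least-squares line and compute the residuals
b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

# Least-squares residuals always sum to essentially zero; for the
# linearity check, look for a random scatter about that zero line,
# with roughly half the residuals above it and half below
print(residuals.sum(), (residuals > 0).sum(), (residuals < 0).sum())
```

These residuals are what you would plot against x (for linearity) and display in a normal quantile plot (for normality).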
Two sample residual plots, (a) and (b), each graphing residuals versus x.
Now, we return to the fish study: Are the inference results – the significance test and
confidence interval that we calculated – trustworthy? Let's check to see if Conditions 1 – 3 are
reasonably satisfied. A residual plot appears in Figure 30.11.
Figure 30.11. Residual plot (response is mercury concentration): residuals versus
total length (mm).
Normal quantile plot of the residuals: percent versus residuals.
Finally, the data were a random sample of fish. So, the mercury concentration levels are
independent of each other. Condition 3 is satisfied. So, now we can say that our inference
results are trustworthy.
The line described by µ y = α + β x is called the population regression line. The estimated
regression line for the linear regression model is the least-squares line, ŷ = a + bx .
The observed response y for any value of x varies according to a normal distribution.
The standard error of the estimate, se , is a measure of how much the observations vary
about the least-squares line. It is a point estimate for σ and is computed from the following
formula:
se = √( Σ(y − ŷ)² / (n − 2) ) = √( SSE / (n − 2) )
The standard error of the slope, sb , is the estimated standard deviation of b, the least-
squares estimate for the population slope β. It is calculated from the following formula:
sb = se / √( Σ(x − x̄)² )
The t-test statistic for testing H0 : β = β 0 , where β is the population slope, is calculated as
follows:
t = (b − β0) / sb
To calculate a confidence interval for the population slope, β, use the following formula:
b ± t * sb
where t* is a t-critical value associated with the confidence level and determined from a
t-distribution with df = n – 2; b is the least-squares estimate of the population slope, and sb is
the standard error of b.
1. The population of peregrine falcons was in decline in the 1950s. What was the reason for
the population’s decline?
3. Describe the form of the relationship between eggshell thickness and log-concentration of
DDE – is the form linear or nonlinear? Positive or negative?
5. Why are a and b, the y-intercept and slope of the least-squares line, called statistics?
6. State the null and alternative hypotheses used for testing whether the sample data provided
strong evidence that higher levels of DDE were related to eggshell thinning in the population.
A high school’s mascot is stolen and the poster shown in Figure 30.13 has been posted around
the school and the town. The thief has left clues: a plain black sweater and a set of footprints
under a window. The footprints appear to have been made by a man’s sneaker. Here are more
details from the investigation:
• The distance between the footprints reveals that the thief’s steps are about 58 cm long.
This distance was measured from the back of the heel on the first footprint to the back
of the heel on the second.
• The thief’s forearm is between 26 and 27 cm. The forearm length was estimated from the
sweater by measuring from the center of a worn spot on the elbow to the turn at the cuff.
School officials suspect that the thief is a student from a rival high school. Table 30.2 contains
data from a random sample of 9th and 10th-grade students that you can use for this activity.
Feel free to add and/or substitute data that your class collects.
In this activity, you will fit two linear regression models to the data. For the first model you
will fit a line to forearm length and height; for the second model, you will fit a line to step
length and height. To eliminate confusion, express your models using the variable names
rather than x and y.
b. Check to see if the four conditions for the simple linear regression model are reasonably
satisfied. (Look to see if there are strong departures from the conditions.)
2. Next, let’s focus on inference related to the relationship between height and forearm length.
a. We expect people with longer forearms to be taller than people with shorter forearms.
Conduct a significance test H0 : β = 0 against Ha : β > 0 . Report the value of the test statistic,
the degrees of freedom, the p-value, and your conclusion.
b. Construct a 95% confidence interval for β. Interpret your confidence interval in the context
of this situation.
3. a. Make a scatterplot of height versus step length. Calculate the equation of the least-
squares line and add its graph to your scatterplot.
b. Check to see if the four conditions for the simple linear regression model are reasonably
satisfied. (Look to see if there are strong departures from the conditions.)
4. Next, we focus on inference related to the relationship between height and step length.
a. We expect people with longer step lengths to be taller than people with shorter step lengths.
Conduct a significance test H0 : β = 0 against Ha : β > 0 . Report the value of the test statistic,
the degrees of freedom, the p-value, and your conclusion.
b. Construct a 95% confidence interval for β. Interpret your confidence interval in the context
of this situation.
5. a. You have two competing models for predicting height, one based on forearm length and
the other based on step length. Which of your two models is likely to produce more precise
estimates? Explain.
We predict that the thief is ______ cm tall. But the thief might be as short as ______ or as tall
as ______.
Table 30.2. Data from 9th- and 10th-grade students.
Table 30.3. Data on femur and ulna length and height.
1. a. Make a scatterplot of height versus femur length. Would you describe the pattern of the
dots as linear or nonlinear? Positive association or negative?
b. Calculate the equation of the least-squares line. Add a graph of the line to your scatterplot in (a).
c. Check to see if the conditions for regression inference are reasonably satisfied. Identify any
strong departures from the conditions.
b. Write the equations of the error bands that lie one and two standard errors, se, above and
below the least-squares line. Add graphs of these lines to your scatterplot from question 1(b).
c. If the distributions of the responses, y-values, for any fixed x are normally distributed with
mean on the regression line, then the outermost bands in (b) should trap roughly 95% of the
data between the bands. Is that the case?
3. a. Make a scatterplot of height versus ulna length. Determine the equation of the least-
squares line and add a graph of the least-squares line to your scatterplot.
c. Suppose a partial skeleton is found on a rugged hillside. The skeleton is brought to a lab for
identification. The ulna bone measures 287 mm and the femur measures 520 mm. Use your
equation from 3(a) to predict the person’s height. Then use your equation from 1(b) to predict
the person’s height. Which of your estimates, the one based on ulna length or the one based
on femur length, is likely to be more reliable? Justify your answer based on the standard error
of the estimate, se , for each equation.
4. Consider the linear regression model for height based on femur length.
a. Test the hypothesis H0: β = 0 against the one-sided alternative Ha: β > 0. Report the value
of the t-test statistic, the degrees of freedom, the p-value, and your conclusion.
Assume that the data came from a random sample of eggs collected from Alaska and
Northern Canada. Figure 30.14 shows a residual plot and Figure 30.15 displays a normal
quantile plot of the residuals.
Figure 30.14. Residual plot: residuals versus log-concentration DDE.
Figure 30.15. Normal quantile plot of the residuals (percent versus residual).
High School GPA First Year College GPA High School GPA First Year College GPA
3.00 3.15 2.90 1.46
3.00 2.07 3.50 3.10
2.30 2.60 3.10 2.76
3.68 4.00 3.35 2.01
2.20 2.03 3.70 3.34
3.00 3.53 2.70 2.90
3.03 3.17 2.86 2.93
3.00 2.68 2.51 1.95
3.16 3.88 2.93 3.01
2.70 2.30 3.41 3.48
4.00 3.64 3.30 2.87
3.77 3.62 3.76 2.85
2.70 2.34 2.66 1.67
3.10 3.64 2.91 3.38
3.23 3.67 3.47 3.68
2.80 3.37 3.40 3.76
a. Make a scatterplot of first-year college GPA versus high school GPA. Does the form of
these data appear to be linear? Would you describe the relationship as positive or negative?
b. Determine the equation of the least-squares line and add the line to your scatterplot in (a).
c. Determine the t-test statistic for testing H0: β = 0. How many degrees of freedom does t have?
d. Find the p-value for the one-sided alternative Ha: β > 0. What do you conclude?
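If you want to check your answer to 2(c) with code, the sketch below computes the slope, the t statistic, and the degrees of freedom from the GPA table above, in plain Python with no statistics package.

```python
import math

# (high school GPA, first-year college GPA) pairs from the table above
data = [
    (3.00, 3.15), (2.90, 1.46), (3.00, 2.07), (3.50, 3.10),
    (2.30, 2.60), (3.10, 2.76), (3.68, 4.00), (3.35, 2.01),
    (2.20, 2.03), (3.70, 3.34), (3.00, 3.53), (2.70, 2.90),
    (3.03, 3.17), (2.86, 2.93), (3.00, 2.68), (2.51, 1.95),
    (3.16, 3.88), (2.93, 3.01), (2.70, 2.30), (3.41, 3.48),
    (4.00, 3.64), (3.30, 2.87), (3.77, 3.62), (3.76, 2.85),
    (2.70, 2.34), (2.66, 1.67), (3.10, 3.64), (2.91, 3.38),
    (3.23, 3.67), (3.47, 3.68), (2.80, 3.37), (3.40, 3.76),
]
x = [d[0] for d in data]
y = [d[1] for d in data]
n = len(data)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx  # slope
a = ybar - b * xbar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # SE of the slope
t = b / se_b        # t statistic for testing H0: beta = 0
df = n - 2          # degrees of freedom: 32 - 2 = 30
```

With 32 students, t has 30 degrees of freedom; use software or a t-table to convert t into the one-sided p-value for part (d).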
3. Linda heats her house with natural gas. She wonders how her gas usage is related to how
cold the weather is. Table 30.5 shows the average temperature (in degrees Fahrenheit) each
month from September through May and the average amount of natural gas Linda’s house
used (in hundreds of cubic feet) each day that month.
4. Do taller 4-year-olds tend to become taller 6-year-olds? Can a linear regression model
be used to predict a 4-year-old’s height when he or she turns six? Table 30.6 gives data on
heights of children when they were four and then when they were six.
Table 30.6. Data on children's heights at ages 4 and 6.
Summary of Video
A vase filled with coins takes center stage as the video begins. Students will be taking part
in an experiment organized by psychology professor John Kelly in which they will guess the
amount of money in the vase. As a subterfuge for the real purpose of the experiment, students
are told that they are taking part in a study to test the theory of the “Wisdom of the Crowd,”
which is that the average of all of the guesses will probably be more accurate than most of the
individual guesses. However, the real purpose of the study is to see whether holding heavier
or lighter clipboards while estimating the amount of money in the jar will have an impact on
students’ guesses. The idea being tested is that physical experience can influence our thinking
in ways we are unaware of – this phenomenon is called embodied cognition.
The sheet on which students will record their monetary guesses is clipped onto a clipboard.
For the actual experiment, clipboards, each holding varying amounts of paper, weigh either
one pound, two pounds or three pounds. Students are randomly assigned to clipboards and
are unaware of any difference in the clipboards. After the data are collected, guesses are
entered into a computer program and grouped according to the weights of the clipboards. The
mean guess for each group is computed and the output is shown in Table 31.1.
Table 31.1. Average guesses by clipboard weight.

Clipboard Weight   Mean Guess   N     StDev
1                  $106.56      75    $100.62
2                  $129.79      75    $204.95
3                  $143.29      75    $213.13
Total              $126.55      225   $180.16
In this case, F = 0.796 with a p-value of 0.45. That means there is a 45% chance of getting an
F value at least this extreme when there is no difference between the population means. So,
the data from this experiment do not provide sufficient evidence to reject the null hypothesis.
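As a check on these numbers, which is not part of the video: with k = 3 groups of 75 guesses, F has df1 = k − 1 = 2 and df2 = N − k = 222, and when df1 = 2 the F tail probability has a simple closed form. The sketch below (plain Python; in practice you would use software such as scipy.stats.f.sf) reproduces the reported p-value of about 0.45.

```python
# p-value for the clipboard ANOVA: F = 0.796 with k = 3 groups, N = 225.
# df1 = k - 1 = 2 and df2 = N - k = 222. When df1 = 2, the F survival
# function has the closed form P(F > f) = (1 + 2f/df2) ** (-df2/2).

def f_pvalue_df1_2(f, df2):
    """Tail probability of the F distribution when df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

p = f_pvalue_df1_2(0.796, 222)   # about 0.45, matching the video
```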
One of the underlying assumptions of ANOVA is that the data in each group are normally
distributed. However, the boxplots in Figure 31.2 indicate that the data are skewed and include
some rather extreme outliers. John’s students tried some statistical manipulations on the data to
make them more normal and reran the ANOVA. However, the conclusion remained the same.
Figure 31.2. Boxplots of money guesses (in dollars) by clipboard weight.
But what if we used the data displayed in Figure 31.3 instead? The sample means are the same,
around $107, $130, and $143, but this time the data are less spread out about those means.
Figure 31.3. Money guesses by clipboard weight: same sample means, less spread.
In this case, after running ANOVA, the result is F = 33.316 with a p-value that is essentially
zero. Our conclusion is to reject the null hypothesis and conclude that the population means
are significantly different.
In John’s experiment, the harsh reality of a rigorous statistical analysis has shot down the idea
that holding something heavy causes people, unconsciously, to make larger estimates, at least
in this particular study. But if the real experiment didn’t work, what about the cover story – the
theory of the Wisdom of the Crowd? The actual amount in the vase is $237.52. Figure 31.4
shows a histogram of all the guesses. The mean of the estimates is $129.22 – more than $100
off, but still better than about three-quarters of the individual guesses. So, the crowd was wiser
than the people in it.
B. Be able to identify the factor(s) and response variable from a description of an experiment.
D. Know how to compute the F statistic and determine its degrees of freedom given the
following summary statistics: sample sizes, sample means and sample standard deviations.
Be able to use technology to compute the p-value for F.
F. Recognize that statistically significant differences among population means depend on the
size of the differences among the sample means, the amount of variation within the samples,
and the sample sizes.
G. Recognize when underlying assumptions for ANOVA are reasonably met so that it is
appropriate to run an ANOVA.
For example, suppose a statistics class wanted to test whether or not the amount of caffeine
consumed affected memory. The variable caffeine is called a factor and students wanted
to study how three levels of that factor affected the response variable, memory. Fifteen
students were recruited to take part in the study. The participants were divided into three
groups of five and randomly assigned to one of the following drinks:
After drinking the caffeinated beverage, the participants were given a memory test (words
remembered from a list). The results are given in Table 31.2.
Table 31.2. Number of words recalled in memory test.
For an ANOVA, the null hypothesis is that the population means among the groups are the
same. In this case, H0: µA = µB = µC, where µA is the population mean number of words
recalled after people drink Coca Cola and similarly for µB and µC. The alternative or research
hypothesis is that there is some inequality among the three means. Notice that there is a lot of
variation in the number of words remembered by the participants. We break that variation into
two components:
(1) variation in the number of words recalled among the three groups also called
between-groups variation
Unit 31: One-Way ANOVA | Student Guide | Page 5
(2) variation in number of words among participants within each group also called
within-groups variation.
To measure each of these components, we’ll compute two different variances, the mean
square for groups (MSG) and the mean square error (MSE). The basic idea in gathering
evidence to reject the null hypothesis is to show that the between-groups variation is
substantially larger than the within-groups variation, and we do that by forming the ratio, which
we call F:

F = MSG/MSE
In the caffeine example, we have three groups. More generally, suppose there were k different
groups (each assigned to consume varying amounts of caffeine) with sample sizes n1, n2, …
nk. Then the null hypothesis is H0: µ1 = µ2 = . . . = µk and the alternative hypothesis is that
at least two of the population means differ. The formulas for computing the between-groups
variation and within-groups variation are given below:

MSG = [n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + . . . + nk(x̄k − x̄)²] / (k − 1)

MSE = [(n1 − 1)s1² + (n2 − 1)s2² + . . . + (nk − 1)sk²] / (N − k)

where N is the total number of observations, x̄ is the mean of all the observations, x̄1, x̄2, . . . , x̄k
are the sample means for each group, and s1, s2, . . . , sk are the group standard deviations.
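These formulas translate directly into code. The sketch below uses made-up summary statistics for three groups of five observations, not the actual values from Table 31.3.

```python
def anova_from_summary(ns, means, sds):
    """MSG, MSE, and F from group sizes, sample means, and sample
    standard deviations, following the formulas above."""
    k = len(ns)
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total  # mean of all obs
    msg = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / (k - 1)
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (n_total - k)
    return msg, mse, msg / mse

# Made-up summary statistics for three groups of five (not Table 31.3):
msg, mse, f = anova_from_summary([5, 5, 5], [8.0, 10.0, 12.0], [2.0, 2.0, 2.0])
# Before trusting the result, apply the rule of thumb used later in this
# unit: largest group sd / smallest group sd should be less than 2.
```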
We return to our three-group caffeine experiment to see how this works. To begin, we
calculate the sample means and standard deviations (See Table 31.3.).
Table 31.3. Group means and standard deviations.
All that is left is to find the p-value. If the null hypothesis is true, then the F-statistic has the F
distribution with 2 and 12 degrees of freedom. We use software to see how likely it would be
to get an F value at least as extreme as 5.78. Figure 31.5 shows the result giving a p-value of
around 0.017. Since p < 0.05, we conclude that the amount of caffeine consumed affected the
mean memory score.
Figure 31.5. F distribution with df1 = 2 and df2 = 12; the area under the density curve to the
right of F = 5.78 is 0.01746.
It takes a lot of work to compute F and find the p-value. Here’s where technology can help.
Statistical software such as Minitab, spreadsheet software such as Excel, and even graphing
calculators can calculate ANOVA tables. Table 31.4 shows output from Minitab. Now, match
the calculations above with the values in Table 31.4. Check out where you can find the values
for MSG, MSE, F, the degrees of freedom for F, and the p-value directly from the output of
ANOVA. That will be a time saver!
Table 31.4. ANOVA output from Minitab.
It is important to understand that ANOVA does not tell you which population means differ,
only that at least two of the means differ. We would have to use other tests to help us decide
which of the three population means are significantly different from each other. However,
we can also get a clue by plotting the data. Figure 31.6 shows comparative dotplots for the
number of words for each group. The sample means are marked with triangles. Notice that
the biggest difference in sample means is between groups A (34 mg caffeine) and C (160 mg
of caffeine). The sample means for groups B and C are quite close together. So, it looks as
if consuming Coca Cola doesn’t give the memory boost you could expect from consuming
coffee or Jolt Energy.
Figure 31.6. Comparative dotplots of number of words recalled, by group.
There is one last detail before jumping into running an ANOVA – there are some underlying
assumptions that need to be checked in order for the results of the analysis to be valid. What
we should have done first with our caffeine experiment, we will do last. Here are the three
things to check.

1. Each group’s data need to be an independent random sample from that population. In the
case of an experiment, the subjects need to be randomly assigned to the levels of the factor.
Check: The subjects in the caffeine-memory experiment were divided into groups. Groups
were then randomly assigned to the level of caffeine.
2. The data in each group come from a normally distributed population.
Check: The normal quantile plots of Words Recalled for each group are shown in Figure
31.7. Based on these plots, it seems reasonable to assume these data are from a Normal
distribution.
Figure 31.7. Normal quantile plots of words recalled for each group.
3. All populations have the same standard deviation. The results from ANOVA will be
approximately correct as long as the ratio of the largest standard deviation to the smallest
standard deviation is less than 2.
Check: The ratio of the largest to the smallest standard deviation is 2.236/1.789 or around
1.25, which is less than 2.
An analysis of variance or ANOVA is a method of inference used to test whether or not three
or more population means are equal. In a one-way ANOVA there is one factor that is thought
to be related to the response variable.
An analysis of variance tests the equality of means by comparing two types of variation,
between-groups variation and within-groups variation. Between-groups variation deals
with the spread of the group sample means about the grand mean, the mean of all the
observations. It is measured by the mean square for groups, MSG. Within-groups variation
deals with the spread of individual data values within a group about the group mean. It is
measured by the mean square error, MSE.
2. What was different about the clipboards that students were holding?
4. What is the name of the test statistic that results from ANOVA?
5. Was the professor able to conclude from the F-statistic that the population means differed
depending on the weight of the clipboard? Explain.
You will use the Wafer Thickness tool to collect data for this activity. There are three control
settings that affect wafer thickness during the manufacture of polished wafers used in the
production of microchips.
1. Leave Controls 2 and 3 set at level 2. Your first task will be to perform an experiment to
collect data and determine whether settings for Control 1 affect the mean thickness of polished
wafers.
a. Open the Wafer Thickness tool. Set Control 1 to level 1, and Controls 2 and 3 to level 2 (the
middle setting). In Real Time mode, collect data from 10 polished wafers. Store the data in a
statistical package or Excel spreadsheet or in a calculator list. Make a sketch of the histogram
produced by the interactive tool.
b. Set Control 1 to level 2. Leave Controls 2 and 3 set at level 2. Repeat (a). Sketch the second
histogram using the same scales as were used on the first. Store the data in your spreadsheet
or a calculator list.
c. Set Control 1 to level 3. Leave Controls 2 and 3 set at level 2. Repeat (a). Sketch your third
histogram, again using the same scales as were used on the first histogram. Store the data in
your spreadsheet or a calculator list.
d. Calculate the means and standard deviations for each of your three samples. Based on
the sample means and on your histograms, do you think that there is sufficient evidence that
changing the level of Control 1 changes the mean thickness of the polished wafers produced?
Or might these sample-mean differences be due simply to chance variation? Explain your
thoughts.
e. Use technology to run an ANOVA. State the null hypothesis being tested, the value of F, the
p-value, and your conclusion.
2. Your next task will be to perform an experiment to collect data and determine whether
settings for Control 2 affect the mean thickness of polished wafers.
a. Leave Controls 1 and 3 set at level 2. Adapt the process used in question 1(a – c) to collect
the data on Control 2.
b. Compute the standard deviations for the three samples. Is the underlying assumption of
equal standard deviations reasonably satisfied? Explain.
c. Provided you answered yes to (b), use technology to run an ANOVA. State the null
hypothesis being tested, the value of F, the p-value, and your conclusion. (If you answered no
to (b), skip this part.)
3. Your final task will be to perform an experiment to collect data and determine whether
settings for Control 3 affect the mean thickness of polished wafers.
a. Leave Controls 1 and 2 set at level 2. Adapt the process used in question 1(a – c) to collect
the data on Control 3.
b. Compute the standard deviations for the three samples. Is the underlying assumption of
equal standard deviations reasonably satisfied? Explain.
c. Provided you answered yes to (b), use technology to run an ANOVA. State the null
hypothesis being tested, the value of F, the p-value, and your conclusion. (If you answered no
to (b), skip this part.)
Table 31.5. Test results.
a. Calculate the mean test score for each group. Calculate the standard deviation of the test
scores for each group.
b. Make comparative dotplots for the test results of the three groups. Do you think that the
dotplots give sufficient evidence that there is a difference in population mean test results
depending on the type of noise? Explain.
c. Run an ANOVA. State the hypotheses you are testing. Show the calculations for the
F-statistic. What are the degrees of freedom associated with this F-statistic?
2. Not all hotdogs have the same calories. Table 31.6 contains calorie data on a random
sample of Beef, Poultry, and Veggie dogs. (One extreme outlier for Veggie dogs was omitted
from the data.) Does the mean calorie count differ depending on the type of hotdog? You first
encountered this topic in Unit 5, Boxplots.
Table 31.6. Calorie content of hotdogs.
Table 31.7. First-year college GPA by high school rating.
a. Verify that the standard deviations allow the use of ANOVA to compare population means.
b. Use technology to run an ANOVA. State the value of the F-statistic, the degrees of freedom
for F, the p-value of the test, and your conclusion.
c. Make boxplots that compare the calorie data for each type of hot dog. Add a dot to each
boxplot to mark the sample means. Do your plots help confirm your conclusion in (b)?
3. Many states rate their high schools using factors such as students’ performance, teachers’
educational backgrounds, and socioeconomic conditions. High school ratings for one state
have been boiled down into three categories: high, medium, and low. The question for one of
the state universities is whether or not college grade performance differs depending on high
school rating. Table 31.7 contains random samples of students from each high school rating
level and their first-year cumulative college grade point averages (GPA).
a. Calculate the sample means for the GPAs in each group. Based on the sample means
alone, does high school rating appear to have an impact on mean college GPA? Explain.
b. Check to see that underlying assumptions for ANOVA are reasonably satisfied.
a. The sample mean ACL scores for nursing, other health professional students, and
education majors were 46.44, 45.58, and 48.59, respectively. Do these sample means provide
sufficient evidence to conclude that there was some difference in population mean ACL scores
among these three majors? Explain.
b. A one-way analysis of variance was run to determine if there was a difference among the
three groups on mean ACL scores. Assuming that all students answered the NSSE questions
related to ACL, what were the degrees of freedom of the F-test?
c. The results from the ANOVA gave F = 8.382. Determine the p-value. What can you
conclude?
Data Set #1 (Table 31.8)                      Data Set #2 (Table 31.9)
Ratings for A Ratings for B Ratings for C Ratings for A Ratings for B Ratings for C
8 4 6 8 4 6
10 5 5 10 5 5
7 7 7 7 6 8
8 8 5 8 9 2
6 7 6 3 6 7
7 8 5 7 8 3
4 6 6 4 6 7
7 5 5 9 5 5
6 5 4 6 4 4
8 6 2 8 7 2
6 6 3 5 6 3
5 7 4 4 9 4
6 3 5 6 3 3
7 8 5 8 9 7
8 5 6 10 3 8
a. Find the sample means of each candy type based on the ratings in Table 31.8. Then do the
same for the ratings in Table 31.9. Based on these results, can you tell if there is a significant
difference in population mean ratings among the different types of candies? Explain.
b. Make comparative boxplots for the data in Table 31.8. Then do the same for the data in
Table 31.9. For both sets of plots, mark the mean with a dot on each boxplot. For which data
set is it more likely that the results from a one-way ANOVA will be significant? Explain.
c. Run an ANOVA based on Data Set #1. Report the value of the F-statistic, the p-value, and
your conclusion. Then do the same for Data Set #2. Explain why you should not be surprised
by the results.
2. The data in Table 31.10 were part of a study to investigate online questionnaire design.
The researcher was interested in the effect that type of answer entry and type of question-to-
question navigation would have on the time it would take to complete online surveys. Twenty-
Display Type Navigation Type Time (sec) Display Type Navigation Type Time (sec)
1 1 97 1 3 117
3 3 83 2 2 74
1 1 102 1 3 66
3 3 85 3 1 62
1 1 92 1 2 93
3 3 71 3 1 62
1 2 105 1 2 64
3 3 92 3 1 48
1 2 67 1 2 57
3 3 71 3 1 96
1 2 54 2 3 68
3 3 66 3 1 90
1 3 63 2 3 71
2 1 61 3 1 74
1 3 101 2 3 74
2 1 117 3 2 78
1 3 124 2 3 92
2 1 97 3 2 71
2 1 126 2 3 80
3 2 83 3 2 49
2 1 107 2 3 67
3 2 88 1 1 101
2 1 88 2 2 111
3 2 62 1 1 103
2 2 55 2 2 80
1 3 73 1 1 103
2 2 126 2 2 111
b. Make comparative boxplots of the times for each level of Display Type. Mark the location
of the means on your boxplot. Do you see anything unusual in the data that might make it not
appropriate to use ANOVA? If so, follow up with normal quantile plots to check the assumption
of normality.
c. Run an ANOVA using Display Type as the factor. State the null hypothesis you are testing.
Report the value of the F-statistic, the p-value, and your conclusion.
d. Make comparative boxplots of the times for each level of Navigation Type. Mark the location
of the means on your boxplot. Do you see anything unusual in these data that might make
it not appropriate to use ANOVA? If so, follow up with normal quantile plots to check the
assumption of normality.
e. Run an ANOVA using Navigation Type as the factor. Report the value of the F-statistic, the
degrees of freedom of the F-statistic, the p-value, and your conclusion.
f. Based on this study, what recommendations would you make to online questionnaire
designers?
3. A group researching wage discrepancies among the four regions of the U.S. focused on full-
time, hourly-wage workers between the ages of 20 and 40. Researchers randomly selected
200 workers meeting the age criteria from the northeast, midwest, south and west and
recorded their hourly pay rates. The mean hourly rate for the combined regions was $15.467. A
summary of the data is given in Table 31.11. The researchers ran an ANOVA on these data.
Table 31.11. Summary of hourly rate data.
c. Calculate the value of the F-statistic and give its degrees of freedom. Show calculations.
e. Based on the evidence in Table 31.11 and your answers to (a – d), what conclusions can the
researchers make?
4. A study focusing on women’s wages was investigating whether there was a significant
difference in salaries in four occupations commonly (but not exclusively) held by women –
cashier, customer service representative, receptionist, and secretary/administrative assistant.
Weekly wages from 50 women working in each occupation are recorded in Table 31.12.
Table 31.12. Weekly wages of women in four occupations.
Data from 2012 March Supplement, Current Population Survey.
c. Run an ANOVA. Record the ANOVA table and highlight the value of F, and the p-value.