Unit 22: Sampling Distributions
Summary of Video
If we know an entire population, then we can compute population parameters such as the
population mean or standard deviation. However, we generally don’t have access to data
from the entire population and must base our information about a population on a sample.
From samples, we compute statistics such as sample means or sample standard deviations.
However, if we resample, chances are good that we won’t get the same results.
This video begins with a population of heights from students in a third grade class at Monica
Ros School. A graphic display of the population distribution of heights shows a roughly normal
shape with a mean µ = 53.4 inches and standard deviation σ = 1.8 inches (See Figure 22.1.).
Next, we draw random samples of size four from the class and record the heights. Figure 22.2
shows the results from five samples along with their sample means, which can be found in
Table 22.1. Notice that the sample means vary from sample to sample, except for Samples 3
and 4 where the sample means match even though the data values differ.
We can keep sampling until we’ve selected all samples of size four from this population of
20 students. If we plot the sample means of all possible samples of size four, we get what is
called the sampling distribution of the sample mean (See bottom graph in Figure 22.3.).
Sample   Sample Mean, x
1        53.00
2        52.25
3        52.75
4        52.75
5        53.25
Table 22.1. Sample means.
Now, compare the sampling distribution of x to the population distribution. Notice that
both distributions are approximately normal with mean 53.4 inches. However, the sampling
distribution of x is not as spread out as the population distribution.
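The full enumeration is small enough to carry out directly. The sketch below uses hypothetical heights (the actual classroom data are not reproduced in the text) to list all C(20, 4) = 4845 possible samples and confirm that the sample means center on the population mean but with less spread:

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical heights (inches) for a class of 20 third graders;
# the real classroom data are not reproduced in the text.
heights = [50, 51, 51, 52, 52, 52, 53, 53, 53, 53,
           54, 54, 54, 54, 55, 55, 55, 56, 56, 57]

pop_mean = mean(heights)
pop_sd = pstdev(heights)

# Every possible sample of size 4: C(20, 4) = 4845 samples.
sample_means = [mean(s) for s in combinations(heights, 4)]

print(len(sample_means))                                 # 4845
print(round(mean(sample_means), 2), round(pop_mean, 2))  # same center
print(round(pstdev(sample_means), 2), round(pop_sd, 2))  # narrower spread
```

The mean of all 4845 sample means equals the population mean exactly, while their standard deviation is well below the population standard deviation.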
Next, we put what we have learned about the sampling distribution of the sample mean to
use in the context of manufacturing circuit boards. Although the scene depicted in the video is
one that you don’t see much anymore in the United States, we can still explore how statistics
can be used to help control quality in manufacturing. A key part of the manufacturing process
of circuit boards is when the components on the board are connected together by passing
it through a bath of molten solder. After boards have passed through the soldering bath, an
inspector randomly selects boards for a quality check. A score of 100 is the standard, but there
is variation in the scores. The goal of the quality control process is to detect if this variation
starts drifting out of the acceptable range, which would suggest that there is a problem with
the soldering bath.
Based on historical data collected when the soldering process was in control, the quality
scores have a normal distribution with mean 100 and standard deviation 4. The inspector’s
random sampling of boards consists of samples of size five. Hence, the sampling distribution
of x is normal with a mean of 100 and standard deviation of 4/√5 ≈ 1.79. The inspector uses
this information to make an x control chart, a plot of the values of x against time. A normal
curve showing the sampling distribution of x has been added to the side of the control chart.
Recall from the 68-95-99.7% rule that we expect 99.7% of the scores to be within three
standard deviations of the mean. So, we have added control limits that are three standard
deviations (3 × 1.79 or 5.37 units) on either side of the mean (See Figure 22.4.). A point outside
either of the control limits is evidence that the process has become more variable, or that its
mean has shifted – in other words, that it’s gone out of control. As soon as an inspector sees a
point such as the one outside the upper control limit in Figure 22.4, it’s a signal to ask, what’s
gone wrong? (For more information on control charts, see Unit 23, Control Charts.)
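The control limits described here take only a line or two to compute; a quick sketch using the in-control values from the text (µ = 100, σ = 4, n = 5):

```python
from math import sqrt

mu, sigma, n = 100, 4, 5        # in-control mean, SD, and sample size

sigma_xbar = sigma / sqrt(n)    # SD of the sample mean: 4/sqrt(5) ≈ 1.79
lcl = mu - 3 * sigma_xbar       # lower control limit
ucl = mu + 3 * sigma_xbar       # upper control limit

print(round(lcl, 2), round(ucl, 2))   # 94.63 105.37
```

Any sample mean outside roughly 94.63 to 105.37 would signal that the soldering process may be out of control.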
So far we’ve been looking at population distributions that follow a roughly normal curve. Next,
we look at a distribution of lengths of calls coming into the Mayor’s 24 Hour Hotline call center
in Boston, Massachusetts. Most calls are relatively brief but a few last a very long time. The
shape of the call-length distribution is skewed to the right as shown in Figure 22.5.
To gain insight into the sampling distribution of the sample mean, x , for samples of size 10,
we randomly selected 40 samples of size 10 and made a histogram of the sample means.
We repeated this process for samples of size 20 and then again for samples of size 60. The
histograms of the sample means appear in Figure 22.6.
Now let’s compare our sampling distributions (Figure 22.6) with the population distribution
(Figure 22.5). Notice that the spread of all the sampling distributions is smaller than the spread
of the population distribution. Furthermore, as the sample size n increases, the spread of the
sampling distributions decreases and their shape becomes more symmetric. By the time
n = 60, the sampling distribution appears approximately normally distributed. What we have
uncovered here is one of the most powerful tools statisticians possess, called the Central Limit
Theorem. This states that, regardless of the shape of the population, the sampling distribution
of the sample mean will be approximately normal if the sample size is sufficiently large. It is
because of the Central Limit Theorem that statisticians can generalize from sample data to the
larger population. We will be seeing applications of the Central Limit Theorem in later units on
confidence intervals and significance tests.
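The resampling experiment is easy to reproduce. The sketch below simulates right-skewed call lengths as exponential with mean 180 seconds (an assumption; the actual hotline data are not reproduced here) and shows the spread of the sample means shrinking as n grows:

```python
import random
from statistics import mean, pstdev

random.seed(1)

def sample_mean(n):
    """Mean of n simulated call lengths (exponential, mean 180 s)."""
    return mean(random.expovariate(1 / 180) for _ in range(n))

# 40 samples for each sample size, as in the video.
for n in (10, 20, 60):
    means = [sample_mean(n) for _ in range(40)]
    print(n, round(mean(means)), round(pstdev(means)))
```

Each run will differ, but the pattern matches Figure 22.6: the centers stay near the population mean while the spread falls roughly in proportion to 1/√n.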
C. Know that the sampling distributions of some common statistics are approximately normally
distributed; in particular, the sample mean x of a simple random sample drawn from a normal
population has a normal distribution.
D. Know that the standard deviation of the sampling distribution of x depends on both the
standard deviation of the population from which the sample was drawn and the sample size n.
E. Grasp a key concept of statistical process control: Monitor the process rather than examine
all of the products; all processes have variation; we want to distinguish the natural variation of
the process from the added variation that shows that the process has been disturbed.
F. Make an x control chart. Use the 68-95-99.7% rule and the sampling distribution of x to
help identify if a process is out of control.
G. Be familiar with the Central Limit Theorem: the sample mean x of a large number of
observations has an approximately normal distribution even when the distribution of individual
observations is not normal.
If repeated random samples are chosen from the same population, the values of sample
statistics such as x will vary from sample to sample. This variation follows a regular pattern
in the long run; the sampling distribution is the distribution of values of the statistic in a very
large number of samples. For example, suppose we start with data from the population
distribution shown in Figure 22.7. This population is skewed to the right, and clearly not
normally distributed.
Now, we draw a random sample of size 50 from this population and compute two statistics,
the mean and the median, and get 20.7 and 19.8, respectively. Next we take another sample
of size 50 and compute the mean and median for that sample. We keep resampling until we
have a total of 1000 samples. Histograms of the 1000 means and 1000 medians from those
samples appear in Figures 22.8 and 22.9, respectively. In both cases, the sampling distribution
of the statistic appears approximately normally distributed. The sampling distribution of the
sample mean, x , is centered around 24 and the sampling distribution of the sample median at
around 22.
Figure 22.8. Distribution of the sample mean from 1000 samples of size 50.
Figure 22.9. Distribution of the sample median from 1000 samples of size 50.
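The simulation behind Figures 22.8 and 22.9 can be sketched in a few lines; the skewed population here is a simulated stand-in (exponential with mean 8) for the distribution in Figure 22.7:

```python
import random
from statistics import mean, median

random.seed(0)

# A right-skewed stand-in population for Figure 22.7.
population = [random.expovariate(1 / 8) for _ in range(10000)]

means, medians = [], []
for _ in range(1000):                     # 1000 samples of size 50
    sample = random.sample(population, 50)
    means.append(mean(sample))
    medians.append(median(sample))

# Both sampling distributions come out roughly normal; the mean's center
# sits above the median's because the population is skewed right.
print(round(mean(means), 1), round(mean(medians), 1))
```

Histograms of `means` and `medians` would reproduce the bell shapes of Figures 22.8 and 22.9.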
Although basic statistics such as the sample mean, sample median and sample standard
deviation all have sampling distributions, the remainder of this unit will focus on the sampling
distribution of the sample mean, x . If x is the mean of a simple random sample of size n from
a population with mean µ and standard deviation σ, then the mean and standard deviation of
the sampling distribution of x are:
µx = µ
σx = σ/√n
Control charts for the sample mean x provide an immediate application for the sampling
distribution of x . In the 1920s, Walter Shewhart of Bell Laboratories noticed that production
workers were readjusting their machines in response to every variation in the product. If the
diameter of a shaft, for example, was a bit small, the machine was adjusted to cut a larger
diameter. When the next shaft was a bit large, the machine was adjusted to cut smaller. Any
process has some variation, so this constant adjustment did nothing except increase variation.
Shewhart wanted to give workers a way to distinguish between the natural variation in the
process and the extraordinary variation that shows that the process has been disturbed and
hence, actually requires adjustment.
The result was the Shewhart x control chart. The basic idea is that the distribution of the
sample mean x is close to normal if either the sample size is large or individual measurements are
normally distributed. So, almost all the x -values lie within ±3 standard deviations of the mean.
The correct standard deviation here is the standard deviation of x , which is σ/√n (where σ is
the standard deviation of individual measurements). So, the control limits µ ± 3σ/√n contain
the range in which sample means can be expected to vary if the process remains stable. The
control limits distinguish natural variation from excessive variation.
If x is the mean of a simple random sample (SRS) of size n from a population having mean µ
and standard deviation σ, then the mean and standard deviation of x are:
µx = µ
σx = σ/√n
If a population has a normal distribution with mean µ and standard deviation σ, then the
sampling distribution of the sample mean, x , of n independent observations has a normal
distribution with mean µ and standard deviation σ/√n.
If the population is not normal but n is large (say n > 30), then the Central Limit Theorem tells
us that the sampling distribution of the sample mean, x , of n independent observations
has an approximately normal distribution with mean µ and standard deviation σ/√n.
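Both statements use the same two formulas, which can be collected in a small helper (the function name is our own choice):

```python
from math import sqrt

def xbar_distribution(mu, sigma, n):
    """Mean and standard deviation of the sampling distribution of x-bar
    for an SRS of size n from a population with mean mu and SD sigma."""
    return mu, sigma / sqrt(n)

# The solder-quality example from earlier in the unit:
m, s = xbar_distribution(100, 4, 5)
print(m, round(s, 2))   # 100 1.79
```

Whether the population is normal (any n) or non-normal (large n), these are the center and spread of the approximating normal distribution.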
2. Does statistical process control inspect all the items produced after they are finished?
3. The inspector samples five circuit boards at regular intervals and finds the mean solder
quality score x for these five boards. Do we expect x to be exactly 100 if the soldering
process is functioning properly?
4. If the quality of individual boards varies according to a normal distribution with mean µ = 100
and standard deviation σ = 4 , what will be the distribution of the sample averages, x ?
(Recall the sample size is n = 5.)
5. In general, is the mean of several observations more or less variable than single
observations from a population? Explain.
1. Your instructor has a container filled with numbered strips as shown in Table 22.2. Make a
histogram of this distribution. Describe its shape.
2. You will need 100 samples of size 9. Your instructor will provide instructions for gathering
these samples. After the data have been collected, you will need a copy of the table of results
before you can answer parts (a) and (b).
a. Find the sample mean for each of the samples. Record the sample means in the results
table. (Save your results table. You will need this table again for the activity in Unit 24,
Confidence Intervals.)
b. To get an idea of the characteristics of the sampling distribution for the sample mean, make
a histogram of the sample means. (Use the same scaling on the horizontal axis that you used
in question 1.) Compare the shape, center and spread of the sampling distribution to that of the
original distribution (question 1).
3. A population has a uniform distribution with density curve as shown in Figure 22.10.
Figure 22.10. Density curve of the uniform distribution on the interval from 0 to 1.
a. Your instructor will give you directions for using technology to generate 100 samples of size
9 from this distribution.
b. Once you have your 100 samples, find the sample means.
c. Make a histogram of the 100 sample means. Describe the shape of your histogram.
Compare the center of this sampling distribution with the center of the population distribution
from Figure 22.10.
b. Why do you think the laboratory reported a result based on the mean of three weighings?
2. The scores of students on the ACT college entrance examination in a recent year had the
normal distribution with mean µ = 18.6 and standard deviation σ = 5.9 .
a. What fraction of all individual students who take the test have scores 21 or higher?
b. Suppose we choose 55 students at random from all who took the test nationally. What is the
distribution of average scores, x , in a sample of size 55? In what fraction of such samples will
the average score be 21 or higher?
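A numerical check of parts (a) and (b) needs only the standard normal CDF, which can be built from the error function (this is a sketch of the calculation, not a provided answer key):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 18.6, 5.9

# (a) fraction of individual students scoring 21 or higher
print(round(1 - phi((21 - mu) / sigma), 4))

# (b) the sample mean of 55 students has SD sigma/sqrt(55),
#     so the same cutoff now sits about 3 SDs above the mean
print(round(1 - phi((21 - mu) / (sigma / sqrt(55))), 4))
```

The contrast is the point of the exercise: a score of 21 is unremarkable for one student but very unusual as an average of 55 students.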
3. The number of accidents per week at a hazardous intersection varies with mean 2.2 and
standard deviation 1.4. This number, x, takes only whole-number values, and so is certainly
not normally distributed.
a. Let x be the mean number of accidents per week at the intersection during a year (52
weeks). What is the approximate distribution of x according to the Central Limit Theorem?
b. What is the approximate probability that, on average, there are fewer than two accidents per
week over a year?
c. What is the approximate probability that there are fewer than 100 accidents at the
intersection in a year? (Hint: Restate this event in terms of x .)
4. A company produces a liquid that can vary in its pH levels unless the production process is
carefully controlled. Quality control technicians routinely monitor the pH of the liquid. When the
process is in control, the pH of the liquid varies according to a normal distribution with mean
µ = 6.0 and standard deviation σ = 0.9.
c. Make an x chart by plotting the sample means versus the sample number. Draw horizontal
reference lines at the mean and lower and upper control limits.
d. Do any of the sample means fall below the lower control limit or above the upper control
limit? This is one indication that a process is “out of control.”
e. Apart from sample means falling outside the lower and upper control limits, is there any
other reason why you might be suspicious that this process is either out of control or going out
of control? Explain.
a. If the process is in control, what percentage of the bottle caps would have diameters outside
the chemical manufacturer’s specification limits?
b. The manufacturer of the bottle caps has instituted a quality control program to prevent the
production of defective caps. As part of its quality control program, the manufacturer measures
the diameters of a random sample of n = 9 bottle caps each hour and calculates the sample
mean diameter. If the process is in control, what is the distribution of the sample mean x ? Be
sure to specify both the mean and standard deviation of x ’s distribution.
c. The cap manufacturer has a rule that the process will be stopped and inspected any time
the sample mean falls below 0.499 inch or above 0.501 inch. If the process is in control, find
the proportion of times it will be stopped for inspection.
2. A study of rush-hour traffic in San Francisco records the number of people in each car
entering a freeway at a suburban interchange. Suppose that this number, x, has mean 1.5
and standard deviation 0.75 in the population of all cars that enter at this interchange during
rush hours.
b. Traffic engineers estimate that the capacity of the interchange is 700 cars per hour.
According to the Central Limit Theorem, what is the approximate distribution of the mean
number of persons, x , per car in 700 randomly selected cars at this interchange?
c. What is the probability that 700 cars will carry more than 1075 people? (Hint: Restate the
problem in terms of the average number of people per car.)
a. Let x be the sample mean from 10 randomly selected calls. What is the mean and
standard deviation of x ? What, if anything, can you say about the shape of the distribution of
x ? Explain.
b. Let x be the sample mean from 100 randomly selected calls. What is the mean and
standard deviation of x ? What, if anything, can you say about the shape of the distribution of
x ? Explain.
c. In a random sample of 100 calls from the call center, what is the probability that the average
length of these calls will be over 2 minutes?
Summary of Video
Statistical inference is a powerful tool. Using relatively small amounts of sample data we can
figure out something about the larger population as a whole. Many businesses rely on this
principle to improve their products and services. Management theorist and statistician W.
Edwards Deming was among the first to champion the idea of statistical process management.
Initially, Deming found the most receptive audience to his management theories in Japan.
After World War II, Japanese industry was shattered. Rebuilding was a daunting challenge,
one that Japanese business leaders took on with great determination. In the decades after the
war, they transformed the phrase “Made in Japan” from a sign of inferior, cheaply-made goods
to a sign of quality respected the world over. Deming’s emphasis on long-term thinking and
continuous process improvement was vital in bringing about the so-called “Japanese Miracle.”
At first, Deming’s ideas were not as well received in America. Deming criticized American
managers for their lack of understanding of statistics. But as time went on – and competition
from Japan grew – companies in the U.S. began to embrace Deming’s ideas on statistical
process control. Now his principles of total quality management are an integral part of
American business, helping workers uncover problems and produce higher quality goods
and services.
In statistics, a process is a chain of steps that turns inputs into outputs. A process could be
anything from the way a factory turns raw iron into a finished bolt to the way you turn raw
ingredients into a hot dinner. Statisticians say a process that is running smoothly, with its
variables staying within an expected range, is in control. Deming was adamant that statistics
could help in understanding a manufacturing process and identifying its problems, or when
things were out of control or about to go out of control. He advocated the use of control charts
as a way to monitor whether a process is in or out of control. This technique is widely used to
this day as we’ll see in the video in a visit to Quest Diagnostics’ lab.
Quest performs medical tests for healthcare providers. So, for example, at Quest a patient’s
blood sample is the input of the process and the test result is the output. A courier picks
up specimens and transports them to the processing lab, where they are sorted by time of
arrival and urgency of test. Technicians verify each specimen and confirm the doctor’s orders.
Then the specimens are barcoded and are ready to be passed on for testing. Quest’s Seattle
Quest needed to know where the process stood at present: How close were they to hitting
the 2 a.m. target and how much did finish times vary? Keep in mind that all processes have
variation. Common cause variation is due to the day-to-day factors that influence the process.
In Quest’s case, it could be things like a printer running out of paper and needing to be refilled,
or a worker calling in sick. It is the normal variation in a system.
Processes are also susceptible to special cause variation – that’s when sudden, unpredictable
events throw a wrench into the process. Examples of special cause variation would be
blackouts that shut down the lab’s power, or a major crash on the highway that would keep the
samples from being delivered to the lab. Quest needed to figure out how their process was
running on a day-to-day basis when they were only up against common cause variation.
Quest used six months of finish-time data to set up control limits and then created a control
chart, which is a graphic way to keep track of variation in finish times. Figure 23.1 shows a
control chart for month 1. The center line is the target finish time. The control limits at 12:00
a.m. and 4:00 a.m. are set three standard deviations above and below the center line. The
data points are the finish times that Quest tracked over a one-month period.
Quest assumed that their nightly finish times are normally distributed. In Figure 23.2, we add a
graph of the normal distribution to the control chart. Remember, in a normal distribution 68% of
your data is within one standard deviation of the mean, 95% is within two standard deviations,
and 99.7% is within three standard deviations.
Using the control chart Quest was able to figure out when their process had been disturbed
and gone out of control, or was heading that way. One dead giveaway that the finish times
are out of control is if a point falls outside the control limits. That should only happen 0.3% of
the time if everything is running smoothly. Take a look at Figure 23.3, which highlights what
happened toward the end of the one-month cycle.
There are other indicators that something suspicious might be going on besides points falling
outside the control limits. For example, if too many points are on one side of the center line or
if a strong pattern emerges (hence, the variability is not random) – then it’s time to investigate.
Mapping finish times on the control chart helps monitor the process, and alerts techs right
away that something has been disturbed. Then they can track down and address the
cause immediately.
Another way the control chart helped Quest improve efficiency was by revealing some of
the causes of variation in the process, which the team could then address. Quest actually
C. Know how to construct a run chart and describe patterns/trends in data over time.
D. Know how to construct an x chart and describe the changes in sample means over time.
Content Overview
Consider the problem of quality control in the manufacturing process of turning ingots of
silicon into polished wafers used to make microchips. (See Figure 23.4.) Assume that the
manufacturer wants the polished wafers to have consistent thickness with a target thickness
of 0.5 millimeters. A sample of 50 polished wafers is selected as a batch is being produced.
Table 23.1 contains these data.
0.555 0.543 0.533 0.538 0.533 0.529 0.526 0.522 0.518 0.519
0.516 0.515 0.513 0.515 0.512 0.510 0.508 0.507 0.507 0.507
0.506 0.506 0.506 0.505 0.503 0.502 0.500 0.498 0.499 0.496
0.497 0.493 0.492 0.491 0.487 0.488 0.486 0.485 0.483 0.484
0.482 0.479 0.476 0.476 0.474 0.471 0.471 0.469 0.454 0.447
Table 23.1. Wafer thickness from sample of 50 polished wafers.
In order to gain a sense of the distribution of wafer thickness, a quality control technician
constructs the histogram shown in Figure 23.5.
Figure 23.5. Histogram of wafer thickness (mm) from the sample in Table 23.1.
The histogram indicates that the distribution of wafer thickness is approximately normal. The
sample mean is 0.50064, which is pretty close to the target value. Furthermore, the standard
deviation is 0.02227, which is relatively small compared to the mean. The analysis thus far
supports the conclusion that the process is in control.
The sample mean and standard deviation together with the histogram provide information on
the overall pattern of the sample data. However, there is more to quality control than simply
studying the overall pattern. Manufacturers also keep track of the run order, the order in which
the data are collected. For the data in Table 23.1, the run order may relate to which part of the
ingot – top, middle, or bottom – the wafers came from, or it may relate to the order in which
wafers were fed through the grinding and polishing machines. If a process is stable or in
control, the order in which data are collected, or the time in which they are processed, should
not affect the thickness of polished wafers. One way to check that the production processes of
polished wafers are in control is by creating a run chart.
A run chart is a scatterplot of the data versus the run order. To help visualize patterns over
time, the dots in the scatterplot are usually connected. Table 23.1 lists the data values in the
order they were collected, starting with the first row 0.555, 0.543, . . . , 0.519, followed by the
second row, third row, fourth row and ending with 0.447, the last entry in the fifth row. So, the
run order for 0.555 is 1, for 0.543 is 2, and so forth until you get to the run order for 0.447,
which is 50. Figure 23.6 shows the run chart for the wafer thickness data. A center line has
been drawn on the chart at the target thickness of 0.5 millimeters.
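Reading Table 23.1 row by row gives the run order directly; a short sketch pairing each thickness with its run order and confirming the downward drift:

```python
from statistics import mean

# Table 23.1, read row by row: run order 1 is 0.555, run order 50 is 0.447.
rows = [
    [0.555, 0.543, 0.533, 0.538, 0.533, 0.529, 0.526, 0.522, 0.518, 0.519],
    [0.516, 0.515, 0.513, 0.515, 0.512, 0.510, 0.508, 0.507, 0.507, 0.507],
    [0.506, 0.506, 0.506, 0.505, 0.503, 0.502, 0.500, 0.498, 0.499, 0.496],
    [0.497, 0.493, 0.492, 0.491, 0.487, 0.488, 0.486, 0.485, 0.483, 0.484],
    [0.482, 0.479, 0.476, 0.476, 0.474, 0.471, 0.471, 0.469, 0.454, 0.447],
]
thickness = [t for row in rows for t in row]
run_chart = list(enumerate(thickness, start=1))   # (run order, thickness)

# The drift is visible without a plot: the first ten wafers average
# noticeably thicker than the last ten.
print(round(mean(thickness[:10]), 4), round(mean(thickness[-10:]), 4))
```

Plotting `run_chart` with connected dots would reproduce Figure 23.6.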
Figure 23.6. Run chart for wafer thickness data from Table 23.1.
Even though the overall pattern of the data gave no indication that there were any problems
with the grinding and polishing processes, it is clear from the run chart in Figure 23.6 that the
thickness of polished wafers is decreasing over time. The process needs to be stopped so that
adjustments can be made to the grinding and polishing operations.
The run chart involved plotting individual data values over time (run order). Another approach
is to select samples from batches produced over regular time intervals. For example, a quality
control plan for the polished wafers might call for routine collection of a sample of n polished
wafers from batches produced each hour. The thickness of each wafer in the sample is
recorded and the mean thickness, x , is calculated. The information on mean thickness can be
used to determine if the process is out of control at a particular time and to track changes in
the process over time.
Suppose when the grinding and polishing processes are in control, the distribution of the
individual wafers can be described by a normal distribution with mean µ = 0.5 millimeters and
standard deviation σ = 0.02 millimeters (similar to the data pattern in Figure 23.5). From Unit
22, Sampling Distributions, we know that under this condition the hourly sample means, x ,
based on samples of size n are normally distributed with the following mean and standard
deviation:
µx = µ = 0.5 millimeters
σx = σ/√n = 0.02/√n millimeters
Each hour a technician collects samples of four polished wafers, measures their thickness,
records the values, and then calculates the sample mean. Suppose that the data in Table 23.2
come from samples collected over an eight-hour period.
With n = 4, the standard deviation of x is 0.02/√4 = 0.01 millimeters. From the 68-95-99.7%
rule, we expect:
68% of the x values to be within the interval 0.5 mm ± 0.01 mm, or between 0.49 mm and
0.51 mm.
95% of the x values to be within the interval 0.5 mm ± 2(0.01) mm, or between 0.48 mm and
0.52 mm.
99.7% of the x values to be within the interval 0.5 mm ± 3(0.01) mm, or between 0.47 mm and
0.53 mm.
Next, we make an x chart, which is a scatterplot of the sample means versus the sample
order. We draw a reference line at μ = 0.5 called the center line. We use the values from the
68-95-99.7% Rule to provide additional reference lines in our x chart. The lower and upper
endpoints on the 99.7% interval are called the lower control limit (LCL) and upper control limit
(UCL), respectively. Figure 23.7 shows the completed x chart.
Figure 23.7. x chart of sample mean thickness versus sample number.
The x chart in Figure 23.7 does not appear to indicate any problems that warrant stopping the
grinding or polishing processes to make adjustments. All of the points except one fall within
one σ/√n of the mean, in other words, fall between the reference lines corresponding to
0.49 and 0.51. However, as we add additional points, we will need some guidelines – a set of
decision rules – that tell us when the process is going out of control. The decision rules below
are based on a set of rules developed by the Western Electric Company. Although they are
widely used, they are not the only set of decision rules.
Decision Rules:
The following rules identify a process that is becoming unstable or is out of control. If any
of the rules apply, then the process should be stopped and adjusted (or the problem fixed)
before resuming production.
Rule 1: Any single data point falls below the LCL or above the UCL.
Rule 2: Two of three consecutive points fall beyond the 2σ/√n limit, on the same side of
the center line.
Rule 3: Four out of five consecutive points fall beyond the σ/√n limit, on the same side of
the center line.
Rule 4: A run of 9 consecutive points (in other words, nine consecutive points on the same
side of the center line).
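These four rules are mechanical enough to code directly. The sketch below scans a sequence of sample means and reports the first rule that fires (the function name and windowing details are our own choices):

```python
def first_rule_violated(means, mu, sd_xbar):
    """Scan sample means in order; return the number of the first
    decision rule that fires, or None if the process looks in control."""
    for i, x in enumerate(means):
        # Rule 1: a single point beyond the 3-sigma control limits.
        if abs(x - mu) > 3 * sd_xbar:
            return 1
        # Rule 2: two of the last three points beyond 2 sigma, same side.
        last3 = means[max(0, i - 2):i + 1]
        if (sum(v > mu + 2 * sd_xbar for v in last3) >= 2
                or sum(v < mu - 2 * sd_xbar for v in last3) >= 2):
            return 2
        # Rule 3: four of the last five points beyond 1 sigma, same side.
        last5 = means[max(0, i - 4):i + 1]
        if (sum(v > mu + sd_xbar for v in last5) >= 4
                or sum(v < mu - sd_xbar for v in last5) >= 4):
            return 3
        # Rule 4: nine consecutive points on the same side of the center line.
        last9 = means[max(0, i - 8):i + 1]
        if len(last9) == 9 and (all(v > mu for v in last9)
                                or all(v < mu for v in last9)):
            return 4
    return None

# Two consecutive means above the 2-sigma line (0.52) trigger Rule 2,
# echoing what happens with Samples 10 and 11 in the wafer example.
print(first_rule_violated([0.50, 0.505, 0.523, 0.524], mu=0.5, sd_xbar=0.01))  # 2
```

In practice such a check would run each time a new sample mean is added to the chart.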
Figure 23.8 shows the updated x chart that includes the means from the seven
additional samples.
Now, we apply the decision rules. This time, we find that Rule 2 applies. Data points
associated with Samples 10 and 11 fall above 0.52 (which, in this case, is above the 2σ/√n
limit). According to Rule 2 the process should be stopped after observing Sample 11’s
x -value.
The x chart monitors one statistic, the sample mean, over time. The x chart is only one
type of control chart. As mentioned earlier, the manufacturer is also interested in producing a
consistent product. So, instead of tracking the sample mean, the quality control plan could also
track the sample standard deviations, or the sample ranges over time. More generally, control
charts are scatterplots of sample statistics (or individual data values) versus sample order and
are commonly used tools in statistical process control.
When a process is running smoothly, with its variables staying within an acceptable range,
the process is in control. When the process becomes unstable or its variables are no longer
within an acceptable range, the process is out of control.
A run chart is a scatterplot of the data values versus the order in which these values are
collected. The chart displays process performance over time. Patterns and trends can be
spotted and then investigated.
Control charts are used to monitor the output of a process. The charts are designed to
signal when the process has been disturbed so that it is out of control. Control charts rely on
samples taken over regular intervals. Sample statistics (for example, mean, standard deviation,
range) are calculated for each sample. A control chart is a scatterplot of a sample statistic (the
quality characteristic) versus the sample number. Figure 23.9 shows a generic control chart.
Figure 23.9. A generic control chart: the quality characteristic plotted against sample number, with center line, UCL, and LCL.
The center line on a control chart is generally the target value or the mean of the quality
characteristic when the process is in control. The upper control limit (UCL) and lower
control limit (LCL) on a control chart are generally set ±3σ/√n from the center line.
Decision rules consist of a set of rules used to identify when a process is becoming unstable
or going out of control. Decision rules help quality control managers decide when to stop the
process in order to fix problems or make adjustments.
5. In Quest’s control chart, how did they determine where to set the upper and lower
control limits?
For this activity, you will play the role of a semiconductor quality control manager in charge of
monitoring the thickness of polished wafers. Open the Control Chart tool from the Interactive
Tools menu. You will be working with x charts. The activity questions follow the list of steps
below.
Step 1: Select a set of values for the mean µ and standard deviation σ. You have three
possible choices for each of these parameters.
For now, you will work through the construction of at least two control charts with your
selection. In the real world, these values would be determined from past data collected when
the process was known to be in control.
Step 2: Since you are in charge of the quality control plan, decide on the sample size n you
would like to use for monitoring the process. You have three choices: 5, 10, or 20.
Keep in mind the following: The more wafers you sample, the more time it will take, and the
more it will cost. On the other hand, with larger samples, results are more precise.
Step 3: Begin in Step-By-Step mode. In this mode, you will get feedback immediately after
each decision that you make. If you make a mistake, you will be told to start over and will need
to click the “Start Over” button. Once you feel confident about your decisions, you can change
to Continuous mode.
Step 4: Calculate the lower control limit to four decimals and enter its value in the box for LCL.
Calculate the upper control limit to four decimals and enter its value in the box for
UCL. Click the “Change Control Limits” button.
If your calculations are correct, control lines will appear in the x̄ chart. In Step-By-Step mode,
you will get feedback (see bottom of screen) if you have made a mistake. The feedback will
say: Recalculate control limit values. To correct the error, enter new values for LCL and UCL
and then click the “Change Control Limits” button.
Step 5: Click on the “Collect Sample Data” button. The data will appear in a column under
the heading Thickness (mm) near the top of your screen. To calculate x̄, click the
“Calculate Mean” button. The mean will appear underneath the column.
Step 7: Make a decision. Your possible decisions are: (1) Continue Process, which means that
you have decided the process is in control; or (2) Stop Process, which means that you
have decided to shut down the process for adjustments or inspection.
Step 8: Repeat steps 5 – 7 until one of the following three things happens:
(1) You decide to continue and get the following feedback: Process is not in control. It should
be stopped immediately. In this case, click the “Start Over” button at the top of the screen.
(2) You decide to continue and get the following feedback: Good decision. In this case,
continue constructing the control chart.
(3) After 25 samples, it will be time for routine maintenance even if the process is still in
control. At this time, you can proceed to the next question. Click the “Start Over” button to
do so.
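The monitoring loop in Steps 5 – 8 can also be simulated outside the tool. The sketch below, in Python, uses illustrative in-control settings (µ = 1.0, σ = 0.06, n = 4, taken from the wafer example later in this unit) and only the simplest decision rule: stop when x̄ falls outside the control limits.

```python
import math
import random

random.seed(1)                          # reproducible illustration
mu, sigma, n = 1.0, 0.06, 4             # illustrative in-control settings
se = sigma / math.sqrt(n)
lcl, ucl = mu - 3 * se, mu + 3 * se

for sample_num in range(1, 26):         # up to 25 samples, as in Step 8
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    if not lcl <= xbar <= ucl:          # simplest out-of-control signal
        print(f"Sample {sample_num}: x-bar = {xbar:.4f} -> Stop Process")
        break
else:
    print("25 samples in control -> time for routine maintenance")
```

Real decision rules also watch for runs and trends inside the limits; this sketch signals only on points beyond ±3σ/√n.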
1. Work through Steps 1 – 8 using the Control Chart tool. Complete one control chart
successfully. Make a sketch of your chart (or do a screen capture and paste the screen
capture into a Word document). If the process was stopped before 25 samples were selected,
state which of the decision rules applies.
2. Use the same settings as you did for question 1. Rework question 1.
After you have successfully completed two control charts in Step-by-Step mode, you are ready
to move on to question 3.
3. Change the settings for µ, σ, and n. Choose Continuous mode. Allow the process to
continue until you think it needs to be stopped. After clicking the “Stop Process” button, you
will receive feedback.
a. What settings did you choose? What were the values of the upper and lower control limits?
b. Make a sketch of your control chart or save a screen capture of your control chart into a
Word document.
d. If your feedback indicated that you made a correct choice to stop the process, state the
rule that made you decide it was time to stop the process. If your feedback indicated that you
should have stopped the process sooner, state the sample number for when you should have
stopped the process and the rule that applies.
4. Select new settings for µ and σ (it is up to you if you also want to change n).
Repeat question 3 and make another control chart.
Assume that these data are recorded in the order they were collected beginning with the first
row 99, . . . 100, followed by the second row 100, . . . 100.
a. Make a run chart for these data. Leave room on the horizontal axis to expand the run orders
out to 30. (You will be adding 15 more data points in part (c).) Draw a reference line for the
target resistance (100 ohms) and for the tolerance interval (these can serve as control limits).
b. Based on your run chart in (a), is there any evidence that the process is out of control?
Support your answer.
c. The quality control inspector continued collecting data on the resistors. Results from an
additional 15 data values, in the order values were collected, are recorded below:
Use these data to complete the run chart in (a) for run orders from 1 – 30.
d. Based on the completed run chart in (c) is there any evidence that the manufacturing
process is out of control? Support your answer.
b. The manufacturer of the bottle caps has instituted a quality control program to prevent the
production of defective caps. As part of its quality control program, the manufacturer measures
the diameters of a random sample of n = 9 bottle caps each hour and then calculates the
sample mean diameter. If the process is in control, what is the distribution of the sample mean
x̄? Be sure to specify both the mean and standard deviation of x̄’s distribution.
c. The cap manufacturer has a rule that the process will be stopped and inspected any time
the sample mean falls below 0.499 inch or above 0.501 inch. If the process is in control, find
the proportion of times it will be stopped during inspection periods.
3. For each of the x̄ charts in Figures 23.10 – 23.12, decide whether or not the process is in
control. If the process is out of control, state which decision rule applies. Justify your answer.
(Note that reference lines at one, two, and three σ/√n on either side of the mean have been
drawn on the control charts.)
a. Control Chart (Figure 23.10): sample mean versus sample number.
b. Control Chart (Figure 23.11): sample mean versus sample number.
c. Control Chart (Figure 23.12): sample mean versus sample number.
4. A company produces a liquid which can vary in its pH levels unless the production process
is carefully controlled. Quality control technicians routinely monitor the pH of the liquid. When
the process is in control, the pH of the liquid varies according to a normal distribution with
mean µ = 6.0 and standard deviation σ = 0.9 .
a. The quality control plan calls for collecting samples of size three from batches produced
each hour. Using n = 3, calculate the lower control limit (LCL) and upper control limit (UCL).
b. Samples collected over a 24-hour time period appear in Table 23.4. Compute the sample
means for each of the 24 samples and add the results to a copy of Table 23.4.
c. Make an x̄ chart. Add reference lines including lines for the lower and upper control limits.
d. Based on the control chart you drew for (c), decide whether or not the process is in control.
If not, state which of the decision rules applies.
Day Number 1 2 3 4 5 6 7 8 9 10
Number of Duplicates 2 1 0 2 12 14 17 15 25 20
Day Number 11 12 13 14 15 16 17 18 19 20
Number of Duplicates 24 27 22 24 26 20 22 5 2 0
Table 23.5. Duplicate e-mail messages per day.
b. Draw a run chart of the duplicate e-mail data. Add the mean number of duplicates as a
reference centerline.
c. Nine or more consecutive data points on the same side of a center line can signal a special
cause variation. Does the run chart from (b) signal a special cause variation?
2. A quality control inspector at a company that manufactures valve linings monitors the mass
of the linings. When the process is in control, the mean mass is µ = 240.0 grams and standard
deviation σ = 0.4 gram . The inspector randomly selects a valve liner from batches produced
each hour and records its mass. The mass (in grams) of 25 valve liners are displayed in
Table 23.6 on the next page.
a. Make a histogram for mass of valve liners from Table 23.6. For the first class interval,
use 239.0 grams to 239.2 grams. Based on the histogram is there any evidence that the
manufacturing process is not in control? Explain.
b. Make a run chart for the mass of valve liners. Add a reference center line at µ. Add lower
and upper control limits at µ ± 3σ .
c. Does the run chart show any changes in the distribution of valve-liner mass over time?
Explain.
3. One process in the production of integrated circuits involves chemical etching of a layer of
silicon dioxide until the metal beneath is reached. The company closely monitors the thickness
of the silicon dioxide layers because thicker layers require longer etching times. The target
thickness is 1 micrometer (µm) and has a standard deviation of 0.06 micrometers (based on
past data when the process was in control). The company uses samples of four wafers. An x̄
chart based on 40 consecutive samples appears in Figure 23.13 on the next page.
Figure 23.13. An x̄ chart of sample mean thickness versus sample number, with a center line at 1.0 µm and reference lines at one, two, and three σ/√n on either side of it (L1, L2, LCL below; U1, U2, UCL above).
a. Calculate the appropriate control limits (the values of the reference lines drawn in Figure
23.13). Round the values to two decimals.
b. Decide whether or not the process is in control. If not, explain which decision rule applies
and identify the sample number after which the process should be shut down for adjustments.
4. The company referred to in exercise 4 has two plant lines that produce the liquid. Data from
the second line appears in Table 23.7. When the process is in control, the pH of the liquid varies
according to a normal distribution with mean µ = 6.0 and standard deviation σ = 0.9 . The
quality control plan calls for collecting samples of size three from batches produced each hour.
Sample pH level
1 7.2 7.4 7.4
2 6.9 6.6 6.5
3 6.2 6.3 6.3
4 6.8 6.4 6.5
5 6.5 6.6 6.7
6 6.8 6.8 6.8
7 6.2 6.3 6.4
8 5.6 5.7 5.9
9 4.9 5.8 5.6
10 6.4 6.0 4.4
11 6.9 5.3 6.2
Continued on the next page...
b. Construct an x̄ chart for the pH samples from the second plant line. Include reference lines
marking the center line and one, two, and three σ/√n on either side of the center line.
c. Based on the control chart from (b), does the process appear to be in control? If not, which
decision rule applies and what appears to be the problem?
Summary of Video
This video is an introduction to inference, which means we use information from a sample to
infer something about a population. For example, we might use a sample statistic to estimate a
population parameter. Suppose we wanted to know a man’s mean blood pressure. A sample of
blood pressure readings is shown in Table 24.1.
        Su    M    T    W    Th   F    Sa
A.M.   130  120  140  125  130  130  140
P.M.   125  130  145  140  125  135  110
Table 24.1. Systolic blood pressure readings.
We could estimate his mean blood pressure using the sample mean from these readings,
x̄ = 130. But how trustworthy is our conclusion given that different samples could lead to
different results, some higher and others lower? Statisticians address this issue by calculating
confidence intervals. Rather than a single number like 130, we can compute a range of values
along with a confidence level for that range.
Next, the context switches from blood pressure to the length of life of batteries. Because
companies promise specific battery lifetimes and improved performance over a competitor,
they need proof before ads promoting their product go on the air. At Kodak’s Ultra
Technologies, technicians use rigorous testing and calculate confidence intervals to back up
their marketing claims. Here’s how the data are collected. Random samples of batteries are
pulled from the warehouse. The batteries are drained under controlled conditions and the time
it takes for them to run out of juice is recorded. From these data, Kodak has determined that
its population of AA batteries when used in a toy will last 7½ hours ± 20 minutes and that their
confidence in that range is 95%.
Now, we retrace Kodak’s steps to figure out how they came up with this interval. Before getting
started, we need to check that a few underlying assumptions are satisfied:
Selecting a random sample of batteries for the test takes care of the assumption of
independent observations. The second assumption is satisfied since the sample size of n = 40
is considered large. The last assumption is not reasonable in the real world, but for now, we’ll
assume that from past data we do know the population standard deviation, σ = 63.5 minutes.
The task is to calculate a confidence interval for μ, the mean life of Kodak’s AA batteries.
Our sample statistic x̄ is a point estimate for the parameter μ. If we include a margin of error
around our point estimate, we get an interval estimate of the form: x̄ ± margin of error.
From Unit 22, Sampling Distributions, we know that the sampling distribution of x̄ is normal,
with mean µ_x̄ = µ and standard deviation σ_x̄ = σ/√n. In this case, we are given σ = 63.5
minutes and we can compute σ_x̄ = 63.5/√40 minutes, or about 10 minutes. Think back to the
68-95-99.7% Rule. In any normal distribution, 95% of the observations lie within two standard
deviations of the mean. So, 95% of all possible samples result in battery-life data for which µ
is within plus or minus 20 minutes of that sample’s mean, x̄. In our example, x̄ = 450 minutes.
So, we can say with 95% confidence that μ lies within 20 minutes of x̄, giving us a confidence
interval from 430 minutes to 470 minutes. To say that we are 95% confident in our calculated
range of values means that we got the numbers using a method that gives correct results 95%
of the time over many, many examples.
What if Kodak were willing to settle for only 90% confidence? Or what if they insisted on 99%
confidence? We can get any confidence level that we want by turning to the standard normal
distribution and finding the z* critical value. Then just substitute the appropriate values into the
following formula:
x̄ ± z*(σ/√n)
Notice that the margin of error gets larger if we insist on higher confidence because z* will be
larger. On the other hand, the margin of error gets smaller if we take more observations so that
n is larger.
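The formula above is straightforward to compute. Here is a minimal sketch in Python applied to the battery example (x̄ = 450, σ = 63.5, n = 40); the function name is our own, and note that it uses the exact critical value z* ≈ 1.96 rather than the 2 from the 68-95-99.7% Rule, so the interval comes out slightly narrower than 430 to 470 minutes:

```python
from statistics import NormalDist
import math

def z_interval(xbar, sigma, n, conf=0.95):
    """Confidence interval for mu when sigma is known: x-bar +/- z*(sigma / sqrt(n))."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # z* critical value
    moe = z * sigma / math.sqrt(n)             # margin of error
    return xbar - moe, xbar + moe

lo, hi = z_interval(450, 63.5, 40)             # battery-life example
print(round(lo, 1), round(hi, 1))              # 430.3 469.7
```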
B. Recognize that a useful estimate requires a measure of how accurate the estimate is.
C. Know that a confidence interval has two parts: an interval that gives the estimate and the
margin of error, and a confidence level that gives the likelihood that the method will produce
correct results in the long range.
D. Be able to assess whether the underlying assumptions for confidence intervals are
reasonably satisfied. Provided the underlying assumptions are satisfied, be able to calculate
a confidence interval for μ given the sample mean, sample size, and population standard
deviation.
E. Understand the tradeoff between confidence and margin of error in intervals based on the
same data.
F. Given a specific confidence level, recognize that increasing the size of the sample can give
a margin of error as small as desired.
The confidence level states the probability that the method will give a correct result. That is,
if you use 95% confidence intervals often, in the long run 95% of your intervals will contain the
true parameter value.
Suppose that a simple random sample of size n is drawn from a normally distributed
population having an unknown mean µ and known standard deviation σ. A level C (expressed
as a decimal) confidence interval for µ is
x̄ ± z*(σ/√n),
where z* is a cutoff point for the standard normal curve with area (1 – C)/2 to its right.
For example, if C = 0.95 (for a 95% confidence interval) then (1 – C)/2 = (1 – 0.95)/2 = 0.025.
In this case, z* turns out to be 1.96 as shown in Figure 24.1.
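The z* critical value for any confidence level C can be computed rather than looked up in a table. A minimal sketch in Python, using the inverse CDF of the standard normal distribution:

```python
from statistics import NormalDist

for C in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)   # area (1 - C)/2 to the right
    print(f"C = {C:.2f}: z* = {z_star:.3f}")
# C = 0.90: z* = 1.645
# C = 0.95: z* = 1.960
# C = 0.99: z* = 2.576
```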
Figure 24.1. Standard normal density curve with area 0.025 in each tail, cut off at z = -1.960 and z = 1.960.
If the sample size n is relatively small, we first need to check that the underlying assumption
of normality is reasonably satisfied before computing a confidence interval. One way to check
the assumption of normality is to make a normal quantile plot of the sample data. Alternatively,
a histogram or boxplot of the sample can reveal outliers or strong skewness.
The size of the margin of error controls the precision (width) of the confidence interval
estimate. Precision is increased as the margin of error shrinks. The margin of error of a
confidence interval decreases if any of the following occur: the confidence level C is
decreased (so that z* is smaller), the sample size n is increased, or the population standard
deviation σ is smaller.
In practice, the population standard deviation σ is not known and must be estimated
from the sample. If the sample size n is fairly large (say at least 30), then the value of the
sample standard deviation s should be close to σ. In that case, you can replace σ by s in
the confidence interval formula. (See Unit 26, Small Sample Inference for One Mean, for a
continued discussion of confidence intervals for µ when σ is unknown.)
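Since the margin of error is z*·σ/√n, solving for n gives the smallest sample size that achieves a target margin: n ≥ (z*σ/m)². A minimal sketch in Python (the function name is our own; the σ = 100, margin = 30 example mirrors the exam-score data in this unit's exercises):

```python
from statistics import NormalDist
import math

def sample_size(sigma, margin, conf=0.95):
    """Smallest n for which z* * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf((1 + conf) / 2)   # z* critical value
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size(sigma=100, margin=30))       # 43
```

Raising the confidence level raises z* and therefore the required n.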
A confidence interval for a population parameter is an interval of plausible values for that
parameter. It is constructed so that the value of the parameter will be captured between the
endpoints of the interval with a chosen level of confidence. The confidence level is the
success rate of the method used to construct the confidence interval.
Many confidence intervals have the following form: point estimate ± margin of error. The
margin of error is the range of values above and below the point estimate.
A formula used to compute a confidence interval for µ when σ is known and either the
sample size n is large or the population distribution is normal is given by:
⎛ σ ⎞
x ±z*⎜ ,
⎝ n ⎟⎠
1. Why is a single blood pressure reading not sufficient if we want to estimate a person’s
average blood pressure?
In this activity, you will need the simulation data collected for question 2 in Unit 22’s activity.
Recall that samples of size 9 were drawn from an approximately normal distribution with
µ = 50 and standard deviation σ = 4. Assume for the moment that µ is unknown. You will be
using sample data to find confidence interval estimates for µ.
b. What is the margin of error for a 95% confidence interval for µ? (Round your answer to
two decimals.)
2. Your instructor has a container filled with numbered strips. Draw a sample of size 9.
a. Record the outcomes of your sample and calculate the sample mean, x̄.
c. In this case, the true value of µ is 50. Does your confidence interval contain the true
value of µ?
3. Your instructor should distribute a table of the results from 100 samples of size 9 generated
for Unit 22’s activity.
a. For each sample, calculate a 95% confidence interval estimate for μ and record the
endpoints of the interval.
b. Of the 100 samples collected, how many of the 95% confidence intervals contain the true
value of μ, which is 50? How many did you expect to contain 50? Is there a discrepancy
between the number you found and the number you expected to find? Explain how this
discrepancy could occur.
410 400 460 440 390 400 450 460 520 380
480 480 490 450 480 330 390 460 600 610
Assume that the standard deviation of scores for all juniors is σ = 100 .
a. Find the value of σ_x̄, the standard deviation of the sample mean in size-20 samples.
b. Check to see whether these data could be considered to come from a normally distributed
population. (The data need only be roughly normal – in other words, the data should have no
severe departures from normality.)
c. Let μ be the mean score that would be observed if every junior at Lincoln High took the
exam. Give a 95% confidence interval for μ. Show your calculations. How could you get a
smaller margin of error with the same confidence?
d. Give a 99% confidence interval for μ. Explain in plain language, to someone who knows no
statistics, why this interval is wider than your result in (c).
a. A random sample of 30 test results is given below. Use these results to determine a 95%
confidence interval for the mean MCAS math score, μ.
252 266 264 244 262 268 236 254 264 276
266 220 218 260 258 232 268 218 262 242
238 262 250 264 276 234 232 266 276 248
258 252 268 264 264 264 222 258 220 254
254 274 266 264 268 248 238 248 258 254
254 258 208 268 268 272 274 254 272 270
c. Compare the margin of errors for the confidence intervals in (a) and (b). Why would you
expect the margin of error based on 60 observations to be less than the margin of error based
on 30 observations?
d. Keeping the confidence level at 95%, how many observations would you need in order to
reduce the margin of error to under 3.0?
3. A city planner randomly selects 100 apartments in Boston, Massachusetts, to estimate the
mean living area per apartment. The sample yielded x̄ = 875 square feet with a standard
deviation s = 255 square feet.
a. Calculate a 95% confidence interval for μ, the mean living area per apartment. (Keep in
mind that since the sample size is large, s should be close to σ.)
b. Having found the interval in (a), can you say there is a 95% chance that the mean living
area is within the interval? Explain why or why not.
4. A random sample of 50 full-time, hourly wage workers between the ages of 20 and 40
was selected from participants in the 2012 March Supplement, which is part of the Current
Population Survey (a joint venture of the U.S. Bureau of Labor Statistics and Census Bureau).
The hourly rate (in dollars) of these workers is given below.
7.25 30.09 12.00 25.00 8.00 27.53 14.20 31.00 20.00 18.00
12.00 28.12 16.50 8.00 9.00 15.00 15.10 18.00 17.43 14.00
15.25 34.50 8.00 14.80 7.80 11.00 33.07 10.55 19.00 19.50
12.25 18.00 24.00 27.50 15.00 6.75 30.00 10.30 27.00 14.50
8.00 14.00 10.00 11.75 15.00 28.00 7.50 28.50 16.25 11.75
b. Calculate a 95% confidence interval for μ, the mean hourly wage of full-time, hourly wage
workers between the ages 20 and 40. Because the sample size is large, use s, the sample
standard deviation, in place of σ, the unknown population standard deviation.
c. A politician speaking around the time that the data for the 2012 March Supplement were
collected claimed that salaries were rising. He stated that the average hourly rate for full-time
workers between the ages of 20 and 40 was $20.00. Does your confidence interval from (b)
affirm or refute the politician’s claim? Explain.
d. After being confronted, the politician complained that we should have used a 99%
confidence interval to estimate the mean hourly wage. Compute a 99% confidence interval for
μ. Does the 99% confidence interval affirm his claim? Explain.
a. Because the sample is large, the observed sample standard deviation, 2.7, will be close
to the population standard deviation σ. Give a 95% confidence interval for the mean height of all
varsity basketball players, assuming that Julie’s observations are a random sample. Show your
calculations.
b. Do you think it is reasonable to take these 96 players as a random sample of all male varsity
basketball players? Why or why not?
2. A random sample of 36 skeletal remains from females was taken from data stored in the
Forensic Anthropology Data Bank (FDB) at the University of Tennessee. The femur lengths
(right leg) in millimeters are recorded below.
432 432 435 460 432 440 448 449 434 443
525 451 448 443 450 467 436 423 475 435
433 438 453 438 435 413 439 442 507 424
Since the sample size is large, we can use the sample standard deviation s in place of σ in
calculations of confidence intervals.
b. Before doing any calculations, think about 90%, 95%, and 99% confidence intervals for µ,
the mean femur bone length for women. Which of these intervals would be the widest? Which
would be the narrowest? Explain how you know without calculating the confidence intervals.
c. Calculate 90%, 95%, and 99% confidence intervals for µ, the mean femur bone length for
adult females. Do your results confirm your answer to (b)?
4054 3572 2636 3430 3118 3969 3628 3940 4536 4819
3883 3487 3827 3883 2749 3487 3855 4450 4309 3345
b. Determine a 95% confidence interval for the mean birth weights of babies born
in Massachusetts.
c. In the United States, we are more accustomed to reporting babies’ weights in ounces (or
even pounds and ounces) than grams. How would you modify the confidence interval to give a
confidence interval for the mean weight in ounces? Calculate that interval. (Use the following
conversion: 1 gram ≈ 0.03527 ounce.) Does your result seem reasonable?
4. How much can a single outlier affect a confidence interval? Suppose that the first
observation of 4054 grams in the random sample in question 3 had been 350 grams (the
weight of a baby that did not survive).
a. Make a boxplot of the modified data set to show that this low weight baby is an outlier.
b. Recalculate the 95% confidence interval based on the modified data. How much did the
outlier affect the confidence interval?
Final comment: Always look at your data before calculating confidence intervals.
Outliers can greatly affect your results.
Summary of Video
Sometimes, when you look at the outcome of a particular study, it can be hard to tell just
how noteworthy the results are. For example, if the severe injury and death rates due to car
crashes on one state’s roads have dropped from 4.7% down to 3.8% after enacting a seat
belt law, how would we know whether this result was due to the seat belt law or simply due to
chance variation?
To sort out whether results are due to chance or there is something else at work (such as
the enactment of the seat belt law), statisticians turn to a tool of inference called tests of
significance. Significance testing can be applied in a variety of situations. We next explore how
researchers used it to help solve a controversy in classic literature.
In 1985, scholar Gary Taylor made a surprising find while conducting research for a new
edition of the complete works of William Shakespeare. While going through a 17th century
anthology at the Bodleian Library at Oxford University, he came upon a sonnet he had never
seen before and it was attributed to William Shakespeare. Obviously, Taylor was excited about
his new find and wanted to include it in his new edition of The Complete Works.
This discovery caused quite a controversy – some scholars were thrilled by the discovery
but others didn’t think the poem was good enough to be one of Shakespeare’s. Statistics
to the rescue! A decade earlier, statistician Ron Thisted had done a statistical analysis of
Shakespeare’s vocabulary. Thisted’s program provided a detailed, numeric description of
Shakespeare’s vocabulary. For every work, Thisted could tell how many new words there
were that Shakespeare didn’t use anywhere else. Using this model, Thisted predicted that if
Shakespeare had written the poem in question, it would have 7 unique words in it. When they
ran the poem through the program, however, they found that there were 10 unique words. Did
this difference reflect random variation within Shakespeare’s writing? Or did it indicate that
Shakespeare was not the author? This is where significance testing (or tests of hypotheses)
can be helpful.
Thisted set up two opposing hypotheses: the null hypothesis, written as H0, that basically
means nothing unusual is happening; and the alternative hypothesis, the researchers’ point of
view.
The question was whether the discrepancy between the observed number of unique words,
10, and the predicted number of unique words, 7, was due to another author writing the poem
rather than to chance variation. Is that three-word difference a big difference? To answer
this question, Thisted assumed (based on his data) that the number of unique words in
Shakespeare’s poems had the approximately normal distribution with mean µ = 7 and standard
deviation σ = 2.6 shown in Figure 25.1.
The shaded area under the density curve in Figure 25.2 corresponds to the probability of a
number of unique words at least as extreme as 10 (in other words, a difference from 7 of 3 or
more words).
Using technology, we find that the shaded area is 2(0.1243) = 0.2486. Thus, Thisted
could expect to find a value at least as extreme as 10 unique words roughly 25% of the
time. Therefore, Thisted failed to find significant evidence against the null hypothesis that
Shakespeare wrote the poem. He could not reject H0. In the absence of literary or statistical
evidence against Shakespeare’s authorship, the poem was published in Taylor’s edition of The
Complete Works.
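Thisted's tail area can be reproduced directly from the normal model; a minimal sketch in Python, assuming the N(7, 2.6) distribution of unique-word counts described above:

```python
from statistics import NormalDist

words = NormalDist(mu=7, sigma=2.6)   # Thisted's model for unique words per poem
p = 2 * (1 - words.cdf(10))           # P(count at least as extreme as 10)
print(round(p, 2))                     # 0.25
```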
Since we want to work with sample means, let’s suppose researchers found a folio of five
new poems that were attributed to Shakespeare. Suppose that our sample mean from the five
poems in the folio is x̄ = 8.2. We want to know if, based on this evidence, we can conclude
that Shakespeare did not write these poems. We set up our null and alternative hypotheses:
H0 : µ = 7
Shakespeare wrote the poems.
Ha : µ ≠ 7
Someone else wrote the poems.
One thing to decide, when setting up a significance test, is whether to use a one-sided or
two-sided alternative hypothesis. In our Shakespeare example, we are using a two-sided
alternative hypothesis because a different author might consistently use either more or fewer
unique words than Shakespeare. But suppose we suspected the poem was written by a
particular author who was known to consistently use more unique words than Shakespeare?
We begin by assuming the null hypothesis is true. Then we find the probability of getting a
result at least as extreme as ours if the null hypothesis really is true. If these poems were
written by Shakespeare, then the distribution of x̄, the mean number of unique words per
poem in five poems, would have a normal distribution with the following mean and standard
deviation:
µ_x̄ = µ = 7
σ_x̄ = σ/√n = 2.6/√5 ≈ 1.163
Next, we need to find the probability that any sample of five of Shakespeare’s poems would
have an x̄ at least as far from 7 as what we observed from our sample, x̄ = 8.2. Figure 25.3
illustrates this probability. Notice that two areas are shaded because our alternative is
two-sided.
To calculate this probability from a standard normal table, we find the z-score for our observed
sample mean. This is called a z-test statistic:
z = (x̄ − µ0)/(σ/√n) = (8.2 − 7)/1.163 ≈ 1.03
So, the observed value of our test statistic z is 1.03, a little more than one standard deviation
away from the mean, 0, on the standard normal curve. The final step in our test of significance
is to find the probability of observing a value from a standard normal distribution that is at least
this extreme. This probability is called the p-value. To find this p-value, we use z = 1.03 and
look in the standard normal table (z-table). From Figure 25.4, we find that the area under the
standard normal curve to the left of 1.03 is 0.8485.
That means that 1 – 0.8485 or 0.1515 is the area in the right tail (the shaded region in
Figure 25.5). Since we chose a two-sided alternative, we double this value because we are
interested in the area under BOTH tails (the area to the right of 1.03 and the area to the left of
-1.03). Our final result gives a p-value of 0.303.
From the p-value, we know that there is a 30.3% chance that random variation would produce
a mean unique word count as far from 7 in either direction as 8.2. Since a 30.3% chance is a
pretty good chance, we have failed to disprove the null hypothesis. We have not found good
evidence against Shakespeare’s authorship of these new poems.
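The whole test can be carried out in a few lines. A minimal sketch in Python for the five-poem example (x̄ = 8.2, µ0 = 7, σ = 2.6, n = 5); using the exact CDF rather than a printed z-table gives a p-value that differs from 0.303 only in the third decimal:

```python
from statistics import NormalDist
import math

mu0, sigma, n, xbar = 7, 2.6, 5, 8.2
z = (xbar - mu0) / (sigma / math.sqrt(n))   # z-test statistic
p_value = 2 * (1 - NormalDist().cdf(z))     # two-sided p-value
print(round(z, 2), round(p_value, 2))       # 1.03 0.3
```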
This example helps illustrate the general rule about p-values: Small p-values give evidence
against the null hypothesis; large p-values fail to reject the null hypothesis. Since p-values
can range from the very small – close to zero – to the very large – close to one, researchers
need to decide when a p-value is small enough for them to reject the null hypothesis. One of
the most common levels is 0.05 or 5%. If something is statistically significant at the 5% level, it
means that the results produced a p-value less than 0.05. Another widely used level is 0.01 or
the 1% level.
B. Be able to formulate the null hypothesis and alternative hypothesis for tests about the mean
of a population. Understand that the alternative hypothesis is the researcher’s point of view.
C. Understand the concept of a p-value. Know that smaller p-values indicate stronger
evidence against the null hypothesis.
D. Be able to calculate p-values as areas under a normal curve in the setting of tests about the
mean of a normal population with known standard deviation.
The statement being tested in a test of significance is called the null hypothesis, written H0.
For example, H0 might state that a population parameter, such as the mean µ, takes a specific
value. Usually the null hypothesis is a statement of “no effect” or “no difference” or “status
quo.” The test of significance is designed to assess the strength of the evidence against the
null hypothesis and in favor of an alternative hypothesis Ha that represents the effect we hope
or suspect is true. (Ha is generally the researcher’s point of view.) The alternative hypothesis
might be that the parameter differs from its null value, in a specific direction (one-sided
alternative) or in either direction (two-sided alternative).
Suppose that we want to conduct a test about the mean of a population. More specifically,
suppose that we want to test that the mean has a specific value, which we’ll call µ0 , or that it
doesn’t have that value, or is smaller than that value, or larger than that value. We form two
opposing hypotheses – the null and alternative hypotheses – which we express symbolically
as follows (select one of the possible alternatives):
H0 : µ = µ0
Ha : µ ≠ µ0   or   Ha : µ > µ0   or   Ha : µ < µ0
To test the hypothesis H0 : µ = µ0 based on a random sample of size n from a population with
unknown mean µ and known standard deviation σ, we compute the sample mean x̄. Here’s a
recap of what we know about x̄:
• If H0 is true and the population is normal, then x̄ has the normal distribution with mean
µ0 and standard deviation σ/√n.
• Suppose instead that the population does not follow a normal distribution. If the
sample size n is large, we can apply the Central Limit Theorem and conclude that x̄ is
approximately normally distributed with mean µ0 and standard deviation σ/√n.
Now, we work through an example. Researchers studying the effects of smoking on sleep
believe that men who smoke need more sleep than what is average for men, which is 7.5
hours per night. Let μ be the mean number of hours of sleep for men who smoke. Assume that
the standard deviation is σ = 0.5 hours. The null and alternative hypotheses are:
H0 : µ = 7.5
Ha : µ > 7.5
Suppose a random sample of n = 50 male smokers gives a sample mean of x̄ = 7.7 hours.
The z-test statistic is:
z = (7.7 − 7.5)/(0.5/√50) ≈ 2.83
From the z-test statistic, we learn that the observed value of x̄ = 7.7 is 2.83 standard
deviations from the hypothesized mean from H0, µ = 7.5. If H0 is true, then z has the standard
normal distribution. Now, we are ready to evaluate the evidence against H0 – How likely would
it be to observe a value from the standard normal distribution that is at least as extreme as
2.83? The answer, around 0.2%, is illustrated in Figure 25.6. Around 0.2% is pretty unlikely.
So, in this case, we reject the null hypothesis and accept the alternative: Male smokers, on
average, need more sleep than men in general.
Figure 25.6. Standard normal density curve; the shaded area to the right of z = 2.83 is 0.002327.
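As a sketch of the same computation in Python (our own code; the numbers are taken from the sleep example), using the standard library’s NormalDist for the normal tail area:

```python
from math import sqrt
from statistics import NormalDist

# Hypothesized mean under H0, known sigma, and the sample results from the example.
mu0, sigma = 7.5, 0.5
n, xbar = 50, 7.7

# z-test statistic: how many standard errors xbar lies from mu0.
z = (xbar - mu0) / (sigma / sqrt(n))   # ≈ 2.83

# One-sided p-value for Ha: mu > 7.5 is the right-tail area.
p_value = 1 - NormalDist().cdf(z)      # ≈ 0.0023, about 0.2%

print(round(z, 2), round(p_value, 4))  # → 2.83 0.0023
```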
As we saw in the previous example, the distribution of the z-test statistic, under the assumption
that H0 is true, allows us to use the observed z-value to assess the evidence against H0. We
calculate the probability, assuming H0 is true, of observing a value from the standard normal
distribution as extreme or more extreme than the z-value we calculated – this probability is
called the p-value. Because there are three possible alternatives, there are three possibilities
for computing the p-value:
1. The p-value for a test of H0 against Ha : µ > µ0 is the probability of observing a value from
the standard normal distribution that is at least as large as the observed z-test statistic.
(See Figure 25.7 (1).)
2. The p-value for a test of H0 against Ha : µ < µ0 is the probability of observing a value from
the standard normal distribution that is at least as small as the observed z-test statistic.
(See Figure 25.7 (2).)
3. The p-value for a test of H0 against Ha : µ ≠ µ0 is the probability of observing a value from
the standard normal distribution that is at least as far from 0 (on either side of 0) as the
observed z-test statistic. (See Figure 25.7 (3).)
Figure 25.7. The p-value for each alternative: (1) the area to the right of the observed z;
(2) the area to the left of the observed z; (3) the area in both tails beyond the observed z
and its negative.
Sometimes we set a cutoff for the p-value, called the significance level. For example, if
the p-value is below 0.05 (p < 0.05), we say the results are significant at the 0.05 level, or the
5% level.
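The three cases can be collected into one small helper function. This is an illustrative Python sketch (the function name and the "alternative" labels are our own, not from the guide):

```python
from statistics import NormalDist

def p_value(z: float, alternative: str) -> float:
    """p-value for a z-test; `alternative` is 'greater', 'less', or 'two-sided'."""
    phi = NormalDist().cdf(z)
    if alternative == "greater":     # Ha: mu > mu0 -- right-tail area
        return 1 - phi
    if alternative == "less":        # Ha: mu < mu0 -- left-tail area
        return phi
    if alternative == "two-sided":   # Ha: mu != mu0 -- area in both tails
        return 2 * (1 - NormalDist().cdf(abs(z)))
    raise ValueError("unknown alternative")

# The two worked examples from this unit:
print(round(p_value(1.03, "two-sided"), 3))  # Shakespeare poem → 0.303
print(round(p_value(2.83, "greater"), 4))    # smokers' sleep → 0.0023
```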
The claim tested by a significance test is called the null hypothesis. Usually the null
hypothesis is a statement of “no effect” or “no change.” The claim that we are trying to
gather evidence for – the researcher’s point of view – is called the alternative hypothesis.
The alternative hypothesis is two-sided if it states that a parameter is different from the null
hypothesis value. The alternative hypothesis is one-sided if it states that either a parameter is
greater than or a parameter is less than the null hypothesis value.
A test statistic is a quantity computed from the sample data that measures the gap between
the null hypothesis and the sample data. A test statistic is used to make a decision between
the null and alternative hypotheses.
The p-value is the probability, computed under the assumption that the null hypothesis is
true, of observing a value of the test statistic at least as extreme as the one that was
actually observed.
The significance level of a test of hypotheses is the highest p-value for which we will reject
the null hypothesis.
A z-test statistic for testing H0 : µ = µ0, where µ is the population mean, is given by:
z = (x̄ − µ0)/(σ/√n)
The z-test is used in situations where the population standard deviation σ is known and either
the population has a normal distribution or the sample size n is large.
1. In the 1970s, statistician Ron Thisted did a statistical analysis of Shakespeare’s vocabulary.
Based on his analysis he created a computer program. What could his program tell you about
a Shakespearean poem?
2. In analyzing a poem to see whether or not it was authored by Shakespeare, Thisted set up
a null hypothesis and an alternative hypothesis. State those hypotheses in words.
3. What was the approximate distribution of the number of unique words per poem in
Shakespeare’s poems?
4. Thisted observed 10 unique words in the newly discovered poem. Was that sufficient
evidence to conclude that Shakespeare did not write the poem?
5. Which is better evidence against the null hypothesis, a large p-value or a small p-value?
Nabisco Chips Ahoy is a popular brand of chocolate chip cookie. In the 1980s, Nabisco ran
television ads claiming that their cookies had, on average, 16 chips per cookie. Since the
1980s many more brands of chocolate chip cookies have appeared on supermarket shelves,
which could have put pressure on Nabisco to improve its product perhaps by increasing the
amount of chips. On the other hand, the price of chocolate has increased, which could have
had the opposite effect. In this activity, you will test whether or not Nabisco could run the same
ad today.
1. Collect the data. Your instructor will provide directions and, after the data collection is
complete, distribute the data. (Save the data for use in Unit 27’s activity.)
2. Compute the mean and standard deviation of the number of chips per cookie.
b. Calculate the value of the z-test statistic. (Since the sample size is large, use s in
place of σ.)
4. Calculate a 95% confidence interval for µ. Does your confidence interval indicate that µ has
increased, decreased, or remained the same from its value in the 1980s?
a. Larry’s car averages 32 miles per gallon on the highway. He switches to a new motor oil that
is advertised as increasing gas mileage. After driving 3000 highway miles with the new oil, he
wants to determine if his gas mileage actually has increased.
b. A university gives credit in a French language course to students who pass a placement
test. The language department wants to know if students who get credit in this way differ in
their understanding of spoken French from students who actually take the French course.
Some faculty think the students who test out of the course are better, but others argue that
they are weaker in oral comprehension. Experience has shown that the mean score of
students in the course on a standard listening test is 24. The language department gives the
same listening test to a sample of 40 students who passed the placement test to see if their
performance is different.
c. Experiments on learning in animals sometimes measure how long it takes a mouse to find
its way through a maze. The mean time is 18 seconds for one particular maze. A student
thinks that a loud noise will cause the mice to complete the maze faster. She measures how
long each of 10 mice takes with a noise as stimulus.
2. The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures the
motivation, attitude toward school, and study habits of students. Scores range from 0 to 200.
The mean score for U.S. college students is about 115, and the standard deviation is about
30. A teacher who suspects that older students have better attitudes toward school gives the
SSHA to 25 students who are at least 30 years of age. Their mean score is x̄ = 125.2.
Assume that σ = 30 for the population of older students, and that the students tested are a
random sample from the population of older college students. Carry out a significance test of
H0 : µ = 115
Ha : µ > 115
Report the value of the test statistic, the p-value of your test, and state your conclusion clearly.
a. In this case, the sample size n = 12 is relatively small. Check to see if it is reasonable to
assume these data come from an approximately normal population.
b. Do these observations provide good evidence that the average detector reading differs from
the true value of 105? Assume that you know that the standard deviation of readings for all
detectors of this type is σ = 9 .
4. The CDC publishes charts on Body Mass Index (BMI) percentiles for boys and girls of
different ages. Based on the chart for girls, the mean BMI for 6-year-old girls is listed as 15.2
kg/m2. The data from which the CDC charts were developed are old and there is concern
that the mean BMI for 6-year-old girls has increased. The BMIs of a random sample of 30
6-year-old girls are given below.
24.5 16.3 15.7 20.6 15.3 14.5 14.7 15.7 14.4 13.2
16.3 15.9 16.3 13.5 15.5 14.3 13.7 14.3 13.7 16.0
14.2 17.3 19.5 22.8 16.4 15.4 18.2 13.9 17.6 15.5
c. Since the sample size is relatively large, use s in place of σ and calculate the value of the
z-test statistic. Then calculate the p-value.
d. Based on your answer to (c), do the sample data provide sufficient evidence that the mean
BMI for 6-year-old girls has increased? Explain.
31 31 43 36 23 34 32 30 20 24
Assume that the standard deviation of the odor threshold for untrained noses is known to
be σ = 7 µg/l.
a. Is it reasonable to assume the data are from an approximately normal population? Explain.
b. The researcher believes that the mean odor threshold for beginning students is higher than
the published threshold, 25 µg/l, and decides to conduct a significance test. What are the null
and alternative hypotheses?
c. Carry out a significance test. Report the value of the test statistic, the p-value, and
your conclusion.
2. In 2010/2011 the national mean SAT Math score was 514. Faculty at a state university
had disagreements over their students’ mathematics preparation for college. Some felt that
their students had fallen below the national average, and others felt that their students had
made some advances. To help answer this question, math faculty took a random sample of
50 students who entered the university fall semester 2011. The SAT Math scores from those
students are given below.
580 540 520 490 430 570 520 540 440 610
430 390 470 550 390 500 550 440 550 660
560 550 450 560 680 630 400 450 500 460
460 530 590 380 660 570 520 530 500 680
450 590 660 420 370 550 450 510 480 500
c. Construct a 95% confidence interval for µ, the mean Math SAT for students entering this
university in fall 2011. (Refer to Unit 24, Confidence Intervals.) Does your confidence interval
indicate that the true mean SAT Math score for students entering the university in fall 2011 is
less than 514, could be 514, or is greater than 514? Explain.
3. The average length of calls coming into a municipal call center had been around 90
seconds. Lately, there has been some concern that more complicated calls are coming into
the center causing the mean length of the calls to increase. In order to test this assumption,
the city draws a random sample of 100 calls. The sample mean and standard deviation are
x̄ = 118.4 seconds and s = 186.5 seconds, respectively.
b. Do these data provide good evidence that the average call length has increased from 90
seconds? (Since the sample size is large, use s in place of σ.) Show the work needed to
support your answer. Conduct the significance test at the 0.05 level.
c. Suppose city planners are willing to run the test at the 0.10 level. (They will reject the null
hypothesis if the p-value is below 0.10.) Would this change the conclusion reached in (b)?
Explain.
4. Eating fish contaminated with mercury can cause serious health problems. Mercury
contamination from historic gold mining operations is fairly common in sediments of
rivers, lakes and reservoirs today. A study was conducted on Lake Natoma in California to
determine if the mercury concentration in fish in the lake exceeded guidelines for safe human
consumption. Suppose that you are an inspector for the Fish and Game Department and that
you are given the task of determining whether to prohibit fishing in Lake Natoma. You will
close the lake to fishing if it is determined that fish from the lake have unacceptably high
mercury content.
H0 : µ = 5 versus Ha : µ > 5
or
H0 : µ = 5 versus Ha : µ < 5
b. Would you prefer a significance level of 0.1 or 0.01 for your test? Explain your choice.
Summary of Video
The z-procedures for computing confidence intervals or hypothesis testing work in cases
where we know the population’s standard deviation. But that’s hardly ever the case in real
life. For times when we don’t know the population standard deviation but still want to figure
out confidence intervals and do significance tests, statisticians turn to t-inference procedures.
These t-procedures were invented in 1908 by William S. Gosset. Gosset was a chemist at
the Guinness Brewery in Ireland. Making ale requires constant sampling of everything from
barley to yeast to the beer itself. Gosset wanted to save time and money using small samples
and their standard deviations as an estimate of the unknown population standard deviation, σ.
Using the standard deviation s derived from only a few data values does not give a sufficiently
good estimate of the entire population’s standard deviation, so he could not simply proceed
with a z-procedure. In his efforts to find a way around this issue, Gosset created a new class of
distributions called the t-distributions.
The video now turns to a modern day brewery, Pretty Things Beer and Ale Project. Dann
Paquette and Martha Holley-Paquette, owners of the operation, take samples at various
stages in the brewing process. At one stage, they take a sample and measure its density. They
aim for a density reading of at least 19.5 degrees Plato. (Degrees Plato measure how much
more dense the liquid is than water.) In one batch of Baby Tree beer, they got a reading of 20.3
degrees Plato – a good sign that this batch will be great.
Let’s imagine that Pretty Things wants to see how closely their production of Baby Tree beer
is hitting their pre-fermentation density goal of 19.5 degrees Plato. Data collected from 10
batches are given below.
20.2 18.9 19.6 20.6 20.3 18.7 21.0 18.5 20.1 19.3
Since our sample size is small, its standard deviation could be quite far from the population
standard deviation of all Baby Tree beer ever brewed. So, z-procedures won’t work. Instead,
we call on the t-procedure that William Gosset invented.
Unit 26: Small Sample Inference for One Mean | Student Guide | Page 1
In Figure 26.1, we compare a t-distribution for a sample of size 3 to the standard normal curve.
Figure 26.1. A t-distribution for sample size 3 compared with the standard normal curve.
The two curves share certain features. They are both bell-shaped. But the t-density curve
is broader with a shorter peak, and its tails are higher. These fatter tails mean there is more
probability of getting results far from zero. That’s because the sample standard deviation
varies from sample to sample, particularly when the sample size is small, adding uncertainty.
Another difference is that although there is a single standard normal distribution, there is
a different t-distribution for every sample size. As the sample size increases, the sample
standard deviation s gets closer and closer to the population standard deviation σ, and the
t-density curve gets closer to the standard normal curve.
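One way to see the fatter tails concretely is to evaluate the t density formula directly. In this Python sketch (the density formula is the standard one; the code and the choice of x = 3 are ours), the t density in the tail is above the normal density and shrinks toward it as the degrees of freedom grow:

```python
from math import gamma, sqrt, pi, exp

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# At x = 3 (far out in the tail), the t density decreases toward the
# normal density as the degrees of freedom increase.
for df in (2, 9, 30, 100):
    print(df, round(t_pdf(3, df), 5))
print("normal", round(normal_pdf(3), 5))
```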
The different t-distributions are specified by something called degrees of freedom, which are
related to the sample size n: degrees of freedom = n – 1. For our beer data, the degrees of
freedom are 10 – 1 = 9. Next, we calculate a confidence interval for the population mean of
the density of all Baby Tree beer. For these calculations to work, our data need to be from a
normal distribution, which is a safe assumption in this case. Here’s our formula:
x̄ ± t*(s/√n)
19.72 ± t*(0.85/√10)
In order to determine t*, we choose whatever confidence level we like – we’ll go with 95%. We
calculate the value of t* from software as illustrated in Figure 26.2: t* = 2.262.
Figure 26.2. Determining t* for 95% confidence: the central area under the t-density curve is
0.95, leaving 0.025 in each tail beyond ±2.262.
19.72 ± (2.262)(0.85/√10) ≈ 19.72 ± 0.61
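If no t-table or statistics package is at hand, t* can be recovered numerically from the t density itself. The sketch below is our own code, not part of the guide; it uses the summary values quoted in the video (x̄ = 19.72, s = 0.85, n = 10), integrates the t density with the trapezoid rule, and bisects on the tail area to find t*:

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, steps=4000):
    """P(T > t): 0.5 minus the integral of the density from 0 to t (trapezoid rule)."""
    h = t / steps
    area = sum(0.5 * h * (t_pdf(i * h, df) + t_pdf((i + 1) * h, df))
               for i in range(steps))
    return 0.5 - area

def t_star(conf, df):
    """Critical value whose upper-tail area is (1 - conf)/2, found by bisection."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        # Tail area decreases as mid grows; move toward the target area.
        lo, hi = (mid, hi) if t_tail(mid, df) > target else (lo, mid)
    return (lo + hi) / 2

# Summary values quoted in the video for the Baby Tree density data.
xbar, s, n = 19.72, 0.85, 10
tstar = t_star(0.95, n - 1)         # ≈ 2.262 for df = 9
margin = tstar * s / sqrt(n)        # ≈ 0.61
print(round(tstar, 3), (round(xbar - margin, 2), round(xbar + margin, 2)))
# → 2.262 (19.11, 20.33)
```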
So, our confidence interval (19.11, 20.33) gives us a range of plausible values for µ, the mean
density of Baby Tree beer.
Take a moment to compare z*, the z-critical value for a 95% z-confidence interval with our
value for t*:
z* = 1.960
t* = 2.262
Here, the t-critical value of 2.262 gives us a wider confidence interval. That is the price we pay
for having a small sample and for not knowing σ.
Pretty Things’ goal for their Baby Tree beer is a density of 19.5 degrees Plato. Using the
confidence interval that we have calculated, we can say with 95% confidence that a 19.5
population mean is within our range of plausible values for µ.
Student Learning Objectives
A. Understand when to use t-procedures for a single sample and how they differ from the
z-procedures covered in Units 24 and 25.
C. Know how to check whether the underlying assumptions for a t-test or t-confidence interval
are reasonably satisfied.
E. Be able to test a population mean with a t-test. Be able to calculate the t-test statistic and to
determine the p-value as an area under a t-density curve.
Content Overview
In Units 24 and 25, we introduced z-procedures for (1) calculating confidence intervals for
a population mean and (2) conducting significance tests about a population mean. The
confidence interval formula and z-test statistic are as follows:
(1) x̄ ± z*(σ/√n)        (2) z = (x̄ − µ)/(σ/√n)
For both procedures we assumed that the population was normally distributed or the sample
size n was large, and that the population standard deviation σ was known. However, in real
life, σ is generally unknown.
Let’s start with an example. The weights (pounds) from a random sample of 16 4-year-old
children who took part in a study on childhood obesity appear below.
From these data, we can compute the sample mean and sample standard deviation:
x̄ = 36.47 lbs and s = 4.23 lbs. However, the population standard deviation σ is unknown.
Nevertheless, we would like to calculate a confidence interval for µ, the mean weight of
4-year-olds.
It seems reasonable to assume that weights of 4-year-olds are normally distributed and the
normal quantile plot in Figure 26.3 confirms this assertion.
Figure 26.3. Normal quantile plot of Weight Age 4 (pounds).
But we still have to deal with the fact that σ is unknown. We know that when the sample size n
is large, s will be close to σ. But in this case n = 16, which is not large enough. Hence, simply
replacing σ by s in the z-confidence interval formula would introduce too much additional
variability. To compensate for the additional variability, we also replace z*, a critical value from
a standard normal distribution, with t*, a critical value from a t-distribution. The result is the
formula for computing t-confidence intervals:
x̄ ± t*(s/√n)
The t-distributions have some features in common with the standard normal distribution.
Both have density curves that are bell shaped and centered at zero as can be seen in Figure
26.4. However, there is more area in the tails of t-distributions than there is for the standard
normal distribution. This difference is particularly noticeable when the degrees of freedom are
small (See Figure 26.4(a).).
(a) t-distribution with 5 degrees of freedom, compared with the standard normal distribution.
(b) t-distribution with 15 degrees of freedom, compared with the standard normal distribution.
Figure 26.4. Comparison: two t-distributions with standard normal distribution.
The degrees of freedom (df) associated with t*, the t-critical value in our confidence interval
formula, are related to the size n of the sample:
df = n – 1
So, for our sample of 16 observations, df = 15. The value of t* for a 95% confidence interval
can be determined from a t-table (Figure 26.5) or using statistical software (Figure 26.6). In
either case, t* = 2.131.
Figure 26.5. Using a t-table to determine t*. Figure 26.6. Using statistical software to
determine t*.
We now have everything that we need to calculate a 95% confidence interval for µ, the mean
weight of 4-year-olds. Here are the calculations:
x̄ ± t*(s/√n) = 36.47 ± (2.131)(4.23/√16) = 36.47 ± 2.25
Hence, we can say that µ is between 34.22 lbs and 38.72 lbs and that we have used a process
to calculate this interval that has a 95% track record of giving correct results.
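The interval arithmetic can be checked quickly in Python, taking the values given above (x̄ = 36.47, s = 4.23, n = 16, t* = 2.131) as inputs; this is an illustrative check, not part of the guide:

```python
from math import sqrt

# Values from the worked example: n = 16 children, df = 15, t* = 2.131.
xbar, s, n, tstar = 36.47, 4.23, 16, 2.131

# Margin of error for the 95% t-confidence interval.
margin = tstar * s / sqrt(n)   # ≈ 2.25

print((round(xbar - margin, 2), round(xbar + margin, 2)))  # → (34.22, 38.72)
```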
Next, we turn our attention to significance tests about a population mean µ for situations where
the sample size is relatively small (n < 30), the population has a normal distribution, but σ is
unknown. The t-test statistic results from replacing σ in the z-test statistic with s:
t = (x̄ − µ)/(s/√n)
Suppose that a height chart listed the average height of 4-year-olds as 39 inches. The heights
of the sample of 16 4-year-olds are given below.
We suspect that children’s heights have increased since the time the height chart was created
due in part to better nutrition. To test our supposition, we let µ represent the mean height of
4-year-olds. The null and alternative hypotheses are:
H0 : µ = 39
Ha : µ > 39
The sample mean and standard deviation for the height data are: x̄ = 40.163 inches and
s = 1.255 inches. Now we calculate the t-test statistic, replacing µ0 with its value from the null
hypothesis and substituting in the sample values for x and s:
t = (x̄ − µ0)/(s/√n) = (40.163 − 39)/(1.255/√16) ≈ 3.71
If the null hypothesis is true, then the t-test statistic will have a t-distribution with n – 1 = 16
– 1 or 15 degrees of freedom. All that is left is to determine the p-value, the probability of
observing a value at least as extreme as 3.71. Using statistical software we find p ≈ 0.001 as
illustrated in Figure 26.7. Hence, we reject the null hypothesis and conclude the mean height of
4-year-olds has increased since the time that the height chart was created.
Figure 26.7. Density curve for the t-distribution with df = 15; the area to the right of 3.71 is
0.001048.
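The t statistic and p-value for the height test can be reproduced numerically. This is our own Python sketch (not the software used in the guide); the one-sided tail area is obtained by integrating the standard t density with the trapezoid rule:

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_tail(t, df, steps=4000):
    """P(T > t) by the trapezoid rule: 0.5 minus the area from 0 to t."""
    h = t / steps
    area = sum(0.5 * h * (t_pdf(i * h, df) + t_pdf((i + 1) * h, df))
               for i in range(steps))
    return 0.5 - area

# Height example: H0: mu = 39 vs Ha: mu > 39, with the sample summaries given.
mu0, xbar, s, n = 39, 40.163, 1.255, 16
t = (xbar - mu0) / (s / sqrt(n))   # ≈ 3.71
p = t_tail(t, n - 1)               # one-sided p-value, ≈ 0.001

print(round(t, 2), round(p, 5))
```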
The study involving the 16 children followed the children for two years. As part of the study,
children were weighed when they were four and again when they were six. Table 26.1 shows
the results, including the children’s weight gain over the two-year period (Difference).
Weight Age 4 (lbs) Weight Age 6 (lbs) Difference (lbs)
37.1 46.9 9.8
26.7 35.9 9.2
36.1 48.3 12.2
36.2 44.2 8.0
40.3 50.6 10.3
43.9 56.4 12.5
36.2 51.8 15.6
40.7 50.2 9.5
42.5 51.6 9.1
34.8 41.4 6.6
37.9 55.3 17.4
34.5 44.5 10.0
31.1 39.9 8.8
36.4 46.3 9.9
35.7 49.6 13.9
33.4 39.0 5.6
Table 26.1. Change in weight from age 4 to age 6.
In the past, children around this age would have been expected to gain 4.5 pounds per year
or 9 pounds over the two-year period. However, we suspect that the mean change in weight
has increased. To test this assumption, we perform what is called a matched-pairs t-test.
The parameter µ in a matched-pairs t-procedure is the mean difference in observations
(or responses) on each individual (or subject in a matched pair) – in our case, µ is the mean
difference between weight at age 4 and weight at age 6. We set up the null hypothesis (no
change from what was expected in the past) and alternative hypothesis (increase from what
was expected in the past):
H0 : µ = 9
Ha : µ > 9
Now, we calculate the one-sample t-test statistic using the differences. The sample mean and
sample standard deviation of the differences are: x̄D = 10.525 lbs and sD = 3.122 lbs. The
matched-pairs t-test statistic is computed as follows:
t = (x̄D − µ0)/(sD/√n) = (10.525 − 9)/(3.122/√16) ≈ 1.95
From Figure 26.8 we see that p ≈ 0.035. Since p < 0.05, we reject the null hypothesis and
accept the alternative that the mean weight gain in children from age 4 to age 6 is greater than
9 pounds.
Figure 26.8. Density curve for the t-distribution with df = 15; the p-value, the area to the right
of 1.95, is 0.03506.
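The matched-pairs summaries can be verified directly from the Difference column of Table 26.1, using only Python’s standard library (an illustrative check; the code is ours):

```python
from math import sqrt
from statistics import mean, stdev

# Weight gains (lbs) from the Difference column of Table 26.1.
diffs = [9.8, 9.2, 12.2, 8.0, 10.3, 12.5, 15.6, 9.5,
         9.1, 6.6, 17.4, 10.0, 8.8, 9.9, 13.9, 5.6]

n = len(diffs)
xbar_d = mean(diffs)    # 10.525
s_d = stdev(diffs)      # ≈ 3.122 (sample standard deviation)

# Matched-pairs t-test of H0: mu = 9 vs Ha: mu > 9.
t = (xbar_d - 9) / (s_d / sqrt(n))   # ≈ 1.95

print(round(xbar_d, 3), round(s_d, 3), round(t, 2))  # → 10.525 3.122 1.95
```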
The last step in this analysis is to use a matched-pairs t-procedure to calculate a 95%
confidence interval for µ, the mean weight gain from age four to age six. The matched-pairs
t-confidence interval is computed as follows:
x̄D ± t*(sD/√n)
10.525 ± (2.131)(3.122/√16) = 10.525 ± 1.663
Notice that the t-critical value, t*, depends only on the sample size and the confidence level
and not on whether we are calculating a one-sample or a matched pairs t-confidence
interval. Our confidence interval for µ is from 8.9 lbs to 12.2 lbs.
Look back at the results from our two matched-pairs t-procedures. We concluded from the
t-test that µ, the mean weight gain from age 4 to age 6, was greater than 9 pounds. However,
our confidence interval for µ was (8.9, 12.2), an interval that includes values that are below 9
pounds. Results from a two-sided confidence interval do not always match the results from a
significance test involving a one-sided alternative.
Key Terms
Density curves for t-distributions are bell-shaped and centered at zero, similar to the
standard normal density curve. Compared to the standard normal distribution, a t-distribution
has more area under its tails. The shape of a t-distribution, and how closely it resembles the
standard normal distribution, is controlled by a number called its degrees of freedom (df). A
t-distribution with df > 30 is very close to a standard normal distribution.
A t-confidence interval for the population mean µ is given by x̄ ± t*(s/√n),
where t* is a t-critical value associated with the confidence level and determined from a
t-distribution with df = n – 1.
A t-test statistic for testing H0 : µ = µ0 , where µ is the population mean, is given by:
t = (x̄ − µ0)/(s/√n)
The t-test is a modification of the z-test and is used in situations where the population standard
deviation σ is unknown and either the population has a normal distribution or the sample size n
is large. The p-value is determined from a t-distribution with df = n –1.
A matched-pairs t-confidence interval for µD, the population mean difference, is given by
the formula:
x̄D ± t*(sD/√n)
where t* is a t-critical value associated with the confidence level and determined from a
t-distribution with df = n – 1, and x̄D and sD are the mean and standard deviation of the
sample differences.
A matched-pairs t-test statistic for testing H0 : µD = µD0 , where µD is the population mean
difference, is given by
t = (x̄D − µD0)/(sD/√n)
where x̄D and sD are the mean and standard deviation of the sample differences.
The Video
Take out a piece of paper and be ready to write down answers to these questions as you
watch the video.
1. Why won’t the z-procedure work in most cases, particularly if the sample size is small?
3. Compare a normal density curve with a t-distribution for a sample size of 3. How are the
two distributions similar and how do they differ?
4. For a t-distribution, how are the degrees of freedom related to sample size?
Unit Activity:
Step-by-Step
Pedometers count the number of steps a person walks. If you want your pedometer to
calculate how far you have walked, you need to enter in your step length (distance from heel of
one foot to heel of other foot when walking). In this activity, you will collect data on step length
for males and females in the class. Assuming that students in your class are representative
of the student population, you will calculate confidence intervals for the mean step length of
males and females.
1. Discuss methods for getting reliable measurements for step length. After your group
discussion, the class must decide on the method that will be used to collect the step-length
data. Write a brief description of this method.
2. Collect the step-length data for males and females separately. After the data are collected,
the two data sets (male step lengths, female step lengths) should be distributed to the class.
In answering the remaining questions, assume that the class data are representative of the
general male and female student populations.
3. Check that the underlying assumption of normality is reasonably satisfied in both data sets.
4. a. Calculate the mean and standard deviation for the male step-length data.
b. Calculate the mean and standard deviation for the female step-length data.
5. a. Calculate a 95% confidence interval for the mean step length of males. What are the
degrees of freedom of the t-critical value?
b. Calculate a 95% confidence interval for the mean step lengths of females. What are the
degrees of freedom of the t-critical value?
6. Based on your confidence intervals in question 5, can you conclude that the mean step
length for males is greater than for females? Explain.
Exercises
1. A woman in a nursing home is on medication for high blood pressure. Her blood pressure is
taken daily. A sample of 20 blood pressure readings (mmHg) appears below.
150 148 136 120 142 144 130 150 130 142
140 130 148 142 138 130 120 166 130 152
b. Determine a 95% confidence interval for µ, her mean systolic blood pressure.
Show your calculations.
c. Is the underlying assumption that these data come from a normal distribution reasonably
satisfied? Explain.
2. A manufacturer of brass washers produces one type of washer that has a target mean
thickness of 0.019 inches. After the production process had continued for some time without
any adjustment, a random sample of 10 washers was selected and measured for thickness.
The data are given below.
a. Does the assumption that washer thickness is normally distributed seem reasonable given
these data? Explain.
b. Do you think the production process is still in control or do the data indicate that it is time to
make some adjustments? To answer this question, test the hypothesis that the mean thickness
equals 0.019 inches against the hypothesis that it does not. Report the value of the test
statistic, the p-value and your conclusion.
c. Calculate a 95% confidence interval for µ, the mean thickness of washers currently being
produced. Show your calculations. Does your interval indicate that the process needs to be
adjusted to increase washer thickness or decrease washer thickness? Explain.
3. Students in a statistics class measured their foot lengths and forearm lengths. The data are
given in Table 26.2. Assume this sample is representative of the student population.
a. Calculate a 95% confidence interval for µForearm , the mean forearm length of students. Then
calculate a 95% confidence interval for µFoot , the mean foot length of students.
b. Do your 95% confidence intervals support the hypothesis that the mean forearm length of
students differs from the mean foot length of students? Explain.
c. Calculate 90% confidence intervals for µForearm and µFoot . Compare the 95% confidence
intervals to the 90% confidence intervals. Which of the two were wider? Explain why that was
the case.
d. Answer part (b) based on the 90% confidence intervals calculated for (c).
4. A statistics professor was concerned that students were not as successful on the second
exam (which was on inference) as they were on the first exam (which was on descriptive
statistics). She took a random sample of 15 students enrolled in her introductory statistics
courses over the past three years. Their grades on these two exams appear in Table 26.3
(continued on next page).
Exam 1 Exam 2
67 59
74 82
85 96
71 62
78 69
96 83
63 52
91 94
93 82
81 67
84 66
64 66
88 84
89 75
82 90
Table 26.3. Statistics exam scores.
a. Compute the differences between Exam 2 and Exam 1. What is the sample mean for the
differences? What is the sample standard deviation for the differences?
b. Return to the professor’s concern that students were not as successful on the material for
Exam 2 as they were on the material for Exam 1. Let µD be the population mean difference
between the two exam scores. Set up null and alternative hypotheses to test the professor’s
supposition.
c. Conduct a significance test. Report the value of the test statistic and the p-value.
d. Do the results of your test of hypotheses support the professor's concern? Explain.
Review Questions
1. Given a simple random sample of size n, you want to compute a confidence interval for µ
of the form x ± t*(s/√n). Find the value of t* for each of the following confidence levels and
sample sizes.
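If software is available, t-critical values like these can be computed rather than read from a table. Here is a minimal sketch assuming SciPy is installed (the helper name and the example confidence level and sample size are ours, for illustration):

```python
from scipy import stats

def t_critical(confidence, n):
    """t* for a one-sample t-interval based on n - 1 degrees of freedom."""
    tail = (1 - confidence) / 2          # area in each tail
    return stats.t.ppf(1 - tail, df=n - 1)

# Example: t* for a 95% confidence interval from a sample of size 15 (df = 14)
print(round(t_critical(0.95, 15), 3))    # → 2.145
```

The same call with df = 80 reproduces the t* = 1.990 used later in Unit 27.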
2. Supermarket rotisserie chickens have become very popular with the American public. A
study was conducted to compare the nutrient composition of commercially-prepared rotisserie
chicken to that of roasted chicken, which is listed in the USDA National Nutrient Database for
Standard Reference (SR).
a. The Standard Reference (SR) listed the mean protein content of roasted chicken breast
as 31 grams. In the sample of 9 rotisserie chickens, x = 29.86 grams and s = 1.95 grams.
Conduct a t-test to see if the mean protein content in rotisserie chicken breasts differs from the
SR. Report the value of the test statistic, the p-value, and your conclusion.
b. The SR listed the mean cholesterol level in roasted chicken thighs as 95 milligrams. In a
sample of 9 rotisserie chicken thighs, x = 134 milligrams and s = 2.43 milligrams. Conduct
a t-test to see if the mean cholesterol level in rotisserie chicken thighs differs from the SR.
Report the value of the test statistic, the p-value, and your conclusion.
3. A researcher studying ocean literacy focused her efforts on the program Ocean Commotion
– a one-day ocean/wetlands literacy program that includes hands-on demonstrations about
marine environments and products. Prior to attending the program, a sample of 337 students
from 6 schools were given a pre-test to measure their attitudes toward the ocean and
wetlands. After the program, students were given a post-test. The test was graded on a scale
from 1 (lowest) to 5 (highest). The mean score on the pre-test was 4.06. The mean score on
the post-test was 4.13. The standard deviation of the pre-post differences was 0.40.
a. The researcher wanted to test whether attendance at Ocean Commotion had a significant
positive effect on students’ attitudes toward the ocean and wetlands. Let µD be the
mean difference in attitude after the program compared to before the program. Write the
researcher’s null and alternative hypotheses.
b. Calculate the value of the t-test statistic. What are the degrees of freedom associated with
the t-test statistic?
d. The researcher concluded that Ocean Commotion had a significant effect on students’
attitudes toward the ocean and wetlands. Given your answer to (c), do you agree with her
conclusion? Do you think that the difference is of practical importance? Explain.
First Second Third
Questionnaire Questionnaire Questionnaire
2.6 3.1 2.2
4.9 5.0 2.5
6.0 4.8 3.9
3.3 3.5 4.5
4.1 4.2 2.9
3.8 4.0 3.1
5.2 4.6 4.8
4.3 4.7 5.1
3.1 5.3 3.9
5.5 5.2 3.8
4.2 5.4 4.9
2.0 3.2 2.8
4.3 5.1 2.9
3.6 4.6 3.2
3.1 3.8 2.9
5.2 4.7 3.4
Table 26.4. Results from Oxford Happiness Questionnaire
a. Students thought that the workshop would have a positive effect on happiness, at least short
term. Use the data from the first and second questionnaires to test their hypothesis. State the
null and alternative hypotheses, calculate the value of the t-test statistic, determine the p-value
and give your conclusion.
b. Use the data from the first and third questionnaires to test whether there is any long-term
positive effect on students from this type of happiness workshop. State the null and alternative
hypotheses, the value of the test statistic, the p-value, and your conclusion.
c. Let µ(Third – First) be the population mean difference of Oxford Happiness Questionnaire
scores: taken before and six weeks after participation in a happiness workshop. To estimate
the long-term effect of the workshop on students’ happiness, calculate a 95% confidence
interval for µ(Third – First). Interpret what it tells you about students who participated in happiness
workshops before and after participating in the Oxford Happiness Questionnaire.
d. The sample for this study consisted of students from one psychology class. Do you think the
results are valid for all college students from this university? Explain.
Unit 27: Comparing
Two Means
Summary of Video
It’s an age old battle of the sexes. Are men or women worse drivers? Whatever your opinion
on this question, a statistician needs evidence in order to make a decision. One way to
analyze this question would be to see which gender, on average, gets more moving violations.
We could take a sample from all licensed drivers in one state, and then look at the number of
tickets each person received in one year. We could then calculate the mean number of tickets
received by members of each gender and compare the two numbers to see which group had
the worst driving record.
That’s what researchers did when they decided to investigate the difference in the amount
of calories necessary to power daily life in two groups of people with very different lifestyles.
Herman Pontzer is an anthropologist who is interested in how energy is used by primate
species, particularly human beings. Pontzer teamed up with other researchers to work with the
Hadza in Tanzania, a group of traditional hunter-gatherers who live in a way very similar to our
ancestors. Men hunt with bows and arrows and women forage for plant foods and dig for root
vegetables. The Hadza are a lot more active and cover a lot more ground than their Western
counterparts. Everyone had always assumed that this physically-demanding hunter-forager
lifestyle would require much more energy than the relatively inactive daily life of a Western
office worker. In fact, one suspected cause of the obesity epidemic in the West is our
more sedentary modern lifestyle. But the Hadza’s actual energy expenditure had never yet
been tested.
Was the assumption correct that the Hadza used more calories throughout their day? Pontzer
and his team already had data on how many calories typical Americans and Europeans
burned in their daily lives. Now they needed to measure how many calories it took to power
the daily lives of the Hadza.
The Hadza are typically smaller and lighter than their Western counterparts. That difference
required Pontzer and his colleagues to use sophisticated statistical techniques in their
analyses to control for the effects of body size, age, and sex. To keep things simple, so that
we can follow their comparison, we’ll look just at women with comparable body sizes from
the Hadza and Western groups. We want to use our sample to determine whether there is a
significant difference between the means of the Hadza and Western populations.
First, the scientists calculated the mean total energy expenditure (TEE), which was measured
in calories, for each group. The sample means, standard deviations, and sample sizes for
each group are as follows:
Group         Mean TEE (calories)   Std. Dev.   Sample Size
Hadza         1,877                 364         17
Westerners    1,975                 286         26
Is the difference between these sample means significant? Or, could the difference we see be
due simply to chance variation? We can set up a significance test to figure this out. Below are
the null and alternative hypotheses concerning the total energy expenditure.
H0: µ1 = µ2
Ha: µ1 ≠ µ2
To decide between these hypotheses, we use the two-sample t-test statistic:
t = [(x1 − x2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
Now we can substitute the numbers into the formula. We have the sample means, standard
deviations, and sample sizes. For the value of µ 1− µ2 , we use the value from the null
hypothesis, which states these two means are equal, and hence, µ 1− µ2 = 0 .
t = [(1,877 − 1,975) − 0] / √((364)²/17 + (286)²/26) ≈ −0.94
Like all of the z- or t-test statistics that we have encountered, this one tells us how far x1 − x2 is
from 0, the hypothesized difference in means, in standard units.
Software can figure out the degrees of freedom for the t-test statistic, or we can just go with a
very conservative approach that uses the smaller sample size minus one, which gives us 16
degrees of freedom. We can look up the corresponding p-value in a t-table or use technology;
either way, we get p = 0.3612. That means that assuming the null hypothesis is true, we have
a 36% chance of seeing a t-value as or more extreme than the one we calculated. A 36%
chance is pretty likely, so we have insufficient evidence to reject the null hypothesis. We
conclude that there is no significant difference between total energy expenditure of Hadza
women and Western women.
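Readers with access to Python can reproduce this calculation from the summary statistics alone. A sketch, assuming SciPy is installed (the summary values are the ones quoted in the video):

```python
from math import sqrt
from scipy import stats

# Summary statistics from the video
x1, s1, n1 = 1877, 364, 17   # Hadza women
x2, s2, n2 = 1975, 286, 26   # Western women

# Two-sample t-test statistic (null hypothesis: mu1 - mu2 = 0)
se = sqrt(s1**2 / n1 + s2**2 / n2)
t = (x1 - x2) / se                      # about -0.94

# Conservative degrees of freedom: smaller sample size minus one
df = min(n1, n2) - 1                    # 16
p = 2 * stats.t.sf(abs(t), df)          # two-sided p-value, about 0.36
print(round(t, 2), round(p, 4))
```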
This, in fact, is what the researchers concluded. After controlling for body size, age, and sex,
the scientists did not find any statistical difference when they compared the mean daily energy
expenditure of the Hadza and the Westerners. This result seemed counterintuitive, since they knew
the Hadza were much more active. The researchers suspect that the Hadza’s bodies are allocating
a smaller percentage of those daily calories to run-of-the-mill cellular function and more to physical
activity. Researchers think that it is a difference in energy allocation, not energy efficiency.
B. Know how to check whether the underlying assumptions for a two-sample t-procedure are
reasonably satisfied.
C. Be able to calculate a confidence interval for the difference of two population means.
D. Be able to test hypotheses about the difference between two population means. Be able to
calculate the t-test statistic and use technology to determine a p-value.
We begin with a question related to the activity in Unit 26: Are the step lengths of 10th-grade
male students longer, on average, than the step lengths of 10th-grade female students? In
this case, the comparison is between two populations, 10th-grade males and 10th-grade
females. Let µ1 and σ 1 be the mean and standard deviation, respectively, of step lengths
for the population of 10th-grade males. Let µ2 and σ 2 be the mean and standard deviation,
respectively, for the 10th-grade females. If there is no difference between the mean step
lengths of male students and female students, then µ1 − µ2 = 0 . However, if males, on average,
take longer steps than females, then µ1 − µ2 > 0 . We can state the null hypothesis and
alternative hypothesis as follows:
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 > 0
Suppose we randomly select two samples, one of size n1 from the male students and another
of size n2 from the female students. After collecting the data, we can calculate the sample
means, x1 from the males and x2 from the females, and sample standard deviations, s1 from
the males and s2 from the females. It seems reasonable to use the difference in sample
means, x1 − x2 , to estimate the difference in population means, µ1 − µ2 . If the two populations
are normally distributed or if the sample sizes are large, then x1 − x2 has a normal distribution
with the following mean and standard deviation:
µ(x1 − x2) = µ1 − µ2 and σ(x1 − x2) = √(σ1²/n1 + σ2²/n2)
At this point, if we knew the population standard deviations, we could standardize x1 − x2 and
use a z-procedure to test our hypotheses, since the resulting two-sample z-test statistic has
the standard normal distribution.
Unfortunately, σ 1 and σ 2 are unknown, so we will need to use the sample standard
deviations, s1 and s2 , as estimates. Substituting s1 and s2 in place of σ 1 and σ 2 gives us the
two-sample t-test statistic:
t = [(x1 − x2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
The two-sample t-test statistic has an approximate t-distribution. The degrees of freedom
(df) are a bit complicated to figure out. We can either use software or adopt a conservative
approach and set the degrees of freedom to be one less than the smaller of the two
sample sizes.
Now, we return to our hypotheses about step lengths of male and female students. Sample
data were collected and are summarized below:
Males:    n1 = 12, x1 = 64.08, s1 = 7.71
Females:  n2 = 15, x2 = 60.34, s2 = 7.74
Normal quantile plots of the male and female step-length data indicate that it is reasonable
to assume that step lengths are approximately normally distributed. Now we are ready to
compute the t-test statistic. Using the null hypothesis value of 0 for µ1 − µ2 and our sample
means and standard deviations, we get:
t = [(64.08 − 60.34) − 0] / √((7.71)²/12 + (7.74)²/15) ≈ 1.25
[Figure: t-distribution with the area to the right of t = 1.25 shaded; this tail area gives
p = 0.1186]
Since p > 0.05, there is insufficient evidence to reject the null hypothesis. We cannot conclude
that the mean step length of 10th-grade male students differs from the mean step length of
10th-grade female students.
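The same arithmetic applies to the step-length test. A sketch, assuming SciPy is installed and assuming the one-sided p-value comes from the conservative degrees of freedom (smaller sample size minus one):

```python
from math import sqrt
from scipy import stats

# Summary statistics from the step-length example
x1, s1, n1 = 64.08, 7.71, 12   # males
x2, s2, n2 = 60.34, 7.74, 15   # females

t = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)   # about 1.25
df = min(n1, n2) - 1                            # conservative: 11
p = stats.t.sf(t, df)                           # one-sided, Ha: mu1 - mu2 > 0
print(round(t, 2), round(p, 4))
```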
The next example involves a study that compares two teaching strategies for nursing students
– lecture notes combined with structured group discussions versus a traditional lecture format.
Two groups of students taking a medical-surgical nursing course were taught using each of
the two strategies. Exam scores were used to compare the effectiveness of the two teaching
strategies. Let µ1 be the mean exam score for students enrolled in the lecture notes/group
discussion version of the course; let µ 2 be the mean exam score for students enrolled in the
lecture only version of the course. We set up the following null and alternative hypotheses:
H0 : µ1 − µ2 = 0
Ha : µ1 − µ2 ≠ 0
Exam scores of two groups of students taught by each of these methods were collected with
the following results:
Lecture notes/group discussion:  n1 = 81, x1 = 80.60, s1 = 7.34
Traditional lecture:             n2 = 88, x2 = 77.68, s2 = 7.23
Since the sample sizes are large, we can conduct a t-test to decide between the null and
alternative hypotheses without first checking that the data come from normal distributions.
t = [(80.60 − 77.68) − 0] / √((7.34)²/81 + (7.23)²/88) ≈ 2.60
[Figure: t-distribution with df = 80; the two tail areas beyond ±2.60 are each 0.00555, giving
p ≈ 0.011]
Since p < 0.05, we reject the null hypothesis and accept the alternative hypothesis that
the mean exam scores for the two teaching methods differ. To estimate that difference, we
calculate a two-sample t-confidence interval for µ1 − µ2 using the following formula:
(x1 − x2) ± t* √(s1²/n1 + s2²/n2)
Adopting the conservative approach, we set df = 80 and determine a t-critical value for a
95% confidence level: t*= 1.990. Now we are ready to calculate a 95% confidence interval for
µ1 − µ2 :
(80.60 − 77.68) ± (1.990) √((7.34)²/81 + (7.23)²/88) ≈ 2.92 ± 2.23, or (0.69, 5.15).
Hence, the mean exam scores for the lecture notes/group discussion teaching strategy are
between 0.69 and 5.15 points higher than the mean exam scores for the traditional lecture
teaching strategy.
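This interval can be checked in a few lines of code. A sketch, assuming SciPy is installed (summary values as quoted in the example):

```python
from math import sqrt
from scipy import stats

x1, s1, n1 = 80.60, 7.34, 81   # lecture notes / group discussion
x2, s2, n2 = 77.68, 7.23, 88   # traditional lecture

df = min(n1, n2) - 1                     # conservative: 80
t_star = stats.t.ppf(0.975, df)          # about 1.990 for 95% confidence
se = sqrt(s1**2 / n1 + s2**2 / n2)
lo = (x1 - x2) - t_star * se
hi = (x1 - x2) + t_star * se
print(round(lo, 2), round(hi, 2))        # about (0.69, 5.15)
```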
The two-sample t-test statistic for testing the difference in population means is:
t = [(x1 − x2) − (µ1 − µ2)] / √(s1²/n1 + s2²/n2)
where the value for µ1 − µ2 is taken from the null hypothesis. There are two options for finding
the degrees of freedom (df) associated with t: (1) use technology or (2) use a conservative
approach and let df = smaller of n1 − 1 or n2 − 1 .
The corresponding two-sample t-confidence interval for µ1 − µ2 is:
(x1 − x2) ± t* √(s1²/n1 + s2²/n2)
The degrees of freedom for calculating t*, the t-critical value associated with the confidence
level, uses the approach outlined for the two-sample t-test statistic.
1. How might a statistician gather evidence to answer the following question: Are men or
women worse drivers?
2. What was different about the lifestyle of the Hadzas compared to typical Europeans
or Americans?
3. What was Pontzer’s original assumption about the daily energy expenditure (in calories)
consumed by the Hadza compared to the Westerners?
4. What type of test was used in the video to test this assumption for Hadza and Western
women of similar body size?
6. On what did Pontzer and his colleagues place the blame for rising societal levels of obesity?
Nabisco’s Chips Ahoy is a popular brand of chocolate chip cookies. Nabisco makes both a
regular and, for those who want to restrict their fat intake, a reduced fat version of chocolate
chip cookies. The question for this activity is to find out whether the mean number of chips per
cookie is the same for Chips Ahoy reduced fat chocolate chip cookies as it is for Chips Ahoy
regular chocolate chip cookies.
1. If needed, collect the data on the number of chips per cookie in regular and reduced fat
Chips Ahoy cookies. Your instructor will provide directions. (You may already have collected
the data as part of Unit 25’s activity.)
2. a. Do you think the mean number of chips per cookie is the same for both Chips Ahoy
regular and Chips Ahoy reduced fat chocolate chip cookies? If not, which type, regular or
reduced fat, do you think has, on average, more chips per cookie? Explain.
b. Set up null and alternative hypotheses for testing whether the mean number of chips per
cookie is the same for both the regular and the reduced fat version of Chips Ahoy chocolate
chip cookies. Be sure to define any symbols that you use in your hypotheses. (Did you choose
a one-sided or two-sided alternative?)
3. Report the sample size, mean and standard deviation for the regular chocolate chip cookie
data. Then do the same for the reduced fat chocolate chip cookie data.
4. Make comparative graphic displays of the chip count data for the regular and reduced fat
cookies. Based on your plots, do the chip counts for the two types of cookies appear to differ?
c. Is there a significant difference in the mean number of chips per cookie in regular and
reduced fat Chips Ahoy chocolate chip cookies? Explain.
6. Calculate a 95% confidence interval for the difference between the mean number of chips
per cookie in Chips Ahoy regular and Chips Ahoy reduced fat chocolate chip cookies.
The researchers hypothesized that pregnant employees would be rated differently when
compared with the control group.
a. Set up a null hypothesis and an alternative hypothesis to test whether the population mean
performance ratings differed for the two groups of female employees.
b. Calculate the t-test statistic and determine a p-value. State your conclusion.
c. Calculate a 95% t-confidence interval for the difference in mean performance appraisal
ratings for pregnant employees and non-pregnant female employees. On average, are the
performance ratings for pregnant women better or worse than for the non-pregnant female
employees? Explain.
2. Return to the study discussed in question 1. The same group of researchers also gathered
data on the pregnant group’s performance appraisal ratings during pregnancy and after
returning from pregnancy leave. Here is a summary of the data gathered:
a. Did the mean performance ratings for the pregnancy group differ significantly between the
During Pregnancy and After Pregnancy time periods? Which test is more appropriate to
answer this question, a one-sample t-test or a two-sample t-test? Explain.
b. Use the appropriate test to answer the question posed in (a). Report the value of the test
statistic, the p-value, and your conclusion.
3. A state university is concerned that female students are not as well prepared in mathematics
as their male counterparts. Random samples of 20 male students and 20 female students were
selected from the class of first-year students. Their SAT Math scores are given below.
530 450 550 470 450 500 480 510 470 450
600 540 530 470 420 490 440 540 500 480
670 440 410 510 410 600 530 490 600 530
570 550 640 530 550 460 660 570 670 490
a. Make graphic displays to compare the SAT Math scores of the female students and the
male students. Do your plots provide evidence that male students entering the university have
higher SAT Math scores than female students?
b. Is it reasonable to assume that the distributions of SAT Math scores for both populations,
first-year male students and first-year female students, are approximately normally distributed?
Support your answer.
c. Calculate the sample means and standard deviations for the female and male students'
SAT Math scores.
4. A group of 4-year-olds, who were part of the Infant Growth Study, participated in a
laboratory meal. Data collected during this meal can be used to answer the following research
question: Does the mean number of calories consumed by girls at a meal differ from the mean
number of calories consumed by boys? Below is a summary of the results:
b. Calculate the value of the two-sample t-test statistic. (Round to three decimals.)
c. Adopt a conservative approach and set the degrees of freedom to be one less than the
smaller of the two sample sizes. Calculate the p-value. Are the results significant at the 0.05
level? Explain.
d. For two-sample t-tests, the Content Overview of this unit suggested using a conservative
approach for determining the degrees of freedom associated with the test statistic: set df =
smaller of n1 − 1 or n2 − 1 , where n1 and n2 are the sample sizes of the two groups. However,
statistical software calculates degrees for freedom using the following formula:
df = (s1²/n1 + s2²/n2)² / [ (1/(n1 − 1))·(s1²/n1)² + (1/(n2 − 1))·(s2²/n2)² ]
In general, this formula does not result in an integer value. In that case, the degrees of
freedom are rounded down to the closest integer below the calculated value.
Use the formula given above to calculate the degrees of freedom. (Be sure to round down to
the closest integer.) Calculate the p-value based on the df you calculated from the formula.
Based on this p-value, are the results significant at the 0.05 level?
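The software formula above is straightforward to implement directly. A minimal sketch (the helper name is ours; the example values are the step-length summaries from the Content Overview):

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom, rounded down to an integer."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return int(df)   # round down to the closest integer

# Example: step-length samples (males n = 12, s = 7.71; females n = 15, s = 7.74)
print(welch_df(7.71, 12, 7.74, 15))   # → 23
```

Note that this df (23 here) is noticeably larger than the conservative choice of min(12, 15) − 1 = 11, which is why software p-values are often a bit smaller.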
2. A study was conducted to investigate the effect of different levels of air pollution on the
pulmonary functions of healthy, non-smoking, young men. Two geographical areas with
different levels of air pollution were selected – Area 1 had lower levels of pollutants than
Area 2. Samples of 60 men were selected from each area. The two groups of men had no
significant differences in age, height, weight, and BMI. Data on two measures of pulmonary
function for each group are provided below:
a. Why do you think the researchers tested to see if there were significant differences between
the age, height, weight, and BMI for the two samples?
b. Test whether there is a significant difference between the mean FVC for the participants
from Area 1 and Area 2. State the value of the test statistic, the p-value and your conclusion.
c. Test whether there is a significant difference between the mean RR for the two areas. State
the value of the test statistic, the p-value and your conclusion.
d. If you find a significant difference in (b) or (c) or both, construct a 95% confidence interval to
estimate the difference in means between the two areas.
480 540 620 590 530 620 580 530 530 560 510 560 560
550 520 480 560 510 500 540 490 430 610 620 510
480 560 400 580 480 460 430 430 490 610 540 500 540
400 530 640 350 470 600 610 530 580 430 510 520 380
a. Make comparative boxplots for the SAT Writing scores for female and male students.
Based on your boxplots, is it reasonable to assume that SAT Writing scores are approximately
normally distributed for each gender? Does one gender tend to have higher SAT Writing
scores than the other?
b. Summarize the data by reporting the sample sizes, sample means and standard deviations
for both groups.
c. Test to see if there is a significant difference in mean SAT Writing scores between female
and male first-year students attending this university. Report the value of the test statistic, the
p-value, and your conclusion.
d. Compute a 95% confidence interval for the difference in mean SAT Writing scores for
female and male students attending this university. Interpret your results.
4. Do 4-year-old boys eat, on average, more mouthfuls of food at a meal than 4-year-old girls? A
group of 4-year-olds, who were part of the Infant Growth Study, participated in a laboratory meal.
b. Determine the value of the two-sample t-test statistic and the p-value.
Report your conclusion.
Summary of Video
It is nearly impossible to collect data about an entire population. Take, for example, all the
salmon in one watershed. We can’t count the number of eggs laid by every single spawning
salmon. But we can count the eggs laid by a sample of some of these salmon. Then, using
statistical inference, we can use the mean number of eggs in our sample to draw conclusions
about the egg-laying population as a whole. As part of the inference procedure, we use
probability to indicate the reliability of our results.
We can also use statistical inference to estimate a population proportion. For instance,
suppose we wanted to know how many of the eggs laid by the salmon were fertilized. We
could investigate the fertilization rate in our sample to get a sample proportion or sample
percentage. Then we could use the sample proportion as an estimate of the unknown
population proportion. But how good of an estimate is it? This will be the topic of this video –
using information from samples to make inferences about population proportions.
Let’s turn our attention to a completely different context: the workplace. Employers think
about how to motivate their employees to do their best, most creative work. Psychologist
Teresa Amabile has studied creativity for years. One of Amabile’s discoveries from her
earlier research is that creativity fluctuates, even for a given individual, as a function of the
kind of work environment the individual is in. Building on that foundation, Amabile designed
a study around the question of worker motivation. She recruited 238 people with creative
jobs who were willing to keep track of their activities, emotions, and motivation levels every
workday. Their electronic diaries had two components. One consisted of participants rating
their motivation, emotions, and other subjective factors on a seven-point scale. The second
component was an open-ended question where participants were asked to describe one
event that stood out that day. It could be anything, as long as it was relevant to the work or the
project. After several years, Amabile had nearly 12,000 diary entries. These entries validated
her earlier findings that people were able to solve problems creatively and come up with new
ideas on days they felt most motivated and excited about their work. So, the next question to
ask was: What led to high levels of motivation?
Figure 28.1. Type of event recorded on workers' best and worst days.
Amabile and her coauthor decided to survey managers to see whether they were aware of
how important this feeling of progress was in motivating workers. She asked them to rate
five different items in order of how much they felt they affected workers’ motivation. If the
managers just randomly chose one of the five options to rank as most important, we would
expect 20% of them to pick progress. So, we let p be the proportion of all managers who
would pick progress as the most important of the five items for motivating workers. Now, we
can set up a test of hypothesis for the population proportion, p:
H0 : p = 0.20
Ha : p ≠ 0.20
As it turned out, only 35 out of 669 managers selected progress as the top motivational
factor. That gives a sample proportion of just 0.0523, or a mere 5.23%. This seems pretty low
compared to the 20% proportion from our null hypothesis. But is it low enough to reject the null
hypothesis? To find out, we can turn to a z-test statistic:
z = (p̂ − p0) / √(p0(1 − p0)/n)
where p̂ (pronounced p-hat) is our sample proportion, p0 is the null hypothesis proportion,
and n is the sample size. Substituting our sample proportion and sample size we get:
z = (0.0523 − 0.20) / √((0.20)(1 − 0.20)/669) ≈ −9.55
That is a pretty extreme z-test statistic. If you compare it to a standard normal distribution,
being 9.55 standard deviations from the mean is highly unlikely. As can be seen from Figure
28.2, the area under the curve that far out is not really visible! In fact, the p-value is 0.000.
So, we have our answer: reject the null hypothesis and accept the alternative. The population
proportion of all managers in the world who would select “Support for Making Progress” as the
most important motivator is not 20%.
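A sketch of the same one-proportion z-test in Python, assuming SciPy is installed (counts as reported in the video):

```python
from math import sqrt
from scipy import stats

count, n = 35, 669           # managers who picked progress
p0 = 0.20                    # null hypothesis proportion

p_hat = count / n                                   # about 0.0523
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)          # about -9.55
p_value = 2 * stats.norm.sf(abs(z))                 # two-sided; effectively 0
print(round(p_hat, 4), round(z, 2))
```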
Now that we have rejected the null hypothesis, let’s calculate a confidence interval for the
true population proportion. We know that the sample proportion of managers who selected
progress was 0.0523, but we don’t know how close that is to the true population proportion.
Just like in the confidence intervals for one mean, we can figure out a standard error to go with
our point estimate. Here’s the formula for the confidence interval:
p̂ ± z* √(p̂(1 − p̂)/n)
Next, we use our sample information to calculate the 95% confidence interval for the
population proportion, p:
0.0523 ± 1.96 √(0.0523(1 − 0.0523)/669) ≈ 0.0523 ± 0.0169, or (0.035, 0.069).
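The interval works out the same way in code; a sketch using only the standard library:

```python
from math import sqrt

count, n = 35, 669
p_hat = count / n
z_star = 1.96                                   # 95% confidence

se = sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - z_star * se, p_hat + z_star * se
print(round(lo, 3), round(hi, 3))               # about (0.035, 0.069)
```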
So, our estimate is that only between 3.5% and 6.9% of managers in the overall population
would rate progress as the number one motivational factor. A good question to ask is how
could managers be so unaware of what really counted to their employees? What managers
have said in response to that question is that it is just part of their employees’ jobs – they are
supposed to make progress. Managers don’t typically think of progress as something that they
need to worry about. But, according to Amabile, they actually do need to worry about it a lot.
What Amabile saw in the diaries was that there were often little hassles happening in the work
lives of most of the study participants that kept them from making as much progress as they
would like. These were things that managers could have cleared away for them, without a lot
of effort, if they had just been paying attention.
On some level the workers themselves might have recognized that their best days often went
hand-in-hand with progress events. But the managers basically had no clue. It is the kind of
finding that makes perfect sense once you know about it. Sometimes you just have to ask the
right questions and know how to analyze the data.
D. Understand that the z-inference procedures for proportions are based on approximations to
the normal distribution and that accuracy depends on having moderately large sample sizes.
In inference, we start by defining the population – for our question on home-use of computers,
the population will be all households in America. Of interest is the population proportion,
p, of households in which some member owns or uses a computer at home. Now, we don’t
have access to every household in America, but we can take a sample. In a random sample of
2,500 households, 2,036 answered yes to the following question:
At home, do you or any member of this household own or use a desktop, laptop, netbook,
or notebook computer?
From this information we can calculate the sample proportion, which we label as p̂ :
pˆ = 2036/2500 = 0.8144 , or 81.44%
But how good is this estimate for p? Remember, the sample proportion, p̂ , is a statistic. If we
take another sample of 2,500 households, we will most likely get a different estimate for p. So,
as a first step in developing inference procedures for population proportions, we need to know
something about the sampling distribution of the sample proportion, p̂ .
Suppose that a large population is divided by some characteristic into two categories,
successes and failures, and that p is the population proportion of successes. A simple
random sample of size n is drawn from the population, and p̂ is the sample proportion of
successes: pˆ = (number of successes in the sample)/n.
As a statistic, p̂ varies over repeated sampling. Its sampling distribution has the
following properties:
• Mean: µ p̂ = p
• Standard deviation: σ p̂ = √( p(1 − p)/n ) .
• Distribution: For large n, p̂ has an approximately normal distribution.
Since, in the case of home use and/or ownership of computers, the sample size is large,
2,500, the sampling distribution of p̂ is approximately normal (as pictured in Figure 28.3.)
Figure 28.3. The sampling distribution of p̂ : a normal curve centered at p with standard
deviation √( p(1 − p)/n ).
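These properties can be checked with a short simulation. This is only an illustrative sketch; it assumes a population proportion of p = 0.79 and samples of size n = 2,500, the values used in the example that follows:

```python
import random

random.seed(1)
p, n = 0.79, 2500   # illustrative population proportion and sample size

# Draw many samples and record the sample proportion from each
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(1000)]

mean = sum(p_hats) / len(p_hats)
sd = (sum((x - mean) ** 2 for x in p_hats) / len(p_hats)) ** 0.5
# mean is close to p = 0.79, and sd is close to sqrt(p(1 - p)/n) ≈ 0.0081
```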
Suppose that an online source claimed that 79% of American households had a member of
the household who owned or used a computer at home. We would like to test that claim. To
do so, we use the online source’s claim about the population to set up the null and alternative
hypotheses:
H0 : p = 0.79
Ha : p ≠ 0.79
Now, if the null hypothesis is true, then the distribution of p̂ from a sample with n = 2,500 will
have an approximately normal distribution with the following mean and standard deviation:
µ p̂ = 0.79
σ p̂ = √( (0.79)(1 − 0.79)/2500 ) ≈ 0.0081
Standardizing p̂ gives the test statistic:
z = ( pˆ − 0.79)/0.0081
If the null hypothesis is true, z will have a standard normal distribution. Now, go back to the
results of the survey, pˆ = 0.8144 , and express that value in standardized units:
z = (0.8144 − 0.79)/0.0081 ≈ 3.01
We calculate a p-value for the significance test by determining how likely it is to observe a
value from the standard normal distribution that is at least 3.01 standard deviations from the
mean. In this case,
we get a p-value of 2(0.001306) ≈ 0.003. Since this p-value < 0.05, we can reject the null
hypothesis and conclude that the population proportion is not 0.79, or 79%.
z = ( pˆ − p0 ) / √( p0 (1 − p0 )/n )
where p̂ is the sample proportion. When the null hypothesis is true and the sample
size is large, the z-test statistic will have an approximate standard normal distribution.
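The boxed procedure can be sketched as a small Python function (the function name is our own; NormalDist is in the standard library from Python 3.8):

```python
import math
from statistics import NormalDist

def one_prop_ztest(x, n, p0):
    """Two-sided z-test for H0: p = p0, valid for large samples."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)            # standard deviation under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail area
    return z, p_value

z, p_value = one_prop_ztest(2036, 2500, 0.79)
# z ≈ 3.0 (the text's 3.01 comes from rounding the standard deviation to 0.0081);
# p_value ≈ 0.003
```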
Now that we have rejected the null hypothesis that members of 79% of American households
own/use a computer at home, let’s calculate a confidence interval for the true population
proportion. The formula for a confidence interval for a population proportion follows the
same pattern that was used to calculate a confidence interval for a population mean:
Here’s the formula for calculating a confidence interval for a population proportion.
pˆ ± z* √( pˆ (1 − pˆ )/n )
where p̂ is the sample proportion and z* is the z-critical value (from a standard normal
distribution) associated with the confidence level.
Suppose we decide on a 95% confidence interval for p. Then we use z* = 1.96, just as we did
in Unit 24, Confidence Intervals. All that is left is to substitute our observed sample proportion,
pˆ = 0.8144 , into the formula:
0.8144 ± 1.96 √( 0.8144(1 − 0.8144)/2500 ) = 0.8144 ± 0.0152, or roughly (0.799, 0.830).
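The substitution can be sketched in Python (the helper name is our own; the data are the 2,036 yes answers out of 2,500 households):

```python
import math

def prop_ci(x, n, z_star=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = x / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = prop_ci(2036, 2500)
# roughly (0.799, 0.830): between about 79.9% and 83.0% of households
```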
So, now we are ready to use sample proportions to conduct significance tests and calculate
confidence intervals for population proportions.
Draw a sample of size n from this population. Then the sample proportion, p̂ , is calculated
as follows:
pˆ = (number of successes in the sample)/n
If the sample size n is relatively large, the sampling distribution of the sample proportion,
p̂ , is approximately normally distributed with mean µ p̂ = p and standard deviation
σ p̂ = √( p(1 − p)/n ).
In situations where the sample size n is large, a confidence interval for the population
proportion, p, can be calculated from the formula:
pˆ ± z* √( pˆ (1 − pˆ )/n )
where p̂ is the sample proportion and z* is the z-critical value (from a standard normal
distribution) associated with the confidence level.
2. In Teresa Amabile’s earlier study of workers in creative jobs, how did participants of the
study feel on the days when they were most able to solve problems creatively and come up
with new ideas?
4. Managers were given five items, including progress, and asked to select the one that they
felt most affected workers’ motivation. If managers randomly selected one of the five items,
what percentage of the managers would we expect to select progress?
5. What type of test statistic was used to test the null hypothesis H0 : p = 0.20 , where p is the
population proportion?
6. In the video, a 95% confidence interval was calculated for the true population proportion
of managers who would select progress as the most important motivational factor. After
converting to percentages, were the values in this confidence interval below 20%, around
20%, or above 20%?
In the activity for Unit 21, you completed Table 21.1 by simulating data for inheriting blue eyes
(genes bb) from brown-eyed parents who carried a recessive gene for blue eyes (genes Bb).
You will need those data for this activity. In this activity, the population consists of the children
of brown-eyed parents, each of whom carries a recessive gene for blue eyes. Here the true
population proportion is known, p = 0.25, which is generally not the case. Knowing the
population proportion allows us to see how well the statistics perform.
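If you would rather generate comparable data by computer than by hand, here is a minimal sketch, assuming each child independently has probability 0.25 of blue eyes (the function name is our own):

```python
import random

random.seed(7)

def sample_proportion(n_children=4, p=0.25):
    """Simulate one sample of children; return the proportion with blue eyes."""
    blue = sum(random.random() < p for _ in range(n_children))
    return blue / n_children

props = [sample_proportion() for _ in range(30)]   # 30 samples, as in Table 28.1
# with samples of four children, each entry is one of 0, 0.25, 0.5, 0.75, 1.0
```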
Table 28.1. Data on children's eye color.
2. a. For each sample of four children, calculate the sample proportion of blue-eyed children,
p̂ . Enter the sample proportions in the third column of Table 28.1.
b. Notice that your sample proportions vary from sample to sample (even though the
population proportion stayed the same). What was the smallest sample proportion? What was
the largest?
c. To get a sense of the shape of the sampling distribution of the sample proportion, make
a histogram of your values for p̂ (from column three). Use class intervals of width 0.25 for
your histogram. Does your histogram indicate that the sample proportions have a normal
distribution?
3. a. Complete the fourth column of Table 28.1 by entering a running total of the number of
children as samples are combined.
This list should contain the following numbers: 4, 8, 12, . . . , 120.
b. Complete the fifth column of Table 28.1 by entering a running total of the number of blue-
eyed children as samples are combined.
4. The confidence interval formula given in the Content Overview is for large sample sizes.
After combining the data from the first 10 samples, you now have a sample of 40 children.
a. Give a point estimate for the population proportion, p, of blue-eyed children based on the 40
children from Samples 1 – 10.
b. Calculate a 95% confidence interval for p based on this sample of 40 children.
5. After combining the data from the first 20 samples, you now have a sample of 80 children.
a. Give a point estimate for the population proportion, p, of blue-eyed children based on your
sample of 80 children.
b. Calculate a 95% confidence interval for p based on this sample of 80 children.
6. After combining the data from all 30 samples, you now have 120 children.
a. Give a point estimate for the population proportion, p, of blue-eyed children based on your
sample of 120 children.
b. Calculate a 95% confidence interval for p based on this sample of 120 children.
7. Compare the margins of error for the three confidence intervals that you computed in
questions 4 – 6. What happened to the margin of error as the sample size increased?
8. From questions 4 – 6, we know that sample size affects the margin of error. How large a
sample size n is needed to guarantee that the margin of error for a 95% confidence interval for
p is less than 0.05? Complete parts (a) – (c) to find out.
a. The margin of error, E, for a 95% confidence interval is calculated by the following formula:
E = 1.96 √( pˆ (1 − pˆ )/n )
Solve this formula for n.
b. If you solved for n correctly, you found that n is a multiple of pˆ (1 − pˆ ) , which varies for
different values of p̂ . Complete the second column of Table 28.2 by calculating the values of
pˆ (1 − pˆ ) for different values of p̂ (See next page).
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Table 28.2. Values of pˆ (1 − pˆ ).
c. To find the value of n that guarantees a margin of error < 0.05, substitute the largest value
you found for pˆ (1 − pˆ ) into your equation in (a). Report the value of n needed to guarantee that
the margin of error will be less than 0.05 (regardless of the value of p̂ ).
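The reasoning in question 8 can be checked with a short calculation. Solving E = 1.96 √( pˆ(1 − pˆ)/n ) for n gives n = (1.96/E)² pˆ(1 − pˆ), which is largest when pˆ = 0.5:

```python
import math

z_star, E = 1.96, 0.05
worst_case = 0.5 * (1 - 0.5)   # p-hat(1 - p-hat) is largest at p-hat = 0.5
n = math.ceil((z_star / E) ** 2 * worst_case)
# n = 385 guarantees a margin of error below 0.05 for any sample proportion
```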
9. To conclude this activity, we know that the population proportion of blue-eyed children born
to brown-eyed parents with a blue-eye recessive gene is p = 0.25. Which of your confidence
intervals from questions 4 – 6 gave correct results? (In other words, which of your confidence
intervals contained the true population proportion?)
Taking all things together, how would you say things are these days – would you say you’re
happy or not too happy? Of the responses, 2,098 students selected happy. (These data were
from a Monitoring the Future survey.)
a. Determine the sample proportion of students who responded they were happy.
b. Calculate a 95% confidence interval for the population proportion of 12th-grade students
who are happy.
c. Would a 90% confidence interval for the proportion of happy students be wider or narrower
than the one you calculated for (b)? Justify your answer.
2. Currently, mothers in North America are advised to put babies to sleep on their backs. This
recommendation has reduced the number of cases of sudden infant death syndrome (SIDS).
However, it is a likely cause of another problem – flat spots on babies’ heads. A study of 440
babies aged 7 – 12 weeks found that 46.6% had flat spots on their heads.
a. The headline of the online news article reporting this story read: Nearly half of babies have
flat spots, study finds. Conduct a test of hypotheses to test H0 : p = 0.5 against Ha : p ≠ 0.5
where p is the population proportion of North American babies aged 7 – 12 weeks who
have flat spots on their heads. Report the value of your test statistic, the p-value, and your
conclusion.
b. Calculate a 95% confidence interval for the proportion of babies in this age group that have
flat spots.
c. Suppose you decide to use your confidence interval from (b) to make a decision between
H0 : p = 0.50 and Ha : p ≠ 0.50 . Would your decision based on your confidence interval agree
with your decision based on the z-test statistic from (a)? Explain.
3. An online article claims that 90% of American households in which a computer is owned/
used have access to the Internet. However, an Internet provider questioned the claim. The
Internet provider felt that the percentage should be higher. A phone survey contacted 1,910
c. Calculate the z-test statistic, determine the p-value, and state your conclusion.
4. Return to question 3. Calculate a 95% confidence interval for the population proportion p.
Re-express your confidence interval as a percentage.
a. Determine the sample proportion of eighth-grade students who responded that they were
involved nearly daily in some sort of physical activity.
b. A physical education teacher claimed that over 50% of all eighth-grade students in America
actively participate in physical activity on a nearly daily basis. Set up a null hypothesis and an
alternative hypothesis to test this claim.
c. Conduct a significance test for the population proportion. Report the value of the test
statistic, the p-value, and your conclusion.
2. Polls taken a few days before the 2012 presidential election between Barack Obama and
Mitt Romney did not indicate a clear winner. An NBC/Wall Street Journal poll showed that 48%
of the sample intended to vote for Obama. The polling organization announced that they were
95% confident that the sample result was within ± 2.6 percentage points of the true percent of
all voters who favored Obama.
a. Explain in plain language to someone who knows no statistics what “95% confident” means
in this announcement.
b. The poll showed Obama leading Romney 48% to 47%. Yet NBC/Wall Street Journal
declared the election was too close to call. Explain why.
For each of questions (a) – (c), determine a point estimate for the proportion of graduates from
this college who would agree with the statement. Then calculate a 95% confidence interval for
the population proportion.
4. Rasmussen Reports conducted a national survey of 1,000 adults from June 19-20, 2013.
The poll found that 63% of Americans think that a government that is too powerful is a bigger
danger than one that is not powerful enough.
a. Use the information from the report to calculate a 95% confidence interval for the proportion
of Americans who would agree with the statement above. Restate your confidence interval in
terms of percentages. What is the margin of error for your confidence interval?
b. The report concluded with the following statement: The margin of error is ±3% with a 95%
level of confidence. Compare this statement with the margin of error you calculated in (a).
c. Was a sample size of 1,000 large enough to guarantee that the margin of error was less
than 3% even if the sample percentage had been as low as 50% or as high as 80%? Explain.
d. How large a sample size was needed to guarantee that the margin of error was below 3%
regardless of the sample proportion?
Summary of Video
In this video, we visit the Broad Institute in Cambridge, Massachusetts, where our host, Dr.
Pardis Sabeti, has a small research team investigating an ancient biological battle – the non-
stop evolutionary arms race between our bodies and the infectious microorganisms that try to
invade and inhabit them. The Broad Institute is home to new high tech tools such as the latest
generation of genome sequencers. They allow us to sequence out the letters that code the
genomes of both humans and our microbial enemies. In her research, Dr. Sabeti and her team
use the data that these machines provide to find clues that might lead to new ways to battle
some of our most dangerous diseases, diseases that we in the West rarely encounter.
One of the deadliest is Lassa fever, which, like the more notorious tropical disease Ebola,
is caused by a virus and kills its victims with hemorrhagic fever. Throughout West Africa,
thousands of people die of Lassa fever every year. But what is surprising is that many tens
of thousands more throughout the region are exposed to the virus without getting sick. This
suggests that these people have some sort of resistance to the virus. It is the source of this
resistance that Dr. Sabeti wants to discover.
Dr. Sabeti’s work on Lassa fever is still in its early stages, but one of the models for what she
hopes to uncover can be found in the research on another tropical disease, malaria, which
kills and sickens millions every year. With malaria we already know of one important source of
resistance to the disease. It’s a genetic mutation that is better known for the harm it does than
for the good – sickle cell anemia.
As we discovered in the module on binomial distributions (Unit 21), if a child inherits two copies
of the sickle cell mutation (SS) from his or her parents, the child will have sickle cell anemia. If
the child inherits only one copy of the gene, he or she is unaffected by the disease, but more
importantly the child is protected against malaria. (See Figure 29.1.) It is this protective effect that is
responsible for the sickle cell mutation becoming so prevalent and it is statistics that can reveal it.
To see how two-way tables can help reveal protective factors, Dr. Sabeti has borrowed some
data from Dr. Hans Ackerman. He and his colleagues looked at the genotypes of 315 children
with severe malaria. Since each child inherits one hemoglobin gene from each parent, they
examined 630 genes in total. The researchers wanted to quantify whether children who came
down with malaria were less likely to have inherited the protective sickle cell version of the
gene (HbS) rather than the normal version (HbA), as compared to the general population.
Table 29.1 shows the breakdown of HbA and HbS in two groups of children. The top row of the
table shows the genes they found in the children with severe malaria. The bottom row shows
the genes they found in a control group of newborn babies.
Intuitively, we would expect to find the protective version of the gene less frequently in the
children sick with malaria than in the control group. After all, if they were protected, they likely
wouldn’t have come down with the disease. Table 29.2 shows the conditional distribution of
HbA and HbS for each group of children.
Notice that HbS was inherited by the kids who caught malaria only 1.11% of the time compared
to 8.66% of the time by the control group. Is that difference larger than would be expected
just by chance? Is it statistically significant? We can conduct a test of hypotheses to find
out whether there is sufficient evidence that the status of two variables – Malaria/General
Population and HbS/HbA – are linked. Our null hypothesis is that there is no association
between contracting malaria and having the HbS sickle cell gene. The alternative hypothesis is
that there is an association between contracting malaria and having the protective HbS sickle
cell gene.
We can compute what the expected counts in our two-way table would be if there really is no
association between our variables as the null hypothesis states. Here's how to compute the
expected counts:
expected count = (row total)(column total)/grand total
Table 29.3 shows the results of adding the expected counts to our two-way table.
Table 29.3. Adding the expected counts.
Now we can see that if there were no relationship between having the gene and coming down
with the disease we would expect to find 37.9 HbS genes in the children with malaria. But in reality
there are only 7 HbS genes in that group. Is that difference between 7 and 37.9 enough to tell us
that there is an association between our two categorical variables? The next step in our analysis
is to use the chi-square test statistic, given below, to figure out if that difference is significant.
The chi-square test statistic is a measure of how far the observed counts in the table are from
the expected counts:
χ² = Σ (observed − expected)² / expected
Here are the calculations:
χ² = (623 − 592.1)²/592.1 + (7 − 37.9)²/37.9 + (1065 − 1095.9)²/1095.9 + (101 − 70.1)²/70.1
χ² ≈ 41.26
Using software, we find the p-value: p ≈ 0 . So, we have very strong evidence that there is an
association between our variables and we can reject our null hypothesis. This result, together
with the pattern of the data, gives support to the research hypothesis that the HbS sickle cell
variant of the hemoglobin gene does protect against malaria.
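The test in the video can be reproduced with a short Python sketch. The observed counts are reconstructed from the figures quoted above (7 of the 630 genes in the malaria group were HbS; 8.66% of the 1,166 control genes, about 101, were HbS), so treat them as approximate:

```python
def chi_square(table):
    """Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

observed = [[623, 7],      # severe malaria group: HbA, HbS
            [1065, 101]]   # control newborns: HbA, HbS
chi_square(observed)       # ≈ 41.26, matching the video
```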
Each year, the study Monitoring the Future: A Continuing Study of American Youth (MTF)
surveys 12th -grade students on a wide range of topics related to behaviors, attitudes,
and values. It is a major source of information on smoking, drinking, and drug habits of
American youth.
Suppose we want to investigate whether the environment in which students grow up is linked
to the likelihood that they have consumed alcohol (more than just a few sips). We focus on
three growing-up environments – a farm, the country, or a small-to-medium size city. Since we
expect the growing-up environment may help us explain the likelihood of alcohol consumption,
environment is the explanatory variable, and alcohol consumption is the response variable. We
are interested in testing if there is an association between these two variables or if they are
independent.
The two-way table in Table 29.4 shows the results on these questions from the
2011 MTF survey.
Environment
Count                 A Farm   Country   Small/Medium City
Alcohol   No          144      342       1366
          Yes         305      800       3049
Table 29.4. Counts of alcohol consumption by growing-up environment.
We begin analyzing these data using techniques covered in Unit 13, Two-Way Tables.
Because we think that growing-up environment explains whether or not students might have
consumed alcohol, we calculate the conditional percentages for the variable alcohol for each
level of environment. In other words, we compute the column percentages, which appear in
Table 29.5.
Environment
Percent               A Farm   Country   Small/Medium City
Alcohol   No          32.07    29.95     30.94
          Yes         67.93    70.05     69.06
Total                 100.00   100.00    100.00
Table 29.5. Column percentages.
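The column percentages can be recomputed directly from the counts in Table 29.4 (a quick Python sketch; the dictionary layout is our own):

```python
# (No, Yes) counts for each growing-up environment, from Table 29.4
counts = {"A Farm": (144, 305),
          "Country": (342, 800),
          "Small/Medium City": (1366, 3049)}

# Divide each count by its column total to get the conditional distribution
col_pct = {env: (round(100 * no / (no + yes), 2), round(100 * yes / (no + yes), 2))
           for env, (no, yes) in counts.items()}
# e.g. col_pct["Country"] == (29.95, 70.05)
```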
Based on Table 29.5, it looks as if students who grew up in the country were the most likely
(70.05%) to have drunk alcoholic beverages and the students who grew up on a farm were
the least likely (67.93%). To see whether differences this small are statistically significant,
we test the null hypothesis that there is no association between growing-up environment
and alcohol consumption.
Remember, the data in Table 29.5 came from a sample of 12th-grade students. The meaning
of the null hypothesis is that in the population of all 12th-grade students in America there
is no difference among the distributions of alcohol consumption for the three growing-up
environments. To test H0 we compare the observed counts from Table 29.4 with the counts
that we would expect to see if the two variables were independent (no association). If it
turns out that the observed counts are far from the expected counts, then we would have
evidence against the null hypothesis. Here’s how to calculate the expected counts.
Assume that H0 is true and that there is no association between two variables in a
two-way table. Then the expected count in any cell of the table is computed as follows:
expected count = (row total)(column total)/grand total
where the grand total is the sum of the counts in all cells in the table.
Before calculating the expected counts, we add the row and column totals to our table of
counts (See Table 29.6.).
Environment
Count              Farm   Country   City    Total
Alcohol   No       144    342       1366    1852
          Yes      305    800       3049    4154
Total              449    1142      4415    6006
Table 29.6. Observed counts with row and column totals.
For example, the expected count for the No-Farm cell is:
expected count = (1852)(449)/6006 = 138.45
Table 29.7 shows the expected counts added to the table. For each cell, the expected counts
appear below the observed counts.
Environment
Count                       Farm    Country   City      Total
Alcohol   No    Observed    144     342       1366      1852
                Expected    138.5   352.1     1361.4
          Yes   Observed    305     800       3049      4154
                Expected    310.5   789.9     3053.6
Total                       449     1142      4415      6006
Table 29.7. Observed and expected counts.
χ² = Σ (observed − expected)² / expected
χ² = (144 − 138.5)²/138.5 + (342 − 352.1)²/352.1 + (1366 − 1361.4)²/1361.4
     + (305 − 310.5)²/310.5 + (800 − 789.9)²/789.9 + (3049 − 3053.6)²/3053.6
χ² ≈ 0.76
The p-value is the area under the chi-square density curve with df = (2 − 1)(3 − 1) = 2 that
lies to the right of 0.76, which is approximately 0.6839. Since this p-value is large, we cannot
reject the null hypothesis; the data do not provide evidence of an association between
growing-up environment and alcohol consumption.
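These calculations can be sketched in Python. For df = 2 the upper-tail chi-square probability has the closed form e^(−x/2), which is why no table lookup is needed here:

```python
import math

observed = [[144, 342, 1366],   # No alcohol: Farm, Country, City
            [305, 800, 3049]]   # Yes alcohol
row_t = [sum(r) for r in observed]
col_t = [sum(c) for c in zip(*observed)]
grand = sum(row_t)

stat = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_t[i] * col_t[j] / grand
        stat += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)   # df = 2
p_value = math.exp(-stat / 2)   # exact upper-tail formula when df = 2
# stat ≈ 0.77 (0.76 in the text, which uses rounded expected counts);
# p_value ≈ 0.68
```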
Next, we ask whether the same results would be true for 12th-grade students’ smoking habits.
In other words, are the smoking habits of 12th-grade students independent of their growing-up
environment? Table 29.8 gives results for these questions from the 2011 MTF survey. (More
students answered the question on smoking than did on drinking alcohol.)
This time we leave the work of calculating the expected cell counts to the statistical software
Minitab. Figure 29.3 shows the Minitab output.
Figure 29.3. Minitab chi-square analysis for smoking and growing-up environment.
Notice that the expected cell counts appear below the observed counts in the table. The value
of the test statistic is χ² ≈ 56.2. Since this is a 3×3 table, df = (3 − 1)(3 − 1) = 4. Minitab
reports the p-value as approximately 0. So, we conclude that the variables smoking and
growing-up environment are not independent – there is an association. The results from the
chi-square test do not tell us anything about the nature of the association, only that there is
one. To learn about the nature of that association, we look at the conditional distributions of
smoking for each of the growing-up environments. Table 29.9 shows the column percentages.
Unit 29: Inference for Two-Way Tables | Student Guide | Page 10
Environment
Percent Farm Country City
Never 53.3 53.6 60.3
Smoking Occasionally 28.3 29.3 28.5
Regularly, now or past 18.4 17.1 11.2
Total 100.0 100.0 100.0
What we notice from Table 29.9 is that a higher percentage of students who grew up in a city
never smoked (60.3%) compared to students who grew up on a farm (53.3%) or in the country
(53.6%). The percentages for students who occasionally smoked (but not regular smokers)
were about the same for all three growing-up environments. However, the percentage of
regular smokers (either now or in the past) was higher for students who grew up on a farm
(18.4%) or in the country (17.1%) compared to students who grew up in a city (11.2%).
The chi-square test, like the z-test for proportions, is an approximate method that becomes
more accurate as the cell counts get larger. If the expected cell counts get too low, the test
becomes untrustworthy. Here are some guidelines for when a chi-square test gives accurate
results: all expected cell counts should be at least 1, and no more than 20% of the expected
cell counts should be less than 5 (for a 2×2 table, all four expected counts should be 5 or
greater).
Statistical software will often give a warning if the guidelines have been violated. For example,
energy drinks – non-alcoholic beverages that usually contain high amounts of caffeine (e.g.,
Red Bull, Full Throttle, and Monster) – have caused concern in the medical community.
Suppose we wanted to know if the pattern of daily consumption of energy drinks was
associated with students’ growing-up environment.
The output from Minitab appears in Figure 29.4. Notice the software reports the value of the
chi-square test statistic, but this time it does not provide a p-value. Instead it prints a warning,
which we have highlighted.
Figure 29.4. Minitab chi-square analysis for energy drinks and growing-up environment.
In this case, we could combine some of the categories for energy drinks. For example, we
might combine categories Three, Four, and Five or more into a single category "Three or
more." You will get a chance to try this approach in the exercises.
Data for two-way tables can arise in different ways. In the case of the Monitoring the Future
data, a single sample of high school students was chosen to take part in the survey. Their
responses to two questions (two categorical variables) were organized into two-way tables.
That was not the case for the data discussed in the video. Those data came from two different
samples, a sample of children sick with malaria and a sample of newborns (control group),
which were then classified according to one categorical variable.
Chi-square statistic:
χ² = Σ (observed − expected)² / expected
Degrees of freedom for the chi-square test of independence: (r − 1)(c − 1), where r and c are
the number of rows and columns, respectively.
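When no χ² table or software is at hand, the p-value can be approximated by simulation, using the fact that a chi-square variable with df degrees of freedom is the sum of df squared standard normal variables (a sketch; the function name is our own):

```python
import random

def chi2_upper_tail(stat, df, reps=100_000):
    """Monte Carlo estimate of P(chi-square with df degrees of freedom >= stat)."""
    random.seed(0)
    hits = 0
    for _ in range(reps):
        x = sum(random.gauss(0, 1) ** 2 for _ in range(df))  # one chi-square draw
        if x >= stat:
            hits += 1
    return hits / reps

chi2_upper_tail(0.76, 2)   # close to the 0.6839 found for the alcohol table
```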
1. What type of research is the host of this series, Dr. Pardis Sabeti, involved in?
2. Dr. Sabeti’s work is modeled off of work done on malaria. What genetic mutation is an
important source of resistance to malaria?
3. What were the null and alternative hypotheses for testing whether the sickle cell gene
protects against malaria?
4. What is the rule for calculating the expected counts under the null hypothesis?
5. The p-value of the chi-square test statistic turned out to be approximately 0. What can you
conclude based on this p-value?
This activity is in three parts. In Part I, you will examine the reasoning behind the expected
count formula. In Part II, you will need to collect data on eye color and gender from a sample of
students. In Part III, there are different samples – different types of M&M candies. The candies
are classified on one variable, color. In all three cases, you will conduct chi-square analyses.
1. A survey given to 500 students asked: How would you describe your political preference?
There were three response choices: GOP (Republican), DEM (Democrat), and IND
(Independent). Keeping with the color theme of this activity, GOP is red (red states tend to
vote Republican), DEM is the blue, and to make the color scheme patriotic, we’ll let IND be
represented by the color white. In addition to collecting information on political preference, the
students indicated whether they were male or female. The results are given in Table 29.10.
Table 29.10. Distribution of political preference and gender.
We are interested in finding out whether there is an association between gender and political
preference. We begin attacking this problem as a problem in probability. For example, to
estimate the probability that a randomly selected student will be female and a Democrat (blue),
we use the observed proportion 107/500. We can also calculate marginal probabilities using
the row or column totals. For example, we estimate the probability that a student prefers the
Democratic Party to be 196/500 and the probability that a randomly selected student is female
as 246/500.
Using probability, we can examine what it would mean for the variables gender and political
preference to be independent (or to have no association). If gender and political preference are
independent, then we can use the Multiplication Rule.
a. Assuming gender and political preference are independent, use the Multiplication Rule to
calculate P(political preference = DEM and gender = female).
b. Use your probability in (a) to determine the number of students out of the 500 observed that
you would expect to fall into the category of being female and preferring the Democratic Party.
c. In a test of the null hypothesis H0 : no association between the variables, the formula for
calculating the expected count is
expected count = (row total)(column total)/grand total
For the cell corresponding to female and DEM, determine the expected count from the formula
above. Compare your result with your answer to (b).
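The agreement between (b) and (c) is no accident; a short Python check shows that n·P(DEM)·P(female) under independence equals (row total)(column total)/grand total (marginal counts taken from the text):

```python
n = 500
dem_total, female_total = 196, 246   # marginal totals from Table 29.10

# Multiplication Rule under independence
p_independent = (dem_total / n) * (female_total / n)
expected = n * p_independent
# expected ≈ 96.4, identical to dem_total * female_total / n
```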
d. Repeat (a) - (c) for the cell corresponding to DEM and Male.
2. a. Assuming that the null hypothesis in 1(c) is true, calculate the expected counts for each
cell in Table 29.10.
b. Calculate the value of the chi-square test statistic and the degrees of freedom. Then
determine the p-value.
One way to gather data that is appropriate for chi-square analysis is to select a single sample
and then to classify the subjects in that sample by two categorical variables.
You will need a sample of students (your class, combined classes, friends). The two variables
that you will use to classify the students in your sample are gender and eye color. The null
hypothesis is:
H0 : No association between gender and eye color.
or equivalently:
H0 : The variables gender and eye color are independent.
Eye Color
Count              Blue   Brown   Other   Total
Gender    Male
          Female
Total
c. Calculate the expected cell counts and enter them into your table.
d. Perform a chi-square test. Report the value of the test statistic, the p-value, and your
conclusion.
Another data structure that is appropriate for chi-square analysis is when samples are drawn
from different populations and classified on one categorical variable. In this case, we can
think of “which sample” as the second variable. Next, your samples will be from different types
of M&M candies. Given bags of at least two types of M&Ms, you will classify the M&Ms into
colors, taking care to record which type of M&Ms candies you are classifying.
H0 : No association between M&M type and color.
or equivalently:
H0 : The color distributions are the same for the different M&M types.
4. a. Collect the color distribution from bags of up to four types of M&Ms. Then enter your data
into a table similar to the one in Table 29.12. (Be sure to record the type.)
Color
  Blue
  Yellow
  Orange
  Red
  Brown
  Total
Table 29.12. Data on M&Ms type and color.
b. State the null and alternative hypotheses.
c. Calculate the expected cell counts and enter them into your table.
d. Perform a chi-square test. Report the value of the test statistic, the p-value, and your
conclusion.
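For samples classified by type and color like this, SciPy's chi2_contingency carries out the whole test in one call. The counts below are made-up illustrations, not real bag counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical color counts for two types of M&Ms
# (rows: Blue, Yellow, Orange, Red, Brown; columns: two types)
counts = np.array([[25, 18],
                   [20, 15],
                   [22, 19],
                   [15, 14],
                   [18, 24]])

# Returns the chi-square statistic, p-value, degrees of freedom,
# and the table of expected counts
chi_sq, p_value, df, expected = chi2_contingency(counts)
print(chi_sq, df, p_value)
```

With five colors and two types, the degrees of freedom are (5 − 1)(2 − 1) = 4.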
1. One of the questions on the MTF survey asked the following: About how many (if any) energy
drinks do you drink PER DAY, on average? Figure 29.4 (see Page 12) shows Minitab results
from testing to see if there is an association between the number of energy drinks students
consumed each day and their growing-up environment. As noted in the Content Overview,
Minitab computed the value of the chi-square test statistic but did not compute a p-value.
a. Explain all ways in which this analysis failed to meet the guidelines for using a chi-square test.
b. In order to continue the investigation into an association between energy drink consumption
and growing-up environment, we decided to combine the last three categories (Three, Four,
and Five or more) into a single category Three+. Make a copy of Table 29.13. Use the data
from Figure 29.4 to fill in the observed values in the third row of the table. Then find the row
total and enter that into your table.
                                  Environment
Count                   Farm     Country    City      Total
Energy   None  Observed   57      144       598        799
Drinks         Expected   52.55   150.44    596.01
         One   Observed   11       44       160        215
               Expected   14.14    40.48    160.38
         Two   Observed    4       13        36         53
               Expected    3.49     9.98     39.54
        Three+ Observed
               Expected
         Total            73      209       828       1110
Table 29.13.
d. Calculate the value of the chi-square test statistic. How many degrees of freedom are
associated with this statistic?
                              Intelligence
Count       Below Average    Average    Above Average    Total
Gender
  Female         437           2243          4072
  Male           456           1643          4593
  Total
Table 29.14.
a. We would like to test whether there is a statistical difference between how males and
females rate their intelligence compared to their peers. In this context, which is the explanatory
variable and which is the response variable? Explain.
c. Make a copy of Table 29.14. Calculate the row totals and column totals and enter them into
your table. Then calculate the expected counts for each cell and enter the expected counts
into your table.
d. Calculate the chi-square test statistic. What are the degrees of freedom associated with the
chi-square test statistic?
3. We would expect that there is an association between how students rated their intelligence
and their academic success. Table 29.15 organizes students' responses rating their intelligence
compared to their peers and their average grade in high school.
                           Average Grade
Count                  A        B       C or Below
Intelligence
  Above              2886     4044        1387
  Average            1335     1881         585
  Below               305      416         164
Table 29.15.
c. Calculate the chi-square test statistic. State the degrees of freedom. Determine the p-value.
d. If the null hypothesis is true, how likely would it be to observe a value from the chi-square
distribution at least as large as the value of the test statistic that you calculated in (c)? Does
this provide strong evidence against the null hypothesis? Explain.
4. Another question on the MTF survey asked the following: On average over the school year,
how many hours per week do you work in a paid or unpaid job? The survey results, classified
into a two-way table, are shown in Figure 29.5. In addition, the Minitab output contains the
conditional distributions of hours worked per week for each gender (row percentages). And
finally, of particular interest is whether or not there is a statistical difference in work patterns
between male and female 12th-grade students. The expected counts, under the hypothesis
that there is no association between gender and work patterns, also appear in Figure 29.5.
(See key at bottom of output for the order of appearance.)
Figure 29.5. Minitab chi-square analysis for gender and weekly work hours.
a. State the appropriate null and alternative hypotheses for this situation.
Unit 29: Inference for Two-Way Tables | Student Guide | Page 22
b. Report the outcome of the chi-square test and state your conclusion.
c. A chi-square test tells you whether or not there is an association between the two variables
but it doesn’t tell you anything about the nature of that association. Based on the row
percentages, describe the nature of the association between gender and hours worked per
week or describe evidence for the lack of such an association.
a. Set up the hypotheses to test whether there is a relationship between eel species and
habitat use.
c. Calculate the chi-square test statistic. Show your calculations. Report the degrees of
freedom, and the p-value. At the 0.05 level of significance, is the habitat use independent of
the species of moray eel?
d. To examine the nature of any association between the two variables, habitat use and moray
eel species, calculate either row or column percentages, whichever is more appropriate to the
situation under study. Justify your choice of type of percentage. What do your percentages
reveal about moray eels?
2. A random sample of registered voters was asked about their educational background and
whether or not they voted in the November 2012 elections. Table 29.17 contains the results of
the survey.
b. Set up the hypotheses for testing whether educational attainment and voting in the 2012
presidential election are independent.
d. Calculate the chi-square test statistic, state the degrees of freedom, and determine the
p-value. Are the results significant?
e. Make a bar chart that displays how voting patterns are related to highest educational
attainment. (Your choice of which variable is the explanatory variable should be evident in
your display.) Label the bars with the corresponding percentages. Describe the nature of the
relationship between the two variables.
3. Some tired, stressed-out students have turned to 2-ounce energy drink shots such as
5-Hour Energy to give them the energy boost they feel they need to make it through the day
(or night). Compared to energy drinks that can run about 100 calories per 8-ounce serving,
energy shots are sugar free and are around 4 calories per shot.
Because of the low calorie count, would female students be apt to drink more energy shots on a
daily basis than male students? To find out, researchers asked a group of 12th-grade students
the following question: How many (if any) energy drink shots do you drink PER DAY, on average?
Table 29.18 gives the results from a survey given to a sample of 12th-grade students.
b. Based on your answer to (a) do the expected counts satisfy the guidelines for using a chi-
square test? Explain.
c. Combine some of the categories for the amount of energy shots consumed per day.
Compute the expected counts and check to see if the guidelines for using the chi-square test
are satisfied. If not, combine some additional categories until the guidelines are satisfied.
(There are different choices for how the categories can be combined.)
d. Perform a chi-square test on your data from (c). What is the value of the chi-square test
statistic? Report its p-value. What conclusions could the researchers draw from your results?
Summary of Video
In Unit 11, Fitting Lines to Data, we examined the relationship between winter snowpack and
spring runoff. Colorado resource managers made predictions about the seasonal water supply
using a least-squares regression line that was fit to a scatterplot of their measurement data,
which is shown in Figure 30.1.
But would we really see a linear relationship between snowpack and runoff if we had all the
possible data? Or might the pattern we see in the sample data’s scatterplot occur just by
chance? We would like to know whether the positive association we see between snowpack
and runoff in the sample is strong enough that we can conclude that the same relationship
holds for the whole population. Statisticians rely on inference to determine whether the
relationship observed between two variables in a sample is valid for some larger population.
Inference is a powerful tool. Powerful enough, in fact, to help bring an entire bird species
back from the brink of extinction. After World War II, the agrichemical industry began mass-
producing chemicals to control pests. Cities like San Antonio, Texas, sprayed whole sections of
the city with the insecticide DDT in their fight against the spread of poliomyelitis. Unfortunately,
the spraying had unintended consequences. In Great Britain, Derek Ratcliffe noticed in the
1950s that peregrine falcons were declining at nesting sites and were unable to hatch their
eggs. This decline in falcons was eventually
demonstrated to be a worldwide phenomenon. Researchers determined that the reason
peregrine falcons were not successfully hatching their eggs was due to eggshell thinning, a
very serious problem since the weaker shells were breaking before the baby birds were ready
to hatch. After looking at some of the causes for this eggshell thinning, scientists began to zero
in on a possible culprit: DDT and its breakdown product, DDE.
There were a couple of reasons why scientists believed that there was a relationship between
DDT or DDE and eggshell thinning. In studying the broken eggshells and eggs collected in the
field, scientists found very high residues of DDE that had not been seen in historic samples.
The falcons were ingesting DDT through their prey – birds they ate had small concentrations
of the chemical in their flesh. Over time the DDT built up in the peregrines’ own bodies and
started to affect the females’ ability to lay healthy eggs.
Even though scientists had a pretty strong hunch that DDT was the cause of peregrine falcon
eggshell thinning, they could not rely on their scientific instincts alone. So, researchers turned
to statistics as a way to validate their analyses. We can follow in the researchers’ footsteps by
taking a look at a data set comprised of 68 peregrine falcon eggs from Alaska and Northern
Canada. A scatterplot of the two variables we will be studying, eggshell thickness (response
variable) and the log-concentration of DDE (explanatory variable), appears in Figure 30.2. We
have added the least-squares regression line fit to these data. Remember it is described by an
equation of the form ŷ = a + bx .
The data in Figure 30.2 show a negative, linear relationship between the two variables. Using
the equation, we can predict eggshell thickness for any measurement of DDE. The slope
b and intercept a are statistics, meaning we calculated them from our sample data. But if
we repeated the study with a different sample of eggs, the statistics a and b would take on
somewhat different values. So, what we want to know now is whether there really is a negative
linear relationship between these variables for the entire population of all peregrine eggs,
beyond just the eggs that happen to be in our sample. Or might the pattern we see in the
sample data be due simply to chance variation?
Data of the entire peregrine egg population might look like the scatterplot in Figure 30.3.
Notice that for any given value of the explanatory variable, such as the value indicated by the
vertical line, many different eggshell thicknesses may be observed.
Figure 30.4. The population regression line fit to the population data.
Several conditions, which are discussed in the Content Overview, must be met in order to
move forward with regression inference. You can check out whether these conditions are
satisfied in Review Question 1. But for now, we assume that the conditions for inference are
met. The population regression model is written as follows:
µy = α + βx
where µy represents the true population mean of the response y for the given level of x, α
is the population y-intercept, and β is the population slope. Now let's look back at our least-
squares regression line, based on the sample of 68 bird eggs. The equation is
ŷ = 2.146 − 0.3191x
The sample intercept, a = 2.146, is an estimate for the population intercept α . And the sample
slope, b = -0.3191, is an estimate for the population slope β.
Of course, we've learned by now that other samples from the same population will give us
different data, resulting in different estimates of the parameters α and β. In repeated sampling,
the values of these statistics, a and b, form sampling distributions, which provide the basis for
statistical inference. In particular, we want to infer from the sampling distribution of our statistic
b whether the sample data provide sufficiently strong evidence that higher levels of DDE are
related to eggshell thinning in the population. To test H0 : β = 0, we compute the t-test statistic
t = (b − β0) / sb
where b is our sample estimate for the population slope, β0 is the null hypothesis value for
the population slope, and sb is the standard error of the estimate b, which we can get from
software. In this case, sb = 0.0255. Next, we calculate the value of our t-test statistic:
t = (−0.3191 − 0) / 0.0255 ≈ −12.5
If the null hypothesis is true, then t has a t-distribution with n – 2, or 66, degrees of freedom.
The value t = -12.5 is an extreme value and the corresponding p-value is essentially 0. Thus,
we have strong evidence to reject the null hypothesis. By rejecting the null hypothesis, we
can confirm what scientists already suspected – that there is a connection between peregrine
falcon eggshell thickness and the presence of DDE. More precisely, there is a statistically
significant, negative linear relationship between the log-concentration of DDE and the
thickness of peregrine eggshells.
Before researchers could present this finding to the public, however, they had to quantify the
relationship. That meant computing a confidence interval for the population slope. Here’s the
formula:
b ± t * sb
For a 95% confidence interval and df = 68 – 2 = 66, we find t* = 1.997. Now, we can compute
the confidence interval:
−0.3191 ± 0.0509
−0.3700 to −0.2682
Hence, based on our sample of 68 peregrine falcon eggs, we are 95% confident that a one-
unit increase in the log-concentration of DDE is associated with a true average decrease of
between 0.27 and 0.37 in Ratcliffe’s eggshell thickness index. Armed with this information,
scientists were able to make a strong argument against the use of DDT because of its
dangerous impact on peregrines and the environment as a whole. These results led to
a prolonged legal battle with people on both sides presenting evidence. Due to scientific
and statistical evidence, the United States and many Western European countries banned
DDT use. Since then, the peregrine falcon population has rebounded significantly. So, this
environmental detective story has a happy ending for the peregrine falcons.
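The arithmetic in this summary is easy to verify. A minimal sketch, assuming SciPy and using the rounded values b = −0.3191 and sb = 0.0255 reported above:

```python
from scipy.stats import t as t_dist

b, s_b, n = -0.3191, 0.0255, 68
df = n - 2                          # 66 degrees of freedom

# t-test statistic for H0: beta = 0
t_stat = (b - 0) / s_b              # about -12.5

# 95% confidence interval for the population slope
t_star = t_dist.ppf(0.975, df)      # about 1.997
lo, hi = b - t_star * s_b, b + t_star * s_b
print(t_stat, (lo, hi))
```

The interval reproduces the −0.37 to −0.27 range quoted above for the decrease in the eggshell thickness index.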
B. Know how to check whether the assumptions for the linear regression model are reasonably
satisfied.
C. Recall how to find the least-squares regression equation (Unit 11, Fitting Lines to Data).
D. Be able to calculate, or obtain from software, the standard error of the estimate, se, and the
standard error of the slope, sb.
To better understand the relationship between fish size and mercury concentration, the United
States Geological Survey (USGS) collected data on total fish length and mercury concentration
in fish tissue. (Total length is the length from the tip of the snout to the tip of the tail.) The data
from a sample of largemouth bass (of legal size to catch) collected in Lake Natoma, California,
appear in Table 30.1. (You may remember these data from Review Question 3 in Unit 11.)
Total Length Mercury Concentration Total Length Mercury Concentration
(mm) (µg/g wet wt.) (mm) (µg/g wet wt.)
341 0.515 490 0.807
353 0.268 315 0.320
387 0.450 360 0.332
375 0.516 385 0.584
389 0.342 390 0.580
395 0.495 410 0.722
407 0.604 425 0.550
415 0.695 480 0.923
425 0.577 448 0.653
446 0.692 460 0.755
Table 30.1. Fish total length and mercury concentration in fish tissue.
Since we believe that fish length explains mercury concentration, total length is the
explanatory variable and mercury concentration is the response variable. A scatterplot of
mercury concentration versus total length appears in Figure 30.5.
Figure 30.5. Scatterplot of mercury concentration (µg/g) versus total length (mm).
Since the pattern of the dots in the scatterplot indicates a positive, linear relationship between
the two variables, we fit a least-squares line to the data. However, these data are a sample of
20 largemouth bass from the population of all the largemouth bass that live in Lake Natoma.
While we can use the least-squares equation to make predictions about mercury concentration
for fish of a particular length, we need techniques from statistical inference to answer the
following questions about the population:
• Can we determine a confidence interval estimate for the population slope, the rate of
change of mercury concentration per one millimeter increase in fish total length?
• If we use the least-squares line to predict the mercury concentration for a fish of a
particular length, how reliable is our prediction?
Now, what if we could make a scatterplot of mercury concentration versus total length for all of
the largemouth bass (at or close to the legal catch length) in Lake Natoma? Figure 30.6 shows
how a scatterplot of the population might look and how a regression line fit to the population
data might look.
Figure 30.6. Scatterplot of the population of mercury concentrations (µg/g), with the population
regression line µy = α + βx.
Notice, for each fish length, x, there are many different values of mercury concentration, y.
For example, in Figure 30.6 a vertical line segment has been drawn at length x1 . That line
segment intersects with a whole distribution of mercury concentration values, y-values, on
the scatterplot. The mean of that distribution of y-values, µ y , is at the intersection of the
vertical line at x1 and the regression line. Now look at the vertical line at x2 . It too intersects
with an entire distribution of y-values, with mean at the intersection of the vertical line at
x2 and the regression line. So, the population regression line describes how the mean
mercury concentration values, µ y , are related to total length, x. In this case, the relationship
looks linear and so we express it as: µ y = α + β x . As mentioned earlier in this unit, several
conditions must be met in order to move forward with regression inference. Those conditions,
along with a description of the simple linear regression model, are presented below.
The simple linear regression model assumes that for each value of x the observed values
of the response variable, y, vary about a mean µ y that has a linear relationship with x:
µy = α + β x
Figure 30.7. At each of x1, x2, and x3, the responses vary normally about means α + βx1,
α + βx2, and α + βx3 on the regression line, with common standard deviation σ.
A first step in inference is to estimate the unknown parameters. We begin with estimates for
the slope and intercept of the population regression line. The estimated regression line
for the linear regression model is the least-squares line, ŷ = a + bx. From Figure 30.5, the
estimated regression line is:
ŷ = −0.7374 + 0.003227x
The y-intercept, a = −0.7374, is a point estimate for the population intercept, α, and the slope,
b = 0.003227, is a point estimate of the population slope, β.
Next, we develop an estimate for σ , which measures the variability of the response y about
the population regression line. Because the least-squares line estimates the population
regression line, the residuals estimate how much y varies about the population regression line:
residual = y − ŷ

se = √( Σ(y − ŷ)² / (n − 2) ) = √( SSE / (n − 2) )
The computation of se is tedious by hand. Regression outputs from statistical software will
compute the value for you. However, here’s how it is computed in our example of mercury
concentration and fish length. First, we'll compute the residual corresponding to data value
(341, 0.515) as a reminder of how that is done:
ŷ = −0.7374 + (0.003227)(341) ≈ 0.3630, so residual = y − ŷ = 0.515 − 0.3630 = 0.152
Next, we calculate the SSE, the sum of the squares of the residuals, which is 0.1545. Then:
se = √( SSE / (n − 2) ) = √( 0.1545 / (20 − 2) ) = √( 0.1545 / 18 ) ≈ 0.0926 μg/g
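As a check on these hand calculations, a short script (assuming NumPy) reproduces a, b, and se from the Table 30.1 data:

```python
import numpy as np

# Fish total length (mm) and mercury concentration (ug/g) from Table 30.1
length = np.array([341, 353, 387, 375, 389, 395, 407, 415, 425, 446,
                   490, 315, 360, 385, 390, 410, 425, 480, 448, 460])
mercury = np.array([0.515, 0.268, 0.450, 0.516, 0.342, 0.495, 0.604,
                    0.695, 0.577, 0.692, 0.807, 0.320, 0.332, 0.584,
                    0.580, 0.722, 0.550, 0.923, 0.653, 0.755])

# Least-squares slope and intercept (np.polyfit returns highest degree first)
b, a = np.polyfit(length, mercury, 1)

# Standard error of the estimate: sqrt(SSE / (n - 2))
residuals = mercury - (a + b * length)
sse = (residuals ** 2).sum()
s_e = np.sqrt(sse / (len(length) - 2))
print(a, b, s_e)
```

The output matches the values above: a ≈ −0.7374, b ≈ 0.003227, and se ≈ 0.0926 µg/g.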
We can use the equation of the least-squares line, ŷ = −0.7374 + 0.003227x, to make
predictions. However, those predictions are more reliable when the data points lie “close” to
the line. Keep in mind that se is one measure of the closeness of the data to the least-squares
line. If se = 0 , the data points fall exactly on the least-squares line. Moreover, when se is
positive, we can use it to place error bounds above and below the least-squares line. These
error bounds are lines parallel to the least-squares line that lie one or two se above and below
the least-squares line. We apply this idea to our mercury concentration and fish length data.
Figure 30.8. Adding lines ± se and ±2 se above and below the least-squares line.
Recall from Unit 8, Normal Calculations, that we expect roughly 68% of normal data to
lie within one standard deviation of the mean and roughly 95% to lie within two standard
deviations of the mean. Notice that all of our data fall within two se of the least-squares line.
So, for a particular fish length, say with total length = 400 mm, we expect roughly 95% of the
fish to have mercury concentrations between 0.3682 μg/g and 0.7386 μg/g.
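The band quoted above for a 400 mm fish follows directly from the fitted line and se; a sketch using the rounded estimates:

```python
# Predicted mercury concentration for a 400 mm fish, with +/- 2*s_e bounds
a, b, s_e = -0.7374, 0.003227, 0.0926
y_hat = a + b * 400                  # about 0.5534
lo, hi = y_hat - 2 * s_e, y_hat + 2 * s_e
print(round(lo, 4), round(hi, 4))    # 0.3682 0.7386
```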
The standard error of the estimate provides one way to select between competing models. For
example, suppose we had a second model relating mercury concentration to the explanatory
variable fish weight. Choose the model with the smaller value for se .
Now we turn to significance testing for the population slope. The null hypothesis is H0 : β = 0,
tested against the two-sided alternative Ha : β ≠ 0 or a one-sided alternative such as
Ha : β < 0 or Ha : β > 0.
A regression line with slope 0 is horizontal. That indicates that the mean of the response y
does not change as x changes – which, in turn, means that the linear regression equation is
of no value in predicting y. In the case of mercury concentration and total length, the estimate
of the population slope is very small, b = 0.003227. So, we might jump to the conclusion that
total length is not useful in predicting mercury concentration. But we’d better work through the
details of a significance test before jumping to such a conclusion.
t = (b − β0) / sb

where sb = se / √( Σ(x − x̄)² ), b is the least-squares estimate of the population slope, β,
and β0 is the null hypothesis value for β.
If the null hypothesis is true and the linear regression conditions are satisfied, then t has
a t-distribution with df = n – 2.
First, we compute the standard error of the slope:
sb = 0.093 / √39463.2 ≈ 0.000468
Then the value of the t-test statistic is:
t = (0.003227 − 0) / 0.000468 ≈ 6.9
Under the t-distribution with 20 − 2 = 18 degrees of freedom, the area to the right of t = 6.9
gives a p-value of about 9.4127 × 10⁻⁷.
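The same computation in code, assuming SciPy and using the rounded values se ≈ 0.093 and Σ(x − x̄)² = 39463.2 from above:

```python
import numpy as np
from scipy.stats import t as t_dist

b, s_e, sxx, n = 0.003227, 0.093, 39463.2, 20

# Standard error of the slope
s_b = s_e / np.sqrt(sxx)             # about 0.000468

# t statistic for H0: beta = 0, with n - 2 = 18 degrees of freedom
t_stat = (b - 0) / s_b               # about 6.9

# One-sided p-value: area to the right of t_stat
p_value = t_dist.sf(t_stat, n - 2)
print(t_stat, p_value)
```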
Next, we calculate a confidence interval estimate for the regression slope, β. Here are the
details for constructing a confidence interval.
b ± t * sb
where t* is a t-critical value associated with the confidence level and determined from
a t-distribution with df = n – 2; b is the least-squares estimate of the population slope,
and sb is the standard error of b.
b ± t * sb
0.003227 ± (2.101)(0.000468) ≈ 0.003227 ± 0.000983
or, rounded to four decimals, from 0.0022 to 0.0042.
Thus, for each increase of 1 millimeter in total length, we expect the mercury concentration to
increase between 0.0022 μg/g and 0.0042 μg/g. That may seem like a small increase, but, for
example, Florida has set the safe limit on mercury concentration to be below 0.5 μg/g.
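A sketch of the interval computation, assuming SciPy for the t-critical value:

```python
from scipy.stats import t as t_dist

b, s_b, n = 0.003227, 0.000468, 20
t_star = t_dist.ppf(0.975, n - 2)    # about 2.101 for df = 18

# 95% confidence interval for the population slope
margin = t_star * s_b                # about 0.000983
lo, hi = b - margin, b + margin
print(round(lo, 4), round(hi, 4))    # 0.0022 0.0042
```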
The results from inference are trustworthy provided the conditions for the simple linear
regression model are satisfied. We conclude this overview with a discussion of checking the
conditions – what should be done first before proceeding to inference. The conditions involve
the population regression line and deviations of responses, y-values, from this line. We don’t
know the population regression line, but we have the least-squares line as an estimate. We
also don’t know the deviations from the population regression line, but we have the residuals
as estimates. So, checking the assumptions can be done through examining the residuals.
Here is a rundown of the conditions that must be checked:
1. Linearity
Check the adequacy of the linear model (covered in Unit 11). Make a residual plot, a
scatterplot of the residuals versus the explanatory variable. If the pattern of the dots
appears random, with about half the dots above the horizontal axis and half below, then the
condition of linearity is satisfied.
2. Normality
The responses, y-values, vary normally about the regression line for each x. This does
not mean that the y-values are normally distributed because different y-values come from
different x-values. However, the deviations of the y-values about their mean (the regression
line) are normal and those deviations are estimated by the residuals. So, check that
the residuals are approximately normally distributed (covered in Unit 9). Make a normal
quantile plot. If the pattern of the dots appears fairly linear, then the condition of normality
is satisfied. If the plot indicates that the residuals are severely skewed or contain extreme
outliers, then this condition is not satisfied.
3. Independence
The responses, y-values, must be independent of each other. The best evidence of
independence is that the data are a random sample.
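The residuals needed for these checks are easy to compute. A minimal sketch with simulated data (assuming NumPy; the numbers are illustrative, not from the fish study):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(1, 5, 40)
y = 2.0 + 0.5 * x + rng.normal(0, 0.3, size=x.size)

# Fit the least-squares line and compute the residuals
b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

# Least-squares residuals always sum to essentially zero; for the
# linearity check, look for a random scatter about that zero line,
# with roughly half the residuals above it and half below
print(residuals.sum(), (residuals > 0).sum(), (residuals < 0).sum())
```

These residuals are what you would plot against x (for linearity) and display in a normal quantile plot (for normality).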
Two sample residual plots, (a) and (b), each graphing residuals versus x.
Now, we return to the fish study: Are the inference results – the significance test and
confidence interval that we calculated – trustworthy? Let's check to see if Conditions 1 – 3 are
reasonably satisfied. A residual plot appears in Figure 30.11.
Figure 30.11. Residual plot (response is mercury concentration): residuals versus
total length (mm).
Normal quantile plot of the residuals: percent versus residuals.
Finally, the data were a random sample of fish. So, the mercury concentration levels are
independent of each other. Condition 3 is satisfied. So, now we can say that our inference
results are trustworthy.
The line described by µ y = α + β x is called the population regression line. The estimated
regression line for the linear regression model is the least-squares line, ŷ = a + bx .
The observed response y for any value of x varies according to a normal distribution.
The standard error of the estimate, se , is a measure of how much the observations vary
about the least-squares line. It is a point estimate for σ and is computed from the following
formula:
se = √( Σ(y − ŷ)² / (n − 2) ) = √( SSE / (n − 2) )
The standard error of the slope, sb , is the estimated standard deviation of b, the least-
squares estimate for the population slope β. It is calculated from the following formula:
sb = se / √( Σ(x − x̄)² )
The t-test statistic for testing H0 : β = β 0 , where β is the population slope, is calculated as
follows:
t = (b − β0) / sb
To calculate a confidence interval for the population slope, β, use the following formula:
b ± t * sb
where t* is a t-critical value associated with the confidence level and determined from a
t-distribution with df = n – 2; b is the least-squares estimate of the population slope, and sb is
the standard error of b.
1. The population of peregrine falcons was in decline in the 1950s. What was the reason for
the population’s decline?
3. Describe the form of the relationship between eggshell thickness and log-concentration of
DDE – is the form linear or nonlinear? Positive or negative?
5. Why are a and b, the y-intercept and slope of the least-squares line, called statistics?
6. State the null and alternative hypotheses used for testing whether the sample data provided
strong evidence that higher levels of DDE were related to eggshell thinning in the population.
A high school’s mascot is stolen and the poster shown in Figure 30.13 has been posted around
the school and the town. The thief has left clues: a plain black sweater and a set of footprints
under a window. The footprints appear to have been made by a man’s sneaker. Here are more
details from the investigation:
• The distance between the footprints reveals that the thief’s steps are about 58 cm long.
This distance was measured from the back of the heel on the first footprint to the back
of the heel on the second.
• The thief’s forearm is between 26 and 27 cm. The forearm length was estimated from the
sweater by measuring from the center of a worn spot on the elbow to the turn at the cuff.
School officials suspect that the thief is a student from a rival high school. Table 30.2 contains
data from a random sample of 9th and 10th-grade students that you can use for this activity.
Feel free to add and/or substitute data that your class collects.
In this activity, you will fit two linear regression models to the data. For the first model you
will fit a line to forearm length and height; for the second model, you will fit a line to step
length and height. To eliminate confusion, express your models using the variable names
rather than x and y.
b. Check to see if the four conditions for the simple linear regression model are reasonably
satisfied. (Look to see if there are strong departures from the conditions.)
2. Next, let’s focus on inference related to the relationship between height and forearm length.
a. We expect people with longer forearms to be taller than people with shorter forearms.
Conduct a significance test H0 : β = 0 against Ha : β > 0 . Report the value of the test statistic,
the degrees of freedom, the p-value, and your conclusion.
b. Construct a 95% confidence interval for β. Interpret your confidence interval in the context
of this situation.
3. a. Make a scatterplot of height versus step length. Calculate the equation of the least-
squares line and add its graph to your scatterplot.
b. Check to see if the four conditions for the simple linear regression model are reasonably
satisfied. (Look to see if there are strong departures from the conditions.)
4. Next, we focus on inference related to the relationship between height and step length.
a. We expect people with longer step lengths to be taller than people with shorter step lengths.
Conduct a significance test H0 : β = 0 against Ha : β > 0 . Report the value of the test statistic,
the degrees of freedom, the p-value, and your conclusion.
b. Construct a 95% confidence interval for β. Interpret your confidence interval in the context
of this situation.
5. a. You have two competing models for predicting height, one based on forearm length and
the other based on step length. Which of your two models is likely to produce more precise
estimates? Explain.
We predict that the thief is ______ cm tall. But the thief might be as short as ______ or as tall
as ______.
Table 30.2. Data from 9th- and 10th-grade students.
Table 30.3. Data on femur and ulna length and height.
1. a. Make a scatterplot of height versus femur length. Would you describe the pattern of the
dots as linear or nonlinear? Positive association or negative?
b. Calculate the equation of the least-squares line. Add a graph of the line to your scatterplot in (a).
c. Check to see if the conditions for regression inference are reasonably satisfied. Identify any
strong departures from the conditions.
b. Write the equations of the error bands that lie one and two standard errors, se, above and
below the least-squares line. Add graphs of these lines to your scatterplot from question 1(b).
c. If the distributions of the responses, y-values, for any fixed x are normally distributed with
mean on the regression line, then the outermost bands in (b) should trap roughly 95% of the
data between the bands. Is that the case?
3. a. Make a scatterplot of height versus ulna length. Determine the equation of the least-
squares line and add a graph of the least-squares line to your scatterplot.
c. Suppose a partial skeleton is found on a rugged hillside. The skeleton is brought to a lab for
identification. The ulna bone measures 287 mm and the femur measures 520 mm. Use your
equation from 3(a) to predict the person’s height. Then use your equation from 1(b) to predict
the person’s height. Which of your estimates, the one based on ulna length or the one based
on femur length, is likely to be more reliable? Justify your answer based on the standard error
of the estimate, se , for each equation.
4. Consider the linear regression model for height based on femur length.
a. Test the hypothesis H0: β = 0 against the one-sided alternative Ha: β > 0. Report the value
of the t-test statistic, the degrees of freedom, the p-value, and your conclusion.
Assume that the data came from a random sample of eggs collected from Alaska and
Northern Canada. Figure 30.14 shows a residual plot and Figure 30.15 displays a normal
quantile plot of the residuals.
Figure 30.14. Residual plot: residuals versus log-concentration DDE.
Figure 30.15. Normal quantile plot of the residuals (percent versus residual).
High School GPA First Year College GPA High School GPA First Year College GPA
3.00 3.15 2.90 1.46
3.00 2.07 3.50 3.10
2.30 2.60 3.10 2.76
3.68 4.00 3.35 2.01
2.20 2.03 3.70 3.34
3.00 3.53 2.70 2.90
3.03 3.17 2.86 2.93
3.00 2.68 2.51 1.95
3.16 3.88 2.93 3.01
2.70 2.30 3.41 3.48
4.00 3.64 3.30 2.87
3.77 3.62 3.76 2.85
2.70 2.34 2.66 1.67
3.10 3.64 2.91 3.38
3.23 3.67 3.47 3.68
2.80 3.37 3.40 3.76
a. Make a scatterplot of first-year college GPA versus high school GPA. Does the form of
these data appear to be linear? Would you describe the relationship as positive or negative?
b. Determine the equation of the least-squares line and add the line to your scatterplot in (a).
c. Determine the t-test statistic for testing H0: β = 0. How many degrees of freedom does t have?
d. Find the p-value for the one-sided alternative Ha: β > 0. What do you conclude?
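If you want to check your answer to 2(c) with code, the sketch below computes the slope, the t statistic, and the degrees of freedom from the GPA table above, in plain Python with no statistics package.

```python
import math

# (high school GPA, first-year college GPA) pairs from the table above
data = [
    (3.00, 3.15), (2.90, 1.46), (3.00, 2.07), (3.50, 3.10),
    (2.30, 2.60), (3.10, 2.76), (3.68, 4.00), (3.35, 2.01),
    (2.20, 2.03), (3.70, 3.34), (3.00, 3.53), (2.70, 2.90),
    (3.03, 3.17), (2.86, 2.93), (3.00, 2.68), (2.51, 1.95),
    (3.16, 3.88), (2.93, 3.01), (2.70, 2.30), (3.41, 3.48),
    (4.00, 3.64), (3.30, 2.87), (3.77, 3.62), (3.76, 2.85),
    (2.70, 2.34), (2.66, 1.67), (3.10, 3.64), (2.91, 3.38),
    (3.23, 3.67), (3.47, 3.68), (2.80, 3.37), (3.40, 3.76),
]
x = [d[0] for d in data]
y = [d[1] for d in data]
n = len(data)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx  # slope
a = ybar - b * xbar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # SE of the slope
t = b / se_b        # t statistic for testing H0: beta = 0
df = n - 2          # degrees of freedom: 32 - 2 = 30
```

With 32 students, t has 30 degrees of freedom; use software or a t-table to convert t into the one-sided p-value for part (d).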
3. Linda heats her house with natural gas. She wonders how her gas usage is related to how
cold the weather is. Table 30.5 shows the average temperature (in degrees Fahrenheit) each
month from September through May and the average amount of natural gas Linda’s house
used (in hundreds of cubic feet) each day that month.
4. Do taller 4-year-olds tend to become taller 6-year-olds? Can a linear regression model
be used to predict a 4-year-old’s height when he or she turns six? Table 30.6 gives data on
heights of children when they were four and then when they were six.
Table 30.6. Data on children's heights at ages 4 and 6.
Summary of Video
A vase filled with coins takes center stage as the video begins. Students will be taking part
in an experiment organized by psychology professor John Kelly in which they will guess the
amount of money in the vase. As a subterfuge for the real purpose of the experiment, students
are told that they are taking part in a study to test the theory of the “Wisdom of the Crowd,”
which is that the average of all of the guesses will probably be more accurate than most of the
individual guesses. However, the real purpose of the study is to see whether holding heavier
or lighter clipboards while estimating the amount of money in the jar will have an impact on
students’ guesses. The idea being tested is that physical experience can influence our thinking
in ways we are unaware of – this phenomenon is called embodied cognition.
The sheet on which students will record their monetary guesses is clipped onto a clipboard.
For the actual experiment, clipboards, each holding varying amounts of paper, weigh either
one pound, two pounds or three pounds. Students are randomly assigned to clipboards and
are unaware of any difference in the clipboards. After the data are collected, guesses are
entered into a computer program and grouped according to the weights of the clipboards. The
mean guess for each group is computed and the output is shown in Table 31.1.
Table 31.1. Average guesses by clipboard weight.

Clipboard Weight   Mean Guess   N     StDev
1                  $106.56      75    $100.62
2                  $129.79      75    $204.95
3                  $143.29      75    $213.13
Total              $126.55      225   $180.16
In this case, F = 0.796 with a p-value of 0.45. That means there is a 45% chance of getting an
F value at least this extreme when there is no difference between the population means. So,
the data from this experiment do not provide sufficient evidence to reject the null hypothesis.
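As a check on these numbers, which is not part of the video: with k = 3 groups of 75 guesses, F has df1 = k − 1 = 2 and df2 = N − k = 222, and when df1 = 2 the F tail probability has a simple closed form. The sketch below (plain Python; in practice you would use software such as scipy.stats.f.sf) reproduces the reported p-value of about 0.45.

```python
# p-value for the clipboard ANOVA: F = 0.796 with k = 3 groups, N = 225.
# df1 = k - 1 = 2 and df2 = N - k = 222. When df1 = 2, the F survival
# function has the closed form P(F > f) = (1 + 2f/df2) ** (-df2/2).

def f_pvalue_df1_2(f, df2):
    """Tail probability of the F distribution when df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

p = f_pvalue_df1_2(0.796, 222)   # about 0.45, matching the video
```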
One of the underlying assumptions of ANOVA is that the data in each group are normally
distributed. However, the boxplots in Figure 31.2 indicate that the data are skewed and include
some rather extreme outliers. John’s students tried some statistical manipulations on the data to
make them more normal and reran the ANOVA. However, the conclusion remained the same.
Figure 31.2. Boxplots of money guesses (in dollars) by clipboard weight.
But what if we used the data displayed in Figure 31.3 instead? The sample means are the same,
around $107, $130, and $143, but this time the data are less spread out about those means.
Figure 31.3. Money guesses by clipboard weight: same sample means, less spread.
In this case, after running ANOVA, the result is F = 33.316 with a p-value that is essentially
zero. Our conclusion is to reject the null hypothesis and conclude that the population means
are significantly different.
In John’s experiment, the harsh reality of a rigorous statistical analysis has shot down the idea
that holding something heavy causes people, unconsciously, to make larger estimates, at least
in this particular study. But if the real experiment didn’t work, what about the cover story – the
theory of the Wisdom of the Crowd? The actual amount in the vase is $237.52. Figure 31.4
shows a histogram of all the guesses. The mean of the estimates is $129.22 – more than $100
off, but still better than about three-quarters of the individual guesses. So, the crowd was wiser
than the people in it.
B. Be able to identify the factor(s) and response variable from a description of an experiment.
D. Know how to compute the F statistic and determine its degrees of freedom given the
following summary statistics: sample sizes, sample means and sample standard deviations.
Be able to use technology to compute the p-value for F.
F. Recognize that statistically significant differences among population means depend on the
size of the differences among the sample means, the amount of variation within the samples,
and the sample sizes.
G. Recognize when underlying assumptions for ANOVA are reasonably met so that it is
appropriate to run an ANOVA.
For example, suppose a statistics class wanted to test whether or not the amount of caffeine
consumed affected memory. The variable caffeine is called a factor and students wanted
to study how three levels of that factor affected the response variable, memory. Fifteen
students were recruited to take part in the study. The participants were divided into three
groups of five and randomly assigned to one of the following drinks:
After drinking the caffeinated beverage, the participants were given a memory test (words
remembered from a list). The results are given in Table 31.2.
Table 31.2. Number of words recalled in memory test.
For an ANOVA, the null hypothesis is that the population means among the groups are the
same. In this case, H0: µA = µB = µC, where µA is the population mean number of words
recalled after people drink Coca Cola and similarly for µB and µC. The alternative or research
hypothesis is that there is some inequality among the three means. Notice that there is a lot of
variation in the number of words remembered by the participants. We break that variation into
two components:
(1) variation in the number of words recalled among the three groups also called
between-groups variation
Unit 31: One-Way ANOVA | Student Guide | Page 5
(2) variation in number of words among participants within each group also called
within-groups variation.
To measure each of these components, we’ll compute two different variances, the mean
square for groups (MSG) and the mean square error (MSE). The basic idea in gathering
evidence to reject the null hypothesis is to show that the between-groups variation is
substantially larger than the within-groups variation, and we do that by forming the ratio, which
we call F:

F = MSG/MSE
In the caffeine example, we have three groups. More generally, suppose there were k different
groups (each assigned to consume varying amounts of caffeine) with sample sizes n1, n2, …
nk. Then the null hypothesis is H0: µ1 = µ2 = . . . = µk and the alternative hypothesis is that
at least two of the population means differ. The formulas for computing the between-groups
variation and within-groups variation are given below:

MSG = [n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + . . . + nk(x̄k − x̄)²] / (k − 1)

MSE = [(n1 − 1)s1² + (n2 − 1)s2² + . . . + (nk − 1)sk²] / (N − k)

where N is the total number of observations, x̄ is the mean of all the observations, x̄1, x̄2, . . . , x̄k
are the sample means for each group, and s1, s2, . . . , sk are the group standard deviations.
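These formulas translate directly into code. The sketch below uses made-up summary statistics for three groups of five observations, not the actual values from Table 31.3.

```python
def anova_from_summary(ns, means, sds):
    """MSG, MSE, and F from group sizes, sample means, and sample
    standard deviations, following the formulas above."""
    k = len(ns)
    n_total = sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total  # mean of all obs
    msg = sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / (k - 1)
    mse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (n_total - k)
    return msg, mse, msg / mse

# Made-up summary statistics for three groups of five (not Table 31.3):
msg, mse, f = anova_from_summary([5, 5, 5], [8.0, 10.0, 12.0], [2.0, 2.0, 2.0])
# Before trusting the result, apply the rule of thumb used later in this
# unit: largest group sd / smallest group sd should be less than 2.
```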
We return to our three-group caffeine experiment to see how this works. To begin, we
calculate the sample means and standard deviations (See Table 31.3.).
Table 31.3. Group means and standard deviations.
All that is left is to find the p-value. If the null hypothesis is true, then the F-statistic has the F
distribution with 2 and 12 degrees of freedom. We use software to see how likely it would be
to get an F value at least as extreme as 5.78. Figure 31.5 shows the result giving a p-value of
around 0.017. Since p < 0.05, we conclude that the amount of caffeine consumed affected the
mean memory score.
Figure 31.5. F distribution with df1 = 2 and df2 = 12; the area under the density curve to the
right of F = 5.78 is 0.01746.
It takes a lot of work to compute F and find the p-value. Here’s where technology can help.
Statistical software such as Minitab, spreadsheet software such as Excel, and even graphing
calculators can calculate ANOVA tables. Table 31.4 shows output from Minitab. Now, match
the calculations above with the values in Table 31.4. Check out where you can find the values
for MSG, MSE, F, the degrees of freedom for F, and the p-value directly from the output of
ANOVA. That will be a time saver!
Table 31.4. ANOVA output from Minitab.
It is important to understand that ANOVA does not tell you which population means differ,
only that at least two of the means differ. We would have to use other tests to help us decide
which of the three population means are significantly different from each other. However,
we can also get a clue by plotting the data. Figure 31.6 shows comparative dotplots for the
number of words for each group. The sample means are marked with triangles. Notice that
the biggest difference in sample means is between groups A (34 mg caffeine) and C (160 mg
of caffeine). The sample means for groups B and C are quite close together. So, it looks as
if consuming Coca Cola doesn’t give the memory boost you could expect from consuming
coffee or Jolt Energy.
Figure 31.6. Comparative dotplots of number of words recalled, by group.
There is one last detail before jumping into running an ANOVA – there are some underlying
assumptions that need to be checked in order for the results of the analysis to be valid. What
we should have done first with our caffeine experiment, we will do last. Here are the three
things to check.

1. Each group’s data need to be an independent random sample from that population. In the
case of an experiment, the subjects need to be randomly assigned to the levels of the factor.
Check: The subjects in the caffeine-memory experiment were divided into groups. Groups
were then randomly assigned to the level of caffeine.
2. The data in each group come from a normally distributed population.
Check: The normal quantile plots of Words Recalled for each group are shown in Figure
31.7. Based on these plots, it seems reasonable to assume these data are from a Normal
distribution.
Figure 31.7. Normal quantile plots of words recalled for each group.
3. All populations have the same standard deviation. The results from ANOVA will be
approximately correct as long as the ratio of the largest standard deviation to the smallest
standard deviation is less than 2.
Check: The ratio of the largest to the smallest standard deviation is 2.236/1.789 or around
1.25, which is less than 2.
An analysis of variance or ANOVA is a method of inference used to test whether or not three
or more population means are equal. In a one-way ANOVA there is one factor that is thought
to be related to the response variable.
An analysis of variance tests the equality of means by comparing two types of variation,
between-groups variation and within-groups variation. Between-groups variation deals
with the spread of the group sample means about the grand mean, the mean of all the
observations. It is measured by the mean square for groups, MSG. Within-groups variation
deals with the spread of individual data values within a group about the group mean. It is
measured by the mean square error, MSE.
2. What was different about the clipboards that students were holding?
4. What is the name of the test statistic that results from ANOVA?
5. Was the professor able to conclude from the F-statistic that the population means differed
depending on the weight of the clipboard? Explain.
You will use the Wafer Thickness tool to collect data for this activity. There are three control
settings that affect wafer thickness during the manufacture of polished wafers used in the
production of microchips.
1. Leave Controls 2 and 3 set at level 2. Your first task will be to perform an experiment to
collect data and determine whether settings for Control 1 affect the mean thickness of polished
wafers.
a. Open the Wafer Thickness tool. Set Control 1 to level 1, and Controls 2 and 3 to level 2 (the
middle setting). In Real Time mode, collect data from 10 polished wafers. Store the data in a
statistical package or Excel spreadsheet or in a calculator list. Make a sketch of the histogram
produced by the interactive tool.
b. Set Control 1 to level 2. Leave Controls 2 and 3 set at level 2. Repeat (a). Sketch the second
histogram using the same scales as were used on the first. Store the data in your spreadsheet
or a calculator list.
c. Set Control 1 to level 3. Leave Controls 2 and 3 set at level 2. Repeat (a). Sketch your third
histogram, again using the same scales as were used on the first histogram. Store the data in
your spreadsheet or a calculator list.
d. Calculate the means and standard deviations for each of your three samples. Based on
the sample means and on your histograms, do you think that there is sufficient evidence that
changing the level of Control 1 changes the mean thickness of the polished wafers produced?
Or might these sample-mean differences be due simply to chance variation? Explain your
thoughts.
e. Use technology to run an ANOVA. State the null hypothesis being tested, the value of F, the
p-value, and your conclusion.
2. Your next task will be to perform an experiment to collect data and determine whether
settings for Control 2 affect the mean thickness of polished wafers.
a. Leave Controls 1 and 3 set at level 2. Adapt the process used in question 1(a – c) to collect
the data on Control 2.
b. Compute the standard deviations for the three samples. Is the underlying assumption of
equal standard deviations reasonably satisfied? Explain.
c. Provided you answered yes to (b), use technology to run an ANOVA. State the null
hypothesis being tested, the value of F, the p-value, and your conclusion. (If you answered no
to (b), skip this part.)
3. Your final task will be to perform an experiment to collect data and determine whether
settings for Control 3 affect the mean thickness of polished wafers.
a. Leave Controls 1 and 2 set at level 2. Adapt the process used in question 1(a – c) to collect
the data on Control 3.
b. Compute the standard deviations for the three samples. Is the underlying assumption of
equal standard deviations reasonably satisfied? Explain.
c. Provided you answered yes to (b), use technology to run an ANOVA. State the null
hypothesis being tested, the value of F, the p-value, and your conclusion. (If you answered no
to (b), skip this part.)
Table 31.5. Test results.
a. Calculate the mean test score for each group. Calculate the standard deviation of the test
scores for each group.
b. Make comparative dotplots for the test results of the three groups. Do you think that the
dotplots give sufficient evidence that there is a difference in population mean test results
depending on the type of noise? Explain.
c. Run an ANOVA. State the hypotheses you are testing. Show the calculations for the
F-statistic. What are the degrees of freedom associated with this F-statistic?
2. Not all hotdogs have the same calories. Table 31.6 contains calorie data on a random
sample of Beef, Poultry, and Veggie dogs. (One extreme outlier for Veggie dogs was omitted
from the data.) Does the mean calorie count differ depending on the type of hotdog? You first
encountered this topic in Unit 5, Boxplots.
Table 31.6. Calorie content of hotdogs.
Table 31.7. First-year college GPA by high school rating.
a. Verify that the standard deviations allow the use of ANOVA to compare population means.
b. Use technology to run an ANOVA. State the value of the F-statistic, the degrees of freedom
for F, the p-value of the test, and your conclusion.
c. Make boxplots that compare the calorie data for each type of hot dog. Add a dot to each
boxplot to mark the sample means. Do your plots help confirm your conclusion in (b)?
3. Many states rate their high schools using factors such as students’ performance, teachers’
educational backgrounds, and socioeconomic conditions. High school ratings for one state
have been boiled down into three categories: high, medium, and low. The question for one of
the state universities is whether or not college grade performance differs depending on high
school rating. Table 31.7 contains random samples of students from each high school rating
level and their first-year cumulative college grade point averages (GPA).
a. Calculate the sample means for the GPAs in each group. Based on the sample means
alone, does high school rating appear to have an impact on mean college GPA? Explain.
b. Check to see that underlying assumptions for ANOVA are reasonably satisfied.
a. The sample mean ACL scores for nursing, other health professional students, and
education majors were 46.44, 45.58, and 48.59, respectively. Do these sample means provide
sufficient evidence to conclude that there was some difference in population mean ACL scores
among these three majors? Explain.
b. A one-way analysis of variance was run to determine if there was a difference among the
three groups on mean ACL scores. Assuming that all students answered the NSSE questions
related to ACL, what were the degrees of freedom of the F-test?
c. The results from the ANOVA gave F = 8.382. Determine the p-value. What can you
conclude?
Data Set #1 (Table 31.8)                      Data Set #2 (Table 31.9)
Ratings for A Ratings for B Ratings for C Ratings for A Ratings for B Ratings for C
8 4 6 8 4 6
10 5 5 10 5 5
7 7 7 7 6 8
8 8 5 8 9 2
6 7 6 3 6 7
7 8 5 7 8 3
4 6 6 4 6 7
7 5 5 9 5 5
6 5 4 6 4 4
8 6 2 8 7 2
6 6 3 5 6 3
5 7 4 4 9 4
6 3 5 6 3 3
7 8 5 8 9 7
8 5 6 10 3 8
a. Find the sample means of each candy type based on the ratings in Table 31.8. Then do the
same for the ratings in Table 31.9. Based on these results, can you tell if there is a significant
difference in population mean ratings among the different types of candies? Explain.
b. Make comparative boxplots for the data in Table 31.8. Then do the same for the data in
Table 31.9. For both sets of plots, mark the mean with a dot on each boxplot. For which data
set is it more likely that the results from a one-way ANOVA will be significant? Explain.
c. Run an ANOVA based on Data Set #1. Report the value of the F-statistic, the p-value, and
your conclusion. Then do the same for Data Set #2. Explain why you should not be surprised
by the results.
2. The data in Table 31.10 were part of a study to investigate online questionnaire design.
The researcher was interested in the effect that type of answer entry and type of question-to-
question navigation would have on the time it would take to complete online surveys. Twenty-
Display Type Navigation Type Time (sec) Display Type Navigation Type Time (sec)
1 1 97 1 3 117
3 3 83 2 2 74
1 1 102 1 3 66
3 3 85 3 1 62
1 1 92 1 2 93
3 3 71 3 1 62
1 2 105 1 2 64
3 3 92 3 1 48
1 2 67 1 2 57
3 3 71 3 1 96
1 2 54 2 3 68
3 3 66 3 1 90
1 3 63 2 3 71
2 1 61 3 1 74
1 3 101 2 3 74
2 1 117 3 2 78
1 3 124 2 3 92
2 1 97 3 2 71
2 1 126 2 3 80
3 2 83 3 2 49
2 1 107 2 3 67
3 2 88 1 1 101
2 1 88 2 2 111
3 2 62 1 1 103
2 2 55 2 2 80
1 3 73 1 1 103
2 2 126 2 2 111
b. Make comparative boxplots of the times for each level of Display Type. Mark the location
of the means on your boxplot. Do you see anything unusual in the data that might make it not
appropriate to use ANOVA? If so, follow up with normal quantile plots to check the assumption
of normality.
c. Run an ANOVA using Display Type as the factor. State the null hypothesis you are testing.
Report the value of the F-statistic, the p-value, and your conclusion.
d. Make comparative boxplots of the times for each level of Navigation Type. Mark the location
of the means on your boxplot. Do you see anything unusual in these data that might make
it not appropriate to use ANOVA? If so, follow up with normal quantile plots to check the
assumption of normality.
e. Run an ANOVA using Navigation Type as the factor. Report the value of the F-statistic, the
degrees of freedom of the F-statistic, the p-value, and your conclusion.
f. Based on this study, what recommendations would you make to online questionnaire
designers?
3. A group researching wage discrepancies among the four regions of the U.S. focused on full-
time, hourly-wage workers between the ages of 20 and 40. Researchers randomly selected
200 workers meeting the age criteria from the northeast, midwest, south and west and
recorded their hourly pay rates. The mean hourly rate for the combined regions was $15.467. A
summary of the data is given in Table 31.11. The researchers ran an ANOVA on these data.
Table 31.11. Summary of hourly rate data.
c. Calculate the value of the F-statistic and give its degrees of freedom. Show calculations.
e. Based on the evidence in Table 31.11 and your answers to (a – d), what conclusions can the
researchers make?
4. A study focusing on women’s wages was investigating whether there was a significant
difference in salaries in four occupations commonly (but not exclusively) held by women –
cashier, customer service representative, receptionist, and secretary/administrative assistant.
Weekly wages from 50 women working in each occupation are recorded in Table 31.12.
Table 31.12. Weekly wages of women in four occupations.
Data from 2012 March Supplement, Current Population Survey.
c. Run an ANOVA. Record the ANOVA table and highlight the value of F, and the p-value.