12 SSGB Amity BSI Hypothesis

Module-12
Hypothesis Testing
Objectives of this module
• To introduce a range of hypothesis tests suitable
for testing for differences between averages.
• Specifically:
• 1 Sample t-test
• 2 Sample t-test
• Paired t-test
• ANOVA
Tools for Verifying the Causes
Plot - Scatter Verify the Causes

Matrix
Boxplot Identify possible
Increasing complexity
Dotplot relationships
of relationship
Determine strength
Hypothesis Tests Correlation
of relationships
Quantify
ANOVA Regression relationships
Identify and quantify

Design of complex relationships
Experiments
Hypothesis Tests for Average
Before a hypothesis test for “averages” can be selected,
there are two preliminary questions:
1) Are the samples Normally distributed?
Each of the samples that are to be included in the test must
contain Normally distributed data in order for the results of the
test to be statistically valid.
2) How many samples do you want to compare?

The “number of samples” refers to the number of different
samples (subgroups) that are to be compared, not the amount of
data that is in each of those samples (that’s sample size).
To compare the Averages of samples of data
1
Determine 1 Sample t - Test
YES
the Number
of Samples
Are the 2
Samples 2 Sample t - Test
normally
distributed 2 Paired t - Test
3+
One-way Anova
Transform the
NO Data
using Box-Cox
Compare
NO Median
Values
To compare the Medians of samples of data
Determine
YES the Number
of Samples
Are the
Samples Use the
normally Transform the
Data YES Mood’s
distributed NO using Box-Cox
Median
Test
NO Compare Does the

Median Data have
Values Outliers Use the
Kruskal–Wallis
NO
Test
An Expensive Situation
• You manage a warranty claims department. A

customer claims loss of earnings of $1,245 for an
item which usually is about $1,000.
• You examine 250 previous claims of the same
item to make a comparison and find the average
to indeed be $997 with a standard deviation $88.
• You want to know if the customer is over-
claiming or if it is reasonable
Innocent or Guilty?
• If the customer is not over-claiming (it is a

legitimate claim) then we would expect the
claim to fit with the pattern of data represented
by the previous 250 claims
• If the customer is over-claiming then we would

expect the claim to not fit the pattern of data
from the previous 250 claims
Does the Claim fit the Pattern?
You remember from previous module that if the data is
normally distributed you can apply normal theory
By using standard Normal
tables we can calculate the
s = 88 probability of a claim of
$1245 based on the historical
1245 data. If the probability is low,
then we can assume an
Illegitimate claim
997 1085 1173 1261 X-X

Z= s
1245 - 997
Z= = 2.82
88
From Standard Normal tables the P-value, the probability of being equal to or
greater than 1245 is 0.0024 or 0.24%. In other words we would expect such a claim
to happen 1 in 417 claims
P-values are Probabilities of Interest
P--value
• Tail area
• Area under curve beyond value of interest
• Probability of being at value of interest or
beyond
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
Value of Value of Value of Value of

Interest Interest Interest Interest
Making the Decision
• The Normal theory is telling us that based on the previous
250 claims, we should expect a claim of $1245 or greater
every 417 claims
Hence:
• This claim is legitimate – it is that 1 in 417
• This is not legitimate - it does not fit the previous data
• Experience shows a small p-value (0 to 0.05) means
• The probability is small that the value of interest comes
from that distribution by chance therefore something else
is going on
• Since our p-value 0.024 < 0.05 we can conclude that the
claim is not legitimate
What Have we done?
• In the previous example we have used the properties of the
Normal distribution to test whether the occurrence of an
event could have happened by chance (the data fits the
expected pattern) or there is a real difference (the data does
not fit the expected pattern)
• This type of situation occurs frequently during Six Sigma

improvement projects, either in the
• Analyze phase when we are looking for differences to
identify potential roots causes
• Improve and Control phases when we are aiming to
demonstrate that a real change has been made – we have
made a difference
Analyze: Is there a Difference?
• A common question during the Analyze phase is “is there
really a difference?”
• For example: We suspect the output of a process depends
upon the supplier
Supplier A Variation in
process output
Process YA
Sample yA and sA
Is there a difference?
Sample yB and sB
Supplier B Variation in
process output
Process YB
Improve: Have We Made a Difference?
A common question having improved a process is

“have we really made a difference?”
Variation in
process output
Old Process
CHANGE
Yold
Sample yold and sold
Sample ynew and snew
Variation in
process output
New Process Ynew

Looking at the histograms may give us the idea that
there is a difference
20
20
Frequency
Frequency
10 10
0 0
10 11 12 13 14 15 16 17 18 19 20 10 11 12 13 14 15 16 17 18 19
Yold Ynew
But can we be sure?

A Difference?
It is possible that we have taken two samples from the same
distribution
Old New
sample sample
While on face value they look different statistically they

are not!! The difference is due to chance (sampling)
Hypothesis Testing
• In the first example we calculated probabilities to help us
decide. This situation is common, where we are interested in
making a decision about differences between two or more
sets of data that relate to real world situations
• Hypothesis Testing allows us to decide whether an observed
difference is real or has happened by chance
• In simple terms we expect there to be a difference between
expected and observed outcomes due to random
(chance/common cause variation) factors. Hypothesis testing
is concerned with whether these differences are so large that
they cannot be explained by random (chance) effects
• Hypothesis tests are often based on sample data in which
case we use sampling distributions rather than the
distribution of individual values
Hypothesis Testing Concept
• In order to show that an apparent difference is real

or due to chance we start by assuming that there is
no difference (Null Hypothesis, H0).
• If the observed difference is within that expected by
chance then the Null Hypothesis of no difference is
correct.
• If the observed difference is greater than that expected by
chance then the Null Hypothesis of no difference is not
correct.
Greater Than Expected!
Distribution of
sample means -
100%
99.73%
95.45%
68.26%
-∞ +∞
-3σx -2σx-1σx 1σx 2σx 3σx Theoretically the
Mean µx
limits are±∞
We cannot be 100% certain that an actual sample mean
falls outside the distribution. There is always an
element of risk
95% Confidence
Experience has shown

that setting the decision
Boundaries at 95%
Distribution of
provides a good balance
sample means
between being able to
make a decision and the
risk of making the 95%
wrong decision
-1.96σx 1.96σx
Mean µx
Risk
A point here could be due A point here could be due to

to chance or there is a difference chance or there is a difference.
If we decide they are different If decide they are the same when
when they are not we have made a they are not we have made a
type I error type II error
The risk of making this error The risk of making this error
Is called the α-risk is called the β-risk
95%
Mean µx
Type I and II errors
• Type I - Deciding they are different when they are not
• Type II - Not detecting a difference when there is one
• P-value = the actual probability of making a Type I error

• The probability of a Type II can be calculated given an
assumed true difference
• Both are important
• Guarding too heavily against one increase the risk of the
other
• Increasing the sample size
• Reduces Type II errors
• Allows you to detect smaller differences
• More Later
Hypothesis Tests
• Hypothesis tests can be used in a number of situations:

• A sample is taken and found to have a mean x, is this value “equal” to
a target value T?
• Samples taken before and after a change, the two sample means are
different - but are they?
• Samples from a process for different stratification factors, the two
sample means are different - but are they?
• Samples taken before and after a change, the standard deviations are
different - but are they?
• Samples from a process for different stratification factors, the two
sample standard deviations are different - but are they?
• The proportion defective appears to vary with different factors – but
does it?
Steps in Hypothesis Testing
Hypothesis tests follow the pattern

1. determine the null hypothesis H0
2. determine the alternative hypothesis H1 or Ha
3. determine statistical test
4. state level of risk
5. establish sample size
6. collect data
7. analyse data
8. accept or reject the null hypothesis
9. determine conclusion
1. The Null Hypothesis
• The null hypothesis is always stated so as to

specify that there is no (null) difference
• When comparing means or variances the null

hypothesis is:
H0: µ = Target
H0: µ1 = µ2
H0: σ1 = σ2
2. The Alternative Hypothesis
• The alternative hypothesis opposes the null
hypothesis and can therefore has several options
For example
H1: µ1 < µ2
or The choice depends
H1: µ1 > µ2 upon the problem
Minitab gives the
or option to select
these
H1: µ1 ≠ µ2
3. Determine Tests
Hypothesis Test Can be used to
t-test Compare a group average against a

target.
Compare two group averages.
paired t-test Compare two group averages when
data is matched.
ANOVA Compare two or more group
(Analysis Of Variance) averages.
Test for equal variances (F-test, Compare two or more group
Bartlett’s test, Levene’s test) variances.
Chi-square test Compare two or more group
proportions.
3. Determine Test
• Hypothesis tests are used when the input or process (X)

variable is discrete.
• If the X data are continuous, use Regression analysis to
judge whether they are related to the output (Y)
variable. X
Discrete
(“Groups”) Continuous
Continuous
t-test
Paired t-test Regression
ANOVA
Y
(Proportions)
Discrete
Logistic
Chi-Square
regression
4. State level of Risk
• It is very important to specify before conducting the test the

level of risk that the decision will be based on
• The risk means that there is the chance that we will make
the wrong decision
• In assessing the risk level we need to consider the situation
and the business – it is a question of significance vs.
importance
• Significant means there is a real difference as proven by a
hypothesis test
• Important means that the difference observed is important
to the business
• Typically the α-risk is set at 5% which provides a 95%
confidence
"Important" vs. "Significant" Differences
• Significant but not important differences

• Sometimes you will detect a statistically significant
difference—yet it will be too small to be of practical
importance to your business.
• Example: A project to reduce machine setup times
• The average time for the old setup is 45 minutes. The new
average setup time is 30 minutes. The observed difference of
15 minutes is statistically significant
• However, to justify the cost of implementation, a reduction of
25 minutes is necessary
"Important" vs. "Significant" Differences,
• Important but not significant differences

• Sometimes you cannot claim a difference is statistically
significant yet the observed difference is of importance to
the business
• Example: Production volumes
• An increase of 1000 items produced per day is observed during the pilot.
• An increase of 1000 is important to the business.
• However, the difference is not statistically significant
• Either the observed difference is due to random variation and no true
difference exists, or the the variation is too large (or sample size too
small) to detect the difference.
• You (and your business leaders) need to decide if it is worth
the risk to go ahead and implement the new process
5. Establish Sample Size
• The ability to detect differences is related to the

sample size
• A small sample means we cannot detect small
differences, but the cost of data collection is low
• A large sample size will enable the detection of small
differences, but the cost of data collection could be
high
• Selecting the “right” sample size depends upon
the Power = (1 - β)
6. Collect Data
• Remember all the rules and guidelines about

collecting data
• Sample size
• Good operational definition
• Clear measurement procedure
• Un-biased samples
• Good Gauge R&R
7. Analyse Data
• All hypothesis tests can be conducted by hand

calculation – BUT
• Minitab Rules!
8. Acceptance or Rejection
• Method 1
if the p-value is > or = to α - fail to reject H0
if the p-value is < α - reject H0 and accept H1
• Method 2
if “0” falls within the confidence interval - fail to reject H0
if “0” falls outside the confidence interval - reject H0 and
accept H1
• In practice we use Methods 2 and 3 from information provided

by Minitab
t -tests
t- tests – a bit of history
Mr Gossett (1876 – 1936) worked

for the Guinness company and
published his work under the pen
name of Student, hence the
“student T-test”.
The T-test measures the
significance of shift in the average.
1
YES
the Number
of Samples
Are the 2
normally
3+
One-way Anova
Transform the
NO Data
using Box-Cox
Compare
NO Median
Values
1 Sample t-test (1)
• 1 Sample t-tests are used to compare the average

of one sample of data against a known average.
• The known average may be a historical average

or an industry benchmark.
• For the purpose of the test, the known average is

assumed to be “exact”.
• Open data file: “One-Sample-t.MPJ”
1 Sample t-test (2)
• A project team has implemented a new invoicing
process, and wants to compare it against the
historical average of 16.5 days.
Individual Value Plot of New Process

22
Hypothesis:
20 The Individual Value
18
Plot suggests that the
New Process
16.5
average invoice time
16
New process for the new process is
14 lower (quicker) than
12 the historical average
10
of 16.5
1 Sample t-test (3)
• A 1 Sample t-test can be used to test if the difference
between the averages is statistically significant.
• The hypotheses for the test would be:

• Ho: There is no difference between the average invoice time
of the new process and 16.5
• Ha: New process is lower than average invoice time of 16.5
• Because our theory is that there is a difference, we
expect to reject the Null hypothesis.
1 Sample t-test (4)
MINITAB: Stat > Basic Statistics > 1-Sample t
There are two options

for entering data.
1st Option: Enter the
column that contains the
data here.
2nd Option: If you know

the sample size, mean
and standard deviation
In both cases, the “known average” of the data it can be
that the data is to be compared entered here.
against should be entered here.
1 Sample t-tests (5)
Leave the confidence level of

95%
Alternative: less than
Select the graph most appropriate
for the samples sizes involved. If in
doubt, check all three.
One-Sample T: New Process Session Window

Output for 1 Sample
Test of mu = 16.5 vs < 16.5
95% t-test
Upper
Variable N Mean StDev SE Mean Bound T P
New Process 15 14.9818 3.1218 0.8060 16.4015 -1.88 0.040
The p-value for the Since the p-value for this test is less
1 sample t-test is than 0.05, we can be 95% confident
found in the session that there is a difference between the
window output. In average invoicing time of the new
this case, the p-value process versus the historical average.
is 0.040 We reject the Null Hypothesis
Histogram of New Process Individual Value Plot of New Process
(with Ho and 95% t-confidence interval for the mean) (with Ho and 95% t-confidence interval for the mean)
3
Frequency
_
1 X
Ho
0 _
X
Ho
10 12 14 16 18 20 22 10 12 14 16 18 20 22
New Process New Process
MINITAB adds a blue line to In this case, Ho (the known

the graphs that represents the average) is outside the
confidence interval of the confidence interval of the
sample average, and also sample average and
indicates the position of Ho therefore we can prove a
(the known average). difference between the two.
1 Sample t-test – Whiskey or water?
• Problem : A cocktail bar suspects that one of its suppliers of
whiskey is adding water to increase profits. The freezing
temperature of whiskey has a normal distribution with a
mean of µ= - 0.5450 C. Adding water raises the freezing
temperature. The cocktail bar wants to know if its suspicions
are correct. Is the supplier adding water to it’s whiskey?
• Procedure: A sample from each of ten bottles is obtained from
this supplier.
• Experimental Unit : A bottle.
• Measurement : Freezing temperature of the sample.
• Significance Level : α = 0.05
• Data Set : Whiskey.MPJ
1
YES
the Number
of Samples
Are the 2
normally
3+
One-way Anova
Transform the
NO Data
using Box-Cox
Compare
NO Median
Values
The concept of Two Sample t-tests
You take two samples of

data from the same
process, but at different
times. Has the process
mean moved?
Would you think the

process mean has moved
if the results had been…..
What did you look at to make your decisions?

• 2 Sample t-tests are used to compare the averages of two
samples of data. The two samples may represent:
• two different suppliers
• two different processes
• two different teams
• two different products etc.
• The two samples can be of different sample sizes, since the 2
sample t-test takes account of both sample sizes.
• The two samples can have different levels of variation
(standard deviation) since the two sample t-test takes
account of this as well.
Open data file: “Two-Sample-T-Appeal.MPJ”
• A project team has implemented a new “appeals”
process, and the box plot below shows the cycle
time of the new and old processes.
Boxplot of Time (days) vs Process Hypothesis:

42.5
40.0
The Box Plot suggests
37.5
that the average cycle
time for the new
Time (days)
35.0
32.5 appeals process is lower

30.0 (quicker) than the old
27.5 process.
25.0
New Old
Process
• A 2 Sample t-test can be used to test if the
difference between the average cycle times is
statistically significant.
• The hypotheses for the test would be:
• Ho: There is no difference between the average cycle
times for the new and old processes.
• Ha: The new cycle time is lower than the cycle times for
the old processes.
• Because our theory is that the new is lower, we
expect to reject the Null hypothesis.
There are several options for

entering data.
1st Option: If the data is in a
single column, with a second
column containing subgroup
data.
2nd Option: If the two data
samples are in two separate
columns.
3rd Option: If you know the
sample size, mean and standard
deviation of both samples, this
information can be entered
here.
Set the options as shown

above:
• a confidence of 95%
• a test difference of 0.0
• Alternative: less than
Two-Sample T-Test and CI: Time (days), Process
Two-sample T for Time (days)

Session Window
Process N Mean StDev SE Mean
New 20 32.12 3.46 0.77 Output for 2 Sample
Old 25 34.35 3.12 0.62 t-test
Difference = mu (New) - mu (Old)
Estimate for difference: -2.23700
95% upper bound for difference: -0.56083
T-Test of difference = 0 (vs <): T-Value = -2.25 P-Value = 0.015 DF = 38
The p-value for the 2 sample Since the p-value for this test
t-test is found in the session is less than 0.05, we can be
window output. In this case, over 95% confident that there
the p-value is 0.015. is a difference between the
average cycle times.
Boxplot of Time (days) by Process Individual Value Plot of Time (days) vs Process
42.5 42.5
40.0 40.0
37.5 37.5
Time (days)
Time (days)
35.0 35.0
32.5 32.5
30.0 30.0
27.5 27.5
25.0 25.0
New Old New Old

Process Process
The graphs provided by the 2 sample t-test are the

same as those provided by using the graph menu.
By default, MINITAB adds a symbol to show the
average of each subgroup.
Exercise: Using the data file: “Two-Sample-T-

Appeal.MPJ”
• 2) Repeat the test using the “samples in different

columns” option.
• 3) Repeat the test using the “summarised data”
option.
1
YES
the Number
of Samples
Are the 2
normally
3+
One-way Anova
Transform the
NO Data
using Box-Cox
Compare
NO Median
Values

12 SSGB Amity BSI Hypothesis

Caricato da

Informazioni sul documento

Descrizione originale:

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

12 SSGB Amity BSI Hypothesis

Caricato da

Copyright:

Formati disponibili

Module-12

Plot - Scatter Verify the Causes

Identify and quantify

2) How many samples do you want to compare?

NO Compare Does the

• You manage a warranty claims department. A

• If the customer is not over-claiming (it is a

• If the customer is over-claiming then we would

997 1085 1173 1261 X-X

Value of Value of Value of Value of

• This type of situation occurs frequently during Six Sigma

A common question having improved a process is

New Process Ynew

But can we be sure?

While on face value they look different statistically they

• In order to show that an apparent difference is real

Experience has shown

A point here could be due A point here could be due to

• P-value = the actual probability of making a Type I error

• Hypothesis tests can be used in a number of situations:

Hypothesis tests follow the pattern

• The null hypothesis is always stated so as to

• When comparing means or variances the null

Hypothesis Test Can be used to

t-test Compare a group average against a

• Hypothesis tests are used when the input or process (X)

• It is very important to specify before conducting the test the

• Significant but not important differences

• Important but not significant differences

• The ability to detect differences is related to the

• Remember all the rules and guidelines about

• All hypothesis tests can be conducted by hand

• In practice we use Methods 2 and 3 from information provided

Mr Gossett (1876 – 1936) worked

• 1 Sample t-tests are used to compare the average

• The known average may be a historical average

• For the purpose of the test, the known average is

Individual Value Plot of New Process

• The hypotheses for the test would be:

There are two options

2nd Option: If you know

Leave the confidence level of

One-Sample T: New Process Session Window

MINITAB adds a blue line to In this case, Ho (the known

You take two samples of

Would you think the

What did you look at to make your decisions?

Boxplot of Time (days) vs Process Hypothesis:

32.5 appeals process is lower

There are several options for

Set the options as shown

Two-sample T for Time (days)

New Old New Old

The graphs provided by the 2 sample t-test are the

Exercise: Using the data file: “Two-Sample-T-

• 2) Repeat the test using the “samples in different

Potrebbero piacerti anche