Hypothesis Testing
To introduce a range of hypothesis tests suitable for testing for differences between
averages.
Specifically:
1 Sample t-test
2 Sample t-test
Paired t-test
ANOVA
Copyright 2012 BSI. All rights reserved.
[Figure: analysis tools arranged by increasing complexity of relationship. Tools shown: plots (scatter, matrix, boxplot, dotplot), hypothesis tests, ANOVA, regression and design of experiments. Annotations: "Determine strength of relationships" and "Quantify relationships".]
Before a hypothesis test for averages can be selected, there are two preliminary
questions:
1) Are the samples Normally distributed?
Each of the samples that are to be included in the test must contain Normally
distributed data in order for the results of the test to be statistically valid.
[Flowchart: Are the samples normally distributed?
YES: select the test by the number of samples: 1 (1 Sample t-test), 2 (2 Sample t-test or Paired t-test), 3+ (One-way ANOVA).
NO: transform the data using Box-Cox. If the answer is still NO, compare median values.]
[Flowchart: Are the samples normally distributed?
NO: transform the data using Box-Cox. If the answer is still NO, compare median values.
Does the data have outliers?
YES: use Mood's Median Test.
NO: use the Kruskal-Wallis Test.]
An Expensive Situation
Innocent or Guilty?
If the customer is not over-claiming (it is a legitimate claim), then we would expect the claim to fit the pattern of data represented by the previous 250 claims.
If the customer is over-claiming, then we would expect the claim not to fit the pattern of data from the previous 250 claims.
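Previous 250 claims: mean $\bar{X} = 997$, standard deviation $s = 88$. Claim of interest: $X = 1245$.
$$Z = \frac{X - \bar{X}}{s} = \frac{1245 - 997}{88} = 2.82$$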
From Standard Normal tables, the P-value (the probability of a claim being equal to or greater than 1245) is 0.0024, or 0.24%. In other words, we would expect such a claim about once in every 417 claims.
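The Z calculation and table lookup above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `upper_tail_p` is an illustrative assumption, while the figures 1245, 997 and 88 come from the example.

```python
import math

def upper_tail_p(x, mean, sd):
    """Return (z, p) where p = P(X >= x) for a Normal(mean, sd) population."""
    z = (x - mean) / sd
    # Upper-tail area of the standard Normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

z, p = upper_tail_p(1245, 997, 88)
print(f"Z = {z:.2f}")
print(f"P-value = {p:.4f}")
print(f"About 1 claim in {1 / p:.0f}")
```

The last figure is computed from the unrounded P-value, so it may differ slightly from the 1 in 417 quoted above, which is based on the rounded value 0.0024.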
P-value
Tail area
Area under curve beyond value of interest
Probability of being at value of interest or beyond
[Figure: Normal curves showing the value of interest and the tail area beyond it.]
Normal theory tells us that, based on the previous 250 claims, we should expect a claim of $1245 or greater about once in every 417 claims.
Hence either:
This claim is legitimate: it is that 1 in 417.
This claim is not legitimate: it does not fit the previous data.
In the previous example we used the properties of the Normal distribution to test whether an event could have happened by chance (the data fits the expected pattern) or whether there is a real difference (the data does not fit the expected pattern).
This type of situation occurs frequently during Six Sigma improvement projects, either in the:
Analyze phase, when we are looking for differences to identify potential root causes
Improve and Control phases, when we are aiming to demonstrate that a real change has been made (we have made a difference)
[Figure: two processes compared. One process has output YA (sample yA and sA) and Supplier B's process has output YB (sample yB and sB), each with variation in process output. Is there a difference?]
[Figure: a CHANGE turns the Old Process (output Yold) into the New Process (output Ynew, sample ynew and snew), each with variation in process output. Is there a difference?]
Is there a difference?
Looking at the histograms may give us the idea that there is a difference
[Figure: histograms of Yold and Ynew, frequency against values from about 10 to 20.]
A Difference?
It is possible that we have taken two samples from the same distribution.
[Figure: the old sample and the new sample shown against a single distribution.]
While at face value they look different, statistically they are not. The difference is due to chance (sampling).
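A small simulation can make this point concrete. The sketch below draws two samples from one and the same Normal distribution and compares their means. The population mean of 15, standard deviation of 2 and sample size of 20 are illustrative assumptions, not values from the course data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical process: Normal with mean 15 and standard deviation 2
POP_MEAN, POP_SD, N = 15.0, 2.0, 20

# Both samples come from the SAME distribution
old_sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
new_sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]

mean_old = statistics.mean(old_sample)
mean_new = statistics.mean(new_sample)

print(f"Old sample mean: {mean_old:.2f}")
print(f"New sample mean: {mean_new:.2f}")
print(f"Observed difference: {mean_new - mean_old:.2f} (true difference is 0)")
```

Any difference printed here is produced by sampling alone, which is why a formal test is needed before concluding that a process has changed.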
Hypothesis Testing
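[Figure: distribution of sample means centred on the mean, with 68.26%, 95.45% and 99.73% of sample means lying within $\pm 1\sigma_{\bar{x}}$, $\pm 2\sigma_{\bar{x}}$ and $\pm 3\sigma_{\bar{x}}$ of it, and 100% lying between $-\infty$ and $+\infty$.]
Theoretically the limits are $-\infty$ and $+\infty$.
We cannot be 100% certain that an actual sample mean falls outside the distribution. There is always an element of risk.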
95% Confidence
Experience has shown that setting the decision boundaries at 95% provides a good balance between being able to make a decision and the risk of making the wrong decision.
[Figure: distribution of sample means with the central 95% lying between $-1.96\sigma_{\bar{x}}$ and $+1.96\sigma_{\bar{x}}$ of the mean.]
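The coverage percentages and the 1.96 boundary can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `coverage` is an illustrative assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def coverage(k):
    """Fraction of a Normal distribution lying within +/- k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/-{k} standard errors: {coverage(k):.2%}")

# Boundary that leaves 2.5% in each tail, i.e. 95% in the middle
z_95 = std_normal.inv_cdf(0.975)
print(f"95% boundary: +/-{z_95:.2f} standard errors")
```

The first figure comes out as 68.27% rather than the 68.26% commonly quoted from printed tables; the gap is a rounding artefact.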
Risk
[Figure: distribution of sample means with the central 95% marked around the mean.]
More Later
Hypothesis Tests
The null hypothesis is always stated so as to specify that there is no (null) difference
The alternative hypothesis opposes the null hypothesis and can therefore take several forms.
For example:
$H_1: \mu_1 < \mu_2$
or
$H_1: \mu_1 > \mu_2$
or
$H_1: \mu_1 \neq \mu_2$
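The choice of alternative hypothesis changes which tail area is used as the P-value. The sketch below illustrates this for a test statistic that follows the standard Normal distribution. The function name `p_values` and the statistic of 1.8 are hypothetical, and the statistic is assumed to be calculated as the estimate of $\mu_1 - \mu_2$ divided by its standard error.

```python
from statistics import NormalDist

def p_values(z):
    """Return the P-value for each form of alternative hypothesis, given a z statistic."""
    phi = NormalDist().cdf(z)  # area to the left of z
    return {
        "less": phi,                          # H1: mu1 < mu2 (lower tail)
        "greater": 1 - phi,                   # H1: mu1 > mu2 (upper tail)
        "not_equal": 2 * min(phi, 1 - phi),   # H1: mu1 != mu2 (both tails)
    }

result = p_values(1.8)  # hypothetical test statistic
for alternative, p in result.items():
    print(f"{alternative}: P = {p:.4f}")
```

With this statistic the one-sided "greater" alternative falls below 0.05 while the two-sided alternative does not, which is why the form of the alternative hypothesis should be fixed before the data are analysed.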
3. Determine Tests
Hypothesis Test
Can be used to
t-test
paired t-test
ANOVA
(Analysis Of Variance)
Chi-square test
3. Determine Test
Hypothesis tests are used when the input or process (X) variable is discrete.
If the X data are continuous, use Regression analysis to judge whether they are
related to the output (Y) variable.
                           X Discrete (Groups)             X Continuous
Y Continuous               t-test, Paired t-test, ANOVA    Regression
Y Discrete (Proportions)   Chi-Square                      Logistic regression
It is very important to specify, before conducting the test, the level of risk that the decision will be based on.
The risk is the chance that we will make the wrong decision.
In assessing the risk level we need to consider the situation and the business: it is a question of significance vs. importance.
Significant means there is a real difference, as demonstrated by a hypothesis test.
Important means that the difference observed matters to the business.
Typically the $\alpha$-risk is set at 5%, which provides 95% confidence.
6. Collect Data
7. Analyse Data
8. Acceptance or Rejection
Method 1
t-tests
1 Sample t-tests are used to compare the average of one sample of data against a known average.
For the purpose of the test, the known average is assumed to be exact.
A project team has implemented a new invoicing process, and wants to compare it
against the historical average of 16.5 days.
Hypothesis:
[Figure: plot of the New Process data on a scale of 10 to 20 days, with the historical average of 16.5 marked.]
A 1 Sample t-test can be used to test if the difference between the averages is
statistically significant.
Because our theory is that there is a difference, we expect to reject the Null hypothesis.
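The calculation behind a 1 Sample t-test can be sketched with the Python standard library. The invoicing times below are hypothetical and are not the course data set. The historical average of 16.5 days is taken from the example, and 2.262 is the standard t-table value for a two-sided test at 5% risk with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical invoicing times in days (NOT the course data set)
new_process = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 13.2, 14.7, 15.8, 14.4]
HISTORICAL_MEAN = 16.5   # known average from the example, assumed exact
T_CRITICAL = 2.262       # t-table value: two-sided, alpha = 0.05, df = 9

n = len(new_process)
sample_mean = statistics.mean(new_process)
sample_sd = statistics.stdev(new_process)   # uses the n - 1 denominator
std_error = sample_sd / math.sqrt(n)

# t statistic: distance of the sample mean from the known average, in standard errors
t_stat = (sample_mean - HISTORICAL_MEAN) / std_error
reject_null = abs(t_stat) > T_CRITICAL

print(f"mean = {sample_mean:.2f}, s = {sample_sd:.2f}, t = {t_stat:.2f}")
print("Reject the Null hypothesis" if reject_null else "Fail to reject the Null hypothesis")
```

Comparing the statistic with a table value avoids needing a t-distribution function. A statistics package such as MINITAB reports the exact p-value instead.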
Session Window Output for 1 Sample t-test
Since the p-value for this test is less than 0.05, we can be 95% confident that there is a difference between the average invoicing time of the new process and the historical average. We reject the Null Hypothesis.
[Figure: two histograms of NewProcess (frequency against values from 10 to 22), each marking the sample mean $\bar{X}$ and the hypothesized value Ho.]
Problem: A cocktail bar suspects that one of its suppliers of whiskey is adding water to increase profits. The freezing temperature of whiskey has a normal distribution with a mean of $\mu = -0.545$ °C. Adding water raises the freezing temperature. The cocktail bar wants to know if its suspicions are correct. Is the supplier adding water to its whiskey?
Procedure: A sample from each of ten bottles is obtained from this supplier.
Experimental Unit: A bottle.
Measurement: Freezing temperature of the sample.
Significance Level: $\alpha = 0.05$
Data Set: Whiskey.MPJ
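One way to set this problem up as a 1 Sample t-test is sketched below. Because adding water raises the freezing temperature, the alternative hypothesis is one-sided. The symbols $\bar{x}$ and $s$ (the mean and standard deviation of the ten measurements) and $\mu_0$ are notation assumed here, and 1.833 is the standard t-table value for a one-sided 5% risk with 9 degrees of freedom.

```latex
\begin{align*}
H_0 &: \mu = -0.545 \\
H_1 &: \mu > -0.545 \\
t &= \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mu_0 = -0.545,\; n = 10,\; \text{df} = n - 1 = 9 \\
\text{Reject } H_0 &\text{ if } t > t_{0.05,\,9} \approx 1.833
\end{align*}
```

Rejecting $H_0$ would support the suspicion that water is being added. Failing to reject it would mean the data are consistent with undiluted whiskey.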
[Flowchart: determine the number of samples (1, 2 or 3+), then check whether the samples are normally distributed in order to select the 1 Sample t-test, 2 Sample t-test, Paired t-test or One-way ANOVA.]
2 Sample t-tests are used to compare the averages of two samples of data. The two
samples may represent:
The two samples can be of different sample sizes, since the 2 sample t-test takes account
of both sample sizes.
The two samples can have different levels of variation (standard deviation) since the two
sample t-test takes account of this as well.
Open data file: Two-Sample-T-Appeal.MPJ
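The 2 Sample t-test described above allows for unequal sample sizes and unequal standard deviations. One common form that does so is Welch's t-test, sketched below. Whether MINITAB uses exactly this form depends on the options chosen, so treat it as an assumption. The cycle times are hypothetical and are not the contents of Two-Sample-T-Appeal.MPJ.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for a 2 sample t-test that does not assume equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)   # sample variances (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    se_a, se_b = var_a / n_a, var_b / n_b   # squared standard error of each mean
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical cycle times in days (NOT the contents of Two-Sample-T-Appeal.MPJ)
new = [29.5, 31.0, 28.4, 30.2, 32.1, 29.9, 30.8, 28.9]
old = [34.0, 36.5, 33.2, 35.8, 37.1, 34.6, 33.9, 36.0, 35.2, 34.4]

t_stat, df = welch_t(new, old)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The statistic and degrees of freedom would then be compared with a t-table value or converted to a p-value by a statistics package. A negative t here indicates that the new sample mean is below the old one.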
A project team has implemented a new appeals process, and the box plot below shows
the cycle time of the new and old processes.
Hypothesis:
[Figure: box plot of Time (days), on a scale of 25.0 to 40.0, for the New and Old processes.]
Because our theory is that the new process has a lower cycle time, we expect to reject the Null hypothesis.
Session Window Output for 2 Sample t-test
[Figure: two box plots of Time (days), on a scale of 25.0 to 42.5, for the New and Old processes.]
The graphs provided by the 2 sample t-test are the same as those provided by using
the graph menu.
By default, MINITAB adds a symbol to show the average of each subgroup.