
Chapter 3

Treatment Comparisons

Li-Shya Chen and Tsung-Chi Cheng

Department of Statistics
National Chengchi University
Taipei 11605, Taiwan

E-mail: chengt@nccu.edu.tw
Outline

• 3.1 Treatment comparisons answer research questions


• 3.2 Planning comparisons among treatments
• 3.3 Response curves for quantitative treatment factors
• 3.4 Multiple comparisons affect error rates
• 3.5 Simultaneous statistical inference
• 3.6 Multiple comparisons with the best treatment
• 3.7 Comparison of all treatments with a control
• 3.8 Pairwise comparisons of all treatments
• 3.9 Summary comments on multiple comparisons
• 3.A Appendix: Linear functions of random variables
Objectives

• The methods for an in-depth analysis of the responses to
the treatment design include
• planned contrasts among treatment groups
• regression response curves for quantitative treatment factors
• selection of the best subset of treatments
• comparison of treatments with a control
• all pairwise comparisons among treatment means
• All of these methods involve a set of simultaneous
decisions to be made by the investigator.
• This simultaneous statistical inference affects the statistical
errors of inference.
3.1 Treatment comparisons answer research
questions

• The null hypothesis for one-way ANOVA is

  H0 : µ1 = µ2 = · · · = µt

• Rejection of H0 allows us to conclude that the treatment
means are not all equal, but not where the differences
occur.
• Researchers may be more interested in ascertaining
whether the mean of the control or placebo group is
different from the means of the other groups, or whether the
mean of a targeted treatment is different from the means
of the other groups.
Example

• For the steak storage experiment:

  Treatment     Commercial   Vacuum   CO2, O2, N   CO2
  µ̂i = ȳi.       7.48         5.50     7.26         3.36

  t = 4, ri = 3, s² = MSE = 0.116 with 8 d.f.
• Some questions of interest to the researchers are:
• Is the creation of an artificial atmosphere more effective
than ambient air with commercial wrap?
• Are the gases more effective than vacuum?
• Is pure CO2 more effective than a mixture of gases?
3.2 Planning Comparisons Among Treatments

• Contrasts among treatment means are helpful to answer
specific questions about treatment comparisons.
• Contrasts are special forms of linear functions of
observations (see Appendix 3A).
• A contrast among treatment means is defined as

  C = Σ_{i=1}^t ki µi = k1 µ1 + k2 µ2 + · · · + kt µt

  where Σ_{i=1}^t ki = 0.
I. Contrasts for three specific questions
• Commercial wrap vs. artificial atmospheres:

  C1 = µ1 − (1/3)(µ2 + µ3 + µ4) = [1, −1/3, −1/3, −1/3] [µ1, µ2, µ3, µ4]′
  or C1* = 3µ1 − (µ2 + µ3 + µ4)

• Vacuum vs. gases:

  C2 = µ2 − (1/2)(µ3 + µ4) = [0, 1, −1/2, −1/2] [µ1, µ2, µ3, µ4]′
  or C2* = 2µ2 − (µ3 + µ4)

• Mixture of gases vs. pure CO2:

  C3 = µ3 − µ4 = [0, 0, 1, −1] [µ1, µ2, µ3, µ4]′
II. Estimating and testing contrasts
• a) Point estimator:

  Ĉl = Σ_{i=1}^t kli µ̂i = Σ_{i=1}^t kli ȳi.

• The estimates for the three contrasts:

  Ĉ1 = Σ_{i=1}^t k1i µ̂i = ȳ1. − (1/3)(ȳ2. + ȳ3. + ȳ4.)
  or Ĉ1* = Σ_{i=1}^t k1i µ̂i = 3ȳ1. − (ȳ2. + ȳ3. + ȳ4.)

  Ĉ2 = Σ_{i=1}^t k2i µ̂i = ȳ2. − (1/2)(ȳ3. + ȳ4.)
  or Ĉ2* = Σ_{i=1}^t k2i µ̂i = 2ȳ2. − (ȳ3. + ȳ4.)

  Ĉ3 = Σ_{i=1}^t k3i µ̂i = ȳ3. − ȳ4.
II. Estimating and testing contrasts (continued)

• b) Standard errors of contrasts, sĈ:

  s²Ĉ = Σ_{i=1}^t ki² s²ȳi. = Σ_{i=1}^t ki² (s²/ri) = s² Σ_{i=1}^t ki²/ri

• Hence, for each contrast estimator:

  s²Ĉl = Σ_{i=1}^t kli² s²ȳi. = Σ_{i=1}^t kli² (s²/ri) = s² Σ_{i=1}^t kli²/ri
II. Estimating and testing contrasts (continued)

• c) Notice that

  (Ĉ − C)/sĈ ∼ t_{N−t}   and   [(Ĉ − C)/sĈ]² ∼ F_{1,N−t}

• d) The 100(1 − α)% CI for the contrast Cl is

  Ĉl ± t_{α/2;N−t} × sĈl
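• As a worked illustration with the steak storage values above (using t_{0.025;8} = 2.306), the 95% CI for C3 = µ3 − µ4 is

  Ĉ3 = 7.26 − 3.36 = 3.90,  sĈ3 = √(0.116 × (1/3 + 1/3)) ≈ 0.278
  3.90 ± 2.306 × 0.278 = 3.90 ± 0.64, i.e. (3.26, 4.54)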


III. Sum of squares partition and hypothesis testing

• The sum of squares attributed to the contrast Cl = Σ_{i=1}^t kli µi is

  SSCl = (Σ_{i=1}^t kli ȳi.)² / Σ_{i=1}^t (kli²/ri),  with 1 d.f.

• Note: equivalently,

  SSCl = (Ĉl)² / (s²Ĉl / s²)
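• For the steak storage data, for example:

  Ĉ1 = 7.48 − (5.50 + 7.26 + 3.36)/3 = 2.1067
  Σ_{i=1}^t (k1i²/ri) = (1 + 1/9 + 1/9 + 1/9)/3 = 4/9
  SSC1 = (2.1067)²/(4/9) = 9.986,

  which matches the Contrast SS reported for C1 in the SAS output below.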
III. (continued)
• To test

  H0 : Cl = 0  vs.  Ha : Cl ≠ 0

• t-test:
• Test statistic:

  t = Ĉl / sĈl

• Rejection region: |t| > t_{α/2;N−t}
• F-test:
• Test statistic:

  F = MSCl/MSE = (SSCl/1)/s² = (Ĉl)²/s²Ĉl = (Ĉl/sĈl)²

• Rejection region: F > F_{α;1,N−t}
IV. Orthogonal contrasts
• Let Cl = Σ_{i=1}^t kli µi and Cm = Σ_{i=1}^t kmi µi be two different
contrasts.
• They are orthogonal contrasts if

  Σ_{i=1}^t (kli kmi / ri) = 0.

• Since there are t treatments, t − 1 orthogonal contrasts
can be constructed, and

  SSTR = Σ_{l=1}^{t−1} SSCl

if C1, . . . , Ct−1 are orthogonal contrasts.


IV. Orthogonal contrasts: Example

• For the steak storage experiment, there are 4 treatments. The design is balanced (equal ri), so it suffices to check that the coefficient vectors are orthogonal:

  [1, −1/3, −1/3, −1/3][0, 1, −1/2, −1/2]′ = −1/3 + 1/6 + 1/6 = 0
  [1, −1/3, −1/3, −1/3][0, 0, 1, −1]′ = −1/3 + 1/3 = 0
  [0, 1, −1/2, −1/2][0, 0, 1, −1]′ = −1/2 + 1/2 = 0

• Hence, C1 = µ1 − (1/3)(µ2 + µ3 + µ4), C2 = µ2 − (1/2)(µ3 + µ4), and
C3 = µ3 − µ4 are orthogonal contrasts.
• SSTR = Σ_{l=1}^{3} SSCl.
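• Numerically, from the SAS output below: SSC1 + SSC2 + SSC3 = 9.9856 + 0.0722 + 22.8150 = 32.8728 = SSTR.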
Example: Meat storage - SAS
options LS=80 nocenter nodate nonumber;
data steak;
input method logcnt@@;
cards;
1 7.66 1 6.98 1 7.80
2 5.26 2 5.44 2 5.80
3 7.41 3 7.33 3 7.04
4 3.51 4 2.91 4 3.66
;
proc glm;
class method;
model logcnt = method;
lsmeans method;
means method;
TITLE2 ’Estimating Contrasts’;
estimate ’C1’ method 1 -0.3333 -0.3333 -0.3334;
estimate ’C2’ method 0 1 -0.5 -0.5;
estimate ’C3’ method 0 0 1 -1;
estimate ’C1*’ method 3 -1 -1 -1;
estimate ’C2*’ method 0 2 -1 -1;
TITLE2 ’Tests for Contrasts’;
CONTRAST ’ C1’ method 1 -0.3333 -0.3333 -0.3334;
CONTRAST ’ C2’ method 0 1 -0.5 -0.5;
CONTRAST ’ C3’ method 0 0 1 -1;
CONTRAST ’ C1*’ method 3 -1 -1 -1;
CONTRAST ’ C2*’ method 0 2 -1 -1;
RUN;
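A small variation of the ESTIMATE statements above uses the DIVISOR option (which this chapter also uses in the grain example later) to keep the fractional coefficients exact instead of rounding −1/3 to −0.3333:

estimate 'C1' method 3 -1 -1 -1 / divisor=3;  /* exact -1/3 coefficients */
estimate 'C2' method 0 2 -1 -1 / divisor=2;   /* exact -1/2 coefficients */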
Meat storage - Output

Tests for Contrast


Dependent Variable: logcnt
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 3 32.87280000 10.95760000 94.58 <.0001
Error 8 0.92680000 0.11585000
Corrected Total 11 33.79960000

R-Square Coeff Var Root MSE logcnt Mean


0.972580 5.768940 0.340367 5.900000

Source DF Type I SS Mean Square F Value Pr > F


method 3 32.87280000 10.95760000 94.58 <.0001
Source DF Type III SS Mean Square F Value Pr > F
method 3 32.87280000 10.95760000 94.58 <.0001

Least Squares Means


logcnt
method LSMEAN
1 7.48000000
2 5.50000000
3 7.26000000
4 3.36000000

Level of ------------logcnt-----------
method N Mean Std Dev
1 3 7.48000000 0.43863424
2 3 5.50000000 0.27495454
3 3 7.26000000 0.19467922
4 3 3.36000000 0.39686270

Dependent Variable: logcnt


Contrast DF Contrast SS Mean Square F Value Pr > F
C1 1 9.98560000 9.98560000 86.19 <.0001
C2 1 0.07220000 0.07220000 0.62 0.4526
C3 1 22.81500000 22.81500000 196.94 <.0001
C1* 1 9.98750868 9.98750868 86.21 <.0001
C2* 1 0.07220000 0.07220000 0.62 0.4526

Standard
Parameter Estimate Error t Value Pr > |t|
C1 2.10686800 0.22691163 9.28 <.0001
C2 0.19000000 0.24067613 0.79 0.4526
C3 3.90000000 0.27790886 14.03 <.0001
C1* 6.32000000 0.68073490 9.28 <.0001
C2* 0.38000000 0.48135226 0.79 0.4526
Remarks

• Since C1 = µ1 − (1/3)(µ2 + µ3 + µ4) and
C1* = 3µ1 − (µ2 + µ3 + µ4), we have C1* = 3C1.
• Hence, Ĉ1* = 3Ĉ1.
• H0 : C1* = 0 is equivalent to H0 : C1 = 0.
• The estimates are proportional: Ĉ1* = 3Ĉ1.
• The standard errors of the estimates of these two contrasts are
proportional in the same ratio.
• The values of the t tests are the same.
• The values of the sums of squares, mean squares, and
F tests are the same,
• because SSC1* = SSC1. (The small discrepancy in the output is due to rounding −1/3 to −0.3333.)
Ex: For the phlebitis experiment

• t = 3, r1 = 9, r2 = 6, r3 = 8.
• Let C1 = µ1 − (1/2)(µ2 + µ3) and C2 = µ2 − µ3 be two contrasts.
• Since Σ_{i=1}^3 (k1i k2i / ri) = 0/9 − (1/2)/6 + (1/2)/8 = −1/48 ≠ 0, they are not
orthogonal.
• Let C1 = 7µ1 − 3µ2 − 4µ3 and C2 = µ2 − µ3.
• Then Σ_{i=1}^3 (k1i k2i / ri) = 0/9 − 3/6 + 4/8 = 0, so they are orthogonal
contrasts.
SAS
options nocenter nodate nonumber;
data phlebitis;
input method y@@;
cards;
1 2.2 1 1.6 1 0.8 1 1.8 1 1.4 1 0.4 1 0.6 1 1.5 1 0.5
2 0.3 2 0.0 2 0.6 2 0.0 2 -0.3 2 0.2
3 0.1 3 0.1 3 0.2 3 -0.4 3 0.3 3 0.1 3 0.1 3 -0.5
;
proc glm;
class method;
model y = method/solution e;
estimate 'A-V' method 1 -1 0;
estimate 'V-S' method 0 1 -1;
contrast 'A-V' method 1 -1 0;
contrast 'V-S' method 0 1 -1;
run;
estimate 'C1' method 7 -3 -4;
estimate 'C2' method 0 1 -1;
contrast 'C1' method 7 -3 -4;
contrast 'C2' method 0 1 -1;
run;
Phlebitis example - Output

Class Level Information


Class Levels Values
method 3 1 2 3
Number of Observations Read 23
Number of Observations Used 23
Dependent Variable: y
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 2 7.21623188 3.60811594 16.58 <.0001
Error 20 4.35333333 0.21766667
Corrected Total 22 11.56956522
R-Square Coeff Var Root MSE y Mean
0.623725 92.50513 0.466548 0.504348

Source DF Type I SS Mean Square F Value Pr > F


method 2 7.21623188 3.60811594 16.58 <.0001
Source DF Type III SS Mean Square F Value Pr > F
method 2 7.21623188 3.60811594 16.58 <.0001

Level of --------------y--------------
method N Mean Std Dev
1 9 1.20000000 0.64226163
2 6 0.13333333 0.30767949
3 8 0.00000000 0.28784917

Contrast DF Contrast SS Mean Square F Value Pr > F


A-V 1 4.09600000 4.09600000 18.82 0.0003
V-S 1 0.06095238 0.06095238 0.28 0.6025
Note: A-V and V-S are not orthogonal contrasts, so SS(A-V)+SS(V-S) ≠ SSTR
Standard
Parameter Estimate Error t Value Pr > |t|
A-V 1.06666667 0.24589218 4.34 0.0003
V-S 0.13333333 0.25196450 0.53 0.6025

Contrast DF Contrast SS Mean Square F Value Pr > F


C1 1 7.15527950 7.15527950 32.87 <.0001
C2 1 0.06095238 0.06095238 0.28 0.6025

Standard
Parameter Estimate Error t Value Pr > |t|
C1 8.00000000 1.39531624 5.73 <.0001
C2 0.13333333 0.25196450 0.53 0.6025
Note: C1 and C2 are orthogonal contrasts, so SS(C1)+SS(C2) = SSTR
Remarks

• For the phlebitis example, although C1 = 7µ1 − 3µ2 − 4µ3
and C2 = µ2 − µ3 are orthogonal, C1 = 7µ1 − 3µ2 − 4µ3 may
be neither meaningful nor of interest to researchers.
• It is better to construct contrasts based on the research
hypotheses and the treatment design.
• Comparisons based on contrasts (or orthogonal
contrasts) are useful for preplanned comparisons, which are
specified prior to running the experiment.
• If comparisons are constructed after examining the data
(post hoc comparisons), most experimenters would
construct tests that correspond to large observed
differences among treatment means.
• But these large differences could be the result of real
effects, or the result of random error.
3.3 Response curves for quantitative factors

• One purpose of an experiment is to establish the trend
between the response and the factor variable.
• Example: The objective of an animal growth study is to
characterize growth as a function of the amount of
nutrient in the diet.
• Example 3.1
• Characterizing the relationship between plants per unit
area and grain production.
• Quantitative factors:
• amount of nutrient, such as 0, 500, 1000 ppm
• plant density, such as 10, 20, 30, 40, 50
• From Figure 3.2, density and grain production have a curvilinear
relationship, which can be fitted by a polynomial regression.
Polynomial regression

• For a quantitative factor with t equally spaced levels, a
polynomial regression of degree t − 1 can be fitted.
• Problems with polynomial regression:
• It may be more expensive in degrees of freedom than
alternative nonlinear models or linear models with
transformed variables.
• It may exhibit serious multicollinearity.
• A lower-order polynomial may fit as well as the (t − 1)th-order
polynomial.
• An alternative to using centered variables in polynomial
regression is to use orthogonal polynomials, which
enable us to decompose the treatment effect into linear,
quadratic, cubic, . . . effects and to determine the highest
significant degree of the trend.
Orthogonal polynomial regression
• Regression model for a fourth-order orthogonal polynomial:

  yij = β0 + β1 P1i + β2 P2i + β3 P3i + β4 P4i + εij

where Pci ≡ Pc(xi) is the cth-order orthogonal polynomial
for the ith treatment level, whose value is denoted xi.
• Transformations of the powers of x into orthogonal
polynomials:
• Constant: P0(x) = 1
• Linear:

  P1(x) = λ1 [(x − x̄)/d]

• Quadratic:

  P2(x) = λ2 [((x − x̄)/d)² − (t² − 1)/12]
Orthogonal polynomial regression (continued)

• Cubic:

  P3(x) = λ3 [((x − x̄)/d)³ − ((x − x̄)/d)(3t² − 7)/20]

• Quartic:

  P4(x) = λ4 [((x − x̄)/d)⁴ − ((x − x̄)/d)²(3t² − 13)/14 + 3(t² − 1)(t² − 9)/560]

• λc is a constant chosen so that Pc takes integer values
• x is the value of the factor level
• x̄ is the mean of the factor levels
• d is the distance between adjacent factor levels
Remark 1

• Estimator for the orthogonal polynomial regression
coefficients:

  β̂c = (Σ_{i=1}^t Σ_{j=1}^{ri} Pci yij) / (Σ_{i=1}^t Σ_{j=1}^{ri} Pci²),  c = 0, · · · , t − 1

• Pci ≡ Pc(xi)
• xi is the ith level
• the matrix X′X is a diagonal matrix
• Note: For a balanced design, β̂c = (Σ_{i=1}^t Pci ȳi.) / (Σ_{i=1}^t Pci²).
Remark 2

• Orthogonal polynomials made from equally spaced
treatments are also orthogonal contrasts.
• The contribution of the cth-order orthogonal polynomial to
SSTR is

  SSPc = r (Σ_{i=1}^t Pci ȳi.)² / Σ_{i=1}^t Pci²

  (Recall SSCl = (Σ_{i=1}^t kli ȳi.)² / Σ_{i=1}^t (kli²/ri).)

• Moreover, SSTR = Σ_{c=1}^{t−1} SSPc.
Example 3.1

  xi                  P0    P1    P2    P3    P4    ȳi.
  10                   1    −2     2    −1     1    12
  20                   1    −1    −1     2    −4    16
  30                   1     0    −2     0     6    19
  40                   1     1    −1    −2    −4    18
  50                   1     2     2     1     1    17
  Σ_{i=1}^t Pci ȳi.   82    12   −14     1     7
  Σ_{i=1}^t Pci²       5    10    14    10    70
  β̂c                16.4   1.2    −1   0.1   0.1
  SSPc                    43.2    42   0.3   2.1

  P1(10) = λ1 (10 − 30)/10 = −2λ1 ⟹ λ1 = 1

  P2(10) = λ2 [((10 − 30)/10)² − (5² − 1)/12] = λ2 (4 − 2) = 2λ2 ⟹ λ2 = 1

  P3(10) = λ3 [((10 − 30)/10)³ − ((10 − 30)/10)(3 × 5² − 7)/20]
         = λ3 (−8 + 2 × 3.4) = −1.2λ3 ⟹ λ3 = 1/1.2 = 5/6

  P4(10) = λ4 [((10 − 30)/10)⁴ − ((10 − 30)/10)²(3 × 5² − 13)/14 + 3(5² − 1)(5² − 9)/560]
         = (12/35)λ4 ⟹ λ4 = 35/12
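As a quick check of the table above, SAS/IML's ORPOL function generates orthogonal polynomial coefficients directly. A minimal sketch (note that ORPOL returns orthonormal columns, so the values are proportional to, not equal to, the integer coefficients above):

proc iml;
x = {10, 20, 30, 40, 50};   /* the five density levels */
P = orpol(x, 4);            /* columns = degrees 0 through 4 */
print P;
quit;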
SAS
DATA grain;
INPUT production density @@;
DATALINES;
12.2 10 11.4 10 12.4 10
16.0 20 15.5 20 16.5 20
18.6 30 20.2 30 18.2 30
17.6 40 19.3 40 17.1 40
18.0 50 16.4 50 16.6 50
;
OPTIONS LS=80 NODATE NONUMBER;
TITLE1 ’Orthogonal Polynomials in SAS’;
PROC GLM DATA=grain;
CLASS density;
MODEL production = density;
RUN;
TITLE2 ’Components of the Orthogonal Polynomial Contrasts’;
CONTRAST ’Linear’ density -2 -1 0 1 2;
CONTRAST ’Quadratic’ density 2 -1 -2 -1 2;
CONTRAST ’Cubic’ density -1 2 0 -2 1;
CONTRAST ’Quartic’ density 1 -4 6 -4 1;
RUN;
TITLE2 ’Estimates of the Orthogonal Polynomial Contrasts’;
ESTIMATE ’Linear’ density -2 -1 0 1 2 / DIVISOR = 10;
ESTIMATE ’Quadratic’ density 2 -1 -2 -1 2 / DIVISOR = 14;
ESTIMATE ’Cubic’ density -1 2 0 -2 1 / DIVISOR = 10;
ESTIMATE ’Quartic’ density 1 -4 6 -4 1 / DIVISOR = 70;
*The DIVISOR in each case above is the sum of squared coeff.;
RUN;
Example 3.1 - Output

Orthogonal Polynomials in SAS


CRD
The GLM Procedure
Class Level Information
Class Levels Values
density 5 10 20 30 40 50
Number of Observations Read 15
Number of Observations Used 15

Dependent Variable: production


Sum of
Source DF Squares Mean Square F Value Pr > F
Model 4 87.60000000 21.90000000 29.28 <.0001
Error 10 7.48000000 0.74800000
Corrected Total 14 95.08000000

R-Square Coeff Var Root MSE production Mean


0.921329 5.273597 0.864870 16.40000

Source DF Type I SS Mean Square F Value Pr > F


density 4 87.60000000 21.90000000 29.28 <.0001

Source DF Type III SS Mean Square F Value Pr > F


density 4 87.60000000 21.90000000 29.28 <.0001

Components of the Orthogonal Polynomial Contrasts


The GLM Procedure
Dependent Variable: production
Contrast DF Contrast SS Mean Square F Value Pr > F
Linear 1 43.20000000 43.20000000 57.75 <.0001
Quadratic 1 42.00000000 42.00000000 56.15 <.0001
Cubic 1 0.30000000 0.30000000 0.40 0.5407
Quartic 1 2.10000000 2.10000000 2.81 0.1248

Estimates of the Orthogonal Polynomial Contrasts


The GLM Procedure
Dependent Variable: production
Standard
Parameter Estimate Error t Value Pr > |t|
Linear 1.20000000 0.15790292 7.60 <.0001
Quadratic -1.00000000 0.13345233 -7.49 <.0001
Cubic 0.10000000 0.15790292 0.63 0.5407
Quartic 0.10000000 0.05968170 1.68 0.1248
Estimating the response curve
• Hence, β3 = 0 and β4 = 0 are not rejected, and a quadratic model is
sufficient to describe the relationship between grain production and
plant density. The estimated orthogonal polynomial regression
equation is

  ŷij = ȳ.. + β̂1 P1i + β̂2 P2i = 16.4 + 1.2 P1i − P2i

• Since

  P1i = λ1 (xi − x̄)/d = (xi − 30)/10 = 0.1xi − 3
  P2i = λ2 [((xi − x̄)/d)² − (t² − 1)/12] = ((xi − 30)/10)² − (5² − 1)/12 = 0.01xi² − 0.6xi + 7

we have

  ŷij = 16.4 + 1.2(0.1xi − 3) − (0.01xi² − 0.6xi + 7) = 5.8 + 0.72xi − 0.01xi²

• Hence, the estimated response curve as a function of density is
  ŷij = 5.8 + 0.72xi − 0.01xi²
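• As a check, the same curve appears in the direct quadratic fit with PROC REG at the end of this chapter: intercept 5.80, density 0.72, den2 −0.01.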
Remarks

• Just as CIs can be constructed in regression analysis, we can
also establish a CI for the response curve with point
estimator ŷ = ȳ.. + β̂1 P1 + β̂2 P2.
• Its standard error is sŷ, where s²ŷ = s²ȳ.. + P1² s²β̂1 + P2² s²β̂2.
Skip the part after Eq. (3.18) to the end of Section 3.3.
3.4 Error rates in multiple comparisons
• Multiple comparisons: more than one comparison among
treatments.
• Groups of contrasts or orthogonal contrasts are multiple
comparisons.
• Multiple comparisons help to find where differences
among treatments occur after the F test for H0 : µ1 = · · · = µt
has been rejected.
• Multiple comparisons are to be used only when the overall
equality of population means has been rejected by the
F test in the ANOVA approach.
• There is no need to conduct any contrast test if H0 : µ1 = · · · = µt
cannot be rejected.
• To protect the experimentwise Type I or Type II error rate
when conducting multiple tests, various multiple comparison
methods have been developed.
• There are many multiple comparison procedures.
• Understanding their features helps us choose the
appropriate one for a particular situation.
Error rates for multiple comparisons

• Comparisonwise error rate (αC):
• the probability of making a Type I error on a specific
comparison among treatment means.
• Experimentwise error rate (αE):
• the probability of making at least one Type I error among
the family of comparisons.
Error rates
• Suppose there are n independent comparisons, each with
the same Type I error rate αC. Then

  αE = 1 − (1 − αC)^n    (3.24a)

• If one wants to control αE, then the comparisonwise error
rate can be set by

  αC = 1 − (1 − αE)^{1/n}    (3.24b)

• In practice, however, the n comparisons are usually dependent,
and 1 − (1 − αC)^n is then the maximum, or upper bound, of αE.
• Moreover, if the number of comparisons n is large, then αC
determined by (3.24b) would be so small as to lead to
nonsignificant results, and thus be too conservative.
• In Table 3.8, if n = 10, then to have αE = 0.05, set αC ≈ 0.005.
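A minimal SAS sketch of these two relationships (reproducing Table 3.8-type values; the data set and variable names are illustrative only):

data errorrates;
  do n = 1 to 10;
    aE = 1 - (1 - 0.05)**n;       /* experimentwise rate when each aC = 0.05 */
    aC = 1 - (1 - 0.05)**(1/n);   /* comparisonwise rate needed for aE = 0.05 */
    output;
  end;
run;
proc print data=errorrates; run;

For n = 10 this gives aC ≈ 0.0051, matching the value 0.005 quoted above.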
Remarks

• Error rates for multiple comparisons are defined in either
a weak sense or a strong sense.
• The weak sense examines the error rate assuming
µ1 = · · · = µt , while the strong sense makes no such
equality constraint.
• Since the treatment means are unlikely to all be equal
under most circumstances, the weak sense
experimentwise error rate may offer poor protection
against incorrect decisions.
3.5 Simultaneous Inference
• When there are several inferences to be made, we usually
want these inferences to hold simultaneously.
• For instance, when constructing multiple confidence intervals,
we would like the simultaneous coverage of all
related CIs to be 0.95, not just the coverage of one
specific CI or of each individual CI.
• It’s important to make decisions from the data with a valid
statistical method.
• Choice of method depends on the type and the strength of
the desired inference.
• Types of contrasts often considered are:
• Planned contrasts
• Orthogonal polynomial contrasts
• Multiple comparisons with the best treatment
• Multiple comparisons with the control experiment
• All pairwise comparisons
I. Bonferroni t statistic (Bonferroni method)

• By the Bonferroni inequality, when n comparisons
are made at the same comparisonwise error rate αC, the
maximum probability of making an experimentwise Type I
error is αE ≤ nαC.
• Hence, to control αE, set αC = αE/n.
• The 100(1 − αE)% simultaneous CI for a contrast C is

  Ĉ ± t_{αE/(2n);N−t} × sĈ,  where Ĉ = Σ_{i=1}^t ki ȳi.

• Remark:
• If n is large, then t_{αE/(2n);N−t} may be large.
• Bonferroni's method can be used safely for a small number
of pre-planned contrasts.
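A sketch of the computation for the steak storage example, assuming the n = 3 planned contrasts and the value sĈ3 ≈ 0.278 computed earlier:

data bonf;
  n = 3; alphaE = 0.05; df = 8;                 /* N - t = 8 in the steak example */
  tcrit = quantile('T', 1 - alphaE/(2*n), df);  /* Bonferroni critical t value */
  margin = tcrit * 0.278;                       /* half-width of the CI for C3 */
  put tcrit= margin=;
run;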
II. Scheffé t statistic (Scheffé method)
• Scheffé proposed a method to construct simultaneous
confidence intervals for, or simultaneously test, all possible
contrasts.
• This method is quite conservative and is often used for
examining unplanned contrasts or contrasts suggested by
the data.
• The 100(1 − αE)% simultaneous CI for a contrast C by the
Scheffé method is

  Ĉ ± sĈ √((t − 1) F_{αE;t−1,N−t}),

  where Ĉ = Σ_{i=1}^t ki ȳi. and sĈ = √(MSE × Σ_{i=1}^t ki²/ri).

• Note:
• By the Scheffé method, H0 : C = 0 is rejected
if |Ĉ| > sĈ √((t − 1) F_{αE;t−1,N−t}).
• The Scheffé criterion is sĈ √((t − 1) F_{αE;t−1,N−t}).
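• As a worked illustration for the steak storage contrast C3 (taking F_{0.05;3,8} ≈ 4.07 from an F table): the criterion is 0.278 × √(3 × 4.07) ≈ 0.278 × 3.49 ≈ 0.97, and |Ĉ3| = 3.90 > 0.97, so H0 : C3 = 0 is rejected at αE = 0.05.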
3.6 Multiple Comparisons with the Best Treatments

(Skip this section!)

3.7 Comparison of All Treatments with a Control
(Dunnett method)

• Let µc denote the mean of the control treatment.
• The Dunnett criterion to compare k treatments with the
control is

  D(k, αE) = d_{αE;k,ν} √(2s²/r)

  where ν = N − t for a CRD.

• d_{αE;k,ν} can be found in Appendix Table VI.

I. 100(1 − αE)% simultaneous CIs for µi − µc:

• two-sided: ȳi. − ȳc. ± D(k, αE)
• one-sided lower bound: ȳi. − ȳc. − D(k, αE)
• one-sided upper bound: ȳi. − ȳc. + D(k, αE)
• Note:
• To obtain the value of d_{αE;k,ν}, check p. 598 or p. 600 for
two-sided CIs;
• check p. 597 or p. 599 for one-sided CIs.
• k = t − 1 if we want to compare all treatments with a control.
II. Testing hypotheses about µi − µc

  H0 : µi − µc = 0          H0 : µi − µc ≤ 0          H0 : µi − µc ≥ 0
  Ha : µi − µc ≠ 0          Ha : µi − µc > 0          Ha : µi − µc < 0
  Reject H0 if:
  |ȳi. − ȳc.| > D(k, αE)    ȳi. − ȳc. > D(k, αE)      ȳi. − ȳc. < −D(k, αE)
Example 3.2 Flow rates through filters

• Treatment: 6 filter configurations, A, B, C, D, E, F.
• Replication = 4; t = 6, r = 4, N = 24, N − t = 18.

  Filter  A     B     C     D     E     F
  Mean    8.29  7.23  7.54  8.10  8.59  7.10

  MSE = 0.08.
• Filter configuration F is the control.
• At αE = 0.05, we want to compare the other 5 filters with F.
⟹ k = t − 1 = 5, ν = N − t = 18.

  D(k, αE) = d_{αE;k,ν} √(2s²/r) ⟹
  D(5, 0.05) = d_{0.05;5,18} √(2(0.08)/4) = 2.76(0.2) = 0.55
Example 3.2 (continued)

• We conclude that µi ≠ µc if |ȳi. − ȳc.| > D(k, αE).

  |ȳA − ȳF| = |8.29 − 7.10| = 1.19 > 0.55
  |ȳB − ȳF| = |7.23 − 7.10| = 0.13 < 0.55
  |ȳC − ȳF| = |7.54 − 7.10| = 0.44 < 0.55
  |ȳD − ȳF| = |8.10 − 7.10| = 1.00 > 0.55
  |ȳE − ȳF| = |8.59 − 7.10| = 1.49 > 0.55

• Hence, at αE = 0.05, we conclude that A, D, and E differ
significantly from F.
• Moreover, since their sample means are all bigger than
the sample mean of F, treatments A, D, and E are superior to F.

  ȳF  ȳB  ȳC  ȳD  ȳA  ȳE   (means in increasing order)
Example 3.2 (continued)

• 95% simultaneous CIs based on Dunnett: ȳi. − ȳc. ± D(k, αE)

  ȳA − ȳF ± D(k, αE) = 1.19 ± 0.55 = (0.64, 1.74)
  ȳB − ȳF ± D(k, αE) = 0.13 ± 0.55 = (−0.42, 0.68)
  ȳC − ȳF ± D(k, αE) = 0.44 ± 0.55 = (−0.11, 0.99)
  ȳD − ȳF ± D(k, αE) = 1.00 ± 0.55 = (0.45, 1.55)
  ȳE − ȳF ± D(k, αE) = 1.49 ± 0.55 = (0.94, 2.04)
Remarks

• Dunnett's method only allows us to conclude whether each of the
other treatments differs from the control.
• It cannot help us detect differences among the other
treatments.
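A minimal SAS sketch for Example 3.2, assuming a data set named filters with variables filter and rate (hypothetical names); the quoted level in the DUNNETT option selects the control:

proc glm data=filters;
  class filter;
  model rate = filter;
  means filter / dunnett('F');   /* two-sided comparisons with control F */
run;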
3.8 Pairwise Comparison of All Treatments

• There are (t choose 2) = t(t − 1)/2 pairwise mean differences µi − µl,
for all i, l.
I. Tukey's HSD (honestly significant difference)
• Tukey developed this method to obtain 100(1 − αE)%
simultaneous CIs. It is based on the Studentized range
statistic for the balanced case:

  q = (ȳ(t) − ȳ(1)) / √(s²/r),  where ȳ(t) = max_{i=1,···,t} ȳi., ȳ(1) = min_{i=1,···,t} ȳi.

• Since there are t treatment means, the HSD for µi and µj is

  HSD(t, αE) = q_{αE;t,N−t} √(s²/r).

• q_{αE;t,N−t} is the upper-αE quantile of the distribution of the
Studentized range statistic for a range of t treatment
means in an ordered array.
• 100(1 − αE)% simultaneous CIs of µi − µj, for all i, j, are
  ȳi. − ȳj. ± HSD(t, αE). (**)
• 100(1 − αE)% simultaneous test: conclude that
µi and µj differ if |ȳi. − ȳj.| > HSD(t, αE).
For the unbalanced case
• a) Tukey-Kramer approximation for the unbalanced case:

  HSD(t, αE)ij = q_{αE;t,N−t} √((MSE/2)(1/ri + 1/rj)).

• This reduces to q_{αE;t,N−t} √((MSE/2)(1/r + 1/r)) = q_{αE;t,N−t} √(MSE/r)
for a balanced design.
• Conclude that µi and µj differ if |ȳi. − ȳj.| > HSD(t, αE)ij.
• 100(1 − αE)% simultaneous CIs of µi − µj, for all i, j, are

  ȳi. − ȳj. ± HSD(t, αE)ij. (**)

• b) Another alternative is the harmonic mean of the replications:

  HSD(t, αE) = q_{αE;t,N−t} √(MSE/rh),  where rh = [ (1/t) Σ_{i=1}^t (1/ri) ]^{−1}.
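• For instance, for the phlebitis example (r1 = 9, r2 = 6, r3 = 8):
  rh = 3/(1/9 + 1/6 + 1/8) = 3/(29/72) = 216/29 ≈ 7.45.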
Remarks
• Although HSD considers absolute differences, inference
about the direction of a difference can still be assessed
from the sign of ȳi. − ȳj.
• If the variances of µ̂i − µ̂j are not all equal, as in complex
designs, then use

  ȳi. − ȳj. ± q_{αE;k,ν} s_{µ̂i−µ̂j}

• HSD is very conservative, since it is based on the range of
the treatment means, but it is robust to heterogeneity of variances
and nonnormal populations.
• The simultaneous confidence level is 100(1 − αE)% for a
balanced design and at least 100(1 − αE)% for an unbalanced
design if the SCIs are constructed by (**).
• Two pairwise multiple comparison tests in the weak
sense, i.e. based on µ1 = · · · = µt, are Fisher's LSD and the
Student-Newman-Keuls (SNK) multiple range test.
II. Fisher's LSD (Least Significant Difference)
• This method builds on the equal-variance t test of the
difference between two means.
• The test statistic is improved by using MSE rather than the
pooled variance of only the two samples.
• We conclude that µi and µj differ, i.e. µi ≠ µj, at
significance level α if |ȳi. − ȳj.| > LSD(α)ij, where

  LSD(α)ij = t_{α/2;N−t} √(MSE (1/ri + 1/rj)).

• The LSD method may result in an increased probability of
committing a Type I error.
• If there are k ≤ t(t − 1)/2 pairwise comparisons
to be made, each at significance level α, then αE is
about 1 − (1 − α)^k.
• Example 3.3 and Table 3.13
III. Student-Newman-Keuls (SNK) multiple range test
• A multiple range procedure.
• Order the treatment means as

  ȳ(1) ≤ ȳ(2) ≤ · · · ≤ ȳ(t)

• The critical value for each pair of means depends on the
number of means in the range of the particular pair
under test.
• The SNK test is similar to Tukey's HSD test; the SNK criterion is

  SNK(k, αE) = q_{αE;k,N−t} √(MSE/r)

where k is the number of sample treatment means in the
range when comparing two treatment means.
• For two means ȳi. and ȳj. whose closed interval [ȳi., ȳj.]
(or [ȳj., ȳi.]) contains k ordered means, reject
µi = µj if |ȳi. − ȳj.| > SNK(k, αE).
Note

• a) The test is not conducted if there is a nonsignificant range of
means of size greater than k containing ȳi. and ȳj. by the
SNK criteria.
• b) For the unbalanced case, the harmonic mean is often used in
SNK:

  SNK(k, αE) = q_{αE;k,N−t} √(MSE/rh),
  where rh = [ (1/t) Σ_{i=1}^t (1/ri) ]^{−1} = t / Σ_{i=1}^t (1/ri)
Procedures for conducting the SNK testing process
• 1. Let k = t, i.e. consider ȳ(t) − ȳ(1).
• If ȳ(t) − ȳ(1) ≤ SNK(t, αE), then the two groups
corresponding to ȳ(1) and ȳ(t) are not significantly
different, and neither is any other pair, so stop the SNK
procedure;
• otherwise, the two groups corresponding to ȳ(1) and ȳ(t)
are significantly different, and the procedure continues.
• 2. Set k = t − 1.
• There are 2 differences, ȳ(t−1) − ȳ(1) and ȳ(t) − ȳ(2), containing
t − 1 means in the range.
• If max(ȳ(t−1) − ȳ(1), ȳ(t) − ȳ(2)) ≤ SNK(t − 1, αE), then stop
the SNK procedure.
• If max(ȳ(t−1) − ȳ(1), ȳ(t) − ȳ(2)) > SNK(t − 1, αE), the
differences among the means with range k − 1 are tested.
• If a pair of means with range k − 1 is not significant, no
further testing is conducted for any pair of means
lying between that specific pair of means.
• 3. Proceed until the maximum difference for a certain range
does not exceed the critical value, or until the tests for range 2
have been completed.
Example 3.3 Strength of welds

• Treatment: 4 welding techniques, A, B, C, D.
• Replication = 5; t = 4, r = 5, N = 20, N − t = 16.

  Method  A   B   C   D
  Mean    69  83  75  71

  MSE = 15, αE = 0.05.
1. Tukey's HSD:

  HSD(4, 0.05) = q_{0.05;4,16} √(MSE/r) = 4.05 √(15/5) ≈ 7.02

• B − A = 14, B − D = 12, and B − C = 8 exceed 7.02; the remaining
differences (C − A = 6, C − D = 4, D − A = 2) do not.
• Hence B differs from A, C, and D, while A, C, and D are not
significantly different, agreeing with the SNK result below.
Example 3.3 – Table 3.14, SNK

  Method (Mean)   A (69)  D (71)  C (75)  B (83)    k   q_{0.05;k,16}  SNK(k, 0.05)
  A (69)            –       2       6      14*      4      4.05           7.02
  D (71)                    –       4      12*      3      3.65           6.32
  C (75)                            –       8*      2      3.00           5.20
  B (83)                                    –

• By the SNK method, B is significantly different from all the others,
while A, C, and D are not significantly different from one another:
  A D C B
• LSD: LSD(0.05) = t_{0.025;16} √(MSE(1/5 + 1/5)) = 2.120 √6 ≈ 5.19,
so LSD additionally declares C different from A (C − A = 6 > 5.19).
More examples using SAS
OPTIONS LS=80 NODATE NONUMBER;
data eggwt;
input trt$ @;
do repeat = 1 to 10;
input eggwt @;
output;
end;
cards;
A 37 32 31 46 44 38 36 33 40 33
B 49 43 52 52 50 53 46 42 56 47
C 43 33 48 45 46 42 46 40 44 43
D 47 41 54 49 44 52 46 45 54 38
E 40 31 44 34 44 36 51 43 37 40
;
proc glm;
class trt;
model eggwt=trt;
means trt/lsd;
means trt/tukey;
means trt/dunnett;
means trt/snk;
means trt/scheffe;
means trt/duncan;
lsmeans trt/pdiff stderr;
contrast 'TRT-E vs. OTHERS' trt -1 -1 -1 -1 4;
contrast 'TRT-D vs. OTHERS' trt -1 -1 -1 4 -1;
contrast 'TRT-CE vs. TRT-ABD' trt -2 -2 3 -2 3;
contrast 'TRT-A vs. OTHERS' trt 4 -1 -1 -1 -1;
run;
Output 1

The GLM Procedure


Class Level Information
Class Levels Values
trt 5 A B C D E
Number of Observations Read 50
Number of Observations Used 50
Dependent Variable: eggwt
Sum of
Source DF Squares Mean Square F Value Pr > F
Model 4 968.000000 242.000000 9.59 <.0001
Error 45 1136.000000 25.244444
Corrected Total 49 2104.000000
R-Square Coeff Var Root MSE eggwt Mean
0.460076 11.63052 5.024385 43.20000

Source DF Type I SS Mean Square F Value Pr > F


trt 4 968.0000000 242.0000000 9.59 <.0001
Source DF Type III SS Mean Square F Value Pr > F
trt 4 968.0000000 242.0000000 9.59 <.0001

t Tests (LSD) for eggwt


NOTE: This test controls the Type I comparisonwise error rate, not the
experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 45
Error Mean Square 25.24444
Critical Value of t 2.01410
Least Significant Difference 4.5256

Means with the same letter are not significantly different.


t Grouping Mean N trt
A 49.000 10 B
A
B A 47.000 10 D
B
B C 43.000 10 C
C
D C 40.000 10 E
D
D 37.000 10 A
Output 2

Tukey's Studentized Range (HSD) Test for eggwt


NOTE: This test controls the Type I experimentwise error rate, but it generally
has a higher Type II error rate than REGWQ.
Alpha 0.05
Error Degrees of Freedom 45
Error Mean Square 25.24444
Critical Value of Studentized Range 4.01842
Minimum Significant Difference 6.3847

Means with the same letter are not significantly different.


Tukey Grouping Mean N trt
A 49.000 10 B
A
A 47.000 10 D
A
B A 43.000 10 C
B
B 40.000 10 E
B
B 37.000 10 A

Dunnett's t Tests for eggwt


NOTE: This test controls the Type I experimentwise error for comparisons of all
treatments against a control.
Alpha 0.05
Error Degrees of Freedom 45
Error Mean Square 25.24444
Critical Value of Dunnett's t 2.53129
Minimum Significant Difference 5.6877

Comparisons significant at the 0.05 level are indicated by ***.


Difference
trt Between Simultaneous 95%
Comparison Means Confidence Limits
B - A 12.000 6.312 17.688 ***
D - A 10.000 4.312 15.688 ***
C - A 6.000 0.312 11.688 ***
E - A 3.000 -2.688 8.688

Student-Newman-Keuls Test for eggwt


NOTE: This test controls the Type I experimentwise error rate under the
complete null hypothesis but not under partial null hypotheses.
Alpha 0.05
Error Degrees of Freedom 45
Error Mean Square 25.24444
Number of Means 2 3 4 5
Critical Range 4.5257112 5.445799 5.9942499 6.384667
Means with the same letter are not significantly different.
SNK Grouping          Mean      N    trt
     A               49.000    10    B
     A
B    A               47.000    10    D
B
B    C               43.000    10    C
     C
D    C               40.000    10    E
D
D                    37.000    10    A
Output 3

Scheffe's Test for eggwt


NOTE: This test controls the Type I experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 45
Error Mean Square 25.24444
Critical Value of F 2.57874
Minimum Significant Difference 7.2166
Means with the same letter are not significantly different.
Scheffe Grouping Mean N trt
A 49.000 10 B
A
B A 47.000 10 D
B A
B A C 43.000 10 C
B C
B C 40.000 10 E
C
C 37.000 10 A
Output 4

Duncan's Multiple Range Test for eggwt


NOTE: This test controls the Type I comparisonwise error rate, not the
experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 45
Error Mean Square 25.24444
Number of Means 2 3 4 5
Critical Range 4.526 4.759 4.913 5.023

Means with the same letter are not significantly different.


Duncan Grouping Mean N trt
A 49.000 10 B
A
B A 47.000 10 D
B
B C 43.000 10 C
C
D C 40.000 10 E
D
D 37.000 10 A

Least Squares Means


Standard LSMEAN
trt eggwt LSMEAN Error Pr > |t| Number
A 37.0000000 1.5888500 <.0001 1
B 49.0000000 1.5888500 <.0001 2
C 43.0000000 1.5888500 <.0001 3
D 47.0000000 1.5888500 <.0001 4
E 40.0000000 1.5888500 <.0001 5

Least Squares Means for effect trt


Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: eggwt
i/j 1 2 3 4 5
1 <.0001 0.0105 <.0001 0.1885
2 <.0001 0.0105 0.3782 0.0002
3 0.0105 0.0105 0.0818 0.1885
4 <.0001 0.3782 0.0818 0.0032
5 0.1885 0.0002 0.1885 0.0032
NOTE: To ensure overall protection level, only probabilities associated with
pre-planned comparisons should be used.

Dependent Variable: eggwt


Contrast DF Contrast SS Mean Square F Value Pr > F
TRT-E vs. OTHERS 1 128.0000000 128.0000000 5.07 0.0293
TRT-D vs. OTHERS 1 180.5000000 180.5000000 7.15 0.0104
TRT-CE vs. TRT-ABD 1 96.3333333 96.3333333 3.82 0.0570
TRT-A vs. OTHERS 1 480.5000000 480.5000000 19.03 <.0001
Which method to use?

• Multiple comparisons (i.e. contrasts) but not pairwise


comparisons
• If the number of comparisons is very small, use Bonferroni
method.
• If the number of comparisons is very large, use Scheffe
method.
• Pairwise comparison
• Tukey’s HSD method or SNK
• Note: SNK is less conservative than Tukey’s HSD.
• Comparing treatment means with a control
• Dunnett method.
• The ANOVA F test only tests the null hypothesis that the
treatment means are equal (H0 : µ1 = · · · = µt).
• Implication of H0: the data were all sampled from the same
distribution.
• How do we decide which means are different from one
another?
Two general approaches for conducting multiple
comparisons

• A priori comparisons (planned; before the fact)

• Form specific comparisons by contrasts before the experiment;
usually only a small number of these comparisons are
considered.
• Note: To represent a mathematical partitioning of
the treatment sum of squares, these contrasts should
preferably be orthogonal to one another.
• Individual t tests; contrasts; Bonferroni t
• A posteriori comparisons (unplanned; after the fact)
• Make a larger number out of all possible comparisons after
the ANOVA has been done.
• Fisher's LSD; Scheffé method; HSD; SNK; Dunnett;
• Ryan (REGWQ); Duncan
Objective of multiple comparisons

• Keep the experimentwise error rate (αE) low while maintaining
the power of the test. But not all multiple comparison
methods succeed at both.
• A more "conservative" test: emphasis on keeping αE under
control, at the cost of losing power.
• A more "liberal" test: loses control of αE, but has higher power.
• Student-Newman-Keuls test (SNK):
• powerful, but tends to lose control of αE
• Tukey's HSD test:
• firm control of αE, but less powerful
• Ryan procedure (REGWQ):
• firm control of αE and more powerful than Tukey's test
Statistical Methods for Psychology by Howell, p. 375

  Test                        Error rate  Comparison     Type    Prior/Posterior
  Individual t tests          PC^a        Pairwise       t       Prior
  Linear contrasts            PC          Any contrasts  F       Prior
  Bonferroni t                FW^b        Any contrasts  t**     Prior
  Holm; Larzelere & Mulaik    FW^b        Any contrasts  t**     Both
  Fisher's LSD                FW*         Pairwise       t       Posterior
  Newman-Keuls test           FW*         Pairwise       Range   Posterior
  Ryan (REGWQ)                FW          Pairwise       Range   Posterior
  Tukey HSD                   FW          Pairwise***    Range   Posterior
  Scheffé test                FW          Any contrasts  F**     Posterior
  Dunnett's test              FW          With control   F**     Posterior

• a: error rate per comparison (PC).
• b: family error rate (FW).
• FW*: against the complete null hypothesis.
• t**: modified t test.
• F**: modified F test.
• Pairwise***: Tukey HSD can be used for all contrasts, but is poor for this
purpose.
Bonferroni's inequality and Boole's inequality

• Suppose there are k = (t choose 2) pairwise comparisons to test,
and each single test has significance level α.
• Let Ti, i = 1, · · · , k, denote the event that the ith test makes a Type I error.
Then the probability of not making any Type I error among
all these tests is P(∩_{i=1}^k Ti′), where Ti′ is the complement of Ti.

• Also, P(∩_{i=1}^k Ti′) = 1 − P[(∩_{i=1}^k Ti′)′] = 1 − P(∪_{i=1}^k Ti),
and P(∪_{i=1}^k Ti) ≤ Σ_{i=1}^k P(Ti) = kα (Boole's inequality),

• so P(∩_{i=1}^k Ti′) = 1 − P(∪_{i=1}^k Ti) ≥ 1 − Σ_{i=1}^k P(Ti) = 1 − kα.

• Or, P(∩_{i=1}^k Ti′) ≥ 1 − Σ_{i=1}^k P(Ti) = 1 − kα (Bonferroni's inequality).

• Hence, αEW = P(experimentwise Type I error)
= 1 − P(∩_{i=1}^k Ti′) ≤ 1 − (1 − kα) = kα.
• Therefore, if we want αEW ≤ α and want to set the Type I
error rate of each individual test to αC, then since
αEW ≤ kαC, we set αC = α/k.
Bonferroni's inequality and Boole's inequality
(continued)

• Boole's inequality: P(∪_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ P(Ai)
• Bonferroni's inequality: P(∩_{i=1}^k Ai) ≥ 1 − Σ_{i=1}^k P(Ai′)

• Note: If the k tests are independent, then the probability
that none of them makes a Type I error is (1 − α)^k.
• Hence, the experimentwise Type I error rate is 1 − (1 − α)^k.
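• For example, with k = 10 tests each at αC = 0.005: the Bonferroni bound gives αEW ≤ kαC = 0.05, while under independence the exact rate is 1 − (1 − 0.005)^10 ≈ 0.0489, slightly below the bound.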
Example 3.1

• Why do we need orthogonal polynomials?
• Since plant density has 5 levels (10, 20, 30, 40, 50), the
degrees of freedom for this factor are 4, so we may fit a
4th-order polynomial regression:

  y = β0 + β1 x + β2 x² + β3 x³ + β4 x⁴ + ε

• What will be the problem?


Example 3.1 – SAS
DATA grain;
INPUT production density @@;
den2=density*density;
den3=den2*density;
den4=den3*density;
DATALINES;
:
:
PROC reg DATA=grain;
MODEL production = density den2 den3 den4/vif;
RUN;
Parameter Estimates
Variable DF Parameter Standard t Value Pr > |t| Variance
Estimate Error Inflation
Intercept 1 17.00000 7.91092 2.15 0.0572 0
density 1 -1.45833 1.45335 -1.00 0.3393 8471.52778
den2 1 0.12708 0.08711 1.46 0.1753 113819
den3 1 -0.00342 0.00209 -1.63 0.1336 182547
den4 1 0.00002917 0.00001741 1.68 0.1248 32969
Example 3.1 (continued)

PROC reg DATA=grain;


MODEL production = density den2;
RUN;
• The REG Procedure
Analysis of Variance
Source DF Sum of Squares Mean Square F Value Pr > F
Model 2 85.20000 42.60000 51.74 <.0001
Error 12 9.88000 0.82333
Corrected Total 14 95.08000

Parameter Estimates
Variable DF Parameter Standard t Value Pr > |t|
Estimate Error
Intercept 1 5.80000 1.12359 5.16 0.0002
density 1 0.72000 0.08562 8.41 <.0001
den2 1 -0.01000 0.00140 -7.14 <.0001
Tolerance or Variance Inflation Factor
• For a standard multiple regression with p − 1 explanatory
variables, the output often provides a diagnostic measure
of the collinearity of a predictor with the other predictors
in the model: either the tolerance (TOL) or the variance
inflation factor (VIF).
• Tolerance (TOL): TOL = 1 − Rk²
• Rk² is the R² of the regression of Xk on the other p − 2
predictors in the model and a constant.
• TOL can vary between 0 and 1;
• TOL close to 1 means that Rk² is close to 0, indicating that
Xk is not highly correlated with the other predictors in the
model;
• TOL close to 0 means that Xk is highly correlated with the
other predictors; one then says that Xk is collinear with the
other predictors.
• A common rule of thumb is that TOL < 0.1 is an indication
that collinearity may unduly influence the results.
Variance Inflation Factor

• The variance inflation factor is the inverse of the


tolerance.

VIF = (TOL )−1 = (1 − Rk2 )−1

• Large values of VIF therefore indicate a high level of
collinearity.
• The corresponding rule of thumb is that VIF > 10 indicates
that collinearity may unduly influence the results.
