
Stat Note 6

In the sixth of a series of articles about statistics for biologists, Anthony Hilton and Richard Armstrong discuss:

post hoc ANOVA tests


In a previous article in Microbiologist (Armstrong & Hilton, 2004), we described the application of analysis of variance (ANOVA) to various experimental designs in Microbiology. ANOVA is a data analysis method of great elegance, utility and flexibility and is the most effective method available for analysing experimental data in which several treatments or factors are represented. In the simplest case of a one-way ANOVA, in which the experiment consists of a number of independent treatments or groups, the first stage of the analysis is to carry out a variance ratio test (F-test) to determine whether all group means are the same. If treatment groups are few, say three or four, a non-significant F-test would indicate no meaningful differences among the means and no further analysis would be required. However, a significant F-test suggests real differences among the treatment means and the next stage of the analysis would involve a more detailed examination of these differences.

There are various options available depending on the objectives of the experiment. Specific comparisons may have been planned before the experiment was carried out, decided after the data have been collected, or comparisons between all possible combinations of the treatment means may be envisaged. This Statnote provides a more detailed discussion of these questions than was possible in our original article (Armstrong & Hilton, 2004).

The scenario

An experiment was designed to investigate the efficacy of two commercial plasmid-prep kits compared to a standard alkaline-SDS lysis protocol. A 5 ml overnight recombinant E. coli culture containing a high copy plasmid was harvested by centrifugation and the pellet resuspended in 100 µl of lysis buffer. Plasmid DNA was subsequently extracted from the cell suspension using a standard SDS-lysis protocol or a commercially available kit following the manufacturer’s instructions. In total, ten independent cultures were processed using each of the three extraction methods under investigation. Following extraction, the purified plasmid DNA pellet was dissolved in 50 µl of water and the concentration determined spectrophotometrically at 260 nm. The yield of plasmid DNA using each preparation method is detailed in Table 1.

Planned comparisons between the means

The experiment may have been designed to test specific (‘planned’) differences between the treatment means. Planned comparisons are hypotheses specified before the analysis commences whereas ‘post-hoc’ tests are for further explanation after a significant effect has been found.

How are the tests done?

The basic strategy for planned comparisons is to divide up the treatments sums

34 September 2006 www.sfam.org.uk



of squares among the various hypotheses, called ‘contrasts’, which are then analysed separately either by an F-test or a t-test. If this procedure was carried out for all possible comparisons between the means, then the sums of squares for all contrasts would be greater than the treatments sums of squares as a whole since the comparisons overlap and are based on the same sources of variance. Strictly, such comparisons cannot be made independently of each other. As a result, comparisons must be constructed so that they are not overlapping, i.e., they have to be ‘orthogonal’. Essentially, orthogonal comparisons have no common variance: the coefficients of each contrast sum to zero and, with equal N, the products of the corresponding coefficients of any two contrasts also sum to zero. Hence, the sums of squares can be calculated for each contrast and a test of significance made on each. The number of possible contrasts is equivalent to the number of degrees of freedom (DF) of the treatment groups in the experiment. Hence, if an experiment employs three groups, as in our scenario, then two contrasts can be validly tested.

This approach has two advantages. First, there is no problem as to the validity of the individual comparisons, a problem present to some extent with all conventional post-hoc tests. Second, the comparisons provide direct tests of the hypotheses of interest. Most commercially available software will allow for valid contrasts to be tested for a range of experimental designs.

An illustrative example

An example of this approach is shown in Table 1. In our scenario, we compared two commercial plasmid-prep kits with a standard alkaline-SDS lysis protocol. Two valid contrasts are possible using this experimental design. First, a comparison of the mean of the two commercial prep kits with the standard method, viz., do the commercial kits on average improve plasmid yield (contrast 1)? Second, a comparison of the two commercial prep kits themselves (contrast 2). Contrast 1 is highly significant (t = 5.11, P < 0.001) indicating the superiority of the commercial kits over the standard method but contrast 2 is not significant (t = 0.54, P > 0.05) showing that the two commercial kits did not differ in their efficacy.

Table 1. Comparison of two commercial plasmid-prep kits (plasmid yield mg) compared to a standard alkaline-SDS lysis protocol using planned comparisons and post-hoc tests.

Culture   Alkaline-SDS lysis   Commercial kit A   Commercial kit B
1         1.7                  3.1                4.7
2         2                    2.2                3.5
3         1.2                  2.8                2.6
4         0.5                  4.8                4.3
5         0.9                  5                  3.8
6         1                    1.9                4.5
7         1.4                  2                  4
8         2.7                  3.6                1.9
9         3.2                  4.1                2.8
10        0.7                  4.7                4.6

ANOVA
Source of variation   Sums of squares   DF   Mean square   F
Treatments            27.3807           2    13.690        13.28 (P<0.001)
Error                 27.998            27   1.0370

Planned comparisons
Contrast                      Estimate   Std. error (SE)   ‘t’
1. Std. v (Kit A + Kit B)/2   4.03       0.79              5.109 (P<0.001)
2. Kit A v Kit B              0.25       0.45              0.54 (P>0.05)

Post-hoc tests
Test               Std. v Kit A   Std. v Kit B   Kit A v Kit B
Fisher PLSD        P<0.001        P<0.001        P>0.05
Tukey-Kramer HSD   P<0.001        P<0.001        P>0.05
SNK                P<0.001        P<0.001        P>0.05
Scheffé            P<0.001        P<0.001        P>0.05

Post-hoc tests

There may be circumstances in which tests between the treatment means are carried out post hoc or where multiple comparisons between the treatment means may be required. A variety of methods exist for making post-hoc tests. The most common tests included in commercially available statistical software are listed in Table 2 (Abacus Concepts, 1993; Armstrong et al., 2000). These tests determine the critical differences that have to be exceeded by a pair of treatment means to be significant. However, the individual tests vary in how effectively they address a particular statistical problem and their sensitivity to violations of the assumptions of ANOVA. The most critical problem is the possibility of making a Type 1 error, i.e., rejecting the null hypothesis when it is true. By contrast, a Type 2 error is accepting the null hypothesis when a real difference is present. The post-hoc tests listed in Table 2 give varying degrees of protection against making a Type 1 error.

Discussion of the tests

Fisher’s protected least significant difference (Fisher’s PLSD) is the most ‘liberal’ of the methods discussed and therefore the most likely to result in a Type 1 error. All possible pairwise comparisons are evaluated and the method uses Student’s ‘t’ to determine the critical value to be exceeded for any pair of means based on the maximum number of steps between the smallest and largest mean.

The Tukey-Kramer honestly significant difference (Tukey-Kramer HSD) is similar to the Fisher PLSD but is less liable to result in a Type 1 error. In addition, the method uses the more conservative ‘Studentised range’ rather than Student’s ‘t’ to determine a single critical value that all comparisons must exceed for significance. This method can be used for experiments that have equal numbers of observations (N) in each group or in cases where ‘N’ varies significantly between groups. However, with modest variations in N, the Spjotvoll-Stoline modification of the above method can be used.

The Student-Newman-Keuls (SNK) method makes all pairwise comparisons of the means ordered from the smallest to the largest using a stepwise procedure. First, the means furthest apart, i.e., ‘a’ steps apart in the range, are tested. If this mean difference is significant, the means a-2, a-3, etc., steps apart are tested



until a test produces a non-significant mean difference, after which the analysis is terminated. The SNK test is more liable to make a Type 2 rather than a Type 1 error. By contrast, the Tukey compromise method employs the average of the HSD and SNK critical values. Duncan’s multiple range test is very similar to the SNK method, but is more liberal than SNK, the probability of making a Type 1 error increasing with the number of means analysed.

One of the most popular methods is Scheffé’s ‘S’ test. This method makes all pairwise comparisons between the means and is a very robust procedure to violations of the assumptions associated with ANOVA (Armstrong & Hilton, 2004). It is also the most conservative of the methods discussed, giving maximum protection against making a Type 1 error.

The Games-Howell method is one of the most robust of the newer methods. It can be used in circumstances where ‘N’ varies between groups, with heterogeneous variances (see Statnote 5), and when normality cannot be assumed. This method defines a different critical value for each pairwise comparison and this is determined by the variances and numbers of observations in each group under comparison.

Dunnett’s test is used when several treatment means are each compared to a control mean. Equal or unequal ‘N’ can be analysed and the method is not sensitive to heterogeneous variances. An alternative to this test is the Bonferroni/Dunn method that can also be employed to test multiple comparisons between treatment means, especially when a large number of treatments is present.

Table 2. Features of the most commonly used post-hoc tests (modified from Abacus Concepts 1993 and Armstrong et al., 2000)

Method                       Equal N   F     Normality   Use       Error control/protection
Fisher PLSD                  Yes       Yes   Yes         All       Most sensitive to Type 1
Tukey-Kramer HSD             No        Yes   Yes         All       Less sensitive to Type 1 than Fisher PLSD
Spjotvoll-Stoline            No        Yes   Yes         All       As Tukey-Kramer
Student-Newman-Keuls (SNK)   Yes       Yes   Yes         All       Sensitive to Type 2
Tukey-Compromise             No        Yes   Yes         All       Average of Tukey and SNK
Duncan’s Multiple Range      No        Yes   Yes         All       More sensitive to Type 1 than SNK
Scheffé’s S                  Yes       No    No          All       Most conservative
Games/Howell                 Yes       No    No          All       More conservative than majority
Dunnett’s test               No        No    No          T/C       More conservative than majority
Bonferroni                   No        Yes   Yes         All, TC   Conservative

Abbreviations: PLSD = Protected least significant difference, HSD = Honestly significant difference, T = treatment groups, C = Control group. Column 2 indicates whether equal numbers of replicates (N) in each treatment group are required or whether the method can be applied to cases with unequal ‘N’. Column 3 indicates whether a significant between-treatments F ratio is required before post-hoc tests can be applied and columns 4 and 5 whether the method assumes equal variances in the different treatments and normality of errors respectively. The final column indicates the relative degree of protection against Type 1 and Type 2 errors.

Which test to use?

In many circumstances, different post-hoc tests may lead to the same conclusions and which of the above tests is actually used is often a matter of fashion or personal taste. However, each test addresses the statistical problems in a unique way. A good way of deciding which test to use is to consider the purpose of the experimental investigation. If the purpose is to decide which of a group of treatments is likely to have an effect, then it is better to use a more liberal test such as Fisher’s PLSD. In this scenario it is better not to miss a possible effect. By contrast, if the objective is to be as certain as possible that a particular treatment does have an effect then a more conservative test such as Scheffé’s test would be appropriate. Tukey’s HSD and the compromise method fall between the two extremes and the Student-Newman-Keuls (SNK) method is also a good choice. We would also recommend the use of Dunnett’s method when several treatments are being compared with a control mean. However, none of these methods is an effective substitute for an experiment designed specifically to make planned comparisons between the treatment means.

An illustrative example

As an example, we analysed data from our scenario using four different post-hoc tests, viz., Fisher’s PLSD, Tukey-Kramer HSD, the SNK procedure and Scheffé’s test (Table 1). In this example, the results are clear cut and all four tests lead to the same conclusion, i.e., both commercial kits are superior to the standard method but there is no difference between commercial kits A and B, thus confirming the results of the planned comparisons.

Conclusion

If data are analysed using ANOVA, and a significant F value obtained, a more detailed analysis of the differences between the treatment means will be required. The best option is to plan specific comparisons among the treatment means before the experiment is carried out and test them using ‘contrasts’. In some circumstances, post-hoc tests may be necessary and experimenters should think carefully which of the many tests available should be used. Different tests can lead to different conclusions and careful consideration as to the appropriate test should be given in each circumstance.

References

■ Abacus Concepts (1993) SuperANOVA. Abacus Concepts Inc., Berkeley CA 94704, USA.

■ Armstrong R A, Slade S V & Eperjesi F (2000) An introduction to analysis of variance (ANOVA) with special reference to clinical experiments in optometry. Ophthal Physiol Opt 20: 235-241.

■ Armstrong R A & Hilton A (2004) The use of analysis of variance (ANOVA) in applied microbiology. Microbiologist Vol. 5: No.4 18-21.

■ Hilton A & Armstrong R A (2006) Is one set of data more variable than another? Microbiologist Vol. 7: No.2 34-36 (June 2006).

Dr Anthony Hilton* and Dr Richard Armstrong**
*Pharmaceutical Sciences and **Vision Sciences, Aston University, Birmingham, UK
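The first stage of the analysis, the variance ratio test, can be reproduced from the yields in Table 1. What follows is only a minimal sketch in plain Python, not the software used for the article; the dictionary `yields` and the helper `one_way_anova` are hypothetical names introduced for illustration, and the sketch assumes the simple one-way design described in the scenario.

```python
# Plasmid yields transcribed from Table 1 (ten cultures per extraction method).
yields = {
    "Alkaline-SDS": [1.7, 2.0, 1.2, 0.5, 0.9, 1.0, 1.4, 2.7, 3.2, 0.7],
    "Kit A": [3.1, 2.2, 2.8, 4.8, 5.0, 1.9, 2.0, 3.6, 4.1, 4.7],
    "Kit B": [4.7, 3.5, 2.6, 4.3, 3.8, 4.5, 4.0, 1.9, 2.8, 4.6],
}


def one_way_anova(groups):
    """Return (ss_treat, df_treat, ss_error, df_error, f_ratio) for a one-way design."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_treat = 0.0
    ss_error = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        # Between-groups variation: group mean against the grand mean.
        ss_treat += len(values) * (mean - grand_mean) ** 2
        # Within-groups variation: each observation against its group mean.
        ss_error += sum((x - mean) ** 2 for x in values)
    df_treat = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_ratio = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_treat, df_treat, ss_error, df_error, f_ratio


ss_treat, df_treat, ss_error, df_error, f_ratio = one_way_anova(yields)
print(f"Treatments: SS = {ss_treat:.4f}, DF = {df_treat}, MS = {ss_treat / df_treat:.3f}")
print(f"Error:      SS = {ss_error:.3f}, DF = {df_error}, MS = {ss_error / df_error:.4f}")
print(f"F = {f_ratio:.2f}")
```

Run on these data, the sums of squares, degrees of freedom and mean squares agree with the ANOVA section of Table 1, and the ratio of the two mean squares comes out at about 13.2.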

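The two planned comparisons reported in Table 1 can be sketched in the same way. The integer coefficients used here (-2, +1, +1 for contrast 1 and 0, -1, +1 for contrast 2) are an assumption: the article does not state them, but they reproduce the tabulated estimates of 4.03 and 0.25. The function names are hypothetical, and the error mean square from the one-way ANOVA is used as the variance estimate for both contrasts.

```python
import math

# Plasmid yields transcribed from Table 1 (ten cultures per extraction method).
yields = {
    "Alkaline-SDS": [1.7, 2.0, 1.2, 0.5, 0.9, 1.0, 1.4, 2.7, 3.2, 0.7],
    "Kit A": [3.1, 2.2, 2.8, 4.8, 5.0, 1.9, 2.0, 3.6, 4.1, 4.7],
    "Kit B": [4.7, 3.5, 2.6, 4.3, 3.8, 4.5, 4.0, 1.9, 2.8, 4.6],
}


def error_mean_square(groups):
    """Pooled within-group variance (the ANOVA error mean square)."""
    ss_error = 0.0
    df_error = 0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_error += sum((x - mean) ** 2 for x in values)
        df_error += len(values) - 1
    return ss_error / df_error


def contrast_t(groups, coefficients):
    """Return (estimate, standard error, t) for one contrast among the group means."""
    ms_error = error_mean_square(groups)
    estimate = 0.0
    variance_factor = 0.0
    for name, values in groups.items():
        c = coefficients[name]
        estimate += c * sum(values) / len(values)
        variance_factor += c ** 2 / len(values)
    std_error = math.sqrt(ms_error * variance_factor)
    return estimate, std_error, estimate / std_error


# Assumed coefficients (not given in the article); each set sums to zero.
contrast_1 = {"Alkaline-SDS": -2, "Kit A": 1, "Kit B": 1}   # standard v both kits
contrast_2 = {"Alkaline-SDS": 0, "Kit A": -1, "Kit B": 1}   # kit A v kit B

# With equal N, two contrasts are orthogonal when this sum of products is zero.
cross_products = sum(contrast_1[name] * contrast_2[name] for name in yields)

est_1, se_1, t_1 = contrast_t(yields, contrast_1)
est_2, se_2, t_2 = contrast_t(yields, contrast_2)
print(f"Contrast 1: estimate = {est_1:.2f}, SE = {se_1:.2f}, t = {t_1:.3f}")
print(f"Contrast 2: estimate = {est_2:.2f}, SE = {se_2:.2f}, t = {t_2:.3f}")
print(f"Sum of cross-products = {cross_products}")
```

Each t value would be referred to Student’s ‘t’ with the 27 error degrees of freedom; the sketch stops short of computing P values because that needs a distribution function not available in the standard library.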

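The difference between a liberal and a conservative post-hoc test can be illustrated by comparing the critical differences that a pair of means must exceed under Fisher’s PLSD and under Scheffé’s test. This is a rough sketch for equal ‘N’ only: the critical values of t and F at the 5% level are assumptions read from standard tables rather than figures from the article, and the variable names are hypothetical.

```python
import math
from itertools import combinations

# Plasmid yields transcribed from Table 1 (ten cultures per extraction method).
yields = {
    "Alkaline-SDS": [1.7, 2.0, 1.2, 0.5, 0.9, 1.0, 1.4, 2.7, 3.2, 0.7],
    "Kit A": [3.1, 2.2, 2.8, 4.8, 5.0, 1.9, 2.0, 3.6, 4.1, 4.7],
    "Kit B": [4.7, 3.5, 2.6, 4.3, 3.8, 4.5, 4.0, 1.9, 2.8, 4.6],
}

# Assumed 5% critical values from standard tables (27 error DF, 3 groups).
T_CRIT_27 = 2.052      # two-tailed Student's t with 27 DF
F_CRIT_2_27 = 3.354    # F with 2 and 27 DF

n = 10                 # observations per group (equal N assumed)
k = len(yields)        # number of treatment groups

means = {name: sum(values) / n for name, values in yields.items()}
ss_error = sum((x - means[name]) ** 2 for name, values in yields.items() for x in values)
ms_error = ss_error / (k * (n - 1))

# Standard error of the difference between two means with equal N.
se_diff = math.sqrt(2 * ms_error / n)

critical = {
    "Fisher PLSD": T_CRIT_27 * se_diff,
    "Scheffe": math.sqrt((k - 1) * F_CRIT_2_27) * se_diff,
}

significant = {}
for method, threshold in critical.items():
    print(f"{method}: critical difference = {threshold:.3f}")
    for a, b in combinations(means, 2):
        difference = abs(means[a] - means[b])
        significant[(method, a, b)] = difference > threshold
        verdict = "significant" if difference > threshold else "not significant"
        print(f"  {a} v {b}: difference = {difference:.2f} ({verdict})")
```

On these data both methods separate the standard protocol from each kit and neither separates kit A from kit B, which is consistent with the conclusion drawn from Table 1; the larger Scheffé threshold shows how a more conservative test demands a bigger difference before declaring significance.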