Approaches
Author(s): Donald A. Jackson
Source: Ecology, Vol. 74, No. 8 (Dec., 1993), pp. 2204-2214
Published by: Ecological Society of America
Stable URL: http://www.jstor.org/stable/1939574 .
Accessed: 17/08/2013 15:45
DONALD A. JACKSON²
Department of Zoology, University of Toronto, Toronto, Ontario, Canada M5S 1A1

¹ Manuscript received 10 March 1992; revised 11 March 1993; accepted 15 March 1993.
² Present address: Department of Zoology, University of Western Ontario, London, Ontario, Canada N6A 5B7.

INTRODUCTION

Although ecologists have compared the results of various ordination methods (see Gauch 1982a, Pielou 1984, Digby and Kempton 1987, Minchin 1987 for comparisons), few guidelines exist to evaluate how many ordination axes should be considered nontrivial and interpretable. An implicit assumption in the use of ordination methods is that the experienced ecologist can separate meaningful patterns from random noise (i.e., ecologically meaningful information vs. sampling variation or measurement error; Gauch 1982b). An ability to distinguish "signal" from "noise" is essential and, in a statistical sense, these decisions provide "stopping rules." The failure to distinguish between signal and noise may lead to the rejection of useful information or the interpretation of ecologically meaningless information. In the former case, a loss of information may limit our understanding of ecological processes. In the latter case, erroneous conclusions may result as we would be interpreting essentially meaningless patterns (Jackson et al. 1992). Rexstad et al. (1988) questioned the value of multivariate analyses after their study demonstrated "significant" results from randomly collected data.

My study examines the issue of assessing multivariate data dimensionality using both heuristic and statistical approaches. I restrict this study to principal components analysis (PCA) because it represents the simplest and most commonly used multivariate method. In many instances, these results can be extrapolated to related multivariate techniques. I used a parallel analysis of field and simulated data to examine the implications of different: (1) degrees or strength of intervariable correlations; (2) numbers of variables; and (3) structure within correlation matrices (i.e., blocks of correlated variables that are uncorrelated with other variables). These three conditions are important factors in determining the success of a multivariate analysis. The types of data ecologists analyze often lead to different degrees of correlations. For example, variables like the morphology of organisms or lakes often show strongly correlated variables whereas correlations of the abundance of organisms may be weaker. Studies often differ in their ratio of the number of observations relative to variables in the analysis. Although some researchers recognize the importance of having a ratio of 3:1 or greater to provide a stable solution (see Grossman et al. 1991), it is not uncommon to find studies having a lower observation:variable ratio than
recommended. Often the implication for this latter situation remains unrecognized. Within data sets, there may be groups of variables (e.g., species) that are highly correlated with one another, but uncorrelated with other groups. The within- and between-group structure also contributes to substantial effects within principal components analysis. The use of simulated data permits data to be generated having an a priori underlying dimensionality such that the various methods may be compared relative to known values (i.e., the true number of nontrivial components; see Lambert et al. 1990 for a discussion).

METHODS

Data matrices: ecological

Three ecological data sets were used for comparisons with the simulated data. The first data set was based on four lake morphological variables from 40 lakes from south-central Ontario. These variables were associated strongly with one another and generally had correlations between 0.6 and 0.9. The second matrix included measurements on 12 chemical elements or compounds from the lakes. Correlations for this matrix ranged from near 0 to 0.7. The third matrix comprised abundance measurements on 32 benthic invertebrate taxa from the same 40 lakes. The correlations between these taxa varied between 0 and 0.7 with most correlations between 0.3 and 0.4. All data in these ecological data sets were transformed to linearize intervariable relationships and approximate normal distributions (see Jackson 1992 for details).

Simulated data matrices: uniform correlation

Normally distributed data were simulated to match the number of variables in the three ecological data sets. Population data matrices were constructed with 4, 12, and 32 variables and 1000 observations. Matrices were simulated having three levels of overall correlation structure. For each population, the correlations were uniformly generated to be 0, 0.3, or 0.8 for all off-diagonal correlations (i.e., R0, R3, and R8). This approach generated matrices having no interpretable dimensions (i.e., R0), a weak one-dimensional structure (R3), and a strong one-dimensional structure (R8). Analyses were done on three replicate samples of 40 observations each drawn from the population of 1000 individuals. This approach was used to assess the ability of the methods to correctly resolve the dimensionality of the population from the analysis of a sample. This parallels the same problems confronting ecologists when analyzing field data.

Structured correlation

For comparison, matrices were simulated with three-dimensional structure. These three-dimensional matrices also varied as to whether submatrices were uncorrelated or weakly correlated with one another. In the first set of simulations, 12-variable matrices were divided into groups of four variables each. Within-group correlations were equal to either 0.3 or 0.8 whereas the between-group correlations were equal to either 0 or 0.3 (Fig. 1 for correlation matrices from simulations S-I to S-IV). This approach was used to simulate S-I to S-III in order to examine the effect of differing degrees of strength in correlation structure. To study the effect of groups having different numbers of constituent variables, the matrices for S-IV used groups containing 5, 4, and 3 variables, respectively. Within-group correlations were set to 0.8 and between-group correlations to 0. From each population, three replicates of 40 observations were sampled.

The simulation resulted in a 3 × 3 × 3 design based on the number of variables, level of correlation, and replication for the uniform matrices and 4 matrices × 3 replicates for the structured-correlation matrices. Matrix names are coded such that the first number indicates the number of variables and the subsequent alphanumeric indicates the degree of correlation within the matrix. For example, 12R3 is a 12-variable matrix with all intervariable correlations equal to 0.3.

[FIG. 1. Correlation matrices for simulations S-I, S-II, S-III, and S-IV. Figure not reproduced.]

Statistical analysis

Principal components analyses were conducted using the PRINCOMP procedure in SAS (SAS 1989).

Methods of component assessment: heuristic approaches.

1. Kaiser-Guttman. The most common stopping rule in principal components analysis (PCA) is based on the average value of the eigenvalues (i.e., the Kaiser-Guttman criterion; Guttman 1954, Cliff 1988; H. Kaiser, unpublished manuscript). Because variables are often measured in different units, most ecologists use a correlation matrix in PCA, thereby giving each variable equal weight in the analysis. As a result, the sum of the eigenvalues equals the number of variables. In the Kaiser-Guttman method, eigenvalues greater than the average eigenvalue (i.e., λ > 1.0) are retained because these axes summarize more information than any single original variable. Therefore, only components with λ > 1.0 are interpreted. Unfortunately, a PCA of randomly generated, uncorrelated data will produce eigenvalues exceeding one. As a result, this method has been criticized (e.g., Karr and Martin 1981, Stauffer et al. 1985, Rexstad et al. 1986, 1988, Grossman et al. 1991); however, the Kaiser-Guttman criterion remains the most popular stopping rule in ecology.

2. Bootstrapped Kaiser-Guttman. The bootstrap resampling technique (Efron 1979) was proposed as a means of determining the interpretability of eigenvalues by Lambert et al. (1990). They argued that the Kaiser-Guttman criterion was arbitrary and it ignored error associated with each λ due to sampling. Consequently, eigenvalues of 0.99 would be discarded, whereas an eigenvalue of 1.01 would be retained even though an eigenvalue of 1.01 may have 95% confidence limits ranging from 0.9 to 1.1. As a result, they proposed that the bootstrap should be used to determine how many eigenvalues had confidence limits encompassing the 1.0 criterion (i.e., a bootstrap Kaiser-Guttman approach).

3. Scree plot. Another common method (although used infrequently by ecologists; e.g., Zebra and Collins [1992]) is the scree plot. To apply the scree method, one plots the value of each successive eigenvalue against the rank order (Fig. 2; the log of the eigenvalues also can be used with covariance-based PCAs). The smaller eigenvalues, representing random variation, tend to lie along a straight line. The point where the first few eigenvalues depart from the line distinguishes the "interpretable" and trivial components. Cattell (1966) originally proposed that points to the left of the straight-line segment should be considered important (i.e., three components in the structured data of Fig. 2), but subsequently concluded (Cattell and Vogelmann 1977) that the first eigenvalue to the right of this point should be included also (i.e., four interpretable components in Fig. 2). Often the scree approach is complicated by either the lack of any obvious break or the possibility of multiple break points.

[FIG. 2. Eigenvalues from a principal components analysis of a 12-variable data set of randomly generated, uncorrelated data and for a data set with underlying structure. Horizontal axis: Component Number. Series: Structured Data, Random Data. Figure not reproduced.]

Horn (1965, Horn and Engstrom 1979) recognized that with matrices composed of random data, the scree plot would show a stable negative slope. Horn argued that distinguishing eigenvalues in scree plots remained quite arbitrary. As a result, he proposed a modification to the scree plot (Horn 1965). After analyzing a given data set and plotting the eigenvalues in a traditional scree plot, numerous matrices of rank equal to the observed data, but with uncorrelated variables, are generated and eigenvalues are calculated. These eigenvalues from the random data are tabulated and the mean values plotted on the scree plot of the original data. The point where the two lines cross indicates the
maximum limit where eigenvalues are considered interpretable. Further variations of this method have included regression or Monte Carlo approaches (e.g., Allen and Hubbard 1986, Lautenschlager 1989).

4. Broken-stick. Frontier (1976) proposed a broken-stick method that is based on eigenvalues from random data. Frontier's model assumes that if the total variance (i.e., sum of the eigenvalues) is divided randomly amongst the various components, then the expected distribution of the eigenvalues will follow a broken-stick distribution (i.e., the random data in Fig. 2). Observed eigenvalues are considered interpretable if they exceed eigenvalues generated by the broken-stick model. Frontier (1976) and Legendre and Legendre (1983) provide a table of eigenvalues based on the broken-stick distribution, but the solution is easily calculated as:

$$b_k = \sum_{i=k}^{p} \frac{1}{i},$$

where p is number of variables and b_k is the size of the eigenvalue for the kth component under the broken-stick model.

5. Proportion of total variance. Another simple criterion for estimating the number of nontrivial components is to include all components up to some arbitrary proportion of the total variance. This method typically includes components comprising 95% of the total variance. Although this method is advocated by some statisticians (Jolliffe 1986), Jackson (1991) strongly recommended against its application as being unfounded and unreliable.

Statistical approaches. Some data analysts retain components with significant correlations (e.g., P < .05) between the component scores and the original vari-

[...]

variance matrix and several studies recommend its use only in covariance-based analyses (Dillon and Goldstein 1984, Morrison 1990, Grossman et al. 1991, Jackson 1991). However, the test can be used with a correlation matrix where such results are considered to be conservative estimates of the number of nontrivial components (Pimentel 1979, Kendall 1980).

7. Bartlett's test of the equality of λ₁. Bartlett (1954) also developed a statistical test of whether the first eigenvalue of a correlation matrix is equal to the remaining set of eigenvalues (i.e., correlation matrix homogeneity). A modified Bartlett's test (Box 1949, Krzanowski 1988) is calculated as

$$\chi^2 = -\left[n - \tfrac{1}{6}(2p + 11)\right] \ln|R|,$$

where |R| is the determinant of the correlation matrix, and the test has p(p - 1)/2 degrees of freedom. The test is limited because it only examines the first eigenvalue. However, it provides an assessment of the overall PCA (i.e., if the null hypothesis is not rejected, it is pointless to interpret the PCA).

8. Lawley's test of λ₂. Lawley (1956, 1963) proposed a method to test for the equality of the p - 1 eigenvalues (i.e., all but the first eigenvalue). It is based on the following

$$\chi^2 = \ldots \left[\sum\sum_{i<j} (r_{ij} - \bar{r})^2 - \mu \sum_{k=1}^{p} (\bar{r}_k - \bar{r})^2\right],$$

where r_ij is the correlation between variable i and variable j and

$$\bar{r} = \frac{2}{p(p-1)} \sum\sum_{i<j} r_{ij}$$
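As a rough illustration of the modified Bartlett statistic in item 7, the following Python sketch computes χ² = -[n - (2p + 11)/6] ln|R| and its p(p - 1)/2 degrees of freedom from a data matrix. The function name `bartlett_statistic`, the NumPy-based workflow, and the two toy data sets are assumptions made for this illustration only; the analyses in the study itself were run in SAS.

```python
import numpy as np


def bartlett_statistic(data):
    """Return (chi2, df) for the modified Bartlett test on the correlation matrix.

    data: array of shape (n, p); rows are observations, columns are variables.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet returns (sign, ln|R|) and avoids underflow when |R| is tiny.
    _, log_det = np.linalg.slogdet(corr)
    chi2 = -(n - (2 * p + 11) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical toy data: 40 observations on 4 variables.
rng = np.random.default_rng(1)
uncorrelated = rng.standard_normal((40, 4))
shared = rng.standard_normal((40, 1))
# A shared factor plus independent noise gives a population correlation of 0.8.
correlated = shared + 0.5 * rng.standard_normal((40, 4))

chi2_null, df = bartlett_statistic(uncorrelated)
chi2_alt, _ = bartlett_statistic(correlated)
print(f"df = {df}, chi2 (uncorrelated) = {chi2_null:.2f}, chi2 (correlated) = {chi2_alt:.2f}")
```

The statistic would then be compared with a χ² distribution on `df` degrees of freedom; with four variables, df = 6 and the 5% critical value is 12.59.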
ues were considered to be indistinguishable from one another. However, if the ranges did not overlap, the eigenvalues were assumed to be different. This latter condition was considered to represent the break-point between "meaningful" or nontrivial components and those associated with sampling and random noise.

Similarly, the eigenvector coefficients were evaluated using a bootstrap approach. Coefficients that did not differ significantly from zero were categorized as trivial or nonsignificant. However, if zero fell outside the 95% confidence limits, then the coefficient was considered to be relatively stable and informative. Only bootstrapped components having two or more coefficients different from zero were considered meaningful. Components with only a single nonzero coefficient represented only a single variable, hence the component does not provide a true multivariate summary.

As a means of evaluating the overall similarity among the different approaches with the different data sets, a multivariate summary was done. The number of nontrivial components for each method from each data set was used as an input matrix. For example, the Kaiser-Guttman method had a value for each of 4R0-A, 4R0-B, ... S-IV-C. The number of dimensions that were simulated for each data set was included as an additional observation. A Euclidean distance matrix was calculated between the methods and a principal coordinates analysis done on the distance matrix to graphically integrate the differences among approaches across all the data sets (e.g., see Jackson and Somers 1991).

RESULTS

1. Kaiser-Guttman approach (λ > 1.0). For matrices with correlations of R0 or R3, the Kaiser-Guttman method retained ≈50% of the components for the 4- and 12-variable matrices and 30-40% of the components in the 32-variable matrices (Tables 1-3). For each R8 matrix, only one eigenvalue exceeded 1.0, indicating a single interpretable gradient. The Kaiser-Guttman method of interpreting λs > 1.0 indicated all PCAs from S-I to S-IV contained three interpretable components (i.e., within each analysis there were three eigenvalues exceeding 1.0; Table 4). For the S-I matrices the approach indicated retaining five components although only three dimensions were constructed in the simulations.

2. Bootstrapped Kaiser-Guttman. The bootstrap of the eigenvalues using the Kaiser-Guttman approach resulted in only one component being considered nontrivial with each of the 4-variable matrices (i.e., 4R0, 4R3, and 4R8). Four components were retained with 12R0 PCAs and 2-3 components for the 12R3 matrices. The 32R0 matrices had 9-10 components retained and 8 components for the 32R3 matrices. For all R8 matrices (i.e., 4R8, 12R8, and 32R8), only the first component was considered nontrivial using this method. In the structured matrices, 3-4 components were identified as nontrivial in the low-correlation S-II matrices, whereas only two components were retained from S-III, and three components from the other analyses (i.e., S-I and S-IV).

Both versions of the Kaiser-Guttman approach indicated a single interpretable dimension with the 4-variable matrix of lake morphometry. The PCA based on the 12-variable matrix of water chemistry revealed three nontrivial components with λs > 1.0 retained. However, the bootstrapped evaluation suggested that only the first eigenvalue was significantly greater than 1.0. Both the traditional and bootstrapped approaches indicated that nine and eight components, respectively, were nontrivial in the 32-variable matrix of benthic invertebrates, similar to results for the 32R0 and 32R3 matrices.

3. Scree plot. Results from the scree plot based on 4-variable matrices were difficult to interpret. In some cases, trends were apparent, but in other cases it is difficult to discern any pattern in the plot because only four points were available. Where a trend was apparent, the approach advocated by Cattell and Vogelmann (1977) suggested that two components were nontrivial. With the 12R0 analyses, the scree indicated 4-7 components should be interpretable. The number of components dropped to 2-3 when 12R3 matrices were used and 2 components would be retained with 12R8 matrices (Fig. 3). In the 32R0 analyses, the scree plot results suggested from 5 to 15 components should be interpreted, 2-6 components with 32R3, and 2 components with 32R8 analyses.

For the structured matrices, the scree plot suggested that there were four nontrivial components in the S-I, S-III, and S-IV matrices. For PCAs based on S-II, scree results indicated that between two and four components would be considered interpretable. As with the simulated 4-variable matrices, no estimate of the number of dimensions for the lake morphometry data could be made because no obvious trend was apparent. With the water chemistry data, there were three interpretable components, but a second break is evident (Fig. 3). If this latter point was considered, then a total of five components were nontrivial.

4. Broken-stick model. The broken-stick method correctly identified the dimensionality of all uniform-correlation matrices (a single exception being one of the replicates from 4R3). For the R0 matrices, the method indicated that the underlying dimensionality was 0, and one component as nontrivial with R3 or R8 matrices. This method revealed three interpretable components for matrices from S-I and S-IV, a single component from S-II, and 2-3 components from S-III. A single component would be retained from the lake morphometry data, three from the water chemistry data, and two components from the benthic invertebrate data. (The application of the broken-stick model to the eigenvalues presented in Rexstad et al.'s [1988] criticism of PCA showed no nontrivial components in contrast to 7 of the 15 being considered useful from the Kaiser-Guttman method.)

5. 95% of the total variance. The approach of retaining components until 95% of the total variance was achieved would result in all components being interpreted for the 4R0 or 4R3 analyses, and 2-3 compo-

TABLE 1. Number of nontrivial components indicated by various methods. Simulated data matrices have four variables and 40 observations with uniform correlations as follows: 4R0 has uniform correlation structure of r = 0, 4R3 has uniform correlation structure of r = 0.3, and 4R8 has uniform correlation structure of r = 0.8. The letters A-C indicate replicates drawn from a simulated population having that correlation structure. The morphology data set comprised four lake morphometric variables.

TABLE 2. Number of nontrivial components indicated by various methods. Data matrices having 12 variables and 40 observations with uniform correlations as follows: 12R0 has uniform correlation structure of r = 0, 12R3 has uniform correlation structure of r = 0.3, and 12R8 has uniform correlation structure of r = 0.8. The letters A-C indicate replicates drawn from a simulated population having that correlation structure. The chemistry data set comprised 12 lake water chemistry variables.
TABLE 3. Number of nontrivial components indicated by various methods. Data matrices having 32 variables and 40
observations with uniform correlations as follows: 32R0 has uniform correlation structure of r = 0, 32R3 has uniform
correlation structure of r = 0.3, and 32R8 has uniform correlation structure of r = 0.8. The letters A-C indicate replicates
drawn from a simulated population having that correlation structure. The benthic invertebrate data set comprised abundances for 32 lake benthic invertebrate taxa.
TABLE 4. Number of nontrivial components indicated by various methods. Patterned data matrices having 12 variables and
40 observations. The intervariable correlations were generated following Fig. 1. Letters A-C represent replicate samples
drawn from each simulated population.
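To make the contrast between the Kaiser-Guttman and broken-stick rules concrete, the sketch below applies both to simulated 12-variable samples with uniform correlations of 0 and 0.8. It is a minimal Python illustration, not the procedure used in the study: the helper names are hypothetical, samples of 40 observations are drawn directly from a multivariate normal distribution rather than from a 1000-observation population, and the broken-stick count stops at the first eigenvalue that fails to exceed its expectation b_k (the sum of 1/i for i = k to p), which is one possible reading of the rule.

```python
import numpy as np


def broken_stick(p):
    """Expected eigenvalues b_k = sum_{i=k}^{p} 1/i for k = 1..p."""
    reciprocals = 1.0 / np.arange(1, p + 1)
    # Reverse cumulative sum gives the tail sums 1/k + 1/(k+1) + ... + 1/p.
    return np.cumsum(reciprocals[::-1])[::-1]


def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix, sorted from largest to smallest."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def count_kaiser_guttman(eigenvalues):
    """Number of eigenvalues greater than the average eigenvalue of 1.0."""
    return int(np.sum(eigenvalues > 1.0))


def count_broken_stick(eigenvalues):
    """Number of leading eigenvalues that exceed the broken-stick expectation."""
    exceeds = eigenvalues > broken_stick(len(eigenvalues))
    if exceeds.all():
        return len(eigenvalues)
    return int(np.argmin(exceeds))  # index of the first failure


def sample_uniform_correlation(p, r, n, rng):
    """Draw n observations on p normal variables with all correlations equal to r."""
    cov = np.full((p, p), float(r))
    np.fill_diagonal(cov, 1.0)
    return rng.multivariate_normal(np.zeros(p), cov, size=n)


rng = np.random.default_rng(42)
results = {}
for label, r in (("12R0", 0.0), ("12R8", 0.8)):
    eig = correlation_eigenvalues(sample_uniform_correlation(12, r, 40, rng))
    results[label] = (count_kaiser_guttman(eig), count_broken_stick(eig))
    print(label, "Kaiser-Guttman:", results[label][0], "broken-stick:", results[label][1])
```

Because the broken-stick expectations sum to p, they can be compared directly with the eigenvalues of a correlation matrix. In a sketch like this one, the Kaiser-Guttman count for uncorrelated data would be expected to exceed the broken-stick count, in line with the pattern reported for the R0 matrices.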
simplicity of calculation and accurate evaluation of dimensionality relative to the other statistical approaches.

The 95%-variance-threshold method provided unsatisfactory results. Although the choice at 95% of the total variance is relatively high, any level is arbitrary. No matter what cumulative percentage level is selected, this approach does not appear promising because there is the high risk that many of the components that are retained will summarize noise or nontrivial components will not be included.

Bartlett's test of sphericity correctly identified the dimensionality in many of the data sets, but in some cases indicated up to 11 significant eigenvalues although only a single dimension was simulated (Table 3). Despite the statement by Kendall (1980) that this test is overly conservative when applied to correlation matrices, it appears to correctly identify the number of dimensions with many data sets, but it is too liberal a test with matrices having a low observation-to-variable ratio (e.g., less than the 3:1 ratio advocated by Grossman et al. 1991). With the ecological data, the test also retained large numbers of the components, i.e., 19 of 32 components were considered significant with the benthic invertebrate data.

Bartlett's approach to test for homogeneity of the correlation matrix (i.e., whether the first λ equalled all others) appeared to identify the correct minimal dimensionality except with the 32-variable analyses. Here the method indicated significant structure with random, uncorrelated data. Likewise, Lawley's test consistently overestimated the dimensionality of the 12- and 32-variable matrices having uniform correlations. Because the test is designed to evaluate only whether λ₂ is the same as successive eigenvalues, the method is rather limited. As a result of its limited utility and relatively poor performance in this set of comparisons, the method is not recommended.

The combination of testing for overlap in ranges of bootstrapped eigenvalues and for eigenvector coefficients differing from 0 appears more promising. With simple matrices either lacking structure or having a single dimension, both approaches consistently revealed the underlying dimensionality of the simulation. However, with patterned matrices, the results were less reliable for either approach individually. Both methods worked well with S-I and S-IV matrices having strong inter-variable correlations. However, in S-II where there were three underlying, but weak dimensions, there were no differences between the bootstrapped eigenvalues. The eigenvector coefficient approach produced inconsistent results and frequently underestimated the correct dimensionality. With matrices from S-III, both methods correctly identified two dimensions, but from different replicated matrices.

The poor showing of the eigenvector approach for S-II and S-III is easily explained and similar to situations discussed elsewhere (Oksanen 1988). In both simulations, three dimensions were created by having three sets of four variables, each set having identical correlations. Due to this condition and chance selection of observations in the bootstrap, any specific dimension could be expressed on the first component of one PCA, but on the second or third component from another PCA. This is similar to the re-ordering of components or solution instability found by Oksanen (1988) with detrended correspondence analysis. When the order of expression of the underlying dimensions varies between components for different analyses, the eigenvector coefficient approach will fail. With these same data characteristics, the first three eigenvalues also overlap in their 95% confidence limits, but are significantly different from the fourth eigenvalue. The problem with the eigenvector coefficients is particularly evident when the initial correlation structure is weak (e.g., S-II). However, if the dimensions differ in: (1) the strength of correlation structure (i.e., several high correlations vs. low or medium correlations); (2) the number of constituent variables; or (3) have strong correlations, this method provides more accurate results.

Overall, it appears that the combination of these two approaches, i.e., the bootstrapped eigenvalue and eigenvector coefficients, provides a better measure of the dimensionality than either approach alone. The maximum value obtained with either approach was close to the true dimensionality, except with the S-II data.

An additional consideration of the bootstrapped eigenvector method is that it assists with the evaluation of whether or not each variable contributes to a given component. If a specific variable is not significantly weighted on any nontrivial component, then that variable could be removed from the analysis. For example, in the PCA of the lake morphology data, three variables had eigenvector coefficients that differed from zero. However, lake volume coefficients included 0 in the 95% confidence limits on each component. Therefore, lake volume did not contribute to the analysis and added little information to the PCA.

With the use of any of the methods employing formal statistical tests, e.g., Bartlett's test of sphericity, it is important to recognize the increased probability of rejecting the null hypothesis when many components are evaluated sequentially. When such tests are used, researchers may remove this increased risk of a Type I error by employing some form of a correction for multiple comparisons such as Bonferroni's adjustment.

The most promising approaches to component evaluation are the broken-stick model and the bootstrapped eigenvalue-eigenvector method. The broken-stick approach has the advantage of being simple to calculate. Within the scope of this study, both methods led to similar conclusions about the dimensionality of the simulated data sets. The matrices simulated in this study all represented relatively well-conditioned data (e.g., from a normal distribution, independent sampling). However, many data used in ecological studies
do not meet formal assumptions of classical statistical approaches. The extension of this comparison to simulated data varying in departure from the statistical "ideal" would be of considerable value (e.g., Davis 1977). Approaches such as the bootstrapped eigenvalue-eigenvector method would likely prove more useful with such data conditions than the relatively sensitive methods based on idealized distributions and formal tests (e.g., both of Bartlett's and Lawley's methods).

ACKNOWLEDGMENTS

This study was greatly assisted by the critical comments of K. P. Burnham, M. Dennison, R. H. Green, H. H. Harvey, K. M. Somers, and D. F. Stauffer. Funding was provided by a Natural Sciences and Engineering Research Council (NSERC) Graduate Scholarship and Ontario Graduate Scholarship to D. A. Jackson, an NSERC Operating grant to H. H. Harvey, and Ontario Ministry of Environment and Ontario Renewable Resources Research Grants to H. H. Harvey and D. A. Jackson.

LITERATURE CITED

Allen, S. J., and R. Hubbard. 1986. Regression equations for the latent roots of random data correlation matrices with unities on the diagonal. Multivariate Behavioral Research 21:393-398.
Bartlett, M. S. 1950. Tests of significance in factor analysis. British Journal of Psychology (Statistical Section) 3:77-85.
Bartlett, M. S. 1954. A note on the multiplying factors for various χ² approximation. Journal of the Royal Statistical Society, Series B 16:296-298.
Box, G. E. P. 1949. A general distribution theory for a class of likelihood criteria. Biometrika 36:317-346.
Cattell, R. B. 1966. The scree test for the number of factors. Journal of Multivariate Behavioral Research 1:245-276.
Cattell, R. B., and S. Vogelmann. 1977. A comprehensive trial of the scree and KG criteria for determining the number of factors. Multivariate Behavioral Research 12:289-325.
Cliff, N. 1988. The eigenvalues-greater-than-one rule and the reliability of components. Psychological Bulletin 103:276-279.
Cooley, W. W., and P. R. Lohnes. 1971. Multivariate data analysis. John Wiley & Sons, New York, New York, USA.
Davis, A. W. 1977. Asymptotic theory for principal components analysis: non-normal case. Australian Journal of Statistics 19:206-212.
Digby, P. G. N., and R. A. Kempton. 1987. Multivariate analysis of ecological communities. Chapman and Hall, New York, New York, USA.
Dillon, W. R., and M. Goldstein. 1984. Multivariate analysis: methods and applications. John Wiley & Sons, New York, New York, USA.
Dudziński, M. L., J. T. Chmura, and C. B. H. Edwards. 1975. Repeatability of principal components in samples: normal and non-normal data sets compared. Multivariate Behav-
Grossman, G. D., D. M. Nickerson, and M. C. Freeman. 1991. Principal component analyses of assemblage structure data: utility of tests based on eigenvalues. Ecology 72:341-347.
Guttman, L. 1954. Some necessary conditions for common factor analysis. Psychometrika 19:149-161.
Horn, J. L. 1965. A rationale and test for the number of factors in factor analysis. Psychometrika 30:179-185.
Horn, J. L., and R. Engstrom. 1979. Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research 14:283-300.
Jackson, D. A. 1992. Fish and benthic invertebrates: analytical approaches and community-environment relationships. Dissertation. University of Toronto, Toronto, Ontario, Canada.
Jackson, D. A., and K. M. Somers. 1991. Putting things in order: the ups and downs of detrended correspondence analysis. American Naturalist 137:704-712.
Jackson, D. A., K. M. Somers, and H. H. Harvey. 1992. Null models and fish communities: evidence of nonrandom patterns. American Naturalist 139:930-951.
Jackson, J. E. 1991. A user's guide to principal components. John Wiley & Sons, New York, New York, USA.
Jolliffe, I. T. 1972. Discarding variables in a principal components analysis. I. Artificial data. Applied Statistics 23:160-173.
Jolliffe, I. T. 1986. Principal components analysis. Springer-Verlag, New York, New York, USA.
Karr, J. R., and T. E. Martin. 1981. Random numbers and principal components: further searches for the unicorn. Pages 20-24 in D. E. Capen, editor. The use of multivariate statistics in studies of wildlife habitat. United States Forest Service General Technical Report RM-87.
Kendall, M. 1980. Multivariate analysis. Second edition. Charles Griffin, London, England.
Krzanowski, W. J. 1983. Cross-validatory choice in principal components analysis: some sampling results. Journal of Statistical Computation and Simulation 18:299-314.
Krzanowski, W. J. 1988. Principles of multivariate analysis: a user's perspective. Oxford University Press, London, England.
Lambert, Z. V., A. R. Wildt, and R. M. Durand. 1990. Assessing sampling variation relative to number-of-factors criteria. Educational and Psychological Measurement 50:33-49.
Lautenschlager, G. J. 1989. A comparison of alternatives to conducting Monte Carlo analyses for determining parallel analysis criteria. Multivariate Behavioral Research 24:365-395.
Lawley, D. N. 1956. Tests of significance for the latent roots of covariance and correlation matrices. Biometrika 43:128-136.
Lawley, D. N. 1963. On testing a set of correlation coefficients for equality. Annals of Mathematical Statistics 34:149-151.
Legendre, L., and P. Legendre. 1983. Numerical ecology. Elsevier, Amsterdam, The Netherlands.
Morrison, D. F. 1990. Multivariate statistical methods. Third edition. McGraw-Hill, New York, New York, USA.
ioral Research 10: 109-118. Oksanen. J. 1988. A note on the occasional instability of
Efron, B. 1979. Bootstrap methods: another look at the detrending in correspondence analysis. Vegetatio 74:29-32.
jackknife. Annals of Statistics 7:1-26. - Orloci, L. 1978. Multivariate analysis in vegetation re-
Frontier, S. 1976. Etude de la decroissance des valeurs propres search. Second edition. Dr. W. Junk, The Hague, The Neth-
dans une analyze en composantes principales: comparison erlands.
avec le module de baton bris6. Journal of Experimental Pielou, E. C. 1984. The interpretation of ecological data.
Marine Biology and Ecology 25:67-75. John Wiley & Sons, New York, New York, USA.
Gauch, H. G., Jr. 1982a. Multivariate analysis in com- Pimentel, R. A. 1979. Morphometrics: the multivariate
munity ecology. Cambridge University Press, New York, analysis of biological data. Kendall-Hunt, Dubuque, Iowa,
New York, USA. USA.
1982b. Noise reduction by eigenvector ordination. Rexstad, E. A., D. D. Miller, C. H. Flather, E. M. Anderson,
Ecology 63:1643-1649. J. W. Hupp, and D. R. Anderson. 1988. Questionable
multivariate statistical inference in wildlife habitat and Stauffer, D. F., E. 0. Gordon, and R. K. Steinhorst. 1985.
community studies. Journal of Wildlife Management 52: A comparison of principal components from real and ran-
794-798. dom data. Ecology 66:1693-1698.
Rexstad, E. A., D. D. Miller, C. H. Flather, E. M. Anderson, Taylor, J. 1990. Questionable multivariate statistical infer-
J. W. Hupp, and D. R. Anderson. 1990. Questionable ence in wildlife habitat and community studies: a comment.
multivariate statistical inference in wildlife habitat and Journal of Wildlife Management 54:186-189.
community studies: a reply. Journal of Wildlife Manage- Zebra, K. E., and J. P. Collins. 1992. Spatial heterogeneity
ment 54:189-193. and individual variation in diet of an aquatic predator.
SAS. 1989. SAS/STAT user's guide. Version 6. SAS Insti- Ecology 73:268-279.
tute, Cary, North Carolina, USA.