Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
ABSTRACT: The method of choice for multivariate representation of community structure is often
non-metric multi-dimensional scaling (MDS). This has great flexibility in accomn~odatingbiologically
relevant (i.e. non correlation-based) definitions of similarity In species composition of 2 samples, and
in preserving the rank-order relations amongst those similarities in the placing of samples in a n
ordination. Correlation-based techniques (such a s Canonical Correlation) a r e then inappropriate in
linking the observed biotic structure to measured environmental variables; a more natural approach
is simply to compare separate sample ordinations from biotic and abiotic variables and choose that
subset of environmental variables which provides a good match between the 2 configurations. In fact,
the fundamental constructs here are not the ordination plots but the (rank) similarity matrices which
underlie them: a suitable measure of agreement between 2 such matrices is therefore proposed and
used to define an optimal subset of environmental variables w h ~ c h'best explains' the biotic structure.
This simple technique is illustrated wlth 3 data sets, from studles of macrobenthic, meiobenthic and
diatom communities in estuarine and coastal waters.
0 Inter-Research 1993
206 Mar. Ecol. Prog. Ser. 92: 205-219, 1993
The intention of this paper is to propose a simple remainder are strongly non-monotonic (unirnodal) over
method for addressing such questions and to examine different ranges of one or several abiotic variables. In
its performance for several literature data sets. contrast, 'classical' statistical methods such as Canonical
Correlation (e.g. Mardia et al. 1979) would compute
correlations, and thus assume linear relationships,
METHOD among and between the biotic and abiotic variables.
This is quite unrealistic.
Concept How then should the 'pattern matching' be under-
taken? One possibility is to use a Procustes analysis
Our approach relies on the premise that pairs of (Gower 1971) on the separate ordinations for biota and
samples which are rather similar in terms of a suite of environmental variables. This is a technique for opti-
physico-chemical variables would be expected to have mal matching of 2 different configurations of points
rather similar species composition, provided the rele- with the same set of labels, using only rotation, rever-
vant variables determining community structure (and sal and shrinking of one plot in relation to the other, i.e.
only these variables) have been included in the analy- preserving the relative position of the points within
sis. Thus an ordination of these abiotic variables, rep- each plot. Their goodness-of-fit is then assessed by
resenting the mutual environmental similarities among some criterion involving 'squared distance apart' of the
samples, should closely resemble the ordination of 2 configurations. However, this has the disadvantage
samples based on the biota. Selecting different combi- that it operates on ordinations in a specified number
nations of the full environmental vaiiable set should of dimensions. Thus the particuiar combination of
allow us to determine an 'optimal' match of the sepa- environmental variables giving an optimal match of
rate biotic and abiotic ordinations. The omission of biotic and abiotic patterns may vary, depending on
a key determinant will degrade the match, as will whether we view the sample relationships in 2-
the inclusion of environmental variables that differ or 3-dimensional ordinations. This is undesirable. The
markedly between the samples but have no effect complex data typically encountered scarcely ever
on community composition. admits of a perfect representation (zero stress) in a
Two related advantages are conferred by separating specified number of dimensions, so choice of dimen-
the construction of environmental and biotic ordina- sionality for convenient viewing of the patterns is
t i o n ~ ,only linking the two by a 'pattern matching' inevitably somewhat arbitrary. A more objective crite-
stage. rion would be based on the triangular similarity (or
(1) Separate definitions of among-sample similarity dissimilarity) matrices underlying both ordinations.
can be used for each. As noted earlier, species data The definition of similarity of 2 samples will have been
requires a rather careful formulation of similarity, with carefully selected, in an appropriately different way for
appropriate trade-offs between the contributions of biotic and abiotic variables, and it can therefore be
common and rarer species, and independence of the argued that the similarity matrices are the funda-
chosen coefficient from joint absences (as Field et al. mental constructs, representing all that is known about
1982 put it: 'estuarine and abyssal samples are [no sample relationships.
more] similar because both lack outer shelf species').
A typical species abundance matrix is dominated by Coefficient
zero counts and this makes correlation (or Euclidean
distance) a poor basis for assessing similarity. By con- How then should one define a coefficient of agree-
trast, physico-chemical data matrices can usually be ment between the biotic and abiotic (dis)similarity
handled by conventional multivariate statistics. The matrices? An obvious candidate is a rank correlation
variables can often be transformed to approximate computed between all the elements of the 2 triangu-
normality - sometimes with different transformations lar matrices. A rank coefficient is an appealing choice
needed for separate classes of variables - and corre- because of the differing units and modes of construc-
lation-based definitions of similanty are then to be tion of the biotic and abiotic matrices (e.g. Bray-
preferred. Curtis dissimilarities and Euclidean distances respec-
(2) Few constraints are implied on the nature of the tively); their common denominator is the relative
link between biota and environment. It is clear that the ordenng of dissimilarities within each matrix. The
original premise of this section could be met under a argument for using only these ranks is further
typical mix of conditions in which the numbers (or bio- strengthened by the knowledge that the displays of
mass) of some species are linearly related to an sample relationships, through accurate (low stress)
environmental gradient, other species are non-linearly MDS ordinations, are effectively functions only of
but still monotonically related to the gradient, and the the rank (dis)similarities.
Clarke & A~nsworth:Linking community structure to environment 207
We have studied the performance in this context similarities but 90 between-group terms; of impor-
of standard rank correlation coefficients, such as tance here would seem to be a search for abiotic vari-
Spearman's p, or Kendall's T (Kendall 1970), for ables which discriminate the same 5 clusters - the
several literature data sets, including the illustrations precise disposition of the groups in relation to each
of the following section. Spearman's coefficient is other then seems of less significance, though this
defined a s feature would tend to dominate the unweighted co-
efficient. The point is seen again in the first example
of the next section.
Down-weighting of the larger ranks in E q . 1 can be
achieved by adding a denominator term, inside the
where (r,; i = 1, ..., N ] = ranks of all the sample simi- summation, which is a n increasing (and symmetric)
larities calculated using the biotic data and (si;i = function of ri and S;. Amongst a number of possibilities
1, ..., N }= ranks of sample similarities defined from (inevitably), the algebraically simplest choice of
abiotic data. This correlation is computed by first 'un- weighting term is (r, + S,), giving coefficient
peeling' the 2 triangular similarity (or dissimilarity)
matrices in the same way, to give 2 vectors of length
N = n ( n - 1)/2, where n = number of samples. Each
vector is then replaced by its ranks (the integers 1 to N,
in some order) and Eq. 1 applied. where c i s chosen so that p, (like p,) takes values in the
Some points of clarification a r e needed here. Firstly, range -1 to + l . More (or less) severe weighting could
similarities a n d dissimilarities a r e always defined to be be achieved by higher (or lower) powers of (ri + S,) in
exactly complementary (whatever the coefficient used), the denominator.
in the sense that the rank order of similarities is the Algebraic manipulation shows that
inverse of the rank order of dissimilarities. Secondly.
high similarities correspond to low rank similarities,
following the usual convention that the highest similar-
ity is given a rank of 1, the next highest a rank of 2, and with extreme values p = -1 and +I corresponding
so on. Thirdly, note that the rank similarities (r,] (or { S , ] ) respectively to the cases where the 2 sets of ranks a r e
are not a set of mutually independent variables, since in complete opposition or complete agreement. The
they a r e based on a large number ( N ) of strongly inter- former case is unlikely to be attainable in practice be-
dependent similarity calculations. Standard statistical cause of the constraints inherent in a similarity matrix;
tests and confidence intervals for p, or T,derived for the p will take values around zero if the match between the
case where the (r;}a n d {S,] are ranks of independent 2 patterns is effectively random but p will typically be
and identically distributed observations, a r e therefore positive.
totally invalid. The algebra also yields the alternative, a n d entirely
In itself, the above does not compromise the use of equivalent, formulation
p,, say, as a n index of agreement of the 2 triangular
matrices. O n e might expect, however, that it would be
a less than ideal measure because of the fact that
few of the equally-weighted difference terms in Eq. 1 where
involve 'nearby' samples. In contrast, the premise at
the beginning of this section makes it clear that we
a r e seeking a combination of environmental variables
which attains a good match of the high similarities is the average over all i = 1, ...., N matrix elements of
(low rank similarities) in the biotic and abiotic matri- the harmonic mean of the ranks r, and S;. This suggests
ces. This is more a mechanistic point than a n inter- the use of the term harmonic rank correlation for the
pretational one; it is simply that the value of p,, when weighted Spearman coefficient p,, and this alternative
computed from triangular similarity matrices, will formulation is (arguably) helpful in visualising the
tend to be swamped by the larger number of terms behaviour of p,, for example the fact that it is min-
involving 'distant' pairs of samples. These have high imised when the ranks (r,]a n d { S , } a r e exact reversals
rank values and the squaring of differences in the of each other. Ultimately, the efficacy of this (or a more
numerator of Eq. 1 makes the coefficient sensitive to severe) down-weighting of the larger rank similarities
small relative differences in these large ranks. For can only be judged by its performance in real a n d
example, 15 community samples, strongly clustered simulated applications; this issue is returned to in the
into 5 groups of size 3, generate only 15 within-group later examples.
208 Mar. Ecol. Prog. Ser. 92: 205-219, 1993
l 1
Bray-Curtrs
12
.. + Exe estuary nematodes
Species
numbers
l l 4 :l.
.....
-
/ b ~ o m a s si
ll1
I
l I.
An instructive starting point is the data set used as
I
: l;
1
Eucl~dean
"NK $ "r;""AT/ON an example in Field et al. (1982).The nematode com-
ponent of benthic meiofaunal communities at several
j,
Abiotic
:
, 4 . .1
•••
I 2 l 74
inter-tidal locations in the Exe estuary, UK, was
varlables I ;- - ; I 5
I I i Using a
subset of
j
17
H2S, Sal, M P D
-'1
MPD, WT, H t H2S. Sat, MPD, % 0 r g All 6 variables
text).The associated 'matching
coefficients' pw of environ- 10
mental to biotic similarity 15 5
patterns (Table 1 A ) are: (B)
0.76, (C) 0.80, (D) 0.40, ( E )
0.79 and (F)0.77, Stress values
for the 2-dimensional ordina-
tions, using Kruskal's stress
formula 1, are: (A) 0.05; (B) 0;
(C) 0.04; (D) 0.07; (E) 0.04; and
(F)0.06
the subset of environmental variables increases. Thus, variables; perhaps the latter are so confounded with
on its own, the variable which best groups the sites, in each other that they all tell essentially the same story
a manner consistent with the faunal pattern, is the (irrespective of the range of values observed for p,!).
depth of the H2S layer (p, = 0.62); next best is the This can be simply refuted by plotting some combina-
organic content of the sediment (p, = 0.54), etc. Of tions of environmental variables giving rise to lower
course, since the faunal ordination is not essentially matching coefficients. Fig. 2D shows the ordination of
l-dimensional (Fig. 2A), one would not expect a single the 3 variables: median particle diameter, depth of the
environmental variable to provide a very successful water table and height up the shore (the correspond-
match, though knowledge of the H,S variable alone ing p, is 0.40). Whilst there are still some points of
does distinguish points to the left and right of Fig. 2A similarity, arising from the inclusion of median particle
(Sites 1 to 4 and 6 to 9 have lower values than for Sites diameter, the match with Fig. 2A has largely broken
5, 10 and 12 to 19, with Site 11 intermediate). down.
The best 2-variable combination also involves depth More subtly, note what happens when further vari-
of the H2S layer but adds the interstitial salinity. The ables are added to a description which already appears
correlation (p, = 0.76) is markedly better than for to provide a good match. Table 1A shows that the best
any other 2-variable subset. Fig. 2B displays the corre- 4-variable combination simply adds knowledge of the
sponding ordination; there are only 2 variables % organic content to the best 3-variable combination,
involved so, in effect, the figure is a simple scatter plot but this contributes nothing further to the explanation.
of H2S against salinity, rotated to aid visual comparison In fact, p, declines slightly and this is reflected in the
with Fig. 2A. The interstitial salinity distinguishes corresponding ordination Fig. ZE, with a parting of
Sites 1 to 4 from 6 to 9 and Sites 5 and 10 from 12 to 19. Site 9 from Sites 7 and 8 which is not observed in the
The best 3-variable combination retains the above 2 faunal plot Fig. 2A. Adding the depth of the water
variables and adds the median particle diameter of the table and height u p the shore, the match is seen to
sediment; the matching coefficient rises to p, = 0.80 deteriorate further, as indicated by the declining value
and, in fact, this is the maximum value of p, for any of p and as observed in Fig. 2F: Site 14 is separated
size of subset. The ordination of the 3 variables is from Sites 12 and 13 and the spacing between Sites 1
shown in Fig. 2C; it can be seen that the inclusion of to 4 increases; neither of these changes mirrors the
median particle diameter draws Sites 6 and 11 closer pattern in Fig. 2A.
and adds some meaningful differentiation of the 12 There is an interesting contrast here with an un-
to 19 group (e.g. note the proximate positions of Sites related, but in some ways analogous statistical tech-
12 to 14). In fact, Fig. 2C now provides a surprisingly nique, that of multiple regression (e.g. Draper & Smith
good match to Fig. 2A, with all the main conclusions 1981).Layouts such as Table 1 have the appearance of
about site relationships being apparent from both lattice structures from linear models, where the value
plots. The only obvious discrepancy is in the location of in brackets would represent the residual sum of
the (5, 10) cluster; in Fig. 2C this has changed places squares from fitting a subset model. There, the addi-
with the large 12 to 19 group. This is of no real signifi- tion of further explanatory variables guarantees an
cance: the faunal data clearly indicate that there are 'improved fit' of the data points to the model, in the
5 distinct clusters, with the (5, 10) group rather unlike sense of always reducing the residual sum of squares.
anything else but closer to (6, 11) and 12 to 19 than the By contrast, here, the inclusion of environmental vari-
other groups; similar conclusions are apparent from ables which have no effect on community structure is
the arrangement of sites in the environmental variable not just neutral in its effect on the matching coefficient
plot, Fig. 2C. The point to bear in mind is that the but is virtually guaranteed to decrease p. For example,
ordinations are only a visual aid. The matching is car- Sites 1 to 4 cover the full range of possible sampling
ried out on the relative similarities (i.e. which samples positions in the intertidal zone which, of course, ex-
are 'close' to which other samples), and these are not plains why the corresponding points are 'pushed apart'
constrained to a 2-dimensional layout. In fact, this in Fig. 2F. This is not matched in Fig. 2A and height up
example provides a good illustration of why a Pro- the shore is clearly seen not to be a forcing variable of
crustes approach to matching the faunal and environ- the overall community pattern in this data set.
mental ordinations could be unsatisfactory. Fig. 2A and Note the satisfying consistency of the 'lattice' struc-
C could not be made to match closely by rotation, ture in Table 1A: the best combination at one level is
reflection or scaling operations; yet, as we have always a subset of the best combination on the Line
remarked, they tell a surprisingly similar story. below. The analysis has not been constrained to
At this point, it might not be unreasonable to query achieve this; all combinations have been evaluated
whether a roughly similar division of the sites would and simply ranked. Furthermore, the introduction of
arise from any combination of (say) 3 environmental some variables (e.g. depth of the water table and
Clarke 81 Ainsworth Llnking comnnunlty structure to environment 211
Table 2. Messolong~lagoon diatoms. Combinations of variables, k at time, giving the largest rank correlations p, between biotic
and environmental similarity matrices; bold type indicates the best combination overall. In-N: inorganic nitrogen; Sal: salinity;
0 2 : dissolved oxygen; Temp: temperature; other abbreviations are standard. All variables except Sal and O2 are log-transformed
ture from a large abundance matrix, heavily trans- tion. That the environmental and biotic variables be,
formed so that many species contribute to the MDS in some sense, 'independently measured' is not a pre-
pattern, is replicated almost perfectly through knowl- requisite of the method. We are not attempting at
edge of only 5 environmental variables. This result has this stage to erect a formal statistical framework which
obvious implications for prediction as well as explana- would permit hypothesis testing, construction of con-
fidence intervals for p,, etc. Indeed, it is
Biota In-N. PO,
clear that p, could be exploited as a simple
12
measure of agreement between the similarity
15 structure underlying any 2 ordinations with
12
1a4 13 15 the same label sets, however these are
8 14 17
16 obtained.
4 ,p10 IIs
4 s7 9 Another general point of practical impor-
l 3 7 9 ?o
6 6 tance is also exemplified by the current data.
1
2 Some of the environmental variables were in
pi 2 fact transformed prior to computation of the
Euclidean dissimilarity matrix. The upper
In-N, PO,, Sal, Sic+, 4 All 10 variables
12 right-hand half of Fig. 5 shows why this was
12
necessary. It is a 'draftsman plot', namely, an
4
16 7 l6 arrangement in a triangular array of simple
8 13 l5 5 17
1314 15 14 scatter plots between pairs of environmental
74 1
7--1 1110
6
variables. The points in each plot are, of
4 5 l0 9
8
course, the values of the variables at the 17
1 sites, in this case untransformed. Such plots
2 F 2
can be a helpful way of examining observa-
tions en masse, when assessing the likely
Fig 4 . Messolongl lagoon diatoms. MDS plots of the I f sampling sites, validity of assumptions that will need to be
based on: (A) transformed abundances of 193 diatom taxa; ( B ) to ( D )the
'best' 2-, 5- and 10-variable combinations of transformed environmental
made about the data. Here, choice of trans-
variables (see text). Matching coefficients p, (Table 2 ) are: (B) 0.79, (C) formation is important in the definition of
0.86 and ( D )0.79. Stress values are: ( A ) 0.08; ( B ) 0 ; ( C )0.08; and ( D ) 0.08 Euclidean dissimilarity between pairs of
Clarke & Ainsworth: Linking community structure to environment 213
TRANSFORMED
In Tot-P
Fig. 5. Messolongi lagoons. 'Draftsman' plots (pairwise scatter plots) of selected environmental variables at the 17 sampling sites.
Upper triangle: original values. Lower triangle: concentrations of inorganic nitrogen, silicate, phosphate and total phosphorus are
log-transformed (note In = log,); salinity and dissolved oxygen remain untransformed
samples, just as it is in defining Bray-Curtis dissimilar- formation. For concentration variables this will usually
ity for the species data. The use of Euclidean distance involve reduction in right skewness, so the choice is
in the (normalised) environmental variable space is often simply between log (or 4th-root, having similar
most effective if the data are approximately multi- effect), square root and no transformation at all. Here,
variate-normally distributed; that is, pairwise relation- all environmental variables were log transformed, with
ships (if present) are linear and the data are not the exceptions of salinity and dissolved oxygen, and
markedly skewed on any of the variable axes. one can have confidence from the resulting draftsman
The upper triangle in Fig. 5 indicates a marked right- plot that a dissimilarity matrix calculated from
skewness in most of the original variables, the excep- Euclidean distance (on subsequently normalised vari-
tions being salinity and dissolved oxygen, and a ables) is a fair summary of the sample relationships.
noticeable curvilinearity of the relation between The draftsman plot also serves another useful 'pre-
inorganic nitrogen and salinity. (Note that, for clarity of processing' function. Note that Fig. 5 includes a vari-
presentation, Fig. 5 includes only those variables able not found in the analysis of Table 2, total phos-
deemed important by the matching process; however it phorus; there were in fact 11 environmental variables
is necessary initially to compute the full draftsman plot, originally recorded. Total phosphorus was omitted
for all variables). The lower triangle in Fig. 5 depicts because of the very high degree of correlation it shows
the same draftsman plot after suitable transformation with phosphate in the (transformed) draftsman plot,
of most variables; this is seen to have largely removed Fig. 5. Whilst there would have been no technical diffi-
the widespread skewness and also the curvilinearity in culty in retaining both variables in the matching
the inorganic nitrogen versus salinity plot. Note that it routine, there is little point in doing so. It doubles the
is not usually appropriate to tinker with minor modifi- number of combinations to search and for every com-
cations to transformations for individual variables. All bination involving phosphate, the parallel combination
that is necessary is a broad-brush approach in which - in which phosphate is replaced by total phosphorus -
related groups of variables, expected to behave in would give a virtually identical p value. For example,
similar fashion (in terms of variance-mean relation- the variable subset: inorganic nitrogen, total phos-
ships for example), are subjected to the same trans- phorus, salinity, silicate and dissolved oxygen yields a
214 Mar. Ecol. Prog. Ser. 92: 205-219, 1993
In CO
Fig. 7 Firth of Clyde dumpground. 'Drafts-
man' plot for all measured environmental
variables at the 12 sampling sites: depth of the
In Ni water column, sediment carbon and nitrogen
In Zn p
-.
-. .. . . - . -
~ ~ ~ .I ._.
content and sediment concentration of 8
heavy metals. All except water depth
have been log-transformed
In ~d ~
- l .~ p q ~ ~ l . ..
In ~b
InCr
m
,
c:n m
~~u~~~~n n l
.. .. . - .
%
.
mI . 4 ' .# . --.
~
.. l ~ l ~ ~ - " - . .. .-
~ n.l~~~lr
In %C 5 .S
In %N ..
.. .. -... -. +... ..
-.. .... .
~ ' . I ~ l
C m - . . C .C
Depth - .. .--- :C P .. . r .. -m -. ..
. -.
In Cu In Mn In CO In Ni In Z n In Cd In Pb In Cr In %C In %N
Some of the corresponding MDS plots are given in Table 3. Firth of Clyde macrofauna. Combinat~ons of
Fig. 8. The ordination for the biota (Fig. 8A) shows the variables, k at a time, giving the largest rank correlations p,
clear trend of change in community structure passing between biotic and abiotic similarity matrices; bold type indl-
cates the best combination overall. Dep: depth of water
from Site 1 to the sludge dumping area (centred at column; other variables measure organic content (C, N) and
Site 6), with a steady reversal of the trend as one heavy metal contamination (Cd, Ni, Mn, Cr, CO)of the sedl-
moves on to Site 12; the communities at the extremi- ments. All variables except water depth are log-transformed
ties of the transect are very similar. This major axis of
change is largely replicated by the information on k Best variable combinations ( p , )
sediment organic content (Fig. 8B, p, = 0.75). Note,
C N Cd ...
however, that there is additional structure in the biotic (.70) (.62) (.58)
plot, for example the 'vertical' separation of Site 9, C, Cd C,N Cd,N C,Co
which is not well represented by the %C and %N (.78) (.75) (.74)
information alone. Knowledge of Cd together with C.Cd,N C,Cd,Co C,N,Mn .
standard tables or large sample normality approxima- variable, whilst it may occasionally accommodate
tions, even if the unweighted Spearman p, coefficient some minor feature of the biotic ordination purely by
is used. The standard tests are based on 'independent chance, is certain to conflict strongly with many of the
and identically distributed observations' underlying main features of the plot a n d , overall, decrease the
each set of ranks; the elements of a dissimilarity matrix value of p,,. Thus the requirement to search through a
are certainly not that! Instead, one can use a randomi- large number of variable combinations, in determining
sation (permutation) test in which the sample labels the optimal match, is less likely to raise the spectre of
in the environmental dissimilarity matrix, say, a r e selection bias than might at first appear. Some prac-
randomly reassigned a n d the match p , to the biotic tical evidence for this can b e adduced from the co-
matrix recomputed. This operation is repeated on a herence of the lattice structures in the earlier examples
large number of occasions, generating the spread of (more comprehensive versions of Tables 1 to 3, not
small positive (and negative) p, values that could be shown): what appear to b e irrelevant variables consis-
expected by chance if there is truly no relation be- tently degrade the match, virtually wherever they are
tween biotic and abiotic patterns. The 'significance' of included. There is clearly scope, however, for address-
the genuine p, is then determined by where it falls in ing this issue through simulation studies.
relation to the upper tail of this simulated probability Another central feature of the proposed methodol-
distribution. ogy is the simple visual assessment that can b e made,
O n e should not forget the 'selection bias' effects, on from the ordinations, of the significance (= importance)
the overall significance level, of the large number of of a n improvement in p,. It is well-known, of course,
individual tests implicit in comparing the biotic pattern that what a scientist or layman understands by the
with the optimal environmental combination from a n word 'significance' does not always equate with 'statis-
exhaustive search. A crude compensation for this tical significance': differences that are established
would be to demand a very high individual signifi- beyond reasonable statistical doubt can nonetheless be
cance level, for example, to reject the null hypothesis inconsequential. Thus, a marginal adjustment in the
only if the observed p, is greater than that for any of location of one sample in a n environmental plot may
the random permutations in 20(2"- 1) simulations. This have no implications for the overall conclusions of the
would guarantee a n 'experiment-wide' significance study, a n d the question of whether this adjustment
level of p <0.05, although very conservatively since the leads to a 'statistically significant' improvement in the
(2'- l )potential tests are strongly interdependent. This match may be academic. Above all, the BIO-ENVpro-
aspect could be improved upon but is of little impor- cedure should be thought o f as exploratory. In fact, just
tance since the test itself is of limited applicability. as it is desirable in multiple regression that a subset
Situations in which it is useful to test the hypothesis model be assessed on how well it explains a further, in-
that there is no relationship between biotic a n d dependent data set, so the BIO-ENV procedure would
environmental patterns a r e presumably rather rare. benefit from a 2-stage approach. Only those environ-
Practical interest will more likely centre around mental variables identified a s contributing to the opti-
whether addition of a n explanatory variable 'signifi- mal match with the biotic data would b e analysed in a
cantly increases' a n already non-zero value of p,. A repeat study; variables making a marginal, spurious
formal statistical framework for such tests is, perhaps, contribution to the original match (if any) would be
too much to expect on the present assumptions (or lack almost certain not to feature in the second analysis.
of them). (2) Underprecisely what assumptions is the BIO-ENV
This is not as serious a s it may at first appear, a s procedure valid? The main techniques proposed for
evidenced by the analogy with multiple regression. It relating biotic to abiotic data a r e linked in a natural way
has long been recognised that the significance testing to the ordination methods used for display. Thus, the
framework in, for example, stepwise multiple regres- assumption that sample dissimilarities a r e well-
sion (Draper & Smith 1981) is not without logical represented by Euclidean distance for both environ-
inconsistencies, and its most defensible role is as a n mental and species data, a n d that abundances a r e
exploratory tool. As such, the main function of signifi- linearly related to environmental gradients, leads to
cance testing is then to provide a sensible 'stopping classical Canonical Correlation (e.g.Mardia et al. 1979)
rule', since adding further explanatory variables - or the variant known as Redundancy Analysis (Rao
even entirely random ones - always produces some 1964). Essentially, all links within a n d between biotic
improvement in the residual sum of squares. By con- a n d abiotic variables are assumed to be adequately
trast, the BIO-ENV procedure has a natural stopping summarised by standard correlation coefficients, a n d
rule in the propensity of p, to decrease with inclusion the associated ordination technique is simply Principal
of unimportant variables. It seems intuitively clear Component Analysis. Similarly, the important method
that the addition of a totally irrelevant environmental of Canonical Correspondence Analysis introduced by
218 Mar. Ecol. Prog. Ser. 92: 205-219, 1993
C. J. F. ter Braak (Jongman et al. 1987) is derived from culty generally encountered in handling and inter-
the assumption that the species abundances have uni- preting time-series data.
modal responses across the measured environmental For purely spatial studies, the BIO-ENV procedure
gradients, and the dual relationships of biotic and has now been applied in a reasonably wide range of
abiotic data are accurately displayed by Correspon- marine contexts. Apart from the 3 examples of this
dence Analysis, with its implicit definition of ' X 2 dis- paper, Gee et al. (1992) employ it to assess meiofaunal
tance' as the measure of sample dissimilarity. community responses to differing physical factors and
Our stance has been to retain maximum freedom in sediment concentrations of hydrocarbons and heavy
choosing measures of sample similarity which are indi- metals, along a transect of the Elbe plume (German
vidually relevant to the biological and environmental Bight). Agard et al. (1993) also use the program to
contexts. These then lead to separate graphical repre- examine the relation of tropical macrobenthic commu-
sentations, in appropriately flexible manner, by non- nities to a suite of physical and chemical variables
metric MDS ordination. (Note the contrast with the recorded at 31 stations in a coastal region of Trinidad.
above methods, where the species-environment rela- At an earlier stage of development, some of the
tionships are embedded at a n earlier stage and will methods of this paper were utilised by Warwick et al.
influence the individual representations of biotic and (1991) to link intertidal macrobenthic invertebrate
abiotic information.) MDS effectively relies only on the data, for 46 sites in 6 estuaries of southwest Britain, to
rank similarities, and the subsequent BIO-ENV proce- static and dynamic physical characteristics of the
dure, itself only a function of these ranks, is therefore estuarine sites. In addtion, Warwick & Clarke (1991)
consistent with the chosen ordination technique. referred to this putative methodoivyy in several con-
Selecting a n environmental subset to optimise the texts, including the sampling of macrobenthic commu-
match of biotic and abiotic patterns is an intuitively nities at 39 sites around the Ekofisk oilfield in the
plausible and seductively simple procedure, but it is North Sea (Gray et al. 1990). Whilst further practical
far from clear what assumptions (if any) are being application, simulation studies and comparative trials
made about the form of the species-environment rela- are clearly desirable. the evidence so far suggests that
tionships. There may be conditions under which the these ideas can play a useful exploratory role in identi-
results are misleading; such possibilities could, in part, fying potentjal agents in the shaping of community
be examlned by simulation. structure. Perhaps the technique's greatest strength is
(3) What is the likely range of practical application? the relative simplicity of both the concept and the re-
Clearly the method requires that environmental data sulting graphical presentations, which non-speciahsts
can genuinely b e matched to biotic data site for site, or appear to find very accessible.
sample for sample. Presumably, the more closely envi-
ronmental sampling is matched to the biotic material the
better the chances of a good 'explanation', particularly Acknowledgements. We thank Melanle Austen, Martin Carr,
&chard Warwick (Plymouth Marine Laboratory) a n d Martin
for spatial studies. Note that the data sets of this paper
Green (NIVA. Oslo) for helpful discussions, a n d Daniel
are all examples of spatial, rather than temporal patterns Danielidis (University of Athens) for permission to use data
of community structure. In fact, in 2 of the 3 cases, vari- from his Ph.D. thesis for illustrative purposes. The work forms
ations through time have been deliberately removed by part of the Community Ecology project of t h e Plymouth
averaging across seasons. Of course, the mechanics of Marine Laboratory, with the BIO-ENV software development
undertaken by M.A being funded by the U . K . Department of
the method would allow the linking of matched biotic the Environment, as part of its coordinated programme of
and abiotic time serjes from a single site but a greater research o n the North Sea. We also thank Charles Green for
degree of caution is necessary here and there are good coding additional features in BIO-ENV, under funding from
reasons for expecting a lower rate of success: (a) Tem- the U.K. National Rivers Authority.
poral autocorrelation within each series would not be
well catered for. For example, the global randomisation
test for complete absence of a match would be mislead- LITERATURE CITED
ing for serially-correlated data. (b) The environmental
Agard. J. B. R., Gobin. J.. Warwick. R. M. (1993). Analysis
variables shaping community trends are less likely to be
of marine macrobenthic community structure In relation
measurable concurrently with the biotic data. It may be to natural and man induced perturbations in a tropical
necessary for the abiotic variables to incorporate lags in environment (Trinidad, West I n h e s ) . Mar. Ecol Prog. Ser.
time (and even in space, in the context of pelagic data), 92: 233-243
and the scale of the linking procedure could then easily Bray, J. R., Curtis, J. T (1957). An ordination of the upland
forest communities of Southern Wisconsin. Ecol. Monogr.
get out-of-hand. 27.325-349
Neither of these caveats is specific to the current Clarke, K. R. (1993).Non-parametric multivariate analyses of
methodology; they are a reflection of the greater diffi- changes in commmunity structure. Aust. J . Ecol. ( ~ press)
n
Clarke & Ainsworth: Linking commun~tystructure to environment 219
Clarke, K. R , Green, R. H. (1988). Statistical design and Sage Publications, Beverly Hills
analysis for a ' b ~ o l o g ~ c effects'
al study. Mar. Ecol. Prog. Mardia, K. V., Kent, J. T., Bibby, J. M. (1979). Multlvariate
Ser. 46: 213-226 analys~s.Academic Press, London
Danielidis, D. B. (1991). A systematic and ecological study Pearson, T. H. (1987).The benthic biology of a n accumulating
of diatoms in the lagoons of Messolongi, Aitoliko and sludge disposal ground. In: Capuzzo, J., Kester, D. (eds.)
Kleissova (Greece). Ph.D. thesis, University of Athens Biological processes a n d wastes in the ocean. Oceanic
Draper, N. R., Smith, H. (1981). Applied regression analysis, processes: marine pollution, Vol. 1 Krieger, Melbourne.
2nd edn. Wiley. New York FL, p. 195-200
Field, J . G., Clarke, K. R., Warwick, R. M. (1982). A practical Rao. C. R. (1964). The use and interpretation of principal
strategy for analysing multispecies distribution patterns. component analysis in applied research. Sankhya A 26:
Mar. Ecol. Prog. Ser. 8: 37-52 329-358
Gee, J. M., Austen, M.. De Smet, G., Ferraro, T., McEvoy, A.. Underwood, A. J., Peterson. C . H. (1988). Towards a n eco-
Moore, S.. Van Gausbeki, D., Vincx. M , , Warwick, R. M. logical framework for investigating pollution. Mar. Ecol.
(1992). Soft sediment meiofauna community responses to Prog. Ser. 46: 227-234
environmental pollution gradients in the German Bight
Warwick, R. M. (1971). Nematode associations in the Exe
and at a drilllng site off the Dutch coast. Mar. Ecol. Prog.
estuary. J. mar. biol. Ass. U.K. 51: 439-454
Ser. 91: 289-302
Gower, J . C. (1971). Statistical methods of comparing dif- Warwick, R. M . (1988).The level of taxonomic discrimination
ferent multivariate analyses of the same data. In: Hodson, required to detect pollution effects on marine benthic
F. R., Kendall, D. G., Tautu, P. (eds.) Mathematics in the communites. Mar. Pollut. Bull. 19: 259-268
archaeolog~cal and historical sciences. Edinburgh Uni- Warwlck, R. M . , Clarke, K. R. (1991). A comparison of some
versity Press, Edinburgh, p. 138-149 methods for analysing changes in benthic community
Gray, J. S., Aschan, M., Carr, M. R., Clarke, K. R., Green, R. structure. J. mar. biol. Ass. U.K. 71: 225-244
H., Pearson, T H., Rosenberg, R., Warwick, R. M. (1988). Warwlck, R M., Clarke, K. R. (1993).Comparing the severity
Analysis of community attributes of the benthic macro- of disturbance: a meta-analysis of marine macrobenthlc
fauna of Frierfjord/Langesundfjord and in a mesocosm community data. Mar. Ecol. Prog. Ser. 92: 221-231
experiment. Mar. Ecol. Prog. Ser. 46: 151-165 Warwick, R. M., Goss-Custard. J. D., Kirby. R., George, C. L..
Gray, J. S., Clarke, K. R., Warwick, R. M,, Hobbs, G. (1990). Pope, N. D., Rowden, A. A. (1991). Static and dynamic
Detection of initial effects of pollution on marine benthos: environmental factors determining the community struc-
a n example from the Ekofisk and Eldfisk oilfields, North ture of estuarine macrobenthos in SW Britain: why is the
Sea. Mar. Ecol. Prog. Ser. 66: 285-299 Severn Estuary different. J. appl. Ecol. 28: 329-345
Jongman. R. H. G., Ter Braak, C. F. J., Tongeren. 0. F. R. Warwick, R. M., Pearson. T. H., Ruswahyuni (1987).Detection
van (1987). Data analysis in community and landscape of pollution effects on marine macrobenthos: further
ecology. Pudoc, Wageningen evaluation of the species abundance/biomass method.
Kendall, M. G . (1970). Rank correlation methods. Griffin, Mar. Biol. 95: 193-200
London Warwick, R. IM., Platt. H. M.. Clarke, K. R., Agard, J., Gobin,
Kenkel, N. C., Orloci, L. (1986) Applying metric and non- J. (1990). Analysis of macrobenthic and meiobenthic
metric multid~mensional scaling to some ecological comnlunlty structure in relation to pollution and distur-
studies: some new results Ecology 67: 919-928 bance in Hamilton Harbour, Bermuda. J exp. mar. Biol.
Kruskal, J. B., Wlsh, M . (1978). Multidimensional scaling. Ecol. 138: 119-142
Tlijs article was submitted to the editor Manuscript first received: March 3, 1992
Revised version accepted: November 30, 1992