
EstimateS notes for FESIN, July 25, 2009

Susan G. Letcher

Estimating and comparing species richness with EstimateS 8.0: notes and lab exercises
Summary of introductory talk:
Statistical issues in the measurement of biodiversity. Biodiversity is a fundamental
property of ecosystems, but estimating and comparing biodiversity is a non-trivial
statistical problem. Sampling all of the species present is an impossible task, except in the
most depauperate of ecosystems, and observed species richness is non-linearly related to
sampling effort. Because of this non-linear relationship, comparing species richness
among sites and comparing the similarity of species composition among sites requires a
robust statistical approach.
Estimating species richness: how many species are present at a site? Three main
approaches are used to estimate species richness: rarefaction, species richness estimators,
and diversity indices. Rarefaction involves constructing a smooth species accumulation
curve by randomly resampling the data, and then comparing species richness across all
sites at the point on the curve corresponding to the number of individuals (or samples) at
the site with the lowest sampling intensity. Because the number of observed species is a
biased estimator of actual species richness, species numbers calculated by rarefaction will
be lower (sometimes considerably lower) than the actual species richness of an
ecosystem.
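The resampling idea behind rarefaction can be sketched in a few lines of Python. This is only an illustration of the principle, not the calculation EstimateS performs; the function name, the abundance vectors, and the fixed seed are all assumptions made for the example.

```python
import random

def rarefy(abundances, n, runs=50, seed=1):
    """Mean number of species found in random subsamples of n individuals."""
    # Expand the abundance vector into one species label per individual.
    individuals = [sp for sp, count in enumerate(abundances) for _ in range(count)]
    if n > len(individuals):
        raise ValueError("n exceeds the number of individuals sampled")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(runs):
        # Draw n individuals without replacement and count distinct species.
        total += len(set(rng.sample(individuals, n)))
    return total / runs

# Hypothetical abundance vectors for two sites (not workshop data).
site_a = [12, 7, 3, 1, 1]      # 24 individuals, 5 species
site_b = [5, 4, 2, 2, 1, 1]    # 15 individuals, 6 species

# Compare both sites at the sample size of the less intensively sampled site.
n_common = min(sum(site_a), sum(site_b))
print(n_common, rarefy(site_a, n_common), rarefy(site_b, n_common))
```

Note that site_a has more individuals but is compared at only 15, the size of the smaller sample.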
Species richness estimators attempt to predict the asymptote of the species
accumulation curve, thus correcting for the downward bias inherent in observed species
richness. However, richness estimators are also sensitive to sampling intensity, and so the
same procedure as in rarefaction (comparing all curves at the point on the x-axis
corresponding to the site with lowest sampling intensity) should be followed when
comparing estimators at multiple sites. EstimateS offers a number of parametric and
non-parametric estimators; the help files in the program and references available on the
EstimateS web site can help you decide which is best for your data. The program also
offers an option for calculating standard error for many of the estimators.
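As one concrete example of a non-parametric estimator, here is a minimal sketch of Chao1, which uses the numbers of singletons (F1) and doubletons (F2) to estimate how many species were missed. The formulas are the standard published ones; the function name and the example data are my own, and I make no claim that this matches the EstimateS code line for line.

```python
def chao1(abundances, bias_corrected=True):
    """Chao1 richness estimate from a vector of species abundances."""
    counts = [c for c in abundances if c > 0]
    s_obs = len(counts)                      # observed species richness
    f1 = sum(1 for c in counts if c == 1)    # singletons
    f2 = sum(1 for c in counts if c == 2)    # doubletons
    if bias_corrected:
        # Bias-corrected form; defined even when there are no doubletons.
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    if f2 == 0:
        raise ValueError("classic form is undefined without doubletons")
    return s_obs + f1 ** 2 / (2 * f2)

# Hypothetical sample: 7 observed species, 3 singletons, 2 doubletons.
sample = [10, 5, 1, 1, 1, 2, 2]
print(chao1(sample))                         # 8.0
print(chao1(sample, bias_corrected=False))   # 9.25
```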
Diversity indices are an alternative approach to calculating species richness.
EstimateS offers three of the most robust and popular indices: Shannon, Simpson, and
Fisher's alpha. The Shannon index (also called Shannon-Wiener index) is an entropy measure
adapted from information theory; the Simpson index is based on abundance and evenness
of species; and Fisher's alpha is a fitted parameter assuming a logseries distribution of species
abundances. Note that diversity indices are also sensitive to sampling intensity, and again
the same protocol as in rarefaction should be followed. EstimateS has an option to
calculate confidence intervals for diversity indices.
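For readers who want to see what these three indices compute, the sketch below uses the textbook definitions. I have assumed the inverse form of the Simpson index and a simple bisection search for Fisher's alpha; the form and the numerical method EstimateS uses may differ, and all function names are hypothetical.

```python
import math

def shannon(abundances):
    """Shannon index H' = -sum(p * ln p) over observed species."""
    n = sum(abundances)
    return -sum((c / n) * math.log(c / n) for c in abundances if c > 0)

def inverse_simpson(abundances):
    """Inverse Simpson index 1 / sum(p^2) (one common form; an assumption here)."""
    n = sum(abundances)
    return 1.0 / sum((c / n) ** 2 for c in abundances)

def fisher_alpha(abundances, tol=1e-10):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    counts = [c for c in abundances if c > 0]
    s, n = len(counts), sum(counts)
    if n <= s:
        raise ValueError("needs more individuals than species")
    lo, hi = 1e-9, 1e9
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        # alpha * ln(1 + N / alpha) increases with alpha, so bisection applies.
        if mid * math.log(1 + n / mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

sample = [10, 5, 1, 1, 1, 2, 2]   # hypothetical abundances
print(shannon(sample), inverse_simpson(sample), fisher_alpha(sample))
```

The bisection loop is the kind of iteration that makes Fisher's alpha slower to compute than the other two indices.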
Comparing species richness: how many species are shared between two sites?
While this may seem like a trivial question, the inherent bias in biodiversity sampling
makes it more difficult than it appears. If it is impossible to measure all the species
present at one site, how can we accurately characterize the similarity in species
composition between two sites?
Anne Chao developed the first statistical estimator of shared species (Chen et al.
1995 in Chinese, Chao et al. 2005 in English; references available on EstimateS website).
Chao's estimate of shared species is provided in the EstimateS output for shared species.

For examining overall compositional similarity between two sites, EstimateS
offers four classic similarity indices and two new non-parametric abundance-based
similarity indices with bias-corrected estimators (Chao et al. 2005; available on
EstimateS website). The classic estimators are Jaccard, Sørensen, Bray-Curtis (= Sørensen
abundance-based), and Morisita-Horn. Of these four, Morisita-Horn is the least sensitive
to sampling intensity, but it is highly sensitive to the abundance of the most common
species.
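The four classic indices are easy to compute by hand for a pair of sites, which helps in checking output. The sketch below uses the usual textbook formulas (Bray-Curtis expressed as a similarity); the function name and the tiny example vectors are assumptions for illustration only.

```python
def classic_similarity(x, y):
    """Classic similarity indices for two abundance vectors in the same species order."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    s1 = sum(1 for a in x if a > 0)          # species observed at site 1
    s2 = sum(1 for b in y if b > 0)          # species observed at site 2
    n1, n2 = sum(x), sum(y)                  # individuals at each site
    cross = sum(a * b for a, b in zip(x, y))
    d1 = sum(a * a for a in x) / n1 ** 2
    d2 = sum(b * b for b in y) / n2 ** 2
    return {
        "jaccard": shared / (s1 + s2 - shared),
        "sorensen": 2 * shared / (s1 + s2),
        "bray_curtis": 2 * sum(min(a, b) for a, b in zip(x, y)) / (n1 + n2),
        "morisita_horn": 2 * cross / ((d1 + d2) * n1 * n2),
    }

# Hypothetical sites with three species in total, two of them shared.
print(classic_similarity([3, 1, 0], [1, 1, 2]))
```

Because Morisita-Horn is built from products of abundances, the most common species dominate its value.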
Although the classic similarity indices are widely reported in the literature, they
all lack one important component: none of them accounts for the effect of undetected
shared species, so none corrects for the downward bias in observed species richness or
the resulting downward bias in similarity. The first indices to satisfactorily adjust for unseen
shared species were developed by Chao et al. (2005). These non-parametric similarity
indices are based on the probability that individuals from a random draw belong to shared
species. EstimateS reports the raw values for the indices as well as the bias-corrected
values, so that you can observe the effect of the bias correction. EstimateS also calculates
standard error for the Chao similarity indices, allowing rigorous comparison of multiple
similarity values.
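To make the "probability that individuals belong to shared species" idea concrete, here is a sketch of the raw (uncorrected) abundance-based Chao-Jaccard and Chao-Sørensen indices as I understand them from Chao et al. (2005). U and V are the total relative abundances of the shared species in each sample. The bias-corrected estimators add terms for unseen shared species and are not shown; the function name is hypothetical.

```python
def chao_abundance_raw(x, y):
    """Raw abundance-based Chao-Jaccard and Chao-Sorensen indices (no bias correction)."""
    n1, n2 = sum(x), sum(y)
    # U, V: proportion of individuals in each sample that belong to shared species.
    u = sum(a for a, b in zip(x, y) if a > 0 and b > 0) / n1
    v = sum(b for a, b in zip(x, y) if a > 0 and b > 0) / n2
    if u == 0 or v == 0:
        return 0.0, 0.0                      # no shared species observed
    chao_jaccard = u * v / (u + v - u * v)
    chao_sorensen = 2 * u * v / (u + v)
    return chao_jaccard, chao_sorensen

# Hypothetical sites: every individual at site 1, and half of those at site 2,
# belongs to a shared species.
print(chao_abundance_raw([3, 1, 0], [1, 1, 2]))
```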
For comparing species richness among >2 sites, ordination methods are useful.
EstimateS does not offer any ordination tools, but the output can easily be uploaded into
R, Primer-E, or other ecological software packages. If you have environmental data as
well as biodiversity data, try DCA or PCA on the similarity matrix and the matrix of
environmental data. If you have only the similarity matrix, try NMDS.
Conclusion: the statistical properties of species abundance distributions make it
imperative to use robust techniques when estimating or comparing biodiversity. The field
of biodiversity measurement continues to grow and evolve, and new approaches will
certainly appear. For the moment, and to the best of my knowledge, the recommendations
here are state-of-the-art approaches to biodiversity estimation. EstimateS puts all these
tools within reach.
Using EstimateS: lab exercises
These lab exercises are entirely optional, and I encourage you to work with your
own data sets instead or in addition to the data I provide. You may also want to run
through these exercises before the lab session, using the data I provide, and then bring
your own data for the lab session.
Note: A small bug in EstimateS 8.0 causes the application to crash when the diversity indexes
option has been checked and the data set includes one or more completely empty samples (no species).
Unfortunately, the benchmark dataset that is downloaded with EstimateS 8.0 ("Seedbank") contains two
empty samples. A replacement version will be supplied. Please make sure each sample in each of your own
datasets records at least one species. Eliminating empty samples has no effect on any estimators in
EstimateS. This bug will be corrected in the next version of EstimateS.
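If you prepare your data files with a script, a quick check for empty samples before loading them can save a crash. The following is a minimal sketch assuming the data are held as a samples x species list of lists with a parallel list of sample labels; the function and label names are hypothetical.

```python
def drop_empty_samples(labels, matrix):
    """Split a samples x species matrix into non-empty samples and dropped labels."""
    kept_labels, kept_rows, dropped = [], [], []
    for label, row in zip(labels, matrix):
        if sum(row) > 0:
            kept_labels.append(label)
            kept_rows.append(row)
        else:
            dropped.append(label)            # sample recorded no species at all
    return kept_labels, kept_rows, dropped

labels = ["s1", "s2", "s3"]
matrix = [[2, 0, 1], [0, 0, 0], [0, 4, 0]]   # hypothetical counts; s2 is empty
print(drop_empty_samples(labels, matrix))
```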

Opening EstimateS: If it is the first time you open the program, you will be
prompted to select a Data File. Choose the file Statistics.4DD (Windows) or
Statistics.data (Mac OS). This specifies the default output file for results. Do not
attempt to load your input data file at this time.

Loading your input data: Use Ctrl+I or choose Load Input File from the File
menu. Follow the directions on-screen and specify the file format and how many
columns/rows of labels to skip.
Estimating species richness: Load the input file FLNEstSformat.txt. This file is
in format 2 (samples x species matrix), with one row of species labels and one column of
sample labels, as well as the obligatory title and parameter records in the first two lines.
Examine the data file in a spreadsheet program to familiarize yourself with the format.
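If your own data live in a script rather than a spreadsheet, the layout described above can be produced along these lines. This is a sketch under several assumptions that you should check against the EstimateS documentation: fields are tab-delimited, the corner cell of the label row is left empty, and the parameter record carries only the species and sample counts (a record saved by EstimateS contains additional settings fields not shown here).

```python
def format2_text(title, species, samples, matrix):
    """Build a samples x species text table with title and parameter records."""
    if len(matrix) != len(samples) or any(len(row) != len(species) for row in matrix):
        raise ValueError("matrix must have one row per sample and one column per species")
    lines = [
        title,                                   # title record
        f"{len(species)}\t{len(samples)}",       # parameter record: species, samples
        "\t" + "\t".join(species),               # one row of species labels
    ]
    for label, row in zip(samples, matrix):
        # One column of sample labels, then the counts for that sample.
        lines.append(label + "\t" + "\t".join(str(count) for count in row))
    return "\n".join(lines) + "\n"

# Hypothetical three-sample, two-species dataset.
text = format2_text("Demo data", ["Allplu", "Spb"], ["s1", "s2", "s3"],
                    [[1, 0], [0, 2], [3, 1]])
print(text)
```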
Use Ctrl+T or choose Diversity Settings from the Diversity menu to specify the
parameters you want. Under the Randomization tab, specify the settings you want.
Usually 50 runs (default) is enough to get a smooth species accumulation curve. You can
choose a higher number of randomizations, but this will increase the calculation time.
Under random number generators, "strong hash encryption (SHA)" is recommended
unless you require precise repeatability of your results. Under randomization protocol
for estimators, your choice depends on the output you want. Choose the
"sample with replacement" option if you want to calculate confidence intervals for
species richness estimators and/or diversity indices. (Otherwise, the confidence interval
will shrink to zero in the last data points and will not be meaningful.) However, "sample
with replacement" has the unpleasant feature that estimated richness may turn out to be LESS
than observed richness, especially for a moderate number of runs, because some samples
are never included at all, by chance. If this occurs in your dataset, increase the number
of runs and/or choose "sample without replacement". Choose
"sample without replacement" if you want a confidence interval for a rarefaction curve,
because the Mao Tau SD assumes sampling without replacement. It may be a good idea
to analyze your data twice, once with replacement and once without, so that you can
choose the appropriate output file for the estimators you want. (See more about analyzing
data and exporting output below.)
Under the Estimators tab, the bias-corrected option is recommended unless
you have a very small dataset. Likewise, the default value of 10 for the upper
abundance limit for rare species is recommended. See the help buttons or the EstimateS
website for more information. Generally, if you choose an option that is not appropriate
for your dataset, EstimateS will show a warning message with recommendations for how
to make your analysis more robust.
Under Other Options, the individual shuffling option allows you to examine
the effect of patchiness in your data. Note that this is a simulation and data exploration
tool rather than an analytical tool; see the help button and documentation for more
information. Click the diversity indices box if you want to calculate these indices. (The
calculation for Fisher's alpha takes time, especially for large data sets, because it must be
calculated by iteration. Hence the default is not to calculate diversity indices.) The
individual run export option will export the output from each randomization into a text
file for more detailed examination of the results, but note that this option takes a
considerable amount of time, especially for large datasets or large numbers of iterations.
The settings usage option allows you to save the settings if you wish to repeat the
analysis. If you choose "use these settings and save them", EstimateS will prompt you to
save your data file in a new format with these settings in the parameter record. (If you
want to run the same analysis on many files, you can save time by pasting this parameter
record into all your data files rather than clicking through the options in the dialogue
box.)
Click compute. Depending on your processor speed, the calculations may take up
to 30 seconds for this dataset. When the results window appears, click the export button
at the bottom of the output screen to save the results in a text file. Note that this text file
has a header and column names that contain spaces. If you want to work with it in R, you
will need to do a bit of re-formatting. Once you have exported the data, you can choose
Diversity Settings again to re-analyze your data with different parameters (e.g.,
sampling with/without replacement to calculate confidence intervals.)
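The re-formatting mostly amounts to making the column names safe for R. One way to do it, sketched in Python, is to collapse every run of spaces and punctuation into a single underscore. The example header below is illustrative and built from column names mentioned in these notes; your export may contain more columns and extra header lines that need to be skipped first.

```python
import re

def r_safe(name):
    """Replace runs of non-alphanumeric characters with '_' and trim the ends."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")

# Illustrative tab-delimited header row (an assumption about the export layout).
header = "Samples\tIndividuals\tSobs (Mao Tau)\tSobs 95% CI Lower Bound"
clean = [r_safe(col) for col in header.split("\t")]
print(clean)
# ['Samples', 'Individuals', 'Sobs_Mao_Tau', 'Sobs_95_CI_Lower_Bound']
```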
In the EstimateS output, the first column shows the number of accumulated
samples, and the second column shows the average number of individuals in that
number of samples. The Individuals column is the one you want to use in plotting a
species accumulation curve. Samples are arbitrary divisions of the data, but an individual
is a fundamental unit that can be compared among datasets. In a graphing program, plot
the individual-based rarefaction curve (Individuals vs. Sobs (Mao Tau)). Add the
confidence intervals (Sobs 95% CI Lower/Upper Bound). (Note that you should use
results from the "sample without replacement" option in the Diversity Settings dialog box
for meaningful Mao Tau confidence intervals.) Does the species accumulation curve level
off?
Plot the number of individuals vs. observed species richness and the different
estimators available in EstimateS. Which estimators stabilize most rapidly? Which are the
most conservative (i.e., closest to the observed species richness), and which give the
highest estimates? Observe the behavior of the estimators at low numbers of individuals.
Which estimators drastically overshoot the observed species richness early on, and which
are more stable? For the estimators where confidence intervals are provided, plot them
and see if/where they overlap with observed species richness. (Remember to use the
results from the sample with replacement option in the Diversity Settings to get
meaningful confidence intervals for the estimators.)
Plot number of individuals vs. the diversity indices and their confidence intervals.
(Note: to get 95% CIs, take the index value plus or minus 1.96 times the SD.) Here, use the "sample with replacement"
output. How sensitive are the diversity indices to sampling intensity?
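The confidence interval arithmetic is simple enough to do in a spreadsheet, but for completeness here is a sketch in Python. The mean and SD values are made up, and the 1.96 multiplier assumes an approximately normal sampling distribution.

```python
def ci95(mean, sd):
    """Approximate 95% confidence interval: mean plus or minus 1.96 * SD."""
    half_width = 1.96 * sd
    return mean - half_width, mean + half_width

# Hypothetical diversity index value and its SD at one point on the curve.
lower, upper = ci95(3.2, 0.1)
print(lower, upper)
```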
Comparing estimates of species richness between two sites: rarefaction,
estimators, diversity indices. Load the input file PozoAzulEstSformat.txt. Choose the
same parameters as above in the diversity settings menu. Alternatively, if you chose "save
settings" with your first data file, you can paste the parameter record into your second data
file. (Make sure that the number of species and samples in your second data file, the first
two columns in the parameter row, are correct.) Compute diversity stats using the menu,
or use Ctrl+D.
When you hit compute, EstimateS will prompt you with a window that says,
"There are already some Diversity Statistics in the EstimateS data file. OK to erase
them?" Hit OK, and your new diversity statistics will appear in the output window. Save
them in a text file by clicking export.
Now you can use the three approaches (rarefaction, estimators, and diversity
indices) to compare the species richness at these two sites. Which site has the lower
number of individuals? When you compare the two sites at this number of individuals,
which is more diverse? Do the confidence intervals overlap? Does the particular approach
that you use affect the outcome?
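The output rows rarely fall exactly on the number of individuals you want to compare at, so some interpolation is needed. The sketch below uses simple linear interpolation between neighbouring rows and a basic interval-overlap check; the curve values are invented, and the function names are hypothetical.

```python
def interpolate(xs, ys, x):
    """Linearly interpolate y at x from a curve given as increasing xs and matching ys."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the curve")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            if x1 == x0:
                return ys[i]
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    return ys[-1]                            # reached only for a single-point curve

def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical curve: Individuals vs. Sobs for the more intensively sampled site.
individuals = [10, 20, 30]
sobs = [5, 8, 10]
print(interpolate(individuals, sobs, 25))    # 9.0
print(intervals_overlap((7.5, 10.5), (9.0, 12.0)))
```

Apply the same interpolation to the lower and upper confidence bounds before checking whether the intervals for the two sites overlap.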
About the data files: these data are from two 0.5 ha surveys of woody plants in mid-elevation rain
forests in Costa Rica, including all woody stems ≥2.5 cm at 1.3 m from the rooting point. Pozo Azul is a
secondary forest c. 40 yrs old on abandoned pasture with some remnant trees; Finca Los Nacientes is a
heavily logged but never clear-cut forest. Each sample in the data files is a 10 x 2 m subsection of the
survey transect. Known species have a six-letter code name (first three letters of genus + first three letters
of specific epithet; e.g., Allomarkgrafia plumeriiflora = Allplu); species yet to be determined from
vouchers are indicated with a morphoname (e.g., "2 fat glands").
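If you want to generate code names of this kind for your own species lists, the rule can be written as a one-line function. This sketch handles only plain "Genus epithet" names; the capitalization of the result is an assumption based on the single example above.

```python
def species_code(binomial):
    """Six-letter code: first three letters of genus + first three of specific epithet."""
    genus, epithet = binomial.split()[:2]
    return genus[:3].capitalize() + epithet[:3].lower()

print(species_code("Allomarkgrafia plumeriiflora"))   # Allplu
```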

Comparing species richness: estimating shared species and similarity. Load the
data file shared3sites.txt. This data file is in format 1 (species x sample matrix), with
one column of species labels and one row of sample labels, along with the required title
and parameter records. Examine the format of the file in a spreadsheet; note that each
column in the data set represents a site (rather than a sample).
Use Ctrl+U or the Shared Species menu to open the Shared Species Settings
dialog box. Under Coverage-based estimators, 10 is the recommended limit. Under
Similarity indexes and estimators, the default is to compute similarity indices but not
the bootstrap SEs, since the latter is computationally intensive. If you want SEs, check
this option. In this input file, the data are abundance-based; for more information on
working with incidence data, see the section below on incidence data or consult the help
files. As for the diversity menu, you have the option to use these settings and save them
in a copy of the data file.
Compute the shared species settings and export your data to a text file. Note that
EstimateS uses the order of samples in the data file to specify First sample and Second
sample in the output. For instance, in this case, Sample 1 = T6, Sample 2 = T23, Sample
3 = T28. Compare the similarity of the three forests. Which sites are most/least similar?
Do the confidence intervals overlap? Read about the data file, below, and see if your
results make sense given the ages of the different stands.
About the data file: these data are from three 0.1 ha surveys of woody plants in lowland rain
forests in Costa Rica: trees and shrubs ≥2.5 cm and lianas ≥0.5 cm at 1.3 m from the rooting point.
Transect 6 (T6) is an 11-yr old forest that was in pasture for >20 yrs before abandonment; T23 is a 24-yr
old forest that was in pasture for c. 18 yrs, and T28 is an old-growth forest. More detail on these data can be
found in Letcher & Chazdon (in press, Biotropica; DOI: 10.1111/j.1744-7429.2009.00517.x), and the entire
dataset of 30 transects is archived with the SALVIAS project: www.salvias.org.

Working with incidence data. Loading your input file and calculating estimated
species richness works exactly the same way as for abundance data, but you need to be
sure to report only the incidence-based indices and estimators: Chao2, Bootstrap,
Jackknife, and/or ICE. I have provided a version of the Finca Los Nacientes dataset in
which the abundance of each species per sample is reduced to 0/1 (incidence data).
Compare the estimated species richness from this datafile (FLNEstSformatinc.txt) to
the estimates you calculated for the abundance data.
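Reducing abundance data to incidence data is a one-step transformation, sketched below together with the bias-corrected form of Chao2 as it is usually published. Q1 and Q2 are the numbers of species found in exactly one and exactly two samples, and m is the number of samples. The function names and the small matrix are assumptions, and the EstimateS calculation may differ in detail.

```python
def to_incidence(matrix):
    """Reduce a samples x species abundance matrix to 0/1 incidence data."""
    return [[1 if count > 0 else 0 for count in row] for row in matrix]

def chao2(incidence):
    """Bias-corrected Chao2 richness estimate from a samples x species 0/1 matrix."""
    m = len(incidence)                              # number of samples
    freq = [sum(col) for col in zip(*incidence)]    # samples in which each species occurs
    s_obs = sum(1 for f in freq if f > 0)
    q1 = sum(1 for f in freq if f == 1)             # uniques
    q2 = sum(1 for f in freq if f == 2)             # duplicates
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))

# Hypothetical abundance data: 3 samples x 4 species.
abundance = [[4, 1, 0, 2], [2, 0, 0, 7], [1, 0, 3, 1]]
incidence = to_incidence(abundance)
print(incidence, chao2(incidence))
```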
For shared species incidence data, you need to upload two data files: a file of
summed incidence values for all your sites, and a vector showing how many samples
were combined to get the summed incidence values. I have provided examples of the
summed incidence file (sharedincdata.txt; format 3; this is the same dataset as
shared3sites.txt except with abundances of species per sample reduced to 0/1) and the
vector of sample sizes (incvector.txt). Load the data file, and then upload the vector via
the load sample sizes button in the Shared Species dialogue box, and then click
compute when the vector is successfully uploaded. In the output, you will note that some
slightly different similarity indices are listed: Chao-Jaccard Incidence-based and
Chao-Sørensen Incidence-based, instead of the abundance-based versions. The details of the
calculations can be found in Chao et al. (2005). Compare the output for the
incidence-based version to the abundance-based output from above.
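The two inputs needed for shared-species incidence analysis can be derived from per-sample data as sketched here: for each site, count the samples in which each species occurs, and record how many samples were combined. The site names echo the transects above, but the numbers are invented and the function name is hypothetical.

```python
def summed_incidence(sites):
    """Return site names, a species x site table of summed incidences, and sample counts.

    sites maps a site name to its samples x species matrix (abundance or 0/1).
    """
    names = list(sites)
    n_species = len(sites[names[0]][0])
    table = [
        [sum(1 for sample in sites[name] if sample[j] > 0) for name in names]
        for j in range(n_species)
    ]
    sample_sizes = [len(sites[name]) for name in names]   # the vector of sample sizes
    return names, table, sample_sizes

# Hypothetical data: two sites, three species, 2 and 3 samples respectively.
sites = {
    "T6": [[1, 0, 2], [0, 0, 1]],
    "T23": [[0, 3, 1], [1, 1, 0], [0, 2, 0]],
}
print(summed_incidence(sites))
```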
For further reference
The EstimateS website contains more extensive help files and descriptions of the
program, as well as an extensive bibliography and links to copies of many articles in PDF
format:
http://purl.oclc.org/estimates or http://viceroy.eeb.uconn.edu/estimates
A forthcoming book (expected publication date: 2010) will cover these and more
topics in detail, with contributions from expert authors in the field:
Biological diversity: frontiers in measurement and assessment. Edited by Anne Magurran
& Brian McGill. Oxford University Press.
Workshop attendees may also be interested in a recent paper on sampling effort:
Chao, A., Colwell, R.K., Lin, C.-W., & Gotelli, N.J. 2009. Sufficient sampling for
asymptotic minimum species richness estimators. Ecology 90(4): 1125-1133.
