I. INTRODUCTION
in categorical data, in the same way that statistical variance is used to provide a measure of
variability for quantitative variables. Among other indices, the Shannon index (Shannon-Wiener
index) stands out because it provides more information about community
code. Later on, it was widely used to determine the diversity of a sample of individuals from
an ecological community, treating species as symbols and their relative population sizes
as the probabilities.
Moreover, it has been one of the most commonly used indices because of its
application in measuring the diversity of a collection coming from an infinitely large
population from which a random sample can be drawn. This classification is most common
community’s diversity, it has been proven biased especially when all the species are not
__________________________________________________________________
Undergraduate Special Problem under the supervision of Consornia Reano,Ph.D., submitted as partial
fulfillment of the requirements in STAT 190, 2nd semester, SY 2008-2009
Several studies have adopted the index to measure other types of diversity. The
index also measures the variation of genetic, morphologic, and phenotypic characteristics.
Likewise, the Shannon index used for these diversities is biased and more often
underestimates the true population diversity, especially when the number of not sampled
conditions for Shannon index are not feasible. In biological diversity, the number of
some characteristics may not be observed in the sample. Thus, the statistical properties of
the index are questionable, restricting one to proceed on estimation of the index by means
condition. As Peet had elaborated, the use of diversity index without a significance test
What is required for the solution of these inference problems is the sampling
distribution of the estimate of the Shannon index. Several methods have been proposed to
obtain improved estimates that take into account its distribution and other statistical
properties.
Zahl (1977) applied the jackknife method to the estimation of the diversity index
and showed the advantages of the method when the random sampling of the species or
improved estimate for Shannon index. Bootstrapping as a technique that allows one to
by a single sampling. Through this process, the scatter found enables to estimate the
knowing the distribution of the population. Also, the bootstrap provides a way to
interval methods, the normal approximation method, the standard percentile method, and the
bias-corrected method seem to be the ones that best take advantage of bootstrapping's
benefits. They are automatic and relatively simple to program, easy to understand
conceptually, and applicable, perhaps, to any statistic computed from a simple random
sample. For these reasons, these methods are the leading candidates for application in
Over the past decade, substantial attention has been paid to the development of
various population parameters. This study presents a bootstrap simulation for determining
the Shannon index using the three most general and practical methods for the biological
sciences, namely: the normal approximation method, the standard percentile method, and
the bias-corrected method. It also explores the sampling mean coverage of the confidence
intervals for the index when the number of species or subjects in the sample itself is
In general the study aims to utilize the bootstrap confidence interval methods for
Shannon index.
as the normal approximation method, the standard percentile method, and the
bias-corrected method;
3. assess the interval based on its statistical properties such as mean coverage,
4. compare the intervals obtained using the different interval construction methods.
Shannon index, similar to other indices for different field simplifies the
monitoring the diversity condition of an ecological space through time, reflecting the
represent diversity of the collection could help if not maintain healthy community. Since
policies this study may be beneficial to the user of Shannon index. In addition,
identifying the bootstrap method that will give a best coverage of an interval will improve
the analysis of the index. Furthermore, the problem of interpreting this diversity
by ecologists since the late 1950s. However, given the different types of collection, Pielou
(1966) recommended the use of the Shannon index for a large collection where a random
sample can be drawn and the number of species is known. Since the basis is only a
portion of the whole collection, one cannot determine the true population diversity but can
only estimate the average diversity from a sample. The estimated value of the Shannon index
comes from the incomplete knowledge yielded by a sample and thus has sampling error.
is the proportion of the ith species of a population with s species. The value of H is
estimated using field data as
$$\hat{H} = -\sum_{i=1}^{s} p_i \ln p_i ,$$
where $p_i = n_i/N$ is the proportion of the ith species in the sample. However, this method yields a biased estimator with expected mean
$$E(\hat{H}) = -\sum_{i=1}^{s} p_i \ln p_i - \frac{s-1}{2N} + \frac{1 - \sum_{i=1}^{s} p_i^{-1}}{12N^2} + \cdots$$
and variance
$$\operatorname{var}(\hat{H}) = \frac{\sum_{i=1}^{s} p_i (\ln p_i)^2 - \left(\sum_{i=1}^{s} p_i \ln p_i\right)^2}{N} + \frac{s-1}{2N^2} + \cdots$$
(Hutcheson 1970). The bias may only be allowed when the most of the species are
Several methods have been employed to obtain improved estimates of the
Shannon index. Pielou repeatedly computed the index of a sample, adding new quadrats
in random order one at a time, until the index showed no significant difference. Monk and
McGuinis (1966) implemented a similar sequential procedure for the ratio of the number
of log number of individual. Using Pielou's method, Heyer and Berven (1973) repeated
the procedure for different random orderings of the quadrats. This method provided an
improved standard error. However, these methods failed to provide significance tests or
confidence intervals for sampling done when the number of species in the sample is
subject to variation.
Shannon index. The method was employed by systematically dropping out quadrats one
at a time and assessing the variation in the resulting index. It automatically took into
account the restriction on field sampling and showed approximately normally distributed
procedure as a tool to correct the bias of the usual estimation procedure for species
diversity. She compared the relative reduction in bias of the jackknife estimates from the
computed values using sample based procedure of the three diversity indices, including
nonparametric estimates of the standard error of an estimator. It showed that the bootstrap
performs notably better than the jackknife in estimating the standard error of correlation
coefficient from a bivariate normal model. Hall (1989) generalized that bootstrap
methods are simulation methods for assessing sampling properties of the statistical
estimates.
Several studies extended the index to evaluate other types of diversity. Genetic
Jain et al. (1975) adopted the Shannon index to examine the geographical
patterns of the phenotypic diversity in the world collection of the durum wheats
(Triticum turgidum). In the study, durum wheats were classified for different observable
characteristics, each in different number of classes. The Shannon index, $H = -\sum_{i=1}^{n} p_i \ln p_i$,
was employed, where n is the number of phenotypic classes for a character and pi is the
proportion of total number of entries in the ith class. Moreover, Jain et al, used the
diversity in the Ethiopian noug germplasm collections across a wide range of characters
phenotypic frequencies of the characters were analyzed by the Shannon diversity index
(H’) in order to estimate the diversity of each character within each province. The result
supported Yang et al. (1991) assertion that the value of the index increases with the
increase in polymorphism and reaches the maximum value when all phenotypic classes
have equal frequencies. Also the variance of H’ has not been characterized. However,
assuming that the eight characters used in the study represent a random sample of all
possible characters of noug plant, an empirical variance was computed from the eight
estimates of Shannon’s diversity index. It was concluded that the utility of germplasm
collection to research programmes designed to locate genes depends on adequate
sampling procedures.
Korean germplasm of rice with different levels of resistance to blast in the Philippines.
the leaves of each Korean germplasm. The bootstrap values were generalized and come in
percentages, which can be considered as statistical tests (confidence limits) on the validity
of the various groups. She discussed further that the higher the percentage, the greater the
compared theoretically the three confidence interval method and highlighted the
parametric assumption about the distribution of the estimator but demands the least
accuracy when the sample is small. Taking the advantages of percentile method, the bias-
Bootstrap Procedure
Efron and Tibshirani (1993) simplified the steps of the generic bootstrapping
procedure. This was followed to bootstrap the Shannon diversity index. Suppose a
random sample x1, x2, ..., xN is drawn from an unspecified probability distribution F, so that xi ~ind F, for
which the parameter of interest is to be estimated. The basic steps in the procedure are as
follows:
values xi, assigning 1/n at each data point. This is the empirical distribution
4. Repeat the steps 2 and 3 B times, where the B is a large number. The
probability 1/B at each point, θˆ1* , θˆ2* ,…, θˆB* . This distribution is the
interval for Shannon diversity index namely: normal approximation method, standard
constructing confidence intervals. The method assumes that the statistics follow a normal
distribution; however no analytic standard error formula for it exists. The bootstrapped
sampling distribution can be surrogated to estimate the standard error. This estimation is
samples.
The percentile method, on the other hand, takes literally the notion that
approximates the empirical distribution F( ). The basic approach in finding the lower
percentiles of the distribution. The values θ̂* will be sorted so that the values
of θ̂* at the 2.5th and 97.5th percentiles can easily be determined. Thus, given
that B = 1000, the 25th lowest value of θ̂* will be the lower limit and the 25th highest value
center of the point estimate. This allows finding asymmetric intervals. The confidence
bootstrap estimates that are larger than the original sample estimate or the bootstrap
population. If the distribution is already centered, that is, if p* is 0.50, then it will turn out
Equivalently, the endpoints can also be obtained using the cumulative probability
lower limit.
Thus the bias corrected percentile limits are the 100 % and 100 % percentile
the IRRI germplasm bank will be used in the study. Each phenotypic character, denoted
by Yi, of each rice accession was scored or measured in accordance with the procedure
described in Descriptors for Rice provided by IBPGR-IRRI (1980). For agronomic
classes defined by µ ± kσ, where k = 1, 2, 3, 4, and 5, µ is the mean and σ is the standard deviation
Samples of rice accessions of sizes 3000, 1000, 500, 100, and 50 were drawn
randomly from the rice collection. Each was denoted as the original sample. The (1)
Shannon index ( Ĥ ) and (2) number of states ( R ) of each descriptor, Yi, were computed using the
formulas $\hat{H} = -\sum_{i=1}^{n} p_i \ln p_i$, where n is the number of phenotypic states for a descriptor and
pi is the proportion of the total number of entries in the ith state, and $R = \exp(\hat{H})$,
respectively. Simple random samples with replacement were drawn from the original
sample, to be noted as bootstrap resamples. Then, for each resample, the (1) Shannon
index and (2) number of states observed were estimated. One hundred bootstrap resamples
were conducted so that the (3) normal approximation confidence interval, (4) standard
percentile confidence interval, (5) bias-corrected confidence interval, and (6) bootstrap
mean, median, and standard error of the Shannon index could be estimated. The empirical
coverage at the nominal 0.95 confidence level will be used for each confidence interval.
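The resampling step described above can be sketched in Python as follows. This is only an illustrative sketch, not the SAS or STATA code used in the study: the function names, the state labels, and the seed are assumptions, and the state counts (59, 35, 3, 3) simply mirror the sample of 100 reported in the results.

```python
import math
import random
import statistics
from collections import Counter


def shannon_index(counts):
    """Return H = -sum(p_i * ln p_i), skipping states with zero count."""
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_shannon(sample, n_resamples, seed=0):
    """Draw same-size resamples with replacement and return H for each one."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=len(sample))
        estimates.append(shannon_index(Counter(resample).values()))
    return estimates


# Illustrative original sample of 100 accessions scored on one descriptor.
# The labels are hypothetical; only the counts matter for the index.
sample = ["state_1"] * 59 + ["state_2"] * 35 + ["state_3"] * 3 + ["state_4"] * 3

h_hat = shannon_index(Counter(sample).values())   # (1) Shannon index
richness = math.exp(h_hat)                        # (2) R = exp(H-hat)

boot = bootstrap_shannon(sample, n_resamples=1000)
boot_mean = statistics.mean(boot)
boot_median = statistics.median(boot)
boot_se = statistics.stdev(boot)                  # bootstrap standard error

print(f"H-hat = {h_hat:.5f}, R = {richness:.3f}")
print(f"bootstrap mean = {boot_mean:.5f}, median = {boot_median:.5f}, "
      f"SE = {boot_se:.5f}, bias = {boot_mean - h_hat:.5f}")
```

States that drop out of a resample contribute nothing to the sum, which is one reason a bootstrap estimate can fall below the original sample estimate.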
Since it is of interest to know the behavior of the Shannon diversity index
with a varying number of classes or states of a descriptor, different conditions were set.
The richness values, or the number of states for each descriptor, were determined. The
For each original sample, 200, 500, 1000, 1500, 2000, 5000, and 10000 random
resamples were drawn to make the bootstrap estimation and examine the performance of the
index as the number of resamples changes. Each set of resamples was subjected to determining
bootstrap confidence intervals were constructed for each of the method. Then, the
Shannon index. The efficiency of the method was assessed based on the Average Range
(AR) of the set of estimated confidence intervals. The measure of coverage sufficiency was
estimated intervals actually cover the population index given a prescribed level of
confidence. The RCR of the methods was compared to determine the most efficient
Statistical Analysis Software (SAS) and STATA Software were used for the
analysis.
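To make the three interval constructions and the coverage measures concrete, the sketch below implements them in Python with the standard library only. It is a hedged illustration, not the study's SAS or STATA code. Assumptions: the bias-corrected limits follow Efron's usual form, with p* taken as the proportion of bootstrap estimates below the original estimate; the Average Range is the mean interval width; the Realized Coverage Rate is the share of intervals containing a target value; and the helper names, seed, and run counts are hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from statistics import NormalDist


def shannon_index(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def bootstrap_shannon(sample, n_resamples, rng):
    n = len(sample)
    return [shannon_index(Counter(rng.choices(sample, k=n)).values())
            for _ in range(n_resamples)]


def _order_stat(ordered, q, upper):
    """Order statistic at quantile q (25th lowest / 25th highest when B = 1000)."""
    b = len(ordered)
    k = int(round(q * b))
    idx = k if upper else k - 1
    return ordered[min(b - 1, max(0, idx))]


def normal_ci(theta_hat, boot, alpha=0.05):
    """Original estimate +/- z * bootstrap standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = statistics.stdev(boot)
    return theta_hat - z * se, theta_hat + z * se


def percentile_ci(boot, alpha=0.05):
    """Read the limits directly off the sorted bootstrap estimates."""
    ordered = sorted(boot)
    return (_order_stat(ordered, alpha / 2, False),
            _order_stat(ordered, 1 - alpha / 2, True))


def bias_corrected_ci(theta_hat, boot, alpha=0.05):
    """Percentile limits shifted by z0 = inverse-normal of p* (assumed convention)."""
    nd = NormalDist()
    b = len(boot)
    p_star = sum(t < theta_hat for t in boot) / b
    p_star = min(max(p_star, 1 / (b + 1)), b / (b + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(p_star)
    z = nd.inv_cdf(1 - alpha / 2)
    ordered = sorted(boot)
    return (_order_stat(ordered, nd.cdf(2 * z0 - z), False),
            _order_stat(ordered, nd.cdf(2 * z0 + z), True))


def average_range(intervals):
    return statistics.mean(hi - lo for lo, hi in intervals)


def realized_coverage_rate(intervals, target):
    return statistics.mean(1.0 if lo <= target <= hi else 0.0
                           for lo, hi in intervals)


# Illustrative run: counts mirror the sample of 100; labels are hypothetical.
rng = random.Random(1)
sample = ["s1"] * 59 + ["s2"] * 35 + ["s3"] * 3 + ["s4"] * 3
h_hat = shannon_index(Counter(sample).values())
target = 1.0098  # population index reported in the text

intervals = {"normal": [], "percentile": [], "bias_corrected": []}
for _ in range(20):  # 20 repeated bootstrap runs of 200 resamples (assumed sizes)
    boot = bootstrap_shannon(sample, 200, rng)
    intervals["normal"].append(normal_ci(h_hat, boot))
    intervals["percentile"].append(percentile_ci(boot))
    intervals["bias_corrected"].append(bias_corrected_ci(h_hat, boot))

for name, ivs in intervals.items():
    print(name, round(average_range(ivs), 5), realized_coverage_rate(ivs, target))
```

When p* equals 0.50, z0 is zero and the bias-corrected limits coincide with the percentile limits, so the two methods differ only when the bootstrap distribution is not centered on the original estimate.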
II. RESULTS AND DISCUSSION
descriptors and five descriptors with actual value in the data. Some of the descriptors that
contain rice entries with no recorded observation were dropped from the analysis. Table 1
shows the Shannon Diversity Index of each of the descriptors of the rice collection.
Table 1. Shannon Diversity Indices of the rice descriptors of the population of rice
collection and the number of state observed.
*Quantitative Variables
The descriptor with the highest Shannon index is the Culm Length with the index
of 1.723441 with seven states observed, followed by Panicle Threshability with the index
of 1.509179 with nine states. On the other hand, the descriptor with the lowest Shannon
index is found to be the Endosperm Type with the index of only 0.228995 having two
states.
It is also observed that all five uncoded quantitative variables acquired an index
larger than 1. Among them, Grain Length has the highest Shannon diversity index of 1.464131 with
the complete 10 states observed. This is followed by 100-Grain Weight with an index of
1.442009, also with the complete 10 states observed. Grain Width, Main Heading, and
Ligule Length have indices higher than 1 with detected states of nine, seven, and eight,
respectively.
The Lemma Palea Color has the highest number of states detected with 11 states
observed, providing a Shannon diversity index of 1.418971. On the other hand, the
Endosperm Type and the Culm Diameter incur only two states, providing an index of
Sample of 3,000
Using the original sample of 3,000 from the population of the Blade Colors of the
rice collection, which has a total of 9,105 rice entries, the Shannon diversity index
estimate of the Blade Color is found to be 1.0091. This estimate has a very small
bias of -0.0007, which can be attributed to the proportions of the descriptor's
states in the sample of 3,000 closely matching those in the population collection, as presented
in Figure 1 and Figure 2. The two graphs exhibit almost the same collection distribution
Out of 9,105 rice with the recorded Blade Color, 5,602 samples have the color of
Pale Green with the registered proportion of 0.61527 among all colors. It is followed by
Dark Green state with 2,564 observations and a proportion of 0.28160. However, there
are only 13 rice entries with the Purple Tip state, giving only 0.00143 of the collection.
Figure 1. Pie graph of the proportional distribution of the Blade Color’s states on
the population of rice collection.
Similar to the population collection, rice having Pale Green on Blade Color incurs
the highest proportion of 0.62167 in the sample of 3,000 rice entries. Also, there are only
Figure 2. Pie graph of the proportional distribution of the Blade Color’s states
on sample of 3,000.
All of the bootstrap estimates for the different numbers of resamples have an index value
close to the original sample estimate of 1.0091, as indicated by the small bias for
each number of resamples. All the bootstrap estimates underestimated the original sample index.
The most accurate estimate is produced by the bootstrap with 1,000 resamples, with an
index value of 1.00893. On the other hand, the bootstrap with 200 resamples has the
least accurate estimate, with an index value of 1.00718. The standard errors of
the estimates using different numbers of resamples are found to be reliable with value
The 95% Confidence Intervals constructed using the three methods do not vary
significantly and cover the original sample estimate and the population index.
All the resamples are subjected to testing the normality of the distribution of the
estimates and it was verified that the distribution of the estimates follows a normal
distribution.
Table 3. The Bootstrap estimates and statistical properties of the Shannon Index of the
Blade Color on the 3,000 original samples with different bootstrap resamples.
Sample of 1,000
Taking a sample of 1,000 from the population of the Blade Colors of the rice
collection, the Shannon index estimate of the Blade Color is found to be 0.9977. This
index estimate is smaller than the index estimated with 3,000 samples. Moreover, this
estimate has a small bias of -0.01203. Furthermore, it can be observed that the
proportional distribution of the rice collection on the different blade color’s states is
closely the same in the population and in the sample with 1,000 observations, as
Descriptor's State    Frequency (Population)    Frequency (Sample)
Pale Green            5602                      621
Green                 361                       32
Dark Green            2564                      285
Purple Tips           13                        3
Purple Margins        423                       40
Purple Blotch         40                        7
Purple                102                       12
Total                 9105                      1000
The sample of 1,000 rice entries is dominated by the state of having Pale Green
with the number of 621 or a proportion of 0.62100; followed by Dark Green with the
proportion 0.28500. On the other hand two states have samples with less than ten
observations. There are 3 and 7 rice entries with Blade Color of Purple Tips and Purple
Blotch, respectively.
Given the original sample index estimate of 0.99774, all the bootstrap estimates
produced from the sample of 1,000, underestimated the original sample estimate and even
Table 5. The Bootstrap estimates and statistical properties of the Shannon Index of the
Blade Color on the 1,000 original samples with different bootstrap resamples.
However, all the bootstrap estimates in different number of resamples are accurate
based on its bias. The number of resamples of 1,000 provides the most accurate estimate
with the index of 0.99523 and the value of bias of -0.00251. On the other hand, the least
accurate estimate is found in the number of resamples of 200 with the index of 0.99378
Sample of 500
Taking 500 observations as the original sample from the population of the Blade
Colors, the Shannon index estimate of the Blade Color is found to be 0.9079. This
estimate has a small bias of -0.101839. Moreover, this index estimate is smaller than
the index estimates with 3,000 and 1,000 samples. Furthermore, it can be stated that the
proportional distribution of the rice collection on most of the blade color's states is
closely the same in the population and in the sample with 500 observations, as presented
in Figure 1 and Figure 4. However, no observation is collected for the state of Purple
Descriptor's State    Frequency (Population)    Frequency (Sample)
Pale Green            5602                      324
Green                 361                       14
Dark Green            2564                      137
Purple Tips           13                        1
Purple Margins        423                       22
Purple Blotch         40                        0
Purple                102                       2
Total                 9105                      500
Shannon Index         1.0098                    0.907940
Similar to the samples of 3,000 and 1,000 entries, the Blade Color state of Pale
Green has the highest number of sampled entries, with 324 in number or 0.64800 in proportion.
There are only one and two entries in the sample that are classified with Purple Tips and
Purple, respectively. Moreover, no entry was found in the state of having the Blade Color
of Purple Blotch.
With the original sample index estimate of 0.90794, all the bootstrap estimates in
different number of resamples underestimate the original sample estimate. However, all
the bootstrap estimates in different number of resamples are accurate based on its bias.
The number of resamples of 1,000 provides the most accurate estimate with the index of
0.90398 and the value of bias of -0.00396. On the other hand, the least accurate estimate
is found in the number of resamples of 2,000 and 5,000 with the bias of the estimate of
-0.00582.
resamples do not vary significantly which ranges from 0.03816 to 0.04055. The most
precise estimate is given by the bootstrap estimate with 500 and 2,000 resamples
As with the intervals in the 3,000 and 1,000 original samples, the
95% confidence intervals in the 500 original sample produced using the three methods do
not vary significantly and cover the original sample estimate. The distribution of the
Table 7. The Bootstrap estimates and statistical properties of the Shannon Index of the
Blade Color on the 500 original samples with different bootstrap resamples.
Sample of 100
A random sample of 100 rice accessions, as shown in Table 8, incurred only four
observed states. No accession entry has a Blade Color of Purple, Purple Tips, or Purple
Blotch in the sample. This sample collection produced a Shannon index value of 0.88913,
Descriptor's State     Frequency (Population)    Frequency (Sample)
Pale Green (60)        5602                      59
Green (61)             361                       3
Dark Green (63)        2564                      35
Purple Tips (80)       13                        0
Purple Margins (85)    423                       3
Purple Blotch (86)     40                        0
Purple (89)            102                       0
Total                  9105                      100
Shannon Index          1.0098                    0.88913
Figure 5. Pie Graph of the proportional distribution of the Blade Color’s states on sample
of 100.
In Table 9, all the Bootstrap estimates on the sample of 100 using the different
number of bootstrap resamples underestimate the original sample estimate of the Blade
color’s Shannon index. The most accurate index is provided by the bootstrap estimate
with 1000 resamples having a bias of only -0.01317. However, trend on the bootstrap
resamples 2,000 and higher. Moreover, the confidence interval using these resamples
Table 9. The Bootstrap estimates and statistical properties of the Shannon Index of the
Blade Color on the 100 original samples with different bootstrap resamples.
Sample of 50
Shannon index estimate of 0.87317 with the bias of -0.1366. Rice entries registered only
on the three Blade color’s states namely Pale Green, Green, and Dark Green.
Similar to the previous sample sizes, all the bootstrap estimates on the
sample of 50 using the different numbers of bootstrap resamples underestimate the
original sample estimate of the Blade Color's Shannon index. The most accurate index is
provided by the bootstrap estimate with 1,000 resamples, having a bias of only -0.01431.
However, the standard errors of the bootstrap estimate in this sample indicated relatively
Descriptor's State     Frequency (Population)    Frequency (Sample)
Pale Green (60)        5602                      30
Green (61)             361                       4
Dark Green (63)        2564                      16
Purple Tips (80)       13                        0
Purple Margins (85)    423                       0
Purple Blotch (86)     40                        0
Purple (89)            102                       0
Total                  9105                      50
Shannon Index          1.0098                    0.87317
For all resamples, it can be noticed that the upper bound of the interval using
percentile method failed to reach and cover the population index of 1.0098. Bias-
corrected method for 1000 resamples has upper bound less than the population index.
Table 11. The Bootstrap estimates and statistical properties of the Shannon Index of the
Blade Color on the 50 original samples with different bootstrap resamples.
sample and the frequency observed in the Blade color's states. Furthermore, it also
shows that the population index of 1.0098 is estimated using samples of 3,000, 1,000, 500, 100, and 50
with the values 1.0091, 0.9977, 0.9079, 0.88913, and 0.87317, respectively. Relative to the
Shannon index on the population of the Blade Color in the collection of rice, the Shannon
index estimate in the sample decreases as the sample size decreases. This shows
that when fewer samples are gathered from the collection, the estimate will more likely
underestimate the true population index. This can be attributed to having no observation
in some of the states of the Blade Color sampled. In the sample of 500 observations, no
sampled rice has a Blade Color of Purple Blotch. In addition, the sample of 100 and 50
Table 12. Comparison of the distribution of Blade Color’s states and Shannon Index on
the different samples.
State             Frequency
                  Population    3000    1000    500    100    50
Pale Green        5602          1865    621     324    59     30
Green             361           114     32      14     3      4
Dark Green        2564          818     285     137    35     16
Purple Tips       13            3       3       1      0      0
Purple Margins    423           147     40      22     3      0
Purple Blotch     40            11      7       0      0      0
Purple            102           42      12      2      0      0
Total             9105          3000    1000    500    100    50
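As a worked check, the frequencies in Table 12 can be used to recompute the Shannon index for the population and for each sample. The Python sketch below is an illustration only: it assumes natural logarithms and that states with zero frequency are skipped, and the dictionary and variable names are hypothetical.

```python
import math


def shannon_index(counts):
    """H = -sum(p_i * ln p_i) over states with a non-zero frequency."""
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


# Frequencies copied from Table 12, in the order: Pale Green, Green, Dark Green,
# Purple Tips, Purple Margins, Purple Blotch, Purple.
table_12 = {
    "Population": [5602, 361, 2564, 13, 423, 40, 102],
    "3000": [1865, 114, 818, 3, 147, 11, 42],
    "1000": [621, 32, 285, 3, 40, 7, 12],
    "500": [324, 14, 137, 1, 22, 0, 2],
    "100": [59, 3, 35, 0, 3, 0, 0],
    "50": [30, 4, 16, 0, 0, 0, 0],
}

indices = {name: shannon_index(counts) for name, counts in table_12.items()}
observed_states = {name: sum(1 for c in counts if c > 0)
                   for name, counts in table_12.items()}
bias = {name: h - indices["Population"] for name, h in indices.items()}

for name in table_12:
    print(f"{name:>10}: states = {observed_states[name]}, "
          f"H = {indices[name]:.5f}, bias = {bias[name]:+.5f}")
```

Under these assumptions the recomputed values agree with the indices quoted in the text (about 1.0098 for the population, then 1.0091, 0.9977, 0.9079, 0.88913, and 0.87317), and the number of observed states falls from seven to three as the sample size decreases.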
The Shannon index of the descriptor is also examined when the number of
states is varied with the same number of samples. Table 8 confirms that
the Shannon diversity index decreases as the number of states or classes decreases. The
amount of decrease also increases as the number of detected states gets closer to zero.
Furthermore, the Shannon index is observed to be equal to zero when there is only one
Table 13. The frequency distribution on the Blade Color’s State on the sample of 1,000 in
different number of states observed
The Shannon indices of different descriptors with different numbers of states were
also analyzed. Table 14 presents seven descriptors with increasing population index as
the number of states increases. However, there is no descriptor with six states found in the
rice collection. It can be observed that Blade Color with seven states has a Shannon index
of 1.00978. This index is less than the index of Leaf Length, which has only five states but
an index of 1.02823. Thus, it can be said that the rice collection's Leaf Length is more
diverse than Blade Color. The same behavior is followed for the samples of 100 and 50.
Table 14. Shannon index of different descriptors with different number of states
The sample of 100 was used for the analysis of bootstrap confidence interval
since only after this resample variation on the index was identified. Original samples with
large number, from 500 to 3,000 for instance, produced a normal distribution for the
confidence intervals that cover the true parameter index. Thus, a relatively small sample
was utilized.
The 95% confidence interval using the normal approximation method for different
bootstrap resamples with the original sample of 100 observations is presented in Table
10. It is revealed that the confidence interval using normal approximation with 2,000
resamples has the narrowest length; while the bootstrap estimate with 5,000 resamples
has the widest coverage of interval. Figure 7 illustrates the coverage of the intervals and it
can be noticed that the intervals are close to one another. All the intervals covered the
population index of Blade color with the value of 1.00978 indicated by the solid vertical
[Figure: confidence limits by number of resamples (200, 500, 1000, 1500, 2000, 5000). Axes: Shannon Index and Resamples. Legend: Original Sample Estimate, Lower Limit, Upper Limit.]
The 95% confidence interval using the percentile method for different bootstrap
resamples with the original sample of 100 observations is presented in Table 16. It is
revealed that the confidence interval using the percentile method with 200 resamples has
the narrowest length, while the bootstrap estimate with 1,500 resamples has the widest
coverage of interval.
confidence intervals in different resamples are not symmetric with respect to the
bootstrap estimate. Notably, the upper bounds of all the intervals lie near the
population index of Blade Color. Only the interval with 2,000 resamples did not cover the parameter
of Shannon index.
[Figure: confidence limits by number of resamples (200, 500, 1000, 1500, 2000, 5000). Axes: Shannon Index and Resamples. Legend: Lower Limit, Upper Limit.]
Bias-Corrected Method
The 95% confidence interval using the bias-corrected method for different
bootstrap resamples with the original sample of 100 observations is presented in Table
17. As with the normal approximation method, the confidence interval with 2,000 resamples
registered the narrowest length among the different numbers of resamples implemented;
also, the bootstrap estimate with 200 resamples has the widest coverage of interval.
method, the skewness in the intervals is more evident using the bias-corrected method.
The lower bounds for all the resamples converged to a certain value, while the upper
[Figure: confidence limits by number of resamples (200, 500, 1000, 1500, 2000, 5000). Axes: Shannon Index and Resamples. Legend: Original Sample Estimate, Lower Limit, Upper Limit.]
The three bootstrap confidence intervals are further analyzed within the different
numbers of resamples as presented in Figure 10. For all the different numbers of resamples
specified, the lower and upper bounds of the percentile method reached the lowest index
values for the confidence interval of the Blade Color among the three methods, except only
for the lower bound with 200 resamples. On the other hand, all the lower and upper
bounds of the confidence intervals for all the numbers of resamples specified using the
normal approximation arrived at the highest values of the Shannon diversity index of
[Plot: intervals grouped by number of resamples (200, 500, 1000, 1500, 2000, 5000, 10000) and by method (P, NA, BC). x-axis: Shannon Index. Legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit.]
Figure 10. Comparison of the Confidence Interval using different methods in different
numbers of resamples
100 Confidence Intervals
From the 100 constructed confidence intervals using the Normal Approximation
method, all the intervals cover the original bootstrap estimates. It can also be observed in
Figure 9 that the confidence intervals are symmetric with reference to the bootstrap
estimate.
[Plot: 100 confidence intervals. y-axis: Interval. x-axis: Shannon index.]
Figure 11. One Hundred Bootstrap confidence interval using Normal Approximation
Method
Unlike other methods, fifty-five out of 100 confidence intervals using Percentile
method covered the true population parameter. The other forty-five intervals have upper
bound less than the parameter. Moreover, the intervals produced are no longer symmetric
[Plot: 100 confidence intervals. y-axis: Interval. x-axis: Shannon index. Legend: Original Sample estimate, Bootstrap estimate, Lower limit, Upper limit.]
Figure 12. One Hundred Bootstrap confidence interval using Percentile Method
The Bias-Corrected method also generated 100 confidence intervals that cover the
original bootstrap estimate like the previous two methods. No interval is found to be
symmetric with regard to bootstrap estimate and it can be observed that intervals are
[Plot: 100 confidence intervals. y-axis: Case Number. x-axis: Value. Legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit.]
Figure 13. One Hundred Bootstrap confidence intervals using Bias Corrected Method
Confidence Interval Measures
Using the 100 generated confidence intervals for each of the three methods, the
properties of the intervals were compared, as presented in Table 14.
As a measure of the interval’s accuracy, the Average Range of the ranges of the 100
intervals indicates that the Bias-corrected method produced the most accurate interval
with the mean range of 0.26601. It was followed by Bias-Corrected and Normal
The Expected Coverage Rate and Realized Coverage Rate of the Normal and
Bias-corrected methods are the same, with rates equal to 0.95 and 1.00, respectively.
The Percentile method, on the other hand, has a realized coverage rate of 0.55. This
means that the intervals constructed under the Normal approximation and Bias-corrected methods
have the same number of intervals that actually covered the original sample estimate.
Also, only 55 percent of the intervals are expected to contain the parameter index using the
Percentile method. Thus, the Normal approximation and Bias-corrected methods are more
calibration rate of 0.99646, which implies that the average range of each method needs a
downward adjustment for the estimated confidence intervals to equalize the realized
coverage rate with the expected coverage rate. On the other hand, the Percentile method has a
calibration rate of 1.03454, which calls for an upward adjustment. However, based on the
methods' Calibrated Average Rate, the Bias-corrected method is the most efficient
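The two measures that the text defines explicitly, the average range and the realized coverage rate, can be sketched as below. This is a minimal illustration and not the study's code: the function names and the four example intervals are hypothetical, and the calibration rate is left out because its formula is not given in this section.

```python
def average_range(intervals):
    """Mean width (upper limit minus lower limit) of a list of (lower, upper) intervals."""
    return sum(upper - lower for lower, upper in intervals) / len(intervals)


def realized_coverage_rate(intervals, target):
    """Share of intervals that actually contain the target value.

    In the text the target is the original sample estimate of the Shannon index.
    """
    covered = sum(lower <= target <= upper for lower, upper in intervals)
    return covered / len(intervals)


# Hypothetical intervals and target, for illustration only (not the study's data).
example_intervals = [(0.80, 1.00), (0.90, 1.10), (1.00, 1.20), (0.70, 0.85)]
example_target = 0.95

print("average range:", average_range(example_intervals))
print("realized coverage rate:", realized_coverage_rate(example_intervals, example_target))
print("expected coverage rate:", 1 - 0.05)  # nominal level of a 95% interval
```

A narrower average range is read in the text as a more accurate interval, while the realized coverage rate is compared with the expected rate of 0.95.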
It is also of interest to know the behavior of the Shannon index as the number of
bootstrap resamples varies. The original sample of size 100 was used for this analysis, since it
is the sample size at which the index reached its critical limits in fitting the normal
distribution. Different levels of significance were set to determine the power of the
all the resamples' distributions resemble the curve of the normal distribution.
the distribution of the estimates produced by 200 and 500 resamples fits the
normal distribution at levels of significance ranging from 0.01 to 0.2. For
1,000 resamples, the distribution of the estimates differs significantly from the normal
distribution only at the 0.2 level of significance. With both 2,000 and 5,000 resamples,
the index distribution does not fit the normal distribution at a level of
significance of at least 0.05. Thus, as the number of bootstrap resamples increases the distribution of the
[Figure 14: six panels, each plotting f(x) (0 to about 0.2) against x (about 0.6 to 1.1).]
Figure 14. Comparison of the distribution of the Bootstrap estimates using different
resamples
Table 17. Goodness-of-Fit test for the normality of the different bootstrap resamples
collection. However, the estimate of this index is known to be biased, and no simple
formula for its statistical properties exists. Because it requires no such formula,
bootstrapping was used to estimate the Shannon index and to construct a confidence interval
around it.
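As a rough illustration of this procedure, the sketch below computes the plug-in Shannon index, resamples the data with replacement, and builds the three kinds of interval compared in the study. It rests on several assumptions that are not stated in the source: natural logarithms are used, the bootstrap estimate is taken as the mean of the resample values, the normal interval is centred on the original estimate, and the bias-corrected interval follows Efron's BC construction without acceleration. All function names and the sample data are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist, mean, stdev


def shannon_index(labels):
    """Plug-in Shannon index H' = -sum(p_i * ln(p_i)) over the observed categories."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def bootstrap_estimates(labels, n_resamples, rng):
    """Sorted Shannon indices of n_resamples resamples drawn with replacement."""
    n = len(labels)
    return sorted(shannon_index(rng.choices(labels, k=n)) for _ in range(n_resamples))


def quantile(sorted_values, q):
    """Quantile of an already sorted list, by linear interpolation."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def normal_interval(estimate, boot, alpha=0.05):
    """Normal approximation: estimate +/- z * bootstrap standard error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = stdev(boot)
    return estimate - z * se, estimate + z * se


def percentile_interval(boot, alpha=0.05):
    """Percentile method: alpha/2 and 1 - alpha/2 quantiles of the resample values."""
    return quantile(boot, alpha / 2), quantile(boot, 1 - alpha / 2)


def bias_corrected_interval(estimate, boot, alpha=0.05):
    """Efron's bias-corrected (BC) percentile interval, without acceleration."""
    b = len(boot)
    below = sum(v < estimate for v in boot) / b
    # Keep the proportion strictly inside (0, 1) so inv_cdf is defined.
    below = min(max(below, 1 / (b + 1)), b / (b + 1))
    nd = NormalDist()
    z0 = nd.inv_cdf(below)  # bias-correction constant
    z = nd.inv_cdf(1 - alpha / 2)
    return quantile(boot, nd.cdf(2 * z0 - z)), quantile(boot, nd.cdf(2 * z0 + z))


# Hypothetical sample of 100 accessions scored for one descriptor (not the study's data).
sample = ["green"] * 70 + ["dark green"] * 20 + ["purple"] * 10
rng = random.Random(2009)
estimate = shannon_index(sample)
boot = bootstrap_estimates(sample, 1000, rng)

print("original sample estimate:", round(estimate, 5))
print("bootstrap estimate (mean of resamples):", round(mean(boot), 5))
print("normal:", normal_interval(estimate, boot))
print("percentile:", percentile_interval(boot))
print("bias-corrected:", bias_corrected_interval(estimate, boot))
```

The difference between the bootstrap estimate and the original sample estimate gives a rough bootstrap estimate of the bias of the index.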
The behavior of the Shannon index under different bootstrap conditions
was analyzed in the study. When the size of the original sample
used for bootstrapping was varied, relatively small samples were found to have a significant effect
on the variation of the statistical properties of the Shannon diversity index. From almost
9,000 rice accessions, a sample of size 100 was found to detect significant bias and
interval coverage for the diversity index of Blade Color and other descriptors. Also, the
number of bootstrap resamples did not present any particular trend in the properties of
the index for large original samples. However, the results indicate that as the number of
resamples increases, the distribution of the Shannon index is more likely to deviate from
the normal distribution.
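A check of this kind can be sketched as follows. The goodness-of-fit test used in the study is not named in this section, so the Jarque-Bera test is substituted here purely for illustration; it is convenient because its p-value under a chi-square distribution with 2 degrees of freedom is simply exp(-JB/2). The sample data and function names are hypothetical, and the resample counts are the ones mentioned in the text.

```python
import math
import random
from collections import Counter


def shannon_index(labels):
    """Plug-in Shannon index H' = -sum(p_i * ln(p_i)) over the observed categories."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def jarque_bera(values):
    """Jarque-Bera statistic and asymptotic p-value (assumes the values are not all equal)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    return jb, math.exp(-jb / 2)


# Hypothetical original sample of size 100 (not the study's data).
sample = ["green"] * 70 + ["dark green"] * 20 + ["purple"] * 10
rng = random.Random(190)

for n_resamples in (200, 500, 1000, 2000, 5000):
    boot = [shannon_index(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)]
    jb, p_value = jarque_bera(boot)
    print(n_resamples, "resamples: JB =", round(jb, 3), "p =", round(p_value, 4))
```

Normality is rejected at a given level of significance when the p-value falls below it. Any such test gains power as the number of resamples grows, so smaller departures from normality become detectable with more resamples.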
Among the three methods used for interval construction, the Bias-corrected
method produced the most accurate, sufficient, and efficient interval based on average
range, expected and realized coverage rate, and calibration rate. On the other hand, the
Percentile method has a calibration rate of 1.03454, which calls for an upward adjustment.
The Normal Approximation method provided significant intervals for some of the resamples that