
BOOTSTRAP CONFIDENCE INTERVAL ESTIMATION FOR

SHANNON DIVERSITY INDEX:

Jundy L. Dela Puerta 1

I. INTRODUCTION

Background of the Study

Diversity index is a statistic which is intended to measure diversity or variability

in categorical data; the same way that statistical variance is used to provide measure of

variability for quantitative variables. Among other indices, the Shannon index

(Shannon-Wiener index) stands out because it provides more information about community

composition. This index was originally proposed as a measure of information content of a

code. Later on, it was widely used to determine the diversity of a sample of individuals from

an ecological community, treating species as symbols and their relative population sizes

as the probabilities.

Moreover, it has been one of the most commonly used indices because it can measure the diversity of a collection coming from an infinitely large population from which a random sample can be drawn. This is the most common scenario in which it becomes necessary to estimate diversity. As an estimator of the community's diversity, it has been proven biased, especially when not all the species are included in the sample (Pielou 1969).

__________________________________________________________________
Undergraduate Special Problem under the supervision of Consornia Reano,Ph.D., submitted as partial
fulfillment of the requirements in STAT 190, 2nd semester, SY 2008-2009
Several studies have adopted the index to measure other types of diversity. The index also measures the variation of genetic, morphologic, and phenotypic characteristics. As with the components of a biological collection, the properties and conditions of the Shannon index remain the same.

Likewise, the Shannon index used for these types of diversity is biased and more often underestimates the true population diversity, especially as the number of unsampled species or subjects increases. Moreover, in practice, the conditions assumed by the Shannon index are often not met when diversity is estimated. In biological diversity, the number of species in a collection is sometimes not independently known; for phenotypic diversity, some characteristics may not be observed in the sample. Thus, the statistical properties of the index are questionable, which restricts estimation of the index by means of confidence intervals and hypothesis tests based on the sampling imposed by field conditions. As Peet elaborated, the use of a diversity index without a significance test can be quite misleading.

What is required for the solution of these inference problems is the sampling

distribution of the estimate of the Shannon index. Several methods have been developed to

obtain improved estimates that take into account its distribution and other statistical

properties.

Zahl (1977) applied the jackknife method to the estimation of the diversity index

and showed the advantages of the method when the random sampling of the species or

subject is not satisfied.


Similar to the basic idea of the jackknife method, the bootstrap method produces an improved estimate for the Shannon index. Bootstrapping is a technique that allows one to estimate sampling variability by resampling from the empirical probability distribution defined by a single sample. The scatter found through this process makes it possible to estimate the standard error and confidence interval of the result.

In bootstrap sampling data can be created based on the observation without

knowing the distribution of the population. Also, the bootstrap provides a way to

substitute computation for mathematical analysis if calculating the asymptotic

distribution of an estimator or statistic is difficult.

However, the choice of which bootstrap confidence interval to use is highly

dependent on the practical research situation. Among several bootstrap confidence

interval methods, normal approximation method, standard percentile method, and bias-

corrected method seem to be the ones that best take advantage of bootstrapping’s

benefits. They are automatic and relatively simple to program, easy to understand

conceptually, and applicable, perhaps, to any statistic developed from a simple random

sample. For these reasons, these methods are the leading candidates for application in

different fields of science.

Statement of the Problem

Over the past decade, substantial attention has been paid to the development of

techniques using bootstrapping sampling distribution to build confidence intervals around

various population parameters. This study presents a bootstrap simulation for estimating the Shannon index using the three most general and practical methods for the biological sciences, namely: the normal approximation method, the standard percentile method, and the bias-corrected method. It also explores the mean coverage of the confidence intervals for the index when the number of species or subjects in the sample is itself subject to random variation.

Objective of the Study

In general the study aims to utilize the bootstrap confidence interval methods for

Shannon index.

Specifically, it aims to:

1. determine the probability distribution of the Shannon index;

2. construct confidence intervals for the Shannon index using bootstrap methods such

as the normal approximation method, the standard percentile method, and the

bias-corrected method;

3. assess the intervals based on their statistical properties such as mean coverage,

accuracy, precision, and reliability; and

4. compare the intervals obtained using the different interval construction methods.

Significance of the Study

The Shannon index, like other indices used in different fields, simplifies the information content of a collection. It is often of interest to compare collections and to know which collection exhibits a stable and productive community structure. In the environmental sciences, quantification through this index is a tool for monitoring the diversity of an ecological space through time, reflecting the extinction or dominance of a species or classification. Producing an index that best represents the diversity of a collection could help to maintain a healthy community. Since it is important to account for uncertainty when diversity indices are used in making policies, this study may be beneficial to users of the Shannon index. In addition, identifying the bootstrap method that gives the best interval coverage will improve the analysis of the index. Furthermore, the interpretation of this diversity measurement will improve if the statistical properties of its estimates are studied. Balanced characterization and biological conservation programs will in turn be more efficient.

II. REVIEW OF LITERATURE

The application of indices to biological diversity has been increasingly adopted by ecologists since the late 1950s. However, given the different types of collection, Pielou (1966) recommended the use of the Shannon index for a large collection from which a random sample can be drawn and in which the number of species is known. Since the basis is only a portion of the whole collection, one cannot determine the true population diversity but can only estimate the average diversity from a sample. The estimated value of the Shannon index comes from the incomplete knowledge yielded by a sample and thus has sampling error. Pielou reiterated that estimates of the Shannon index should always be accompanied by estimates of their standard errors (1969).


The diversity using the Shannon index is given by

$$H = -\sum_{i=1}^{s} p_i \ln p_i ,$$

where $p_i$ is the proportion of the $i$th species in a population with $s$ species. The value of $H$ is estimated from field data as

$$\hat{H} = -\sum_{i=1}^{s} p_i \ln p_i ,$$

where $p_i = n_i/N$ is now the proportion of the $i$th species in the sample, $n_i$ is the number of sampled individuals of species $i$, and $N$ is the sample size. However, this method yields a biased estimator with expected value

$$E(\hat{H}) = -\sum_{i=1}^{s} p_i \ln p_i - \frac{s-1}{2N} + \frac{1 - \sum_{i=1}^{s} p_i^{-1}}{12N^{2}} + \cdots$$

and variance

$$\operatorname{var}(\hat{H}) = \frac{\sum_{i=1}^{s} p_i (\ln p_i)^{2} - \left(\sum_{i=1}^{s} p_i \ln p_i\right)^{2}}{N} + \frac{s-1}{2N^{2}} + \cdots$$

(Hutcheson 1970). The bias is tolerable only when most of the species are included in the sample, which is rarely feasible (Peet 1974).
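The size of the leading bias term can be checked numerically. The following is a minimal sketch, assuming hypothetical counts for a four-species sample; the function names `shannon_index` and `first_order_bias` are illustrative only, and s is taken as the number of non-empty classes in the sample, which understates s whenever species are missed.

```python
import math

def shannon_index(counts):
    """Plug-in Shannon index H = -sum(p_i * ln p_i); empty classes contribute nothing."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def first_order_bias(counts):
    """Leading bias term -(s - 1) / (2N) of the expansion of E(H-hat).

    Assumption: s is the number of non-empty classes observed in the sample.
    """
    counts = list(counts)
    s = sum(1 for c in counts if c > 0)
    return -(s - 1) / (2 * sum(counts))

# Hypothetical sample of N = 100 individuals from four species.
counts = [50, 30, 15, 5]
h_hat = shannon_index(counts)
bias = first_order_bias(counts)
print(f"H-hat = {h_hat:.4f}, leading bias term = {bias:.4f}")
```

With N = 100 and four observed species, the leading term is -3/200, so the plug-in estimate is expected to sit slightly below the population value, in line with the underestimation noted above.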

Several methods have been employed to obtain improved estimates of the Shannon index. Pielou repeatedly computed the index of a sample, adding new quadrats in random order one at a time, until the index showed no significant difference. Monk and McGuinis (1966) implemented a similar sequential procedure for the ratio of the number of species to the log of the number of individuals. Using Pielou's method, Heyer and Berven (1973) repeated the procedure for different random orderings of the quadrats. This method provided an improved standard error. However, these methods failed to provide significance tests or confidence intervals for sampling done when the number of species in the sample is subject to variation.

Zahl (1977) applied the so-called jackknife method to the estimation of the Shannon index. The method was employed by systematically dropping out quadrats one at a time and assessing the variation in the resulting index. It automatically took into account the restrictions on field sampling and showed approximately normally distributed estimates, making significance tests and confidence intervals possible.

A simulation study by Pinol (1997) focused on the impact of the jackknifing procedure as a tool to correct the bias of the usual estimation procedure for species diversity. She compared the relative reduction in bias of the jackknife estimates against the values computed using the sample-based procedure for three diversity indices, including the Shannon index. The jackknife method performed better, resulting in a substantial reduction in bias. However, the estimates obtained showed an increase in variance.

Efron (1981) compared the jackknife and bootstrap methods in evaluating nonparametric estimates of the standard error of an estimator. He showed that the bootstrap performs notably better than the jackknife in estimating the standard error of the correlation coefficient from a bivariate normal model. Hall (1989) generalized that bootstrap methods are simulation methods for assessing the sampling properties of statistical estimates.

Several studies extended the index to evaluate other types of diversity. Studies of genetic diversity used the Shannon index to distinguish the genetic variation of an organism. Studies of the morphologic and agronomic variety of plants used the index to characterize the resistance of a variety to certain diseases.

Jain et al. (1975) adopted the Shannon index to examine the geographical patterns of phenotypic diversity in the world collection of durum wheats (Triticum turgidum). In the study, durum wheats were classified for different observable characteristics, each with a different number of classes. The Shannon index,

$$H = -\sum_{i=1}^{n} p_i \ln p_i ,$$

was employed, where $n$ is the number of phenotypic classes for a character and $p_i$ is the proportion of the total number of entries in the $i$th class. Moreover, Jain et al. used the additivity property of the Shannon index, which is useful in hierarchical analyses of diversity in a large variety of data.

Meanwhile, a study by Riley in 1998 was undertaken to estimate the phenotypic diversity in the Ethiopian noug germplasm collections across a wide range of characters in different agro-ecological areas of the country. As described by Jain et al., the phenotypic frequencies of the characters were analyzed with the Shannon diversity index (H') in order to estimate the diversity of each character within each province. The result supported the assertion of Yang et al. (1991) that the value of the index increases with increasing polymorphism and reaches its maximum value when all phenotypic classes have equal frequencies. The variance of H' was not characterized. However, assuming that the eight characters used in the study represent a random sample of all possible characters of the noug plant, an empirical variance was computed from the eight estimates of Shannon's diversity index. It was concluded that the utility of germplasm collections to research programmes designed to locate genes depends on adequate sampling procedures.

A study by Aldemita (2006) was done to determine the genetic diversity of Korean rice germplasm with different levels of resistance to blast in the Philippines. Bootstrap analysis was used to determine the variation in the phenotypic characteristics of the leaves of each Korean germplasm. The bootstrap values were generalized and expressed as percentages, which can be considered statistical tests (confidence limits) on the validity of the various groups. She further noted that the higher the percentage, the greater the confidence that a particular group is true.

In the literature on bootstrap confidence intervals, Efron and Tibshirani (1993) compared the three confidence interval methods theoretically and highlighted the advantages and disadvantages of each. The normal approximation requires a parametric assumption about the distribution of the estimator but demands the least computation. The percentile method, which is invariant to transformation, allows the distribution of the estimator to be asymmetrical; however, it results in low accuracy when the sample is small. The bias-corrected method retains the advantages of the percentile method and requires only a limited parametric assumption.


III. CONCEPTUAL FRAMEWORK

Bootstrap Procedure

Efron and Tibshirani (1993) simplified the steps of the generic bootstrapping procedure, which was followed here to bootstrap the Shannon diversity index. Suppose $x_1, x_2, \ldots, x_n$ is a random sample drawn independently from an unspecified probability distribution $F$, so that $x_i \sim_{\text{ind}} F$, and a parameter of interest $\theta$ is to be estimated. The basic steps in the procedure are as follows:

1. Construct an empirical probability distribution, $\hat{F}(x)$, from the observed values $x_i$, assigning probability $1/n$ to each data point. This is the empirical distribution function (EDF) of $x$, which is the nonparametric maximum likelihood estimate (MLE) of the population distribution function, $F(x)$.

2. From the EDF, $\hat{F}(x)$, draw a random sample of size $n$ with replacement. This is a "resample," $x^*_b$.

3. Calculate the statistic of interest, $\hat{\theta}$, from this resample, yielding $\hat{\theta}^*_b$.

4. Repeat steps 2 and 3 $B$ times, where $B$ is a large number. The practical magnitude of $B$ depends on the tests to be run on the data. Typically, $B$ should be 50-200 to estimate the standard error of $\hat{\theta}$, and at least 1000 to estimate confidence intervals around $\hat{\theta}$.

5. Construct a probability distribution from the $B$ values $\hat{\theta}^*_b$ by placing probability $1/B$ at each point, $\hat{\theta}^*_1, \hat{\theta}^*_2, \ldots, \hat{\theta}^*_B$. This distribution, $\hat{F}^*(\hat{\theta}^*)$, is the bootstrapped estimate of the sampling distribution of $\hat{\theta}$.
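The five steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample of 100 category labels is hypothetical, the helper names `shannon_index` and `bootstrap_estimates` are not from the study, and Python's standard library stands in for the software actually used.

```python
import math
import random
import statistics

def shannon_index(labels):
    """Plug-in Shannon index of a list of category labels."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def bootstrap_estimates(sample, statistic, B=1000, seed=0):
    """Steps 2 to 4: draw B resamples of size n with replacement and
    evaluate the statistic on each one."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]

# Hypothetical original sample: 100 entries in three states.
sample = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
theta_hat = shannon_index(sample)

# Step 5: the B values, each with weight 1/B, form the bootstrap distribution.
boot = bootstrap_estimates(sample, shannon_index, B=1000, seed=1)
boot_mean = statistics.fmean(boot)
boot_se = statistics.stdev(boot)  # bootstrap standard error
print(f"estimate {theta_hat:.4f}, bootstrap mean {boot_mean:.4f}, SE {boot_se:.4f}")
```

Sampling labels with equal probability from the observed list is the same as sampling from the EDF in step 1, since each data point carries probability 1/n.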


Confidence Interval Estimation

Three different methods will be used and compared in constructing confidence

interval for Shannon diversity index namely: normal approximation method, standard

percentile method, and bias-corrected method.

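The normal approximation method is analogous to the parametric approach in constructing confidence intervals. The method assumes that the statistic follows a normal distribution; however, no analytic standard error formula for it exists. The bootstrapped sampling distribution can serve as a surrogate to estimate the standard error. This estimation is a straightforward application of the notion that the $\hat{\theta}^*_b$ are normal variates distributed as

$$\hat{\theta}^* \sim N\!\left(\hat{\theta},\ \hat{\sigma}^{2}_{\hat{\theta}^*}\right),$$

where

$$\hat{\sigma}_{\hat{\theta}^*} = \sqrt{\frac{\sum_{b=1}^{B}\left(\hat{\theta}^*_b - \bar{\theta}^*\right)^{2}}{B-1}}, \qquad \bar{\theta}^* = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^*_b ,$$

and $B$ is the number of bootstrap samples.

Similar to the traditional parametric case, the points on the z distribution associated with $\alpha/2$ and $1-\alpha/2$ are identified. Using the bootstrapped standard error, $\hat{\sigma}_{\hat{\theta}^*}$, there is thus a $(1-\alpha)$ probability that

$$\hat{\theta} + z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}^*} \le \theta \le \hat{\theta} + z_{1-\alpha/2}\,\hat{\sigma}_{\hat{\theta}^*} .$$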

The percentile method, on the other hand, takes literally the notion that $\hat{F}^*(\hat{\theta}^*)$ approximates $F(\hat{\theta})$. That is, the bootstrapped estimate of the sampling distribution of $\hat{\theta}$ approximates the distribution $F(\hat{\theta})$. The basic approach in finding the lower and upper limits of an $\alpha$-level confidence interval is to determine the $\alpha/2$ and $1-\alpha/2$ percentiles of the $\hat{F}^*(\hat{\theta}^*)$ distribution. The $\hat{\theta}^*_b$ values are sorted so that the values of $\hat{\theta}^*$ at the 2.5th and 97.5th percentiles of $\hat{F}^*(\hat{\theta}^*)$ can easily be determined. Thus, given that $B = 1000$, the 25th lowest value of $\hat{\theta}^*$ is the lower limit and the 25th highest value is the upper limit of the interval.

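The third method, the bias-corrected method, adjusts the bootstrapped distribution so that it is centered at the point estimate. This allows asymmetric intervals to be found. The confidence interval can be modified and written as

$$\left(\hat{F}^{*-1}\!\left[\Phi\!\left(2z_0 + z_{\alpha/2}\right)\right],\ \hat{F}^{*-1}\!\left[\Phi\!\left(2z_0 + z_{1-\alpha/2}\right)\right]\right),$$

where $\Phi$ is the standard normal cumulative distribution function, $z_{\alpha/2}$ and $z_{1-\alpha/2}$ are the standard normal values with cumulative probability $\alpha/2$ and $1-\alpha/2$, and $z_0$ is the standard normal value exceeded with probability $p^*$, where $p^*$ is the proportion of the bootstrap estimates that are larger than the original sample estimate (the estimate from the bootstrap population). If the distribution is already centered, that is, if $p^*$ is 0.50, then $z_0 = 0$ and the interval reduces to the standard percentile confidence interval.

Equivalently, the endpoints can be obtained using the cumulative probability of each bound in the bootstrap distribution, $\Phi(2z_0 + z_{1-\alpha/2})$ for the upper limit and $\Phi(2z_0 + z_{\alpha/2})$ for the lower limit.

Thus the bias-corrected percentile limits are the $100\,\Phi(2z_0 + z_{\alpha/2})\%$ and $100\,\Phi(2z_0 + z_{1-\alpha/2})\%$ percentiles of the bootstrap distribution of the parameter estimate.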
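The three interval constructions can be compared side by side in code. The sketch below is illustrative rather than the study's implementation: the bootstrap estimates are stand-in normal draws, the function names are hypothetical, and the bias-corrected version assumes that z0 is the standard normal value exceeded with probability p*, with p* the share of bootstrap estimates larger than the original estimate.

```python
import math
import random
from statistics import NormalDist, stdev

def empirical_quantile(sorted_vals, q):
    """Value at rank ceil(q * B) of the sorted bootstrap estimates, clipped to 1..B."""
    B = len(sorted_vals)
    rank = min(B, max(1, math.ceil(q * B)))
    return sorted_vals[rank - 1]

def normal_ci(theta_hat, boot, alpha=0.05):
    """Normal approximation: estimate plus or minus z times the bootstrap SE."""
    se = stdev(boot)  # divisor B - 1
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se

def percentile_ci(boot, alpha=0.05):
    """Standard percentile: alpha/2 and 1 - alpha/2 quantiles of the estimates."""
    s = sorted(boot)
    return empirical_quantile(s, alpha / 2), empirical_quantile(s, 1 - alpha / 2)

def bias_corrected_ci(theta_hat, boot, alpha=0.05):
    """Bias-corrected percentile: shift both quantile levels by 2 * z0."""
    nd = NormalDist()
    B = len(boot)
    p_star = sum(b > theta_hat for b in boot) / B
    p_star = min(max(p_star, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = nd.inv_cdf(1 - p_star)  # assumption: value exceeded with probability p*
    lower_q = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    upper_q = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    s = sorted(boot)
    return empirical_quantile(s, lower_q), empirical_quantile(s, upper_q)

# Stand-in bootstrap estimates (hypothetical, not the rice data).
rng = random.Random(42)
theta_hat = 1.0
boot = [rng.gauss(1.0, 0.02) for _ in range(1000)]
na = normal_ci(theta_hat, boot)
pc = percentile_ci(boot)
bc = bias_corrected_ci(theta_hat, boot)
print("NA", na, "P", pc, "BC", bc)
```

When exactly half of the bootstrap estimates exceed the original estimate, z0 is zero and the bias-corrected limits coincide with the percentile limits.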


IV. METHODOLOGY

Partial data on the morphologic and agronomic characterization of rice accessions at the IRRI germplasm bank were used in the study. Each phenotypic character, denoted by $Y_i$, of each rice accession was scored or measured in accordance with the procedure described in Descriptors for Rice provided by IBPGR-IRRI (1980). For agronomic characteristics that involve quantitative data, the accessions were categorized into 10 classes defined by $\mu \pm k\sigma$, where $k = 1, 2, 3, 4,$ and $5$, $\mu$ is the mean, and $\sigma$ is the standard deviation of the values collected for the variable.

Samples of rice accessions of sizes 3000, 1000, 500, 100, and 50 were drawn randomly from the rice collection. Each was denoted as the original sample. For each descriptor, $Y_i$, (1) the Shannon index ($\hat{H}$) and (2) the number of states ($R$) were computed using

$$\hat{H} = -\sum_{i=1}^{n} p_i \ln p_i \quad \text{and} \quad R = \exp(\hat{H}),$$

respectively, where $n$ is the number of phenotypic states for a descriptor and $p_i$ is the proportion of the total number of entries in the $i$th state. Simple random samples with replacement were then drawn from the original sample, referred to as bootstrap resamples. For each resample, (1) the Shannon index and (2) the number of states observed were estimated. One hundred bootstrap resamples were drawn so that (3) the normal approximation confidence interval, (4) the standard percentile confidence interval, (5) the bias-corrected confidence interval, and (6) the bootstrap mean, median, and standard error of the Shannon index could be estimated. The empirical coverage at the nominal 0.95 confidence level was used to evaluate each confidence interval.
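As a worked instance of the two per-descriptor quantities, the sketch below evaluates the index and exp(H) for the Blade Color population frequencies reported in Table 2 of the results. The function name `shannon_index` and the dictionary layout are illustrative choices, not part of the study's SAS or STATA programs.

```python
import math

# Blade Color population frequencies as reported in Table 2 (total 9105).
blade_color_population = {
    "Pale Green": 5602,
    "Green": 361,
    "Dark Green": 2564,
    "Purple Tips": 13,
    "Purple Margins": 423,
    "Purple Blotch": 40,
    "Purple": 102,
}

def shannon_index(counts):
    """H = -sum(p_i * ln p_i) over the non-empty states."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

h = shannon_index(blade_color_population.values())
r = math.exp(h)  # R = exp(H), as defined in the methodology
print(f"H = {h:.4f}, R = exp(H) = {r:.3f}")
```

The computed index agrees with the population value of 1.0098 reported for Blade Color. The value of exp(H) is far below the seven observed states because two states hold about 90% of the entries.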
Since it is of interest to know the behavior of the Shannon diversity index with a varying number of classes or states of a descriptor, different conditions were set. The richness value, or the number of states, for each descriptor was determined. The performance of the Shannon index for each condition was then assessed.

For each original sample, 200, 500, 1000, 1500, 2000, 5000, and 10000 bootstrap resamples were drawn to obtain bootstrap estimates and to examine the performance of the index as the number of resamples changes. Each set of resamples was used to determine the distribution of the estimates obtained. The Kolmogorov-Smirnov goodness-of-fit statistic was used to test the normality of the distribution.
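The normality check can be illustrated with a hand-rolled Kolmogorov-Smirnov statistic. This is a rough sketch and not the SAS procedure used in the study: the data are simulated stand-ins, the function names are hypothetical, and the cutoff 1.36/sqrt(n) is the large-sample 5% critical value for a fully specified distribution, so it is conservative when the mean and standard deviation are estimated from the same data.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted by mean and SD."""
    n = len(values)
    fitted = NormalDist(fmean(values), stdev(values))
    d = 0.0
    for i, x in enumerate(sorted(values), start=1):
        f = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def looks_normal(values, coefficient=1.36):
    """Compare D with the approximate 5% critical value coefficient / sqrt(n)."""
    return ks_statistic_normal(values) < coefficient / math.sqrt(len(values))

rng = random.Random(7)
normal_like = [rng.gauss(1.0, 0.02) for _ in range(500)]  # stand-in bootstrap estimates
skewed = [rng.expovariate(1.0) for _ in range(500)]       # clearly non-normal comparison
print(ks_statistic_normal(normal_like), looks_normal(normal_like))
print(ks_statistic_normal(skewed), looks_normal(skewed))
```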

Moreover, to evaluate the quality of the confidence intervals obtained, 100 bootstrap confidence intervals were constructed for each method. Then, the accuracy, efficiency, and sufficiency of the intervals were assessed.

The accuracy of a method was measured by its confidence interval coverage, the percentage of bootstrapped intervals that contain the population Shannon index. The efficiency of a method was measured by the Average Range (AR) of its set of estimated confidence intervals. The measure of coverage sufficiency was provided by the Realized Coverage Rate (RCR), defined as the percentage by which a set of estimated intervals actually covers the population index given a prescribed level of confidence. The RCRs of the methods were compared to determine the most efficient interval construction technique.
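The coverage and Average Range measures reduce to simple counts and averages over a set of intervals. The sketch below uses five made-up intervals around the Blade Color population index of 1.0098; the interval endpoints and function names are hypothetical, and only the first two measures are shown.

```python
def coverage(intervals, true_value):
    """Fraction of intervals whose limits enclose the population index."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    return hits / len(intervals)

def average_range(intervals):
    """Mean width (upper limit minus lower limit) of the intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)

# Hypothetical 95% intervals from five repeated bootstrap runs.
intervals = [
    (0.97, 1.04),
    (0.98, 1.05),
    (1.02, 1.06),  # misses the population index from above
    (0.95, 1.00),  # misses the population index from below
    (0.99, 1.03),
]
population_index = 1.0098

cov = coverage(intervals, population_index)
ar = average_range(intervals)
print(f"coverage = {cov:.2f}, average range = {ar:.3f}")
```

A method is preferable when its coverage is close to the nominal level while its average range stays small, since a wide interval can reach high coverage without being informative.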

Statistical Analysis Software (SAS) and STATA software were used for the analysis.
V. RESULTS AND DISCUSSION

There are a total of 36 rice descriptors in the data, with 32 pre-coded and classified descriptors and five descriptors with actual values. Some descriptors containing rice entries with no recorded observation were dropped from the analysis. Table 1 shows the Shannon diversity index of each of the descriptors of the rice collection.

Table 1. Shannon diversity indices of the rice descriptors in the population of the rice collection and the number of states observed.

Descriptor               No. of States  Shannon   Descriptor             No. of States  Shannon
                         Observed       Index                            Observed       Index
Apiculus Color           8              1.401522  Main Heading           7              1.412793
Auricle Color            3              0.349402  Internode Color        4              0.903629
Awn Color                7              0.720629  Ligule Color           4              0.292467
Awn Presence             5              0.726706  Ligule Length*         8              1.403048
Blade Color              7              1.009778  Leaf Length            5              1.028233
Blade Pubescence         3              0.261031  Lemma Palea Color      11             1.418971
Blade Leaf Sheath Color  4              0.584005  Leaf Senescence        9              1.389686
Collar Color             4              0.953196  Leaf Width             3              0.362462
Culm Angle               9              1.265548  Panicle Exsertion      9              1.169451
Culm Diameter            2              0.652631  Panicle Length         5              0.619180
Culm Length              7              1.723441  Panicle Threshability  9              1.509179
Culm Number              3              0.631618  Panicle Type           9              1.367554
Culm Strength            9              1.583626  Seed Coat Color        7              0.633206
Endosperm Type           2              0.228995  Seedling Height        3              0.753285
Flag Leaf Angle          4              1.180939  Sterile Lemma Color    4              0.490297
Grain Length*            10             1.464131  Sterile Lemma Length   5              0.653199
Grain Width*             9              1.428575  Spikelet Sterility     5              0.857232
100-Grain Weight*        10             1.442009  Stigma Color           5              0.632181

*Quantitative variables

The descriptor with the highest Shannon index is the Culm Length with the index

of 1.723441 with seven states observed, followed by Panicle Threshability with the index

of 1.509179 with nine states. On the other hand, the descriptor with the lowest Shannon
index is found to be the Endosperm Type with the index of only 0.228995 having two

states.

It is also observed that all five uncoded quantitative variables have an index larger than 1. Grain Length has the highest Shannon diversity index among them, 1.464131, with all 10 states observed. It is followed by 100-Grain Weight, with an index of 1.442009 and also all 10 states observed. Grain Width, Main Heading, and Ligule Length have indices higher than 1, with nine, seven, and eight states observed, respectively.

Lemma Palea Color has the highest number of states, with 11 observed, giving a Shannon diversity index of 1.418971. On the other hand, Endosperm Type and Culm Diameter have only two states, giving indices of 0.228995 and 0.652631, respectively.

Analysis for Blade Color

Sample of 3,000

Using the original sample of 3,000 from the population of Blade Colors in the rice collection, which has a total of 9,105 rice entries, the Shannon diversity index estimate of the Blade Color is found to be 1.0091. This estimate has a very small bias of -0.0007, which can be attributed to the descriptor's states being represented in the sample of 3,000 in nearly the same proportions as in the population, as presented in Figure 1 and Figure 2. The two graphs exhibit almost the same distribution among the Blade Color's states.


Table 2. Frequency distribution of the Blade Color in the population and in the sample of 3,000.

Blade Color's State   Frequency
                      Population   Sample
Pale Green            5602         1865
Green                 361          114
Dark Green            2564         818
Purple Tips           13           3
Purple Margins        423          147
Purple Blotch         40           11
Purple                102          42
Total                 9105         3000

Shannon Index         1.0098       1.0091

Out of the 9,105 rice entries with a recorded Blade Color, 5,602 are Pale Green, a proportion of 0.61527 among all colors. This is followed by the Dark Green state, with 2,564 observations and a proportion of 0.28160. However, there are only 13 rice entries with the Purple Tips state, or only 0.00143 of the collection.

Figure 1. Pie graph of the proportional distribution of the Blade Color’s states on
the population of rice collection.
As in the population, rice with a Pale Green Blade Color has the highest proportion, 0.62167, in the sample of 3,000 rice entries. Also, there are only 3 entries, or 0.00100 of the sample, with the Purple Tips state.

Figure 2. Pie graph of the proportional distribution of the Blade Color’s states
on sample of 3,000.

All of the bootstrap estimates for the different numbers of resamples are close to the original sample estimate of 1.0091, as indicated by the small bias for each number of resamples. All the bootstrap estimates underestimated the original sample index. The most accurate estimate, 1.00893, is produced by the bootstrap with 1,000 resamples. On the other hand, the bootstrap with 200 resamples has the least accurate estimate, 1.00718. The standard errors of the estimates using different numbers of resamples are found to be reliable, ranging only from 0.01677 to 0.01789.

The 95% confidence intervals constructed using the three methods do not vary significantly, and all cover the original sample estimate and the population index.
All the resamples are subjected to testing the normality of the distribution of the

estimates and it was verified that the distribution of the estimates follows a normal

distribution.

Table 3. Bootstrap estimates and statistical properties of the Shannon index of the Blade Color for the original sample of 3,000 with different numbers of bootstrap resamples.

Bootstrap  Bootstrap            Standard  Normality  95% Confidence Interval   Bootstrap
Resamples  Estimate   Bias      Error     Test       Lower Limit  Upper Limit  Method
200        1.00718    -0.00194  0.01689   Normal     0.97582      1.04242      NA
                                                     0.97317      1.04099      P
                                                     0.97855      1.04308      BC
500        1.00817    -0.00095  0.01789   Normal     0.97396      1.04428      NA
                                                     0.97295      1.04224      P
                                                     0.97808      1.04580      BC
1000       1.00893    -0.00019  0.01685   Normal     0.97605      1.04219      NA
                                                     0.97542      1.04214      P
                                                     0.97425      1.04137      BC
1500       1.00827    -0.00085  0.01677   Normal     0.97622      1.04203      NA
                                                     0.97554      1.04035      P
                                                     0.97662      1.04130      BC
2000       1.00770    -0.00142  0.01723   Normal     0.97533      1.04291      NA
                                                     0.97303      1.04107      P
                                                     0.97733      1.04474      BC
5000       1.00825    -0.00087  0.01690   Normal     0.97600      1.04225      NA
                                                     0.97545      1.04159      P
                                                     0.97744      1.04365      BC
10000      1.00806    -0.00106  0.01677   Normal     0.97625      1.04200      NA
                                                     0.97462      1.04091      P
                                                     0.97721      1.04284      BC

NA = Normal Approximation   P = Percentile   BC = Bias Corrected

Sample of 1,000

Taking a sample of 1,000 from the population of Blade Colors in the rice collection, the Shannon index estimate of the Blade Color is found to be 0.9977. This index estimate is smaller than the index estimated with the sample of 3,000. Moreover, this estimate has a small bias of -0.01203. Furthermore, it can be observed that the proportional distribution of the rice collection among the different Blade Color states is nearly the same in the population and in the sample of 1,000 observations, as presented in Figure 1 and Figure 3.

Table 4. Frequency distribution of the Blade Color in the population and in the sample of 1,000.

Descriptor's State    Frequency
                      Population   Sample
Pale Green            5602         621
Green                 361          32
Dark Green            2564         285
Purple Tips           13           3
Purple Margins        423          40
Purple Blotch         40           7
Purple                102          12
Total                 9105         1000

Shannon Index         1.0098       0.99774

Figure 3. Pie graph of the proportional distribution of the Blade Color's states on sample of 1,000.

The sample of 1,000 rice entries is dominated by the Pale Green state, with 621 entries or a proportion of 0.62100, followed by Dark Green with a proportion of 0.28500. On the other hand, two states have fewer than ten observations: there are 3 and 7 rice entries with a Blade Color of Purple Tips and Purple Blotch, respectively.

Given the original sample index estimate of 0.99774, all the bootstrap estimates produced from the sample of 1,000 underestimated the original sample estimate and even the population index.

Table 5. Bootstrap estimates and statistical properties of the Shannon index of the Blade Color for the original sample of 1,000 with different numbers of bootstrap resamples.

Bootstrap  Bootstrap            Standard  Normality  95% Confidence Interval   Bootstrap
Resamples  Estimate   Bias      Error     Test       Lower Limit  Upper Limit  Method
200        0.99378    -0.00397  0.02931   Normal     0.93994      1.05555      NA
                                                     0.93442      1.05091      P
                                                     0.94232      1.05863      BC
500        0.99409    -0.00366  0.02931   Normal     0.94016      1.05533      NA
                                                     0.93577      1.04949      P
                                                     0.94395      1.05243      BC
1000       0.99523    -0.00251  0.03073   Normal     0.93744      1.05805      NA
                                                     0.93435      1.05586      P
                                                     0.93744      1.05965      BC
1500       0.99515    -0.00259  0.02970   Normal     0.93948      1.05601      NA
                                                     0.93736      1.05350      P
                                                     0.94145      1.06104      BC
2000       0.99504    -0.00270  0.02973   Normal     0.93943      1.05606      NA
                                                     0.93695      1.05256      P
                                                     0.94003      1.05683      BC
5000       0.99442    -0.00333  0.02956   Normal     0.93979      1.05570      NA
                                                     0.93593      1.05210      P
                                                     0.94374      1.05933      BC
10000      0.99483    -0.00291  0.02972   Normal     0.93949      1.05600      NA
                                                     0.93653      1.05255      P
                                                     0.94231      1.05799      BC

NA = Normal Approximation   P = Percentile   BC = Bias Corrected

However, all the bootstrap estimates for the different numbers of resamples are accurate based on their bias. The bootstrap with 1,000 resamples provides the most accurate estimate, with an index of 0.99523 and a bias of -0.00251. On the other hand, the least accurate estimate is found with 200 resamples, with an index of 0.99378 and a bias of -0.00397.

Sample of 500

Taking 500 observations as the original sample from the population of Blade Colors, the Shannon index estimate of the Blade Color is found to be 0.9079. This estimate has a bias of -0.101839. Moreover, this index estimate is smaller than the index estimates from the samples of 3,000 and 1,000. Furthermore, the proportional distribution of the rice collection among most of the Blade Color states is nearly the same in the population and in the sample of 500 observations, as presented in Figure 1 and Figure 4. However, no observation was collected for the Purple Blotch state in the sample of 500.

Table 6. Frequency distribution of the Blade Color in the population and in the sample of 500.

Descriptor's State    Frequency
                      Population   Sample
Pale Green            5602         324
Green                 361          14
Dark Green            2564         137
Purple Tips           13           1
Purple Margins        423          22
Purple Blotch         40           0
Purple                102          2
Total                 9105         500

Shannon Index         1.0098       0.907940

As in the samples of 3,000 and 1,000 entries, the Pale Green state has the most sampled entries, with 324 entries or a proportion of 0.64800. There are only one and two entries in the sample classified as Purple Tips and Purple, respectively. Moreover, no entry was found with a Blade Color of Purple Blotch.

Figure 4. Pie graph of the proportional distribution of the Blade Color's states on sample of 500.

With the original sample index estimate of 0.90794, all the bootstrap estimates in

different number of resamples underestimate the original sample estimate. However, all

the bootstrap estimates in different number of resamples are accurate based on its bias.

The number of resamples of 1,000 provides the most accurate estimate with the index of

0.90398 and the value of bias of -0.00396. On the other hand, the least accurate estimate

is found in the number of resamples of 2,000 and 5,000 with the bias of the estimate of

-0.00582.

The standard errors of the bootstrap estimates for the different numbers of resamples do not vary much, ranging from 0.03816 to 0.04055. The most precise estimates are given by the bootstrap estimates with 500 and 2,000 resamples, both with a standard error of 0.03816.
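The bootstrap estimate, bias, and standard error discussed above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: it assumes a nonparametric bootstrap that redraws 500 observations with replacement from the sample of Table 6, takes the bootstrap estimate as the mean of the resampled indices, the bias as that mean minus the original sample estimate, and the standard error as the standard deviation of the resampled indices. The helper names and the seed are hypothetical.

```python
import math
import random
import statistics

def shannon_index(counts):
    """Shannon index with natural logarithms; empty states are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bootstrap_shannon(counts, n_resamples, rng):
    """Return the Shannon index of each bootstrap resample of the sample."""
    states = range(len(counts))
    n = sum(counts)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the observed sample.
        draw = rng.choices(states, weights=counts, k=n)
        estimates.append(shannon_index([draw.count(s) for s in states]))
    return estimates

sample_500 = [324, 14, 137, 1, 22, 0, 2]   # sample column of Table 6
rng = random.Random(2009)                  # arbitrary seed, for repeatability

original = shannon_index(sample_500)
estimates = bootstrap_shannon(sample_500, 1000, rng)

boot_estimate = statistics.mean(estimates)
boot_bias = boot_estimate - original
boot_se = statistics.stdev(estimates)
print(f"estimate={boot_estimate:.5f} bias={boot_bias:.5f} se={boot_se:.5f}")
```

Because the resampling is random, the printed figures will differ slightly from those in the tables and from run to run when the seed changes.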

As with the intervals for the original samples of 3,000 and 1,000, the 95% confidence intervals for the original sample of 500 produced using the three methods do not vary much and cover the original sample estimate. The distribution of the estimates follows the normal distribution for every number of resamples.

Table 7. Bootstrap estimates and statistical properties of the Shannon Index of the Blade Color for the original sample of 500 with different numbers of bootstrap resamples.

Bootstrap   Bootstrap              Standard   Normality    95% Confidence Interval     Bootstrap
Resamples   Estimate    Bias       Error      Test         Lower Limit   Upper Limit   Method
200         0.90247     -0.00547   0.03908    normal       0.83087       0.98501       NA
                                                           0.82192       0.96784       P
                                                           0.82561       0.97421       BC
500         0.90357     -0.00437   0.03816    normal       0.83296       0.98292       NA
                                                           0.83003       0.97490       P
                                                           0.83688       0.97805       BC
1000        0.90398     -0.00396   0.04055    normal       0.82837       0.98751       NA
                                                           0.82560       0.98381       P
                                                           0.83113       0.99359       BC
1500        0.90266     -0.00527   0.03887    normal       0.83170       0.98418       NA
                                                           0.82615       0.98075       P
                                                           0.83683       0.99094       BC
2000        0.90212     -0.00582   0.03816    normal       0.83310       0.98278       NA
                                                           0.82438       0.97535       P
                                                           0.83727       0.98772       BC
5000        0.90212     -0.00582   0.03867    normal       0.83213       0.98375       NA
                                                           0.82589       0.97596       P
                                                           0.83780       0.98800       BC
10000       0.90213     -0.00581   0.03858    normal       0.83231       0.98356       NA
                                                           0.82548       0.97671       P
                                                           0.83648       0.98662       BC
NA = Normal Approximation; P = Percentile; BC = Bias Corrected

Sample of 100

A random sample of 100 rice accessions, as shown in Table 8, yielded only four observed states. No accession entry in the sample has a Blade Color of Purple, Purple Tips, or Purple Blotch. This sample produced a Shannon index of 0.88913, corresponding to a bias of -0.1206.


Table 8. Frequency distribution of the Blade Color in the population and in the sample of 100.

                      Frequency
Descriptor's State    Population    Sample
Pale Green (60)       5602          59
Green (61)            361           3
Dark Green (63)       2564          35
Purple Tips (80)      13            0
Purple Margins (85)   423           3
Purple Blotch (86)    40            0
Purple (89)           102           0
Total                 9105          100
Shannon Index         1.0098        0.88913

Figure 5. Pie Graph of the proportional distribution of the Blade Color’s states on sample
of 100.

In Table 9, all the bootstrap estimates for the sample of 100 underestimate the original sample estimate of the Blade Color's Shannon index, regardless of the number of bootstrap resamples. The most accurate index is provided by the bootstrap estimate with 1,000 resamples, which has a bias of only -0.01317. However, the bootstrap estimates show no trend as the number of resamples varies.

Non-normality of the distribution of the bootstrap estimates is found only for 2,000 resamples and higher. Moreover, with the Percentile method, the confidence intervals for 2,000 and 5,000 resamples do not cover the population index of 1.0098.

Table 9. Bootstrap estimates and statistical properties of the Shannon Index of the Blade Color for the original sample of 100 with different numbers of bootstrap resamples.

Bootstrap   Bootstrap                         Normality    95% Confidence Interval     Bootstrap
Resamples   Estimate    Bias       SE         Test         Lower Limit   Upper Limit   Method
200         0.87139     -0.01771   0.07482    normal       0.74643       1.03184       NA
                                                           0.74757       1.01639       P
                                                           0.76150       1.06629       BC
500         0.87449     -0.01461   0.07212    normal       0.74744       1.03083       NA
                                                           0.73066       1.01065       P
                                                           0.75628       1.03345       BC
1000        0.87593     -0.01317   0.07147    normal       0.74888       1.02939       NA
                                                           0.72907       1.01195       P
                                                           0.76150       1.02870       BC
1500        0.87497     -0.01413   0.07232    normal       0.74728       1.03099       NA
                                                           0.73066       1.01474       P
                                                           0.75628       1.03547       BC
2000        0.87207     -0.01703   0.07018    not normal   0.75150       1.02677       NA
                                                           0.72755       1.00192       P
                                                           0.76070       1.01877       BC
5000        0.87342     -0.01568   0.07256    not normal   0.74688       1.03139       NA
                                                           0.72411       1.00800       P
                                                           0.75677       1.03474       BC
10000       0.87256     -0.01654   0.07220    not normal   0.74761       1.03066       NA
                                                           0.73066       1.01058       P
                                                           0.76160       1.03986       BC
NA = Normal Approximation; P = Percentile; BC = Bias Corrected

Sample of 50

A random sample of 50 rice accessions, as shown in Table 10, provides a Shannon index estimate of 0.87317 with a bias of -0.1366. Rice entries were registered in only three of the Blade Color's states, namely Pale Green, Green, and Dark Green.

Similar to the previous sample size, all the bootstrap estimates for the sample of 50 underestimate the original sample estimate of the Blade Color's Shannon index, regardless of the number of bootstrap resamples. The most accurate index is provided by the bootstrap estimate with 1,000 resamples, which has a bias of only -0.01431. However, the standard errors of the bootstrap estimates in this sample are relatively high compared to those of the previous sample sizes.

Table 10. Frequency distribution of the Blade Color in the population and in the sample of 50.

                      Frequency
Descriptor's State    Population    Sample
Pale Green (60)       5602          30
Green (61)            361           4
Dark Green (63)       2564          16
Purple Tips (80)      13            0
Purple Margins (85)   423           0
Purple Blotch (86)    40            0
Purple (89)           102           0
Total                 9105          50
Shannon Index         1.0098        0.87317

Figure 6. Pie graph of the proportional distribution of the Blade Color's states in the sample of 50.

For the original sample of 50, only the resamples of 200, 500, and 1,000 fitted the normal distribution. Resampling with 1,500 or more resamples produced a non-normal distribution at the 0.05 level of significance.

For all numbers of resamples, the upper bound of the interval using the Percentile method failed to reach and cover the population index of 1.0098. The Bias-corrected interval for 1,000 resamples also has an upper bound less than the population index.

Table 11. Bootstrap estimates and statistical properties of the Shannon Index of the Blade Color for the original sample of 50 with different numbers of bootstrap resamples.

Bootstrap   Bootstrap                         Normality    95% Confidence Interval     Bootstrap
Resamples   Estimate    Bias       SE         Test         Lower Limit   Upper Limit   Method
200         0.85569     -0.01749   0.07461    normal       0.72604       1.02031       NA
                                                           0.70276       0.99841       P
                                                           0.76099       1.03758       BC
500         0.85518     -0.01799   0.08114    normal       0.71375       1.03259       NA
                                                           0.68434       0.99787       P
                                                           0.71710       1.01810       BC
1000        0.85886     -0.01431   0.08036    normal       0.71548       1.03086       NA
                                                           0.68434       0.99783       P
                                                           0.70168       1.00487       BC
1500        0.85390     -0.01927   0.08293    not normal   0.71051       1.03584       NA
                                                           0.66500       0.99787       P
                                                           0.70779       1.01810       BC
2000        0.85279     -0.02039   0.08297    not normal   0.71046       1.03588       NA
                                                           0.67939       0.99785       P
                                                           0.71351       1.01986       BC
5000        0.85164     -0.02153   0.08272    not normal   0.71100       1.03535       NA
                                                           0.67301       0.99524       P
                                                           0.71351       1.01331       BC
10000       0.85362     -0.01955   0.08292    not normal   0.71064       1.03571       NA
                                                           0.67301       0.99785       P
                                                           0.71351       1.01810       BC
NA = Normal Approximation; P = Percentile; BC = Bias Corrected

Comparison of Samples

Table 12 summarizes the value of the Shannon index for the different sample sizes and the frequencies observed in the Blade Color's states. It also shows that the population index of 1.0098 is estimated as 1.0091, 0.9977, 0.9079, 0.88913, and 0.87317 using samples of 3,000, 1,000, 500, 100, and 50, respectively. Relative to the Shannon index of the population of the Blade Color in the rice collection, the sample estimate of the index decreases as the sample size decreases. This shows that the fewer the samples gathered from the collection, the more likely the estimate is to underestimate the true population index. This can be attributed to having no observations in some of the Blade Color states sampled. In the sample of 500 observations, no sampled rice has a Blade Color of Purple Blotch. In addition, the samples of 100 and 50 left three and four states unregistered, respectively.
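How easily a rare state goes unregistered can be illustrated with a short calculation. The sketch below assumes simple random sampling without replacement from the 9,105 accessions (the text only says "random sample", so this is an assumption), and the function name `prob_state_missing` is hypothetical. It gives the probability that a state with a given population frequency does not appear at all in a sample of a given size.

```python
def prob_state_missing(state_count, population_size, sample_size):
    """Probability that a simple random sample drawn without replacement
    contains no unit of a state that has `state_count` units in the population."""
    prob = 1.0
    for i in range(state_count):
        # Every unit of the state must fall among the unsampled units.
        prob *= (population_size - sample_size - i) / (population_size - i)
    return prob

POPULATION_SIZE = 9105
PURPLE_BLOTCH = 40     # population frequency from Table 12

missing = {
    n: prob_state_missing(PURPLE_BLOTCH, POPULATION_SIZE, n)
    for n in (3000, 1000, 500, 100, 50)
}
for n, p in missing.items():
    print(f"sample of {n:>4}: P(no Purple Blotch) = {p:.4f}")
```

Under this assumption the chance of missing Purple Blotch is negligible for the sample of 3,000 but grows quickly as the sample shrinks, which is in line with the empty cells of Table 12.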

Table 12. Comparison of the distribution of the Blade Color's states and the Shannon Index for the different samples.

                 Frequency
State            Population   3000     1000      500        100       50
Pale Green       5602         1865     621       324        59        30
Green            361          114      32        14         3         4
Dark Green       2564         818      285       137        35        16
Purple Tips      13           3        3         1          0         0
Purple Margins   423          147      40        22         3         0
Purple Blotch    40           11       7         0          0         0
Purple           102          42       12        2          0         0
Total            9105         3000     1000      500        100       50

Shannon Index    1.0098       1.0091   0.99774   0.907940   0.88913   0.873173


Differences on the Observed State

The Shannon index of the descriptor is also examined when the number of states is varied while the number of samples is held fixed. Table 13 confirms that the Shannon diversity index decreases as the number of states or classes decreases. The size of the decrease also grows as the number of detected states becomes smaller. Furthermore, the Shannon index is equal to zero when only one state is collected in the sample.

Table 13. Frequency distribution of the Blade Color's states in the sample of 1,000 for different numbers of states observed.

                 Frequency by Number of States Observed
State            (7)       (6)       (5)       (4)       (3)       (2)       (1)
Pale Green       621       622       624       628       640       669       1000
Green            32        32        33        36        0         0         0
Dark Green       285       286       288       291       303       331       0
Purple Tips      3         0         0         0         0         0         0
Purple Margins   40        41        42        45        57        0         0
Purple Blotch    7         7         0         0         0         0         0
Purple           12        12        13        0         0         0         0
Total            1000      1000      1000      1000      1000      1000      1000
Shannon Index    0.99774   0.98225   0.95495   0.91060   0.81070   0.63488   0.00000
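The pattern in Table 13 agrees with a standard bound on the index. Writing $H$ for the Shannon index, $p_i$ for the proportion of state $i$, and $S$ for the number of observed states (symbols chosen here for illustration, with natural logarithms assumed because they reproduce the tabulated values):

```latex
H = -\sum_{i=1}^{S} p_i \ln p_i , \qquad 0 \le H \le \ln S .
```

With a single state, $p_1 = 1$ and $H = -1 \cdot \ln 1 = 0$, which is the last column of Table 13. With two states the index can be at most $\ln 2 \approx 0.6931$, and the tabulated 0.63488 falls below that bound.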

Differences of Observed State on Different Descriptors

The Shannon indices of different descriptors with different numbers of states were also analyzed. Table 14 presents seven descriptors, whose population index generally increases as the number of states increases. However, no descriptor with six states was found in the rice collection. It can be observed that Blade Color, with seven states, has a Shannon index of 1.00978. This index is less than that of Leaf Length, which has only five states but an index of 1.02823. Thus, it can be said that the rice collection's Leaf Length is more diverse than its Blade Color. The same behavior is seen in the samples of 100 and 50.

Table 14. Shannon index of different descriptors with different numbers of states.

                  No. of   Population   Sample Index
Descriptor        States   Index        100       50
Endosperm         2        0.22900      0.22697   0.20456
Culm Number       3        0.63162      0.60008   0.53418
Collar Color      4        0.95320      0.94536   0.83457
Leaf Length       5        1.02823      1.02186   0.91956
Blade Color       7        1.00978      0.88913   0.87317
Apiculus Color    8        1.40152      1.37467   1.00034
Leaf Senescence   9        1.38969      1.28873   1.03265
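One way to see why a descriptor with fewer states can have the larger index is to divide each population index by its maximum possible value, the natural logarithm of the number of states. This ratio is not part of the original analysis; it is added here only as an illustrative sketch, and the function name `relative_diversity` is hypothetical.

```python
import math

# (number of states, population Shannon index) taken from Table 14.
descriptors = {
    "Endosperm": (2, 0.22900),
    "Culm Number": (3, 0.63162),
    "Collar Color": (4, 0.95320),
    "Leaf Length": (5, 1.02823),
    "Blade Color": (7, 1.00978),
    "Apiculus Color": (8, 1.40152),
    "Leaf Senescence": (9, 1.38969),
}

def relative_diversity(index, n_states):
    """Shannon index divided by its maximum possible value, ln(n_states)."""
    return index / math.log(n_states)

ratios = {
    name: relative_diversity(index, n_states)
    for name, (n_states, index) in descriptors.items()
}
for name, ratio in ratios.items():
    print(f"{name:16s} {ratio:.3f}")
```

Leaf Length uses a larger share of its attainable diversity than Blade Color, whose seven states are dominated by Pale Green and Dark Green, so its index is higher despite having fewer states.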

Analysis for Confidence Interval

The sample of 100 was used for the analysis of the bootstrap confidence intervals, since it is only at this sample size that variation in the index across resamples was identified. Large original samples, from 500 to 3,000 for instance, produced a normal distribution for the Shannon index estimates. Furthermore, large original samples generated bootstrap confidence intervals that cover the true parameter index. Thus, a relatively small sample was utilized.

Normal Approximation Method

The 95% confidence intervals using the Normal Approximation method for the different numbers of bootstrap resamples, with the original sample of 100 observations, are presented in Table 15. The confidence interval with 2,000 resamples is the narrowest, while the interval with 5,000 resamples is the widest. Figure 7 illustrates the coverage of the intervals, and it can be noticed that the intervals are close to one another. All the intervals covered the population index of Blade Color, 1.00978, indicated by the solid vertical line on the graph.
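A normal approximation interval of this kind can be sketched in a few lines. The centering on the original sample estimate and the use of a standard normal quantile are assumptions made here for illustration; this section does not state the exact formula used. With these assumptions, the inputs for 1,000 resamples give limits that agree with the tabulated ones to about three decimal places.

```python
from statistics import NormalDist

def normal_approx_interval(center, standard_error, level=0.95):
    """Symmetric interval: center -/+ z * standard_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * standard_error, center + z * standard_error

# Original sample estimate for the sample of 100 and the bootstrap
# standard error reported for 1,000 resamples.
ORIGINAL_ESTIMATE = 0.88913
BOOTSTRAP_SE = 0.07147

lower, upper = normal_approx_interval(ORIGINAL_ESTIMATE, BOOTSTRAP_SE)
print(f"lower={lower:.5f} upper={upper:.5f} length={upper - lower:.5f}")
```

Since the interval length is a fixed multiple of the standard error, the narrowest interval belongs to the number of resamples with the smallest standard error.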

Table 15. 95% Confidence Interval using the Normal Approximation method.

Bootstrap   Bootstrap   95% Confidence Interval     Interval
Resamples   Estimate    Lower Limit   Upper Limit   Length
200         0.87139     0.74643       1.02694       0.28050
500         0.87449     0.74744       1.03083       0.28339
1000        0.87593     0.74888       1.02939       0.28050
1500        0.87497     0.74728       1.03099       0.28371
2000        0.87207     0.75150       1.02677       0.27526
5000        0.87342     0.74688       1.03139       0.28450
10000       0.87256     0.74761       1.03066       0.28306

Figure 7. 95% Confidence Interval using the Normal Approximation method (horizontal axis: Shannon Index; vertical axis: Resamples, 200 to 10,000; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).


Percentile Method

The 95% confidence intervals using the Percentile method for the different numbers of bootstrap resamples, with the original sample of 100 observations, are presented in Table 16. The confidence interval with 200 resamples is the narrowest, while the interval with 1,500 resamples is the widest.

As illustrated in Figure 8, unlike those of the Normal Approximation method, the confidence intervals for the different numbers of resamples are not symmetric with respect to the bootstrap estimate. Notably, the upper bounds of all the intervals lie near the population index of Blade Color. The intervals for 2,000 and 5,000 resamples did not cover the population Shannon index.
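The percentile construction can be sketched as follows: sort the bootstrap estimates and read off the values that cut 2.5% from each tail. This is a minimal illustration, assuming a nonparametric bootstrap of the sample of 100 and a simple order-statistic convention for the percentiles; the helper names, the seed, and the index convention are assumptions, not the study's implementation.

```python
import math
import random

def shannon_index(counts):
    """Shannon index with natural logarithms; empty states are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bootstrap_shannon(counts, n_resamples, rng):
    """Shannon index of each resample drawn with replacement from the sample."""
    states = range(len(counts))
    n = sum(counts)
    out = []
    for _ in range(n_resamples):
        draw = rng.choices(states, weights=counts, k=n)
        out.append(shannon_index([draw.count(s) for s in states]))
    return out

def percentile_interval(estimates, level=0.95):
    """Lower and upper tail order statistics of the bootstrap estimates."""
    ordered = sorted(estimates)
    b = len(ordered)
    alpha = 1 - level
    lower_idx = max(math.floor(alpha / 2 * b), 0)
    upper_idx = min(math.ceil((1 - alpha / 2) * b) - 1, b - 1)
    return ordered[lower_idx], ordered[upper_idx]

sample_100 = [59, 3, 35, 0, 3, 0, 0]   # sample of 100 by Blade Color state
rng = random.Random(100)               # arbitrary seed, for repeatability

estimates = bootstrap_shannon(sample_100, 2000, rng)
lower, upper = percentile_interval(estimates)
print(f"lower={lower:.5f} upper={upper:.5f}")
```

Because the limits are taken directly from the bootstrap distribution, they inherit its downward shift, which is why the upper limit sits so close to the population index.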

Table 16. 95% Confidence Interval using the Percentile method.

Bootstrap   Bootstrap   95% Confidence Interval     Interval
Resamples   Estimate    Lower Limit   Upper Limit   Length
200         0.87139     0.74757       1.01639       0.26882
500         0.87449     0.73066       1.01065       0.27999
1000        0.87593     0.72907       1.01195       0.28287
1500        0.87497     0.73066       1.01474       0.28408
2000        0.87207     0.72755       1.00192       0.27437
5000        0.87342     0.72411       1.00800       0.28389
10000       0.87256     0.73066       1.01058       0.27992

Figure 8. 95% Confidence Interval using the Percentile method (horizontal axis: Shannon Index; vertical axis: Resamples, 200 to 10,000; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).

Bias-Corrected Method

The 95% confidence intervals using the Bias-corrected method for the different numbers of bootstrap resamples, with the original sample of 100 observations, are presented in Table 17. As with the Normal Approximation method, the confidence interval with 2,000 resamples is the narrowest among the different numbers of resamples; the interval with 200 resamples is the widest.

As illustrated in Figure 9, the skewness of the intervals is more evident with the Bias-corrected method than with the Normal Approximation and Percentile methods. The lower bounds for all numbers of resamples stay close to a common value, while the upper bounds vary considerably from one number of resamples to another.
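A bias-corrected interval can be sketched as below. The sketch assumes the standard bias-corrected percentile construction without an acceleration term, in which the percentile levels are shifted by twice a correction `z0` derived from the share of bootstrap estimates that fall below the original estimate; this section does not spell out the formula actually used, so the details, helper names, and seed are assumptions.

```python
import math
import random
from statistics import NormalDist

def shannon_index(counts):
    """Shannon index with natural logarithms; empty states are skipped."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bootstrap_shannon(counts, n_resamples, rng):
    """Shannon index of each resample drawn with replacement from the sample."""
    states = range(len(counts))
    n = sum(counts)
    out = []
    for _ in range(n_resamples):
        draw = rng.choices(states, weights=counts, k=n)
        out.append(shannon_index([draw.count(s) for s in states]))
    return out

def bias_corrected_interval(estimates, original, level=0.95):
    """Percentile interval whose tail levels are shifted by 2 * z0."""
    ordered = sorted(estimates)
    b = len(ordered)
    norm = NormalDist()
    # Share of bootstrap estimates below the original estimate,
    # clamped so that the inverse normal CDF stays finite.
    frac_below = sum(e < original for e in ordered) / b
    frac_below = min(max(frac_below, 1 / (b + 1)), b / (b + 1))
    z0 = norm.inv_cdf(frac_below)
    z = norm.inv_cdf(0.5 + level / 2)
    p_low = norm.cdf(2 * z0 - z)
    p_high = norm.cdf(2 * z0 + z)
    lower = ordered[min(int(p_low * b), b - 1)]
    upper = ordered[min(int(p_high * b), b - 1)]
    return lower, upper

sample_100 = [59, 3, 35, 0, 3, 0, 0]   # sample of 100 by Blade Color state
rng = random.Random(100)               # arbitrary seed, for repeatability

original = shannon_index(sample_100)
estimates = bootstrap_shannon(sample_100, 2000, rng)
bc_lower, bc_upper = bias_corrected_interval(estimates, original)
print(f"lower={bc_lower:.5f} upper={bc_upper:.5f}")
```

When most bootstrap estimates fall below the original estimate, `z0` is positive and both limits move upward relative to the plain percentile limits, which matches the direction of the differences between the Percentile and Bias-corrected rows in the tables.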


Table 17. 95% Confidence Interval using the Bias Corrected method.

Bootstrap   Bootstrap   95% Confidence Interval     Interval
Resamples   Estimate    Lower Limit   Upper Limit   Length
200         0.87139     0.76150       1.06629       0.30479
500         0.87449     0.75628       1.03345       0.27716
1000        0.87593     0.76150       1.02870       0.26721
1500        0.87497     0.75628       1.03547       0.27919
2000        0.87207     0.76070       1.01877       0.25807
5000        0.87342     0.75677       1.03474       0.27797
10000       0.87256     0.76160       1.03986       0.27826

Figure 9. 95% Confidence Interval using the Bias Corrected method (horizontal axis: Shannon Index; vertical axis: Resamples, 200 to 10,000; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).


Comparison of Intervals on Different Resamples

The three bootstrap confidence intervals are further compared within each number of resamples, as presented in Figure 10. For every number of resamples specified, the lower and upper bounds of the Percentile method reached the lowest index values among the three methods, except for the lower bound with 200 resamples. On the other hand, the lower and upper bounds of the Normal Approximation intervals arrived at the highest values of the Shannon diversity index of Blade Color for every number of resamples specified, except with 2,000 resamples.


Figure 10. Comparison of the Confidence Intervals using different methods for different numbers of resamples (horizontal axis: Shannon Index; vertical axis: Resamples, 200 to 10,000, each with NA, P, and BC intervals; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).

100 Confidence Intervals

From the 100 confidence intervals constructed using the Normal Approximation method, all the intervals cover the original bootstrap estimates. It can also be observed in Figure 11 that the confidence intervals are symmetric with reference to the bootstrap estimate.

Figure 11. One hundred bootstrap confidence intervals using the Normal Approximation method (horizontal axis: Shannon index; vertical axis: interval number; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).

Unlike the other methods, only fifty-five of the 100 confidence intervals constructed using the Percentile method covered the true population parameter. The other forty-five intervals have an upper bound less than the parameter. Moreover, the intervals produced are no longer symmetric with respect to the bootstrap estimates.

Figure 12. One hundred bootstrap confidence intervals using the Percentile method (horizontal axis: Shannon index; vertical axis: interval number; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).

The Bias-corrected method also generated 100 confidence intervals that cover the original bootstrap estimate, like the previous two methods. No interval is symmetric with respect to the bootstrap estimate, and it can be observed that the intervals are skewed to either side of the bootstrap estimate.

Figure 13. One hundred bootstrap confidence intervals using the Bias Corrected method (horizontal axis: Shannon index; vertical axis: interval number; legend: Original Sample Estimate, Bootstrap Estimate, Lower Limit, Upper Limit).

Confidence Interval Measures

Using the 100 confidence intervals generated by each of the three methods, the properties of the intervals were compared and are presented in Table 18. As a measure of the intervals' accuracy, the Average Range of the 100 intervals indicates that the Bias-corrected method produced the most accurate interval, with a mean range of 0.26601. It was followed by the Normal Approximation and Percentile methods, with corresponding ranges of 0.28314 and 0.28332.

The Expected Coverage Rate and Realized Coverage Rate of the Normal Approximation and Bias-corrected methods are the same, with rates equal to 0.95 and 1.00, respectively. The Percentile method, on the other hand, has a realized coverage rate of 0.55. This means that the Normal Approximation and Bias-corrected methods have the same number of intervals that actually covered the original sample estimate, while only 55 percent of the intervals constructed using the Percentile method are expected to contain the parameter index. Thus, the Normal Approximation and Bias-corrected methods are more sufficient than the Percentile method in forming confidence intervals.

Similarly, both the Normal Approximation and Bias-corrected methods have the same calibration rate of 0.99646, which implies that the average range of each method needs a downward adjustment for the realized coverage rate of the estimated confidence intervals to equal the expected coverage rate. The Percentile method, on the other hand, has a calibration rate of 1.03454, which calls for an upward adjustment. Based on the methods' Calibrated Average Rate, the Bias-corrected method gives the most efficient estimation after the adjustment for coverage sufficiency.


Table 18. Comparison of the properties of the three confidence intervals.

Interval Properties        Normal Approx.   Percentile   Bias-Corrected
Average Range              0.28314          0.28332      0.26601
Expected Coverage Rate     0.95             0.95         0.95
Realized Coverage Rate     1.00             0.55         1.00
Calibration Rate           0.99646          1.03454      0.99646
Calibrated Average Rate    0.28214          0.29311      0.26507
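Two of the quantities in Table 18 can be illustrated with a short sketch. The realized coverage rate is the share of intervals that contain a target value; since the 100 intervals themselves are not listed, the example applies it to the seven Percentile intervals of Table 16 purely as an illustration. The last row of Table 18 is consistent with the average range multiplied by the calibration rate; how the calibration rate itself is derived is not stated in this section, so it is taken from the table as given. The function name is hypothetical.

```python
def realized_coverage_rate(intervals, target):
    """Fraction of (lower, upper) intervals that contain the target value."""
    covered = sum(1 for lower, upper in intervals if lower <= target <= upper)
    return covered / len(intervals)

POPULATION_INDEX = 1.00978

# Percentile intervals for 200 to 10,000 resamples, taken from Table 16.
percentile_intervals = [
    (0.74757, 1.01639), (0.73066, 1.01065), (0.72907, 1.01195),
    (0.73066, 1.01474), (0.72755, 1.00192), (0.72411, 1.00800),
    (0.73066, 1.01058),
]
rate = realized_coverage_rate(percentile_intervals, POPULATION_INDEX)

# (average range, calibration rate) for each method, taken from Table 18.
table_18 = {
    "Normal Approx.": (0.28314, 0.99646),
    "Percentile": (0.28332, 1.03454),
    "Bias-Corrected": (0.26601, 0.99646),
}
calibrated = {
    method: avg_range * calib_rate
    for method, (avg_range, calib_rate) in table_18.items()
}

print(f"coverage of the seven percentile intervals: {rate:.3f}")
for method, value in calibrated.items():
    print(f"{method:15s} {value:.5f}")
```

The ordering of the methods by calibrated value is the same as by average range, so the adjustment does not change which method gives the shortest intervals.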

Distribution of Shannon Index

It is also of interest to know the behavior of the Shannon index as the number of bootstrap resamples varies. The original sample of 100 was used for this analysis, since it is the sample size at which the index reached its critical limit in fitting the normal distribution. Different levels of significance were set to determine how strongly the distribution of the estimates follows the normal distribution.

Figure 14 shows the probability density function of the distribution of the estimates generated by bootstrapping with different numbers of resamples. Graphically, the distributions for all numbers of resamples resemble the curve of the normal distribution.

Using the Kolmogorov-Smirnov statistic in the goodness-of-fit test for normality, the distributions of the estimates produced with 200 and 500 resamples fit the normal distribution at levels of significance ranging from 0.01 to 0.2. For 1,000 resamples, the distribution of the estimates differs significantly from the normal distribution only at the 0.2 level of significance. With both 2,000 and 5,000 resamples, the distribution of the index does not fit the normal distribution at levels of significance of 0.05 and higher. Thus, as the number of bootstrap resamples increases, the distribution of the Shannon index departs from the normal distribution.


Figure 14. Comparison of the distribution of the bootstrap estimates using different numbers of resamples. Panels: (a) 200 Resamples, (b) 500 Resamples, (c) 1000 Resamples, (d) 1500 Resamples, (e) 2000 Resamples, (f) 5000 Resamples. Each panel plots the probability density function f(x) against x, showing the histogram of the estimates and the fitted normal curve.

Table 19. Goodness-of-fit test (Kolmogorov-Smirnov) for the normality of the bootstrap estimates with different numbers of resamples.

Sample Size   Statistic   P-Value                    α = 0.2   α = 0.1   α = 0.05   α = 0.02   α = 0.01
200           0.04181     0.86094   Critical Value   0.07587   0.08648   0.09603    0.10734    0.11519
                                    Reject?          No        No        No         No         No
500           0.02512     0.90266   Critical Value   0.04799   0.05469   0.06073    0.06789    0.07285
                                    Reject?          No        No        No         No         No
1000          0.03585     0.14925   Critical Value   0.03393   0.03867   0.04294    0.048      0.05151
                                    Reject?          Yes       No        No         No         No
2000          0.03297     0.02532   Critical Value   0.02399   0.02735   0.03037    0.03394    0.03643
                                    Reject?          Yes       Yes       Yes        No         No
5000          0.02057     0.02868   Critical Value   0.01517   0.0173    0.01921    0.02147    0.02304
                                    Reject?          Yes       Yes       Yes        No         No

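The reject decisions in the goodness-of-fit table above can be checked against the large-sample Kolmogorov-Smirnov critical value, sqrt(-ln(α/2)/2)/sqrt(n). This is a sketch under that assumption; the software or tables actually used for the critical values are not stated, and the tabulated values differ from this approximation in the fourth decimal place for some entries. The function name is hypothetical.

```python
import math

def ks_critical_value(n, alpha):
    """Large-sample Kolmogorov-Smirnov critical value for n observations."""
    return math.sqrt(-math.log(alpha / 2) / 2) / math.sqrt(n)

# (number of bootstrap estimates: K-S statistic) from the goodness-of-fit table.
results = {200: 0.04181, 500: 0.02512, 1000: 0.03585, 2000: 0.03297, 5000: 0.02057}
alphas = [0.2, 0.1, 0.05, 0.02, 0.01]

# Normality is rejected at level alpha when the statistic exceeds the critical value.
decisions = {
    n: [stat > ks_critical_value(n, a) for a in alphas]
    for n, stat in results.items()
}
for n, row in decisions.items():
    print(n, ["Yes" if d else "No" for d in row])
```

Because the critical value shrinks in proportion to one over the square root of the number of estimates, the same small departure from normality that passes unnoticed with 200 resamples is flagged with 2,000 or 5,000.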
SUMMARY AND CONCLUSION

The Shannon diversity index is widely used in determining the diversity of a collection. However, the estimate of this index is known to be biased, and no simple formula exists for its statistical properties. Because it requires no such formula, bootstrapping was used to estimate the Shannon index and construct confidence intervals around it.

The behavior of the Shannon index under different conditions of the bootstrap method was analyzed in the study. Across the different sizes of the original sample used for bootstrapping, relatively small samples were found to have a significant effect on the variation of the statistical properties of the Shannon diversity index. From a population of more than 9,000 rice accessions, a sample of size 100 was found to reveal significant bias and differences in interval coverage for the diversity index of Blade Color and other descriptors. Also, the number of bootstrap resamples did not present any particular trend in the properties of the index for large original samples. However, the results indicate that as the number of resamples increases, the distribution of the Shannon index is more likely to deviate from the normal distribution.

Among the three methods used for interval construction, the Bias-corrected method produced the most accurate, sufficient, and efficient intervals based on average range, expected and realized coverage rates, and calibration rate. The Percentile method, on the other hand, has a calibration rate of 1.03454, which calls for an upward adjustment. The Normal Approximation method provided significant intervals for those resamples that were found to have a normal distribution.


LITERATURE CITED

ALDEMITA, Y. 2006. Genetic diversity in Korean germplasm of rice with different levels of resistance to blast in the Philippines. Undergraduate Special Problem. University of the Philippines, Los Banos.

ALCASID, C. 2006. Morpho-agronomic diversity analysis of twenty thermo-sensitive genetic male sterile lines of rice. Undergraduate Special Problem. University of the Philippines, Los Banos.

ALMAZAN, K. 2007. Agro-morphological diversity among wide crossed derived rice lines at F8 generation. Undergraduate Thesis Manuscript. University of the Philippines, Los Banos.

EFRON, B. 1981. Nonparametric estimates of standard error: the jackknife, the bootstrap, and other methods. Biometrika 68. 589-599.

HUTCHESON, K. 1970. A test for comparing diversities based on the Shannon formula. Journal of Theoretical Biology 29. 151-154.

JAIN, S.K., QUALSET, C.O., BHATT, G.M., WU, K.K. 1975. Geographic patterns of phenotypic diversity in a world collection of durum wheats. Crop Science 15. 700-704.

PEET, R. 1975. Relative diversity indices. Ecology 56. 496-498.

PIELOU, E.C. 1966. The measurement of diversity in different types of biological collections. Journal of Theoretical Biology 13. 131-141.

_____. 1969. An Introduction to Mathematical Ecology. Wiley, New York.

PINOL, M. 1997. Species diversity measurement for logged-over dipterocarp forest in the Philippines under different cutting regimes. Graduate Thesis. University of the Philippines, Los Banos.

RILEY, K.W. 1998. Phenotypic diversity in the Ethiopian noug germplasm. African Crop Science Journal 8. No. 2. 137-143.

Rice Diversity <http://www.ricediversity.org>

YANG, R.C., JANA, S., CLARKE, J.M. 1991. Phenotypic diversity and associations of some potentially drought-responsive characters in durum wheat. Crop Science 31. 1484-1491.

ZAHL, S. 1977. Jackknifing an index of diversity. Ecology 58. 907-913.
