Kayal and Kumar - Estimation of The Shannon Entropy of Several Shifted Exponential Populations

Statistics and Probability Letters 83 (2013) 1127–1135
Estimation of the Shannon’s entropy of several shifted exponential populations
Suchandan Kayal, Somesh Kumar ∗
Department of Mathematics, Indian Institute of Technology Kharagpur, Kharagpur - 721302, India
Abstract
Estimation of the entropy of several exponential distributions is considered. A general inadmissibility result for the scale equivariant estimators is proved. The results are extended to the case of unequal sample sizes. Risk functions of proposed estimators are compared numerically.
© 2013 Elsevier B.V. All rights reserved.

Article history: Received 22 January 2011; received in revised form 10 January 2013; accepted 11 January 2013; available online 20 January 2013.

Keywords: Entropy; Equivariant estimator; Inadmissibility; Monotone likelihood ratio; Brewster–Zidek technique
1. Introduction
The concept of entropy was introduced by Clausius, Boltzmann and Gibbs in thermodynamics and statistical mechanics
in the nineteenth century as a measure of disorder of a physical system. A major boost to the concept was provided by
Shannon (1948) who related it to the theory of communication as a measure of information. Suppose a random variable X
has the probability density function fθ (x), θ ∈ Θ . Then the Shannon’s entropy of the random variable X is defined by
H (θ ) = Eθ (− ln fθ (X )).
Presently the term entropy has applications in such diverse areas as molecular biology, hydrology, computer science and
meteorology. For example, molecular biologists use the concept of Shannon’s entropy in the analysis of patterns in gene
sequences. In dynamical systems, entropy is used to measure the exponential complexity of the system. In social studies,
entropy is used as a measure of the decay of systems such as organizations, social orders or practices. For a detailed account
of importance and applications of the principles of entropy in various disciplines one may refer to Cover and Thomas (1999),
Adami (2004), Misra et al. (2005), Robinson (2011) and Liu et al. (2011).
Several authors have considered the parametric estimation of entropy. Lazo and Rathie (1978) obtained
entropy expressions of various univariate continuous probability distributions. Ahmed and Gokhale (1989) derived the
expressions of entropy of several multivariate distributions. In particular, they studied multivariate normal and exponential
distributions and obtained the uniformly minimum variance unbiased estimator (UMVUE) of the entropy. The problem of
estimating the entropy of a multivariate normal distribution with respect to the squared error loss function has been further
investigated by Misra et al. (2005). They showed that the best affine equivariant estimator (BAEE) is unbiased and is also
generalized Bayes. Further improved estimators dominating the BAEE were also obtained.
∗ Corresponding author. Tel.: +91 3222283662; fax: +91 3222255303.

E-mail addresses: suchandan.kayal@gmail.com (S. Kayal), smsh@iitkgp.ac.in, smsh@maths.iitkgp.ernet.in (S. Kumar).
doi:10.1016/j.spl.2013.01.012
The problem of estimating the Shannon’s entropy in exponential populations is considered here. The exponential
distribution can be obtained as the distribution with the maximum entropy when a continuous random variable has a given mean
and support on the positive real line. Cover and Thomas (1999) describe an application in atmospheric physics. They
consider the distribution of the height of molecules in the atmosphere. Here the average potential energy of molecules is fixed
and the gas tends to the distribution with the maximum entropy subject to the restriction that the average potential energy
is constant. In fact the density of atmosphere is known to have an exponential distribution. If σ is the scale parameter of
the exponential distribution then the expression for the entropy is 1 + ln σ. Therefore in an exponential population, the
estimation of entropy is equivalent to estimation of the logarithm of the scale parameter.
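The identity between the entropy and 1 + ln σ can be checked numerically. The following is a minimal sketch, not part of the original paper: it averages −ln f(X) over simulated draws from a shifted exponential density and compares the result with the closed form. The function names, the location value 3.0, the scale 2.0 and the number of draws are illustrative assumptions.

```python
import math
import random


def exponential_entropy(sigma: float) -> float:
    """Closed-form Shannon entropy 1 + ln(sigma) of a (shifted) exponential."""
    return 1.0 + math.log(sigma)


def monte_carlo_entropy(mu: float, sigma: float, draws: int, seed: int = 0) -> float:
    """Estimate E[-ln f(X)] by averaging -ln f over simulated draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # expovariate takes the rate 1/sigma, so the draw has mean sigma
        x = mu + rng.expovariate(1.0 / sigma)
        log_density = -math.log(sigma) - (x - mu) / sigma
        total -= log_density
    return total / draws


closed_form = exponential_entropy(2.0)
estimate = monte_carlo_entropy(mu=3.0, sigma=2.0, draws=100_000)
print(closed_form, estimate)
```

The location parameter drops out of the entropy, which is why only the scale needs to be estimated.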
It was first observed by Stein (1964) that the BAEE of the normal variance is inadmissible. Brown (1968) gave general
conditions under which Stein type results can be obtained for scale parameter families. However, his results are not
applicable to many situations such as a shifted exponential distribution. Arnold (1970) proved that the BAEE of the scale parameter
in a shifted exponential distribution is inadmissible with respect to a squared error loss. Zidek (1973) extended the result of
Arnold to a larger class of bowl-shaped loss functions. The estimators of Arnold and Zidek are not smooth. Brewster (1974)
derived a smooth improved estimator; however, it does not dominate the BAEE in the whole parameter space. An
improvement over the BAEE of the reciprocal of the scale parameter was derived by Sharma (1977). Petropoulos and Kourouklis
(2002) derived a class of improved estimators with respect to a scale invariant loss function. Recently Bobotas and
Kourouklis (2009) have obtained a new class of improving estimators for the scale parameter in the presence of a nuisance parameter
under a scale invariant loss. In particular the result yields a class of estimators improving upon the BAEE of the scale
parameter in an exponential population.
Kayal and Kumar (2011a) considered the problem of estimating the entropy of an exponential distribution with respect to
a linex loss function. For the negative exponential model they proved that the best scale equivariant estimator of the entropy
is admissible and minimax. However, for the shifted exponential distribution, due to the presence of nuisance parameter,
the sufficient statistic changes and the BAEE of the entropy is shown to be inadmissible (Kayal and Kumar, 2011a). The
estimation of the entropy of k (≥2) negative exponential populations was considered by Kayal and Kumar (2011b) with
respect to the squared error and linex loss functions.
In this paper we consider the estimation of entropy of k (≥2) shifted exponential populations, when they have a common
scale parameter σ and different location parameters µ1 , . . . , µk . Note that this model is not covered by the work mentioned
in the previous two paragraphs. The exponential distribution is one of the most widely used distributions for describing lifetimes
of components, service times in queueing systems, time periods between two successive occurrences in a Poisson process,
etc. Recently Pal et al. (2006) have demonstrated that real-life data sets on stem sizes of male and female species of dioecious
plants, as obtained from Sakai and Burries (1985), are fitted by exponential distributions. Dragulescu and Yakovenko (2001)
have shown that individual annual income data in the USA are fitted very well by an exponential distribution. Here one may consider
the parameters µ1 , . . . , µk to denote the income levels below which tax filing is not required in different states. However,
the average income levels may be the same due to the overall economic policies of the country, which are applicable to all citizens.
Similarly one may consider service times at check-in counters of k different airlines at different airports. Here, due to different
starting times, the parameters µ1 , . . . , µk may be different, but average service times (once the service has started) may be
the same due to the similar nature of the trained service persons and the equipment used.
In Section 2, we obtain the BAEE for the Shannon’s entropy for our model. A general inadmissibility result for the scale
equivariant estimators is proved. Consequently, a new estimator is obtained which dominates the BAEE under the squared
error loss function. Further, problems of estimating the entropy are considered in restricted parameter spaces and improved
estimators are derived. In Section 3 the results are extended to the case when sample sizes are unequal. A heuristic discussion
is added in Section 4. A numerical comparison of the risk values of the proposed estimators is presented in Section 5.
2. The best affine equivariant estimator
Let (Xi1 , . . . , Xin ) be a random sample taken from the population Πi , i = 1, . . . , k (k ≥ 2). We assume that the k samples
are taken independently. The probability density associated with the population Πi is given by
$$f_i(x) = \begin{cases} \dfrac{1}{\sigma} \exp\left(-\dfrac{x - \mu_i}{\sigma}\right), & \text{if } x > \mu_i, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

The expression of the Shannon’s entropy is H (σ ) = k(1 + ln σ ). We consider an equivalent problem of estimating
Q (σ ) = ln σ under the squared error loss
$$L(\sigma, \delta) = (\delta - \ln \sigma)^2. \tag{2}$$
On the basis of the i-th n sample {Xi1 , . . . , Xin }, (Xi(1) , Yi ) is a complete and sufficient statistic for (µi , σ ), where Xi(1) =
min1≤j≤n Xij , Yi = j=1 Xij . Further, we define Zi = Yi − nXi(1) . Then Xi(1) and Zi are independently distributed. Also
Xi(1) follows an exponential distribution with location parameter µi and scale parameter σ /n, whereas 2Zi /σ follows
a chi-square distribution with 2(n − 1) degrees of freedom (see, for example, Lehmann and Casella, 1998, p. 43). Let
X (1) = (X1(1) , . . . , Xk(1) ) and T = i=1 Zi . Then (X (1) , T ) is complete and sufficient for (µ, σ ), where µ = (µ1 , . . . , µk ).
k
It should be noted that X (1) and T are independently distributed. Further, using the additive property of the chi-square
distribution, it can be shown that 2T /σ follows a chi-square distribution with 2k(n − 1) degrees of freedom. The maximum
likelihood estimator (MLE) of Q (σ ) is δML = ln T − ln(kn). We derive the UMVUE of Q (σ ) as δMV = ln T − ψ(k(n − 1)),
where ψ denotes the Euler psi (digamma) function, defined as $\psi(x) = \frac{d}{dx} \ln \Gamma(x)$.
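The statistics defined above translate directly into a few lines of code. The sketch below is an illustration rather than the authors' implementation: it computes the sample minima, T, the MLE and the UMVUE of ln σ from k samples of common size n. The helper `digamma_int` and the function name `entropy_statistics` are hypothetical; the helper relies on the standard identity ψ(m) = −γ + Σ_{j=1}^{m−1} 1/j for positive integers m, which suffices here because k(n − 1) is an integer.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def entropy_statistics(samples):
    """Return (minima, T, delta_ML, delta_MV) for k samples of common size n >= 2."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size n")
    minima = [min(s) for s in samples]                        # X_{i(1)}
    t = sum(sum(s) - n * m for s, m in zip(samples, minima))  # T = sum of Z_i
    delta_ml = math.log(t) - math.log(k * n)
    delta_mv = math.log(t) - digamma_int(k * (n - 1))
    return minima, t, delta_ml, delta_mv


minima, t, delta_ml, delta_mv = entropy_statistics([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(minima, t, delta_ml, delta_mv)
```

For the toy data, Z1 = 6 − 3 and Z2 = 12 − 6, so T = 9. An estimate of the entropy H (σ ) = k(1 + ln σ ) follows by plugging the estimate of ln σ into that expression.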
Consider the transformations ga,bi (xij ) = axij + bi , j = 1, . . . , n, i = 1, . . . , k. Here a is kept the same so as to have the
common scale property to be sustained after transformation. Writing b = (b1 , . . . , bk ) and ga,b = (ga,b1 , . . . , ga,bk ), we see
that under the transformation ga,b ,
(µ, σ ) → (aµ + b, aσ ), (X (1) , T ) → (aX (1) + b, aT ).
Consequently, we get ln σ → ln σ + ln a. The loss function (2) is invariant under the group Ga,b of affine transformations
ga,b , a > 0, b ∈ Rk , if δ → δ + ln a. The form of an affine equivariant estimator is obtained as
δc (X (1) , T ) = ln T − c (3)
for any constant c. The following theorem gives the BAEE of Q (σ ).
Theorem 1. Under the squared error loss function (2), the BAEE of Q (σ ) is δc0 (X (1) , T ), where c0 = ψ(k(n − 1)).
Proof. The risk of the estimators of the form (3) is
$$R(\sigma, \delta_c) = E(\ln T - c - \ln \sigma)^2,$$
which is minimized for
$$c = E(\ln(T/\sigma)) = \psi(k(n-1)) = c_0, \text{ say}.$$
Hence the result follows.
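The minimization in the proof can be made explicit by a bias-variance split. The sketch below uses the fact, stated earlier, that 2T /σ is chi-square with 2k(n − 1) degrees of freedom, so that T /σ is a Gamma(k(n − 1), 1) variable whose logarithm has mean ψ(k(n − 1)) and variance ψ′(k(n − 1)). The trigamma symbol ψ′ is notation introduced here for illustration and does not appear in the original.

```latex
\begin{align*}
R(\sigma, \delta_c)
  &= E\bigl(\ln(T/\sigma) - c\bigr)^2 \\
  &= \operatorname{Var}\bigl(\ln(T/\sigma)\bigr)
     + \bigl(E\ln(T/\sigma) - c\bigr)^2 \\
  &= \psi'(k(n-1)) + \bigl(\psi(k(n-1)) - c\bigr)^2 .
\end{align*}
```

The second term vanishes only at c = c0, so under this reading the risk of the BAEE is the constant ψ′(k(n − 1)), free of (µ, σ ).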
Remark 1. The BAEE is also the UMVUE. Also using Jensen’s inequality it can be shown that ψ(k(n − 1)) < ln(kn) which
means that the MLE underestimates Q (σ ).
2.1. Improving upon the best affine equivariant estimator
To get an improvement over the BAEE δc0 , we consider a larger class of estimators. Consider the scale group of
transformations Ga = {ga : ga (x) = ax, a > 0}. The problem of estimating Q (σ ) remains invariant with respect to the
group Ga . Under the transformation ga , we have

$$(\mu, \sigma) \to (a\mu, a\sigma), \qquad (X_{(1)}, T) \to (aX_{(1)}, aT)$$
and therefore, ln σ → ln σ +ln a. It can be also shown that the loss function (2) is invariant under the group Ga if δ → δ+ln a.
Therefore we get the form of a scale equivariant estimator as
δφ (W , T ) = ln T + φ(W ), (4)
where W = (W1 , . . . , Wk ), Wi = Xi(1) /T and φ is a real valued measurable function. A general inadmissibility result for
the estimators of the form (4) is proved in the theorem below. Let $B_1 = \{w : w_{(1)} > 0\}$, $B_2 = \{w : u < \exp(\phi(w) + \psi(kn))\}$,
$B_3 = \{w : w_{(k)} < 0\}$, $u = n \sum_{i=1}^{k} w_i + 1$, $w_{(1)} = \min\{w_1, \ldots, w_k\}$, $w_{(k)} = \max\{w_1, \ldots, w_k\}$ and $w_i = x_{i(1)}/t$
for i = 1, . . . , k. Also define, for a function φ(w) as in (4),
$$\phi_0(w) = \begin{cases} \ln\left(n \sum_{i=1}^{k} w_i + 1\right) - \psi(kn), & \text{if } w \in (B_1 \cap B_2) \cup (B_3 \cap B_2^c), \\ \phi(w), & \text{otherwise.} \end{cases} \tag{5}$$
Theorem 2. Let δφ be a scale equivariant estimator of the form (4) and φ0 (w) be as defined in (5). If there exists some (µ, σ )
such that $P_{(\mu,\sigma)}(\phi_0(W) \ne \phi(W)) > 0$, then under the squared error loss function (2), the estimator $\delta_{\phi_0}$ dominates $\delta_\phi$.
Proof. The risk function of the estimators of the form δφ given in (4) can be written as
$$R(\mu, \sigma, \delta_\phi) = E^{W} R_1(\mu, \sigma, W, \delta_\phi),$$
where R1 (µ, σ , w, δφ ) denotes the conditional risk of δφ given W = w, given by
$$R_1(\mu, \sigma, w, \delta_\phi) = E[(\delta_\phi - \ln \sigma)^2 \mid W = w] = E[(\ln(T/\sigma) + \phi(W))^2 \mid W = w]. \tag{6}$$

We notice that the conditional risk R1 (µ, σ , w, δφ ) in (6) is only a function of the ratio µ/σ . Therefore, without loss of
generality we can take σ = 1. Again the conditional risk R1 is a convex function of φ , and the choice of φ minimizing R1 can
be obtained as
φ̂(w, µ) = −E (ln T |W = w). (7)

In order to evaluate the term in (7), we derive the conditional distribution of T given W = w . The joint probability density
of X (1) and T is
$$f_{(X_{(1)},T)}(x_{(1)}, t) = \frac{n^k}{\Gamma(k(n-1))}\, e^{-\left[n \sum_{i=1}^{k} (x_{i(1)} - \mu_i) + t\right]}\, t^{k(n-1)-1}, \quad t \ge 0,\ x_{i(1)} \ge \mu_i,\ i = 1, \ldots, k. \tag{8}$$
Now using the transformations w1 = x1(1) /t , . . . , wk = xk(1) /t and t = t, we get the joint density of W and T , as
$$f_{(W,T)}(w, t) = \frac{n^k}{\Gamma(k(n-1))}\, e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}, \quad t \ge 0,\ t w_i \ge \mu_i.$$
To find the marginal density of W , we integrate f(W ,T ) (w, t ) with respect to t.

Case (i) Suppose all µi ’s are non-negative, i = 1, . . . , k:
In this case, t varies from η1 to ∞, where η1 = max{µ1 /w1 , . . . , µk /wk }. Therefore, the marginal density of W is
$$f_W(w) = \frac{n^k}{\Gamma(k(n-1))} \int_{\eta_1}^{\infty} e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt, \quad w_i > 0.$$
Consequently, the conditional density of T given W = w is given by

  
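$$f_{T|W}(t|w) = \frac{e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}}{\int_{\eta_1}^{\infty} e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt}, \quad t > \eta_1.$$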
Therefore, we get
∞
η1 ln t e−ut t kn−1 dt
E (ln T |W = w) = ∞ .
η1 e−ut t kn−1 dt
Substituting the expression of E (ln T |W = w) in (7), we get

∞
η1′ ln p e−p pkn−1 dp
φ̂(w, µ) = ln u − ∞ = ln u − h1 (η1′ ), say (9)
η′ e−p pkn−1 dp
1
where η1′ = η1 u. In order to apply the Brewster–Zidek technique (1974) we need to find the supremum and infimum of
φ̂(w, µ) given in (9). To this end, we show that the density function
$$\frac{e^{-p}\, p^{kn-1}}{\int_{\eta_1'}^{\infty} e^{-p}\, p^{kn-1}\, dp}, \quad \eta_1' < p < \infty,$$
has a monotone likelihood ratio property in η1′ and then apply Lemma 3.4.2 in Lehmann and Romano (2009). It can then be
shown that h1 (η1′ ) is a nondecreasing function of η1′ , and η1′ lies between 0 and ∞. Thus we get
$$\sup_{\eta_1'} h_1(\eta_1') = +\infty \quad \text{and} \quad \inf_{\eta_1'} h_1(\eta_1') = \psi(kn).$$
Therefore, from (9) we get
$$\sup_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn) \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.$$
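The behaviour of h1 claimed in Case (i) can be inspected numerically. The sketch below is a rough check under stated assumptions, not a proof: it evaluates h1 (η) = E(ln P | P > η) for P following a Gamma(kn, 1) law with a midpoint rule, truncating the integral at an arbitrary upper limit of 80. The function names, the grid of η values and the choice kn = 8 are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def h1(eta: float, shape: int, upper: float = 80.0, steps: int = 40_000) -> float:
    """E[ln P | P > eta] for P ~ Gamma(shape, 1), midpoint rule on (eta, upper)."""
    width = (upper - eta) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        p = eta + (i + 0.5) * width
        weight = math.exp(-p) * p ** (shape - 1)  # unnormalised gamma density
        numerator += math.log(p) * weight
        denominator += weight
    return numerator / denominator


kn = 8  # for example k = 2 and n = 4
values = [h1(eta, kn) for eta in (0.0, 1.0, 4.0, 8.0, 16.0)]
print(values, digamma_int(kn))
```

On this grid the values should increase with η, start near ψ(kn) at η = 0, and exceed ln η, which is consistent with the infimum ψ(kn) and the supremum +∞.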
Case (ii) Suppose all µi ’s are negative, i = 1, . . . , k:

In this case several possibilities in wi ’s may arise, which are (a) all wi ’s are non-negative, (b) all wi ’s are negative and (c )
some wi ’s are non-negative and remaining are negative. In the following discussion, we investigate all these cases in detail.
(a) When all wi ’s are non-negative, the range of t is from 0 to ∞. Therefore the marginal density of W is
$$f_W(w) = \frac{n^k}{\Gamma(k(n-1))} \int_{0}^{\infty} e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt, \quad w_i > 0,\ i = 1, \ldots, k.$$
Consequently, the conditional density of T given W = w can be obtained as

  
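$$f_{T|W}(t|w) = \frac{e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}}{\int_{0}^{\infty} e^{-\left[n \sum_{i=1}^{k} (w_i t - \mu_i) + t\right]}\, t^{kn-1}\, dt}, \quad t > 0.$$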
Therefore, the conditional expectation of ln T given W = w is given by

∞
ln t e−ut t kn−1 dt
E (ln T |W = w) = 0
∞ .
0
e−ut t kn−1 dt
Substituting the expression of E (ln T |W = w) in (7) and integrating, we get
φ̂(w, µ) = ln u − ψ(kn). (10)

(b) Now we consider the case when wi ’s are negative:
In this case, t varies from 0 to η2 , where η2 = min{µ1 /w1 , . . . , µk /wk }. Similar to the Case (a) we derive the conditional
expectation of ln T given W = w as
$$E(\ln T \mid W = w) = \frac{\int_{0}^{\eta_2} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\eta_2} e^{-ut}\, t^{kn-1}\, dt}.$$
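When u > 0, we have from (7)
$$\hat{\phi}(w, \mu) = \ln u - \frac{\int_{0}^{\eta_2'} \ln p\; e^{-p}\, p^{kn-1}\, dp}{\int_{0}^{\eta_2'} e^{-p}\, p^{kn-1}\, dp} = \ln u - h_2(\eta_2'), \text{ say},$$
where η2′ = η2 u. Using the monotone likelihood ratio property as in Case (i), we can show that h2 (η2′ ) is a nondecreasing function
of η2′ . Thus we get
$$\sup_{\eta_2'} h_2(\eta_2') = \psi(kn) \quad \text{and} \quad \inf_{\eta_2'} h_2(\eta_2') = -\infty.$$
Therefore,
$$\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn).$$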
Similarly, when u < 0, we get
$$\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.$$
(c) For the case when some wi ’s are non-negative and the remaining are negative, we show that the results are permutation
invariant. Let $(i_1, \ldots, i_k)$ be a permutation of (1, . . . , k). We assume $w_{i_j} \ge 0$ for j = 1, . . . , r and $w_{i_j} < 0$ for j = r + 1,
. . . , k, r = 1, . . . , k − 1. Thus the range of t is from 0 to η3 , where $\eta_3 = \min\{\mu_{i_{r+1}}/w_{i_{r+1}}, \ldots, \mu_{i_k}/w_{i_k}\}$. In this case the
conditional expectation of ln T given W = w is obtained as
$$E(\ln T \mid W = w) = \frac{\int_{0}^{\eta_3} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{0}^{\eta_3} e^{-ut}\, t^{kn-1}\, dt}.$$
Using the arguments as in Part (b), we get the supremum and infimum of φ̂ given in (7) as
$$\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn),$$
when u > 0; and
$$\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty,$$
when u < 0.
Case (iii) Some of µi ’s are non-negative and remaining are negative:
In this case we show that finding the supremum and infimum of φ̂(w, µ) is invariant under different permutations in
µi ’s. We consider the case that within all µi ’s any r (r = 1, . . . , k − 1) terms are non-negative and remaining (k − r )
terms are negative. Let (i1 , . . . , ik ) be a permutation of (1, . . . , k) so that µij ≥ 0 for j = 1, . . . , r, and µij < 0 for
j = r + 1, . . . , k. Therefore, when µi1 , . . . , µir ≥ 0, all corresponding wij ’s are also non-negative, for j = 1, . . . , r, whereas
when µir +1 , . . . , µik < 0 there are several possibilities: all (k − r ) wi ’s are non-negative, all (k − r )wi ’s are negative, some
of wi ’s are non-negative and remaining are negative. To find the supremum and infimum of φ̂(w, µ) given in (7) we use the
technique used in Case (i).
(a) Let us consider the case wi1 , . . . , wir , . . . , wik > 0:
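Under this case the range of t is from $\eta_{11} = \max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}\}$ to ∞. The conditional expectation of ln T given
W = w is given by
$$E(\ln T \mid W = w) = \frac{\int_{\eta_{11}}^{\infty} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_{11}}^{\infty} e^{-ut}\, t^{kn-1}\, dt}.$$
Hence, we get
$$\sup_{\mu} \hat{\phi}(w, \mu) = \ln u - \psi(kn) \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.$$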
(b) Suppose wi1 , . . . , wir > 0 and wir +1 , . . . , wik < 0:

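The range of t is from $\eta_{12} = \max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}\}$ to $\eta_{13} = \min\{\mu_{i_{r+1}}/w_{i_{r+1}}, \ldots, \mu_{i_k}/w_{i_k}\}$. Therefore, the
conditional expectation of ln T given W = w can be obtained as
$$E(\ln T \mid W = w) = \frac{\int_{\eta_{12}}^{\eta_{13}} \ln t\; e^{-ut}\, t^{kn-1}\, dt}{\int_{\eta_{12}}^{\eta_{13}} e^{-ut}\, t^{kn-1}\, dt}.$$
It can be shown as before that
$$\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.$$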
(c) Let wi1 , . . . , wir > 0 and within (k − r ), some wi ’s are non-negative and remaining are negative:
In this case we again show that the results are permutation invariant. Let $(j_1, \ldots, j_{k-r})$ be a
permutation of $(i_{r+1}, \ldots, i_k)$. Suppose $w_{j_1}, \ldots, w_{j_m} \ge 0$ and $w_{j_{m+1}}, \ldots, w_{j_{k-r}} < 0$. The range of t is from
$\max\{\mu_{i_1}/w_{i_1}, \ldots, \mu_{i_r}/w_{i_r}, \mu_{j_1}/w_{j_1}, \ldots, \mu_{j_m}/w_{j_m}\}$ to $\min\{\mu_{j_{m+1}}/w_{j_{m+1}}, \ldots, \mu_{j_{k-r}}/w_{j_{k-r}}\}$. Arguing as earlier, it can be shown that
$$\sup_{\mu} \hat{\phi}(w, \mu) = +\infty \quad \text{and} \quad \inf_{\mu} \hat{\phi}(w, \mu) = -\infty.$$
An application of the Brewster–Zidek technique (1974) on the function R1 (µ, σ , w, δφ ) then completes the proof of the
theorem.
As a consequence of this theorem we get the following corollary.
Corollary 1. Let $C_1 = \{w : u < e^{d}\}$ and d = ψ(kn) − ψ(k(n − 1)). The BAEE δc0 of Q (σ ) is inadmissible and dominated by
the estimator given by
$$\delta_{IB} = \begin{cases} \ln(uT) - \psi(kn), & \text{if } w \in (B_1 \cap C_1) \cup (B_3 \cap C_1^c), \\ \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}$$
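Corollary 1 can be turned into a small function of the sample minima and T. The sketch below is an illustration under an explicit assumption: the improvement region is read as (B1 ∩ C1 ) ∪ (B3 ∩ C1c ), since the set operators are not legible in every copy of the text. The names `delta_baee`, `delta_ib` and `digamma_int` are hypothetical, and the digamma helper is valid only for positive integer arguments.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def delta_baee(t: float, k: int, n: int) -> float:
    """Best affine equivariant estimator ln T - psi(k(n - 1)) of ln(sigma)."""
    return math.log(t) - digamma_int(k * (n - 1))


def delta_ib(minima, t: float, n: int) -> float:
    """Improved estimator of ln(sigma) from the minima X_{i(1)} and T."""
    k = len(minima)
    w = [x / t for x in minima]
    u = n * sum(w) + 1.0
    d = digamma_int(k * n) - digamma_int(k * (n - 1))
    in_c1 = u < math.exp(d)
    all_positive = min(w) > 0.0  # w in B1
    all_negative = max(w) < 0.0  # w in B3
    if (all_positive and in_c1) or (all_negative and not in_c1):
        return math.log(u * t) - digamma_int(k * n)
    return delta_baee(t, k, n)


shrunk = delta_ib([0.05, 0.1], t=9.0, n=3)     # minima close to zero
unchanged = delta_ib([1.0, 2.0], t=9.0, n=3)   # minima far from zero
print(shrunk, unchanged, delta_baee(9.0, 2, 3))
```

In the first call u is about 1.05, below e^d with d = 1/4 + 1/5, so the estimate is pulled below the BAEE; in the second call u = 2 and the BAEE is returned unchanged.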
Remark 2. We also study the entropy estimation problem when it is known a priori that all µi ’s are bounded below. Such a
situation may arise when the minimum guarantee time of components is known to be more than a pre-specified constant due
to physical constraints. In this case one may take without loss of generality that µ(1) ≥ 0, where µ(1) = min{µ1 , . . . , µk }.
Here the MLE of Q (σ ) is same as the MLE obtained for unrestricted parameter space. The inadmissibility of the BAEE δc0 of
Q (σ ) can be established using the steps of Case (i) of the proof of the Theorem 2. The improved estimator is given by
$$\delta_{IB}^{+} = \begin{cases} \ln(uT) - \psi(kn), & \text{if } w \in C_1, \\ \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}$$
Remark 3. We have also considered the entropy estimation when, contrary to the case in Remark 2, the guarantee times are
known to be bounded from above. Here one may assume a priori that µ(k) < 0, where µ(k) = max{µ1 , . . . , µk }. In this case,
the MLE of Q (σ ) gets modified as $\delta_{RM} = \ln T^0 - \ln(kn)$, where $T^0 = \sum_{i=1}^{k} (Y_i - nX^0_{i(1)})$, $X^0_{i(1)} = \min\{0, X_{i(1)}\}$, i = 1, . . . , k.
This is the restricted maximum likelihood estimator (RMLE) of the entropy. Further, the inadmissibility of the BAEE δc0 is
proved using the steps used in Case (ii) of the proof of Theorem 2. Let $C_2 = \{w : w_{(r)} < 0\}$, $C_3 = \{w : w_{(r+1)} > 0\}$. The
improved estimator is then
$$\delta_{IB}^{-} = \begin{cases} \ln(uT^0) - \psi(kn), & \text{if } w \in B_1 \cup (B_3 \cap C_1^c) \cup (C_2 \cap C_3 \cap C_1^c), \\ \ln T - \psi(k(n-1)), & \text{otherwise.} \end{cases}$$
3. Unequal sample sizes
The results of the previous section can be extended to the case when random samples with unequal sample sizes are
drawn from k exponential populations. The proofs, though somewhat more complicated, are similar to those of the results
in Section 2, and hence are omitted. For the sake of completeness, the notation and results have been stated here in full
detail. Suppose $(X_{i1}, \ldots, X_{in_i})$, i = 1, . . . , k, are independent random samples drawn from the populations Π1 , . . . , Πk
respectively, with the pdf of the i-th population given by (1). On the basis of the i-th sample $\{X_{i1}, \ldots, X_{in_i}\}$, $(X^*_{i(1)}, Y^*_i)$ is a
complete and sufficient statistic for (µi , σ ), where $X^*_{i(1)} = \min_{1 \le j \le n_i} X_{ij}$ and $Y^*_i = \sum_{j=1}^{n_i} X_{ij}$. Let $X^*_{(1)} = (X^*_{1(1)}, \ldots, X^*_{k(1)})$,
$T^* = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - X^*_{i(1)})$ and $N = \sum_{i=1}^{k} n_i$. Then $(X^*_{(1)}, T^*)$ is a complete and sufficient statistic for (µ, σ ), and $X^*_{(1)}$ and $T^*$
are independently distributed. Also $X^*_{i(1)}$ follows an exponential distribution with location parameter µi and scale parameter
σ /ni , and $2T^*/\sigma$ follows a chi-square distribution with 2(N − k) degrees of freedom. The MLE and the UMVUE of Q (σ ) are
$\delta^*_{ML} = \ln T^* - \ln N$ and $\delta^*_{MV} = \ln T^* - \psi(N - k)$ respectively. The problem under study is also invariant with respect to
Ga,b , the group of affine transformations. The form of the affine equivariant estimator will be $\delta^*_c(X^*_{(1)}, T^*) = \ln T^* - c$
for some real constant c. In the following theorem we get the BAEE.
Theorem 3. Under the squared error loss function (2), the BAEE of Q (σ ) is $\delta^*_{c_0^*}(X^*_{(1)}, T^*)$, where $c_0^* = \psi(N - k)$.
As in Section 2, we can obtain the form of the scale equivariant estimator of Q (σ ) as
$$\delta^*_{\phi^*}(W^*, T^*) = \ln T^* + \phi^*(W^*), \tag{11}$$
where $W^* = (W^*_1, \ldots, W^*_k)$ and $W^*_i = X^*_{i(1)}/T^*$. Suppose $B^*_1 = \{w^* : w^*_{(1)} > 0\}$, $B^*_2 = \{w^* : u^* < \exp(\phi^*(w^*) + \psi(N))\}$,
$B^*_3 = \{w^* : w^*_{(k)} < 0\}$, $u^* = \sum_{i=1}^{k} n_i w^*_i + 1$, $w^*_{(1)} = \min\{w^*_1, \ldots, w^*_k\}$, $w^*_{(k)} = \max\{w^*_1, \ldots, w^*_k\}$ and $w^*_i = x^*_{i(1)}/t^*$ for
i = 1, . . . , k. For a function φ∗ in (11), define
$$\phi^*_0(w^*) = \begin{cases} \ln\left(\sum_{i=1}^{k} n_i w^*_i + 1\right) - \psi(N), & \text{if } w^* \in (B^*_1 \cap B^*_2) \cup (B^*_3 \cap B^{*c}_2), \\ \phi^*(w^*), & \text{otherwise.} \end{cases} \tag{12}$$
The following theorem proves a general inadmissibility result for the estimators of the form (11).
Theorem 4. Let $\delta^*_{\phi^*}$ be a scale equivariant estimator of the form (11) and $\phi^*_0(w^*)$ be as defined in (12). If there exists some (µ, σ )
such that $P_{(\mu,\sigma)}(\phi^*_0(W^*) \ne \phi^*(W^*)) > 0$, then under the squared error loss function (2), the estimator $\delta^*_{\phi^*_0}$ dominates $\delta^*_{\phi^*}$.
In the following corollaries, the improved estimator of the BAEE is given for various cases.
Corollary 2. The BAEE $\delta^*_{c_0^*}$ of Q (σ ) is inadmissible and dominated by the estimator given by
$$\delta^*_{IB} = \begin{cases} \ln(u^* T^*) - \psi(N), & \text{if } w^* \in (B^*_1 \cap C^*_1) \cup (B^*_3 \cap C^{*c}_1), \\ \ln T^* - \psi(N - k), & \text{otherwise,} \end{cases}$$
where $C^*_1 = \{w^* : u^* < e^{d^*}\}$ and $d^* = \psi(N) - \psi(N - k)$.
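For unequal sample sizes the same computation goes through with N and the ni in place of kn and n. The sketch below is a hedged illustration, again reading the region in Corollary 2 as (B1∗ ∩ C1∗ ) ∪ (B3∗ ∩ C1∗c ); the function names and the toy data are assumptions, and the digamma helper is restricted to positive integers, which is enough because N and N − k are integers.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(m: int) -> float:
    """psi(m) for a positive integer m: -gamma + sum_{j=1}^{m-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def delta_ib_unequal(samples) -> float:
    """Improved estimator of ln(sigma) for samples of possibly unequal sizes."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    total = sum(sizes)                                  # N
    minima = [min(s) for s in samples]                  # X*_{i(1)}
    t_star = sum(sum(s) - n_i * m for s, n_i, m in zip(samples, sizes, minima))
    u_star = sum(n_i * m for n_i, m in zip(sizes, minima)) / t_star + 1.0
    d_star = digamma_int(total) - digamma_int(total - k)
    in_c1 = u_star < math.exp(d_star)
    # t_star > 0, so the sign of w*_i equals the sign of the i-th minimum
    if (min(minima) > 0.0 and in_c1) or (max(minima) < 0.0 and not in_c1):
        return math.log(u_star * t_star) - digamma_int(total)
    return math.log(t_star) - digamma_int(total - k)   # BAEE


far = delta_ib_unequal([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]])
near = delta_ib_unequal([[0.1, 2.0, 3.0], [0.2, 4.0, 6.0, 8.0]])
print(far, near)
```

In the first data set T∗ = 15 and u∗ = 26/15 exceeds e^{d∗}, so the BAEE is returned; in the second the minima are close to zero and the modified branch is used.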
Corollary 3. The BAEE $\delta^*_{c_0^*}$ of Q (σ ) is inadmissible when µ(1) ≥ 0 and dominated by the estimator given by
$$\delta^{*+}_{IB} = \begin{cases} \ln(u^* T^*) - \psi(N), & \text{if } w^* \in C^*_1, \\ \ln T^* - \psi(N - k), & \text{otherwise.} \end{cases}$$
Corollary 4. The estimator
$$\delta^{*-}_{IB} = \begin{cases} \ln(u^* T^*) - \psi(N), & \text{if } w^* \in B^*_1 \cup (B^*_3 \cap C^{*c}_1) \cup (C^*_2 \cap C^*_3 \cap C^{*c}_1), \\ \ln T^* - \psi(N - k), & \text{otherwise,} \end{cases}$$
where $C^*_2 = \{w^* : w^*_{(r)} < 0\}$, $C^*_3 = \{w^* : w^*_{(r+1)} > 0\}$, r = 1, . . . , k − 1, dominates the BAEE $\delta^*_{c_0^*}$ of Q (σ ) when µ(k) < 0.
[Figure: twelve surface plots, panels (a) to (l), each showing the risk R against µ1 and µ2.]
Fig. 1. The risk plot of the estimators $\delta_{IB}$, $\delta_{IB}^{+}$, $\delta_{IB}^{-}$ and $\delta_{RM}$ for n = (4, 6, 8). Graphs (a, b, c) for $\delta_{IB}$, Graphs (d, e, f) for $\delta_{IB}^{+}$, Graphs (g, h, i) for $\delta_{IB}^{-}$ and Graphs
(j, k, l) for $\delta_{RM}$ respectively.
4. Heuristic discussion
We have considered the problem of estimating entropy of k shifted exponential populations with a common scale but
different locations. The entropy expression is related to the logarithm of the scale parameter. Stein (1964) first showed
that the best equivariant estimator of normal variance is inadmissible. Later this phenomenon was observed for some other
scale parameter families including exponential distribution (see Brown, 1968 and Arnold, 1970). Misra et al. (2005) obtained
Stein type and Brewster–Zidek type estimators for the entropy for a multivariate normal population. In this paper we derive
dominating estimators over the BAEE for the entropy of k shifted exponential populations. The model is important as the
structure of the sufficient statistics gets modified.
5. Numerical comparisons
In this section we compare numerically the risk performance of the improved estimators $\delta_{IB}$, $\delta_{IB}^{+}$ and $\delta_{IB}^{-}$ with the
BAEE δc0 . It is noticed that for all cases of µi ’s the risk differences become small for large values of n. For n ≥ 100 the
risk values are same up to six decimal places. For the purpose of presentation of the numerical study, we have taken
n = 4, 6, 8, 10, 15, 20, 25, 30 and 50 and k = 2. The risk values of the proposed estimators are calculated using simulations
based on 10 000 samples of size n. Since the risk functions of the estimators are functions of (µ1 /σ , . . . , µk /σ ), we take
σ = 1 without loss of generality. The results of the numerical study are presented through graphs. The graphs corresponding
to values of n = 4, 6 and 8 are presented in Fig. 1 in this paper, whereas for values of n = 10, 15, 20, 25, 30 and 50, they are
placed on the website: http://www.facweb.iitkgp.ernet.in/∼smsh/graph.pdf. The following observations are made based on
the risk values.
(a) Under the squared error loss function the risk values of the MLE δML are 0.320865, 0.161249 and 0.104970 and those
of the BAEE δc0 are 0.178992, 0.104975 and 0.075129 for n = 4, 6, 8 respectively. Graphs (a), (b), (c) in Fig. 1 represent
the risk plot of the estimator $\delta_{IB}$. We observe that for different values of n the regions where $\delta_{IB}$ improves over δc0
are different. Keeping µ1 fixed, the margin of improvement grows as the magnitude of µ2 decreases. It is also noticed
that we get considerable improvement when both µ1 and µ2 are close to zero. In this case, the region of improvement is
approximately |µ1 | ≤ 0.5 and |µ2 | ≤ 0.5. The maximum improvement observed is about 12%.
(b) When both µ1 and µ2 are non-negative, the risk values of the estimators are plotted in graphs (d), (e), (f) in
Fig. 1. For large values of µ1 and µ2 (approximately ≥ 1), the risk of $\delta_{IB}^{+}$ equals R(δc0 ).
As µ1 and µ2 approach zero, the risk of $\delta_{IB}^{+}$ decreases; just before 0 it stops decreasing and starts
increasing. The maximum improvement observed is about 12%.
(c) When both µ1 and µ2 are negative, graphs (g), (h), (i) and (j), (k), (l) in Fig. 1 represent the risk plots of the
estimators $\delta_{IB}^{-}$ and $\delta_{RM}$ respectively. From the numerical risk values it is observed that the risk values of $\delta_{RM}$ and $\delta_{IB}^{-}$ decrease
when both µ1 and µ2 increase. The performance of $\delta_{RM}$ is always better than that of δML . We also see that the estimator $\delta_{IB}^{-}$
always performs better than $\delta_{RM}$. The maximum improvement observed is about 27%.
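A comparison of this kind can be reproduced in outline with a short simulation. The sketch below is not the authors' code: it assumes k = 2, σ = 1, a fixed seed and 20 000 replications, and reads the improvement region of Corollary 1 as (B1 ∩ C1 ) ∪ (B3 ∩ C1c ). It also evaluates closed-form risks for the MLE and the BAEE, based on the standard fact that the logarithm of a Gamma(m, 1) variable has mean ψ(m) and variance ψ′(m); these should land close to the simulated values 0.320865 and 0.178992 reported for n = 4. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(m):
    """psi(m) for a positive integer m."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, m))


def trigamma_int(m):
    """psi'(m) for a positive integer m: pi^2/6 - sum_{j=1}^{m-1} 1/j^2."""
    return math.pi ** 2 / 6.0 - sum(1.0 / j ** 2 for j in range(1, m))


def exact_risks(k, n):
    """Closed-form squared error risks of the MLE and the BAEE of ln(sigma)."""
    m = k * (n - 1)
    baee = trigamma_int(m)
    mle = baee + (digamma_int(m) - math.log(k * n)) ** 2
    return mle, baee


def simulated_risks(mu, n, reps, seed=0):
    """Monte Carlo risks of the BAEE and delta_IB, with sigma = 1."""
    rng = random.Random(seed)
    k = len(mu)
    c0 = digamma_int(k * (n - 1))
    c1 = digamma_int(k * n)
    threshold = math.exp(c1 - c0)  # e^d
    loss_baee = 0.0
    loss_ib = 0.0
    for _ in range(reps):
        samples = [[m + rng.expovariate(1.0) for _ in range(n)] for m in mu]
        minima = [min(s) for s in samples]
        t = sum(sum(s) - n * x for s, x in zip(samples, minima))
        u = n * sum(minima) / t + 1.0
        baee = math.log(t) - c0
        in_c1 = u < threshold
        if (min(minima) > 0 and in_c1) or (max(minima) < 0 and not in_c1):
            ib = math.log(u * t) - c1
        else:
            ib = baee
        loss_baee += baee ** 2  # the target ln(sigma) equals 0
        loss_ib += ib ** 2
    return loss_baee / reps, loss_ib / reps


mle_exact, baee_exact = exact_risks(k=2, n=4)
baee_sim, ib_sim = simulated_risks(mu=(0.0, 0.0), n=4, reps=20_000)
print(mle_exact, baee_exact, baee_sim, ib_sim)
```

Choosing µ1 = µ2 = 0 places the run in the region where the text reports the largest gain, so the simulated risk of the improved estimator is expected to fall below that of the BAEE.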
Acknowledgments
The authors thank the reviewers and a co-editor-in-chief for their valuable suggestions which have considerably
improved the content and the presentation of the paper.
References
Adami, C., 2004. Information theory in molecular biology. Phys. Life Rev. 1, 3–22.
Ahmed, N.A., Gokhale, D.V., 1989. Entropy expressions and their estimators for multivariate distributions. IEEE Trans. Inf. Theory 35, 688–692.
Arnold, B.C., 1970. Inadmissibility of the usual scale estimate for a shifted exponential distribution. J. Amer. Statist. Assoc. 65, 1260–1264.
Bobotas, P., Kourouklis, S., 2009. Strawderman-type estimators for a scale parameter with application to the exponential distribution. J. Statist. Plann.
Inference 139, 3001–3012.
Brewster, J.F., 1974. Alternative estimators for the scale parameter of the exponential distribution with unknown location. Ann. Statist. 2, 553–557.
Brewster, J.F., Zidek, J.V., 1974. Improving on equivariant estimators. Ann. Statist. 2, 21–38.
Brown, L.D., 1968. Inadmissibility of the usual estimators of scale parameters in problems with unknown location and scale parameters. Ann. Math. Statist.
39, 29–48.
Cover, T.M., Thomas, J.A., 1999. Elements of Information Theory. Wiley, New York.
Dragulescu, A., Yakovenko, V.M., 2001. Evidence for the exponential distribution of income in the USA. Eur. Phys. J. B 20, 585–589.
Kayal, S., Kumar, S., 2011a. Estimating entropy of an exponential population under linex loss function. J. Indian Statist. Assoc. 49, 91–112.
Kayal, S., Kumar, S., 2011b. On estimating the Shannon entropy of several exponential populations. Int. J. Stat. Econ. 7, 42–52.
Lazo, A.C.G., Rathie, P.N., 1978. On the entropy of continuous probability distributions. IEEE Trans. Inf. Theory 24, 120–122.
Lehmann, E.L., Casella, G., 1998. Theory of Point Estimation, second ed. Springer, New York.
Lehmann, E.L., Romano, J.P., 2009. Testing Statistical Hypotheses. Springer, New York.
Liu, Y., Liu, C., Wang, D., 2011. Understanding atmospheric behaviour in terms of entropy: a review of applications of the second law of thermodynamics
to meteorology. Entropy 13, 211–240.
Misra, N., Singh, H., Demchuk, E., 2005. Estimation of the entropy of a multivariate normal distribution. J. Multivariate Anal. 92, 324–342.
Pal, N., Jin, C., Lim, W., 2006. Handbook of Exponential and Related Distributions for Engineers and Scientists. Chapman and Hall/CRC, Boca Raton.
Petropoulos, C., Kourouklis, S., 2002. A class of improved estimators for the scale parameter of an exponential distribution with unknown location. Comm.
Statist. Theory Methods 31, 325–335.
Robinson, D.W., 2011. Entropy and uncertainty. Entropy 10, 493–506.
Sakai, A.K., Burries, T.A., 1985. Growth in male and female aspen clones: a twenty-five year longitudinal study. Ecology 66, 1921–1927.
Shannon, C., 1948. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423.
Sharma, D., 1977. Estimation of the reciprocal of the scale parameter in a shifted exponential distribution. Sankhyā Ser. A 39, 203–205.
Stein, C., 1964. Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math. 16, 155–160.
Zidek, J.V., 1973. Estimating the scale parameter of the exponential distribution with unknown location. Ann. Statist. 1, 264–278.
