
Solutions to Steven Kay's Statistical Estimation book
Satish Bysany
Aalto University School of Electrical Engineering
March 1, 2011

Introduction

This is a set of notes describing solutions to Steven Kay's book Fundamentals of Statistical Signal Processing: Estimation Theory. A brief review of notation is in order.

1.1 Notation

I is the identity matrix.
0 represents a matrix or vector of all zeros.
e is a column vector of all ones.
J is the exchange matrix, with 1s on the anti-diagonal and 0s elsewhere.
e_j is a column vector whose j-th element is 1 and all other elements are 0.
a \cdot b = a^H b is the dot product of a and b.
\partial f(t)/\partial t is the derivative of a scalar function f(t) depending on an M \times 1 real vector parameter t, defined by

\frac{\partial f(t)}{\partial t} = \left[ \frac{\partial f(t)}{\partial t_1}, \ \frac{\partial f(t)}{\partial t_2}, \ \ldots, \ \frac{\partial f(t)}{\partial t_M} \right]^T

\partial h(t)/\partial t is the derivative of an M \times 1 real vector function h(t) depending upon a scalar value t, defined by

\frac{\partial h(t)}{\partial t} = \left[ \frac{\partial h_1(t)}{\partial t}, \ \frac{\partial h_2(t)}{\partial t}, \ \ldots, \ \frac{\partial h_M(t)}{\partial t} \right]^T

Chapter 2

Solutions to Problems in Chapter 2

2.1 Problem 2.1

The data x = \{x[0], x[1], \ldots, x[N-1]\} are observed, where the x[n]'s are i.i.d. as \mathcal{N}(0, \sigma^2). We wish to estimate the variance \sigma^2 as

\hat{\sigma}^2 = \frac{1}{N} \sum_{n=0}^{N-1} x^2[n]    (1)

Solution

From the problem definition, it follows that, for all n,

\mu = E(x[n]) = 0
\sigma^2 = E\big[(x[n] - \mu)^2\big] = E\big[x^2[n]\big]

Now take the E(\cdot) operator on both sides of Eq. (1) and use the fact that, for any two random variables X and Y, E(X + Y) = E(X) + E(Y):

E(\hat{\sigma}^2) = \frac{1}{N} \sum_{n=0}^{N-1} E\big[x^2[n]\big] = \frac{1}{N} \sum_{n=0}^{N-1} \sigma^2 = \frac{N\sigma^2}{N} = \sigma^2    (2)

Hence the estimator (1) is unbiased. Note that this result holds even if the x[n]'s are not independent!

Next, apply the variance operator var(\cdot) on both sides of Eq. (1) and use the fact that, for independent random variables X and Y,

var(aX + bY) = a^2 var(X) + b^2 var(Y)

\Rightarrow \quad var(\hat{\sigma}^2) = \frac{1}{N^2} \sum_{n=0}^{N-1} var\big(x^2[n]\big)    (3)

Let X \sim \mathcal{N}(0, 1) be a normal random variable with zero mean and unit variance. Then, by definition, Y = X^2 \sim \chi_1^2 is chi-square distributed with 1 degree of freedom. We know that mean(\chi_n^2) = n and var(\chi_n^2) = 2n, so var(Y) = var(X^2) = 2 \cdot 1 = 2.

Introducing Z = \sigma X implies that var(Z) = \sigma^2 var(X) = \sigma^2. Since E(Z) = \sigma E(X) = 0, we conclude Z \sim \mathcal{N}(0, \sigma^2).

Now consider var(Z^2) = var(\sigma^2 X^2) = \sigma^4 var(X^2) = 2\sigma^4. Since each x[n] \sim \mathcal{N}(0, \sigma^2), we have

var(x^2[0]) = var(x^2[1]) = \cdots = var(x^2[N-1]) = 2\sigma^4
Hence, Eq. (3) simplifies to

var(\hat{\sigma}^2) = \frac{1}{N^2} \sum_{n=0}^{N-1} (2\sigma^4) = \frac{2\sigma^4 N}{N^2} = \frac{2\sigma^4}{N}    (4)

As N \to \infty, var(\hat{\sigma}^2) \to 0.
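As a quick numerical check (an illustrative addition to these notes; the values of \sigma^2, N and the number of trials are arbitrary choices), Eqs. (2) and (4) can be verified by Monte Carlo simulation:

import numpy as np

rng = np.random.default_rng(0)
sigma2, N, trials = 2.0, 50, 200_000   # true variance, record length, Monte Carlo runs

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
sigma2_hat = np.mean(x**2, axis=1)            # estimator of Eq. (1), one value per trial

print(np.mean(sigma2_hat))                    # ~ sigma2,         confirming Eq. (2)
print(np.var(sigma2_hat), 2 * sigma2**2 / N)  # ~ 2*sigma2^2/N,   confirming Eq. (4)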

2.2 Problem 2.5


Two samples \{x[0], x[1]\} are independently observed from a \mathcal{N}(0, \sigma^2) distribution. The estimator

\hat{\sigma}^2 = \frac{1}{2}\left(x^2[0] + x^2[1]\right)    (5)

is unbiased. Find the PDF of \hat{\sigma}^2 to determine if it is symmetric about \sigma^2.
Solution

Consider two standard normal random variables X_0 and X_1, that is, X_i \sim \mathcal{N}(0, 1), i = 0, 1. Then, by definition, X = X_0^2 + X_1^2 is \chi^2(n)-distributed with n = 2 degrees of freedom. Its PDF is

f_X(x) = \frac{1}{2} e^{-x/2}, \qquad x > 0

Let x[0] = \sigma X_0 and x[1] = \sigma X_1. Then, from Eq. (5),

\hat{\sigma}^2 = \frac{x^2[0] + x^2[1]}{2} = \frac{\sigma^2 (X_0^2 + X_1^2)}{2} = \frac{\sigma^2}{2} X

We know that, for two continuous random variables X and Y related as Y = aX + b,

f_Y(y) = \frac{1}{|a|} f_X\!\left(\frac{y - b}{a}\right)

Taking a = \sigma^2/2, b = 0, and \theta = \sigma^2, the PDF of \hat{\sigma}^2 is

f_{\hat{\sigma}^2}(y; \theta) = \frac{1}{a} f_X\!\left(\frac{y}{a}\right) = \frac{2}{\sigma^2} \cdot \frac{1}{2} e^{-y/\sigma^2} = \frac{1}{\sigma^2} e^{-y/\sigma^2}, \qquad y > 0

It is obvious that f_{\hat{\sigma}^2}(\sigma^2 + u; \theta) \neq f_{\hat{\sigma}^2}(\sigma^2 - u; \theta) for u > 0, so the PDF is not symmetric about \theta = \sigma^2. Note that although E(\hat{\sigma}^2) = \sigma^2, the PDF of \hat{\sigma}^2 is exponential and hence cannot be symmetric about its mean.
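A short simulation (again an illustrative addition with arbitrary values) confirms that the histogram of \hat{\sigma}^2 matches the derived exponential PDF:

import numpy as np

rng = np.random.default_rng(1)
sigma2, trials = 3.0, 500_000
x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, 2))
est = 0.5 * np.sum(x**2, axis=1)          # estimator of Eq. (5)

hist, edges = np.histogram(est, bins=50, range=(0.0, 15.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = np.exp(-centers / sigma2) / sigma2  # derived PDF (1/sigma^2) exp(-y/sigma^2)
print(np.max(np.abs(hist - pdf)))         # deviation should be small (~1e-2 or less)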

Chapter 3: CRLB

3.1 Formulas

Let a random variable X depend on some parameter t. We write the PDF of X as f_X(x; t); it represents a family of PDFs, one for each value of t. When the PDF is viewed as a function of t for a given, fixed value of x, it is termed the likelihood function. We define the log-likelihood function as

L(t) := L_X(t \mid x) := \ln f_X(x; t)    (6)

Note that t is a deterministic but unknown parameter. We simply write L(t) when the random variable X is known from context. For ease of notation, we define

\dot{L} = \frac{\partial}{\partial t} L(t) = \frac{\partial}{\partial t}\ln f_X(x; t) = \frac{1}{f_X(x; t)}\,\frac{\partial}{\partial t} f_X(x; t)    (7)

\ddot{L} = \frac{\partial^2}{\partial t^2} L(t) = \frac{\partial^2}{\partial t^2}\ln f_X(x; t)    (8)

Taking the expectation w.r.t. X, if the regularity condition

E(\dot{L}) = 0    (9)

is satisfied, then there exists a lower bound on the variance of an unbiased estimator \hat{t},

var(\hat{t}) \geq \frac{1}{-E(\ddot{L})}    (10)

Furthermore, equality holds, for all t, if and only if

\dot{L} = g(t)\,(h(x) - t), \qquad \text{in which case} \quad \hat{t} = h(x)    (11)

where g(\cdot) and h(\cdot) are some functions. Note that the above applies only to unbiased estimators, so E(\hat{t}) = t = E[h(x)]. The minimum variance is then given by

var(\hat{t}) = \frac{1}{g(t)} = \frac{1}{-E(\ddot{L})} \quad \Rightarrow \quad g(t) = -E(\ddot{L})    (12)

Note: \hat{t} is an estimate of t. Hence, \hat{t} cannot depend on t itself (if it did, such an estimate would be useless!). So the result \hat{t} = h(x) intuitively makes sense, because \hat{t} depends only on the observed, given data x and not at all on t. But the mean and variance of \hat{t} generally do depend on t, and that is OK! For the MVUE case, the mean E(\hat{t}) = t and the variance var(\hat{t}) = 1/g(t) are both purely functions of t alone.

Replacing the scalar random variable X by a vector of random variables x, the results still hold.
Facts

Identity: if the regularity condition is satisfied, then

E(\dot{L}^2) = -E(\ddot{L})

The Fisher information I(t) for the data x is defined by

I(t) = -E(\ddot{L})

So the minimum variance is the reciprocal of the Fisher information: the more information, the lower the CRLB.

For a deterministic signal s[n; t] with an unknown parameter t in zero-mean AWGN w[n] \sim \mathcal{N}(0, \sigma^2),

x[n] = s[n; t] + w[n], \qquad n = 0, 1, \ldots, N-1

the minimum variance (the CRLB, if it exists) is given by

var(\hat{t}) \geq \frac{\sigma^2}{\sum_{n=0}^{N-1}\left(\dfrac{\partial s[n; t]}{\partial t}\right)^2} = \frac{\sigma^2}{\left\|\dfrac{\partial s}{\partial t}\right\|^2}
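As an illustration of this formula (an added sketch; the sinusoidal signal s[n; f] = \cos(2\pi f n) and all numerical values are assumptions, not taken from the text), the bound can be evaluated numerically:

import numpy as np

sigma2, f, N = 0.1, 0.12, 100       # assumed noise variance, true frequency, record length
n = np.arange(N)
ds_df = -2 * np.pi * n * np.sin(2 * np.pi * f * n)   # derivative of s[n; f] = cos(2*pi*f*n) w.r.t. f
crlb = sigma2 / np.sum(ds_df**2)
print(crlb)                                          # lower bound on var(f_hat)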

For an estimate \hat{t} of t with known CRLB, any transformation \alpha = g(t), for some function g(\cdot), has the new CRLB

\mathrm{CRLB}_\alpha = \mathrm{CRLB}_t \left(\frac{\partial g(t)}{\partial t}\right)^2

The CRLB always increases as we estimate more parameters from the same given data.
Let \theta = [\theta_1, \theta_2, \ldots, \theta_M]^T be a vector parameter. Assume that an estimator \hat{\theta} = [\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_M]^T is unbiased, that is,

E(\hat{\theta}) = \theta, \qquad \text{i.e.,} \quad E(\hat{\theta}_i) = \theta_i

The M \times M Fisher information matrix I(\theta) is a matrix whose (i, j)-th element is given by

[I(\theta)]_{i,j} = -E\!\left[\frac{\partial^2 \ln p(x; \theta)}{\partial \theta_i \partial \theta_j}\right]

Note that p(x; \theta) is a scalar function depending on the data vector x and the vector parameter \theta. For example, if w[n] is i.i.d. \mathcal{N}(0, \sigma^2) and x[n] = \theta_1 + n\theta_2 + w[n], then

p(x; \theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n] - \theta_1 - n\theta_2\big)^2\right\}

Say x = [1, 2, 5, 3]^T (so N = 4 and n = 0, 1, 2, 3), \theta = [1, 2]^T, and \sigma = 2; then p(x; \theta) = 1.89 \times 10^{-4}.
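This value can be reproduced with a short computation (an added sketch using the numbers above):

import numpy as np

x = np.array([1.0, 2.0, 5.0, 3.0])
theta1, theta2, sigma = 1.0, 2.0, 2.0
n = np.arange(len(x))                   # n = 0, 1, 2, 3

resid = x - (theta1 + theta2 * n)
p = np.exp(-np.sum(resid**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2) ** (len(x) / 2)
print(p)                                # ~ 1.89e-4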
Note: The Fisher matrix is symmetric, because the partial derivatives do not depend on the order of evaluation. If the regularity condition

E\!\left[\frac{\partial}{\partial \theta}\ln p(x; \theta)\right] = 0

is satisfied (where the expectation is taken w.r.t. p(x; \theta)), then the covariance matrix of any unbiased estimator \hat{\theta} satisfies

C_{\hat{\theta}} - I^{-1}(\theta) \succeq 0 \quad \Rightarrow \quad var(\hat{\theta}_i) \geq [I^{-1}(\theta)]_{i,i}

Note: [I^{-1}(\theta)]_{i,i} means that you first calculate the whole matrix inverse and then take its (i, i)-th element. The covariance matrix of any random vector y is given by

\mu_y = E(y), \qquad C_y = E\!\left[(y - \mu_y)(y - \mu_y)^T\right]

Furthermore, an estimator attains the lower bound, C_{\hat{\theta}} = I^{-1}(\theta), if and only if

\frac{\partial}{\partial \theta}\ln p(x; \theta) = I(\theta)\,(g(x) - \theta)

for some M-dimensional function g and some M \times M matrix I. That estimator, which is the MVUE, is \hat{\theta} = g(x), and its covariance matrix is I^{-1}(\theta).
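A small numerical illustration (with an assumed Fisher matrix) of why the full inverse must be formed before taking the diagonal: for an invertible Fisher matrix, [I^{-1}(\theta)]_{i,i} \geq 1/[I(\theta)]_{i,i} in general, so ignoring the off-diagonal terms understates the bound.

import numpy as np

I = np.array([[10.0, 6.0],
              [ 6.0, 5.0]])            # assumed Fisher information matrix
crlb = np.diag(np.linalg.inv(I))       # correct: invert the whole matrix, then take the diagonal
naive = 1.0 / np.diag(I)               # too optimistic in general
print(crlb, naive)                     # crlb >= naive element-wise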

3.2 Problem 3.1

If x[n] for n = 0, 1, \ldots, N-1 are i.i.d. according to \mathcal{U}(0, \theta), show that the regularity condition does not hold. That is, show that

E\!\left[\frac{\partial}{\partial \theta}\ln p(x; \theta)\right] \neq 0 \qquad \text{for } \theta > 0

Solution

By definition of the expectation operator,

E\!\left[\frac{\partial}{\partial \theta}\ln p(x; \theta)\right] = \int \left[\frac{\partial}{\partial \theta}\ln p(x; \theta)\right] p(x; \theta)\, dx = \int \frac{\partial}{\partial \theta} p(x; \theta)\, dx    (13)

which follows from Eq. (7). Denote the N random variables as x_i = x[i-1] for i = 1, 2, \ldots, N. It is given in the problem that their PDFs are identical:

p(x_i; \theta) = \begin{cases} 1/\theta & 0 < x_i < \theta \\ 0 & \text{otherwise} \end{cases}

and

\int_0^\theta p(x_i; \theta)\, dx_i = 1

Because the x_i's are independent, p(x; \theta) = p(x_1; \theta) \cdots p(x_N; \theta), and the multiple integral in Eq. (13) runs over the support (0, \theta)^N, i.e., it factors into a product of single integrals over p(x_1; \theta), \ldots, p(x_N; \theta). Note that the limits of these integrals depend on \theta, so we cannot interchange the order of differentiation and integration:

\int_0^\theta \frac{\partial}{\partial \theta} p(x_i; \theta)\, dx_i \ \neq \ \frac{\partial}{\partial \theta}\int_0^\theta p(x_i; \theta)\, dx_i

Hence, the regularity condition fails to hold. In fact, the left-hand side equals \int_0^\theta (-1/\theta^2)\, dx_i = -1/\theta, while the right-hand side equals \partial(1)/\partial\theta = 0!

3.3 Problem 3.3

The data x[n] = A r^n + w[n] for n = 0, 1, \ldots, N-1 are observed, where w[n] is WGN with variance \sigma^2 and r > 0 is known. Find the CRLB for A. Show that an efficient estimator exists and find its variance.
Solution

Assuming that the x[n]'s are statistically independent, the joint PDF is

p(x; A) = \prod_{n=0}^{N-1} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left[-\frac{1}{2\sigma^2}\big(x[n] - A r^n\big)^2\right]
        = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n] - A r^n\big)^2\right]

\Rightarrow \ \ln p(x; A) = -\frac{N}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n] - A r^n\big)^2

\frac{\partial}{\partial A}\ln p(x; A) = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} r^n \big(x[n] - A r^n\big)

Since the sum

S = \sum_{n=0}^{N-1} r^{2n} = \begin{cases} \dfrac{r^{2N} - 1}{r^2 - 1} & r \neq 1 \\ N & r = 1 \end{cases}

is deterministic and known (because both r and N are known), the above equation simplifies to

\frac{\partial}{\partial A}\ln p(x; A) = \frac{1}{\sigma^2}\left(\sum_{n=0}^{N-1} r^n x[n] - A S\right)    (14)

\dot{L} = \frac{S}{\sigma^2}\left(\sum_{n=0}^{N-1} \frac{r^n}{S}\, x[n] - A\right)    (15)

        = g(A)\,(h(x) - A)    (16)

where g(A) = S/\sigma^2 is a constant (it does not even depend on A!) and

h(x) = \sum_{n=0}^{N-1} \frac{r^n}{S}\, x[n]

depends on x but not on A. Hence, from Theorem 3.1, the MVUE estimate \hat{A} is

\hat{A} = h(x) = \frac{1}{S}\sum_{n=0}^{N-1} r^n x[n]

and, since the equality condition holds, the variance of \hat{A} attains the bound:

var(\hat{A}) = \mathrm{CRLB} = \frac{1}{g(A)} = \frac{\sigma^2}{S}

We can also find the second derivative from Eq. (14):

\ddot{L} = \frac{\partial^2}{\partial A^2}\ln p(x; A) = \frac{S}{\sigma^2}(0 - 1) = -\frac{S}{\sigma^2}

and, as required, \mathrm{CRLB} = -1/E[\ddot{L}] = \sigma^2/S; in our case E[\ddot{L}] = \ddot{L} because it is constant (it does not depend on x or A).
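A Monte Carlo sketch (an illustrative addition; the values of A, r, \sigma^2 and N are arbitrary choices) confirming that \hat{A} is unbiased and attains the bound \sigma^2/S:

import numpy as np

rng = np.random.default_rng(2)
A, r, sigma2, N, trials = 1.5, 0.9, 0.5, 30, 100_000   # assumed values

n = np.arange(N)
S = np.sum(r ** (2 * n))                                # S = sum_n r^{2n}
x = A * r**n + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = (x @ (r**n)) / S                                # MVUE from Eq. (16)

print(np.mean(A_hat))                 # ~ A             (unbiased)
print(np.var(A_hat), sigma2 / S)      # ~ sigma^2 / S   (attains the CRLB)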

3.4 Problem 3.5

If x[n] = A + w[n] for n = 1, 2, \ldots, N are observed, where w = [w[1], w[2], \ldots, w[N]]^T \sim \mathcal{N}(0, \mathbf{C}), find the CRLB for A. Does an efficient estimator exist and, if so, what is its variance?
Solution

The joint PDF of x is given by

p(x; A) = \frac{1}{\sqrt{\det(2\pi\mathbf{C})}} \exp\!\left[-\frac{1}{2}(x - A\mathbf{e})^T \mathbf{C}^{-1} (x - A\mathbf{e})\right]

\Rightarrow \ \ln p(x; A) = -\ln\sqrt{\det(2\pi\mathbf{C})} - \frac{1}{2}(x - A\mathbf{e})^T \mathbf{C}^{-1}(x - A\mathbf{e})

\Rightarrow \ \frac{\partial}{\partial A}\ln p(x; A) = -\frac{1}{2}\,\frac{\partial}{\partial A}\left[(x - A\mathbf{e})^T \mathbf{C}^{-1}(x - A\mathbf{e})\right]

Using the result that, for symmetric \mathbf{Q},

\frac{\partial}{\partial A}\left(\mathbf{m}^T \mathbf{Q}\, \mathbf{m}\right) = 2\left(\frac{\partial \mathbf{m}^T}{\partial A}\right)\mathbf{Q}\,\mathbf{m}

and setting \mathbf{Q} = \mathbf{C}^{-1} and \mathbf{m} = x - A\mathbf{e},

\frac{\partial \mathbf{m}^T}{\partial A} = \frac{\partial}{\partial A}(x - A\mathbf{e})^T = (0 - \mathbf{e}^T) = -\mathbf{e}^T

So

\frac{\partial}{\partial A}\ln p(x; A) = \mathbf{e}^T\mathbf{C}^{-1}(x - A\mathbf{e}) = \mathbf{e}^T\mathbf{C}^{-1}x - A\,\mathbf{e}^T\mathbf{C}^{-1}\mathbf{e}

The scalar \mathbf{e}^T\mathbf{Q}\mathbf{e} is nothing but the sum of all the elements of \mathbf{Q}, for any \mathbf{Q}. Consider, for example,

[1 \ 1 \ 1]\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}    (17)
= [a + d + g, \ b + e + h, \ c + f + i]\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}    (18)
= a + d + g + b + e + h + c + f + i    (19)

So, denoting \alpha = \mathbf{e}^T\mathbf{C}^{-1}\mathbf{e},

\frac{\partial}{\partial A}\ln p(x; A) = \mathbf{e}^T\mathbf{C}^{-1}x - A\alpha = \alpha\left(\frac{\mathbf{e}^T\mathbf{C}^{-1}x}{\alpha} - A\right)

The above expression is clearly of the form

\frac{\partial}{\partial A}\ln p(x; A) = g(A)\,(h(x) - A)

Hence, there exists an MVUE (the efficient estimator), given by

\hat{A}_{\mathrm{MVUE}} = h(x) = \frac{\mathbf{e}^T\mathbf{C}^{-1}x}{\alpha} = \frac{\mathbf{e}^T\mathbf{C}^{-1}x}{\mathbf{e}^T\mathbf{C}^{-1}\mathbf{e}}

and its variance is

var(\hat{A}) = \frac{1}{\alpha} = \frac{1}{\sum_{i=1}^{N}\sum_{j=1}^{N}(\mathbf{C}^{-1})_{i,j}}
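A small numerical sketch (with an assumed covariance matrix and assumed observations) showing how \hat{A} and its variance can be evaluated; np.linalg.solve is used instead of forming \mathbf{C}^{-1} explicitly:

import numpy as np

C = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])        # assumed covariance matrix
x = np.array([2.1, 1.7, 2.4])          # assumed observations of A + w[n]
e = np.ones(3)

w = np.linalg.solve(C, e)              # w = C^{-1} e
A_hat = (w @ x) / (w @ e)              # e^T C^{-1} x / (e^T C^{-1} e)
var_A = 1.0 / (w @ e)                  # variance = 1 / (e^T C^{-1} e)
print(A_hat, var_A)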

3.5 Problem 3.9

We observe two samples of a DC level in correlated Gaussian noise:

x[0] = A + w[0]
x[1] = A + w[1]

where w = [w[0], w[1]]^T is zero mean with covariance matrix

\mathbf{C} = \sigma^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}

The parameter \rho is the cross-correlation coefficient between w[0] and w[1]. Compute the CRLB for A and compare it to the case when \rho = 0 (WGN). Also explain what happens when \rho \to 1.

Solution

This is a special case of Problem 3.5 (see above) for N = 2. Since

\mathbf{C}^{-1} = \frac{1}{\sigma^2(1 - \rho^2)}\begin{bmatrix} 1 & -\rho \\ -\rho & 1 \end{bmatrix}

the CRLB is

var(\hat{A}) \geq \frac{1}{\mathbf{e}^T\mathbf{C}^{-1}\mathbf{e}} = \frac{\sigma^2(1 - \rho^2)}{2(1 - \rho)} = \frac{\sigma^2(1 + \rho)}{2}

When \rho = 0, var(\hat{A}) \geq \sigma^2/2, as expected. But when \rho \to 1, the matrix \mathbf{C} becomes singular, hence its inverse does not exist; the samples w[0] and w[1] are then almost perfectly correlated, so the second sample carries no additional information and the bound approaches \sigma^2, the single-sample value.

3.6 Problem 3.13

Consider polynomial curve fitting,

x[n] = \sum_{k=0}^{p-1} A_k n^k + w[n]

for n = 0, 1, \ldots, N-1, where w[n] is i.i.d. WGN with variance \sigma^2. It is desired to estimate \{A_0, A_1, \ldots, A_{p-1}\}. Find the Fisher information matrix for this problem.

Solution:

The joint PDF is

p(x; \mathbf{A}) = \prod_{n=0}^{N-1} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{-\frac{1}{2\sigma^2}\left[x[n] - \sum_{k=0}^{p-1} A_k n^k\right]^2\right\}
               = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left[x[n] - \sum_{k=0}^{p-1} A_k n^k\right]^2\right\}

\Rightarrow \ \ln p(x; \mathbf{A}) = \ln\frac{1}{(2\pi\sigma^2)^{N/2}} - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\left[x[n] - \sum_{k=0}^{p-1} A_k n^k\right]^2

\frac{\partial}{\partial A_i}\ln p(x; \mathbf{A}) = 0 - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1} 2\left\{x[n] - \sum_{k=0}^{p-1} A_k n^k\right\}\big(0 - n^i\big)

because

\frac{\partial}{\partial A_i}\sum_{k=0}^{p-1} A_k n^k = \frac{\partial}{\partial A_i}\left(A_0 n^0 + A_1 n^1 + \cdots + A_i n^i + \cdots + A_{p-1} n^{p-1}\right) = 0 + \cdots + n^i + \cdots + 0 = n^i

Hence the simplification:

\frac{\partial}{\partial A_i}\ln p(x; \mathbf{A}) = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} n^i\left\{x[n] - \sum_{k=0}^{p-1} A_k n^k\right\}

\frac{\partial^2}{\partial A_j \partial A_i}\ln p(x; \mathbf{A}) = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} n^i\big(0 - n^j\big) = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} n^{i+j}

Hence, by definition, the (i, j)-th entry of the p \times p Fisher information matrix I(\mathbf{A}) is given by

[I(\mathbf{A})]_{i,j} = -E\!\left[\frac{\partial^2}{\partial A_i \partial A_j}\ln p(x; \mathbf{A})\right] = \frac{1}{\sigma^2}\sum_{n=0}^{N-1} n^{i+j}

for i, j = 0, 1, \ldots, p-1. Note that the Fisher information matrix is symmetric, so the order of evaluation of the partial derivatives can be interchanged. See pg. 42, Eq. (3.22) in the textbook for a special case of the above for p = 2. Note that for the (0, 0)-th entry of the matrix, the above expression gives

\sum_{n=0}^{N-1} n^{0+0} = 0^0 + 1^0 + \cdots + (N-1)^0 = N

where 0^0 must be taken as 1 (even though some authors disagree).
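The matrix can be formed directly from this expression; the following sketch (with arbitrary values of \sigma^2, N and p) also reports the per-coefficient CRLBs [I^{-1}(\mathbf{A})]_{i,i}. Note that numpy evaluates 0**0 as 1 for integer arrays, consistent with the convention above.

import numpy as np

sigma2, N, p = 0.25, 20, 3             # assumed values
n = np.arange(N)
I = np.array([[np.sum(n ** (i + j)) for j in range(p)] for i in range(p)]) / sigma2
crlb = np.diag(np.linalg.inv(I))       # var(A_hat_i) >= [I^{-1}(A)]_{i,i}
print(I)
print(crlb)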

Chapter 5

Neyman-Fisher Factorization Theorem. If we can factor the PDF p(x; \theta) as

p(x; \theta) = g(T(x), \theta)\, h(x)

where g is a function depending on the data x only through T(x) (and also on \theta) and h is a function depending only on x, then T(x) is a sufficient statistic for \theta. The converse is also true.

4.1 Problem 5.2

The IID observations x_n for n = 1, 2, \ldots, N have the PDF

p(x_n; \sigma^2) = \begin{cases} \dfrac{x_n}{\sigma^2}\exp\!\left(-\dfrac{x_n^2}{2\sigma^2}\right) & x_n > 0 \\ 0 & \text{otherwise} \end{cases}

Find a sufficient statistic for \sigma^2.
Solution

Let u(t) be the unit step function. The joint PDF of x_1, x_2, \ldots, x_N is given by (because they are independent)

p(x; \sigma^2) = \prod_{n=1}^{N} p(x_n; \sigma^2)
             = \prod_{n=1}^{N} \frac{x_n}{\sigma^2}\exp\!\left(-\frac{x_n^2}{2\sigma^2}\right) u(x_n)
             = \left[\frac{1}{\sigma^{2N}}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=1}^{N} x_n^2\right)\right]\left[\prod_{n=1}^{N} x_n u(x_n)\right]
             = g(T(x), \sigma^2)\, h(x)

whence the sufficient statistic for \sigma^2 is

T(x) = \sum_{n=1}^{N} x_n^2

4.2 Problem 5.5

The IID observations x_n for n = 1, 2, \ldots, N are distributed according to \mathcal{U}[-\theta, \theta], where \theta > 0. Find a sufficient statistic for \theta.
Solution

The individual sample PDF is given by

p(x_n; \theta) = \begin{cases} 1/(2\theta) & -\theta < x_n < \theta \\ 0 & \text{otherwise} \end{cases}

The joint PDF is given by

p(x; \theta) = \prod_{n=1}^{N} p(x_n; \theta) = \begin{cases} 1/(2\theta)^N & -\theta < x_n < \theta, \ \forall n \\ 0 & \text{otherwise} \end{cases}

Define a function bool(S) for any mathematical statement S such that bool(S) = 1 if S is true and bool(S) = 0 if S is false. (This is also called the indicator function; see Wikipedia.) Then

p(x; \theta) = \frac{1}{(2\theta)^N}\,\mathrm{bool}(-\theta < x_n < \theta \ \ \forall n)

But

x_n < \theta \ \forall n \iff (\theta > x_1) \text{ and } (\theta > x_2) \text{ and } \cdots \text{ and } (\theta > x_N) \iff \theta > \max\{x_1, x_2, \ldots, x_N\}

Similarly,

-\theta < x_n \ \forall n \iff \theta > -x_n \ \forall n \iff \theta > \max\{-x_1, -x_2, \ldots, -x_N\}

Combining both of the above,

-\theta < x_n < \theta \ \forall n \iff \theta > \max\{|x_1|, |x_2|, \ldots, |x_N|\}

So the joint PDF becomes

p(x; \theta) = \frac{1}{(2\theta)^N}\,\mathrm{bool}\big(\max\{|x_1|, |x_2|, \ldots, |x_N|\} < \theta\big) = g(T(x), \theta)\, h(x)

where h(x) = 1 and

T(x) = \max\{|x_1|, |x_2|, \ldots, |x_N|\}, \qquad g(T(x), \theta) = \frac{1}{(2\theta)^N}\,\mathrm{bool}(T(x) < \theta)

Hence, by the Neyman-Fisher factorization theorem, T(x), as given above, is the sufficient statistic. Note: the sample mean is not a sufficient statistic for the uniform distribution!
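A small check (with assumed numbers) of what the factorization implies: since h(x) = 1 here, two data sets with the same T(x) = \max|x_n| have identical likelihoods for every \theta.

import numpy as np

def likelihood(x, theta):
    # joint PDF of U[-theta, theta] samples: (1/(2 theta))^N * bool(max|x_n| < theta)
    return (np.max(np.abs(x)) < theta) / (2.0 * theta) ** len(x)

x1 = np.array([0.3, -1.2, 0.7])        # T(x1) = 1.2
x2 = np.array([-0.1, 1.2, -0.5])       # T(x2) = 1.2 as well
for theta in (1.0, 1.5, 2.0, 3.0):
    print(likelihood(x1, theta), likelihood(x2, theta))   # identical for every theta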

Chapter 7: MLE

The MLE for a scalar parameter is defined as the value of the parameter t that maximizes p(x; t) for a given, fixed x, i.e., the value that maximizes the likelihood function. The maximization is performed over the allowable range of t.

To find the MLE, solve the equation

\frac{\partial}{\partial t}\ln p(x; t) = 0

for t. This equation may have multiple solutions, and you should choose the appropriate one.
Theorem. If an efficient estimator (an estimator which attains the CRLB) exists, then the MLE procedure will find it.

The MLE is:
asymptotically unbiased, i.e., E(\hat{t}) \to t as N \to \infty;
asymptotically efficient, i.e., var(\hat{t}) \to \mathrm{CRLB} as N \to \infty;
asymptotically optimal, i.e., both of the above are true.
Theorem. If the PDF p(x; t) is twice differentiable and the Fisher information I(t) is nonzero, then the MLE of the unknown parameter t is asymptotically distributed (for large N) according to

\hat{t} \sim \mathcal{N}\big(t, I^{-1}(t)\big)

i.e., Gaussian distributed with mean equal to the true value t and variance equal to the CRLB (the inverse of the Fisher information).

Theorem. Assume that the MLE \hat{t} of the unknown parameter t is known. Consider a transformation of t,

\alpha = f(t)

for any function f(\cdot). Then the MLE of \alpha is nothing but

\hat{\alpha} = f(\hat{t})
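As a closing illustration (an added sketch; the DC-level model and all numerical values are assumptions, not a problem from the text), numerically maximizing the log-likelihood of x[n] = A + w[n] with known noise variance over a grid recovers the sample mean, the well-known MLE of A for this model, and the invariance property then gives the MLE of A^2 as its square.

import numpy as np

rng = np.random.default_rng(3)
A_true, sigma2, N = 2.0, 1.0, 200      # assumed values
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

grid = np.linspace(0.0, 4.0, 4001)     # assumed allowable range of A
loglike = np.array([-np.sum((x - a)**2) / (2 * sigma2) for a in grid])   # up to a constant
A_mle = grid[np.argmax(loglike)]

print(A_mle, np.mean(x))               # grid maximizer agrees with the sample mean
print(A_mle**2)                        # by the invariance property, the MLE of A^2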

