
Proof for question 11:

For $x \in S_p$, let $h = h(x) = -p(x)\log p(x)$ and $p = p(x)$.

Then,
$$\frac{dh(x)}{dp(x)} = -\log p - 1, \qquad \frac{d^2 h(x)}{dp(x)^2} = -\frac{1}{p} < 0.$$

So, for $x \in S_p$, $h(x) = -p(x)\log p(x)$ is a concave function of $p$.


$H(p) = -\sum_x p(x) \log p(x)$ is the sum of concave functions of the $p(x)$, so $H(p)$ is a concave function of $p$.

Namely, for $0 \le \lambda \le 1$ and $\bar{\lambda} = 1 - \lambda$,
$$\lambda H(p_1) + \bar{\lambda} H(p_2) \le H(\lambda p_1 + \bar{\lambda} p_2).$$
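As a quick numerical sanity check of this concavity (an illustration only, not part of the proof; the helper entropy() and the NumPy-based sampling below are my own choices), one can verify $H(\lambda p_1 + \bar{\lambda} p_2) \ge \lambda H(p_1) + \bar{\lambda} H(p_2)$ on randomly sampled distributions:

# Sanity check: H(lam*p1 + (1-lam)*p2) >= lam*H(p1) + (1-lam)*H(p2)
# for random distributions p1, p2 and random 0 <= lam <= 1.
import numpy as np

def entropy(p):
    """Shannon entropy in bits; terms with p(x) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

rng = np.random.default_rng(0)
for _ in range(1000):
    p1 = rng.dirichlet(np.ones(4))      # random distribution on 4 symbols
    p2 = rng.dirichlet(np.ones(4))
    lam = rng.uniform()
    mix = lam * p1 + (1 - lam) * p2
    assert entropy(mix) >= lam * entropy(p1) + (1 - lam) * entropy(p2) - 1e-12
print("concavity of H(p) held in all sampled cases")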

Proof for question 12:


1. Proof for fixed p(x)
Since $p(x, y) = p(x)p(y|x)$, $p(x)$ and $p(y|x)$ specify the joint distribution $p(x, y)$. Because $p(x)$ is fixed, we consider the mutual information as a function of $p(y|x)$, denoted $I(p(y|x))$.
Let $Z$ be a Bernoulli random variable with distribution
$$Z = \begin{cases} 1 & \text{with probability } \lambda \\ 0 & \text{with probability } 1 - \lambda \end{cases}$$
If $Z = 1$, we select $Y$ by $p_1 = p_1(y|x)$; otherwise, we select $Y$ by $p_2 = p_2(y|x)$.
Then
$$p(x, y) = p(x)[\lambda p_1(y|x) + (1 - \lambda) p_2(y|x)]$$
and
$$I = I(X; Y) = I(\lambda p_1 + (1 - \lambda) p_2).$$
$$\begin{aligned}
I(X; Z, Y) &= \underbrace{I(X; Z)}_{X \text{ and } Z \text{ are independent}} + I(X; Y|Z) \\
&= 0 + I(X; Y|Z) \\
&= \lambda I(X; Y|Z = 1) + (1 - \lambda) I(X; Y|Z = 0) \\
&= \lambda I(p_1) + (1 - \lambda) I(p_2).
\end{aligned}$$

At the same time,
$$I(X; Z, Y) = I(X; Y) + I(X; Z|Y) \ge I(X; Y) = I(\lambda p_1 + (1 - \lambda) p_2).$$
Therefore
$$\lambda I(p_1) + (1 - \lambda) I(p_2) \ge I(\lambda p_1 + (1 - \lambda) p_2).$$
Namely, $I(X; Y)$ is a convex functional of $p(y|x)$ for fixed $p(x)$.
Note: this proof follows the lecture notes of Mark Braverman.
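A numerical sketch of this convexity (again an illustration, not from the cited notes; mutual_information() is a hypothetical helper of mine, and NumPy is assumed): for a fixed input distribution $p(x)$ and random channel pairs, the mutual information of the mixed channel never exceeds the mixture of the two mutual informations.

# Sanity check: I(lam*p1 + (1-lam)*p2) <= lam*I(p1) + (1-lam)*I(p2)
# for fixed p(x) and random channels p1(y|x), p2(y|x).
import numpy as np

def mutual_information(px, pygx):
    """I(X;Y) in bits for input distribution px and channel matrix pygx[x, y]."""
    pxy = px[:, None] * pygx                      # joint p(x, y)
    py = pxy.sum(axis=0)                          # marginal p(y)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

rng = np.random.default_rng(1)
px = rng.dirichlet(np.ones(3))                    # fixed input distribution
for _ in range(1000):
    p1 = rng.dirichlet(np.ones(4), size=3)        # random channel: rows are p(y|x)
    p2 = rng.dirichlet(np.ones(4), size=3)
    lam = rng.uniform()
    lhs = mutual_information(px, lam * p1 + (1 - lam) * p2)
    rhs = lam * mutual_information(px, p1) + (1 - lam) * mutual_information(px, p2)
    assert lhs <= rhs + 1e-12
print("convexity of I in p(y|x) held in all sampled cases")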

2. Proof for fixed p(y|x)


$$I(X; Y) = H(Y) - H(Y|X)$$
Since $p(y) = \sum_x p(x)p(y|x)$, for fixed $p(y|x)$, $p(y)$ is a linear combination of the $p(x)$. From question 11, we know $H(Y)$ is a concave function of $p(y)$. Therefore $H(Y)$ is a concave function of $p(x)$.
$$H(Y|X) = \sum_x p(x) H(Y|X = x) = \sum_x p(x)\Big[-\sum_y p(y|x)\log p(y|x)\Big]$$
For fixed $p(y|x)$, $H(Y|X)$ is a linear function of $p(x)$, which can also be regarded as a concave function of $p(x)$.
$I(X; Y) = H(Y) - H(Y|X)$ is then the sum of a concave function of $p(x)$ and a linear (hence also concave) function of $p(x)$, and accordingly it is a concave functional of $p(x)$ for fixed $p(y|x)$.
Note: this proof follows Evan Chou's lecture notes on Information Theory.
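A companion sketch for this direction (same caveats as above: NumPy is assumed and the helper is mine): with the channel $p(y|x)$ held fixed, mixing two input distributions never drives the mutual information below the mixture of the endpoint values.

# Sanity check: I(lam*px1 + (1-lam)*px2) >= lam*I(px1) + (1-lam)*I(px2)
# for a fixed channel p(y|x) and random input distributions.
import numpy as np

def mutual_information(px, pygx):
    """I(X;Y) in bits for input distribution px and channel matrix pygx[x, y]."""
    pxy = px[:, None] * pygx
    py = pxy.sum(axis=0)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

rng = np.random.default_rng(2)
pygx = rng.dirichlet(np.ones(4), size=3)          # fixed channel: rows are p(y|x)
for _ in range(1000):
    px1 = rng.dirichlet(np.ones(3))
    px2 = rng.dirichlet(np.ones(3))
    lam = rng.uniform()
    lhs = mutual_information(lam * px1 + (1 - lam) * px2, pygx)
    rhs = lam * mutual_information(px1, pygx) + (1 - lam) * mutual_information(px2, pygx)
    assert lhs >= rhs - 1e-12
print("concavity of I in p(x) held in all sampled cases")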

Proof for question 15:
$$\begin{aligned}
H(Y) &= H(X, Y) - \underbrace{H(X|Y)}_{=0,\ \text{since } X \text{ is a function of } Y} \\
&= H(X, Y) \\
&= H(X) + H(Y|X) \\
&\ge H(X).
\end{aligned}$$
This completes the proof.
Interpretation:
Since $X$ is a function of $Y$, let $x = g(y)$. Then, for $x \in S_p$,
$$p(x) = \sum_{y:\, x = g(y)} p(y).$$
The distribution of $X$ is less spread out than that of $Y$: applying $g$ pools the probability mass of all $y$ that map to the same $x$. In this situation, there is less uncertainty about $X$ than about $Y$.
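A small concrete instance of this pooling (illustration only; the choice $g(y) = y \bmod 2$ is arbitrary): for $Y$ uniform on $\{0, 1, 2, 3\}$ and $X = g(Y)$, we get $H(X) = 1$ bit while $H(Y) = 2$ bits.

# Example: Y uniform on {0,1,2,3}, X = g(Y) = Y mod 2.
# Pooling mass over the preimages of g reduces entropy: H(X) <= H(Y).
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

py = np.array([0.25, 0.25, 0.25, 0.25])                 # distribution of Y
g = np.array([0, 1, 0, 1])                              # x = g(y) = y mod 2
px = np.array([py[g == x].sum() for x in range(2)])     # p(x) = sum_{y: g(y)=x} p(y)
print(entropy(py), entropy(px))                         # 2.0 bits vs 1.0 bit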

Proof for question 16:


We prove this by mathematical induction.
When $n = 2$,
$$H(X_1, X_2) = H(X_1) + H(X_2|X_1) \ge H(X_1|X_2) + H(X_2|X_1) = \sum_{i=1}^{2} H(X_i | X_j, j \ne i).$$
So, for $n = 2$, the inequality is satisfied.


Suppose for $n = k - 1 \ge 2$, $k \in \mathbb{N}^+$, the inequality is satisfied. Namely,
$$H(X_1, \dots, X_{k-1}) \ge \sum_{i=1}^{k-1} H(X_i \mid X_j,\ j \ne i,\ j \le k-1).$$
Then, in the case of $n = k$, we have
$$\begin{aligned}
H(X_1, \dots, X_{k-1}, X_k) &= \sum_{i=1}^{k-1} H(X_i \mid X_1, \dots, X_{i-1}) + H(X_k \mid X_1, \dots, X_{k-1}) \\
&= H(X_1, \dots, X_{k-1}) + H(X_k \mid X_1, \dots, X_{k-1}) \\
&\ge \sum_{i=1}^{k-1} H(X_i \mid X_j,\ j \ne i,\ j \le k-1) + H(X_k \mid X_1, \dots, X_{k-1}) \\
&\ge \sum_{i=1}^{k} H(X_i \mid X_j,\ j \ne i),
\end{aligned}$$
where the last step uses the fact that conditioning reduces entropy, $H(X_i \mid X_j,\ j \ne i,\ j \le k-1) \ge H(X_i \mid X_j,\ j \ne i)$, together with $H(X_k \mid X_1, \dots, X_{k-1}) = H(X_k \mid X_j,\ j \ne k)$.

So, for the case of $n = k$, the inequality is satisfied.
For $n \ge 2$, $H(X_1, X_2, \dots, X_n) \ge \sum_{i=1}^{n} H(X_i \mid X_j, j \ne i)$.
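A randomized sanity check of this result for $n = 3$ binary variables (illustration only, assuming NumPy; it uses the identity $H(X_i \mid X_j, j \ne i) = H(X_1, \dots, X_n) - H(X_j, j \ne i)$):

# Sanity check: H(X1,...,Xn) >= sum_i H(Xi | Xj, j != i)
# on random joint distributions of three binary variables.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

rng = np.random.default_rng(3)
n, k = 3, 2                                             # three binary variables
for _ in range(1000):
    joint = rng.dirichlet(np.ones(k ** n)).reshape((k,) * n)
    h_all = entropy(joint)
    # H(Xi | rest) = H(all) - H(all variables except Xi);
    # summing over axis i marginalizes Xi out of the joint.
    rhs = sum(h_all - entropy(joint.sum(axis=i)) for i in range(n))
    assert h_all >= rhs - 1e-12
print("H(X1..Xn) >= sum_i H(Xi | rest) held in all sampled cases")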

Proof for question 17:


$$H(X_1, X_2, X_3) = H(X_1, X_2) + H(X_3|X_1, X_2) \tag{1}$$

Similarly, we have
$$H(X_1, X_2, X_3) = H(X_1, X_3, X_2) = H(X_1, X_3) + H(X_2|X_1, X_3) \tag{2}$$
$$H(X_1, X_2, X_3) = H(X_2, X_3, X_1) = H(X_2, X_3) + H(X_1|X_2, X_3) \tag{3}$$

Summing (1), (2), and (3), we have
$$3H(X_1, X_2, X_3) = H(X_1, X_2) + H(X_1, X_3) + H(X_2, X_3) + H(X_3|X_1, X_2) + H(X_2|X_1, X_3) + H(X_1|X_2, X_3). \tag{4}$$
From problem 16, we know that
$$H(X_3|X_1, X_2) + H(X_2|X_1, X_3) + H(X_1|X_2, X_3) = \sum_{i=1}^{3} H(X_i \mid X_j, j \ne i) \le H(X_1, X_2, X_3). \tag{5}$$

Combining (4) and (5) (namely, plugging the result of (5) into (4)), we have
$$H(X_1, X_2) + H(X_2, X_3) + H(X_1, X_3) \ge 2H(X_1, X_2, X_3).$$
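A randomized check of this final inequality on three binary variables (illustration only, assuming NumPy; summing the joint over one axis yields the pairwise marginal of the other two):

# Sanity check: H(X1,X2) + H(X2,X3) + H(X1,X3) >= 2*H(X1,X2,X3)
# on random joint distributions of three binary variables.
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

rng = np.random.default_rng(4)
for _ in range(1000):
    joint = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)  # three binary variables
    pairwise = sum(entropy(joint.sum(axis=i)) for i in range(3))
    assert pairwise >= 2 * entropy(joint) - 1e-12
print("H(X1,X2)+H(X2,X3)+H(X1,X3) >= 2*H(X1,X2,X3) held in all sampled cases")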
