
Harvard SEAS ES250 Information Theory

Homework 1
Solution

1. Let p(x, y) be given by

X\Y    0      1
 0    1/3    1/3
 1     0     1/3

Evaluate the following expressions:


(a) H(X), H(Y )
(b) H(X|Y ), H(Y |X)
(c) H(X, Y )
(d) H(Y) − H(Y|X)
(e) I(X; Y )
(f) Draw a Venn diagram for the quantities in (a) through (e)

Solution :

(a) H(X) = (2/3) log(3/2) + (1/3) log 3 = 0.918 bits = H(Y)

(b) H(X|Y) = (1/3) H(X|Y = 0) + (2/3) H(X|Y = 1) = 0.667 bits = H(Y|X)
(c) H(X, Y) = 3 · (1/3) log 3 = 1.585 bits
(d) H(Y) − H(Y|X) = 0.251 bits
(e) I(X; Y) = H(Y) − H(Y|X) = 0.251 bits
(f) [Venn diagram: the circles for H(X) and H(Y) overlap in I(X; Y); the non-overlapping parts are H(X|Y) and H(Y|X), and the union of the two circles is H(X, Y).]
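
As a quick numerical check of (a)-(e), the sketch below recomputes these quantities from the joint table; the dictionary encoding and the helper H are illustrative choices, not part of the original solution.

from math import log2

# Joint distribution p(x, y) from the table above.
p = {(0, 0): 1/3, (0, 1): 1/3, (1, 0): 0.0, (1, 1): 1/3}

def H(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(q * log2(q) for q in dist.values() if q > 0)

# Marginals of X and Y.
px = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

HX, HY, HXY = H(px), H(py), H(p)
print(f"H(X)   = {HX:.3f} bits")            # 0.918
print(f"H(X|Y) = {HXY - HY:.3f} bits")      # 0.667
print(f"H(X,Y) = {HXY:.3f} bits")           # 1.585
print(f"I(X;Y) = {HX + HY - HXY:.3f} bits") # 0.251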

2. Entropy of functions of a random variable

(a) Let X be a discrete random variable. Show that the entropy of a function of X is less than
or equal to the entropy of X by justifying the following steps:
        H(X, g(X)) = H(X) + H(g(X)|X)       (a)
                   = H(X)                   (b)
        H(X, g(X)) = H(g(X)) + H(X|g(X))    (c)
                   ≥ H(g(X))                (d)

Thus H(g(X)) ≤ H(X).

(b) Let Y = X^7, where X is a random variable taking on positive and negative integer values.
What is the relationship of H(X) and H(Y)? What if Y = cos(X/3)?

Solution :
(a)
STEP (a) : H(X, g(X)) = H(X) + H(g(X)|X) by the chain rule for entropies.


STEP (b) : H(g(X)|X) = 0, since for any particular value of X, g(X) is fixed; hence
H(g(X)|X) = Σ_x p(x) H(g(X)|X = x) = Σ_x p(x) · 0 = 0.
STEP (c) : H(X, g(X)) = H(g(X)) + H(X|g(X)) again by the chain rule.
STEP (d) : H(X|g(X)) ≥ 0, with equality iff X is a function of g(X), i.e., g(·) is one-to-one. Hence
H(X, g(X)) ≥ H(g(X)).

Combining STEP (b) and (d), we obtain H(X) ≥ H(g(X)).

(b) By part (a), we know that passing a random variable through a function can only reduce the
entropy or leave it unchanged, but never increase it. That is, H(g(X)) ≤ H(X) for any function g.
The reason for this is simply that if the function g is not one-to-one, then it will merge some states,
reducing entropy.
The trick, then, for this problem, is simply to determine whether or not the mappings are one-to-one.
If so, then entropy is unchanged. If the mappings are not one-to-one, then the entropy is necessarily
decreased. Note that whether the function is one-to-one or not is only meaningful for the support of
X, i.e. for all x with p(x) > 0.

Y = X^7 is one-to-one, and hence the entropy, which is just a function of the probabilities, does
not change, i.e., H(X) = H(Y).

Y = cos(X/3) is not one-to-one, unless the support of X is rather small, since this function maps
the entire set of integers into just three different values!
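
A minimal numeric sketch of part (b), assuming X uniform on {−3, ..., 3}; the helpers H and push_forward are illustrative, and x % 3 is used as a stand-in for a generic many-to-one map (tabulating cos(X/3) would require floating-point outcomes as keys).

from collections import defaultdict
from math import log2

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def push_forward(dist, g):
    """Distribution of g(X) when X has distribution `dist`."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[g(x)] += p
    return dict(out)

pX = {x: 1/7 for x in range(-3, 4)}            # X uniform on -3, ..., 3

print(H(pX))                                   # 2.807 bits
print(H(push_forward(pX, lambda x: x**7)))     # 2.807 bits: one-to-one, entropy unchanged
print(H(push_forward(pX, lambda x: x % 3)))    # about 1.557 bits: many-to-one, entropy drops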

3. One wishes to identify a random object X ∼ p(x). A question Q ∼ r(q) is asked at random accord-
ing to r(q). This results in a deterministic answer A = A(x, q) ∈ {a1, a2, . . .}. Suppose the object
X and the question Q are independent. Then I(X; Q, A) is the uncertainty in X removed by the
question-answer (Q, A).

(a) Show I(X; Q, A) = H(A|Q). Interpret.

(b) Now suppose that two i.i.d. questions Q1, Q2 ∼ r(q) are asked, eliciting answers A1 and
A2. Show that two questions are less valuable than twice a single question in the sense that
I(X; Q1, A1, Q2, A2) ≤ 2 I(X; Q1, A1).

Solution :
(a)

I(X; Q, A) = H(Q, A) − H(Q, A|X)
           = H(Q) + H(A|Q) − H(Q|X) − H(A|Q, X)
           = H(Q) + H(A|Q) − H(Q)
           = H(A|Q)

where the third equality uses H(Q|X) = H(Q) (X and Q are independent) and H(A|Q, X) = 0 (A is a
deterministic function of (X, Q)).

The interpretation is as follows. The uncertainty removed in X by (Q, A) is the same as the uncer-
tainty in the answer given the question.

(b) Using the result from part (a) and the fact that questions are independent, we can easily obtain
the desired relationship.


I(X; Q1, A1, Q2, A2)
    = I(X; Q1) + I(X; A1|Q1) + I(X; Q2|A1, Q1) + I(X; A2|A1, Q1, Q2)         (a)
    = I(X; A1|Q1) + H(Q2|A1, Q1) − H(Q2|X, A1, Q1) + I(X; A2|A1, Q1, Q2)     (b)
    = I(X; A1|Q1) + I(X; A2|A1, Q1, Q2)                                      (c)
    = I(X; A1|Q1) + H(A2|A1, Q1, Q2) − H(A2|X, A1, Q1, Q2)
    = I(X; A1|Q1) + H(A2|A1, Q1, Q2)                                         (d)
    ≤ I(X; A1|Q1) + H(A2|Q2)                                                 (e)
    = 2 I(X; A1|Q1)                                                          (f)

(a) Chain rule.


(b) X and Q1 are independent.
(c) Q2 is independent of X, Q1 and A1.
(d) A2 is completely determined given Q2 and X.
(e) Conditioning decreases entropy.
(f) Result from part (a): since (Q2, A2, X) has the same joint distribution as (Q1, A1, X), H(A2|Q2) = H(A1|Q1) = I(X; A1|Q1).
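
The identity I(X; Q, A) = H(A|Q) from part (a) can be checked numerically on a toy instance. The setup below (X uniform on {0, 1, 2, 3} and two equally likely questions, each asking for one bit of X) and the helper functions are illustrative assumptions, not part of the original problem.

from collections import defaultdict
from math import log2

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal of the coordinates in `idx` from a joint over tuples."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return dict(out)

# X uniform on {0,1,2,3}; question q (chosen independently of X) asks for bit q of X.
answer = lambda x, q: (x >> q) & 1
joint = {(x, q, answer(x, q)): (1/4) * (1/2) for x in range(4) for q in range(2)}

pX, pQ, pQA = marginal(joint, [0]), marginal(joint, [1]), marginal(joint, [1, 2])

I_X_QA = H(pX) + H(pQA) - H(joint)   # I(X; Q, A)
H_A_given_Q = H(pQA) - H(pQ)         # H(A|Q)
print(I_X_QA, H_A_given_Q)           # both 1.0 bit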

4. Let the random variable X have three possible outcomes {a, b, c}. Consider two distributions on this
random variable.

Symbol   p(x)   q(x)
  a      1/2    1/3
  b      1/4    1/3
  c      1/4    1/3

(a) Calculate H(p), H(q), D(p||q) and D(q||p).

(b) Verify that in this case D(p||q) ≠ D(q||p).

Solution :
H(p) = (1/2) log 2 + (1/4) log 4 + (1/4) log 4 = 1.5 bits
H(q) = (1/3) log 3 + (1/3) log 3 + (1/3) log 3 = log 3 = 1.58496 bits
D(p||q) = (1/2) log(3/2) + (1/4) log(3/4) + (1/4) log(3/4) = log 3 − 1.5 = 0.08496 bits
D(q||p) = (1/3) log(2/3) + (1/3) log(4/3) + (1/3) log(4/3) = 5/3 − log 3 = 0.08170 bits
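
These numbers are easy to confirm in a few lines; the helpers H and D below are illustrative.

from math import log2

p = {'a': 1/2, 'b': 1/4, 'c': 1/4}
q = {'a': 1/3, 'b': 1/3, 'c': 1/3}

def H(dist):
    """Entropy in bits."""
    return -sum(v * log2(v) for v in dist.values() if v > 0)

def D(p, q):
    """Relative entropy D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(p[x] * log2(p[x] / q[x]) for x in p if p[x] > 0)

print(H(p), H(q))        # 1.5, 1.58496
print(D(p, q), D(q, p))  # 0.08496, 0.08170 -- not equal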

5. Here is a statement about pairwise independence and joint independence. Let X, Y1 and Y2 be binary
random variables. If I(X; Y1 ) = 0 and I(X; Y2 ) = 0, does it follow that I(X; Y1 , Y2 ) = 0?

(a) Yes or no?

(b) Prove or provide a counterexample.


(c) If I(X; Y1) = 0 and I(X; Y2) = 0 in the above problem, does it follow that I(Y1; Y2) = 0?
In other words, if Y1 is independent of X, and Y2 is independent of X, is it true that Y1 and Y2
are independent?

Solution :
(a) The answer is no.

(b) At first the conjecture seems reasonable enough: if Y1 gives you no information about X, and
if Y2 gives you no information about X, then why should the two of them together give any
information? But remember, it is NOT the case that I(X; Y1, Y2) = I(X; Y1) + I(X; Y2). The chain
rule for information says instead that I(X; Y1, Y2) = I(X; Y1) + I(X; Y2|Y1). The chain rule gives
us reason to be skeptical about the conjecture.
This problem is reminiscent of the well-known fact in probability that pair-wise independence of three
random variables is not sufficient to guarantee that all three are mutually independent. I(X; Y1 ) = 0
is equivalent to saying that X and Y1 are independent. Similarly for X and Y2 . But just because
X is pairwise independent with each of Y1 and Y2 , it does not follow that X is independent of the
vector (Y1 , Y2 ).
Here is a simple counterexample. Let Y1 and Y2 be independent fair coin flips. And let X =
Y1 XOR Y2 . X is pairwise independent of both Y1 and Y2 , but obviously not independent of the
vector (Y1 , Y2 ), since X is uniquely determined once you know (Y1 , Y2 ).
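
The XOR counterexample can be verified directly; the helpers below are illustrative, not part of the original solution.

from collections import defaultdict
from math import log2

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return dict(out)

# (Y1, Y2) independent fair bits, X = Y1 XOR Y2; joint over (X, Y1, Y2).
joint = {(y1 ^ y2, y1, y2): 1/4 for y1 in (0, 1) for y2 in (0, 1)}

pX, pY1, pY2 = marginal(joint, [0]), marginal(joint, [1]), marginal(joint, [2])
pXY1, pXY2, pY1Y2 = marginal(joint, [0, 1]), marginal(joint, [0, 2]), marginal(joint, [1, 2])

print(H(pX) + H(pY1) - H(pXY1))     # I(X; Y1)     = 0.0
print(H(pX) + H(pY2) - H(pXY2))     # I(X; Y2)     = 0.0
print(H(pX) + H(pY1Y2) - H(joint))  # I(X; Y1, Y2) = 1.0 bit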

(c) Again the answer is no. Y1 and Y2 can be arbitrarily dependent on each other and both
still be independent of X. For example, let Y1 = Y2 be two observations of the same fair coin flip,
and X an independent fair coin flip. Then I(X; Y1) = I(X; Y2) = 0 because X is independent of
both Y1 and Y2. However, I(Y1; Y2) = H(Y1) − H(Y1|Y2) = H(Y1) = 1 bit.

6. Let X1 and X2 be identically distributed, but not necessarily independent. Let

        ρ = 1 − H(X2|X1) / H(X1)

(a) Show ρ = I(X1; X2) / H(X1).

(b) Show 0 ≤ ρ ≤ 1.

(c) When is ρ = 0?

(d) When is ρ = 1?

Solution :


(a)

ρ = [H(X1) − H(X2|X1)] / H(X1)
  = [H(X2) − H(X2|X1)] / H(X1)        (since H(X1) = H(X2))
  = I(X1; X2) / H(X1)

(b) Since 0 ≤ H(X2|X1) ≤ H(X2) = H(X1), we have

    0 ≤ H(X2|X1) / H(X1) ≤ 1
    0 ≤ ρ ≤ 1

(c) ρ = 0 iff I(X1; X2) = 0 iff X1 and X2 are independent.

(d) ρ = 1 iff H(X2|X1) = 0 iff X2 is a function of X1. By symmetry, X1 is a function of X2,
i.e., X1 and X2 have a one-to-one relationship.
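
The two extreme cases in (c) and (d) can be illustrated numerically; the joints below (independent fair bits versus an exact copy) and the helper rho are illustrative choices, not from the original problem.

from collections import defaultdict
from math import log2

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def rho(joint):
    """rho = I(X1; X2) / H(X1) for a joint over (x1, x2) pairs."""
    pX1, pX2 = defaultdict(float), defaultdict(float)
    for (x1, x2), p in joint.items():
        pX1[x1] += p
        pX2[x2] += p
    I = H(pX1) + H(pX2) - H(joint)
    return I / H(pX1)

independent = {(x1, x2): 1/4 for x1 in (0, 1) for x2 in (0, 1)}  # X1, X2 independent fair bits
copy = {(x, x): 1/2 for x in (0, 1)}                             # X2 = X1
print(rho(independent), rho(copy))                               # 0.0 and 1.0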

7. Consider a sequence of n binary random variables X1, X2, . . . , Xn. Each n-sequence with an even
number of 1s has probability 2^−(n−1) and each n-sequence with an odd number of 1s has probability
0. Find the mutual informations

I(X1; X2), I(X2; X3|X1), . . . , I(X_{n−1}; X_n|X1, . . . , X_{n−2})

Solution :
Consider a sequence of n binary random variables X1, X2, . . . , Xn. Each sequence of length n with
an even number of 1s is equally likely and has probability 2^−(n−1).
Any n − 1 or fewer of these are independent. Thus, for k ≤ n − 1,

I(X_{k−1}; X_k|X1, X2, . . . , X_{k−2}) = 0

However, given X1, X2, . . . , X_{n−2}, we know that once we know either X_{n−1} or X_n we know the other.

I(X_{n−1}; X_n|X1, X2, . . . , X_{n−2}) = H(X_n|X1, X2, . . . , X_{n−2}) − H(X_n|X1, X2, . . . , X_{n−1})
                                        = 1 − 0 = 1 bit
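
A brute-force check for n = 3 confirms both claims; the enumeration below is an illustrative sketch, not part of the original solution.

from collections import defaultdict
from itertools import product
from math import log2

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

n = 3
# Uniform distribution over binary n-sequences with an even number of 1s.
joint = {seq: 2 ** -(n - 1) for seq in product((0, 1), repeat=n) if sum(seq) % 2 == 0}

def marginal(idx):
    out = defaultdict(float)
    for seq, p in joint.items():
        out[tuple(seq[i] for i in idx)] += p
    return dict(out)

# I(X1; X2) = H(X1) + H(X2) - H(X1, X2) = 0: any two coordinates are independent.
print(H(marginal([0])) + H(marginal([1])) - H(marginal([0, 1])))                 # 0.0

# I(X2; X3 | X1) = H(X1, X2) + H(X1, X3) - H(X1) - H(X1, X2, X3) = 1 bit.
print(H(marginal([0, 1])) + H(marginal([0, 2])) - H(marginal([0])) - H(joint))   # 1.0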

8. Let X, Y and Z be jointly distributed random variables.

(a) Prove the following inequality and find conditions for equality:

I(X; Z|Y) ≥ I(Z; Y|X) − I(Z; Y) + I(X; Z)

(b) Give examples of X, Y and Z for the following inequalities


I(X; Y |Z) < I(X; Y )


I(X; Y |Z) > I(X; Y )

Solution :
(a) Using the chain rule for mutual information,

I(X; Z|Y ) + I(Z; Y ) = I(X, Y ; Z) = I(Z; Y |X) + I(X; Z),

and therefore

I(X; Z|Y) = I(Z; Y|X) − I(Z; Y) + I(X; Z)

We see that this inequality is actually an equality in all cases.

(b)

I(X; Y |Z) < I(X; Y )


The last corollary to Theorem 2.8.1 in the text states that if X → Y → Z form a Markov chain, that
is, if p(x, z|y) = p(x|y)p(z|y), then I(X; Y) ≥ I(X; Y|Z). Equality holds if and only if I(X; Z) = 0,
i.e., X and Z are independent.
A simple example of random variables satisfying these conditions: let X be a fair binary random
variable, and let Y = X and Z = Y. In this case,

I(X; Y) = H(X) − H(X|Y) = H(X) = 1

and,

I(X; Y|Z) = H(X|Z) − H(X|Y, Z) = 0

So I(X; Y) > I(X; Y|Z).


I(X; Y |Z) > I(X; Y )
This example is also given in the text. Let X, Y be independent fair binary random variables
and let Z = X + Y . In this case we have that,

I(X; Y ) = 0

and,

I(X; Y |Z) = H(X|Z) = 1/2

So I(X; Y) < I(X; Y|Z). Note that in this case X, Y, Z do not form a Markov chain.
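
Both examples can be checked numerically; the helper functions below (conditional mutual information computed from entropies of marginals) are illustrative.

from collections import defaultdict
from math import log2

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in idx)] += p
    return dict(out)

def I_XY(joint):           # joint over (x, y, z) triples
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return H(marginal(joint, [0])) + H(marginal(joint, [1])) - H(marginal(joint, [0, 1]))

def I_XY_given_Z(joint):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)."""
    return (H(marginal(joint, [0, 2])) + H(marginal(joint, [1, 2]))
            - H(marginal(joint, [2])) - H(joint))

ex1 = {(x, x, x): 1/2 for x in (0, 1)}                      # Y = X, Z = Y
ex2 = {(x, y, x + y): 1/4 for x in (0, 1) for y in (0, 1)}  # X, Y independent, Z = X + Y
print(I_XY(ex1), I_XY_given_Z(ex1))   # 1.0 > 0.0
print(I_XY(ex2), I_XY_given_Z(ex2))   # 0.0 < 0.5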

9. We are given the following joint distribution on (X, Y ):

X\Y     a      b      c
 1     1/6   1/12   1/12
 2    1/12    1/6   1/12
 3    1/12   1/12    1/6


Let X̂(Y) be an estimator for X (based on Y) and let Pe = Pr{X̂(Y) ≠ X}.

(a) Find the minimum probability of error estimator X̂(Y) and the associated Pe.

(b) Evaluate Fano's inequality for this problem and compare.

Solution :
(a) From inspection we see that

            1,  y = a,
    X̂(y) =  2,  y = b,
            3,  y = c.

Hence the associated Pe is the sum of P(1, b), P(1, c), P(2, a), P(2, c), P(3, a) and P(3, b). Therefore,
Pe = 1/2.

(b) From Fano's inequality we know

    Pe ≥ (H(X|Y) − 1) / log |X|

Here,

H(X|Y) = H(X|Y = a) Pr{Y = a} + H(X|Y = b) Pr{Y = b} + H(X|Y = c) Pr{Y = c}
       = H(1/2, 1/4, 1/4) Pr{Y = a} + H(1/2, 1/4, 1/4) Pr{Y = b} + H(1/2, 1/4, 1/4) Pr{Y = c}
       = H(1/2, 1/4, 1/4) (Pr{Y = a} + Pr{Y = b} + Pr{Y = c})
       = H(1/2, 1/4, 1/4)
       = 1.5 bits

Hence

    Pe ≥ (1.5 − 1) / log 3 = 0.316

Hence our estimator X̂(Y) is not very close to Fano's bound in this form. If X̂ ∈ X, as it does here,
we can use the stronger form of Fano's inequality to get

    Pe ≥ (H(X|Y) − 1) / log(|X| − 1)

and

    Pe ≥ (1.5 − 1) / log 2 = 1/2

Therefore our estimator X̂(Y) is actually quite good.
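
A short numerical confirmation of Pe and of both forms of the Fano bound; the MAP construction of the estimator and the helper names are illustrative.

from collections import defaultdict
from math import log2

# Joint distribution p(x, y) from the table above.
p = {(1, 'a'): 1/6,  (1, 'b'): 1/12, (1, 'c'): 1/12,
     (2, 'a'): 1/12, (2, 'b'): 1/6,  (2, 'c'): 1/12,
     (3, 'a'): 1/12, (3, 'b'): 1/12, (3, 'c'): 1/6}

def H(dist):
    """Entropy in bits of {outcome: probability}."""
    return -sum(v * log2(v) for v in dist.values() if v > 0)

pY = defaultdict(float)
for (x, y), v in p.items():
    pY[y] += v

# Minimum-error (MAP) estimator: for each y pick the x maximizing p(x, y).
xhat = {y: max((1, 2, 3), key=lambda x: p[(x, y)]) for y in pY}
Pe = sum(v for (x, y), v in p.items() if x != xhat[y])

H_X_given_Y = H(p) - H(dict(pY))
print(Pe)                               # 0.5
print((H_X_given_Y - 1) / log2(3))      # weak Fano bound:   0.316
print((H_X_given_Y - 1) / log2(3 - 1))  # strong Fano bound: 0.5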
