Homework 1
Solution
X\Y    0     1
 0    1/3   1/3
 1     0    1/3
Solution :
(a) Let X be a discrete random variable. Show that the entropy of a function of X is less than
or equal to the entropy of X by justifying the following steps:

(a) H(X, g(X)) = H(X) + H(g(X)|X)
(b)            = H(X)
(c) H(X, g(X)) = H(g(X)) + H(X|g(X))
(d)            ≥ H(g(X))
(b) Let Y = X^7, where X is a random variable taking on positive and negative integer values.
What is the relationship of H(X) and H(Y)? What if Y = cos(X/3)?
Solution :
(a)
STEP (a): H(X, g(X)) = H(X) + H(g(X)|X), by the chain rule for entropies.
STEP (b): H(g(X)|X) = 0, since g(X) is a deterministic function of X; hence H(X, g(X)) = H(X).
STEP (c): H(X, g(X)) = H(g(X)) + H(X|g(X)), again by the chain rule.
STEP (d): H(X|g(X)) ≥ 0, with equality if and only if X is a function of g(X); hence H(X, g(X)) ≥ H(g(X)).
Combining steps (b) and (d), H(g(X)) ≤ H(X).
Harvard SEAS ES250 Information Theory
(b) By part (a), we know that passing a random variable through a function can only reduce the
entropy or leave it unchanged, but never increase it. That is, H(g(X)) ≤ H(X) for any function g.
The reason for this is simply that if the function g is not one-to-one, then it will merge some states,
reducing entropy.
The trick, then, for this problem, is simply to determine whether or not the mappings are one-to-one.
If so, then entropy is unchanged. If the mappings are not one-to-one, then the entropy is necessarily
decreased. Note that whether the function is one-to-one or not is only meaningful for the support of
X, i.e. for all x with p(x) > 0.
Y = X^7 is one-to-one on the integers, and hence the entropy, which is a function of the probabilities
alone, does not change, i.e. H(X) = H(Y).
Y = cos(X/3) is not one-to-one unless the support of X is rather small, since, for example, this
function takes the same value at x and −x; merging support points can only decrease entropy, so H(Y) ≤ H(X).
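These claims are easy to check numerically. The script below is a sketch with a hypothetical example distribution (X uniform on {−3, …, −1, 1, …, 3}, not part of the original problem); the helper names are ours:

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropy_of_function(pmf, g):
    """Entropy of g(X): a non-injective g merges states of X."""
    merged = Counter()
    for x, p in pmf.items():
        merged[g(x)] += p
    return entropy(merged.values())

# Hypothetical example: X uniform on the nonzero integers in [-3, 3]
pmf = {x: 1/6 for x in (-3, -2, -1, 1, 2, 3)}

h_x = entropy(pmf.values())                                # log2(6) bits
h_x7 = entropy_of_function(pmf, lambda x: x ** 7)          # injective: entropy unchanged
h_cos = entropy_of_function(pmf, lambda x: round(math.cos(x / 3), 9))  # even: merges x and -x
```

Here cos(x/3) merges each pair {x, −x}, so the entropy drops from log 6 to log 3 bits, while x^7 leaves it unchanged.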
3. One wishes to identify a random object X ~ p(x). A question Q ~ r(q) is asked at random accord-
ing to r(q). This results in a deterministic answer A = A(x, q) ∈ {a1, a2, …}. Suppose the object
X and the question Q are independent. Then I(X; Q, A) is the uncertainty in X removed by the
question-answer pair (Q, A).
(b) Now suppose that two i.i.d. questions Q1, Q2 ~ r(q) are asked, eliciting answers A1 and
A2. Show that two questions are less valuable than twice a single question in the sense that
I(X; Q1, A1, Q2, A2) ≤ 2I(X; Q1, A1).
Solution :
(a) Since X and Q are independent and A is a deterministic function of (X, Q),

I(X; Q, A) = I(X; Q) + I(X; A|Q) = I(X; A|Q) = H(A|Q) − H(A|X, Q) = H(A|Q).

The interpretation is as follows: the uncertainty removed in X by (Q, A) is the same as the uncer-
tainty in the answer given the question.
(b) Using the result from part (a) and the fact that the questions are independent, we can easily obtain, by the chain rule,

I(X; Q1, A1, Q2, A2) = I(X; Q1, A1) + I(X; Q2, A2|Q1, A1).

Since Q2 is independent of (X, Q1, A1) and A2 is determined by (X, Q2), the second term satisfies

I(X; Q2, A2|Q1, A1) = H(Q2, A2|Q1, A1) − H(Q2) ≤ H(Q2, A2) − H(Q2) = H(A2|Q2) = I(X; Q2, A2) = I(X; Q1, A1),

where the last equality uses the i.i.d. assumption. Hence I(X; Q1, A1, Q2, A2) ≤ 2I(X; Q1, A1).
4. Let the random variable X have three possible outcomes {a, b, c}. Consider two distributions on this
random variable: p = (1/2, 1/4, 1/4) and q = (1/3, 1/3, 1/3). Compute H(p), H(q), D(p ‖ q) and D(q ‖ p).
Solution :
H(p) = (1/2) log 2 + (1/4) log 4 + (1/4) log 4 = 1.5 bits

H(q) = (1/3) log 3 + (1/3) log 3 + (1/3) log 3 = log 3 = 1.58496 bits

D(p ‖ q) = (1/2) log(3/2) + (1/4) log(3/4) + (1/4) log(3/4) = log 3 − 1.5 = 0.08496 bits

D(q ‖ p) = (1/3) log(2/3) + (1/3) log(4/3) + (1/3) log(4/3) = 5/3 − log 3 = 0.08170 bits
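These four values can be verified with a few lines of code (a sketch; the distributions are as read off from the computations above):

```python
import math

def H(p):
    """Entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def D(p, q):
    """Relative entropy D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/2, 1/4, 1/4]   # distribution p on {a, b, c}
q = [1/3, 1/3, 1/3]   # uniform distribution q
```

Note that D(p ‖ q) ≠ D(q ‖ p), illustrating that relative entropy is not symmetric.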
5. Here is a statement about pairwise independence and joint independence. Let X, Y1 and Y2 be binary
random variables. If I(X; Y1 ) = 0 and I(X; Y2 ) = 0, does it follow that I(X; Y1 , Y2 ) = 0?
(c) If I(X; Y1) = 0 and I(X; Y2) = 0 in the above problem, does it follow that I(Y1; Y2) = 0?
In other words, if Y1 is independent of X, and Y2 is independent of X, is it true that Y1 and Y2
are independent?
Solution :
(a) The answer is no.
(b) At first the conjecture seems reasonable enough: after all, if Y1 gives you no information
about X, and if Y2 gives you no information about X, then why should the two of them together
give any information? But remember, it is NOT the case that I(X; Y1, Y2) = I(X; Y1) + I(X; Y2).
The chain rule for information says instead that I(X; Y1, Y2) = I(X; Y1) + I(X; Y2|Y1). The chain
rule gives us reason to be skeptical about the conjecture.
This problem is reminiscent of the well-known fact in probability that pair-wise independence of three
random variables is not sufficient to guarantee that all three are mutually independent. I(X; Y1 ) = 0
is equivalent to saying that X and Y1 are independent. Similarly for X and Y2 . But just because
X is pairwise independent with each of Y1 and Y2 , it does not follow that X is independent of the
vector (Y1 , Y2 ).
Here is a simple counterexample. Let Y1 and Y2 be independent fair coin flips. And let X =
Y1 XOR Y2 . X is pairwise independent of both Y1 and Y2 , but obviously not independent of the
vector (Y1 , Y2 ), since X is uniquely determined once you know (Y1 , Y2 ).
(c) Again the answer is no. Y1 and Y2 can be arbitrarily dependent on each other and both
still be independent of X. For example, let Y1 = Y2 be two observations of the same fair coin flip,
and X an independent fair coin flip. Then I(X; Y1) = I(X; Y2) = 0 because X is independent of
both Y1 and Y2. However, I(Y1; Y2) = H(Y1) − H(Y1|Y2) = H(Y1) = 1 bit.
6. Let X1 and X2 be identically distributed, but not necessarily independent, random variables, and let

ρ = 1 − H(X2|X1) / H(X1)

(a) Show that ρ = I(X1; X2) / H(X1).
(b) Show that 0 ≤ ρ ≤ 1.
(c) When is ρ = 0?
(d) When is ρ = 1?
Solution :
(a) Since X1 and X2 are identically distributed, H(X1) = H(X2), so

ρ = (H(X1) − H(X2|X1)) / H(X1) = (H(X2) − H(X2|X1)) / H(X1) = I(X1; X2) / H(X1).

(b) Since 0 ≤ H(X2|X1) ≤ H(X2) = H(X1), we have 0 ≤ H(X2|X1)/H(X1) ≤ 1, and therefore 0 ≤ ρ ≤ 1.

(c) ρ = 0 if and only if I(X1; X2) = 0, i.e. if and only if X1 and X2 are independent.

(d) ρ = 1 if and only if H(X2|X1) = 0, i.e. if and only if X2 is a deterministic function of X1.
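A small numerical illustration of the extremes (our own example, not part of the original problem): let X1 be a fair bit and let X2 equal X1 flipped with probability ε, so X2 is also a fair bit, H(X1) = 1 bit and H(X2|X1) is the binary entropy of ε:

```python
import math

def binary_entropy(p):
    """H(p) in bits for a Bernoulli(p) variable."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rho(eps):
    """rho = 1 - H(X2|X1)/H(X1) when X1 ~ Bernoulli(1/2) and
    X2 is X1 flipped with probability eps, so H(X1) = 1 bit."""
    return 1.0 - binary_entropy(eps) / 1.0
```

Then rho(0) = 1 (X2 is a function of X1), rho(1/2) = 0 (X1 and X2 independent), and 0 ≤ rho(ε) ≤ 1 in between.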
Solution :
Consider a sequence of n binary random variables X1, X2, …, Xn. Each sequence of length n with
an even number of 1s is equally likely and has probability 2^−(n−1).
Any n − 1 or fewer of these are independent. Thus, for k ≤ n − 1, H(Xk | X1, X2, …, Xk−1) = H(Xk) = 1 bit.
However, given X1, X2, …, Xn−2, once we know either Xn−1 or Xn we know the other.
(a) Prove the following inequality and find conditions for equality
Solution :
(a) Using the chain rule for mutual information,
and therefore
(b)
and,
I(X; Y ) = 0
and,
So I(X; Y) < I(X; Y|Z). Note that in this case X, Y, Z do not form a Markov chain.
Suppose X and Y have the following joint distribution p(x, y):

X\Y     a       b       c
 1     1/6    1/12    1/12
 2    1/12     1/6    1/12
 3    1/12    1/12     1/6
(a) Find the minimum probability of error estimator X̂(Y) and the associated Pe.
Solution :
(a) From inspection we see that

X̂(y) = 1 if y = a; 2 if y = b; 3 if y = c.

Hence the associated Pe is the sum of P(1, b), P(1, c), P(2, a), P(2, c), P(3, a) and P(3, b). Therefore,
Pe = 6 · (1/12) = 1/2.
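The inspection step can be mechanized; a sketch that recomputes X̂ and Pe exactly from the joint table (variable names are ours):

```python
from fractions import Fraction as F

# Joint distribution p(x, y) from the table above
joint = {
    (1, 'a'): F(1, 6),  (1, 'b'): F(1, 12), (1, 'c'): F(1, 12),
    (2, 'a'): F(1, 12), (2, 'b'): F(1, 6),  (2, 'c'): F(1, 12),
    (3, 'a'): F(1, 12), (3, 'b'): F(1, 12), (3, 'c'): F(1, 6),
}

def map_estimate(y):
    """Minimum probability of error estimate: the x maximizing p(x, y)."""
    return max((x for (x, yy) in joint if yy == y), key=lambda x: joint[(x, y)])

xhat = {y: map_estimate(y) for y in 'abc'}
pe = sum(p for (x, y), p in joint.items() if xhat[y] != x)  # exact Pe
```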
Fano's inequality gives

Pe ≥ (H(X|Y) − 1) / log |X|
Here,
H(X|Y) = H(X|Y=a) Pr{Y=a} + H(X|Y=b) Pr{Y=b} + H(X|Y=c) Pr{Y=c}
       = H(1/2, 1/4, 1/4) Pr{Y=a} + H(1/2, 1/4, 1/4) Pr{Y=b} + H(1/2, 1/4, 1/4) Pr{Y=c}
       = H(1/2, 1/4, 1/4) (Pr{Y=a} + Pr{Y=b} + Pr{Y=c})
       = H(1/2, 1/4, 1/4)
       = 1.5 bits
Hence

Pe ≥ (1.5 − 1) / log 3 ≈ 0.316
Hence our estimator X̂(Y) is not very close to Fano's bound in this form. If X̂ ∈ X, as it does here,
we can use the stronger form of Fano's inequality to get

Pe ≥ (H(X|Y) − 1) / log(|X| − 1)
and

Pe ≥ (1.5 − 1) / log 2 = 1/2,

which our estimator achieves with equality.
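As a final check, both forms of the bound can be recomputed from the joint table (a sketch; all entropies are in bits):

```python
import math

# Joint distribution p(x, y) from the table above
joint = {
    (1, 'a'): 1/6,  (1, 'b'): 1/12, (1, 'c'): 1/12,
    (2, 'a'): 1/12, (2, 'b'): 1/6,  (2, 'c'): 1/12,
    (3, 'a'): 1/12, (3, 'b'): 1/12, (3, 'c'): 1/6,
}

# Marginal p(y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y) = -sum_{x,y} p(x, y) log p(x|y)
h_x_given_y = -sum(p * math.log2(p / p_y[y]) for (x, y), p in joint.items())

weak_bound = (h_x_given_y - 1) / math.log2(3)        # Pe >= 0.316 (|X| in denominator)
strong_bound = (h_x_given_y - 1) / math.log2(3 - 1)  # Pe >= 1/2 (|X| - 1 in denominator)
```

The observed Pe = 1/2 meets the stronger bound with equality, while the weaker bound is loose.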