
Harvard SEAS

ES250 Information Theory

Homework 4 Solutions

1. Find the channel capacity of the following discrete memoryless channel:


Y = X + Z, where the alphabet for X is X = {0, 1} and Z ∈ {0, a}. Assume that Z is independent of X, with Pr{Z = 0} = Pr{Z = a} = 1/2. Observe that the channel capacity depends on the value of a.

Solution :

Y = X + Z, with X ∈ {0, 1} and Z ∈ {0, a}. We have to distinguish various cases, depending on the value of a.

a = 0: In this case Y = X, and max I(X; Y) = max H(X) = 1. Hence the capacity is 1 bit per transmission.

a ≠ 0, ±1: In this case Y has four possible values, 0, 1, a, and 1 + a. Knowing Y, we know which X was sent, and hence H(X|Y) = 0. Hence max I(X; Y) = max H(X) = 1, achieved for a uniform distribution on the input X, and the capacity is again 1 bit per transmission.

a = 1: In this case Y has three possible output values, 0, 1, and 2. The channel is identical to the binary erasure channel with erasure probability 1/2. The capacity of this channel is 1 - 1/2 = 1/2 bit per transmission.

a = -1: This is similar to the case a = 1, and the capacity here is also 1/2 bit per transmission.
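As a quick numerical sanity check on these cases (not part of the original solution), one can compute I(X; Y) for a uniform input, which is the capacity-achieving distribution in each case above. The Python sketch below, with an illustrative helper name, does exactly that:

```python
from collections import defaultdict
from math import log2

def mutual_info_uniform(a):
    """I(X;Y) in bits for Y = X + Z, X ~ Unif{0,1}, Z ~ Unif{0,a}, X and Z independent."""
    p_y, p_xy = defaultdict(float), defaultdict(float)
    for x in (0, 1):
        for z in (0, a):
            p_xy[(x, x + z)] += 0.25
            p_y[x + z] += 0.25
    h_y = -sum(p * log2(p) for p in p_y.values())
    h_y_given_x = -sum(p * log2(p / 0.5) for p in p_xy.values())  # p(x) = 1/2
    return h_y - h_y_given_x

for a in (0, 0.3, 1, -1, 2):
    print(f"a = {a:>4}: I(X;Y) = {mutual_info_uniform(a):.2f} bits")
# prints 1.00, 1.00, 0.50, 0.50, 1.00, matching the cases above
```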


2. Consider a 26-key typewriter.

(a) If pushing a key results in printing the associated letter, what is the capacity C in bits?

(b) Now suppose that pushing a key results in printing that letter or the next (with equal probability). Thus A → A or B, ..., Z → Z or A. What is the capacity?

(c) What is the highest rate code with block length one that you can find that achieves zero probability of error for the channel in part (b)?

Solution :

(a) If the typewriter prints out whatever key is struck, then the output Y is the same as the input X, and C = max I(X; Y) = max H(X) = log 26, attained by a uniform distribution over the letters.

(b) In this case, the output is either equal to the input (with probability 1/2) or equal to the next letter (with probability 1/2). Hence H(Y|X) = log 2, independent of the distribution of X, and hence

C = max I(X; Y) = max H(Y) - log 2 = log 26 - log 2 = log 13,

attained for a uniform distribution over the output, which in turn is attained by a uniform distribution on the input.

(c) A simple zero-error block length one code is the one that uses every alternate letter, say A, C, E, ..., W, Y. In this case, none of the codewords will be confused, since A will produce either A or B, C will produce either C or D, etc. The rate of this code is

R = log(# codewords) / block length = log 13 / 1 = log 13.

In this case, we can achieve capacity with a simple code with zero error.
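As a short numerical check (again, not part of the original solution), the Python snippet below builds the 26 x 26 noisy-typewriter transition matrix, confirms that a uniform input gives I(X; Y) = log2 13 ≈ 3.70 bits, and verifies that the 13 alternate letters of part (c) can never be confused:

```python
import numpy as np

# Noisy typewriter: input letter i prints as i or i+1 (mod 26), each with probability 1/2.
W = np.zeros((26, 26))
for i in range(26):
    W[i, i] = W[i, (i + 1) % 26] = 0.5

p_x = np.full(26, 1 / 26)                       # uniform input
p_y = p_x @ W                                   # output distribution
H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
I = H(p_y) - sum(p_x[i] * H(W[i]) for i in range(26))
print(I, np.log2(13))                           # both ~3.7004 bits

# Zero-error code from part (c): every other letter yields disjoint output sets.
outputs = [{i, (i + 1) % 26} for i in range(0, 26, 2)]
assert all(a.isdisjoint(b) for k, a in enumerate(outputs) for b in outputs[k + 1:])
```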

3. Consider a binary symmetric channel with Yi = Xi ⊕ Zi, where ⊕ is mod 2 addition and Xi, Yi ∈ {0, 1}. Suppose that {Zi} has constant marginal probabilities Pr{Zi = 1} = p = 1 - Pr{Zi = 0}, but that Z1, Z2, ..., Zn are not necessarily independent. Let C = 1 - H(p). Show that

max_{p(x1,x2,...,xn)} I(X1, X2, ..., Xn; Y1, Y2, ..., Yn) ≥ nC.

Comment on the implications.

Solution :

When X1, X2, ..., Xn are chosen i.i.d. Bern(1/2),

I(X1, ..., Xn; Y1, ..., Yn) = H(X1, ..., Xn) - H(X1, ..., Xn | Y1, ..., Yn)
                            = H(X1, ..., Xn) - H(Z1, ..., Zn | Y1, ..., Yn)
                            ≥ H(X1, ..., Xn) - H(Z1, ..., Zn)
                            ≥ H(X1, ..., Xn) - Σ_{i=1}^n H(Zi)
                            = n - nH(p).

Hence, the capacity of the channel with memory over n uses of the channel is

nC^(n) = max_{p(x1,...,xn)} I(X1, ..., Xn; Y1, ..., Yn)
       ≥ I(X1, ..., Xn; Y1, ..., Yn) evaluated at X1, ..., Xn i.i.d. Bern(1/2)
       ≥ n(1 - H(p)) = nC.

Hence, channels with memory have higher capacity. The intuitive explanation for this result is that correlation between the noise samples decreases the effective noise; one could use the information from past samples of the noise to combat the present noise.

4. Consider the channel Y = X + Z (mod 13), where

Z = 1 with probability 1/3, 2 with probability 1/3, 3 with probability 1/3,

and X ∈ {0, 1, ..., 12}.


(a) Find the capacity.

(b) What is the maximizing p*(x)?

Solution :

(a)

C = max_{p(x)} I(X; Y)
  = max_{p(x)} [ H(Y) - H(Y|X) ]
  = max_{p(x)} H(Y) - log 3
  = log 13 - log 3
  = log(13/3),

which is attained when Y has a uniform distribution, which occurs (by symmetry) when X has a uniform distribution.

(b) The capacity is achieved by a uniform distribution on the inputs, that is, p(X = i) = 1/13 for i = 0, 1, ..., 12.
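As an illustrative cross-check (not part of the original solution), a minimal Blahut-Arimoto iteration confirms both the value log2(13/3) ≈ 2.115 bits and the fact that the optimizing input is uniform. The function below is a sketch under these assumptions, not a reference implementation:

```python
import numpy as np

def blahut_arimoto(W, iters=500):
    """Capacity (bits) and optimal input distribution for a DMC with W[x, y] = p(y|x)."""
    r = np.full(W.shape[0], 1.0 / W.shape[0])          # current guess for p(x)
    for _ in range(iters):
        q = r[:, None] * W
        q /= q.sum(axis=0, keepdims=True)              # posterior p(x|y)
        log_r = (W * np.log(q, where=W > 0, out=np.zeros_like(W))).sum(axis=1)
        r = np.exp(log_r - log_r.max())
        r /= r.sum()                                   # updated input distribution
    q = r[:, None] * W
    q /= q.sum(axis=0, keepdims=True)
    ratio = np.log2(q / r[:, None], where=W > 0, out=np.zeros_like(W))
    return float(np.sum(r[:, None] * W * ratio)), r

# Channel Y = X + Z (mod 13) with Z uniform on {1, 2, 3}.
W = np.zeros((13, 13))
for x in range(13):
    for z in (1, 2, 3):
        W[x, (x + z) % 13] = 1 / 3

c, r = blahut_arimoto(W)
print(c, np.log2(13 / 3))   # both ~2.1155 bits
print(r.round(4))           # ~uniform: 0.0769 for every input symbol
```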


5. Using two channels.

(a) Consider two discrete memoryless channels (X1, p(y1|x1), Y1) and (X2, p(y2|x2), Y2) with capacities C1 and C2 respectively. A new channel (X1 × X2, p(y1|x1) × p(y2|x2), Y1 × Y2) is formed in which x1 ∈ X1 and x2 ∈ X2 are simultaneously sent, resulting in (y1, y2). Find the capacity of this channel.

(b) Find the capacity C of the union of the two channels (X1, p(y1|x1), Y1) and (X2, p(y2|x2), Y2), where, at each time, one can send a symbol over channel 1 or channel 2 but not both. Assume the output alphabets are distinct and do not intersect. Show that 2^C = 2^{C1} + 2^{C2}.

Solution :

(a) To find the capacity of the product channel (X1 × X2, p(y1, y2|x1, x2), Y1 × Y2), we have to find the distribution p(x1, x2) on the input alphabet X1 × X2 that maximizes I(X1, X2; Y1, Y2). Since the transition probabilities are given as p(y1, y2|x1, x2) = p(y1|x1) p(y2|x2),

p(x1, x2, y1, y2) = p(x1, x2) p(y1, y2|x1, x2) = p(x1, x2) p(y1|x1) p(y2|x2).

Therefore, Y1 → X1 → X2 → Y2 forms a Markov chain, and

I(X1, X2; Y1, Y2) = H(Y1, Y2) - H(Y1, Y2|X1, X2)
                  = H(Y1, Y2) - H(Y1|X1, X2) - H(Y2|X1, X2)      (1)
                  = H(Y1, Y2) - H(Y1|X1) - H(Y2|X2)              (2)
                  ≤ H(Y1) + H(Y2) - H(Y1|X1) - H(Y2|X2)          (3)
                  = I(X1; Y1) + I(X2; Y2),

where (1) and (2) follow from Markovity, and (3) is met with equality if X1 and X2 are independent and hence Y1 and Y2 are independent. Therefore

C = max_{p(x1,x2)} I(X1, X2; Y1, Y2)
  ≤ max_{p(x1,x2)} I(X1; Y1) + max_{p(x1,x2)} I(X2; Y2)
  = max_{p(x1)} I(X1; Y1) + max_{p(x2)} I(X2; Y2)
  = C1 + C2,

with equality iff p(x1, x2) = p*(x1) p*(x2), where p*(x1) and p*(x2) are the distributions that maximize C1 and C2 respectively.

(b) Let θ = 1 if the signal is sent over channel 1, and θ = 2 if the signal is sent over channel 2.

Consider the following communication scheme: the sender chooses between the two channels according to a Bern(α) coin flip θ. Then the channel input is X = (θ, X_θ). Since the output alphabets Y1 and Y2 are disjoint, θ is a function of Y, i.e., X → Y → θ. Therefore,

I(X; Y) = I(X; Y, θ)
        = I(X_θ, θ; Y, θ)
        = I(θ; Y, θ) + I(X_θ; Y, θ | θ)
        = I(θ; Y, θ) + I(X_θ; Y | θ)
        = H(θ) + α I(X_θ; Y | θ = 1) + (1 - α) I(X_θ; Y | θ = 2)
        = H(α) + α I(X1; Y1) + (1 - α) I(X2; Y2).

Thus, it follows that

C = sup_α { H(α) + α C1 + (1 - α) C2 },

which is a strictly concave function of α. Hence the maximum exists and, by elementary calculus, one can easily show that C = log2(2^{C1} + 2^{C2}), which is attained with α = 2^{C1}/(2^{C1} + 2^{C2}).
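A small numerical illustration (not part of the original solution; C1 = 0.5 roughly corresponds to a BSC with crossover 0.11, and C2 = 1 to a noiseless binary channel) showing that a grid search over α reproduces both the closed form log2(2^C1 + 2^C2) and the maximizing α:

```python
import numpy as np

def union_capacity(c1, c2, grid=100001):
    """Maximize H(a) + a*c1 + (1-a)*c2 over a in (0, 1) by brute-force grid search."""
    a = np.linspace(1e-9, 1 - 1e-9, grid)
    h = -a * np.log2(a) - (1 - a) * np.log2(1 - a)     # binary entropy H(a)
    f = h + a * c1 + (1 - a) * c2
    return a[f.argmax()], f.max()

c1, c2 = 0.5, 1.0
a_star, c = union_capacity(c1, c2)
print(c, np.log2(2**c1 + 2**c2))          # both ~1.7716 bits
print(a_star, 2**c1 / (2**c1 + 2**c2))    # both ~0.4142
```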

6. Consider a time-varying discrete memoryless binary symmetric channel. Let Y1, Y2, ..., Yn be conditionally independent given X1, X2, ..., Xn, with conditional distribution given by p(y^n|x^n) = ∏_{i=1}^n pi(yi|xi), where the i-th use of the channel is a BSC with crossover probability pi.

(a) Find max_{p(x^n)} I(X^n; Y^n).


(b) We now ask for the capacity of the time-invariant version of this problem. Replace each pi, 1 ≤ i ≤ n, by the average value p̄ = (1/n) Σ_{j=1}^n pj, and compare the capacity to that of part (a).

Solution :

(a)

I(X^n; Y^n) = H(Y^n) - Σ_{i=1}^n H(Yi|Xi)
            ≤ Σ_{i=1}^n H(Yi) - Σ_{i=1}^n H(Yi|Xi)
            ≤ Σ_{i=1}^n (1 - H(pi)),

with equality if X1, ..., Xn are chosen i.i.d. Bern(1/2). Hence

max_{p(x^n)} I(X1, ..., Xn; Y1, ..., Yn) = Σ_{i=1}^n (1 - H(pi)).

(b) Since H(p) is concave in p, by Jensen's inequality,

(1/n) Σ_{i=1}^n H(pi) ≤ H( (1/n) Σ_{i=1}^n pi ) = H(p̄),

i.e.,

Σ_{i=1}^n H(pi) ≤ n H(p̄).

Hence,

C_time-varying = Σ_{i=1}^n (1 - H(pi))
              = n - Σ_{i=1}^n H(pi)
              ≥ n - n H(p̄)
              = Σ_{i=1}^n (1 - H(p̄))
              = C_time-invariant.
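A tiny numerical illustration of this comparison (the crossover probabilities below are made-up example values, not from the original problem):

```python
import numpy as np

H = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # binary entropy, 0 < p < 1

p = np.array([0.01, 0.05, 0.10, 0.20, 0.40])               # example crossover probabilities
c_time_varying = np.sum(1 - H(p))
c_time_invariant = len(p) * (1 - H(p.mean()))
print(c_time_varying, c_time_invariant)                    # ~2.47 vs ~1.93 bits over n = 5 uses
```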


7. Suppose a binary symmetric channel of capacity C1 is immediately followed by a binary erasure channel of capacity C2. Find the capacity C of the resulting channel.

Now consider an arbitrary discrete memoryless channel (X, p(y|x), Y) followed by a binary erasure channel, resulting in an output

Ỹ = Y with probability 1 - α, and Ỹ = e with probability α,

where e denotes erasure. Thus the output Y is erased with probability α. What is the capacity of this channel?

Solution :

(a) Let C1 = 1 - H(p) be the capacity of the BSC with parameter p, and C2 = 1 - α be the capacity of the BEC with parameter α. Let Ỹ denote the output of the cascaded channel, and Y the output of the BSC. Then the transition rule for the cascaded channel is simply

p(ỹ|x) = Σ_{y=0,1} p(ỹ|y) p(y|x)

for each (x, ỹ) pair. Let X ~ Bern(π) denote the input to the channel. Then

H(Ỹ) = H( (1-α)(π(1-p) + p(1-π)), α, (1-α)(πp + (1-p)(1-π)) ),

and also

H(Ỹ|X = 0) = H( (1-α)(1-p), α, (1-α)p )
H(Ỹ|X = 1) = H( (1-α)p, α, (1-α)(1-p) ) = H(Ỹ|X = 0).

Therefore,

C = max_{p(x)} I(X; Ỹ)
  = max_{p(x)} { H(Ỹ) - H(Ỹ|X) }
  = max_{p(x)} { H(Ỹ) } - H(Ỹ|X)
  = max_{π} H( (1-α)(π(1-p) + p(1-π)), α, (1-α)(πp + (1-p)(1-π)) ) - H( (1-α)(1-p), α, (1-α)p ).      (4)


Note that the maximum value of H(Ỹ) occurs when π = 1/2, by the concavity and symmetry of H(·). (We can check this also by differentiating (4) with respect to π.) Substituting the value π = 1/2 in the expression for the capacity yields

C = H( (1-α)/2, α, (1-α)/2 ) - H( (1-p)(1-α), α, p(1-α) )
  = (1-α)(1 + p log p + (1-p) log(1-p))
  = C1 C2.

(b) For the cascade of an arbitrary discrete memoryless channel (with capacity C) with the erasure channel (with erasure probability α), we will show that

I(X; Ỹ) = (1 - α) I(X; Y).      (5)

Then, by taking suprema of both sides over all input distributions p(x), we can conclude that the capacity of the cascaded channel is (1 - α)C.

Proof of (5): Let

E = 1 if Ỹ = e, and E = 0 if Ỹ = Y.

Then, since E is a function of Ỹ,

H(Ỹ) = H(Ỹ, E)
     = H(E) + α H(Ỹ|E = 1) + (1 - α) H(Ỹ|E = 0)
     = H(α) + (1 - α) H(Y),

where the last equality comes directly from the construction of E. Similarly,

H(Ỹ|X) = H(Ỹ, E|X)
       = H(E|X) + α H(Ỹ|X, E = 1) + (1 - α) H(Ỹ|X, E = 0)
       = H(α) + (1 - α) H(Y|X),

whence

I(X; Ỹ) = H(Ỹ) - H(Ỹ|X) = (1 - α) I(X; Y).
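As a numerical sanity check on part (a) (not part of the original solution), the snippet below computes I(X; Ỹ) for a Bern(1/2) input through a BSC(p) followed by a BEC(α) and compares it with (1 - α)(1 - H(p)); the parameter values are arbitrary examples:

```python
import numpy as np

def cascade_mutual_info(p, alpha):
    """I(X; Y~) in bits for X ~ Bern(1/2) through BSC(p) followed by BEC(alpha)."""
    # Rows are X = 0, 1; columns are the cascade outputs 0, e, 1.
    W = np.array([[(1 - p) * (1 - alpha), alpha, p * (1 - alpha)],
                  [p * (1 - alpha), alpha, (1 - p) * (1 - alpha)]])
    p_x = np.array([0.5, 0.5])
    p_y = p_x @ W
    H = lambda v: -np.sum(v[v > 0] * np.log2(v[v > 0]))
    return H(p_y) - sum(p_x[i] * H(W[i]) for i in range(2))

p, alpha = 0.1, 0.25
h = lambda q: -q * np.log2(q) - (1 - q) * np.log2(1 - q)
print(cascade_mutual_info(p, alpha), (1 - alpha) * (1 - h(p)))   # both ~0.3983 bits
```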

8. We wish to encode a Bernoulli(θ) process V1, V2, ... for transmission over a binary symmetric channel with error probability p. Find conditions on θ and p so that the probability of error P(V̂^n ≠ V^n) can be made to go to zero as n → ∞.

Solution :


Suppose we want to send a binary i.i.d. Bern(θ) source over a binary symmetric channel with error probability p. By the source-channel separation theorem, in order to achieve a probability of error that vanishes asymptotically, i.e. P(V̂^n ≠ V^n) → 0, we need the entropy of the source to be less than the capacity of the channel. Hence,

H(θ) + H(p) < 1,

or, equivalently,

θ^θ (1 - θ)^(1-θ) p^p (1 - p)^(1-p) > 1/2.
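A one-line check of this condition (not part of the original solution; the numbers are arbitrary examples):

```python
import numpy as np

h = lambda q: 0.0 if q in (0, 1) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def transmissible(theta, p):
    """True if a Bern(theta) source can be sent over a BSC(p) with vanishing error."""
    return h(theta) + h(p) < 1

print(transmissible(0.1, 0.05))   # True:  0.469 + 0.286 < 1
print(transmissible(0.3, 0.11))   # False: 0.881 + 0.500 > 1
```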

9. Let (Xi, Yi, Zi) be i.i.d. according to p(x, y, z). We will say that (x^n, y^n, z^n) is jointly typical (written (x^n, y^n, z^n) ∈ A_ε^(n)) if

2^{-n(H(X)+ε)} ≤ p(x^n) ≤ 2^{-n(H(X)-ε)}
2^{-n(H(Y)+ε)} ≤ p(y^n) ≤ 2^{-n(H(Y)-ε)}
2^{-n(H(Z)+ε)} ≤ p(z^n) ≤ 2^{-n(H(Z)-ε)}
2^{-n(H(X,Y)+ε)} ≤ p(x^n, y^n) ≤ 2^{-n(H(X,Y)-ε)}
2^{-n(H(X,Z)+ε)} ≤ p(x^n, z^n) ≤ 2^{-n(H(X,Z)-ε)}
2^{-n(H(Y,Z)+ε)} ≤ p(y^n, z^n) ≤ 2^{-n(H(Y,Z)-ε)}
2^{-n(H(X,Y,Z)+ε)} ≤ p(x^n, y^n, z^n) ≤ 2^{-n(H(X,Y,Z)-ε)}.

Now suppose that (X̃^n, Ỹ^n, Z̃^n) is drawn according to p(x^n) p(y^n) p(z^n). Thus X̃^n, Ỹ^n, Z̃^n have the same marginals as p(x^n, y^n, z^n) but are independent. Find (bounds on) Pr{(X̃^n, Ỹ^n, Z̃^n) ∈ A_ε^(n)} in terms of the entropies H(X), H(Y), H(Z), H(X,Y), H(X,Z), H(Y,Z) and H(X,Y,Z).

Solution :

Pr{(X̃^n, Ỹ^n, Z̃^n) ∈ A_ε^(n)} = Σ_{(x^n,y^n,z^n) ∈ A_ε^(n)} p(x^n) p(y^n) p(z^n)
  ≤ Σ_{(x^n,y^n,z^n) ∈ A_ε^(n)} 2^{-n(H(X)+H(Y)+H(Z)-3ε)}
  = |A_ε^(n)| · 2^{-n(H(X)+H(Y)+H(Z)-3ε)}
  ≤ 2^{n(H(X,Y,Z)+ε)} · 2^{-n(H(X)+H(Y)+H(Z)-3ε)}
  = 2^{n(H(X,Y,Z)-H(X)-H(Y)-H(Z)+4ε)}.

Also,

Pr{(X̃^n, Ỹ^n, Z̃^n) ∈ A_ε^(n)} = Σ_{(x^n,y^n,z^n) ∈ A_ε^(n)} p(x^n) p(y^n) p(z^n)
  ≥ Σ_{(x^n,y^n,z^n) ∈ A_ε^(n)} 2^{-n(H(X)+H(Y)+H(Z)+3ε)}
  = |A_ε^(n)| · 2^{-n(H(X)+H(Y)+H(Z)+3ε)}
  ≥ (1 - ε) 2^{n(H(X,Y,Z)-ε)} · 2^{-n(H(X)+H(Y)+H(Z)+3ε)}
  = (1 - ε) 2^{n(H(X,Y,Z)-H(X)-H(Y)-H(Z)-4ε)}.

Note that the upper bound is true for all n, but the lower bound only holds for n sufficiently large.
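To make the exponent concrete (not part of the original solution; the joint pmf below is a made-up example), the probability that independently drawn sequences look jointly typical decays roughly as 2^(-n(H(X)+H(Y)+H(Z)-H(X,Y,Z))):

```python
import numpy as np

# Hypothetical joint pmf p(x, y, z) on {0,1}^3, chosen only for illustration.
p = np.array([[[0.20, 0.05], [0.05, 0.10]],
              [[0.05, 0.10], [0.05, 0.40]]])

H = lambda q: -np.sum(q[q > 0] * np.log2(q[q > 0]))
Hx, Hy, Hz = H(p.sum(axis=(1, 2))), H(p.sum(axis=(0, 2))), H(p.sum(axis=(0, 1)))
rate = Hx + Hy + Hz - H(p)
print(rate)   # ~0.354: Pr decays like 2**(-n * rate), up to the +-4*eps slack in the exponent
```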


10. Twenty questions.

(a) Player A chooses some object in the universe, and player B attempts to identify the object with a series of yes-no questions. Suppose that player B is clever enough to use the code achieving the minimal expected length with respect to player A's distribution. We observe that player B requires an average of 38.5 questions to determine the object. Find a rough lower bound to the number of objects in the universe.

(b) Let X be uniformly distributed over {1, 2, ..., m}. Assume that m = 2^n. We ask random questions: Is X ∈ S1? Is X ∈ S2? ... until only one integer remains. All 2^m subsets S of {1, 2, ..., m} are equally likely.

i. How many deterministic questions are needed to determine X?
ii. Without loss of generality, suppose that X = 1 is the random object. What is the probability that object 2 yields the same answers as object 1 for k questions?
iii. What is the expected number of objects in {2, 3, ..., m} that have the same answers to the questions as those of the correct object 1?

Solution :

(a) 37.5 = L - 1 < H(X) ≤ log |X|, and hence the number of objects in the universe is greater than 2^{37.5} ≈ 1.94 × 10^{11}.

(b) i. Obviously, Huffman codewords for X are all of length n. Hence, with n deterministic questions, we can identify an object out of m = 2^n candidates.

ii. Observe that the total number of subsets which include both object 1 and object 2, or neither of them, is 2^{m-1}. Hence, the probability that object 2 yields the same answers as object 1 for k questions is (2^{m-1}/2^m)^k = 2^{-k}.

iii. Let

1_j = 1 if object j yields the same answers as object 1 for the k questions, and 1_j = 0 otherwise,

for j = 2, ..., m. Then

E[N] = E[ Σ_{j=2}^m 1_j ] = Σ_{j=2}^m E[1_j] = Σ_{j=2}^m 2^{-k} = (m - 1) 2^{-k} = (2^n - 1) 2^{-k},

where N is the number of objects in {2, 3, ..., m} with the same answers.
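The part (iii) answer can also be checked by simulation (not part of the original solution; the Monte Carlo routine below and its parameters are illustrative only):

```python
import random

def expected_matches(n, k, trials=10000, seed=0):
    """Estimate E[N]: objects in {2,...,2^n} answering like object 1 on k random questions."""
    rng = random.Random(seed)
    m = 2 ** n
    total = 0
    for _ in range(trials):
        survivors = range(2, m + 1)
        for _ in range(k):
            s = {j for j in range(1, m + 1) if rng.random() < 0.5}  # random subset S
            in_s = 1 in s
            survivors = [j for j in survivors if (j in s) == in_s]
        total += len(survivors)
    return total / trials

n, k = 6, 8
print(expected_matches(n, k), (2**n - 1) * 2**-k)   # ~0.246 (simulation) vs 0.2461 (formula)
```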
