Information Theory: Principles and Applications
Tiago T. V. Vinhoza
April 16, 2010
Outline

1. Channel Coding Theorem
   - Preview and some definitions
   - Achievability
   - Converse
2. Error Correcting Codes
3. Joint Source-Channel Coding
Channel Coding Theorem: Preview and some definitions

Why is the channel capacity important?

Shannon proved that the channel capacity is the maximum rate, in bits per channel use, at which information can be reliably transmitted over the channel. Reliably means that the probability of error can be made arbitrarily small. This is the content of the channel coding theorem.
Intuitive idea of channel capacity as a fundamental limit

Basic idea: for large block lengths, every channel looks like the noisy typewriter channel shown last class. The channel has a subset of inputs that produce disjoint sequences at the output. The argument rests on typicality.
For each typical input sequence of length n, there are approximately $2^{nH(Y|X)}$ possible Y sequences. Desirable: no two different X sequences produce the same Y output sequence.
The total number of typical Y sequences is approximately $2^{nH(Y)}$, so the total number of disjoint sets is at most
$$2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)}$$
Hence, at most $2^{nI(X;Y)}$ distinguishable sequences of length n can be sent.
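As a sanity check on this counting argument, here is a minimal sketch that evaluates the three exponents for a binary symmetric channel with uniform input; the crossover probability p and block length n are assumed values, not from the slides.

```python
import numpy as np

# Counting-argument exponents for a BSC with crossover p, uniform input.
p, n = 0.1, 100

def h2(q):
    """Binary entropy in bits."""
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

H_Y_given_X = h2(p)       # each input fans out to ~2^(n H(Y|X)) outputs
H_Y = 1.0                 # a uniform input keeps the BSC output uniform
I = H_Y - H_Y_given_X     # I(X;Y) = 1 - H(p)

print(f"fan-out per input: ~2^{n * H_Y_given_X:.1f} output sequences")
print(f"typical outputs in total: ~2^{n * H_Y:.1f}")
print(f"distinguishable inputs: at most ~2^{n * I:.1f} = 2^(nI(X;Y))")
```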
Formalizing the ideas

A message W is drawn from the index set $\{1, 2, \ldots, M\}$ and encoded as the signal $X^n(W)$.
The encoded signal passes through the channel and is received as the sequence $Y^n$. The channel is described by a transition probability matrix $P(Y^n | X^n)$.
The receiver guesses the index W using a decoding rule $\hat{W} = g(Y^n)$.
If $\hat{W} \neq W$, an error occurs.
Definition: Discrete Channel

A discrete channel $(\mathcal{X}, p_{Y|X=x}(y), \mathcal{Y})$ consists of two finite sets $\mathcal{X}$ and $\mathcal{Y}$ and a collection of probability distributions $p_{Y|X=x}(y)$, one for each $x \in \mathcal{X}$.
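A minimal sketch of this definition in code: the channel is just a transition matrix whose row x holds $p_{Y|X=x}(y)$. The BSC instance below is an assumed example, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

class DMC:
    def __init__(self, P):
        self.P = np.asarray(P)               # shape (|X|, |Y|)
        assert np.allclose(self.P.sum(axis=1), 1.0)  # rows are pmfs

    def transmit(self, xn):
        """Send x^n; memorylessness means each output symbol is drawn
        independently from row P[x_i]."""
        return np.array([rng.choice(self.P.shape[1], p=self.P[x])
                         for x in xn])

bsc = DMC([[0.9, 0.1],
           [0.1, 0.9]])                      # BSC with crossover 0.1
print(bsc.transmit([0, 1, 1, 0, 0]))
```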
Definition: Extension of the Discrete Memoryless Channel

The n-th extension of the discrete memoryless channel is the channel $(\mathcal{X}^n, p_{Y^n|X^n=x^n}(y^n), \mathcal{Y}^n)$, where
$$P(Y_k \mid X^k, Y^{k-1}) = P(Y_k \mid X_k), \quad k = 1, 2, \ldots, n.$$
If the channel is used without feedback, that is, if the inputs do not depend on the past outputs, then the channel transition function for the discrete memoryless channel factors as
$$p_{Y^n|X^n=x^n}(y^n) = \prod_{i=1}^{n} p_{Y_i|X_i=x_i}(y_i)$$
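As a small check of this factorization, the sketch below computes $p_{Y^n|X^n=x^n}(y^n)$ as the product of per-symbol transition probabilities; the transition matrix is an assumed BSC example.

```python
import numpy as np

def block_likelihood(P, xn, yn):
    """p_{Y^n|X^n=x^n}(y^n) = prod_i P[x_i, y_i] for a DMC."""
    return float(np.prod(P[np.asarray(xn), np.asarray(yn)]))

P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(block_likelihood(P, [0, 1, 1, 0], [0, 1, 0, 0]))  # 0.9^3 * 0.1
```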
Definition: Code for a channel

An (M, n) code for the channel $(\mathcal{X}, p_{Y|X=x}(y), \mathcal{Y})$ consists of the following:
- An index set $\{1, 2, \ldots, M\}$.
- An encoding function $X^n : \{1, 2, \ldots, M\} \to \mathcal{X}^n$, which generates the codewords $X^n(1), \ldots, X^n(M)$.
- A decoding function $g : \mathcal{Y}^n \to \{1, 2, \ldots, M\}$, which assigns a guess to each possible received vector.
Definition: Rate of a code

The rate R of an (M, n) code is
$$R = \frac{\log M}{n} \text{ bits per transmission.}$$
A rate R is achievable if there exists a sequence of $(2^{nR}, n)$ codes such that the maximal probability of error goes to zero as n goes to infinity.
The capacity of a discrete memoryless channel is the supremum of all achievable rates. Rates less than capacity yield arbitrarily small probability of error for sufficiently large n.
Definition: Probability of Error

Conditional probability of error given that index i was sent:
$$\lambda_i = P(g(Y^n) \neq i \mid X^n = X^n(i)) = \sum_{y^n} p_{Y^n|X^n=x^n(i)}(y^n)\, I(g(y^n) \neq i)$$
where $I(\cdot)$ is the indicator function.
Maximal probability of error:
$$\lambda^{(n)} = \max_{i \in \{1, 2, \ldots, M\}} \lambda_i$$
Average probability of error for an (M, n) code:
$$P_e^{(n)} = \frac{1}{M} \sum_{i=1}^{M} \lambda_i$$
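To make these quantities concrete, here is a hypothetical Monte Carlo estimate of $\lambda_i$, $\lambda^{(n)}$, and $P_e^{(n)}$ for a toy (M, n) = (2, 3) repetition code on a BSC; the code, decoder, and trial count are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, trials = 0.1, 20_000
codebook = np.array([[0, 0, 0],
                     [1, 1, 1]])            # X^n(1), X^n(2)

def decode(yn):
    """Majority-vote decoder g(y^n)."""
    return int(yn.sum() > len(yn) / 2)

lam = np.zeros(2)                            # lambda_i estimates
for i in range(2):
    noise = rng.random((trials, 3)) < p      # BSC bit flips
    received = (codebook[i] + noise) % 2
    lam[i] = np.mean([decode(y) != i for y in received])

print("lambda_i:", lam)                      # conditional error probabilities
print("maximal:", lam.max(), " average P_e:", lam.mean())
```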
Definition: Jointly Typical Sequences

The decoding procedure employed in the proofs decodes a channel output $Y^n$ as the i-th index if the codeword $X^n(i)$ is jointly typical with the received sequence $Y^n$.
The set $A_\epsilon^{(n)}$ of jointly typical sequences $\{(x^n, y^n)\}$ with respect to their joint distribution is the set of n-sequences with sample entropies $\epsilon$-close to the true entropies:
$$A_\epsilon^{(n)} = \Big\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \Big| -\frac{1}{n} \log p_{X^n}(x^n) - H(X) \Big| < \epsilon,$$
$$\Big| -\frac{1}{n} \log p_{Y^n}(y^n) - H(Y) \Big| < \epsilon, \quad \Big| -\frac{1}{n} \log p_{X^n Y^n}(x^n, y^n) - H(X, Y) \Big| < \epsilon \Big\}$$
Joint AEP

Let $(X^n, Y^n)$ be sequences of length n drawn i.i.d. according to the joint distribution $p_{X^n Y^n}(x^n, y^n) = \prod_{i=1}^{n} p_{X_i Y_i}(x_i, y_i)$. Then:
- $P((X^n, Y^n) \in A_\epsilon^{(n)}) \to 1$ as $n \to \infty$.
- $|A_\epsilon^{(n)}| \leq 2^{n(H(X,Y)+\epsilon)}$ and $|A_\epsilon^{(n)}| \geq (1-\epsilon)\, 2^{n(H(X,Y)-\epsilon)}$.
- If $(\tilde{X}^n, \tilde{Y}^n)$ are independent and have the same marginals as $X^n$ and $Y^n$, then
$$P((\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)}) = \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} p_{X^n}(x^n)\, p_{Y^n}(y^n) \leq 2^{n(H(X,Y)+\epsilon)}\, 2^{-n(H(X)-\epsilon)}\, 2^{-n(H(Y)-\epsilon)} = 2^{-n(I(X;Y)-3\epsilon)}$$
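A small simulation sketch of these statements follows: it estimates how often a jointly drawn pair lands in $A_\epsilon^{(n)}$ versus an independent pair with the same marginals. The joint pmf, n, $\epsilon$, and trial count are assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])                 # assumed joint pmf of (X, Y)
px, py = pxy.sum(1), pxy.sum(0)
H = lambda q: -(q * np.log2(q)).sum()
HX, HY, HXY = H(px), H(py), H(pxy.ravel())
n, eps, trials = 200, 0.1, 2000

def typical(xn, yn):
    """Check the three sample-entropy conditions defining A_eps^(n)."""
    ex = -np.log2(px[xn]).mean()
    ey = -np.log2(py[yn]).mean()
    exy = -np.log2(pxy[xn, yn]).mean()
    return abs(ex - HX) < eps and abs(ey - HY) < eps and abs(exy - HXY) < eps

flat = pxy.ravel()
joint = indep = 0
for _ in range(trials):
    idx = rng.choice(4, size=n, p=flat)        # draw (x_i, y_i) jointly
    xn, yn = idx // 2, idx % 2
    joint += typical(xn, yn)
    indep += typical(xn, rng.permutation(yn))  # same marginals, no dependence
print(f"jointly drawn pairs typical: {joint / trials:.3f}")   # -> near 1
print(f"independent pairs typical:  {indep / trials:.3f}")    # ~2^-nI(X;Y)
```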
There are about $2^{nH(X)}$ typical X sequences and about $2^{nH(Y)}$ typical Y sequences. However, since there are only $2^{nH(X,Y)}$ jointly typical sequences, not all pairs $(X^n, Y^n)$ with $X^n$ and $Y^n$ typical are jointly typical.
The probability that a randomly chosen pair is jointly typical is about $2^{-nI(X;Y)}$. So, for a fixed $Y^n$ sequence, we can consider about $2^{nI(X;Y)}$ pairs before we come across a jointly typical one. This suggests that there are about $2^{nI(X;Y)}$ distinguishable sequences $X^n$.
Channel Coding Theorem

Achievability: Consider a discrete memoryless channel with capacity C. All rates R < C are achievable. Specifically, for every rate R < C there exists a sequence of $(2^{nR}, n)$ codes with maximum probability of error arbitrarily small.
Converse: Conversely, any sequence of $(2^{nR}, n)$ codes whose maximum probability of error goes to zero must have $R \leq C$.
Channel Coding Theorem: Achievability

Generate a $(2^{nR}, n)$ code at random according to the distribution $p_X(x)$; that is, generate $2^{nR}$ codewords according to the probability distribution $p_{X^n}(x^n) = \prod_{i=1}^{n} p_{X_i}(x_i)$.
Exhibit the $2^{nR}$ codewords as the rows of the matrix
$$\mathcal{C} = \begin{pmatrix} x_1(1) & x_2(1) & \cdots & x_n(1) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(2^{nR}) & x_2(2^{nR}) & \cdots & x_n(2^{nR}) \end{pmatrix}$$
Reveal the code to both transmitter and receiver (both also know the channel transition matrix $P(Y|X)$).
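A sketch of this random codebook construction for a binary-input channel; the rate, block length, and input distribution are assumed values (a uniform input is capacity-achieving for the BSC).

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 12, 0.5
M = int(2 ** (n * R))                   # number of codewords, 2^(nR)
p_X = [0.5, 0.5]                        # assumed input distribution

codebook = rng.choice(2, size=(M, n), p=p_X)  # one codeword per row
print(codebook.shape)                   # (64, 12): rows are X^n(1..M)
```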
The message W is chosen according to a uniform distribution, that is, $P(W = w) = 2^{-nR}$ for $w = 1, 2, \ldots, 2^{nR}$.
The codeword $X^n(w)$, corresponding to the w-th row of the matrix $\mathcal{C}$, is sent over the channel.
The receiver gets the sequence $Y^n$ according to the distribution
$$p_{Y^n|X^n=x^n(w)}(y^n) = \prod_{i=1}^{n} p_{Y_i|X_i=x_i(w)}(y_i)$$
The receiver guesses the message using typical set decoding: it declares that index i was sent if
- $(X^n(i), Y^n)$ are jointly typical, and
- there is no other index j such that $(X^n(j), Y^n)$ are jointly typical.
Otherwise the receiver declares an error (see the decoder sketch below).
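A minimal sketch of this decoding rule; it reuses the typical(xn, yn) helper from the joint AEP sketch above, under the assumption that its pmfs px, py, pxy are set to the code's input distribution and the channel law.

```python
def typical_set_decode(codebook, yn):
    """Declare i iff X^n(i) is the unique codeword jointly typical with y^n."""
    matches = [i for i, xn in enumerate(codebook) if typical(xn, yn)]
    if len(matches) == 1:
        return matches[0] + 1        # the slides index messages from 1
    return None                      # zero or several matches: declare an error
```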
Analysis of the error probability: instead of calculating the probability of error for a single code, we compute the average over all codes generated at random according to the probability distribution $P(\mathcal{C})$.
There are two types of error events: the output $Y^n$ is not jointly typical with the transmitted codeword, or there is another codeword which is also jointly typical with $Y^n$.
The probability that the transmitted codeword and the received sequence are jointly typical goes to one, as shown by the joint AEP.
For the rival codewords, the probability that any one of them is jointly typical with the received sequence is about $2^{-nI(X;Y)}$, so we can use about $2^{nI(X;Y)}$ codewords and still have a small error probability.
Calculating the average error probability:
$$P(E) = \sum_{\mathcal{C}} P(\mathcal{C})\, P_e^{(n)}(\mathcal{C}) = \sum_{\mathcal{C}} P(\mathcal{C})\, \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \lambda_w(\mathcal{C}) = \frac{1}{2^{nR}} \sum_{w=1}^{2^{nR}} \sum_{\mathcal{C}} P(\mathcal{C})\, \lambda_w(\mathcal{C})$$
By the symmetry of the code construction, the average probability of error over all codes does not depend on the particular index that was sent: $\sum_{\mathcal{C}} P(\mathcal{C})\, \lambda_w(\mathcal{C})$ is not a function of w.
Assume WLOG that W = 1 was sent. Then
$$P(E) = \sum_{\mathcal{C}} P(\mathcal{C})\, \lambda_1(\mathcal{C}) = P(E|W = 1)$$
Define the events
$$E_i = \{(X^n(i), Y^n) \in A_\epsilon^{(n)}\}, \quad i = 1, 2, \ldots, 2^{nR}$$
The error events in our case are:
- $E_1^c$, the complement of $E_1$, occurs. This means that $Y^n$ and $X^n(1)$ are not jointly typical.
- $E_2$ or $E_3$ or ... or $E_{2^{nR}}$ occurs. This means that a wrong codeword is jointly typical with $Y^n$.
Evaluating the error probability:
$$P(E|W = 1) = P(E_1^c \cup E_2 \cup E_3 \cup \ldots \cup E_{2^{nR}}) \leq P(E_1^c) + \sum_{i=2}^{2^{nR}} P(E_i)$$
The inequality is due to the union bound.
By the joint AEP, $P(E_1^c) < \epsilon$ for sufficiently large n.
As $X^n(1)$ and $X^n(i)$ are independent (by the code generation procedure), it follows that $Y^n$ and $X^n(i)$ are also independent if $i \neq 1$. Hence, from the joint AEP,
$$P(E_i) \leq 2^{-n(I(X;Y)-3\epsilon)} \quad \text{if } i \neq 1.$$
Continuing the evaluation:
$$P(E|W = 1) \leq \epsilon + \sum_{i=2}^{2^{nR}} 2^{-n(I(X;Y)-3\epsilon)} = \epsilon + (2^{nR} - 1)\, 2^{-n(I(X;Y)-3\epsilon)}$$
$$\leq \epsilon + 2^{nR}\, 2^{-n(I(X;Y)-3\epsilon)} = \epsilon + 2^{3n\epsilon}\, 2^{-n(I(X;Y)-R)} \leq 2\epsilon$$
if n is sufficiently large and $R < I(X;Y) - 3\epsilon$.
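A numeric reading of this bound for assumed values $I(X;Y) = 0.5$, $R = 0.4$, $\epsilon = 0.01$: once n is large, the second term vanishes and $P(E|W=1) \leq 2\epsilon$.

```python
# Evaluate eps + 2^(3 n eps) * 2^(-n (I(X;Y) - R)) for growing n.
I_XY, R, eps = 0.5, 0.4, 0.01
for n in (50, 200, 1000):
    print(n, eps + 2 ** (3 * n * eps) * 2 ** (-n * (I_XY - R)))
# -> 0.098, 0.01006, 0.01...: the rival-codeword term dies off.
```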
If $R < I(X;Y)$, we can choose $\epsilon$ and n so that the average probability of error over all codebooks is less than $2\epsilon$.
If the input distribution $p_X(x)$ is the one that achieves the channel capacity C, then the achievability condition becomes R < C.
If the average probability of error over all codebooks is less than $2\epsilon$, then there exists at least one codebook $\mathcal{C}^*$ with an average probability of error $P_e^{(n)} \leq 2\epsilon$:
$$2\epsilon \geq \frac{1}{2^{nR}} \sum_{i} \lambda_i(\mathcal{C}^*) = P_e^{(n)}$$
This implies that at least half of the indices i and their codewords have $\lambda_i < 4\epsilon$. Using only this best half of the codewords, we have $2^{nR-1}$ codewords and the new rate $R' = R - 1/n \to R$ for large n.
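A toy sketch of this expurgation step with made-up $\lambda_i$ values: discarding the worse half of the codewords leaves only indices whose conditional error probability is below four times the average.

```python
import numpy as np

lam = np.array([0.01, 0.30, 0.02, 0.05, 0.40, 0.03, 0.02, 0.04])
keep = np.argsort(lam)[: len(lam) // 2]      # indices of the best half
print("kept:", keep, "max lambda_i kept:", lam[keep].max())
assert lam[keep].max() < 4 * lam.mean()      # Markov-type argument
```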
Channel Coding Theorem: Converse

We now have to show that any sequence of $(2^{nR}, n)$ codes with $\lambda^{(n)} \to 0$ must have $R \leq C$.
If the maximal error probability goes to zero, the average error probability $P_e^{(n)}$ also goes to zero. For each n, let W be drawn from a uniform distribution over $\{1, 2, \ldots, 2^{nR}\}$. Since W is uniform,
$$P_e^{(n)} = P(\hat{W} \neq W)$$
We will resort to Fano's inequality to prove the converse.
Proving the converse:
$$nR = H(W) = H(W|Y^n) + I(W; Y^n)$$
$$\leq H(W|Y^n) + I(X^n(W); Y^n)$$
$$\leq 1 + P_e^{(n)} nR + I(X^n(W); Y^n)$$
$$\leq 1 + P_e^{(n)} nR + nC$$
The second line uses the data processing inequality, the third uses Fano's inequality, and the last uses the fact that $I(X^n; Y^n) \leq nC$ for a memoryless channel used without feedback.
Dividing by n and rewriting, we get
$$P_e^{(n)} \geq 1 - \frac{C}{R} - \frac{1}{nR}$$
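A numeric reading of this lower bound for assumed values C = 0.5 and R = 0.6: trying to signal above capacity keeps the error probability bounded away from zero, whatever the code.

```python
# Converse bound P_e^(n) >= 1 - C/R - 1/(nR) for a rate above capacity.
C, R = 0.5, 0.6
for n in (10, 100, 1000):
    print(n, max(0.0, 1 - C / R - 1 / (n * R)))  # -> tends to 1 - C/R = 1/6
```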
Channel Coding Theorem

The theorem shows that good codes exist with exponentially small probability of error for long block lengths, but it provides no systematic way of constructing such codes.
Random codes have no structure, so decoding amounts to a lookup table, whose size becomes huge for large blocks.
In the 1950s, coding theorists started searching for good codes and devising efficient implementations.
Error Correcting Codes

Error control coding: the addition of redundancy in a smart way to combat errors induced by the channel. It serves two purposes:
- Error detection
- Error correction
Topics in this part:
- Block Codes
- Linear Codes: Encoding and Decoding
- Hamming Codes (a small Hamming(7,4) sketch follows below)
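As a taste of these topics, here is a minimal sketch of the Hamming(7,4) code: it encodes 4 data bits into 7 and corrects any single bit flip via the syndrome. The systematic generator and parity-check matrices are the standard textbook choice.

```python
import numpy as np

G = np.array([[1,0,0,0,1,1,0],        # generator, systematic [I | P]
              [0,1,0,0,1,0,1],
              [0,0,1,0,0,1,1],
              [0,0,0,1,1,1,1]])
H = np.array([[1,1,0,1,1,0,0],        # parity check, [P^T | I]
              [1,0,1,1,0,1,0],
              [0,1,1,1,0,0,1]])

def encode(u):
    return u @ G % 2                   # codeword c = uG (mod 2)

def correct(r):
    s = H @ r % 2                      # syndrome s = H r^T
    if s.any():                        # nonzero syndrome: one bit flipped
        pos = next(j for j in range(7) if (H[:, j] == s).all())
        r = r.copy(); r[pos] ^= 1      # flip it back
    return r

c = encode(np.array([1, 0, 1, 1]))
r = c.copy(); r[2] ^= 1                # channel flips one bit
print(np.array_equal(correct(r), c))   # True: single error corrected
```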
Joint Source-Channel Coding

The source coding theorem states that data compression requires R > H. The channel coding theorem states that data transmission requires R < C.
Is the condition H < C necessary and sufficient for sending a source over a channel?
Consider a source modeled by a finite alphabet stochastic process $V^n = V_1, V_2, \ldots, V_n$ with entropy rate $H(\mathcal{V})$ that satisfies the AEP.
- Achievability: the source can be sent reliably over a discrete memoryless channel with capacity C if $H(\mathcal{V}) < C$.
- Converse: the source cannot be sent reliably over a discrete memoryless channel with capacity C if $H(\mathcal{V}) > C$.
The source-channel separation theorem shows that it is possible to design the source code and the channel code separately and combine the results to achieve optimal performance.
This optimality is asymptotic: for finite block lengths, the probability of error can be reduced by using joint source-channel coding.
The separation, however, fails for some multiuser channels.
Next Steps

- Lossy Source Coding
- Multiple-user Information Theory