
Information Theory and Coding

Wadih Sawaya

General Introduction

Communication systems: Shannon's paradigm

2. Communication Systems

• Communication systems are designed to transmit information generated by a source to some destination.

• Between the source and the destination lies a communication channel affected by various disturbances.

SOURCE → CHANNEL → RECEIVER (disturbances act on the channel)

Figure: Block diagram of a communication system: Shannon's paradigm

TELECOM LILLE 1 - February 2010 - Information Theory and Coding


3. Communication Systems

SOURCE → CHANNEL → RECEIVER (disturbances act on the channel)

• Source: information is emitted from the source by means of a sequence of symbols.

• Channel: the disturbed channel may introduce changes in the emitted sequence.

• Receiver: the user of the information has to reproduce the exact emitted sequence in order to extract the information.


4. Communication Systems

The designer of a communication system is asked to:

1. ensure a high quality of transmission, with an error rate in the reproduced sequence that is as low as possible.
 ▪ Different user requirements may lead to different criteria of acceptability.
 ▪ Ex: speech transmission, data, audio/video, …

2. provide the highest possible information rate through the channel, because:
 ▪ The use of a channel is costly.
 ▪ The channel employs different limited resources (time, frequency, power, …).


5. Communication Systems

SOURCE → Source Coding → Channel Coding → CHANNEL → Channel Decoding → Source Decoding → User

Figure: Extension of Shannon's paradigm

• Source: delivers information as a sequence of source symbols.
• Source coding: provides, on average, the shortest description of the emitted sequences ⇒ higher information rate.
• Channel: introduces disturbances.
• Channel coding: protects the information from errors induced by the channel by deliberately adding redundancy to the information ⇒ higher quality of transmission.


Course Contents

Part I – An Information measure.

Part II – Source Coding

Part III – The Communication Channel.

Part IV – Channel Coding.


Part I – An Information measure.


PREAMBLE
In 1948 C. E. Shannon developed “A Mathematical Theory of Communication”, now called information theory. This theory deals with the most fundamental aspects of a communication system. It relies on probability theory and is primarily concerned with encoders and decoders, both in terms of their functional role and in terms of the existence of encoders and decoders that achieve a given level of performance. The latter aspect of the theory is established by means of two fundamental theorems.

As with any mathematical theory, it deals only with mathematical models and not with physical sources and physical channels. To proceed, we will study the simplest classes of mathematical models of sources and channels. Naturally, the choice of these models is influenced by the most important real physical sources and channels.

After understanding the theory, we will focus on the practical implementation of channel coding and decoding, guided by the relationships established by the theory, which give useful indications of the tradeoffs that exist in constructing encoders and decoders.


9. An Information measure
• A discrete source delivers a sequence of symbols from the alphabet {x1, x2, …, xM}.

 ▪ Each symbol of this sequence is thus a random outcome taking its value from the finite alphabet {x1, x2, …, xM}.

• To construct a mathematical model, we consider the set X of all possible outcomes as the alphabet of the source, say {x1, x2, …, xM}.
 ▪ Each outcome s = xi corresponds to one particular symbol of the set.
 ▪ A probability measure Pk is associated with each symbol:

$$P_k = P(s = x_k), \quad 1 \le k \le M; \qquad \sum_{k=1}^{M} P_k = 1$$


10. An Information measure

• If a symbol emitted by a source were known exactly in advance, there would be no need to transmit it.

• The information content carried by one particular symbol is thus strictly related to its uncertainty.
 ▪ Example: In the city of Madrid, in July, the weather forecast “Rain” contains much more information than the forecast “Sunny”.

• The information content of a symbol xi is a decreasing function of the probability of its realization:

$$Q(x_i) > Q(x_j) \iff P(x_i) < P(x_j)$$

• The information content associated with two independent symbols xi and xj is the sum of their individual information contents:

$$P(x_i, x_j) = P(x_i)\,P(x_j) \iff Q(x_i, x_j) = Q(x_i) + Q(x_j)$$


11. An Information Measure
• The mathematical function that satisfies these two conditions is the logarithm.

• Each symbol xi has its information content defined by:

$$Q(x_i) \triangleq \log_a\!\left(\frac{1}{P_i}\right)$$

• The base a of the logarithm determines the unit of measure assigned to the information content. When a = 2, the unit is the “bit”.
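The definition above can be checked numerically. The following is a minimal sketch in Python; the function name `information_content` is an illustrative choice and does not come from the course.

```python
import math

def information_content(p, base=2):
    """Information content Q = log_a(1/p) of an outcome of probability p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p) / math.log(base)

# With base 2 the result is expressed in bits.
print(information_content(1 / 2))  # one of two equally likely symbols
print(information_content(2 / 3))  # a likely outcome carries little information
print(information_content(1 / 3))  # a less likely outcome carries more
```

Rarer outcomes give larger values, in line with the requirement that Q decreases as the probability increases.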



12. An Information Measure
• Examples:

1) The correct identification of one of two equally likely symbols, that is, P(x1) = P(x2) = 1/2, conveys an amount of information equal to Q(x1) = Q(x2) = log2(2) = 1 bit.

2) The information content of each outcome when tossing a fair coin is Q(“Head”) = Q(“Tail”) = log2(2) = 1 bit.

3) Consider the Bernoulli distribution (probability measure of two possible events “1” and “0”) with P(X = “0”) = 2/3 and P(X = “1”) = 1/3. The information content of each outcome is:

$$Q(\text{“0”}) = \log_2\!\left(\frac{1}{2/3}\right) = 0.585 \text{ bits}, \qquad Q(\text{“1”}) = \log_2\!\left(\frac{1}{1/3}\right) = 1.585 \text{ bits}$$


13. Entropy of a finite alphabet

• We define the entropy of a finite alphabet as the average information content over all its possible outcomes:

$$H(X) = \sum_{k=1}^{M} P_k\, Q(x_k) = \sum_{k=1}^{M} P_k \log\!\left(\frac{1}{P_k}\right)$$

• The entropy characterizes the finite alphabet of the source on average; with base-2 logarithms it is measured in bits/symbol.
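As a quick numerical illustration of this definition, the sketch below computes the entropy of a finite alphabet in bits/symbol. The function name `entropy` and the convention that zero-probability symbols contribute nothing are assumptions made for this example.

```python
import math

def entropy(probs):
    """Entropy in bits/symbol: H = sum_k P_k * log2(1 / P_k).

    Symbols with zero probability are skipped (p * log(1/p) tends to 0).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))  # four symbols with unequal probabilities
print(entropy([1/4] * 4))             # four equally likely symbols
print(entropy([1.0, 0.0]))            # a certain outcome carries no information
```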



14. Entropy of a finite alphabet

Example 1:

 ▪ Alphabet: {x1, x2, x3, x4}
 ▪ Probabilities: P1 = 1/2; P2 = 1/4; P3 = P4 = 1/8

$$\Rightarrow \text{Entropy: } H(X) = \sum_{k=1}^{4} P_k \log_2\!\left(\frac{1}{P_k}\right) = 1.75 \text{ bits/symbol}$$


15. Entropy of a finite alphabet

• Example 2:
 ▪ Alphabet of M equally likely symbols: Pk = 1/M for all k ∈ {1, …, M}

$$\Rightarrow \text{Entropy: } H(X) = \sum_{k=1}^{M} \frac{1}{M} \log_2(M) = \log_2(M) \text{ bits/symbol}$$

• Example 3:
 ▪ Binary alphabet: {0, 1}, with p0 = Px and p1 = 1 − Px

$$\Rightarrow \text{Entropy: } H(X) = P_x \log_2\!\left(\frac{1}{P_x}\right) + (1 - P_x) \log_2\!\left(\frac{1}{1 - P_x}\right) \triangleq H_f(P_x)$$


16. Entropy of a finite alphabet

Figure: Entropy of a binary alphabet, H_f(Px), in bits/symbol (vertical axis, from 0 to 1) versus the probability Px (horizontal axis, from 0 to 1).


17. Entropy of a finite alphabet

• The maximum occurs for Px = 0.5, that is, when the two symbols are equally likely. This result is fairly general:

 ▪ Theorem 1: The entropy H(X) of a discrete alphabet of M symbols satisfies the inequality:

$$H(X) \le \log M$$

 with equality when the symbols are equally likely.

 ▪ Exercise: Prove Theorem 1.


18. Conditional Entropy

• We now extend the definition to a random variable given another one: the conditional entropy H(X/Y) is defined as:

$$H(X/Y) = \sum_{k=1}^{M_X} \sum_{l=1}^{M_Y} P(X = x_k, Y = y_l) \log\!\left(\frac{1}{P(X = x_k / Y = y_l)}\right)$$

 ▪ Example: joint probabilities P(X = x, Y = y):

          X=1     X=2     X=3     X=4
 Y=1      1/8     1/16    1/32    1/32
 Y=2      1/16    1/8     1/32    1/32
 Y=3      1/16    1/16    1/16    1/16
 Y=4      1/4     0       0       0

Determine H(X), H(Y) and H(X/Y).
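One way to check a hand calculation for this example is sketched below. It assumes that the rows of the table are the values of Y and the columns the values of X; the variable and function names are illustrative.

```python
import math
from fractions import Fraction as F

# Joint probabilities P(X = x, Y = y): rows are Y = 1..4, columns are X = 1..4.
joint = [
    [F(1, 8),  F(1, 16), F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 8),  F(1, 32), F(1, 32)],
    [F(1, 16), F(1, 16), F(1, 16), F(1, 16)],
    [F(1, 4),  F(0),     F(0),     F(0)],
]

def entropy(probs):
    """Entropy in bits of a list of probabilities (zero entries are skipped)."""
    return sum(float(p) * math.log2(1 / p) for p in probs if p > 0)

p_y = [sum(row) for row in joint]        # marginal of Y: row sums
p_x = [sum(col) for col in zip(*joint)]  # marginal of X: column sums

# H(X/Y) = sum over (x, y) of P(x, y) * log2(1 / P(x / y)), with P(x / y) = P(x, y) / P(y)
h_x_given_y = sum(
    float(p_xy) * math.log2(p_y[l] / p_xy)
    for l, row in enumerate(joint)
    for p_xy in row
    if p_xy > 0
)

h_x, h_y = entropy(p_x), entropy(p_y)
print(h_x, h_y, h_x_given_y)
```

Exact fractions are used for the table so that the marginals are free of rounding error before the logarithms are taken.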


19. Relative Entropy or Kullback-Leibler divergence.
• The entropy of a random variable is a measure of its uncertainty, or of the amount of information needed on average to describe it.

• Relative entropy is a measure of the discrepancy between two distributions. It quantifies the inefficiency of assuming that the distribution is q when the true one is p.

 ▪ Definition: The relative entropy, or Kullback-Leibler divergence, between two probability mass functions p(x) and q(x) is defined as:

$$D(p \,\|\, q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}$$

 ▪ Example: Determine D(p‖q) for p(0) = p(1) = 1/2 and q(0) = 3/4, q(1) = 1/4.

 ▪ Relative entropy is always non-negative, and is zero if and only if q = p.
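The relative entropy of the example above can be evaluated with a few lines of Python. This is a minimal sketch using base-2 logarithms; the function name `kl_divergence` is an illustrative choice, and the code assumes q(x) > 0 wherever p(x) > 0.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p||q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/2, 1/2]
q = [3/4, 1/4]
print(kl_divergence(p, q))  # D(p||q)
print(kl_divergence(q, p))  # D(q||p) is different: the divergence is not symmetric
print(kl_divergence(p, p))  # zero when both distributions coincide
```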


20. Mutual Information

• The mutual information is a measure of the amount of information that one random variable contains about another random variable.
 ▪ Definition: Consider two random variables X and Y with joint probability mass function p(x,y) and marginal probability mass functions p(x) and p(y). The mutual information I(X;Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

$$I(X;Y) = \sum_{k=1}^{M_X} \sum_{l=1}^{M_Y} p(X = x_k, Y = y_l) \log\!\left(\frac{p(X = x_k, Y = y_l)}{p(X = x_k)\, p(Y = y_l)}\right)$$

 ▪ Theorem 2:
 I(X;Y) = H(X) − H(X/Y)
 I(X;Y) = H(Y) − H(Y/X)
 I(X;Y) = H(X) + H(Y) − H(X,Y)


21. Mutual Information

• From Theorem 2, the mutual information can be written in the form:

$$I(X;Y) = \sum_{k=1}^{M_X} \sum_{l=1}^{M_Y} p(X = x_k, Y = y_l) \log\!\left(\frac{p(X = x_k / Y = y_l)}{p(X = x_k)}\right)$$

• The relationship between all these entropies can be expressed in a Venn diagram: H(X) and H(Y) are two overlapping sets whose union is H(X,Y) and whose intersection is I(X;Y); the part of H(X) outside the intersection is H(X/Y), and the part of H(Y) outside the intersection is H(Y/X).
22. Mutual Information

• Example: You have a jar containing 30 red cubes, 20 red spheres, 10 white cubes and 40 white spheres. To quantify the amount of information that the geometrical form contains about the color, determine the mutual information between the two random variables.

• Later on, we will focus on the mutual information as the amount of information that can reliably pass through a communication channel.
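The jar example can be worked through numerically. The sketch below assumes that an object is drawn uniformly at random from the jar, so that the joint probabilities are the counts divided by 100; the variable names are illustrative.

```python
import math

# Number of objects of each (color, shape) in the jar.
counts = {
    ("red", "cube"): 30,
    ("red", "sphere"): 20,
    ("white", "cube"): 10,
    ("white", "sphere"): 40,
}
total = sum(counts.values())
joint = {pair: n / total for pair, n in counts.items()}

# Marginal distributions of color and shape.
p_color, p_shape = {}, {}
for (color, shape), p in joint.items():
    p_color[color] = p_color.get(color, 0.0) + p
    p_shape[shape] = p_shape.get(shape, 0.0) + p

# I(color; shape) = sum p(c, s) * log2( p(c, s) / (p(c) p(s)) )
mutual_information = sum(
    p * math.log2(p / (p_color[c] * p_shape[s]))
    for (c, s), p in joint.items()
    if p > 0
)
print(mutual_information)  # in bits
```

The result is well below the 1 bit of uncertainty on the color, since knowing the shape only shifts the odds between red and white.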



23. Chain Rules

• Definition: The joint entropy of a pair of discrete random variables (X,Y) with joint distribution p(x,y) is defined as:

$$H(X,Y) = \sum_{k=1}^{M_X} \sum_{l=1}^{M_Y} P(X = x_k, Y = y_l) \log\!\left(\frac{1}{P(X = x_k, Y = y_l)}\right)$$

• Theorem 3 (Chain rule):

$$H(X,Y) = H(X) + H(Y/X)$$

• Definition: The conditional mutual information of random variables X and Y given Z is defined by:

$$I(X;Y/Z) = H(X/Z) - H(X/Y,Z)$$

• Theorem 4 (Chain rule for mutual information):

$$I(X_1, X_2; Y) = I(X_1; Y) + I(X_2; Y / X_1)$$


24. Information inequalities

• Using Jensen's inequality (for a convex function f of a random variable X, E[f(X)] ≥ f(E[X])), we can prove the following inequality:

$$D(p \,\|\, q) \ge 0$$

• and then:

$$I(X;Y) \ge 0$$

• Conditioning reduces entropy:

$$H(X/Y) \le H(X)$$


25. Data Processing Inequality

• Definition: Random variables X, Y and Z form a Markov chain (X→Y→Z) if:

$$P(x, y, z) = P(x)\,P(y/x)\,P(z/y)$$

 ▪ Markovity implies conditional independence: P(x, z / y) = P(x / y) P(z / y)

• Theorem 5 (Data processing inequality): If X→Y→Z, then:

$$I(X;Y) \ge I(X;Z)$$

 ▪ No processing of Y, deterministic or random, can increase the information that Y contains about X.
 ▪ If X→Y→Z, then: I(X;Y/Z) ≤ I(X;Y)


26. The discrete stationary source

• So far we have studied the average information content of the set of all possible outcomes, identified with the alphabet of the discrete source.
• We are now interested in the information content per symbol in a long sequence of symbols delivered by the discrete source, whether or not the emitted symbols are correlated in time.
• The source can be modeled as a stochastic process. A source is stationary if its statistics are the same regardless of the time origin.
• Let (X1, X2, …, Xk) be a sequence of k random variables, not necessarily independent, emitted by a source with an alphabet of size M.

 ▪ The entropy of the k-dimensional alphabet is H(X^k).
 ▪ The entropy per symbol of a sequence of k symbols is naturally defined as:

$$H_k(X) = \frac{1}{k} H(X^k)$$


27. The discrete stationary source

• Definition: the entropy rate of the source is the average information content per source symbol, that is:

$$H_\infty(X) = \lim_{k \to \infty} \frac{1}{k} H(X^k) \text{ bits/symbol}$$

• Theorem 6: For a stationary source this limit exists and is equal to the limit of the conditional entropy:

$$\lim_{k \to \infty} H(X_k / X_{k-1}, \ldots, X_1)$$

 ▪ For a discrete memoryless source (DMS), each emitted symbol is independent of all previous ones, and the entropy rate of the source is equal to the entropy of the source alphabet:

$$H_\infty(X) = H(X)$$

 ▪ Otherwise, one can show that: 0 ≤ H∞(X) ≤ H(X)


28. Entropy of a continuous ensemble
• The symbol delivered by the source is a continuous random variable x taking values in the set of real numbers, with probability density function p(x).

• The entropy of a continuous alphabet with probability density p(x) is:

$$H(X) = -\int_{-\infty}^{+\infty} p(x) \log_2 p(x)\, dx$$

Remark: This entropy is not necessarily positive, and not necessarily finite.

• Theorem 7: Let x be a continuous random variable with probability density function p(x). If x has a finite variance σx², then H(X) exists and satisfies the inequality:

$$H(X) \le \frac{1}{2} \log_2\!\left(2\pi e\, \sigma_x^2\right)$$

with equality if and only if X ~ N(µ, σx²).


Part II – Source Coding


30. Coding of the source alphabet.
• Suppose that we want to transmit each symbol using a binary channel (a channel able to convey binary symbols).

• The role of the source encoder is to represent each symbol of the source by a finite string of digits (a codeword).

• Efficient communication involves transmitting a symbol in the shortest possible time. This implies representing the symbol with a codeword that is as short as possible.
 ▪ More generally, the best source code is the one that has, on average, the shortest description length for the messages to be transmitted by the source.


31. Coding of the source alphabet.
• Each symbol is assigned a codeword, and different codewords may have different lengths. The average codeword length is:

$$\bar{n} \triangleq \sum_{k=1}^{M} P_k\, n_k$$

where nk is the length (number of digits) of the codeword representing the symbol xk of probability Pk.

• The source encoder must be designed to convey messages with the smallest possible average length of binary codeword strings (concise messages).

• The source encoder must also be designed so that the code is uniquely decodable. In other words, any sequence of codewords has only one possible sequence of source symbols producing it.


32. Coding of the source alphabet.

• Example:
 Symbol   Codeword
 x1       0
 x2       01
 x3       10
 x4       100

The binary sequence 010010 could correspond to any of the five messages: x1x3x2x1, x1x3x1x3, x1x4x3, x2x1x1x3 or x2x1x2x1

⇒ this code is ambiguous: it is not uniquely decipherable.
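The ambiguity can be verified by enumerating every way of splitting the received sequence into codewords. The following is a small recursive sketch; the function name `parses` is an illustrative choice.

```python
def parses(bits, code):
    """Return every way to split `bits` into codewords of `code` (symbol -> codeword)."""
    if not bits:
        return [[]]  # one way to parse the empty string: no symbols
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Use this codeword, then parse whatever remains.
            results += [[symbol] + rest for rest in parses(bits[len(word):], code)]
    return results

code = {"x1": "0", "x2": "01", "x3": "10", "x4": "100"}
messages = parses("010010", code)
for message in messages:
    print("".join(message))
print(len(messages), "possible messages")
```

A uniquely decodable code would produce at most one parse for any binary sequence.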



33. Coding of the source alphabet.

• A sufficient condition for unique decipherability: “no codeword is a prefix of a longer codeword”. Codes satisfying this constraint are called prefix codes.

 Symbol   Codeword
 x1       0
 x2       10
 x3       110
 x4       111

 Figure: Binary code tree of this prefix code; the symbols x1, …, x4 sit at the leaves and each branch is labeled 0 or 1.

• Theorem 8 (Kraft inequality): If the integers n1, n2, …, nK satisfy the inequality

$$\sum_{k=1}^{K} 2^{-n_k} \le 1$$

then a binary prefix code exists with these integers as codeword lengths.

Note: The theorem does not say that any code whose lengths satisfy this inequality is a prefix code.
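The note can be made concrete with a short sketch that evaluates the Kraft sum and tests the prefix condition separately. The function names and the third example code are illustrative assumptions, not taken from the course.

```python
from fractions import Fraction

def kraft_sum(lengths):
    """Exact value of sum_k 2^(-n_k) for a list of codeword lengths."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def is_prefix_code(codewords):
    """True if no codeword is a prefix of another (distinct) codeword."""
    return not any(a != b and b.startswith(a) for a in codewords for b in codewords)

examples = {
    "prefix code": ["0", "10", "110", "111"],
    "ambiguous code": ["0", "01", "10", "100"],
    "Kraft holds, not prefix": ["0", "01", "11"],
}
for name, words in examples.items():
    lengths = [len(w) for w in words]
    print(name, kraft_sum(lengths), is_prefix_code(words))
```

The last example has lengths (1, 2, 2), which satisfy the inequality, yet "0" is a prefix of "01". A prefix code with the same lengths does exist, for instance 0, 10, 11.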



34. Bound on optimal codelength.

• Theorem 9: A binary code satisfying the prefix constraint can be found for any alphabet of entropy H(X), with an average codeword length satisfying the inequality:

$$H(X) \le \bar{n} < H(X) + 1$$

• We can define the efficiency of a code as:

$$\varepsilon \triangleq \frac{H(X)}{\bar{n}}$$

• Exercise: Prove Theorem 9.

 ▪ Hint: 1) Prove that $H(X) - \bar{n} \le 0$.

 2) Choose nk to be the integer satisfying: $2^{-n_k} \le P(x_k) < 2^{-n_k + 1}$


35. Source Coding example:
The Huffman Coding algorithm.

• A method for constructing such a code was given by Huffman.

1. Arrange the symbols in decreasing order of probability.

2. Group the last two symbols xM and xM-1 (the two least probable) into an equivalent symbol, with probability PM + PM-1.

3. Repeat steps 1 and 2 until only one “symbol” is left.

4. Assign the binary digits 0 and 1 to each pair of branches departing from the intermediate nodes of the resulting tree.
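The four steps above can be sketched as follows. This is one possible implementation, not necessarily the one intended in the course: it uses a priority queue instead of re-sorting the list at every step, and the names `huffman_code` and `probs` are illustrative. The probabilities are a hypothetical four-symbol source.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Binary Huffman code for a dict symbol -> probability; returns symbol -> codeword."""
    tiebreak = count()  # keeps heap entries comparable when probabilities are equal
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable group
        p1, _, group1 = heapq.heappop(heap)  # second least probable group
        # Merge the two groups, prepending one digit to every codeword below the new node.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"x1": 0.45, "x2": 0.35, "x3": 0.1, "x4": 0.1}
code = huffman_code(probs)
average_length = sum(p * len(code[s]) for s, p in probs.items())
print(code)
print(average_length)  # average number of digits per symbol
```

The 0/1 labels on the branches are arbitrary, so the codewords may differ from a tree drawn by hand; only the codeword lengths, and hence the average length, are determined by the algorithm (up to ties).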



36. Huffman Coding algorithm.
• Example: H(X) = 1.713 bits/symbol

 Symbol   Probability   Fixed-length code   Huffman code
 x1       0.45          00                  0
 x2       0.35          01                  10
 x3       0.1           10                  110
 x4       0.1           11                  111

 ▪ Fixed-length code: n̄ = 2 digits/sym, ε = 85.6%
 ▪ Huffman code: n̄ = 1.75 digits/sym, ε = 97.9%

 Figure: Huffman tree. x3 (0.1) and x4 (0.1) are merged into a node of probability 0.2; this node and x2 (0.35) are merged into a node of probability 0.55; this node and x1 (0.45) are merged into the root. At each merge one branch is labeled 0 and the other 1.


37. The Asymptotic Equipartition Property (AEP)
• The AEP is the analog of the weak law of large numbers, which states that for independent, identically distributed (i.i.d.) random variables, the sample mean $\frac{1}{n}\sum_{i=1}^{n} x_i$ is close to the statistical mean E[X] with a probability that approaches 1 as n tends toward infinity.
• Theorem 10: If X1, X2, …, Xn are i.i.d. ~ p(x), then:

$$-\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \to H(X) \quad \text{in probability}$$

 ▪ Definition: The “typical set” $A_\varepsilon^{(n)}$ is the set defined as:

$$A_\varepsilon^{(n)} = \left\{ (x_1, x_2, \ldots, x_n) : 2^{-n(H(X) + \varepsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X) - \varepsilon)} \right\}$$
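38. The Asymptotic Equipartition Property (AEP)
• Theorem 11:

1. If $(x_1, x_2, \ldots, x_n) \in A_\varepsilon^{(n)}$, then $H(X) - \varepsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \varepsilon$

2. $\Pr\{A_\varepsilon^{(n)}\} > 1 - \varepsilon$ for n sufficiently large

3. $|A_\varepsilon^{(n)}| \le 2^{n(H(X) + \varepsilon)}$, where |A| denotes the number of elements in the set A

4. $|A_\varepsilon^{(n)}| \ge (1 - \varepsilon)\, 2^{n(H(X) - \varepsilon)}$ for n sufficiently large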
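These properties can be observed numerically on a simple source. The sketch below assumes an i.i.d. binary source with P(1) = p, for which all sequences with the same number of ones have the same probability, so the typical set can be counted exactly without enumerating sequences. The function name and the parameter values are illustrative.

```python
import math

def typical_set_stats(p, n, eps):
    """Entropy, typical-set size and typical-set probability for n i.i.d. Bernoulli(p) symbols."""
    h = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))
    size, prob = 0, 0.0
    for k in range(n + 1):  # k = number of ones in the sequence
        log_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log_p_seq / n - h) <= eps:  # sequences with k ones are typical
            size += math.comb(n, k)
            prob += math.comb(n, k) * 2.0 ** log_p_seq
    return h, size, prob

for n in (25, 100, 1000):
    h, size, prob = typical_set_stats(p=0.3, n=n, eps=0.1)
    # Compare log2|A| with the upper bound n(H + eps) and with n, the exponent for all sequences.
    print(n, round(prob, 4), round(math.log2(size), 1), round(n * (h + 0.1), 1), n)
```

As n grows, the probability of the typical set approaches 1 while its size stays far below the 2^n sequences of length n.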



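39. The Asymptotic Equipartition Property (AEP)
 ▪ Data compression: the set X^n of all sequences of length n is split into the typical set and the non-typical set.
 Typical set $A_\varepsilon^{(n)}$: from property 3 above, $|A_\varepsilon^{(n)}| \le 2^{n(H(X) + \varepsilon)}$, so indexing its elements requires no more than n(H(X) + ε) + 1 binary digits, prefixed by a 0.
 Non-typical set: indexing its elements requires no more than n log|X| + 1 binary digits, prefixed by a 1.

 ▪ Theorem 12: Let X1, X2, …, Xn be i.i.d. ~ p(x) and let ε > 0. For n sufficiently large, there exists a one-to-one code which maps sequences (x1, …, xn) to binary strings of length n(x1, …, xn) such that the average number of digits per source symbol satisfies:

$$\bar{n} = \frac{1}{n} \sum_{X^n} P(x_1, \ldots, x_n)\, n(x_1, \ldots, x_n) \le H(X) + \varepsilon$$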


40. Encoding the stationary source
• Until now we have not taken into account the possible interdependency between symbols emitted at different times.

• Let us recall the entropy per symbol in a sequence of length k:

$$H_k(X) = \frac{1}{k} H(X^k)$$

• Theorem 13: It is possible to encode sequences of k source symbols into a prefix code in such a way that the average number of digits per source symbol n̄ satisfies:

$$H_k(X) \le \bar{n} < H_k(X) + \frac{1}{k}$$

• Increasing the block length k makes the code more efficient: for any δ > 0 it is possible to choose k large enough so that n̄ satisfies:

$$H_\infty(X) \le \bar{n} < H_\infty(X) + \delta$$
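The effect of the block length k can be illustrated with a short sketch. It assumes a memoryless three-symbol source with probabilities 0.45, 0.35 and 0.2, so that the probability of a block is the product of its symbol probabilities and H_k(X) = H(X). It also uses the fact that the average length of a Huffman code equals the sum of the probabilities of all merged nodes, which avoids building the codewords. The function names are illustrative.

```python
import heapq
import math
from itertools import product

def huffman_average_length(probs):
    """Average codeword length of a binary Huffman code.

    Each merge adds one digit to every symbol below the merged node, so the
    average length is the sum of the probabilities of all merged nodes.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def digits_per_source_symbol(probs, k):
    """Huffman-code blocks of k symbols of a memoryless source; return digits per source symbol."""
    block_probs = [math.prod(block) for block in product(probs, repeat=k)]
    return huffman_average_length(block_probs) / k

source = [0.45, 0.35, 0.2]
h = sum(p * math.log2(1 / p) for p in source)
for k in (1, 2, 3, 4):
    print(k, round(digits_per_source_symbol(source, k), 4), "entropy:", round(h, 4))
```

For every k the printed value stays within 1/k of the entropy, as Theorem 13 requires.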


41. Huffman Coding algorithm (2).

• Example: Huffman code for source X, k = 1:

 Symbol   Probability   Code
 x1       0.45          0
 x2       0.35          10
 x3       0.2           11

 H(X) = 1.513 bits/sym
 Huffman coding, for k = 1: n̄ = 1.55 bits/symbol, ε = 97.6%

 Figure: Huffman tree. x2 (0.35, branch 0) and x3 (0.2, branch 1) are merged into a node of probability 0.55; x1 (0.45, branch 0) and this node (branch 1) are merged into the root.


42. Huffman Coding algorithm (2).
Example: Huffman code for source Y = X^k, k = 2:

 Symbol Y   Probability   Code
 (x1,x1)    0.2025        10
 (x1,x2)    0.1575        001
 (x2,x1)    0.1575        010
 (x2,x2)    0.1225        011
 (x1,x3)    0.09          111
 (x3,x1)    0.09          0000
 (x2,x3)    0.07          0001
 (x3,x2)    0.07          1100
 (x3,x3)    0.04          1101

 H(Y) = 2 × H(X) = 3.026 bits/sym

 Huffman coding of alphabet Y: n̄k = 3.0675 bits/sym

 Average length per symbol from set X: n̄ = n̄k / k = 1.534 bits/sym, ε = 98.6%


43. Huffman Coding algorithm
Exercise: Let X be the source alphabet with X= {A,B,C,D,E}, and
probabilities 0.35, 0.1, 0.15, 0.2, 0.2 respectively. Construct the binary
Huffman code for this alphabet and compute its efficiency.



Part III – The Communication Channel.


45. Introduction
• A communication channel is used to connect the source of information and its user.

 Discrete Source → Source Encoder → Channel Encoder → Modulator → Transmission Channel → Demodulator → Channel Decoder → Source Decoder → User

Between the channel encoder output and the channel decoder input, we may consider a discrete channel.
• The input and output of the channel are discrete alphabets. A practical example is the binary channel.

Between the channel encoder output and the input of the demodulator, we may consider a continuous channel with a discrete input alphabet.
• A well-known practical example is the AWGN (Additive White Gaussian Noise) channel, which is completely characterized by the probability distribution of the noise.


46. The discrete memoryless channel.

• A discrete channel is characterized by:

 ▪ An input alphabet: $X = \{x_i\}_{i=1}^{N_X}$

 ▪ An output alphabet: $Y = \{y_j\}_{j=1}^{N_Y}$

 ▪ A set of conditional probabilities pij, where $p_{ij} \triangleq P(y_j \mid x_i)$

 Figure: Transition diagram of the channel. Each input xi (i = 1, …, NX) is connected to each output yj (j = 1, …, NY) by a branch labeled pij.


47. Discrete memoryless channel.

• $p_{ij} \triangleq P(y_j \mid x_i)$ represents the probability of receiving the symbol yj, given that the symbol xi has been transmitted.

• The channel is memoryless:

$$P(y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(y_i \mid x_i)$$

 ▪ x1, …, xn and y1, …, yn represent n consecutive transmitted and received symbols, respectively.


48. Discrete memoryless channel.
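• Example 1:
 ▪ The binary channel: NX = NY = 2

 Figure: Transition diagram of the binary channel, with branches x1→y1 (p11), x1→y2 (p12), x2→y1 (p21) and x2→y2 (p22).

 ▪ The transition probabilities out of each input sum to one, $\sum_{j=1}^{N_Y} p_{ij} = 1$, so that p11 + p12 = 1 and p21 + p22 = 1.

 ▪ When p12 = p21 = p, the channel is called a binary symmetric channel (BSC).

 Figure: Transition diagram of the BSC, with branches x1→y1 and x2→y2 labeled 1 − p, and branches x1→y2 and x2→y1 labeled p.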


49. Discrete memoryless channel.

• We define the channel matrix P by:

$$P \triangleq \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1 N_Y} \\ p_{21} & p_{22} & \cdots & p_{2 N_Y} \\ \vdots & \vdots & \ddots & \vdots \\ p_{N_X 1} & p_{N_X 2} & \cdots & p_{N_X N_Y} \end{pmatrix}$$

 ▪ The sum of the elements in each row of P is 1: $\sum_{j=1}^{N_Y} p_{ij} = 1$


50. Discrete memoryless channel.

• Example 2:

• The noiseless channel: NX = NY = N

$$p_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases} \qquad P = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix} = I_N$$

 Figure: Transition diagram of the noiseless channel; each input xi is connected only to the output yi, with pii = 1.

 ▪ The symbols of the input alphabet are in one-to-one correspondence with the symbols of the output alphabet.


51. Discrete memoryless channel.
• Example 3:
 ▪ The useless channel: NX = NY = N, with $P(y_j \mid x_i) = \frac{1}{N}$ for all j, i. All entries of the channel matrix are equal:

$$P = \frac{1}{N} \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}$$

 The output probabilities are:

$$P(y_j) = \sum_i P(y_j \mid x_i)\, P(x_i) = \frac{1}{N} \sum_i P(x_i) = \frac{1}{N} \quad\Rightarrow\quad P(y_j \mid x_i) = P(y_j) \quad \forall j, i$$

 ▪ The matrix P has identical rows. The useless channel completely scrambles all input symbols, so that the received symbol gives no useful information for deciding which symbol was transmitted:

$$P(y_j \mid x_i) = P(y_j) \iff P(x_i \mid y_j) = P(x_i)$$


52. Conditional Entropy
• Definition:
 ▪ The conditional entropy H(X|Y) measures the average quantity of information needed to specify the input symbol X when the output (or received) symbol is known:

$$H(X \mid Y) \triangleq \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} P(x_i, y_j) \log\!\left(\frac{1}{P(x_i \mid y_j)}\right) \text{ bits/sym}$$

• This conditional entropy represents the average amount of information that has been lost in the channel, and it is called the equivocation.

• Examples:
 The noiseless channel: H(X|Y) = 0
 ▪ No loss in the channel.
 The useless channel: H(X|Y) = H(X)
 ▪ All transmitted information is lost in the channel.


53. The average mutual information
• Consider a source with alphabet X transmitting through a channel having the same input alphabet.

• A basic question is the average information flow that can reliably pass through the channel.

 emitted message → CHANNEL → received message (part of the information is lost in the channel)

 Average information flow = Entropy of the input alphabet − Average information lost in the channel

• Remark: We can define the average information at the output end of the channel:

$$H(Y) \triangleq \sum_{j=1}^{N_Y} P(y_j) \log\!\left(\frac{1}{P(y_j)}\right) \text{ bits/sym}$$


54. The average mutual information
• We define the average information flow (the average mutual information between X and Y) through the channel:

$$I(X;Y) \triangleq H(X) - H(X \mid Y) \text{ bits/sym}$$

 ▪ Note that: I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)

 ▪ Remark: The mutual information has a more general meaning than “an information flow”. It is the average information provided about the set X by the set Y, excluding the average information about X provided by X itself (the average self-information is H(X)).


55. The average mutual information

• Application to the BSC (p12 = p21 = p, with P(x1) = 1 − P(x2)):

$$I(X;Y) = H(Y) - H(Y \mid X)$$

1. By definition:

$$H(Y \mid X) \triangleq \sum_{i=1}^{N_X} \sum_{j=1}^{N_Y} P(x_i, y_j) \log_2\!\left(\frac{1}{P(y_j \mid x_i)}\right) \text{ bits/sym}$$

For the binary channel this sum has four terms:

$$H(Y \mid X) = P(x_1, y_1) \log_2\!\frac{1}{P(y_1 \mid x_1)} + P(x_1, y_2) \log_2\!\frac{1}{P(y_2 \mid x_1)} + P(x_2, y_1) \log_2\!\frac{1}{P(y_1 \mid x_2)} + P(x_2, y_2) \log_2\!\frac{1}{P(y_2 \mid x_2)}$$

with

$$P(y_j, x_i) = P(y_j \mid x_i)\, P(x_i) = \begin{cases} p \times P(x_i) & \text{for } i \ne j \\ (1 - p) \times P(x_i) & \text{for } i = j \end{cases}$$

Since P(x1) + P(x2) = 1:

$$\Rightarrow H(Y \mid X) = p \log_2\!\left(\frac{1}{p}\right) + (1 - p) \log_2\!\left(\frac{1}{1 - p}\right) = H_f(p)$$


56. Mutual Information

2. The output entropy is:

$$H(Y) = P(y_1) \log_2\!\left(\frac{1}{P(y_1)}\right) + P(y_2) \log_2\!\left(\frac{1}{P(y_2)}\right)$$

where

$$P(y_j) = \sum_i P(y_j, x_i) = \sum_i P(y_j \mid x_i)\, P(x_i)$$

$$\Rightarrow P(y_1) = p + P(x_1)(1 - 2p), \qquad P(y_2) = (1 - p) - P(x_1)(1 - 2p)$$

3. I(X;Y) = H(Y) − H_f(p), and we plot I(X;Y) as a function of P(x1) for different values of p.


Figure: Mutual information for a BSC, in bits/symbol (vertical axis, from 0 to 1), versus P(x1) (horizontal axis, from 0 to 1), for p = 0, 0.1, 0.2, 0.3 and 0.5.
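The curves of I(X;Y) for the BSC can be tabulated with the expressions derived in steps 1 to 3. The following is a minimal sketch; the function names are illustrative.

```python
import math

def binary_entropy(p):
    """H_f(p) in bits, with H_f(0) = H_f(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def bsc_mutual_information(p_x1, p):
    """I(X;Y) = H(Y) - H_f(p) for a BSC with crossover probability p."""
    p_y1 = p + p_x1 * (1 - 2 * p)  # P(y1) as a function of P(x1)
    return binary_entropy(p_y1) - binary_entropy(p)

for p in (0.0, 0.1, 0.2, 0.3, 0.5):
    row = [round(bsc_mutual_information(q / 10, p), 3) for q in range(11)]
    print(f"p = {p}: {row}")
```

Each row is symmetric around P(x1) = 0.5, where it reaches its maximum 1 − H_f(p); for p = 0.5 the mutual information is zero for every input distribution.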
57. Capacity of a discrete memoryless channel.
• Looking at the set of curves of I(X;Y) as a function of P(x1), we observe that the maximum of I(X;Y) is always obtained for P(x1) = P(x2) = 0.5, that is, when the input symbols are equally likely.

The maximum value of I(X;Y) is called the channel capacity C.

• The channel capacity is defined as the maximum information flow through the channel that a communication system can theoretically expect.

• This maximum is achieved for a particular probability distribution of the input symbols:

$$C \triangleq \max_{P(x)} I(X;Y)$$


58. Capacity of a discrete memoryless channel.
• For the BSC, this capacity is obtained when the channel input symbols are equally likely.

• This result can be extended to the more general case of symmetric discrete memoryless channels with NX inputs:

$$P(x_i) = \frac{1}{N_X} \quad \text{for all } i = 1, \ldots, N_X$$

• Theorem 14: For a symmetric discrete memoryless channel, capacity is achieved by using the inputs with equal probability.


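59. Capacity of a discrete memoryless channel.
• Example: The Binary Symmetric Channel.

 Figure: A BSC built on an AWGN channel. Transmitter: Discrete Source → Source Encoder → binary digits {1, 0} → BPSK → filter h(t) → multiplication by cos(2πf0t). The AWGN is added in the channel. Receiver: multiplication by cos(2πf0t) → filter h(−t) → binary decisions {1, 0} → Source Decoder → User.

 The crossover probability of the resulting BSC is:

$$p = Q\!\left(\sqrt{\frac{2 E_b}{N_0}}\right)$$

P(0) = P(1) = 0.5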
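The capacity of this BSC can be evaluated as a function of the signal-to-noise ratio. The sketch below assumes the standard BPSK relation p = Q(√(2Eb/N0)) with hard decisions, equally likely inputs, and an SNR expressed as Eb/N0 in dB; the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def bsc_capacity_from_ebn0(ebn0_db):
    """Capacity in bits/symbol of the BSC obtained from BPSK over AWGN with hard decisions."""
    ebn0 = 10 ** (ebn0_db / 10)           # dB to linear scale
    p = q_function(math.sqrt(2 * ebn0))   # crossover probability of the BSC
    if p == 0.0:
        return 1.0
    h_f = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))
    return 1 - h_f                        # capacity, reached for P(0) = P(1) = 0.5

for ebn0_db in (0, 2, 4, 6, 8, 10):
    print(ebn0_db, round(bsc_capacity_from_ebn0(ebn0_db), 4))
```

The capacity increases with the SNR and approaches 1 bit/symbol as the crossover probability tends to zero.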



60. Capacity of a discrete memoryless channel
• Example: The Binary Symmetric Channel

 Figure: Capacity of the BSC. I(X;Y) in bits/symbol versus SNR (dB, from 0 to 25), for P(x1) = 0.5, P(x2) = 0.5 and for P(x1) = 0.25, P(x2) = 0.75.


61. Capacity of the additive Gaussian channel
• The channel disturbance has the form of a continuous Gaussian random variable ν with variance σν², added to the transmitted signal.
 ▪ The assumption that the noise is Gaussian is convenient from the mathematical point of view, and is reasonable in a wide variety of physical settings.

• In order to study the capacity of the AWGN channel, we drop the hypothesis of a discrete input alphabet and consider the input X as a continuous random variable with variance σX².

 Figure: Additive Gaussian channel, Y = X + ν, with input density pX(x), output density pY(y) and noise ν ~ N(0, σν²).


62. Capacity of the additive Gaussian channel
• We recall the expression of the capacity:

$$C \triangleq \max_{p(x)} I(X;Y), \qquad I(X;Y) = H(Y) - H(Y \mid X)$$

For the additive Gaussian channel this gives:

$$C = \frac{1}{2} \log_2\!\left(1 + \frac{\sigma_X^2}{\sigma_\nu^2}\right)$$

• Theorem 15: The capacity of a discrete-time, continuous additive Gaussian channel is achieved when the continuous input has a Gaussian probability distribution.
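The expression of C can be recovered in a few lines from the bound on the entropy of a continuous variable with finite variance (Theorem 7). The sketch below assumes that the noise ν is independent of the input X, which is the usual model even though it is not stated explicitly here.

```latex
\begin{align*}
H(Y \mid X) &= H(X + \nu \mid X) = H(\nu)
             = \tfrac{1}{2}\log_2\!\left(2\pi e\,\sigma_\nu^2\right) \\
\sigma_Y^2 &= \sigma_X^2 + \sigma_\nu^2
  \quad\Rightarrow\quad
  H(Y) \le \tfrac{1}{2}\log_2\!\left(2\pi e\,(\sigma_X^2 + \sigma_\nu^2)\right) \\
I(X;Y) &= H(Y) - H(Y \mid X)
  \le \tfrac{1}{2}\log_2\!\left(1 + \frac{\sigma_X^2}{\sigma_\nu^2}\right)
\end{align*}
```

The bound on H(Y) is met with equality when Y is Gaussian, which is the case when X is Gaussian, since the sum of two independent Gaussian variables is Gaussian.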



63. Capacity of a bandlimited Gaussian channel with waveform input
• We now deal with a waveform input signal in a channel bandlimited to the frequency interval (−B, +B).

• The noise is white and Gaussian with two-sided power spectral density N0/2. In the band (−B, +B), the mean noise power is σν² = (N0/2)(2B) = N0B.

• For a zero-mean stationary input, each sample has a variance σX² equal to the signal power P, i.e. σX² = P.

• By the sampling theorem, the signal can be represented using at least 2B samples per second. Transmitting 2B samples per second (one sample every 1/(2B) seconds), we express the capacity in bits/sec as:

$$C_s = B \log_2\!\left(1 + \frac{P}{N_0 B}\right) \text{ bits/sec}$$
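The link between the capacity per sample and the capacity in bits/sec can be checked numerically. In the sketch below, the function names and the numerical values of B, P and N0 are hypothetical, chosen only for illustration.

```python
import math

def gaussian_capacity_per_sample(signal_var, noise_var):
    """C = 0.5 * log2(1 + sigma_X^2 / sigma_nu^2), in bits per sample."""
    return 0.5 * math.log2(1 + signal_var / noise_var)

def bandlimited_capacity(bandwidth_hz, power, n0):
    """C_s = B * log2(1 + P / (N0 * B)), in bits/sec."""
    return bandwidth_hz * math.log2(1 + power / (n0 * bandwidth_hz))

# Hypothetical values: bandwidth in Hz, signal power and noise density in consistent units.
B, P, N0 = 3000.0, 1e-3, 1e-9

c_sample = gaussian_capacity_per_sample(P, N0 * B)  # noise power in the band is N0 * B
c_s = bandlimited_capacity(B, P, N0)
print(c_sample, "bits/sample")
print(c_s, "bits/sec")
print(2 * B * c_sample, "bits/sec from 2B samples per second")
```

The last two printed values agree: 2B samples per second, each carrying C bits, give the capacity in bits/sec.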



64. Capacity of a bandlimited Gaussian channel with waveform input

 Figure: AWGN capacity, C = (1/2) log2(1 + SNR), in bits/symbol, versus SNR (dB, from 0 to 20).


Part IV – Channel Coding.


66. The noisy channel coding theorem
• In its most general definition, channel coding is the operation of mapping each sequence emitted by a source to another sequence belonging to the set of all possible sequences that the channel can convey. The functional role of channel coding in a communication system is to ensure reliable communication. The performance limits of this coding are stated in the fundamental channel coding theorem.

• The noisy channel coding theorem, introduced by C. E. Shannon in 1948, is one of the most important results in information theory.

• Loosely speaking, this theorem states that if a noisy channel has capacity Cs in bits per second, and if binary data enters the channel encoder at a rate Rs < Cs, then by an appropriate design of the encoder and decoder it is possible to reproduce the emitted data after decoding with a probability of error as small as desired.

• Hence noise no longer appears as a limit on the quality of a communication system, but rather as a limit on the information rate that can be transmitted through the channel.

 Source → Channel Coding → Channel


67. The noisy channel coding theorem
• This result highlights the significance of the channel capacity. Let us recall the average information rate that passes through the channel:

$I(X;Y) \triangleq H(X) - H(X|Y)$ bits/sym

 The equivocation H(X|Y) represents the amount of information lost in the channel, where X and Y are its input and output alphabets respectively.
 The capacity C is defined as the maximum of I(X;Y). The maximum is taken over all input distributions [P(x1), P(x2), …].
 If an attempt is made to transmit at a rate higher than C, say C + r, then there will necessarily be an equivocation equal to or greater than r.

• Theorem 16: Let a discrete channel have a capacity C and a discrete source have an
entropy rate R. If R ≤ C there exists a coding system such that the output of the source
can be transmitted over the channel with an arbitrarily small frequency of errors (or an
arbitrarily small equivocation). If R > C there is no method of encoding which gives an
equivocation less than R − C (Shannon 1948).
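As a small numerical illustration of Theorem 16, the sketch below assumes a binary symmetric channel with crossover probability p, whose capacity is the standard result C = 1 − H2(p) bits per channel use. The function names and the example values are assumptions made for this illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """H2(p) in bits, with H2(0) = H2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity of a BSC with crossover probability p (bits per channel use)."""
    return 1.0 - binary_entropy(p)


def min_equivocation(rate: float, capacity: float) -> float:
    """Smallest equivocation allowed by Theorem 16: 0 if R <= C, else R - C."""
    return max(0.0, rate - capacity)


# Hypothetical example: p = 0.11 gives a capacity close to 0.5 bit per use.
C = bsc_capacity(0.11)
for R in (0.4, 0.8):
    print(f"R = {R}: C = {C:.3f}, equivocation >= {min_equivocation(R, C):.3f}")
```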



68. The noisy channel coding theorem
• To prove the theorem, Shannon shows that a code having the desired property must exist in a certain group of codes. Shannon proposed to average the frequency of errors over this group of codes, and showed that this average can be made arbitrarily small.

• Hence the noisy channel coding theorem establishes the existence of such a code but does not show how to construct it.

• Consider a source with entropy rate R, R ≤ C. Consider then a random mapping of each sequence of the source to a possible channel sequence. One can then compute the average error probability over an ensemble of long sequences of the channel. This leads to an upper bound on the average error probability:

$P(e) < 2^{-nE(R)}$

• E(R) is a convex ∪, decreasing function of R for 0 < R < C, and n is the length of the emitted sequences.
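To get a feeling for how the bound $2^{-nE(R)}$ behaves, the sketch below evaluates it for a given exponent and finds the block length needed to reach a target value. The value E(R) = 0.25 is purely hypothetical; the slides do not give a numerical exponent.

```python
import math


def error_bound(n: int, exponent: float) -> float:
    """Value of the bound 2^(-n * E(R)) on the average error probability."""
    return 2.0 ** (-n * exponent)


def min_block_length(target: float, exponent: float) -> int:
    """Smallest n such that 2^(-n * E(R)) <= target, for a given E(R) > 0."""
    return math.ceil(-math.log2(target) / exponent)


# Hypothetical exponent E(R) = 0.25: the bound decays exponentially with n.
for n in (10, 40, 100):
    print(n, error_bound(n, 0.25))
print("n needed for a bound of 1e-6:", min_block_length(1e-6, 0.25))
```

A smaller exponent (R closer to C) requires proportionally longer sequences for the same bound.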


69. The noisy channel coding theorem
• In order to make this bound as small as desired, the exponent has to be as large as possible. Typical behavior of E(R) is shown in the figures below.

• The average error probability can be made as small as desired by increasing E(R):

[Figure, left panel: E(R) versus R for a given capacity C, with two rates R2 and R1 marked below C. Right panel: E(R) versus R for two capacities C1 and C2.]

 Reducing R is not a desirable solution, as it conflicts with the objective of transmitting a higher information rate.
 Higher capacity is achieved with a greater signal-to-noise ratio. Again, this solution is not adequate since power is costly and, in almost all applications, power is limited.


70. The noisy channel coding theorem
• The informal proof by Shannon of the noisy channel coding theorem considers randomly chosen long sequences of channel symbols.

 The average error probability can thus be made arbitrarily small by choosing codewords of large length n.

 In addition, the theorem considers randomly chosen codewords. In practice this appears unrealistic, unless an all-knowing observer delivers to the user of the information the coding rule (the mapping) for each received sequence.

 The number of codewords and the number of possible received sequences are exponentially increasing functions of n. Thus, for large n, it is impractical to store the codewords in the encoder and decoder when a deterministic mapping rule is adopted.

 We shall continue our study of channel coding by discussing techniques that avoid these difficulties. After introducing simple coding techniques, we hope to progressively arrive at concatenated codes (known as turbo codes), which approach the capacity limits because they behave like random codes.


71. Improving transmission reliability:
Channel Coding
• Channel coding plays an essential role in a digital communication system: it improves the error probability at the receiver. In almost all practical applications, channel coding is required to achieve reliable communication, especially in digital mobile communication.

[Figure: error probability curves, from 10^-1 down to 10^-6, for QPSK, TCM and 6D TCM. Gc (2.5 dB) is the coding gain at Pe = 10^-4 for the first code; G'c (3.75 dB) is the coding gain at Pe = 10^-5 for the second code.]


72. Linear Binary Codes

• Data and codewords are formed by binary digits 0 and 1.

 The channel input alphabet is binary and accepts symbols 0 and 1.
 If the output of the channel is binary, we will deal essentially with the BSC.
 If the output of the channel is continuous, we will deal essentially with the AWGN channel.
 We assume ideal source coding, i.e., each digit at the output of the source block conveys 1 bit of information: P(0) = P(1) = 0.5.

[Figure: Source and source coding → {0,1,1,0,…} → Channel Coding → {1,0,1,0,…} → Channel]

• We will present two families of binary channel codes:

 Block codes
 Convolutional codes


73. Channel Coding techniques:
Linear binary block codes
• A block of n digits (codeword) generated by the encoder depends only on the corresponding block of k bits generated by the source.

[Figure: u = (u1, u2, ..., uk) → Block Encoder (rate k/n) → x = (x1, x2, ..., xn)]

$\rho = \frac{k}{n} < 1$

• The code is defined by the set of all $2^k$ sequences of length n (codewords) generated by the encoder and is referred to as an (n, k) code.


74. Channel Coding techniques:
Linear binary block codes
[Figure: Source (& source coding) → u, Rs bit/s → Channel Encoder → x, Rs/ρ symbol/s → Modulator → Transmission Channel → Demodulator → y → Channel Decoder → User]

$u = [u_1\ u_2\ \cdots\ u_k]$, $x = [x_1\ x_2\ \cdots\ x_n]$, $\rho = \frac{k}{n} < 1$

• In a BSC (binary symmetric channel) the received n-sequence is:

$y = x \oplus e$, with $e = [e_1\ e_2\ \cdots\ e_n]$

• e is a binary n-sequence representing the error vector. If $e_i = 1$ then an error has occurred at digit i.


75. Channel Coding techniques:
Linear binary block codes
• Examples:

 Repetition Code (3 , 1)

x1 = u1 , x2 = u1 , x3 = u1

 Parity Check code (3, 2)

x1 = u1 , x2 = u 2 , x3 = u1 ⊕ u 2

where ⊕ denotes the modulo 2 sum.
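The two encoding rules above translate directly into code. This is a minimal sketch in which the function names are illustrative; the modulo-2 sum ⊕ is written with Python's XOR operator `^`.

```python
def encode_repetition(u):
    """Repetition code (3, 1): x1 = x2 = x3 = u1."""
    (u1,) = u
    return [u1, u1, u1]


def encode_parity_check(u):
    """Parity-check code (3, 2): x1 = u1, x2 = u2, x3 = u1 XOR u2."""
    u1, u2 = u
    return [u1, u2, u1 ^ u2]


# List every codeword of both codes.
for u in ([0], [1]):
    print(u, "->", encode_repetition(u))
for u in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(u, "->", encode_parity_check(u))
```

Every codeword of the parity-check code contains an even number of ones, which is what allows single errors to be detected.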



76. Channel Coding techniques:
Linear binary block codes
 Hamming Code (7, 4)

$x_i = u_i, \quad i = 1, 2, 3, 4$
$x_5 = u_1 \oplus u_2 \oplus u_3$
$x_6 = u_2 \oplus u_3 \oplus u_4$
$x_7 = u_1 \oplus u_2 \oplus u_4$

 The encoding rule can be represented by the generator matrix G: x = uG

$G = \begin{bmatrix} 1&0&0&0&1&0&1 \\ 0&1&0&0&1&1&1 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&0&1&1 \end{bmatrix}$
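The matrix form x = uG and the three parity equations of the (7, 4) Hamming code describe the same encoder. The sketch below, with illustrative function names, implements both and builds the full codebook so that the two descriptions can be compared.

```python
from itertools import product

# Generator matrix of the (7, 4) Hamming code given above.
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]


def encode(u, g=G):
    """Return x = uG, with all sums taken modulo 2."""
    n = len(g[0])
    return [sum(u[j] * g[j][i] for j in range(len(u))) % 2 for i in range(n)]


def encode_by_equations(u):
    """Same encoder written with the explicit parity equations."""
    u1, u2, u3, u4 = u
    return [u1, u2, u3, u4, u1 ^ u2 ^ u3, u2 ^ u3 ^ u4, u1 ^ u2 ^ u4]


# The code is the set of all 2^k = 16 codewords.
codebook = [encode(list(u)) for u in product((0, 1), repeat=4)]
for x in codebook:
    print("".join(map(str, x)))
```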


77. Channel Coding techniques:
Linear binary block codes
• A systematic encoder is an encoder for which all k information digits appear unchanged in the codeword. G then assumes the canonical form:

$G = [I_k \; P]$

where $I_k$ is the k × k identity matrix and P is a k × (n−k) matrix which specifies the parity check equations.

• An encoder introduces r = (n – k) redundant binary digits.


78. Properties of linear block codes
• Property 1: Any linear combination of codewords is a codeword.
a. The block code consists of all possible sums of the rows of the generator matrix.
b. The sum of two codewords is a codeword.

• Property 2: The n-sequence of all zeros is always a codeword.

• Property 3: A block code is a commutative group under the ⊕ operation.
a. The all-zeros codeword is the identity element of the code.
b. If x1, x2 and x3 are codewords then:

$(x_1 \oplus x_2) \oplus x_3 = x_1 \oplus (x_2 \oplus x_3)$
$x_1 \oplus x_2 = x_2 \oplus x_1$
$x_1 \oplus x_2 = 0 \Rightarrow x_1 = x_2$


79. Hamming distance
• We define the Hamming distance between two codewords as the number of places where they differ:

$d_H(x, x') = \sum_{i=1}^{n} (x_i \oplus x'_i)$

 One can verify that the Hamming distance is a metric; in particular it satisfies the triangle inequality dH(x1,x3) ≤ dH(x1,x2) + dH(x2,x3).

• The minimum distance of a linear block code is:

$d_{H,\min} \triangleq \min_{x, x',\; x \neq x'} d_H(x, x')$
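Both definitions can be evaluated by brute force on a small code. The sketch below uses the (3, 2) parity-check code as an example; the function names are illustrative.

```python
from itertools import combinations, product


def hamming_distance(x, y):
    """Number of positions where x and y differ: sum of x_i XOR y_i."""
    return sum(a ^ b for a, b in zip(x, y))


def minimum_distance(codebook):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(codebook, 2))


# (3, 2) parity-check code: x = (u1, u2, u1 XOR u2).
parity_check_code = [[u1, u2, u1 ^ u2] for u1, u2 in product((0, 1), repeat=2)]

print("d_H,min =", minimum_distance(parity_check_code))
```

For a linear code the pairwise search can be replaced by the smallest weight of a non-zero codeword, since the sum of two codewords is itself a codeword.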


80. Error detecting capabilities
• Consider a systematic encoder transmitting over a BSC.

$x_i = u_i, \quad 1 \le i \le k$
$x_i = \sum_{j=1}^{k} g_{ij} u_j, \quad k+1 \le i \le n$ (the n−k parity equations, sums taken modulo 2)

• The received sequence contains independent random errors caused by the channel noise:

$y = x \oplus e$, with $e = [e_1\ e_2\ \cdots\ e_n]$

• Using the first k received symbols y1, …, yk, an algebraic decoder computes the n−k parity equations and compares the results to the last n−k received symbols yk+1, …, yn:

$y'_i = \sum_{j=1}^{k} g_{ij} y_j, \quad k+1 \le i \le n$

$y_i \oplus y'_i, \quad k+1 \le i \le n$

 If $y_i \oplus y'_i = 1$ for at least one i, the received sequence is not a codeword and an error is detected.
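The parity recomputation described above can be sketched as follows, using the parity part P of the (7, 4) Hamming generator matrix G = [I_k P] as an example. The function names are illustrative, and the matrix is stored as `P[j][i]` (row j for information digit j), which is an indexing choice of this sketch.

```python
# Parity part of G = [I_k P] for the (7, 4) Hamming code (k = 4, n - k = 3).
P = [
    [1, 0, 1],
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
]


def parity_digits(info, p=P):
    """Recompute the n-k parity digits from k information digits (modulo 2)."""
    return [sum(info[j] * p[j][i] for j in range(len(info))) % 2
            for i in range(len(p[0]))]


def check(y, p=P):
    """Return y_i XOR y'_i for the last n-k positions of the received word y."""
    k = len(p)
    recomputed = parity_digits(y[:k], p)
    return [a ^ b for a, b in zip(y[k:], recomputed)]


x = [0, 1, 1, 1] + parity_digits([0, 1, 1, 1])  # a valid codeword
e = [0, 0, 1, 0, 0, 0, 0]                       # a single error at digit 3
y = [a ^ b for a, b in zip(x, e)]               # received sequence y = x XOR e

print("check on x:", check(x))  # all zeros: no error detected
print("check on y:", check(y))  # at least one 1: error detected
```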


81. Maximum likelihood detection in an AWGN channel
• Consider a communication system with channel coding and decoding processes and having these properties:

 The channel is memoryless and the noise is AWGN.
 The channel's input alphabet is binary and its output is the set of real numbers.
 The source coding is ideal in the sense that each binary digit delivered by the combined block "source and source coding" conveys 1 bit of information (P(0) = P(1) = 0.5).

[Figure: Source and Source Coding → {1, 0, …} → Channel Encoder → x → BPSK → h(t) → multiplication by cos(2πf0t) → AWGN → multiplication by cos(2πf0t) → h(-t) → r → Maximum Likelihood Detection → User]


82. Lower bound on error probability
• Considering only nearest-neighbor errors we have:

$P_e \ge N_{\min}\, Q\left(\frac{d_{E,\min}}{2\sqrt{N_0}}\right)$

where $d_{E,\min}$ is the minimum Euclidean distance between two sequences and $N_{\min}$ is the average number of nearest neighbors in the code separated by $d_{E,\min}$.


83. Lower bound on error probability
• Considering again the BPSK modulation case with symbols (−A, +A), one can easily show that:

$d_{E,\min}^2 = d_{H,\min} \times 4 \times A^2$ with $A = \sqrt{\frac{k}{n}\, 2E_b}$

• The lower bound can be expressed as follows:

$P_e \ge N_{\min}\, Q\left(\sqrt{d_{H,\min} \times \rho \times \frac{2E_b}{N_0}}\right)$, $\rho = \frac{k}{n}$

• Comments: A code having a greater minimum Hamming distance may exhibit better asymptotic performance. But when $N_{\min}$ is very large, one can experience significant losses in overall performance. In addition, this bound may be loose for small values of SNR, as errors may occur between codewords separated by a distance greater than the minimum Euclidean distance.
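The lower bound can be evaluated numerically with the standard identity Q(x) = ½ erfc(x/√2). In this sketch the function names are illustrative, and the value N_min = 7 used for the (7, 4) Hamming code is an assumption of the example (it is the number of weight-3 codewords of that code).

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def word_error_lower_bound(n_min, d_h_min, n, k, ebn0_db):
    """N_min * Q(sqrt(d_H,min * rho * 2 Eb/N0)) with rho = k/n."""
    rho = k / n
    ebn0 = 10.0 ** (ebn0_db / 10.0)  # convert dB to a linear ratio
    return n_min * q_function(math.sqrt(d_h_min * rho * 2.0 * ebn0))


# (7, 4) Hamming code: d_H,min = 3; N_min = 7 is assumed for this example.
for ebn0_db in (2, 4, 6, 8):
    bound = word_error_lower_bound(7, 3, 7, 4, ebn0_db)
    print(f"Eb/N0 = {ebn0_db} dB: Pe >= {bound:.3e}")
```

Setting n_min = d_h_min = n = k = 1 gives Q(√(2Eb/N0)), the uncoded BPSK case.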


84. Lower bound on error probability

[Figure: Lower bound on word error probability Pe (from 10^0 down to 10^-6) versus Eb/N0 (0 to 10 dB) for BPSK, Hamming (7,4) and Golay (23,12)]


85. Maximum likelihood detection in
BSC Channel
• Consider again the communication system with channel coding and decoding, but suppose now that decoding takes place after demodulation.

 The channel is memoryless and the noise is AWGN.
 The channel's input and output alphabets are binary.
 The source coding is ideal in the sense that each binary digit delivered by the combined block "source and source coding" conveys 1 bit of information (P(0) = P(1) = 0.5).

[Figure: Source and Source Coding → {1, 0, …} → Channel Encoder → x → BPSK → h(t) → multiplication by cos(2πf0t) → AWGN → multiplication by cos(2πf0t) → h(-t) → {1, 0, …} → y → Channel Decoding → User]


86. Maximum likelihood detection in
BSC channel
• The channel encoder delivers a codeword x of the (n, k) code, and BPSK demodulation delivers a binary n-sequence y. The resulting channel behaves as a BSC with transition probability p:

$p = Q\left(\sqrt{\frac{k}{n} \frac{2E_b}{N_0}}\right)$

[Figure: BSC transition diagram: 0 → 0 and 1 → 1 with probability 1−p; 0 → 1 and 1 → 0 with probability p]

• We then have y = x ⊕ e, where e = [e1 e2 … en] is a sequence of errors: ei = 1 when an error occurs at position i, ei = 0 otherwise. The ML detection of codewords in a BSC is given by:

$\hat{x} = x^{(l)} \Leftrightarrow P(y \mid x^{(l)}) = \max_{x^{(m)}} P(y \mid x^{(m)})$

$P(y \mid x^{(l)}) = p^{d_l} (1 - p)^{n - d_l}$

where $d_l = d_H(y, x^{(l)})$ is the Hamming distance between the received sequence and the codeword $x^{(l)}$.


87. Maximum likelihood detection in
BSC Channel
• The ML detection criterion in a BSC can be expressed after taking the logarithm of the conditional probabilities. As p < ½, P(y|x(l)) is a monotonically decreasing function of dl. Therefore, the ML criterion in a BSC reduces to the following rule:

$\hat{x} = x^{(l)} \Leftrightarrow d_H(y, x^{(l)}) = \min_{x^{(m)}} d_H(y, x^{(m)})$

• As ML detection in a BSC amounts to selecting the codeword closest, in terms of Hamming distance, to the received binary sequence, the minimum Hamming distance appears once more as an influential parameter for the error performance of linear block codes.

 In designing a good linear binary code, one must search for codes that maximize the minimum Hamming distance and have a small average number of nearest neighbors.

 A receiver operating on received binary sequences (after demodulation, i.e. over a BSC) is known as a hard decision decoder.
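The minimum-distance rule can be sketched as an exhaustive search over the codebook. This is only practical for very small codes such as the (7, 4) Hamming code used here; the function names and the value p = 0.1 are assumptions of the example.

```python
from itertools import product

G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]


def encode(u):
    """x = uG modulo 2 for the (7, 4) Hamming code."""
    return [sum(u[j] * G[j][i] for j in range(4)) % 2 for i in range(7)]


CODEBOOK = [encode(list(u)) for u in product((0, 1), repeat=4)]


def hamming_distance(x, y):
    return sum(a ^ b for a, b in zip(x, y))


def likelihood(y, x, p):
    """P(y | x) = p^d * (1 - p)^(n - d) on a BSC, with d = d_H(y, x)."""
    d = hamming_distance(x, y)
    return p ** d * (1.0 - p) ** (len(y) - d)


def ml_decode_bsc(y, codebook=CODEBOOK):
    """Hard-decision ML decoding: pick the codeword closest to y."""
    return min(codebook, key=lambda x: hamming_distance(x, y))


sent = encode([0, 1, 1, 1])
received = list(sent)
received[2] ^= 1  # one channel error
print("decoded correctly:", ml_decode_bsc(received) == sent)
print("P(y|x) for p = 0.1:", likelihood(received, sent, 0.1))
```

The search visits all 2^k codewords, which illustrates why structured decoding methods are needed for longer codes.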


88. Hard v/s Soft decoding

 Hard decoder: hard input, hard output
 Soft decoder: soft input, hard output

[Figure: Lower bounds on the word error probability Pe of the (7,4) Hamming code versus Eb/N0 (0 to 10 dB), for hard decoding and soft decoding]
89. Hard v/s Soft decoding

[Figure: Capacity of BPSK in AWGN, in bits/symbol, versus Eb/N0 (-5 to 15 dB), for the soft decision channel and the hard decision channel (BSC)]

• We will now study the correcting capabilities of an ML receiver in a BSC. We will then introduce a practical method of decoding linear block codes, which can be described as an "error controlling and correcting technique". We will then derive detection capabilities. Correction and detection capabilities are both related to the minimum Hamming distance.


90. Error correcting and detecting
capabilities

• Theorem 17: A linear block code (n, k) with minimum Hamming distance dH,min can correct all error vectors of weight not greater than $t = \lfloor (d_{H,\min} - 1)/2 \rfloor$ ($\lfloor a \rfloor$ is the integer part of a).

• Theorem 18: A linear block code (n, k) with minimum distance dH,min detects all error vectors of weight not greater than (dH,min – 1).


91. Error correcting and detecting
capabilities
• A code with a correction capability equal to t is often denoted as an (n, k, t) code.

• The Parity Check code has a minimum distance of 2. It can detect all single errors but cannot correct any.

• The (7, 4) Hamming code has a minimum distance of 3. It can correct all single errors.
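Theorems 17 and 18 reduce to two one-line formulas, sketched below with illustrative function names and applied to the two codes mentioned above.

```python
def correction_capability(d_min: int) -> int:
    """t = floor((d_min - 1) / 2): number of errors always corrected."""
    return (d_min - 1) // 2


def detection_capability(d_min: int) -> int:
    """d_min - 1: number of errors always detected."""
    return d_min - 1


for name, d_min in (("Parity check (3, 2)", 2), ("Hamming (7, 4)", 3)):
    print(name, "corrects", correction_capability(d_min),
          "and detects", detection_capability(d_min), "error(s)")
```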


92. Cyclic codes
• A linear (n, k) block code is a cyclic code if and only if any cyclic shift of a codeword produces another codeword.

• Cyclic codes are parity-check codes that present a large amount of algebraic structure.

• Cyclic codes have particular properties that allow easy encoding operations and simple decoding algorithms.
 Cyclic codes are of great practical interest.


93. Cyclic Codes.

 The Hamming code (7, 4) is a cyclic code. For instance, there are six different cyclic shifts of the codeword 0111010:

1110100 1101001 1010011 0100111 1001110 0011101

 They all belong to the set of codewords.

 In dealing with cyclic codes it is useful to represent a binary sequence of n digits as a polynomial in the indeterminate Z.

 A codeword x = [xn-1, xn-2, … , x0] is represented as follows:

$x(Z) = x_{n-1} Z^{n-1} \oplus x_{n-2} Z^{n-2} \oplus \dots \oplus x_1 Z \oplus x_0$
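The cyclic property of the (7, 4) Hamming code defined earlier can be checked by enumeration. In this sketch the function names are illustrative, and the polynomial is printed with "+" standing for the modulo-2 sum ⊕.

```python
from itertools import product


def encode(u):
    """(7, 4) Hamming encoder written with its parity equations."""
    u1, u2, u3, u4 = u
    return (u1, u2, u3, u4, u1 ^ u2 ^ u3, u2 ^ u3 ^ u4, u1 ^ u2 ^ u4)


CODE = {encode(u) for u in product((0, 1), repeat=4)}


def cyclic_shift(x):
    """Shift left by one position; the first digit wraps to the end."""
    return x[1:] + x[:1]


def to_polynomial(x):
    """Polynomial x(Z) for x = [x_{n-1}, ..., x_0], as a string."""
    n = len(x)
    terms = []
    for i, bit in enumerate(x):
        if bit:
            power = n - 1 - i
            terms.append("1" if power == 0 else "Z" if power == 1 else f"Z^{power}")
    return " + ".join(terms) if terms else "0"


word = (0, 1, 1, 1, 0, 1, 0)
shifted = word
for _ in range(6):
    shifted = cyclic_shift(shifted)
    print("".join(map(str, shifted)), shifted in CODE, to_polynomial(shifted))
```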


94. BCH codes

• Bose-Chaudhuri-Hocquenghem codes.
• This class of cyclic codes is one of the most useful for correcting random errors, mainly because the decoding algorithms can be implemented with an acceptable amount of complexity.

• For any pair of positive integers m and t (with m ≥ 3 and t < 2^(m−1)), there is a binary BCH code with the following parameters:

$n = 2^m - 1, \quad n - k \le mt, \quad d_{H,\min} \ge 2t + 1$
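The parameter relations can be turned into a small helper that, for given m and t, returns the block length, the guaranteed lower bound on k and the guaranteed minimum distance. The function name and the example (m = 4, t = 2) are illustrative assumptions.

```python
def bch_parameters(m: int, t: int):
    """Return (n, k_lower_bound, designed_distance) for a binary BCH code.

    n = 2^m - 1, n - k <= m*t (so k >= n - m*t), d_H,min >= 2t + 1.
    """
    n = 2 ** m - 1
    k_lower_bound = n - m * t
    if k_lower_bound <= 0:
        raise ValueError("m*t is too large for this block length")
    return n, k_lower_bound, 2 * t + 1


# Hypothetical example: m = 4, t = 2.
n, k_min, d_design = bch_parameters(4, 2)
print(f"n = {n}, k >= {k_min}, d_H,min >= {d_design}")
```

The returned k is only a lower bound, since the relation n − k ≤ mt is an inequality.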


95. BCH codes

• This code can correct all combinations of t or fewer errors.

• These codes are interesting because of the flexibility in the choice of parameters (block length and code rate), and the available decoding algorithms that can be implemented.


96. Reed- Solomon Codes

• A subclass of BCH codes generalized to the non-binary case (symbols belonging to a set of cardinality $q = 2^m$).

• Each symbol can be represented as a binary m-tuple, and the code can be considered a special type of binary code.

• The parameters of an RS code are:

 Symbol: m binary digits
 Block length: n = 2^m – 1 symbols
 Parity checks: (n – k) = 2t symbols

• These codes are capable of correcting all combinations of t or fewer symbol errors. They are well suited for the correction of bursts of binary errors.
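The RS parameters follow directly from m and t, as sketched below. The function names and the example values (m = 8, t = 16) are assumptions of this illustration. The burst-length helper relies on a simple counting argument that is not in the slides: a burst of b consecutive bits touches at most 1 + ⌈(b − 1)/m⌉ symbols, so any burst of up to (t − 1)m + 1 bits affects at most t symbols.

```python
def rs_parameters(m: int, t: int):
    """Return (n, k) in symbols: n = 2^m - 1 and n - k = 2t."""
    n = 2 ** m - 1
    k = n - 2 * t
    if k <= 0:
        raise ValueError("t is too large for this symbol size")
    return n, k


def guaranteed_burst_bits(m: int, t: int) -> int:
    """Longest bit burst that always fits within t symbols of m bits."""
    return (t - 1) * m + 1


# Hypothetical example: 8-bit symbols, t = 16 correctable symbol errors.
n, k = rs_parameters(8, 16)
print(f"RS code: n = {n} symbols, k = {k} symbols, {n * 8} bits per block")
print("burst of up to", guaranteed_burst_bits(8, 16), "bits is corrected")
```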


97. Convolutional Codes

• A sequential machine:

[Figure: shift-register encoder with input ui and register cells ui-1, ui-2; modulo-2 adders produce the two outputs x1 and x2]

• The register content ui-1ui-2 defines the state of the machine at instant i:

ui-1ui-2   State
00         S0
01         S2
10         S1
11         S3


98. Convolutional Codes

• Transition rules from state to state:

[Figure: transitions between the states S0, S1, S2 and S3]

• Each transition is labelled (ui / x1, x2).


99. Convolutional Codes

[Figure: state diagram of the encoder with states S0, S1, S2 and S3; each branch is labelled (ui / x1, x2). Labels appearing in the figure: (0/0,0), (1/1,1), (0/1,1), (1/0,0), (0/0,1), (1/1,0), (0/1,0), (1/0,1).]
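The sequential machine can be sketched as a two-cell shift register. The exact adder connections are not recoverable from the text, so the taps used here, x1 = ui ⊕ ui-2 and x2 = ui ⊕ ui-1 ⊕ ui-2, are an assumption; they reproduce the eight branch labels (ui / x1, x2) of the state diagram. Function names are illustrative.

```python
# State = (u_{i-1}, u_{i-2}), named as in the state table above.
STATE_NAMES = {(0, 0): "S0", (1, 0): "S1", (0, 1): "S2", (1, 1): "S3"}


def step(state, u):
    """One encoder step: return (next_state, (x1, x2)).

    Assumed taps: x1 = u XOR u_{i-2}, x2 = u XOR u_{i-1} XOR u_{i-2}.
    """
    s1, s2 = state
    x1 = u ^ s2
    x2 = u ^ s1 ^ s2
    return (u, s1), (x1, x2)


def encode(bits, state=(0, 0)):
    """Encode a bit sequence at rate 1/2, starting from state S0."""
    out = []
    for u in bits:
        state, (x1, x2) = step(state, u)
        out += [x1, x2]
    return out


def transition_labels():
    """List (from_state, to_state, label) for the eight transitions."""
    labels = []
    for state in STATE_NAMES:
        for u in (0, 1):
            nxt, (x1, x2) = step(state, u)
            labels.append((STATE_NAMES[state], STATE_NAMES[nxt], f"({u}/{x1},{x2})"))
    return labels


for src, dst, label in transition_labels():
    print(src, "->", dst, label)
print(encode([1, 0, 0]))
```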


100. Convolutional Codes

• Thus a convolutional code (n, k, ν) is a code of rate k/n with $2^\nu$ states in its trellis representation.

• A convolutional code has a minimum Hamming distance dH,min.

• This distance can be evaluated by inspecting the trellis when its structure is relatively simple.


101. Viterbi Algorithm

• The Viterbi Algorithm can be used to decode a convolutionally coded sequence, taking advantage of the inherent trellis structure of the code.

• The Viterbi Algorithm has been proved to perform MLSE (maximum likelihood sequence estimation) and to be asymptotically optimal.

• For BPSK modulation and a convolutional code (n, k) with free distance dfree, the lower bound on the error probability is:

$P_e \ge N_{free}\, Q\left(\frac{d_{E,free}}{2\sigma}\right) = N_{free}\, Q\left(\sqrt{\frac{2 E_b k\, d_{free}}{N_0 n}}\right)$

where $d_{E,free}$ is the Euclidean distance corresponding to the free Hamming distance dfree.


102. Viterbi Algorithm

• The practical implementation of the Viterbi Algorithm makes convolutional codes widely used in communication systems.

• High performance is achieved with a small amount of complexity.

• For a given constraint length, decoder complexity grows linearly with n (with a direct computation of the MLSE, complexity would grow exponentially with n).

• The soft decoding Viterbi Algorithm is commonly used, and improves performance by up to 3 dB over the hard decoding technique.
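A hard-decision version of the Viterbi Algorithm for the 4-state encoder can be sketched as follows. Several points are assumptions of this sketch rather than statements from the slides: the encoder taps (x1 = ui ⊕ ui-2, x2 = ui ⊕ ui-1 ⊕ ui-2), the use of the Hamming distance as branch metric, and a message terminated by two zero bits so that the trellis starts and ends in S0.

```python
STATES = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (u_{i-1}, u_{i-2})
INF = float("inf")


def step(state, u):
    """Assumed encoder: x1 = u ^ u_{i-2}, x2 = u ^ u_{i-1} ^ u_{i-2}."""
    s1, s2 = state
    return (u, s1), (u ^ s2, u ^ s1 ^ s2)


def encode(bits):
    state, out = (0, 0), []
    for u in bits:
        state, x = step(state, u)
        out += list(x)
    return out


def viterbi_decode(received):
    """Return (decoded_bits, path_metric) for a terminated trellis.

    At each step only the best path entering each state (the survivor)
    is kept, so the work per step is constant for a fixed number of states.
    """
    metric = {s: (0 if s == (0, 0) else INF) for s in STATES}
    paths = {s: [] for s in STATES}
    for i in range(len(received) // 2):
        r = received[2 * i: 2 * i + 2]
        new_metric = {s: INF for s in STATES}
        new_paths = {s: [] for s in STATES}
        for s in STATES:
            if metric[s] == INF:
                continue  # state not reachable yet
            for u in (0, 1):
                nxt, x = step(s, u)
                # Branch metric: Hamming distance to the received pair.
                cand = metric[s] + (x[0] ^ r[0]) + (x[1] ^ r[1])
                if cand < new_metric[nxt]:
                    new_metric[nxt] = cand
                    new_paths[nxt] = paths[s] + [u]
        metric, paths = new_metric, new_paths
    return paths[(0, 0)], metric[(0, 0)]  # terminated: end in S0


message = [1, 0, 1, 1, 0, 0, 1, 0, 0]  # the last two zeros flush the register
coded = encode(message)
noisy = list(coded)
noisy[5] ^= 1  # one channel error
print(viterbi_decode(noisy))
```

A soft-decision variant would keep the same structure and replace the Hamming branch metric with a Euclidean one computed on the unquantized channel outputs.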


ANNEX 1: Comparison between digital modulations
• We define the spectral efficiency η of the transmitted waveform signaling:

$\eta \triangleq \frac{R}{B}$ bit/sec/Hz

• Let $C_s = B \log_2\left(1 + \frac{E_b R}{N_0 B}\right)$ bits/sec and $\eta_c \triangleq \frac{C_s}{B}$ bit/sec/Hz

• Theorem A.1: To transmit information reliably on an additive white Gaussian noise channel with spectral efficiency η, any digital communication system requires a signal-to-noise ratio satisfying:

$\frac{E_b}{N_0} \ge \frac{2^{\eta} - 1}{\eta}$


ANNEX 1: Comparison between digital modulations
• For B → ∞ (η → 0), the limit of $C_s = B \log_2\left(1 + \frac{E_b R}{N_0 B}\right)$ yields:

$C_\infty = \frac{E_b R}{N_0} \log_2 e$ bits/s

 As the information source rate R must be less than the channel capacity (to ensure an error-free transmission, i.e. Pe → 0):

$R \le C_\infty \Rightarrow \frac{E_b}{N_0} \ge \frac{1}{\log_2 e} = 0.693$

$\frac{E_b}{N_0} \ge -1.6$ dB
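The bound of Theorem A.1 and its limit for η → 0 can be evaluated numerically. This is a minimal sketch with illustrative function names; the list of spectral efficiencies is arbitrary.

```python
import math


def min_ebn0(eta: float) -> float:
    """Smallest Eb/N0 (linear scale) at spectral efficiency eta: (2^eta - 1) / eta."""
    return (2.0 ** eta - 1.0) / eta


def to_db(x: float) -> float:
    return 10.0 * math.log10(x)


for eta in (0.001, 0.5, 1.0, 2.0, 4.0, 6.0):
    print(f"eta = {eta:5.3f} bit/s/Hz -> Eb/N0 >= {to_db(min_ebn0(eta)):6.2f} dB")

# Limit for eta -> 0: 1 / log2(e) = ln 2, i.e. about -1.6 dB.
print("limit:", math.log(2.0), "=", round(to_db(math.log(2.0)), 2), "dB")
```

The required Eb/N0 increases with η, which corresponds to moving from the power-limited region to the bandwidth-limited region.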


ANNEX 1: Comparison between digital modulations

[Figure: spectral efficiency (bit/s/Hz, log scale from 10^-1 to 10^1) versus Eb/N0 (-5 to 25 dB), at Pe = 10^-5. The channel capacity curve bounds the region where error-free transmission is possible, $\frac{E_b}{N_0} \ge \frac{2^{\eta} - 1}{\eta}$, with an asymptote at -1.6 dB. Bandwidth-limited region: BPSK, QPSK, 8PSK, 16PSK, 16QAM, 64QAM. Power-limited region: orthogonal signals with coherent detection, M = 8, 16, 32, 64.]


ANNEX 2: The noisy channel coding theorem
Shannon’s informal interpretation
• Let us consider a source with alphabet X matched to the channel in such a way that the source achieves the channel capacity C, the entropy rate of the source being H(X).
 Each source sequence of length n is a codeword and is represented by a point in the figure below.
 We know that for large n, there are approximately $2^{nH(X)}$ typical input sequences x, each having probability $2^{-nH(X)}$, and similarly $2^{nH(Y)}$ typical output sequences y, each having probability $2^{-nH(Y)}$, and finally $2^{nH(X,Y)}$ typical pairs (x, y).
 For each output sequence y, there are $2^{n[H(X,Y) - H(Y)]} = 2^{nH(X|Y)}$ input sequences x such that (x, y) is a typical pair. S will be the set of the $2^{nH(X|Y)}$ input sequences x associated with y.

[Figure: the $2^{nH(X)}$ typical input sequences on one side and the $2^{nH(Y)}$ typical output sequences on the other; each output sequence is connected to a fan of $2^{nH(X|Y)}$ input sequences]


ANNEX 2: The noisy channel coding theorem
Shannon’s informal interpretation
• Let us now consider another source with entropy rate R ≤ C ≤ H(X) delivering sequences or codewords of length n. This source will have $2^{nR}$ high-probability sequences. We wish to associate each of these sequences with one of the possible channel inputs in such a way as to get an arbitrarily small error probability. One way is to randomly associate each source sequence with a channel input sequence, and calculate the frequency of errors.
• If a codeword x(i) is transmitted through the channel and the sequence y is received, an error in decoding is possible only if at least one codeword x(j), j ≠ i, belongs to the set S associated with y:

$P\{\text{at least one } x^{(j)},\ j \ne i, \text{ belongs to } S\} \le \sum_{j=1,\ j \ne i}^{2^{nR}} P\{x^{(j)} \in S\} \le 2^{nR}\, \frac{2^{nH(X|Y)}}{2^{nH(X)}} = \frac{2^{nR}}{2^{nC}} \to 0$ as $n \to \infty$

The last equality uses C = H(X) − H(X|Y); the bound tends to 0 when R < C.