
Information Theory and Channel Coding

Wadih Sawaya

General Introduction

2. Communication Systems

• A communication system conveys information from a source to some destination.
• There exists between the source and the destination a communication channel affected by various disturbances.

Shannon's paradigm

3. Communication Systems

• The source emits information by means of a sequence of symbols.
• The destination aims to reproduce the exact emitted sequence in order to extract information.
• The channel, affected by disturbances, may introduce changes in the emitted sequence.

4. Communication Systems

The communication system is asked to:
• Keep a low error rate in the reproduced sequence. Different user requirements may lead to different criteria of acceptability. Ex: speech transmission, data, audio/video, ...
• Use the channel efficiently, because:
  - The use of a channel is costly.
  - The channel employs different limited resources (time, frequency, power...).

5. Communication Systems

SOURCE → Source Coding → Channel Coding → CHANNEL → Channel Decoding → Source Decoding → User

• Source coding: provide "on average" the shortest description of the emitted sequences ⇒ higher information rate.
• Channel: generates disturbances.
• Channel coding: protect the information from errors induced by the channel, by voluntarily adding redundancy to the information ⇒ higher quality of transmission.

Course Contents

Part I – An Information measure.

PREAMBLE

In 1948 C. E. Shannon developed a "Mathematical Theory of Communication", called information theory. This theory deals with the most fundamental aspects of a communication system. It relies on probability theory and is primarily concerned with encoders and decoders, in terms of their functional role and in terms of the existence of encoders and decoders that achieve a given level of performance. The latter aspect of the theory is established by means of two fundamental theorems.

As in any mathematical theory, this theory deals only with mathematical models and not with physical sources and physical channels. To proceed we will study the simplest classes of mathematical models of sources and channels. Naturally the choice of these models will be influenced by the most important existing real physical sources and physical channels.

The theory does not give explicit constructions for coding and decoding, but it provides the important relationships established by the theory, which appear to be useful indications of the tradeoffs that exist in constructing encoders and decoders.

9. An Information measure

• A discrete source delivers a sequence of symbols from the alphabet {x1, x2, ..., xM}. Each symbol of this sequence is thus a random outcome s taking values from the finite set of outcomes recognized as the alphabet of the source, say {x1, x2, ..., xM}.
• Each outcome s = xi corresponds to one particular symbol of the set.
• A probability measure Pk is associated to each symbol:

Pk = P(s = xk),  1 ≤ k ≤ M ;  ∑_{k=1}^{M} Pk = 1

10. An Information measure

• The information content of a symbol is related to the cost of transmitting it, and to its uncertainty.
• Example: In the city of Madrid, in July, the weather prediction "Rain" contains much more information than the event "Sunny".
• The information content Q(xi) of a symbol decreases with the probability of its realization:

Q(xi) > Q(xj) ⇔ P(xi) < P(xj)

• The information content of two independent symbols will be the sum of their two individual information contents:

P(xi, xj) = P(xi)P(xj) ⇔ Q(xi; xj) = Q(xi) + Q(xj)

11. An Information Measure

• The mathematical function that satisfies these two conditions is indeed the logarithm function:

Q(xi) ≜ log_a (1/Pi)

• The base (a) of the logarithm determines the unit of measure assigned to the information content. When the base a = 2, the unit is the "bit".

12. An Information Measure

• Examples:
1) The correct identification of one of two equally likely symbols, that is, P(x1) = P(x2) = 1/2, conveys an amount of information equal to Q(x1) = Q(x2) = log2 2 = 1 bit of information.
2) The information content of each outcome when tossing a fair coin is Q("Head") = Q("Tail") = log2 2 = 1 bit of information.
3) Consider a binary source (symbols "1" and "0") with P(X="0") = 2/3 and P(X="1") = 1/3. The information content of each outcome is:

Q("0") = log2(1/(2/3)) = 0.585 bits ;  Q("1") = log2(1/(1/3)) = 1.585 bits
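A quick numeric check of these self-information values, as a minimal Python sketch:

```python
import math

def self_information(p: float) -> float:
    """Self-information Q = log2(1/p), in bits, of an outcome of probability p."""
    return math.log2(1.0 / p)

q0 = self_information(2 / 3)  # symbol "0", P = 2/3
q1 = self_information(1 / 3)  # symbol "1", P = 1/3
print(round(q0, 3), round(q1, 3))  # 0.585 1.585
```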

13. Entropy of a finite alphabet

• The entropy H(X) of a finite alphabet is the average information content over all its possible outcomes:

H(X) = ∑_{k=1}^{M} Pk Q(xk) = ∑_{k=1}^{M} Pk log2(1/Pk)

• The entropy is the average information content of the alphabet and is measured in bits/symbol.

14. Entropy of a finite alphabet

Example 1:

Alphabet: {x1, x2, x3, x4}
Probabilities: P1 = 1/2 ; P2 = 1/4 ; P3 = P4 = 1/8

⇒ Entropy: H(X) = ∑_{k=1}^{4} Pk log2(1/Pk) = 1.75 bits/symbol
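The entropy of Example 1 can be verified with a short sketch:

```python
import math

def entropy(probs):
    """H(X) = sum_k P_k * log2(1/P_k) in bits/symbol (terms with P_k = 0 contribute 0)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))  # 1.75 bits/symbol
```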

15. Entropy of a finite alphabet

• Example 2:

Alphabet of M equally likely symbols: Pk = 1/M  ∀ k ∈ {1, ..., M}

⇒ Entropy: H(X) = ∑_{k=1}^{M} (1/M) log2(M) = log2(M) bits/symbol

• Example 3:

Binary alphabet: {0, 1}, with p0 = Px ; p1 = 1 − Px

⇒ Entropy: H(X) = Px log2(1/Px) + (1 − Px) log2(1/(1 − Px)) ≜ Hf(Px)

16. Entropy of a finite alphabet

[Figure: entropy of a binary alphabet, Hf(Px) in bits/symbol versus the probability Px, rising from 0 at Px = 0 through its maximum of 1 bit at Px = 0.5 and back to 0 at Px = 1.]

17. Entropy of a finite alphabet

• The maximum occurs for Px = 0.5, that is, when the two symbols are equally likely. This result is fairly general: the entropy of an alphabet of M symbols satisfies the inequality:

H(X) ≤ log2 M

18. Conditional Entropy

• When observing a random variable jointly with another one, the conditional entropy H(X|Y) is defined as:

H(X|Y) = ∑_{k=1}^{M_X} ∑_{l=1}^{M_Y} P(X = xk, Y = yl) log2( 1 / P(X = xk | Y = yl) )

Example: joint distribution P(X = x, Y = y):

       X=1    X=2    X=3    X=4
Y=1    1/8    1/16   1/32   1/32
Y=2    1/16   1/8    1/32   1/32
Y=3    1/16   1/16   1/16   1/16
Y=4    1/4    0      0      0

TELECOM LILLE 1 - Février 2010 Information Theory and Channel Coding
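The conditional entropy of the joint distribution above can be computed directly; a minimal sketch:

```python
import math

# Joint distribution P(X=x, Y=y) from the example (rows: y = 1..4, columns: x = 1..4).
P = [
    [1/8,  1/16, 1/32, 1/32],
    [1/16, 1/8,  1/32, 1/32],
    [1/16, 1/16, 1/16, 1/16],
    [1/4,  0,    0,    0],
]

def conditional_entropy(joint):
    """H(X|Y) = sum_{x,y} P(x,y) * log2( P(y) / P(x,y) ), in bits/symbol."""
    h = 0.0
    for row in joint:            # one row per value of Y
        p_y = sum(row)           # marginal P(Y = y)
        for p_xy in row:
            if p_xy > 0:
                h += p_xy * math.log2(p_y / p_xy)
    return h

print(conditional_entropy(P))    # 1.375 bits/symbol
```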

19. Relative Entropy or Kullback-Leibler divergence.

• The entropy of a random variable is a measure of its uncertainty, or the amount of information needed on the average to describe it.
• The relative entropy D(p‖q) is a measure of the inefficiency of assuming that the distribution is q when the true one is p.
• The relative entropy between two probability mass functions p(x) and q(x) is defined as:

D(p‖q) = ∑_{x∈X} p(x) log2( p(x) / q(x) )

Example: Determine D(p‖q) for p(0) = p(1) = 1/2 and q(0) = 3/4, q(1) = 1/4.
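The example above evaluates to about 0.2075 bits; a sketch:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p||q) = sum_x p(x) * log2(p(x)/q(x)), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

d = kl_divergence([1/2, 1/2], [3/4, 1/4])
print(round(d, 4))  # 0.2075 bits
```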

20. Mutual Information

• The mutual information measures the amount of information that one random variable contains about another random variable.

Definition: Consider two random variables X and Y with a joint probability mass function p(x,y) and marginal probability mass functions p(x) and p(y). The mutual information I(X;Y) is the relative entropy between the joint distribution and the product distribution p(x)p(y):

I(X;Y) = ∑_{k=1}^{M_X} ∑_{l=1}^{M_Y} p(X = xk, Y = yl) log2( p(X = xk, Y = yl) / (p(X = xk) p(Y = yl)) )

Theorem 2:

I(X;Y) = H(X) − H(X|Y)
I(X;Y) = H(Y) − H(Y|X)
I(X;Y) = H(X) + H(Y) − H(X,Y)
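The definition can be evaluated numerically; the sketch below reuses the joint distribution of the slide-18 example, for which H(X) = 1.75 and H(X|Y) = 1.375, so I(X;Y) = 0.375 bits:

```python
import math

# Joint distribution from the slide-18 example (rows: y, columns: x).
P = [
    [1/8,  1/16, 1/32, 1/32],
    [1/16, 1/8,  1/32, 1/32],
    [1/16, 1/16, 1/16, 1/16],
    [1/4,  0,    0,    0],
]

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ), in bits."""
    px = [sum(row[i] for row in joint) for i in range(len(joint[0]))]  # marginal of X
    py = [sum(row) for row in joint]                                   # marginal of Y
    return sum(
        p_xy * math.log2(p_xy / (px[i] * py[j]))
        for j, row in enumerate(joint)
        for i, p_xy in enumerate(row)
        if p_xy > 0
    )

print(mutual_information(P))  # 0.375 bits = H(X) - H(X|Y) = 1.75 - 1.375
```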

21. Mutual Information

• An equivalent expression:

I(X;Y) = ∑_{k=1}^{M_X} ∑_{l=1}^{M_Y} p(X = xk, Y = yl) log2( p(X = xk | Y = yl) / p(X = xk) )

• The relationship between all these entropies is expressed in a Venn diagram: H(X) and H(Y) are two overlapping sets whose intersection is I(X;Y), the regions outside the overlap are H(X|Y) and H(Y|X), and the union of the two sets is H(X,Y).


22. Mutual Information

• Exercise: a box contains cubes and spheres of two colors, among them 10 white cubes and 40 white spheres. In order to quantify the amount of information that the geometrical form contains about the color, you have to determine the mutual information between the two random variables.
• The mutual information also quantifies the amount of information that can reliably pass through a communication channel.

23. Chain Rules

• The joint entropy of a pair of discrete random variables (X,Y) with joint distribution p(x,y) is defined as:

H(X,Y) = ∑_{k=1}^{M_X} ∑_{l=1}^{M_Y} P(X = xk, Y = yl) log2( 1 / P(X = xk, Y = yl) )

• Theorem 3 (Chain rule):

H(X,Y) = H(X) + H(Y|X)

• The conditional mutual information of random variables X and Y given Z is defined by I(X;Y|Z) = H(X|Z) − H(X|Y,Z), and the chain rule gives:

I(X1, X2; Y) = I(X1; Y) + I(X2; Y | X1)

24. Information inequalities

• Using Jensen's inequality (for a convex function f and a random variable X, E[f(X)] ≥ f(E[X])), we can prove the following inequality:

D(p‖q) ≥ 0

• and then:

I(X;Y) ≥ 0
H(X|Y) ≤ H(X)

25. Data Processing Inequality

• If X, Y, Z form a Markov chain (X→Y→Z):

P(x, y, z) = P(x) P(y|x) P(z|y)

then:

I(X;Y) ≥ I(X;Z)

No processing of Y can increase the information that Y contains about X.

• If X→Y→Z, then:

I(X;Y|Z) ≤ I(X;Y)

26. The discrete stationary source

• We have studied until now the average information content of a set of all possible outcomes recognized as the alphabet of the discrete source.
• We are interested in the information content per symbol in a long sequence of symbols delivered by the discrete source, regardless of whether the emitted symbols are correlated in time or not.
• The source can be identified as a stochastic process. A source is stationary if it has the same statistics regardless of the time origin.
• Let (X1, X2, ..., Xk) be a sequence of k non-independent random variables emitted by the source. The entropy per symbol of a sequence of k symbols is fairly defined as:

Hk(X) = (1/k) H(X1, X2, ..., Xk)

27. The discrete stationary source

• The entropy rate of the source is the limit of the information content per source symbol, that is:

H∞(X) = lim_{k→∞} (1/k) H(X1, ..., Xk)  bits/symbol

• Theorem 6: For a stationary source this limit exists and is equal to the limit of the conditional entropy:

lim_{k→∞} H(Xk | Xk−1, ..., X1)

• For a discrete memoryless source (DMS), each symbol emitted is independent of all previous ones and the entropy rate of the source is equal to the entropy of the alphabet of the source:

H∞(X) = H(X)

28. Entropy of a continuous ensemble

• The symbol delivered by the source is a continuous random variable x taking values in the set of real numbers, with a probability density function p(x). The (differential) entropy is:

H(X) = −∫_{−∞}^{+∞} p(x) log2 p(x) dx

• Let x be a continuous random variable with probability density function p(x). If x has a finite variance σx², then H(X) exists and satisfies the inequality:

H(X) ≤ (1/2) log2(2πe σx²)

with equality if and only if X ~ N(µ, σx²).

Part I – An Information measure.

30. Coding of the source alphabet.

• Suppose that we want to transmit each symbol using a binary channel (a channel able to communicate binary symbols). We represent each symbol of the source by a finite string of binary digits (a codeword).
• We want to transmit a message in the shortest possible time. This implies representing each symbol with as short a codeword as possible.
• More generally, the best source coding is the one that has "on average" the shortest description length assigned to each message to be transmitted by the source.

31. Coding of the source alphabet.

• Each symbol is assigned a codeword with a different length. The average length over all codewords is:

n̄ ≜ ∑_{k=1}^{M} Pk nk

where nk is the length of the codeword associated with symbol xk of probability Pk.
• A good source code achieves as small as possible an "average length" of binary codeword strings (concise messages).
• The code must also be uniquely decodable: in other words, any sequence of codewords has only one possible sequence of source symbols producing it.

32. Coding of the source alphabet.

• Example: a code that is not uniquely decodable:

Symbol  codeword
x1      0
x2      01
x3      10
x4      100

The received string 010010 can be produced by any of five messages: x1x3x2x1, x1x3x1x3, x1x4x3, x2x1x1x3 or x2x1x2x1.
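The ambiguity can be made explicit by enumerating every possible parse of the received string 010010 under this code; a short sketch:

```python
def parses(s, codebook, prefix=()):
    """Enumerate every way of splitting the binary string s into codewords."""
    if not s:
        yield prefix
    for symbol, word in codebook.items():
        if s.startswith(word):
            yield from parses(s[len(word):], codebook, prefix + (symbol,))

code = {"x1": "0", "x2": "01", "x3": "10", "x4": "100"}
for message in parses("010010", code):
    print("".join(message))   # prints the five ambiguous messages
```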

33. Coding of the source alphabet.

• A prefix condition code satisfies the constraint: « no codeword is the prefix of a longer codeword ». Codes satisfying this constraint are called prefix codes. Example:

Symbol  Code Word
x1      0
x2      10
x3      110
x4      111

• Theorem 8 (Kraft Inequality): If the integers n1, n2, ..., nK satisfy the inequality

∑_{k=1}^{K} 2^(−nk) ≤ 1

then a prefix binary code exists with these integers as codeword lengths.

Note: The theorem does not say that any code whose lengths satisfy this inequality is a prefix code.
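The Kraft inequality is a one-line check; a sketch:

```python
def kraft_sum(lengths):
    """Kraft sum sum_k 2^(-n_k); a prefix binary code with these lengths exists iff the sum is <= 1."""
    return sum(2 ** -n for n in lengths)

print(kraft_sum([1, 2, 3, 3]))  # lengths of the prefix code above -> 1.0
print(kraft_sum([1, 1, 2]))     # 1.25 > 1: no prefix binary code has these lengths
```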

34. Bound on optimal codelength.

• A prefix condition code can be found for any alphabet of entropy H(X), with an average codeword length satisfying the inequality:

H(X) ≤ n̄ < H(X) + 1

• We can define the efficiency of a code as: ε ≜ H(X) / n̄

35. Source Coding example:

The Huffman Coding algorithm.

1. List the symbols in order of decreasing probability.
2. Group the last two symbols xM and xM−1 into an equivalent symbol, with probability PM + PM−1.
3. Repeat steps 1 and 2 on the reduced alphabet until a single equivalent symbol remains, building a binary tree.
4. Associate the binary digits 0 and 1 to each pair of branches in the tree departing from intermediate nodes.
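The algorithm can be sketched compactly with Python's heapq module (the tie-breaking counter is an implementation detail; ties may be grouped differently than in the slides, but the average codeword length is unaffected):

```python
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return the Huffman codeword length of each symbol, given the symbol probabilities."""
    tiebreak = count()                          # avoids comparing lists when probabilities tie
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)         # step 2: the two least probable groups
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:                       # step 4: one more digit for every member
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), s1 + s2))
    return lengths

probs = [0.45, 0.35, 0.1, 0.1]                  # the example of the next slide
lengths = huffman_lengths(probs)
print(lengths)                                  # [1, 2, 3, 3] -> average length 1.75 digits/symbol
```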

36. Huffman Coding algorithm.

• Example:

Example: H(X) = 1.712 bits/symbol

Symbol  Probability  Huffman codeword
x1      0.45         0
x2      0.35         10
x3      0.1          110
x4      0.1          111

Fixed length code: n̄ = 2 digits/symbol
Huffman coding: n̄ = 1.75 digits/symbol ⇒ ε = 98%

37. The Asymptotic Equipartition Property (AEP)

• The AEP is the analog of the weak law of large numbers, which states that for independent, identically distributed (i.i.d.) random variables, the sample mean (1/n) ∑_{i=1}^{n} xi will approach its statistical mean E[X] with probability 1, as n tends toward infinity.

• Theorem 10: If X1, X2, ..., Xn are i.i.d. ~ p(x), then:

−(1/n) log2 p(X1, X2, ..., Xn) → H(X)  in probability

• The typical set Aε^(n) is the set of sequences (x1, x2, ..., xn) such that:

Aε^(n) = { (x1, x2, ..., xn) : 2^(−n(H(X)+ε)) ≤ p(x1, x2, ..., xn) ≤ 2^(−n(H(X)−ε)) }


38. The Asymptotic Equipartition Property (AEP)

• Theorem 11: If (x1, x2, ..., xn) ∈ Aε^(n) then:

1. H(X) − ε ≤ −(1/n) log2 p(x1, x2, ..., xn) ≤ H(X) + ε
2. Pr{ Aε^(n) } > 1 − ε  for n sufficiently large
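A small numeric illustration for a hypothetical Bernoulli(1/4) source: for a sequence whose empirical frequency of ones equals the source probability exactly, the per-symbol log-probability equals H(X) exactly, so the sequence belongs to the typical set for any ε > 0:

```python
import math

p = 1 / 4                                   # hypothetical Bernoulli source: P(1) = 1/4
h = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))   # H(X) ≈ 0.811 bits

seq = [1, 0, 0, 0, 1, 0, 0, 0]              # n = 8 symbols, 2 ones: empirical frequency = p
prob = math.prod(p if s == 1 else 1 - p for s in seq)
per_symbol = -math.log2(prob) / len(seq)    # -(1/n) log2 p(x1, ..., xn)

print(round(per_symbol, 3), round(h, 3))    # 0.811 0.811
```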

39. The Asymptotic Equipartition Property (AEP)

Data compression using the typical set:
• Non-typical sequences of Xⁿ: indexing requires no more than n·log|X| binary elements, prefixed by 1.
• Typical set Aε^(n), with |Aε^(n)| ≤ 2^(n(H(X)+ε)): indexing requires no more than n(H(X)+ε)+1 binary elements, prefixed by 0.

• Theorem 12: Let X1, X2, ..., Xn be i.i.d. ~ p(x), and let ε > 0. There exists a code which maps sequences (x1, ..., xn) into binary strings such that, for n sufficiently large, the average length per source symbol satisfies:

n̄ = (1/n) ∑_{Xⁿ} P(x1, ..., xn) n(x1, ..., xn) ≤ H(X) + ε

40. Encoding the stationary source

• Until now we did not take into account the possible interdependency between symbols emitted at different times. Recall the entropy per symbol of a block of k symbols:

Hk(X) = (1/k) H(X1, ..., Xk)

• Theorem 13: It is possible to encode sequences of k source symbols into a prefix condition code in such a way that the average number of digits per source symbol n̄ satisfies:

Hk(X) ≤ n̄ < Hk(X) + 1/k

• Increasing the block length k makes the code more efficient, and thus for any δ > 0 it is possible to choose k large enough so that n̄ satisfies:

H∞(X) ≤ n̄ < H∞(X) + δ

41. Huffman Coding algorithm (2).

Huffman coding, for k = 1:

Symbol  Probability  Codeword
x1      0.45         0
x2      0.35         10
x3      0.2          11

n̄ = 1.55 digits/symbol, ε = 97.9%

42. Huffman Coding algorithm (2).

Example: Huffman code for the source Y = Xᵏ, k = 2 (blocks of two symbols):

Pair      Probability  Codeword
(x1,x1)   0.2025       10
(x1,x2)   0.1575       001
(x2,x1)   0.1575       010
(x2,x2)   0.1225       011
(x1,x3)   0.09         111
(x3,x1)   0.09         0000
(x2,x3)   0.07         0001
(x3,x2)   0.07         1100
(x3,x3)   0.04         1101

Huffman coding of alphabet Y: n̄k = 3.0675 digits/pair

Average length per symbol from set X:

n̄ = n̄k / k = 1.534 digits/symbol,  ε = 99%

43. Huffman Coding algorithm

Exercise: Let X be the source alphabet with X= {A,B,C,D,E}, and

probabilities 0.35, 0.1, 0.15, 0.2, 0.2 respectively. Construct the binary

Huffman code for this alphabet and compute its efficiency.

Part I – An Information measure.

45. Introduction

• A communication channel is used to connect the source of information and its user.

Discrete Source → Source Encoder → Channel Encoder → Modulator → Transmission Channel → Demodulator → Channel Decoder → Source Decoder → User

• Between the channel encoder output and the channel decoder input, we may consider a discrete channel. The input and output of the channel are discrete alphabets; a practical example is the binary channel.
• Between the channel encoder output and the input of the demodulator we may consider a continuous channel with a discrete input alphabet.
• As a practical example, the AWGN (Additive White Gaussian Noise) channel is well known, and is completely characterized by the probability distribution of the noise.

46. The discrete memoryless channel.

A discrete memoryless channel is characterized by:
• An input alphabet: X = {xi}, i = 1, ..., NX
• An output alphabet: Y = {yj}, j = 1, ..., NY
• Transition probabilities pij connecting each input xi to each output yj (p11 from x1 to y1, p12 from x1 to y2, p21 from x2 to y1, ..., pNX NY from xNX to yNY).

47. Discrete memoryless channel.

• pij ≜ P(yj | xi) represents the probability of receiving the symbol yj, given that the symbol xi has been transmitted.
• The channel is memoryless: for sequences of n transmitted and received symbols respectively,

P(y1, y2, ..., yn | x1, x2, ..., xn) = ∏_{i=1}^{n} P(yi | xi)

48. Discrete memoryless channel.

• Example 1:

The binary channel: NX = NY = 2, with transition probabilities p11, p12, p21, p22.

Obviously we have the relationship:

∑_{j=1}^{NY} pij = 1,  i.e.  p11 + p12 = 1 and p21 + p22 = 1

When p12 = p21 = p the channel is called a binary symmetric channel (BSC): x1 → y1 and x2 → y2 with probability 1 − p, and x1 → y2, x2 → y1 with probability p.

49. Discrete memoryless channel.

• The channel is described by the transition matrix P:

P ≜ [ p11   p12   ⋯  p1NY
      p21   p22   ⋯  p2NY
      ⋮     ⋮     ⋱  ⋮
      pNX1  pNX2  ⋯  pNXNY ]

The sum of the elements in each row of P is 1: ∑_{j=1}^{NY} pij = 1

50. Discrete memoryless channel.

• Example 2:

The noiseless channel: NX = NY = N

pij = 1 for i = j, and pij = 0 for i ≠ j

P = [ 1  0  ⋯  0
      0  1     ⋮
      ⋮     ⋱
      0  ⋯     1 ] = I

The symbols of the input alphabet are in one-to-one correspondence with the symbols of the output alphabet.

51. Discrete memoryless channel.

• Example 3:

The useless channel: NX = NY = N

P(yj | xi) = 1/N  ∀ j, i   (P is the N×N matrix with all entries equal to 1/N)

P(yj) = ∑_i P(yj | xi) P(xi) = (1/N) ∑_i P(xi) = 1/N

⇒ P(yj | xi) = P(yj)  ∀ j, i

The matrix P has identical rows. The useless channel completely scrambles all input symbols, so that the received symbol gives no useful information to decide upon the transmitted one:

P(yj | xi) = P(yj) ⇔ P(xi | yj) = P(xi)

52. Conditional Entropy

• Definition:

The conditional entropy H(X|Y) measures the average information quantity

needed to specify the input symbol X when the output (or received) symbol is

known.

H(X|Y) ≜ ∑_{i=1}^{NX} ∑_{j=1}^{NY} P(xi, yj) log2( 1 / P(xi | yj) )  bits/symbol

• This conditional entropy represents the average amount of information that has been

lost in the channel, and it is called equivocation.

• Examples:

The noiseless channel: H(X|Y) = 0

No loss in the channel.

The useless channel: H(X|Y) = H(X)

All transmitted information is lost on the channel

53. The average mutual information

• Consider a source with alphabet X transmitting through a channel having the same

input alphabet.

• A basic point is the knowledge of the average information flow that can reliably pass

through the channel.

[Diagram: emitted message → CHANNEL → received message; part of the average information is lost in the channel (the equivocation).]

• Remark: We can define the average information at the output end of the channel:

H(Y) ≜ ∑_{j=1}^{NY} P(yj) log2( 1 / P(yj) )  bits/symbol

54. The average mutual information

• We define the average information flow (the average mutual information between X and Y) through the channel:

I(X;Y) ≜ H(X) − H(X|Y)  bits/symbol

Note that: I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)

Remark: The mutual information has a more general definition than "an information flow". It is the average information provided about the set X by the set Y, excluding all average information about X from X itself (the average self-information is H(X)).

55. The average mutual information

For the BSC with crossover probability p, compute I(X;Y) = H(Y) − H(Y|X):

1. H(Y|X) ≜ ∑_{i=1}^{NX} ∑_{j=1}^{NY} P(xi, yj) log2( 1 / P(yj | xi) )  bits/symbol

H(Y|X) = P(x1,y1) log2(1/P(y1|x1)) + P(x1,y2) log2(1/P(y2|x1)) + P(x2,y1) log2(1/P(y1|x2)) + P(x2,y2) log2(1/P(y2|x2))

with P(yj, xi) = P(yj | xi) P(xi) = p·P(xi) for i ≠ j and (1 − p)·P(xi) for i = j, and p12 = p21 = p:

⇒ H(Y|X) = p log2(1/p) + (1 − p) log2(1/(1 − p)) = Hf(p)

with P(x1) = 1 − P(x2).

56. Mutual Information

2. H(Y) = P(y1) log2(1/P(y1)) + P(y2) log2(1/P(y2))

with P(yj) = ∑_i P(yj | xi) P(xi):

P(y1) = p + P(x1)(1 − 2p)
P(y2) = (1 − p) − P(x1)(1 − 2p)

[Figure: mutual information of a BSC, I(X;Y) in bits/symbol versus P(x1), one curve per crossover probability p = 0, 0.1, 0.2, 0.3, 0.5; each curve peaks at P(x1) = 0.5, and the maximum decreases from 1 bit (p = 0) down to 0 (p = 0.5).]

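The two steps of the derivation translate directly into a short sketch of I(X;Y) for the BSC:

```python
import math

def hf(p):
    """Binary entropy function Hf(p) in bits."""
    return sum(x * math.log2(1 / x) for x in (p, 1 - p) if x > 0)

def bsc_mutual_information(p, px1):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with crossover p and input distribution (px1, 1-px1)."""
    py1 = p + px1 * (1 - 2 * p)      # P(y1) from step 2
    return hf(py1) - hf(p)           # H(Y|X) = Hf(p) for the BSC (step 1)

# The maximum over the input distribution occurs at P(x1) = 0.5:
print(bsc_mutual_information(0.1, 0.5))   # 1 - Hf(0.1) ≈ 0.531 bits
```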

57. Capacity of a discrete memoryless channel.

• Considering the set of curves I(X;Y) as a function of P(x1), we can observe that the maximum of I(X;Y) is always obtained for P(x1) = P(x2) = 0.5, i.e. when the input symbols are equally likely.
• The capacity is the maximum information flow through the channel that a communication system can theoretically expect. It is obtained by maximizing I(X;Y) over the distribution of the input symbols:

C ≜ Max_{P(x)} I(X;Y)

58. Capacity of a discrete memoryless channel.

• For the BSC this capacity is obtained when the channel input symbols are equally likely: C = 1 − Hf(p).
• The same holds for symmetric discrete memoryless channels (NX inputs): with

P(xi) = 1/NX  for all i = 1, ..., NX

the capacity is achieved by using the inputs with equal probability.
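For the BSC the maximization yields the standard closed form C = 1 − Hf(p), sketched below:

```python
import math

def bsc_capacity(p):
    """Capacity of the binary symmetric channel: C = 1 - Hf(p) bits per channel use."""
    hf = sum(x * math.log2(1 / x) for x in (p, 1 - p) if x > 0)
    return 1 - hf

print(bsc_capacity(0.0), bsc_capacity(0.11), bsc_capacity(0.5))
```

At p = 0 the channel is noiseless (C = 1 bit/use); at p = 0.5 it is useless (C = 0).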

59. Capacity of a discrete memoryless channel.

• Example: the Binary Symmetric Channel obtained from BPSK over an AWGN channel:

Discrete Source → Source Encoder {1, 0} → BPSK (pulse h(t), carrier cos(2πf0t)) → AWGN channel → matched filter h(−t), cos(2πf0t) → decision {1, 0} → Source Decoder → User

The resulting crossover probability, with P(0) = P(1) = 0.5, is:

p = Q( √(2Eb/N0) )

60. Capacity of a discrete memoryless channel.

• Example: The Binary Symmetric Channel.

[Figure: capacity of the BSC with P(x1) = P(x2) = 0.5 — I(X;Y) in bits/symbol versus SNR (dB), increasing from about 0.2 at low SNR towards 1 bit/symbol beyond roughly 10 dB.]

61. Capacity of the additive Gaussian channel

• The channel disturbance has the form of a continuous Gaussian random variable ν with variance σν², added to the transmitted signal: Y = X + ν, with ν ~ N(0, σν²).
• The assumption that the noise is Gaussian is desirable from the mathematical point of view, and is reasonable in a wide variety of physical settings.
• In order to study the capacity of the AWGN channel, we drop the hypothesis of a discrete input alphabet and we consider the input X as a continuous random variable with variance σX².

62. Capacity of the additive Gaussian channel

• We recall the expression of the capacity:

C ≜ Max_{P(x)} I(X;Y),   with I(X;Y) = H(Y) − H(Y|X)

• The result is:

C = (1/2) log2( 1 + σX²/σν² )

• The capacity of the continuous additive Gaussian channel is achieved when the continuous input has a Gaussian probability distribution.

63. Capacity of a bandlimited Gaussian channel with waveform input

• We deal now with a waveform input signal in a channel bandlimited to the frequency interval (−B, +B).
• The noise is white and Gaussian with two-sided power spectral density N0/2. In the band (−B, +B), the noise mean power is σν² = (N0/2)·(2B) = N0·B.
• For a zero-mean and stationary input, each sample has a variance σX² equal to the signal power P, i.e. σX² = P.
• Using the sampling theorem we can represent the signal using at least 2B samples per second. Transmitting at a rate of 2B samples per second, we express the capacity in bits/sec as:

Cs = B log2( 1 + P/(N0·B) )  bits/sec
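A direct evaluation of the formula (the channel parameters below are hypothetical, chosen to resemble a telephone-grade channel):

```python
import math

def awgn_capacity(bandwidth_hz, signal_power, n0):
    """Shannon capacity Cs = B * log2(1 + P / (N0 * B)), in bits/sec, of a bandlimited AWGN channel."""
    return bandwidth_hz * math.log2(1 + signal_power / (n0 * bandwidth_hz))

# Hypothetical channel: B = 3000 Hz, SNR = P / (N0 * B) = 1000 (30 dB).
b, n0 = 3000.0, 1.0
print(awgn_capacity(b, 1000 * n0 * b, n0))  # ≈ 29,900 bits/sec
```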

64. Capacity of a bandlimited Gaussian channel with waveform input

[Figure: AWGN capacity per symbol, C = (1/2) log2(1 + SNR), in bits/symbol versus SNR (dB), increasing from about 0.5 at 0 dB to about 3.3 at 20 dB.]

Part I – An Information measure.

66. The noisy channel coding theorem

• In its most general definition, channel coding is the operation of mapping each sequence emitted from a source to another sequence belonging to the set of all possible sequences that the channel can convey. The functional role of channel coding in a communication system is to ensure reliable communication. The performance limits of this coding are stated in the fundamental channel coding theorem.
• The noisy channel coding theorem introduced by C. E. Shannon in 1948 is one of the most important results in information theory.
• In imprecise terms, this theorem states that if a noisy channel has capacity Cs in bits per second, and if binary data enters the channel encoder at a rate Rs < Cs, then by an appropriate design of the encoder and decoder it is possible to reproduce the emitted data after decoding with a probability of error as small as desired.
• This limit does not refer to the signalling rate of the communication system, but rather to the information rate that can be transmitted through the channel.

67. The noisy channel coding theorem

• This result highlights the significance of the channel capacity. Let us recall the average information rate that passes through the channel:

I(X;Y) ≜ H(X) − H(X|Y)  bits/symbol

where X and Y are the input and output alphabets of the channel respectively.
• The capacity C is defined as the maximum of I(X;Y). The maximum is taken over all input distributions [P(x1), P(x2), ...].
• If an attempt is made to transmit at a higher rate than C, say C + r, then there is necessarily an equivocation equal to or greater than r.
• Theorem 16: Let a discrete channel have a capacity C and a discrete source have an entropy rate R. If R ≤ C there exists a coding system such that the output of the source can be transmitted over the channel with an arbitrarily small frequency of errors (or an arbitrarily small equivocation). If R > C there is no method of encoding which gives an equivocation less than R − C. (Shannon 1948)

68. The noisy channel coding theorem

• To prove the theorem, Shannon shows that a code having this desired property must exist in a certain group of codes. Shannon proposed to average the frequency of errors over this group of codes, and shows that this average can be made arbitrarily small.
• Hence the noisy channel coding theorem asserts the existence of such a code but does not exhibit a way of constructing it.
• Consider a random mapping of each sequence of the source to a possible channel sequence. One can then compute the average error probability over an ensemble of long sequences of the channel. This gives rise to an upper-bounded average error probability:

P(e) < 2^(−n·E(R))

• E(R) is a convex ∪, decreasing function of R, with 0 < R < C, and n is the length of the emitted sequences.

69. The noisy channel coding theorem

• In order to make this bound as small as desired, the exponent factor n·E(R) has to be as large as possible. A typical behavior of E(R) is shown in the figures below.

[Figures: E(R) versus R; E(R) decreases to 0 as R approaches C; decreasing the rate from R1 to R2 increases E(R), and increasing the capacity from C1 to C2 increases E(R) at a fixed rate.]

• Decreasing the rate R is not a satisfactory solution, as it is antinomic with the objective of transmitting a higher information rate.
• Increasing the capacity through a greater signal-to-noise ratio: again, this solution is not adequate since power is costly and, in almost all applications, power is limited.

70. The noisy channel coding theorem

• The informal proof by Shannon of the noisy channel coding theorem considers randomly chosen long sequences of channel symbols. Thus it is obvious that the average error probability can be rendered arbitrarily small by choosing long sequences of codewords (large n).
• In addition, the theorem considers randomly chosen codewords. Practically this appears to be incompatible with reality, unless a genius observer delivers to the user of information the rule of coding (the mapping) for each received sequence.
• The number of codewords and the number of possible received sequences are exponentially increasing functions of n. Thus for large n, it is impractical to store the codewords in the encoder and decoder when a deterministic rule of mapping is adopted.
• We shall continue our study on channel coding by discussing techniques that avoid these difficulties, and we hope that progressively, after introducing simple coding techniques, we can focus on concatenated codes (known as turbo codes), which approach capacity limits as they behave like random codes.

71. Improving transmission reliability: Channel Coding

• The role of channel coding in a digital communication system is essential in order to improve the error probability at the receiver. In almost all practical applications channel coding is undoubtedly required to achieve reliable communication, especially in Digital Mobile Communication.

[Figure: error probability Pe (10⁻¹ down to 10⁻⁶) versus Eb/N0 (3 to 10 dB) for uncoded QPSK, a TCM code and a 6D TCM code; Gc (2.5 dB) is the coding gain of the first code at Pe = 10⁻⁴, and G'c the coding gain of the second code at Pe = 10⁻⁵.]

72. Linear Binary Codes

• The channel input alphabet is binary and accepts symbols 0 and 1.
• If the output of the channel is binary we will deal essentially with the BSC.
• If the output of the channel is continuous we will deal essentially with the AWGN channel.
• The source coding is assumed ideal: each binary digit delivered by the source block conveys an information amount of 1 bit, i.e. P(0) = P(1) = 0.5.

Source and source coding {0,1,1,0...} → Channel Coding {1,0,1,0...}

Two families of codes will be considered: Block Codes and Convolutional Codes.

73. Channel Coding techniques: Linear binary block codes

• A block of n digits (a codeword) generated by the encoder depends only on the corresponding block of k bits generated by the source:

x = (x1, x2, ..., xn)

• The code rate is ρ = k/n < 1.
• The set of 2^k blocks of length n (codewords) generated by the encoder is referred to as an (n, k) code.

74. Channel Coding techniques: Linear binary block codes

Source (& source coding) → [u, Rs bit/s] → Channel Encoder → [x, Rs/ρ symbol/s] → Modulator → Transmission Channel → Demodulator → [y] → Channel Decoder → User

u = [u1 u2 ⋯ uk],  x = [x1 x2 ⋯ xn],  ρ = k/n < 1

• In a BSC (binary symmetric channel) the received n-sequence is:

y = x ⊕ e,   e = [e1 e2 ⋯ en]

• e is a binary n-sequence representing the error vector. If ei = 1 then an error has occurred at digit i.

75. Channel Coding techniques: Linear binary block codes

• Examples:

Repetition code (3, 1):  x1 = u1, x2 = u1, x3 = u1
Single parity check code (3, 2):  x1 = u1, x2 = u2, x3 = u1 ⊕ u2
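The (3,1) repetition code is easy to simulate; a sketch with majority-vote decoding (the decoder is an illustrative assumption, since the slide only defines the encoder):

```python
def repetition_encode(u):
    """(3,1) repetition code: each information bit is sent three times."""
    return [b for b in u for _ in range(3)]

def repetition_decode(y):
    """Majority vote on each block of three received bits (corrects one error per block)."""
    return [1 if sum(y[i:i + 3]) >= 2 else 0 for i in range(0, len(y), 3)]

x = repetition_encode([1, 0])        # [1, 1, 1, 0, 0, 0]
y = x[:]
y[1] ^= 1                            # one channel error in the first block
print(repetition_decode(y))          # [1, 0]: the error is corrected
```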

76. Channel Coding techniques: Linear binary block codes

Hamming Code (7, 4):

xi = ui,  i = 1, 2, 3, 4
x5 = u1 ⊕ u2 ⊕ u3
x6 = u2 ⊕ u3 ⊕ u4
x7 = u1 ⊕ u2 ⊕ u4

The encoding rule can be represented by the generator matrix G: x = uG

G = [ 1 0 0 0 1 0 1
      0 1 0 0 1 1 1
      0 0 1 0 1 1 0
      0 0 0 1 0 1 1 ]
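The encoding rule x = uG over GF(2) can be sketched directly from this generator matrix:

```python
# Generator matrix of the (7,4) Hamming code from the slide (systematic form [I4 | P]).
G = [
    [1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 1],
]

def encode(u, g=G):
    """x = uG over GF(2): each codeword digit is a modulo-2 sum of information bits."""
    n = len(g[0])
    return [sum(u[i] * g[i][j] for i in range(len(u))) % 2 for j in range(n)]

print(encode([1, 0, 1, 1]))  # [1, 0, 1, 1, 0, 0, 0]
```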

77. Channel Coding techniques: Linear binary block codes

• A systematic encoder is an encoder where all the k information digits belong to the codeword. Thus G assumes the canonical form:

G = [Ik P]

where Ik is the k×k identity matrix and the k×(n−k) matrix P specifies the parity check equations on the information binary digits.

78. Properties of linear block codes

• Property 1: Any linear combination of codewords is a codeword.
  a. The block code consists of all possible sums of the rows of the generator matrix.
  b. The sum of two codewords is a codeword.
• Property 2: The code forms a group under modulo-2 addition.
  a. The all-zeros codeword is the identity element of the code.
  b. If x1, x2 and x3 are codewords then:

(x1 ⊕ x2) ⊕ x3 = x1 ⊕ (x2 ⊕ x3)
x1 ⊕ x2 = x2 ⊕ x1
x1 ⊕ x2 = 0 ⇒ x1 = x2

79. Hamming distance

• We define the Hamming distance between two codewords as the number of places where they differ:

dH(x, x') = ∑_{i=1}^{n} (xi ⊕ x'i)

One can verify that the Hamming distance is a metric that indeed satisfies the triangle inequality dH(x1,x3) ≤ dH(x1,x2) + dH(x2,x3).

• The minimum distance of the code is the smallest Hamming distance over all pairs of distinct codewords x, x' with x ≠ x'.

80. Error detecting capabilities

• Consider a systematic encoder transmitting over a BSC:

xi = ui,  1 ≤ i ≤ k
xi = ∑_{j=1}^{k} gij uj,  k+1 ≤ i ≤ n   ((n−k) parity equations, sums modulo 2)

• The received sequence may be corrupted by the channel noise:

y = x ⊕ e,   e = [e1 e2 ⋯ en]

• Using the first k received symbols y1, ..., yk, an algebraic decoder computes the n−k parity equations and compares them to the last (n−k) received symbols yk+1, ..., yn:

y'i = ∑_{j=1}^{k} gij yj,  k+1 ≤ i ≤ n

An error is detected whenever yi ⊕ y'i ≠ 0 for some i, k+1 ≤ i ≤ n.

81. Maximum likelihood detection in an AWGN channel

• Consider a communication system with channel coding and decoding processes and having these properties:
  - The channel's input alphabet is binary and its output is the set of real numbers.
  - The source coding is ideal in the sense that each binary digit delivered by the block "source and source coding" conveys an amount of information of 1 bit (P(0) = P(1) = 0.5).

Source Coding {1, 0, ...} → Encoder → BPSK (pulse h(t), carrier cos(2πf0t)) → AWGN → matched filter h(−t), cos(2πf0t) → r → Maximum Likelihood Detection → User

82. Lower bound on error probability

• Considering only nearest-neighbor errors we have:

Pe ≥ Nmin Q( dE,min / √(2·N0) )

where dE,min is the minimum Euclidean distance between two sequences and where Nmin is the average number of nearest neighbors in the code separated by dE,min.

83. Lower bound on error probability

• Considering again the BPSK modulation case with symbols (−A, +A), one can easily show that:

Pe ≥ Nmin Q( √( 2 dH,min ρ Eb / N0 ) ),  where ρ = k/n

Codes with a larger minimum Hamming distance thus exhibit better asymptotic performance. But when Nmin is very large one can experience significant losses in the global performance. In addition, this bound may be loose for small values of SNR, as errors may occur between codewords separated by a distance greater than the minimum Euclidean distance.

84. Lower bound on error probability

[Figure: lower bound on word error probability Pe (from 10^0 down to 10^−6) versus Eb/N0 (0 to 10 dB) for uncoded BPSK, the Hamming (7,4) code and the Golay (23,12) code.]

85. Maximum likelihood detection in

BSC Channel

• Consider again the communication system with channel coding and decoding, but suppose now that decoding takes place after demodulation.

The channel's input and outputs alphabets are binary.

The source coding is ideal in the sense that each binary digit delivered by the

ensemble bloc "source and source coding" will convey an amount of information

of 1 bit (P(0)=P(1) =0.5).

[Block diagram: source → source coding {1, 0, …} → encoder → BPSK modulator h(t) with carrier cos(2πf0t) → AWGN channel → demodulation by cos(2πf0t) with hard decisions → y → decoding → user]

86. Maximum likelihood detection in

BSC channel

• The channel encoder delivers a codeword x of the (n, k) code, and BPSK demodulation delivers a binary n-sequence y. The equivalent discrete channel is a BSC with transition probability p:

[BSC diagram: 0 → 0 and 1 → 1 with probability 1 − p; 0 → 1 and 1 → 0 with probability p]

p = Q( √( (k/n) · 2Eb/N0 ) )
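As a numerical illustration, the transition probability can be evaluated with the standard erfc-based expression of the Q function (the Eb/N0 and rate values below are arbitrary examples):

```python
import math

def Q(x):
    """Gaussian tail function Q(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def bsc_crossover(eb_n0_db, rate):
    """BSC transition probability p = Q(sqrt(rate * 2 Eb/N0)) for coded BPSK."""
    eb_n0 = 10 ** (eb_n0_db / 10)
    return Q(math.sqrt(2 * rate * eb_n0))

p = bsc_crossover(5.0, 4 / 7)  # Hamming (7,4) rate at Eb/N0 = 5 dB
print(p)                       # a small crossover probability
```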

• The error vector e has ei = 1 when an error occurs at position i, ei = 0 otherwise. The ML detection of codewords in a BSC is given by:

x̂ = x(l) ⇔ P(y | x(l)) = Max(m) P(y | x(m))

P(y | x(l)) = p^dl (1 − p)^(n−dl)

where dl = dH(y, x(l)) is the Hamming distance between the received sequence and the codeword x(l).

87. Maximum likelihood detection in

BSC Channel

• The ML detection criterion in the BSC can be expressed by taking the logarithm of the conditional probabilities. As p < ½, P(y|x(l)) is a monotonically decreasing function of dl. Therefore, the ML criterion in the BSC reduces to the following rule:

x̂ = x(l) ⇔ dH(y, x(l)) = Min(m) dH(y, x(m))

As the decoder selects the codeword closest, in Hamming distance, to the received binary sequence, the minimum Hamming distance appears once more to be an influential parameter for the error performance of linear block codes.

In designing a good linear binary code one must search for codes maximizing the minimum Hamming distance and having a small average number of nearest neighbors.

A receiver operating on received binary sequences (after demodulation, i.e. a BSC channel) is known as a hard decision decoder.
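The minimum-distance rule can be sketched by exhaustive search over the codebook (practical only for small codes; real decoders exploit algebraic structure instead):

```python
def hamming_distance(x, y):
    """Number of positions where two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))

def ml_decode(y, codebook):
    """Hard-decision ML decoding over a BSC: pick the closest codeword."""
    return min(codebook, key=lambda c: hamming_distance(y, c))

# Toy (3,1) repetition code: two codewords.
codebook = [(0, 0, 0), (1, 1, 1)]
print(ml_decode((1, 0, 1), codebook))  # (1, 1, 1): single error corrected
```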

88. Hard v/s Soft decoding

[Figure: word error probability Pe (from 10^−1 down to 10^−5) versus Eb/N0 (0 to 10 dB), comparing hard decoding with soft decoding.]

TELECOM LILLE 1 - February 2010 - Information Theory and Coding

89. Hard v/s Soft decoding

[Figure: capacity of BPSK in AWGN (bits/symbol, 0.2 to 1.2) versus Eb/No (dB, −5 to 15), comparing the soft-decision channel with the hard-decision channel.]

• We now present a practical method of decoding linear block codes. This method can be stated as an "error controlling and correcting technique". We will then derive detection capabilities. Correction and detection capabilities are both related to the minimum Hamming distance.

90. Error correcting and detecting

capabilities

• A linear block code with minimum Hamming distance dH,min can correct all error vectors of weight not greater than t = ⌊(dH,min − 1)/2⌋ (⌊a⌋ is the integer part of a).

• A linear block code with minimum Hamming distance dH,min detects all error vectors of weight not greater than (dH,min − 1).

91. Error correcting and detecting

capabilities

• A code which has a capacity of correction equal to t

is often denoted as an (n, k, t) code.

• A code with dH,min = 2 can detect all single errors but cannot correct any.

• A code with dH,min = 3 is expected to correct all single errors.
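These two capabilities can be tabulated directly from dH,min; for example:

```python
def capabilities(d_min):
    """Correction capability t and detection capability of a code
    with minimum Hamming distance d_min."""
    t = (d_min - 1) // 2   # corrects up to t errors
    detect = d_min - 1     # detects up to d_min - 1 errors
    return t, detect

print(capabilities(2))  # (0, 1): detects single errors, corrects none
print(capabilities(3))  # (1, 2): corrects all single errors
print(capabilities(7))  # (3, 6): e.g. the Golay (23,12) code, t = 3
```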

92. Cyclic codes

• A linear (n, k) block code is a cyclic code if and only if any cyclic shift

of a code word produces another code word.

• Cyclic codes possess a considerable amount of algebraic structure.

• This structure leads to simple encoding operations and simple decoding algorithms.

• Cyclic codes are of great practical interest.

93. Cyclic Codes.

The Hamming code (7, 4) is a cyclic code. For instance, there are

six different cyclic shifts of the code word 0111010:

• In dealing with cyclic codes it is convenient to represent a sequence of n digits as a polynomial in the indeterminate Z.
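A quick sketch enumerating the cyclic shifts of a word (here the codeword 0111010 quoted above):

```python
def cyclic_shifts(word):
    """All nontrivial cyclic shifts of a word (n - 1 of them)."""
    n = len(word)
    return [word[i:] + word[:i] for i in range(1, n)]

shifts = cyclic_shifts("0111010")
print(shifts)  # the six cyclic shifts of the (7,4) codeword 0111010
```

For a cyclic code, every word printed above must itself be a codeword.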

94. BCH codes

• Bose-Chaudhuri-Hocquenghem codes.

• This class of cyclic codes is one of the most useful

for correcting random errors mainly because the

decoding algorithms can be implemented with an

acceptable amount of complexity.

• For any positive integers m and t there exists a binary BCH code with the following parameters:

n = 2^m − 1,  n − k ≤ mt,  dH,min ≥ 2t + 1
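A small helper evaluating these bounds for given m and t; the (15, 7) example below is a well-known double-error-correcting BCH instance:

```python
def bch_parameters(m, t):
    """Bounds on a binary BCH code for given m and t:
    n = 2^m - 1, k >= n - m*t, d_min >= 2t + 1."""
    n = 2**m - 1
    k_min = n - m * t
    d_min_lower = 2 * t + 1
    return n, k_min, d_min_lower

# m = 4, t = 2 admits the double-error-correcting BCH (15, 7) code.
print(bch_parameters(4, 2))  # (15, 7, 5)
```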

95. BCH codes

errors.

flexibility in choice of parameters (block length and

code rate), and the available decoding algorithms

that can be implemented.

96. Reed- Solomon Codes

• Reed-Solomon codes are nonbinary cyclic codes with symbols belonging to a set of cardinality q = 2^m.

• Since each symbol is represented by m binary digits, a Reed-Solomon code can be considered a special type of binary code.

Symbol: m binary digits
Block length n: 2^m − 1 symbols
Parity checks (n − k): 2t symbols

• A Reed-Solomon code corrects t or fewer symbol errors. Such codes are well suited for the correction of burst binary errors.
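The parameter table above translates into a one-line computation; the (255, 223) example below is the classical t = 16 Reed-Solomon code:

```python
def rs_parameters(m, t):
    """Reed-Solomon code parameters over an alphabet of size 2^m for
    correction of t symbol errors: n = 2^m - 1 symbols, 2t parity symbols."""
    n = 2**m - 1
    k = n - 2 * t
    return n, k

# m = 8, t = 16 gives the classical RS (255, 223) code.
print(rs_parameters(8, 16))  # (255, 223)
```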

97. Convolutional Codes

• A sequential machine:

[Encoder diagram: the input ui feeds a shift register holding ui−1 and ui−2; modulo-2 adders form the two outputs x1 and x2 (rate 1/2, memory 2).]

• The state of the encoder at instant i is given by the register contents ui−1ui−2:

ui-1ui-2 State

00 S0

01 S2

10 S1

11 S3
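A sketch of this sequential machine (the exact modulo-2 adder taps, x1 = ui ⊕ ui−2 and x2 = ui ⊕ ui−1 ⊕ ui−2, are an assumed reading of the figure):

```python
def conv_encode(bits):
    """Rate-1/2, memory-2 convolutional encoder. The taps
    x1 = u_i + u_{i-2} and x2 = u_i + u_{i-1} + u_{i-2} (mod 2)
    are an assumed reading of the encoder figure."""
    u1 = u2 = 0  # register contents u_{i-1}, u_{i-2}: state S0
    out = []
    for u in bits:
        x1 = u ^ u2
        x2 = u ^ u1 ^ u2
        out.append((x1, x2))
        u1, u2 = u, u1  # shift the register
    return out

print(conv_encode([1, 0, 1]))  # [(1, 1), (0, 1), (0, 0)]
```

From state S0, input 0 produces (0, 0) and input 1 produces (1, 1), consistent with the transition labels shown in the diagram.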

98. Convolutional Codes

[State diagram over the four states S0, S1, S2, S3; each transition is labeled with the input bit and the corresponding pair of output bits (x1, x2).]

99. Convolutional Codes

[State diagram with branches labeled (input / x1, x2); from S0 the branches are (0 / 0,0) back to S0 and (1 / 1,1) to S1, with the remaining labeled transitions among S1, S2 and S3.]

100. Convolutional Codes

• A convolutional code of rate k/n with memory ν has 2^ν states in its trellis representation.

• Its error performance is governed by a free distance dfree, which plays the role of the minimum Hamming distance dH,min.

• Maximum likelihood decoding can exploit the trellis structure when the latter has relatively simple behavior.

101. Viterbi Algorithm

• The Viterbi algorithm performs ML detection of the coded sequence, taking advantage of the inherent trellis structure of the code.

• It is optimal (it delivers the maximum likelihood sequence estimate). Asymptotically, for a code with free distance dfree, the lower bound on the error probability is:

Pe ≥ Nfree Q( dE,free / 2σ ) = Nfree Q( √( 2 Eb k dfree / (N0 n) ) )

102. Viterbi Algorithm

• Convolutional codes are widely used in communication systems.

• The Viterbi algorithm has moderate complexity: it grows linearly with n (a direct computation of the MLSE would have complexity growing exponentially with n).

• Soft decision decoding can improve performance by up to 3 dB over the hard decoding technique.
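A compact hard-decision Viterbi sketch for a rate-1/2, memory-2 code (the encoder taps are the same assumed reading of the earlier figure; branch metrics are Hamming distances):

```python
def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoding for a rate-1/2, memory-2 encoder
    with assumed taps x1 = u + u2, x2 = u + u1 + u2.  'received' is a
    list of (x1, x2) pairs; returns the ML information sequence."""
    def branch(state, u):              # state = (u1, u2)
        u1, u2 = state
        return (u ^ u2, u ^ u1 ^ u2), (u, u1)
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    metric = {s: (0 if s == (0, 0) else float("inf")) for s in states}
    paths = {s: [] for s in states}
    for r in received:
        new_metric, new_paths = {}, {}
        for s in states:               # keep the best path into each state
            best = None
            for prev in states:
                for u in (0, 1):
                    out, nxt = branch(prev, u)
                    if nxt != s:
                        continue
                    d = metric[prev] + (out[0] != r[0]) + (out[1] != r[1])
                    if best is None or d < best[0]:
                        best = (d, paths[prev] + [u])
            new_metric[s], new_paths[s] = best
        metric, paths = new_metric, new_paths
    best_state = min(states, key=lambda s: metric[s])
    return paths[best_state][:n_bits]

noisy = [(1, 1), (1, 1), (0, 0)]  # encoding of [1, 0, 1] with one flipped bit
print(viterbi_decode(noisy, 3))   # [1, 0, 1]: the error is corrected
```

The survivor-path bookkeeping above is what keeps the complexity linear in the sequence length, instead of exponential as with a direct search over all codewords.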

ANNEXE 1: Comparison between

digital modulations

• We define the spectral efficiency η of the transmitted waveform signaling:

η ≜ R/B bit/sec/Hz

• Let Cs = B log2(1 + Eb R / (N0 B)) bits/sec and ηc ≜ Cs/B bit/sec/Hz.

• To achieve reliable transmission over the additive white Gaussian noise channel with spectral efficiency η, any digital communication system requires a signal-to-noise ratio satisfying:

Eb/N0 ≥ (2^ηc − 1) / η

ANNEXE 1: Comparison between

digital modulations

• For B → ∞ (η → 0) the limit of Cs = B log2(1 + Eb R / (N0 B)) yields:

C∞ = (Eb/N0) R log2 e bits/s

• To ensure an error free transmission (i.e. Pe → 0) the rate must not exceed capacity:

R ≤ C∞ ⇒ Eb/N0 ≥ 1/log2 e = 0.693

Eb/N0 ≥ −1.6 dB
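The limit can be checked numerically from the bound Eb/N0 ≥ (2^η − 1)/η at R = Cs:

```python
import math

def min_ebn0_db(eta):
    """Minimum Eb/N0 (dB) for reliable transmission at spectral
    efficiency eta (bit/s/Hz): Eb/N0 >= (2**eta - 1) / eta."""
    return 10 * math.log10((2**eta - 1) / eta)

print(round(min_ebn0_db(1e-6), 1))  # -1.6: the Shannon limit as eta -> 0
print(round(min_ebn0_db(2), 2))     # 1.76: 2 bit/s/Hz needs at least ~1.76 dB
```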

ANNEXE 1: Comparison between

digital modulations

[Figure: spectral efficiency (bit/s/Hz, log scale) versus Eb/N0 (dB, −5 to 25) at Pe = 10^−5. The channel capacity bound Eb/N0 ≥ (2^η − 1)/η delimits the region where reliable transmission is possible; BPSK, QPSK, 8PSK and 64QAM appear in the bandwidth-limited region, while coherently detected orthogonal signals (M = 8, 16, 32, 64) appear in the power-limited region; the −1.6 dB limit is approached as η → 0.]

ANNEXE 2:

The noisy channel coding theorem

Shannon’s informal interpretation

• Let us consider a source with alphabet X matched to the channel in a way that the source

achieves the channel capacity C, the entropy rate of the source being H(X).

Each source’s sequence of length n is a codeword and is represented by a point in the figure

below.

We know that for large n, there are approximately 2^nH(X) typical input sequences x having probability 2^−nH(X), and similarly 2^nH(Y) typical output sequences y having probability 2^−nH(Y), and finally 2^nH(X,Y) typical pairs (x, y).

For each output sequence y, there are 2^n[H(X,Y) − H(Y)] = 2^nH(X|Y) input sequences x such that (x, y) is a typical pair. S will be the set of 2^nH(X|Y) input sequences x associated with y.

[Figure: the 2^nH(X) typical input sequences and the 2^nH(Y) typical output sequences; each output y is associated with a fan S of 2^nH(X|Y) input sequences.]

ANNEXE 2:

The noisy channel coding theorem

Shannon’s informal interpretation

• Let us consider now another source with entropy rate R ≤ C ≤ H ( X ) delivering sequences

or codewords of length n. This source will have 2^nR high probability sequences. We wish to associate each of these sequences with one of the possible channel inputs in such a way as to get an arbitrarily small error probability. One way is to randomly associate each source

sequence to a channel input sequence, and calculate the frequency of errors.

• If a codeword x(i) is transmitted through the channel and the sequence y is received, an

error in decoding is possible only if at least one codeword x(j) , j ≠ i belongs to the set S

associated with y:

P{ at least one x(j), j ≠ i, belongs to S } ≤ ∑j=1..2^nR, j≠i P{ x(j) ∈ S }

≤ 2^nR · 2^nH(X|Y) / 2^nH(X) = 2^nR / 2^nC → 0 as n → ∞ (since R < C)
