
Introduction to Information Theory
Channel capacity and models
A.J. Han Vinck
University of Essen
May 2011
This lecture
- some channel models
- channel capacity
- Shannon channel coding theorem
- converse
some channel models
Input X → P(y|x) → Output Y

transition probabilities
memoryless:
- output at time i depends only on input at time i
- input and output alphabet finite
Example: binary symmetric channel (BSC)
Error Source → E
Input X → ⊕ → Output Y

Y = X ⊕ E
E is the binary error sequence s.t. P(1) = 1-P(0) = p
X is the binary information sequence
Y is the binary output sequence
Transition diagram: 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with probability p.
from AWGN to BSC
(hard decisions on the Gaussian channel output give a BSC with crossover probability p)

Homework: calculate the capacity as a function of A and σ².
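A possible reading of this homework (an assumption, not stated on the slide): A is the antipodal signal amplitude, σ² the noise variance, so hard decisions give a BSC with crossover probability p = Q(A/σ) and capacity 1 - h(p). A minimal Python sketch:

```python
# Sketch for the homework, assuming the BSC is obtained by hard-decision
# detection of antipodal signals +/-A in AWGN with noise variance sigma^2,
# so the crossover probability is p = Q(A/sigma).
import math

def h(p):
    """Binary entropy in bits; h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def bsc_capacity_from_awgn(A, sigma2):
    p = Q(A / math.sqrt(sigma2))   # crossover probability of the induced BSC
    return 1 - h(p)

for A in (0.5, 1.0, 2.0, 4.0):
    print(A, bsc_capacity_from_awgn(A, sigma2=1.0))
```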
Other models
Z-channel (optical): inputs/outputs 0 (light on) and 1 (light off), P(X=0) = P0
  0 → 0 with probability 1
  1 → 0 with probability p, 1 → 1 with probability 1-p

Erasure channel (MAC): inputs 0, 1; outputs 0, E, 1; P(X=0) = P0
  0 → 0 and 1 → 1 with probability 1-e
  0 → E and 1 → E with probability e
Erasure with errors
Inputs 0, 1; outputs 0, E, 1:
  0 → 0 and 1 → 1 with probability 1-p-e
  0 → 1 and 1 → 0 with probability p
  0 → E and 1 → E with probability e
burst error model (Gilbert-Elliot)
Random error channel; outputs independent:
  Error Source with P(0) = 1 - P(1)

Burst error channel; outputs dependent:
  Error Source with state info: good or bad
  P(0 | state = bad)  = P(1 | state = bad)  = 1/2
  P(0 | state = good) = 1 - P(1 | state = good) = 0.999

State transition probabilities: Pgb (good → bad), Pbg (bad → good), Pgg, Pbb.
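A minimal simulation sketch of this two-state error source; the state transition probabilities below are illustrative values, not taken from the lecture:

```python
# Minimal simulation sketch of the two-state Gilbert-Elliot error source.
# The error probabilities per state follow the slide; the transition
# probabilities P_gb, P_bg are assumed for illustration.
import random

P_ERR = {"good": 0.001, "bad": 0.5}   # P(error | state), as on the slide
P_gb, P_bg = 0.01, 0.1                # good->bad and bad->good transitions (assumed)

def error_sequence(n, state="good"):
    errors = []
    for _ in range(n):
        errors.append(1 if random.random() < P_ERR[state] else 0)
        if state == "good":
            state = "bad" if random.random() < P_gb else "good"
        else:
            state = "good" if random.random() < P_bg else "bad"
    return errors

e = error_sequence(10000)
print("error rate:", sum(e) / len(e))   # errors appear clustered in bursts
```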
channel capacity:

I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)   (Shannon 1948)
notes:
capacity depends on the input probabilities
because the transition probabilities are fixed






X → channel → Y

capacity = max_{P(x)} I(X;Y)
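As a small numerical illustration (an addition, not from the slides), the sketch below evaluates I(X;Y) = H(Y) - H(Y|X) for a given transition matrix and input distribution; the capacity is the maximum of this quantity over P(x):

```python
# Sketch: mutual information I(X;Y) = H(Y) - H(Y|X) for a discrete channel
# given as a transition matrix P[y|x]; capacity is the maximum over P(x).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(px, Pyx):
    """px: input distribution, Pyx[i, j] = P(Y=j | X=i)."""
    py = px @ Pyx                                   # output distribution
    HY = entropy(py)
    HY_given_X = np.sum(px * np.array([entropy(row) for row in Pyx]))
    return HY - HY_given_X

# Example: BSC with p = 0.1 and uniform input -> I = 1 - h(0.1)
Pyx = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(mutual_information(np.array([0.5, 0.5]), Pyx))
```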
Practical communication system design
message → Code book → code word in → channel → receive (with errors) → Code book → decoder → estimate

There are 2^k code words of length n;
k is the number of information bits transmitted in n channel uses.
Channel capacity
Definition:
The rate R of a code is the ratio k/n, where
k is the number of information bits transmitted in n channel uses
Shannon showed that:
for R ≤ C
encoding methods exist
with decoding error probability → 0
Encoding and decoding according to Shannon
Code: 2^k binary codewords where P(0) = P(1) = 1/2
Channel errors: P(0 → 1) = P(1 → 0) = p
i.e. # error sequences ≈ 2^{nh(p)}

Decoder: search around the received sequence for a codeword
with ≈ np differences
in the space of 2^n binary sequences
decoding error probability
1. For t errors: |t/n - p| > ε
   → 0 for n → ∞ (law of large numbers)
2. ≥ 1 other code word in the decoding region
   (codewords random)


< =
=
~ >

n and
) p ( h 1
n
k
R for
0 2 2
2
2
) 1 2 ( ) 1 ( P
) R
BSC
C ( n ) R ) p ( h 1 ( n
n
) p ( nh
k
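A quick numerical sketch (added for illustration) of how fast the bound 2^{-n(1-h(p)-R)} decays once R < 1 - h(p):

```python
# Numerical sketch of the random-coding bound P(>=1 other codeword in the
# decoding sphere) <~ 2^{-n(1-h(p)-R)} for the BSC.
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, R = 0.1, 0.4                      # C_BSC = 1 - h(0.1) ~ 0.531 > R
for n in (100, 500, 1000):
    print(n, 2 ** (-n * (1 - h(p) - R)))
```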
channel capacity: the BSC
X → BSC → Y: 0 → 0 and 1 → 1 with probability 1-p, crossovers with probability p.

I(X;Y) = H(Y) - H(Y|X)
the maximum of H(Y) = 1
since Y is binary
H(Y|X) = P(X=0)h(p) + P(X=1)h(p) = h(p)

Conclusion: the capacity of the BSC is C_BSC = 1 - h(p)
Homework: draw C_BSC; what happens for p > 1/2?
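A small sketch for this homework (an added illustration): tabulate C_BSC(p) = 1 - h(p) and note the symmetry around p = 1/2.

```python
# Sketch for the homework: C_BSC(p) = 1 - h(p), tabulated over 0 <= p <= 1.
# Note the symmetry C(p) = C(1-p): for p > 1/2 one can simply invert the output.
import math

def h(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p = {p:4.2f}   C_BSC = {1 - h(p):.3f}")
```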
channel capacity: the BSC

[Plot: channel capacity C_BSC versus bit error probability p, for 0 ≤ p ≤ 1; the capacity ranges from 0 to 1.0.]

Explain the behaviour!
channel capacity: the Z-channel
Application in optical communications
Inputs/outputs: 0 (light on), 1 (light off); P(X=0) = P0.
Transitions: 0 → 0 with probability 1; 1 → 0 with probability p, 1 → 1 with probability 1-p.

H(Y) = h(P0 + p(1 - P0))
H(Y|X) = (1 - P0) h(p)

For capacity, maximize I(X;Y) over P0.
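One simple way to carry out this maximization numerically (a sketch, not from the slides) is a grid search over P0:

```python
# Sketch: numerically maximize I(X;Y) = h(P0 + p(1-P0)) - (1-P0) h(p)
# over P0 to obtain the Z-channel capacity (grid search for simplicity).
import math

def h(q):
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def z_capacity(p, steps=10000):
    best = 0.0
    for i in range(steps + 1):
        P0 = i / steps
        I = h(P0 + p * (1 - P0)) - (1 - P0) * h(p)
        best = max(best, I)
    return best

print(z_capacity(0.1))   # larger than 1 - h(0.1): the Z-channel is "better" than the BSC
```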

channel capacity: the erasure channel
Application: cdma detection
Inputs 0, 1; outputs 0, E, 1.
Transitions: 0 → 0 and 1 → 1 with probability 1-e; 0 → E and 1 → E with probability e.
I(X;Y) = H(X) - H(X|Y)
H(X) = h(P0)
H(X|Y) = e · h(P0)

Thus I(X;Y) = (1 - e) h(P0), maximized at P(X=0) = P0 = 1/2: C_erasure = 1 - e

(Check! Draw and compare with the BSC and the Z-channel; a numerical sketch follows below.)
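A numerical sketch for the comparison; the closed-form Z-channel capacity C_Z = log2(1 + (1-p)·p^{p/(1-p)}) used below is the standard result, not derived on these slides:

```python
# Sketch for the "check and compare" exercise: capacities of the erasure
# channel (1 - e), the BSC (1 - h(p)) and the Z-channel at the same parameter.
import math

def h(q):
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def c_z(p):   # known closed form for the Z-channel capacity
    if p == 1.0:
        return 0.0
    return math.log2(1 + (1 - p) * p ** (p / (1 - p)))

for x in [0.0, 0.1, 0.2, 0.3, 0.5]:
    print(f"{x:3.1f}  erasure: {1 - x:.3f}  BSC: {1 - h(x):.3f}  Z: {c_z(x):.3f}")
```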

Erasure with errors: calculate the capacity!
Inputs 0, 1; outputs 0, E, 1.
Transitions: 0 → 0 and 1 → 1 with probability 1-p-e; 0 → 1 and 1 → 0 with probability p; 0 → E and 1 → E with probability e.
example
Consider the following example
Inputs {0, 1, 2}, outputs {0, 1, 2}.
Inputs 0 and 2 are received without error; input 1 goes to each of the three outputs with probability 1/3.
For P(0) = P(2) = p, P(1) = 1-2p

H(Y) = h(1/3 - 2p/3) + (2/3 + 2p/3);   H(Y|X) = (1-2p) log2 3
Q: maximize H(Y) - H(Y|X) as a function of p
Q: is this the capacity?

Hint: use log2 x = ln x / ln 2 and d(ln x)/dx = 1/x.
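One way to answer the first question numerically (an added sketch; the slides ask for the derivative-based maximization via the hint above):

```python
# Sketch for the example: maximize H(Y) - H(Y|X) over p by grid search,
# with the input restricted to P(0) = P(2) = p, P(1) = 1 - 2p.
import math

log2_3 = math.log2(3)

def h(q):
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def I_symmetric(p):
    HY = h(1/3 - 2*p/3) + (2/3 + 2*p/3)
    return HY - (1 - 2*p) * log2_3

best = max((I_symmetric(i / 10000), i / 10000) for i in range(5001))
print("max I =", best[0], "at p =", best[1])
```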
channel models: general diagram
[Diagram: each input x1, x2, ..., xn connected to the outputs y1, y2, ..., ym with transition probabilities P_{1|1}, P_{2|1}, ..., P_{m|n}.]

Input alphabet X = {x1, x2, ..., xn}
Output alphabet Y = {y1, y2, ..., ym}
P_{j|i} = P_{Y|X}(yj | xi)

In general: calculating capacity needs more theory

The statistical behavior of the channel is completely defined by
the channel transition probabilities P_{j|i} = P_{Y|X}(yj | xi)

* clue:
I(X;Y) is concave (convex ∩) in the input probabilities,
i.e. finding the maximum is simple (a numerical sketch follows below)
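A standard numerical route for a general DMC is the Blahut-Arimoto algorithm, which exploits exactly this concavity; it is not covered on these slides, so the following is only an illustrative sketch:

```python
# Sketch of the Blahut-Arimoto iteration for the capacity of a general
# discrete memoryless channel with matrix Pyx[i, j] = P(y_j | x_i).
import numpy as np

def mutual_info(px, Pyx):
    py = px @ Pyx
    with np.errstate(divide="ignore", invalid="ignore"):
        logratio = np.where(Pyx > 0, np.log2(Pyx / py), 0.0)
    return np.sum(px[:, None] * Pyx * logratio)

def blahut_arimoto(Pyx, iters=200):
    """Approximate capacity (bits/use) and capacity-achieving input P(x)."""
    px = np.full(Pyx.shape[0], 1.0 / Pyx.shape[0])   # start from the uniform input
    for _ in range(iters):
        py = px @ Pyx
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(Pyx > 0, np.log2(Pyx / py), 0.0)
        D = np.sum(Pyx * logratio, axis=1)           # D(P(y|x_i) || P(y)) per input
        w = px * np.exp2(D)
        px = w / w.sum()                             # re-weighted input distribution
    return mutual_info(px, Pyx), px

# Check against the BSC with p = 0.1: capacity should be 1 - h(0.1) ~ 0.531
C, px = blahut_arimoto(np.array([[0.9, 0.1], [0.1, 0.9]]))
print(C, px)
```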
Channel capacity: converse

For R > C the decoding error probability > 0
[Plot: error probability Pe versus rate k/n; Pe is bounded away from 0 for k/n > C.]

Converse: For a discrete memoryless channel

I(X^n;Y^n) = H(Y^n) - H(Y^n|X^n)
           ≤ Σ_{i=1}^n H(Y_i) - Σ_{i=1}^n H(Y_i|X_i)
           = Σ_{i=1}^n I(X_i;Y_i) ≤ nC

source → m → encoder → X^n → channel (X_i → Y_i) → Y^n → decoder → m'

Source generates one out of 2^k equiprobable messages.
Let Pe = probability that m' ≠ m.
converse: R := k/n

Pe ≥ 1 - C/R - 1/(nR)

Hence: for large n and R > C,
the probability of error Pe > 0

k = H(M) = I(M;Y^n) + H(M|Y^n)
         ≤ I(X^n;Y^n) + 1 + k·Pe     (X^n is a function of M; Fano's inequality)
         ≤ nC + 1 + k·Pe
⇒ 1 - Cn/k - 1/k ≤ Pe

We used the data processing theorem.
Cascading of Channels
X → channel 1 → Y → channel 2 → Z, with mutual informations I(X;Y), I(Y;Z), I(X;Z).

The overall transmission rate I(X;Z) for the cascade cannot be larger than I(Y;Z), that is:

I(X;Z) ≤ I(Y;Z)
Appendix:
Assume:
binary sequence with P(0) = 1 - P(1) = 1 - p
t is the # of 1s in the sequence
Then, for n → ∞ and ε > 0,
Weak law of large numbers:
Probability( |t/n - p| > ε ) → 0

i.e. we expect with high probability ≈ pn 1s
Appendix:
Consequence:

n(p - ε) < t < n(p + ε) with high probability

1.  Σ_{t=n(p-ε)}^{n(p+ε)} (n choose t) ≈ 2nε (n choose pn)

2.  lim_{n→∞} (1/n) log2 [ 2nε (n choose pn) ] = h(p)

3.  h(p) = -p log2 p - (1-p) log2 (1-p)
Homework: prove the approximation using ln N! ~ N lnN for N large.

Or use the Stirling approximation: N! ≈ √(2πN) N^N e^{-N}
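A numerical check of the approximation (an added illustration, using exact binomial coefficients rather than Stirling):

```python
# Numerical check of the approximation behind the homework:
# (1/n) log2( C(n, pn) ) -> h(p) as n grows.
import math

def h(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3
for n in (10, 100, 1000, 10000):
    k = round(p * n)
    print(n, math.log2(math.comb(n, k)) / n, "->", h(p))
```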

[Plot: binary entropy h(p) versus p, for 0 ≤ p ≤ 1.]

Binary Entropy: h(p) = -p log2 p - (1-p) log2 (1-p)

Note: h(p) = h(1-p)
Input X → + Noise → Output Y

Cap := sup_{p(x): σ_x² ≤ S/2W} [ H(Y) - H(Noise) ]
Input X is Gaussian with power spectral density (psd) S/2W;
Noise is Gaussian with psd σ²_noise;
Output Y is Gaussian with psd σ_y² = S/2W + σ²_noise.

Capacity for Additive White Gaussian Noise
W is the (single-sided) bandwidth
For Gaussian channels: σ_y² = σ_x² + σ_noise²

X → + Noise → Y

Cap = ½ log2( 2πe (σ_x² + σ_noise²) ) - ½ log2( 2πe σ_noise² )   bits/transmission
    = ½ log2( (σ_x² + σ_noise²) / σ_noise² )                     bits/transmission

Cap = W log2( (σ_noise² + S/2W) / σ_noise² )   bits/sec.
For Gaussian Z: p(z) = (1/√(2π σ_z²)) e^{-z²/(2σ_z²)};   H(Z) = ½ log2( 2πe σ_z² ) bits.
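A small sketch evaluating the two capacity expressions above; the parameter values are illustrative:

```python
# Sketch: AWGN capacity in bits per transmission and, with 2W transmissions
# per second, in bits per second (W is the single-sided bandwidth).
import math

def awgn_capacity_per_use(sigma_x2, sigma_n2):
    return 0.5 * math.log2(1 + sigma_x2 / sigma_n2)

def awgn_capacity_per_sec(S, W, sigma_n2):
    # signal psd S/2W per dimension, 2W dimensions per second
    return W * math.log2(1 + S / (2 * W * sigma_n2))

print(awgn_capacity_per_use(sigma_x2=4.0, sigma_n2=1.0))   # ~1.16 bits/transmission
print(awgn_capacity_per_sec(S=8.0, W=1.0, sigma_n2=1.0))   # bits/sec
```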
Middleton type of burst channel model
Select channel k with probability Q(k); channel k is a binary channel (inputs and outputs 0, 1) with transition probability p(k).

[Diagram: channel 1, channel 2, ..., each a binary channel with its own transition probability.]
Fritzman model:
multiple states G and only one state B
Closer to an actual real-world channel
[State diagram: states G_1, ..., G_n and a single state B, with transition probability 1-p shown between them.]

Error probability 0 in the G states; error probability h in state B.
Interleaving: from bursty to random
message → encoder → interleaver → channel (bursty errors) → interleaver⁻¹ → decoder → message
After de-interleaving the decoder sees (approximately) random errors.
Note: interleaving brings encoding and decoding delay

Homework: compare the block and convolutional interleaving w.r.t. delay
Interleaving: block
Channel models are difficult to derive:
- burst definition ?
- random and burst errors ?
for practical reasons: convert burst into random error
read in row-wise, transmit column-wise

[Figure: a binary array filled row by row and read out column by column for transmission.]
De-Interleaving: block
read in column-wise, read out row-wise

[Figure: the received array; a burst of channel errors (marked e) is spread over the rows, so that after reading out row-wise each row contains only 1 error.]
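A minimal sketch of block interleaving and de-interleaving (the array dimensions are illustrative, not from the slides):

```python
# Sketch of block interleaving: write row-wise into a rows x cols array and
# transmit column-wise; a burst of length <= rows on the channel is spread so
# that each row sees at most one of the burst errors after de-interleaving.
def interleave(bits, rows, cols):
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    assert len(bits) == rows * cols
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]

data = list(range(20))                 # 4 rows x 5 columns of symbols
tx = interleave(data, 4, 5)
rx = deinterleave(tx, 4, 5)
assert rx == data                      # round trip without errors is the identity
```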
Interleaving: convolutional
input sequence 0: no delay
input sequence 1: delay of b elements
...
input sequence m-1: delay of (m-1)·b elements

Example: b = 5, m = 3 (a sketch follows below).