
Approaching Shannon

Ruediger Urbanke, EPFL


Summer School @ USC, August 6th, 2010

Many thanks to Dan Costello, Shrinivas Kudekar, Alon Orlitsky, and Thomas Riedel
for their help with these slides.

Storing Shannon

Networking Shannon

Completing Shannon

Compressing Shannon

Reading Shannon

Coding

Disclaimer

Technical slides do not contain references.


These are all summarized at the end of each section.

Classes of Codes

(linear) block codes

sparse graph codes

convolutional codes

polar codes

How Do We Compare?

capacity

rate

block length

P_N(R, C) -- block error probability

complexity

How We Compare: Error Exponent

E(R, C) = \lim_{N \to \infty} -\tfrac{1}{N}\,\log P_N(R, C) \qquad \text{(error exponent)}

How We Compare: Finite-Length Scaling


f(z): scaling function (``mother curve'')

[Figure: block error probability P_N(R, \epsilon) from 10^{-8} to 10^{-1} versus channel quality; as N increases the curves sharpen around a threshold, and they collapse onto f(z) when plotted against N^{1/\mu}(\cdot); \mu is the scaling exponent.]

How We Compare: Finite-Length Scaling


f(z): scaling function (``mother curve'')

[Figure: the same collapse plotted versus the rate; the curves for increasing N fall onto f(z) when plotted against N^{1/\mu}(C - R); \mu is the scaling exponent.]

Finite-Length Scaling

\lim_{N\to\infty:\; N^{1/\mu}(C-R)=z} P_N(R, C) = f(z)

f(z): scaling function (``mother curve'')
\mu > 0: scaling exponent

Finite-Length Scaling -- References

V. Privman, Finite-size scaling theory, in Finite Size Scaling and Numerical Simulation of Statistical Systems, V. Privman, ed., World Scientific Publ., Singapore, 1990, pp. 1-98.

Complexity

exponential versus polynomial

\delta = C - R \qquad \text{(gap to capacity)}

linear -- but look at the prefactor

Block Codes

Error Exponent of Block Codes under MAP


[Figure borrowed from the literature: error exponent of block codes under MAP decoding.]

E(R, C) = \lim_{N \to \infty} -\tfrac{1}{N}\,\log P_N(R, C) \qquad \text{(error exponent)}

Near capacity the exponent behaves quadratically in the gap C - R.

Error Exponent -- References

R. Gallager, Information Theory and Reliable Communication, Wiley 1968.


A. Barg and G. D. Forney, Jr., Random codes: Minimum distances and error exponents, IEEE
Transactions on Information Theory, Sept 2002.

Scaling of Block Codes under MAP -- BEC


A ``perfect'' code of rate R corrects any pattern of up to N(1-R) erasures:
P_N \to 0 \text{ for } \epsilon < 1-R, \qquad P_N \to 1 \text{ for } \epsilon > 1-R \qquad (\epsilon = \text{erasure fraction}).

Distribution of the number of erasures E:
E[E] = N\epsilon, \qquad E[(E - E[E])^2] = N\epsilon(1-\epsilon), \qquad E \approx \mathcal{N}(N\epsilon,\, N\epsilon(1-\epsilon)).

Hence
P_N \approx Q\!\left(\frac{N(1-R) - N\epsilon}{\sqrt{N\epsilon(1-\epsilon)}}\right) = Q\!\left(\frac{\sqrt{N}\,((1-R)-\epsilon)}{\sqrt{\epsilon(1-\epsilon)}}\right), \qquad z = \sqrt{N}\,((1-R)-\epsilon).
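A minimal numeric sketch of this Gaussian approximation (the values of R, \epsilon, and N are illustrative choices of mine, not from the slides); note that z grows like \sqrt{N}, i.e., \mu = 2:

from math import sqrt, erfc

def Q(x):
    # Gaussian tail function Q(x) = P(N(0,1) > x)
    return 0.5 * erfc(x / sqrt(2.0))

def perfect_code_block_error(N, R, eps):
    # P_N ~ Q( sqrt(N) * ((1-R) - eps) / sqrt(eps*(1-eps)) )
    z = sqrt(N) * ((1.0 - R) - eps)   # scaling parameter
    return Q(z / sqrt(eps * (1.0 - eps)))

R, eps = 0.5, 0.45                    # rate and channel erasure fraction
for N in (100, 1000, 10000):
    print(N, perfect_code_block_error(N, R, eps))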

Scaling of Block Codes under MAP -- BEC


Random linear block codes are almost perfect.

00101010001
01110010010
10101010101
01000101001
01011100010

Probability that a square binary random matrix of dimension n has full rank:
\prod_{i=0}^{n-1} \frac{2^n - 2^i}{2^n} = \prod_{i=1}^{n} \bigl(1 - 2^{-i}\bigr) \;\longrightarrow\; 0.28878809508\ldots

00101010001
01110010010
10101010101
01000101001

If there are k fewer erasures than N(1-R) (k extra rows in the corresponding matrix), the probability of a rank failure decays roughly like 2^{-k}.

Hence for random linear block codes the transition is of constant (on an absolute scale) width.
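A quick numerical check of the constant above (a sketch; it only evaluates the product on this slide):

def full_rank_prob(n):
    # prod_{i=1..n} (1 - 2^{-i}): probability that a random n x n binary matrix is invertible
    p = 1.0
    for i in range(1, n + 1):
        p *= 1.0 - 2.0 ** (-i)
    return p

for n in (5, 10, 20, 50):
    print(n, full_rank_prob(n))   # converges quickly to 0.288788...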

Scaling of Block Codes under MAP

\log A(N, P) = N C - \sqrt{N V}\; Q^{-1}(P) + O(\log N)

A(N, P): size of the largest code with block length N and block error probability P.

C = E[i(x, y)], \qquad V = \mathrm{Var}[i(x, y)], \qquad i(x, y) = \log \frac{dP(y \mid x)}{dP(y)}
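A small sketch that evaluates this normal approximation for a concrete channel; the choice of the BEC (with C = 1 - \epsilon and dispersion V = \epsilon(1-\epsilon), in bits) is mine, not from the slide:

from math import sqrt
from statistics import NormalDist

def best_rate(N, P, eps):
    # normal approximation: log2 A(N, P) ~ N*C - sqrt(N*V)*Q^{-1}(P), reported per channel use
    C = 1.0 - eps                           # BEC capacity (bits)
    V = eps * (1.0 - eps)                   # BEC dispersion (bits^2)
    q_inv = NormalDist().inv_cdf(1.0 - P)   # Q^{-1}(P)
    return C - sqrt(V / N) * q_inv

for N in (1000, 10000, 100000):
    print(N, best_rate(N, P=1e-3, eps=0.5))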

Finite-Length Scaling -- References

G. Landsberg, Über eine Anzahlbestimmung und eine damit zusammenhängende Reihe, J. Reine Angew. Math., vol. 111, pp. 87-88, 1893.
A. Feinstein, A new basic theorem of information theory, IRE Trans. Inform. Theory, vol. PGIT-4, pp. 2-22, 1954.
V. Strassen, Asymptotische Abschätzungen in Shannons Informationstheorie, Trans. Third Prague Conf. Information Theory, pp. 689-723, 1962.
Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of Gaussian Channels," 2009 IEEE Int. Symposium on Information Theory, Seoul, Korea, June 28-July 3, 2009.
Y. Polyanskiy, H. V. Poor, and S. Verdú, "Dispersion of the Gilbert-Elliott Channel," 2009 IEEE Int. Symposium on Information Theory, Seoul, Korea, June 28-July 3, 2009.
For a very simple proof of the previous result, ask Thomas Riedel, UIUC.

Convolutional Codes

Convolutional Codes

Convolutional Codes

[Figures borrowed from the literature.]

Error exponent: affine (in contrast to the quadratic behavior of block codes).

Finite-Length Scaling -- BEC

Scaling behavior? (K = constraint length)

Convolutional Codes -- Some References


Big bang:
P. Elias, Coding for noisy channels, in IRE International Convention Record, Mar. 1955, pp. 37-46.
Algorithms and error exponents:
J. M. Wozencraft, Sequential decoding for reliable communication, Research Lab. of Electron. Tech. Rept. 325, MIT, Cambridge, MA, USA, 1957.
R. M. Fano, A heuristic discussion of probabilistic decoding, IEEE Trans. Information Theory, vol. IT-9, pp. 64-74, Apr. 1963.
A. J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Inform. Theory, 13 (1967), pp. 260-269.
H. L. Yudkin, Channel state testing in information decoding, Sc.D. thesis, Dept. of Elec. Eng., M.I.T., 1964.
J. K. Omura, On the Viterbi decoding algorithm, IEEE Trans. Inform. Theory, 15 (1969), pp. 177-179.
G. D. Forney, Jr., The Viterbi algorithm, Proc. IEEE, 61 (1973), pp. 268-278.
K. S. Zigangirov, Time-invariant convolutional codes: Reliability function, in Proc. 2nd Joint Soviet-Swedish Workshop Information Theory, Gränna, Sweden, Apr. 1985.
N. Shulman and M. Feder, Improved Error Exponent for Time-Invariant and Periodically Time-Variant Convolutional Codes, IEEE Trans. Inform. Theory, 46 (2000), pp. 97-103.
G. D. Forney, Jr., The Viterbi algorithm: A personal history. E-print: cond-mat/0104079, 2005.
Overview:
A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding, McGraw-Hill, New York, NY, USA, 1979.
S. Lin and D. J. Costello, Jr., Error Control Coding, Prentice Hall, Englewood Cliffs, NJ, USA, 2nd ed., 2004.
R. Johannesson and K. S. Zigangirov, Fundamentals of Convolutional Coding, IEEE Press, Piscataway, NJ, USA, 1999.

Some Open Questions

Scaling behavior

(Excerpt from C. E. Shannon, A Mathematical Theory of Communication, 1948:)

... digrams such as TH, ED, etc. In the second-order approximation, digram structure is introduced. After a letter is chosen, the next one is chosen in accordance with the frequencies with which the various letters follow the first one. This requires a table of digram frequencies p_i(j). In the third-order approximation, trigram structure is introduced. Each letter is chosen with probabilities which depend on the preceding two letters.
3. THE SERIES OF APPROXIMATIONS TO ENGLISH

To give a visual idea of how this series of processes approaches a language, typical sequences in the approximations to English have been constructed and are given below. In all cases we have assumed a 27-symbol alphabet, the 26 letters and a space.
1. Zero-order approximation (symbols independent and equiprobable).
XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.
2. First-order approximation (symbols independent but with frequencies of English text).
OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA
NAH BRL.
3. Second-order approximation (digram structure as in English).
ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.
4. Third-order approximation (trigram structure as in English).
IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
5. First-order word approximation. Rather than continue with tetragram, ..., n-gram structure it is easier and better to jump at this point to word units. Here words are chosen independently but with their appropriate frequencies.
REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES
THE LINE MESSAGE HAD BE THESE.
6. Second-order word approximation. The word transition probabilities are correct but no further structure is included.
THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT
THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
The resemblance to ordinary English text increases quite noticeably at each of the above steps. Note that these samples have reasonably good structure out to about twice the range that is taken into account in their construction. Thus in (3) the statistical process insures reasonable text for two-letter sequences, but four-letter sequences from the sample can usually be fitted into good sentences. In (6) sequences of four or more words can easily be placed in sentences without unusual or strained constructions. The particular sequence of ten words "attack on an English writer that the character of this" is not at all unreasonable. It appears then that a sufficiently complex stochastic process will give a satisfactory representation of a discrete source.
The first two samples were constructed by the use of a book of random numbers in conjunction with (for example 2) a table of letter frequencies. This method might have been continued for (3), (4) and (5), since digram, trigram and word frequency tables are available, but a simpler equivalent method was used.

Sparse Graph Codes

LDPC Ensemble

[Tanner graph: variable nodes connected to check nodes through a permutation of the edges.]

Example check equation: x1 + x4 + x8 = 0.

Hx = 0, with H sparse.

(1-(1-x)^5)^3

Asymptotic Analysis -- BEC


(1-(1-x)^5)^3
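Density evolution is the asymptotic BEC analysis behind expressions like the one above. A minimal sketch, using the degree distribution pair quoted later on the finite-length scaling slide (lambda(x) = x/6 + 5x^3/6, rho(x) = x^5, design rate 3/7); its BP threshold should come out near the 0.4828 quoted there:

def lam(x):                     # edge-perspective variable-node distribution
    return x / 6.0 + 5.0 * x ** 3 / 6.0

def rho(x):                     # edge-perspective check-node distribution
    return x ** 5

def converges(eps, iters=5000, tol=1e-10):
    # density evolution: x_{l+1} = eps * lambda(1 - rho(1 - x_l))
    x = eps
    for _ in range(iters):
        x = eps * lam(1.0 - rho(1.0 - x))
        if x < tol:
            return True
    return False

lo, hi = 0.0, 1.0               # bisection for the BP threshold
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if converges(mid) else (lo, mid)
print("BP threshold ~", lo)     # ~ 0.4828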

MAP versus BP


Capacity Achieving -- BEC

Capacity Approaching -- BMS

Error Exponent of LDPC Codes -- BP


P\{\, |P_N(G, \epsilon, \ell) - E[P_N(G, \epsilon, \ell)]| > \delta \,\} \le e^{-c(\delta)\, N}
(G: graph; \epsilon: channel parameter; \ell: number of iterations)

If E[P_N(G, \epsilon, \ell)] converges to zero for large \ell, and if the code has a (positive) error-correcting radius, then we can prove that the code has an error exponent under iterative decoding.

Simplest sufficient condition: the code has expansion at least 3/4, which is true whp if the left degree is at least 5 (less restrictive conditions are known, but they are more complicated); codes used in ``practice'' do not have error exponents.

Expansion
A (dl, dr)-regular graph cannot have expansion beyond (dl-1)/dl.

For a set V of variable nodes, the set C of neighboring check nodes has size at most dl |V|; the expansion is the smallest ratio |C| / (dl |V|) over all ``small'' sets V.

Remarkably, random graphs essentially achieve this bound whp.

Finite-Length Scaling of LDPC Codes -- BEC


P_N = Q(z/\alpha)\,\bigl(1 + O(N^{-1/3})\bigr), \qquad z = \sqrt{N}\,\bigl(\epsilon^{BP} - \beta N^{-2/3} - \epsilon\bigr)

scaling parameters computable

\lambda(x) = \tfrac{1}{6}\,x + \tfrac{5}{6}\,x^3, \qquad \rho(x) = x^5, \qquad R = 3/7, \qquad \epsilon^{BP} = 0.4828

(we ignore the error floor here!)

[Plot: P_N from 10^{-5} to 10^{-1} versus \epsilon from 0.3 to 0.5 for increasing N, together with the scaling-law prediction; \alpha = 0.5791, \beta = 0.6887.]
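A sketch that evaluates the scaling law with the numbers quoted above; treating 0.5791 as alpha and 0.6887 as beta is my reading of the slide, and the error floor is ignored as stated:

from math import sqrt, erfc

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

def scaling_law(N, eps, eps_bp=0.4828, alpha=0.5791, beta=0.6887):
    # P_N ~ Q( sqrt(N) * (eps_BP - beta*N^(-2/3) - eps) / alpha )
    z = sqrt(N) * (eps_bp - beta * N ** (-2.0 / 3.0) - eps)
    return Q(z / alpha)

for N in (2048, 8192, 32768):
    print(N, scaling_law(N, eps=0.45))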

Optimization

[Plot: optimization of the degree distribution; labels visible: degree profiles (10-13 and 10-26), 40.58 %, rate/capacity (scale 0.0-1.0), contribution to error floor; block error probability from 10^{-6} to 10^{-1} versus \epsilon from 0.2 to 0.75.]

Finite-Length Scaling of LDPC Codes -- BAWGNC

same form of scaling law; parameters are computable but no proof

[Plots: (3, 6) ensemble on the BSC and (3, 4) ensemble on the BAWGNC.]

Gap To Threshold versus Length


\lim_{N\to\infty:\; N^{1/\mu}(C-R)=z} P_N(R, C) = f(z)

Fixing the error probability fixes z. With additive gap \delta = C - R:
N^{1/\mu}\,(C - R) = z \;\Rightarrow\; N = (z/\delta)^{\mu}.

With \mu = 2, halving the gap requires increasing the length by a factor of 4.
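Spelling out that arithmetic as a tiny sketch (the value of z is an illustrative placeholder, not from the slides):

mu, z = 2.0, 1.0                       # scaling exponent; z fixed by the target error probability
for delta in (0.04, 0.02, 0.01, 0.005):
    print(delta, (z / delta) ** mu)    # each halving of the gap multiplies N by 2^mu = 4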

Gap versus Complexity (per bit)


BEC/Threshold -- O(1); degrees are constant and we touch every edge at most once.
BEC/Capacity -- O(log(1/\delta)) for standard LDPC; degrees grow like log(1/\delta) and we touch every edge once.
BEC/Capacity -- O(1) for MN-type LDPC ensembles; degrees are constant and we touch every edge at most ``once''.
BMS/Threshold -- ???
BMS/Capacity -- ???

Sparse Graph Codes -- Some References


Big bang:
R. G. Gallager, Low-density parity-check codes, IRE Trans. Inform. Theory, 8 (1962).
C. Berrou, A. Glavieux, and P. Thitimajshima, Near Shannon limit error-correcting coding and decoding, in Proc. of ICC, Geneva, Switzerland, May 1993.
Analysis:
M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. A. Spielman, Analysis of low density codes and improved designs using irregular graphs, in Proc. of the 30th Annual ACM Symposium on Theory of Computing, 1998, pp. 249-258.
M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. A. Spielman, Efficient erasure correcting codes, IEEE Trans. Inform. Theory, 47 (2001), pp. 569-584.
T. Richardson, A. Shokrollahi, and R. Urbanke, Design of capacity-approaching irregular low-density parity-check codes, IEEE Trans. Inform. Theory, 47 (2001), pp. 619-637.
T. Richardson and R. Urbanke, The capacity of low-density parity-check codes under message-passing decoding, IEEE Trans. Inform. Theory, 47 (2001), pp. 599-618.
S.-Y. Chung, G. D. Forney, Jr., T. Richardson, and R. Urbanke, On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit, IEEE Commun. Lett., 5 (2001), pp. 58-60.
Error exponents:
D. Burshtein and G. Miller, Expander graph arguments for message-passing algorithms, IEEE Trans. Inform. Theory, 47 (2001), pp. 782-790.
O. Barak and D. Burshtein, Upper Bounds on the Error Exponents of LDPC Code Ensembles, IEEE International Symposium on Information Theory (ISIT 2006), Seattle, July 2006.

Sparse Graph Codes -- Some References


Finite-length scaling:
A. Montanari, Finite-size scaling of good codes, in Proc. of the Allerton Conf. on Commun., Control, and Computing, Monticello, IL, USA, Oct. 2001.
A. Amraoui, A. Montanari, T. Richardson, and R. Urbanke, Finite-length scaling for iteratively decoded LDPC ensembles, in Proc. of the Allerton Conf. on Commun., Control, and Computing, Monticello, IL, USA, Oct. 2003.
J. Ezri, A. Montanari, and R. Urbanke, Finite-length scaling for Gallager A, in 44th Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2006.
A. Dembo and A. Montanari, Finite size scaling for the core of large random hyper-graphs. E-print: math.PR/0702007, 2007.
J. Ezri, A. Montanari, S. Oh, and R. Urbanke, The slope scaling parameter for general channels, decoders, and ensembles, in Proc. of the IEEE International Symposium on Information Theory, 2008.
Complexity:
A. Khandekar and R. J. McEliece, On the complexity of reliable communication on the erasure channel, in Proc. IEEE Int. Symp. Information Theory (ISIT 2001), Washington, DC, Jun. 2001, p. 1.
H. D. Pfister, I. Sason, and R. Urbanke, Capacity-achieving ensembles for the binary erasure channel with bounded complexity, IEEE Transactions on Inform. Theory, vol. 51, issue 7, 2005, pp. 2352-2379.
Overviews:
D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press, 2003.
T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge Univ. Press, 2008.

Some Open Questions

Simple design procedures?


Can you achieve capacity on general BMS channels?
Thresholds under LP decoding?
Scaling for general BMS channels?
Scaling under MAP?
Scaling under LP decoding?
Scaling under flipping decoding?
Scaling to capacity?

Polar Codes


Codes from Kronecker Product of G2

Reed-Muller Codes

choose rows of largest weight

Definition of Channels
Polar Codes

W -- BMS channel

Channel Polarization

[Figure: the capacities of the synthesized channels polarize; a fraction approaches 0 (bad channels) and the rest approaches 1.]


Successive Decoding

Stefan Meier http://ipgdemos.epfl.ch/polarcodes/

threshold

Channel Polarization

Stefan Meier http://ipgdemos.epfl.ch/polarcodes/

How Do Channels Polarize?


Two uses of BEC(\epsilon): inputs X1, X2, observations Y1, Y2.

U1 = X1 + X2; observe Y1 and Y2.
Parity-check node: U1 can be recovered only if both Y1 and Y2 are unerased, so its erasure probability is 1 - (1-\epsilon)^2 -- much worse.

U2 = X2, with U1 already known (U2 = X1 + U1); observe Y1 and Y2.
Repetition code: U2 is lost only if both Y1 and Y2 are erased, so its erasure probability is \epsilon^2 -- much better.

Total capacity is preserved: (1-\epsilon)^2 + (1 - \epsilon^2) = 2(1-\epsilon).
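The same conservation holds for any BMS channel, by the chain rule; this is a standard identity, spelled out here for completeness (W^- and W^+ denote the two synthesized channels):

2\,I(W) = I(X_1 X_2; Y_1 Y_2) = I(U_1 U_2; Y_1 Y_2)
        = I(U_1; Y_1 Y_2) + I(U_2; Y_1 Y_2 \mid U_1)
        = I(W^-) + I(W^+)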

How Do Channels Polarize?

[Polarization tree for BEC(0.5): starting from erasure probability 0.5, one step gives 0.25 and 0.75; two steps give 0.0625, 0.4375, 0.5625, 0.9375; three steps give 0.0039, 0.1211, 0.1914, 0.3164, 0.6836, 0.8086, 0.8789, 0.9961.]
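One way to reproduce these numbers (a sketch; on the BEC each parameter z splits into 2z - z^2 for the worse channel and z^2 for the better one):

def polarize(levels, z0=0.5):
    # erasure probabilities of the synthesized channels after `levels` polarization steps
    zs = [z0]
    for _ in range(levels):
        zs = [z for p in zs for z in (2 * p - p * p, p * p)]
    return zs

print([round(z, 4) for z in polarize(3)])
# [0.9961, 0.8789, 0.8086, 0.3164, 0.6836, 0.1914, 0.1211, 0.0039] -- the values on the slide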

Polar Codes -- Some References


Big bang:
E. Arikan, Channel polarization: A method for constructing capacity-achieving codes for symmetric
binary-input memoryless channels, http://arxiv.org/pdf/0807.3917
Exponent:
E. Arikan and E. Telatar, On the Rate of Channel Polarization, http://arxiv.org/pdf/0807.3806
S. B. Korada, E. Sasoglu, and R. Urbanke, Polar Codes: Characterization of Exponent, Bounds, and
Constructions, http://arxiv.org/pdf/0901.0536
Source Coding:
N. Hussami, S. B. Korada, and R. Urbanke, Performance of Polar Codes for Channel and Source
Coding, http://arxiv.org/pdf/0901.2370
S. B. Korada and R. Urbanke, Polar Codes are Optimal for Lossy Source Coding, http://arxiv.org/pdf/0903.0307
E. Arikan, Source Polarization, http://arxiv.org/pdf/1001.3087
Non-symmetric and non-binary channels:
E. Sasoglu, E. Telatar, and E. Arikan, Polarization for arbitrary discrete memoryless channels, http://arxiv.org/pdf/0908.0302
R. Mori and T. Tanaka, Channel Polarization on q-ary Discrete Memoryless Channels by Arbitrary
Kernels, http://arxiv.org/pdf/1001.2662
R. Mori and T. Tanaka, Non-Binary Polar Codes using Reed-Solomon Codes and Algebraic Geometry
Codes, http://arxiv.org/pdf/1007.3661

MAC channel:
E. Sasoglu, E. Telatar, and E. Yeh, Polar codes for the two-user multiple-access channel, http://arxiv.org/pdf/1006.4255
E. Abbe and E. Telatar, Polar Codes for the m-User MAC and Matroids, http://arxiv.org/pdf/1002.0777
Compound channel:
S. H. Hassani, S. B. Korada, and R. Urbanke, The Compound Capacity of Polar Codes, http://arxiv.org/pdf/0907.3291
Wire-tap channel and security:
H. Mahdavifar and A. Vardy, Achieving the Secrecy Capacity of Wiretap Channels Using Polar Codes,
http://arxiv.org/pdf/1007.3568
E. Hof and S. Shamai, Secrecy-Achieving Polar-Coding for Binary-Input Memoryless Symmetric Wire-Tap Channels, http://arxiv.org/pdf/1005.2759
Mattias Andersson, Vishwambhar Rathi, Ragnar Thobaben, Joerg Kliewer, Mikael Skoglund, Nested
Polar Codes for Wiretap and Relay Channels, http://arxiv.org/pdf/1006.3573
O. O. Koyluoglu and H. El Gamal, Polar Coding for Secure Transmission and Key Agreement, http://arxiv.org/pdf/1003.1422
Constructions:
R. Mori and T. Tanaka, Performance and Construction of Polar Codes on Symmetric Binary-Input
Memoryless Channels, http://arxiv.org/pdf/0901.2207
M. Bakshi, S. Jaggi, and M. Effros, Concatenated Polar Codes, http://arxiv.org/pdf/1001.2545
Scaling:
S. H. Hassani and R. Urbanke, On the scaling of Polar codes: I. The behavior of polarized channels,
http://arxiv.org/pdf/1001.2766
T. Tanaka and R. Mori, Refined rate of channel polarization, http://arxiv.org/pdf/1001.2067
S. H. Hassani, K. Alishahi and R. Urbanke, On the scaling of Polar Codes: II. The behavior of unpolarized channels, http://arxiv.org/pdf/1002.3187

Error Exponent of Polar Codes -- BEC


A First Guess

Bhattacharyya (erasure) parameter on the BEC:
Z \to Z^2 \text{ wp } \tfrac12, \qquad Z \to 1-(1-Z)^2 = 2Z - Z^2 \text{ wp } \tfrac12.

Assume that Z is already small, hence Y = -\log_2 Z is large:
Y \to 2Y \text{ wp } \tfrac12, \qquad Y \to Y - 1 \text{ wp } \tfrac12.

With X = \log_2 Y:
X \to X + 1 \text{ wp } \tfrac12, \qquad X \to X \text{ wp } \tfrac12.

Error Exponent of Polar Codes -- BEC


A First Guess

X \to X + 1 \text{ wp } \tfrac12, \qquad X \to X \text{ wp } \tfrac12:
a random walk on the lattice with drift.

After m steps we expect X to have value roughly m/2,
hence we expect Y to have value roughly 2^{m/2},
hence we expect Z to have value roughly 2^{-2^{m/2}}.

Error Exponent of Polar Codes

\lim_{m\to\infty} P\!\Bigl(Z_m \le 2^{-2^{\,m/2 + \sqrt{m}\,Q^{-1}(R/C)/2 + o(\sqrt{m})}}\Bigr) = R
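A Monte Carlo sketch of the Bhattacharyya process behind these statements (m and the number of trials are arbitrary choices of mine); it tracks Y = -log2(Z) to avoid underflow and checks that, for the channels that polarize to ``good'', log2(Y_m) is indeed roughly m/2:

import random
from math import log2
from statistics import median

def run(m, z0=0.5):
    y = -log2(z0)                                      # Y = -log2(Z)
    for _ in range(m):
        if random.random() < 0.5:
            y = 2.0 * y                                # Z -> Z^2
        else:
            y = max(y - log2(2.0 - 2.0 ** (-y)), 0.0)  # Z -> 2Z - Z^2, rewritten in the Y domain
    return y

m, trials = 30, 20000
good = [y for y in (run(m) for _ in range(trials)) if y > 1.0]   # keep channels with Z_m < 1/2
print("median log2(Y_m) over good channels:",
      median(log2(y) for y in good), "vs m/2 =", m / 2)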

Finite-Length Scaling for Polar Codes (BEC)

Bhattacharyya process on the BEC:
Z \to Z^2 \text{ wp } \tfrac12, \qquad Z \to 1-(1-Z)^2 \text{ wp } \tfrac12.

\epsilon = 1/2: the distribution is symmetric (the general case follows in a similar manner).

Q_N(x) = \frac{1}{N}\,\bigl|\{\, i : x \in E(W_N^{(i)}) \,\}\bigr|

[Figure: density of the Bhattacharyya parameter between z = 0 and z = 1, symmetric around 1/2; axes relabeled via z = 2x.]

Finite-Length Scaling for Polar Codes -- BEC

2\,Q_{2N}(x) = Q_N\!\Bigl(\tfrac12 - \sqrt{\tfrac14 - \tfrac{x}{2}}\Bigr) + \bigl(1 - 2\cdot\mathbf{1}\{x \ge 1/8\}\bigr)\, Q_N\!\Bigl(\min\bigl(\sqrt{\tfrac{x}{2}},\, \tfrac12 - \sqrt{\tfrac{x}{2}}\bigr)\Bigr)

Scaling assumption: Q(x) = \lim_{N\to\infty} N^{1/\mu}\, Q_N(x), which gives

Q(x) = 2^{1/\mu - 1}\Bigl[\, Q\!\Bigl(\tfrac12 - \sqrt{\tfrac14 - \tfrac{x}{2}}\Bigr) + \bigl(1 - 2\cdot\mathbf{1}\{x \ge 1/8\}\bigr)\, Q\!\Bigl(\min\bigl(\sqrt{\tfrac{x}{2}},\, \tfrac12 - \sqrt{\tfrac{x}{2}}\bigr)\Bigr)\Bigr].

Solve this functional equation (numerically); this gives Q(x) (up to scaling) and \mu:
BEC: \mu \approx 3.62 (BAWGNC analogous).

Finite-Length Scaling for Polar Codes -- BEC


Simulations versus Scaling

[Plot: \log_{10} P_N(R, C) versus N^{1/\mu}(C-R) and versus C-R, for N = 2^{23}, 2^{24}, 2^{25}, 2^{26}; the simulated curves collapse onto the scaling prediction when plotted against N^{1/\mu}(C-R).]

Gap To Capacity versus Length

0.43 bits/channel use: 86 % of capacity.

\lim_{N\to\infty:\; N^{1/\mu}(C-R)=z} P_N(R, C) = f(z)

Fixing the error probability fixes z. With additive gap \delta = C - R:
N^{1/\mu}\,(C - R) = z \;\Rightarrow\; N = (z/\delta)^{\mu}.

Roughly 10 billion to get 1 % close to capacity!
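The arithmetic behind the length blow-up, as a tiny sketch using \mu \approx 3.62 from the BEC analysis:

mu = 3.62
print("length factor per halving of the gap:", 2 ** mu)    # ~ 12.3
print("length factor per 10x smaller gap:", 10 ** mu)      # ~ 4.2e3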

Gap to Capacity versus Complexity

complexity per bit = O(\log(1/\delta))

Some Open Questions

Variation on the theme that performs better at small lengths?


Do RM codes achieve capacity?
Make scaling conjecture under successive decoding rigorous.
Scaling behavior under MAP decoding?
Find a reasonable channel where they do not work. :-)

Message
sparse graph codes -- best codes in ``practice''; still missing some theory; the error floor region is tricky; still somewhat of an art to construct
polar codes -- nice for theory; not (yet) ready for applications, but the field is young; how do we improve finite-length performance?
scaling behavior is the next best thing to an exact analysis; probably a more meaningful characterization for the practical case than the error exponent
