
2006 IEEE Information Theory Workshop, Punta del Este, Uruguay, March 13-17, 2006

Design Principles for Raptor Codes


Payam Pakzad Laboratoire d'algorithmique (ALGO) Ecole Polytechnique Federale de Lausanne Lausanne, Switzerland

payam.pakzad@epfl.ch

Amin Shokrollahi Laboratoire d'algorithmique (ALGO) Ecole Polytechnique Federale de Lausanne Lausanne, Switzerland amin.shokrollahi@epfl.ch

Abstract-In this paper we describe some practical aspects of the design process of good Raptor codes for finite block lengths over arbitrary binary input symmetric channels. In particular we introduce a simple model for the finite-length convergence behavior of the iterative decoding algorithm based on density evolution, and propose a practical design procedure. We report simulation results for some example codes.
I. INTRODUCTION

Fig. 1. Graphical representation of a Raptor code: precoding (adding redundant nodes), followed by LT-coding.

The introduction of sparse-graph codes, combined with low-complexity iterative decoding algorithms, has fundamentally changed the approach to error-correcting codes. Whereas not long ago the idea of reliable transmission at rates close to the channel capacity was considered a mere theoretical possibility, owing to the high complexity of decoding algebraic codes, today we are closer than ever to achieving these bounds. With turbo codes [1], variations of low-density parity-check (LDPC) codes [4], [8], [7], [10], repeat-accumulate (RA) codes [2], and other similar graph codes, we now have very good codes with rates arbitrarily close to capacity and with low-complexity decoding algorithms.

Fountain codes are a new class of codes originally designed for, and ideally suited to, communication over an erasure channel with unknown erasure probability. The important feature of fountain codes, apart from their low-complexity encoding and decoding algorithms, is that they are naturally rateless: for a given finite set of input symbols, a fountain code can generate an infinite sequence of coded outputs, each of which is a linear combination of a random subset of the inputs. Thus, unlike fixed-rate codes, which require puncturing, shortening or other such operations in order to achieve rates other than the design rate, a fountain code achieves this quite naturally by generating fewer or more output symbols.

LT codes [6] were the first class of efficient fountain codes. An LT code is described by an output degree distribution; to generate an output, a degree d is sampled from this distribution, independently of the past, and the output is formed as the sum of a uniformly-randomly-chosen subset of size d of the inputs. The complexity of encoding and decoding LT codes is linear in the block length only if the average degree of the outputs is constant. It can be shown, however, that for maximum-likelihood decoding of LT codes to have vanishing probability of error, this average degree must grow at least logarithmically with the number of inputs. This means that the complexity of reliably decoding LT codes grows super-linearly with the block length of the code.

Raptor codes [12] are an extension of LT codes which solve this problem and yield linear-time encoders and decoders. Raptor codes are universally capacity-achieving on erasure channels, see [12]. They are also conjectured to achieve capacity on other binary-input symmetric channels [3]. Although a complete analysis is still missing, the asymptotic fractions of degree-one and degree-two output nodes in any capacity-achieving sequence of Raptor codes are known [3]. Nevertheless, the practically important problem of designing finite-length codes can be addressed separately. This paper is a short description of some of the practical issues involved, and of techniques used by the authors, in the design process.

Here is the outline of this paper. In Section II we give a brief overview of Raptor codes, their iterative decoding algorithms, and density evolution. In Section III we introduce a simple model for the finite-length convergence behavior of the decoding algorithm and propose a practical design procedure. Finally, we present some example codes obtained using our design procedure and report simulation results in Section IV.
II. RAPTOR CODES

Although Raptor codes, being a subclass of fountain codes, can in principle be described using a distribution on the subsets of neighbors of each output symbol, it is most useful to think of them as an extension of LT codes. The main idea behind Raptor codes is to first encode the input symbols using a block code with a linear-time encoder and decoder. The outputs are then produced by applying an LT code to these intermediate symbols, see Fig. 1. Whereas LT coding with constant average output degree is inevitably inadequate to correct all the errors, the high-rate precoder is able to complete the correction. Meanwhile, the overall code retains the desirable property of being rateless.
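To make the encoding concrete, the following sketch generates rateless output symbols over GF(2). The tiny parity precode and the small degree distribution are illustrative placeholders, not the constructions used in this paper.

```python
import random

# Toy output degree distribution, {degree d: probability}; illustrative only,
# not an optimized distribution from this paper.
OMEGA = {1: 0.05, 2: 0.50, 3: 0.25, 4: 0.20}

def lt_output(intermediate, rng=random):
    """Generate one rateless LT output: XOR of a random size-d subset."""
    degrees, probs = zip(*OMEGA.items())
    d = rng.choices(degrees, weights=probs, k=1)[0]
    neighbors = rng.sample(range(len(intermediate)), d)
    value = 0
    for i in neighbors:
        value ^= intermediate[i]
    return neighbors, value  # the decoder needs the neighbor set (or its seed)

def toy_precode(bits):
    """Placeholder high-rate precode: append a single overall parity bit."""
    return bits + [sum(bits) % 2]

# A Raptor encoder precodes k input bits into n > k intermediate bits,
# then streams LT outputs indefinitely (ratelessness).
k_input = [1, 0, 1, 1]
intermediate = toy_precode(k_input)
stream = [lt_output(intermediate) for _ in range(8)]
```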



Decoding can be done either in two stages (LT decoding, followed by decoding of the precoder) or, more efficiently, as one sparse-graph code; as far as the fixed points of the iterative decoding are concerned, this variation in the 'message-updating schedule' is irrelevant. A Raptor code is thus defined by a precoder and an LT code, which in turn is specified by its output degree distribution. In this paper we are mainly concerned with the design of the LT component of the code according to some design parameters, and leave the discussion of the design of the precoder to future publications.

The main practical tool used in the analysis and design of graphical codes under iterative decoding is density evolution (DE), see [10], [11]. As the name suggests, density evolution tracks the asymptotic evolution of the distribution of the messages during the iterative decoding, for a random representative of an ensemble of graph codes. On erasure channels this can be done analytically [7]; in more complicated cases it is done numerically, see e.g. [5] for an efficient implementation. The DE update rules for the distributions of messages follow immediately from the message update rules of the iterative decoding algorithm. The most natural choice for the decoding algorithm is belief propagation [9], which updates the messages according to a Bayesian probability model under the assumption that the information contained in the messages is independent. This assumption is justified, since finite neighborhoods in large, random, sparse graphs are, with high probability, cycle-free. The algorithm is usually described so that messages are the log-likelihood ratios (LLRs) of the corresponding input bit, based on a subset of the output observations. The belief at an input node is then the LLR of the corresponding bit, based on all the information contained in the arriving messages.

We refer the reader to [11], [3] for a more detailed discussion of the density evolution rules for belief propagation. Here, we simply assume that, for a given ensemble of codes, we can track a distribution f_t(z) of the messages at iteration t of the decoding. Thus, a fraction f_t(z) dz of the messages (corresponding to LLRs) is expected to lie in the neighborhood dz of z. The expected probability of error, p_t, at iteration t can then be calculated from f_t(z) by integrating over the 'incorrect' LLR values. Clearly p_0 = 0.5, and it can be shown [10] that {p_t} is a decreasing sequence. We define Δ_DE(p_t) := p_t − p_{t+1} to be the change in the probability of error after the (t+1)st iteration. Therefore, the higher Δ_DE(p_t) is, the faster the decoding proceeds at iteration t. More generally, we can define Δ_DE(p) as the smooth continuation of Δ_DE over p ∈ [0, 0.5]. Indeed, taking advantage of certain apparent robustness properties of the density evolution process, we are able to 'sample' this function at any point in its domain, even below the largest fixed point p_∞ := sup{p ∈ [0, 0.5] : Δ_DE(p) = 0} of the sequential algorithm. We use this technique to substantially reduce the complexity of the design optimization problem which we will pose later in this paper. Fig. 2 depicts an example Δ_DE(p) curve and its continuation; the sequential decoding is expected to terminate with a residual probability of error p_∞ ≈ 0.25.

Fig. 2. Asymptotic change in the probability of error vs. the residual probability of error in the decoding process.

III. DESIGN OF RAPTOR CODES

As described in the previous section, density evolution is a tool which predicts the asymptotic behavior of the iterative decoding; given an ensemble of codes (which we describe by an output degree distribution), one can only predict that, for large enough block lengths, a random code in the ensemble is expected to behave as predicted by the density evolution. Translating the information provided by the density evolution into probabilistic guarantees for finite block lengths requires a model for the dynamics of the decoding process. For erasure channels, such models exist, see [6], [12]; in short, the fraction of erasures at a given iteration of the decoding process is modelled as a random walk, leading to a lower bound on the size of the ripple, i.e. the number of input symbols that can be recovered at a given stage of the decoding process, as a function of the desired fidelity of the decoding.

Here we devise a general model of the same flavor, applicable to arbitrary binary-input symmetric channels. Fix a code with input length k and rate r, drawn randomly from an output degree distribution Ω(x). Define P_e(t) to be the probability of error of the code at stage t of the decoding, i.e. the fraction of the input bits with incorrect estimates of their respective values. The change ΔP_e(t) := P_e(t) − P_e(t+1) in the probability of error from one iteration to the next is then an analogue of the ripple in the erasure case; it is the fraction of errors 'correctable' in the next iteration of the algorithm. The bigger this fraction is, the faster the decoding will proceed, and the more unlikely it is that the decoding will get stuck in an incorrect fixed point. We can calculate the expected distributions f_t of the belief LLR values at stage t using the density evolution. We will then aim to find an 'optimal value' for ΔP_e(t), based on the code length, the desired error tolerance and the speed of convergence, and shape the output degree distribution so that ΔP_e, as a function of P_e, follows the optimal curve as closely as possible.
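To illustrate how the quantities p_t and Δ_DE(p_t) arise, the sketch below runs density evolution for an LT code on the erasure channel, where the one-step recursion p_{t+1} = exp(−(1+ε) Ω'(1−p_t)) is available in closed form. The degree distribution and overhead are illustrative assumptions; on the BEC we track the residual erasure probability, so the recursion starts at p_0 = 1 rather than the p_0 = 0.5 used for error probabilities on general channels.

```python
import math

# Illustrative LT output degree distribution, {degree: probability};
# a toy choice, not one of the optimized distributions of this paper.
OMEGA = {1: 0.05, 2: 0.45, 3: 0.30, 8: 0.20}
EPS = 0.05  # relative overhead: k*(1+EPS) outputs are collected

def omega_prime(x):
    """Derivative Omega'(x) of the output degree polynomial."""
    return sum(d * w * x ** (d - 1) for d, w in OMEGA.items())

def delta_de_samples(n_iter=50):
    """Track p_t on the BEC, where DE is analytic:
    p_{t+1} = exp(-(1+EPS) * Omega'(1 - p_t)), and record
    Delta_DE(p_t) = p_t - p_{t+1} (positive while decoding progresses)."""
    p, samples = 1.0, []  # residual erasure probability starts at 1 on the BEC
    for _ in range(n_iter):
        p_next = math.exp(-(1.0 + EPS) * omega_prime(1.0 - p))
        samples.append((p, p - p_next))
        p = p_next
    return samples  # decoding stalls near the largest fixed point, Delta_DE -> 0

for p, delta in delta_de_samples()[:5]:
    print(f"p_t = {p:.4f}, Delta_DE = {delta:.5f}")
```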


To that end, we make the following simple assumptions about the decoding process:

* At the point of the decoding process when the probability of error is P_e(t) = x, the belief LLRs of the input bits are i.i.d., with distribution f_x, the distribution obtained by the density evolution at a stage with the same probability of error.
* Belief LLRs at different iterations of the decoding process are independent.
* Decoding fails when the probability of error increases from one iteration to the next.

Clearly the above assumptions do not give an accurate portrait of the complex dynamics of the iterative decoding, but as we will see, they lead to reasonable design conditions.

Fig. 3. Points on the curve σ(p) for the AWGN symmetric channel with noise standard deviation 0.9787, corresponding to 300 random output degree distributions.

Suppose then that the distribution f_DE of LLR values, when converted to a distribution on the error probabilities, has mean p_t and standard deviation σ_t. Note that by eliminating the variable t, we can represent the standard deviation as an explicit function of the mean error, so that σ(p_t) := σ_t. According to the first assumption above, the number kP_e(t) of errors is the sum of k i.i.d. random variables with the given distribution. Thus, for k moderately large, we can take P_e(t) to be Gaussian, N(p_t, σ_t²/k). From the third assumption above, the decoding fails if P_e(t+1) > P_e(t). Finally, from the second assumption, this happens with probability

∫_{−∞}^{∞} N(p_t, σ_t²/k)(y) ∫_y^{∞} N(p_{t+1}, σ_{t+1}²/k)(z) dz dy.
Suppose further, as is normally the case, that ΔP_e(t) is small and the curve σ(p) is smooth, so that σ_t ≈ σ_{t+1}. Define

Φ(x) := ∫_{−∞}^{∞} N(0,1)(y) ∫_{y+x}^{∞} N(0,1)(z) dz dy,

and let λ_ε := Φ^{-1}(ε) for ε ∈ (0,1). Then, according to the above model, the probability of a decoding failure at P_e > δ, for some small positive δ, is upper-bounded by ε if p_t − p_{t+1} ≥ λ_ε σ_t/√k while p_t > δ, or equivalently,

Δ_DE(p) ≥ (λ_ε/√k) σ(p)   for p > δ,      (1)

where the functions Δ_DE(p) and σ(p) are obtained from the density evolution. Of course σ(p) itself depends on the chosen degree distribution Ω(x), but as is experimentally verifiable, this dependence is rather weak; indeed, for p small, different output distributions seem to result in a value of σ(p) which depends only on the underlying channel. Fig. 3 depicts points on the curves σ(p) for 300 randomly chosen output distributions over a binary-input channel with Gaussian noise of standard deviation 0.9787. Fig. 4 compares the fitted curves σ(p) for a few different channels.

Fig. 4. Fitted σ(p) curves for various symmetric channels.

In practice, the probability of decoding failure is not the only design parameter of interest; one is also interested in bounding the complexity of the decoding process. Indeed it is quite possible, but not desirable, to design codes that converge very slowly, and hence require a large number of decoding iterations. The analysis above leading to inequality (1) does not address the issue of the rate of convergence. Fortunately, however, the expected number of iterations required to converge is also inferable from the function Δ_DE(p): starting with p_0 = 0.5, we calculate the sequence p_{t+1} = p_t − Δ_DE(p_t). The expected number of iterations to converge to a point with probability of error δ is the smallest t for which p_t < δ. Therefore, from a practical point of view it is essential that Δ_DE(p) is not too small for p > δ. Of course this constraint is much more important at the beginning of the process (p ≈ 0.5) than towards the end, since in the latter case a premature termination causes only a marginal loss in the potential error-correcting power of the decoder. Suppose that Δ_DE(p) ≥ δ'' for p > δ', for some δ'. Then the expected number of iterations to achieve P_e = δ' is at most (0.5 − δ')/δ''. Thus, if a target number N of iterations is desired to arrive at this point, we impose

Δ_DE(p) ≥ (0.5 − δ')/N   for p > δ'.      (2)

Conditions (1) and (2), parameterized by the design parameters ε, δ, N and δ', define the 'optimal' shape for the curve Δ_DE(p) according to our simple model. Note that, by choosing different forms for the lower bound in (2), rather than a constant, it is always possible to make this optimal lower bound a continuous function of p. For the purposes of this presentation, we simply correlate our target parameters so as to achieve this with (1) and (2).
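A minimal sketch of how conditions (1) and (2), as reconstructed above, combine into a target lower bound for Δ_DE(p) is given below; the σ(p) fit is a placeholder rather than one of the fitted curves of Fig. 4.

```python
import math

def phi(x):
    """Phi(x) = P(Z - Y > x) for independent standard normals Z, Y,
    i.e. Q(x / sqrt(2)) = 0.5 * erfc(x / 2)."""
    return 0.5 * math.erfc(x / 2.0)

def lambda_eps(eps, lo=0.0, hi=20.0):
    """Invert Phi by bisection (Phi is strictly decreasing)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def target_delta(p, k, eps, delta, delta_prime, n_iter, sigma):
    """Lower bound for Delta_DE(p) combining the two conditions:
    (1) Delta_DE(p) >= lambda_eps(eps) * sigma(p) / sqrt(k)  for p > delta
    (2) Delta_DE(p) >= (0.5 - delta_prime) / n_iter          for p > delta_prime"""
    bound = 0.0
    if p > delta:
        bound = max(bound, lambda_eps(eps) * sigma(p) / math.sqrt(k))
    if p > delta_prime:
        bound = max(bound, (0.5 - delta_prime) / n_iter)
    return bound

# The paper's first-design parameters with an illustrative sigma(p) fit:
sigma_fit = lambda p: 0.5 * math.sqrt(p * (1.0 - p))  # placeholder only
print(target_delta(0.3, k=10000, eps=0.004, delta=0.01,
                   delta_prime=0.15, n_iter=37, sigma=sigma_fit))
```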


It should be noted that more restrictive combinations of design parameters, i.e. smaller δ, ε, k and N, will inevitably require larger overheads for the code. There is therefore, naturally, a trade-off between these design parameters and the rate of operation of the code.

In the remainder of this section, we discuss practical methods to shape the curve Δ_DE(p) to a given optimal form. As mentioned in Section II, by taking advantage of the apparent continuity of Δ_DE(p) as a functional of the degree distribution Ω(x), we have devised a fast sampling method that replaces the complete sequential density evolution. This has opened the possibility of numerical optimization of the degree distribution Ω(x). Here we briefly discuss two such methods, and leave a detailed description to future publications.

Our first method uses standard numerical optimization techniques, such as the differential evolution of [13], in combination with a cost function appropriately measuring the 'distance' from the optimal Δ_DE(p) curve. Differential evolution is an effective numerical optimization algorithm in which each generation of candidates is perturbed partly randomly and partly according to the differences of other surviving candidates. The output node degrees may either be fixed a priori or left as optimization parameters.

The second method is somewhat more subtle and once again uses the apparent continuity of Δ_DE(p) as a functional of Ω(x). In short, we form a carefully designed optimization program based on the values of Δ_DE(p) for an initial collection of candidates; optimization is then done using standard methods. For larger optimization problems with many parameters, this method performs better and faster than the first method.
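As a rough illustration of the first method only, the sketch below drives scipy's differential evolution with a cost that penalizes any shortfall of Δ_DE(p) below a target curve. A closed-form BEC recursion stands in for the fast Δ_DE sampler described above, and the degree set, target parameters and σ(p) fit are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import differential_evolution

DEGREES = np.array([1, 2, 3, 4, 8, 15, 21])  # output degrees fixed a priori
P_GRID = np.linspace(0.02, 0.5, 25)
OVERHEAD = 0.06
K, LAM, DELTA_P, N_IT = 10000, 3.75, 0.15, 37

def delta_de_bec(w, p):
    # BEC stand-in for the fast Delta_DE sampler: the one-step DE map is
    # p -> exp(-(1 + overhead) * Omega'(1 - p)), so Delta_DE is closed-form.
    omega_prime = np.sum(w * DEGREES * (1.0 - p) ** (DEGREES - 1))
    return p - np.exp(-(1.0 + OVERHEAD) * omega_prime)

def target(p):
    # Toy lower bound combining conditions (1) and (2); the sigma(p) fit
    # is a placeholder, not a curve from the paper.
    sigma = np.sqrt(p * (1.0 - p))
    return max(LAM * sigma / np.sqrt(K), (0.5 - DELTA_P) / N_IT * (p > DELTA_P))

def cost(raw):
    w = np.abs(raw) + 1e-12
    w /= w.sum()  # normalize the candidate into a probability distribution
    shortfall = [max(target(p) - delta_de_bec(w, p), 0.0) for p in P_GRID]
    return float(np.sum(np.square(shortfall)))

result = differential_evolution(cost, bounds=[(0.0, 1.0)] * len(DEGREES),
                                seed=1, maxiter=100, tol=1e-7)
omega = np.abs(result.x) + 1e-12
omega /= omega.sum()  # optimized degree distribution over DEGREES
```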
IV. EXAMPLE CODES AND SIMULATION RESULTS

In this section we report some simple example codes obtained using the methods of Section III. For each case we used the first optimization method of the previous section, with the maximum output degree restricted to at most 21; this confines the potential performance of the code, but makes the optimization task faster. In addition, for the purposes of this presentation we restrict our attention to the design of the LT part of the Raptor code, with the assumption that a good precoder is appended to correct the residual error. We measure the success of the design by how closely and consistently the LT code performs compared to the design requirements. It should be noted that the example codes given here are meant merely to demonstrate the proposed design procedure, not to showcase eventual performance or usability as 'good' codes.

Our first design is for a binary symmetric channel with crossover probability p = 0.11, and capacity 1/2. Following the model of the previous section, we choose an input length k = 10000 and a failure threshold of 0.065, which according to our model corresponds to a decoding failure probability of ε ≈ 0.004. We further choose δ = 0.01, δ' = 0.15 and δ'' = 0.0095, so that the decoding is expected to reach the point with P_e = 0.15 in about (0.5 − 0.15)/0.0095 ≈ 37 iterations. The resulting optimized degree distribution is given below:

Ω_1(x) = 0.0103x + 0.3983x^2 + 0.2492x^3 + 0.0370x^4 + 0.1002x^5 + 0.0013x^6 + 0.0709x^7 + 0.0012x^8 + 0.0049x^9 + 0.0779x^14 + 0.0176x^17 + 0.0305x^21.
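Assuming the convention that the coefficient of x^d in Ω_1(x) is the probability of output degree d (an editorial reconstruction of the exponents), a quick sanity check confirms that the coefficients sum to 1 up to rounding and that the average output degree is a small constant (about 4.8), consistent with a linear-time LT stage:

```python
# Omega_1 as reconstructed: {degree d: probability}; the degree convention
# is an assumption, see the lead-in above.
OMEGA_1 = {1: 0.0103, 2: 0.3983, 3: 0.2492, 4: 0.0370, 5: 0.1002, 6: 0.0013,
           7: 0.0709, 8: 0.0012, 9: 0.0049, 14: 0.0779, 17: 0.0176, 21: 0.0305}

total = sum(OMEGA_1.values())                            # ~0.9993, i.e. 1 up to rounding
avg_degree = sum(d * w for d, w in OMEGA_1.items()) / total
print(total, avg_degree)                                 # constant average degree => linear-time encoding
```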

The target shape and the actual plot of the optimized Δ_DE(p) curve are depicted in Fig. 5.

Fig. 5. The target and the optimized Δ_DE(p) curve over BSC(0.11), with 6% overhead.

We ran 2000 simulation runs of a rate-0.475 random code chosen from this degree distribution, with k = 10000 input bits, on a BSC(0.11) channel. This amounts to about 5% overhead compared to the capacity of the channel. For each simulation run, 70 decoding iterations were performed. The average bit error probability P_e was 0.0064, as compared to 0.0055 predicted by the density evolution. The standard deviation of P_e was 0.016; in particular, the decoding terminated with P_e > 0.01 on four occasions, each with P_e > 0.30. Recall that according to our model the fidelity is ε ≈ 0.004, so a failure rate of 4/2000 = 0.002 is not surprising. The standard deviation of P_e on the 'successful' decodings was 0.0013.

Fig. 6 shows the Δ_DE(p) curve for this code over two other BSC channels. For this we chose BSC(0.061), with capacity 2/3, and BSC(0.172), with capacity 1/3. In each case, the overhead was chosen so that Δ_DE lies above the target curve. In particular, for BSC(0.061) a larger overhead of 10% was needed so that the curve would not dip too low prematurely.

Fig. 6. The target and actual Δ_DE(p) curves for the optimized code over different BSC channels. The overhead of each code is also given.

We ran similar sets of simulations for our code over these two channels. The average probability of error over BSC(0.061) was 0.0035, compared to 0.0036 predicted by DE. The standard deviation of P_e in this case was less than 0.0009, with every decoding run terminating with 0.0012 < P_e < 0.0065. The average probability of error over BSC(0.172) was 0.0058, compared to 0.0063 predicted by DE. The standard deviation of P_e in this case was less than 0.0058, with every decoding run terminating with 0.002 < P_e < 0.01.

Our second design is for a binary-input channel with Gaussian noise of standard deviation σ = 0.9787. The capacity of this channel is 1/2. We used a similar set of design parameters as in the above case, and obtained the following output degree distribution:

Ω_2(x) = 0.0121x + 0.4276x^2 + 0.2462x^3 + 0.0615x^4 + 0.0486x^5 + 0.0340x^6 + 0.0103x^7 + 0.0264x^8 + 0.0069x^9 + 0.0075x^10 + 0.0658x^11 + 0.0008x^17 + 0.0522x^21.

Fig. 7. The target and actual Δ_DE(p) curves for the optimized code over different AWGN channels. The overhead of each code is also given.
The target and the actual Δ_DE(p) curves for the optimized code over three different channels are shown in Fig. 7. Once again, each case was simulated for 2000 decoding runs of 70 iterations each. On the AWGN channel with σ = 0.9787 and 5% overhead compared to the capacity of 1/2, one out of the 2000 runs terminated with P_e = 0.087, and the rest had 0.0018 < P_e < 0.01, for an average of 0.0046 and a standard deviation of 0.0023. On AWGN(0.77), with 8% overhead over the channel capacity of 2/3, one run terminated with P_e = 0.36, and the rest had 0.0008 < P_e < 0.0070, with an average P_e of 0.0031 and a standard deviation of 0.0079 including the failed decoding, or 0.0008 excluding the failed case. Finally, on AWGN(1.3), with 5% overhead over the channel capacity of 1/3, all runs terminated with 0.0025 < P_e < 0.018; the average and the standard deviation of P_e were 0.0063 and 0.0017 respectively.

V. CONCLUSION

We proposed a simple model to predict the performance of finite-length Raptor codes under iterative decoding. Our model leads to a numerical design procedure for a given set of design parameters, including the block length, the probability of failure, the residual probability of error of the LT code, and the expected complexity of the decoding. Clearly this model is not sophisticated enough to capture all the intricacies of the optimal design process, and, as is common with model-based designs, further fine-tuning is required; rather, this model can be used as a practical design guideline. A more sophisticated model and a detailed discussion of the optimization techniques are deferred to future publications.

REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon Limit Error Correcting Coding and Decoding: Turbo-Codes," in Proc. IEEE International Conference on Communications (ICC), pp. 1064-1070, 1993.
[2] D. Divsalar, H. Jin, and R. McEliece, "Coding Theorems for 'Turbo-Like' Codes," in Proc. Allerton Conference, pp. 201-210, 1998.
[3] O. Etesami and A. Shokrollahi, "Raptor Codes on Symmetric Channels," preprint, available at http://algo.epfl.ch/index.php?p=output-pubs
[4] R. Gallager, Low Density Parity-Check Codes. MIT Press, Cambridge, MA, 1963.
[5] H. Jin and T. Richardson, "Fast Density Evolution," in Proc. 38th Conf. on Inform. Sciences and Systems (CISS), Princeton, NJ, 2004.
[6] M. Luby, "LT Codes," in Proc. IEEE Symposium on the Foundations of Computer Science (FOCS), pp. 271-280, 2002.
[7] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, "Efficient Erasure Correcting Codes," IEEE Trans. Inform. Theory, vol. 47, pp. 569-584, 2001.
[8] D. MacKay, "Good Error-Correcting Codes Based on Very Sparse Matrices," IEEE Trans. Inform. Theory, vol. 45, pp. 399-431, 1999.
[9] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988.
[10] T. Richardson and R. Urbanke, "The Capacity of Low-Density Parity-Check Codes Under Message-Passing Decoding," IEEE Trans. Inform. Theory, vol. 47, pp. 599-618, 2001.
[11] T. Richardson, A. Shokrollahi, and R. Urbanke, "Design of Capacity-Approaching Irregular Low-Density Parity-Check Codes," IEEE Trans. Inform. Theory, vol. 47, pp. 619-637, 2001.
[12] A. Shokrollahi, "Raptor Codes," preprint, available at http://algo.epfl.ch/index.php?p=output-pubs
[13] R. Storn and K. Price, "Differential Evolution: A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces," International Computer Science Institute, TR-95-012, 1995.
