
Viterbi implementation, using FPGAs

Version number 1.0 Date 2008

Authors: Mário Véstias, Helena Sarmento

Revised by: Helena Sarmento

Technical Report Project: UWB Receiver: baseband processing using reconfigurable hardware (UWBR) PTDC/EEA-ELC/67993/2006

Funded by:

Abstract
This report describes the Viterbi decoding algorithm and presents an implementation of the decoder for the UWB MB-OFDM technology. The work is based on a previous implementation that was analysed in order to improve performance. Results are presented for implementations with different numbers of soft bits and different traceback lengths.


Table of Contents
Viterbi Algorithm
    Convolutional Code
    Trellis Diagram
    State transitions
    Decoding
    Hard versus soft decision
    Euclidean distance
    Decoding Length
Viterbi Decoder implementation
    BMU
    ACSU
    SMU
    DU
Results
Conclusions
References


Viterbi Algorithm
The Viterbi algorithm is a maximum-likelihood algorithm for decoding convolutional codes. It is a recursive sequential minimization algorithm that can be used to find the least expensive way to route symbols from one edge of a state diagram to another. The algorithm uses a cost analysis mechanism to calculate the distance between the received symbol s and the symbol associated with an edge. It solves the minimization problem by recursively applying equation (1), where the minimum is taken over the states i that have a transition to state j.

$PM[j](t) = \min_{i} \left( PM[i](t-1) + BM[i,j](s) \right)$    (1)

PM[j](t) is the path metric associated with the minimum cost path leading to state j at time t. BM[i,j](s) is the branch metric associated with the transition from state i to state j, that is, the distance between the received symbol s and the symbol associated with the transition from state i at time t-1 to state j at time t.

Convolutional Code
A convolutional code is a type of error-correcting code in which each block of m input bits (m-bit string) is transformed into a block of n output bits, and the transformation is a function of the last k input bits. The quantity m/n is the code rate, a measure of the efficiency of the code. The constraint length k represents the number of bits in the encoder memory that affect the generation of the n output bits. The convolutional code is defined by a set of n generator polynomials for each input bit. The constraint length is set by the highest-degree generator polynomial: a polynomial of degree k - 1 taps k bits.

Figure 1 MB-OFDM convolutional encoder (the input feeds D6, followed by delay elements D down to D0; outputs outA, outB and outC)

Figure 1 presents the convolutional encoder defined for MB-OFDM, where m = 1, n = 3, code rate m/n = 1/3 and constraint length k = 7. The generator polynomials are given by equations (2), (3) and (4).

$G_0 = (133)_8 = (1011011)_2$    (2)
$G_1 = (165)_8 = (1110101)_2$    (3)
$G_2 = (171)_8 = (1111001)_2$    (4)

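G0 generates outA, G1 generates outB and G2 generates outC (equations (5), (6) and (7)). Each polynomial is read from D6 (most significant bit) to D0, and $\oplus$ denotes modulo-2 addition.

$OutA = D_0 \oplus D_1 \oplus D_3 \oplus D_4 \oplus D_6$    (5)
$OutB = D_0 \oplus D_2 \oplus D_4 \oplus D_5 \oplus D_6$    (6)
$OutC = D_0 \oplus D_3 \oplus D_4 \oplus D_5 \oplus D_6$    (7)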
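The encoder described above can be sketched in a few lines of Python. This is only an illustrative model, not the report's VHDL: the function names `parity` and `encode` are hypothetical, the register is assumed to start cleared, and bit 6 of each polynomial is assumed to tap D6 (the current input) while bit 0 taps D0 (the oldest bit).

```python
# Generator polynomials from equations (2)-(4).
# Assumption: bit 6 taps D6 (current input), bit 0 taps D0 (oldest bit).
G = (0o133, 0o165, 0o171)


def parity(x):
    """Return the modulo-2 sum of the bits of x."""
    return bin(x).count("1") & 1


def encode(bits):
    """Encode a list of input bits into (outA, outB, outC) triples."""
    state = 0  # D5..D0, assumed to start cleared
    out = []
    for bit in bits:
        window = (bit << 6) | state          # D6..D0 for this time step
        out.append(tuple(parity(window & g) for g in G))
        state = window >> 1                  # shift: D6 -> D5, ..., D1 -> D0
    return out
```

Feeding a single 1 followed by six 0s moves that 1 from D6 down to D0, so the seven output triples read out the taps of the three polynomials bit by bit, starting with (1, 1, 1).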

Trellis Diagram
A trellis diagram is a state diagram unrolled in time. For a convolutional code in which 1 bit at a time is shifted into a shift register with k stages, the trellis diagram has $2^{k-1}$ states. The trellis diagram for the convolutional encoder of Figure 2 is presented in Figure 3. Figure 3 presents the four states (00, 01, 10, 11) and the transitions between states for 4 time intervals (t0-t1, t1-t2, t2-t3 and t3-t4). The initial state at time t0 is 00. Transitions for an input bit of 1 are represented by dashed lines and transitions for an input bit of 0 by solid lines. The output values for each transition are shown next to the transition branch.

Figure 2 Convolutional encoder (m = 1, n = 2, k = 3), showing the state, the input bit and the outputs outA and outB

Figure 3 Trellis diagram (m = 1, n = 2, k = 3)

State transitions
For a convolutional encoder with one input bit (m = 1), only two paths merge at each node, as represented in the example of Figure 4. Only two states can change to the same state: they differ in the least significant bit (D0), and the input bit must be the same. A butterfly can represent the state transitions to each state (Figure 5).

Figure 4 State transition (m = 1, k = 7). States i = (A B C D E 0) and i+1 = (A B C D E 1), written from D5 to D0, both move to state m = (0 A B C D E) when the input bit is 0 and to state m+32 = (1 A B C D E) when the input bit is 1.

Figure 5 Butterfly representation of transitions (m = 1, k = 7): states i and i+1 connect to state m (input 0) and to state m+32 (input 1)
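The butterfly structure can be checked with a short sketch. The helper names below (`next_state`, `predecessors`, `butterfly`) are hypothetical, and the state is assumed to be the 6-bit integer D5..D0 with the new input bit entering at D5, as in Figure 4.

```python
K = 7                       # constraint length
NUM_STATES = 1 << (K - 1)   # 64 states


def next_state(state, bit):
    """Shift the input bit into D5 and drop D0."""
    return (bit << (K - 2)) | (state >> 1)


def predecessors(state):
    """Return the two states (i, i+1) that can change to `state`."""
    i = (state << 1) & (NUM_STATES - 1)
    return i, i + 1


def butterfly(m):
    """For m in [0, 32), return ((i, i+1), (m, m+32))."""
    return predecessors(m), (m, m + NUM_STATES // 2)
```

For example, `butterfly(5)` pairs states 10 and 11 with states 5 and 37: both source states reach state 5 on input 0 and state 37 on input 1.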

Decoding
The Viterbi algorithm tries to find the path through the trellis diagram whose sequence of output symbols most closely matches the received sequence. To accomplish this task, it calculates for each path the path metric, which measures the distance to the received symbol sequence. As two paths can merge at each node (Figure 5), two path metrics are computed for each node. Only one path survives: the survivor is the path with the minimum distance to the received sequence. Because the trellis has $2^{k-1}$ states, the number of computations performed at each time interval increases exponentially with k. This exponential increase makes it impractical to use large constraint lengths in convolutional codes.

Figure 6 Path metric

In Figure 6 the starting state is assumed to be 00. It has been found that a Viterbi decoder may start decoding at any arbitrary point in a transmission if all state metrics are initially reset to zero. In this example, received bits are represented by orange numbers. An error exists, represented by the italic 0. Two paths reach state 10 at time t3 (the blue and green paths). The branch metrics associated with each transition of these paths were calculated using the Hamming distance. The metrics of the blue and green paths are given by (8) and (9), respectively. The blue path has the smaller distance to the received sequence.

$PM[01](t_3) = 0 + 1 + 1 = 2$    (8)
$PM[01](t_3) = 2 + 2 + 2 = 6$    (9)

If the decoding process ends at time t3, the received sequence will be corrected from 110001 to 110101.
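The decoding procedure can be illustrated with a small hard-decision decoder for a code with m = 1, n = 2, k = 3. This is a sketch under stated assumptions: the report does not give the generator polynomials of Figure 2, so the common pair (7, 5) in octal is assumed here, and the names `step`, `encode` and `viterbi_decode` are hypothetical. The starting state is 00 and the Hamming distance is used as the branch metric.

```python
K = 3
G = (0b111, 0b101)          # assumed generators (7, 5) octal, not from the report
NUM_STATES = 1 << (K - 1)
INF = float("inf")


def parity(x):
    return bin(x).count("1") & 1


def step(state, bit):
    """Return (next_state, output_symbol) for one input bit."""
    window = (bit << (K - 1)) | state
    return window >> 1, tuple(parity(window & g) for g in G)


def encode(bits):
    state, out = 0, []
    for bit in bits:
        state, symbol = step(state, bit)
        out.extend(symbol)
    return out


def viterbi_decode(received):
    """Hard-decision decoding of a flat list of received bits."""
    pm = [0] + [INF] * (NUM_STATES - 1)      # start in state 00
    history = []
    for t in range(0, len(received), len(G)):
        symbol = received[t:t + len(G)]
        new_pm = [INF] * NUM_STATES
        decisions = [None] * NUM_STATES
        for state in range(NUM_STATES):
            if pm[state] == INF:
                continue                      # state not reachable yet
            for bit in (0, 1):
                nxt, out = step(state, bit)
                bm = sum(a != b for a, b in zip(out, symbol))  # Hamming distance
                if pm[state] + bm < new_pm[nxt]:               # add, compare, select
                    new_pm[nxt] = pm[state] + bm
                    decisions[nxt] = (state, bit)
        pm = new_pm
        history.append(decisions)
    state = min(range(NUM_STATES), key=pm.__getitem__)
    bits = []
    for decisions in reversed(history):       # trace back along the survivor
        state, bit = decisions[state]
        bits.append(bit)
    return bits[::-1]
```

With these assumed generators, decoding the received sequence 110001 yields the input bits 1, 1, 0, and encoding those bits again gives 110101, the corrected sequence quoted above.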

Hard versus soft decision


In the demodulation process, received analog waveforms are converted to a digital signal. Sampled voltages are quantized. In the simplest quantization method, the hard decision, two levels are used. The demodulator codes the two levels using a single bit: it decides whether the received bit is 0 or 1.

Figure 7 Hard decision and 8-level soft decision for BPSK (probability of being 0 and probability of being 1; 8-level soft decision codes 000 to 111; 2-level hard decision)


At the bottom of Figure 7, the sampled voltage (BPSK) is quantized into two levels and demodulated to a 0 or a 1. Adding more levels to the quantization process improves the decoder performance, as the demodulator provides the decoder with a measure of confidence in its decision. For instance, in an 8-level soft decision, the demodulator identifies 8 levels. These levels indicate a 0 or a 1 with high or low confidence [1]. The 3 soft bits can be coded using an offset-binary format or a sign-magnitude format [2] (Table 1). The two's complement format can also be adopted. Figure 7 presents 3 soft bits in offset-binary format.

Table 1 Soft bits mapping for 8 levels

Code words
Offset-binary   Sign-magnitude   2's complement   Confidence
111             111              011              strongest 0
110             110              010
101             101              001
100             100              000              weakest 0
011             000              111              weakest 1
010             001              110
001             010              101
000             011              100              strongest 1
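The three encodings of Table 1 are related by simple arithmetic, which the following sketch reproduces. The function name `soft_formats` is hypothetical, and the level is assumed to be given as the signed value of the two's complement column (3 for the strongest 0 down to -4 for the strongest 1).

```python
def soft_formats(v):
    """Return (offset-binary, sign-magnitude, two's complement) codes for v.

    v is the signed 3-bit level, from 3 (strongest 0) to -4 (strongest 1),
    following the row order of Table 1.
    """
    if not -4 <= v <= 3:
        raise ValueError("level out of range for 3 soft bits")
    offset_binary = v + 4
    # Table 1 uses sign bit 1 for the rows labelled 0 and sign bit 0 otherwise.
    sign_magnitude = (0b100 | v) if v >= 0 else (-v - 1)
    twos_complement = v & 0b111
    return tuple(format(x, "03b") for x in
                 (offset_binary, sign_magnitude, twos_complement))
```

Calling `soft_formats` for the levels 3, 2, ..., -4 reproduces the eight rows of Table 1 from top to bottom.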

For a Gaussian channel, 8-level quantization, when compared with 2-level quantization, improves the required signal-to-noise ratio by approximately 2 dB. Analog (infinite-level) quantization results in a 2.2 dB signal-to-noise improvement over 2-level quantization. Only a 0.2 dB loss therefore exists for 8-level quantization when compared to the analog representation [3]. For hard decision decoding, the Hamming distance is used. It is defined as the number of bits that differ between the received symbol at the decoder and the output symbol of the trellis diagram branch. The Euclidean distance is adopted for soft decision decoding.

Euclidean distance
In the Viterbi algorithm, comparison between path metrics is required to determine the survivor path. Since path metrics are made up of accumulated branch metrics, it is sufficient to analyze branch metrics. For a 1/3 code rate, the Euclidean distance is calculated between the received noisy symbol (Ai, Bi, Ci) and the ideal output symbol (A, B, C) of the transition between two states of the trellis diagram (Figure 8), using equation (10).

Figure 8 Received bits (Ai, Bi, Ci) and output bits (A, B, C) in a 1/3 decoder


$bm_i = bm(A_i, B_i, C_i) = (A_i - A)^2 + (B_i - B)^2 + (C_i - C)^2$    (10)

Developing the first term of equation (10), we obtain equation (11).

$(A_i - A)^2 = A^2 + A_i^2 - 2 A_i A$    (11)

If the ideal output symbols (noiseless symbols) 0 and 1 are represented by the symmetrical values -a and +a, then $A^2 = a^2$ and equation (12) is obtained.

$(A_i - A)^2 = a^2 + A_i^2 - 2 A A_i$    (12)

As only differences between branch metrics are important, we can add the same constant to all branch metrics in the same time interval, or multiply them by the same positive constant, and the comparison between path metrics will not change. Since all branch metrics in the same time interval share the terms $a^2$ and $A_i^2$, only the cross term of equation (12) matters. Dividing it by a gives equation (13); halving it and summing over the three bits gives equation (14).

$-\frac{2 A A_i}{a}$    (13)

$bm_i(A_i, B_i, C_i) = -\frac{A}{a} A_i - \frac{B}{a} B_i - \frac{C}{a} C_i$    (14)

Table 2 presents the branch metrics for the 8 possible symbols. Each metric is a signed sum of three values, so only additions and subtractions need to be implemented to calculate Euclidean distances: a bit equal to 0 contributes the corresponding value with a plus sign and a bit equal to 1 contributes it with a minus sign. We considered noiseless symbols 0 and 1 represented by the symmetrical values -a and +a. It can be demonstrated [4] that for another representation (b, b+k) the branch metric can also be calculated with additions only.

Table 2 Branch metrics for the 8 possible received symbols

Received symbol (Ai Bi Ci)   Branch metric
000                          bm(0,0,0) =  A + B + C
001                          bm(0,0,1) =  A + B - C
010                          bm(0,1,0) =  A - B + C
011                          bm(0,1,1) =  A - B - C
100                          bm(1,0,0) = -A + B + C
101                          bm(1,0,1) = -A + B - C
110                          bm(1,1,0) = -A - B + C
111                          bm(1,1,1) = -A - B - C
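The claim that the simplified metric preserves the comparison can be checked numerically. The sketch below is an illustration only: the function names and the sample values are assumptions, with bit 0 mapped to -a and bit 1 to +a as in the text. It ranks the eight candidate symbols once by the full Euclidean distance and once by the signed sums of Table 2.

```python
from itertools import product


def euclidean(received, bits, a=1.0):
    """Squared Euclidean distance to the ideal symbol (0 -> -a, 1 -> +a)."""
    ideal = [a if b else -a for b in bits]
    return sum((r - s) ** 2 for r, s in zip(received, ideal))


def simplified(received, bits):
    """Signed sum of Table 2: bit 0 adds the value, bit 1 subtracts it."""
    return sum(-r if b else r for r, b in zip(received, bits))


def rankings(received):
    """Rank the 8 candidate symbols by each metric, best first."""
    symbols = list(product((0, 1), repeat=3))
    by_euclidean = sorted(symbols, key=lambda b: euclidean(received, b))
    by_simplified = sorted(symbols, key=lambda b: simplified(received, b))
    return by_euclidean, by_simplified
```

For a sample such as (0.9, -0.3, -1.4), both rankings coincide and the best candidate is 100, which is the hard decision one would make from the signs alone.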

Decoding Length
Looking at the example of Figure 6, we can see that only at time t3 can the decoder decide the first decoded bit (0 on the blue path). For long sequences, the Viterbi algorithm requires large decoding delays and, therefore, a large amount of memory, because paths must be stored before being discarded. The storage requirements grow exponentially with the constraint length k: for a code rate of 1/n, a set of $2^{k-1}$ paths must be stored after each decoding step. It has been demonstrated that a decoding delay of about five times the constraint length results in negligible performance degradation. For punctured codes, the decoding length (decoding delay or traceback length) must be increased to compensate for the insertion of dummy 0s [5][6]. High-rate codes, where more bits are punctured, have low minimum distances between coded sequences and, therefore, the survivor paths take longer to converge [7]. Simulations are usually carried out to determine the decoding length.

Viterbi Decoder implementation


We implemented the Viterbi decoder based on our previous implementation [8]. The functionality of a Viterbi decoder is usually implemented by three functional units: the branch metric unit (BMU), the add-compare-select unit (ACSU) and the survivor memory unit (SMU). The BMU calculates the distance (metric) between the received noisy symbol and the output symbol of the state transition (branch). The ACSU computes the accumulated metric associated with the sequence of transitions (path) that reaches a state. When more than one path arrives at a state, the ACSU selects the path with the lowest metric value, which is the survivor path. The SMU stores the information that makes it possible to trace back from a state to the previous one.

Figure 9 Viterbi decoder architecture

Figure 9 presents the classical architecture of a Viterbi decoder [5][10], where the ACSU has a parallel architecture. For high-speed communications, the required throughput can only be achieved by parallel or pipelined architectures [11]. In Figure 9, traceback processing is performed by the decision unit (DU), using data stored in the SMU.

BMU
The BMU computes the Euclidean distance between the received symbol and the output symbol of a transition. Considering the base code rate of 1/3 (MB-OFDM), a symbol of three bits is compared. For each bit, we use four soft bits. The soft-bit format used is presented in Figure 10.

Code    1001  1010  1011  1100  1101  1110  1111  0001  0010  0011  0100  0101  0110  0111
Value     -7    -6    -5    -4    -3    -2    -1     1     2     3     4     5     6     7

Figure 10 Soft decision format (dynamic range -7 to 7)

Figure 11 presents the BMU implementation, where A, B and C represent the noisy received bits, each n bits wide. The eight outputs, each n+2 bits wide, represent the branch metrics of Table 2.

Figure 11 BMU implementation (inputs A, B and C; outputs bm(000) to bm(111))
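A behavioural sketch of the BMU helps to see why n+2 output bits are enough. The names `bmu` and `fits_signed` are hypothetical, the inputs are assumed to be signed soft values in the range of Figure 10, and the sign convention of Table 2 is used (a 0 in the symbol adds the value, a 1 subtracts it).

```python
from itertools import product

SOFT_BITS = 4  # n in Figure 11


def bmu(a, b, c):
    """Return [bm(000), bm(001), ..., bm(111)] for soft inputs a, b, c."""
    metrics = []
    for bits in product((0, 1), repeat=3):     # 000, 001, ..., 111
        metrics.append(sum(-v if bit else v
                           for v, bit in zip((a, b, c), bits)))
    return metrics


def fits_signed(value, width):
    """True if value is representable in two's complement with `width` bits."""
    return -(1 << (width - 1)) <= value < (1 << (width - 1))
```

With four soft bits the inputs lie between -7 and 7, so every metric lies between -21 and 21 and fits in a signed word of n + 2 = 6 bits. The metrics also come in pairs of opposite sign, since bm of a symbol is the negative of bm of its complement.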

ACSU
The MB-OFDM encoder, with constraint length seven, has a state diagram with 64 states [9]. In order to compute in parallel the accumulated distances for each state, at each time step, 64 ACSUs were implemented.

Figure 12 ACSU (inputs bmSn and bmSm, 6 bits each, and MSn(t-1) and MSm(t-1), 12 bits each; outputs MS(t), 12 bits, and the decision bit dS(t))

In the trellis diagram for rate-1/3 MB-OFDM, two path metrics are computed for each state. Therefore, as depicted in Figure 12, each ACSU has two inputs from the BMU, one branch metric for each path, together with the accumulated metrics of the two predecessor states. Each ACSU has two outputs: the accumulated metric of the survivor path and one decision bit, identifying the previous state to allow traceback. For example, ACSU00, which calculates the accumulated metric for state 000000, has two inputs from the BMU: bm(000) and bm(111). Indeed, the output of the state transition from state 000000 to state 000000 is 000, and from state 000001 to state 000000 is 111.
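The wiring of an ACSU to the BMU can be derived from the generator polynomials. The following sketch makes several assumptions that are not stated in the report: the helper names are hypothetical, the state is the 6-bit integer D5..D0, the decision bit is 0 when the first predecessor wins, and overflow of the 12-bit metrics is not modelled.

```python
G = (0o133, 0o165, 0o171)   # generator polynomials, bit 6 taps the input


def parity(x):
    return bin(x).count("1") & 1


def branch_symbol(prev_state, bit):
    """Encoder output for the transition taken from prev_state on `bit`."""
    window = (bit << 6) | prev_state
    return tuple(parity(window & g) for g in G)


def acs_inputs(state):
    """Predecessor states of `state` and the branch symbol of each transition."""
    bit = state >> 5                  # input bit that leads into `state`
    base = (state << 1) & 0x3F
    return [(p, branch_symbol(p, bit)) for p in (base, base | 1)]


def acs(pm0, bm0, pm1, bm1):
    """Add, compare, select: return (survivor metric, decision bit)."""
    c0, c1 = pm0 + bm0, pm1 + bm1
    return (c0, 0) if c0 <= c1 else (c1, 1)
```

For state 000000, `acs_inputs(0)` returns the predecessors 000000 and 000001 with branch symbols 000 and 111, which matches the bm(000) and bm(111) inputs of ACSU00 mentioned above.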

SMU
The SMU is a memory that stores the decision bit (dS in Figure 13) identifying, for each state and for each time step, the previous state. We used a decoding length of seven times the constraint length. Therefore, the SMU memory has 49 x 64 = 3136 bits: the algorithm runs for 49 time steps and 64 states exist.

d(0,0)   d(0,1)   ...  d(0,63)
d(1,0)   d(1,1)   ...  d(1,63)
...
d(48,0)  d(48,1)  ...  d(48,63)

Figure 13 SMU

For good performance, as 64 ACSUs exist, 64 bits are available at each time unit. Therefore, the memory is implemented with 49 elements of 64 bits (Figure 13).

DU
The decision unit detects the state with the lowest metric and identifies the path that reaches it. To identify the state with the lowest metric, the final path metrics are compared with each other until the state with the lowest metric is found. Instead of using dedicated comparators, our implementation reuses the comparators of the ACS units (see Figure 14).

Figure 14 Comparison using the ACS unit (bm inputs set to 0)

The 64 values are compared using the comparators of the first 32 ACS units. For a valid comparison, the bm inputs are set to 0. The process repeats iteratively for 6 cycles (64 = $2^6$) until the best metric is found. At each step, according to the comparison result, the state with the lowest metric is partially identified. At the final step, the state is completely identified. This approach reduces the resources needed to compute the Viterbi algorithm. However, since the resources are shared, the analysis of the next bits is delayed by 6 cycles.

The traceback block (see Figure 15) starts from the state with the best metric and serially determines the values of the bits (b0, b1, b2, b3, ..., b(tbl-1), where tbl is the traceback length). The circuit determines the backward path based on the decision bits found during the calculation of the paths. From a state and a decision bit, the DU block finds the previous state. The process repeats iteratively until the first decision bit is found.

Figure 15 DU traceback (multiplexer, 6-bit state register, DECISOR block, and registers R0 to R(tbl-1) holding bits b0 to b(tbl-1))
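Both DU tasks can be modelled in software. The sketch below is a behavioural illustration, not the report's circuit: the function names are hypothetical, the pairing of candidates in each comparison cycle is an assumption, each SMU word is assumed to hold the decision bit of state s at bit position s, and the decision bit is assumed to be the least significant bit (D0) of the previous state.

```python
def find_best_state(metrics):
    """Halve the candidate set each cycle until one state remains.

    Returns (index of the state with the lowest metric, number of cycles).
    """
    candidates = list(enumerate(metrics))
    cycles = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        candidates = [
            min(candidates[i], candidates[i + half], key=lambda c: c[1])
            for i in range(half)
        ]
        cycles += 1
    return candidates[0][0], cycles


def traceback(decision_words, state):
    """Recover the input bits from the stored 64-bit decision words."""
    bits = []
    for word in reversed(decision_words):
        bits.append(state >> 5)                 # D5 holds the input bit
        d = (word >> state) & 1                 # decision bit of this state
        state = ((state << 1) & 0x3F) | d       # previous state
    bits.reverse()
    return bits
```

For 64 path metrics the search always finishes in 6 cycles, using 32 comparisons in the first cycle and fewer in each following one, which is consistent with reusing the comparators of the first 32 ACS units.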


Results
The Viterbi decoder was described in VHDL and placed and routed in a Virtex-5 FPGA using ISE 10.1. Different designs with traceback lengths from 35 to 70 and with 3 and 4 soft bits were implemented (see results in Table 3).

Table 3 Results for the Viterbi decoder

Soft bits   Traceback length   LUT/FF pairs   BRAM   Freq (MHz)   Mbps
3           35                 2628                  242          207
3           42                 2628                  242          212
3           49                 2628                  242          216
3           56                 2903                  241          219
3           63                 2903                  241          221
4           35                 2935                  242          207
4           42                 2935                  242          212
4           49                 3173                  240          214
4           56                 3173                  240          217
4           63                 3173                  240          219
4           70                 3173                  240          221

As expected, the implementation with 4 soft bits consumes more resources than that with 3 soft bits. For example, with a traceback length of 49, the implementation with 4 soft bits consumes around 20% more resources. This percentage reduces to 10% for higher traceback lengths. All implementations achieve more than 200 Mbps. The throughput is lower than the operating frequency because our implementation of the Viterbi decoder uses the ACSUs to implement the decision unit. Since the DU takes 6 cycles to execute, the throughput, Th, is given by equation (15).

$Th = Freq \times \frac{1}{1 + 6 / \mathit{traceback\_length}}$    (15)

Based on the implementation results and taking into account the analysis of BER (Matlab simulations), we conclude that with 3 soft bits the most efficient solution is the one with a traceback length of 49. It achieves almost the same BER and throughput with 10% fewer resources. However, with 4 soft bits, the most efficient solution is the one with a traceback length of 70. Using 4 soft bits instead of 3 achieves a 22% improvement in the BER at the cost of an additional 21% resource utilization.
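Equation (15) can be evaluated directly. In this small sketch the function name and the `du_cycles` parameter are hypothetical; the default of 6 cycles is the DU latency stated above.

```python
def throughput_mbps(freq_mhz, traceback_length, du_cycles=6):
    """Throughput of equation (15): Freq / (1 + du_cycles / traceback_length)."""
    return freq_mhz / (1 + du_cycles / traceback_length)
```

For instance, 242 MHz with a traceback length of 35 gives about 206.6 Mbps and 240 MHz with a traceback length of 70 gives about 221.1 Mbps, in line with the 207 and 221 Mbps reported for those designs.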

Conclusions
Many Viterbi implementations have been proposed for reconfigurable computing using FPGAs (see, for example, [12][13]). More recently, only a few works have been proposed, each shaped by the specific requirements of its target application. For example, [14] presents a configurable 3-bit soft decision Viterbi decoder implementation that meets the requirements for WLAN and broadband applications. The programmable design supports a constraint length K = 7 soft decision Viterbi decoder (SDVD) realization with a code rate (R) of 1/2 and traceback lengths (TBL) of 35 and 50 symbols. The architecture works with a throughput of 155 Mbps in a XC2VP100-1704ff5 FPGA device. Our proposal achieves higher throughput and uses fewer resources, since we reuse the comparators of the ACSU blocks to compare the final cost values, with a small performance penalty.

References
[1] B. Sklar, "Digital Communications: Fundamentals and Applications", Second Edition, Prentice Hall, 2001.
[2] Qualcomm Application Note AN1650-2, "Setting Soft-Decision Thresholds for Viterbi Decoder Code Words from PSK Modems".
[3] J. Heller and I. Jacobs, "Viterbi Decoding for Satellite and Space Communication", IEEE Transactions on Communication Technology, vol. 19, no. 5, part 1, pp. 835-848, October 1971.
[4] H. Lou, "Implementing the Viterbi Algorithm: Fundamentals and Real-Time Issues for Processors Designers", IEEE Signal Processing Magazine, vol. 12, no. 5, pp. 42-52, 1995.
[5] S. Singhal and M. Gilani, "Crafting a Custom Viterbi Decoder for WLAN Designs", Jan 2002, http://www.commsdesign.com/showArticle.jhtml?articleID=16504015
[6] C. L. Taylor, "Punctured Convolutional Coding Scheme for Multi-Carrier Multi-Antenna Wireless Systems", EECS Department, University of California, Berkeley, Technical Report No. UCB/ERL M01/27, 2001.
[7] Robert H. Morelos-Zaragoza, "Art of Error Correcting Coding", second edition, John Wiley & Sons, 2006.
[8] Rui Borges, Horácio Neto and Helena Sarmento, "Implementing a Viterbi decoder in a FPGA for a UWB MB-OFDM receiver", XXII Conference on Design of Circuits and Integrated Systems, November 2007.
[9] ECMA, "Standard ECMA-368: High Rate Ultra Wideband PHY and MAC Standard", December 2007.
[10] Chang Y.-N., Suzuki H., and Parhi K., "A 2-Mb/s 256-state 10-mW rate-1/3 Viterbi decoder", IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 826-834, June 2000.
[11] I. Bogdan, M. Munteanu, P. A. Ivey, N. L. Seed, N. Powell, "Power Reduction Techniques for a Viterbi Decoder Implementation", www.mitzanu.ro/resume/pdf/espld00.pdf.
[12] J. Cavallaro and M. Vaya, "Viturbo: a reconfigurable architecture for Viterbi and turbo decoding", ICASSP '03, vol. 2, pp. II-497-500, April 2003.
[13] K. Chadha and J. Cavallaro, "A Reconfigurable Viterbi Decoder Architecture", Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 66-71, 2001.
[14] Abdul-Rafeeq Abdul-Shakoor and Valek Szwarc, "A High Performance Soft Decision Viterbi Decoder for WLAN and Broadband Applications", IEEE CCECE/CCGEI, Ottawa, May 2006.
