
CONSIDERATIONS FOR PHASE ACCUMULATOR DESIGN FOR DIRECT DIGITAL FREQUENCY SYNTHESIZERS

David J. Betowski and Valeriu Beiu
School of Electrical Engineering & Computer Science, Washington State University
102 Spokane Str. (EME), Pullman, WA, 99164-2752, USA

ABSTRACT

This paper reviews the approach of using a direct digital frequency synthesizer (DDFS) to generate high-resolution, fast-switching frequencies for modern communication systems. Because these systems must operate at high speed, at low power, or both, optimizing the phase accumulator (PA) is a crucial design step. A mathematical model for estimating the speed-power tradeoffs of pipelined PAs is presented. Simulations based on this model show that pipelining the PA to the maximum allowable number of stages provides the smallest latency, but at a power consumption significantly higher than that of a non-pipelined PA. The model can be used to estimate the optimal number of pipeline stages for given speed-power constraints.

1. INTRODUCTION

A frequency generator is a required component of any communication system. Systems that use spread-spectrum modulation, such as frequency hopping, CDMA, or TDMA, require a different carrier frequency to be generated several thousand times per second. The traditional method of implementing a frequency generator is an analog phase-locked loop (PLL). However, PLLs have a low frequency-switching speed, high phase noise, and closed-loop stability issues, making them undesirable for high-frequency spread-spectrum technologies [1], [2]. Additionally, PLLs consume significant area on either the chip die or the system board. Developments in Direct Digital Frequency Synthesizers (DDFSs) have made them a desirable alternative to PLLs, especially for mobile, wireless, and satellite communications.

The basic block diagram of a DDFS is shown in Fig. 1 [1], [2]. The essential components are a Phase Accumulator (PA), which is a variable-increment counter of N bits; a phase-to-sine amplitude converter with a resolution of Q bits (Q < N); and a digital-to-analog converter (normally followed by a low-pass filter). The input to the system is a digital frequency control word, FCW, of length N, leading to an output frequency:

$$f_{out} = \frac{FCW \cdot f_{clk}}{2^N} \qquad (1)$$

The output frequency is proportional to $f_{clk}$. The frequency resolution is defined as $f_{clk}/2^N$. Both the switching speed and the frequency resolution are higher than those of a PLL. The existing literature mentions a very large number of methods (over one hundred) for implementing the phase-to-sine amplitude converter. All of these are essentially variations of the three general methods summarized in Table I.

There are two major challenges in designing a DDFS. First, evolving wireless communication systems require very fast frequency switching, especially as these systems are developed for high-bandwidth applications. Second, mobile wireless applications require very low power consumption in order to extend battery life and minimize heat. This paper focuses on optimizing the PA design for high speed, low power, or both.
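As a numerical illustration of equation (1) and of the resolution $f_{clk}/2^N$, the short Python sketch below evaluates both. The clock frequency (100 MHz) and word length (N = 32) are illustrative assumptions, not values taken from this paper, and the function names are hypothetical.

```python
def ddfs_output_frequency(fcw: int, f_clk: float, n_bits: int) -> float:
    """Equation (1): f_out = FCW * f_clk / 2**N."""
    if not 0 <= fcw < 2 ** n_bits:
        raise ValueError("FCW must fit in N bits")
    return fcw * f_clk / 2 ** n_bits


def frequency_resolution(f_clk: float, n_bits: int) -> float:
    """Smallest frequency step, obtained from (1) with FCW = 1."""
    return f_clk / 2 ** n_bits


def fcw_for_frequency(f_target: float, f_clk: float, n_bits: int) -> int:
    """Nearest control word for a desired output frequency (inverse of (1))."""
    return round(f_target * 2 ** n_bits / f_clk)


# Assumed example values: 100 MHz clock, 32-bit accumulator.
F_CLK = 100e6
N_BITS = 32

print(frequency_resolution(F_CLK, N_BITS))            # about 0.0233 Hz
print(ddfs_output_frequency(2 ** 30, F_CLK, N_BITS))  # 25000000.0 (f_clk / 4)
print(fcw_for_frequency(10e6, F_CLK, N_BITS))         # 429496730
```

Because FCW enters (1) linearly, changing the output frequency only requires loading a new control word, which is why the switching speed is set by the digital logic rather than by a loop settling time.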

Fig. 1. Block diagram of a DDFS.

TABLE I
SOME METHODS OF PHASE-TO-AMPLITUDE CONVERSION

Method            Approximation method  Reference(s)  DAC         SFDR    Power consumption  Circuit complexity
----------------  --------------------  ------------  ----------  ------  -----------------  ------------------
Full size ROM     None                  [1], [2]      Binary      High    High               Low
Reduced size ROM  Linear                [4]           Binary      Medium  Low                Medium
Reduced size ROM  Non-linear            [5]           Binary      High    Low                Medium
Reduced size ROM  None                  [7]           Non-linear  Medium  Low                Medium
ROM-less          Linear                [3], [6]      Binary      Medium  Low                Medium
ROM-less          Non-linear            [3], [8]      Binary      Medium  Low                High
ROM-less          None                  [9]           Non-linear  Low     Low                High

2. THE PHASE ACCUMULATOR

Whatever the phase-to-amplitude conversion method used, the PA remains an essential component of any DDFS. The PA is an adder and an end register in a feedback configuration. As shown in Fig. 2, the output Y increases by FCW on every successive clock pulse. When Y would exceed $2^N - 1$, the accumulator overflows and wraps around toward 0. For larger values of FCW, the phase increases at a faster rate, hence a higher-frequency wave is generated.

The adder is clearly a limiting factor for the speed of the DDFS. The standard approach is to design a very fast N-bit adder, but such a design exhibits high power dissipation. In general, the phase-to-sine amplitude approximation logic and the DAC use a representation of Q = 10 to 12 bits. That is why the upper Q sum bits must be available as fast as possible. These bits are affected by the carry out of the addition of the lower N − Q bits, so the carry path of the lower N − Q bits must also run at high speed. However, the sum computation of those lower N − Q bits may operate at a lower speed, reducing the power dissipation of the full N-bit adder.

2.1. Types of Binary Adders

The selection of the binary adder dictates the speed and power consumption of the PA. Because of the required high-speed operation, the adder selection is usually limited to adders with a

parallel-prefix structure. Several such adders are presented in compact form in Table II. The number of gates can be considered a rough approximation of the area A, while the number of layers approximates the delay T. To compare different design solutions, it is reasonable to assume that A and T for any parallel-prefix adder structure may be represented as:

$$A = \alpha\, n \log n \qquad (2)$$

$$T = \beta \log n \qquad (3)$$

where $\alpha$ and $\beta$ are constants for the particular type of adder.

2.2. Pipelining the Phase Accumulator

Depending on the semiconductor process used to implement the DDFS, the desired speed may be impossible to achieve with a single-stage PA. A solution in such a case is to pipeline the PA as m stages of n bits each, such that mn = N (equivalently, m = N/n), as shown in Fig. 3. Each adder outputs n + 1 bits: n sum bits and one carry-out bit. These bits are stored in an end register. The n stored sum bits feed back into the adder, and the latched carry-out bit connects to the carry input of the adder in the next pipeline stage. To store the upper Q = 10 to 12 bits, additional end registers must be placed on stages m − 1, m − 2, and so on, of the pipeline. m clock cycles are required to fully initialize or flush the pipeline. While a non-pipelined PA requires only two registers of N bits each, the flip-flop requirements of a multiple-stage pipelined PA are given by (4).

TABLE II
STANDARD PARALLEL-PREFIX ADDER ARCHITECTURES [10]

Architecture     Number of gates               Number of layers
---------------  ----------------------------  ----------------
Sklansky         (1/2) n log n                 log n
Brent-Kung       2n - log n - 2                2 log n - 2
Kogge-Stone      n log n - n + 1               log n
Han-Carlson      (1/2) n log n                 log n + 1
Conditional Sum  3n log n + 7n - 2 log n - 7   2 log n + 2

Fig. 2. The output of a PA over time.
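The sawtooth behaviour of Fig. 2 can be reproduced with a few lines of Python. This is a minimal behavioural sketch, assuming the usual modulo-$2^N$ overflow of an N-bit register; the function names and the small example values (N = 4, FCW = 3, Q = 2) are chosen here for illustration only.

```python
def phase_accumulator(fcw: int, n_bits: int, cycles: int) -> list[int]:
    """Return the accumulator output Y after each of `cycles` clock pulses."""
    modulus = 1 << n_bits          # an N-bit register holds 0 .. 2**N - 1
    y = 0
    trace = []
    for _ in range(cycles):
        y = (y + fcw) % modulus    # add FCW; overflow wraps around
        trace.append(y)
    return trace


def upper_bits(y: int, n_bits: int, q_bits: int) -> int:
    """Keep only the upper Q bits that feed the phase-to-amplitude converter."""
    return y >> (n_bits - q_bits)


trace = phase_accumulator(fcw=3, n_bits=4, cycles=8)
print(trace)                                   # [3, 6, 9, 12, 15, 2, 5, 8]
print([upper_bits(y, 4, 2) for y in trace])    # [0, 1, 2, 3, 3, 0, 1, 2]
```

Doubling FCW makes the ramp wrap twice as often, which corresponds to the higher output frequency described above.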

Remark: n is the width of the adder.
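To get a feel for the entries of Table II, the expressions can be evaluated for a concrete adder width. The sketch below is only an illustration: it assumes base-2 logarithms, n a power of two, and the commonly quoted unit-gate counts for these prefix structures (including a factor 1/2 for Sklansky and Han-Carlson, which should be checked against [10]). The function name is hypothetical.

```python
import math


def prefix_adder_costs(n: int) -> dict[str, tuple[float, float]]:
    """Return {architecture: (number of gates, number of layers)} for width n."""
    if n < 2 or n & (n - 1):
        raise ValueError("n is assumed to be a power of two, n >= 2")
    lg = math.log2(n)
    return {
        "Sklansky":        (0.5 * n * lg,                      lg),
        "Brent-Kung":      (2 * n - lg - 2,                    2 * lg - 2),
        "Kogge-Stone":     (n * lg - n + 1,                    lg),
        "Han-Carlson":     (0.5 * n * lg,                      lg + 1),
        "Conditional Sum": (3 * n * lg + 7 * n - 2 * lg - 7,   2 * lg + 2),
    }


for name, (gates, layers) in prefix_adder_costs(16).items():
    print(f"{name:16s} gates = {gates:6.1f}   layers = {layers:4.1f}")
```

For n = 16 this gives, for example, 49 gates in 4 layers for Kogge-Stone and 26 gates in 6 layers for Brent-Kung, which shows the area-delay trade-off that equations (2) and (3) summarize with two constants.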

Fig. 3. Multiple stage parallel pipelined PA.
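The operation of the pipelined PA of Fig. 3 can be checked with a behavioural model. The Python sketch below is one possible reading of the scheme, not the authors' implementation: it assumes that the FCW slice of stage i is applied i cycles late (input skew), that each stage latches n sum bits plus one carry bit, and that the output of stage i is delayed by m − 1 − i cycles to realign the word. All names are hypothetical.

```python
from collections import deque


def reference_pa(fcw: int, n_total: int, cycles: int) -> list[int]:
    """Non-pipelined N-bit accumulator, used as the golden reference."""
    y, out = 0, []
    for _ in range(cycles):
        y = (y + fcw) % (1 << n_total)
        out.append(y)
    return out


def pipelined_pa(fcw: int, n_total: int, n_stage: int, cycles: int) -> list[int]:
    """m = N/n stages of n bits, with a latched carry between stages."""
    if n_total % n_stage:
        raise ValueError("n must divide N")
    m = n_total // n_stage
    mask = (1 << n_stage) - 1
    slices = [(fcw >> (i * n_stage)) & mask for i in range(m)]
    sums = [0] * m       # n-bit sum register of each stage
    carries = [0] * m    # 1-bit latched carry out of each stage
    # Output de-skew: stage i passes through m - 1 - i extra registers.
    deskew = [deque([0] * (m - 1 - i)) for i in range(m)]
    out = []
    for t in range(cycles):
        new_sums, new_carries = [], []
        for i in range(m):
            operand = slices[i] if t >= i else 0           # input skew
            carry_in = carries[i - 1] if i > 0 else 0      # carry latched last cycle
            total = sums[i] + operand + carry_in
            new_sums.append(total & mask)
            new_carries.append(total >> n_stage)
        sums, carries = new_sums, new_carries
        word = 0
        for i in range(m):
            deskew[i].append(sums[i])
            word |= deskew[i].popleft() << (i * n_stage)
        out.append(word)
    return out


N, n, FCW, CYCLES = 8, 2, 109, 40
m = N // n
pipe = pipelined_pa(FCW, N, n, CYCLES)
ref = reference_pa(FCW, N, CYCLES)
# In this model the aligned word lags the reference by m - 1 clock cycles.
print(pipe[m - 1:] == ref[:CYCLES - (m - 1)])   # True
```

The skew and de-skew registers are what drives the flip-flop count up as m grows; each adder, on the other hand, only has to propagate a carry across n bits per clock cycle.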

$$FF_{Total} = \frac{m(m+1)\,n}{2} + m(n+1) + End\_registers - 1 \qquad (4)$$

To determine the end register requirements for a phase-to-amplitude conversion circuit (and DAC) with a resolution of Q bits, where Q < N:

$$End\_registers = A + (Q - n) \qquad (5)$$

where the term $(Q - n)$ is included only if $A = 0$ and $Q - n > 0$. Here $A = \sum_{j=0}^{j < \lfloor Q/n \rfloor} n\,j$.
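The flip-flop count of equation (4) is easy to tabulate. In the Python sketch below, the main term follows (4) directly, while the end-register term is computed by a direct count that is an assumption of this example rather than a transcription of (5): the upper Q bits are taken from the top stage downward, and a bit located j stages below the top stage is assumed to need j extra registers. The function names are hypothetical.

```python
def end_registers(q_bits: int, n: int) -> int:
    """De-skew flip-flops for the upper Q bits (direct count, an assumption)."""
    total, remaining, j = 0, q_bits, 0
    while remaining > 0:
        bits = min(n, remaining)   # bits of the Q-bit output held by this stage
        total += bits * j          # j registers per bit, j stages below the top
        remaining -= bits
        j += 1
    return total


def ff_total(n_total: int, n: int, q_bits: int) -> int:
    """Equation (4): m(m+1)n/2 + m(n+1) + End_registers - 1."""
    if n_total % n:
        raise ValueError("n must divide N")
    m = n_total // n
    return m * (m + 1) * n // 2 + m * (n + 1) + end_registers(q_bits, n) - 1


for n in (32, 16, 8, 4, 2, 1):
    print(f"N = 32, n = {n:2d}: {ff_total(32, n, 12)} flip-flops")
```

For n = N = 32 the count reduces to 64, i.e. the two N-bit registers of the non-pipelined PA, and for Q = 12, n = 8 the end-register term is Q − n = 4.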

It follows that a parallel-pipelined PA requires almost twice as many flip-flops as a single-stage PA, increasing the power consumption. Based on (2)–(4), it is possible to estimate an optimal n with respect to minimal power. The total area of the PA can be represented as:

$$A(N, n) = \gamma\, FF_{Total}(N, n) + \alpha\, n \log n \qquad (6)$$

The latency L of one pipeline stage is:

$$L(N, n) = \beta \log n + \delta \qquad (7)$$

The total delay through the PA (from the 1st to the mth stage) is:

$$T(N, n) = m\, L(N, n) \qquad (8)$$

Here $\gamma$ and $\alpha$ are the area coefficients of a flip-flop and a generic adder gate, respectively, and $\delta$ and $\beta$ are the associated delay coefficients. More precisely, $\gamma$ should be the ratio of the power dissipation of a flip-flop to that of a generic adder gate. Since power dissipation is not known until completion of the chip layout, $\gamma$ may be estimated as the ratio of the number of transistors composing a single flip-flop to the number of transistors in a single gate of the adder. High-speed flip-flops produced with current CMOS processes are composed of 15–25 transistors, and a single adder gate consists of 4–8 transistors using standard CMOS, pseudo-NMOS, dynamic, or static threshold logic [11]–[14]. Consequently, we have estimated $\gamma$ = 2–4. In a standard 0.25 μm CMOS process, adder gates with a delay of 50–100 ps [12], [13], and flip-flops with a delay of 125–300 ps [14] have been reported. Therefore, it may be concluded that $\delta$ = 1–6.

Fig. 4. Surface plot of AT (power consumption) versus n (bits) and N (bits).

Fig. 5. Plots showing AT (normalized), delay, and latency (normalized) versus n for PAs with N = 16, 24, and 32 bits.
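The model of equations (6)–(8) can be evaluated numerically to explore the trends summarized in Figs. 4 and 5. The Python sketch below is a rough illustration under several assumptions that are not confirmed by the source: the coefficient placement (a flip-flop area ratio of 4 and a flip-flop delay of 2 relative to a unit adder gate), a Han-Carlson-like adder area of (1/2) n log n with one gate delay per layer, base-2 logarithms, and an end-register count obtained by direct counting. It is not expected to reproduce the published plots exactly.

```python
import math


def end_registers(q_bits: int, n: int) -> int:
    """De-skew flip-flops for the upper Q bits (direct count, an assumption)."""
    total, remaining, j = 0, q_bits, 0
    while remaining > 0:
        bits = min(n, remaining)
        total += bits * j
        remaining -= bits
        j += 1
    return total


def ff_total(n_total: int, n: int, q_bits: int) -> int:
    """Equation (4)."""
    m = n_total // n
    return m * (m + 1) * n // 2 + m * (n + 1) + end_registers(q_bits, n) - 1


def pa_model(n_total, n, q_bits=12, ff_area=4.0, ff_delay=2.0,
             gate_area=0.5, gate_delay=1.0):
    """Return (area, latency, delay) following equations (6), (7), (8)."""
    m = n_total // n
    lg = math.log2(n)
    area = ff_area * ff_total(n_total, n, q_bits) + gate_area * n * lg   # (6)
    latency = gate_delay * lg + ff_delay                                 # (7)
    delay = m * latency                                                  # (8)
    return area, latency, delay


def at_sweep(n_total: int, q_bits: int = 12) -> dict[int, float]:
    """AT product for every stage width n that divides N."""
    result = {}
    for n in range(1, n_total + 1):
        if n_total % n == 0:
            area, _, delay = pa_model(n_total, n, q_bits)
            result[n] = area * delay
    return result


for N in (16, 24, 32):
    at = at_sweep(N)
    best = min(at, key=at.get)
    print(f"N = {N}: AT is smallest at n = {best}; "
          f"AT(n=1) / AT(n=N) = {at[1] / at[N]:.1f}")
```

Under these assumptions the AT product is smallest at n = N and the per-stage latency is smallest at n = 1, which matches the qualitative behaviour described for Figs. 4 and 5; the numerical ratios depend on the assumed coefficients.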

We shall use the well-known VLSI measures AT and $AT^2$ (corresponding to power and energy) for finding optimal designs for the PA. Using equations (6)–(8), the optimal n values can be determined. The simplest method is to create a surface plot for all practical values of N and n. For a PA with Q = 12, constructed with a Han-Carlson connectivity pattern for the adder, and with a flip-flop area coefficient $\gamma$ = 4 and a flip-flop delay coefficient $\delta$ = 2, the AT surface plot is shown in Fig. 4. Interpreting the surface plot, maximizing n minimizes power consumption, since fewer flip-flops are used in the design. Of course, increasing n also increases the latency.

To determine the optimal n for a given N, it is useful to take a cross-section view of the AT, delay, and latency surfaces, and superimpose them on the same plot. Results for N = 16, 24, and 32 are shown in Fig. 5. As can be seen from all these plots, the lowest power consumption occurs at n = N. This is expected, since the minimum number of flip-flops is used. Naturally, this configuration has the largest latency. If a smaller latency is desired while keeping power consumption low, all the plots show that n = N/2 is a good power-latency compromise. The smallest latencies are achieved for n = 1, with the power consumption jumping to nearly 32 times the minimal value.

3. CONCLUSIONS

The equations presented provide a reasonable high-level estimate of the speed and power consumption of a phase accumulator. Further speed and power design considerations are required at a lower level to optimize the PA for the intended application.

REFERENCES

[1] J. Tierney, C. M. Rader, and B. Gold, "A digital frequency synthesizer," IEEE Trans. Audio Electroacoust., vol. 19, pp. 48–57, Jan. 1971.
[2] K. Palomäki, "A digital sinusoidal synthesizer based on feedback," MSc thesis, Dept. of Info. Tech., Tampere Univ. of Tech., Tampere, Finland, Nov. 1999.
[3] P.-S. Wu, "Towards ROM-less DDFSs: On digital circuits for accurate sine approximation," MSc thesis, School of EE&CS, Washington State Univ., Pullman, WA, Aug. 2003.

[4] J. M. P. Langlois and D. Al-Khalili, "ROM size reduction with low processing cost for direct digital frequency synthesis," Proc. PACRIM'01, Victoria, Canada, Aug. 2001, vol. 1, pp. 287–290.
[5] A. M. Sodagar and G. R. Lahiji, "Parabolic approximation: a new method for phase-to-amplitude conversion in sine-output direct digital frequency synthesizers," Proc. ISCAS 2000, Geneva, Switzerland, May 2000, vol. 1, pp. 515–518.
[6] J. M. P. Langlois and D. Al-Khalili, "Hardware optimized direct digital frequency synthesizer architecture with 60 dBc spectral purity," Proc. ISCAS'02, Scottsdale, USA, May 2002, vol. 5, pp. 361–364.
[7] Z. Zhou, D. Betowski, X. Li, G. La Rue, and V. Beiu, "High performance direct digital frequency synthesizers," Proc. UGIM'03, Boise, USA, 2003.
[8] A. M. Sodagar and G. R. Lahiji, "A pipelined ROM-less architecture for sine-output direct digital frequency synthesizers using the second order parabolic approximation," IEEE Trans. Circuits and Systems II, vol. 48, pp. 850–857, Sep. 2001.
[9] S. Mortezapour and E. K. F. Lee, "Design of low-power ROM-less direct digital frequency synthesizer using nonlinear digital-to-analog converter," IEEE J. Solid-State Circuits, vol. 34, pp. 1350–1359, Oct. 1999.
[10] R. Zimmermann, "Binary adder architectures for cell-based VLSI and their synthesis," PhD thesis, Swiss Federal Institute of Tech., Zurich, Switzerland, 1997.
[11] J. Rabaey, A. Chandrakasan, and B. Nikolić, Digital Integrated Circuits: A Design Perspective (2nd edition), Prentice Hall, 2003, ch. 6, pp. 235–308.
[12] V. Beiu, "Ultra-fast noise immune CMOS threshold logic gates," Proc. MWSCAS'00, Lansing, USA, 2000, pp. 1310–1313.
[13] V. Beiu, J. Quintana, and M. Avedillo, "VLSI implementations of threshold logic: A comprehensive survey," IEEE Trans. Neural Networks, vol. 14, Sep. 2003.
[14] V. Oklobdzija, V. Stojanovic, D. Markovic, and N. Nedovic, Digital System Clocking: High Performance & Low-Power Aspects, Wiley-IEEE Press, 2003.
