
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 49, NO. 1, JANUARY 2002

The Design of Hybrid Carry-Lookahead/Carry-Select Adders


Yuke Wang, C. Pai, and Xiaoyu Song, Member, IEEE
Abstract: In this paper, we present a general architecture for designing hybrid carry-lookahead/carry-select adders. Several previous adders in the literature are all special cases of this general architecture; they differ in the way the Boolean functions for the carries are implemented. Based on the general architecture, we propose a new implementation of a high-speed 56-bit hybrid adder. The new adder directly implements group carry propagate and group carry generate signals without individual carry generate/propagate signals. Moreover, the group carry generate/propagate signals are complemented to gain speed. The new implementation can be realized in static CMOS or dynamic logic style. The critical path length of our new design is about 2/3 of the critical path lengths of previous adders; therefore, higher speed can be gained.

Index Terms: Adders, high-speed circuits, VLSI.
Fig. 1. Traditional CLA block diagram.

I. INTRODUCTION
ALTHOUGH integer addition typically has the smallest delay of all arithmetic functions, it has the largest impact on overall computer performance [4]. Classic high-speed adders include carry-lookahead, carry-skip, carry-select, and conditional-sum adders. Many more addition algorithms and circuits have been reported in the literature over the last 10 years [1], [2]. To add two $n$-bit numbers $A$ and $B$, we need the carry information $c_i$ at every bit position $i$, $1 \le i \le n$. Carry-lookahead adders (CLAs) generate the carries in parallel and are commonly known as the fastest adders. Different implementations of the CLA have been reported in the literature. An implementation using fast circuit techniques adds two 32-bit operands in 3.1 ns in 0.9-µm CMOS technology [5]. In [6], a slightly modified version of the CLA adds two 64-bit numbers in 4.5 ns in 1-µm static CMOS, and can also add two 96-bit numbers in the same amount of time.

Recently, a few new adders based on a hybrid carry-lookahead/carry-select scheme have been reported [1], [2]. Implemented in dynamic CMOS, the adders in [1] and [2] are significantly faster than both classical CLAs and carry-select adders (CSAs) [2]. The 64-bit spanning tree carry-lookahead adder (STCLA) introduced in [1] uses a tree of 4-bit Manchester carry-lookahead chains (MCCs) to generate carries only for selected bit positions, instead of generating all the carries as in a traditional CLA. The adder can evaluate 56-bit sums in well under 4 ns. The 56-bit recursive carry-lookahead/carry-select hybrid adder (RCLCSA) reported in [2] continues the work on the STCLA. Unlike the STCLA, it uses Manchester chains of various lengths instead of only fixed 4-bit chains, which results in nonuniform carry positions. A gain in speed over the STCLA is reported in [2].

Besides the traditional carries, there are other forms of carries. Ling's carry, defined in [7], can result in substantial savings in implementations. Moreover, four different variants of Ling's carry have been reported in [9]. A 32-bit high-speed adder (HSAC) using the CLA scheme based on Ling's carry has been reported in [3]. It is implemented in static CMOS technology and can add two 32-bit numbers in 4.0 ns.

In this paper, we propose a general architecture for the design of hybrid carry-lookahead/carry-select adders. Unlike the previous works in [1]-[3], our general architecture is not restricted to any particular logic style or kind of carry information. We show that the previous adders in [1]-[3] are all special implementations of our general architecture. Based on the general architecture, we propose a new implementation of a high-speed 56-bit hybrid adder. The new implementation is based on Ling's carry and can be realized in static CMOS or dynamic logic. Moreover, the new adder directly implements the group carry propagate and generate signals without the individual carry generate/propagate signals $g_i$ and $p_i$, which are required in traditional CLAs and CLA/CSA hybrid adders. Furthermore, the group carry generate/propagate signals are complemented to gain speed.

The rest of this paper is organized as follows. In Section II, we present a general architecture for hybrid CLAs/CSAs and show that three previous adders fall into this architecture. In Section III, we present a new implementation of the hybrid adder based on the general architecture. In Section IV, we analyze the performance of the new adder and the previous adders. Finally, Section V concludes the paper.

Manuscript received January 5, 2001; revised September 26, 2001. This paper was recommended by Associate Editor V. Anastassopoulos. Y. Wang is with the Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688 USA (e-mail: yuke@utdallas.edu). C. Pai is with the Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada. X. Song is with the Department of Electrical Engineering, Portland State University, Portland, OR 97207-0751 USA (e-mail: song@ee.pdx.edu). Publisher Item Identifier S 1057-7130(02)02787-8.

1057-7130/02$17.00 © 2002 IEEE

WANG et al.: DESIGN OF HYBRID ADDERS

Fig. 2. General architecture of CLA/CSA.

II. A GENERAL ARCHITECTURE OF THE HYBRID CLA/CSA
A. General Architecture of the Hybrid CLA/CSA
We use the following notation and definitions in this paper. The two operands of the addition are $A = a_n a_{n-1} \cdots a_1$ and $B = b_n b_{n-1} \cdots b_1$, and the sum is $S = s_n s_{n-1} \cdots s_1$. The carry generate/propagate signals ($g_i$, $p_i$) are defined as $g_i = a_i b_i$ and $p_i = a_i + b_i$. We can also define the group generate/propagate signals as $G_{i:j} = g_i + p_i g_{i-1} + \cdots + p_i p_{i-1} \cdots p_{j+1} g_j$ and $P_{i:j} = p_i p_{i-1} \cdots p_j$ for $i \ge j$. The sum bits satisfy $s_i = a_i \oplus b_i \oplus c_{i-1}$. The carry-in signal is denoted as $c_0$, and the carry-out signal ($c_n$) is the carry generated by the most significant bit. The carry generated at bit position $i$ is $c_i = g_i + p_i c_{i-1}$, where $1 \le i \le n$; thus we have $c_i = G_{i:j} + P_{i:j} c_{j-1}$ for any $1 \le j \le i$. It is easy to see that $c_i = G_{i:1} + P_{i:1} c_0$. A widely used scheme to generate the group signals is to use the $\circ$ operator [1], [2], [10], defined as $(G, P) \circ (G', P') = (G + P G', P P')$, so that $(G_{i:j}, P_{i:j}) = (G_{i:k}, P_{i:k}) \circ (G_{k-1:j}, P_{k-1:j})$ for $j < k \le i$. Ling's carry is defined as $H_i = g_i + c_{i-1}$, such that $c_i = p_i H_i$.

Traditionally, a CLA consists of the three units shown in Fig. 1. The first unit generates all the $g_i$ and $p_i$, which are fed into the carry-lookahead unit to generate all the carries $c_i$; finally, the sum-generation unit generates the sum $S$. Hybrids of CLAs and CSAs do not generate all carries; instead, only some of the carry signals are generated. The carry-lookahead unit computes the carry selection signals, while the sum-generation unit computes the candidate sums to be selected. Once the carry signals are computed, the carry-select adders choose the correct sums and produce the output. Unfortunately, previous hybrid carry-lookahead/carry-select adders are restricted to particular technologies and implementation details. For example, the STCLA and the RCLCSA are for dynamic CMOS logic styles only, while the HSAC adder is for static CMOS logic implementation.

We propose a general block diagram for the hybrid carry-select/CLA in Fig. 2. The operands $A$ and $B$ plus the carry-in $c_0$ are the inputs. The carry-lookahead unit generates carries at selected positions. The first unit of the traditional CLA shown in Fig. 1, which generates the $g_i$ and $p_i$, is eliminated. Moreover, the carry signals can be traditional carries, Ling's carries, or any other kind of carries that suits the context. The selected carry signals are used to select the correct sum bits computed by the CSAs. The different units can be implemented with different algorithms and technologies. Therefore, our architecture is more general than those proposed in [1]-[3]. In the following, we show that the adders proposed in [1]-[3] fall into our general architecture.

B. Dynamic Logic Implementation of CLA/CSA
Two CLA/CSA adders implemented in dynamic logic style have been reported in the literature, and both fit into the general architecture shown in Fig. 2. The STCLA is the first CLA/CSA reported in the literature. It generates the traditional carries using a tree of Manchester carry chain (MCC) modules, and these carries are then fed into the carry-select adders. Each CSA in the general architecture corresponds to two 8-bit carry-ripple adders and one 8-bit 2:1 multiplexer (MUX) in the STCLA. The RCLCSA is an improvement of the STCLA. Unlike the STCLA, the RCLCSA uses Manchester chains of various lengths instead of only fixed 4-bit chains, which results in nonuniform carry positions. As a result, the traditional carries are generated only for a set of nonuniformly spaced bit positions and are fed into the CSAs. Each CSA is constructed using one multiplexer and two specially designed ripple adders presented in [2].

C. Static Logic Implementation of CLA/CSA
The general architecture shown in Fig. 2 can also be implemented using static CMOS logic style.
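Before turning to the static HSAC, which relies on Ling's carry, the notation of Section II-A can be checked numerically. The following Python sketch is illustrative only and is not the authors' design: it assumes the OR-type propagate $p_i = a_i + b_i$, maps bit $i$ of the text to list index $i-1$, and uses hypothetical helper names (`carry_op`, `lookahead`, `add`, `check`). It builds the group signals $(G_{i:1}, P_{i:1})$ with the $\circ$ operator, forms $c_i = G_{i:1} + P_{i:1} c_0$ and $H_i = g_i + c_{i-1}$, and exhaustively confirms both the sum and the identity $c_i = p_i H_i$ for small widths.

```python
from itertools import product


def gp(a: int, b: int) -> tuple[int, int]:
    """Bit-level generate g_i = a_i b_i and (assumed OR-type) propagate p_i = a_i + b_i."""
    return a & b, a | b


def carry_op(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Fundamental carry operator: (G, P) o (G', P') = (G + P G', P P')."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def lookahead(a: list[int], b: list[int], c0: int) -> tuple[list[int], list[int]]:
    """Return carries c_1..c_n and Ling carries H_1..H_n.

    a[0] and b[0] hold bit 1 (the LSB); c0 is the carry-in.
    """
    carries, ling = [], []
    group = None      # running (G_{i:1}, P_{i:1})
    c_prev = c0       # c_{i-1}
    for ai, bi in zip(a, b):
        g, p = gp(ai, bi)
        group = (g, p) if group is None else carry_op((g, p), group)
        big_g, big_p = group
        c_i = big_g | (big_p & c0)    # c_i = G_{i:1} + P_{i:1} c_0
        ling.append(g | c_prev)       # H_i = g_i + c_{i-1}
        carries.append(c_i)
        c_prev = c_i
    return carries, ling


def to_bits(x: int, n: int) -> list[int]:
    """LSB-first bit list of x, n bits wide."""
    return [(x >> k) & 1 for k in range(n)]


def add(x: int, y: int, c0: int, n: int) -> int:
    """n-bit addition via s_i = a_i XOR b_i XOR c_{i-1}, with carry-out appended."""
    a, b = to_bits(x, n), to_bits(y, n)
    carries, _ = lookahead(a, b, c0)
    c_in = [c0] + carries[:-1]                      # c_{i-1} for each bit
    s = sum((a[k] ^ b[k] ^ c_in[k]) << k for k in range(n))
    return s | (carries[-1] << n)                   # place c_n above the sum bits


def check(n: int = 4) -> bool:
    """Exhaustively confirm the sum and Ling's identity c_i = p_i H_i."""
    for x, y, c0 in product(range(1 << n), range(1 << n), (0, 1)):
        a, b = to_bits(x, n), to_bits(y, n)
        carries, ling = lookahead(a, b, c0)
        if add(x, y, c0, n) != x + y + c0:
            return False
        if any(c != (ai | bi) & h for c, ai, bi, h in zip(carries, a, b, ling)):
            return False
    return True
```

The group signals are accumulated serially here for readability; a hardware carry-lookahead unit would evaluate the same associative $\circ$ operator as a tree, which is what makes the parallel schemes of [1], [2], [10] possible.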
Fig. 3. 56-bit HSAC.

A 32-bit high-speed CLA (HSAC) has been reported in [3]. The carry signals used in this adder are Ling's carries. To compare it with the other adders, we first extend the HSAC from 32 bits to 56 bits. A direct extension of the architecture given in [3] results in the adder shown in Fig. 3, where we use the same notation as in [3]. Fig. 3 looks very different from our general architecture in Fig. 2. Below, we show that this 56-bit HSAC nevertheless falls into the general architecture. Starting from the definition of Ling's carry, we have the following equation:

The above equation cannot be implemented in CMOS by two-level logic due to the large number of literals in the product terms. For example, one of the product terms contains 53 literals. In the following, we present a multilevel implementation of this carry in which each gate has a fan-in of no more than 4. We introduce the following new variables:

Using those new variables, we have the following equation:

where we have

Thus,

As a byproduct, we also have the following signals available:


Fig. 4. Equivalent block diagram of 56-bit HSAC.

The global carry-lookahead unit in the HSAC is responsible for generating the carries that select the sums. This component corresponds to the carry-lookahead unit in Fig. 2. The carries are fed into the CSA blocks. Each block consists of no more than 3 groups, and all groups in the same block share the same carry signal. Each group is equivalent to 3 CSAs, as shown in [3, Fig. 3]. Table I summarizes the correspondences between the general architecture proposed in Fig. 2 and the three adders in [1]-[3].

TABLE I
COMPARISON OF DIFFERENT 56-BIT ADDERS

III. A NEW IMPLEMENTATION OF HYBRID CLA/CSA
Note that the individual carry generate/propagate signals ($g_i$, $p_i$) have already been eliminated in Fig. 4. In this paper, we further extend the scheme in Fig. 4, which is itself an extension of the HSAC in [3]. In our newly proposed adder, the carry-lookahead unit generates the complemented Ling carries, instead of the Ling carries themselves, as the inputs to the MUXs in the CSA blocks. This does not affect the function of the carry-select adders, since each MUX is controlled by both an input and its complement. However, implementing the complemented Ling carries gains speed over implementing the Ling carries. Specifically, we implement the complemented carries in the carry-lookahead unit based on the equations shown at the bottom of the next page. These signals are implemented by the 5 marked blocks shown in Fig. 4. The inputs to these blocks are in turn implemented by the following equations:

Fig. 5. Block diagram of the new adder.

Fig. 6. Schematics for G and P.

A simplified block diagram of the new adder is shown in Fig. 5. The static complementary CMOS implementations of the individual blocks for G and P are shown in Fig. 6, the blocks for Gb/Pb with inverted inputs are shown in Fig. 7, and finally, the blocks for H are shown in Fig. 8. It is also straightforward to implement the above equations in dynamic logic. Regardless of static or dynamic logic, the NMOS networks of the implementations are the same; therefore, the implementations have the same critical path lengths.
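The claim that complementing the Ling carries leaves the carry-select function intact can be illustrated in software. The sketch below is a hypothetical model, not the circuit of Figs. 5-8: the Ling carries are computed serially as a stand-in for the lookahead unit, the block sizes and the names `hybrid_add`, `mux2`, and `block_sum` are assumptions, and `mux2` mimics a 2:1 MUX that needs both select polarities. Because the carry into a block starting at bit $j$ is $c_{j-1} = p_{j-1} H_{j-1}$, each block can precompute one sum with carry-in 0 and one with carry-in $p_{j-1}$, and the complemented Ling carry $\bar{H}_{j-1}$ simply drives the opposite select input.

```python
from itertools import product


def bit(v: int, i: int) -> int:
    """Bit i (0-based, LSB = 0) of v."""
    return (v >> i) & 1


def ling_carries(x: int, y: int, c0: int, n: int) -> list[int]:
    """Ling carries H_i = g_i + c_{i-1} for 0-based bits i = 0..n-1.

    Computed serially here as a stand-in for the carry-lookahead unit.
    """
    h, c_prev = [], c0
    for i in range(n):
        g = bit(x, i) & bit(y, i)
        p = bit(x, i) | bit(y, i)
        h.append(g | c_prev)
        c_prev = g | (p & c_prev)        # c_i = g_i + p_i c_{i-1}
    return h


def mux2(sel: int, sel_b: int, x1: int, x0: int) -> int:
    """2:1 MUX with complementary select inputs (sel_b == 1 - sel)."""
    return sel * x1 + sel_b * x0


def block_sum(x: int, y: int, cin: int, lo: int, width: int) -> int:
    """Sum of bits lo..lo+width-1 of x and y plus cin; keeps the block carry-out."""
    m = (1 << width) - 1
    return ((x >> lo) & m) + ((y >> lo) & m) + cin


def hybrid_add(x: int, y: int, c0: int, n: int = 8, block: int = 4) -> int:
    """Carry-select addition steered by complemented Ling carries (illustrative)."""
    h_b = [1 - h for h in ling_carries(x, y, c0, n)]   # lookahead output: NOT H_i
    total = 0
    for lo in range(0, n, block):
        width = min(block, n - lo)
        if lo == 0:
            s = block_sum(x, y, c0, lo, width)           # first block sees c_0 directly
        else:
            p_below = bit(x, lo - 1) | bit(y, lo - 1)
            s0 = block_sum(x, y, 0, lo, width)           # case H_{lo-1} = 0: carry-in 0
            s1 = block_sum(x, y, p_below, lo, width)     # case H_{lo-1} = 1: carry-in p_{lo-1}
            sel_b = h_b[lo - 1]
            s = mux2(1 - sel_b, sel_b, s1, s0)           # NOT H only swaps the select lines
        if lo + width < n:
            s &= (1 << width) - 1                        # inner blocks drop their carry-out
        total |= s << lo
    return total


def check_hybrid(n: int, block: int) -> bool:
    """Exhaustively compare hybrid_add with integer addition."""
    return all(hybrid_add(x, y, c0, n, block) == x + y + c0
               for x, y, c0 in product(range(1 << n), range(1 << n), (0, 1)))
```

For example, `check_hybrid(6, 4)` exercises every 6-bit operand pair with 4-bit and 2-bit blocks. Feeding $\bar{H}$ instead of $H$ only changes which select line receives which polarity, which mirrors the argument above for why the MUX function is unchanged.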

IV. PERFORMANCE EVALUATION
A. Performance Evaluation Model
In the literature, the speed of adders has been measured in different ways. The STCLA has been implemented in the AMD Am29050 microprocessor in 1-µm dynamic CMOS technology, and its speed was measured on the real chip: the STCLA adds two 56-bit operands in 3.2 ns, measured from the clock edge to the slowest sum bit [1]. The RCLCSA has been simulated using a MOSIS 1-µm process at room temperature, with an effective channel length of approximately 0.712-0.766 µm. Assuming that a wire of length $l$ has a capacitance of $2 \times 0.111 \times l$ fF, and that the carry signals are fed into wires up to 500 µm long, the delay of the 56-bit adder was reported to be 1.85 ns [2]. The 4.0-ns delay of the 32-bit HSAC, operating at 5 V and driving a 0.3-pF load at room temperature, was obtained from simulation, with simulation parameters extracted from the layout of 11-bit adders in an advanced bipolar/CMOS (BiCMOS) process with a 1.0-µm drawn channel length. Table II summarizes the simulation conditions and results from the previous literature [1]-[3]. Table II makes clear how differently performance has been measured in the literature: there is no generally accepted performance model for adder evaluation.


Fig. 7. Schematics for Gb/Pb with inverted inputs.

Fig. 8. Schematics for H in the block C.

From Table II, one can see that the HSAC, based on Ling's carry, is the slowest adder: it takes 4.0 ns for a 32-bit addition, while the RCLCSA can add 56-bit numbers in 1.85 ns. However, comparing these adders simply by their simulation results is not straightforward. The three adders are implemented in different technologies (dynamic CMOS/static BiCMOS) and were simulated or measured under different conditions. The speed advantage of one adder may come from many sources, such as technology, supply voltage, measurement criteria, loading, simulation software, or the adder algorithm/architecture. No one has simulated all the different adders under exactly the same conditions. Moreover, even the same design may yield different simulation results; for example, the author of [2] simulated the STCLA and the RCLCSA and reported different results under different conditions. Thus, comparing simulation results is not a good way to compare the adders.

The critical path of a circuit is the path with the longest delay, and it is an important measure of circuit speed. As pointed out in [3], the number of serial transistors from the output to the power or ground node is one of the major speed-limiting factors. Admittedly, comparing CMOS adders in terms of serial transistors in the critical path is crude, but it does allow us to quickly evaluate the potential performance of an algorithm for further study [3]. The delays of all three adders, the STCLA, the RCLCSA, and the HSAC, are determined by their critical paths in the simulation or measurement. A model that counts the number of serial transistors in the critical path is given in [3], and a similar comparison scheme, with a more detailed account, has been suggested in [8]. For fully static CMOS circuits, both p-channel and n-channel transistors are evaluated for critical paths; for dynamic CMOS circuits, only the n-channel transistors are evaluated. The counting scheme proposed in [3] is summarized in Table III. This model is more accurate than the traditional complex-gate delay analysis [3].

Fig. 9. Critical path of the STCLA.

Fig. 10. Critical path of the RCLCSA.

TABLE II
TECHNOLOGIES AND SIMULATION CONDITIONS OF THE ADDERS

TABLE III
TRANSISTOR DELAY MODEL [3]

B. Critical Paths of the Adders
In the following, we show the critical paths of all the adders presented in this paper. Figs. 9-12 show the critical paths of the STCLA, the RCLCSA, the HSAC, and the new adder, respectively. The MCC modules in Figs. 9 and 10 include, among others, the MCC with intermediate output and the MCC without intermediate output. They can be found in [1], [2], and Fig. 4. Due to space limits, we do not reproduce those implementations in this paper. All the MCC modules implement the group generate/propagate signals $G_{i:j}$ and $P_{i:j}$. With or without intermediate outputs, each MCC module has a chain of transistors in series, whose length depends on the number of input bits, plus an inverter at the end; the total transistor count of each module therefore follows from its input width.

On the other hand, for the HSAC and the new adder, the components for G/P, Gb/Pb, and H are shown in Figs. 6-8, respectively, and one additional inverter is needed at the end of these components to implement the remaining signals. Table IV presents the number of transistors in the critical paths of all the adders. Column 5 gives the detailed count of the transistors in the critical paths, and column 4 summarizes it. Some of the transistors in the critical paths are pass transistors, which, according to the model in Table III, can be counted as a 0.5-transistor delay. Column 3 presents the detailed count when a pass transistor is counted as a 0.5-transistor delay, and column 2 summarizes it. From Table IV, the new adder has 13 transistors in its critical path, while the STCLA has 21. If a pass transistor is counted as a 0.5-transistor delay, the new adder has a total delay of 10 transistors, while the STCLA has 15. Therefore, the critical path length of the new adder is about 2/3 of that of the STCLA. Compared with the other adders, the new adder has the shortest critical path.
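The Table III counting rule (one unit per series transistor, 0.5 per pass transistor) can be turned into a small calculation. As a sketch, assuming only the totals quoted above (13 and 10 for the new adder, 21 and 15 for the STCLA), the number of pass transistors on each critical path follows from $N_{\text{total}} - N_{\text{pass}}/2 = D$. The resulting counts (6 and 12) are derived here rather than read from Table IV, and the function names are hypothetical.

```python
from fractions import Fraction


def effective_delay(total: int, n_pass: int) -> Fraction:
    """Series-transistor delay with pass transistors weighted 0.5 (Table III rule)."""
    return Fraction(total) - Fraction(n_pass, 2)


def inferred_pass_count(total: int, delay: int) -> int:
    """Solve total - n_pass / 2 = delay for n_pass."""
    n_pass = 2 * (Fraction(total) - Fraction(delay))
    if n_pass.denominator != 1 or not 0 <= n_pass <= total:
        raise ValueError("inconsistent transistor counts")
    return int(n_pass)


new_pass = inferred_pass_count(13, 10)     # pass transistors on the new adder's path
stcla_pass = inferred_pass_count(21, 15)   # pass transistors on the STCLA's path

ratio = effective_delay(13, new_pass) / effective_delay(21, stcla_pass)
print(new_pass, stcla_pass)        # 6 12
print(ratio)                       # 2/3
print(Fraction(13, 21) < ratio)    # True: raw counts give 13/21, about 0.62
```

Exact fractions avoid any floating-point rounding in the 0.5 weights. The comparison in the last line shows that the "about 2/3" figure comes from the weighted count; the raw series-transistor ratio is slightly smaller.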


Fig. 11. Critical path of the HSAC.

Fig. 12. Critical path of the new adder.

TABLE IV
CRITICAL PATH DELAYS

A word of caution: the critical path is a quick estimate of delay and should not be taken as an absolute measure. For example, even though the critical path of the new adder is about 2/3 of those of the previous adders in Table IV, real implementations of these adders may not differ in speed by as much as 1/3. However, it is hard to imagine that, in the same technology and under the same conditions, an adder with a shorter critical path would have a longer delay. Therefore, we can conclude that if all 4 adders listed in Table IV were implemented in the same technology and under the same practical conditions, the new adder would have a shorter delay than the previous 3 adders.

V. CONCLUSIONS
We have presented a general architecture for designing hybrid CLA/CSAs. We have also demonstrated that several previous adders in the literature are all special cases of this general architecture; they differ in the way the Boolean functions for the carries are implemented. Based on the general architecture, we have proposed a new implementation of a high-speed 56-bit hybrid adder. The new adder generates the complemented Ling carries for the CSAs to select the appropriate sums. Moreover, it directly implements group carry propagate and generate signals without individual carry generate/propagate signals. Furthermore, the group carry generate/propagate signals are complemented to gain speed. The new implementation can be realized in static CMOS or dynamic logic style. The critical path length of our new design is about 2/3 of the critical path lengths of previous adders; therefore, higher speed can be gained.

REFERENCES
[1] T. Lynch and E. Swartzlander, "A spanning tree carry lookahead adder," IEEE Trans. Comput., vol. 41, pp. 931-939, Aug. 1992.
[2] V. Kantabutra, "A recursive carry-lookahead/carry-select hybrid adder," IEEE Trans. Comput., vol. 42, pp. 1495-1499, Dec. 1993.
[3] N. Quach and M. J. Flynn, "High speed addition in CMOS," IEEE Trans. Comput., vol. 41, pp. 1612-1615, Dec. 1992.
[4] M. J. Flynn and S. F. Oberman, "Modern research in computer arithmetic," Class notes, Stanford Univ., Stanford, CA, Autumn quarter, 1998-1999.
[5] I. S. Hwang and A. L. Fisher, "A 3.1 ns 32B CMOS adder in multiple output domino logic," IEEE J. Solid-State Circuits, vol. 24, pp. 358-369, Apr. 1989.
[6] A. Naini, D. Bearden, and W. Anderson, "A 4.5 ns 96B CMOS adder design," in Proc. IEEE Custom Integrated Circuits Conf., Boston, MA, May 1992.
[7] H. Ling, "High speed binary adder," IBM J. Res. Develop., vol. 25, no. 3, pp. 156-166, May 1981.
[8] V. G. Oklobdzija and E. R. Barnes, "Some optimal schemes for ALU implementation in VLSI technology," in Proc. 7th Symp. Computer Arithmetic, June 1985, pp. 2-8.
[9] R. W. Doran, "Variants of an improved carry look ahead adder," IEEE Trans. Comput., vol. 37, pp. 1110-1113, Sept. 1988.
[10] R. Brent and H. T. Kung, "A regular layout for parallel adders," IEEE Trans. Comput., vol. C-31, pp. 260-264, Mar. 1982.


Yuke Wang received the B.Sc. degree from the University of Science and Technology of China, Hefei, in 1989, and the M.Sc. and Ph.D. degrees from the University of Saskatchewan, Saskatoon, SK, Canada, in 1992 and 1996, respectively. He has held Visiting Assistant Professor positions at the University of Minnesota, Twin Cities, the University of Maryland, College Park, and the University of California at Berkeley. He has held faculty positions at Concordia University, Montreal, QC, Canada, and Florida Atlantic University, Boca Raton. Currently, he is an Assistant Professor in the Computer Science Department, University of Texas at Dallas. From 1996 to 2001, he published about 60 papers, 20 of which appeared in IEEE/ACM Transactions. His research interests include VLSI design of circuits and systems for DSP and communication, computer-aided design, and computer architectures.

Dr. Wang is currently an Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS, Applied Signal Processing, and a few other journals.

Xiaoyu Song (M'93) received the M.S. and Ph.D. degrees in computer engineering from the University of Pisa, Pisa, Italy, in 1987 and 1992, respectively. From 1992 to 1997, he was a faculty member at the University of Montreal, Montreal, QC, Canada. He was a Senior Member of Consulting Staff with Cadence, San Jose, CA. Currently, he is an Associate Professor in the Department of Electrical and Computer Engineering, Portland State University, Portland, OR. His research interests include IC and VLSI circuit design, testing and verification, systems-on-a-chip, and synthesis.

Dr. Song is an Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS. He serves on the Editorial Board of VLSI DESIGN: An International Journal of Custom-Chip Design, Simulation, and Testing. He has served on many program committees, including those of the IEEE International Conference on Quality of Electronics and the ACM International Conference on System-Level Interconnect Prediction.

C. Pai, photograph and biography not available at time of publication.
