A Fast and Low Power Multiplier Architecture

E. Abu-Shama, M. B. Maz, M. A. Bayoumi
The Center for Advanced Computer Studies
The University of Southwestern Louisiana
Lafayette, LA 70504

Abstract- In this paper a new multiplier architecture is proposed for low power and high speed applications. It is based on generating all partial products in one step, then summing these partial products using a binary tree network. This yields a speedup of more than 50% over the array multiplier for (32 X 32) bit multiplication. Computer simulation with SPICE shows that the proposed architecture has better speed and power performance.

1. Introduction

The demand for high speed processing has been increasing as a result of expanding computer and signal processing applications. Higher throughput arithmetic operations are important to achieve the desired performance in many real-time signal and image processing applications. One of the key arithmetic operations in such applications is multiplication, and the development of fast multiplier circuits has been a subject of interest for decades. Reducing time delay and power consumption are essential requirements for many applications. Achieving these critical requirements may, in some cases, increase the area of the design by a considerable factor.

High speed multiplication can be implemented using several algorithms, such as array, Booth, carry-save, modified Booth, and Wallace tree. In this paper, we present a multiplier architecture that is based on a binary tree topology; it is termed the Modified Binary Tree (MBT) multiplier. It has a time delay that is considerably less than that of the array multiplier algorithm.

In Section 2, we introduce the architectures of the array multiplier and the new MBT multiplier. A comparison of the two architectures is given in Section 3, followed by simulation analysis and optimization in Sections 4 and 5, respectively.
The paper's conclusions are summarized in Section 6.

0-7803-3636-4/97 $10.00 © 1997 IEEE

2. Multiplier Architectures

2.1. Combinational Array Multipliers

For simplicity, assume the multiplication of two unsigned n-bit binary numbers X and Y, where $X = x_0 x_1 x_2 \ldots x_{n-1}$ and $Y = y_0 y_1 y_2 \ldots y_{n-1}$ [1]. The product P = X * Y can then be expressed as

$P = \sum_{i=0}^{n-1} 2^{i} x_i Y$    (Eq. 1)

corresponding to the usual bit-by-bit multiplication. Eq. 1 can be rewritten as

$P = \sum_{i=0}^{n-1} 2^{i} \left( \sum_{j=0}^{n-1} x_i y_j 2^{j} \right)$    (Eq. 2)

Each of the $n^2$ 1-bit products $x_i y_j$ in Eq. 2 may be computed by a two-input AND gate [3]. The summation of these terms is accomplished according to Eq. 2 by an array of n(n-1) full adders. The shifts implied by the $2^{i}$ and $2^{j}$ factors in Eq. 2 are implemented by the spatial displacement of the full adders [3]. Therefore, the worst case propagation delay can be expressed as

$2(n-1)d + d'$    (Eq. 3)

where d and d' are the propagation delays of a full adder and an AND gate, respectively [3]. Note that the propagation delay increases as n (the number of bits) increases.

2.2. The New Multiplier Architecture

The proposed modified binary tree (MBT) multiplier architecture is based on an algorithm similar to the one reported in [6]. It rests on two concepts. First, all partial products of the multiplication can be generated in parallel with a delay of d'. Second, adding these partial products takes $\log_2(n)$ steps.

The parallelism in generating the partial products is realized as follows. The first partial product is obtained by ANDing the first bit (LSB) of the multiplier with the multiplicand bits. The second partial product is obtained by ANDing the second multiplier bit with the multiplicand bits and shifting the result by one position (one zero at the least significant end). The third partial product is obtained by ANDing the third multiplier bit with the multiplicand bits and shifting by two positions (two zeros), and so on. For (n X n) bits there will be n partial products. These partial products can be added in parallel: each two adjacent partial products are added together by an n-bit adder.
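The partial-product generation described above can be sketched in Python. This is only an illustrative software model, not the authors' circuit: the function name `partial_products` is hypothetical, plain integers stand in for bit vectors, and the loop models what the hardware does for all n rows at once.

```python
def partial_products(x, y, n):
    """Return the n shifted partial products of unsigned n-bit x (multiplicand)
    and y (multiplier). Hypothetical helper, for illustration only."""
    if not (0 <= x < 2**n and 0 <= y < 2**n):
        raise ValueError("operands must be unsigned n-bit values")
    rows = []
    for i in range(n):
        y_bit = (y >> i) & 1        # i-th multiplier bit, LSB first
        row = x if y_bit else 0     # AND of that bit with every multiplicand bit
        rows.append(row << i)       # i zeros at the least significant end
    return rows


rows = partial_products(0b1011, 0b1101, 4)
print([format(r, "07b") for r in rows])
# ['0001011', '0000000', '0101100', '1011000']
print(sum(rows))  # 143
```

In hardware, each row corresponds to one group of n two-input AND gates, so all rows are available after a single AND-gate delay d'.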
This pairwise addition generates the first level of computation with n/2 partial sums. These partial sums are added again in the same fashion, creating a second level of computation with n/4 new partial sums, and so on, using (n-1) n-bit adders in the form of a binary tree network. The number of levels needed to create this binary tree network is $\log_2(n)$. Figure 1 shows the basic architecture of the new MBT multiplier structure for (4 X 4) bit binary numbers.

Fig. 1. New multiplier architecture for (4 X 4) bit binary numbers.

For example, multiplying two (4 X 4) bit numbers can be performed as follows:

[Worked example: parallel computation of the partial products, followed by two levels of partial sums.]

For this example, there are 4 partial products generated. Each 2 partial products are added (indicated by dotted lines), creating the first level of computation, which has 2 partial sums. These two partial sums are fed to the second level of adders, resulting in the formation of the final product.

Since this architecture requires (n-1) n-bit adders, it needs a total of n(n-1) full adder cells. The worst case propagation delay of this architecture can therefore be computed as follows:

$d' + n d \lceil \log_2(n) \rceil$

where d and d' are the propagation delays of a full adder and a 2-input AND gate, respectively. This is for the case of n-bit ripple carry adders. Changing the type of adder used in the architecture makes a substantial change in the propagation delay. For example, if we use a carry-lookahead adder, the propagation delay would be

$d' + (m+2) d \lceil \log_2(n) \rceil$

where n = mk, and m is the number of groups, each with k bits. Each group has its own CLA circuit that generates its carry-out, which is fed to the next group, and so on [3].

3. Evaluation of the Proposed Architecture

The number of full adders required for the proposed MBT multiplier architecture is n(n-1), which is the same as in the case of the array multiplier. The type of adder structure differs between the two architectures.
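The binary tree summation of Section 2.2 can be sketched as follows. This is a minimal software model under stated assumptions: the names `tree_sum` and `mbt_multiply` are hypothetical, and Python integers of unbounded width stand in for the n-bit adders of the architecture.

```python
def tree_sum(terms):
    """Add a non-empty list of integers pairwise, level by level.

    Returns (total, levels), where levels counts the adder levels used.
    """
    if not terms:
        raise ValueError("need at least one term")
    levels = 0
    while len(terms) > 1:
        # One level: every pair of adjacent terms goes through one adder.
        next_level = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:  # an unpaired term passes through unchanged
            next_level.append(terms[-1])
        terms = next_level
        levels += 1
    return terms[0], levels


def mbt_multiply(x, y, n):
    """Multiply two unsigned n-bit integers by tree-summing partial products."""
    rows = [(x if (y >> i) & 1 else 0) << i for i in range(n)]
    return tree_sum(rows)


product, levels = mbt_multiply(0b1011, 0b1101, 4)
print(product, levels)  # 143 2
```

For the (4 X 4) case the four partial products collapse to two partial sums and then to the final product, so two adder levels are used, matching $\log_2(4) = 2$.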
In the new MBT multiplier architecture, adders are arranged in (n-1) blocks of parallel adders. Each adder block is composed of n adder cells, which may be implemented in many different ways, such as carry-lookahead, carry-skip, or ripple carry. Ripple carry is used for small multiplier designs such as (4 X 4) bit multiplication. For larger operand sizes, such as (16 X 16) bit multiplication, a carry-skip adder is the better choice. In the array multiplier architecture, adders are placed in one large block instead of the parallel blocks of the proposed architecture. This leads to a higher propagation delay for the array multiplier than for the proposed MBT multiplier. The new architecture can be much faster than the array multiplier, depending on the type of n-bit parallel adder selected. For example, the multiplication of (32 X 32) bits using both architectures results in the following delays:

Type     Proposed   Array
Delay    30d        62d

Note: d' is eliminated from both sides since they are equal.

In this example, the new architecture results in a speedup of 51.6% over the array architecture. Table II shows a comparison of the two architectures for different (n X n) bit multiplications, using carry-lookahead adders and full adder cells for the new and the array architectures, respectively.

The features of the new multiplier architecture as compared to the other architectures are as follows:

a) The n-bit multiplication time is proportional to $\log_2 n$, and the physical layout has good repeatability.
b) All the partial products are generated in one step.
c) The summing time of our architecture is $O(\log_2 N)$, while that of the array multiplier is $O(N \log_2 N)$, that of the RBA tree multiplier is $O(\log_2 N)$, and that of the Wallace tree multiplier is $O(\log_{3/2} N)$.

We compare the proposed multiplier having such features with other high-speed multipliers.
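The (32 X 32) delay comparison earlier in this section can be checked with a few lines of arithmetic. The sketch below assumes the array delay is 2(n-1)d from Section 2.1 with d' dropped from both sides, takes the proposed delay of 30d directly from the text, and uses hypothetical function names.

```python
def array_delay_units(n):
    """Worst-case array multiplier delay in units of d, ignoring d'."""
    return 2 * (n - 1)


def speedup_percent(proposed, reference):
    """Relative delay reduction of `proposed` with respect to `reference`."""
    return 100.0 * (reference - proposed) / reference


array_32 = array_delay_units(32)
print(array_32)                                  # 62
print(round(speedup_percent(30, array_32), 1))   # 51.6
```

The result agrees with the 51.6% speedup quoted for the (32 X 32) case.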
Table I shows the comparison of our multiplier, an array multiplier, a Redundant Binary Adder (RBA) tree multiplier, and a Wallace tree multiplier for the intrinsic logic path. The table shows that our multiplier has the shortest logic path among these multiplier architectures.

Table I. Logic path of different types of multipliers

Type                     Proposed   Wallace   RBA Tree   Array
Logic path for 32 bits   19         32        36         187

In our design, all partial products are generated in one level regardless of the size of the multiplication. This is done by having n groups, each with n AND gates, in order to generate n partial products for the case of (n X n) bit multiplication. This dramatically reduces the area of the multiplier as well as the power consumption. In [6], however, generating the partial products requires more than one level (3 levels for the case of 16 X 16 bit multiplication), and the number of levels increases exponentially with the size of the multiplication. Moreover, we try to minimize the power consumption not only by minimizing the multiplier architecture, but also by utilizing an adder cell that was designed especially for low power [1]. Figure 2 shows the basic adder cell used. Reducing the number of transistors in the adder cell itself [1] and generating all partial products in one step lead to a major reduction in time delay.

Fig. 2. The new design of the full adder (FA New).

Table II. Propagation delay time of both architectures for different (n X n) bit sizes.

4. Simulation Analysis

A model of the (4 X 4) bit multiplier architecture has been implemented using a low power adder cell [1] with a ripple carry structure under the CADENCE EDGE DESIGN FRAMEWORK with 2 µm CMOS technology, and the simulations were done using HSPICE version 9002 [5] over a 500 ns duration. The average power consumption of this multiplier is 0.0614 mW, and the maximum power consumption is 2.77 mW. The propagation delay time was about 886 ns.

5. Optimization of the Architecture

The new architecture can be further optimized by adding a more complex control circuit to the design. For example, after the first level has computed its partial sums, it passes these results to the second level, which leaves the first level sitting idle until the final product is produced. A control circuit could instead feed these partial sums back to the first level of adders, running the computation on n/2 bits of adders and powering down the remaining n/2 bits of adders. The next step would run the computation on n/4 bits of adders, and so on until the final product is produced. This is repeated $\log_2(n)$ times. Furthermore, the control circuit can be modified to count the leading zeros of both the multiplicand and the multiplier, which would reduce the computation time and power consumption by running fewer adders.

6. Conclusion

In this paper, we propose a new multiplication architecture using a parallel approach for generating the partial products and their sums. As stated before, the architecture can be further enhanced by adding control logic, which eventually will reduce both area and cost but increase the complexity of the design. Overall, the proposed architecture shows better performance than the other architectures.

References

[1] E. Abu-Shama, A. Elchoumi, S. Sayed, M. Bayoumi, "An Efficient Low Power Basic Cell for Adders," Proceedings of the 38th Midwest Symposium on Circuits and Systems, Rio de Janeiro, Brazil, 1995.
[2] M. Morris Mano, "Digital Logic and Computer Design". Reading, CA: Prentice-Hall, 1979, pp. 434-446.
[3] J. P. Hayes, "Computer Architecture and Organization". Reading, NY: McGraw Hill, 1988, pp. 230-250.
[4] Neil H. E. Weste and Kamran Eshraghian, "Principles of CMOS VLSI Design, A Systems Perspective". Reading, MA: Addison Wesley, 1993, pp. 542-560.
[5] Meta-Software, "HSPICE User's Manual, version 9002," Campbell, CA, 1992.
Meta-Software Inc., 1300 White Oaks Road, Campbell, CA 95008.
[6] Y. Harata, Y. Nakamura, H. Nagase, M. Takigawa, "A High-Speed Multiplier Using a Redundant Binary Adder Tree," IEEE J. of Solid-State Circuits, vol. SC-22, no. 1, pp. 28-33, Feb. 1987.
