
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 1, NO. 4, DECEMBER 1993

Circuit Activity Based Logic Synthesis for Low Power Reliable Operations
Kaushik Roy, Member, IEEE, and Sharat C. Prasad, Member, IEEE

Abstract: Circuit activity, the average number of transitions at a node, is a measure of power dissipation in digital CMOS circuits. Circuit activity is also related to electromigration and hot-electron effects, which can degrade reliability. In this paper, we address the problem of both finite state machine and combinational logic synthesis to minimize the average number of transitions at CMOS circuit nodes for battery-operated, low-power operation and increased reliability, while minimizing area at the same time. Logic can be optimally synthesized to suit different applications requiring different types of inputs. Results have been obtained for a wide range of MCNC benchmark examples.

I. INTRODUCTION

DUE to the rapid development of semiconductor technology, the possible gate count on a chip has increased enormously. The minimization of power in such CMOS circuits is of extreme importance for battery-operated portable applications; and, if power consumption is low enough, ceramic packages can be replaced by plastic ones, which cost about 25% less. In a CMOS circuit with negligible leakage current, power is dissipated only when there is a transition (a ZERO to ONE or ONE to ZERO in logic value) at the output of a gate. Circuit activity is associated with the average number of transitions occurring at any particular node of a circuit. Iyer et al. [5] have shown that circuit activity is a measure of the stress that can cause failures in digital circuits. Hence, circuits synthesized for low-power applications are also less susceptible to run-time failures. The theoretical study of the energy required to perform computations using digital computers, and of the energy complexity of algorithms, has been undertaken by several researchers. Mead and Conway in their classical text [1] considered the energy required by the primitive logic devices of a technology (in this case CMOS) when switching from one logical state to another. They arrive at the postulate, yet to be contradicted, that no logic element constructed using the primitive switching device of a technology can dissipate less energy than what is required to change the state of the device. Lengauer and

Manuscript received October 20, 1992; revised July 1, 1993. K. Roy was with the Semiconductor Process and Design Center of Texas Instruments, and the University of Texas, Dallas, TX 75243. He is now with Purdue University, West Lafayette, IN 47907. S. Prasad is with the Integrated Systems Laboratory of Texas Instruments Inc., Dallas, TX 75243. IEEE Log Number 9213094.

Mehlhorn in [6] prove a lower bound on the energy needed for computing transitive functions. Kissin in [7] considers the measurement of switching energy consumed in VLSI circuits, proposes an energy model, and derives bounds on the energy of acyclic monotone circuits. The problem of determining when and how often transitions occur at a node in a digital circuit is difficult, because transitions depend on the applied input vectors and the sequence in which they are applied, both of which vary widely during the course of normal operation. Therefore, probabilistic techniques have been resorted to. All reported methods of estimating the probability of a transition at a circuit node involve estimation of the signal probability, which is the probability that the logic value at a node is a ZERO or a ONE. Computing signal probabilities has attracted much research [10], [11], [12], [13]. In [10], a simple and general, but inefficient, scheme based on symbolic algebraic expressions is described. In [13], a relatively efficient algorithm is presented to estimate the range (a subrange of [0, 1]) within which the signal probability of a node lies. But there exist doubts as to whether the ranges are narrow enough to be of use. The algorithm presented in [11] is very efficient, but the values computed by it are approximate. A more sophisticated algorithm is proposed by Kapur and Mercer in [12]. This algorithm works by generating bounds tighter than [0, 1] to assign to the branch cut at the point of fanout. These bounds are computed by a prediction scheme. The accuracy of the results depends on the performance of the prediction scheme, and there are classes of circuits for which the prediction scheme fails. Research directed at estimating the power dissipation, or the factors influencing it, is reported in [3], [4]. We employ the method described in [3], with enhancements, to compute signal probabilities for circuits with reconvergent fanout. It is described in more detail in Section III.
In [4], the average power dissipated in CMOS circuits is computed using the expected value of the number of gate output transitions per global clock cycle, which is computed, in the case of static CMOS circuits, from signal probabilities. The signal probabilities are, in turn, computed using Binary Decision Diagrams. An overview of current thinking on how to reduce power dissipation in CMOS digital circuits can be found in [8] and [9]. The two papers together consider the effects of scaling feature sizes, selection of the dynamic pass-gate family of circuits versus static circuits, parallelism,

1063-8210/93$03.00 © 1993 IEEE


pipelining, hardware replication, lower supply voltages, different algorithms, spurious transitions due to hazards, and leakage and direct-path currents on total power dissipation. In this paper, we consider the effect of minimizing circuit node activity on the synthesis process, and in turn, its effect on the power dissipation and reliability of CMOS digital circuits. It can be noted that circuit activity is related to the input signal probabilities and signal transitions, and hence, a particular circuit can be optimally synthesized in different ways, suited to different applications requiring different types of inputs. Finite state machine (FSM) and combinational logic synthesis have conventionally been targeted at reducing area and critical path delay [14], [15], [16]. However, power dissipation has not been considered during the logic synthesis process. The synthesis process consists of two parts: state assignment, which determines the combinational logic function, and multilevel optimization of the combinational logic, which tries to minimize area while at the same time trying to reduce the circuit activity at the internal nodes of the circuit. The optimization process is iterative. During each iteration, the best subexpression from among all promising common subexpressions is selected. The objective function is based on both area and power savings. The selected subexpression is factored out of all affected expressions. The state assignment scheme considers the likelihood of state transitions: the probability of a state transition (say, from state S_i to state S_j) when the primary input signal probabilities are given. The state assignment minimizes the total number of transitions occurring at the V inputs (the present state inputs) of the state machine shown in Fig. 1. It should be noted that scaled-down supply voltage technologies can still be applied after logic synthesis to further reduce power dissipation. The paper is organized as follows.
Section II considers the preliminaries and the basic definitions required for understanding the circuit activity based synthesis process. Section III considers signal probability and transition density calculation for circuits with reconvergent fanout. Section IV deals with state assignment of finite state machines. The determination of signal probabilities and transition densities at the present state inputs to the combinational part of a finite state machine is given in Section V. Section VI describes multilevel logic synthesis. Results of our analyses on MCNC benchmark examples are given in Section VII. Detailed power estimation results from the circuit simulator SPICE on a small example are also presented. Section VIII summarizes the results and draws conclusions.

II. PRELIMINARIES AND DEFINITIONS
A. Multilevel Logic Representation

Multilevel logic can be described by a set F of completely specified Boolean functions. Each Boolean function f ∈ F maps one or more input and intermediate signals to an output or a new intermediate signal. A circuit is represented as a Boolean network. Each node has a Boolean variable and a Boolean expression associated with it. There is a directed edge to a node g from a node f if the expression associated with node g contains the variable associated with f in either true or complemented form. A circuit is also viewed as a set of gates. Each gate has one or more input pins and (generally) one output pin. Several pins are electrically tied together by a signal. Each signal connects to the output pin of exactly one gate, called the driver gate.

Fig. 1. State machine representation.

B. Representation of FSMs

We represent FSMs (Fig. 1) by Probabilistic State Transition Graphs (PSTGs). PSTGs are directed graphs consisting of a set of nodes S and a set of edges E. Each node S_i ∈ S represents a state of the FSM. There is a directed edge S_i → S_j between nodes S_i and S_j if there exists an input I which, when applied to the machine at state S_i, produces a transition from state S_i to state S_j with output O. Hence, each edge is associated with a label L_ij, which carries information on the values of the primary inputs that cause the transition and the values of the primary outputs corresponding to the state transition. Each edge is also associated with a number p_ij, 0 < p_ij ≤ 1, which denotes the conditional probability of a state transition from state S_i to S_j, given that the state machine is at state S_i; it is directly related to the signal probabilities of the primary input nodes. The cardinality of set S, N_S, gives the total number of states in the machine. The numbers of primary inputs and primary outputs are denoted by N_I and N_O respectively. We consider completely specified FSMs. Hence, if there are k outgoing edges from node S_i, each associated with a probability p_{i,m}, m ≤ k, then

Σ_{m=1}^{k} p_{i,m} = 1

If a machine is incompletely specified, then we make it completely specified by introducing a self-loop at each state S_i corresponding to the don't-care inputs to that state. Fig. 2 shows a state machine with six states, where state S1 has three outgoing edges, each associated with a transition probability, such that p12 + p13 + p14 = 1. For node S3, p33 + p35 + p36 = 1.

Fig. 2. A state transition diagram.

The state assignment problem involves assigning unique Boolean codes of the same length to the different states of an FSM so as to satisfy some given criteria. Given a state assignment, the Hamming distance between any two states S_i and S_j is given by

H(S_i, S_j) = σ(S_i ⊕ S_j)

where ⊕ represents the Exclusive-OR operation and the function σ(x) determines the number of ONES in the Boolean representation of x. In other words, H(S_i, S_j) denotes the total number of bits in which states S_i and S_j differ.
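In code, H(S_i, S_j) is simply the population count of the bitwise XOR of the two state codes. A minimal sketch (the function name is ours):

```python
def hamming(code_a: int, code_b: int) -> int:
    """Number of bit positions in which two state codes differ."""
    return bin(code_a ^ code_b).count("1")

# Codes 0000 and 1000 differ in one bit: only one flip-flop
# toggles on a transition between these two states.
assert hamming(0b0000, 0b1000) == 1
assert hamming(0b0000, 0b1111) == 4
```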

C. Power Dissipation in CMOS

Of the three sources of power dissipation in digital CMOS circuits [9] (transitions, direct-path short-circuit current, and leakage current), the first is by far the dominant and is the only one considered in the following discussions. In static CMOS circuits, there are spurious transitions at circuit nodes before they settle at one of the two logic levels. With careful design these can be minimized. We assume the circuits to be free of spurious transitions. If leakage and direct-path short-circuit currents are ignored, the average power drawn by a CMOS gate is given by

P_avg = (C · V_dd² / 2) · lim_{T→∞} n_s(T) / T    (1)

where V_dd is the supply voltage, C is the capacitance at the output of the gate, and n_s(T) is the number of transitions of s(t), the logic signal at the output of the gate, in the time interval [-T/2, T/2]. The last term in (1) is the average number of transitions per unit time. Determining it by traditional means would require simulation of the circuit for a very large number of input vectors. Instead we use the transition density simulator proposed in [3].

Let s(t), t ∈ (-∞, ∞), be a stochastic process which takes the values 0 or 1, transitioning from one to the other at random times. A stochastic process is said to be strict-sense stationary (SSS) if its statistical properties are invariant to a shift of the time origin. More importantly, the mean of such a process does not change with time. If a constant-mean process s(t) has a finite variance, and is such that s(t) and s(t + τ) become uncorrelated as τ → ∞, then s(t) is mean-ergodic. As in [3], we use the term mean-ergodic to refer to regular processes which are mean-ergodic and satisfy the two conditions of finite variance and decaying autocorrelation. In [3], the primary inputs to the circuit are modeled as mutually independent SSS mean-ergodic 0-1 processes. This has two consequences. First, the probability of signal s(t) assuming the logic value ONE at any given time t becomes a constant independent of time; it is referred to as the equilibrium probability of the random quantity s(t) and is denoted by P(s = 1). Second, the last term in (1) becomes the expected number of transitions per unit time and is referred to as the transition density of s(t), denoted by D(s). Since digital circuits can be thought of as nonlinear but time-invariant systems, the signals at the internal and output nodes of the circuit are also SSS and mean-ergodic. Further, the Boolean functions describing the outputs of a logic module are decoupled from the delays inside the module by assuming the signals to have passed through special delay modules just prior to entering the module under consideration. This also has two consequences. First, the task of propagating equilibrium probabilities through the module is transformed into that of propagating signal probabilities. Second, the transition densities at the outputs y of the module are given by

D(y) = Σ_{i=1}^{n} D(x_i) · P(∂y/∂x_i)    (2)

Here x_i, i = 1, ..., n are the module inputs and ∂y/∂x is the Boolean difference of function y with respect to x, defined by

∂y/∂x = y|_{x=1} ⊕ y|_{x=0}    (3)

From the above discussions it is clear that the average power dissipation in a CMOS circuit can be written as

POWER_avg = (V_dd² / 2) Σ_i C_i D(i)

summing over all circuit nodes i. During the multilevel logic synthesis process, the capacitive load C_i at each node of a circuit is approximated by the fanout factor at that node. We define the power dissipation measure Φ as

Φ = Σ_i fanout_i · D(i)

where fanout_i is the number of fanouts at node i. V_dd is assumed to be constant for all circuits that we are considering, and hence, Φ is proportional to the average power dissipated in the circuit.
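Equations (2) and (3) can be evaluated by enumeration for a small module. The sketch below (function names are ours, not the paper's) computes D(y) = Σ_i D(x_i) P(∂y/∂x_i), where P(∂y/∂x_i) is found by enumerating assignments of the remaining inputs, assuming mutually independent inputs:

```python
from itertools import product

def transition_density(f, probs, densities):
    """Propagate transition densities through a Boolean module:
    D(y) = sum_i D(x_i) * P(dy/dx_i), where the Boolean difference
    dy/dx_i = f|x_i=1 XOR f|x_i=0. Inputs are assumed independent."""
    n = len(probs)
    total = 0.0
    for i in range(n):
        p_diff = 0.0
        for bits in product((0, 1), repeat=n):
            if bits[i] != 0:
                continue
            hi = list(bits)
            hi[i] = 1
            if f(bits) != f(hi):          # x_i controls the output here
                p = 1.0                   # probability of the other inputs
                for j, b in enumerate(bits):
                    if j != i:
                        p *= probs[j] if b else 1.0 - probs[j]
                p_diff += p
        total += densities[i] * p_diff
    return total

# 2-input AND: dy/da = b and dy/db = a, so D(y) = D(a)P(b) + D(b)P(a)
AND = lambda x: x[0] & x[1]
d = transition_density(AND, probs=[0.5, 0.5], densities=[2.0, 4.0])
assert abs(d - (2.0 * 0.5 + 4.0 * 0.5)) < 1e-12
```

For an OR gate the Boolean differences are the complements of the other input, which the same enumeration handles without any special casing.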


III. SIGNAL PROBABILITY CALCULATION

The probability of a logic signal s, expressed as P(s = 1) or P_s, is a real number in the range [0, 1] which expresses the probability that signal s has logic value 1. It is easy to compute signal probabilities for all nodes in a circuit that does not have reconvergent fanout. There has been much work on bounding or estimating signal probabilities, but the results obtained are either inexact or approximate. Since having correct signal probabilities was important to the investigation of the current problem, it was decided to use the general algorithm proposed in [10] and to implement it as efficiently as possible. The algorithm is as follows.

Algorithm: Compute signal probabilities
Inputs: Circuit, signal probabilities of all the inputs
Output: Signal probabilities for all nodes of the circuit
Step 1: For each input signal and gate output in the circuit, assign a unique variable.
Step 2: Starting at the inputs and proceeding to the outputs, write the expression for the output of each gate as a function of its input expressions (using the standard expression for each gate type for the probability of its output signal in terms of its mutually independent input signals).
Step 3: Suppress all exponents in a given expression to obtain the correct probability expression for that signal.

Table I illustrates the computation of signal probabilities and transition densities for a circuit implementing the function y = x1(x2 + x3) + x2x3, shown in Fig. 3. We devised a data representation for signal probability expressions which is memory efficient and which allows us to perform the necessary operations efficiently. In this representation we have taken advantage of the fact that exponents have been suppressed, and therefore a signal probability expression may contain a variable (assigned to one of the inputs) raised to power 1 or may not contain it at all. So each product term may be regarded as a set with variables as its elements.
The multiplication of two product terms can be achieved by taking the union of the corresponding sets. The primary inputs of the circuit under consideration are arbitrarily ordered and assigned indices. Let x_j, 0 ≤ j < M, be the primary input signals. Let p_j be the signal probability variable (a, b, and c in the example above) assigned to input x_j, i.e., P(x_j = 1) = p_j. A product term Q_i (a, bc, -2abc, etc. in the examples above) is represented as a pair (α_i, β_i), where α_i is an integer and β_i is regarded as a bit string. α_i is called the coefficient of the term and may be negative or positive. Bit j of β_i, written β_ij, is 1 if and only if the corresponding product term contains the variable p_j, and is 0 otherwise. Hence, for the product term -2abc in the example above, the coefficient is -2 and the bit array is 111. When two product terms Q_i and Q_j are multiplied, the resulting product term Q_k is given by (α_k, β_k), where α_k = α_i · α_j and β_kl = β_il ∨ β_jl. It is easy to see that we can define a total order relation on the set of all possible product terms: Q_i < Q_j if β_i < β_j, where both β_i and β_j are interpreted as integers. Each probability expression is represented as an ordered list of its product terms, i.e., P(G) = (Q_1, Q_2,

Fig. 3. An example circuit with reconvergent fanout.

TABLE I
SIGNAL PROBABILITIES AND TRANSITION DENSITIES

Node  Signal Probability Expression  Signal Probability  Transition Density
x1    a                              0.5                 2.1
x2    b                              0.5                 13.5
x3    c                              0.5                 0.3
s1    bc                             0.25                6.9
s2    b + c - bc                     0.75                6.9
s3    ab + ac - abc                  0.375               5.415
y     ab + bc + ac - 2abc            0.5                 8.419
..., Q_n). It is obvious that the sum of two expressions P(G) and P(H) can be determined in O(n_G + n_H) time, and the product in O(n_G · n_H) time. In the preceding discussion, it was implicitly assumed that the word size of the machine is larger than M. When this is not the case, multiple words may be used to implement each β_i. We have assumed that the primary input signals (i_1, ..., i_m) are mutually independent. If they are not, then we can find a set of n mutually independent signals (i'_1, ..., i'_n), n ≤ m, such that each i_j can be expressed in terms of the i'_k. Now, the signal probabilities of the i_j's will be given by symbolic expressions containing the signal probabilities of the i'_k's. The signal probability expressions for internal nodes will also be in terms of the signal probabilities of the i'_k's rather than the i_j's.
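The representation above can be sketched directly: each product term is a (coefficient, bitmask) pair, multiplication ORs the masks (the set union that performs exponent suppression), and evaluation substitutes numeric probabilities. All names below are ours:

```python
# Product terms as (coefficient, bitmask) pairs: bit j of the mask is 1
# iff the term contains variable p_j. A sketch of the representation
# described above, with exponent suppression built in.

def _normalize(d):
    # ordered list of nonzero terms, sorted by mask interpreted as an integer
    return [(c, m) for m, c in sorted(d.items()) if c != 0]

def multiply(P, Q):
    """Product of two expressions; OR-ing the masks is the set union
    that suppresses exponents (p_j * p_j = p_j)."""
    out = {}
    for ca, ma in P:
        for cb, mb in Q:
            m = ma | mb
            out[m] = out.get(m, 0) + ca * cb
    return _normalize(out)

def or_expr(P, Q):
    """P OR Q for two operands: P + Q - P*Q."""
    out = {}
    for c, m in P + Q:
        out[m] = out.get(m, 0) + c
    for c, m in multiply(P, Q):
        out[m] = out.get(m, 0) - c
    return _normalize(out)

def evaluate(P, probs):
    """Numeric signal probability from a symbolic expression."""
    total = 0.0
    for c, m in P:
        v = float(c)
        for j, p in enumerate(probs):
            if m >> j & 1:
                v *= p
        total += v
    return total

# y = x1(x2 + x3) + x2 x3 from Fig. 3, with variables a, b, c
a, b, c = [(1, 0b001)], [(1, 0b010)], [(1, 0b100)]
y = or_expr(multiply(a, or_expr(b, c)), multiply(b, c))
assert abs(evaluate(y, [0.5, 0.5, 0.5]) - 0.5) < 1e-12   # matches Table I
```

Because the masks carry exponent suppression, the probability of y comes out exact (0.5) despite the reconvergent fanout through x2 and x3.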
IV. STATE ASSIGNMENT FOR FINITE STATE MACHINES

This section addresses the problem of encoding the states of synchronous sequential machines based on input signal probabilities and transition densities. The state encoding scheme uses the likelihood-of-state-transition information. Let us consider the PSTG of Fig. 2. State S1 has three outgoing and three incoming edges. Let us assume that p12 is much greater than both p13 and p14. From the diagram it is clear that p51 = p61 = p21 = 1, because there is one outgoing edge from each of the states S5, S6, and S2, and the machine is assumed to be completely specified. Hence, the likelihood of a state transition from state S1 to S2, or vice versa, is the maximum. Therefore, states S1 and S2 should be assigned codes such that there is a minimum number of transitions between these two states, i.e., H(S1, S2) should be minimum. For example, if state S1 is assigned a four-bit code of 0000 and state S2 is assigned a four-bit code of 1000, then H(S1, S2) = 1. When there is a transition from S1 to S2, only one flip-flop of Fig. 9 will undergo a transition from a logic value of 0 to 1. However, if S2 is assigned a code of 1111 instead of 1000, then all four flip-flops will undergo a transition from 0 to 1.

Fig. 4. State assignment.

Let p_ij and L_ij respectively denote the conditional state transition probability (given that the FSM is in state S_i, p_ij is the probability that the next state is S_j) and the label associated with an edge S_i → S_j of a PSTG with N_I primary inputs, each input x having a signal probability P_x. Each primary input x in L_ij, which causes the state transition to occur, can assume a logic value value(x) from the set {1, 0, -}, where - represents the don't-care condition. P_x is the probability that value(x) = 1. Hence, (1 - P_x) denotes the probability that value(x) = 0. The condition value(x) = - obviously occurs with a probability of 1. Assuming that the inputs are mutually independent, the state transition probability p_ij is given by

p_ij = Π_{x ∈ inputs in L_ij} W_x

where W_x = P_x, if value(x) = 1
      W_x = 1 - P_x, if value(x) = 0
      W_x = 1, if value(x) = -

The signal probabilities and the transition densities of the primary inputs can be obtained by system-level simulation of the circuit over a large number of clock periods, noting the signal values and transitions at the boundary pins. The state assignment algorithm tries to minimize the number of transitions, or the transition density, at the present state inputs to the state machine. We assume that if the transitions at the present state inputs are minimized, then the combinational portion of the state machine can be more efficiently synthesized for low-power dissipation: the higher the transition densities at the inputs to a combinational circuit, the higher the rate at which its internal nodes will probably switch, dissipating more power. If there are N_S states, then the minimum number of flip-flops required for coding is ⌈log2 N_S⌉. It should be noted that if one-hot coding [15] with N_S flip-flops is used, the Hamming distance between any two states is always 2, and hence, an optimum assignment minimizing the number of switchings at the present state inputs might not be obtainable. Besides, one-hot coding also increases the number of present state inputs to the combinational logic of Fig. 9. The average number of switchings at the present state inputs to a state machine can be minimized if the state assignment scheme is such that the following objective function is minimized:

γ = Σ_{over all edges (S_i, S_j)} p_ij H(S_i, S_j)

The above function represents the summation of the Hamming distances between all pairs of adjacent states, weighted by the state transition probabilities. The higher the state transition probability between two states, the lower should be the Hamming distance between those two states. Due to the complex nature of the objective function γ, simulated annealing was used to solve the problem. We begin with a random assignment of states with the prescribed number of bits. Two types of moves are allowed during annealing: interchange the codes of two states, or assign an unassigned code to the state which is randomly picked for exchange. The move is accepted if the new assignment decreases γ. If the move increases the value of the objective function γ, the move is accepted with a probability of e^(-|δ(γ)|/Temp), where |δ(γ)| is the absolute value of the change in the objective function and Temp denotes the annealing temperature. When Temp is high, moves which increase γ are accepted with a higher probability so that the solution does not get stuck at a local minimum. As the annealing temperature decreases, the probability of accepting such a move decreases. Fig. 9 shows a state machine which produces an output of ONE whenever a sequence of five ONES appears, and otherwise outputs a ZERO. The machine was implemented using three D-type flip-flops, using the two assignment schemes shown in the table. The input signal probability is assumed to be 0.5, and hence, each edge of the state transition graph has a state transition probability of 0.5. For Coding 1, γ is 10, whereas for Coding 2, γ is 5.5. Both machines can be implemented using 34 transistors and 3 flip-flops. SPICE simulations (shown in Section VII), with random inputs, show that the time-average power dissipated with the first encoding is more than with the second.
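The annealing loop described above can be sketched as follows; the temperature schedule, move mix, and parameter values are illustrative assumptions, not the paper's:

```python
import math
import random

def anneal_assignment(n_states, n_bits, edges, temp=5.0, cooling=0.995,
                      iters=2000):
    """Simulated-annealing state assignment minimizing
    gamma = sum over edges (i, j, p_ij) of p_ij * H(code_i, code_j).
    A sketch: schedule and move probabilities are illustrative."""
    codes = random.sample(range(1 << n_bits), n_states)

    def gamma(cs):
        return sum(p * bin(cs[i] ^ cs[j]).count("1") for i, j, p in edges)

    cost = gamma(codes)
    for _ in range(iters):
        new = codes[:]
        if random.random() < 0.5:
            i, j = random.sample(range(n_states), 2)
            new[i], new[j] = new[j], new[i]          # interchange two codes
        else:
            free = sorted(set(range(1 << n_bits)) - set(new))
            if free:                                  # move onto an unused code
                new[random.randrange(n_states)] = random.choice(free)
        delta = gamma(new) - cost
        # downhill always; uphill with probability exp(-|delta|/Temp)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            codes, cost = new, cost + delta
        temp *= cooling
    return codes, cost
```

For a four-state cycle with equal edge probabilities of 0.5, the minimum is a Gray-code ordering (every edge has Hamming distance 1, giving γ = 2.0).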
It should be noted that an incompletely specified state machine is made completely specified by introducing self-loops at the nodes for which there are don't-care inputs (inputs which can never occur when the machine is in that state). This reduces the number of transitions, and possibly the area too.

V. SIGNAL PROBABILITIES AND TRANSITION DENSITIES AT PRESENT STATE INPUTS

State assignment determines the functionality of the combinational logic. The combinational logic is represented by S(I, V), where I is the set of primary inputs, and V


represents the present state inputs (refer to Fig. 9). The signal probabilities and transition densities are given for each input i_k ∈ I. Given the combinational logic S(I, V), the signal probabilities and transition densities for the V inputs have to be determined in order to synthesize multilevel combinational logic based on the power dissipation measure. The V inputs are the same as the U outputs, but delayed by a clock period. Hence, for a strict-sense stationary process, the signal probabilities and the transition densities of the V inputs are equal to the corresponding values for the U outputs. After state assignment, the state machine is simulated with different inputs to determine the signal probabilities and transition densities at the present state inputs. The simulation proceeds as follows. Primary input signals are randomly generated such that the signal probabilities and transition densities conform to the given distribution. The state machine is simulated to determine the percentage of time that bit v_j of the state machine has a logic value of ONE. Similarly, the number of transitions occurring at bit v_j of the machine is also determined. The number of transitions divided by the total number of simulation cycles gives the transition density D(v_j) at that input. The unit of D(v_j) is transitions per clock period. The simulation can be carried out very fast because only the state transition diagram is simulated.
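The simulation described above can be sketched as follows for a machine given as a next-state function over its state transition diagram; the function and parameter names are ours:

```python
import random

def simulate_fsm(next_state, codes, n_bits, p_one, n_cycles=100_000, seed=1):
    """Estimate signal probability P(v_j = 1) and transition density
    D(v_j) (transitions per clock period) at each present-state bit
    by simulating only the state transition diagram.
    next_state(state, inputs) -> state; codes[state] -> integer code."""
    rng = random.Random(seed)
    ones = [0] * n_bits
    trans = [0] * n_bits
    state = 0
    for _ in range(n_cycles):
        inp = tuple(rng.random() < p for p in p_one)  # random primary inputs
        new = next_state(state, inp)
        old_c, new_c = codes[state], codes[new]
        for j in range(n_bits):
            ones[j] += new_c >> j & 1
            if (old_c ^ new_c) >> j & 1:
                trans[j] += 1                         # flip-flop j toggled
        state = new
    P = [o / n_cycles for o in ones]
    D = [t / n_cycles for t in trans]                 # per clock period
    return P, D
```

For a one-bit toggle machine driven by an input with signal probability 0.5, both the estimated probability and the density of the state bit converge to about 0.5.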

VI. POWER DISSIPATION DRIVEN MULTILEVEL LOGIC OPTIMIZATION

A complicated two-level Boolean function is often implemented with additional levels between inputs and outputs. Multilevel optimization of a set of such Boolean functions involves creating new intermediate signals and/or removing some of the existing ones to achieve a reduction in area and to meet other design constraints such as performance. The global area optimization process of MIS [14] turned out to be very well suited for extensions to consider the impact on the power dissipation of the circuit. The input to the optimization process is a set of Boolean functions. A procedure called kernel finds some or all cube-free multiple- or single-cube divisors of each of the functions and retains those divisors which are common to two or more functions. The best few common divisors are factored out and the affected functions are simplified by a second procedure called substitution. This process is repeated until no common divisors can be found. The goodness of a divisor is measured by the magnitude of the area saving it brings about. In our system, this has been extended to take power saving also into account.

A. Factoring and Power Saving

In this section we consider the effect of factoring out a common subexpression from several expressions, and also that of selecting a factor from amongst possible factors. Let g = g(u_1, u_2, ..., u_K), K ≥ 1, be a common subexpression of functions f_1, f_2, ..., f_L; L ≥ 2. Let v_1, v_2, ..., v_M; M ≥ 0 be the nodes internal to g. Each input u_k to g is either a primary input or the output of a node in the circuit. Fig. 5 is a pictorial representation of the circuit.

Fig. 5. Factoring a common subexpression.

When g is factored out of f_1, f_2, ..., f_L, the signal probabilities and transition densities at all the nodes of the Boolean network remain unchanged. However, the capacitances at the outputs of the driver gates of u_1, u_2, ..., u_K change. Each such gate now drives L - 1 fewer gates than before. This results in a reduction in the power dissipation, which is given by

(L - 1) (V_dd² C_0 / 2) Σ_{k=1}^{K} n_{u_k} D(u_k)

Here D(x) is the transition density at node x, n_{u_k} is the number of gates belonging to node g and driven by signal u_k (there are gates not belonging to g which are also driven by u_k), and C_0 is the load capacitance due to a fanout equal to one gate. The driver gate of the newly created node g drives exactly as many gates as the driver gates of all its copies (which existed prior to factorization) taken together, so there is no change in this component of the total power dissipation. Since there is only one copy of g in place of L, there are L - 1 fewer copies of internal nodes v_1, v_2, ..., v_M switching and dissipating power. The saving in power is given by

(L - 1) (V_dd² C_0 / 2) Σ_{m=1}^{M} n_{v_m} D(v_m)

Here n_{v_m} is the number of gates driven by signal v_m. The total power saving on factoring out g is the sum of the above two components and is given by

ΔW(g) = (L - 1) (V_dd² C_0 / 2) (Σ_{k=1}^{K} n_{u_k} D(u_k) + Σ_{m=1}^{M} n_{v_m} D(v_m))    (4)

The magnitude of the saving ΔW(g) depends on the transition densities at the boundary and internal nodes of g.
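The saving (4) can be checked numerically against the example of Section VI-C. In the sketch below (names ours), the constant V_dd² C_0 / 2 is folded into one normalized parameter, and the internal density D(bc) = 2.1 is implied by the quoted ΔW = 6.4:

```python
def power_saving(L, n_u, D_u, n_v, D_v, vdd2c0_half=1.0):
    """Total power saving (4) from factoring subexpression g out of L
    functions: boundary-node plus internal-node components. The constant
    V_dd^2 * C_0 / 2 is folded into vdd2c0_half (normalized to 1 here)."""
    boundary = sum(n * d for n, d in zip(n_u, D_u))
    internal = sum(n * d for n, d in zip(n_v, D_v))
    return (L - 1) * vdd2c0_half * (boundary + internal)

# g = a + bc shared by f1 and f2: inputs a, b, c each drive one gate
# of g; the internal node bc drives one gate (assumed D(bc) = 2.1).
dW = power_saving(L=2, n_u=[1, 1, 1], D_u=[0.1, 0.6, 3.6],
                  n_v=[1], D_v=[2.1])
assert abs(dW - 6.4) < 1e-12
```

The second kernel of the example, g = d + e with no internal nodes, gives ΔW = 151.2 under the same normalization, matching the value quoted in Section VI-C.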


The area saving ΔA(g) due to a divisor g is found as in [14]. Let T(g) be the number of literals in the factored form of g. Then,

ΔA(g) = (L - 1)(T(g) - 1)

The net saving is given by

S(g) = α_A ΔA(g)/A_T + α_W ΔW(g)/W_T

Here A_T and W_T are the area and average power dissipation of the input Boolean network, and α_W and α_A are weight factors: 0 ≤ α_A, α_W ≤ 1, and α_W + α_A = 1.0.

is negative, there are no more multiple-cube divisors common to two or more functions and so we stop. Step 6: For allfsuch thatfE F A g E K ( f ) , substitute variable g in f in place of the subexpression g ( u , , u2, . . . , uK). Each function, which has the expression g as one of its kernels, has the new variable g substituted into it in place of the expression. Srep7: F = F U { g > , G = G - { g } . P ( g = 1 ) = p s , D( g ) = ds. New function g is added to the set of functions F. The newly added node is assigned signal probability and transition density values from step 5.

B. The Optimization Procedure


At the beginning of the optimization procedure, signal probabilities and transition densities for each internal and output node is computed. Each time a common divisor g = g ( u , , u2, * , u K ) is factored out, the P ( u k = 1) and D ( u k ) ,1 Ik IK, are known but P ( v , = 1) and D ( v , ) , 1 I m I M , are not. The latter are computed when A W ( g ) is being evaluated and are retained. Thus once again P ( s = 1) and D ( s ) , for each node s are known. The parameter No is used to control the number of kernel intersections (cube free divisors common to two or more functions) which are substituted into all the functions before the set of kernel intersections is recomputed. Recomputing after a single substitution is wasteful, as only some of the functions have changed. On the other hand, with each substitution, some of the kernel intersections become invalid. Algorithm: Power dissipation driven multilevel logic optimization Inputs: Boolean network F , input signal probability P(xi = 1) and transition density D ( x j ) for each primary input x i , No Output: Optimized Boolean network F , P ( s = 1) and D ( s ) for each node in the optimized network. Step 0: Compute P ( s = 1) and D ( s ) for each node s in F. Step 1: Repeat steps 2 through 4 . Step 2: G = U f G F K ( f )where , K ( f ) = set of all divisors off. Set of kernels (cube free divisors) is computed for each function. G is the union of all the sets of kernels. Step3: G = { g l ( g E G )A ( g c K ( f , ) )A ( g E K ( & ) ) A (i # j ) } . G, the set of kernel intersections, is the set of those kernels which apply to more than one function. Step 4: Repeat steps 5 through 7 No times Step 5: Find g, p s , d, such that

C. An Example

In this section, we illustrate the application of the above procedure to a small circuit. Let F = {f1, f2} be a two-output circuit given by

f1 = ad + bcd + ae
f2 = a + bc + dh + eh

The signal probabilities and the transition densities at the primary inputs are assumed to be

P(a) = P(b) = P(c) = P(d) = P(e) = P(h) = 0.5
D(a) = 0.1, D(b) = 0.6, D(c) = 3.6, D(d) = 21.6, D(e) = 129.6, D(h) = 3.6

Since F is a small circuit, we recompute the set of kernel intersections after every substitution, i.e., N0 = 1. Fig. 6 shows the circuit F as an interconnection of logic gates. The area and power dissipation of the unoptimized circuit are

A_T(F) = 6 + 6 = 12
W_T(F) = 503.35
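Step 0 requires P(s = 1) and D(s) at every node. Assuming independent inputs, both can be computed exactly by enumeration, with the transition density given by Najm's formula D(f) = Σ_x P(∂f/∂x)·D(x) [3]. The sketch below reproduces the P(g) and D(g) values quoted later for the divisors a + bc and d + e (helper names are ours):

```python
from itertools import product

def signal_prob(f, names, P):
    # Exact P(f = 1) by enumeration, assuming independent inputs.
    total = 0.0
    for bits in product((0, 1), repeat=len(names)):
        env = dict(zip(names, bits))
        if f(env):
            w = 1.0
            for n in names:
                w *= P[n] if env[n] else 1.0 - P[n]
            total += w
    return total

def transition_density(f, names, P, D):
    # Najm's formula: D(f) = sum over inputs x of P(df/dx) * D(x),
    # where df/dx is the Boolean difference of f with respect to x.
    total = 0.0
    for x in names:
        def boolean_diff(env, x=x):
            return f({**env, x: 1}) != f({**env, x: 0})
        total += signal_prob(boolean_diff, names, P) * D[x]
    return total

P = {v: 0.5 for v in "abcde"}
D = {"a": 0.1, "b": 0.6, "c": 3.6, "d": 21.6, "e": 129.6}
g1 = lambda v: v["a"] or (v["b"] and v["c"])   # a + bc
g2 = lambda v: v["d"] or v["e"]                # d + e

p1 = signal_prob(g1, "abc", P)            # 0.625
d1 = transition_density(g1, "abc", P, D)  # 1.125
p2 = signal_prob(g2, "de", P)             # 0.75
d2 = transition_density(g2, "de", P, D)   # 75.6
```

The computed values agree with P(g) = 0.625, D(g) = 1.125 for a + bc and P(g) = 0.75, D(g) = 75.6 for d + e below.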
The sets of kernels for f1 and f2 are computed:

K(f1) = {a + bc, d + e}
K(f2) = {a + bc, d + e}

G, the union of the sets of kernels of all the functions, is computed: G = {a + bc, d + e}.

G, the set of kernel intersections, that is, those kernels which apply to two or more functions, is computed: G = {a + bc, d + e}.
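These kernel sets can be reproduced with the classical recursive kernel-extraction procedure. Note two caveats: the sketch below also reports a cube-free function itself as a (level-0) kernel, which the text omits, and the text uses K(f) loosely for all divisors, under which a + bc counts for f2 as well; the strict recursion finds a + bc only for f1:

```python
def cube_free(f):
    # Divide out the largest cube common to every cube of f.
    common = frozenset.intersection(*f)
    return frozenset(c - common for c in f)

def kernels(f):
    # All kernels of f: cube-free quotients of f by single literals,
    # found recursively, plus f itself when f is cube-free.
    f = cube_free(frozenset(f))
    found = {f}
    for lit in {l for cube in f for l in cube}:
        containing = [c for c in f if lit in c]
        if len(containing) >= 2:
            found |= kernels(frozenset(c - {lit} for c in containing))
    return found

f1 = {frozenset("ad"), frozenset("bcd"), frozenset("ae")}
f2 = {frozenset("a"), frozenset("bc"), frozenset("dh"), frozenset("eh")}
a_bc = frozenset({frozenset("a"), frozenset("bc")})
d_e = frozenset({frozenset("d"), frozenset("e")})
```

Running this on f1 yields both a + bc (co-kernel d) and d + e (co-kernel a); on f2 it yields d + e (co-kernel h).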

Let us first consider α_A = 1.0, α_W = 0 (area optimization only). The net saving due to each of the kernel intersections g ∈ G is determined, and the kernel intersection corresponding to the largest net saving is selected:

g = a + bc: ΔA(g) = 1, ΔW(g) = 6.4, P(g) = 0.625, D(g) = 1.125, S(g) = 0.083
g = d + e: ΔA(g) = 0, ΔW(g) = 151.2, P(g) = 0.75, D(g) = 75.6, S(g) = 0

Fig. 7. Complex gate implementation of the circuit optimized for area alone.

Fig. 6. The unoptimized circuit.

Hence, g = a + bc is selected. It is substituted into the functions in F and added to F to give F*:

F* = {f1, f2, f3}

where

f3 = a + bc
f1 = f3 d + ae
f2 = f3 + dh + eh

The total area and power dissipation of circuit F* are

A_T(F*) = 3 + 4 + 4 = 11
W_T(F*) = 476.5

No more kernel intersections can be found, and the procedure terminates. The complex logic gate implementation of the optimized circuit is shown in Fig. 7; it requires 28 transistors.

Next we consider α_A = 0, α_W = 1.0, which causes optimization for low power dissipation. Once again each of the kernel intersections g ∈ G is evaluated and the best is selected:

g = a + bc: P(g) = 0.625, D(g) = 1.125, ΔA(g) = 1, ΔW(g) = 6.4, S(g) = 0.013
g = d + e: P(g) = 0.75, D(g) = 75.6, ΔA(g) = 0, ΔW(g) = 151.2, S(g) = 0.3

This time g = d + e is selected. It is substituted into the functions in F and added to F to give F**:

F** = {f1, f2, f3}

where

f3 = d + e
f1 = f3 a + bcd
f2 = a + bc + f3 h

The total area and power dissipation of circuit F** are

A_T(F**) = 2 + 5 + 5 = 12
W_T(F**) = 423.12

No more kernel intersections can be found. The complex logic gate implementation of the optimized circuit is shown in Fig. 8; it requires 30 transistors. Hence, using α_A = 0, α_W = 1.0 gives us a larger area (12 literals versus 11 literals, or 30 transistors versus 28 transistors) but a smaller power dissipation (423.12 versus 476.5).

VII. IMPLEMENTATIONS AND RESULTS

The synthesis problem is broken up into two parts: the state assignment problem, where the objective function y is minimized so as to reduce the transition densities at the present-state inputs V, and the multilevel combinational logic synthesis process based on the power dissipation measure and area. The state assignment and the subsequent logic synthesis process can be greatly affected if the pri-


Fig. 9. SPICE plots for the machine of Fig. 4.

Fig. 8. Complex gate implementation of the circuit optimized for power.

TABLE II
RESULTS ON BENCHMARK EXAMPLES WITH MINIMUM CODING BITS

Example   States  Inputs  Outputs  Edges  y_min   y_max
ex1       20      9       19       138    25.5    40.03
ex3       10      2       2        36     16.25   23.25
ex7       10      2       2        36     13.75   21.75
keyb      19      7       2        170    68.1    145.8
sand      32      11      9        184    37.48   57.65
train11   11      2       1        25     5.5     10.75
opus      10      5       6        22     6.59    13.15
bbtas     6       2       2        24     3.5     8.75
lion9     9       2       1        25     5.0     12.0
bbara     10      4       2        60     3.5     5.8
planet    48      7       19       115    10.81   172.875

mary input signal probabilities and transition densities are altered. The algorithms for state assignment and logic synthesis have been implemented in LISP on an Explorer workstation. Table II shows the results of our state assignment scheme on the MCNC benchmark examples. The number of states, primary inputs, primary outputs, and the number of edges in the state transition graph are shown in the table. For all primary inputs, a signal probability of 0.5 and a transition density of 0.5 transitions per clock cycle were assumed. It should be noted that a different state assignment will be obtained if the input signal probabilities are changed. The state machines were experimented with ⌈log2 S⌉ state bits, which is the minimum number of bits required to code the state machine. However, our algorithms are not limited by that. A larger number of bits for state assignment will produce more present-state inputs, but the complexity of the combinational logic usually reduces. The minimum value of the objective function, y_min, obtained by the simulated-annealing-based state assignment is shown in the table. For comparison, y_max is also shown in Table II. The signal probabilities and transition densities at the V inputs to the combinational logic, to be used in the multilevel synthesis, are determined after state assignment using simulation; 10,000 randomly generated primary inputs (conforming to a given distribution) were simulated. Table III shows the results of applying our synthesis algorithm to the MCNC benchmark examples of Table II. Two types of circuits were synthesized for comparison. The second and third columns show the transistor count and the power dissipation measure W of the circuits
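The simulation step described above — estimating signal probability and transition density at the present-state inputs from 10,000 random input vectors — can be sketched as follows. The per-cycle independent-sampling model and the helper names are our assumptions; the authors' simulator is not described in detail:

```python
import random

def estimate_stats(sample_bit, n_cycles=10_000, seed=1):
    # Estimate the signal probability and the transitions per clock
    # cycle of a signal produced by sample_bit(rng) over n_cycles.
    rng = random.Random(seed)
    prev = sample_bit(rng)
    ones = prev
    transitions = 0
    for _ in range(n_cycles - 1):
        cur = sample_bit(rng)
        transitions += cur != prev
        ones += cur
        prev = cur
    return ones / n_cycles, transitions / n_cycles

# An input with P(x = 1) = 0.5, sampled independently each cycle, has
# an expected transition density of 2 * 0.5 * 0.5 = 0.5 per cycle,
# consistent with the 0.5 transitions/cycle assumed for Table II.
p_hat, d_hat = estimate_stats(lambda rng: rng.random() < 0.5)
```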

TABLE III
SYNTHESIS RESULTS

          y_min encoding        y_max encoding
Example   Transistors  W        Transistors  W
ex1       2072         984.4    2380         1465.2
ex3       284          46.4     344          55.3
ex7       304          48.8     404          62.8
keyb      1364         561.4    2424         1170.4
sand      2958         1497.1   3040         1503.4
train11   418          101.3    466          151.3
opus      452          160.1    530          234.2
bbtas     90           20.2     130          55.2
lion9     276          66.9     374          131.1
bbara     290          69.0     342          105.4
planet    2650         2012.0   3226         3854.9

synthesized using the state encoding scheme which produces an objective function value of y_min. Similar results for circuits encoded with states which produced y_max as objective function are shown next for comparison. In both cases, the combinational logic was synthesized to minimize the power dissipation measure. The results show that the difference in power dissipation measure can be large
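The quoted ex1 percentages can be checked directly from the ex1 row of Table III, taking each increase relative to the larger (y_max) circuit:

```python
# ex1 row of Table III: (transistors, power measure) per encoding
t_min, w_min = 2072, 984.4    # y_min encoding
t_max, w_max = 2380, 1465.2   # y_max encoding

transistor_increase = (t_max - t_min) / t_max  # about 0.129, i.e. ~13%
power_increase = (w_max - w_min) / w_max       # about 0.328, i.e. ~33%
```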


TABLE IV
EXPERIMENTAL RESULTS

                  Unoptimized     α_W = 0, α_A = 1   α_W = 1, α_A = 0
Example   Inputs  Area   Power    Area   Power       Area   Power
5xp1      7       162    3583     144    2469        150    2259
misex2    25      164    2386     163    2269        163    2269
octal     5       236    6188     138    2951        161    2074
sao2      10      242    2222     200    1243        216    1098
9sym      9       287    5988     197    3114        208    2802
bw        5       295    6938     238    4206        241    3866
clip      9       348    10111    234    4339        247    3326
rd73      7       384    6791     224    2537        226    2366

for most of these circuits. For example, though the increase in transistor count between the two representations of ex1 is 13%, the increase in power dissipation measure is 33%. It should be observed that most of these state machines were incompletely specified. We consider completely specified machines for synthesis. Hence, the machines were completely specified by using the scheme of Section IV, which introduces self-loops where required to take care of don't-care inputs. Table IV shows the results for combinational benchmark examples. In each case, each input signal was assigned a signal probability of 0.5 and a transition density of a randomly generated number (once for every example, and then the same value for all subsequent runs of that example) in the range from 0.01 to 50 million transitions per second. The area is in terms of literals. Instead of power W, we everywhere use a dimensionless real quantity W/W0, where W0 = (10^6 C0 V_dd^2)/2, which corresponds to the average power dissipated at a node which drives one gate and is experiencing 10^6 transitions per second. The capacitance per gate, C0, is the constant introduced in Section VI-A. (α_A = 1, α_W = 0) corresponds to the traditional multilevel optimization, where power dissipation minimization is not considered. At first, the choice of parameters (α_A = 0, α_W = 1) may appear strange, and one may expect the resulting areas to be very large. But, as the results show, that is not what happens. Reduction in power dissipation is achieved by eliminating, at a time, redundant copies of the one subexpression which has a higher weighted sum (weighted by the capacitance at the node) of transition densities at its nodes than other subexpressions. Elimination of any common subexpression automatically results in a reduction in area. The results were verified using a power simulator developed at Texas Instruments [18]. The example state machine of Fig. 4, which outputs a ONE only when a sequence of five ONEs appears at the input, was synthesized using the two encodings shown in the figure. An input signal probability of 0.5 and a transition density of 0.5 transitions per clock period were assumed. With the same inputs and a 0.8-micron technology, the two machines were simulated with 1000 inputs using SPICE. Fig. 9 shows the time-average power for the two machines. Coding 1, for which y is higher, produced more power dissipation than the one with lower y. Both machines require the same number of transistors for implementation (34 transistors and 3 flip-flops).

VIII. CONCLUSIONS

A synthesis system has been developed to synthesize both finite state machines and combinational logic for low-power applications. SYCLOP tries to minimize the transition density at the internal nodes of a circuit to minimize power dissipation during normal operation. As input signal probabilities and transition densities are considered during the synthesis process, a particular circuit can be synthesized in different ways for different applications which require different types of inputs. For the present-state inputs to the combinational circuit of a state machine, simulation was used to determine the signal probabilities and transition densities. ⌈log2 S⌉ bits were used for state assignment; however, our algorithm is not limited by the number of bits used for state assignment. The multilevel optimization process extracts kernels such that there is a balance between area and power optimization. Transition density is a measure of activity in a digital circuit and is therefore related to reliability. Hence, the circuits realized to reduce the transition density measure are probably more reliable.

ACKNOWLEDGMENT

The authors would like to thank Farid Najm for explaining his transition density simulator, Bhanu Kapoor for letting us use the Boolean expression factorization utilities he developed, Dan Pickens for allowing us to use his power simulator to verify our results, and Ashwin Shah and Bob Hewes for encouraging our work.

REFERENCES

[1] C. Mead and L. Conway, Introduction to VLSI Systems. Menlo Park, CA: Addison-Wesley, 1980.
[2] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. Menlo Park, CA: Addison-Wesley, 1990.
[3] F. N. Najm, "Transition density, a stochastic measure of activity in digital circuits," in Proc. ACM/IEEE Design Automation Conf., 1991.
[4] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," in Proc. ACM/IEEE Design Automation Conf., 1992.
[5] R. Iyer, D. Rossetti, and M. Hsueh, "Measurement and modeling of computer reliability as affected by system activity," ACM Trans. Computer Systems, vol. 4, no. 3, pp. 214-237, Aug. 1986.
[6] T. Lengauer and K. Mehlhorn, "On the complexity of VLSI computations," in Proc. CMU Conf. VLSI, Oct. 1981, pp. 89-99.
[7] G. Kissin, "Measuring energy consumption in VLSI circuits: A foundation," in Proc. 14th Annual ACM Symp. on Theory of Computing, 1982, pp. 99-104.
[8] R. W. Brodersen, A. Chandrakasan, and S. Sheng, "Technologies for personal communications," in Proc. 1991 Symp. on VLSI Circuits, Tokyo, pp. 5-9.
[9] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.
[10] K. P. Parker and E. J. McCluskey, "Probabilistic treatment of general combinational networks," IEEE Trans. Computers, vol. C-24, pp. 668-670, June 1975.


[11] B. Krishnamurthy and I. G. Tollis, "Improved techniques for estimating signal probabilities," IEEE Trans. Computers, vol. C-38, pp. 1245-1251, July 1989.
[12] R. Kapur and M. R. Mercer, "Bounding signal probabilities for testability measurement using conditional syndromes," SRC Pub., 1992.
[13] J. Savir, G. Ditlow, and P. Bardell, "Random pattern testability," IEEE Trans. Computers, pp. 79-90, Jan. 1984.
[14] R. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang, "MIS: A multiple-level logic optimization system," IEEE Trans. Computer-Aided Design, pp. 1062-1081, Nov. 1987.
[15] G. De Micheli, R. Brayton, and A. Sangiovanni-Vincentelli, "Optimal state assignment of finite state machines," IEEE Trans. Computer-Aided Design, pp. 269-284, July 1985.
[16] S. Devadas, H-K. T. Ma, A. Newton, and A. Sangiovanni-Vincentelli, "MUSTANG: State assignment of finite state machines targeting multi-level logic implementations," IEEE Trans. Computer-Aided Design, pp. 1290-1300, Dec. 1988.
[17] S. Ercolani, M. Favalli, M. Damiani, P. Olivo, and B. Ricco, "Estimation of signal probability in combinational logic networks," in Proc. 1989 European Test Conf., pp. 132-138.
[18] D. Pickens, "Power simulator for CMOS circuits," unpublished work, Texas Instruments, Dallas.

Kaushik Roy (S'83-M'90) received the B.Tech. degree in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, in 1983, and the Ph.D. degree in electrical and computer engineering from the University of Illinois, Urbana-Champaign, in 1990. He was with the Semiconductor Process and Design Center of Texas Instruments, Dallas, from 1988, where he worked on FPGA architecture development and low-power design. At that time, he was also an Adjunct Faculty member with the Department of Electrical Engineering, University of Texas, Dallas. He joined the Electrical Engineering faculty at Purdue University in 1993, where he is currently an Assistant Professor. His current interests are in VLSI testing and fault tolerance, low-power logic design, and FPGA's.

Sharat Prasad (M'88) received the B.Tech. degree in electronics and communication engineering from the Indian Institute of Technology, Kharagpur, in 1983, and the M.E. degree in computer science and automation from the Indian Institute of Science, Bangalore, in 1987. He worked on codec circuit design at the Tata Institute of Fundamental Research, Bombay, India, during the summer of 1982, and on the development of LISP and PROLOG programming environments at International Computers Indian Manufacture Ltd., Calcutta, India, from July 1983 through June 1985. He joined the Integrated Systems Laboratory of Texas Instruments Inc., Dallas, in October 1987. His current interests are in layout-, performance-, and low-power-driven high-level and logic synthesis, and VLSI circuit representation.
